Securely Inserting User Generated Content and JSON Into Templates – A Cocktail Approach



On Github amira-eb / eng-hoedown

Securely Inserting User Generated Content and JSON Into Templates

A Cocktail Approach

Created by Amira Anuar

Securely? Why?

"Browsers are extremely finicky beasts." - pmp

At the moment you may feel like this. By the end of the presentation you may feel more like...
this.

Disclaimer

insert image/note here related to this being a smorgasbord of various things and that I am not an expert

There is a LOT more to know about security than what I will be covering. This is called a cocktail approach because I'll cover a little bit of a, b, and c, and x, y, and z, but hopefully the most important things you need to know when it comes to escaping JSON and user-generated content within your templates.

Different Browser Contexts

  • HTML Body -- <body>${ text }</body>
  • Element Attribute -- <a href="" onclick="{...}"/>
  • Links With JS -- <a href="javascript:alert(1)">
  • JS String Literal -- <script>var x='${foo}' </script>
  • JSON Body Responses
  • E-mail addresses
  • URLs
  • So on...

So, it's really important that engineers understand the context of where their untrusted data is being added to the template when thinking about the encoding. Keep these things in mind as I continue this presentation.
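To make the point about context concrete, here is a minimal sketch in Python that encodes the same untrusted string for three of the contexts listed above. It uses only the standard library (html.escape, urllib.parse.quote, json.dumps). The example.com URL and the variable names are assumptions for illustration, not anything from our codebase.

```python
import json
from html import escape
from urllib.parse import quote

# A hostile value that tries to break out of whatever context it lands in.
untrusted = '"><script>alert(1)</script>'

# HTML body context: HTML-entity encode.
html_body = "<p>%s</p>" % escape(untrusted)

# URL query-string context: percent-encode (example.com is a placeholder).
url = "https://example.com/search?q=%s" % quote(untrusted, safe="")

# JS string literal context: JSON-encode, then hide "<" so the value
# cannot contain a literal "</script>" that ends the script block.
js_literal = "var x = %s;" % json.dumps(untrusted).replace("<", "\\u003C")

print(html_body)
print(url)
print(js_literal)
```

Each context needs its own encoder. The HTML-encoded value would be wrong inside a URL, and the URL-encoded value would be wrong inside a script block.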

An Example of Vulnerability

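  <script>
      var x = '${ foo | n }';
  </script>

  foo = "&#39;; alert(document.cookie);'"

Why is this vulnerable?

Once the &#39; entity is decoded to a single quote (browsers do this inside attribute values such as onclick, and a literal ' written raw into a script block has the same effect), the string literal closes early, leaving:

  var x = ''; alert(document.cookie);'';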

First I'm going to jump right into showing an example of why we do this, which is to prevent site vulnerabilities. This is assuming I override the other j# variables. Otherwise we will get another error.
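The breakout can be reproduced outside the browser. This is a rough sketch that assumes the closing quote sits directly after the substituted value and uses html.unescape to stand in for the browser's entity decoding. The template string and variable names are made up for the demo.

```python
from html import unescape

# The stored value from the slide: an HTML-encoded single quote,
# followed by the attacker's statement and a trailing quote.
foo = "&#39;; alert(document.cookie);'"

# Stand-in for the template line: var x = '<value>';
template = "var x = '{foo}';"

# html.unescape plays the role of the entity-decoding step.
rendered = template.format(foo=unescape(foo))
print(rendered)
```

The decoded quote ends the string right away, so alert(document.cookie) runs as its own statement rather than sitting inside the string.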

A Common XSS Issue at EB

The default encoder for core vs. Django templates is DIFFERENT. In core, the default encoder for template variables is raw. In Django, the default encoder is HTML entity encoding, the same as | h.

    from ebapps.organizers.models import Organizer
    ...
    org_desc = organizer.description  # the raw database field
    p.setValue("org_desc", org_desc)
    p.display()

Then in the core template:

    <h2>${ org_desc }</h2>

Take this piece of code for example. "Raw" means the template expects that either the data is already sanitized and safe for output, or another filter is applied directly in the template (| h, | escapejs, etc.). Django templates HTML-entity-encode by default, so if you pass in some raw, untreated HTML, you will likely get something safe to render. This bites us when, in a core template, we assume it works like Django but it doesn't.
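Here is a small sketch of the difference between the two defaults. render_h2 is a hypothetical stand-in for rendering the h2 line above. It is not the core or Django template engine, and the sample description is invented.

```python
from html import escape


def render_h2(value, default_filter=None):
    """Hypothetical stand-in for rendering <h2>${ org_desc }</h2>."""
    if default_filter is not None:
        value = default_filter(value)
    return "<h2>%s</h2>" % value


# An organizer description containing attacker-supplied markup.
org_desc = '<img src=x onerror="alert(1)">'

core_style = render_h2(org_desc)            # raw default: markup passes through
django_style = render_h2(org_desc, escape)  # HTML-entity default: markup is inert

print(core_style)
print(django_style)
```

With the raw default the img tag reaches the page intact. With the HTML-entity default it shows up as harmless text.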

Never Trust Your Inputs!

Never trust your inputs, just like you should never trust a friend when they tell you a drink is weak and has like one shot in it. Don't just assume it's good for you; vet the contents first. This is very important when it comes to templates and escaping, because we accept a lot of user-generated content and inputs, store them in our database, and then use those inputs to build the response bodies sent back to clients. You can have a variety of injection attacks such as XSS, SQLi, command injection, XXE, and others, which I won't go into here. Furthermore, you can end up with ugly, non-functional pages that break product requirements (like being able to click register).

How?

This is the cocktail part of the talk where I will talk about the various methods of escaping and stripping, and when you should probably use them.

clean_html, strip_html, strip_tags

This is a common question - what are these doing and when do we use them?

clean_html is like [alcohol analogy?]

Takes HTML, removes <script> tags and other attack vectors

Closes un-closed HTML tags

Use this when you want to take the HTML data provided by the user, and display it on the page

DEMO in the app. However, clean_html is not performant. If you have an explicit whitelist of tags...

strip_html

Now uses bleach, which is more performant

Pass in a 'whitelist' of tags

All HTML is stripped but those tags

Use strip_html instead of clean_html when you want no HTML tags, or just a few
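To show the whitelist idea, here is a standard-library sketch. It is NOT our strip_html (which, per the slide, uses bleach). strip_html_sketch and AllowlistStripper are hypothetical names. The sketch keeps whitelisted tags without their attributes, drops every other tag, and re-escapes the text in between.

```python
from html import escape
from html.parser import HTMLParser


class AllowlistStripper(HTMLParser):
    """Hypothetical helper: keeps only whitelisted tags, minus attributes."""

    def __init__(self, allowed_tags):
        super().__init__(convert_charrefs=True)
        self.allowed = set(allowed_tags)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.parts.append("<%s>" % tag)  # attributes are dropped

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Text between tags is kept, but re-escaped so it stays inert.
        self.parts.append(escape(data))


def strip_html_sketch(text, allowed_tags=()):
    parser = AllowlistStripper(allowed_tags)
    parser.feed(text)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(strip_html_sketch("<b>Hi</b> <script>alert(1)</script><i>there</i>", ["b"]))
print(strip_html_sketch("<b>Hi</b> <i>there</i>"))  # empty whitelist: no tags survive
```

Text inside a dropped tag is kept as plain text, so this is a stripper and not a content filter.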

I don't actually understand strip tags vs strip html, Paul?


as_safe_markup

paul can we chat on what I should say on this? is it preferred to strip html and clean html? seems way better to me

escape_js, django.utils.html.escape

Django 'escape' - Escapes a String's HTML

< is converted to &lt;

> is converted to &gt;

' (single quote) is converted to &#39;

" (double quote) is converted to &quot;

& is converted to &amp;

Does NOT get called if the string has already been "marked safe"
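This sketch shows how "marked safe" interacts with escaping. It assumes the common convention of a string subclass exposing an __html__ method. SafeText, mark_safe_sketch and conditional_escape_sketch are hypothetical names, loosely modeled on Django's mark_safe and conditional_escape and not copied from them.

```python
from html import escape


class SafeText(str):
    """Hypothetical marker type: a string already safe for HTML output."""

    def __html__(self):
        return self


def mark_safe_sketch(value):
    return SafeText(value)


def conditional_escape_sketch(value):
    # Skip escaping for values that were explicitly marked safe,
    # which avoids double-encoding already-escaped text.
    if hasattr(value, "__html__"):
        return value.__html__()
    return escape(str(value))


plain = "Tom & Jerry <3"
already_escaped = mark_safe_sketch(escape(plain))

print(conditional_escape_sketch(plain))            # escaped once
print(conditional_escape_sketch(already_escaped))  # left alone, not double-encoded
```

The flip side is the danger. Anything wrapped in the safe marker is trusted as-is, so marking user input safe without sanitizing it first brings the hole back.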

More on mark safe later (what should I say about when this should be used?)

escapejs

Escapes characters for use in JavaScript strings

testing\r\njavascript \'string" <b>escaping

This does not make the string safe for use in HTML, but does protect you from syntax errors when using templates to generate JavaScript/JSON.
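This is a rough sketch of that style of escaping, applied to the sample string above. escapejs_sketch is a hypothetical re-implementation for illustration. The set of characters it escapes is an assumption modeled on how Django's filter behaves, not the filter's actual source.

```python
# Characters that could end a JS string or an inline script block are
# replaced with \uXXXX escapes, as are ASCII control characters.
_JS_ESCAPES = {ord(c): "\\u%04X" % ord(c) for c in "\\'\"<>&=-;`\u2028\u2029"}
_JS_ESCAPES.update({i: "\\u%04X" % i for i in range(32)})


def escapejs_sketch(value):
    """Hypothetical stand-in for an escapejs-style filter."""
    return str(value).translate(_JS_ESCAPES)


sample = "testing\r\njavascript 'string\" <b>escaping"
print(escapejs_sketch(sample))
```

The output still has to be placed inside quotes in the template, and it is meant for JavaScript string literals only.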

escapejs is great, but safejson is better

Outputs a Python data structure as JSON

Can pass in strings or more complex structures, and less likely to miss cases

Example

Like escapejs, this ensures that dangerous sequences are correctly escaped for use within a script block without opening a potential XSS hole.
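This sketch shows the idea behind a safejson-style filter. It is not our safejson implementation. safejson_sketch is a hypothetical name, and the approach (JSON-encode, then escape <, > and &) is an assumption about how such a filter can work.

```python
import json

# Escape the characters that let a value break out of a <script> block.
# They remain valid JSON because \uXXXX escapes are legal inside strings.
_SCRIPT_ESCAPES = {
    ord("<"): "\\u003C",
    ord(">"): "\\u003E",
    ord("&"): "\\u0026",
}


def safejson_sketch(data):
    """Hypothetical stand-in: serialize a Python structure for a script block."""
    return json.dumps(data).translate(_SCRIPT_ESCAPES)


event = {
    "name": "</script><script>alert(1)</script>",
    "tickets": [1, 2, 3],
}
encoded = safejson_sketch(event)
print("var eventData = %s;" % encoded)
```

The whole structure is serialized in one step, so nested strings are covered without each one needing its own filter. That is the "less likely to miss cases" advantage over escaping values one at a time.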

| n

From what I can tell, we should try not to use this. It marks the string as 'safe' so that it is not double-encoded, but that is dangerous if the string is actually not safe. But we need it for translations...?

| h

Unclear on what this does and if I should cover it.

Past relevant security issues and their fixes

If I have time I may go over a couple of these such as https://jira.evbhome.com/browse/EB-18312 which is fixed by checking if is_url.

So far, we reviewed...

  • strip_html
  • clean_html
  • | n
  • | h
  • safejson
  • escapejs
  • mark_safe
  • as_safe_markup

Yes and no. There are a lot of different things I could've chosen to talk about, and many other edge cases and scenarios I didn't consider. But for now....

Have fun safely and securely embedding user generated content / json!

Questions?

Thank you to Paul, Tamara, Simon, and Darren for your knowledge and help! (I owe you all a drink.)
