Securely Inserting User Generated Content and JSON Into Templates
A Cocktail Approach
Created by Amira Anuar
Securely? Why?
"Browsers are extremely finicky beasts." - pmp
At the moment you may feel like this. By the end of the presentation you may feel more like...
Disclaimer
Insert image/note here about this being a smorgasbord of various things and that I am not an expert.
There is a LOT more to know about security than what I will be covering. This is called a cocktail approach because I'll be covering a little bit of a, b, and c, and x, y, and z, but hopefully the most important things you need to know about escaping JSON and user-generated content within your templates.
Different Browser Contexts
HTML Body -- <body>${ text }</body>
Element Attribute -- <a href="" onclick="{...}"/>
Links With JS -- <a href="javascript:alert(1)">
JS String Literal -- <script>var x='${foo}'</script>
JSON Body Responses
E-mail addresses
URLs
So on...
So it's really important that engineers understand the context in which their untrusted data is added to the template when choosing an encoding.
Keep these things in mind as I continue this presentation.
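To make the point about context concrete, here is a rough sketch that encodes the same untrusted value for three of the contexts listed above, using only the Python standard library. The function names are made up for this illustration; they are not EB, Mako, or Django helpers.

```python
import html
import json
from urllib.parse import quote

untrusted = "<b>Tom & 'Jerry'</b>"

def for_html_body(value):
    # HTML body or quoted attribute value: entity-encode & < > " '
    return html.escape(value, quote=True)

def for_url_component(value):
    # URL path or query component: percent-encode everything outside
    # the unreserved set
    return quote(value, safe="")

def for_js_string(value):
    # JS string literal inside a <script> block: JSON-encode, then
    # escape <, > and & so "</script>" can never appear in the output
    encoded = json.dumps(value)
    return (encoded.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))

print(for_html_body(untrusted))
print(for_url_component(untrusted))
print(for_js_string(untrusted))
```

Same input, three different outputs. Using the HTML encoder in the JS context (or the other way around) is where most of the trouble starts.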
An Example of Vulnerability
<script>
var x = '${ foo | n };'
</script>
foo = "'; alert(document.cookie);'"
Why is this vulnerable?
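With | n nothing is escaped, so the single quote at the start of foo closes the string literal early and the rest of the value runs as code, leaving:
var x = ''; alert(document.cookie);';'
HTML-entity encoding alone would not help in an inline event handler such as onclick either, because the browser decodes the entities before it runs the JavaScript.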
First I'm going to jump right into showing an example of why we do this, which is to prevent site vulnerabilities.
This is assuming I override the other j# variables. Otherwise we will get another error.
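The break-out can be reproduced without a browser. The sketch below uses plain Python string formatting as a stand-in for an unfiltered template variable, with the same layout as the slide. The helper js_escape_quotes is hypothetical and deliberately minimal; it only shows why escaping the quote keeps the value inside the string literal.

```python
# Stand-in for the template on the slide: the variable is dropped in raw.
TEMPLATE = "var x = '{foo};'"

foo = "'; alert(document.cookie);'"

# Raw interpolation, i.e. what an unfiltered variable produces.
rendered = TEMPLATE.format(foo=foo)
print(rendered)

def js_escape_quotes(value):
    # Illustration only: escape backslashes first, then both quote types.
    # A real filter also has to handle <, >, &, newlines and more.
    return (value.replace("\\", "\\u005C")
                 .replace("'", "\\u0027")
                 .replace('"', "\\u0022"))

# With the quotes escaped, the payload stays inside one string literal.
safe_rendered = TEMPLATE.format(foo=js_escape_quotes(foo))
print(safe_rendered)
```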
A Common XSS Issue at EB
The default encoder for core vs. Django templates is DIFFERENT.
In core, default encoder for template variables is raw.
In Django, default encoder is HTML entity encode, or | h
from ebapps.organizers.models import Organizer
...
org_desc = organizer.description ## the raw database field
p.setValue("org_desc", org_desc)
p.display()
Then in the core template:
<h2>${ org_desc }</h2>
Take this piece of code for example.
"Raw" means the template expects that either the data is already sanitized and safe for output, or another filter is applied directly in the template (| h, | escapejs, etc.).
Django templates HTML-escape by default, so if you pass in some raw, untreated HTML, you will likely get something safe to render.
This bites us when, in a core template, we assume it works like Django but it doesn't.
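A toy comparison of the two defaults, assuming nothing about the real engines: render_raw and render_autoescaped below are hypothetical stand-ins built on str.format, one passing variables through untouched and one HTML-escaping every variable first.

```python
import html

def render_raw(template, **context):
    # Stand-in for an engine whose default filter is "raw"
    return template.format(**context)

def render_autoescaped(template, **context):
    # Stand-in for an engine that HTML-escapes every variable by default
    escaped = {key: html.escape(str(val)) for key, val in context.items()}
    return template.format(**escaped)

# Pretend this came straight from the organizer description field.
org_desc = "<script>alert(1)</script>"
template = "<h2>{org_desc}</h2>"

print(render_raw(template, org_desc=org_desc))
print(render_autoescaped(template, org_desc=org_desc))
```

The same view code is safe under one default and an XSS hole under the other, which is why it is worth knowing which kind of template you are in.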
Never Trust Your Inputs!
Never trust your inputs, just like you should never trust a friend who tells you a drink is weak and has like one shot in it. Don't just assume it's good for you; vet the contents first. This is very important when it comes to templates and escaping, because we accept a lot of user-generated content and inputs, store them in our database, and then use those inputs to build the response bodies we send back to clients.
You can have a variety of injection attacks such as XSS, SQLi, command injection, XXE, and others, which I won't go into here.
Furthermore, unescaped content can leave you with ugly, non-functional pages that break product requirements (like being able to click register).
How?
This is the cocktail part of the talk where I will talk about the various methods of escaping and stripping, and when you should probably use them.
clean_html, strip_html, strip_tags
This is a common question - what are these doing and when do we use them?
clean_html is like [alcohol analogy?]
Takes HTML, removes <script> tags and other attack vectors
Closes un-closed HTML tags
Use this when you want to take the HTML data provided by the user, and display it on the page
DEMO in the app. However, clean_html is not performant. If you have an explicit whitelist of tags...
strip_html
Now uses bleach, which is more performant
Pass in a 'whitelist' of tags
All HTML is stripped except those tags
Use strip_html instead of clean_html when you want no HTML tags, or just a few
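For a feel of what allowlist-based stripping means, here is a small standard-library sketch. It is not the EB strip_html and it is not bleach; the class and function names are invented for this example. Dropping attributes and discarding the contents of script and style elements are choices made for this sketch, and real libraries may behave differently.

```python
from html import escape
from html.parser import HTMLParser

class AllowlistStripper(HTMLParser):
    """Keeps text plus an explicit allowlist of tags; drops everything else."""

    def __init__(self, allowed_tags):
        super().__init__(convert_charrefs=True)
        self.allowed = set(allowed_tags)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in self.allowed and not self.skip_depth:
            self.parts.append("<%s>" % tag)  # attributes are dropped

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.allowed and not self.skip_depth:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data))  # re-escape plain text

def strip_html_sketch(text, allowed_tags=()):
    parser = AllowlistStripper(allowed_tags)
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)

sample = "<p>Hi <b>there</b><script>alert(1)</script></p>"
print(strip_html_sketch(sample, allowed_tags=["b"]))
print(strip_html_sketch(sample))
```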
I don't actually understand strip tags vs strip html, Paul?
as_safe_markup
paul can we chat on what I should say on this? is it preferred to strip html and clean html? seems way better to me
escape_js, django.utils.html.escape
Django 'escape' - Escapes a String's HTML
< is converted to &lt;
> is converted to &gt;
' (single quote) is converted to &#39;
" (double quote) is converted to &quot;
& is converted to &amp;
Does NOT get called if the string has already been "marked safe"
More on mark safe later (what should I say about when this should be used?)
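The "does not get called if already marked safe" behaviour can be pictured with a marker type. This is only a sketch of the idea, not Django's implementation: SafeText, mark_safe_sketch, and conditional_escape_sketch are names invented here, and Python's html.escape writes the single quote as &#x27; rather than &#39;, which browsers treat the same way.

```python
import html

class SafeText(str):
    """Marker type: a string someone has vouched for as already-safe HTML."""

def mark_safe_sketch(value):
    # Does no checking at all; it only attaches the marker.
    return SafeText(value)

def conditional_escape_sketch(value):
    # Marked strings pass through untouched; everything else is escaped
    # and then marked, so a second pass will not double-encode it.
    if isinstance(value, SafeText):
        return value
    return SafeText(html.escape(str(value), quote=True))

print(conditional_escape_sketch("<b>bold?</b>"))
print(conditional_escape_sketch(mark_safe_sketch("<b>bold!</b>")))
```

The marker is a promise, not a check. Marking user input as safe skips the escaping entirely, which is exactly the risk.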
escapejs
Escapes characters for use in JavaScript strings
Example input: "testing\r\njavascript 'string\" <b>escaping</b>"
This does not make the string safe for use in HTML, but does protect you from syntax errors when using templates to generate JavaScript/JSON.
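To see roughly what this kind of escaping does, the sketch below rewrites quotes, angle brackets, a few other punctuation characters, and all control characters as \uXXXX sequences. It approximates the behaviour of the escapejs filter so it can run without Django installed; escapejs_sketch and JS_ESCAPES are names made up for this example.

```python
# Characters that can end a string literal, close a script block, or
# otherwise cause trouble, each mapped to its \uXXXX form.
JS_ESCAPES = {ord(ch): "\\u%04X" % ord(ch)
              for ch in "\\'\"<>&=-;`\u2028\u2029"}
# Every ASCII control character (code points below 32) as well.
JS_ESCAPES.update({code: "\\u%04X" % code for code in range(32)})

def escapejs_sketch(value):
    return str(value).translate(JS_ESCAPES)

value = "testing\r\njavascript 'string\" <b>escaping</b>"
print(escapejs_sketch(value))
```

The result is only meant to sit between quotes in a JavaScript string literal. It is not HTML-safe output.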
escapejs is great, but safejson is better
Outputs a Python data structure as JSON
Can pass in strings or more complex structures, and less likely to miss cases
Example
Like escapejs, this ensures that dangerous sequences are correctly escaped for use within a script block without opening a potential XSS hole.
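I can't show the internals of safejson here, so the following is a guess at the general technique rather than the real filter: serialize with the standard json module, then replace the characters that could close a script block or start an HTML entity. The name safejson_sketch is hypothetical.

```python
import json

def safejson_sketch(data):
    # json.dumps already escapes quotes, backslashes, control characters,
    # and (with the default ensure_ascii=True) all non-ASCII characters.
    encoded = json.dumps(data)
    # What remains is keeping "</script>" and "<!--" out of the output.
    return (encoded.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))

payload = {"name": "</script><script>alert(1)</script>", "tags": ["a&b"]}
out = safejson_sketch(payload)
print(out)
```

Because the replacements are valid JSON escapes, the browser's JSON parser (or a plain JavaScript object literal) gets back exactly the original data.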
| n
From what I can tell, we should try not to use this. It basically says the string is 'safe' so that we do not double-encode it, but that is dangerous if the string is actually not safe. But we need it for translations...?
| h
Unclear on what this does and if I should cover it.
So far, we reviewed...
- strip_html
- clean_html
- | n
- | h
- safejson
- escapejs
- mark_safe
- as_safe_markup
Yes and no. There are a lot of different things I could've chosen to talk about, and many other edge cases and scenarios I didn't consider.
But for now....
Have fun safely and securely embedding user generated content / json!
Questions?
Thank you to Paul, Tamara, Simon, and Darren for your knowledge and help! (I owe you all a drink.)