Best practice for preventing saving malicious client script in HTML - asp.net

We have an ASP.NET custom control that lets users enter HTML (similar to a Rich text box). We noticed that a user can potentially inject malicious client scripts within the <script> tag in the HTML view. I can validate HTML code on save to ensure that I remove any <script> elements.
Is this all I need to do? Are all other tags other than the <script> tag safe? If you were an attacker, what else would you attempt to do?
Any best practices I need to follow?
EDIT - How is the MS anti Xss library different from the native HtmlEncode for my purpose?

XSS (Cross Site Scripting) is a big a difficult subject to tackle correctly.
Instead of black-listing some tags (and missing some of the ways you may be attacked), it is better to decide on a set of tags that are OK for your site and only allowing them.
This in itself will not be enough, as you will have to catch all possible encodings an attacker might try and there are other things an attacker might try. There are anti-xss libraries that help - here is one from Microsoft.
For more information and guidance, see this OWASP article.

Have a look at this page:
http://ha.ckers.org/xss.html
to get an idea of different XSS attacks that somebody may try.

There's a whole lot to do when it comes to filtering out JavaScript from HTML. Here's a short list of some of the bigger points:
Multiple passes over the input is required to make sure that what you removed before doesn't create a new injection. If you're doing a single pass, things like <scr<script></script>ipt>alert("XSS!");</scr<script></script>ipt> will get past you since after your remove <script> tags from the string, you'll have created a new one.
Strip the use of the javascript: protocol in href and src attributes.
Strip embedded event handler attributes like onmouseover/out, onclick, onkeypress, etc.
White lists are safer than black lists. Only allow tags and attributes that you know are safe.
Make sure you're dealing with all the same character encoding. If you treat the input like ASCII (single byte) and the input has Unicode (multibyte) characters, you're going to get a nasty surprise.
Here's a more complete cheat sheet. Also, Oli linked to a good article at ha.ckers.org with samples to test your filtration.

Removing only the <script> tags will not be sufficient as there are lots of methods for encoding / hiding them in input. Most languages now have anti-xss and anti-csrf libraries and functions for filtering input. You should use one of these generally agreed upon libraries to filter your user input.
I'm not sure what the best options are in ASP.NET, but this might shed some light:
http://msdn.microsoft.com/en-us/library/ms998274.aspx

This is called a Cross Site Scripting (XSS) attack. They can be very hard to prevent, as there are a lot of surprising ways of getting JavaScript code to execute (javascript: URLs, sometimes CSS, object and iframe tags, etc).
The best approach is to whitelist tags, attributes, and types of URLs (and keep the whitelist as small as possible to do what you need) instead of blacklisting. That means that you only allow certain tags that you know are safe, rather than banning tags that you believe to be dangerous. This way, there are fewer possible ways for people to get an attack into your system, because tags that you didn't think about won't be allowed, rather than blacklisting where if you missed something, you will still have a vulnerability. Here's an example of a whitelist approach to sanitization.

Related

Custom CSS security

I'm doing work on a website, and a user can create a custom CSS stylesheet. I understand that there will always be a danger in this, but is there any way that I could make my validation more secure? I'm using this:
$customCSS = $_POST["submittedCustomCSS"]; //put user's submitted stylesheet into variable
$customCSS = htmlspecialchars($customCSS); //hopefully validate it?
file_put_contents("../custom.css", $customCSS); //save user's stylesheet
The page the custom CSS is displayed on is PHP-enabled, and the CSS is shown through <link rel="stylesheet" href="<?php echo $postID; ?>/custom.css">
Is there any way to make this more secure? Thanks in advance! :)
htmlspecialchars($customCSS); //hopefully validate it?
No, this is not sufficient. This may stop the CSS from escaping a </style> element in which it is embedded, but does nothing to prevent the CSS from styling arbitrary elements on the page, or from loading custom fonts or from abusing other problematic features of CSS whose security implications are still poorly understood.
If a custom stylesheet can be applied to any page that it's author cannot access, then you need to be significantly more strict than this. There are ways that custom stylesheets can be exploited to steal data like Credit-Card numbers or XSRF tokens that don't need to run JS.
For example, if one user can elect to use another user's custom stylesheet, then that could lead to a security vulnerability, and you should not require users to be able to read and vet a CSS file to use features of your site safely.
"Scriptless Attacks – Stealing the Pie Without Touching the Sill" explains some of the ways injected CSS can be problematic:
We show that CSS markup, which is traditionally considered to be only
used for decoration/display purposes, actually enables an
attacker to perform malicious activities.
...
We introduce several novel attacks that we call
scriptless attacks, as an attacker can obtain the credit card
number by injecting markup to this page without relying on
any kind of (JavaScript) code execution.
...
Neither of the discussed attacks depends on user interaction
on the victim’s part, but uses a mix of benign HTML, CSS
and Web Open Font Format (WOFF [23]) features combined
with a HTTP-request-based side channel to measure and ex-
filtrate almost arbitrary data displayed on the website.
Because Microsoft added CSS expressions as a proprietary extension to Internet Explorer, handling untrusted CSS securely is more complex than simply encoding it correctly. To do this properly you need to parse the CSS then only output things matching a whitelist. Unless you do this, it's trivial to inject JavaScript into the page for Internet Explorer visitors.
An alternative approach would be to only accept valid CSS, however I'd be concerned that Microsoft might try to sneak something in inside comments or something like they did with HTML.

How do I protect against XSS in CSS?

I'm looking at allowing users of our online system to control their own .css files with us by simply including them on a user by user basis.
What is the best way to avoid any potential XSS attacks in said files?
Is it possible to completely protect ourselves at all?
It would be possible for us to host the files for them and obviously then check them ourselves but it would be more convenient for our users to be able to update them as well.
The problem with allowing CSS is that many clever attacks can occur. CSS can be very dangerous indeed.
Some CSS expressions allow executing arbitrary JavaScript. This is difficult to prevent by blacklisting, so I'd suggest whitelisting.
Additionally, someone may create a CSS file that changes the page to impersonate another site, another page, or maybe it cleverly orients other elements on the page. Imagine if someone were able to position their own login form above your real one. Then they could intercept login requests. Depending on how your site is set up, this may or may not be possible; but be forewarned! Some know this as clickjacking.
Firstly, exercise caution, as others have said. Beyond that though, try and white-list the valid inputs you'd expect in the file. See if you can locate any libraries for your chosen framework (you haven't mentioned what this is), that can validate a string for CSS structure compliance.
The other thing you might to consider is parameterising certain CSS attributes and allowing users to configure them (i.e. color, font etc). This would significantly mitigate your risk as it takes out the ability to arbitrarily create your own malicious CSS (and conversely, create your own innocent CSS!)
As for your original question "Is it possible to completely protect ourselves at all?", that's easy - no!

What is the benefit of writing meaningful css .class and #id names?

What is the benefit of writing meaningful css .class and #id names? Do screen readers speak to help the user understand the meaning and purpose of content inside the tags?
Generally-speaking, it's beneficial for the developer/designer only.
Again, as all your recent questions on semantics, the answer stays the same:
It all depends on the data-context of the entity in question.
If your element holds a meaningful field, it is useful to assign it a class (even if you do not want to apply CSS to it) just to easily define that particular field:
<span class="username">Andrew Moore</span>
Doing so has the following advantages:
It easily identifies the field's content in your code.
It increases maintainability.
It helps parsers and third-party applications to fetch this field's value.
Microformats are just a larger example of this. Simply put, they are a set of pre-defined elements and attributes that hold a particular set of data, meant to ease parsing by third-party tools.
Other answers are good, but I will focus on the scraping/third party tools aspect here.
Case 1 is spiders and crawling like search engines. If they parse your page and see something like id="username", they will be more likely to figure out some meaning in that than id="div-style-32". Granted, I'm not sure Google is doing this sort of thing now, but it could be if more people were better about it.
Case 2 is people writing scripts to pull down the HTML and process it in order to extract its content as data. Pretty much anyone who wants to do this can with any markup, its just a matter of how annoying it is. Cleaner and more well described markup allows the scraper script to more easily find the information it needs due to it's increased semantics.
This also includes things like browser extensions or Greasemonkey scripts that allow users to alter the behavior of the site. It will be easier to create these modifications with cleaner markup.
But if you don't want people scraping or modifying your site with client side extension, there is little you can do about from a technical standpoint. You can't stop it, you can only make it more of a pain in the ass. And the benefits of maintainability for the site developers are huge. So really, why not?
In short it makes all the different things you or others could do with your site easier to do.
You don't do it for the machines but for the humans.
If we only cared about machines we'd still be coding in assembly :)

HTMLEncode script tags only

I'm working on StackQL.net, which is just a simple web site that allows you to run ad hoc tsql queries on the StackOverflow public dataset. It's ugly (I'm not a graphic designer), but it works.
One of the choices I made is that I do not want to html encode the entire contents of post bodies. This way, you see some of the formatting from the posts in your queries. It will even load images, and I'm okay with that.
But I am concerned that this will also leave <script> tags active. Someone could plant a malicious script in a stackoverflow answer; they could even immediately delete it, so no one sees it. One of the most common queries people try when they first visit is a simple Select * from posts, so with a little bit of timing a script like this could end up running in several people's browsers. I want to make sure this isn't a concern before I update to the (hopefully soon-to-be-released) October data export.
What is the best, safest way to make sure just script tags end up encoded?
You may want to modify the HTMLSanatize script to fit your purposes. It was written by Jeff Atwood to allow certain kinds of HTML to be shown. Since it was written for Stack Overflow, it'd fit your purpose as well.
I don't know whether it's 'up to date' with what Jeff currently has deployed, but it's a good starting point.
Don't forget onclick, onmouseover, etc or javascript: psuedo-urls (<img src="javascript:evil!Evil!">) or CSS (style="property: expression(evil!Evil!);") or…
There are a host of attack vectors beyond simple script elements.
Implement a white list, not a black list.
If the messages are in XHTML format then you could do an XSL transform and encode/strip tags and properties that you don't want. It gets a little easier if you use something like TinyMCE or CKEditor to provide a wysiwyg editor that outputs XHTML.
What about simply breaking the <script> tags? Escaping only < and > for that tag, ending up with <script>, could be one simple and easy way.
Of course links are another vector. You should also disable every instance of href='javascript:', and every attribute starting with on*.
Just to be sure, nuke it from orbit.
But I am concerned that this will also leave <script tags active.
Oh, that's just the beginning of HTML ‘malicious content’ that can cause cross-site scripting. There's also event handlers; inline, embedded and linked CSS (expressions, behaviors, bindings), Flash and other embeddable plugins, iframes to exploit sites, javascript: and other dangerous schemes (there are more than you think!) in every place that can accept a URL, meta-refresh, UTF-8 overlongs, UTF-7 mis-sniffing, data binding, VML and other non-HTML stuff, broken markup parsed as scripts by permissive browsers...
In short any quick-fix attempt to sanitise HTML with a simple regex will fail badly.
Either escape everything so that any HTML is displayed as plain text, or use a full parser-and-whitelist-based sanitiser. (And keep it up-to-date, because even that's a hard job and there are often newly-discovered holes in them.)
But aren't you using the same Markdown system as SO itself to render posts? That would be the obvious thing to do. I can't guarantee there are no holes in Markdown that would allow cross-site scripting (there certainly have been in the past and there are probably some more obscure ones still in there as it's quite a complicated system). But at least you'd be no more insecure than SO is!
Use a Regex to replace the script tags with the encoded tags. This will filter the tags which has the word "script" in it and HtmlEncode it. Thus, all the script tags such as <script>, </script> and <script type="text/javascript"> etc. will get encoded and will not encode other tags in the string.
Regex.Replace(text, #"</?(\w+)[^>]*>",
tag => tag.Groups[1].Value.ToLower().Contains("script") ? HttpUtility.HtmlEncode(tag.Value) : tag.Value,
RegexOptions.Singleline);

Browser WYSIWYG best practices

I am using a rich text editor on a web page. .NET has feature that prevent one from posting HTML tags, so I added a JavaScript snippet to change the angle brackets to and alias pair of characters before the post. The alias is replaced on the server with the necessary angle bracket and then stored in the database. With XSS aside, what are common ways of fixing this problem. (i.e. Is there a better way?)
If you have comments on XSS(cross-site scripting), I'm sure that will help someone.
There's actually a way to turn that "feature" off. This will allow the user to post whichever characters they want, and there will be no need to convert characters to an alias using Javascript. See this article for disabling request validation. It means that you'll have to do your own validation, but from the sounds of your post, it seems that is what you are looking to do anyway. You can also disable it per page by following the instructions here.
I think the safest way to go is to NOT allow the user to create tags with your WISYWIG. Maybe using something like a markdown editor like on this site or available here. would be another approach.
Also keep the Page directive ValidateRequest=true which should stop markup from being sent in the request, you'll of course need to handle this error when it comes up. People will always be able to inject tags into the request either way using firefox extensions like Tamper data, but the ValidateRequest=true should at least stop ASP.NET from accepting them.
A straight forward post on XSS attacks was recently made by Jeff here. It also speaks to making your cookies HttpOnly, which is a semi-defense against cookie theft. Good luck!
My first comment would be to avoid using JavaScript to change the angle brackets. Bypassing this is as simple as disabling JavaScript in the browser. Almost all server-side languages have some utility method that converts some HTML characters into their entity counterparts. For instance, PHP uses htmlentities(), and I am sure .NET has an equivalent utility method. In the least, you can do a regex replace for angle brackets, parenthesis and double quotes, and that will get you a long way toward a secure solution.

Resources