Is HttpUtility.HtmlEncode safe? - asp.net

I want the user to enter text, and I would like to show that text back to the user while keeping all the whitespace. I don't want any exploits where the user injects HTML or JavaScript. Is HttpUtility.HtmlEncode safe enough to use? At the moment it looks correct, since it properly encodes < > and the other test characters. What should I use to display the text back correctly? Right now I am using <pre><code>. It looks all right; is this the correct way to display it?

HtmlEncode should be secure against injected HTML or JavaScript. Any HTML markup characters are encoded so that they display as literal characters on the web page instead of being interpreted as markup.
Yes, if I wanted to keep formatting (including all spaces), I would use <pre>.
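To illustrate the encode-then-<pre> idea, here is a minimal sketch in Python rather than ASP.NET. It assumes html.escape as a stand-in for HttpUtility.HtmlEncode, and render_preformatted is a made-up helper name, not part of any framework.

```python
import html


def render_preformatted(user_text: str) -> str:
    """Encode user text, then wrap it in <pre> so whitespace is preserved."""
    # html.escape encodes &, < and >, and (by default) both quote characters,
    # so the browser shows the text literally instead of parsing it as markup.
    return "<pre>" + html.escape(user_text) + "</pre>"


print(render_preformatted('<script>alert("x")</script>\n    indented line'))
```

The encoding step is what makes the text inert; <pre> only controls how whitespace is displayed.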

You'll want to have a look at the GetSafeHtmlFragment method in the AntiXSS section of the Web Protection Library. This uses a whitelist of what HTML is considered 'safe' for XSS purposes; anything not in the whitelist is stripped out. Blowdart (who works on the WPL team) has a great blog post on using the method.

Related

Storing HTML in db while avoiding persistent xss/sql injection

I'm building a page in asp.net that will use tiny mce to provide a rich text editor on the page. Tiny mce outputs the rich text as html which I would like to save to a database. Then at a later date, I want to pull the HTML from the database and display it in a page.
I'm concerned about allowing malicious html, js tags into my database that would later be output.
Can someone walk me through at what point in my process I should html encode/decode etc. to prevent a persistent xss attack and or sql injection attack?
We use the Microsoft Web Protection Library to scrape out any potentially dangerous HTML on the way in. What I mean by "on the way in" - when the page is posted to the server, we scrub the HTML using MS WPL and take the results of that and throw that into the database. Don't even let any bad data get to your database, and you'll be safer for it. As far as encoding, you won't want to mess with HTML encoding/decoding - just take whatever is in your tinyMCE control, scrub it, and save it. Then on your display page, just write it out like it exists in your database into a literal control or something like that, and you should be good.
I believe Microsoft.Security.Application.Sanitizer.GetSafeHtmlFragment(input) will do exactly what you want here.
Are these admins that are using the RTE? If so, I wouldn't worry about it.
If not, then I don't recommend using a WYSIWYG editor such as TinyMCE. You'll have to actually look for malicious input, and chances are you will miss some. Since the RTE outputs plain HTML, which I assume you want, you can't just convert HTML entities. That would kind of eliminate the whole point of using TinyMCE.
Stopping SQL injection is done in the backend when inserting the data into the database. You will want to use a parametrized query or escape the input (not sure how in ASP.NET, I'm a PHP guy.)
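For what a parametrized query looks like in practice, here is a small sketch using Python's built-in sqlite3 module instead of ASP.NET. The posts table and the save_post helper are invented for the example; the point is only that the value travels separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")


def save_post(body: str) -> int:
    # The ? placeholder passes the value separately from the SQL text,
    # so quotes inside body cannot change the statement.
    cur = conn.execute("INSERT INTO posts (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid


post_id = save_post("x'); DROP TABLE posts; --")
stored = conn.execute(
    "SELECT body FROM posts WHERE id = ?", (post_id,)
).fetchone()[0]
print(stored)
```

Note that this only addresses SQL injection. The stored HTML still has to be sanitized or encoded before it is written back to a page.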
Couldn't you use a rich text editor that uses BBCode and on the server, escape everything that needs to be escaped and convert BBCode to HTML markup afterwards?
You could also, instead of producing BBCode on the client, convert the HTML markup to BBCode on the server, escape the remaining HTML and convert the result from BBCode back to HTML.
There are two approaches, you will probably use the first one
1) You make a list of permitted tags and escape or strip the rest. TinyMCE probably has some feature to stop the user from using certain tags (but this is only client side; you should validate on the server as well).
2) You encode permitted tags differently ([b]bold[/b]). Then you can save everything to the DB and, while rendering, escape everything and then interpret your special tags.
Third approach: if the user is an admin (someone who should know what he is doing), then you can leave everything unescaped. He is responsible for his own mistakes.
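The second approach (escape everything, then interpret your own special tags) can be sketched roughly like this in Python. The ALLOWED mapping and the bbcode_to_html name are made up for illustration, and only two tags are handled.

```python
import html
import re

# Hypothetical whitelist: BBCode tag -> HTML tag it is rendered as.
ALLOWED = {"b": "strong", "i": "em"}


def bbcode_to_html(text: str) -> str:
    # Escape first, so any raw HTML the user typed becomes inert text.
    out = html.escape(text)
    for bb, tag in ALLOWED.items():
        # Only matched [tag]...[/tag] pairs are converted; everything else
        # stays as plain (already escaped) text.
        pattern = re.compile(r"\[%s\](.*?)\[/%s\]" % (bb, bb), re.DOTALL)
        out = pattern.sub(r"<%s>\1</%s>" % (tag, tag), out)
    return out


print(bbcode_to_html("[b]hi[/b] <script>alert(1)</script>"))
```

The order matters: because escaping happens before the special tags are expanded, the only angle brackets in the output are the ones the converter itself wrote.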

Best practice for preventing saving malicious client script in HTML

We have an ASP.NET custom control that lets users enter HTML (similar to a Rich text box). We noticed that a user can potentially inject malicious client scripts within the <script> tag in the HTML view. I can validate HTML code on save to ensure that I remove any <script> elements.
Is this all I need to do? Are all other tags other than the <script> tag safe? If you were an attacker, what else would you attempt to do?
Any best practices I need to follow?
EDIT - How is the MS anti Xss library different from the native HtmlEncode for my purpose?
XSS (Cross Site Scripting) is a big and difficult subject to tackle correctly.
Instead of black-listing some tags (and missing some of the ways you may be attacked), it is better to decide on a set of tags that are OK for your site and only allowing them.
This in itself will not be enough, as you will have to catch all possible encodings an attacker might try and there are other things an attacker might try. There are anti-xss libraries that help - here is one from Microsoft.
For more information and guidance, see this OWASP article.
Have a look at this page:
http://ha.ckers.org/xss.html
to get an idea of different XSS attacks that somebody may try.
There's a whole lot to do when it comes to filtering out JavaScript from HTML. Here's a short list of some of the bigger points:
Multiple passes over the input are required to make sure that what you removed before doesn't create a new injection. If you're doing a single pass, things like <scr<script></script>ipt>alert("XSS!");</scr<script></script>ipt> will get past you, since after you remove the <script> tags from the string, you'll have created a new one.
Strip the use of the javascript: protocol in href and src attributes.
Strip embedded event handler attributes like onmouseover/out, onclick, onkeypress, etc.
White lists are safer than black lists. Only allow tags and attributes that you know are safe.
Make sure you're dealing with all the same character encoding. If you treat the input like ASCII (single byte) and the input has Unicode (multibyte) characters, you're going to get a nasty surprise.
Here's a more complete cheat sheet. Also, Oli linked to a good article at ha.ckers.org with samples to test your filtration.
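To make the multiple-pass point concrete, here is a small Python sketch. The regex and the function names are my own, and this is a demonstration of the failure mode only; blacklist stripping like this is still not a safe sanitizer.

```python
import re

# Naive blacklist pattern: matches opening and closing script tags only.
SCRIPT_TAG = re.compile(r"</?script[^>]*>", re.IGNORECASE)


def strip_once(text: str) -> str:
    """Single pass: removing tags can splice a new tag together."""
    return SCRIPT_TAG.sub("", text)


def strip_until_stable(text: str) -> str:
    """Repeat the pass until the output stops changing."""
    while True:
        cleaned = strip_once(text)
        if cleaned == text:
            return cleaned
        text = cleaned


payload = '<scr<script></script>ipt>alert("XSS!");</scr<script></script>ipt>'
print(strip_once(payload))          # a working <script> element reappears
print(strip_until_stable(payload))  # only the inert text remains
```

Even the repeated version does nothing about event handler attributes or javascript: URLs, which is why the whitelist advice above still applies.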
Removing only the <script> tags will not be sufficient as there are lots of methods for encoding / hiding them in input. Most languages now have anti-xss and anti-csrf libraries and functions for filtering input. You should use one of these generally agreed upon libraries to filter your user input.
I'm not sure what the best options are in ASP.NET, but this might shed some light:
http://msdn.microsoft.com/en-us/library/ms998274.aspx
This is called a Cross Site Scripting (XSS) attack. They can be very hard to prevent, as there are a lot of surprising ways of getting JavaScript code to execute (javascript: URLs, sometimes CSS, object and iframe tags, etc).
The best approach is to whitelist tags, attributes, and types of URLs (and keep the whitelist as small as possible to do what you need) instead of blacklisting. That means that you only allow certain tags that you know are safe, rather than banning tags that you believe to be dangerous. This way, there are fewer possible ways for people to get an attack into your system, because tags that you didn't think about won't be allowed, rather than blacklisting where if you missed something, you will still have a vulnerability. Here's an example of a whitelist approach to sanitization.
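As a rough idea of what whitelisting tags, attributes, and URL types involves, here is a sketch built on Python's standard html.parser. The allowed sets are arbitrary examples and the class and function names are invented; for production use, prefer a maintained sanitizer library over something hand-rolled like this.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Example whitelist only; keep it as small as your site allows.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = {"http", "https"}


def is_safe_url(value: str) -> bool:
    # Strict on purpose: relative URLs (empty scheme) are rejected too.
    try:
        return urlsplit(value.strip()).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:
        return False


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, and anything not listed
            if name == "href" and not is_safe_url(value):
                continue  # drops javascript: and other schemes
            kept.append(' %s="%s"' % (name, escape(value)))
        self.parts.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is always re-encoded, including text from dropped elements.
        self.parts.append(escape(data))


def sanitize(fragment: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


print(sanitize('<p onclick="x()">hi <a href="javascript:alert(1)">link</a></p>'))
```

This sketch does not balance unclosed tags, and it keeps the text inside dropped elements as encoded text rather than deleting it.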

What's the best way to remove (or ignore) script and form tags in HTML?

I have text stored in SQL as HTML. I'm not guaranteed that this data is well-formed, as users can copy/paste from anywhere into the editor control I'm using, or manually edit the HTML that's generated.
The question is: what's the best way of going about removing or somehow ignoring <script/> and <form/> tags so that, when the user's text is displayed elsewhere in the Web Application, it doesn't disrupt the normal operation of the containing page.
I've toyed with the idea of simply doing a "Find and Replace" for <script>/<form> with <div> (obviously taking into account whitespace and closing tags, if they exist). I'm also open to any way to somehow "ignore" certain tags. For all I know, there could be some built-in way of saying (in HTML, CSS, or JavaScript) "for all elements in <div id="MyContent">, treat <form> and <script> as <div>".
Any help or advice would be greatly appreciated!
In terms of sanitising user input, form and script tags are not the only ones that should be cleaned up.
The best way of doing this job depends a little on what tools you are using. Have a look at these questions:
What’s the best method for sanitizing user input with PHP?
Sanitising user input using Python
It depends on which language you're using. In general, I'd recommend using an HTML parser, constructing a small DOM from the snippet, then nuking unwanted elements. There are many good HTML parsers, especially ones designed to handle real-world, messy HTML. Examples include BeautifulSoup (Python), HTMLParser (Java)... And, since the answer came in while I was typing, what Colin said!
Don't try and do it yourself - there are far too many tricks for getting bits of script and general nastiness into a page. Use the Microsoft AntiXSS library - version 3.1 has HTML sanitation built in. You probably want the GetSafeHtmlFragment method, which returns a sanitised chunk of HTML. See my previous answer.
Since you're using .Net I would recommend HtmlAgilityPack as it is easy to work with and works well with malformed HTML.
Though the suggested answers were acceptable, I ended up using a good old regular expression to replace opening and closing <script> and <form> tags with <div>s.
// Note: this pattern strips every tag, not only <script> and <form>.
txtStore.Text = Regex.Replace(txtStore.Text, "<.*?>", string.Empty);
I faced the same problem before, but my scenario was somewhat different. I was adding content to the page with an ajax request. The content coming back in the ajax response was HTML, and it also included script tags. I just wanted the HTML without any script, so I removed all script tags from the ajax response with jQuery.
jquery-remove-script-tags-from-string

Preventing XSS (Cross-site Scripting)

Let's say I have a simple ASP.NET MVC blog application and I want to allow readers to add comments to a blog post. If I want to prevent any type of XSS shenanigans, I could HTML encode all comments so that they become harmless when rendered. However, what if I wanted to allow some basic functionality like hyperlinks, bolding, italics, etc?
I know that StackOverflow uses the WMD Markdown Editor, which seems like a great choice for what I'm trying to accomplish, if not for the fact that it supports both HTML and Markdown which leaves it open to XSS attacks.
If you are not looking to use an editor you might consider OWASP's AntiSamy.
You can run an example here:
http://www.antisamy.net/
How much HTML are you going to support? Just bold/italics/the basic stuff? In that case, you can convert those to markdown syntax and then strip the rest of the HTML.
The stripping needs to be done server side, before you store it. You need to validate the input on the server as well, when checking for SQL-vulnerabilities and other unwanted stuff.
If you need to do it in the browser: http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer
I'd suggest you only submit the markdown syntax. On the front end, the client can type markdown and have an HTML preview (same as SO), but only submit the markdown syntax server-side. Then you can validate it, generate the HTML, escape it and store it.
I believe that's the way most of us do it. In either case, markdown is there to alleviate anyone from writing structured HTML code and give power to those who wouldn't even know how to.
If there's something specific you'd like to do with the HTML, then you can tweak it with some CSS inheritance '.comment a { color: #F0F; }', front end JS or just traverse over the generated HTML from parsing markdown before you store it.
Why don't you use Jeff's code ? http://refactormycode.com/codes/333-sanitize-html
I'd vote for the FCKEditor but you have to do some extra steps to the returned output too.
You could use an HTML whitelist so that certain tags can still be used, but everything else is blocked.
There are tools that can do this for you. SO uses the code that Slough linked.

Browser WYSIWYG best practices

I am using a rich text editor on a web page. .NET has a feature that prevents one from posting HTML tags, so I added a JavaScript snippet to change the angle brackets to an alias pair of characters before the post. The alias is replaced on the server with the necessary angle bracket and then stored in the database. With XSS aside, what are common ways of fixing this problem? (i.e. Is there a better way?)
If you have comments on XSS(cross-site scripting), I'm sure that will help someone.
There's actually a way to turn that "feature" off. This will allow the user to post whichever characters they want, and there will be no need to convert characters to an alias using Javascript. See this article for disabling request validation. It means that you'll have to do your own validation, but from the sounds of your post, it seems that is what you are looking to do anyway. You can also disable it per page by following the instructions here.
I think the safest way to go is to NOT allow the user to create tags with your WYSIWYG editor. Using something like a markdown editor, like the one on this site or the one available here, would be another approach.
Also keep the Page directive ValidateRequest=true which should stop markup from being sent in the request, you'll of course need to handle this error when it comes up. People will always be able to inject tags into the request either way using firefox extensions like Tamper data, but the ValidateRequest=true should at least stop ASP.NET from accepting them.
A straight forward post on XSS attacks was recently made by Jeff here. It also speaks to making your cookies HttpOnly, which is a semi-defense against cookie theft. Good luck!
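For reference, marking a cookie HttpOnly just means adding a flag to the Set-Cookie header. Here is a minimal sketch with Python's standard http.cookies module; the cookie name and value are placeholders, and in ASP.NET you would set the equivalent property on the cookie instead.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session_id"] = "abc123"          # placeholder name and value
cookie["session_id"]["httponly"] = True  # hide the cookie from document.cookie
cookie["session_id"]["secure"] = True    # only send it over HTTPS

# The text that would follow "Set-Cookie: " in the response header.
header_value = cookie["session_id"].OutputString()
print(header_value)
```

As the post says, this is only a semi-defense: it keeps injected script from reading the cookie, but it does not stop the script from running.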
My first comment would be to avoid using JavaScript to change the angle brackets. Bypassing this is as simple as disabling JavaScript in the browser. Almost all server-side languages have some utility method that converts HTML characters into their entity counterparts. For instance, PHP uses htmlentities(), and I am sure .NET has an equivalent utility method. At the least, you can do a regex replace for angle brackets, parentheses and double quotes, and that will get you a long way toward a secure solution.

Resources