Browser WYSIWYG best practices - asp.net

I am using a rich text editor on a web page. .NET has a feature that prevents one from posting HTML tags, so I added a JavaScript snippet to change the angle brackets to an alias pair of characters before the post. The alias is replaced on the server with the necessary angle brackets and then stored in the database. XSS aside, what are the common ways of fixing this problem? (i.e. is there a better way?)
If you have comments on XSS (cross-site scripting), I'm sure that will help someone.

There's actually a way to turn that "feature" off. This will allow users to post whichever characters they want, and there will be no need to convert characters to an alias using JavaScript. See this article for disabling request validation. It means that you'll have to do your own validation, but from the sound of your post, that seems to be what you're looking to do anyway. You can also disable it per page by following the instructions here.
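For reference, a rough sketch of how the switch is usually flipped (exact syntax can vary by framework version): per page via the @ Page directive, or site-wide in web.config. On .NET 4.0 and later, the page-level setting only takes effect when requestValidationMode is set back to 2.0.

<%@ Page Language="C#" ValidateRequest="false" %>

<!-- web.config: disable request validation site-wide (you then own all input validation) -->
<configuration>
  <system.web>
    <pages validateRequest="false" />
    <!-- needed on .NET 4.0+ for the page-level setting to be honored -->
    <httpRuntime requestValidationMode="2.0" />
  </system.web>
</configuration>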

I think the safest way to go is to NOT allow the user to create tags with your WYSIWYG. Using something like a Markdown editor, like the one on this site or the one available here, would be another approach.
Also, keep the Page directive ValidateRequest=true, which should stop markup from being sent in the request; you'll of course need to handle the resulting error when it comes up. People will always be able to inject tags into the request either way using Firefox extensions like Tamper Data, but ValidateRequest=true should at least stop ASP.NET from accepting them.
A straightforward post on XSS attacks was recently made by Jeff here. It also covers making your cookies HttpOnly, which is a partial defense against cookie theft. Good luck!
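As a rough sketch (assuming classic ASP.NET / System.Web; the cookie name and value below are placeholders), marking a cookie HttpOnly looks like this:

// Inside a page or handler: client-side script can no longer read this cookie,
// which blunts (but does not eliminate) cookie theft via XSS.
var authCookie = new HttpCookie("AuthToken", tokenValue) { HttpOnly = true };
Response.Cookies.Add(authCookie);

You can also force this site-wide in web.config with <httpCookies httpOnlyCookies="true" />.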

My first comment would be to avoid using JavaScript to change the angle brackets; bypassing this is as simple as disabling JavaScript in the browser. Almost all server-side languages have a utility method that converts HTML characters into their entity counterparts. For instance, PHP has htmlentities(), and .NET has an equivalent utility method. At the very least, you can do a regex replace for angle brackets, parentheses, and double quotes, and that will get you a long way toward a secure solution.
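For what it's worth, the .NET counterpart to PHP's htmlentities() is HttpUtility.HtmlEncode (or Server.HtmlEncode inside a page), which converts the markup-significant characters to entities on the server:

using System.Web;

// userInput is whatever was posted; the encoded result renders as literal text.
string safe = HttpUtility.HtmlEncode(userInput);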

Related

How do I create an HTML link that has a link name the same as the URL address?

Is this the easiest way in an HTML doc to create a link to a page that has the same name as the URL?
So basically it will say:
Please click the following link:
http://test.com.
That is all I want it to say.
The code I wrote for this is as follows:
<a href="http://test.com">http://test.com</a>
Or is there a more all-inclusive way where you don't have to write the URL twice?
Obviously my code doesn't include the initial text; this is just for example purposes.
Unless you want to copy the URL from one place to another using JavaScript, you will have to write the URL twice.
I advise against the JavaScript copying, because its performance and SEO costs are much worse than the cost of typing everything twice.
What you have got now is the easiest way.
If that's not an option for some reason, you can use server-side scripting to search the page content for URLs and wrap an <a> tag around them.
This will require some very complicated regex. Daring Fireball has a very good blog post instructing you how to do this, and explaining exactly why it's actually impossible for this to be perfectly reliable (which is probably why HTML doesn't allow it):
http://daringfireball.net/2010/07/improved_regex_for_matching_urls
I've done this sort of thing before (with emails actually) and it's very difficult and took years to get right. If at all possible, you should just do what you're already doing - manually type in the <a> tag yourself.
Alternatively, you could use something like Smarty (for PHP; I don't know what the ASP.NET equivalent would be) to write something along the lines of the following, to programmatically generate the full <a> tag:
{link url='http://example.com'}
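In ASP.NET you could do the same with a small helper; AutoLink below is a hypothetical method (not a built-in), shown only to illustrate writing the URL once and using it as both the href and the link text:

using System.Web;

// Hypothetical helper: the URL becomes both the href and the visible text.
static string AutoLink(string url)
{
    var encoded = HttpUtility.HtmlEncode(url);
    return string.Format("<a href=\"{0}\">{1}</a>", encoded, encoded);
}

Calling AutoLink("http://test.com") then produces the anchor from the question.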
Why don't we just sidestep the issue by making our links more semantically-rich?
Instead of:
For more information on our delicious pizza, visit www.pizzasrawesome.com.
Use this:
Read more about our delicious pizza.

Is HttpUtility.HtmlEncode safe?

I want the user to enter text, and I would like to show the text back to the user and keep all the whitespace. I don't want any exploits where the user injects HTML or JavaScript. Is HttpUtility.HtmlEncode safe enough to use? At the moment it looks correct, since it's properly encoding < > and other test characters. To display the text back correctly, what do I use? Right now I am using <pre><code>. It looks alright; is this the correct way to display it?
HtmlEncode should be secure as far as any HTML code or JavaScript is concerned. Any HTML markup characters will be encoded so that they appear only as literal characters when displayed on a web page.
Yes, if I wanted to keep formatting (including all spaces), I would use <pre>.
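For instance, a minimal sketch assuming a Web Forms page with a Literal control named output (a placeholder name): encode first, then wrap in <pre><code> so whitespace and line breaks survive exactly as typed.

string safe = HttpUtility.HtmlEncode(userText);
output.Text = "<pre><code>" + safe + "</code></pre>";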
You'll want to have a look at the GetSafeHTMLFragment method in the AntiXSS section of the Web Protection Library. This uses a whitelist of what HTML is considered 'safe' for XSS purposes, anything not in the whitelist is stripped out. Blowdart (who works on the WPL team) has a great blogpost on using the method.
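Typical usage looks roughly like this; note that depending on the library version the method hangs off the AntiXss class or the Sanitizer class, so check which one your version exposes:

using Microsoft.Security.Application;

// Anything outside the library's whitelist is stripped from the fragment.
string safeFragment = Sanitizer.GetSafeHtmlFragment(untrustedHtml);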

Best practice for preventing saving malicious client script in HTML

We have an ASP.NET custom control that lets users enter HTML (similar to a Rich text box). We noticed that a user can potentially inject malicious client scripts within the <script> tag in the HTML view. I can validate HTML code on save to ensure that I remove any <script> elements.
Is this all I need to do? Are all other tags other than the <script> tag safe? If you were an attacker, what else would you attempt to do?
Any best practices I need to follow?
EDIT - How is the MS Anti-XSS library different from the native HtmlEncode for my purpose?
XSS (cross-site scripting) is a big and difficult subject to tackle correctly.
Instead of blacklisting some tags (and missing some of the ways you may be attacked), it is better to decide on a set of tags that are OK for your site and allow only those.
This in itself will not be enough, as you will have to catch all the possible encodings an attacker might try, and there are other things an attacker might attempt as well. There are anti-XSS libraries that help - here is one from Microsoft.
For more information and guidance, see this OWASP article.
Have a look at this page:
http://ha.ckers.org/xss.html
to get an idea of different XSS attacks that somebody may try.
There's a whole lot to do when it comes to filtering out JavaScript from HTML. Here's a short list of some of the bigger points:
Multiple passes over the input are required to make sure that what you removed before doesn't create a new injection. If you're doing a single pass, things like <scr<script></script>ipt>alert("XSS!");</scr<script></script>ipt> will get past you, since after you remove the <script> tags from the string, you'll have created a new one (see the sketch after this list).
Strip the use of the javascript: protocol in href and src attributes.
Strip embedded event handler attributes like onmouseover/out, onclick, onkeypress, etc.
White lists are safer than black lists. Only allow tags and attributes that you know are safe.
Make sure you're dealing with all the same character encoding. If you treat the input like ASCII (single byte) and the input has Unicode (multibyte) characters, you're going to get a nasty surprise.
Here's a more complete cheat sheet. Also, Oli linked to a good article at ha.ckers.org with samples to test your filtration.
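To make the first point concrete, here's a minimal sketch (illustration only; a whitelist-based parser is still the better tool) that keeps stripping until the input stops changing, so removing one tag cannot splice the remaining text into a brand-new tag:

using System.Text.RegularExpressions;

static string StripScriptTags(string input)
{
    string previous;
    do
    {
        previous = input;
        input = Regex.Replace(input, @"</?\s*script[^>]*>", string.Empty, RegexOptions.IgnoreCase);
    } while (input != previous);   // repeat until nothing more is removed
    return input;
}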
Removing only the <script> tags will not be sufficient, as there are lots of ways of encoding or hiding them in input. Most languages now have anti-XSS and anti-CSRF libraries and functions for filtering input. You should use one of these generally accepted libraries to filter your user input.
I'm not sure what the best options are in ASP.NET, but this might shed some light:
http://msdn.microsoft.com/en-us/library/ms998274.aspx
This is called a Cross Site Scripting (XSS) attack. They can be very hard to prevent, as there are a lot of surprising ways of getting JavaScript code to execute (javascript: URLs, sometimes CSS, object and iframe tags, etc).
The best approach is to whitelist tags, attributes, and types of URLs (and keep the whitelist as small as possible to do what you need) instead of blacklisting. That means you only allow certain tags that you know are safe, rather than banning tags that you believe to be dangerous. This way there are fewer ways for people to get an attack into your system, because tags you didn't think about simply won't be allowed; with a blacklist, if you miss something, you still have a vulnerability. Here's an example of a whitelist approach to sanitization.
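As a rough sketch of the whitelist idea (assuming an HTML parser such as HtmlAgilityPack is available; the allowed-tag list here is purely illustrative, not a recommendation):

using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static string WhitelistTags(string html)
{
    var allowed = new HashSet<string> { "p", "b", "i", "em", "strong", "ul", "ol", "li", "br" };
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    foreach (var node in doc.DocumentNode.Descendants().ToList())
    {
        if (node.NodeType != HtmlNodeType.Element)
            continue;
        if (!allowed.Contains(node.Name))
            node.Remove();                 // drop the element and everything inside it
        else
            node.Attributes.RemoveAll();   // strip attributes such as onclick or style
    }
    return doc.DocumentNode.OuterHtml;
}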

HTMLEncode script tags only

I'm working on StackQL.net, which is just a simple web site that allows you to run ad hoc tsql queries on the StackOverflow public dataset. It's ugly (I'm not a graphic designer), but it works.
One of the choices I made is that I do not want to html encode the entire contents of post bodies. This way, you see some of the formatting from the posts in your queries. It will even load images, and I'm okay with that.
But I am concerned that this will also leave <script> tags active. Someone could plant a malicious script in a stackoverflow answer; they could even immediately delete it, so no one sees it. One of the most common queries people try when they first visit is a simple Select * from posts, so with a little bit of timing a script like this could end up running in several people's browsers. I want to make sure this isn't a concern before I update to the (hopefully soon-to-be-released) October data export.
What is the best, safest way to make sure just script tags end up encoded?
You may want to modify the HTMLSanatize script to fit your purposes. It was written by Jeff Atwood to allow certain kinds of HTML to be shown. Since it was written for Stack Overflow, it'd fit your purpose as well.
I don't know whether it's 'up to date' with what Jeff currently has deployed, but it's a good starting point.
Don't forget onclick, onmouseover, etc., or javascript: pseudo-URLs (<img src="javascript:evil!Evil!">) or CSS (style="property: expression(evil!Evil!);") or…
There are a host of attack vectors beyond simple script elements.
Implement a white list, not a black list.
If the messages are in XHTML format then you could do an XSL transform and encode/strip tags and properties that you don't want. It gets a little easier if you use something like TinyMCE or CKEditor to provide a wysiwyg editor that outputs XHTML.
What about simply breaking the <script> tags? Escaping only the < and > of that tag, ending up with &lt;script&gt;, could be one simple and easy way.
Of course links are another vector. You should also disable every instance of href='javascript:', and every attribute starting with on*.
Just to be sure, nuke it from orbit.
But I am concerned that this will also leave <script> tags active.
Oh, that's just the beginning of HTML ‘malicious content’ that can cause cross-site scripting. There's also event handlers; inline, embedded and linked CSS (expressions, behaviors, bindings), Flash and other embeddable plugins, iframes to exploit sites, javascript: and other dangerous schemes (there are more than you think!) in every place that can accept a URL, meta-refresh, UTF-8 overlongs, UTF-7 mis-sniffing, data binding, VML and other non-HTML stuff, broken markup parsed as scripts by permissive browsers...
In short any quick-fix attempt to sanitise HTML with a simple regex will fail badly.
Either escape everything so that any HTML is displayed as plain text, or use a full parser-and-whitelist-based sanitiser. (And keep it up-to-date, because even that's a hard job and there are often newly-discovered holes in them.)
But aren't you using the same Markdown system as SO itself to render posts? That would be the obvious thing to do. I can't guarantee there are no holes in Markdown that would allow cross-site scripting (there certainly have been in the past and there are probably some more obscure ones still in there as it's quite a complicated system). But at least you'd be no more insecure than SO is!
Use a regex to replace the script tags with encoded tags. This filters the tags that have the word "script" in them and HtmlEncodes them. Thus, all the script tags such as <script>, </script> and <script type="text/javascript"> etc. will get encoded, and other tags in the string will be left untouched.
string encoded = Regex.Replace(text, @"</?(\w+)[^>]*>",
    tag => tag.Groups[1].Value.ToLower().Contains("script") ? HttpUtility.HtmlEncode(tag.Value) : tag.Value,
    RegexOptions.Singleline);

What's the best way to remove (or ignore) script and form tags in HTML?

I have text stored in SQL as HTML. I'm not guaranteed that this data is well-formed, as users can copy/paste from anywhere into the editor control I'm using, or manually edit the HTML that's generated.
The question is: what's the best way of going about removing or somehow ignoring <script/> and <form/> tags so that, when the user's text is displayed elsewhere in the Web Application, it doesn't disrupt the normal operation of the containing page.
I've toyed with the idea of simply doing a "Find and Replace" for <script>/<form> with <div> (obviously taking into account whitespace and closing tags, if they exist). I'm also open to any way to somehow "ignore" certain tags. For all I know, there could be some built-in way of saying (in HTML, CSS, or JavaScript) "for all elements in <div id="MyContent">, treat <form> and <script> as <div>".
Any help or advice would be greatly appreciated!
In terms of sanitising user input, form and script tags are not the only ones that should be cleaned up.
The best way of doing this job depends a little on what tools you are using. Have a look at these questions:
What’s the best method for sanitizing user input with PHP?
Sanitising user input using Python
It depends on which language you're using. In general, I'd recommend using an HTML parser, constructing a small DOM from the snippet, then nuking the unwanted elements. There are many good HTML parsers, especially ones designed to handle real-world, messy HTML. Examples include BeautifulSoup (Python) and HTMLParser (Java)... And, since the answer came in while I was typing: what Colin said!
Don't try and do it yourself - there are far too many tricks for getting bits of script and general nastiness into a page. Use the Microsoft AntiXSS library - version 3.1 has HTML sanitization built in. You probably want the GetSafeHTMLFragment method, which returns a sanitized chunk of HTML. See my previous answer.
Since you're using .NET, I would recommend HtmlAgilityPack, as it is easy to work with and handles malformed HTML well.
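A minimal sketch of that approach, assuming the HtmlAgilityPack package: parse the (possibly malformed) HTML, drop every <script> and <form> element, and write the rest back out.

using HtmlAgilityPack;

static string RemoveScriptAndFormTags(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // SelectNodes returns null when nothing matches
    var unwanted = doc.DocumentNode.SelectNodes("//script|//form");
    if (unwanted != null)
    {
        foreach (var node in unwanted)
            node.Remove();
    }
    return doc.DocumentNode.OuterHtml;
}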
Though the answers suggested were acceptable, I ended up using a good old regular expression to replace begin and end <script> and <form> tags with <div>'s.
txtStore.Text = Regex.Replace(txtStore.Text, "<.*?>", string.Empty);
I had faced the same problem before, but my scenario was a bit different. I was adding content to the page with an AJAX request. The content coming back in the AJAX response was HTML, and it also included script tags. I just wanted the HTML without any scripts, so I removed all script tags from the AJAX response with jQuery.
jquery-remove-script-tags-from-string

Resources