Encoder.HtmlEncode encodes Farsi characters - asp.net

I want to use the Microsoft AntiXss library for my project. When I use the Microsoft.Security.Application.Encoder.HtmlEncode(str) function to safely show some value in my web page, it encodes Farsi characters which I consider to be safe. For instance, it converts لیست to لیست. Am I using the wrong function? How should I be able to print the user input in my page safely?
I'm currently using it like this:
<h2>#Encoder.HtmlEncode(ViewBag.UserInput)</h2>

I think I messed up! Razor view encodes the values unless you use #Html.Raw right? Well, I encoded the string and it encoded it again. So in the end it just got encoded twice and hence, the weird looking chars (Unicode values)!

If your encoding (lets assume that it's Unicode by default) supports Farsi it's safe to use Farsi, without any additional effort, in ASP.NET MVC almost always.
First of all, escape-on-input is just wrong - you've taken some input and applied some transformation that is totally irrelevant to that data. It's generally wrong to encode your data immediately after you receive it from the user. You should store the data in pure view to your database and encode it only when you display it to the user and according to the possible vulnerabilities for the current system. For example the 'dangerous' html characters are not 'dangerous' for SQL or android etc. and that's one of the main reasons why you shouldn't encode the data when you store it in the server. And one more reason - when you html encode the string you got 6-7 times more characters for your string. This can be a problem with server constraints for strings length. When you store the data to the sql server you should escape, validate, sanitize your data only for it and prevent only its vulnerabilities (like sql injection).
Now for ASP.NET MVC and razor you don't need to html encode your strings because it's done by default unless you use Html.Raw() but generally you should avoid it (or html encode when you use it). Also if you double encode your data you'll result in corrupted output :)
I Hope this will help to clear your mind.

Related

Microsoft.Security.Application.Encoder.HtmlEncode preventing single quotes from rendering

As a security measure we're using the Microsoft.Security.Application.Encoder.HtmlEncode method to encode and render values that have been stored in our database by various users.
We would like to allow the user to use single quotes but they are being encoded as & #39;
Does anyone know of a safe way to allow single quotes to render but ensure the rest of the input is encoded? Is it just a case of replacing after the encoding has taken place? This approach seems a bit hacky.
I got to the bottom of this. The web control was also encoding the input data and therefore html encoding was taking place twice.

How to detect wrong encoding declaration?

I am building a ASP.NET webservice loading other webpages and then hand it clients.
I have been doing quite well with character code treatment, reading the meta tag from HTML then use that codeset to read the file.
But nevertheless, some less educated users just don't understand code sets. They declare a specific encoding method e.g. "gb2312", but in fact, he is just using normal UTF8. When I use gb2312 to decode the text, everything turns out a holy mess.
How can I detect whether the text is properly decoded? I loaded that page into my IE, which correctly use UTF-8 to decode the page. How does it achieve that?
Based on the BOM you can tell what encoding is used.
BOM and encoding
If you want to detect character set you could use the C# port of mozilla's character set detector.
CharDetSharp
If you want to make it extra sure that you are using a correct one, you maybe could be looking for special characters that are not supposed to be there. It is not very likely to include "óké". So you could be looking for such characters and try to use different encoding/character set to process your file.
Actually it is really hard to make your application completely "fool-proof".

How to prevent XSS in Asp.NET

What are the techniques that one can use to prevent cross site scripting in asp.net? Are there any non ready implementations that one can use to achieve a website protected against xss?
We did in-house development for this purpose for a long time, but finally Microsoft provided a library for it. We now replaced our library with this one completely. It can simply be used as follows:
string sanitizedString = Microsoft.Security.Application.Sanitizer.GetSafeHtmlFragment(string myStringToBeChecked);
The only problem with this method is that it trims multiple whitespaces that are separated with line ending characters. If you do not want that to happen, you may consider splitting the string first with respect to line ending characters (\r\n), then calculate the number of whitespaces before and after these splitted strings, apply sanitizer, append whitespaces back and concatenate.
Other than that, Microsoft library works fine.
The microsoft anti cross site scripting library is a good start. It has some useful helper methods to prevent XSS. It is now part of the larger Microsoft Web Protection Library
To protect cross site scripting attack as following ways.
1. Use HtmlEncoding while saving the input content that is received from web application controls usch as textbox etc.
2. Use InnerText instead of InnerHtml while displaying data on the page.
3. Do the input sanitization before saving the data into database.

Letters becoming "ë"

I Have a website, and there a few textboxes. If the users fill in something that contains the letters "ë" then it becomes like:
ë
How can I store it ë like this in the database?
My website is built on .NET and Iam using the C# language.
Both ASP.Net (your server-side application) and SQL Server are Unicode-aware. They can handle different languages, and different character sets:
http://msdn.microsoft.com/en-us/library/39d1w2xf.aspx
Internally, the code behind ASP.NET Web pages handles all string data
as Unicode. You can set how the page encodes its response, which sets
the CharSet attribute on the Content-Type part of the HTTP header.
This enables browsers to determine the encoding without a meta tag or
having to deduce the correct encoding from the content. You can also
set how the page interprets information that is sent in a request.
Finally, you can set how ASP.NET interprets the content of the page
itself — in other words, the encoding of the physical .aspx file on
disk. If you set the file encoding, all ASP pages must use that
encoding. Notepad.exe can save files that are encoded in the current
system ANSI codepage, in UTF-8, or in UTF-16 (also called Unicode).
The ASP.NET runtime can distinguish between these three encodings. The
encoding of the physical ASP.NET file must match the encoding that is
specified in the file in the # Page encoding attributes.
This article is also helpful:
http://support.microsoft.com/kb/893663
This "Joel-on-Software" article is an absolute must-read
The Absolute Minimum Every Software Developer Absolutely Positively Must Know About Unicode (No Excuses!)
Please read all three articles, and let us know if that helps.
You need HtmlEncode and HtmlDecode functions.
SQL Server is fine with ë and any other local or 'unusual' characters but HTML is not. This is because some characters have special meanings in HTML. Best examples are < or > which are essential to HTML syntax but there is lots more. For some reason ë is also special. To be able to display characters like that they need to be encoded before transmission as HTML. Transmission means also sending to a browser.
So, although you see ë in a browser your app is handling it in an encoded version which is ë and it's always in this form including database. If you want ë to be saved in SQL Server as ë you need to decode it first. Remember to encode it back to ë before displaying on your page.
Use these functions to decode/encode all your texts before saving/displaying respectively. They will only convert special characters and leave alone everything else:
string encoded = HttpUtility.HtmlEncode("Noël")
string decoded = HttpUtility.HtmlDecode("Noël")
There is another important reason to operate on encoded texts - JavaScript injections. It is an attack on your site meant to disrupt it by placing JavaScript chunks into edit/memo boxes with a hope that they will get executed at one point on someone else's browser. If you encode all texts you get from UI, those JavaScripts will never run because they will be treated as texts rather than an executable code.

To HTMLENCODE or not to HTMLENCODE user input on web form (asp.net vb)

I have many params making up an insert form for example:
x.Parameters.AddWithValue("#city", City.Text)
I had a failed xss attack on the site this morning, so I am trying to beef up security measures anyway....
Should I be adding my input params like this?
x.Parameters.AddWithValue("#city", HttpUtility.HtmlEncode(City.Text))
Is there anything else I should consider to avoid attacks?
Don't encode input. Do encode output. At some point in the future, you might decide you want to use the same data to produce PDF or a Word document (or something else), at which point you won't want it to be HTML.
When you are accepting data, it is just data.
When you are inserting data into a database, it needs to be converted to make sense for the database.
When you are inserting data into an HTML document, it needs to be converted to make sense for HTML.
… and so on.
I strongly recommending looking at the OWASP XSS Prevention Cheat Sheet. It helps classify the different areas of a html document you can inject into, and a recipe for how to encode your output appropriately for each location.
Know that you can't just universally trust a function like htmlEncode() and expecct it to be a magic pill for all ills. To quote from the OWASP document linked:
Why Can't I Just HTML Entity Encode Untrusted Data?
HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a tag. It even sort of works for untrusted data that goes into attributes, particularly if you're religious about using quotes around your attributes. But HTML entity encoding doesn't work if you're putting untrusted data inside a tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you're putting untrusted data into. That's what the rules below are all about.
Take time to understand exactly how and why XSS works. Then just follow these 7 rules and you'll be safe.

Resources