I scrape HTML pages and write certain values into an SQLite database.
My question is: should the values I insert into the database be HTML-escaped or unescaped? What is best practice?
Right now, for example, one value looks like this in my DB (note the escaped ampersand):
The database itself does not care.
It is your choice whether you escape the values before writing them to the DB, or after reading them from the DB.
However, you might need to apply different escaping algorithms in different contexts (URL, HTML, XML, JSON, CSV, etc.), and if you write HTML code to an .html file, you need no escaping at all.
So it would be a bad idea to force the values in the DB into one specific escaped form.
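A minimal sketch of that approach in Python, assuming the scraped value arrives HTML-escaped: decode it once with `html.unescape`, store the plain text, and escape only at the point where HTML is written. The table `items`, the column `title`, and the sample value are made up for illustration.

```python
import html
import sqlite3

# Hypothetical scraped value, still HTML-escaped as it appeared in the page source.
scraped = "Tom &amp; Jerry"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT)")

# Store the plain text, not its HTML representation.
plain = html.unescape(scraped)
conn.execute("INSERT INTO items (title) VALUES (?)", (plain,))

# Escape only when the value is written into an HTML document.
(stored,) = conn.execute("SELECT title FROM items").fetchone()
html_fragment = "<td>{}</td>".format(html.escape(stored))
```

The same stored value could later be passed through a URL, JSON, or CSV encoder instead, without first having to undo an HTML escape.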
As a security measure we're using the Microsoft.Security.Application.Encoder.HtmlEncode method to encode and render values that have been stored in our database by various users.
We would like to allow the user to use single quotes, but they are being encoded as &#39;.
Does anyone know of a safe way to allow single quotes to render but ensure the rest of the input is encoded? Is it just a case of replacing after the encoding has taken place? This approach seems a bit hacky.
I got to the bottom of this. The web control was also encoding the input data and therefore html encoding was taking place twice.
I want to use the Microsoft AntiXss library for my project. When I use the Microsoft.Security.Application.Encoder.HtmlEncode(str) function to safely show some value in my web page, it encodes Farsi characters which I consider to be safe. For instance, it converts لیست to &#1604;&#1740;&#1587;&#1578;. Am I using the wrong function? How should I be able to print the user input in my page safely?
I'm currently using it like this:
<h2>@Encoder.HtmlEncode(ViewBag.UserInput)</h2>
I think I messed up! The Razor view encodes values unless you use @Html.Raw, right? Well, I encoded the string and then Razor encoded it again. So in the end it got encoded twice, hence the weird-looking characters (numeric Unicode values)!
If your encoding (let's assume it's Unicode by default) supports Farsi, it is almost always safe to use Farsi in ASP.NET MVC without any additional effort.
First of all, escape-on-input is just wrong: you've taken some input and applied a transformation that is irrelevant to the data itself. It's generally wrong to encode your data immediately after you receive it from the user. You should store the data in its raw form in your database and encode it only when you display it to the user, according to the vulnerabilities of the system that displays it. For example, the 'dangerous' HTML characters are not dangerous for SQL, Android, etc., which is one of the main reasons not to encode data when you store it on the server.
One more reason: HTML-encoding makes the string longer, because each encoded character expands to several characters (for example, & becomes &amp;). This can be a problem if the server constrains string length.
When you store the data in the SQL server, you should escape, validate, and sanitize it only for that target and guard only against its vulnerabilities (like SQL injection).
Now, for ASP.NET MVC and Razor, you don't need to HTML-encode your strings, because that is done by default unless you use Html.Raw(), which you should generally avoid (or HTML-encode the value yourself when you do use it). Also, if you double-encode your data, you'll end up with corrupted output :)
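The double-encoding effect is easy to reproduce outside ASP.NET. The sketch below is in Python and uses `html.escape` purely as a stand-in for any HTML encoder (an assumption for illustration, not the Razor or AntiXss implementation); the sample input is made up.

```python
import html

user_input = "O'Brien & Sons"

# Encoded once: what a template engine that auto-encodes would emit.
once = html.escape(user_input)

# Encoded twice: what is emitted if the value was also encoded by hand.
twice = html.escape(once)

# A browser decodes entities exactly once when it renders the page.
shown_once = html.unescape(once)
shown_twice = html.unescape(twice)

print(shown_once)   # the original text
print(shown_twice)  # entity codes show up as literal text
```

After one round of decoding, the twice-encoded value still contains entity codes, which is what the user ends up seeing on the page.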
I hope this helps to clear things up.
I'm after a general regex for sanitising form input. I want to use it on first name and last name fields, which will be stored in the DB, and pretty much use it in other general places if I can.
I'm using ASP.net does any on
Sanitising user data is an output problem, not an input problem.
What is considered "sanitary" for a MySQL database is not necessarily "sanitary" for MSSQL or PostgreSQL. What is considered "sanitary" for a database is most likely not the same as what you could safely send in an HTML document. XHTML is a different story again, and if you are outputting the user-supplied data into a JavaScript block or a CSS block, it's different yet again. There is no way to sanitise user-supplied data for all output targets.
It's better to use the supplied library functions for sanitising data rather than building your own regex. PHP (which I happen to know better than ASP.net) has mysql_real_escape_string(). I'm sure ASP.net will have a library function for sanitising user-supplied data for use with various databases. It will also likely have library functions for sanitising user-supplied data for HTML as well.
Parameterised queries are even better than sanitising user-supplied data. And it can be done with ASP.net. This is the right way to use a database.
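To illustrate the idea independently of ASP.NET, here is a small Python sketch using the standard `sqlite3` module. The table `people` and the sample names are hypothetical; the `?` placeholders are sqlite3's parameter syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")

# Input that would break a query built by string concatenation.
first_name = "Robert'); DROP TABLE people;--"
last_name = "O'Neil"

# The ? placeholders bind the values separately from the SQL text,
# so no escaping or regex filtering is needed for the database.
conn.execute(
    "INSERT INTO people (first_name, last_name) VALUES (?, ?)",
    (first_name, last_name),
)

rows = conn.execute("SELECT first_name, last_name FROM people").fetchall()
```

The values come back exactly as they were entered, quotes included, and the table is still there.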
I have many params making up an insert form for example:
x.Parameters.AddWithValue("@city", City.Text)
I had a failed XSS attack on the site this morning, so I am trying to beef up security measures anyway.
Should I be adding my input params like this?
x.Parameters.AddWithValue("@city", HttpUtility.HtmlEncode(City.Text))
Is there anything else I should consider to avoid attacks?
Don't encode input. Do encode output. At some point in the future, you might decide you want to use the same data to produce PDF or a Word document (or something else), at which point you won't want it to be HTML.
When you are accepting data, it is just data.
When you are inserting data into a database, it needs to be converted to make sense for the database.
When you are inserting data into an HTML document, it needs to be converted to make sense for HTML.
… and so on.
I strongly recommend looking at the OWASP XSS Prevention Cheat Sheet. It classifies the different areas of an HTML document you can inject into and gives a recipe for how to encode your output appropriately for each location.
Know that you can't just universally trust a function like htmlEncode() and expect it to be a magic pill for all ills. To quote from the OWASP document linked:
Why Can't I Just HTML Entity Encode Untrusted Data?
HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a <div> tag. It even sort of works for untrusted data that goes into attributes, particularly if you're religious about using quotes around your attributes. But HTML entity encoding doesn't work if you're putting untrusted data inside a <script> tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you're putting untrusted data into. That's what the rules below are all about.
Take time to understand exactly how and why XSS works. Then just follow these 7 rules and you'll be safe.
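As a rough illustration of why one encoder does not fit every location, the Python sketch below encodes the same untrusted string for three contexts using only standard-library functions. It covers only a few cases and is not a substitute for the cheat sheet; the sample string and the `/search?q=` URL are made up.

```python
import html
from urllib.parse import quote

untrusted = 'x" onmouseover="alert(1)'

# HTML body context: entity-encode the markup characters & < >.
body = "<p>{}</p>".format(html.escape(untrusted, quote=False))

# Quoted attribute context: the quote characters must be encoded as well,
# otherwise the value can close the attribute and add a new one.
attr = '<input value="{}">'.format(html.escape(untrusted, quote=True))

# URL query-parameter context: percent-encode instead of entity-encoding.
url = "/search?q={}".format(quote(untrusted, safe=""))
```

Note that the body encoding leaves the double quotes alone, which is harmless there but would be unsafe if the same result were dropped into an attribute.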
Regardless of the SQL database collation being used, is there any way to replace special characters when they are displayed in the interface? At the very least, is there a way to do that for the much-discussed "Turkish I"? :-) I want to eliminate the small dotless 'i'.
What about a simple String.Replace for the characters you don't want?
You can either keep the original data in the database and do this when the page renders or you can do this before saving the data to the database.
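A sketch of the render-time option, written in Python rather than .NET for brevity. The helper name `for_display` and the mapping are illustrative assumptions; the stored value is left unchanged and only the displayed copy is altered.

```python
# Characters to replace at display time (an illustrative choice).
# "\u0131" is LATIN SMALL LETTER DOTLESS I.
DISPLAY_MAP = str.maketrans({"\u0131": "i"})


def for_display(text: str) -> str:
    """Return a copy of text with the mapped characters replaced."""
    return text.translate(DISPLAY_MAP)


stored = "k\u0131z"          # value as kept in the database
shown = for_display(stored)  # value as rendered in the interface
```

Doing the replacement at render time keeps the original data available in case the display rule changes later.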