We have taken over a .NET project recently and upon looking at the db we have the following in some columns:
1) Some columns have values such as
&quot; &amp; etc etc
2) Some have <script> tags and other non html encoded tags
This data is displayed all over the site. When trying out HtmlEncode on point 1, the value gets double-encoded: &quot; -> &amp;quot;
Obviously we want to HTML-encode when displaying, as point 2 contains JavaScript which we don't want executed.
Is there a way to use HtmlEncode on values that might or might not already be encoded?
No there isn't.
What I would suggest is that you write a quick script that goes through the database and decodes the already-encoded data. Then use something like the Microsoft AntiXSS library to encode all output before it is written to the web page. Remember that it is fine to store the data unencoded [1]; the danger is when you echo it back out to the end user.
Some controls already encode their output using the encoding functionality built into the .NET Framework, which is not bulletproof against XSS. For those controls you either have to avoid using them or not encode the data they display, otherwise it gets encoded twice. There is a FAQ entry about the MS controls that encode at the bottom of the page for the first link, which you should read. Some third-party control vendors also encode the output of their controls; you would do yourself a favor by testing them to make sure they are not still susceptible to XSS.
[1] Don't forget to take steps to prevent SQL injection, though!
Before applying HtmlEncode("myText"), apply the HtmlDecode method to the input text.
That way you will decode your string from:
&quot; &amp; etc etc <script>
to
" & etc etc <script>
and afterwards apply the encoding "from scratch".
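The decode-then-encode idea above can be sketched roughly as follows. This is only a minimal sketch, assuming System.Net.WebUtility is available (.NET 4.5 or later); the class and method names (HtmlNormalizer, EncodeOnce) are made up for illustration. One caveat: if a user really did type the literal text "&amp;" on purpose, it will be shown as "&" after this treatment.

```csharp
using System.Net;

// Hypothetical helper, not part of any library.
public static class HtmlNormalizer
{
    // Decode first so values that are already encoded are not encoded twice,
    // then encode exactly once for output.
    public static string EncodeOnce(string value)
    {
        if (string.IsNullOrEmpty(value))
            return string.Empty;

        string decoded = WebUtility.HtmlDecode(value);
        return WebUtility.HtmlEncode(decoded);
    }
}
```

With this, both an already-encoded value and a raw <script> tag should come out encoded exactly once.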
I have following string:
soqDi22c2_A-eY4ahWKJV6GAYgmuJBZ3poNNEixha1lOhXxxoucRuuzmcyDD_9ZYp_ECXRPbrBf6issNn23CUDJrh_A5L3Y5dHhB0o_U5Oq_j4rDCXOJ4Q==
It's a query parameter generated by a form on a page. (This is done server-side in ASP.NET.) We are able to submit this form programmatically and get the string we need (it just leads to a detail page of an object [a real-world parcel/building, publicly accessible]) and redirect our user to it. However, I would like to know if there is a way to decrypt/deobfuscate this string to see what it contains, and whether we could generate these strings ourselves without going through the form (it's a multi-step form).
The string also has some sort of expiration, so I sadly cannot provide a link to the result page, as it would stop working after like 10 minutes or so.
It feels a bit like it's base64, but after trying to run it through base64 -d, it says it's invalid.
It's likely base64 with + and / replaced with - and _ to make it more browser-friendly.
Though even if it's base64-encoded, it may just be a completely random key. You won't necessarily be able to decode it to something readable.
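If you want to check that guess, here is a minimal sketch of undoing the - and _ substitution before decoding. The class name Base64Url is just an illustration; only Convert.FromBase64String is a real framework call. Whether the resulting bytes mean anything is a separate question.

```csharp
using System;

// Hypothetical helper for decoding URL-safe base64.
public static class Base64Url
{
    public static byte[] Decode(string input)
    {
        // Map the URL-safe alphabet back to the standard one.
        string s = input.Replace('-', '+').Replace('_', '/');

        // Restore padding in case it was stripped.
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
        }

        // Throws FormatException if the input is not valid base64.
        return Convert.FromBase64String(s);
    }
}
```

You could then dump the result with BitConverter.ToString(bytes) and see whether it looks like structured data or just random bytes.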
I have a website that allows users to enter HTML through a TinyMCE rich editor control. Its purpose is to allow users to format text using HTML.
This user-entered content is then output to other users of the system.
However, this means someone could insert JavaScript into the HTML in order to perform an XSS attack on other users of the system.
What is the best way to filter out JavaScript code from an HTML string?
A regular expression check for <SCRIPT> tags is a good start, but an evildoer could still attach JavaScript to the onclick attribute of a tag.
Is there a fool-proof way to strip out all JavaScript code, whilst leaving the rest of the HTML untouched?
For my particular implementation, I'm using C#
Microsoft have produced their own anti-XSS library, Microsoft Anti-Cross Site Scripting Library V4.0:
The Microsoft Anti-Cross Site Scripting Library V4.0 (AntiXSS V4.0) is an encoding library designed to help developers protect their ASP.NET web-based applications from XSS attacks. It differs from most encoding libraries in that it uses the white-listing technique -- sometimes referred to as the principle of inclusions -- to provide protection against XSS attacks. This approach works by first defining a valid or allowable set of characters, and encodes anything outside this set (invalid characters or potential attacks). The white-listing approach provides several advantages over other encoding schemes. New features in this version of the Microsoft Anti-Cross Site Scripting Library include:
- A customizable safe list for HTML and XML encoding
- Performance improvements
- Support for Medium Trust ASP.NET applications
- HTML Named Entity Support
- Invalid Unicode detection
- Improved Surrogate Character Support for HTML and XML encoding
- LDAP Encoding Improvements
- application/x-www-form-urlencoded encoding support
It uses a whitelist approach to strip out potential XSS content.
Here are some relevant links related to AntiXSS:
Anti-Cross Site Scripting Library
Microsoft Anti-Cross Site Scripting Library V4.2 (AntiXSS V4.2)
Microsoft Web Protection Library
Peter, I'd like to introduce you to two concepts in security;
Blacklisting - Disallow things you know are bad.
Whitelisting - Allow things you know are good.
While both have their uses, blacklisting is insecure by design.
What you are asking for is in fact blacklisting. As soon as there is an alternative to <script> (such as <img src="bad" onerror="hack()"/>), a filter that only knows about <script> will miss it.
Whitelisting, on the other hand, allows you to specify the exact conditions you are allowing.
For example, you would have the following rules:
allow only these tags: b, i, u, img
allow only these attributes: src, href, style
That is just the theory. In practice, you must parse the HTML accordingly, hence the need of a proper HTML parser.
If you want to allow some HTML but not all, you should use something like OWASP AntiSamy, which allows you to build a whitelisted policy over which tags and attributes you allow.
HTMLPurifier might also be an alternative.
It's of key importance that it is a whitelist approach, as new attributes and events are added to HTML5 all the time, so any blacklisting would fail within short time, and knowing all "bad" attributes is also difficult.
Edit: Oh, and regex is a bit hard to do here. HTML can have lots of different formats. Tags can be unclosed, attributes can start with or without quotes (single or double), and you can have line breaks and all kinds of spaces within the tags, to name a few issues. I would rely on a well-tested library like the ones I mentioned above.
Regular expressions are the wrong tool for the job, you need a real HTML parser or things will turn bad. You need to parse the HTML string and then remove all elements and attributes but the allowed ones (whitelist approach, blacklists are inherently insecure). You can take the lists used by Mozilla as a starting point. There you also have a list of attributes that take URL values - you need to verify that these are either relative URLs or use an allowed protocol (typically only http:/https:/ftp:, in particular no javascript: or data:). Once you've removed everything that isn't allowed you serialize your data back to HTML - now you have something that is safe to insert on your web page.
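The protocol check for URL-valued attributes could look roughly like this. It is a deliberately strict sketch: it only accepts absolute URLs with an allowed scheme and rejects everything else, so relative URLs would need extra handling of your own. The class name UrlAllowList and the scheme list are assumptions for illustration; Uri.TryCreate and Uri.Scheme are the real framework members.

```csharp
using System;

// Hypothetical allowlist check for attributes that take URL values.
public static class UrlAllowList
{
    private static readonly string[] AllowedSchemes = { "http", "https", "ftp" };

    public static bool IsAllowed(string url)
    {
        if (string.IsNullOrWhiteSpace(url))
            return false;

        Uri parsed;
        // Anything that does not parse as an absolute URI is rejected in this sketch.
        if (!Uri.TryCreate(url.Trim(), UriKind.Absolute, out parsed))
            return false;

        // Uri.Scheme is returned in lower case, so a plain comparison is enough.
        return Array.IndexOf(AllowedSchemes, parsed.Scheme) >= 0;
    }
}
```

During the tree walk you would drop any src or href attribute for which IsAllowed returns false.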
I try to break up the tag format like this:
using System.Text.RegularExpressions;

public class Utility
{
    // Puts a space after every "<" so the browser no longer sees a tag.
    public static string PreventXSS(string sInput)
    {
        if (sInput == null)
            return string.Empty;

        string sResult = Regex.Replace(sInput, "<", "< ");
        // Collapse "<" plus any following whitespace into "< ".
        sResult = Regex.Replace(sResult, @"<\s*", "< ");
        return sResult;
    }
}
Usage before saving to the db:
string sResultNoXSS = Utility.PreventXSS(varName);
In my tests, input data like:
<script>alert('hello XSS')</script>
runs in the browser. After applying the method above, it becomes:
< script>alert('hello XSS')< /script>
(There is a space after <)
As a result, the script does not run in the browser.
The problem here is the middle of the line (HTML).
The chain:
I have a WinForms program that uses Awesomium (an alternative to the native WebBrowser control) to view an HTML page that has part of an asp.net page in its iframe.
The problem:
The problem is that I need to pass a value to the asp.net page. Without the middle of the chain (the HTML iframe) this is easily achieved by sending a hashed and encrypted query string.
How it works:
The WinForms app does its work, then uses a few-step encryption to encode all the needed values into one string.
Then it should send this string to the asp.net page through the iframe (and that's the problem: it is easy to receive a query string in the asp.net page, but first I need to receive it in the HTML page and pass it on to asp.net).
Acceptable answers:
1) Probably the easiest one: using JavaScript. I have heard it can be done that way.
How I imagine this: I send the query string from WinForms to the HTML page as http://HtmlPage.html?AspNet.aspx?CryptedString
Then the HTML page receives it with JavaScript and puts the query string "AspNet.aspx?CryptedString" into the iframe's "src=http://", resulting in "src=http://AspNet.aspx?CryptedString"
And then I easily get it in the asp.net page.
2) Somehow create a >>>VIRTUAL<<< asp.net or HTML page with the iframe source taken directly from the WinForms string. (NOTE: virtual. I don't want the query string to be saved on the HDD, so please don't even suggest that.)
Probably that is possible with Awesomium, but I'm new to it and don't know how (if it is possible at all).
3) Some web service through which asp.net and WinForms can communicate via the existing HTML iframe.
4) Another way that replaces one of the 3 above, that neither saves the values in a query string or elsewhere on the HDD nor is visible to the user, and that doesn't use the asp.net page's server to create the iframe page. Only HTML is allowed on the HTML page's server; PHP isn't.
5) If you don't know any of the 4 above, suggest free PHP hosting without ads (if such a thing exists, which I highly doubt).
Priority:
The best one would be #3, then #2, then #1, then #5 (#4 is excluded as it is unknown).
And in the end:
Thanks in advance for your help.
P.S.Currently at work, so I'll check/try all answers later on and will report tomorrow if any suits my needs. Thanks again.
Answering my own question. I have found 2 ways to do what I wanted.
The first one:
Creating an in-memory file with System.IO.MemoryStream or another method (google "c# create a file in ram").
The second one:
Creating a hidden, encrypted, system file that only the program can read, somewhere in an out-of-the-way folder, via the File.SetAttributes method and System.IO.StreamWriter/StreamReader, System.IO.FileStream, System.IO.TextWriter, etc., depending on what it should contain.
Once this file has served its purpose, delete it, and also delete it on exit and on start, using
if (File.Exists(path))
{
    File.Delete(path);
}
(Need more reputation to post few links -_-, and I don't want to post only part of them, either all or no at all, so use google if you'll need anything from here).
If you need to store a small temp file for a short time, use the first one; if it is heavy, use the second one, unless you badly need to keep it in RAM.
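For the first option, a rough sketch of keeping a value in a MemoryStream instead of on disk might look like this. The class and method names (InMemoryScratch, RoundTrip) are invented for the example; how the stream is then handed to the browser control is not covered here.

```csharp
using System.IO;
using System.Text;

// Hypothetical illustration of a "RAM file": nothing touches the HDD.
public static class InMemoryScratch
{
    public static string RoundTrip(string value)
    {
        using (var stream = new MemoryStream())
        {
            // leaveOpen keeps the stream usable after the writer is disposed.
            using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 1024, leaveOpen: true))
            {
                writer.Write(value);
            }

            // Rewind and read the content back from memory.
            stream.Position = 0;
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```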
I was reading some questions trying to find a good solution to preventing XSS in user provided URLs(which get turned into a link). I've found one for PHP but I can't seem to find anything for .Net.
To be clear, all I want is a library which will make user-provided text safe(including unicode gotchas?) and make user-provided URLs safe(used in a or img tags)
I noticed that StackOverflow has very good XSS protection, but sadly that part of their Markdown implementation seems to be missing from MarkdownSharp. (and I use MarkdownSharp for a lot of my content)
Microsoft has the Anti-Cross Site Scripting Library; you could start by taking a look at it and determining if it fits your needs. They also have some guidance on how to avoid XSS attacks that you could follow if you determine the tool they offer is not really what you need.
There's a few things to consider here. Firstly, you've got ASP.NET Request Validation which will catch many of the common XSS patterns. Don't rely exclusively on this, but it's a nice little value add.
Next up you want to validate the input against a white-list and in this case, your white-list is all about conforming to the expected structure of a URL. Try using Uri.IsWellFormedUriString for compliance against RFC 2396 and RFC 2732:
var sourceUri = UriTextBox.Text;
if (!Uri.IsWellFormedUriString(sourceUri, UriKind.Absolute))
{
// Not a valid URI - bail out here
}
AntiXSS has Encoder.UrlEncode, which is great for encoding a string to be appended to a URL, i.e. in a query string. The problem is that you want to take the original string and not escape characters such as the forward slashes, otherwise http://troyhunt.com ends up as http%3a%2f%2ftroyhunt.com and you've got a problem.
As the context you're encoding for is an HTML attribute (it's the "href" attribute you're setting), you want to use Encoder.HtmlAttributeEncode:
MyHyperlink.NavigateUrl = Encoder.HtmlAttributeEncode(sourceUri);
What this means is that a string like http://troyhunt.com/<script> will get escaped to http://troyhunt.com/&lt;script&gt;, but of course Request Validation would catch that one first anyway.
Also take a look at the OWASP Top 10 Unvalidated Redirects and Forwards.
I think you can do it yourself by creating one array of the characters and another array with the corresponding codes.
Whenever you find a character from the first array, replace it with its code. This will help, but it is definitely not 100% safe.
Character array
<
>
...
Code array
&lt;
&gt;
...
I rely on HtmlSanitizer. It is a .NET library for cleaning HTML fragments and documents from constructs that can lead to XSS attacks.
It uses AngleSharp to parse, manipulate, and render HTML and CSS.
Because HtmlSanitizer is based on a robust HTML parser it can also shield you from deliberate or accidental
"tag poisoning" where invalid HTML in one fragment can corrupt the whole document leading to broken layout or style.
Usage:
var sanitizer = new HtmlSanitizer();
var html = @"<script>alert('xss')</script><div onload=""alert('xss')"""
    + @"style=""background-color: test"">Test<img src=""test.gif"""
    + @"style=""background-image: url(javascript:alert('xss')); margin: 10px""></div>";
var sanitized = sanitizer.Sanitize(html, "http://www.example.com");
Assert.That(sanitized, Is.EqualTo(@"<div style=""background-color: test"">"
    + @"Test<img style=""margin: 10px"" src=""http://www.example.com/test.gif""></div>"));
There's an online demo, plus there's also a .NET Fiddle you can play with.
(copy/paste from their readme)
This is all internal servers and software, so I'm very limited on my options, but this is where I'm at. This is already a band-aid to a workaround but I have no choice, so I'm just trying to make it work.
I have a simple .asp file on my server that is protected by a service that will handle the user authentication (I have no control over this service). When a user goes to this .asp file, it requires them to authenticate via the service, and the service then redirects them to the .asp.
The service is inserting custom values in to the http header that allow me to identify who has logged in (I need it further down the line). When I use the asp to view the ALL_RAW and ALL_HTTP values from the header, I can see all the custom values. But when I try to call these values specifically I get nothing.
I ran this simple loop:
<%
for each x in Request.ServerVariables
response.write("<B>" & x & ":</b> " & Request.ServerVariables(x) & "<p />")
next
%>
and all the keys display including the custom ones. But none of the custom values will. The values are the part I need.
The only thing I can find that is unique about the custom values is that they look slightly different in the ALL_RAW value, but they all look correct in ALL_HTTP. As best I can tell, they are formatted correctly. The only formatting differences between the standard and custom values are case and underscores instead of hyphens.
Why can I not read these custom values?
I found my answer.
When I ran this loop
<%
for each x in Request.ServerVariables
response.write("<B>" & x & ":</b> " & Request.ServerVariables(x) & "<p />")
next
%>
it would return a list of all the names that were in the header and their values. The custom value I was looking for would show up under the name "HTTP_CUSTOM_ID", and I could see it, with its value, in ALL_HTTP and ALL_RAW, but when I tried to pull that specific value, it would return an empty string. The solution I stumbled on (by talking to someone else here at work who had gone through a similar situation with the same service I was trying to accommodate) is to use:
<%=Request.ServerVariables("HEADER_CUSTOM_ID")%>
When viewing the full header, nothing led me to use the HEADER prefix instead of the HTTP, in fact, it led me opposite. And I never found any mention of this anywhere searching online either. So I'm posting my own answer to my question here so it is on the web.
For the sake of expedience, why not just parse Request.ServerVariables("ALL_RAW") yourself?
There is a better way than parsing each item yourself. Look at the values in Request.ServerVariables("ALL_HTTP") and find the header you need; it will be there under a slightly different name.
All HTTP headers start with HTTP_. I was looking for If-None-Match and it was in the collection as HTTP_IF_NONE_MATCH. To get the value I used Request.ServerVariables("HTTP_IF_NONE_MATCH").
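The naming rule described above (HTTP_ prefix, upper case, hyphens turned into underscores) can be written down as a tiny helper. This is only a sketch of that convention in C#, with a made-up class name; it does not cover the HEADER_ prefix that the earlier answer ended up needing for its custom header.

```csharp
// Hypothetical helper showing the header-name to server-variable-name mapping.
public static class ServerVariableNames
{
    public static string FromHeader(string headerName)
    {
        // "If-None-Match" becomes "HTTP_IF_NONE_MATCH".
        return "HTTP_" + headerName.Trim().ToUpperInvariant().Replace('-', '_');
    }
}
```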