Why?
Server.UrlEncode("2*")
returns 2*
while it should return 2%2A,
as tested on this demo site.
RFC 1738 specifically allows * in the URL:
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
So, there is no need to encode it.
The page you link to is a classic ASP page, so it uses quite an old implementation of UrlEncode, not the .NET one.
According to .NET, * is a 'safe' character and does not need to be encoded.
Whether this is actually correct or not, I do not know.
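If you actually need the %2A form, Uri.EscapeDataString follows RFC 3986's reserved set and does percent-encode *. A minimal sketch:

Console.WriteLine(HttpUtility.UrlEncode("2*"));  // 2* - '*' is treated as safe
Console.WriteLine(Uri.EscapeDataString("2*"));   // 2%2A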
Related
I am trying to construct the below URL:
https://console.aws.amazon.com/elasticmapreduce/home?region=us-east-1#cluster-details:j-1IGU6572KT6LB
I am not sure how to include the :j-1IGU6572KT6LB part. When I include the :, it gets encoded. Trying to see if that can be avoided.
This is what I have:
UriBuilder
    .fromPath("console.aws.amazon.com")
    .path("elasticmapreduce")
    .path("home")
    .queryParam("region", "us-east-1")
    .fragment("cluster-details")
    .port(-1)
    .scheme("https")
    .build();
If the ":" in fragments is encoded that appears to me to be a bug (see RFC 3986, Section 3.5 and 3.3). I recommend to open a bug report.
OTOH, if a recipient fails to handle the percent-encoded colon, that's a bug as well.
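For comparison, a colon in the fragment survives .NET's System.UriBuilder; a minimal sketch (reusing the URL from the question):

var b = new UriBuilder("https", "console.aws.amazon.com")
{
    Path = "elasticmapreduce/home",
    Query = "region=us-east-1",
    Fragment = "cluster-details:j-1IGU6572KT6LB"  // ':' is legal in fragments per RFC 3986
};
Console.WriteLine(b.Uri);
// https://console.aws.amazon.com/elasticmapreduce/home?region=us-east-1#cluster-details:j-1IGU6572KT6LB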
Context: ASP.NET MVC running in IIS, with a UTF-8 %-encoded URL.
Using the standard project template, and a test-action in HomeController like:
public ActionResult Test(string id)
{
    return Content(id, "text/plain");
}
This works fine for most %-encoded UTF-8 routes, such as:
http://mydevserver/Home/Test/%e4%ba%ac%e9%83%bd%e5%bc%81
with the expected result 京都弁
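(As a quick sanity check, those %-encoded bytes really are the UTF-8 encoding of 京都弁; assuming System.Web's HttpUtility:)

string decoded = HttpUtility.UrlDecode("%e4%ba%ac%e9%83%bd%e5%bc%81", Encoding.UTF8);
// decoded == "京都弁"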
However using the route:
http://mydevserver/Home/Test/%ee%93%bb
the url is not received correctly.
Aside: %ee%93%bb is %-encoded code-point 0xE4FB; basic-multilingual-plane, private-use area; but ultimately - a valid unicode code-point; you can verify this manually, or via:
string value = ((char) 0xE4FB).ToString();
string encoded = HttpUtility.UrlEncode(value); // %ee%93%bb
Now, what happens next depends on the web-server; on the Visual Studio Development Server (aka cassini), the correct id is received - a string of length one, containing code-point 0xE4FB.
If, however, I do this in IIS or IIS Express, I get a different id, specifically "î“»", code-points: 0xEE, 0x201C, 0xBB. You will immediately recognise the first and last as the start and end of our percent-encoded string... so what happened in the middle?
Well:
byte 0x93 in the ANSI code page (Windows-1252) is “ (source)
code-point 0x201C is “ (source)
It looks to me very much like IIS has performed some kind of quote-translation when processing my URL. Maybe this has uses in a few scenarios (I don't know), but it is certainly a bad thing when it happens in the middle of a %-encoded UTF-8 block.
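(A minimal repro of that theory, assuming Windows-1252 as the server's ANSI code page:)

byte[] raw = { 0xEE, 0x93, 0xBB };                          // the three %-encoded bytes
string expected = Encoding.UTF8.GetString(raw);             // "\uE4FB" - the character we wanted
string mangled = Encoding.GetEncoding(1252).GetString(raw); // "î“»" - what IIS hands us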
Note that HttpContext.Current.Request.RawUrl also shows this translation has occurred, so this does not look like an MVC bug; note also Darin's comment, highlighting that it works differently in the path vs the query portion of the URL.
So (two-parter):
is my analysis missing some important subtlety of unicode / url processing?
how do I fix it? (i.e. make it so that I receive the expected character)
id = Encoding.UTF8.GetString(Encoding.Default.GetBytes(id));
This will give you your original id.
IIS uses the default (ANSI) encoding for path characters. Your URL-encoded string is decoded with that encoding, which is why you get a weird string back.
To recover the original id, convert the string back to bytes and decode it as UTF-8.
See Unicode and ISAPI Filters
ISAPI Filter is an ANSI API - all values you can get/set using the API
must be ANSI. Yes, I know this is shocking; after all, it is 2006 and
everything nowadays are in Unicode... but remember that this API
originated more than a decade ago when barely anything was 32bit, much
less Unicode. Also, remember that the HTTP protocol which ISAPI
directly manipulates is in ANSI and not Unicode.
EDIT: Since you mentioned that it works with most other characters, I'm assuming IIS has some sort of encoding-detection mechanism which is failing in this case. As a workaround, you can prefix your id with this char; then you can easily detect whether the problem occurred (the prefix char will be missing). Not an ideal solution, but it will work. You can then write a custom model binder and a wrapper class in ASP.NET MVC to keep your consumption code clean.
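A rough sketch of that model-binder idea (the type and names are illustrative, not a drop-in solution):

using System.Text;
using System.Web.Mvc;

public class Utf8RouteValueBinder : IModelBinder
{
    public object BindModel(ControllerContext controllerContext,
                            ModelBindingContext bindingContext)
    {
        var result = bindingContext.ValueProvider.GetValue(bindingContext.ModelName);
        if (result == null)
            return null;

        // Undo IIS's ANSI decoding: back to bytes, then re-decode as UTF-8.
        return Encoding.UTF8.GetString(Encoding.Default.GetBytes(result.AttemptedValue));
    }
}

You would then opt in per parameter, e.g. public ActionResult Test([ModelBinder(typeof(Utf8RouteValueBinder))] string id).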
Once upon a time, URLs themselves were not in UTF-8; they were in the ANSI code page. This reflects the fact that they are often used to select pathnames in the server's file system. In ancient times, IE had an option for whether or not to send UTF-8 URLs.
Perhaps buried in the bowels of the IIS config there is a place to specify the URL encoding, and perhaps not.
Ultimately, to get around this, I had to use request.ServerVariables["HTTP_URL"] and some manual parsing, with a bunch of error-handling fallbacks (additionally compensating for some related glitches in Uri). Not great, but it only affects a tiny minority of awkward requests.
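A rough sketch of that fallback (assuming, as the author found, that HTTP_URL still carries the %-encoded form):

// HTTP_URL is the raw request URL as IIS received it, e.g. /Home/Test/%ee%93%bb
string raw = HttpContext.Current.Request.ServerVariables["HTTP_URL"];
string lastSegment = raw.Substring(raw.LastIndexOf('/') + 1);
string id = Uri.UnescapeDataString(lastSegment); // decodes the %-bytes as UTF-8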
I was reading some questions trying to find a good solution to preventing XSS in user-provided URLs (which get turned into links). I've found one for PHP, but I can't seem to find anything for .NET.
To be clear, all I want is a library which will make user-provided text safe (including Unicode gotchas?) and make user-provided URLs safe (for use in a or img tags).
I noticed that StackOverflow has very good XSS protection, but sadly that part of their Markdown implementation seems to be missing from MarkdownSharp. (and I use MarkdownSharp for a lot of my content)
Microsoft has the Anti-Cross Site Scripting Library; you could start by taking a look at it and determining if it fits your needs. They also have some guidance on how to avoid XSS attacks that you could follow if you determine the tool they offer is not really what you need.
There are a few things to consider here. Firstly, you've got ASP.NET Request Validation, which will catch many of the common XSS patterns. Don't rely exclusively on this, but it's a nice little value-add.
Next up, you want to validate the input against a whitelist, and in this case your whitelist is all about conforming to the expected structure of a URL. Try using Uri.IsWellFormedUriString for compliance against RFC 2396 and RFC 2732:
var sourceUri = UriTextBox.Text;
if (!Uri.IsWellFormedUriString(sourceUri, UriKind.Absolute))
{
    // Not a valid URI - bail out here
}
AntiXSS has Encoder.UrlEncode, which is great for encoding a string to be appended to a URL, i.e. in a query string. The problem is that you want to keep the original string and not escape characters such as the forward slashes; otherwise http://troyhunt.com ends up as http%3a%2f%2ftroyhunt.com and you've got a problem.
As the context you're encoding for is an HTML attribute (it's the "href" attribute you're setting), you want to use Encoder.HtmlAttributeEncode:
MyHyperlink.NavigateUrl = Encoder.HtmlAttributeEncode(sourceUri);
What this means is that a string like http://troyhunt.com/<script> will get escaped to http://troyhunt.com/&lt;script&gt; - but of course Request Validation would catch that one first anyway.
Also take a look at the OWASP Top 10 Unvalidated Redirects and Forwards.
I think you can do it yourself by creating an array of the characters and another array with their codes; if you find a character from the first array, replace it with the corresponding code. This will help you, but definitely not 100%.
Character array:
<
>
...
Code array:
&lt;
&gt;
...
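A naive sketch of that idea in C# (explicitly not a complete XSS defense; prefer a real encoder or sanitizer):

static readonly string[] Chars = { "&", "<", ">", "\"", "'" };
static readonly string[] Codes = { "&amp;", "&lt;", "&gt;", "&quot;", "&#39;" };

static string NaiveHtmlEncode(string input)
{
    // '&' must come first, or the entities inserted for the other
    // characters would themselves be double-encoded.
    for (int i = 0; i < Chars.Length; i++)
        input = input.Replace(Chars[i], Codes[i]);
    return input;
}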
I rely on HtmlSanitizer. It is a .NET library for cleaning HTML fragments and documents from constructs that can lead to XSS attacks.
It uses AngleSharp to parse, manipulate, and render HTML and CSS.
Because HtmlSanitizer is based on a robust HTML parser, it can also shield you from deliberate or accidental "tag poisoning", where invalid HTML in one fragment can corrupt the whole document, leading to broken layout or style.
Usage:
var sanitizer = new HtmlSanitizer();
var html = @"<script>alert('xss')</script><div onload=""alert('xss')"""
    + @"style=""background-color: test"">Test<img src=""test.gif"""
    + @"style=""background-image: url(javascript:alert('xss')); margin: 10px""></div>";
var sanitized = sanitizer.Sanitize(html, "http://www.example.com");
Assert.That(sanitized, Is.EqualTo(@"<div style=""background-color: test"">"
    + @"Test<img style=""margin: 10px"" src=""http://www.example.com/test.gif""></div>"));
There's an online demo, plus there's also a .NET Fiddle you can play with.
(copy/paste from their readme)
I am new to ASP.NET MVC. I am getting an error when I use these characters - *#%":?<> - in a URL.
My question is - Does ASP.NET MVC handle *#%":?<> characters in the URL?
RFC 1738:
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.
Of the characters you listed, only * " and - can theoretically be used unencoded. In practice, many sites would encode all the characters you listed.
No, it does not work, even when you encode them.
It is a stupid limitation in ASP.NET.
They do work in the querystring part though, just not the path part.
Take a look at this.. While it does not solve the problem, at least you know you are not alone :)
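For what it's worth, .NET 4.0 added settings that relax this path-character check; a hedged sketch (loosening these has security implications, so treat it as a last resort):

<!-- web.config: clears the blocked-character list for the path portion -->
<system.web>
  <httpRuntime requestPathInvalidCharacters=""
               relaxedUrlToFileSystemMapping="true" />
</system.web>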
Look over here: http://www.w3schools.com/TAGS/ref_urlencode.asp
If you want the chars to be transferred as plain chars, you have to encode them, as they have a meaning in URLs.
Use encoding on the URL parameter.
Examples:
JavaScript: window.location = 'path?parameter=' + encodeURIComponent(value);
Razor: @Url.Action("Action", new { parameter = Uri.EscapeUriString(value) })
Is there a difference between Server.UrlEncode and HttpUtility.UrlEncode?
I had significant headaches with these methods before, I recommend you avoid any variant of UrlEncode, and instead use Uri.EscapeDataString - at least that one has a comprehensible behavior.
Let's see...
HttpUtility.UrlEncode(" ") == "+"        // breaks ASP.NET when used in paths;
                                         // non-standard, undocumented

Uri.EscapeUriString("a?b=e") == "a?b=e"  // makes sense, but rarely what you
                                         // want, since you still need to
                                         // escape special characters yourself
But my personal favorite has got to be HttpUtility.UrlPathEncode - this thing is really incomprehensible. It encodes:
" " ==> "%20"
"100% true" ==> "100%%20true" (ok, your url is broken now)
"test A.aspx#anchor B" ==> "test%20A.aspx#anchor%20B"
"test A.aspx?hmm#anchor B" ==> "test%20A.aspx?hmm#anchor B" (note the difference with the previous escape sequence!)
It also has delightfully specific MSDN documentation - "Encodes the path portion of a URL string for reliable HTTP transmission from the Web server to a client." - without actually explaining what it does. You are less likely to shoot yourself in the foot with an Uzi...
In short, stick to Uri.EscapeDataString.
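For example, building a query-string value safely (the URL here is illustrative):

string url = "https://example.com/search?q=" + Uri.EscapeDataString("100% true & more");
// https://example.com/search?q=100%25%20true%20%26%20more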
HttpServerUtility.UrlEncode will use HttpUtility.UrlEncode internally. There is no specific difference. The reason for existence of Server.UrlEncode is compatibility with classic ASP.
Fast-forward almost 9 years since this was first asked, and in the world of .NET Core and .NET Standard, it seems the most common options we have for URL-encoding are WebUtility.UrlEncode (under System.Net) and Uri.EscapeDataString. Judging by the most popular answer here and elsewhere, Uri.EscapeDataString appears to be preferable. But is it? I did some analysis to understand the differences and here's what I came up with:
WebUtility.UrlEncode encodes space as +; Uri.EscapeDataString encodes it as %20.
Uri.EscapeDataString percent-encodes !, (, ), and *; WebUtility.UrlEncode does not.
WebUtility.UrlEncode percent-encodes ~; Uri.EscapeDataString does not.
Uri.EscapeDataString throws a UriFormatException on strings longer than 65,520 characters; WebUtility.UrlEncode does not. (A more common problem than you might think, particularly when dealing with URL-encoded form data.)
Uri.EscapeDataString throws a UriFormatException on the high surrogate characters; WebUtility.UrlEncode does not. (That's a UTF-16 thing, probably a lot less common.)
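A quick demonstration of the space, *, and ~ differences listed above:

Console.WriteLine(WebUtility.UrlEncode("a b*c~"));   // a+b*c%7E
Console.WriteLine(Uri.EscapeDataString("a b*c~"));   // a%20b%2Ac~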
For URL-encoding purposes, characters fit into one of 3 categories: unreserved (legal in a URL); reserved (legal in a URL but with special meaning, so you might want to encode it); and everything else (must always be encoded).
According to the RFC, the reserved characters are: :/?#[]@!$&'()*+,;=
And the unreserved characters are alphanumeric and -._~
The Verdict
Uri.EscapeDataString clearly defines its mission: %-encode all reserved and illegal characters. WebUtility.UrlEncode is more ambiguous in both definition and implementation. Oddly, it encodes some reserved characters but not others (why parentheses and not brackets??), and stranger still it encodes that innocently unreserved ~ character.
Therefore, I concur with the popular advice - use Uri.EscapeDataString when possible, and understand that reserved characters like / and ? will get encoded. If you need to deal with potentially large strings, particularly with URL-encoded form content, you'll need to either fall back on WebUtility.UrlEncode and accept its quirks, or otherwise work around the problem.
EDIT: I've attempted to rectify ALL of the quirks mentioned above in Flurl via the Url.Encode, Url.EncodeIllegalCharacters, and Url.Decode static methods. These are in the core package (which is tiny and doesn't include all the HTTP stuff), or feel free to rip them from the source. I welcome any comments/feedback you have on these.
Here's the code I used to discover which characters are encoded differently:
var diffs =
    from i in Enumerable.Range(0, char.MaxValue + 1)
    let c = (char)i
    where !char.IsHighSurrogate(c)
    let diff = new {
        Original = c,
        UrlEncode = WebUtility.UrlEncode(c.ToString()),
        EscapeDataString = Uri.EscapeDataString(c.ToString()),
    }
    where diff.UrlEncode != diff.EscapeDataString
    select diff;

foreach (var diff in diffs)
    Console.WriteLine($"{diff.Original}\t{diff.UrlEncode}\t{diff.EscapeDataString}");
Keep in mind that you probably shouldn't be using either one of those methods. Microsoft's Anti-Cross Site Scripting Library includes replacements for HttpUtility.UrlEncode and HttpUtility.HtmlEncode that are both more standards-compliant, and more secure. As a bonus, you get a JavaScriptEncode method as well.
Server.UrlEncode() is there to provide backward compatibility with Classic ASP.
Server.UrlEncode(str);
is equivalent to:
HttpUtility.UrlEncode(str, Response.ContentEncoding);
In other words, Server.UrlEncode() simply calls HttpUtility.UrlEncode() internally.