How does StackExchange handle invalid characters in route URLs?

Scott Hanselman's post on using wacky characters in a request URL explains how IIS and ASP.NET security features can be circumvented to allow invalid characters to be passed in a URL... but I am sure StackExchange is doing it differently, since his approach would leave the site wide open to nasty attacks and bugs.
StackExchange has links to tags, like C#, that are sent to the web server URL-encoded in a GET request, like this:
// C#
http://stackoverflow.com/questions/tagged/c%23
// C++
http://stackoverflow.com/questions/tagged/c%2b%2b
The trick is... they are sent as request path values (i.e. route parameters), not as values in a query string...
If you read Hanselman's article, he suggests this is only possible by turning off several other security features beyond RequestValidation (disabling the latter only permits encoded characters in the query-string portion of a URL).
Questions
How does StackExchange accomplish this?
If it is done the same way Hanselman illustrates in his blog, what extra steps do they take to protect themselves?

They don't accept just any character. They use slugs: pre-generated, URL-safe versions of the tag names, so the route only ever sees strings the site produced itself.
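For illustration, here is a minimal slugging sketch in C#. This is a guess at the general technique, not StackExchange's actual code; the ToTagSlug name and the character whitelist are invented for the example:

using System;
using System.Text;

static class TagSlugs
{
    // Hypothetical slug generator: pass through known-safe characters,
    // percent-encode everything else.
    public static string ToTagSlug(string tag)
    {
        var sb = new StringBuilder();
        foreach (char c in tag.ToLowerInvariant())
        {
            if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '-' || c == '.')
                sb.Append(c);
            else
                sb.Append(Uri.EscapeDataString(c.ToString())); // '#' -> "%23", '+' -> "%2B"
        }
        return sb.ToString();
    }
}

// TagSlugs.ToTagSlug("C#")  -> "c%23"
// TagSlugs.ToTagSlug("C++") -> "c%2B%2B" (percent-encoding is case-insensitive)

Because every tag link is generated from this fixed set, the server routes only strings it produced itself and never has to trust raw user input in the path.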

Related

How to sanitize the query string input against cross site scripting - reflected issues in ASP .Net (Web Forms) Application?

I am working on a legacy application which has some reflected cross-site scripting issues when input is taken from the query string. The issues are being reported by the Fortify code scan (WebInspect) tool.
For example:
I have a page called ProgressDisplay.aspx which takes reportPath as a query string parameter.
/ReportViewer/ProgressDisplay.aspx?reportPath=%27%3b%61%6c%65%72%74%28%35%36%36%34%35%29%2f%2f
In the above URL, reportPath is the query-string parameter through which the malicious payload is passed, and the payload shows an alert in the response.
The payload decodes to ';alert(56645)// and executes alert(56645) after rendering.
Several similar issues are being reported. Is there a centralized approach to fix them all at one shot, using an ASP.NET library or a config change, or will I have to fix every issue one by one?
After the fix, the page shouldn't return a 200 response when a malicious script is injected; it has to return a Bad Request instead.
Use the recommendations below to avoid cross-site scripting attacks in .NET:
URL encoding: prevents malicious script from being injected into a URL.
Note: you can use the Microsoft Web Protection Library (WPL) to mitigate reflected XSS issues.
e.g. <a href="<%# Microsoft.Security.Application.Encoder.UrlEncode(TEST.Url) %>">View Details</a>
Enable a Content Security Policy (CSP).
Validate input data: input should be validated before it is used.
See Microsoft's XSS prevention guidance for .NET.
Encode data on output.
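The asker's "Bad Request" requirement can be met centrally with a classic ASP.NET IHttpModule that inspects every query-string value before any page runs. A minimal sketch, with an invented module name and an illustrative whitelist; widen the allowed set to whatever your pages legitimately accept, and register the module under <modules> in web.config:

using System;
using System.Text.RegularExpressions;
using System.Web;

public class QueryStringGuardModule : IHttpModule
{
    // Illustrative whitelist: letters, digits, and a few path characters.
    private static readonly Regex Allowed = new Regex(@"^[A-Za-z0-9 _/\-.]*$");

    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            var ctx = ((HttpApplication)sender).Context;
            foreach (string key in ctx.Request.QueryString)
            {
                string value = ctx.Request.QueryString[key]; // already URL-decoded
                if (value != null && !Allowed.IsMatch(value))
                {
                    ctx.Response.StatusCode = 400; // Bad Request instead of 200
                    ctx.Response.End();            // stop the pipeline here
                }
            }
        };
    }

    public void Dispose() { }
}

With this in place, the ';alert(56645)// payload above is rejected with a 400 before ProgressDisplay.aspx ever executes. A filter like this is a safety net, though, not a substitute for encoding data on output.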

How to obtain urls with descriptive content

ASP.NET Web Forms: I need advice about handling URLs the way many news sites do.
Example: I've seen sites publish an article about a soccer match with a URL like this:
http://siteAddress/soccer/big-match-Milan-Champions-league
In an intranet scenario (which I'm used to), I would have a table with a numeric id, and my URL would have been this:
http://ipAddress/article.aspx?id=345
How can I obtain a URL like that?
I know the URL-rewriting concepts in ASP.NET.
Thank you!
ScottGu's article describes several ways to solve this problem: http://weblogs.asp.net/scottgu/archive/2007/02/26/tip-trick-url-rewriting-with-asp-net.aspx
On my site, I created an HttpModule (like his 2nd & 3rd approaches). The module calls the database to lookup the article ID that matches the given URL. If a URL is found, then the module calls Context.RewritePath to send the incoming request to the correct page.
If you do this, make sure to cache the database calls. It would be best to not call the database for every incoming request. Depending on how often you generate new content, you could pre-fetch all of the "url to article ID" mappings from your database so that you wouldn't need any database hits to rewrite the URL.
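A minimal sketch of such a module, assuming a hypothetical LookupArticleId helper that queries your article table; the cache is a plain ConcurrentDictionary for brevity, where a real module might prefer System.Web.Caching with expiration:

using System.Collections.Concurrent;
using System.Web;

public class FriendlyUrlModule : IHttpModule
{
    // Cache url -> article id so the database isn't hit on every request.
    private static readonly ConcurrentDictionary<string, int> Cache =
        new ConcurrentDictionary<string, int>();

    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            var ctx = ((HttpApplication)sender).Context;

            // e.g. "/soccer/big-match-Milan-Champions-league"
            int id = Cache.GetOrAdd(ctx.Request.Path, LookupArticleId);
            if (id > 0)
                ctx.RewritePath("~/article.aspx?id=" + id); // serve the real page
        };
    }

    // Placeholder for the real database lookup; returns 0 for unknown paths.
    private static int LookupArticleId(string path)
    {
        return 0; // query your "url -> id" table here
    }

    public void Dispose() { }
}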

Is there any downside to using a leading double slash to inherit the protocol in a URL? i.e. src="//domain.example"

I have a stylesheet that loads images from an external domain and I need it to load from https:// from secure order pages and http:// from other pages, based on the current URL. I found that starting the URL with a double slash inherits the current protocol. Do all browsers support this technique?
HTML ex:
<img src="//cdn.domain.example/logo.png" />
CSS ex:
.class { background: url(//cdn.domain.example/logo.png); }
If the browser supports RFC 1808 Section 4, RFC 2396 Section 5.2, or RFC 3986 Section 5.2, then it will indeed use the page URL's scheme for references that begin with "//".
When used on a <link> or @import, IE7/IE8 will download the file twice, per http://paulirish.com/2010/the-protocol-relative-url/
Update from 2014:
Now that SSL is encouraged for everyone and no longer has performance concerns, this technique is an anti-pattern. If the asset you need is available on SSL, then always use the https:// asset.
One downside occurs if your URLs are viewed outside the context of a web page. For example, an email message sitting in an email client (say, Outlook) effectively has no URL, and when you're viewing a message containing a protocol-relative URL, there is no obvious protocol context at all (the message itself is independent of the protocol used to fetch it, whether it's POP3, IMAP, Exchange, uucp or whatever) so the URL has no protocol to be relative to. I've not investigated compatibility with email clients to see what they do when presented with a missing protocol handler - I'm guessing that most will take a guess at http. Apple Mail refuses to let you enter a URL without a protocol. It's analogous to the way that relative URLs do not work in email because of a similarly missing context.
Similar problems could occur in other non-HTTP contexts such as in tweets, SMS messages, Word documents etc.
The more general explanation is that anonymous protocol URLs cannot work in isolation; there must be a relevant context. In a typical web page it's thus fine to pull in a script library that way, but any external links should always specify a protocol. I did try one simple test: //stackoverflow.com maps to file:///stackoverflow.com in all browsers I tried it in, so they really don't work by themselves.
The reason could be to make web pages portable. If the outer page is not transported encrypted (http), why should the linked scripts be encrypted? That would seem an unnecessary performance loss. If the outer page is transported encrypted (https), then the linked content should be encrypted too; if the page is encrypted but the linked content is not, IE issues a Mixed Content warning, because an attacker could manipulate the scripts in transit. See http://ie.microsoft.com/testdrive/Browser/MixedContent/Default.html?o=1 for a longer discussion.
The HTTPS Everywhere campaign from the EFF suggests using https whenever possible. We have the server capacity these days to serve web pages encrypted at all times.
Just for completeness, this was mentioned in another thread:
The "two forward slashes" are a common shorthand for "whatever protocol is being used right now"
if (plain http environment) {
    use 'http://example.com/my-resource.js'
} else {
    use 'https://example.com/my-resource.js'
}
Please check the full thread.
It seems to be a pretty common technique now. There is no downside; it only helps unify the protocol for all assets on the page, so it should be used wherever possible.

RFC question about cookies and paths

I'm trying to set a session cookie restricted to a particular path (let's say /foo) when a user logs in. The complication being that the login page is on /, but the request immediately redirects to /foo/something. Something like this:
Request:
POST / HTTP/1.1
username=foo&password=bar
Response:
HTTP/1.0 302 Found
Location: http://example.com/foo/home
Set-Cookie: session=whatever; path=/foo
However, the relevant bits of the RFCs I could find (RFC 2109 and RFC 2965) say this:
To prevent possible security or privacy violations, a user agent rejects a cookie (shall not store its information) if any of the following is true:
The value for the Path attribute is not a prefix of the request-URI.
...
The cookie-setting process described above seems to work okay, but as far as I can tell the RFCs are saying it shouldn't.
I'd like to use this in a production system, but I really don't want to do that if I'm going to face horrible browser incompatibility problems later.
Am I misreading the RFCs?
Thanks in advance!
Don't pay any attention to those RFCs; they diverge from reality pretty badly.
There's currently an IETF WG that's documenting actual cookie behaviour; their document, while just a draft, is much better source material.
See:
http://datatracker.ietf.org/doc/draft-ietf-httpstate-cookie/
If you don't find text that addresses your question in the draft, bring it up with the Working Group!
Based on your question, I think your understanding of the RFC is correct. It sounds like you want to set the cookie after the redirect to '/foo/home'. I think the real question is: "How do you tell '/foo/home' that the user was authenticated correctly by '/'?"
If you must use a Location header (redirect) to get from '/' to '/foo/home', it seems the only way to do this would be to use a query string parameter in the Location header's value.
Maybe a design question to consider is: why are users authenticating against a URL outside of the path they will be accessing securely? If the only secure content is under '/foo', then why not POST to '/foo/login' instead of '/' for authentication?
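For example, moving the login under the protected path makes the exchange satisfy the prefix rule (a sketch reusing the names from the question):

POST /foo/login HTTP/1.1
username=foo&password=bar

HTTP/1.0 302 Found
Location: http://example.com/foo/home
Set-Cookie: session=whatever; path=/foo

Here path=/foo is a prefix of the request-URI /foo/login, so even a strictly RFC-conforming user agent will accept the cookie.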

Accessing Jump Links (the part of the URL after a hash character, #) from the code-behind

Anyone know if it's possible to access the name of a jump link in C# code?
I'm doing some URL rewriting stuff, and I'm thinking I might not be able to see that part of the URL.
So basically, my URL looks a little something like this:
http://www.mysite.com/Terms.aspx#Term1
And I want to access "Term1". I can't see it in the ServerVariables...
Any ideas?
Thanks!
The hash character is meant for client-side navigation. Anything after the # is not sent to the server.
From the wikipedia article:
The fragment identifier functions differently than the rest of the URI: namely, its processing is exclusively client-side with no participation from the server. When an agent (such as a Web browser) requests a resource from a Web server, the agent sends the URI to the server, but does not send the fragment.
Its technical name is Fragment Identifier
Perhaps System.Uri.Fragment? Or what is it you don't see?
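For what it's worth, System.Uri.Fragment does parse the fragment out of a complete URL string; the catch is that the server never has one to give it, because the browser strips the fragment before sending the request. A small sketch (the URL literal is taken from the question):

using System;

class FragmentDemo
{
    static void Main()
    {
        // Parsing a full URL string yields the fragment -- prints "#Term1".
        var uri = new Uri("http://www.mysite.com/Terms.aspx#Term1");
        Console.WriteLine(uri.Fragment);

        // Server-side, Request.Url.Fragment is "" for the same page,
        // since the fragment never leaves the browser.
    }
}

To use the value server-side you would have to send it up yourself, e.g. read it with client-side script and copy it into a query-string or form field.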
