Is something like
www.example.com?&hello=world
valid? Can you have ?& in a valid URL? I tested on Firefox, Chrome, and Safari, and they seem to just remove the & between the ? and the h.
Thanks
The part between ? and # (or the end of the URI) is called the “query”. The specification imposes no structure on that part beyond which characters may appear in it, so yes, it is a valid URI. The key=value format is just one possible (albeit the most common) format for that place.
As for how servers or clients that parse your URL behave, they will typically just ignore that empty section. So no, it’s not a problem, and you are unlikely to run into issues because of that extraneous ampersand.
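To see how one widely used parser handles this, here is a minimal JavaScript sketch using the WHATWG `URL` and `URLSearchParams` APIs (available in modern browsers and Node.js). It assumes those parsing rules are representative; individual server frameworks may split query strings differently.

```javascript
// Return the [name, value] pairs that the WHATWG form-urlencoded parser
// (used by URL.searchParams) extracts from a URL's query string.
function queryPairs(href) {
  return [...new URL(href).searchParams];
}

// A scheme is added because the URL constructor needs an absolute URL.
const href = "http://www.example.com?&hello=world";

// The raw query keeps the extra "&" exactly as written: ?&hello=world
console.log(new URL(href).search);

// The parser skips the empty segment between "?" and "&",
// leaving a single pair: hello -> world
console.log(queryPairs(href));
```

Note that the raw query string is preserved; only the key/value view drops the empty segment.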
What is going on with the following cookie:
"=value"
In Chrome and Firefox this is identical to:
"value"
i.e. when the cookie name is empty, the value becomes the cookie name.
Is there any official reason for this behavior?
It looks like a bug, since the RFC says:
If the name string is empty, ignore the set-cookie-string entirely.
The cookie RFC standards are a bit vague and contradictory in places, and their behaviour has also changed across revisions. Consequently, browsers also vary in how they apply the cookie requirements. So in short, some browsers accept an empty cookie name and others don't. If this is an app you're building (one that you want to work across the various browsers), you'd probably be safest setting a cookie name.
https://www.rfc-editor.org/rfc/rfc6265#section-5.2
5. If the name string is empty, ignore the set-cookie-string entirely.
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-rfc6265bis-05#section-5.3
2. If the name-value-pair string lacks a %x3D ("=") character, then the name string is empty, and the value string is the value of name-value-pair.
Otherwise, the name string consists of the characters up to, but not including, the first %x3D ("=") character, and the (possibly empty) value string consists of the characters after the first %x3D ("=") character.
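The two quoted rules can be compared side by side. The sketch below implements only the quoted steps (plus RFC 6265's separate rule that a pair without "=" is ignored); the function names are hypothetical, and real browser cookie parsers apply many more steps. Under the draft rule, "=value" and "value" both produce a cookie with an empty name and the value "value", which appears consistent with the Chrome/Firefox behaviour described in the question, while RFC 6265 would ignore both.

```javascript
// Hypothetical helper: the name-value-pair is everything before the first ";".
function nameValuePairOf(setCookie) {
  const semi = setCookie.indexOf(";");
  return semi === -1 ? setCookie : setCookie.slice(0, semi);
}

// RFC 6265 section 5.2: a pair without "=" or with an empty name means the
// whole set-cookie-string is ignored (returned here as null).
// Note: trim() strips slightly more than the RFC's WSP (space, tab).
function parseRfc6265(setCookie) {
  const pair = nameValuePairOf(setCookie);
  const eq = pair.indexOf("=");
  if (eq === -1) return null;
  const name = pair.slice(0, eq).trim();
  const value = pair.slice(eq + 1).trim();
  if (name === "") return null;
  return { name, value };
}

// draft-ietf-httpbis-rfc6265bis-05 section 5.3 step 2: a pair without "="
// becomes a cookie with an empty name; otherwise split at the first "=".
function parseRfc6265bis05(setCookie) {
  const pair = nameValuePairOf(setCookie);
  const eq = pair.indexOf("=");
  if (eq === -1) return { name: "", value: pair.trim() };
  return { name: pair.slice(0, eq).trim(), value: pair.slice(eq + 1).trim() };
}

console.log(parseRfc6265("=value"));      // null
console.log(parseRfc6265bis05("=value")); // { name: '', value: 'value' }
console.log(parseRfc6265bis05("value"));  // { name: '', value: 'value' }
```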
I stumbled upon the same question today.
To clarify @buffoonism's answer:
https://stackoverflow.com/a/72250741/2323764
The Set-Cookie header must be ignored.
According to RFC 3986 the following characters are reserved and need to be percent-encoded in order to be used in a URI other than as their reserved uses:
:/?#[]@!$&'()*+,;=
Furthermore it specifies some characters that are specifically unreserved: a-zA-Z0-9\-._~
It seems clear that generally one should encode reserved characters (to prevent misinterpretation) and not encode unreserved characters (for readability), but how should characters that do not fall into either category be handled? For example { and } do not appear in either list, but they are standard ASCII characters.
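For reference, the two RFC 3986 character classes can be checked mechanically. Under RFC 3986's grammar, a character in neither class (such as { or }) cannot appear literally in a URI at all, so it has to be percent-encoded. The `classify` function below is a hypothetical illustration, not part of any library.

```javascript
// Character classes from RFC 3986 section 2.2 (reserved) and 2.3 (unreserved).
const RESERVED = new Set(":/?#[]@!$&'()*+,;=");
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

function classify(ch) {
  if (RESERVED.has(ch)) return "reserved";
  if (UNRESERVED.test(ch)) return "unreserved";
  // Neither list: only allowed in a URI in percent-encoded form.
  return "other";
}

console.log(classify("{"));            // other
console.log(encodeURIComponent("{}")); // %7B%7D
```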
Looking to modern browsers for guidance, I found that they sometimes behave differently.
For example, consider pasting the URL https://www.google.com/search?q={ into the address bar of a web browser:
Chrome 34.0.1847.116 m does not change it.
Firefox 28.0 does not change it.
Internet Explorer 9.0 does not change it.
Safari 5.1.7 changes it to https://www.google.com/search?q=%7B
However, if one pastes https://www.google.com/#q={ (removing "search" and changing the ? to a #, making the character part of the fragment/hash rather than the query string) we find that:
Chrome 34.0.1847.116 m changes it to https://www.google.com/#q=%7B (via JavaScript)
Firefox 28.0 does not change it.
Internet Explorer 9.0 does not change it.
Safari 5.1.7 changes it to https://www.google.com/#q=%7B (before executing JavaScript)
Furthermore, when using JavaScript to perform the request asynchronously (i.e. using this MDN example modified to use a URL of ?q={), the URL is not percent-encoded automatically. (I'm guessing this is because the XMLHttpRequest API assumes the URL has already been encoded/escaped.)
I would like to (for a reason related to a bizarre customer requirement) use { and } in the filename portion of URLs without (1) breaking things and ideally also without (2) creating ugly-looking percent-encoded entries in the network panel of modern browsers' web inspectors/debuggers.
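Since the requirement concerns the filename (path) portion, it may help to see how a parser that follows the current WHATWG URL Standard (modern browsers, Node.js) treats { in each component. This reflects today's standard rather than the specific browser versions listed above, and the example.com URL is a placeholder: { and } are percent-encoded in the path but left alone in the query and fragment.

```javascript
// Same character, three different URL components.
const inPath = new URL("https://example.com/{file}.txt");
const inQuery = new URL("https://www.google.com/search?q={");
const inFragment = new URL("https://www.google.com/#q={");

// "{" and "}" belong to the path percent-encode set, so they are encoded.
console.log(inPath.pathname); // /%7Bfile%7D.txt

// They are not in the query or fragment percent-encode sets.
console.log(inQuery.search);  // ?q={
console.log(inFragment.hash); // #q={
```

In other words, with such a parser a literal { in a filename ends up as %7B in the parsed URL anyway.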
(RFC 2396) You should be encoding any of the characters in the "unwise" set, and the RFC gives the reason.
Additional information from the RFC:
Account for < > # % " primarily (the "delims" set).
Any control characters 00-1F and 7F.
Also marked as unwise in the RFC: { } | \ ^ [ ] `
If you intend to allow # in query string values, that's a special case, because # marks the start of a URI's fragment identifier.
Some characters that do not have to be encoded, such as ~, are accepted either encoded or not.
There are two generally accepted encodings for a space: %20 and + (the latter comes from form encoding and is generally only treated as a space in the query string).
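The points above can be checked with the built-in JavaScript functions. This is a small sketch: `encodeURIComponent` percent-encodes every RFC 2396 "delims" and "unwise" character, leaves ~ alone, and uses %20 for a space, while `URLSearchParams` uses the form-encoding + convention.

```javascript
// RFC 2396 "delims" and "unwise" characters are all percent-encoded.
const delims = '<>#%"';
const unwise = "{}|\\^[]`";
console.log(encodeURIComponent(delims)); // %3C%3E%23%25%22
console.log(encodeURIComponent(unwise)); // %7B%7D%7C%5C%5E%5B%5D%60

// "~" is left as-is when encoding, and "%7E" decodes to the same character.
console.log(encodeURIComponent("~"));   // ~
console.log(decodeURIComponent("%7E")); // ~

// Two encodings of a space: %20 (percent-encoding) and + (form encoding).
console.log(encodeURIComponent("a b"));                    // a%20b
console.log(new URLSearchParams({ q: "a b" }).toString()); // q=a+b
console.log(new URLSearchParams("q=a+b").get("q"));        // a b
// decodeURIComponent does not treat "+" as a space.
console.log(decodeURIComponent("a+b"));                    // a+b
```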
Should the following URLs be considered functionally equivalent?
http://example.com/foo?a=&b=
http://example.com/foo?a&b
This came about when a user of a Drupal module I wrote which parses apart and then rewrites URIs noticed that the code sometimes causes the query string parts to change in unexpected ways due to how some of the underlying PHP functions behave. For example:
parse_str("a&b", $values); print http_build_query($values);
a=&b=
Is this something I should bother worrying about?
Edit so SO stops complaining that this question is similar to another one: The question is whether it's safe to assume that "no value for X" and "empty value for X" are equivalent, not whether the "no value" style is syntactically correct (which it is).
RFC 3986 Uniform Resource Identifier (URI): Generic Syntax doesn't have anything to say about the structure of the query string aside from how characters like ? should be dealt with. So strictly speaking, your two example URLs are different. Of course, the application which receives those query strings may treat them as functionally equivalent, but this isn't something you can determine from the URL alone.
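As an illustration of "the application may treat them as equivalent", here is a JavaScript sketch using the WHATWG URL parser (chosen as one example of an application-level parser, not the PHP functions from the question). The two query strings differ as strings, but the form-urlencoded parser maps "a" and "a=" to the same pair, and re-serialising produces the "a=&b=" form, much like PHP's `http_build_query`.

```javascript
// [name, value] pairs as seen by the WHATWG form-urlencoded parser.
function queryEntries(href) {
  return [...new URL(href).searchParams];
}

const noValue = "http://example.com/foo?a&b";
const emptyValue = "http://example.com/foo?a=&b=";

// As raw strings, the two query components are different.
console.log(new URL(noValue).search === new URL(emptyValue).search); // false

// As parsed parameters, both give a -> "" and b -> "".
console.log(JSON.stringify(queryEntries(noValue)) === JSON.stringify(queryEntries(emptyValue))); // true

// Re-serialising always writes name=value.
console.log(new URL(noValue).searchParams.toString()); // a=&b=
```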
As per RFC 6570 (URI Template), empty query parameters are allowed. Please refer to section 3.2.9:
Example template expansion (with x = "1024", y = "768", empty = ""):
{&x,y,empty} expands to &x=1024&y=768&empty=
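A minimal sketch of the {&...} (form-style query continuation) operator shows why an empty value still produces "empty=": the operator always emits name=value for defined variables and only skips undefined ones. This handles plain string values only (no lists, associative arrays, or prefix modifiers), and both function names are hypothetical.

```javascript
// The {&...} operator allows only unreserved characters unencoded.
// encodeURIComponent leaves !'()* alone, so encode those as well.
function encodeUnreserved(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Expand {&name1,name2,...} for string-valued variables.
function expandFormContinuation(names, vars) {
  const parts = [];
  for (const name of names) {
    const value = vars[name];
    // Undefined variables are skipped entirely.
    if (value === undefined || value === null) continue;
    // Empty strings still get "=", giving "empty=".
    parts.push(`${name}=${encodeUnreserved(value)}`);
  }
  return parts.length ? "&" + parts.join("&") : "";
}

console.log(expandFormContinuation(["x", "y", "empty"], { x: "1024", y: "768", empty: "" }));
// &x=1024&y=768&empty=
```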
In some javascript, I have:
var url = "find.aspx?" + "location=" + encodeURIComponent( address );
alert( url );
location.href = url;
where the value of address is the string "Seattle, WA".
In the alert I see
find.aspx?location=Seattle%2C%20WA
as I expect.
But on the server side, when I look at Request.Url, the relevant substring I see is
find.aspx?location=Seattle, WA
And in the Firefox url window I see
find.aspx?location=Seattle%2C WA
So I'm getting three different representations whereas I would expect that in all three places I should see what I see in the alert. My expectation is that the url I assign to location.href should show up as-is in the browser url window, and should be passed as-is to the server in Request.Url (and I would need to decode the values on the server before using them). What's happening?
Firefox converts certain encoded characters into their literal forms as a way to be friendly to users. It will also convert spaces typed into the address bar into %20 for the server.
Update: Firefox doesn't display the comma unencoded because commas are allowed in URLs but spaces are not. It knows a space will be interpreted unambiguously, whereas to some servers a pre-encoded comma is different from a non-encoded one. See: Can I use commas in a URL?
ASP.NET is probably trying to help you out by automatically decoding the string for you.
Update: It looks like ASP.NET decodes Request.Url for you by default, as mentioned here: QueryString malformed after URLDecode. They also mention that you can use HttpRequest.Url.Query to access the undecoded version.
The alert is the only thing not doing any "magic" for you.
For the alert, you are doing the encoding yourself. It would probably look the same as on the server side if you removed encodeURIComponent.
On the server side, ASP.NET will always show you the decoded form. This makes it easier to map directly to files whose names also contain characters that needed to be encoded.
Note that you can replace every letter with its UTF-8 representation in URL encoding and it will still be the same URL. For example, type the following in the browser window and it will still work: %66%69%6E%64.aspx?location=Seattle%2C%20WA. To encode only the necessary characters, use UrlEncode on the server side if you create a link yourself.
URL encoding can become fairly tricky, and you asked for an explanation. To know the correct escape for a certain character, you need to know how that character looks in UTF-8. The hexadecimal values of the UTF-8 bytes then become the %XX%YY value of your character. Sometimes it's a single %XX, but it can be a sequence of up to four bytes (most Chinese characters, for instance, take three).
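The byte-level view can be sketched in JavaScript with `TextEncoder`, which produces UTF-8. The helper below (a hypothetical name) encodes every byte, even plain letters, to show how each character maps to one or more %XX triplets, and that such a fully encoded name still decodes to "find".

```javascript
// Percent-encode every byte of a string's UTF-8 representation.
function percentEncodeAllBytes(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  return Array.from(bytes, (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join("");
}

console.log(percentEncodeAllBytes("find"));      // %66%69%6E%64 (1 byte each)
console.log(percentEncodeAllBytes("é"));         // %C3%A9 (2 bytes)
console.log(percentEncodeAllBytes("中"));        // %E4%B8%AD (3 bytes)
console.log(percentEncodeAllBytes("😀"));        // %F0%9F%98%80 (4 bytes)
console.log(decodeURIComponent("%66%69%6E%64")); // find
```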
URL encoding should be applied exactly once in each direction. Never double-encode or double-decode; the specification prohibits encoding or decoding the same string more than once. Also, because you can encode any character, a round trip of encoding and decoding does not always give back the original string (as you found out): if you decode and re-encode, the result may well be different but still equivalent.
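A short JavaScript sketch of both hazards, using a hypothetical value that happens to contain the literal text "%25": decoding twice misreads it as an escape sequence, and decoding then re-encoding need not reproduce the original string.

```javascript
// Hypothetical literal value containing the text "%25".
const value = "100%25 off";

const once = encodeURIComponent(value); // 100%2525%20off
const twice = encodeURIComponent(once); // 100%252525%2520off (double-encoded)
console.log(twice);

// Decoding exactly once restores the original value.
console.log(decodeURIComponent(once) === value); // true

// Decoding a second time turns the literal "%25" into "%": 100% off
console.log(decodeURIComponent(decodeURIComponent(once)));

// Round trip is not the identity: "%66%69%6E%64" decodes to "find",
// which re-encodes as plain "find".
console.log(encodeURIComponent(decodeURIComponent("%66%69%6E%64"))); // find
```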
In HTML, URL encoding is sometimes interspersed with HTML encoding. For example, a bare ampersand is valid in a URL, but not in HTML: find.aspx?city=A&name=B becomes find.aspx?city=A&amp;name=B in an HTML URL. However, browsers are lenient and will accept wrongly HTML-encoded strings.
Finally, a note on the browser: if you type a space in a link, even inside an <a> tag, it will escape the space (or other character) for you. Likewise, browsers nowadays show unusual characters (é, ï, etc.) in the address bar, but when sending the request over HTTP, the browser will correctly encode them for you.
Update: regarding your request for a "definitive" reference or proof:
Since I couldn't find one on the internet, I decided to look for it myself using Reflector. Going through the methods that set, for instance, HttpRequest.QueryString, you quickly encounter the private method HttpRequest.FillInQueryStringCollection, which then calls HttpValueCollection.FillFromEncodedBytes. Near the end of that method, HttpUtility.UrlDecode is called for the values. Conclusion: do not call it yourself, to prevent double decoding.
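The two layers can be kept apart by URL-encoding the values first and HTML-escaping the finished URL second. The helper below is a hypothetical sketch for a double-quoted attribute; "&" is replaced first so the entities added afterwards are not escaped again.

```javascript
// Escape text for use inside a double-quoted HTML attribute.
function escapeHtmlAttribute(text) {
  return text
    .replace(/&/g, "&amp;") // must come first
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Layer 1: the URL itself, with "&" as the parameter separator.
const href = "find.aspx?city=A&name=B";
// Layer 2: HTML-escape the URL when writing it into markup.
const anchor = `<a href="${escapeHtmlAttribute(href)}">Find</a>`;
console.log(anchor); // <a href="find.aspx?city=A&amp;name=B">Find</a>

// An ampersand inside a value is URL-encoded (%26), so HTML escaping leaves it alone.
console.log(escapeHtmlAttribute("find.aspx?city=" + encodeURIComponent("A&B"))); // find.aspx?city=A%26B
```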
You can see this for yourself when you download Reflector and disassemble the .NET libs of System.Web.
For your example you can change this line
var url = "find.aspx?" + "location=" + encodeURIComponent( address );
to
var url = "find.aspx?" + "location=" + address;
and see the address as it is. But if the address variable contains an '&' character, your URL will be corrupted. That is why you use encodeURIComponent to encode such characters in the URL.
On the server side, all these encoded strings are decoded back. In other words, encodeURIComponent is just for sending the address variable to the server side correctly (whether or not it contains an & character).
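The corruption can be reproduced in JavaScript by parsing the query part with `URLSearchParams`, used here as a stand-in for server-side parsing. The `queryParam` helper and the address value containing "&" are hypothetical.

```javascript
// Hypothetical helper: read one query parameter from a relative URL string.
function queryParam(url, name) {
  const query = url.slice(url.indexOf("?") + 1);
  return new URLSearchParams(query).get(name);
}

const address = "Smith & Sons, Seattle, WA"; // hypothetical value containing "&"
const raw = "find.aspx?" + "location=" + address;
const encoded = "find.aspx?" + "location=" + encodeURIComponent(address);

// Without encoding, the "&" splits the value: "Smith "
console.log(JSON.stringify(queryParam(raw, "location")));
// With encodeURIComponent, the parsed value matches the original address.
console.log(queryParam(encoded, "location") === address); // true
```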
I have a query string parameter value that contains an ampersand. For example, a valid value for the parameter may be:
a & b
When I generate the URL that contains the parameter, I'm using System.Web.HTTPUtility.UrlEncode() to make each element URL-friendly. It's (correctly) giving me a URL like:
http://example.com/foo?bar=a+%26+b
The problem is that ASP.NET's Request object is interpreting the (encoded) ampersand as a Query String parameter delimiter, and is thus splitting my value into 2 parts (the first has "bar" as the parameter name; the second has a null name).
It appears that ASP.NET is URL-decoding the URL first and then using that when parsing the query string.
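The suspected behaviour (decode first, then split) can be reproduced as an analogy in JavaScript; this does not show what URLRewriter or ASP.NET actually run. Splitting on "&" before decoding keeps the value intact, while decoding the whole query first turns %26 back into a real separator.

```javascript
// What form-style encoding of "a & b" looks like in the query string.
const query = "bar=a+%26+b";

// Correct: split on "&" first, then decode each name and value.
console.log(new URLSearchParams(query).get("bar")); // a & b

// Buggy: decode the whole query first, then split.
// decodeURIComponent turns %26 into "&" (and leaves "+" alone).
const decodedFirst = decodeURIComponent(query); // bar=a+&+b
const params = new URLSearchParams(decodedFirst);
console.log(JSON.stringify(params.get("bar"))); // "a "
console.log([...params.keys()]); // the stray "+b" became a second parameter
```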
What's the best way to work around this?
UPDATE: The problem hinges on URLRewriter (a third-party plugin) and not ASP.NET itself. I've changed the title to reflect this, but I'll leave the rest of the question text as-is until I find out more about the problem.
Man,
I'm in the same boat. I've spent hours and hours trying to figure out what the problem is, and as you said, it is a bug in both, since normal links that contain unusual characters or UTF-8 characters are parsed fine by ASP.NET.
I think we'll have to switch to MVC routing.
Update: Man, you won't believe it. I've found the problem, and it's very strange: it's with IIS.
Try launching your page from the Visual Studio dev server and Unicode characters will be parsed just fine, but if you launch the page from IIS 7 you get ???? characters instead.
Hope somebody can shed some light on this.
I would have thought that %26 and '&' mean exactly the same thing to the web server, so it's the expected behavior. UrlEncode is for encoding URLs, not for encoding query strings.
... hang on ...
Try searching for abc&def in Google; you'll get:
http://www.google.com.au/search?q=abc%26def
So your query string is correct: %26 is a literal ampersand. Hmm, you're right, it sounds like a bug. How do you go with a plain & instead of the %26?
Interesting reading:
http://www.stylusstudio.com/xsllist/200104/post11060.html
Switching to UrlRewritingNet.UrlRewrite did not help, as it apparently has the same bug. I'm thinking it might have something to do with ASP.NET after all.
I think URLRewriter has a problem with nameless parameters (null name).
I had a similar problem. When I gave my nameless parameter a (dummy) name, everything worked as expected.