Parameter separator in URLs, the case of misused question mark - http

What I don't really understand is the benefit of using '?' instead of '&' in urls:
It makes nobody's life easier if we use a different character as the first separator character.
Can you come up with a reasonable explanation?
EDIT: after more research I found that "&" can be a part of file name (terms&conditions.html) so "?" is a good separator. But still I think using "?" for separators makes lives easier (from url generators and parsers point of view):
Is there any advantage in using "&" which is not clear at the first glance?

From the URI spec's (RFC 3986) point of view, the only separator here is "?". the format of the query is opaque; the ampersands just are something that HTML happens to use for form submissions.

The answer's pretty much in this article - http://www.skorks.com/2010/05/what-every-developer-should-know-about-urls/ . To highlight it, here goes :
Query is the preferred way to send some parameters to a resource on
the server. These are key=value pairs and are separated from the rest
of the URL by a ? (question mark) character and are normally separated
from each other by & (ampersand) characters. What you may not know is
the fact that it is legal to separate them from each other by the ;
(semi-colon) character as well. The following URLs are equivalent:
http://www.blah.com/some/crazy/path.html?param1=foo&param2=bar
http://www.blah.com/some/crazy/path.html?param1=foo;param2

The RFC 3896 (https://www.ietf.org/rfc/rfc3986.txt) defines general and sub delimiters ... '?' is a general, '&' and ';' are sub. The spec is pretty clear about that.
In this case the latter '?' chars would be treated as part of the query. If the query parser follows the spec strictly, it would then pass the whole query on to the app-destination. If the app-destination could choose to further process the query string in a manner which treats the ? as a param name-value pairs delimiter, that is up to the app's designers.
My guess is that this often 'just works' because code that splits query strings and the original uri uses all delimiters for matching: 1) first query is split on '?' then 2) query string is parsed using char match list that includes '?' (convenience only).... This could be occurring in ubiquitous parsing libraries already.

Related

What's the best way to get a file's basename from an URI, in Vala?

I can think of two ways: first, I could just manipulate the string itself; strip everything that precedes the last "/". Or, I could use the URI to get a File object, then call query_info().get_display_name().
The first doesn't feel right, while the second results in two objects being created. What is the best practice to follow here?
The second way (using GLib.File) is probably the most robust, but...
If what you have is really a path, not a URI (e.g.., /home/foo/bar not file:///home/foo/bar) you can just use GLib.Path.get_basename:
GLib.Path.get_basename ("/home/foo/bar");
Because characters in a URI can be encoded (e.g., %20 instead of a space), if you really have a URI you may need to unescape the string first:
GLib.Path.get_basename (GLib.Uri.unescape_string ("file:///home/foo/bar%20baz"));

In ASP.NET, why is there UrlEncode() AND UrlPathEncode()?

In a recent project, I had the pleasure of troubleshooting a bug that involved images not loading when spaces were in the filename. I thought "What a simple issue, I'll UrlEncode() it!" But, NAY! Simply using UrlEncode() didn't resolve the problem.
The new problem was the HttpUtilities.UrlEncode() method switched spaces () to plusses (+) instead of %20 like the browser wanted. So file+image+name.jpg would return not-found while file%20image%20name.jpg was found correctly.
Thankfully, a coworker pointed out HttpUtilities.UrlPathEncode() to me which uses %20 for spaces instead of +.
WHY are there two ways of handling Url encoding? WHY are there two commands that behave so differently?
UrlEncode is useful for use with a QueryString as browsers tend to use a + here in place of a space when submitting forms with the GET method.
UrlPathEncode simply replaces all characters that cannot be used within a URL, such as <, > and .
Both MSDN links include this quote:
You can encode a URL using with the UrlEncode method or the
UrlPathEncode method. However, the methods return different results.
The UrlEncode method converts each space character to a plus character
(+). The UrlPathEncode method converts each space character into the
string "%20", which represents a space in hexadecimal notation. Use
the UrlPathEncode method when you encode the path portion of a URL in
order to guarantee a consistent decoded URL, regardless of which
platform or browser performs the decoding.
So in a URL you have the path and then a ? and then the parameters (i.e. http://some_path/page.aspx?parameters). URL paths encode spaces differently then the url parameters, that's why there is the two versions. For a long time spaces were not valid in a URL, but were in in the parameters.
In other words the formatting urls has changed over time. For a long time only ANSI chars could be in a URL too.

Query string: Can a query string contain a URL that also contains query strings?

Example:
http://foo.com/generatepdf.aspx?u=http://foo.com/somepage.aspx?color=blue&size=15
I added the iis tag because I am guessing it also depends on what server technology you use?
The server technology shouldn't make a difference.
When you pass a value to a query string you need to url encode the name/value pair. If you want to pass in a value that contains a special character such as a question mark (?) you'll just need to encode that character as %3F. If you then needed to recursively pass another query string to the encoded url, you'll need to double/triple/etc encode the url resulting in the original ? turning into %253F, %25253F, etc.
you'll probably want to UrlEncode the url that is in the query string.
As reported in http://en.wikipedia.org/wiki/Query_string
W3C recommends that all web servers support semicolon separators in
addition to ampersand separators (link reported on that wiki page) to allow
application/x-www-form-urlencoded query strings in URLs within HTML
documents without having to entity escape ampersands.
So, I suppose the answer to the question is yes and you have to change in a ";" semicolon the "&" ampersand usaully used for key=value separator.
Yes it can, as far as I can tell, according to RFC 3986: Uniform Resource Identifier (URI): Generic Syntax (from year 2005):
This is the BNF for the query string:
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "#"
The spec says:
The characters slash ("/") and question mark ("?") may represent data within the query component.
as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters
(But I suppose your server framework might or might not follow the specification exactly.)
No, but you can encode the url and decode it later.

regular expression for physical path

can someone tell me the javascript regular expression for physical path like
1) User Should enter something like this in the textbox( c://Folder1/) . Maybe in d: or e:
2) But after that acceptable
a) (c://Folder1/Folder2/)
b) (d://Folder1/Folder2/Folder3/abc.txt)
e) (c://Folder1/Folder2/Folder3/abc.txt)
From the examples you've given, something like this should work:
[a-zA-Z]://(\w+/)+
ie:
[a-zA-Z] = a single letter (upper or lower case)
followed by
:// = the characters "://"
followed by:
(\w+/)+ = at least one "something/".
"something/" defined as :
\w+ = at least one word character (ie any alphanumeric), followed by
/ = literal character "/"
Hope this helps - my syntax may be a little off as I'm not fully up to speed on the javascript variant for regex.
Edit: put regex in code tags so it is visible! And tidy up explanation.
This problem is actually trickier than you think. You're trying to validate a path, but paths can be surprisingly hard to properly validate. Are you properly handling UNC network paths, e.g.?
This is known as the canonicalization problem and is part of writing secure code. I suggest checking out some guidance from Microsoft for properly canonicalizing and validating the path in your application. The advantage of canonicalizing your path is that you also implicitly validate its format because the canonical form will be returned from a library call that will only return paths that are potentially valid (properly formatted). This means that you don't have to do any sort of regex validation at all. Just throw your string at the method that canonicalizes the path (Path.GetFullPath() probably) and handle the exception for an invalid path.

Help with a regular expression to validate a series of n email addresses seperated by semicolons

I'm using an asp.net Web Forms RegularExpressionValidator Control to validate a text field to ensure it contains a series of email addresses separated by semicolons.
What is the proper regex for this task?
I think this one will work:
^([A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}(;|$))+
Breakdown:
[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4} : valid email (from http://www.regular-expressions.info/)
(;|$) : either semicolon or end of string
(...)+ : repeat all one or more times
Make sure you are using case-insensitive matching. Also, this pattern does not allow whitespace between emails or at the start or end of the string.
The 'proper' (aka RFC2822) regex is too complicated. Try something like (\S+#[a-zA-Z0-9-.]+(\s*;\s*|\s*\Z))+
Not perfect but should be there 90% (haven't tried it, so it might need some alteration)
Note: Not too sure about \Z it might be a Perl only thing. Try $ as well if it doesn't work.

Resources