How exactly does an array get urlencoded? - http

Many languages allow one to pass an array of values through the URL. For various reasons, I need to construct the URL directly by hand. How is an array of values urlencoded?

It looks like the content takes the form of the MIME type application/x-www-form-urlencoded.
This is the default content type. Forms submitted with this content type must be encoded as follows:
Control names and values are escaped. Space characters are replaced by +, and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by %HH, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., %0D%0A).
The control names/values are listed in the order they appear in the document. The name is separated from the value by = and name/value pairs are separated from each other by &.
That is the encoding used for POST. For GET, append a ? to your URL followed by the encoded pairs; the rest is almost the same. In the comments, mdma states that the URL may not contain a + for a space character; use %20 instead.
So an array of values:
http://localhost/someapp/?0=zero&1=valueone%20withspace&2=etc&3=etc
Libraries often provide a function that does the URL encoding of each value for you. Building the full string is then easy to implement: loop over your array and, for each entry, append the index, an =, the URL-encoded value and, when it is not the last entry, an &.
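The loop described above can be sketched in Python. This is only an illustration under assumptions: the helper name encode_array is made up, and quote from the standard urllib.parse module stands in for whatever encoding function your own library offers.

```python
from urllib.parse import quote


def encode_array(values):
    """Build an index=value query string from a list of values (hypothetical helper)."""
    pairs = []
    for index, value in enumerate(values):
        # safe="" percent-encodes every reserved character; a space becomes %20
        pairs.append(f"{index}={quote(str(value), safe='')}")
    # Joining with & puts a separator between entries but not after the last one
    return "&".join(pairs)


base_url = "http://localhost/someapp/"  # base URL taken from the example above
values = ["zero", "valueone withspace", "etc", "etc"]
print(base_url + "?" + encode_array(values))
# http://localhost/someapp/?0=zero&1=valueone%20withspace&2=etc&3=etc
```

Whether the receiving side turns the keys 0, 1, 2, ... back into an array depends on the server framework, so check what yours expects.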

Related

Semicolon in URLs

I have a URL like this: localhost:8080/demo/
When I call localhost:8080/demo/''''''''' it works fine.
But when I try localhost:8080/demo/;;; it does not work and returns HTTP code 404 Not Found.
I tried a few other special characters (# % \ ? /), and it returned 400 too.
Can anyone explain this to me?
Thank you so much!
These special characters are not directly allowed in URLs,
because they have special meanings there.
For example:
/ is separator within the path,
? marks the query part of a URL,
# marks a page-internal link,
etc.
Quoted from Wikipedia: Percent-encoding reserved characters:
When a character from the reserved set (a "reserved character")
has special meaning (a "reserved purpose") in a certain context,
and a URI scheme says that it is necessary to use that character
for some other purpose, then the character must be percent-encoded.
Percent-encoding a reserved character involves converting the
character to its corresponding byte value in ASCII and then
representing that value as a pair of hexadecimal digits. The digits,
preceded by a percent sign (%) which is used as an escape character,
are then used in the URI in place of the reserved character.
For example: ; is a reserved character. Therefore, when ; must appear
in a URL without its special meaning, it needs to be
replaced by %3B.
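To see the replacement in practice, here is a small Python sketch using quote and unquote from the standard urllib.parse module. The URL is the one from the question; how the server treats the encoded segment is a separate matter and is not shown here.

```python
from urllib.parse import quote, unquote

segment = ";;;"
# safe="" makes quote() escape every reserved character, including ; ? # and /
encoded = quote(segment, safe="")
print(encoded)                            # %3B%3B%3B
print("localhost:8080/demo/" + encoded)   # localhost:8080/demo/%3B%3B%3B

# Decoding restores the original characters
print(unquote(encoded))                   # ;;;
```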

HTTP POST request - sending key value pairs - value contains '&'

'&' is used as the separator between key-value pairs.
But one of my values contains '&'. How can I send this data?
You will have to URL-encode it (most HTTP-related libraries have utility functions for doing so).
For example, key=&value will become key=%26value
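As one example of such a utility function, Python's standard urllib.parse module has urlencode, which escapes the & inside the value. This is a sketch of the idea, not a statement about the library you happen to be using.

```python
from urllib.parse import urlencode, parse_qs

# The & inside the value is escaped, so it cannot be mistaken for a pair separator
body = urlencode({"key": "&value"})
print(body)  # key=%26value

# Decoding on the receiving side gives the original value back
print(parse_qs(body))  # {'key': ['&value']}
```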
You can find more information in Wikipedia.
When a character from the reserved set (a "reserved character") has
special meaning (a "reserved purpose") in a certain context, and a URI
scheme says that it is necessary to use that character for some other
purpose, then the character must be percent-encoded. Percent-encoding
a reserved character involves converting the character to its
corresponding byte value in ASCII and then representing that value as
a pair of hexadecimal digits. The digits, preceded by a percent sign
("%") which is used as an escape character, are then used in the URI
in place of the reserved character. (For a non-ASCII character, it is
typically converted to its byte sequence in UTF-8, and then each byte
value is represented as above.) The reserved character "/", for
example, if used in the "path" component of a URI, has the special
meaning of being a delimiter between path segments. If, according to a
given URI scheme, "/" needs to be in a path segment, then the three
characters "%2F" or "%2f" must be used in the segment instead of a raw
"/".
This question is probably a duplicate of this.

Can the HTML 'class' element attribute contain line breaks?

Can the 'class' attribute of HTML5 elements contain line breaks? Is it allowable in the specs and do browsers support it?
I ask because I have some code that dynamically inserts various classes into the element and this has created one very long line that is hard to manage. Normally I would build the class value using a variable but the CMS I'm using requires the template conditional tags to be positioned inline with the HTML. I can't use variables or PHP.
What I found in my research is that some HTML tag attributes need to be a single line, but I haven't been able to discover if the class attribute is one of those.
Does anyone know something about this?
Per the HTML 4 spec, the class attribute is CDATA:
User agents should interpret attribute values as follows:
o Replace character entities with characters
o Ignore line feeds
o Replace each carriage return or tab with a single space.
so you're in good shape there.
The HTML5 spec describes a class as a set of space separated tokens, where a 'space' includes newlines.
So you should be good there, too.
Can the [class] attribute of HTML5 elements contain line breaks?
Yes. The HTML5 spec says:
The attribute, if specified, must have a value that is a set of space-separated tokens representing the various classes that the element belongs to.
The link proceeds to say:
A set of space-separated tokens is a string containing zero or more words (known as tokens) separated by one or more space characters, where words consist of any string of one or more characters, none of which are space characters.
And space characters include:
space (' ')
tab (\t)
line feed (\n)
form feed (\f)
carriage return (\r)
The space characters, for the purposes of this specification, are U+0020 SPACE, "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), and "CR" (U+000D).
Newlines as you would add to UTF-8 documents are:
line feeds (\n)
carriage returns (\r)
a carriage return followed immediately by a line feed (\r\n)
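To illustrate the tokenisation rule quoted above, here is a rough Python sketch that splits a class value on the five space characters the spec lists. The function name class_tokens and the sample class names are made up for the example, and this is not how any browser is actually implemented.

```python
import re

# The five HTML5 space characters: space, tab, line feed, form feed, carriage return
HTML_SPACE = r"[ \t\n\f\r]+"


def class_tokens(value):
    """Split a class attribute value into its tokens (hypothetical helper)."""
    # Drop the empty strings that leading or trailing whitespace would produce
    return [token for token in re.split(HTML_SPACE, value) if token]


# A class value spread over several lines, as a template might generate it
attr = "nav\n    nav-primary\r\n    is-active\tlarge"
print(class_tokens(attr))  # ['nav', 'nav-primary', 'is-active', 'large']
```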

HttpServerUtility.UrlPathEncode vs HttpServerUtility.UrlEncode

What's the difference between HttpServerUtility.UrlPathEncode and HttpServerUtility.UrlEncode? And when should I choose one over the other?
UrlEncode is useful for query string names and values (that is, whatever sits to the left or, especially, to the right of each =).
In this url, foo, fooval, bar, and barval should EACH be UrlEncode'd separately:
http://www.example.com/whatever?foo=fooval&bar=barval
UrlEncode encodes everything, such as ?, &, =, and /, accented or other non-ASCII characters, etc., into %-style encoding, except space, which it encodes as a +. This is form-style encoding, and is best for something you intend to put in the querystring (or maybe between two slashes in a url) as a parameter without it getting all jiggy with the url's control characters (like &). Otherwise an unfortunately placed & or = in a user's form input or db value could break things.
EDIT: Uri.EscapeDataString is a very close match to UrlEncode, and may be preferable, though I don't know the exact differences.
UrlPathEncode is useful for the rest of the URL: it affects everything to the left of the ?.
In this url, the entire url (from http to barval) should be run through UrlPathEncode.
http://www.example.com/whatever?foo=fooval&bar=barval
UrlPathEncode does NOT encode ?, &, =, or /. It DOES, however, like UrlEncode, encode accented/non-ASCII characters with % notation, and space also becomes %20. This is useful to make sure the url is valid, since spaces and accented characters are not. It won't touch your querystring (everything to the right of ?), so you have to encode that with UrlEncode, above.
Update: as of 4.5, per the MSDN reference, Microsoft recommends using only UrlEncode. Also, the information previously listed in MSDN does not fully describe the behavior of the two methods - see comments.
The difference is all in the space escaping - UrlEncode escapes spaces into a + sign, UrlPathEncode escapes them into %20. + and %20 are only equivalent if they are part of the QueryString portion, per W3C. So you can't escape a whole URL using the + sign, only the querystring portion. Bottom line is that UrlPathEncode is always better imho
You can encode a URL using the UrlEncode() method or the UrlPathEncode() method. However, the methods return different results. The UrlEncode() method converts each space character to a plus character (+). The UrlPathEncode() method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode() method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx
To explain it as simply as possible:
HttpUtility.UrlPathEncode("http://www.foo.com/a b/?eggs=ham&bacon=1")
becomes
http://www.foo.com/a%20b/?eggs=ham&bacon=1
and
HttpUtility.UrlEncode("http://www.foo.com/a b/?eggs=ham&bacon=1")
becomes
http%3a%2f%2fwww.foo.com%2fa+b%2f%3feggs%3dham%26bacon%3d1
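For readers outside .NET, the same contrast can be reproduced with Python's standard urllib.parse module. This is only an analogy: quote_plus does form-style encoding comparable to UrlEncode (Python emits upper-case hex digits), and path_encode is a made-up helper that roughly imitates the UrlPathEncode result shown above; neither is the .NET implementation.

```python
from urllib.parse import quote, quote_plus

url = "http://www.foo.com/a b/?eggs=ham&bacon=1"

# Form-style: every reserved character is escaped and a space becomes +
form_style = quote_plus(url)
print(form_style)
# http%3A%2F%2Fwww.foo.com%2Fa+b%2F%3Feggs%3Dham%26bacon%3D1


def path_encode(full_url):
    """Escape only the part left of ?, leaving : and / alone (hypothetical helper)."""
    head, sep, query = full_url.partition("?")
    # A space becomes %20; the query string is passed through untouched
    return quote(head, safe=":/") + sep + query


print(path_encode(url))
# http://www.foo.com/a%20b/?eggs=ham&bacon=1
```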

Multiple Base64 encoded parameters that appear as 1 in a URL query string

I need to pass 2 parameters in a query string but would like them to appear as a single parameter to the user. At a low level, how can I concatenate these two values and then later separate them? Both values are Base64 encoded.
?Name=abcyxz
where both abc and yxz are separate Base64 encoded strings.
why don't you just do something like this
temp = base64_encode("var1=abc&var2=yxz")
and then call
?Name=temp
Later you can decode the whole string and split the vars.
(sry for pseudo code :P)
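The pseudo code above could look roughly like this in Python. It is a sketch under assumptions: pack and unpack are made-up names, var1 and var2 come from the pseudo code, and the URL-safe Base64 alphabet is swapped in so that + and / never appear in the outer value.

```python
import base64
from urllib.parse import parse_qs, quote, urlencode


def pack(var1, var2):
    """Wrap two values in one Base64 token (hypothetical helper)."""
    inner = urlencode({"var1": var1, "var2": var2})
    # urlsafe_b64encode uses - and _ instead of + and /
    return base64.urlsafe_b64encode(inner.encode("ascii")).decode("ascii")


def unpack(token):
    """Reverse pack(): decode the token and split the two values again."""
    inner = base64.urlsafe_b64decode(token.encode("ascii")).decode("ascii")
    # keep_blank_values=True so an empty value does not vanish while parsing
    fields = parse_qs(inner, keep_blank_values=True)
    return fields["var1"][0], fields["var2"][0]


token = pack("abc", "yxz")
# The = padding is still percent-encoded before it goes into the URL
print("?Name=" + quote(token, safe=""))
print(unpack(token))  # ('abc', 'yxz')
```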
Edit: a small quote from wikipedia
The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols. The "=" symbol is also used as a special suffix code. The original specification, RFC 989, additionally used the "*" symbol to delimit encoded but unencrypted data within the output stream.
You should either use some separator or store the length of the first item.
First of all, I would be curious as to why you can't just pass two parameters. But with that as a given, just choose any character that's a valid character in a URL query string, but won't show up in your base64 encoding, such as ~
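A minimal Python sketch of the separator idea, assuming ~ as the separator as suggested above. The helper names join_values and split_values are made up, and the sample inputs are just Base64 of two arbitrary byte strings.

```python
import base64
from urllib.parse import quote, unquote

SEPARATOR = "~"  # not part of the Base64 alphabet (A-Z, a-z, 0-9, +, /, =)


def join_values(first, second):
    """Combine two Base64 strings into one value (hypothetical helper)."""
    if SEPARATOR in first or SEPARATOR in second:
        raise ValueError("values must not contain the separator")
    return first + SEPARATOR + second


def split_values(combined):
    """Split a combined value back into its two Base64 strings."""
    first, sep, second = combined.partition(SEPARATOR)
    if not sep:
        raise ValueError("separator not found")
    return first, second


first = base64.b64encode(b"hello").decode("ascii")
second = base64.b64encode(b"world").decode("ascii")
combined = join_values(first, second)

# +, / and = still need percent-encoding when the value goes into the URL
query = "?Name=" + quote(combined, safe="")
print(query)
print(split_values(unquote(query[len("?Name="):])))
```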

Resources