HTTP POST request - sending key value pairs - value contains '&' - http

'&' is used as the separator between key-value pairs.
But one of my values contains '&'. How can I send this data?

You will have to URL-encode it (most HTTP libraries have utility functions for doing so).
For example, key=&value will become key=%26value.
You can find more information on Wikipedia:
When a character from the reserved set (a "reserved character") has
special meaning (a "reserved purpose") in a certain context, and a URI
scheme says that it is necessary to use that character for some other
purpose, then the character must be percent-encoded. Percent-encoding
a reserved character involves converting the character to its
corresponding byte value in ASCII and then representing that value as
a pair of hexadecimal digits. The digits, preceded by a percent sign
("%") which is used as an escape character, are then used in the URI
in place of the reserved character. (For a non-ASCII character, it is
typically converted to its byte sequence in UTF-8, and then each byte
value is represented as above.) The reserved character "/", for
example, if used in the "path" component of a URI, has the special
meaning of being a delimiter between path segments. If, according to a
given URI scheme, "/" needs to be in a path segment, then the three
characters "%2F" or "%2f" must be used in the segment instead of a raw
"/".
This question is probably a duplicate of this.
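As a minimal sketch of the encoding described in this answer, Python's standard urllib.parse module can build the POST body. The field names and values used here (key, other) are made up for illustration, and other HTTP libraries expose similar helpers under different names.

```python
from urllib.parse import urlencode, parse_qs, quote

# Hypothetical form fields; the first value starts with a literal '&'.
fields = {"key": "&value", "other": "a/b"}

# urlencode percent-encodes each key and value, then joins pairs with '&'.
body = urlencode(fields)
print(body)  # key=%26value&other=a%2Fb

# Decoding on the receiving side restores the original values.
decoded = parse_qs(body)
print(decoded)  # {'key': ['&value'], 'other': ['a/b']}

# quote() encodes a single component; safe="" also encodes '/'.
segment = quote("a/b", safe="")
print(segment)  # a%2Fb
```

Because the '&' inside the value is sent as %26, the receiver only sees the two real separators.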

Related

Semicolon in URLs

I have a URL like this: localhost:8080/demo/
When I call localhost:8080/demo/''''''''' it works fine.
But when I try localhost:8080/demo/;;; it does not work and returns HTTP code 404 Not Found.
I tried a few other special characters (# % \ ? /) and those returned 400.
Can anyone explain this to me?
Thank you so much!
These special characters are not directly allowed in URLs
because they have special meanings there.
For example:
/ is the separator within the path,
? marks the start of the query part of a URL,
# marks the fragment (a page-internal link),
etc.
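To see these special meanings in action, here is a small sketch using Python's standard urlsplit. The URLs are hypothetical examples built on the localhost:8080/demo/ address from the question.

```python
from urllib.parse import urlsplit

# Hypothetical URL that uses '/', '?' and '#' in their reserved roles.
parts = urlsplit("http://localhost:8080/demo/page?x=1#top")
print(parts.path)      # /demo/page
print(parts.query)     # x=1
print(parts.fragment)  # top

# A raw '?' inside what was meant to be the path ends the path early.
broken = urlsplit("http://localhost:8080/demo/what?now")
print(broken.path)   # /demo/what
print(broken.query)  # now
```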
Quoted from Wikipedia: Percent-encoding reserved characters:
When a character from the reserved set (a "reserved character")
has special meaning (a "reserved purpose") in a certain context,
and a URI scheme says that it is necessary to use that character
for some other purpose, then the character must be percent-encoded.
Percent-encoding a reserved character involves converting the
character to its corresponding byte value in ASCII and then
representing that value as a pair of hexadecimal digits. The digits,
preceded by a percent sign (%) which is used as an escape character,
are then used in the URI in place of the reserved character.
For example, ; is a reserved character. So when ; must appear
in a URL without its special meaning, it needs to be
replaced by %3B.
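A short sketch of that replacement, using Python's standard quote and unquote. The request URL is the hypothetical one from the question; whether a given server then accepts the encoded path is a separate matter that depends on its configuration.

```python
from urllib.parse import quote, unquote

# safe="" asks quote() to encode every reserved character, including '/'.
encoded = quote(";;;", safe="")
print(encoded)  # %3B%3B%3B

# The other characters from the question are encoded the same way.
others = quote("#%?/", safe="")
print(others)  # %23%25%3F%2F

# Hypothetical request URL built from the encoded segment.
url = "http://localhost:8080/demo/" + encoded

# The receiving side can recover the original text with unquote().
original = unquote(encoded)
print(original)  # ;;;
```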

http header value safe characters

I have a method which encodes some key-value entries into an ASCII string with percent-encoding.
The resulting value is expected to be used as an HTTP header value.
With the following entries
("English", "love")
("한국어", "사랑")
The method generates
%ED%95%9C%EA%B5%AD%EC%96%B4=%EC%82%AC%EB%9E%91&English=love
Which looks like
key=value(&key=value)*
Keys and values are percent-encoded.
Each encoded key and value are concatenated with =.
Pairs of encoded keys and values are concatenated with &.
My question is: can this output string be used as an HTTP header field-value?
Is there any problem or concern?
As long as you use printable US-ASCII, there shouldn't be a problem.
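The question does not show the encoding method itself, so the following is only a sketch of one way to produce the same string and check the "printable US-ASCII" condition. The helper names encode_pairs and is_printable_ascii are invented for this example.

```python
from urllib.parse import quote

def encode_pairs(pairs):
    """Percent-encode keys and values, then join them with '=' and '&'."""
    return "&".join(
        f"{quote(key, safe='')}={quote(value, safe='')}"
        for key, value in pairs
    )

def is_printable_ascii(text):
    """True if every character is in the printable US-ASCII range 0x20..0x7E."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)

header_value = encode_pairs([("한국어", "사랑"), ("English", "love")])
print(header_value)
# %ED%95%9C%EA%B5%AD%EC%96%B4=%EC%82%AC%EB%9E%91&English=love
print(is_printable_ascii(header_value))  # True
```

Percent-encoding only ever emits '%', hexadecimal digits and unreserved ASCII characters, which is why the check passes for any input.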

Why do the encodings of a URL and its query string part differ?

I was researching why my query parameters have plus signs (+) in them instead of %20, and why they contain strings like %C3%BC instead of a ü (UTF-8) as an encoded URL does.
After 2 hours of thinking my web app was not compatible with the URL encoding standard, I found that the encoding scheme of a query string is not the same as the encoding of a URL (here I mean the part without the query string).
Examples:
URL:
whitespace encodes to %20
UTF-8 chars stay UTF-8 chars
Query params:
whitespace encodes to +
UTF-8 chars encode to their hex representation
So can someone tell me why the encoding schemes differ, since the query parameters are a part of the URL?
See:
wiki Percent-encoding
wiki: Query String
URIs originated in RFC 1630, with percent-encoding as a method to allow "unsafe" characters to be represented. This original version actually mentioned the ISO Latin 1 character set as the encoding for non-ASCII characters. RFC 1738 later that year removed this reference to Latin-1 in defining URLs.
The query string format is actually a different but related encoding, application/x-www-form-urlencoded, defined in RFC 1866 along with HTML 2.0. It was based on RFC 1738, but specified that spaces (not all whitespace, just the character with ASCII code 0x20) are replaced by '+' and that line breaks are to be encoded as CRLF (i.e. %0D%0A). The former is likely because that saves 2 bytes for a very common character in form submissions at the expense of using an extra 2 bytes for a much less common character, and the latter is to avoid problems when transferring between systems using different end-of-line codings. Non-ASCII characters were left unconsidered.
UTF-8 coding in URIs came over a decade later, in RFC 3986, although individual protocols may have specified this or another encoding of non-ASCII characters earlier. To maintain backwards compatibility, all UTF-8 octets must be percent-encoded. The companion RFC 3987 defines "Internationalized Resource Identifiers" (IRIs) which are basically "URIs with most codepoints 160 and above allowed to appear unencoded", but many protocols still require URIs. Note that your statement above is incorrect, as a URL may not contain an unencoded ü or any other non-ASCII character.
application/x-www-form-urlencoded has been internationalized in a different manner. The HTML5 specification of application/x-www-form-urlencoded explicitly allows that any ASCII-compatible character set may be used for characters in the query string, and in fact different fields may use different character sets, but all non-ASCII octets must still be percent-encoded. When used in the query part of an IRI, it is possible that these characters could be represented unencoded if properly-normalized UTF-8 is being used as the character set, since conversion back to a URI would result in correct application/x-www-form-urlencoded data.
They don't necessarily have to differ: a + is a valid path character and a ü is a valid search (query) character (per RFC 3987). You're probably seeing browsers or some other preconceived encoding scheme making assumptions that are either outdated or overly cautious.
There is no difference between + and %20 when it comes to Query string parameters:
SPACE is encoded as '+' or '%20'
Quote reference
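The two encodings discussed in these answers can be compared side by side. This is a small sketch using Python's standard urllib.parse, where quote follows generic percent-encoding and quote_plus follows the application/x-www-form-urlencoded convention. The sample text is made up.

```python
from urllib.parse import quote, quote_plus, unquote_plus

text = "a b ü"

# Generic percent-encoding: space becomes %20, ü becomes its UTF-8 bytes.
generic = quote(text)
print(generic)  # a%20b%20%C3%BC

# Form encoding: space becomes '+', non-ASCII is still percent-encoded.
form = quote_plus(text)
print(form)  # a+b+%C3%BC

# Line breaks in form data are sent as CRLF, i.e. %0D%0A.
line_break = quote_plus("x\r\ny")
print(line_break)  # x%0D%0Ay

# A form-aware decoder treats '+' and '%20' as the same space.
same = unquote_plus("a+b") == unquote_plus("a%20b")
print(same)  # True
```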

How exactly does an array get urlencoded?

Many languages allow one to pass an array of values through the URL. For various reasons, I need to construct the URL directly by hand. How is an array of values URL-encoded?
It looks like content of MIME type application/x-www-form-urlencoded:
This is the default content type. Forms submitted with this content type must be encoded as follows:
1. Control names and values are escaped. Space characters are replaced by +, and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by %HH, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., %0D%0A).
2. The control names/values are listed in the order they appear in the document. The name is separated from the value by = and name/value pairs are separated from each other by &.
This is what is used for POST. To do the same for GET, append a ? to your URL; the rest is almost the same. In the comments, mdma states that the URL may not contain a + for a space character; use %20 instead.
So an array of values:
http://localhost/someapp/?0=zero&1=valueone%20withspace&2=etc&3=etc
Libraries often provide a function that does the URL encoding for you (point 1). Point 2 is easy to implement by looping over your array and building the string: append the index, =, the URL-encoded value and, when it is not the last entry, an &.
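The loop described above could look roughly like this in Python. The function name encode_array is invented for this sketch, and quote is used rather than quote_plus so that spaces come out as %20, matching the example URL.

```python
from urllib.parse import quote

def encode_array(values):
    """Build 'index=value' pairs joined by '&', with each value URL-encoded."""
    pairs = []
    for index, value in enumerate(values):
        pairs.append(f"{index}={quote(str(value), safe='')}")
    # join() places '&' only between entries, never after the last one.
    return "&".join(pairs)

values = ["zero", "valueone withspace", "etc", "etc"]
url = "http://localhost/someapp/?" + encode_array(values)
print(url)
# http://localhost/someapp/?0=zero&1=valueone%20withspace&2=etc&3=etc
```

How the receiving side turns these indexed keys back into an array depends on the server framework, so the key naming here simply follows the example URL.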

Multiple Base64 encoded parameters that appear as 1 in a URL query string

I need to pass 2 parameters in a query string but would like them to appear as a single parameter to the user. At a low level, how can I concatenate these two values and then later separate them? Both values are Base64 encoded.
?Name=abcyxz
where abc and yxz are separate Base64-encoded strings.
Why don't you just do something like this:
temp = base64_encode("var1=abc&var2=yxz")
and then call
?Name=temp
Later you can decode the whole string and split the vars.
(Sorry for the pseudocode :P)
Edit: a small quote from wikipedia
The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols. The "=" symbol is also used as a special suffix code. The original specification, RFC 989, additionally used the "*" symbol to delimit encoded but unencrypted data within the output stream.
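A runnable take on the pseudocode above, as a sketch only. The helper names pack and unpack are invented here, and the URL-safe Base64 variant is swapped in for the plain one because, as the quote notes, the standard alphabet contains "+" and "/", which have their own meaning in URLs.

```python
import base64
from urllib.parse import parse_qs, quote, urlencode

def pack(var1, var2):
    """Combine two values into one URL-safe Base64 token."""
    inner = urlencode({"var1": var1, "var2": var2})
    return base64.urlsafe_b64encode(inner.encode("ascii")).decode("ascii")

def unpack(token):
    """Reverse pack(): decode the token and split out the two values."""
    inner = base64.urlsafe_b64decode(token.encode("ascii")).decode("ascii")
    fields = parse_qs(inner, keep_blank_values=True)
    return fields["var1"][0], fields["var2"][0]

token = pack("abc", "yxz")

# The token can still end in '=' padding, so it is percent-encoded as well.
url = "?Name=" + quote(token, safe="")
print(url)
print(unpack(token))  # ('abc', 'yxz')
```

Because the inner string is itself URL-encoded before being wrapped, the two values may contain "+", "/" or "=" without confusing the split.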
You should either use some separator or store the length of the first item.
First of all, I would be curious as to why you can't just pass two parameters. But with that as a given, just choose any character that is valid in a URL query string but won't show up in your Base64 encoding, such as ~.
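Both suggestions, a separator and a stored length, can be sketched in a few lines of Python. All function names are invented for this example, the sample values are placeholders, and the "." used for the length prefix is an arbitrary choice of another character outside the Base64 alphabet.

```python
import base64

SEPARATOR = "~"  # not part of the standard or URL-safe Base64 alphabets

def join_tokens(first, second):
    """Concatenate two Base64 strings with a separator between them."""
    return first + SEPARATOR + second

def split_tokens(combined):
    """Split at the first separator to recover both strings."""
    first, second = combined.split(SEPARATOR, 1)
    return first, second

def join_with_length(first, second):
    """Alternative: prefix the length of the first item."""
    return f"{len(first)}.{first}{second}"

def split_with_length(combined):
    """Read the length prefix, then cut the remainder at that position."""
    length, rest = combined.split(".", 1)
    count = int(length)
    return rest[:count], rest[count:]

a = base64.urlsafe_b64encode(b"first value").decode("ascii")
b = base64.urlsafe_b64encode(b"second value").decode("ascii")
combined = join_tokens(a, b)
print(split_tokens(combined) == (a, b))  # True
```

With the standard Base64 alphabet instead of the URL-safe one, the "+", "/" and "=" characters in each part would still need percent-encoding before going into the query string.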

Resources