how to send binary data (del and null characters) through http url - http

I want to send some binary characters as part of an HTTP URL. Can someone tell me the best way to do it?
Ex: \x7F/a.html (\x7F represents ASCII DEL in binary form)
Sending it with telnet or curl sends it as a literal string. Would writing to the socket directly work, for example:
sock.send(b'GET /test\x7F/a.html HTTP/1.0\r\nHost: 1.1.1.1\r\n\r\n')

According to the HTTP spec, the request-target can take several forms, all derived from a URI. Per the URI spec, a path may contain only printable 7-bit ASCII: alphanumeric characters and a few symbols such as '-', '.', '~' and others, plus '%' to introduce a percent-encoded octet. ASCII control characters are not allowed.
Per the same spec, any octet outside that allowed set must be percent-encoded, so ASCII DEL becomes %7F and ASCII NULL becomes %00.
It's hard to say whether percent-encoding your binary characters “would work” as you do not explain what you expect to get from them. An HTTP request-target is an opaque identifier interpreted by the server, and need not correspond to a file name or actual data. It is perfectly feasible (and common) to refer to binary targets with ASCII alphanumeric request-targets.
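As a minimal sketch of the percent-encoding described above, Python's standard urllib.parse.quote can encode the raw bytes before they go into the request line. The sample path, the host value and the helper name build_request are illustrative assumptions, not something from the question.

```python
from urllib.parse import quote


def build_request(raw_path: bytes, host: str) -> bytes:
    """Build an HTTP/1.0 GET request with a percent-encoded request-target."""
    # quote() accepts bytes; "/" is left alone by default (safe="/"),
    # while control bytes such as DEL (0x7F) and NUL (0x00) become %7F and %00.
    target = quote(raw_path)
    # The empty line (\r\n\r\n) terminates the header section.
    return f"GET {target} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


request = build_request(b"/test\x7f/a\x00.html", "1.1.1.1")
print(request)
```

The resulting bytes are pure ASCII, so they can be passed to sock.send() unchanged; how the server interprets %7F or %00 in the path is still up to the server.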

Related

How to handle non-ascii characters in HTTP request header?

In our application, we are sending passwords as part of the header for authentication to our auth service. However, we're running into a situation where users are using non-ascii characters as part of their password, and I found out that non-ascii characters are not supported in HTTP.
What are some approaches to handling this?
You need to encode it in an ASCII compatible format.
Base 64 is such an encoding.
Here is an example of how HTTP Basic Authentication does it using Base64 encoding.
The Authorization field is constructed as follows:
The username and password are combined with a single colon (:). This means that the username itself cannot contain a colon.
The resulting string is encoded into an octet sequence. The character set to use for this encoding is by default unspecified, as long as it is compatible with US-ASCII, but the server may suggest use of UTF-8 by sending the charset parameter.
The resulting string is encoded using a variant of Base64.
The authorization method and a space (e.g. "Basic ") is then prepended to the encoded string.
For example, if the browser uses Aladdin as the username and OpenSesame as the password, then the field's value is the base64-encoding of Aladdin:OpenSesame, or QWxhZGRpbjpPcGVuU2VzYW1l. Then the Authorization header will appear as:
Authorization: Basic QWxhZGRpbjpPcGVuU2VzYW1l
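The construction steps above can be sketched in Python roughly as follows. The function name basic_auth_header and the UTF-8 default are assumptions made for illustration; the username and password are the ones from the example.

```python
import base64


def basic_auth_header(username: str, password: str, charset: str = "utf-8") -> str:
    """Return the value of an Authorization header for HTTP Basic auth."""
    # The username cannot contain a colon, because the colon is the separator.
    if ":" in username:
        raise ValueError("username must not contain a colon")
    # Combine, encode to octets, then Base64-encode the octets.
    credentials = f"{username}:{password}".encode(charset)
    token = base64.b64encode(credentials).decode("ascii")
    # Prepend the authorization method and a space.
    return f"Basic {token}"


header_value = basic_auth_header("Aladdin", "OpenSesame")
print("Authorization:", header_value)
```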
So let's say your password is ǁǂǃǄǅǆǇǈǉǊǋǌǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟ, which cannot be represented using the ASCII charset.
Here is some pseudocode showing how to do it:
var password = 'ǁǂǃǄǅǆǇǈǉǊǋǌǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟ'
var base64EncodedPassword = base64Encode(password)
var httpHeader = new HttpHeader('Password', base64EncodedPassword)
And it would result in the following header, represented using only ASCII characters (the password is UTF-8 encoded before being Base64 encoded):
Password: x4HHgseDx4THhceGx4fHiMeJx4rHi8eMx43HjsePx5DHkceSx5PHlMeVx5bHl8eYx5nHmsebx5zHnceex58=
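A runnable Python version of that pseudocode might look like the sketch below. It assumes the password is the 31 code points U+01C1 through U+01DF and that it is encoded as UTF-8 before Base64, which is the choice that reproduces the header value shown above. The 'Password' header name is taken from the pseudocode; the plain dict standing in for the request headers is an assumption.

```python
import base64

# Build the password from code points so the source file itself stays ASCII.
password = "".join(chr(cp) for cp in range(0x01C1, 0x01E0))

# Client side: text -> UTF-8 bytes -> Base64 -> ASCII header value.
encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")
headers = {"Password": encoded}
print(headers)

# Server side: reverse the two steps, using the same charset.
decoded = base64.b64decode(headers["Password"]).decode("utf-8")
```

Both sides have to agree on the charset; Base64 only makes the bytes header-safe, it does not protect the password.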

How to use HMAC-SHA256 Authorization header with Unicode bytes instead of UTF-8?

I'm creating HMAC-SHA256 Authorization header for my rest request.
My hunch is that internally Paw is using UTF-8 (or some other non-Unicode) encoding to calculate the checksum. My server side API uses Unicode to calculate the same thing for comparison but with the same inputs I receive different outputs on each end :(
Is there a way to configure Paw to use Unicode?
For Unicode inputs to HMAC-SHA256 you can use the Escape Sequence dynamic value. Choose the `Custom` escape sequence and type your sequence in the input field (\u + code for Unicode characters and \x + code for hex bytes).
If this doesn't work for you, don't hesitate to send us a support e-mail at support#luckymarmot.com
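To see why the two ends disagree, here is a small Python sketch (not Paw-specific) that signs the same text under two byte encodings. It assumes the server's "Unicode" means UTF-16LE, as it does for .NET's Encoding.Unicode; the question does not say which platform is used. The key and message are made-up values.

```python
import hashlib
import hmac


def sign(key: bytes, message: str, encoding: str) -> str:
    """HMAC-SHA256 over the message bytes produced by the given encoding."""
    return hmac.new(key, message.encode(encoding), hashlib.sha256).hexdigest()


key = b"hypothetical-secret"        # assumed key, for illustration only
message = "na\u00efve caf\u00e9"    # text containing non-ASCII characters

sig_utf8 = sign(key, message, "utf-8")
sig_utf16 = sign(key, message, "utf-16-le")

# Same text, different bytes, therefore different signatures.
print(sig_utf8 == sig_utf16)
```

HMAC operates on bytes, not characters, so client and server must serialize the text with the same encoding before signing.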

Characters in URL after host transmitted as-is?

If I visit the URL:
http://foo/bar
Then the web browser will connect to host foo by TCP on port 80 and transmit a GET request:
GET /bar HTTP...
Clearly, not all characters in the bar part will be transmitted verbatim. For example, the space character (byte 0x20) will not.
Of the 256 possible bytes, which will a standard web browser transmit verbatim (as is, without special encoding) from a URL entered into the address bar to the GET request, and which will it not?
RFC-3986 talks about reserved and unreserved characters. Unreserved characters can be used without special encoding; everything else must be url encoded, using the %xx notation.
The unreserved characters include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde.
The browser is smart enough to automatically escape space and other characters before opening a socket connection to the server.
EDIT
Reserved characters need not be encoded when they serve their reserved purpose as delimiters. But when they appear as data inside a path segment or query string component, they must be URL-encoded.
The reserved characters are ! * ' ( ) ; : @ & = + $ , / ? # [ ]
For example, here is a url
http://example.com:8090/email/tigerwoods%40gmail.com?folder=sport%2Fgolf
Here, the / and ? are not encoded when they serve their normal roles as delimiters. However, the @ symbol is encoded (as %40) because it is part of the email address, and the / is encoded (as %2F) because it is part of the query string parameter's value.
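The example URL can be reproduced with Python's urllib.parse, as a rough sketch. Each data component is encoded on its own and the delimiters are added afterwards; the variable names are illustrative.

```python
from urllib.parse import quote, urlencode

email = "tigerwoods@gmail.com"
folder = "sport/golf"

# safe="" encodes every reserved character in the path segment,
# including "@" and "/" (quote() keeps "/" by default).
path = "/email/" + quote(email, safe="")

# urlencode() percent-encodes reserved characters in keys and values.
query = urlencode({"folder": folder})

# The "/" and "?" added here are delimiters, so they stay unencoded.
url = f"http://example.com:8090{path}?{query}"
print(url)
```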

ampersand in URL of RSS Feed

As part of our app, users can save some data as XML on the server, which becomes an RSS feed for them.
Now some of the files users created have & in the file name, such as BB&T_RSS.xml.
So when a user points a browser at http://example.com/BB&T.xml, the feed does not load.
How can I fix this? I tried BB%26T.xml and BB&amp;T.xml without any success in IE or Chrome.
Use %26 for &:
http://example.com/BB%26T.xml
See http://www.w3schools.com/tags/ref_urlencode.asp
Then use the HttpServerUtility.UrlDecode method to recover the file name from the URL.
URL encoding ensures that all browsers will correctly transmit text in URL strings. Characters such as a question mark (?), ampersand (&), slash mark (/), and spaces might be truncated or corrupted by some browsers. As a result, these characters must be encoded in tags or in query strings where the strings can be re-sent by a browser in a request string.
Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "@", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme. (RFC 1738, section 2.2)
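The encode and decode round trip for the file name can be sketched in Python. Here urllib.parse.unquote stands in for the .NET UrlDecode call mentioned above, and the host is the example.com placeholder from the question.

```python
from urllib.parse import quote, unquote

file_name = "BB&T.xml"

# Encode the file name as a single path segment: "&" becomes %26.
url = "http://example.com/" + quote(file_name, safe="")
print(url)

# Server side: take the last path segment and decode it again.
recovered = unquote(url.rsplit("/", 1)[1])
```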

What's the correct encoding of HTTP get request strings?

Does the HTTP standard or something define which encoding should be used on special characters before they are encoded in url with %XXs? If it doesn't, is there a way to define which encoding is used? It seems that most browsers send the data in UTF-8.
Does the HTTP standard or something define which encoding should be used on special characters before they are encoded in url with %XXs?
The HTTP standard, no. But another standard, IRI, can come into play.
URIs are explicitly (once %-decoded) byte sequences. What Unicode characters those bytes map onto is not specified by the URI standard or the HTTP standard for http:-scheme URIs.
Specifically for query parameters: web browsers will use the encoding of the originating page to make a form submission GET URL, so if you have a page in ISO-8859-1 and you put ‘é’ in a search box you'll get ‘?search=%E9’, but if you do the same in a page encoded as UTF-8 you'll get ‘?search=%C3%A9’. If you don't serve your form page with any particular charset the browser will guess, which you don't want, as it makes it impossible to know which encoding the submission will arrive in.
For the other parts of a URL, a browser won't generate them itself, but if you supply it with non-ASCII characters in links it will usually encode them as UTF-8. This is not reliable as it depends on browser and locale settings, so it's best not to use this at the moment.
The standard that properly allows non-ASCII characters in links is IRI. IRI converts to URI by UTF-8-%-encoding most of the URL, but the hostname is converted using Punycode instead. For compatibility it is best not to rely on browsers understanding IRIs in links yet. Instead, UTF-8-then-%-encode your path and parameter characters yourself. They will still appear as the right characters in the address bar in modern browsers; unfortunately IE won't display the decoded-character IRI form in all cases, depending on language settings.
The Wiki IRI for the Greek gamma character is:
http://en.wikipedia.org/wiki/Γ
Encoded into a URI, it is:
http://en.wikipedia.org/wiki/%CE%93
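The effect of the charset on the percent-encoded form can be checked with a short Python sketch. The characters are written as escapes so the snippet does not depend on the file's own encoding; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

e_acute = "\u00e9"   # é
gamma = "\u0393"     # Γ

# The same character gives different %XX sequences per charset.
latin1_form = quote(e_acute, encoding="iso-8859-1")
utf8_form = quote(e_acute, encoding="utf-8")
gamma_form = quote(gamma)  # quote() defaults to UTF-8

# Decoding with the wrong charset silently yields the wrong text.
right = unquote(utf8_form, encoding="utf-8")
wrong = unquote(utf8_form, encoding="iso-8859-1")

print(latin1_form, utf8_form, gamma_form, right, wrong)
```

This is why the answer recommends encoding as UTF-8 and then percent-encoding yourself: the receiving side then knows which charset to decode with.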
Per RFC 2616,
CHAR = <any US-ASCII character (octets 0 - 127)>
and
token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@"
           | "," | ";" | ":" | "\" | <">
           | "/" | "[" | "]" | "?" | "="
           | "{" | "}" | SP | HT
and URIs are tokens with various specific separators. So, in theory, nothing but US-ASCII should be there. (In practice, since the ISO-8859-1 extension to US-ASCII is used in many other spots in the HTTP specs, it's not unusual to find HTTP implementations which support ISO-8859-1 rather than just US-ASCII, but strictly speaking that's not standards-compliant HTTP).
As far as I'm aware, there is no way to define it, though I've always assumed that it is ASCII, since that is what DNS is (currently, though localised DNS is coming, with all the problems that entails).
Note: UTF-8 is "ASCII compatible": it only differs from ASCII once you use characters outside the ASCII range. This probably plays some small part in the reasoning behind why some browsers might send their GET data UTF-8 encoded.
EDIT: From your comment, it seems like you don't know how the % encoding works at all, so here goes.
Given the query string "?foo=Hello World!", the "Hello World!" part needs URL encoding. The way this works is that any 'special' character has its ASCII value converted to two hex digits prefixed by a '%'. So the above string would convert to "?foo=Hello%20World%21".
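To make the mechanism concrete, here is a hand-rolled sketch of percent-encoding in Python. It keeps only the RFC 3986 unreserved characters and encodes every other byte; the name percent_encode and the UTF-8 default are assumptions. In practice urllib.parse.quote does the same job.

```python
from urllib.parse import quote

# RFC 3986 unreserved characters: letters, digits, "-", ".", "_", "~".
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str, encoding: str = "utf-8") -> str:
    """Percent-encode every byte that is not an unreserved ASCII character."""
    out = []
    for byte in text.encode(encoding):
        ch = chr(byte)
        if byte < 0x80 and ch in UNRESERVED:
            out.append(ch)                # transmitted as-is
        else:
            out.append(f"%{byte:02X}")    # "%" followed by two hex digits
    return "".join(out)


encoded = percent_encode("Hello World!")
print("?foo=" + encoded)

# The standard library gives the same result when nothing is marked safe.
same_as_stdlib = encoded == quote("Hello World!", safe="")
```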

Resources