What charset encoding is used for HTTP GET requests URL? [duplicate] - http

Possible Duplicate:
What's the correct encoding of HTTP get request strings?
One of my clients told me that they require HTTP requests to be encoded in ISO-8859-2,
so I wonder what charset is used for HTTP communication, and whether this requirement is technically valid.

It depends. A "smart" server will always use percent-escaped UTF-8, but you can't rely on that.

Pure ASCII is all that's allowed in HTTP headers. But, as far as HTTP is concerned, anything goes in the request body of a POST. The headers and body are always separated by a blank line. A set of headers will normally identify the format of the content/body. Responses work the same way. However, HTML has some additional rules regarding what normally goes in a POST.
EDIT: Sorry, I missed the word 'GET' in your title. Might be nice to duplicate that in the body of your question.
At any rate, I believe I am correct in saying that ONLY ASCII (ANSI X3.4-1986) is allowed in the headers of any HTTP request, GET or POST, and the URL travels in that part of the request. So no, raw ISO-8859-2 requests are not strictly valid HTTP. That said, you can percent-encode the bytes of the desired special characters in the query string, if that's what you're really asking for here.
SOURCE: https://www.rfc-editor.org/rfc/rfc2616
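To make the escaping concrete, here is a minimal sketch using Python's standard urllib.parse. The sample word and the `city` parameter are made up for illustration. It only shows that the same text yields different percent-escapes depending on the charset the server expects, while the URL itself stays pure ASCII.

```python
from urllib.parse import quote, unquote

text = "Łódź"  # sample value, chosen only for illustration

# Percent-encode the bytes of the text in two different charsets.
as_latin2 = quote(text, encoding="iso-8859-2")
as_utf8 = quote(text, encoding="utf-8")

print("/search?city=" + as_latin2)  # /search?city=%A3%F3d%BC
print("/search?city=" + as_utf8)    # /search?city=%C5%81%C3%B3d%C5%BA

# The receiver must decode with the same charset the sender used.
print(unquote(as_latin2, encoding="iso-8859-2"))  # Łódź
```

Nothing in the URL says which charset was used, so client and server have to agree on it out of band, which is presumably what the client's ISO-8859-2 requirement amounts to.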

Related

How to add Parameters, Data, and Headers to a POST request in Golang? [duplicate]

This question already has an answer here:
How to translate this curl call into Go?
Is there any way to add headers and data to an HTTP request in Golang, as well as URL parameters? I have tried to add data to a POST request, but it seems Golang just isn't meant for sending any kind of complex request. After much research online I'm not sure if there's a way, but if anyone knows how, please tell me!
Yes, you can. See "How to set headers in http get request?", which answers your question. You can also find more information about the standard net/http package at https://pkg.go.dev/net/http.

What would a body in an HTTP GET look like? [duplicate]

This question already has answers here:
HTTP GET with request body
I realize this is not recommended, but I'm wondering what the URL would look like, for example:
http://myserver.com/rest/info?param1=foo&param2=bar
How would you append a body to that URL? I don't think &body would work.
Reason I'm looking at this is I'm looking for a way around a rather limited preset access to CURL from within my chosen language, so I'm wondering if I can use GET somehow instead of POST.
EDIT: Several people marked this as a duplicate of another question, but the essence of my question was "What would the body look like" where the other question is and I quote here from the other post:
My questions:
Is this a good idea altogether? Will HTTP clients have issues with using request bodies within a GET request?
Therefore I don't believe this is a duplicate at all, I believe those who marked it as such perhaps didn't really read either question much beyond the title.
The URL wouldn't change by adding a body to the HTTP request. In addition to the normal HTTP headers your request would also include a body (separated from the header by a blank line).
I would strongly recommend that you DON'T do this with GET requests though: HTTP servers do not handle it well, and it goes against the intent of the HTTP specification.
A simple request would look something like this:
GET /whatever HTTP/1.1
Host: foobar.com
User-Agent: Selfmade telnet
Content-Length: 11
Connection: close

hello world
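The layout described above (request line, headers, a blank line, then the body) can be sketched in Python as follows. `build_request` is a hypothetical helper written for this illustration, not part of any library. The Content-Length header is an assumption added so the receiver can tell where the body ends.

```python
def build_request(method, path, host, body=b""):
    """Assemble a raw HTTP/1.1 request: start line, headers, blank line, body."""
    head_lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        f"Content-Length: {len(body)}",
        "Connection: close",
    ]
    head = "\r\n".join(head_lines).encode("ascii")
    # The empty line (CRLF CRLF) is what separates the headers from the body.
    return head + b"\r\n\r\n" + body


raw = build_request(
    "GET", "/rest/info?param1=foo&param2=bar", "myserver.com", b"hello world"
)
print(raw.decode("ascii"))
```

Note that the URL on the first line is the same with or without a body.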

HTTP query and URI encoding doubts [closed]

Recently I was researching HTTP query strings while wondering about possibilities on web service access interface API. And it seems very underspecified.
In fact RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax) doesn’t say anything about format of the query string fragment and ends on defining which characters are allowed and how to encode other characters. (I will return to this later.)
The only thing I found was HTML specification on how forms are mangled into query string (HTML 4.01; 17.13.4 Form content types, application/x-www-form-urlencoded). HTML 5 algorithm seems close enough (4.10.22.5 URL-encoded form data).
This might seem OK. After all why would anyone want to set a query string format for everyone else. What for? But are there any other (than HTML) well established standards? Is anyone else using a different format?
A side question here is dealing with [] in form fields names. PHP uses that to ensure that multiple occurrences of a field are all present in $_GET superglobal variable. (Otherwise only last occurrence is present.)
But from RFC 3986 it seems that neither [ nor ] are allowed in query string. Yet my experiments with various browsers suggested that no browser encodes those characters and they are there in the URI just like that...
Is this real life practice? Or am I testing it incorrectly? I tested with PHP 5.3.17 on IIS 7. Using Internet Explorer, Firefox and Chrome. Then I compared what is in $_SERVER['QUERY_STRING'] and $_GET.
Another question is real life support for semicolon separation.
The HTML 4.01 specification (B.2.2 Ampersands in URI attribute values) recommends that HTTP servers accept the semicolon (;) as a parameter separator (as opposed to the ampersand &).
Does any server support it? Is anyone using it? Is it worth bothering with (when considering allowed formats of the query string for a web service)?
Then how about non-ASCII characters support?
The HTML 4.01 specification (B.2.1 Non-ASCII characters in URI attribute values) restates clearly what the URI-describing RFCs stated in the first place: non-ASCII characters are not allowed in URIs. Yet the specification takes into account existing practice (the use of illegal URIs) and advises converting such characters to UTF-8 and then percent-encoding each byte with the standard URI hex encoding.
From my tests it seems that, for example, Chrome and Firefox do so. But Internet Explorer did not and just sent those characters as they were. PHP partially coped with that: $_SERVER['QUERY_STRING'] and $_GET contained those characters, but $_SERVER['REQUEST_URI'] contained ? instead.
Are there any standards or practices how to approach such cases?
Another connected question is how authors should then publish (by URI) resources with names containing non-ASCII (for example national) characters. Considering all the various parties (HTML code, browser sending the request, browser saving the file to disk, server receiving and processing the request, and server storing the file), it seems nearly impossible to have it working consistently. Or at least I never managed.
When it comes to web pages I’m already used to that and always replace national characters with the corresponding Latin base characters. But when it comes to external files (PDFs, images, …) it somehow “feels wrong” to “downgrade” the names, especially if one expects users to save those files on disk. How to deal with this issue?
Have you checked the HTTP specification (RFC 2616)?
Take a look at those parts:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2
The practical advice would be to use Base64 to encode the fields that you expect to contain risky characters and decode them later on your backend. Prefer the URL-safe Base64 variant, since standard Base64 output can itself contain +, / and =.
Btw. Your question is really long. It decreases the chance that someone will dig into it.
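A minimal sketch of that Base64 approach in Python, using only the standard library. The `name` parameter and the sample text are assumptions made for illustration, and the URL-safe alphabet is one possible choice rather than something the advice above prescribes.

```python
import base64
from urllib.parse import urlencode, parse_qs

value = "żółw & co?"  # sample text with non-ASCII and reserved characters

# Client side: Base64 the UTF-8 bytes, then place the token in the query.
token = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
query = urlencode({"name": token})  # also escapes the "=" padding

# Backend side: parse the query string, then undo the Base64 step.
received = parse_qs(query)["name"][0]
decoded = base64.urlsafe_b64decode(received).decode("utf-8")

print(query)
print(decoded == value)
```

The trade-off is that the parameter is no longer human readable, and both sides still have to agree on the charset of the underlying bytes (UTF-8 here).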
In fact RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax) doesn’t say anything about format of the query string fragment
Yes, it does, in Section 3.4:
query = *( pchar / "/" / "?" )
pchar is defined in Section 3.3:
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
and ends on defining which characters are allowed and how to encode other characters.
Exactly. That is defining the format of the query string fragment.
But from RFC 3986 it seems that neither [ nor ] are allowed in query string.
Officially, that is right. But not all browsers percent-encode them, and that is broken behavior on their part. All official specs I have seen (and 3986 is not the only one in play) say those characters must be percent-encoded.
Then how about non-ASCII characters support?
Non-ASCII characters are not allowed in URIs. They must be charset-encoded and percent-encoded. The actual charset used is server-specific, there is no spec that allows a URI to specify the charset used. Various specs recommend UTF-8, but do not require UTF-8, and some foreign servers indeed do not use UTF-8.
The IRI spec (RFC 3987), which complements the URI spec, supports the full Unicode charset, but IRIs are still relatively new and many servers do not support them yet. However, the RFC does define algorithms for converting IRIs to URIs and vice versa.
When in doubt, percent-encode everything you are not sure about. Servers are required to support percent-encoded characters and decode them when present, before processing the decoded data as needed.
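As a small illustration of the "percent-encode when in doubt" advice, here is a sketch with Python's urllib.parse. The `tags[]` field name mirrors the PHP convention from the question, and the values are made up. UTF-8 is the library's default charset here, not something the URI spec mandates.

```python
from urllib.parse import urlencode, parse_qs

# Repeated "tags[]" fields, with non-ASCII and reserved characters in values.
params = [("tags[]", "żółw"), ("tags[]", "a&b")]

query = urlencode(params)  # brackets, "&" and non-ASCII bytes get escaped
print(query)  # tags%5B%5D=%C5%BC%C3%B3%C5%82w&tags%5B%5D=a%26b

# A receiver that decodes percent-escapes recovers the original data.
decoded = parse_qs(query)
print(decoded)  # {'tags[]': ['żółw', 'a&b']}
```

The encoded form contains only characters that RFC 3986 allows in a query, so it does not depend on how lenient a particular browser or server is.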

HTTP requests and querystring vs headers?

I'm trying to understand the difference between querystrings and headers. Where do you use each?
Query strings might be more useful in making URLs human readable I suppose, but other than that, wouldn't it be easier to just embed that data in your own custom HTTP header? (Side question: how does this relate to cookies?) What's the distinction between the two?
Refer to a similar question, "Adding Custom HTTP Headers".
Why would I prefer a query string over HTTP header fields?
It is easy.
I don't need any additional API.
It is also recommended in the HTTP RFC to "follow common forms" when it comes to header fields.
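To show where each one ends up, here is a rough sketch with Python's urllib.request. No request is actually sent. The URL, the `version` parameter and the `X-Api-Version` header name are all hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

base = "http://example.com/api/items"

# Option 1: the value travels in the URL itself, so it is visible in links,
# bookmarks and logs without any extra tooling.
via_query = Request(base + "?" + urlencode({"version": "2"}))

# Option 2: the same value travels in a custom header (hypothetical name);
# the URL stays unchanged and the client needs an API that can set headers.
via_header = Request(base, headers={"X-Api-Version": "2"})

print(via_query.full_url)   # http://example.com/api/items?version=2
print(via_header.full_url)  # http://example.com/api/items
print(via_header.header_items())
```

A plain link or an HTML form can produce the first request, while the second one needs code on the client side, which is the "additional API" point above.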

How to send MIME over HTTP?

I need to send certain data to the server in a .zip archive, over HTTP POST request, MIME encoded. I take it that means only that I need to specify MIME type in a request header. But I'm confused as to what should I put in request's body. So far I can see two ways to do it:
Usually, as I take it (sorry, I'm not a web coder, so kinda lame with HTTP), POST request body consists of pairs parameter_name=some+data divided by '&'. Should I do it the same way and write contents of my file in base64 in one of parameters? That would also let me provide supplemental parameters.
Or should I just fill POST body with contents of my file (in base64, right?)? If so, is there any way to provide additional info about the file?
Is only one of these ways acceptable, or are both? If both, what would be the best practice?
Also, code sample in C++ for Qt would be very-very much appreciated, but totally not necessary :)
The whole key=value body in POST requests is just for when you are sending form-data to your server. If you want to POST only the contents of a .zip file you can just send that as the body of your POST, no need to set it up like a form post as you describe. You can set the following headers in the request:
Content-Type: application/zip
Content-Disposition: attachment; filename=myzip.zip
You don't even necessarily have to base64 encode the body, although you should if that's what your server is expecting.
The Content-Disposition is the thing you need to describe more about your file upload. You can find some details about it here:
http://en.wikipedia.org/wiki/MIME#Content-Disposition
and here
http://www.ietf.org/rfc/rfc2183.txt
At the server end, you just need to write some code which will get the request body in its entirety (which is straightforward, although YMMV depending on language and framework), and handle it however you want.
For a real world example, you might find it useful to look at, say, AtomPub for how this is done:
http://bitworking.org/projects/atom/rfc5023.html
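The question asks for C++/Qt, but the idea is language independent. As a rough sketch in Python (standard library only), the raw archive bytes become the POST body and the headers above describe it. The endpoint URL and the archive contents are made up, and the request is only constructed, not sent.

```python
import io
import zipfile
from urllib.request import Request

# Build a small zip archive in memory (a stand-in for a real file on disk).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("hello.txt", "hello world")
payload = buf.getvalue()

# The raw bytes are the body; no form encoding and no Base64 step.
req = Request(
    "http://example.com/upload",  # hypothetical endpoint
    data=payload,
    method="POST",
    headers={
        "Content-Type": "application/zip",
        "Content-Disposition": "attachment; filename=myzip.zip",
    },
)

print(req.get_method(), req.full_url, len(req.data), "bytes")
# urllib.request.urlopen(req) would send it; omitted here on purpose.
```

If supplemental parameters are needed alongside the file, they could go in the query string or in further headers, assuming the server is written to look for them there.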

Resources