The header fields of an HTTP message: who chooses them?

Looking at the Network panel in the Google Chrome developer tools, I can read the HTTP request and response messages for each file in a web page and, in particular, the start line and the headers with all their fields.
I know (and I hope that is right) that the start line of each HTTP message has a specific and rigid structure (different for request and response messages, of course) and that no element of a start line can be omitted.
Unlike the start line, the header section of an HTTP message contains additional information, so I guess the header fields are optional or, at least, not as strictly required as the fields in the start line.
Considering all this, I'm wondering: who sets the header fields in an HTTP message? In other words, how are the header fields of an HTTP message determined?
For example, I can see that the HTTP request message for a web page is this:
GET / HTTP/1.1
Host: www.corriere.it
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.130 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4,de;q=0.2
Cookie: rccsLocalPref=milano%7CMilano%7C015146; rcsLocalPref=milano%7CMilano; _chartbeat2=DVgclLD1BW8iBl8sAi.1422913713367.1430683372200.1111111111111111; rlId=8725ab22-cbfc-45f7-a737-7c788ad27371; __ric=5334%3ASat%20Jun%2006%202015%2014%3A13%3A31%20GMT+0200%20%28ora%20legale%20Europa%20occidentale%29%7C; optimizelyEndUserId=oeu1433680191192r0.8780217287130654; optimizelySegments=%7B%222207780387%22%3A%22gc%22%2C%222230660652%22%3A%22false%22%2C%222231370123%22%3A%22referral%22%7D; optimizelyBuckets=%7B%7D; __gads=ID=bbe86fc4200ddae2:T=1434976116:S=ALNI_MZnWxlEim1DkFzJn-vDIvTxMXSJ0g; fbm_203568503078644=base_domain=.corriere.it; apw_browser=3671792671815076067.; channel=Direct; apw_cache=1438466400.TgwTeVxF.1437740670.0.0.0...EgjHfb6VZ2K4uRK4LT619Zau06UsXnMdig-EXKOVhvw; ReadSpeakerSettings=enlarge=enlargeoff; _ga=GA1.2.1780902850.1422986273; __utma=226919106.1780902850.1422986273.1439110897.1439114180.19; __utmc=226919106; __utmz=226919106.1439114180.19.18.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); s_cm_COR=Googlewww.google.it; gvsC=New; rcsddfglr=1441375682.3.2.m0i10Mw-|z1h7I0wH.3671792671815076067..J3ouwyCkNXBCyau35GWCru0I1mfcA3hRLNURnDWREPs; cpmt_xa=5334,5364; utag_main=v_id:014ed4175b8e000f4d2bb480bdd10606d001706500bd0$_sn:74$_ss:1$_st:1439133960323$_pn:1%3Bexp-session$ses_id:1439132160323%3Bexp-session; testcookie=true; s_cc=true; s_nr=1439132160762-Repeat; SC_LNK_CR=%5B%5BB%5D%5D; s_sq=%5B%5BB%5D%5D; dtLatC=116p80.5p169.5p91.5p76.5p130.5p74p246.5p100p74.5p122.5; dtCookie=E4365758C13B82EE9C1C69A59B6F077E|Corriere|1|_default|1; dtPC=-; NSC_Wjq_Dpssjfsf_Dbdif=ffffffff091a1f8d45525d5f4f58455e445a4a423660; hz_amChecked=1
How are these header fields chosen? Who or what chose them? (The browser? Not me, of course...)
P.S.:
I hope my question is clear. Please forgive my bad English.

Websites are hosted on HTTP servers. The response headers are set by the HTTP server hosting the page; they control how pages are displayed, cached, and encoded.
The request headers are set by the web browser when it requests pages from the server. This two-way exchange is governed by the HTTP protocol linked above.

Here is a list of all the possible header fields for a request message. The question is: why does the browser choose only some of them?
The browser doesn't include all possible request headers in every request because either:
they aren't applicable to the current request, or
the default value is the desired value.
For instance:
Accept tells the server that only certain data formats are acceptable in the response. If any kind of data is acceptable, then it can be omitted as the default is "everything".
Content-Length describes the length of the body of the request. A GET request normally doesn't have a body, so there is nothing to describe the length of.
Cookie contains a cookie set by the server (or JavaScript) on a previous request. If a cookie hasn't been set, then there isn't one to send back to the server.
and so on.
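As a rough illustration of the two rules above, here is a minimal Python sketch of a client that adds a header only when it applies. The function name build_request_headers and its parameters are hypothetical, and real browsers apply many more rules than this.

```python
def build_request_headers(host, body=None, cookies=None, accept=None):
    """Return only the request headers that apply to this request (illustrative)."""
    headers = {"Host": host}  # HTTP/1.1 requires Host on every request
    if accept is not None:
        # Omitted when anything is acceptable: the default already means "everything".
        headers["Accept"] = accept
    if body is not None:
        # Content-Length only makes sense when there is a body to describe.
        headers["Content-Length"] = str(len(body))
    if cookies:
        # Cookie is sent only if one was set on a previous response.
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    return headers


# A plain GET with no cookies and no format preference carries very few headers.
get_headers = build_request_headers("www.example.com")
# A POST with a body and a stored cookie carries more.
post_headers = build_request_headers(
    "www.example.com", body=b"a=1", cookies={"sid": "abc"}
)
```

The point is only that omission is a deliberate choice: a header is left out when it has nothing to describe or when its default is already what the client wants.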

Related

Jetty client http error code 412

I am accessing a website (I am hiding the original website name as it is against the policy) using a browser and the Jetty/Apache HttpClient.
The website works fine in a web browser.
Using the API I am able to log in to the website and get the session cookie JSESSIONID and the home page HTML content. But after that, when I submit any form or follow links from the HTML, I receive HTTP error code 412 (Precondition Failed).
I understand this error is due to a problem in the client headers. I set all the headers from the browser (checked using Inspect Element in Chrome). I still get the same error.
I am not able to track down which header is causing the problem.
Here are the headers from the browser:
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:318
Content-Type:application/x-www-form-urlencoded
Cookie:language=en_IN; __gads=ID=14c64d8f9fd7de54:T=1424658276:S=ALNI_Mba1kvJO4mLo7R-T2jUJE9zCYck5A; SLB_Cookie=ffffffff09461c2d45525d5f4f58455e445a4a422971; JSESSIONID=36m4Oo6dCML_Wvx-Wgmm9rtLh9mbURxnZhWIVwg-zHaNzFQeUt9C!-1989013783; _ga=GA1.3.379900459.1428120216
Host:www.irctc.co.in
Origin:https://www.example.com
Referer:https://www.example.com/context/home
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36
I am setting the same headers from the Jetty client:
Request request = httpClient.newRequest(url);
request.method(HttpMethod.POST);
request.agent(USER_AGENT);
request.accept("text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
request.header(HttpHeader.REFERER,"https://www.example.com/context/home");
request.header(HttpHeader.ACCEPT_ENCODING, "gzip, deflate");
request.header(HttpHeader.ACCEPT_LANGUAGE, "en-GB,en-US;q=0.8,en;q=0.6");
request.header(HttpHeader.CACHE_CONTROL, "max-age=0");
request.header(HttpHeader.CONNECTION, "keep-alive");
request.header(HttpHeader.CONTENT_TYPE, "application/x-www-form-urlencoded");
request.header(HttpHeader.HOST, "www.example.com");
for (Map.Entry<String, String> entry : params.entrySet()) {
    request.param(URLEncoder.encode(entry.getKey(), "UTF-8"), URLEncoder.encode(entry.getValue(), "UTF-8"));
    if (!StringUtils.isEmpty(content)) {
        content += "&";
    }
    content += URLEncoder.encode(entry.getKey(), "UTF-8") + "=" + URLEncoder.encode(entry.getValue(), "UTF-8");
}
request.header(HttpHeader.CONTENT_LENGTH, ""+content.length());
I can see that JSESSIONID and SLB_Cookie are present in the request. Since the website is outside our control, I cannot track down what the issue is.
Please help me resolve this. Any pointers on fixing it on the client side are appreciated. Is there any way to determine which header is causing the issue?
I solved the problem.
The issue was with the form parameter values. I was sending already-encoded values, which were then encoded again by the Jetty client.
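To illustrate the kind of double encoding described here, a small Python sketch (not the Jetty code from the question; the sample value is made up) shows what a server sees when a value is URL-encoded twice but decoded only once:

```python
from urllib.parse import quote_plus, unquote_plus

value = "New Delhi & Agra"                    # hypothetical form value
encoded_once = quote_plus(value)              # what the server expects to receive
encoded_twice = quote_plus(encoded_once)      # what is sent if the client encodes again

# The server decodes once, so a double-encoded value arrives still encoded.
received = unquote_plus(encoded_twice)
print(received == value)          # False: the server sees the wrong value
print(received == encoded_once)   # True: one layer of encoding is left over
```

If the HTTP client library already encodes form parameters, the usual fix is to pass it the raw values and let it encode them exactly once.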

HTTP headers which cause PREFLIGHT - Clarification?

Simple requests are requests that meet the following criteria :
HTTP Method matches (case-sensitive) one of:
HEAD
GET
POST
HTTP Headers matches (case-insensitive):
Accept
Accept-Language
Content-Language
Last-Event-ID
Content-Type, but only if the value is one of:
application/x-www-form-urlencoded
multipart/form-data
text/plain
But look at this test page, which does not cause a preflight request:
General :
Remote Address:69.163.243.142:80
Request URL:http://aruner.net/resources/access-control-with-get/
Request Method:GET
Status Code:200 OK
Request Headers
Accept:*/*
Accept-Encoding:gzip, deflate, sdch
Accept-Language:en-US,en;q=0.8,he;q=0.6
Cache-Control:no-cache
Connection:keep-alive
DNT:1
Host:aruner.net
Origin:http://arunranga.com
Pragma:no-cache
Referer:http://arunranga.com/examples/access-control/simpleXSInvocation.html
User-Agent:Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36
Response Headers
Access-Control-Allow-Origin:http://arunranga.com
Connection:Keep-Alive
Content-Type:application/xml
Date:Sat, 26 Sep 2015 09:00:26 GMT
Keep-Alive:timeout=2, max=100
Server:Apache
Transfer-Encoding:chunked
Being pedantic and looking at the request section, there are many headers that are not in the preceding criteria section:
Cache-Control is not on the list
Connection is not on the list
DNT is not on the list
User-Agent is not on the list
Accept-Encoding is not on the list
I know that those are more "general" headers. But so is Accept-Language.
Question
What am I missing here? According to the criteria section, a request with those headers should cause a preflight request.
Looking at your code:
invocation.open('GET', url, true);
invocation.onreadystatechange = handler;
invocation.send();
You are not actually setting any custom headers, for example:
invocation.setRequestHeader("X-Requested-With", "XMLHttpRequest");
Therefore there will be no preflight. Default browser headers do not count. The preflight mechanism is only there to ensure any custom headers, such as the one in my example above, are allowed to be passed cross domain by the receiving site.
As further clarification on top of the accepted answer: See the HTTP header layer division section of the Fetch Standard (where the CORS protocol and UA requirements are defined these days).
For the purposes of fetching, the platform has an API layer (HTML's
img, CSS' background-image), early fetch layer, service worker layer,
and network & cache layer. Accept and Accept-Language are set in
the early fetch layer (typically by the user agent). Most other
headers controlled by the user agent, such as Accept-Encoding,
Host, and Referer, are set in the network & cache layer.
Developers can set headers either at the API layer or in the service
worker layer (typically through a Request object).
So, based on that, we can essentially say:
the headers in the question are controlled by the UA, and set in the “network & cache layer”
so, the headers are not headers that developers can set in the “API layer”
so, the headers have not yet been set at the point when the algorithm runs for determining if a preflight request is required (instead they’re set by the UA later, after that’s already been done)
Then, given the above, despite the fact those headers can be seen in the request, we know they have played no part in the determination of whether a preflight should have been required.
In other words, those headers are essentially irrelevant to CORS. And so also, the only headers that are relevant are those that developers manually set in the “API layer” or service-worker layer.
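As a simplified sketch of that conclusion, the check below applies the "simple request" criteria quoted in the question to the method and to the headers the developer set, and ignores everything the browser adds later. The function name needs_preflight is hypothetical, and the Fetch Standard adds further conditions that are not modeled here.

```python
SIMPLE_METHODS = {"HEAD", "GET", "POST"}
SIMPLE_HEADERS = {"accept", "accept-language", "content-language", "last-event-id"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method, author_headers):
    """Apply the quoted criteria to developer-set headers only (simplified)."""
    if method not in SIMPLE_METHODS:  # method comparison is case-sensitive
        return True
    for name, value in author_headers.items():
        lname = name.lower()  # header names are compared case-insensitively
        if lname in SIMPLE_HEADERS:
            continue
        if lname == "content-type":
            media_type = value.split(";", 1)[0].strip().lower()
            if media_type in SIMPLE_CONTENT_TYPES:
                continue
        return True  # any other developer-set header triggers a preflight
    return False


# The request in the question sets no headers itself, so no preflight is needed.
print(needs_preflight("GET", {}))
# Adding a custom header from script changes that.
print(needs_preflight("GET", {"X-Requested-With": "XMLHttpRequest"}))
```

Headers such as Cache-Control, DNT, or User-Agent never reach this check in the sketch, because they are not part of author_headers; that mirrors the layering described above.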

How are parameters sent in an HTTP POST request?

In an HTTP GET request, parameters are sent as a query string:
http://example.com/page?parameter=value&also=another
In an HTTP POST request, the parameters are not sent along with the URI.
Where are the values? In the request header? In the request body? What does it look like?
The values are sent in the request body, in the format that the content type specifies.
Usually the content type is application/x-www-form-urlencoded, so the request body uses the same format as the query string:
parameter=value&also=another
When you use a file upload in the form, you use the multipart/form-data encoding instead, which has a different format. It's more complicated, but you usually don't need to care what it looks like, so I won't show an example, but it can be good to know that it exists.
The content is put after the HTTP headers. The format of an HTTP POST is to have the HTTP headers, followed by a blank line, followed by the request body. The POST variables are stored as key-value pairs in the body.
You can see this in the raw content of an HTTP Post, shown below:
POST /path/script.cgi HTTP/1.0
From: frog@jmarshall.com
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

home=Cosby&favorite+flavor=flies
You can see this using a tool like Fiddler, which you can use to watch the raw HTTP request and response payloads being sent across the wire.
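If you want to reproduce that layout yourself, here is a minimal Python sketch that assembles a raw POST request as text. The helper name build_post_request and the host are placeholders; the field names are taken from the example above.

```python
from urllib.parse import urlencode


def build_post_request(path, host, params):
    """Assemble a raw HTTP POST: headers, a blank line, then the encoded body."""
    body = urlencode(params)  # e.g. "home=Cosby&favorite+flavor=flies"
    head = "\r\n".join([
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('ascii'))}",  # length of the body in bytes
    ])
    # The empty line (CRLF CRLF) separates the headers from the body.
    return head + "\r\n\r\n" + body


raw = build_post_request(
    "/path/script.cgi",
    "www.example.com",
    {"home": "Cosby", "favorite flavor": "flies"},
)
print(raw)
```

In practice an HTTP library does this for you; the sketch is only meant to show where the parameters end up.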
Short answer: in POST requests, values are sent in the "body" of the request. With web-forms they are most likely sent with a media type of application/x-www-form-urlencoded or multipart/form-data. Programming languages or frameworks which have been designed to handle web-requests usually do "The Right Thing™" with such requests and provide you with easy access to the readily decoded values (like $_REQUEST or $_POST in PHP, or cgi.FieldStorage(), flask.request.form in Python).
Now let's digress a bit, which may help understand the difference ;)
The difference between GET and POST requests is largely semantic. They are also "used" differently, which explains the difference in how values are passed.
GET (relevant RFC section)
When executing a GET request, you ask the server for one, or a set of entities. To allow the client to filter the result, it can use the so called "query string" of the URL. The query string is the part after the ?. This is part of the URI syntax.
So, from the point of view of your application code (the part which receives the request), you will need to inspect the URI query part to gain access to these values.
Note that the keys and values are part of the URI. Browsers may impose a limit on URI length. The HTTP standard states that there is no limit. But at the time of this writing, most browsers do limit the URIs (I don't have specific values). GET requests should never be used to submit new information to the server. Especially not larger documents. That's where you should use POST or PUT.
POST (relevant RFC section)
When executing a POST request, the client is actually submitting a new document to the remote host. So a query string does not (semantically) make sense, which is why you don't have access to one in your application code.
POST is a little bit more complex (and way more flexible):
When receiving a POST request, you should always expect a "payload" or, in HTTP terms, a message body. The message body by itself is pretty useless, as there is no standard format (as far as I can tell; maybe application/octet-stream?). The body format is defined by the Content-Type header. When using an HTML FORM element with method="POST", this is usually application/x-www-form-urlencoded. Another very common type is multipart/form-data, used for file uploads. But it could be anything, ranging from text/plain to application/json or even a custom application/octet-stream.
In any case, if a POST request is made with a Content-Type which cannot be handled by the application, it should return a 415 status-code.
Most programming languages (and/or web-frameworks) offer a way to de/encode the message body from/to the most common types (like application/x-www-form-urlencoded, multipart/form-data or application/json). So that's easy. Custom types require potentially a bit more work.
Using a standard HTML form encoded document as example, the application should perform the following steps:
Read the Content-Type field
If the value is not one of the supported media-types, then return a response with a 415 status code
otherwise, decode the values from the message body.
Again, languages like PHP, or web-frameworks for other popular languages will probably handle this for you. The exception to this is the 415 error. No framework can predict which content-types your application chooses to support and/or not support. This is up to you.
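The three steps above can be sketched in Python as follows. This is a framework-free illustration: decode_body is a hypothetical helper, and the set of supported media types is just an example of a choice an application might make.

```python
import json
from urllib.parse import parse_qs


def decode_body(content_type, body):
    """Return (status, data) for a POST body given as bytes."""
    # Step 1: read the Content-Type field, ignoring parameters such as charset.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    # Step 3: decode the values for the media types this application supports.
    if media_type == "application/x-www-form-urlencoded":
        return 200, parse_qs(body.decode("utf-8"))
    if media_type == "application/json":
        return 200, json.loads(body.decode("utf-8"))
    # Step 2: anything else is answered with 415 Unsupported Media Type.
    return 415, None


print(decode_body("application/x-www-form-urlencoded", b"parameter=value&also=another"))
print(decode_body("text/csv", b"a,b"))
```

Which media types end up in the supported branches is exactly the application-specific decision mentioned above.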
PUT (relevant RFC section)
A PUT request is handled in pretty much the same way as a POST request. The big difference is that a POST request is supposed to let the server decide how (and whether) to create a new resource. Historically (in the now obsolete RFC 2616), it was to create a new resource as a "subordinate" (child) of the URI to which the request was sent.
A PUT request in contrast is supposed to "deposit" a resource exactly at that URI, and with exactly that content. No more, no less. The idea is that the client is responsible to craft the complete resource before "PUTting" it. The server should accept it as-is on the given URL.
As a consequence, a POST request is usually not used to replace an existing resource. A PUT request can do both create and replace.
Side-Note
There are also "path parameters" which can be used to send additional data to the remote, but they are so uncommon, that I won't go into too much detail here. But, for reference, here is an excerpt from the RFC:
Aside from dot-segments in hierarchical paths, a path segment is considered
opaque by the generic syntax. URI producing applications often use the
reserved characters allowed in a segment to delimit scheme-specific or
dereference-handler-specific subcomponents. For example, the semicolon (";")
and equals ("=") reserved characters are often used to delimit parameters and
parameter values applicable to that segment. The comma (",") reserved
character is often used for similar purposes. For example, one URI producer
might use a segment such as "name;v=1.1" to indicate a reference to version
1.1 of "name", whereas another might use a segment such as "name,1.1" to
indicate the same. Parameter types may be defined by scheme-specific
semantics, but in most cases the syntax of a parameter is specific
to the implementation of the URIs dereferencing algorithm.
You cannot type it directly into the browser URL bar.
You can see how POST data is sent with a tool such as Live HTTP Headers.
The result will look something like this:
http://127.0.0.1/pass.php
POST /pass.php HTTP/1.1
Host: 127.0.0.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://127.0.0.1/pass.php
Cookie: passx=87e8af376bc9d9bfec2c7c0193e6af70; PHPSESSID=l9hk7mfh0ppqecg8gialak6gt5
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 30

username=zurfyx&pass=password
The final line,
username=zurfyx&pass=password
holds the POST values; the Content-Length header above it describes the size of this body.
The default media type for an HTML form submitted with POST is application/x-www-form-urlencoded. This is a format for encoding key-value pairs. Keys may be repeated. Key-value pairs are separated from each other by an & character, and each key is separated from its value by an = character.
For example:
Name: John Smith
Grade: 19
Is encoded as:
Name=John+Smith&Grade=19
This is placed in the request body after the HTTP headers.
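A quick way to check this encoding, including a repeated key, is Python's urllib.parse. This is only a sketch; the second Grade value is made up to show what happens with duplicates.

```python
from urllib.parse import parse_qs, urlencode

# A list of pairs (rather than a dict) allows the same key to appear twice.
pairs = [("Name", "John Smith"), ("Grade", "19"), ("Grade", "20")]

body = urlencode(pairs)   # spaces become "+", pairs are joined with "&"
decoded = parse_qs(body)  # each key maps to a list of its values

print(body)
print(decoded)
```

Because keys can repeat, decoders typically return a list of values per key rather than a single string.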
Form values in HTTP POSTs are sent in the request body, in the same format as the querystring.
For more information, see the spec.
Some of the webservices require you to place request data and metadata separately. For example a remote function may expect that the signed metadata string is included in a URI, while the data is posted in a HTTP-body.
The POST request may semantically look like this:
POST /?AuthId=YOURKEY&Action=WebServiceAction&Signature=rcLXfkPldrYm04 HTTP/1.1
Content-Type: text/tab-separated-values; charset=iso-8859-1
Content-Length: []
Host: webservices.domain.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: identity
User-Agent: Mozilla/3.0 (compatible; Indy Library)
name id
John G12N
Sarah J87M
Bob N33Y
This approach logically combines QueryString and Body-Post using a single Content-Type which is a "parsing-instruction" for a web-server.
Please note: in the request line, HTTP/1.1 is preceded by a space (#32) and followed by a line break (CR LF, #13#10).
First of all, let's differentiate between GET and POST.
GET: the default HTTP request made to the server. It is used to retrieve data from the server, and the query string that comes after the ? in a URI is used to identify a specific resource.
This is the format:
GET /someweb.asp?data=value HTTP/1.0
Here data=value is the query string being passed.
POST: used to send data to the server in the request body rather than in the URL. This is the format of a POST request:
POST /someweb.asp HTTP/1.0
Host: localhost
Content-Type: application/x-www-form-urlencoded //you can put any format here
Content-Length: 11 //it depends
Name= somename
Why POST over GET?
In GET, the values being sent to the server are usually appended to the base URL in the query string. This has two consequences:
GET requests are saved in the browser history along with their parameters, so your passwords remain unencrypted in the browser history. This was a real issue for Facebook back in the day.
Servers usually have a limit on how long a URI can be. If too many parameters are sent, you might receive a 414 error (URI Too Long).
In a POST request, the data from the fields is added to the body instead. The length of the body is calculated and added to the Content-Length header, and no important data is appended directly to the URL.
You can use the Network section of the Chrome Developer Tools to see basic information about how requests are made to the servers.
You can also always add more values to your request headers, such as Cache-Control, Origin, and Accept.
POST parameters can be sent in many formats:
formdata
raw data
json
encoded data
file
xml
The format is indicated by the Content-Type header, whose value is a MIME type.
In CGI Programming on the World Wide Web the author says:
Using the POST method, the server sends the data as an input stream to
the program. ..... since the server passes information to this program
as an input stream, it sets the environment variable CONTENT_LENGTH to
the size of the data in number of bytes (or characters). We can use
this to read exactly that much data from standard input.
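A minimal Python sketch of what the quote describes: read exactly CONTENT_LENGTH bytes from the input stream. The environment and stream are simulated here so the snippet runs on its own, and read_post_body is a hypothetical helper name; in a real CGI script they would be os.environ and sys.stdin.buffer.

```python
import io


def read_post_body(environ, stdin):
    """Read exactly CONTENT_LENGTH bytes of POST data from the input stream."""
    length = int(environ.get("CONTENT_LENGTH") or 0)  # missing or empty means no body
    return stdin.read(length)


# Simulated CGI environment and standard input for illustration.
environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "32"}
stdin = io.BytesIO(b"home=Cosby&favorite+flavor=flies")

body = read_post_body(environ, stdin)
print(body)
```

Reading by length rather than until end-of-file matters because the server is not required to close the stream after the body.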

Why doesn't .NET Web API detect the request Content-Type automatically and do auto-binding?

If I make a request without specifying the Content-Type, an HTTP 500 error occurs:
No MediaTypeFormatter is available to read an object of type 'ExampleObject' from content with media type ''undefined''.
Why not try to detect the incoming data and bind automatically?
Another case:
This request sends JSON with Content-Type: application/x-www-form-urlencoded:
User-Agent: Fiddler
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Host: localhost:10329
Content-Length: 42
Request Body:
{"Name":"qq","Email":"ww","Message":"ee"}:
My action doesn't detect the JSON request data automatically in the object parameter:
public void Create(ExampleObject example) //example is null
{
}
Instead of leaving the object null, why doesn't it try to resolve it?
So, for the binding to occur, I need to send Content-Type: application/json.
Wouldn't it be best if .NET Web API detected the type of the request data and did the binding automatically? Why doesn't it work this way?
application/x-www-form-urlencoded means you will be sending data in the x-www-form-urlencoded standard. Sending data in another standard will not work.
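As an analogy only (this uses Python's standard parsers, not Web API's MediaTypeFormatter), the sketch below feeds the JSON body from the question, without its trailing colon, to a form decoder and to a JSON decoder:

```python
import json
from urllib.parse import parse_qs

# The JSON text from the question, sent with a form-urlencoded label.
body = '{"Name":"qq","Email":"ww","Message":"ee"}'

as_form = parse_qs(body)   # decoded the way the Content-Type header promised
as_json = json.loads(body) # decoded the way the body was actually written

print(as_form)
print(as_json["Name"])
```

The form decoder finds no key=value pairs at all, which is roughly why a binder that trusts the declared Content-Type ends up with nothing to bind.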
It sounds like what you want to do is accept multiple formats from the server.
The way HTTP works is that the client makes a request to the server for a resource and tells the server which content types it understands. This means that the client doesn't get a response it isn't able to decode, and the server knows which responses are most appropriate for the client. For example, if you are a web browser, the most appropriate content type is text/html, but if you get XML you can probably do something with that too. So you would make a request with the following:
accept: text/html, application/xml
This says you understand both HTML and XML; to state a preference explicitly, add a quality value such as application/xml;q=0.9.
In your example, if your client wants application/x-www-form-urlencoded but can also deal with JSON, then you should do the following when making a request:
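To make the negotiation more concrete, here is a small Python sketch of how a server might order the media types in an Accept header by their q values. The helper name parse_accept is hypothetical, and the sketch ignores finer points such as the specificity of wildcards.

```python
def parse_accept(header):
    """Return media types from an Accept header, most preferred first."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0  # a media type without a q parameter has quality 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media_type, q))
    # Stable sort: entries with equal q keep the order in which they were listed.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return [media_type for media_type, _ in entries]


print(parse_accept("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
```

A server would then walk this list and pick the first type it is able to produce.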
accept: application/x-www-form-urlencoded, application/json
For more details, see the HTTP spec on Accept headers: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
You may also want to create a new media type formatter so your server knows how to give clients application/x-www-form-urlencoded. Take a look at this blog post for more information on how to do this: http://www.strathweb.com/2012/04/rss-atom-mediatypeformatter-for-asp-net-webapi/

context.Request.Files collection empty on remote server only

I'm using a custom ashx handler to handle a file upload. When run locally, the file uploads fine.
When I use the same setup on the web server I get an "Index out of range" error.
In Firebug I see the binary contents of the file in the POST data, and the file name is also passed in the query string.
Has anyone seen this before?
I'm sure it's something minor, but it's driving me up the wall.
Update: Made some progress. I found out that I'm getting two different errors, one from FF/Chrome and one from IE. I'm focusing on FF for now, just because Firebug makes debugging easier. Now I get the error "Could not find a part of the path 'C:\inetpub\wwwroot\'"
Update 2: Got this working in FF/Chrome. It turns out IE and FF/Chrome post the data differently.
Update 3: Here is the output of the network profiler in IE dev tool:
Request header:
Key Value
Request POST /Secured/UploadHandler.ashx? HTTP/1.1
Accept text/html, application/xhtml+xml, */*
Referer http://cms.webstreet.co.il/Secured/fileUpload.aspx
Accept-Language he-IL
User-Agent Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
Content-Type multipart/form-data; boundary=---------------------------7db13b13d1b12
Accept-Encoding gzip, deflate
Host cms.webstreet.co.il
Content-Length 262854
Connection Keep-Alive
Cache-Control no-cache
Request body:
-----------------------------7db13b13d1b12
Content-Disposition: form-data; name="qqfile"; filename="P-Art_Page_Digital.jpg"
Content-Type: image/jpeg
<Binary File Data Not Shown>
---------------------------7db13b13d1b12--
See the (big) list of comments and replies attached to the original question. Not sure why it works now, but Elad seems to have fixed his problem.
You have to specify the name attribute.
<input id="File1" name="file1" type="file" />
I am pretty sure file uploads CANNOT be done via Ajax; you need to use a regular form post.
Also make sure you have the enctype attribute set correctly on your form tag.
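For reference, the body that a form with enctype="multipart/form-data" produces has roughly the shape sketched below in Python. The helper name build_multipart, the boundary string, and the file bytes are made up for illustration; browsers generate their own boundary.

```python
def build_multipart(field_name, filename, content_type, data, boundary):
    """Build a single-file multipart/form-data body and its Content-Type value."""
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,  # each part starts with "--" + boundary
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"'.encode("utf-8"),
        f"Content-Type: {content_type}".encode("ascii"),
        b"",              # blank line between the part headers and the file data
        data,
        delimiter + b"--",  # closing delimiter: "--" + boundary + "--"
        b"",
    ]
    body = b"\r\n".join(lines)
    header = f"multipart/form-data; boundary={boundary}"
    return header, body


header, body = build_multipart("qqfile", "photo.jpg", "image/jpeg", b"\xff\xd8", "XyZ123")
print(header)
print(body)
```

The name in Content-Disposition comes from the input's name attribute, which is why a file input without a name does not show up in the server-side file collection.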

Resources