The documentation on developer.mozilla.org says:
HTTP headers allow the client and the server to pass additional
information with the request or the response
but I don't understand what the use of that is. Why is there a need to pass additional information with the request or the response?
This is a hard question to answer concisely because of the many different types of HTTP headers and what they do, but here's an attempt at a one-line answer:
HTTP headers allow a client and server to understand each other better, meaning they can communicate more effectively.
So then if you look at individual headers, it becomes clearer why each is needed:
User-Agent header
Sent by the client
Tells the server about the client's setup (browser, OS etc.)
Mostly used to improve client experience, e.g. tailoring responses for mobile devices or dealing with browser compatibility issues
Set-Cookie header
Sent by the server
Tells the browser to set a cookie
Host header
Sent by the client
Specifies the exact domain name of the site the client wants to reach; this is used when a single server hosts multiple websites (a.k.a. virtual hosting)
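For illustration, here's roughly what these headers look like on the wire (all values below are made up):
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0
Set-Cookie: sessionid=abc123; Path=/; HttpOnly
Host: www.example.com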
I'm currently working on a system where a client makes HTTP 1.1 requests to an origin server. I control both the client and the server software, so I have free rein over the HTTP headers set. Between the client and the origin are multiple, hierarchical layers of web proxy / cache devices (think Squid or similar).
The data served up by the origin is usually highly cacheable, and I intend to set HTTP response headers to indicate this. Specifically, I plan to use Cache-Control: public, max-age=<value>. I understand that this means intermediate proxies will cache the response up to the specified max-age, at which point they will revalidate against the origin (presumably with a conditional request based on the Last-Modified header, looking for a 304 response).
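As a rough sketch (the max-age value, date and content type are just placeholders), the origin's response headers would look something like:
HTTP/1.1 200 OK
Cache-Control: public, max-age=3600
Last-Modified: Tue, 04 Jun 2013 10:00:00 GMT
Content-Type: application/json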
The problem I have is that the client might become aware that the data held by caches might now be invalid. In this case, I need the client to make a request which instructs the caches to either fetch or revalidate their response with the origin. If the origin response is now different, the cache should store this new response. In my mind, this would involve the client making the request, and each cache in the chain should revalidate its response with the next upstream device, all the way back to the origin. The new response can then be served from the closest cache which actually has it.
What are the correct HTTP headers that need to be set on the client request to achieve this? At first I thought that setting Cache-Control: no-cache in the HTTP request would make this happen, but reading the RFC, it seems that this will instruct the intermediate caches to go back to the origin (desired) but also not to cache the new response (not desired). I then saw an article suggesting that an HTTP request header of Cache-Control: max-age=0 would perhaps do this, but I'm not sure.
Will max-age=0 do what I need here, or do I need some other combination of HTTP headers?
I asked a similar question here: How to make proxy revalidate resource from origin. I have since learned that proxy revalidation wasn't supported by nginx at the time of writing; it was scheduled for the 1.5 release.
Sending max-age=0 from the client should trigger this revalidation mechanism in the proxy, if the original response from the origin contained the right Cache-Control headers.
But whether your upstream server(s) will respect these headers and revalidate with their origin is clearly not something you can just assume. If you have control over your upstream servers, I think it could work.
Also, ETag is preferred over the Last-Modified / If-Modified-Since headers, as far as I know.
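To illustrate (host name and ETag value are made up), the client would send something like:
GET /resource HTTP/1.1
Host: origin.example.com
Cache-Control: max-age=0
and a cache holding a stored copy would then revalidate upstream with a conditional request such as:
GET /resource HTTP/1.1
Host: origin.example.com
If-None-Match: "abc123"
getting back either a 304 Not Modified (keep the cached copy) or a 200 with a fresh body (replace it).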
I found these to be helpful articles on the subject:
caching tutorial
cache control directives
http specs on validation
section 14.9.4 of this spec
[UPDATE]
Nginx version 1.5.8 has been released since, and I can confirm that this mechanism is now working!
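For reference, a minimal sketch of the nginx configuration involved (cache path, zone name and upstream are placeholders, and proxy_cache_path belongs in the http block):
proxy_cache_path /var/cache/nginx keys_zone=mycache:10m;

server {
    location / {
        proxy_cache mycache;
        # Revalidate expired cache entries with conditional requests
        # (If-Modified-Since / If-None-Match) instead of refetching the full body.
        proxy_cache_revalidate on;
        proxy_pass http://origin_backend;
    }
}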
It seems most web servers support subrequests.
I found one question here about subrequests:
Subrequest for PHP-CGI
But what is the point of subrequests at all, and when is that kind of thing really useful?
Are they defined in the HTTP protocol?
Apache subrequests can be used in, e.g., your PHP application with virtual() to access resources from the same server. The resource is processed by Apache just as normal requests are, but you don't have the overhead of sending a full HTTP request over the network interface.
Less overhead is probably the only reason one would want to use it instead of a real HTTP request.
Edit: The resources are processed by Apache, which means that Apache modules are used if configured. You can request a mod_perl- or mod_ruby-processed resource from PHP.
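A minimal sketch of what that looks like from PHP (the path is hypothetical, and virtual() only works when PHP runs as an Apache module, not under CGI/FPM):
<?php
// Perform an Apache subrequest for another resource on this server.
// Its output is sent as part of this script's output, without a full
// HTTP request ever going over the network interface.
virtual('/cgi-bin/report.pl');   // e.g. a mod_perl-handled resource
?>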
I'm wondering whether it is bad practice to have a reverse proxy that selects the upstream server depending on the HTTP method used.
The background is that I have an arbitrary web server that handles POST requests with some logic behind them. The same resources also contain static content that can be retrieved using GET. After some benchmarking I realized that nginx would handle the static content much faster than my arbitrary web server does.
I checked the option to forward incoming requests internally using nginx, which is feasible.
But this would mean that a given resource would be served by different servers, depending only on whether a GET or a POST is issued, including different header fields.
No, it's not bad practice; partitioning tasks by their nature is perfectly fine, as long as you don't need to store persistent per-user session data on the server.
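As a sketch of how this can be wired up in nginx (document root and backend address are placeholders), one option is to serve GET/HEAD as static files and hand every other method to the application server:
location / {
    root /var/www/static;          # GET (and HEAD) served directly by nginx

    limit_except GET {
        # All other methods (POST, PUT, ...) are proxied to the application server.
        proxy_pass http://127.0.0.1:8080;
    }
}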
Could somebody give me a brief overview of the differences between HTTP 1.0 and HTTP 1.1? I've spent some time with both of the RFCs, but haven't been able to pull out a lot of difference between them. Wikipedia says this:
HTTP/1.1 (1997-1999)
Current version; persistent connections enabled by default and works well with proxies. Also supports request pipelining, allowing multiple requests to be sent at the same time, allowing the server to prepare for the workload and potentially transfer the requested resources more quickly to the client.
But that doesn't mean a lot to me. I realize this is a somewhat complicated subject, so I'm not expecting a full answer, but can someone give me a brief overview of the differences at a bit lower level?
By this I mean that I'm looking for the info I would need to know to implement either an HTTP server or application. I'm mostly looking for a nudge in the right direction so that I can figure it out on my own.
Proxy support and the Host field:
HTTP 1.1 has a required Host header by spec.
HTTP 1.0 does not officially require a Host header, but it doesn't hurt to add one, and many applications (proxies) expect to see the Host header regardless of the protocol version.
Example:
GET / HTTP/1.1
Host: www.blahblahblahblah.com
This header is useful because it allows you to route a message through proxy servers, and also because your web server can distinguish between different sites on the same server.
So this means that if you have blahblahblahblah.com and helohelohelo.com both pointing to the same IP, your web server can use the Host field to distinguish which site the client machine wants.
Persistent connections:
HTTP 1.1 also allows you to have persistent connections which means that you can have more than one request/response on the same HTTP connection.
In HTTP 1.0 you had to open a new connection for each request/response pair, and after each response the connection would be closed. This led to some big efficiency problems because of TCP Slow Start.
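For example, with HTTP 1.1 two requests can reuse one TCP connection, the second following immediately on the same socket (host name made up):
GET /index.html HTTP/1.1
Host: www.example.com

GET /style.css HTTP/1.1
Host: www.example.com
An HTTP 1.0 client would instead open a separate connection for each, unless it used the common but non-standard Connection: keep-alive extension.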
OPTIONS method:
HTTP/1.1 introduces the OPTIONS method. An HTTP client can use this method to determine the abilities of the HTTP server. It's mostly used for Cross-Origin Resource Sharing (CORS) in web applications.
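For example, a client can ask which methods a resource supports (path and response values are illustrative):
OPTIONS /api/users HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
Allow: GET, POST, OPTIONS
Content-Length: 0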
Caching:
HTTP 1.0 had support for caching via the header: If-Modified-Since.
HTTP 1.1 expands on the caching support a lot by using something called an 'entity tag' (ETag).
If two versions of a resource are the same, then they will have the same entity tag.
HTTP 1.1 also adds the If-Unmodified-Since, If-Match, and If-None-Match conditional headers.
There are also further additions relating to caching like the Cache-Control header.
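A typical revalidation exchange with entity tags (the ETag value and path are made up) looks like this. First response from the server:
HTTP/1.1 200 OK
ETag: "33a64df5"
Cache-Control: max-age=60
Later revalidation by the client, sending the tag back:
GET /logo.png HTTP/1.1
Host: www.example.com
If-None-Match: "33a64df5"
If the resource is unchanged, the server answers without a body:
HTTP/1.1 304 Not Modified
ETag: "33a64df5"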
100 Continue status:
There is a new return code in HTTP/1.1: 100 Continue. This is to prevent a client from sending a large request when that client is not even sure whether the server can process the request, or is authorized to process it. In this case the client sends only the headers, and the server tells the client with 100 Continue to go ahead with the body.
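On the wire that looks roughly like this (the size is made up); the client sends only its headers with Expect: 100-continue and waits for the interim response before transmitting the body:
POST /upload HTTP/1.1
Host: www.example.com
Content-Length: 104857600
Expect: 100-continue

HTTP/1.1 100 Continue
(the client now sends the 100 MB body, and the server follows up with the final response)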
Much more:
Digest authentication and proxy authentication
Extra new status codes
Chunked transfer encoding (see the example after this list)
Connection header
Enhanced compression support
Much much more.
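To give one concrete example from this list, chunked transfer encoding lets the server start sending a response before it knows the total length: each chunk is prefixed with its size in hex, and a zero-length chunk ends the message (the payload here is made up):
HTTP/1.1 200 OK
Transfer-Encoding: chunked

7
Mozilla
9
Developer
0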
HTTP 1.0 (1994)
It is still in use
Can be used by a client that cannot deal with chunked (or compressed) server replies
HTTP 1.1 (1996-2015)
Formalizes many extensions to version 1.0
Supports persistent and pipelined connections
Supports chunked transfers, compression/decompression
Supports virtual hosting (a server with a single IP Address hosting multiple domains)
Supports multiple languages
Supports byte-range transfers; useful for resuming interrupted data transfers
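For example, a client resuming an interrupted download can ask for just the missing bytes (file name, offsets and sizes are made up):
GET /bigfile.zip HTTP/1.1
Host: www.example.com
Range: bytes=500000-

HTTP/1.1 206 Partial Content
Content-Range: bytes 500000-999999/1000000
Content-Length: 500000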
HTTP 1.1 is an enhancement of HTTP 1.0. The following lists the four major improvements:
Efficient use of IP addresses, by allowing multiple domains to be served from a single IP address.
Faster response, by allowing a web browser to send multiple requests over a single persistent connection.
Faster response for dynamically-generated pages, by support for chunked encoding, which allows a response to be sent before its total length is known.
Faster response and great bandwidth savings, by adding cache support.
For trivial applications (e.g. sporadically retrieving a temperature value from a web-enabled thermometer) HTTP 1.0 is fine for both a client and a server. You can write a bare-bones socket-based HTTP 1.0 client or server in about 20 lines of code.
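For instance, a bare-bones HTTP 1.0 client in PHP needs little more than a raw socket (host and path are placeholders):
<?php
// Open a TCP connection and issue a single HTTP/1.0 request by hand.
$fp = fsockopen('www.example.com', 80, $errno, $errstr, 5);
if (!$fp) { die("connect failed: $errstr ($errno)\n"); }
fwrite($fp, "GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n");
while (!feof($fp)) {
    echo fgets($fp, 1024);   // print status line, headers and body as they arrive
}
fclose($fp);                 // in HTTP 1.0 the server closes after one response anyway
?>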
For more complicated scenarios, HTTP 1.1 is the way to go. Expect a 3- to 5-fold increase in code size for dealing with the intricacies of the more complex HTTP 1.1 protocol. The complexity mainly comes from the fact that in HTTP 1.1 you will need to create, parse, and respond to various headers. You can shield your application from this complexity by having the client use an HTTP library, or the server use a web application server.
A key compatibility issue is support for persistent connections. I recently worked on a server that "supported" HTTP/1.1, yet failed to close the connection when a client sent an HTTP/1.0 request. When writing a server that supports HTTP/1.1, be sure it also works well with HTTP/1.0-only clients.
Some of the first differences that I can recall off the top of my head are multiple domains running on the same server and partial resource retrieval; the latter lets you retrieve part of a resource and thus speed up its download (it's what almost every download accelerator does).
If you want to develop an application like a website or similar, you don't need to worry too much about the differences, but you should at least know the difference between the GET and POST verbs.
Now, if you want to develop a browser, then yes, you will have to know the complete protocol, and the same goes if you are trying to develop an HTTP server.
If you are only interested in learning the HTTP protocol, I would recommend starting with HTTP/1.1 instead of 1.0.
HTTP 1.1 is the latest version of the Hypertext Transfer Protocol, the World Wide Web application protocol that runs on top of the Internet's TCP/IP suite of protocols. Compared to HTTP 1.0, HTTP 1.1 provides faster delivery of Web pages and reduces Web traffic.
Web traffic example: if you are accessing a server at the same time as many other users are requesting data from it, there is a chance that the server will hang. This is Web traffic.
HTTP 1.1 includes the Host header in its specification, while HTTP 1.0 doesn't officially have a Host header, though nothing stops you from adding one.
The Host header is useful because it allows the client to route a message through proxy servers. The major differences between HTTP versions 1.0 and 1.1 are:
HTTP 1.1 comes with persistent connections, which means that we can have more than one request/response on the same HTTP connection, while in HTTP 1.0 you have to open a new connection for each request and response.
HTTP 1.0 has the Pragma header, while HTTP 1.1 has Cache-Control, which is similar to Pragma.