HTTP Keep-Alive to a different Host

If a webserver serves multiple virtual hosts (which can be selected by the client in the HTTP request header Host) and supports Keep-Alive, is the client allowed to send subsequent requests over the same connection with a different Host header?

From a performance/efficiency point of view, both browsers and servers should keep connections alive based on IP, not hostname. In any case, the reused connection is at the TCP level, not the HTTP level.
RFC 2068's Persistent Connections section doesn't address this. In practice, Apache appears to keep connections alive across different virtual hosts (see: Is http keep-alive effective with different domain on the same webserver?). Also, Michael Neale reports that Chrome reuses connections for different virtual hosts.

Related

Does HTTP/1.1 server know if client is not there anymore?

Client connects to server and sends request with keep-alive header. Server response also contains keep-alive header.
Now client loses power. Is there any mechanism (like ping) in TCP or HTTP stack, which tells server, that client is not there (other than timeout)?
There is TCP keep-alive, which is specifically designed to detect lost connectivity.
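As a minimal sketch of how a server (or any endpoint) might enable TCP keep-alive in Python; the probe-timing constants below are illustrative values, not recommendations:

```python
import socket

# Create a TCP socket and enable SO_KEEPALIVE, so the kernel periodically
# probes an idle peer and eventually reports a dead connection.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# On Linux, the probe timing can be tuned per socket (values illustrative):
# start probing after 60 s of idleness, probe every 10 s, and declare the
# peer dead after 5 unanswered probes.
if hasattr(socket, "TCP_KEEPIDLE"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
```

Without keep-alive (TCP or application-level timeouts), a server has no way to notice that a silent peer has powered off, since TCP sends nothing on an idle connection.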

HTTP redirect (3xx) to a different host

I'm building an HTTP client (for embedded devices) and I was wondering:
If I receive an HTTP 3xx response, and the Location header contains a hostname different from the one in my request, should I disconnect the TCP connection and reconnect to the new host, or can I just send a new request with a new Host header over the existing TCP connection?
Thank you
It doesn't make sense to reuse the original TCP connection if you're being redirected elsewhere. If my webserver only hosts example.com and I redirect you to elsewhere.net, my webserver will probably not respond to a request for elsewhere.net.
Worse, this also potentially sets you up for a great man-in-the-middle attack if my server redirects you to http://bank.com and you reuse the same TCP connection when sending a request to bank.com. My server can maliciously respond to requests with Host: bank.com, which isn't something you want to happen.
You can't assume the original connection can be reused unless the redirect is to the same host with the same protocol.
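The rule above can be sketched as a small check; `can_reuse_connection` is a hypothetical helper, not part of any standard library API:

```python
from urllib.parse import urlsplit

def can_reuse_connection(request_url: str, redirect_url: str) -> bool:
    """Hypothetical helper: reuse the existing TCP connection only if the
    redirect target has the same scheme, host, and port as the original
    request; otherwise a fresh connection must be opened."""
    a, b = urlsplit(request_url), urlsplit(redirect_url)
    port_a = a.port or (443 if a.scheme == "https" else 80)
    port_b = b.port or (443 if b.scheme == "https" else 80)
    return (a.scheme, a.hostname, port_a) == (b.scheme, b.hostname, port_b)

print(can_reuse_connection("http://example.com/a", "http://example.com/b"))   # True
print(can_reuse_connection("http://example.com/a", "http://bank.com/login"))  # False
```

Treating the scheme as part of the identity also prevents reusing a plaintext connection after a redirect to HTTPS.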
Persistent HTTP connections are a little tricky given the number of client/server combinations. You can avoid the complexity, at the cost of some wasted time, by closing and re-establishing each connection:
If you're implementing an HTTP/1.0 client, Connection: keep-alive isn't something you have to implement. Compliant servers should just close the connection after every request if you don't negotiate that you support persistent connections.
If you're implementing an HTTP/1.1 client and don't want to persist the connection, just send Connection: close with your request, and an HTTP/1.1 server should close the connection.
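For an embedded client, the request for the simple close-after-each-response strategy could be built like this; `build_request` is a hypothetical helper for illustration:

```python
def build_request(host: str, path: str = "/", close: bool = True) -> bytes:
    """Hypothetical helper: build a minimal HTTP/1.1 GET request.
    Sending 'Connection: close' tells an HTTP/1.1 server to close the
    connection after this response instead of keeping it alive."""
    headers = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
    ]
    if close:
        headers.append("Connection: close")
    # A blank line terminates the header section.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii")

req = build_request("example.com")
```

With `close=True`, the client can simply read until the server closes the socket, which sidesteps persistent-connection bookkeeping entirely.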

How does a browser know which response belongs to which request?

Suppose when we request a resource over HTTP, we send a request like:
GET / HTTP/1.1
Host: www.google.co.in
and get a response as shown below:
HTTP/1.1 200 OK
Date: Thu, 20 Apr 2017 10:03:16 GMT
...
But when a browser requests many resources at a time, how can it identify which request got which response?
A browser can open one or more connections to a web server in order to request resources. For each of those connections the rules regarding HTTP keep-alive are the same and apply to both HTTP 1.0 and 1.1:
If HTTP keep-alive is off, the request is sent by the client, the response is sent by the server, the connection is closed:
Connection 1: [Open][Request1][Response1][Close]
If HTTP keep-alive is on, one "persistent" connection can be reused for succeeding requests. The requests are still issued serially over the same connection, so:
Connection 1: [Open][Request1][Response1][Request3][Response3][Close]
Connection 2: [Open][Request2][Response2][Request4][Response4][Close]
HTTP pipelining was introduced with HTTP 1.1. If it is enabled (on most browsers it is disabled by default, because of buggy servers), browsers can issue requests one after another without waiting for the response, but the responses are still returned in the same order as they were requested.
This can happen simultaneously over multiple (persistent) connections:
Connection 1: [Open][Request1][Request2][Response1][Response2][Close]
Connection 2: [Open][Request3][Request4][Response3][Response4][Close]
Both approaches (keep-alive and pipelining) still utilize the default "request-response" mechanism of HTTP: each response will arrive in the order of the requests on that same connection. They also have the "head of line blocking" problem: if [Response1] is slow and/or big, it holds up all responses that follow on that connection.
Enter HTTP/2 multiplexing (see: What is the difference between HTTP/1.1 pipelining and HTTP/2 multiplexing?). Here, a response can be fragmented, allowing a single TCP connection to transmit fragments of different requests and responses intermingled:
Connection 1: [Open][Rq1][Rq2][Resp1P1][Resp2P1][Resp2P2][Resp1P2][Close]
It does this by giving each fragment an identifier to indicate to which request-response pair it belongs, so the receiver can recompose the message.
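A toy model of that idea: each fragment carries a stream identifier, so interleaved fragments can be reassembled per request-response pair. The frame layout here is illustrative, not the real HTTP/2 wire format:

```python
# Each frame is (stream_id, fragment); frames from different streams
# arrive interleaved on one connection, as in the diagram above.
frames = [
    (1, b"HTTP/1.1 200 OK\r\n"),    # stream 1, part 1
    (3, b"HTTP/1.1 404 Not"),       # stream 3, part 1
    (3, b" Found\r\n"),             # stream 3, part 2
    (1, b"Content-Length: 0\r\n"),  # stream 1, part 2
]

# Reassemble: concatenate fragments per stream id, in arrival order.
streams = {}
for stream_id, fragment in frames:
    streams[stream_id] = streams.get(stream_id, b"") + fragment

print(streams[1])  # b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\n'
print(streams[3])  # b'HTTP/1.1 404 Not Found\r\n'
```

Because the identifier, not arrival order, ties a fragment to its stream, a slow response no longer blocks the others on the same connection.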
I think you are really asking about HTTP pipelining here. This is a technique introduced in HTTP/1.1, with which all requests are sent out by the client in order and answered by the server in that very same order. All the gory details are now in RFC 7230, sec. 6.3.2.
HTTP/1.0 had (or has) a comparable mechanism known as keep-alive. It allows a client to issue a new request right after the previous one has been answered. The benefit of this approach is that client and server no longer need to negotiate another TCP handshake for a new request/response cycle.
The important part is that in both methods the order of the responses matches the order of the issued requests over one connection. Therefore, responses can be uniquely mapped to the issuing requests by the order in which the client receives them: first response matches first request, second response matches second request, and so forth.
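The order-based matching described above amounts to a FIFO queue of pending requests; the function names below are illustrative stand-ins for real network I/O:

```python
from collections import deque

# On one persistent connection, responses arrive in request order, so a
# client can pair them using a simple FIFO queue of pending requests.
pending = deque()

def send_request(url):
    """Record the request as pending (real code would also write it out)."""
    pending.append(url)

def receive_response(body):
    """The next response always belongs to the oldest pending request."""
    url = pending.popleft()
    return (url, body)

send_request("/index.html")
send_request("/style.css")
print(receive_response("<html>..."))   # ('/index.html', '<html>...')
print(receive_response("body {...}"))  # ('/style.css', 'body {...}')
```

This is exactly why head-of-line blocking occurs: the queue discipline forbids delivering the second response before the first.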
I think the answer you are looking for is TCP.
HTTP is a protocol that relies on TCP to establish a connection between the client and the host.
In HTTP/1.0, a different TCP connection is created for each request/response pair.
HTTP/1.1 introduced pipelining, which allowed multiple request/response pairs to reuse a single TCP connection to boost performance (it didn't work very well).
So the request and the corresponding response are linked by the TCP connection they rely on, and it's then easy to associate a specific request with the response it produced.
PS: HTTP is not bound to use TCP forever; for example, Google has experimented with other transport protocols, like QUIC, that might end up being more efficient than TCP for the needs of HTTP.
In addition to the explanations above, consider that a browser can open many parallel connections, usually up to six to the same server. Each connection uses a different socket, and for each request-response pair on each socket the correlation is easy to determine.
In each TCP connection, request and response are sequential. A TCP connection can be re-used after finishing a request-response cycle.
With HTTP pipelining, a single connection can carry multiple outstanding requests.
In theory, there can be any number[*1] of simultaneous TCP connections, enabling parallel requests and responses.
In practice, the number of simultaneous connections is usually limited on the browser and often on the server as well.
[*1] The number of simultaneous connections is limited by the number of ephemeral TCP ports the browser can allocate on a given system. Depending on the operating system, ephemeral ports start at 1024 (RFC 6056), 49152 (IANA), or 32768 (some Linux versions).
So, that may allow up to 65,535 - 1023 = 64,512 TCP source ports for an application. A TCP socket connection is defined by its local port number, the local IP address, the remote port number and the remote IP address. Assuming the server uses a single IP address and port number, the limit is the number of local ports you can use.
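The arithmetic behind that upper bound, assuming only the well-known ports 1-1023 are off limits:

```python
# Ports are 16-bit, so the highest port number is 65535. Excluding the
# well-known range (1-1023) leaves this many usable client source ports
# toward one (server IP, server port) pair:
max_source_ports = 65535 - 1023
print(max_source_ports)  # 64512
```

Real limits are lower: operating systems restrict the ephemeral range further (as noted above), and each connection also consumes a file descriptor.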

Can you use Keep-Alive with a CONNECT request with an HTTP proxy?

I know that with HTTP/1.1 proxies, it's possible to use Keep-Alive to keep a persistent connection with the proxy and from the proxy to the remote server, but I'm curious if/how that would work with an HTTPS connection. I know that to do this, the browser sends a CONNECT request to the proxy to establish a connection, then begins communicating using HTTPS. I'm curious if it's possible to use Keep-Alive with HTTPS through an HTTP proxy.
Simply put, CONNECT is always keep-alive.
In HTTP, “persistent connection” means a connection that persists after one request-response pair. But CONNECT establishes a tunnel through the proxy. The proxy cannot even see the requests and responses that are sent over this tunnel (because they are encrypted). So there is no way for this tunnel to not be persistent.
Of course, if the server (the target of CONNECT) decides to close the connection, then the tunnel is destroyed, too. So the server must support persistent connections (just as with a regular, non-TLS proxy).
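For reference, the tunnel is opened with a single plaintext request to the proxy; `build_connect_request` is a hypothetical helper showing its shape:

```python
def build_connect_request(host: str, port: int = 443) -> bytes:
    """Hypothetical helper: build the CONNECT request a client sends to an
    HTTP proxy to open a tunnel. After the proxy answers with 200, the
    client speaks TLS (or any other protocol) through the tunnel."""
    return (
        f"CONNECT {host}:{port} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "\r\n"
    ).encode("ascii")

req = build_connect_request("example.com")
```

Note that the request-target is `host:port` rather than a path: the client is asking for a raw tunnel to that endpoint, not for a resource.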

Why are HTTP proxies able to support protocols like IRC and FTP?

I understand that a SOCKS proxy only establishes a connection at the TCP level, while an HTTP proxy interprets traffic at the HTTP level. Thus a SOCKS proxy can work for any kind of protocol, while an HTTP proxy can only handle HTTP traffic. But why can an HTTP proxy like Squid support protocols like IRC and FTP? When we use an HTTP proxy for an IRC or FTP connection, what specifically happens? Is there any metadata added to the packets when they are sent to the proxy over the HTTP protocol?
An HTTP proxy is able to support high-level protocols other than HTTP if it supports the CONNECT method, which is primarily used for HTTPS connections. Here is the description from the Squid wiki:
The CONNECT method is a way to tunnel any kind of connection through an HTTP proxy. By default, the proxy establishes a TCP connection to the specified server, responds with an HTTP 200 (Connection Established) response, and then shovels packets back and forth between the client and the server, without understanding or interpreting the tunnelled traffic
If client software supports connecting through an 'HTTP CONNECT'-enabled (HTTPS) proxy, it can use any high-level protocol that works with such a proxy (VPN, SSH, SQL, version control, etc.)
As others have mentioned, the "HTTP CONNECT" method allows you to establish any TCP-based connection via a proxy. This functionality is needed primarily for HTTPS connections, since for HTTPS connections, the entire HTTP request is encrypted (so it appears to the proxy as a "meaningless" TCP connection). In other words, an HTTPS session over a proxy, or an SSH/FTPS session over a proxy, will both appear as "encrypted sessions" to the proxy, and it won't be able to tell them apart, so it has to either allow them all or none of them.
During normal operation, the HTTP proxy receives the HTTP request, and is "smart enough" to understand the request to be able to do high level things with it (e.g. search its cache to see if it can serve the response without going to the destination server, or consults a whitelist/blacklist to see if this URL is allowed, etc.). In "CONNECT" mode, none of this happens. The proxy establishes a TCP connection to the destination server, and simply forwards all traffic from the client to the destination server and all traffic from the destination server to the client. That means any TCP protocol can work (HTTPS, SSH, FTP - even plain HTTP)
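The "shovel packets back and forth" core of CONNECT mode can be sketched as two blind copy loops. The demo below substitutes local socket pairs for the real client-proxy and proxy-server connections, purely to keep it self-contained:

```python
import socket
import threading

def pump(src, dst):
    """One direction of a CONNECT tunnel: copy bytes without
    interpreting them, until the source closes."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)

# Local stand-ins for the client<->proxy and proxy<->server connections.
client, proxy_client_end = socket.socketpair()
proxy_server_end, server = socket.socketpair()

# The proxy runs one pump per direction; it never parses the traffic,
# which is why any TCP-based protocol passes through unchanged.
for a, b in [(proxy_client_end, proxy_server_end),
             (proxy_server_end, proxy_client_end)]:
    threading.Thread(target=pump, args=(a, b), daemon=True).start()

client.sendall(b"any protocol bytes, e.g. an IRC or FTP command")
```

Because `pump` treats the payload as opaque bytes, the same loop carries TLS, IRC, FTP control traffic, or plain HTTP equally well.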