About HTTP 1.1 persistent connection and TCP - http

When I was introduced to HTTP 1.0 and 1.1, it was emphasized that the main difference was that 1.1 allows a single TCP connection for all of the objects as opposed to 1.0 where a new connection was made for each object download.
My question is, since a connection isn't really continuous but discrete (i.e in packets), then how come each different packet of each object that is being downloaded, doesn't need to go through the ACK, SYN TCP protocol?
How do they all know about the first ACK, SYN that was made? (perhaps made even to a different server than the objects?)

Not much of your question makes sense. HTTP keepalive only operates on connections to the same target, not 'for all of the objects'. The part about packets doesn't have anything to do with HTTP: you're really asking how TCP works; and the answer to that is that every TCP segment contains the source and destination IP addresses and ports, which are unique to a connection, and a sequence number, for data ordering purposes.

Related

Do all protocols based on TCP use one socket per transfer?

I'm studying Socket Programming HOWTO and the author at some point says that
A protocol like HTTP uses a socket for only one transfer.
Is it because of the design of the HTTP protocol itself? Or is it because it is based on TCP, so all protocols based on it (e.g. UDP) must use one socket for only one transfer?
This statement is taken out of context. The context is to point out that TCP is not a message based protocol but an unstructured byte stream. And to have a message semantic one needs to have some way to determine where a message ends.
It then takes HTTP as an example where a message might simply end with a connection close and points out the limitations - namely only a single message per connection per direction. Then it goes on to describe how protocols can be designed without this limitation, i.e. having multiple messages per connection.
HTTP still can be used like this, i.e. have a single request and end with connection close. This is the design of HTTP version 0.9, but can still be done with HTTP/1. But with HTTP/1 it can also be used for multiple messages, one after the other. And with HTTP/2 it can do multiple messages in parallel, multiplexed over a single TCP connection. And HTTP/3 does not even use TCP anymore.
Do all protocols based on TCP use one socket per transfer?
Protocols are not limited to one connection ("socket") per message ("transfer"). Depending on the design of the protocol multiple messages can be send one after the other by having some pre-known message size or a clear message delimiter. Some protocols might send multiple messages in parallel by implementing a multiplexing layer on top of TCP. Some protocols might even use multiple TCP connections in parallel to deliver a single message, i.e. distributing the message over multiple connections.
That statement was probably written in 1996 or earlier. Since 1997, HTTP supports persistent connections, reusing the same TCP connection and the same socket for multiple queries.

Why can't tcp use two way handshake?

Client Send SYN with client isn.
Server reply SYN ACK with server's isn.
Client resend SYN if timeout?
When client send data, it can accumulated confirm the server's isn.
I try to search, but can't find the answer.
I know how the tcp is designed now, I just don't know why it's designed like this. Why can't use a two way handshake.
It cannot use a two way handshake by definition. TCP/IP is formalized as a standard for communication across networks (Internetworking). Specifically, RFC 793 requires that:
The "three-way handshake" is the procedure used to establish a
connection. This procedure normally is initiated by one TCP and
responded to by another TCP. The procedure also works if two TCP
simultaneously initiate the procedure. When simultaneous attempt
occurs, each TCP receives a "SYN" segment which carries no
acknowledgment after it has sent a "SYN". Of course, the arrival of
an old duplicate "SYN" segment can potentially make it appear, to the
recipient, that a simultaneous connection initiation is in progress.
Proper use of "reset" segments can disambiguate these cases.
If you think about how the protocol works, you do not actually have a single full duplex connection. Instead, you have two simplex connections, each going in one direction. This is why the proper response to a SYN is a SYN-ACK. The server is acknowledging the synchronization request by sending the originators sequence number as the acknowledgement number plus one; it is simultaneously attempting to open its own connection to the client by sending a SYN request to synchronize.
The proper answer to a SYN will always be an ACK, even if the connection fails.
Regarding your question about sending a retry, yes; the client will send a retry (typically up to three, but it could be as many as eight in practice... There's no limit defined) attempting to elicit a response of some kind (preferably a SYN-ACK, but possibly a RST-ACK).

What does no timestamps option in SYN-ACK packet mean?

I'm working on a tool to test a tcp proxy and I see the following:
Client sends a SYN to the server with timestamps option (TSval=12345, TSecr=0).
Server sends a SYN-ACK and omits the timestamps option.
The problem is that I'm not sure if a TCP is supposed to interpret that to mean that the timestamps option should not be used.
I've been poring over rfc 1323 for a while now, and this is all I could find on the subject:
A TCP may send the Timestamps option
(TSopt) in an initial segment
(i.e., segment containing a SYN bit
and no ACK bit), and may send a TSopt
in other segments only if it received a TSopt in the initial
segment for the connection.
I infer from this that the fact that the SYN-ACK is missing the timestamps option doesn't mean anything about whether or not it's valid later in the session. However, doing so requires me to assume a bit more than I'd like. Does anyone have an authoritative source on the subject or personal experience with how different TCP stacks behave in this situation?
The peer TCP is indicating that it does not support (or wish to use) the option. You should not send any further Timestamp options on that session.
This is fairly standard practice for TCP options.

tcpip 3-way handshake

Why is data not transferred during the 3rd part of TCP 3-way handshake?
e.g.
(A to B)SYN
(B to A)ACK+SYN
(A to B) ACK.... why cant data be transferred along with this ACK?
I've always believed it was to keep the session establishment phase separate from the data transfer phase so that no real data is transferred until both ends of the session have agreed on the sequence numbers and session options, especially since packets arriving may be from a totally different, previous, session that just happens to have the same endpoints.
However, on further investigation, I'm not entirely certain that transmitting data with the handshake packets is disallowed. The section on TCP connection establishment in my Internetworking with TCP/IP1 book contains the following snippet:
Because of the protocol design, it is possible to send data along with the initial sequence numbers in the handshake segments. In such cases, the TCP software must hold the data until the handshake completes. Once a connection has been established, the TCP software can release data being held and deliver it to a waiting application program quickly.
Since it's certainly possible to construct a TCP packet with SYN (or ACK) and data, this may well be allowed. I've never seen it happen in the wild but, then again, I've never seen a hairy-eared dwarf lemur in the wild either, though I'm assured they exist.
It may be that it's the sockets software that prevents data going out before the session is fully established but TCP appears to consider it valid. It appears you can send data with a SYN-ACK packet (phase 2 of the connection establishment) since you have the other end's sequence number and options. Likewise, sending data with the phase 3 ACK packet appears to be possible as well.
The reason the TCP software holds on to the data until the handshake is fully complete is probably due to the reason mentioned above - only once both ends have agreed on the sequence numbers can you be sure that the data is not from a previous session.
1 Internetworking with TCP/IP Volume 1 Principles, Protocols and Architecture, 3rd edition, Douglas E. Comer, ISBN 0-13-216987-8.

Working with persistent HTTP connections

We are trying to implement a proxy proof of concept but have encountered an interesting question: Since a single HTTP connection can, and indeed should, make multiple requests, and the HTTP transactions are sent via multiple packets due to TCP's magic, is it possible for a HTTP request to begin in the middle of a packet?
Bear in mind that this is not a theoretical question regarding possible optimization of the browser, but whether it actually happens in real life. It would be even better if someone could point me to a written reference on whether or not this is possible and if so how often it can occur.
Clarification update: We know that if we work in the HTTP layer alone we would not need to bother with this question, however we're trying to figure out if some advanced technique could be applied by working on the TCP layer first.
Assuming that you are talking about IP packets: Yes, it is possible that HTTP request starts middle of IP packet.
When you are using persistent HTTP connections, that is, using same TCP connection for several HTTP requests, it is fully possible that request boundary is middle of IP packet.
Also there is a TCP protocol between IP and HTTP. TCP contains also some headers so a IP packet may start with some TCP headers and rest of the packet consists of HTTP request.
HTTP request may also consist of several IP packets (in case of file uploads, transmission errors and following retransmissions etc).
However, I wonder why you are interested in packets if you are working at HTTP level. TCP should hide the IP packet details.
First of all, TCP is a stream based protocol and has no concept of packets. HTTP itself might have some kind of message or record delimiter, but TCP doesn't.
This page might be helpful: Structure of HTTP Transactions
From your question it sounds like you think that each read from a TCP socket is a "packet" of data. In reality, each read simply reads as many bytes as are in the buffer up to the maximum that you requested, without any concept of records or packets.
So for instance, lets say you read 2048 bytes from the socket, you could have the tail end of one transaction, followed by the beginning of a second response half way through the data you read, and only get the remainder of your second response on your next read from the socket.
If you're here in Jerusalem or near by maybe I could help you out.
Unless you are implementing your own TCP stack, you should not need to worry about the packets, but rather about the API that the TCP provides, in case of POSIX interfaces it would be the recv() or read(). So I treat the question then as "Can more than one HTTP requests come into a single read(), and can the HTTP request be split between multiple read() requests?" -- The answer to both would be "yes, it is possible".
An example of where this can happen is HTTP pipelining. This not frequent in real life (ironically, at least some of the browsers disable it by default because of "buggy proxies" :-) - but when it happens, can be a bit of a problem for the users to diagnose - especially if they have no access to the proxy.
One very notable place where it does happen by default apt-get in Debian-derived linux systems. Just install a Debian or Ubuntu server and try to use it through your proxy. You can do that by editing the /etc/apt/apt.conf.d/proxy file and placing the following there:
Acquire::http::Proxy "http://your.proxy.address:8080";
Depends of which abstraction layer of a packet you are talking about: there are many layers underneath HTTP.
HTTP --> TCP (byte stream) --> IP (packet) --> (possibly something else) Ethernet (frame) --> (possibly) some other transport
If you are talking about the IP layer, then yes the HTTP layer would start later on... Note that TCP presents a "byte stream interface" to its Client layer hence, no concept of packet here.
I think I understand where you are trying to go with this question.
If you don't use persistent HTTP connections, the HTTP GET request header is always the very first thing which is sent over the TCP connection, so we can be sure that the start of the HTTP GET request header does "not start in the middle of some TCP packet". But keep in mind that there may be one or more TCP packets without any user data, e.g. only a SYN, which may preceed the TCP packet with the start of the HTTP GET request header. And also keep in mind that the HTTP GET request header may not be contained in a single TCP packet.
If you do use persistent HTTP connections, the start of the HTTP GET request header for request number N+1 can start in the middle of a TCP packet, namely after the end of HTTP GET request body of request number N.
If you are asking these questions you are possibly "doing it wrong". As several other responders have already pointed out, in the vast majority of cases you should probably just be a TCP client and deal with a TCP stream of data and let the TCP code worry about the TCP packets. (Unless, of course, you are working on some special hardware which is looking at individual IP packets as they fly by and try to do some processing at the HTTP layer.)

Resources