HTTP requests and TCP connections - http

My understanding so far is that when someone tries to access a web page, the following happens:
An HTTP request is formed.
A new socket is opened.
The HTTP request is sent.
If everything went OK, the web browser accepts the HTTP response and builds a DOM tree out of the received HTML. If any resources are missing, a new HTTP request needs to be made for each one separately.
Each of those HTTP requests requires opening another socket (establishing a new virtual connection with the server).
Q: How is that efficient? I understand those resources could be located on another host (which would indeed require a new TCP connection), but if they are all on the same host, wouldn't it be far more efficient to transfer all the data within a single TCP connection?

"Each of those HTTP requests requires opening another socket (establishing a new virtual connection with the server)."
No, it doesn't. HTTP 1.1 uses persistent connections by default, and HTTP 1.0 before it had the unofficial Connection: keep-alive header, which accomplished the same thing nearly twenty years ago.
"Q: How is that efficient?"
It isn't, and that's why it doesn't happen.
"I understand those resources could be located on another host (which would indeed require a new TCP connection), but if they are all on the same host, wouldn't it be far more efficient to transfer all the data within a single TCP connection?"
Yes, and that is what happens by default.
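To see the persistent connection in action, here is a minimal sketch using only Python's standard library. The names EchoPortHandler and fetch_ports are made up for this illustration. The server answers each request with the client's source port, so if every request reports the same port, they all travelled over one TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    """Replies with the client's source port so connection reuse is visible."""

    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_ports(paths):
    """Fetch every path over one HTTPConnection; return the source port seen per request."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPortHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        ports = []
        for path in paths:
            conn.request("GET", path)
            ports.append(int(conn.getresponse().read()))
        return ports
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    # One distinct port means one TCP connection carried all three requests.
    print(set(fetch_ports(["/", "/style.css", "/app.js"])))
```

If the handler is left at its default protocol version (HTTP/1.0), the server closes the connection after each response and the client has to reconnect, so the reported ports would normally differ.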

Related

Close HTTP request socket connection

I'm implementing an HTTP-over-TLS proxy server (sni-proxy) that makes two socket connections:
Client to ProxyServer
ProxyServer to TargetServer
and transfers data between Client and TargetServer (the TargetServer is detected using the server_name extension in the ClientHello).
The problem is that the client doesn't close the connection after the response has been received, so the proxy server keeps waiting for data to transfer and keeps using resources after the request is done.
What is the best practice for implementing this project?
The client behavior is perfectly normal: it is HTTP keep-alive inside the TLS connection, or maybe even a WebSocket connection. Given that the proxy does transparent forwarding of the encrypted traffic, it is not possible to look at the HTTP traffic in order to determine exactly when the connection can be closed. A good approach is therefore to keep the connection open as long as resources allow, and on resource shortage to close the connections which have been idle (no traffic) for the longest time.
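The "close the longest-idle connections first" policy can be sketched as a small bookkeeping class. This is only an illustration of the idea, not part of any proxy library: IdleTracker, touch, remove and evict are hypothetical names, and the connection ids stand in for whatever socket pairs the proxy actually holds.

```python
import time
from collections import OrderedDict


class IdleTracker:
    """Tracks last-activity times so the longest-idle connection is closed first."""

    def __init__(self, max_connections, clock=time.monotonic):
        self.max_connections = max_connections
        self.clock = clock
        self._last_seen = OrderedDict()  # conn_id -> timestamp, most idle first

    def touch(self, conn_id):
        """Record traffic on a connection (call for every forwarded chunk)."""
        self._last_seen[conn_id] = self.clock()
        self._last_seen.move_to_end(conn_id)

    def remove(self, conn_id):
        """Forget a connection that was closed by either peer."""
        self._last_seen.pop(conn_id, None)

    def evict(self):
        """Return the ids to close in order to get back under the limit."""
        victims = []
        while len(self._last_seen) > self.max_connections:
            conn_id, _ = self._last_seen.popitem(last=False)
            victims.append(conn_id)
        return victims
```

The proxy would call touch() whenever it forwards data in either direction and evict() whenever it accepts a new client, closing both sockets of each returned connection.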

When requesting the resources of an HTML page, will new TCP connections be opened?

We know that when a web page is requested, a TCP connection is opened and the HTML page is requested over it.
Here is an example.
Suppose this TCP connection is opened:
192.168.1.2.54587 --- 104.17.23.75.443 (cloudflare)
We know that the main HTML page embeds many JS files, CSS files and images.
When those resources are requested, will new TCP connections be opened, or will the existing connection be used?
It depends on the actual application protocol used and its configuration. With HTTP/2 and HTTP/3 (which does not even use TCP; it runs over QUIC on top of UDP), the same underlying connection will be used as long as the requested resource is on the same server.
With HTTP/1, either a new TCP connection will be created or an existing one reused, depending on whether an existing connection can be reused at all (HTTP keep-alive), whether it is idle, and how many TCP connections to the target are already in use. The details are browser-specific too.
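The HTTP/1 decision described above (reuse an idle keep-alive connection, otherwise open a new one unless too many are already in use) can be modelled with a toy pool. HostPool and its methods are hypothetical names, no real sockets are involved, and the default limit of 6 connections per host is only an assumed value, since real browsers differ.

```python
from dataclasses import dataclass, field


@dataclass
class HostPool:
    """Toy model of per-host HTTP/1.1 connection handling (no real sockets)."""

    max_per_host: int = 6  # assumed limit; actual browsers differ
    idle: list = field(default_factory=list)
    busy: list = field(default_factory=list)
    opened: int = 0

    def acquire(self):
        """Return (connection_id, how), or None if the request has to wait."""
        if self.idle:
            conn, how = self.idle.pop(), "reused"
        elif len(self.busy) < self.max_per_host:
            self.opened += 1
            conn, how = self.opened, "new"
        else:
            return None
        self.busy.append(conn)
        return conn, how

    def release(self, conn):
        """Response fully read and keep-alive allowed: the connection goes idle."""
        self.busy.remove(conn)
        self.idle.append(conn)
```

In this model, a page with many same-host resources opens connections until the limit is reached, and every later request either reuses a released connection or waits.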

HTTP over AF_UNIX: HTTP connection to unix socket

We have an HTTP server, for which we have an HTTP-client-based application (on Linux) that works fine.
But now we need to use Unix domain sockets from our client application.
So is it possible to send and receive HTTP requests and responses over a Unix domain socket?
Scenario 1: When connecting to localhost, we want to eliminate the SSL overhead by connecting with HTTP to the Unix socket instead of HTTPS to the local port.
Basically, I am looking for a standard way of encoding a Unix socket path in an HTTP URL.
Many thanks in advance.
So long as your socket is a stream socket (SOCK_STREAM rather than SOCK_DGRAM) then it's technically possible. There's nothing in HTTP that requires TCP/IP, it just requires a reliable bidirectional stream.
However, I've never seen an HTTP client that knows how to connect to such a socket. There's no URL format that I know of that would work, should you actually need to use a URL to talk to the server.
Also note that some things that normal web servers depend on (such as getpeername(), to identify the client) make no sense when you're not using TCP/IP.
EDIT: I just saw your edit about mapping localhost to the UNIX socket. This is perfectly feasible; you just need to ensure that the client knows how to find the path of the UNIX socket that should be used instead of connecting to 127.0.0.1:xxx.
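As a rough sketch of how a client can be told to use a socket path instead of 127.0.0.1:xxx, the following uses only Python's standard library and assumes a Unix-like system. UnixHTTPConnection, HelloHandler and demo are names invented for this example; the client simply overrides connect() so that the HTTP exchange runs over an AF_UNIX stream socket, and the path is passed in directly rather than encoded in a URL.

```python
import http.client
import os
import socket
import socketserver
import tempfile
import threading
from http.server import BaseHTTPRequestHandler


class UnixHTTPConnection(http.client.HTTPConnection):
    """http.client connection that dials a Unix domain socket instead of TCP."""

    def __init__(self, socket_path):
        super().__init__("localhost")  # only used for the Host header
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello over AF_UNIX"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # the default logger expects an (ip, port) peer address


def demo():
    """Serve one request over a temporary Unix socket; return (status, body)."""
    path = os.path.join(tempfile.mkdtemp(), "http.sock")
    server = socketserver.UnixStreamServer(path, HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = UnixHTTPConnection(path)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

The overridden log_message is a small instance of the getpeername() caveat: the peer of a Unix socket has no IP address and port for the server to log.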

HTTP and Sessions

I just went through the specification of HTTP 1.1 at http://www.w3.org/Protocols/rfc2616/rfc2616.html and came across a section about connections, http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8, that says
" A significant difference between HTTP/1.1 and earlier versions of HTTP is that persistent connections are the default behavior of any HTTP connection. That is, unless otherwise indicated, the client SHOULD assume that the server will maintain a persistent connection, even after error responses from the server.
Persistent connections provide a mechanism by which a client and a server can signal the close of a TCP connection. This signaling takes place using the Connection header field (section 14.10). Once a close has been signaled, the client MUST NOT send any more requests on that connection. "
Then I also went through the specification of HTTP state management at https://www.rfc-editor.org/rfc/rfc2965, which says in its section 2 that
"Currently, HTTP servers respond to each client request without relating that request to previous or subsequent requests;"
The section of RFC 2616 about the need for persistent connections also said that, prior to persistent connections, a client had to establish a new TCP connection for each and every URL it wished to fetch.
Now my question is: if we have persistent connections in HTTP/1.1, then, as mentioned above, a client does not need to make a new connection for every new request. It can send multiple requests over the same connection. So if the server knows that every subsequent request is coming over the same connection, would it not be obvious that the request is from the same client? And hence, would this not suffice to maintain the state, and would it not be enough for the server to understand that the request was from the same client? In that case, why is a separate state management mechanism required at all?
Basically, yes, it would make sense, but HTTP persistent connections are used to eliminate administrative TCP/IP overhead of connection handling (e.g. connect/disconnect/reconnect, etc.). It is not meant to say anything about the state of the data moving across the connection, which is what you're talking about.
No. For instance, there might be an intermediary (such as a proxy or a reverse proxy) in the request path that aggregates requests from multiple TCP connections.
See http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-21.html#intermediaries.
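To make the difference concrete, here is a minimal sketch of cookie-based state in which the server identifies the client purely from the Cookie header and never looks at which TCP connection carried the request. The function handle_request, the cookie name sid and the in-memory sessions dict are all assumptions made for this illustration, not something taken from the RFCs.

```python
import uuid
from http.cookies import SimpleCookie

sessions = {}  # session id -> per-client state (in-memory, illustrative only)


def handle_request(request_headers):
    """Return (response_headers, visit_count) for one request.

    The client is identified by the Cookie header alone, so the result is
    the same whether requests share a connection, use separate ones, or
    arrive through a proxy that mixes several clients on one connection.
    """
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    sid = cookie["sid"].value if "sid" in cookie else None
    response_headers = {}
    if sid not in sessions:
        # Unknown or missing id: start a new session and hand out a cookie.
        sid = uuid.uuid4().hex
        sessions[sid] = {"visits": 0}
        response_headers["Set-Cookie"] = f"sid={sid}; HttpOnly"
    sessions[sid]["visits"] += 1
    return response_headers, sessions[sid]["visits"]
```

A first request without a cookie gets a Set-Cookie header back; a later request that sends the same cookie is counted against the same session, regardless of the connection it uses.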

IIS HTTP Keep-Alives

I have read that Keep-Alive is meant for performance: connections do not need to be recreated because existing ones are reused. What if there is a traffic spike? Will new connections be created?
Additionally, if I don't turn on Keep-Alive in a high-traffic environment, will the client side eventually run out of connections/socket ports, because a new connection has to be created for each HTTP/web request?
HTTP is a stateless protocol.
In HTTP 1.0 each request meant opening a new TCP connection.
That caused performance issues (e.g. having to redo the 3-way handshake for each GET or POST), so the Keep-Alive header was added to maintain the connection across requests, and in HTTP 1.1 persistent connections are the default.
This means that the connection is reused across requests.
I am not really familiar with IIS but if there is a configuration to close the connection after each HTTP response, it will have impact on the performance.
As for running out of sockets/ports on the client side, that could occur if the client fires a huge number of requests and a new TCP connection must be opened for each HTTP request.
After a while the ports will be depleted.
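A back-of-the-envelope estimate shows when that depletion would kick in. The sketch below assumes that the client closes each connection itself, that every closed connection keeps its local port unavailable for a fixed TIME_WAIT period, and that all connections go to the same server address and port. The function name is made up, the 120-second TIME_WAIT is only an assumed value, and the 49152-65535 range is the IANA dynamic port range; actual operating system defaults vary.

```python
def max_new_connections_per_second(first_port, last_port, time_wait_seconds):
    """Upper bound on the sustained rate of new connections to one destination.

    Each connection needs its own local port, and under the stated
    assumptions that port stays reserved for time_wait_seconds after close.
    """
    usable_ports = last_port - first_port + 1
    return usable_ports / time_wait_seconds


if __name__ == "__main__":
    # Assumed figures: IANA dynamic range and a 120 s TIME_WAIT.
    rate = max_new_connections_per_second(49152, 65535, 120)
    print(f"about {rate:.1f} new connections per second")
```

With those assumed numbers the bound is roughly 136 new connections per second; a client that sustains a higher rate without keep-alive would eventually find no free local port.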

Resources