Is it better to keep a TCP socket open for a long time, or re-establish connection frequently?
Let's take HTTP in a browser as an example. Is it better to establish a connection, make the HTTP request, and close it upon receiving the response, only to open a new one when we make the next request? Or should you keep the socket open for as long as that particular user is still browsing that specific site?
There is no problem with leaving a socket open. The keep-alive option is meant for exactly this. I would leave it open.
Are there any general rules on when a website sends out a TCP reset, triggering the Connection reset by peer error?
Like
too many open connections
too high bandwidth use
connected for too long
…?
I'm pretty certain that there is no law governing this and that different websites/web developers have different tastes, but I would be interested if there are some general rule sets (from websites or textbooks on the subject or what you have been taught in school/at work) that are mostly followed.
The reason I'm asking, of course, is that I want to get around being blocked…
I'm downloading some government data that is freely available but lacks an API or anything similar. The two official ways to get it are either clicking around in some web GIS a few thousand times, or going down the Kafkaesque path of explaining to various levels of clerks the concepts of databases, CSV files and zip files, and that you can't just drive to their agency with a "giant" hard drive (and wouldn't need to, if they just did what you are trying to explain to them). So I'm trying to go the most resource-saving way for everyone involved…
A website does not "send" a "Connection reset by peer" error. This error is generated by the OS kernel on the client side when it receives a TCP reset (RST) for an active connection. There are many reasons why such a reset might be sent. It might be sent by design as part of some kind of load limit, for example to limit the number of connections from the same IP address within a specific time as a form of DoS protection, to restrict data scraping, or to enforce some kind of fair use. There is no general rule, let alone a law, for these kinds of explicit limits.
A TCP reset might also be caused by the application being overloaded, the application crashing, the system running out of resources, and so on.
And a TCP reset will happen if the client writes to a connection which the server already considers as closed. This can happen for example with HTTP keep alive: the server might close the connection on inactivity at any time after the HTTP response was sent. If the client sends a new request on the same connection at the same time the server closes the connection, the server will reject the new request (since the connection is closed on the server end) and will send a TCP RST, causing a connection reset by peer at the client. The client needs to properly handle this situation by creating a new connection and sending the request again (provided that the request was not state changing, i.e. is idempotent).
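The client-side handling described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive client: the class name ReconnectingClient and the retry-once policy are made up for the example, and only the standard library http.client module is used. The client keeps one connection open; if a request on the reused connection fails because the server has already closed it, a fresh connection is opened and the request is repeated once, but only for idempotent methods.

```python
import http.client

# Methods that are safe to repeat because they are idempotent.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}

# Errors that typically mean "the peer already closed this connection".
# http.client.RemoteDisconnected is a subclass of ConnectionResetError.
CONNECTION_GONE = (ConnectionResetError, BrokenPipeError)


class ReconnectingClient:
    """Hypothetical helper: reuse one keep-alive connection, reconnect on reset."""

    def __init__(self, host, port=80, timeout=10.0):
        self.host = host
        self.port = port
        self.timeout = timeout
        self.conn = None

    def request(self, method, path):
        # Non-idempotent requests get a single attempt only.
        attempts = 2 if method in IDEMPOTENT_METHODS else 1
        for attempt in range(attempts):
            if self.conn is None:
                self.conn = http.client.HTTPConnection(
                    self.host, self.port, timeout=self.timeout
                )
            try:
                self.conn.request(method, path)
                response = self.conn.getresponse()
                return response.status, response.read()
            except CONNECTION_GONE:
                # Drop the dead connection; the next loop pass opens a new one.
                self.conn.close()
                self.conn = None
                if attempt == attempts - 1:
                    raise
```

With this sketch, a GET that hits a connection the server closed in the meantime is transparently repeated on a new connection, while a POST in the same situation raises the error to the caller, who has to decide whether repeating it is safe.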
I've read some articles about HTTP long polling, but I don't understand these 2 things:
Why is it recommended to keep the connection open for less than a minute? Would a longer one cause problems such as timeouts or ..?
Why should I reopen the connection after I've received data from the server?
I have not heard that a long poll should be open for less than a minute. However, my first thought is that you may do that to detect if the connection has been dropped or to account for mobile devices switching between wifi and mobile data.
Your second question is much easier to answer. If your application relies on long polling to receive push notifications from the server, it needs to keep a long-polling connection open at all times. Once data is sent from the server over a long-polled connection, the request is complete and the connection is closed, which means you need to open it again to receive another notification.
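The reopen-after-every-response cycle can be sketched as a loop in Python. This is only an illustration: the function name long_poll, its parameters and the 50 second default (chosen to stay under the one minute mentioned in the question) are assumptions, and a real client would also want error handling and a delay before retrying after failures.

```python
import http.client
import socket


def long_poll(host, port, path, on_event, max_events, poll_timeout=50.0):
    """Hypothetical long-poll loop: one request per notification.

    Each pass opens a request and waits for the server to answer.
    If nothing arrives within poll_timeout seconds, the request is
    abandoned and issued again; if data arrives, it is handed to
    on_event and a new request is issued for the next notification.
    """
    received = 0
    while received < max_events:
        conn = http.client.HTTPConnection(host, port, timeout=poll_timeout)
        try:
            conn.request("GET", path)
            body = conn.getresponse().read()
        except socket.timeout:
            # No notification in time: simply poll again.
            continue
        finally:
            conn.close()
        on_event(body)
        received += 1
    return received
```

The max_events parameter exists only so the sketch terminates; an application that wants notifications indefinitely would loop until it decides to stop.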
According to this blog, it seems a half-open connection is what we want to avoid.
So why does Java still provide the facility to half-close a socket?
According to this blog, it seems a half-open connection is what we want to avoid.
The author of the blog explicitly notes that he is not talking about deliberately half-closed connections but about half-open connections, which are caused by intermediate devices such as routers dropping the connection state after some timeout.
So why does Java still provide the facility to half-close a socket?
Because it is useful? Half-close just means that no more data will be sent on the socket, while it is still able to receive data. This behavior is useful in various situations where the client only sends a request and receives a response, because it can be used to signal the end of the request to the peer.
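A deliberate half-close can be sketched as follows, in Python rather than Java to keep it short (Java's counterpart is Socket.shutdownOutput()). The function name send_then_read_reply is an assumption made for the example; socket.shutdown with SHUT_WR is the standard call.

```python
import socket


def send_then_read_reply(sock, request):
    """Hypothetical helper: send a request, half-close, read the full reply."""
    sock.sendall(request)
    # Half-close: we will send nothing more, but can still receive.
    # The peer sees end-of-stream and knows the request is complete.
    sock.shutdown(socket.SHUT_WR)

    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # the peer closed its side as well
            break
        chunks.append(data)
    return b"".join(chunks)
```

The peer in this scheme can simply read until end-of-stream, compute its answer, send it and close, without needing any length prefix or delimiter to find the end of the request.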
I have a general question regarding TCP-IP communication...
At the moment I am trying to set up a small communication link between an ATmega and a Raspberry Pi. I will transmit some data (e.g. 100 bytes), for example every 5 minutes, via TCP/IP.
Does it make sense to keep the connection open or shall I create a new connection for each dataset?
Thanks for your help...
webbolle
I would lean towards keeping the TCP connection open rather than opening a new one every time.
Here are a few reasons. First, by reusing the same connection you avoid sending the TCP handshake messages (SYN-based) and teardown messages (FIN-based). In your case, transmitting 100 bytes every 5 minutes, the overhead of the SYN/FIN messages might be more than the payload itself. Second, if the connection is already open, you save the time needed to reconnect. Third, TCP goes through slow start every time you start a connection. That should not be a problem with 100 bytes, but if you need to send more, every new connection starts with a small initial send window (classically 1 MSS; current stacks often start with about 10 MSS). If you reuse an existing connection, TCP would (probably) use the current window, although some stacks shrink the window again after a long idle period.
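As a rough illustration of the first point, the per-connection overhead can be estimated with a few lines of Python. The numbers are assumptions for a back-of-the-envelope lower bound: IPv4 and TCP headers of 20 bytes each without options (real SYN segments usually carry options and are larger), a three-segment handshake, a four-segment teardown, and no link-layer framing.

```python
IP_HEADER_BYTES = 20     # IPv4 header without options (assumption)
TCP_HEADER_BYTES = 20    # TCP header without options (assumption)
HANDSHAKE_SEGMENTS = 3   # SYN, SYN-ACK, ACK
TEARDOWN_SEGMENTS = 4    # FIN, ACK, FIN, ACK


def connection_overhead_bytes():
    """Header bytes spent only on opening and closing one connection."""
    segments = HANDSHAKE_SEGMENTS + TEARDOWN_SEGMENTS
    return segments * (IP_HEADER_BYTES + TCP_HEADER_BYTES)


payload = 100  # bytes per dataset, as in the question
overhead = connection_overhead_bytes()
print(f"{overhead} bytes of setup/teardown headers per {payload} byte payload")
```

Under these assumptions, opening and closing a connection costs 7 segments of 40 header bytes each, i.e. 280 bytes, which is almost three times the 100 byte payload.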
Also:
An idle open connection doesn't consume any bandwidth; apart from the ports it holds and a small amount of connection state on both devices, it costs nothing. Basically, every TCP connection that has been opened and not been closed is still open, barring unintended disconnections etc.
For detecting those, it also doesn't make a difference whether you keep the connection open or reopen it:
If the connection dropped out in the meantime, you'll receive more or less the same error.
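On the sending side, the keep-it-open approach could look like the following Python sketch. The class name PersistentSender and the retry-once policy are assumptions made for the example. Note that a successful sendall only means the data was handed to the local TCP stack; a connection that died silently may only be noticed on a later send.

```python
import socket


class PersistentSender:
    """Hypothetical helper: keep one TCP connection open between sends."""

    def __init__(self, host, port, timeout=10.0):
        self.host = host
        self.port = port
        self.timeout = timeout
        self.sock = None

    def send(self, payload):
        # Try the existing connection first; on any socket error,
        # reconnect once and try again.
        for attempt in range(2):
            try:
                if self.sock is None:
                    self.sock = socket.create_connection(
                        (self.host, self.port), timeout=self.timeout
                    )
                self.sock.sendall(payload)
                return
            except OSError:
                self.close()
                if attempt == 1:
                    raise

    def close(self):
        if self.sock is not None:
            try:
                self.sock.close()
            finally:
                self.sock = None
```

A caller would create one PersistentSender at startup and call send every 5 minutes; the reconnect path is then the same whether the link dropped a second ago or hours ago.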
In my mochiweb application, I am using a long held HTTP request. I wanted to detect when the connection with the user died, and I figured out how to do that by doing:
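Socket = Req:get(socket),
%% Ask for exactly one socket event to be delivered as a message.
inet:setopts(Socket, [{active, once}]),
receive
    {tcp_closed, Socket} ->
        %% The peer closed the connection: handle clean up here.
        ok;
    Data ->
        %% Any other message, e.g. {tcp, Socket, Bytes}: do something.
        Data
end.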
This works when: user closes his tab/browser or refreshes the page. However, when the internet connection dies suddenly (say wifi signal lost all of a sudden), or when the browser crashes abnormally, I am not able to detect a tcp close.
Am I missing something, or is there any other way to achieve this?
TCP has a keepalive mechanism, and it can be enabled with inet:setopts/2 using the option {keepalive, Boolean}.
I would suggest that you don't use it. The keep-alive timeout and max-retries settings tend to be system-wide, and the feature is optional after all. Using timeouts at the protocol level is better.
HTTP has the status code 408 Request Timeout, which you can send to the client if it seems dead.
Check out the after clause in receive blocks that you can use to timeout waiting for data, or use the timer module, or use erlang:start_timer/3. They all have different performance characteristics and resource costs.
TCP does not send "keep alive" probes by default (they can be enabled where supported), so a connection fault that occurs while no data is being exchanged results in a "silent failure". You need to account for this type of failure yourself, e.g. by implementing some form of connection probing.
How does this affect HTTP? HTTP is a stateless protocol: every request is independent of every other. The "keep alive" functionality of HTTP doesn't change that, i.e. a "silent failure" can still occur.
Only when data is exchanged can this condition be detected (or when TCP Keep Alive is enabled).
I would suggest sending application-level keep-alive messages over HTTP chunked encoding. Make your client and server smart enough to understand the keep-alive messages: ignore them if they arrive on time, and otherwise close and re-establish the connection.
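The receiving side of such an application-level keep-alive can be sketched in Python. To stay short, the sketch works on a plain socket with newline-delimited messages instead of HTTP chunked encoding; the PING marker, the framing, the function name watch_connection and the timeout values are all assumptions made for the example.

```python
import socket

HEARTBEAT = b"PING"  # hypothetical keep-alive marker sent by the peer


def watch_connection(sock, on_message, idle_timeout=30.0):
    """Read newline-delimited messages until the connection ends.

    Heartbeat lines are silently ignored. Returns "timed out" if
    nothing (not even a heartbeat) arrived within idle_timeout
    seconds, "closed" on a clean close by the peer, and "error" on
    a reset or other socket error.
    """
    sock.settimeout(idle_timeout)
    buffer = b""
    while True:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            return "timed out"  # silent failure suspected
        except OSError:
            return "error"
        if not chunk:
            return "closed"
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line != HEARTBEAT:
                on_message(line)
```

When the function returns "timed out", the caller would close the socket and re-establish the connection, since the peer has missed its heartbeat deadline.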