Troubleshoot RServe config option keep.alive

Troubleshoot RServe config option keep.alive - r

I am using RServe 1.7.3 on a headless RHEL 7.9 VM. On the client, I am using RserveCLI2.
On long running jobs, the TCP/IP connection becomes blocked by a fire wall, after 2 hours.
I came across the keep.alive configuration option, that is available since RServe 1.7.2 (RServe News/Changelog).
The specs read:
added support for keep.alive configuration option - it is global to
all servers and if enabled the client sockets are instructed to keep
the connection alive by periodic messages.
I added the following to /etc/Rserv.conf:
keep.alive enable
but this does no prevent the connection from being blocked.
Unfortunately, I cannot run a network monitoring tool, like Wireshark, to monitor the traffic between client and server.
How could I troubleshoot this?
Some specific questions I have:
Is the path of the config file indeed /etc/Rserv.conf, as specified in Documentation for Rserve? Notice that it does not have a final e, like Rserve.
Does this behaviour depend on de RServe client in use, or is this completely handled at the socket level?
Can I inspect the runtime settings of RServe, to see if keep.alive is enabled?

We got this to work.
To summarize, we adjusted some kernel settings to make sure keep-alive packets are send at shorter intervals to prevent the connection from being deemed dead by network components.
This is how and why.
The keep.alive enable setting is in fact an instruction to the socket layer to periodically emit keep-alive packets from server to client. The client is expected to return an ACK on these packets. The behaviour is governed by three kernel-level settings, as explained in TCP Keepalive HOWTO - Using TCP keepalive under Linux:
tcp_keepalive_time (defaults to 7200 seconds)
tcp_keepalive_intvl (defaults to 75 seconds)
tcp_keepalive_probes (defaults to 9 times)
The tcp_keepalive_time is the first time a keep-alive packet is sent, after establishing the tcp/ip connection. The tcp_keepalive_intvl interval is de wait time between subsequent packets and tcp_keepalive_probes the number of subsequent unacknowledged packets that make the system decide the connection is dead.
So, the first keep-alive packet was only send after 2 hours. After that time, some network component had already decided the connection was dead and the keep-alive packet never made it to the client and thus no ACK was ever send.
We lowered both tcp_keepalive_time and tcp_keepalive_intvl to 600 seconds.
With tcpdump -i [interface] port 6311 we were able to monitor the keep-alive packets.
15:40:11.225941 IP <server>.6311 <some node>.<port>: Flags [.], ack 1576, win 237, length 0
15:40:11.226196 IP <some node>.<port> <server>.6311: Flags [.], ack 401, win 511, length 0
This continues until the results are send back and the connection is closed. At least, I test for a duration of 12 hours.
So, we use keep-alive here not to check for dead peers, but to prevent disconnection due to network inactivity, as is discussed in TCP Keepalive HOWTO - 2.2. Why use TCP keepalive?. In that scenario, you want to use low values for keep-alive time and interval.
Note that these are kernel level settings, and thus are applied system-wide. We use a dedicated server, so this is no issue for us, but may be in other cases.
Finally, for completeness, I'll answer my own three questions.
The path of the the configuration is /etc/Rserv.conf, as was confirmed by changing another setting (remoted enable to remote disable).
This is handled a the socket level.
I am not sure, but using tcpdump shows that Rserve emits keep-alive packets, which is a more useful way to inspect what's happening.

Related

Data cost of keeping a tcp connection open

Let's suppose 2 computers:
The first is running a netcat server on a tcp port.
The second is running a netcat client, connected to the previous netcat server.
(netcat is an example, you can imagine a basic c program with socket)
We ca send data between the 2 computers.
Let's imagine nobody send data during multiple days.
Is there a timeout in tcp stack ?
Does netcat (or operating system) sends some packets to keep the connection opened ?
What i want to know is how much data is sent if there is no top level activity.
Thanks

Is there a timeout in tcp stack ?
There are many different timeouts in the TCP stack, depending on what state we are currently in, and how the connection was configured (e.g. with keepalive or not). The idle connection timeout (which is what you refer to) does not seem to be defined. With keepalive the timeout is ~2 hours. That being said pretty much every firewall in the world will setup some timeout. Based on this reddit thread 15 minutes looks like a reasonable assumption, maybe even 1 hour. But multiple days? I doubt it will be alive in any network (except your own).
Does netcat (or operating system) sends some packets to keep the connection opened ?
No. You will have to do it yourself by sending data. With the keepalive option for TCP, the OS will do it for you (note: keepalive is disabled by default), but this works between direct peers, i.e. may fail when proxies are involved. Sending data is definitely a better approach.

how to modify Iperf TCP connection timeout?

I am using mininet for emulation of a network. My network has a delay of 3000ms(linear topology of 3 switches).When I tried to do iperf I got Connection failed : No route to host error in client. After a lot of time with the help of internet i came to know that this is happening because of large delay of network which causes ACK packet delayed. Thus thia ACK pcket for SYN will client after timeout. So I want to modify this timeout value. How can I do this. I am using iperf2 and ubuntu18.04. ( I think using iperf3 this is possible with --connect-timeout nms)

iperf 2 doesn't support --connect-timeouts. The preferred way to control that is via the operating system itself, e.g. syn retries. More on that here. We don't think we should be messing with TCP fundamentals directly as we want to separate testing from the things under test.
As an aside, iperf 2.0.14 has a --connect-only option which can be used to measure the TCP 3WHS performance. We also added a --connect-retries for application level retries.
Bob

TCP Teardown - Retransmissions

I am writing a small stateful firewall application as a school project.
I try to figure out, why my webserver (172.16.0.11) does TCP retransmissions of his FIN+ACK packet.
The first retransmission might be because of the retransmission timeout (the firewall application was quite busy at this time).
I am more interested in the retransmissions at the end.
From my point of view, the sequence 1635678 was acknowledged by the client (172.16.0.10) in his FIN+ACK (No. 298).
Why are there retransmissions of the FIN+ACK, even if the sequence was already acknowledged?
Both (client and server) are Debian 8.1 systems, using Iceweasel as browser and Apache as webserver.
Capture at server:
Capture at client:

How many times will TCP retransmit

In the case of a half open connection where the server crashes (no FIN or RESET sent to client), and the client attempts to send some data on this broken connection, each TCP segment will go un-ACKED. TCP will attempt to retransmit packets after some timeout. How many times will TCP attempt to retransmit before giving up and what happens in this case? How does it inform the operating system that the host is unreachable? Where is this specified in the TCP RFC?

If the server program crashes, the kernel will clean up all open sockets appropriately. (Well, appropriate from a TCP point of view; it might violate the application layer protocol, but applications should be prepared for this event.)
If the server kernel crashes and does not come back up, the number and timing of retries depends if the socket were connected yet or not:
tcp_retries1 (integer; default: 3; since Linux 2.2)
The number of times TCP will attempt to
retransmit a packet on an established connection
normally, without the extra effort of getting
the network layers involved. Once we exceed
this number of retransmits, we first have the
network layer update the route if possible
before each new retransmit. The default is the
RFC specified minimum of 3.
tcp_retries2 (integer; default: 15; since Linux 2.2)
The maximum number of times a TCP packet is
retransmitted in established state before giving
up. The default value is 15, which corresponds
to a duration of approximately between 13 to 30
minutes, depending on the retransmission
timeout. The RFC 1122 specified minimum limit
of 100 seconds is typically deemed too short.
(From tcp(7).)
If the server kernel crashes and does come back up, it won't know about any of the sockets, and will RST those follow-on packets, enabling failure much faster.
If any single-point-of-failure routers along the way crash, if they come back up quickly enough, the connection may continue working. This would require that firewalls and routers be stateless, or if they are stateful, have rulesets that allow preexisting connections to continue running. (Potentially unsafe, different firewall admins have different policies about this.)
The failures are returned to the program with errno set to ECONNRESET (at least for send(2)).

Tcp Socket Closed

I always thought that if you didn't implement a heartbeat, there was no way to know if one side of a TCP connection died unexpectedly. If the process was just killed on one side and didn't exit gracefully, there was no way for the socket to send FIN or let the other side know that it was closed.
(See some of the comments here for example http://www.perlmonks.org/?node_id=566568 )
But there is a stock order server that I connect to that has a new "cancel all orders on disconnect feature" that cancels live orders if the client dis-connects. It works even when I kill the process on my end, and there is definitely no heartbeat from my app to it.
So how is it able to detect when I've killed the process? My app is running on Windows Server 2003 and the order server is on Suse Linux Enterprise Server 10. Does Windows detect that the process associated with the socket is no longer alive and send the FIN?

When a process exits - for whatever reason - the OS will close the TCP connections it had open.
There's numerous other ways a TCP connection can go dead undetected
someone yanks out a network cable inbetween.
the computer at the other end gets nuked.
a nat gateway inbetween silently drops the connection
the OS at the other end crashes hard.
the FIN packets gets lost.
Though enabling tcp keepalive, you'll detect it eventually - atleast during a couple of hours.

It could be using a TCP Keep Alive to check for dead peers:
http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html

As far as I know, the OS detects the process termination and closes all the file descriptors/sockets/handles the process was using. So, there isn't difference between "killing" application and "gracefully terminating". Of course, the kernel itself must be running (=pc turned on, wire connected...). But it's on the OS the job of sending the FIN and so on...
Also, if a host becomes unreachable /turned off, disconnected...) an intermediate gateway (or the client itself) may detect the event (e.g. loss of carrier, DHCP lease not renewed...) and reply to the packets sent to the died host with a ICMP error (host/network unreachable). This causes the peer's TCP connection to die, but it happens only if the client has some packet to send to the host.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex