How to increase the maximum connection backlog limit in Windows Server - tcp

I have a Windows Server. Its maximum connection backlog limit (TCP) is 100.
Is there any way to increase this limit to a higher value – say 1000 or 2000?
I changed the Registry under this key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters
TcpNumConnections: DWORD 0x00fffffe
MaxHashTableSize: DWORD 0xffff
MaxFreeTcbs: DWORD 0xffffffff
TcpMaxHalfOpen: DWORD 0xffff
But, no matter what I do, I am still limited to a backlog of 100 connections.
Any ideas?

Related

DPDK MLX5 driver - QP creation failure

I am developing a DPDK program using a Mellanox ConnectX-5 100G.
My program starts N workers (one per core), and each worker deals with its own dedicated TX and RX queue, therefore I need to setup N TX and N RX queues.
I am using flow director and rte_flow APIs to send ingress traffic to the different queues.
For each RX queue I create a mbuf pool with:
n = 262144
cache size = 512
priv_size = 0
data_room_size = RTE_MBUF_DEFAULT_BUF_SIZE
For N<=4 everything works fine, but with N=8, rte_eth_dev_start returns:
Unknown error -12
and the following log message:
net_mlx5: port 0 Tx queue 0 QP creation failure
net_mlx5: port 0 Tx queue allocation failed: Cannot allocate memory
I tried:
increasing the number of hugepages (up to 64x1G)
changing the pool size in different ways
both DPDK 18.05 and 18.11
changing the number of TX/RX descriptors from 32768 to 16384
but with no success.
You can see my port_init function here (for DPDK 18.11).
Thanks for your help!
The issue is related to the TX inlining feature of the MLX5 driver, which is only enabled when the number of queues is >=8.
With TX inlining, the driver copies packet data into the transmit descriptor itself, instead of leaving the NIC to fetch it from the mbuf with a separate DMA read.
With TX inlining, there are some checks that fail in the underlying verbs library (which is called from DPDK during QP Creation) if a large number of descriptors is used. So a workaround is to use fewer descriptors.
I was using 32768 descriptors, since the advertised value in dev_info.rx_desc_lim.nb_max is higher.
The issue is solved using 1024 descriptors.

SMP affinity vs XPS on paired queues and TX queue selection control

I have a Solarflare NIC with paired RX and TX queues (8 sets, on a machine with 8 real cores, no hyperthreading, running Ubuntu), and each set shares an IRQ number. I have used smp_affinity to set which IRQs are processed by which core. Does this ensure that the transmit (TX) interrupts are also handled by the same core? How will this work with XPS?
For instance, let's say IRQ 115 is set to core 2 (via smp_affinity). Say the NIC chooses tx-2 for outgoing TCP packets, and tx-2 also has IRQ number 115. If I have an XPS setting saying tx-2 should be accessible by CPU 4, then which one takes precedence: XPS or smp_affinity?
Also, is there a way to see or set which TX queue is being used for a particular app or TCP connection? I have an app that receives UDP data, processes it, and sends TCP packets in a very latency-sensitive environment. I want to handle the TX interrupts for the outgoing traffic on the same CPU (or one on the same NUMA node) as the app creating this traffic. However, I have no idea how to find which TX queue this app is using. While the receive side has indirection tables to set up rules, I do not know if there is a way to control TX queue selection and therefore pin it to a set of dedicated CPUs.
You can tell the application the preferred CPU by setting the CPU affinity (taskset) or NUMA node affinity, and you can also set the IRQ affinities (in /proc/irq/270/node, or by using the old Intel script 'set_irq_affinity.sh', which is floating around on GitHub). This won't completely guarantee which IRQ/CPU is being used, but it will give you a good head start. If all that fails, to improve latency you might want to enable packet steering on the queues so packets get to the correct CPU more quickly (/sys/class/net//queues/rx-#/rps_cpus and tx-#/xps_cpus). There is also the irqbalance program, and more. It is a broad subject and I am just learning much of it myself.
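The rps_cpus and xps_cpus files take a hexadecimal CPU bitmask, where bit i stands for CPU i. As a rough sketch of how to build that mask (the helper name cpu_mask is made up for illustration, and it assumes fewer than 32 CPUs, as on the 8-core machine in the question, so no comma-separated groups are needed):

```python
def cpu_mask(cpus):
    """Return the hex bitmask string for a list of CPU indices (bit i = CPU i)."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu  # set the bit that corresponds to this CPU
    return format(mask, "x")  # sysfs expects hex without a 0x prefix


if __name__ == "__main__":
    print(cpu_mask([4]))           # mask selecting only CPU 4
    print(cpu_mask([0, 1, 2, 3]))  # mask selecting CPUs 0 through 3
```

The resulting string is what you would write into the xps_cpus file of the TX queue you want to steer.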

How to set ssthresh value in tcp

I am trying to start the TCP slow start congestion algorithm on my Raspberry device. As documented in RFC 2581, ssthresh needs to be set greater than the congestion window (cwnd). So I changed the initial_ssthresh value under /sys/module/tcp_cubic/parameters to 65000 (editing it with sudo nano), and cwnd was set to 10 (checked with ss -i). With these settings, I tried to send a big packet of 19000 bytes from the Raspberry. According to slow start, it should first send 2 packets to the destination device, then 4, then 8, and so on.
But that is not happening on the Raspberry; it sends 10 packets. Did I do something wrong? In that case, how can I start the slow start algorithm?
Thanks
When CWND is less than ssthresh, the connection is in slowstart. When the CWND becomes greater than the ssthresh, the connection goes into congestion avoidance.
What you're seeing is that newer versions of Linux set the initial congestion window to 10. Before that became the default, you could raise the initial congestion window from 3 with an ip route command. I haven't tried it, but I'm guessing you can do the opposite here and lower it.
Long story short, your machine is doing slow start. It is just starting with a larger initial congestion window.
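To make that concrete, here is a small idealised model of slow start. It is only a sketch: it assumes an MSS of 1460 bytes, no loss, and a cwnd that doubles once per round trip while below ssthresh (with ssthresh counted in segments). The function name and parameters are invented for the illustration.

```python
import math


def slow_start_rounds(payload_bytes, mss=1460, initial_cwnd=10, ssthresh=65000):
    """Return how many segments are sent in each round trip (idealised model)."""
    remaining = math.ceil(payload_bytes / mss)  # total segments to send
    cwnd = initial_cwnd
    rounds = []
    while remaining > 0:
        burst = min(cwnd, remaining)  # cannot send more than the window allows
        rounds.append(burst)
        remaining -= burst
        # Slow start doubles cwnd per RTT; congestion avoidance adds one segment.
        cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1
    return rounds


if __name__ == "__main__":
    print(slow_start_rounds(19000))                  # initial window of 10
    print(slow_start_rounds(19000, initial_cwnd=2))  # initial window of 2
```

Under these assumptions the 19000-byte send is 14 segments. With an initial window of 10 it goes out as 10 segments and then 4, which matches the 10 packets seen in the question; the 2, 4, 8 pattern only appears when the initial window is 2.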

Serial driver hw fifo overrun at 460800 baud rate

I am using a 2.6.32 OMAP-based Linux kernel. I have observed that at a high data rate (serial port set to 460800 baud) the serial port HW FIFO overflows.
The serial port is configured to generate an interrupt every 8 bytes in both the RX and TX directions (i.e. when the serial port HW FIFO holds 8 bytes, a serial interrupt is generated and the data is read from the serial port at once).
I am transmitting 114-byte packets continuously (the serial driver has no clue about the packet mode; it receives data in raw mode). Based on calculations,
460800 bits/sec => 460800/10 = 46080 bytes/sec (8 data bits plus 1 start bit and 1 stop bit), so in 1 second I can transmit, in the worst case, 46080/114 => 404.21 packets without any issue.
But I expect the serial port to handle at least 1000 packets per second, which is why I have configured the serial driver to generate an interrupt every 8 bytes.
I tried the same using Windows XP and I am able to read up to 600 packets/second.
Do you think this is feasible on Linux under the above circumstances, or am I missing something? Let me know your comments.
Could someone also share the important settings that need to be configured in the .config file? I am unable to attach my .config file; otherwise I would share it.
There are two kinds of overflow that can occur on a serial port. The first one is the one you are talking about: the driver not responding to the interrupt fast enough to empty the FIFO. FIFOs are typically around 16 bytes deep, so getting a FIFO overflow requires the interrupt handler to be unresponsive for 1 / (46080 / 16) = 347 microseconds. That's a really, really long time. You have to have a pretty drastically screwed-up driver with a higher-priority interrupt to trip that.
The second kind is the one you didn't consider, and it offers more hope for a logical explanation. The driver copies the bytes from the FIFO into a receive buffer, where they sit until the user mode program calls read() to read them. That buffer will overflow when you don't configure any kind of handshaking with the device and the user mode program is not calling read() often enough. It looks exactly like a FIFO overflow: bytes just disappear. There are status bits to warn about these problems, but not checking them is a frequent oversight. You didn't mention doing that either.
So start by improving the diagnostics, do check the overflow status bits to know what's going on. Then do consider enabling handshaking if you find out that it is actually a buffer overflow issue. Increasing the buffer size is possible but not a solution if this is a fire-hose problem. Getting the user mode program to call read() more often is a fix but not an easy one. Just lowering the baud rate, yeah, that always works.
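The figures used in this thread are easy to re-check. The sketch below reproduces them, assuming 10 bits on the wire per byte (8 data bits, 1 start bit, 1 stop bit); the function names are made up for the illustration.

```python
def bytes_per_second(baud, bits_per_frame=10):
    """Payload bytes per second, assuming bits_per_frame bits on the wire per byte."""
    return baud / bits_per_frame


def packets_per_second(baud, packet_bytes, bits_per_frame=10):
    """Upper bound on packets per second the line itself can carry."""
    return bytes_per_second(baud, bits_per_frame) / packet_bytes


def fifo_fill_time_us(baud, fifo_bytes, bits_per_frame=10):
    """Microseconds needed to fill a FIFO of fifo_bytes at full line rate."""
    return fifo_bytes / bytes_per_second(baud, bits_per_frame) * 1e6


if __name__ == "__main__":
    print(bytes_per_second(460800))         # bytes per second at 460800 baud
    print(packets_per_second(460800, 114))  # 114-byte packets per second
    print(fifo_fill_time_us(460800, 16))    # time to fill a 16-byte FIFO
```

This gives 46080 bytes/sec, about 404 packets/sec, and about 347 microseconds for a 16-byte FIFO. Note that the packet rate is a ceiling set by the baud rate itself, independent of how the driver's interrupts are configured.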

TCP Socket no connection timeout

I open a TCP socket and connect it to another socket somewhere else on the network. I can then successfully send and receive data. I have a timer that sends something to the socket every second.
I then rudely interrupt the connection by forcibly breaking it (pulling out the Ethernet cable in this case). My socket still reports that it is successfully writing data out every second. This continues for approximately 1 hour and 30 minutes, after which a write error is eventually given.
What specifies this time-out where a socket finally accepts the other end has disappeared? Is it the OS (Ubuntu 11.04), is it from the TCP/IP specification, or is it a socket configuration option?
Pulling the network cable will not break a TCP connection(1) though it will disrupt communications. You can plug the cable back in and once IP connectivity is established, all back-data will move. This is what makes TCP reliable, even on cellular networks.
When TCP sends data, it expects an ACK in reply. If none comes within some amount of time, it re-transmits the data and waits again. The time it waits between transmissions generally increases exponentially.
After some number of retransmissions or some amount of total time with no ACK, TCP will consider the connection "broken". How many times or how long depends on your OS and its configuration, but it typically times out on the order of many minutes.
From Linux's tcp.7 man page:
tcp_retries2 (integer; default: 15; since Linux 2.2)
The maximum number of times a TCP packet is retransmitted in
established state before giving up. The default value is 15, which
corresponds to a duration of approximately between 13 to 30 minutes,
depending on the retransmission timeout. The RFC 1122 specified
minimum limit of 100 seconds is typically deemed too short.
This is likely the value you'll want to adjust to change how long it takes to detect if your connection has vanished.
(1) There are exceptions to this. The operating system, upon noticing a cable being removed, could notify upper layers that all connections should be considered "broken".
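As a rough illustration of how a retry count turns into a duration, the sketch below sums exponentially growing waits. It assumes the retransmission timeout starts at 200 ms and doubles up to a cap of 120 seconds (the Linux minimum and maximum RTO); on a real connection the starting RTO depends on the measured round-trip time, so the result is a lower bound, not a prediction. The function name is invented for the example.

```python
def total_retransmit_timeout(retries=15, rto_min=0.2, rto_max=120.0):
    """Seconds until TCP gives up: waits before each retransmission plus the last wait.

    Assumes the RTO starts at rto_min and doubles after every timeout, capped at rto_max.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):  # one wait per retransmission, plus the final one
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    seconds = total_retransmit_timeout()
    print(seconds, "seconds, about", round(seconds / 60, 1), "minutes")
```

With tcp_retries2 at 15 this comes out at roughly 15 minutes, which sits inside the range the man page gives. Lowering tcp_retries2 shortens it quickly because the later waits are the long ones.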
If you want socket errors to propagate quickly to your application code, you may want to try this socket option:
TCP_USER_TIMEOUT (since Linux 2.6.37)
This option takes an unsigned int as an argument. When the
value is greater than 0, it specifies the maximum amount of
time in milliseconds that transmitted data may remain
unacknowledged before TCP will forcibly close the
corresponding connection and return ETIMEDOUT to the
application. If the option value is specified as 0, TCP will
use the system default.
See the full description in the Linux tcp(7) man page. This option is more flexible than editing tcp_retries2 (you can set it on the fly, right after creating a socket), and it applies exactly to the situation where your client's socket is not aware of the state of the server's socket and may end up in a so-called half-open state.
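A minimal sketch of setting the option from Python on Linux follows. The helper name set_user_timeout and the 30-second value are just examples; the fallback value 18 is the Linux constant for TCP_USER_TIMEOUT, used here in case the Python build does not export it.

```python
import socket

# Linux defines TCP_USER_TIMEOUT as 18; fall back to that if Python lacks the name.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def set_user_timeout(sock, timeout_ms):
    """Set TCP_USER_TIMEOUT (milliseconds) on sock and return the value read back."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Give up on unacknowledged data after 30 seconds (example value).
        print(set_user_timeout(s, 30000))
```

Because it is a per-socket setting, it only affects the connection you apply it to, unlike the system-wide tcp_retries2.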
Two excellent answers are here and here.
TCP user timeout may work for your case: The TCP user timeout controls how long transmitted data may remain unacknowledged before a connection is forcefully closed.
There are 3 OS-dependent TCP keepalive timeout parameters.
On Linux the defaults are:
tcp_keepalive_time default 7200 seconds
tcp_keepalive_probes default 9
tcp_keepalive_intvl default 75 sec
Total timeout time is tcp_keepalive_time + (tcp_keepalive_probes * tcp_keepalive_intvl), with these defaults 7200 + (9 * 75) = 7875 secs
To set these parameters on Linux:
sysctl -w net.ipv4.tcp_keepalive_time=1800 net.ipv4.tcp_keepalive_probes=3 net.ipv4.tcp_keepalive_intvl=20
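The same arithmetic, plus a per-socket variant of those settings, can be sketched in Python. This is only an illustration for Linux: the function names are made up, and the defaults simply mirror the sysctl example above. Keepalive probes are only sent on sockets that have SO_KEEPALIVE enabled, so the sketch turns that on as well.

```python
import socket


def keepalive_detection_time(idle, probes, interval):
    """Seconds until an idle dead peer is detected: idle time plus all probe intervals."""
    return idle + probes * interval


def enable_keepalive(sock, idle=1800, probes=3, interval=20):
    """Enable keepalive on one socket and override the system defaults (Linux only)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)       # like tcp_keepalive_time
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)      # like tcp_keepalive_probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # like tcp_keepalive_intvl


if __name__ == "__main__":
    print(keepalive_detection_time(7200, 9, 75))  # Linux defaults
    print(keepalive_detection_time(1800, 3, 20))  # values from the sysctl example
```

With the defaults this gives the 7875 seconds computed above; with the sysctl example values it drops to 1860 seconds.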
