I'm going to write a program that listens on a UDP port and then dispatches the data to multiple server instances. The server software has been structured to listen on a port itself, not to receive data from another program running locally. So my idea is basically to create a second UDP stream from the front-end program to the server instances over the loopback interface.
The application is latency critical; the overhead shouldn't exceed 1 millisecond. I'm wondering whether this is the best approach: I fear the packet would be scheduled for another round of dispatch by the kernel (Linux in my case). If I'm right, will that latency be noticeable? If so, is the only solution to write a new form of inter-process communication between the front-end and the server application?
Scheduling is not really an issue. If a process or thread is waiting in select() or recvmsg(), it should be woken by an incoming datagram almost instantly. (The sender gives up its CPU slice when it invokes sendmsg(), the kernel passes the message through the loopback interface, and the receiver then sits above the sender in the scheduler's run queue.)
Latency through the loopback interface will be sub-millisecond, provided that the receiver is ready to receive.
Most of your latency will come from whatever each receiver is doing between reading packets off the socket. For example, if a receiver needs 0.5 milliseconds of CPU time processing between receipts, then your latency is going to be about 0.5 milliseconds. But if you are running 3 such receivers per CPU core, then your latency cannot be less than 1.5 milliseconds.
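For reference, here is a minimal C sketch of that forwarding step, with hypothetical port numbers: the front-end reads a datagram and immediately resends it over the loopback interface to a server instance, so the blocked receiver becomes runnable as soon as sendto() returns.

```c
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    /* Front-end socket: receives the external traffic (port is hypothetical). */
    int in_fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in listen_addr = { 0 };
    listen_addr.sin_family = AF_INET;
    listen_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    listen_addr.sin_port = htons(9000);
    if (bind(in_fd, (struct sockaddr *)&listen_addr, sizeof(listen_addr)) < 0) {
        perror("bind");
        return 1;
    }

    /* One server instance listening on the loopback interface (also hypothetical). */
    int out_fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in srv = { 0 };
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = htons(9001);

    char buf[65536];
    for (;;) {
        ssize_t n = recvfrom(in_fd, buf, sizeof(buf), 0, NULL, NULL);
        if (n < 0) { perror("recvfrom"); break; }
        /* Re-send over loopback; the waiting receiver is runnable almost
         * immediately after this call returns. */
        if (sendto(out_fd, buf, n, 0, (struct sockaddr *)&srv, sizeof(srv)) < 0)
            perror("sendto");
    }
    return 0;
}
```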
I am chasing down a bug in my code that I think might have to do with a socket blocking on send. I'm working in C#, using the Socket class in blocking mode. I would love to be able to do some testing in my local environment to see what happens if the SendTo method blocks on send, but I am not sure if there is a way to do this on Windows.
What I am trying to do right now is to have two programs which I'm running locally. One sets up a UDP socket and then reads data very slowly (1 second delay between reads). The other program just sends a firehose of data to the first--one 63k datagram after another as fast as possible in an infinite loop. I was hoping that eventually some buffer somewhere would fill up and SendTo would block, but I am not having any luck.
Is my test fundamentally flawed, or is there some way to actually do this in Windows?
One other note: I am sending my packets on 127.0.0.1--do I actually need to have the packets routed out of my NIC for this to have any chance of working?
UDP is a datagram protocol for unreliable delivery. Your OS can just discard packets; there's no expectation of buffering. In fact, for applications like video streaming it's GOOD to discard packets, else you can end up with a lot of lag. UDP applications should detect lost packets themselves and adapt, e.g. by lowering the video resolution (or whatever makes sense for your type of application).
TCP is the reliable protocol. The OS will talk with the other OS to verify that all TCP data arrived. A slow reader also acknowledges the data slowly, which in turn slows down the sender. localhost simplifies this a bit, but to the applications involved that doesn't really matter.
[edit]
Addressing your problem a bit more directly: since UDP doesn't care one iota about receiving data, you can just drop that receiver. That will lose 100% of the packets, but you were only wondering about the send part.
As for the buffering, you're right that localhost won't be effective. It's way too fast. The best approach might be to intentionally worsen your network connection. Perhaps your Ethernet can be forced to 100 Mbps? Perhaps a USB2 network adapter? UDP might be tolerant of losses, but if you run UDP over a VPN over TCP you suddenly get a bunch of slower software layers. Packets can now get lost after the VPN server, but any packets lost on the way to the VPN server need to be resent. And when that VPN connection runs over some bad WiFi, there's bound to be some packet loss.
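To illustrate the send side at the OS level (a C sketch rather than C#, with a hypothetical destination port; the Socket class ends up at the same system calls), a sender can blast 63k datagrams at a loopback port nobody is listening on, and sendto() keeps succeeding while every packet is simply discarded:

```c
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst = { 0 };
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    dst.sin_port = htons(9999);       /* hypothetical port; nobody listening */

    char payload[63 * 1024] = { 0 };  /* one 63k datagram, as in the test */
    for (;;) {
        /* Succeeds even though every datagram is dropped: UDP makes no
         * promise that anyone will ever read it. On an unconnected socket
         * the ICMP port-unreachable replies are usually not reported here. */
        if (sendto(fd, payload, sizeof(payload), 0,
                   (struct sockaddr *)&dst, sizeof(dst)) < 0) {
            perror("sendto");
            break;
        }
    }
    return 0;
}
```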
I am told to increase TCP buffer size in order to process messages faster.
My question is: no matter what buffer I use for the TCP message (ByteBuffer, DirectByteBuffer, etc.), when the CPU receives an interrupt from, say, the NIC and handles the network request to read the socket data, does the OS maintain any buffer in memory outside the address space of the requesting process (i.e. the process that is listening on that socket),
or
will the network data, however the CPU receives it, always be written into a buffer inside the process's address space only, with no buffer (including the 'Recv-Q' and 'Send-Q' shown by the netstat command) maintained outside that address space for this communication?
The process by which the Linux network stack receives data is a bit complicated. I wrote a comprehensive guide to the Linux network stack that explains everything you need to know starting from the device driver up to a userland program's socket receive queue.
There are many places buffers are maintained in the kernel:
The DMA ring where packets are written by the NIC after they've arrived.
References to the packets on the DMA ring are used to process the packet.
Eventually, the packet data is added to the process's receive queue, if the receive queue is not already full.
Reads from the socket will pull packets from the process' receive queue.
If packet sniffing is occurring, packet data is duplicated and sent to any filters added by the packet sniffing code.
The full process of how data is moved, accounted for, and dropped (when required) is described in the blog post linked above.
Now, if you want to process messages faster, I assume you mean you want to reduce your packet processing latency, correct? If so, you should consider using SO_BUSY_POLL, which can help reduce packet processing latency.
Increasing the receive buffer just increases the number of packets that can be queued for a userland socket. To increase packet processing throughput, you need to carefully monitor and tune each component of the network stack. You may need to use something like RPS to increase the number of CPUs processing packets.
You will also want to monitor each component of your network stack to ensure that the available buffers and CPU processing power are sufficient to handle your packet workload.
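As a rough sketch (assuming a kernel built with CONFIG_NET_RX_BUSY_POLL; the 50-microsecond budget and the port number are arbitrary illustrative values), busy polling can be enabled per socket with setsockopt():

```c
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    /* Busy-poll the device queue for up to ~50 microseconds on a blocking
     * read instead of sleeping and waiting for the interrupt/softirq path.
     * Requires CONFIG_NET_RX_BUSY_POLL; older kernels may also require
     * CAP_NET_ADMIN to set this option. */
    int busy_poll_us = 50;
    if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
                   &busy_poll_us, sizeof(busy_poll_us)) < 0)
        perror("setsockopt(SO_BUSY_POLL)");

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);              /* arbitrary example port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    char buf[2048];
    ssize_t n = recv(fd, buf, sizeof(buf), 0); /* busy-polls before sleeping */
    printf("received %zd bytes\n", n);
    close(fd);
    return 0;
}
```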
See:
http://linux.die.net/man/3/setsockopt
The options are SO_SNDBUF and SO_RCVBUF. If you use the C API directly, the call is setsockopt() itself. If you use some kind of framework, look up how to set socket options. These are indeed kernel-side buffers, not ones held by your process. They determine how many bytes the kernel can hold ready for you to fetch with a call to read/receive, and they also affect TCP's flow control mechanism.
You are being told to increase the socket send or receive buffer sizes. These are associated with the socket, in the TCP part of the kernel. See setsockopt() and SO_RCVBUF and SO_SNDBUF.
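A minimal C sketch of doing that (the requested size is just an illustrative number; the kernel clamps it to net.core.rmem_max / net.core.wmem_max and stores roughly double the value to account for bookkeeping overhead):

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: enlarge the kernel-side send/receive buffers of `fd`. */
static void grow_socket_buffers(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        perror("setsockopt(SO_RCVBUF)");
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        perror("setsockopt(SO_SNDBUF)");

    /* Read back what the kernel actually granted. */
    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == 0)
        printf("receive buffer is now %d bytes\n", actual);
}
```

For example, calling grow_socket_buffers(sock, 1 << 20) before connect() or listen() lets the TCP window scaling negotiated during the handshake take advantage of the larger receive buffer.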
I have a test case with one channel between hosts. The client schedules 10 threads. Each thread performs one request over this channel. These threads execute at the same time.
I track communication using Wireshark.
In the Wireshark log I see a TCP retransmission packet every time I run my test. Moreover, the size of this packet is always the same.
https://docs.google.com/file/d/0BxLwpCjOzREAMVhmTmxic3RqQWc/edit?usp=sharing
Another test case
https://docs.google.com/file/d/0BxLwpCjOzREAX3BCX0d3bzEyS2s/edit?usp=sharing
Has anybody faced this kind of behaviour? What could be the reason for it?
TCP does not prioritize traffic the way IP can. When a lot of background TCP connections are open and uploading data (as when BitTorrent is seeding in the background), delay may occur on a particular socket, because TCP hands packets down to the IP level from only one socket at a time. So a particular socket must wait its turn behind many other connections, without any priority, which results in a delay.
I am currently doing some experiments and I am trying to measure the delay created by TCP in such congestion situations. Because this delay occurs at the transport (TCP) level, I am thinking of measuring it precisely by hooking the exact moments when certain Linux kernel functions are called.
I am willing to upload data to a server using TCP (I can use Iperf tool). For hooking the system calls I want to use SystemTap. This tool can tell me the exact moment when a particular system call is called.
I want to know the names of the two kernel functions used when sending a packet:
The first TCP-level function called for a packet (is it tcp_sendmsg?);
The last TCP-level function called for a packet, which passes it down to the IP network level.
The difference (delta) between the moments these two functions are called is the delay I want to measure.
The first TCP-level function called for a packet is tcp_sendmsg, from the 'net/ipv4/tcp.c' kernel source file.
The last TCP-level function called for a packet is tcp_transmit_skb, from the 'net/ipv4/tcp_output.c' kernel source file.
An interesting site with information about TCP source files from Linux is this: tcp_output
In my program the receiver has a bigger workload. Should I make the sender wait for the receiver through methods like an application-level ACK?
You shouldn't directly send TCP ACK messages--those are handled at a low level by the OS. I'd look at the following in order of likelihood:
Is there some easy optimization on the receiver? It's pretty rare that you can really fill a network pipe more quickly than the receiver can handle the data. Make sure that the receiver has at least two threads: a network i/o thread and a work thread.
If the receiver is starting to panic, it could send a throttle message to the server, which makes the server cool its heels until the receiver catches up. This is more efficient than waiting for an ack message after every message, but it requires that the receiver know when it's about to fall behind, which might be difficult.
Alternatively, the slowest but most reliable approach is to have the receiver acknowledge every message from the server, sort of like you mentioned. This wouldn't be a TCP ACK, but a special message in the data format your sender/receiver use to communicate.
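A minimal C sketch of that last approach, assuming a hypothetical framing (a 4-byte big-endian length prefix, and a single ACK byte back from the receiver); the sender cannot run ahead of the receiver because it blocks until the acknowledgement arrives:

```c
#include <stdint.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Hypothetical framing: 4-byte big-endian length, then the payload.
 * The receiver replies with a single byte (0x06, ASCII ACK) once it has
 * finished processing the message. */
static int send_and_wait_ack(int sock, const void *msg, uint32_t len)
{
    uint32_t hdr = htonl(len);
    if (send(sock, &hdr, sizeof(hdr), 0) != sizeof(hdr))
        return -1;
    if (send(sock, msg, len, 0) != (ssize_t)len)
        return -1;

    /* Block here until the receiver says it is done with this message. */
    char ack;
    if (recv(sock, &ack, 1, MSG_WAITALL) != 1 || ack != 0x06)
        return -1;
    return 0;
}
```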