.ndo_start_xmit in Usbnet.c driver - networking

In usbnet.c driver, there is function pointer called .ndo_start_xmit which gets called whenever there is a packet to be transmitted from the upper layers.
Can it be called twice for the same packet and in what cases ??
If the USB endpoint buffers are full, then the tx_complete will call netif_stop_queue to diable queuing of packets into the queue from where .ndo_start_xmit fetches the packet to be transmitted to the USB subsystem.
So when the the USB endpoints buffer space is available after the short period , then again netif_wake_queue gets called that enables queueing of packets again in the queue so that the urbs(packets) are passed to the USB subsystem.
Is my understanding correct ?.
-Sumeet

Related

What are the differences between Kernel Buffer, TCP Socket Buffer and Sliding Window

Here's my understanding of incoming data flow in TCP/IP
Kernel reads data to its buffer from network interface
Kernel copy data from its buffer to TCP Socket Buffer, where Sliding Window works
The program that is blocked by read() wakes up and copy data from socket buffer.
I'm a little bit confused about where does the sliding window locate, or is it the same as socket buffer
Linux does not handle TCP's sliding window as a separate buffer, rather as several indices indicating how much has already been received / read. The Linux kernel packet handling process can be described in many ways and can be divided to small parts as yo go deeper, but the general flow is as follows:
The kernel prepares to receive data over a network interface, it prepares SKB (Socket Buffer) data structures and map them to the interface Rx DMA buffer ring.
When packets arrive, they fill these preconfigured buffers and notify the kernel in an interrupt context of the packets arrival. In this context, the buffers are moved to a recv queue for the network stack to handle them out of an interrupt context.
The network stack retrieves these packets and handles them accordingly, eventually arriving to the TCP layer (if they are indeed TCP packets) which in turn handles the window.
See struct tcp_sock member u32 rcv_wnd which is then used in tp->rcvq_space.space as the per-connection space left in window.
The buffer is added to socket receive queue and is read accordingly as stream data in tcp_recvmsg()
The important thing to remember here is that copies is the worst thing regarding performance. Therefore, the kernel will always (unless absolutely necessary) will avoid copies and use pointers instead.

Is DMA synchronous in network card drivers?

My understanding is that when a NIC adapter receives new packets, the top half handler uses DMA to copy data from the RX buffer to the main memory. I think this handler should not exit or release the INT pin before the transmission is completed, otherwise new packets would corrupt the old ones.
However, DMA is generally considered asynchronous and itself requires the interrupt mechanism to notify the CPU that data transmission is done. Thus my question, is DMA actually synchronous here, or interrupt can in fact happen within another interrupt handler?
In general, this synchronisation happens via ring descriptor between NIC(device driver) and host CPU. You will get packet path details here. I have explained the ring-descriptors below.
Edit:
Let me explain with Intel's ethernet Controller. If you look at section 3.2.3, where the RX descriptor format is given, it has status field which solves packet ownership problem. There are two major points to avoid contention and packet corruption as to who owns the packet (NIC driver or CPU).
DMA (from I/O device to Host memory): RX/TX Ring consists of 'hardware descriptors' and 'buffers' (carved from host memory). When we say DMA, controller transfer data, this happens from hardware FIFOs to this host memory.
Let us assume my ring buffers (of 512 bytes) are not big enough to hold the complete incoming packet(1500 or Jumbo packet), in that case the packet may span across multiple ring buffers and with EOP(End Of Packet) status field, indicates that the complete packet is now received (considering all the sanity checks/checksums are already done).
Second is who owns the packet now (driver or CPU for further consumption)? Now until the status flag DD (Descriptor Done) is set, it belongs to driver. Once set CPU can grab it for picking-and-poking.
This is specific to RX path. TX path is slightly different.
Consider it this way, there are multiple interrupts (IO, keyboard, mouse, etc) happening all the time in the system, but the time duration between two interrupts are so huge that CPU can do lot of other good stuff in between. And to further offload CPUs work DMA helps transferring data. So if an interrupt is raised and subroutine is called, all the subsequent interrupts can be masked as you are already inside that subroutine, but trust me these subroutine are very tiny they hardly consume any time until your next packet arrives. That means your packet arriving speed has to be higher than your processing speed.
Another Ex: for router/switches 99% time task is routing and switching hence subroutine and interrupt priorities are completely different, moreover all the time they are bombard with tons of packets and hence the subroutine in such cases will never come until there is another packet at bay. At least i have worked on such networking gears.

Is TCP Buffer In Address Space Of Process Memory?

I am told to increase TCP buffer size in order to process messages faster.
My Question is, no matter what buffer i am using for TCP message(ByteBuffer, DirectByteBuffer etc), whenever CPU receives interrupt from say NIC, to handle network request to read the socket data, does OS maintain any buffer in memory outside Address Space of requesting process(i.g. the process which is listening on that socket)
or
whatever way CPU receives network data, it will always be written in a buffer of process address space only and no buffer(including 'Recv-Q' and 'Send-Q' of netstat command) outside of the address space is maintained for this communication?
The process by which the Linux network stack receives data is a bit complicated. I wrote a comprehensive guide to the Linux network stack that explains everything you need to know starting from the device driver up to a userland program's socket receive queue.
There are many places buffers are maintained in the kernel:
The DMA ring where packets are written by the NIC after they've arrived.
References to the packets on the DMA ring are used to process the packet.
Eventually, the packet data is added to process' receive queue, if the receive queue is not full already.
Reads from the socket will pull packets from the process' receive queue.
If packet sniffing is occurring, packet data is duplicated and sent to any filters added by the packet sniffing code.
The full process of how data is moved, accounted for, and dropped (when required) is described in the blog post linked above.
Now, if you want to process messages faster, I assume you mean you want to reduce your packet processing latency, correct? If so, you should consider using SO_BUSYPOLL which can help reduce packet processing latency.
Increasing the receive buffer just increases the number of packets that can be queued for a userland socket. To increasing packet processing power, you need to carefully monitor and tune each component of the network stack. You may need to use something like RPS to increase the number of CPUs processing packets.
You will also want to monitor each component of your network stack to ensure that available buffers and CPU processing power is sufficient to handle your packet workload.
See:
http://linux.die.net/man/3/setsockopt
The options are SO_SNDBUF, and SO_RCVBUF. If you directly use the C-API, the call is setsockopt itself. If you use some kind of framework look up how to set socket options. This is indeed a kernel-side buffer, not one held by your process. It determines how many bytes the kernel can hold ready for you to fetch from a call to read/receive. It also affects the flow control mechanism of TCP.
You are being told to increase the socket send or receive buffer sizes. These are associated with the socket, in the TCP part of the kernel. See setsockopt() and SO_RCVBUF and SO_SNDBUF.

suddenly packets are stop coming at Ethernet PHY

I have a situation where packets are not coming at Ethernet PHY. I am using DMA ring buffer, the data are copied from physical wire to ring buffer then I am pushing it to upper layer stack. In the DMA ring buffer there are two counters consumer index and producer index as well as there are two pointers read pointer and writer pointer. The counter says that how many packets are came from physical layer whereas consumer buffer is used to keep the index of consumed buffer that has pushed to upper layer. Read and write pointers are used to pick the data.
In my current situation my producer and consumer index are becoming similar, it means that there are no packets coming in the DMA ring buffer whereas the packets are continuously pumped to the device connected to PC (wireshark logs confirm that packet is routing.)
We are making our bootloader OS independent, so here our implementation is doing many things(flow management, parsing the initial packet and pushing it to the upper layers ) within a single execution (introducing some timers)where as in its previous implementation which is VxWorks, things are happening in different threads and they were using their IP stack. After further debugging the issue, I have observed that packets are dropping due to RX_BUFFER overflow. I discovered that there are some issue in setting MAC multicast address in the filters at hardware level that might be a reason for the same. My observation is that first time its work fine. But after soft reset I am not able to put the filter again. I have doubt on a couple of more issue and I am probing the same.
1> Initialize ethernet driver.
2> LWIP (IP stack) initialization.
3> Registering callback functions.
4> Start Ethernet PHY driver.
5> Form DHCP connection.
6> Ethernet driver keeps polling, to accept DHCP offer.
7> Join IGMP
8> Poll for multicast packet
9> parse the packet and join other multicast group
10> start polling for multicast packet again. Here after step 4, randomly I am receiving RX_BUFFER overflow message at any step onwards. The max MTU size set is 1500 byte, and the size of Buffer is 2K.
Any suggestion to sort/narrow down the issue?
We are in touch with Broadcom for the above problem, we fixed the issue and tested it. I would like to update the modification that has been done.
After receiving the data packets from PHY layer we are flushing the PHY RX buffer. This section has removed because its already managed by PHY layer.
We have also made some minor modification in the flow of LWIP stack.

Who captures packets first - kernel or driver?

I am trying to send packets from one machine to another using tcpreplay and tcpdump.
If I write a driver for capturing packets directly from the NIC, which path will be followed?
1) N/w packet ----> NIC card ----> app (no role of kernel)
2) N/w packet -----> Kernel -----> NIC card ---> app
Thanks
It's usually in this order:
NIC hardware gets the electrical signal, hardware updates some of its registers and buffers, which are usually mapped into computer physical memory
Hardware activates the IRQ line
Kernel traps into interrupt-handling routine and invokes the driver IRQ handling function
The driver figures out whether this is for RX or TX
For RX the driver sets up DMA from NIC hardware buffers into kernel memory reserved for network buffers
The driver notifies upper-layer kernel network stack that input is available
Network stack input routine figures out the protocol, optionally does filtering, and whether it has an application interested in this input, and if so, buffers packet for application processing, and if a process is blocked waiting on the input the kernel marks it as runnable
At some point kernel scheduler puts that process on a CPU and resumes is, application consumes network input
Then there are deviations from this model, but those are special cases for particular hardware/OS. One vendor that does user-land-direct-to-hardware stuff is Solarflare, there are others.
A driver is the piece of code that interacts directly with the hardware. So it is the first piece of code that will see the packet.
However, a driver runs in kernel space; it is itself part of the kernel. And it will certainly rely on kernel facilities (e.g. memory management) to do its job. So "no role of kernel" is not going to be true.

Resources