I'm confused about the buffer queue concept in a router/switch.
Suppose two hosts are connected to the same switch with the same link delay: the link between host1 and the switch has bandwidth BW1, and the link between host2 and the switch has bandwidth BW2.
Host1 sends packets continuously to host2.
If BW1 = BW2, then when a packet arrives at the switch, it is immediately forwarded towards host2. Does that mean the switch doesn't need a buffer queue?
If BW1 > BW2, the sending rate is greater than the receiving rate, and the switch has to keep some packets in a buffer queue.
What exactly is a buffer queue? Is the queue concept different from the buffer concept?
Please help me out.
Thank you
Even if the bandwidths of both links are the same, the router needs to do some processing on each packet:
It extracts the IP header and looks at the destination IP address.
It looks up the routing table and finds the next hop to which the packet needs to be sent.
It reconstructs the packet and sends it to the next hop.
So there is some processing overhead, and if packets arrive faster than the router can process them, it needs to buffer them.
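To make this concrete: loosely speaking, the buffer is the memory that holds waiting packets, and the queue is the first-in-first-out (FIFO) discipline applied to that memory. Below is a minimal C sketch of a bounded FIFO buffer queue with tail drop; all names and sizes are illustrative, not taken from any real router:

    #include <stdbool.h>
    #include <stddef.h>

    #define QUEUE_CAPACITY 64          /* hypothetical buffer depth */

    struct packet { unsigned char data[1500]; size_t len; };

    struct buffer_queue {
        struct packet slots[QUEUE_CAPACITY];
        size_t head, tail, count;      /* circular FIFO bookkeeping */
    };

    /* Enqueue on arrival: if the buffer is full (packets arriving faster
       than they can be processed/forwarded), the packet is dropped. */
    static bool enqueue(struct buffer_queue *q, const struct packet *p)
    {
        if (q->count == QUEUE_CAPACITY)
            return false;              /* tail drop */
        q->slots[q->tail] = *p;
        q->tail = (q->tail + 1) % QUEUE_CAPACITY;
        q->count++;
        return true;
    }

    /* Dequeue when the forwarding engine / output link is ready. */
    static bool dequeue(struct buffer_queue *q, struct packet *out)
    {
        if (q->count == 0)
            return false;              /* nothing buffered */
        *out = q->slots[q->head];
        q->head = (q->head + 1) % QUEUE_CAPACITY;
        q->count--;
        return true;
    }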
I'm setting up a traffic generator using pktgen-DPDK. What I am having a hard time understanding is why DPDK plays a part when sending packets. From what I understand, when the receiver gets a packet and has their system configured to handle it with DPDK, the NIC delivers the packet directly to the app, which then uses DPDK to do the packet processing (bypassing the inefficient kernel network layers). So why does the transmitter also need to use DPDK? And how does it alter the packets that are being sent?
Here is an example to explain my thinking:
A transmitter is trying to send an image to a receiver. The image is divided into small packets, which use IP and TCP to get from the transmitter to the receiver. After the packets have travelled over the internet, they finally reach the receiver. The receiver has configured their system to use DPDK, bypassing some Linux kernel network layers. This makes the packet processing faster.
Based on the example above, I don't see the point of using DPDK for sending packets, or even how it would play a part in that. When we send packets, don't we simply use protocols like TCP and IP to make sure the packet gets where it needs to go?
What is wrong with my example, and how would you rephrase it to be correct?
I have been told to increase the TCP buffer size in order to process messages faster.
My question is: no matter what buffer I use for the TCP message (ByteBuffer, DirectByteBuffer, etc.), whenever the CPU receives an interrupt from, say, the NIC to handle a network request and read the socket data, does the OS maintain a buffer in memory outside the address space of the requesting process (i.e., the process listening on that socket)?
or
However the CPU receives the network data, is it always written into a buffer within the process address space, with no buffer (including the 'Recv-Q' and 'Send-Q' of the netstat command) maintained outside that address space for this communication?
The process by which the Linux network stack receives data is a bit complicated. I wrote a comprehensive guide to the Linux network stack that explains everything you need to know starting from the device driver up to a userland program's socket receive queue.
There are many places buffers are maintained in the kernel:
The DMA ring, where packets are written by the NIC after they've arrived.
References to the packets on the DMA ring, which are used to process the packets.
Eventually, the packet data is added to the process's receive queue, if the receive queue is not already full.
Reads from the socket will pull packets from the process's receive queue.
If packet sniffing is occurring, packet data is duplicated and sent to any filters added by the packet-sniffing code.
The full process of how data is moved, accounted for, and dropped (when required) is described in the blog post linked above.
Now, if you want to process messages faster, I assume you mean you want to reduce your packet processing latency, correct? If so, you should consider using SO_BUSY_POLL, which can help reduce packet processing latency.
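As a rough illustration, busy polling can be enabled per socket with setsockopt. This assumes a Linux kernel recent enough to define SO_BUSY_POLL; the helper name and the 50-microsecond budget are placeholders of mine:

    #include <sys/socket.h>

    /* Sketch: enable busy polling on an already-created socket.
       The option value is the busy-poll budget in microseconds. */
    static int enable_busy_poll(int sockfd)
    {
        int busy_poll_usec = 50;    /* illustrative value */
        return setsockopt(sockfd, SOL_SOCKET, SO_BUSY_POLL,
                          &busy_poll_usec, sizeof(busy_poll_usec));
    }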
Increasing the receive buffer just increases the number of packets that can be queued for a userland socket. To increase packet processing throughput, you need to carefully monitor and tune each component of the network stack. You may need to use something like RPS to increase the number of CPUs processing packets.
You will also want to monitor each component of your network stack to ensure that the available buffers and CPU processing power are sufficient to handle your packet workload.
See:
http://linux.die.net/man/3/setsockopt
The options are SO_SNDBUF and SO_RCVBUF. If you use the C API directly, the call is setsockopt itself. If you use some kind of framework, look up how to set socket options. This is indeed a kernel-side buffer, not one held by your process. It determines how many bytes the kernel can hold ready for you to fetch with a call to read/receive. It also affects the flow-control mechanism of TCP.
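For illustration, here is a minimal C sketch (the helper name is mine, not from the question). Note that on Linux the kernel typically doubles the requested value to account for bookkeeping overhead and clamps it against the net.core.rmem_max sysctl:

    #include <sys/socket.h>

    /* Sketch: ask the kernel for a larger socket receive buffer.
       SO_SNDBUF works the same way for the send side. */
    static int set_recv_buffer(int sockfd, int bytes)
    {
        return setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF,
                          &bytes, sizeof(bytes));
    }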
You are being told to increase the socket send or receive buffer sizes. These are associated with the socket, in the TCP part of the kernel. See setsockopt() and SO_RCVBUF and SO_SNDBUF.
Do long BPF filters slow down tcpdump?
I replay a packet trace in which all the packets have ttl=k, and wait for the ICMP messages that come back. What I've been noticing is that if I use the following filter (on eth0):
(ip and ip[8]=$k and src host $myAddress) or (icmp and dst host $myAddress and icmp[0]=11)
...I always miss 20-30 packets among the sent packets, whereas if I just do:
ip
... and then do the exact above filtering offline on the capture file, I find all the packets I had sent.
Is this a known behaviour?
If tcpdump cannot pull captured packets off the queue fast enough, the kernel may drop some of them.
Look at the "XXXX packets dropped by kernel" message at the end of the dump to see whether some of them were in fact lost.
Make sure to add the -n option to the command line. This avoids DNS resolution and speeds things up a little (depending on your network).
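For example, reusing the $k and $myAddress placeholders from above (the -B flag, which sets the kernel capture buffer size in KiB and is available in reasonably recent tcpdump builds, can also help absorb bursts):

tcpdump -n -B 4096 -i eth0 "(ip and ip[8]=$k and src host $myAddress) or (icmp and dst host $myAddress and icmp[0]=11)"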
I have a router1-switch-router2 connection. My problem is that if I send a packet from router1 to router2, it is not received at router2. I am sure the IP addresses/subnet masks are correct, I am sure the packets are leaving router1, and I am also sure of the internal port connections of the switch. I have access to the on-path switch. Is there any specific command that can be used on the switch to check whether the packet is received? ARP itself is not getting resolved.
You can have a packet capture app running on both the sender and the receiver; that would show the incoming and outgoing packets on both boxes.
In this case your packet is probably getting dropped on either the sender or the receiver side. There can be a million reasons for a packet drop, but this is a good step to start with.
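For example, assuming tcpdump is available on both routers and the interface is eth0 (adjust to your setup), watching ARP on each side will show whether requests go out and replies come back, along with the MAC addresses involved:

tcpdump -n -e -i eth0 arp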
We have an application which periodically sends TCP messages at a defined rate (using MODBUS TCP). If a message is not received within a set period, an alarm is raised. However, every once in a while there appears to be a delay in messages being received. Investigation has shown that this is associated with the ARP cache being refreshed, causing a retransmission of the TCP message.
The IP stack provider has told us that this is the expected behaviour for TCP. The questions are:
Is this expected behaviour for an IP stack? If not, how do other stacks work around the period when the IP/MAC address translation is not available?
If this is the expected behaviour, how can we reduce the delay in TCP messages during this period? (Permanent ARP entries have been tried, but are not the best solution.)
In my last job I worked for a company building routers and switches. Our implementation would queue packets waiting for ARP replies and send them when the ARP reply was received; therefore, no TCP retransmission was required.
Retransmission in TCP occurs when an ACK is not received within a given time. If the ARP reply takes a long time, or is itself lost, you might be getting a retransmission even though the device waiting for the ARP reply is queuing the packet.
It would appear from your question that the period of the TCP messages is shorter than the ARP refresh time. This implies that reuse of the ARP entry is not keeping it refreshed, which is behaviour that, if present, would be helpful in your situation.
A packet trace of the situation occurring could be helpful - are you actually losing the first packet? How long does the ARP reply take?
In order to stop the ARP cache entry timing out, you might want to find something that will refresh it, such as another ARP request for the same address, or a gratuitous ARP.
I found a specification for MODBUS TCP but it didn't help. Can you post some details of your network - media, devices, speeds?
Your description suggests that the peer's ARP entry expires between TCP segments, causing some subsequent segments to fail due to the lack of a current destination MAC address.
If you have the MODBUS devices on a separate subnet, then perhaps the destination router will be kind enough to queue the segment until it has a valid MAC address. If you cannot use a separate subnet, you could try forcing the session to have keep-alives activated: this causes a periodic empty message to be sent, which keeps the ARP timers resetting. If the overhead of the keep-alives is too high and you completely control the application in your system, you could try forcing zero-length messages through to the peer.
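As a sketch of the keep-alive approach in C (the option names below are the Linux-specific ones from <netinet/tcp.h>; the timing values are illustrative and should be chosen shorter than your ARP cache timeout):

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    /* Sketch: turn on TCP keep-alives with an aggressive schedule so the
       periodic probes keep the peer's ARP entry warm. */
    static int enable_keepalive(int sockfd)
    {
        int on = 1, idle = 10, interval = 10, count = 3;
        if (setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
            return -1;
        /* seconds of idle time before the first probe is sent */
        if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
            return -1;
        /* seconds between successive probes */
        if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) < 0)
            return -1;
        /* probes without a reply before the connection is declared dead */
        return setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));
    }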