How to constrain addr to which SK_BUFF is allocated? - tcp

We are connecting two servers over PCIe. Server1 has a NIC card. It forwards packets to and from Server2. This means that Server1 must be able to DMA (over PCIe) into an Rx SK_BUFF that was allocated inside Server2. This in turn means Server2 must have the entire DRAM address range exposed via its BAR.
To avoid that.. is there any way for the device driver on Server2 to constrain the range of addresses to which an SK_BUFF is allocated in Server2's DRAM?

Related

Should the kernel pass data from a tap interface to an application listening on INADDR_ANY?

I have an application that creates, listens on and writes to a tap interface. The software will read(tun_fd,...) and perform some action on that data, and it will return data to the system as UDP packets via write(tun_fd,...).
I assign an IP to the interface, 10.10.10.10\24 so that a socket application can bind to it and so that the kernel will pass any packets for the virtual subnet to the tap interface.
The software generate frames with IP/UDP packets with the destination IP being that assigned to the interface, and a source IP existing in the same subnet. The source and dest mac address match that of the tap device. Those frames are written back to the kernel with write(tun_fd,...).
If I open said tap interface in wireshark I will see my frames/packets as I expect to, properly formatted, expected ports, expected macs and IPs. But if I try to read those packets with netcat -lvu 0.0.0.0 ${MY_UDP_PORT} I don't see anything.
Is this expected behavior?
Update 1
INADDR_ANY is a red herring. I have the problem even when explicitly binding to an interface / port as in this pseudo code:
#> # make_tap_gen is a fake program that creates a tap interface and pushes UDP packets to 10.10.10.10#1234
#> ./make_tap_gen tun0
#> ip addr add dev tun0 10.10.10.10/24
#> netcat -lvu 10.10.10.10 1234
Update 2
I modified my code to be able to switch to a tun as opposed to a tap and I experience the same issue (well formatted packets in Wireshark but no data in socket applications).
Update 3
In the kernel documentation for tuntap it says
Let's say that you configured IPv6 on the tap0, then whenever
the kernel sends an IPv6 packet to tap0, it is passed to the application
(VTun for example). The application encrypts, compresses and sends it to
the other side over TCP or UDP. The application on the other side decompresses
and decrypts the data received and writes the packet to the TAP device,
the kernel handles the packet like it came from real physical device.
This implies to me that a write(tun_fd,...) where the packet was properly formatted and destined for an IP assigned to some interface on the system should be received by any application listening to 0.0.0.0:${MY_UDP_PORT}
Yes, data written into the tuntap device via write(tun_fd...) should get passed to the kernel protocol stack and distributed to listening sockets with matching packet information just like the frame had arrived over a wire attached to a physical ethernet device.
It requires that the packets be properly formed (IP checksum is good, UDP checksum is good or 0). It requires that the kernel know how to handle the packet (is there an interface on the system with a matching destination IP?). If it's a tap device it may also require that your application is properly ARP'ing (although this might not be necessary for a 'received' packet from the perspective of a socket application listening to an address assigned to the tap device).
In my case the problem was silly. While I had turned on UDP checksum verification in wireshark I forgot to turn on IP header verification. An extra byteswap was breaking that checksum. After fixing that I was immediately able to see packets written into the TAP device in a socket application listening on the address assigned to that interface.

Push all packets received at network interface card into TCP/IP stack

Is it possible to push all packets received at NIC to the TCP/IP stack even if their ethernet address doesn't match my ethernet address? In other words I want to process all incoming packets at my NIC.
Can anyone mention a possible scenario for changing network interface driver code?how could I check the operation of driver code?
In the typical system, this already happens. That is, all you have to do is put the interface into promiscuous mode. The driver then sends all packets it receives to the TCP/IP stack. Check any ordinary network driver, you'll see that in processing received packets, there is no comparison of the MAC (or ethernet) address with the device's MAC address.
Simplifying considerably:
What normally happens is that when you don't have promiscuous mode enabled, the driver configures the device such that it filters on a specific MAC address, only delivering frames that have a matching address or a broadcast address (or occasionally a multicast address, which may or may not also be filtered). When you enable promiscuous mode, the driver simply tells the device not to filter on MAC address but to deliver all frames. The driver then will receive all frames and deliver them to the stack. In linux, this typically happens through a call to netif_receive_skb() or a variant thereof.
The TCP/IP stack itself doesn't care about the MAC address. It will instead look for packets that have an IP address matching one of its own. Any packets received that don't have an IP address belonging to this box are simply discarded -- unless there's a user-mode program trying to receive raw packets (such as tcpdump). [In the latter case, it's still discarded after being delivered to tcpdump.]
If it matches on the IP address, it's then passed up the stack to TCP or UDP [etc] -- where it could also be discarded if it doesn't correspond to a session / port that anything on the box cares about.
But typically packets destined for a MAC address not matching one assigned to this device will not be packets that this machine cares about. Hence, promiscuous mode is usually only enabled for debugging, troubleshooting, forensics (i.e. tcpdump, wireshark, etc). The rest of the time, it's a waste of processing resources since the packets will just be discarded.

What is the theoretical maximum number of open TCP connections that a modern Linux box can have

Assuming infinite performance from hardware, can a Linux box support >65536 open TCP connections?
I understand that the number of ephemeral ports (<65536) limits the number of connections from one local IP to one port on one remote IP.
The tuple (local ip, local port, remote ip, remote port) is what uniquely defines a TCP connection; does this imply that more than 65K connections can be supported if more than one of these parameters are free. e.g. connections to a single port number on multiple remote hosts from multiple local IPs.
Is there another 16 bit limit in the system? Number of file descriptors perhaps?
A single listening port can accept more than one connection simultaneously.
There is a '64K' limit that is often cited, but that is per client per server port, and needs clarifying.
Each TCP/IP packet has basically four fields for addressing. These are:
source_ip source_port destination_ip destination_port
<----- client ------> <--------- server ------------>
Inside the TCP stack, these four fields are used as a compound key to match up packets to connections (e.g. file descriptors).
If a client has many connections to the same port on the same destination, then three of those fields will be the same - only source_port varies to differentiate the different connections. Ports are 16-bit numbers, therefore the maximum number of connections any given client can have to any given host port is 64K.
However, multiple clients can each have up to 64K connections to some server's port, and if the server has multiple ports or either is multi-homed then you can multiply that further.
So the real limit is file descriptors. Each individual socket connection is given a file descriptor, so the limit is really the number of file descriptors that the system has been configured to allow and resources to handle. The maximum limit is typically up over 300K, but is configurable e.g. with sysctl.
The realistic limits being boasted about for normal boxes are around 80K for example single threaded Jabber messaging servers.
If you are thinking of running a server and trying to decide how many connections can be served from one machine, you may want to read about the C10k problem and the potential problems involved in serving lots of clients simultaneously.
If you used a raw socket (SOCK_RAW) and re-implemented TCP in userland, I think the answer is limited in this case only by the number of (local address, source port, destination address, destination port) tuples (~2^64 per local address).
It would of course take a lot of memory to keep the state of all those connections, and I think you would have to set up some iptables rules to keep the kernel TCP stack from getting upset &/or responding on your behalf.

How do I open an udp socket for a multicastgroup with Qt?

I have a question about Qt & network sockets. If I have a computer with multiple IP-Adresses in different networks, how do I open an udp socket for a multicastgroup on a specific network-adapter/ip adress.
eg: ip 192.168.2.1 and 172.20.0.1 and I want to create a socket that receives packets from the multicast group 228.5.6.7 on the 172.20.0.1 network adapter.
You should set that in imr_interface as shown below: (probably it's set to INADDR_ANY now)
struct ip_mreq mreq;
mreq.imr_multiaddr.s_addr = inet_addr("228.5.6.7");
mreq.imr_interface.s_addr = inet_addr("172.20.0.1");// <---- right here
...
QSocketDevice* sdev = new QSocketDevice(QSocketDevice::Datagram);
...
setsockopt(sdev->socket(), IPPROTO_IP, IP_ADD_MEMBERSHIP,(const char *)&mreq, sizeof(struct ip_mreq));
...
If it's a listening socket, you can use bind to IP address to bind it to a specific IP address to listen on.
If it's a client socket, the OS manage the right interface to create it on to reach that IP address as per routing table rules.

How does a system's TCP/IP stack differentiate between multiple programs connecting to the same address and port?

Suppose two web browsers are running on the same computer and are accessing the same website (in other words, accessing the same IP address on the same port).
How does the operating system recognize which packets are from/for which program?
Does each program have a unique id field in the TCP header? If so, what is the field called?
The two programs are not actually accessing the "same port." For purposes of TCP, a connection is defined by the tuple (src_ip,src_port,dst_ip,dst_port).
The source port is usually ephemeral, which means it is randomly assigned by the OS. In other words:
Program A will have:
(my_ip, 10000, your_ip, 80)
Program B will have:
(my_ip, 10001, your_ip, 80)
Thus, the OS can see those are different "connections" and can push the packets to the correct socket objects.
the source port number will be different even if the destination port number is the same. the kernel will associate the source port number with the process.
When the client opens a connection to destination port 80, it uses an arbitrary unused source port on the local machine, say 17824. The web server then responds to that client by sending packets to destination port 17824.
A second client will use a second unused port number, say 17825, and so the two sockets' packets will not be mixed up since they'll use different port numbers on the client machine.
Christopher's answer is partially correct.
Programs A and B actually have a handle to a socket descriptor stored in the underlying OS's socket implementation. Packets are delivered to this underlying socket, and then any process which has a handle to that socket resource can read or write it.
For example, say you are writing a simple server on a Unix like OS such as Linux or Mac OSX.
Your server accepts a connection, at which point a connection consisting of
( src IP, src Port, dest IP, dest Port )
comes in to existence in the underlying OS socket layer. You then fork a process to handle the connection - at this point you now have two processes with handles to the socket both of which can read / write it.
Typically ( always ) the original server will close it's handle to the socket and let the forked process handle it. There are many reasons for this, but the one that is not always obvious to people is that when the child process finishes it's work and closes the socket the socket will stay open and connected if the parent process still has an open handle to it.
By port number.
IP address is used to identify computer, and port is used to identify process(application) within the computer. When a port is used by one process, other processes can't use it any more. So if any packet is sent to that port, only the owner of that port can handle that packet.
Connections are identified by a pair of endpoints.
– Endpoint means (ip, port)
Os assigns random number as src port number so, when packet travels to the receiving side, it is treated as different process's msg, since src port numbers are different.

Resources