TCP - LRO/TSO techniques

Why is it a must that all interfaces (routers and bridges) involved support the LRO/TSO technique?

Routers don't. Bridges do.
External routers, hubs, switches or anything else that is externally connected to the network will not see the effects of TSO; only interfaces inside the device with TSO will experience any effects. It's a software thing.
A router is an external device which is connected to the network by ethernet cables, fibre optic cables, wireless comms etc. These communication media adhere to international standards such as 802.3 for Ethernet or 802.11 for wireless. They're hardware devices, and hardware devices have very strict rules on how they communicate.
A bridge is an internal software construct and is specific to your OS.
Let's use 802.3 (Ethernet) and a Linux host as an example.
An application calls for a socket to be created and then pushes a large data chunk into the socket. The Linux kernel determines which interface this data should be transmitted on. The kernel will next interrogate the driver for this interface to determine its capabilities; if the interface is TSO capable, the kernel will pass an sk_buff with a single "template" header and a huge chunk of data (more than one packet's worth) to the interface driver.
Let's consider a standard interface straight to a hardware NIC first:
Some interfaces have fake TSO (they segment the packet in the driver) and some have true TSO (the template header and data are passed to the hardware with minimal alterations). At this point either the driver or the NIC hardware will convert this large segment of data into multiple, standards-compliant, 802.3 Ethernet frames, and it is these compliant frames that an external device, such as a router, hub, switch, modem or other host, will see on the wire.
Now let's consider several NICs behind a software bridge:
Although the kernel is aware of each NIC at a low level, the network stack is only aware of the bridge, thus only capabilities that ALL of the underlying NICs have should be passed up to the bridge. If an sk_buff is passed to a bridge, then ALL the interfaces in the bridge will receive the same sk_buff. We'll assume that the kernel has once again passed our large TSO sk_buff to a bridge; if any of the underlying interfaces does not support TSO, then the packet will most likely be dropped by the hardware NIC in question.
In summary:
Worst case scenario: the bridge will repeatedly retry to send the same data chunk on the broken interface, and the whole bridge will lock up until the application decides to give up. Best case scenario: the non-TSO NIC will simply appear to be dead.
That said, if the NIC has unsafe code in its driver then this could cause a segmentation fault that could bring the whole system down.
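If you want to see which offloads each NIC behind a bridge actually advertises on a Linux host, a rough Python sketch like the following will do. The bridge name br0 and the presence of the ethtool utility are assumptions; adjust to your setup.

    # Sketch: list the ports of a Linux bridge and report each port's TSO setting.
    # Assumes a bridge named "br0" and that the ethtool utility is installed.
    import os
    import subprocess

    BRIDGE = "br0"  # hypothetical bridge name

    # Member ports of a bridge appear as symlinks under /sys/class/net/<bridge>/brif/
    ports = os.listdir(f"/sys/class/net/{BRIDGE}/brif")

    for port in ports:
        # "ethtool -k <iface>" prints the offload features, one per line,
        # e.g. "tcp-segmentation-offload: on"
        out = subprocess.run(["ethtool", "-k", port],
                             capture_output=True, text=True, check=True).stdout
        for line in out.splitlines():
            if line.startswith("tcp-segmentation-offload"):
                print(f"{port}: {line.strip()}")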

Related

How to know the network device type from a captured LLDP packet?

I have a captured LLDP packet.
LLDP has a list of enabled capabilities (Router, Bridge etc.) but none of the capabilities in the list is Switch. The question is how can I know if the source which the packet arrived from is a Switch Device?
If there is no concrete answer, an approximate assumption will do.
Disclaimer: I cannot actively address the switch, I'm sniffing packets...
LLDP has a list of enabled capabilities (Router, Bridge etc.) but none of the capabilities in the list is Switch.
A switch is a bridge. The original bridges only had a few interfaces (usually two), and bridging was done with software. When technology advanced to bridging in hardware, and the electronics became cheap enough to increase the interface density on a bridge, a vendor coined the marketing term, "switch."
Modern switches are transparent (all interfaces use the same protocol) bridges. There are also translating bridges, e.g. a WAP (Wireless Access Point), which translates between ethernet and Wi-Fi.
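If you are decoding the capture yourself, the System Capabilities TLV (type 7) carries a bitmap in which the "MAC Bridge" bit is what vendors market as a switch. Here is a minimal sketch; the variable lldp_payload is assumed to hold the bytes after the Ethernet header of a captured LLDP frame (EtherType 0x88CC), and the bit values follow IEEE 802.1AB.

    # Sketch: walk the TLVs of a raw LLDP payload and check whether the
    # "MAC Bridge" capability bit is enabled -- i.e. the sender is a switch.
    import struct

    SYSTEM_CAPABILITIES_TLV = 7
    CAP_BRIDGE = 0x04   # MAC Bridge bit in the capabilities bitmap
    CAP_ROUTER = 0x10

    def is_bridge(lldp_payload: bytes) -> bool:
        offset = 0
        while offset + 2 <= len(lldp_payload):
            header = struct.unpack_from("!H", lldp_payload, offset)[0]
            tlv_type = header >> 9          # upper 7 bits
            tlv_len = header & 0x01FF       # lower 9 bits
            if tlv_type == 0:               # End of LLDPDU
                break
            if tlv_type == SYSTEM_CAPABILITIES_TLV and tlv_len >= 4:
                supported, enabled = struct.unpack_from("!HH", lldp_payload, offset + 2)
                return bool(enabled & CAP_BRIDGE)
            offset += 2 + tlv_len
        return False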

Transferring data between two computers connected with a switch from a high level language

I'll start with stating that I know very little about networking and the whole OSI model.
My goal is to create a tiny network (for now my laptop and a Raspberry Pi) using an unmanaged network switch. On higher layer transmissions (layer 3+) I would simply set the destination IP address for a packet. From what I've read on Wikipedia, a network switch operates at the data link layer, which means it uses MAC addresses.
How does one send data to a device on a local area network when it's connected with something that only supports MAC addresses? More importantly, how does one do it from a high level language like Java or C#?
TL;DR The OSI model is about abstraction, and programming languages use operating system calls to implement this abstraction. The Raspberry Pi is running a full OS and will send and receive network data addressed to its assigned IP address. You do not need to specify a MAC address.
You want to communicate with a Raspberry Pi from your laptop. To do this you first connect them to the dumb switch and assign each device an IP address in the same subnet on the physical interfaces connected to that switch. Let's say that your laptop's physical Ethernet connection is assigned 10.0.0.1/24 and the Raspberry Pi's physical Ethernet connection is assigned 10.0.0.2/24 (if you do not understand my notation, look at CIDR). IP addresses are Layer 3 constructs. Now your application will use an operating system socket to create a TCP or UDP connection (see a UDP Java example here) with a Layer 4 address (application port). Everything higher than Layer 4 is handled by your application.
Layer 2 and lower is handled by the OS. When your application tries to send data through the socket, the operating system determines which physical interface to send data from by looking at the destination IP address. This lookup uses the OS routing table. Assuming you have a normal routing table, the OS will pick the interface that has an IP in the same subnet as the destination IP. So if you send data to 10.0.0.2, your OS will send data from 10.0.0.1 because it is in the same 10.0.0.0/24 subnet. Now that the OS has selected an interface, it still does not know which Layer 2 MAC address to send the Layer 3 IP packet to. The main reason the OS does not know this is that IP addresses can change, but Layer 2 MAC addresses should not. Anyhow, the OS sends out an ARP request, which resolves an IP address to a MAC address. If the devices are connected properly, the OS gets a MAC address for the desired IP address and begins to send data to that MAC address. The switch (smart or dumb) makes sure the message gets to the desired MAC address. At the receiving end, the OS receives the packet and sends the data in the packet to sockets bound to the Layer 4 address (application port).
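To make that concrete, here is a minimal UDP sketch (in Python rather than Java, purely for brevity) using the addresses from the example above; port 5005 is an arbitrary choice.

    # Sketch: UDP between the laptop (10.0.0.1) and the Raspberry Pi (10.0.0.2).
    # Port 5005 is an arbitrary choice. The OS handles ARP and MAC addressing.
    import socket

    PORT = 5005

    # --- on the Raspberry Pi (receiver) ---
    def receiver():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("10.0.0.2", PORT))       # listen on the Pi's own address
        data, addr = sock.recvfrom(1500)
        print(f"got {data!r} from {addr}")

    # --- on the laptop (sender) ---
    def sender():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(b"hello pi", ("10.0.0.2", PORT))   # only the IP is given;
                                                       # the MAC is resolved by ARP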
Side note: it is technically possible to send data to just a MAC address using RAW sockets but it is extremely technical.
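For a taste of what that looks like on Linux with an AF_PACKET raw socket (needs root; the interface name, destination MAC and EtherType below are made-up examples):

    # Sketch (Linux only, needs root): send a frame straight to a MAC address.
    import socket

    IFACE = "eth0"                              # hypothetical interface name
    DST_MAC = bytes.fromhex("b827eb000001")     # hypothetical destination MAC
    ETHERTYPE = b"\x88\xb5"                     # 0x88B5: EtherType reserved for local experiments
    ETH_P_ALL = 0x0003                          # from linux/if_ether.h

    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    sock.bind((IFACE, 0))
    src_mac = sock.getsockname()[4]             # our own MAC, as reported by the kernel

    frame = DST_MAC + src_mac + ETHERTYPE + b"hello at layer 2"
    sock.send(frame)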
Liam Kelly's answer provides great insight on abstraction of data sending. I will try to provide complementary information.
Network switch operation
While most switches operate at the data link layer, there are some that can perform operations at higher levels:
layer 3: Within the confines of the Ethernet physical layer, a layer-3 switch can perform some or all of the functions normally performed by a router.
layer 4: [...] capability for network address translation, but then adds some type of load distribution based on TCP sessions.
layer 7: [...] distribute the load based on uniform resource locators (URLs), or by using some installation-specific technique to recognize application-level transactions.
RAW sockets usage
As already specified, these require fairly advanced programming skills. They are also severely restricted in non-server versions of modern Windows Operating Systems (source) due to security concerns:
TCP data cannot be sent over raw sockets.
UDP datagrams with an invalid source address cannot be sent over raw sockets. The IP source address for any outgoing UDP datagram must exist on a network interface or the datagram is dropped. This change was made to limit the ability of malicious code to create distributed denial-of-service attacks and limits the ability to send spoofed packets (TCP/IP packets with a forged source IP address).
A call to the bind function with a raw socket for the IPPROTO_TCP protocol is not allowed.
Suggestion
If .NET is a viable option for you, I would take Pcap.Net for a spin, as it allows various operations at packet level using high level programming (including LINQ).

TCP/UDP over cell network

I'm a novice in this area looking for clarification. I believe that CDMA would be classified as part of the physical layer, so what is used for the data link layer (according to the OSI model) in cellular networks? Is TCP/UDP used in cellular networks? If so, in what capacity?
On a CDMA network (and some others, such as GPRS and HSPA), PPP is used at the Data Link Layer (layer 2).
TCP/UDP (or more generally, IP) is indeed used in CDMA networks, mostly for connecting to the CDMA provider's ISP network for Internet access by phones and "data sticks".
These data sticks usually provide an emulated modem on a serial port over USB, which is used in a very similar manner to dial-up modems of days gone by. You'd use the same "AT commands" to establish a connection, the only difference being the relatively high speed of the emulated serial port.
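For a feel of what that looks like from code, here is a minimal sketch using pyserial. The device path, baud rate and dial string are assumptions (#777 is the common CDMA dial string, *99# is typical for GPRS), so check your modem and provider documentation.

    # Sketch: talking to an emulated modem over its USB serial port with pyserial.
    import serial

    with serial.Serial("/dev/ttyUSB0", 115200, timeout=2) as modem:
        modem.write(b"AT\r")             # basic "are you there?" check
        print(modem.read(64))            # expect something containing b"OK"

        modem.write(b"ATD#777\r")        # dial the data connection
        print(modem.read(64))            # expect b"CONNECT ..." and then PPP takes over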

Clarification on Ethernet, MII, SGMII, RGMII and PHY

I primarily come from an Embedded Software background and hence I have very limited knowledge about hardware in general. I always used to think of Ethernet as that little physical connector on your computer into which you attach your Ethernet cable, and from a software perspective all you need to do is install the driver (in Windows) or configure the Linux kernel to include the driver for your Ethernet controller.
Questions:
But as I have started going down one level (towards the hardware) and looking at various datasheets and schematics, I have started to come across terms like PHY, MII, SGMII, RGMII, etc. And now I am a little confused as to what constitutes Ethernet. For example, when I say Intel 82574L 1.0 Gbps Ethernet port, where do all these terms fit in?
Some definitions:
MAC - media access controller. This is the part of the system which converts a packet from the OS into a stream of bytes to be put on the wire (or fibre). It often interfaces to the host processor over something like PCI Express.
PHY - physical layer - converts a stream of bytes from the MAC into signals on one or more wires or fibres.
MII - media independent interface. Just a standard set of pins between the MAC and the PHY, so that the MAC doesn't have to know or care what the physical medium is, and the PHY doesn't have to know or care how the host processor interface looks.
The MII was standardised a long time ago and supports 100 Mbit/sec speeds. A version using fewer pins is also available, RMII ('R' for reduced).
For gigabit speeds, the GMII ('G' for gigabit) interface is used, with a reduced pincount version called RGMII. A very reduced pincount version called SGMII is also available ('S' for serial) which requires special capabilities on the IO pins of the MAC, whereas the other xMIIs are relatively conventional logic signals.
There are also many more varieties of interfaces used in other circumstances, many of which are linked to from the Wikipedia MII page:
http://en.wikipedia.org/wiki/Media_Independent_Interface
Regarding your specific Intel chip question - as far as I can tell (the datasheet link seems dead), that chip is a MAC, with PCIe. So it will sit between the PCIe bus on the host and some kind of gigabit physical layer (PHY).
Let me try to explain:
The MII, SGMII and RGMII are three kinds of interface between the MAC block and the PHY chip. The Intel 82574L is a MAC chip. Look at the following figure:
 _______             _________                       _________
|  CPU  |   PCI-E   |         |   MII/SGMII/RGMII   |         |
|  or   |<=========>|   MAC   |<===================>|   PHY   |<====> physical interface
| board |  or else  |         |                     |         |
 -------             ---------                       ---------
For details about the MII (100 Mbit/s), SGMII (1 Gbit/s, serial) and RGMII (1 Gbit/s, reduced pin count) definitions, you can google them.
Basically speaking, a NIC (Network Interface Card) consists of a MAC block, a related PHY chip, and other peripheral modules, and an Ethernet device driver works with the NIC hardware. The MAC block has an interface to the controlling CPU or PC mainboard, such as a PCIe bus.
You might want to look up the term "7 Layers of OSI", which covers some frequently heard terms:
Ethernet PHY corresponds to the Physical Layer, which consists of the literal physical components of the communication.
Ethernet MAC (not the MAC address but the Media Access Controller) corresponds to the Data Link Layer, which is responsible for arranging the frames before sending them to the physical layer.
Configurations such as MII, RMII and auto-negotiation are configured through these two. And there are libraries to make your life easy.
The Network Layer is responsible for routing packets. Protocols such as IP and DHCP are considered to be in this layer. It is also the lowest layer that is solely software based. If you are using lightweight IP (lwIP), for example, the ip & netif libraries are the ones everything else builds upon.
The Transport Layer is where transmission protocols such as TCP & UDP can be found.
Hope it helps, I don't know much about the upper layers sadly.
The Intel 82574L chip contains both the MAC and the PHY.
Refer to the Architecture block diagram on page 15 in the datasheet available from here: https://ark.intel.com/content/www/us/en/ark/products/32209/intel-82574l-gigabit-ethernet-controller.html
The MAC and PHY are both there, but from my non-engineer view, I was confused about the MII connections because I was expecting two separate chips.
In very basic terms, when you connect an Ethernet cable to your laptop you are able to access the Internet. The Ethernet port is the interface in the above example. Likewise, there is an interface connecting your Ethernet Media Access Controller (MAC) to the Ethernet PHY. Let me break it down: the Ethernet MAC is the media access controller on the NIC (Network Interface Card), and the Ethernet PHY is the physical layer which acts as the interface between your Ethernet port and the Ethernet MAC. Now the Ethernet MAC takes a packet from the processor and converts it into bits, and the Ethernet PHY converts the bits into electrical signals. The interface between the MAC and PHY is where MII/RGMII (etc.) comes into the picture.
Being media independent means that different types of PHY devices for connecting to different transmission media (i.e. twisted pair, fiber optic, etc.) can be used without redesigning or replacing the MAC hardware. Thus any MAC may be used with any PHY, independent of the network signal transmission media.
SoCs/PCs may have a number of Ethernet ports. For each supported Ethernet device you will have either SGMII or RGMII interfaces for the data stream. And you will also have MDIO/MDC interfaces that are used by the drivers on the SoC to control BOTH your MAC and PHY chips if they are separate (the PHY can be an SFP, or small form-factor pluggable).
There are also combined MAC/PHY chips that work directly with the SoC/CPU to provide both data path and control, as well as the PHY (electrical or optical signaling). The MAC part validates packet CRCs, counts errored frames and runts, and provides VLAN capability and pause-frame support if an interface is running low on input queue buffers. And the PHY chip supports line-level signaling like PAM (a modulation scheme).
IE: (SoC or CPU or peripheral interface)
<separate PHY and MAC and shared MDIO/MDC access of PHY(SFP) and MAC>
SoC -----SGMII/RGMII---- MAC chip (can be a switch chip with VLAN support) --- PHY chip for magnetics etc (SFP).
SoC ----- MDIO/MDC ----------------------------------------------------------- PHY
(controlling PHY-related things like speed/FDX/HDX/Auto Negotiate)
SoC ----- MDIO/MDC ----- MAC chip
(controlling VLAN tagging, reading MAC-related data - overruns/runts and time stamps (for PTP) if the chip is 1588 capable)
The MDIO/MDC supports addressing different devices attached to the MDIO/MDC data bus.
OR
SoC ---- SGMII/RGMII ---- MAC chip w/Magnetics --- Cable (in this case magnetics and their control is on the MAC chip i.e. Speed, Duplex to control PAM signals etc)
SoC ---- MDIO/MDC ------- MAC chip
The MDIO/MDC control bus essentially gives the user access to the Clause 22 and Clause 45 registers used to control a combined MAC/PHY, or a separate MAC and PHY chip, on their interface to the actual cable.
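On Linux, a slice of this register access is exposed through the SIOCGMIIPHY/SIOCGMIIREG ioctls. A rough sketch follows; the interface name is an assumption, it usually needs root, and not every driver implements these ioctls.

    # Sketch: read Clause 22 register 1 (BMSR, Basic Mode Status Register) of the
    # PHY behind eth0 via the SIOCGMIIPHY / SIOCGMIIREG ioctls and report link status.
    import fcntl
    import socket
    import struct

    SIOCGMIIPHY = 0x8947   # get the default PHY address
    SIOCGMIIREG = 0x8948   # read a PHY register
    BMSR = 0x01            # Basic Mode Status Register
    BMSR_LINK_UP = 0x0004  # link status bit

    IFACE = b"eth0"        # hypothetical interface name

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # struct ifreq: 16-byte name followed by struct mii_ioctl_data
    # { u16 phy_id; u16 reg_num; u16 val_in; u16 val_out; }, padded to sizeof(ifreq)
    req = struct.pack("16sHHHH16x", IFACE, 0, 0, 0, 0)
    req = fcntl.ioctl(sock.fileno(), SIOCGMIIPHY, req)
    phy_id = struct.unpack("16sHHHH16x", req)[1]

    req = struct.pack("16sHHHH16x", IFACE, phy_id, BMSR, 0, 0)
    req = fcntl.ioctl(sock.fileno(), SIOCGMIIREG, req)
    bmsr = struct.unpack("16sHHHH16x", req)[4]

    print("link is", "up" if bmsr & BMSR_LINK_UP else "down")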
An SFP allows the user to plug in different interfaces (telco etc.); your PC will only have an RJ45 connector with magnetics built in. But telcos use SFPs to convert to optical or electrical interfaces between their different pieces of equipment. Optical is usually preferred due to noise immunity, distance of transmission, and electrical isolation.

Difference between IPoIB and TCP over Infiniband

Can someone explain the concepts of IPoIB and TCP over InfiniBand? I understand the overall concept and data rates provided by native InfiniBand, but don't quite understand how TCP and IPoIB fit in. Why do you need them and what do they do? What is the difference when someone says their network uses IPoIB or TCP with InfiniBand? Which one is better? I am not from a strong networking background, so it would be nice if you could elaborate.
Thank you for your help.
InfiniBand adapters ("HCAs") provide a couple of advanced features that can be used via the native "verbs" programming interface:
Data transfers can be initiated directly from userspace to the hardware, bypassing the kernel and avoiding the overhead of a system call.
The adapter can handle all of the network protocol of breaking a large message (even many megabytes) into packets, generating/handling ACKs, retransmitting lost packets, etc. without using any CPU on either the sender or receiver.
IPoIB (IP-over-InfiniBand) is a protocol that defines how to send IP packets over IB; and for example Linux has an "ib_ipoib" driver that implements this protocol. This driver creates a network interface for each InfiniBand port on the system, which makes an HCA act like an ordinary NIC.
IPoIB does not make full use of the HCA's capabilities; network traffic goes through the normal IP stack, which means a system call is required for every message and the host CPU must handle breaking data up into packets, etc. However, it does mean that applications that use normal IP sockets will work on top of the full-speed IB link (although the CPU will probably not be able to run the IP stack fast enough to use a 32 Gb/sec QDR IB link).
Since IPoIB provides a normal IP NIC interface, one can run TCP (or UDP) sockets on top of it. TCP throughput well over 10 Gb/sec is possible using recent systems, but this will burn a fair amount of CPU. To your question, there is not really a difference between IPoIB and TCP with InfiniBand -- they both refer to using the standard IP stack on top of IB hardware.
The real difference is between using IPoIB with a normal sockets application versus using native InfiniBand with an application that has been coded directly to the native IB verbs interface. The native application will almost certainly get much higher throughput and lower latency, while spending less CPU on networking.
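To make the "no real difference" point concrete: since IPoIB just presents another IP interface (ib0 on Linux, with whatever address you assign it), an ordinary TCP socket works over it unchanged. The address and port below are made up for the sketch; nothing in it is InfiniBand-specific, which is exactly the point.

    # Sketch: a plain TCP client talking to a server reachable over an IPoIB
    # interface (e.g. Linux "ib0"). The address and port are examples only.
    import socket

    with socket.create_connection(("10.10.0.2", 9000)) as sock:
        sock.sendall(b"hello over IPoIB")
        reply = sock.recv(4096)
        print(reply)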
