Detect faulty physical links with ping

I have a question about detecting physical problems on a link with ping.
If a fiber or cable has a problem and generates CRC errors on frames (visible in the switch or router interface statistics), it's possible that all pings still pass, because the default ICMP packet size is small and therefore statistically less likely to hit an error. First, can you confirm this?
My second question: if I ping with a large size such as 65000 bytes, one ping will generate approximately 65000 / 1500 (MTU) = 43 frames as IP fragments. Since losing a single IP fragment normally loses the entire IP packet, the chance of seeing packet loss with a large ping should be clearly higher. Is this assumption true?
The overall question is: with large pings, can we detect a physical problem on a link more easily?

A link problem is a layer 1 or layer 2 problem. Ping is a layer 3 tool; if you use it for diagnosis you might get completely unexpected results. Port counters are much more precise for diagnosing link problems.
That said, it's quite possible that packet loss for small ping packets is low while real traffic is impacted more severely.
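To illustrate the questioner's fragment argument with a rough sketch (the per-frame error rate below is an assumption, not a measurement): if each frame on the faulty link is corrupted independently with probability p, a ping that fragments into k frames only succeeds when every fragment survives, so its loss rate grows quickly with k.

# Back-of-the-envelope sketch; p is an assumed per-frame corruption rate.
p = 1e-4          # assumed probability that any one 1500-byte frame is corrupted
k_small = 1       # a default-size ping fits in a single frame
k_large = 44      # a 65000-byte ping fragments into roughly 65000 / 1480 frames

def loss_probability(k, p):
    """Probability that at least one of k fragments is lost,
    which discards the whole reassembled ping."""
    return 1 - (1 - p) ** k

print(f"small ping loss: {loss_probability(k_small, p):.4%}")   # ~0.01%
print(f"large ping loss: {loss_probability(k_large, p):.4%}")   # ~0.44%

So, all else being equal, large fragmented pings should trip over a marginal link more often, but the port counters remain the authoritative evidence.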
In addition to cable problems - which you'll need to repair - and statistically random packet loss, there are also some configuration problems that can lead to CRC errors.
Most common in 10/100 Mbit networks is a duplex mismatch, where one side uses half-duplex (HDX) transmission with CSMA/CD while the other uses full-duplex (FDX) - once real data is transmitted, the HDX side will detect collisions, late collisions and possibly jabber, while the FDX side will detect FCS errors. Throughput is very low, but ping with its low bandwidth usually works.
Duplex mismatches happen most often when one side is forced to full duplex, thus deactivating auto-negotiation, and the other side defaults to half duplex.

When to set “Don't fragment” flag in IP header?

There is a “Don't fragment” flag in the IP header.
Could applications set this flag?
When should this flag be set, and why?
If the 'DF' bit is set on packets, a router that would normally fragment a packet larger than the MTU (and potentially deliver it out of order) will instead drop the packet. The router is expected to send an "ICMP Fragmentation Needed" message, allowing the sending host to account for the lower MTU on the path to the destination host. The sending side will then reduce its estimate of the connection's Path MTU (Maximum Transmission Unit) and re-send in smaller segments. This process is called PMTU-D ("Path MTU Discovery").
Fragmentation causes extra CPU overhead to re-assemble packets at the other end (and to handle missing fragments).
Typically, the 'DF' bit is a configurable parameter of the IP stack; the ping utility, for example, has an option to set DF.
It is often useful to avoid fragmentation, since apart from the CPU utilization for fragmentation and re-assembly, it may affect throughput (if lost fragments need re-transmission). For this reason, it is often desirable to know the maximum transmission unit of the path. So 'Path MTU Discovery' is used to find this size, by simply setting the DF bit (say, on a ping).
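To make the ping-based approach concrete, here is a minimal sketch of that probing idea, assuming a Linux host with iputils ping (where -M do forbids fragmentation and -s sets the ICMP payload size); the destination name and the probe sizes are placeholders:

import subprocess

def probe_path_mtu(host, payload_sizes=(1472, 1464, 1372, 1272)):
    """Send DF-marked pings with decreasing payload sizes and report the
    largest that gets through; path MTU = payload + 20 (IP) + 8 (ICMP)."""
    for size in payload_sizes:
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "1", "-s", str(size), host],
            capture_output=True,
        )
        if result.returncode == 0:
            return size + 28
    return None

print(probe_path_mtu("example.net"))   # hypothetical destination

Most modern TCP stacks do this automatically (DF is set on TCP segments by default), which is part of why applications rarely touch the flag themselves.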
Further down in RFC 791 it says:
If the Don't Fragment flag (DF) bit is set, then internet
fragmentation of this datagram is NOT permitted, although it may be
discarded. This can be used to prohibit fragmentation in cases
where the receiving host does not have sufficient resources to
reassemble internet fragments.
So it appears what they had in mind originally was small embedded devices with the simplest possible implementation of IP, and little memory. Today, you might think of an IoT device like a smart light bulb or smoke alarm. They might not have the code or memory to reassemble fragments, and so the software communicating with them would set DF.
The only situations I can think of where you would possibly want to set this flag are:
If you are building something like a client-server application where you don't want the other side having to deal with a fragmented packet, but rather prefer a packet loss to that.
Or if you are on a network with a very specific set of restrictions, possibly caused by bandwidth issues or a specific firewall behaviour.
Except for such specific circumstances you would likely never touch it.
From RFC 791:
Fragmentation of an internet datagram is necessary when it
originates in a local net that allows a large packet size and must
traverse a local net that limits packets to a smaller size to reach
its destination.
An internet datagram can be marked "don't fragment." Any internet
datagram so marked is not to be internet fragmented under any
circumstances. If internet datagram marked don't fragment cannot be
delivered to its destination without fragmenting it, it is to be
discarded instead.
Could applications set this flag?
Yes, if you write code low-level enough that you are dealing with the IP header. This part of the question is a bit broad for a more specific answer; you should probably figure out whether you want to set it before worrying about the how.
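That said, you usually don't have to craft the IP header yourself; most stacks expose DF through a socket option. A minimal sketch, assuming a Linux host (the numeric constants are the Linux values from <linux/in.h>, spelled out in case your Python build doesn't export them; the destination address is a placeholder):

import socket

IP_MTU_DISCOVER = 10     # Linux socket option from <linux/in.h>
IP_PMTUDISC_DO = 2       # always set DF; never fragment locally

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)

# Datagrams now carry DF; one that exceeds the known path MTU fails with
# EMSGSIZE instead of being fragmented by the kernel.
sock.sendto(b"x" * 1400, ("192.0.2.1", 9999))   # placeholder destination

Other platforms spell it differently (for example IP_DONTFRAG on the BSDs), but the idea is the same.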

how can you count the number of packet losses in a file transfer?

One of my networks course projects has to do with the 802.11 protocol.
My partner and I thought about exploring the "hidden terminal" problem by simulating it.
We've set up a private network. We have 2 wireless terminals that will attempt to send a file to a 3rd terminal that is connected to the router via Ethernet. RTS/CTS will be disabled.
To compare results, we'd like to measure the number of packet collisions that occurred during the transfer, so as to conclude that it is due to RTS being disabled.
We've read that it is impossible to measure packet collisions as they are basically noise. We'll have to make do with counting the packets that didn't receive an "ACK" - basically, the number of retransmissions.
How can we do that?
I suggested that instead of sending a file, we could make the 2 wireless terminals ping the 3rd terminal continually. The ping feature automatically counts the ping packets that didn't receive the "pong". Do you think it's a viable approach?
Thank you very much.
No, you'll get incorrect results. Ping is an application, i.e. it works at the application (highest) layer of the network. The 802.11 protocol operates at the MAC layer - there are at least 2 layers separating ping from 802.11. Whatever retransmissions happen at the MAC layer are hidden by the layers above it. You'll only see a failure in ping if all the retransmissions initiated by the lower layers have failed.
You need to work at the same layer that you're investigating - in your case that's the MAC layer. You can use a sniffer (google for it) to get the statistics you want.
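For example, with the wireless card in monitor mode you can count retransmitted frames directly, because 802.11 marks every retransmission with the Retry bit in the frame control field. A rough sketch with Scapy (the interface name is a placeholder and must already be in monitor mode; Wireshark's display filter wlan.fc.retry == 1 gives the same count):

from scapy.all import sniff, Dot11

RETRY_BIT = 0x8          # Retry flag in the 802.11 frame-control field
stats = {"frames": 0, "retries": 0}

def count_retries(pkt):
    if pkt.haslayer(Dot11):
        stats["frames"] += 1
        if pkt[Dot11].FCfield & RETRY_BIT:
            stats["retries"] += 1

sniff(iface="mon0", prn=count_retries, timeout=60)   # "mon0" is a placeholder
print(f"{stats['retries']} retransmissions out of {stats['frames']} frames")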

Packet loss showing at point of entry onto network - what could cause?

A traffic source (server) with a 1-gigabit NIC is attached to a 1-gigabit port of a Cisco switch.
I mirror this traffic (SPAN) to a separate gigabit port on the same switch and then capture this traffic on a high throughput capture device (riverbed shark).
Wireshark analysis of the capture shows that there is a degree of packet loss - around 0.1% of TCP segments are being lost (based on sequence number analysis).
Given that this is the first point on the network for this traffic, what can cause this loss?
The throughput is nowhere near 1 gigabit, and there are no port errors (which might indicate a dodgy patch lead).
In Richard Stevens' TCP/IP Illustrated he makes mention of 'local congestion' - where the TCP stack is producing data at a rate faster than the underlying local queues can be emptied.
Could this be what I am seeing?
If so, is there a way to confirm it on an AIX box?
(Stevens' example used the Linux 'tc' command for a ppp0 device to demonstrate drops at the lower level.)
The loss can be anywhere along the network path.
If there is loss between two hosts, you should be seeing DUP ACKs. You need to see which side is sending the DUP ACKs - that would be the host that isn't receiving all the packets. (When a segment goes missing, the receiver sends a DUP ACK to ask for it again.)
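If you want to quantify that from the capture rather than eyeball it, Wireshark's display filter tcp.analysis.duplicate_ack flags them, or you can tally them yourself. A rough sketch with Scapy (the pcap file name is a placeholder):

from collections import Counter
from scapy.all import rdpcap, IP, TCP

acks = Counter()
for pkt in rdpcap("capture.pcap"):          # placeholder capture file
    if pkt.haslayer(IP) and pkt.haslayer(TCP):
        tcp = pkt[TCP]
        if tcp.flags & 0x10 and len(tcp.payload) == 0:   # pure ACK, no data
            acks[(pkt[IP].src, pkt[IP].dst, tcp.ack)] += 1

dup_acks = sum(count - 1 for count in acks.values() if count > 1)
print(f"duplicate ACKs seen: {dup_acks}")
# The side sending the DUP ACKs is the side that is missing segments.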
There may be congestion somewhere else along the path. Look for output drops on interfaces, or CRC errors.

Average UDP packet loss and packet re-ordering

I'd like to garner fellow SO'ers' experience with regard to the issue of UDP packet loss (or drop-out).
Initially my understanding is that given direct point to point connections where the NICs are connected via a crossover cable and ample buffer on the NICs and timely processing of said buffers, that there 'should' be no packet loss or packet ordering issues. I believe this is also the case given one good/high-end switch in between the points.
Excluding the above scenario, what is the expected average UDP packet loss over a LAN?
What scenarios cause UDP packet ordering issues?
No idea on the UDP packet loss on average LANs. I assume it's reasonably low on modern switched networks; otherwise your LAN or endpoints are too highly loaded. :)
The re-ordering is probably easiest to achieve when routes are brought up and down; say, one of the switches in your organization is under enough load that re-organizing the tree makes sense and traffic is sent through different switches. More likely is your ISP's peers coming and going, or reaching traffic limits, and the priority of packets through them changes -- old packets were in flight on the heavy-loaded network, new packets are in flight on the lighter-loaded network, and they arrive out of order.
I too am looking for an expected average. I found that from a direct link (PC to PC) packet loss occurs very rarely, although it definitely occurs. Availability was something like 99.9% at 1 kB packets @ 50 Hz.
I have seen reordering just by sending and receiving on the same network interface.
I concluded that this occurs because each packet is handled asynchronously, so there is a chance that a newly arrived packet is processed before packets that were received earlier.
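For anyone who wants numbers for their own LAN rather than anecdotes, a small sender/receiver pair with sequence numbers is enough to count both loss and reordering. A minimal sketch (the address, packet count and send rate are placeholders to adjust):

import socket
import struct
import time

ADDR = ("192.168.1.50", 5005)        # placeholder receiver address
COUNT = 50_000

def sender(interval=0.0002):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(COUNT):
        s.sendto(struct.pack("!I", seq), ADDR)   # 4-byte sequence number
        time.sleep(interval)                     # ~5,000 packets/s; 0 = flood

def receiver():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", ADDR[1]))
    s.settimeout(5.0)
    received, reordered, highest = 0, 0, -1
    try:
        while received < COUNT:
            data, _ = s.recvfrom(1500)
            seq = struct.unpack("!I", data)[0]
            received += 1
            if seq < highest:
                reordered += 1                   # arrived after a later packet
            highest = max(highest, seq)
    except socket.timeout:
        pass
    print(f"lost {COUNT - received}, reordered {reordered} of {COUNT}")

# Run receiver() on one host first, then sender() on the other.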
On my basic gigabit switched LAN I get zero packet loss at even 50,000 packets per second, with FreeBSD, Solaris or Linux.
However, Windows is something quite special: I easily see packet loss on exactly the same hardware at rates as low as 10,000 per second. This is mainly due to buffer overflow between WinSock and the NIC; if you drive the packets faster you lose more, and if you space out the packets you drop fewer.
There is no magical number; my situation is probably worse due to Broadcom having terrible Windows drivers.
You can easily see packet ordering issues; however, it is almost always only the last two packets that are swapped. This is an artifact of how switches function.
Interestingly, what you haven't mentioned is Wi-Fi: radio signals are highly subject to interference and environmental conditions.

Can the data in a UDP packet be assumed to be correct at the application level?

I recall reading somewhere that if a UDP packet actually makes it to the application layer, the data can be assumed to be intact. Disregarding the possibility of someone in the middle sending fake packets, will the data I receive at the application layer always be what was sent out?
UDP uses a 16-bit optional checksum. Packets which fail the checksum test are dropped.
Assuming a perfect checksum, then 1 out of 65536 corrupt packets will not be noticed. Lower layers may have checksums (or even stronger methods, like 802.11's forward error correction) as well. Assuming the lower layers pass a corrupt packet to IP every n packets (on average), and all the checksums are perfectly uncorrelated, then every 65536*n packets your application will see corruption.
Example: Assume the underlying layer also uses a 16-bit checksum, so one out of every 2^16 * 2^16 = 2^32 corrupt packets will pass through corrupted. If 1/100 packets are corrupted, then the app will see 1 corruption per 2^32*100 packets on average.
If we call that 1/(65536*n) number p, then you can calculate the chance of seeing no corruption at all as (1-p)^i, where i is the number of packets sent. In the example above (where p works out to 1/(2^32 * 100)), to get up to a 0.5% chance of seeing corruption, you need to send nearly 2.2 billion packets.
(Note: In the real world, the chance of corruption depends on both packet count and size. Also, none of these checksums are cryptographically secure, it is trivial for an attacker to corrupt a packet. The above is only for random corruptions.)
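Plugging the example's numbers in, under the same independence assumptions, reproduces that figure:

import math

n_wire = 100                       # 1 in 100 packets corrupted on the wire
p = 1 / (2**16 * 2**16 * n_wire)   # both 16-bit checksums have to miss it
target = 0.005                     # a 0.5% chance of seeing any corruption

packets = math.log(1 - target) / math.log(1 - p)
print(f"p = {p:.3e}, packets needed: {packets:,.0f}")   # roughly 2.15 billion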
UDP uses a 16-bit checksum so you have a reasonable amount of assurance that the data has not been corrupted by the link layer. However, this is not an absolute guarantee. It is always good to validate any incoming data at the application layer, when possible.
Please note that the checksum is technically optional in IPv4. This should further drop your "absolute confidence" level for packets sent over the internet.
See the UDP white paper
You are guaranteed only that the checksum is consistent with the header and data in the UDP packet. The odds of a checksum matching corrupted data or header are 1 in 2^16. Those are good odds for some applications, bad for others. If someone along the chain is dropping checksums, you're hosed, and have no way of even guessing whether any part of the packet is "correct". For that, you need TCP.
Theoretically a packet might arrive corrupted: the packet has a checksum, but a checksum isn't a very strong check. I'd guess that kind of corruption is unlikely though, because if it's being sent via a noisy modem or something, the media layer is likely to have its own, stronger corruption detection.
Instead I'd guess that the most likely forms of corruption are lost packets (not arriving at all), packets being duplicated (two copies of the same packet arriving), and packets arriving out of sequence (a later one arriving before an earlier one).
Not really. And it depends on what you mean by "Correct".
UDP packets have a checksum that is checked by the protocol stack (below the application layer), so if you get a UDP packet at the application layer, you can assume the checksum passed.
However, there is always the chance that the packet was damaged and the checksum was similarly damaged so that it still appears correct. This would be extremely rare - with today's modern hardware it would be really hard for this to happen. Also, if an attacker had access to the packet, they could just update the checksum to match whatever data they changed.
See RFC 768 for more on UDP (quite small for a tech spec :).
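For reference, the check being relied on here is just the 16-bit ones'-complement sum from RFC 768, taken over a pseudo-header plus the UDP header and data. A sketch of that computation for IPv4 (not a substitute for the kernel's version):

import struct
from socket import inet_aton

def udp_checksum(src_ip, dst_ip, udp_segment):
    """RFC 768 checksum: ones'-complement sum over the IPv4 pseudo-header
    (source, destination, protocol 17, UDP length) plus the UDP segment."""
    pseudo = inet_aton(src_ip) + inet_aton(dst_ip) \
             + struct.pack("!BBH", 0, 17, len(udp_segment))
    data = pseudo + udp_segment
    if len(data) % 2:
        data += b"\x00"                          # pad to whole 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16) # fold carries back in
    return (~total & 0xFFFF) or 0xFFFF           # zero is sent as all ones

The receiver recomputes the same sum with the received checksum field in place and drops the datagram if it doesn't check out (a transmitted checksum of zero means "not computed" for UDP over IPv4).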
It's worth noting that the same 16-bit checksum applies to TCP as well as UDP on a per-packet basis. When characterizing the properties of UDP, consider that the majority of data transfers that take place on the Internet today use TCP. When you download a file from a web site, the same checksum is used for the transmission.
The secret is that the physical and link-level layers (L1/L2) of most access technologies are significantly more robust than TCP's checksum, and the combined chance of an error getting past L1 and L2 is very low.
For example, modems had error-correcting hardware and the PPP layer also had its own checksum.
DSL is the same way, with error correction at the ATM layer (Reed-Solomon codes) and a CRC at the PPPoA layer.
DOCSIS cable modems use similar technology to DSL for error detection and correction.
The end result is that errors in modern systems are extremely unlikely to ever get past L1.
I have seen clock issues with old frame relay circuits 14 years ago routinely cause corruption at the TCP layer. I have also heard stories of bit-flip patterns on malfunctioning hardware that cancel out in the checksum and corrupt TCP data undetected.
So yes, corruption is possible, and yes, you should implement your own error detection if the data is very important. In practice, on the Internet and on private networks, it's a rare occurrence today.
All hardware - disk drives, buses, processors, even ECC memory - has its own error probabilities; for most applications they're low enough that we take them for granted.
