Lossy network versus Congested network - networking

Suppose, there is a network which gives a lot of Timeout errors when packets are transmitted over it. Now, timeouts can happen either because the network itself is inherently lossy (say, poor hardware) or it might be that the network is highly congested, due to which network devices are losing packets in between, leading to Timeouts. Now, what additional statistics about the traffic being transmitted (like Missing Packets errors etc.) are required that might help us to find out whether timeouts are happening due to poor hardware, or too much network load.
Please note that we have access only to one node in the network (from which we are transmitting packets) and as such, we cannot get to know the load being put by other nodes on the network. Similarly, we don't really have any information about the hardware being used in the network. Statistics is all that we have.

A network node only has hardware information about its local collision domain, which on a standard network will be the cable that links the host to the switch.
All the TCP stack will know about lost packets is that it is not receiving acknowledgements so it needs to resend, there is no mechanism for devices (E.g. switches & routers) between a source and destination to tell the source that there is a problem.
Without access to any other nodes the only way to ascertain if your problem is load based would be to run a test that sends consistent traffic over the network for a long period, if the packet retry count per second/minute/hour remains the same then it would suggest that there is a hardware issue, if the losses only occur during peak traffic periods then the issue could be load related. Of course there could be a situation where misconfigured hardware issues will only be apparent during high traffic periods, this takes things back to the main problem which is that you need access to network stats from beyond your single node.

In practice, nearly all loss on terrestrial network paths is due to either congestion or firewalls. Loss due to bit-errors is extremely rare. Even on wireless networks, forward error correction handles most bit/media/transmission errors. Congestion can be caused by a lot of different factors: any given network path will involve dozens of devices and if any one of them becomes overloaded for even a moment, packets will be dropped.
The only way to tell the difference between congestion induced packet loss and media errors is that media errors will occur independent of load. In other words, the loss rate will be the same whether you are sending a lot of data or only a little data.
To test that, you will need some control, or at least knowledge, of the load on the path. Since you don't have control and the only knowledge you have is from source-node observation, the best you can do is to take test samples (using ping is the easiest) around the clock and throughout the week, recording loss rates and latencies. These should give you an idea of when the path is relatively idle. If loss rates remain significant even when the path is (probably) idle, then there might be a media-loss issue. But again, that is extremely rare.
For background, I have written a few articles on the subject:
Loss, Latency, and Speed, discussing what statistics you can observe about a path and what they mean.
Common Network Performance Problems, discussing the most common components in a network path and how they affect performance (congestion).

Related

Is a checksum needed for application data of a custom protocol?

Since packets travel over the wire have checksums on different layers, Ethernet and IPv4 have checksums for their headers, TCP's checksum even covers the entire segment.
I know it is not impossible that a corrupted packet, from the standpoint of the application layer, can slip in without being discarded by Ethernet/IP/TCP, because there are chances that their checksums are correct, only the probability is low.
I am designing a custom binary protocol for an IM application. My question is do I need to add a checksum to ensure the integrity of my application data? Is a checksum really needed in practice?
There's actual research on this subject. It's old, but very relevant to the question at hand.
The paper, from 2000, is called "When the CRC and TCP checksum disagree" by Jonathan Stone and Craig Partridge, which investigate packet and frame errors, and look how often the TCP checksum is wrong, but the Ethernet CRC is fine. You can find the PDF here. Here are the important bits.
From the abstract:
Traces of Internet packets from the past two years show that between 1
packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even on
links where link-level CRCs should catch all but 1 in 4 billion
errors.
From the conclusion (with some of my highlighting)
In practice, the checksum is being asked to detect an error every
few thousand packets. After eliminating those errors that the checksum
always catches, the data suggests that, on average, between one packet
in 10 billion and one packet in a few millions will have an error that
goes undetected. The exact range depends on the type of data
transferred and the path being traversed. While these odds seem large,
they do not encourage complacency. In every trace, one or two 'bad
apple' hosts or paths are responsible for a huge proportion of the
errors. For applications which stumble across one of the `bad-apple'
hosts, the expected time until a corrupted data is accepted could be
as low as a few minutes. When compared to undetected error rates for
local I/O (e.g., disk drives), these rates are disturbing. Our
conclusion is that vital applications should strongly consider
augmenting the TCP checksum with an application sum.
I don't know of any newer research into that question (enlighten me if you know otherwise!), so the Internet could have become more reliable since then, and the numbers in the paper might be irrelevant.
However, and this is important, 17 years have passed, and the amount of Internet traffic simply exploded since that paper was written. At 1Gbps, which is not an uncommon connection speed nowadays, you're sending about 81K full TCP segments, with 1460 bytes of data, per second (or a lot more if the packets are smaller). That's a million big packets every 12.5 seconds, a billion in about 3.5 hours (or again, a lot more if the packets are small).
So to answer your question - that depends.
For transferring large files or other data, I'd definitely add additional checks if the data itself isn't protected in any way. For messaging, which pushes very little data into the network, you'll probably be fine with TCP's checksum, with maybe some sanity checks on the input you're getting to make sure that it's in the correct format, and various parameters and fields make sense.
I would not bother with a checksum because of packets getting corrupted in the network.
However, since you are working on a protocol that would presumably be used on the open internet, you will need to prepare for rare cases of an unintended application sending udp packets or making tcp connections to your receiving/listening ports. Also there will be maybe less port scans and hackers / script kiddies knocking on your gates.
So you should make your protocol such that it is easy to discard this kind of traffic. Using a checksum in every transmission would imho be one sensible way of doing that.

What is splay, in the context of networking or automated updates?

Artur Bergman complained in his Velocity NYC 2013 talk about getting loads of requests at the same time every single hour, with the comment "God I wish people would splay".
I tried searching for it, but due to the largest Swedish Youtube network being called Splay, the term is now completely ungooglable for me.
What does splay mean, in the context of automated updates, cron jobs or networking?
"Splay" is a term for a very simple concept: deliberate introduction of small random delays in the request timing of a large group of network clients (a so-called thundering herd). This is also sometimes called "jitter", but that term is overloaded in networking to also refer to accidental variation in the timing of received packets due to network congestion, improper queueing, configuration errors, etc.
These delays smooth the distribution of requests in a way that allows the server to handle them over a larger period of time and avoid network congestion.
A related concept is exponential backoff, though in this case it involves clients waiting a random (and exponentially increasing) amount of time after congestion before a retry.

Can i ignore UDP's lack of reliability features in a controlled environment?

I'm in a situation where, logically, UDP would be the perfect choice (i need to be able to broadcast to hundreds of clients). This is in a very small and controlled environment (the whole network is over a few square metters, all devices are local, the network is way oversized with gigabit ethernet and switches everywhere).
Can i simply "ignore" all of the added reliability that needs to be tossed on udp (checking messages arrived, resending them etc) as those mostly apply where the is expected packet loss (the internet) or is it really suggested to handle udp as "may not arrive" even in such conditions?
I'm not asking for theorycrafting, really wondering if anyone could tell me from experience if i'm actually likely to have udp packets missing in such an environment or is it's going to be a really rare event as obviously sending things and assuming that worked is much simpler than handling all possible errors.
This is a matter of stochastics. Even in small local networks, packet losses will occur. Maybe they have an absolute probability of 1e-10 in a normal usage scenario. Maybe more, maybe less.
So, now comes real-world experience: Network controllers and Operating systems do have a tough live, if used in high-throughput scenarios. Worse applies to switches. So, if you're near the capacity of your network infrastructure, or your computational power, losses become far more likely.
So, in the end it's just a question on how high up in the networking stack you want to deal with errors: If you don't want to risk your application failing in 1 in 1e6 cases, you will need to add some flow/data integrity control; which really isn't that hard. If you can live with the fact that the average program has to be restarted every once in a while, well, that's error correction on user level...
Generally, I'd encourage you to not take risks. CPU power is just too cheap, and bandwidth, too, in most cases. Try ZeroMQ, which has broadcast communication models, and will ensure data integrity (and resend stuff if necessary), is available for practically all relevant languages, and runs on all relevant OSes, and is (at least from my perspective) easier to use than raw UDP sockets.

Determine asymmetric latencies in a network

Imagine you have many clustered servers, across many hosts, in a heterogeneous network environment, such that the connections between servers may have wildly varying latencies and bandwidth. You want to build a map of the connections between servers by transferring data between them.
Of course, this map may become stale over time as the network topology changes - but lets ignore those complexities for now and assume the network is relatively static.
Given the latencies between nodes in this host graph, calculating the bandwidth is a relative simply timing exercise. I'm having more difficulty with the latencies - however. To get round-trip time, it is a simple matter of timing a return-trip ping from the local host to a remote host - both timing events (start, stop) occur on the local host.
What if I want one-way times under the assumption that the latency is not equal in both directions? Assuming that the clocks on the various hosts are not precisely synchronized (at least that their error is of the the same magnitude as the latencies involved) - how can I calculate the one-way latency?
In a related question - is this asymmetric latency (where a link is quicker in direction than the other) common in practice? For what reasons/hardware configurations? Certainly I'm aware of asymmetric bandwidth scenarios, especially on last-mile consumer links such as DSL and Cable, but I'm not so sure about latency.
Added: After considering the comment below, the second portion of the question is probably better off on serverfault.
To the best of my knowledge, asymmetric latencies -- especially "last mile" asymmetries -- cannot be automatically determined, because any network time synchronization protocol is equally affected by the same asymmetry, so you don't have a point of reference from which to evaluate the asymmetry.
If each endpoint had, for example, its own GPS clock, then you'd have a reference point to work from.
In Fast Measurement of LogP Parameters
for Message Passing Platforms, the authors note that latency measurement requires clock synchronization external to the system being measured. (Boldface emphasis mine, italics in original text.)
Asymmetric latency can only be measured by sending a message with a timestamp ts, and letting the receiver derive the latency from tr - ts, where tr is the receive time. This requires clock synchronization between sender and receiver. Without external clock synchronization (like using GPS receivers or specialized software like the network time protocol, NTP), clocks can only be synchronized up to a granularity of the roundtrip time between two hosts [10], which is useless for measuring network latency.
No network-based algorithm (such as NTP) will eliminate last-mile link issues, though, since every input to the algorithm will itself be uniformly subject to the performance characteristics of the last-mile link and is therefore not "external" in the sense given above. (I'm confident it's possible to construct a proof, but I don't have time to construct one right now.)
There is a project called One-Way Ping (OWAMP) specifically to solve this issue. Activity can be seen in the LKML for adding high resolution timestamps to incoming packets (SO_TIMESTAMP, SO_TIMESTAMPNS, etc) to assist in the calculation of this statistic.
http://www.internet2.edu/performance/owamp/
There's even a Java version:
http://www.av.it.pt/jowamp/
Note that packet timestamping really needs hardware support and many present generation NICs only offer millisecond resolution which may be out-of-sync with the host clock. There are MSDN articles in the DDK about synchronizing host & NIC clocks demonstrating potential problems. Timestamps in nanoseconds from the TSC is problematic due to core differences and may require Nehalem architecture to properly work at required resolutions.
http://msdn.microsoft.com/en-us/library/ff552492(v=VS.85).aspx
You can measure asymmetric latency on link by sending different sized packets to a port that returns a fixed size packet, like send some udp packets to a port that replies with an icmp error message. The icmp error message is always the same size, but you can adjust the size of the udp packet you're sending.
see http://www.cs.columbia.edu/techreports/cucs-009-99.pdf
In absence of a synchronized clock, the asymmetry cannot be measured as proven in the 2011 paper "Fundamental limits on synchronizing clocks over networks".
https://www.researchgate.net/publication/224183858_Fundamental_Limits_on_Synchronizing_Clocks_Over_Networks
The sping tool is a new development in this space, which uses clock synchronization against nearby NTP servers, or an even more accurate source in the form of a GNSS box, to estimate asymmetric latencies.
The approach is covered in more detail in this blog post.

Lots of ports with little data, or one port with lots of data?

I've been checking out using a system called ROS (http://www.ros.org) for some work.
There are lots of different types of data that get sent between network nodes in ROS.
You define a struct of data that you want to send in a message, and ROS will handle opening a specific port between the two nodes that will only send that struct of data.
So if there are 5 different messages, there will be 5 different ports.
As opposed to this scenario, I have seen other platforms that just push all the different messages across one port. This means that there needs to be a sort of multiplexing/demultiplexing (done by some sort of message parsing on the receivers end).
What I wonder is... which is better from a performance perspective?
Do operating systems switch based on ports quickly, so that a system like ROS doesn't have to do too much work to work out what is in the message and interpreting it?
OR
Is opening lots of ports going to mean lots of slower kernel calls, and the cost of having to work out and translate message types end up being more then the time spent switching between ports?
When this scales to a large amount of data at high rates and lots of different messages types there will be lots of ports. So I imagine that when scaling each of these topologies that performance will be a big factor in selecting the way to work.
I should also point out that these nodes usually exist on one small network, or most of the time on the one machine in which networking is used as a force of inter-process communication. So the transmission time is only a very small factor in the overall system timing.
ROS being an architecture for robots may have one node for every sensor and actuator, so depending on the complexity of your system we may be talking about 20-30 nodes pushing small-ish (100bytes or so) data between 10-100Hz
It depends. I do not know the specifics of ROS but in networking it comes down to the following constraints:
Distance: speed of light is fast but over a distance it starts making a difference
Protocol Overhead: connection oriented vs. connection-less
On the OS side, maintaining a list of free ports isn't such much of an overhead - of course there is a cost to it but everything is relative: if you are talking about a distributed system with long distance links, then it is easy to argue that cycling through OS network ports ranks as lower concern compared to managing communication quality.
Without a more specific question, I'll stop here.
I don't have any data on this, but it seems plausible that multiple ports might be handled more efficiently by multi-core systems, as opposed to demultiplexing within the program.

Resources