How to compute HTTP request processing time without network latency?

Because of the geographic distance between server and client, network latency can vary a lot. So I want to get the "pure" request processing time of the service, without network latency.
I want to estimate network latency as the TCP connect time. As far as I understand, this time depends mostly on the network.
The main idea is to measure:
TCP connect time,
time until the first response packet is received,
then take "pure" service time = first packet receive time (waiting time) – TCP connect time.
I divide the TCP connect time by 2 because the 3-way handshake actually contains two request-response exchanges.
I have two questions:
Should I measure the time to receive all the packets instead of only the first packet?
Is this method okay in general?
PS: As a tool I use Erlang's gen_tcp. I can show the code.
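For illustration, here is the same measurement idea sketched in Java rather than Erlang's gen_tcp; the host, port, and raw HTTP request are placeholders, and this is only a sketch of the timing approach described above, not a finished tool:

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    public class ServiceTimeProbe {
        public static void main(String[] args) throws Exception {
            String host = "example.com";   // placeholder target
            int port = 80;

            Socket socket = new Socket();
            long t0 = System.nanoTime();
            socket.connect(new InetSocketAddress(host, port), 5000); // 3-way handshake
            long tConnected = System.nanoTime();

            OutputStream out = socket.getOutputStream();
            out.write(("GET / HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n")
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();
            long tSent = System.nanoTime();

            InputStream in = socket.getInputStream();
            in.read();                     // blocks until the first response byte arrives
            long tFirstByte = System.nanoTime();
            socket.close();

            double connectMs = (tConnected - t0) / 1e6;     // rough network-latency estimate
            double waitMs    = (tFirstByte - tSent) / 1e6;  // time to first byte after sending
            double serviceMs = waitMs - connectMs;          // the question's "pure" service time

            System.out.printf("connect=%.1f ms, wait=%.1f ms, service~%.1f ms%n",
                    connectMs, waitMs, serviceMs);
        }
    }

Whether to subtract the full connect time or half of it depends on what you count as one round trip, which is exactly what the answers below discuss.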

If anything, I guess the "pure" service time = TCP first packet receive time - TCP connect time. You have written it the other way round.
A possible answer to your first question: you should ideally compute at least some sort of average, using the pure service time of many packets rather than just the first one.
Ideally you could also track worst-case, average-case, and best-case service times.
To answer the second question, we would need to know why you want the pure service time only. Since it is a network application, network latencies (connection time etc.) should also be included in the "response time", not just the pure service time. That is my view based on the given information.

I have worked on a similar question when working for a network performance monitoring vendor in the past.
IMHO, there are a certain number of questions to be asked before proceeding:
Connection time and latency: if you base your network latency metric on the TCP connect time, be aware that it takes three steps into account: the client sends a TCP SYN, the server responds with a SYN-ACK, and the client sends a final ACK to complete the TCP connection. This means the connect time is equivalent to 1.5 RTT (round-trip time). This validates taking the first two steps of the TCP setup process into account, as you mention.
Taking later TCP exchanges into account: while this at first sounds like a great idea for continuing to evaluate network latency over the course of the session, it turns out to be a lot trickier. Here is why:
1. Not all packets have to be acknowledged (RFC 1122, or https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment), which will generate false measurements when it happens, so you will need a heuristic to exclude these from your calculations.
2. Not all systems treat acknowledging packets as a high-priority task. This means that some high values will pollute your network latency data and may simply reflect, for example, how loaded the server is.
So if you use only the first (and reliable) measurement you may miss some network delay variation (especially in apps using long-lasting TCP sessions).
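As one illustration of the kind of heuristic mentioned above (only a sketch, not a standard algorithm): discard ACK-based samples that sit far above the best sample seen so far, treating them as delayed-ACK or server-load artifacts. The 3x threshold below is an arbitrary illustrative choice.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class RttFilter {
        // Keeps only samples within a chosen factor of the smallest observed RTT.
        static List<Double> filterAckSamples(List<Double> rttMillis) {
            double best = Collections.min(rttMillis);
            List<Double> kept = new ArrayList<>();
            for (double sample : rttMillis) {
                if (sample <= best * 3.0) {
                    kept.add(sample);        // plausibly a "real" network round trip
                }
                // otherwise: likely a delayed ACK or a busy host, so skip it
            }
            return kept;
        }

        public static void main(String[] args) {
            List<Double> samples = List.of(12.0, 13.5, 11.8, 210.0, 12.4, 95.0);
            System.out.println(filterAckSamples(samples)); // [12.0, 13.5, 11.8, 12.4]
        }
    }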

Related

Extremely high latency when network gets busy, TCP, libevent

In our client/server based online game project, we use TCP for network transmission. We use Libevent, with a bufferevent for each connection to handle the network I/O automatically.
It worked well before, but a lag problem has surfaced recently. When I do some stress testing to make the network busier, the latency becomes extremely high, several seconds or more. The server sinks into a confusing state:
the average CPU usage decreased (repeating 0%-60%-0%-60%, waiting for something?)
the net traffic decreased (nethogs)
the clients connected to server still alive (netstat & tcpdump)
It looks like something magically slowed the whole system down, but new connections to the server still responded quite promptly.
When I changed the protocol to UDP, it works well in the same situation: no obvious latency, the system runs fast. Net traffic is around 3M/s.
The project is running on an Intranet. I also tested the max download speed, nearly 18M/S.
I studied part of Libevent's header files and documentation and tried to set up a rate limit on all connections. It brought some improvement, but did not completely resolve the problem even though I tried several different configurations. Here are my parameters: read_rate 163840, read_burst 163840, write_rate 163840, write_burst 163840, tick_len 500ms.
Thank you for your help!
TCP = Transmission Control Protocol. It responds to packet loss by retransmitting unacknowledged packets after a delay. In the case of repeated loss, it will back off exponentially. Take a look at this network capture of an attempt to open a connection to a host that is not responding:
It sends the initial SYN, and then after not getting an ack for 1s it tries again. After not getting an ack it then sends another after ~2s, then ~4s, then ~8s, and so on. So you can see that you can get some serious latency in the face of repeated packet loss.
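To put rough numbers on that schedule (the initial timeout and retry count are OS-dependent, so treat these values as illustrative only):

    public class SynBackoff {
        public static void main(String[] args) {
            double timeoutSeconds = 1.0;   // assumed initial retransmission timeout
            int retries = 5;               // assumed number of SYN retries
            double elapsed = 0.0;

            for (int attempt = 0; attempt <= retries; attempt++) {
                System.out.printf("attempt %d sent at t=%.0f s%n", attempt + 1, elapsed);
                elapsed += timeoutSeconds;
                timeoutSeconds *= 2;       // exponential backoff: 1s, 2s, 4s, 8s, ...
            }
            // With these numbers the 6th attempt does not go out until t = 31 s,
            // which is the kind of multi-second latency described above.
        }
    }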
Since you said you were deliberately stressing the network, and that the CPU usage is inconsistent, one possible explanation is that TCP is waiting to retransmit lost packets.
The best way to see what is going on is to get a network capture of what is actually transmitted. If your hosts are connected to a single switch, you can "span" a port of interest to the port of another host where you can make the capture.
If your switch isn't capable of this, or if you don't have administrative control of the switch, then you will have to get the capture from one of the hosts involved in your online game. The disadvantage of this is that taking the capture may alter what happens, and it doesn't see what is actually on the wire. For example, you might have TCP segmentation offload enabled for your interface, in which case the capture will see large packets that are only broken up later by the network interface.
I would suggest installing wireshark to analyse the network capture (which you can do in real time by using wireshark to do the capture as well). Any time you are working with a networked system I would recommend using wireshark so that you have some visibility into what is actually happening on the network. The first filter I would suggest you use is tcp.analysis.flags which will show you packets suggestive of problems.
I would also suggest turning off the rate limiting first to try to see what is going on (rate limiting adds another reason not to send packets, which is probably going to make the problem harder to diagnose). Also, 500ms might be a longish tick_len depending on how your game operates. If your burst configuration allows the rate to be used up in 100ms, you will end up waiting 400ms before you can transmit again. The IO Graph is a very helpful feature of Wireshark in this regard: it can help you see transmission rates, although the default tick interval and unit are not well suited to this. Here is an example of a bursty flow being rate limited to 200 Mbit/s:
Note that the tick interval is 1ms and the unit is bits/tick, which makes the top of the chart 1 Gbit/s, the speed of the interface in question.
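To make the tick_len arithmetic above concrete, here is a small sketch with made-up numbers (libevent's exact token-bucket semantics aside, the timing argument is the same: whatever budget you get per tick, once it is spent you wait for the next tick):

    public class TickBudget {
        public static void main(String[] args) {
            long tickMs = 500;                   // tick_len from the question
            long budgetBytesPerTick = 163_840;   // assume the whole rate/burst budget per tick
            long linkBytesPerMs = 1_638;         // assumed achievable throughput (~1.6 MB/s)

            long msToSpendBudget = budgetBytesPerTick / linkBytesPerMs; // ~100 ms
            long idleMsPerTick = tickMs - msToSpendBudget;              // ~400 ms

            System.out.println("budget spent after ~" + msToSpendBudget + " ms, "
                    + "idle for ~" + idleMsPerTick + " ms until the next tick");
            // A game that needs to send every few tens of milliseconds will see
            // this idle time as added latency; a shorter tick_len smooths it out.
        }
    }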

Is there latency of an application?

Latency (delay) is defined here as the time that a packet spends travelling between sender and receiver.
The above definition applies to IP packets, as far as I can understand. Can we say that latency includes the retransmission time for missing frames at the data link layer? Or does this definition assume there are no missing frames?
Is it possible to define latency at the application level? Say we have an application A. A uses TCP to send messages to a remote application. Since TCP is used, missing segments will be retransmitted, so the latency of a message from A includes the missing segments' retransmission time.
Can we say that latency includes the retransmission time for missing frames at the data link layer? Or does this definition assume there are no missing frames?
If you're measuring application latency, you can define latency to include the time it takes for missing TCP segments to be retransmitted.
Is it possible to define latency at the application level? Say we have an application A. A uses TCP to send messages to a remote application. Since TCP is used, missing segments will be retransmitted, so the latency of a message from A includes the missing segments' retransmission time.
This measurement is very feasible; obviously you will need to implement measurements of this latency within your application... also be aware that Nagle could skew your latency measurements upwards if your messages typically are larger than TCP MSS (1460 bytes on standard ethernet segments). If your messages tend to be larger than TCP MSS, disable Nagle to get the lowest average message latency.
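A minimal sketch of such an application-level measurement in Java, assuming an echo-style peer at a placeholder address; setTcpNoDelay(true) is the standard way to disable Nagle on a Java socket:

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.net.Socket;

    public class MessageLatencyProbe {
        public static void main(String[] args) throws Exception {
            // Assumed echo-style peer; replace with your own application protocol.
            try (Socket socket = new Socket("app.example.com", 9000)) {
                socket.setTcpNoDelay(true);                // disable Nagle's algorithm
                DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                DataInputStream in = new DataInputStream(socket.getInputStream());

                byte[] message = new byte[2000];           // deliberately larger than one MSS
                byte[] reply = new byte[message.length];

                long t0 = System.nanoTime();
                out.write(message);
                out.flush();
                in.readFully(reply);                       // wait for the complete response
                long t1 = System.nanoTime();

                // This round trip includes any TCP retransmissions that happened underneath,
                // which is exactly the application-level latency discussed above.
                System.out.printf("message round trip: %.2f ms%n", (t1 - t0) / 1e6);
            }
        }
    }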

What are the chances of losing a UDP packet?

Okay, so I am programming for my networking course and I have to implement a project in Java using UDP. We are implementing an HTTP server and client along with a 'gremlin' function that corrupts packets with a specified probability. The HTTP server has to break a large file up into multiple segments at the application layer to be sent to the client over UDP. The client must reassemble the received segments at the application layer. What I am wondering however is, if UDP is by definition unreliable, why am I having to simulate unreliability here?
My first thought is that perhaps it's simply because my instructor figures that, in our case, both the client and the server will be run on the same machine, so the file will be transferred from one process to another 100% reliably even over UDP, since it is between two processes on the same computer.
This first led me to question whether UDP could ever actually lose a packet, corrupt a packet, or deliver a packet out of order if the server and client were guaranteed to be two processes on the same physical machine, routed strictly over localhost only, so that traffic never goes out over the network.
I would also like to know, in general, for a given packet, what the rough probability is that UDP will drop, corrupt, or deliver a packet out of order while being used to facilitate communication over the open internet between two hosts that are fairly geographically distant from one another (say something comparable to the route between the average broadband user in the US and one of Google's CDNs). I'm mostly just trying to get a general idea of the conditions experienced when communicating over UDP: does it drop / corrupt / misorder something on the order of 25% of packets, or is it more like something on the order of 0.001% of packets?
Much appreciation to anyone who can shed some light on any of these questions for me.
Packet loss happens for multiple reasons. Primarily it is caused by errors on individual links and network congestion.
Packet loss due to errors on the link is very low, when links are working properly. Less than 0.01% is not unusual.
Packet loss due to congestion obviously depends on how busy the link is. If there is spare capacity along the entire path, this number will be 0%. But as the network gets busy, this number will increase. When flow control is done properly, this number will not get very high. A couple of lost packets is usually enough that somebody will reduce their transmission speed enough to stop packets getting lost due to congestion.
If packet loss ever reaches 1% something is wrong. That something could be a bug in how your congestion control algorithm responds to packet loss. If it keeps sending packets at the same rate, when the network is congested and losing packets, the packet loss can be pushed much higher, 99% packet loss is possible if software is misbehaving. But this depends on the types of links involved. Gigabit Ethernet uses backpressure to control the flow, so if the path from source to destination is a single Gigabit Ethernet segment, the sending application may simply be slowed down and never see actual packet loss.
For testing the behaviour of software in the face of packet loss, I would suggest two different simulations (a sketch of the first one is given after this list):
For each packet, drop it with a probability of 10% and transmit it with a probability of 90%.
Transmit up to 100 packets per second or up to 100 KB per second, and drop the rest if the application would send more.
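Here is a rough Java sketch of the first simulation as a "gremlin"-style send helper; the method name and interface are made up, and the course assignment's actual API will differ:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.util.Random;

    public class Gremlin {
        private static final Random RANDOM = new Random();

        // Sends the packet with probability (1 - dropProbability), silently drops it otherwise.
        static void gremlinSend(DatagramSocket socket, DatagramPacket packet,
                                double dropProbability) throws Exception {
            if (RANDOM.nextDouble() < dropProbability) {
                return;                         // simulate a lost datagram
            }
            socket.send(packet);                // otherwise deliver normally
        }

        public static void main(String[] args) throws Exception {
            try (DatagramSocket socket = new DatagramSocket()) {
                byte[] payload = "segment 1".getBytes();
                DatagramPacket packet = new DatagramPacket(
                        payload, payload.length, InetAddress.getLoopbackAddress(), 9876);
                for (int i = 0; i < 10; i++) {
                    gremlinSend(socket, packet, 0.10);   // the 10% drop case from above
                }
            }
        }
    }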
if UDP is by definition unreliable, why am I having to simulate unreliability here?
It is very useful to have a controlled mechanism to simulate worst case scenarios and how both your client and server can respond to them. The instructor will likely want you to demonstrate how robust the system can be.
You are also talking about payload validity here and not just packet loss.
This led me to question whether or not UDP could lose a packet, corrupt a packet, or deliver it out of order if the server and client were two processes on the same machine and traffic never had to go out over the actual network.
It is obviously less likely over the loopback adapter, but this is not impossible.
I found a few forum posts on the topic here and here.
I am also wondering what the chances of actually losing a packet, having it corrupted, or having them delivered out of order in reality would usually be over the internet between two geographically distant hosts.
This question would probably need to be narrowed down a bit. There are several factors both application level (packet size and frequency) as well as limitations/traffic of routers and switches along the path.
I couldn't find any hard numbers on this but it seems to be fairly low... like sub 5%.
You may be interested in The Internet Traffic Report and possibly pages such as this.
I was spamming UDP packets over Wi-Fi to some Nanoleaf panels and my packet loss was roughly 1/7000.
I think it depends on a ton of factors.

Assessing/diagnosing the time connections spend in SYN_RECV before being established

I'm trying to improve the performance of a (virtual) web server with a fairly standard CentOS/Apache setup and one thing I noticed is that new connections seem to "stick" in the SYN_RECV state, sometimes for several seconds, before finally being established and handled by Apache.
My first guess was that Apache could be reaching the limit for the number of connections it's prepared to handle simultaneously, but e.g. with keep-alive off, netstat reports just a few established connections (counting only those not involving localhost, so discarding "housekeeping" connections e.g. between Apache and Tomcat), whereas with keep-alive on it will happily get up to 100+ established connections (but with no clear difference in the SYN_RECV behaviour either way -- there are typically 10-20 connections sitting in SYN_RECV at any one time).
What are people's recommendations for investigating where the bottleneck is that's preventing the connections from being established quickly?
P.S. Follow-on question: does anybody know what a TYPICAL statistic would be for the time for a connection to be established once first "hitting" the server?
Update in case anyone else encounters this: in the end, I wrote a small Java program to take data from /proc/net/tcp and analyse it, and it appears that this happens for a small proportion of connections (although that still means that at any one time there can be a number of connections in this state, because they can stay that way for several seconds), and it looks like an issue local to those connections. Over 90% of connections still go through in < 500ms and 81% in < 200ms. So if others see this, there isn't necessarily any need to panic immediately.
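For anyone who wants to reproduce that kind of check, here is a rough sketch of the same idea, scanning /proc/net/tcp for sockets in the SYN_RECV state (state code 03 in that file); it is not the asker's actual program:

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    public class SynRecvCounter {
        public static void main(String[] args) throws Exception {
            // /proc/net/tcp columns: sl local_address rem_address st ...
            // The "st" field is the TCP state in hex; 03 means SYN_RECV.
            List<String> lines = Files.readAllLines(Paths.get("/proc/net/tcp"));
            long synRecv = lines.stream()
                    .skip(1)                                   // skip the header line
                    .map(line -> line.trim().split("\\s+"))
                    .filter(cols -> cols.length > 3 && cols[3].equals("03"))
                    .count();
            System.out.println("connections in SYN_RECV: " + synRecv);
        }
    }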
Try capturing a packet trace and see if SYN ACKs are being retransmitted (and the number of re-tx). This could indicate a routing issue (SYN comes in via path A and SYN-ACK goes via path B which is broken).
Also see if these connections have a specific pattern (such as originating from the same network).

What factors other than latency and bandwidth affect network speed?

I have noticed that viewing images or websites that are hosted on US servers (I'm in Europe) is considerably slower. The main reason would be the latency caused by the distance.
But if 1 packet takes n milliseconds to be received, can't this be alleviated by sending more packets simultaneously?
Does this actually happen or are the packets sent one by one? And if so, what determines how many packets can be sent simultaneously (something to do with the cable, I guess)?
But if 1 packet takes n milliseconds to be received, can't this be alleviated by sending more packets simultaneously?
Not in a boundless way, by TCP/IP standards, because there are algorithms that determine how much can be in flight and not yet acknowledged, to avoid overloading the whole network.
Does this actually happen or are the packets sent one by one?
TCP can and does keep up to a certain amount of packets and data "in flight".
And if so, what determines how many packets can be sent simultaneously (something to do with the cable, I guess)?
What cable? The same standards apply whether you're on cabled, wireless, or mixed sequences of connections (remember your packet goes through many routers on its way to the destination, and the sequence of routers can change from packet to packet).
You can start your study of TCP at, e.g., Wikipedia. Your specific questions deal with congestion control algorithms and standards; Wikipedia will give you pointers to all the relevant algorithms and RFCs, but the whole picture won't do you much good if you try to start studying at that point without a lot of other understanding of TCP (e.g., its flow control concepts).
Wikipedia and similar encyclopedia/tutorial sites can only give you a summary of a summary, while RFCs are not written to be readable, or understandable, by non-experts. If you care about TCP, I'd recommend starting your study with Stevens' immortal trilogy of books (though there are many other valid ones, Stevens' are by far my personal favorites).
The issue is parallelism.
Latency does not directly affect your pipe's throughput. For instance, a dump truck across the country has terrible latency, but wonderful throughput if you stuff it full of 2TB tapes.
The problem is that your web browser can't start asking for things until it knows what to ask for. So, when you load a web page with ten images, you have to wait until the img tags arrive before you can send the request for them. So everything is perceptibly slower, not because your connection is saturated but because there's down time between making one request and the next.
A prefetcher helps alleviate this issue.
As far as "multiple packets at a time" are concerned, a single TCP connection will have many packets in transit at once, as specified by the window scaling algorithm the ends are using. But that only helps with one connection at a time...
TCP uses what's called a sliding window. Basically, X is the amount of buffer space the receiver has for reassembling out-of-order packets. The sender can send X bytes past the last acknowledged byte, say sequence number N. This way you can fill the pipe between sender and receiver with X unacknowledged bytes, on the assumption that the packets will likely get there, and if not the receiver will let you know by not acknowledging the missing packets. On each response packet the receiver sends a cumulative acknowledgment, saying "I've got all the bytes up to sequence number N." This lets it ack multiple packets at once.
Imagine a client sending 3 packets, X, Y, and Z, starting at sequence number N. Due to routing, Y arrives first, then Z, and then X. Y and Z will be buffered in the destination's stack, and when X arrives the receiver will ACK N + (the combined length of X, Y, and Z). This bumps the start of the sliding window forward, allowing the client to send additional packets.
It's possible with selective acknowledgement to ack portions of the sliding window and ask the sender to retransmit just the lost portions. In the classic scheme, if Y was lost, the sender would have to resend both Y and Z; selective acknowledgement means the sender can resend just Y. Take a look at the Wikipedia page.
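To make the cumulative-acknowledgement idea concrete, here is a small illustrative sketch (not real TCP code) that computes the ACK value a receiver would return given the segments it has buffered so far:

    import java.util.Map;
    import java.util.TreeMap;

    public class CumulativeAck {
        // segments maps starting sequence number -> segment length for everything received.
        // Returns the next expected sequence number, i.e. the cumulative ACK value.
        static long cumulativeAck(long firstExpected, Map<Long, Integer> segments) {
            long next = firstExpected;
            // TreeMap iteration is in sequence-number order, so walk forward while contiguous.
            for (Map.Entry<Long, Integer> e : new TreeMap<>(segments).entrySet()) {
                if (e.getKey() > next) {
                    break;                        // gap: a segment is still missing
                }
                next = Math.max(next, e.getKey() + e.getValue());
            }
            return next;
        }

        public static void main(String[] args) {
            // Segments Y (N+100) and Z (N+200) arrive before X (N), as in the example above.
            long n = 1000;
            Map<Long, Integer> received = new TreeMap<>();
            received.put(n + 100, 100);           // Y
            received.put(n + 200, 100);           // Z
            System.out.println(cumulativeAck(n, received));   // still 1000: X is missing
            received.put(n, 100);                 // X finally arrives
            System.out.println(cumulativeAck(n, received));   // 1300: X, Y, Z all acked at once
        }
    }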
Regarding speed, one thing that may slow you down is DNS. That adds an additional round-trip, if the IP isn't cached, before you can even request the image in question. If it's not a common site this may be the case.
TCP/IP Illustrated, Volume 1 by Richard Stevens is tremendous if you want to learn more. The title sounds funny, but the packet diagrams and annotated arrows from one host to the other really make this stuff easier to understand. It's one of those books that you can learn from and then end up keeping as a reference. It's one of my 3 go-to books on networking projects.
The TCP protocol will try to send more and more packets at a time, up to a certain amount (I think), until it starts noticing that they're dropping (the packets die in router/switch land when their Time To Live expires) and then it throttles back. This way it determines the size of the window (bandwidth) that it can use. If the sending host is noticing from its end a lot of dropped packets, then the receiving host is just going to see it as a slow connection. It could very well be blasting you with data, you're just not seeing it.
I guess parallel packet transmission is also possible (yes, there can be a limit on the number of packets sent at a time). You can find more information about packet transmission under the topics: message switching, packet switching, circuit switching, and virtual-circuit packet switching.
