I'm doing HTTP GET requests from mobile devices (so the network connection is often unreliable) and wondering which would be the better approach:
Try 1 request with a timeout of 60 sec or
Try 3 requests each with a timeout of 20 secs
Or any other combination of retries and timeouts. I don't know whether an HTTP/TCP connection can actually stall in a way that makes a retry worthwhile. I don't transfer much data (< 1 kB) and am wondering which approach usually yields a faster response time.
As long as it's an idempotent operation, it should be OK to retry more frequently in theory.
(a GET should never have any side effect at all, to be honest.)
It might still put unnecessary load on the server, and delayed responses to multiple retransmits of the request may saturate the downlink and make the situation worse.
In interactive applications I find an honest "it's taking longer than normal" notification with a user-triggerable "retry" works best: the user has the option to press the "retry" button after exiting the tunnel or building that was causing a short network outage.
Conversely, out in the woods with constantly low throughput, they will learn to ignore the notification and wait patiently.
Overview
I have a Go echo http server running with version 1.13.
$ go version
go version go1.13.7 linux/amd64
I'm monitoring a number of different statistics about the server, including the number of goroutines. I periodically see brief spikes of thousands of goroutines, when high load shouldn't cause it to exceed maybe a few hundred. These spikes do not correlate to an increase in http requests as logged by the labstack echo middleware.
To better debug this situation, I added a periodic check in the program which sends me a pprof report on the goroutines if the number spikes.
The added goroutines surprised me, as when the server is in "normal" operating mode, I see 0 goroutines of the listed functions.
goroutine profile: total 1946
601 # 0x4435f0 0x4542e1 0x8f09dc 0x472c61
# 0x8f09db net/http.(*persistConn).readLoop+0xf0b /usr/local/go/src/net/http/transport.go:2027
601 # 0x4435f0 0x4542e1 0x8f2943 0x472c61
# 0x8f2942 net/http.(*persistConn).writeLoop+0x1c2 /usr/local/go/src/net/http/transport.go:2205
601 # 0x4435f0 0x4542e1 0x8f705a 0x472c61
# 0x8f7059 net/http.setRequestCancel.func3+0x129 /usr/local/go/src/net/http/client.go:321
What I'm struggling with, however, is where these are coming from, what they indicate, and at what point in an http request would I expect them.
To my untrained eye, it looks as if something is briefly attempting to open a connection and then immediately tries to close it.
But it would be good to have confirmation of this. In what part of an http request do readLoop, writeLoop, and setRequestCancel goroutines get started? What do these goroutines indicate?
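As far as I can tell from the net/http source, all three belong to the client (Transport) side, not to the server handling inbound requests: persistConn.readLoop and writeLoop are started in pairs whenever the Transport dials a new outbound connection, and setRequestCancel's goroutine is started for requests that carry a timeout or cancellation. A small self-contained sketch of where they appear (httptest stands in for a real peer):

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"runtime"
	"time"
)

// demo counts goroutines before and after one outbound request. A fresh
// connection adds a persistConn readLoop and writeLoop (they stay alive
// while the keep-alive connection is pooled); a Client.Timeout (or a
// request context deadline) briefly adds setRequestCancel's watcher.
func demo() (before, after int) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	before = runtime.NumGoroutine()
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(srv.URL)
	if err == nil {
		resp.Body.Close()
	}
	after = runtime.NumGoroutine()
	return
}

func main() {
	b, a := demo()
	fmt.Printf("goroutines before: %d, after: %d\n", b, a)
}
```

If the three counts in a profile move in lockstep (601/601/601 above), that suggests 601 outbound requests in flight on fresh connections, i.e. something in the process acting as an HTTP client, which would also explain why they don't correlate with inbound echo requests.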
Notes
A few things I've looked at:
I tried adding middleware to capture request frequencies from IP addresses as they came in, and to report on those when the spikes happen. The total request count remains low, in the 30-40 range, even as the spike is happening. No IP address is anomalous.
I've considered executing something like lsof to find open connections but that seems like a tenuous approach at best, and relies on my understanding of what these goroutines mean.
I've tried to cross-correlate the timing of seeing this with other things on the network, but without understanding what could cause this, I can't make much sense of where the potential culprit may lie.
If the number of goroutines exceeds 8192, the program crashes with the error: race: limit on 8192 simultaneously alive goroutines is exceeded, dying. A search for this error leads me to this GitHub issue, which feels relevant because I am, in fact, using gorilla websockets in the program. However, the binary was compiled with -race, and no race report is printed alongside my error, which is entirely different from the aforementioned question.
It's a popular claim that HTTP pipelining can degrade the performance of downloading sites due to the head-of-line (HoL) blocking phenomenon. Is this performance compared to a single non-pipelined persistent HTTP connection, or to multiple TCP connections opened simultaneously in order to download the site's resources in parallel?
In the first case I can't really see how a large response blocking the sending of subsequent smaller ones can result in a performance loss. Yes, this blocking will occur. But in the case of a single non-pipelined persistent HTTP connection, the HoL blocking phenomenon occurs every time the client sends a request and every time the server sends a response. The only reasons I could think of for performance potentially being worse here are that:
1) the time needed to properly queue/buffer requests/responses may be longer than the time saved by the server being able to start processing the n-th request without waiting for processing of the (n-1)-th to complete. But this basically comes down to numbering the requests/responses correctly, so it seems to be more of a concern when many small requests have to be dealt with (it's unlikely that the queueing/buffering-related computation takes more time than processing a large response, and people indicating that HoL blocking can be a problem refer to large responses, not small ones), and it is not directly related to the HoL blocking;
2) if many clients had pipelining enabled, then many large responses might have to be buffered, effectively leading to memory exhaustion on the server side. But this is a special situation, and it is clearly not what people have in mind when they say that enabling pipelining in a browser can make performance worse.
On the other hand, when pipelining is compared to multiple simultaneous TCP connections, it is readily seen that the necessity to send a large response before subsequent smaller ones will slow things down.
However, if the comparison is made to a single non-pipelined HTTP connection and pipelining can indeed result in performance loss - can you demonstrate some basic (perhaps simplified) calculations showing that?
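Not a full answer, but here is a toy model one can plug numbers into: fixed RTT per request, shared bandwidth, zero server processing time (all simplifying assumptions). In this model, pipelining a large response ahead of small ones never loses to a single serial connection, because it only saves round trips; the loss people describe shows up against parallel connections, where each small resource finishes at roughly rtt + size/bw instead of waiting behind the large one:

```go
package main

import "fmt"

// serialDone: single non-pipelined connection; request k is sent only
// after response k-1 has been fully received, so each resource pays a
// full RTT plus its transfer time.
func serialDone(rtt, bw float64, sizes []float64) []float64 {
	out := make([]float64, len(sizes))
	t := 0.0
	for i, s := range sizes {
		t += rtt + s/bw
		out[i] = t
	}
	return out
}

// pipelinedDone: all requests go out back-to-back; responses return in
// order, so small responses queue behind the large one (HoL blocking),
// but the RTT is paid only once.
func pipelinedDone(rtt, bw float64, sizes []float64) []float64 {
	out := make([]float64, len(sizes))
	t := rtt
	for i, s := range sizes {
		t += s / bw
		out[i] = t
	}
	return out
}

func main() {
	rtt, bw := 0.1, 1e6                    // 100 ms RTT, 1 MB/s link
	sizes := []float64{1e6, 1e4, 1e4, 1e4} // one large response first
	fmt.Println("serial completion times:   ", serialDone(rtt, bw, sizes))
	fmt.Println("pipelined completion times:", pipelinedDone(rtt, bw, sizes))
}
```

With these numbers the last small resource finishes at about 1.13 s pipelined versus 1.43 s serial, so any real-world regression has to come from effects outside this model (buffering costs, broken proxies, memory pressure), not from HoL blocking relative to a serial connection.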
I tried to search for an answer to my question on the Internet, but was unable to find one.
Some of the resources that I tried:
https://devcentral.f5.com/articles/http-pipelining-a-security-risk-without-real-performance-benefits
What are the disadvantage(s) of using HTTP pipelining?
The standard HTTP timeout seems to be 30 seconds, if the server does not respond at all. But what is the "standard" timeout if a server is responding, but sending the response very slowly? When does the client give up? When it reaches a certain time between packets? Never?
HTTP does not standardize on a timeout; nothing prevents clients from waiting forever. Some clients may do an application-level timeout of 30 seconds, but my Firefox, for example, shows network.http.response.timeout as 300 seconds.
The lack of standard applies even more to slow responses. For example, various scanning and reverse proxies employ trickling techniques to drip feed a few bytes to the client to prevent it from timing out while they do heavy processing. Usually 100 bytes or so every ten seconds suffices, though of course it's very much ad-hoc (see also comment above on lack of standard).
I'm working on a project where we have an asp.net website which makes asmx web service calls to a mid tier. The timeout to the mid tier is 5s. One thing that we've noticed is that occasionally, at peak traffic times, we get http 400's when calling the mid tier.
We've done a network trace on the website tier for some of these http 400 requests and noticed that
1) the 3 way tcp handshake goes fast
2) the actual first packet of the post is not initiated until 5 seconds later. The ack from the first packet of the post comes back quickly
3) a fin ack is sent shortly thereafter (presumably due to timing out), after which the web service tier comes back with the http 400 quickly (the 400 being understandable as the post was incomplete)
Sometimes there is an extra 5 s delay before step 3. Any idea why this may happen? Step 2 looks like very strange behavior to me. Could there be resource contention causing this delay before the post is sent? Perhaps some sort of resource that could be configured differently?
We're using the standard .net async methods for making the request (BeginInvoke). The post body is fully available as a string before we call the api.
I'm thinking the CPU could be too high, causing the delay before step 2. Has anybody seen that before? I know the CPU is at least 80% during this time. It could be higher, but our perf counter isn't very high resolution. We're trying to get a higher-resolution perf counter during a repro to confirm that further. Let me know if you have any other ideas!
Thanks!
Being curious, I wonder why HTTP, by design, can only handle one pending request per socket.
I understand that this limitation is because there is no 'Id' to associate a request to its response, so the only way to match a response with its request is to send the response on the same socket that sent the request. There would be no way to match a response to its request if there was more than one pending request on the socket because we may not receive the responses in the same order requests were sent.
If the protocol had been designed to have a matching 'Id' for requests and responses, there could be multiple pending requests on only one socket. This could greatly reduce the number of socket used by internet browsers and applications using web services.
Was HTTP designed like this for simplicity even if it's less efficient or am I missing something and this is the best approach?
Thanks.
Not true: read about HTTP/1.1 pipelining. Apache implements it and Firefox implements it, although Firefox disables it by default.
To turn it on in Firefox use about:config and write 'pipelining' in the filter.
see: http://www.mozilla.org/projects/netlib/http/pipelining-faq.html
It's basically for simplicity; various proposals have been made over the years that multiplex on the same connection (e.g. SPDY) but none have taken off yet.
One problem with sending multiple requests on a single socket is that it would cause inefficient queuing.
For instance, let's say you are in a store with 2 cashiers and 10 people waiting to check out. The ideal way to form the line is a single queue of 10 people, where the next person in line goes to whichever cashier becomes available. If you instead dispatched all the requests at once, you would probably send 5 people to cashier A and 5 to cashier B. But what if you sent the 5 people with the largest shopping carts to the same cashier? That's bad queuing, and it's what could happen if you queued a bunch of requests on a single socket.
NOTE: I'm not saying that you couldn't use queuing well, but it keeps it simple to do it right if there is no queuing on a single socket.
There are a few considerations I would review.
The first is related to the nature of a single TCP connection: requests and responses are serialized, so only one outstanding (unanswered) request can be in flight at a time, and a lost segment holds up everything queued behind it ('head-of-line' blocking). Given traditional latencies this can be a problem for load-time user experience compared with the parallel-connection scheme browsers employ today. The higher the latency of the link, the larger the impact of this fundamental limitation.
There is also a concurrency issue, in that sometimes you really want to load multiple resources incrementally / in parallel. Back in the day, one of the greatest features Mozilla had over Mosaic was that it would load images and objects incrementally, so you could begin to see what was going on and use a resource before it had fully loaded. With fewer connections there is a risk that, for example, loading a large image on a page before a style sheet could be catastrophic from an experience point of view. Expecting some kind of mitigating intelligence or explicit configuration to optimally order requests may not be a realistic or ideal solution.
There are proposals such as HTTP over SCTP that will more or less totally correct the issue you raise at the transport level.
Also realize that HTTP doesn't necessarily mandate a Content-Length header to serve data. Even if each HTTP response were ID'd, how would you manage streaming binary content with no content length (HTTP/1.0 style)? Or the case where a Connection: close header is sent and the connection is closed to mark the end of a body of unknown length?
To manage this you would have to combine HTTP chunking (already present) with multiplexing (I don't think anyone implements that), which would add non-trivial work to many programs.
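For reference, the "already present" chunking is HTTP/1.1 chunked transfer encoding: each chunk carries its own size, and a zero-length chunk terminates the body, which is what lets a server stream without a Content-Length. A round-trip sketch using Go's httputil helpers:

```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net/http/httputil"
)

// roundTrip writes payloads as chunked-encoded data and decodes them
// back, mimicking an HTTP body of unknown total length.
func roundTrip(payloads [][]byte) ([]byte, error) {
	var wire bytes.Buffer
	cw := httputil.NewChunkedWriter(&wire)
	for _, p := range payloads {
		if _, err := cw.Write(p); err != nil {
			return nil, err
		}
	}
	cw.Close() // emits the terminating zero-length chunk
	return ioutil.ReadAll(httputil.NewChunkedReader(&wire))
}

func main() {
	body, err := roundTrip([][]byte{[]byte("hello "), []byte("world")})
	fmt.Println(string(body), err)
}
```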