Will HTTP ResponseWriter's write function buffer in Go? - http

Assume that we have a function handling an HTTP Request, something like:
func handler(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("first piece of data"))
    // do something
    w.Write([]byte("second piece of data"))
}
I'm wondering whether the first call to w.Write() is flushed to the client or not.
If it is flushed, then we actually respond to the client twice, which is strange, because how can we determine Content-Length before the second call to Write?
If it is not flushed (say the data is buffered locally), then what if we write a huge amount of data in the first call? (Will that cause a stack overflow?)
Any explanation will be appreciated! :)

I'm wondering whether the first call to w.Write() is flushed to the client or not.
net/http's default ResponseWriter has a (currently 4 KB) output buffer over the net.Conn it writes to. Additionally, the OS will normally buffer writes to a socket. So in most cases some kind of buffering takes place.
If it is flushed, then we actually respond to the client twice, which is strange, because how can we determine Content-Length before the second call to Write?
Well, there's HTTP 1.1, which allows persistent connections. Such responses usually don't include a Content-Length header. Additionally, there are HTTP trailers.
If your client does not support HTTP 1.1 and persistent connections, it will have some sort of read timeout; during that time you can write to the connection as many times as you like, and it is still one response.
This has more to do with the nature of TCP sockets and HTTP implementations than with Go.
If it is not flushed (say the data is buffered locally), then what if we write a huge amount of data in the first call? (Will that cause a stack overflow?)
No: allocating such a buffer on the stack makes no sense, as the buffer's backing memory will live on the heap. If you hit your per-process memory limit, your application will crash with an "out of memory" error.
See also:
How to turn off buffering on write() system call?
*TCPConn.SetNoDelay
Edit to answer your question in the comments:
Chunked Transfer Encoding is part of the HTTP 1.1 specification and is not supported in HTTP 1.0.
Edit to clarify:
As long as the total time it takes to write both parts of your response does not exceed your client's read timeout, and you don't specify a Content-Length header, you can just write your response and then close the connection. That's totally OK and not "hacky".

Related

Does Golang's (*http.ResponseWriter) Write() method block until data is received by the client?

I ask this question because of a very puzzling experience, which I am about to describe.
I am instrumenting an HTTP API server to observe its behavior in the presence of latency between the server and the clients. I had a setup consisting of a single server and a dozen clients connected over a 10 Gbps Ethernet fabric. I measured the time it took to serve certain API requests in 5 scenarios. In each scenario, I set the latency between the server and the clients to one of the values: no latency (I call this the baseline), 25 ms, 50 ms, 250 ms, or 400 ms, using the tc-netem(8) utility.
Because I am using histogram buckets to quantify the service time, I observed that all the requests were processed in less than 50 ms whatever the scenario, which clearly doesn't make any sense: in the 400 ms case, for example, it should take at least around 400 ms (I am only measuring the duration from the moment the request hits the server to the moment the HTTP Write() function returns). Note that the response objects are between 1 KB and 10 KB in size.
Initially, I suspected that *http.ResponseWriter's Write() function was asynchronous and returned immediately, before the data is received by the client. So I decided to test this hypothesis by writing a toy HTTP server that serves the content of a file generated with dd(1) and /dev/urandom, so that I can reconfigure the response size. Here is the server:
package main

import (
    "io/ioutil"
    "log"
    "net/http"
    "time"
)

var response []byte

func httpHandler(w http.ResponseWriter, r *http.Request) {
    switch r.Method {
    case "GET":
        now := time.Now()
        w.Write(response)
        elapsed := time.Since(now)
        mcs := float64(elapsed / time.Microsecond)
        s := elapsed.Seconds()
        log.Printf("Elapsed time in mcs: %v, sec: %v", mcs, s)
    }
}

func main() {
    response, _ = ioutil.ReadFile("BigFile")
    http.HandleFunc("/hd", httpHandler)
    http.ListenAndServe(":8089", nil)
}
Then I start the server like this:
dd if=/dev/urandom of=BigFile bs=$VARIABLE_SIZE count=1 && ./server
From the client side, I issue: time curl -X GET $SERVER_IP:8089/hd --output /dev/null
I tried many values of $VARIABLE_SIZE in the range [1 KB, 500 MB], using an emulated latency of 400 ms between the server and each one of the clients. To make a long story short, I noticed that the Write() method blocks until the data is sent when the response size is big enough to be visually noticeable (on the order of tens of megabytes). However, when the response size is small, the server doesn't report a sane servicing time compared to the value reported by the client. For a 10 KB file, the client reports 1.6 seconds while the server reports 67 microseconds, which doesn't make sense at all: even I, as a human, noticed a delay on the order of a second, exactly as reported by the client.
To go a little further, I tried to find out at which response size the server starts returning an acceptable time. After many trials using binary search, I discovered that the server always reports a few microseconds [20 µs, 600 µs] for responses smaller than 86502 bytes, and reports expected (acceptable) times for responses of 86502 bytes or more (usually half of the time reported by the client). For example, for an 86501-byte response, the client reported 4 seconds while the server reported 365 microseconds; for 86502 bytes, the client reported 4 s and the server reported 1.6 s. I repeated this experiment many times using different servers; the behavior is always the same. The number 86502 looks like magic!
This experiment explains the weird observations I initially had, because all the API responses were less than 10 KB in size. However, it opens the door to a serious question: what on earth is happening, and how can this behavior be explained?
I've tried to search for answers but didn't find anything. The only thing I can think of is that it may be related to Linux socket buffer sizes and whether Go makes the system call in a non-blocking fashion. However, AFAIK, the TCP packets transporting the HTTP response should all be acknowledged by the receiver (the client) before the sender (the server) can return! Breaking this assumption (as appears to be the case here) could lead to disasters! Can someone please provide an explanation for this weird behavior?
Technical details:
Go version: 1.12
OS: Debian Buster
Arch: x86_64
I'd speculate that the question is, in fact, stated the wrong way: you seem to be guessing about how HTTP works instead of looking at the whole stack.
The first thing to consider is that HTTP (both 1.0 and 1.1, the latter having been the standard version for a long time now) does not specify any means for either party to acknowledge data reception.
There exists an implicit acknowledgement of the fact that the server received the client's request — the server is expected to respond to the request, and when it responds, the client can be reasonably sure the server actually received the request.
There is no such thing working in the other direction though: the server does not expect the client to somehow "report back" — on the HTTP level — that it managed to read the whole of the server's response.
The second thing to consider is that HTTP is carried over TCP connections (or over TLS, which is not really different, as TLS uses TCP as well).
An oft-forgotten fact about TCP is that it has no message framing — that is, TCP performs bi-directional transfer of opaque byte streams.
TCP only guarantees total ordering of bytes in these streams; it does not in any way preserve any occasional "batching" which may naturally result from the way you work with TCP via a typical programming interface — by calling some sort of "write this set of bytes" function.
Another thing which is often forgotten about TCP is that while it indeed uses acknowledgements to track which part of the outgoing stream was actually received by the receiver, this is a protocol detail which is not exposed to the programming interface level (at least not in any common implementation of TCP I'm aware of).
These features mean that if one wants to use TCP for message-oriented data exchange, one needs to implement both message boundaries (so-called "framing") and acknowledgement of the reception of individual messages in the protocol above TCP.
HTTP is a protocol which is above TCP but while it implements framing, it does not implement explicit acknowledgement besides the server responding to the client, described above.
Now consider that most if not all TCP implementations employ buffering in various parts of the stack. At least, the data which is submitted by the program gets buffered, and the data which is read from the incoming TCP stream gets buffered, too.
Finally, consider that the most commonly used TCP implementations provide for sending data into an active TCP connection through a call that submits a chunk of bytes of arbitrary length.
Considering the buffering described above, such a call typically blocks until all the submitted data gets copied to the sending buffer.
If there's no room in the buffer, the call blocks until the TCP stack manages to stream some amount of data from that buffer into the connection, freeing room to accept more data from the program.
What does all of the above mean for net/http.ResponseWriter.Write interacting with a typical contemporary TCP/IP stack?
A call to Write will eventually try to submit the specified data to the TCP/IP stack.
The stack will try to copy that data over into the send buffer of the corresponding TCP connection, blocking until all the data has been copied.
After that you have essentially lost any control about what happens with that data: it may eventually be successfully delivered to the receiver, or it may fail completely, or some part of it might succeed and the rest will not.
What this means for you, is that when net/http.ResponseWriter.Write blocks, it blocks on the sending buffer of the TCP socket underlying the HTTP connection you're operating on.
Note, though, that if the TCP/IP stack detects an irreparable problem with the connection underlying your HTTP request/response exchange — such as a segment with the RST flag coming from the remote peer, meaning the connection has been unexpectedly torn down — the problem will bubble up through Go's HTTP stack as well, and Write will return a non-nil error.
In this case, you will know that the client was likely not able to receive the complete response.

HTTP Request parsing

I would like to understand the usage of 'Transfer-encoding: Chunked' in case of HTTP requests.
Is it common for requests to be chunked?
My thinking is no: since requests need to be completely read before processing, it does not make sense to send chunked requests.
It is not that common, but it can be very useful for large request bodies.
My thinking is no: since requests need to be completely read before processing, it does not make sense to send chunked requests.
(1) No, they don't need to be read completely.
(2) ...and the main reason to compress is to save bytes on the wire anyway.
For an HTTP agent acting as a reverse proxy or a forward proxy (taking a message from one side and sending it on the other), using chunked transmission means you can forward the parts of the message you already have without storing the whole message locally. You avoid buffering problems, slowdown, and storage.
You also get optimizations based on each actor's preferred data block size: one actor may like sending packets of 8000 bytes because that's the right number for its own kernel settings (TCP windows, internal HTTP server buffer size, etc.), while another actor on the transmission path uses smaller chunks of 2048 bytes.
Finally, you do not need to compute the size of the message; the message ends at the end-of-stream marker (a zero-size chunk), that's all. This is also useful if you are sending something that is compressed on the fly: you may not know the final size until everything has been compressed.
Chunked transmission is used a lot. It is the default mode of most HTTP servers when you ask for HTTP/1.1 mode rather than HTTP/1.0.

Partial reading of HTTP requests

Suppose I have a (REST) server application that does not need to fully read incoming HTTP requests. Clients may send large HTTP requests of any size, but I need only the first X kilobytes.
I would like to read only X kilobytes and immediately close the connection. Does it make sense? Is it legal in terms of HTTP? What are the alternatives?
I would like to read only X kilobytes and immediately close the connection. Does it make sense?
Not for a REST-ful application.
Is it legal in terms of HTTP?
Yes, technically. In the HTTP protocol a server response of some kind is always expected for a complete transaction; closing early will be experienced by the client as a premature end of the connection, i.e. an incomplete or aborted transaction.
What are the alternatives?
What are you trying to accomplish?
If you just want to read the first X bytes of whatever is sent by any client who connects, and then not bother to reply at all, then the HTTP protocol is not for you, never mind REST.

Web server - how to parse requests? Asynchronous Stream Tokenizer?

I'm attempting to create a simple web server in C# in an asynchronous socket programming style. The purpose is very narrow: a Comet server (HTTP long-polling).
I've got the Windows service running, accepting connections, dumping request info to the console, and returning simple fixed content to the client.
Now, I can't figure out a manageable strategy for parsing the request data asynchronously and safely. I've written synchronous LL(1) parsers before, but I'm not sure whether an LL(1) parser is appropriate or necessary for HTTP, and I don't know how to tokenize the input stream asynchronously. All I can think of is keeping an input buffer per client, reading into it, copying that into a StringBuilder, and periodically checking whether I have a complete request. But that seems inefficient and might lead to code that is difficult to debug and maintain.
Also, the connection has two phases: receiving the request in full, and then sending a response (in this case, after some delay). Only once the request is validated and actionable do I plan to enroll the connection in the long-polling manager. However, a misbehaving client could continue to send data and fill up a buffer, so I think I need to keep monitoring and draining the input stream during the response phase, right?
Any guidance on this is appreciated.
I guess the first step is knowing whether it is possible to tokenize a network stream asynchronously and efficiently, without a large intermediate buffer. Even without a proper parser, the same challenges apply to reading "lines" of input at a time, or reading until a double blank line (one big token). I don't want to read one byte at a time from the network, but neither do I want to read too many bytes and have to store them in some intermediate buffer, right?
For HTTP the best way is to read the headers completely into memory (until you receive \r\n\r\n) and then simply split on \r\n to get the individual header lines, and split each line on : to separate the name from the value.
There's no need to use a complex parser for that.
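That split-twice approach can be sketched in Go (the function name and the lower-casing of header names are my own choices; a production parser would also need to validate the request line, reject malformed headers, and handle obsolete folded headers):

```go
package main

import (
	"fmt"
	"strings"
)

// parseHeaders takes everything read up to the blank line (CRLF CRLF)
// and splits it into the request line plus a name -> value map.
func parseHeaders(raw string) (requestLine string, headers map[string]string) {
	lines := strings.Split(strings.TrimSuffix(raw, "\r\n"), "\r\n")
	headers = make(map[string]string)
	requestLine = lines[0]
	for _, line := range lines[1:] {
		name, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // malformed header line; real code should reject it
		}
		headers[strings.ToLower(name)] = strings.TrimSpace(value)
	}
	return requestLine, headers
}

func main() {
	raw := "GET /hd HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n"
	reqLine, h := parseHeaders(raw)
	fmt.Println(reqLine)           // GET /hd HTTP/1.1
	fmt.Println(h["host"], h["accept"])
}
```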

HTTP chunks: All chunks sent consecutively and uninterrupted?

Can I be sure that a chunked HTTP response will be sent uninterrupted by anything else? I need to differentiate responses (and requests), and this isn't a simple case of reading a Content-Length, seeing a closed connection, or a no-body response code.
Can I read each chunk and, once the chunk size is 0, know that I have read exactly one response (or request)? I.e., is it possible for part of any other response to have been sent interleaved? I suspect the chunks are sent consecutively and uninterrupted, as there doesn't appear to be any kind of identification in the spec for chunked transfer, so how could more than one response be reassembled?
Finally, if a response is sent chunked, does the client send anything more than its original request? I'm thinking along the lines of flow control and error checking, but that is all handled at lower layers, so I suspect the client does not send anything more.
Thanks!
"Interleaved" with what exactly? HTTP doesn't allow to send several responses concurrently on the same connection. Even with pipelining responses are still sent after each other. That is, you will see all the chunks coming in order before the response to any other request.
As for your final question, no, the client doesn't send anything more than the original request.

Resources