recv() data of unknown size with Berkeley Sockets - http

I have a code in C++ in which i use recv() from Berkeley Sockets to receive data from a remote host. The issue is that i do not know the size of the data ( which is variable ) so i need some kind of timeout opt ( probably ) to make this work.
Since I'm new in sockets programming, i was wondering how does for example a web client handle responses from a server ( eg a server sends the html data to the client ). Does it use some kind of timeout, since it doesn't know how big the page is ? Same with an FTP client.

When your data is of variable length, then typically that data is framed within another container. That is to say, there's a header preceding the actual data block that tell the receiver how much data it should accept.
For example HTTP uses new line characters to delimit data. If there's variable-length message, then in the header it will include "Content-length:" field that indicates exactly how many bytes to read once entire header is received (header stops when you read 2 consecutive new lines).
It is perfectly fine to read 4 bytes from socket, get how much data follows, then do another receive and read the rest. Only be careful, when you ask for 4 bytes, the socket might give you anywhere between 1-4 bytes so anything less than 4 means you need to go back and ask for remaining few bytes. This is a very common mistake. In dev environment you will almost always get 4 bytes when asking for 4, but once you deploy your app, somewhere on some machine you will get random crashes because their network behavior is somehow different.
Generally, it is a bad approach to rely on timeouts to determine when you reach end of data. With a timeout, you might get things "reliably" working in a well-controlled dev environment, but it is a very flaky solution. Any CPU/disk/network hick up might cause your app to stop receiving prematurely. You are also limiting your data throughput and responsiveness since your app is sleeping for some time interval instead of doing work.

Related

Does Golang's (*http.ResponseWriter) Write() method block until data is recieved by the client?

I ask this question because I had a very weird puzzling experience that I am about to tell.
I am instrumenting an HTTP API server to observe it's behavior in the presence of latency between the server and the clients. I had a setup consisting of a single server and a dozen of clients connected with a 10Gbps Ethernet fabric. I measured the time it took to serve certain API requests in 5 scenarios. In each scenario, I set the latency between the server and the clients to one of the values: No latency (I call this baseline), 25ms, 50ms, 250ms or 400ms using the tc-netem(8) utility.
Because I am using histogram buckets to quantify the service time, I observed that all the requests were processed in less than 50ms whatever the scenario is, which clearly doesn't make any sense as, for example, in the case of 400ms, it should be at least around 400ms (as I am only measuring the duration from the moment the request hits the server to the moment the HTTP Write()function returns). Note that the response objects are between 1Kb to 10Kb in size.
Initially, I had doubts that the *http.ResponsWriter's Write() function was asynchronous and returns immediately before data is received by the client. So, I decided to test this hypothesis by writing a toy HTTP server that services the content of a file that is generated using dd(1) and /dev/urandom to be able to reconfigure the response size. Here is the server:
var response []byte
func httpHandler(w http.ResponseWriter, r * http.Request) {
switch r.Method {
case "GET":
now: = time.Now()
w.Write(response)
elapsed: = time.Since(now)
mcs: = float64(elapsed / time.Microsecond)
s: = elapsed.Seconds()
log.Printf("Elapsed time in mcs: %v, sec:%v", mcs, s)
}
}
func main() {
response, _ = ioutil.ReadFile("BigFile")
http.HandleFunc("/hd", httpHandler)
http.ListenAndServe(":8089", nil)
}
Then I start the server like this:
dd if=/dev/urandom of=BigFile bs=$VARIABLE_SIZE count=1 && ./server
from the client side, I issue time curl -X GET $SERVER_IP:8089/hd --output /dev/null
I tried with many values of $VARIABLE_SIZE from the range [1Kb, 500Mb], using an emulated latency of 400ms between the server and each one of the clients. To make long story short, I noticed that the Write() method blocks until the data is sent when the response size is big enough to be visually noticed (on the order of tens of megabytes). However, when the response size is small, the server doesn't report a mentally sane servicing time compared to the value reported by the client. For a 10Kb file, the client reports 1.6 seconds while the server reports 67 microseconds (which doesn't make sense at all, even me as a human I noticed a little delay on the order of a second as it is reported by the client).
To go a little further, I tried to find out starting from which response size the server returns a mentally acceptable time. After many trials using a binary search algorithm, I discovered that the server always returns few microseconds [20us, 600us] for responses that are less than 86501 bytes in size and returns expected (acceptable) times for requests that are >= 86501 bytes (usually half of the time reported by the client). As an example, for a 86501 bytes response, the client reported 4 seconds while the server reported 365 microseconds. For 86502 bytes, the client reported 4s and the sever reported 1.6s. I repeated this experience many times using different servers, the behavior is always the same. The number 86502 looks like magic !!
This experience explains the weird observations I initially had because all the API responses were less than 10Kb in size. However, this opens the door for a serious question. What the heck on earth is happening and how to explain this behavior ?
I've tried to search for answers but didn't find anything. The only thing I can think about is maybe it is related to Linux's sockets size and whether Go makes the system call in a non-blocking fashion. However, AFAIK, TCP packets transporting the HTTP responses should all be acknowledged by the receiver (the client) before the sender (the server) can return ! Breaking this assumption (as it looks like in this case) can lead to disasters ! Can someone please provide an explanation for this weird behavior ?
Technical details:
Go version: 12
OS: Debian Buster
Arch: x86_64
I'd speculate the question is stated in a wong way in fact: you seem to be guessing about how HTTP works instead of looking at the whole stack.
The first thing to consider is that HTTP (1.0 and 1.1, which is the standard version since long time ago) does not specify any means for either party to acknowledge data reception.
There exists implicit acknowledge for the fact the server received the client's request — the server is expected to respond to the request, and when it responds, the client can be reasonably sure the server had actually received the request.
There is no such thing working in the other direction though: the server does not expect the client to somehow "report back" — on the HTTP level — that it had managed to read the whole server's response.
The second thing to consider is that HTTP is carried over TCP connections (or TLS, whcih is not really different as it uses TCP as well).
An oft-forgotten fact about TCP is that it has no message framing — that is, TCP performs bi-directional transfer of opaque byte streams.
TCP only guarantees total ordering of bytes in these streams; it does not in any way preserve any occasional "batching" which may naturally result from the way you work with TCP via a typical programming interface — by calling some sort of "write this set of bytes" function.
Another thing which is often forgotten about TCP is that while it indeed uses acknowledgements to track which part of the outgoing stream was actually received by the receiver, this is a protocol detail which is not exposed to the programming interface level (at least not in any common implementation of TCP I'm aware of).
These features mean that if one wants to use TCP for message-oriented data exchange, one needs to implement support for both message boundaries (so-called "framing") and acknowledgement about the reception of individual messages in the procotol above TCP.
HTTP is a protocol which is above TCP but while it implements framing, it does not implement explicit acknowledgement besides the server responding to the client, described above.
Now consider that most if not all TCP implementations employ buffering in various parts of the stack. At least, the data which is submitted by the program gets buffered, and the data which is read from the incoming TCP stream gets buffered, too.
Finally consider that most commonly used TCP implementations provide for sending data into an active TCP connection through the use of a call allowing to submit a chunk of bytes of arbitrary length.
Considering the buffering described above, such a call typically blocks until all the submitted data gets copied to the sending buffer.
If there's no room in the buffer, the call blocks until the TCP stack manages to stream some amount of data from that buffer into the connection — freeing some room to accept more data from the client.
What all of the above means for net/http.ResponseWriter.Write interacting with a typical contemporary TCP/IP stack?
A call to Write would eventially try to submit the specified data into the TCP/IP stack.
The stack would try to copy that data over into the sending buffer of the corresponding TCP connection — blocking until all the data manages to be copied.
After that you have essentially lost any control about what happens with that data: it may eventually be successfully delivered to the receiver, or it may fail completely, or some part of it might succeed and the rest will not.
What this means for you, is that when net/http.ResponseWriter.Write blocks, it blocks on the sending buffer of the TCP socket underlying the HTTP connection you're operating on.
Note though, that if the TCP/IP stack detects an irrepairable problem with the connection underlying your HTTP request/response exchange — such as a frame with the RST flag coming from the remote part meaning the connection has been unexpectedly teared down — this problem will bubble up the Go's HTTP stack as well, and Write will return a non-nil error.
In this case, you will know that the client was likely not able to receive the complete response.

Is it possible to transfer a file through CoAP?

Recently, I am doing a project and I am trying to transfer a json file to the CoAP server. I put some random values in key:value pairs such as:
{
key1: value1,
key2: [value21, value22, value23]
}
Questions:
CoAP is pretty much similar to HTTP. So, like HTTP, is it possible to transfer a json file through CoAP using POST/PUT method? If it is possible, what is the recommended directory location to put the uploaded file into the server (resource directory)?
Update:
The actual file size is about to 152.8 kB.
You can transfer arbitary JSON files using CoAP POST/PUT. Which directory would be writable depends fully on the server.
Note that for a file of that size, transfer times would be considerably longer than with HTTP, as packages are sent in lock-step (putting the first 1kB, response, next 1kB – whereas HTTP has a TCP window).
For a first shot, you may try out eclipse/californium's "simple-fileserver-example".
cf-simple-fileserver
The supports the read (GET) and uses option block 2 for that.
If you go deeper and leave the laboratory, RFC7959 blockwise may be faced several issues.
coap usually assumes, that the endpoints are identified by their ip-address (and
port). Though a blockwise transfer may last longer, that assumption may get broken. If the client is faced such a address change, a block option 2 (GET) may work, but for block option 1 (PUT), that would require special preparation.
Though such a blockwise transfer tends to last longer, it may get paused due to temporary transmission issus. That requires then a "resumption or fail" strategy. Also here GET is much easier than PUT.
Basic transmission issues on crashes. In my experience, blockwise comes with many blocks and so many MID are in use in a short period of time. If a client crashs and select a random MID on startup, the probability of an unaware MID clash is rather high. Depending on the coap servers deduplication implementation (strict according RFC7252 or advanced in awareness of that), your client may require a strategy to escape the situation, where the server retransmits unrelated messages just based on MIDs. My experience from that time was, "analyse what your get, if it smells, wait for the 247s :-)". Your client may also save the last used MID to overcome that or use a special/separate "blockwise endpoint" with disabled deduplication.
IP. FMPOV some have seen the issues left to the implementation and started to fill patents. That may require attention as well.
All together: If you use bockwise for payload of sometimes some K bytes, my experience is not that bad. But if you regulary transfer more, coap may be not the right choice.

How to read stream data from a TCP socket in Swift 2?

Let's suppose, I have a custom server that listens to connections on some port and once it has received a connection, it starts sending data (sort of a logger). Here's the first question:
Can it be just binary data? Actually, I need just two non-zero 8-bit values, and I was thinking of 0-value byte to separate each new portion of data.
These three bytes will be sent once or may be twice a second.
So, now I am looking for some code snippet in Swift 2 to properly read this data. Normally, I would expect calling
connectSocket(IP,port)
which would connect to the socket, and once it receives the first chunk of data,
socketCallBack()
is called, or something like that.
Intuitively, I don't like the idea of checking data in a while (true) loop. Or is this the proper way?
I've seen an example, when it first sends 'get' request to the server and immediately starts waiting for response. Probably, I can call it using a timer, once a second? Will it be correct?
What I am concerned about is trafic. Right now I have impemented it through a web-server, but I don't like that it spends way too much trafic for that overhead http data.
Probably, with that tcp connections on timer that would be much less, and it would save even more trafic if I establish just one connection in the beginning and transmit the data within this connection. Am I right?

How much data to send via TCP/IP at once

I've written a small program with the boost asio library to transfer files via TCP from a server to one or more clients.
During testing I found out that the transfer is extremely slow, about 10KiB/s. Nagle's algorithm is already disabled. If I transfer the same file via FileZilla from the same server to the same client, I get about 280KiB/s, so obviously something was very wrong.
My approach so far was to fragment each file into smaller packets of 1024 bytes, send one fragment(each fragment=1 async_write-call) to the client and wait for the client's response. I need to fragment the data to allow the client to keep track of the download progress and speed. In retrospect I suppose this was rather naïve, because the server has to wait for the client's response after each fragment. To check if this was the bottleneck, I've increased the fragment size twice, giving me the following results:
a) Fragment Size: 1024bytes
Transfer Speed: ~10KiB/s
b) Fragment Size: 8192bytes
Transfer Speed: ~80KiB/s
c) Fragment Size: 20000bytes
Transfer Speed: ~195KiB/s
The results speak for themselves, but I'm unsure what to do now.
I'm not too familiar with how the data transfer is actually handled internally, but if I'm not mistaken all of my data is basically added onto a stream? If that's the case, do I need to worry about how much data I write to that stream at once? Does it make a difference at all whether I use multiple write-calls with small fragments as opposed to one write-call with a large fragment? Are there any guidelines for this?
Simply stream the data to the client without artificial packetization. Reenable nagling, this is not a scenario that calls for disabling it. It will cause small inefficiencies to have it disabled.
Typical write buffer sizes would be 4KB and above.
The client can issue read calls to the network one after the other. After each successful read the client will have a new estimation for the current progress that is quite accurate. Typically, there will be one succeeding read call for each network packet received. If the incoming rate is very high then multiple packets tend to be coalesced into one read. That's not ap roblem.
If that's the case, do I need to worry about how much data I write to that stream at once?
No. Just keep a write call outstanding at all times.

WSAECONNABORTED when using recv for the second time

I am writing a 2D multiplayer game consisting of two applications, a console server and windowed client. So far, the client has a FD_SET which is filled with connected clients, a list of my game object pointers and some other things. In the main(), I initialize listening on a socket and create three threads, one for accepting incoming connections and placing them within the FD_SET, another one for processing objects' location, velocity and acceleration and flagging them (if needed) as the ones that have to be updated on the client. The third thread uses the send() function to send update info of every object (iterating through the list of object pointers). Such a packet consists of an operation code, packet size & the actual data. On the client I parse it, by reading first 5 bytes (the opcode and packet size) which are received correctly, but when I want to read the remaining part of the packet (since I now know the size of it), I get a WSAECONNABORTED (error code 10053). I've read about this error, but can't see why it occurs in my application. Any help would be appreciated.
The error means the system closed the socket. This could be because it detected that the client disconnected, or because it was sending more data than you were reading.
A parser for network protocols typcally needs a lot of work to make it robust, and you can't tell how much data you will get in a single read(), e.g. you may get more than your operation code and packet size in the first chunk you read, you might even get less (e.g. only the operation code). Double check this isn't happening in your failure case.

Resources