Save bandwidth in an HTTP partial download case

I have a special requirement:
Download a big file from a server over HTTP.
If the network turns out to be poor, give up the current download and fetch a smaller file instead.
But in my tests, once I send an HTTP GET to the server, it keeps sending the file continuously even if I only read 1024 bytes. I then try to close the connection after detecting that the bandwidth is low, but the number of bytes actually downloaded is larger than what I requested.
To save bandwidth, I want the server to send no more data than I ask for, but that seems impossible. So what is the actual mechanism to stop the server from sending data when the client only wants part of the file, e.g. 1024 bytes? If I do not disconnect, will the server keep sending data until the whole file has been transferred?
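The two usual mechanisms for this are shown in the minimal Go sketch below: request only the first chunk with a Range header (so a cooperating server sends no more than that), and close the response body early to tear the connection down if the server streams the whole file anyway. The URLs, the 1024-byte probe size and the throughput threshold are placeholders, not something taken from the question.

package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// probe downloads only the first n bytes of url using a Range request and
// returns the observed throughput in bytes per second. It assumes the server
// supports byte ranges (i.e. it answers 206 Partial Content).
func probe(url string, n int64) (float64, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=0-%d", n-1))

	start := time.Now()
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	// Closing the body also aborts the transfer early if the server ignored
	// the Range header and started streaming the whole file.
	defer resp.Body.Close()

	read, err := io.Copy(io.Discard, io.LimitReader(resp.Body, n))
	if err != nil {
		return 0, err
	}
	return float64(read) / time.Since(start).Seconds(), nil
}

func main() {
	bps, err := probe("https://example.com/big.bin", 1024) // placeholder URL
	if err != nil {
		panic(err)
	}
	if bps < 50_000 { // threshold is an assumption, tune it for your case
		fmt.Println("bandwidth low, fall back to the smaller file")
	} else {
		fmt.Println("bandwidth ok, continue with the big file")
	}
}

If the server honours the Range header it replies 206 Partial Content and sends at most those 1024 bytes; if it does not, closing the connection is the only way to stop it, and whatever is already in flight or buffered will still arrive, which matches the extra bytes observed above.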

Related

What are download and upload packets?

Note: My question remains unanswered.
After hours of reading, I'm confused about download/upload speeds.
Suppose I send an HTTP request to fetch an image; every website calls this a "download", but why?
My computer is sending the request and getting a response back, so it is both uploading (sending packets) and downloading (receiving packets). Thus it's not a pure download.
Plus, I read that internet providers prioritize download over upload, but how? Given a packet going from A to B, how can they decide whether it's an upload or a download?
From A's perspective it's an upload, but according to B it's a download...
Where did you read about prioritizing download speed over upload? ISPs can only prioritize traffic by IP (to move it to another channel outside the shaper, or to apply different shaper rules to it), or by shaper rules for UDP, VoIP, or traffic per VLAN/port. Web sites can optimize data for display, for example by serving content cached and compressed, or by offloading uploads to other upstream servers. But you, as a user, upload raw data (an uncompressed set of bytes) to the server. To manage speeds, an ISP would have to inspect every packet from every user, and then it could limit speed either by packet size (which works only about half the time, because files are uploaded across a few requests) or by packet metadata. That is a massive load on the whole infrastructure and, accordingly, a very costly process for a business.

How to download image via gRPC Gateway?

I want to serve files (images) via gRPC Gateway from a gRPC server. Since protocol buffer messages have structure, I don't see how I could make the gateway send the content of the bytes field of the response message instead of the entire JSON-encoded message. Is there a native solution for this, or does one simply have to write a dedicated HTTP muxer to handle these requests?
Rather than sending arbitrarily sized files (probably as bytes), it would probably be better to include a URL to the file in the message and then host/serve the file over HTTP (e.g. from S3, Google Cloud Storage, etc., from which you could generate signed URLs to limit access).
I think the max message size is 2 GB (source?) and the recommendation (Large Data Sets) is to consider alternative techniques once message sizes exceed a few MB.
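A rough Go sketch of that layout, assuming the usual grpc-gateway setup: the gRPC response carries only a URL (a hypothetical "url" field in the response message), and the bytes themselves are served by a plain HTTP handler, or by a signed S3/GCS URL, mounted next to the gateway mux. The paths and directory names are illustrative, not taken from the question.

package main

import (
	"net/http"

	"github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
)

func main() {
	// Gateway mux translating REST calls to gRPC (service registration
	// omitted; the generated Register...HandlerFromEndpoint call goes here).
	gwmux := runtime.NewServeMux()

	// Parent mux: JSON API under /v1/, raw image bytes under /images/.
	mux := http.NewServeMux()
	mux.Handle("/v1/", gwmux)
	mux.Handle("/images/", http.StripPrefix("/images/",
		http.FileServer(http.Dir("./images")))) // or redirect to a signed URL

	// The gRPC handler then returns something like
	// {"url": "https://api.example.com/images/cat.png"} instead of raw bytes.
	http.ListenAndServe(":8080", mux)
}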

Vert.x TCP client dropping packets

I have created a Vert.x TCP server and a corresponding TCP client. The job of the TCP client is to read logs (in gzip format) from a specific directory and continuously upload them to the server. The server then does some processing and persists the data. Because the logs are gzipped, the average size of a log file is 144 KB. My TCP client simulator spawns 4 threads, and each thread creates 100 such log files and uploads them to the Vert.x server. So at the end, I expect to see 400 (= 100 * 4) files uploaded to the server.
But to my surprise, once the simulator finishes, I see that the server has processed only 76 files, and there is no sign of the remaining 324 files. I have run it multiple times, and every time the server processes around 75 files.
From the TCP client log, I see that all 400 files have been uploaded, so it seems the server is dropping some data. But from the server log, I see that only ~76 files have been processed.
Later I reduced the compressed log file size generated by the simulator to 8 KB (each log file now contains fewer log entries), and this time around 398 files are processed on the server side.
It appears to be some configuration issue. Maybe I have to configure the Vert.x server with more resources (instances, a larger receive buffer size, a larger event queue size, etc.). But how would anybody know the limit beyond which data will be dropped?
More precisely, what is the specific configuration required in such a case?
Secondly, is it possible to configure the Vert.x TCP client/server with some sort of acknowledgement mechanism, so that the client uploads the next file only after the previously uploaded file has been processed successfully?
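The usual cause of this symptom is that TCP is a byte stream with no built-in message boundaries: when several gzip files are written back to back, the server's buffer handler sees arbitrary slices of the stream rather than one buffer per file, so files get merged and appear to be dropped. Below is a language-agnostic sketch (written in Go, not Vert.x) of the length-prefix framing plus per-file acknowledgement the question asks about; the function names and the ACK byte are illustrative. On the JVM side, Vert.x's RecordParser (switching fixedSizeMode between the 4-byte header and the body length) is the idiomatic way to do the parsing half of this.

package framing

import (
	"encoding/binary"
	"io"
	"net"
)

// sendFramed writes one length-prefixed record and waits for a 1-byte ACK
// before returning, so at most one file is ever in flight.
func sendFramed(conn net.Conn, payload []byte) error {
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
	if _, err := conn.Write(hdr[:]); err != nil {
		return err
	}
	if _, err := conn.Write(payload); err != nil {
		return err
	}
	var ack [1]byte // block until the server acknowledges this file
	_, err := io.ReadFull(conn, ack[:])
	return err
}

// serveFramed reads length-prefixed records off one connection and
// acknowledges each record after it has been processed.
func serveFramed(conn net.Conn, process func([]byte)) error {
	for {
		var hdr [4]byte
		if _, err := io.ReadFull(conn, hdr[:]); err != nil {
			return err // io.EOF once the client is done
		}
		payload := make([]byte, binary.BigEndian.Uint32(hdr[:]))
		if _, err := io.ReadFull(conn, payload); err != nil {
			return err
		}
		process(payload) // e.g. decompress and persist the log file
		if _, err := conn.Write([]byte{0x06}); err != nil { // ACK
			return err
		}
	}
}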

How is a browser able to resume downloads?

Do downloads use HTTP? How can they resume downloads after they have been suspended for several minutes? Can they request a certain part of the file?
Downloads are done over either HTTP or FTP.
For a single, small file, FTP is slightly faster (though you'll barely notice a difference). For downloading large files, HTTP is faster due to automatic compression. For multiple files, HTTP is always faster due to reusing existing connections and pipelining.
Parts of a file can indeed be requested independently of the whole file, and this is actually how downloads work. This is a process known as 'Chunked Encoding'. A browser requests individual parts of a file, downloads them independently, and assembles them in the correct order once all parts have been downloaded:
In chunked transfer encoding, the data stream is divided into a series of non-overlapping "chunks". The chunks are sent out and received independently of one another. No knowledge of the data stream outside the currently-being-processed chunk is necessary for both the sender and the receiver at any given time.
And according to FTP vs HTTP:
During a "chunked encoding" transfer, the sending party sends a stream of [size-of-data][data] blocks over the wire until there is no more data to send and then it sends a zero-size chunk to signal the end of it.
This is combined with a process called 'Byte Serving' to allow for resuming of downloads:
Byte serving begins when an HTTP server advertises its willingness to serve partial requests using the Accept-Ranges response header. A client then requests a specific part of a file from the server using the Range request header. If the range is valid, the server sends it to the client with a 206 Partial Content status code and a Content-Range header listing the range sent.
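A small Go sketch of that resume flow, assuming the server advertises Accept-Ranges: the client checks how much of the file it already has on disk and asks for the rest with a Range: bytes=<offset>- header, appending to the partial file only when the server answers 206 Partial Content. The URL and file names are placeholders.

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// resume continues downloading url into path, starting from wherever the
// partial file on disk left off.
func resume(url, path string) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}
	offset := info.Size() // bytes we already have

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", offset))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusPartialContent {
		return fmt.Errorf("server did not honour the range request: %s", resp.Status)
	}
	_, err = io.Copy(f, resp.Body) // append only the missing tail
	return err
}

func main() {
	if err := resume("https://example.com/big.iso", "big.iso.part"); err != nil {
		fmt.Println(err)
	}
}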
Do downloads use HTTP?
Yes, especially since major browsers have deprecated FTP support.
How can they resume downloads after they have been suspended for several minutes?
Not all downloads can resume after that long. If the (TCP or SSL/TLS) connection has been closed, another one has to be initiated to resume the download. (If it's HTTP/3 over QUIC, that's another story.)
Can they request a certain part of the file?
Yes. This can be done with Range Requests, but it requires server-side support (especially when the requested resource is generated by a dynamic script).
The other answer mentioning chunked transfer has mistaken it for the underlying mechanism of TCP. Chunked transfer is not designed for resuming partial downloads. It is designed for delimiting the message boundary when the Content-Length header is not present and the communicating parties wish to reuse the connection. It is also used when the protocol version is HTTP/1.1 and there is a trailer section (which is similar to the header section, but comes after the message body). HTTP/2 and HTTP/3 have their own ways of conveying trailers.
Even if multiple non-overlapping "chunks" of the resource are requested, the response is encapsulated in a multipart/* message.
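To illustrate that last point, here is a rough Go sketch that asks for two non-overlapping ranges in a single request and walks the resulting multipart/byteranges response; whether the server actually answers with multipart (rather than a single 206 part, or a plain 200 with the whole file) depends entirely on the server. The URL and the ranges are placeholders.

package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/http"
	"strings"
)

func main() {
	req, err := http.NewRequest("GET", "https://example.com/big.bin", nil)
	if err != nil {
		panic(err)
	}
	// Two non-overlapping ranges in a single request.
	req.Header.Set("Range", "bytes=0-99,200-299")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	mediaType, params, err := mime.ParseMediaType(resp.Header.Get("Content-Type"))
	if err != nil || !strings.HasPrefix(mediaType, "multipart/") {
		fmt.Println("server sent a single part (or the full file):", resp.Status)
		return
	}

	// Each part carries its own Content-Range header.
	mr := multipart.NewReader(resp.Body, params["boundary"])
	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		data, _ := io.ReadAll(part)
		fmt.Printf("%s -> %d bytes\n", part.Header.Get("Content-Range"), len(data))
	}
}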

How to send big data to an HTTP server where the client has limited RAM?

I am designing a temperature and humidity logging system using microcontrollers and embedded C. The system sends the data to the server using the GET method (it could be POST, too) whenever it gathers new data from the sensors. Whenever the power and/or the internet goes down, it logs to an SD card.
I want to send this logged data to the server whenever the internet comes back. When sent separately, each of my requests looks like the following, and it takes about 5 seconds to complete even on my local network.
GET /add.php?userID=0000&mac=000:000:000:000:000:000&id=0000000000&sensor=000&temp=%2B000.000&hum=000.000 HTTP/1.1\r\nHost: 192.168.10.25\r\n\r\n
However, since the available RAM on this microcontroller is very limited (only about 400 bytes), I cannot buffer and send all the logged data in one request.
For a power/internet outage of 2 days, about 3000 data sets are logged. How do I send these to the server quickly?
Well, I think a simple loop will do it. Here is the pseudocode:
foreach (loggedRequest) {
    loadTheRequestFromTheLogFile();
    sendTheDataForThisRequest(); // block here until the server returns a response
    clearMemoryFromPreviousRequest();
}
By waiting for the server's response, you can be sure that the data reached the server and that you never fill up the RAM.
You can also use an inner loop that issues a fixed number of requests and then waits for all of their responses. This will be faster, but you will have to test carefully to make sure you are not hitting the RAM limit.
