How can I get the size of a resource without actually downloading it? - http

So I'm on very constrained bandwidth where I am right now and I clicked a link to a pdf tutorial for something and Chrome began to download it and I was watching the size spiral upward from 20Kb past 5Mb and decided to stop it. How do I know it's not a 4Gb pdf?? Ridiculous, I know.
But I started thinking, surely there must be a way I can simply request the size of the resource to check before downloading. Perhaps some sort of cURL request?
Does anyone know a way?

You could try using the HTTP HEAD method. This should get you the headers of the document without the body. This might have the content length in it.
Or you could send an HTTP Range request header with a GET request. See section 14.35.2 in this document. Range headers look like:
Range: 1-20000
which would request the first 20,000 bytes (octets) of a document. If the document is less than 20,000 bytes, you would get the whole document.
The only problem is that the server might not support the Range header, in which case it will send a 200 status instead of 206. In that case you can just reset the connection if you don't want to risk burning bandwidth on a 5Gb document.

Related

What size does the server have to give to each response chunk?

If I wanted to configure my personal server so that the response for a certain request is set according to the chunk rules: what size should each of the server response chunk have?
For example, let's say that the chunked response is a long HTML page or a file.
How would you behave in these two cases?
From the RFC:
This allows dynamically produced content to be transferred...
In other words: Transfer-Encoding: chunked is needed when the length of content is unknown.
The length of your content may be as big as 10Tb... but also it can be as small as 10 bytes. It doesn't matter. The chucks' sizes depend solely on the algorithms you are using to generate them and to read then.
Let's say you generate a stream of messages of different lengths, one character per second. In this case you can decide to send one byte chucks to the client. This way the client will be able to use the data as soon as it arrives. But if your client have no use for partial messages, then you probably should save the bandwidth and send a chunk at the moment you've finished generating the next message. And again it doesn't matter how big or small the message is. It can be 2 characters or it can be 1000.
On second thought, there are some use cases for Transfer-Encoding: chunked with the data of known size. But then your question becomes to broad to answer. It depends on your client code, server code, network conditions, data properties, desired user experience, etc.
And if by any chance you are asking about optimal size from the network perspective, then just send the whole file - that the best bet. And support Content-Range on your server instead of Transfer-Encoding: chunked.

How to send chunks of video for streaming using HTTP protocol?

I am creating an app which uses sockets to send data to other devices. I am using Http protocol to send and receive data. Now the problem is, i have to stream a video and i don't know how to send a video(or stream a video).
If the user directly jump to the middle of video then how should i send data.
Thanks...
HTTP wasn't really designed with streaming in mind. Honestly the best protocol is something UDP-based (SCTP is even better in some ways, but support is sketchy). However, I appreciate you may be constrained to HTTP so I'll answer your question as written.
I should also point out that streaming video is actually quite a deep topic and all I can do here is try to touch on some of the approaches that you might want to investigate. If you have control of the end-to-end solution then you have some choices to make - if you only control one end, then your choices are more or less dictated by what's available at the other end.
If you only want to play from the start of the file then it's fairly straightforward - make a standard HTTP request and just start playing as soon as you've buffered up enough video that you can finish downloading the file before you catch up with your download rate. You don't need any special server support for this and any video format will work.
Seeking is trickier. You could take the approach that sites like YouTube used to take which is to simply not allow the user to seek until the file has downloaded enough to reach that point in the video (or just leave them looking at a spinner until that point is reached). This is not the user experience that most people will expect these days, however.
To do better you need to be in control of the streaming client. I would suggest treating the file in chunks and making byte range requests for one chunk at a time. When the user seeks into the middle of the file, you can work out the byte offset into the file and start making byte range requests from that point.
If the video format contains some sort of index at the start then you can use this to work out file offsets - so, your video client would have to request at least enough to get the index before doing any seeking.
If the format doesn't have any form of index but it's encoded at a constant bit rate (CBR) then you can do an initial HEAD request and look at the Content-Length header to find the size of the file. Then, if the use seeks 40% of the way through the video, for example, you just seek to 40% of the way through the encoded frames. This relies on knowing enough about the file format that you can calculate an appropriate seek point so that you can identify framing information and the like (or at least an encoding format which allows you to resynchonise with both the audio and video streams even if you jump in at an arbitrary point in the file). This approach might also work with variable bit rate (VBR) as long as the format is such that you can recover from an arbitrary seek.
It's not ideal but as I said, HTTP wasn't really designed for streaming.
If you have control of the file format and the server, you could make life easier by making each chunk a separate resource. This is how Apple HTTP live streaming and Microsoft smooth streaming both work. They need tool support to pre-process the video, and I don't know if you have control of the server end. Might be worth looking into, however. These also do more clever tricks such as allowing a client to switch between multiple versions of the stream encoded at different bit rates to cope with differences in bandwidth.

Pre flush head tag with gzip support

http://developer.yahoo.com/performance/rules.html
There it is given it is good to preflush the head tag .
But I have a question will it help while using gzip ? (I am using apache2).
I think full document will get gziped at one shot and then send to the client.
or is it also possible to have gzip as well as pre-flush the head tag
EDITED
The original version of this question suggested we were dealing with HTTP headers rather than the <head> section on an HTML document. I will leave my original answer below, but it actually has no relevance to this specific question.
To answer the question about pre-flushing the <head> section of a document - while it would be possible to do this in combination with gzip, it is probably not possible without more granular control over the gzip process than Apache affords. It is possible to break a gzipped stream into chunks that can be decompressed on their own (see this) but if there is a way to control Apache's gzip implementation to such a degree then I am not aware of it.
Doing so would likely decrease the efficacy of the gzip, making the compressed size larger, and would only be worth doing when the <head> of a document was particularly large, say, greater than 10KB (this is a somewhat arbitrary value I arrived at by reading about how gzip works under the bonnet, and should definitely not be taken as gospel).
Original answer, relating to the HTTP headers:
Purely from the viewpoint of the HTTP protocol, rather than exactly how you would implement it on an Apache based server, I can't see any reason why you can't preflush the headers and also use gzip to compress the body. Keeping in mind that fact that the headers are never gzipped (if they were, how would the client know they had been?), the transfer encoding of the content should have no effect on when you send the headers.
There are, however a couple of things to keep in mind:
Once the headers have been sent, you can't change your mind about the transfer encoding. So if you send the headers which state that the body will be gzipped, then realise that your body is only 4 bytes, your would still have to gzip it anyway, which would actually increase the size of the body. This probably wouldn't be a problem unless you were omitting the Content-Length: header which while possible, is bad practice as it means you cannot use persistent connections. This leads on to the next point...
You cannot send a Content-Length: header in this secenario. This is because you don't know what the size of the body is until you have compressed it, by which time it is ready to send, so you are not really (from the server's point of view) preflushing the headers, even if you do send them seperately before you start to send the body.
If it takes you a long time time to compress the body of the message (slow/heavily loaded server, very large body etc etc), and you don't start the compression until after you have sent the headers, there is a risk the client may time out waiting for the rest of the response. This depends entirely on the client, but the are so many HTTP client implementations out there that this possibility cannot be totally discounted.
In short, yes it is possible to do it, but there is no catch-all, "Yes, do it" or "No, don't do it" answer - whether you would do it depends on each request and the nature of it's response.

HTTP response was too large: 10485810. The limit is: 10485760

i have written an online brainfuck interpreter ..!! the problem is when i take the text input , it gives an error !!...
HTTP response was too large: 10485810. The limit is: 10485760.
it seems the max limit of gae is 1mb.. how can i get around it !1
Look again. The limit is 10 MiB.
This is not a limitation in the HTTP protocol, so the limitation is in the server platform that you are using (which you haven't specified in your question).
That's more data that you would reasonably send to the browser, so you clearly have an eternal loop that sends data until the buffer is full.
You can get around the limit by turning off buffering, but that will not remove the problem. Instead your code will just loop until the browser crashes from the huge response.
Optimise your Interpreter.
Whatever BF input you had, you really should not exceed the 10 MB response limit.

implementing a download manager that supports resuming

I intend on writing a small download manager in C++ that supports resuming (and multiple connections per download).
From the info I gathered so far, when sending the http request I need to add a header field with a key of "Range" and the value "bytes=startoff-endoff". Then the server returns a http response with the data between those offsets.
So roughly what I have in mind is to split the file to the number of allowed connections per file and send a http request per splitted part with the appropriate "Range". So if I have a 4mb file and 4 allowed connections, I'd split the file to 4 and have 4 http requests going, each with the appropriate "Range" field. Implementing the resume feature would involve remembering which offsets are already downloaded and simply not request those.
Is this the right way to do this?
What if the web server doesn't support resuming? (my guess is it will ignore the "Range" and just send the entire file)
When sending the http requests, should I specify in the range the entire splitted size? Or maybe ask smaller pieces, say 1024k per request?
When reading the data, should I write it immediately to the file or do some kind of buffering? I guess it could be wasteful to write small chunks.
Should I use a memory mapped file? If I remember correctly, it's recommended for frequent reads rather than writes (I could be wrong). Is it memory wise? What if I have several downloads simultaneously?
If I'm not using a memory mapped file, should I open the file per allowed connection? Or when needing to write to the file simply seek? (if I did use a memory mapped file this would be really easy, since I could simply have several pointers).
Note: I'll probably be using Qt, but this is a general question so I left code out of it.
Regarding the request/response:
for a Range-d request, you could get three different responses:
206 Partial Content - resuming supported and possible; check Content-Range header for size/range of response
200 OK - byte ranges ("resuming") not supported, whole resource ("file") follows
416 Requested Range Not Satisfiable - incorrect range (past EOF etc.)
Content-Range usu. looks like this: Content-Range: bytes 21010-47000/47022, that is bytes start-end/total.
Check the HTTP spec for details, esp. sections 14.5, 14.16 and 14.35
I am not an expert on C++, however, I had once done a .net application which needed similar functionality (download scheduling, resume support, prioritizing downloads)
i used microsoft bits (Background Intelligent Transfer Service) component - which has been developed in c. windows update uses BITS too. I went for this solution because I don't think I am a good enough a programmer to write something of this level myself ;-)
Although I am not sure if you can get the code of BITS - I do think you should just have a look at its documentation which might help you understand how they implemented it, the architecture, interfaces, etc.
Here it is - http://msdn.microsoft.com/en-us/library/aa362708(VS.85).aspx
I can't answer all your questions, but here is my take on two of them.
Chunk size
There are two things you should consider about chunk size:
The smaller they are the more overhead you get form sending the HTTP request.
With larger chunks you run the risk of re-downloading the same data twice, if one download fails.
I'd recommend you go with smaller chunks of data. You'll have to do some test to see what size is best for your purpose though.
In memory vs. files
You should write the data chunks to in memory buffer, and then when it is full write it to the disk. If you are going to download large files, it can be troublesome for your users, if they run out of RAM. If I remember correctly the IIS stores requests smaller than 256kb in memory, anything larger will be written to the disk, you may want to consider a simmilar approach.
Besides keeping track of what were the offsets marking the beginning of your segments and each segment length (unless you want to compute that upon resume, which would involve sort the offset list and calculate the distance between two of them) you will want to check the Accept-Ranges header of the HTTP response sent by the server to make sure it supports the usage of the Range header. The best way to specify the range is "Range: bytes=START_BYTE-END_BYTE" and the range you request includes both START_BYTE and byte END_BYTE, thus consisting of (END_BYTE-START_BYTE)+1 bytes.
Requesting micro chunks is something I'd advise against as you might be blacklisted by a firewall rule to block HTTP flood. In general, I'd suggest you don't make chunks smaller than 1MB and don't make more than 10 chunks.
Depending on what control you plan to have on your download, if you've got socket-level control you can consider writing only once every 32K at least, or writing data asynchronously.
I couldn't comment on the MMF idea, but if the downloaded file is large that's not going to be a good idea as you'll eat up a lot of RAM and eventually even cause the system to swap, which is not efficient.
About handling the chunks, you could just create several files - one per segment, optionally preallocate the disk space filling up the file with as many \x00 as the size of the chunk (preallocating might save you sometime while you write during the download, but will make starting the download slower), and then finally just write all of the chunks sequentially into the final file.
One thing you should beware of is that several servers have a max. concurrent connections limit, and you don't get to know it in advance, so you should be prepared to handle http errors/timeouts and to change the size of the chunks or to create a queue of the chunks in case you created more chunks than max. connections.
Not really an answer to the original questions, but another thing worth mentioning is that a resumable downloader should also check the last modified date on a resource before trying to grab the next chunk of something that may have changed.
It seems to me you would want to limit the size per download chunk. Large chunks could force you to repeat download of data if the connection aborted close to the end of the data part. Specially an issue with slower connections.
for the pause resume support look at this simple example
Simple download manager in Qt with puase/ resume support

Resources