I'm writing a mobile application and I'm having difficulty downloading lengthy files from a Yahoo! server that periodically (about every three minutes) aborts the download. The mobile application successfully downloads lengthy files from other servers via the same slow data connection. A dump of the HTTP header from the Yahoo! server is
D/AsyncDownloadFile( 694): header fields: {
    p3p=[policyref="http://info.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV"],
    content-type=[application/octet-stream],
    connection=[close],
    last-modified=[Fri, 06 Aug 2010 14:47:50 GMT],
    content-length=[2000000],
    age=[0],
    server=[YTS/1.17.13],
    accept-ranges=[bytes],
    date=[Sat, 07 Aug 2010 18:53:02 GMT]
}
which shows it sets connection=[close]. A different (non-Yahoo!) server sets connection=[keep-alive] and my mobile application successfully downloads from it. So I have a few questions:
What is causing the Yahoo! server to periodically abort the downloads?
What can I do to avoid the periodic aborting, or failing that, to resume an aborted download?
Are byte-range requests honored when the server sets connection=[close]?
Things I've read:
description of persistent connections,
description of byte range requests
Things I've tried:
I've tried setting the HTTP "Connection" request header to "keep-alive", but the Yahoo! server's response still came back with a "Connection" header set to "close".
I've tried resuming the connection as described in this question (roughly as sketched below). Even though the HTTP header from the Yahoo! server shows that it supports byte-range requests, and the Content-Range responses appear correct (e.g., content-range=[bytes 387924-1999999/2000000]), the resumed transfers incorrectly restart at the beginning of the file. I'm wondering whether that is due to the connection=[close] setting.
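For illustration, the resume attempt looks roughly like the following sketch; the URL, local file name, and offset are placeholders rather than the actual Yahoo! resource, and the client is a plain HttpURLConnection:

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Illustrative resume attempt: request the remaining bytes and only append
    // them if the server actually answers with 206 Partial Content.
    public class ResumeSketch {
        public static void resume(String fileUrl, String localPath, long alreadyDownloaded) throws Exception {
            HttpURLConnection conn = (HttpURLConnection) new URL(fileUrl).openConnection();
            conn.setRequestProperty("Connection", "keep-alive");            // ignored by the Yahoo! server
            conn.setRequestProperty("Range", "bytes=" + alreadyDownloaded + "-");

            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_PARTIAL) {
                // 200 OK here means the server restarted from byte 0; appending
                // the body would corrupt the file, so treat it as "resume unsupported".
                throw new IllegalStateException("Server ignored Range header, status " + status);
            }

            try (InputStream in = conn.getInputStream();
                 FileOutputStream out = new FileOutputStream(localPath, true /* append */)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } finally {
                conn.disconnect();
            }
        }
    }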
Because the subject of HTTP headers is rather new to me, any suggestions or warnings about common pitfalls are welcome.
[UPDATE] I received a reply from Yahoo! tech support explaining that byte-range requests are not supported: "Yahoo! Web Hosting does not support Accept-range header since we work with a pool of servers and each request potentially reaches a different server. You will see connection=[close] in the response header indicating this."
Given that resuming a download is not possible, I asked if it was possible to avoid the periodic aborting of downloads (e.g., Connection=[keep-alive]). The Yahoo! tech support reply: "the process is handled by the system and there is currently no work around."
While these are not the answers I would have liked, I give credit to Yahoo! tech support for fielding questions about the Yahoo! server behavior.
Related
Do downloads use HTTP? How can they resume downloads after they have been suspended for several minutes? Can they request a certain part of the file?
Downloads are done over either HTTP or FTP.
For a single, small file, FTP is slightly faster (though you'll barely notice a difference). For downloading large files, HTTP is faster due to automatic compression. For multiple files, HTTP is always faster due to reusing existing connections and pipelining.
Parts of a file can indeed be requested independently of the whole file, and this is actually how downloads work. This is a process known as 'Chunked Encoding'. A browser requests individual parts of a file, downloads them independently, and assembles them in the correct order once all parts have been downloaded:
In chunked transfer encoding, the data stream is divided into a series of non-overlapping "chunks". The chunks are sent out and received independently of one another. No knowledge of the data stream outside the currently-being-processed chunk is necessary for both the sender and the receiver at any given time.
And according to FTP vs HTTP:
During a "chunked encoding" transfer, the sending party sends a stream of [size-of-data][data] blocks over the wire until there is no more data to send and then it sends a zero-size chunk to signal the end of it.
This is combined with a process called 'Byte Serving' to allow for resuming of downloads:
Byte serving begins when an HTTP server advertises its willingness to serve partial requests using the Accept-Ranges response header. A client then requests a specific part of a file from the server using the Range request header. If the range is valid, the server sends it to the client with a 206 Partial Content status code and a Content-Range header listing the range sent.
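As an illustration of byte serving, a sketch of the exchange on the wire, reusing the byte offsets from the question above (the host and path are made up):

    GET /pub/big-file.bin HTTP/1.1
    Host: www.example.com
    Range: bytes=387924-

    HTTP/1.1 206 Partial Content
    Content-Range: bytes 387924-1999999/2000000
    Content-Length: 1612076
    Accept-Ranges: bytes

    (... the last 1,612,076 bytes of the 2,000,000-byte file ...)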
Do downloads use HTTP?
Yes, especially since major browsers have deprecated FTP support.
How can they resume downloads after they have been suspended for several minutes?
Not all downloads can resume after that long. If the (TCP or SSL/TLS) connection has been closed, another one has to be initiated to resume the download. (If it's HTTP/3 over QUIC, that's another story.)
Can they request a certain part of the file?
Yes. This can be done with Range Requests, but it requires server-side support (especially when the requested resource is generated by a dynamic script).
The other answer mentioning chunked transfer has mistaken it for the underlying mechanism of TCP. Chunked transfer is not designed for resuming partial downloads. It is designed to delimit the message boundary when the Content-Length header is not present and the communicating parties wish to reuse the connection. It is also used when the protocol version is HTTP/1.1 and there is a trailer section (similar to the header section, but coming after the message body). HTTP/2 and HTTP/3 have their own ways to convey trailers.
Even if multiple non-overlapping "chunks" (ranges) of the resource are requested, the response is encapsulated in a multipart/* message.
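To make the distinction concrete, here is roughly what a chunked HTTP/1.1 response looks like on the wire (chunk sizes are hexadecimal; the payload and trailer name are made up). Notice that nothing here identifies byte positions within the resource, which is why chunking cannot be used to resume a download:

    HTTP/1.1 200 OK
    Content-Type: text/plain
    Transfer-Encoding: chunked

    1a
    abcdefghijklmnopqrstuvwxyz
    10
    0123456789abcdef
    0
    Some-Trailer: value

The "1a" and "10" lines give the size of the following chunk, and the zero-size chunk marks the end of the body, optionally followed by trailer fields.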
Are there any general rules on when a website sends out a TCP reset, triggering the Connection reset by peer error?
Like
too many open connections
too high bandwidth use
connected for too long
…?
I'm pretty certain that there is no law governing this and that different websites/web developers have different tastes, but I would be interested in whether there are some general rule sets (from websites, from textbooks on the subject, or from what you have been taught in school/at work) that are mostly followed.
The reason I'm asking, of course, is that I want to avoid being blocked…
I'm downloading some government data that is freely available but lacks an API, so the two official ways to get it are either clicking around in some web GIS a few thousand times or going down the Kafkaesque path of explaining to various levels of clerks the concepts of databases, CSV files, and ZIP files, and that I can't (and wouldn't need to, if they just did what I'm trying to explain to them) simply drive to their agency with a "giant" hard drive. So I'm trying to take the most resource-saving route for everyone involved…
A website does not "send" a "Connection reset by peer" error. That error is generated by the OS kernel on the client side when it receives a TCP reset (RST) for an active connection. There are many reasons a TCP reset might be sent. It might be sent by design as part of some load limit, for example to cap the number of connections from the same IP address within a specific time as a form of DoS protection, to restrict data scraping, or to enforce some kind of fair use. There is no general rule, let alone a law, for these explicit limits.
A TCP reset might also be caused by the application being overloaded, the application crashing, the system running out of resources, and so on.
And a TCP reset will happen if the client writes to a connection which the server already considers closed. This can happen, for example, with HTTP keep-alive: the server may close the connection due to inactivity at any time after the HTTP response was sent. If the client sends a new request on the same connection at the same moment the server closes it, the server will reject the new request (since the connection is closed on its end) and send a TCP RST, causing "connection reset by peer" at the client. The client needs to handle this situation by creating a new connection and sending the request again (provided the request is idempotent, i.e. safe to repeat).
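A minimal client-side sketch of that retry, assuming a plain HttpURLConnection and an idempotent GET (the URL is whatever resource you are fetching; the single-retry policy is an assumption, not a rule):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.SocketException;
    import java.net.URL;

    // Retry an idempotent request once, on a fresh connection, if the reused
    // keep-alive connection was reset by the server ("Connection reset by peer").
    public class RetryOnReset {
        public static byte[] get(String url) throws IOException {
            try {
                return fetch(url);
            } catch (SocketException resetByPeer) {
                // The pooled connection was already closed on the server side;
                // open a new connection and send the same request again.
                return fetch(url);
            }
        }

        private static byte[] fetch(String url) throws IOException {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            try (InputStream in = conn.getInputStream();
                 ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } finally {
                conn.disconnect();
            }
        }
    }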
I am using BizTalk Server 2009 to send EDI messages to my client using AS2. I am able to send messages, but I get an error when sending messages that exceed 5 KB. I checked with the partner whether they impose any restrictions on file size, but they say they exchange files of several GB with other trading partners.
I compared the files that were sent successfully with the one that failed, but found no difference between the two except for the LIN, PIA, QTY and other segments.
I found the error below when I checked the Event Viewer:
The adapter failed to transmit message going to send port "SendTextFile" with URL "http://xxclienturlxx.com:2080/ipnet/as2". It will be retransmitted after the retry interval specified for this Send Port. Details:"The remote server returned an error: (500) Internal Server Error.".
How do I resolve this?
Found the solution..
In BizTalk 2009, disabling "Enable chunked encoding" in Send port Transport type properties did the trick for me...
The reason: to provide large-message support, when the message is larger than 48 KB the HTTP send adapter sends the data to the server in chunks instead of as a single stream.
This post HTTP Send Adapter - Submit to ASP Page Issue helped me to find the solution.
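The BizTalk property itself is just a checkbox on the Send Port transport properties, but the underlying idea applies to any HTTP client: some servers return 500 on chunked requests, and the fix is to send the body with an explicit Content-Length instead of Transfer-Encoding: chunked. A hedged sketch with java.net.HttpURLConnection, not BizTalk/AS2-specific (the URL and content type are placeholders, and real AS2 additionally requires MIME wrapping and signing):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Post a body with a fixed Content-Length rather than chunked transfer
    // encoding, for servers that reject chunked requests.
    public class FixedLengthPost {
        public static int post(String url, byte[] body) throws Exception {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // setFixedLengthStreamingMode => Content-Length header, no chunking.
            // (conn.setChunkedStreamingMode(0) would force chunked encoding instead.)
            conn.setFixedLengthStreamingMode(body.length);
            conn.setRequestProperty("Content-Type", "application/edifact"); // placeholder content type
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        }
    }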
Just switched some downloads over to the Akamai CDN network and I'm seeing some strange stuff in the log files they deliver. A number of entries have the status code 000. When I asked them they said that 000 is the status when the client disconnects without transferring the entire file. Since 000 doesn't appear to be a valid HTTP response code (from the RFC), I have to wonder if that's right.
There's a knowledge base article (requires login) which lists their log values:
Log Delivery Services (LDS): LDS will show a 000 for any 200 or 206 responses with a client abort: the object was served correctly from the origin or edge, but the end-user terminated the connection/transaction before it completed.
This is indeed a custom status because the standard log format doesn't include a field which can indicate a client abort.
000 is a common code to use when no HTTP code was received due to a network error. According to a knowledge base article for Amazon CloudFront, 000 also means that the client disconnected before completing the request for that service.
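For log processing, one reasonable approach is to treat 000 as its own "client abort / no response" bucket rather than as an HTTP error; this is only a sketch, and how you extract the status field depends on your actual log format:

    // Classify a status-code field from a CDN access log. "000" is not an HTTP
    // status; per the Akamai/CloudFront notes above it indicates a client abort
    // or missing response, so bucket it separately from real 4xx/5xx errors.
    public class StatusClassifier {
        public static String classify(String statusField) {
            if ("000".equals(statusField)) {
                return "client-abort-or-no-response";
            }
            int code = Integer.parseInt(statusField);
            if (code >= 200 && code < 300) return "success";
            if (code >= 300 && code < 400) return "redirect";
            if (code >= 400 && code < 500) return "client-error";
            return "server-error";
        }
    }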
It normally means: no valid HTTP response code was received (i.e., the connection failed or was aborted before any data was transferred).
I would guess that there are either network issues or Akamai isn't managing their web servers correctly.
I need a way to detect a missing response to a long running HTTP POST request. This problem arises when the network infrastructure (firewalls, proxies, unplugged cables, etc.) drops the response packets. The server may detect this failure, but the client cannot send additional bytes after the POST to probe the state of the TCP connection. The failure may be limited to a single TCP connection. For example I may be able to subsequently open a new TCP connection to the server.
I'm looking for a solution that still uses HTTP POST and does not change the duration of the server side processing.
Some solutions that I can think of are:
Provide a side-channel interface to retrieve request & response history (sketched below). If the history lists the response as having been sent (presumably resulting in a TCP error) but I have not received it within a reasonable time, I can generate a local error.
Use a custom X- header to request that the server deliver "spurious" 100 Continue provisional responses at a regular interval. If I fail to see an expected 100 Continue or a non-provisional response, I can generate a local error.
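For what it's worth, the first idea would look roughly like this on the client. The /status side channel, its one-line "PENDING"/"SENT" body, the polling interval, and the grace period are all hypothetical; this is only a sketch of the shape of the watchdog:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Hypothetical side-channel probe: while the long POST is still pending on
    // its own connection, periodically ask the server (over a *new* connection)
    // whether it has already sent the response. If it says "SENT" but nothing
    // arrives locally within a grace period, flag the POST response as lost.
    public class ResponseWatchdog implements Runnable {
        private final String statusUrl;            // hypothetical endpoint, e.g. /requests/<id>/status
        private final long graceMillis;
        private volatile boolean responseArrived;  // set by the thread reading the POST response

        public ResponseWatchdog(String statusUrl, long graceMillis) {
            this.statusUrl = statusUrl;
            this.graceMillis = graceMillis;
        }

        public void markResponseArrived() { responseArrived = true; }

        @Override
        public void run() {
            try {
                while (!responseArrived) {
                    if ("SENT".equals(probe())) {
                        Thread.sleep(graceMillis);   // give the response time to arrive
                        if (!responseArrived) {
                            System.err.println("Local error: server reports the response as sent,"
                                    + " but it never arrived on this connection.");
                        }
                        return;
                    }
                    Thread.sleep(5_000);             // poll interval, arbitrary
                }
            } catch (Exception e) {
                // Surface the failure however the application prefers.
                e.printStackTrace();
            }
        }

        private String probe() throws Exception {
            HttpURLConnection conn = (HttpURLConnection) new URL(statusUrl).openConnection();
            try (BufferedReader r = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                return r.readLine();                 // hypothetical body: "PENDING" or "SENT"
            } finally {
                conn.disconnect();
            }
        }
    }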
Is there a state of the art solution for this problem?
It sounds to me like you are using SOAP for something that would be much better done with a stateful connection or a server-side push technology.