I am having difficulty understanding the HTTP request and response flow. I am working with a system where I can "hijack" incoming HTTP requests and give my own response. The problem is that some types of GET requests seem to assume that all the data is sent back in the first response.
For instance, JPEG image requests, no matter the size (my tests include 0-20 MB JPEG files), seem to assume that the entire data is sent in the first response. Even if I don't send any data and explicitly set the range header to 0, I never get a follow-up request from the client asking for the data.
For other content types, such as mp4 video, the client seems perfectly fine with getting a response that has only header information and no data, and it then sends a new request explicitly asking for byte range 0-.
Is there some kind of agreement between the client and server that some types should be sent back in one response while others can be split across a number of requests?
Related
Does every HTTP request need to be paired with a response? When you do some POST or DELETE actions, my understanding is that sometimes you don't need to send back data. I've always been told to send back an empty object, but is that necessary? Also, is sending a status code considered a response?
Q1: Does every HTTP request need to be paired with a response?
Yes, unless the client cancels the request. Strictly speaking, one HTTP request is paired with one or more HTTP responses. According to RFC7231:
A server listens on a connection for a request, parses each message received, interprets the message semantics in relation to the identified request target, and responds to that request with one or more response messages.
Q2: When you do some POST or DELETE actions, my understanding is that sometimes you don't need to send back data. I've always been told to send back an empty object, but is that necessary?
It's NOT necessary to send back an empty object (payload). According to RFC7230, the response payload is not required:
A server responds to a client's request by sending one or more HTTP response messages, each beginning with... and finally a message body containing the payload body (if any).
However, although you don't have to "send back data", you still need to send back a message: the HTTP response status code and any necessary response headers.
Q3: is sending a status code considered a response?
Yes. Theoretically, a minimal HTTP response can consist of nothing but the HTTP protocol version, the status code and the status code's textual phrase.
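As a rough sketch of that minimal response: the bytes on the wire can be just a status line, any headers, and the blank line that ends the header section. The helper name `build_response` is made up for this illustration and does not come from any library.

```python
def build_response(status_code, reason, headers=None):
    """Serialize a bodiless HTTP/1.1 response (hypothetical helper)."""
    lines = [f"HTTP/1.1 {status_code} {reason}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line ends the header section; no body follows.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_response(204, "No Content"))
# b'HTTP/1.1 204 No Content\r\n\r\n'
```

A 204 is used here only because it is a status that never carries a body; a DELETE handler could return it instead of an empty object.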
How should proxies / gateways behave when HTTP servers send an HTTP response where the data size exceeds content-length?
Dropping the response as non-compliant with the RFC is one way to go, but there seem to be quite a few implementations/deployments with this behaviour today, and that change would end up breaking those URLs.
Will really appreciate any insights/pointers.
Thanks,
Dev
If the data size exceeds content-length, the remaining bytes on the wire are considered part of the response to the next (pipelined) request.
If there isn't an outstanding request to match with that response, see https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-p1-messaging-26#section-3.3.3:
If the final response to the last request on a connection has been
completely received and there remains additional data to read, a user
agent MAY discard the remaining data or attempt to determine if that
data belongs as part of the prior response body, which might be the
case if the prior message's Content-Length value is incorrect. A
client MUST NOT process, cache, or forward such extra data as a
separate response, since such behavior would be vulnerable to cache
poisoning.
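To make the framing rule concrete, here is a minimal sketch of how a recipient could split what it has read into the declared body and the leftover bytes. The function name `split_by_content_length` is hypothetical, and the sketch assumes a single complete message with a Content-Length header and no Transfer-Encoding.

```python
def split_by_content_length(raw):
    """Split raw bytes into (head, body, extra) using Content-Length.

    Simplified: assumes one complete message with a Content-Length
    header and no Transfer-Encoding.
    """
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header section")
    length = None
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("no Content-Length header")
    # Everything past the declared length is not part of this body.
    return head, rest[:length], rest[length:]


raw = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhelloEXTRA"
head, body, extra = split_by_content_length(raw)
print(body)   # b'hello'
print(extra)  # b'EXTRA'
```

What to do with `extra` is the policy question from the quoted text: treat it as the start of the next pipelined response if a request is outstanding, otherwise discard it rather than forwarding or caching it as a separate response.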
I am trying to create an HTTP parser, and I am interested to know how can I retrieve the URL from HTTP response message text. Does every response contain the URL of the page? Do I need to look at some fields in the header for that or just look at the packet's body? Do I need to save information from previous packets (such as location message)?
Requests have URLs; responses do not. A response is just the data sent back to the client, and it corresponds to the URL of the request that triggered it, so a parser has to remember the request in order to know which URL a response belongs to.
Client sends a POST or PUT request that includes the header:
Expect: 100-continue
The server responds with the status code:
100 Continue
What does the client send now? Does it send an entire request (the previously sent request line and headers along with the previously NOT sent content)? Or does it only send the content?
I think it's the latter, but I'm struggling to find concrete examples online. Thanks.
This should be all the information you need regarding the usage of a 100 Continue response.
In my experience this is really used when you have a large request body. It could be considered roughly complementary to the HEAD method in relation to GET requests: fetch just the header information and not the body (usually) to reduce network load. 100 responses are used to determine whether the server will accept the request based purely on the headers, so that, for example, if you try to send a large POST/PUT request to a non-existent server resource, it will result in a 404 before the entire request body has been sent.
So the short answer to your question is - yes, it's the latter. Although, you should always read the RFC for a complete picture. RFC2616 contains 99% of the information you would ever need to know about HTTP - there are some more recent RFCs that settle some ambiguities and offer a few small extensions to the protocol but off the top of my head I can't remember what they are.
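Since concrete examples are hard to find, here is a small sketch of the exchange over a local socket pair. Both `send_with_expect` and `toy_server` are hypothetical names written for this illustration, and the code is simplified: it assumes each reply arrives in a single `recv` and omits the timeout a real client would use while waiting for the interim response.

```python
import socket
import threading


def send_with_expect(sock, head, body):
    """Send the head, wait for the first reply, then send only the body.

    Returns the first reply. If it is not a 100 Continue, the body is
    never sent.
    """
    sock.sendall(head)
    interim = sock.recv(4096)
    if interim.startswith(b"HTTP/1.1 100"):
        sock.sendall(body)  # body only: no request line, no headers
    return interim


def toy_server(sock, received):
    """Hypothetical peer: reads the head, replies 100, reads the body."""
    data = b""
    while b"\r\n\r\n" not in data:
        data += sock.recv(4096)
    received.append(data)
    sock.sendall(b"HTTP/1.1 100 Continue\r\n\r\n")
    received.append(sock.recv(4096))


client, server = socket.socketpair()
received = []
worker = threading.Thread(target=toy_server, args=(server, received))
worker.start()
head = (b"PUT /upload HTTP/1.1\r\nHost: example.com\r\n"
        b"Content-Length: 5\r\nExpect: 100-continue\r\n\r\n")
interim = send_with_expect(client, head, b"hello")
worker.join()
client.close()
server.close()
print(received[1])  # b'hello'
```

The second thing the toy server receives is the bare content, which is the "latter" case from the question: the request line and headers are not repeated.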
I am setting up a back end API in a script of mine that contacts one of my sites by sending XML to my web server in the form of POST data. This script will be used by many and I want to limit the bandwidth waste for people that accidentally turn the feature on without a proper access key.
I will be denying requests that do not have the correct access key, probably by generating a 403 status code.
Let's say the POST data is ~500kb of data. Does the server receive all 500kb of data when this attempt is made, regardless of the status code?
How about if I made the URL contain the key, as in mydomain/api/123456789, and generated a 403 status on all bad access keys?
Does the POST data still get sent/received regardless, or is it negotiated before the data is finally sent?
Thanks in advance!
Generally speaking, the entire request will be sent, including post data. There is often no way for the application layer to return a response like a 403 until it has received the entire request.
In reality, it will depend on the language/framework used and how closely it is linked to the HTTP server. Section 8.2.2 of the RFC2616 HTTP/1.1 specification has this to say:
An HTTP/1.1 (or later) client sending
a message-body SHOULD monitor the
network connection for an error status
while it is transmitting the request.
If the client sees an error status, it
SHOULD immediately cease transmitting
the body. If the body is being sent
using a "chunked" encoding (section
3.6), a zero length chunk and empty
trailer MAY be used to prematurely
mark the end of the message. If the
body was preceded by a Content-Length
header, the client MUST close the
connection.
So, if you can find a language environment closely linked with the HTTP server (for example, mod_perl), you could do this in a way which does comply with standards.
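As a sketch of what "deciding before the body arrives" could look like in such an environment, the function below picks a reply from the request head alone. The name `early_status` and the key-in-the-last-path-segment layout are assumptions for illustration (the layout follows the mydomain/api/123456789 idea from the question), and the 100 Continue branch assumes the client sent `Expect: 100-continue`.

```python
def early_status(head, valid_keys):
    """Pick a reply from the request head alone (hypothetical helper).

    `head` is everything up to the blank line; the body is not needed.
    """
    request_line = head.split(b"\r\n", 1)[0].decode("ascii")
    method, target, version = request_line.split(" ")
    # Assumption: the access key is the last path segment of the target.
    key = target.rstrip("/").rsplit("/", 1)[-1]
    if key not in valid_keys:
        # Reject and close, so the client can stop transmitting the body.
        return b"HTTP/1.1 403 Forbidden\r\nConnection: close\r\n\r\n"
    return b"HTTP/1.1 100 Continue\r\n\r\n"


head = (b"POST /api/000000000 HTTP/1.1\r\nHost: mydomain\r\n"
        b"Content-Length: 512000\r\nExpect: 100-continue")
print(early_status(head, {"123456789"}))
# b'HTTP/1.1 403 Forbidden\r\nConnection: close\r\n\r\n'
```

Whether this saves any bandwidth still depends on the client: it has to watch for the error status while sending, as the quoted section describes.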
An alternative approach you could take is to make an initial, smaller request to obtain a URL to use for the larger POST. The application can then deny providing the URL to clients without an appropriate key.
Here is a great book about RESTful Web Services, which explains how HTTP works: http://oreilly.com/catalog/9780596529260
You can think of any request as an envelope: the address (URL) is written on the outside along with some properties (HTTP headers), and inside there is some data (if the request was made with the POST method). So, as you might guess, you can't receive an envelope partially.
Oh, I forgot: that applies when you are using HTTP POST with the standard content type "application/x-www-form-urlencoded". If you are uploading files (and so using "multipart/form-data"), Django gives you control over streamed chunks of files using Middleware classes: http://docs.djangoproject.com/en/dev/topics/http/middleware/