Server sends response without fully receiving the request - http

If browser sends big http request (file upload),
and server notices that the file is bigger than the server could handle.
The server sends some error message back without recieving the whole request.
Are all (or any) browsers able to read this server response, or they will wait with reading the response if they are not ready with sending the request?
http protocol is "request - response".
Does it mean the request must be completed before browser starts to wait for response?
Thanks

The first line in an HTTP server response indicates whether the client request was successful or not, and why. The status is given with a three-digit server response code (also known as a status code) and a descriptive message.
Status codes are usually generated by Web servers, but they might also be generated by CGI scripts that bypass the server's precooked headers and supply their own. Status codes are grouped as follows:
Code Range Response Meaning
100-199 Informational
200-299 Client request successful
300-399 Client request redirected, further action necessary
400-499 Client request incomplete
500-599 Server errors
HTTP defines only a few specific codes in each range, although servers may define their own as needed. If a client receives a code that it does not recognize, it should understand its basic meaning from its numerical range. While most Web browsers handle codes in the 100-, 200-, and 300- range silently, some error codes in the 400- and 500- range are commonly reported back to the user (e.g., "404 Not Found").

Related

HTTP in simple terms

I came across the term HTTP. I have done some research and wanted to ensure that I correctly understood the term.
So, is it true that HTTP, in simple words, a letter containing information in the language that both client and server can understand.
Then, that letter is sent to the server thanks to TCP/IP which serves as a car that takes that letter to the server.
Then, after the letter is delivered to the server, the server reads the content of the letter and if it is GET request, the server takes the necessary data and ATTACHES that data to the letter and sends back to the client via again TCP/IP. But if it was POST request then the client ATTACHES the DATA to the letter and sends it to the server so that it saves that data in the database.
Is that true?
Basically, it is true.
However, the server can decide what to do if it is a GET or POST or any other request(it doesn't need to e.g. append it to a file).
I will show you some additional information/try to explain it in my words:
TCP is another communication protocol protocol. It allows a client to open a connection to a server and they can communicate afterwards.
HTTP(hyper text transfer protocol) builds up on TCP.
At first, the client opens a connection to the server.
After that, the client sends the HTTP Request. The first line contains the type of the request, the path and the version. For example, it could be GET / HTTP/1.1.
The next part of the request contains the Request parameters. Every parameter is a line. The parameters are sent like the following: paramName: paramValue
This part of the request ends with an empty line.
If it is a POST Request, query parameters are added next. If it is a GET Request, these query parameters are added with the path(e.g. /index.html?paramName=paramValue)
After rescieving the Request, the server sends a HTTP Response back to the client.
The first line of the response contains the HTTP version, the status code and the status message. For example, it could be HTTP/1.1 200 OK.
Then, just like in the request, the response parameters are following. For example Content-Length: 1024.
The response parameters also end with an empty line.
The last part of the response is the body/content. For example, this could be the HTML code of the website you are visiting.
Obviously, the length of the content/body of the response has to match the Content-Length parameter(in bytes).
After that, the connection will be closed(normally). If the client to e.g. request resources, it will send another request. The server has NO POSSIBILITY to send data to the client after that unless the client sends another request(websockets can bypass this issue).
GET is meant to get the content of a site A web browser will send a GET request if you type in a URL. POST can be used to update a site but in fact, the server can decide that. POST can be also used if the server doesn't want query parameters to be shown in the address bar.
There are other methods like PATCH or DELETE that are used by some APIs.
Some important status codes (and status messages) are:
200 OK (everything went well)
204 No content (like ok but there is no body in the response)
400 Bad Request (something is wrong with the Request)
404 Not found (the requested file(the path) was not found on the server)
500 Internal server error (An error occured while processing the request)
Every status code beginning with 1 is related to inform the client of something.
If it is starting with 2, everything went right.
Status code beginning with 3 forward the client to another site.
If it starts with 4, there is a error on the client side.
Codes starting with 5 represent an error that occured on the server side.
TCP is a network protocol that establishes a connection with the server over a network (or the Internet) and allows two-way communication. The HTTP will traffic inside this TCP tunnel. TCP is a very useful protocol that helps keep things sane, it ensures data packets are read in the correct order and that packets that went missing during transmission are sent again.
Sometimes there will be another protocol layer between HTTP and TCP, called SSL. It is responsible for encrypting the data that traffics over TCP, so that it is transmitted safely over unsafe networks. This is know as HTTPS, and is just HTTP but using this additional layer.
Although almost always true, HTTP doesn't necessarily uses TCP. UPnP requests use HTTP over UDP, a network protocol that uses standalone packets instead of a connection.
HTTP is a plain text protocol, meaning it's designed in such a way that a human can understand it without using any tools. This is very convenient for learning.
If you're using Firefox or Chrome, you can press Ctrl-Shift-C to open the Developer Tools, and under the Network tab you will see every HTTP request your browser is making, see exactly what's the request, what the server answered etc, and get a better view of how this protocol works.
Explaining it in details is... too extensive for this answer. But as you will see it's not that complicated.

Does HTTP response "200 OK" give a guarantee that the document has been received by the machine who generated the HTTP request?

I have two machines, A and B.
A sends an HTTP request to B and asks for some document.
B responds back and sends the requested document and gives a 200 OK message, but machine A is complaining that the document is not received because of a network failure.
Does HTTP code 200 also work as acknowledgment that the document is received?
Does the HTTP 200 code also work as an acknowledgment that document has been received?
No. Not at all.
It is not even a guarantee that the document was completely transmitted.
The response code is in the first line of the response stream. The server could fail, or be disconnected from the client anywhere between sending the first line and the last byte of the response. The server may not even know this has happened.
In fact, there is no way that the server can know if the client received a complete (or partial) HTTP response. There is no provision for an acknowledgment in the HTTP protocol.
Now you could implement an application protocol over the top of HTTP in which the client is required to send a second HTTP request to the server to say "yes, I got the document". But this would involve some "application logic" implemented in the user's browser; e.g. in Javascript.
Absolutely not.
HTTP 200 is generated by the server, and only means that it understood the request and thinks it is able to fulfill it (e.g. the file is actually there).
All sorts of errors may occur during the transmission of the full response document (network connection breaking, packet loss, etc) which will not show up in the HTTP response, but need to be detected separately.
A pretty good guide to the HTTP protocol is found here: http://blog.catchpoint.com/2010/09/17/anatomyhttp/
You should make a distinction between the HTTP protocol and the underlying stream transport protocol, which should be reliable for HTTP purposes. The stream transport protocol will ACKnowledge all data transmission, including the response, so that both ends of exchange will affirm that the data is transmitted correctly. If the transport stream fails, then you will get a 'network failure' or similar error. When this happens, the HTTP protocol cannot continue; the data is no longer reliable or even complete.
What a 200 OK message means, at the HTTP level, is that the server has the document you're after and is about to transmit it to you. Normally you will get a content-length header as well, so you will be able to ascertain if/when the body is complete as an additional check on top of the stream protocol. From the HTTP protocol perspective, a response receives no acknowledgement, so once a response has been sent there is no verification.
However, as the stream transport is reliable, the act of sending the response will either be successful or result in an error. This does verify whether the document has been received by the network target (as noted by TripeHound, in the case of non-direct connection, e.g. a proxy, this is not a guarantee of delivery to the final target).
It's very simple to see that the 200 OK response code can't be a guarantee of anything about the response document. It's sent before the document is transmitted, so only a violation of causality could allow it to be dependent on successful reception of the document. It only serves as an indicator that the request was received properly and the server believes that it's able to fulfill the request. If the request requires extra processing (e.g. running a script), rather than just returning a static document, the response code should generally be sent after this has been completed, so it's normally an indicator that this was successful (but there are situations where this is not feasible, such as requests with persistent connections and push notifications -- the script could fail later).
On a more general level, it's never possible to provide an absolute guarantee that all messages have been received in any protocol, due to the Two Generals Problem. No acknowledgement system can get around this, because at some point there has to be a last acknowledgement; there's no way to know if this is received successfully, because that would require another acknowledgement, contradicting the premise that it was the last one.
HTTP is designed with an awareness of the possibility of various sorts of "middleboxes" - proxies operating with or without the knowledge of the client.
If there is a proxy involved, then even knowing that the server had transmitted all the data and recieved an normal close connection would not tell you anything about whether the document has been received by the machine who generated the HTTP request.
A sends a request to B. There may be all kinds of obstacles in the way that prevent the request from reaching B. In the case of https, the request may be reaching B but be rejected and it counts as if it hadn't reached B. In all these cases, B will not send any status at all.
Once the request reaches B, and there are no bugs crashing B, and no hardware failure etc. B will examine the request and determine what to do and what status to report. If A requested a file that is there and A is allowed access, B will start sending a "status 200" together with the file data.
Again all kinds of things can go wrong. A may receive nothing, or the "status 200" with no data or incomplete data etc. (By "receive" I mean that data arrives on the Ethernet cable, or through WiFi).
Usually the user of A will use some library that handles the ugly bits. With some decent library, the user can expect that they either get some error, or a status complete with the corresponding data. If a status 200 arrives at A with only half the data, the user will (depending on the design of the library) receive an error, not a status, and definitely not a status 200.
Or you may have a library that reports the status 200 and tells you "here's the first 2,000 bytes", "here's the next 2,000 bytes" and so on, and at some point when things go wrong, you might be told "sorry, there was an error, the data is incomplete".
But in general, the case that the user gets a status 200, and no data, will not happen.

How browser maps the web response back to request?

Say i make the web request(www.amazon.com) to amazon web server through browser. Browser makes the connection with Internet through Internet service providers.
Request reaches to amazon server which process it and send back the response. Two questions here :-
Does Amazon server makes new connection with internet to send the response back or incoming request(initiated by me) waits on socket till amazon process the response ?
Once my browser receives the response how does it map the response(sent from amazon) back to particular request . I believe there must be some unique identifier like
requestId must be present in response through which browser must be mapping to request. Is that correct ?
Does Amazon server makes new connection with internet to send the response back or incoming request(initiated by me) waits on socket
till amazon process the response ?
It uses the same connection. Most of the time it's not even possible to connect back to a web browser due to firewall restrictions or Network Address Translation (NAT).
Once my browser receives the request how does it map the response(sent from amazon) back to particular request . I believe
there must be some unique identifier like requestId must be present in
response through which browser must be mapping to request. Is that
correct ?
It receives the response on the same socket. So the socket is the identifier. If HTTP2 multiplexing is used, then each multiplexed stream has a stream identifier, which is used to map the response back to the request.
The client opens a TCP-connection to the server, sends an HTTP-request and the server sends the response using the same connection. So, the browser knows from the connection that the response belongs to a specific request. This applies to basic HTTP 1.
This has to be distinguished from the programming model of an AJAX web application which is asynchronous and not synchronous. The application does not actively wait for a response. It is instead triggered later when the response arrives. The connection handling described above is what happens "under the hood".
Back to the connection handling: There are optimizations of HTTP that make things more complicated. HTTP 1.1 has a feature called "keep alive" and HTTP 2 goes further into this direction. The idea is to send more data over a single TCP-connection because establishing a TCP-connection is expensive (-> three way handshake, slow start). So, multiple requests and responses are sent over a single TCP-connection. Your question arises again in case of this optimization. If e. g. there is a sequence of requests A, B and a sequence of corresponding responses B, A within a single HTTP-connection how does the browser know the request a response belongs to? HTTP 2 introduces the concept of streams (RFC 7540, section 5):
A single HTTP/2 connection can contain multiple concurrently open
streams, with either endpoint interleaving frames from multiple
streams.
The order in which frames are sent on a stream is significant.
Streams are identified by an integer.
So, the stream identifier and the order within a stream can be used by the browser to find out the request a response belongs to.
HTTP 2 introduces another interesting feature which is called "push". The client can proactively send resources to the client that the client has not even requested. So, resources like e. g. images can be already sent when the HTML is requested avoiding another communication roundtrip.
HTTP uses Transfer Control Protocol. This is how it happens-
Does Amazon server makes new connection with internet to send the response back or incoming request(initiated by me) waits on socket till amazon process the response ?
No. Most browsers use HTTP 1.1 so the connection between client and server is established only once until closed (Persistent connection).
Once my browser receives the request how does it map the response(sent from amazon) back to particular request . I believe there must be some unique identifier like requestId must be present in response through which browser must be mapping to request. Is that correct ?
There is a protocol(HTTP) on how the messages are exchanged. HTTP dictates that responses must arrive in the order they were requested. So it goes like-
Request;Response;Request;Response;Request;Response;...
And there is also a specific format of HTTP request (from your browser- HTTP client) and HTTP response message (from amazon HTTP server). There are response status codes that let the browser know if their request has been succeeded, otherwise tell the errors.
A few sample codes-

HTTP server detecting a broken network connection from a HTTP client

I have an web application in which after making a HTTP request to the server, the client quits ( or network connection is broken) before the response was completely received by the client.
In this scenario the server side of the application needs to do some cleanup work. Is there a way built into HTTP protocol to detect this condition. How does the server know if the client is still waiting for the response or has quit?
Thanks
Vijay Kumar
No, there is nothing built in to the protocol to do this (after all, you can't tell whether the response has been received by the client itself yet, or just a downstream proxy).
Just have your client make a second request to acknowledge that it has received and stored the original response. If you don't see a timely acknowedgement, run the cleanup.
However, make sure that you understand the implications of the Two Generals' Problem.
You might have a network problem... usualy, when you send a HTTP request to the server, first you send headers and then the content of the POST (if it is a post method). Likewise, the server responds with the headers and document body. The first line in the header is the status. Usually, status 200 is the success status, if you get that, then there should be no problem getting the rest of the document. Check this for details on the HTTP response status headers http://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html
LE:
Sorry, missread your question. Basically, you don't have a trigger for when the user disconnects. If you use OOP, you could use the destructor of a class to clean whatever it is you need to clean.

How do I report an error midway through a chunked http repsonse without closing the connection?

I have an HTTP server that returns large bodies in response to POST requests (it is a SOAP server). These bodies are "streamed" via chunking. If I encounter an error midway through streaming the response how can I report that error to the client and still keep the connection open? The implementation uses a proprietary HTTP/SOAP stack so I am interested in answers at the HTTP protocol level.
Once the server has sent the status line (the very first line of the response) to the client, you can't change the status code of the response anymore. Many servers delay sending the response by buffering it internally until the buffer is full. While the buffer is filling up, you can still change your mind about the response.
If your client has access to the response headers, you could use the fact that chunked encoding allows the server to add a trailer with headers after the chunked-encoded body. So, your server, having encountered the error, could gracefully stop sending the body, and then send a trailer that sets some header to some value. Your client would then interpret the presence of this header as a sign that an error happened.
Also keep in mind that chunked responses can contain "footers" which are just like HTTP headers. After failing, you can send a footer such as:
X-RealStatus: 500 Some bad stuff happened
Or if you succeed:
X-RealStatus: 200 OK
you can change the status code as long as response.iscommitted() returns false.
(fot HttpServletResponse in java, im sure there exists an equivalent in other languages)

Resources