HTTP Client Programming - How to know server have sent all the datas - http

I am new to HTTP client and TCP\IP programming, so my question might be vague to experienced persons but please try to answer it.
I am implementing a HTTP client , after sending request to server I am waiting for a read event(Asynchronous socket) and when the read event comes I am extracting the data using read command and storing it in a local buffer.
Here how to know that the server has sent all the data's so that I can start processing the information?
I am confused at this stage

Well the content can be returned all together or in chunks. When the server knows before hand the length of the payload, it will provide the Content-Length header in the response. But sometimes the server does not know the total length of payload before start transmitting, then it uses the chunk transfer.

The response from the server should contain a http-header which has a field named content-length. you can use that length to determine the amount of data the server should send. And you are done receiving data once the server has sent the given amount

Related

Can HTTP/2 client terminate the stream by sending a HEADER frame in bidirectional streaming RPC?

Suppose we have a bidirectional streaming RPC where the client is sending several request messages (i.e. multiple DATA frames) and the server is answering back with several response messages (i.e. multiple DATA frames).
As I understand it, when the RPC is complete, the server will normally send a HEADER frame with the status header as well as possibly some trailer headers like grpc-status and grpc-message to mark the completion of the request/response exchange.
My question is, suppose the server sends a bad response message, is it possible for the client to send the HEADER frame with the grpc-status and grpc-message headers to convey information about the error.
The reason why I'm asking is because in the c++ server code (generated from protobuf defininition), I'm struggling to find a way to get a hold of this last HEADER frame sent by the client to verify the values of the grpc-status and grpc-message headers.
Additionally, after going through the unit tests in the grpc project, it seems like only the server returns the status for the RPC, which further raises doubts.
I was however able to send the HEADER frame out from the client, but based on the above, I'm not certain whether this is the correct behavior even though I was able to do it.
I would appreciate it if someone can clarify this for me as I'm fairly new to HTTP/2 and gRPC.
Additionally, after going through the unit tests in the grpc project, it seems like only the server returns the status for the RPC, which further raises doubts.
Correct! In gRPC, the server is responsible for terminating the RPC with a status and optional trailing metadata. The client never sends a status to the server. The client can indicate it is done sending on the stream without a status (which internally happens by sending an empty data frame with the END_STREAM flag set, but users shouldn't need to be concerned with this detail). The client only sends HEADER frames at the start of the RPC.

HTTP in simple terms

I came across the term HTTP. I have done some research and wanted to ensure that I correctly understood the term.
So, is it true that HTTP, in simple words, a letter containing information in the language that both client and server can understand.
Then, that letter is sent to the server thanks to TCP/IP which serves as a car that takes that letter to the server.
Then, after the letter is delivered to the server, the server reads the content of the letter and if it is GET request, the server takes the necessary data and ATTACHES that data to the letter and sends back to the client via again TCP/IP. But if it was POST request then the client ATTACHES the DATA to the letter and sends it to the server so that it saves that data in the database.
Is that true?
Basically, it is true.
However, the server can decide what to do if it is a GET or POST or any other request(it doesn't need to e.g. append it to a file).
I will show you some additional information/try to explain it in my words:
TCP is another communication protocol protocol. It allows a client to open a connection to a server and they can communicate afterwards.
HTTP(hyper text transfer protocol) builds up on TCP.
At first, the client opens a connection to the server.
After that, the client sends the HTTP Request. The first line contains the type of the request, the path and the version. For example, it could be GET / HTTP/1.1.
The next part of the request contains the Request parameters. Every parameter is a line. The parameters are sent like the following: paramName: paramValue
This part of the request ends with an empty line.
If it is a POST Request, query parameters are added next. If it is a GET Request, these query parameters are added with the path(e.g. /index.html?paramName=paramValue)
After rescieving the Request, the server sends a HTTP Response back to the client.
The first line of the response contains the HTTP version, the status code and the status message. For example, it could be HTTP/1.1 200 OK.
Then, just like in the request, the response parameters are following. For example Content-Length: 1024.
The response parameters also end with an empty line.
The last part of the response is the body/content. For example, this could be the HTML code of the website you are visiting.
Obviously, the length of the content/body of the response has to match the Content-Length parameter(in bytes).
After that, the connection will be closed(normally). If the client to e.g. request resources, it will send another request. The server has NO POSSIBILITY to send data to the client after that unless the client sends another request(websockets can bypass this issue).
GET is meant to get the content of a site A web browser will send a GET request if you type in a URL. POST can be used to update a site but in fact, the server can decide that. POST can be also used if the server doesn't want query parameters to be shown in the address bar.
There are other methods like PATCH or DELETE that are used by some APIs.
Some important status codes (and status messages) are:
200 OK (everything went well)
204 No content (like ok but there is no body in the response)
400 Bad Request (something is wrong with the Request)
404 Not found (the requested file(the path) was not found on the server)
500 Internal server error (An error occured while processing the request)
Every status code beginning with 1 is related to inform the client of something.
If it is starting with 2, everything went right.
Status code beginning with 3 forward the client to another site.
If it starts with 4, there is a error on the client side.
Codes starting with 5 represent an error that occured on the server side.
TCP is a network protocol that establishes a connection with the server over a network (or the Internet) and allows two-way communication. The HTTP will traffic inside this TCP tunnel. TCP is a very useful protocol that helps keep things sane, it ensures data packets are read in the correct order and that packets that went missing during transmission are sent again.
Sometimes there will be another protocol layer between HTTP and TCP, called SSL. It is responsible for encrypting the data that traffics over TCP, so that it is transmitted safely over unsafe networks. This is know as HTTPS, and is just HTTP but using this additional layer.
Although almost always true, HTTP doesn't necessarily uses TCP. UPnP requests use HTTP over UDP, a network protocol that uses standalone packets instead of a connection.
HTTP is a plain text protocol, meaning it's designed in such a way that a human can understand it without using any tools. This is very convenient for learning.
If you're using Firefox or Chrome, you can press Ctrl-Shift-C to open the Developer Tools, and under the Network tab you will see every HTTP request your browser is making, see exactly what's the request, what the server answered etc, and get a better view of how this protocol works.
Explaining it in details is... too extensive for this answer. But as you will see it's not that complicated.

Response sent in chunked transfer encoding and indicating errors happening after some data has already been sent

I am sending large amount of data in my response to the client in chunked transfer encoding format.
How should I be dealing with any errors that occur midway during writing of the response?
I would like to know if there is any HTTP Spec recommended practice regarding this for clients to know that indeed the response is not a successful one but that the server ran into some issue.
Once you have started sending the HTTP headers to the client, you can't send anything else. You have to finish sending the response you intended to send, ie the chunked data and associated headers. If an error occurs midway through that, there is no way to report that error to the client. All you can do is close the connection. Either the client does not receive all of the headers, or it does not receive the terminating 0-length chunk at the end of the response. Either way is enough for the client to know that the server encountered an error during sending.
Chunked responses can use trailer header with some integrity check or post-processing status. This may be better than closing the connection as described in accepted answer because you can have custom error message. However, you only benefit from that if you have control of both server and client.

How does a http client associate an http response with a request (with Netty) or in general?

Is a http end point suppose to respond to requests from a particular client in order that they are received?
What about if it doesn't make sense to in the case of requests handled by cluster behind a proxy or in requests handled with NIO where one request is finished faster than the other?
Is there a standard way of associating a unique id with each http request to associate with the response? How is this handled in clients like http componenets httpclient or curl?
The question comes down to the following case:
Suppose, I am downloading a file from a server and the request is not finished. Is a client capable of completing other requests on the same keep-alive connection?
Whenever a TCP connection is opened, the connection is recognized by the source and destination ports and IP addresses. So if I connect to www.google.com on destination port 80 (default for HTTP), I need a free source port which the OS will generate.
The reply of the web server is then sent to the source port (and IP). This is also how NAT works, remembering which source port belongs to which internal IP address (and vice versa for incoming connections).
As for your edit: no, a single http connection can execute one command (GET/POST/etc) at the same time. If you send another command while you are retreiving data from a previously issued command, the results may vary per client and server implementation. I guess that Apache, for example, will transmit the result of the second request after the data of the first request is sent.
I won't re-write CodeCaster's answer because it is very well worded.
In response to your edit - no. It is not. A single persistent HTTP connection can only be used for one request at once, or it would get very confusing. Because HTTP does not define any form of request/response tracking mechanism, it simply would not be possible.
It should be noted that there are other protocols which use a similar message format (conforming to RFC822), which do allow for this (using mechanisms such as SIP's cSeq header), and it would be possible to implement this in a custom HTTP app, but HTTP does not define any standard mechanism for doing this, and therefore nothing can be done that could be assumed to work everywhere. It would also present a problem with the response for the second message - do you wait for the first response to finish before sending the second response, or try and pause the first response while you send the second response? How will you communicate this in a way that guarantees messages won't become corrupted?
Note also that SIP (usually) operates over UDP, which does not guarantee packet ordering, making the cSeq system more of a necessity.
If you want to send a request to a server while another transaction is still in progress, you will need to create a new connection to the server, and hence a new TCP stream.
Facebook did some research into this while they were building their CDN, and they concluded that you can efficiently have 2 or 3 open HTTP streams at any one time, but any more than that reduces overall transfer time because of the extra packet overhead cost. I would link to the blog entry if I could find the link...

Detecting missing responses to long running HTTP (SOAP) requests

I need a way to detect a missing response to a long running HTTP POST request. This problem arises when the network infrastructure (firewalls, proxies, unplugged cables, etc.) drops the response packets. The server may detect this failure, but the client cannot send additional bytes after the POST to probe the state of the TCP connection. The failure may be limited to a single TCP connection. For example I may be able to subsequently open a new TCP connection to the server.
I'm looking for a solution that still uses HTTP POST and does not change the duration of the server side processing.
Some solutions that I can think of are:
Provide a side channel interface to retrieve request & response history. If the history lists the response as having been send (presumably resulting in a TCP error) but I have not yet received it within a reasonable time I can generate a local error.
Use an X header to request that the server deliver "spurious" 100 Continue provisional responses on a regular interval. If I fail to see an expected 100 Continue or a non-provisional response I can generate a local error.
Is there a state of the art solution for this problem?
It sounds to me like you are using Soap for something that would be much better done using a stateful connection, or a server side push technology.

Resources