Is data sent via HTTP POST when the Server does not exist? - http

I work for a large-ish advertising company. We've created a very lightweight clone of the PayPal IPN so we can offer CC Processing services for our top advertisers.
Like the PP IPN, it's a simple RESTful interface.
I deliberately instructed our admin guys to configure the vhost for this web app to only respond to requests on port 443.
This particular question is beyond my HTTP Protocol knowledge:
This may vary from browser to browser, but when a user submits a form, and the ACTION for that form is, say http://www.somesite.com, if the browser cannot resolve that site, does the post payload ever get sent over the wire?
I know this is a bit esoteric and it's more of an implementation question than something that exists in the HTTP RFC (as far as I could tell). Any takers?

Before sending any data the browser needs to open a TCP connection to the target site. Since this connection to the target site cannot be opened in the first place, no data can be sent.
Update (Thanks for the hint in the comments):
Use HTTP-Requests like POST to avoid sending data over the wire which could be intercepted by proxies before the existence of the target could be checked. With proxies the TCP-connection is always established successfully and the HTTP-request-header is sent to it. The POST-request contains the additional data in his request-body which should be sent only if the request header returns no error. Nevertheless, the implementation of proxies differ and I cannot guarantee that there is no proxy which returns an error if the target-site is non-existing. But in such a case I don't know any way where you could avoid sending the complete data over the wire...

Related

HTTP in simple terms

I came across the term HTTP. I have done some research and wanted to ensure that I correctly understood the term.
So, is it true that HTTP, in simple words, a letter containing information in the language that both client and server can understand.
Then, that letter is sent to the server thanks to TCP/IP which serves as a car that takes that letter to the server.
Then, after the letter is delivered to the server, the server reads the content of the letter and if it is GET request, the server takes the necessary data and ATTACHES that data to the letter and sends back to the client via again TCP/IP. But if it was POST request then the client ATTACHES the DATA to the letter and sends it to the server so that it saves that data in the database.
Is that true?
Basically, it is true.
However, the server can decide what to do if it is a GET or POST or any other request(it doesn't need to e.g. append it to a file).
I will show you some additional information/try to explain it in my words:
TCP is another communication protocol protocol. It allows a client to open a connection to a server and they can communicate afterwards.
HTTP(hyper text transfer protocol) builds up on TCP.
At first, the client opens a connection to the server.
After that, the client sends the HTTP Request. The first line contains the type of the request, the path and the version. For example, it could be GET / HTTP/1.1.
The next part of the request contains the Request parameters. Every parameter is a line. The parameters are sent like the following: paramName: paramValue
This part of the request ends with an empty line.
If it is a POST Request, query parameters are added next. If it is a GET Request, these query parameters are added with the path(e.g. /index.html?paramName=paramValue)
After rescieving the Request, the server sends a HTTP Response back to the client.
The first line of the response contains the HTTP version, the status code and the status message. For example, it could be HTTP/1.1 200 OK.
Then, just like in the request, the response parameters are following. For example Content-Length: 1024.
The response parameters also end with an empty line.
The last part of the response is the body/content. For example, this could be the HTML code of the website you are visiting.
Obviously, the length of the content/body of the response has to match the Content-Length parameter(in bytes).
After that, the connection will be closed(normally). If the client to e.g. request resources, it will send another request. The server has NO POSSIBILITY to send data to the client after that unless the client sends another request(websockets can bypass this issue).
GET is meant to get the content of a site A web browser will send a GET request if you type in a URL. POST can be used to update a site but in fact, the server can decide that. POST can be also used if the server doesn't want query parameters to be shown in the address bar.
There are other methods like PATCH or DELETE that are used by some APIs.
Some important status codes (and status messages) are:
200 OK (everything went well)
204 No content (like ok but there is no body in the response)
400 Bad Request (something is wrong with the Request)
404 Not found (the requested file(the path) was not found on the server)
500 Internal server error (An error occured while processing the request)
Every status code beginning with 1 is related to inform the client of something.
If it is starting with 2, everything went right.
Status code beginning with 3 forward the client to another site.
If it starts with 4, there is a error on the client side.
Codes starting with 5 represent an error that occured on the server side.
TCP is a network protocol that establishes a connection with the server over a network (or the Internet) and allows two-way communication. The HTTP will traffic inside this TCP tunnel. TCP is a very useful protocol that helps keep things sane, it ensures data packets are read in the correct order and that packets that went missing during transmission are sent again.
Sometimes there will be another protocol layer between HTTP and TCP, called SSL. It is responsible for encrypting the data that traffics over TCP, so that it is transmitted safely over unsafe networks. This is know as HTTPS, and is just HTTP but using this additional layer.
Although almost always true, HTTP doesn't necessarily uses TCP. UPnP requests use HTTP over UDP, a network protocol that uses standalone packets instead of a connection.
HTTP is a plain text protocol, meaning it's designed in such a way that a human can understand it without using any tools. This is very convenient for learning.
If you're using Firefox or Chrome, you can press Ctrl-Shift-C to open the Developer Tools, and under the Network tab you will see every HTTP request your browser is making, see exactly what's the request, what the server answered etc, and get a better view of how this protocol works.
Explaining it in details is... too extensive for this answer. But as you will see it's not that complicated.

how does HTTP "session" reconstruction work?

I have found this tool online: http://www.unleashnetworks.com/products/unsniff.html
How does this work? Are they assuming that all HTTP traffic for a session occurs in the same TCP session, and then just clumping all that data together? Is that a safe assumption?
I was under the impression that when I load a page, multiple TCP sessions could be running for that single page load (images, videos, flash, whatever).
This seems to get complicated when I think about having two browser tabs open that are loading pages at the same time..how could I differentiate one http "session" from another? Especially true if they are hitting the same page, right?
For that matter, how does the browser know which data incoming belongs to which tab? Does it keep track of TCP sessions belonging to an individual tab?
Edit:
When HTTP session is mentioned above, I am referring to all of the related HTTP transactions that it takes to, say, load a page.
By TCP session, I am literally referring to the handshake's SYN -> FIN packet lifetime.
Although it might not be visible, the HTTP Session tracker is being passed to the server from the client as a parameter or as e cookie (header)
You might need to read about HTTP session token
A session token is a unique identifier that is generated and sent from a server to a client to identify the current interaction session. The client usually stores and sends the token as an HTTP cookie and/or sends it as a parameter in GET or POST queries. The reason to use session tokens is that the client only has to handle the identifier—all session data is stored on the server (usually in a database, to which the client does not have direct access) linked to that identifier. Examples of the names that some programming languages use when naming their HTTP cookie include JSESSIONID (JSP), PHPSESSID (PHP), and ASPSESSIONID (ASP).
I am not familiar with the "Unsniff" app you link to, but I have used a few packet sniffers before (my favorite is Wireshark). Usually you can differentiate sessions based on what host they are connected to. So, for instance, if you have 2 tabs open and one is opened to www.google.com and the other is www.facebook.com, the packet sniffer should be able to tell you which session is pointed at which host (or at least give you an IP address, which you can then use to find the host. see: reverse lookup).
Most times, multiple HTTP sessions will be open to one host. This is the case when you're loading a site's various resources (CSS files, images, javascript, etc.). Each of these resources will show up as a separate HTTP session (unless, of course, the connection is persistent... but your sniffer should be able to separate them anyway). In this case, you (or the sniffer) will need to determine what was downloaded by looking at the actual data within the HTTP packet.

How does a http client associate an http response with a request (with Netty) or in general?

Is a http end point suppose to respond to requests from a particular client in order that they are received?
What about if it doesn't make sense to in the case of requests handled by cluster behind a proxy or in requests handled with NIO where one request is finished faster than the other?
Is there a standard way of associating a unique id with each http request to associate with the response? How is this handled in clients like http componenets httpclient or curl?
The question comes down to the following case:
Suppose, I am downloading a file from a server and the request is not finished. Is a client capable of completing other requests on the same keep-alive connection?
Whenever a TCP connection is opened, the connection is recognized by the source and destination ports and IP addresses. So if I connect to www.google.com on destination port 80 (default for HTTP), I need a free source port which the OS will generate.
The reply of the web server is then sent to the source port (and IP). This is also how NAT works, remembering which source port belongs to which internal IP address (and vice versa for incoming connections).
As for your edit: no, a single http connection can execute one command (GET/POST/etc) at the same time. If you send another command while you are retreiving data from a previously issued command, the results may vary per client and server implementation. I guess that Apache, for example, will transmit the result of the second request after the data of the first request is sent.
I won't re-write CodeCaster's answer because it is very well worded.
In response to your edit - no. It is not. A single persistent HTTP connection can only be used for one request at once, or it would get very confusing. Because HTTP does not define any form of request/response tracking mechanism, it simply would not be possible.
It should be noted that there are other protocols which use a similar message format (conforming to RFC822), which do allow for this (using mechanisms such as SIP's cSeq header), and it would be possible to implement this in a custom HTTP app, but HTTP does not define any standard mechanism for doing this, and therefore nothing can be done that could be assumed to work everywhere. It would also present a problem with the response for the second message - do you wait for the first response to finish before sending the second response, or try and pause the first response while you send the second response? How will you communicate this in a way that guarantees messages won't become corrupted?
Note also that SIP (usually) operates over UDP, which does not guarantee packet ordering, making the cSeq system more of a necessity.
If you want to send a request to a server while another transaction is still in progress, you will need to create a new connection to the server, and hence a new TCP stream.
Facebook did some research into this while they were building their CDN, and they concluded that you can efficiently have 2 or 3 open HTTP streams at any one time, but any more than that reduces overall transfer time because of the extra packet overhead cost. I would link to the blog entry if I could find the link...

Why HTTP was designed to be a pull protocol?

I was watching many presentations about Html 5 WebSockets , where server can initialize connection with client and push the data without the request from the client.
We don't need Polling etc.
And , I am curious , why Http was designed as a "pull" and not full duplex protocol in the first place ? What where the reasons behind that kind of decision ?
Because when http was first designed it was meant to be used to retrieve documents from a server. And the easiest way to do is when the client asks the server for a document and gets it delivered as response (or an error in case it does not exist). When you have push protocol that means the server would need to keep client connections around for potentially a long time creating more resource management problems - remember we are talking about early 1990s here.
Http was designed for simply retrieving hypertext documents from a server. There were no reasons to push anything to the client when the pages were just pure, static html without scripting capabilities.
Since there was no need at the time for pushing things back to the client, the protocol was kept simple.
HTTP is mainly a pull protocol—someone loads information on a Web server and
users use HTTP to pull the information from the server at their convenience. In particular,
the TCP connection is initiated by the machine that wants to receive the file.

HTTP server detecting a broken network connection from a HTTP client

I have an web application in which after making a HTTP request to the server, the client quits ( or network connection is broken) before the response was completely received by the client.
In this scenario the server side of the application needs to do some cleanup work. Is there a way built into HTTP protocol to detect this condition. How does the server know if the client is still waiting for the response or has quit?
Thanks
Vijay Kumar
No, there is nothing built in to the protocol to do this (after all, you can't tell whether the response has been received by the client itself yet, or just a downstream proxy).
Just have your client make a second request to acknowledge that it has received and stored the original response. If you don't see a timely acknowedgement, run the cleanup.
However, make sure that you understand the implications of the Two Generals' Problem.
You might have a network problem... usualy, when you send a HTTP request to the server, first you send headers and then the content of the POST (if it is a post method). Likewise, the server responds with the headers and document body. The first line in the header is the status. Usually, status 200 is the success status, if you get that, then there should be no problem getting the rest of the document. Check this for details on the HTTP response status headers http://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html
LE:
Sorry, missread your question. Basically, you don't have a trigger for when the user disconnects. If you use OOP, you could use the destructor of a class to clean whatever it is you need to clean.

Resources