Reconstruct HTTP browsing from pcap

I'm currently trying to automatically reconstruct an HTTP browsing session using only a pcap (basically, this means matching each HTTP reply to the HTTP requests that follow it). Most of the time it works fine, but sometimes a certain URL, u, is present in the data of multiple HTTP replies.
For example, if the replies to u1 and u2 both contain u in their data, and the request to u happens after the request to u2, how can I decide whether the request to u was caused by u1 or by u2? Note that no request to u was made between u1 and u2.
Are there some fields in any network layer that I can use to make this match ?
Thanks!

HTTP runs on top of TCP, which is connection-oriented. You have access to the IP and TCP headers of the connection used for the HTTP request (client IP/port -> server IP/port).
HTTP is a request/response protocol: there is one response for each request.
So simply look for the HTTP response that immediately follows the HTTP request on the same TCP connection (server IP/port -> client IP/port).
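The per-connection matching described above can be sketched in Python. This assumes an earlier step (for example tshark or scapy with TCP reassembly) has already turned the pcap into one record per HTTP message; the HttpMessage record and pair_requests_responses function are hypothetical names, not part of any library. Responses are matched first-in, first-out per connection because HTTP/1.1 requires a server to send pipelined responses in the order the requests were received.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import List, Tuple

Endpoint = Tuple[str, int]  # (IP address, TCP port)

@dataclass
class HttpMessage:
    # Hypothetical record produced by a prior pcap/TCP-reassembly step.
    ts: float          # capture timestamp of the message's first byte
    src: Endpoint      # sender
    dst: Endpoint      # receiver
    is_request: bool
    summary: str       # e.g. "GET /index.html" or "200 OK"

def pair_requests_responses(messages: List[HttpMessage]):
    """Match each response to the oldest unanswered request on the same TCP connection."""
    pending = defaultdict(deque)  # (client, server) -> outstanding requests, oldest first
    pairs = []
    for msg in sorted(messages, key=lambda m: m.ts):
        if msg.is_request:
            pending[(msg.src, msg.dst)].append(msg)
        else:
            key = (msg.dst, msg.src)  # a response flows server -> client
            if pending[key]:
                pairs.append((pending[key].popleft(), msg))
    return pairs
```

Because the key is the full client/server endpoint pair, two requests in flight on different connections (different client ports) never get their responses swapped, even when the responses arrive out of order relative to each other.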
HTTP is stateless, so the connection may be closed between requests without affecting the overall browsing model. Closing connections is the required behavior in HTTP 0.9, the default behavior in HTTP 1.0, and not the default behavior in HTTP 1.1+. It is therefore possible for an HTTP response to trigger subsequent requests on new connections, and you need to be ready to handle that. The Connection header in the HTTP request tells you whether the client is asking for the connection to remain open. The Connection header in the HTTP response tells you whether the server is actually closing the connection after sending the response. But even if the server leaves the connection open, there is no guarantee that the client will reuse the same connection for later requests to the same server (though it likely will, unless a timeout elapses between requests).
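When reconstructing a trace, the version and Connection rules above can be turned into a rough check of whether a connection is expected to stay open after a message. This is a simplified sketch, not a full implementation of the spec; connection_persists is a hypothetical helper, and it treats "Connection: keep-alive" as the HTTP/1.0 opt-in, which is a widely used extension rather than part of HTTP/1.0 itself.

```python
from typing import Optional

def connection_persists(http_version: str, connection_header: Optional[str]) -> bool:
    """Rough guess at whether the connection stays open after this message."""
    # The Connection header is a comma-separated list of case-insensitive tokens.
    tokens = {t.strip().lower() for t in (connection_header or "").split(",") if t.strip()}
    if "close" in tokens:
        return False                      # either side may signal close
    if http_version == "HTTP/0.9":
        return False                      # one request per connection
    if http_version == "HTTP/1.0":
        return "keep-alive" in tokens     # closes unless keep-alive is requested
    return True                           # HTTP/1.1 and later: persistent by default
```

Checking both the request and the response this way tells you when to expect the client's next request on a fresh connection (new source port) rather than the current one.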

Related

When does the server receive the HTTP request body?

Say my client is POSTing an HTTP request whose body is 1 TB.
When my server's HTTP handler is first invoked, how much of the request body is already in my server's TCP socket receive queue?
Edit:
Actually, what I want to know is related to the details of HTTP.
Say the whole HTTP request is very large (big headers and a big body):
Basically, I could use one TCP request to send the whole HTTP request (which is very big), or I could split the big HTTP request into several smaller slices and send them with several TCP requests. Which way does the HTTP protocol work?
From the point of view of the HTTP protocol, when can the server start working?
For example, once:
the full path in the URL (the part after host:port) is available in the server's memory? Or,
all the HTTP headers are available in the server's memory? Or,
the whole request body is available in the server's memory?

When a request to a proxy times out, what happens if/when the destination server eventually gets around to responding?

I'm trying to diagnose a web service that sits behind some load balancers and proxies. Under load, one of the servers along the way starts to return HTTP 504 errors, which indicates a gateway timeout. With that background out of the way, here is my question:
When a proxy makes a request to the destination server, and the destination server receives the request but doesn't respond in time (thus exceeding the timeout), resulting in a 504, what happens when the destination server does eventually respond? Does it somehow know that the requester is no longer interested in a response? Does it happily send a response with no idea that the gateway has already sent an HTTP error response back to the client? Any insight would be much appreciated.
It's implementation-dependent, but any proxy that conforms to RFC 2616 section 8.1.2.1 should include Connection: close on the 504 and close the connection back to the client, so that it can no longer be associated with anything coming back from the defunct server connection, which should also be closed. Under load there is potential for race conditions in this scenario, so you could be looking at a bug in your proxy.
If the client then wants to make further requests it'll create a new connection to the proxy which will result in a new connection to the backend.
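A minimal asyncio sketch of the behaviour this answer describes: if the backend misses the deadline, the proxy answers 504 with Connection: close and tears down both connections, so a late backend reply has nowhere to go. The callbacks fetch_upstream, send_to_client, close_upstream and close_client are hypothetical hooks into whatever proxy you are running, and the function names are made up for illustration.

```python
import asyncio

def gateway_timeout_response() -> bytes:
    """A 504 that also tells the client this connection is finished."""
    body = b"Gateway Timeout\n"
    head = (
        "HTTP/1.1 504 Gateway Timeout\r\n"
        "Connection: close\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body

async def forward_with_timeout(fetch_upstream, send_to_client, close_upstream, close_client, timeout):
    """Relay one backend response, or answer 504 and close both connections."""
    try:
        response = await asyncio.wait_for(fetch_upstream(), timeout)
    except asyncio.TimeoutError:
        send_to_client(gateway_timeout_response())
        close_upstream()  # a late backend reply can no longer reach anyone
        close_client()    # further requests must come on a fresh client connection
        return False
    send_to_client(response)
    return True
```

Closing the client side is what prevents the race mentioned above: nothing that arrives later can be mistaken for the answer to the client's next request.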

Proxy's Response to Asynchronous Close Events

Let's say you have an HTTP/1.1 proxy sitting between a client and a server. If connections are persistent, there is the possibility that the server will close the connection, but the client will send a request before being notified of the closure. What is the proxy's correct response to this? Does it send an HTTP error to the client or does it try to reconnect to the server?
The proxy should mimic the behaviour of the server, and close the connection - irrespective of whether there is a request in flight.
Automatically reconnecting can create unwanted side effects. The client would assume that it still has the same persistent connection and can, for example, skip authentication headers, cookies etc.
The other alternative, returning a 5xx error, would also be wrong, since the client could also make incorrect assumptions about the server's state.
Mimicking the server's behaviour is the safest and most consistent option.
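A minimal asyncio sketch of "mimic the server" for the server-to-client direction: the hypothetical relay_until_close copies bytes from the server to the client and, once the server's side reaches EOF, closes the client connection instead of reconnecting or synthesizing a 5xx. It assumes the proxy already holds an asyncio.StreamReader for the server and a StreamWriter-like object for the client; forwarding in the other direction is omitted.

```python
import asyncio

async def relay_until_close(upstream: asyncio.StreamReader, client_writer) -> None:
    """Copy server -> client bytes; when the server closes, close the client too."""
    while True:
        chunk = await upstream.read(65536)
        if not chunk:  # EOF: the server closed its side of the connection
            break
        client_writer.write(chunk)
        await client_writer.drain()
    # Mirror the close: no silent reconnect, no made-up 5xx for an in-flight request.
    client_writer.close()
    await client_writer.wait_closed()
```

A client that had a request in flight then sees its connection close, exactly as it would have without the proxy, and retries according to its own rules.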

HTTP and Sessions

I just went through the HTTP/1.1 specification at http://www.w3.org/Protocols/rfc2616/rfc2616.html and came across a section about connections, http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8, that says
"A significant difference between HTTP/1.1 and earlier versions of HTTP is that persistent connections are the default behavior of any HTTP connection. That is, unless otherwise indicated, the client SHOULD assume that the server will maintain a persistent connection, even after error responses from the server.
Persistent connections provide a mechanism by which a client and a server can signal the close of a TCP connection. This signaling takes place using the Connection header field (section 14.10). Once a close has been signaled, the client MUST NOT send any more requests on that connection. "
I also went through the HTTP state management specification at https://www.rfc-editor.org/rfc/rfc2965, which says in its section 2 that
"Currently, HTTP servers respond to each client request without relating that request to previous or subsequent requests;"
The section of RFC 2616 explaining the need for persistent connections also says that, before persistent connections existed, a client had to establish a new TCP connection for every URL it wished to fetch.
Now my question is: if HTTP/1.1 has persistent connections, then, as mentioned above, a client does not need to make a new connection for every new request; it can send multiple requests over the same connection. So if the server knows that every subsequent request is coming over the same connection, wouldn't it be obvious that the requests are from the same client? Wouldn't that alone be enough for the server to maintain state and know that the requests come from the same client? In that case, why is a separate state management mechanism required at all?
Basically, yes, it would make sense, but HTTP persistent connections are meant to eliminate the TCP/IP overhead of connection handling (connect, disconnect, reconnect, etc.). They are not meant to say anything about the state of the data moving across the connection, which is what you're talking about.
No. For instance, there might be an intermediary (such as a proxy or a reverse proxy) in the request path that aggregates requests from multiple TCP connections, i.e. from multiple clients, onto one connection.
See http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-21.html#intermediaries.
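To make the intermediary point concrete, here is a small simulation under stated assumptions: a hypothetical reverse proxy forwards requests from two different browsers over one pooled upstream connection. Keying per-client state (here a shopping cart) on the connection merges the two clients, while keying it on a cookie, the kind of mechanism RFC 2965 describes, keeps them apart. The request dicts and the carts_by helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical requests as a reverse proxy might forward them: two browsers
# (alice and bob) share ONE pooled upstream connection (conn_id=1).
forwarded = [
    {"conn_id": 1, "cookie": "session=alice", "path": "/cart/add?item=book"},
    {"conn_id": 1, "cookie": "session=bob", "path": "/cart/add?item=lamp"},
]

def carts_by(key, requests):
    """Build per-'client' carts, where a client is identified by requests[i][key]."""
    carts = defaultdict(list)
    for req in requests:
        item = req["path"].split("item=", 1)[1]
        carts[req[key]].append(item)
    return dict(carts)

by_connection = carts_by("conn_id", forwarded)  # both items end up in one cart
by_cookie = carts_by("cookie", forwarded)       # one cart per browser
```

The reverse also happens: a single browser may spread its requests over several parallel connections, or reconnect after a close, so connection identity fails in both directions.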

HTTP push: HTTP streaming method with SSL. Do proxies interfere with HTTPS traffic?

My Question is related to the HTTP Streaming Method for realizing HTTP Server Push:
The "HTTP streaming" mechanism keeps a request open indefinitely. It never terminates the request or closes the connection, even after the server pushes data to the client. This mechanism significantly reduces the network latency because the client and the server do not need to open and close the connection.
The HTTP streaming mechanism is based on the capability of the server to send several pieces of information on the same response, without terminating the request or the connection. This result can be achieved by both HTTP/1.1 and HTTP/1.0 servers.
The HTTP protocol allows for intermediaries (proxies, transparent proxies, gateways, etc.) to be involved in the transmission of a response from server to the client. There is no requirement for an intermediary to immediately forward a partial response and it is legal for it to buffer the entire response before sending any data to the client (e.g., caching transparent proxies). HTTP streaming will not work with such intermediaries.
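For HTTP/1.1, one common way for a server to send several pieces on the same response without knowing the total length up front is chunked transfer coding (HTTP/1.0 servers instead omit Content-Length and keep writing until they close the connection). A sketch, assuming the server has an iterable of messages to push; STREAM_HEADERS, chunk and stream_response are hypothetical names. A buffering intermediary as described above may still hold these chunks back: the framing makes streaming possible, not guaranteed.

```python
from typing import Iterable, Iterator

STREAM_HEADERS = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
)

def chunk(data: bytes) -> bytes:
    """Frame one pushed message as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    return f"{len(data):X}\r\n".encode("ascii") + data + b"\r\n"

def stream_response(messages: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the bytes of one long-lived response, one chunk per server push."""
    yield STREAM_HEADERS
    for message in messages:
        if message:  # a zero-length chunk would terminate the response early
            yield chunk(message)
    yield b"0\r\n\r\n"  # last-chunk: only reached if the server ever ends the stream
```

With an endless generator of events as `messages`, the final chunk is never sent and the response stays open, which is the "never terminates the request" behaviour quoted above.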
Do I avoid the described problems with proxy servers if I use HTTPS?
HTTPS traffic isn't processed by HTTP proxies the way plain HTTP is; that would void its security. An HTTPS connection can be routed through an HTTP proxy, or just an HTTP redirector, by using the HTTP CONNECT method, which establishes a transparent tunnel to the destination host. This tunnel is completely opaque to the proxy: the proxy can't know what is transferred, i.e. what has been encrypted by SSL. It can attempt to modify the data flow, but the SSL layer will detect the modification and send an alert and/or close the connection.
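A client-side sketch of the CONNECT tunnel, using only the standard socket and ssl modules: the only plaintext the proxy sees is the CONNECT line; after a 2xx reply, the client starts TLS end to end with the destination. The function names are hypothetical, and proxy authentication and error handling are left out.

```python
import socket
import ssl

def build_connect_request(host: str, port: int) -> bytes:
    """The plaintext request the client sends to the proxy before any TLS bytes."""
    authority = f"{host}:{port}"
    return f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n".encode("ascii")

def tunnel_established(status_line: bytes) -> bool:
    """Any 2xx reply to CONNECT means the proxy will now relay raw bytes."""
    parts = status_line.split()
    return len(parts) >= 2 and parts[0].startswith(b"HTTP/") and parts[1][:1] == b"2"

def open_tls_through_proxy(proxy_host: str, proxy_port: int, host: str, port: int = 443) -> ssl.SSLSocket:
    sock = socket.create_connection((proxy_host, proxy_port))
    sock.sendall(build_connect_request(host, port))
    reply = b""
    while b"\r\n\r\n" not in reply:  # read the proxy's response headers
        data = sock.recv(4096)
        if not data:
            sock.close()
            raise ConnectionError("proxy closed the connection")
        reply += data
    status_line = reply.split(b"\r\n", 1)[0]
    if not tunnel_established(status_line):
        sock.close()
        raise ConnectionError(status_line.decode("latin-1"))
    # From here on the proxy only relays TLS records it cannot read or silently alter.
    ctx = ssl.create_default_context()
    return ctx.wrap_socket(sock, server_hostname=host)
```

Usage would look like `open_tls_through_proxy("proxy.example.com", 3128, "example.com")`, where the proxy host and port are placeholders for your own setup.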
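Update: for your task, you can try to use one of the NULL cipher suites (if the server allows them) to reduce the amount of work, e.g. no encryption, anonymous key exchange, etc. With a NULL cipher the records are still MAC-protected, so the proxy still cannot alter your data undetected, although an anonymous key exchange would let an active proxy sit in the middle. Note that most modern TLS libraries disable NULL cipher suites by default.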
