obtain single website url from a bunch of http packets? - http

I'm newbie in network programming so please forgive me any mistakes.
I'm writing a simple sniffer, which should detect just URLs of websites requested by the user. I'm using pcap.net and I'm able to capture http packets (with tcp port 80 filter) and retrieve data from them. What I can't do is getting a single URI for the request which caused many http packets to come.
For example,
1. a user requests (from a browser) www.website.com
2. many http responses come, one of which is text/html for www.website.com
3. www.website.com contains resources from other html pages, so many other packets from other hosts are coming.
Is there a way to ignore the packets from the resources? Do I have to make some tcp session reconstruction? I've been googling for 2 days but couldn't find anything useful, so please help.

The HTTP responses from other hosts can be identified since they would probably come from different IPs, and not the IP that the request was sent to.
You can match HTTP requests and responses even without full TCP reconstruction by just looking at the IPs and TCP ports.
However, if you have multiple HTTP requests in the same TCP session, you will need to do TCP reconstruction to separate between the different requests and responses.

Related

How many http and https connections established in wireshark?

I ran wireshark for a while and opened a few webpages now my class assignment is asking me to find out the number of http and https connections established. Can you suggest commands or filters to find these out?
HTTP is connection-less, that means client and server knows about each other during current request and response only. You can open the .pcapng file and then
Click on “Statistics”
Select “HTTP”
Select “Packet Counter“
In my case, I can 14 requests and 14 responses. (Success 2xx).

DDos Attack Under HTTP

We have received a DDOS attack with the same pattern with all requests:
Protocol HTTP
GET
Random IP Address
Loading the home page /
Our server was returning a 301 to all requests and we had problems with the performance, the server was down.
We have blocked all requests coming from the HTTP and we have stopped the attack, we would like to know why we are receiving the attack to our servers under HTTP and not HTTPS from different sources, we would like to know if the source IP could only be changed using HTTP requests?
What's the best way to prevent this kind of attacks?
Our server right now is only working with HTTPS without issues. Server running on Azure Web Apps.
We have blocked all requests coming from the HTTP and we have stopped the attack.
Please note, when people type in your URL in browser manually, the first hit is usually over HTTP. If you turn off HTTP, people will not be able to access the the site by simply typing in your domain name.
we would like to know why we are receiving the attack to our servers under HTTP and not HTTPS from different sources
This is by the attacker to decide. Most probably it was only coincidence that the attack went over HTTP only.
we would like to know if the source IP could only be changed using HTTP requests?
No. For an HTTP request to be performed you need to do TCP handshake first. This means, that you can not fake the IP address easily, as you need to actively participate in the communication and the routers must see you as a valid participants. You can fake the IP while being in the same local network but it would be only for one packet and would not allow to perform a TCP handshake correctly.
What's the best way to prevent this kind of attacks?
We're still struggling with DDOS and there is no 100% solution. An attack of sufficient scale can turn down the internet as it did in the past already. There are some things you can do like:
Rate limiting - put some brakes on the incoming traffic not to kill your infrastructure completely. You will loose some valid traffic, but you will be up and running.
Filtering - pain when dealing with DDOS attacks. Analyse which IP addresses are attacking you constantly. Filter them on your firewall. (Imagine the fun when you are being attacked by 100k IoT devices). A WAF (Web Application Firewall) may allow you to filter not only on IP addresses but also on other request parameters too.
Scaling up - more infrastructure can do more.
In most cases all you need to do is survive till the attack is over.

Why can't I filter http in Wireshark?

I am new to networking and Wireshark in general, but there's one thing I'm really stuck on.
I opened Wireshark, started sniffing, then I went to different websites both using HTTP and HTTPS.
I stopped the sniffing and tried to filter out "http" and "https" (different tries), but it shows nothing.
In the Protocol column I see only TCP, UPD, DNS, TLSv1 etc. But no HTTP and not HTTPS.
What may cause this?
Thanks
Wireshark cannot see application data because it is encrypted with TLS. That's why Wireshark use TLS and TLS version in protocol column instead of HTTPS.
Almost all big website are using HTTPS todays. When you try to use HTTP the connection will be redirected to HTTPS. There are different redirection methods and it is possible the Wireshark cannot get enough data to know the communication is HTTP or not. That's why you can see TCP in protocol column instead of HTTP.
So You can filter packets with TCP ports:
tcp.port == 80 or tcp.port==443
You can filter the HTTP traffic with the help of Wireshark easily.
Refer : How many http and https connections established in wireshark?
Even you can get the additional information , like source IP, destination IP, flags etc from hex format. Refer Link,
How to obtain the source IP from a Wireshark dump of an HTTP GET request

How does a browser know which response belongs to which request?

Suppose when we request a resource over HTTP, we get a response as shown below:
GET / HTTP/1.1
Host: www.google.co.in
HTTP/1.1 200 OK
Date: Thu, 20 Apr 2017 10:03:16 GMT
...
But when a browser requests many resources at a time, how can it identify which request got which response?
when a browser requests many resources at a time, how can it identify which request got which response?
A browser can open one or more connections to a web server in order to request resources. For each of those connections the rules regarding HTTP keep-alive are the same and apply to both HTTP 1.0 and 1.1:
If HTTP keep-alive is off, the request is sent by the client, the response is sent by the server, the connection is closed:
Connection 1: [Open][Request1][Response1][Close]
If HTTP keep-alive is on, one "persistent" connection can be reused for succeeding requests. The requests are still issued serially over the same connection, so:
Connection 1: [Open][Request1][Response1][Request3][Response3][Close]
Connection 2: [Open][Request2][Response2][Request4][Response4][Close]
With HTTP Pipelining, introduced with HTTP 1.1, if it is enabled (on most browsers it is by default disabled, because of buggy servers), browsers can issue requests after each other without waiting for the response, but the responses are still returned in the same order as they were requested.
This can happen simultaneously over multiple (persistent) connections:
Connection 1: [Open][Request1][Request2][Response1][Response2][Close]
Connection 2: [Open][Request3][Request4][Response3][Response4][Close]
Both approaches (keep-alive and pipelining) still utilize the default "request-response" mechanism of HTTP: each response will arrive in the order of the requests on that same connection. They also have the "head of line blocking" problem: if [Response1] is slow and/or big, it holds up all responses that follow on that connection.
Enter HTTP 2 multiplexing: What is the difference between HTTP/1.1 pipelining and HTTP/2 multiplexing?. Here, a response can be fragmented, allowing a single TCP connection to transmit fragments of different requests and responses intermingled:
Connection 1: [Open][Rq1][Rq2][Resp1P1][Resp2P1][Rep2P2][Resp1P2][Close]
It does this by giving each fragment an identifier to indicate to which request-response pair it belongs, so the receiver can recompose the message.
I think you are really asking for HTTP Pipelining here. This is a technique introduced in HTTP/1.1, through which all requests would be sent out by the client in order and be responded by the server in the very same order. All the gory details are now in RFC 7230, sec. 6.3.2.
HTTP/1.0 had (or has) a comparable method known as Keep Alive. This would allow a client to issue a new request right after the previous has been answered. The benefit of this approach is that client and server no longer need to negotiate through another TCP handshake for a new request/response cycle.
The important part is that in both methods the order of the responses matches the order of the issued requests over one connection. Therefore, responses can be uniquely mapped to the issuing requests by the order in which the client is receiving them: First response matches, first request, second response matches second request, … and so forth.
I think the answer you are looking for is TCP,
HTTP is a protocol that relies on TCP to establish connection between the Client and the Host
In HTTP/1.0 a different TCP connection is created for each request/response pair,
HTTP/1.1 introduced pipelining, wich allowed mutiple request/response pair, to reuse a single TCP connection, to boost performance (Didnt work very well)
So the request and the corresponding response are linked by the TCP connection they rely on,
It's then easy to associate a specific request with the response it produced,
PS: HTTP is not bound to use TCP forever, for example google is experimenting with other transport protocols like QUIC, that might end up being more efficient than TCP for the needs of HTTP
In addition to the explanations above consider a browser can open many parallel connections, usually up to 6 to the same server. For each connection it uses a different socket. For each request-response in each socket it is easy to determine the correlation.
In each TCP connection, request and response are sequential. A TCP connection can be re-used after finishing a request-response cycle.
With HTTP pipelining, a single connection can be multiplexed for
multiple overlapping requests.
In theory, there can be any number[*1] of simultaneous TCP connections, enabling parallel requests and responses.
In practice, the number of simultaneous connections is usually limited on the browser and often on the server as well.
[*1] The number of simultaneous connections is limited by the number of ephemeral TCP ports the browser can allocate on a given system. Depending on the operating system, ephemeral ports start at 1024 (RFC 6056), 49152 (IANA), or 32768 (some Linux versions).
So, that may allow up to 65,535 - 1023 = 64,512 TCP source ports for an application. A TCP socket connection is defined by its local port number, the local IP address, the remote port number and the remote IP address. Assuming the server uses a single IP address and port number, the limit is the number of local ports you can use.

What's the difference between http request and writing http request text to tcp/ip socket on 80 port

Can someone explain the difference between the HTTP request and it handling and socket requests on 80 port. As I understood, HTTP server listen the 80 port and when someone sends an HTTP request on this port - server handle it. So when we place socket listener on port 80, and then write HTML formatted message to it - does it means that we send usual HTTP request? But as fiddler said - it false. What's the difference on a packet level? Or another lower than presentation-level between HTTP request and HTTP-formed writing to socket? Thanks.
First of all, port 80 is the default port for HTTP, it is not required. You can have HTTP servers listening on other ports as well.
Regarding the difference between "regular" HTTP requests and the ones you make yourself over a socket - there is no difference. The "regular" HTTP requests you are referring to (made by a web browser for example) are also implemented over sockets, just like you would do it manually yourself. And the same goes for the server. The implementation of the HTTP server listens for incoming socket connections and parses the data that passes there just like you would.
As long as you send in your socket valid HTTP protocol (according to the RFC), there should be no difference in the packet level (if the lower network stack is identical).
Keep in mind that the socket layer is just the layer the HTTP data always passes over. It doesn't really matter who put the data there, it just comes out from the other side the same way it was put in.
Please note that you have some degree of freedom when implementing an HTTP yourself. There are many optional fields and the order of the headers doesn't matter. So it is possible that two different HTTP implementations will be different in the packet level, but will behave basically the same.
The best way to actually see what's going on in the packet level, is by using a network sniffer - like wireshark or packetyzer. A sniffer actually records the packets of the network and shows you their content. So if you record several HTTP implementations (from various browsers) and your own socket implementation, you can make the required changes to make them identical in the packet level.

Resources