How to analyse an HTTP dump?

I have a file that apparently contains some sort of dump of a keep-alive HTTP conversation, i.e. multiple GET requests and responses including headers, containing an HTML page and some images. However, there is some binary junk in between - maybe it's a dump on the TCP or even IP level (I'm not sure how to determine what it is).
Basically, I need to extract the files that were transferred. Are there any free tools I could use for this?

Use Wireshark.
Look into the file format for its dumps and convert your dump to it. It's very simple; it's called the pcap file format. Then you can open the file in Wireshark and it should recognize the contents. Wireshark supports dozens, if not hundreds, of communication protocols at various OSI layers (including IP, TCP and HTTP) and is great for this kind of debugging.
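Before converting anything, it may be worth checking whether the file is already a pcap capture. The sketch below is only an illustration: it reads the 24-byte global header of the classic pcap format (magic number 0xa1b2c3d4 in either byte order). The function name parse_pcap_global_header is made up for this example, and the nanosecond-resolution and pcapng variants are deliberately not handled.

```python
import struct

# Classic pcap magic number 0xa1b2c3d4 as it appears on disk.
PCAP_MAGIC_LE = b"\xd4\xc3\xb2\xa1"  # written by a little-endian machine
PCAP_MAGIC_BE = b"\xa1\xb2\xc3\xd4"  # written by a big-endian machine


def parse_pcap_global_header(data: bytes) -> dict:
    """Parse the 24-byte global header of a classic pcap file.

    Hypothetical helper for illustration; raises ValueError if the
    data does not start with a classic pcap magic number.
    """
    if len(data) < 24:
        raise ValueError("need at least 24 bytes for the global header")
    magic = data[:4]
    if magic == PCAP_MAGIC_LE:
        endian = "<"
    elif magic == PCAP_MAGIC_BE:
        endian = ">"
    else:
        raise ValueError("no classic pcap magic number found")
    # version_major, version_minor, thiszone, sigfigs, snaplen, linktype
    major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
        endian + "HHiIII", data[4:24]
    )
    return {
        "byte_order": "little" if endian == "<" else "big",
        "version": (major, minor),
        "snaplen": snaplen,
        "linktype": linktype,  # 1 means Ethernet frames
    }


if __name__ == "__main__":
    # Build a sample header in memory instead of reading a real file.
    sample = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    print(parse_pcap_global_header(sample))
```

If the check fails, the dump is in some other format and would need converting as described above.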

Wireshark will analyze on the packet level. If you want to analyze on the protocol level, I recommend Fiddler: http://www.fiddlertool.com/fiddler/
It will show you the headers sent, the responses, and will decrypt HTTPS sessions as well. And a ton more.

The Net tab in the Firebug plugin for Firefox might be of use.

Related

Differentiating http and http2 packets

I'm working with packets one by one and need to be able to edit both http and http2 contents.
The question is: is there a way to distinguish the two on a single packet basis?
Edit: For some additional info, the point is to read and edit large pcap files, so I'm trying to work with as little memory as possible.
On a per-packet basis, no. A single TCP packet could represent any arbitrary part of the stream. You need to capture (at least) the first part of the stream to work out whether it's HTTP or HTTP/2 (or anything else).
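As a rough sketch of what "the first part of the stream" can tell you: an HTTP/2 client starts its side of the connection with a fixed 24-byte preface, while an HTTP/1.x client starts with a request line. The function below (its name and the list of methods are assumptions for this example) classifies the first client-to-server bytes of a reassembled stream. It only works on cleartext or already decrypted data, and a connection that begins as HTTP/1.1 and then upgrades to HTTP/2 would be reported as HTTP/1.

```python
# Fixed client connection preface defined by the HTTP/2 specification.
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

# Common HTTP/1.x request methods (not an exhaustive list).
HTTP1_METHODS = (
    b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ",
    b"OPTIONS ", b"PATCH ", b"CONNECT ", b"TRACE ",
)


def classify_stream_start(first_bytes: bytes) -> str:
    """Guess the protocol from the first client-to-server bytes of a stream."""
    if first_bytes.startswith(HTTP2_PREFACE):
        return "http2"
    request_line = first_bytes.split(b"\r\n", 1)[0]
    if first_bytes.startswith(HTTP1_METHODS) and request_line.endswith(
        (b"HTTP/1.0", b"HTTP/1.1")
    ):
        return "http1"
    return "unknown"


if __name__ == "__main__":
    print(classify_stream_start(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"))
    print(classify_stream_start(HTTP2_PREFACE + b"\x00\x00\x00\x04\x00"))
```

Since only the first few dozen bytes of each stream are needed, remembering the result per connection (keyed by the address and port pairs) keeps memory use small even for large capture files.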
You can use Chrome DevTools > Network > Protocol column to see the protocol used for each transfer.

What does HTTP download exactly mean?

I often hear people say download with HTTP. What does it really mean technically?
HTTP stands for Hyper Text Transfer Protocol. So to understand it literally, it is meant for text transferring. And I used some sniffer tool to monitor the wire traffic. What gets transferred is all ASCII characters. So I guess we have to convert whatever we want to download into characters before transferring it via HTTP. Using HTTP URL encoding? Or some binary-to-text encoding scheme such as Base64? But that requires some decoding on the client side.
I always think it is TCP that can transfer whatever data, so I am guessing "HTTP download" is a misused term. It arises because we view a web page via HTTP, find a download link on that page, and click it to download. In fact, the browser opens a TCP connection to download it. Nothing to do with HTTP.
Could anyone shed some light?
The complete answer to "What does HTTP download exactly mean?" is in the RFC 2616 specification, which you can read here: https://www.rfc-editor.org/rfc/rfc2616
Of course that's a long (but very detailed) document.
I won't replicate or summarize its content here.
In the body of your question you are more specific:
So to understand it literally, it is meant for text transferring.
I think the word "TEXT" is misleading you.
And
have to convert whatever we want to download into characters before transferring it via HTTP
is false. You don't necessarily have to.
A file, for example a JPEG image, may be sent over the wire without any kind of encoding. See for example this: When a web server returns a JPEG image (mime type image/jpeg), how is that encoded?
Note that compression or another encoding may optionally be applied (the most common case is gzip for textual content like HTML, text, scripts...), but that depends on how the client and the server agree to transfer the data. That "agreement" is made with the "Accept-Encoding" request header and the "Content-Encoding" response header.
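To make the two cases concrete, here is a small sketch that builds raw HTTP/1.1 responses in memory and reads the body back, with no network involved. The helper names build_response and read_body are made up for this example, and the parsing is far simpler than what a real client does (it ignores chunked transfer encoding, for instance). The point is only that the body bytes travel as they are unless "Content-Encoding" says otherwise.

```python
import gzip


def build_response(body: bytes, content_type: str, use_gzip: bool) -> bytes:
    """Build a raw HTTP/1.1 response; the body is sent as-is or gzipped."""
    lines = [b"HTTP/1.1 200 OK", b"Content-Type: " + content_type.encode("ascii")]
    if use_gzip:
        body = gzip.compress(body)
        lines.append(b"Content-Encoding: gzip")
    lines.append(b"Content-Length: " + str(len(body)).encode("ascii"))
    # A blank line separates the ASCII headers from the raw body bytes.
    return b"\r\n".join(lines) + b"\r\n\r\n" + body


def read_body(raw: bytes) -> bytes:
    """Split headers from body and undo gzip if the headers announce it."""
    head, _, body = raw.partition(b"\r\n\r\n")
    fields = {}
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        fields[name.strip().lower()] = value.strip()
    body = body[: int(fields[b"content-length"])]
    if fields.get(b"content-encoding") == b"gzip":
        body = gzip.decompress(body)
    return body


if __name__ == "__main__":
    image_like = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + bytes(range(256))
    raw = build_response(image_like, "image/jpeg", use_gzip=False)
    print(image_like in raw)             # binary bytes appear unchanged
    print(read_body(raw) == image_like)
```

The same splitting step (headers, blank line, Content-Length bytes of body) is what a tool does when it extracts transferred files from a captured conversation.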
I understand the name is misleading you, but if you read Hyper Text Transfer Protocol as a Transfer Protocol with Hypertext capabilities, then it changes a bit.
When HTTP was developed there were already lots of protocols (for example, the IP protocol, which is how data are widely transmitted between servers on the internet) but there were no protocols that allowed for easy navigation between documents.
HTTP is a protocol that allows for transferring of information AND for hyper text (i.e. links) embedded within text documents. These links don't necessarily have to point to other text documents, so you can basically transmit any information using HTTP (the sender and the receiver agree on the type of document being sent using something called the mime type).
So the name still makes sense, even if you can send things other than text files.
HTTP stands for Hyper Text Transfer Protocol. So to understand it literally, it is meant for text transferring.
Yes, text transferring. Not necessarily plain text, but all text. It doesn't mean that your text has to be readable by a person, just the computer.
And I used some sniffer tool to monitor the wire traffic. What gets transferred is all ASCII characters.
Your sniffer tool knows that you're a person, so it won't just present you with 0s and 1s. It converts whatever it gets to ASCII characters to make it readable to you. All communication over the wire is binary. The ASCII representation is just there for your sake.
So I guess we have to convert whatever we want to download into characters before transferring it via HTTP
No, not at all. Again, it's text – not necessarily plain text.
I always think it is TCP that can transfer whatever data, [...]
Here you're right. TCP does transfer all data, but in a completely different layer. To understand this, let's look at the OSI model:
When you send anything over the network, your data goes through all the different layers. First, the application layer. Here we have HTTP and several others. Everything you send over HTTP goes through the layers, down through presentation and all the way to the physical layer.
So when you say that TCP transfers the data, then you're right (HTTP could work over other transport protocols such as UDP, but that is rarely seen), but TCP transfers all your data whether you download a file from a webserver, copy a shared folder on your local network between computers or send an email.
HTTP can transfer "binary" data just fine. There is no need to convert anything.
HTTP is the protocol used to transfer your data. In your case any file you are downloading.
You can either do that (opening another type of connection) or you can send your data as raw text. What you'll send is just what you would see when opening the file in a text editor. Your browser just decides to save the file in your Downloads folder (or wherever you want it) because it sees the file type is not supported (.rar, .zip).
If you look at the OSI model, HTTP is a protocol that lives in the application layer. So when you hear that someone uses "HTTP to transfer data", they are referring to the application-layer protocol. An alternative would be FTP or NFS, for example.
The browser does indeed open a TCP connection when HTTP is used. TCP lives in the transport layer and provides a reliable connection on top of IP.
HTTP provides different verbs that can be used to retrieve and send data; GET and POST are the most common ones. Look up REST.

What is the better performing / more compact way to send binary data to a server in WP7

Given that Windows Phone 7 does not allow direct TCP/socket access, I was wondering which approach has the least performance overhead and/or can send the data in the most compact way.
I think I can send the data as a file using HTTP (probably with an HTTPWebRequest) and encode it as Base64, but this would increase the transfer size significantly. I could use WCF but the performance overhead is going to be large as well.
Is there a way to send plain binary data without encoding it, or some faster way to do so?
Network communication on WP7 is currently limited to HTTP only.
With that in mind you're going to have to allow for the HTTP header being included as part of the transmission. You can help keep this small by not adding any additional headers yourself (unless you really have to).
In terms of the body of the message then it's up to you to keep things as small as possible.
Formatting your data as JSON will typically be smaller than as XML.
If, however, your data will always be in a specific format, you could just include it as raw data, i.e. if you know that the data will have the first n bits/bytes/characters representing one thing, the next y bits/bytes/characters representing another, etc., you could format your data without any (field) identifiers. It just depends what you need.
If you want to send binary data, then certainly some people have been using raw sockets - see
Connect to attached pc from WP7 by opening a socket to localhost
However, unless you want to write your own socket server, then HTTP is very convenient. As Matt says, you can include binary content in your HTTP requests. To do this, you can use the headers:
Content-Type: application/octet-stream
Content-Transfer-Encoding: binary
Content-Length: your length
To actually set these headers, you may need to send this as a multipart message... see questions like Upload files with HTTPWebrequest (multipart/form-data)
There's some excellent sample code on AppHub forums - http://forums.create.msdn.com/forums/p/63646/390044.aspx - shows how to upload a binary photo to Facebook.
Unless your data is very large, it may be easier to take the 4/3 size hit of Base64 encoding :) (and there are other slightly more efficient encodings too, like Ascii85 - http://en.wikipedia.org/wiki/Ascii85)
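To put numbers on that overhead, here is a quick sketch in Python (chosen only for brevity; it is not WP7 code) that compares the encoded size of a 3000-byte payload. The payload and the helper name encoded_sizes are arbitrary choices for this example.

```python
import base64


def encoded_sizes(payload: bytes) -> dict:
    """Return the size in bytes of the payload under each encoding."""
    return {
        "raw": len(payload),
        "base64": len(base64.b64encode(payload)),   # 4 output chars per 3 bytes
        "ascii85": len(base64.a85encode(payload)),  # 5 output chars per 4 bytes
    }


if __name__ == "__main__":
    # 3000 bytes with no zero bytes, so Ascii85 cannot use its
    # one-character shortcut for all-zero groups.
    payload = bytes(range(1, 251)) * 12
    sizes = encoded_sizes(payload)
    for name, size in sizes.items():
        print(f"{name}: {size} bytes ({size / sizes['raw']:.2f}x)")
```

For this payload Base64 comes out at 4000 bytes and Ascii85 at 3750, so the saving from switching encodings is modest compared with sending the bytes unencoded.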

Sniff HTTP packets for GET and POST requests from an application

I am coding an SEO tool in C# for doing keyword research. I need to make calls to the Google AdWords Keyword Tool. I know of some tools which already do the same.
I just need to decipher what they are doing. I tried using Wireshark but it's very complex to get the actual POST data using Wireshark.
I tried using Fiddler on IE, but it seems too many JavaScript requests are made, which clutters the Fiddler output a lot.
If I can just find out the exact requests the other tool is making I think my job is done. How can I do this?
Put http.request.method == "POST" in the display filter of Wireshark to only show POST requests. Click on the packet, then expand the Hypertext Transfer Protocol field. The POST data will be right there on top.
You will have to use some sort of network sniffer if you want to get at this sort of data and you're likely to run into the same problem (pulling out the relevant data from the overall network traffic) with those that you do now with Wireshark.

How do Download Managers download huge files on HTTP without multiple requests?

I was downloading a 200 MB file yesterday with FlashGet, and its statistics showed that it was using the HTTP/1.1 protocol.
I was under the impression that HTTP is a request-response protocol, most generally used for web pages weighing a few KiB... I don't quite understand how it can download MBs or GBs of data, and simultaneously through 5 (or more) different streams at that.
HTTP/1.1 has a "Range" header that can specify what part of a file to transfer over the connection. The download manager can make multiple connections, specifying different ranges to transfer. It would then combine the chunks together to build the full file.
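A minimal sketch of that idea, with no real network traffic: split_ranges computes the "Range" header value for each connection, and serve_range stands in for a server that honours the header. Both function names are invented for this example. A real server would answer each such request with "206 Partial Content" and a "Content-Range" header, and only if it supports byte ranges at all.

```python
from typing import List


def split_ranges(total_size: int, parts: int) -> List[str]:
    """Split total_size bytes into Range header values, one per connection."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        end = start + length - 1  # byte ranges are inclusive on both ends
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


def serve_range(content: bytes, range_header: str) -> bytes:
    """Stand-in for a server: return the bytes a simple range asks for."""
    first, last = (int(x) for x in range_header[len("bytes="):].split("-"))
    return content[first:last + 1]


if __name__ == "__main__":
    content = bytes(range(256)) * 4  # pretend this is the remote file
    headers = split_ranges(len(content), 5)
    print(headers)
    # The download manager fetches each chunk separately, then joins them.
    rebuilt = b"".join(serve_range(content, h) for h in headers)
    print(rebuilt == content)
```

A download manager would issue these requests in parallel on separate connections and write each chunk at its own offset in the output file.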
There is no size limit in HTTP. It is used for web pages, but it is also used to deliver a huge majority of the content on the Internet. It's more a matter of bandwidth that limits sizes, not the protocol itself. And of course, this was more of a limit in the early days (and, I suppose, for those still on dial-up).
These links might help:
HTTP
HTTP Persistent Connections
Chunked Transfer Encoding

Resources