Why does HTTP use only printable characters? [duplicate] - http

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Why HTTP protocol is designed in plain text way?
To show my complete ignorance of how TCP/IP works: Looking at the ASCII table, what is the rationale of HTTP using only tab, newline and [x20-x7E] for the protocol?
For example, why is "x02" ("Start of Text") not used but a double newline "x0Ax0A"?
Has it to do with any interference with TCP/IP (as in "is not allowed in Application Layer")? Is there perhaps a more trivial reason? Or a more complicated?
Duplicate of this question: Why HTTP protocol is designed in plain text way?.

Text-based protocols are easier to debug - no need for special formatting/decoding routines in your debug print code, just dump the entire request to the console. Likewise, you can make a HTTP request just by typing it into an appropriate generic TCP relay program (eg, netcat).

I can't speak as to why HTTP chose to use that particular restricted character set, but it doesn't have to do with the restrictions of TCP/IP. TCP/IP packets are structured more or less like an envelope; they have complex routing and formatting information that must be stored in a very specific format, but the actual contents of the packet payloads can be whatever you choose. The packet headers contain enough information to allow for forwarding and transmission regardless of the packet content.

Related

Figuring out what kind of payload is carried by a packet

I'm working with Scapy to parse a set of .pcap files. I would like to understand what kind of payload those packets are carrying. If I have for example a pcap file with a lot of UDP packets which payloads has the same starting bytes I don't know what kind of encoding was used, and the first values keep repeating in other packets. Is there any program or python library that could allow me to figure out or try to guess what kind of encoding was used (if for example is an RTP payload or MPEG one and so on)?
UPDATE
I was able to use nDPI on those pcap files and it gave me satisfying results for all the flows except for a set of them that it was not able to recognize. I'm going to share with you the first part of the hex representation of the data:
f1d00404d1002d7c484830320000020080073804610d00007b09040000000000010f000000000000000000000000000000000000000000000000000121e002a22e537fcccb815afafce2361b
The first part f1d004 does not change between previous and successive packets. I have already tried to decode them with different protocols using wireshark's feature "Decode as". I have tried with RTP,RTCP,RTSP,JSON,MPEG. If can be useful, this is the capture related to a camera, that's why I tried the previous protocols.

Differentiating http and http2 packets

I'm working with packets one by one and need to be able to edit both http and http2 contents.
The question is: is there a way to distinguish the two on a single packet basis?
Edit: For some additional info, the point is to read and edit large pcap files, so i'm trying to work with as little memory as possible.
On a per-packet basis, no. A single TCP packet could represent any arbitrary part of the stream. You need to capture (at least) the first part of the stream to work out whether it's HTTP or HTTP/2 (or anything else).
You can use Chrome DevTool > Network > Protocol to see the protocol used in the file transference.

What does HTTP download exactly mean?

I often hear people say download with HTTP. What does it really mean technically?
HTTP stands for Hyper Text Transfer Protocol. So to understand it literally, it is meant for text transferring. And I used some sniffer tool to monitor the wire traffic. What get transferred are all ASCII characters. So I guess we have to convert whatever we want to download into characters before transferring it via HTTP. Using HTTP URL encoding? or some binary-to-text encoding schema such as base64? But that requires some decoding on the client side.
I always think it is TCP that can transfer whatever data, so I am guessing HTTP download is a mis-used word. It arise because we view a web page via HTTP and find some downloadable link on that page, and then we click it to download. In fact, browser open a TCP connection to download it. Nothing about HTTP.
Anyone could shed some light?
The complete answer to What does HTTP download exactly mean? is in its RCF 2616 specification, that you can read here: https://www.rfc-editor.org/rfc/rfc2616
Of course that's a long (but very detailed) document.
I won't replicate or summarize its content here.
In the body of your question you are more specific:
So to understand it literally, it is meant for text transferring.
I think the word "TEXT" it misleading you.
And
have to convert whatever we want to download into characters before transferring it via HTTP
is false. You don't necessarily have to.
A file, for example a JPEG image, may be sent over the wire without any kind of encoding. See for example this: When a web server returns a JPEG image (mime type image/jpeg), how is that encoded?
Note that optionally a compression or encoding may be applied (the most common case is GZIP for textual content like html, text, scripts...) but that depends on how the client and the server agree on how the data have to be transferred. That "agreement" is made with the "Accept-Encoding" and "Content-Encoding" directives in respectively the request's and the resonse's headers.
I understand the name is misleading you, but if you read Hyper Text Transfer Protocol as a Transfer Protocol with Hypertext capabilities, then it changes a bit.
When HTTP was developed there were already lots of protocols (for example, the IP protocol, which is how data are widely transmitted between servers on the internet) but there were not protocols that allowed for easy navigation between documents.
HTTP is a protocol that allows for transferring of information AND for hyper text (i.e. links) embedded within text documents. These links don't necessarily have to point to other text documents, so you can basically transmit any information using HTTP (the sender and the receiver agree on the type of document being sent using something called the mime type).
So the name still makes sense, even if you can send things other than text files.
HTTP stands for Hyper Text Transfer Protocol. So to understand it literally, it is meant for text transferring.
Yes, text transferring. Not necessarily plain text, but all text. It doesn't mean that your text has to be readable by a person, just the computer.
And I used some sniffer tool to monitor the wire traffic. What get transferred are all ASCII characters.
Your sniffer tool knows that you're a person, so it won't just present you with 0s and 1s. It converts whatever it gets to ASCII characters to make it readable to you. Alle communication over the wire is binary. The ASCII representation is just there for your sake.
So I guess we have to convert whatever we want to download into characters before transferring it via HTTP
No, not at all. Again, it's text – not necessarily plain text.
I always think it is TCP that can transfer whatever data, [...]
Here you're right. TCP does transfer all data, but in a completely different layer. To understand this, let's look at the OSI model:
When you send anything over the network, your data goes through all the different layers. First, the application layer. Here we have HTTP and several others. Everything you send over HTTP goes through the layers, down through presentation and all the way to the physical layer.
So when you say that TCP transfers the data, then you're right (HTTP could work over other transport protocols such as UDP, but that is rarely seen), but TCP transfers all your data whether you download a file from a webserver, copy a shared folder on your local network between computers or send an email.
HTTP can transfer "binary" data just fine. There is no need to convert anything.
HTTP is the protocol used to transfer your data. In your case any file you are downloading.
You can either do that(opening another type of connection) or you can send your data as raw text. What you'll send is just what you would see when opening the file in a text editor. Your browser just decides to save the file in your Downloads folder(or whereever you want it) because it sees the file type is not supportet(.rar, .zip).
If you look at OSI model, HTTP is a protocol that lives in the application layer. So when you hear that someone uses "HTTP to transfer data" they are referring to application layer protocol. An alternative would be FTP or NFS, for example.
Browser indeed opens TCP connection, when HTTP is used. TCP lives in the transport layer and provides reliable connection on top of IP.
HTTP protocol provides different verbs that can be used to retrieve and send data, GET and POST are the most common ones. Look-up REST.

What is the better performing / more compact way to send binary data to a server in WP7

Given the no direct tcp / socket limitation in Windows Phone 7 I was wondering what is the way that has the least performance overhead and/or can send it in the most compact way.
I think I can send the data as a file using HTTP (probably with an HTTPWebRequest) and encode it as Base64, but this would increase the transfer size significantly. I could use WCF but the performance overhead is going to be large as well.
Is there a way to send plain binary data without encoding it, or some faster way to do so?
Network communication on WP7 is currently limited to HTTP only.
With that in mind you're going to have to allow for the HTTP header being included as part of the transmission. You can help keep this small by not adding any additional headers youself (unless you really have to).
In terms of the body of the message then it's up to you to keep things as small as possible.
Formatting your data as JSON will typically be smaller than as XML.
If, however, your data will always be in a specific format you could just include it as raw data. i.e. if you know that the the data will have the first n bits/bytes/characters representing one thing, then next y bits/bytes/characters represent another, etc. you could format your data without any (field) identifiers. It just depends what you need.
If you want to send binary data, then certainly some people have been using raw sockets - see
Connect to attached pc from WP7 by opening a socket to localhost
However, unless you want to write your own socket server, then HTTP is very convenient. As Matt says, you can include binary content in your HTTP requests. To do this, you can use the headers:
Content-Type: application/octet-stream
Content-Transfer-Encoding: binary
Content-Length: your length
To actually set these headers, you may need to send this as a multipart message... see questions like Upload files with HTTPWebrequest (multipart/form-data)
There's some excellent sample code on AppHub forums - http://forums.create.msdn.com/forums/p/63646/390044.aspx - shows how to upload a binary photo to Facebook.
Unless your data is very large, then it may be easier to take the 4/3 hit of Base64 encoding :) (and there are other slightly more efficient encoding types too like Ascii85 - http://en.wikipedia.org/wiki/Ascii85)

TCP/IP programming, data in more than one packet

I am writing an application in C, using libpcap. My program listens for new packets and parses them
according to a grammar. The payload actually is XML.
Sometimes one packet is not enough for an XML file, so the XML buffer is splitted into separate packets.
I want to add code logic in order to handle these cases. However I don't know in advance that a packet does not contain the whole data. How do I know that a packet has more data that will be send next? How to i recognize that a new packet contains the rest of the data?
Do I have to use the TH_FIN flag? Could you please explain it to me?
There's nothing in TCP that defines packets, that's up to the higher layers to define if they need to - TCP is just a stream.
If this is raw XML over a TCP stream, you actually need to parse the xml - you'll know when you have a whole xml document when you've received the end of the document element.
If it's XML packaged over HTTP , you might be able to parse out the Content-Length: header which should contain the length of the body.
Note, reassembling a TCP stream from captured packets is a very hard problem, there's a lot of corner cases, e.g. you'd need to handle retransmission , out of sequence tcp segments and many more. http://libnids.sourceforge.net/ might help you.
As Anon say use a higher level stream library.
But even then you need to know the chunk side before starting to handle it, as you will read from the stream in block's of n bytes.
Thus you want to first send in binary the number of bytes to be sent, then send x bytes, and repeat, thus when you are receiving the chucks via select/read to know went you have all of chunk one to pass to the processor.
If you're using TCP, use a TCP library that gives you the data as a stream instead of trying to handle the packets yourself.
Stream is good. Another option is to store the incoming data in a buffer (eg char*) and search for application messaging framing characters or in the case of Xml, the root end tag. Once you've found a complete xml message at the front of the buffer, pull it out and process.
The XMPP instant messaging protocol, used by Jabber, has means to move XML chunks over a TCP stream. I don't know how exactly it is done myself, but RFC 3290 is the protocol definition. You should be able to work it out from that.

Resources