What is the smallest possible HTTP and HTTPS data request?

Let's say I want to GET one byte from a server using the HTTP protocol and I want to minimize everything. No headers, just http://myserver.com/b, where b is a text file with one character in it, or better still b is just one character (not sure if that is possible).
Is there a way to do this with Apache, and what is the smallest possible amount of data required for a complete HTTP and a complete HTTPS transaction?
Alternatively, the transaction could be done with just a HEAD request if that is more data-efficient.

If you're planning to use HTTP/1.1 (more or less required if you end up on a virtual host), your GET request will need to include the host name, either in the Host header or as an absolute URI in the request line (see RFC 2616 section 5.1.2).
Your response will also need a Content-Length header, or a transfer encoding with its delimiters.
If you're willing to "break" HTTP by using a HEAD request, it sounds like HTTP might not be the best choice of protocol. You might also be able to return something in a custom header, but that's not a clean way of doing it.
Note that, even if you implement your own protocol, you will need to implement a mechanism similar to what Content-Length or chunked encoding provide, to be able to determine when to stop reading from the remote party (otherwise, you won't be able to detect badly closed connections).
EDIT:
Here is a quick example; it will vary depending on your host name (assuming HTTP/1.1). I guess you could use OPTIONS instead. It depends on how much you're willing to break HTTP...
Request:
GET / HTTP/1.1
Host: www.example.com
That's 14 + 2 + 21 + 2 + 2 = 41 bytes (2 for CRLF)
Response:
HTTP/1.1 200 OK
Content-Length: 1
Content-Type: text/plain
a
That's 15 + 2 + 17 + 2 + 24 + 2 + 2 + 1 = 65 bytes
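If you want to verify those byte counts on the wire, here is a rough, untested Python sketch (www.example.com is just a placeholder host): it sends exactly the minimal request above over a raw socket and reads a Content-Length-delimited response, which also illustrates why you need Content-Length to know when to stop reading.
import socket

# The minimal HTTP/1.1 request shown above: 41 bytes including CRLFs.
request = b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
print(f"Request size: {len(request)} bytes")

with socket.create_connection(("www.example.com", 80)) as sock:
    sock.sendall(request)

    # Read until the blank line that ends the header section.
    data = b""
    while b"\r\n\r\n" not in data:
        data += sock.recv(4096)
    headers, _, body = data.partition(b"\r\n\r\n")

    # Content-Length tells us how many body bytes to expect.
    length = next(int(line.split(b":", 1)[1])
                  for line in headers.split(b"\r\n")
                  if line.lower().startswith(b"content-length"))
    while len(body) < length:
        body += sock.recv(4096)

print(f"Response size: {len(headers) + 4 + len(body)} bytes")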
For HTTPS, there will be a small overhead for the SSL/TLS channel itself, but the bulk of it will be taken by the handshake, in particular, the server certificate (assuming you're not using client-cert authentication) should be the biggest. Check the size (in DER format) of your certificate.

What exactly are you trying to achieve? Is this a sort of keep-alive?
You could do a "GET /", which implies HTTP/1.0 being used, but that locks you out of stuff like virtual hosting etc. You can map "/" to a CGI script; it doesn't need to be a real file, depending on what you're trying to achieve. You can configure Apache to return only the minimum set of headers, which would basically be "Content-Type: text/plain" (or another, shorter MIME type, possibly a custom one, e.g. "Content-Type: a/b") and "Content-Length: 0", thus not returning a response body at all.

This is an old question, but maybe someone will find it useful, since nobody has answered the HTTPS part of the question.
For me this was needed for easy validation of HTTPS communication in my proxy, which connects to other, untrusted proxies through a tunnel.
This site explains it clearly: http://netsekure.org/2010/03/tls-overhead/
Quotes from the article:
One thing to keep in mind that will influence the calculation is the variable size of most of the messages. The variable nature will not allow to calculate a precise value, but taking some reasonable average values for the variable fields, one can get a good approximation of the overhead. Now, let’s go through each of the messages and consider their sizes.
ClientHello – the average size of initial client hello is about 160 to 170 bytes. It will vary based on the number of ciphersuites sent by the client as well as how many TLS ClientHello extensions are present. If session resumption is used, another 32 bytes need to be added for the Session ID field.
ServerHello – this message is a bit more static than the ClientHello, but still variable size due to TLS extensions. The average size is 70 to 75 bytes.
Certificate – this message is the one that varies the most in size between different servers. The message carries the certificate of the server, as well as all intermediate issuer certificates in the certificate chain (minus the root cert). Since certificate sizes vary quite a bit based on the parameters and keys used, I would use an average of 1500 bytes per certificate (self-signed certificates can be as small as 800 bytes). The other varying factor is the length of the certificate chain up to the root certificate. To be on the more conservative side of what is on the web, let’s assume 4 certificates in the chain. Overall this gives us about 6k for this message.
ClientKeyExchange – let’s assume again the most widely used case – RSA server certificate. This corresponds to size of 130 bytes for this message.
ChangeCipherSpec – fixed size of 1 (technically not a handshake message)
Finished – depending whether SSLv3 is used or TLS, the size varies quite a bit – 36 and 12 bytes respectively. Most implementations these days support TLSv1.0 at least, so let’s assume TLS will be used and therefore the size will be 12 bytes
So the minimum can be as big (or small) as:
20 + 28 + 170 + 75 + 800 + 130 + 2*1 + 2*12 ≈ 1249
Though according to the article, the average is about 6449 bytes.
Also, it is important to know that TLS sessions can be resumed, so only the first connection has this overhead; each subsequent connection adds only about 330 bytes.
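If you want to check the certificate part of that overhead yourself, here is a small, untested Python sketch (www.example.com is a placeholder host); note that it only fetches the leaf certificate, so intermediates in the chain add further handshake bytes:
import ssl

# Fetch the server certificate and report its DER-encoded size,
# which typically dominates the TLS handshake overhead.
pem = ssl.get_server_certificate(("www.example.com", 443))
der = ssl.PEM_cert_to_DER_cert(pem)
print(f"Server certificate: {len(der)} bytes (DER)")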

Related

Are large HTTP envelopes split across several partial http requests?

I'm reading a book that looks at different web service architectures, including an overview of how the SOAP protocol can be implemented via HTTP. This was interesting to me because I do a lot of WCF development and didn't realize how client/server communication was implemented.
Since protocols like TCP and whatever is lower than that have fixed maximum packet sizes and as such have to split messages into packets, I just assumed that HTTP was similar. Is that not the case?
I.e. if I make a GET/POST request with a 20 MB body, will a single HTTP envelope be sent and reassembled on the server?
If this is the case, what is the largest practical size of an HTTP request? I have previously configured Nginx servers to allow 20 MB file transfers and I'm wondering if this is too high...
From the HTTP specification's point of view, there is no limit on the HTTP payload. According to RFC 7230:
Any Content-Length field value greater than or equal to zero is valid. Since there is no predefined limit to the length of a payload, a recipient MUST anticipate potentially large decimal numerals and prevent parsing errors due to integer conversion overflows.
However, to prevent attack via very long or very slow stream of data, a web server may reject such HTTP request and return 413 Payload Too Large response.
"Since protocols like TCP and whatever is lower than that have fixed maximum packet sizes and as such have to split messages into packets, I just assumed that HTTP was similar. Is that not the case?"
No. HTTP is an application-level protocol and is totally different. Since HTTP is layered on top of TCP, the data is automatically split into packets at the TCP level as it is transferred. There is no need to split the request at the HTTP level.
An HTTP body can be as large as you want it to be, there is no download size limit, the size limit is usually set for uploads, to prevent someone uploading massive files to your server.
You can ask for a section of a resource using the Range header, if you only want part of it.
IE had limits of 2 and 4 GB at times, but these have been fixed since. Source

Why is the total number of characters in a GET limited?

I want to ask some questions about the following quote, taken from the book Head First Servlets and JSP, Second Edition:
The total amount of characters in a GET is really limited (depending
on the server). If the user types, say, a long passage into a “search”
input box, the GET might not work.
Why is the total amount of characters in a GET limited?
How can I learn the limit on the number of characters in a GET?
When I enter a long text into an input box, the GET does not work.
What solutions do I have to fix this problem?
Why is the GET method limited?
There is no specific limit to the length of a GET request. Different servers can have different limits. If you need to send more data to the server, use POST instead of GET. A recommended minimum to be supported by servers and browsers is 8,000 bytes, but this is not required.
RFC 7230's Section 3.1.1 "Request Line" says
HTTP does not place a predefined limit on the length of a request-line, as described in Section 2.5. A server that receives a method longer than any that it implements SHOULD respond with a 501 (Not Implemented) status code. A server that receives a request-target longer than any URI it wishes to parse MUST respond with a 414 (URI Too Long) status code (see Section 6.5.12 of RFC7231).
Various ad hoc limitations on request-line length are found in practice. It is RECOMMENDED that all HTTP senders and recipients support, at a minimum, request-line lengths of 8000 octets.
Section 2.5 "Conformance and Error Handling" says
HTTP does not have specific length limitations for many of its protocol elements because the lengths that might be appropriate will vary widely, depending on the deployment context and purpose of the implementation. Hence, interoperability between senders and recipients depends on shared expectations regarding what is a reasonable length for each protocol element. Furthermore, what is commonly understood to be a reasonable length for some protocol elements has changed over the course of the past two decades of HTTP use and is expected to continue changing in the future.
and RFC 7231's Section 6.5.12 "414 URI Too Long" says
The 414 (URI Too Long) status code indicates that the server is
refusing to service the request because the request-target (Section
5.3 of [RFC7230]) is longer than the server is willing to interpret.
This rare condition is only likely to occur when a client has
improperly converted a POST request to a GET request with long query
information, when the client has descended into a "black hole" of
redirection (e.g., a redirected URI prefix that points to a suffix of
itself) or when the server is under attack by a client attempting to
exploit potential security holes.
The GET data is sent in the query string, which also has a maximum length.
You can do all kinds of things with the query string, e.g. bookmark it. Would you really like to bookmark a really huge text?
It is possible to configure most servers to accept longer lengths; some clients will accept them, some will throw errors.
"Note: Servers ought to be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations might not properly support these lengths." HTTP 1.1 specification chapter 3.2.1:.
There is also a status code "414 Request-URI Too Long" - if you get this you will know that you have put to many chars in the get. (If you hit the server limit, if the client limit is lower then the server limit each browser will react in it's own way).
Generally it is wise to set a limit on any data being sent to a server, in case someone tries to create a huge workload or slow the server down (e.g. send a huge file so one server connection is tied up, slow down the transmission, and make additional requests; at some point the server will have a lot of connections open. Use multiple clients and you have an attack scenario against the server).
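To see the practical effect, here is a rough Python sketch using the requests library (http://localhost:8080/search is a made-up endpoint): a very long query string may be rejected with 414, while the same data in a POST body goes through.
import requests

long_text = "x" * 100_000  # far beyond the 8000-octet minimum servers are asked to support

# GET puts the data in the request-target; many servers reject this
# with 414 URI Too Long (or another 4xx, depending on the server).
r = requests.get("http://localhost:8080/search", params={"q": long_text})
print(r.status_code)

# POST carries the same data in the message body, which has no such
# practical request-line limit.
r = requests.post("http://localhost:8080/search", data={"q": long_text})
print(r.status_code)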

How can I download a single file from multiple locations via HTTP?

I need to download a big file quickly, but all sources I can find have throttled bandwidth. Each of them seems to support HTTP/1.1 Byte Serving (Range Requests), since I can pause and resume the downloads. How can I download it from multiple sources in parallel?
Assuming this is a programming question (given that this is StackOverflow) I am going to explain how instead of just linking to a download accelerator that takes advantage of this.
What is needed in terms of the server to do this?
A server that supports the Range HTTP header.
A server that allows concurrent connections. It is possible to support Range while not allowing multiple simultaneous connections by using either endpoint or IP-based restrictions server side. For this reason, I recommend you set up a simple test server instead of downloading from a file sharing site while testing this.
What is the Range Header?
Data transmission over HTTP is sent in order starting from the beginning of the file if the Range header is not set. The first byte of the file on the server will be the first byte of the HTTP response and the last byte of the file on the server will be the last byte of the HTTP response. The Range header allows you to specify where the bytes should start sending from allowing you to "skip" the beginning of the response.
Actual Answer Example
Our Situation
The response is plain text. The response content is just the one word "StackOverflow!!", encoded in ASCII, meaning each character is one byte. Therefore, the Content-Length header's value is 15 octets (another term for bytes).
We are going to download this file using 3 requests. For the sake of this example, we are going to say it will be 3 times faster, but you should realize that this method will make downloads slower for very small files. This is because HTTP headers must be sent with each request, as well as the 3-way handshake. We will also assume that the server supports HEAD requests and that the Content-Length header is sent with the download response. Finally, the download requests will be performed using GET. However, there are workarounds for POST.
Juicy Details
First, perform an HTTP HEAD request. Take the "Content-Length" header and divide that value by the number of concurrent parallel connections you wish to make. For this example, the Content-Length is 15 and we wish to make 3 connections, so the divided value will be 5.
Now perform the number of requests you wish to run in parallel. With each request, set the Range header to "Range: bytes=" followed by the number of requests already made times the divided value found above. Then append "-" followed by that starting value plus the divided value minus one (the end of a byte range is inclusive).
For this example, each request should have the header set as followed.
Range: bytes=0-4
Range: bytes=5-9
Range: bytes=10-14
The response of each of these requests should be
Stack
Overf
low!!
In essence, we are just conforming to Range specification (section 3.12 of RFC 2616) as well as Byte Range specification (section 14.35 of RFC 2616).
Finally, append the bytes of each request to form the final response data.
Disclaimer: I've never actually tried this but it should work in theory
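In the same spirit as that disclaimer, here is an untested Python sketch of the procedure above (the URL is a placeholder, and it assumes the server honours Range and allows concurrent connections):
import concurrent.futures
import requests

URL = "http://example.com/file1.iso"  # placeholder
CONNECTIONS = 3

def fetch_chunk(start, end):
    # Byte ranges are inclusive on both ends.
    resp = requests.get(URL, headers={"Range": f"bytes={start}-{end}"})
    resp.raise_for_status()
    return resp.content

def download():
    # 1. HEAD request to learn the total size.
    size = int(requests.head(URL).headers["Content-Length"])
    chunk = -(-size // CONNECTIONS)  # ceiling division

    # 2. Fetch the chunks in parallel.
    ranges = [(i * chunk, min(size, (i + 1) * chunk) - 1)
              for i in range(CONNECTIONS)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=CONNECTIONS) as pool:
        parts = pool.map(lambda r: fetch_chunk(*r), ranges)

    # 3. Append the chunks in order to form the final data.
    return b"".join(parts)

print(f"Downloaded {len(download())} bytes")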
I can't say if wget is able to put a file together again, if fetched from multiple sources.
The following example shows how to do it with aria2c.
You would build a download description file and then pass that to aria, like so:
aria2c -i uri.txt --split=5 --min-split-size=1M --max-connection-per-server=5
where uri.txt might contain
http://a.com/file1.iso http://mirror-1.com/file1.iso http://mirror-2.com/file1.iso
dir=/downloads
out=file1.iso
This would fetch the same file, from 3 different locations and place it into the downloads folder (dir) with the name file1.iso (out).

What is the mask in a WebSocket frame?

I am working on a WebSocket implementation and do not understand the purpose of the mask in a frame.
Could somebody explain to me what it does and why it is recommended?
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len | Extended payload length |
|I|S|S|S| (4) |A| (7) | (16/64) |
|N|V|V|V| |S| | (if payload len==126/127) |
| |1|2|3| |K| | |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
| Extended payload length continued, if payload len == 127 |
+ - - - - - - - - - - - - - - - +-------------------------------+
| |Masking-key, if MASK set to 1 |
+-------------------------------+-------------------------------+
| Masking-key (continued) | Payload Data |
+-------------------------------- - - - - - - - - - - - - - - - +
: Payload Data continued ... :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| Payload Data continued ... |
+---------------------------------------------------------------+
Websockets are defined in RFC6455, which states in Section 5.3:
The unpredictability of the masking key is
essential to prevent authors of malicious applications from selecting
the bytes that appear on the wire.
In a blog entry about Websockets I found the following explanation:
masking-key (32 bits): if the mask bit is set (and trust me, it is if you write for the server side) you can read four unsigned bytes here which are used to xor the payload with. It's used to ensure that shitty proxies cannot be abused by attackers from the client side.
But the clearest answer I found was in a mailing list archive, where John Tamplin states:
Basically, WebSockets is unique in that you need to protect the network
infrastructure, even if you have hostile code running in the client, full
hostile control of the server, and the only piece you can trust is the
client browser. By having the browser generate a random mask for each
frame, the hostile client code cannot choose the byte patterns that appear
on the wire and use that to attack vulnerable network infrastructure.
As kmkaplan stated, the attack vector is described in Section 10.3 of the RFC.
This is a measure to prevent proxy cache poisoning attacks1.
What it does is create some randomness: you have to XOR the payload with the random masking-key.
By the way: It isn't just recommended. It is obligatory.
1: See Huang, Lin-Shung, et al. "Talking to yourself for fun and profit." Proceedings of W2SP (2011)
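For reference, the masking transform itself (RFC 6455, Section 5.3) is just a byte-wise XOR of the payload with the 4-byte masking key; a minimal Python illustration:
def mask_payload(payload: bytes, masking_key: bytes) -> bytes:
    # Each payload byte is XORed with the key byte at index i mod 4.
    # The same function unmasks, since XOR is its own inverse.
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))

masked = mask_payload(b"Hello", bytes([0x37, 0xfa, 0x21, 0x3d]))
assert mask_payload(masked, bytes([0x37, 0xfa, 0x21, 0x3d])) == b"Hello"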
From this article:
Masking of WebSocket traffic from client to server is required because of the unlikely chance that malicious code could cause some broken proxies to do the wrong thing and use this as an attack of some kind. Nobody has proved that this could actually happen, but since the fact that it could happen was reason enough for browser vendors to get twitchy, masking was added to remove the possibility of it being used as an attack.
So assuming attackers were able to compromise both the JavaScript code executed in a browser as well as the backend server, masking is designed to prevent the sequence of bytes sent between these two endpoints from being crafted in a special way that could disrupt any broken proxies between them (by broken, this means proxies that might attempt to interpret a WebSocket stream as HTTP when in fact they shouldn't).
The browser (and not the JavaScript code in the browser) has the final say on the randomly generated mask used to send the message, which is why it's impossible for attackers to know what the final stream of bytes seen by the proxy will be.
Note that the mask is redundant if your WebSocket stream is encrypted (as it should be). Article from the author of Python's Flask:
Why is there masking at all? Because apparently there is enough broken infrastructure out there that lets the upgrade header go through and then handles the rest of the connection as a second HTTP request which it then stuffs into the cache. I have no words for this. In any case, the defense against that is basically a strong 32bit random number as masking key. Or you know… use TLS and don't use shitty proxies.
I have struggled to understand the purpose of the WebSocket mask until I encountered the following two resources which summarize it clearly.
From the book High Performance Browser Networking:
The payload of all client-initiated frames is masked using the value specified in the frame header: this prevents malicious scripts executing on the client from performing a cache poisoning attack against intermediaries that may not understand the WebSocket protocol.
Since the WebSocket protocol is not always understood by intermediaries (e.g. transparent proxies), a malicious script can take advantage of it and create traffic that causes cache poisoning in these intermediaries.
But how?
The article Talking to Yourself for Fun and Profit (http://www.adambarth.com/papers/2011/huang-chen-barth-rescorla-jackson.pdf) further explains how a cache poisoning attack works:
The attacker’s Java applet opens a raw socket connection to attacker.com:80 (as before, the attacker can also use a SWF to mount a
similar attack by hosting an appropriate policy file to authorize this
request).
The attacker’s Java applet sends a sequence of bytes over the socket crafted with a forged Host header as follows: GET /script.js
HTTP/1.1 Host: target.com
The transparent proxy treats the sequence of bytes as an HTTP request and routes the request based on the original destination IP,
that is to the attacker’s server.
The attacker’s server replies with malicious script file with an HTTP Expires header far in the future (to instruct the proxy to cache
the response for as long as possible).
Because the proxy caches based on the Host header, the proxy stores the malicious
script file in its cache as http://target.com/script.js, not as
http://attacker.com/script.js.
In the future, whenever any client
requests http://target.com/script.js via the proxy, the proxy will
serve the cached copy of the malicious script.
The article also further explains how WebSockets come into the picture in a cache-poisoning attack:
Consider an intermediary examining packets exchanged between the browser and the attacker’s server. As above, the client requests
WebSockets and the server agrees. At this point, the client can send
any traffic it wants on the channel. Unfortunately, the intermediary
does not know about WebSockets, so the initial WebSockets handshake
just looks like a standard HTTP request/response pair, with the
request being terminated, as usual, by an empty line. Thus, the client
program can inject new data which looks like an HTTP request and the
proxy may treat it as such. So, for instance, he might inject the
following sequence of bytes: GET /sensitive-document HTTP/1.1 Host: target.com
When the intermediary examines these bytes, it might conclude that
these bytes represent a second HTTP request over the same socket. If
the intermediary is a transparent proxy, the intermediary might route
the request or cache the response according to the forged Host header.
In the above example, the malicious script took advantage of the WebSocket not being understood by the intermediary and "poisoned" its cache. Next time someone asks for sensitive-document from target.com they will receive the attacker's version of it. Imagine the scale of the attack if that document is for google-analytics.
To conclude, by forcing a mask on the payload, this poisoning won't be possible. The intermediary's cache entry will be different every time.

Maximum on HTTP header values?

Is there an accepted maximum allowed size for HTTP headers? If so, what is it? If not, is this something that's server specific or is the accepted standard to allow headers of any size?
No, HTTP does not define any limit. However, most web servers do limit the size of the headers they accept. For example, in Apache the default limit is 8 KB; in IIS it's 16 KB. The server will return a 413 Entity Too Large error if the header size exceeds that limit.
Related question: How big can a user agent string get?
As vartec says above, the HTTP spec does not define a limit, however many servers do by default. This means, practically speaking, the lower limit is 8K. For most servers, this limit applies to the sum of the request line and ALL header fields (so keep your cookies short).
Apache 2.0, 2.2: 8K
nginx: 4K - 8K
IIS: varies by version, 8K - 16K
Tomcat: varies by version, 8K - 48K (?!)
It's worth noting that nginx uses the system page size by default, which is 4K on most systems. You can check with this tiny program:
pagesize.c:
#include <unistd.h>
#include <stdio.h>

int main() {
    int pageSize = getpagesize();
    printf("Page size on your system = %i bytes\n", pageSize);
    return 0;
}
Compile with gcc -o pagesize pagesize.c then run ./pagesize. My ubuntu server from Linode dutifully informs me the answer is 4k.
Here are the limits of the most popular web servers:
Apache - 8K
Nginx - 4K-8K
IIS - 8K-16K
Tomcat - 8K – 48K
Node (<13) - 8K; (>13) - 16K
HTTP does not place a predefined limit on the length of each header
field or on the length of the header section as a whole, as described
in Section 2.5. Various ad hoc limitations on individual header
field length are found in practice, often depending on the specific
field semantics.
HTTP header values are restricted by server implementations. The HTTP specification doesn't restrict header size.
A server that receives a request header field, or set of fields,
larger than it wishes to process MUST respond with an appropriate 4xx
(Client Error) status code. Ignoring such header fields would
increase the server's vulnerability to request smuggling attacks
(Section 9.5).
Most servers will return 413 Entity Too Large or appropriate 4xx error when this happens.
A client MAY discard or truncate received header fields that are
larger than the client wishes to process if the field semantics are
such that the dropped value(s) can be safely ignored without changing
the message framing or response semantics.
Uncapped HTTP header size keeps the server exposed to attacks and can bring down its capacity to serve organic traffic.
Source
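As a quick way to see a given server's reaction, here is a rough Python sketch (http://localhost:8080 is a placeholder test server): send one oversized header and observe the 4xx response.
import requests

huge_value = "x" * 65_536  # 64 KB, well beyond the common 8-16 KB defaults

r = requests.get("http://localhost:8080/", headers={"X-Big-Header": huge_value})
print(r.status_code)  # typically 400, 413, or 431, depending on the server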
RFC 6265 dated 2011 prescribes specific limits on cookies.
https://www.rfc-editor.org/rfc/rfc6265
6.1. Limits
Practical user agent implementations have limits on the number and
size of cookies that they can store. General-use user agents SHOULD
provide each of the following minimum capabilities:
o At least 4096 bytes per cookie (as measured by the sum of the
length of the cookie's name, value, and attributes).
o At least 50 cookies per domain.
o At least 3000 cookies total.
Servers SHOULD use as few and as small cookies as possible to avoid
reaching these implementation limits and to minimize network
bandwidth due to the Cookie header being included in every request.
Servers SHOULD gracefully degrade if the user agent fails to return
one or more cookies in the Cookie header because the user agent might
evict any cookie at any time on orders from the user.
--
The intended audience of the RFC is what must be supported by a user-agent or a server. It appears that to tune your server to support what the browser allows you would need to configure 4096*50 as the limit. As the text that follows suggests, this does appear to be far in excess of what is needed for the typical web application. It would be useful to use the current limit and the RFC outlined upper limit and compare the memory and IO consequences of the higher configuration.
I also found that in some cases the reason for 502/400 responses could be a large number of headers, regardless of their size.
From the HAProxy docs:
tune.http.maxhdr
Sets the maximum number of headers in a request. When a request comes with a
number of headers greater than this value (including the first line), it is
rejected with a "400 Bad Request" status code. Similarly, too large responses
are blocked with "502 Bad Gateway". The default value is 101, which is enough
for all usages, considering that the widely deployed Apache server uses the
same limit. It can be useful to push this limit further to temporarily allow
a buggy application to work by the time it gets fixed. Keep in mind that each
new header consumes 32bits of memory for each session, so don't push this
limit too high.
https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#3.2-tune.http.maxhdr
If you are going to use a DDoS protection provider like Akamai, they have a maximum limit of 8K on the response header size, so essentially try to keep your response header size below 8K.
