Why is the total amount of character in a GET limited? - http

I want to ask some question about the following quote taken from Head First Servlets and JSP, Second Edition book:
The total amount of characters in a GET is really limited (depending
on the server). If the user types, say, a long passage into a “search”
input box, the GET might not work.
Why is the total amount of characters in a GET limited?
How can I learn about the total amount of character in a Get?
When I said a long text into any input box, and GET is not working.
How many solution do I have to fix this problem.
Why is the get method limited?

There is no specific limit to the length of a GET request. Different servers can have different limits. If you need to send more data to the server, use POST instead of GET. A recommended minimum to be supported by servers and browsers is 8,000 bytes, but this is not required.
RFC 7230's Section 3.1.1 "Request Line" says
HTTP does not place a predefined limit on the length of a request-line, as described in Section 2.5. A server that receives a method longer than any that it implements SHOULD respond with a 501 (Not Implemented) status code. A server that receives a request-target longer than any URI it wishes to parse MUST respond with a 414 (URI Too Long) status code (see Section 6.5.12 of RFC7231).
Various ad hoc limitations on request-line length are found in practice. It is RECOMMENDED that all HTTP senders and recipients support, at a minimum, request-line lengths of 8000 octets.
Section 2.5 "Conformance and Error Handling" says
HTTP does not have specific length limitations for many of its protocol elements because the lengths that might be appropriate will vary widely, depending on the deployment context and purpose of the implementation. Hence, interoperability between senders and recipients depends on shared expectations regarding what is a reasonable length for each protocol element. Furthermore, what is commonly understood to be a reasonable length for some protocol elements has changed over the course of the past two decades of HTTP use and is expected to continue changing in the future.
and RFC 7231's Section 6.5.12 "414 URI Too Long" says
The 414 (URI Too Long) status code indicates that the server is
refusing to service the request because the request-target (Section
5.3 of [RFC7230]) is longer than the server is willing to interpret.
This rare condition is only likely to occur when a client has
improperly converted a POST request to a GET request with long query
information, when the client has descended into a "black hole" of
redirection (e.g., a redirected URI prefix that points to a suffix of
itself) or when the server is under attack by a client attempting to
exploit potential security holes.

The 'get'-data is send in the query string - which also has a maximum length.
You can do all kind of things with the query string, e.g. bookmark it. Would you really like to bookmark a real huge text?
It is possible to configure moste servers to use larger length - some clients will accept them, some will throw errors.
"Note: Servers ought to be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations might not properly support these lengths." HTTP 1.1 specification chapter 3.2.1:.
There is also a status code "414 Request-URI Too Long" - if you get this you will know that you have put to many chars in the get. (If you hit the server limit, if the client limit is lower then the server limit each browser will react in it's own way).
Generally it would be wise to set a limit for each data being send to a server - just if someones tries to make huge workload or slow down the server (e.g. send a huge file - 1 server connection is used. slow down the transmission, make additional sends - at some point the server wil have a lot of connections open. Use multiple clients and you have an attack scenario on a server).

Related

How to know the full request has been received with HTTP 1.0 and HTTP 1:1?

I'm implementing a ultra simple dummy HTTP server responding a message with Hello world to any requests. It is just for benchmarking the asynchronous event handling with wrk or equivalent web server benchmarking tool.
After some searching on the Web I can't find a clear EndOfMessage (EOM) marker. It seam that with HTTP 1.0 we know we have received the full request when the connection is closed. Is that right ?
For HTTP 1.1, how do we know if pipelining is used ? What is the EOM in this case ?
After some searching on the Web I can't find a clear EndOfMessage (EOM) marker.
You can't find one because such a thing doesn't exist. The only marker you may find is the CRLF pair indicating the end of the header fields. In general, the enclosed entity length (that is for requests and responses!) is either communicated beforehand via the Content-Length header or through the transport coding.
with HTTP 1.0 we know we have received the full request when the connection is closed. Is that right?
That is one of two ways mandated by RFC 1945. So generally speaking: no. From RFC 1945, section 7.2.2:
When an Entity-Body is included with a message, the length of that body may be determined in one of two ways. If a Content-Length header field is present, its value in bytes represents the length of the Entity-Body. Otherwise, the body length is determined by the closing of the connection by the server.
This may read like you were generally in the right with your assertion. BUT:
Closing the connection cannot be used to indicate the end of a request body, since it leaves no possibility for the server to send back a response.
With you being on the receiving side, your assumption is simply wrong on every conceivable level: If the request contains a body, announcing the size of said body through the Content-Length header is an absolute requirement.
HTTP/1.1 is a bit relaxed in this regard, as it allows for more options. As Julian pointed out, please consult RFC 7230, section 3.3.3. That section is straightforward to read and to answer your question, I'd have to c&p it as whole.
For HTTP 1.1, how do we know if pipelining is used ?
You do if you receive multiple requests through one connection. The strongest indicator for the client non engaging into pipelining is the presence of Connection: close in the first received request. See RFC 7230, section 6.3 and section 6.3.2. If you are worried about having to support this, you are always free to just read the first request and send back a response with Connection: close in it. The client will know it has to establish a new connection.
What is the EOM in this case ?
Again, there is no marker as there is no special treatment for requests during pipelining. All pipelining is really enabling is to have multiple requests being issued in one go. See section 3.3.3 from above on how to determine the message length.

How can I download a single file from multiple locations via HTTP?

I need to download a big file quickly, but all sources I can find have throttled bandwidth. Each of them seem to support HTTP 1.1 Byte Serving (Range Requests), since I can pause and resume the downloads. How can I download it from multiple sources in parallel?
Assuming this is a programming question (given that this is StackOverflow) I am going to explain how instead of just linking to a download accelerator that takes advantage of this.
What is needed in terms of the server to do this?
A server that supports Range HTTP header.
A server that allows for concurrent connections. It is possible to support Range while not allowing multiple simultaneous connection by using either endpoint or IP based restrictions server side. For this reason, I recommend you set up a simple test server instead of downloading from a file sharing site while testing this.
What is the Range Header?
Data transmission over HTTP is sent in order starting from the beginning of the file if the Range header is not set. The first byte of the file on the server will be the first byte of the HTTP response and the last byte of the file on the server will be the last byte of the HTTP response. The Range header allows you to specify where the bytes should start sending from allowing you to "skip" the beginning of the response.
Actual Answer Example
Our Situation
The response is plain text. The response content is just one word "StackOverflow!!" encoding ASCII, meaning each character is one byte. Therefore, the Content-Length header's value is 15 octets (another term for bytes).
We are going to download this file using 3 requests. For the sake of this example, we are going to say it will be 3 times faster but you should realize that this method will make downloads slower for very small files. This is because HTTP headers must be sent with each request as well as the 3-way handshake. We will also assume that the server supports HEAD requests and that the Content-Length header is sent with the download response. Finally, this request will be preformed using GET for reasons of HEAD requests. However, there are workarounds for POST.
Juicy Details
First, perform an HTTP HEAD request. Take the "Content-Length" header and divide that value by the amount of concurrent parallel connections you wish to make. For this example, the Content-Length is 15 and we wish to make 3 connections so the divided value will be 5.
Now preform the amount of requests you wished to preform parallel. With each request, set the Range header to "Range: bytes=" followe by how many requests have already been made times the divided value found above. Then append "-" followed by the value you just determined plus the divided value.
For this example, each request should have the header set as followed.
Range: bytes=0-5
Range: bytes=5-10
Range: bytes=10-15
The response of each of these requests should be
Stack
Overf
low!!
In essence, we are just conforming to Range specification (section 3.12 of RFC 2616) as well as Byte Range specification (section 14.35 of RFC 2616).
Finally, append the bytes of each request to form the final response data.
Disclaimer: I've never actually tried this but it should work in theory
I can't say if wget is able to put a file together again, if fetched from multiple sources.
The following example shows how to do it with aria2c.
You would build a download description file and then pass that to aria, like so:
aria2c -i uri.txt --split=5 --min-split-size=1M --max-connection-per-server=5
where uri.txt might contain
http://a.com/file1.iso http://mirror-1.com/file1.iso http://mirror-2.com/file1.iso
dir=/downloads
out=file1.iso
This would fetch the same file, from 3 different locations and place it into the downloads folder (dir) with the name file1.iso (out).

do webservers impose character limits on get requests [duplicate]

What's the maximum length of an HTTP GET request?
Is there a response error defined that the server can/should return if it receives a GET request that exceeds this length?
This is in the context of a web service API, although it's interesting to see the browser limits as well.
The limit is dependent on both the server and the client used (and if applicable, also the proxy the server or the client is using).
Most web servers have a limit of 8192 bytes (8 KB), which is usually configurable somewhere in the server configuration. As to the client side matter, the HTTP 1.1 specification even warns about this. Here's an extract of chapter 3.2.1:
Note: Servers ought to be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations might not properly support these lengths.
The limit in Internet Explorer and Safari is about 2 KB, in Opera about 4 KB and in Firefox about 8 KB. We may thus assume that 8 KB is the maximum possible length and that 2 KB is a more affordable length to rely on at the server side and that 255 bytes is the safest length to assume that the entire URL will come in.
If the limit is exceeded in either the browser or the server, most will just truncate the characters outside the limit without any warning. Some servers however may send an HTTP 414 error.
If you need to send large data, then better use POST instead of GET. Its limit is much higher, but more dependent on the server used than the client. Usually up to around 2 GB is allowed by the average web server.
This is also configurable somewhere in the server settings. The average server will display a server-specific error/exception when the POST limit is exceeded, usually as an HTTP 500 error.
You are asking two separate questions here:
What's the maximum length of an HTTP GET request?
As already mentioned, HTTP itself doesn't impose any hard-coded limit on request length; but browsers have limits ranging on the 2 KB - 8 KB (255 bytes if we count very old browsers).
Is there a response error defined that the server can/should return if it receives a GET request exceeds this length?
That's the one nobody has answered.
HTTP 1.1 defines status code 414 Request-URI Too Long for the cases where a server-defined limit is reached. You can see further details on RFC 2616.
For the case of client-defined limits, there isn't any sense on the server returning something, because the server won't receive the request at all.
Browser limits are:
Browser Address bar document.location
or anchor tag
---------------------------------------------------
Chrome 32779 >64k
Android 8192 >64k
Firefox >64k >64k
Safari >64k >64k
Internet Explorer 11 2047 5120
Edge 16 2047 10240
Want more? See this question on Stack Overflow.
A similar question is here: Is there a limit to the length of a GET request?
I've hit the limit and on my shared hosting account, but the browser returned a blank page before it got to the server I think.
Technically, I have seen HTTP GET will have issues if the URL length goes beyond 2000 characters. In that case, it's better to use HTTP POST or split the URL.
As already mentioned, HTTP itself doesn't impose any hard-coded limit on request length; but browsers have limits ranging on the 2048 character allowed in the GET method.
Yes. There isn't any limit on a GET request.
I am able to send ~4000 characters as part of the query string using both the Chrome browser and curl command.
I am using Tomcat 8.x server which has returned the expected 200 OK response.
Here is the screenshot of a Google Chrome HTTP request (hiding the endpoint I tried due to security reasons):
RESPONSE

counting HTTP packets

What is relation between number of HTTP packets and number of objects in a web page?
What is relation between number of HTTP packets and number of objects in a web page?
The short answer is there is obviously some relation, but there is no way you can accurately predict one from the other.
For a longer answer, we first need to correct some misconceptions in the question:
There is no such thing as an "HTTP packet". HTTP is a message oriented application protocol with one request message and one response message per "resource" fetched). This sits on top of a reliable byte stream protocol (with flow control, etc) called TCP. This in turn sits on top of a packet switching protocol called IP. An HTTP request/response exchange takes an unpredictable number of IP packets ... depending on message sizes AND network conditions. Other HTTP features such as compression, keeping connections alive, caching and so on make things even more complicated.
The idea of an "object" is ill-defined. An "object" could have a one-to-one correspondence between HTTP request / response pairs (i.e. a "resource" in the above) then that part is simple. OTOH, a "resource" could be a rendering of multiple "objects" in the application domain of the webserver.
On top of that, you've also got to account for the fact that a typical HTML resource has references to other resources (Scripts, CSS, images, etc) and may even involve Ajax callbacks. Each of these is a "resource", that may or may not need to be fetched ... depending on caching, etc.
Finally, there is an implicit assumption that all "objects" are the same size. This might be true in some application domains, but it is not true in general.
So to summarize, there are far to many variables and unknowns for it to be feasible to predict the number of network packets required to fetch a certain number of "objects".
A more practical approach is to attach a packet-level network analyser to your network and get it to count the number of packets sent and received.
If you make the following assumptions:
"HTTP packets" are HTTP messages,
"objects" are resources,
a resource doesn't require other resources (Scripts, CSS, images, etc) to render,
there is no caching,
the server is not doing redirects.
then one "object" requires two "HTTP packets".
But frankly, you've simplified the problem to a point where the answer is next to useless for predicting actual performance of real web-servers. (For instance, any one of those "objects" could be tiny ... or huge. And if you allow for arbitrary javascript, or content such as links to video streams, then the number of "packets" of one kind or another is potentially unbounded.)
A GET request is issued for every file referred in a HTML page, all of which, usually, fit in one TCP stream segment. HTTP is a state machine, so, many requests/response can be pipelined in one request/response.
The number of packets sent in response vary in the size of the objects and in caching parameters. For example, if a file is already in the browser cache, it will make a conditional get and will receive a HTTP/1.1 304 Not Modified response code, which does not contain any data. Moreover, many HTTP/1.1 304 can be issued in one segment, as this response is very tiny compared to segments' maximum size. Another example, if a file is bigger than the maximum segment size, the file may (and it probably will) be divided in many segments.
Is this what you wish to know?

Maximum on HTTP header values?

Is there an accepted maximum allowed size for HTTP headers? If so, what is it? If not, is this something that's server specific or is the accepted standard to allow headers of any size?
No, HTTP does not define any limit. However most web servers do limit size of headers they accept. For example in Apache default limit is 8KB, in IIS it's 16K. Server will return 413 Entity Too Large error if headers size exceeds that limit.
Related question: How big can a user agent string get?
As vartec says above, the HTTP spec does not define a limit, however many servers do by default. This means, practically speaking, the lower limit is 8K. For most servers, this limit applies to the sum of the request line and ALL header fields (so keep your cookies short).
Apache 2.0, 2.2: 8K
nginx: 4K - 8K
IIS: varies by version, 8K - 16K
Tomcat: varies by version, 8K - 48K (?!)
It's worth noting that nginx uses the system page size by default, which is 4K on most systems. You can check with this tiny program:
pagesize.c:
#include <unistd.h>
#include <stdio.h>
int main() {
int pageSize = getpagesize();
printf("Page size on your system = %i bytes\n", pageSize);
return 0;
}
Compile with gcc -o pagesize pagesize.c then run ./pagesize. My ubuntu server from Linode dutifully informs me the answer is 4k.
Here is the limit of most popular web server
Apache - 8K
Nginx - 4K-8K
IIS - 8K-16K
Tomcat - 8K – 48K
Node (<13) - 8K; (>13) - 16K
HTTP does not place a predefined limit on the length of each header
field or on the length of the header section as a whole, as described
in Section 2.5. Various ad hoc limitations on individual header
field length are found in practice, often depending on the specific
field semantics.
HTTP Header values are restricted by server implementations. Http specification doesn't restrict header size.
A server that receives a request header field, or set of fields,
larger than it wishes to process MUST respond with an appropriate 4xx
(Client Error) status code. Ignoring such header fields would
increase the server's vulnerability to request smuggling attacks
(Section 9.5).
Most servers will return 413 Entity Too Large or appropriate 4xx error when this happens.
A client MAY discard or truncate received header fields that are
larger than the client wishes to process if the field semantics are
such that the dropped value(s) can be safely ignored without changing
the message framing or response semantics.
Uncapped HTTP header size keeps the server exposed to attacks and can bring down its capacity to serve organic traffic.
Source
RFC 6265 dated 2011 prescribes specific limits on cookies.
https://www.rfc-editor.org/rfc/rfc6265
6.1. Limits
Practical user agent implementations have limits on the number and
size of cookies that they can store. General-use user agents SHOULD
provide each of the following minimum capabilities:
o At least 4096 bytes per cookie (as measured by the sum of the
length of the cookie's name, value, and attributes).
o At least 50 cookies per domain.
o At least 3000 cookies total.
Servers SHOULD use as few and as small cookies as possible to avoid
reaching these implementation limits and to minimize network
bandwidth due to the Cookie header being included in every request.
Servers SHOULD gracefully degrade if the user agent fails to return
one or more cookies in the Cookie header because the user agent might
evict any cookie at any time on orders from the user.
--
The intended audience of the RFC is what must be supported by a user-agent or a server. It appears that to tune your server to support what the browser allows you would need to configure 4096*50 as the limit. As the text that follows suggests, this does appear to be far in excess of what is needed for the typical web application. It would be useful to use the current limit and the RFC outlined upper limit and compare the memory and IO consequences of the higher configuration.
I also found that in some cases the reason for 502/400 in case of many headers could be because of a large number of headers without regard to size.
from the docs
tune.http.maxhdr
Sets the maximum number of headers in a request. When a request comes with a
number of headers greater than this value (including the first line), it is
rejected with a "400 Bad Request" status code. Similarly, too large responses
are blocked with "502 Bad Gateway". The default value is 101, which is enough
for all usages, considering that the widely deployed Apache server uses the
same limit. It can be useful to push this limit further to temporarily allow
a buggy application to work by the time it gets fixed. Keep in mind that each
new header consumes 32bits of memory for each session, so don't push this
limit too high.
https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#3.2-tune.http.maxhdr
If you are going to use any DDOS provider like Akamai, they have a maximum limitation of 8k in the response header size. So essentially try to limit your response header size below 8k.

Resources