counting HTTP packets - http

What is relation between number of HTTP packets and number of objects in a web page?

What is relation between number of HTTP packets and number of objects in a web page?
The short answer is there is obviously some relation, but there is no way you can accurately predict one from the other.
For a longer answer, we first need to correct some misconceptions in the question:
There is no such thing as an "HTTP packet". HTTP is a message oriented application protocol with one request message and one response message per "resource" fetched). This sits on top of a reliable byte stream protocol (with flow control, etc) called TCP. This in turn sits on top of a packet switching protocol called IP. An HTTP request/response exchange takes an unpredictable number of IP packets ... depending on message sizes AND network conditions. Other HTTP features such as compression, keeping connections alive, caching and so on make things even more complicated.
The idea of an "object" is ill-defined. An "object" could have a one-to-one correspondence between HTTP request / response pairs (i.e. a "resource" in the above) then that part is simple. OTOH, a "resource" could be a rendering of multiple "objects" in the application domain of the webserver.
On top of that, you've also got to account for the fact that a typical HTML resource has references to other resources (Scripts, CSS, images, etc) and may even involve Ajax callbacks. Each of these is a "resource", that may or may not need to be fetched ... depending on caching, etc.
Finally, there is an implicit assumption that all "objects" are the same size. This might be true in some application domains, but it is not true in general.
So to summarize, there are far to many variables and unknowns for it to be feasible to predict the number of network packets required to fetch a certain number of "objects".
A more practical approach is to attach a packet-level network analyser to your network and get it to count the number of packets sent and received.
If you make the following assumptions:
"HTTP packets" are HTTP messages,
"objects" are resources,
a resource doesn't require other resources (Scripts, CSS, images, etc) to render,
there is no caching,
the server is not doing redirects.
then one "object" requires two "HTTP packets".
But frankly, you've simplified the problem to a point where the answer is next to useless for predicting actual performance of real web-servers. (For instance, any one of those "objects" could be tiny ... or huge. And if you allow for arbitrary javascript, or content such as links to video streams, then the number of "packets" of one kind or another is potentially unbounded.)

A GET request is issued for every file referred in a HTML page, all of which, usually, fit in one TCP stream segment. HTTP is a state machine, so, many requests/response can be pipelined in one request/response.
The number of packets sent in response vary in the size of the objects and in caching parameters. For example, if a file is already in the browser cache, it will make a conditional get and will receive a HTTP/1.1 304 Not Modified response code, which does not contain any data. Moreover, many HTTP/1.1 304 can be issued in one segment, as this response is very tiny compared to segments' maximum size. Another example, if a file is bigger than the maximum segment size, the file may (and it probably will) be divided in many segments.
Is this what you wish to know?

Related

Are large HTTP envelopes split across several partial http requests?

I'm reading a book that looks at different web service architectures, including an overview of how the SOAP protocol can be implemented in via HTTP. This was interesting to me because I do a lot of WCF development and didn't realize how client/server communication was implemented.
Since protocols like TCP and whatever is lower than that have fixed maximum packet sizes and as such have to split messages into packets, I just assumed that HTTP was similar. Is that not the case?
I.e. If I make a GET/POST request with a 20MB body, will a single HTTP envelope be sent and reassembled on the server?
If this is the case, what is the largest practical size of an http request? i have previously configured Nginx servers to allow 20mb file transfers and I'm wondering if this is too high...
From HTTP specification point of view, there is no limit for HTTP payload. According to RFC7230:
Any Content-Length field value greater than or equal to zero is valid. Since there is no predefined limit to the length of a payload, a recipient MUST anticipate potentially large decimal numerals and prevent parsing errors due to integer conversion overflows.
However, to prevent attack via very long or very slow stream of data, a web server may reject such HTTP request and return 413 Payload Too Large response.
"Since protocols like TCP and whatever is lower than that have fixed maximum packet sizes and as such have to split messages into packets, I just assumed that HTTP was similar. Is that not the case?"
No. HTTP is an application level protocol and is totally different. As HTTP is based on TCP, when the data is transferring, it would automatically split into packets on TCP level. There is no need to split the request on HTTP level.
An HTTP body can be as large as you want it to be, there is no download size limit, the size limit is usually set for uploads, to prevent someone uploading massive files to your server.
You can ask for a section of a resource using the Range header, if you only want part of it.
IE had limits of 2 and 4 GB at times, but these have been fixed since. Source

Are HTTP responses to the same resource returned in sequence?

Say I want to request some resource on a given domain, e.g. example.com/image.jpg.
If I do two requests to this particular resource, can I expect the first response I get back to be mappable to the first response I sent, and so on (or does this depend on client/server implementation)?
I'm asking because if I want to debug request-response pair, I necessarily need to know exactly which response belongs to which request (for timing purposes etc.). So, are there any sure-fire ways to achieve this mapping?
I'm assuming that you are talking about HTTP/1.x using persistent connections (i.e. HTTP keep-alive) when sending a new request without having the response for the previous one yet, i.e. using HTTP pipelining. In this case the order of the responses matches the order of the requests.
If you are instead talking about HTTP/2 then the situation is different because multiple requests can receive the responses in parallel inside the same TCP connection and this also means that a later requests might have received a response before earlier requests. And in which order requests to the same resource are handled by the server depends fully on the server implementation.
The same is true when the requests are done within independent TCP connections. The order might be even more unpredictable than with HTTP/2 because the requests might be handled in different threads or processes and thus the order also depends on the scheduling of these in the operating system.

Why is the total amount of character in a GET limited?

I want to ask some question about the following quote taken from Head First Servlets and JSP, Second Edition book:
The total amount of characters in a GET is really limited (depending
on the server). If the user types, say, a long passage into a “search”
input box, the GET might not work.
Why is the total amount of characters in a GET limited?
How can I learn about the total amount of character in a Get?
When I said a long text into any input box, and GET is not working.
How many solution do I have to fix this problem.
Why is the get method limited?
There is no specific limit to the length of a GET request. Different servers can have different limits. If you need to send more data to the server, use POST instead of GET. A recommended minimum to be supported by servers and browsers is 8,000 bytes, but this is not required.
RFC 7230's Section 3.1.1 "Request Line" says
HTTP does not place a predefined limit on the length of a request-line, as described in Section 2.5. A server that receives a method longer than any that it implements SHOULD respond with a 501 (Not Implemented) status code. A server that receives a request-target longer than any URI it wishes to parse MUST respond with a 414 (URI Too Long) status code (see Section 6.5.12 of RFC7231).
Various ad hoc limitations on request-line length are found in practice. It is RECOMMENDED that all HTTP senders and recipients support, at a minimum, request-line lengths of 8000 octets.
Section 2.5 "Conformance and Error Handling" says
HTTP does not have specific length limitations for many of its protocol elements because the lengths that might be appropriate will vary widely, depending on the deployment context and purpose of the implementation. Hence, interoperability between senders and recipients depends on shared expectations regarding what is a reasonable length for each protocol element. Furthermore, what is commonly understood to be a reasonable length for some protocol elements has changed over the course of the past two decades of HTTP use and is expected to continue changing in the future.
and RFC 7231's Section 6.5.12 "414 URI Too Long" says
The 414 (URI Too Long) status code indicates that the server is
refusing to service the request because the request-target (Section
5.3 of [RFC7230]) is longer than the server is willing to interpret.
This rare condition is only likely to occur when a client has
improperly converted a POST request to a GET request with long query
information, when the client has descended into a "black hole" of
redirection (e.g., a redirected URI prefix that points to a suffix of
itself) or when the server is under attack by a client attempting to
exploit potential security holes.
The 'get'-data is send in the query string - which also has a maximum length.
You can do all kind of things with the query string, e.g. bookmark it. Would you really like to bookmark a real huge text?
It is possible to configure moste servers to use larger length - some clients will accept them, some will throw errors.
"Note: Servers ought to be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations might not properly support these lengths." HTTP 1.1 specification chapter 3.2.1:.
There is also a status code "414 Request-URI Too Long" - if you get this you will know that you have put to many chars in the get. (If you hit the server limit, if the client limit is lower then the server limit each browser will react in it's own way).
Generally it would be wise to set a limit for each data being send to a server - just if someones tries to make huge workload or slow down the server (e.g. send a huge file - 1 server connection is used. slow down the transmission, make additional sends - at some point the server wil have a lot of connections open. Use multiple clients and you have an attack scenario on a server).

maximum http request in a page

How many http request does a browser can handle in a single html page.
Their is a popular saying that browser can handle only a certain http request from a single domain and so its better to create static domain(cdn). so that http request can be shared between the 2 domains.
q1)How many http request can a browser handle in a single html page or atleast the saturation point(say 1000 requests)?
q2)How many http request from a single domain name can a browser render(say 100 from the same domain name)?
also any suggestions for best practices!!!
Section 8.1.4 of the HTTP/1.1 RFC says a "single-user client SHOULD NOT maintain more than 2 connections with any server or proxy."
However, the key word is "should"; most browsers use a different number. See this blog for a table of max connections per browser.
In theory there is no limit. But as the number of requests required to construct a page grows, the time taken for the page to be rendered increases. The relationship is not linear at low counts. Typically latency has a far bigger effect than bandwidth on actual throughput and there are mechanisms in HTTP to minimise the effect of this - such as keepalives and parallel requests. As Jon Grant says, there are limits on the number of concurrent requests.
A full answer to this question would fill a book - here's a good one.

Why can HTTP handle only one pending request per socket?

Being curious, I wonder why HTTP, by design, can only handle one pending request per socket.
I understand that this limitation is because there is no 'Id' to associate a request to its response, so the only way to match a response with its request is to send the response on the same socket that sent the request. There would be no way to match a response to its request if there was more than one pending request on the socket because we may not receive the responses in the same order requests were sent.
If the protocol had been designed to have a matching 'Id' for requests and responses, there could be multiple pending requests on only one socket. This could greatly reduce the number of socket used by internet browsers and applications using web services.
Was HTTP designed like this for simplicity even if it's less efficient or am I missing something and this is the best approach?
Thanks.
Not true. Read about HTTP1.1 pipelining. Apache implements it and Firefox implements it. Although Firefox disables it by default.
To turn it on in Firefox use about:config and write 'pipelining' in the filter.
see: http://www.mozilla.org/projects/netlib/http/pipelining-faq.html
It's basically for simplicity; various proposals have been made over the years that multiplex on the same connection (e.g. SPDY) but none have taken off yet.
One problem with sending multiple requests on a single socket is that it would cause inefficient queuing.
For instance, lets say you are in a store and there are 2 cashiers, and 10 people waiting to be checked out. The ideal way to make the line is to have a single queue of 10 people and the next person in line goes to a cashier when they become available. However, if you sent all the requests at once you would probably send 5 people to cashier A and 5 to cashier B. However, what if you sent the 5 people with the largest shopping carts to the same cashier? That's bad queuing and what could happen if you queued a bunch of requests on a single socket.
NOTE: I'm not saying that you couldn't use queuing well, but it keeps it simple to do it right if there is no queuing on a single socket.
There are a few concidertaions I would review.
The first is related to the nature of TCP itself. TCP suffers from 'head-of-line' blocking issue where there can only be a single outstanding (unacknowledged) request (connection/TCP level) in flight. Given traditional latencies this can be a problem from a load time user experience perspective compared to results of parallel connection scheme browsers employ today. The higher the latency of the link the larger the impact of this fundemental limitation.
There is also a concurrency issue in that sometimes you really want to load multiple resources incrementally / in parallel. Back in the day one of the greatest features mozilla had over mosaic was that it would load images and objects incrementally so you could begin to see what was going on and use a resource without having to wait for it to load. With fewer connections there is a risk in that for example loading a large image on page before a style sheet can be catastrophic from an experience point of view. Expecting some kind of mitigating intelligence or explicit configuration to optimally order requests may not be a realistic or ideal solution.
There are proposals such as HTTP over SCTP that will more or less totally correct the issue you raise at the transport level.
Also realize that HTTP doesn't necessarily mandate a Content-Length header to serve data. Even if each HTTP response was ID'd, how would you manage streaming binary content with no content length (HTTP/1.0 style)? or if the client sent the Connection: close header to have the client close due to non-known lengths?
To manage this you would have to HTTP chunk (already present) in multiplex (I don't think anyone implements this) and add some non-trivial work to many programs.

Resources