How do web browsers execute and process requests? - http

I would like to know how the browser executes and processes requests. Knowing how this works will help me understand how to do better web programming that meets performance goals by taking advantage of browser features.
How do browsers download CSS, JS and image files?
Do they download one resource at a time, or multiple resources in parallel?
How many parallel requests (connections) can a browser make?
What happens if a request is still being executed on the server and the user clicks the stop button? Will the execution run to completion and the response come back, or is the request abandoned halfway on the server side?
How is JS execution handled by the browser?
Please add helpful links/information if possible.
Thanks all,

Please consider splitting this up into multiple questions. Here is some relevant information:
A web browser, or any web client, that wants to retrieve an HTTP resource will construct a GET request. This contains information to route the request to the proper server, and information to tell the server which resource is being requested. A resource can be an HTML page, an image, a JavaScript file, or anything else.
When the browser receives an HTML page, the page may have links to other resources (for instance, image tags). These instruct the browser to make further requests.
Multiple resources may be downloaded in parallel. This can happen if your browser is attempting to load multiple pages at once (like in different tabs), or if the browser has received an HTML page that points it to several resources (as in the last point). From a single hostname, the HTTP 1.1 spec says that at most two resources should be downloaded in parallel (though this is just a guideline and cannot stop a browser from attempting to do otherwise).
JavaScript is interpreted by the browser, just like other scripting languages are interpreted by their respective engines.
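As a rough sketch of that flow (this is not how a real browser is implemented, and the URL and concurrency cap are just placeholders), the "fetch the HTML, find its subresources, download them a few at a time" sequence looks roughly like this in JavaScript:

    // Rough sketch of the flow described above: fetch an HTML page, find the
    // subresources it references, then download them with a small concurrency
    // cap. Real browsers do all of this natively; the URL below is a placeholder.
    async function loadPage(pageUrl, maxParallel = 2) {
      const html = await (await fetch(pageUrl)).text();

      // Parse the HTML and collect the URLs of scripts, stylesheets and images.
      const doc = new DOMParser().parseFromString(html, "text/html");
      const urls = [...doc.querySelectorAll("script[src], link[rel=stylesheet], img[src]")]
        .map(el => new URL(el.getAttribute("src") || el.getAttribute("href"), pageUrl).href);

      // Download at most `maxParallel` resources at a time; the rest wait in a queue.
      const queue = [...urls];
      const workers = Array.from({ length: maxParallel }, async () => {
        while (queue.length) {
          await fetch(queue.shift()); // a real browser would parse/cache the response here
        }
      });
      await Promise.all(workers);
    }

    loadPage("https://example.com/"); // illustrative hostname only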

In the usual way (e.g., http GET operation, etc.).
It's implementation-dependent, different browsers do it differently.
It's implementation-dependent; typically, though, no more than two at a time between the same two endpoints (e.g., one browser talking to the same server). It may be more if retrieving from multiple servers. Other resources get queued and wait for a slot to open up. This limit is typically enforced by browsers, but may also be enforced by servers (so a browser with this limit lifted may still find that later requests sit waiting for a bit while the server queues them).
It depends a lot on when they do that, what kind of server it is, etc.
In strict document order. The browser may download multiple script files simultaneously, but it will execute them in document order. This is very important. Further processing of the page may (and probably will) get held up waiting for the script to be downloaded and run. (IE supports the defer attribute on script tags, which lets you tell it that it can continue processing the page before it executes the script; defer has since become standard in other browsers as well.)
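To illustrate those ordering rules, here is a small JavaScript sketch using dynamically injected scripts (the file names are made up):

    // Dynamically injected scripts default to "run whenever loaded"; setting
    // async = false restores document-order execution, which is what ordinary
    // <script src> tags in the page get by default.
    function addScript(url, inOrder) {
      const s = document.createElement("script");
      s.src = url;
      s.async = !inOrder; // false => execute in insertion order
      document.head.appendChild(s);
    }

    addScript("/js/first.js", true);      // guaranteed to run before second.js
    addScript("/js/second.js", true);
    addScript("/js/analytics.js", false); // runs as soon as it arrives, order not guaranteed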

Related

HTTP design: why don't servers send all files immediately after getting a GET request?

As I understand it, typical navigation works like this:
The browser sends a request to the server
The server sends back an HTML file
The browser parses the HTML file and figures out which files it needs
The browser sends a separate request for each JS/CSS file
The server sends JS/CSS files back to the browser
Finally, the browser has everything to display the site
Are steps 3 and 4 really necessary?
Why don't we have a server-side list of all files required for the site?
This way the server can send all the files without waiting for the browser.
Here are my best attempts to explain such a design:
Explanation 1: the HTML file is more important in the early stages because the browser builds a DOM tree first.
Okay. But we could have a prioritized server-side list of all files required for the site. That way the browser could build the DOM tree and download CSS concurrently.
Explanation 2: search engines don't want to download anything besides HTML.
Yes. But we could add a Get-All-Files-Immediately HTTP header for that.
Explanation 3: this is a legacy. We don't want to break old sites.
True, we don't. But a Get-All-Files-Immediately header would solve that too.
Explanation 4: this overhead is negligible.
Is it so?
Let's look at Dev Tools -> Network for facebook.com (screenshot not reproduced here):
Steps 3 and 4 happen between points A and B in that waterfall. This seems like a good fraction of the time to first paint (TTFP). Am I wrong?
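One way to sanity-check this kind of observation outside of Dev Tools is the browser's Resource Timing and Paint Timing APIs; a minimal sketch (which entries show up depends on the browser and the page):

    // Compare when the HTML finished arriving with when the CSS/JS it referenced
    // finished, and relate both to first paint. Run in the page's console.
    const nav = performance.getEntriesByType("navigation")[0];
    console.log("HTML response finished at", nav.responseEnd, "ms");

    for (const r of performance.getEntriesByType("resource")) {
      if (r.initiatorType === "link" || r.initiatorType === "script") {
        console.log(r.name, "started at", r.startTime, "finished at", r.responseEnd);
      }
    }

    for (const p of performance.getEntriesByType("paint")) {
      console.log(p.name, "at", p.startTime, "ms"); // e.g. "first-contentful-paint"
    }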
I am very confused as I cannot find one single good reason for such a design. What am I missing?
There is a list of files that the browser needs: it's the HTML itself.
It's possible to preemptively send things from the server to the browser, if the server knows the browser will request them in the future, via HTTP/2 Push.
This can indeed reduce the total latency, but it also comes with challenges. How does the server know, for example, that the client doesn't already have the file? Clients will often cache assets like CSS and images, so if a client hits the server again, pushing those assets again can be wasteful.
The reality is that for most people the first roundtrip to get the HTML is not enough of a problem for it to be worth fixing.
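For what it's worth, here is a minimal sketch of the Push idea using Node's built-in http2 module; the key/cert paths and file names are placeholders, and as noted above, pushing blindly can waste bandwidth when the client already has the asset cached:

    // Minimal HTTP/2 Push sketch with Node's built-in http2 module.
    // Error handling is omitted for brevity.
    const http2 = require("http2");
    const fs = require("fs");

    const server = http2.createSecureServer({
      key: fs.readFileSync("server.key"),   // placeholder paths
      cert: fs.readFileSync("server.crt"),
    });

    server.on("stream", (stream, headers) => {
      if (headers[":path"] === "/") {
        // Preemptively push the stylesheet the HTML is about to reference.
        if (stream.pushAllowed) {
          stream.pushStream({ ":path": "/style.css" }, (err, pushStream) => {
            if (err) return;
            pushStream.respondWithFile("style.css", { "content-type": "text/css" });
          });
        }
        stream.respondWithFile("index.html", { "content-type": "text/html" });
      }
    });

    server.listen(8443);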

Why are so many HTTP requests sent to www.google.com?

I'm using Burp suite to see the requests my computer sends out when I go to www.google.com, and noticed that there were a lot of different requests sent. Why is this the case? Shouldn't it just be one GET request to Google's server, and then done? Instead it's sending maybe 10 GET requests and a handful of POST requests.
There's one GET request for the page (and more for every image, CSS, and JavaScript file), and then there can be many other AJAX GET/POST requests that get done afterward for things like updating the suggestions as you type things in, sending location information, or doing stuff with the cookies on your computer. Pretty much any time new information is displayed without reloading the page, there's an AJAX request going on. AJAX is also used to make expensive requests so the page can load faster. There are many uses.
Here's a tutorial for how AJAX works if you would like to do it yourself: AJAX Tutorial
Note: AJAX is a method of sending requests, it's not its own programming language. It stands for "Asynchronous JavaScript and XML."
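For a concrete picture of one of those background requests, here is a bare-bones XMLHttpRequest example; the #search element and /suggest endpoint are invented for illustration:

    // Fetch suggestions as the user types, without reloading the page.
    const input = document.querySelector("#search");

    input.addEventListener("input", () => {
      const xhr = new XMLHttpRequest();
      xhr.open("GET", "/suggest?q=" + encodeURIComponent(input.value));
      xhr.onload = () => {
        if (xhr.status === 200) {
          console.log("suggestions:", JSON.parse(xhr.responseText));
        }
      };
      xhr.send();
    });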
While it is hard to come up with a 100% answer to your question (I cannot tell which requests your computer sends to Google), one possibility is that after the first GET request Google sends back a bunch of HTML/CSS/JavaScript. The JavaScript is then executed on your computer (client side) and might trigger further requests towards Google's servers. However, this is just one possibility.
Cheers,
Christian
Normally every element of a page is requested with a separate GET (CSS, images, scripts).
So you'll rarely (if ever) find a site that is loaded with one single GET request.

Is HTTP Conditional GET reliable to detect if web contents has been modified?

I'd like to ask a question regarding HTTP Conditional GET. Is this actually reliable to detect changes of a web resource?
I mean, if I write a program to check whether page content has changed by using an HTTP conditional GET, is it possible that the web server is misconfigured (or intentionally configured) to report that there are no changes even though the contents of the HTML or XML (RESTful) resource have changed?
(I'm referring to requesting a web page with an "If-Modified-Since" header as part of the GET request. So, is the modified date that comes back always reliable?)
Of course it is possible. But the whole point of using a communication protocol is that you trust that the other side is fulfilling it.
Usually, situations like the one you mention are called "Byzantine", because one of the ends is not simply failing to follow the protocol, but cheating.
Yes, it is possible for a server to say that the content hasn't changed even though it has. It is still just code running on the server so it can do anything it wants.
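Assuming the server does honour it, a conditional GET looks roughly like this in JavaScript (the URL is a placeholder, same-origin or CORS-enabled; servers that use ETags would send If-None-Match instead):

    // Send the Last-Modified value from a previous response back as
    // If-Modified-Since and treat a 304 as "unchanged".
    async function hasChanged(url, lastModified) {
      const res = await fetch(url, {
        cache: "no-store", // bypass the browser cache so we see the raw 304
        headers: lastModified ? { "If-Modified-Since": lastModified } : {},
      });
      if (res.status === 304) return { changed: false, lastModified };
      return { changed: true, lastModified: res.headers.get("Last-Modified") };
    }

    (async () => {
      let state = await hasChanged("https://example.com/feed.xml", null);
      state = await hasChanged("https://example.com/feed.xml", state.lastModified);
      console.log("changed since last poll:", state.changed);
    })();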

HTTP out-of-order responses and Async processing in Servlet 3.0

I have multiple AJAX requests going out of my browser.
My UI is comprised of multiple views and the AJAX requests are trying to populate those views simultaneously. In some cases I require more than 10 simultaneous requests to be sent from client and processed concurrently at the server.
But due to browser limitations on the maximum number of concurrent requests to a single domain, and because of HTTP/1.1's "A server MUST send its responses to [pipelined] requests in the same order that the requests were received" constraint, I am not getting as much concurrency in request processing as I would like.
From my application's standpoint, I don't need responses to come back in the order in which I sent the requests. I am OK if view8 gets populated before view1, for example.
Async processing using Servlet 3.0 constructs seems to address only one-side of the problem (the Server-side) and hence cannot be fully exploited for maximizing application concurrency.
My question is:
Am I missing some proper constructs ('proper' in contrast to workarounds like "host your images from a different subdomain") that could yield me more concurrency?
This seems like something many web UIs would need! If not, then I am designing my UI the wrong way. In either case, I would appreciate your input.
Edit 1: To my advantage, I don't have to support a huge number of concurrent clients. The maximum number of concurrent clients accessing the app would be < 100. Given that, I am basically trying to enhance the experience of these clients, since I have plenty of processing power available on my server side.
Edit 2: Our application/API is not for 'public' consumption. For example, it is like my company's webmail app: it is hosted on the internet, but it is not meant for everyone's consumption, only for the relevant few.
The reason I am giving that info is to differentiate my app from SO/Twitter, which seem to distinguish their (REST) API users from their normal website users. In our case, we think we should not differentiate that way and want to provide a single set of REST endpoints for both.
The reason behind the limitation in the spec (RFC 2616) seems to be: "These guidelines are intended to improve HTTP response and avoid congestion." However, intranet web apps have more luxuries and should not have to be so constrained, should they?
The server exposes a REST API, and hence the UI makes specific GETs for the various resource categories (e.g. blogs, videos, news, articles). Since each resource category has its own view, it all fits together nicely. It feels wrong to collate requests for blogs and videos into one request, doesn't it?
Well, IMHO being pragmatic is more important. Sure, it makes sense for a service to expose RESTful API but it's not always necessary to expose the entire API to the browser. Your API can be separate from your server side web app. You can always make those multiple API requests on the server side, collate the results and send them back to the client. For e.g. look at the SO home page. The StackOverflow API does expose a RESTful API but when loading the home page the browser doesn't send across multiple requests just to populate the tags, thread listing etc.
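As a sketch of that "collate on the server" idea (the endpoint names are invented and it assumes Node 18+ for the global fetch): one browser request fans out into several internal REST calls on the server, where the browser's per-host connection limit doesn't apply, and comes back as one combined payload.

    // Error handling omitted for brevity.
    const http = require("http");

    const API = "http://localhost:4000"; // your internal REST API

    http.createServer(async (req, res) => {
      if (req.url === "/dashboard") {
        // Fire the per-view requests concurrently on the server side.
        const [blogs, videos, news] = await Promise.all([
          fetch(`${API}/blogs`).then(r => r.json()),
          fetch(`${API}/videos`).then(r => r.json()),
          fetch(`${API}/news`).then(r => r.json()),
        ]);
        res.setHeader("Content-Type", "application/json");
        res.end(JSON.stringify({ blogs, videos, news }));
      } else {
        res.statusCode = 404;
        res.end();
      }
    }).listen(3000);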
Thanks Sanjay for the suggestion. But we wanted to have a single API for both REST clients and browser clients. Interestingly, the root URI "stackoverflow.com" is not mentioned in SO's REST API, but the browser client uses it. I suppose if they had exposed the root URI, its response would be difficult to process (as it would be a mixture of data). Their REST API is granular (as it is in my application), but their JavaScript code uses some other doors (APIs) to decrease the number of round-trips to the server. Somehow that doesn't feel right (I am a novice in this field, though). Feel free to correct me.
SO doesn't use any "other doors". It's just that they simply don't send 10 concurrent requests to populate something on the page. They make XHR requests when you vote, mark a thread as a favorite, comment, etc. For loading the page itself, there are no multiple requests. If you want to hit your RESTful API directly from the browser, you'll have to honor the limitations. Either that or go the desktop way, which allows you virtually unlimited connections to your server, but I guess you don't want to go that route...

ASP.NET: Does not download content in parallel

In an ASP.NET application, how is it possible to download all PNG, CSS, JavaScript and other resources in parallel?
I am monitoring with Fiddler and found that content is downloaded one after another.
That is actually browser (client) behaviour, in accordance with the HTTP 1.1 specification. The guideline is to limit simultaneous downloads to two per hostname.
http://www.yuiblog.com/blog/2007/04/11/performance-research-part-4/
While you may be able to alter your browser's settings to download more per hostname, that only affects your machine and not others' out in the Internet wilderness. One way to trick clients into downloading more simultaneously is to serve your web resources from different hostnames, e.g. images stored at http://images.yoursite.com. But you may want to test this and balance it out, as per the article's suggestion.
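A small sketch of that hostname trick in JavaScript (the shard hostnames are placeholders and must all serve the same files; note that each extra hostname also costs an extra DNS lookup and connection setup):

    // Spread asset URLs across a few subdomains so the per-hostname connection
    // limit applies to each shard separately.
    const shards = ["img1.yoursite.com", "img2.yoursite.com"];

    function shardUrl(path) {
      // Map the same path to the same hostname every time so caching still works.
      let hash = 0;
      for (const ch of path) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
      return "http://" + shards[hash % shards.length] + path;
    }

    document.querySelectorAll("img[data-src]").forEach(img => {
      img.src = shardUrl(img.getAttribute("data-src"));
    });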
You can try AJAX for that; since there are usually around five HTTP connections allowed between client and server, you could theoretically use them all at once.
However, I guess you will gain little from this unless you have really big (or many) CSS and JavaScript files.
Not sure if this will work on images or other files.
