I'm building a new site, and during the foundation stage I'm trying to work out the best way to load images. Browsers can only load 2-6 items (images/CSS/JS) concurrently from a given host. I've heard of various methods through the grapevine, but no definitive answer on which is actually faster.
Relative URLs:
background-image: url(images/image.jpg);
Absolute URLs:
background-image: url(http://site.com/images/image.jpg);
Absolute URLs (with sub-domains):
background-image: url(http://fakecdn.site.com/images/image.jpg);
Will a browser recognize my "fakecdn" subdomain as a different domain and load images from it concurrently in a separate thread?
Do images referenced in a CSS file pulled in via @import load in a separate thread?
The HTTP/1.1 spec suggests that browsers should not open more than two connections to a given domain:
Clients that use persistent connections SHOULD limit the number of
simultaneous connections that they maintain to a given server. A
single-user client SHOULD NOT maintain more than 2 connections with
any server or proxy.
So, if you are loading many medium-sized images, it may make sense to put them on separate FQDNs so that the two-connection limit is not the bottleneck. For small images, the overhead of opening a new socket connection to each FQDN may outweigh the benefits. Similarly, for large images, the client's network bandwidth may be the limiting factor.
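If you do split images across FQDNs, it helps to map each image to the same host every time, so browser and proxy caches stay effective. A minimal sketch in Python, assuming hypothetical shard hostnames img1.example.com and img2.example.com:

import zlib

# Hypothetical shard hostnames - substitute your own subdomains.
SHARDS = ["img1.example.com", "img2.example.com"]

def shard_url(path):
    # CRC32 is deterministic, so a given image always maps to the same host,
    # which keeps caches warm while spreading requests across connections.
    shard = SHARDS[zlib.crc32(path.encode("utf-8")) % len(SHARDS)]
    return "http://" + shard + "/" + path.lstrip("/")

print(shard_url("/images/header.jpg"))   # e.g. http://img2.example.com/images/header.jpg
print(shard_url("/images/footer.jpg"))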
If the images are always displayed, then using a data URI may be fastest, since no separate connection is required and the images can be included in the stream in the order they are needed.
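A minimal sketch of generating such a data URI in Python, assuming a hypothetical images/icon.png file; the resulting string can be dropped straight into background-image: url(...):

import base64
from pathlib import Path

def data_uri(path, mime="image/png"):
    # Base64-encode the file into a data: URI that can be inlined in CSS.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return "data:" + mime + ";base64," + encoded

print(data_uri("images/icon.png"))  # hypothetical file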
However, as always with optimizing for performance, profile first!
See: Wikipedia - data URI
For lots of small images, social media icons being a good example, you'll also want to look into combining them into a single sprite map. That way they'll all load in the same request, and you just have to do some background-positioning when using them.
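A minimal sketch of building such a sprite map offline in Python, assuming the Pillow library and hypothetical equal-sized 32x32 icon files; individual icons are then selected in CSS with background-position offsets (e.g. background-position: -32px 0 for the second icon):

from PIL import Image  # Pillow, assumed to be installed

icons = ["facebook.png", "twitter.png", "rss.png"]  # hypothetical icon files
size = 32

# Paste each icon side by side into one horizontal sprite sheet.
sheet = Image.new("RGBA", (size * len(icons), size))
for i, name in enumerate(icons):
    sheet.paste(Image.open(name).resize((size, size)), (i * size, 0))
sheet.save("sprite.png")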
Related
We are thinking about moving a server that hosts many websites to HTTP/2. One concern was that because HTTP/2 downloads all resources in parallel, the browser might take longer to start painting/rendering the page than with HTTP/1.1, since it would be waiting for all resources to finish downloading instead of starting with what is already there and repainting as more arrives.
I think this is wrong, but I have found no article or good explanation I could use to prove it to the people who think this could be the case.
The browser will paint when it has the resources needed to paint and this will mostly not change under HTTP/2.
I am not sure why you think a browser would wait to download all the resources under HTTP/2 but not under HTTP/1.1?
Certain resources (e.g. CSS and JavaScript without the async attribute) are render-blocking and must be downloaded before the initial paint will happen. In theory HTTP/2 is faster for multiple downloads, so all that should happen when you move to HTTP/2 is that these resources download sooner, and the page paints earlier.
Under HTTP/1.1 the limited number of connections browsers used (typically 6-8) created a natural queuing mechanism, and the browser had to prioritize critical resources over non-critical resources like images and send them first. With HTTP/2 there is a much higher limit (typically 100-120 parallel downloads depending on the server), so that natural queuing disappears, and there is a concern that if all the resources are downloaded in parallel they could slow each other down. For example, downloading 50 large print-quality images will use up a lot of bandwidth and might make a more critical CSS resource downloading at the same time take longer to arrive. In fact some early movers to HTTP/2 saw exactly this scenario.
This is addressed with prioritization and dependencies in HTTP/2, where the server can send some resource types (e.g. CSS, JavaScript) with a higher priority than others (e.g. images) rather than sending everything with the same priority. So even though all 51 resources are in flight at the same time, the CSS data should be sent first, with the images after. The client can also suggest a prioritization, but it's the server that ultimately decides. This does depend on the server implementation having a good prioritization strategy, so it is worth testing before switching over.
The other thing worth bearing in mind is that how you measure this changes under HTTP/2. If a low-priority image was queued for 4 seconds under HTTP/1.1 waiting for one of the limited connections to become free, and then downloaded in 2 seconds, you may previously have measured that as a 2-second download (which is technically not correct, as you weren't including the queuing time, so it was actually 6 seconds). If the same image shows as 5 seconds under HTTP/2 because it is requested immediately, you may think it is 3 seconds slower when in fact it is a full second faster. Just something to be aware of when analysing the impact of any move to HTTP/2. Because of this, it is much better to look at the overall key metrics (first paint, document complete, etc.) rather than individual requests when measuring the impact.
Incidentally this is a very interesting topic that goes beyond what can reasonably be covered in a Stack Overflow answer. It's a shameless plug, but I cover a lot of this in a book I am writing on the topic, if you're interested in finding out more.
What you mentioned should ideally not happen if the web server obeys the priorities the browser sends with its requests. Over HTTP/2, a browser typically requests CSS with the highest priority, and async JS and images with lower priority. This should ensure that even if your images, JS and CSS are requested at the same time, the server sends the CSS back first.
The only case where this should not happen is if the browser is not configured correctly.
You can inspect the priority assigned to each resource for any page in Chrome DevTools.
I am new to web development and would like to understand how CDNs work.
Specifically how do CDNs achieve performance in retrieving the content? Is the content stored on disk, in a database in binary format, or on disk but the location stored in the database?
How is the data kept in sync? Does the end user only push new/updated content to one location and the CDN takes care of synchronizing the content?
When is it wise to use a CDN and are there any other alternatives aside from storing the data on disk?
A content delivery network or content distribution network (CDN) is a globally distributed network of proxy servers deployed in multiple data centers.
CDNs are very useful for a multitude of reasons. For website owners with visitors in multiple geographic locations, content is delivered to those users faster because it has less distance to travel. CDN users also benefit from being able to scale up and down much more easily in response to traffic spikes. On average, around 80% of a website consists of static resources, so when using a CDN there is much less load on the origin server.
I'm downloading a full catalog worth of static image content (million+ images, all legal) from various webservers.
I want to download the images efficiently, but I'm trying to decide what per-domain limits I should place on the number of concurrent connections and the time between connection attempts, to avoid being blacklisted by DoS-protection tools and other rate limiters.
The keyword I needed to search for was "web crawler politeness"; that turned up some useful articles that answer the question quite well:
Typical politeness factor for a web crawler?
http://blog.mischel.com/2011/12/20/writing-a-web-crawler-politeness/
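Building on those articles, a minimal sketch of per-domain politeness in Python, assuming the requests library; the 2-second delay is an arbitrary placeholder and should follow whatever the target sites' robots.txt or the politeness guidelines suggest:

import time
from collections import defaultdict
from urllib.parse import urlparse

import requests

PER_DOMAIN_DELAY = 2.0          # seconds between hits to the same host (assumption)
last_hit = defaultdict(float)   # host -> time of last request

def polite_get(url):
    # Never hit the same host more often than PER_DOMAIN_DELAY allows.
    host = urlparse(url).netloc
    wait = PER_DOMAIN_DELAY - (time.monotonic() - last_hit[host])
    if wait > 0:
        time.sleep(wait)
    last_hit[host] = time.monotonic()
    return requests.get(url, timeout=30)

for url in ["http://example.com/img/1.jpg", "http://example.org/img/2.jpg"]:
    resp = polite_get(url)      # save resp.content to disk here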
Our site has performance issues, and spambots make them worse, so we decided to configure Dynamic IP Restrictions to allow only 5 concurrent requests per IP. My concern is that a single page can trigger many concurrent requests because it contains many images (we have about 20 images per page), so will these be blocked? Are images counted as requests in Dynamic IP Restrictions?
I found the answer: yes, images are counted as requests.
We had to switch on Dynamic IP Restrictions after a brute force attack on our website. We started with the default numbers.
After performing a 'hard' refresh (CTRL+F5) in my browser, our homepage was half covered in broken images! A request for a single ASPX can trigger thirty image requests and several CSS/JS file requests too. All happening within a few milliseconds, all from the same IP.
Your settings need to allow for this. Sadly this means the hackers get more of a chance too.
I have a slowly evolving dynamic website served from J2EE. The response time and load capacity of the server are inadequate for client needs. Moreover, ad hoc requests can unexpectedly affect other services running on the same application server/database. I know the reasons and can't address them in the short term. I understand HTTP caching hints (expiry, etags....) and for the purpose of this question, please assume that I have maxed out the opportunities to reduce load.
I am thinking of doing a brute force traversal of all URLs in the system to prime a cache and then copying the cache contents to geodispersed cache servers near the clients. I'm thinking of Squid or Apache HTTPD mod_disk_cache. I want to prime one copy and (manually) replicate the cache contents. I don't need a federation or intelligence amongst the slaves. When the data changes, invalidating the cache, I will refresh my master cache and update the slave versions, probably once a night.
Has anyone done this? Is it a good idea? Are there other technologies I should investigate? I could program this myself, but I would prefer a solution built by configuring open-source technologies.
Thanks
I've used Squid before to reduce load on dynamically-created RSS feeds, and it worked quite well. It just takes some careful configuration and tuning to get it working the way you want.
Using a primed cache server is an excellent idea (I've done the same thing using wget and Squid). However, it is probably unnecessary in this scenario.
It sounds like your data is fairly static and the problem is server load, not network bandwidth. Generally, the problem exists in one of two areas:
Database query load on your DB server.
Business logic load on your web/application server.
Here is a JSP-specific overview of caching options.
I have seen huge performance increases from simply caching query results. Even a cache with a duration of 60 seconds can dramatically reduce load on a database server. JSP has several options for in-memory caching.
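The idea is language-agnostic; as an illustration, a minimal time-based result cache sketched in Python, where run_query stands in for whatever the real data-access call is (a hypothetical name):

import time

_cache = {}   # query string -> (expiry timestamp, result)
TTL = 60      # seconds, matching the 60-second cache mentioned above

def cached_query(sql, run_query):
    # Serve a cached result if it is still fresh, otherwise hit the database.
    now = time.monotonic()
    hit = _cache.get(sql)
    if hit is not None and hit[0] > now:
        return hit[1]
    result = run_query(sql)              # run_query is the real DB call (hypothetical)
    _cache[sql] = (now + TTL, result)
    return result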
Another area available to you is output caching. This means that the content of a page is created once, but the output is used multiple times. This reduces the CPU load of a web server dramatically.
My experience is with ASP, but the exact same mechanisms are available on JSP pages. In my experience, with even a small amount of caching you can expect a 5-10x increase in max requests per sec.
I would use tiered caching here; deploy Squid as a reverse proxy server in front of your app server as you suggest, but then deploy a Squid at each client site that points to your origin cache.
If geographic latency isn't a big deal, then you can probably get away with just priming the origin cache like you were planning to do and then letting the remote caches prime themselves off that one based on client requests. In other words, just deploying caches out at the clients might be all you need to do beyond priming the origin cache.
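If you do prime the origin cache by brute-force traversal as planned, here is a minimal sketch in Python, assuming the requests library, Squid running as a forward proxy on its default port 3128 at a hypothetical cache.example.com, and a urls.txt file listing every URL (e.g. generated from a sitemap); with a reverse-proxy (accelerator) setup you would instead request the pages through Squid's front-end hostname:

import requests

PROXIES = {"http": "http://cache.example.com:3128"}   # hypothetical Squid host

with open("urls.txt") as f:                # one URL per line, e.g. from a sitemap
    for line in f:
        url = line.strip()
        if not url:
            continue
        r = requests.get(url, proxies=PROXIES, timeout=60)
        print(r.status_code, url)          # each fetch warms the Squid cache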