CDN: yes or no in this case?

I have a site with a nice but rising number of visits/hits. It operates heavily on a few databases and has an average serving time of 0.6 sec/request; a significant share of that (about 30%, or 0.2 sec on average) is spent on the "first byte", and I am already working on that problem.
Another way to improve speed that I am considering is a CDN. I am not sure whether a CDN can help in this case:
- it is all dynamic content
- it has a lot of images
- it already has internal (front-end) caching, and client-side caching, if that matters
So please tell me what you would suggest: should I try a CDN, and how much improvement can I expect? (Traffic is mostly US.)

As @matthias says, accept some of your other answers...
Whether a CDN will make much difference depends on how far your users are from the server and what the latencies are like.
A CDN will give you more parallel connections, but you can also achieve that by adding extra hostnames (domain sharding).
There are probably other ways of improving page load times; run a test on webpagetest.org and post the results.


How to improve Core Web Vitals after getting 95 or above in performance

I am using WordPress and WP Rocket is installed on my website.
My issue is that I am getting 98 in performance, but my Core Web Vitals assessment still shows "Failed".
Any idea how to pass Core Web Vitals? LCP is showing 2.9 s. Do I need to work on this?
Core Web Vitals are measured over field data, not the fixed, repeatable conditions that lab-based tools like Lighthouse use to analyse your website. See this article for a good discussion of the two.
Lighthouse is often set too strictly, and people complain that it shows worse performance than their site's real users see, but it is just as easy to have the opposite, as you see here. PageSpeed Insights (PSI) tries to use settings that are broadly applicable to all sites to give you "insights" into how to improve your performance, but the results should be calibrated against the real-user data that you see at the top of the audit.
In your case, I can see from your screenshots that your real-user data shows a high Time to First Byte (TTFB) of 1.9 seconds. That makes passing the 2.5 second LCP limit quite tough, as it leaves only 0.6 seconds for everything else.
The question is why you are seeing that long TTFB in the field when you don't see the same in your lab-based results, where you see a 1.1 second LCP time including TTFB. There could be a number of reasons, and several potential ways to resolve them:
Your users are further away from your data centre, whereas PSI is close by. Are you using a CDN?
Your users are predominantly on poorer network conditions than Lighthouse uses. Do you just need to serve less to them in those cases? For example, hold back images for users on slower connections using the Effective Connection Type API and only load them on demand, so the LCP element is text by default. Or don't use web fonts for these users. Or apply other forms of progressive enhancement.
Your page visits often jump through several redirect steps, all of which add to TTFB, but for PSI you put the final URL in directly, so you miss this in the analysis. This can be out of your control if the referrer uses a link shortener (e.g. Twitter does).
Your page visits often hit uncached pages that take time to generate, but when using PSI you run the test a few times, so you benefit from the page being cached and served quickly. Can you optimise your back-end server code, or improve your caching?
Your pages are not eligible for the super-fast in-memory bfcache for repeat visits when going back and forth through the site, which can be seen as a free web-performance win.
Your pages often suffer from contention when lots of people visit at once, which wasn't apparent in the PSI tests.
Those are some of the more common reasons for a slow TTFB, but you know your site, your infrastructure, and your users best, so you are best placed to identify the main cause. Once you solve that, you should see your LCP times come down and hopefully pass CWV.

How to detect a reasonable number of concurrent requests I can safely perform on someone's server?

I crawl some data from the web because there is no API. Unfortunately, it's quite a lot of data from several different sites, and I quickly learned that I can't just make thousands of requests to the same site in a short time... I want to fetch the data as fast as possible, but I don't want to cause a DoS attack :)
The problem is that every server has different capabilities and I don't know them in advance. The sites belong to my clients, so my intention is to prevent any possible downtime caused by my script. So no policy like "I'll try a million requests first, and if that fails, I'll try half a million, and if that fails..." :)
Is there any best practice for this? How does Google's crawler know how many requests it can make to the same site at the same time? Maybe they "shuffle their playlist", so there are not so many concurrent requests to a single site. Could I detect this somehow via HTTP? Send a single request, measure the response time, roughly estimate how loaded the server is, and then somehow derive a maximum number of concurrent requests?
I use a Python script, but this doesn't matter much for the answer - just to let you know in which language I'd prefer your potential code snippets.
The Google spider is pretty damn smart. On my small site it hits me at one page per minute, to the second. They obviously have a page queue that is filled with timing and sites in mind. I also wonder whether they are smart enough to avoid hitting multiple domains on the same server, so some recognition of IP ranges as well as URLs.
Separating the job of queueing up the URLs to be spidered at a specific time from the actual spidering would be a good architecture for any spider. All of your spiders could use a urlToSpiderService.getNextUrl() method which would block (if necessary) until the next URL is due to be spidered.
I believe that Google looks at the number of pages on a site to determine the spider speed. The more pages you have to refresh in a given time, the faster they need to hit that particular server. You should be able to use that as a metric, although before you've done an initial crawl it would be hard to determine.
You could start out at one page every minute and then as the pages-to-be-spidered for a particular site increases, you would decrease the delay. Some sort of function like the following would be needed:
// Sketch using java.time.Duration rather than Period; toDoQueue, refreshWindow and
// minimumDelay are assumed fields of the queueing service, not defined here.
public Duration delayBetweenPages(String domain) {
    long pending = Math.max(toDoQueue.sizeFor(domain), 1);   // pages queued for this domain
    Duration delay = refreshWindow.dividedBy(pending);        // e.g. a 24h refresh window spread over the queue
    if (delay.compareTo(Duration.ofMinutes(1)) > 0) return Duration.ofMinutes(1); // never slower than one page a minute
    if (delay.compareTo(minimumDelay) < 0) return minimumDelay;                   // never faster than the politeness floor
    return delay;
}
Could I detect this stuff somehow via HTTP?
With the modern internet, I don't see how you can. Certainly, if the server is taking a couple of seconds to respond or is returning 500 errors, then you should be throttling way back, but a typical connection and download is sub-second these days for a large percentage of servers, and I'm not sure there is much to be learned from any stats in that area.
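That said, a minimal sketch of the "throttle way back" idea might look like the following (C# here for illustration, though it ports directly to Python; the 2-second and 5xx thresholds and the starting delay are assumptions, not measured values):

using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

class PoliteFetcher
{
    private static readonly HttpClient Client = new HttpClient();
    private TimeSpan _delay = TimeSpan.FromSeconds(5); // conservative starting guess

    public async Task<string> FetchAsync(string url)
    {
        await Task.Delay(_delay); // always pause between requests to the same site
        var watch = Stopwatch.StartNew();
        HttpResponseMessage response = await Client.GetAsync(url);
        watch.Stop();

        bool struggling = (int)response.StatusCode >= 500 || watch.Elapsed > TimeSpan.FromSeconds(2);
        _delay = struggling
            ? TimeSpan.FromTicks(_delay.Ticks * 2)   // back off hard when the server looks stressed
            : TimeSpan.FromTicks(Math.Max(_delay.Ticks * 9 / 10, TimeSpan.FromSeconds(1).Ticks)); // recover gently, keep a 1s floor

        return await response.Content.ReadAsStringAsync();
    }
}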

When to use load balancing?

I am just getting into the more intricate parts of web development, so this may not be the best place to ask. When is it best to add load balancing to a web project? I understand that good or bad design affects how many users you can serve before it REALLY affects performance. However, I am planning a new project that could potentially have a lot of users, and I wondered whether I should be thinking about load balancing right off the bat. Opinions welcome; thanks in advance!
I should note also that the project will most likely be ASP.NET (WebForms or MVC, not yet decided) with a backend of MongoDB or PostgreSQL (again, still deciding).
Load balancing can also be a form of high availability. What if your web server goes down? It can take a long time to replace it.
Generally, by the time you need to think about throughput you are already rich, because you have an enormous number of users.
Stack Overflow serves about 10 million unique users a month with a few servers (6 or so). Think about how many requests per day you would have if you constantly generated 10 HTTP responses per second for 8 hot hours: 10*3600*8 = 288,000 page impressions per day. You won't have that many users soon.
And if you do, you optimize your app to 20 requests per second per CPU core, which means roughly 80 requests per second on a commodity four-core server. That is a lot.
Adding a load balancer later is usually easy. LBs can tag each user with a cookie so they get pinned to one particular target; your app will not notice the difference. Usually.
Is this for an e-commerce site? If so, then the real question to ask is "for every hour that the site is down, how much money are you losing?" If that number is substantial, then I would make load balancing a priority.
One of the more important architecture decisions that I have seen affect this is the use of session variables. You need to provide a seamless experience if your user ends up on different servers during their visit. In-process session variables won't transfer from server to server, so I would avoid relying on them.
I support a solution like this at work. We run four (used to be eight) .NET e-commerce websites on three Windows 2k8 servers (backed by two primary/secondary SQL Server 2008 databases), taking somewhere around 1300 (combined) orders per day. Each site is load-balanced, and kept "in the farm" by a keep-alive. The nice thing about this, is that we can take one server down for maintenance without the users really noticing anything. When we bring it back, we re-enable our replication service and our changes get pushed out to the other two servers fairly quickly.
So yes, I would recommend giving a solution like that some thought.
The parameters here that can affect one another and slow down performance are:
Bandwidth
Processing
Synchronization
These have to do with how many users you have and with the media you want to serve.
If you have a lot of video/files to deliver, you need many servers to deliver them. Let's say you do not; the next things to check are the users and the processing.
In my experience, what slows down processing is the locking of the session. So one big step to speed up processing is fully custom session handling, so that your pages do not lock one another and you can handle many users without issues.
For the next step, let's say you have a database that keeps all the data. To gain from a load balancer and many machines, the trick is to keep a local cache of what you are going to show.
So the first idea is to avoid locking that makes users wait for one another, and the second idea is to have a local cache on each machine that is populated dynamically from the main database.
ref:
Web app blocked while processing another web app on sharing same session
Replacing ASP.Net's session entirely
call aspx page to return an image randomly slow
Always online
One more consideration is that you can build a solution in a "one for all, and all for one" :) style, where you use extra servers for backup reasons. So if one server goes off for any reason (e.g. an update and restart), the rest can still work and serve.
As you said, it depends if/when load balancing should be introduced; it depends on performance and how many users you want to serve. LB also improves the reliability of your app: it will not stop when one system comes crashing down. If you can see your project growing really big and serving lots of users, I would suggest designing your application so it can be upgraded to LB later, so do not do anything non-standard. Try to steer away from home-made solutions and always follow good practice. If you really do need LB later on, it should not require changes to your app.
UPDATE
You may need to think ahead, but not at the cost of complicating your application too much. Do not get paranoid and prepare everything to work lightning fast "just in case". For example, do not worry about sessions: session management can easily be moved to SQL Server at any time, and that is the way to go with LB. Caching will also help if you hit bottlenecks in the future, but you do not need to implement it straight away; good design (stable interfaces), separation and decoupling will allow the cache to be added later on. So again, stick to good practices, do not close doors, but also do not open all of them straight away.
You may find this article interesting.

Is caching a good idea? If so, where?

I have an ASP.NET web site with 10-25k visitors a day (peaks of over 60k before holidays). Pages per visit is also high, since it's a content site.
I have a few specific pages which generate about 60% of the traffic. These pages are a bit complex and are DB-heavy (SQL Server 2008 R2 backend).
I was wondering whether it's worth "caching" a static version of these pages (I hear this is possible) and only re-rendering them when something changes (about once every 48 hours).
Does this sound like a good idea? Where would be the best place to implement this?
(asp.net, iis, db)
Update: It looks like a good option for me is OutputCache with SqlDependency. I see references to some kind of SQL Server notification for invalidating the cache, but I only see talk of SQL Server 2005. Has this option been deprecated by Microsoft? Is there a newer way to handle this?
Caching is a broad term that can happen at a number of different points. The optimum solution may be a combination of some or all.
For example, you can add page (output) caching, as described here, which caches the rendered output on the web server; I think this is what you were referring to.
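Regarding the SqlDependency update: as far as I know, SQL cache invalidation is not deprecated and still works with SQL Server 2008 R2; the table-based form just needs the database and table registered for notifications (aspnet_regsql or SqlCacheDependencyAdmin). A rough code-behind sketch of the idea, where "MySiteDb" and "Articles" are placeholders that would have to match your web.config <sqlCacheDependency> entry and your actual table:

using System;
using System.Web;
using System.Web.Caching;

// In a WebForms page's code-behind: the programmatic equivalent of an
// <%@ OutputCache %> directive with a SqlDependency attribute.
protected void Page_Load(object sender, EventArgs e)
{
    Response.Cache.SetCacheability(HttpCacheability.Server);                      // cache on the web server
    Response.Cache.SetExpires(DateTime.UtcNow.AddHours(48));                      // matches the ~48h change cycle
    Response.Cache.SetValidUntilExpires(true);
    Response.AddCacheDependency(new SqlCacheDependency("MySiteDb", "Articles"));  // evict when the table changes
}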
In addition, you can cache the data in memory using something like memcached, so that your data is more available to the web server as it builds the page, but you need to look at cache hit rate to know for sure that you are caching the right data.
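As a sketch of that data-caching layer without bringing in memcached, the in-process MemoryCache from System.Runtime.Caching (.NET 4) can hold query results between requests; Article, articleRepository and the 10-minute expiry below are illustrative placeholders:

using System;
using System.Collections.Generic;
using System.Runtime.Caching;

public List<Article> GetHomepageArticles()
{
    ObjectCache cache = MemoryCache.Default;
    var articles = cache["homepage-articles"] as List<Article>;
    if (articles == null)
    {
        // articleRepository stands in for whatever data-access class you already have
        articles = articleRepository.GetHomepageArticles();   // the expensive SQL call
        cache.Set("homepage-articles", articles,
                  new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(10) });
    }
    return articles;
}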
Also, although slightly off the topic of improving db heavy pages, you can cache static resources that change infrequently like images, css and include files using a content delivery network. Any CDN will almost certainly have a higher bandwidth and a cheaper data plan than your own connection because of the economies of scale, so the more of your content you can serve from there the better, in general.
Your first question was "I was wondering if it's worth "caching" a static version of these pages". I guess the answer to that depends on whether there is a performance problem at the moment, and where the cause of that problem is. If the pages are being served quickly and reliably, then quite possibly it's not worth implementing caching. If there is a performance problem, then where is it? Is it in db read time, or is it in the time spent building the page once the data has been returned?
I don't have much experience in caching, but this is what I would try to do:
I would look at your stats and run some profiles, see which are the most heavily visited pages that run the most expensive SQL queries. Pick one or two of the most expensive pages.
If the page is pseudo-static, that is, it has no per-user data on it such as your logged-in username, no comments, etc., you can cache the entire page. You can set a relatively long cache as well, anything from 1 minute to a few hours.
If the page has some dynamic, real-time content on it, such as comments, you can identify the static controls and cache those individually (see the sketch below). Don't put a page-wide cache on.
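For that second case, and only if the site is (or moves to) MVC, a cached child action gives the fragment-caching effect; in WebForms the equivalent is an OutputCache directive on the user control. The action, view and repository names below are made up for illustration:

// Hypothetical MVC child-action sketch: the page itself stays fully dynamic,
// but this expensive partial is re-rendered at most once per hour.
[ChildActionOnly]
[OutputCache(Duration = 3600, VaryByParam = "none")]
public ActionResult MostReadArticles()
{
    var model = articleRepository.GetMostRead(10);   // heavy query runs once per cache window
    return PartialView("_MostReadArticles", model);
}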
Good luck, sounds like a cache could improve performance.
Caching may or may not help. For example, if a site has low traffic and caching is enabled, the server has to do the work of creating the cache entry before serving the request. Because the traffic is low, there can be a long delay between successive requests, so the cached version may expire before it is ever reused and the server ends up creating a new cached version each time. This can make responses even slower than normal.
Read more: Caching - the good, the bad.
I have experienced this issue myself.
If traffic is good, caching may help you get better load times.
Cheers
Aditya

IIS 6.0 wildcard mapping benchmarks?

I'm quickly falling in love with the ASP.NET MVC beta, and one of the things I've decided I won't sacrifice in deploying to my IIS 6 hosting environment is the extensionless URL. Therefore, I'm weighing the option of adding a wildcard mapping, but everything I read suggests a potential performance hit with this method. However, I can't find any actual benchmarks!
The first part of this question is, do you know where I might find such benchmarks, or is it just an untested assumption?
The second part of the question is in regard to the two load tests I ran using JMeter on our dev server over a 100 Mbps connection.
Background Info
Our hosting provider has a 4 Gbps burstable internet pipe with a 1 Gbps backbone for our VLAN, so anything I can produce over the office LAN should translate well to the hosting environment.
The test scenario was to load several images / css files, since the supposed performance hit comes when requesting files that are now being passed through the ASP.NET ISAPI filter that would not normally pass through it. Each test contained 50 threads (simulated users) running the request script for 1000 iterations each. The results for each test are posted below.
Test Results
Without wildcard mapping:
Samples: 50,000
Average response time: 428ms
Number of errors: 0
Requests per second: 110.1
Kilobytes per second: 11,543
With wildcard mapping:
Samples: 50,000
Average response time: 429ms
Number of errors: 0
Requests per second: 109.9
Kilobytes per second: 11,534
Both tests were run warm (everything was in memory, no initial load bias), and from my perspective, performance was about even. CPU usage was approximately 60% for the duration of both tests, memory was fine, and network utilization held steady around 90-95%.
Is this sufficient proof that wildcard mappings that pass through the ASP.NET filter for ALL content don't really affect performance, or am I missing something?
Edit: 11 hours and not a single comment? I was hoping for more.. lol
Chris, very handy post.
Many who suggest a performance disadvantage imply that code processed by a web application is somehow different/inferior to code processed in the standard pipeline. The underlying code type may be different, and sure, you'll need the MSIL runtime, but MS has shown that in many cases you'll actually see a performance increase in the .NET runtime over a native one.
It's also wise to consider how IIS has to be a "jack of all trades", allowing all sorts of configuration and overrides even on static files. Some of those are designed to increase performance (caching, compression) and will indeed be lost unless you reimplement them in your code, but many of them are for other purposes and may never be used. If you build for your needs (only), you can ignore those other pieces and should realise some kind of performance advantage, even though there's a potential ASP.NET disadvantage.
In my (non-.NET) MVC testing I'm seeing considerable (10x or more) performance benefits over webforms. Even if there was a small hit on the static content - that wouldn't be a tough pill to swallow.
I'm not surprised the difference is almost negligible in your tests, but I'm happy to see it backed up.
NOTE: You can disable wildcard mapping for static directories (I keep all static files in /static/(pics|styles|...)) in IIS. Switch the folder to an application, remove the wildcard mapping, and switch it back from an application and, voilà, static files are handled by IIS without pestering your ASP.NET.
I think there are several additional things to check:
Since we're using the .NET ISAPI filter, we might be using threads that would otherwise run the application to serve static assets. I would run the same load test while reviewing the thread performance counters - review this link.
I would run the same load test while running Microsoft Performance Analyzer and compare the reports.
I was looking for a benchmark like this for a long time. Thanx!
In my company we enabled wildcard mapping on several web sites (standard WebForms, .NET 1.1 and 2.0, IIS 6), and the sysadmins told me they didn't notice any performance issues.
But it seems you stressed the network, not the server. So maybe the scores are so similar because of a network bottleneck? Just thinking...
That's quite an impressive post there, thanks very much for that.
We're also assessing the security and performance concerns with removing a piece of software that's always been in place to filter out unwanted traffic.
Will there be any further benchmarking on your part?
Cheers,
Karl.
It seems the bottleneck in your test is network utilization. If the performance degradation is expected to show up in CPU usage (I'm not sure it would, but it's plausible), then you wouldn't notice it with the test you ran.
Since this is a complex system with many variables, that does not mean there is no performance degradation; it means that in your scenario the performance degradation is probably negligible.
