Harvesting Dynamic HTTP Content to produce Replicating HTTP Static Content - http

I have a slowly evolving dynamic website served from J2EE. The response time and load capacity of the server are inadequate for client needs. Moreover, ad hoc requests can unexpectedly affect other services running on the same application server/database. I know the reasons and can't address them in the short term. I understand HTTP caching hints (expiry, etags....) and for the purpose of this question, please assume that I have maxed out the opportunities to reduce load.
I am thinking of doing a brute force traversal of all URLs in the system to prime a cache and then copying the cache contents to geodispersed cache servers near the clients. I'm thinking of Squid or Apache HTTPD mod_disk_cache. I want to prime one copy and (manually) replicate the cache contents. I don't need a federation or intelligence amongst the slaves. When the data changes, invalidating the cache, I will refresh my master cache and update the slave versions, probably once a night.
Has anyone done this? Is it a good idea? Are there other technologies that I should investigate? I can program this, but I would prefer a configuration of open source technologies solution
Thanks

I've used Squid before to reduce load on dynamically-created RSS feeds, and it worked quite well. It just takes some careful configuration and tuning to get it working the way you want.

Using a primed cache server is an excellent idea (I've done the same thing using wget and Squid). However, it is probably unnecessary in this scenario.
It sounds like your data is fairly static and the problem is server load, not network bandwidth. Generally, the problem exists in one of two areas:
Database query load on your DB server.
Business logic load on your web/application server.
Here is a JSP-specific overview of caching options.
I have seen huge performance increases by simply caching query results. Even adding a cache with a duration of 60 seconds can dramatically reduce load on a database server. JSP has several options for in-memory cache.
Another area available to you is output caching. This means that the content of a page is created once, but the output is used multiple times. This reduces the CPU load of a web server dramatically.
My experience is with ASP, but the exact same mechanisms are available on JSP pages. In my experience, with even a small amount of caching you can expect a 5-10x increase in max requests per sec.

I would use tiered caching here; deploy Squid as a reverse proxy server in front of your app server as you suggest, but then deploy a Squid at each client site that points to your origin cache.
If geographic latency isn't a big deal, then you can probably get away with just priming the origin cache like you were planning to do and then letting the remote caches prime themselves off that one based on client requests. In other words, just deploying caches out at the clients might be all you need to do beyond priming the origin cache.

Related

Why is the TUX Web Server Dead? Does Nginx/Lighttpd/Epoll/Kqueue Replace it?

I recall a very fast kernel module for Linux called "TUX" for static files to answer IIS's superior-to-Linux static file web-serving performance and solve the "C10K problem." Now I keep seeing:
Nginx
Lighttpd
CDNs
... for "fast static file-serving." Serving static files quickly isn't difficult if your OS has the right features. Windows has since the invention of IO Completion ports, overlapped I/O, etc.
Did Tux die because of the security implications? Was it an experiment that Kqueue/Epoll combined with features like Sendfile made obsolete? What is the best solution to serve 100% static content -- say packshots of 50 or so images to simulate a "flipbook" movie.
I understand this ia "Server-related" question, but it's also theoretical. If it's purely static, is a CDN really going to be better anyway?
Mostly because Ingo Molnár stopped working on it. Why? I believe it was because kernel version 2.2 implemented the sendfile(2) call which matched (approximately) the massive performance benefits previously achieved by Tux. Note the Tux 2.0 Reference Manual is dated 2001.
serving a static file has three steps: decide which file to send, decide if to send the file, send the file. Now Tux did a really good job of sending the file, so so on deciding which file to send, and a lousy job of deciding if to send a file. The decisions are matter of policy and should be done in user space. Add sendfile and I can write a server that will be almost as good as tux in a short time, and add stuff without recompiling my kernel. maybe sql logging. just thinking about userspace sql calls being made from a kernel makes my eye twitch.
Tux isn't required anymore due to sendfile(). Nginx takes full advantage of this and IMO is one of the best web servers for static or non-static content. I've found memory problems with lighttpd, ymmv.
The whole purpose of CDN's is that it moves the 'web server' closer to the end users browser. This means less network hops and less round trip delay, without a large cost to you having to host multiple web servers around the world and using geo dns to send the user to the closest. Be aware tho as these web servers aren't in your control they could be overloaded and the benefit of less hops could be diminished if their network is overloaded. CDN's are usually targets of DDOS attacks and you might get caught up in something that has nothing to do with your content.

Increase in number of requests form server cause website slow?

In My office website,webpage has 3css files ,2 javascript files ,11images and 1page request total 17 requests from server, If 10000 people visit my office site ...
This may slow the website due to more requests??
And any issues to the server due to huge traffic ??
I remember My tiny office server has
Intel i3 Processor
Nvidia 2Gb Graphic card
Microsoft 2008 server
8 GB DDR3 Ram and
500GB Hard disk..
Website developed on Asp.Net
Net speed was 10mbps download and 2mbps upload.using static ip address.
There are many reasons a website may be slow.
A huge spike in Additional Traffic.
Extremely Large or non-optimized graphics.
Large amount of external calls.
Server issue.
All websites should have optimized images, flash files, and video's. Large types media slow down the overall loading of each page. Optimize each image.PNG images have an improved weighted optimization that can offer better looking images with smaller file size.You could also run a Traceroute to your site.
Hope this helps.
This question is impossible to answer because there are so many variables. It sounds like you're hypothesising that you will have 10000 simultaneous users, do you really expect there to be that many?
The only way to find out if your server and site hold up under that kind of load is to profile it.
There is a tool called Apache Bench http://httpd.apache.org/docs/2.0/programs/ab.html which you can run from the command line and simulate a number of requests to your server to benchmark it. The tool comes with an install of apache, then you can simulate 10000 requests to your server and see how the request time holds up. At the same time you can run performance monitor in windows to diagnose if there are any bottlenecks.
Example usage taken from wikipedia
ab -n 100 -c 10 http://www.yahoo.com/
This will execute 100 HTTP GET requests, processing up to 10 requests
concurrently, to the specified URL, in this example,
"http://www.yahoo.com".
I don't think that downloads your page dependencies (js, css, images), but there probably are other tools you can use to simulate that.
I'd recommend that you ensure that you enable compression on your site and set up caching as this will significanly reduce the load and number of requests for very little effort.
Rather than hardware, you should think about your server's upload capacity. If your upload bandwidth is low, of course it would be a problem.
The most possible reason is because one session is lock all the rest requests.
If you not use session, turn it off and check again.
relative:
Replacing ASP.Net's session entirely
jQuery Ajax calls to web service seem to be synchronous

asp.net how to test load balancing is working

Our infrastructure team has configured a a load balancing using Radware. Basically we have 3 web server that are load balanced.
Before we go live I would like to test and make sure that load balancing is working. How do I test the following:
3 servers are load balanced and requested are evenly distributed. (Any automated tool exists?)
Asp.net InProc session are working.
You can test by first generating an artificial load on your site (with any one of a number of load generators). Then have a look at the Windows Performance Counters for each site: things like HTTP requests per second and CPU use would be reasonable high-level metrics.
Yes, there are automated tools, but they usually require quite a bit of setup, and the better ones charge a fee. Perf counters are fast, easy and free.
As #swannee said, InProc sessions won't work in a load-balanced scenario unless your load balancer is configured to use sticky sessions. It's better to use SQL Server sessions with load balancing.
FWIW, you can test your software in a "mini" load balanced scenario on a single server by enabling IIS web gardens (multiple worker processes), from the AppPool config dialog.
Can you look in the IIS server logs to see how many hits each server is getting?
http://msdn.microsoft.com/en-us/library/ms953324.aspx
Also, unless you are using sticky sessions, you are going to have problems with InProc session. It won't work on a server farm (unless as stated you have sticky sessions turned on). If you don't have sticky sessions, you'll be able to tell real quickly that your session is being lost between requests with just some manual testing.
Our organization makes a series of ping and advanced status pages. These pages are monitored by our load balancers so it can take out unhealthy nodes in the event one node loses a connection to a database server or the node itself is having issues.
Our ping pages spit out the server name that you're connecting to and the status. They are avaliable by the common server names themself, like server01.application.com/ping and server02.application.com/ping but more important, they all answer on application.com/ping.
Refreshing the page will show us a new connection (you can see the server name change).
To test load you could use WCat, it's not the easiest tool to setup and script but it works.
To test sessions. you'll need to build out some pages that you can do load testing on to test sessions

ASP.Net load balancing

I am working on asp.net (newbie) and I am trying to understand what it means to do "load balancing" for the web site. The website will be used by multiple users and resources (database, web service,..).
If anyone could help me understanding the concept of the load balance for asp.net web site, I would really appreciate it.
Thanks.
One load-balancing-related issue you may want to be aware of at development time: where you store your session state. This MSDN article gives a good overview of your options.
If you implement your asp.net system using "out-of-process" or "sql-server-mode" session state management, that will give you some additional flexibliity later, if you decide to introduce a load-balancer to your deployed system:
Your load balancer needn't handle session affinity. As one poster mentioned above, all modern load-balancers handle it anyway, so this is a minor consideration in any case.
Web-gardens (a sort of IIS/server-implemented load-balancer) REQUIRES use of "out-of-process" or "sql-server-mode" session state management. So if your system is already configured that way, you'll be one step closer to being able to use web-gardens.
What is it?
Load balancing simply refers to distributing a workload between two or more computers. As a concept, it's not unique to asp.net. Although having separate machines for your database and web server could be called "load balancing" it more commonly refers to using multiple machines to serve a single role, such as having multiple web servers.
Should you worry about it? Probably not. Do you already have a performance problem? Are your database and web server on their own machines? If you do find that your server resources are strained, it would probably be easier to scale up (a more powerful single machine) than out (load balancing). These days, a dedicated box can handle a LOT of traffic if your code is decent.
Load Balancing, in the programming sense, does not apply to ASP.NET; it applies to a technique to try to distribute server load across two or more machines, rather than it all being used on one machine. Unless you will have many thousands (millions?) of users, you probably do not need to worry about it.
Check the Wikipedia article for more information.
Load balancing is not specific for any on technology stack be it asp.net, jsp etc. To load balance is to spread the incoming requests to a web site over more than one server. This is typically done with a software or hardware load balancer. The load balancer sits in front of two or more web servers and delegates the incoming traffic. Although this technique is not limited to web servers. Load Balancing
Enjoy!
I've never used it, but an option is IIS Application Request Routing.
IIS Application Request Routing (ARR)
2.0 enables Web server administrators, hosting providers, and Content
Delivery Networks (CDNs) to increase
Web application scalability and
reliability through rule-based
routing, client and host name
affinity, load balancing of HTTP
server requests, and distributed disk
caching
In a typical web server/database scenario, the db is almost always guaranteed to load up the machine first. This is because dealing with storing data requires more resources. Before you even start looking at load balancing your web server, you need to think about how to load balance the database.
Spreading one database across multiple servers is a lot harder than load balancing a web server. One of the techniques that can be used is sharding (or horizontal partitioning). This is where some records are stored on one server, and other records - on another server. For example records with ID 1-900000 are on server 1 and records 900001- are on server 2.
In comparison to DB load balancing, spreading the load across multiple ASP.NET servers is not overly complicated. Most of the session issues can be easily mitigated by using out of process session and/or never talking to Application.Cache directly. Data load balancing on the other hand is hard and requires a lot of planning and trial and error. In most cases, talking to a load balanced DB requires using an ORM which supports it (e.g. NHibernate) or your own Data Access Layer. The reason being is that you need to take out establishing a connection from the code that uses the database, so that the decision which DB to talk to is handled in one place.
the exact solution is to save session into the SQL Server with Stored Procedure. To read session call 'SessionCheck' stored Procedure.
I'd add that it really isn't something to worry about. By the time you need a load balancer, you can probably afford one of the neato newfangled ones with sticky sessions so you don't even have to deal with the session boogeyman.

Anyone using Memcached with ASP.NET on a distributed farm?

We have 22 HTTP servers each running their own individual ASP.NET Caches. They read from a read only DB that is only updated off peak hours.
We use a file dependency to invalidate the cache, prompting the servers to "new up" their caches...If this is accidentally done during peak hours, it risks bringing down our DB cluster due to the sudden deluge of open connections.
Has anyone used memcached with ASP.NET in this distributed form? It seems to me that it would offer a huge advantage of having to only build up one cache (and hit the DB 21 times less), while memcached would handle distributing it on each box.
If you have, do you place it on the same box as the HTTP boxes, or do you run a separate cache tier? How well does it scale, can we expect it to need powerful servers? Our working dataset is not huge (We fit it into 4 gigs of memory on each HTTP box just fine).
How do you handle invalidation?
Looking for experiences and war stories.
EDIT: Win2k3, IIS6, 64-bit servers...4 gigs per box (I believe, we may have upped it to 16 gigs when we changed to 64-bit servers).
"memcached would handle distributing it on each box"
memcached does not distribute or replicate a cache to each box in a memcached farm. The memcached client basically hashes the key and chooses a cache server based on that hash. When one of the memcached servers fail you will lose whatever cached items existed on that server, however, the client will recognize the failure and begin writing values to a different server. This being the case, your code needs to account for missing items in the cache and reset them if necessary.
This article discusses the memcached architecture in more detail: How memcached works.
Best practice (according to the memcached site) is to run memcached on the same box as your web server app or else you're making http calls (which isn't all that bad, but it's not optimal). If you're running a 64-bit app server (which you probably should if you're going to be running memcached), then you can load up each of the servers with loads of memory and it will be available to memcached. There's not much in the way of CPU resources used by memcached, so if your current app server isn't very taxed, it will remain that way.
Haven't used them together, but I've used them both on separate projects.
Last I saw the documentation explicitly said that sharing with the web server was ok.
Memcache really only needs RAM and if you take your asp.net cache out of the equation how much RAM is you web server actually using? Probably not much. It won't compete much with your web server for CPU and it doesn't need disk at all. You might consider segmenting off the network traffic (if you don't already) from the incoming web requests.
It worked well and was fast I didn't have any problems with it.
Oh, invalidation was explicit on the project I used it on. Not sure what other modes there are for that.
If you want to get replication accross your memcached servers then it maybe worth a look at repcached. It's a patch for memcached that handles the replication part.
Worth checking out Velocity, which is a distributed cache provided by Microsoft. I cannot give you a point-by-point comparison to memcached, but Velocity is integrated with ASP.NET and will continue to get more development and integration.

Resources