Anyone using Memcached with ASP.NET on a distributed farm? - asp.net

We have 22 HTTP servers each running their own individual ASP.NET Caches. They read from a read only DB that is only updated off peak hours.
We use a file dependency to invalidate the cache, prompting the servers to "new up" their caches. If this is accidentally done during peak hours, it risks bringing down our DB cluster due to the sudden deluge of open connections.
Has anyone used memcached with ASP.NET in this distributed form? It seems to me that it would offer a huge advantage of having to only build up one cache (and hit the DB 21 times less), while memcached would handle distributing it on each box.
If you have, do you place it on the same box as the HTTP boxes, or do you run a separate cache tier? How well does it scale, can we expect it to need powerful servers? Our working dataset is not huge (We fit it into 4 gigs of memory on each HTTP box just fine).
How do you handle invalidation?
Looking for experiences and war stories.
EDIT: Win2k3, IIS6, 64-bit servers...4 gigs per box (I believe, we may have upped it to 16 gigs when we changed to 64-bit servers).

"memcached would handle distributing it on each box"
memcached does not distribute or replicate a cache to each box in a memcached farm. The memcached client basically hashes the key and chooses a cache server based on that hash. When one of the memcached servers fails, you will lose whatever cached items existed on that server; however, the client will recognize the failure and begin writing values to a different server. This being the case, your code needs to account for missing items in the cache and reset them if necessary.
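To make the key hashing and the "account for missing items" part concrete, here is a minimal sketch in Python. It is not a real memcached client: `ShardedCache`, `fail` and `get_or_load` are hypothetical names, each "server" is just a dict, and the plain modulo hashing is an assumption chosen for brevity.

```python
import hashlib


class ShardedCache:
    """Toy stand-in for a memcached client; every 'server' is a plain dict."""

    def __init__(self, server_names):
        self.servers = {name: {} for name in server_names}
        self.alive = set(server_names)

    def _pick(self, key):
        # Hash the key and map it onto the sorted list of live servers.
        live = sorted(self.alive)
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return live[int(digest, 16) % len(live)]

    def get(self, key):
        return self.servers[self._pick(key)].get(key)

    def set(self, key, value):
        self.servers[self._pick(key)][key] = value

    def fail(self, name):
        # Simulate a dead node: everything it held is lost.
        self.alive.discard(name)
        self.servers[name].clear()


def get_or_load(cache, key, loader):
    """Treat a miss as normal: reload from the source and reset the item."""
    value = cache.get(key)
    if value is None:
        value = loader(key)
        cache.set(key, value)
    return value
```

With modulo hashing, losing one server also remaps many keys that lived on the surviving servers, which is why real clients commonly use consistent hashing instead. Either way, the calling code looks like `get_or_load`: a miss is never an error, only a trip to the DB.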
This article discusses the memcached architecture in more detail: How memcached works.

Best practice (according to the memcached site) is to run memcached on the same box as your web server app or else you're making network calls (which isn't all that bad, but it's not optimal). If you're running a 64-bit app server (which you probably should if you're going to be running memcached), then you can load up each of the servers with loads of memory and it will be available to memcached. There's not much in the way of CPU resources used by memcached, so if your current app server isn't very taxed, it will remain that way.

Haven't used them together, but I've used them both on separate projects.
Last I saw the documentation explicitly said that sharing with the web server was ok.
Memcached really only needs RAM, and if you take your ASP.NET cache out of the equation, how much RAM is your web server actually using? Probably not much. It won't compete much with your web server for CPU and it doesn't need disk at all. You might consider segmenting off the network traffic (if you don't already) from the incoming web requests.
It worked well and was fast. I didn't have any problems with it.
Oh, invalidation was explicit on the project I used it on. Not sure what other modes there are for that.

If you want to get replication across your memcached servers then it may be worth a look at repcached. It's a patch for memcached that handles the replication part.

Worth checking out Velocity, which is a distributed cache provided by Microsoft. I cannot give you a point-by-point comparison to memcached, but Velocity is integrated with ASP.NET and will continue to get more development and integration.

Related

Spring Redis cache expiration in memory

I am using Spring Redis cache and wonder if it is possible to cache some data in memory for a set duration. A cache of a cache. If I know that the data in Redis will not change for 5 minutes, I don't need the Spring Redis cache to touch Redis every time a @Cacheable method is called.
Is Redisson the answer?
AFAICT, Redisson is simply a client-side facade or enhanced Redis (Java) client used to interface with a Redis node (or cluster) in a more powerful and convenient way, not unlike Spring Data Redis. For example, and as you already know, using Redis as a caching provider in Spring's Cache Abstraction.
Redis does seem to support client-side caching (a local cache in addition to the remote (server) cache?), when using a Redis client/server topology. This would be transparent to your application (e.g. @Cacheable) and configured in the Redis client driver, AFAIK.
However, given my lack of experience with Redis, or even Redisson for that matter, I cannot speak to this feature in detail. Redis client-side caching may need to be supported by the Redis client drivers (e.g. Jedis, Lettuce, even Redisson, etc).
NOW THE LONG-WINDED ANSWER FOR THE INTERESTED READER:
What you are describing when you say "cache of cache" is really having a "locally available cache" in addition to the "remote, or server-side cache". This assumes, of course, that you are running Redis in a client/server (not embedded), and possibly distributed/clustered (maybe HA), capacity in the first place.
Ideally, you would choose a caching provider that supported this sort of arrangement out-of-the-box, natively. And, despite popular belief (for example), much of what Redis "reinvented" (horizontal scale-out or cluster, HA, even persistence) already existed in other, more mature solutions, built from the ground up with these concerns in mind.
SIDENOTE: Granted, the referenced article above is dated, but also a bit naive.
A "cache of (a) cache" is technically referred to as the Near Caching pattern.
It is where the "local" (application/client-side) cache mirrors the "remote" (server-side and primary) cache to avoid [a] network hop(s), i.e. latency, by only accessing the remote cache when necessary (e.g. cache miss), preferably in a "single-hop", "fault-tolerant" fashion, when the server-side is distributed and clustered.
However, a fundamental difference between the local cache and server-side, remote cache is that the local cache only stores a subset of the data from the remote cache based on "interests".
NOTE: In Redis's documentation, they referred to this as "tracking". There are different ways, across different providers, to express "interests" or track what the client has accessed. Be mindful of the different approaches here since they consume different system resources.
You might have a distributed (Web / Microservice) application architecture where several client application instances serve different demographics or populations of end-users. Clearly, those client application instances might use overlapping, but different subsets of the primary dataset stored in the servers. This is where the local cache and "registering interest" only in the data that matters to, or is used by, the client application comes into play.
"Registering interest" is important because it lets the server-side, remote cache notify each client hosting a local cache ("push", rather than the client "pulling") when data that the client is interested in changes on the server. This matters because more than 1 client might be interested in, and use, the same data (e.g. the same "record", or an intersection of the data).
So, how do we properly address this concern without unnecessarily introducing extra (layers of) complexity into our system/application architecture?
Well, for one, it starts by choosing the right caching provider for the problem at hand.
DISCLAIMER: my experience stems from Apache Geode, which is the OSS variant of VMware Tanzu GemFire, and I am responsible for all things Spring for Apache Geode at VMware.
While I am a bit biased here, it is not uncommon for other caching providers (and complete IMDG solutions) to support the same arrangement. For example, one of my personal favorites is Hazelcast.
Hazelcast calls this particular caching arrangement, or topology, an "embedded" cache and even refers to this as "near cache" in the documentation.
The nice thing about a local, embedded "Near Cache" is that it avoids latency through unnecessary network hops; however, interest registration is key to keeping data consistent, as far as possible.
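As a rough illustration of the pattern (not of Redis's, Geode's or Hazelcast's actual APIs), here is a small Python sketch of a near cache that keeps entries locally for a fixed time and registers interest so the remote side can push invalidations. `RemoteCache`, `NearCache`, `register_interest` and the `clock` parameter are all hypothetical names made up for this example.

```python
import time


class RemoteCache:
    """Stand-in for the server-side cache: a dict plus per-key listeners."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.listeners = {}  # key -> set of callbacks

    def get(self, key):
        self.reads += 1  # counts network hops in this toy model
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value
        # "Push": tell every interested client that the key changed.
        for notify in list(self.listeners.get(key, ())):
            notify(key)

    def register_interest(self, key, callback):
        self.listeners.setdefault(key, set()).add(callback)


class NearCache:
    """Local cache in front of the remote one, with a time-to-live."""

    def __init__(self, remote, ttl_seconds, clock=time.monotonic):
        self.remote = remote
        self.ttl = ttl_seconds
        self.clock = clock
        self.local = {}  # key -> (value, expires_at)

    def _invalidate(self, key):
        self.local.pop(key, None)

    def get(self, key):
        entry = self.local.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]  # served locally, no network hop
        value = self.remote.get(key)
        if value is not None:
            self.local[key] = (value, self.clock() + self.ttl)
            # Only keys this client has actually read are tracked.
            self.remote.register_interest(key, self._invalidate)
        return value
```

With `ttl_seconds=300`, repeated reads of the same key within 5 minutes stay local unless the remote side reports a change. The memory cost mentioned above shows up in `listeners`: the server has to remember who is interested in what.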
I have documented, talked about and even demonstrated different caching patterns when using Spring for Apache Geode in the Spring Boot for Apache Geode documentation here, and Near Caching in particular, along with the Near Caching Sample (in the Samples with the other caching patterns).
I am sure you can find similar resources with other caching providers, even Redis.
At any rate, this documentation should help you understand different concerns to be aware of (e.g. memory consumption) when choosing any topology and configuration.
Good luck!

High performance ASP.NET setup

I would like to ask you what is the best setup for the following application:
ASP.NET 3.5 Web site - used as a presentation layer, a lot of AJAX and JS. Will not hit the server a lot.
ASP.NET WCF - service providing all data to the application. It's responsible for validation, data modeling / preparing and communication with the DB Server.
Database - SQL Server 2005 Std, some logic is coded on the server side as stored procedures. Some of the logic can be a bit time consuming. In my opinion it's the most resource consuming part of the app.
The website can have up to 1000 users per minute. We can have up to 4 servers in the following configuration: Intel Bi Xeon Quad 8x 2.00+ GHz, 16 GB RAM, SSD or RAID drives.
What is the best way to place parts of the application on the physical servers? Will they handle this kind of load?
The least scalable part of any application is the database server. You can add more web and application servers, but you can't replicate a DB with the same ease, so in the long run you will benefit if the DB contains no logic, especially no long-running logic. In a lot of applications the limiting factor is not CPU but memory. Think about user sessions: if you store 1 MB of data per user, your application will be able to support roughly 64,000 simultaneous user sessions on your machines (4 servers with 16 GB each), which may or may not be sufficient. Both problems can be mitigated by application-level caching, but this brings its own set of problems because now you are faced with stale data. To scale session-based sites you will need a smart load balancer that supports sticky sessions; for your load you will most likely need a hardware load balancer.
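The session figure is simple arithmetic over the hardware in the question (4 servers, 16 GB each). A small Python sketch follows; `max_sessions` and the `reserved_gb_per_server` parameter are hypothetical names, and 1 GB is counted as 1000 MB to match the round number above.

```python
def max_sessions(servers, ram_gb_per_server, mb_per_session,
                 reserved_gb_per_server=0):
    """Upper bound on in-memory sessions across the whole farm."""
    usable_gb = ram_gb_per_server - reserved_gb_per_server
    usable_mb = servers * usable_gb * 1000  # 1 GB taken as 1000 MB
    return usable_mb // mb_per_session


# All RAM given to sessions: the theoretical ceiling.
print(max_sessions(4, 16, 1))      # 64000

# Leaving 4 GB per box for the OS and the application itself.
print(max_sessions(4, 16, 1, 4))   # 48000
```

The second call is the more realistic one: the ceiling drops quickly once the OS, the web server and any caches claim their share of memory.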
In the application you describe, I suspect that thread management is going to be a big issue. Throwing hardware at the problem may not be the best approach.
In terms of partitioning, it depends on whether you can leverage things like caching and cache notifications. If every call to the app has to hit the DB and run a lengthy stored procedure, then you may want to have more DB machines and fewer front-end web servers.
This is a big subject. In an attempt to provide a reasonably comprehensive answer to exactly this kind of question, I ended up writing a book about it: Ultra-Fast ASP.NET: Build Ultra-Fast and Ultra-Scalable web sites using ASP.NET and SQL Server.

Harvesting Dynamic HTTP Content to produce Replicating HTTP Static Content

I have a slowly evolving dynamic website served from J2EE. The response time and load capacity of the server are inadequate for client needs. Moreover, ad hoc requests can unexpectedly affect other services running on the same application server/database. I know the reasons and can't address them in the short term. I understand HTTP caching hints (expiry, etags....) and for the purpose of this question, please assume that I have maxed out the opportunities to reduce load.
I am thinking of doing a brute force traversal of all URLs in the system to prime a cache and then copying the cache contents to geodispersed cache servers near the clients. I'm thinking of Squid or Apache HTTPD mod_disk_cache. I want to prime one copy and (manually) replicate the cache contents. I don't need a federation or intelligence amongst the slaves. When the data changes, invalidating the cache, I will refresh my master cache and update the slave versions, probably once a night.
Has anyone done this? Is it a good idea? Are there other technologies that I should investigate? I can program this, but I would prefer a solution that is a configuration of open source technologies.
Thanks
I've used Squid before to reduce load on dynamically-created RSS feeds, and it worked quite well. It just takes some careful configuration and tuning to get it working the way you want.
Using a primed cache server is an excellent idea (I've done the same thing using wget and Squid). However, it is probably unnecessary in this scenario.
It sounds like your data is fairly static and the problem is server load, not network bandwidth. Generally, the problem exists in one of two areas:
Database query load on your DB server.
Business logic load on your web/application server.
Here is a JSP-specific overview of caching options.
I have seen huge performance increases by simply caching query results. Even adding a cache with a duration of 60 seconds can dramatically reduce load on a database server. JSP has several options for in-memory cache.
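To show what caching query results for 60 seconds amounts to, here is a minimal sketch, written in Python rather than JSP. `cached_for` and `run_query` are hypothetical names, and the `clock` parameter exists only so the expiry can be tested without waiting.

```python
import functools
import time


def cached_for(seconds, clock=time.monotonic):
    """Cache a function's results per argument tuple for a fixed duration."""
    def decorator(func):
        store = {}  # args -> (result, expires_at)

        @functools.wraps(func)
        def wrapper(*args):
            hit = store.get(args)
            if hit is not None and hit[1] > clock():
                return hit[0]  # still fresh: skip the database
            result = func(*args)
            store[args] = (result, clock() + seconds)
            return result

        return wrapper
    return decorator


@cached_for(60)
def run_query(sql):
    # Placeholder for a real database call.
    return "rows for " + sql
```

However many requests arrive, each distinct query reaches the database at most about once per 60 seconds from each process, at the cost of serving results that may be up to a minute stale. This sketch is not thread-safe and never evicts old keys, so treat it as an outline only.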
Another area available to you is output caching. This means that the content of a page is created once, but the output is used multiple times. This reduces the CPU load of a web server dramatically.
My experience is with ASP, but the exact same mechanisms are available on JSP pages. In my experience, with even a small amount of caching you can expect a 5-10x increase in max requests per sec.
I would use tiered caching here; deploy Squid as a reverse proxy server in front of your app server as you suggest, but then deploy a Squid at each client site that points to your origin cache.
If geographic latency isn't a big deal, then you can probably get away with just priming the origin cache like you were planning to do and then letting the remote caches prime themselves off that one based on client requests. In other words, just deploying caches out at the clients might be all you need to do beyond priming the origin cache.

ASP.Net increase MaxProcesses (web garden) using state server and caching

I have an ASP.Net website on IIS7 and I am planning to increase the MaxProcesses to match the number of cores on the server (4 cores, 64bit Windows Server 2008).
From what I read, if I increase the MaxProcesses to create a web garden I have to set an out-of-process state server, so I am planning to use the ASPState service to share sessions between worker processes.
But there is something that is not clear to me: is Caching also shared? Or do I have to set a new custom provider for the cache?
In-process cache is never shared in a web garden.
But here's the REAL thing... I question the motivations behind what you're doing. If the object is to use your cores more efficiently, then you can just increase the number of request and/or worker threads you have running your ASP.NET application. Running multiple w3wp processes isn't necessarily the option you want. If you have some constrained resource, like an old in-process COM object that scales poorly with threads, then I can see how you might scale better with multiple processes. But unless you really know what you're doing and why, gently step back from that setting and leave it at 1. ;-)
Caching is not shared. The web garden creates multiple "w3wp" processes. Each process will have its own cache.
If you want to share cache then use something like MemCached Win32 (with the Enyim cache client) or use the new MS product Velocity. This way once you move beyond one server you will already be set up architecturally to handle it.

What are the strongest features of Memcached?

In particular, what strengths does it have over the caching features of ASP.NET?
memcached is a distributed cache -- the whole cache can be spread across multiple boxes. So, for example, you can use memcached to store session data in a cluster environment, so that this data is available to any box in the cluster.
memcached can be compared to Microsoft's Velocity (http://blogs.msdn.com/velocity/).
Another nice feature is that memcached runs as a stand alone service. If you take your application down, the cached data will remain in memory as long as the service runs.
We use memcached as a caching back-end in an ASP.NET web site. We have 12 memcached boxes.
UP for memcached:
Much more scalable, just add boxes with memory to spare
The cache nodes are very ignorant: this means that they have no knowledge about the other nodes participating. This makes the management and configuration of such a system extremely easy.
All of the webservers have the same values in cache (so you never see hopping values depending on which webserver serves your request)
DOWN for memcached:
Compared to in-memory cache, it is very slow, mostly because of serialization/deserialization and network latency
The cache nodes are very ignorant: there is, for example, no way to iterate over all of the cached items
Memcached is the simplest and fastest tool if you need distributed caching. If you can use an in-process in-memory cache for your application, that will always be faster. We use a cache manager that will offload certain items to memcached and keep others in local cache.
