My first impression of AppFabric Cache is that it's essentially a distributed hashtable in the same vein as memcached. The typical usage pattern of such a cache is that there is no guarantee that your data will be in the cache (old entries are evicted to make space for new ones), but with sufficient RAM they usually will be.
On the other hand, MS provide a Web Session State Provider that stores session data in an AppFabric Cache. This appears to be a completely different usage pattern, as we now require that cached items are never evicted as a result of memory pressure. To achieve this, MS provide a high-availability mode that keeps redundant copies of all data. Eviction can also be disabled, which in turn requires us to allocate sufficient RAM to ensure that the cache never reaches capacity.
It seems likely that an application would benefit from using both types/modes of cache, but as far as I can tell AppFabric RAM cannot be ringfenced within a cluster or host, hence the web session state may (and generally will) experience memory pressure in that case. The only solution I can see is to operate two AppFabric Cache clusters, one for each mode.
Is the above a good representation of the situation or am I missing some config setting that addresses this scenario?
Storing session state in AppFabric is not a good idea. I faced many problems trying this (for example, data was lost under memory pressure, and multiple users putting data into the cache at the same time could lead to data loss), and have now switched to InProc/SQLServer session state.
I'm currently testing out AppFabric Distributed Cache, it's been working great.
When performance testing the Local Cache feature however, I find there is no difference in performance.
For the purposes of the performance test I am storing large pages generated from OutputCache into AppFabric and am noticing the same performance with or without local cache on.
Does anyone else have any similar experience?
I'm using Timeout based local cache, with a ttl of 300 and objectcount of 100000.
If the distributed cache is on the local server, then there should be very little difference, since most of the time spent accessing the distributed cache is the transport across the network.
It may be that it takes a bit longer to access the distributed cache than the local on the same machine, since local cache is in process:
When local cache is enabled, the cache client stores a reference to
the object locally. This keeps the object active in the memory of the
client application
However, local cache does add some sync overhead. So the actual differences will depend on your usage pattern.
I think this might depend on the type of data you are caching.
We use local cache a lot for web services that have many almost identical Get methods (returning small data). The local cache put significantly less load on the cache servers, and most transactions take 0 ms.
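To make the trade-off above concrete, here is a minimal, language-neutral sketch (in Python, not the AppFabric client API) of a timeout-based local cache sitting in front of a slower remote cache. The class name LocalCache, the remote_get callable and the hit counters are all made up for illustration; only the ttl of 300 and object count of 100000 come from the question. The point is that a repeated read of the same key is served in-process, and the remote cache is only consulted on a miss or after the entry times out.

```python
import time


class LocalCache:
    """Hypothetical timeout-based in-process cache in front of a remote cache."""

    def __init__(self, remote_get, ttl_seconds=300, max_objects=100_000,
                 clock=time.monotonic):
        self._remote_get = remote_get   # callable that fetches from the remote cache
        self._ttl = ttl_seconds
        self._max = max_objects
        self._clock = clock
        self._items = {}                # key -> (expires_at, value)
        self.local_hits = 0
        self.remote_reads = 0

    def get(self, key):
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            # Still fresh: served from process memory, no network hop.
            self.local_hits += 1
            return entry[1]

        # Missing or timed out: go to the remote cache and remember the result.
        value = self._remote_get(key)
        self.remote_reads += 1
        self._items.pop(key, None)
        if len(self._items) >= self._max:
            # Over the object count: drop the oldest inserted entry.
            self._items.pop(next(iter(self._items)))
        self._items[key] = (now + self._ttl, value)
        return value
```

If the remote cache is on the same machine and the objects are large, the remote read is already cheap relative to everything else, which would be consistent with seeing no measurable difference in the test described in the question.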
I have a specific scenario for which we want to use Coherence as a distributed cache, which I am going to describe here.
I have 20+ standalone processes which are going to put data in the cache continuously. The frequency differs for each of them, though that's not a concern.
And 2 processes which will be reading data from the cache.
I don't need any underlying DB beyond what Coherence itself provides. Data will be written to the cache and read from the cache.
I have a 4-node cluster at my disposal (a cost constraint) and the Coherence cluster will be on different boxes (an infrastructure constraint), and both the populating part and the reading part will be on different machines.
The peak memory size of the cache each day will hover around 6 GB at most, with a minimum of 2 GB.
The cache will hold the current day's data only, and I will have separate archiving processes that archive it at the same time. The point is that the cache will stay at this size for now. Let's say I am going to keep the date out of the key.
I would still like to explore whether I can store more in those 4 nodes. Right now it's simple serialization; I can explore other binary formats. Or should I definitely do that at this cache size?
My read and write operations are fairly spread out over the day, meaning reads and writes will keep happening from those 2 reading clients and 20+ writing clients. Neither dominates. There is a startup batch step in each of the background processes which pushes more to the cache than the continuous pushing afterwards, but the continuous pushing moves a fair amount of data too.
Now my questions regarding the points above (and because of some confusion):
The biggest one: somebody told me that I can have only a limited number of connections depending on the nodes we have bought. He said that if it's 4 nodes, you should ideally have 4 connections at most, so you need to develop a gatekeeper kind of application and so on, even if we use TCP Extend. From my reading so far I don't think that is true. Is it? The point is that I don't want to go that way if it really is not a constraint.
In other words, is there a limit on connections through the Proxy Service depending on the number of nodes in the cluster?
Somewhat related to the above: at the very worst, I am going to pay some performance penalty when pushing to the cache only if I go the Extend way, right?
Partitioned cache or near cache? Both the read time and having the most up-to-date cache are extremely critical. (This is the most important question I have.)
I really want to see the benefit that can be obtained from going to POF instead of, say, Serializable/Externalizable/protobuf. Can Coherence support protobuf out of the box? (Maybe for later on.)
There's no technical limitation to the number of connections a Coherence Extend proxy can support except normal network and hardware resource constraints. You will have to ask an Oracle sales person if there are licensing limitations.
There is some performance impact from using a proxy because you are adding an additional network hop (client to proxy to cluster). If you use POF serialization then the proxy does not have to serialize/deserialize values. It can just pass the object through in its serialized form. In most applications the performance impact of using a proxy is tiny because Coherence is highly optimized for network speed. You are not required to use a proxy unless your clients are .NET or C++, but there are advantages of isolating client performance from impacting the cache.
Near cache will improve retrieval performance dramatically if there are a number of frequently retrieved items for a client, since they will be found in-process.
POF offers performance improvements based on faster serialization/deserialization and more compact storage. It is always best to try with test data based on your real production data and measure the difference yourself. Coherence does not support protobuf out of the box.
I'm looking for a way for the application itself to monitor the amount of memory it is using, so I can record it in a log file every hour or so and keep an eye on the application's usage.
It's all hosted, so we can't make changes to the system to see what is going on; the solution will have to be from within the application code.
We may in future use the memory information to affect the caching policies.
Hmm, how detailed information do you need? If you just want the memory usage you can ask the GC. It knows. ;)
long bytes = GC.GetTotalMemory(false); // use 'false' to not wait for next collect
The variable 'bytes' will contain the number of bytes currently allocated in managed memory. I'm not sure whether the managed memory figure covers the entire process or just the AppDomain. You'll have to test this by running several AppDomains in one process and seeing whether managed memory allocation is measured across AppDomains. If it isn't, then you can use this to measure total memory usage in an ASP.NET application.
If you want more specific information there's a diagnostics API for the CLR which you could interface with. There's also plenty of memory profilers out there, but if they'll work within an ASP.NET application I cannot say.
As an alternative, if you want more detailed information, you can read the performance counters using the System.Diagnostics.PerformanceCounter class. Here are some of the counters that you can plug into:
Request Bytes Out Total
Request Bytes In Total
Request Wait Time
Requests Executing
Requests/Sec
Errors Total
We have 22 HTTP servers each running their own individual ASP.NET Caches. They read from a read only DB that is only updated off peak hours.
We use a file dependency to invalidate the cache, prompting the servers to "new up" their caches...If this is accidentally done during peak hours, it risks bringing down our DB cluster due to the sudden deluge of open connections.
Has anyone used memcached with ASP.NET in this distributed form? It seems to me that it would offer a huge advantage of having to only build up one cache (and hit the DB 21 times less), while memcached would handle distributing it on each box.
If you have, do you place it on the same box as the HTTP boxes, or do you run a separate cache tier? How well does it scale, can we expect it to need powerful servers? Our working dataset is not huge (We fit it into 4 gigs of memory on each HTTP box just fine).
How do you handle invalidation?
Looking for experiences and war stories.
EDIT: Win2k3, IIS6, 64-bit servers...4 gigs per box (I believe, we may have upped it to 16 gigs when we changed to 64-bit servers).
"memcached would handle distributing it on each box"
memcached does not distribute or replicate a cache to each box in a memcached farm. The memcached client basically hashes the key and chooses a cache server based on that hash. When one of the memcached servers fails you will lose whatever cached items existed on that server; however, the client will recognize the failure and begin writing values to a different server. This being the case, your code needs to account for missing items in the cache and reset them if necessary.
This article discusses the memcached architecture in more detail: How memcached works.
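As a rough illustration of the behaviour described above, here is a small self-contained sketch in Python. It is not a real memcached client: FakeServer is an in-memory stand-in for a memcached node, and HashingClient and get_or_load are hypothetical names. It shows the client hashing the key to pick exactly one server, skipping servers marked as failed, and the calling code treating a missing item as normal and reloading it from the source.

```python
import zlib


class FakeServer:
    """In-memory stand-in for one memcached node (assumption, not a real client)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class HashingClient:
    """Picks one server per key by hashing the key; nothing is replicated."""

    def __init__(self, servers):
        self.servers = servers

    def _pick(self, key):
        live = [s for s in self.servers if s.alive]
        if not live:
            raise RuntimeError("no cache servers available")
        # Simple modulo hashing over the live servers.
        return live[zlib.crc32(key.encode("utf-8")) % len(live)]

    def get(self, key):
        return self._pick(key).data.get(key)

    def set(self, key, value):
        self._pick(key).data[key] = value


def get_or_load(client, key, load):
    """Treat a miss as normal: reload from the source and put it back."""
    value = client.get(key)
    if value is None:
        value = load(key)
        client.set(key, value)
    return value
```

Note that plain modulo hashing, used here for brevity, remaps many keys whenever the set of live servers changes; many real clients use consistent hashing to limit that. Either way, the items held by a failed server are gone and have to be reloaded, which is why get_or_load exists.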
Best practice (according to the memcached site) is to run memcached on the same box as your web server app, or else you're making network calls (which isn't all that bad, but it's not optimal). If you're running a 64-bit app server (which you probably should if you're going to be running memcached), then you can load up each of the servers with loads of memory and it will be available to memcached. There's not much in the way of CPU resources used by memcached, so if your current app server isn't very taxed, it will remain that way.
Haven't used them together, but I've used them both on separate projects.
Last I saw the documentation explicitly said that sharing with the web server was ok.
Memcached really only needs RAM, and if you take your ASP.NET cache out of the equation, how much RAM is your web server actually using? Probably not much. It won't compete much with your web server for CPU and it doesn't need disk at all. You might consider segmenting off the network traffic (if you don't already) from the incoming web requests.
It worked well and was fast; I didn't have any problems with it.
Oh, invalidation was explicit on the project I used it on. Not sure what other modes there are for that.
If you want to get replication across your memcached servers, then it may be worth a look at repcached. It's a patch for memcached that handles the replication part.
Worth checking out Velocity, which is a distributed cache provided by Microsoft. I cannot give you a point-by-point comparison to memcached, but Velocity is integrated with ASP.NET and will continue to get more development and integration.
In particular, what strengths does it have over the caching features of ASP.NET?
memcached is a distributed cache: the whole cache can be spread across multiple boxes. So, for example, you can use memcached to store session data in a cluster environment, so that this data is available to any box in the cluster.
memcached can be compared to Microsoft's Velocity (http://blogs.msdn.com/velocity/).
Another nice feature is that memcached runs as a stand alone service. If you take your application down, the cached data will remain in memory as long as the service runs.
We use memcached as a caching back-end in an ASP.NET web site. We have 12 memcached boxes.
UP for memcached:
Much more scalable, just add boxes with memory to spare
The cache nodes are very ignorant: they have no knowledge of the other participating nodes. This makes the management and configuration of such a system extremely easy.
All of the web servers see the same values in cache (so you never see values hopping around depending on which web server serves your request)
DOWN for memcached:
Compared to an in-memory cache, it is very slow, mostly because of serialization/deserialization and network latency
The cache nodes are very ignorant: there is, for example, no way to iterate over all of the cached items
Memcached is the simplest and fastest tool if you need distributed caching. If you can use an in-process, in-memory cache for your application, that will always be faster. We use a cache manager that will offload certain items to memcached and keep others in local cache.
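The answer does not say how that cache manager decides where an item goes, so the following is only a guess at the general shape, written in Python for brevity. CacheManager, DictStore and the is_shared policy function are hypothetical names; DictStore stands in for a memcached client. Items the policy marks as shared go to the distributed store, so every web server sees the same value; everything else stays in a per-process dictionary.

```python
class DictStore:
    """Stand-in for a memcached client (assumption): just get/set on a dict."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class CacheManager:
    """Routes each key to the shared store or to local memory via a policy."""

    def __init__(self, shared_store, is_shared):
        self._local = {}             # per-process, fastest, not visible elsewhere
        self._shared = shared_store  # distributed, slower, visible to all servers
        self._is_shared = is_shared  # hypothetical policy: key -> bool

    def set(self, key, value):
        if self._is_shared(key):
            self._shared.set(key, value)
        else:
            self._local[key] = value

    def get(self, key):
        if self._is_shared(key):
            return self._shared.get(key)
        return self._local.get(key)
```

A policy such as lambda key: key.startswith("session:") would send session data to the distributed cache while keeping, say, rendered fragments local to each web server.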