When to use a distributed hash table instead of a "traditional" hash table?

Can anyone give me some intuitive examples? I have seen a bunch of notes but still could not get the "point" and advantage of "distributed hash table" compared to a simple traditional hash table. Thanks!

There are a number of advantages that you can get over a traditional hashtable, when using a distributed cache:
Distributed cache will be out of process. Data will remain cached even if user application restarts; traditional hashtable will be disposed with application restart
Distributed cache can be shared among multiple applications, data cached by one application will be available to all others; traditional hashtable will be local to the process only
Distributed cache provides scalability, i.e. adding more servers adds more memory (RAM) for the distributed hashtable, whereas a local hashtable can only use the local process's memory
Distributed caching solutions provide extra features such as replication for fault tolerance, expiration, eviction and dependencies, which help the user make better use of caching than a plain hashtable does
Several solutions like NCache also provide SQL-like queries to be used on in-memory data in distributed cache
You can look into Iqbal Khan's article on MSDN about Distributed Caching On The Path To Scalability for further understanding of need of distributed cache.
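To make the scalability point concrete, here is a minimal sketch of the idea behind a distributed hash table: the key is hashed to pick which server holds it, so each added server contributes its own memory. This is an illustration only. `ShardedCache` and the node names are hypothetical, and the "servers" are plain dictionaries in one process rather than real network nodes.

```python
import hashlib


class ShardedCache:
    """Toy distributed hash table: each 'node' is a dict standing in for a cache server."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self._names = sorted(self.nodes)

    def _node_for(self, key):
        # Use a stable hash (not Python's per-process randomized hash())
        # so that every client would route the same key to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._names)
        return self._names[index]

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self._node_for(key)].get(key, default)


cache = ShardedCache(["node-a", "node-b", "node-c"])
for i in range(1000):
    cache.put(f"user:{i}", {"id": i})

# How many entries ended up on each simulated server.
sizes = {name: len(store) for name, store in cache.nodes.items()}
```

Simple modulo routing like this remaps most keys when the number of nodes changes, which is why real products typically use consistent hashing or a partition map instead.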

Related

Why should I use IMemoryCache when we have IDistributedCache?

.NET Core provides in-memory implementations for both interfaces (MemoryCache and DistributedMemoryCache), but let's assume we have a working IDistributedCache implementation for our application.
When does it still make sense to use IMemoryCache? In what scenarios is it helpful or preferred over caching data in a distributed cache?
I was searching for the same thing and found the answer in a GitHub issue:
They have fundamentally different semantics. MemoryCache can store live objects, the distributed cache can't, objects have to be serialized. The distributed cache can be off box and calls to it may fail or take a long time so getting and setting should be async, the MemoryCache is always in memory and fast. The distributed cache can be disconnected from the store so the interface should account for that.
https://github.com/aspnet/Caching/issues/220#issuecomment-241229013
By design, the IMemoryCache interface is used when you need a data caching mechanism inside a single process on one app server.
In short: an in-process caching mechanism.
The IDistributedCache interface, by contrast, is designed for a distributed caching mechanism, where cached data is shared across many app servers (a web farm).
In short: the web farm data caching scenario.
Hope this helps.
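The "live objects versus serialized objects" difference quoted above can be sketched as follows. This is not the .NET API. `InProcessCache` and `SerializingCache` are hypothetical stand-ins, and JSON is assumed as the wire format purely for illustration.

```python
import json


class InProcessCache:
    """Keeps references to live objects, like an in-memory cache."""

    def __init__(self):
        self._items = {}

    def set(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


class SerializingCache:
    """Stand-in for an out-of-process cache: only bytes cross the boundary."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = json.dumps(value).encode("utf-8")

    def get(self, key):
        raw = self._store.get(key)
        return None if raw is None else json.loads(raw.decode("utf-8"))


profile = {"name": "Ada", "roles": ["admin"]}

local = InProcessCache()
local.set("profile", profile)
remote = SerializingCache()
remote.set("profile", profile)

# Mutate the original object after caching it.
profile["roles"].append("auditor")

local_view = local.get("profile")    # same object, so it sees the mutation
remote_view = remote.get("profile")  # snapshot taken at set() time
```

The in-process cache hands back the very same object, while the serializing cache returns a copy frozen at the moment it was stored, and only for types that can be serialized at all.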

NoSQL and AppFabric with Azure

I have an ASP.net application that I'm moving to Azure. In the application, there's a query that joins 9 tables to produce a user record. Each record is then serialized in json and sent back and forth with the client. To increase query performance, the first time the 9 queries run and the record is serialized in json, the resulting string is saved to a table called JsonUserCache. The table only has 2 columns: JsonUserRecordID (that's unique) and JsonRecord. Each time a user record is requested from the client, the JsonUserCache table is queried first to avoid having to do the query with the 9 joins. When the user logs off, the records he created in the JsonUserCache are deleted.
The table JsonUserCache is SQL Server. I could simply leave everything as is but I'm wondering if there's a better way. I'm thinking about creating a simple dictionary that'll store the key/values and put that dictionary in AppFabric. I'm also considering using a NoSQL provider and if there's an option for Azure or if I should just stick to a dictionary in AppFabric. Or, is there another alternative?
Thanks for your suggestions.
"There are only two hard problems in Computer Science: cache invalidation and naming things."
Phil Karlton
You are clearly talking about a cache and as a general principle, you should not persist any cached data (in SQL or anywhere else) as you have the problem of expiring the cache and having to do the deletes (as you currently are). If you insist on storing your result somewhere and don't mind the clearing up afterwards, then look at putting it in an Azure blob - this is easily accessible from the browser and doesn't require that the request be handled by your own application.
To implement it as a traditional cache, look at these options.
Use out of the box ASP.NET caching, where you cache in memory on the web role. This means that your join will be re-run on every instance that the user goes to, but depending on the number of instances and the duration of the average session may be the simplest to implement.
Use AppFabric Cache. This is an extra API to learn and has additional costs which may get quite high if you have lots of unique visitors.
Use a specialised distributed cache such as Memcached. This has the added cost/hassle of having to run it all yourself, but gives you lots of flexibility in the long run.
Edit: All are RAM based. Using ASP.NET caching is simpler to implement and is faster to retrieve the data from cache because it is on the same machine - BUT requires the cache to be populated for each instance of the web role (i.e. it is not distributed). AppFabric caching is distributed but is also a bit slower (network latency) and, depending what you mean by scalable, AppFabric caching currently behaves a bit erratically at scale - so make sure you run tests. If you want scalable, feature rich distributed caching, and it is a big part of your application, go and put in Memcached.
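Whichever option is chosen, the explicit "delete on log off" step from the question can be replaced by expiration. Below is a minimal sketch of an in-memory cache with a time-to-live. It assumes a 60 second lifetime and uses the hypothetical names `ExpiringCache`, `get_user_json` and `expensive_join`. None of this is an AppFabric, Memcached or ASP.NET API.

```python
import time


class ExpiringCache:
    """In-memory cache whose entries silently expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the demo can control time
        self._items = {}             # key -> (expires_at, value)

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]     # lazy cleanup, no separate delete pass
            return None
        return value


def get_user_json(user_id, cache, build_record):
    """Return the cached JSON string, rebuilding it only on a miss."""
    key = f"user-json:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    record = build_record(user_id)
    cache.set(key, record)
    return record


# Demo with a fake clock and a stand-in for the nine-table join.
now = [0.0]
calls = []


def expensive_join(user_id):
    calls.append(user_id)
    return '{"id": %d}' % user_id


cache = ExpiringCache(ttl_seconds=60, clock=lambda: now[0])
first = get_user_json(7, cache, expensive_join)
second = get_user_json(7, cache, expensive_join)   # served from cache
now[0] = 61.0                                      # entry has expired
third = get_user_json(7, cache, expensive_join)    # join runs again
```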

Caching large amounts of data

I have been reading that lots of people use Redis or another key-value store/NoSQL solution as a distributed cache for their website.
Maybe I'm not understanding completely, but it seems a solution like this only works for shared data. For example, if I have a website that requires a user to log-in and the queries they generate return data specific to only that user (in my case, banking/asset information) that can't be cached for all users, this type of solution doesn't work.
Unfortunately, the database is shared across all our applications and when it gets bogged down, the website gets bogged down as well. Since each user has gigabytes of information, I obviously can't cache all of that, and each web page queries completely different information.
Is there some caching strategy that I can employ for this type of scenario?
A distributed cache like Velocity doesn't require that the data it stores be limited to "shared" data. But you do have to read the data from your DB and store it in the cache, which takes time.
A few alternatives:
Partition your data, so it's spread out among several DB servers
Add as much RAM as you can to each DB server, to allow SQL Server to cache what it can
There are many variations on the partitioning theme.
Is your web app load balanced? There are caching options at the web tier as well -- the ASP.NET object cache is a good place to start.
It's possible that your web clients are requesting the same data more than once (for a given user). So caching could give a benefit in that case.
But before you go implementing a huge caching solution, you really need to look at the queries that are particularly slow or executed a huge number of times and see if you can optimize them in any way.
Then look at upgrading your DB machine.
I read a nice article about the performance issues that MySpace had when they had a huge growth.
You can find the article here.
One quote from the article that stands out:
The addition of the cache servers is "something we should have done from the beginning, but we were growing too fast and didn't have time to sit down and do it," Benedetto adds.
If the problem is in your database server, think about partitioning your data and using a database farm to spread the load. Also think about SSDs! They can really speed up your database access.
Depending on how dynamic your data is, you could consider using Fragment Caching. This caches the HTML of the page rather than the data, so if the volume of data is prohibitive to cache then this might work for you.

ASP.NET Page.Cache versus Page.Application storage for data synchronization?

Both Page.Cache and Page.Application can store an application's "global" data, shared among requests and threads.
How should one storage area be chosen over the other considering scenarios of data synchronization in the multi-threaded ASP.NET environment?
Looking for best practice and experienced recommendation.
If the data
is stable during the life of the application
must always be available and must not be purged
then it is better to store it in HttpApplicationState.
If the data
is not necessarily needed for the life of the application
changes frequently
can be purged if needed (for example low system memory)
can be discarded if seldom used
should be invalidated/refreshed under some conditions (dependency rule: time span, date, file timestamp, ...)
then use Cache.
Other important points:
Large amounts of data may better be stored in Cache, the server then can purge it if low on memory.
Cache is safe for multithreaded operations. Page.Application needs locking.
See also this article on etutorials.org for more details.
You typically would store the data in Page.Application Items Collection when you need it within the same request. Page.Cache is typically used in data caching scenarios when you want to use it across multiple requests.
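The "needs locking" point is about read-modify-write sequences on shared state: without a lock, two threads can read the same old value and one update is lost. A minimal sketch of the pattern, with a hypothetical `AppState` class playing the role that `Application.Lock()` / `Application.UnLock()` play around application state in ASP.NET:

```python
import threading


class AppState:
    """Application-wide state; every read-modify-write runs under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def increment(self, key, amount=1):
        # Read, add and write back as one atomic step.
        with self._lock:
            self._values[key] = self._values.get(key, 0) + amount
            return self._values[key]

    def get(self, key, default=None):
        with self._lock:
            return self._values.get(key, default)


state = AppState()


def worker():
    for _ in range(1000):
        state.increment("hits")


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

total = state.get("hits")
```

Because every update goes through the lock, no increments are lost regardless of how the threads interleave.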

Custom caching in ASP.NET

I want to cache custom data in an ASP.NET application. I am putting lots of data into it, such as List<objects>, and other objects.
Is there a best practice for this? If I use static data, the cache will need to be filled again whenever w3wp.exe dies or gets recycled.
The database is also getting updated by other applications, so a thread would be needed to make sure it is on the latest data.
Update 1:
Just found this, which probably helps me:
http://www.codeproject.com/KB/web-cache/cachemanagementinaspnet.aspx?fid=229034&df=90&mpp=25&noise=3&sort=Position&view=Quick&select=2818135#xx2818135xx
Update 2:
I am using DotNetNuke as the application, ( :( ). I have enabled persistent caching and now the whole application feels sluggish.
For example, a MultiView takes about 3 seconds to swap views.
Update 3:
Strategies for Caching on the Web?
Linked to this, I am using the DotNetNuke caching method, which in turn uses the ASP.NET Cache object, it also has file based caching.
I have a helper:
CachingProvider.Instance().Add( _
(label & "|") + key, _
newObject, _
Nothing, _
Cache.NoAbsoluteExpiration, _
Cache.NoSlidingExpiration, _
CacheItemPriority.NotRemovable, _
Nothing)
The helper runs that call to add objects to the cache. Is this correct? I want to keep them cached as long as possible. I have a thread which runs every x minutes and updates the cache. But I have noticed that the cache is getting emptied; I check for an object "CacheFilled" in the cache.
As a test I've told the worker process not to recycle, etc., but still it seems to clear out the cache. I have also changed the DotNetNuke settings from "heavy" to "light" but think that is for module caching.
You are looking for either out of process caching or a distributed caching system of some sort, based upon your requirements. I recommend distributed caching, because it is very scalable and is dedicated to caching. Someone else had recommended Velocity, which we have been evaluating and thoroughly enjoying. We have written several caching providers that we can interchange while we are evaluating different distributed caching systems without having to rebuild. This will come in handy when we are load testing the various systems as part of the final evaluation.
In the past, our legacy application has been a random assortment of cached items. There have been DataTables, DataViews, Hashtables, Arrays, etc. and there was no logic to what was used at any given time. We have started to move to just caching our domain object (which are POCOs) collections. Using generic collections is nice, because we know that everything is stored the same way. It is very simple to run LINQ operations on them and if we need a specialized "view" to be stored, the system is efficient enough to where we can store a specific collection of objects.
We also have put an abstraction layer in place that pretty much brokers calls between either the DAL or the caching model. Calls through this layer will check for a cache miss or cache hit. If there is a hit, it will return from the cache. If there is a miss, and the call should be cached, it will attempt to cache the data after retrieving it. The immediate benefit of this system is that in the event of a hardware or software failure on the machines dedicated to caching, we are still able to retrieve data from the database without having a true outage. Of course, the site will perform slower in this case.
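The brokering layer described above is essentially the cache-aside pattern with a fallback path. The sketch below shows one way it could look. It is not the poster's actual code: `DataBroker`, `CacheUnavailable`, `DictCache` and `DownCache` are hypothetical names, and the database is simulated by a function.

```python
class CacheUnavailable(Exception):
    """Raised by a cache client when the cache tier cannot be reached."""


class DataBroker:
    """Sits between callers and the DAL; tries the cache first when allowed."""

    def __init__(self, cache, load_from_db):
        self._cache = cache
        self._load = load_from_db

    def get(self, key, cacheable=True):
        if cacheable:
            try:
                hit = self._cache.get(key)
                if hit is not None:
                    return hit           # cache hit
            except CacheUnavailable:
                cacheable = False        # cache is down: skip it for this call
        value = self._load(key)          # cache miss or cache outage
        if cacheable:
            try:
                self._cache.set(key, value)
            except CacheUnavailable:
                pass                     # serving the data matters more than caching it
        return value


class DictCache:
    def __init__(self):
        self.items = {}

    def get(self, key):
        return self.items.get(key)

    def set(self, key, value):
        self.items[key] = value


class DownCache:
    """Simulates a failed cache server."""

    def get(self, key):
        raise CacheUnavailable()

    def set(self, key, value):
        raise CacheUnavailable()


db_reads = []


def load_from_db(key):
    db_reads.append(key)
    return f"row-for-{key}"


healthy = DataBroker(DictCache(), load_from_db)
a = healthy.get("42")    # miss: reads the database, then caches
b = healthy.get("42")    # hit: no database read

degraded = DataBroker(DownCache(), load_from_db)
c = degraded.get("42")   # cache outage: falls back to the database
```

With the cache down, every call becomes a database read, which matches the "slower but no outage" behaviour described.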
Another thing to consider, in regards to distributed caching systems, is that since they are out of process, you can have multiple applications use the same cache. There are some interesting possibilities there, involving sharing database between applications, real-time manipulation of data, etc.
Also have a look at the Microsoft Enterprise Library Caching Application Block, which allows you to write a custom expiration policy, custom store, etc.
http://msdn.microsoft.com/en-us/library/cc309502.aspx
You can also check "Velocity" which is available at
http://code.msdn.microsoft.com/velocity
This will be useful if you wish to scale your application across servers...
There are lots of articles about the Cache object in ASP.NET and how to make it use SqlDependencies and other types of cache expirations. No need to write your own. And using the Cache is recommended over session or any of the other collections people used to cram lots of data into.
Cache and Session can lead to sluggish behaviour, but sometimes they're the right solutions: the rule of right tool for right job applies.
Personally I've often created collections in pseudo-static singletons for the kind of role you describe (typically to avoid I/O overheads like storing a compiled xslttransform), but it's very important to keep in mind that that kind of cache is fragile, and design for it to A). filewatch or otherwise monitor what it's supposed to cache where appropriate and B). recreate/populate itself with use - it should expect to get flushed frequently.
Essentially I recommend it as a performance crutch, but don't rely on it for anything requiring real persistence.
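A rough sketch of points A) and B) above: a cached value that watches the file it was built from and rebuilds itself on demand, so being flushed is harmless. `FileBackedValue` is a hypothetical name, and the modification timestamp check is an assumed stand-in for a real file watcher.

```python
import os
import tempfile


class FileBackedValue:
    """Caches the parsed contents of a file and reloads when the file changes."""

    def __init__(self, path, parse):
        self._path = path
        self._parse = parse
        self._stamp = None
        self._value = None
        self.loads = 0               # counts how often the file was actually parsed

    def flush(self):
        # Simulates the cache being emptied, e.g. by a worker process recycle.
        self._stamp = None
        self._value = None

    def get(self):
        stamp = os.stat(self._path).st_mtime_ns
        if stamp != self._stamp:     # first use, flushed, or file modified
            with open(self._path, encoding="utf-8") as handle:
                self._value = self._parse(handle.read())
            self._stamp = stamp
            self.loads += 1
        return self._value


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "settings.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("a,b")

    cached = FileBackedValue(path, lambda text: text.split(","))
    first = cached.get()             # parses the file
    again = cached.get()             # served from memory

    with open(path, "w", encoding="utf-8") as handle:
        handle.write("a,b,c")
    # Push the timestamp forward so the demo does not depend on clock granularity.
    info = os.stat(path)
    os.utime(path, ns=(info.st_atime_ns, info.st_mtime_ns + 2_000_000_000))

    changed = cached.get()           # file changed: parsed again
    cached.flush()
    rebuilt = cached.get()           # flushed: repopulates itself on use
    load_count = cached.loads
```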

Resources