Our web app uses in-memory caching (Application Data Caching) to improve throughput, so that frequently queried data does not have to be loaded from the database (SQL Server) on every request. It may eventually be deployed in a web farm, so we have to solve the classic problem of keeping the caches of all nodes in sync. What we need, in other words, is a distributed cache.
Readily available solutions are NCache and Redis (and probably more). However, since we are already using a SignalR backplane to communicate changes to our dataset to a Windows service (and browser clients), I'm wondering if it could be used to implement a distributed cache.
Doing so, we would (more or less) re-use our existing dataset-has-changed messages, but subscribe to them in the web app itself to invalidate its cache. The upside is that we don't have to introduce a new library/technology.
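Roughly, the idea is something like the sketch below, using the classic ASP.NET SignalR client together with System.Runtime.Caching (the hub name, event name, and cache-key convention are placeholders, not our actual message contract):

```csharp
using System;
using Microsoft.AspNet.SignalR.Client;
using System.Runtime.Caching;

public class CacheInvalidationListener
{
    private readonly HubConnection _connection;

    public CacheInvalidationListener(string hubUrl)
    {
        _connection = new HubConnection(hubUrl);
        // "DataHub" / "DatasetChanged" are hypothetical names standing in
        // for our existing dataset-has-changed messages.
        IHubProxy proxy = _connection.CreateHubProxy("DataHub");

        // When any node reports a change, evict the stale entry locally.
        proxy.On<string>("DatasetChanged", cacheKey =>
            MemoryCache.Default.Remove(cacheKey));
    }

    public void Start() => _connection.Start().Wait();
}
```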
I guess my biggest questions are: Does this make sense? Is the SignalR backplane reliable enough to ensure no events get lost, resulting in outdated caches? Or is this architectural misuse?
SignalR is meant for real-time messaging, not for serving as a data store.
In your solution, you would select data on one service and send it to another service over the backplane. Then what? Presumably you would save it in memory. What happens if one of the services restarts? The data is gone. You would never face this problem with Redis. Additionally, you would be consuming local memory for this data.
Also, how would you manage expiration? And you would have to spend real effort implementing this cache system on top of SignalR.
I don't suggest using the SignalR backplane for this. Stick with Redis or similar technologies.
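For comparison, expiration in Redis is built in. A minimal sketch with StackExchange.Redis (key and value are hypothetical):

```csharp
using System;
using StackExchange.Redis;

class ExpirationSketch
{
    static void Main()
    {
        ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("localhost");
        IDatabase db = redis.GetDatabase();

        // Redis expires the key server-side; no hand-rolled eviction logic.
        db.StringSet("customer:42", "{\"name\":\"...\"}", TimeSpan.FromMinutes(5));
    }
}
```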
I am using the Spring Redis cache and wonder if it is possible to set some data cache duration in memory. A cache of the cache. If I know that the data in Redis will not change for 5 minutes, I don't need the Spring Redis cache to touch Redis every time some @Cacheable method is called.
Is Redisson the answer?
AFAICT, Redisson is simply a client-side facade, or enhanced Redis (Java) client, used to interface with a Redis node (or cluster) in a more powerful and convenient way, not unlike Spring Data Redis; for example, and as you already know, using Redis as a caching provider in Spring's Cache Abstraction.
Redis does seem to support client-side caching (a local cache in addition to the remote (server) cache?) when using a Redis client/server topology. This would be transparent to your application (e.g. @Cacheable) and configured in the Redis client driver, AFAIK.
However, given my lack of experience with Redis, or even Redisson for that matter, I cannot speak to this feature in detail. Redis client-side caching may need to be supported by the individual Redis client drivers (e.g. Jedis, Lettuce, even Redisson, etc.).
NOW THE LONG-WINDED ANSWER FOR THE INTERESTED READER:
What you are describing when you say "cache of cache" is really having a "locally available cache" in addition to the "remote, or server-side, cache". This assumes, of course, that you are running Redis in a client/server (not embedded), and possibly distributed/clustered (maybe HA), capacity in the first place.
Ideally, you would choose a caching provider that supported this sort of arrangement out-of-the-box, natively. And, despite popular belief (for example), much of what Redis "reinvented" (horizontal scale-out or cluster, HA, even persistence) already existed in other, more mature solutions, built from the ground up with these concerns in mind.
SIDENOTE: Granted, the referenced article above is dated, but also a bit naive.
A "cache of (a) cache" is technically referred to as the Near Caching pattern.
It is where the "local" (application/client-side) cache mirrors the "remote" (server-side, primary) cache to avoid network hops, i.e. latency, by only accessing the remote cache when necessary (e.g. on a cache miss), preferably in a "single-hop", "fault-tolerant" fashion when the server side is distributed and clustered.
However, a fundamental difference between the local cache and server-side, remote cache is that the local cache only stores a subset of the data from the remote cache based on "interests".
NOTE: In Redis's documentation, this is referred to as "tracking". There are different ways, across different providers, to express "interests" or track what the client has accessed. Be mindful of the different approaches here, since they consume different system resources.
You might have a distributed (Web / Microservice) application architecture where several client application instances serve different demographics or populations of end-users. Clearly, those client application instances might use shared, but different, subsets of the primary dataset stored in the servers. This is where the local cache, and "registering interest" only in the data that matters to, or is used by, the client application, comes into play.
"Registering interest" is important because the server-side, remote cache can then notify clients hosting a local cache ("push", rather than the client "pulling") when data the client is interested in changes on the server, since more than one client might be interested in, and using, the same data (e.g. the same "record", or intersecting data).
So, how do we properly address this concern without unnecessarily introducing extra (layers of) complexity into our system/application architecture?
Well, for one, it starts by choosing the right caching provider for the problem at hand.
DISCLAIMER: My experience stems from Apache Geode, the OSS variant of VMware Tanzu GemFire, and I am responsible for all things Spring for Apache Geode at VMware.
While I am a bit biased here, it is not uncommon for other caching providers (and complete IMDG solutions) to support the same arrangement. For example, one of my personal favorites is Hazelcast.
Hazelcast calls this particular caching arrangement, or topology, an "embedded" cache and even refers to this as "near cache" in the documentation.
The nice thing about a local, embedded "Near Cache" is that it avoids the latency of unnecessary network hops; however, interest registration is key to keeping the data as consistent as possible.
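To make the pattern concrete, here is a minimal, provider-agnostic sketch of the near-cache shape (shown in C#, though the shape is the same in Java; the IRemoteCache interface is hypothetical, standing in for the hooks a real provider such as Geode, Hazelcast, or Redis client-side caching gives you):

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical remote-cache abstraction; a real provider supplies
// equivalents of these hooks (get, plus server-push invalidation).
public interface IRemoteCache
{
    string Get(string key);
    void RegisterInterest(string key, Action onInvalidated);
}

public class NearCache
{
    private readonly ConcurrentDictionary<string, string> _local =
        new ConcurrentDictionary<string, string>();
    private readonly IRemoteCache _remote;

    public NearCache(IRemoteCache remote) { _remote = remote; }

    public string Get(string key)
    {
        // Local hit: no network hop.
        if (_local.TryGetValue(key, out var value)) return value;

        // Miss: fetch remotely, and register interest so the server
        // can push an invalidation when this entry changes.
        value = _remote.Get(key);
        _remote.RegisterInterest(key, () => _local.TryRemove(key, out _));
        _local[key] = value;
        return value;
    }
}
```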
I have documented, talked about, and even demonstrated different caching patterns when using Spring for Apache Geode in the Spring Boot for Apache Geode documentation here: Near Caching in particular, along with the Near Caching Sample (among the Samples for the other caching patterns).
I am sure you can find similar resources with other caching providers, even Redis.
At any rate, this documentation should help you understand different concerns to be aware of (e.g. memory consumption) when choosing any topology and configuration.
Good luck!
I am trying to add caching to a Tornado application, with data in Mongo. I am using Redis as a shared cache store.
Since Tornado is an asynchronous framework, I was thinking about using an async client for Redis that uses Tornado's IOLoop to fetch data from the Redis server. None of the existing solutions are very mature, and I have heard that the throughput of these clients is not good.
So my question is: if I use a synchronous Redis client like pyredis, will it negatively impact the performance of my app?
I mean, considering that the Redis instance lives on the same LAN and the latency of a Redis command is very small, does it matter whether the client is blocking or not?
It's difficult to say for sure without benchmarking the two approaches side by side in your environment, but Redis on a fast network may be fast enough that a synchronous driver wins under normal conditions (or maybe not; I'm not personally familiar with the performance of the different Redis drivers).
The biggest advantage of an asynchronous driver is that it may be able to handle outages of the Redis server or the network more gracefully: while Redis is having problems, the app can do other things that don't depend on Redis. Of course, if your entire site depends on Redis, there may not be much else you can do in that case. This was FriendFeed's philosophy: when we originally wrote Tornado we used synchronous memcache and MySQL drivers, because those services were under our control and we could count on them being fast, but we used asynchronous HTTP clients for external APIs because they were less predictable.
I am interested in the Pub/Sub paradigm in order to provide a notification system (i.e. like Facebook's), especially in a web application which has publishers (several web applications on the same IIS web server) and one or more subscribers in charge of displaying the notifications to the front-end user.
I came across Redis, which seems to be a great server providing interesting features: caching (like Memcached), Pub/Sub, and queues.
Unfortunately, I didn't find any examples in a web context (ASP.NET, with Ajax/jQuery), except for WebSockets and Node.js, but I don't want to use those (too early). I guess I need a process (the subscriber) which receives messages from the publishers, but I don't see how to do that in a web application (pub/sub works fine in unit tests).
EDIT: we currently use .NET (ASP.NET Forms) and are trying out the ServiceStack.Redis library (http://www.servicestack.net/)
Actually, Redis Pub/Sub handles this scenario quite well: as Redis is an async, non-blocking server, it can hold many connections cheaply and scales well.
Salvatore (aka Mr Redis :) describes the O(1) time complexity of Publish and Subscribe operations:
You can consider the work of subscribing/unsubscribing as a constant time operation, O(1) for both subscribing and unsubscribing (actually PSUBSCRIBE does more work than this if you are subscribed already to many patterns with the same client).
...
About memory, it is similar or smaller than the one used by a key, so you should not have problems to subscribe to millions of channels even in a small server.
So Redis is more than capable of, and designed for, this scenario. The problem, as Tom pointed out, is that in order to maintain a persistent connection, users will need long-running connections (aka http-push / long-poll), and each active user will take up its own thread. Holding a thread isn't great for scalability, and technologically you would be better off using a non-blocking HTTP server like Manos de Mono or node.js, which are both async and non-blocking and can handle this scenario. Note: WebSockets is more efficient for real-time notifications over HTTP, so ideally you would use that if the user's browser supports it, and fall back to regular HTTP if it doesn't (or fall back to using Flash for WebSockets on the client).
So it's not Redis or its Pub/Sub that doesn't scale here; the limit is the number of concurrent connections a threaded HTTP server like IIS or Apache can hold. That said, you can still support a fair number of concurrent users with IIS (this post suggests 3000), and since IIS is the bottleneck and not Redis, you can easily add an extra IIS server into the mix and distribute the load.
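Since you mention ServiceStack.Redis, a minimal subscriber with that library looks something like this (channel name hypothetical); note that SubscribeToChannels blocks the calling thread, which is exactly the long-running-connection concern above:

```csharp
using System;
using ServiceStack.Redis;

class PubSubSketch
{
    static void Main()
    {
        // Subscriber: SubscribeToChannels blocks this thread for the
        // lifetime of the subscription.
        using (var redis = new RedisClient("localhost"))
        using (var subscription = redis.CreateSubscription())
        {
            subscription.OnMessage = (channel, msg) =>
                Console.WriteLine("Received '{0}' on '{1}'", msg, channel);
            subscription.SubscribeToChannels("notifications"); // blocks
        }

        // Publisher (from another connection/process):
        // new RedisClient("localhost").PublishMessage("notifications", "hello");
    }
}
```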
For this application, I would strongly suggest using SignalR, which is a .Net framework that enables real-time push to connected clients.
Redis publish/subscribe is not designed for this scenario: it requires a persistent connection to Redis, which you have if you are writing a worker process, but not when you are working with stateless web requests.
A publish/subscribe system that works for end users over HTTP takes a little more work, but not too much. The simplest approach is to use a sorted set for each channel and record the time a user last got notifications. You could also do it with a list recording the subscribers for each channel and write to the inbox list of each of those users whenever a notification is added.
With either of those methods a user can retrieve their new notifications very quickly. It will be a form of polling rather than true push notifications, but you aren't really going to get away from that due to the nature of HTTP.
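A sketch of the sorted-set variant, here using StackExchange.Redis (the key naming and the caller-supplied last-seen timestamp are assumptions):

```csharp
using System;
using StackExchange.Redis;

class NotificationStore
{
    private readonly IDatabase _db;
    public NotificationStore(IDatabase db) { _db = db; }

    // Publisher: score each notification by its timestamp.
    public void Publish(string channel, string message) =>
        _db.SortedSetAdd("channel:" + channel, message,
            DateTimeOffset.UtcNow.ToUnixTimeSeconds());

    // Subscriber: poll for anything newer than the last time we checked.
    public RedisValue[] GetSince(string channel, long lastSeenUnixSeconds) =>
        _db.SortedSetRangeByScore("channel:" + channel,
            start: lastSeenUnixSeconds, exclude: Exclude.Start);
}
```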
Technically you could use Redis pub/sub with long-running HTTP connections, but if every user needs their own thread with active Redis and HTTP connections, scalability won't be very good.
Our client's requirement is to develop a WCF service which can withstand 1-2k concurrent website users, with a response time of around 25 milliseconds.
This service reads a couple of columns from the database and will be consumed by different vendors.
Can you suggest an architecture, or any extra steps I need to take while developing? And how do we calculate the server hardware configuration needed to cope with this?
Thanks in advance.
Hardly possible. You need a network connection to the service, service activation, business-logic processing, a database connection (another network connection), and a database query. Because of the 2,000 concurrent users you need several application servers, so the network connection also goes through a load balancer. I can't imagine a network and hardware infrastructure able to complete such an operation within 25 ms for 2,000 concurrent users. Such a requirement is not realistic.
I guess that if you simply try to run the database query from your computer against the remote DB, you will see that even such a simple task will not complete in 25 ms.
A few principles:
Test early, test often.
Successful systems get more traffic
Reliability is usually important
Caching is often a key to performance
To elaborate: build a simple system right now. Even if the business logic is very simplified, if it's a web service plus database access, you can performance-test it. Test with one user. What do you see? Where does the time go? As you develop the system, adding in real code, keep repeating that test. Reasons: a) right now you find out whether 25 ms is even achievable; b) you immediately spot any code change that hurts performance. Now test with lots of users: what degradation patterns do you hit? This starts to give you an indication of your platform's capabilities.
I suspect that the outcome will be that a single machine won't cut it for you. And even if it will, if you're successful you get more traffic. So plan to use more than one server.
And anyway for reliability reasons you need more than one server. And all sorts of interesting implementation details fall out when you can't assume a single server - eg. you don't have Singletons any more ;-)
Most times we get good performance by using a cache. Will many users ask for the same data? Can you cache it? Are there updates to consider? In that case, do you need a distributed cache system with clustered invalidation? That multi-server case emerging again.
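As a sketch of that caching idea in .NET (the key, loader, and TTL are placeholders, and this naive version doesn't guard against two threads loading the same key at once):

```csharp
using System;
using System.Runtime.Caching;

public static class LookupCache
{
    // Read-through helper: serve repeated requests for the same data
    // from memory and only hit the database on a miss.
    public static T GetOrAdd<T>(string key, Func<T> loadFromDb,
        TimeSpan ttl) where T : class
    {
        var cached = MemoryCache.Default.Get(key) as T;
        if (cached != null) return cached;

        var value = loadFromDb(); // the expensive DB query
        MemoryCache.Default.Set(key, value,
            DateTimeOffset.UtcNow.Add(ttl));
        return value;
    }
}
```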
Why do you need WCF?
Could you shift as much of that service as possible into static serving and cache lookups?
If I understand your question, thousands of users will be hitting your website and executing queries on your DB. You should definitely look into connection pooling for your WCF connections, but your best bet will be to avoid DB lookups altogether and have your website return data from cache hits.
I'd also look into why you couldn't just connect directly to the database for your lookups: do you actually need a WCF service in the way at all?
Look into Memcached.
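For example, with the Enyim memcached client (key name hypothetical; server addresses come from the enyim.com/memcached app.config section):

```csharp
using System;
using Enyim.Caching;
using Enyim.Caching.Memcached;

class MemcachedSketch
{
    static void Main()
    {
        // Reads the memcached server list from app.config.
        using (var client = new MemcachedClient())
        {
            client.Store(StoreMode.Set, "vendor:lookup:42", "cached-row-data");
            var value = client.Get<string>("vendor:lookup:42");
            Console.WriteLine(value);
        }
    }
}
```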
I am building an ASP.NET website which will collect data from a user and submit it to a 3rd-party webservice. The webservice is somewhat unreliable, and for this reason there is a backup service.
If a call to the primary service fails (timeout or some other error) then I need to flip a bit in a static class which will trip the system to use the secondary service.
At this point, I need to start polling the primary service (with dummy data) to see if it is back up (at which point I will receive an OK code in return). At this point I need to flip the bit back so that the website starts using the primary service again.
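Roughly, the flip-the-bit part might look like this (all names are hypothetical; where the polling loop should live is exactly my question):

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical static switch shared by the website code.
public static class ServiceSelector
{
    private static int _usePrimary = 1;

    public static bool UsePrimary =>
        Interlocked.CompareExchange(ref _usePrimary, 0, 0) == 1;
    public static void TripToSecondary() => Interlocked.Exchange(ref _usePrimary, 0);
    public static void RestorePrimary() => Interlocked.Exchange(ref _usePrimary, 1);
}

public class PrimaryServicePoller
{
    private readonly HttpClient _http = new HttpClient();

    // Poll the primary (with dummy data) until it answers OK again.
    public async Task PollUntilHealthyAsync(string pingUrl, TimeSpan interval)
    {
        while (true)
        {
            try
            {
                var response = await _http.GetAsync(pingUrl);
                if (response.IsSuccessStatusCode)
                {
                    ServiceSelector.RestorePrimary();
                    return;
                }
            }
            catch (HttpRequestException) { /* still down, keep polling */ }
            await Task.Delay(interval);
        }
    }
}
```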
I've had a read of Should I use a Windows Service or an ASP.NET Background Thread? and I think that separating the code out into a Windows service would be the cleanest way of performing the polling, but then how would I communicate with the web application?
One thought I've had is to expose a web service that the Windows service could use to communicate with the web app, but this seems both messy and overkill.
I'd appreciate your thoughts and experiences performing similar tasks.
Thanks
I think the Windows service is the way to go, definitely.
As for the communication between the service and your web site, the best answer depends on the size and scale of your solution. If you are building something that needs to be reliable, I'd suggest you implement some sort of queue between your ASP.NET site and your Windows service. You have a lot of options here too, depending on budget and ability: BizTalk, MSMQ, and SQL Server Service Broker queues. Alternatively, if you are looking for something smaller-scale, I'd recommend you just stick the messages in a database table somewhere.
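A sketch of the MSMQ option (the queue path is hypothetical):

```csharp
using System.Messaging;

class QueueSketch
{
    const string QueuePath = @".\Private$\serviceStatus"; // hypothetical

    // Website side: enqueue a status-change message.
    static void Send(string message)
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath);
        using (var queue = new MessageQueue(QueuePath))
            queue.Send(message);
    }

    // Windows-service side: block until the next message arrives.
    static string Receive()
    {
        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            return (string)queue.Receive().Body;
        }
    }
}
```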
I would avoid using files on the file system because you will encounter issues with file locks and multithreading. I would also avoid directly communicating with the service because you risk losing the in-memory queue if the service fails for any reason.
Edited to add:
If reliability isn't a concern here, you could use a WCF named-pipes-hosted service for communication between your website and your Windows service. This avoids much of the overhead normally involved in classic web services and is surprisingly quick. The only downside is that self-hosting a WCF service is tricky, and it can be difficult to keep the service up.
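A minimal self-hosted named-pipe endpoint might look like the following sketch (the contract and address are hypothetical):

```csharp
using System;
using System.ServiceModel;

[ServiceContract]
public interface IStatusService
{
    [OperationContract]
    void SetPrimaryAvailable(bool available);
}

public class StatusService : IStatusService
{
    public void SetPrimaryAvailable(bool available)
    {
        Console.WriteLine("Primary available: " + available);
    }
}

class Host
{
    static void Main()
    {
        // Self-host inside the Windows service; the website connects with
        // a ChannelFactory<IStatusService> over the same named pipe.
        using (var host = new ServiceHost(typeof(StatusService)))
        {
            host.AddServiceEndpoint(typeof(IStatusService),
                new NetNamedPipeBinding(),
                "net.pipe://localhost/serviceStatus");
            host.Open();
            Console.ReadLine(); // keep the host alive
        }
    }
}
```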