I am trying to add a cache to a Tornado application whose data lives in MongoDB, using Redis as a shared cache store.
Since Tornado is an asynchronous framework, I was thinking about using an async Redis client that uses Tornado's IOLoop to fetch data from the Redis server. However, none of the existing solutions are very mature, and I have heard that the throughput of these clients is not good.
So my question is, if I use a synchronous Redis client like pyredis, will it negatively impact the performance of my app?
I mean, considering that the Redis instance lives on the same LAN and the latency of a Redis command is very small, does it matter whether the call is blocking or not?
It's difficult to say for sure without benchmarking the two approaches side by side in your environment, but Redis on a fast network may be fast enough that a synchronous driver wins under normal conditions (or maybe not; I'm not personally familiar with the performance of the different Redis drivers).
The biggest advantage of an asynchronous driver is that it may be able to handle outages of the redis server or the network more gracefully. While redis is having problems, it will be able to do other things that don't depend on redis. Of course, if your entire site depends on redis there may not be much else you can do in this case. This was FriendFeed's philosophy. When we originally wrote Tornado we used synchronous memcache and mysql drivers because those services were under our control and we could count on them being fast, but we used asynchronous HTTP clients for external APIs because they were less predictable.
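For illustration only, here is a minimal sketch of both options in a Tornado handler using the synchronous redis-py client (the handler, URL pattern, and connection settings are invented, not taken from the question):

```python
import redis
from tornado import ioloop, web

# Shared synchronous redis-py client (hypothetical connection settings).
sync_client = redis.Redis(host="localhost", port=6379)

class CachedHandler(web.RequestHandler):
    async def get(self, key):
        # Option 1: call Redis inline.  Simple, and fast on a healthy LAN,
        # but every request in this process stalls if Redis hangs.
        # value = sync_client.get(key)

        # Option 2: run the same blocking call on a thread pool so the
        # IOLoop keeps serving other requests while this one waits.
        value = await ioloop.IOLoop.current().run_in_executor(
            None, sync_client.get, key)
        self.write(value if value is not None else b"cache miss")

if __name__ == "__main__":
    web.Application([(r"/cache/(.*)", CachedHandler)]).listen(8888)
    ioloop.IOLoop.current().start()
```

The thread-pool variant keeps the IOLoop responsive at the cost of one worker thread per in-flight Redis call, which is roughly the trade-off described above.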
I am using the Spring Redis cache and wondering whether it is possible to set a cache duration for some data in memory as well, a cache of the cache. If I know that the data in Redis will not change for 5 minutes, I don't need the Spring Redis cache to touch Redis every time some @Cacheable method is called.
Is Redisson the answer?
AFAICT, Redisson is simply a client-side facade, or enhanced Redis (Java) client, used to interface with a Redis node (or cluster) in a more powerful and convenient way, not unlike Spring Data Redis; for example, and as you already know, using Redis as a caching provider in Spring's Cache Abstraction.
Redis does seem to support client-side caching (a local cache in addition to the remote (server) cache?) when using a Redis client/server topology. This would be transparent to your application (e.g. @Cacheable) and configured in the Redis client driver, AFAIK.
However, given my lack of experience with Redis, or even Redisson for that matter, I cannot speak to this feature in detail. Redis client-side caching may need to be supported by the individual Redis client drivers (e.g. Jedis, Lettuce, even Redisson, etc.).
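The thread is about Spring and Java, but the underlying idea is small enough to sketch. Here is a rough, illustrative Python version (the key names, connection settings, and the 5-minute TTL are all invented) of a local cache that only touches Redis once the local entry has expired, which is essentially what a "cache of the cache" does:

```python
import time
import redis

remote = redis.Redis(host="localhost", port=6379)  # the shared, server-side cache
LOCAL_TTL = 300    # "the data will not change for 5 minutes"
_local = {}        # key -> (value, expires_at), held in process memory

def get(key):
    entry = _local.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                             # served locally, no network hop
    value = remote.get(key)                         # fall back to the remote cache
    _local[key] = (value, time.time() + LOCAL_TTL)  # remember it locally for a while
    return value
```

In the Java world, a local caching library (Caffeine, for instance) or the near-cache support discussed below gives you the same layer with eviction and size limits handled for you.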
NOW THE LONG-WINDED ANSWER FOR THE INTERESTED READER:
What you are describing when you say "cache of cache" is really having a "locally available cache" in addition to the "remote, or server-side, cache". This assumes, of course, that you are running Redis in a client/server (not embedded), and possibly distributed/clustered (maybe HA), capacity in the first place.
Ideally, you would choose a caching provider that supported this sort of arrangement out-of-the-box, natively. And, despite popular belief (for example), much of what Redis "reinvented" (horizontal scale-out or cluster, HA, even persistence) already existed in other, more mature solutions, built from the ground up with these concerns in mind.
SIDENOTE: Granted, the referenced article above is dated, but also a bit naive.
A "cache of (a) cache" is technically referred to as the Near Caching pattern.
It is where the "local" (application/client-side) cache mirrors the "remote" (server-side, primary) cache to avoid network hops, i.e. latency, by only accessing the remote cache when necessary (e.g. on a cache miss), preferably in a "single-hop", "fault-tolerant" fashion when the server side is distributed and clustered.
However, a fundamental difference between the local cache and server-side, remote cache is that the local cache only stores a subset of the data from the remote cache based on "interests".
NOTE: In Redis's documentation, they referred to this as "tracking". There are different ways, across different providers, to express "interests" or track what the client has accessed. Be mindful of the different approaches here since they consume different system resources.
You might have a distributed (Web / Microservice) application architecture where several client application instances serve different demographics or populations of end-users. Clearly, those client application instances might use shared, but different subsets of the primary dataset stored in the servers. This is where the local cache and "registering interest" only in the data that matters to, or is used by, the client application comes into play.
"Registering interest" is important since the server-side, remote cache can notify clients ("push", rather than a client "pulling") hosting a local cache when data on the server changes that a client is interested in since more than 1 client might have interest in and use the same data (e.g. "record", and the intersection of data).
So, how do we properly address this concern without unnecessarily introducing extra (layers of) complexity into our system/application architecture?
Well, for one, it starts by choosing the right caching provider for the problem at hand.
DISCLAIMER: My experience stems from Apache Geode, which is the OSS variant of VMware Tanzu GemFire, and I am responsible for all things Spring for Apache Geode at VMware.
While I am a bit biased here, it is not uncommon for other caching providers (and complete IMDG solutions) to support the same arrangement. For example, one of my personal favorites is Hazelcast.
Hazelcast calls this particular caching arrangement, or topology, an "embedded" cache and even refers to this as "near cache" in the documentation.
The nice thing about a local, embedded "Near Cache" is that it avoids the latency of unnecessary network hops; however, interest registration is key to keeping the data as consistent as possible.
I have documented, talked about, and even demonstrated different caching patterns when using Spring for Apache Geode in the Spring Boot for Apache Geode documentation here, and Near Caching in particular, along with the Near Caching Sample (alongside the other caching patterns in the Samples).
I am sure you can find similar resources with other caching providers, even Redis.
At any rate, this documentation should help you understand different concerns to be aware of (e.g. memory consumption) when choosing any topology and configuration.
Good luck!
I have a situation where I host a high-RPS, highly available service that receives requests, aka commands. These commands have to be sent to N downstream clients, who actually execute them. Each downstream client is a separate microservice and has different constraints, such as mode (sync/async), execution cadence, etc.
Should a slow downstream client build the logic to receive all requests and execute them in batches as it sees fit? Or should my service build the logic to talk to slow and fast clients by maintaining state for commands across downstream clients? Share your opinions.
Not enough info to give any prescriptive advice, but I'd start by dividing the tasks into async and sync first. Those are two completely different workloads that, most likely, would require different implementation stacks. I'll give you an idea of what you can start with in the world of AWS...
Not knowing what you mean by async, I'd default to a message-bus setup. In that case you can use something like Amazon Kinesis or Kafka for ingestion purposes, kicking off a Lambda function or an EC2 instance to do the work. If the clients need to be notified of a finished job they can either long-poll an SQS queue, subscribe to an SNS topic, or use MQTT over WebSockets for a long-running connection.
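As a purely illustrative sketch of that message-bus shape, here is what the decoupling could look like with SQS and boto3 (the queue URL, message format, and execute stub are all hypothetical; in practice the ingestion side might be Kinesis or Kafka as suggested above):

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/commands"  # hypothetical

def submit_command(command: dict):
    # The high-RPS front-end service just enqueues and returns immediately.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(command))

def execute(command: dict):
    ...  # client-specific handling goes here

def drain_batch():
    # A slow downstream client pulls up to 10 commands at its own cadence.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        execute(json.loads(msg["Body"]))
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

The point of the sketch is the shape: the producer never waits on a slow client, and each client controls its own batch size and polling rate.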
The sync tasks are easier, since it's all about processing power. Just make sure you have your EC2 instances in an auto-scaling group behind an ALB or API Gateway to scale out, and in, appropriately.
This is a very simple answer since I don't have any details needed to be more precise, but this should give you an idea of where to get started.
Our web app uses in-memory caching (Application Data Caching) to improve throughput such that frequently queried data does not have to be loaded from the database (SQL Server) for every request. Potentially, it will be deployed in a web-farm so we have to solve the classical problem of having to synchronize the caches of all nodes. So what we need is a distributed cache.
Readily available solutions are NCache and REDIS (and probably more). However, since we are already using SignalR Backplane to communicate changes to our dataset to a Windows Service (and browser clients), I'm wondering if it could be used to implement a distributed cache.
Doing so, we would (more or less) re-use our existing dataset-has-changed messages but subscribe to them in the web app itself to invalidate its cache. The upside being that we don't have to introduce a new library/technology.
I guess my biggest questions are: Does that make sense? And is SignalR Backplane reliable enough to make sure no events get lost, resulting in outdated caches? Or is this architectural misuse?
SignalR is for real-time communication, not for static data.
In your solution, you would select data in one service and send it to another service over the backplane. Then what? Presumably you would keep it in memory. What happens if one of the services restarts? The data is gone. You will never face this problem with Redis. Additionally, you would be consuming local memory for this data.
Also, how would you manage expiration? And you would have to put in the effort to implement this cache system on top of SignalR yourself.
I don't suggest using the SignalR backplane for this. Stick with Redis or similar technologies.
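To make the comparison concrete: expiration and surviving an application restart are things Redis gives you for free with a plain cache-aside pattern. A minimal, illustrative sketch using the Python Redis client (the question is .NET, so read this as pseudocode for the equivalent calls in your Redis client library; the key name, TTL, and loader function are invented):

```python
import json
import redis

r = redis.Redis(host="cache-server", port=6379)    # shared by every web-farm node

def load_customers_from_sql():
    ...  # the existing SQL Server query goes here

def get_customers():
    cached = r.get("customers")
    if cached is not None:
        return json.loads(cached)                   # same view on every node
    data = load_customers_from_sql()
    r.setex("customers", 300, json.dumps(data))     # expires by itself after 5 minutes
    return data
```

Everything the SignalR-based design would have to re-implement (shared state across nodes, expiry, survival across restarts) is already handled by that get/setex pair.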
I am interested in the Pub/Sub paradigm in order to provide a notifications system (i.e. like Facebook's), especially in a web application which has publishers (in several web applications on the same IIS web server) and one or more subscribers in charge of displaying the notifications on the web for the front-end user.
I found Redis, which seems to be a great server providing interesting features: caching (like Memcached), Pub/Sub, and queues.
Unfortunately, I didn't find any examples in a web context (ASP.NET, with Ajax/jQuery), except WebSockets and NodeJS, but I don't want to use those (too early). I guess I need a process (a subscriber) which receives messages from the publishers, but I don't see how to do that in a web application (pub/sub works fine with unit tests).
EDIT : we currently use .NET (ASP.NET Forms) and try out ServiceStack.Redis library (http://www.servicestack.net/)
Actually Redis Pub/Sub handles this scenario quite well; as Redis is an async, non-blocking server, it can hold many connections cheaply and it scales well.
Salvatore (aka Mr Redis :) describes the O(1) time complexity of Publish and Subscribe operations:
You can consider the work of subscribing/unsubscribing as a constant time operation, O(1) for both subscribing and unsubscribing (actually PSUBSCRIBE does more work than this if you are subscribed already to many patterns with the same client).
...
About memory, it is similar or smaller than the one used by a key, so you should not have problems to subscribe to millions of channels even in a small server.
So Redis is more than capable of, and designed for, this scenario, but the problem, as Tom pointed out, is that in order to maintain a persistent connection users will need long-running connections (aka HTTP push / long-poll), and each active user will take its own thread. Holding a thread isn't great for scalability, and technologically you would be better off using a non-blocking HTTP server like Manos de Mono or node.js, which are both async and non-blocking and can handle this scenario. Note: WebSockets is more efficient for real-time notifications over HTTP, so ideally you would use that if the user's browser supports it, and fall back to regular HTTP if it doesn't (or fall back to using Flash for WebSockets on the client).
So it's not Redis or its Pub/Sub that doesn't scale here; it's the number of concurrent connections that a threaded HTTP server like IIS or Apache can hold that is the limit. With that said, you can still support a fair number of concurrent users with IIS (this post suggests 3000), and since IIS is the bottleneck and not Redis, you can easily just add an extra IIS server into the mix and distribute the load.
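For reference, the pub/sub mechanics themselves are tiny. A sketch with the Python Redis client (the question is ASP.NET with ServiceStack.Redis, which exposes the same SUBSCRIBE/PUBLISH commands, so this is only to show the shape; channel names and the delivery stub are invented):

```python
import redis

r = redis.Redis()

def deliver_to_browser(user_id, data):
    ...  # push to the connected client (long-poll response, WebSocket, etc.)

# Subscriber side: typically a long-running worker process, not a web request.
def listen_for_notifications(user_id):
    p = r.pubsub()
    p.subscribe(f"notifications:{user_id}")
    for msg in p.listen():
        if msg["type"] == "message":
            deliver_to_browser(user_id, msg["data"])

# Publisher side: any web app on the server can fire and forget.
def notify(user_id, payload):
    r.publish(f"notifications:{user_id}", payload)
```

The hard part, as the answer says, is not this code but keeping a connection open to each browser so deliver_to_browser has somewhere to push to.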
For this application, I would strongly suggest using SignalR, which is a .Net framework that enables real-time push to connected clients.
Redis publish/subscribe is not designed for this scenario - it requires a persistent connection to redis, which you have if you are writing a worker process but not when you are working with stateless web requests.
A publish/subscribe system that works for end users over http takes a little more work, but not too much - the simplest approach is to use a sorted set for each channel and record the time a user last got notifications. You could also do it with a list recording subscribers for each channel and write to the inbox list of each of those users whenever a notification is added.
With either of those methods a user can retrieve their new notifications very quickly. It will be a form of polling rather than true push notifications, but you aren't really going to get away from that due to the nature of http.
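A hedged sketch of that sorted-set approach with the Python Redis client (key names are invented; the same ZADD/ZRANGEBYSCORE commands are available from any .NET client):

```python
import time
import redis

r = redis.Redis()

def publish_notification(channel, message):
    # Score each notification by the time it was published.
    # (In practice the member would be a unique id or serialized payload.)
    r.zadd(f"notifications:{channel}", {message: time.time()})

def fetch_new_notifications(channel, last_seen):
    # Everything strictly newer than the last time this user checked.
    new = r.zrangebyscore(f"notifications:{channel}", f"({last_seen}", "+inf")
    return new, time.time()

# Usage: the client polls periodically, remembering last_seen between calls.
messages, last_seen = fetch_new_notifications("news", last_seen=0)
```

Old entries can be trimmed with ZREMRANGEBYSCORE so the sets do not grow without bound.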
Technically you could use redis pub/sub with long-running http connections, but if every user needs their own thread with active redis and http connections, scalability won't be very good.
I'm looking for multithreaded comet server library - what I need is async io (using epoll) working on a threadpool (4-8 threads). Tornado would be ideal if it was multithreaded.
Why multithreaded? I need to process and serve data which could come from every connected user. It could be synchronised between Tornado instances using a database, but even NoSQL would be too big a slowdown: almost every request would end up with a database write/update, which isn't a good idea even with async drivers. I can store everything in local volatile memory so it can be very fast, but it must run in a single process to avoid inter-process communication. I don't need to scale, a single box is enough, but it MUST be fast. Some data will be stored in MongoDB, but the number of Mongo queries will be something like 5% of normal requests.
And an important thing: semaphores (and other higher-level approaches) are not rocket science for me, so I'm not afraid of synchronisation.
Requirements:
async io
non-blocking
thousands of concurrent connections
FAST
basic HTTP features (GET, POST, cookies)
ability to process requests asynchronously (do something, make an async call with a callback (e.g. a database query), process the callback, return data)
thread pool
C++/Java/Python
simple and lightweight
It would be nice to have async mongo driver too
I've looked into Boost ASIO and it seems capable of doing what I need, but I want to focus on the application, not on writing HTTP request processing.
I've read about Tornado (seems ideal but is single-threaded), Simple (not sure if it can process a request asynchronously and return data after an async call), and Boost ASIO (very nice, but too low-level).
Well, after more digging I decided to change technology... I decided to create my own protocol on top of TCP, using Netty.