uWSGI and Flask: keep objects in memory between requests - nginx

My stack is currently uWSGI, Flask and nginx. I need to store data between requests: I receive push notifications about events from another service, and I want to keep those events in server memory so that a client can query the server every n milliseconds to receive the latest update.
Normally this would not work, for many reasons. One is that a good deployment requires several uWSGI processes in production (and maybe even several machines to scale out). But my case is very specific: I'm building a web app for a piece of hardware (think of your home router's configuration page as a good example). This means there is no need to scale. I also do not have a database (at least not a traditional one), and there will normally be only 1-2 simultaneous clients.
If I specify --processes 1 --threads 4 in uWSGI, is this enough to ensure the data is kept in memory as a single instance? Or do I also need to use --threads 1?
I'm also aware that some web servers clear memory randomly from time to time and restart the hosted app. Does nginx/uwsgi do that and where can I read about the rules?
I'd also welcome advice on how to design all of this, if there are better ways to handle it. Please note that I am not considering any persistent storage for this: it is not worth the effort and may even be impossible due to hardware limitations.
Just to clarify: When I'm talking about one instance of data, I'm thinking of my app.py executing exactly one time and keeping the instances defined there for as long as the server lives.

If you don't need data to persist past a server restart, why not just build a cache object into your application that can do push and pop operations?
A simple array of objects should suffice: one Flask route pushes new data to the array and another pops the data off the array.
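A minimal sketch of such a cache object, using only the standard library. The class name `EventBuffer` and the `max_events` parameter are made up for illustration. The lock is there because with --threads 4 several request threads may touch the buffer at once, and the sketch only holds a single copy of the data if uWSGI runs exactly one process. Note that popping hands each event to only one client; if two clients must both see every event, a read-only query would be needed instead.

```python
import threading
from collections import deque


class EventBuffer:
    """In-process event store shared by all threads of one worker process."""

    def __init__(self, max_events=1000):
        # A bounded deque drops the oldest events once the limit is reached,
        # so memory use stays flat on constrained hardware.
        self._events = deque(maxlen=max_events)
        self._lock = threading.Lock()

    def push(self, event):
        """Store one event (called by the route that receives notifications)."""
        with self._lock:
            self._events.append(event)

    def pop_all(self):
        """Return all pending events in arrival order and clear the buffer."""
        with self._lock:
            pending = list(self._events)
            self._events.clear()
        return pending


# Created once at import time; with a single process there is one copy.
buffer = EventBuffer(max_events=3)
for n in range(5):
    buffer.push({"id": n})
latest = buffer.pop_all()  # only the three newest events are still held
```

In a Flask app, the route that receives notifications would call `buffer.push(...)` and the route polled by the client would return `buffer.pop_all()` as JSON.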

Related

Load balance background tasks in Azure Web App

I am developing an ASP.NET application that will be hosted as an Azure web app. Part of the app will continuously record multiple web-based cameras by retrieving a snapshot every N seconds. I would like to design the app so that the processes that record the cameras can be run on multiple instances. I would like it to load balance between all instances, but not duplicate effort for any one camera.
For example, if I have 100 cameras, and am running on 2 instances, I want each instance to get 50 cameras to process. If I have 5 instances, each instance should get 20 cameras to process. As I add cameras or scale instances up/down I would like for the system to load balance the work evenly.
If it's feasible, I would rather not spin up dedicated VMs just for processing cameras, due to increased cost.
I'm somewhat familiar with Akka.NET, Hangfire, and WebJobs, but am unclear if these will help in this scenario. I have used Hangfire and WebJobs to do background processing, but not with this sort of load-balancing requirement. Will these or some other framework or tool help me load balance these background tasks evenly across Azure Web App Instances? How should I go about setting up these or another framework to do this?
I honestly don't think you want to try to "balance" the servers. I think you just want to make sure the work is well distributed. If I were you, I would use a queue system like SQS to queue up all of the cameras that need a snapshot and let each instance worker dequeue one at a time and process it.
A good approach could be to have a master server responsible for queueing up the snapshots, and then have all of your worker servers simply work out of this shared queue. Even if one server happens to process more than the others, that is fine, since the others were working out of the same queue. It just means that this server was able to process its jobs more quickly than the others.
To be honest, there are a lot of ways to approach this. You could do something as simple as keeping a shared list of your cameras, with a timestamp for the last snapshot, and working off of that. Each server would request a camera by looking at the list, finding one that was stale, updating the timestamp and then performing the snapshot for that camera. The downside to something like this is that you are going to struggle with non-atomic operations and the possibility of multiple workers making the request at the same time and both working on the same camera. These are the types of things that a queue system will help you with, because as soon as a queue item is in flight, it is no longer available to other workers. Also, because each server is responsible for invalidating its items once they are finished, if a server were to crash mid-snapshot, the work would simply go back into the queue.
No matter which solution you choose, it is going to boil down to having a central system/list for serving up stale cameras.
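The in-flight behaviour described above can be sketched in a few lines. This is an in-memory illustration only, not the API of any real queue service; `LeaseQueue`, `visibility_timeout` and the injectable `clock` are names invented for the example, and a real deployment would use a hosted queue shared by all instances.

```python
import time
from collections import deque


class LeaseQueue:
    """Work queue where a dequeued item is hidden until acked or expired."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self._ready = deque()
        self._in_flight = {}  # item -> time at which its lease expires
        self._timeout = visibility_timeout
        self._clock = clock

    def enqueue(self, item):
        self._ready.append(item)

    def dequeue(self):
        """Hand out the next item, or None if nothing is available."""
        self._requeue_expired()
        if not self._ready:
            return None
        item = self._ready.popleft()
        self._in_flight[item] = self._clock() + self._timeout
        return item

    def ack(self, item):
        """Called by a worker once its snapshot is done; removes the item."""
        self._in_flight.pop(item, None)

    def _requeue_expired(self):
        # Items never acked (for example, the worker crashed) become visible again.
        now = self._clock()
        expired = [i for i, deadline in self._in_flight.items() if deadline <= now]
        for item in expired:
            del self._in_flight[item]
            self._ready.append(item)


# A fake clock keeps the demonstration deterministic.
fake_now = [0.0]
queue = LeaseQueue(visibility_timeout=10.0, clock=lambda: fake_now[0])
queue.enqueue("camera-1")
queue.enqueue("camera-2")

first = queue.dequeue()    # worker A takes camera-1
second = queue.dequeue()   # worker B gets camera-2, never camera-1
queue.ack(second)          # worker B finishes its snapshot
fake_now[0] = 11.0         # worker A crashed and its lease has expired
retried = queue.dequeue()  # camera-1 is handed out again
```

The sketch is single-process and not thread-safe; its only purpose is to show why an item in flight cannot be picked up twice and why a crashed worker's item returns to the queue.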
The Azure WebJob SDK uses the Storage Account you set up to balance the work between the various instances that are running your Jobs. You can gain finer control by using a Queue to divide up the work that needs doing and then scale your App Service Plan based on the Queue length.
Here's a rough picture of that architecture:

Web server tolerance to high client poll rate: Cowboy vs. Yaws web servers

I have been building a real-time notification system. It’s part of a web application, but events have to be seen as soon as they occur. Long polling was not an option because it would be expensive for the web server to hold on to connections when no events are available, so I had to go for short-lived polls.
Each client hits the web server every, say, 2 seconds (this is a fairly high rate). When events are available, they are sent as JSON to the JavaScript client. This requires a server set-up that can handle a high number of short-lived connections. I have implemented one such system using the Yaws web server. However, because Yaws starts quite a number of other services, it feels heavy, and connections begin to get either refused or aborted when they go beyond 30,000 (maybe because I am running some ETS tables in the same Erlang VM as Yaws; separating these may require rpc:call/4, which I fear will increase latency). I know that there are operating-system-specific tweaks to do, and those have been done.
This would not be a problem if it was easy to cluster several Yaws instances. In Yaws, I am using a few appmods, and I am doing things RESTfully. I was thinking that the Cowboy web server might improve things a bit here. I have not used Cowboy before, but I have used Misultin. Looking at Cowboy, it is a full-fledged OTP application that seems easy to cluster and, being lightweight, may perhaps increase the number of clients the overall system can handle. Storage is on Mnesia, which I can distribute easily to add more nodes (maybe by replication), so that there is a Cowboy instance in front of every Mnesia instance.
My questions are:
Is my speculation correct, that if I switched from Yaws to Cowboy, I might increase the performance significantly?
Yaws has a clean API via Appmods and the #arg{} record. Does Cowboy have an equivalent of these two things (illustrate please)?
Can Cowboy handle file uploads? If so, which server (Yaws or Cowboy), in your opinion would be better to use in the case of frequent file uploads? Illustrate how file uploads are done with Cowboy.
It is possible to run several Yaws instances on the same machine. Do you think that creating many Yaws instances per server (physical box) and having the client-load distributed across these would help? What do I need to know about doing this?
When I set the yaws.conf parameter max_connections = nolimit, how would I specify the same in Cowboy?
Now, I read the interview with the Cowboy author, in which he discusses the reasons why Cowboy is more lightweight than Yaws. He says:
The biggest difference is the use of binaries instead of lists. The generic acceptor pool is another. I could list a lot of other small differences but I figure these aren’t the most interesting.
That is, because Cowboy uses the listener-pool library Ranch, and binaries instead of lists, it somehow ends up able to handle more connections.
Another quote from the same interview:
Since we use one process per connection instead of two, and we use binaries instead of lists, we end up using a lot less memory than other projects without user intervention. Cowboy is also lazy, it doesn’t do anything unless required. So we don’t have much in memory until the user starts calling functions.
I wonder how Yaws handles this case. My problem domain needs lightweight HTTP handling, and it is true that Yaws consumes more memory than, say, Mochiweb, Misultin or Cowboy. My greatest concern is that Yaws has the best and cleanest API: it gives us the #arg{} record containing everything we need, so that we can extract the values ourselves, whereas the others have numerous functions for extracting each piece. The same goes for documentation: the Yaws docs are pretty good and straightforward. Perhaps I need to look at more Cowboy code for things like file uploading and simple GET and POST request handling.
Otherwise, the questions I asked earlier remain pressing concerns. Yaws is pretty good, but it seems to be overkill for this fast, lightweight, short-lived, high-rate poll situation. What do you think?
Your 30000 refusal limit sounds an awful lot like a 32k limit somewhere. Either the default process count, which is 32k, or some system limit on file descriptors and so on. You should not rule out the possibility that the limitation is on the kernel side of things. I've seen systems come to their limits quite easily due to kernel configurations which can be really hard to handle.
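As a quick way to check the file descriptor side of that theory, the per-process limit can be read from the operating system. The sketch below uses Python's standard `resource` module (Unix only) rather than Erlang, and the helper name `describe` is made up. It reports the limits of the process that runs it, so it needs to be run in the same environment as the server, and the Erlang VM's own process limit is a separate setting that it does not show.

```python
import resource

# RLIMIT_NOFILE caps the number of open file descriptors per process;
# every accepted socket consumes one of them.
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to a readable word."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


print(f"open files: soft={describe(soft_limit)} hard={describe(hard_limit)}")
```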

ASP.NET hosting with unlimited single-node scalability

Since this question is from a user's (developer's) perspective I figured it might fit better here than on Server Fault.
I'd like an ASP.NET hosting that meets the following criteria:
The application seemingly runs on a single server (so no need to worry about e.g. session state or even static variables)
There is an option to scale storage, memory, DB size and CPU-power up and down on demand, in an "unlimited" way
I researched but there seems not to be such a platform, that completely abstracts the underlying architecture away and thus has the ease of use of a simple shared hosting but "unlimited" scalability.
"Single server" and "scalability" are mutually exclusive, I'm afraid. But a good load-balancer will apply affinity to requests so you don't need to needlessly double-cache data on multiple servers.
However, well-designed web applications are easy to port to a multiple-server scenario.
I think your best option is something like Windows Azure Websites (separate from Azure Web Workers), which run on a VM you don't have access to. The VM itself provides as much power as is necessary to run your website, so you don't need to worry about allocating extra CPU power or RAM.
Things like SQL Server are handled separately, but are very cheap to run, and you can drag a slider to give yourself more storage space.
This can still be accomplished by using a cloud host like www.gearhost.com. Apps live in the cloud and by default get 1 node worker, so session stickiness is maintained. You can then scale that application to larger workers to accomplish what you need, all while maintaining HA and LB. Going even further, you can add multiple web workers. Each visitor is tied to a particular node to maintain session state even though you might have, for example, 10 workers. It's an easy and cheap way to scale a site from 100 visitors to many millions in just a few clicks.

Web service that can withstand 1000 concurrent users with a response time of 25 milliseconds

Our client's requirement is a WCF service that can withstand 1-2k concurrent website users with a response time of around 25 milliseconds.
This service reads a couple of columns from a database and will be consumed by different vendors.
Can you suggest an architecture, or any extra effort I need to make during development? And how do we calculate the server hardware configuration needed to cope with this load?
Thanks in advance.
Hardly possible. You need a network connection to the service, service activation, business logic processing, a database connection (another network connection) and a database query. With 2000 concurrent users you need several application servers, so the network connection also passes through a load balancer. I can't imagine a network and hardware infrastructure able to complete such an operation within 25ms for 2000 concurrent users. The requirement is not realistic.
I guess that if you simply try to run the database query from your computer against a remote DB, you will see that even such a simple task will not complete in 25ms.
A few principles:
Test early, test often.
Successful systems get more traffic
Reliability is usually important
Caching is often a key to performance
To elaborate: build a simple system right now. Even if the business logic is very simplified, if it's a web service with database access you can performance test it. Test with one user. What do you see? Where does the time go? As you develop the system and add real code, keep running that test. Reasons: a) you know right now whether 25ms is even achievable; b) you spot any code change that hurts performance immediately. Now test with lots of users. What degradation patterns do you hit? This starts to give you an indication of your platform's capabilities.
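A rough sketch of such a test harness, using only the Python standard library. The function names and `fake_service_call` are invented for illustration; in practice the placeholder would be replaced by a real client call to the service under test, and a dedicated load testing tool would give more trustworthy numbers.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(operation):
    """Run operation once and return its duration in milliseconds."""
    start = time.perf_counter()
    operation()
    return (time.perf_counter() - start) * 1000.0


def measure(operation, concurrent_users, calls_per_user):
    """Simulate users calling operation in parallel; return all latencies in ms."""
    def one_user(user_index):
        return [timed_call(operation) for _ in range(calls_per_user)]

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        per_user = list(pool.map(one_user, range(concurrent_users)))
    return [latency for user in per_user for latency in user]


def fake_service_call():
    # Stand-in for the real request; replace with an actual client call.
    time.sleep(0.002)


latencies = measure(fake_service_call, concurrent_users=5, calls_per_user=4)
median_ms = statistics.median(latencies)
worst_ms = max(latencies)
print(f"calls={len(latencies)} median={median_ms:.1f}ms worst={worst_ms:.1f}ms")
```

Running it first with `concurrent_users=1` and then with increasing values shows where latency starts to degrade.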
I suspect that the outcome will be that a single machine won't cut it for you. And even if it will, if you're successful you get more traffic. So plan to use more than one server.
And anyway for reliability reasons you need more than one server. And all sorts of interesting implementation details fall out when you can't assume a single server - eg. you don't have Singletons any more ;-)
Most times we get good performance using a cache. Will many users ask for the same data? Can you cache it? Are there updates to consider? If so, do you need a distributed cache system with clustered invalidation? That is the multi-server case emerging again.
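To make the caching idea concrete, here is a minimal single-process sketch of a cache with time-based expiry and explicit invalidation. `TTLCache`, `get_or_load` and `load_from_db` are hypothetical names, the loader stands in for the real database query, and a multi-server deployment would need a shared cache instead of a local dictionary.

```python
import time


class TTLCache:
    """Serve stored values until they are older than ttl seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (expiry time, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key after the underlying data changes."""
        self._entries.pop(key, None)


db_reads = []


def load_from_db(key):
    # Hypothetical stand-in for the real database query.
    db_reads.append(key)
    return f"row-for-{key}"


fake_now = [0.0]
cache = TTLCache(ttl=60.0, clock=lambda: fake_now[0])
a = cache.get_or_load("vendor-1", load_from_db)  # miss: reads the database
b = cache.get_or_load("vendor-1", load_from_db)  # hit: served from memory
fake_now[0] = 61.0
c = cache.get_or_load("vendor-1", load_from_db)  # expired: reads again
```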
Why do you need WCF?
Could you shift as much of that service as possible into static serving and cache lookups?
If I understand your question 1000s of users will be hitting your website and executing queries on your DB. You should definitely be looking into connection pools on your WCF connections, but your best bet will be to avoid doing DB lookups altogether and have your website returning data from cache hits.
I'd also look into why you couldn't just connect directly to the database for your lookups. Do you actually need a WCF service in the way in the first place?
Look into Memcached.

Logging across multiple web servers

I would like to know how people deal with logging across multiple web servers. For example, assume there are 2 web servers, and some events during a user's session are serviced from one and some from the other. How would you go about logging events from the session coherently in one place (without, for example, creating single points of failure)? Assume we are using ASP.NET MVC and log4net.
Or am I looking at this the wrong way, and should I log separately and then merge later?
Thanks,
S
UPDATE
Please also assume that the load balancers will not guarantee that a session is stuck to one server.
You definitely want your web servers to log locally rather than over a network. You don't want potential network outages to prevent logging operations, and you don't want the overhead of a network operation for logging. You should have log rotation set up and all your web servers' clocks synced. When log rotation rolls your log files over to a new file, have the completed log files from each web server shipped to a common destination where they can be merged. I'm not a .NET guy, but you should be able to find software out there to merge IIS logs (or those of whatever web server you're using). Then you analyze the merged logs. This strategy is optimal except in the case that you need real-time log analysis. Do you? Probably not. It's fairly resilient to failures (assuming you have redundant disks), because if a server goes down you can just reboot it and reprocess any log ship, log merge or log analysis operations that were interrupted.
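The merge step can be sketched as follows, assuming each server's file is already in time order and every line starts with an ISO-8601 timestamp followed by a space. That line format, the function names and the sample lines are assumptions for illustration, not the format IIS or log4net actually writes.

```python
import heapq
from datetime import datetime


def parse_line(line):
    """Split 'ISO-timestamp rest-of-line' into (datetime, original line)."""
    stamp, _, _ = line.partition(" ")
    return datetime.fromisoformat(stamp), line


def merge_logs(*sources):
    """Merge per-server line iterables, each in time order, into one stream."""
    parsed = [map(parse_line, source) for source in sources]
    # heapq.merge reads lazily, so large files need not fit in memory.
    for _, line in heapq.merge(*parsed, key=lambda pair: pair[0]):
        yield line


web1 = [
    "2024-01-01T10:00:00 web1 login user=42",
    "2024-01-01T10:00:05 web1 view page=/cart",
]
web2 = [
    "2024-01-01T10:00:02 web2 add-to-cart user=42",
    "2024-01-01T10:00:09 web2 checkout user=42",
]
merged = list(merge_logs(web1, web2))
```

The result interleaves the two servers' lines by timestamp, which is why synced clocks matter: the order is only as good as the timestamps.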
An interesting alternative solution is to have two log file appenders:
The first one logs to the local machine. In case of network failure you'll keep this log.
The second one logs remotely to a Unix syslog service (this of course needs a very reliable network connection).
I used a similar approach a long time ago and it worked really well; there are a lot of nice tools for analyzing Unix logs.
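The two-appender idea looks roughly like this in Python's standard `logging` module, shown here only as an analogue of the log4net setup the question uses. The function name `build_logger`, the logger name and the syslog address `127.0.0.1:514` are assumptions for the example; a real setup would point at the actual syslog host.

```python
import logging
import logging.handlers
import os
import tempfile


def build_logger(name, local_path, syslog_address):
    """Return a logger that writes each record to a local file and to syslog."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    # Local copy: survives a network outage.
    local = logging.FileHandler(local_path)
    local.setFormatter(formatter)
    logger.addHandler(local)

    # Remote copy: sent over UDP to a syslog daemon (the handler's default).
    remote = logging.handlers.SysLogHandler(address=syslog_address)
    remote.setFormatter(formatter)
    logger.addHandler(remote)
    return logger


# A temporary directory keeps the example from leaving files behind in the cwd.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")
logger = build_logger("webapp", log_path, ("127.0.0.1", 514))
logger.info("session=abc event=login")
```

Because the remote handler uses UDP, a missing syslog daemon does not stop the local file from being written, at the cost of remote records being lost silently.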
Normally your load balancing would lock the user to one server after the session is started. Then you wouldn't have to deal with logs for a specific user being spread across multiple servers.
One thing you could try is to have the log file in a location that is accessible by all web servers and have log4net configured to write to it. This may be problematic, however, with multiple processes trying to write to the same file. I have read about NLog which may work better in this scenario.
Also, the log4net FAQ has a question and possible solution to this exact problem
