How to use OpenResty (nginx) to reduce data access time

I need to use nginx to proxy to different backend servers according to configuration information stored in a database. One way is to have another program write that data into Redis and have OpenResty look it up in Redis.
To reduce access time, is there a better way, such as having OpenResty keep the data in local memory and read it from there?

OpenResty has a built-in key-value store: the shared dictionary (lua_shared_dict). Its data lives in shared memory and is visible to all nginx workers, so accessing it is notably faster than accessing Redis.
It is possible to load all required values in init_by_lua*, but you will probably need a cosocket-based library to access the database, and the cosocket API is disabled in both init_by_lua* and init_worker_by_lua*. As a workaround, you may fire a timer with zero delay from init_worker_by_lua*; the timer handler runs in a context where cosockets are available.
To avoid redundant database polling by multiple nginx workers, you may start the timer in the first worker only, i.e. when ngx.worker.id() == 0.
This approach, of course, works only with static configuration data. I use it in a number of projects.
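
For illustration, here is a minimal sketch of that setup. It assumes the configuration lives in a Redis hash named "backends" mapping a host name to an "ip:port" upstream, uses the lua-resty-redis library inside the zero-delay timer, and stores the result in a shared dictionary called "backend_map"; all of those names are made up for this example.

```nginx
http {
    lua_shared_dict backend_map 10m;

    init_worker_by_lua_block {
        -- Only the first worker polls the database; the shared dict is
        -- visible to all workers anyway.
        if ngx.worker.id() ~= 0 then
            return
        end

        local function load_backends(premature)
            if premature then
                return
            end

            local redis = require "resty.redis"
            local red = redis:new()
            red:set_timeout(1000)  -- 1 second

            local ok, err = red:connect("127.0.0.1", 6379)
            if not ok then
                ngx.log(ngx.ERR, "failed to connect to redis: ", err)
                return
            end

            -- Hypothetical layout: HGETALL backends -> host1, ip:port1, host2, ip:port2, ...
            local res, err = red:hgetall("backends")
            if not res then
                ngx.log(ngx.ERR, "failed to load backends: ", err)
                return
            end

            local dict = ngx.shared.backend_map
            for i = 1, #res, 2 do
                dict:set(res[i], res[i + 1])
            end

            red:set_keepalive(10000, 10)
        end

        -- Cosockets are unavailable in this phase, so defer the work to a
        -- zero-delay timer, where they are allowed.
        ngx.timer.at(0, load_backends)
    }

    server {
        listen 80;

        location / {
            set $upstream "";

            access_by_lua_block {
                -- Shared-memory lookup only; no Redis round trip per request.
                local target = ngx.shared.backend_map:get(ngx.var.host)
                if not target then
                    return ngx.exit(ngx.HTTP_BAD_GATEWAY)
                end
                ngx.var.upstream = target
            }

            # Works without a resolver as long as the stored values are ip:port pairs.
            proxy_pass http://$upstream;
        }
    }
}
```

If the configuration changes occasionally rather than being fully static, the same handler can be re-run periodically, for example via ngx.timer.every, to refresh the dictionary.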

Related

uWSGI and Flask: keep objects in memory between requests

My stack is currently uWSGI, Flask, and nginx. I need to store data between requests: basically I receive push notifications about events from another service, and I want to keep those events in the server's memory so the client can just poll the server every n milliseconds for the latest updates.
Normally this would not work, for many reasons. One is that a good production deployment requires several uWSGI processes (and maybe even several machines to scale out). But my case is very specific: I'm building a web app for a piece of hardware (think of your home router's configuration page as a good example). This means there is no need to scale. I also do not have a database (at least not a traditional one), and normally only 1-2 clients connect simultaneously.
If I specify --processes 1 --threads 4 in uWSGI, is this enough to ensure the data is kept in memory as a single instance? Or do I also need --threads 1?
I'm also aware that some web servers clear memory and restart the hosted app from time to time. Do nginx/uWSGI do that, and where can I read about the rules?
I'd also welcome advice on how to design all of this, if there are better ways to handle it. Please note that I'm not considering any persistent storage for this: it isn't worth the effort and may even be impossible due to hardware limitations.
Just to clarify: when I talk about one instance of the data, I mean my app.py executing exactly once and keeping the objects defined there alive for as long as the server lives.
If you don't need the data to persist past a server restart, why not just build a cache object into your application that can do push and pop operations?
A simple array of objects should suffice: one Flask route pushes new data onto the array and another pops data off it.
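
As long as uWSGI keeps a single process (e.g. --processes 1 --threads 4), a minimal sketch of that idea could look like the following; the route paths and payload shape are made up for this example.

```python
# Single-process uWSGI: every request thread sees the same module-level deque,
# so it acts as the in-memory event store for the lifetime of the process.
from collections import deque

from flask import Flask, jsonify, request

app = Flask(__name__)
events = deque()  # lives for as long as this worker process lives


@app.route("/events", methods=["POST"])
def push_event():
    # The external service pushes event notifications here.
    events.append(request.get_json())
    return "", 204


@app.route("/events", methods=["GET"])
def pop_events():
    # Clients poll this every n milliseconds and drain whatever is pending.
    drained = []
    while events:
        drained.append(events.popleft())
    return jsonify(drained)
```

Threads within one process always share this memory, so --threads 4 is fine; it is multiple processes (or a uWSGI reload of the app) that would give you separate or reset copies of the data.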

Collect data for Bosun from multiple endpoints

In the observability system we're building from scratch, we'd like to have a single scollector to collect data from all the web servers and send it to Bosun, instead of having an instance of scollector on each server.
Do you know if there's a way to achieve that?
Scollector is implemented as an agent, similar to OpenTSDB's tcollector. It's lightweight and doesn't cause too much overhead on the hosts.
If you want all the data that scollector is capable of collecting forwarded to Bosun, you need one agent on each host you want to monitor. Scollector makes use of procfs and similar interfaces, which are only accessible directly on the host.
You can also write your own external collectors that scollector will invoke for you.
With that, depending on your use case, you might be able to collect data from remote hosts, but scollector is really designed to run as an agent on every host and collect the data locally.
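
As a rough illustration of that last point, an external collector can be a small script that polls the remote web servers itself and prints metrics for scollector to forward. This sketch assumes a tcollector-compatible output format (lines of "metric timestamp value tag=value" on stdout) and a hypothetical /status JSON endpoint on each web server; check the scollector documentation for the exact external-collector conventions.

```python
#!/usr/bin/env python3
# Hedged sketch of an external collector. Assumptions: scollector invokes it
# from its external-collector directory and accepts tcollector-style lines
# ("metric timestamp value tag=value") on stdout; each web server exposes a
# hypothetical /status endpoint returning {"requests": ..., "connections": ...}.
import json
import time
import urllib.request

ENDPOINTS = {
    "web1": "http://web1.example.com/status",
    "web2": "http://web2.example.com/status",
}

now = int(time.time())
for host, url in ENDPOINTS.items():
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            stats = json.load(resp)
    except OSError:
        continue  # skip hosts that are unreachable right now
    print(f"webapp.requests_total {now} {stats['requests']} host={host}")
    print(f"webapp.open_connections {now} {stats['connections']} host={host}")
```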

Sailsjs distribution across multiple Google compute engine instances

Sailsjs requires setup to handle scaling horizontally. There are multiple ways to do this. I'm not sure if I have done this correctly, due to poor performance during load testing. Please confirm if I understand and am doing the setup correctly.
I've created a load balancer on the Google platform to handle the distribution of requests across the instances. Much is said about using Nginx for this, but I understand Google's load balancer does all I need in this regard. Note that I use session affinity: Client IP.
I've set up config/session.js to use express-mysql-session, so MemoryStore is not used.
I haven't set up anything in config/sockets.js. My project doesn't use live chat or similar with socket.io; all requests go through Waterline for data from the DB. But if this is an issue, please point me to a way to do it with a MySQL DB, not Redis (or memory).
I use pm2 to keep the app alive and to distribute processing on an instance.
Those are the main factors I've found regarding horizontal scaling with sailsjs.

Best practice for selecting which Redis server to read in real time

I have an nginx server with a Redis master and two slaves of that master. The slaves are read-only and the master is read/write. The nginx server runs FastCGI with spawned Python apps using pyredis.
When it comes time for a read from my nginx app, what is the best practice for deciding which of the three servers gets the read? Is it determined in real time? Do I just do a simple random or round-robin selection in real time?
Again, I have just one master. Soon I will have two and will use consistent hashing in Python via http://pypi.python.org/pypi/hash_ring to select which server gets the keys.
In the interim, is it wise to use the hash ring to select which server gets the read, even though they should be exact copies?
What you should do is abstract the code that does the selection, so your app logic doesn't have to change later when you split the data.
As for reading, I'd use just the slaves for that. You can use the hashing if you want, provided it doesn't affect your code and stays behind the abstraction.
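
For example, a thin wrapper in the spirit of that advice keeps the selection policy in one place, so swapping in hash_ring later only touches this class. The sketch assumes the redis-py client; hosts and ports are placeholders.

```python
# Minimal sketch: writes go to the master, reads are spread over the slaves.
# Only this class has to change when a second master and consistent hashing
# (e.g. hash_ring) are introduced.
import itertools

import redis


class RedisRouter:
    def __init__(self, master, slaves):
        self.master = redis.StrictRedis(host=master[0], port=master[1])
        self._slaves = itertools.cycle(
            [redis.StrictRedis(host=h, port=p) for h, p in slaves]
        )

    def get(self, key):
        # The slaves are exact copies, so simple round-robin is enough.
        return next(self._slaves).get(key)

    def set(self, key, value):
        # All writes must hit the master.
        return self.master.set(key, value)


store = RedisRouter(("127.0.0.1", 6379),
                    [("127.0.0.1", 6380), ("127.0.0.1", 6381)])
```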

Local SQLite vs Remote MongoDB

I'm designing a new web project and, after studying some options with scalability in mind, I came up with two database solutions:
Local SQLite files carefully designed in a scalable fashion (one new database file for every X users, as writes depend on user content and there is no cross-user data dependence);
Remote MongoDB server (like Mongolab), as my host server doesn't serve MongoDB.
I don't trust the MySQL server at my current shared host, as it comes down very frequently (and I had problems with MySQL on another host, too). For the same reason I'm not going to use Postgres.
Pros of SQLite:
It's local, so it must be faster (I'll take care to use indexes and transactions properly);
I don't need to worry about TCP sniffing, as the Mongo wire protocol is not encrypted;
I don't need to worry about server outage, as SQLite is serverless.
Pros of MongoDB:
It's more easily scalable;
I don't need to worry about splitting databases, as scalability seems natural;
I don't need to worry about schema changes, as Mongo is schemaless and SQLite doesn't fully support ALTER TABLE (especially considering changing many production files, etc.).
I want help making a decision (and maybe considering a third option). Which one is better as read and write operations grow?
I'm going to use Ruby.
One major risk of the SQLite approach is that as your scaling requirements increase, you will not be able to (easily) deploy on multiple application servers. You may be able to partition your users across separate servers, but if one of those servers goes down, some subset of users cannot access their data.
Using MongoDB (or any other centralized service) alleviates this problem, as your web servers are stateless -- they can be added or removed at any time to accommodate web load without having to worry about what data lives where.
