nginx lua, storing data across scripts without SQL?

I'm wondering if there's a way to store very large arrays across scripts. In my previous solution, I was using SQL to store this data, but with 200 requests to the page every second, that's a lot of unnecessary very large select queries!
I was thinking perhaps there might exist an nginx module out there in the wild that allows you to store data that can be easily modified and accessed through lua without having to be deleted from memory and added to memory 200 times a second.
The only other option I can think of is building an nginx module to run my app and forgoing lua entirely. Ideas, anyone?

Use the lua-nginx-module's built-in ngx.shared.DICT for fast in-memory storage that is shared by all Nginx worker processes.
From the documentation:
The shared dictionary will retain its contents through a server config
reload (either by sending the HUP signal to the Nginx process or by
using the -s reload command-line option).
The contents in the dictionary storage will be lost, however, when the
Nginx server quits.
Load your data from SQL into the shared dict, then use the shared dict from there.
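A minimal sketch of that pattern, assuming a shared dict named my_cache and a hypothetical load_from_sql() helper that returns the data as a JSON string (neither name is from the original post):

```nginx
# nginx.conf -- http block
lua_shared_dict my_cache 64m;   # one 64 MB zone shared by all workers

server {
    location /data {
        content_by_lua_block {
            local cache = ngx.shared.my_cache
            local value = cache:get("big_array")
            if not value then
                -- cache miss: hit SQL once, then keep the result in memory
                value = load_from_sql()             -- returns a JSON string
                cache:set("big_array", value, 300)  -- optional 300 s TTL
            end
            ngx.say(value)
        }
    }
}
```

Note that ngx.shared.DICT stores strings, numbers, and booleans, so a large array is typically serialized (e.g. with cjson.encode) before set and decoded after get.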

Related

Next.js caching requests to database used in `getStaticPaths` and `getStaticProps` to improve build time

I have several dynamic pages which use exactly the same getStaticPaths and also invoke exactly the same database request in getStaticProps. How can I cache the results of the database requests so they can be reused when building different pages? I have tried adding basic in-memory memoization, but it does not seem to do much. My guess is that the pages are rendered in different workers that don't share memory.
If you have heavy computations or requests, you can write a helper that fetches data from the database and stores the results in a temporary file. The helper then checks whether that file already exists and reads it; on the first request, it fetches the data from the database and writes the file.
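The pattern is language-agnostic; here is a minimal sketch of the same file-backed memoization idea in Python (the function and file names are illustrative). Because the cache lives on disk, it is visible to all build workers, which plain in-memory memoization is not:

```python
import json
import os
import tempfile

# Illustrative path; any location all workers can reach works
CACHE_FILE = os.path.join(tempfile.gettempdir(), "build_cache.json")

def cached_fetch(fetch):
    """Return the cached result if a previous worker already wrote it,
    otherwise call fetch() once and persist the result to disk."""
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            return json.load(f)
    data = fetch()  # the expensive database request
    with open(CACHE_FILE, "w") as f:
        json.dump(data, f)
    return data
```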

Fill in NGINX cache from different service

I have an idea to populate the NGINX cache in an unusual way, and I want to ask whether this is really possible to achieve.
The common flow we are all used to is that a request hits the backend service and only then is the response cached in NGINX.
What I want to achieve is to populate NGINX's native cache from a separate service. That means I want to manipulate the hashed keys stored in memory via some NGINX module, and also create the directory structure with files that contain the cached payloads.
The questions would be:
Is this possible?
How to achieve this, what modules should I include into NGINX, etc.?
NGINX writes cached data to the filesystem using the algorithm described here: http://czerasz.com/2015/03/30/nginx-caching-tutorial/. What is actually stored in the first line of such a cached file? Everything from the second line on is payload, but there are some non-readable bytes in the first line, and the cache stops working if this line is removed.
Thanks in advance!
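One documented piece of the puzzle is how NGINX maps a cache key to a file path: the file name is the MD5 hex digest of the proxy_cache_key value, and the directory levels are taken from the trailing characters of that digest. The non-readable bytes at the top of the file are an internal, version-dependent binary header that NGINX validates, which is why the entry stops working when that line is removed. A sketch of the path computation, assuming levels=1:2:

```python
import hashlib

def nginx_cache_path(key: str, levels=(1, 2)) -> str:
    """Reproduce the on-disk path NGINX uses for a cached response,
    for a cache zone configured with levels=1:2."""
    digest = hashlib.md5(key.encode()).hexdigest()
    parts, pos = [], len(digest)
    for n in levels:
        # each level is taken from the end of the digest, working backwards
        parts.append(digest[pos - n:pos])
        pos -= n
    return "/".join(parts + [digest])
```

So to pre-populate the cache from another service, you would have to recreate both this directory layout and the binary header exactly as the running NGINX version expects.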

Asynchronous logging of server-side activities

I have a web server that on every request produces a log that should be persistently saved in a DB.
Since the request rate is too high, I can't make DB operations on every request.
I thought to do the following thing:
Any request to a web server produces a log.
The log is placed somewhere it can be stored quickly (Redis?).
Another service (a cron job?) periodically flushes the data from that place, removes duplicates (yes, there can be duplicates that don't need to be stored in the DB) and makes a single MySQL query to save the data permanently.
What would be the most efficient way to achieve this thing?
Normally you would use a common logging library (e.g. log4j) which safely manages writing your log statements to a file. Note, however, that log verbosity can still impact application performance.
After the file is on disk, you can do whatever you like with it - it would be completely normal to ingest that file into Splunk for further processing and ad-hoc searches/alerting.
If you want better performance for this operation, you should send your logs to some kind of queue and have a separate service read that queue and write the logs to the database in batches.
I found some information about queues in MySQL:
MySQL Insert Statement Queue

how to start Apache Jena Fuseki as read only service (but also initially populate it with data)

I have been running an Apache Jena Fuseki server with a closed port for a while. At present my other apps can access it via localhost.
Following their instructions I start this service as follows:
./fuseki-server --update --mem /ds
This creates an updatable in-memory database.
The only way I currently know how to add data to this database is using the built-in http request tools:
./s-post http://localhost:3030/ds/data
This works great, except that now I want to expose this port so that other people can query the dataset. However, I don't want to allow people to update or change the database; I just want them to be able to query the information I originally loaded into the database.
According to the documentation (http://jena.apache.org/documentation/serving_data/), I can make the database read-only by starting it without the update option.
Data can be updated without access control if the server is started
with the --update argument. If started without that argument, data is
read-only.
But when I start the database this way, I am no longer able to populate with the initial dataset.
So, MY QUESTION: How do I start an in-memory Fuseki database which I can populate with my original dataset but then disallow further HTTP updates?
(My guess is that I need another method to populate the Fuseki database that does not use the HTTP protocol, but I'm not sure.)
Here are some options:
1/ Use TDB tools to build a database offline and then start the server read only on that TDB database.
2/ Like (1) but use --update to build a persistent database, then stop the server, and restart without --update. The database is now read only. --update affects the services available and does not affect the data in any other way.
Having a persistent database has the huge advantage that you can start and stop the server without needing to reload data.
3/ Use a web server to pass query requests through to the Fuseki server, and limit the Fuseki server to listening only on localhost. You can update from the local machine; external people can't.
4/ Use Fuseki2 and adjust the security settings to allow update only from localhost but query from anywhere.
What you can't do is update a TDB database currently being served by Fuseki.
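Option (1) can be sketched with Jena's command-line tools (the database directory and data file are illustrative):

```shell
# Build a persistent TDB database offline from your data file
tdbloader --loc=/srv/fuseki/ds data.ttl

# Serve it read-only: no --update, so only the query services are enabled
./fuseki-server --loc=/srv/fuseki/ds /ds
```

Rebuilding the data means stopping the server, re-running tdbloader, and starting the server again, since a TDB database cannot be updated while Fuseki is serving it.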

flask manage db connection :memory:

I have a flask application that needs to store some information from requests. The information is quite short-lived and if the server is restarted I do not need it any more - so I do not really need persistence.
I have read here that an SQLite database held in memory can be used for this. What is the best way to manage the database connection? In the Flask documentation, connections to the database are created on demand, but my database will be deleted if I close the connection.
The problem with using an in-memory SQLite database is that it cannot be accessed from multiple threads: each connection gets its own private copy of the database.
http://www.sqlite.org/inmemorydb.html
To further the problem, you are likely going to have more than one process running your app, which makes using an in-memory global variable out of the question as well.
So unless you can be certain that your app will only ever need a single thread and a single process (which is unlikely), you're going to need to either:
Use the disk to store state, such as an on-disk sqlite db, or even just some file you parse.
Use a daemonized process that runs separately from your application to manage the state.
I'd personally go with option 2.
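The isolation is easy to demonstrate: every plain connection to :memory: gets its own private database, so state written through one connection is invisible to another, and the same applies across Flask's request-handling threads:

```python
import sqlite3

a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")  # a completely separate database

a.execute("CREATE TABLE t(x INTEGER)")
a.execute("INSERT INTO t VALUES (1)")
a.commit()

# Connection b never sees table t
tables = b.execute(
    "SELECT name FROM sqlite_master WHERE name = 't'"
).fetchall()
print(tables)  # []
```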
You can use memcached for this, running on a central server or even on your app server if you've only got one. This will allow you to store state (including python objects!) temporarily, in memory and you can even set timeout values for when the data should expire, which from the sound of things might be useful for your app.
Since you're using Flask, you've got some really good built-in support for using a memcached cache, check it out here: http://flask.pocoo.org/docs/patterns/caching/
As for getting memcached running on your server, it's really just an apt-get or yum install away. Let me know if you have questions or challenges and I'll be happy to update.
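What memcached buys you, sketched in plain Python: a get-or-compute store with per-key expiry. This is a toy in-process stand-in for illustration only; real memcached additionally shares the data across processes and machines:

```python
import time

class ExpiringCache:
    """Toy in-process version of the get/set-with-timeout API
    that memcached (and Flask's caching pattern) exposes."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, timeout=300):
        self._store[key] = (value, time.monotonic() + timeout)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value
```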
