multiple instances of rocksdb - rocksdb

On a multi-core server (or cluster), I want to deploy a RocksDB db on each core, each independent of the others, i.e. I am not looking for a distributed db. Is this possible?
For each in-memory db, does it need to communicate with storage during runtime operations (i.e. not at startup or shutdown), or are all db operations performed in memory?

1) Yes, it's possible. RocksDB is simply a C++ library that you compile into your own code, and that code can manage multiple RocksDB instances on a multi-core server (or cluster). Multiple RocksDB instances can also share the same set of resources (such as the same thread pool) by having them use the same Env (see Options::env):
// Use the specified object to interact with the environment,
// e.g. to read/write files, schedule background work, etc.
// Default: Env::Default()
Env* env;
2) If the directory of your RocksDB instance is in memory (for example, mounted via tmpfs), then all db operations are guaranteed to be performed in memory. To make such an instance persistent, you can optionally have the write-ahead log written to persistent storage such as flash or disk.

Related

Data sharing - SQLite vs Shared Memory IPC

I would like to get your opinion regarding a design implementation for data sharing.
I am working on a Linux embedded device (MIPS, 200 MHz) and I want some sort of data sharing between multiple processes, each of which can read or write multiple parameters at once.
This data holds ~200 string parameters which are updated every second.
Each process may access the data around 10 times per second.
I would very much like to try and make the design efficient (CPU / Mem).
This data is not required to be persistent and will be recreated every reboot.
Currently, I am considering two options:
Using shared memory IPC (SHM) + a semaphore (locking the whole SHM segment).
Using an SQLite in-memory DB.
For either option, I will supply a C interface library which will perform all the logic of DB operation.
For SHM, this means locking/unlocking the semaphore and accessing the parameters, which can be treated as an indexed array.
For SQLite, my library will be a wrapper for the SQLite interface library, so the process will not have to know SQL syntax (some parsing will be needed for queries and replies).
I believe that shared memory is more efficient:
No need to use and parse SQL, and it is accessed as an array.
That said, there are some advantages to using SQLite as well:
Already working and debugged (at the DB level).
Adds flexibility.
Used widely in many embedded systems.
Getting to the point,
Performance-wise, I have no experience with SQLite; I would appreciate it if you could share your opinions and experience.
Thanks
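The shared-memory option described in this question (one lock around an indexed array of fixed-size string slots) can be sketched with Python's standard multiprocessing module. This is a minimal illustration under stated assumptions, not the C library the question describes: the 200 slots, the 64-byte maximum parameter length, and the ParamStore class are all hypothetical, and other processes must attach with the segment's name and use the same Lock object (for example, inherited when they are started through multiprocessing).

```python
from multiprocessing import Lock, shared_memory

# Hypothetical layout: 200 parameters, each in a fixed 64-byte slot.
NUM_PARAMS = 200
SLOT_SIZE = 64


class ParamStore:
    """Indexed array of string parameters in one shared-memory segment,
    protected by a single lock (the "lock the whole SHM" design)."""

    def __init__(self, name=None, create=True, lock=None):
        # size is only used when creating; attaching ignores it.
        self.shm = shared_memory.SharedMemory(
            name=name, create=create, size=NUM_PARAMS * SLOT_SIZE)
        self.lock = lock if lock is not None else Lock()

    def _offset(self, index):
        if not 0 <= index < NUM_PARAMS:
            raise IndexError(f"parameter index {index} out of range")
        return index * SLOT_SIZE

    def write(self, updates):
        """Write several parameters as one batch: updates maps index -> str."""
        encoded = {}
        for index, value in updates.items():
            data = value.encode("utf-8")
            # Values are NUL-terminated, so they must be shorter than a slot.
            if len(data) >= SLOT_SIZE or b"\0" in data:
                raise ValueError(f"value for parameter {index} does not fit")
            encoded[self._offset(index)] = data.ljust(SLOT_SIZE, b"\0")
        with self.lock:  # one critical section for the whole batch
            for start, slot in encoded.items():
                self.shm.buf[start:start + SLOT_SIZE] = slot

    def read(self, indices):
        """Read several parameters as one batch; returns {index: str}."""
        offsets = {index: self._offset(index) for index in indices}
        with self.lock:
            raw = {index: bytes(self.shm.buf[start:start + SLOT_SIZE])
                   for index, start in offsets.items()}
        # Decode outside the lock to keep the critical section short.
        return {index: data.split(b"\0", 1)[0].decode("utf-8")
                for index, data in raw.items()}

    def close(self, unlink=False):
        self.shm.close()
        if unlink:  # only the creating process should unlink the segment
            self.shm.unlink()
```

Because each batch of reads or writes happens under the same lock, a reader never sees a half-updated set of parameters; the cost is that all processes serialize on that one lock, which at roughly 10 accesses per second per process is probably acceptable.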
SQLite's in-memory databases cannot be shared between processes, but you could put the DB file into tmpfs.
However, SQLite does not do any synchronization between processes. It does lock the DB file to prevent update conflicts, but if one process finds the file already locked, it simply gets a "busy" error or, if a busy timeout is configured, sleeps and retries; it is never notified when the lock is released.
For efficient communication between processes, you need to use a mechanism like SHM/semaphores or pipes.
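The tmpfs suggestion and the file-locking behaviour described in this answer can be tried with Python's built-in sqlite3 module. A minimal sketch, assuming /dev/shm is a tmpfs mount (typical on Linux, with a fallback to the ordinary temp directory); two connections in one process stand in for two processes, and the params table and its values are hypothetical.

```python
import os
import sqlite3
import tempfile


def run_demo():
    """Open one SQLite file from two connections (standing in for two
    processes); return (value seen by the second connection, whether the
    second connection's write was refused while the first held the lock)."""
    # Assumption: /dev/shm is a tmpfs mount (typical on Linux); otherwise
    # fall back to the default temp directory so the sketch still runs.
    tmp_dir = "/dev/shm" if os.path.isdir("/dev/shm") else None
    fd, path = tempfile.mkstemp(suffix=".db", dir=tmp_dir)
    os.close(fd)  # SQLite treats the empty file as a new database
    try:
        # timeout= (seconds) sets a busy timeout: SQLite sleeps and retries
        # for up to that long, then raises "database is locked".
        writer = sqlite3.connect(path, timeout=1.0)
        other = sqlite3.connect(path, timeout=0.1)
        writer.execute("CREATE TABLE params (name TEXT PRIMARY KEY, value TEXT)")
        writer.execute("INSERT INTO params VALUES (?, ?)", ("ip", "10.0.0.1"))
        writer.commit()

        # The second connection sees the committed row.
        rows = other.execute(
            "SELECT value FROM params WHERE name = ?", ("ip",)).fetchall()

        # Take the write lock in the first connection ...
        writer.execute("BEGIN IMMEDIATE")
        try:
            # ... so this write waits up to 0.1 s and then fails.
            other.execute("UPDATE params SET value = ? WHERE name = ?",
                          ("10.0.0.2", "ip"))
            blocked = False
        except sqlite3.OperationalError:
            blocked = True
        other.rollback()
        writer.rollback()
        writer.close()
        other.close()
        return rows[0][0], blocked
    finally:
        os.remove(path)
```

The blocked connection is never told when the lock is released; it can only poll, which is why the answer recommends SHM/semaphores or pipes for actual inter-process signalling.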

flask manage db connection :memory:

I have a flask application that needs to store some information from requests. The information is quite short-lived and if the server is restarted I do not need it any more - so I do not really need persistence.
I have read that an SQLite database held in memory can be used for that. What is the best way to manage the database connection? In the Flask documentation, connections to the database are created on demand, but my database will be deleted if I close the connection.
The problem with using an in-memory SQLite database is that it cannot easily be shared: a plain :memory: database is private to the connection that opened it, so it cannot be accessed from multiple threads that each open their own connection.
http://www.sqlite.org/inmemorydb.html
To compound the problem, you are likely going to have more than one process running your app, which rules out an in-memory global variable as well.
So unless you can be certain that your app will only ever require a single thread or a single process (which is unlikely), you're going to need to either:
Use the disk to store state, such as an on-disk sqlite db, or even just some file you parse.
Use a daemonized process that runs separately from your application to manage the state.
I'd personally go with option 2.
You can use memcached for this, running on a central server or even on your app server if you've only got one. This will allow you to store state (including Python objects!) temporarily, in memory, and you can even set timeout values for when the data should expire, which from the sound of things might be useful for your app.
Since you're using Flask, you've got some really good built-in support for using a memcached cache, check it out here: http://flask.pocoo.org/docs/patterns/caching/
As for getting memcached running on your server, it's really just an apt-get or yum install away. Let me know if you have questions or challenges and I'll be happy to update.
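The behaviour behind this question can be checked with Python's built-in sqlite3 module: each connection to ":memory:" gets its own private database, and that database disappears when its connection closes. A minimal sketch; the requests table is hypothetical.

```python
import sqlite3


def table_names(conn):
    """List the user tables visible through a connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return [row[0] for row in rows]


def memory_db_lifetime():
    first = sqlite3.connect(":memory:")
    first.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, info TEXT)")

    # A second ":memory:" connection is a brand-new, empty database.
    second = sqlite3.connect(":memory:")
    in_first, in_second = table_names(first), table_names(second)

    # Closing the only connection to the first database destroys it;
    # reopening ":memory:" does not bring the table back.
    first.close()
    reopened = sqlite3.connect(":memory:")
    after_reopen = table_names(reopened)

    second.close()
    reopened.close()
    return in_first, in_second, after_reopen
```

This is why opening a connection per request, as in the Flask documentation's pattern, would give every request an empty database.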

SQLite shared cache

I have a huge (>10GB) SQLite database that is shared among many (up to the CPU core count) processes (same executable). This is a specialized application, so RAM is not an issue and I want to cache as much of the database in memory as possible. I have found out about PRAGMA cache_size and I am using it successfully, but this blows RAM usage out of proportion because each process has its own private cache.
Now I have found SQLite Shared-Cache Mode, but I can't tell whether it applies to different processes or only to threads within one process. I have run some tests which confirm the latter, but I am not sure whether I am doing something wrong or whether something else needs to be done to make this work.
That page explains that "the same cache can be shared across an entire process".
In theory, you could try to configure your OS so that the entire database is held in the file cache.
If the amount of data in individual queries is small, it might be worthwhile to use a client/server database so that the caching needs to be done only in the server process.
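The per-process cost described in the question follows from PRAGMA cache_size being a per-connection setting: every connection (and therefore every process) gets its own page cache. A minimal sketch using Python's sqlite3 module and a throwaway database file from tempfile (the file and the 200 MB figure are illustrative); a negative cache_size is interpreted as a size in KiB rather than a page count.

```python
import os
import sqlite3
import tempfile


def cache_sizes():
    """Return (cache_size seen by connection a, cache_size seen by b)
    after only connection a has raised its cache size."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        a = sqlite3.connect(path)
        b = sqlite3.connect(path)  # same file, separate private cache
        # Negative N means "about N KiB"; -200000 is roughly 200 MB.
        a.execute("PRAGMA cache_size = -200000")
        result = (a.execute("PRAGMA cache_size").fetchone()[0],
                  b.execute("PRAGMA cache_size").fetchone()[0])
        a.close()
        b.close()
        return result
    finally:
        os.remove(path)
```

With N processes each setting the same cache_size, total cache memory can grow to roughly N times that figure, which matches the blow-up the question reports.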

System.Data.SQLite in-memory database multi-threading

I am creating a System.Data.SQLite in-memory database using the connection string
"Data Source=:memory:"
and want to access this database from multiple threads.
Currently I clone the SQLiteConnection object and pass the copy to worker threads.
But I found that different threads actually get individual instances of in-memory database, not a shared one. How can I share one in-memory database among threads?
Thanks!
Based on the SQLite documentation for in-memory databases, I would try a data source named using the URI filename convention, file::memory:?cache=shared or similar, instead of :memory: (and note specifically the cache name that all connections are being told to use). As explained on that page, every :memory: instance is distinct from every other, exactly as you found.
Note that you may also have to enable shared-cache mode before opening the connections to the in-memory database (as described in the shared-cache documentation, with a call to sqlite3_enable_shared_cache(int)) for this to work.
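The URI approach can be tried outside .NET with Python's built-in sqlite3 module, which accepts URI filenames when uri=True is passed. A minimal sketch, not System.Data.SQLite code: the kv table is hypothetical, each thread opens its own connection (rather than using a cloned one), and the main connection stays open because the shared in-memory database is deleted when its last connection closes.

```python
import sqlite3
import threading

SHARED_URI = "file::memory:?cache=shared"


def worker(results):
    # Each thread opens its own connection to the same shared in-memory DB.
    conn = sqlite3.connect(SHARED_URI, uri=True)
    try:
        results.append(
            conn.execute("SELECT v FROM kv WHERE k = 'greeting'").fetchall())
    finally:
        conn.close()


def share_across_threads():
    # Keep this connection open for the whole demo so the DB survives.
    main = sqlite3.connect(SHARED_URI, uri=True)
    main.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    main.execute("INSERT INTO kv VALUES ('greeting', 'hello')")
    main.commit()

    # A plain ":memory:" connection is a separate, empty database.
    private = sqlite3.connect(":memory:")
    private_tables = private.execute("SELECT name FROM sqlite_master").fetchall()
    private.close()

    results = []
    thread = threading.Thread(target=worker, args=(results,))
    thread.start()
    thread.join()
    main.close()
    return private_tables, results
```

The worker thread sees the row committed by the main thread, while the plain ":memory:" connection sees nothing, which is the difference the question ran into.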

Can I achieve scalable multi-threaded access to an in-memory SQLite database

I have a multi-threaded Linux C++ application that needs a high performance reference data lookup facility. I have been looking at using an in-memory SQLite database for this but can't see a way to get this to scale in my multi-threaded environment.
The default threading mode (serialized) seems to suffer from a single coarse grained lock even when all transactions are read only. Moreover, I don't believe I can use multi-thread mode because I can't create multiple connections to a single in-memory database (because every call to sqlite3_open(":memory:", &db) creates a separate in-memory database).
So what I want to know is: is there something I've missed in the documentation and it is possible to have multiple threads share access to the same in-memory database from my C++ application.
Or is there an alternative to SQLite that I should be considering?
Yes!
See the following, extracted from the documentation at:
http://www.sqlite.org/inmemorydb.html
But it's not a direct connection to the database memory; it goes through the shared cache instead. It's a workaround (see the picture).
In-memory Databases And Shared Cache
In-memory databases are allowed to use shared cache if they are opened using a URI filename. If the unadorned ":memory:" name is used to specify the in-memory database, then that database always has a private cache and is thus only visible to the database connection that originally opened it. However, the same in-memory database can be opened by two or more database connections as follows:
rc = sqlite3_open("file::memory:?cache=shared", &db);
Or,
ATTACH DATABASE 'file::memory:?cache=shared' AS aux1;
This allows separate database connections to share the same in-memory database. Of course, all database connections sharing the in-memory database need to be in the same process. The database is automatically deleted and memory is reclaimed when the last connection to the database closes.
If two or more distinct but shareable in-memory databases are needed in a single process, then the mode=memory query parameter can be used with a URI filename to create a named in-memory database:
rc = sqlite3_open("file:memdb1?mode=memory&cache=shared", &db);
Or,
ATTACH DATABASE 'file:memdb1?mode=memory&cache=shared' AS aux1;
When an in-memory database is named in this way, it will only share its cache with another connection that uses exactly the same name.
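The documented behaviour quoted above (connections that use the same name share one in-memory database, different names stay separate, and the database is freed when its last connection closes) can be checked with Python's sqlite3 module. A minimal sketch; memdb1 and memdb2 follow the documentation's example naming, and the ref table is hypothetical.

```python
import sqlite3

MEMDB1 = "file:memdb1?mode=memory&cache=shared"
MEMDB2 = "file:memdb2?mode=memory&cache=shared"


def tables(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return [row[0] for row in rows]


def named_memory_dbs():
    a = sqlite3.connect(MEMDB1, uri=True)
    b = sqlite3.connect(MEMDB1, uri=True)      # same name: same database
    other = sqlite3.connect(MEMDB2, uri=True)  # different name: separate DB
    a.execute("CREATE TABLE ref (k INTEGER PRIMARY KEY, v TEXT)")
    seen_by_b, seen_by_other = tables(b), tables(other)

    # Closing the last connection to memdb1 frees it; reopening the same
    # name afterwards yields a fresh, empty database.
    a.close()
    b.close()
    again = sqlite3.connect(MEMDB1, uri=True)
    after_close = tables(again)

    again.close()
    other.close()
    return seen_by_b, seen_by_other, after_close
```

In a multi-threaded C++ application the same pattern applies: each thread calls sqlite3_open with the identical URI name, and at least one connection must stay open for the data to persist.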
No, with SQLite you cannot access the same in-memory database from different threads. That's by design. More info at SQLite documentation.

Resources