Multiple simultaneous threads using SQLite in R [closed] - r

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
My question is basically whether it is safe, when using parallel processing in R, to have multiple threads accessing an SQLite database simultaneously.
I understand that SQLite is a file-level database, so every connection gets access to the whole database. It is therefore possible to have multiple connections open simultaneously (e.g., via the SQLite3 front end and, in R, via RSQLite's dbConnect() and via dplyr's src_sqlite()). I guess that this is OK so long as there is a single user who can ensure that commands submitted one way are completed before other commands are submitted.
But with multithreading, it would seem possible that one thread might submit a command to an SQLite db while a command submitted by another thread might not have completed.
Does the underlying SQLite engine serialize the commands it receives, so that one command is guaranteed to complete before the next one is processed and the database cannot be left in an inconsistent state?
I have read the SQLite documentation on locking and "ACID," and as I understand this documentation, the answer appears to be "Yes."
But I want to be sure that I have understood things correctly.
Another question is whether it is safe to have separate threads submitting commands simultaneously that actually change the database.
Since one can't control the exact timing by which the two threads submit their commands, I assume that using parallel processes that might change an SQLite data table in an inconsistent way would not be a good idea -- e.g., having one thread insert a record into a table and another thread doing a SELECT on the same table.

Concurrent reads are fine, but writing to the database locks it for at least a few milliseconds. If you try to read while it is writing (or write while it is writing), an error is returned (SQLITE_BUSY, reported as "database is locked"), which you can use to decide whether to retry the read or write operation. If this is for a relatively simple process, you should be fine with sqlite3.
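A minimal sketch of the retry idea described above, written in Python with the standard-library sqlite3 module rather than R (the locking behaviour comes from SQLite itself, not from the host language). The helper name execute_with_retry, the back-off values and the log table are illustrative assumptions. It shows a second connection being refused while another connection holds the write lock, then succeeding on retry once the lock is released.

```python
import os
import sqlite3
import tempfile
import time


def execute_with_retry(conn, sql, params=(), attempts=5, delay=0.05):
    """Hypothetical helper: run one statement, retrying while the database is locked."""
    for attempt in range(attempts):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            # Python reports SQLITE_BUSY as "database is locked".
            if "locked" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))  # simple linear back-off


path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None: autocommit, so transactions are controlled explicitly.
# timeout=0: fail immediately on a lock instead of waiting inside SQLite.
writer = sqlite3.connect(path, timeout=0, isolation_level=None)
other = sqlite3.connect(path, timeout=0, isolation_level=None)
writer.execute("CREATE TABLE log (msg TEXT)")

writer.execute("BEGIN IMMEDIATE")  # take the write lock
writer.execute("INSERT INTO log VALUES ('first')")

try:
    other.execute("INSERT INTO log VALUES ('second')")
    blocked = False
except sqlite3.OperationalError:
    blocked = True  # the second writer is refused while the lock is held

writer.execute("COMMIT")  # release the lock
execute_with_retry(other, "INSERT INTO log VALUES ('second')")
rows = [r[0] for r in other.execute("SELECT msg FROM log ORDER BY rowid")]
```

In practice, passing a non-zero timeout to sqlite3.connect() lets SQLite do the waiting itself; the explicit loop is only there to make the retry visible.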

Related

ASP.Net MVC and Database Connections [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I currently have an ASP.NET MVC application that uses a static class to connect to the database. In some tests, I noticed that when the application is used from several different sessions, it gets extremely slow. With only one session it works acceptably, but from 3 sessions onward it becomes unusable.
The static connection is opened in Application_Start in Global.asax. I believe the slowness is caused by all sessions competing for the same connection, right?
Given this, I have decided to change how it works. I am considering two approaches and would like an opinion on which is better:
1) Establish one connection per session, started in Global.asax. However, I am afraid that, because the application executes certain actions almost simultaneously, this approach will also be slow at times.
2) Establish a connection for each database query: instantiate the connection, open it, execute the action, and close it. But again, because of the high number of actions the application performs when loading some pages, I'm afraid of exhausting the connection pool by working this way.
Can you help me? Do you know of another approach that could be used?
Currently we are using ADO.NET. In some tests with NHibernate, we saw the same enormous slowness even with just one user.
Thank you for your attention.
(Translated Post)
You can try to use Entity Framework.
http://www.entityframeworktutorial.net/what-is-entityframework.aspx
https://learn.microsoft.com/en-us/aspnet/mvc/overview/getting-started/getting-started-with-ef-using-mvc/creating-an-entity-framework-data-model-for-an-asp-net-mvc-application
Assuming that you have already created the application (ASP.NET MVC) and the database, open Visual Studio, select the Models folder > Add > New Item, and then choose ADO.NET Entity Data Model. The Entity Data Model Wizard will appear. In that window choose the first option, EF Designer from Database. Click Next, add the database connection parameters (database server, database name), click Next, select the database objects you want to use (tables, stored procedures), and click Finish.
Hope it helps!

When is the session abandoned during a typical session? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I'm trying to figure out an issue I'm having with Sitecore. I'm wondering if my issue is basically a problem with their reliance on Session.Abandon():
For performance reasons Sitecore only writes contact data to xDB (this is mongo) when the session ends.
This logic seems somewhat flawed (unless I misunderstand how sessions are managed in ASP.NET).
At what point (without explicitly calling Session.Abandon()) is the session flushed in this model? That is, when will the session_end event be triggered?
Can you guarantee that the logic will always be called, or can sessions be terminated without triggering an Abandon event, for example when the app_pool is recycled?
I'm trying to figure this out because it would explain something I'm experiencing: the data is fine in session but is only written intermittently to MongoDB.
I think the strategy of building the data in session and then flushing it to MongoDB is a good fit for xDB.
xDB is designed to be high volume, so it makes sense for the data to be aggregated rather than constantly written into a database table. Constant writing is how DMS worked previously, and it doesn't scale very well.
The session end is, in my opinion, pretty reliable, and Sitecore gives you various options for persisting session state (InProc, MongoDB, SQL Server); MongoDB and SQL Server are recommended for production environments. You can write contact data directly to MongoDB by using the Contact Repository API, but for live capturing of data you should use the Tracker API. When using the Tracker API, as far as I am aware, the only way to get data into MongoDB is to flush the session.
If you need to flush the data to xDb for testing purposes then Session.Abandon() will work. I have a module here which you can use for creating contacts and then flushing the session, so you can see how reliable the session abandon is by checking in MongoDb.
https://marketplace.sitecore.net/en/Modules/X/xDB_Contact_Creator.aspx

Why use Process ID (PID)? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I learned how you can list all the processes and their id's using either of these commands:
ps
tasklist
But so far the only use I have seen for a process ID is killing the process. Given a PID, is there any other purpose?
ps (process status) lists the currently executing processes by owner and PID (process ID) on Linux.
Uses of ps:
To display system processes
To force certain actions, such as forcibly logging off a user or killing a certain process
The /proc directory contains subdirectories with unusual numerical names. Every one of these names maps to the process ID of a currently running process. Within each of these subdirectories, there are a number of files that hold useful information about the corresponding process.
To find painfully slow processes
To identify the top processes by CPU and memory usage
To find the hierarchy of relationships between processes
Combining ps with the watch command turns it into a real-time reporting tool
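As a small illustration of uses of a PID other than killing a process, here is a Python sketch that checks whether a process exists and reads its entry under /proc. It assumes a POSIX system (the /proc part applies to Linux only), and the helper name is_running is made up for this example.

```python
import os

pid = os.getpid()       # PID of this Python process
parent = os.getppid()   # PID of the process that started it


def is_running(target_pid):
    """Hypothetical helper: True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(target_pid, 0)  # signal 0 performs the checks but delivers nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


alive = is_running(pid)

# On Linux, /proc/<pid>/ exposes per-process details as plain files.
status_path = f"/proc/{pid}/status"
name_line = None
if os.path.exists(status_path):
    with open(status_path) as fh:
        name_line = fh.readline().strip()  # first line holds the process name
```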

How do I prevent SQLite database locks?

From the SQLite FAQ I learned that:
Multiple processes can have the same database open at the same time.
Multiple processes can be doing a SELECT at the same time. But only
one process can be making changes to the database at any moment in
time, however.
So, as far as I understand I can:
1) Read db from multiple threads (SELECT)
2) Read db from multiple threads (SELECT) and write from single thread (CREATE, INSERT, DELETE)
But I also read that Write-Ahead Logging provides more concurrency, as readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently.
Finally, I got completely muddled when I found the following:
Here are other reasons for getting an SQLITE_LOCKED error:
Trying to CREATE or DROP a table or index while a SELECT statement is
still pending.
Trying to write to a table while a SELECT is active on that same table.
Trying to do two SELECT on the same table at the same time in a
multithread application, if sqlite is not set to do so.
fcntl(3,F_SETLK call on DB file fails. This could be caused by an NFS locking
issue, for example. One solution for this issue, is to mv the DB away,
and copy it back so that it has a new Inode value
So I would like to clarify for myself: when do I need to avoid locks? Can I read and write at the same time from two different threads? Thanks.
For those who are working with Android API:
Locking in SQLite is done on the file level which guarantees locking
of changes from different threads and connections. Thus multiple
threads can read the database however one can only write to it.
More on locking in SQLite can be read at SQLite documentation but we are most interested in the API provided by OS Android.
Two concurrent threads can write either through a single database connection or through multiple connections. Since only one thread can write to the database at a time, there are two cases:
If you write from two threads over one connection, one thread will wait for the other to finish writing.
If you write from two threads over different connections, an error occurs: not all of your data will be written to the database, and the application will be interrupted with SQLiteDatabaseLockedException. It follows that the application should always have only one instance of SQLiteOpenHelper (just one open connection); otherwise SQLiteDatabaseLockedException can occur at any moment.
Different Connections At a Single SQLiteOpenHelper
Everyone is aware that SQLiteOpenHelper has 2 methods providing access to the database getReadableDatabase() and getWritableDatabase(), to read and write data respectively. However in most cases there is one real connection. Moreover it is one and the same object:
SQLiteOpenHelper.getReadableDatabase()==SQLiteOpenHelper.getWritableDatabase()
This means it makes no difference which of the two methods is used to read the data. However, there is another, undocumented issue that is more important: the class SQLiteDatabase has its own locks (the variable mLock). Writes are locked at the level of the SQLiteDatabase object, and since there is only one SQLiteDatabase instance for both reading and writing, reads are blocked as well. This is most visible when writing a large volume of data in a transaction.
Consider an application that has to download a large volume of data (approx. 7000 rows containing BLOBs) in the background on first launch and save it to the database. If the data is saved inside a single transaction, saving takes approx. 45 seconds, but the user cannot use the application because all read queries are blocked. If the data is saved in small portions, the update drags on for a rather long time (10-15 minutes), but the user can use the application without any restrictions or inconvenience. It is a double-edged sword: either fast or convenient.
Google has already fixed some of the issues related to SQLiteDatabase functionality by adding the following methods:
beginTransactionNonExclusive() – creates a transaction in the "IMMEDIATE mode".
yieldIfContendedSafely() – temporarily yields the transaction in order to allow other threads to complete their tasks.
isDatabaseIntegrityOk() – checks the database integrity.
Please read in more details in the documentation.
However for the older versions of Android this functionality is required as well.
The Solution
First, locking should be turned off so that data can be read in any situation.
SQLiteDatabase.setLockingEnabled(false);
This disables the internal query locking at the logic level of the Java class (it is not related to locking in terms of SQLite).
SQLiteDatabase.execSQL("PRAGMA read_uncommitted = true;");
This allows reading data from the cache; in effect, it changes the isolation level. The parameter must be set anew for each connection. If there are several connections, it affects only the connection that issues the command.
SQLiteDatabase.execSQL("PRAGMA synchronous=OFF");
This changes the way data is written to the database, so that it happens without "synchronization". With this option the database can be damaged if the system fails unexpectedly or the power goes off. However, according to the SQLite documentation, some operations run up to 50 times faster with synchronous set to OFF.
Unfortunately, not every PRAGMA is supported on Android; for example, "PRAGMA locking_mode = NORMAL", "PRAGMA journal_mode = OFF" and some others are not. An attempt to call such a PRAGMA makes the application fail.
The documentation for the method setLockingEnabled says that it is recommended only if you are sure that all work with the database is done from a single thread. We must therefore guarantee that only one transaction is open at any time. Also, instead of the default (exclusive) transaction, an immediate transaction should be used. In older versions of Android (below API 11) the Java wrapper offers no way to create an immediate transaction, but SQLite itself supports it. To start a transaction in immediate mode, execute the following SQLite query directly against the database, for example through the method execSQL:
SQLiteDatabase.execSQL("begin immediate transaction");
Since the transaction is started by a direct query, it should be finished the same way:
SQLiteDatabase.execSQL("commit transaction");
Then the only thing left to implement is a TransactionManager that starts and finishes transactions of the required type. The purpose of the TransactionManager is to guarantee that all queries that make changes (insert, update, delete, DDL queries) originate from the same thread.
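The answer does not show the TransactionManager itself. As a rough illustration of the idea only, and not the Android API, here is a Python sketch using the standard-library sqlite3 module. The class and method names are hypothetical, and it uses a lock to admit one writer at a time, which is a simpler substitute for routing every write through one dedicated thread. Each write block is wrapped in BEGIN IMMEDIATE and then COMMIT, or ROLLBACK on failure.

```python
import sqlite3
import threading
from contextlib import contextmanager


class TransactionManager:
    """Hypothetical helper: one writer at a time, each in an IMMEDIATE transaction."""

    def __init__(self, path):
        # check_same_thread=False lets several threads share the connection;
        # the lock below makes sure only one of them uses it at a time.
        self._conn = sqlite3.connect(
            path, isolation_level=None, check_same_thread=False
        )
        self._lock = threading.Lock()

    @contextmanager
    def write(self):
        with self._lock:
            self._conn.execute("BEGIN IMMEDIATE")
            try:
                yield self._conn
            except Exception:
                self._conn.execute("ROLLBACK")
                raise
            else:
                self._conn.execute("COMMIT")

    def query(self, sql, params=()):
        with self._lock:
            return self._conn.execute(sql, params).fetchall()


manager = TransactionManager(":memory:")
with manager.write() as db:
    db.execute("CREATE TABLE items (name TEXT)")
    db.execute("INSERT INTO items VALUES ('a')")

try:
    with manager.write() as db:
        db.execute("INSERT INTO items VALUES ('b')")
        raise RuntimeError("simulated failure")  # forces a rollback
except RuntimeError:
    pass

names = [row[0] for row in manager.query("SELECT name FROM items")]
```

The failed block leaves no trace: only the row from the committed transaction remains.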
Hope this helps the future visitors!!!
Not specific to SQLite:
1) Write your code to gracefully handle the situation where you get a locking conflict at the application level; even if you wrote your code so that this is 'impossible'. Use transactional re-tries (ie: SQLITE_LOCKED could be one of many codes that you interpret as "try again" or "wait and try again"), and coordinate this with application-level code. If you think about it, getting a SQLITE_LOCKED is better than simply having the attempt hang because it's locked - because you can go do something else.
2) Acquire locks. But you have to be careful if you need to acquire more than one. For each transaction at the application level, acquire all of the resources (locks) you will need in a consistent (ie: alphabetical?) order to prevent deadlocks when locks get acquired in the database. Sometimes you can ignore this if the database will reliably and quickly detect the deadlocks and throw exceptions; in other systems it may just hang without detecting the deadlock - making it absolutely necessary to take the effort to acquire the locks correctly.
Besides the facts of life with locking, you should try to design the data and in-memory structures with concurrent merging and rolling back planned in from the beginning. If you can design data such that the outcome of a data race gives a good result for all orders, then you don't have to deal with locks in that case. A good example is to increment a counter without knowing its current value, rather than reading the value and submitting a new value to update. It's similar for appending to a set (ie: adding a row, such that it doesn't matter which order the row inserts happened).
A good system is supposed to transactionally move from one valid state to the next, and you can think of exceptions (even in in-memory code) as aborting an attempt to move to the next state; with the option to ignore or retry.
You're fine with multithreading. The page you link lists what you cannot do while you're looping on the results of your SELECT (i.e. your select is active/pending) in the same thread.

SQLite and concurrency [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
The community reviewed whether to reopen this question last month and left it closed:
Original close reason(s) were not resolved
I need to integrate a database into one of our products, and I wonder which one would best suit our needs (easy automatic deployment, no administration, good performance). SQLite seems to be a good solution. The problem is that the database could potentially face high concurrency: it is accessed through PHP (Apache) each time a client connects to the server the database is running on. A client connects to the server (and executes an INSERT query) approximately every 10 seconds, and there could be more than 100 clients running.
When executing an INSERT query, SQLite locks the entire database for some period of time. Is there a way to compute that duration? If this is not possible, do you think SQLite (v3.3.7) is still suitable under the above conditions?
I try to avoid emotive replies and hyperbole but I am truly astonished at the lack of knowledge about sqlite displayed on this page. Different database implementations serve different needs and from the operational specs you provide, sqlite3 seems ideal for your needs. To elaborate:
sqlite3 is fully ACID compliant, meaning it ensures atomic commits, which is something neither MySQL (good as it may be) nor Oracle can brag about. See more here
Also, sqlite3 has a deceptively simple mechanism for ensuring maximum concurrency (which is also thread-safe) as described in their File locking and Concurrency document.
By their (the sqlite3 developers') own estimation, sqlite3 is capable of up to 50,000 INSERTs per second, a theoretical maximum. Transaction throughput, however, is limited by disk rotation speed: ACID compliance requires sqlite3 to confirm that a database commit has been written to disk, so an INSERT, UPDATE or DELETE transaction requires two full disk rotations, effectively reducing the number of transactions to 60/s on a 7200rpm disk drive. This is outlined in the sqlite FAQ linked in another answer, and it gives some idea of the engine's data throughput capability in production. But what about concurrent reading and writing?
The File locking and Concurrency document linked earlier explains how sqlite3 avoids "writer starvation", a condition whereby heavy database read access prevents a process/thread seeking to write to the database from acquiring a lock. The escalation of locking state from SHARED to PENDING to EXCLUSIVE happens as sqlite3 encounters an INSERT (or UPDATE or DELETE) statement and then again upon COMMIT, meaning that the full database lock is delayed to the last moment before an actual write is performed. The outcome of sqlite's clever file-locking mechanism is that, should a writer join the queue (PENDING lock), existing reads (SHARED locks) will complete, the writer process will be granted an EXCLUSIVE lock, and reading will then resume. This takes only a few milliseconds, meaning that the effective transaction throughput will hardly move from the 60/s rate quoted above.
I believe the default sqlite3 WAIT on an EXCLUSIVE lock is 3 seconds, so given the fact that 60 transactions per second is a reasonable expectation and that you seek to write to the database on average once every 10 seconds - I'd say sqlite3 is well up to the task and will only require the introduction of clustering once your traffic increases by a factor of 500.
Not bad and perfect for your requirement.
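The arithmetic behind that estimate, worked through in Python as a rough check. The two-rotations-per-commit figure is the one quoted above. The question can be read two ways, so both are shown; which one matches the real workload is an assumption.

```python
# Ceiling quoted above: two full disk rotations per committed transaction.
rpm = 7200
rotations_per_second = rpm / 60                 # 120.0
max_tx_per_second = rotations_per_second / 2    # 60.0
capacity_per_10s = max_tx_per_second * 10       # 600.0 commits per 10-second window

# Reading 1 (assumption): each of 100 clients inserts once every 10 seconds.
inserts_per_10s_all_clients = 100
headroom_all_clients = capacity_per_10s / inserts_per_10s_all_clients   # 6.0

# Reading 2 (the answer's reading): one insert every 10 seconds overall.
inserts_per_10s_single = 1
headroom_single = capacity_per_10s / inserts_per_10s_single             # 600.0
```

Even under the heavier reading, the quoted ceiling leaves roughly a sixfold margin, provided each INSERT is its own short transaction.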
I don't think that SQLite would be a good solution for those requirements. SQLite is designed for local and lightweight use only, not to serve hundreds of requests.
I would recommend some other solution, for example MySQL or PostgreSQL, both can be scripted quite well. So, if I were you, I would put my efforts into the setup scriptings.
To avoid a flame war between SQLite believers and haters, let me draw your attention to the often-cited SQLite When-To-Use document (I believe it is considered a credible source). It states the following:
Situations Where A Client/Server RDBMS May Work Better
High Concurrency
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.
I think the question describes a workload with many writes, and if the OP went for SQLite, the result would be a non-scalable solution.
Here is what SQLite has to say about appropriate uses of SQLite: http://www.sqlite.org/whentouse.html In particular, that page says SQLite is good for low to medium traffic sites, exactly the sort of application that you're contemplating.
Seems like SQLite would work for you, unless you expect substantial growth. Depending on what you do in each request, I would expect that a query rate of 0.17 queries per second to be well within SQLite's capabilities.
For a good user experience, you should design your site so that the queries needed to service a single request take about 200 milliseconds. To achieve this, result sets should probably not touch more than a few score rows, and should rely on indexes, not full table scans. If you hit that, then you'll have enough headroom to serve 5 queries per second (at peak). That's 30x the requirement that you state in your question.
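To check whether a query relies on an index rather than a full table scan, SQLite's EXPLAIN QUERY PLAN can be used. A small Python sketch follows; the clients table, its columns and the index name are invented for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, last_seen TEXT)"
)
conn.executemany(
    "INSERT INTO clients (name, last_seen) VALUES (?, ?)",
    [(f"client{i}", "2024-01-01") for i in range(1000)],
)

query = "SELECT last_seen FROM clients WHERE name = ?"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("client42",))
    return " | ".join(row[-1] for row in rows)


before = plan(query)   # no index on name yet: the plan is a table scan
conn.execute("CREATE INDEX idx_clients_name ON clients (name)")
after = plan(query)    # the plan now names idx_clients_name
```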
The SQLite FAQ covers this topic: http://www.sqlite.org/faq.html (See: "(5) Can multiple applications or multiple instances of the same application access a single database file at the same time?")
But for your particular use, you'd probably want to do some stress testing to verify it'll meet your needs. 100 concurrent users might be a bit much for SQLite.
I read through the FAQs, and it seems that SQLite has some pretty decent support for concurrency, but may require the use of transactions to be sure things are going to go well.
The comments above regarding Apache concurrency are correct: one Apache server can serve multiple requests, and the number depends on how many processes are run. On most of the servers I run, it's set to 3-5 processes, while on larger installations it might reach 20. The point here is that SQLite can more than handle a small to medium traffic web site, as it can do thousands of inserts a second.
I plan on using SQLite for my current project, but to be safe, I fully intend on using BEGIN TRANSACTION and COMMIT for any writing or concurrency-sensitive parts.
Bottom line, as usual, Read The Manual.
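A minimal sketch of wrapping writes in a transaction, shown in Python with the standard-library sqlite3 module (the question uses PHP, where the same BEGIN TRANSACTION and COMMIT statements apply). The orders table is an invented example. Using the connection as a context manager commits when the block succeeds and rolls back when it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
conn.commit()

# "with conn:" commits if the block succeeds and rolls back if it raises.
with conn:
    conn.execute("INSERT INTO orders (item) VALUES ('widget')")
    conn.execute("INSERT INTO orders (item) VALUES ('gadget')")

try:
    with conn:
        conn.execute("INSERT INTO orders (item) VALUES ('gizmo')")
        conn.execute("INSERT INTO orders (item) VALUES (NULL)")  # violates NOT NULL
except sqlite3.IntegrityError:
    pass  # both inserts from this block were rolled back together

items = [row[0] for row in conn.execute("SELECT item FROM orders ORDER BY id")]
```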
In addition to the discussion above about sqlite3, one more awesome feature, introduced in SQLite version 3.7.0, is WAL mode, in which multiple processes can read while one process writes at the same time.
Have a look at http://www.sqlite.org/wal.html
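A small Python sketch of WAL mode in action, assuming a database file on a local filesystem (WAL needs shared-memory support, so it may not be available on network shares). While the writer holds an open transaction, a second connection can still read and sees the last committed state.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wal_demo.db")

writer = sqlite3.connect(path, isolation_level=None)
# The PRAGMA returns the journal mode that is actually in effect afterwards.
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE events (msg TEXT)")

# timeout=0: the reader would fail immediately if it were blocked.
reader = sqlite3.connect(path, timeout=0, isolation_level=None)

writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO events VALUES ('pending')")

# The reader is not blocked; it sees the last committed snapshot.
seen_during_write = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]

writer.execute("COMMIT")
seen_after_commit = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

There is still only one writer at a time in WAL mode; what changes is that readers and the writer no longer block each other.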
