DynamoDB Pessimistic locking with 2 Application - amazon-dynamodb

I have a use case with many different applications.
For the sake of simplicity, let's assume I have 2 applications: App-A and App-B.
App-A updates a DynamoDB table, and to keep the data consistent I was thinking of using pessimistic locking in App-A.
App-B only reads the DynamoDB table (other attributes, which are not updated by App-A), hence it does not require locking.
Also, App-B is a front-end application and latency is an issue, so I don't want App-B to acquire locks the way App-A does.
Can I use DynamoDB locks in App-A to perform its consistent operations and NOT use DynamoDB locks in App-B?

DynamoDB does not support pessimistic locking out of the box. You can implement pessimistic locking using the optimistic lock and atomic operation primitives provided by DynamoDB but at the cost of latency and additional consumed capacity.
For your use case, it sounds like there is a client (an application) that only reads data, so in this case there is no need for locks.
For the other client, which makes updates, it is better if you can design it to use optimistic locking: it reads the respective data from the table, attempts to update it, and if the update fails the consistency check it re-processes the operation. The reprocessing part depends on what the client is doing, but the general pattern is similar in most cases:
1) read data from the table; make a note of the version of the data
2) process the data and attempt to update the table, with a condition check based on the version read in #1
3) if the update succeeds, there is nothing left to do
4) if the update fails because the version condition check fails, go back to #1 and retry
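The retry pattern above can be sketched as follows. This is a minimal sketch and not DynamoDB code: `VersionedTable`, `put_if_version` and `update_with_retry` are hypothetical names, and the in-memory class only stands in for a table that supports conditional writes on a version attribute.

```python
import threading


class VersionConflict(Exception):
    """Raised when the stored version no longer matches the one we read."""


class VersionedTable:
    """Hypothetical in-memory stand-in for a table with conditional writes."""

    def __init__(self):
        self._items = {}                 # key -> (version, value)
        self._mutex = threading.Lock()   # makes check-and-write atomic

    def get(self, key):
        with self._mutex:
            return self._items.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        with self._mutex:
            current_version, _ = self._items.get(key, (0, None))
            if current_version != expected_version:
                raise VersionConflict(key)
            self._items[key] = (current_version + 1, value)


def update_with_retry(table, key, transform, max_attempts=10):
    for _ in range(max_attempts):
        version, value = table.get(key)       # read and note the version
        new_value = transform(value)          # process the data
        try:
            table.put_if_version(key, version, new_value)
            return new_value                  # update succeeded
        except VersionConflict:
            continue                          # someone else won; start over
    raise RuntimeError("gave up after repeated version conflicts")
```

For example, `update_with_retry(table, "order-1", lambda v: (v or 0) + 1)` increments a value without ever holding a lock between the read and the write; only the final conditional write is atomic.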
If you're still curious about implementing pessimistic locking in DynamoDB, have a look at this post: https://aws.amazon.com/blogs/database/building-distributed-locks-with-the-dynamodb-lock-client/. It goes into the design of a distributed lock, based on DynamoDB, that can be used by two or more clients to synchronize access to any resource (not just data in a DynamoDB table).

Related

Cosmos DB - thread safe pattern to allocate an 'available' document to reach request

For example if I was building an airline booking system and all of my seats were individual documents in a cosmos container with PartitionKey of the FlightNumber_DepartureDateTime e.g. UAT123_20220605T1100Z and id of SeatNumber eg. 12A.
A request comes in to allocate a single seat (any seat without a preference).
I want to be able to query the cosmos container for seats where allocated: false and allocate the first one to the request by setting allocated: true allocatedTo:ticketReference. But I need to do this in a thread safe way so that no two requests get the same seat.
Does Cosmos DB (SQL API) have a standard pattern to solve this problem?
The solution I thought of was to query a document and then update it by checking its Etag and if another thread got in first then the update would fail. If it fails then query another document and keep trying until I can successfully update it to claim the seat for this thread.
Is there a better way?
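The ETag idea described above can be sketched like this. It is only an illustration under stated assumptions: `SeatStore`, `query_unallocated`, `replace_if_match` and `allocate_any_seat` are hypothetical names, and the in-memory class stands in for one flight's partition; it is not the Cosmos DB SDK.

```python
import threading


class SeatStore:
    """Hypothetical in-memory stand-in for one flight's partition."""

    def __init__(self, seat_ids):
        self._lock = threading.Lock()
        # seat id -> document; "etag" changes on every successful write
        self._docs = {
            s: {"id": s, "allocated": False, "allocatedTo": None, "etag": 0}
            for s in seat_ids
        }

    def query_unallocated(self):
        with self._lock:
            return [dict(d) for d in self._docs.values() if not d["allocated"]]

    def replace_if_match(self, doc, etag):
        """Write doc only if the stored etag still equals the one we read."""
        with self._lock:
            if self._docs[doc["id"]]["etag"] != etag:
                return False
            self._docs[doc["id"]] = {**doc, "etag": etag + 1}
            return True


def allocate_any_seat(store, ticket_reference):
    while True:
        candidates = store.query_unallocated()
        if not candidates:
            return None                       # no free seat left
        for seat in candidates:
            claimed = {**seat, "allocated": True, "allocatedTo": ticket_reference}
            if store.replace_if_match(claimed, seat["etag"]):
                return seat["id"]             # this request owns the seat
        # every candidate was taken in the meantime; query again
```

A request that loses the race on one seat simply moves on to the next candidate, so two requests can never end up with the same seat.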
You could achieve this by using transactions. Cosmos DB allows you to write stored procedures that are executed in an atomic transaction, basically serializing concurrent seat reservation operations for you within a logical partition.
Quote from "Benefits of using server-side programming" in the link above:
Atomic transactions: Azure Cosmos DB database operations that are
performed within a single stored procedure or a trigger are atomic.
This atomic functionality lets an application combine related
operations into a single batch, so that either all of the operations
succeed or none of them succeed.
Bear in mind though that transactions come with a cost. They limit scalability of those operations. However in your scenario when you partition data per flight and given that those operations are very fast, this might be the preferable and most reliable option.
I have done something similar with Service Bus queues, essentially allowing you to queue bookings to be saved, therefore you can do the availability logic before you save the booking guaranteeing no overbookings.
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-queues-topics-subscriptions

How does BaseX handle concurrency?

I'm looking at using BaseX as a more flexible database.
How does it handle database concurrency? How does it work in a web app scenario, where two different users could update the same data and effectively get a "dirty read"?
How does it work in a web app scenario, where two different users could update the same data and effectively get a "dirty read"?
Rest assured: transactions are isolated from each other, so update anomalies cannot occur.
How does it handle database concurrency?
Have a look at the BaseX wiki page about transaction management, where the approach is described in detail. Disclaimer: I implemented the newer database locking for BaseX during my thesis work, so I'm involved in the project.
BaseX applies several mechanisms to prevent colliding transactions. The old process locking (which can still be enabled using the GLOBALLOCK option) simply denies multiple queries within a process. Parallel execution could be achieved by using multiple database instances, while basic isolation was provided by per-database file system locks (without any guarantees regarding deadlocks, ...).
The newer database locking isolates parallel transactions by applying two-phase locking at the database level. Two queries run in parallel provided they access different databases; otherwise one of them has to wait. A drawback is that, because we want to support deadlock-free execution, we went for strict two-phase locking, which fetches all database locks before the query executes. This carries a penalty: determining which databases will be accessed is rather difficult in a dynamic language such as XQuery, so the query often falls back to a global lock on all databases.
For the future (time permitting; no schedule is set), some optimizations are queued up, especially relaxing the strictness of two-phase locking and adding the optimistic concurrency control I already evaluated in my thesis, which would bring large gains in parallel execution, especially for web application scenarios.

How do I prevent SQLite database locks?

From the SQLite FAQ I learned that:
Multiple processes can have the same database open at the same time.
Multiple processes can be doing a SELECT at the same time. But only
one process can be making changes to the database at any moment in
time, however.
So, as far as I understand I can:
1) Read db from multiple threads (SELECT)
2) Read db from multiple threads (SELECT) and write from single thread (CREATE, INSERT, DELETE)
But, I read about Write-Ahead Logging that provides more concurrency as readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently.
Finally, I got completely muddled when I found this:
Here are other reasons for getting an SQLITE_LOCKED error:
Trying to CREATE or DROP a table or index while a SELECT statement is
still pending.
Trying to write to a table while a SELECT is active on that same table.
Trying to do two SELECT on the same table at the same time in a
multithread application, if sqlite is not set to do so.
An fcntl(3, F_SETLK) call on the DB file fails. This could be caused by an NFS locking
issue, for example. One solution for this issue is to mv the DB away
and copy it back so that it has a new inode value.
So, I would like to clarify for myself: when do I need to avoid the locks? Can I read and write at the same time from two different threads? Thanks.
For those who are working with Android API:
Locking in SQLite is done on the file level which guarantees locking
of changes from different threads and connections. Thus multiple
threads can read the database, but only one can write to it.
More on locking in SQLite can be found in the SQLite documentation, but here we are most interested in the API provided by Android.
Writing from two concurrent threads can be done either through a single database connection or through multiple connections. Since only one thread can write to the database at a time, there are two cases:
If you write from two threads over one connection, then one thread will
wait for the other to finish writing.
If you write from two threads over different connections, then an error
occurs: not all of your data will be written to the database and
the application will be interrupted with
SQLiteDatabaseLockedException. It follows that the
application should always have only one instance of
SQLiteOpenHelper (that is, just one open connection); otherwise
SQLiteDatabaseLockedException can occur at any moment.
Different Connections At a Single SQLiteOpenHelper
Everyone is aware that SQLiteOpenHelper has 2 methods providing access to the database, getReadableDatabase() and getWritableDatabase(), to read and write data respectively. However, in most cases there is only one real connection. Moreover, both methods return one and the same object:
SQLiteOpenHelper.getReadableDatabase()==SQLiteOpenHelper.getWritableDatabase()
This means it makes no difference which of the two methods you use to read data. However, there is another undocumented issue which is more important: the class SQLiteDatabase has its own internal locks (the variable mLock). Writes are locked at the level of the SQLiteDatabase object, and since there is only one instance of SQLiteDatabase for both reading and writing, reads are blocked as well. This is most noticeable when writing a large volume of data in a transaction.
Let’s consider an example: an application that has to download a large volume of data (approx. 7000 rows containing BLOBs) in the background on first launch and save it to the database. If the data is saved inside a single transaction, saving takes approx. 45 seconds, but the user cannot use the application because all read queries are blocked. If the data is saved in small portions, the update drags on for a rather lengthy period (10-15 minutes), but the user can use the application without any restrictions or inconvenience. It is a double-edged sword: either fast or convenient.
Google has already fixed some of the issues related to SQLiteDatabase by adding the following methods:
beginTransactionNonExclusive(): begins a transaction in “IMMEDIATE mode”.
yieldIfContendedSafely(): temporarily ends the transaction to let other threads run.
isDatabaseIntegrityOk(): checks the database integrity.
Please read more details in the documentation.
However, the same functionality is needed on older versions of Android as well.
The Solution
First, locking should be turned off so that data can be read in any situation.
SQLiteDatabase.setLockingEnabled(false);
This disables the internal query locking at the level of the Java class (it is not related to locking in the SQLite sense).
SQLiteDatabase.execSQL("PRAGMA read_uncommitted = true;");
Allows reading data from the cache; in effect, it changes the isolation level. This parameter has to be set anew for each connection. If there are several connections, it affects only the connection that issues the command.
SQLiteDatabase.execSQL("PRAGMA synchronous=OFF");
Changes the way data is written to the database: without “synchronization”. With this option enabled, the database can be damaged if the system fails unexpectedly or the power goes off. However, according to the SQLite documentation, some operations run as much as 50 times faster with synchronization turned off.
Unfortunately, not every PRAGMA is supported on Android; for example, “PRAGMA locking_mode = NORMAL”, “PRAGMA journal_mode = OFF” and some others are not. Attempting to execute such a PRAGMA makes the application fail.
The documentation for setLockingEnabled says that the method is recommended only if you are sure that all work with the database is done from a single thread. We therefore have to guarantee that only one transaction is open at a time. Also, immediate transactions should be used instead of the default (exclusive) transactions. In older versions of Android (below API 11) there is no way to create an immediate transaction through the Java wrapper, but SQLite itself supports it. To start a transaction in immediate mode, execute the following SQLite statement directly against the database, for example through the method execSQL:
SQLiteDatabase.execSQL("begin immediate transaction");
Since the transaction is started by a direct query, it has to be finished the same way:
SQLiteDatabase.execSQL("commit transaction");
Then the only thing left to implement is a TransactionManager that starts and finishes transactions of the required type. The purpose of TransactionManager is to guarantee that all queries that make changes (insert, update, delete, DDL queries) originate from the same thread.
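The effect of these two SQL statements can be tried outside Android. The following is a minimal sketch using Python's built-in sqlite3 module rather than the Android API; the file name and table are made up for the illustration, and `timeout=0` is chosen so that a conflict is reported at once instead of being waited on.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "immediate.db")

# isolation_level=None: the module issues no implicit BEGIN, we do it ourselves.
a = sqlite3.connect(path, timeout=0, isolation_level=None)
b = sqlite3.connect(path, timeout=0, isolation_level=None)
a.execute("CREATE TABLE items (name TEXT)")

a.execute("BEGIN IMMEDIATE TRANSACTION")      # takes the write lock right away
a.execute("INSERT INTO items VALUES ('first')")

# A second writer is refused at BEGIN, before it has done any work.
try:
    b.execute("BEGIN IMMEDIATE TRANSACTION")
    second_writer_error = None
except sqlite3.OperationalError as exc:
    second_writer_error = str(exc)

# Reads are still admitted while the first transaction is open.
rows_seen_by_reader = b.execute("SELECT COUNT(*) FROM items").fetchall()[0][0]

a.execute("COMMIT TRANSACTION")
rows_after_commit = b.execute("SELECT COUNT(*) FROM items").fetchall()[0][0]

a.close()
b.close()
```

The point of the immediate mode is visible in the middle: the competing writer learns about the conflict at the start of its transaction, while plain SELECTs from the other connection keep working until the commit.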
Hope this helps the future visitors!!!
Not specific to SQLite:
1) Write your code to gracefully handle the situation where you get a locking conflict at the application level; even if you wrote your code so that this is 'impossible'. Use transactional re-tries (ie: SQLITE_LOCKED could be one of many codes that you interpret as "try again" or "wait and try again"), and coordinate this with application-level code. If you think about it, getting a SQLITE_LOCKED is better than simply having the attempt hang because it's locked - because you can go do something else.
2) Acquire locks. But you have to be careful if you need to acquire more than one. For each transaction at the application level, acquire all of the resources (locks) you will need in a consistent (ie: alphabetical?) order to prevent deadlocks when locks get acquired in the database. Sometimes you can ignore this if the database will reliably and quickly detect the deadlocks and throw exceptions; in other systems it may just hang without detecting the deadlock - making it absolutely necessary to take the effort to acquire the locks correctly.
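Point 2 can be sketched with ordinary in-process locks. This is a minimal illustration, not a database feature: the resource names and the helper `with_resources` are hypothetical, and sorting by name is just one possible consistent order.

```python
import threading
from contextlib import ExitStack

# Hypothetical named resources, each guarded by its own lock.
locks = {name: threading.Lock() for name in ("accounts", "orders", "stock")}


def with_resources(names, action):
    """Acquire every requested lock in sorted order, run action, release all."""
    with ExitStack() as stack:
        for name in sorted(set(names)):       # same order for every caller
            stack.enter_context(locks[name])
        return action()
```

Because every caller sorts the names first, a caller asking for `["orders", "accounts"]` and another asking for `["accounts", "orders"]` both take `accounts` before `orders`, so neither can hold one lock while waiting for the other.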
Besides the facts of life with locking, you should try to design the data and in-memory structures with concurrent merging and rolling back planned in from the beginning. If you can design data such that the outcome of a data race gives a good result for all orders, then you don't have to deal with locks in that case. A good example is to increment a counter without knowing its current value, rather than reading the value and submitting a new value to update. It's similar for appending to a set (ie: adding a row, such that it doesn't matter which order the row inserts happened).
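The counter example can be sketched with Python's built-in sqlite3 module. The table and the helper `increment` are made up for the illustration; the point is only that the UPDATE carries a delta instead of a value computed from an earlier read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO counters VALUES ('hits', 0)")
conn.commit()


def increment(conn, name, amount=1):
    # The database applies the delta; the caller never reads the old value,
    # so the result is the same whatever order the increments arrive in.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE counters SET value = value + ? WHERE name = ?",
            (amount, name),
        )


for _ in range(3):
    increment(conn, "hits")

total = conn.execute("SELECT value FROM counters WHERE name = 'hits'").fetchone()[0]
```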
A good system is supposed to transactionally move from one valid state to the next, and you can think of exceptions (even in in-memory code) as aborting an attempt to move to the next state; with the option to ignore or retry.
You're fine with multithreading. The page you link lists what you cannot do while you're looping on the results of your SELECT (i.e. your select is active/pending) in the same thread.

SQLite Concurrent Access

Does SQLite3 safely handle concurrent access by multiple processes
reading/writing from the same DB? Are there any platform exceptions to that?
If most of those concurrent accesses are reads (e.g. SELECT), SQLite can handle them very well. But if you start writing concurrently, lock contention could become an issue. A lot would then depend on how fast your filesystem is, since the SQLite engine itself is extremely fast and has many clever optimizations to minimize contention. Especially SQLite 3.
For most desktop/laptop/tablet/phone applications, SQLite is fast enough as there's not enough concurrency. (Firefox uses SQLite extensively for bookmarks, history, etc.)
For server applications, somebody some time ago said that anything less than 100K page views a day could be handled perfectly by a SQLite database in typical scenarios (e.g. blogs, forums), and I have yet to see any evidence to the contrary. In fact, with modern disks and processors, 95% of web sites and web services would work just fine with SQLite.
If you want really fast read/write access, use an in-memory SQLite database. RAM is several orders of magnitude faster than disk.
Yes it does.
Let's figure out why
SQLite is transactional
All changes within a single transaction in SQLite either occur
completely or not at all
Such ACID support, as well as concurrent reads/writes, is provided in 2 ways: using so-called journaling (let's call it the “old way”) or write-ahead logging (let's call it the “new way”)
Journaling (Old Way)
In this mode SQLite uses DATABASE-LEVEL locking.
This is the crucial point to understand.
That means whenever it needs to read/write something it first acquires a lock on the ENTIRE database file.
Multiple readers can co-exist and read something in parallel.
During writing it makes sure an exclusive lock is acquired and no other process is reading/writing simultaneously and hence writes are safe.
(This is known as a multiple-readers-single-writer or MRSW lock)
This is why they say SQLite implements serializable transactions
Troubles
Because it needs to lock the entire database every time, and everybody waits while one process is writing, concurrency suffers and such concurrent writes/reads perform fairly poorly.
Rollbacks/outages
Prior to writing something to the database file SQLite would first save the chunk to be changed in a temporary file. If something crashes in the middle of writing into the database file it would pick up this temporary file and revert the changes from it
Write-Ahead Logging or WAL (New Way)
In this case all writes are appended to a temporary file (write-ahead log) and this file is periodically merged with the original database.
When SQLite is searching for something it would first check this temporary file and if nothing is found proceed with the main database file.
As a result, readers don’t compete with writers and performance is much better compared to the Old Way.
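A small experiment with Python's built-in sqlite3 module illustrates this. It is only a sketch: the file name and table are made up, and `timeout=0` is set so that the reader would fail immediately if it were blocked.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wal_demo.db")


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]


# isolation_level=None: no implicit BEGIN, transactions are issued explicitly.
writer = sqlite3.connect(path, isolation_level=None)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")

reader = sqlite3.connect(path, timeout=0, isolation_level=None)

writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO t VALUES (2)")    # written, but not committed yet

# The reader is not blocked; it sees the last committed state.
seen_during_write = count_rows(reader)

writer.execute("COMMIT")
seen_after_commit = count_rows(reader)

reader.close()
writer.close()
```

While the write transaction is open the reader still gets an answer (one row); after the commit it sees both rows.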
Caveats
SQLite depends heavily on the locking functionality of the underlying filesystem, so it should be used with caution.
You're also likely to run into the "database is locked" error, especially in journaled mode, so your app needs to be designed with this error in mind.
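One way to design for that error is a small retry wrapper. The sketch below uses Python's built-in sqlite3 module; `run_with_retry` is a hypothetical helper, and recognising the conflict by the word "locked" in the error message is an assumption of this sketch rather than a documented contract.

```python
import sqlite3
import time


def run_with_retry(operation, attempts=5, delay=0.05):
    """Call operation(); retry with backoff while SQLite reports a lock conflict."""
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc):
                raise                          # unrelated error, do not retry
            if attempt == attempts - 1:
                raise                          # out of retries
            time.sleep(delay * (2 ** attempt)) # wait a little longer each time
```

The operation passed in should be a whole transaction (begin, work, commit), so that a retry starts again from a clean state.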
Yes, SQLite handles concurrency well, but it isn't the best from a performance angle. From what I can tell, there are no exceptions to that. The details are on SQLite's site: https://www.sqlite.org/lockingv3.html
This statement is of interest: "The pager module makes sure changes happen all at once, that either all changes occur or none of them do, that two or more processes do not try to access the database in incompatible ways at the same time"
Nobody seems to have mentioned WAL (Write Ahead Log) mode. Make sure the transactions are properly organised and with WAL mode set on, there is no need to keep the database locked whilst people are reading things whilst an update is going on.
The only issue is that at some point the WAL needs to be re-incorporated into the main database, and it does this automatically once the WAL reaches a size threshold (1000 pages by default) and when the last connection to the database closes. With a very busy site you might find it takes a few seconds for all connections to close, but 100K hits per day should not be a problem.
In 2019, there are two new concurrent write options not released yet but available in separate branches.
"PRAGMA journal_mode = wal2"
The advantage of this journal mode over regular "wal" mode is that writers may continue writing to one wal file while the other is checkpointed.
BEGIN CONCURRENT
The BEGIN CONCURRENT enhancement allows multiple writers to process write transactions simultaneously if the database is in "wal" or "wal2" mode, although the system still serializes COMMIT commands.
When a write-transaction is opened with "BEGIN CONCURRENT", actually locking the database is deferred until a COMMIT is executed. This means that any number of transactions started with BEGIN CONCURRENT may proceed concurrently. The system uses optimistic page-level-locking to prevent conflicting concurrent transactions from being committed.
Together they are present in begin-concurrent-wal2, or each in its own separate branch.
SQLite has a readers-writer lock on the database level. Multiple connections (possibly owned by different processes) can read data from the same database at the same time, but only one can write to the database.
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution. -- Appropriate Uses For SQLite # SQLite.org
The readers-writer lock enables independent transaction processing and it is implemented using exclusive and shared locks on the database level.
An exclusive lock must be obtained before a connection performs a write operation on a database. After the exclusive lock is obtained, both read and write operations from other connections are blocked till the lock is released again.
Implementation details for the case of concurrent writes
SQLite has a lock table that helps locking the database as late as possible during a write operation to ensure maximum concurrency.
The initial state is UNLOCKED; in this state the connection has not accessed the database yet. When a process has connected to a database, and even after a transaction has been started with BEGIN, the connection is still in the UNLOCKED state.
After the UNLOCKED state, the next state is the SHARED state. In order to be able to read (not write) data from the database, the connection must first enter the SHARED state, by getting a SHARED lock.
Multiple connections can obtain and maintain SHARED locks at the same time, so multiple connections can read data from the same database at the same time. But as long as even only one SHARED lock remains unreleased, no connection can successfully complete a write to the database.
If a connection wants to write to the database, it must first get a RESERVED lock.
Only a single RESERVED lock may be active at one time, though multiple SHARED locks can coexist with a single RESERVED lock. RESERVED differs from PENDING in that new SHARED locks can be acquired while there is a RESERVED lock. -- File Locking And Concurrency In SQLite Version 3 # SQLite.org
Once a connection obtains a RESERVED lock, it can start processing database modification operations, though these modifications can only be made in the buffer rather than actually written to disk. Changes to the pages that were read are kept in the memory buffer.
When a connection wants to commit a modification (or transaction), it must upgrade the RESERVED lock to an EXCLUSIVE lock. To get that lock, it must first upgrade to a PENDING lock.
A PENDING lock means that the process holding the lock wants to write to the database as soon as possible and is just waiting on all current SHARED locks to clear so that it can get an EXCLUSIVE lock. No new SHARED locks are permitted against the database if a PENDING lock is active, though existing SHARED locks are allowed to continue.
An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held.
-- File Locking And Concurrency In SQLite Version 3 # SQLite.org
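The lock progression can be observed from Python's built-in sqlite3 module. This is a hedged sketch for the default rollback-journal mode: the file and table are made up, `timeout=0` makes a blocked step fail at once, and it assumes a partially fetched cursor keeps its SHARED lock until it is closed.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "locks.db")

# timeout=0 disables waiting, so a lock conflict raises immediately.
writer = sqlite3.connect(path, timeout=0, isolation_level=None)
reader = sqlite3.connect(path, timeout=0, isolation_level=None)

writer.execute("CREATE TABLE t (x INTEGER)")
writer.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# A half-read SELECT keeps the reader in the SHARED state.
cursor = reader.execute("SELECT x FROM t")
first_row = cursor.fetchone()

writer.execute("BEGIN")
writer.execute("INSERT INTO t VALUES (4)")    # RESERVED: change is only buffered

try:
    writer.execute("COMMIT")                  # needs EXCLUSIVE, blocked by SHARED
    commit_error = None
except sqlite3.OperationalError as exc:
    commit_error = str(exc)

cursor.close()                                # reader gives up its SHARED lock
writer.execute("COMMIT")                      # the same transaction now commits
row_count = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]

reader.close()
writer.close()
```

The INSERT succeeds while the reader is active, because RESERVED coexists with SHARED; only the COMMIT, which needs the EXCLUSIVE lock, has to wait for the reader to finish.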
So you might say SQLite safely handles concurrent access by multiple processes writing to the same DB simply because it doesn't support it! You will get SQLITE_BUSY or SQLITE_LOCKED for the second writer when it hits the retry limit.
This thread is old, but I think it would be good to share the results of my tests on SQLite:
I ran 2 instances of a Python program (same program, different processes) executing SELECT and UPDATE statements within a transaction with an EXCLUSIVE lock and a 10-second timeout for acquiring the lock, and the results were frustrating. Each instance ran a 10000-step loop:
connect to db with exclusive lock
select on one row to read counter
update the row with new value equal to counter incremented by 1
close connection to db
Even though SQLite granted an exclusive lock on the transaction, the total number of cycles actually executed was less than 20 000 (the total number of iterations over the single counter, counted for both processes).
The Python program threw almost no exceptions (only once, during a SELECT, in 20 executions).
The SQLite version at the time of the test was 3.6.20, with Python v3.3 on CentOS 6.5.
In my opinion it is better to find a more reliable product for this kind of job, or to restrict writes to SQLite to a single process/thread.
This is to be expected: if you give the database a name (even an in-memory one) and access it concurrently, especially for writes, you will get this.
In my case, I am using SQLite for testing, and it happens because there are several tests in the same solution.
You can make two improvements:
Delete the database before creating it: db.Database.EnsureDeletedAsync();
Use an empty string for the connection string; in this case it will create a randomly named database on each call:
{
  "ConnectionStrings": {
    "ConnectionType": "sqlite",
    "ConnectionString": ""
  }
}

Optimizing Put Performance in Berkeley DB

I just started playing with Berkeley DB a few days ago so I'm trying to see if there's something I've been missing when it comes to storing data as fast as possible.
Here's some info about the data:
- it comes in 512 byte chunks
- chunks come in order
- chunks will be deleted in FIFO order
- if I lose some data off the end because of a power failure, that's OK as long as the whole db isn't broken
After reading a bunch of the documentation, it seemed like a Queue db was exactly what I wanted.
However, after trying some test code my fastest results were about 1MByte per second just looping through a DB->put with DB_APPEND set. I also tried using transactions and bulk puts but both of these slowed things down considerably so I didn't pursue them for much time. I was inserting into a fresh db created on a NANDFlash chip on my Freescale i.MX35 dev board.
Since we're looking to get at least 2MBytes per second write speeds, I was wondering if there's something I missed that can improve my speeds since I know that my hardware can write faster than this.
Try putting this into your DB_CONFIG:
set_flags DB_TXN_WRITE_NOSYNC
set_flags DB_TXN_NOSYNC
From my experience, these increase write performance a lot.
DB_TXN_NOSYNC
If set, Berkeley DB will not write or synchronously flush the log on transaction commit or prepare. This means that transactions exhibit the ACI (atomicity, consistency, and isolation) properties, but not D (durability); that is, database integrity will be maintained, but if the application or system fails, it is possible some number of the most recently committed transactions may be undone during recovery. The number of transactions at risk is governed by how many log updates can fit into the log buffer, how often the operating system flushes dirty buffers to disk, and how often the log is checkpointed.
Calling DB_ENV->set_flags with the DB_TXN_NOSYNC flag only affects the specified DB_ENV handle (and any other Berkeley DB handles opened within the scope of that handle). For consistent behavior across the environment, all DB_ENV handles opened in the environment must either set the DB_TXN_NOSYNC flag or the flag should be specified in the DB_CONFIG configuration file.
The DB_TXN_NOSYNC flag may be used to configure Berkeley DB at any time during the life of the application.
DB_TXN_WRITE_NOSYNC
If set, Berkeley DB will write, but will not synchronously flush, the log on transaction commit or prepare. This means that transactions exhibit the ACI (atomicity, consistency, and isolation) properties, but not D (durability); that is, database integrity will be maintained, but if the system fails, it is possible some number of the most recently committed transactions may be undone during recovery. The number of transactions at risk is governed by how often the system flushes dirty buffers to disk and how often the log is checkpointed.
Calling DB_ENV->set_flags with the DB_TXN_WRITE_NOSYNC flag only affects the specified DB_ENV handle (and any other Berkeley DB handles opened within the scope of that handle). For consistent behavior across the environment, all DB_ENV handles opened in the environment must either set the DB_TXN_WRITE_NOSYNC flag or the flag should be specified in the DB_CONFIG configuration file.
The DB_TXN_WRITE_NOSYNC flag may be used to configure Berkeley DB at any time during the life of the application.
See http://www.mathematik.uni-ulm.de/help/BerkeleyDB/api_c/env_set_flags.html for more details.
I suggest you use transactions (the TDS, Transactional Data Store) if, as you mention, you cannot recreate the database when it gets corrupted (i.e. it isn't just a local cache). If you don't care about losing a few items in the event of a crash or power outage, then DB_TXN_WRITE_NOSYNC will improve TDS performance, and your database will still be consistent and recoverable.
If you store using BTREE with a numeric index (if you have no natural key), and watch out for endian issues so that you get good key locality and high page utilization, then you should be able to get way more than 2000 inserts a second, especially to SSD, and especially if you use DbMultipleKeyDataBuilder to do bulk inserts.

Resources