Questions on SQLite transactions

1) Is it OK to do inserts, deletes and updates in one transaction?
2) Is there a (recommended) limit to the number of writes per transaction?

A transaction is a logical block; you can do whatever you want within one transaction.
For example, for one function of our product we build a temporary table, insert a set of tuples into it, and then run a SELECT that uses that temporary table together with a permanent table. All of that happens inside a transaction which is rolled back afterwards, so that no changes are made to the database.
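A minimal sketch of that pattern with Python's sqlite3 module, assuming a hypothetical permanent table named events with a processed column; isolation_level=None just makes the BEGIN/ROLLBACK explicit:

    import sqlite3

    # isolation_level=None disables the module's implicit transaction handling,
    # so BEGIN / ROLLBACK are issued explicitly below.
    conn = sqlite3.connect("example.db", isolation_level=None)

    conn.execute("BEGIN")

    # Build a temporary table and fill it with a set of tuples.
    conn.execute("CREATE TEMP TABLE wanted_ids (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO wanted_ids (id) VALUES (?)", [(1,), (2,), (3,)])

    # Inserts, updates and deletes against the permanent table, plus a SELECT that
    # joins the temporary table with the permanent one -- all in one transaction.
    conn.execute("UPDATE events SET processed = 1 WHERE id IN (SELECT id FROM wanted_ids)")
    conn.execute("DELETE FROM events WHERE id = 99")
    rows = conn.execute(
        "SELECT e.* FROM events e JOIN wanted_ids w ON w.id = e.id"
    ).fetchall()

    # Roll back, leaving the permanent table untouched.
    conn.execute("ROLLBACK")
    conn.close()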

Related

Do multiple, concurrent update of individual fields in the same document require a transaction?

I am building a Flutter app with a Firebase backend in which I need multiple clients to update the same document simultaneously, but the fields that each client updates are different. Would there be any benefit in using a transaction for the update rather than updating the document normally?
You would want to use a transaction if a client is updating a field using the contents of the other fields for reference. So if field2 needs to be computed consistently from the contents of field1, you would need a transaction to ensure there is no race condition between the updates of the two fields.
If the fields are logically independent of each other and there is no race condition between their updates (they can all change independently), then it should be safe to update each of them without a transaction. But bear in mind that each document has a sustained maximum write rate of about one write per second, so if you have a lot of concurrent updates coming in, some of those updates could fail. In that case, you would want each field to live in its own document.
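The question is about Flutter, but the distinction can be sketched just as well with the google-cloud-firestore Python client; the collection, document and field names here are made up:

    from google.cloud import firestore

    db = firestore.Client()
    doc_ref = db.collection("rooms").document("room42")  # hypothetical document

    # Independent fields: a plain per-client update is fine, no transaction needed.
    doc_ref.update({"field1": 10})

    # Derived field: field2 is computed from field1, so the read and the write
    # must happen atomically inside a transaction.
    transaction = db.transaction()

    @firestore.transactional
    def update_derived(transaction, ref):
        snapshot = ref.get(transaction=transaction)  # read inside the transaction
        transaction.update(ref, {"field2": snapshot.get("field1") * 2})

    update_derived(transaction, doc_ref)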

When does InnoDB deliver updates to row data in the buffer and on disk?

I have a question about when InnoDB updates row data in the buffer and when the change goes to disk. This question comes from reading about the undo log, which says that the old data sit in the undo log waiting for rollbacks. If the engine needs the undo log for rollback, must the changes from an UPDATE query have already been applied to the row before the commit? And then what does the commit do, since the data have already been updated?
When you INSERT, UPDATE, or DELETE a row:
Quick summary:
1. Fetch the block containing the row (or the block that should contain the row).
2. Insert/update/delete the row.
3. Mark the block as "dirty". It will eventually be written to disk.
4. Put non-unique secondary index changes in the "change buffer".
More details (on those steps):
To find the 16KB block, drill down the PRIMARY KEY's BTree. If the block is not in the buffer_pool (which is allocated in RAM), fetch it from disk. (This may involve bumping some other block out of the buffer_pool.)
Copy the previous value (in case of Update/Delete) to the undo log, and prep it for flushing to disk.
A background task flushes dirty pages to disk. If all is going smoothly, 'most' of the buffer_pool contains non-dirty pages, and you 'never' have to wait for a 'free' block in the buffer_pool.
The Change Buffer is sort of a "delayed write" for index updates. It is transparent. That is, subsequent index lookups will automagically look in the change buffer and/or the index's BTree. The data in the CB will eventually be blended with the real index BTree and eventually flushed to disk.
UNIQUE keys: All INSERTs and UPDATEs that change the Unique key's column(s) necessarily check for dup-key rather than going through the change buffer.
AUTO_INCREMENT has some other special actions.
Depending on the values of innodb_flush_log_at_trx_commit and innodb_doublewrite something may be flushed to disk at the end of the transaction. These handle "atomic" transactions and "torn pages".
Replication: Other activity may include writing to and syncing the binlog, and pushing data to other nodes in the cluster.
The design is "optimistic" in that it is optimized for COMMIT at the expense of ROLLBACK. After a Commit, a process runs around purging the copies that were kept in case of a crash and Rollback. A Rollback is more complex in that it must put back the old copies of the rows. (See also "history list".)
Search for some of the keywords I have mentioned; read some other web pages; then come back with a more specific question.
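As a hedged illustration of the "dirty pages waiting to be flushed" idea and of the durability settings mentioned above, the relevant variables and status counters can be inspected from Python with mysql-connector (credentials are placeholders):

    import mysql.connector

    # Placeholder credentials -- adjust for your server.
    conn = mysql.connector.connect(host="localhost", user="root", password="secret")
    cur = conn.cursor()

    # Durability-related settings mentioned above.
    cur.execute("SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit'")
    print(cur.fetchone())
    cur.execute("SHOW VARIABLES LIKE 'innodb_doublewrite'")
    print(cur.fetchone())

    # How much of the buffer_pool is currently dirty (modified but not yet flushed).
    cur.execute(
        "SHOW GLOBAL STATUS WHERE Variable_name IN "
        "('Innodb_buffer_pool_pages_dirty', 'Innodb_buffer_pool_pages_total')"
    )
    for name, value in cur.fetchall():
        print(name, value)

    conn.close()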
Commit
Let's look at it from a different side. Each row, including not-yet-committed rows being changed/deleted, has a "transaction id". All the rows for a given transaction have the same id, so even if there is a crash, InnoDB knows what to clean up. COMMIT and ROLLBACK need to be 'atomic'; this is aided by having a single write to disk that "says it all". The only way for that to be possible is for the transaction id to be the key. Keep in mind that there could be a million rows scattered around the buffer_pool, data files and logs waiting for the commit/rollback.
After the commit/rollback, InnoDB can leisurely run around cleaning things up. For example, until an UPDATE is committed or rolled back, there are two copies of each row being changed. One of the rows needs to be removed -- eventually. Meanwhile, the two rows are on a "history list". Any other transaction searches through the history list to see which row version it is allowed to see -- READ UNCOMMITTED = the latest row version, even if it has not yet been committed or rolled back; READ COMMITTED = the latest row version that has been committed; etc.
If I understand it correctly, the undo log is an optimization. For example, on a DELETE the "old values" of the rows are copied to the undo log, and the row is actually deleted from the data BTree. The optimization here is that the undo log is serially written, while the BTree may involve a lot more blocks, scattered around the table. Also, the normal processing of data blocks includes caching them in the buffer_pool. For Commit, the records in the undo log are tossed. For Rollback, there is the tedious effort of using the undo log for reconstruction.
Yes, the history list adds work for all other transactions touching your recently changed rows, but it enables the transaction isolation modes and aids in recovery from crashes.
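A hedged sketch of that isolation behaviour, using two mysql-connector connections and a hypothetical accounts table; the reader sees the uncommitted row version only at READ UNCOMMITTED:

    import mysql.connector

    params = dict(host="localhost", user="app", password="secret", database="test")  # placeholders
    writer = mysql.connector.connect(**params)
    reader = mysql.connector.connect(**params)
    reader.autocommit = True  # each SELECT runs as its own short transaction

    wcur = writer.cursor()
    wcur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 1")  # not committed yet

    rcur = reader.cursor()

    # READ UNCOMMITTED: sees the newest (still uncommitted) version of the row.
    rcur.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")
    rcur.execute("SELECT balance FROM accounts WHERE id = 1")
    print("dirty read:", rcur.fetchone())

    # READ COMMITTED: sees only the latest committed version from the history list.
    rcur.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
    rcur.execute("SELECT balance FROM accounts WHERE id = 1")
    print("committed read:", rcur.fetchone())

    writer.rollback()  # discard the change; the READ COMMITTED reader never saw it
    writer.close()
    reader.close()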

SQL Server Data Archiving

I have a SQL Azure database on which I need to perform a data archiving operation.
The plan is to move all the irrelevant data from the actual tables into Archive_* tables.
I have tables with up to 8-9 million records.
One option is to write a stored procedure that inserts the data into the new Archive_* tables and also deletes it from the actual tables.
But this operation is really time-consuming, running for more than 3 hours.
I am in a situation where I can't have more than an hour's downtime.
How can I make this archiving faster?
You can use Azure Automation to schedule a stored procedure to run every day at the same time, during a maintenance window, where this stored procedure archives only the oldest week or month of data each time it runs. The stored procedure should archive only data older than X weeks/months/years. Please read this article to create the runbook. In a few days you will have all the old data archived, and the runbook will keep doing the job from then on.
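Azure Automation also supports Python runbooks; a minimal sketch of such a runbook is below, assuming pyodbc is available and a hypothetical dbo.usp_ArchiveOldData procedure that archives everything older than the cutoff it is given:

    import datetime
    import pyodbc

    # Placeholder connection string -- in a real runbook, pull these values
    # from Automation variables/credentials rather than hard-coding them.
    conn = pyodbc.connect(
        "Driver={ODBC Driver 17 for SQL Server};"
        "Server=tcp:myserver.database.windows.net,1433;"
        "Database=mydb;Uid=archiver;Pwd=secret;Encrypt=yes;"
    )

    cutoff = datetime.datetime.utcnow() - datetime.timedelta(weeks=4)  # archive anything older than 4 weeks
    cur = conn.cursor()
    cur.execute("EXEC dbo.usp_ArchiveOldData @Cutoff = ?", cutoff)  # hypothetical stored procedure
    conn.commit()
    conn.close()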
You can't make it faster, but you can make it seamless. The first option is to have a separate task that moves data in portions from the source to the archive tables. To prevent lock escalation and overall performance degradation, I would suggest limiting the size of a single transaction: start a transaction, insert N records into the archive table, delete those records from the source table, commit the transaction, and repeat (as sketched below). Continue for a few days until all the necessary data has been transferred. The advantage of this approach is that if there is some kind of failure, you can restart the archival process and it will continue from the point of the failure.
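A hedged sketch of that batched move with pyodbc, assuming hypothetical dbo.Orders / dbo.Archive_Orders tables with identical columns; here a single DELETE ... OUTPUT INTO statement performs the insert and the delete of each batch in one atomic step:

    import pyodbc

    BATCH_SIZE = 10000
    # Placeholder connection string.
    conn = pyodbc.connect("Driver={ODBC Driver 17 for SQL Server};Server=...;Database=...;Uid=...;Pwd=...")
    cur = conn.cursor()

    while True:
        # Move one batch: OUTPUT ... INTO copies the deleted rows into the archive table.
        cur.execute(
            """
            DELETE TOP (?) FROM dbo.Orders
            OUTPUT deleted.* INTO dbo.Archive_Orders
            WHERE OrderDate < '2020-01-01'
            """,
            BATCH_SIZE,
        )
        moved = cur.rowcount
        conn.commit()           # committing per batch keeps transactions (and locks) small
        if moved < BATCH_SIZE:  # nothing, or only a partial batch, left to move
            break

    conn.close()

Note that OUTPUT ... INTO requires the archive table to have no enabled triggers or foreign keys; if that is not the case, fall back to a separate INSERT ... SELECT followed by a DELETE inside the same transaction.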
The second option, which does not exclude the first one, really depends on how critical the performance of the source tables is for you and how many updates are happening on them. If that is not a problem, you can write triggers that pour every inserted/updated record into an archive table. Then, when you want to clean up, all you need to do is delete the obsolete records from the source tables; their copies will already be in the archive tables.
In both cases you will not need any downtime.

Clear DocumentDB Collection before inserting documents

I need to know how to clear a DocumentDB collection before inserting new documents. I am using a Data Factory pipeline activity to fetch data from an on-prem SQL Server and insert it into a DocumentDB collection. The frequency is set to every 2 hours, so when the next cycle runs I want to first clear the existing data in the DocumentDB collection. How do I do that?
The easiest way is to programmatically delete the collection and recreate it with the same name. Our test scripts do this automatically. There is the potential for this to fail due to a subtle race condition, but we've found that adding a half second delay between the delete and recreate avoids this.
Alternatively, it would be possible to fetch every document id and then delete them one at a time. This would be most efficiently done from a stored procedure (sproc) so you don't have to send it all over the wire, but it would still consume RUs and take time.
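A hedged sketch of the delete-and-recreate approach with the current azure-cosmos Python SDK (DocumentDB is now Azure Cosmos DB; the account URL, key, container name and partition key below are placeholders):

    import time
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<account-key>")
    database = client.get_database_client("mydb")

    # Drop the collection, wait briefly to avoid the race condition mentioned above,
    # then recreate it with the same name before the next load runs.
    database.delete_container("events")
    time.sleep(0.5)
    database.create_container(id="events", partition_key=PartitionKey(path="/id"))

Any custom indexing policy or throughput settings have to be specified again on the recreate, since they are not carried over.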

Attaching two memory databases

I am collecting data every second and storing it in a ":memory:" database. Inserting data into this database happens inside a transaction.
Every time a request is sent to the server, the server reads data from the first in-memory database, does some calculations, stores the result in the second database, and sends it back to the client. For this, I am creating another ":memory:" database to store the aggregated information from the first one. I cannot use the same database because I need to do some large calculations to get the aggregated result, and this cannot be done inside the transaction (because if one calculation takes 5 seconds, I would lose 4 seconds of collected data). I cannot create the table in the same database because I would not be able to write the aggregated data while it is collecting and inserting the original data (that happens inside a transaction, every second).
-- Sometimes I want to retrieve data from both databases. How can I link these two memory databases? Using an ATTACH DATABASE statement I can attach the second database to the first one, but the problem is: the next time a request comes, how will I check whether the second database exists?
-- Suppose I attach the second memory database to the first one. Will it lock the second database when we write data to the first one?
-- Is there any other way to store this aggregated data?
As far as I understand your idea, I don't think you need two databases at all. I suspect you are misinterpreting how transactions work in SQL.
If you begin a transaction, other processes are still allowed to read data; if you are only reading data, you probably don't need a database lock.
A possible workflow could look like the following:
1. Insert some data into the database (use a transaction just for the insertion process).
2. Perform heavy calculations on the database, but do not use a transaction; otherwise it will prevent other processes from inserting any data into your database. Even if this step involves really heavy computation, other processes can still insert and read data, since SELECT statements will not lock your database.
3. Write the results back to the database (again, using a transaction).
Just make sure that heavy calculations are not performed within a transaction.
If you want a more detailed description of this solution, look at the documentation about the file locking behaviour of sqlite3: http://www.sqlite.org/lockingv3.html
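A minimal sketch of that workflow with Python's sqlite3 module and a single in-memory database (the table layout is made up):

    import sqlite3

    # One in-memory database; isolation_level=None gives explicit transaction control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE samples (ts INTEGER, value REAL)")
    conn.execute("CREATE TABLE aggregates (ts INTEGER, avg_value REAL)")

    def collect(ts, value):
        # Step 1: a short transaction just for the insert.
        conn.execute("BEGIN")
        conn.execute("INSERT INTO samples (ts, value) VALUES (?, ?)", (ts, value))
        conn.execute("COMMIT")

    def aggregate(ts):
        # Step 2: the heavy read/calculation runs outside any explicit transaction.
        (avg_value,) = conn.execute(
            "SELECT AVG(value) FROM samples WHERE ts <= ?", (ts,)
        ).fetchone()

        # Step 3: another short transaction just for writing the result.
        conn.execute("BEGIN")
        conn.execute("INSERT INTO aggregates (ts, avg_value) VALUES (?, ?)", (ts, avg_value))
        conn.execute("COMMIT")
        return avg_value

    collect(1, 3.5)
    collect(2, 4.5)
    print(aggregate(2))  # 4.0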
