Deadlock caused by row-level locking and delete-then-insert in a table with no constraints

We have an online application that checks room availability. A procedure uses a table (with no constraints) as temporary storage: it first deletes the table's contents and then inserts the rows found by SELECT queries in a cursor. Another user, from another session, deletes and re-inserts the table data in the same way.
Some observations from deadlock graph:
1. The enqueue type is TX (from the deadlock graph), so this is definitely not locking due to unindexed foreign keys.
2. The lock is being waited on in mode 'X' (exclusive), so there is row-level locking.
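For illustration only, here is a minimal sketch of that pattern; the table, columns, and date values are hypothetical stand-ins for the real cursor logic, and both sessions run the same block concurrently:
-- Hypothetical shared work table with no constraints:
--   room_search_work(room_id, check_in, check_out)
DELETE FROM room_search_work;
INSERT INTO room_search_work (room_id, check_in, check_out)
SELECT r.room_id, DATE '2024-06-01', DATE '2024-06-03'
FROM rooms r
WHERE NOT EXISTS (SELECT 1
                  FROM bookings b
                  WHERE b.room_id = r.room_id
                    AND b.check_in < DATE '2024-06-03'
                    AND b.check_out > DATE '2024-06-01');
COMMIT;
-- If two sessions interleave the DELETE and INSERT on the same rows,
-- each can end up waiting on row locks (TX enqueue, mode X) held by the other.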

Related

When does InnoDB apply updates to row data in the buffer and on disk?

I have a question about when InnoDB updates row data in the buffer and when the change goes to disk. This question comes from reading about the undo log, which says the history data are kept in the undo log waiting for rollbacks. If the engine needs the undo log for rollback, an UPDATE must have already changed the row before the commit? And then what does the commit do, since the data have already been updated?
When you INSERT, UPDATE, or DELETE a row:
Quick summary:
1. Fetch the block containing the row (or the block that should contain the row).
2. Insert/update/delete the row.
3. Mark the block as "dirty". It will eventually be written to disk.
4. Put non-unique secondary index changes in the "change buffer".
More details (on those steps):
To find the 16KB block, drill down the PRIMARY KEY's BTree. If the block is not in the buffer_pool (which is allocated in RAM), fetch it from disk. (This may involve bumping some other block out of the buffer_pool.)
Copy the previous value (in case of Update/Delete) to the undo log, and prep it for flushing to disk.
A background task flushes dirty pages to disk. If all is going smoothly, 'most' of the buffer_pool contains non-dirty pages, and you 'never' have to wait for a 'free' block in the buffer_pool.
The Change Buffer is sort of a "delayed write" for index updates. It is transparent. That is, subsequent index lookups will automagically look in the change buffer and/or the index's BTree. The data in the CB will eventually be blended with the real index BTree and eventually flushed to disk.
UNIQUE keys: All INSERTs and UPDATEs that change the Unique key's column(s) necessarily check for dup-key rather than going through the change buffer.
AUTO_INCREMENT has some other special actions.
Depending on the values of innodb_flush_log_at_trx_commit and innodb_doublewrite something may be flushed to disk at the end of the transaction. These handle "atomic" transactions and "torn pages".
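If you want to observe some of this, the flush-at-commit settings and the buffer pool state (dirty pages waiting to be flushed, free buffers) can be inspected directly in any reasonably recent MySQL/MariaDB:
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW VARIABLES LIKE 'innodb_doublewrite';
-- Dirty ("modified") pages and free buffers, per buffer pool instance:
SELECT POOL_ID, POOL_SIZE, FREE_BUFFERS, DATABASE_PAGES, MODIFIED_DATABASE_PAGES
FROM information_schema.INNODB_BUFFER_POOL_STATS;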
Replication: Other activity may include writing to and syncing the binlog, and pushing data to other nodes in the cluster.
The design is "optimistic" in that it is optimized for COMMIT at the expense of ROLLBACK. After a Commit, a process runs around purging the copies that were kept in case of a crash and Rollback. A Rollback is more complex in that it must put back the old copies of the rows. (See also "history list".)
Search for some of the keywords I have mentioned; read some other web pages; then come back with a more specific question.
Commit
Let's look at it from a different side. Each row, including not-yet-committed rows being changed/deleted, has a "transaction id". All the rows for a given transaction have the same id. So, even if there is a crash, InnoDB knows what to clean up. COMMIT and ROLLBACK need to be 'atomic'; this is aided by having a single write to disk that "says it all". The only way for that to be possible is for the transaction id to be the key. Keep in mind, there could be a million rows scattered around the buffer_pool and data files and logs waiting for the commit/rollback.
After the commit/rollback, InnoDB can leisurely run around cleaning things up. For example, until an UPDATE is committed or rolled back, there are two copies of each row being changed. One of the rows needs to be removed -- eventually. Meanwhile, the two rows are on a "history list". Any other transaction searches through the history list to see which one row it is allowed to see -- READ UNCOMMITTED = latest row that has not been committed / rolled back; READ COMMITTED = latest row that has been committed / rolled back; etc.
If I understand it correctly, the undo log is an optimization. For example, on a DELETE the "old values" of the rows are copied to the undo log, and the row is actually deleted from the data BTree. The optimization here is that the undo log is serially written, while the BTree may involve a lot more blocks, scattered around the table. Also, the normal processing of data blocks includes caching them in the buffer_pool. For Commit, the records in the undo log are tossed. For Rollback, there is the tedious effort of using the undo log for reconstruction.
Yes, the history list adds work for all other transactions touching your recently changed rows. But it enables transaction-isolation-modes and aids in recovery from crashes.
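As a rough illustration of how those isolation modes pick a row version from the history list (the accounts table and values here are invented):
-- Assume accounts(id, balance) with id = 1, balance = 100 already committed.
-- Session 1: change the row but do not commit yet.
START TRANSACTION;
UPDATE accounts SET balance = 200 WHERE id = 1;   -- old version (100) goes on the history list

-- Session 2: what it sees depends on its isolation level.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT balance FROM accounts WHERE id = 1;   -- 200: latest version, committed or not
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT balance FROM accounts WHERE id = 1;   -- 100: latest committed version

-- Session 1:
COMMIT;   -- now every level sees 200; purge later removes the old version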

Why does this transaction produce a deadlock?

My application runs on MariaDB using a master-master Galera replication setup.
The application can handle deadlocks, but I've been working to minimize those that occur as they fill up my log files. There remains one transaction that gets regular deadlocks and I don't know how to avoid it.
The process deletes a record from one table, does a couple of operations on other tables and then finally inserts a record into the original table.
The transaction looks broadly like this:
1. DELETE FROM table_a WHERE `id` = 'Foo'
2. REPLACE INTO table_b ( ... )
3. UPDATE table_c SET ....
4. INSERT INTO table_a (id,...) VALUES ('Bar',...)
The final insert regularly gets a deadlock although retrying the transaction fixes it. What is it about this pattern that causes a deadlock? What can I do to reduce the occurrence?
Question: Is the 'deadlock' in the node you are writing to? Or does the deadlock not occur until COMMIT; that is, when trying to reconcile across the cluster?
If on the writing node...
As soon as possible in the transaction, do
SELECT id FROM table_a WHERE ... FOR UPDATE;
to signal what row(s) you will be inserting in step 4.
Also, consider changing REPLACE to an equivalent INSERT .. ON DUPLICATE KEY UPDATE ... I don't know if it will directly help with the deadlock, but at least it is (probably) more efficient.
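Putting both suggestions together, the writing-node version of the transaction might look roughly like this; table_b's and table_c's column names are invented for illustration, and the WHERE clauses are placeholders for your real ones:
START TRANSACTION;
-- Claim the table_a rows touched in steps 1 and 4 as early as possible:
SELECT id FROM table_a WHERE id IN ('Foo', 'Bar') FOR UPDATE;
DELETE FROM table_a WHERE id = 'Foo';
-- REPLACE rewritten as INSERT ... ON DUPLICATE KEY UPDATE (hypothetical columns):
INSERT INTO table_b (id, payload) VALUES ('Foo', 'new value')
ON DUPLICATE KEY UPDATE payload = VALUES(payload);
UPDATE table_c SET counter = counter + 1 WHERE id = 'Foo';   -- hypothetical columns
INSERT INTO table_a (id) VALUES ('Bar');
COMMIT;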
If on the cluster...
Are you touching lots of different rows? Are you using roundrobin for picking which node to write to?
In any case, speeding up the transaction will help. Is there anything that can be pulled out of the transaction? Some thoughts there:
Normalization can generally be done in its own transaction.
If you have an "if" in the transaction, it might be worth it to do a tentative test beforehand. (But you probably need to keep the "if" in the transaction.)

How can I improve performance while altering a large mysql table?

I have 600 million records in a table and I am not able to add a column to it, because every time I try, the operation times out.
Suppose your MySQL database has a giant table with 600 million rows. Any schema operation on it, such as adding a unique key, altering a column, or even adding one more column, is a very cumbersome process that takes hours and sometimes ends in a server timeout. To overcome that, you have to come up with a good migration plan; I am jotting one down below.
1) Suppose there is a table Orig_X to which I have to add a new column colNew with a default value of 0.
2) A dummy table Dummy_X is created, which is a replica of Orig_X plus the new column colNew (see the sketch after the step list).
3) Data is inserted from Orig_X into Dummy_X with the following settings:
4) Autocommit is set to zero, so that data is not committed after each insert statement, which would hinder performance.
5) Binary logging is turned off, so that the copied rows are not written to the binlog.
6) After the data has been inserted, both settings are turned back on.
SET AUTOCOMMIT = 0;
SET sql_log_bin = 0;
INSERT INTO Dummy_X (col1, col2, col3, colNew)
SELECT col1, col2, col3, 0 FROM Orig_X;
COMMIT;
SET sql_log_bin = 1;
SET AUTOCOMMIT = 1;
7) Now the primary key can be created on Dummy_X, including the newly added column if it is to be part of the primary key.
8) All the unique keys can now be created.
9) We can check the status of the server by issuing the following command
SHOW MASTER STATUS
10) It’s also helpful to issue FLUSH LOGS so MySQL will clear the old logs.
11) To boost performance when running similar types of queries repeatedly, such as the insert statement above, the query cache should be enabled:
SHOW VARIABLES LIKE 'have_query_cache';
query_cache_type = 1
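For step 2, the dummy table can be created roughly as follows (a sketch; the definition of colNew is only an example):
-- Structural copy of Orig_X plus the new column. Note that CREATE TABLE ... LIKE
-- also copies the indexes; drop them from Dummy_X (or use an explicit CREATE TABLE)
-- if you want to add the keys only after the load, as in steps 7 and 8.
CREATE TABLE Dummy_X LIKE Orig_X;
ALTER TABLE Dummy_X ADD COLUMN colNew INT NOT NULL DEFAULT 0;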
Those were the steps of the migration strategy for the large table; below I am writing some steps to improve the performance of the database/queries.
1) Remove any unnecessary indexes on the table, paying particular attention to UNIQUE indexes, as these disable change buffering. Don't use a UNIQUE index if you have no reason for that constraint; prefer a regular INDEX.
2) If bulk loading a fresh table, delay creating any indexes besides the PRIMARY KEY. If you create them all at once after the data is loaded, InnoDB is able to apply a pre-sort and bulk-load process, which is both faster and typically results in more compact indexes.
3) More memory can actually help in performance optimization. If SHOW ENGINE INNODB STATUS shows any reads/s under BUFFER POOL AND MEMORY and the number of Free buffers (also under BUFFER POOL AND MEMORY) is zero, you could benefit from more memory (assuming you have sized innodb_buffer_pool_size correctly on your server).
4) Normally your database table gets re-indexed after every insert. That's some heavy lifting for your database, but when your queries are wrapped inside a transaction, the table does not get re-indexed until after the entire bulk is processed, saving a lot of work.
5) Most MySQL servers have query caching enabled. It's one of the most effective methods of improving performance that is quietly handled by the database engine. When the same query is executed multiple times, the result is fetched from the cache, which is quite fast.
6) Using the EXPLAIN keyword can give you insight on what MySQL is doing to execute your query. This can help you spot the bottlenecks and other problems with your query or table structures. The results of an EXPLAIN query will show you which indexes are being utilized, how the table is being scanned and sorted etc...
7) If your application contains many JOIN queries, you need to make sure that the columns you join on are indexed in both tables (see the sketch after this list). This affects how MySQL internally optimizes the join operation.
8) Give every table an id column that is the PRIMARY KEY, AUTO_INCREMENT, and one of the flavors of INT, preferably UNSIGNED since the value cannot be negative.
9) Even if you have a users table that has a unique username field, do not make that your primary key. VARCHAR fields as primary keys are slower, and you will have a better structure in your code by referring to all users by their id internally.
10) Normally when you perform a query from a script, it will wait for the execution of that query to finish before it can continue. You can change that by using unbuffered queries. This saves a considerable amount of memory with SQL queries that produce large result sets, and you can start working on the result set immediately after the first row has been retrieved as you don't have to wait until the complete SQL query has been performed.
11) With database engines, disk is perhaps the most significant bottleneck. Keeping things smaller and more compact is usually helpful in terms of performance, to reduce the amount of disk transfer.
12) The two main storage engines in MySQL are MyISAM and InnoDB. Each has its own pros and cons. MyISAM is good for read-heavy applications, but it doesn't scale very well when there are a lot of writes. Even if you are updating one field of one row, the whole table gets locked, and no other process can even read from it until that query is finished. MyISAM is very fast at calculating SELECT COUNT(*) types of queries. InnoDB tends to be a more complicated storage engine and can be slower than MyISAM for most small applications. But it supports row-level locking, which scales better. It also supports some more advanced features, such as transactions.
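To make points 7, 8 and 9 concrete, here is a small sketch (the tables and columns are invented): an INT UNSIGNED AUTO_INCREMENT id as the primary key, the unique username kept as a secondary index rather than the primary key, and the join column indexed, which EXPLAIN can then confirm is being used.
CREATE TABLE users (
  id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
  username VARCHAR(50)  NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uq_username (username)   -- unique, but not the primary key
);
CREATE TABLE orders (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id INT UNSIGNED NOT NULL,
  total   DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_user_id (user_id)           -- index the column used for the join
);
-- EXPLAIN shows whether the join uses idx_user_id and the users primary key:
EXPLAIN
SELECT u.username, o.total
FROM users u
JOIN orders o ON o.user_id = u.id;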

When are committed changes visible to other transactions?

A transaction in an Oracle db makes changes in the db, and the changes are committed. Is it possible that other transactions see the performed changes only after several seconds, not immediately?
Background:
We have an application that performs db changes, commits them, and THEN (immediately) it reads the changed data back from the database. However, sometimes it happens that it finds no changes. When the same read is repeated later (by executing the same select manually from SQL Developer), the changed data are returned correctly. The db is standalone, not clustered.
The application does not communicate with the database directly (that would be easy); several layers (including MQ messaging) are involved. We've already eliminated other potential causes of the behaviour (like incorrect parameters, caching, etc.). Now I'd like to eliminate unexpected behaviour of the Oracle db as the cause.
Edit:
First, I'd like to emphasize that I'm NOT asking whether uncommitted changes can be visible to other sessions.
Second, the Oracle COMMIT statement has several modifiers, like WRITE BATCH or NOWAIT. I don't know whether these modifiers can have any influence on the answer to my question, but we are not using them anyway.
Assuming that your sessions are all using a read committed isolation level, changes would be visible to any query that starts after the data was committed. It was possible in early versions of RAC to have a small delay between when a change was committed on one node and when it was visible on another node but that has been eliminated for a while and you're not using RAC so that's presumably not it.
If your transactions are using the serializable isolation level and the insert happens in a different session than the select, the change would only be visible to other sessions whose transactions began after the change was committed. If sessions A & B both start serializable transactions at time 0, A inserts a row at time 1, and B queries the data at time 2, B would see the state of the data at time 0 and wouldn't see the data that was inserted at time 1 until after it committed its transaction. Note that this would only apply if the two statements are in different sessions-- session A would see the row because it was inserted in A's transaction.
Barring an isolation level issue, I would expect that the SELECT wasn't actually running after the INSERT committed.
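A sketch of the serializable scenario described above (the reservations table and values are invented):
-- Time 0: both sessions start serializable transactions.
-- Session A:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Session B:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Time 1: session A inserts and commits.
INSERT INTO reservations (id, room_no) VALUES (42, 101);
COMMIT;

-- Time 2: session B still sees the data as of time 0.
SELECT COUNT(*) FROM reservations WHERE id = 42;   -- returns 0
COMMIT;                                            -- end B's transaction
SELECT COUNT(*) FROM reservations WHERE id = 42;   -- now returns 1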

Questions on sqlite transactions

1) Is it OK to do inserts, deletes and updates in one transaction?
2) Is there a (recommended) limit to the number of writes per transaction?
A transaction is a logical block. You can do whatever you want within one transaction.
For example, for one function of our product we build a temporary table, insert a set of tuples there and then run a SELECT that uses that temporary table and a permanent table. All of that happens inside a transaction which is rolled back afterwards, so that no changes are made to the database.
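For instance, that pattern looks roughly like this in SQLite (the table names are invented); the INSERTs and the SELECT sit in one transaction, and the ROLLBACK leaves the database untouched:
BEGIN;
CREATE TEMP TABLE wanted_ids (id INTEGER PRIMARY KEY);
INSERT INTO wanted_ids (id) VALUES (1), (2), (3);
-- Join the temporary table against a permanent table:
SELECT p.id, p.name
FROM products p
JOIN wanted_ids w ON w.id = p.id;
ROLLBACK;   -- no changes are kept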

Resources