When does InnoDB update row data in the buffer, and when does the change go to disk? - innodb

I have a question about when InnoDB updates row data in the buffer and when the change goes to disk. The question comes from reading about the undo log, which says the old data are kept in the undo log waiting for possible rollbacks. If the engine needs the undo log for rollback, then the changes of an UPDATE query must have modified the row before the commit, right? And then what does the commit actually do, since the data have already been updated?

When you INSERT, UPDATE, or DELETE a row:
Quick summary:
1. Fetch the block containing the row (or the block that should contain the row).
2. Insert/update/delete the row.
3. Mark the block as "dirty". It will eventually be written to disk.
4. Put non-unique secondary index changes in the "change buffer".
More details (on those steps):
To find the 16KB block, drill down the PRIMARY KEY's BTree. If the block is not in the buffer_pool (which is allocated in RAM), fetch it from disk. (This may involve bumping some other block out of the buffer_pool.)
Copy the previous value (in case of Update/Delete) to the undo log, and prep it for flushing to disk.
A background task flushes dirty pages to disk. If all is going smoothly, 'most' of the buffer_pool contains non-dirty pages, and you 'never' have to wait for a 'free' block in the buffer_pool.
The Change Buffer is sort of a "delayed write" for index updates. It is transparent. That is, subsequent index lookups will automagically look in the change buffer and/or the index's BTree. The data in the CB will eventually be blended with the real index BTree and eventually flushed to disk.
UNIQUE keys: All INSERTs and UPDATEs that change the Unique key's column(s) necessarily check for dup-key rather than going through the change buffer.
AUTO_INCREMENT has some other special actions.
Depending on the values of innodb_flush_log_at_trx_commit and innodb_doublewrite, something may be flushed to disk at the end of the transaction. These handle "atomic" transactions and "torn pages". (A sketch of how to inspect these settings follows at the end of this answer.)
Replication: Other activity may include writing to and syncing the binlog, and pushing data to other nodes in the cluster.
The design is "optimistic" in that it is optimized for COMMIT at the expense of ROLLBACK. After a Commit, a process runs around purging the copies that were kept in case of a crash and Rollback. A Rollback is more complex in that it must put back the old copies of the rows. (See also "history list".)
Search for some of the keywords I have mentioned; read some other web pages; then come back with a more specific question.
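If you want to look at those knobs and at the buffer_pool state yourself, a few read-only queries do it. This is just a sketch, assuming MySQL 5.7+ or a recent MariaDB where these information_schema tables exist:

```sql
-- Durability / write-path settings mentioned above
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW VARIABLES LIKE 'innodb_doublewrite';
SHOW VARIABLES LIKE 'innodb_change_buffering';

-- How much of the buffer_pool is currently dirty (i.e. waiting to be flushed)
SELECT POOL_ID, POOL_SIZE, DATABASE_PAGES, MODIFIED_DATABASE_PAGES
FROM information_schema.INNODB_BUFFER_POOL_STATS;
```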
Commit
Let's look at it from a different side. Each row, including not-yet-committed rows being changed/deleted, has a "transaction id". All the rows for a given transaction have the same id. So, even if there is a crash, InnoDB knows what to clean up. COMMIT and ROLLBACK need to be 'atomic'; this is aided by having a single write to disk that "says it all". The only way for that to be possible is for the transaction id to be the key. Keep in mind, there could be a million rows scattered around the buffer_pool and data files and logs waiting for the commit/rollback.
After the commit/rollback, InnoDB can leisurely run around cleaning things up. For example, until an UPDATE is committed or rolled back, there are two copies of each row being changed. One of the rows needs to be removed -- eventually. Meanwhile, the two rows are on a "history list". Other transactions search through the history list to see which version of the row they are allowed to see -- READ UNCOMMITTED = the latest version, even if it has not yet been committed or rolled back; READ COMMITTED = the latest version that has been committed; etc.
If I understand it correctly, the undo log is an optimization. For example, on a DELETE the "old values" of the rows are copied to the undo log, and the row is actually deleted from the data BTree. The optimization here is that the undo log is serially written, while the BTree may involve a lot more blocks, scattered around the table. Also, the normal processing of data blocks includes caching them in the buffer_pool. For Commit, the records in the undo log are tossed. For Rollback, there is the tedious effort of using the undo log for reconstruction.
Yes, the history list adds work for all other transactions touching your recently changed rows. But it enables transaction-isolation-modes and aids in recovery from crashes.
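A rough way to watch this machinery from SQL -- a sketch only, assuming a recent MySQL/MariaDB where these information_schema tables are available:

```sql
-- Open transactions and their ids
SELECT trx_id, trx_state, trx_started
FROM information_schema.INNODB_TRX;

-- Length of the history list (old row versions waiting to be purged)
SELECT NAME, `COUNT`
FROM information_schema.INNODB_METRICS
WHERE NAME = 'trx_rseg_history_len';
```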

Related

Can MariaDB return incomplete data?

I am using MySQL Connector to connect to a MariaDB server.
A function in my program periodically retrieves all entries in a table (with a select * from ... without any wheres, limits, etc.).
After it gets the data, it checks whether these rows (using an auto-incremented id) are already present in its memory, and if not it adds them. But if a row does not exist in the retrieved list but is present in the in-memory list, then that row must be deleted from memory.
Deleting that row from memory is not the only thing that happens. It also deletes a bunch of other tables/files linked to that row. So, if the connector somehow fails, does not retrieve the full list, and does not report this, then I'll get into trouble.
It might be a bit of a stupid question, but I couldn't work out whether I need any additional safety measures.

SQL Server Data Archiving

I have a SQL Azure database on which I need to perform some data archiving operation.
Plan is to move all the irrelevant data from the actual tables into Archive_* tables.
I have tables which have up to 8-9 million records.
One option is to write a stored procedure that inserts data into the new Archive_* tables and also deletes it from the actual tables.
But this operation is really time-consuming, running for more than 3 hours.
I am in a situation where I can't have more than an hour's downtime.
How can I make this archiving faster?
You can use Azure Automation to schedule execution of a stored procedure every day at the same time, during a maintenance window, where this stored procedure archives only the oldest one week or one month of data each time it runs. The stored procedure should archive data older than X number of weeks/months/years only. Please read this article to create the runbook. In a few days you will have all the old data archived, and the Runbook will continue to do the job from then on.
You can't make it faster, but you can make it seamless. The first option is to have a separate task that moves data in portions from the source to the archive tables. In order to prevent table lock escalation and overall performance degradation, I would suggest limiting the size of a single transaction: e.g. start a transaction, insert N records into the archive table, delete those records from the source table, commit the transaction (as sketched below). Continue for a few days until all the necessary data is transferred. The advantage of this approach is that if there is some kind of failure, you can restart the archival process and it will continue from the point of the failure.
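A minimal T-SQL sketch of that batched move -- the table and column names here are made up, and the batch size is something you would tune to what your log and locking can tolerate:

```sql
DECLARE @BatchSize int = 5000;
DECLARE @Cutoff    datetime2 = DATEADD(MONTH, -6, SYSUTCDATETIME());
DECLARE @Rows      int = 1;

WHILE @Rows > 0
BEGIN
    BEGIN TRANSACTION;

    -- Move one batch: DELETE ... OUTPUT writes the deleted rows straight into the archive table
    DELETE TOP (@BatchSize)
    FROM dbo.Orders
    OUTPUT DELETED.* INTO dbo.Archive_Orders
    WHERE OrderDate < @Cutoff;

    SET @Rows = @@ROWCOUNT;   -- capture before COMMIT resets it

    COMMIT TRANSACTION;
END
```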
The second option, which does not exclude the first one, really depends on how critical the performance of the source tables is for you and how many updates are happening on them. If it is not a problem, you can write triggers that pour every inserted/updated record into an archive table (see the trigger sketch below). Then, when you want to clean up, all you need to do is delete the obsolete records from the source tables; their copies will already be in the archive tables.
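A minimal sketch of such a trigger, again with hypothetical table and column names. Note that a plain AFTER INSERT, UPDATE trigger writes a new archive row on every update, so you may want to deduplicate during cleanup:

```sql
CREATE TRIGGER trg_Orders_Archive
ON dbo.Orders
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy each inserted/updated row into the archive table as it happens
    INSERT INTO dbo.Archive_Orders (OrderId, OrderDate, Amount)
    SELECT OrderId, OrderDate, Amount
    FROM inserted;
END;
```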
In both cases you will not need any downtime.

Database having only one table with one column and one value, size over 200 MB

I have been playing with a database: I imported over a million columns, played with functions, and at the end I selected only one value and deleted the rest, yet my database is over 200 MB in size. I am doing this in sqlite3. How do I reduce its size?
Your database is probably still reserving the space from your previous records.
This is essentially the same question:
Why does clearing an SQLite database not reduce its size?
The accepted answer:
When an object (table, index, trigger, or view) is dropped from the database, it leaves behind empty space. This empty space will be reused the next time new information is added to the database. But in the meantime, the database file might be larger than strictly necessary. Also, frequent inserts, updates, and deletes can cause the information in the database to become fragmented - scattered out all across the database file rather than clustered together in one place.
The VACUUM command cleans the main database by copying its contents to a temporary database file and reloading the original database file from the copy. This eliminates free pages, aligns table data to be contiguous, and otherwise cleans up the database file structure.
Edit: you may want to research the pragma command 'auto_vacuum' if you expect to be doing this regularly. It will keep your file size down but has some pros and cons. In a production environment it is best to reserve more space than you need, as this reduces the risk of running out of disk space on the server.
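In sqlite3 that boils down to something like the following; note that switching auto_vacuum on an existing file only takes effect after a VACUUM:

```sql
-- Reclaim the free pages left behind by the deletes
VACUUM;

-- Optional: reclaim pages automatically on future deletes
PRAGMA auto_vacuum = FULL;
VACUUM;   -- required once for the pragma to take effect on an existing database
```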

When are committed changes visible to other transactions?

A transaction in an Oracle db makes changes in the db, and the changes are committed. Is it possible that other transactions see the performed changes only after several seconds, not immediately?
Background:
We have an application that performs db changes, commits them, and THEN (immediately) it reads the changed data back from the database. However, sometimes it happens that it finds no changes. When the same read is repeated later (by executing the same select manually from SQL Developer), the changed data are returned correctly. The db is standalone, not clustered.
The application does not communicate with the database directly (that would be easy); several layers (including MQ messaging) are involved. We've already eliminated other potential causes of the behaviour (like incorrect parameters, caching, etc.). Now I'd like to eliminate unexpected behaviour of the Oracle db as the cause.
Edit:
First, I'd like to emphasize that I'm NOT asking whether uncommitted changes can be visible to other sessions.
Second, the Oracle COMMIT statement has several modifiers, like WRITE BATCH or NOWAIT. I don't know whether these modifiers can have any influence on the answer to my question, but we are not using them anyway.
Assuming that your sessions are all using a read committed isolation level, changes would be visible to any query that starts after the data was committed. It was possible in early versions of RAC to have a small delay between when a change was committed on one node and when it was visible on another node but that has been eliminated for a while and you're not using RAC so that's presumably not it.
If your transactions are using the serializable isolation level and the insert happens in a different session than the select, the change would only be visible to other sessions whose transactions began after the change was committed. If sessions A & B both start serializable transactions at time 0, A inserts a row at time 1, and B queries the data at time 2, B would see the state of the data at time 0 and wouldn't see the data that was inserted at time 1 until after it committed its transaction. Note that this would only apply if the two statements are in different sessions-- session A would see the row because it was inserted in A's transaction.
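For illustration, a sketch of that serializable scenario; the table name is hypothetical:

```sql
-- session B
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- session A
INSERT INTO t (id) VALUES (1);
COMMIT;

-- session B, still inside the transaction started above
SELECT * FROM t WHERE id = 1;   -- no rows: B sees the snapshot taken when its transaction began
COMMIT;

-- session B, in a new transaction
SELECT * FROM t WHERE id = 1;   -- the row is visible now
```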
Barring an isolation level issue, I would expect that the SELECT wasn't actually running after the INSERT committed.

riak - unable to delete keys in a bucket

I am using riak version 1.4.10 and it is in a ring with two hosts. I am unable to get rid of keys left over from previous operations using simple delete operations on keys. When I list the keys for a bucket, it shows me the old keys, however if I try to retrieve the data associated with a key, no data is found. When I try to delete the key, it still persists. What could be the cause of this? Is there a way to wipe the keys in the bucket so it starts from a clean slate? I don't care about any of the data in riak, but I would rather not have to reinstall everything again.
You are probably seeing the tombstones of the old data. Since Riak is an eventually consistent data store, it needs to keep track of deletes as if they were ordinary writes, at least for a little while.
If data is present on one node, but not another, how do you tell if it is a PUT that hasn't propagated yet, or a DELETE?
Riak solves this by using a tombstone. Whenever you delete something, instead of just wiping the data immediately, Riak replaces the existing value with a special value that it knows means deleted. This special value contains a vclock that is descended from the previous value, and metadata indicating deleted. So when it comes time to decide the above question, Riak simply compares the vclock of the value with that of the tombstone. Whichever descends from the other must be the correct one.
To solve the problem of an ever growing data size that contains mostly tombstones, tombstones are reaped after a time. The time is set using the delete_mode setting. After the DELETE is processed, and the tombstone has been written to the primary vnodes, the delete process issues a GET request for the key. Whenever the GET process encounters a tombstone, and all of the primary vnodes responded with the same tombstone, it schedules the tombstone to be reaped according to the delete_mode setting.
So if you want to actually get rid of the tombstones, check your delete_mode setting to make sure it is not set to 'keep', and issue a get for each one to make sure it is really gone.
Or if you are just wiping the data store to restart your tests, stop Riak, delete all the files under the data_root for the backend you are using, and restart.