Why does clearing an SQLite database not reduce its size? - sqlite

I have an SQLite database.
I created the tables and filled them with a considerable amount of data.
Then I cleared the database by deleting and recreating the tables. I confirmed that all the data had been removed and the tables were empty by looking at them using SQLite Administrator.
The problem is that the size of the database file (*.db3) remained the same after it had been cleared.
This is of course not desirable as I would like to regain the space that was taken up by the data once I clear it.
Did anyone make a similar observation and/or know what is going on?
What can be done about it?

From here:
When an object (table, index, trigger, or view) is dropped from the database, it leaves behind empty space. This empty space will be reused the next time new information is added to the database. But in the meantime, the database file might be larger than strictly necessary. Also, frequent inserts, updates, and deletes can cause the information in the database to become fragmented - scattered all across the database file rather than clustered together in one place.
The VACUUM command cleans the main database by copying its contents to a temporary database file and reloading the original database file from the copy. This eliminates free pages, aligns table data to be contiguous, and otherwise cleans up the database file structure.
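A minimal sketch of running VACUUM from Python's built-in sqlite3 module (the file name is a placeholder); the same single statement can also be issued from the sqlite3 command-line shell:

    import os
    import sqlite3

    db_path = "example.db3"  # placeholder path; substitute your own file
    before = os.path.getsize(db_path)

    con = sqlite3.connect(db_path)
    con.isolation_level = None   # autocommit, since VACUUM cannot run inside a transaction
    con.execute("VACUUM")        # rebuilds the file and drops the free pages
    con.close()

    after = os.path.getsize(db_path)
    print(f"before: {before} bytes, after VACUUM: {after} bytes")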

Database file sizes work like high-water marks: if the water rises, the mark goes up; when the water recedes, the mark stays where it was.
You should look into shrinking the database.

Related

Update second sqlite database from first database

I have database A which gets new data every second day. Now I have a second database B, which is a duplicate of database A, which is stored on a different server for read access.
What I do at the moment is copy the whole database A via rsync to replace database B. This is no problem in regards to locking and reading, because I know exactly when the writing into A has finished. Also no problem concerning access to B.
Now A is getting quite large, and the copying becomes unreliable because of network errors, so I regularly have to copy the database a second time.
Is there a way (I am sure there is) to let sqlite do the updating of B? I have seen .clone, but again it would transfer a large amount of data when only a small amount has changed.
Any suggestions on how I can more efficiently bring the new data added to A over to B at a later time (using sqlite3)?
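For reference, a minimal sketch of a whole-database copy done by SQLite itself through the backup API (as exposed by Python's sqlite3.Connection.backup, Python 3.7+); the file names are hypothetical, and like .clone it still transfers the full database rather than just the changes:

    import sqlite3

    # Hypothetical file names; A is the source, B is the copy for the read-only server.
    src = sqlite3.connect("database_a.db3")
    dst = sqlite3.connect("database_b.db3")

    src.backup(dst)   # page-by-page copy performed under SQLite's own locking

    src.close()
    dst.close()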

When does InnoDB deliver updates to row data in the buffer and on disk?

I have a question about when InnoDB updates row data in the buffer and when the change goes to disk. This question comes from reading about the undo log, which says the historical data sit in the undo log waiting for rollbacks. If the engine needs the undo log for rollback, the changes of an update query must have changed the row before commit? And then what does the commit do, since the data have already been updated?
When you INSERT, UPDATE, or DELETE a row:
Quick summary:
1. Fetch the block containing the row (or the block that should contain the row).
2. Insert/update/delete the row.
3. Mark the block as "dirty". It will eventually be written to disk.
4. Put non-unique secondary index changes in the "change buffer".
More details (on those steps):
To find the 16KB block, drill down the PRIMARY KEY's BTree. If the block is not in the buffer_pool (which is allocated in RAM), fetch it from disk. (This may involve bumping some other block out of the buffer_pool.)
Copy the previous value (in case of Update/Delete) to the undo log, and prep it for flushing to disk.
A background task flushes dirty pages to disk. If all is going smoothly, 'most' of the buffer_pool contains non-dirty pages, and you 'never' have to wait for a 'free' block in the buffer_pool.
The Change Buffer is sort of a "delayed write" for index updates. It is transparent. That is, subsequent index lookups will automagically look in the change buffer and/or the index's BTree. The data in the CB will eventually be blended with the real index BTree and eventually flushed to disk.
UNIQUE keys: All INSERTs and UPDATEs that change the Unique key's column(s) necessarily check for dup-key rather than going through the change buffer.
AUTO_INCREMENT has some other special actions.
Depending on the values of innodb_flush_log_at_trx_commit and innodb_doublewrite something may be flushed to disk at the end of the transaction. These handle "atomic" transactions and "torn pages".
Replication: Other activity may include writing to and syncing the binlog, and pushing data to other nodes in the cluster.
The design is "optimistic" in that it is optimized for COMMIT at the expense of ROLLBACK. After a Commit, a process runs around purging the copies that were kept in case of a crash and Rollback. A Rollback is more complex in that it must put back the old copies of the rows. (See also "history list".)
Search for some of the keywords I have mentioned; read some other web pages; then come back with a more specific question.
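If you want to check how a server is configured for the settings mentioned above (flush-at-commit behaviour, the doublewrite buffer, buffer pool size), here is a minimal sketch assuming the mysql-connector-python package; the connection parameters are placeholders:

    import mysql.connector  # assumes the mysql-connector-python package

    # Placeholder credentials; substitute your own.
    conn = mysql.connector.connect(host="localhost", user="root",
                                   password="secret", database="test")
    cur = conn.cursor()

    cur.execute("SHOW VARIABLES WHERE Variable_name IN "
                "('innodb_flush_log_at_trx_commit', 'innodb_doublewrite', "
                "'innodb_buffer_pool_size')")
    for name, value in cur.fetchall():
        print(name, "=", value)

    cur.close()
    conn.close()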
Commit
Let's look at it from a different side. Each row, including not-yet-committed rows being changed/deleted, has a "transaction id". All the rows for a given transaction have the same id. So, even if there is a crash, InnoDB knows what to clean up. COMMIT and ROLLBACK need to be 'atomic'; this is aided by having a single write to disk that "says it all". The only way for that to be possible is for the transaction id to be the key. Keep in mind, there could be a million rows scattered around the buffer_pool, data files, and logs waiting for the commit/rollback.
After the commit/rollback, InnoDB can leisurely run around cleaning things up. For example, until an UPDATE is committed or rolled back, there are two copies of each row being changed. One of them needs to be removed -- eventually. Meanwhile, the two rows are on a "history list". Other transactions search through the history list to see which version of the row they are allowed to see -- READ UNCOMMITTED = latest row version that has not been committed / rolled back; READ COMMITTED = latest row version that has been committed / rolled back; etc.
If I understand it correctly, the undo log is an optimization. For example, on a DELETE the "old values" of the rows are copied to the undo log, and the row is actually deleted from the data BTree. The optimization here is that the undo log is serially written, while the BTree may involve a lot more blocks, scattered around the table. Also, the normal processing of data blocks includes caching them in the buffer_pool. For Commit, the records in the undo log are tossed. For Rollback, there is the tedious effort of using the undo log for reconstruction.
Yes, the history list adds work for all other transactions touching your recently changed rows. But it enables transaction-isolation-modes and aids in recovery from crashes.
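To make the commit/rollback behaviour concrete, a minimal sketch (again assuming mysql-connector-python; the credentials and the accounts table are hypothetical) that updates a row and then rolls back, so the old row version kept for exactly this purpose becomes current again:

    import mysql.connector  # assumes the mysql-connector-python package

    conn = mysql.connector.connect(host="localhost", user="root",
                                   password="secret", database="test")
    conn.autocommit = False          # explicit transactions
    cur = conn.cursor()

    # Hypothetical table: accounts(id INT PRIMARY KEY, balance INT).
    cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    # At this point the change lives in the buffer_pool (dirty page) and the
    # old row version is still reachable via the undo log / history list.

    conn.rollback()                  # InnoDB restores the old row versions
    # conn.commit() would instead make the new versions current and let the
    # purge thread discard the old copies later.

    cur.close()
    conn.close()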

Why is the SQLite checksum not the same after reversing edits?

Obviously, editing any column value will change the checksum.
But saving the original value back will not return the file to the original checksum.
I ran VACUUM before and after so it isn't due to buffer size.
I don't have any indexes referencing the column and rows are not added or removed so pk index shouldn't need to change either.
I tried turning off the rollback journal, but that is a separate file so I'm not surprised it had no effect.
I'm not aware of an internal log or modified dates to explain why the same content does not produce the same file bytes.
Looking for insight on what is happening inside the file to explain this, and whether there is a way to make it behave (I don't see a relevant PRAGMA).
Granted, https://sqlite.org/dbhash.html exists to work around this problem, but I don't see any of these conditions being triggered; "... and so forth" is a pretty vague cause.
Database files contain (the equivalent of) a timestamp of the last modification so that other processes can detect that the data has changed.
There are many other things that can change in a database file (e.g., the order of pages, the B-tree structure, random data in unused parts) without a difference in the data as seen at the SQL level.
If you want to compare databases at the SQL level, you have to compare a canonical SQL representation of that data, such as the .dump output, or use a specialized tool such as dbhash.
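A minimal sketch of that first approach, hashing a canonical SQL dump produced by Python's sqlite3 iterdump() instead of the raw file bytes; the file names are placeholders, and dbhash or the shell's .dump piped through a hash tool would serve the same purpose:

    import hashlib
    import sqlite3

    def sql_level_hash(path):
        """Hash the SQL dump of a database rather than its file bytes."""
        con = sqlite3.connect(path)
        digest = hashlib.sha256()
        for line in con.iterdump():   # CREATE/INSERT statements, like the shell's .dump
            digest.update(line.encode("utf-8"))
        con.close()
        return digest.hexdigest()

    # Placeholder file names.
    print(sql_level_hash("before_edit.db3") == sql_level_hash("after_revert.db3"))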

Database having only one table with one column and one value, size over 200 MB

I have been playing with a database: I imported over a million records, played with functions, and at the end I selected only one value and deleted the rest of it, yet my database is over 200 MB in size. I am doing it in sqlite3. How do I reduce its size?
Your database is probably still reserving the space from your previous records.
This is essentially the same question:
Why does clearing an SQLite database not reduce its size?
The accepted answer:
When an object (table, index, trigger, or view) is dropped from the database, it leaves behind empty space. This empty space will be reused the next time new information is added to the database. But in the meantime, the database file might be larger than strictly necessary. Also, frequent inserts, updates, and deletes can cause the information in the database to become fragmented - scattered all across the database file rather than clustered together in one place.
The VACUUM command cleans the main database by copying its contents to a temporary database file and reloading the original database file from the copy. This eliminates free pages, aligns table data to be contiguous, and otherwise cleans up the database file structure.
Edit: you may want to research the pragma command 'auto_vacuum' if you expect to be doing this regularly. It will keep your file size down but has some pros and cons. In a production environment it is best to reserve more space than you need, as this reduces the risk of running out of disk space on the server.
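If you do go the auto_vacuum route, a minimal sketch with Python's built-in sqlite3 module (the file name is a placeholder); note that switching an existing database to auto_vacuum only takes effect after a VACUUM rebuilds the file:

    import sqlite3

    con = sqlite3.connect("big_but_mostly_empty.db3")  # placeholder path
    con.isolation_level = None   # autocommit, so VACUUM can run

    # FULL auto_vacuum truncates freed pages off the end of the file after
    # each commit; on an existing database it needs a VACUUM to kick in.
    con.execute("PRAGMA auto_vacuum = FULL")
    con.execute("VACUUM")

    print(con.execute("PRAGMA auto_vacuum").fetchone())  # (1,) means FULL
    con.close()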

sqlite: online backup is not identical to original

I'm doing an online backup of an (idle) database using the example 2 code from here. The backup file is not identical to the original (the length is the same, but it differs in 3 bytes), although the .dump from both databases is identical. Backup files taken at different times are identical to each other.
This isn't great, as I'd like a simple guarantee that the backup is identical to the original, and I'd like to record checksums on the actual database and the backups to simplify restores. Any idea if I can get around this, or if I can use the backup API to generate files that compare identically?
The online backup can write into an existing database, so this writing is done inside a transaction.
At the end of such a transaction, the file change counter (offsets 24-27) is changed to allow other processes to detect that the database was modified and that any caches in those processes are invalid.
This change counter does not use the value from the original database because it might be identical to the old value of the destination database.
If the destination database is freshly created, the change counter starts at zero.
This is likely to be a change from the original database, but at least it's consistent.
The byte at offset 28 was decreased because the database has some unused pages.
The byte at offset 44 was changed because the database does not actually use new schema features.
You might be able to avoid these changes by doing a VACUUM before the backup, but this wouldn't help for the change counter.
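To see exactly which of these header fields differ between the original and the backup, a minimal sketch that reads the 4-byte fields starting at the offsets mentioned above; the file names are placeholders:

    import struct

    def header_fields(path):
        """Read a few 4-byte big-endian fields from the SQLite file header."""
        with open(path, "rb") as f:
            header = f.read(100)              # the header is the first 100 bytes
        return {
            "change_counter": struct.unpack(">I", header[24:28])[0],
            "page_count":     struct.unpack(">I", header[28:32])[0],
            "schema_format":  struct.unpack(">I", header[44:48])[0],
        }

    # Placeholder file names.
    original = header_fields("original.db3")
    backup   = header_fields("backup.db3")
    for field in original:
        if original[field] != backup[field]:
            print(f"{field}: {original[field]} -> {backup[field]}")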
I would not have expected them to be identical; the backup API only ensures that any backup is self-consistent (i.e. transactions in progress are ignored).
