SQLite: online backup is not identical to original

I'm doing an online backup of an (idle) database using the example 2 code from here. The backup file is not identical to the original (the length is the same, but it differs in 3 bytes), although the .dump from both databases is identical. Backup files taken at different times are identical to each other.
This isn't great, as I'd like a simple guarantee that the backup is identical to the original, and I'd like to record checksums on the actual database and the backups to simplify restores. Any idea if I can get around this, or if I can use the backup API to generate files that compare identically?

The online backup can write into an existing database, so this writing is done inside a transaction.
At the end of such a transaction, the file change counter (offsets 24-27) is changed to allow other processes to detect that the database was modified and that any caches in those processes are invalid.
This change counter does not use the value from the original database because it might be identical to the old value of the destination database.
If the destination database is freshly created, the change counter starts at zero.
This is likely to be a change from the original database, but at least it's consistent.
The byte at offset 28 (part of the in-header database size, in pages) was decreased because the database has some unused pages.
The byte at offset 44 (part of the schema format number) was changed because the database does not actually use new schema features.
You might be able to avoid these changes by doing a VACUUM before the backup, but this wouldn't help for the change counter.
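To see which header fields differ, one can run a backup through Python's `sqlite3.Connection.backup()`, a wrapper around the same online backup API, and decode the 4-byte big-endian header fields mentioned above. This is a minimal sketch: the helper names `header_fields`, `backup_to` and `dump_text` are hypothetical, and the offsets come from the SQLite file-format documentation. I also include the schema cookie at offset 40, which may change during a backup as well. The exact bytes that differ can vary between SQLite versions, so the script prints the fields rather than predicting them.

```python
import os
import sqlite3
import struct
import tempfile

# 4-byte big-endian header fields (offsets from the SQLite file-format docs).
HEADER_FIELDS = {
    "change_counter": 24,  # file change counter
    "page_count": 28,      # in-header database size, in pages
    "schema_cookie": 40,   # schema cookie (may also change during a backup)
    "schema_format": 44,   # schema format number
}


def header_fields(path: str) -> dict:
    """Decode selected fields from the 100-byte database header."""
    with open(path, "rb") as f:
        header = f.read(100)
    return {name: struct.unpack(">I", header[off:off + 4])[0]
            for name, off in HEADER_FIELDS.items()}


def backup_to(src_path: str, dst_path: str) -> None:
    """Copy src_path into dst_path using SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # pages=-1 by default: copy everything in one step
    finally:
        dst.close()
        src.close()


def dump_text(path: str) -> str:
    """Return the SQL dump of a database (similar to the CLI's .dump)."""
    conn = sqlite3.connect(path)
    try:
        return "\n".join(conn.iterdump())
    finally:
        conn.close()


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "original.db")
    backup = os.path.join(workdir, "backup.db")

    conn = sqlite3.connect(original)
    conn.execute("CREATE TABLE t(x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
    conn.commit()
    conn.close()

    backup_to(original, backup)
    print("dumps equal:", dump_text(original) == dump_text(backup))
    print("original:", header_fields(original))
    print("backup:  ", header_fields(backup))
```

Comparing the two printed dictionaries shows which header fields the backup rewrote, while the dump comparison confirms the SQL-level content is the same.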

I would not have expected them to be identical, if only because the backup API ensures that backups are self-consistent (i.e., transactions in progress are ignored).

Related

Update second sqlite database from first database

I have database A, which gets new data every second day. I also have a second database B, a duplicate of A stored on a different server for read access.
At the moment, I copy the whole of database A via rsync to replace database B. This causes no problems with locking or reading, because I know exactly when the writing into A has finished, and access to B is not a problem either.
Now A is getting quite large, and the copying becomes unreliable because of network errors so that I regularly have to copy the database a second time.
Is there a way (I am sure there is) to let sqlite do the updating of B? I have seen .clone, but, again, it would transfer a large amount of data, when only a small amount has changed.
Any suggestions on how I can make transferring the data newly added to A into B more efficient (using sqlite3)?

Why is the SQLite checksum not the same after reversing edits?

Obviously, editing any column value will change the checksum.
But saving the original value back does not return the file to the original checksum.
I ran VACUUM before and after, so it isn't due to buffer size.
I don't have any indexes referencing the column, and rows are not added or removed, so the PK index shouldn't need to change either.
I tried turning off the rollback journal, but that is a separate file so I'm not surprised it had no effect.
I'm not aware of an internal log or modified dates to explain why the same content does not produce the same file bytes.
I'm looking for insight into what is happening inside the file that explains this, and whether there is a way to make it behave (I don't see a relevant PRAGMA).
Granted, https://sqlite.org/dbhash.html exists to work around this problem, but I don't see any of the conditions it lists being triggered; "... and so forth" is a pretty vague cause.
Database files contain (the equivalent of) a timestamp of the last modification so that other processes can detect that the data has changed.
There are many other things that can change in a database file (e.g., the order of pages, the B-tree structure, random data in unused parts) without a difference in the data as seen at the SQL level.
If you want to compare databases at the SQL level, you have to compare a canonical SQL representation of that data, such as the .dump output, or use a specialized tool such as dbhash.
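A minimal sketch of the "canonical SQL representation" approach using Python's `sqlite3` module: hash the output of `Connection.iterdump()`, Python's counterpart to the CLI's `.dump`, instead of the raw file bytes. `content_hash` is a hypothetical helper name. Rows are emitted in storage order rather than sorted, so this is only as canonical as that order; dbhash remains the purpose-built tool.

```python
import hashlib
import os
import sqlite3
import tempfile


def content_hash(path: str) -> str:
    """SHA-256 over the SQL dump of a database, ignoring file-level details."""
    conn = sqlite3.connect(path)
    try:
        digest = hashlib.sha256()
        for statement in conn.iterdump():
            digest.update(statement.encode("utf-8"))
            digest.update(b"\n")
        return digest.hexdigest()
    finally:
        conn.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, v TEXT)")
    conn.execute("INSERT INTO t VALUES (1, 'original')")
    conn.commit()
    before = content_hash(path)

    # Edit a value, then put the original value back (the scenario in the question).
    conn.execute("UPDATE t SET v = 'edited' WHERE id = 1")
    conn.commit()
    conn.execute("UPDATE t SET v = 'original' WHERE id = 1")
    conn.commit()
    conn.close()

    print("content hash unchanged:", before == content_hash(path))
```

The raw file bytes differ after the round trip (at least the change counter moves), but the dump, and therefore this hash, does not.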

Can select commands edit a sql database in anyway?

Is there any way in which SELECT commands alter an SQLite database? I would assume not, but I don't want to rely on that assumption. (The specific concern I had in mind was whether querying the database might, for example, create indexes or similar for quicker retrieval later, causing the SQL file to change.)
I'm asking because I want to cache some values calculated from an SQL file and only update them if the file has been edited (specifically, if its size in bytes has changed, which would indicate that the database has changed). The calculations are quite computationally intensive, so I don't want to repeat them unless needed.
SELECT statements cannot modify the database.
SQLite sometimes needs to store temporary indexes or intermediate results, but such data goes into the temporary database, not into the actual database file.
Anyway, to find out whether a database file has changed, check the file change counter.
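A minimal sketch of checking the file change counter from Python, assuming the database uses the default rollback-journal mode (in WAL mode the header counter is not guaranteed to change on every transaction). The helpers `change_counter` and `run` are hypothetical; `change_counter` reads the 4-byte big-endian integer at offset 24 of the 100-byte header. The demo also checks that a SELECT leaves the counter alone while an INSERT changes it.

```python
import os
import sqlite3
import struct
import tempfile


def change_counter(path: str) -> int:
    """Return the file change counter (header bytes 24-27, big-endian)."""
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100 or not header.startswith(b"SQLite format 3\x00"):
        raise ValueError(f"{path} does not look like an SQLite 3 database")
    return struct.unpack(">I", header[24:28])[0]


def run(path: str, sql: str) -> None:
    """Execute one statement on a fresh connection, commit, and close."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(sql).fetchall()
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    db = os.path.join(tempfile.mkdtemp(), "cache_source.db")
    run(db, "CREATE TABLE t(x)")
    start = change_counter(db)

    run(db, "SELECT * FROM t")            # read-only: counter should not move
    after_select = change_counter(db)

    run(db, "INSERT INTO t VALUES (42)")  # write transaction: counter changes
    after_insert = change_counter(db)

    print(start, after_select, after_insert)
```

For the caching use case, storing the counter next to the cached values and recomputing only when it differs avoids relying on the file size, which may stay the same across edits.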

Database with only one table, one column, and one value is over 200 MB

I have been experimenting with a database in sqlite3: I imported over a million rows and played with functions. At the end I selected only one value and deleted the rest, but the database is still over 200 MB. How can I reduce its size?
Your database is probably still reserving the space from your previous records.
This is essentially the same question:
Why does clearing an SQLite database not reduce its size?
The accepted answer:
When an object (table, index, trigger, or view) is dropped from the database, it leaves behind empty space. This empty space will be reused the next time new information is added to the database. But in the meantime, the database file might be larger than strictly necessary. Also, frequent inserts, updates, and deletes can cause the information in the database to become fragmented - scattered out all across the database file rather than clustered together in one place.
The VACUUM command cleans the main database by copying its contents to a temporary database file and reloading the original database file from the copy. This eliminates free pages, aligns table data to be contiguous, and otherwise cleans up the database file structure.
Edit: you may want to research the pragma 'auto_vacuum' if you expect to be doing this regularly. It will keep your file size down, but it has trade-offs (for example, it does not defragment the file the way VACUUM does, and on an existing database switching it on only takes effect after a VACUUM). In a production environment it is best to reserve more space than you need, as this reduces the risk of running out of disk space on the server.
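A minimal sketch that reproduces the situation and the fix with Python's `sqlite3` module, assuming the default `auto_vacuum=NONE`: fill a table, delete the rows, and compare the file size before and after VACUUM. `PRAGMA freelist_count` reports how many unused pages the file is still holding. The helper name `shrink_demo`, the table and the row count are arbitrary choices for illustration.

```python
import os
import sqlite3
import tempfile


def shrink_demo(path: str, rows: int = 20_000) -> tuple:
    """Return (size_before_vacuum_bytes, free_pages, size_after_vacuum_bytes)."""
    # isolation_level=None: autocommit, so transactions are managed explicitly
    # and VACUUM is never issued inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("CREATE TABLE t(payload TEXT)")
        conn.execute("BEGIN")
        conn.executemany("INSERT INTO t VALUES (?)",
                         (("x" * 100,) for _ in range(rows)))
        conn.execute("COMMIT")

        conn.execute("DELETE FROM t")  # pages go to the freelist; file keeps its size
        size_before = os.path.getsize(path)
        free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]

        conn.execute("VACUUM")         # rebuilds the file without the free pages
        size_after = os.path.getsize(path)
    finally:
        conn.close()
    return size_before, free_pages, size_after


if __name__ == "__main__":
    db = os.path.join(tempfile.mkdtemp(), "bloated.db")
    before, free, after = shrink_demo(db)
    print(f"before VACUUM: {before} bytes ({free} free pages); after: {after} bytes")
```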

Meta-data from SQLite

Is there any way to query a SQLite database for basic meta data such as:
Last date/time updated
Hash of database to indicate "state"
I am just looking for a simple, infrastructural way to have a script evaluate different databases and take a reasonable point of view on whether they are the same "state" as other databases in a different environment (PROD and DEV for instance).
In my experience, if no update, insert, or other change is made to an SQLite database file, the file's last-modified time doesn't change. So the last-modified time should suffice as the time of the most recent change to the database.
If two database files in the same state are only opened for reading, their modification times do not change.
Similarly, you can compare the file sizes.
You can hash the whole file. If you consider the same data in the database to be the same "state" regardless of how it got there, then you probably want a hash of all the records in the database, which is less straightforward.
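The file-level checks suggested above (modification time, size, whole-file hash) can be gathered in a few lines of Python. This is a sketch with a hypothetical helper name, `file_fingerprint`; it fingerprints the bytes on disk, so two databases with the same SQL-level content can still produce different hashes, and hashing a file while another process is writing to it may capture an inconsistent state.

```python
import hashlib
import os
import shutil
import sqlite3
import tempfile


def file_fingerprint(path: str) -> dict:
    """Return modification time, size in bytes, and SHA-256 of the whole file."""
    st = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            digest.update(chunk)
    return {"mtime": st.st_mtime, "size": st.st_size, "sha256": digest.hexdigest()}


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    prod = os.path.join(workdir, "prod.db")
    dev = os.path.join(workdir, "dev.db")

    conn = sqlite3.connect(prod)
    conn.execute("CREATE TABLE t(x)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()
    conn.close()
    shutil.copyfile(prod, dev)  # byte-for-byte copy; mtime will generally differ

    a, b = file_fingerprint(prod), file_fingerprint(dev)
    print("same size:", a["size"] == b["size"], "same hash:", a["sha256"] == b["sha256"])
```

Comparing only `size` and `sha256` across environments avoids false mismatches from copy times, since a copied file usually gets a new modification time.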
