Meta-data from SQLite

Is there any way to query a SQLite database for basic metadata such as:
Last date/time updated
Hash of database to indicate "state"
I am just looking for a simple, infrastructural way to have a script evaluate different databases and make a reasonable judgment about whether they are in the same "state" as databases in a different environment (PROD and DEV, for instance).

In my experience, the last-modified time of a SQLite database file does not change unless something writes to it (an update, a new record, or any other change). So the last-modified time should suffice as the time of the last change made to the database.
If two database files in the same state are only ever opened for reading, their modified times stay unchanged.
Similarly, you can compare the file sizes.
You can hash the whole file. If you consider identical data to be the same "state" regardless of how each database got there, then you probably want a hash of all the records in the database, which is not as simple.
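The file-level checks above (modified time, size, whole-file hash) can be sketched in Python as follows. This is a minimal sketch, not a definitive tool: the function names `file_fingerprint` and `same_file_bytes` are made up for illustration, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (mtime_ns, size_bytes, sha256_hex) for the file at `path`."""
    info = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a large database file is not loaded all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return info.st_mtime_ns, info.st_size, digest.hexdigest()


def same_file_bytes(path_a, path_b):
    """True when both files have the same size and the same SHA-256 hash."""
    _, size_a, hash_a = file_fingerprint(path_a)
    _, size_b, hash_b = file_fingerprint(path_b)
    return size_a == size_b and hash_a == hash_b
```

Note that this only tells you whether the files are byte-identical. Two databases holding the same records can still have different bytes, so a mismatch here does not prove the data differs.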

Related

Why is the SQLite checksum not the same after reversing edits?

Obviously, editing any column value will change the checksum.
But saving the original value back does not return the file to the original checksum.
I ran VACUUM before and after, so it isn't due to buffer size.
I don't have any indexes referencing the column, and rows are not added or removed, so the PK index shouldn't need to change either.
I tried turning off the rollback journal, but that is a separate file, so I'm not surprised it had no effect.
I'm not aware of an internal log or modified dates that would explain why the same content does not produce the same file bytes.
I'm looking for insight into what is happening inside the file to explain this, and whether there is a way to make it behave (I don't see a relevant PRAGMA).
Granted, https://sqlite.org/dbhash.html exists to work around this problem, but I don't see any of these conditions being triggered, and "... and so forth" is a pretty vague cause.
Database files contain (the equivalent of) a timestamp of the last modification so that other processes can detect that the data has changed.
There are many other things that can change in a database file (e.g., the order of pages, the B-tree structure, random data in unused parts) without a difference in the data as seen at the SQL level.
If you want to compare databases at the SQL level, you have to compare a canonical SQL representation of that data, such as the .dump output, or use a specialized tool such as dbhash.
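As a rough illustration of comparing at the SQL level, here is a sketch that hashes the text dump produced by Python's `sqlite3.Connection.iterdump()`, which is similar in spirit to the shell's .dump output. The function name `dump_hash` is made up for this example, and the dump is only assumed to be stable for simple schemas like the one described in the question; dbhash remains the purpose-built tool.

```python
import hashlib
import sqlite3


def dump_hash(path):
    """Return a SHA-256 hash of the SQL text dump of the database at `path`."""
    conn = sqlite3.connect(path)
    try:
        digest = hashlib.sha256()
        # iterdump() yields the schema and data as SQL statements.
        for statement in conn.iterdump():
            digest.update(statement.encode("utf-8"))
            digest.update(b"\n")
        return digest.hexdigest()
    finally:
        conn.close()
```

With this approach, changing a value and then changing it back should give the original hash again, because the hash depends on the dumped SQL text and not on the page layout or header fields of the file.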

Can SELECT commands edit a SQL database in any way?

Is there any way in which SELECT commands alter a SQLite database? I would assume not, but I don't want to rely on that assumption. (The specific concern I had in mind was whether querying the database creates, for example, indexes or similar structures for quicker retrieval later, causing the database file to change.)
I'm asking because I want to cache some values calculated from a database file and only update them if the file has been edited [specifically, if the file size in bytes has changed, which would indicate that the database has changed. The calculations are quite computationally intensive, so I don't want to repeat them unless needed].
SELECT statements cannot modify the database.
SQLite sometimes needs to store temporary indexes or intermediate results, but such data goes into the temporary database, not into the actual database file.
Anyway, to find out whether a database file has changed, check the file change counter.
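One way to check the file change counter is to read it straight out of the database header, where it is stored as a 4-byte big-endian integer at offsets 24-27. The helper name `read_change_counter` is made up for this sketch. It assumes the default rollback-journal mode; as far as I know, in WAL mode the counter is not necessarily incremented on every transaction, so treat the result with care there.

```python
import struct


def read_change_counter(path):
    """Return the file change counter stored in the SQLite database header."""
    with open(path, "rb") as handle:
        header = handle.read(100)  # the SQLite header is 100 bytes long
    if header[:16] != b"SQLite format 3\x00":
        raise ValueError("not an SQLite 3 database file")
    # Bytes 24..27 hold the counter as a big-endian unsigned 32-bit integer.
    return struct.unpack(">I", header[24:28])[0]
```

A cache could store the counter alongside the computed values and recompute only when `read_change_counter(path)` returns a different number than the stored one.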

How to make sure SQLite database was not modified?

I am looking for a way to tell if a database file was modified or not.
The amount of data stored is not large; however, updates are frequent, and running SELECT statements after every update to create a new checksum of all the data would be too much.
Previously, most of our data was stored as entries with JSON, so it was much easier to get a few rows and create a checksum of them. Now, however, we need to use the database properly, so data will be normalized across a few tables and multiple rows.
I need this to be handled by the database, so I don't want to create an md5 of the database file and check that.
Is there any way I could achieve that?
Whenever a database is modified, the file change counter in the database header is incremented.

sqlite: online backup is not identical to original

I'm doing an online backup of an (idle) database using the example 2 code from here. The backup file is not identical to the original (the length is the same, but it differs in 3 bytes), although the .dump from both databases is identical. Backup files taken at different times are identical to each other.
This isn't great, as I'd like a simple guarantee that the backup is identical to the original, and I'd like to record checksums on the actual database and the backups to simplify restores. Any idea if I can get around this, or if I can use the backup API to generate files that compare identically?
The online backup can write into an existing database, so this writing is done inside a transaction.
At the end of such a transaction, the file change counter (offsets 24-27) is changed to allow other processes to detect that the database was modified and that any caches in those processes are invalid.
This change counter does not use the value from the original database because it might be identical to the old value of the destination database.
If the destination database is freshly created, the change counter starts at zero.
This is likely to be a change from the original database, but at least it's consistent.
The byte at offset 28 was decreased because the database has some unused pages.
The byte at offset 44 was changed because the database does not actually use new schema features.
You might be able to avoid these changes by doing a VACUUM before the backup, but this wouldn't help for the change counter.
I would not have expected them to be identical, just because the backup API ensures that any backups are self-consistent (i.e., transactions in progress are ignored).
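To see this for yourself, a sketch along the following lines makes a backup with Python's `sqlite3.Connection.backup()` (a wrapper around the online backup API) and then lists the byte offsets at which the two files differ. The helper names are made up for illustration, and `differing_offsets` reads both files fully into memory, so it is only meant for small databases.

```python
import sqlite3


def backup_database(src_path, dest_path):
    """Copy the database at src_path into dest_path via the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()


def differing_offsets(path_a, path_b):
    """List byte offsets where the files differ, including any length mismatch."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    shared = min(len(a), len(b))
    offsets = [i for i in range(shared) if a[i] != b[i]]
    # Bytes present in only one of the files also count as differences.
    offsets.extend(range(shared, max(len(a), len(b))))
    return offsets


def dump_text(path):
    """Return the SQL text dump of the database, for a content-level comparison."""
    conn = sqlite3.connect(path)
    try:
        return "\n".join(conn.iterdump())
    finally:
        conn.close()
```

If the only offsets reported fall inside the 100-byte header (for example 24-27 for the change counter) and `dump_text` is identical for both files, the backup holds the same data even though a checksum of the raw file will not match. For restore checks, comparing checksums of the dump text rather than of the file is one way around that.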

Tables with data that will never be deleted or changed

This is a more in-depth follow-up to a question I asked yesterday about storing historical data ( Storing data in a side table that may change in its main table ) and I'm trying to narrow down my question.
If you have a table that represents a data object at the application level and need that table for historical purposes, is it considered bad practice to set it up so that the information can't be deleted? Basically, I have a table representing safety requirements for a worker, and I want to make it so that these requirements can never be deleted or changed. So if a change needs to be made, a new record is created.
Is this not a good idea? What is the best practice for dealing with data like this? I have a table with historical safety training data, and it points to the table with requirement data (as well as some other key tables), so I can't let the requirements be changed or the historical table will be pointing to the wrong information.
Is this not a good idea?
Your scenario sounds perfectly valid to me. If you have historical data that you need to keep, there are various ways to meet that requirement.
Option 1:
Store all historical data and current data in one table (make sure you store a creation date so you know what's old and what's new). When you need to retrieve the most recent record for someone, just base it on the most recent date that exists in the table.
Option 2:
Store all historical data in a separate table and keep current data in another. This might be beneficial if you're working with millions of records so you don't degrade performance of any applications built on top of it. Either at the time of creating a new record or through some nightly job you can move old data into the other table to keep your current table lightweight.
Here is one alternative, that is not necessarily "better" but is something to keep in mind...
You could have separate "active" and "historical" tables, then create a trigger so whenever a row in the active table is modified or deleted, the old row values are copied to the historical table, together with the timestamp.
This way, the application can work with the active table in a natural way, while the accurate history of changes is automatically generated in the historical table. And since this works at the DBMS level, you'll be more resistant to application bugs.
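Here is a minimal sketch of that trigger approach, using SQLite through Python since that is what the rest of this page deals with. The table, column, and trigger names (`requirement`, `requirement_history`, `archived_at`, and so on) are invented for illustration, and other DBMSs use different trigger syntax.

```python
import sqlite3

SCHEMA = """
CREATE TABLE requirement (
    id INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);

CREATE TABLE requirement_history (
    requirement_id INTEGER NOT NULL,
    description TEXT NOT NULL,
    archived_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Copy the old row values to the history table on every update.
CREATE TRIGGER requirement_archive_update
AFTER UPDATE ON requirement
BEGIN
    INSERT INTO requirement_history (requirement_id, description)
    VALUES (OLD.id, OLD.description);
END;

-- Do the same when a row is deleted.
CREATE TRIGGER requirement_archive_delete
AFTER DELETE ON requirement
BEGIN
    INSERT INTO requirement_history (requirement_id, description)
    VALUES (OLD.id, OLD.description);
END;
"""


def open_with_history(path=":memory:"):
    """Open a database and create the active/history tables and triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The application then only issues ordinary INSERT, UPDATE, and DELETE statements against `requirement`, and each overwritten or deleted version ends up in `requirement_history` with a timestamp.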
Of course, things can get much messier if you need to maintain a history of the whole graph of objects (i.e. several tables linked via FOREIGN KEYs). Probably the simplest option is to simply forgo referential integrity for historical tables and just keep it for active tables.
If that's not enough for your project's needs, you'll have to somehow represent a "snapshot" of the whole graph at the moment of change. One way to do it is to treat the connections as versioned objects too. Alternatively, you could just copy all the connections with each version of the endpoint object. Either case will complicate your logic significantly.
