How to make sure SQLite database was not modified? - sqlite

I am looking for a way to tell if a database file was modified or not.
The amount of data stored is not large; however, updates are frequent, and running SELECT statements after every update to build a new checksum of all the data would be too much.
Previously most of our data was stored as JSON entries, so it was much easier to fetch a few rows and create a checksum of them. Now, however, we need to use the database properly, so the data will be normalized across a few tables and multiple rows.
I need this to be handled by the database, so I don't want to create an md5 of the database file and check that.
Is there any way I could achieve that?

Whenever a database is modified, the file change counter in the database header is incremented.
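If reading the file directly is acceptable, here is a minimal sketch in Python that relies on the documented header layout (the change counter is a 4-byte big-endian integer at byte offset 24 of the 100-byte header). One caveat: in WAL mode the counter is not necessarily bumped on every commit, so it can lag behind the latest writes.

    import struct

    def read_change_counter(db_path):
        """Read the 4-byte big-endian file change counter at offset 24
        of the SQLite database header (first 100 bytes of the file)."""
        with open(db_path, "rb") as f:
            header = f.read(100)
        if len(header) < 100 or not header.startswith(b"SQLite format 3\x00"):
            raise ValueError("not a SQLite database file")
        return struct.unpack(">I", header[24:28])[0]

    # Remember the counter, then compare it later to detect writes.
    # before = read_change_counter("app.db")   # "app.db" is a placeholder path
    # ...
    # if read_change_counter("app.db") != before:
    #     print("database was modified")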

Related

Can SELECT commands edit a SQL database in any way?

Is there any way in which SELECT commands alter a SQLite database? I would assume not, but I don't want to rely on that assumption. (The specific concern I had in mind was whether, e.g., querying the database creates indexes or similar structures for quicker retrieval on subsequent runs, hence causing the SQL files to change.)
I'm asking because I want to cache some values calculated from a SQL file, and only update these values if there has been an edit to the SQL file [specifically, if the file size in bytes has changed, which would indicate the SQL database has changed. The calculations are quite computationally intensive, so I don't want to repeat them unless needed].
SELECT statements cannot modify the database.
SQLite sometimes needs to store temporary indexes or intermediate results, but such data goes into the temporary database, not into the actual database file.
Anyway, to find out whether a database file has changed, check the file change counter.
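If you would rather check from inside SQL than read the file header, recent versions of SQLite also provide PRAGMA data_version, whose value changes when the database is modified by a connection other than the one issuing the pragma. A small sketch in Python (the "app.db" path is just a placeholder):

    import sqlite3

    def data_version(conn):
        """PRAGMA data_version returns an integer that changes whenever the
        database is modified by a *different* connection than this one."""
        return conn.execute("PRAGMA data_version").fetchone()[0]

    conn = sqlite3.connect("app.db")  # placeholder path
    before = data_version(conn)
    # ... later ...
    if data_version(conn) != before:
        print("another connection modified the database")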

Meta-data from SQLite

Is there any way to query a SQLite database for basic meta data such as:
Last date/time updated
Hash of database to indicate "state"
I am just looking for a simple, infrastructural way to have a script evaluate different databases and take a reasonable point of view on whether they are the same "state" as other databases in a different environment (PROD and DEV for instance).
In my experience, if no update, new record, or other change is made to the SQLite database file, the last modified time of the file doesn't change, so the last modified time should suffice as the time of the most recent change made to the database.
If two database files with the same state are only accessed for reading, their modified times stay the same.
Similarly, you can compare file sizes.
You can compute a hash over the whole file. If you consider the same data in the database to be the same "state" regardless of any differences in the file's history, then maybe what you want is a hash of all the records in the database, which is probably not simple.
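A small sketch in Python of those file-level checks (modified time, size, and a whole-file hash). Bear in mind that two databases holding identical rows can still differ byte for byte (page layout, free pages, vacuum history), so a file hash proves byte identity, not logical equality:

    import hashlib
    import os

    def db_fingerprint(path):
        """Collect cheap change indicators for a SQLite file: last modified
        time, size in bytes, and an MD5 of the whole file's contents."""
        st = os.stat(path)
        md5 = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                md5.update(chunk)
        return {"mtime": st.st_mtime, "size": st.st_size, "md5": md5.hexdigest()}

    # Compare the fingerprints of the PROD and DEV files to decide whether
    # they are byte-for-byte in the same "state".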

Best format to store incremental data in regularly using R

I have a database that is used to store transactional records; these records are created, and another process picks them up and then removes them. Occasionally this process breaks down and the number of records builds up. I want to set up a (semi-)automated way to monitor things, and as my tool set is limited and I have an R-shaped hammer, this looks like an R-shaped nail problem.
My plan is to write a short R script that will query the database via ODBC, and then write a single record with the datetime, the number of records in the query, and the datetime of the oldest record. I'll then have a separate script that will process the data file and produce some reports.
What's the best way to create my data file? At the moment my options are:
Load a dataframe, add the record and then resave it
Append a row to a text file (i.e. a csv file)
Any alternatives, or a recommendation?
I would be tempted by the second option because, from a semantic point of view, you don't need the old entries to write the new ones, so there is no reason to reload all the data each time. Doing so would consume more time and resources.
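A minimal sketch of the append-only approach, shown here in Python for brevity (in R the same effect comes from writing with the append option enabled rather than re-saving the whole data frame); the file name and column values are placeholders:

    import csv
    from datetime import datetime, timezone

    def append_snapshot(path, record_count, oldest_record_time):
        """Append one monitoring row (timestamp, queue depth, oldest record)
        without reading or rewriting the existing file."""
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow([
                datetime.now(timezone.utc).isoformat(),
                record_count,
                oldest_record_time,
            ])

    # Hypothetical usage with placeholder values:
    # append_snapshot("queue_monitor.csv", 42, "2024-01-01T00:00:00Z")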

Updating a local sqlite db that is used for local metadata & caching from a service?

I've searched through the site and haven't found a question/answer that quite answers my question; the closest one I found was: Syncing objects between two disparate systems best approach.
Anyway, to begin: because there are no RSS feeds available, I'm screen scraping a web page. The scraper fetches the page, goes through it to scrape out all of the information I'm interested in, and dumps that information into a SQLite database so that I can query it at my leisure without repeatedly fetching from the website.
However, I'm also storing various metadata about the data itself in the SQLite db, such as: whether I have looked at the data, whether the data is new or old, and a bookmark to a chunk of data (think of it as a collection of unrelated data, where the bookmark is just a pointer to where I am in processing/reading said data).
So my current problem is figuring out how to update the local SQLite database with new and/or changed data from the website in a manner that is effective and straightforward.
Here's my current idea:
Download the page itself
Create a temporary table for the parsed data to go into
Do a comparison between the official and the temporary table and copy updates and/or new information to the official table
This process seems kind of complicated because I would have to figure out how to determine whether the data in the temporary table is new, updated, or unchanged. So I am wondering if there isn't a better approach, or if anyone has any suggestions on how to architect/structure such a system.
Edit 1:
I'm not sure where to put the additional information, in a comment or as an edit, so I'm going to add it here.
This expands a bit on the metadata with regard to bookmarking: the data source can create new data or additions to the current data, so one reason I was considering the temporary-table idea was to be able to determine whether a data source that has been "bookmarked" has any new data or not.
Is it really important to determine whether the data in the temporary table is new, updated, or unchanged? Do you really need to keep a history of the changes?
NO: don't use the temporary table; just mark your old records as old (with a timestamp), don't do updates, and simply insert your new data.
YES: your idea seems correct to me, but it all depends on how much data you need to process each time; I don't think it is feasible with a large amount of data.
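If you do go the temporary-table route, here is a sketch of the compare-and-merge step using Python's sqlite3 module; the items table and its columns are hypothetical stand-ins for your actual schema:

    import sqlite3

    def merge_scraped(conn, scraped_rows):
        """Load freshly parsed rows into a temp staging table, then copy new
        and changed rows into the official table (here assumed to be
        items(id PRIMARY KEY, title, body), which must already exist)."""
        conn.execute("CREATE TEMP TABLE staging (id TEXT PRIMARY KEY, title TEXT, body TEXT)")
        conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", scraped_rows)

        # New rows: present in staging but not in the official table.
        conn.execute("""
            INSERT INTO items (id, title, body)
            SELECT s.id, s.title, s.body FROM staging s
            WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.id = s.id)
        """)

        # Changed rows: same id, different content.
        conn.execute("""
            UPDATE items SET
                title = (SELECT s.title FROM staging s WHERE s.id = items.id),
                body  = (SELECT s.body  FROM staging s WHERE s.id = items.id)
            WHERE EXISTS (
                SELECT 1 FROM staging s
                WHERE s.id = items.id AND (s.title != items.title OR s.body != items.body)
            )
        """)
        conn.execute("DROP TABLE staging")
        conn.commit()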

Making bulk insertions/updates

I have a data entry form like this...
Data Entry Form http://img192.imageshack.us/img192/2478/inputform.jpg
There are some empty rows and some rows that already have values. Users can update existing values and can also fill in values in the empty rows.
I need to map these values to my DB table; some of them will be inserted as new rows into the database, and existing records will be updated.
I need your suggestions: how can I best accomplish this scenario?
Thanks
For each row, I would have a primary key (hidden), a dirty flag, and a new flag. In the grid, you would set the "dirty" flag to true when changes are made. When adding new rows in the UI, you would set the new flag as well as generate a primary key (this would be easiest if you used GUIDs for the key). Then, when you post this all back to the server, you would do inserts when the new flag is set and updates for those with the dirty flag.
Once the commit of the data has completed, you would simply clear the dirty and new flags.
Of course, if the data is shared by multiple contributors and can be edited concurrently, there's a bit more involved if you don't want someone overwriting another's edits.
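A minimal sketch of the insert-vs-update decision on the server side, in Python with sqlite3; the entries table, its columns, and the shape of the posted row dictionaries are assumptions for illustration:

    import sqlite3
    import uuid

    def save_grid_rows(conn, rows):
        """Each posted row carries its primary key (a client-generated GUID
        for new rows), the edited value, and the "new"/"dirty" flags
        described above; insert or update accordingly."""
        for row in rows:
            if row["is_new"]:
                conn.execute(
                    "INSERT INTO entries (id, value) VALUES (?, ?)",
                    (row["id"], row["value"]),
                )
            elif row["is_dirty"]:
                conn.execute(
                    "UPDATE entries SET value = ? WHERE id = ?",
                    (row["value"], row["id"]),
                )
            # Rows with neither flag set are left untouched.
        conn.commit()

    # New rows get a client-side GUID so inserts and updates can be told apart
    # before the server ever sees them ("abc" is a placeholder value):
    new_row = {"id": str(uuid.uuid4()), "value": "abc", "is_new": True, "is_dirty": False}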
I would look into using ADO.net DataSets and DataTables as a backing store in memory for your custom data grid. ADO.net allows you to bulk load a data set out of the database and track inserts, updates, and deletes against that data in memory. Once you are done, you can then bulk process the stored transactions back into the database.
The big benefit of using ADO.net is that all the prickly change tracking code is written for you already, and the library is deployed to every .net capable machine.
While it isn't in vogue right now, you can also send ADO.net data sets across the wire using XML serialization for altering, and then send them back to be processed into the database.
Google around. There are literally thousands of books, tutorials, and blog posts on how to use ADO.net.
