Microsoft Access with SQLite back end #DELETED record problem - another solution

SHORT VERSION: If all else fails, add a value (even zero) to an additional Number column in the SQLite table to stop the #DELETED demon.
LONGER VERSION: There have been many posts about this frustrating and inconsistent problem over the years. Lord knows, I have read each one at least a half dozen times and tried every remedy proposed - and usually one of the incantations will finally solve the problem. Yet I recently found myself in a new quandary and spent the last two days racking my brain trying to work out why none of the tricks worked on it.
It was classic: an Access 2019 front end linked to a SQLite back-end table (via the Devart SQLite ODBC driver, which I believe is inconsequential). The table had the recommended col_ID in Number format, auto-incrementing, as the primary key. It had col_DateUpdated, Text format, with a default value of current_timestamp. There was col_OperatorID, Text format, the only column in a unique index. Finally, there were two other non-indexed Number-format columns.
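For reference, a minimal sqlite3 sketch of the table as described (the table name and the two numeric column names are hypothetical; the rest matches the description above):

import sqlite3

conn = sqlite3.connect("backend.db")
conn.executescript("""
CREATE TABLE tblOperators (                 -- table name hypothetical
    col_ID          INTEGER PRIMARY KEY AUTOINCREMENT,
    col_DateUpdated TEXT DEFAULT CURRENT_TIMESTAMP,
    col_OperatorID  TEXT,
    col_Number1     INTEGER,                -- the two non-indexed Number columns
    col_Number2     INTEGER
);
CREATE UNIQUE INDEX ux_operator ON tblOperators (col_OperatorID);
""")
conn.commit()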
The table worked fine in SQLite: I could add, delete and update records, no problem. In Access, when I opened the linked table and added a record, it did not immediately show the auto-incrementing value in col_ID, nor the date/time stamp. When I clicked off the row, it immediately filled all the columns of the new row with #DELETED. If I closed the table and reopened it, the new row would display just fine.
The SQLite pragmas were all at their defaults, including auto-index, normal locking mode and full synchronous. I tried every combination of changes to the table structure, column formats, indexes, default values, etc. The problem persisted whether or not there was any other data in the table.
I've been coding in Access for over 30 years and SQLite for three and have never seen anything like it.
I was stumped until, for the heck of it, I added a value to one of the other Number columns. Amazingly, it worked!
I can create a new row, put values in col_OperatorID AND the two non-indexed Number columns, click off the row, and it takes it fine. It updates the autonumber primary key col_ID and fills col_DateUpdated with the current date/time, with no #DELETED nonsense.
It beats me why this works. Maybe Access can finally accept the record as really, really unique (even though the additional data is not in any index), or maybe putting a numeric value in the other, seemingly unimportant, columns forces an update across the link - I don't know. But I thought I would pass this along, because I KNOW that unless Microsoft or the SQLite folks come up with a cure, there will forevermore be people who need this additional gimmick to get out of #DELETED hell.
Good luck and Happy Trails.

Related

DynamoDB top item per partition

We are new to DynamoDB and struggling with what seems like it would be a simple task.
It is not actually related to stocks (it's about recording machine results over time), but the stock example is the simplest I can think of that illustrates the goal and the problems we're facing.
The two query scenarios are:
All historical values of given stock symbol <= We think we have this figured out
The latest value of all stock symbols <= We do not have a good solution here!
Assume that updates are not synchronized, e.g. the moment of the last update record for TSLA may be different than for AMZN.
The 3 attributes are just { Symbol, Moment, Value }. We could make Symbol the hash key and Moment the range key, and we believe the first query could then be achieved easily and efficiently.
We also assume we could get the latest value for a single, specified Symbol following https://stackoverflow.com/a/12008398
The SQL solution for getting the latest value for each Symbol would look a lot like https://stackoverflow.com/a/6841644
But... we can't come up with anything efficient for DynamoDB.
Is it possible to do this without either retrieving everything or making multiple round trips?
The best idea we have so far is to somehow use update triggers or streams to track the latest record per Symbol and essentially keep that cached. That could be in a separate table, or in the same table with extra info like a column IsLatestForMachineKey (effectively a bool). With every insert, you'd grab the row where IsLatestForMachineKey = 1, compare the Moments, and if the insertion is newer, set the new one to 1 and the older one to 0.
This is starting to feel complicated enough that I question whether we're taking the right approach at all, or maybe DynamoDB itself is a bad fit for this, even though the use case seems so simple and common.
There is a way that is fairly straightforward, in my opinion.
Rather than using a GSI, just use two tables with (almost) the exact same schema. The hash key of both should be symbol. They should both have moment and value. Pick one of the tables to be stocks-current and the other to be stocks-historical. stocks-current has no range key. stocks-historical uses moment as a range key.
Whenever you write an item, write it to both tables. If you need strong consistency between the two tables, use the TransactWriteItems API.
If your data might arrive out of order, you can add a ConditionExpression to prevent newer data in stocks-current from being overwritten by out of order data.
The read operations are pretty straightforward, but I’ll state them anyway. To get the latest value for everything, scan the stocks-current table. To get historical data for a stock, query the stocks-historical table with no range key condition.
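A minimal boto3 sketch of the write path (table names as above; error handling trimmed). One caveat: inside TransactWriteItems a failed condition cancels the whole transaction, so if out-of-order data is expected you may want the two writes kept separate, as here:

import boto3
from botocore.exceptions import ClientError

ddb = boto3.client("dynamodb")

def record_value(symbol, moment, value):
    item = {
        "symbol": {"S": symbol},
        "moment": {"N": str(moment)},
        "value": {"N": str(value)},
    }
    # every point goes to the historical table (hash key symbol, range key moment)
    ddb.put_item(TableName="stocks-historical", Item=item)
    try:
        # stocks-current (hash key symbol, no range key) keeps one item per symbol;
        # the condition rejects points older than what is already stored
        ddb.put_item(
            TableName="stocks-current",
            Item=item,
            ConditionExpression="attribute_not_exists(symbol) OR moment <= :m",
            ExpressionAttributeValues={":m": {"N": str(moment)}},
        )
    except ClientError as e:
        # a ConditionalCheckFailedException just means a late, out-of-order point
        if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise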

OpenEdge Database Row Version

I am attempting to implement a row version strategy for tables in our OpenEdge database.
The simple solution I have come up with is to add an integer iRowVersion field to each table and have the write trigger validate and increment the field as follows:
TRIGGER PROCEDURE FOR WRITE OF Customer OLD BUFFER oldCustomer.

/* Reject the write if the record changed since this copy was read. */
IF Customer.iRowVersion < oldCustomer.iRowVersion THEN
    RETURN ERROR "RowVersion Out Of Date".

/* Otherwise bump the version. */
ASSIGN Customer.iRowVersion = Customer.iRowVersion + 1.
This will prevent concurrent changes from being overwritten; however, I am unsure whether incrementing by one per row is best.
SQL Server's ROWVERSION is incremented across the entire database, and to emulate that approach I would use a sequence instead:
ASSIGN Customer.iRowVersion = NEXT-VALUE(rowVersionSequence).
In our large database, where many records will be changing, this has the potential to grow the sequence very quickly. A sequence per table would curtail this but seems over the top, and the +1 approach keeps things simple.
To clarify the question: would it be better to increment a row version number based on the row's last version, or should the SQL-like approach be taken, making every row version unique across the database?
Additionally, if going down the SQL-style route, would the create trigger need to assign an initial row version? (Otherwise all new, unmodified records would initialise at 0.)
To version-control records in the OpenEdge database, I now have a solution that should work well and is fairly simple.
Each table that needs to have a row version will have a RowVersion field, of type Integer.
We have a program that generates write triggers when we create new tables, so updating it to add some new code was simple. The write trigger now checks whether the table has a RowVersion field and, if so, increments the version by 1.
Checking to make sure the row version matches before updating is the responsibility of the programmer in the code / script they are running.
There were several reasons for this method, but it keeps things simple:
Integers are simple and easy to read when running queries and debugging the database. Given how our application is used, it is unlikely we would ever overflow an integer, either.
A sequence is not needed to keep row versions unique - they don't need to be unique. Each record simply increments its own row version.
Although ProDataSets can do optimistic locking, there is no guarantee that the records in use will always be read and written through them, so a field gives us the flexibility to write different code depending on the use.
Row versions should usually be checked before updating, but if there are data issues, fix scripts might need to overwrite data regardless. For this reason we leave the checking to the calling procedure (not the trigger) for a write operation on a record.

Using auto-number database fields theory

I was on "another" programming forum, and we were talking about getting the next number from an auto-increment field BEFORE an insert takes place (there is a way using ADOX). This was in an MS-Access database btw.
Anyway, the discussion veered off into whether you SHOULD use auto-increment fields for things like invoice numbers, PO numbers, bill of lading numbers, or anything else that needs a unique, incrementing number.
My thoughts were "why not?" Other people argued that an invoice number (for instance) should be managed in a separate table and incremented with code, not generated by an auto-number field.
Can someone give me a good reason why that would be true?
I've used auto-number fields for years for just this type of thing and have never had problem one.
Your thoughts?
I have always avoided auto-increment fields and, as it turns out, for good reason. But originally my reason was simply that it was what the professor told us.
Facebook had a major breach a few years ago simply because they were using AUTO_INCREMENT fields for user IDs. It doesn't take a calculator to figure out that if my ID is 10320, there is likely someone with ID 10319, and so on.
When debugging (or proofing a design), having a key that reflects the data it represents is a heck of a lot easier.
Keys that reflect the data also reduce the potential for corrupted data (typos and user guessing).
Meaningful keys force the developer to think about their data. I have never come across a table using such keys that was not normalized.
Other than the fact that deadlines often run tight, there is no great reason for auto-increment.
Normally I use an autonumbering field for the ID so I don't need to think about how it's generated.
Recordset operations like insert and delete alter the sequence, skipping blocks of numbers.
When you manage CustomerID, invoice numbers and so on, it's better to have full control over them instead of leaving them under the system's control.
You can create a function that generates the desired numbers for you using a rule (e.g. the invoice number can be a function that includes the invoicing date), as sketched below.
With autonumbering you can't manage this.
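For example, a toy Python sketch of such a rule (the format is entirely up to you; here the counter is just in memory, where a real system would persist it):

from datetime import date
from itertools import count

_seq = count(1)  # in a real system this counter would live in its own table

def next_invoice_number(invoice_date: date) -> str:
    # compose the number from the invoicing date plus a running sequence
    return f"INV-{invoice_date:%Y%m}-{next(_seq):05d}"

print(next_invoice_number(date(2024, 3, 14)))  # INV-202403-00001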
Beyond that, there are NO FIXED RULES about what to do and what not to do.
It's just your practice and experience, and the degree of freedom you want to have.
Bye:-)

Reindexing a large SQL Server database to Lucene

We have a web service method which accepts some data and puts it into a Lucene index. We use it to index new and updated entries from our ASP.NET web app.
These entries are stored in a large SQL Server table (20M rows and growing), and I need a way to reindex the whole table in case the current index gets deleted or corrupted. I'm not sure what the optimal way is to retrieve chunks of data from a large table. Currently, we use the fact that the table has an auto-increment PK, so we fetch chunks of 1000 rows until queries start returning nothing. Kind of like (in pseudo language):
i = 0
empty_batches = 0
while (empty_batches < 20)
{
    SELECT col1, col2, col3 FROM mytable WHERE pk >= i AND pk < i + 1000
    .... if result is empty, increment empty_batches ....
    .... otherwise reset empty_batches to 0 and send result to web service to reindex ....
    i = i + 1000
}
This way, we don't need a SELECT COUNT(*), which would be a big performance killer; we just move up the pk values until we stop getting results. This has its con: if there is a gap of more than 20,000 values somewhere in the table, indexing will stop there, assuming it has reached the end - but that's a tradeoff we have to live with for now.
Can anyone suggest a more efficient way of getting data from a table to index? I would assume we are not the first ones facing this problem - search engines are widely used nowadays :)
For what we do with Lucene, we rarely need to reindex everything. I can't remember coming across any case where the whole index was corrupted (Lucene is actually quite safe/good at this), but there have been many times when individual items needed to be reindexed for one reason or another. I'd say the most frequent reindexing patterns are:
reindex items by given id (or set of ids)
reindex items by given period of time
The latter, of course, requires a separate db index on the relevant date field(s), which may be a bit costly for 20M+ records, but we decided to go for it (our biggest deployment had up to 10M records) since disk space is cheap these days anyway.
EDIT: added a few explanations as per the question author's comment.
If the source data structure changes, requiring reindexing of all records, our approach is to roll out new code which ensures all new data is correct (basically, it forms the correct Lucene Document from that moment on). Then we can reindex things in batches (manually or by script), providing the relevant period ranges. To a certain extent, this also applies to Lucene version changes.
Why is a COUNT(*) a performance killer? What about MAX(id)? I'm thinking that an index would provide the information needed for those queries. You do have an index on your primary key, right?
I actually just figured it out - I can use IDENT_CURRENT('table_name') to get the last generated id and use that instead of MAX() or COUNT() - this method should blow the other two away :)
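A sketch of that bounded loop in Python with pyodbc (the connection string and the send_batch call into the web service are stand-ins). Note that IDENT_CURRENT can run ahead of the true max id when inserts were rolled back, but any upper bound works here:

import pyodbc

def reindex_all(conn_str, send_batch, chunk=1000):
    # send_batch is a stand-in for the call into the indexing web service
    conn = pyodbc.connect(conn_str)
    cur = conn.cursor()
    # bound the scan by the last generated id instead of guessing when to stop,
    # so gaps of any size in the pk values are handled
    top = cur.execute("SELECT IDENT_CURRENT('mytable')").fetchval()
    for lo in range(0, int(top) + 1, chunk):
        rows = cur.execute(
            "SELECT col1, col2, col3 FROM mytable "
            "WHERE pk >= ? AND pk < ?", lo, lo + chunk).fetchall()
        if rows:
            send_batch(rows)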

Is there any best way to implement version control for database content?

I'm currently writing a posting website that needs version control on posts. I just don't know how I should implement this, in terms of database design and technique, in order to save and manage the posts.
Is there anyone experienced with this who can help me?
I've seen that WordPress does version control in only one table, POST. I'd suggest doing the same, since it's trouble to write to two tables holding the same fields and data.
I know that Stack Overflow stores deltas between versions. What I have seen others do is set up another table like the first one, but with an author and a version or timestamp on it. You can push records over to the other table using a database trigger, so you don't have to worry too much about making the change at the application level.
If you would like to use only one table, then I would suggest adding the author, a timestamp and an IsCurrent flag. The flag isn't strictly needed, since you can select the max version number, but it will make your queries much easier. Set the flag only on the row with the highest version number. You can still use a trigger to populate the rows, but watch out or you might end up in a loop of update triggers.
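A minimal sketch of that single-table layout, here using sqlite3 and application code in place of the trigger (all names hypothetical):

import sqlite3

conn = sqlite3.connect("posts.db")
conn.executescript("""
CREATE TABLE posts (
    post_id    INTEGER NOT NULL,               -- the logical post
    version    INTEGER NOT NULL,               -- 1, 2, 3, ... per post
    author     TEXT,
    body       TEXT,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
    is_current INTEGER NOT NULL DEFAULT 1,     -- the flag described above
    PRIMARY KEY (post_id, version)
);
""")

def save_revision(post_id, author, body):
    # clear the flag on the previous current row, then insert the new version
    with conn:
        conn.execute(
            "UPDATE posts SET is_current = 0 "
            "WHERE post_id = ? AND is_current = 1", (post_id,))
        conn.execute(
            "INSERT INTO posts (post_id, version, author, body) "
            "VALUES (?, 1 + COALESCE((SELECT MAX(version) FROM posts "
            "WHERE post_id = ?), 0), ?, ?)",
            (post_id, post_id, author, body))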
I would create two tables: a "live version" table and an "archive" table. When a new version is created, move the existing live version to the archive table (with appropriate timestamps and author notes) and add the new live version to the live table.
The archive table would have the same schema as the live table except that it would also have additional columns that would hold metadata about the versioning that you are supporting (version number, notes, timestamps, etc.).
Take this with a huge grain of salt, but you could have a parent id joined to the primary key on the same table, along with a bool that indicates whether it's the current version. It's the method I used for a CMS system a while back... You might want a common id for the revision history (so that getting all historic entries for an item is non-recursive). You could do this by storing the first version's id with all the subsequent versions, so you can get the whole lot easily.
my .02
Sounds like you just want a record version number on the row: a number that identifies the latest version. Every time you update the data, you actually insert a new row and bump the record version number. When you query the data, you just fetch the row with the max record version number. Triggers can be used to generate the record version number, so you don't have to worry about generating it when inserting.
If you want to go full-blown version control, you need some sort of status field on the row as well that tells you whether this version is reverted/deleted/approved or not. When you get the latest, you select the row with the max revision control number that has the appropriate status.
If you just want to save the history of the posts and not actually have revision control you can just use the record version number technique.
See also Implementing Version Control of DB Objects.
