OpenEdge Database Row Version

I am attempting to implement a row version strategy for tables in our OpenEdge database.
The simple solution I have come up with would be to add an integer iRowVersion field to each table and have the write trigger validate and increment the field as follows:
TRIGGER PROCEDURE FOR WRITE OF Customer OLD BUFFER oldCustomer.

IF Customer.iRowVersion < oldCustomer.iRowVersion THEN
    RETURN ERROR "RowVersion Out Of Date".

ASSIGN Customer.iRowVersion = Customer.iRowVersion + 1.
This will prevent any concurrent changes from being overwritten; however, I am unsure whether incrementing by one per row is the best approach.
SQL ROWVERSION is incremented across the entire database, and to emulate that approach I would use a sequence instead:
ASSIGN Customer.iRowVersion = NEXT-VALUE(rowVersionSequence).
In our large database, where many records will be changing, this has the potential to increase the sequence very quickly. Having a sequence per table would curtail this, but that seems over the top, and the +1 approach keeps things simple.
To clarify the question: would it be better to increment a row version number based on the row's last version, or should the SQL-like approach be taken, making every row version unique across the database?
Additionally, if going down the SQL-style route, would the create trigger need to assign an initial row version? (Otherwise all new, unmodified records would initialise at 0.)

To version control records in the OpenEdge database I now have a solution that should work well, and is fairly simple.
Each table that needs to have a row version will have a RowVersion field, of type Integer.
We have a program that generates write triggers when we create new tables, so updating this to add some new code has been simple. The write trigger now checks the record to see if the table has a RowVersion field, and if so it then increments the version by 1.
Checking to make sure the row version matches before updating is the responsibility of the programmer in the code / script they are running.
There were several reasons for this method, but it keeps things simple:
Integers are simple and easy to read when running queries and debugging the database. Given how our application is used, it is unlikely we would ever overflow an integer either.
A sequence is not needed to keep row versions unique. They don't need to be unique; each record just increments its own row version.
Although ProDataSets can do optimistic locking, there is no guarantee that the records in use will always be read / written using these, and therefore a field gives us the flexibility to write different code depending on the use.
Usually row versions should be checked before updating, but if there are data issues, fix scripts might need to be run that overwrite data regardless. For this reason we leave the checking to a calling procedure (and not the trigger) for a write operation on a record.
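For illustration, a minimal language-agnostic sketch of that caller-side check (Python with a generic DB-API connection, not OpenEdge ABL; the table, column names, and connection are assumptions, and in our system the increment itself is done by the write trigger):

def update_customer(conn, cust_num, new_name, expected_row_version):
    # Only update if nobody else has bumped RowVersion since we read the record;
    # the write trigger takes care of incrementing RowVersion on a successful write.
    cur = conn.cursor()
    cur.execute(
        "UPDATE Customer SET Name = ? "
        "WHERE CustNum = ? AND RowVersion = ?",
        (new_name, cust_num, expected_row_version))
    if cur.rowcount == 0:
        conn.rollback()
        raise RuntimeError("RowVersion out of date - record changed by someone else")
    conn.commit()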

Related

Can MariaDB return incomplete data?

I am using MySQL Connector to connect to a MariaDB server.
A function in my program periodically retrieves all entries in a table (with a select * from ... without any wheres, limits, etc.).
After it gets the data, it checks whether these rows (using the auto-incremented id) are already present in its memory, and if not it adds them. But if a row does not exist in the retrieved list yet is present in the memory-list, then that row must be deleted from memory.
Deleting that row from memory is not the only thing that happens: it also deletes a bunch of other tables/files linked to that row. So if the connector somehow fails, does not retrieve the full list, and does not report this, then I'll get into trouble.
It might be a bit of a stupid question, but I couldn't work out whether I need any additional safety measures.
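For what it's worth, a rough Python sketch of the sync loop described above (mysql.connector; the table and column names are assumptions), where the only extra safety measure is to skip the delete pass if the snapshot comes back empty while local state still holds rows:

import mysql.connector

conn = mysql.connector.connect(host="dbhost", user="app",
                               password="secret", database="mydb")
known_ids = set()   # ids currently held in memory

def sync():
    cur = conn.cursor()
    cur.execute("SELECT id FROM my_table")            # full snapshot, no WHERE/LIMIT
    fetched_ids = {row[0] for row in cur.fetchall()}
    cur.close()

    # A failed query raises, so we never reach the delete pass on an error.
    # An empty snapshot while we still hold rows is treated as suspicious:
    # do nothing this cycle rather than deleting everything locally.
    if not fetched_ids and known_ids:
        return

    for new_id in fetched_ids - known_ids:
        known_ids.add(new_id)                          # ... load the new row here ...
    for gone_id in known_ids - fetched_ids:
        known_ids.discard(gone_id)                     # ... remove linked tables/files ...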

DynamoDB: Is it worth indexing a table for a one-time migration effort?

We are migrating a ton of different tables with different attributes to another table using a script to do conversions into the new DynamoDB table formats.
Details aside, we need to add the "migrated" attribute to every item in the old tables. In order to do this, we are aware that we need to do a scan & update every item in the table with the new attribute. However, if the script we're running that adds this attribute dies midway, we will need to restart the script and filter out anything that doesn't have this new attribute (and only add the new attribute to the items missing it).
One thought that came up was that we could add a global secondary index onto the table with the primaryKey + the migrated flag so that we could just use that to identify what needs to get migrated faster.
However, for a one-time migration effort (that might be run a few times in the case of failures), I'm not sure it's worth the cost of creating the index. The table has hundreds of millions of items in it, and it's hard for me to justify creating a huge index just to speed up the scan. Thoughts?
To use a GSI effectively you would ideally make it a sparse index, so it only contains unmigrated items. You would control this by setting an "unmigrated" attribute on every item and then removing it from the item once it has been migrated, but this will roughly 4x your writes (you write to both the table and the index: once when you add the unmigrated flag and once when you remove it).
I recommend that the script that scans the table periodically save the LastEvaluatedKey so it can resume where it left off if it fails. To speed up the scan, you can perform a segmented scan in parallel.
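A rough boto3 sketch of one segment of such a scan, with the LastEvaluatedKey checkpointed to disk; the table name, key attribute ("pk"), and checkpoint file are assumptions:

import json
import boto3

dynamodb = boto3.client("dynamodb")

def migrate_segment(table_name, segment, total_segments, checkpoint_path):
    # One checkpoint file per segment so each parallel worker can resume independently.
    kwargs = {"TableName": table_name,
              "Segment": segment,
              "TotalSegments": total_segments}
    try:
        with open(checkpoint_path) as f:               # resume after a crash
            kwargs["ExclusiveStartKey"] = json.load(f)
    except FileNotFoundError:
        pass

    while True:
        page = dynamodb.scan(**kwargs)
        for item in page["Items"]:
            # ... convert the item and write it to the new table here ...
            dynamodb.update_item(
                TableName=table_name,
                Key={"pk": item["pk"]},                # assumed key schema
                UpdateExpression="SET migrated = :t",
                ExpressionAttributeValues={":t": {"BOOL": True}})
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break                                      # this segment is done
        with open(checkpoint_path, "w") as f:          # checkpoint after each page
            json.dump(last_key, f)
        kwargs["ExclusiveStartKey"] = last_key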

Controlling read locks on table for multithreaded plsql execution

I have a driver table with a flag that determines whether a record has been processed or not. I have a stored procedure that reads the table, picks a record up using a cursor, does some stuff (inserts into another table) and then updates the flag on the record to say it's been processed. I'd like to be able to run multiple instances of the SP concurrently to increase throughput.
The obvious answer seemed to be to use 'for update skip locked' in the cursor's select, but it seems this means I cannot commit within the loop (to update the processed flag and commit my inserts) without getting a 'fetch out of sequence' error.
Googling tells me Oracle's AQ is the answer but for the time being this option is not available to me.
Other suggestions? This must be a pretty common request but I've been unable to find anything that useful.
TIA!
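For reference, one common workaround for the 'fetch out of sequence' problem described above is to lock and fetch only a small batch, stop fetching, and only then update and commit, repeating until nothing is left. A rough python-oracledb sketch, with the driver table, id, and processed-flag column names assumed:

import oracledb

conn = oracledb.connect(user="app", password="secret", dsn="dbhost/orclpdb1")
BATCH_SIZE = 100

while True:
    with conn.cursor() as cur:
        # SKIP LOCKED lets concurrent sessions pass over rows another session
        # has already locked, so parallel workers don't pick up the same records.
        cur.execute("""SELECT id
                         FROM driver_table
                        WHERE processed = 'N'
                          FOR UPDATE SKIP LOCKED""")
        rows = cur.fetchmany(BATCH_SIZE)       # stop fetching before any commit
    if not rows:
        break
    ids = [(r[0],) for r in rows]
    with conn.cursor() as cur:
        # ... insert into the other table here ...
        cur.executemany("UPDATE driver_table SET processed = 'Y' WHERE id = :1", ids)
    conn.commit()    # releases the locks; no fetch is open, so no out-of-sequence error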

Reindexing a large SQL Server database to Lucene

We have a web service method which accepts some data and puts it in Lucene index. We use it to index new and updated entries from our asp.net web app.
These entries are stored in a large SQL Server table (20M rows and growing), and I need a way to reindex the whole table in case the current index gets deleted or corrupted. I'm not sure what the optimal way is to retrieve chunks of data from a large table. Currently, we use the fact that the table has an auto-increment PK, so we get chunks of 1000 rows until it starts to return nothing. Kind of like (in pseudo language):
i = 0
while (true)
{
    SELECT col1, col2, col3 FROM mytable WHERE pk >= i AND pk < i + 1000
    .... if result is empty 20 times in a row, break ....
    .... otherwise send result to web service to reindex ....
    i = i + 1000
}
This way, we don't need to SELECT COUNT(*), which would be a big performance killer, and we just move up the pk values until we stop getting any results. This has its downside: if we have a hole greater than 20,000 values somewhere in the table, it will stop indexing, assuming it has reached the end, but that's a tradeoff we have to live with for now.
Can anyone suggest a more efficient way of getting data from a table to index? I would assume we are not the first ones facing this problem - search engines are widely used nowadays :)
For what we do with Lucene, we rarely need to reindex everything. I can't remember coming across a case where the whole index was corrupted (Lucene is actually quite safe/good at this), but there have been many times when individual items needed to be reindexed for one reason or another. I'd say the most frequent reindexing patterns would be:
reindex items by given id (or set of ids)
reindex items by given period of time
The latter, of course, requires separate db index on the relevant date field(s) which should be a bit costly for 20M+ records but we decided to go for it (our biggest deployment had up to 10M records) as disk space is cheap these days anyway.
EDIT: added a few explanations as per the question author's comment.
If the source data structure changes, requiring reindexing of all records, our approach is to roll out new code which ensures all new data is correct (basically, it forms the correct Lucene Document from that moment on). Then afterwards we can reindex things in batches (either manually or by hand), by providing the relevant period ranges. To a certain extent this also applies to Lucene version changes.
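As a rough illustration of the reindex-by-period pattern (Python with pyodbc; the connection string and the modified_date column are assumptions):

import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=dbhost;DATABASE=mydb;Trusted_Connection=yes")

def reindex_period(date_from, date_to, send_to_indexer):
    # Relies on a db index on modified_date to stay cheap on a 20M+ row table.
    cur = conn.cursor()
    cur.execute("SELECT pk, col1, col2, col3 FROM mytable "
                "WHERE modified_date >= ? AND modified_date < ?",
                date_from, date_to)
    while True:
        rows = cur.fetchmany(1000)
        if not rows:
            break
        send_to_indexer(rows)                  # batch the rows off to the web service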
Why is a COUNT(*) a performance killer? What about MAX(id)? I'm thinking that an index would provide the information needed for those queries. You do have an index on your primary key, right?
I actually just figured it out - I can use IDENT_CURRENT(table_name) to get the last generated id, and use that instead of MAX() or Count() - this method should blow the other two away :)
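Building on that comment, a sketch of the same chunked loop bounded by the current maximum id, so the "20 empty chunks in a row" heuristic is no longer needed (pyodbc connection as in the previous sketch; column names as in the pseudocode above):

def reindex_all(conn, send_to_indexer, chunk=1000):
    cur = conn.cursor()
    # IDENT_CURRENT avoids a full COUNT(*); SELECT MAX(pk) would equally be
    # answered straight from the primary key index.
    max_pk = cur.execute("SELECT IDENT_CURRENT('mytable')").fetchone()[0]
    i = 0
    while i <= max_pk:
        rows = cur.execute("SELECT col1, col2, col3 FROM mytable "
                           "WHERE pk >= ? AND pk < ?",
                           i, i + chunk).fetchall()
        if rows:
            send_to_indexer(rows)              # holes in the pk range are simply skipped
        i += chunk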

Is there any best way to implement version control for database content?

I'm currently writing a posting website that needs to have version control on posts. I just don't know how I should implement this, in terms of database design and technique, in order to save and control the posts.
Is there anyone experienced with this who can help me?
I've seen that WordPress does version control in only one table, which is POST. I also suggest doing the same, since it's a hassle to write into two tables with the same data and fields.
I know that stackoverflow stores deltas between versions. What I have seen others do is set up another table like the first one but with an author and a version or timestamp on it. You can push records over to the other table using a database trigger so you don't have to worry too much about making the change at the application level.
If you would like to use only one table then I would suggest adding the author, a timestamp and an iscurrent flag. The flag isn't really needed, since you can select the max version number, but it will make your queries much easier. Set the flag only on the row with the highest version number. You can still use a trigger to populate the rows, but watch out or you might end up in a loop of update triggers.
I would create two tables, one is "live version" table and the other is an "archive" table. When a new version is created, move the existing live version to the archive table (with appropriate timestamps and author notes) and add the new live version to the live table.
The archive table would have the same schema as the live table except that it would also have additional columns that would hold metadata about the versioning that you are supporting (version number, notes, timestamps, etc.).
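A minimal sketch of that move-then-replace step, run as one transaction (Python with a generic DB-API connection; the post and post_archive tables and their columns are assumptions):

def publish_new_version(conn, post_id, new_body, author):
    cur = conn.cursor()
    # 1. Copy the current live row into the archive, stamping version metadata.
    cur.execute("INSERT INTO post_archive (post_id, body, version, author, archived_at) "
                "SELECT id, body, version, ?, CURRENT_TIMESTAMP FROM post WHERE id = ?",
                (author, post_id))
    # 2. Overwrite the live row with the new content and bump its version.
    cur.execute("UPDATE post SET body = ?, version = version + 1 WHERE id = ?",
                (new_body, post_id))
    conn.commit()   # both steps succeed or fail together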
Take this with a huge grain of salt, but you could have a parent id that joins to the primary key on the same table, along with a bool that indicates whether it's the current version. It's the method I used for a CMS system a while back... You might want a common id for a revision history (so that getting all historic entries for an item is non-recursive). You could do this by including the first version's id with all the subsequent versions, so you can get the whole lot easily.
my .02
Sounds like you just want a record version number on the row. It's a number that identifies the latest version. Every time you update the data you actually insert a new row and bump the record version number. When you query to get the data, you just query to get the row with the max record version number. Triggers can be used to generate the record version number so you don't have to worry about generating the number when inserting.
If you want to go full-blown version control, you need some sort of status field on the row as well that tells you if this version is reverted/deleted/approved or not. When you get the latest, you select the row with the max record version number that has the appropriate status.
If you just want to save the history of the posts and not actually have revision control you can just use the record version number technique.
See also Implementing Version Control of DB Objects.
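A small sketch of that record-version-number pattern (Python with a generic DB-API connection; the post_version table is an assumption, and the version bump could just as well live in a trigger as suggested):

def save_post(conn, post_id, body):
    cur = conn.cursor()
    # Every save is a new row; its version is one more than the current maximum
    # (ignoring concurrent-writer races for brevity).
    cur.execute("INSERT INTO post_version (post_id, body, version) "
                "SELECT ?, ?, COALESCE(MAX(version), 0) + 1 "
                "FROM post_version WHERE post_id = ?",
                (post_id, body, post_id))
    conn.commit()

def load_latest_post(conn, post_id):
    cur = conn.cursor()
    # The latest version is simply the row with the highest version number.
    cur.execute("SELECT body, version FROM post_version WHERE post_id = ? "
                "ORDER BY version DESC LIMIT 1",   # TOP 1 / FETCH FIRST on other engines
                (post_id,))
    return cur.fetchone()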
