Controlling read locks on a table for multithreaded PL/SQL execution - plsql

I have a driver table with a flag that indicates whether each record has been processed. A stored procedure reads the table with a cursor, picks up a record, does some work (inserts into another table), and then updates the flag to mark the record as processed. I'd like to run the stored procedure from several sessions at once to increase throughput.
The obvious answer seemed to be FOR UPDATE SKIP LOCKED in the cursor's SELECT, but it seems that means I cannot COMMIT inside the loop (to update the processed flag and commit my inserts) without getting ORA-01002: fetch out of sequence.
Googling tells me Oracle Advanced Queuing (AQ) is the usual answer, but for the time being that option is not available to me.
Any other suggestions? This must be a fairly common requirement, but I've been unable to find anything useful.
TIA!
A
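For reference, one workaround often suggested for this ORA-01002 situation is to batch the locked fetch so that the FOR UPDATE cursor is already closed before each COMMIT. A minimal sketch, with assumed names (driver_table, processed_flag, a 100-row batch) and the insert into the other table left as a placeholder:

DECLARE
  CURSOR c_unprocessed IS
    SELECT rowid AS rid
      FROM driver_table
     WHERE processed_flag = 'N'
       FOR UPDATE SKIP LOCKED;

  TYPE t_rid_tab IS TABLE OF ROWID;
  l_rids t_rid_tab;
BEGIN
  LOOP
    OPEN c_unprocessed;
    FETCH c_unprocessed BULK COLLECT INTO l_rids LIMIT 100;  -- small batch
    CLOSE c_unprocessed;  -- close the FOR UPDATE cursor before committing

    EXIT WHEN l_rids.COUNT = 0;

    FOR i IN 1 .. l_rids.COUNT LOOP
      NULL;  -- placeholder: insert into the other table for driver row l_rids(i)
    END LOOP;

    FORALL i IN 1 .. l_rids.COUNT
      UPDATE driver_table
         SET processed_flag = 'Y'
       WHERE rowid = l_rids(i);

    COMMIT;  -- safe: no open FOR UPDATE cursor, so no ORA-01002
  END LOOP;
END;

Several sessions can run this block at the same time; SKIP LOCKED means each session only fetches rows that no other session currently has locked, and the row locks are held until the COMMIT even though the cursor itself has been closed.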

Related

PROGRESS 4GL - When to use FOR FIRST, CAN-FIND and FIND FIRST?

I am new to Progress 4GL. I always try to write proper code and to understand every keyword I use, but the following sample queries give the same results. I don't know when to use FIND FIRST, FOR FIRST, or CAN-FIND. Please help me understand the difference.
FOR EACH Customer NO-LOCK:
    FOR FIRST Order OF Customer:
        /* some logic */
    END.
END.

FOR EACH Customer NO-LOCK:
    FIND FIRST Order OF Customer NO-LOCK NO-ERROR.
    IF AVAILABLE Order THEN
        /* some logic */
END.

FOR EACH Customer NO-LOCK:
    IF CAN-FIND(FIRST Order OF Customer) THEN
    DO:
        /* some logic */
    END.
END.
FOR FIRST scopes the record you find to a block. It avoids having to check the availability of the record when you reference it in the block as the block won't execute if it's not available.
FIND FIRST should never really be used. You have no control over which record will be the first to be found if there are many, and if there is just one you should just use FIND without the FIRST. That way the code explains what is expected, and you can test for AMBIGUOUS to ensure someone hasn't done something silly.
CAN-FIND is used for checking if a record exists without actually pulling that record back to the client side. There will be no record available in the buffer so you can't use the data. It's an excellent way of checking if a record exists without the overhead of pulling the content back across the wire. Use it if you don't care about the data.

Second ODBC UPDATE call without a 1s delay causes first to not happen

I'm making a couple of calls in a loop into a custom API to update a table in an SQL database, and I found that if I perform the second one immediately, the first one does not actually change the database. If I wait one second between calls, it works.
These two calls were originally only made after individual UI button presses, so this is likely the first time anyone ever tried doing it twice in a row so quickly after each other. We had a feature request that now requires it though.
A hardcoded sleep() is good for tracking down the issue, but it really goes against the grain to consider that a solution. So I'd like to know what needs to be done in ODBC to ensure a previous operation on a table has completed so that the next one won't fail. But again, I'm a total ODBC noob, so I'm not familiar with how its API is supposed to be used (and of course the author of this code left the company over 6 months ago).
Tracking through the layers of API code, I found:
Everything ends in Windows ODBC calls.
A single handle for a single connection (from SQLAllocHandle) is used for all calls.
The query in question is roughly UPDATE table_name SET ... BEGIN INSERT INTO table_name (...) VALUES (...); END
The call sequence for each query seems to be:
SQLCancel();
check(SQLPrepare());
check(SQLExecute());
SQLCancel();
Where check() is:
if (code != SQL_SUCCESS && code != SQL_SUCCESS_WITH_INFO) {
    exception();
}
The main issue I see here is that warnings will be totally ignored. But my noob reading of things is that if the query is still running at the end of the call, it should have gone into exception() with something like HYT00 (Timeout expired), right?
The only other thing I can think is that another thread might be calling this API on the same connection, and cancelling the operation with its SQLCancel(). I'll go triple check that, but I'm pretty sure that's not happening.

OpenEdge Database Row Version

I am attempting to implement a row version strategy for tables in our OpenEdge database.
The simple solution I have come up with is to add an integer iRowVersion field to each table and have the write trigger validate and increment the field as follows:
TRIGGER PROCEDURE FOR WRITE OF Customer OLD BUFFER oldCustomer.

IF Customer.iRowVersion < oldCustomer.iRowVersion THEN
    RETURN ERROR "RowVersion Out Of Date".

ASSIGN Customer.iRowVersion = Customer.iRowVersion + 1.
This will prevent concurrent changes from being overwritten; however, I am unsure whether incrementing by one per row is the best approach.
SQL ROWVERSION is incremented across the entire database, and emulating that approach would use a sequence instead:
ASSIGN Customer.iRowVersion = NEXT-VALUE(rowVersionSequence).
In our large database, where many records change, this has the potential to consume the sequence very quickly. Having a sequence per table would curtail this but seems over the top, and the +1 approach keeps things simple.
To clarify the question: would it be better to increment a row version number based on the row's last version, or should the SQL-like approach be taken, making every row version unique across the database?
Additionally, if going down the SQL-style route, would the create trigger need to assign an initial row version? (Otherwise all new, unmodified records would initialise at 0.)
To version-control records in the OpenEdge database, I now have a solution that should work well and is fairly simple.
Each table that needs to have a row version will have a RowVersion field, of type Integer.
We have a program that generates write triggers when we create new tables, so updating this to add some new code has been simple. The write trigger now checks the record to see if the table has a RowVersion field, and if so it then increments the version by 1.
Checking to make sure the row version matches before updating is the responsibility of the programmer in the code / script they are running.
There were several reasons for this method, but it keeps things simple:
Integers are simple and easy to read when running queries and debugging the database. Given how our application is used, it is unlikely we would ever overflow an integer either.
A sequence is not needed to keep rowversions unique. They don't need to be. Each record just increments its own row version.
Although ProDataSets can do optimistic locking, there is no guarantee that the records in use will always be read / written using these, and therefore a field gives us the flexibility to write different code depending on the use.
Usually row versions should be checked before updating, but if there are data issues, fix scripts might need to overwrite data regardless. For this reason we leave the checking to the calling procedure (and not the trigger) when writing to a record.

What happens when the 5-second execution time limit is exceeded in Azure DocumentDB stored procedures

I have a read operation that reads a lot of records from a DocumentDB collection, and when executed it runs for a long time. I am writing a stored procedure to move that query to the server side. I understand that DocumentDB stored procedures have an execution cap of 5 seconds. What I want to know is: in a read operation, what happens when query execution hits that time limit? Can I add some kind of retry logic to continue after some time, or will I have to do the read from the beginning?
This is not a problem if you follow this simple pattern when writing your stored procedures and you keep calling the stored procedure until continuation comes back null.
The key help here is that you are given some buffer beyond the 5 seconds to wrap up your stored procedure before it's forcibly shut down. Whenever the sproc is about to be shut down, the most recent database operation will return false instead of true. DocumentDB gives you enough time to process the last batch returned.
For read/query operations (for example, countDocuments), the key element of the recommended pattern is to store the continuation token for your read/query operation in the body that's returned from your stored procedure. You can set the body as many times as you want; only the last one will be returned, whether the stored procedure exits gracefully when resource limits are reached or finishes its job.
For write operations (for example, createVariedDocuments), documentdb-utils still looks at the continuation that's returned to decide if the sproc has finished its work, except in this case it won't be a read/query continuation and its value doesn't matter. It's simply an indicator of whether or not you need to call the sproc again. That's why I set it to "Value does not matter" in my example; anything other than null would work.
Key off the continuation that's returned from the stored procedure execution to decide whether or not to call it again. documentdb-utils will automatically keep calling your stored procedure until the continuation comes back null, but you can implement this yourself. documentdb-utils also includes a number of example sprocs that implement this pattern for you to riff off of. documentdb-lumenize uses this pattern to the nth degree to implement an aggregation engine running inside a sproc.
Disclosure: I'm the author of documentdb-utils and documentdb-lumenize.

SQLite3: How to interrupt a long-running update without rollback?

I have a long-running multirow update such as:
UPDATE T SET C1 = calculation(C2) WHERE C1 IS NULL
If the table is large, this update may take many seconds or even minutes. During this time, all other queries on this table fail with "database is locked" once the connection timeout expires (currently my timeout is 5 seconds).
I would like to stop this update query after, say, 3 seconds, then restart it. Hopefully, after several restarts the entire table will be updated. Another option is to stop this update query before making any other request (this would require inter-process cooperation, but it may be doable). But I cannot find a way to stop the update query without rolling back all previously updated records.
I tried calling interrupt and returning non-zero from progress_handler. Both of these approaches abort the update command and roll back all the changes. So it appears that SQLite treats this update as a transaction, which does not make much sense in this case because all rows are independent. But I cannot start a new transaction for each row, can I?
If interrupt and progress_handler cannot help me, what else can I do?
I also tried UPDATE with LIMIT, and also WHERE custom_condition(C1). These approaches do allow me to terminate the update earlier, but they are significantly slower than a regular update, and they cannot terminate the query at a specific time (before another connection's timeout expires).
Any other ideas? This multirow update is such a common operation that I hope other people have a good solution for it.
So it appears that SQLite treats this update as a transaction, which does not make much sense in this case because all rows are independent.
No, that actually makes perfect sense, because you're not executing multiple, independent updates. You're executing a single update statement. The fine manual says:
No changes can be made to the database except within a transaction. Any command that changes the database (basically, any SQL command other than SELECT) will automatically start a transaction if one is not already in effect. Automatically started transactions are committed when the last query finishes.
If you can determine the range of keys involved, you can execute multiple update statements. For example, if a key is an integer, and you determine the range to be from 1 to 1,000,000, you can write code to execute this series of updates.
begin transaction;
UPDATE T set C1 = calculation(C2)
where C1 is NULL and your_key between 1 and 100000;
commit;
begin transaction;
UPDATE T set C1 = calculation(C2)
where C1 is NULL and your_key between 100001 and 200000;
commit;
Other possibilities:
You can sleep for a bit between transactions to give other queries a chance to execute.
You can also time execution using application code, and calculate a best guess at range values that will avoid timeouts and still give good performance.
You can select the keys for the rows that will be updated and use their values to optimize the range of keys; a small sketch follows.
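For example, a minimal sketch against the question's table T, assuming it is an ordinary rowid table, for sizing the slices before running the batched updates:

-- Key range and count of rows still needing work; the application can then
-- split [first_key, last_key] into slices small enough to finish quickly.
SELECT MIN(rowid) AS first_key,
       MAX(rowid) AS last_key,
       COUNT(*)   AS rows_remaining
  FROM T
 WHERE C1 IS NULL;

Each slice then becomes its own short transaction like the ones above, with rowid BETWEEN slice_start AND slice_end in place of the your_key range.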
In my experience, it's unusual to treat updates this way, but it sounds like it fits your application.
But I cannot start a new transaction for each row, can I?
Well, you can, but it's probably not going to help. It's essentially the same as the method above, using a single key instead of a range. I wouldn't fire you for testing that, though.
On my desktop, I can insert 100k rows in 1.455 seconds, and update 100k rows with a simple calculation in 420 ms. If you're running on a phone, that's probably not relevant.
You mentioned poor performance with LIMIT. Do you have a lastupdated column with an index on it? At the top of your procedure you would get the COMMENCED_DATETIME and use it for every batch in the run:
update foo
set myvalue = 'x', lastupdated = UPDATE_COMMENCED
where id in
(
select id from foo where lastupdated < UPDATE_COMMENCED
limit SOME_REASONABLE_NUMBER
)
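For the batch above to stay cheap, the subquery's filter needs index support. A minimal sketch, reusing the answer's hypothetical foo/lastupdated names (UPDATE_COMMENCED and SOME_REASONABLE_NUMBER remain placeholders the application binds at run time):

-- Without this, each batch rescans the table to find rows not yet touched in this run.
CREATE INDEX IF NOT EXISTS idx_foo_lastupdated ON foo (lastupdated);

-- After each batch, check how many rows the UPDATE changed on this connection;
-- stop looping once it returns 0.
SELECT changes();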
P.S. With respect to slowness:
I also tried UPDATE with LIMIT and also WHERE custom_condition(C1). These approaches do allow me to terminate update earlier, but they are significantly slower than regular update...
If you're willing to give other processes access to stale data, and your update is designed so as not to hog system resources, why is there a need to have the update complete within a certain amount of time? There seems to be no need to worry about performance in absolute terms. The concern should be relative to other processes -- make sure they're not blocked.
I also posted this question at http://thread.gmane.org/gmane.comp.db.sqlite.general/81946 and got several interesting answers, such as:
divide the range of rowid into slices and update one slice at a time
use the AUTOINCREMENT feature to start a new update where the previous update ended (via LIMIT 10000)
create a trigger that calls SELECT RAISE(FAIL, ...) to abort the update without rollback
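The third suggestion relies on SQLite's FAIL conflict-resolution semantics: RAISE(FAIL, ...) stops the current statement but does not back out the rows that statement has already changed. A rough sketch of one way to wire it up for the question's table T; the update_budget helper table, the 50,000-row budget (standing in for the 3-second timer), and the trigger name are assumptions, not something taken from that thread:

-- One-row helper table holding how many rows the current run may still touch.
CREATE TABLE IF NOT EXISTS update_budget (rows_left INTEGER NOT NULL);

-- Fires once per row being backfilled; when the budget is used up, FAIL stops
-- the UPDATE while keeping the rows it has already changed.
CREATE TRIGGER IF NOT EXISTS trg_stop_long_update
BEFORE UPDATE OF C1 ON T
WHEN OLD.C1 IS NULL
BEGIN
  SELECT RAISE(FAIL, 'row budget exhausted; rerun the UPDATE')
   WHERE (SELECT rows_left FROM update_budget) <= 0;
  UPDATE update_budget SET rows_left = rows_left - 1;
END;

-- Before each run, set a budget sized to finish in roughly 3 seconds (tune it),
-- then issue the big UPDATE; repeat until it completes without the trigger firing.
DELETE FROM update_budget;
INSERT INTO update_budget (rows_left) VALUES (50000);
UPDATE T SET C1 = calculation(C2) WHERE C1 IS NULL;

Once the backfill is finished, DROP TRIGGER trg_stop_long_update removes the per-row overhead.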
