What happens when the 5-second execution time limit is exceeded in Azure DocumentDB stored procedures

I have a read operation that reads a lot of records from a DocumentDB collection, and when executed it runs for a long time. I am writing a stored procedure to move that query to the server side. I understand that DocumentDB stored procedures have an execution cap of 5 seconds. What I want to know is: in a read operation, what happens when the query execution hits that time limit? Can I add some kind of retry logic to continue after some time, or will I have to start the read from the beginning?

This is not a problem if you follow this simple pattern when writing your stored procedures and you keep calling the stored procedure until continuation comes back null.
The key help here is that you are given some buffer beyond the 5 seconds to wrap up your stored procedure before it's forcibly shut down. Whenever the sproc is about to be shut down, the most recent database operation will return false instead of true. DocumentDB gives you enough time to process the last batch returned.
For read/query operations (example: countDocuments), the key element of the recommended pattern is to store the continuation token for your read/query operation in the body that's returned from your stored procedure. You can set the body as many times as you want. Only the last one will be returned, whether the stored procedure exits gracefully because resource limits are reached or because the stored procedure's job is done.
For write operations (example createVariedDocuments), documentdb-utils still looks at the continuation that's returned to decide if the sproc has finished its work except in this case, it won't be a read/query continuation and its value doesn't matter. It's simply an indicator for whether or not you need to call the sproc again. That's why I set it to "Value does not matter" in my example. Anything other than null would work.
Key off of the continuation that's returned from the stored procedure execution to decide whether or not to call it again. Documentdb-utils will automatically keep calling your stored procedure until continuation comes back null but you can implement this yourself. Documentdb-utils also includes a number of example sprocs that implement this pattern for you to riff off of. Documentdb-lumenize utilizes this pattern to the nth degree to implement an aggregation engine running inside of a sproc.
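For illustration, here is a minimal client-side sketch of implementing that loop yourself with the older .NET DocumentClient SDK. The database/collection names, the countDocuments sproc id, the partition key value, and the { count, continuation } body shape are all assumptions for the example, not part of documentdb-utils.

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

static class SprocDriver
{
    // Keeps calling the sproc, feeding the last continuation back in,
    // until the sproc reports a null continuation (i.e. it is done).
    public static async Task<long> CountAllDocumentsAsync(DocumentClient client)
    {
        Uri sprocUri = UriFactory.CreateStoredProcedureUri("myDb", "myColl", "countDocuments");
        var options = new RequestOptions { PartitionKey = new PartitionKey("myPartitionKeyValue") };

        string continuation = null;
        long total = 0;

        do
        {
            // Pass the previous continuation back in; the sproc stores the next one in its body.
            StoredProcedureResponse<dynamic> response =
                await client.ExecuteStoredProcedureAsync<dynamic>(sprocUri, options, continuation);

            total += (long)response.Response.count;                 // partial result from this execution
            continuation = (string)response.Response.continuation;  // null means the sproc finished
        } while (continuation != null);

        return total;
    }
}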
Disclosure: I'm the author of documentdb-utils and documentdb-lumenize.

Related

Azure Cosmos DB: is it always necessary to check for HasMoreResults?

All examples for querying Azure Cosmos DB with .NET C# check FeedIterator<T>.HasMoreResults before calling FeedIterator<T>.ReadNextAsync().
Considering that the default MaxItemCount is 100, and knowing for a fact that the query will return fewer items than 100, is it necessary to check for HasMoreResults?
Consider this example, which returns an integer:
var query = container.GetItemQueryIterator<int>("SELECT VALUE COUNT(1) FROM c");
int count = (await query.ReadNextAsync()).SingleOrDefault();
Is it necessary to check for HasMoreResults?
If your query can yield/terminate sooner, like aggregations, then there are probably no more pages and HasMoreResults = false.
But the reason to always check HasMoreResults is because, in most cases, the SDK prefetches the next pages in memory while you are consuming the current one. If you don't drain all the pages, then these objects stay in memory. With time, memory footprint might increase (until eventually they get garbage collected but that can also consume CPU).
In cross-partition queries, it is common to see users make wrong assumptions, such as assuming all results of the query will be in one page (the first). That can happen to be true depending on which physical partition the data is stored in, and it is very common in such cases for users to complain that code that had been running perfectly fine for some time suddenly stopped working (their data is now on another partition and is not returned in the first page).
In some cases, the service might need to yield due to execution time going over the max time.
So, to avoid all these pitfalls (and others), the general recommendation is to loop until HasMoreResults = false. You won't iterate more than is required for each query; sometimes it will be one page, sometimes it might be more.
Source: https://learn.microsoft.com/azure/cosmos-db/nosql/query/pagination#understanding-query-executions
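As a concrete illustration of that recommendation, here is a minimal sketch of the drain loop with the .NET v3 SDK, reusing the aggregate query from the question (the wrapping method is just for illustration):

using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

static class CountExample
{
    public static async Task<int> CountAsync(Container container)
    {
        FeedIterator<int> query = container.GetItemQueryIterator<int>("SELECT VALUE COUNT(1) FROM c");

        int count = 0;
        while (query.HasMoreResults)                  // drain every page, even for an aggregate
        {
            FeedResponse<int> page = await query.ReadNextAsync();
            foreach (int value in page)               // the aggregate arrives in a single non-empty page
            {
                count = value;
            }
        }
        return count;
    }
}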
As a developer, you need to perform the HasMoreResults Boolean check on the query object. If HasMoreResults is true, you can get more records by calling the ExecuteNext method.
Also supporting comment by Mark Brown, it is best practice to check for HasMoreResults.
For managing results returned from queries, Cosmos DB uses a continuation strategy. Each query submitted to Cosmos DB has a MaxItemCount limit attribute, and the default limit value is 100.
Responses exceeding MaxItemCount are paginated, and a continuation token is present in the response header, indicating that a first partial page has been returned and more records are available. The next pages can be retrieved by passing the continuation token to subsequent calls.
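For illustration, a rough sketch of explicit continuation-token paging with the .NET v3 SDK; the query text and page size are assumptions:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

static class PagingExample
{
    public static async Task PageThroughAsync(Container container)
    {
        string continuationToken = null;

        do
        {
            FeedIterator<dynamic> iterator = container.GetItemQueryIterator<dynamic>(
                "SELECT * FROM c",
                continuationToken,                                   // null on the first call
                new QueryRequestOptions { MaxItemCount = 100 });     // page size; the default is 100

            FeedResponse<dynamic> page = await iterator.ReadNextAsync();
            foreach (var item in page)
            {
                Console.WriteLine(item);                             // process one item
            }

            // Token for the next page; null once all results have been returned.
            continuationToken = page.ContinuationToken;
        } while (continuationToken != null);
    }
}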

Controlling read locks on a table for multithreaded PL/SQL execution

I have a driver table with a flag that determines whether a record has been processed or not. I have a stored procedure that reads the table, picks a record up using a cursor, does some stuff (inserts into another table), and then updates the flag on the record to say it's been processed. I'd like to be able to execute the SP multiple times concurrently to increase throughput.
The obvious answer seemed to be to use 'for update skip locked' in the select for the cursor, but it seems this means I cannot commit within the loop (to update the processed flag and commit my inserts) without getting the 'fetch out of sequence' error.
Googling tells me Oracle's AQ is the answer, but for the time being this option is not available to me.
Other suggestions? This must be a pretty common request, but I've been unable to find anything useful.
TIA!
A

BizTalk WCF-SQL composite operation for calling stored procedure updates, sequence?

We have a composite operation that invokes stored procedures to update a few tables, but we are running into some issues now, potentially due to the sequence in which the updates are fired. I am trying to understand how the composite operation works for the WCF-SQL adapter. I know it uses one transaction context to execute the stored procedures, but does it honor the sequence of the rows when executing them (e.g. run the 1st row, then the 2nd row, then the 3rd)? The environment is BizTalk 2013 R2.
Yes, the operations in a Composite Operation are executed in order and I've never had cause to doubt this.
Are you having a specific problem?

ExecuteNextAsync Not Working

I am working with Azure DocumentDB. I am looking at the ExecuteNextAsync operation. What I am seeing is that ExecuteNextAsync returns no results. I am using examples I have found online and they don't generate any results. If I call an enumeration operation on the initial query, results are returned. Is there an example showing the complete configuration for using ExecuteNextAsync?
Update
To be more explicit, I am not actually getting any results. The call seems to just run, and no error is generated.
Playing around with the collection definition, I found that this occurred when I set the collection size to 250 GB. I tested with the collection at 10 GB and it did work, for a while. The latest testing shows that the operation is now hanging again.
I have two collections generated. The first collection appears to work properly. The second one appears to fail on this operation.
Individual calls to ExecuteNextAsync may return 0 results, but when you run the query to completion by calling it until HasMoreResults is false, you will always get the complete results.
Almost always, a single call to ExecuteNextAsync will return results, but you may get 0 results commonly due to two reasons:
If the query is a scan, then DocumentDB will make partial progress based on available throughput. Here no results are returned, but a new continuation token based on the latest progress is returned to resume execution.
If it's a cross-partition query, then each call executes against a single partition. In this case, the call will return no results if that partition has no documents that match the query.
If you want queries to deterministically return results, you must use SELECT TOP vs. using the continuation token/ExecuteNextAsync as a mechanism for paging. You can also read query results in parallel across multiple partitions by changing FeedOptions.MaxDegreeOfParallelism to -1.
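As a rough illustration of running a query to completion with the SDK the question is using (DocumentClient), under assumed database/collection names and query text:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

static class QueryExample
{
    public static async Task RunQueryAsync(DocumentClient client)
    {
        Uri collectionUri = UriFactory.CreateDocumentCollectionUri("myDb", "myColl");

        IDocumentQuery<dynamic> query = client
            .CreateDocumentQuery<dynamic>(
                collectionUri,
                "SELECT * FROM c",
                new FeedOptions { MaxItemCount = 100, EnableCrossPartitionQuery = true })
            .AsDocumentQuery();

        while (query.HasMoreResults)
        {
            // An individual call may legitimately return 0 results (scan progress,
            // or an empty partition in a cross-partition query); keep calling until
            // HasMoreResults is false.
            FeedResponse<dynamic> page = await query.ExecuteNextAsync<dynamic>();
            Console.WriteLine($"Got {page.Count} documents in this page");
        }
    }
}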

Attaching two memory databases

I am collecting data every second and storing it in a ":memory:" database. Inserting data into this database is done inside a transaction.
Every time a request is sent to the server, the server will read data from the first in-memory database, do some calculation, store it in the second database, and send it back to the client. For this, I am creating another ":memory:" database to store the aggregated information from the first db. I cannot use the same db because I need to do some large calculation to get the aggregated result. This cannot be done inside the transaction (because if it takes 5 seconds I will lose 4 seconds of data). I cannot create the table in the same database because I will not be able to write the aggregate data while it is collecting and inserting the original data (it is inside a transaction and it is collecting every second).
-- Sometimes I want to retrieve data from both the databases. How can I link these two memory databases? Using the ATTACH DATABASE statement, I can attach the second db to the first one. But the problem is: next time when a request comes, how will I check whether the second db exists or not?
-- Suppose I am attaching the second memory db to the first one. Will it lock the second database when we write data to the first db?
-- Is there any other way to store this aggregated data?
As far as I understand your idea, I don't think that you need two databases at all. I suppose you are misinterpreting the idea of transactions in SQL.
If you begin a transaction, other processes will still be allowed to read data. If you are reading data, you probably don't need a database lock.
A possible workflow could look as follows:
1. Insert some data into the database (use a transaction just for the insertion process).
2. Perform heavy calculations on the database (but do not use a transaction, otherwise it will prevent other processes from inserting any data into your database). Even if this step includes really heavy computation, you can still insert and read data from another process, as SELECT statements will not lock your database.
3. Write the results to the database (again, using a transaction).
Just make sure that heavy calculations are not performed within a transaction.
If you want a more detailed description of this solution, look at the documentation about the file locking behaviour of sqlite3: http://www.sqlite.org/lockingv3.html
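For illustration, a minimal C# sketch of that workflow using Microsoft.Data.Sqlite on a single in-memory database; the readings and aggregates tables and their columns are made up for the example:

using System;
using Microsoft.Data.Sqlite;

static class Collector
{
    public static void Run()
    {
        using var conn = new SqliteConnection("Data Source=:memory:");
        conn.Open();

        Exec(conn, "CREATE TABLE readings (ts INTEGER, value REAL)");
        Exec(conn, "CREATE TABLE aggregates (ts INTEGER, avg_value REAL)");

        // 1. Insert raw data inside a short transaction.
        using (var tx = conn.BeginTransaction())
        {
            var insert = conn.CreateCommand();
            insert.Transaction = tx;
            insert.CommandText = "INSERT INTO readings (ts, value) VALUES ($ts, $value)";
            insert.Parameters.AddWithValue("$ts", 1);
            insert.Parameters.AddWithValue("$value", 42.0);
            insert.ExecuteNonQuery();
            tx.Commit();
        }

        // 2. Heavy calculation as a plain SELECT, outside any transaction,
        //    so writers are not blocked while it runs.
        var agg = conn.CreateCommand();
        agg.CommandText = "SELECT AVG(value) FROM readings";
        double average = Convert.ToDouble(agg.ExecuteScalar());

        // 3. Write the aggregated result back inside another short transaction.
        using (var tx = conn.BeginTransaction())
        {
            var write = conn.CreateCommand();
            write.Transaction = tx;
            write.CommandText = "INSERT INTO aggregates (ts, avg_value) VALUES ($ts, $avg)";
            write.Parameters.AddWithValue("$ts", 1);
            write.Parameters.AddWithValue("$avg", average);
            write.ExecuteNonQuery();
            tx.Commit();
        }
    }

    static void Exec(SqliteConnection conn, string sql)
    {
        var cmd = conn.CreateCommand();
        cmd.CommandText = sql;
        cmd.ExecuteNonQuery();
    }
}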
