Cosmos ChangeFeed - Errors, exceptions and service failure scenarios - azure-cosmosdb

All,
I am using the Change Feed Processor Library. I want to know the best way to handle service failures along with exception/error scenarios in the ProcessChangesAsync method. Below are the events I am referring to.
1) Service failure - The service hosting the processor library crashes in the middle of an operation. How do I restart processing from the same document (the one being processed at the time of failure)? Is there any built-in mechanism where the change feed will resume with the last failed documents? E.g. assume the current batch has 10 docs; 5 are processed successfully and then the service breaks because of a network failure or some other reason. Will my process start with the 6th document once the service is restarted? How can I achieve this?
2) Exceptions and errors - Any error in the ProcessChangesAsync method can be handled with a try/catch at the global level, but how do I persist those failed records and make them available for the next batch? Again, I am looking for any built-in mechanism in the change feed processor.

1) The Processor Library, by default, checkpoints after a successful run of ProcessChangesAsync. In the latest library version, you can customize the checkpointer to do manual checkpoints if you need to (a sketch of this is at the end of this answer). If for some reason the processor shuts down before checkpointing, it will resume from the last successful checkpoint stored in the Leases collection. In your case, it will start with the first document again, so you will never lose a change but you could experience double processing (this is an "at least once" model).
2) There is no built-in mechanism that you can leverage; handling exceptions within ProcessChangesAsync is your responsibility. You can add a global try/catch and also, if you are looping over the documents, a try/catch inside the loop, so a failing document can be handled (for example, sent to a queue for later analysis/post-processing) without losing the batch. If you require logging for those errors (I'm assuming that's what you mean by persisting errors?), the latest version is compatible with LibLog, so plugging in your own custom logging is as simple as:
using Microsoft.Azure.Documents.ChangeFeedProcessor.Logging;

var hostName = "SampleHost";
var tracelogProvider = new TraceLogProvider(); // You can use any provider supported by LibLog
using (tracelogProvider.OpenNestedContext(hostName))
{
    LogProvider.SetCurrentLogProvider(tracelogProvider);
    // After this, create the IChangeFeedProcessor instance and start/stop it.
}
Source
Extra info for the comments
To avoid exceptions halting the batch or causing a batch to be reprocessed, you can have handling like this:
public async Task ProcessChangesAsync(IChangeFeedObserverContext context, IReadOnlyList<Document> documents, CancellationToken cancellationToken)
{
    try
    {
        foreach (var document in documents)
        {
            try
            {
                // Do your work for the document
            }
            catch (Exception ex)
            {
                // Something happened with the current document: handle it, send it to a queue
                // or another store for later analysis, log it. This catch lets the loop
                // continue with the next document.
            }
        }
    }
    catch (Exception ex)
    {
        // Something unhandled happened: log it and avoid rethrowing so the next batch is processed.
    }
}
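Regarding the manual checkpointing mentioned in point 1, here is a minimal sketch. It assumes the v2 library's ChangeFeedProcessorOptions.CheckpointFrequency with ExplicitCheckpoint and the IChangeFeedObserverContext.CheckpointAsync method; verify the exact names against the library version you are using:

// Sketch only: enable explicit checkpointing so the lease is updated only when you decide.
var options = new ChangeFeedProcessorOptions
{
    CheckpointFrequency = new CheckpointFrequency { ExplicitCheckpoint = true }
};
// Pass these options when building the processor (e.g. via WithProcessorOptions(options)).

// Inside your observer, checkpoint manually once the batch has been handled successfully:
public async Task ProcessChangesAsync(IChangeFeedObserverContext context, IReadOnlyList<Document> documents, CancellationToken cancellationToken)
{
    foreach (var document in documents)
    {
        // Do your work for the document
    }
    // Persist progress only after the whole batch succeeded.
    await context.CheckpointAsync();
}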

Related

ASP.NET Core multithreaded background threads

Using ASP.NET Core on .NET 5, running on Windows.
Users upload large workbooks that need to be converted to a different format. Each conversion is CPU intensive and takes around a minute to complete.
The idea is to use a pattern where the requests are queued in a background queue and then processed by background tasks.
So, I followed this Microsoft article.
The queuing part worked well, but the issue was that the workbooks were processed sequentially in the background:
private async Task BackgroundProcessing(CancellationToken stoppingToken)
{
    while (!stoppingToken.IsCancellationRequested)
    {
        var workItem = await TaskQueue.DequeueAsync(stoppingToken);
        try
        {
            await workItem(stoppingToken);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex,
                "Error occurred executing {WorkItem}.", nameof(workItem));
        }
    }
}
If I queued 10 workbooks, workbook 2 wouldn't start until workbook 1 was done, workbook 3 wouldn't start until workbook 2 was done, and so on.
So, I modified the code to run the tasks without await and suppressed the warning with the discard operator (please note workItem is now an Action, not a Task):
while (!stoppingToken.IsCancellationRequested)
{
    var workItem = await TaskQueue.DequeueAsync(stoppingToken);
    _ = Task.Factory.StartNew(() =>
    {
        try
        {
            workItem(stoppingToken);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error occurred executing {WorkItem}.", nameof(workItem));
        }
    }, TaskCreationOptions.LongRunning);
}
That works -- all workbooks start processing at around the same time, and they complete at around the same time too. But I am not sure whether doing this is dangerous and can lead to bugs, crashes, etc.
Is the second version a workable solution, or will it lead to some disaster in the future? Is there a better way to implement parallel workloads on background threads in ASP.NET?
Thanks.
Using an external queue has some advantages over in-memory queueing. In particular, the queue messages are stored in a reliable external store with features around retries, multiple consumers, etc. If your app crashes, the queue item remains and can be tried again.
In Azure, you can use several services, including Azure Storage Queues and Service Bus. I like Service Bus because it uses push-based behavior to avoid the need for a polling loop in your code. Either way, you can create an IHostedService instance that watches the queue and processes the work items on a separate thread with configurable parallelization.
Look for examples of using these within ASP.NET Core, for example:
https://damienbod.com/2019/04/23/using-azure-service-bus-queues-with-asp-net-core-services/
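As a rough illustration of that idea, here is a minimal sketch of such a hosted service using the Azure.Messaging.ServiceBus package; the queue name, the injected ServiceBusClient, and ConvertWorkbookAsync are placeholders for your own setup:

using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Extensions.Hosting;

public class WorkbookConversionService : BackgroundService
{
    private readonly ServiceBusProcessor _processor;

    public WorkbookConversionService(ServiceBusClient client)
    {
        // "workbook-conversions" is a placeholder queue name.
        _processor = client.CreateProcessor("workbook-conversions", new ServiceBusProcessorOptions
        {
            MaxConcurrentCalls = 4 // degree of parallelism
        });
        _processor.ProcessMessageAsync += OnMessageAsync;
        _processor.ProcessErrorAsync += args => Task.CompletedTask; // log args.Exception in real code
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await _processor.StartProcessingAsync(stoppingToken);
        try
        {
            // Keep the background service alive until the host shuts down.
            await Task.Delay(Timeout.Infinite, stoppingToken);
        }
        catch (OperationCanceledException) { }
        await _processor.StopProcessingAsync();
    }

    private async Task OnMessageAsync(ProcessMessageEventArgs args)
    {
        // ConvertWorkbookAsync is a placeholder for the CPU-bound conversion work.
        await ConvertWorkbookAsync(args.Message.Body.ToString());
        await args.CompleteMessageAsync(args.Message);
    }

    private Task ConvertWorkbookAsync(string payload) => Task.CompletedTask;
}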
The idea is to use a pattern where the requests are queued in a background queue and then processed by background tasks.
The proper solution for request-extrinsic code is to use a durable queue with a separate backend processor. Any in-memory solution will lose that work any time the application is shut down (e.g., during a rolling upgrade).

ChangeFeedProcessorBuilder checkpointing after unsuccessful processing

I was investigating the behavior of a ChangeFeedProcessorBuilder processor that throws an exception or goes down while processing a particular change. Upon recovery, the same change is not picked up anymore. Is there any way to checkpoint only after the successful processing of the notification?
The delegate is as follows:
var builder = container.GetChangeFeedProcessorBuilder("migrationProcessor",
    (IReadOnlyCollection<object> input, CancellationToken cancellationToken) =>
    {
        Console.WriteLine(input.Count + " Changes Received by " + a);
        // Just the first try will fail ('a' is a static counter declared elsewhere).
        if (a++ == 0)
        {
            throw new Exception();
        }
        return Task.CompletedTask;
    });
Thank you!
The default behavior of the Change Feed Processor is to checkpoint after a successful delegate execution: https://learn.microsoft.com/azure/cosmos-db/change-feed-processor#processing-life-cycle
The normal life cycle of a host instance is:
1) Read the change feed.
2) If there are no changes, sleep for a predefined amount of time (customizable with WithPollInterval in the Builder) and go to #1.
3) If there are changes, send them to the delegate.
4) When the delegate finishes processing the changes successfully, update the lease store with the latest processed point in time and go to #1.
If your delegate handler throws an unhandled exception, there is no checkpoint.
Adding from comments: The only scenario where the batch might not be retried is if the batch that throws is the first ever (the lease has no Continuation), because when the host picks up the lease again to reprocess, it has no point in time to retry from. Based on the official documentation, one lease is owned by a single instance, so there is no way another instance could have picked up the same lease and be processing it in parallel (within the same Deployment Unit context).
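So, as long as the delegate lets a failure surface (or explicitly rethrows it), the unsuccessful batch is redelivered. A minimal sketch along those lines, assuming the .NET SDK v3 builder; the instance name, lease container, and HandleChangeAsync are placeholders for your own setup:

ChangeFeedProcessor processor = container.GetChangeFeedProcessorBuilder<object>("migrationProcessor",
    async (IReadOnlyCollection<object> changes, CancellationToken token) =>
    {
        foreach (var change in changes)
        {
            // If this throws, do NOT swallow the exception: an unhandled exception means
            // no checkpoint, so the same batch is delivered again when the lease is re-acquired.
            await HandleChangeAsync(change, token); // placeholder for your processing
        }
    })
    .WithInstanceName("host-1")
    .WithLeaseContainer(leaseContainer)
    .Build();

await processor.StartAsync();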

How to use transactions in Cloud Datastore

I want to use Datastore from Cloud Compute through Java and I am following Getting started with Google Cloud Datastore.
My use case is quite standard - read one entity (lookup), modify it and save the new version. I want to do it in a transaction so that if two processes do this, the second one won't overwrite the changes made by the first one.
I managed to issue a transaction and it works. However I don't know what would happen if the transaction fails:
How to identify a failed transaction? Probably a DatastoreException with some specific code or name will be thrown?
Should I issue a rollback explicitly? Can I assume that if a transaction fails, nothing from it will be written?
Should I retry?
Is there any documentation on that?
How to identify a failed transaction? Probably a DatastoreException with some specific code or name will be thrown?
Your code should always ensure that a transaction is either successfully committed or rolled back. Here's an example:
// Begin the transaction.
BeginTransactionRequest begin = BeginTransactionRequest.newBuilder()
    .build();
ByteString txn = datastore.beginTransaction(begin)
    .getTransaction();
try {
  // Zero or more transactional lookup()s or runQuery()s.
  // ...
  // Followed by a commit().
  CommitRequest commit = CommitRequest.newBuilder()
      .setTransaction(txn)
      .addMutation(...)
      .build();
  datastore.commit(commit);
} catch (Exception e) {
  // If a transactional operation fails for any reason,
  // attempt to roll back.
  RollbackRequest rollback = RollbackRequest.newBuilder()
      .setTransaction(txn)
      .build();
  try {
    datastore.rollback(rollback);
  } catch (DatastoreException de) {
    // Rollback may fail due to a transient error or if
    // the transaction was already committed.
  }
  // Propagate the original exception.
  throw e;
}
An exception might be thrown by commit() or by another lookup() or runQuery() call inside the try block. In either case, it's important to clean up the transaction.
Should I issue a rollback explicitly? Can I assume that if a transaction fails, nothing from it will be written?
Unless you're sure that the commit() succeeded, you should explicitly issue a rollback() request. However, a failed commit() does not necessarily mean that no data was written. See the note on this page.
Should I retry?
You can retry using exponential backoff. However, frequent transaction failures may indicate that you are attempting to write too frequently to an entity group.
Is there any documentation on that?
https://cloud.google.com/datastore/docs/concepts/transactions

How to determine that BizTalk has completed processing a message

We are writing automated system tests for a BizTalk application, but have a problem determining when we can execute the test's verification. We need to be sure that BizTalk has completely processed the message, or message processing has timed out, before the verification.
[Test]
public void ReceiveValidTaskMessageTestShouldBeLoggedInMessageLog()
{
    // Exercise
    MsmqHelpers.SendMessage(InboundQueueName, ValidMessage);
    // Verify
    Assert.That(() => GetMessageCount("ReceiveError"), Is.EqualTo(0).After(1000));
    Assert.That(() => GetMessageCount("Receive"), Is.EqualTo(1).After(1000));
}
The last two lines check for the existence of a copy of the message in a table in a SQL Server database: one table for successful messages, one table for errors.
The problem here is that immediately after sending the message we verify that no message has been placed in the error table. But if BizTalk has not yet processed the message, that assertion will pass even when it should fail.
What we need is something like this:
[Test]
public void ReceiveValidTaskMessageTestShouldBeLoggedInMessageLog()
{
    // Exercise
    MsmqHelpers.SendMessage(InboundQueueName, ValidMessage);
    // Verify
    Assert.That(() => PendingMessages, Is.EqualTo(0).After(1000));
    Assert.That(() => GetMessageCount("ReceiveError"), Is.EqualTo(0));
    Assert.That(() => GetMessageCount("Receive"), Is.EqualTo(1));
}
Herein lies the problem with automated integration testing.
Such testing is evidence-based, which is reflected in your test's assertions; you are looking for evidence that processing has taken place by checking a database.
Similarly, in order to know that processing has finished, you are seeking some evidence that this has happened. For example, you could theoretically run queries against the BizTalk MessageBox database to check the state within.
However, BizTalk doesn't lend itself well to this kind of probing, as it has not been built with testing in mind (one of its weaknesses). I certainly wouldn't know how to go about doing this.
A couple of approaches worth considering:
Wait a "reasonable" amount of time before performing the database check to allow BizTalk to finish processing the message.
Have BizTalk output a log file (or some other evidence) just before processing completes which you can check before checking the database.
Even though the approach is limited, automated integration testing is incredibly valuable.
A better approach would be to be notified when a record appears in either of those tables and pass or fail the test as appropriate. You could use a rudimentary loop to continuously poll the tables (a sketch follows below), or a more elegant solution would be to use events - see the event handler delegate for more details.
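For example, a minimal polling helper along these lines could replace the fixed After(1000) delays; PollUntil and the timeout values here are illustrative, and GetMessageCount is the test's existing helper:

private static bool PollUntil(Func<bool> condition, TimeSpan timeout, TimeSpan interval)
{
    var deadline = DateTime.UtcNow + timeout;
    while (DateTime.UtcNow < deadline)
    {
        if (condition())
        {
            return true;
        }
        Thread.Sleep(interval);
    }
    return condition();
}

// Usage in the test: wait until BizTalk has written the message to either table, then assert.
Assert.That(
    PollUntil(() => GetMessageCount("Receive") + GetMessageCount("ReceiveError") > 0,
              TimeSpan.FromSeconds(30), TimeSpan.FromMilliseconds(500)),
    Is.True, "BizTalk did not finish processing within the timeout");
Assert.That(GetMessageCount("ReceiveError"), Is.EqualTo(0));
Assert.That(GetMessageCount("Receive"), Is.EqualTo(1));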

Stopping a web service in Flex?

Is it possible to stop a web service from executing?
I have a Flex web application that searches for clients by both full name and client id. When searching by name, sometimes the user just types the last name and the search takes a long time.
Since the app is used while clients are waiting in line, I would like to be able to stop the search and use the full name or id instead, avoiding waiting for the results and then having to find the client manually within the results.
Thanks.
Edit: Sorry, I didn't explain myself correctly. By "web service" I actually meant mx.rpc.soap.mxml.WebService; I want to stop it from waiting for the result and fault events. Thanks.
There is actually a cancel(..) method explicitly for this purpose, though it is a little buried. Using the cancel method will cause the result and fault handlers not to be called and will also remove the busy cursor, etc.
Depending on how you run your searches (i.e. separate worker processes, etc.), you can also extend this by adding a cancelSearch() web service method to kill those worker processes and free up server resources.
private var _searchToken:AsyncToken;

public function doSearch(query:String):void
{
    _searchToken = this.searchService.doSearch(query);
}

protected function doSearch_resultHandler(event:ResultEvent):void
{
    trace("doSearch result");
    trace("TODO: Do stuff with results");
    _searchToken = null;
}

protected function doSearch_faultHandler(event:FaultEvent):void
{
    trace("doSearch fault: " + event.fault);
    _searchToken = null;
}

public function cancelSearch():void
{
    var searchMessageId:String = _searchToken.message.messageId;
    // Cancels the last service invocation, or an invocation with the
    // specified ID. Even though the network operation may still
    // continue, no result or fault event is dispatched.
    searchService.getOperation("doSearch").cancel(searchMessageId);
    _searchToken = null;
    trace("The search was cancelled, result/fault handlers not called");
    // TODO: If your web service search method is using worker processes
    // to do a search and is likely to continue processing for some time,
    // you may want to implement a 'cancel()' method on the web service
    // to stop any search threads that may be running.
}
Update: You could use disconnect() to remove any pending request responders, but it also disconnects the service's connection. Then call initialize().
You cannot stop the web service itself from executing, because that's beyond the Flex app's control, but you can limit the processing of the web service's response. For instance, in the app, have a button like "Cancel Search" which sets a boolean bSearchCanceled to true.
The result handler for the web service call checks bSearchCanceled; if it is true, it just returns.
