ChangeFeedProcessorBuilder checkpointing after unsuccessful processing - azure-cosmosdb

I was investigating the behavior of a ChangeFeedProcessorBuilder processor that throws an exception or goes down while processing a particular change. Upon recovery, the same change is not picked up anymore. Is there any way to checkpoint only after successful processing of the notification?
The delegate is as follows:
var builder = container.GetChangeFeedProcessorBuilder("migrationProcessor",
    (IReadOnlyCollection<object> input, CancellationToken cancellationToken) =>
    {
        Console.WriteLine(input.Count + " Changes Received by " + a);
        // just the first try will fail (a is a static counter variable)
        if (a++ == 0)
        {
            throw new Exception();
        }
        return Task.CompletedTask;
    });
Thank you!

The default behavior of the Change Feed Processor is to checkpoint after a successful delegate execution: https://learn.microsoft.com/azure/cosmos-db/change-feed-processor#processing-life-cycle
The normal life cycle of a host instance is:
1. Read the change feed.
2. If there are no changes, sleep for a predefined amount of time (customizable with WithPollInterval in the Builder) and go to #1.
3. If there are changes, send them to the delegate.
4. When the delegate finishes processing the changes successfully, update the lease store with the latest processed point in time and go to #1.
If your delegate handler throws an unhandled exception, there is no checkpoint.
Adding from comments: the only scenario where the batch might not be retried is if the batch that throws is the very first one (the lease has no Continuation), because when the host picks up the lease again to reprocess, it has no point in time to retry from. Based on the official documentation, one lease is owned by a single instance, so there is no way another instance could have picked up the same lease and be processing it in parallel (within the same Deployment Unit context).

Related

Why does Vert.x throw a warning even with the blocking attribute?

I have a Quarkus application where I use the event bus.
The code in question looks like this:
@ConsumeEvent(value = "execution-request", blocking = true)
@Transactional
@TransactionConfiguration(timeout = 3600)
public void consume(final Message<ExecutionRequest> msg) {
    try {
        execute(...);
    } catch (final Exception e) {
        // some logging
    }
}

private void execute(...)
        throws InterruptedException {
    // it actually runs a long running task, but for
    // this example this has the same effect
    Thread.sleep(65000);
}
Why do I still get a
WARN [io.ver.cor.imp.BlockedThreadChecker] (vertx-blocked-thread-checker) Thread Thread[vert.x-worker-thread-0,5,main] has been blocked for 63066 ms, time limit is 60000 ms: io.vertx.core.VertxException: Thread blocked
Am I doing something wrong? Is the blocking parameter on the @ConsumeEvent annotation not enough to have that handled on a separate worker thread?
Your annotation is working as designed; the method is running on a worker thread. You can tell by both the name of the thread, "vert.x-worker-thread-0", and by the 60 second timeout before the warnings were logged. The event loop threads have a much shorter limit (2 seconds by default, I believe).
The default Vert.x worker thread pool is not designed for "very" long running blocking code, as stated in their docs:
Warning:
Blocking code should block for a reasonable amount of time (i.e no more than a few seconds). Long blocking operations or polling operations (i.e a thread that spin in a loop polling events in a blocking fashion) are precluded. When the blocking operation lasts more than the 10 seconds, a message will be printed on the console by the blocked thread checker. Long blocking operations should use a dedicated thread managed by the application, which can interact with verticles using the event-bus or runOnContext
That message mentions blocking for more than 10 seconds triggers a warning, but I think that's a typo; the default is actually 60.
To avoid the warning, you'll need to create a dedicated WorkerExecutor (via vertx.createSharedWorkerExecutor) configured with a very high maxExecuteTime. However, it does not appear that you can tell the @ConsumeEvent annotation to use it instead of the default worker pool, so you'd either need to manually create an event bus consumer as well, or use a regular @ConsumeEvent annotation but call workerExecutor.executeBlocking inside of it, as sketched below.
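A minimal sketch of the second option, assuming the Quarkus-managed io.vertx.core.Vertx instance can be injected; the class name, the "long-task-pool" name and the 2-hour limit are illustrative, not taken from the question:

import io.quarkus.vertx.ConsumeEvent;
import io.vertx.core.Vertx;
import io.vertx.core.WorkerExecutor;
import io.vertx.core.eventbus.Message;
import jakarta.enterprise.context.ApplicationScoped; // javax.* on older Quarkus versions
import java.util.concurrent.TimeUnit;

@ApplicationScoped
public class ExecutionRequestConsumer {

    private final WorkerExecutor longTaskExecutor;

    public ExecutionRequestConsumer(Vertx vertx) {
        // dedicated pool whose blocked-thread limit is raised to 2 hours
        this.longTaskExecutor = vertx.createSharedWorkerExecutor(
                "long-task-pool", 1, 2, TimeUnit.HOURS);
    }

    @ConsumeEvent("execution-request") // no blocking = true; this handler returns immediately
    public void consume(final Message<ExecutionRequest> msg) {
        longTaskExecutor.executeBlocking(promise -> {
            try {
                execute(msg.body()); // the long-running work runs on the dedicated pool
                promise.complete();
            } catch (Exception e) {
                promise.fail(e);
            }
        }, result -> {
            // optionally log or reply once the work has finished or failed
        });
    }

    private void execute(ExecutionRequest request) {
        // long-running task
    }
}

Note that moving the work off the consumer method also moves it out of the @Transactional / @TransactionConfiguration boundary from the original code, so the transaction would have to be started inside the blocking task instead.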

Cosmos ChangeFeed - Errors, exceptions and service failure scenarios

All,
I am using the Change Feed Processor Library. I want to know the best way to handle service failures along with exception/error scenarios in the ProcessChangesAsync method. Below are the events I am referring to.
1) Service failure - the service hosting the processor library crashes in the middle of some operation. How do I start the process from the same document (the doc being handled at the failure)? Is there any built-in mechanism where the change feed will start with the last failed documents? E.g. let's assume the current batch has 10 docs; 5 are processed successfully and then the service breaks because of a network failure or some other reason. Will my process start with the 6th document once the service is restarted? How can I achieve this?
2) Exceptions and errors - any errors in the ProcessChangesAsync method can be handled using try/catch at the global level, but how do I persist those failure records and make them available for the next batch? Again, I'm looking for any built-in mechanism available in the change feed processor.
1) The Processor Library, by default, checkpoints after a successful run of ProcessChangesAsync. In the latest library version, you can customize the checkpointer to do manual checkpoints in case you need it. If for some reason the processor shuts down before checkpointing, then it will next start processing from the last successful checkpoint stored in the Leases collection. In your case, it will start with the first document again, so you will never lose a change, but you could experience double processing (this is an "at least once" model).
2) There is no built-in mechanism that you can leverage; handling exceptions within ProcessChangesAsync is your responsibility. You could not only add a global try/catch but also, in the case where you are looping over the documents, add a try/catch inside the loop to handle a failing document (maybe send it to a queue for later analysis/post-processing) without losing the batch. If you require logging for those errors (I'm assuming that's what you mean by persisting errors?), then the latest version is compatible with LibLog, so plugging in your own custom logging is as simple as:
using Microsoft.Azure.Documents.ChangeFeedProcessor.Logging;

var hostName = "SampleHost";
var tracelogProvider = new TraceLogProvider(); // You can use any provider supported by LibLog
using (tracelogProvider.OpenNestedContext(hostName))
{
    LogProvider.SetCurrentLogProvider(tracelogProvider);
    // After this, create the IChangeFeedProcessor instance and start/stop it.
}
Source
Extra info for the comments
To avoid exceptions halting the batch or causing a batch to be reprocessed, you can have handling like this:
public async Task ProcessChangesAsync(IChangeFeedObserverContext context, IReadOnlyList<Document> documents, CancellationToken cancellationToken)
{
    try
    {
        foreach (var document in documents)
        {
            try
            {
                // Do your work for the document
            }
            catch (Exception ex)
            {
                // Something happened with the current document: handle it, send it to a queue / another storage to analyze, log it.
                // This catch will make the loop continue with the next document.
            }
        }
    }
    catch (Exception ex)
    {
        // Something unhandled happened: log it and avoid throwing it again so the next batch is processed
    }
}

How can I cancel/abort a zone in Dart?

I have an HTTP web server where I'm trying to detect long-running requests and abort them. The following code successfully returns to the client upon timeout, but the async zone still continues to run to completion. How can I actually kill the request handler?
var zone = runZoned(() {
  var timer = new Timer(new Duration(seconds: Config.longRequestTimeoutSeconds), () {
    if (!completer.isCompleted) { // -- not already completed
      log.severe('request timed out');
      // TODO: This successfully responds to the client early, but it does nothing to abort the zone/handler that is already running.
      // Even though the client will never see the result (and won't have to wait for it), the zone/handler will continue to run to completion as normal.
      // TODO: Find a way to kill/abort/cancel the zone
      completer.complete(new shelf.Response(HttpStatus.SERVICE_UNAVAILABLE, body: 'The server timed out while processing the request'));
    }
  });
  return innerHandler(request) // -- handle request as normal (this may consist of several async futures within)
      .then((shelf.Response response) {
        timer.cancel(); // -- prevent the timeout intercept
        if (!completer.isCompleted) { // -- not already completed (not timed out)
          completer.complete(response);
        }
      })
      .catchError(completer.completeError);
});
Nothing can kill running code except itself. If you want code to be interruptible, you need some way to tell it, and the code itself needs to terminate (likely by throwing an error). In this case, the innerHandler needs to be able to interrupt itself when requested. If it's not your code, that might not be possible.
You can write a zone that stops execution of asynchronous events when a flag is set (by modifying Zone.run etc.), but you must be very careful about that - it might never get to an asynchronous finally block and release resources if you start throwing away asynchronous events. So, that's not recommended as a general solution, only for very careful people willing to do manual resource management.

JMS - Cannot retrieve message from queue. Happens intermittently

We have a Java class that listens to a database (Oracle) queue table and processes records placed in that queue. It worked normally in UAT and development environments. After deployment to production, there are times when it cannot read a record from the queue: when a record is inserted, it does not detect it and the record remains in the queue. This seldom happens, but it happens. To give a statistic, out of 30 records queued in a day, about 8 don't make it. We need to restart the whole app for it to be able to read the records.
Here is a code snippet of my class:
public class SomeListener implements MessageListener {

    public void onMessage(Message msg) {
        InputStream input = null;
        try {
            TextMessage txtMsg = (TextMessage) msg;
            String text = txtMsg.getText();
            input = new ByteArrayInputStream(text.getBytes());
        } catch (Exception e1) {
            // TODO Auto-generated catch block
            logger.error("Parsing from the queue.... failed", e1);
            e1.printStackTrace();
        }
        // process text message
    }
}
The weird thing is we can't find any traces of exceptions in the logs.
Can anyone help? By the way, we set the receiveTimeout to 10 seconds.
We would need to restart the whole app for it to be able to read the records.
The most common reason for this is the listener thread is "stuck" in user code (//process text message). You can take a thread dump with jstack or jvisualvm or similar to see what the thread is doing.
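If attaching jstack to the production JVM is inconvenient, a small in-process helper along these lines (an illustrative class, not part of any framework) can capture the same information through the JMX thread bean:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public final class ThreadDumper {

    /** Returns the stack of every live thread; log this when consumption appears stuck. */
    public static String dumpAllThreads() {
        StringBuilder dump = new StringBuilder();
        ThreadInfo[] threads = ManagementFactory.getThreadMXBean().dumpAllThreads(true, true);
        for (ThreadInfo thread : threads) {
            dump.append(thread.toString());
        }
        return dump.toString();
    }
}

Either way, look for the listener/consumer thread in the dump and check whether it is parked inside your own processing code.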
Another possibility (with low volume apps like this) is the network (most likely a router someplace in the network) silently closes an idle socket because it has not been used for some time. If the container (actually the broker's JMS client library) doesn't know the socket is dead, it will never receive any more messages.
The solution to the first is to fix the code; the solution to the second is to enable some kind of heartbeat or keepalives on the connection so that the network/router does not close the socket when it has no "real" traffic on it.
You would need to consult your broker's documentation about configuring heartbeats/keepalives.
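Purely as an illustration, assuming the broker were ActiveMQ (the option name below comes from ActiveMQ's OpenWire transport and should be verified against the broker actually in use), the client-side connection URL can request activity monitoring so a silently dropped socket is detected instead of hanging forever:

import org.apache.activemq.ActiveMQConnectionFactory;

public class QueueConnectionConfig {

    public static ActiveMQConnectionFactory connectionFactory() {
        // wireFormat.maxInactivityDuration (ms): if no traffic is seen within this window,
        // keep-alive frames are exchanged and the connection is torn down when they fail,
        // which also keeps "real" traffic flowing past idle-socket-dropping routers.
        String brokerUrl = "failover:(tcp://broker-host:61616?wireFormat.maxInactivityDuration=30000)";
        return new ActiveMQConnectionFactory(brokerUrl);
    }
}

Other brokers expose equivalent settings under different names, which is why the broker documentation is the place to confirm the exact option.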

Task#call() method invoked before task is executed

According to the documentation, Task#call() is "invoked when the Task is executed".
Consider the following program:
import javafx.application.Application;
import javafx.concurrent.Task;
import javafx.stage.Stage;

public class TestTask extends Application {

    Long start;

    public void start(Stage stage) {
        start = System.currentTimeMillis();
        new Thread(new Taskus()).start();
    }

    public static void main(String[] args) {
        launch();
    }

    class Taskus extends Task<Void> {

        public Taskus() {
            stateProperty().addListener((obs, oldValue, newValue) -> {
                try {
                    System.out.println(newValue + " at " + (System.currentTimeMillis() - start));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }

        public Void call() throws InterruptedException {
            for (int i = 0; i < 10000; i++) {
                // Could be a lot longer.
            }
            System.out.println("Some code already executed." + " at " + (System.currentTimeMillis() - start));
            Thread.sleep(3000);
            return null;
        }
    }
}
Executing this program gives me the following output:
Some code already executed. after 5 milliseconds
SCHEDULED after 5 milliseconds
RUNNING after 7 milliseconds
SUCCEEDED after 3005 milliseconds
Why is the call() method invoked before the task is even scheduled? This makes no sense to me. In the task where I first saw the issue my task executed a few seconds before the task went into the SCHEDULED state. What if I want to give the user some feedback on the state, and nothing happens until the task has already been executed for a few seconds?
Why is the call() method invoked before the task is even scheduled?
TLDR; version: It's not. It's merely invoked before you get notified that it's been scheduled.
You have two threads running, essentially independently: the thread you explicitly create, and the FX Application Thread. When you start your thread, it will invoke Taskus.call() on that thread. However, changes to the task's properties are made on the FX Application Thread via calls to Platform.runLater(...).
So when you call start() on your thread, the following occurs behind the scenes:
1. A new thread is started.
2. On that thread, an internal call() method in Task is called. That method:
   - Schedules a runnable to execute on the FX Application Thread, that changes the stateProperty of the task to SCHEDULED
   - Schedules a runnable to execute on the FX Application Thread, that changes the stateProperty of the task to RUNNING
   - Invokes your call method
When the FX Application Thread receives the runnable that changes the state of the task from READY to SCHEDULED, and later from SCHEDULED to RUNNING, it effects those changes and notifies any listeners. Since this is on a different thread to the code in your call method, there is no "happens-before" relationship between code in your call method and code in your stateProperty listeners. In other words, there is no guarantee as to which will happen first. In particular, if the FX Application Thread is already busy doing something (rendering the UI, processing user input, processing other Runnables passed to Platform.runLater(...), etc), it will finish those before it makes the changes to the task's stateProperty.
What you are guaranteed is that the changes to SCHEDULED and to RUNNING will be scheduled on the FX Application thread (but not necessarily executed) before your call method is invoked, and that the change to SCHEDULED will be executed before the change to RUNNING is executed.
Here's an analogy. Suppose I take requests from customers to write software. Think of my workflow as the background thread. Suppose I have an admin assistant who communicates with the customers for me. Think of her workflow as the FX Application thread. So when I receive a request from a customer, I tell my admin assistant to email the customer and notify them I received the request (SCHEDULED). My admin assistant dutifully puts that on her "to-do" list. A short while later, I tell my admin assistant to email the customer telling them I have started working on their project (RUNNING), and she adds that to her "to-do" list. I then start working on the project. I do a little work on the project, and then go onto Twitter and post a tweet (your System.out.println("Some code already executed")) "Working on a project for xxx, it's really interesting!". Depending on the number of things already on my assistant's "to-do" list, it's perfectly possible the tweet may appear before she sends the emails to the customer, and so perfectly possible the customer sees that I have started work on the project before seeing the email saying the work is scheduled, even though from the perspective of my workflow, everything occurred in the correct order.
This is typically what you want: the status property is designed to be used to update the UI, so it must run on the FX Application Thread. Since you are running your task on a different thread, you presumably want it to do just that: run in a different thread of execution.
It seems unlikely to me that a change to the scheduled state would be observed a significant amount of time (more than one frame rendering pulse, typically 1/60th second) after the call method actually started executing: if this is happening you are likely blocking the FX Application thread somewhere to prevent it from seeing those changes. In your example, the time delay is clearly minimal (less than a millisecond).
If you want to do something when the task starts, but don't care which thread you do it on, just do that at the beginning of the call method. (In terms of the analogy above, this would be the equivalent of me sending the emails to the customer, instead of requesting that my assistant do it.)
If you really need code in your call method to happen after some user notification has occurred on the FX Application Thread, you need to use the following pattern:
public class Taskus extends Task<Void> {

    @Override
    public Void call() throws Exception {
        FutureTask<Void> uiUpdate = new FutureTask<Void>(() -> {
            System.out.println("Task has started");
            // do some UI update here...
            return null;
        });
        Platform.runLater(uiUpdate);
        // wait for update:
        uiUpdate.get();
        for (int i = 0; i < 10000; i++) {
            // any VM implementation worth using is going
            // to ignore this loop, by the way...
        }
        System.out.println("Some code already executed." + " at " + (System.currentTimeMillis() - start));
        Thread.sleep(3000);
        return null;
    }
}
In this example, you are guaranteed to see "Task has started" before you see "Some code already executed". Additionally, since displaying the "Task has started" message happens on the same thread (the FX Application Thread) as the changes in state to SCHEDULED and RUNNING, and since displaying the "Task has started" message is scheduled after those changes in state, you are guaranteed to see the transitions to SCHEDULED and RUNNING before you see the "Task has started" message. (In terms of the analogy, this is the same as me asking my assistant to send the emails, and then not starting any work until I know she has sent them.)
Also note that if you replace your original call to
System.out.println("Some code already executed." + " at " + (System.currentTimeMillis()-start));
with
Platform.runLater(() ->
System.out.println("Some code already executed." + " at " + (System.currentTimeMillis()-start)));
then you are also guaranteed to see the calls in the order you are expecting:
SCHEDULED after 5 milliseconds
RUNNING after 7 milliseconds
Some code already executed. after 8 milliseconds
SUCCEEDED after 3008 milliseconds
This last version is the equivalent in the analogy of me asking my assistant to post the tweet for me.
