How to reliably schedule events in Corda? - corda

Many examples of using event scheduling in Corda -- including it's own reference implementation of the Interest Rate Swap contract -- rely on SchedulableState mechanism. Unfortunately there is not much information available in the Corda documentation regarding the use of the scheduler in a real business context. Reliable execution of scheduled business events is important, and, as a party who depends on a CorDapp to automatically trigger scheduled events in order to fulfil my contractual obligations, I'd like to implement robust controls for whether things that had been scheduled have actually run. For example, it is possible to envisage a situation whereby a scheduled flow kicked off, but then ended prematurely due to a bug in the code or node misconfiguration without achieving its indented purpose. The question therefore is what CorDapp designer and/or node operator could do to ensure reliable execution of scheduled activities?

SchedulableState normally is used to represent recurring events.
Please check out this HeartBeat sample: https://github.com/corda/samples/tree/release-V4/heartbeat
You can think of monthly payment or a paycheck that is reoccured every month.

Related

Axon FW 4.6 comes with Dead Letter Queue support, but is it possible to still have rollback support in case of exception?

we're looking into using the new feature of Axon, dead letter queue(DLQ).
In our previous application (axon 4.5x) we have a lot of eventhandlers updating projections. Our default is to rethrow exceptions when they occur, which will trigger a rollback for the database updates. Perhaps not the best practice to rely on this behaviour (because it can not rollback everything, eg sending an email from event can not be reverted of course)
This behaviour seems to get lost when introducing DLQ in our applications, which has big impact on our current application flow (projections are updated when they previously weren't). This makes upgrading not that easy.
Is it possible to still get the old behaviour(transaction rolled back in case of exceptions) together with DLQ processing?
What we tried was building a test application to test the new DLQ features. While playing around all looks fine in case of exceptions (they were moved to dlq) but the projections still got updated (not rolled back as before)
We throw an exception after the .save() of the projection simulating a database failure to see if events involved (we have multiple eventhandlers for an event updating projections) got rolled back.
You need to choose here, #davince. Storing a dead-letter in the dead-letter queue similarly requires a database transaction.
To ensure the token still progresses and the dead letter is entered, the framework uses the existing transaction.
Furthermore, in practical terms event handling was successful.
Hence, a rollback for some of the parts wouldn't be feasible.
The best way to deal with this, as is mentioned in the Reference Guide, is to make your event handlers idempotent:
Before configuring a SequencedDeadLetterQueue it is vital to validate whether your event handling functions are idempotent.
As a processing group consists of several Event Handling Components
(as explained in the intro of this chapter), some handlers may succeed
in event handling while others will not. As a configured dead-letter
queue does not stall event handling, a failure in one Event Handling
Component does not cause a rollback for other event handlers.
Furthermore, as the dead-letter support is on the processing group
level, dead-letter processing will invoke all event handlers for that
event within the processing group.
Thus, if your event handlers are not idempotent, processing letters may result in undesired side effects.
Hence, we strongly recommend making your event handlers idempotent when using the dead-letter queue.
The principle of exactly-once delivery is no longer guaranteed; at-least-once delivery is the reality to cope with.
I understand this requires some rework on your end, #davince. Although the feature is powerful, it comes with a certain level of responsibility before you use it.
I hope the above clarifies this for you.
Added, I'd like to point out that the version upgrade in itself does not require you to use the dead-letter queue. Hence, this change shouldn't impose any strains for updating to the latest release.
Update 1
Sometimes you need to think about an issue a bit longer.. I was just wondering the following things about your setup. Perhaps I can help out on that front:
What storage mechanism do you use to store projections in?
Where are you storing your tokens?
Where are you planning to store your dead-letters?
How are you invoking the storage layer from your event handlers?
Update 2
Thanks for sharing that you're using PostgreSQL for your projections, tokens, and dead letters. And that you're using JPA entities for storage.
It gives more certainty about your setup, as it may impact how your system would react in case of a rollback.
However, as it's rather vanilla/regular, the shared comment from the Reference Guide still applies.
This, sadly enough, means some work on your end, #davince. I hope the road forward to start using the SequencedDeadLetterQueue from Axon Framework is clear.
By the way, if you ever have recommendations on how the framework or the documentation may be improved, be sure to file issues in GitHub here and here, respectively.

Can I replay a Saga in Axon Framework without rerunning old transactions?

Is there a way to replay a Saga instance in Axon Framework in such a manner that the transactions that already happened are not replayed?
The simple answer to your question is no #2011ashwini. The Saga's you create in Axon Framework are backed by a dedicated type of Event Handling Component, which enforces that the reset API (provided by the TrackingEventProcessor) has zero impact.
It is this exact API which is the only means to tell Axon Framework you are doing triggering a replay, thus the only means to use things like Axon's #ResetHandler, #DisallowReplay and ReplayStatus parameter (as described further here). Only through these handles would you be able to know whether the Saga is handling events a second time (as you are replaying) yes or no.
The only way to thus forcefully replay a Saga, is by dropping the TrackingToken which is stored for it.
However, Saga's typically introduce side effects. Especially side effects are things you normally would not want to be replayed, ever. Hence why this is closed of by Axon's Saga logic to begin with.
On top of that, you've asked the following:
Is there a way to replay a Saga instance in Axon Framework in such a manner that the transactions that already happened are not replayed?
To be honest with you, that doesn't sound like a replay at all. The transaction which have been performed by the Saga are done so by the events it has handled. Replaying means re-handling those exact same events. Thus you are somewhat contradicting yourself with the question you are asking (hence why I asked for clarification).
Regardless, I hope this provides the additional background you need.

Event-sourcing: when (and not) should I use Message Queue?

I am building a project from scratch using event-sourcing with Java and Cassandra.
My apps we be based on microservices and in some use cases information will be processed asynchronously. I was wondering what part a Message Queue (such as Rabbit, Active MQ Artemis, Kafka, etc) would play to improve the technology stack in this environment and if I understand the scenarios if I won't use it.
I would start with separating messaging infrastructure like RabbitMQ from event streaming/storing/processing like Kafka. These are two different things made for two (or more) different purposes.
Concerning the event sourcing, you have to have a place where you must store events. This storage must be append-only and support fast reads of unstructured data based on an identity. One example of such persistence is the EventStore.
Event sourcing goes together with CQRS, which means you have to project your changes (event) to another store, which you can query. This is done by projecting events to that store, this is where events get processed to change the domain object state. It is important to understand that using message infrastructure for projections is generally a bad idea. This is due to the nature of messaging and two-phase commit issue.
If you look at how events get persisted, you can see that they get saved to the store as one transaction. If you then need to publish events, this will be another transaction. Since you are dealing with two different pieces of infrastructure, things can get broken.
The messaging issue as such is that messages are usually guaranteed to be delivered "at least once" and the order of messages is usually not guaranteed. Also, when your message consumer fails and NACKs the message, it will be redelivered but usually a bit later, again breaking the sequence.
The ordering and duplication concerns, whoever, do not apply to event streaming servers like Kafka. Also, the EventStore will guarantee once only event delivery in order if you use catch-up subscription.
In my experience, messages are used to send commands and to implement event-driven architecture to connect independent services in a reactive way. Event stores, at the other hand, are used to persist events and only events that get there are then projected to the query store and also get published to the message bus.
Make sure you are clear on the distinction between send(command) and publish(event). Udi Dahan touches on that topic in his essay on busses and brokers.
In most cases where you are event sourcing, you do not want to be reconstructing state from published events. If you need state, then query the technical authority/book of record for the history, and reconstruct the state from the history.
On the other hand, event driven activity off of a message queue should be fine. When a single event (plus the subscriber's state) has everything you need, then running off of the bus is fine.
In some cases, you might do both. For example, if you were updating cached views, you'd subscribe to various BobChanged events to know when your cached data was stale; to rebuild a stale view, you would reload a representation of the history and transform it into an updated view.
In the world of event-sourcing applications, message queues usually allow you to implement publish-subscribe pattern style of communication between producers and consumers. Also, they usually help you with delivery guarantees: which messages were delivered to which subscribers and which ones were not.
But they don't store all messages indefinitely. You need to have an event store to do any kind of event sourcing.
The question is not 'to queue or not to queue', but it is more like:
can this thing store huge volume of events indefinitely?
does it have publish-subscribe capabilities?
does it provide at-least-once delivery guarantees?
So, you should use something like Kafka or EventStore to have all that out-of-the-box. Alternatively, you can combine event store with message queue manually, but this is going to be more involved.

How TaskTrackers informs Jobtrackers about their state?

I read about the Apache Hadoop. They said that in HDFS, tasks are any process, that is, mapper or reducer. And they together called jobs.
They have two things, JOBTRACKER, and TASKTRACKER , tasktracker is on each node that manages mapper or reducer tasks.
And, Jobtracker is the one, who manges all task-trackers.
Till now I understand all the concpts theoretically, and all the things are well explained in many blogs.
But I have one doubt, how tasktracker inform jobtracker that given task fail. How they communicate each other. Is they using any other software just like , Apache AVRO.
Please explain me the internal mechanism of this.
Looking for your kind reply.
AVRO has nothing to do with this. It is just a serialization framework, which folks usually use if they feel that Hadoop's serialization is not helping them much. Otherwise it is just another member of the Hadoop ecosystem.
Coming to your original question, it is done through heartbeats, as #thiru_k has specified above. But along with the number of available slots heartbeat signals contains some other info as well, like job status, resource usage, etc. Tasks which don't report their progress for a while are marked as hung or killed. I would suggest you to go through this link, it'll answer all your questions.
The TaskTrackers sends out heartbeat messages to the JobTracker, usually every few minutes, to reassure the JobTracker that it is still alive. These message also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated

Event Driven Architecture - Service Contract Design

I'm having difficulty conceptualising a requirement I have into something that will fit into our nascent SOA/EDA
We have a component I'll call the Data Downloader. This is a facade for an external data provider that has both high latency and a cost associated with every request. I want to take this component and create a re-usable service out of it with a clear contract definition. It is up to me to decide how that contract should work, however its responsibilities are two-fold:
Maintain the parameter list (called a Download Definition) for an upcoming scheduled download
Manage the technical details of the communication to the external service
Basically, it manages the 'how' of the communication. The 'what' and the 'when' are the responsibilities of two other components:
The 'what' is managed by 'Clients' who are responsible for
determining the parameters for the download.
The 'when' is managed by a dedicated scheduling component. Because of the cost associated with the downloads we'd like to batch the requests intraday.
Hopefully this sequence diagram explains the responsibilities of the services:
Because each of the responsibilities are split out in three different components, we get all sorts of potential race conditions with async messaging. For instance when the Scheduler tells the Downloader to do its work, because the 'Append to Download Definition' command is asynchronous, there is no guarantee that the pending requests from Client A have actually been serviced. But this all screams high-coupling to me; why should the Scheduler necessarily know about any 'prerequisite' client requests that need to have been actioned before it can invoke a download?
Some potential solutions we've toyed with:
Make the 'Append to Download Definition' command a blocking request/response operation. But this then breaks the perf. and scalability benefits of having an EDA
Build something in the Downloader to ensure that it only runs when there are no pending commands in its incoming request queue. But that then introduces a dependency on the underlying messaging infrastructure which I don't like either.
Makes me think I'm thinking about this problem in a completely backward way. Or is this just a classic case of someone trying to fit a synchronous RPC requirement into an async event-driven architecture?
The thing I like most about EDA and SOA, is that it almost completely eliminates the notion of race condition. As long as your events are associated with some association key (e.g. downloadId), the problem you describe can be addressed with several solutions of different complexities - depending on your needs. I'm not sure I totally understand the described use-case but I will try my best
Out of the top of my head:
DataDownloader maintains a list of received Download Definitions and a list of triggered downloads. When a definition is received it is checked against the triggers list to see if the associated download has already been triggered, and if it was, execute the download. When a TriggerDownloadCommand is recieved, the definitions list is checked against a definition with the associated downloadId.
For more complex situation, consider using the Saga pattern, which is implemented by some 3rd party messaging infrastructures. With some simple configuration, it will handle both messages, and initiate the actual download when the required condition is satisfied. This is more appropriate for distributed systems, where an in-memory collection is out of the question.
You can also configure your scheduler (or the trigger command handler) to retry when an error is signaled (e.g. by an exception), in order to avoid that race condition, and ultimately give up after a specified timeout.
Does this help?

Resources