Two conflicting long-lived process managers - asynchronous

Let's assume we have two long-lived process managers. Both sagas operate over, say, 10 million items. The first saga adds something to each item; the second saga removes it from each item. Given that both process managers need a few minutes to complete their job, running them simultaneously gets me into trouble.
Part of those items would hold the value while the rest would not. The result is actually close to random and depends on the order in which commands reach a particular item. I wondered if redispatching the "Remove" command in case of failure would solve the problem: if you try to remove a non-existent value, you wait for the first saga to add it. But while the process managers are working, someone else may dispatch a "Remove" or "Add" command, in which case my approach would fail.
How can I solve such a problem? :)

It seems that you want the second saga to not run while the first saga is running (and presumably not run until whatever the first saga added is actually there). So the apparent solution would be to have a component (could be a microservice, could also be a record in a strongly consistent datastore like ZooKeeper/etcd/Consul) that gives permission for the sagas to start executing. An example protocol might look like the following steps, with a sketch of the component after them:
1) The saga sends a message to the component identifying itself and conveying the intention to start.
2) The component validates that no sagas might be running which would prevent this saga from running.
3) The component responds with permission to start running.
4) Subsequent saga attempts result in rejection until the running saga tells the component it's OK to run the other saga.
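A minimal sketch of such a component, assuming a single-node in-memory implementation (in practice the state would live in ZooKeeper/etcd/Consul, and the names here are hypothetical):

```java
// Hypothetical coordination component: grants at most one saga
// permission to run at a time. In production this state would be
// kept in a strongly consistent store, not in memory.
public class SagaCoordinator {

    private String runningSagaId; // null when no saga holds permission

    // Steps 1-3: a saga announces its intention to start.
    public synchronized boolean requestStart(String sagaId) {
        if (runningSagaId == null) {
            runningSagaId = sagaId;
            return true;  // permission granted
        }
        return false;     // step 4: rejected until the running saga finishes
    }

    // The running saga reports completion, allowing the other saga to start.
    public synchronized void reportFinished(String sagaId) {
        if (sagaId.equals(runningSagaId)) {
            runningSagaId = null;
        }
    }
}
```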
Assuming that this component is reliably durable, the failure mode to worry about is that permission is granted but this component never processes the message that the saga finished (causes of this could include the permission message not getting delivered/processed or the saga crashing). No amount of acknowledgements or extra messages can solve this (it's basically the Two Generals' Problem).
A mitigation is to have this component (or something watching it) alert if it seems that too much time has passed without saga completion. Whatever/whoever is responsible for ensuring liveness would then investigate to see if the saga is still running and, if none is running, inform the component that it's OK to run the other saga. Note that this is not foolproof: it's quite possible for the decider in question to make what turns out to be the wrong decision.
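As a sketch of that mitigation (hypothetical names, assuming the coordinator records when permission was granted):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical watchdog: alerts if a saga has held permission for too
// long. It deliberately does not revoke permission itself - as noted
// above, whoever investigates may still make the wrong decision.
public class SagaWatchdog {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void watch(Instant grantedAt, Duration maxExpected, Runnable alert) {
        scheduler.scheduleAtFixedRate(() -> {
            if (Duration.between(grantedAt, Instant.now()).compareTo(maxExpected) > 0) {
                alert.run(); // e.g. page whoever is responsible for liveness
            }
        }, 1, 1, TimeUnit.MINUTES);
    }
}
```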

I feel like I need more context. Whilst you don't say it explicitly, is the problem that the second saga tries to remove values that haven't been added by the first?
If that's the case, a simple solution would be to just use a third state.
What I mean by that is to define and declare item state more explicitly. You currently seem to have two states, with value and without value, but nothing to indicate whether an item is ready to be processed by the second saga because the first saga has already done its work on the item in question.
So all that needs to happen is that the second saga keeps looking for items where:
(with_value == true && ready_for_saga2 == true)
Call it ready_for_saga2 or "saga 1 processing complete", whatever seems more appropriate in your context.

I'd say that the solution would vary based on which actual problem we're trying to solve.
Say it's an inventory, where "add" means items added to the inventory and "remove" means items requested for delivery. Then the order of commands does not matter that much, because you can just process the request for delivery when new items are added to the inventory.
This would lead to an aggregate root with two collections: Items and PendingOrders.
One process manager adds new inventory to Items - if any orders are pending, it will complete these orders in the same transaction and remove both the item and the order from the collections.
If the other process manager adds an order (tries to remove an item), it will either do it right away, if there are any items left - or it will add the order to the pending orders to be processed when new items arrive (and maybe notify someone about the delay, while we're at it).
This way we end up with the same state regardless of the order of commands, but the actual real-world problem has great influence on the model chosen.
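A rough sketch of that aggregate (hypothetical names; a real implementation would apply events rather than mutate collections directly):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the aggregate root: adds and removes converge to the same
// state regardless of arrival order, because removes that cannot be
// satisfied yet are parked as pending orders.
public class Inventory {

    private final Deque<String> items = new ArrayDeque<>();
    private final Deque<String> pendingOrders = new ArrayDeque<>();

    // First process manager: new inventory arrives.
    public void addItem(String itemId) {
        if (!pendingOrders.isEmpty()) {
            // Complete a waiting order in the same transaction.
            ship(itemId, pendingOrders.removeFirst());
        } else {
            items.addLast(itemId);
        }
    }

    // Second process manager: an order tries to remove an item.
    public void placeOrder(String orderId) {
        if (!items.isEmpty()) {
            ship(items.removeFirst(), orderId);
        } else {
            pendingOrders.addLast(orderId); // processed when items arrive
        }
    }

    private void ship(String itemId, String orderId) {
        // Emit an OrderCompleted event, notify about the delay being over, etc.
    }
}
```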
If we have other real-world problems, we can model those too.
Let's say you have two users who each start a process that bulk-updates titles on inventory items. In this case you - and the users - have to decide how best to resolve this conflict: what will lead to the best real-world outcome.
If you want consistency across all the items - all or no items should be updated by a single bulk update - I would embed this knowledge in a new model. Let's call it UpdateTitlesProcesses. We have only one instance of this model in the system, and its state is shared between processes. This model is effectively a command queue: when a user initiates the bulk operation, it adds all the commands to the queue and starts processing each item one at a time.
When the second user initiates another title update, the business logic in our model will reject it, as another update has already started. Or, if the experts say that the last write should win, we ditch the remaining commands from the first process and add the new ones (and similarly we should decide what should happen if a user issues a single title update, not a bulk one - should it be rejected, prioritized or put on hold?). A sketch of such a model follows.
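A sketch of that single-instance model, with the "reject concurrent updates" policy (all names are made up for illustration):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the single UpdateTitlesProcesses instance: a command queue
// shared by all bulk title updates. Policy: reject a second bulk update
// while one is in progress; for last-write-wins, start() would clear
// the queue instead of throwing.
public class UpdateTitlesProcesses {

    public record UpdateTitleCommand(String itemId, String newTitle) {}

    private final Queue<UpdateTitleCommand> queue = new ArrayDeque<>();
    private boolean running;

    public synchronized void start(Iterable<UpdateTitleCommand> commands) {
        if (running) {
            throw new IllegalStateException("Another bulk title update is already running");
        }
        running = true;
        commands.forEach(queue::add);
    }

    public synchronized void processNext() {
        UpdateTitleCommand next = queue.poll();
        if (next == null) {
            running = false; // bulk update complete
            return;
        }
        // Dispatch the command to the item aggregate, one at a time.
    }
}
```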
So in short I'd say:
Make it clear which real-world problem we are solving - and thus which conflict-resolution outcome is best (probably a trade-off, often also something that requires user interaction or notification).
Model this explicitly (where processes, actions and conflict handling are also part of the model).

Related

In SCORM 2004 (4th ed.) when are Available Children meant to be selected and randomized?

The pseudocode for the Select Children Process [SR.1] and Randomize Children Process [SR.2] heavily suggests these processes are meant to be run multiple times although for SR.1 no behavior is defined when selection is meant to occur onEachNewAttempt.
Since both the Sequencing Request Process [SB.2.12] and the Navigation Request Process [NB.2.1] expect the Available Children to be selected/randomized and the Content Delivery Environment Process [DB.2] only initializes the new attempt after a traversal over the various Available Children has already happened, it seems like the LMS is meant to run both of these processes during initialization of the activity tree itself before attempting to deliver the first activity or handle any requests.
However, this doesn't explain when SR.2 is meant to be re-run. Since DB.2 creates the new attempt progress information by iterating over the activity path from the root to the specified activity, randomizing each activity's Available Children along the way would result in the position of the specified activity within the activity tree changing after selection, which seems unintuitive. Furthermore, if one were to attempt to implement onEachNewAttempt for SR.1, this could also cause the selected activity to vanish from the available activities (though this would explain why its behavior is undefined in SCORM).
My understanding would be that the Available Children are meant to be initialized to the list of all children followed by SR.1 and SR.2 being applied to all activities starting from the root and that SR.2 is then re-applied in DB.2 for every activity in the path despite this changing the order of activities. Is this correct or am I missing something?
Upon re-reading section 4.7 in SN-4-48 it seems that the answer is that the selection and randomization should indeed happen once at the start of the sequencing session (i.e. on initialization) and then again in the End Attempt Process [UP.4] (although for onEachNewAttempt it actually states "prior to the first attempt", which could also be read as referring to the delivery process, DB.2).
What makes this a bit awkward is that UP.4 is applied in many places including immediately prior to delivery (in DB.2), which still means randomization could occur after an activity has already been selected and that randomization could happen multiple times in between a sequencing request and delivery.

Return entity updated by axon command

What is the best way to get the updated representation of an entity after mutating it with a command?
For example, let's say I have a project like digital-restaurant and I want to be able to update a field on the restaurant and return its current state to the client making the update (to retrieve any modifications made by different processes).
When a restaurant is created, it is easy to retrieve the current state (i.e. the projection representation) after dispatching the create command, by subscribing to a FindRestaurantQuery and waiting until a record is returned (see the Restaurant CommandController).
However, it isn't so simple to detect when the result of an UpdateCommand has been applied to the projection. For example, if we use the same trick and subscribe to the FindRestaurantQuery, we will be notified if the restaurant has been modified, but it may not be our command that triggered the modification (in the case where multiple processes are concurrently issuing update commands).
There seem to be two obvious ways to detect when a given update command has been applied to the projection:
1) Have a unique ID associated with every update command:
- subscribe to a query that is updated when the command ID has been applied to the projection;
- propagate the unique ID to the event that is applied by the aggregate;
- when the projection receives the event, it can notify the query listener with the current state.
2) Before dispatching an update command, query the existing state of the projection and calculate the destination state given the contents of the update command.
In the case of (1): is there any situation (e.g. batching / snapshotting) where the event carrying the unique ID may be skipped over somehow, preventing the query listener from being notified?
Is there a more reliable / more idiomatic way to accomplish this use case?
Axon 4 with Spring Boot.
Although fully asynchronous designs may be preferable for a number of reasons, it is a common scenario that back-end teams are forced to provide a synchronous REST API on top of an asynchronous CQRS+ES back-end.
The part of the demo application that is trying to solve this problem is located here https://github.com/idugalic/digital-restaurant/tree/master/drestaurant-apps/drestaurant-monolith-rest
The case you are mentioning is totally valid.
I would go with option 1.
My only concern is that you have to introduce a new unique-ID attribute, associated with every update command, into the domain (events). In my opinion this ID attribute does not have any domain/business value. There is already an audit (who, when) attribute associated with every event, and maybe you can use that to correlate commands and subscriptions. I believe there is more value in that solution (identity is part of the domain), if it is not too relaxed for your case.
Please note that queries would have to be extended with the audit information in this case (you will know who requested the query).
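A hedged sketch of option 1 with Axon 4 subscription queries (the Restaurant, FindRestaurantQuery and UpdateRestaurantCommand types below are simplified stand-ins for the demo application's classes, and carrying lastAppliedCommandId on the projection is an assumption):

```java
import java.time.Duration;

import org.axonframework.commandhandling.gateway.CommandGateway;
import org.axonframework.messaging.responsetypes.ResponseTypes;
import org.axonframework.queryhandling.QueryGateway;
import org.axonframework.queryhandling.SubscriptionQueryResult;

public class RestaurantUpdateService {

    // Simplified stand-ins for the application's own types.
    public record UpdateRestaurantCommand(String restaurantId, String commandId, String newName) {}
    public record FindRestaurantQuery(String restaurantId) {}
    public record Restaurant(String restaurantId, String name, String lastAppliedCommandId) {}

    private final CommandGateway commandGateway;
    private final QueryGateway queryGateway;

    public RestaurantUpdateService(CommandGateway commandGateway, QueryGateway queryGateway) {
        this.commandGateway = commandGateway;
        this.queryGateway = queryGateway;
    }

    public Restaurant update(UpdateRestaurantCommand command) {
        // Subscribe before sending the command so no update can be missed.
        SubscriptionQueryResult<Restaurant, Restaurant> result =
                queryGateway.subscriptionQuery(
                        new FindRestaurantQuery(command.restaurantId()),
                        ResponseTypes.instanceOf(Restaurant.class),
                        ResponseTypes.instanceOf(Restaurant.class));
        try {
            commandGateway.sendAndWait(command);
            // The projection's event handler emits updates via
            // QueryUpdateEmitter; we wait for the one carrying our ID.
            return result.updates()
                    .filter(r -> command.commandId().equals(r.lastAppliedCommandId()))
                    .blockFirst(Duration.ofSeconds(5));
        } finally {
            result.cancel();
        }
    }
}
```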

fetch 1 message at given offset when debugging app using spring-kafka

I understand that random access is inefficient. But the app failed, the record content is (let's assume) too big for logging or otherwise inappropriate for it (that's true even without the assumption), so all I have is the information that the record at this offset failed and why. Now I want to see the data, say, to be able to reproduce the failure. How do I do that?
OK, I can use a ConsumerSeekAware consumer, but that will rewind the position and process all records from that position on. I don't want that; I want just one specific message. I could use a dedicated consumer in a dedicated consumer group for this use case, so as not to influence others, and set ConsumerConfig.MAX_POLL_RECORDS_CONFIG to 1 so that each poll returns just one record, but that will not stop all the following records from reaching the listener, since there is no way to call poll manually/programmatically. Right? Or is there such a way? Or another way to achieve this? Even if I try to reach into the spring-kafka internals, the org.apache.kafka.clients.consumer.Consumer seems to be made inaccessible on purpose, or at least I do not see a way.
Yes, you can just create your own consumer manually and poll it.
Get a reference to the consumer factory and call `createConsumer("tempGroup", "tempClient")`.
You would need to create a second consumer factory with max.poll.records=1.
You can copy the other properties from the main factory by calling getConfigurationProperties(), creating a new map from it, and creating a new DefaultKafkaConsumerFactory from that map.
Close the consumer when you are done.
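Putting that together, a sketch (assuming String keys/values and an injectable main ConsumerFactory; the group/client names are illustrative):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

// Sketch: fetch exactly one record at a known topic/partition/offset with
// a throwaway consumer in its own group, leaving the main listeners and
// their committed offsets untouched.
public class SingleRecordFetcher {

    private final ConsumerFactory<String, String> mainFactory;

    public SingleRecordFetcher(ConsumerFactory<String, String> mainFactory) {
        this.mainFactory = mainFactory;
    }

    public ConsumerRecord<String, String> fetch(String topic, int partition, long offset) {
        // Copy the main factory's properties and cap the poll size at 1.
        Map<String, Object> props = new HashMap<>(mainFactory.getConfigurationProperties());
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);
        ConsumerFactory<String, String> tempFactory = new DefaultKafkaConsumerFactory<>(props);

        try (Consumer<String, String> consumer =
                     tempFactory.createConsumer("tempGroup", "tempClient")) {
            TopicPartition tp = new TopicPartition(topic, partition);
            consumer.assign(Collections.singletonList(tp)); // manual assignment, no rebalance
            consumer.seek(tp, offset);                      // jump straight to the offset
            Iterator<ConsumerRecord<String, String>> it =
                    consumer.poll(Duration.ofSeconds(5)).iterator();
            return it.hasNext() ? it.next() : null;
        }
    }
}
```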

Ngrx complex state reducer

I struggle to find the right way to update my state in an ngrx application, as the state is rather complex and depends on many factors. This question is not about getting one piece of code right, but about how to design such software in general - what the do's and don'ts are when you find yourself reaching for hacky solutions and workarounds.
The app 'evolved' over time and I want to share this process in an abstracted way to make my point clear:
Stage 1
The state contains entities. Those represent nodes in a tree and are linked by IDs. Modifying or adding an entity requires a check of the type of nodes the new/modified ones should be connected with. Also, modifying a node might require other nodes to be updated.
The solution was: create functions that do the job and call them right in the reducer, so everything is always up to date and synchronous when used (there are services that might modify state).
Stage 2
A configuration was added to the state, which has an impact on the way the automatically modified nodes are modified/created. This configuration is saved in its own state right under the root state.
The solution:
1) Modify the actions to also take the required data from the configuration.
2) Modify the places where the actions are created/dispatched, adding an ugly wrapper around the dispatch calls:

```typescript
this.state.select(fromRoot.getX)
  .first()
  .subscribe(element => {
    this.state.dispatch(new Action({ ...old_payload, newPayload: element }));
  });
```

3) Modify the functions doing the node modification.
4) Add the argument passing to the function calls inside the reducer.
Stage 3
Now I am asked to add yet another configuration to the process, also retrieved from the backend and also saved in another state right under the root state.
State now looks like:
root
|__nodes
|__config_1
|__config_2
I was just about to repeat the steps from stage 2, but the actions get really big with all the data passed in, and the functions have to carry around a lot of data. This seems wrong, given that I actually dispatch the action on the state that contains all the needed info.
How can I handle this correctly?
Some ideas I already had:
Use effects: they are able to get everything they need from the state and can create everything, so I only need to dispatch an action with just the action's payload; the effect can then grab everything it needs from the state. I don't like this idea because it triggers asynchronous tasks to modify the state and adds non-state-changing actions.
Use a service: with a service holding state it would be much like effects, but without using actions just to create asynchronous calls which then dispatch the actions that really change state.
Do all the stuff in the component: at the moment the components are kept pretty simple when it comes to changing state, as I prefer the idea that actions carry as little data as possible, since reducers can access the state to get their data - but this is where the problem occurs: this time I can't get hold of the data I need.

Is context.executeQueryAsync a transactional operation?

Let's say I update multiple items in a loop and then call executeQueryAsync() on the ClientContext class, and this call returns an error (the failure callback is invoked). Can I be sure that not a single one of all the items I wanted to update was updated? Is there a chance that some of them will be updated and some will not? In other words, is this operation transactional? Thank you; I cannot find a single post about this.
I am asking about the CSOM model, not server-side solutions.
SharePoint handles its internal updates transactionally: updating a document actually results in multiple calls to the DB, and if one of them fails, the other changes are rolled back so that nothing is left half updated on a failure.
However, that is not made available to us as external developers. If you create an update that updates 9 items within your executeQueryAsync call and it fails on #7, the first 6 will not be rolled back. You are going to have to write code to handle the failures, and if rolling back is important, you will have to roll back the changes manually within your code.
