Workflow services and operations not supported by states - workflow-foundation-4

I have two workflow services (state machines) that should cooperate and exchange information to accomplish the desired behavior.
The problem I have (I also had it with only one state machine) is that I sometimes invoke an operation that the current state does not allow.
This causes two problems: 1) I have to wait for the operation timeout to find out that the operation was not allowed; 2) I'm "masking" real timeouts caused by other problems.
So far I have found two possible solutions: 1) change the signatures to return true (allowed) or false (not allowed) and add all operations to all states, with disallowed operations triggering a self-transition; 2) add all transitions to all states (disallowed ones again trigger a self-transition), but reply with an exception for the disallowed ones.
I would like to know which is the best option (and, of course, I'd appreciate other possible solutions too).
I would also like to know how I could reply to a request with an exception (maybe throwing it within a try/catch?).
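The two options can be compared with a small state machine. This is a language-neutral sketch in Java, not WF4 code; in WF4 the same idea would be built from Receive/SendReply activities in each state. The states, operations and exception type below are hypothetical:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Hypothetical states and operations, for illustration only.
enum OrderState { CREATED, APPROVED, SHIPPED, CANCELLED }

enum Operation { APPROVE, SHIP, CANCEL }

// Stands in for the fault/exception that option 2 sends back as the reply.
class OperationNotAllowedException extends RuntimeException {
    OperationNotAllowedException(Operation op, OrderState state) {
        super(op + " is not allowed in state " + state);
    }
}

class OrderStateMachine {
    // Which operations actually change the state (hypothetical rules).
    private static final Map<OrderState, EnumSet<Operation>> ALLOWED = new EnumMap<>(OrderState.class);

    static {
        ALLOWED.put(OrderState.CREATED, EnumSet.of(Operation.APPROVE, Operation.CANCEL));
        ALLOWED.put(OrderState.APPROVED, EnumSet.of(Operation.SHIP, Operation.CANCEL));
        ALLOWED.put(OrderState.SHIPPED, EnumSet.noneOf(Operation.class));
        ALLOWED.put(OrderState.CANCELLED, EnumSet.noneOf(Operation.class));
    }

    private OrderState state = OrderState.CREATED;

    OrderState state() {
        return state;
    }

    // Option 1: every operation is accepted in every state and answers at once.
    // A disallowed operation is a self-transition (state unchanged) that returns false.
    boolean tryApply(Operation op) {
        if (!ALLOWED.get(state).contains(op)) {
            return false;
        }
        state = next(op);
        return true;
    }

    // Option 2: the same self-transition, but the caller gets an exception instead of false.
    void apply(Operation op) {
        if (!tryApply(op)) {
            throw new OperationNotAllowedException(op, state);
        }
    }

    private static OrderState next(Operation op) {
        switch (op) {
            case APPROVE:
                return OrderState.APPROVED;
            case SHIP:
                return OrderState.SHIPPED;
            case CANCEL:
                return OrderState.CANCELLED;
            default:
                throw new IllegalArgumentException("unknown operation " + op);
        }
    }
}
```

With either option the caller gets an answer immediately instead of waiting for the operation timeout, so a timeout once again signals a genuine problem. Option 1 keeps faults out of the contract. Option 2 keeps the normal signatures but requires every caller to handle the exception.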

Another option here is to use the information in the workflow persistence store. One of the columns contains the active bookmarks, and in the case of a Receive activity the bookmark is the SOAP operation. You could have a separate service that exposes that information for a given workflow instance.
You still need to cater for the fact that you might send messages to a workflow that is in a different state. This happens because the workflow persistence store isn't updated right away (unless you make it so), and because multiple people might send messages to the same workflow instance. Still, this basic technique works really well, and I have used it to enable/disable buttons on the UI based on the state of a workflow.


Axon Event Processing Timeout

I am using an Axon Event Tracking processor. Sometimes events take longer than 10 seconds to process.
This seems to cause the event to be processed again, and the following appears in the log: "Releasing claim of token X/0 failed. It was owned by another node."
If I increase the number of segments, this is no longer logged, BUT the event is still processed twice, so I think the message might be misleading. (I think I was mistaken about this.)
I have tried adjusting the fetchDelay, cleanupDelay and tokenClaimInterval, but none of these fixed it. Is there a property or something that I am missing?
The scenario that takes longer than 10 seconds is making an HTTP request to an external service.
I'm using Axon 4.1.2 with the default configuration from Spring auto-configuration. I cannot see the "Releasing claim on token and preparing for retry in [timeout]s" message in the log.
I was having this issue with a single segment and 2 instances of the application. I realised I hadn't increased the number of segments as I thought I had.
After further investigation, I have discovered that adding a segment seems to have stopped this. Even with, for example, 2 segments and 6 application instances it doesn't reappear. However, how is this different from my original scenario of 1 segment and 2 application instances?
I didn't realise it would be possible for multiple threads to grab the same tracking token and process the same event. It sounds like the best action would be to put an idempotency check before the HTTP call?
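An idempotency check could look like the following minimal sketch. It makes several assumptions:

- Each event carries a unique ID.
- The processed IDs are kept in an in-memory set. That only protects a single JVM; with several application instances, the IDs would have to live in shared storage such as a table with a unique key.
- The class and method names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical idempotent wrapper around the slow external call.
class IdempotentHandler {
    private final Set<String> processedEventIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> externalCall; // stands in for the HTTP request

    IdempotentHandler(Consumer<String> externalCall) {
        this.externalCall = externalCall;
    }

    // Returns true if the external call was made, false if the event was a duplicate.
    boolean handle(String eventId, String payload) {
        // Set.add is atomic here: only the first caller for a given ID gets true.
        if (!processedEventIds.add(eventId)) {
            return false;
        }
        try {
            externalCall.accept(payload);
            return true;
        } catch (RuntimeException e) {
            processedEventIds.remove(eventId); // allow a later retry if the call failed
            throw e;
        }
    }
}
```

Marking the ID before the call has a cost: a crash between marking and calling loses that call. Marking after the call reopens the duplicate window instead. Which trade-off is acceptable depends on the external service.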
The "Releasing claim of token [event-processor-name]/[segment-id] failed. It was owned by another node." message can only occur in three scenarios:
You are performing a merge operation of two segments which fails because the given thread doesn't own both segments.
The main event processing loop of the TrackingEventProcessor is stopped, but releasing the token claim fails because the token is already claimed by another thread.
The main event processing loop has caught an Exception, making it retry with an exponential back-off, and it tries to release the claim (which might fail with the given message).
I am guessing it's neither scenario 1 nor 2, which leaves scenario 3. That should also mean you are seeing other WARN-level messages, like:
Releasing claim on token and preparing for retry in [timeout]s
Would you be able to share whether that's the case? That way we can pinpoint a little better what the exact problem is you are encountering.
By the way, very likely you have several processes (event handling threads of the TrackingEventProcessor) stealing the TrackingToken from one another. As they're stealing a token that hasn't been updated yet, both (or more) will handle the same event. That's why you see the event handler being invoked twice.
This is obviously undesirable behavior and something we should resolve for you. I would like to ask you to answer my comments under the question, as right now I have too little to go on. Let us figure this out #Dan!
Thanks for updating your question #dan, that's very helpful.
From what you've shared, I am fairly confident that both instances are stealing the token from one another. This does depend though on whether both are using the same database for the token_entry table (although I am assuming they are).
If they are using the same table, then they should "nicely" share their work, unless one of them takes too long. If it takes too long, the token will be claimed by another process. In this case, that other process is the TEP thread of your other application instance. The "claim timeout" defaults to 10 seconds, which matches the duration of your long-running event handling.
This claimTimeout is adjustable, though. Use the Builder of the JpaTokenStore/JdbcTokenStore (depending on which one you are using / auto-wiring) and call the JpaTokenStore.Builder#claimTimeout(TemporalAmount) method. I think this would be required on your end, given that you have a long-running operation.
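A simplified model of a claimable token shows why a slow handler leads to duplicate processing. This is not Axon's implementation. It only illustrates the rule described above: another node may take over a claim that has not been refreshed within claimTimeout. The class name and timings are hypothetical:

```java
import java.time.Duration;
import java.time.Instant;

// Simplified model of a token claim (not Axon's code): a node may take over
// the claim once the current owner has not refreshed it for longer than
// claimTimeout. Re-claiming by the owner counts as a refresh.
class ClaimableToken {
    private final Duration claimTimeout;
    private String owner;
    private Instant lastRefreshed;

    ClaimableToken(Duration claimTimeout) {
        this.claimTimeout = claimTimeout;
    }

    synchronized boolean tryClaim(String node, Instant now) {
        boolean available = owner == null
                || owner.equals(node)
                || Duration.between(lastRefreshed, now).compareTo(claimTimeout) > 0;
        if (available) {
            owner = node;
            lastRefreshed = now;
        }
        return available;
    }

    synchronized String owner() {
        return owner;
    }
}
```

For example, suppose node A claims the token and then spends 12 seconds in the HTTP call without refreshing its claim. With a 10-second timeout, a tryClaim by node B succeeds, so node B handles the same event from the not-yet-updated token. With a 30-second timeout, the takeover fails.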
There are of course different ways of tackling this. For example, you could make sure the TEP only runs on a single instance (not really fault tolerant, though), or offload the long-running operation to a scheduled task that is triggered by the event.
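Offloading the slow work might look like this minimal sketch using a plain ScheduledExecutorService. The handler returns as soon as the work is scheduled, so the event processor can update its token well within the claim timeout. The class name is hypothetical, and the Consumer stands in for the HTTP request:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// The event handler only schedules the slow call and returns immediately.
class OffloadingHandler {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Consumer<String> slowOperation; // stands in for the HTTP request

    OffloadingHandler(Consumer<String> slowOperation) {
        this.slowOperation = slowOperation;
    }

    // Called from the event handler: schedule the work, do not wait for it.
    void on(String eventId, long delayMillis) {
        scheduler.schedule(() -> slowOperation.accept(eventId), delayMillis, TimeUnit.MILLISECONDS);
    }

    // Stops accepting work and waits for already scheduled tasks to finish.
    boolean shutdownAndWait(long timeoutMillis) {
        scheduler.shutdown();
        try {
            return scheduler.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The trade-off is that the scheduled work is no longer covered by the processor's token. If the application stops before the task runs, the work is lost unless it is persisted somewhere first.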
But I think we've found the issue at least, so I'd suggest tweaking the claimTimeout and seeing whether the problem persists.
Let us know if this resolves the problem on your end #dan!

Event-sourcing: when (and not) should I use Message Queue?

I am building a project from scratch using event-sourcing with Java and Cassandra.
My apps will be based on microservices, and in some use cases information will be processed asynchronously. What part would a message queue (such as RabbitMQ, ActiveMQ Artemis, Kafka, etc.) play in improving the technology stack in this environment? I also want to check whether I understand the scenarios in which I wouldn't use one.
I would start with separating messaging infrastructure like RabbitMQ from event streaming/storing/processing like Kafka. These are two different things made for two (or more) different purposes.
Concerning the event sourcing, you have to have a place where you must store events. This storage must be append-only and support fast reads of unstructured data based on an identity. One example of such persistence is the EventStore.
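A minimal in-memory sketch makes these storage requirements concrete: the store is append-only and is read by identity. It includes an expected-version check (optimistic concurrency), which is a common addition that the answer does not mention. All names are hypothetical, and a real store such as EventStore or a Cassandra table would persist the streams:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Thrown when a writer appended to a stream based on an outdated version.
class ConcurrencyException extends RuntimeException {
    ConcurrencyException(String message) {
        super(message);
    }
}

// Minimal in-memory append-only event store: events are only ever appended,
// never updated or deleted, and are read back per stream (aggregate) id.
class InMemoryEventStore {
    private final Map<String, List<String>> streams = new HashMap<>();

    // Appends only if the stream still has expectedVersion events (optimistic concurrency).
    synchronized void append(String streamId, int expectedVersion, List<String> events) {
        List<String> stream = streams.computeIfAbsent(streamId, id -> new ArrayList<>());
        if (stream.size() != expectedVersion) {
            throw new ConcurrencyException(
                    "stream " + streamId + " is at version " + stream.size() + ", expected " + expectedVersion);
        }
        stream.addAll(events);
    }

    // Full history of one stream, oldest event first.
    synchronized List<String> read(String streamId) {
        List<String> stream = streams.getOrDefault(streamId, Collections.emptyList());
        return Collections.unmodifiableList(new ArrayList<>(stream));
    }
}
```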
Event sourcing goes together with CQRS, which means you have to project your changes (events) to another store, which you can query. This is done by projecting events to that store; this is where events get processed to change the domain object state. It is important to understand that using message infrastructure for projections is generally a bad idea. This is due to the nature of messaging and the two-phase commit issue.
If you look at how events get persisted, you can see that they get saved to the store as one transaction. If you then need to publish events, this will be another transaction. Since you are dealing with two different pieces of infrastructure, things can get broken.
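One way to avoid the second transaction is to let the projection read from the event store's own ordered log and remember how far it got (a checkpoint). This is similar in spirit to a catch-up subscription. The following is a simplified single-process sketch with hypothetical names; in a real system the checkpoint and the read-model update would be committed together:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DepositedEvent {
    final String accountId;
    final long amount;

    DepositedEvent(String accountId, long amount) {
        this.accountId = accountId;
        this.amount = amount;
    }
}

// The store's own ordered log of all committed events (all streams).
class EventLog {
    private final List<DepositedEvent> entries = new ArrayList<>();

    synchronized void append(DepositedEvent event) {
        entries.add(event);
    }

    // Events at positions >= from, in commit order.
    synchronized List<DepositedEvent> readFrom(int from) {
        int start = Math.min(from, entries.size());
        return new ArrayList<>(entries.subList(start, entries.size()));
    }
}

// Read model (balance per account) that pulls from the log and remembers
// its position, so re-running catchUp never applies an event twice.
class BalanceProjection {
    private final Map<String, Long> balances = new HashMap<>();
    private int checkpoint = 0; // next log position to process

    void catchUp(EventLog log) {
        for (DepositedEvent e : log.readFrom(checkpoint)) {
            balances.merge(e.accountId, e.amount, Long::sum);
            checkpoint++;
        }
    }

    long balance(String accountId) {
        return balances.getOrDefault(accountId, 0L);
    }

    int checkpoint() {
        return checkpoint;
    }
}
```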
The messaging issue as such is that messages are usually guaranteed to be delivered "at least once" and the order of messages is usually not guaranteed. Also, when your message consumer fails and NACKs the message, it will be redelivered but usually a bit later, again breaking the sequence.
The ordering and duplication concerns, however, do not apply in the same way to event streaming servers like Kafka, which preserve order within a partition. Also, EventStore will deliver each event only once and in order if you use a catch-up subscription.
In my experience, messages are used to send commands and to implement event-driven architecture that connects independent services in a reactive way. Event stores, on the other hand, are used to persist events. Only events that get there are then projected to the query store and also published to the message bus.
Make sure you are clear on the distinction between send(command) and publish(event). Udi Dahan touches on that topic in his essay on busses and brokers.
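A toy in-process bus shows the difference. A command type has exactly one handler, and sending one with no handler is an error. An event may have any number of subscribers, including none. This is a hypothetical illustration, not the API of any particular bus:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy bus: send(command) goes to exactly one owner; publish(event) fans out.
class SimpleBus {
    private final Map<Class<?>, Consumer<Object>> commandHandlers = new HashMap<>();
    private final Map<Class<?>, List<Consumer<Object>>> subscribers = new HashMap<>();

    // A command type may have only one handler.
    void handle(Class<?> commandType, Consumer<Object> handler) {
        if (commandHandlers.putIfAbsent(commandType, handler) != null) {
            throw new IllegalStateException("a handler for " + commandType.getSimpleName() + " already exists");
        }
    }

    // An event type may have any number of subscribers.
    void subscribe(Class<?> eventType, Consumer<Object> subscriber) {
        subscribers.computeIfAbsent(eventType, t -> new ArrayList<>()).add(subscriber);
    }

    void send(Object command) {
        Consumer<Object> handler = commandHandlers.get(command.getClass());
        if (handler == null) {
            throw new IllegalStateException("no handler for " + command.getClass().getSimpleName());
        }
        handler.accept(command);
    }

    // Returns how many subscribers were notified (possibly zero).
    int publish(Object event) {
        List<Consumer<Object>> subs = subscribers.getOrDefault(event.getClass(), new ArrayList<>());
        subs.forEach(s -> s.accept(event));
        return subs.size();
    }
}
```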
In most cases where you are event sourcing, you do not want to be reconstructing state from published events. If you need state, then query the technical authority/book of record for the history, and reconstruct the state from the history.
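Reconstructing state from history amounts to a left fold over the events, oldest first. Here is a minimal sketch with hypothetical account events:

```java
import java.util.List;

// Hypothetical events for a bank account.
abstract class AccountEvent {}

class Opened extends AccountEvent {}

class Deposited extends AccountEvent {
    final long amount;
    Deposited(long amount) { this.amount = amount; }
}

class Withdrawn extends AccountEvent {
    final long amount;
    Withdrawn(long amount) { this.amount = amount; }
}

final class AccountState {
    static final AccountState EMPTY = new AccountState(false, 0);

    final boolean open;
    final long balance;

    AccountState(boolean open, long balance) {
        this.open = open;
        this.balance = balance;
    }

    // Pure transition: applies one event to the current state.
    AccountState apply(AccountEvent e) {
        if (e instanceof Opened) return new AccountState(true, balance);
        if (e instanceof Deposited) return new AccountState(open, balance + ((Deposited) e).amount);
        if (e instanceof Withdrawn) return new AccountState(open, balance - ((Withdrawn) e).amount);
        return this; // unknown events are ignored
    }

    // Rebuild state by folding the whole history, oldest event first.
    static AccountState replay(List<? extends AccountEvent> history) {
        AccountState s = EMPTY;
        for (AccountEvent e : history) {
            s = s.apply(e);
        }
        return s;
    }
}
```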
On the other hand, event driven activity off of a message queue should be fine. When a single event (plus the subscriber's state) has everything you need, then running off of the bus is fine.
In some cases, you might do both. For example, if you were updating cached views, you'd subscribe to various BobChanged events to know when your cached data was stale; to rebuild a stale view, you would reload a representation of the history and transform it into an updated view.
In the world of event-sourcing applications, message queues usually allow you to implement a publish-subscribe style of communication between producers and consumers. They also usually help you with delivery guarantees: which messages were delivered to which subscribers and which ones were not.
But they don't store all messages indefinitely. You need to have an event store to do any kind of event sourcing.
The question is not 'to queue or not to queue', but it is more like:
can this thing store huge volume of events indefinitely?
does it have publish-subscribe capabilities?
does it provide at-least-once delivery guarantees?
So, you should use something like Kafka or EventStore to have all that out-of-the-box. Alternatively, you can combine event store with message queue manually, but this is going to be more involved.

Event Driven Architecture - Service Contract Design

I'm having difficulty conceptualising a requirement I have into something that will fit into our nascent SOA/EDA
We have a component I'll call the Data Downloader. This is a facade for an external data provider that has both high latency and a cost associated with every request. I want to take this component and create a re-usable service out of it with a clear contract definition. It is up to me to decide how that contract should work, however its responsibilities are two-fold:
Maintain the parameter list (called a Download Definition) for an upcoming scheduled download
Manage the technical details of the communication to the external service
Basically, it manages the 'how' of the communication. The 'what' and the 'when' are the responsibilities of two other components:
The 'what' is managed by 'Clients', who are responsible for determining the parameters for the download.
The 'when' is managed by a dedicated scheduling component. Because of the cost associated with the downloads we'd like to batch the requests intraday.
Hopefully this sequence diagram explains the responsibilities of the services:
Because the responsibilities are split across three different components, we get all sorts of potential race conditions with asynchronous messaging. For instance, when the Scheduler tells the Downloader to do its work, there is no guarantee that the pending requests from Client A have actually been serviced, because the 'Append to Download Definition' command is asynchronous. But this all screams high coupling to me; why should the Scheduler need to know about any 'prerequisite' client requests that must be actioned before it can invoke a download?
Some potential solutions we've toyed with:
Make the 'Append to Download Definition' command a blocking request/response operation. But this then breaks the performance and scalability benefits of having an EDA.
Build something in the Downloader to ensure that it only runs when there are no pending commands in its incoming request queue. But that then introduces a dependency on the underlying messaging infrastructure which I don't like either.
It makes me think I'm approaching this problem in a completely backward way. Or is this just a classic case of someone trying to fit a synchronous RPC requirement into an async event-driven architecture?
The thing I like most about EDA and SOA is that they almost completely eliminate the notion of a race condition. As long as your events are associated with some correlation key (e.g. downloadId), the problem you describe can be addressed with several solutions of different complexities, depending on your needs. I'm not sure I totally understand the described use case, but I will try my best.
Off the top of my head:
DataDownloader maintains a list of received Download Definitions and a list of triggered downloads. When a definition is received, it is checked against the triggers list to see whether the associated download has already been triggered; if it has, the download is executed. When a TriggerDownloadCommand is received, the definitions list is checked for a definition with the associated downloadId; if one is found, the download is executed, otherwise the trigger is recorded.
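The first suggestion can be sketched as follows: whichever of the two messages arrives second starts the download, so their arrival order no longer matters. This is a single-process sketch with hypothetical names, and the BiConsumer stands in for the call to the external provider. For the distributed case, the Saga pattern plays the same role:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

// Correlates definitions and triggers by downloadId, in either arrival order.
class DataDownloader {
    private final Map<String, String> definitions = new HashMap<>(); // downloadId -> parameters
    private final Set<String> triggered = new HashSet<>();           // triggers still waiting for a definition
    private final BiConsumer<String, String> download;               // stands in for the external call

    DataDownloader(BiConsumer<String, String> download) {
        this.download = download;
    }

    synchronized void onDownloadDefinition(String downloadId, String parameters) {
        definitions.put(downloadId, parameters);
        if (triggered.remove(downloadId)) {
            runDownload(downloadId); // the trigger arrived first
        }
    }

    synchronized void onTriggerDownload(String downloadId) {
        if (definitions.containsKey(downloadId)) {
            runDownload(downloadId); // the definition arrived first
        } else {
            triggered.add(downloadId); // wait for the definition
        }
    }

    private void runDownload(String downloadId) {
        download.accept(downloadId, definitions.remove(downloadId));
    }
}
```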
For more complex situation, consider using the Saga pattern, which is implemented by some 3rd party messaging infrastructures. With some simple configuration, it will handle both messages, and initiate the actual download when the required condition is satisfied. This is more appropriate for distributed systems, where an in-memory collection is out of the question.
You can also configure your scheduler (or the trigger command handler) to retry when an error is signaled (e.g. by an exception), in order to avoid that race condition, and ultimately give up after a specified timeout.
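A retry loop with an overall deadline might look like this minimal sketch. It assumes the trigger handler signals "definition not received yet" by throwing a RuntimeException. The class name and the pause and timeout values are hypothetical:

```java
import java.time.Duration;
import java.util.function.Supplier;

// Retries an operation that signals "not ready yet" by throwing, pausing
// between attempts. Once there is no time left for another attempt before
// the overall deadline, it gives up by rethrowing the last error.
class RetryUntilTimeout {
    static <T> T call(Supplier<T> operation, Duration timeout, Duration pause) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                if (deadline - System.nanoTime() < pause.toNanos()) {
                    throw e; // no time left for another attempt
                }
                sleep(pause);
            }
        }
    }

    private static void sleep(Duration pause) {
        try {
            Thread.sleep(pause.toMillis());
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```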
Does this help?

NServiceBus, when are too many messages used?

When considering a service in NServiceBus at what point do you start questioning how many messages handled by a service is too much and start to break these into a new service?
Consider the following: I have a sales service which can currently be broken into a few distinct business components, these are sales order validation, sales order processing, purchase order validation and purchase order processing.
There are currently about 20 message handlers and 2 sagas used within this service. My concern is that during high-volume traffic from my website, the number of queued messages can initially spike into the hundreds. Since the messages need to be processed in the order they are taken off the queue, this can delay the last messages in the queue (depending on what processing each message does).
Separating concerns within a service into smaller business components makes things a little easier for me. Sure, it's only a logical separation, but it provides a layer of clarity and understanding. It seems an easier option than creating new services, since in the end, the more services I have, the more maintenance I need to do.
Does anyone have any similar concerns to this?
I think you have actually answered your own question :)
As soon as the message volume reaches a point where the lag becomes an issue, you could look at running additional instances of your endpoint. You do not necessarily need to reduce the number of handlers. You could simply install the service several times and have specific message types sent to the relevant endpoint by mapping.
So it becomes a matter of a simple instance installation and some configuration changes. You can then split messages at send time, either by source (so that messages from a particular source, perhaps high-priority ones, end up on a particular endpoint) or by message type.
I happened to do the same thing on a previous project (not using NServiceBus, though), where we needed document conversion messages coming from the UI to be processed ASAP. We simply installed the conversion service again with its own set of queues and changed the UI configuration to send the conversion messages to the new endpoint. The background conversion messages were still going to the previous endpoint. So here the source determined the separation.
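The routing decision described above can be sketched as a simple lookup: route by message type, with an optional override by source. NServiceBus configures this through its own endpoint mappings, so this Java class only illustrates the rule. The endpoint and message names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical routing table: the most specific rule wins
// (source + type, then type, then the default endpoint).
class MessageRouter {
    private final Map<String, String> byType = new HashMap<>();
    private final Map<String, String> bySourceAndType = new HashMap<>();
    private final String defaultEndpoint;

    MessageRouter(String defaultEndpoint) {
        this.defaultEndpoint = defaultEndpoint;
    }

    MessageRouter mapType(String messageType, String endpoint) {
        byType.put(messageType, endpoint);
        return this;
    }

    MessageRouter mapSource(String source, String messageType, String endpoint) {
        bySourceAndType.put(source + "|" + messageType, endpoint);
        return this;
    }

    String route(String source, String messageType) {
        String specific = bySourceAndType.get(source + "|" + messageType);
        if (specific != null) {
            return specific;
        }
        return byType.getOrDefault(messageType, defaultEndpoint);
    }
}
```

In the document-conversion example, UI-originated conversion messages would map to the new endpoint while all other conversion messages keep the type mapping to the original one.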

Does the concept of shared sessions exist in ASP.NET?

I am working on a web application (ASP.NET) game that would consist of a single page, and on that page, there would be a game board akin to Monopoly. I am trying to determine what the best architectural approach would be. The main requirements I have identified thus far are:
Up to six users share a single game state object.
The users need to keep (relatively) up to date on the current state of the game, i.e. whose turn it is, what did the active user just roll, how much money does each other user have, etc.
I have thought about keeping the game state in a database, but it seems like overkill to keep updating the database when a game state object (say, in a cache) could be kept up to date. For example, the flow might go like this:
Receive request for data from a user.
Look up data in database. Create object from that data.
Verify user has permissions to perform request based on the game's state (i.e. make sure it's really their turn or have enough money to buy that property).
Update the game object.
Write the game object back to the database.
Repeat for every single request.
Consider that a single server would be serving several concurrent games.
I have thought about using AJAX to make requests to an ASP.NET page.
I have thought about using AJAX requests to a web service using Silverlight.
I have thought about using WCF duplex channels in Silverlight.
I can't figure out what the best approach is. All seem to have their drawbacks. Does anyone out there have experience with this sort of thing and care to share those experiences? Feel free to ask your own questions if I am being too ambiguous! Thanks.
Update: Does anyone have any suggestions for how to implement this connection to the server based on the three options I mention above?
You could use the ASP.Net Cache or the Application state to store the game object since these are shared between users. The cache would probably be the best place since objects can be removed from it to save memory.
If you store the game object in the cache under a unique key, you can then store the key in each visitor's Session and use it to retrieve the shared game object. If the cache has been cleared, you recreate the object from the database.
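Here is a minimal sketch of this approach, written in Java rather than C#, with three stand-ins:

- A ConcurrentHashMap stands in for the ASP.NET Cache.
- A plain map stands in for each player's Session.
- The loader function stands in for the database query.

All names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Stand-in for the game state shared by up to six players.
class GameState {
    final String gameId;

    GameState(String gameId) {
        this.gameId = gameId;
    }
}

// Application-wide cache: each player's session stores only the game key,
// and every player with the same key gets the same shared object.
class GameCache {
    private final Map<String, GameState> cache = new ConcurrentHashMap<>();
    private final Function<String, GameState> loadFromDatabase;

    GameCache(Function<String, GameState> loadFromDatabase) {
        this.loadFromDatabase = loadFromDatabase;
    }

    // Returns the cached game, recreating it from the database if it was evicted.
    GameState get(String gameKey) {
        return cache.computeIfAbsent(gameKey, loadFromDatabase);
    }

    void evict(String gameKey) {
        cache.remove(gameKey);
    }
}
```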
While updating a database seems like overkill, it has advantages when it comes time to scale up, as you can have multiple webheads talking to one backend.
A larger concern is how you communicate the game state to the clients. While a full update of the game state from time to time ensures that any changes are caught and all clients remain in synchronization even if they miss a message, gamestate is often quite large.
Consider as well that you usually want gamestate messages to trigger animations or other display updates to portray the action (for example, if a piece moves, it shouldn't just appear at the destination in most cases... it should move across the board).
Because of that, one solution that combines the best of both worlds is to keep a database table that records all of the actions performed, with sequential IDs. When a client requests an update, it gives the ID of the last action it knew about. The server returns all actions after that one, and the client can "act out" the moves. This means that even if a request fails, the client can simply retry it and none of the actions will be lost.
The server can then maintain an internal view of the gamestate as well, from the same data. It can also reject illegal actions and prevent them from entering the game action table (and thus prevent other clients from being incorrectly updated).
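The action-table idea can be sketched as follows. The turn rule (each accepted action ends the player's turn) is a deliberately simplified, hypothetical stand-in for the real game rules. The in-memory list stands in for the database table:

```java
import java.util.ArrayList;
import java.util.List;

// One accepted move; ids are sequential, starting at 1.
class GameAction {
    final long id;
    final String player;
    final String description;

    GameAction(long id, String player, String description) {
        this.id = id;
        this.player = player;
        this.description = description;
    }
}

// Server-side log of accepted actions. Clients poll with the id of the last
// action they have seen and "act out" everything newer.
class GameActionLog {
    private final List<GameAction> actions = new ArrayList<>();
    private final List<String> turnOrder;
    private int currentPlayer = 0;

    GameActionLog(List<String> turnOrder) {
        this.turnOrder = new ArrayList<>(turnOrder);
    }

    // Illegal actions (here: acting out of turn) never enter the log,
    // so other clients are never updated with them.
    synchronized boolean submit(String player, String description) {
        if (!turnOrder.get(currentPlayer).equals(player)) {
            return false;
        }
        actions.add(new GameAction(actions.size() + 1, player, description));
        currentPlayer = (currentPlayer + 1) % turnOrder.size();
        return true;
    }

    // Every action after lastSeenId; a failed poll can simply be repeated.
    synchronized List<GameAction> since(long lastSeenId) {
        List<GameAction> result = new ArrayList<>();
        for (GameAction a : actions) {
            if (a.id > lastSeenId) {
                result.add(a);
            }
        }
        return result;
    }
}
```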
Finally, because the server does have the "one true" gamestate, the clients can periodically check against that (which will allow you to find errors in your client or server code). Because the server database should be considered the primary, you can retransmit the entire gamestate to any client that gets incorrect state, so minor client errors won't (potentially) ruin the experience (except perhaps a pause while the state is downloaded).
Why don't you just create an application level object to store your details. See Application State and Global Variables in ASP.NET for details. You can use the sessionID to act as a key for the data for each player.
You could also use the Cache to do the same thing, with a long timeout. This has the advantage that older data can be flushed from the Cache after a period of time, e.g. 6 hours or whatever.
