Fetch 1 message at a given offset when debugging an app using spring-kafka - spring-kafka

I understand that random access is inefficient. But the app failed, and the record content is too big to log or otherwise inappropriate for logging, so the only information I have is that the record at this offset failed, and why. Now I want to see the data, let's say to be able to reproduce the failure. How do I do that?
OK, I can use a ConsumerSeekAware consumer, but that will rewind the position and process all records from that position on. I don't want that; I want just 1 specific message. I can use a dedicated consumer in a dedicated consumer group for this use case, so as not to influence others, and set ConsumerConfig.MAX_POLL_RECORDS_CONFIG to 1 so that each poll returns just 1 record, but this will not stop all subsequent records from reaching the listener, since there is no way to call poll manually, programmatically. Right? Or is there such a way? Or is there another way to achieve this? Even if I try to reach into spring-kafka internals, the org.apache.kafka.clients.consumer.Consumer seems to be made inaccessible on purpose, or at least I do not see a way to reach it.

Yes, you can just create your own consumer manually and poll it.
Get a reference to the consumer factory and call `createConsumer("tempGroup", "tempClient")`.
You would need to create a second consumer factory with max.poll.records=1.
You can copy the other properties from the main factory by calling getConfigurationProperties(), creating a new map from the result, and passing that map to a new DefaultKafkaConsumerFactory.
Close the consumer when you are done.
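A minimal sketch of those steps, in case it helps. The class name SingleRecordFetcher, the method fetchOne and the 5 second poll timeout are made up for illustration, and disabling auto-commit is my own addition so the temporary group leaves no committed offsets behind. It also assumes the deserializers are configured through properties rather than passed to the main factory as instances.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

// Hypothetical helper, not part of spring-kafka.
final class SingleRecordFetcher {

    /** Returns the record at the given offset, or null if it was not returned by a single poll. */
    static <K, V> ConsumerRecord<K, V> fetchOne(
            ConsumerFactory<K, V> mainFactory, String topic, int partition, long offset) {

        // Copy the main factory's properties and cap each poll at one record.
        Map<String, Object> props = new HashMap<>(mainFactory.getConfigurationProperties());
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false); // assumption: never commit

        ConsumerFactory<K, V> tempFactory = new DefaultKafkaConsumerFactory<>(props);
        TopicPartition tp = new TopicPartition(topic, partition);

        // try-with-resources closes the consumer when we are done.
        try (Consumer<K, V> consumer = tempFactory.createConsumer("tempGroup", "tempClient")) {
            consumer.assign(Collections.singletonList(tp)); // manual assignment, no rebalance
            consumer.seek(tp, offset);
            ConsumerRecords<K, V> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<K, V> record : records.records(tp)) {
                if (record.offset() == offset) {
                    return record; // the poll may return a later offset if this one is gone
                }
            }
            return null;
        }
    }
}
```

Because the consumer is assigned to the partition manually and never commits, it should not disturb the listener containers in the main group.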

Related

Two conflicting long lived process managers

Let's assume we have two long-lived process managers. Both sagas operate over, for example, 10 million items. The first saga adds something to each item. The second saga removes it from each item. Given that both process managers need a few minutes to complete their jobs, I get into trouble if I run them simultaneously.
Some of those items would hold the value while the rest would not. The result is close to random and depends on the order of the commands that affect a particular item. I wondered whether redispatching the "Remove" command in case of failure would solve the problem. I mean, if you try to remove a non-existent value, you should wait for the first saga to add the value. But while the process managers are working, someone else may dispatch a "Remove" or "Add" command. In that case my approach would fail.
How may I solve such a problem? :)
It seems that you would want the second saga to not run if the first saga is running (and presumably not run until some process which depends on whatever the first saga added being there). So the apparent solution would be to have a component (could be a microservice, could also be a record in a strongly consistent datastore like zookeeper/etcd/consul) that gives permission for the sagas to start executing. An example protocol might look like:
1. Saga sends a message to the component identifying the saga and conveying the intention to start.
2. Component validates that no sagas might be running which would prevent this saga from running.
3. Component responds with permission to start running.
4. Subsequent saga attempts result in rejection until the running saga tells the component it's OK to run the other saga.
Assuming that this component is reliably durable, the failure mode to worry about is that permission is granted but this component never processes the message that the saga finished (causes of this could include the permission message not getting delivered/processed or the saga crashing). No amount of acknowledgements or extra messages can solve this (it's basically the Two Generals' Problem).
A mitigation is to have this component (or something watching this component) alert if it seems that too much time has passed without saga completion. Whatever/whoever is responsible for ensuring liveness would then investigate to see if the saga is still running and if none is running, inform the component that it's OK to run the other saga. Note that this is not foolproof: it's quite possible for the decider in question to make what turns out to be the wrong decision.
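To make the protocol and the mitigation concrete, here is a rough in-memory sketch. SagaGate and its method names are hypothetical, it simplifies "sagas which would prevent this saga from running" to "any running saga", and a real component would have to persist its state in a strongly consistent store.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** In-memory stand-in for the permission component; a real one would persist its state. */
final class SagaGate {
    private String runningSaga; // null while no saga holds permission
    private Instant grantedAt;

    /** A saga announces its intention to start; permission is granted only if none is running. */
    synchronized boolean requestStart(String sagaId, Instant now) {
        if (runningSaga != null) {
            return false; // rejected until the running saga reports completion
        }
        runningSaga = sagaId;
        grantedAt = now;
        return true;
    }

    /** The running saga tells the component it is OK to run another saga. */
    synchronized boolean reportFinished(String sagaId) {
        if (!sagaId.equals(runningSaga)) {
            return false; // ignore reports from a saga that does not hold permission
        }
        runningSaga = null;
        grantedAt = null;
        return true;
    }

    /** Liveness check: the saga that has held permission longer than maxAge, if any. */
    synchronized Optional<String> overdue(Instant now, Duration maxAge) {
        if (runningSaga != null && Duration.between(grantedAt, now).compareTo(maxAge) > 0) {
            return Optional.of(runningSaga);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SagaGate gate = new SagaGate();
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        System.out.println(gate.requestStart("add-saga", t0));    // true
        System.out.println(gate.requestStart("remove-saga", t0)); // false
        System.out.println(gate.overdue(t0.plusSeconds(3600), Duration.ofMinutes(10))); // Optional[add-saga]
        System.out.println(gate.reportFinished("add-saga"));      // true
        System.out.println(gate.requestStart("remove-saga", t0)); // true
    }
}
```

The overdue check only raises the alert; deciding whether the saga is really dead is still left to whoever is responsible for liveness.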
I feel like I need more context. Whilst you don't say it explicitly, is the problem that the second saga tries to remove values that haven't been added by the first?
If that is the case, a simple solution would be to just use a third state.
What I mean by that is to define and declare item state more explicitly. You currently seem to have two states, with value and without value, but nothing to indicate whether an item is ready to be processed by the second saga because the first saga has already done its work on the item in question.
So all that needs to happen is that the second saga keeps looking for items where:
(with_value == true & ready_for_saga2 == true)
Call the flag ready_for_saga2 or "Saga 1 processing complete", whichever seems more appropriate in your context.
I'd say that the solution would vary based on which actual problem we're trying to solve.
Say it's an inventory, where "add" means items added to the inventory and "remove" means items requested for delivery. Then the order of commands does not matter that much, because you could just process the request for delivery when new items are added to the inventory.
This would lead to an aggregate root with two collections: Items and PendingOrders.
One process manager adds new inventory to Items - if any orders are pending, it will complete these orders in the same transaction and remove both the item and the order from the collections.
If the other process manager adds an order (tries to remove an item), it will either do it right away, if there are any items left, or it will add the order to the pending orders to be processed when new items arrive (and maybe notify someone about the delay, while we're at it).
This way we end up with the same state regardless of the order of commands, but the actual real-world-problem has great influence on the model chosen.
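As a sketch of that aggregate (the Inventory class, its method names and the fulfilled counter are made up for illustration, and transactions are left out):

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical aggregate root with two collections: items and pending orders. */
final class Inventory {
    private final Deque<String> items = new ArrayDeque<>();
    private final Deque<String> pendingOrders = new ArrayDeque<>();
    private int fulfilled;

    /** "Add" command: a new item completes the oldest pending order, if there is one. */
    void addItem(String itemId) {
        if (pendingOrders.isEmpty()) {
            items.addLast(itemId);
        } else {
            pendingOrders.removeFirst();
            fulfilled++;
        }
    }

    /** "Remove" command: served right away if an item is left, otherwise parked as pending. */
    void placeOrder(String orderId) {
        if (items.isEmpty()) {
            pendingOrders.addLast(orderId);
        } else {
            items.removeFirst();
            fulfilled++;
        }
    }

    String state() {
        return "items=" + items.size() + " pending=" + pendingOrders.size() + " fulfilled=" + fulfilled;
    }

    public static void main(String[] args) {
        Inventory addFirst = new Inventory();
        addFirst.addItem("item-1");
        addFirst.placeOrder("order-1");

        Inventory orderFirst = new Inventory();
        orderFirst.placeOrder("order-1");
        orderFirst.addItem("item-1");

        System.out.println(addFirst.state());   // items=0 pending=0 fulfilled=1
        System.out.println(orderFirst.state()); // items=0 pending=0 fulfilled=1
    }
}
```

Both command orders end in the same state, which is the property the model is after.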
If we have other real world problems, we can model those too.
Let's say you have two users who each start a process that bulk updates titles on inventory items. In this case you - and the users - have to decide how best to resolve this conflict - what will lead to the best real world outcome.
If you want consistency across all the items - all or no items should be updated by a single bulk update - I would embed this knowledge in a new model. Let's call it UpdateTitlesProcesses. We have only one instance of this model in the system. The state is shared between processes. This model is effectively a command queue, and when a user initiates the bulk operation, it adds all the commands to the queue and starts processing each item one at a time.
When the second user initiates another title update, the business logic in our models will reject this, as there's already another update started. Or if the experts say that the last write should win, then we ditch the remaining commands from the first process and add the new ones (and similarly we should decide what should happen if a user issues a single title update, not bulk - should it be rejected, prioritized or put on hold?).
So in short I'd say:
Make it clear which real world problem we are solving - and thus which conflict resolution outcome is best (probably a trade off, often also something that requires user interaction or notification).
Model this explicitly (where processes, actions and conflict handling are also part of the model).

Return entity updated by axon command

What is the best way to get the updated representation of an entity after mutating it with a command?
For example, let's say I have a project like digital-restaurant and I want to be able to update a field on the restaurant and return its current state to the client making the update (to retrieve any modifications by different processes).
When a restaurant is created, it is easy to retrieve the current state (i.e. the projection representation) after dispatching the create command by subscribing to a FindRestaurantQuery and waiting until a record is returned (see Restaurant CommandController).
However, it isn't so simple to detect when the result of an UpdateCommand has been applied to the projection. For example,
if we use the same trick and subscribe to the FindRestaurantQuery, we will be notified if the restaurant has been modified,
but it may not be our command that triggered the modification (in the case where multiple processes are concurrently issuing
update commands).
There seem to be two obvious ways to detect when a given update command has been applied to the projection:
1. Have a unique ID associated with every update command.
   - Subscribe to a query that is updated when the command ID has been applied to the projection.
   - Propagate the unique ID to the event that is applied by the aggregate.
   - When the projection receives the event, it can notify the query listener with the current state.
2. Before dispatching an update command, query the existing state of the projection.
   - Calculate the destination state given the contents of the update command.
In the case of (1): is there any situation (eg: batching / snapshotting) where the event carrying the unique ID may be
skipped over somehow, preventing the query listener from being notified?
Is there a more reliable / more idiomatic way to accomplish this use case?
Axon 4 with Spring Boot.
Although fully asynchronous designs may be preferable for a number of reasons, it is a common scenario that back-end teams are forced to provide a synchronous REST API on top of asynchronous CQRS+ES back-ends.
The part of the demo application that is trying to solve this problem is located here https://github.com/idugalic/digital-restaurant/tree/master/drestaurant-apps/drestaurant-monolith-rest
The case you are mentioning is totally valid.
I would go with option 1.
My only concern is that you have to introduce a new attribute to the domain (events): a unique ID associated with every update command. In my opinion, this ID attribute does not have any domain/business value. There is already an Audit (who, when) attribute associated with every event, and maybe you can use that to correlate commands and subscriptions. I believe there is more value in this solution (identity is part of the domain), if it is not too relaxed for your case.
Please note that Queries have to be extended with Audit in this case (you will know who requested the Query).
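To show the correlation idea behind option 1 without any framework, here is a plain Java sketch. It deliberately does not use Axon's API: RestaurantProjection, RestaurantRenamed and awaitCommand are hypothetical names, and a CompletableFuture stands in for the subscription to a query.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical projection; a CompletableFuture stands in for a query subscription. */
final class RestaurantProjection {

    /** Hypothetical event that carries the ID of the command that caused it. */
    static final class RestaurantRenamed {
        final String restaurantId;
        final String newName;
        final String commandId;

        RestaurantRenamed(String restaurantId, String newName, String commandId) {
            this.restaurantId = restaurantId;
            this.newName = newName;
            this.commandId = commandId;
        }
    }

    private final Map<String, String> namesById = new ConcurrentHashMap<>();
    private final Map<String, CompletableFuture<String>> waiters = new ConcurrentHashMap<>();

    /** Register interest in a command ID; call this before dispatching the command. */
    CompletableFuture<String> awaitCommand(String commandId) {
        return waiters.computeIfAbsent(commandId, id -> new CompletableFuture<String>());
    }

    /** Event handler: update the view, then notify only the caller whose command caused the event. */
    void on(RestaurantRenamed event) {
        namesById.put(event.restaurantId, event.newName);
        CompletableFuture<String> waiter = waiters.remove(event.commandId);
        if (waiter != null) {
            waiter.complete(namesById.get(event.restaurantId));
        }
    }

    public static void main(String[] args) {
        RestaurantProjection projection = new RestaurantProjection();
        CompletableFuture<String> mine = projection.awaitCommand("cmd-2");

        projection.on(new RestaurantRenamed("r-1", "Other name", "cmd-1")); // someone else's update
        System.out.println(mine.isDone()); // false

        projection.on(new RestaurantRenamed("r-1", "My name", "cmd-2"));    // our update
        System.out.println(mine.join());   // My name
    }
}
```

The waiter has to be registered before the command is dispatched, otherwise the event could be handled before anyone is listening for it.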

Looking for an efficient way to update the data

I'm writing a small game for Android in Unity. Basically, the player has to guess what's in the photo. Now my boss wants me to add an additional function: after a successful or unsuccessful guess, the player will get a panel to rate the photo (basically like or dislike), because we want to track which photos are not good and to remove photos after a couple of successful guesses.
My understanding is that if we want to add 1 to a variable in Firebase, we first have to make a call to get the value and then make a separate call that writes the value we got plus 1. I was wondering if there is a more efficient way to do it?
Thanks for any suggestions!
Instead of requesting Firebase each time you want to add, you can request Firebase at the beginning (in an onCreate-like method), save the object, and then use it when you want to update it.
thanks
Well, one thing you can do is to store your data temporarily in some object, but NOT send it to Firebase right away. Instead, you can send the data to Firebase when the app/game is about to get paused/minimized, hence reducing potential lag and increasing player satisfaction. OnApplicationPause(bool) is one such function that gets called when the game is minimized.
To do what you want, I would recommend using a Transaction instead of just doing a SetValueAsync. This lets you change values in your large shared database atomically, by first running your transaction against the local cache and later against the server data if it differs (see this question/answer).
This gets into some larger interesting bits of the Firebase Unity plugin. Reads/writes will run against your local cache, so you can do things like attach a listener to the "likes" node of a picture. As your cache syncs online and your transaction runs, this callback will be asynchronously triggered letting you keep the value up to date without worrying about syncing during app launch/shutdown/doing your own caching logic. This also means that generally, you don't have to worry too much about your online/offline state throughout your game.
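The idea behind a transaction, as opposed to a separate get followed by a set, can be shown with a small analogy. This is plain Java, not the Firebase API: LikeCounter and runTransaction are hypothetical names, and an AtomicLong stands in for the "likes" node. The update function is applied to the value last seen and retried if the stored value changed in the meantime, so concurrent increments are not lost.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

/** Plain Java analogy for a transactional counter; an AtomicLong stands in for the stored value. */
final class LikeCounter {
    private final AtomicLong stored = new AtomicLong();

    /** Apply the update to the value we last saw and retry if someone else changed it meanwhile. */
    long runTransaction(LongUnaryOperator update) {
        while (true) {
            long seen = stored.get();
            long proposed = update.applyAsLong(seen);
            if (stored.compareAndSet(seen, proposed)) {
                return proposed; // nobody wrote in between, so the update is committed
            }
            // Otherwise the value differed from what we saw: loop and rerun the update.
        }
    }

    long value() {
        return stored.get();
    }

    public static void main(String[] args) throws InterruptedException {
        LikeCounter likes = new LikeCounter();
        Thread[] players = new Thread[4];
        for (int i = 0; i < players.length; i++) {
            players[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    likes.runTransaction(current -> current + 1);
                }
            });
            players[i].start();
        }
        for (Thread player : players) {
            player.join();
        }
        System.out.println(likes.value()); // 4000
    }
}
```

With a plain read followed by a write, two players rating at the same moment could both read the same count and one like would be lost.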

What's the best way to create/use an ID throughout the processing of a message in BizTalk?

Our program so far: We have a process that involves multiple schemata, orchestrations and messages sent/received.
Our desire: To have an ID that links the whole process together when we log our progress into a SQL server table.
So far, we have a table that logs our progress, but when there are multiple messages it is very difficult to read, since BizTalk will sometimes process certain messages out of order.
E.g., we could have:
1 Beginning process for client1
2 Second item for client1
3 Third item for client1
4 Final item for client1
Easily followed if there's only one client being updated at a time. On the other hand, this will be much more likely:
1 Beginning process for client1
2 Beginning process for client2
3 Second item for client2
4 Third item for client2
5 Second item for client1
6 Third item for client1
7 Final item for client1
8 Final item for client2
It would be nice to have an ID throughout the whole thing so that the last listing could be ordered by this ID field.
What is the best and/or quickest way to do this? We had thought to create an ID at the moment the first orchestration is triggered and keep passing that value to all the schemata and later orchestrations. This seems like a lot of work and would require that we modify all the schemata, which just seems wrong.
Should we even be wanting to have such an ID? Any other solutions that come to mind?
This may not exactly be the easiest way, but have you looked at this:
http://blogs.msdn.com/b/appfabriccat/archive/2010/08/30/biztalk-application-tracing-made-easy-with-biztalk-cat-instrumentation-framework-controller.aspx
Basically it's an instrumentation framework which allows you to event out from pipelines, maps, orchs, etc.
When you write out to the event-trace you can use a "business key" which will tie multiple events together in a chain, similar to what you are saying.
Available here
http://btscatifcontroller.codeplex.com/
I'm not sure I fully understand all the details of your specific setup, but here goes:
If you can correlate the messages from the same client into a "long running" orchestration (which waits for subsequent messages from the same client), then the orchestration will have an automatically assigned ServiceId Guid, which will be kept throughout the orchestration.
As you say, for correlation purposes, you would usually try and use natural keys within the existing incoming message schemas to correlate subsequent messages back to the running orchestration - this way you don't need to change the schemas. In your example, ClientId might be a good correlation, provided that the same client cannot send multiple message 'sets' simultaneously. (and worst case, if you do add a new correlation key to the schemas, all systems involved in the orchestration will need to be changed to 'remember' this key and return it to you.) Again, assuming ClientId as a correlation key, in your example, 2 orchestrations would be running simultaneously - one for Client 1 and one for Client 2
However, for scalability and version control reasons, (very) long running orchestrations are generally to be avoided unless they are absolutely necessary (e.g. unless you can only trigger a process once all 4 client messages are received). If you decide to keep each message as a separate orchestration or just mapped and filtered on a port, another way to 'track' the sets of messages is by using BAM - you can use a continuation to tie all the client messages back together, e.g. for the purpose of a report or such.
Take a look at BAM. It's designed to do exactly what you describe: Using Business Activity Monitoring
This book has got a very good chapter about BAM and this tool, by one of the authors of the book, can help you developing your BAM solution. And finally, a nice BAM Poster.
Don't be put off by the initial complexity. When you get your head around it, BAM is one of the coolest features of BizTalk.
Hope this helps. Good luck.
BizTalk assigns various values in the message context that usually persist for the life of the processing of that message, such as the initial MessageId. Will that work for you?
In our application we have to use an externally provided ID (from the customer). We have a multi-part message with this ID in part of it. You might consider that as well.
You could create a UniqueId and a StepId and pass them around in the message context. When a new process for a client starts, set UniqueId to a Guid and StepId to 1. As the message gets passed to the next process, increment the StepId.
This would allow you to query events, grouped by client id and in the order (stepId) the event happened.
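Here is a small sketch of the UniqueId/StepId idea, written in plain Java rather than against BizTalk's message context. ProcessLog and Tracking are hypothetical names, and an in-memory list stands in for the SQL table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

/** Hypothetical log table; entries are read back grouped by UniqueId and ordered by StepId. */
final class ProcessLog {

    /** The pair of values that travels with the message from step to step. */
    static final class Tracking {
        final String uniqueId;
        final int stepId;

        private Tracking(String uniqueId, int stepId) {
            this.uniqueId = uniqueId;
            this.stepId = stepId;
        }

        /** A new process for a client starts: fresh unique ID, step 1. */
        static Tracking start() {
            return new Tracking(UUID.randomUUID().toString(), 1);
        }

        /** The message is passed to the next process: same unique ID, next step. */
        Tracking next() {
            return new Tracking(uniqueId, stepId + 1);
        }
    }

    private static final class Entry {
        final Tracking tracking;
        final String message;

        Entry(Tracking tracking, String message) {
            this.tracking = tracking;
            this.message = message;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void log(Tracking tracking, String message) {
        entries.add(new Entry(tracking, message));
    }

    /** Equivalent of ordering the log table by UniqueId, then StepId. */
    List<String> ordered() {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparing((Entry e) -> e.tracking.uniqueId)
                .thenComparingInt(e -> e.tracking.stepId));
        List<String> lines = new ArrayList<>();
        for (Entry e : sorted) {
            lines.add(e.tracking.stepId + " " + e.message);
        }
        return lines;
    }

    public static void main(String[] args) {
        ProcessLog log = new ProcessLog();
        Tracking c1 = Tracking.start();
        Tracking c2 = Tracking.start();
        log.log(c1, "Beginning process for client1");
        log.log(c2, "Beginning process for client2");
        c2 = c2.next();
        log.log(c2, "Second item for client2");
        c1 = c1.next();
        log.log(c1, "Second item for client1");
        // Each client's entries come out together and in step order.
        log.ordered().forEach(System.out::println);
    }
}
```

Which client's group is printed first depends on the generated IDs; only the order within each group is guaranteed.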

WF4 receive activity to be able to CreateInstance AND handle subsequent correlation

I want to create a workflow that will be persistent and which will consist of a Pick activity containing the following:
A Receive pick activity (ReceiveItem) which can Create a WF Instance using an email address parameter for correlation AND can also be called again later with the same email address and be picked up in correlation to start up the correct persisted WF. Each item is added to a queue for later processing
A MaxItems pick activity which will force the processing of the queue when it reaches a defined size and
A Timer pick activity which will simply process all queued items at the end of the day
Please note: I want to receive the second and subsequent items via ReceiveItem with the same email address parameter.
My question is:
Will this work as I suggest or am I going to get correlation collisions because the Receive activity can CreateInstance? Or will WF simply create a WF Instance at the beginning and then always correlate after that?
If this will not work, how could I implement this with one single Receive activity and still get the benefit of a single workflow handling both the receive and batch operations?
That will work just fine. Check this blog post for an example of how to do that. The complete XAML is listed at the bottom if you want to inspect all Receive settings.

Resources