How to handle data replication lag and event notification - asynchronous

We have a simple application, who upon every update of an entity sends out a notification to SNS(it could very well have been any other queuing system). Clients are listening to these notifications and they do a get of updated entity based on these notifications.
The problem we are facing is, when clients do a get, sometimes data is not replicated and we return 404 or sometimes stale data(even worse).
How can we mitigate this while sending notifications?

Here are Few strategies to mitigate this with pros and cons
Instead of sending notification from application send notification using database streams
For example dynamodb streams ans aws lambda. This pattern can be useful in the case of multiregion deployment as well. where all the subscriber, publisher will subscribe to their regional database streams. And also atomicity of sending message and writing to database is preserved. And we wont loose events in the case of regional failure.
Send delayed messages to your broker
Some borkers like activemq and sqs support this functionality, but SNS does not. A workaround for that could be writing to sqs queue which then writes to sns. This might be a good option when your database does not support streams.
Send special error code for retry-able gets
Since we know that eventual consistency is there we can return special error code to clients, so that they can retry based on this error code. The retry strategy should be exponential backoff. but this may mean giving away your problems to clients. Also we should have some sort of versioning in place.
Fetch from another region
If entity is not found in the same region application can go to another region or master database to fetch it. NOTE Don't do this. as it is an anti pattern. I am mentioning it here just for the sake of completion.
Send the full entity in message
If entities to be fetched by rest service is small and there are no security constrain around who can access what, we can send the full entity in message. This is ensure that client don't have to do explicit fetch of it every time a new message is arrived.

Related

IRC | asynchronous communitcation

I'm currently working on an IRC-Chat and we want to add the option to chat with other people privately (User-To-User) which works fine, but the messages aren't stored, meaning that a user loses all private messages after disconnecting. They also can't message a person once they have disconnected.
All of this isn't the case with channels, where messages are stored for X time, allowing a asynchronous communication.
Is there a way of allowing asynchronous messaging for private User-To-User messages without storing the messages in an extra system? Or is this simply a limitation of IRC?
IRC isn't designed to store messages - it's a feature, not a bug. One way to get around this is to configure the server to 'simulate' past messages (essentially pretending to be the other user to send past messages). To do this, you would need to store messages - however, it would be on the server itself, not on a separate database or a third party.

Microservices client acknowledgement and Event Sourcing

Scenario
I am building courier service system using Microservices. I am not sure of few things and here is my Scenario
Booking API - This is where customer Place order
Payment API - This is where we process the payment against booking
Notification API - There service is responsible for sending the notification after everything is completed.
The system is using event-driven Architecture. When customer places booking order , i commit local transaction in booking API and publish event. Payment API and notification API are subscribed to their respective event . Once Done Payment and notification API need to acknowledge back to Booking API.
My Questions is
After publishing the event my booking service can't block the call and goes back to the client (front end). How does my client app will have to check the status of transaction or it would know that transaction is completed? Does it poll every couple of seconds ? Since this is distributed transaction and any service can go down and won't be able to acknowledge back . In that case how do my client (front end) would know since it will keep on waiting. I am considering saga for distributed transactions.
What's the best way to achieve all of this ?
Event Sourcing
I want to implement Event sourcing to track the complete track of the booking order. Does i have to implement this in my booking API with event store ? Or event store are shared between services since i am supposed to catch all the events from different services . What's the best way to implement this ?
Many Thanks,
The way I visualize this is as follows (influenced by Martin Kleppmann's talk here and here).
The end user places an order. The order is written to a Kafka topic. Since Kafka has a log structured storage, the order details will be saved in the least possible time. It's an atomic operation ('A' in 'ACID') - all or nothing
Now as soon as the user places the order, the user would like to read it back (read-your-write). To acheive this we can write the order data in a distributed cache as well. Although dual write is not usually a good idea as it may cause partial failure (e.g. writing to Kafka is successful, but writing to cache fails), we can mitigate this risk by ensuring that one of the Kafka consumer writes the data in a database. So, even in a rare scenario of cache failure, the user can read the data back from DB eventually.
The status of the order in the cache as written at the time of order creation is "in progress"
One or more kafka consumer groups are then used to handle the events as follows: the payment and notification are handled properly and the final status will be written back to the cache and database
A separate Kafka consumer will then receive the response from the payment and notification apis and write the updates to cache, DB and a web socket
The websocket will then update the UI model and the changes would be then reflected in the UI through event sourcing.
Further clarifications based on comment
The basic idea here is that we build a cache using streaming for every service with data they need. For e.g. the account service needs feedback from the payment and notification services. Therefore, we have these services write their response to some Kafka topic which has some consumers that write the response back to order service's cache
Based on the ACID properties of Kafka (or any similar technology), the message will never be lost. Eventually we will get all or nothing. That's atomicity. If the order service fails to write the order, an error response is sent back to the client in a synchronous way and the user probably retries after some time. If the order service is successful, the response to the other services must flow back to its cache eventually. If one of the services is down for some time, the response will be delayed, but it will be sent eventually when the service resumes
The clients need not poll. The result will be propagated to it through streaming using websocket. The UI page will listen to the websocket As the consumer writes the feedback in the cache, it can also write to the websocket. This will notify the UI. Then if you use something like Angular or ReactJS, the appropriate section of the UI can be refreshed with the value received at the websocket. Until that happens user keeps seeing the status "in progress" as was written to the cache at the time of order creation Even if the user refreshes the page, the same status is retrieved from the cache. If the cache value expires and follows a LRU mechanism, the same value will be fetched from the DB and wriitten back to the cache to serve future requests. Once the feedback from the other services are available, the new result will be streamed using websocket. On page refresh, new status would be available from the cache or DB
You can pass an Identifier back to client once the booking is completed and client can use this identifier to query the status of the subsequent actions if you can connect them on the back end. You can also send a notification back to the Client when other events are completed. You can do long polling or you can do notification.
thanks skjagini. part of my question is to handle a case where other
microservices don't get back in time or never. lets say payment api is
done working and charged the client but didn't notify my order service
in time or after very long time. how my client waits ? if we timeout
the client the backend may have processed it after timeout
In CQRS, you would separate the Commands and Querying. i.e, considering your scenario you can implement all interactions with Queues for interaction. (There are multiple implementations for CQRS with event sourcing, but in simplest form):
Client Sends a request --> Payment API receives the request --> Validates the request (if validation fails throws error back to the user) --> On successful validation --> generates a GUID and writes the message request to Queue --> passes the GUID to the user
Payment API subscribes the payment queue --> After processing the request --> writes to Order queue or any other queues
Order APi subscribes to Order Queue and processes the request.
User has a GUID which can get him data for all the interactions.
If use a pub/sub as in Kafka instead of Kafka (all other subsequent systems can read from the same topic, you don't need to write for each queue)
If any of the services fail to process, once the services are restarted they should be able to pick where they left off, if the services are down in the middle of a transaction as long as they roll back their resp changes you system should be stable condition
I'm not 100% sure what you are asking. But it sounds like you should be using a messaging service. As #Saptarshi Basu mentioned kafka is good. I would really recommend NATS - although I'm biased because that's the one I work with
With NATS you can create request-reply messages to interface between client and booking service. That's a 1-1 communication
If you have multiple instances of each of your services running, you can use the Queuing service to automatically load balance. NATS will just randomly select a server for you
And then you can use pub-sub feeds for communication between all of your services.
This will give you a very resilient and scalable architecture, and NATS makes it all incredibly easy

What's the recommended way to handle microservice processing bugs new insights?

Before I get to my question, let me sketch out a sample set of microservices to illustrate my dilemma.
Scenario outline
Suppose I have 4 microservices:
An activation service where features supplied to our customers are (de)activated. A registration service where members can be added and changed. A secured key service that is able to generate secure keys (in a multi step process) for members to be used when communicating with them with the outside world. And a communication service that is used to communicate about our members with external vendors.
The secured key service may however only request secured keys if this is a feature that is activated. Additionally, the communication service may only communicate about members that have a secured key AND if the communication feature itself is activated.
Because they are microservices, each of the services has it's own datastore and is completely self sufficient. That is, any data that is required from the other microservices is duplicated locally and kept in sync by means of asynchronous messages from the other microservices.
The dilemma
I'm actually facing two main dilemma's. The first is (pretty obviously) data synchronization. When there are multiple data stores that need to be kept in sync you have to account for messages getting lost or processed out of order. But there are plenty of out of the box solutions for this and when all fails you could even fall back to some kind of ETL process to keep things in sync.
The main issue I'm facing however is the actions that need to be performed. In the above example the secured key service must perform an action when it either
Receives a message from the registration service for a new member when it already knows that the secured keys feature is active in the activation service
Receives a message from the activation service that the secured keys feature is now active when it already knows about members from the registration service
In both cases this means that a message from the external system must lead to both an update in the local copy of the data as well as some logic that needs to be processed.
The question
Now to the actual question :)
What is the recommended way to cope with either bugs or new insights when it comes to handling those messages? Suppose there is a bug in the message handler from the activation service. The handler does update the internal data structure, but it fails to detect that there are already registered members and thus never starts the secure key generation process. Alternatively it could be that there's no bug, but we decide that there is something else we want the handler to do.
The system will have no reason to resubmit or reprocess messages (as the message didn't fail), but there's no real way for us to re-trigger the behavior that's behind the message.
I hope it's clear what I'm asking (and I do apologize if it should be posted on any of the other 170 Stack... sites, I only really know of StackOverflow)
I don't know what is the recommended way, I know how this is done in DDD and maybe this can help you as DDD and microservices are friends.
What you have is a long-running/multi-step process that involves information from multiple microservices. In DDD this can be implemented using a Saga/Process manager. The Saga maintains a local state by subscribing to events from both the registration service and the activation service. As the events come, the Saga check to see if it has all the information it needs to generate secure keys by submitting a CreateSecureKey command. The events may come in any order and even can be duplicated but this is not a problem as the Saga can compensate for this.
In case of bugs or new features, you could create special scripts or other processes that search for a particular situation and handle it by submitting specific compensating commands, without reprocessing all the past events.
In case of new features you may even have to process old events that now are interesting for your business process. You do this in the same way, by querying the events source for the newly interesting old events and send them to the newly updated Saga. After that import process, you subscribe the Saga to these newly interesting events and the Saga continues to function as usual.

API vs Event in Microservice approach

What about Smart endpoints and dumb pipes in terms of different type of requests?
After reading that I was thinking that it's enough to subscribe for some events and deal with that. But now I've realised that sometimes you should have opened API (maybe not for the end customers, but for the API Gateway etc). Is this ok? Or you should "eventize" (transform into event) any request which coming to Microservices cloud?
So, for instance, you have Invoice and Order services.
It's clear that when order created you might use an event which might be consumed by Invoice service to create an invoice. It's clear that for receiving list of last user's orders you may use CQRS on Order service side or even just make new service LastOrders which will keep just projection of required data. But should this request transformed into event or LastOrders should provide API for that and listen for events to update it's own DB?
We do it like this:
All commands are issued as messages in durable queues with type-based routing
Processing takes places in isolated handlers
REST POST and PUT are only created for the API that should be accessible from legacy/external systems
These "command"-style REST endpoints only form command as a message and send it via the message bus
REST GET is perfect for fetching the data and we do not use messaging there, although we could have some message handlers to retrieve data for long-running processes that can only use messages
Command (message) handlers always publish events about what they have done or not done
Downstream event processing can do whatever they want by subscribing to these events

Check database for changes via long polling

Im creating a chat app in ASP.NET MVC3.
im using long polling and AsyncController to do so
when a user posts a chat its saved in database , to retrieve should i constantly check database for change in record or after definite interval
or is there an better/ efficient way of doing it
i came across this question but could not get a usable answer.
You may take a look at SignalR for an efficient way. Contrary to the standard polling mechanism (in which you are sending requests at regular intervals to check for changes), SignalR uses a push mechanism in which the server sends notifications to connected clients to notify them about changes.
Since you're already using long polling and an asynccontrolller, why not create a message pool? Take a look at this solution.
In a nutshell, instead of just writing the updated chat to the database, you should also stick it in some sort of queue. Then each user's async thread is listening to that pool waiting for a message to appear. When one appears return the data to the user through your normal operation. When all listening threads have picked up the message it can be removed from the queue. This will prevent you from having several threads hammering your database looking for a new message.
You can give PServiceBus(http://pservicebus.codeplex.com/) a try and here is a sample web chat app(http://74.208.226.12/ChatApp/chat.html) running and does not need database in between to pass message between two web clients. If you want to persist data in the database for logging sake, you can always subscribe to the chat message and log it to database.

Resources