Loose coupling of notifications in SOA

The solution is composed of orchestration process services and multiple legacy applications in charge of CRUD operations on the domain entities.
Every time one of these legacy applications executes an update, add, or delete statement, the application that owns the entity sends a notification.
In the modeling phase we decided to map the entities at a fine-grained level. As a result, a single CRUD operation can raise thousands of notifications (up to 20k), which blocks user activity for a while because entity persistence and notification sending are combined in the same transaction. This becomes unacceptable when it takes more than 120 seconds.
What I want to do is separate the user activity in the legacy applications from entity persistence and notification sending, deferring the latter two to a dedicated application service (for example). I know the best approach would be to defer these activities to a background thread in the user application, but as I mentioned, I'm using very old legacy applications. Are there any SOA design patterns that can be applied to this scenario?

What you want is "Decoupled Invocation". You accept the request, put it in a (persistent) queue, and send an acknowledgment to the user that the request has been received. Depending on the scenario, you can send an additional reply (e.g. by email) when the message has been fully processed.
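As a rough illustration of decoupled invocation, here is a minimal Python sketch. It assumes a SQLite table as the persistent queue; the table layout and the function names (open_queue, accept_request, process_pending, status_of) are made up for this example and are not part of any SOA product.

```python
import sqlite3
import uuid


def open_queue(path=":memory:"):
    """Open (or create) the request queue. Pass a file path for real persistence."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests ("
        "id TEXT PRIMARY KEY, payload TEXT NOT NULL, status TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def accept_request(conn, payload):
    """Store the request and return an acknowledgment id immediately."""
    request_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO requests (id, payload, status) VALUES (?, ?, 'pending')",
        (request_id, payload),
    )
    conn.commit()
    return request_id


def process_pending(conn, handler):
    """Background worker step: run the slow work for every pending request."""
    rows = conn.execute(
        "SELECT id, payload FROM requests WHERE status = 'pending' ORDER BY rowid"
    ).fetchall()
    for request_id, payload in rows:
        handler(payload)  # e.g. persist the entity and send its notifications
        conn.execute("UPDATE requests SET status = 'done' WHERE id = ?", (request_id,))
        conn.commit()
    return len(rows)


def status_of(conn, request_id):
    """Let the caller check later whether the request was fully processed."""
    row = conn.execute(
        "SELECT status FROM requests WHERE id = ?", (request_id,)
    ).fetchone()
    return row[0] if row else None
```

The legacy application would only call accept_request inside the user's transaction, while a separate service runs process_pending on its own schedule, so the 20k notifications no longer block the user.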

Related

How to handle data replication lag and event notification

We have a simple application which, upon every update of an entity, sends out a notification to SNS (it could just as well have been any other queuing system). Clients listen for these notifications and do a GET of the updated entity based on them.
The problem we are facing is that when clients do a GET, sometimes the data has not been replicated yet and we return a 404, or sometimes stale data (even worse).
How can we mitigate this while sending notifications?
Here are a few strategies to mitigate this, with pros and cons.
Instead of sending the notification from the application, send it using database streams
For example, DynamoDB Streams and AWS Lambda. This pattern can be useful for multi-region deployments as well, where each publisher and subscriber subscribes to its regional database stream. The atomicity of writing to the database and sending the message is also preserved, and we won't lose events in the case of a regional failure.
Send delayed messages to your broker
Some brokers, like ActiveMQ and SQS, support this functionality, but SNS does not. A workaround could be writing to an SQS queue which then writes to SNS. This might be a good option when your database does not support streams.
Send a special error code for retry-able gets
Since we know eventual consistency is in play, we can return a special error code to clients so that they can retry based on it. The retry strategy should be exponential backoff. However, this may mean handing your problems over to clients. We should also have some sort of versioning in place.
Fetch from another region
If the entity is not found in the same region, the application can go to another region or the master database to fetch it. NOTE: Don't do this, as it is an anti-pattern. I am mentioning it here just for the sake of completeness.
Send the full entity in the message
If the entities fetched by the REST service are small and there are no security constraints around who can access what, we can send the full entity in the message. This ensures that clients don't have to do an explicit fetch every time a new message arrives.
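To make the retry-able get strategy more concrete, here is a small Python sketch of a client that retries with exponential backoff until it sees at least the version named in the notification. The NotReplicatedYet exception, the "version" field, and the fetch callable are assumptions for illustration; a real client would map them to whatever error code and versioning scheme the service exposes.

```python
import time


class NotReplicatedYet(Exception):
    """Stands in for the special retry-able error code returned by the service."""


def fetch_with_retry(fetch, entity_id, min_version, max_attempts=5,
                     base_delay=0.1, sleep=time.sleep):
    """Call fetch(entity_id) until it returns a copy at least as new as min_version."""
    for attempt in range(max_attempts):
        try:
            entity = fetch(entity_id)
            if entity["version"] >= min_version:
                return entity  # fresh enough, stop retrying
        except NotReplicatedYet:
            pass  # replica has not caught up yet, fall through to the backoff
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # delay doubles on every retry
    raise TimeoutError(
        f"entity {entity_id} not at version {min_version} after {max_attempts} attempts"
    )
```

Carrying the expected version in the notification is what lets the client tell a stale read apart from a fresh one, which is why the retry strategy and versioning go together.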

Microservices client acknowledgement and Event Sourcing

Scenario
I am building a courier service system using microservices. I am not sure about a few things, and here is my scenario:
Booking API - This is where the customer places an order
Payment API - This is where we process the payment against the booking
Notification API - This service is responsible for sending the notification after everything is completed.
The system uses an event-driven architecture. When a customer places a booking order, I commit a local transaction in the Booking API and publish an event. The Payment API and Notification API are subscribed to their respective events. Once done, the Payment and Notification APIs need to acknowledge back to the Booking API.
My question is:
After publishing the event, my booking service can't block the call, so it returns to the client (front end). How will my client app check the status of the transaction, or know that the transaction is completed? Does it poll every couple of seconds? Since this is a distributed transaction, any service can go down and fail to acknowledge back. In that case, how would my client (front end) know, since it would keep on waiting? I am considering sagas for distributed transactions.
What's the best way to achieve all of this?
Event Sourcing
I want to implement event sourcing to keep a complete history of the booking order. Do I have to implement this in my Booking API with an event store? Or is the event store shared between services, since I am supposed to capture all the events from the different services? What's the best way to implement this?
Many Thanks,
The way I visualize this is as follows (influenced by Martin Kleppmann's talks here and here).
The end user places an order. The order is written to a Kafka topic. Since Kafka has log-structured storage, the order details will be saved in the least possible time. It's an atomic operation ('A' in 'ACID'): all or nothing.
As soon as the user places the order, the user would like to read it back (read-your-writes). To achieve this we can write the order data to a distributed cache as well. Dual writes are not usually a good idea because they can fail partially (e.g. writing to Kafka succeeds, but writing to the cache fails). We can mitigate this risk by ensuring that one of the Kafka consumers writes the data to a database. So, even in the rare scenario of a cache failure, the user can eventually read the data back from the DB.
The status of the order in the cache, as written at the time of order creation, is "in progress".
One or more Kafka consumer groups are then used to handle the events as follows: the payment and notification are handled properly and the final status is written back to the cache and database.
A separate Kafka consumer then receives the responses from the payment and notification APIs and writes the updates to the cache, the DB, and a websocket.
The websocket then updates the UI model, and the changes are reflected in the UI through event sourcing.
Further clarifications based on comment
The basic idea here is that, using streaming, we build a cache for every service with the data it needs. For example, the order service needs feedback from the payment and notification services. Therefore, we have these services write their responses to some Kafka topic, which has consumers that write the responses back to the order service's cache.
Based on the ACID properties of Kafka (or any similar technology), the message will never be lost. Eventually we will get all or nothing. That's atomicity. If the order service fails to write the order, an error response is sent back to the client synchronously and the user probably retries after some time. If the order service succeeds, the responses from the other services must eventually flow back to its cache. If one of the services is down for some time, its response will be delayed, but it will be sent eventually when the service resumes.
The clients need not poll. The result will be propagated to them through streaming using a websocket. The UI page listens to the websocket. As the consumer writes the feedback to the cache, it can also write to the websocket, which notifies the UI. Then, if you use something like Angular or ReactJS, the appropriate section of the UI can be refreshed with the value received on the websocket. Until that happens, the user keeps seeing the status "in progress", as written to the cache at the time of order creation. Even if the user refreshes the page, the same status is retrieved from the cache. If the cache follows an LRU mechanism and the value has expired, the same value will be fetched from the DB and written back to the cache to serve future requests. Once the feedback from the other services is available, the new result will be streamed over the websocket. On page refresh, the new status will be available from the cache or DB.
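The flow described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: a dict plays the role of the cache plus DB, plain callbacks stand in for websocket connections, and the class and method names (OrderStatusStore, on_service_response, and so on) are invented for the example.

```python
class OrderStatusStore:
    """In-memory stand-in for the cache plus database described above."""

    def __init__(self):
        self._status = {}
        self._listeners = {}  # order_id -> callbacks (websocket stand-ins)

    def create_order(self, order_id):
        # Written at order creation time so the user can read their own write.
        self._status[order_id] = "in progress"

    def subscribe(self, order_id, callback):
        # The UI page "listens to the websocket" for this order.
        self._listeners.setdefault(order_id, []).append(callback)

    def get_status(self, order_id):
        # What a page refresh would read.
        return self._status.get(order_id)

    def on_service_response(self, order_id, new_status):
        """Called by the consumer that reads payment/notification responses."""
        self._status[order_id] = new_status
        for callback in self._listeners.get(order_id, []):
            callback(order_id, new_status)  # push, so the client need not poll
```

A page refresh calls get_status, while an open page receives the pushed update through its subscribed callback; both see "in progress" until a response arrives.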
You can pass an identifier back to the client once the booking is completed, and the client can use this identifier to query the status of the subsequent actions, provided you can connect them on the back end. You can also send a notification back to the client when other events are completed. You can do long polling or you can use notifications.
Thanks skjagini. Part of my question is how to handle the case where other microservices respond late or never. Let's say the Payment API is done working and has charged the client, but didn't notify my order service in time, or only after a very long time. How does my client wait? If we time out the client, the backend may still process the request after the timeout.
In CQRS, you separate commands and queries. In your scenario you can implement all interactions between services with queues. (There are multiple implementations of CQRS with event sourcing, but in its simplest form):
Client sends a request --> Payment API receives the request --> validates the request (if validation fails, returns an error to the user) --> on successful validation, generates a GUID and writes the request message to the queue --> passes the GUID to the user
Payment API subscribes to the payment queue --> after processing the request --> writes to the order queue or any other queues
Order API subscribes to the order queue and processes the request.
The user has a GUID which can get them data for all the interactions.
If you use pub/sub as in Kafka instead of queues, all subsequent systems can read from the same topic; you don't need to write to a separate queue for each one.
If any of the services fail to process a message, they should be able to pick up where they left off once restarted. If a service goes down in the middle of a transaction, your system should remain in a stable condition as long as the service rolls back its respective changes.
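Here is a minimal Python sketch of the queue-based flow above, using in-process queue.Queue objects in place of a real broker. The function names, the status strings, and the "amount" validation rule are all hypothetical and only serve to show how the GUID ties the steps together.

```python
import queue
import uuid

payment_queue = queue.Queue()  # stand-in for the payment queue
order_queue = queue.Queue()    # stand-in for the order queue
status_by_guid = {}            # what the user can query with the GUID


def submit_payment_request(request):
    """Validate, enqueue, and hand the GUID back to the caller straight away."""
    if request.get("amount", 0) <= 0:
        raise ValueError("amount must be positive")  # validation error goes to the user
    guid = str(uuid.uuid4())
    status_by_guid[guid] = "payment queued"
    payment_queue.put({"guid": guid, **request})
    return guid


def payment_worker_step():
    """Process one payment message and forward it to the order queue."""
    message = payment_queue.get_nowait()
    status_by_guid[message["guid"]] = "payment processed"
    order_queue.put(message)


def order_worker_step():
    """Process one order message."""
    message = order_queue.get_nowait()
    status_by_guid[message["guid"]] = "order completed"
```

Because the client only holds the GUID, a slow or restarted service just means the status stays at an earlier step for longer; nothing on the client side has to time out a blocking call.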
I'm not 100% sure what you are asking, but it sounds like you should be using a messaging service. As @Saptarshi Basu mentioned, Kafka is good. I would really recommend NATS, although I'm biased because that's the one I work with.
With NATS you can create request-reply messages to interface between the client and the booking service. That's 1-to-1 communication.
If you have multiple instances of each of your services running, you can use its queuing feature to load balance automatically. NATS will just randomly select a server for you.
And then you can use pub-sub feeds for communication between all of your services.
This will give you a very resilient and scalable architecture, and NATS makes it all incredibly easy.

What's the recommended way to handle microservice processing bugs or new insights?

Before I get to my question, let me sketch out a sample set of microservices to illustrate my dilemma.
Scenario outline
Suppose I have 4 microservices:
An activation service where features supplied to our customers are (de)activated.
A registration service where members can be added and changed.
A secured key service that is able to generate secure keys (in a multi step process) for members to be used when communicating with them with the outside world.
And a communication service that is used to communicate about our members with external vendors.
The secured key service may however only request secured keys if this is a feature that is activated. Additionally, the communication service may only communicate about members that have a secured key AND if the communication feature itself is activated.
Because they are microservices, each of the services has its own datastore and is completely self-sufficient. That is, any data that is required from the other microservices is duplicated locally and kept in sync by means of asynchronous messages from the other microservices.
The dilemma
I'm actually facing two main dilemmas. The first is (pretty obviously) data synchronization. When there are multiple data stores that need to be kept in sync, you have to account for messages getting lost or processed out of order. But there are plenty of out-of-the-box solutions for this, and when all else fails you could even fall back to some kind of ETL process to keep things in sync.
The main issue I'm facing however is the actions that need to be performed. In the above example the secured key service must perform an action when it either
Receives a message from the registration service for a new member when it already knows that the secured keys feature is active in the activation service
Receives a message from the activation service that the secured keys feature is now active when it already knows about members from the registration service
In both cases this means that a message from the external system must lead to both an update in the local copy of the data as well as some logic that needs to be processed.
The question
Now to the actual question :)
What is the recommended way to cope with either bugs or new insights when it comes to handling those messages? Suppose there is a bug in the message handler from the activation service. The handler does update the internal data structure, but it fails to detect that there are already registered members and thus never starts the secure key generation process. Alternatively it could be that there's no bug, but we decide that there is something else we want the handler to do.
The system will have no reason to resubmit or reprocess messages (as the message didn't fail), but there's no real way for us to re-trigger the behavior that's behind the message.
I hope it's clear what I'm asking (and I do apologize if it should be posted on any of the other 170 Stack... sites, I only really know of StackOverflow)
I don't know what the recommended way is, but I know how this is done in DDD, and maybe this can help you, as DDD and microservices are friends.
What you have is a long-running, multi-step process that involves information from multiple microservices. In DDD this can be implemented using a Saga/Process manager. The Saga maintains a local state by subscribing to events from both the registration service and the activation service. As the events come in, the Saga checks whether it has all the information it needs and, if so, generates secure keys by submitting a CreateSecureKey command. The events may come in any order and can even be duplicated, but this is not a problem as the Saga can compensate for this.
In case of bugs or new features, you could create special scripts or other processes that search for a particular situation and handle it by submitting specific compensating commands, without reprocessing all the past events.
In case of new features you may even have to process old events that are now interesting for your business process. You do this in the same way, by querying the event source for the newly interesting old events and sending them to the newly updated Saga. After that import process, you subscribe the Saga to these newly interesting events and the Saga continues to function as usual.
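A minimal Python sketch of such a Saga might look like the following. The event type names (MemberRegistered, SecureKeysActivated, SecureKeysDeactivated) and the dict-shaped events are assumptions made for this example; only the CreateSecureKey command name comes from the description above.

```python
class SecureKeySaga:
    """Tracks what it has learned from both services and issues each command once."""

    def __init__(self, send_command):
        self._send_command = send_command
        self._feature_active = False
        self._members = set()
        self._requested = set()  # members for which a key was already requested

    def handle(self, event):
        """Update local state from one event, then act if everything is known."""
        kind = event["type"]
        if kind == "SecureKeysActivated":
            self._feature_active = True
        elif kind == "SecureKeysDeactivated":
            self._feature_active = False
        elif kind == "MemberRegistered":
            self._members.add(event["member_id"])
        self._request_missing_keys()

    def _request_missing_keys(self):
        # Runs after every event, so arrival order and duplicates do not matter.
        if not self._feature_active:
            return
        for member_id in sorted(self._members - self._requested):
            self._requested.add(member_id)
            self._send_command({"type": "CreateSecureKey", "member_id": member_id})
```

Replaying old events from the event source into a fresh or fixed Saga instance goes through the same handle method, which is what makes re-triggering the behavior after a bug fix possible without special-casing.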

SOA Publishing Messages vs Calling Procedures

The project I am working on is moving from an n-tier to a SOA architecture so I have been reading up on good SOA practices. I'm struggling to understand the dynamic between avoiding RPC style services in favor of event driven services, and the requirement of User Interfaces to retrieve data and do it speedily.
So for instance, ideally a SOA architecture would be composed of repeatable business processes, wherein you could simply publish a message onto an ESB which would handle finding the services that handle that message. So rather than executing a procedure called "Setup New User" which sets out to do all the tasks related to new user setup, you would publish a message into the ESB that just contains the new user's details and has the appropriate document type "New User". The ESB would then find the services that handle that event, and they would do whatever domain-specific new user provisioning is required.
However, sometimes you just need data. Maybe you have a page that shows some list of user associated data. You can't just fire off a message into the ESB because you need data back and you need it now. Also, you aren't really triggering any business processes; you're just retrieving data from previously invoked business processes (the processes that caused the user to be associated with the data for instance). So to give a concrete example, maybe I just want to see the list of 10 Netflix movies a user has watched recently.
How do you reconcile these disparate types of services in a single SOA system?
In an ESB where an event-driven approach is followed, you have all kinds of listeners that detect events and act accordingly. These listeners may, for example, wait for direct messages arriving via some protocol at a certain endpoint. No matter what the trigger is, whether a purely business event that starts a business process or a technical call that just needs to retrieve some data, it is still an event that is handled by the ESB. So you are not technically breaking the event-driven approach; it is enforced by your ESB solution. Moreover, keep in mind that SOA doesn't impose such a limitation: you do not have to implement everything in an event-driven manner.
In your case (provided you don't have a dedicated BPM solution in place), I'd identify and implement two kinds of services on two purely conceptual layers in the ESB:
Technical services (the event is an incoming direct message for retrieval/modification of data), which can be either called directly by another system (via the ESB) or called by process services.
Process services on the top (business) layer that are triggered in an event-driven way (using a topic queue, for example, where process services listen for their triggering event).
However, this may not be the optimal approach. I've been discussing business processes in a dedicated business process layer versus process services in the ESB in this topic. Feel free to check it out, because it is related to your question.
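To illustrate the two conceptual layers, here is a toy Python sketch. MiniBus is not a real ESB API; it is a hypothetical in-process stand-in with one route for direct calls (technical services) and one for published events (process services).

```python
class MiniBus:
    """Toy stand-in for an ESB with direct (request/reply) and topic (publish) routes."""

    def __init__(self):
        self._technical = {}  # name -> callable returning a result
        self._process = {}    # topic -> list of callables

    def register_technical(self, name, service):
        self._technical[name] = service

    def subscribe_process(self, topic, service):
        self._process.setdefault(topic, []).append(service)

    def call(self, name, **params):
        """Synchronous call for callers that need data back now."""
        return self._technical[name](**params)

    def publish(self, topic, message):
        """Fire-and-forget event; returns how many process services were triggered."""
        services = self._process.get(topic, [])
        for service in services:
            service(self, message)  # process services may call technical ones
        return len(services)
```

A page that lists recently watched movies would use call, while a "New User" document would go through publish; both still pass through the same bus.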

Real Time Notification System using SignalR and Azure

I am trying to craft a Facebook-like notification system in our ASP.NET MVC website.
In one scenario, the notification system works like this:
User1 follows User2 by clicking the follow button in the MVC site
The MVC site sends a NotificationItem to the NotificationManager via an API request.
POST api/notifications/send
NotificationManager then processes this NotificationItem and saves it to Azure Table Storage
class NotificationManager {
    void SaveNotification(NotificationItem item) {
        // save it to azure table storage
    }
}
After the item is saved, the client (User2) subscribes to the notificationEvent via the NotificationHub (SignalR hub)
NotificationHub then notifies User2, sending along the processed notification data.
class NotificationHub: Hub {
    async Task NotifyUser(string recipientId) {
        // query data from storage and then process it
        var notificationData = await _repo.GetProcessedNotificationDataAsync(recipientId);
        Clients.Group(recipientId).notifyUser(notificationData);
    }
}
I tried to illustrate the CURRENT process and the architecture in this image
Now, what bothers me is this line of code in step number 5:
var notificationData= await _repo.GetProcessedNotificationDataAsync(recipientId);
Behind the scenes it queries the NotificationItem from storage, processes it into a user-readable notification (e.g. "User1 is now following you"), and then updates the status of the NotificationItem (IsSent, DateSent).
Needless to say, this is a somewhat "heavy" operation, and it is triggered in real time every time there is a new NotificationItem to be delivered or broadcast to each subscriber.
Obviously, I'm talking about a performance and scalability issue here. So I researched some possible technologies or approaches that could solve this problem, and it seems that using an Azure Service Bus backplane is a viable option:
SignalR Scaleout with Azure Service Bus
Introduction to Scaleout in SignalR
One particular paragraph explains some limitations of this approach:
Server broadcast (e.g., stock ticker): Backplanes work well for this scenario, because the server controls the rate at which messages are sent.
Client-to-client (e.g., chat): In this scenario, the backplane might be a bottleneck if the number of messages scales with the number of clients; that is, if the rate of messages grows proportionally as more clients join.
High-frequency realtime (e.g., real-time games): A backplane is not recommended for this scenario.
Now, this got me thinking, because in my case both Server Broadcast and Client-to-Client (and something in between) apply to what I am trying to achieve.
These are the notification event scenarios that I am currently working on (and their acceptance criteria):
Notify user for new followers (real-time or near-real-time notification)
Chat Messaging (real-time, will see if the chatter is currently typing)
Post a Status (real time or near real time)
Comment in a Post (real-time, will see if the chatter is currently typing)
After hours of thinking, what I have in mind right now (as a result of my lack of experience) is a notification system something like this
As you will notice, this is very different from our current design. The current design only uses 1 Azure web role.
In this one, there's a web role for the website, a web role for the hub, and a worker role to process the notification data, thereby distributing the "load" across 3 different roles as well as opening up the possibility of scaling out.
Now, these are the questions in my head right now:
Is this architecture right for what I am trying to achieve?
Since the notification is in the queue, can we still achieve "real-time" notification updates?
Since we separate the SignalR hub into another web role, how will we handle authorization?
Service Bus Queue or Azure Queue?
Any help will be appreciated.
Note: I intend to use only Azure technologies, as this project is really Azure and .NET oriented.
Yes, your architecture looks like it should do the job for a load-balanced scenario.
If you use a Service Bus queue it should be; you can integrate it pretty easily (use the NuGet package Microsoft.AspNet.SignalR.ServiceBus).
Auth will be fine as well, assuming your application is stateless and the auth cookie is still valid for both roles behind the load balancer.
Service Bus. Here is some more info on how it's done: http://www.asp.net/signalr/overview/performance/scaleout-with-windows-azure-service-bus
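The split between web role, worker role, and hub discussed in the question can be sketched language-neutrally. The Python below is only a toy model, not SignalR or Service Bus code: queue.Queue stands in for the queue, push_to_hub is a hypothetical callback for whatever the hub role exposes, and the field names are invented.

```python
import queue

notification_queue = queue.Queue()  # stand-in for a Service Bus queue
delivered = []                      # stand-in for the hub pushing to clients


def website_follow(follower, followee):
    """Web role: enqueue the raw item and return without doing the heavy work."""
    notification_queue.put({"type": "follow", "actor": follower, "recipient": followee})


def render(item):
    """Worker role: turn the raw item into user-readable text."""
    if item["type"] == "follow":
        return f"{item['actor']} is now following you"
    return f"New activity from {item['actor']}"


def worker_step(push_to_hub):
    """Worker role: take one item, process it, hand the finished payload to the hub."""
    item = notification_queue.get_nowait()
    push_to_hub(item["recipient"], {"text": render(item), "is_sent": True})
```

With this split the hub only forwards an already processed payload, so the "heavy" query-and-process step no longer runs inside the hub method.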
