API vs Event in Microservice approach - soa

What about "smart endpoints and dumb pipes" when it comes to different types of requests?
After reading about that principle, I thought it would be enough to subscribe to some events and handle them. But now I've realised that sometimes you need an open API (maybe not for end customers, but for the API Gateway, etc.). Is this OK? Or should you "eventize" (transform into an event) every request that comes into the microservices cloud?
So, for instance, you have Invoice and Order services.
It's clear that when an order is created you can publish an event that the Invoice service consumes to create an invoice. It's also clear that, to return the list of a user's latest orders, you can use CQRS on the Order service side or even create a new LastOrders service that keeps just a projection of the required data. But should this request be transformed into an event, or should LastOrders provide an API for it and listen for events to update its own DB?

We do it like this:
All commands are issued as messages in durable queues with type-based routing
Processing takes place in isolated handlers
REST POST and PUT endpoints are only created for the API that should be accessible from legacy/external systems
These "command"-style REST endpoints only form a command as a message and send it via the message bus
REST GET is perfect for fetching the data and we do not use messaging there, although we could have some message handlers to retrieve data for long-running processes that can only use messages
Command (message) handlers always publish events about what they have done or not done
Downstream event processing can do whatever they want by subscribing to these events
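
The list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `MessageBus` is a hypothetical in-memory stand-in for a durable broker, and the names `CreateOrder`, `OrderCreated`, `OrderRejected`, `post_order` and `get_order` are invented for the example, not taken from any real framework.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class CreateOrder:      # command: a request to do something
    order_id: str
    amount: float


@dataclass(frozen=True)
class OrderCreated:     # event: what the handler has done
    order_id: str


@dataclass(frozen=True)
class OrderRejected:    # event: what the handler has NOT done
    order_id: str
    reason: str


class MessageBus:
    """Hypothetical in-memory stand-in for a durable broker with type-based routing."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, message_type: type, handler: Callable) -> None:
        self._handlers[message_type].append(handler)

    def send(self, message: object) -> None:
        # Route by the message's type; a real bus would enqueue durably instead.
        for handler in list(self._handlers[type(message)]):
            handler(message)


bus = MessageBus()
orders: Dict[str, float] = {}   # the Order service's private data
invoices: List[str] = []        # the Invoice service's private data


def handle_create_order(cmd: CreateOrder) -> None:
    """Isolated command handler: always publishes an event about the outcome."""
    if cmd.amount <= 0:
        bus.send(OrderRejected(cmd.order_id, "amount must be positive"))
        return
    orders[cmd.order_id] = cmd.amount
    bus.send(OrderCreated(cmd.order_id))


def create_invoice(event: OrderCreated) -> None:
    """Downstream subscriber: reacts to the event however it likes."""
    invoices.append(event.order_id)


bus.subscribe(CreateOrder, handle_create_order)
bus.subscribe(OrderCreated, create_invoice)


def post_order(body: dict) -> int:
    """A 'command'-style REST POST: form the command, send it, return 202 Accepted."""
    bus.send(CreateOrder(body["order_id"], body["amount"]))
    return 202


def get_order(order_id: str):
    """REST GET reads the data directly; no messaging involved."""
    return orders.get(order_id)
```

Because this bus dispatches synchronously, the invoice exists as soon as `post_order` returns; with a real queue the caller would only know that the command was accepted.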

Related

Microservices without synchronous communication possible?

I know this question has already been asked in many ways and flavors; I want to add another angle and a concrete example.
Basically, I know that we should avoid synchronous communication; I was just wondering whether there are patterns to avoid it entirely. Let me give a short example of a situation that I wouldn't know how to make asynchronous:
I have a service that manages users, basically a DB that stores users, their configuration, etc.
Another service, the API Gateway, provides the endpoint to register a user. This is where communication becomes a problem: when the register endpoint is called, we somehow have to call the user service synchronously because we need, for example, the userId of the newly created user. This is a very abstract example, and the userId might not be needed in many cases, but in general I am curious about this pattern:
A service needs to call another service to create a new resource, but it needs some data from the newly created resource, either to return it to its caller or to create a local link between its own entities and the other service's entities.
Is there some pattern for this or is this just a place where synchronous communication needs to happen?
What you are describing is the Orchestration vs Choreography patterns:
In the Orchestration pattern a microservice invokes its dependencies directly, just like in your example, a microservice invokes another to register the user and then uses the userId from the response.
On the other hand, there is the Choreography pattern, where we use a message queue system, e.g., Kafka or RabbitMQ, to decouple the microservices. The same example would work as follows:
The API Gateway publishes an event (command) of type RegisterUser to the message queue, containing the user information.
The User-Manager microservice subscribes to events of type RegisterUser, and whenever it receives one it creates the user as usual.
The User-Manager must then let everyone know that the user was created, so it publishes another event of type UserCreated containing the user information, e.g., the userId.
Finally, the API Gateway subscribes to UserCreated events so that it can proceed with the flow.
With this approach the two microservices do not know each other; they are decoupled, and you can have any number of dependencies subscribing to the events, i.e., you can add new dependencies without changing the existing code.
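
A minimal sketch of that flow, assuming an in-memory `publish`/`subscribe` pair as a stand-in for Kafka or RabbitMQ. The `correlation_id` field is an assumption added here so that the requesting service can match a UserCreated event to its own request; all function names are hypothetical.

```python
import uuid
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# In-memory stand-in for a broker: topic name -> list of subscriber callbacks.
subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)


def subscribe(topic: str, handler: Callable[[dict], None]) -> None:
    subscribers[topic].append(handler)


def publish(topic: str, payload: dict) -> None:
    for handler in list(subscribers[topic]):
        handler(payload)


# --- The service that owns user data ---------------------------------------
users: Dict[str, dict] = {}


def on_register_user(event: dict) -> None:
    user_id = str(uuid.uuid4())
    users[user_id] = {"name": event["name"]}
    # Echo the correlation ID so the requester can match the reply to its request.
    publish("UserCreated", {"correlation_id": event["correlation_id"],
                            "user_id": user_id})


subscribe("RegisterUser", on_register_user)

# --- The service that needs the new user's ID ------------------------------
pending: Dict[str, str] = {}      # correlation_id -> name, still awaiting a reply
registered: Dict[str, str] = {}   # correlation_id -> user_id, once it is known


def request_registration(name: str) -> str:
    correlation_id = str(uuid.uuid4())
    pending[correlation_id] = name
    publish("RegisterUser", {"correlation_id": correlation_id, "name": name})
    return correlation_id


def on_user_created(event: dict) -> None:
    # Ignore UserCreated events that answer somebody else's request.
    if event["correlation_id"] in pending:
        del pending[event["correlation_id"]]
        registered[event["correlation_id"]] = event["user_id"]


subscribe("UserCreated", on_user_created)
```

Neither side imports or calls the other; each only knows the topic names and the payload shape. With a real broker the reply arrives later, so the requester has to keep the `pending` entry until the matching event shows up.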

Is there a specification for commands similar to events in OpenAPI?

I am building a microservice application and trying to follow best practices. I use event sourcing and event driven state transfer in many places, but I've realized that sometimes I just need to call another service asynchronously to ask it to do something, e.g. send out a registration email (the email service is a technical component, not a domain). I've noticed that services often just call another service's API endpoint, but that wouldn't be asynchronous. Since I don't expect a return value when calling from another service and the command would only produce events, RPC is not necessary.
In the end, my plan is to implement commands/actions that can be triggered by clients through a REST API (in which case the commands may also produce responses), or by events or other services through RabbitMQ or similar. This leads me to the question of how I should define the data structure of a command/action: is there a specification for that? Are there existing solutions for Python? Or should I do something differently?

How to handle data replication lag and event notification

We have a simple application which, upon every update of an entity, sends out a notification to SNS (it could just as well have been any other queuing system). Clients listen for these notifications and do a GET of the updated entity when one arrives.
The problem we are facing is that when clients do a GET, sometimes the data has not been replicated yet, and we return a 404 or, even worse, stale data.
How can we mitigate this while sending notifications?
Here are a few strategies to mitigate this, with pros and cons:
Instead of sending the notification from the application, send it using database streams
For example, DynamoDB Streams and AWS Lambda. This pattern can also be useful for multi-region deployments, where the publishers and subscribers in each region subscribe to their regional database stream. The atomicity of writing to the database and sending the message is preserved, and we won't lose events in the case of a regional failure.
Send delayed messages to your broker
Some brokers, like ActiveMQ and SQS, support this functionality, but SNS does not. A workaround could be to write to an SQS queue with a delay and have its consumer publish to SNS. This might be a good option when your database does not support streams.
Send a special error code for retryable GETs
Since we know the system is eventually consistent, we can return a special error code to clients so that they can retry based on it. The retry strategy should be exponential backoff. However, this may mean pushing your problems onto your clients. We should also have some sort of versioning in place, so that clients can tell stale data from fresh data.
Fetch from another region
If the entity is not found in the same region, the application can go to another region or to the master database to fetch it. NOTE: Don't do this, as it is an anti-pattern. I mention it here only for the sake of completeness.
Send the full entity in the message
If the entities fetched from the REST service are small and there are no security constraints around who can access what, we can send the full entity in the message. This ensures that clients don't have to do an explicit fetch every time a new message arrives.
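
The retry strategy with versioning can be sketched on the client side as follows. This is a sketch under assumptions: the notification is assumed to carry the entity's new version number, `fetch` is any callable that returns the entity as a dict with a `version` key (or `None` for a 404), and the names `fetch_with_backoff` and `NotYetReplicated` are hypothetical.

```python
import time
from typing import Callable, Optional


class NotYetReplicated(Exception):
    """Raised when the replica never caught up with the notified version."""


def fetch_with_backoff(
    fetch: Callable[[str], Optional[dict]],
    entity_id: str,
    min_version: int,
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """GET an entity, retrying while it is missing or older than min_version."""
    for attempt in range(max_attempts):
        entity = fetch(entity_id)
        # A missing entity (the 404 case) and a stale entity are both retryable.
        if entity is not None and entity["version"] >= min_version:
            return entity
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise NotYetReplicated(f"{entity_id} has not reached version {min_version}")
```

The `sleep` parameter is injectable so the function can be exercised without real waiting; production code would usually also add jitter and an upper bound on the delay.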

Microservices client acknowledgement and Event Sourcing

Scenario
I am building a courier service system using microservices. I am unsure about a few things; here is my scenario:
Booking API - This is where the customer places an order
Payment API - This is where we process the payment for a booking
Notification API - This service is responsible for sending the notification after everything is completed.
The system uses an event-driven architecture. When a customer places a booking order, I commit a local transaction in the Booking API and publish an event. The Payment API and Notification API are subscribed to their respective events. Once done, the Payment and Notification APIs need to acknowledge back to the Booking API.
My question is:
After publishing the event, my booking service can't block the call, so it returns to the client (front end). How will my client app check the status of the transaction, or know that the transaction is completed? Does it poll every couple of seconds? Since this is a distributed transaction, any service can go down and fail to acknowledge back. In that case, how would my client (front end) know, since it would keep waiting? I am considering a saga for distributed transactions.
What's the best way to achieve all of this?
Event Sourcing
I want to implement event sourcing to keep a complete history of the booking order. Do I have to implement this in my Booking API with its own event store? Or is the event store shared between services, since I am supposed to capture all the events from the different services? What's the best way to implement this?
Many Thanks,
The way I visualize this is as follows (influenced by Martin Kleppmann's talk here and here).
The end user places an order. The order is written to a Kafka topic. Since Kafka has log-structured storage, the order details are saved with minimal delay. It's an atomic operation ('A' in 'ACID'): all or nothing.
Now, as soon as the user places the order, the user would like to read it back (read-your-writes). To achieve this, we can write the order data to a distributed cache as well. Although dual writes are usually not a good idea because they may fail partially (e.g. writing to Kafka succeeds but writing to the cache fails), we can mitigate this risk by ensuring that one of the Kafka consumers writes the data to a database. So, even in the rare case of a cache failure, the user can eventually read the data back from the DB.
The status of the order in the cache as written at the time of order creation is "in progress"
One or more Kafka consumer groups are then used to handle the events as follows: the payment and notification are processed, and the final status is written back to the cache and the database
A separate Kafka consumer then receives the responses from the payment and notification APIs and writes the updates to the cache, the DB and a websocket
The websocket then updates the UI model, and the changes are reflected in the UI through event sourcing.
Further clarifications based on comment
The basic idea here is that, using streaming, we build a cache for every service with the data it needs. For example, the order service needs feedback from the payment and notification services. Therefore, we have these services write their responses to a Kafka topic whose consumers write the responses back to the order service's cache.
Given the durability guarantees of Kafka (or any similar technology), assuming appropriate replication and acknowledgement settings, an acknowledged message will not be lost. Eventually we get all or nothing; that's atomicity. If the order service fails to write the order, an error response is sent back to the client synchronously, and the user probably retries after some time. If the order service succeeds, the responses from the other services must eventually flow back to its cache. If one of the services is down for some time, its response will be delayed, but it will be sent eventually when the service resumes.
The clients need not poll. The result is propagated to them through streaming over a websocket. The UI page listens to the websocket. As the consumer writes the feedback to the cache, it can also write to the websocket, which notifies the UI. Then, if you use something like Angular or ReactJS, the appropriate section of the UI can be refreshed with the value received from the websocket. Until that happens, the user keeps seeing the status "in progress", as written to the cache at the time of order creation. Even if the user refreshes the page, the same status is retrieved from the cache. If the cache entry has expired or been evicted by an LRU mechanism, the same value is fetched from the DB and written back to the cache to serve future requests. Once the feedback from the other services is available, the new result is streamed over the websocket. On page refresh, the new status is available from the cache or the DB.
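
The read path and push path described above can be sketched like this. It is a simplified, single-process illustration: the "database" is a dict, the listeners are plain callbacks standing in for open websocket connections, and `write` updates the DB and the cache in one call, whereas in the design above a Kafka consumer would perform those writes. The class name `OrderStatusStore` is hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Optional


class OrderStatusStore:
    """Small LRU cache in front of a dict 'database', with push notifications."""

    def __init__(self, cache_size: int = 2) -> None:
        self._cache: "OrderedDict[str, str]" = OrderedDict()
        self._db: Dict[str, str] = {}
        self._cache_size = cache_size
        # Stand-in for websocket connections: callbacks invoked on every update.
        self._listeners: List[Callable[[str, str], None]] = []

    def on_update(self, listener: Callable[[str, str], None]) -> None:
        self._listeners.append(listener)

    def write(self, order_id: str, status: str) -> None:
        self._db[order_id] = status
        self._put_in_cache(order_id, status)
        for listener in self._listeners:
            listener(order_id, status)      # push, so the client need not poll

    def read(self, order_id: str) -> Optional[str]:
        if order_id in self._cache:
            self._cache.move_to_end(order_id)   # mark as most recently used
            return self._cache[order_id]
        status = self._db.get(order_id)
        if status is not None:
            self._put_in_cache(order_id, status)  # repopulate after eviction
        return status

    def _put_in_cache(self, order_id: str, status: str) -> None:
        self._cache[order_id] = status
        self._cache.move_to_end(order_id)
        while len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)       # evict least recently used
```

A page refresh maps to `read`, which returns "in progress" until a later `write` with the final status arrives, at which point the listeners are notified as well.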
You can pass an identifier back to the client once the booking is completed, and the client can use this identifier to query the status of the subsequent actions, provided you can connect them on the back end. You can also send a notification back to the client when other events are completed. You can use long polling, or you can use notifications.
Thanks, skjagini. Part of my question is how to handle the case where other microservices don't get back in time, or ever. Let's say the payment API has finished its work and charged the client, but didn't notify my order service in time, or only after a very long time. How does my client wait? If we time out the client, the backend may still process the request after the timeout.
In CQRS, you separate commands from queries. For your scenario, you can implement all interactions using queues. (There are multiple ways to implement CQRS with event sourcing, but in its simplest form:)
The client sends a request --> the Payment API receives the request --> validates the request (if validation fails, returns an error to the user) --> on successful validation --> generates a GUID and writes the request message to the payment queue --> passes the GUID to the user
The Payment API subscribes to the payment queue --> after processing the request --> writes to the order queue or any other queues
The Order API subscribes to the order queue and processes the request.
The user has a GUID with which they can get the data for all the interactions.
If you use pub/sub as in Kafka instead of queues, all subsequent systems can read from the same topic, so you don't need to write to each queue.
If any of the services fails to process a request, it should be able to pick up where it left off once it is restarted. If a service goes down in the middle of a transaction, your system should remain in a stable condition as long as the service rolls back its respective changes.
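
The GUID flow above can be sketched with two in-memory queues. This is an illustration only: `deque` objects stand in for real queues, the workers are called by hand instead of running as separate services, and the status strings and function names are hypothetical.

```python
import uuid
from collections import deque
from typing import Deque, Dict

payment_queue: Deque[dict] = deque()
order_queue: Deque[dict] = deque()
status_by_guid: Dict[str, str] = {}   # what the user can query with the GUID


def submit_payment(request: dict) -> str:
    """Validate, enqueue, and hand the caller a GUID for later status queries."""
    if request.get("amount", 0) <= 0:
        # Validation errors go straight back to the user; nothing is enqueued.
        raise ValueError("amount must be positive")
    guid = str(uuid.uuid4())
    status_by_guid[guid] = "payment pending"
    payment_queue.append({"guid": guid, **request})
    return guid


def payment_worker() -> None:
    """Payment side: drain the payment queue, then pass work to the order queue."""
    while payment_queue:
        message = payment_queue.popleft()
        status_by_guid[message["guid"]] = "payment processed"
        order_queue.append(message)


def order_worker() -> None:
    """Order side: drain the order queue and finish the request."""
    while order_queue:
        message = order_queue.popleft()
        status_by_guid[message["guid"]] = "order completed"


def get_status(guid: str) -> str:
    """Query side: the GUID is the only thing the user needs to keep."""
    return status_by_guid.get(guid, "unknown")
```

If a worker is down, its queue simply grows and the status stays at the previous step, which is what the client sees when it queries with the GUID.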
I'm not 100% sure what you are asking, but it sounds like you should be using a messaging service. As @Saptarshi Basu mentioned, Kafka is good. I would really recommend NATS, although I'm biased because that's the one I work with.
With NATS you can create request-reply messages to interface between the client and the booking service. That's 1-to-1 communication.
If you have multiple instances of each of your services running, you can use the queuing feature to load balance automatically. NATS will just randomly select an instance for you.
And then you can use pub-sub feeds for communication between all of your services.
This will give you a very resilient and scalable architecture, and NATS makes it all incredibly easy

What's the recommended way to handle microservice processing bugs and new insights?

Before I get to my question, let me sketch out a sample set of microservices to illustrate my dilemma.
Scenario outline
Suppose I have 4 microservices:
An activation service where features supplied to our customers are (de)activated.
A registration service where members can be added and changed.
A secured key service that is able to generate secure keys (in a multi-step process) for members to be used when communicating with them with the outside world.
A communication service that is used to communicate about our members with external vendors.
The secured key service may however only request secured keys if this is a feature that is activated. Additionally, the communication service may only communicate about members that have a secured key AND if the communication feature itself is activated.
Because they are microservices, each of the services has its own datastore and is completely self-sufficient. That is, any data that is required from the other microservices is duplicated locally and kept in sync by means of asynchronous messages from the other microservices.
The dilemma
I'm actually facing two main dilemmas. The first is (pretty obviously) data synchronization. When there are multiple data stores that need to be kept in sync, you have to account for messages getting lost or processed out of order. But there are plenty of out-of-the-box solutions for this, and when all else fails you could even fall back to some kind of ETL process to keep things in sync.
The main issue I'm facing however is the actions that need to be performed. In the above example the secured key service must perform an action when it either
Receives a message from the registration service for a new member when it already knows that the secured keys feature is active in the activation service
Receives a message from the activation service that the secured keys feature is now active when it already knows about members from the registration service
In both cases this means that a message from the external system must lead to both an update in the local copy of the data as well as some logic that needs to be processed.
The question
Now to the actual question :)
What is the recommended way to cope with either bugs or new insights when it comes to handling those messages? Suppose there is a bug in the message handler from the activation service. The handler does update the internal data structure, but it fails to detect that there are already registered members and thus never starts the secure key generation process. Alternatively it could be that there's no bug, but we decide that there is something else we want the handler to do.
The system will have no reason to resubmit or reprocess messages (as the message didn't fail), but there's no real way for us to re-trigger the behavior that's behind the message.
I hope it's clear what I'm asking (and I do apologize if it should be posted on any of the other 170 Stack... sites, I only really know of StackOverflow)
I don't know what the recommended way is, but I know how this is done in DDD, and maybe that can help you, as DDD and microservices are friends.
What you have is a long-running, multi-step process that involves information from multiple microservices. In DDD this can be implemented using a Saga/Process manager. The Saga maintains a local state by subscribing to events from both the registration service and the activation service. As the events arrive, the Saga checks whether it has all the information it needs to generate secure keys and, if so, submits a CreateSecureKey command. The events may arrive in any order and may even be duplicated, but this is not a problem, as the Saga can compensate for this.
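
A minimal sketch of such a Saga, assuming one feature flag and a set of member IDs as its local state. The class and method names are hypothetical, and the `commands` list stands in for a real command bus.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SecureKeySaga:
    """Tracks both event streams and emits CreateSecureKey once per member."""

    feature_active: bool = False
    known_members: Set[str] = field(default_factory=set)
    keys_requested: Set[str] = field(default_factory=set)
    commands: List[dict] = field(default_factory=list)  # stand-in for a command bus

    def on_feature_activated(self) -> None:
        # Event from the activation service.
        self.feature_active = True
        self._request_missing_keys()

    def on_member_registered(self, member_id: str) -> None:
        # Event from the registration service.
        self.known_members.add(member_id)
        self._request_missing_keys()

    def _request_missing_keys(self) -> None:
        if not self.feature_active:
            return
        # sorted() only makes the command order deterministic for this sketch.
        for member_id in sorted(self.known_members - self.keys_requested):
            self.keys_requested.add(member_id)
            self.commands.append({"type": "CreateSecureKey", "member_id": member_id})
```

Both handlers run the same check, so the order in which events arrive does not matter, and remembering `keys_requested` makes duplicated events harmless.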
In case of bugs or new features, you could create special scripts or other processes that search for a particular situation and handle it by submitting specific compensating commands, without reprocessing all the past events.
In case of new features you may even have to process old events that have now become interesting for your business process. You do this in the same way, by querying the event source for the newly interesting old events and sending them to the newly updated Saga. After that import process, you subscribe the Saga to these newly interesting events and the Saga continues to function as usual.
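
The import step can be sketched as a replay over stored events. The `event_store` list, its `seq` and `type` fields, and the event type names are assumptions made for this example; a real event store would offer its own query API.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical stored events; seq is an assumed monotonically increasing position.
event_store: List[dict] = [
    {"seq": 1, "type": "MemberRegistered", "member_id": "m-1"},
    {"seq": 2, "type": "MemberAddressChanged", "member_id": "m-1"},
    {"seq": 3, "type": "MemberRegistered", "member_id": "m-2"},
]


def replay(
    events: Iterable[dict],
    interesting_types: Set[str],
    handler: Callable[[dict], None],
) -> int:
    """Feed old events of the given types to a handler, in stored order.

    Returns the last position seen, so a live subscription can start after it.
    """
    last_seq = 0
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["type"] in interesting_types:
            handler(event)
        last_seq = event["seq"]
    return last_seq
```

After the replay, the Saga would subscribe to the same event types starting from the returned position, so that no event is handled twice or skipped.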

Resources