Although Pact supports testing of messages, I find that the recommended flow in the "Pact Nirvana" guide doesn't quite match the flow that I understand an event-driven application needs.
Let's say we have an Order management service and a Shipping management service.
The Shipping service emits ShippingPreparedEvents that are received by the Order service.
If we deleted a field from the ShippingPreparedEvent, I'd expect to first change the Order service so that it stops reading the old field and deploy it, and then make the change in the Shipping service and deploy that.
That way, there wouldn't be any downtime on the services.
However, I believe Pact would expect to deploy the Shipping service first (it's the provider of the event) so that the contract can be verified before deploying the consumer. In this case, deploying the provider first will break my consumer.
Can this situation be avoided somehow? Am I missing anything?
Just to provide more context, this link shows that different kinds of changes require a different order of deployment: https://docs.confluent.io/current/schema-registry/avro.html#summary
I won't be using Kafka or Avro, but I believe my flow would be similar.
Thanks a lot.
I agree with that approach: change and deploy the Order service first so that it stops reading the field, then remove the field from the Shipping service and deploy it. What specifically in the Pact Nirvana guide gives you the impression this isn't the way to go? Pact (and the Pact Broker) don't actually care about the order of deployments.
In your case, removing the field from the Shipping service first would cause its can-i-deploy check to fail, because removing the field would break the Order Management Service that is still in production. The only approach is to remove the field usage from the consumer, publish a new version of that contract, and deploy the consumer to production first.
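The can-i-deploy reasoning above can be modelled in a few lines. This is a minimal sketch, not Pact's implementation: it assumes each production consumer version's contract can be reduced to the set of ShippingPreparedEvent fields it reads, and the names `ConsumerContract`, `can_deploy_provider`, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class ConsumerContract:
    """Hypothetical summary of one consumer version's contract: the event fields it reads."""
    consumer: str
    version: str
    fields_read: FrozenSet[str]


def can_deploy_provider(
    emitted_fields: FrozenSet[str], prod_contracts: Iterable[ConsumerContract]
) -> Tuple[bool, List[Tuple[str, str, List[str]]]]:
    """Return (ok, problems); ok is False if a production consumer reads a field no longer emitted."""
    problems = []
    for contract in prod_contracts:
        missing = contract.fields_read - emitted_fields
        if missing:
            problems.append((contract.consumer, contract.version, sorted(missing)))
    return (not problems, problems)


# The new Shipping service version drops the hypothetical "carrier" field.
new_provider_fields = frozenset({"orderId", "trackingNumber"})

# Order service v1 (in production) still reads "carrier", so the provider cannot deploy yet.
order_v1 = ConsumerContract("order-service", "1", frozenset({"orderId", "carrier"}))
print(can_deploy_provider(new_provider_fields, [order_v1]))

# Once Order service v2 (which ignores "carrier") is in production, the provider can go.
order_v2 = ConsumerContract("order-service", "2", frozenset({"orderId"}))
print(can_deploy_provider(new_provider_fields, [order_v2]))
```

The deployment order falls out of the check itself: the provider change only passes once the consumer change is what production is running.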
As an example:
1. The provider updates a field that existing consumers use.
2. The provider pushes up its changes, which triggers a provider verification within the provider build.
3. The provider verification tests against all consumers tagged with 'prod'.
4. The verification fails, as expected, since the prod consumers still expect the unchanged version of the field.
5. The consumer services are notified, make the appropriate changes to their expectations, and publish the new contracts to the broker.
6. A webhook triggers a provider verification.
This is where I'm lost. In this step, do we explicitly want the provider build to check out the branch that contains the change and use it for the provider verification? If we use the production version of the provider here, the verification would fail, since the changed fields aren't available yet.
This guide explains the process and should clarify for you: https://docs.pact.io/pact_nirvana
But more generally, I think you want to introduce breaking changes into a system in a way that is backwards compatible at first, so that they don't require such coupling. So first, if you're changing or breaking a field in an API, I would have the consumers stop using the field (or, if you can use the expand/contract approach, where you add a new field with the new behaviour, go that way) and get them into production. Then it won't matter which branch the changes run against. If you don't want to go that way, the tagging/branching section in the aforementioned docs should help.
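A minimal sketch of the expand/contract approach for an event consumer, assuming a hypothetical rename of `deliveryDate` to `estimatedDeliveryDate`. During the expand phase the provider emits both fields and the consumer prefers the new one, so the deployment order of the two sides stops mattering. All field and function names are illustrative only.

```python
from typing import Optional


def build_event_expand_phase(order_id: str, date: str) -> dict:
    """Provider during the expand phase: emit the new field and keep the old one."""
    return {"orderId": order_id, "estimatedDeliveryDate": date, "deliveryDate": date}


def build_event_contract_phase(order_id: str, date: str) -> dict:
    """Provider after every consumer has migrated: the old field is gone."""
    return {"orderId": order_id, "estimatedDeliveryDate": date}


def read_delivery_date(event: dict) -> Optional[str]:
    """Consumer that tolerates old, expanded and contracted events."""
    if "estimatedDeliveryDate" in event:
        return event["estimatedDeliveryDate"]
    return event.get("deliveryDate")


old_event = {"orderId": "42", "deliveryDate": "2024-05-01"}
for event in (
    old_event,
    build_event_expand_phase("42", "2024-05-01"),
    build_event_contract_phase("42", "2024-05-01"),
):
    print(read_delivery_date(event))
```

The provider should only move to the contract phase once every consumer version in production reads the new field.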
Suppose we have a site on Google Firebase Hosting that routes some requests to a Google Cloud Run service. The service is considered entirely an implementation detail and its only client is the single website. The only reason for using a Cloud Run service is that it is the only suitable technical option within the Firebase platform.
Now, suppose that the API of the service may have a breaking change with every update, so the Firebase Hosting content must change too. How do you update or roll back both parts together so as to avoid incompatibilities?
Straightforwardly, we can update the service and the site content in separate steps, but that means some requests from the old revision of the site may reach the new revision of the service or the other way around, causing errors due to API mismatch. The same issues are present when rolling back the site content and the service at the same time.
One theoretical solution would be to deterministically route requests to different service revisions based on revision labels, but that is not supported on Cloud Run.
One realistic solution would be to create a new service for every update of the site content. However, that would result in unbounded accumulation of services which are not automatically deleted like service revisions are.
Another solution (proposed below) would be to maintain backwards compatibility in the service - it would support both the latest and the previous API version. However, this can be considered an unnecessary overhead. Since the two parts (static content and the service) have no real need to ever be updated independently, it would be very convenient to avoid the overhead of maintaining backwards compatibility in the service.
As far as I know, there is no way to make this update in a single transaction to avoid the behavior you mentioned, as Firebase Hosting and Cloud Run are different products.
Also, a good practice in API design is to allow service evolution: updating the API should not break the apps consuming it, and new versions of the apps should be able to evolve in a way that lets them consume the current API.
When a new API cannot remain backwards compatible, a common approach is to expose different endpoints. This is why some APIs use paths like apiName/v1/method and apiName/v2/method; in this case, both versions of the API are deployed.
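A minimal sketch of side-by-side versioning, written as a plain dispatch table rather than for any particular web framework. The payload shapes and the name split are hypothetical; the point is that the old revision of the site can keep calling v1 while the new revision calls v2.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], dict]


def method_v1(params: dict) -> dict:
    """Original contract: a single full name (hypothetical payload)."""
    return {"fullName": params["name"]}


def method_v2(params: dict) -> dict:
    """Breaking change: the name is split into two fields."""
    first, _, last = params["name"].partition(" ")
    return {"firstName": first, "lastName": last}


# Both versions stay deployed side by side; old clients keep calling v1.
ROUTES: Dict[str, Handler] = {
    "/apiName/v1/method": method_v1,
    "/apiName/v2/method": method_v2,
}


def dispatch(path: str, params: dict) -> Tuple[int, dict]:
    """Return an (HTTP status, body) pair for the requested versioned path."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, {"error": "unknown endpoint"}
    return 200, handler(params)


print(dispatch("/apiName/v1/method", {"name": "Ada Lovelace"}))
print(dispatch("/apiName/v2/method", {"name": "Ada Lovelace"}))
```

v1 can be removed once no deployed site revision calls it any more, which is exactly the backwards-compatibility overhead the question hopes to avoid.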
Knowing full well that there are many types of workflows for integrating Pact, I'm trying to visualize what a common workflow looks like. I developed this swimlane diagram for the Pact Broker workflow.
How do we run a Provider verification on an older Provider build?
How does this change with tags?
When does the webhook get created back to the Provider?
What if different Providers have different base URLs (i.e. build systems)?
How does a new Provider build alert about the Consumers if the Provider fails?
Am I thinking about this flow correctly?
I've tried to build my understanding from the Webhooks, Using pact where the consumer team is different from the provider team, and Publishing verification results to a Pact Broker docs. Assuming I am thinking about the problem the right way and did not completely miss some documentation, I'd gladly write up a recommended workflow document for the community.
Your swimlane diagram is a good picture of the workflow, with the caveat that once everything is all set up, it's rare to manually start provider builds from the broker.
The provider doesn't ever notify the consumers about verification failure (or success) in the process. If it did, then you could end up with circular builds.
I think about it like this:
The consumer tests create a contract (the Pact file).
This step also verifies that the consumer can work with a provider that fulfils that contract (using the mock provider).
Then, the consumer gives this Pact file to the broker (if configured to do so).
Now that there's a new pact, the broker (if configured) can trigger a provider build.
The provider's CI infrastructure builds the provider and runs the pact verification.
The provider's CI infrastructure (if configured) tells the broker about the verification result.
The broker and the provider's build system are the only bits that know about the verification result - it isn't passed back to the consumer at the moment.
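To make the flow above concrete, here is a toy in-memory model of it. It is not the Pact Broker API: `InMemoryBroker`, `make_provider_ci`, and reducing a pact to a set of expected fields are all simplifying assumptions. It shows that the verification result ends up in the broker and nowhere else.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Key = Tuple[str, str]  # (consumer name, consumer version)


@dataclass
class InMemoryBroker:
    """Toy stand-in for a Pact Broker: stores pacts, fires webhooks, records results."""
    pacts: Dict[Key, Set[str]] = field(default_factory=dict)
    results: Dict[Key, bool] = field(default_factory=dict)
    webhooks: List[Callable[[Key, Set[str]], None]] = field(default_factory=list)

    def publish_pact(self, consumer: str, version: str, expected_fields: Set[str]) -> None:
        key = (consumer, version)
        self.pacts[key] = expected_fields
        for hook in self.webhooks:  # a new pact triggers the provider build
            hook(key, expected_fields)

    def record_verification(self, key: Key, success: bool) -> None:
        self.results[key] = success


def make_provider_ci(broker: InMemoryBroker, provided_fields: Set[str]) -> Callable[[Key, Set[str]], None]:
    """Hypothetical provider CI job: verify the pact and report the result to the broker only."""
    def run_verification(key: Key, expected_fields: Set[str]) -> None:
        success = expected_fields <= provided_fields
        broker.record_verification(key, success)
    return run_verification


broker = InMemoryBroker()
broker.webhooks.append(make_provider_ci(broker, {"id", "status"}))

# The consumer tests passed locally (against the mock provider), so the pacts are published.
broker.publish_pact("web-app", "1.0.0", {"id", "status"})
broker.publish_pact("web-app", "1.1.0", {"id", "status", "eta"})

# Only the broker knows the outcome; publish_pact returns nothing to the consumer.
print(broker.results)
```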
A consumer that is passing the tests means the consumer can say "I've written this communication contract and confirmed that I can hold up my side of it". Failure to verify the contract at the provider end doesn't change this statement.
However, if the verification succeeds, you may want to trigger a consumer deployment. As Beth Skurrie (one of the primary contributors to Pact) points out in the comments below:
Communicating the status of the verification back to the consumer is actually a highly important thing, as it tells the consumer whether or not they can be deployed safely. It is the missing part of the pact workflow at the moment, and I'm working away as fast as I can to rectify this.
Currently, since the verification status is information you might like to know about (especially if you're unable to see the provider's CI infrastructure), you might like to check out the Pact build badges, which are a lighter way of checking the broker.
Before I get to my question, let me sketch out a sample set of microservices to illustrate my dilemma.
Scenario outline
Suppose I have 4 microservices:
An activation service, where features supplied to our customers are (de)activated.
A registration service, where members can be added and changed.
A secured key service, which can generate secure keys (in a multi-step process) for members, to be used when communicating about them with the outside world.
A communication service, used to communicate about our members with external vendors.
The secured key service may, however, only request secured keys if this feature is activated. Additionally, the communication service may only communicate about members that have a secured key AND if the communication feature itself is activated.
Because they are microservices, each of the services has its own datastore and is completely self-sufficient. That is, any data that is required from the other microservices is duplicated locally and kept in sync by means of asynchronous messages from the other microservices.
The dilemma
I'm actually facing two main dilemmas. The first is (pretty obviously) data synchronization. When there are multiple data stores that need to be kept in sync, you have to account for messages getting lost or processed out of order. But there are plenty of out-of-the-box solutions for this, and when all else fails you could even fall back to some kind of ETL process to keep things in sync.
The main issue I'm facing however is the actions that need to be performed. In the above example the secured key service must perform an action when it either
Receives a message from the registration service for a new member when it already knows that the secured keys feature is active in the activation service
Receives a message from the activation service that the secured keys feature is now active when it already knows about members from the registration service
In both cases this means that a message from the external system must lead to both an update in the local copy of the data as well as some logic that needs to be processed.
The question
Now to the actual question :)
What is the recommended way to cope with either bugs or new insights when it comes to handling those messages? Suppose there is a bug in the message handler from the activation service. The handler does update the internal data structure, but it fails to detect that there are already registered members and thus never starts the secure key generation process. Alternatively it could be that there's no bug, but we decide that there is something else we want the handler to do.
The system will have no reason to resubmit or reprocess messages (as the message didn't fail), but there's no real way for us to re-trigger the behavior that's behind the message.
I hope it's clear what I'm asking (and I apologize if this should have been posted on one of the other 170 Stack... sites; I only really know of StackOverflow).
I don't know what the recommended way is, but I know how this is done in DDD, and maybe this can help you, as DDD and microservices are friends.
What you have is a long-running, multi-step process that involves information from multiple microservices. In DDD this can be implemented using a Saga (process manager). The Saga maintains local state by subscribing to events from both the registration service and the activation service. As the events come in, the Saga checks whether it has all the information it needs to generate secure keys and, if so, submits a CreateSecureKey command. The events may arrive in any order and can even be duplicated, but this is not a problem, as the Saga can compensate for this.
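A minimal sketch of such a Saga for the secured key service, kept in memory for clarity. The handler names, the `CreateSecureKey` command tuple, and the outbox list are assumptions; a real process manager would persist its state and send commands through your messaging infrastructure.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Command = Tuple[str, str]


@dataclass
class SecureKeySaga:
    """Process manager state for the secured key service (hypothetical names)."""
    feature_active: bool = False
    members: Set[str] = field(default_factory=set)
    keys_requested: Set[str] = field(default_factory=set)
    commands: List[Command] = field(default_factory=list)  # outbox of commands to send

    def on_member_registered(self, member_id: str) -> None:
        """Handle an event from the registration service; duplicates are harmless."""
        self.members.add(member_id)
        self._try_create_keys()

    def on_feature_activated(self) -> None:
        """Handle an event from the activation service; duplicates are harmless."""
        self.feature_active = True
        self._try_create_keys()

    def _try_create_keys(self) -> None:
        # Only act once both pieces of information are known, whatever their arrival order.
        if not self.feature_active:
            return
        for member_id in sorted(self.members - self.keys_requested):
            self.keys_requested.add(member_id)
            self.commands.append(("CreateSecureKey", member_id))


saga = SecureKeySaga()
saga.on_member_registered("m1")  # feature not active yet: nothing to do
saga.on_feature_activated()      # now m1 can get a key
saga.on_member_registered("m2")
saga.on_member_registered("m1")  # duplicate event: no second command
print(saga.commands)
```

Because the decision is taken in one place against the combined state, both triggers from the question (new member after activation, activation after members exist) go through the same check.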
In case of bugs or new features, you could create special scripts or other processes that search for a particular situation and handle it by submitting specific compensating commands, without reprocessing all the past events.
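For the bug described in the question (the activation handler never started key generation for members who were already registered), such a repair script could look like the sketch below. It assumes the service can read its local copy of registered members and the set of members that already have keys; the function name and command tuple are hypothetical, and the resulting commands would still need to be sent to whatever handles CreateSecureKey.

```python
from typing import Iterable, List, Set, Tuple

Command = Tuple[str, str]


def find_compensating_commands(
    registered_members: Iterable[str],
    feature_active: bool,
    members_with_keys: Set[str],
) -> List[Command]:
    """One-off repair: find members the buggy handler skipped and build the missing commands."""
    if not feature_active:
        return []
    return [
        ("CreateSecureKey", member_id)
        for member_id in sorted(set(registered_members))
        if member_id not in members_with_keys
    ]


# Hypothetical snapshot of the secured key service's local data after the bug.
commands = find_compensating_commands(["m1", "m2", "m3"], True, {"m2"})
print(commands)
```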
In case of new features, you may even have to process old events that are now interesting for your business process. You do this in the same way, by querying the event source for the newly interesting old events and sending them to the newly updated Saga. After that import process, you subscribe the Saga to these newly interesting events and the Saga continues to function as usual.
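The backfill-then-subscribe step can be sketched as follows. The event store is assumed to be an ordered iterable of dicts with a `type` key, and the subscription registry is a plain dict; both, along with the event type names, are hypothetical stand-ins for your actual event source and messaging setup.

```python
from typing import Callable, Dict, Iterable, List, Set

Handler = Callable[[dict], None]


def backfill_then_subscribe(
    event_store: Iterable[dict],
    newly_interesting: Set[str],
    handler: Handler,
    subscriptions: Dict[str, List[Handler]],
) -> int:
    """Replay matching historical events into the updated Saga, then subscribe it to new ones."""
    replayed = 0
    for event in event_store:  # assumed to yield events in their original order
        if event["type"] in newly_interesting:
            handler(event)
            replayed += 1
    for event_type in sorted(newly_interesting):
        subscriptions.setdefault(event_type, []).append(handler)
    return replayed


history = [
    {"type": "MemberRegistered", "memberId": "m1"},
    {"type": "MemberAddressChanged", "memberId": "m1"},
    {"type": "MemberAddressChanged", "memberId": "m2"},
]
seen: List[str] = []
subs: Dict[str, List[Handler]] = {}
count = backfill_then_subscribe(
    history, {"MemberAddressChanged"}, lambda e: seen.append(e["memberId"]), subs
)
print(count, seen)
```

Events published between the end of the replay and the subscription could be missed, so in practice you would record the position of the last replayed event and resume from it; the Saga's handlers should then be idempotent, since some events may be seen twice.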
I need to change attributes of an app, and I understand I can do it with management server API calls.
The two issues with using the management server APIs are:
performance: it’s making calls to the management server when it might be possible to do this directly in the message processor. Performance issues can probably be mitigated with caching.
availability: having to use management server APIs means that the system depends on the management server being available, whereas doing it directly in the proxy itself would reduce the number of failure points.
Any recommended alternatives?
Ultimately, all entities are stored in Cassandra (for the runtime).
Your best choice is to use the AccessEntity policy to get any information about an entity. That would not hit the MS (Management Server). But just for your information, most of the time you do not even need an AccessEntity policy: when you use a VerifyAPIKey policy or an OAuthV2 policy with the VerifyAccessToken operation, all the related entity details are made available as flow variables by the MP (Message Processor). So no additional AccessEntity calls should be required.
When you are updating any entity (like a developer or an application), I would assume it is a management use case rather than a runtime one. Hence using the management APIs should be fine.
If your use case requires a runtime API call that in turn updates an attribute of the application, then that attribute possibly should not be part of the application. Think about moving it to a cache, a KVM (key value map), or some other place you can access from the MP (just a thought, without fully knowing the use case).
The design of the system is that all entity editing goes through the Management Server, which in turn is responsible for performing the edits in a performant and scalable way. The Management Server is also responsible for knowing which message processors need to be informed of the changes via ZooKeeper registration. This also ensures that if a given Message Processor is unavailable because it, for example, is being upgraded, it will get the updates whenever it becomes available. The Management Server is the source of truth.
In the case of developer app attributes (or really any app metadata), the values are cached for 3 minutes (I think), so the Message Processor may not see the new values for up to 3 minutes.
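The effect of that cache on a proxy can be illustrated with a simple time-to-live cache. This is a sketch under assumptions, not how the Message Processor is implemented: the 180-second TTL mirrors the roughly 3 minutes mentioned above, and the key name, the dict standing in for the Management Server, and the fake clock are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Read-through cache: values may be stale for up to ttl_seconds after the source changes."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 180.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: the source is not consulted
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value


# Hypothetical source of truth (standing in for the Management Server) and a fake clock.
source = {"app-1:tier": "gold"}
fake_now = [0.0]
cache = TTLCache(source.__getitem__, ttl_seconds=180.0, clock=lambda: fake_now[0])

print(cache.get("app-1:tier"))     # loaded from the source: "gold"
source["app-1:tier"] = "platinum"  # attribute updated via the management API
fake_now[0] = 100.0
print(cache.get("app-1:tier"))     # still the cached "gold"
fake_now[0] = 181.0
print(cache.get("app-1:tier"))     # entry expired, "platinum" is reloaded
```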
As for availability, the Management Server is designed to be highly available, relying on the same underlying architecture as the message processor design.