Streaming Airflow events to a message queue

My goal is to easily export events from Airflow into other systems. One option is to create a plugin that can access Airflow's internal state and expose it via a REST API (there are some existing implementations), but what I'm more concerned about is whether it would be possible to plug into Airflow's event log and stream those messages to an external message queue (e.g. Kafka, PubSub, Kinesis).

The easiest way I could imagine accomplishing this is by using the sqlalchemy.event.listens_for decorator, attached to the various Airflow models, and filtering for the model events you're looking to shuttle off to the message queue.
You could do this in an airflow_local_settings module so that it's loaded up automatically on startup. Then place some extra configuration values in your airflow.cfg file that drive the settings for the remote message queue.
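
For illustration, here is a minimal sketch of that idea, assuming Airflow's Log model (the table behind the event log) and a kafka-python producer; the [event_stream] section, its keys, and the topic name are made-up examples for this sketch, not built-in Airflow settings:

# airflow_local_settings.py -- Airflow imports this module automatically at startup
import json

from kafka import KafkaProducer          # assumes kafka-python is installed
from sqlalchemy import event

from airflow.configuration import conf
from airflow.models.log import Log       # the model behind Airflow's event log

# Hypothetical custom section added to airflow.cfg, e.g.:
# [event_stream]
# bootstrap_servers = kafka:9092
# topic = airflow-events
BOOTSTRAP_SERVERS = conf.get("event_stream", "bootstrap_servers", fallback="localhost:9092")
TOPIC = conf.get("event_stream", "topic", fallback="airflow-events")

producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP_SERVERS,
    value_serializer=lambda payload: json.dumps(payload).encode("utf-8"),
)

@event.listens_for(Log, "after_insert")
def publish_log_entry(mapper, connection, target):
    """Ship every new event-log row to Kafka as soon as it is inserted."""
    producer.send(TOPIC, {
        "event": target.event,
        "dag_id": target.dag_id,
        "task_id": target.task_id,
        "execution_date": str(target.execution_date),
        "owner": target.owner,
    })

The same pattern extends to other models such as TaskInstance or DagRun, and swapping the Kafka producer for a PubSub or Kinesis client is mechanical. Keep in mind that every Airflow process that imports the settings module will create its own producer, so you may want to initialize it lazily.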

Related

In Mule 4 is there any way to run the Mule Batch in synchronous mode

I have done several projects using the Mule batch component.
In the present case we have a situation where we need to depend on the output produced by the Mule batch component. In my case it creates a file in asynchronous mode which contains the information below.
studentId,Name,Status
1,bijoy,enrolled
2,hari,not_enrolled
3,piyush,enrolled
But as it is running in asynchronous mode, I cannot rely on the data.
My question is: is there any way to run a Mule Batch (Mule 4) synchronously?
No, by design it is not possible to run a Batch synchronously within the flow from which it is called.
As an alternative you could put the logic that you want to execute after the batch in a separate flow that listens to a VM queue. In the On Complete phase of the batch you can send a message to that VM queue. The listening flow can't receive batch data directly, but for a file it should be OK.
Having said that, file exchange is not a very good method of exchanging information inside an application. I would recommend exploring alternatives like databases for transient data, unless you just need the file in order to send it elsewhere.

Axon4 - Re-queue failed messages

In the below scenario, what would be the behavior of Axon:
The Command Bus received the command.
It creates an event.
However, the messaging infra is down (say Kafka).
Does Axon have re-queuing capability for events, or any other alternative to handle this scenario?
If you're using Axon, you know it differentiates between Command, Event and Query messages. I'd suggest being specific in your question about which message type you want to retry.
However, I am going to assume it's about events, as you're stating Kafka.
If this is the case, I'd highly recommend reading the reference guide on the matter, as it states how you can uncouple Kafka publication from actual event storage in Axon.
Simply put, use a TrackingEventProcessor as the means to publish events on Kafka, as this ensures a dedicated thread is used for publication instead of the same thread that stores the event. Additionally, the TrackingEventProcessor can be replayed, thus "re-processing" events.

How to create command by consuming message from kafka topic rather than through Rest API

I'm using Axon version 3.3, which seamlessly supports Kafka with an annotation in the Spring Boot main class:
@SpringBootApplication(exclude = KafkaAutoConfiguration.class)
In our use case, the command-side microservice needs to pick up messages from a Kafka topic rather than expose a REST API. It will store the event in the event store and then move it to another Kafka topic for the query-side microservice to consume.
Since KafkaAutoConfiguration is disabled, I cannot use the spring-kafka configuration to write a consumer. How can I consume a normal message in Axon?
I tried writing a normal Spring Kafka consumer, but since KafkaAutoConfiguration is disabled, the initial trigger for the command is not picked up from the Kafka topic.
I think I can help you out with this.
The Axon Kafka Extension is solely meant for Events.
Thus, it is not intended to dispatch Commands or Queries from one node to another.
This is very intentional, as Event messages have different routing needs compared to Command and Query messages.
Axon views Kafka as a fine fit for an Event Bus, and as such this is supported through the framework.
It is however not ideal for Command messages (which should always be routed to a single handler) or Query messages (which can be routed to a single handler, to several handlers, or follow a subscription model).
Thus, if you want to "abuse" Kafka for different types of messages in conjunction with Axon, you will have to write your own component/service for it.
I would however stick to the messaging paradigm and separate these concerns.
For far greater simplicity when routing messages between Axon applications, I'd highly recommend trying out Axon Server.
Additionally, here you can hear/see Allard Buijze point out the different routing needs per message type (thus the reason why Axon's Kafka Extension only deals with Event messages).

Netflix Conductor: Custom client in Java

We are planning to deploy the Netflix Conductor WAR in PCF and then create a Conductor client in Java that will communicate with the server and load the JSON (task and workflow creation) on start-up.
Can we create the JSONs and load them at client start-up? I have tried googling but am unable to find a sample Conductor client that can create workflows etc.
Any help in pointing to this direction would be helpful.
Thanks in advance
Clients are like listeners with the capability of doing the work (workers) and reporting the status of the task back to Conductor. This listening ability comes into play after a task of a specific type has been scheduled by Conductor. For all this to happen, the task definitions and the workflow definition (metadata) should be fed into Conductor, followed by the workflow execution, through the REST endpoints.
Why do you want to load the workflow tasks and definitions at startup? It looks like you are using in-memory mode for your POC. To keep the workflow definitions and tasks available across restarts, configure an external Dynomite and Elasticsearch, so that you are not required to load these every time you restart your server.
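If you do still want the client to push the metadata at start-up, here is a minimal sketch of the REST calls involved (shown in Python with requests for brevity; the official Java client exposes equivalent metadata and workflow operations). The server URL, the definition payloads and the endpoint paths are illustrative, so verify them against the API of your Conductor version:

import json
import requests

CONDUCTOR_API = "http://localhost:8080/api"  # placeholder server URL

# Register the task definitions (the endpoint accepts a list).
task_defs = [
    {"name": "load_student_file", "retryCount": 3, "timeoutSeconds": 300},
]
requests.post(
    CONDUCTOR_API + "/metadata/taskdefs",
    data=json.dumps(task_defs),
    headers={"Content-Type": "application/json"},
).raise_for_status()

# Register a workflow definition that references the task by name.
workflow_def = {
    "name": "student_enrollment",
    "version": 1,
    "schemaVersion": 2,
    "tasks": [
        {
            "name": "load_student_file",
            "taskReferenceName": "load_student_file_ref",
            "type": "SIMPLE",
        }
    ],
}
requests.post(
    CONDUCTOR_API + "/metadata/workflow",
    data=json.dumps(workflow_def),
    headers={"Content-Type": "application/json"},
).raise_for_status()

# Start an execution once the metadata is in place; the response body is the workflow id.
run = requests.post(
    CONDUCTOR_API + "/workflow/student_enrollment",
    data=json.dumps({}),  # workflow input
    headers={"Content-Type": "application/json"},
)
run.raise_for_status()
print("Started workflow:", run.text)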
Hope this helps.

How to track the status of an async long-running process from stateless Spring Boot services in GCP?

We have a few Spring Boot services running in GCP Kubernetes Engine which expose their HTTP (over gRPC) API to clients.
One task is to import very large data files. The proposed way is to upload the files to Google Cloud Storage, trigger an asynchronous import by providing the file path of the import location, and return HTTP 202 if the request was valid.
Next, we set the status of the import to pending within the persistence layer (Spanner) and trigger an asynchronous parsing and batch ingestion process. If the import was successful, we set the status to completed.
The only way for a client to know whether the import was successful is to come back and poll our API for the current status.
And this is where the question arises.
There are several load-balanced pods of the same kind. If the importing service crashes (I mean crashes, not handles an exception), there is no straightforward way for us to finally set the status to aborted. The status will remain pending forever.
We would like to avoid the use of an additional layer like Hazelcast if possible. We'd also like to avoid having another service that communicates directly with one pod or another, observes the states, and does some fancy callback stuff.
Can anyone give a hint on how to tackle this problem in a best-practice manner?
Many thanks.
