Is this possible? Integrating two Airflow servers

Consider two Airflow servers, A and B, both running. I want the run status of one particular DAG on B to be available on A's DAGs dashboard. Is this possible? If so, how?
Thanks in advance
Sundar
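One approach worth sketching (my own assumption; the thread itself gives no answer) is to enable the stable REST API on server B and poll it from server A, for example from a small script or a lightweight DAG whose task state then shows up on A's dashboard. The URL, credentials and DAG id below are placeholders.

```python
# Minimal sketch (assumption, not confirmed in this thread): query server B's
# Airflow 2.x stable REST API for the latest run of one particular DAG.
# Server B needs an API auth backend enabled (e.g. basic_auth).
import requests

B_BASE_URL = "http://airflow-b.example.com:8080/api/v1"  # hypothetical host for server B
DAG_ID = "the_particular_dag"                            # hypothetical DAG id
AUTH = ("api_user", "api_password")                      # placeholder credentials

def latest_dag_run_state(dag_id: str) -> str:
    """Return the state ('running', 'success', 'failed', ...) of the newest DAG run on B."""
    resp = requests.get(
        f"{B_BASE_URL}/dags/{dag_id}/dagRuns",
        params={"limit": 1, "order_by": "-execution_date"},  # order_by needs Airflow >= 2.1
        auth=AUTH,
        timeout=10,
    )
    resp.raise_for_status()
    runs = resp.json()["dag_runs"]
    return runs[0]["state"] if runs else "no_runs"

if __name__ == "__main__":
    print(latest_dag_run_state(DAG_ID))
```

Wrapping this call in a sensor or PythonOperator on server A would make B's DAG status visible as a task outcome on A's dashboard.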

Related

Airflow EKS and Fargate

Have some basic questions about setting up Airflow on EKS using Fargate. What I have understood so far is that the control plane will still be managed by AWS, while the worker plane will run on Fargate instances.
Question: what I am unclear about is, when setting up the webserver/scheduler etc. on Fargate, do I need to specify anywhere the amount of vCPU and memory?
More importantly, do any changes need to be made to how DAGs are written so that they can execute on the individual pods? Also, do the tasks in the DAGs specify how much vCPU and memory each task will use?
Sorry, I'm just entering the Fargate/EKS/Airflow world.
Thanks
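On the last point (per-task vCPU and memory): if you run the KubernetesExecutor on EKS, the usual place to express this is an executor_config pod override on the task; Fargate then sizes the pod from the requested resources. A minimal sketch, assuming Airflow 2 with the KubernetesExecutor; the DAG name and resource numbers are placeholders.

```python
# Sketch assuming Airflow 2 + KubernetesExecutor: per-task CPU/memory requests
# are attached to the task via executor_config. Values are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from kubernetes.client import models as k8s

with DAG(dag_id="resources_example", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:

    def do_work():
        print("work that needs a bigger pod")

    heavy_task = PythonOperator(
        task_id="heavy_task",
        python_callable=do_work,
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",  # must be named "base" so Airflow merges it with its own container spec
                            resources=k8s.V1ResourceRequirements(
                                requests={"cpu": "500m", "memory": "1Gi"},
                                limits={"cpu": "1", "memory": "2Gi"},
                            ),
                        )
                    ]
                )
            )
        },
    )
```

Tasks without an override simply use the worker pod defaults, so existing DAGs do not need to change unless a task requires non-default resources.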

How to set up an Airflow > 2.0 high-availability cluster on CentOS 7 or above

I want to set up HA for Airflow (2.3.1) on CentOS 7, with RabbitMQ as the message queue and Postgres as the metadata DB. Does anybody know how to set it up?
Your question is very broad, because high availability has multiple levels and definitions:
Airflow availability: multiple schedulers, multiple workers, autoscaling to absorb load spikes, enough storage volume, ...
The databases: an HA cluster for RabbitMQ and an HA cluster for Postgres
Even if you have the first two levels, how many nodes do you want to use? You cannot put everything on the same node; you need to run one service replica per node.
Suppose you did that, and now you have three different nodes running in the same data center. What if there is a fire in the data center? So you need to use multiple nodes in different regions.
After doing all of the above, is there still a risk of network problems? Of course there is.
If you just want to run Airflow in HA mode, you have multiple options on any OS:
Docker Compose: usually used for development, but it can work for production too; you can create multiple scheduler instances with multiple workers, which helps improve the availability of your services.
Docker Swarm: similar to Docker Compose with additional features (scaling, multiple nodes, ...); you will not find many resources on installing Airflow with it, but you can reuse the Compose files with a few changes.
Kubernetes: the best solution; K8s can ensure the availability of your services, and it installs easily with the Helm chart.
Or just running the different services directly on your hosts: not recommended, because of the manual work involved, and achieving HA this way is complicated.
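Whichever of these options you choose, it is worth monitoring whether the schedulers are actually alive. A minimal sketch (my own addition, not part of the answer above) that polls the webserver's /health endpoint, which in Airflow 2 reports metadata-DB and scheduler-heartbeat status; the URL is a placeholder.

```python
# Sketch: poll the Airflow 2.x webserver /health endpoint to check that the
# metadata DB is reachable and a scheduler heartbeat was seen recently.
import requests

WEBSERVER_URL = "http://airflow.example.com:8080"  # placeholder

def airflow_healthy() -> bool:
    resp = requests.get(f"{WEBSERVER_URL}/health", timeout=5)
    resp.raise_for_status()
    status = resp.json()
    return (
        status["metadatabase"]["status"] == "healthy"
        and status["scheduler"]["status"] == "healthy"
    )

if __name__ == "__main__":
    print("healthy" if airflow_healthy() else "unhealthy")
```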

Running Airflow DAGs/tasks on different hosts

We currently have a bunch of independent jobs running on different servers and scheduled with crontab. The goal is to have a single view of all the jobs across the servers and whether they've run successfully, etc.
Airflow is one of the tools we are considering to achieve this, but our servers are configured very differently. Is it possible to set up Airflow so that DAG1 (and the Airflow scheduler & webserver) runs on server1 and DAG2 runs on server2, without RabbitMQ?
Essentially I'd like to achieve something like the first answer given here (or just at a DAG level): Airflow DAG tasks parallelism on different worker nodes
in the quickest & simplest way possible!
Thanks
You can check out Running Apache Airflow with Celery Executor in Docker.
To use Celery, you can instantiate a Redis node (for example, as a pod) and then manage tasks across multiple hosts.
The link above also gives you a starter docker-compose YAML to help you get started quickly with Apache Airflow on the Celery executor.
Is it possible to set up airflow so that DAG1 (and the airflow scheduler & webserver) runs on server1 and DAG2 runs on server2 without RabbitMQ.
Airflow with the Celery executor will by default spread work across multiple hosts, and the division is always at the task level, not the DAG level.
This post might help you with spawning specific tasks on a specific worker node; a short sketch of the queue-based approach follows below.
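The usual mechanism for pinning work to a particular host with the Celery executor is queues: give each server's worker its own queue and set queue= on every task of the DAG that should run there. A minimal sketch; the queue name, DAG id and script path are placeholders.

```python
# Sketch: pin a whole DAG's tasks to the worker running on server2 by assigning
# them a dedicated Celery queue. On server2, start the worker with:
#   airflow celery worker --queues server2_queue
# Queue name, DAG id and script path are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="dag2_on_server2", start_date=datetime(2022, 1, 1), schedule_interval="@daily") as dag:

    job = BashOperator(
        task_id="run_existing_cron_job",
        bash_command="/opt/jobs/nightly_export.sh ",  # trailing space stops Jinja treating the .sh path as a template
        queue="server2_queue",  # only workers listening on this queue pick up the task
    )
```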

Configure Airflow with two different ports in the same cluster?

How can I run Airflow on two different ports in the same cluster?
I'm assuming you're speaking about the Airflow webserver process, since you didn't clarify, but you should be able to simply run multiple processes and just set the AIRFLOW__WEBSERVER__WEB_SERVER_PORT environment variable for each process accordingly.

How to start multiple virtual machines simultaneously in CloudStack

Is there a way to start multiple virtual machines (instances) simultaneously in CloudStack?
Apparently this can't be done using the HTTP user interface. Also, the HTTP API request accepts only one id for targeting a virtual machine.
All I can think of to solve this is to fire an individual start request for each instance and then poll each job for its result. Is there a better way?
CloudStack is an API-driven system; if there is no API call where you can specify multiple VMs to be started (and I don't think there is), then it is not possible.
If you do need to start multiple machines (nearly) simultaneously, the only option I see is to fire multiple API calls, as you already mentioned.
See this answer on another question for a list of tools that make interfacing with CloudStack easier.
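If you do end up firing individual calls, they can at least all go out from one script. A minimal sketch, assuming the third-party `cs` Python client (pip install cs) and placeholder endpoint, keys and VM UUIDs; startVirtualMachine is asynchronous, so each call returns a job id that queryAsyncJobResult lets you poll.

```python
# Sketch using the third-party "cs" CloudStack client; endpoint, keys and VM
# UUIDs are placeholders. startVirtualMachine is async: each call returns a
# jobid, and queryAsyncJobResult reports 0 = pending, 1 = success, 2 = failure.
import time

from cs import CloudStack

api = CloudStack(
    endpoint="https://cloudstack.example.com/client/api",
    key="YOUR_API_KEY",
    secret="YOUR_SECRET_KEY",
)

vm_ids = ["uuid-1", "uuid-2", "uuid-3"]  # placeholder VM UUIDs

# Fire all start requests first...
jobs = {vm_id: api.startVirtualMachine(id=vm_id)["jobid"] for vm_id in vm_ids}

# ...then poll until every async job has finished.
while jobs:
    for vm_id, job_id in list(jobs.items()):
        result = api.queryAsyncJobResult(jobid=job_id)
        if result["jobstatus"] != 0:
            print(f"{vm_id}: {'started' if result['jobstatus'] == 1 else 'failed'}")
            del jobs[vm_id]
    time.sleep(2)
```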
To start VMs on CloudStack "simultaneously" (though actually in serial), I used CloudMonkey and created a bash script to start a group of known VM UUIDs. See here for my experience:
https://sites.google.com/site/cloudfyp/tutorial/cloudmonkey/commands-on-cloudmonkey
