Is it possible to execute an Airflow DAG remotely via the command line?
I know there is an Airflow command-line tool, but it seems to allow execution only from the server's own terminal rather than from an external client.
I can think of two options here:
Experimental REST API (preferable): use good old GET / POST requests to trigger / cancel DagRuns (see the sketch below)
Airflow CLI: SSH into the machine running Airflow and trigger the DAG via the command line
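For the REST route, something along these lines works against the experimental API in Airflow 1.10 — a minimal sketch, assuming the webserver is reachable from the client at the address shown, the experimental API is not locked down by auth, and example_dag is a placeholder DAG id:

```python
# Minimal sketch: trigger a DAG run remotely through the experimental REST API.
# Assumes the webserver is reachable at http://airflow.example.com:8080 and
# "example_dag" is a placeholder DAG id -- adjust both to your deployment.
import json

import requests

AIRFLOW_URL = "http://airflow.example.com:8080"  # assumed webserver address
DAG_ID = "example_dag"                           # hypothetical DAG id

response = requests.post(
    f"{AIRFLOW_URL}/api/experimental/dags/{DAG_ID}/dag_runs",
    data=json.dumps({"conf": {"triggered_by": "remote_client"}}),
    headers={"Content-Type": "application/json"},
)
response.raise_for_status()
print(response.json())
```

For the CLI route, the equivalent over SSH would be something like ssh user@airflow-host "airflow trigger_dag example_dag" (airflow dags trigger in Airflow 2.x).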
Related
When I try to run the airflow scheduler CLI command to start the scheduler (locally), I keep getting the error Connection in use: ('0.0.0.0', 8794). No scheduler process is running, but the port seems to be in use. Has anyone had this problem and can help me?
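One way to narrow this down is to check what is actually listening on that port before starting the scheduler. A minimal sketch, assuming psutil is installed and the script runs on the same host (it may need root to see processes owned by other users):

```python
# Minimal sketch: report which process is bound to port 8794.
# Assumes psutil is installed (pip install psutil) and the script runs on the
# host where the scheduler fails to start; may need root to see all processes.
import psutil

PORT = 8794

for conn in psutil.net_connections(kind="inet"):
    if conn.laddr and conn.laddr.port == PORT and conn.pid:
        proc = psutil.Process(conn.pid)
        print(f"port {PORT} is held by pid {conn.pid}: {' '.join(proc.cmdline())}")
```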
We have a requirement to send Airflow metrics to Datadog. I tried to follow the steps mentioned here:
https://docs.datadoghq.com/integrations/airflow/?tab=host
Accordingly, I included StatsD in the Airflow installation and updated the Airflow configuration file (steps 1 and 2).
Beyond that point, I cannot figure out how to send my metrics to Datadog. Do I follow the host configuration or the containerized configuration? For the host configuration we have to update the datadog.yaml file, which is not in our repo, and for the containerized version the docs only cover Kubernetes, which we don't use.
We run Airflow by creating a Docker build and running it on Amazon ECS. We also have a Datadog agent running in parallel in the same task (not part of our repo). However, I cannot figure out what configuration I need in order to send the StatsD metrics to Datadog. Please let me know if anyone has an answer.
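One thing that helps here is to first confirm that the Airflow container can reach the agent's DogStatsD port at all. A minimal sketch, assuming the agent sidecar is reachable at 127.0.0.1:8125 from the Airflow container's point of view (containers in the same ECS task share localhost under the awsvpc network mode) — adjust host and port to match your task definition:

```python
# Minimal sketch: push one test metric to the DogStatsD endpoint over UDP
# to confirm the Airflow container can reach the Datadog agent.
# Assumes the agent listens on 127.0.0.1:8125 (default DogStatsD port);
# adjust to match your ECS task definition.
import socket

STATSD_HOST = "127.0.0.1"   # assumed agent address from the Airflow container
STATSD_PORT = 8125          # default DogStatsD port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Plain StatsD line protocol: <metric name>:<value>|<type>
sock.sendto(b"airflow.connectivity_test:1|c", (STATSD_HOST, STATSD_PORT))
sock.close()
print("test counter sent; check Datadog for airflow.connectivity_test")
```

If the test counter shows up in Datadog, pointing statsd_host and statsd_port in airflow.cfg at that same address should route the Airflow-emitted StatsD metrics through the agent as well.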
Hi, based on the Airflow docs I was able to set up cloud/remote logging.
Remote logging works for DAG and task logs, but it does not back up or remotely store the following logs:
scheduler
dag_processing_manager
I am using the Airflow Docker image from Docker Hub.
I currently have the Airflow scheduler set up on Linux server A and the Airflow webserver on Linux server B. Neither server has internet access. I ran the DB initialization on server A and keep all the DAGs on server A.
However, when I refresh the webserver UI, it keeps showing the error message:
This DAG isn't available in the webserver DagBag object
How do I configure the DAG folder for the webserver (server B) so it reads the DAGs from the scheduler (server A)?
I am using BashOperator. Is CeleryExecutor a must?
Thanks in advance
The scheduler has found your dags_folder and the DAG files in it, and is scheduling them accordingly. The webserver, however, can "see" those DAGs only through their existence in the database; it can't find the files in its own dags_folder path.
You need to ensure that the dags_folder on both servers contains the same files and that the two are kept in sync with one another, e.g. with a small sync job like the sketch below. This is out of scope for Airflow, and it won't handle it on your behalf.
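A minimal sketch of such a sync job, assuming server B can reach server A over SSH and that /usr/local/airflow/dags is the dags_folder on both machines (the hostname and path are placeholders):

```python
# Minimal sketch: keep the webserver's dags_folder in sync with the scheduler's.
# Assumes server B can reach server A over SSH, rsync is installed on both,
# and /usr/local/airflow/dags is the dags_folder on both machines (check your
# airflow.cfg); run it from server B, e.g. every minute from cron.
import subprocess

SCHEDULER_HOST = "serverA"                # hypothetical SSH alias for server A
DAGS_FOLDER = "/usr/local/airflow/dags/"  # assumed dags_folder on both servers

subprocess.run(
    [
        "rsync", "-az", "--delete",
        f"{SCHEDULER_HOST}:{DAGS_FOLDER}",  # source: the scheduler's DAG files
        DAGS_FOLDER,                        # destination: the webserver's dags_folder
    ],
    check=True,
)
```

A shared mount (e.g. NFS) or deploying the same git repository to both servers achieves the same result.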
I have an Airflow service running normally on a remote machine, which can be accessed through a browser at http://airflow.xxx.com.
Now I want to dynamically upload DAGs to the Airflow instance at airflow.xxx.com from another machine, and have those DAGs run automatically.
After reading the Airflow documentation (http://airflow.incubator.apache.org/), I found a way to dynamically create DAGs and run them automatically, but only on the Airflow machine airflow.xxx.com itself.
But I want to do it from another machine. How can I accomplish that? Is there something like WebHDFS that lets me send commands directly to the remote Airflow instance?
You should upload your new DAG file to the Apache Airflow DAG directory.
If you didn't set Airflow up in a cluster environment, you should have the webserver, scheduler, and worker all running on the same machine.
On that machine, if you did not amend airflow.cfg, your DAG directory should be dags_folder = /usr/local/airflow/dags.
If you can access the Airflow machine from the other machine through SFTP (or FTP), you can simply put the file in that directory, for example with a small script like the one below.
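A minimal sketch of such an upload, assuming SSH/SFTP access to the Airflow host and the /usr/local/airflow/dags path mentioned above (hostname, credentials and file names are placeholders):

```python
# Minimal sketch: copy a DAG file into the remote dags_folder over SFTP.
# Assumes SSH access to the Airflow host with password auth and that
# /usr/local/airflow/dags is the dags_folder; hostname, credentials and
# file names are placeholders.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("airflow.xxx.com", username="airflow", password="***")  # placeholder credentials

sftp = client.open_sftp()
sftp.put("my_new_dag.py", "/usr/local/airflow/dags/my_new_dag.py")  # local path -> remote dags_folder
sftp.close()
client.close()
```

The scheduler picks up new files in dags_folder on its next parsing pass, so the DAG should appear in the UI shortly after the upload; depending on your dags_are_paused_at_creation setting you may still need to unpause it before it runs.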