When we deploy Airflow DAG, the State of DAG is Paused.
Is it possible to un-Pause the DAG after deployment automatically.
The configuration parameter dags_are_paused_at_creation should be set as False in the configuration file. The corresponding documentation link is below.
https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#dags-are-paused-at-creation
Related
I have two DAGs in my airflow scheduler, which were working in the past. After needing to rebuild the docker containers running airflow, they are now stuck in queued. DAGs in my case are triggered via the REST API, so no actual scheduling is involved.
Since there are quite a few similar posts, I ran through the checklist of this answer from a similar question:
Do you have the airflow scheduler running?
Yes!
Do you have the airflow webserver running?
Yes!
Have you checked that all DAGs you want to run are set to On in the web ui?
Yes, both DAGS are shown in the WebUI and no errors are displayed.
Do all the DAGs you want to run have a start date which is in the past?
Yes, the constructor of both DAGs looks as follows:
dag = DAG(
dag_id='image_object_detection_dag',
default_args=args,
schedule_interval=None,
start_date=days_ago(2),
tags=['helloworld'],
)
Do all the DAGs you want to run have a proper schedule which is shown in the web ui?
No, I trigger my DAGs manually via the REST API.
If nothing else works, you can use the web ui to click on the dag, then on Graph View. Now select the first task and click on Task Instance. In the paragraph Task Instance Details you will see why a DAG is waiting or not running.
Here is the output of what this paragraph is showing me:
What is the best way to find the reason, why the tasks won't exit the queued state and run?
EDIT:
Out of curiousity I tried to trigger the DAG from within the WebUI and now both Runs executed (the one triggered from the WebUI failed, but that was expected, since there was no config set)
We have encountered a scenario recently where someone mistakenly turned off a production dag, and we want to get alert whenever a dag is paused using datadog.
I have checked https://docs.datadoghq.com/integrations/airflow/?tab=host
But have not got any metric for dag to check if it is paused or not.
I can run a custom script in datadog as well.
One of the method is that I exec into postgres pod and get the list of active dags:
select * from dag where is_paused=true;
Or is there any other way I can get the unpaused dag list and also when new dag is added what is the best way to handle it.
I want the alert whenever a unpaused dag is paused.
If you are on Airflow 2 you can use the REST API to query for state of the DAG.
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_dag
There is "is_paused" field.
And of you are not Airflow 2, you should be. Airflow 1.10 is end-of-life and will not receive any fixes (including critical security fixes) so you should upgrade as soon as you can.
What would happen if DAG changes while DAG is running (particularly if this is dynamic DAG)
What would happen if code of custom made operator is changed during some DAG run
So after some investigation I came to conclusion that DAG changes will be visible in current DAG run and that operators (and all other plugins) are not reloaded by default.
DAG changes during DAG run
If you remove task while DAG is running scheduler notice that, set status for that task to "Removed" while logging current line:
State of this instance has been externally set to removed. Taking the poison pill.
If you add new task it will also be visible and executed. It will be executed even if it is upstream task of task which is already finished
If task is changed, changes will be incorporated into current DAG run only if task is not already started execution or finished
All plugins (included operators) are not refreshed automatically by default. You can restart Airflow or you can set reload_on_plugin_change property in [webserver] section of airflow.cfg file to True
I am trying to manage airflow dags (create, execute etc.) via java backend. Currently after creating a dag and placing it in dags folder of airflow my backend is constantly trying to run the dag. But it can't run it until its picked up by airflow scheduler, which can take quite some time if the number of dags are more. I am wondering if there any events that airflow emits which I can tap to check for new dags processed by scheduler, and then trigger, execute command from my backend. Or is there a way or configuration where airflow will automatically start a dag once it processes it rather than we triggering it ?
is there a way or configuration where airflow will automatically start a dag once it processes it rather than we triggering it ?
Yes, one of the parameters that you can define is is_paused_upon_creation.
If you set your DAG as:
DAG(
dag_id='tutorial',
default_args=default_args,
description='A simple tutorial DAG',
schedule_interval="#daily",
start_date=datetime(2020, 12, 28),
is_paused_upon_creation=False
)
The DAG will start as soon as picked up by the scheduler (assuming conditions to run it are met)
I am wondering if there any events that airflow emits which I can tap to check for new dags processed by scheduler
In Airflow >=2.0.0 you can use the API - list dags endpoint to get all dags that are in the dagbag
In any Airflow version you can use this code to list the dag_ids:
from airflow.models import DagBag
print(DagBag().dag_ids())
Is there anyway to start the dag automatically without manually triggering dag once it is available in dagbag, considering is_paused_upon_creation=false is set.
Please, try to create your DAG with schedule_interval='#once'. It should works.