I have a DAG where max_active_runs is set to 2, and now I want to run backfills for roughly 20 runs. I expected Airflow to schedule all of the backfill runs but only start two at a time; that doesn't seem to happen. When I run the backfill command it starts two runs, but the command doesn't return because it hasn't managed to start them all; instead, it keeps trying until everything succeeds.
So what I expected was this:
I ran the backfill command
All the runs are marked as running
Command returns since now everything should be scheduled
Two of the runs start
What I experienced:
I ran the backfill command
Two runs are marked as running and start
Command doesn't return since it can't start the rest
The behavior I'm seeing makes it hard to just start a backfill and then shut down your computer. So, am I doing something wrong?
Update
Using trigger_dag instead of backfill did what I wanted. With backfill, it seems the command has to keep running for the runs to continue, which feels odd. The difference with trigger_dag is that it just triggers the DAG and then lets Airflow deal with it. Maybe it has something to do with how the backfill command is executed when using gcloud composer environments run <env> --location=<location> backfill -- ...?
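For reference, this is roughly what the working approach looks like. A minimal sketch only, assuming an Airflow 1.10-style CLI, a hypothetical DAG id my_dag and made-up dates:

```bash
# Each trigger_dag call only creates a DAG run and then returns, so the scheduler
# enforces max_active_runs=2 on its own and the machine that ran the command can go away.
for day in 2021-01-{01..20}; do           # bash brace expansion over the backlog dates (hypothetical)
  airflow trigger_dag -e "$day" my_dag
done

# Through Cloud Composer the same call would be wrapped like the backfill one, e.g.:
# gcloud composer environments run <env> --location=<location> trigger_dag -- -e 2021-01-01 my_dag
```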
Related
What is the difference between max_retries and status_retries when using the Airflow BatchOperator? I need to ensure that if a batch job fails, Airflow marks the task (that triggered the batch job) as a failure. Currently, my batch job fails, but the Airflow BatchOperator task is marked as a success. I believe that using one or both of these parameters will solve my problem, but I'm not sure what the difference really is between them. Thanks!!
I have read that Airflow's catchup feature applies to task instances that do not yet have a state - i.e. the scheduler will pick up any execution dates for which the DAG has not yet run (starting from the given start_date). Is this correct, and if so, does this mean catchup does not apply to failed DAG runs?
I am looking for a way to backfill any execution dates that failed, rather than ones that have not run at all.
Take a look at the backfill command options. You could use rerun-failed-tasks:
if set, the backfill will auto-rerun all the failed tasks for the backfill date range instead of throwing exceptions
Default: False
or reset-dagruns:
if set, the backfill will delete existing backfill-related DAG runs and start anew with fresh, running DAG runs
Default: False
Also, keep in mind this:
If reset_dag_run option is used, backfill will first prompt users whether airflow should clear all the previous dag_run and task_instances within the backfill date range. If rerun_failed_tasks is used, backfill will auto re-run the previous failed task instances within the backfill date range.
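A hedged example of how those options are passed on the command line, assuming the Airflow 2.x CLI, a hypothetical DAG id my_dag and a made-up date range:

```bash
# Re-run only the previously failed task instances within the backfill date range
airflow dags backfill \
    --start-date 2021-01-01 \
    --end-date 2021-01-20 \
    --rerun-failed-tasks \
    my_dag

# Or wipe the existing DAG runs in the range and start them again from scratch:
# airflow dags backfill --start-date 2021-01-01 --end-date 2021-01-20 --reset-dagruns my_dag
```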
Before doing anything like the above, my suggestion is that you first try it with some dummy DAG or similar.
I have created a DAG, and it is visible in the Airflow UI; I turned it on to run it. After running, the DAG's status showed it was up for retry. I then went to the server and ran the "airflow scheduler" command, and after that the DAG run succeeded.
The scheduler was already up and running before I ran the DAG, and I am not sure why this happened.
Do we need to run the airflow scheduler command whenever we create a new DAG?
I'd also like to know how the scheduler works.
Thanks
You can think of the Airflow scheduler as an infinite loop that checks tasks' states on each iteration and triggers the tasks whose dependencies have been met.
The whole process generates data that piles up a bit more on each round, and at some point it can render the scheduler useless as its performance degrades over time. This depends on your Airflow version: it seems to be solved in the newest version (2.0), but for older ones (< 2.0) the recommendation was to restart the scheduler every run_duration seconds, with some people suggesting once an hour or once a day. So, unless you're working on Airflow 2.0, I think this is what you're experiencing.
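If you are indeed on a pre-2.0 version, a rough sketch of that restart setup (hedged; the setting name is from the 1.10-era docs as I remember them, so double-check your version's configuration reference):

```bash
# Run the scheduler under a supervisor (systemd, supervisord, ...) so it is restarted
# automatically, and cap how long each scheduler process lives via airflow.cfg:
#
#   [scheduler]
#   run_duration = 3600    # scheduler process exits after ~1 hour and the supervisor restarts it
#
# With that in place, starting it is just:
airflow scheduler
```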
You can find references to this scheduler-restarting issue in posts made by Astronomer here and here.
I am trying to use airflow trigger_dag dag_id to trigger my DAG, but the run just shows the running state and doesn't do anything more.
I have searched many similar questions, but the answers all just say the DAG is paused. The problem is that my DAG is unpaused, yet it still stays in the running state.
Note: I can use one DAG to trigger another in the web UI, but it doesn't work from the command line.
Please see the screenshot below.
I have hit the same issue many times. The task's state is not running, and it is not queued either; it's stuck after we 'clear' it. Sometimes I found the task going into the Shutdown state before getting stuck, and after a long time the instance fails while the task status stays white (no state). I have solved it in several ways; I can't say the exact reason or a single fix, but try one of these (a rough CLI sketch follows the list):
Try the trigger_dag command again with the same execution date and time instead of the clear option.
Try a backfill; it will only run the unsuccessful instances.
Or trigger with a different time within the same interval; it will create another instance which is fresh and doesn't have the issue.
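Roughly what those commands look like (a sketch only, assuming an Airflow 1.10-style CLI, a hypothetical DAG id my_dag and made-up dates):

```bash
# 1) Re-trigger with the same execution date and time instead of clearing
airflow trigger_dag -e 2021-01-01T00:00:00 my_dag

# 2) Backfill the interval; it only runs the instances that have not succeeded yet
airflow backfill -s 2021-01-01 -e 2021-01-02 my_dag

# 3) Trigger a fresh run at a slightly different time within the same interval
airflow trigger_dag -e 2021-01-01T00:05:00 my_dag
```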
As seen in the picture above, the TAB_click_count operator and the COVERAGE_poi operator do not run; they just stay in the running state.
One time my DAG failed because of those two operators.
I fixed that, but ever since the error occurred, those two operators never finish and stay running whenever my DAG runs.
I tried triggering the DAG again, but it's the same.
What is the problem with my DAG or these operators?