How to re-run all failed tasks in Apache Airflow?

I have an Apache Airflow DAG with tens of thousands of tasks, and after a run, say, a handful of them failed.
I fixed the bug that caused some tasks to fail and I would like to re-run ONLY FAILED TASKS.
This SO post suggests using the GUI to "clear" failed tasks:
How to restart a failed task on Airflow
This approach works if you only have a handful of failed tasks.
I am wondering if we can bypass the GUI and do it programmatically, through the command line, with something like:
airflow_clear_failed_tasks dag_id execution_date

Use the following command to clear only failed tasks:
airflow clear [-s START_DATE] [-e END_DATE] --only_failed dag_id
Documentation: https://airflow.readthedocs.io/en/stable/cli.html#clear
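For instance, on an Airflow 1.x installation this might look like the following (dag id and dates are placeholders):
# my_dag and the dates are placeholders; --only_failed is the pre-2.0 spelling of the flag
airflow clear -s 2019-01-01 -e 2019-01-02 --only_failed my_dag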

The command to clear only failed tasks was updated. It is now (Airflow 2.0 as of March 2021):
airflow tasks clear [-s START_DATE] [-e END_DATE] --only-failed dag_id
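For example, re-running only the failed tasks of a single day's run on Airflow 2 could look like this (dag id and dates are placeholders):
# my_dag and the dates are placeholders; -y skips the confirmation prompt
airflow tasks clear -s 2021-03-01 -e 2021-03-01 --only-failed -y my_dag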

Related

Airflow - Can't backfill via CLI

I have an Airflow deployment running in a Kubernetes cluster. I'm trying to use the CLI to backfill one of my DAGs by doing the following:
I open a shell to my scheduler node by running the following command: kubectl exec --stdin --tty airflow-worker-0 -- /bin/bash
I then execute the following command to initiate the backfill - airflow dags backfill -s 2021-08-06 -e 2021-08-31 my_dag
It then hangs on the below log entry indefinitely until I terminate the process:
[2022-05-31 13:04:25,682] {dagbag.py:500} INFO - Filling up the DagBag from /opt/airflow/dags
I then get an error similar to the below, complaining that a random DAG that I don't care about can't be found:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/dags/__pycache__/example_dag-37.pyc'
Is there any way to address this? I don't understand why the CLI has to fill up the DagBag given that I've already told it exactly what DAG I want to execute - why is it then looking for random DAGs in the pycache folder that don't exist?
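Not an authoritative fix, but one thing that may narrow down what gets loaded: airflow dags backfill also accepts a --subdir/-S argument (the same flag that appears as -sd elsewhere on this page), so you can point it at the single file that defines your DAG instead of the whole dags folder:
# /opt/airflow/dags/my_dag.py is an assumed path to the file that defines my_dag
airflow dags backfill -S /opt/airflow/dags/my_dag.py -s 2021-08-06 -e 2021-08-31 my_dag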

Airflow CLI: How to get status of dag tasks in Airflow 1.10.12?

In Airflow 2.0, you can get the status of tasks in a dag by running CLI command: airflow tasks states-for-dag-run. (See docs here: https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#state_repeat1)
What's the equivalent in Airflow 1.10.12? I can't seem to find it in the 1.10.12 docs.
There is no direct equivalent as this is a new CLI command of Airflow 2.0.
In Airflow 1.10.12 you can do (docs):
airflow task_state [-h] [-sd SUBDIR] dag_id task_id execution_date
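For example (dag id, task id and execution date below are placeholders):
# prints the state of a single task instance, e.g. success, failed or up_for_retry
airflow task_state my_dag my_task 2020-08-01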

How to read value passed to the airflow backfill --conf {"key": "value"}

I have an airflow DAG which I can run with some parameters using:
airflow trigger_dag 'my_dag' --conf '{"key":"value"}'
then I can get the 'value' in my DAG like this:
context['dag_run'].conf.get('key')
I would like to do the same using backfill:
airflow backfill 'my_dag' --conf '{"key":"value"}' -s 2019-04-15 -e 2019-04-16
Is it possible to get the value passed in --conf for a backfill?
I came upon this question while having the same issue, and although it's a few years later I thought this might help someone.
As the OP suspected, prior DAG executions impact whether a backfill will use the conf provided on the command line. This was recently raised as an issue and a fix was merged: https://github.com/apache/airflow/pull/22837
Yes, the backfill command also has a conf parameter.
From: https://airflow.apache.org/1.10.3/cli.html#backfill
airflow backfill [-h] [-t TASK_REGEX] [-s START_DATE] [-e END_DATE] [-m] [-l]
[-x] [-i] [-I] [-sd SUBDIR] [--pool POOL]
[--delay_on_limit DELAY_ON_LIMIT] [-dr] [-v] [-c CONF]
[--reset_dagruns] [--rerun_failed_tasks] [-B]
dag_id
-c, --conf
JSON string that gets pickled into the DagRun’s conf attribute
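So, reusing the placeholder values from the question, the -c short form is equivalent to the --conf long form the OP already tried (subject to the caveat above about pre-existing DAG runs):
# key/value, dates and dag id are the question's placeholders
airflow backfill -c '{"key":"value"}' -s 2019-04-15 -e 2019-04-16 my_dag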

Airflow run specifying incorrect -sd or -SUBDIRECTORY

I have an Airflow process running every day with many DAGs. Today, all of a sudden, none of the DAGs can be run because when Airflow calls airflow run it misspecifies the -sd directory to find the DAG.
Here's the example:
[2018-09-26 15:18:10,406] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run daily_execution dag1 2018-09-26T13:17:50.511269 --job_id 1 --raw -sd DAGS_FOLDER/newfoldernewfolder/dags/all_dags.py']
As you can see, right after -sd, the subdirectory repeats newfolder twice when it should only state DAGS_FOLDER/newfolder/dags/all_dags.py.
I tried running the DAG with the same files that were running two days ago (when everything was correct) but I get the same error. I'm guessing that something has changed in Airflow configuration but I'm not aware of any changes in airflow.cfg. I've been only managing the UI and airflow run gets called automatically once I turn the DAG on.
Does anybody have an idea where airflow run might get this directory and how I can update it?
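One hedged first check, assuming the default layout where airflow.cfg sits in $AIRFLOW_HOME: compare the configured dags_folder with the path shown in the log, since the -sd value appears to be built from the DAG file's recorded location under that folder:
# assumes airflow.cfg lives in $AIRFLOW_HOME (the default layout)
grep dags_folder "$AIRFLOW_HOME/airflow.cfg"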

how to clear failing DAGs using the CLI in airflow

I have some failing DAGs, let's say from 1st-Feb to 20th-Feb. From that date onward, all of them succeeded.
I tried to use the CLI (instead of doing it twenty times with the web UI):
airflow clear -f -t * my_dags.my_dag_id
But I have a weird error:
airflow: error: unrecognized arguments: airflow-webserver.pid airflow.cfg airflow_variables.json my_dags.my_dag_id
EDIT 1:
As @tobi6 explained, the * was indeed causing trouble.
Knowing that, I tried this command instead:
airflow clear -u -d -f -t ".*" my_dags.my_dag_id
but it only returns failed task instances (-f flag). The -d and -u flags don't seem to work because task instances downstream and upstream of the failed ones are ignored (not returned).
EDIT 2:
As @tobi6 suggested, using -s and -e permits selecting all DAG runs within a date range. Here is the command:
airflow clear -s "2018-04-01 00:00:00" -e "2018-04-01 00:00:00" my_dags.my_dag_id
However, adding the -f flag to the command above only returns failed task instances. Is it possible to select all failed task instances of all failed DAG runs within a date range?
If you are using an asterisk * in the Linux bash, it will automatically be expanded to the contents of the current directory.
Meaning the shell replaces the asterisk with all files in the current working directory and then executes your command.
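A quick way to see that expansion, and the effect of quoting, from any directory:
echo *      # the shell replaces * with the names of the files in the current directory
echo "*"    # quoting keeps the literal asterisk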
Quoting the pattern will avoid the automatic expansion:
airflow clear -f -t "*" my_dags.my_dag_id
One solution I've found so far is to execute SQL directly (MySQL in my case):
update task_instance t left join dag_run d on d.dag_id = t.dag_id and d.execution_date = t.execution_date
set t.state=null,
d.state='running'
where t.dag_id = '<your_dag_id>'
and t.execution_date > '2020-08-07 23:00:00'
and d.state='failed';
It will clear all task states on failed dag_runs, as if the 'Clear' button were pressed for the entire DAG run in the web UI.
In Airflow 2.2.4 the airflow clear command was deprecated.
You could now run:
airflow tasks clear -s <your_start_date> -e <end_date> <dag_id>
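Combining that with the flags above, clearing only the failed task instances across the OP's 1st-Feb to 20th-Feb window (year assumed, dag id taken from the question) would look like:
# -s/-e bound the date range, --only-failed limits the clear to failed task instances
airflow tasks clear -s 2018-02-01 -e 2018-02-20 --only-failed my_dags.my_dag_id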
