How to exit with an error from a script to Airflow?

Say I'm running:
t = BashOperator(
    task_id='import',
    bash_command="""python3 script.py '{{ next_execution_date }}' """,
    dag=dag)
And for some reason I want the script to exit with an error and indicate to Airflow that it should retry this task.
I tried using os._exit(1), but Airflow marks the task as success.
I know there is:
from airflow.exceptions import AirflowException
raise AirflowException("error msg")
But this is more for functions written inside a DAG. My script is independent, and sometimes we run it on its own, regardless of Airflow.
Also, the script is Python 3 while Airflow is running under Python 2.7.
It seems excessive to install Airflow on Python 3 just for error handling.
Is there any other solution?

Add || exit 1 at the end of your Bash command:
bash_command="""python3 script.py '{{ next_execution_date }}' || exit 1 """
More information: https://unix.stackexchange.com/a/309344
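Alternatively, if you control the script, you can have it signal failure through its own exit code; the BashOperator fails the task on any non-zero exit status. A minimal sketch, assuming a hypothetical run_import routine and simple argument handling (not the asker's actual script):

# script.py -- Python 3; exits non-zero on failure so the BashOperator fails the task
import sys

def main(next_execution_date):
    try:
        run_import(next_execution_date)  # hypothetical import routine
    except Exception as exc:
        print("import failed: {}".format(exc), file=sys.stderr)
        sys.exit(1)  # non-zero exit status -> Airflow marks the task failed and can retry

if __name__ == "__main__":
    main(sys.argv[1])

The script behaves the same when run standalone; Airflow only looks at the process exit code.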

Related

Airflow - Can't backfill via CLI

I have an Airflow deployment running in a Kubernetes cluster. I'm trying to use the CLI to backfill one of my DAGs by doing the following:
I open a shell to my scheduler node by running the following command: kubectl exec --stdin --tty airflow-worker-0 -- /bin/bash
I then execute the following command to initiate the backfill - airflow dags backfill -s 2021-08-06 -e 2021-08-31 my_dag
It then hangs on the below log entry indefinitely until I terminate the process:
[2022-05-31 13:04:25,682] {dagbag.py:500} INFO - Filling up the DagBag from /opt/airflow/dags
I then get an error similar to the below, complaining that a random DAG that I don't care about can't be found:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/dags/__pycache__/example_dag-37.pyc'
Is there any way to address this? I don't understand why the CLI has to fill up the DagBag given that I've already told it exactly what DAG I want to execute - why is it then looking for random DAGs in the pycache folder that don't exist?
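One thing worth trying (an assumption, not something confirmed in this thread) is limiting where the DagBag is built from with the backfill command's --subdir option, pointing it at the single DAG file. The path and DAG id below are illustrative:
airflow dags backfill --subdir /opt/airflow/dags/my_dag.py -s 2021-08-06 -e 2021-08-31 my_dag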

Airflow CLI: How to get status of dag tasks in Airflow 1.10.12?

In Airflow 2.0, you can get the status of tasks in a dag by running CLI command: airflow tasks states-for-dag-run. (See docs here: https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#state_repeat1)
What's the equivalent in Airflow 1.10.12? I can't seem to find it in the 1.10.12 docs.
There is no direct equivalent as this is a new CLI command of Airflow 2.0.
In Airflow 1.10.12 you can do (docs):
airflow task_state [-h] [-sd SUBDIR] dag_id task_id execution_date
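For example, with an illustrative DAG id, task id, and execution date (not taken from the question):
airflow task_state my_dag my_task 2021-01-01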

Airflow find DAG Runs with specific conf

I'm using apache-airflow 1.10.10.
My use case is: I have 4 DAGs, and all of them trigger a common DAG named "dag_common" with different conf parameters, using a BashOperator, after some work:
airflow trigger_dag -c '{"id":"1"}' dag_common
airflow trigger_dag -c '{"id":"2"}' dag_common
airflow trigger_dag -c '{"id":"3"}' dag_common
airflow trigger_dag -c '{"id":"4"}' dag_common
Inside these DAGs I have to wait for the triggered DAG to finish; how can I accomplish this?
Dag1 has to wait until dag_common with conf id=1 finishes.
Is there any way to find all dag runs with specific conf?
It looks like a use case for SubDAGs: Implement dag_common as a subDAG and use SubDagOperator() in those four DAGs to run dag_common.
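A minimal sketch of that approach, assuming Airflow 1.10.x imports; the factory function, ids, and dates are illustrative, and the body of dag_common is left as a placeholder:

# Sketch only: dag_common's tasks run inline as a subDAG, so downstream
# tasks in the parent DAG wait for them to finish.
from datetime import datetime

from airflow import DAG
from airflow.operators.subdag_operator import SubDagOperator

default_args = {'owner': 'airflow', 'start_date': datetime(2020, 1, 1)}

def make_dag_common(parent_dag_id, task_id, conf_id):
    # A subDAG's dag_id must be "<parent_dag_id>.<task_id>" and its
    # schedule_interval should match the parent's.
    subdag = DAG(
        dag_id='%s.%s' % (parent_dag_id, task_id),
        default_args=default_args,
        schedule_interval='@daily',
    )
    # ... define here the tasks dag_common currently runs, using conf_id ...
    return subdag

dag1 = DAG(dag_id='dag1', default_args=default_args, schedule_interval='@daily')

common = SubDagOperator(
    task_id='dag_common',
    subdag=make_dag_common('dag1', 'dag_common', conf_id='1'),
    dag=dag1,
)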

How to re-run all failed tasks in Apache Airflow?

I have an Apache Airflow DAG with tens of thousands of tasks, and after a run, say, a handful of them failed.
I fixed the bug that caused some tasks to fail, and I would like to re-run ONLY THE FAILED TASKS.
This SO post suggests using the GUI to "clear" the failed tasks:
How to restart a failed task on Airflow
This approach works if you only have a handful of failed tasks.
I am wondering if we can bypass the GUI and do it programmatically, through the command line, something like:
airflow_clear_failed_tasks dag_id execution_date
Use the following command to clear only failed tasks:
airflow clear [-s START_DATE] [-e END_DATE] --only_failed dag_id
Documentation: https://airflow.readthedocs.io/en/stable/cli.html#clear
The command to clear only failed tasks was updated. It is now (Airflow 2.0 as of March 2021):
airflow tasks clear [-s START_DATE] [-e END_DATE] --only-failed dag_id
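For example, with an illustrative DAG id and date range:
airflow tasks clear -s 2021-08-01 -e 2021-08-31 --only-failed my_dag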

For Apache Airflow, How can I pass the parameters when manually trigger DAG via CLI?

I use Airflow to manage ETL task execution and scheduling. A DAG has been created and it works fine. But is it possible to pass parameters when manually triggering the DAG via the CLI?
For example:
My DAG runs every day at 01:30 and processes data for yesterday (time range from 01:30 yesterday to 01:30 today). There might be some issues with the data source, and I need to re-process that data (manually specifying the time range).
So can I create an Airflow DAG where, when it's scheduled, the default time range is from 01:30 yesterday to 01:30 today, and then, if anything is wrong with the data source, I manually trigger the DAG and pass the time range as parameters?
As far as I know, airflow test has -tp, which can pass params to the task, but that is only for testing a specific task, and airflow trigger_dag doesn't have a -tp option. So is there any way to trigger_dag and pass parameters to the DAG so that the Operator can read them?
Thanks!
You can pass parameters from the CLI using --conf '{"key":"value"}' and then use them in the DAG file as "{{ dag_run.conf["key"] }}" in a templated field.
CLI:
airflow trigger_dag 'example_dag_conf' -r 'run_id' --conf '{"message":"value"}'
DAG File:
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

args = {
    'start_date': datetime.utcnow(),
    'owner': 'airflow',
}

dag = DAG(
    dag_id='example_dag_conf',
    default_args=args,
    schedule_interval=None,
)

def run_this_func(ds, **kwargs):
    print("Remotely received value of {} for key=message".format(
        kwargs['dag_run'].conf['message']))

run_this = PythonOperator(
    task_id='run_this',
    provide_context=True,
    python_callable=run_this_func,
    dag=dag,
)

# You can also access the DagRun object in templates
bash_task = BashOperator(
    task_id="bash_task",
    bash_command='echo "Here is the message: '
                 '{{ dag_run.conf["message"] if dag_run else "" }}" ',
    dag=dag,
)
This should work, as per the airflow documentation: https://airflow.apache.org/cli.html#trigger_dag
airflow trigger_dag -c '{"key1":1, "key2":2}' dag_id
Make sure the value of -c is a valid json string, so the double quotes wrapping the keys are necessary here.
Suppose the conf you pass contains a list of strings under a key, e.g. key: ['param1=somevalue1', 'param2=somevalue2']. There are two ways to read it.
First way:
"{{ dag_run.conf["key"] }}"
This will render the passed value as the string "['param1=somevalue1', 'param2=somevalue2']".
Second way:
def get_parameters(self, **kwargs):
    dag_run = kwargs.get('dag_run')
    parameters = dag_run.conf['key']
    return parameters
In this scenario, a list of strings is being passed and will be rendered as a list ['param1=somevalue1', 'param2=somevalue2']
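For instance, the list above could be passed on the command line the same way as earlier in this thread (the DAG id is illustrative):
airflow trigger_dag -c '{"key": ["param1=somevalue1", "param2=somevalue2"]}' example_dag_conf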
