I would like to call a method when an Airflow sensor times out. Is there a hook provided by the Airflow library to do such actions? I have looked into the source code, and it throws AirflowSensorTimeout once the timeout happens.
Is there a way to catch the above exception, or some sort of hook provided by the sensor to run an action after the timeout?
While there is no easy option to do it for a specific kind of exception (in your case, the timeout exception), you can achieve it by adding another task that does X and passing trigger_rule='all_failed' to it, so that task will run only if the parent failed.
For example:
sensor = FileSensor(
    task_id='sensor',
    filepath='/usr/local/airflow/file',
    timeout=10,
    poke_interval=10,
    dag=dag
)
when_failed = BashOperator(
    task_id='creator',
    bash_command='touch /usr/local/airflow/file',
    trigger_rule='all_failed',
    dag=dag
)
sensor >> when_failed
You can also run only on the timeout exception by combining a ShortCircuitOperator with this trigger_rule and having it check whether the failure was a timeout, but that is more complicated.
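If you only need a side effect on timeout rather than a downstream task, another sketch is to inspect the exception in a failure callback. This assumes your Airflow version passes the raised exception as context['exception'] to the callback (recent versions do); notify_timeout is a hypothetical helper:
from airflow.exceptions import AirflowSensorTimeout

def handle_sensor_failure(context):
    # Run the timeout-specific action only when the sensor timed out,
    # not on other kinds of failure.
    if isinstance(context.get('exception'), AirflowSensorTimeout):
        notify_timeout(context)  # hypothetical helper - your post-timeout action

# Attach it to the sensor:
# sensor = FileSensor(..., on_failure_callback=handle_sensor_failure, dag=dag)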
I'm trying to implement a way to get notified when my DAG fails.
I tried the email_on_failure flag and a webhook method (https://code.mendhak.com/Airflow-MS-Teams-Operator/).
But with both of them, I got a notification for every task that failed.
Is there a way to get notified only if the whole DAG fails?
I really appreciate any help you can provide.
You can choose to set on_failure_callback at the operator level or at the DAG level.
On DAG - a function to be called when a DagRun of this dag fails.
On Operator - a function to be called when a task instance of this task fails.
In your case you need to set on_failure_callback in your DAG object:
dag = DAG(
    dag_id=dag_id,
    on_failure_callback=func_to_execute,
)
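A minimal sketch of what func_to_execute might look like, assuming you only want to send one notification per failed DagRun (the keys used here are standard callback context; the print is a stand-in for your webhook call):
def func_to_execute(context):
    # Invoked once per failed DagRun, not once per failed task.
    dag_id = context['dag'].dag_id
    run_id = context['dag_run'].run_id
    # Replace the print with your notification, e.g. the MS Teams webhook call.
    print(f"DAG {dag_id} failed (run {run_id})")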
Airflow tasks are not failing when an exception occurs during on_success_callback execution, even if the error is caught and an AirflowException is thrown in the callback function. Is this normal behaviour?
Is there any other way to make sure the task fails if an exception occurs during the callback function execution?
I believe that by the time on_success_callback runs, the task has already succeeded. If you still want the task to fail because of an error in the callback, you might need to implement a try/except block where you set the task as failed manually. Something like this:
def on_success_callback(context):
    try:
        raise ValueError
    except Exception:
        dag = context['dag']
        tasks = dag.task_ids
        print(context['execution_date'])
        dag.clear(
            task_ids=tasks,
            start_date=context['execution_date'],
            end_date=dag.end_date
        )
You can derive the task instance from the context and then error it out.
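A sketch of that idea, assuming your Airflow version exposes TaskInstance.set_state (the exact API has shifted between releases, so treat this as illustrative):
from airflow.utils.state import State

def on_success_callback(context):
    try:
        raise ValueError
    except Exception:
        # Pull the task instance out of the callback context and mark it failed.
        ti = context['task_instance']
        ti.set_state(State.FAILED)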
on_success_callback is executed after the task has finished with success.
Raising exceptions in on_success_callback will not change the task status.
If the code you execute in on_success_callback is supposed to fail the task when an exception occurs, then that code should live in the task itself.
I have an Airflow HttpSensor that calls a REST endpoint and checks for a specific value in the JSON structure returned by the API:
sensor = HttpSensor(
    soft_fail=True,
    task_id='http_sensor_check',
    http_conn_id='http_default',
    endpoint='http://localhost:8082/api/v1/resources/games/all',
    request_params={},
    response_check=lambda response: True if check_api_response(response) is True else False,
    mode='reschedule',
    dag=dag)
If the response_check is false, the DAG is put in an "up_for_reschedule" state. The issue is, the DAG stayed in that status forever and never got rescheduled.
My questions are:
What does "up_for_reschedule" mean? And when would the DAG be rescheduled?
Let's suppose my DAG is scheduled to run every 5 minutes, but because of the sensor, the "up_for_reschedule" DAG instance overlaps with the new run; will I have 2 DAGs running at the same time?
Thank you in advance.
In sensors, mode='reschedule' means that if the sensor's criteria isn't true, the sensor releases the worker slot to other tasks. This is very useful for cases when a sensor may wait for a long time.
up_for_reschedule means that the sensor condition isn't true yet and it hasn't reached its timeout, so the task is waiting to be rescheduled by the scheduler.
You don't know when the task will run; that depends on the scheduler (available resources, priorities, etc.). If you don't want to allow parallel DAG runs, use max_active_runs=1 in the DAG constructor.
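For example (a minimal sketch; the dag_id and schedule are placeholders):
from airflow import DAG

dag = DAG(
    dag_id='my_dag',                    # placeholder
    schedule_interval='*/5 * * * *',    # every 5 minutes, as in the question
    max_active_runs=1,                  # at most one active DagRun at a time
)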
Side note:
response_check=lambda response: True if check_api_response(response) is True else False,
is the same as:
response_check=lambda response: check_api_response(response),
I have a DAG which takes a very long time to do a BigQuery operation, and I always get the error 'Broken DAG: [/home/airflow/gcs/dags/xyz.py] Timeout'.
I found some answers saying that we have to increase the timeout in airflow.cfg, but that idea is not suitable for my project. Is it possible to somehow increase the timeout for a particular DAG? Any help is appreciated. Thank you.
Yes, you can set the dagrun_timeout parameter on the DAG:
Specify how long a DagRun should be up before timing out / failing, so that new DagRuns can be created. The timeout is only enforced for scheduled DagRuns, and only once the # of active DagRuns == max_active_runs.
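For example (a sketch; the dag_id and timeout value are placeholders):
from datetime import timedelta
from airflow import DAG

dag = DAG(
    dag_id='xyz',                          # placeholder
    dagrun_timeout=timedelta(hours=2),     # fail the DagRun after 2 hours so a new one can start
)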
We also have an execution_timeout parameter on each task that you can set:
execution_timeout: max time allowed for the execution of this task instance; if it goes beyond it, the task will raise and fail.
:type execution_timeout: datetime.timedelta
So if one of the tasks is running a query on BigQuery, you can use something like:
BigQueryOperator(
    sql=sql,
    destination_dataset_table='{{ params.t_name }}',
    task_id='bq_query',
    bigquery_conn_id='my_bq_connection',
    use_legacy_sql=False,
    write_disposition='WRITE_TRUNCATE',
    create_disposition='CREATE_IF_NEEDED',
    params={'t_name': table_name},
    execution_timeout=datetime.timedelta(minutes=10),
    dag=dag)
Can I externally (e.g. via an HTTP request) mark a specific task_id associated with a dag_id and run_id as success/failure?
My task is a long-running task on an external system, and I don't want my task to poll the system to find the status, since we can probably have several thousand tasks running at the same time.
Ideally I want my task to:
make an HTTP request to start my external job
go to sleep
once the job is finished, it (the external system, or the post-build action of my job) informs Airflow that the task is done (identified by task_id, dag_id and run_id)
Thanks
You can solve this by sending SQL queries directly into Airflow's metadata DB:
UPDATE task_instance
SET state = 'success',
    try_number = 0
WHERE
    task_id = 'YOUR-TASK-ID'
    AND dag_id = 'YOUR-DAG-ID'
    AND execution_date = '2019-06-27T16:56:17.789842+00:00';
Notes:
The execution_date filter is crucial: Airflow identifies DagRuns by execution_date, not really by their run_id. This means you really need to get your DagRun's execution/run date to make it work.
The try_number = 0 part is added because sometimes Airflow will reset the task back to failed if it notices that try_number is already at its limit (max_tries).
You can see it in Airflow's source code here: https://github.com/apache/airflow/blob/750cb7a1a08a71b63af4ea787ae29a99cfe0a8d9/airflow/models/dagrun.py#L203
Airflow doesn't yet have a REST endpoint for this. However, you have a couple of options:
- Use the Airflow command-line utilities to mark the job as success, e.g. from Python using Popen (see the sketch below).
- Directly update the Airflow DB table task_instance.
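A minimal sketch of the CLI route from Python, assuming an Airflow 1.x-style run command with a --mark_success flag (Airflow 2.x renames this to airflow tasks run --mark-success; check your version's CLI):
import subprocess

# Mark the task instance as succeeded without actually running it.
subprocess.Popen([
    'airflow', 'run', '--mark_success',
    'YOUR-DAG-ID', 'YOUR-TASK-ID', '2019-06-27T16:56:17.789842+00:00',
]).wait()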