I have an Airflow 1.10 DAG with the following sequence of operators:
PythonOperator1 --> S3KeySensor --> PythonOperator2 --> PythonOperator3
My requirement is to send an email notification if either of the following happens:
S3KeySensor fails (a timeout occurs while waiting for the file; with soft_fail=True the task is marked as skipped)
PythonOperator2 or PythonOperator3 fails
No email is needed if the DAG completes successfully.
Can anyone help me implement this conditional logic with the EmailOperator?
Thanks.
There is an on_failure_callback option for every Airflow task. Its value is supposed to be a Python callable, so you would configure your tasks (the S3KeySensor and the PythonOperator-based ones) like this:
def send_email(context=None):
    # Airflow 2 import path; on Airflow 1.10 the module is airflow.operators.email_operator
    from airflow.operators.email import EmailOperator

    email_alert = EmailOperator(...)
    email_alert.execute(context)


with DAG(...) as dag:
    task_a = S3KeySensor(on_failure_callback=send_email, ...)
    task_b = PythonOperator(on_failure_callback=send_email, ...)

    task_a >> task_b
More info on callbacks:
https://airflow.apache.org/docs/apache-airflow/2.2.1/logging-monitoring/callbacks.html
Remember that Airflow 1.* is deprecated already so it's recommended to use Airflow 2+.
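As a rough sketch of what such a callback could put into the message, the helper below builds a subject and body from the context dictionary that Airflow passes to callbacks. The name build_failure_email is made up for this illustration, and it only assumes that the context carries a task_instance (with dag_id and task_id) and possibly an exception entry, which is worth checking on your Airflow version. The fake context is a stand-in so the helper can be tried outside Airflow.

```python
from types import SimpleNamespace


def build_failure_email(context):
    """Hypothetical helper: turn a callback context into (subject, body)."""
    ti = context["task_instance"]
    # 'exception' may be missing from the context, so fall back to a placeholder
    error = context.get("exception", "no exception recorded")
    subject = f"Airflow alert: {ti.dag_id}.{ti.task_id} failed"
    body = f"Task {ti.task_id} in DAG {ti.dag_id} failed.<br>Error: {error}"
    return subject, body


# Stand-in context for trying the helper outside Airflow (values are invented)
fake_context = {
    "task_instance": SimpleNamespace(dag_id="my_dag", task_id="wait_for_file"),
    "exception": "sensor timed out",
}
subject, body = build_failure_email(fake_context)
print(subject)
```

Inside send_email, the two returned strings could then be passed to EmailOperator through its subject and html_content arguments.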
I am trying to implement a simple deferrable operator based on this example, but nothing seems to happen after I manually trigger my DAG (the same happens with the exact code from the example).
class TestDefer(BaseOperator):
    def execute(self, context):
        print("--- execute --")
        self.defer(
            trigger=TimeDeltaTrigger(delta=timedelta(seconds=1)),
            method_name="func",
        )

    def func(self, context, event=None):
        print("--- func ----")
        pass


with DAG(
    "def_dag", schedule_interval=None, start_date=datetime.now(),
) as dag:
    t = TestDefer(task_id="defer_task")

and then:
airflow dags test def_dag now
airflow triggerer
Result: func is never called.
Thanks in advance for your help.
Your deferrable operator code is correct. I tested it with the DAG below in Airflow 2.5.1 (only changed the print statements to logs and the start_date because datetime.now() can lead to issues when scheduling, but it should work manually as you had it).
Is the issue the same when you run the DAG manually from the UI? With airflow dags test... my output does not contain "--- func ----", but when I run the DAG manually from the UI the line is printed and the DAG works as expected (this might be loosely related to this issue).
If manually running from the UI does not work: what is the output of docker ps?
from airflow import DAG
from datetime import timedelta, datetime
from airflow.triggers.temporal import TimeDeltaTrigger
from airflow.models.baseoperator import BaseOperator
import logging

# get Airflow logger
log = logging.getLogger('airflow.task')


class TestDefer(BaseOperator):
    def execute(self, context):
        log.info("--- execute --")
        self.defer(
            trigger=TimeDeltaTrigger(delta=timedelta(seconds=1)),
            method_name="func",
        )

    def func(self, context, event=None):
        log.info("--- func ----")
        pass


with DAG(
    "def_dag",
    schedule_interval=None,
    start_date=datetime(2023, 1, 1),
    catchup=False
) as dag:
    t = TestDefer(task_id="defer_task")
After a few tests with Airflow 2.5.1, and following your advice, my deferrable operator works with these steps:
launch airflow scheduler
launch airflow triggerer
run airflow dags test... or trigger the DAG from the UI
Thanks for the help.
I have been trying to resolve this for the past 2 days. I created a DAG Python script and saved it in the dags folder that "airflow.cfg" refers to. The other DAGs are getting updated, except for this one. I tried restarting the scheduler, and I also reset the Airflow database with airflow db reset followed by airflow db init, but the issue persists.
Some ideas on what you could check:
Do all of your DAGs have a unique dag_id? (I lost a few hours to this once: if two DAGs have the same dag_id, the scheduler will randomly pick one to display on every dag_dir_list_interval.)
If you are using the @dag decorator: are you calling the DAG function below its definition? Like so:
from airflow.decorators import dag, task
from pendulum import datetime


@dag(
    dag_id="unique_name",
    start_date=datetime(2022, 12, 10),
    schedule=None,
    catchup=False,
)
def my_dag():
    @task
    def say_hi():
        return "hi"

    say_hi()


# without this line the DAG will not show up in the UI
my_dag()
What is the output of airflow dags list and airflow dags list-import-errors?
If you have a lot of DAGs in your environment you might want to increase the dagbag_import_timeout.
Does your DAG work if you drop it into a new Airflow instance? (The easiest way to check is to spin up a project with the Astro CLI and put the DAG into the dags folder created by astro dev init.)
Disclaimer: I work at Astronomer, who develops the Astro CLI as an OS project.
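For the first check in the list above (duplicate dag_id values), a quick scan of the dags folder can help. This is only a heuristic sketch: find_duplicate_dag_ids is a hypothetical helper, and its regular expression only sees literal dag_id="..." keyword arguments, so IDs built dynamically or passed positionally are missed.

```python
import re
from collections import defaultdict
from pathlib import Path

# Naive pattern: only catches literal dag_id="..." / dag_id='...' keyword arguments.
# The \b keeps names such as external_dag_id from matching.
DAG_ID_PATTERN = re.compile(r"""\bdag_id\s*=\s*["']([^"']+)["']""")


def find_duplicate_dag_ids(dags_folder):
    """Hypothetical helper: map each repeated dag_id to the files that use it."""
    root = Path(dags_folder)
    seen = defaultdict(list)
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        # set() so that a dag_id repeated inside one file is counted once
        for dag_id in set(DAG_ID_PATTERN.findall(text)):
            seen[dag_id].append(path.relative_to(root).as_posix())
    return {dag_id: files for dag_id, files in seen.items() if len(files) > 1}
```

Calling find_duplicate_dag_ids with the path of your dags folder returns an empty dict when every literal dag_id it can see is unique.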
I am trying to set two default values for all my dags. For that, I have created the file airflow_local_settings.py in my home directory of Airflow with the following code (trying to follow the example in https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#dag-level-cluster-policy):
def dag_policy(dag: DAG):
    dag.catchup = False
    dag.default_args['email'] = 'blabla'
However, I get the error:
Error: name 'DAG' is not defined
If I leave the code as:
def dag_policy(dag: DAG):
    dag.catchup = False
    dag.default_args['email'] = 'blabla'
Then the dags run with catchup=True and no email. How could I solve it? Thanks
DAG Policy was added in a PR and is available only in Airflow >= 2.0. Since you are running 1.10.14, this feature is not available to you.
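For readers who are on Airflow 2.0 or later, the NameError from the question comes from the DAG annotation being evaluated without a matching import. A minimal sketch of one way around that, assuming the file is airflow_local_settings.py and reusing the question's placeholder email value, is to quote the annotation and import DAG only for type checkers. Whether changing default_args at this point still reaches tasks that were already created is not something this sketch verifies.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers; nothing is imported when the module loads
    from airflow.models.dag import DAG


def dag_policy(dag: "DAG") -> None:
    """Apply the defaults from the question to each DAG as it is loaded."""
    # The quoted annotation is never evaluated at definition time,
    # so no NameError is raised even though DAG is not imported at runtime.
    dag.catchup = False
    dag.default_args["email"] = "blabla"  # placeholder value from the question
```

A plain import of DAG at the top of the file should remove the error as well; the quoted annotation just keeps the settings module free of runtime imports.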
I am using Airflow v2.0 on Windows 10 WSL (Ubuntu 20.04).
The warning message is:
/home/jainri/.local/lib/python3.8/site-packages/airflow/models/dag.py:1342: PendingDeprecationWarning: The requested task could not be added to the DAG because a task with task_id create_tag_template_field_result is already in the DAG. Starting in Airflow 2.0, trying to overwrite a task will raise an exception.
warnings.warn(
Done.
With this warning, the web UI also shows the example DAGs that ship with Apache Airflow. I have set AIRFLOW_HOME, and Airflow picks up my DAGs from there, but the example DAGs are displayed as well. I have also posted an image of the web UI.
WebUI
This is the dag below that I am trying to run:
import datetime
import logging

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


#
# TODO: Define a function for the python operator to call
#
def greet():
    logging.info("Hello Rishabh!!")


dag = DAG(
    'lesson1.demo1',
    start_date=datetime.datetime.now()
)

#
# TODO: Define the task below using PythonOperator
#
greet_task = PythonOperator(
    task_id='greet_task',
    python_callable=greet,
    dag=dag
)
Also, the main issue is that the web UI lists the example DAGs. They show up as a huge list alongside my own DAGs, which makes it cumbersome to find mine.
I found the issue: the warning you are seeing comes from airflow/example_dags/example_complex.py, one of the example DAGs shipped with Airflow.
Disable loading of example_dags by setting AIRFLOW__CORE__LOAD_EXAMPLES=False as an environment variable or set [core] load_examples = False in airflow.cfg (docs).
I have a dag that we'll deploy to multiple different airflow instances and in our airflow.cfg we have dags_are_paused_at_creation = True but for this specific dag we want it to be turned on without having to do so manually by clicking on the UI. Is there a way to do it programmatically?
I created the following function to do so if anyone else runs into this issue:
import airflow.settings
from airflow.models import DagModel


def unpause_dag(dag):
    """
    A way to programmatically unpause a DAG.
    :param dag: DAG object
    :return: dag.is_paused is now False
    """
    session = airflow.settings.Session()
    try:
        qry = session.query(DagModel).filter(DagModel.dag_id == dag.dag_id)
        d = qry.first()
        d.is_paused = False
        session.commit()
    except Exception:
        # roll back on any error, e.g. when the DAG has no row in the database yet
        session.rollback()
    finally:
        session.close()
The airflow-rest-api-plugin can also be used to pause DAGs programmatically.
Pauses a DAG
Available in Airflow Version: 1.7.0 or greater
GET - http://{HOST}:{PORT}/admin/rest_api/api?api=pause
Query Arguments:
dag_id - string - The id of the dag
subdir (optional) - string - File location or directory from which to look for the dag
Examples:
http://{HOST}:{PORT}/admin/rest_api/api?api=pause&dag_id=test_id
See for more details:
https://github.com/teamclairvoyant/airflow-rest-api-plugin
Supply your dag_id and run this command on your command line:
airflow pause dag_id
For more information on the airflow command line interface: https://airflow.incubator.apache.org/cli.html
I think you are looking for unpause (not pause):
airflow unpause DAG_ID
The following CLI command should work per the recent docs:
airflow dags unpause dag_id
https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#unpause
Airflow's REST API provides a way to do this through the DAG patch endpoint: update the DAG with the query parameter ?update_mask=is_paused and send the new boolean value for is_paused in the request body.
Ref: https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/patch_dag
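A minimal sketch of such a request, using only the Python standard library. The function name build_pause_request, the host and the dag_id are placeholders, the /api/v1 base path is assumed from the Airflow 2 stable REST API, and authentication is left out because it depends on the configured auth backend. The request is only built here, not sent.

```python
import json
import urllib.request
from urllib.parse import quote


def build_pause_request(base_url, dag_id, is_paused):
    """Build (but do not send) a PATCH request that sets is_paused on a DAG."""
    url = (
        f"{base_url.rstrip('/')}/api/v1/dags/"
        f"{quote(dag_id, safe='')}?update_mask=is_paused"
    )
    body = json.dumps({"is_paused": is_paused}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and dag_id; credentials are intentionally omitted
request = build_pause_request("http://localhost:8080", "my_dag", False)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of adding the credentials your deployment expects and passing the request to urllib.request.urlopen.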
The command
airflow pause dag_id
has been discontinued. You will have to use:
airflow dags pause dag_id
You can also do this from a PythonOperator in any DAG to pause and unpause DAGs programmatically. This is the best approach I have found that avoids the CLI: just pass in the list of DAGs and the rest is taken care of.
from airflow.models import DagModel
dag_id = "dag_name"
dag = DagModel.get_dagmodel(dag_id)
dag.set_is_paused(is_paused=False)
If you just want to check whether it is paused, this returns a boolean:
dag.is_paused  # an attribute on DagModel, not a method