I set up a DAG as below and launched it by running:
airflow backfill jobs -s 2017-05-01 -e 2017-06-07
I did not receive any emails, although the jobs were successful.
Am I supposed to do anything else to receive the emails?
Or should I run the DAG in a different way?
The DAG is defined as below:
default_args = {
    'owner': 'me',
    'depends_on_past': False,
    'start_date': datetime(2017, 5, 1),
    'end_date': datetime(2017, 6, 3),
    'email': ['Owner@gmail.com'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 1,
    'retry_delay': timedelta(minutes=30),
}
By default, Airflow only sends email on retry and on failure. To send an email on success, you can add an EmailOperator at the end of your DAG.
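As a rough sketch (the dag_id, task_ids, and address below are placeholders, not from the question), an EmailOperator appended as the final task will only fire when everything upstream succeeded, since "all upstream succeeded" is the default trigger rule:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.email_operator import EmailOperator

with DAG('jobs_with_success_email',
         start_date=datetime(2017, 5, 1),
         schedule_interval='@daily') as dag:

    # Stand-in for the real work in the DAG.
    work = DummyOperator(task_id='do_work')

    # Runs only after all upstream tasks succeed (default trigger rule),
    # so this acts as an "email on success" notification.
    notify = EmailOperator(
        task_id='email_on_success',
        to=['owner@example.com'],  # placeholder address
        subject='DAG run {{ ds }} succeeded',
        html_content='All tasks finished successfully.',
    )

    work >> notify
```

Note this still relies on SMTP being configured in airflow.cfg, the same as email_on_failure does.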
I created a DAG with the following configuration:
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(0, 0, minute=1),
    'email': ['francisco.salazar.12@sansano.usm.cl'],
    'email_on_failure': False,
    'email_on_retry': False,
    'max_active_runs': 1,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
    'provide_context': True
}
dag = DAG(
    'terralink_environmetal_darksky',
    default_args=default_args,
    description='Extract Data from Darksky API',
    catchup=False,
    schedule_interval='31 * * * *',
)
The issue is that the scheduler works correctly and executes a DAG run at every hour I defined in schedule_interval (minute 31 of every hour), BUT the run at midnight (the one scheduled at 00:31:00 of the next day) is never triggered.
I think the problem comes from start_date, but I don't know yet how to define this parameter in order to avoid it.
Airflow recommends stating a static start_date for your DAG; a dynamic one such as days_ago(0, 0, minute=1) is re-evaluated every time the scheduler parses the file, which can confuse scheduling. start_date is mainly there to specify when you want your DAG to start running for the very first time; once it has served that purpose, schedule_interval is what matters (unless you need to backfill or reset your DAG).
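To illustrate why this matters (the dates below are made-up examples): the scheduler triggers a run only after its schedule interval has fully elapsed, so with a static start_date the first run's execution_date equals start_date and the run actually fires at start_date + interval:

```python
from datetime import datetime, timedelta

# Assumed example values: a static start_date and an hourly interval
# (mirroring a "minute 31 of every hour" schedule).
start_date = datetime(2021, 1, 1, 0, 31)
interval = timedelta(hours=1)

# The first DAG run covers the window [start_date, start_date + interval)
# and is only triggered once that window has closed.
first_execution_date = start_date
first_trigger_time = start_date + interval

print(first_execution_date)  # 2021-01-01 00:31:00
print(first_trigger_time)    # 2021-01-01 01:31:00
```

With a dynamic start_date like days_ago(0), that window keeps moving forward on every parse, so intervals near midnight may never be seen as "fully elapsed".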
I have a DAG configured like this:
AIRFLOW_DEFAULT_ARGS = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'dagrun_timeout': timedelta(hours=5)
}

DAILY_RUNNER = DAG(
    'daily_runner',
    max_active_runs=1,
    start_date=datetime(2019, 1, 1),
    schedule_interval="0 17 * * *",
    default_args=AIRFLOW_DEFAULT_ARGS)
My current understanding is that retries means a task will be retried once before failing for good. Is there a way to set a similar limit for the number of times a DAG gets retried? If I have a DAG in the running state, I want to be able to set it to failed from within the UI once, and have it stop rerunning.
Currently, there is no way to set retries at the DAG level.
Please refer to the answer below for retrying a set of tasks, or the whole DAG, in case of failure:
Can a failed Airflow DAG Task Retry with changed parameter
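To make the per-task semantics concrete: retries=1 in default_args gives each individual task up to retries + 1 attempts; nothing re-runs the DAG as a whole. A minimal pure-Python sketch of that attempt loop (an illustration of the semantics, not Airflow's actual implementation):

```python
def run_with_retries(task, retries=1):
    """Attempt a callable up to retries + 1 times, mimicking per-task retry."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > retries:
                raise  # out of attempts: the task fails for good

# A task that fails on the first call, then succeeds.
calls = {'n': 0}

def flaky():
    calls['n'] += 1
    if calls['n'] == 1:
        raise RuntimeError('transient failure')
    return 'ok'

result, attempts = run_with_retries(flaky, retries=1)
print(result, attempts)  # ok 2
```

Each task instance tracks its own attempt count independently, which is why there is no single knob for "retry the whole DAG run".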
In 2 out of 10 runs, the DAG status is automatically set to success even when no task inside it ran. Following are the DAG args that were passed, and its tree view.
args = {
    'owner': 'xyz',
    'depends_on_past': False,
    'catchup': False,
    'start_date': datetime(2019, 7, 8),
    'email': ['a@b.c'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'provide_context': True,
    'retry_delay': timedelta(minutes=2)
}
And I am using the DAG as a context manager like this:
with DAG(PARENT_DAG_NAME, default_args=args, schedule_interval='30 * * * *') as main_dag:
    task1 = DummyOperator(
        task_id='Raw_Data_Ingestion_Started',
    )
    task2 = DummyOperator(
        task_id='Raw_Data_Ingestion_Completed',
    )
    task1 >> task2
Any idea what could be the issue? Is it something I need to change in the config file? And this behaviour is not periodic.
According to the official airflow documentation on DummyOperator:
Operator that does literally nothing. It can be used to group tasks in a DAG.
I'm trying to use Apache Airflow.
I managed to install everything.
I added a new DAG into the dags folder, and when I run airflow list_dags it shows me the example DAGs along with my new DAG.
However, when I go to the UI I can't see the DAG listed in the DAGs tab.
I already killed the webserver and restarted everything; it didn't work.
FYI, I'm running Airflow on a VM with CentOS 7.
Thanks.
Zack in the comment section is right. If you change the owner in the DAG's arguments from the default 'airflow' to something else, e.g.
default_args = {
    'owner': 'whateveryournameis',  # <----
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}

dag = DAG('tutorial', default_args=default_args, schedule_interval=timedelta(days=1))
then, in order to have your new DAG shown in the UI's DAG list, you should create a new user in Airflow.
Creating a user is simple: in the UI, under Admin, go to Users and create a new one.
I configured my DAG like this:
default_args = {
    'owner': 'Aviv',
    'depends_on_past': False,
    'start_date': datetime(2017, 1, 1),
    'email': ['aviv@oron.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=1)
}

dag = DAG(
    'MyDAG',
    schedule_interval=timedelta(minutes=3),
    default_args=default_args,
    catchup=False,
)
and for some reason, when I un-pause the DAG, it is executed twice immediately.
Any idea why? And is there any rule I can apply to tell this DAG to never run more than once at the same time?
You can specify max_active_runs like this:
dag = airflow.DAG(
    'customer_staging',
    schedule_interval="@daily",
    dagrun_timeout=timedelta(minutes=60),
    template_searchpath=tmpl_search_path,
    default_args=args,
    max_active_runs=1)
I've never seen it happening; are you sure those runs are not backfills? See: https://stackoverflow.com/a/47953439/9132848
I think it's because you have missed the scheduled time and Airflow is backfilling it automatically when you turn the DAG on again. You can disable this by setting
catchup_by_default = False in airflow.cfg.
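The same behaviour can also be turned off per DAG rather than globally. A sketch with placeholder names (catchup=False on the DAG overrides catchup_by_default for that DAG only):

```python
from datetime import datetime, timedelta

from airflow import DAG

# Example DAG (placeholder dag_id): with catchup=False the scheduler only
# creates the most recent run when the DAG is un-paused, instead of
# backfilling every missed interval since start_date.
dag = DAG(
    'no_catchup_example',
    start_date=datetime(2019, 1, 1),  # static start date in the past
    schedule_interval=timedelta(days=1),
    catchup=False,
)
```

This is usually preferable to the airflow.cfg setting when only some DAGs should skip missed intervals.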