airflow UI not showing all DAGs

I'm trying to use Apache Airflow and I managed to install everything.
I added a new DAG into the dags folder, and when I run airflow list_dags it shows the example DAGs along with my new DAG.
However, when I go to the UI I can't see the DAG listed in the DAGs tab.
I already killed the webserver and restarted everything, but it didn't work.
FYI, I'm running Airflow on a VM with CentOS 7.
Thanks.

Zack in the comments is right. If you change the owner in the DAG's default arguments from the default 'airflow' to something else, e.g.:
from datetime import datetime, timedelta

from airflow import DAG

default_args = {
    'owner': 'whateveryournameis',  # <----
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}

dag = DAG('tutorial', default_args=default_args, schedule_interval=timedelta(days=1))
then, in order to have your new DAG show up in the UI's DAG list, you should create a new user in Airflow whose username matches that owner.
Creating a user is simple: in the UI, go to Admin > Users and create a new one (or use the Python shell sketch below).
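If you prefer to create the user from a Python shell instead of the UI, here is a minimal sketch, assuming the password_auth backend of Airflow 1.x is enabled in your airflow.cfg; the username, email and password values are placeholders:
from airflow import models, settings
from airflow.contrib.auth.backends.password_auth import PasswordUser

user = PasswordUser(models.User())
user.username = 'whateveryournameis'   # should match the 'owner' you set in default_args
user.email = 'whateveryournameis@example.com'
user.password = 'choose_a_password'
session = settings.Session()
session.add(user)
session.commit()
session.close()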

Related

How to limit the number of dag retries?

I have a DAG configured like this:
AIRFLOW_DEFAULT_ARGS = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'dagrun_timeout': timedelta(hours=5)
}

DAILY_RUNNER = DAG(
    'daily_runner',
    max_active_runs=1,
    start_date=datetime(2019, 1, 1),
    schedule_interval="0 17 * * *",
    default_args=AIRFLOW_DEFAULT_ARGS)
My current understanding is that retries means a task will be retried once before failing for good. Is there a way to set a similar limit on the number of times a DAG gets retried? If I have a DAG in the running state, I want to be able to set it to failed from within the UI once and have it stop rerunning.
Currently, there is no way to set retries at the DAG level; the retries setting applies per task, as sketched further below.
Please refer to the answer below for retrying a set of tasks or the whole DAG in case of failures:
Can a failed Airflow DAG Task Retry with changed parameter
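For what it's worth, here is a minimal sketch of overriding retries for a single task (the BashOperator, its task_id and command are illustrative, and it assumes the DAILY_RUNNER DAG from the question is in scope):
from airflow.operators.bash_operator import BashOperator

critical_task = BashOperator(
    task_id='critical_task',                      # illustrative task
    bash_command='echo "running the daily report"',
    retries=3,                                    # overrides the retries=1 inherited from default_args
    dag=DAILY_RUNNER)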

How to limit Airflow to run only one instance of a DAG run at a time?

I want the tasks in the DAG to all finish before the 1st task of the next run gets executed.
I have max_active_runs = 1, but this still happens.
default_args = {
    'depends_on_past': True,
    'wait_for_downstream': True,
    'max_active_runs': 1,
    'start_date': datetime(2018, 3, 4),
    'owner': 't.n',
    'email': ['t.n@example.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=4)
}

dag = DAG('example', default_args=default_args, schedule_interval=schedule_interval)
(All of my tasks are dependent on the previous task. Airflow version is 1.8.0)
Thank you
I moved max_active_runs into the arguments of DAG() instead of default_args, and it worked.
Thanks SimonD for giving me the idea, even though your answer didn't point to it directly.
You've put the 'max_active_runs': 1 into the default_args parameter and not into the correct spot.
max_active_runs is a constructor argument for a DAG and should not be put into the default_args dictionary.
Here is an example DAG that shows where you need to move it to:
dag_args = {
    'owner': 'Owner',
    # 'max_active_runs': 1,  # <--- Here is where you had it.
    'depends_on_past': False,
    'start_date': datetime(2018, 1, 1, 12, 0),
    'email_on_failure': False
}

sched = timedelta(hours=1)

dag = DAG(
    job_id,
    default_args=dag_args,
    schedule_interval=sched,
    max_active_runs=1  # <---- Here is where it is supposed to be
)
If the tasks that your DAG is running are actually sub-DAGs, then you may need to pass max_active_runs into the sub-DAGs too, but I'm not 100% sure about this; a sketch of that idea follows.
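A hedged sketch of that sub-DAG idea (Airflow 1.x import path; the factory function and ids are illustrative, and mirroring max_active_runs on the child DAG reflects the answer's own uncertainty rather than a confirmed requirement):
from airflow.operators.subdag_operator import SubDagOperator

def make_subdag(parent_dag_id, child_id, args, sched):
    # The child DAG's id must be '<parent_dag_id>.<task_id>' for SubDagOperator.
    return DAG(dag_id='%s.%s' % (parent_dag_id, child_id),
               default_args=args,
               schedule_interval=sched,
               max_active_runs=1)  # mirrored on the child DAG, per the note above

section_1 = SubDagOperator(
    task_id='section_1',
    subdag=make_subdag(dag.dag_id, 'section_1', dag_args, sched),
    dag=dag)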
You can use XComs to do it. First add two PythonOperators, 'start' and 'end', to the DAG, and set the flow as:
start ---> ALL TASKS ----> end
'end' will always push a variable
last_success = context['execution_date'] to XCom (xcom_push). (This requires provide_context=True in the PythonOperators.)
And 'start' will always check XCom (xcom_pull) to see whether there is a last_success variable whose value equals the previous DagRun's execution_date, or the DAG's start_date (to let the process start); a rough sketch of this pattern follows below.
I followed this answer.
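A rough sketch of that XCom-gating idea, assuming Airflow 1.x-style PythonOperators with provide_context=True; the dag object, the task ids and the simple "fail if the previous run hasn't pushed its marker yet" check are illustrative assumptions, not code from the answer:
from airflow.exceptions import AirflowException
from airflow.operators.python_operator import PythonOperator

def push_last_success(**context):
    # 'end' records the execution_date of the run that just finished.
    context['ti'].xcom_push(key='last_success', value=str(context['execution_date']))

def check_previous_run(**context):
    # 'start' looks for the marker pushed by the previous run's 'end' task.
    last_success = context['ti'].xcom_pull(
        task_ids='end', key='last_success', include_prior_dates=True)
    first_run = context['execution_date'] == dag.default_args['start_date']
    if last_success is None and not first_run:
        raise AirflowException('Previous DAG run has not finished yet')

start = PythonOperator(task_id='start', python_callable=check_previous_run,
                       provide_context=True, dag=dag)
end = PythonOperator(task_id='end', python_callable=push_last_success,
                     provide_context=True, dag=dag)

# The real tasks sit between the two gates: start >> ALL TASKS >> end
start >> end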
Actually, you should set DAG_CONCURRENCY=1 as an environment variable. It worked for me.

Apache Airflow Task Instance state is blank

I have the DAG configured like below:
args = {
    'owner': 'XXX',
    'depends_on_past': False,
    'start_date': datetime(2018, 2, 26),
    'email': ['sample@sample.com'],
    'email_on_failure': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(dag_id='Daily_Report',
          default_args=args,
          schedule_interval='0 11 * * *',
          dagrun_timeout=timedelta(seconds=30))
I have a BashOperator and a DatabricksSubmitRunOperator:
run_this = BashOperator(task_id='run_report',
                        bash_command=templated_command,
                        dag=dag)

notebook_run = DatabricksSubmitRunOperator(
    task_id='notebook_run',
    notebook_task=notebook_task,
    existing_cluster_id='xxxx',
    dag=dag)
I'm chaining them with run_this.set_downstream(notebook_run).
The BashOperator runs fine, but the Databricks operator doesn't run; it just leaves a blank state like below:
[Screenshot: blank task instance status in the Airflow UI]
Anything I'm missing? I'm using the Airflow version from Databricks here: https://github.com/databricks/incubator-airflow
Try highlighting the text in the white label. It will probably say "None". White on white is terrible UX so I'm not sure why Airflow does it that way.

unwanted DAG runs in Airflow

I configured my DAG like this:
default_args = {
    'owner': 'Aviv',
    'depends_on_past': False,
    'start_date': datetime(2017, 1, 1),
    'email': ['aviv@oron.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=1)
}

dag = DAG(
    'MyDAG',
    schedule_interval=timedelta(minutes=3),
    default_args=default_args,
    catchup=False
)
For some reason, when I un-pause the DAG, it gets executed twice immediately.
Any idea why? And is there any rule I can apply to tell this DAG to never run more than one instance at the same time?
You can specify max_active_runs like this:
dag = airflow.DAG(
    'customer_staging',
    schedule_interval="@daily",
    dagrun_timeout=timedelta(minutes=60),
    template_searchpath=tmpl_search_path,
    default_args=args,
    max_active_runs=1)
I've never seen that happen. Are you sure those runs are not backfills? See: https://stackoverflow.com/a/47953439/9132848
I think it's because you have missed the scheduled time and Airflow is backfilling it automatically when you turn it on again. You can disable this by setting
catchup_by_default = False in airflow.cfg.

airflow does not send emails when jobs are launched?

I set up a DAG as below, and launched it with:
airflow backfill jobs -s 2017-05-01 -e 2017-06-07
I did not receive any emails although the jobs succeeded.
Am I supposed to do anything else to receive the emails?
Or should I run the DAG in a different way?
The DAG is as below:
default_args = {
    'owner': 'me',
    'depends_on_past': False,
    'start_date': datetime(2017, 5, 1),
    'end_date': datetime(2017, 6, 3),
    'email': ['Owner@gmail.com'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 1,
    'retry_delay': timedelta(minutes=30),
}
By default, Airflow only offers the option to send emails on retry and on failure. To send an email on success, you can add an EmailOperator at the end of your DAG, as sketched below.
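A minimal sketch of that idea (Airflow 1.x import path; the dag object, the recipient address and the final_task placeholder are assumptions, not taken from the question):
from airflow.operators.email_operator import EmailOperator

notify_success = EmailOperator(
    task_id='notify_success',
    to='Owner@gmail.com',                               # assumed recipient
    subject='DAG {{ dag.dag_id }} succeeded',           # these fields are Jinja-templated
    html_content='Run {{ ds }} finished successfully.',
    dag=dag)

# Chain it after the last real task so it only runs when everything upstream succeeded.
final_task >> notify_success   # final_task is a placeholder for your last task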
