Airflow dag gets stuck after renaming a task - airflow

I have a task (named task1) inside a subdag, and a dag run that includes it has finished. I then renamed task1 to task2 and reran the previous dag run using 'airflow clear'. Now the subdag is always in the running state. When I zoom into the subdag, I can see that its state is success and all of its tasks have finished successfully.
The subdag's log shows Airflow is still waiting for task1 (but it is marked as 'removed'):
[2018-08-22 23:53:04,032] {base_task_runner.py:95} INFO - Subtask: [2018-08-22 23:53:04,032] {jobs.py:2002} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 1 | succeeded: 5 | kicked_off: 0 | failed: 0 | skipped: 5 | deadlocked: 0 | not ready: 0
[2018-08-22 23:53:04,032] {base_task_runner.py:95} INFO - Subtask: [2018-08-22 23:53:04,032] {jobs.py:2006} INFO - Finished dag run loop iteration. Remaining tasks [<TaskInstance: task1 2018-08-08 15:13:02 [removed]>]
[2018-08-22 23:53:09,050] {base_task_runner.py:95} INFO - Subtask: [2018-08-22 23:53:09,049] {jobs.py:2002} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 1 | succeeded: 5 | kicked_off: 0 | failed: 0 | skipped: 5 | deadlocked: 0 | not ready: 0
[2018-08-22 23:53:09,050] {base_task_runner.py:95} INFO - Subtask: [2018-08-22 23:53:09,050] {jobs.py:2006} INFO - Finished dag run loop iteration. Remaining tasks [<TaskInstance: task1 2018-08-08 15:13:02 [removed]>]
[2018-08-22 23:53:14,068] {base_task_runner.py:95} INFO - Subtask: [2018-08-22 23:53:14,067] {jobs.py:2002} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 1 | succeeded: 5 | kicked_off: 0 | failed: 0 | skipped: 5 | deadlocked: 0 | not ready: 0
[2018-08-22 23:53:14,068] {base_task_runner.py:95} INFO - Subtask: [2018-08-22 23:53:14,068] {jobs.py:2006} INFO - Finished dag run loop iteration. Remaining tasks [<TaskInstance: task1 2018-08-08 15:13:02 [removed]>]
[2018-08-22 23:53:19,083] {base_task_runner.py:95} INFO - Subtask: [2018-08-22 23:53:19,083] {jobs.py:2002} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 1 | succeeded: 5 | kicked_off: 0 | failed: 0 | skipped: 5 | deadlocked: 0 | not ready: 0
[2018-08-22 23:53:19,084] {base_task_runner.py:95} INFO - Subtask: [2018-08-22 23:53:19,083] {jobs.py:2006} INFO - Finished dag run loop iteration. Remaining tasks [<TaskInstance: task1 2018-08-08 15:13:02 [removed]>]
I'm using apache-airflow 1.8.1.
What should I do now?

Can you recreate the error without using the subdag? Airflow treats each subdag as a single vertex instead of the whole graph, so when you changed the one task, it updated the subdag, but that change did not propagate back to the parent DAG.
The way Airflow handles subdags can have unintended consequences, so most of the community advises staying away from them.

You must change the name of the subdag. Due to how Airflow saves the information of the DAGs in the metadata db, you need to change the DAG name each time you make significant changes in the DAG.
That is why the naming convention for DAGs is my_dag_v1, so that you can conveniently update the v number each time you make changes.
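As a minimal sketch of that naming convention (the helper and constants below are hypothetical, not part of Airflow), the version number can live in one place so that bumping it renames the DAG:

```python
# Hypothetical constants: bump DAG_VERSION whenever the DAG's structure changes.
DAG_BASE_NAME = "my_dag"
DAG_VERSION = 1


def versioned_dag_id(base_name, version):
    """Build an id such as 'my_dag_v1' from a base name and a version number."""
    return f"{base_name}_v{version}"


# The resulting string would be passed as the dag_id when the DAG is defined.
dag_id = versioned_dag_id(DAG_BASE_NAME, DAG_VERSION)
```

Because the metadata db stores history per dag_id, the renamed DAG starts with a clean history; the runs recorded under the old name are not carried over.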

Related

Airflow standalone sqlite3 Integrity Error

I'm trying to run airflow standalone after following these instructions https://airflow.apache.org/docs/apache-airflow/stable/start/local.html on "Ubuntu on Windows". I placed the AirflowHome folder inside C:/Users/my_user_name/ and that's essentially the only change I made. However, I'm getting an IntegrityError and the documentation seems very cryptic. Could you guys help me out?
standalone | Starting Airflow Standalone
standalone | Checking database is initialized
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
WARNI [airflow.models.crypto] empty cryptography key - values will not be stored encrypted.
standalone | Database ready
[2022-05-26 10:21:49,812] {manager.py:585} INFO - Removed Permission menu access on Permissions to role Admin
[2022-05-26 10:21:49,885] {manager.py:543} INFO - Removed Permission View: menu_access on Permissions
[2022-05-26 10:21:50,076] {manager.py:508} INFO - Created Permission View: menu access on Permissions
[2022-05-26 10:21:50,127] {manager.py:568} INFO - Added Permission menu access on Permissions to role Admin
triggerer | ____________ _____________
triggerer | ____ |__( )_________ __/__ /________ __
triggerer | ____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
triggerer | ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
triggerer | _/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
triggerer | [2022-05-26 10:21:58,355] {triggerer_job.py:101} INFO - Starting the triggerer
scheduler | ____________ _____________
scheduler | ____ |__( )_________ __/__ /________ __
scheduler | ____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
scheduler | ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
scheduler | _/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
scheduler | [2022-05-26 10:21:58 -0400] [233] [INFO] Starting gunicorn 20.1.0
scheduler | [2022-05-26 10:21:58 -0400] [233] [INFO] Listening at: http://0.0.0.0:8793 (233)
scheduler | [2022-05-26 10:21:58 -0400] [233] [INFO] Using worker: sync
scheduler | [2022-05-26 10:21:58 -0400] [234] [INFO] Booting worker with pid: 234
scheduler | [2022-05-26 10:21:58,614] {scheduler_job.py:693} INFO - Starting the scheduler
scheduler | [2022-05-26 10:21:58,614] {scheduler_job.py:698} INFO - Processing each file at most -1 times
scheduler | [2022-05-26 10:21:58,619] {executor_loader.py:106} INFO - Loaded executor: SequentialExecutor
scheduler | [2022-05-26 10:21:58,622] {manager.py:156} INFO - Launched DagFileProcessorManager with pid: 235
scheduler | [2022-05-26 10:21:58,624] {scheduler_job.py:1218} INFO - Resetting orphaned tasks for active dag runs
scheduler | [2022-05-26 10:21:58,639] {settings.py:55} INFO - Configured default timezone Timezone('UTC')
scheduler | [2022-05-26 10:21:58 -0400] [236] [INFO] Booting worker with pid: 236
scheduler | [2022-05-26 10:21:58,709] {manager.py:399} WARNING - Because we cannot use more than 1 thread (parsing_processes = 2) when using sqlite. So we set parallelism to 1.
webserver | [2022-05-26 10:21:59 -0400] [231] [INFO] Starting gunicorn 20.1.0
webserver | [2022-05-26 10:21:59 -0400] [231] [INFO] Listening at: http://0.0.0.0:8080 (231)
webserver | [2022-05-26 10:21:59 -0400] [231] [INFO] Using worker: sync
webserver | [2022-05-26 10:21:59 -0400] [239] [INFO] Booting worker with pid: 239
webserver | [2022-05-26 10:21:59 -0400] [240] [INFO] Booting worker with pid: 240
webserver | [2022-05-26 10:21:59 -0400] [241] [INFO] Booting worker with pid: 241
webserver | [2022-05-26 10:22:00 -0400] [242] [INFO] Booting worker with pid: 242
webserver | [2022-05-26 10:22:02,638] {manager.py:585} INFO - Removed Permission menu access on Permissions to role Admin
webserver | [2022-05-26 10:22:02,644] {manager.py:587} ERROR - Remove Permission to Role Error: DELETE statement on table 'ab_permission_view_role' expected to delete 1 row(s); Only 0 were matched.
webserver | [2022-05-26 10:22:02,776] {manager.py:543} INFO - Removed Permission View: menu_access on Permissions
webserver | /home/carlos/.local/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py:1461 SAWarning: DELETE statement on table 'ab_permission_view' expected to delete 1 row(s); 0 were matched. Please set confirm_deleted_rows=False within the mapper configuration to prevent this warning.
webserver | [2022-05-26 10:22:02,817] {manager.py:543} INFO - Removed Permission View: menu_access on Permissions
webserver | /home/carlos/.local/lib/python3.8/site-packages/sqlalchemy/orm/session.py:2459 SAWarning: Identity map already had an identity for (<class 'airflow.www.fab_security.sqla.models.Permission'>, (185,), None), replacing it with newly flushed object. Are there load operations occurring inside of an event handler within the flush?
webserver | [2022-05-26 10:22:03,097] {manager.py:508} INFO - Created Permission View: menu access on Permissions
webserver | [2022-05-26 10:22:03,168] {manager.py:568} INFO - Added Permission menu access on Permissions to role Admin
webserver | [2022-05-26 10:22:03,175] {manager.py:570} ERROR - Add Permission to Role Error: (sqlite3.IntegrityError) UNIQUE constraint failed: ab_permission_view_role.permission_view_id, ab_permission_view_role.role_id
webserver | [SQL: INSERT INTO ab_permission_view_role (permission_view_id, role_id) VALUES (?, ?)]
webserver | [parameters: (185, 1)]
webserver | (Background on this error at: http://sqlalche.me/e/14/gkpj)
webserver | [2022-05-26 10:22:03,188] {manager.py:570} ERROR - Add Permission to Role Error: (sqlite3.IntegrityError) UNIQUE constraint failed: ab_permission_view_role.permission_view_id, ab_permission_view_role.role_id
webserver | [SQL: INSERT INTO ab_permission_view_role (permission_view_id, role_id) VALUES (?, ?)]
webserver | [parameters: (185, 1)]
webserver | (Background on this error at: http://sqlalche.me/e/14/gkpj)
standalone |
standalone | Airflow is ready
standalone | Login with username: admin password: qhbDVvxz9ARPaWQt
standalone | Airflow Standalone is for development purposes only. Do not use this in production!
When trying to open airflow, I always get the following:
And the same for http://0.0.0.0:8793/
Almost exactly the same happens when I try to run airflow webserver.
Thanks!

Airflow: How to use the same tasks in different dags

I am learning Airflow and ran into a problem.
I have 2 tasks that I want to use in several dags. The only difference between these tasks will be the parameters the operators receive.
This could be accomplished by simply copying and pasting the tasks into all the dags, but maintaining this type of code would be a nightmare.
So what I want to do is create a class that contains the tasks I will be calling several times and just import this class from the dags.
I replicated the issue with a minimal example.
This is the code for the class:
from airflow.operators.bash_operator import BashOperator
class Operator_generator():
    _instance = None
    def __init__(self, var1, var2):
        # Parameters intended to parametrize the operators
        self.var1 = var1
        self.var2 = var2
    def create_task_1(self):
        return BashOperator(
            task_id='task1',
            bash_command='echo Im running task 1, the current execution date is {{ds}} and the previous execution date is {{prev_ds}}'
        )
    def create_task_2(self):
        return BashOperator(
            task_id='task2',
            bash_command='echo Im running task 2, the current execution date is {{ds}} and the previous execution date is {{prev_ds}}'
        )
And this is an example dag that imports the class:
from include.src.date.decorator import DefaultDateTime
from airflow import DAG
from include.src.airflow.xcom import cleanup
from operator_creator import Operator_generator
dag_id = "dag1"
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": DefaultDateTime(2021, 6, 1),
    'retries': 1
}
# Dag definition
with DAG(
    dag_id,
    schedule_interval='@monthly',
    catchup=False,
    on_failure_callback=cleanup,
    on_success_callback=cleanup
) as dag:
    dag.doc_md = __doc__
    operator_generator = Operator_generator('var1','var2')
    task1 = operator_generator.create_task_1()
    task2 = operator_generator.create_task_2()
    task1 >> task2
Note that 'var1' and 'var2' are variables that I need to parametrize the operators.
The problem is that when I run the dag the tasks run twice:
[2021-08-25 16:29:46,937] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2021-08-25 16:29:46,937] {taskinstance.py:881} INFO -
--------------------------------------------------------------------------------
[2021-08-25 16:29:46,955] {taskinstance.py:900} INFO - Executing <Task(BashOperator): task1> on 2021-07-01T06:00:00+00:00
[2021-08-25 16:29:46,961] {standard_task_runner.py:53} INFO - Started process 67689 to run task
[2021-08-25 16:29:47,011] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: dag1.task1 2021-07-01T06:00:00+00:00 [running]> 30b770753547
[2021-08-25 16:29:47,032] {bash_operator.py:113} INFO - Tmp dir root location:
/tmp
[2021-08-25 16:29:47,033] {bash_operator.py:136} INFO - Temporary script location: /tmp/airflowtmpixijgd4s/task1lfcwdvfa
[2021-08-25 16:29:47,033] {bash_operator.py:146} INFO - Running command: echo Im running task 1, the current execution date is 2021-07-01 and the previous execution date is 2021-06-01
[2021-08-25 16:29:47,039] {bash_operator.py:153} INFO - Output:
[2021-08-25 16:29:47,040] {bash_operator.py:157} INFO - Im running task 1, the current execution date is 2021-07-01 and the previous execution date is 2021-06-01
[2021-08-25 16:29:47,040] {bash_operator.py:161} INFO - Command exited with return code 0
[2021-08-25 16:29:47,052] {taskinstance.py:1065} INFO - Marking task as SUCCESS.dag_id=dag1, task_id=task1, execution_date=20210701T060000, start_date=20210825T162946, end_date=20210825T162947
[2021-08-25 16:29:55,335] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: dag1.task1 2021-08-25T16:29:41+00:00 [queued]>
[2021-08-25 16:29:55,335] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: dag1.task2 2021-07-01T06:00:00+00:00 [queued]>
[2021-08-25 16:29:55,348] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: dag1.task1 2021-08-25T16:29:41+00:00 [queued]>
[2021-08-25 16:29:55,348] {taskinstance.py:879} INFO -
--------------------------------------------------------------------------------
[2021-08-25 16:29:55,348] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2021-08-25 16:29:55,348] {taskinstance.py:881} INFO -
--------------------------------------------------------------------------------
[2021-08-25 16:29:55,357] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: dag1.task2 2021-07-01T06:00:00+00:00 [queued]>
[2021-08-25 16:29:55,357] {taskinstance.py:879} INFO -
--------------------------------------------------------------------------------
[2021-08-25 16:29:55,357] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2021-08-25 16:29:55,357] {taskinstance.py:881} INFO -
--------------------------------------------------------------------------------
[2021-08-25 16:29:55,363] {taskinstance.py:900} INFO - Executing <Task(BashOperator): task1> on 2021-08-25T16:29:41+00:00
[2021-08-25 16:29:55,366] {standard_task_runner.py:53} INFO - Started process 67809 to run task
[2021-08-25 16:29:55,370] {taskinstance.py:900} INFO - Executing <Task(BashOperator): task2> on 2021-07-01T06:00:00+00:00
[2021-08-25 16:29:55,374] {standard_task_runner.py:53} INFO - Started process 67810 to run task
[2021-08-25 16:29:55,412] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: dag1.task1 2021-08-25T16:29:41+00:00 [running]> 30b770753547
[2021-08-25 16:29:55,422] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: dag1.task2 2021-07-01T06:00:00+00:00 [running]> 30b770753547
[2021-08-25 16:29:55,430] {bash_operator.py:113} INFO - Tmp dir root location:
/tmp
[2021-08-25 16:29:55,432] {bash_operator.py:136} INFO - Temporary script location: /tmp/airflowtmpsacovlfm/task1doc6fakb
[2021-08-25 16:29:55,432] {bash_operator.py:146} INFO - Running command: echo Im running task 1, the current execution date is 2021-08-25 and the previous execution date is 2021-08-25
[2021-08-25 16:29:55,440] {bash_operator.py:153} INFO - Output:
[2021-08-25 16:29:55,440] {bash_operator.py:157} INFO - Im running task 1, the current execution date is 2021-08-25 and the previous execution date is 2021-08-25
[2021-08-25 16:29:55,441] {bash_operator.py:161} INFO - Command exited with return code 0
[2021-08-25 16:29:55,444] {bash_operator.py:113} INFO - Tmp dir root location:
/tmp
[2021-08-25 16:29:55,445] {bash_operator.py:136} INFO - Temporary script location: /tmp/airflowtmpyqqww8an/task2i29a2lk7
[2021-08-25 16:29:55,445] {bash_operator.py:146} INFO - Running command: echo Im running task 2, the current execution date is 2021-07-01 and the previous execution date is 2021-06-01
[2021-08-25 16:29:55,451] {taskinstance.py:1065} INFO - Marking task as SUCCESS.dag_id=dag1, task_id=task1, execution_date=20210825T162941, start_date=20210825T162955, end_date=20210825T162955
[2021-08-25 16:29:55,453] {bash_operator.py:153} INFO - Output:
[2021-08-25 16:29:55,453] {bash_operator.py:157} INFO - Im running task 2, the current execution date is 2021-07-01 and the previous execution date is 2021-06-01
[2021-08-25 16:29:55,454] {bash_operator.py:161} INFO - Command exited with return code 0
[2021-08-25 16:29:55,465] {taskinstance.py:1065} INFO - Marking task as SUCCESS.dag_id=dag1, task_id=task2, execution_date=20210701T060000, start_date=20210825T162955, end_date=20210825T162955
[2021-08-25 16:29:56,922] {logging_mixin.py:112} INFO - [2021-08-25 16:29:56,921] {local_task_job.py:103} INFO - Task exited with return code 0
[2021-08-25 16:30:05,333] {logging_mixin.py:112} INFO - [2021-08-25 16:30:05,333] {local_task_job.py:103} INFO - Task exited with return code 0
[2021-08-25 16:30:05,337] {logging_mixin.py:112} INFO - [2021-08-25 16:30:05,337] {local_task_job.py:103} INFO - Task exited with return code 0
[2021-08-25 16:30:06,794] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: dag1.task2 2021-08-25T16:29:41+00:00 [queued]>
[2021-08-25 16:30:06,809] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: dag1.task2 2021-08-25T16:29:41+00:00 [queued]>
[2021-08-25 16:30:06,809] {taskinstance.py:879} INFO -
--------------------------------------------------------------------------------
[2021-08-25 16:30:06,810] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2021-08-25 16:30:06,810] {taskinstance.py:881} INFO -
--------------------------------------------------------------------------------
[2021-08-25 16:30:06,822] {taskinstance.py:900} INFO - Executing <Task(BashOperator): task2> on 2021-08-25T16:29:41+00:00
[2021-08-25 16:30:06,826] {standard_task_runner.py:53} INFO - Started process 67937 to run task
[2021-08-25 16:30:06,875] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: dag1.task2 2021-08-25T16:29:41+00:00 [running]> 30b770753547
[2021-08-25 16:30:06,892] {bash_operator.py:113} INFO - Tmp dir root location:
/tmp
[2021-08-25 16:30:06,893] {bash_operator.py:136} INFO - Temporary script location: /tmp/airflowtmpot_xsukw/task2xo4uxspu
[2021-08-25 16:30:06,893] {bash_operator.py:146} INFO - Running command: echo Im running task 2, the current execution date is 2021-08-25 and the previous execution date is 2021-08-25
[2021-08-25 16:30:06,901] {bash_operator.py:153} INFO - Output:
[2021-08-25 16:30:06,902] {bash_operator.py:157} INFO - Im running task 2, the current execution date is 2021-08-25 and the previous execution date is 2021-08-25
[2021-08-25 16:30:06,902] {bash_operator.py:161} INFO - Command exited with return code 0
[2021-08-25 16:30:06,913] {taskinstance.py:1065} INFO - Marking task as SUCCESS.dag_id=dag1, task_id=task2, execution_date=20210825T162941, start_date=20210825T163006, end_date=20210825T163006
[2021-08-25 16:30:16,800] {logging_mixin.py:112} INFO - [2021-08-25 16:30:16,799] {local_task_job.py:103} INFO - Task exited with return code 0
Notice how the tasks are executed 2 times:
In the first execution the values of {{ds}} and {{prev_ds}} are the current date.
In the second execution the values of {{ds}} and {{prev_ds}} correspond to the monthly interval.
Why do the tasks run 2 times?
Is there a way to import tasks like this?
Note 1: I am not allowed to use subdags.
Edit: Adding the execution tree
Edit 2:
If anyone runs into this problem, I figured it out.
The problem was that I was running the dag with an external trigger, which deletes the dag and starts it again. So the dag runs for the external trigger, but the scheduler also sees that the dag hasn't run for the month, so it schedules the execution, resulting in 2 runs.
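The two runs can be told apart in the task log by their execution dates. The following is a small sketch (the function name and regular expression are illustrative assumptions, and the sample lines are copied from the log above) that groups the "Executing" lines by execution date:

```python
import re
from collections import defaultdict

# Sample "Executing" lines copied from the task log in the question.
LOG_LINES = [
    "[2021-08-25 16:29:46,955] {taskinstance.py:900} INFO - Executing <Task(BashOperator): task1> on 2021-07-01T06:00:00+00:00",
    "[2021-08-25 16:29:55,363] {taskinstance.py:900} INFO - Executing <Task(BashOperator): task1> on 2021-08-25T16:29:41+00:00",
    "[2021-08-25 16:29:55,370] {taskinstance.py:900} INFO - Executing <Task(BashOperator): task2> on 2021-07-01T06:00:00+00:00",
    "[2021-08-25 16:30:06,822] {taskinstance.py:900} INFO - Executing <Task(BashOperator): task2> on 2021-08-25T16:29:41+00:00",
]

# Matches the task id and the execution date of an "Executing" log line.
EXECUTING = re.compile(r"Executing <Task\(\w+\): (?P<task>\w+)> on (?P<date>\S+)")


def group_by_execution_date(lines):
    """Map each execution date to the task ids that were executed for it."""
    runs = defaultdict(list)
    for line in lines:
        match = EXECUTING.search(line)
        if match:
            runs[match.group("date")].append(match.group("task"))
    return dict(runs)


runs = group_by_execution_date(LOG_LINES)
for execution_date, tasks in sorted(runs.items()):
    print(execution_date, tasks)
```

One group carries the monthly execution date (2021-07-01T06:00:00+00:00) and the other carries the timestamp of the external trigger (2021-08-25T16:29:41+00:00), so each task ran once per dag run rather than twice within the same run.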
The solution I found is:
Turn off the dag in the airflow interface
Delete the dag (with the red x in the far right of the dag)
Refresh the page
The dag appears again in the list, turn it on
This will make the scheduler do its job and the dag will run as it should.
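On the original goal of sharing parametrized tasks across dags, here is a hedged sketch of one alternative to the class: a plain function that returns the keyword arguments for a BashOperator. The helper name and the way var1 and var2 are used in the command are assumptions made for illustration, since the question does not show how the parameters are consumed.

```python
def bash_task_kwargs(task_id, var1, var2):
    """Return keyword arguments for a BashOperator (hypothetical helper).

    The quadruple braces produce a literal {{ ds }} in the command, which
    Airflow would render as the execution date when the task runs.
    """
    return {
        "task_id": task_id,
        "bash_command": f"echo running {task_id} with {var1} and {var2} for {{{{ ds }}}}",
    }


# In a dag file this would be used inside the `with DAG(...)` block, e.g.:
#     task1 = BashOperator(**bash_task_kwargs("task1", "var1", "var2"))
kwargs = bash_task_kwargs("task1", "var1", "var2")
print(kwargs["bash_command"])
```

Keeping the helper free of Airflow imports makes it easy to unit test, and each dag still creates its own operator instances inside its own DAG context.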

Airflow : Bash Operator is not giving the results

Through Airflow, we are trying to execute the code below. The script finishes with return code 0 but appears to give no result.
Reference:
https://airflow.apache.org/tutorial.html
Taken from the same example pipeline definition:
t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag)
The log shows return code 0, but the output looks empty:
root@masbidw1.usa.corp.ad:/usr/airflow> airflow test tutorial print_date 2015-06-01
[2019-08-22 14:22:10,318] {__init__.py:45} INFO - Using executor SequentialExecutor
[2019-08-22 14:22:10,401] {models.py:189} INFO - Filling up the DagBag from /root/airflow/dags
[2019-08-22 14:22:10,552] {bash_operator.py:70} INFO - Tmp dir root location:
/tmp
[2019-08-22 14:22:10,552] {bash_operator.py:80} INFO - Temporary script location: /tmp/airflowtmpFWK0XJ//tmp/airflowtmpFWK0XJ/print_dateYEnzSO
[2019-08-22 14:22:10,552] {bash_operator.py:88} INFO - Running command: date
[2019-08-22 14:22:10,557] {bash_operator.py:97} INFO - Output:
[2019-08-22 14:22:10,561] {bash_operator.py:101} INFO - Thu Aug 22 14:22:10 EDT 2019
[2019-08-22 14:22:10,561] {bash_operator.py:105} INFO - Command exited with return code 0
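Note that in the log above, the line logged right after "Output:" (Thu Aug 22 14:22:10 EDT 2019) is the standard output of the date command. As a rough sketch of how a command's output can end up line by line in a log (this is an illustration using the standard library, not Airflow's actual implementation, and the function name is an assumption):

```python
import subprocess


def run_and_collect(command):
    """Run a shell command with bash and return (return_code, output_lines)."""
    proc = subprocess.Popen(
        ["bash", "-c", command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as a log would show both
        text=True,
    )
    # Read the output line by line, the way a logger would emit one record per line.
    lines = [line.rstrip("\n") for line in proc.stdout]
    return_code = proc.wait()
    return return_code, lines


return_code, lines = run_and_collect("echo hello")
print("Output:")
for line in lines:
    print(line)
print(f"Command exited with return code {return_code}")
```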

How to gracefully shut down Airflow?

We run Airflow on AWS ECS and bundle all DAGs in a Docker image. From time to time, we update the DAGs and deploy a new version of the Docker image. When we do so, ECS kills the running Airflow webserver / scheduler / workers and starts new ones. We ran into issues where some DAGs that started before the deployment were still marked as "running" after the deployment even though their tasks had been killed by the deployment, and some externalSensor tasks were left hanging.
Is there a way to gracefully shut down Airflow? Ideally, it would mark all running tasks as up_for_retry and rerun them after the deployment is finished.
docker stop <container-id/name>
This performs a warm shutdown of the Celery workers:
worker_1 | worker: Warm shutdown (MainProcess)
worker_1 |
worker_1 | -------------- celery@155afae87458 v4.1.1 (latentcall)
worker_1 | ---- **** -----
worker_1 | --- * *** * -- Linux-4.9.93-linuxkit-aufs-x86_64-with-debian-9.5 2018-10-18 16:52:08
worker_1 | -- * - **** ---
worker_1 | - ** ---------- [config]
.......
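For context, docker stop sends SIGTERM to the container's main process and only sends SIGKILL after a grace period, and Celery treats SIGTERM as a request for a warm shutdown, meaning it lets running tasks finish first. The general pattern can be sketched as follows (GracefulWorker and demo are hypothetical names for illustration; this is not Celery's or Airflow's code):

```python
import signal


class GracefulWorker:
    """Hypothetical worker that finishes in-flight work before stopping."""

    def __init__(self):
        self.stop_requested = False

    def request_stop(self, signum, frame):
        # Signal handler: only record the request; the loop decides when to stop.
        self.stop_requested = True

    def run(self, tasks):
        finished = []
        for task in tasks:
            if self.stop_requested:
                break  # do not start new work after SIGTERM
            finished.append(task())  # a task that already started runs to completion
        return finished


def demo():
    """Simulate SIGTERM arriving while the second of three tasks is running."""
    worker = GracefulWorker()
    previous = signal.signal(signal.SIGTERM, worker.request_stop)
    try:
        def second():
            signal.raise_signal(signal.SIGTERM)  # stands in for `docker stop`
            return "b"

        return worker.run([lambda: "a", second, lambda: "c"])
    finally:
        # Restore whatever handler was installed before the demo.
        signal.signal(signal.SIGTERM, previous if previous is not None else signal.SIG_DFL)


print(demo())
```

In the demo, the task that was running when the signal arrived completes, and the third task is never started. If tasks can outlive the stop grace period, the container is killed anyway, so the timeout has to be long enough for the longest task.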

Airflow does not update the progress of a dag/task to be completed even though the dag/task has actually completed

I have set up Airflow to run in distributed mode with 10 worker nodes. I tried to assess the performance of parallel workloads by triggering a test dag that contains just 1 task, which sleeps for 3 seconds and then exits.
I triggered the dag using the command:
airflow backfill test_dag -s 2015-06-20 -e 2015-07-10
The scheduler kicks off the jobs/dags in parallel, and frequently I see the output below:
[2017-06-27 09:52:29,611] {models.py:4024} INFO - Updating state for considering 1 task(s)
[2017-06-27 09:52:29,647] {models.py:4024} INFO - Updating state for considering 1 task(s)
[2017-06-27 09:52:29,664] {jobs.py:1983} INFO - [backfill progress] | finished run 19 of 21 | tasks waiting: 0 | succeeded: 19 | kicked_off: 2 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 0
Here kicked_off: 2 indicates that 2 tasks have been kicked off, and in the UI I see 2 dag run instances in the running state. When I look into the respective task instance logs, they indicate that the tasks completed successfully, yet the above message is still displayed in the command prompt indefinitely:
[2017-06-27 09:52:29,611] {models.py:4024} INFO - Updating state for considering 1 task(s)
[2017-06-27 09:52:29,647] {models.py:4024} INFO - Updating state for considering 1 task(s)
[2017-06-27 09:52:29,664] {jobs.py:1983} INFO - [backfill progress] | finished run 19 of 21 | tasks waiting: 0 | succeeded: 19 | kicked_off: 2 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 0
Is it that the messages sent by the workers are getting dropped, and hence the status is not getting updated?
Is there any parameter in the airflow.cfg file that allows failed jobs like these to be retried on other worker nodes, instead of waiting indefinitely for a message from the worker node responsible for executing the above failed tasks?
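When watching a long backfill, it can help to turn the progress line into numbers. A minimal sketch, assuming the line format shown in the output above (the function name and the keys finished_runs and total_runs are made up for illustration):

```python
import re


def parse_backfill_progress(line):
    """Parse a '[backfill progress]' log line into a dict of integer counters.

    Returns None when the line is not a backfill progress line.
    """
    _, marker, tail = line.partition("[backfill progress]")
    if not marker:
        return None
    counters = {}
    for field in tail.split("|"):
        field = field.strip()
        runs = re.fullmatch(r"finished run (\d+) of (\d+)", field)
        if runs:
            counters["finished_runs"] = int(runs.group(1))
            counters["total_runs"] = int(runs.group(2))
        elif ":" in field:
            name, value = field.split(":", 1)
            counters[name.strip()] = int(value)
    return counters


line = (
    "[2017-06-27 09:52:29,664] {jobs.py:1983} INFO - [backfill progress] | "
    "finished run 19 of 21 | tasks waiting: 0 | succeeded: 19 | kicked_off: 2 | "
    "failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 0"
)
progress = parse_backfill_progress(line)
print(progress["kicked_off"], "task(s) kicked off but not yet reported as finished")
```

A kicked_off counter that stays above zero while the task logs already show success points at the state update not reaching the backfill job, which is the symptom described above.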
