I'm trying to get a simplified version of the Snowflake operator example to work, but triggering the DAG fails with the error:
Task exited with return code Negsignal.SIGABRT
The DAG has only the first task, which runs CREATE_TABLE_SQL_STRING. It runs successfully via a test run: airflow dags test sf_example_short 2021-10-10
I can see the table is created in Snowflake, so the connection appears fine and the syntax must be okay.
But if I drop the table and trigger the DAG via the Airflow UI or via the CLI:
airflow dags trigger sf_example_short
it fails with a vague error:
Task exited with return code Negsignal.SIGABRT
From googling, I've found suggestions to change scheduler_health_check_threshold, schedule_after_task_execution, default_impersonation, OBJC_DISABLE_INITIALIZE_FORK_SAFETY, or killed_task_cleanup_time, but none of these fixed the issue.
What am I missing? TIA!
Log excerpt:
[2021-10-11 10:03:57,256] {taskinstance.py:1114} INFO - Executing <Task(SnowflakeOperator): snowflake_cre_tbl> on 2021-10-11T15:01:09+00:00
[2021-10-11 10:03:57,261] {standard_task_runner.py:52} INFO - Started process 73291 to run task
[2021-10-11 10:03:57,271] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'sf_example_short', 'snowflake_cre_tbl', '2021-10-11T15:01:09+00:00', '--job-id', '222', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/sf_example_short.py', '--cfg-path', '/var/folders/jp/b35mp4dj4qn3y35k491hrpg80000gn/T/tmppt5q2b8h', '--error-file', '/var/folders/jp/b35mp4dj4qn3y35k491hrpg80000gn/T/tmp83zp78uz']
[2021-10-11 10:03:57,274] {standard_task_runner.py:77} INFO - Job 222: Subtask snowflake_cre_tbl
[2021-10-11 10:03:57,276] {cli_action_loggers.py:66} DEBUG - Calling callbacks: [<function default_action_log at 0x10e288940>]
[2021-10-11 10:03:57,286] {settings.py:208} DEBUG - Setting up DB connection pool (PID 73291)
[2021-10-11 10:03:57,287] {settings.py:244} DEBUG - settings.prepare_engine_args(): Using NullPool
[2021-10-11 10:03:57,289] {taskinstance.py:618} DEBUG - Refreshing TaskInstance <TaskInstance: sf_example_short.snowflake_cre_tbl 2021-10-11T15:01:09+00:00 [None]> from DB
[2021-10-11 10:03:57,298] {taskinstance.py:656} DEBUG - Refreshed TaskInstance <TaskInstance: sf_example_short.snowflake_cre_tbl 2021-10-11T15:01:09+00:00 [running]>
[2021-10-11 10:04:02,296] {taskinstance.py:618} DEBUG - Refreshing TaskInstance <TaskInstance: sf_example_short.snowflake_cre_tbl 2021-10-11T15:01:09+00:00 [running]> from DB
[2021-10-11 10:04:02,299] {taskinstance.py:656} DEBUG - Refreshed TaskInstance <TaskInstance: sf_example_short.snowflake_cre_tbl 2021-10-11T15:01:09+00:00 [running]>
[2021-10-11 10:04:02,305] {logging_mixin.py:109} INFO - Running <TaskInstance: sf_example_short.snowflake_cre_tbl 2021-10-11T15:01:09+00:00 [running]> on host xxxxxxs-MacBook-Pro.local
[2021-10-11 10:04:02,307] {taskinstance.py:618} DEBUG - Refreshing TaskInstance <TaskInstance: sf_example_short.snowflake_cre_tbl 2021-10-11T15:01:09+00:00 [running]> from DB
[2021-10-11 10:04:02,310] {taskinstance.py:656} DEBUG - Refreshed TaskInstance <TaskInstance: sf_example_short.snowflake_cre_tbl 2021-10-11T15:01:09+00:00 [running]>
[2021-10-11 10:04:07,302] {base_job.py:227} DEBUG - [heartbeat]
[2021-10-11 10:04:07,331] {taskinstance.py:684} DEBUG - Clearing XCom data
[2021-10-11 10:04:07,336] {taskinstance.py:691} DEBUG - XCom data cleared
[2021-10-11 10:04:07,350] {taskinstance.py:1251} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=sf_example_short
AIRFLOW_CTX_TASK_ID=snowflake_cre_tbl
AIRFLOW_CTX_EXECUTION_DATE=2021-10-11T15:01:09+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-10-11T15:01:09+00:00
[2021-10-11 10:04:07,351] {__init__.py:146} DEBUG - Preparing lineage inlets and outlets
[2021-10-11 10:04:07,351] {__init__.py:190} DEBUG - inlets: [], outlets: []
[2021-10-11 10:04:07,352] {snowflake.py:119} INFO - Executing: CREATE OR REPLACE TRANSIENT TABLE SF_SHORT_TEST (name VARCHAR(250), id INT);
[2021-10-11 10:04:07,356] {base.py:70} INFO - Using connection to: id: snowflake_conn. Host: https://***.snowflakecomputing.com/, Port: None, Schema: airflow1, Login: xxxxxxxxxxx, Password: ***, extra: {'account': '***', 'warehouse': 'DEMO_WH', 'database': 'AIRFLOW_SANDBOX', 'role': 'sysadmin'}
[2021-10-11 10:04:07,358] {connection.py:218} INFO - Snowflake Connector for Python Version: 2.4.1, Python Version: 3.8.12, Platform: macOS-10.15.7-x86_64-i386-64bit
[2021-10-11 10:04:07,359] {connection.py:421} DEBUG - connect
[2021-10-11 10:04:07,359] {connection.py:656} DEBUG - __config
[2021-10-11 10:04:07,359] {connection.py:773} INFO - This connection is in OCSP Fail Open Mode. TLS Certificates would be checked for validity and revocation status. Any other Certificate Revocation related exceptions or OCSP Responder failures would be disregarded in favor of connectivity.
[2021-10-11 10:04:07,359] {connection.py:789} INFO - Setting use_openssl_only mode to False
[2021-10-11 10:04:07,360] {converter.py:135} DEBUG - use_numpy: False
[2021-10-11 10:04:07,360] {connection.py:570} DEBUG - REST API object was created: ***.snowflakecomputing.com:443
[2021-10-11 10:04:07,361] {auth.py:129} DEBUG - authenticate
[2021-10-11 10:04:07,362] {auth.py:156} DEBUG - assertion content: *********
[2021-10-11 10:04:07,362] {auth.py:160} DEBUG - account=***, user=xxxxxxxxxxx, database=AIRFLOW_SANDBOX, schema=airflow1, warehouse=DEMO_WH, role=sysadmin, request_id=***
[2021-10-11 10:04:07,362] {auth.py:193} DEBUG - body['data']: {'CLIENT_APP_ID': 'PythonConnector', 'CLIENT_APP_VERSION': '2.4.1', 'SVN_REVISION': None, 'ACCOUNT_NAME': '***', 'LOGIN_NAME': 'xxxxxxxxxxx', 'CLIENT_ENVIRONMENT': {'APPLICATION': 'PythonConnector', 'OS': 'Darwin', 'OS_VERSION': 'macOS-10.15.7-x86_64-i386-64bit', 'PYTHON_VERSION': '3.8.12', 'PYTHON_RUNTIME': 'CPython', 'PYTHON_COMPILER': 'Clang 12.0.0 (clang-1200.0.32.29)', 'OCSP_MODE': 'FAIL_OPEN', 'TRACING': 10, 'LOGIN_TIMEOUT': 120, 'NETWORK_TIMEOUT': None}, 'SESSION_PARAMETERS': {'CLIENT_SESSION_KEEP_ALIVE_HEARTBEAT_FREQUENCY': 900, 'CLIENT_PREFETCH_THREADS': 4}}
[2021-10-11 10:04:07,363] {retry.py:230} DEBUG - Converted retries value: 1 -> Retry(total=1, connect=None, read=None, redirect=None, status=None)
[2021-10-11 10:04:07,364] {retry.py:230} DEBUG - Converted retries value: 1 -> Retry(total=1, connect=None, read=None, redirect=None, status=None)
[2021-10-11 10:04:07,364] {network.py:950} DEBUG - Active requests sessions: 1, idle: 0
[2021-10-11 10:04:07,365] {network.py:650} DEBUG - remaining request timeout: 120, retry cnt: 1
[2021-10-11 10:04:07,365] {network.py:638} DEBUG - Request guid: ***
[2021-10-11 10:04:07,366] {network.py:794} DEBUG - socket timeout: 60
[2021-10-11 10:04:07,405] {local_task_job.py:151} INFO - Task exited with return code Negsignal.SIGABRT
[2021-10-11 10:04:07,405] {taskinstance.py:618} DEBUG - Refreshing TaskInstance <TaskInstance: sf_example_short.snowflake_cre_tbl 2021-10-11T15:01:09+00:00 [running]> from DB
[2021-10-11 10:04:07,410] {taskinstance.py:656} DEBUG - Refreshed TaskInstance <TaskInstance: sf_example_short.snowflake_cre_tbl 2021-10-11T15:01:09+00:00 [running]>
[2021-10-11 10:04:07,411] {taskinstance.py:1867} DEBUG - Task Duration set to 10.174875
[2021-10-11 10:04:07,411] {taskinstance.py:1505} INFO - Marking task as FAILED. dag_id=sf_example_short, task_id=snowflake_cre_tbl, execution_date=20211011T150109, start_date=20211011T150357, end_date=20211011T150407
abbreviated system info:
Apache Airflow version | 2.1.3
executor | SequentialExecutor
task_logging_handler | airflow.utils.log.file_task_handler.FileTaskHandler
System info
OS | Mac OS
apache-airflow-providers-snowflake | 1.1.0
We are using the on_retry_callback parameter available in Airflow operators to do some cleanup before a task is retried. If the on_retry_callback function throws an exception, the exception is not logged in the task instance's log. Without the exception details, it is difficult to debug issues in the on_retry_callback function. If this is the default behavior, is there a workaround to enable logging for these exceptions?
Note: We are using Airflow version 2.0.2.
Please let me know if there are any questions.
A sample DAG that illustrates this is given below.
from datetime import datetime
from airflow.operators.python import PythonOperator
from airflow.models.dag import DAG


def sample_function2():
    var = 1 / 0


def on_retry_callback_sample(context):
    print(f'on_retry_callback_started')
    v = 1 / 0
    print(f'on_retry_callback completed')


dag = DAG(
    'venkat-test-dag',
    description='This is a test dag',
    start_date=datetime(2023, 1, 10, 18, 0),
    schedule_interval='0 12 * * *',
    catchup=False
)
func2 = PythonOperator(task_id='function2',
                       python_callable=sample_function2,
                       dag=dag,
                       retries=2,
                       on_retry_callback=on_retry_callback_sample)
func2
The log file of this run on a local Airflow setup is given below. The last message in the log file is "on_retry_callback_started", but I expected a ZeroDivisionError after this line and finally the line "on_retry_callback completed". How can I achieve this?
14f0fed99882
*** Reading local file: /usr/local/airflow/logs/venkat-test-dag/function2/2023-01-13T13:22:03.178261+00:00/1.log
[2023-01-13 13:22:05,091] {{taskinstance.py:877}} INFO - Dependencies all met for <TaskInstance: venkat-test-dag.function2 2023-01-13T13:22:03.178261+00:00 [queued]>
[2023-01-13 13:22:05,128] {{taskinstance.py:877}} INFO - Dependencies all met for <TaskInstance: venkat-test-dag.function2 2023-01-13T13:22:03.178261+00:00 [queued]>
[2023-01-13 13:22:05,128] {{taskinstance.py:1068}} INFO -
--------------------------------------------------------------------------------
[2023-01-13 13:22:05,128] {{taskinstance.py:1069}} INFO - Starting attempt 1 of 3
[2023-01-13 13:22:05,128] {{taskinstance.py:1070}} INFO -
--------------------------------------------------------------------------------
[2023-01-13 13:22:05,143] {{taskinstance.py:1089}} INFO - Executing <Task(PythonOperator): function2> on 2023-01-13T13:22:03.178261+00:00
[2023-01-13 13:22:05,145] {{standard_task_runner.py:52}} INFO - Started process 6947 to run task
[2023-01-13 13:22:05,150] {{standard_task_runner.py:76}} INFO - Running: ['airflow', 'tasks', 'run', 'venkat-test-dag', 'function2', '2023-01-13T13:22:03.178261+00:00', '--job-id', '356', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/dp-etl-mixpanel_stg-24H/dags/venkat-test-dag.py', '--cfg-path', '/tmp/tmpny0mhh4j', '--error-file', '/tmp/tmpul506kro']
[2023-01-13 13:22:05,151] {{standard_task_runner.py:77}} INFO - Job 356: Subtask function2
[2023-01-13 13:22:05,244] {{logging_mixin.py:104}} INFO - Running <TaskInstance: venkat-test-dag.function2 2023-01-13T13:22:03.178261+00:00 [running]> on host 14f0fed99882
[2023-01-13 13:22:05,345] {{taskinstance.py:1283}} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=venkat-test-dag
AIRFLOW_CTX_TASK_ID=function2
AIRFLOW_CTX_EXECUTION_DATE=2023-01-13T13:22:03.178261+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-01-13T13:22:03.178261+00:00
[2023-01-13 13:22:05,346] {{taskinstance.py:1482}} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 117, in execute
return_value = self.execute_callable()
File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 128, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/usr/local/airflow/dags/dp-etl-mixpanel_stg-24H/dags/venkat-test-dag.py", line 7, in sample_function2
var = 1 / 0
ZeroDivisionError: division by zero
[2023-01-13 13:22:05,349] {{taskinstance.py:1532}} INFO - Marking task as UP_FOR_RETRY. dag_id=venkat-test-dag, task_id=function2, execution_date=20230113T132203, start_date=20230113T132205, end_date=20230113T132205
[2023-01-13 13:22:05,402] {{local_task_job.py:146}} INFO - Task exited with return code 1
[2023-01-13 13:22:05,459] {{logging_mixin.py:104}} INFO - on_retry_callback_started
Adding as an answer for visibility:
This issue is likely related to a fix which was merged in Airflow version 2.1.3:
https://github.com/apache/airflow/pull/17347
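Independent of the version, one possible workaround is to catch the exception inside the callback yourself and print the traceback, since the log in the question shows that print output from the callback does reach the task log. This is only a sketch: log_callback_errors is a hypothetical helper name, not part of Airflow, and the callback body is copied from the sample DAG in the question.

```python
import functools
import traceback


def log_callback_errors(callback):
    """Hypothetical helper: print the traceback of any exception raised by a
    callback, then re-raise it so the original behavior is unchanged."""
    @functools.wraps(callback)
    def wrapper(context):
        try:
            return callback(context)
        except Exception:
            # print() output from the callback showed up in the task log in
            # the question, so the traceback is written the same way.
            print(f"{callback.__name__} raised an exception:")
            print(traceback.format_exc())
            raise
    return wrapper


@log_callback_errors
def on_retry_callback_sample(context):
    print("on_retry_callback started")
    v = 1 / 0  # deliberately fails, as in the sample DAG
    print("on_retry_callback completed")
```

The decorated function can be passed to the operator exactly as before, i.e. on_retry_callback=on_retry_callback_sample.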
I am trying to create a dependency between multiple DAGs.
Let's say Dag_A and Dag_B run every day at 14:15 and 14:30 respectively.
Now I want to run Dag_C, which runs at 14:30 and has two sensors (ExternalTaskSensor), one for each of the above DAGs. I am also using the execution_date_fn parameter, which returns a list of execution dates for each of the above DAGs, so each sensor checks for both 14:15 and 14:30 on its DAG. But the sensor keeps waiting and never succeeds; it keeps going to up_for_reschedule.
Am I doing anything wrong? Please suggest how to deal with such cases.
I am using Airflow version 2.
Below is the code for
DAG_A:
with DAG(
    dag_id="dag_a",
    default_args=DEFAULT_ARGS,
    max_active_runs=1,
    schedule_interval="15 2 * * *",
    catchup=True
) as dag:
    dummy_task = DummyOperator(task_id="Task_A")
DAG_B:
with DAG(
    dag_id="dag_b",
    default_args=DEFAULT_ARGS,
    max_active_runs=1,
    schedule_interval="30 2 * * *",
    catchup=True
) as dag:
    dummy_task = DummyOperator(task_id="Task_B")
DAG_C:
with DAG(
    dag_id="dag_c",
    default_args=DEFAULT_ARGS,
    max_active_runs=1,
    schedule_interval="30 2 * * *",
    catchup=True
) as dag:
    wait_task_a = ExternalTaskSensor(
        task_id="wait_for_task_a",
        external_dag_id="dag_a",
        execution_date_fn=lambda dt: [dt + timedelta(minutes=-i) for i in range(0, 30, 15)],
        timeout=60 * 60 * 3,  # 3 hours
        poke_interval=60,  # 60 seconds
        mode="reschedule"
    )
    wait_task_b = ExternalTaskSensor(
        task_id="wait_for_task_b",
        external_dag_id="dag_b",
        execution_date_fn=lambda dt: [dt + timedelta(minutes=-i) for i in range(0, 30, 15)],
        timeout=60 * 60 * 3,  # 3 hours
        poke_interval=60,  # 60 seconds
        mode="reschedule"
    )
    dummy_task = DummyOperator(task_id="Task_C")
    wait_task_a >> dummy_task
    wait_task_b >> dummy_task
Sensor logs :
It keeps on poking although tasks are present
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1043} INFO - Dependencies all met for <TaskInstance: dag_c.wait_for_task_b scheduled__2022-05-19T02:30:00+00:00 [queued]>
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1043} INFO - Dependencies all met for <TaskInstance: dag_c.wait_for_task_b scheduled__2022-05-19T02:30:00+00:00 [queued]>
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1249} INFO -
--------------------------------------------------------------------------------
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1250} INFO - Starting attempt 1 of 2
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1251} INFO -
--------------------------------------------------------------------------------
[2022-05-23, 16:25:20 UTC] {taskinstance.py:1270} INFO - Executing <Task(ExternalTaskSensor): wait_for_task_b> on 2022-05-19 02:30:00+00:00
[2022-05-23, 16:25:20 UTC] {standard_task_runner.py:52} INFO - Started process 17603 to run task
[2022-05-23, 16:25:20 UTC] {standard_task_runner.py:79} INFO - Running: ['airflow', 'tasks', 'run', 'dag_c', 'wait_for_task_b', 'scheduled__2022-05-19T02:30:00+00:00', '--job-id', '4', '--raw', '--subdir', 'DAGS_FOLDER/sample/dagc.py', '--cfg-path', '/var/folders/q1/dztb0bzn0fn8mvfm7_q9ms0m0000gn/T/tmpb27mns7u', '--error-file', '/var/folders/q1/dztb0bzn0fn8mvfm7_q9ms0m0000gn/T/tmpc6y4_6cx']
[2022-05-23, 16:25:20 UTC] {standard_task_runner.py:80} INFO - Job 4: Subtask wait_for_task_b
[2022-05-23, 16:25:25 UTC] {logging_mixin.py:109} INFO - Running <TaskInstance: dag_c.wait_for_task_b scheduled__2022-05-19T02:30:00+00:00 [running]> on host yahoo-MacBook-Pro.local
[2022-05-23, 16:25:30 UTC] {taskinstance.py:1448} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=dag_c
AIRFLOW_CTX_TASK_ID=wait_for_task_b
AIRFLOW_CTX_EXECUTION_DATE=2022-05-19T02:30:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-05-19T02:30:00+00:00
[2022-05-23, 16:25:30 UTC] {external_task.py:175} INFO - Poking for tasks None in dag dag_b on 2022-05-19T02:30:00+00:00,2022-05-19T02:15:00+00:00 ...
[2022-05-23, 16:25:30 UTC] {taskinstance.py:1726} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2022-05-23, 16:25:30 UTC] {local_task_job.py:154} INFO - Task exited with return code 0
[2022-05-23, 16:25:30 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
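One thing that may explain the endless poking: as far as I know, when execution_date_fn returns a list, ExternalTaskSensor only succeeds once it finds a successful run for every date in that list, while dag_a and dag_b each have only one run per day. The plain-Python sketch below does not use Airflow at all; daily_run_exists and sensor_would_succeed are hypothetical stand-ins for the metadata lookup, and the "all dates must match" rule is an assumption worth verifying against your Airflow version.

```python
from datetime import datetime, timedelta

# Same lambda as in the two sensors of dag_c.
execution_date_fn = lambda dt: [dt + timedelta(minutes=-i) for i in range(0, 30, 15)]


def daily_run_exists(dt, minute):
    """Hypothetical stand-in for the metadata lookup: a DAG scheduled with
    '<minute> 2 * * *' only has runs whose execution date is 02:<minute>."""
    return dt.hour == 2 and dt.minute == minute


def sensor_would_succeed(execution_date, upstream_minute):
    dates = execution_date_fn(execution_date)
    found = sum(daily_run_exists(d, upstream_minute) for d in dates)
    # Assumption: the sensor passes only if every requested date has a run.
    return found == len(dates)


dag_c_date = datetime(2022, 5, 19, 2, 30)
print(execution_date_fn(dag_c_date))         # the two dates being poked
print(sensor_would_succeed(dag_c_date, 15))  # sensor on dag_a
print(sensor_would_succeed(dag_c_date, 30))  # sensor on dag_b
```

If that assumption holds, each sensor finds one run where it asks for two. Returning a single date per sensor (dt - timedelta(minutes=15) for dag_a, dt itself for dag_b) would then match runs that actually exist.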
When I run the job locally using the jar, it deploys and finishes successfully, i.e. I can see the output files in GCS:
java -cp /Users/zainqasmi/Workspace/vasa/dataflow/build/libs/vasa-dataflow-2022-03-25-12-27-14-784-all.jar com.nianticproject.geodata.extraction.ExtractGeodata \
--project=vasa-dev \
--configurationPath=/Users/zainqasmi/Workspace/vasa/dataflow/src/main/resources/foursquare/extract.pb.txt \
--region=us-central1 \
--runner=DataflowRunner \
--dryRun=false \
--workerMachineType=n2d-highmem-16
However, when I push the DAG to Airflow, it apparently runs successfully, i.e. the task is marked as SUCCESS with return code 0. But I can't find the Dataflow job being executed anywhere in the GCP UI. Am I missing something? I am using the environment composer-2-0-7-airflow-2-2-3.
Logs from airflow:
*** Reading remote log from gs://us-central1-airflow-dev-b0cc30af-bucket/logs/foursquare_1/extract_geodata/2022-03-25T22:52:15.382542+00:00/1.log.
[2022-03-25, 22:52:21 UTC] {taskinstance.py:1033} INFO - Dependencies all met for <TaskInstance: foursquare_1.extract_geodata manual__2022-03-25T22:52:15.382542+00:00 [queued]>
[2022-03-25, 22:52:21 UTC] {taskinstance.py:1033} INFO - Dependencies all met for <TaskInstance: foursquare_1.extract_geodata manual__2022-03-25T22:52:15.382542+00:00 [queued]>
[2022-03-25, 22:52:21 UTC] {taskinstance.py:1239} INFO -
--------------------------------------------------------------------------------
[2022-03-25, 22:52:21 UTC] {taskinstance.py:1240} INFO - Starting attempt 1 of 2
[2022-03-25, 22:52:21 UTC] {taskinstance.py:1241} INFO -
--------------------------------------------------------------------------------
[2022-03-25, 22:52:21 UTC] {taskinstance.py:1260} INFO - Executing <Task(DataFlowJavaOperator): extract_geodata> on 2022-03-25 22:52:15.382542+00:00
[2022-03-25, 22:52:21 UTC] {standard_task_runner.py:52} INFO - Started process 57323 to run task
[2022-03-25, 22:52:21 UTC] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'foursquare_1', 'extract_geodata', 'manual__2022-03-25T22:52:15.382542+00:00', '--job-id', '1531', '--raw', '--subdir', 'DAGS_FOLDER/dataflow_operator_test.py', '--cfg-path', '/tmp/tmp4thgd6do', '--error-file', '/tmp/tmpu6crkval']
[2022-03-25, 22:52:21 UTC] {standard_task_runner.py:77} INFO - Job 1531: Subtask extract_geodata
[2022-03-25, 22:52:22 UTC] {logging_mixin.py:109} INFO - Running <TaskInstance: foursquare_1.extract_geodata manual__2022-03-25T22:52:15.382542+00:00 [running]> on host airflow-worker-9rz89
[2022-03-25, 22:52:22 UTC] {taskinstance.py:1426} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=foursquare_1
AIRFLOW_CTX_TASK_ID=extract_geodata
AIRFLOW_CTX_EXECUTION_DATE=2022-03-25T22:52:15.382542+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-03-25T22:52:15.382542+00:00
[2022-03-25, 22:52:22 UTC] {credentials_provider.py:312} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
[2022-03-25, 22:52:22 UTC] {taskinstance.py:1268} INFO - Marking task as SUCCESS. dag_id=foursquare_1, task_id=extract_geodata, execution_date=20220325T225215, start_date=20220325T225221, end_date=20220325T225222
[2022-03-25, 22:52:22 UTC] {local_task_job.py:154} INFO - Task exited with return code 0
[2022-03-25, 22:52:22 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
Dag:
GCP_PROJECT = "vasa-dev"
CONNECTION_ID = 'bigquery_default'
VASA_DATAFLOW_JAR = '/home/airflow/gcs/data/bin/vasa-dataflow-2022-03-25-16-36-09-008-all.jar'
default_args = {
    'owner': 'airflow',
    'depends_on_past': True,
    'wait_for_downstream': True,
    'max_active_runs': 1,
    'start_date': days_ago(1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(days=1),
}
with DAG(
    dag_id='foursquare_1',
    schedule_interval=timedelta(days=1),
    default_args=default_args
) as dag:
    kick_off_dag = DummyOperator(task_id='run_this_first')
    extract_geodata = DataFlowJavaOperator(
        task_id='extract_geodata',
        jar=VASA_DATAFLOW_JAR,
        job_class='com.nianticproject.geodata.extraction.ExtractGeodata',
        options={
            "project": "vasa-dev",
            "configurationPath": "/home/airflow/gcs/foursquare/extract.pb.txt",
            "region": "us-central1",
            "runner": "DataflowRunner",
            "dryRun": "false",
            "workerMachineType": "n2d-highmem-16",
        },
        dag=dag)
    end_task = BashOperator(
        task_id='end_task',
        bash_command='echo {{ execution_date.subtract(months=1).replace(day=1).strftime("%Y-%m-%d") }}',
        dag=dag,
    )
    kick_off_dag >> extract_geodata >> end_task
When I run a sensor in Airflow with mode='reschedule' and the sensor executes multiple times because it evaluates to false, the log text from previous iterations is repeated.
If the sensor, for example, iterates three times, the log looks like this:
"
Log-text from iteration 1
Log-text from iteration 1
Log-text from iteration 2
Log-text from iteration 1
Log-text from iteration 2
Log-text from iteration 3
"
Is this the expected behavior or is something misconfigured?
Is there any way I can get mode='reschedule' to create a log that looks like this instead?
"
Log-text from iteration 1
Log-text from iteration 2
Log-text from iteration 3
"
I use Airflow 2.2.2 in a configuration run on Azure Container Instances.
Example
file_sensor = FileSensor(
    task_id="file_sensor",
    poke_interval=10 * 60,
    timeout=60 * 60,
    mode="reschedule",
    filepath="/home/xyz/test"
)
*** Reading remote log from wasb-logsstorage/TestDag/file_sensor/2022-01-17T11:00:00+00:00/1.log.
[2022-01-18, 13:45:33 ] {taskinstance.py:1035} INFO - Dependencies all met for <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [queued]>
[2022-01-18, 13:45:33 ] {taskinstance.py:1035} INFO - Dependencies all met for <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [queued]>
[2022-01-18, 13:45:33 ] {taskinstance.py:1241} INFO -
--------------------------------------------------------------------------------
[2022-01-18, 13:45:33 ] {taskinstance.py:1242} INFO - Starting attempt 1 of 1
[2022-01-18, 13:45:33 ] {taskinstance.py:1243} INFO -
--------------------------------------------------------------------------------
[2022-01-18, 13:45:33 ] {taskinstance.py:1262} INFO - Executing <Task(FileSensor): file_sensor> on 2022-01-17 11:00:00+00:00
[2022-01-18, 13:45:33 ] {standard_task_runner.py:52} INFO - Started process 30457 to run task
[2022-01-18, 13:45:33 ] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'TestDag', 'file_sensor', 'scheduled__2022-01-17T11:00:00+00:00', '--job-id', '1597', '--raw', '--subdir', 'DAGS_FOLDER/dags/zz.py', '--cfg-path', '/tmp/tmpk_5t7u3m', '--error-file', '/tmp/tmpbcqgbk_t']
[2022-01-18, 13:45:33 ] {standard_task_runner.py:77} INFO - Job 1597: Subtask file_sensor
[2022-01-18, 13:45:33 ] {logging_mixin.py:109} INFO - Running <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [running]> on host localhost
[2022-01-18, 13:45:33 ] {taskinstance.py:1429} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=TestDag
AIRFLOW_CTX_TASK_ID=file_sensor
AIRFLOW_CTX_EXECUTION_DATE=2022-01-17T11:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-01-17T11:00:00+00:00
[2022-01-18, 13:45:33 ] {filesystem.py:59} INFO - Poking for file /home/xyz/test
[2022-01-18, 13:45:34 ] {taskinstance.py:1686} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2022-01-18, 13:45:34 ] {local_task_job.py:154} INFO - Task exited with return code 0
[2022-01-18, 13:45:34 ] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
<********** Start of log-text from iteration 2 - (line added manually by Ingesson8) **********>
[2022-01-18, 13:45:33 ] {taskinstance.py:1035} INFO - Dependencies all met for <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [queued]>
[2022-01-18, 13:45:33 ] {taskinstance.py:1035} INFO - Dependencies all met for <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [queued]>
[2022-01-18, 13:45:33 ] {taskinstance.py:1241} INFO -
--------------------------------------------------------------------------------
[2022-01-18, 13:45:33 ] {taskinstance.py:1242} INFO - Starting attempt 1 of 1
[2022-01-18, 13:45:33 ] {taskinstance.py:1243} INFO -
--------------------------------------------------------------------------------
[2022-01-18, 13:45:33 ] {taskinstance.py:1262} INFO - Executing <Task(FileSensor): file_sensor> on 2022-01-17 11:00:00+00:00
[2022-01-18, 13:45:33 ] {standard_task_runner.py:52} INFO - Started process 30457 to run task
[2022-01-18, 13:45:33 ] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'TestDag', 'file_sensor', 'scheduled__2022-01-17T11:00:00+00:00', '--job-id', '1597', '--raw', '--subdir', 'DAGS_FOLDER/dags/zz.py', '--cfg-path', '/tmp/tmpk_5t7u3m', '--error-file', '/tmp/tmpbcqgbk_t']
[2022-01-18, 13:45:33 ] {standard_task_runner.py:77} INFO - Job 1597: Subtask file_sensor
[2022-01-18, 13:45:33 ] {logging_mixin.py:109} INFO - Running <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [running]> on host localhost
[2022-01-18, 13:45:33 ] {taskinstance.py:1429} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=TestDag
AIRFLOW_CTX_TASK_ID=file_sensor
AIRFLOW_CTX_EXECUTION_DATE=2022-01-17T11:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-01-17T11:00:00+00:00
[2022-01-18, 13:45:33 ] {filesystem.py:59} INFO - Poking for file /home/xyz/test
[2022-01-18, 13:45:34 ] {taskinstance.py:1686} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2022-01-18, 13:45:34 ] {local_task_job.py:154} INFO - Task exited with return code 0
[2022-01-18, 13:45:34 ] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
[2022-01-18, 13:55:35 ] {taskinstance.py:1035} INFO - Dependencies all met for <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [queued]>
[2022-01-18, 13:55:35 ] {taskinstance.py:1035} INFO - Dependencies all met for <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [queued]>
[2022-01-18, 13:55:35 ] {taskinstance.py:1241} INFO -
--------------------------------------------------------------------------------
[2022-01-18, 13:55:35 ] {taskinstance.py:1242} INFO - Starting attempt 1 of 1
[2022-01-18, 13:55:35 ] {taskinstance.py:1243} INFO -
--------------------------------------------------------------------------------
[2022-01-18, 13:55:35 ] {taskinstance.py:1262} INFO - Executing <Task(FileSensor): file_sensor> on 2022-01-17 11:00:00+00:00
[2022-01-18, 13:55:35 ] {standard_task_runner.py:52} INFO - Started process 32072 to run task
[2022-01-18, 13:55:35 ] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'TestDag', 'file_sensor', 'scheduled__2022-01-17T11:00:00+00:00', '--job-id', '1598', '--raw', '--subdir', 'DAGS_FOLDER/dags/zz.py', '--cfg-path', '/tmp/tmp1dxahk4f', '--error-file', '/tmp/tmprjw2al9w']
[2022-01-18, 13:55:35 ] {standard_task_runner.py:77} INFO - Job 1598: Subtask file_sensor
[2022-01-18, 13:55:36 ] {logging_mixin.py:109} INFO - Running <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [running]> on host localhost
[2022-01-18, 13:55:36 ] {taskinstance.py:1429} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=TestDag
AIRFLOW_CTX_TASK_ID=file_sensor
AIRFLOW_CTX_EXECUTION_DATE=2022-01-17T11:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-01-17T11:00:00+00:00
[2022-01-18, 13:55:36 ] {filesystem.py:59} INFO - Poking for file /home/xyz/test
[2022-01-18, 13:55:36 ] {taskinstance.py:1686} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2022-01-18, 13:55:36 ] {local_task_job.py:154} INFO - Task exited with return code 0
[2022-01-18, 13:55:36 ] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
<********** Start of log-text from iteration 3 - (line added manually by Ingesson8) **********>
[2022-01-18, 13:45:33 ] {taskinstance.py:1035} INFO - Dependencies all met for <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [queued]>
[2022-01-18, 13:45:33 ] {taskinstance.py:1035} INFO - Dependencies all met for <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [queued]>
[2022-01-18, 13:45:33 ] {taskinstance.py:1241} INFO -
--------------------------------------------------------------------------------
[2022-01-18, 13:45:33 ] {taskinstance.py:1242} INFO - Starting attempt 1 of 1
[2022-01-18, 13:45:33 ] {taskinstance.py:1243} INFO -
--------------------------------------------------------------------------------
[2022-01-18, 13:45:33 ] {taskinstance.py:1262} INFO - Executing <Task(FileSensor): file_sensor> on 2022-01-17 11:00:00+00:00
[2022-01-18, 13:45:33 ] {standard_task_runner.py:52} INFO - Started process 30457 to run task
[2022-01-18, 13:45:33 ] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'TestDag', 'file_sensor', 'scheduled__2022-01-17T11:00:00+00:00', '--job-id', '1597', '--raw', '--subdir', 'DAGS_FOLDER/dags/zz.py', '--cfg-path', '/tmp/tmpk_5t7u3m', '--error-file', '/tmp/tmpbcqgbk_t']
[2022-01-18, 13:45:33 ] {standard_task_runner.py:77} INFO - Job 1597: Subtask file_sensor
[2022-01-18, 13:45:33 ] {logging_mixin.py:109} INFO - Running <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [running]> on host localhost
[2022-01-18, 13:45:33 ] {taskinstance.py:1429} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=TestDag
AIRFLOW_CTX_TASK_ID=file_sensor
AIRFLOW_CTX_EXECUTION_DATE=2022-01-17T11:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-01-17T11:00:00+00:00
[2022-01-18, 13:45:33 ] {filesystem.py:59} INFO - Poking for file /home/xyz/test
[2022-01-18, 13:45:34 ] {taskinstance.py:1686} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2022-01-18, 13:45:34 ] {local_task_job.py:154} INFO - Task exited with return code 0
[2022-01-18, 13:45:34 ] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
[2022-01-18, 13:55:35 ] {taskinstance.py:1035} INFO - Dependencies all met for <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [queued]>
[2022-01-18, 13:55:35 ] {taskinstance.py:1035} INFO - Dependencies all met for <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [queued]>
[2022-01-18, 13:55:35 ] {taskinstance.py:1241} INFO -
--------------------------------------------------------------------------------
[2022-01-18, 13:55:35 ] {taskinstance.py:1242} INFO - Starting attempt 1 of 1
[2022-01-18, 13:55:35 ] {taskinstance.py:1243} INFO -
--------------------------------------------------------------------------------
[2022-01-18, 13:55:35 ] {taskinstance.py:1262} INFO - Executing <Task(FileSensor): file_sensor> on 2022-01-17 11:00:00+00:00
[2022-01-18, 13:55:35 ] {standard_task_runner.py:52} INFO - Started process 32072 to run task
[2022-01-18, 13:55:35 ] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'TestDag', 'file_sensor', 'scheduled__2022-01-17T11:00:00+00:00', '--job-id', '1598', '--raw', '--subdir', 'DAGS_FOLDER/dags/zz.py', '--cfg-path', '/tmp/tmp1dxahk4f', '--error-file', '/tmp/tmprjw2al9w']
[2022-01-18, 13:55:35 ] {standard_task_runner.py:77} INFO - Job 1598: Subtask file_sensor
[2022-01-18, 13:55:36 ] {logging_mixin.py:109} INFO - Running <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [running]> on host localhost
[2022-01-18, 13:55:36 ] {taskinstance.py:1429} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=TestDag
AIRFLOW_CTX_TASK_ID=file_sensor
AIRFLOW_CTX_EXECUTION_DATE=2022-01-17T11:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-01-17T11:00:00+00:00
[2022-01-18, 13:55:36 ] {filesystem.py:59} INFO - Poking for file /home/xyz/test
[2022-01-18, 13:55:36 ] {taskinstance.py:1686} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2022-01-18, 13:55:36 ] {local_task_job.py:154} INFO - Task exited with return code 0
[2022-01-18, 13:55:36 ] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
[2022-01-18, 14:05:38 ] {taskinstance.py:1035} INFO - Dependencies all met for <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [queued]>
[2022-01-18, 14:05:38 ] {taskinstance.py:1035} INFO - Dependencies all met for <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [queued]>
[2022-01-18, 14:05:38 ] {taskinstance.py:1241} INFO -
--------------------------------------------------------------------------------
[2022-01-18, 14:05:38 ] {taskinstance.py:1242} INFO - Starting attempt 1 of 1
[2022-01-18, 14:05:38 ] {taskinstance.py:1243} INFO -
--------------------------------------------------------------------------------
[2022-01-18, 14:05:38 ] {taskinstance.py:1262} INFO - Executing <Task(FileSensor): file_sensor> on 2022-01-17 11:00:00+00:00
[2022-01-18, 14:05:38 ] {standard_task_runner.py:52} INFO - Started process 1447 to run task
[2022-01-18, 14:05:38 ] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'TestDag', 'file_sensor', 'scheduled__2022-01-17T11:00:00+00:00', '--job-id', '1600', '--raw', '--subdir', 'DAGS_FOLDER/dags/zz.py', '--cfg-path', '/tmp/tmpoq7eudcd', '--error-file', '/tmp/tmpqz5eorcr']
[2022-01-18, 14:05:38 ] {standard_task_runner.py:77} INFO - Job 1600: Subtask file_sensor
[2022-01-18, 14:05:38 ] {logging_mixin.py:109} INFO - Running <TaskInstance: TestDag.file_sensor scheduled__2022-01-17T11:00:00+00:00 [running]> on host localhost
[2022-01-18, 14:05:39 ] {taskinstance.py:1429} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=TestDag
AIRFLOW_CTX_TASK_ID=file_sensor
AIRFLOW_CTX_EXECUTION_DATE=2022-01-17T11:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-01-17T11:00:00+00:00
[2022-01-18, 14:05:39 ] {filesystem.py:59} INFO - Poking for file /home/xyz/test
[2022-01-18, 14:05:39 ] {taskinstance.py:1686} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2022-01-18, 14:05:39 ] {local_task_job.py:154} INFO - Task exited with return code 0
[2022-01-18, 14:05:39 ] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
**Apache Airflow version:** 1.10.9-composer
Kubernetes Version : Client Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.12-gke.6002", GitCommit:"035184604aff4de66f7db7fddadb8e7be76b6717", GitTreeState:"clean", BuildDate:"2020-12-01T23:13:35Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}
Environment: Airflow, running on top of Kubernetes - Linux version 4.19.112
OS : Linux version 4.19.112+ (builder@7fc5cdead624) (Chromium OS 9.0_pre361749_p20190714-r4 clang version 9.0.0 (/var/cache/chromeos-cache/distfiles/host/egit-src/llvm-project c11de5eada2decd0a495ea02676b6f4838cd54fb) (based on LLVM 9.0.0svn)) #1 SMP Fri Sep 4 12:00:04 PDT 2020
Kernel : Linux gke-europe-west2-asset-c-default-pool-dc35e2f2-0vgz 4.19.112+ #1 SMP Fri Sep 4 12:00:04 PDT 2020 x86_64 Intel(R) Xeon(R) CPU @ 2.20GHz GenuineIntel GNU/Linux
What happened?
A running task is marked as a zombie once its execution runs more than 5 minutes past the latest heartbeat.
The task runs in the background on another application server and is triggered using SSHOperator.
[2021-01-18 11:53:37,491] {taskinstance.py:888} INFO - Executing <Task(SSHOperator): load_trds_option_composite_file> on 2021-01-17T11:40:00+00:00
[2021-01-18 11:53:37,495] {base_task_runner.py:131} INFO - Running on host: airflow-worker-6f6fd78665-lm98m
[2021-01-18 11:53:37,495] {base_task_runner.py:132} INFO - Running: ['airflow', 'run', 'dsp_etrade_process_trds_option_composite_0530', 'load_trds_option_composite_file', '2021-01-17T11:40:00+00:00', '--job_id', '282759', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/dsp_etrade_trds_option_composite_0530.py', '--cfg_path', '/tmp/tmpge4_nva0']
Task execution time:
dag_id dsp_etrade_process_trds_option_composite_0530
duration 7270.47
start_date 2021-01-18 11:53:37,491
end_date 2021-01-18 13:54:47.799728+00:00
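For scale, the recorded duration can be converted into hours and minutes. This is a minimal Python sketch that uses only the `duration` value shown above; the variable names are illustrative.

```python
from datetime import timedelta

# "duration" value (in seconds) reported for the task instance above
duration_seconds = 7270.47

elapsed = timedelta(seconds=duration_seconds)

# Split the whole seconds into hours, minutes and seconds
hours, remainder = divmod(int(elapsed.total_seconds()), 3600)
minutes, seconds = divmod(remainder, 60)

print(f"{hours}h {minutes}m {seconds}s")  # prints "2h 1m 10s"
```

So the task had been running for roughly two hours when it was marked as failed.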
Scheduler Logs during that time:
[2021-01-18 13:54:54,432] {taskinstance.py:1135} ERROR - <TaskInstance: dsp_etrade_process_etrd.push_run_date 2021-01-18 13:30:00+00:00 [running]> detected as zombie
{
textPayload: "[2021-01-18 13:54:54,432] {taskinstance.py:1135} ERROR - <TaskInstance: dsp_etrade_process_etrd.push_run_date 2021-01-18 13:30:00+00:00 [running]> detected as zombie"
insertId: "1ca8zyfg3zvma66"
resource: {
type: "cloud_composer_environment"
labels: {3}
}
timestamp: "2021-01-18T13:54:54.432862699Z"
severity: "ERROR"
logName: "projects/asset-control-composer-prod/logs/airflow-scheduler"
receiveTimestamp: "2021-01-18T13:54:55.714437665Z"
}
Airflow-webserver log :
X.X.X.X - - [18/Jan/2021:13:54:39 +0000] "GET /_ah/health HTTP/1.1" 200 187 "-" "GoogleHC/1.0"
{
textPayload: "172.17.0.5 - - [18/Jan/2021:13:54:39 +0000] "GET /_ah/health HTTP/1.1" 200 187 "-" "GoogleHC/1.0"
"
insertId: "1sne0gqg43o95n3"
resource: {2}
timestamp: "2021-01-18T13:54:45.401670481Z"
logName: "projects/asset-control-composer-prod/logs/airflow-webserver"
receiveTimestamp: "2021-01-18T13:54:50.598807514Z"
}
Airflow Info logs :
2021-01-18 08:54:47.799 EST
{
textPayload: "NoneType: None
"
insertId: "1ne3hqgg47yzrpf"
resource: {2}
timestamp: "2021-01-18T13:54:47.799661030Z"
severity: "INFO"
logName: "projects/asset-control-composer-prod/logs/airflow-scheduler"
receiveTimestamp: "2021-01-18T13:54:50.914461159Z"
}
[2021-01-18 13:54:47,800] {taskinstance.py:1192} INFO - Marking task as FAILED.dag_id=dsp_etrade_process_trds_option_composite_0530, task_id=load_trds_option_composite_file, execution_date=20210117T114000, start_date=20210118T115337, end_date=20210118T135447
{
textPayload: "[2021-01-18 13:54:47,800] {taskinstance.py:1192} INFO - Marking task as FAILED.dag_id=dsp_etrade_process_trds_option_composite_0530, task_id=load_trds_option_composite_file, execution_date=20210117T114000, start_date=20210118T115337, end_date=20210118T135447"
insertId: "1ne3hqgg47yzrpg"
resource: {2}
timestamp: "2021-01-18T13:54:47.800605248Z"
severity: "INFO"
logName: "projects/asset-control-composer-prod/logs/airflow-scheduler"
receiveTimestamp: "2021-01-18T13:54:50.914461159Z"
}
Airflow Database shows the latest heartbeat as:
select state, latest_heartbeat from job where id=282759
--------------------------------------
state | latest_heartbeat
running | 2021-01-18 13:48:41.891934
Airflow Configurations:
[celery]
worker_concurrency=6
[scheduler]
scheduler_health_check_threshold=60
scheduler_zombie_task_threshold=300
max_threads=2
[core]
dag_concurrency=6
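To make the timing concrete, the sketch below compares the age of the last recorded heartbeat, at the moment the task was marked FAILED, with the configured `scheduler_zombie_task_threshold`. It rests on two assumptions: that the database and log timestamps are in the same timezone, and that the zombie check reduces to "latest heartbeat older than the threshold". The variable names are illustrative and not taken from Airflow's code.

```python
from datetime import datetime, timedelta

# latest_heartbeat from the job table (id=282759)
latest_heartbeat = datetime(2021, 1, 18, 13, 48, 41, 891934)

# Timestamp of the "Marking task as FAILED" log line
marked_failed_at = datetime(2021, 1, 18, 13, 54, 47, 800000)

# scheduler_zombie_task_threshold=300 (seconds)
zombie_threshold = timedelta(seconds=300)

# Assumed simplification of the check: heartbeat older than the threshold
heartbeat_age = marked_failed_at - latest_heartbeat
is_zombie = heartbeat_age > zombie_threshold

print(f"heartbeat age: {heartbeat_age.total_seconds():.1f}s, zombie: {is_zombie}")
```

Under these assumptions the heartbeat was about 366 seconds old, which is past the 300-second threshold. That is consistent with the task being flagged even though the remote process was still running.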
Kubernetes Cluster :
Worker nodes : 6
What was expected to happen?
The backend process takes around 2 hours 30 minutes to finish. During
such long-running jobs the task is detected as a zombie, even though
the worker node is still processing the task and the state of the job
is still marked as 'running'. The state of the task is not known
during the run time.