DAG Has Yet to Run - Airflow

I'm new to Airflow; I'd really appreciate any help with the following problem.
I tried to run the Airflow webserver on my laptop.
Theoretically, since I set start_date=datetime.now(), the DAG should run successfully when I trigger it manually from the web UI, but its state kept changing over time: it was either queued or marked successful.
Sometimes it was marked successful (but the run duration was 00:00:00, and my DAG clearly hadn't actually run), and sometimes it just stayed queued.
Here's the code in my DAG:
from datetime import datetime
from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator

def get_var():
    # a = Variable.get('abc')
    print('abd')

with DAG(dag_id='test_var', start_date=datetime.now()) as dag:
    task1 = PythonOperator(task_id='var', python_callable=get_var)
However, every time I check the Graph view in the Airflow web UI, the DAG shows up as in the screenshot below (the task never actually runs).
I'm not sure whether this is related to the way I initialized Airflow; I followed these steps:
airflow webserver -p 8080
airflow db init
These two steps worked, but the third step failed:
airflow scheduler
[2022-10-31 09:46:45,562] {scheduler_job.py:701} INFO - Starting the scheduler
[2022-10-31 09:46:45,562] {scheduler_job.py:706} INFO - Processing each file at most -1 times
[2022-10-31 09:46:45,565] {executor_loader.py:107} INFO - Loaded executor: SequentialExecutor
[2022-10-31 09:46:45,569] {manager.py:163} INFO - Launched DagFileProcessorManager with pid: 13315
[2022-10-31 09:46:45,570] {scheduler_job.py:1381} INFO - Resetting orphaned tasks for active dag runs
[2022-10-31 09:46:46,169] {settings.py:58} INFO - Configured default timezone Timezone('UTC')
[2022-10-31T09:46:46.172+0800] {manager.py:409} WARNING - Because we cannot use more than 1 thread (parsing_processes = 2) when using sqlite. So we set parallelism to 1.
[2022-10-31 09:46:46 +0800] [13314] [INFO] Starting gunicorn 20.1.0
[2022-10-31 09:46:46 +0800] [13314] [ERROR] Connection in use: ('::', 8793)
[2022-10-31 09:46:46 +0800] [13314] [ERROR] Retrying in 1 second.
[2022-10-31 09:46:47 +0800] [13314] [ERROR] Connection in use: ('::', 8793)
[2022-10-31 09:46:47 +0800] [13314] [ERROR] Retrying in 1 second.
[2022-10-31 09:46:48 +0800] [13314] [ERROR] Connection in use: ('::', 8793)
[2022-10-31 09:46:48 +0800] [13314] [ERROR] Retrying in 1 second.
[2022-10-31 09:46:49 +0800] [13314] [ERROR] Connection in use: ('::', 8793)
[2022-10-31 09:46:49 +0800] [13314] [ERROR] Retrying in 1 second.
[2022-10-31 09:46:50 +0800] [13314] [ERROR] Connection in use: ('::', 8793)
[2022-10-31 09:46:50 +0800] [13314] [ERROR] Retrying in 1 second.
[2022-10-31 09:46:51 +0800] [13314] [ERROR] Can't connect to ('::', 8793)
That is where the scheduler output stopped.
Does this have anything to do with why my DAG won't run from the web UI?
Thanks for your time and help!
I did find another Stack Overflow post about [ERROR] Can't connect to ('::', 8793), but it only discussed the webserver side of things, and I'm not sure whether the scheduler is really the reason my DAG won't run.

The Airflow scheduler service needs to be running for your DAGs to actually execute. In your case, it seems another process is already using port 8793 (the default port for the log-serving process the scheduler starts), which is probably why the scheduler service is unable to start cleanly.
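To confirm what is holding the port, a quick check (assuming a Linux/macOS shell; the PID to kill is whatever stale process shows up, often a leftover scheduler or serve-logs process from a previous run):

lsof -i :8793
kill <PID>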
Consider the following:
- Use executor = LocalExecutor in airflow.cfg (the log above shows the SequentialExecutor; note that with SQLite you are limited to the SequentialExecutor, so this also means switching to a metadata DB like Postgres or MySQL).
- Run the Airflow scheduler in the background first, with the command: airflow scheduler &
- Use something like this for defining your start_date (a fuller sketch follows below):
start_date = datetime.now(local_tz) - timedelta(1)
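A minimal sketch of how that could look in your DAG file, assuming Airflow 2.x with pendulum available (local_tz and the timezone name are placeholders, pick your own):

import pendulum
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

local_tz = pendulum.timezone("UTC")  # placeholder timezone

def get_var():
    print('abd')

with DAG(
    dag_id='test_var',
    start_date=datetime.now(local_tz) - timedelta(1),
    schedule_interval=None,  # trigger manually from the UI
) as dag:
    task1 = PythonOperator(task_id='var', python_callable=get_var)

(The Airflow docs generally recommend a static start_date; subtracting a day from now() just guarantees the start date is already in the past when you trigger the DAG manually.)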
I would also recommend using Docker and Docker Compose to isolate your environment and avoid the Python dependency issues that come with running Airflow directly on your laptop.
I found a great article on this; you can also refer to Airflow's official documentation.
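If you go the Docker Compose route, the official quick-start is roughly the following (the URL embeds a specific Airflow version, so treat 2.4.2 as a placeholder for whichever version you use):

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.4.2/docker-compose.yaml'
docker compose up airflow-init   # initializes the metadata DB and creates the default user
docker compose up                # starts webserver, scheduler, workers, etc.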

Related

Airflow tasks ending up in Retry state without logs

Hi, I'm currently running Airflow on a Dataproc cluster. My DAGs used to run fine, but I'm now facing an issue where tasks end up in the 'retry' state without any logs when I click on task instance -> logs in the Airflow UI.
I see the following error in the terminal where I started the airflow webserver:
2022-06-24 07:30:36.544 [ERROR] Executor reports task instance
<TaskInstance: **task name** 2022-06-23 07:00:00+00:00 [queued]> finished (failed)
although the task says its queued. Was the task killed externally?
None
[2022-06-23 06:08:33,202] {models.py:1758} INFO - Marking task as UP_FOR_RETRY
2022-06-23 06:08:33.202 [INFO] Marking task as UP_FOR_RETRY
What I tried so far
restarted webserver
Started server from 3 different ports
re-ran backfill command with 3 different timestamps
deleted dag runs for my dag, created a new dag run and then re-ran backfill command
cleared the PID as mentioned here How do I restart airflow webserver? and restarted the webserver
None of these worked. This issue has persisted for the past two days; I'd appreciate any help here. At this point I'm guessing it has to do with a shared DB, but I'm not sure how to fix it.
Update: I also found that these tasks eventually go to a success or failure state, and when that happens the logs are available, but there are still no logs for the retry attempts in $AIRFLOW_HOME or our remote log location.
The issue was that there was another Celery worker listening on the same queue. Since this second worker was not configured properly, it was failing the task and not writing the logs to the remote location.
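If you run several Celery workers, one way to avoid this kind of clash (a sketch, assuming Airflow 2.x with the Celery executor; the queue name is made up) is to pin each worker to an explicit queue and route tasks to it, so a misconfigured worker cannot pick your tasks up:

airflow celery worker --queues dataproc_queue

and in the DAG, route the task to that queue with the operator's queue argument, e.g. queue='dataproc_queue'.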

Airflow server constantly restarting - Signal 15

I launch an airflow webserver command on my local machine to start an Airflow instance on port 8081. The server starts, however the prompt constantly shows some warning messages, in a loop. No error message appears, but the server doesn't work. These are the messages:
/usr/local/lib/python3.8/dist-packages/airflow/configuration.py:361 DeprecationWarning: The default_queue option in [celery] has been moved to the default_queue option in [operators] - the old setting has been used, but please update your config.
/usr/local/lib/python3.8/dist-packages/airflow/configuration.py:361 DeprecationWarning: The dag_concurrency option in [core] has been renamed to max_active_tasks_per_dag - the old setting has been used, but please update your config.
/usr/local/lib/python3.8/dist-packages/airflow/configuration.py:361 DeprecationWarning: The processor_poll_interval option in [scheduler] has been renamed to scheduler_idle_sleep_time - the old setting has been used, but please update your config.
[2022-06-13 15:11:57,355] {manager.py:779} WARNING - No user yet created, use flask fab command to do it.
[2022-06-13 15:12:01,925] {manager.py:512} WARNING - Refused to delete permission view, assoc with role exists DAG Runs.can_create User
[2022-06-13 15:12:19 +0000] [1117638] [INFO] Handling signal: ttou
[2022-06-13 15:12:19 +0000] [1120256] [INFO] Worker exiting (pid: 1120256)
[2022-06-13 15:12:19 +0000] [1117638] [WARNING] Worker with pid 1120256 was terminated due to signal 15
[2022-06-13 15:12:22 +0000] [1117638] [INFO] Handling signal: ttin
[2022-06-13 15:12:22 +0000] [1121568] [INFO] Booting worker with pid: 1121568
Do you know what could be happening?
Thank you in advance!

Why are my Airflow tasks being "externally set to failed"?

I'm using Airflow 2.0.0, and my tasks are sporadically being killed "externally" after running for a few seconds or minutes. The tasks usually run successfully (both for manual runs initiated via airflow tasks test ... and for scheduled DAG runs), so I believe this is not related to my DAG code.
When tasks fail, this seems to be the key error from the task logs:
{local_task_job.py:170} WARNING - State of this instance has been externally set to failed. Terminating instance.
[2020-12-20 11:26:11,448] {taskinstance.py:826} INFO - Dependencies all met for <TaskInstance: daily_backups.run_backupper 2020-12-19T02:00:00+00:00 [queued]>
[2020-12-20 11:26:11,473] {taskinstance.py:826} INFO - Dependencies all met for <TaskInstance: daily_backups.run_backupper 2020-12-19T02:00:00+00:00 [queued]>
[2020-12-20 11:26:11,473] {taskinstance.py:1017} INFO -
--------------------------------------------------------------------------------
[2020-12-20 11:26:11,473] {taskinstance.py:1018} INFO - Starting attempt 3 of 3
[2020-12-20 11:26:11,473] {taskinstance.py:1019} INFO -
--------------------------------------------------------------------------------
[2020-12-20 11:26:11,506] {taskinstance.py:1038} INFO - Executing <Task(PythonOperator): run_backupper> on 2020-12-19T02:00:00+00:00
[2020-12-20 11:26:11,509] {standard_task_runner.py:51} INFO - Started process 12059 to run task
[2020-12-20 11:26:11,515] {standard_task_runner.py:75} INFO - Running: ['airflow', 'tasks', 'run', 'daily_backups', 'run_backupper', '2020-12-19T02:00:00+00:00', '--job-id', '22', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/backupper/daily_backups.py', '--cfg-path', '/tmp/tmpnfmqtorg']
[2020-12-20 11:26:11,517] {standard_task_runner.py:76} INFO - Job 22: Subtask run_backupper
[2020-12-20 11:26:11,609] {logging_mixin.py:103} INFO - Running <TaskInstance: daily_backups.run_backupper 2020-12-19T02:00:00+00:00 [running]> on host localhost
[2020-12-20 11:26:11,742] {taskinstance.py:1232} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=<user>
AIRFLOW_CTX_DAG_ID=daily_backups
AIRFLOW_CTX_TASK_ID=run_backupper
AIRFLOW_CTX_EXECUTION_DATE=2020-12-19T02:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2020-12-19T02:00:00+00:00
...
... my job's logs, indicating that the job is running healthily ...
...
[2020-12-20 11:26:16,587] {local_task_job.py:170} WARNING - State of this instance has been externally set to failed. Terminating instance.
[2020-12-20 11:26:16,593] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 12059
[2020-12-20 11:27:16,609] {process_utils.py:108} WARNING - process psutil.Process(pid=12059, name='airflow task runner: daily_backups run_backupper 2020-12-19T02:00:00+00:00 22', status='sleeping', started='11:26:11') did not respond to SIGTERM. Trying SIGKILL
[2020-12-20 11:27:16,618] {process_utils.py:61} INFO - Process psutil.Process(pid=12059, name='airflow task runner: daily_backups run_backupper 2020-12-19T02:00:00+00:00 22', status='terminated', exitcode=<Negsignal.SIGKILL: -9>, started='11:26:11') (12059) terminated with exit code Negsignal.SIGKILL
[2020-12-20 11:27:16,618] {local_task_job.py:118} INFO - Task exited with return code Negsignal.SIGKILL
The final few lines in the logs are not consistent. Here is a different version, for the same task that failed in an earlier attempt:
... same stuff as before ...
[2020-12-20 02:01:12,689] {local_task_job.py:170} WARNING - State of this instance has been externally set to failed. Terminating instance.
[2020-12-20 02:01:12,695] {process_utils.py:95} INFO - Sending Signals.SIGTERM to GPID 24442
[2020-12-20 02:02:00,462] {taskinstance.py:1214} ERROR - Received SIGTERM. Terminating subprocesses.
[2020-12-20 02:02:00,498] {process_utils.py:61} INFO - Process psutil.Process(pid=24442, status='terminated', exitcode=0, started='02:00:10') (24442) terminated with exit code 0
[2020-12-20 02:02:00,499] {local_task_job.py:118} INFO - Task exited with return code 0
I suspect in this case the script was able to respond to the SIGTERM in time, whereas in the previous case it was blocked on a long-running query and was not able to terminate cleanly.
I believe the problem was that the scheduler health check threshold was set to be smaller than the scheduler heartbeat interval.
In my config I had set scheduler_health_check_threshold to 30 seconds and scheduler_heartbeat_sec to 60 seconds. During the check for orphaned tasks (itself governed by a different parameter, orphaned_tasks_check_interval), the scheduler heartbeat was determined to be older than 30 seconds, which makes sense, because it was only heartbeating every 60 seconds. Thus the scheduler was inferred to be unhealthy and was therefore terminated.
Around the time of the failure, I could see messages like these in /var/log/syslog
Dec 20 11:26:14 localhost bash[11545]: [2020-12-20 11:26:14,368] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
Dec 20 11:26:14 localhost bash[11545]: [2020-12-20 11:26:14,373] {scheduler_job.py:1764} INFO - Marked 1 SchedulerJob instances as failed
Dec 20 11:26:14 localhost bash[11545]: [2020-12-20 11:26:14,381] {scheduler_job.py:1805} INFO - Reset the following 1 orphaned TaskInstances:
Dec 20 11:26:14 localhost bash[11545]: #011<TaskInstance: daily_backups.run_backupper 2020-12-19 02:00:00+00:00 [running]>
Dec 20 11:26:14 localhost bash[11545]: [2020-12-20 11:26:14,571] {scheduler_job.py:938} INFO - 1 tasks up for execution:
Dec 20 11:26:14 localhost bash[11545]: #011<TaskInstance: daily_backups.run_backupper 2020-12-19 02:00:00+00:00 [scheduled]>
Dec 20 11:26:14 localhost bash[11545]: [2020-12-20 11:26:14,574] {scheduler_job.py:972} INFO - Figuring out tasks to run in Pool(name=default_pool) with 128 open slots and 1 task instances ready to be queued
Dec 20 11:26:14 localhost bash[11545]: [2020-12-20 11:26:14,575] {scheduler_job.py:999} INFO - DAG daily_backups has 0/16 running and queued tasks
Dec 20 11:26:14 localhost bash[11545]: [2020-12-20 11:26:14,575] {scheduler_job.py:1060} INFO - Setting the following tasks to queued state:
Dec 20 11:26:14 localhost bash[11545]: #011<TaskInstance: daily_backups.run_backupper 2020-12-19 02:00:00+00:00 [scheduled]>
Dec 20 11:26:14 localhost bash[11545]: [2020-12-20 11:26:14,578] {scheduler_job.py:1102} INFO - Sending TaskInstanceKey(dag_id='daily_backups', task_id='run_backupper', execution_date=datetime.datetime(2020, 12, 19, 2, 0, tzinfo=Timezone('UTC')), try_number=4) to executor with priority 2 and queue default
Dec 20 11:26:14 localhost bash[11545]: [2020-12-20 11:26:14,578] {base_executor.py:79} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'daily_backups', 'run_backupper', '2020-12-19T02:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', '/storage/airflow/dags/backupper/daily_backups.py']
Dec 20 11:26:14 localhost bash[11545]: [2020-12-20 11:26:14,581] {local_executor.py:81} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'daily_backups', 'run_backupper', '2020-12-19T02:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', '/storage/airflow/dags/backupper/daily_backups.py']
Dec 20 11:26:14 localhost bash[11545]: [2020-12-20 11:26:14,707] {dagbag.py:440} INFO - Filling up the DagBag from /storage/airflow/dags/backupper/daily_backups.py
Dec 20 11:26:15 localhost bash[11545]: Running <TaskInstance: daily_backups.run_backupper 2020-12-19T02:00:00+00:00 [queued]> on host localhost
and the timestamps coincide closely with the SIGTERM received by my task. I guess that since the SchedulerJob was marked as failed, the TaskInstance running my actual task was considered an orphan and was therefore marked for termination. At the same time, a new attempt was scheduled (try_number=4).
Increasing the scheduler_health_check_threshold to 120 seconds and restarting the scheduler/webserver services appears to have resolved my issue.
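For reference, the relevant airflow.cfg section ended up looking roughly like this (the heartbeat and threshold values are the ones discussed above; orphaned_tasks_check_interval was left at its default):

[scheduler]
scheduler_heartbeat_sec = 60
scheduler_health_check_threshold = 120
orphaned_tasks_check_interval = 300.0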
I had the same issue.
From logs:
2021-05-07 13:04:19,960 INFO - Resetting orphaned tasks for active dag runs
2021-05-07 13:09:20,060 INFO - Resetting orphaned tasks for active dag runs
2021-05-07 13:14:20,186 INFO - Resetting orphaned tasks for active dag runs
2021-05-07 13:19:20,263 INFO - Resetting orphaned tasks for active dag runs
2021-05-07 13:24:20,399 INFO - Resetting orphaned tasks for active dag runs
2021-05-07 13:29:20,729 INFO - Resetting orphaned tasks for active dag runs
2021-05-07 13:34:20,892 INFO - Resetting orphaned tasks for active dag runs
2021-05-07 13:39:21,070 INFO - Resetting orphaned tasks for active dag runs
2021-05-07 13:44:21,328 INFO - Resetting orphaned tasks for active dag runs
2021-05-07 13:49:21,423 INFO - Resetting orphaned tasks for active dag runs
And my scheduler config was as follows:
# Task instances listen for external kill signal (when you clear tasks
# from the CLI or the UI), this defines the frequency at which they should
# listen (in seconds).
job_heartbeat_sec = 5
I had the same issue in AKS (Azure Kubernetes Service).
I resolved it by setting AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION to False.
https://github.com/apache/airflow/issues/14672
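That environment variable maps onto the [scheduler] section of airflow.cfg, so either form should work (set it wherever your deployment is configured, e.g. on the scheduler pods in AKS):

export AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION=False

or, equivalently, in airflow.cfg:

[scheduler]
schedule_after_task_execution = False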
I had the same problem running on a MacBook. It seemed to be the MacBook going to sleep; I solved it by ticking "Prevent your Mac from automatically sleeping when the display is off" in the "Power Adapter" section of Preferences -> Battery (when on a charger).

Airflow 1.9.0 is queuing but tasks are not running

Airflow stopped running tasks all of a sudden. All of the following are running:
airflow scheduler
airflow webserver
airflow worker
Web UI message:
All dependencies are met but the task instance is not running. In most
cases this just means that the task will probably be scheduled soon
unless:
- The scheduler is down or under heavy load
If this task instance does not start soon please contact your Airflow
administrator for assistance.
The scheduler seems to be stuck in a loop, repeating the messages below. The web UI shows the tasks in the queued state. I tried restarting the scheduler, but it didn't help.
[2018-11-17 22:03:45,809] {{jobs.py:1607}} DEBUG - Starting Loop...
[2018-11-17 22:03:45,809] {{jobs.py:1627}} INFO - Heartbeating the process manager
[2018-11-17 22:03:45,810] {{jobs.py:1662}} INFO - Heartbeating the executor
[2018-11-17 22:03:45,810] {{base_executor.py:103}} DEBUG - 124 running task instances
[2018-11-17 22:03:45,810] {{base_executor.py:104}} DEBUG - 0 in queue
[2018-11-17 22:03:45,810] {{base_executor.py:105}} DEBUG - 76 open slots
[2018-11-17 22:03:45,810] {{base_executor.py:132}} DEBUG - Calling the <class 'airflow.executors.celery_executor.CeleryExecutor'> sync method
[2018-11-17 22:03:45,810] {{celery_executor.py:80}} DEBUG - Inquiring about 124 celery task(s)
Airflow setup:
apache-airflow[celery, redis, all]==1.9.0
I also checked these posts, but they didn't help:
Airflow 1.9.0 is queuing but not launching tasks
Airflow tasks get stuck at "queued" status and never gets running
Problem solved. This is a problem for builds created on or after 2018-11-15: it turns out apache-airflow[celery, redis, all]==1.9.0 pulls in the latest redis-py (3.0.1), which does not work with celery 4.2.1.
The solution is to pin redis-py to 2.10.6:
redis==2.10.6
apache-airflow[celery, all]==1.9.0
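In practice that means installing with the redis pin in place, for example:

pip install 'redis==2.10.6' 'apache-airflow[celery,all]==1.9.0'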

Airflow execution_timeout settings not respected

In my tasks, I have execution_timeout=timedelta(minutes=1) set on each task and 'dagrun_timeout': timedelta(minutes=2) for my DAG, and this is correctly reflected in the web GUI's Task Instance Details. However, none of my task instances are actually set to failed or retry when breaching the one-minute threshold. Rather, they time out at 11 minutes...
[2017-11-02 18:00:05,376] {base_task_runner.py:95} INFO - Subtask: [2017-11-02 18:00:05,370] {base_hook.py:67} INFO - Using connection to: [REDACTED]
[2017-11-02 18:10:06,505] {base_task_runner.py:95} INFO - Subtask: [2017-11-02 18:10:06,504] {timeout.py:37} ERROR - Process timed out
Do I have a problem with my configuration, or is there something buggy happening with how Airflow interprets timeout settings?
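For reference, this is roughly how those settings are declared (a sketch assuming Airflow 1.x, with placeholder DAG and task names; execution_timeout is a per-task-instance limit, while dagrun_timeout caps the whole run and historically only applies to scheduled runs):

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x import path

def query_db():
    pass  # placeholder for the long-running query

with DAG(
    dag_id='timeout_example',                    # placeholder name
    start_date=datetime(2017, 11, 1),
    schedule_interval='@daily',
    dagrun_timeout=timedelta(minutes=2),         # limit for the whole DAG run
) as dag:
    run_query = PythonOperator(
        task_id='run_query',                     # placeholder name
        python_callable=query_db,
        execution_timeout=timedelta(minutes=1),  # limit for a single task instance
    )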
