Before asking this I looked through the docs and at "Difference between airflow run and airflow test in Airflow" to see if I could figure out why I'm having this problem.
I've got a few DAGs, all of which use the LocalExecutor. Two of them use an SSHOperator and the other just runs locally. I've tried airflow run <dag_id> <task_id> <execution_date> and airflow trigger_dag <dag_id>, both of which fail. However, when I run airflow test <dag_id> <task_id> <execution_date> it works. I should emphasise that this is also the case for the DAG that only has a locally run task.
There seems to be a lot of confusion around start_date in DAGs, and how it relates to schedule_interval. All of my DAGs have a static start_date set to a time in the near past, and as a sanity check I set schedule_interval='* * * * *' so that each DAG runs every minute (it's a lightweight task). When the DAG runs, the task just runs and fails if retries is set to 0, or gets stuck in a retry state every minute if retries > 0, with very little feedback. All I ever get in the task instance details, in either case, is:
Task instance's dagrun did not exist: Unknown reason.
or
Task is not ready for retry yet but will be retried automatically. Current date is 2019-02-20T12:30:35.381668+00:00 and task will be retried at 2019-02-20T12:31:21.492310+00:00.
There are no logs in the UI, despite my specifying where they should go in the config file, and despite the task instance details page telling me where they are.
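For what it's worth, with the default file task handler in 1.10 the logs should also land on disk under base_log_folder, one file per try number. The path below is illustrative (substitute your own base_log_folder, and the dag id, task id and execution date in question):

ls <base_log_folder>/solr-mentions-cleanups/run-solr-mentions-cleanups-0/2019-02-14T00:00:00+00:00/
# expected: 1.log, 2.log, ... one file per attempt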
Here is an example of one of the dags:
import json
import re
from os.path import expanduser
from datetime import datetime, timedelta

from airflow import DAG
from airflow.utils import timezone
from airflow.contrib.hooks.ssh_hook import SSHHook
from airflow.contrib.operators.ssh_operator import SSHOperator

from bw_config_tools.connect.bw_config import ConfigDbClient

CONFIG_DB_INFO = '/etc/airflow/config_db_info.json'
START_SCRIPT = '/bin/start.sh'
TIME_IN_PAST = timezone.convert_to_utc(datetime(2019, 2, 14, 15, 00))

DEFAULT_ARGS = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': TIME_IN_PAST,
    'email': ['example@domain.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=1),
}

def _extract_instance_id(instance_string):
    return re.findall(r'\d+', instance_string)[0]

def _read_file_as_json(file_name):
    with open(file_name) as open_file:
        return json.load(open_file)

DB_INFO = _read_file_as_json(CONFIG_DB_INFO)
CONFIG_CLIENT = ConfigDbClient(**DB_INFO)
print('Config DB client: {0}'.format(CONFIG_CLIENT))

APP_DIRS = CONFIG_CLIENT.get_values('%solr-mentions-cleanup.[0-9]+.dir%', strictness='similar')

INSTANCE_START_SCRIPT_PATHS = {
    _extract_instance_id(instance_string): directory + START_SCRIPT
    for instance_string, directory in APP_DIRS.items()
}

# Create an ssh hook which refers to pre-existing connection information
# setup and stored by airflow
SSH_HOOK = SSHHook(ssh_conn_id='solr-mentions-cleanups', key_file='/home/airflow/.ssh/id_rsa')

# Create a DAG object to add tasks to
DAG = DAG('solr-mentions-cleanups',
          default_args=DEFAULT_ARGS,
          schedule_interval='* * * * *'
          )
DAG.catchup = False

# Create a task for each solr-mentions-cleanup instance.
for instance_id, start_script in INSTANCE_START_SCRIPT_PATHS.items():
    task = SSHOperator(
        task_id='run-solr-mentions-cleanups-{0}'.format(instance_id),
        command='bash {0} disabled-queries --delete'.format(start_script),
        ssh_hook=SSH_HOOK,
        dag=DAG)
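(As an aside: I set catchup as an attribute after constructing the DAG above. A minimal sketch of the equivalent form, which also avoids shadowing the DAG class with the variable name, passes it straight to the constructor; as far as I know the behaviour is the same:)

# Equivalent sketch: pass catchup directly to the constructor,
# using a lowercase name so the DAG class isn't shadowed.
dag = DAG(
    'solr-mentions-cleanups',
    default_args=DEFAULT_ARGS,
    schedule_interval='* * * * *',
    catchup=False,
)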
Here is the output from running the task:
(venv) airflow@some_host ~ # airflow run solr-mentions-cleanups run-solr-mentions-cleanups-0 2019-02-14
[2019-02-20 12:38:51,313] {settings.py:146} DEBUG - Setting up DB connection pool (PID 16375)
[2019-02-20 12:38:51,313] {settings.py:174} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600, pid=16375
[2019-02-20 12:38:51,491] {__init__.py:42} DEBUG - Cannot import due to doesn't look like a module path
[2019-02-20 12:38:51,645] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-02-20 12:38:51,654] {cli_action_loggers.py:40} DEBUG - Adding <function default_action_log at 0x7f0364fdc8c8> to pre execution callback
[2019-02-20 12:38:51,930] {cli_action_loggers.py:64} DEBUG - Calling callbacks: [<function default_action_log at 0x7f0364fdc8c8>]
[2019-02-20 12:38:51,974] {settings.py:146} DEBUG - Setting up DB connection pool (PID 16375)
[2019-02-20 12:38:51,974] {settings.py:154} DEBUG - settings.configure_orm(): Using NullPool
[2019-02-20 12:38:51,976] {models.py:273} INFO - Filling up the DagBag from /etc/airflow/dags
[2019-02-20 12:38:51,978] {models.py:360} INFO - File /etc/airflow/dags/__init__.py assumed to contain no DAGs. Skipping.
[2019-02-20 12:38:51,978] {models.py:363} DEBUG - Importing /etc/airflow/dags/hbase-exports.py
[2019-02-20 12:38:51,983] {models.py:501} DEBUG - Loaded DAG <DAG: hbase-daily-export>
[2019-02-20 12:38:51,984] {models.py:363} DEBUG - Importing /etc/airflow/dags/test_dag.py
[2019-02-20 12:38:51,985] {models.py:501} DEBUG - Loaded DAG <DAG: test_dag>
[2019-02-20 12:38:51,986] {models.py:363} DEBUG - Importing /etc/airflow/dags/solr_mentions_cleanup.py
Creating dag
Config DB client: <bw_config_tools.connect.bw_config.ConfigDbClient object at 0x7f032739b4e0>
The key file given is /home/airflow/.ssh/id_rsa
[2019-02-20 12:38:52,196] {base_hook.py:83} INFO - Using connection to: id: solr-mentions-cleanups. Host: some_host, Port: None, Schema: None, Login: some_user, Password: None, extra: {}
extra connection info given:
Key file in extra options: None
SSH config file being used is /home/airflow/.ssh/config
[2019-02-20 12:38:52,198] {models.py:501} DEBUG - Loaded DAG <DAG: solr-mentions-cleanups>
[2019-02-20 12:38:52,251] {cli.py:520} INFO - Running <TaskInstance: solr-mentions-cleanups.run-solr-mentions-cleanups-0 2019-02-14T00:00:00+00:00 [success]> on host xxxxxx.net
[2019-02-20 12:38:54,026] {settings.py:146} DEBUG - Setting up DB connection pool (PID 16453)
[2019-02-20 12:38:54,027] {settings.py:174} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600, pid=16453
[2019-02-20 12:38:54,207] {__init__.py:42} DEBUG - Cannot import due to doesn't look like a module path
[2019-02-20 12:38:54,362] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-02-20 12:38:54,371] {cli_action_loggers.py:40} DEBUG - Adding <function default_action_log at 0x7f4345cfa8c8> to pre execution callback
[2019-02-20 12:38:54,622] {cli_action_loggers.py:64} DEBUG - Calling callbacks: [<function default_action_log at 0x7f4345cfa8c8>]
[2019-02-20 12:38:54,658] {settings.py:146} DEBUG - Setting up DB connection pool (PID 16453)
[2019-02-20 12:38:54,658] {settings.py:154} DEBUG - settings.configure_orm(): Using NullPool
[2019-02-20 12:38:54,660] {models.py:273} INFO - Filling up the DagBag from /etc/airflow/dags/solr_mentions_cleanup.py
[2019-02-20 12:38:54,662] {models.py:363} DEBUG - Importing /etc/airflow/dags/solr_mentions_cleanup.py
Config DB client: <bw_config_tools.connect.bw_config.ConfigDbClient object at 0x7f4308b5dc50>
The key file given is /home/airflow/.ssh/id_rsa
[2019-02-20 12:38:54,909] {base_hook.py:83} INFO - Using connection to: id: solr-mentions-cleanups. Host: some_host, Port: None, Schema: None, Login: some_user, Password: None, extra: {}
extra connection info given:
Key file in extra options: None
SSH config file being used is /home/airflow/.ssh/config
[2019-02-20 12:38:54,912] {models.py:501} DEBUG - Loaded DAG <DAG: solr-mentions-cleanups>
[2019-02-20 12:38:54,961] {cli.py:520} INFO - Running <TaskInstance: solr-mentions-cleanups.run-solr-mentions-cleanups-0 2019-02-14T00:00:00+00:00 [success]> on host xxxx.net
[2019-02-20 12:38:55,054] {cli_action_loggers.py:81} DEBUG - Calling callbacks: []
[2019-02-20 12:38:55,054] {settings.py:201} DEBUG - Disposing DB connection pool (PID 16453)
[2019-02-20 12:38:56,310] {cli_action_loggers.py:81} DEBUG - Calling callbacks: []
[2019-02-20 12:38:56,313] {settings.py:201} DEBUG - Disposing DB connection pool (PID 16375)
Furthermore, this used to work until I had to restart everything on the server it runs on, so I'm quite confident this is close to the version that once worked.
Any ideas what I am doing wrong?
Also, FYI, we have it connected to a Postgres database running on another server. I've confirmed that task state definitely updates there, which confirms there are no connection issues.
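(For reference, one way to check this against the metadata database; the host and credentials below are placeholders:)

psql -h <db_host> -U airflow -d airflow \
  -c "SELECT task_id, state, execution_date FROM task_instance ORDER BY execution_date DESC LIMIT 5;"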
Related
I'm testing the KubernetesPodOperator, running Airflow 2.2.1 with the cncf.kubernetes provider 2.1.0 on my minikube.
I'm having issues trying to spawn a mock task:
from kubernetes.client import models as k8s
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

# Blank out any pod template file settings that might be inherited from the environment.
init_environments = [
    k8s.V1EnvVar(name='AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE', value='""'),
    k8s.V1EnvVar(name='KUBERNETES__POD_TEMPLATE_FILE', value='""'),
    k8s.V1EnvVar(name='POD_TEMPLATE_FILE', value='""'),
]

other_task = KubernetesPodOperator(
    dag=dag,  # `dag` is defined elsewhere in the file
    task_id="ingestion_kube",
    env_vars=init_environments,
    cmds=["bash", "-cx"],
    arguments=["echo 10 \n\n\n\n\n\n\n"],
    name="base",
    image="meltano-flieber",
    image_pull_policy="IfNotPresent",  # fixed: "IfNotExists" is not a valid pull policy
    in_cluster=True,
    namespace="localkubeflow",
    is_delete_operator_pod=False,
    pod_template_file=None,
    get_logs=True,
)
The KubernetesExecutor task pod that runs the Airflow task itself completes successfully, but there is no sign of the pod the operator was supposed to launch.
Relevant logs of airflow task:
[2021-11-23 21:30:28,213] {dagbag.py:500} INFO - Filling up the DagBag from /opt/***/dags/meltano/meltano_ingest_pendo.py
Running <TaskInstance: meltano_tasks.ingestion_kube manual__2021-11-23T21:30:10.788963+00:00 [queued]> on host meltanotasksingestionkube.996b19ee10464c2f8683cdfc8ce7303
And in Airflow the task looks like it failed, but without any relevant logs. Does anybody have anything related to that, or any suggestions?
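For anyone looking into this, one way to check whether the operator's pod was ever created is to query the namespace directly (namespace name taken from the operator above; the pod name is whatever shows up, if anything):

kubectl get pods -n localkubeflow
kubectl describe pod <pod_name> -n localkubeflow
kubectl logs <pod_name> -n localkubeflow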
Apache Airflow version: 1.10.10
Kubernetes version (if you are using kubernetes) (use kubectl version): Not using Kubernetes or docker
Environment: CentOS Linux release 7.7.1908 (Core) Linux 3.10.0-1062.el7.x86_64
Python Version: 3.7.6
Executor: LocalExecutor
What happened:
I wrote a simple DAG to clean Airflow logs. Everything is OK when I use the 'airflow test' command to test it; I can also trigger it manually in the WebUI (which uses the 'airflow run' command to start my task), and it is still OK.
But after I rebooted my server and restarted my webserver & scheduler services (in daemon mode), every time I trigger the exact same DAG, it still gets scheduled as usual, but exits with code 1 immediately after starting a new process to run the task.
I also used the 'airflow test' command again to check whether there is something wrong with my code now, but everything seems OK when using 'airflow test'; it exits silently only when using 'airflow run'. It is really weird.
Here's the task log when it's manually triggered in the WebUI (I've changed the log level to DEBUG, but still can't find anything useful), or you can read the attached log file: task error log.txt
Reading local file: /root/airflow/logs/airflow_log_cleanup/log_cleanup_worker_num_1/2020-04-29T13:51:44.071744+00:00/1.log
[2020-04-29 21:51:53,744] {base_task_runner.py:61} DEBUG - Planning to run as the user
[2020-04-29 21:51:53,750] {taskinstance.py:686} DEBUG - dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
[2020-04-29 21:51:53,754] {taskinstance.py:686} DEBUG - dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
[2020-04-29 21:51:53,754] {taskinstance.py:686} DEBUG - dependency 'Task Instance State' PASSED: True, Task state queued was valid.
[2020-04-29 21:51:53,754] {taskinstance.py:669} INFO - Dependencies all met for
[2020-04-29 21:51:53,757] {taskinstance.py:686} DEBUG - dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
[2020-04-29 21:51:53,760] {taskinstance.py:686} DEBUG - dependency 'Pool Slots Available' PASSED: True, ('There are enough open slots in %s to execute the task', 'default_pool')
[2020-04-29 21:51:53,766] {taskinstance.py:686} DEBUG - dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
[2020-04-29 21:51:53,768] {taskinstance.py:686} DEBUG - dependency 'Task Concurrency' PASSED: True, Task concurrency is not set.
[2020-04-29 21:51:53,768] {taskinstance.py:669} INFO - Dependencies all met for
[2020-04-29 21:51:53,768] {taskinstance.py:879} INFO -
[2020-04-29 21:51:53,768] {taskinstance.py:880} INFO - Starting attempt 1 of 2
[2020-04-29 21:51:53,768] {taskinstance.py:881} INFO -
[2020-04-29 21:51:53,779] {taskinstance.py:900} INFO - Executing on 2020-04-29T13:51:44.071744+00:00
[2020-04-29 21:51:53,781] {standard_task_runner.py:53} INFO - Started process 29718 to run task
[2020-04-29 21:51:53,805] {logging_mixin.py:112} INFO - [2020-04-29 21:51:53,805] {cli_action_loggers.py:68} DEBUG - Calling callbacks: []
[2020-04-29 21:51:53,818] {logging_mixin.py:112} INFO - [2020-04-29 21:51:53,817] {cli_action_loggers.py:86} DEBUG - Calling callbacks: []
[2020-04-29 21:51:58,759] {logging_mixin.py:112} INFO - [2020-04-29 21:51:58,759] {base_job.py:200} DEBUG - [heartbeat]
[2020-04-29 21:51:58,759] {logging_mixin.py:112} INFO - [2020-04-29 21:51:58,759] {local_task_job.py:124} DEBUG - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.98824 s
[2020-04-29 21:52:03,753] {logging_mixin.py:112} INFO - [2020-04-29 21:52:03,753] {local_task_job.py:103} INFO - Task exited with return code 1
How to reproduce it:
I really don't know how to reproduce it, because it happened suddenly and seems to be permanent.
Anything else we need to know:
I tried to figure out the difference between 'airflow test' and 'airflow run'; I guess it might have something to do with how the process is forked?
What I've tried to solve this problem, all of which failed:
- clear all DAG / DAG run / task instance info, remove all files under /root/airflow except for the config file, and restart my services
- reboot my server again
- uninstall Airflow and install it again
I finally figured out how to reproduce this bug.
When you configure email in airflow.cfg and your DAG contains an email operator or uses the SMTP service, then if your SMTP password contains a character like "^", the first task of your DAG will, 100% of the time, exit with return code 1 without any error information; in my case the first task is merely a PythonOperator.
Although I think it's my fault for messing up the SMTP service, there should be some reasonable hint. It actually took me a whole week to debug this: I had to reset everything in my Airflow environment and slowly change the configuration to see when the bug appears.
Hope this information is helpful.
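One workaround that may be worth trying (my own suggestion, not something confirmed for this particular bug): supply the SMTP password via Airflow's environment-variable override instead of airflow.cfg, which bypasses the config-file parsing entirely:

# AIRFLOW__{SECTION}__{KEY} environment variables take precedence over airflow.cfg
export AIRFLOW__SMTP__SMTP_PASSWORD='password^with^carets'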
I'm having trouble getting the LocalExecutor to work.
I created a postgres database called airflow and granted all privileges to the airflow user. Finally I updated my airflow.cfg file:
# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor, KubernetesExecutor
executor = LocalExecutor
# The SqlAlchemy connection string to the metadata database.
# SqlAlchemy supports many different database engine, more information
# their website
sql_alchemy_conn = postgresql+psycopg2://airflow:[MY_PASSWORD]@localhost:5432/airflow
Next I ran:
airflow initdb
airflow scheduler
airflow webserver
I thought it was working, but my DAGs were taking a long time to finish. Upon further inspection of my log files, I noticed that they say Airflow is using the SequentialExecutor.
INFO - Job 319: Subtask create_task_send_email [2020-01-07 12:00:16,997] {__init__.py:51} INFO - Using executor SequentialExecutor
Does anyone know what could be causing this?
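(One quick check, in case the scheduler and webserver are reading a different airflow.cfg than the one edited above: print what the running installation actually resolves.)

echo $AIRFLOW_HOME   # if unset, Airflow falls back to ~/airflow/airflow.cfg
python -c "from airflow.configuration import conf; print(conf.get('core', 'executor'))"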
I want to run a simple DAG, "test_update_bq", but when I go to localhost I see this: DAG "test_update_bq" seems to be missing.
There are no errors when I run "airflow initdb". Also, when I run airflow test test_update_bq update_table_sql 2015-06-01, it completes successfully and the table is updated in BQ. DAG:
from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'Anna',
    'depends_on_past': True,
    'start_date': datetime(2017, 6, 2),
    'email': ['airflow@airflow.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 5,
    'retry_delay': timedelta(minutes=5),
}

schedule_interval = "00 21 * * *"

# Define DAG: Set ID and assign default args and schedule interval
dag = DAG(
    'test_update_bq',
    default_args=default_args,
    schedule_interval=schedule_interval,
    template_searchpath=['/home/ubuntu/airflow/dags/sql_bq'],
)

update_task = BigQueryOperator(
    dag=dag,
    allow_large_results=True,
    task_id='update_table_sql',
    sql='update_bq.sql',
    use_legacy_sql=False,
    bigquery_conn_id='test',
)

update_task
I would be grateful for any help.
From /logs/scheduler:
[2019-10-10 11:28:53,308] {logging_mixin.py:95} INFO - [2019-10-10 11:28:53,308] {dagbag.py:90} INFO - Filling up the DagBag from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:53,333] {scheduler_job.py:1532} INFO - DAG(s) dict_keys(['test_update_bq']) retrieved from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:53,383] {scheduler_job.py:152} INFO - Processing /home/ubuntu/airflow/dags/update_bq.py took 0.082 seconds
[2019-10-10 11:28:56,315] {logging_mixin.py:95} INFO - [2019-10-10 11:28:56,315] {settings.py:213} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=3600, pid=11761
[2019-10-10 11:28:56,318] {scheduler_job.py:146} INFO - Started process (PID=11761) to work on /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:56,324] {scheduler_job.py:1520} INFO - Processing file /home/ubuntu/airflow/dags/update_bq.py for tasks to queue
[2019-10-10 11:28:56,325] {logging_mixin.py:95} INFO - [2019-10-10 11:28:56,325] {dagbag.py:90} INFO - Filling up the DagBag from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:56,350] {scheduler_job.py:1532} INFO - DAG(s) dict_keys(['test_update_bq']) retrieved from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:56,399] {scheduler_job.py:152} INFO - Processing /home/ubuntu/airflow/dags/update_bq.py took 0.081 seconds
Restarting the Airflow webserver helped.
So I killed the gunicorn processes on Ubuntu and then restarted the Airflow webserver.
This error is usually due to an exception happening when Airflow tries to parse the DAG: the DAG gets registered in the metastore (and is thus visible in the UI), but it was never successfully parsed. Take a look at the Airflow logs; you might see the exception causing this error.
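A quick way to surface that exception directly is to build a DagBag by hand, the same way the scheduler does, and print any import errors:

python -c "from airflow.models import DagBag; print(DagBag().import_errors)"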
None of the responses helped me solve this issue.
However, after spending some time I found out how to see the exact problem.
In my case I ran Airflow (v2.4.0) using the helm chart (v1.6.0) inside Kubernetes, which created multiple docker containers. I got into the running container using ssh and executed two commands using Airflow's CLI, and this helped me a lot to debug and understand the problem:
airflow dags report
airflow dags reserialize
In my case the problem was that the database schema didn't match the Airflow version.
In my tasks, I have execution_timeout=timedelta(minutes=1) set on each task and 'dagrun_timeout': timedelta(minutes=2) on my DAG, and this is correctly reflected in the web GUI's Task Instance Details. However, none of my task instances are actually marked failed or up for retry when breaching the one-minute threshold. Rather, they time out at 11 minutes...
[2017-11-02 18:00:05,376] {base_task_runner.py:95} INFO - Subtask: [2017-11-02 18:00:05,370] {base_hook.py:67} INFO - Using connection to: [REDACTED]
[2017-11-02 18:10:06,505] {base_task_runner.py:95} INFO - Subtask: [2017-11-02 18:10:06,504] {timeout.py:37} ERROR - Process timed out
Do I have a problem with my configuration, or is there something buggy happening with how Airflow interprets timeout settings?
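For reference, a minimal sketch of how the two timeouts described above are declared (the DAG, task and command here are placeholders, not the actual code):

# Minimal sketch: a task whose execution_timeout (1 min) should fire well
# before its 5-minute sleep finishes, inside a DAG with a 2-minute dagrun_timeout.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'timeout_test',
    start_date=datetime(2017, 11, 1),
    schedule_interval=None,
    dagrun_timeout=timedelta(minutes=2),
)

slow_task = BashOperator(
    task_id='slow_task',
    bash_command='sleep 300',
    execution_timeout=timedelta(minutes=1),
    dag=dag,
)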