I want to run a simple DAG, "test_update_bq", but when I open the web UI on localhost I see this: DAG "test_update_bq" seems to be missing.
There are no errors when I run "airflow initdb", and when I run airflow test test_update_bq update_table_sql 2015-06-01 it completes successfully and the table is updated in BigQuery. DAG:
from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from datetime import datetime, timedelta
default_args = {
    'owner': 'Anna',
    'depends_on_past': True,
    'start_date': datetime(2017, 6, 2),
    'email': ['airflow@airflow.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 5,
    'retry_delay': timedelta(minutes=5),
}
schedule_interval = "00 21 * * *"
# Define DAG: Set ID and assign default args and schedule interval
dag = DAG('test_update_bq', default_args=default_args, schedule_interval=schedule_interval,
          template_searchpath=['/home/ubuntu/airflow/dags/sql_bq'])
update_task = BigQueryOperator(
    dag=dag,
    allow_large_results=True,
    task_id='update_table_sql',
    sql='update_bq.sql',
    use_legacy_sql=False,
    bigquery_conn_id='test'
)
update_task
I would be grateful for any help.
/logs/scheduler
[2019-10-10 11:28:53,308] {logging_mixin.py:95} INFO - [2019-10-10 11:28:53,308] {dagbag.py:90} INFO - Filling up the DagBag from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:53,333] {scheduler_job.py:1532} INFO - DAG(s) dict_keys(['test_update_bq']) retrieved from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:53,383] {scheduler_job.py:152} INFO - Processing /home/ubuntu/airflow/dags/update_bq.py took 0.082 seconds
[2019-10-10 11:28:56,315] {logging_mixin.py:95} INFO - [2019-10-10 11:28:56,315] {settings.py:213} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=3600, pid=11761
[2019-10-10 11:28:56,318] {scheduler_job.py:146} INFO - Started process (PID=11761) to work on /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:56,324] {scheduler_job.py:1520} INFO - Processing file /home/ubuntu/airflow/dags/update_bq.py for tasks to queue
[2019-10-10 11:28:56,325] {logging_mixin.py:95} INFO - [2019-10-10 11:28:56,325] {dagbag.py:90} INFO - Filling up the DagBag from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:56,350] {scheduler_job.py:1532} INFO - DAG(s) dict_keys(['test_update_bq']) retrieved from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:56,399] {scheduler_job.py:152} INFO - Processing /home/ubuntu/airflow/dags/update_bq.py took 0.081 seconds
Restarting the Airflow webserver helped. I killed the gunicorn process on Ubuntu and then restarted the Airflow webserver.
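For reference, the restart amounted to something like the following (assuming the webserver's gunicorn is the only gunicorn process on the box and the default port is used):
pkill gunicorn
airflow webserver -p 8080 -D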
This error is usually due to an exception happening when Airflow tries to parse a DAG. The DAG gets registered in the metastore (and is therefore visible in the UI), but it wasn't successfully parsed by Airflow. Take a look at the Airflow logs; you might see the exception causing this error.
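A quick way to surface such an exception is to import the DAG file with plain Python, or to rebuild the DagBag from the CLI (the path below is taken from the scheduler log above). The former raises the exception directly, and the latter re-parses the DAG files and shows which DAGs were actually loaded:
python /home/ubuntu/airflow/dags/update_bq.py
airflow list_dags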
None of the responses helped me solve this issue. However, after spending some time I found out how to see the exact problem.
In my case I ran Airflow (v2.4.0) using the Helm chart (v1.6.0) inside Kubernetes, which created multiple Docker containers. I got into the running container using ssh and executed two commands using Airflow's CLI, which helped me a lot to debug and understand the problem:
airflow dags report
airflow dags reserialize
In my case the problem was that the database schema didn't match the Airflow version.
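If the schema turns out to be behind the installed Airflow version, upgrading the metadata database is the usual fix; for example (the pod name is a placeholder for whatever the Helm chart created in your namespace):
kubectl exec -it <scheduler-pod> -- airflow db upgrade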
Related
I have configured a DAG in such a way that if the current instance has failed, the next instance won't run. However, there is a problem.
Problem
Let's say a past instance of the task has failed and the current instance is in a waiting state. Once I fix the issue, how do I run the current instance without marking the past run successful? I want to keep the history of when the task (DAG) failed.
DAG
dag = DAG(
    dag_id='test_airflow',
    default_args=args,
    tags=['wealth', 'python', 'ml'],
    schedule_interval='5 13 * * *',
    max_active_runs=1,
)
run_this = BashOperator(
    task_id='run_after_loop',
    bash_command='lll',
    dag=dag,
    depends_on_past=True
)
I guess you could trigger a task execution via the CLI using airflow run, as in the example below.
There are two arguments that may help you:
-i, --ignore_dependencies - Ignore task-specific dependencies, e.g. upstream, depends_on_past, and retry delay dependencies
-I, --ignore_depends_on_past - Ignore depends_on_past dependencies (but respect upstream dependencies)
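For example, to force the waiting instance to run despite the failed past run (dag_id and task_id taken from the DAG above; the execution date is a placeholder for the stuck run):
airflow run -I test_airflow run_after_loop 2021-01-01
Using -i instead would also ignore the other task-specific dependencies (upstream and retry delay), not just depends_on_past.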
I am using Apache Airflow version 1.10.3 with the sequential executor, and I would like the DAG to fail after a certain amount of time if it has not finished. I tried setting dagrun_timeout in the example code below:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
default_args = {
    'owner': 'me',
    'depends_on_past': False,
    'start_date': datetime(2019, 6, 1),
    'retries': 0,
}
dag = DAG('min_timeout', default_args=default_args, schedule_interval=timedelta(minutes=5),
          dagrun_timeout=timedelta(seconds=30), max_active_runs=1)
t1 = BashOperator(
    task_id='fast_task',
    bash_command='date',
    dag=dag)
t2 = BashOperator(
    task_id='slow_task',
    bash_command='sleep 45',
    dag=dag)
t2.set_upstream(t1)
slow_task alone takes more than the time limit set by dagrun_timeout, so my understanding is that Airflow should stop DAG execution. However, this does not happen, and slow_task is allowed to run for its entire duration. After this occurs, the run is marked as failed, but this does not kill the task or DAG as desired. Using execution_timeout for slow_task does cause the task to be killed at the specified time limit, but I would prefer to use an overall time limit for the DAG rather than specifying execution_timeout for each task.
Is there anything else I should try to achieve this behavior, or any mistakes I can fix?
The Airflow scheduler runs a loop at least every SCHEDULER_HEARTBEAT_SEC (the default is 5 seconds).
Bear in mind the "at least" here: the scheduler performs some actions that may delay the next cycle of its loop.
These actions include:
parsing the dags
filling up the DagBag
checking the DagRun and updating their state
scheduling next DagRun
In your example, the delayed task isn't terminated at the dagrun_timeout because the scheduler performs its next cycle after the task completes.
According to Airflow documentation:
dagrun_timeout (datetime.timedelta) – specify how long a DagRun should be up before timing out / failing, so that new DagRuns can be created. The timeout is only enforced for scheduled DagRuns, and only once the # of active DagRuns == max_active_runs.
So dagrun_timeout doesn't apply to non-scheduled DagRuns (e.g. manually triggered ones), nor while the number of active DagRuns is below max_active_runs.
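If a hard per-task limit is acceptable as a workaround, execution_timeout can be set once in default_args so that every task inherits it, instead of repeating it on each operator. A minimal sketch, assuming Airflow 1.10.x and reusing the 30-second budget from the question:
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'me',
    'start_date': datetime(2019, 6, 1),
    'retries': 0,
    # Every task built with these default_args inherits this limit.
    'execution_timeout': timedelta(seconds=30),
}
dag = DAG('min_timeout', default_args=default_args,
          schedule_interval=timedelta(minutes=5), max_active_runs=1)
slow_task = BashOperator(
    task_id='slow_task',
    bash_command='sleep 45',  # exceeds execution_timeout, so this task is stopped
    dag=dag)
Each task is then failed individually when it overruns, which is the behaviour the question already observed when setting execution_timeout on slow_task, just applied DAG-wide through default_args.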
We are running Airflow version 1.9 with the Celery executor. Our task instances are stuck in retry mode: when the job fails, the task instance goes into retry, later attempts to run again, and then falls back to a new retry time.
First state:
Task is not ready for retry yet but will be retried automatically. Current date is 2018-08-28T03:46:53.101483 and task will be retried at 2018-08-28T03:47:25.463271.
After some time:
All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
- The scheduler is down or under heavy load
If this task instance does not start soon please contact your Airflow administrator for assistance
After some time, it went into retry mode again:
Task is not ready for retry yet but will be retried automatically. Current date is 2018-08-28T03:51:48.322424 and task will be retried at 2018-08-28T03:52:57.893430.
This is happening for all DAGs. We created a test DAG and collected both the scheduler and worker logs.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash_operator import BashOperator
default_args = {
    'owner': 'Pramiti',
    'depends_on_past': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=1)
}
dag = DAG('airflow-examples.test_failed_dag_v2', description='Failed DAG',
          schedule_interval='*/10 * * * *',
          start_date=datetime(2018, 9, 7), default_args=default_args)
b = BashOperator(
    task_id="ls_command",
    bash_command="mdr",
    dag=dag
)
Task Logs
Scheduler Logs
Worker Logs
I am new to Airflow and I have written a simple SSHOperator task to learn how it works.
from datetime import datetime
from airflow import DAG
from airflow.contrib.hooks.ssh_hook import SSHHook
from airflow.contrib.operators.ssh_operator import SSHOperator

default_args = {
    'start_date': datetime(2018, 6, 20)
}
dag = DAG(dag_id='ssh_test', schedule_interval='@hourly', default_args=default_args)
sshHook = SSHHook(ssh_conn_id='testing')
t1 = SSHOperator(
    task_id='task1',
    command='echo Hello World',
    ssh_hook=sshHook,
    dag=dag)
When I manually trigger it in the UI, the DAG shows a status of running, but the operator stays white with no status.
I'm wondering why my task isn't queuing. Does anyone have any ideas? My airflow.cfg is the default, if that is useful information.
Even this isn't running
dag = DAG(dag_id='test', start_date=datetime(2018, 6, 21), schedule_interval='0 0 * * *')
runMe = DummyOperator(task_id='testest', dag=dag)
Make sure you've started the Airflow Scheduler in addition to the Airflow Web Server:
airflow scheduler
check if airflow scheduler is running
check if airflow webserver is running
check if all DAGs are set to On in the web UI (a DAG can also be unpaused from the CLI, as shown below)
check if the DAGs have a start date which is in the past
check if the DAGs have a proper schedule_interval (the next scheduled date is shown in the web UI)
check if the DAG has the proper pool and queue.
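For example, to unpause a DAG from the CLI (dag_id taken from the question above):
airflow unpause ssh_test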
Before asking this I looked through the docs and at Difference between "airflow run" and "airflow test" in Airflow to see if I could figure out why I am having this problem.
I've got a few DAGs, all of which use the LocalExecutor. Two of them use an SSHOperator and the other just runs locally. I've tried airflow run <dag_id> <task_id> <execution_date> and airflow trigger_dag <dag_id>, all of which fail. However, when I run airflow test <dag_id> <task_id> <execution_date> it works. I should emphasise that this is also the case for the DAG that only has a locally run task.
There seems to be a lot of confusion around the start_date in DAGs, as well as how it relates to schedule_interval. All of my DAGs have a static start_date set to a time in the near past, and just for sanity I have set schedule_interval='* * * * *' so that it runs every minute (it's a lightweight task). When running the DAG, the task just runs and fails if retries is set to 0, or it gets stuck in a retry state if retries > 0, every minute, without much feedback at all. All I ever get in the task instance details, in each case, is:
Task instance's dagrun did not exist: Unknown reason.
or
Task is not ready for retry yet but will be retried automatically. Current date is 2019-02-20T12:30:35.381668+00:00 and task will be retried at 2019-02-20T12:31:21.492310+00:00.
There are no logs in the UI, despite my specifying in the config file where they should go, and despite the task instance details page telling me where they are.
Here is an example of one of the dags:
import json
import re
from os.path import expanduser
from datetime import datetime, timedelta
from airflow import DAG
from airflow.utils import timezone
from airflow.contrib.hooks.ssh_hook import SSHHook
from airflow.contrib.operators.ssh_operator import SSHOperator
from bw_config_tools.connect.bw_config import ConfigDbClient
CONFIG_DB_INFO = '/etc/airflow/config_db_info.json'
START_SCRIPT = '/bin/start.sh'
TIME_IN_PAST = timezone.convert_to_utc(datetime(2019, 2, 14, 15, 00))
DEFAULT_ARGS = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': TIME_IN_PAST,
    'email': ['example@domain.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=1),
}
def _extract_instance_id(instance_string):
    return re.findall(r'\d+', instance_string)[0]

def _read_file_as_json(file_name):
    with open(file_name) as open_file:
        return json.load(open_file)

DB_INFO = _read_file_as_json(CONFIG_DB_INFO)
CONFIG_CLIENT = ConfigDbClient(**DB_INFO)
print('Config DB client: {0}'.format(CONFIG_CLIENT))
APP_DIRS = CONFIG_CLIENT.get_values('%solr-mentions-cleanup.[0-9]+.dir%', strictness='similar')
INSTANCE_START_SCRIPT_PATHS = {
    _extract_instance_id(instance_string): directory + START_SCRIPT
    for instance_string, directory in APP_DIRS.items()
}
# Create an ssh hook which refers to pre-existing connection information
# setup and stored by airflow
SSH_HOOK = SSHHook(ssh_conn_id='solr-mentions-cleanups', key_file='/home/airflow/.ssh/id_rsa')
# Create a DAG object to add tasks to
DAG = DAG('solr-mentions-cleanups',
          default_args=DEFAULT_ARGS,
          schedule_interval='* * * * *'
          )
DAG.catchup = False

# Create a task for each solr-mentions-cleanup instance.
for instance_id, start_script in INSTANCE_START_SCRIPT_PATHS.items():
    task = SSHOperator(
        task_id='run-solr-mentions-cleanups-{0}'.format(instance_id),
        command='bash {0} disabled-queries --delete'.format(start_script),
        ssh_hook=SSH_HOOK,
        dag=DAG)
and here is the output from running the task:
(venv) airflow#some_host ~ # airflow run solr-mentions-cleanups run-solr-mentions-cleanups-0 2019-02-14
[2019-02-20 12:38:51,313] {settings.py:146} DEBUG - Setting up DB connection pool (PID 16375)
[2019-02-20 12:38:51,313] {settings.py:174} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600, pid=16375
[2019-02-20 12:38:51,491] {__init__.py:42} DEBUG - Cannot import due to doesn't look like a module path
[2019-02-20 12:38:51,645] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-02-20 12:38:51,654] {cli_action_loggers.py:40} DEBUG - Adding <function default_action_log at 0x7f0364fdc8c8> to pre execution callback
[2019-02-20 12:38:51,930] {cli_action_loggers.py:64} DEBUG - Calling callbacks: [<function default_action_log at 0x7f0364fdc8c8>]
[2019-02-20 12:38:51,974] {settings.py:146} DEBUG - Setting up DB connection pool (PID 16375)
[2019-02-20 12:38:51,974] {settings.py:154} DEBUG - settings.configure_orm(): Using NullPool
[2019-02-20 12:38:51,976] {models.py:273} INFO - Filling up the DagBag from /etc/airflow/dags
[2019-02-20 12:38:51,978] {models.py:360} INFO - File /etc/airflow/dags/__init__.py assumed to contain no DAGs. Skipping.
[2019-02-20 12:38:51,978] {models.py:363} DEBUG - Importing /etc/airflow/dags/hbase-exports.py
[2019-02-20 12:38:51,983] {models.py:501} DEBUG - Loaded DAG <DAG: hbase-daily-export>
[2019-02-20 12:38:51,984] {models.py:363} DEBUG - Importing /etc/airflow/dags/test_dag.py
[2019-02-20 12:38:51,985] {models.py:501} DEBUG - Loaded DAG <DAG: test_dag>
[2019-02-20 12:38:51,986] {models.py:363} DEBUG - Importing /etc/airflow/dags/solr_mentions_cleanup.py
Creating dag
Config DB client: <bw_config_tools.connect.bw_config.ConfigDbClient object at 0x7f032739b4e0>
The key file given is /home/airflow/.ssh/id_rsa
[2019-02-20 12:38:52,196] {base_hook.py:83} INFO - Using connection to: id: solr-mentions-cleanups. Host: some_host, Port: None, Schema: None, Login: some_user, Password: None, extra: {}
extra connection info given:
Key file in extra options: None
SSH config file being used is /home/airflow/.ssh/config
[2019-02-20 12:38:52,198] {models.py:501} DEBUG - Loaded DAG <DAG: solr-mentions-cleanups>
[2019-02-20 12:38:52,251] {cli.py:520} INFO - Running <TaskInstance: solr-mentions-cleanups.run-solr-mentions-cleanups-0 2019-02-14T00:00:00+00:00 [success]> on host xxxxxx.net
[2019-02-20 12:38:54,026] {settings.py:146} DEBUG - Setting up DB connection pool (PID 16453)
[2019-02-20 12:38:54,027] {settings.py:174} INFO - settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=3600, pid=16453
[2019-02-20 12:38:54,207] {__init__.py:42} DEBUG - Cannot import due to doesn't look like a module path
[2019-02-20 12:38:54,362] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-02-20 12:38:54,371] {cli_action_loggers.py:40} DEBUG - Adding <function default_action_log at 0x7f4345cfa8c8> to pre execution callback
[2019-02-20 12:38:54,622] {cli_action_loggers.py:64} DEBUG - Calling callbacks: [<function default_action_log at 0x7f4345cfa8c8>]
[2019-02-20 12:38:54,658] {settings.py:146} DEBUG - Setting up DB connection pool (PID 16453)
[2019-02-20 12:38:54,658] {settings.py:154} DEBUG - settings.configure_orm(): Using NullPool
[2019-02-20 12:38:54,660] {models.py:273} INFO - Filling up the DagBag from /etc/airflow/dags/solr_mentions_cleanup.py
[2019-02-20 12:38:54,662] {models.py:363} DEBUG - Importing /etc/airflow/dags/solr_mentions_cleanup.py
Config DB client: <bw_config_tools.connect.bw_config.ConfigDbClient object at 0x7f4308b5dc50>
The key file given is /home/airflow/.ssh/id_rsa
[2019-02-20 12:38:54,909] {base_hook.py:83} INFO - Using connection to: id: solr-mentions-cleanups. Host: some_host, Port: None, Schema: None, Login: some_user, Password: None, extra: {}
extra connection info given:
Key file in extra options: None
SSH config file being used is /home/airflow/.ssh/config
[2019-02-20 12:38:54,912] {models.py:501} DEBUG - Loaded DAG <DAG: solr-mentions-cleanups>
[2019-02-20 12:38:54,961] {cli.py:520} INFO - Running <TaskInstance: solr-mentions-cleanups.run-solr-mentions-cleanups-0 2019-02-14T00:00:00+00:00 [success]> on host xxxx.net
[2019-02-20 12:38:55,054] {cli_action_loggers.py:81} DEBUG - Calling callbacks: []
[2019-02-20 12:38:55,054] {settings.py:201} DEBUG - Disposing DB connection pool (PID 16453)
[2019-02-20 12:38:56,310] {cli_action_loggers.py:81} DEBUG - Calling callbacks: []
[2019-02-20 12:38:56,313] {settings.py:201} DEBUG - Disposing DB connection pool (PID 16375)
Furthermore, this used to work until I had to restart everything on the server it was running on, so I'm quite confident that this is close to the version that once worked.
Any ideas what I am doing wrong?
Also, FYI, it is connected to a Postgres database running on another server. I've confirmed that the state definitely updates, so there are no connection issues there.