I'm learning Airflow and just want to get up and running with the Quickstart: https://airflow.apache.org/docs/apache-airflow/stable/start.html
I'm not sure if this is a virtual environment issue or something obvious I'm missing with Airflow. This may be a duplicate of this question from 2017: Running Airflow task from the command line does not work, but there were no answers there.
My OS is Pop!_OS (Debian-based).
I have created a new virtual environment and installed Airflow by running the script provided:
# Airflow needs a home. `~/airflow` is the default, but you can put it
# somewhere else if you prefer (optional)
export AIRFLOW_HOME=~/airflow
# Install Airflow using the constraints file
AIRFLOW_VERSION=2.5.0
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
# For example: 3.7
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.5.0/constraints-3.7.txt
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
# The Standalone command will initialise the database, make a user,
# and start all components for you.
airflow standalone
# Visit localhost:8080 in the browser and use the admin account details
# shown on the terminal to login.
# Enable the example_bash_operator dag in the home page
I was expecting this to be plug-and-play and I haven't even reached the tutorials yet.
airflow standalone works and I can run the DAG from the web UI. However, if I run
# run your first task instance
airflow tasks run example_bash_operator runme_0 2015-01-01
from the CLI, I get
airflow.exceptions.DagRunNotFound: DagRun for example_bash_operator with run_id or execution_date of '2015-01-01' not found
full error:
(airflow) jasonstewartnz@pop-os:~$ airflow tasks run example_bash_operator runme_0 2015-01-01
[2022-12-22 11:11:18,776] {dagbag.py:538} INFO - Filling up the DagBag from /home/jasonstewartnz/airflow/dags
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): create_entry_group>, delete_entry_group already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): delete_entry_group>, create_entry_group already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): create_entry_gcs>, delete_entry already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): delete_entry>, create_entry_gcs already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): create_tag>, delete_tag already registered for DAG: example_complex
[2022-12-22 11:11:18,820] {taskmixin.py:205} WARNING - Dependency <Task(BashOperator): delete_tag>, create_tag already registered for DAG: example_complex
[2022-12-22 11:11:18,832] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
[2022-12-22 11:11:18,832] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
[2022-12-22 11:11:18,833] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/models/dag.py:3492 RemovedInAirflow3Warning: Param `schedule_interval` is deprecated and will be removed in a future release. Please use `schedule` instead.
[2022-12-22 11:11:18,914] {taskmixin.py:205} WARNING - Dependency <Task(_PythonDecoratedOperator): prepare_email>, send_email already registered for DAG: example_dag_decorator
[2022-12-22 11:11:18,914] {taskmixin.py:205} WARNING - Dependency <Task(EmailOperator): send_email>, prepare_email already registered for DAG: example_dag_decorator
Traceback (most recent call last):
File "/home/jasonstewartnz/.venv/airflow/bin/airflow", line 8, in <module>
sys.exit(main())
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/__main__.py", line 39, in main
args.func(args)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/cli/cli_parser.py", line 52, in command
return func(*args, **kwargs)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/utils/cli.py", line 108, in wrapper
return f(*args, **kwargs)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/cli/commands/task_command.py", line 384, in task_run
ti, _ = _get_ti(task, args.map_index, exec_date_or_run_id=args.execution_date_or_run_id, pool=args.pool)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/utils/session.py", line 75, in wrapper
return func(*args, session=session, **kwargs)
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/cli/commands/task_command.py", line 159, in _get_ti
dag_run, dr_created = _get_dag_run(
File "/home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/cli/commands/task_command.py", line 115, in _get_dag_run
raise DagRunNotFound(
airflow.exceptions.DagRunNotFound: DagRun for example_bash_operator with run_id or execution_date of '2015-01-01' not found
The web UI tells me my config is /home/jasonstewartnz/airflow/airflow.cfg
The dags_folder in this config, /home/jasonstewartnz/airflow/dags, is empty.
When I go to
http://localhost:8080/dags/example_bash_operator/details
I see that the fileloc attribute for the dag is:
fileloc /home/jasonstewartnz/.venv/airflow/lib/python3.10/site-packages/airflow/example_dags/example_bash_operator.py
Even if I copy this file to the DAGs directory, change the DAGs directory in the config to the above, or add it to the path, the CLI still seems unable to find the DAG.
Did you enable the DAG 'example_bash_operator' in the UI, as the instructions specify? I am referring to this step in the guide's instructions:
# Visit localhost:8080 in the browser and use the admin account details
# shown on the terminal to login.
# Enable the example_bash_operator dag in the home page
This should be done before you attempt to execute the airflow tasks run command.
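If you prefer to stay in the CLI, a rough equivalent of those steps is sketched below, assuming the stock 2.5 quickstart setup and that airflow standalone is still running in another terminal. The backfill step comes from the quickstart itself and is what actually creates a DagRun for 2015-01-01, which is what the DagRunNotFound error is complaining about:
# enable (unpause) the example DAG without going through the UI
airflow dags unpause example_bash_operator
# create DagRuns for the target dates (this is the quickstart's own backfill step)
airflow dags backfill example_bash_operator \
    --start-date 2015-01-01 \
    --end-date 2015-01-02
# the task-level command can now find a matching DagRun
airflow tasks run example_bash_operator runme_0 2015-01-01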
Related
Given the JSON below:
{ "Model" : "level1" }
what is the right combination of message_filtering_match_values and message_filtering_config values? I tried the below, but it fails:
model_operator = SQSSensor(
    task_id='model_operator',
    dag=dag,
    sqs_queue='https://sqs.somewhere/somequeue.fifo',
    aws_conn_id='aws_default',
    message_filtering='jsonpath',
    message_filtering_config='Model[*]',
    message_filtering_match_values=['level1'],
    mode='reschedule')
Error message is:
Broken DAG: [/usr/local/airflow/dags/test_dag.py] Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/airflow/utils/decorators.py", line 94, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 414, in __init__
"arguments were:\n**kwargs: {k}".format(c=self.__class__.__name__, k=kwargs, t=task_id),
airflow.exceptions.AirflowException: Invalid arguments were passed to SQSSensor (task_id: model_operator). Invalid arguments were:
**kwargs: {'message_filtering': 'jsonpath', 'message_filtering_config': 'Model[*]', 'message_filtering_match_values': ['level1']}
The message_filtering / message_filtering_config / message_filtering_match_values parameters were added recently in a PR and released in Amazon provider version 2.2.0.
From the traceback we can see that these parameters are not recognized by the operator, which means that you are running an older version of the Amazon provider.
You should upgrade the Amazon provider to the latest version.
pip install apache-airflow-providers-amazon --upgrade
It's also recommended to read the documentation about constraint files.
You didn't mention which Airflow version you are running, nor which version of the Amazon provider, so be sure to read the change logs if you are upgrading across a major version.
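As a minimal sketch (assuming you are on Airflow 2.x and manage packages with pip), you can first confirm which provider version is installed, then upgrade it:
# check the currently installed Amazon provider version
pip show apache-airflow-providers-amazon
# upgrade to at least 2.2.0, where the message_filtering_* parameters were added
pip install --upgrade "apache-airflow-providers-amazon>=2.2.0"
If you pin your environment with a constraints file, upgrade against the constraints matching your Airflow and Python versions, as recommended above.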
Using Airflow 1.10.3, our DAGs run once every day. But sometimes, if a run spills over, it results in a collision of tasks, leading to failure.
Even though the tasks belong to different DAG runs, since the task name is the same, it results in failure.
[2021-07-27 11:49:09,642] {logging_mixin.py:95} INFO - [<TaskInstance: dag1.part1.task1 2021-07-24 05:30:00+00:00 [running]>, <TaskInstance: dag1.part1.task1 2021-07-26 05:30:00+00:00 [running]>]
[2021-07-27 11:49:09,642] {logging_mixin.py:95} INFO - [2021-07-27 11:49:09,642] {airflow_utils.py:114} INFO - Found existing tasks running with state RUNNING - [<TaskInstance: dag1.part1.task1 2021-07-24 05:30:00+00:00 [running]>, <TaskInstance: dag1.part1.task1 2021-07-26 05:30:00+00:00 [running]>]. Therefore skipping this task instance
[2021-07-27 11:49:09,644] {__init__.py:1580} ERROR - Found existing tasks running with state RUNNING - [<TaskInstance: dag1.part1.task1 2021-07-24 05:30:00+00:00 [running]>, <TaskInstance: dag1.part1.task1 2021-07-26 05:30:00+00:00 [running]>]. Therefore skipping this task instance
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/airflow/models/__init__.py", line 1441, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 73, in wrapper
return func(*args, **kwargs)
File "/home/ubuntu/airflow/dags/utils/airflow_utils.py", line 115, in execute
raise RuntimeError(error)
RuntimeError: Found existing tasks running with state RUNNING - [<TaskInstance: dag1.part1.task1 2021-07-24 05:30:00+00:00 [running]>, <TaskInstance: dag1.part1.task1 2021-07-26 05:30:00+00:00 [running]>]. Therefore skipping this task instance
Any idea how to fix this? Thanks in advance.
Based on this entry in the log:
ERROR - Found existing tasks running with state RUNNING ... Therefore skipping this task instance
I suspect you have max_active_runs or max_active_runs_per_dag set to 1. If you need or expect multiple runs of the DAG to execute concurrently, you can increase that number accordingly.
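For illustration only (the right value depends on how much overlap your environment can tolerate), the limit can be raised either globally or per DAG; in 1.10.x the global setting lives under [core] in airflow.cfg:
# raise the global default, either in airflow.cfg ([core] max_active_runs_per_dag = 3)
# or via the equivalent environment variable
export AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG=3
# alternatively, set max_active_runs=3 on the DAG() object itself in the DAG file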
I'm running Apache Airflow 2.x locally, using the Docker Compose file that is provided in the documentation. In the .\dags directory on my local filesystem (which is mounted into the Airflow containers), I create a new Python script file, and implement the DAG using the TaskFlow API.
The changes to my DAG are sometimes invalid. For example, maybe I have an ImportError due to an invalid module name, or a syntax error. When Airflow attempts to import the DAG, I cannot find any log messages from the web server, scheduler, or worker that would indicate a problem, or what the specific problem is.
Instead, I have to read through my code line-by-line, and look for a problem. This problem is compounded by the fact that my local Python environment on Windows 10, and the Python environment for Airflow, are different versions and have different Python packages installed. Hence, I cannot reliably use my local development environment to detect package import failures, because the packages I expect to be installed in the Airflow environment are different than the ones I have locally. Additionally, the version of Python I'm using to write code locally, and the Python version being used by Airflow, are not matched up.
Thus, I need some kind of error logging to indicate that a DAG import failed.
Question: When a DAG fails to update / import, where are the logs to indicate if an import failure occurred, and what the exact error message was?
Currently, the DAG parsing logs would be under $AIRFLOW_HOME/logs/scheduler/EXECUTION_DATE/DAG_FILE.py.log
Example:
Let's say my DAG file is example-dag.py, which has the following contents; as you can see, there is a typo in the datetime import:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import dattime  # <-- this line has a typo

dag = DAG(
    dag_id='example_Dag',
    schedule_interval=None,
    start_date=datetime(2019, 2, 6),
)

t1 = BashOperator(
    task_id='print_date1',
    bash_command='sleep $[ ( $RANDOM % 30 ) + 1 ]s',
    dag=dag)
Now, check the logs under $AIRFLOW_HOME/logs/scheduler/2021-04-07/example-dag.py.log, where $AIRFLOW_HOME/logs is what I have set in $AIRFLOW__LOGGING__BASE_LOG_FOLDER, or [logging] base_log_folder in airflow.cfg (https://airflow.apache.org/docs/apache-airflow/2.0.1/configurations-ref.html#base-log-folder).
That file should have logs as below:
[2021-04-07 21:39:02,222] {scheduler_job.py:182} INFO - Started process (PID=686) to work on /files/dags/example-dag.py
[2021-04-07 21:39:02,230] {scheduler_job.py:633} INFO - Processing file /files/dags/example-dag.py for tasks to queue
[2021-04-07 21:39:02,233] {logging_mixin.py:104} INFO - [2021-04-07 21:39:02,233] {dagbag.py:451} INFO - Filling up the DagBag from /files/dags/example-dag.py
[2021-04-07 21:39:02,368] {logging_mixin.py:104} INFO - [2021-04-07 21:39:02,357] {dagbag.py:308} ERROR - Failed to import: /files/dags/example-dag.py
Traceback (most recent call last):
File "/opt/airflow/airflow/models/dagbag.py", line 305, in _load_modules_from_file
loader.exec_module(new_module)
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/files/dags/example-dag.py", line 3, in <module>
from datetime import dattime
ImportError: cannot import name 'dattime'
[2021-04-07 21:39:02,380] {scheduler_job.py:645} WARNING - No viable dags retrieved from /files/dags/example-dag.py
[2021-04-07 21:39:02,407] {scheduler_job.py:190} INFO - Processing /files/dags/example-dag.py took 0.189 seconds
You will also see the import error displayed in the Webserver UI.
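Two other ways to surface the same traceback, sketched under the assumption that you are using the official docker-compose file, where the DAGs folder is mounted at /opt/airflow/dags and the scheduler service is named airflow-scheduler (adjust the names if yours differ):
# list current DAG import errors (this subcommand is available in recent Airflow 2.x releases)
docker compose exec airflow-scheduler airflow dags list-import-errors
# or import the file with the container's own Python to get the full traceback immediately
docker compose exec airflow-scheduler python /opt/airflow/dags/example-dag.py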
I'm having an issue doing a backfill. When I run this on the command line
airflow backfill my_dag -s 2021-01-01 -e 2021-01-12
the DAG is triggered and begins running. The first task (a simple Python script with no external dependencies) completes, but in the second task I receive this error about the fernet key. The DAG is just calling an MsSqlOperator.
cryptography.fernet.InvalidToken
If I trigger the dag manually in the UI, all the steps succeed.
If I trigger one execution of the DAG from the CLI, the DAG succeeds.
airflow dags trigger -e '2021-01-19T04:00:00' my_dag
The fernet key is in the config file and we've already run resetdb and re-created the connections. The same issue persists: the backfill command doesn't work, but other methods do.
I also tried using the --local flag (not sure what this does), but it doesn't work either.
Any ideas how to troubleshoot?
Running Airflow 1.10.14 on-prem with LocalExecutor. Edit: the issue exists when using 1.10.15 too.
Backfill Doc for Airflow 1.10.14
Logs:
Note: it says something about a missing variable, but that's misleading, since it works if I manually run it from the UI. The key line is
ERROR - Can't decrypt _val for key=xyz_users_overlap_import, FERNET_KEY configuration missing
Full log:
INFO - Job 4303: Subtask download_user_files
[2021-05-17 23:41:29,327] {{logging_mixin.py:120}} INFO - Running <TaskInstance: xyz_users_overlap_import.download_user_files 2021-01-22T15:10:00+00:00 [running]> on host da2m.mycorp.corp
[2021-05-17 23:41:29,360] {{variable.py:58}} ERROR - Can't decrypt _val for key=xyz_users_overlap_import, FERNET_KEY configuration missing
[2021-05-17 23:41:29,361] {{taskinstance.py:1150}} ERROR - 'Variable xyz_users_overlap_import does not exist'
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 965, in _run_raw_task
self.render_templates(context=context)
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1424, in render_templates
self.task.render_template_fields(context)
File "/usr/local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 719, in render_template_fields
self._do_render_template_fields(self, self.template_fields, context, jinja_env, set())
File "/usr/local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 726, in _do_render_template_fields
rendered_content = self.render_template(content, context, jinja_env, seen_oids)
File "/usr/local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 755, in render_template
return jinja_env.from_string(content).render(**context)
File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 1090, in render
self.environment.handle_exception()
File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 832, in handle_exception
reraise(*rewrite_traceback_stack(source=source))
File "/usr/local/lib/python3.8/site-packages/jinja2/_compat.py", line 28, in reraise
raise value.with_traceback(tb)
File "<template>", line 1, in top-level template code
File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 471, in getattr
return getattr(obj, attribute)
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1335, in __getattr__
self.var = Variable.get(item, deserialize_json=True)
File "/usr/local/lib/python3.8/site-packages/airflow/models/variable.py", line 117, in get
raise KeyError('Variable {} does not exist'.format(key))
KeyError: 'Variable xyz_users_overlap_import does not exist'
Have you tried generating a new FERNET_KEY and setting it as an environment variable? (This way you will override the value from the airflow.cfg file.)
export AIRFLOW__CORE__FERNET_KEY=your_fernet_key
The code to generate a new one is:
from cryptography.fernet import Fernet
fernet_key = Fernet.generate_key()
print(fernet_key.decode())
Do not forget to install the cryptography package first (pip install cryptography): https://pypi.org/project/cryptography/
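As a sketch, the two steps can be combined into a single shell command; note that the variable is only visible to processes started from the same environment, so whatever runs the backfill (and the scheduler/workers) must see it too:
# generate a fresh fernet key with the cryptography package and export it for this shell session
export AIRFLOW__CORE__FERNET_KEY="$(python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())')"
Keep in mind that a new key cannot decrypt values encrypted with the old one, so connections and variables have to be re-created afterwards.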
I have created a virtualenv for Python 3 using:
virtualenv -p $(which python3) ENV
Then activated the environment:
source /Users/myusername/ENV/bin/activate
Installed apache-airflow:
pip install apache-airflow
then which airflow yields /Users/myusername/ENV/bin/airflow
But when I try to initdb using:
airflow initdb
I get the error below:
{db.py:350} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
WARNI [airflow.utils.log.logging_mixin.LoggingMixin] cryptography not found - values will not be stored encrypted.
ERROR [airflow.models.DagBag] Failed to import: /Library/Python/2.7/site-packages/airflow/example_dags/example_http_operator.py
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/airflow/models/__init__.py", line 413, in process_file
m = imp.load_source(mod_name, filepath)
File "/Library/Python/2.7/site-packages/airflow/example_dags/example_http_operator.py", line 27, in <module>
from airflow.operators.http_operator import SimpleHttpOperator
File "/Library/Python/2.7/site-packages/airflow/operators/http_operator.py", line 21, in <module>
from airflow.hooks.http_hook import HttpHook
File "/Library/Python/2.7/site-packages/airflow/hooks/http_hook.py", line 23, in <module>
import tenacity
File "/Library/Python/2.7/site-packages/tenacity/__init__.py", line 375, in <module>
from tenacity.tornadoweb import TornadoRetrying
File "/Library/Python/2.7/site-packages/tenacity/tornadoweb.py", line 24, in <module>
from tornado import gen
File "/Library/Python/2.7/site-packages/tornado-6.0.3-py2.7-macosx-10.14-intel.egg/tornado/gen.py", line 126
def _value_from_stopiteration(e: Union[StopIteration, "Return"]) -> Any:
^
SyntaxError: invalid syntax
Done.
(ENV) ---------------------------------------------------------
It seems like the example scripts are being run with Python 2.7, and it can't recognize the function definition syntax.
Does the apache-airflow package need to be fixed in the next release, or can I do something to fix this?
I tried fixing this: use Python 2.7 instead of Python 3, i.e. install Airflow on the default Python 2.7 that comes with the Mac, but this throws other errors, e.g. the package "six" is not compatible.
You need to turn off loading of the example DAGs in the config file to solve this problem.
Anyway, it seems weird that Airflow uses Python 2.7 when you said it is installed into a Python 3 virtual environment.
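As a minimal sketch, assuming you want to keep your current install, example-DAG loading can be switched off either in airflow.cfg ([core] load_examples = False) or via the environment variable, before re-running the initialisation:
# disable loading of the bundled example DAGs, then initialise the database again
export AIRFLOW__CORE__LOAD_EXAMPLES=False
airflow initdb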