I'm taking advantage of the fact that Airflow v1.7.1.3 provides access to airflow.cfg, so I can place some configuration values there rather than embedding them in the code. We added the following as the first lines of the airflow.cfg file:
[foo]
bar = foo
      bar
In foobarDAG.py, the file defining the DAG, I do the following:
from airflow.configuration import conf
…

def fooBar(foobar_list, **kwargs):
    # receives foobarList via op_args, plus the task context via provide_context
    pass

foobarList = conf['foo']['bar'].split('\n')

foobarOperator = PythonOperator(
    task_id='fooBar',
    provide_context=True,
    python_callable=fooBar,
    op_args=[foobarList],
    dag=dag)
Testing this manually from the Python prompt is easy:
>>> from foobarDAG import foobarList
…
>>> foobarList
['foo', 'bar']
That's just what I would expect from the information in airflow.cfg, above.
We've also performed a test on the DAG directly:
airflow test foobarDAG fooBar 10-19-2016
That doesn't report any problems.
The problem crops up when we try to use the scheduler to schedule that one DAG:
airflow scheduler -d foobarDAG >& foobar_log.txt
In the web UI, we see the following at the top of the "DAGS" section:
Broken DAG: [/path/to/…/foobarDAG.py] 'foo'
And in foobar_log.txt, here is the error message:
[2016-10-19 14:56:09,028] {models.py:250} ERROR - Failed to import: /path/to/foobarDAG.py
Traceback (most recent call last):
File "/path/to/airflow/models.py", line 247, in process_file
m = imp.load_source(mod_name, filepath)
File "/path/to/anaconda3/envs/foobarenv/lib/python3.5/imp.py", line 172, in load_source
module = _load(spec)
File "<frozen importlib._bootstrap>", line 693, in _load
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 662, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "/path/to/foobarDAG.py", line 67, in <module>
foobarList = conf['foo']['bar'].split('\n')
File "/path/to/anaconda3/envs/foobarenv/lib/python3.5/configparser.py", line 956, in __getitem__
raise KeyError(key)
KeyError: 'foo'
So oddly it appears that the scheduler isn't retrieving the ['foo'] section from airflow.cfg and providing it to the DAG. Any idea why?
It turns out that everything was working properly, but the scheduler hadn't been restarted. The scheduler was apparently still using the old airflow.cfg which did not have the added section.
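In hindsight, a defensive check in the DAG file would have surfaced the stale configuration right away. A minimal sketch, assuming conf behaves like a standard ConfigParser (which it does in 1.7.x):

from airflow.configuration import conf

# Fail fast with an explicit message if the custom [foo] section is missing,
# e.g. because a long-running scheduler is still reading an older airflow.cfg.
if not conf.has_section('foo'):
    raise ValueError("airflow.cfg has no [foo] section; "
                     "restart the scheduler/webserver after editing the config")

foobarList = conf.get('foo', 'bar').split('\n')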
Related
I recently created a custom timetable. It worked perfectly locally (python==3.9.12, airflow==2.3.0), so I decided to upload it to the plugins folder in my Cloud Composer environment (version==1.18.11, airflow==2.2.5). While the scheduler picks up the timetable and the DAG runs based on it, trying to open the DAG in the UI throws this error window:
Something bad has happened.
Airflow is used by many users, and it is very likely that others had similar problems and you can easily find
a solution to your problem.
Consider following these steps:
* gather the relevant information (detailed logs with errors, reproduction steps, details of your deployment)
* find similar issues using:
* GitHub Discussions
* GitHub Issues
* Stack Overflow
* the usual search engine you use on a daily basis
* if you run Airflow on a Managed Service, consider opening an issue using the service support channels
* if you tried and have difficulty with diagnosing and fixing the problem yourself, consider creating a bug report.
Make sure however, to include all relevant details and results of your investigation so far.
Python version: 3.8.12
Airflow version: 2.2.5+composer
Node: 67b211ed8faa
-------------------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/python3.8/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/www/auth.py", line 51, in decorated
return func(*args, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/www/decorators.py", line 108, in view_func
return f(*args, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/www/decorators.py", line 71, in wrapper
return f(*args, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/www/views.py", line 2328, in tree
dag = current_app.dag_bag.get_dag(dag_id)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py", line 186, in get_dag
self._add_dag_from_db(dag_id=dag_id, session=session)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py", line 261, in _add_dag_from_db
dag = row.dag
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/serialized_dag.py", line 180, in dag
dag = SerializedDAG.from_dict(self.data) # type: Any
File "/opt/python3.8/lib/python3.8/site-packages/airflow/serialization/serialized_objects.py", line 951, in from_dict
return cls.deserialize_dag(serialized_obj['dag'])
File "/opt/python3.8/lib/python3.8/site-packages/airflow/serialization/serialized_objects.py", line 877, in deserialize_dag
v = _decode_timetable(v)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/serialization/serialized_objects.py", line 167, in _decode_timetable
raise _TimetableNotRegistered(importable_string)
airflow.serialization.serialized_objects._TimetableNotRegistered: Timetable class '<enter_your_timetable_plugin_name>.<enter_your_timetable_class_name>' is not registered
Going to the Plugins page shows that no plugins are added (this is also the case on Cloud Composer==2.0.15, airflow==2.2.5), while my local setup loads the plugin properly.
What's really interesting is that, despite running the same Airflow version, the two Cloud Composer versions behave differently.
I don't override any of the default Airflow variables, nor should that impact anything described here.
Many thanks for any suggestions.
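For reference, the shape of the registration I'm relying on looks roughly like this; the class and plugin names below are placeholders and the timetable logic is stubbed out, so this is only a sketch of the wiring, not my actual code:

# plugins/my_timetable_plugin.py  (hypothetical names, stubbed logic)
from typing import Optional

from pendulum import DateTime

from airflow.plugins_manager import AirflowPlugin
from airflow.timetables.base import DagRunInfo, DataInterval, TimeRestriction, Timetable


class MyCustomTimetable(Timetable):
    """Stand-in for the real custom timetable class."""

    def infer_manual_data_interval(self, *, run_after: DateTime) -> DataInterval:
        # Manual runs simply use the trigger time as an exact interval here.
        return DataInterval.exact(run_after)

    def next_dagrun_info(
        self,
        *,
        last_automated_data_interval: Optional[DataInterval],
        restriction: TimeRestriction,
    ) -> Optional[DagRunInfo]:
        # Stub: never schedule automatically.
        return None


class MyTimetablePlugin(AirflowPlugin):
    name = "my_timetable_plugin"
    # The webserver resolves the serialized timetable class through this list;
    # if the plugin isn't loaded there, it raises _TimetableNotRegistered.
    timetables = [MyCustomTimetable]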
I'm having an issue loading a file from Dagster code (setup code, not pipelines). Say I have the following project structure:
pipelines/
  app/
    environments/
      schedules.yaml
    repository.py
    repository.yaml
When I run dagit from inside the project folder ($ cd pipelines && dagit -y app/repository.yaml), that folder becomes the working directory, and inside repository.py I can load a file knowing the root is pipelines:
# repository.py
with open('app/environments/schedules.yaml', 'r') as f:
    pass  # do something with the file
However, if I set up a schedule, the pipelines in the project do not run. Checking the cron logs, it seems the open line throws a FileNotFoundError. I was wondering if this happens because the working directory is different when the cron job executes.
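One way I could check that hypothesis is to log the working directory from repository.py itself; purely a diagnostic sketch:

import os

# Printed at import time, so it shows up both under dagit and in the cron logs,
# making it easy to compare the two working directories.
print("repository.py imported with cwd =", os.getcwd())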
For context, I'm loading a config file with parameters of cron_schedules for each pipeline. Also, here's the tail of the stacktrace in my case:
File "/home/user/.local/share/virtualenvs/pipelines-mfP13m0c/lib/python3.8/site-packages/dagster/core/definitions/handle.py", line 190, in from_yaml
return LoaderEntrypoint.from_file_target(
File "/home/user/.local/share/virtualenvs/pipelines-mfP13m0c/lib/python3.8/site-packages/dagster/core/definitions/handle.py", line 161, in from_file_target
module = import_module_from_path(module_name, os.path.abspath(python_file))
File "/home/user/.local/share/virtualenvs/pipelines-mfP13m0c/lib/python3.8/site-packages/dagster/seven/__init__.py", line 75, in import_module_from_path
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/user/pipelines/app/repository.py", line 28, in <module>
schedule_builder = ScheduleBuilder(settings.CRON_PRESET, settings.ENV_DICT)
File "/home/user/pipelines/app/schedules.py", line 12, in __init__
self.cron_schedules = self._load_schedules_yaml()
File "/home/user/pipelines/app/schedules.py", line 16, in _load_schedules_yaml
with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'app/environments/schedules.yaml'
You could open the file using a path anchored to the file's own location, so that it resolves correctly regardless of the working directory.
from dagster.utils import file_relative_path

with open(file_relative_path(__file__, './environments/schedules.yaml'), 'r') as f:
    pass  # do something with the file
All file_relative_path does is the following, so you can call the os.path methods directly if you prefer:
import os

def file_relative_path(dunderfile, relative_path):
    return os.path.join(os.path.dirname(dunderfile), relative_path)
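So the same thing without the helper, resolving schedules.yaml relative to repository.py instead of the current working directory, would look something like this sketch:

import os

# Build the path from this file's directory, not from wherever cron started the process.
schedules_path = os.path.join(os.path.dirname(__file__), 'environments', 'schedules.yaml')
with open(schedules_path, 'r') as f:
    pass  # do something with the file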
I am trying to configure remote logging with Azure blob.
Airflow version: 1.10.2
Python: 3.6.5
Ubuntu: 18.04
Following are the steps I did:
In $AIRFLOW_HOME/config/log_config.py, I have put REMOTE_BASE_LOG_FOLDER = 'wasb-airflow-logs' (This is a folder inside the container (container name: airflow-logs))
An empty __init__.py is in $AIRFLOW_HOME/config/
$AIRFLOW_HOME/config/ is added in $PYTHONPATH
Renamed DEFAULT_LOGGING_CONFIG to LOGGING_CONFIG everywhere in $AIRFLOW_HOME/config/log_config.py
User defined in Airflow blob connection has read/write access to REMOTE_BASE_LOG_FOLDER
In $AIRFLOW_HOME/airflow.cfg I have remote_logging = True
logging_config_class = log_config.LOGGING_CONFIG
remote_log_conn_id =
Following is the error:
Unable to load the config, contains a configuration error.
Traceback (most recent call last):
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 382, in resolve
found = getattr(found, frag)
AttributeError: module 'airflow.utils.log' has no attribute 'wasb_task_handler'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 384, in resolve
self.importer(used)
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/utils/log/wasb_task_handler.py", line 23, in <module>
from airflow.contrib.hooks.wasb_hook import WasbHook
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/contrib/hooks/wasb_hook.py", line 22, in <module>
from airflow.hooks.base_hook import BaseHook
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 28, in <module>
from airflow.models import Connection
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/models.py", line 86, in <module>
from airflow.utils.dag_processing import list_py_file_paths
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/utils/dag_processing.py", line 49, in <module>
from airflow.settings import logging_class_path
ImportError: cannot import name 'logging_class_path'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 558, in configure
handler = self.configure_handler(handlers[name])
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 708, in configure_handler
klass = self.resolve(cname)
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 391, in resolve
raise v
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 384, in resolve
self.importer(used)
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/utils/log/wasb_task_handler.py", line 23, in <module>
from airflow.contrib.hooks.wasb_hook import WasbHook
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/contrib/hooks/wasb_hook.py", line 22, in <module>
from airflow.hooks.base_hook import BaseHook
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 28, in <module>
from airflow.models import Connection
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/models.py", line 86, in <module>
from airflow.utils.dag_processing import list_py_file_paths
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/utils/dag_processing.py", line 49, in <module>
from airflow.settings import logging_class_path
ValueError: Cannot resolve 'airflow.utils.log.wasb_task_handler.WasbTaskHandler': cannot import name 'logging_class_path'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/gsingh/venv/bin/airflow", line 21, in <module>
from airflow import configuration
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/__init__.py", line 36, in <module>
from airflow import settings, configuration as conf
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/settings.py", line 262, in <module>
logging_class_path = configure_logging()
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/logging_config.py", line 73, in configure_logging
raise e
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/logging_config.py", line 68, in configure_logging
dictConfig(logging_config)
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 795, in dictConfig
dictConfigClass(config).configure()
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 566, in configure
'%r: %s' % (name, e))
ValueError: Unable to configure handler 'processor': Cannot resolve 'airflow.utils.log.wasb_task_handler.WasbTaskHandler': cannot import name 'logging_class_path'
I am not sure which configuration I am missing. Has anyone faced the same issue?
You need to install the azure package.
pip install 'apache-airflow[azure_blob_storage,azure_data_lake,azure_cosmos,azure_container_instances]'
As per updating.md, this should now be installed with:
pip install apache-airflow[azure]
But this didn't work for me.
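Either way, a quick sanity check is to try, from the same virtualenv that launches Airflow, the import that the traceback passes through; if the Azure dependencies are missing, it fails immediately:

# If the azure extra isn't installed, this raises an ImportError for the azure libraries.
from airflow.contrib.hooks.wasb_hook import WasbHook  # noqa: F401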
sudo chown 50000:0 dags logs plugins did the trick in my case.
I tried running the official docker-compose.yml with all its containers (which depend on these three volume mounts), and also wrapping airflow standalone into a single container for debugging purposes. It turned out the volumes had been created with root ownership instead of the airflow user's.
I had the same error; however, if I scrolled up higher I could see that there was another exception thrown before the ValueError: a PermissionError.
PermissionError: [Errno 13] Permission denied: '/usr/local/airflow/logs/scheduler'
The reason I got that error is that I didn't create the initial three folders (dags, logs, plugins) before running the Airflow Docker container. Docker seems to have created them automatically, but the permissions were wrong.
Steps to fix:
Stop current container
docker-compose down --volumes --remove-orphans
Delete folders dags, logs, plugins
Just in case, destroy the images and volumes already created (in Docker Desktop)
Create folders again from command line
mkdir logs dags plugins
Run the Airflow Docker containers again
docker-compose up airflow-init
docker-compose up
I recently ran into this nasty error where Airflow's apply_defaults decorator throws the following stack trace (my **kwargs do contain job_flow_id):
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/mnt/airflow/dags/zanalytics-airflow/src/main/mysql_import/dags/mysql_import_dag.py", line 23, in <module>
sync_dag_builder.build_sync_dag()
File "/mnt/airflow/dags/zanalytics-airflow/src/main/mysql_import/dags/builders/sync_dag_builders/emr_sync_dag_builder.py", line 26, in build_sync_dag
create_emr_task, terminate_emr_task = self._create_job_flow_tasks()
File "/mnt/airflow/dags/zanalytics-airflow/src/main/mysql_import/dags/builders/sync_dag_builders/emr_sync_dag_builder.py", line 44, in _create_job_flow_tasks
task_id=GlobalConstants.EMR_TERMINATE_STEP)
File "/home/hadoop/.pyenv/versions/3.6.6/lib/python3.6/site-packages/airflow/utils/decorators.py", line 98, in wrapper
result = func(*args, **kwargs)
File "/mnt/airflow/dags/zanalytics-airflow/src/main/aws/operators/emr_terminate_ancestor_job_flows_operator.py", line 31, in __init__
EmrTerminateJobFlowOperator.__init__(self, *args, **kwargs)
File "/home/hadoop/.pyenv/versions/3.6.6/lib/python3.6/site-packages/airflow/utils/decorators.py", line 98, in wrapper
result = func(*args, **kwargs)
File "/home/hadoop/.pyenv/versions/3.6.6/lib/python3.6/site-packages/airflow/contrib/operators/emr_terminate_job_flow_operator.py", line 44, in __init__
super(EmrTerminateJobFlowOperator, self).__init__(*args, **kwargs)
File "/home/hadoop/.pyenv/versions/3.6.6/lib/python3.6/site-packages/airflow/utils/decorators.py", line 94, in wrapper
raise AirflowException(msg)
airflow.exceptions.AirflowException: Argument ['job_flow_id'] is required
The disturbing parts are:
The exception currently originates from the __init__ of the built-in EmrTerminateJobFlowOperator.
Earlier it was coming from EmrCreateJobFlowOperator, even though that operator doesn't take a job_flow_id param at all; that has since stopped happening.
Looking into decorators.py, I suspect that sig_cache might be messing things up. In fact, from the commit that introduced it, I cannot figure out how the function-signature caching is supposed to work at all (at least it doesn't appear to be working here).
I've tried deleting all __pycache__ directories and restarting the scheduler and webserver, without luck (I'm running them in separate Linux screen sessions).
What could be causing the error?
How does sig_cache work and does it need to be cleared forcefully under any circumstances? If so, how to clear it?
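For reference, my reading of decorators.py is that the required-argument check boils down to something like the following; this is a simplified sketch of my understanding, not the actual Airflow source:

import inspect


def required_arg_names(init_func):
    """Parameters of the decorated __init__ with no default value,
    excluding self, *args and **kwargs."""
    sig = inspect.signature(init_func)
    return {
        name
        for name, param in sig.parameters.items()
        if param.default is param.empty
        and name != 'self'
        and param.kind not in (param.VAR_POSITIONAL, param.VAR_KEYWORD)
    }


def check_required(init_func, kwargs):
    # Roughly what apply_defaults enforces: every required parameter must
    # arrive as a keyword argument, so anything passed positionally through
    # *args is invisible to this check.
    missing = sorted(required_arg_names(init_func) - set(kwargs))
    if missing:
        raise ValueError("Argument {} is required".format(missing))

If that reading is right, job_flow_id has to reach each decorated __init__ as an explicit keyword argument; anything tucked into *args won't satisfy the check.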
Environment
Python 3.6.6
Airflow 1.10.2
LocalExecutor
So it looks like my install of Apache Airflow on a Google Compute Engine instance broke down. Everything was working great, and then two days ago all the DAG runs started showing up stuck in a running state. I am using the LocalExecutor.
When I try to look at the log I get this error:
* Log file isn't local.
* Fetching here: http://:8793/log/collector/aa_main_combined_collector/2017-12-15T09:00:00
*** Failed to fetch log file from worker.
I didn't touch a setting anywhere. I looked through all the config files, and when I scanned the logs I saw this error:
[2017-12-16 20:08:42,558] {jobs.py:355} DagFileProcessor0 ERROR - Got an exception! Propagating...
Traceback (most recent call last):
File "/usr/local/lib/python3.4/dist-packages/airflow/jobs.py", line 347, in helper
pickle_dags)
File "/usr/local/lib/python3.4/dist-packages/airflow/utils/db.py", line 53, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/airflow/jobs.py", line 1584, in process_file
self._process_dags(dagbag, dags, ti_keys_to_schedule)
File "/usr/local/lib/python3.4/dist-packages/airflow/jobs.py", line 1173, in _process_dags
dag_run = self.create_dag_run(dag)
File "/usr/local/lib/python3.4/dist-packages/airflow/utils/db.py", line 53, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/airflow/jobs.py", line 763, in create_dag_run
last_scheduled_run = qry.scalar()
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/query.py", line 2843, in scalar
ret = self.one()
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/query.py", line 2814, in one
ret = self.one_or_none()
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/query.py", line 2784, in one_or_none
ret = list(self)
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/query.py", line 2855, in iter
return self._execute_and_instances(context)
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/query.py", line 2878, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/base.py", line 945, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
compiled_sql, distilled_params
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
context)
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/base.py", line 1405, in _handle_dbapi_exception
util.reraise(*exc_info)
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/util/compat.py", line 187, in reraise
raise value
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
context)
File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
cursor.execute(statement, parameters)
File "/usr/local/lib/python3.4/dist-packages/airflow/bin/cli.py", line 69, in sigint_handler
sys.exit(0)
SystemExit: 0
Any thoughts out there?
I solved this problem, though in doing so I discovered another one.
The long and short of it: as soon as I manually started the scheduler, everything worked again. It appears the scheduler did not get restarted correctly after a system reboot.
I have the scheduler running through systemd. The webserver .service works fine, but I notice that the scheduler .service continually restarts; there appears to be an issue there that I still need to resolve. That part aside, this problem is solved for now.
Look at the log URL and verify whether it ends with a date containing the unencoded special character +:
&execution_date=2018-02-23T08:00:00+00:00
This was fixed here.
You can replace the + with -, or URL-encode all the special characters; in my case:
&execution_date=2018-02-23T08%3A00%3A00%2B00%3A00
This happens here.
The FileTaskHandler cannot load the log from the local disk, so it tries to fetch it from the worker.
Another thing that could be causing this error is the airflow/logs folder, or the subfolders inside it, being missing or excluded.