I'm trying to delete a DAG named 'twitterQueryParse', which appears in my DAGs list:
airflow dags list
I've executed:
airflow dags delete twitterQueryParse
The confirmation prompt appears as expected, but then I get this error:
[2022-12-25 09:39:49,657] {__init__.py:42} INFO - Loaded API auth backend: airflow.api.auth.backend.session
This will drop all existing records related to the specified DAG. Proceed? (y/n)y
[2022-12-25 09:39:53,555] {delete_dag.py:46} INFO - Deleting DAG: twitterQueryParse
Traceback (most recent call last):
File "/home/rony/anaconda3/bin/airflow", line 8, in <module>
sys.exit(main())
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/__main__.py", line 39, in main
args.func(args)
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 52, in command
return func(*args, **kwargs)
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/utils/cli.py", line 103, in wrapper
return f(*args, **kwargs)
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/cli/commands/dag_command.py", line 163, in dag_delete
message = api_client.delete_dag(dag_id=args.dag_id)
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/api/client/local_client.py", line 38, in delete_dag
count = delete_dag.delete_dag(dag_id)
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 75, in wrapper
return func(*args, session=session, **kwargs)
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/api/common/delete_dag.py", line 57, in delete_dag
raise DagNotFound(f"Dag id {dag_id} not found")
airflow.exceptions.DagNotFound: Dag id twitterQueryParse not found
But when I list the DAGs again, twitterQueryParse is still on the list, even after resetting and re-initializing the Airflow DB:
airflow db reset
airflow db init
My Airflow version is 2.4.2.
How do I delete this DAG completely from the Airflow system?
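For reference, the full sequence in one place, plus an extra check I can run to see whether a file in the dags folder still defines this DAG (the path below is just the default location, not necessarily my layout):
airflow dags delete twitterQueryParse            # fails with DagNotFound
airflow db reset
airflow db init
airflow dags list | grep -i twitterqueryparse    # the DAG is still listed
# extra check: does a .py file in the dags folder still define it?
grep -rl "twitterQueryParse" "${AIRFLOW_HOME:-$HOME/airflow}/dags"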
I've recently created a custom timetable. It worked perfectly locally (python==3.9.12, airflow==2.3.0), so I decided to upload it to the plugins folder of my Cloud Composer environment (version==1.18.11, airflow==2.2.5). The scheduler picks up the timetable and the DAG runs based on it, but trying to open the DAG in the UI throws this error window:
Something bad has happened.
Airflow is used by many users, and it is very likely that others had similar problems and you can easily find
a solution to your problem.
Consider following these steps:
* gather the relevant information (detailed logs with errors, reproduction steps, details of your deployment)
* find similar issues using:
* GitHub Discussions
* GitHub Issues
* Stack Overflow
* the usual search engine you use on a daily basis
* if you run Airflow on a Managed Service, consider opening an issue using the service support channels
* if you tried and have difficulty with diagnosing and fixing the problem yourself, consider creating a bug report.
Make sure however, to include all relevant details and results of your investigation so far.
Python version: 3.8.12
Airflow version: 2.2.5+composer
Node: 67b211ed8faa
-------------------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/python3.8/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/python3.8/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/www/auth.py", line 51, in decorated
return func(*args, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/www/decorators.py", line 108, in view_func
return f(*args, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/www/decorators.py", line 71, in wrapper
return f(*args, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/www/views.py", line 2328, in tree
dag = current_app.dag_bag.get_dag(dag_id)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py", line 186, in get_dag
self._add_dag_from_db(dag_id=dag_id, session=session)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/dagbag.py", line 261, in _add_dag_from_db
dag = row.dag
File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/serialized_dag.py", line 180, in dag
dag = SerializedDAG.from_dict(self.data) # type: Any
File "/opt/python3.8/lib/python3.8/site-packages/airflow/serialization/serialized_objects.py", line 951, in from_dict
return cls.deserialize_dag(serialized_obj['dag'])
File "/opt/python3.8/lib/python3.8/site-packages/airflow/serialization/serialized_objects.py", line 877, in deserialize_dag
v = _decode_timetable(v)
File "/opt/python3.8/lib/python3.8/site-packages/airflow/serialization/serialized_objects.py", line 167, in _decode_timetable
raise _TimetableNotRegistered(importable_string)
airflow.serialization.serialized_objects._TimetableNotRegistered: Timetable class '<enter_your_timetable_plugin_name>.<enter_your_timetable_class_name>' is not registered
The Plugins page in the UI shows that no plugins are loaded (the same happens on Cloud Composer==2.0.15 with airflow==2.2.5), while my local setup loads the plugin properly.
What's really interesting is that, even though they run the same Airflow version, the two Cloud Composer versions behave differently.
I don't override any of the default Airflow variables, and that shouldn't impact anything described here anyway.
Many many thanks for any suggestions.
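For context, the registration mechanism involved here is the plugin's timetables list; below is a minimal sketch of what such a plugin file (dropped into plugins/) looks like. Class and plugin names are placeholders, not my real ones:
from airflow.plugins_manager import AirflowPlugin
from airflow.timetables.base import DataInterval, TimeRestriction, Timetable

class MyTimetable(Timetable):
    """Placeholder timetable; the real scheduling logic goes here."""

    def infer_manual_data_interval(self, *, run_after):
        return DataInterval.exact(run_after)

    def next_dagrun_info(self, *, last_automated_data_interval, restriction: TimeRestriction):
        return None  # this sketch never schedules anything automatically

class MyTimetablePlugin(AirflowPlugin):
    name = "my_timetable_plugin"   # placeholder name
    timetables = [MyTimetable]     # this is what lets the webserver deserialize the DAG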
I am trying to configure remote logging with Azure blob.
Airflow version: 1.10.2
Python: 3.6.5
Ubuntu: 18.04
These are the steps I took:
In $AIRFLOW_HOME/config/log_config.py, I have put REMOTE_BASE_LOG_FOLDER = 'wasb-airflow-logs' (This is a folder inside the container (container name: airflow-logs))
An empty __init__.py is in $AIRFLOW_HOME/config/
$AIRFLOW_HOME/config/ is added to $PYTHONPATH
Renamed DEFAULT_LOGGING_CONFIG to LOGGING_CONFIG everywhere in $AIRFLOW_HOME/config/log_config.py
User defined in Airflow blob connection has read/write access to REMOTE_BASE_LOG_FOLDER
In $AIRFLOW_HOME/airflow.cfg I have set:
remote_logging = True
logging_config_class = log_config.LOGGING_CONFIG
remote_log_conn_id =
Following is the error:
Unable to load the config, contains a configuration error.
Traceback (most recent call last):
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 382, in resolve
found = getattr(found, frag)
AttributeError: module 'airflow.utils.log' has no attribute 'wasb_task_handler'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 384, in resolve
self.importer(used)
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/utils/log/wasb_task_handler.py", line 23, in <module>
from airflow.contrib.hooks.wasb_hook import WasbHook
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/contrib/hooks/wasb_hook.py", line 22, in <module>
from airflow.hooks.base_hook import BaseHook
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 28, in <module>
from airflow.models import Connection
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/models.py", line 86, in <module>
from airflow.utils.dag_processing import list_py_file_paths
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/utils/dag_processing.py", line 49, in <module>
from airflow.settings import logging_class_path
ImportError: cannot import name 'logging_class_path'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 558, in configure
handler = self.configure_handler(handlers[name])
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 708, in configure_handler
klass = self.resolve(cname)
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 391, in resolve
raise v
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 384, in resolve
self.importer(used)
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/utils/log/wasb_task_handler.py", line 23, in <module>
from airflow.contrib.hooks.wasb_hook import WasbHook
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/contrib/hooks/wasb_hook.py", line 22, in <module>
from airflow.hooks.base_hook import BaseHook
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 28, in <module>
from airflow.models import Connection
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/models.py", line 86, in <module>
from airflow.utils.dag_processing import list_py_file_paths
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/utils/dag_processing.py", line 49, in <module>
from airflow.settings import logging_class_path
ValueError: Cannot resolve 'airflow.utils.log.wasb_task_handler.WasbTaskHandler': cannot import name 'logging_class_path'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/gsingh/venv/bin/airflow", line 21, in <module>
from airflow import configuration
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/__init__.py", line 36, in <module>
from airflow import settings, configuration as conf
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/settings.py", line 262, in <module>
logging_class_path = configure_logging()
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/logging_config.py", line 73, in configure_logging
raise e
File "/home/gsingh/venv/lib/python3.6/site-packages/airflow/logging_config.py", line 68, in configure_logging
dictConfig(logging_config)
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 795, in dictConfig
dictConfigClass(config).configure()
File "/home/gsingh/anaconda3/lib/python3.6/logging/config.py", line 566, in configure
'%r: %s' % (name, e))
ValueError: Unable to configure handler 'processor': Cannot resolve 'airflow.utils.log.wasb_task_handler.WasbTaskHandler': cannot import name 'logging_class_path'
I am not sure which configuration I am missing. Has anyone faced the same issue?
You need to install the Azure packages:
pip install 'apache-airflow[azure_blob_storage,azure_data_lake,azure_cosmos,azure_container_instances]'
As per UPDATING.md, this should now be installed with:
pip install 'apache-airflow[azure]'
But this didn't work for me.
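Either way, a quick generic check (not specific to this setup) is to see whether the handler now imports cleanly and which Azure packages actually landed in the environment:
python -c "from airflow.utils.log.wasb_task_handler import WasbTaskHandler; print('handler importable')"
pip freeze | grep -i azure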
In my case the fix was: sudo chown 50000:0 dags logs plugins
I tried to run the official docker-compose.yml with all its containers (which depend on these 3 volume mounts), and also to simply wrap airflow standalone into a single container for debugging purposes. It turned out the volumes were created with root ownership instead of airflow's.
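To check whether you are in the same situation (a generic sketch; 50000 is, as far as I know, the UID the official image runs as):
ls -ln ./dags ./logs ./plugins                  # numeric owners; root-owned dirs show uid 0
sudo chown -R 50000:0 ./dags ./logs ./plugins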
I had the same error; however, scrolling up higher showed that another exception was thrown before the ValueError: a PermissionError.
PermissionError: [Errno 13] Permission denied: '/usr/local/airflow/logs/scheduler'
The reason I got that error is that I didn't create the initial 3 folders (dags, logs, plugins) before running the Airflow docker container. Docker seems to have created them automatically, but the permissions were wrong.
Steps to fix:
Stop the current containers:
docker-compose down --volumes --remove-orphans
Delete the dags, logs, and plugins folders.
Just in case, destroy the images and volumes already created (in Docker Desktop).
Recreate the folders from the command line:
mkdir logs dags plugins
Run Airflow in Docker again:
docker-compose up airflow-init
docker-compose up
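Consolidated, the whole sequence looks roughly like this (the AIRFLOW_UID line follows the official docker-compose how-to and is an addition to the steps above; skip it if your .env already has it):
docker-compose down --volumes --remove-orphans
rm -rf ./dags ./logs ./plugins            # only if there is nothing in them you need
mkdir -p ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)" > .env     # so the containers write the mounted folders as your user
docker-compose up airflow-init
docker-compose up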
I recently ran into this nasty error where Airflow's apply_defaults decorator throws the following stack trace (my **kwargs do contain job_flow_id):
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/mnt/airflow/dags/zanalytics-airflow/src/main/mysql_import/dags/mysql_import_dag.py", line 23, in <module>
sync_dag_builder.build_sync_dag()
File "/mnt/airflow/dags/zanalytics-airflow/src/main/mysql_import/dags/builders/sync_dag_builders/emr_sync_dag_builder.py", line 26, in build_sync_dag
create_emr_task, terminate_emr_task = self._create_job_flow_tasks()
File "/mnt/airflow/dags/zanalytics-airflow/src/main/mysql_import/dags/builders/sync_dag_builders/emr_sync_dag_builder.py", line 44, in _create_job_flow_tasks
task_id=GlobalConstants.EMR_TERMINATE_STEP)
File "/home/hadoop/.pyenv/versions/3.6.6/lib/python3.6/site-packages/airflow/utils/decorators.py", line 98, in wrapper
result = func(*args, **kwargs)
File "/mnt/airflow/dags/zanalytics-airflow/src/main/aws/operators/emr_terminate_ancestor_job_flows_operator.py", line 31, in __init__
EmrTerminateJobFlowOperator.__init__(self, *args, **kwargs)
File "/home/hadoop/.pyenv/versions/3.6.6/lib/python3.6/site-packages/airflow/utils/decorators.py", line 98, in wrapper
result = func(*args, **kwargs)
File "/home/hadoop/.pyenv/versions/3.6.6/lib/python3.6/site-packages/airflow/contrib/operators/emr_terminate_job_flow_operator.py", line 44, in __init__
super(EmrTerminateJobFlowOperator, self).__init__(*args, **kwargs)
File "/home/hadoop/.pyenv/versions/3.6.6/lib/python3.6/site-packages/airflow/utils/decorators.py", line 94, in wrapper
raise AirflowException(msg)
airflow.exceptions.AirflowException: Argument ['job_flow_id'] is required
The disturbing parts are:
The exception is currently originating from the __init__ of the built-in EmrTerminateJobFlowOperator.
Earlier it was coming from EmrCreateJobFlowOperator, even though that doesn't take a job_flow_id param at all; that one has since gone away.
Looking into decorators.py, I suspect that sig_cache might be messing things up. In fact, from the commit that introduced it, I can't figure out how function-signature caching is supposed to work at all (at least it isn't working here).
I've tried deleting all __pycache__ directories and restarting the scheduler and webserver, without luck (I'm running them in separate Linux screens).
What could be causing the error?
How does sig_cache work and does it need to be cleared forcefully under any circumstances? If so, how to clear it?
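For context, my reading of decorators.py (a simplified sketch from memory, not the verbatim Airflow code) is that sig_cache is just the signature captured once at decoration time, so it only lives inside the running process and is rebuilt on every import; beyond __pycache__ there is nothing on disk to clear:
import inspect
from functools import wraps

def apply_defaults_sketch(func):
    # computed once, when the operator's module is imported, and captured in the closure
    sig_cache = inspect.signature(func)
    non_optional_args = {
        name for name, param in sig_cache.parameters.items()
        if param.default is param.empty
        and name != 'self'
        and param.kind not in (param.VAR_POSITIONAL, param.VAR_KEYWORD)
    }

    @wraps(func)
    def wrapper(*args, **kwargs):
        missing_args = non_optional_args - set(kwargs)
        if missing_args:
            # this is the check behind "Argument [...] is required"
            raise Exception("Argument {} is required".format(sorted(missing_args)))
        return func(*args, **kwargs)

    return wrapper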
Environment
Python 3.6.6
Airflow 1.10.2
LocalExecutor
I've set up a database connection using sql_alchemy_conn = ibm_db_sa://{USERNAME}:{PASSWORD}@{HOST}:50000/airflow in the airflow.cfg file.
When I run airflow initdb, it raises KeyError: 'ibm_db_sa'. How can I use a DB2 connection with Airflow?
===============
Here is the more specific error message:
airflow initdb
[2017-02-01 15:55:57,135] {__init__.py:36} INFO - Using executor SequentialExecutor
DB: ibm_db_sa://db2inst1:***@localhost:50000/airflow
[2017-02-01 15:55:58,151] {db.py:222} INFO - Creating tables
Traceback (most recent call last):
File "/opt/anaconda/bin/airflow", line 15, in <module>
args.func(args)
File "/opt/anaconda/lib/python2.7/site-packages/airflow/bin/cli.py", line 524, in initdb
db_utils.initdb()
File "/opt/anaconda/lib/python2.7/site-packages/airflow/utils/db.py", line 106, in initdb
upgradedb()
File "/opt/anaconda/lib/python2.7/site-packages/airflow/utils/db.py", line 230, in upgradedb
command.upgrade(config, 'heads')
File "/opt/anaconda/lib/python2.7/site-packages/alembic/command.py", line 174, in upgrade
script.run_env()
File "/opt/anaconda/lib/python2.7/site-packages/alembic/script/base.py", line 416, in run_env
util.load_python_file(self.dir, 'env.py')
File "/opt/anaconda/lib/python2.7/site-packages/alembic/util/pyfiles.py", line 93, in load_python_file
module = load_module_py(module_id, path)
File "/opt/anaconda/lib/python2.7/site-packages/alembic/util/compat.py", line 79, in load_module_py
mod = imp.load_source(module_id, path, fp)
File "/opt/anaconda/lib/python2.7/site-packages/airflow/migrations/env.py", line 74, in <module>
run_migrations_online()
File "/opt/anaconda/lib/python2.7/site-packages/airflow/migrations/env.py", line 65, in run_migrations_online
compare_type=COMPARE_TYPE,
File "<string>", line 8, in configure
File "/opt/anaconda/lib/python2.7/site-packages/alembic/runtime/environment.py", line 773, in configure
opts=opts
File "/opt/anaconda/lib/python2.7/site-packages/alembic/runtime/migration.py", line 159, in configure
return MigrationContext(dialect, connection, opts, environment_context)
File "/opt/anaconda/lib/python2.7/site-packages/alembic/runtime/migration.py", line 103, in __init__
self.impl = ddl.DefaultImpl.get_by_dialect(dialect)(
File "/opt/anaconda/lib/python2.7/site-packages/alembic/ddl/impl.py", line 65, in get_by_dialect
return _impls[dialect.name]
KeyError: 'ibm_db_sa'
Did you install the required package for DB2? E.g. pip install ibm_db_sa. By the way, the PyPI page says it is only compatible with Python 3.
Please note that DB2 is not officially supported as a backend database for Airflow.
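A minimal check along those lines (package names as published on PyPI; I have not verified them against this exact environment):
pip install ibm_db ibm_db_sa
python -c "import ibm_db_sa; print('ibm_db_sa importable')"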
I'm taking advantage of the fact that Airflow v1.7.1.3 provides access to airflow.cfg, so I can place some configuration values there rather than embedding them in the code. We added the following as the first lines of the airflow.cfg file:
[foo]
bar = foo
      bar
In foobarDAG.py, the file representing the DAG, I do the following:
from airflow.configuration import conf
…
def fooBar():
pass
foobarList = conf['foo']['bar'].split('\n')
foobarOperator = PythonOperator(
task_id='fooBar',
provide_context=True,
python_callable=fooBar,
op_args=[foobarList],
dag=dag)
Testing this manually from the Python prompt is easy:
>>> from foobarDAG import foobarList
…
>>> foobarList
['foo', 'bar']
That's just what I would expect from the information in airflow.cfg, above.
We've also performed a test on the DAG directly:
airflow test foobarDAG fooBar 10-19-2016
That doesn't report any problems.
The problem crops up when we try to use the scheduler to schedule that one DAG:
airflow scheduler -d foobarDAG >& foobar_log.txt
In the web UI, we see the following at the top of the "DAGS" section:
Broken DAG: [/path/to/…/foobarDAG.py] 'foo'
And in foobar_log.txt, here is the error message:
[2016-10-19 14:56:09,028] {models.py:250} ERROR - Failed to import: /path/to/foobarDAG.py
Traceback (most recent call last):
File "/path/to/airflow/models.py", line 247, in process_file
m = imp.load_source(mod_name, filepath)
File "/path/to/anaconda3/envs/foobarenv/lib/python3.5/imp.py", line 172, in load_source
module = _load(spec)
File "<frozen importlib._bootstrap>", line 693, in _load
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 662, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "/path/to/foobarDAG.py", line 67, in <module>
foobarList = conf['foo']['bar'].split('\n')
File "/path/to/anaconda3/envs/foobarenv/lib/python3.5/configparser.py", line 956, in __getitem__
raise KeyError(key)
KeyError: 'foo'
So oddly it appears that the scheduler isn't retrieving the ['foo'] section from airflow.cfg and providing it to the DAG. Any idea why?
It turns out that everything was working properly, but the scheduler hadn't been restarted. The scheduler was apparently still using the old airflow.cfg which did not have the added section.
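With the scheduler restarted against the updated airflow.cfg, the custom section reads as expected; a minimal recap (using the question's hypothetical section and key names):
from airflow.configuration import conf

# both access styles work once the process has loaded the updated airflow.cfg
foobar_list = conf['foo']['bar'].split('\n')     # ['foo', 'bar']
same_list = conf.get('foo', 'bar').split('\n')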