Having an issue doing a backfill. When I run this in the command-line
airflow backfill my_dag -s 2021-01-01 -e 2021-01-12
the dag is triggered and begins running. The first task (a simple Python script with no external dependencies) completes, but in the second task I receive this error about the fernet key. The dag is just calling an MsSqlOperator.
cryptography.fernet.InvalidToken
If I trigger the dag manually in the UI, all the steps succeed.
If I trigger a single execution of the dag from the CLI, the dag succeeds:
airflow dags trigger -e '2021-01-19T04:00:00' my_dag
The fernet key is in the config file, and we've already run resetdb and re-created the connections. The same issue persists: the backfill command doesn't work, but the other methods do.
I also tried the --local flag (not sure what it does), but that doesn't work either.
Any ideas how to troubleshoot?
Running Airflow 1.10.14 on-prem with LocalExecutor. Edit: the issue exists with 1.10.15 too.
Backfill Doc for Airflow 1.10.14
Logs:
Note: it says something about a missing variable, but that's misleading since it works when I run it manually from the UI. The key line is:
ERROR - Can't decrypt _val for key=xyz_users_overlap_import, FERNET_KEY configuration missing
Full log:
INFO - Job 4303: Subtask download_user_files
[2021-05-17 23:41:29,327] {{logging_mixin.py:120}} INFO - Running <TaskInstance: xyz_users_overlap_import.download_user_files 2021-01-22T15:10:00+00:00 [running]> on host da2m.mycorp.corp
[2021-05-17 23:41:29,360] {{variable.py:58}} ERROR - Can't decrypt _val for key=xyz_users_overlap_import, FERNET_KEY configuration missing
[2021-05-17 23:41:29,361] {{taskinstance.py:1150}} ERROR - 'Variable xyz_users_overlap_import does not exist'
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 965, in _run_raw_task
self.render_templates(context=context)
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1424, in render_templates
self.task.render_template_fields(context)
File "/usr/local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 719, in render_template_fields
self._do_render_template_fields(self, self.template_fields, context, jinja_env, set())
File "/usr/local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 726, in _do_render_template_fields
rendered_content = self.render_template(content, context, jinja_env, seen_oids)
File "/usr/local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 755, in render_template
return jinja_env.from_string(content).render(**context)
File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 1090, in render
self.environment.handle_exception()
File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 832, in handle_exception
reraise(*rewrite_traceback_stack(source=source))
File "/usr/local/lib/python3.8/site-packages/jinja2/_compat.py", line 28, in reraise
raise value.with_traceback(tb)
File "<template>", line 1, in top-level template code
File "/usr/local/lib/python3.8/site-packages/jinja2/environment.py", line 471, in getattr
return getattr(obj, attribute)
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1335, in __getattr__
self.var = Variable.get(item, deserialize_json=True)
File "/usr/local/lib/python3.8/site-packages/airflow/models/variable.py", line 117, in get
raise KeyError('Variable {} does not exist'.format(key))
KeyError: 'Variable xyz_users_overlap_import does not exist'
Have you tried generating a new FERNET_KEY and setting it as an environment variable? This way you will override the value from the airflow.cfg file:
export AIRFLOW__CORE__FERNET_KEY=your_fernet_key
The code to generate a new one is:
from cryptography.fernet import Fernet
fernet_key = Fernet.generate_key()
print(fernet_key.decode())
Do not forget to install the cryptography package first (https://pypi.org/project/cryptography/): pip install cryptography
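If the new key does not match the key that originally encrypted the stored connections and variables, the old values will still fail with InvalidToken. A minimal sanity check, where the ciphertext and key below are placeholders you would copy from your own metadata DB and config:

from cryptography.fernet import Fernet, InvalidToken

stored_token = b"gAAAA..."          # placeholder: an encrypted _val copied from the metadata DB
candidate_key = b"your_fernet_key"  # placeholder: the key you plan to export

try:
    Fernet(candidate_key).decrypt(stored_token)
    print("key matches the stored ciphertext")
except InvalidToken:
    print("this key did not encrypt that value - the backfill process may be loading a different key")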
Related
I'm trying to delete a dag named 'twitterQueryParse' which you can see in this screenshot from my dags list:
airflow dags list
I've executed:
airflow dags delete twitterQueryParse
and I get the following error message:
[2022-12-25 09:39:49,657] {__init__.py:42} INFO - Loaded API auth backend: airflow.api.auth.backend.session
This will drop all existing records related to the specified DAG. Proceed? (y/n)y
[2022-12-25 09:39:53,555] {delete_dag.py:46} INFO - Deleting DAG: twitterQueryParse
Traceback (most recent call last):
File "/home/rony/anaconda3/bin/airflow", line 8, in <module>
sys.exit(main())
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/__main__.py", line 39, in main
args.func(args)
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 52, in command
return func(*args, **kwargs)
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/utils/cli.py", line 103, in wrapper
return f(*args, **kwargs)
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/cli/commands/dag_command.py", line 163, in dag_delete
message = api_client.delete_dag(dag_id=args.dag_id)
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/api/client/local_client.py", line 38, in delete_dag
count = delete_dag.delete_dag(dag_id)
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/utils/session.py", line 75, in wrapper
return func(*args, session=session, **kwargs)
File "/home/rony/anaconda3/lib/python3.7/site-packages/airflow/api/common/delete_dag.py", line 57, in delete_dag
raise DagNotFound(f"Dag id {dag_id} not found")
airflow.exceptions.DagNotFound: Dag id twitterQueryParse not found
But when I list the dags again, twitterQueryParse remains on the list, even after resetting and re-initializing the airflow db:
airflow db reset
airflow db init
My airflow version is 2.4.2
How do I delete this dag completely from the airflow system?
I'm trying to connect to Azure Data Lake from Airflow, using an Airflow connection set up via the Web UI.
When I try to connect using the test button, I get a Bad Request error, as seen below.
I'm using the correct UUIDs; they have been verified in other cases. I also checked the firewall.
When I execute the DAG, I use the Azure Data Lake connection id to check whether a file exists, applying the method described here: What is the best way to check if a file exists on an Azure Datalake using Apache Airflow?
This is the error I get:
[2022-05-06, 17:27:33 UTC] {log.py:127} ERROR - 99ec1d77-e91c-4fd3-a1c7-fa751ca1e779 - OAuth2Client:The token response from the server is unparseable as JSON: ***
Traceback (most recent call last):
File "/opt/airflow/lib/python3.8/site-packages/adal/oauth2_client.py", line 168, in _validate_token_response
wire_response = json.loads(body)
File "/usr/lib/python3.8/json/init.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 3 column 1 (char 4)
[2022-05-06, 17:27:33 UTC] {log.py:127} ERROR - 99ec1d77-e91c-4fd3-a1c7-fa751ca1e779 - OAuth2Client:Error validating get token response: ***
Traceback (most recent call last):
File "/opt/airflow/lib/python3.8/site-packages/adal/oauth2_client.py", line 238, in _handle_get_token_response
return self._validate_token_response(body)
File "/opt/airflow/lib/python3.8/site-packages/adal/oauth2_client.py", line 168, in _validate_token_response
Authentication to Azure Data Lake is by token credentials, i.e. you add the specific credentials (client_id, secret, tenant) and the account name to the Airflow connection.
Information about how to set it up can be found in this doc.
You can see a code example in the source code's test function.
Other methods of authentication are currently not supported.
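As a rough sketch of what such a connection holds (the conn_id, conn_type, and extra field names here are assumptions; the linked doc and test function are authoritative), it can also be built programmatically and exported as a URI instead of through the UI:

import json
from airflow.models.connection import Connection

# hypothetical token-credential connection for Azure Data Lake
conn = Connection(
    conn_id="azure_data_lake_default",
    conn_type="azure_data_lake",
    login="<client_id>",          # service principal application id
    password="<client_secret>",
    extra=json.dumps({"tenant": "<tenant_id>", "account_name": "<datalake_store_name>"}),
)

# the URI form can be exported as e.g. AIRFLOW_CONN_AZURE_DATA_LAKE_DEFAULT
print(conn.get_uri())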
I was trying to get the connection running using the Airflow implementation. My impression was that it was buggy and did not work out well. The above situation happened with Airflow 2.2.5. When I upgraded to Airflow 2.3.0, the test button was grayed out.
The final solution was to use Access Tokens instead.
I'm running Apache Airflow 2.x locally, using the Docker Compose file that is provided in the documentation. In the .\dags directory on my local filesystem (which is mounted into the Airflow containers), I create a new Python script file, and implement the DAG using the TaskFlow API.
The changes to my DAG are sometimes invalid. For example, maybe I have an ImportError due to an invalid module name, or a syntax error. When Airflow attempts to import the DAG, I cannot find any log messages from the web server, scheduler, or worker that would indicate a problem, or what the specific problem is.
Instead, I have to read through my code line by line looking for the problem. This is compounded by the fact that my local Python environment on Windows 10 and Airflow's Python environment run different Python versions and have different packages installed, so I cannot reliably use my local development environment to detect import failures.
Thus, I need some kind of error logging to indicate that a DAG import failed.
Question: When a DAG fails to update / import, where are the logs to indicate if an import failure occurred, and what the exact error message was?
Currently, the DAG parsing logs would be under $AIRFLOW_HOME/logs/scheduler/EXECUTION_DATE/DAG_FILE.py.log
Example:
Let's say my DAG file is example-dag.py with the following contents; as you can notice, there is a typo in the datetime import:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import dattime # <-- This Line has typo
dag = DAG(
    dag_id='example_Dag',
    schedule_interval=None,
    start_date=datetime(2019, 2, 6),
)

t1 = BashOperator(
    task_id='print_date1',
    bash_command='sleep $[ ( $RANDOM % 30 ) + 1 ]s',
    dag=dag)
Now, check the logs under $AIRFLOW_HOME/logs/scheduler/2021-04-07/example-dag.py.log, where $AIRFLOW_HOME/logs is what I have set via $AIRFLOW__LOGGING__BASE_LOG_FOLDER or [logging] base_log_folder in airflow.cfg (https://airflow.apache.org/docs/apache-airflow/2.0.1/configurations-ref.html#base-log-folder).
That file should have logs as below:
[2021-04-07 21:39:02,222] {scheduler_job.py:182} INFO - Started process (PID=686) to work on /files/dags/example-dag.py
[2021-04-07 21:39:02,230] {scheduler_job.py:633} INFO - Processing file /files/dags/example-dag.py for tasks to queue
[2021-04-07 21:39:02,233] {logging_mixin.py:104} INFO - [2021-04-07 21:39:02,233] {dagbag.py:451} INFO - Filling up the DagBag from /files/dags/example-dag.py
[2021-04-07 21:39:02,368] {logging_mixin.py:104} INFO - [2021-04-07 21:39:02,357] {dagbag.py:308} ERROR - Failed to import: /files/dags/example-dag.py
Traceback (most recent call last):
File "/opt/airflow/airflow/models/dagbag.py", line 305, in _load_modules_from_file
loader.exec_module(new_module)
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/files/dags/example-dag.py", line 3, in <module>
from datetime import dattime
ImportError: cannot import name 'dattime'
[2021-04-07 21:39:02,380] {scheduler_job.py:645} WARNING - No viable dags retrieved from /files/dags/example-dag.py
[2021-04-07 21:39:02,407] {scheduler_job.py:190} INFO - Processing /files/dags/example-dag.py took 0.189 seconds
and you will also see the import error displayed in the webserver UI.
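If you prefer not to dig through the scheduler log files, a small sketch that surfaces the same information is to load the DagBag yourself from inside the Airflow container (the dags path below is the one used in this example) and print its import_errors:

from airflow.models import DagBag

# parse the dags folder the same way the scheduler does, skipping the example DAGs
bag = DagBag(dag_folder="/files/dags", include_examples=False)

for file_path, stack_trace in bag.import_errors.items():
    print(f"Import error in {file_path}:\n{stack_trace}")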
I have created a virtualenv for Python 3 using:
virtualenv -p $(which python3) ENV
Then activated the environment:
source /Users/myusername/ENV/bin/activate
Installed apache-airflow:
pip install apache-airflow
Running which airflow then yields /Users/myusername/ENV/bin/airflow.
But when I try to initdb using:
airflow initdb
I get the error below:
{db.py:350} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
WARNI [airflow.utils.log.logging_mixin.LoggingMixin] cryptography not found - values will not be stored encrypted.
ERROR [airflow.models.DagBag] Failed to import: /Library/Python/2.7/site-packages/airflow/example_dags/example_http_operator.py
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/airflow/models/__init__.py", line 413, in process_file
m = imp.load_source(mod_name, filepath)
File "/Library/Python/2.7/site-packages/airflow/example_dags/example_http_operator.py", line 27, in <module>
from airflow.operators.http_operator import SimpleHttpOperator
File "/Library/Python/2.7/site-packages/airflow/operators/http_operator.py", line 21, in <module>
from airflow.hooks.http_hook import HttpHook
File "/Library/Python/2.7/site-packages/airflow/hooks/http_hook.py", line 23, in <module>
import tenacity
File "/Library/Python/2.7/site-packages/tenacity/__init__.py", line 375, in <module>
from tenacity.tornadoweb import TornadoRetrying
File "/Library/Python/2.7/site-packages/tenacity/tornadoweb.py", line 24, in <module>
from tornado import gen
File "/Library/Python/2.7/site-packages/tornado-6.0.3-py2.7-macosx-10.14-intel.egg/tornado/gen.py", line 126
def _value_from_stopiteration(e: Union[StopIteration, "Return"]) -> Any:
^
SyntaxError: invalid syntax
Done.
(ENV) ---------------------------------------------------------
It seems the example scripts are being loaded with Python 2.7, which can't recognize the function definition syntax.
Does the apache-airflow package need to be fixed in the next release, or is there something I can do to fix this?
I tried fixing this by using Python 2.7 instead of Python 3, i.e. installing airflow on the default Python 2.7 enabled on the Mac, but that throws other errors, e.g. the "six" package is not compatible.
You need to turn off loading of the example DAGs in the config file to solve this problem.
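A minimal way to do that, either in airflow.cfg or via the equivalent environment variable:

# airflow.cfg
[core]
load_examples = False

# or, equivalently
export AIRFLOW__CORE__LOAD_EXAMPLES=False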
Anyway, it seems weird that Airflow uses Python 2.7 when you said it is installed into a Python 3 virtual environment.
When I try to install Nodecellar with Cloudify, I get the following error:
2015-07-13T17:31:03 LOG <nodecellar> [mongod_a50aa.configure] ERROR: Exception raised on operation [script_runner.tasks.run] invocation
Traceback (most recent call last):
File "/root/cloudify.host_dba5c/env/local/lib/python2.7/site-packages/cloudify/decorators.py", line 125, in wrapper
result = func(*args, **kwargs)
File "/root/cloudify.host_dba5c/env/local/lib/python2.7/site-packages/script_runner/tasks.py", line 58, in run
return process_execution(script_func, script_path, ctx, process)
File "/root/cloudify.host_dba5c/env/local/lib/python2.7/site-packages/script_runner/tasks.py", line 74, in process_execution
script_func(script_path, ctx, process)
File "/root/cloudify.host_dba5c/env/local/lib/python2.7/site-packages/script_runner/tasks.py", line 143, in execute
stderr_consumer.buffer.getvalue())
How can I fix this problem?
This exception is raised by the Cloudify Script Plugin when a script you ran exits with a non-zero error code. Here is the source of that error.
The script that returned the non-zero code is the one mapped to the configure operation of the mongod node. Which script that is depends on the version of the Nodecellar blueprint you are using.
I can't give a more detailed answer without information regarding the specific blueprint version, which Cloudify version you have installed, details about your provider (local, Vagrant, Openstack, AWS), and OS (Ubuntu, Centos, etc).
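If you want to track down the script yourself, look at the mongod node's lifecycle interface in the blueprint YAML; a rough sketch of what that mapping typically looks like (the type name and script paths here are illustrative, not taken from your blueprint):

node_templates:
  mongod:
    type: nodecellar.nodes.MongoDatabase          # illustrative type name
    interfaces:
      cloudify.interfaces.lifecycle:
        configure: scripts/mongo/install-mongo.sh # the script whose exit code is non-zero
        start: scripts/mongo/start-mongo.sh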