Airflow run_as_user permission denied - airflow
I am running Airflow 2.4.3 on a single machine. Airflow is installed in a Python virtualenv, and the AIRFLOW_HOME is /wwx/airflow.
My DAG:
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    'owner': 'talend',
    'start_date': datetime(2023, 2, 1),
    'retries': 5,
    'retry_delay': timedelta(minutes=5),
    'run_as_user': 'talend'
}

dag = DAG(
    'dag_fetch_public_holiday',
    default_args=default_args,
    description='Fetch public holiday and save to csv file.',
    schedule_interval='0 6 * * *',
    catchup=False,
    tags=['wwx', 'elt']
)

# download_public_holiday_csv is defined elsewhere in the DAG file
download_csv = PythonOperator(
    task_id='task_download_csv',
    python_callable=download_public_holiday_csv,
    dag=dag
)
DAG description:
The DAG is owned and run by the user talend, which exists at both the OS level and the Airflow level. At the OS level the user belongs to the airflow and sudo groups; at the Airflow level the user has the Admin role.
Inside the DAG there is a PythonOperator task that saves a CSV file to a folder; the expectation is that the CSV file will be created and owned by the talend user.
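For context only, the callable itself is not shown in the question; a minimal sketch of what such a task function might look like (the output folder and source URL below are placeholders, not taken from the question):

import urllib.request
from pathlib import Path

def download_public_holiday_csv():
    # runs inside the task process, i.e. as the run_as_user (talend),
    # so the file written here should end up owned by that user
    out_dir = Path("/wwx/data/public_holiday")            # placeholder output folder
    out_dir.mkdir(parents=True, exist_ok=True)
    url = "https://example.com/public_holidays.csv"       # placeholder source URL
    urllib.request.urlretrieve(url, out_dir / "public_holidays.csv")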
Problem description:
When I trigger this DAG in the web UI, it fails with a permission-denied error on the DAG log folder:
*** Reading local file: /wwx/airflow/logs/dag_id=dag_fetch_public_holiday/run_id=scheduled__2023-02-06T06:00:00+00:00/task_id=task_download_csv/attempt=1.log
[2023-02-07, 18:07:23 CST] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: dag_fetch_public_holiday.task_download_csv scheduled__2023-02-06T06:00:00+00:00 [queued]>
[2023-02-07, 18:07:23 CST] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: dag_fetch_public_holiday.task_download_csv scheduled__2023-02-06T06:00:00+00:00 [queued]>
[2023-02-07, 18:07:23 CST] {taskinstance.py:1362} INFO -
--------------------------------------------------------------------------------
[2023-02-07, 18:07:23 CST] {taskinstance.py:1363} INFO - Starting attempt 1 of 6
[2023-02-07, 18:07:23 CST] {taskinstance.py:1364} INFO -
--------------------------------------------------------------------------------
[2023-02-07, 18:07:23 CST] {taskinstance.py:1383} INFO - Executing <Task(PythonOperator): task_download_csv> on 2023-02-06 06:00:00+00:00
[2023-02-07, 18:07:23 CST] {base_task_runner.py:129} INFO - Running on host: vmi1120376.contaboserver.net
[2023-02-07, 18:07:23 CST] {base_task_runner.py:130} INFO - Running: ['sudo', '-E', '-H', '-u', 'talend', 'airflow', 'tasks', 'run', 'dag_fetch_public_holiday', 'task_download_csv', 'scheduled__2023-02-06T06:00:00+00:00', '--job-id', '30', '--raw', '--subdir', 'DAGS_FOLDER/fetch_public_holiday.py', '--cfg-path', '/tmp/tmpejvrehc1']
[2023-02-07, 18:07:24 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv /wwx/airflow/venv/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
[2023-02-07, 18:07:24 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv [2023-02-08, 02:07:24 CST] {dagbag.py:537} INFO - Filling up the DagBag from /wwx/airflow/dags/fetch_public_holiday.py
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv Traceback (most recent call last):
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/usr/lib/python3.8/pathlib.py", line 1288, in mkdir
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv self._accessor.mkdir(self, mode)
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv PermissionError: [Errno 13] Permission denied: '/wwx/airflow/logs/dag_id=dag_fetch_public_holiday/run_id=scheduled__2023-02-06T06:00:00+00:00/task_id=task_download_csv'
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv During handling of the above exception, another exception occurred:
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv Traceback (most recent call last):
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/usr/local/bin/airflow", line 8, in <module>
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv sys.exit(main())
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/__main__.py", line 39, in main
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv args.func(args)
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 52, in command
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv return func(*args, **kwargs)
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/utils/cli.py", line 103, in wrapper
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv return f(*args, **kwargs)
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 372, in task_run
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv ti.init_run_context(raw=args.raw)
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 2503, in init_run_context
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv self._set_context(self)
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/utils/log/logging_mixin.py", line 77, in _set_context
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv set_context(self.log, context)
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/utils/log/logging_mixin.py", line 213, in set_context
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv flag = cast(FileTaskHandler, handler).set_context(value)
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/utils/log/file_task_handler.py", line 70, in set_context
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv local_loc = self._init_file(ti)
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/utils/log/file_task_handler.py", line 320, in _init_file
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv Path(directory).mkdir(mode=0o777, parents=True, exist_ok=True)
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/usr/lib/python3.8/pathlib.py", line 1297, in mkdir
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv if not exist_ok or not self.is_dir():
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/usr/lib/python3.8/pathlib.py", line 1422, in is_dir
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv return S_ISDIR(self.stat().st_mode)
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv File "/usr/lib/python3.8/pathlib.py", line 1198, in stat
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv return self._accessor.stat(self)
[2023-02-07, 18:07:27 CST] {base_task_runner.py:111} INFO - Job 30: Subtask task_download_csv PermissionError: [Errno 13] Permission denied: '/wwx/airflow/logs/dag_id=dag_fetch_public_holiday/run_id=scheduled__2023-02-06T06:00:00+00:00/task_id=task_download_csv'
[2023-02-07, 18:07:28 CST] {local_task_job.py:159} INFO - Task exited with return code 1
[2023-02-07, 18:07:28 CST] {taskinstance.py:2623} INFO - 0 downstream tasks scheduled from follow-on schedule check
[2023-02-07, 18:39:50 CST] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: dag_fetch_public_holiday.task_download_csv scheduled__2023-02-06T06:00:00+00:00 [queued]>
[2023-02-07, 18:39:50 CST] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: dag_fetch_public_holiday.task_download_csv scheduled__2023-02-06T06:00:00+00:00 [queued]>
[2023-02-07, 18:39:50 CST] {taskinstance.py:1362} INFO -
--------------------------------------------------------------------------------
[2023-02-07, 18:39:50 CST] {taskinstance.py:1363} INFO - Starting attempt 1 of 6
[2023-02-07, 18:39:50 CST] {taskinstance.py:1364} INFO -
--------------------------------------------------------------------------------
[2023-02-07, 18:39:51 CST] {taskinstance.py:1383} INFO - Executing <Task(PythonOperator): task_download_csv> on 2023-02-06 06:00:00+00:00
[2023-02-07, 18:39:51 CST] {base_task_runner.py:129} INFO - Running on host: vmi1120376.contaboserver.net
[2023-02-07, 18:39:51 CST] {base_task_runner.py:130} INFO - Running: ['sudo', '-E', '-H', '-u', 'talend', 'airflow', 'tasks', 'run', 'dag_fetch_public_holiday', 'task_download_csv', 'scheduled__2023-02-06T06:00:00+00:00', '--job-id', '32', '--raw', '--subdir', 'DAGS_FOLDER/fetch_public_holiday.py', '--cfg-path', '/tmp/tmp26yeooeq']
[2023-02-07, 18:39:53 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv /wwx/airflow/venv/lib/python3.8/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
[2023-02-07, 18:39:54 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv [2023-02-08, 02:39:54 CST] {dagbag.py:537} INFO - Filling up the DagBag from /wwx/airflow/dags/fetch_public_holiday.py
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv Traceback (most recent call last):
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/usr/lib/python3.8/pathlib.py", line 1288, in mkdir
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv self._accessor.mkdir(self, mode)
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv PermissionError: [Errno 13] Permission denied: '/wwx/airflow/logs/dag_id=dag_fetch_public_holiday/run_id=scheduled__2023-02-06T06:00:00+00:00/task_id=task_download_csv'
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv During handling of the above exception, another exception occurred:
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv Traceback (most recent call last):
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/usr/local/bin/airflow", line 8, in <module>
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv sys.exit(main())
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/__main__.py", line 39, in main
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv args.func(args)
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 52, in command
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv return func(*args, **kwargs)
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/utils/cli.py", line 103, in wrapper
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv return f(*args, **kwargs)
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 372, in task_run
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv ti.init_run_context(raw=args.raw)
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 2503, in init_run_context
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv self._set_context(self)
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/utils/log/logging_mixin.py", line 77, in _set_context
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv set_context(self.log, context)
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/utils/log/logging_mixin.py", line 213, in set_context
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv flag = cast(FileTaskHandler, handler).set_context(value)
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/utils/log/file_task_handler.py", line 70, in set_context
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv local_loc = self._init_file(ti)
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/wwx/airflow/venv/lib/python3.8/site-packages/airflow/utils/log/file_task_handler.py", line 320, in _init_file
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv Path(directory).mkdir(mode=0o777, parents=True, exist_ok=True)
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/usr/lib/python3.8/pathlib.py", line 1297, in mkdir
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv if not exist_ok or not self.is_dir():
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/usr/lib/python3.8/pathlib.py", line 1422, in is_dir
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv return S_ISDIR(self.stat().st_mode)
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv File "/usr/lib/python3.8/pathlib.py", line 1198, in stat
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv return self._accessor.stat(self)
[2023-02-07, 18:40:00 CST] {base_task_runner.py:111} INFO - Job 32: Subtask task_download_csv PermissionError: [Errno 13] Permission denied: '/wwx/airflow/logs/dag_id=dag_fetch_public_holiday/run_id=scheduled__2023-02-06T06:00:00+00:00/task_id=task_download_csv'
[2023-02-07, 18:40:01 CST] {local_task_job.py:159} INFO - Task exited with return code 1
[2023-02-07, 18:40:01 CST] {taskinstance.py:2623} INFO - 0 downstream tasks scheduled from follow-on schedule check
I then examined the problematic folder and found that it has permissions 700.
I could chmod 777 the folder manually; however, every new DAG/task log folder is also created automatically with permissions 700, so I would have to chmod manually again for each new DAG or task.
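For reference, a quick way to list the run folders that were created this way, together with their mode and ownership (a sketch; run it as a user that can read AIRFLOW_HOME, e.g. the account that started the scheduler):

import stat
from pathlib import Path

log_root = Path("/wwx/airflow/logs")
for run_dir in sorted(log_root.glob("dag_id=*/run_id=*")):
    st = run_dir.stat()
    # prints the mode (e.g. drwx------), owner, group and path of each run folder
    print(stat.filemode(st.st_mode), run_dir.owner(), run_dir.group(), run_dir)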
What I have tried:
I have followed the impersonation instructions at https://airflow.apache.org/docs/apache-airflow/1.10.10/security.html?highlight=impersonation#impersonation
Added the following line to /etc/sudoers
airflow ALL=(ALL) NOPASSWD: ALL
Used sudo to start the webserver and scheduler:
sudo sh -c 'export AIRFLOW_HOME=/wwx/airflow; /wwx/airflow/venv/bin/airflow scheduler -D'
sudo sh -c 'export AIRFLOW_HOME=/wwx/airflow; /wwx/airflow/venv/bin/airflow webserver -D -p 8090'
Added the airflow command to /usr/local/bin so that other users can use it:
sudo ln -s /wwx/airflow/venv/bin/airflow /usr/local/bin/airflow
I have found the following solution:
Add the user to the airflow group:
usermod -aG airflow <username>
Set a default ACL on the Airflow log directory so that newly created files/folders get group permission rwx, allowing the DAG run user to access these logs:
setfacl -d -m group:airflow:rwx $AIRFLOW_HOME/logs
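A quick way to confirm the default ACL is being applied to newly created folders (a sketch; getfacl ships with the same acl package as setfacl, and the probe path below is just a throwaway example):

import subprocess
from pathlib import Path

probe = Path("/wwx/airflow/logs/acl_probe")   # throwaway test directory
probe.mkdir(exist_ok=True)
# the output should contain a "group:airflow:rwx" entry inherited from the default ACL
print(subprocess.run(["getfacl", str(probe)], capture_output=True, text=True).stdout)
probe.rmdir()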
I am not sure whether this is intended behavior in Airflow when running DAGs as other users, since I cannot find any hint about it in the Airflow documentation.
Related
Airflow exceptions thrown in on_retry_callback are suppressed
We are using the on_retry_callback parameter available on Airflow operators to do some cleanup activities before the task is retried. If exceptions are thrown in the on_retry_callback function, they are not logged in the task instance's log. Without the exception details, it is difficult to debug whether there are issues in the on_retry_callback function. If this is the default behavior, is there a workaround to enable logging for these exceptions? Note: we are using Airflow 2.0.2. A sample DAG that demonstrates this is given below.

from datetime import datetime
from airflow.operators.python import PythonOperator
from airflow.models.dag import DAG

def sample_function2():
    var = 1 / 0

def on_retry_callback_sample(context):
    print(f'on_retry_callback_started')
    v = 1 / 0
    print(f'on_retry_callback completed')

dag = DAG(
    'venkat-test-dag',
    description='This is a test dag',
    start_date=datetime(2023, 1, 10, 18, 0),
    schedule_interval='0 12 * * *',
    catchup=False
)

func2 = PythonOperator(
    task_id='function2',
    python_callable=sample_function2,
    dag=dag,
    retries=2,
    on_retry_callback=on_retry_callback_sample
)

func2

The log file of this run on the local Airflow setup is given below. The last message in the log is "on_retry_callback_started", but I expect a ZeroDivisionError after this line and finally the line "on_retry_callback completed". How can I achieve this?

14f0fed99882
*** Reading local file: /usr/local/airflow/logs/venkat-test-dag/function2/2023-01-13T13:22:03.178261+00:00/1.log
[2023-01-13 13:22:05,091] {{taskinstance.py:877}} INFO - Dependencies all met for <TaskInstance: venkat-test-dag.function2 2023-01-13T13:22:03.178261+00:00 [queued]>
[2023-01-13 13:22:05,128] {{taskinstance.py:877}} INFO - Dependencies all met for <TaskInstance: venkat-test-dag.function2 2023-01-13T13:22:03.178261+00:00 [queued]>
[2023-01-13 13:22:05,128] {{taskinstance.py:1068}} INFO - --------------------------------------------------------------------------------
[2023-01-13 13:22:05,128] {{taskinstance.py:1069}} INFO - Starting attempt 1 of 3
[2023-01-13 13:22:05,128] {{taskinstance.py:1070}} INFO - --------------------------------------------------------------------------------
[2023-01-13 13:22:05,143] {{taskinstance.py:1089}} INFO - Executing <Task(PythonOperator): function2> on 2023-01-13T13:22:03.178261+00:00
[2023-01-13 13:22:05,145] {{standard_task_runner.py:52}} INFO - Started process 6947 to run task
[2023-01-13 13:22:05,150] {{standard_task_runner.py:76}} INFO - Running: ['airflow', 'tasks', 'run', 'venkat-test-dag', 'function2', '2023-01-13T13:22:03.178261+00:00', '--job-id', '356', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/dp-etl-mixpanel_stg-24H/dags/venkat-test-dag.py', '--cfg-path', '/tmp/tmpny0mhh4j', '--error-file', '/tmp/tmpul506kro']
[2023-01-13 13:22:05,151] {{standard_task_runner.py:77}} INFO - Job 356: Subtask function2
[2023-01-13 13:22:05,244] {{logging_mixin.py:104}} INFO - Running <TaskInstance: venkat-test-dag.function2 2023-01-13T13:22:03.178261+00:00 [running]> on host 14f0fed99882
[2023-01-13 13:22:05,345] {{taskinstance.py:1283}} INFO - Exporting the following env vars: AIRFLOW_CTX_DAG_OWNER=airflow AIRFLOW_CTX_DAG_ID=venkat-test-dag AIRFLOW_CTX_TASK_ID=function2 AIRFLOW_CTX_EXECUTION_DATE=2023-01-13T13:22:03.178261+00:00 AIRFLOW_CTX_DAG_RUN_ID=manual__2023-01-13T13:22:03.178261+00:00
[2023-01-13 13:22:05,346] {{taskinstance.py:1482}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 117, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 128, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/dp-etl-mixpanel_stg-24H/dags/venkat-test-dag.py", line 7, in sample_function2
    var = 1 / 0
ZeroDivisionError: division by zero
[2023-01-13 13:22:05,349] {{taskinstance.py:1532}} INFO - Marking task as UP_FOR_RETRY. dag_id=venkat-test-dag, task_id=function2, execution_date=20230113T132203, start_date=20230113T132205, end_date=20230113T132205
[2023-01-13 13:22:05,402] {{local_task_job.py:146}} INFO - Task exited with return code 1
[2023-01-13 13:22:05,459] {{logging_mixin.py:104}} INFO - on_retry_callback_started
Adding this as an answer for visibility: this issue is likely related to a fix that was merged in Airflow version 2.1.3: https://github.com/apache/airflow/pull/17347
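Until an upgrade to a version containing that fix is possible, one possible workaround (a sketch, not part of the original answer) is to catch and log exceptions inside the callback yourself, so they show up even if the caller swallows them:

import logging

log = logging.getLogger(__name__)

def on_retry_callback_sample(context):
    try:
        print('on_retry_callback_started')
        v = 1 / 0
        print('on_retry_callback completed')
    except Exception:
        # on affected versions the caller may suppress this exception,
        # so record it explicitly before re-raising
        log.exception("on_retry_callback failed for %s", context.get("task_instance"))
        raise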
google.api_core.exceptions.NotFound: bucket does not exist
When I run the data_ingestion_gcs_dag DAG in Airflow, I get an error that it cannot find the specified bucket; however, I rechecked it and the bucket name is fine. I have configured access to the Google account with docker-compose. The code is below (I have included only the first part of the file):

version: '3'
x-airflow-common:
  &airflow-common
  # In order to add custom dependencies or upgrade provider packages you can use your extended image.
  # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
  # and uncomment the "build" line below, Then run `docker-compose build` to build the images.
  build:
    context: .
    dockerfile: ./Dockerfile
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
    GOOGLE_APPLICATION_CREDENTIALS: /.google/credentials/google_credentials.json
    AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT: 'google-cloud-platform://?extra__google_cloud_platform__key_path=/.google/credentials/google_credentials.json'
    # TODO: Please change GCP_PROJECT_ID & GCP_GCS_BUCKET, as per your config
    GCP_PROJECT_ID: 'real-dtc-de'
    GCP_GCS_BUCKET: 'dtc_data_lake_real-dtc-de'
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ~/.google/credentials/:/.google/credentials:ro

And here is the relevant part of the DAG code:

PROJECT_ID = os.environ.get("GCP_PROJECT_ID")
BUCKET = os.environ.get("GCP_GCS_BUCKET")

Here are the logs from the DAG:

*** Reading local file: /opt/airflow/logs/data_ingestion_gcs_dag/local_to_gcs_task/2022-06-13T02:47:29.654918+00:00/1.log [2022-06-13, 02:47:36 UTC] {taskinstance.py:1032} INFO - Dependencies all met for <TaskInstance: data_ingestion_gcs_dag.local_to_gcs_task manual__2022-06-13T02:47:29.654918+00:00 [queued]> [2022-06-13, 02:47:36 UTC] {taskinstance.py:1032} INFO - Dependencies all met for <TaskInstance: data_ingestion_gcs_dag.local_to_gcs_task manual__2022-06-13T02:47:29.654918+00:00 [queued]> [2022-06-13, 02:47:36 UTC] {taskinstance.py:1238} INFO - -------------------------------------------------------------------------------- [2022-06-13, 02:47:36 UTC] {taskinstance.py:1239} INFO - Starting attempt 1 of 2 [2022-06-13, 02:47:36 UTC] {taskinstance.py:1240} INFO - -------------------------------------------------------------------------------- [2022-06-13, 02:47:36 UTC] {taskinstance.py:1259} INFO - Executing <Task(PythonOperator): local_to_gcs_task> on 2022-06-13 02:47:29.654918+00:00 [2022-06-13, 02:47:36 UTC] {standard_task_runner.py:52} INFO - Started process 1042 to run task [2022-06-13, 02:47:36 UTC] {standard_task_runner.py:76} INFO - Running: ['***', 'tasks', 'run', 'data_ingestion_gcs_dag', 'local_to_gcs_task', 'manual__2022-06-13T02:47:29.654918+00:00', '--job-id', '11', '--raw', '--subdir', 'DAGS_FOLDER/data_ingestion_gcs_dag.py', '--cfg-path', '/tmp/tmp11gg9aoy', '--error-file', '/tmp/tmpjbp6yrks'] [2022-06-13, 02:47:36 UTC] {standard_task_runner.py:77} INFO - Job 11: Subtask local_to_gcs_task [2022-06-13, 02:47:36 UTC] {logging_mixin.py:109} INFO - Running <TaskInstance:
data_ingestion_gcs_dag.local_to_gcs_task manual__2022-06-13T02:47:29.654918+00:00 [running]> on host aea7312db396 [2022-06-13, 02:47:36 UTC] {taskinstance.py:1426} INFO - Exporting the following env vars: AIRFLOW_CTX_DAG_OWNER=*** AIRFLOW_CTX_DAG_ID=data_ingestion_gcs_dag AIRFLOW_CTX_TASK_ID=local_to_gcs_task AIRFLOW_CTX_EXECUTION_DATE=2022-06-13T02:47:29.654918+00:00 AIRFLOW_CTX_DAG_RUN_ID=manual__2022-06-13T02:47:29.654918+00:00 [2022-06-13, 02:47:36 UTC] {taskinstance.py:1700} ERROR - Task failed with exception Traceback (most recent call last): File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 2594, in upload_from_file retry=retry, File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 2396, in _do_upload retry=retry, File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1917, in _do_multipart_upload transport, data, object_metadata, content_type, timeout=timeout File "/home/airflow/.local/lib/python3.7/site-packages/google/resumable_media/requests/upload.py", line 154, in transmit retriable_request, self._get_status_code, self._retry_strategy File "/home/airflow/.local/lib/python3.7/site-packages/google/resumable_media/requests/_request_helpers.py", line 147, in wait_and_retry response = func() File "/home/airflow/.local/lib/python3.7/site-packages/google/resumable_media/requests/upload.py", line 149, in retriable_request self._process_response(result) File "/home/airflow/.local/lib/python3.7/site-packages/google/resumable_media/_upload.py", line 113, in _process_response _helpers.require_status_code(response, (http.client.OK,), self._get_status_code) File "/home/airflow/.local/lib/python3.7/site-packages/google/resumable_media/_helpers.py", line 104, in require_status_code *status_codes google.resumable_media.common.InvalidResponse: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1329, in _run_raw_task self._execute_task_with_callbacks(context) File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1455, in _execute_task_with_callbacks result = self._execute_task(context, self.task) File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1511, in _execute_task result = execute_callable(context=context) File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 174, in execute return_value = self.execute_callable() File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 185, in execute_callable return self.python_callable(*self.op_args, **self.op_kwargs) File "/opt/airflow/dags/data_ingestion_gcs_dag.py", line 51, in upload_to_gcs blob.upload_from_filename(local_file) File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 2735, in upload_from_filename retry=retry, File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 2598, in upload_from_file _raise_from_invalid_response(exc) File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 4466, in _raise_from_invalid_response raise exceptions.from_http_status(response.status_code, message, response=response) google.api_core.exceptions.NotFound: 404 POST 
https://storage.googleapis.com/upload/storage/v1/b/dtc_data_lake_animated-surfer-338618/o?uploadType=multipart: { "error": { "code": 404, "message": "The specified bucket does not exist.", "errors": [ { "message": "The specified bucket does not exist.", "domain": "global", "reason": "notFound" } ] } }
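The bucket name in the 404 URL above (dtc_data_lake_animated-surfer-338618) is not the one set in GCP_GCS_BUCKET in the compose file, which suggests the task is not picking up the value you expect. One way to narrow this down is to print what the worker actually sees at run time; a minimal debug task (a sketch only; the task id is hypothetical, and dag refers to the existing data_ingestion_gcs_dag object):

import os
from airflow.operators.python import PythonOperator

def print_gcs_config():
    # shows which project/bucket the worker environment actually provides
    print("GCP_PROJECT_ID =", os.environ.get("GCP_PROJECT_ID"))
    print("GCP_GCS_BUCKET =", os.environ.get("GCP_GCS_BUCKET"))

debug_env_task = PythonOperator(
    task_id="debug_env_task",          # hypothetical task id
    python_callable=print_gcs_config,
    dag=dag,                           # the existing DAG object from the DAG file
)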
MySqlOperator in Airflow 2.0.1 fails with "SSL connection error"
I am new to Airflow and I am trying to test a MySQL connection using MySqlOperator in Airflow 2.0.1. However, I am getting an SSL connection error. I have tried to add extra parameters to disable SSL mode, but I still get the same error. Here is my code (I tried to pass the SSL param = disable in the code), and it doesn't work:

from airflow import DAG
from airflow.providers.mysql.operators.mysql import MySqlOperator
from airflow.operators.python import PythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.dates import days_ago, timedelta

default_args = {
    'owner': 'airflow',
    'depend_on_past': False,
    'start_date': days_ago(2),
    'retries': 1,
    'retry_delay': timedelta(minutes=1)
}

with DAG(
    'mysqlConnTest',
    default_args=default_args,
    schedule_interval='@once',
    catchup=False) as dag:

    start_date = DummyOperator(task_id="start_task")

    # [START howto_operator_mysql]
    select_table_mysql_task = MySqlOperator(
        task_id='select_table_mysql',
        mysql_conn_id='mysql',
        sql="SELECT * FROM country;",
        autocommit=True,
        parameters={'ssl_mode': 'DISABLED'}
    )

    start_date >> select_table_mysql_task

and here is the error:

*** Reading local file: /opt/airflow/logs/mysqlHookConnTest/select_table_mysql/2021-04-14T12:46:42.221662+00:00/2.log [2021-04-14 12:47:46,791] {taskinstance.py:851} INFO - Dependencies all met for <TaskInstance: mysqlHookConnTest.select_table_mysql 2021-04-14T12:46:42.221662+00:00 [queued]> [2021-04-14 12:47:47,007] {taskinstance.py:851} INFO - Dependencies all met for <TaskInstance: mysqlHookConnTest.select_table_mysql 2021-04-14T12:46:42.221662+00:00 [queued]> [2021-04-14 12:47:47,047] {taskinstance.py:1042} INFO - -------------------------------------------------------------------------------- [2021-04-14 12:47:47,054] {taskinstance.py:1043} INFO - Starting attempt 2 of 2 [2021-04-14 12:47:47,074] {taskinstance.py:1044} INFO - -------------------------------------------------------------------------------- [2021-04-14 12:47:47,331] {taskinstance.py:1063} INFO - Executing <Task(MySqlOperator): select_table_mysql> on 2021-04-14T12:46:42.221662+00:00 [2021-04-14 12:47:47,377] {standard_task_runner.py:52} INFO - Started process 66 to run task [2021-04-14 12:47:47,402] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'mysqlHookConnTest', 'select_table_mysql', '2021-04-14T12:46:42.221662+00:00', '--job-id', '142', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/MySqlHookConnTest.py', '--cfg-path', '/tmp/tmppujnrey3', '--error-file', '/tmp/tmpjl_g_p3t'] [2021-04-14 12:47:47,413] {standard_task_runner.py:77} INFO - Job 142: Subtask select_table_mysql [2021-04-14 12:47:47,556] {logging_mixin.py:104} INFO - Running <TaskInstance: mysqlHookConnTest.select_table_mysql 2021-04-14T12:46:42.221662+00:00 [running]> on host ea95b9685a31 [2021-04-14 12:47:47,672] {taskinstance.py:1257} INFO - Exporting the following env vars: AIRFLOW_CTX_DAG_OWNER=airflow AIRFLOW_CTX_DAG_ID=mysqlHookConnTest AIRFLOW_CTX_TASK_ID=select_table_mysql AIRFLOW_CTX_EXECUTION_DATE=2021-04-14T12:46:42.221662+00:00 AIRFLOW_CTX_DAG_RUN_ID=manual__2021-04-14T12:46:42.221662+00:00 [2021-04-14 12:47:47,687] {mysql.py:72} INFO - Executing: SELECT idPais, Nombre, codigo, paisPlataforma, create_date, update_date FROM ob_cpanel.cpanel_pais; [2021-04-14 12:47:47,710] {base.py:74} INFO - Using connection to: id: mysql.
Host: sys-sql-pre-01.oneboxtickets.net, Port: 3306, Schema: , Login: lectura, Password: None, extra: None [2021-04-14 12:47:48,134] {taskinstance.py:1455} ERROR - (2006, 'SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol') Traceback (most recent call last): File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task self._prepare_and_execute_task_with_callbacks(context, task) File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks result = self._execute_task(context, task_copy) File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task result = task_copy.execute(context=context) File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/mysql/operators/mysql.py", line 74, in execute hook.run(self.sql, autocommit=self.autocommit, parameters=self.parameters) File "/home/airflow/.local/lib/python3.6/site-packages/airflow/hooks/dbapi.py", line 173, in run with closing(self.get_conn()) as conn: File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/mysql/hooks/mysql.py", line 144, in get_conn return MySQLdb.connect(**conn_config) File "/home/airflow/.local/lib/python3.6/site-packages/MySQLdb/__init__.py", line 85, in Connect return Connection(*args, **kwargs) File "/home/airflow/.local/lib/python3.6/site-packages/MySQLdb/connections.py", line 208, in __init__ super(Connection, self).__init__(*args, **kwargs2) _mysql_exceptions.OperationalError: (2006, 'SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol') [2021-04-14 12:47:48,143] {taskinstance.py:1503} INFO - Marking task as FAILED. dag_id=mysqlHookConnTest, task_id=select_table_mysql, execution_date=20210414T124642, start_date=20210414T124746, end_date=20210414T124748 [2021-04-14 12:47:48,243] {local_task_job.py:146} INFO - Task exited with return code 1 We have tried to remove the last two parameter from the dag code, and we add in extra field(conn-airflow UI). 
Adding this json {"ssl":false} and the issue appears with another similar error /opt/airflow/logs/mysqlOperatorConnTest/select_table_mysql/2021-04-15T11:26:50.578333+00:00/2.log *** Fetching from: http://airflow-worker-0.airflow-worker.airflow.svc.cluster.local:8793/log/mysqlOperatorConnTest/select_table_mysql/2021-04-15T11:26:50.578333+00:00/2.log [2021-04-15 11:27:54,471] {taskinstance.py:851} INFO - Dependencies all met for <TaskInstance: mysqlOperatorConnTest.select_table_mysql 2021-04-15T11:26:50.578333+00:00 [queued]> [2021-04-15 11:27:54,497] {taskinstance.py:851} INFO - Dependencies all met for <TaskInstance: mysqlOperatorConnTest.select_table_mysql 2021-04-15T11:26:50.578333+00:00 [queued]> [2021-04-15 11:27:54,497] {taskinstance.py:1042} INFO - -------------------------------------------------------------------------------- [2021-04-15 11:27:54,497] {taskinstance.py:1043} INFO - Starting attempt 2 of 2 [2021-04-15 11:27:54,497] {taskinstance.py:1044} INFO - -------------------------------------------------------------------------------- [2021-04-15 11:27:54,507] {taskinstance.py:1063} INFO - Executing <Task(MySqlOperator): select_table_mysql> on 2021-04-15T11:26:50.578333+00:00 [2021-04-15 11:27:54,510] {standard_task_runner.py:52} INFO - Started process 115 to run task [2021-04-15 11:27:54,514] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'mysqlOperatorConnTest', 'select_table_mysql', '2021-04-15T11:26:50.578333+00:00', '--job-id', '68', '--pool', 'default_pool', '--raw', '--subdir', '/opt/airflow/dags/repo/MySqlOperatorConnTest.py', '--cfg-path', '/tmp/tmpy7bv58_z', '--error-file', '/tmp/tmpaoe808of'] [2021-04-15 11:27:54,514] {standard_task_runner.py:77} INFO - Job 68: Subtask select_table_mysql [2021-04-15 11:27:54,644] {logging_mixin.py:104} INFO - Running <TaskInstance: mysqlOperatorConnTest.select_table_mysql 2021-04-15T11:26:50.578333+00:00 [running]> on host airflow-worker-0.airflow-worker.airflow.svc.cluster.local [2021-04-15 11:27:54,707] {logging_mixin.py:104} WARNING - /opt/python/site-packages/sqlalchemy/sql/coercions.py:518 SAWarning: Coercing Subquery object into a select() for use in IN(); please pass a select() construct explicitly [2021-04-15 11:27:54,725] {taskinstance.py:1255} INFO - Exporting the following env vars: AIRFLOW_CTX_DAG_OWNER=airflow AIRFLOW_CTX_DAG_ID=mysqlOperatorConnTest AIRFLOW_CTX_TASK_ID=select_table_mysql AIRFLOW_CTX_EXECUTION_DATE=2021-04-15T11:26:50.578333+00:00 AIRFLOW_CTX_DAG_RUN_ID=manual__2021-04-15T11:26:50.578333+00:00 [2021-04-15 11:27:54,726] {mysql.py:72} INFO - Executing: SELECT idPais, Nombre, codigo, paisPlataforma, create_date, update_date FROM ob_cpanel.cpanel_pais; [2021-04-15 11:27:54,744] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11) Traceback (most recent call last): File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson obj = json.loads(self.extra) File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11) [2021-04-15 11:27:54,744] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql [2021-04-15 11:27:54,744] {base.py:65} INFO - Using connection to: 
id: mysql. Host: sys-sql-pre-01.oneboxtickets.net, Port: 3306, Schema: , Login: lectura, Password: XXXXXXXX, extra: None [2021-04-15 11:27:54,745] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11) Traceback (most recent call last): File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson obj = json.loads(self.extra) File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11) [2021-04-15 11:27:54,745] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql [2021-04-15 11:27:54,745] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11) Traceback (most recent call last): File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson obj = json.loads(self.extra) File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11) [2021-04-15 11:27:54,745] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql [2021-04-15 11:27:54,746] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11) Traceback (most recent call last): File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson obj = json.loads(self.extra) File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11) [2021-04-15 11:27:54,746] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql [2021-04-15 11:27:54,746] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11) Traceback (most recent call last): File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson obj = json.loads(self.extra) File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11) [2021-04-15 11:27:54,746] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql [2021-04-15 11:27:54,746] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11) Traceback (most recent call last): File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson obj = json.loads(self.extra) File "/usr/local/lib/python3.8/json/__init__.py", line 357, in 
loads return _default_decoder.decode(s) File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11) [2021-04-15 11:27:54,747] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql [2021-04-15 11:27:54,747] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11) Traceback (most recent call last): File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson obj = json.loads(self.extra) File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11) [2021-04-15 11:27:54,747] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql [2021-04-15 11:27:54,747] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11) Traceback (most recent call last): File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson obj = json.loads(self.extra) File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11) [2021-04-15 11:27:54,747] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql [2021-04-15 11:27:54,787] {taskinstance.py:1455} ERROR - (2006, 'SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol') Traceback (most recent call last): File "/opt/python/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task self._prepare_and_execute_task_with_callbacks(context, task) File "/opt/python/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks result = self._execute_task(context, task_copy) File "/opt/python/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task result = task_copy.execute(context=context) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/mysql/operators/mysql.py", line 74, in execute hook.run(self.sql, autocommit=self.autocommit, parameters=self.parameters) File "/opt/python/site-packages/airflow/hooks/dbapi.py", line 173, in run with closing(self.get_conn()) as conn: File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/mysql/hooks/mysql.py", line 144, in get_conn return MySQLdb.connect(**conn_config) File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/__init__.py", line 85, in Connect return Connection(*args, **kwargs) File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 208, in __init__ super(Connection, self).__init__(*args, **kwargs2) _mysql_exceptions.OperationalError: (2006, 'SSL connection error: error:1425F102:SSL 
routines:ssl_choose_client_version:unsupported protocol') [2021-04-15 11:27:54,788] {taskinstance.py:1496} INFO - Marking task as FAILED. dag_id=mysqlOperatorConnTest, task_id=select_table_mysql, execution_date=20210415T112650, start_date=20210415T112754, end_date=20210415T112754 [2021-04-15 11:27:54,845] {local_task_job.py:146} INFO - Task exited with return code 1
We solved this issue by switching the MySQL client to 5.7. Our server version was 5.6 and the previous client was 8 (we were using a Docker image), so we downgraded the client to be closer to the server version.
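Separately, the second log above also shows Airflow repeatedly failing to parse the connection's Extra field ("Failed parsing the json for conn_id mysql"); whatever goes into Extra has to be a single valid JSON object. A quick local check of the text before pasting it into the UI (a sketch, not part of the original answer):

import json

extra = '{"ssl": false}'   # whatever you intend to put in the connection's Extra field
print(json.loads(extra))   # raises json.JSONDecodeError if the text is not valid JSON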
KeyError: 'ti' in Apache Airflow xcom
We are trying to run a simple DAG with two tasks that communicate data via XCom.

DAG file:

from __future__ import print_function

import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(2)
}

dag = DAG(
    'example_xcom',
    schedule_interval="@once",
    default_args=args)

value_1 = [1, 2, 3]

def push(**kwargs):
    # pushes an XCom without a specific target
    kwargs['ti'].xcom_push(key='value from pusher 1', value=value_1)

def puller(**kwargs):
    ti = kwargs['ti']
    v1 = ti.xcom_pull(key=None, task_ids='push')
    assert v1 == value_1
    v1 = ti.xcom_pull(key=None, task_ids=['push'])
    assert (v1) == (value_1)

push1 = PythonOperator(
    task_id='push', dag=dag, python_callable=push)

pull = BashOperator(
    task_id='also_run_this',
    bash_command='echo {{ ti.xcom_pull(task_ids="push_by_returning") }}',
    dag=dag)

pull.set_upstream(push1)

But while running the DAG in Airflow we get the following exception:

[2018-09-27 16:55:33,431] {base_task_runner.py:98} INFO - Subtask: [2018-09-27 16:55:33,430] {models.py:189} INFO - Filling up the DagBag from /home/airflow/gcs/dags/xcom.py
[2018-09-27 16:55:33,694] {base_task_runner.py:98} INFO - Subtask: Traceback (most recent call last):
[2018-09-27 16:55:33,694] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/bin/airflow", line 27, in <module>
[2018-09-27 16:55:33,696] {base_task_runner.py:98} INFO - Subtask: args.func(args)
[2018-09-27 16:55:33,697] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 392, in run
[2018-09-27 16:55:33,697] {base_task_runner.py:98} INFO - Subtask: pool=args.pool,
[2018-09-27 16:55:33,698] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
[2018-09-27 16:55:33,699] {base_task_runner.py:98} INFO - Subtask: result = func(*args, **kwargs)
[2018-09-27 16:55:33,699] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1492, in _run_raw_task
[2018-09-27 16:55:33,701] {base_task_runner.py:98} INFO - Subtask: result = task_copy.execute(context=context)
[2018-09-27 16:55:33,701] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 89, in execute
[2018-09-27 16:55:33,702] {base_task_runner.py:98} INFO - Subtask: return_value = self.execute_callable()
[2018-09-27 16:55:33,703] {base_task_runner.py:98} INFO - Subtask: File "/usr/local/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 94, in execute_callable
[2018-09-27 16:55:33,703] {base_task_runner.py:98} INFO - Subtask: return self.python_callable(*self.op_args, **self.op_kwargs)
[2018-09-27 16:55:33,704] {base_task_runner.py:98} INFO - Subtask: File "/home/airflow/gcs/dags/xcom.py", line 22, in push
[2018-09-27 16:55:33,707] {base_task_runner.py:98} INFO - Subtask: kwargs['ti'].xcom_push(key='value from pusher 1', value=value_1)
[2018-09-27 16:55:33,708] {base_task_runner.py:98} INFO - Subtask: KeyError: 'ti'

We have validated the DAG and there is no issue with it. Please help us fix this issue.
Add provide_context: True to the default args. This is needed to define **kwargs.

args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(2),
    'provide_context': True
}

From the docs: provide_context (bool) – if set to true, Airflow will pass a set of keyword arguments that can be used in your function. This set of kwargs corresponds exactly to what you can use in your Jinja templates. For this to work, you need to define **kwargs in your function header.
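Equivalently, on Airflow 1.x the flag can be passed directly to the operator rather than through default_args; a minimal sketch based on the push task from the question:

push1 = PythonOperator(
    task_id='push',
    dag=dag,
    provide_context=True,   # required on Airflow 1.x so 'ti' is injected into **kwargs
    python_callable=push)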
Airflow 1.9 - SSHOperator doesn't seem to work?
I upgraded to v1.9 and I'm having a hard time getting the SSHOperator to work. It was working with v1.8.2.

Code:

dag = DAG('transfer_ftp_s3', default_args=default_args, schedule_interval=None)

task = SSHOperator(
    ssh_conn_id='ssh_node',
    task_id="check_ftp_for_new_files",
    command="echo 'hello world'",
    do_xcom_push=True,
    dag=dag,)

Error:

[2018-02-19 06:48:02,691] {{base_task_runner.py:98}} INFO - Subtask: Traceback (most recent call last):
[2018-02-19 06:48:02,691] {{base_task_runner.py:98}} INFO - Subtask: File "/usr/bin/airflow", line 27, in <module>
[2018-02-19 06:48:02,692] {{base_task_runner.py:98}} INFO - Subtask: args.func(args)
[2018-02-19 06:48:02,693] {{base_task_runner.py:98}} INFO - Subtask: File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 392, in run
[2018-02-19 06:48:02,695] {{base_task_runner.py:98}} INFO - Subtask: pool=args.pool,
[2018-02-19 06:48:02,695] {{base_task_runner.py:98}} INFO - Subtask: File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
[2018-02-19 06:48:02,696] {{base_task_runner.py:98}} INFO - Subtask: result = func(*args, **kwargs)
[2018-02-19 06:48:02,696] {{base_task_runner.py:98}} INFO - Subtask: File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1496, in _run_raw_task
[2018-02-19 06:48:02,696] {{base_task_runner.py:98}} INFO - Subtask: result = task_copy.execute(context=context)
[2018-02-19 06:48:02,697] {{base_task_runner.py:98}} INFO - Subtask: File "/usr/lib/python2.7/site-packages/airflow/contrib/operators/ssh_operator.py", line 146, in execute
[2018-02-19 06:48:02,697] {{base_task_runner.py:98}} INFO - Subtask: raise AirflowException("SSH operator error: {0}".format(str(e)))
[2018-02-19 06:48:02,698] {{base_task_runner.py:98}} INFO - Subtask: airflow.exceptions.AirflowException: SSH operator error: 'bool' object has no attribute 'lower'
As described in AIRFLOW-2122, check your connection settings and make sure the values in the connection's Extra field are strings rather than booleans.
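The "'bool' object has no attribute 'lower'" error is consistent with an extra option being stored as a JSON boolean; a tiny illustration of the difference (the no_host_key_check key below is just an example of an SSH extra option):

import json

extra_ok = json.loads('{"no_host_key_check": "true"}')   # value is a str
extra_bad = json.loads('{"no_host_key_check": true}')    # value is a bool

print(extra_ok["no_host_key_check"].lower())             # works: "true"
try:
    extra_bad["no_host_key_check"].lower()
except AttributeError as exc:
    print(exc)                                           # 'bool' object has no attribute 'lower'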