google.api_core.exceptions.NotFound: bucket does not exist - airflow

When I run the data_ingestion_gcs_dag DAG in Airflow, I get an error saying that the specified bucket cannot be found, even though I have rechecked it and the bucket name is fine. I have configured access to the Google account through docker-compose; the relevant (first) part of the file is below:
version: '3'
x-airflow-common:
  &airflow-common
  # In order to add custom dependencies or upgrade provider packages you can use your extended image.
  # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
  # and uncomment the "build" line below. Then run `docker-compose build` to build the images.
  build:
    context: .
    dockerfile: ./Dockerfile
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
    GOOGLE_APPLICATION_CREDENTIALS: /.google/credentials/google_credentials.json
    AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT: 'google-cloud-platform://?extra__google_cloud_platform__key_path=/.google/credentials/google_credentials.json'
    # TODO: Please change GCP_PROJECT_ID & GCP_GCS_BUCKET, as per your config
    GCP_PROJECT_ID: 'real-dtc-de'
    GCP_GCS_BUCKET: 'dtc_data_lake_real-dtc-de'
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ~/.google/credentials/:/.google/credentials:ro
And here is the relevant part of the DAG code:
PROJECT_ID = os.environ.get("GCP_PROJECT_ID")
BUCKET = os.environ.get("GCP_GCS_BUCKET")
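The traceback below points at an upload_to_gcs callable (line 51 of data_ingestion_gcs_dag.py) calling blob.upload_from_filename. That function is not included in the question; a minimal sketch of what it presumably looks like, assuming the usual google-cloud-storage client pattern, is:

from google.cloud import storage

def upload_to_gcs(bucket, object_name, local_file):
    # Sketch only: mirrors the call seen in the traceback below.
    client = storage.Client()               # picks up GOOGLE_APPLICATION_CREDENTIALS
    gcs_bucket = client.bucket(bucket)      # the bucket must already exist
    blob = gcs_bucket.blob(object_name)
    blob.upload_from_filename(local_file)   # raises NotFound (404) if the bucket does not exist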
Here are the logs from the DAG:
*** Reading local file: /opt/airflow/logs/data_ingestion_gcs_dag/local_to_gcs_task/2022-06-13T02:47:29.654918+00:00/1.log
[2022-06-13, 02:47:36 UTC] {taskinstance.py:1032} INFO - Dependencies all met for <TaskInstance: data_ingestion_gcs_dag.local_to_gcs_task manual__2022-06-13T02:47:29.654918+00:00 [queued]>
[2022-06-13, 02:47:36 UTC] {taskinstance.py:1032} INFO - Dependencies all met for <TaskInstance: data_ingestion_gcs_dag.local_to_gcs_task manual__2022-06-13T02:47:29.654918+00:00 [queued]>
[2022-06-13, 02:47:36 UTC] {taskinstance.py:1238} INFO -
--------------------------------------------------------------------------------
[2022-06-13, 02:47:36 UTC] {taskinstance.py:1239} INFO - Starting attempt 1 of 2
[2022-06-13, 02:47:36 UTC] {taskinstance.py:1240} INFO -
--------------------------------------------------------------------------------
[2022-06-13, 02:47:36 UTC] {taskinstance.py:1259} INFO - Executing <Task(PythonOperator): local_to_gcs_task> on 2022-06-13 02:47:29.654918+00:00
[2022-06-13, 02:47:36 UTC] {standard_task_runner.py:52} INFO - Started process 1042 to run task
[2022-06-13, 02:47:36 UTC] {standard_task_runner.py:76} INFO - Running: ['***', 'tasks', 'run', 'data_ingestion_gcs_dag', 'local_to_gcs_task', 'manual__2022-06-13T02:47:29.654918+00:00', '--job-id', '11', '--raw', '--subdir', 'DAGS_FOLDER/data_ingestion_gcs_dag.py', '--cfg-path', '/tmp/tmp11gg9aoy', '--error-file', '/tmp/tmpjbp6yrks']
[2022-06-13, 02:47:36 UTC] {standard_task_runner.py:77} INFO - Job 11: Subtask local_to_gcs_task
[2022-06-13, 02:47:36 UTC] {logging_mixin.py:109} INFO - Running <TaskInstance: data_ingestion_gcs_dag.local_to_gcs_task manual__2022-06-13T02:47:29.654918+00:00 [running]> on host aea7312db396
[2022-06-13, 02:47:36 UTC] {taskinstance.py:1426} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=***
AIRFLOW_CTX_DAG_ID=data_ingestion_gcs_dag
AIRFLOW_CTX_TASK_ID=local_to_gcs_task
AIRFLOW_CTX_EXECUTION_DATE=2022-06-13T02:47:29.654918+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-06-13T02:47:29.654918+00:00
[2022-06-13, 02:47:36 UTC] {taskinstance.py:1700} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 2594, in upload_from_file
retry=retry,
File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 2396, in _do_upload
retry=retry,
File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1917, in _do_multipart_upload
transport, data, object_metadata, content_type, timeout=timeout
File "/home/airflow/.local/lib/python3.7/site-packages/google/resumable_media/requests/upload.py", line 154, in transmit
retriable_request, self._get_status_code, self._retry_strategy
File "/home/airflow/.local/lib/python3.7/site-packages/google/resumable_media/requests/_request_helpers.py", line 147, in wait_and_retry
response = func()
File "/home/airflow/.local/lib/python3.7/site-packages/google/resumable_media/requests/upload.py", line 149, in retriable_request
self._process_response(result)
File "/home/airflow/.local/lib/python3.7/site-packages/google/resumable_media/_upload.py", line 113, in _process_response
_helpers.require_status_code(response, (http.client.OK,), self._get_status_code)
File "/home/airflow/.local/lib/python3.7/site-packages/google/resumable_media/_helpers.py", line 104, in require_status_code
*status_codes
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1329, in _run_raw_task
self._execute_task_with_callbacks(context)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1455, in _execute_task_with_callbacks
result = self._execute_task(context, self.task)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1511, in _execute_task
result = execute_callable(context=context)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 174, in execute
return_value = self.execute_callable()
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 185, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/opt/airflow/dags/data_ingestion_gcs_dag.py", line 51, in upload_to_gcs
blob.upload_from_filename(local_file)
File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 2735, in upload_from_filename
retry=retry,
File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 2598, in upload_from_file
_raise_from_invalid_response(exc)
File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 4466, in _raise_from_invalid_response
raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.NotFound: 404 POST https://storage.googleapis.com/upload/storage/v1/b/dtc_data_lake_animated-surfer-338618/o?uploadType=multipart: {
"error": {
"code": 404,
"message": "The specified bucket does not exist.",
"errors": [
{
"message": "The specified bucket does not exist.",
"domain": "global",
"reason": "notFound"
}
]
}
}
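A quick sanity check, offered here only as a sketch (it assumes you can open a Python shell inside the Airflow worker container), is to confirm that the mounted service account can actually see the bucket the DAG resolves from GCP_GCS_BUCKET:

import os
from google.cloud import storage

# Uses the same GOOGLE_APPLICATION_CREDENTIALS and env vars the DAG sees.
client = storage.Client(project=os.environ.get("GCP_PROJECT_ID"))
bucket_name = os.environ.get("GCP_GCS_BUCKET")
print(bucket_name, client.bucket(bucket_name).exists())

Note that the bucket in the 404 URL above (dtc_data_lake_animated-surfer-338618) is not the same as the GCP_GCS_BUCKET value in the docker-compose file (dtc_data_lake_real-dtc-de), so it is worth checking which bucket name the running DAG actually picks up.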

Related

Airflow exceptions thrown in on_retry_callback are suppressed

We are using the on_retry_callback parameter available in the Airflow operators to do some cleanup activities before the task is retried. If exceptions are thrown in the on_retry_callback function, they are not logged in the task instance's log. Without the exception details, it is difficult to debug issues in the on_retry_callback function. If this is the default behavior, is there a workaround to enable logging for these exceptions?
Note: we are using Airflow version 2.0.2.
Please let me know if there are any questions.
A sample DAG to illustrate this is given below.
from datetime import datetime
from airflow.operators.python import PythonOperator
from airflow.models.dag import DAG

def sample_function2():
    var = 1 / 0

def on_retry_callback_sample(context):
    print(f'on_retry_callback_started')
    v = 1 / 0
    print(f'on_retry_callback completed')

dag = DAG(
    'venkat-test-dag',
    description='This is a test dag',
    start_date=datetime(2023, 1, 10, 18, 0),
    schedule_interval='0 12 * * *',
    catchup=False
)

func2 = PythonOperator(task_id='function2',
                       python_callable=sample_function2,
                       dag=dag,
                       retries=2,
                       on_retry_callback=on_retry_callback_sample)

func2
The log file of this run on the local Airflow setup is given below. The last message in the log is "on_retry_callback_started", but I expected a ZeroDivisionError after this line and finally the line "on_retry_callback completed". How can I achieve this?
14f0fed99882
*** Reading local file: /usr/local/airflow/logs/venkat-test-dag/function2/2023-01-13T13:22:03.178261+00:00/1.log
[2023-01-13 13:22:05,091] {{taskinstance.py:877}} INFO - Dependencies all met for <TaskInstance: venkat-test-dag.function2 2023-01-13T13:22:03.178261+00:00 [queued]>
[2023-01-13 13:22:05,128] {{taskinstance.py:877}} INFO - Dependencies all met for <TaskInstance: venkat-test-dag.function2 2023-01-13T13:22:03.178261+00:00 [queued]>
[2023-01-13 13:22:05,128] {{taskinstance.py:1068}} INFO -
--------------------------------------------------------------------------------
[2023-01-13 13:22:05,128] {{taskinstance.py:1069}} INFO - Starting attempt 1 of 3
[2023-01-13 13:22:05,128] {{taskinstance.py:1070}} INFO -
--------------------------------------------------------------------------------
[2023-01-13 13:22:05,143] {{taskinstance.py:1089}} INFO - Executing <Task(PythonOperator): function2> on 2023-01-13T13:22:03.178261+00:00
[2023-01-13 13:22:05,145] {{standard_task_runner.py:52}} INFO - Started process 6947 to run task
[2023-01-13 13:22:05,150] {{standard_task_runner.py:76}} INFO - Running: ['airflow', 'tasks', 'run', 'venkat-test-dag', 'function2', '2023-01-13T13:22:03.178261+00:00', '--job-id', '356', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/dp-etl-mixpanel_stg-24H/dags/venkat-test-dag.py', '--cfg-path', '/tmp/tmpny0mhh4j', '--error-file', '/tmp/tmpul506kro']
[2023-01-13 13:22:05,151] {{standard_task_runner.py:77}} INFO - Job 356: Subtask function2
[2023-01-13 13:22:05,244] {{logging_mixin.py:104}} INFO - Running <TaskInstance: venkat-test-dag.function2 2023-01-13T13:22:03.178261+00:00 [running]> on host 14f0fed99882
[2023-01-13 13:22:05,345] {{taskinstance.py:1283}} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=venkat-test-dag
AIRFLOW_CTX_TASK_ID=function2
AIRFLOW_CTX_EXECUTION_DATE=2023-01-13T13:22:03.178261+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-01-13T13:22:03.178261+00:00
[2023-01-13 13:22:05,346] {{taskinstance.py:1482}} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 117, in execute
return_value = self.execute_callable()
File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 128, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/usr/local/airflow/dags/dp-etl-mixpanel_stg-24H/dags/venkat-test-dag.py", line 7, in sample_function2
var = 1 / 0
ZeroDivisionError: division by zero
[2023-01-13 13:22:05,349] {{taskinstance.py:1532}} INFO - Marking task as UP_FOR_RETRY. dag_id=venkat-test-dag, task_id=function2, execution_date=20230113T132203, start_date=20230113T132205, end_date=20230113T132205
[2023-01-13 13:22:05,402] {{local_task_job.py:146}} INFO - Task exited with return code 1
[2023-01-13 13:22:05,459] {{logging_mixin.py:104}} INFO - on_retry_callback_started
Adding as an answer for visibility:
This issue is likely related to a fix which was merged in Airflow version 2.1.3:
https://github.com/apache/airflow/pull/17347
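Until an upgrade past that fix is possible, one workaround (a sketch, not taken from the linked PR) is to catch and log exceptions inside the callback yourself, so that they at least show up in the task log:

import logging

log = logging.getLogger(__name__)

def on_retry_callback_sample(context):
    try:
        print('on_retry_callback_started')
        v = 1 / 0  # the cleanup logic that may fail
        print('on_retry_callback completed')
    except Exception:
        # Without this, affected Airflow versions silently swallow the exception.
        log.exception('on_retry_callback failed')
        raise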

mlflow model artifacts are not getting stored while running the Airflow DAG; because of that, I am unable to fetch experiment details

**mlflow training code:**
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# training the model and saving the model artifacts
mlflow.set_registry_uri('postgresql://postgres:postgres@localhost/mlflow')
mlflow.set_experiment('testing_mlflow_with_airflow')

with mlflow.start_run():
    # creating the training dataframe
    train_x = self.train_data[0]
    train_y = self.train_data[1]
    # training the given model
    model.fit(train_x, train_y)
    mlflow.sklearn.log_model(model, "model")

# getting the experiment details by experiment name
experiment_id = client.get_experiment_by_name('testing_mlflow_with_airflow').experiment_id
experiment_results = mlflow.search_runs(experiment_ids=experiment_id)
**airflow task code:**
training = BashOperator(
    task_id='mlflow_training',
    bash_command='python3 /home/vasanth/airflow/scripts/mlproject/src/models/train_mlflow.py',
    do_xcom_push=False
)
**airflow error:**
[2022-04-29, 13:01:08 UTC] {subprocess.py:74} INFO - Running command: ['bash', '-c', 'python3 /home/vasanth/airflow/scripts/mlproject/src/models/train_mlflow.py']
[2022-04-29, 13:01:08 UTC] {subprocess.py:85} INFO - Output:
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - WARNING:root:Malformed experiment '2'. Detailed error Yaml file '/tmp/airflowtmpzjvuldm6/mlruns/2/meta.yaml' does not exist.
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - Traceback (most recent call last):
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - File "/usr/local/lib/python3.8/dist-packages/mlflow/store/tracking/file_store.py", line 262, in list_experiments
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - experiment = self._get_experiment(exp_id, view_type)
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - File "/usr/local/lib/python3.8/dist-packages/mlflow/store/tracking/file_store.py", line 341, in _get_experiment
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - meta = read_yaml(experiment_dir, FileStore.META_DATA_FILE_NAME)
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - File "/usr/local/lib/python3.8/dist-packages/mlflow/utils/file_utils.py", line 179, in read_yaml
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - raise MissingConfigException("Yaml file '%s' does not exist." % file_path)
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - mlflow.exceptions.MissingConfigException: Yaml file '/tmp/airflowtmpzjvuldm6/mlruns/2/meta.yaml' does not exist.
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - tracking uri ***ql://***:***#localhost/mlflow
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - Traceback (most recent call last):
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - File "/home/vasanth/airflow/scripts/mlproject/src/models/train_mlflow.py", line 44, in <module>
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - experiment_id = client.get_experiment_by_name('testing_mlflow_with_airflow').experiment_id
[2022-04-29, 13:01:28 UTC] {subprocess.py:89} INFO - AttributeError: 'NoneType' object has no attribute 'experiment_id'
[2022-04-29, 13:01:29 UTC] {subprocess.py:93} INFO - Command exited with return code 1
[2022-04-29, 13:01:29 UTC] {taskinstance.py:1774} ERROR - Task failed with exception
How can I set a directory where all my experiment run artifacts will be stored?
Where are the mlflow artifacts getting stored now?
How can I find the details of all runs through the mlflow client, as in the code above?
I have tried different approaches, but none of them worked, e.g. setting the tracking server as below:
mlflow.set_tracking_uri('postgresql://postgres:postgres@localhost/mlflow')
mlflow.set_tracking_uri('file:///tmp/mlruns')
mlflow.set_tracking_uri('http://localhost:5000')
mlflow.set_registry_uri('postgresql://postgres:postgres@localhost/mlflow')
mlflow.set_tracking_uri('/home/vasanth/airflow/scripts/mlproject/src/models')
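For reference, a minimal sketch of pinning both the tracking store and the artifact location explicitly; the URIs and paths here are assumptions and need to be adapted to the actual setup:

import mlflow
from mlflow.tracking import MlflowClient

# Assumption: an MLflow tracking server is running at this address, started e.g. with
# `mlflow server --backend-store-uri postgresql://... --default-artifact-root /some/shared/path`.
mlflow.set_tracking_uri('http://localhost:5000')

experiment_name = 'testing_mlflow_with_airflow'
client = MlflowClient()
experiment = client.get_experiment_by_name(experiment_name)
if experiment is None:
    # artifact_location controls where run artifacts for this experiment are written.
    experiment_id = client.create_experiment(experiment_name,
                                             artifact_location='/home/vasanth/mlflow_artifacts')
else:
    experiment_id = experiment.experiment_id

with mlflow.start_run(experiment_id=experiment_id):
    mlflow.log_param('demo', 1)

print(mlflow.search_runs(experiment_ids=[experiment_id]))

Because the DAG above runs the script through a BashOperator, it executes in a temporary working directory, which is presumably why the default relative ./mlruns file store ends up under /tmp/airflowtmp... as shown in the error; pointing the tracking URI at a server or an absolute path avoids that.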

Airflow job returns successfully. Deployed dataflow job nowhere to be found

When I run the job locally using the jar, it deploys and finishes successfully, i.e. I can see the output files in GCS:
java -cp /Users/zainqasmi/Workspace/vasa/dataflow/build/libs/vasa-dataflow-2022-03-25-12-27-14-784-all.jar com.nianticproject.geodata.extraction.ExtractGeodata \
--project=vasa-dev \
--configurationPath=/Users/zainqasmi/Workspace/vasa/dataflow/src/main/resources/foursquare/extract.pb.txt \
--region=us-central1 \
--runner=DataflowRunner \
--dryRun=false \
--workerMachineType=n2d-highmem-16
However, when I push the DAG to Airflow, it apparently runs successfully, i.e. the task is marked as SUCCESS and exits with return code 0, but I can't find the Dataflow job anywhere in the GCP UI. Am I missing something? I am using environment composer-2-0-7-airflow-2-2-3.
Logs from Airflow:
*** Reading remote log from gs://us-central1-airflow-dev-b0cc30af-bucket/logs/foursquare_1/extract_geodata/2022-03-25T22:52:15.382542+00:00/1.log.
[2022-03-25, 22:52:21 UTC] {taskinstance.py:1033} INFO - Dependencies all met for <TaskInstance: foursquare_1.extract_geodata manual__2022-03-25T22:52:15.382542+00:00 [queued]>
[2022-03-25, 22:52:21 UTC] {taskinstance.py:1033} INFO - Dependencies all met for <TaskInstance: foursquare_1.extract_geodata manual__2022-03-25T22:52:15.382542+00:00 [queued]>
[2022-03-25, 22:52:21 UTC] {taskinstance.py:1239} INFO -
--------------------------------------------------------------------------------
[2022-03-25, 22:52:21 UTC] {taskinstance.py:1240} INFO - Starting attempt 1 of 2
[2022-03-25, 22:52:21 UTC] {taskinstance.py:1241} INFO -
--------------------------------------------------------------------------------
[2022-03-25, 22:52:21 UTC] {taskinstance.py:1260} INFO - Executing <Task(DataFlowJavaOperator): extract_geodata> on 2022-03-25 22:52:15.382542+00:00
[2022-03-25, 22:52:21 UTC] {standard_task_runner.py:52} INFO - Started process 57323 to run task
[2022-03-25, 22:52:21 UTC] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'foursquare_1', 'extract_geodata', 'manual__2022-03-25T22:52:15.382542+00:00', '--job-id', '1531', '--raw', '--subdir', 'DAGS_FOLDER/dataflow_operator_test.py', '--cfg-path', '/tmp/tmp4thgd6do', '--error-file', '/tmp/tmpu6crkval']
[2022-03-25, 22:52:21 UTC] {standard_task_runner.py:77} INFO - Job 1531: Subtask extract_geodata
[2022-03-25, 22:52:22 UTC] {logging_mixin.py:109} INFO - Running <TaskInstance: foursquare_1.extract_geodata manual__2022-03-25T22:52:15.382542+00:00 [running]> on host airflow-worker-9rz89
[2022-03-25, 22:52:22 UTC] {taskinstance.py:1426} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=foursquare_1
AIRFLOW_CTX_TASK_ID=extract_geodata
AIRFLOW_CTX_EXECUTION_DATE=2022-03-25T22:52:15.382542+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-03-25T22:52:15.382542+00:00
[2022-03-25, 22:52:22 UTC] {credentials_provider.py:312} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
[2022-03-25, 22:52:22 UTC] {taskinstance.py:1268} INFO - Marking task as SUCCESS. dag_id=foursquare_1, task_id=extract_geodata, execution_date=20220325T225215, start_date=20220325T225221, end_date=20220325T225222
[2022-03-25, 22:52:22 UTC] {local_task_job.py:154} INFO - Task exited with return code 0
[2022-03-25, 22:52:22 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
Dag:
GCP_PROJECT = "vasa-dev"
CONNECTION_ID = 'bigquery_default'
VASA_DATAFLOW_JAR = '/home/airflow/gcs/data/bin/vasa-dataflow-2022-03-25-16-36-09-008-all.jar'

default_args = {
    'owner': 'airflow',
    'depends_on_past': True,
    'wait_for_downstream': True,
    'max_active_runs': 1,
    'start_date': days_ago(1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(days=1),
}

with DAG(
    dag_id='foursquare_1',
    schedule_interval=timedelta(days=1),
    default_args=default_args
) as dag:

    kick_off_dag = DummyOperator(task_id='run_this_first')

    extract_geodata = DataFlowJavaOperator(
        task_id='extract_geodata',
        jar=VASA_DATAFLOW_JAR,
        job_class='com.nianticproject.geodata.extraction.ExtractGeodata',
        options={
            "project": "vasa-dev",
            "configurationPath": "/home/airflow/gcs/foursquare/extract.pb.txt",
            "region": "us-central1",
            "runner": "DataflowRunner",
            "dryRun": "false",
            "workerMachineType": "n2d-highmem-16",
        },
        dag=dag)

    end_task = BashOperator(
        task_id='end_task',
        bash_command='echo {{ execution_date.subtract(months=1).replace(day=1).strftime("%Y-%m-%d") }}',
        dag=dag,
    )

    kick_off_dag >> extract_geodata >> end_task

MysqlOperator in airflow 2.0.1 failed with "ssl connection error"

I am new to Airflow and I am trying to test a MySQL connection using MySqlOperator in Airflow 2.0.1. However, I am getting an SSL connection error. I have tried to add extra parameters to disable SSL mode, but I still get the same error.
Here is my code (I tried to pass the SSL param = disable in the code), and it doesn't work:
from airflow import DAG
from airflow.providers.mysql.operators.mysql import MySqlOperator
from airflow.operators.python import PythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.dates import days_ago, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'retries': 1,
    'retry_delay': timedelta(minutes=1)
}

with DAG(
    'mysqlConnTest',
    default_args=default_args,
    schedule_interval='@once',
    catchup=False) as dag:

    start_date = DummyOperator(task_id="start_task")

    # [START howto_operator_mysql]
    select_table_mysql_task = MySqlOperator(
        task_id='select_table_mysql',
        mysql_conn_id='mysql',
        sql="SELECT * FROM country;",
        autocommit=True,
        parameters={'ssl_mode': 'DISABLED'}
    )

    start_date >> select_table_mysql_task
and here is the error
*** Reading local file: /opt/airflow/logs/mysqlHookConnTest/select_table_mysql/2021-04-14T12:46:42.221662+00:00/2.log
[2021-04-14 12:47:46,791] {taskinstance.py:851} INFO - Dependencies all met for <TaskInstance: mysqlHookConnTest.select_table_mysql 2021-04-14T12:46:42.221662+00:00 [queued]>
[2021-04-14 12:47:47,007] {taskinstance.py:851} INFO - Dependencies all met for <TaskInstance: mysqlHookConnTest.select_table_mysql 2021-04-14T12:46:42.221662+00:00 [queued]>
[2021-04-14 12:47:47,047] {taskinstance.py:1042} INFO -
--------------------------------------------------------------------------------
[2021-04-14 12:47:47,054] {taskinstance.py:1043} INFO - Starting attempt 2 of 2
[2021-04-14 12:47:47,074] {taskinstance.py:1044} INFO -
--------------------------------------------------------------------------------
[2021-04-14 12:47:47,331] {taskinstance.py:1063} INFO - Executing <Task(MySqlOperator): select_table_mysql> on 2021-04-14T12:46:42.221662+00:00
[2021-04-14 12:47:47,377] {standard_task_runner.py:52} INFO - Started process 66 to run task
[2021-04-14 12:47:47,402] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'mysqlHookConnTest', 'select_table_mysql', '2021-04-14T12:46:42.221662+00:00', '--job-id', '142', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/MySqlHookConnTest.py', '--cfg-path', '/tmp/tmppujnrey3', '--error-file', '/tmp/tmpjl_g_p3t']
[2021-04-14 12:47:47,413] {standard_task_runner.py:77} INFO - Job 142: Subtask select_table_mysql
[2021-04-14 12:47:47,556] {logging_mixin.py:104} INFO - Running <TaskInstance: mysqlHookConnTest.select_table_mysql 2021-04-14T12:46:42.221662+00:00 [running]> on host ea95b9685a31
[2021-04-14 12:47:47,672] {taskinstance.py:1257} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=mysqlHookConnTest
AIRFLOW_CTX_TASK_ID=select_table_mysql
AIRFLOW_CTX_EXECUTION_DATE=2021-04-14T12:46:42.221662+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-04-14T12:46:42.221662+00:00
[2021-04-14 12:47:47,687] {mysql.py:72} INFO - Executing: SELECT idPais, Nombre, codigo, paisPlataforma, create_date, update_date FROM ob_cpanel.cpanel_pais;
[2021-04-14 12:47:47,710] {base.py:74} INFO - Using connection to: id: mysql. Host: sys-sql-pre-01.oneboxtickets.net, Port: 3306, Schema: , Login: lectura, Password: None, extra: None
[2021-04-14 12:47:48,134] {taskinstance.py:1455} ERROR - (2006, 'SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol')
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task
result = task_copy.execute(context=context)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/mysql/operators/mysql.py", line 74, in execute
hook.run(self.sql, autocommit=self.autocommit, parameters=self.parameters)
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/hooks/dbapi.py", line 173, in run
with closing(self.get_conn()) as conn:
File "/home/airflow/.local/lib/python3.6/site-packages/airflow/providers/mysql/hooks/mysql.py", line 144, in get_conn
return MySQLdb.connect(**conn_config)
File "/home/airflow/.local/lib/python3.6/site-packages/MySQLdb/__init__.py", line 85, in Connect
return Connection(*args, **kwargs)
File "/home/airflow/.local/lib/python3.6/site-packages/MySQLdb/connections.py", line 208, in __init__
super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (2006, 'SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol')
[2021-04-14 12:47:48,143] {taskinstance.py:1503} INFO - Marking task as FAILED. dag_id=mysqlHookConnTest, task_id=select_table_mysql, execution_date=20210414T124642, start_date=20210414T124746, end_date=20210414T124748
[2021-04-14 12:47:48,243] {local_task_job.py:146} INFO - Task exited with return code 1
We have tried removing the last two parameters from the DAG code and instead adding the following JSON in the connection's extra field (in the Airflow UI):
{"ssl": false}
and the issue reappears with another, similar error:
/opt/airflow/logs/mysqlOperatorConnTest/select_table_mysql/2021-04-15T11:26:50.578333+00:00/2.log
*** Fetching from: http://airflow-worker-0.airflow-worker.airflow.svc.cluster.local:8793/log/mysqlOperatorConnTest/select_table_mysql/2021-04-15T11:26:50.578333+00:00/2.log
[2021-04-15 11:27:54,471] {taskinstance.py:851} INFO - Dependencies all met for <TaskInstance: mysqlOperatorConnTest.select_table_mysql 2021-04-15T11:26:50.578333+00:00 [queued]>
[2021-04-15 11:27:54,497] {taskinstance.py:851} INFO - Dependencies all met for <TaskInstance: mysqlOperatorConnTest.select_table_mysql 2021-04-15T11:26:50.578333+00:00 [queued]>
[2021-04-15 11:27:54,497] {taskinstance.py:1042} INFO -
--------------------------------------------------------------------------------
[2021-04-15 11:27:54,497] {taskinstance.py:1043} INFO - Starting attempt 2 of 2
[2021-04-15 11:27:54,497] {taskinstance.py:1044} INFO -
--------------------------------------------------------------------------------
[2021-04-15 11:27:54,507] {taskinstance.py:1063} INFO - Executing <Task(MySqlOperator): select_table_mysql> on 2021-04-15T11:26:50.578333+00:00
[2021-04-15 11:27:54,510] {standard_task_runner.py:52} INFO - Started process 115 to run task
[2021-04-15 11:27:54,514] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'mysqlOperatorConnTest', 'select_table_mysql', '2021-04-15T11:26:50.578333+00:00', '--job-id', '68', '--pool', 'default_pool', '--raw', '--subdir', '/opt/airflow/dags/repo/MySqlOperatorConnTest.py', '--cfg-path', '/tmp/tmpy7bv58_z', '--error-file', '/tmp/tmpaoe808of']
[2021-04-15 11:27:54,514] {standard_task_runner.py:77} INFO - Job 68: Subtask select_table_mysql
[2021-04-15 11:27:54,644] {logging_mixin.py:104} INFO - Running <TaskInstance: mysqlOperatorConnTest.select_table_mysql 2021-04-15T11:26:50.578333+00:00 [running]> on host airflow-worker-0.airflow-worker.airflow.svc.cluster.local
[2021-04-15 11:27:54,707] {logging_mixin.py:104} WARNING - /opt/python/site-packages/sqlalchemy/sql/coercions.py:518 SAWarning: Coercing Subquery object into a select() for use in IN(); please pass a select() construct explicitly
[2021-04-15 11:27:54,725] {taskinstance.py:1255} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=mysqlOperatorConnTest
AIRFLOW_CTX_TASK_ID=select_table_mysql
AIRFLOW_CTX_EXECUTION_DATE=2021-04-15T11:26:50.578333+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-04-15T11:26:50.578333+00:00
[2021-04-15 11:27:54,726] {mysql.py:72} INFO - Executing: SELECT idPais, Nombre, codigo, paisPlataforma, create_date, update_date FROM ob_cpanel.cpanel_pais;
[2021-04-15 11:27:54,744] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11)
Traceback (most recent call last):
File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson
obj = json.loads(self.extra)
File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11)
[2021-04-15 11:27:54,744] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql
[2021-04-15 11:27:54,744] {base.py:65} INFO - Using connection to: id: mysql. Host: sys-sql-pre-01.oneboxtickets.net, Port: 3306, Schema: , Login: lectura, Password: XXXXXXXX, extra: None
[2021-04-15 11:27:54,745] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11)
Traceback (most recent call last):
File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson
obj = json.loads(self.extra)
File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11)
[2021-04-15 11:27:54,745] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql
[2021-04-15 11:27:54,745] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11)
Traceback (most recent call last):
File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson
obj = json.loads(self.extra)
File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11)
[2021-04-15 11:27:54,745] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql
[2021-04-15 11:27:54,746] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11)
Traceback (most recent call last):
File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson
obj = json.loads(self.extra)
File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11)
[2021-04-15 11:27:54,746] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql
[2021-04-15 11:27:54,746] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11)
Traceback (most recent call last):
File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson
obj = json.loads(self.extra)
File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11)
[2021-04-15 11:27:54,746] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql
[2021-04-15 11:27:54,746] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11)
Traceback (most recent call last):
File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson
obj = json.loads(self.extra)
File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11)
[2021-04-15 11:27:54,747] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql
[2021-04-15 11:27:54,747] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11)
Traceback (most recent call last):
File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson
obj = json.loads(self.extra)
File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11)
[2021-04-15 11:27:54,747] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql
[2021-04-15 11:27:54,747] {connection.py:337} ERROR - Expecting value: line 2 column 9 (char 11)
Traceback (most recent call last):
File "/opt/python/site-packages/airflow/models/connection.py", line 335, in extra_dejson
obj = json.loads(self.extra)
File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 9 (char 11)
[2021-04-15 11:27:54,747] {connection.py:338} ERROR - Failed parsing the json for conn_id mysql
[2021-04-15 11:27:54,787] {taskinstance.py:1455} ERROR - (2006, 'SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol')
Traceback (most recent call last):
File "/opt/python/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/opt/python/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/opt/python/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task
result = task_copy.execute(context=context)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/mysql/operators/mysql.py", line 74, in execute
hook.run(self.sql, autocommit=self.autocommit, parameters=self.parameters)
File "/opt/python/site-packages/airflow/hooks/dbapi.py", line 173, in run
with closing(self.get_conn()) as conn:
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/mysql/hooks/mysql.py", line 144, in get_conn
return MySQLdb.connect(**conn_config)
File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/__init__.py", line 85, in Connect
return Connection(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 208, in __init__
super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (2006, 'SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol')
[2021-04-15 11:27:54,788] {taskinstance.py:1496} INFO - Marking task as FAILED. dag_id=mysqlOperatorConnTest, task_id=select_table_mysql, execution_date=20210415T112650, start_date=20210415T112754, end_date=20210415T112754
[2021-04-15 11:27:54,845] {local_task_job.py:146} INFO - Task exited with return code 1
We solved this issue by changing the MySQL client to 5.7. Our server version was 5.6 and the previous client was 8 (we were using a Docker image), so we downgraded the client to be closer to the server version.

The triggered DAG can't get params from TriggerDagRunOperator

I've tried to trigger another DAG with some parameters in a TriggerDagRunOperator, but in the triggered DAG the dag_run object is always None.
In the TriggerDagRunOperator, the message param is added to the dag_run_obj's payload.
def conditionally_trigger(context, dag_run_obj):
    if context['params']['condition_param']:
        dag_run_obj.payload = {'message': context['params']['message']}
        pp.pprint(dag_run_obj.payload)
        return dag_run_obj

trigger = TriggerDagRunOperator(
    task_id='test_trigger_dagrun',
    trigger_dag_id="example_trigger_target_dag",
    python_callable=conditionally_trigger,
    params={'condition_param': True, 'message': 'Hello World'},
    dag=dag,
)
I expected the triggered DAG to be able to read it using kwargs['dag_run'].conf['message'], but unfortunately it doesn't work.
def run_this_func(ds, **kwargs):
    print("Remotely received value of {} for key=message".
          format(kwargs['dag_run'].conf['message']))

run_this = PythonOperator(
    task_id='run_this',
    provide_context=True,
    python_callable=run_this_func,
    dag=dag,
)
The dag_run object in kwargs is None
INFO - Executing <Task(PythonOperator): run_this> on 2019-01-18 16:10:18
INFO - Subtask: [2019-01-18 16:10:27,007] {models.py:1433} ERROR - 'NoneType' object has no attribute 'conf'
INFO - Subtask: Traceback (most recent call last):
INFO - Subtask: File "/Library/Python/2.7/site-packages/airflow/models.py", line 1390, in run
INFO - Subtask: result = task_copy.execute(context=context)
INFO - Subtask: File "/Library/Python/2.7/site-packages/airflow/operators/python_operator.py", line 80, in execute
INFO - Subtask: return_value = self.python_callable(*self.op_args, **self.op_kwargs)
INFO - Subtask: File "/Library/Python/2.7/site-packages/airflow/example_dags/example_trigger_target_dag.py", line 52, in run_this_func
INFO - Subtask: print("Remotely received value of {} for key=message".format(kwargs['dag_run'].conf['message']))
INFO - Subtask: AttributeError: 'NoneType' object has no attribute 'conf'
I also printed out the kwargs, and indeed the 'dag_run' object is None.
The DAGs are sample code shipped with Airflow, so I'm not sure what happened.
Does anybody know the reason?
INFO - Subtask: kwargs: {u'next_execution_date': None, u'dag_run': None, u'tomorrow_ds_nodash': u'20190119', u'run_id': None, u'dag': <DAG: example_trigger_target_dag>, u'prev_execution_date': None, ...
By the way, if I trigger the DAG from the CLI, it works:
$ airflow trigger_dag 'example_trigger_target_dag' -r 'run_id' --conf '{"message":"test_cli"}'
Logs:
INFO - Subtask: kwargs: {u'next_execution_date': None, u'dag_run': <DagRun example_trigger_target_dag # 2019-01-18 ...
INFO - Subtask: Remotely received value of test_cli for key=message
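As an illustration only (this is a sketch and not a fix for why dag_run is None when triggered by the operator), guarding the callable against a missing dag_run/conf makes the failure mode clearer than the AttributeError above; the default message here is an assumption:

def run_this_func(ds, **kwargs):
    dag_run = kwargs.get('dag_run')
    conf = dag_run.conf if dag_run is not None and dag_run.conf else {}
    message = conf.get('message', '<no message received>')
    print("Remotely received value of {} for key=message".format(message))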
