Airflow PythonOperator template_dict raises error TemplateNotFound(template) - airflow

I'm trying to pass bar.sql through the PythonOperator's template_dict for use in the python_callable, like the docs mention, but this is the closest example I've found. I've also reviewed this question which references Airflow 1.8, but the solution did not work for me in practice - I'm using Airflow 2.2.4.
(Also, there seems to be a well known BashOperator issue (question and docs references) where TemplateNotFound errors are common. For the BashOperator, you can troubleshoot by changing command='script.sh' to command='script.sh ', but I did not have any such luck using this with my .sql file passed to PythonOperator's template_dict.)
My task below is resulting in the logs raising an error TemplateNotFound(template): bar.sql
with DAG(
    'bigquery-dag',
    default_args=default_args,
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
    template_searchpath=['usr/local/airflow/include']
) as dag:
    start = DummyOperator(
        task_id='start',
        on_success_callback=some_other_function
    )
    t1 = PythonOperator(
        task_id='sql_printer',
        python_callable=sqlPrinter,
        templates_dict={'sql': 'bar.sql'},
        templates_exts=['.sql',],
        provide_context=True
    )
    start >> t1
My goal is for bar.sql to be available for use in sqlPrinter
-- ~/include/bar.sql
select 'hello world'
def sqlPrinter(**context):
    print(f"sql: {context['templates_dict']['sql']}")
The result I would like to see is
>>> sql: select 'hello world'
Below is the error log from the sql_printer task.
[2022-04-04, 22:32:29 ] {taskinstance.py:1264} INFO - Executing <Task(PythonOperator): sql_printer> on 2022-04-05 03:32:27.076984+00:00
[2022-04-04, 22:32:29 ] {standard_task_runner.py:52} INFO - Started process 15289 to run task
[2022-04-04, 22:32:29 ] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'bigquery-dag', 'sql_printer', 'manual__2022-04-05T03:32:27.076984+00:00', '--job-id', '456', '--raw', '--subdir', 'DAGS_FOLDER/bigquery-dag.py', '--cfg-path', '/tmp/tmp0fjl_t2a', '--error-file', '/tmp/tmpuy00moli']
[2022-04-04, 22:32:29 ] {standard_task_runner.py:77} INFO - Job 456: Subtask sql_printer
[2022-04-04, 22:32:29 ] {logging_mixin.py:109} INFO - Running <TaskInstance: bigquery-dag.sql_printer manual__2022-04-05T03:32:27.076984+00:00 [running]> on host 1296ec2abf88
[2022-04-04, 22:32:30 ] {taskinstance.py:1718} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1334, in _run_raw_task
self._execute_task_with_callbacks(context)
File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1423, in _execute_task_with_callbacks
self.render_templates(context=context)
File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 2011, in render_templates
self.task.render_template_fields(context)
File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1061, in render_template_fields
self._do_render_template_fields(self, self.template_fields, context, jinja_env, set())
File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1074, in _do_render_template_fields
rendered_content = self.render_template(content, context, jinja_env, seen_oids)
File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1131, in render_template
return {key: self.render_template(value, context, jinja_env) for key, value in content.items()}
File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1131, in <dictcomp>
return {key: self.render_template(value, context, jinja_env) for key, value in content.items()}
File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1108, in render_template
template = jinja_env.get_template(content)
File "/usr/local/lib/python3.9/site-packages/jinja2/environment.py", line 997, in get_template
return self._load_template(name, globals)
File "/usr/local/lib/python3.9/site-packages/jinja2/environment.py", line 958, in _load_template
template = self.loader.load(self, name, self.make_globals(globals))
File "/usr/local/lib/python3.9/site-packages/jinja2/loaders.py", line 125, in load
source, filename, uptodate = self.get_source(environment, name)
File "/usr/local/lib/python3.9/site-packages/jinja2/loaders.py", line 214, in get_source
raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: bar.sql

Try using templates_dict={'query': 'bar.sql'}:
t1 = PythonOperator(
    task_id='sql_printer',
    python_callable=sqlPrinter,
    templates_dict={'query': 'bar.sql'},
    provide_context=True
)
def sqlPrinter(**context):
    print(f"sql: {context['templates_dict']['query']}")
The idea came from this post.
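For reference, here is a minimal sketch (my own illustration, not from the original post) of how templates_dict file rendering is generally wired up in Airflow 2.x, assuming the include directory lives at the absolute path /usr/local/airflow/include:
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def sql_printer(**context):
    # By the time the callable runs, 'sql' has been rendered to the file's contents.
    print(f"sql: {context['templates_dict']['sql']}")

with DAG(
    'template-dict-sketch',
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
    template_searchpath=['/usr/local/airflow/include'],  # absolute path assumed
) as dag:
    PythonOperator(
        task_id='sql_printer',
        python_callable=sql_printer,
        templates_dict={'sql': 'bar.sql'},
        templates_exts=['.sql'],
    )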

Related

Airflow PostgresOperator :Task failed with exception while using postgres_conn_id="redshift"

~$ airflow version
2.1.2
python 3.8
I am trying to execute some basic queries on my Redshift cluster using a DAG, but the task is failing with an exception (not shown in the logs).
import datetime
import logging
from airflow import DAG
from airflow.contrib.hooks.aws_hook import AwsHook
from airflow.hooks.postgres_hook import PostgresHook
from airflow.operators.postgres_operator import PostgresOperator
from airflow.operators.python_operator import PythonOperator
import sql_statements

def load_data_to_redshift(*args, **kwargs):
    aws_hook = AwsHook("aws_credentials")
    credentials = aws_hook.get_credentials()
    redshift_hook = PostgresHook("redshift")
    sql_stmt = sql_statements.COPY_ALL_data_SQL.format(
        credentials.access_key,
        credentials.secret_key,
    )
    redshift_hook.run(sql_stmt)

dag = DAG(
    'exercise1',
    start_date=datetime.datetime.now()
)

create_t1_table = PostgresOperator(
    task_id="create_t1_table",
    dag=dag,
    postgres_conn_id="redshift_default",
    sql=sql_statements.CREATE_t1_TABLE_SQL
)

create_t2_table = PostgresOperator(
    task_id="create_t2_table",
    dag=dag,
    postgres_conn_id="redshift_default",
    sql=sql_statements.CREATE_t2_TABLE_SQL,
)

create_t1_table >> create_t2_table
following is the exception
[2021-09-17 05:23:33,902] {base.py:69} INFO - Using connection to: id: redshift_default. Host: rdscluster.123455.us-west-2.redshift.amazonaws.com, Port: 5439, Schema: udac, Login: ***, Password: ***, extra: {}
[2021-09-17 05:23:33,903] {taskinstance.py:1501} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/8085/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/8085/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1331, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/8085/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1361, in _execute_task
result = task_copy.execute(context=context)
File "/home/8085/.local/lib/python3.8/site-packages/airflow/providers/postgres/operators/postgres.py", line 70, in execute
self.hook.run(self.sql, self.autocommit, parameters=self.parameters)
File "/home/8085/.local/lib/python3.8/site-packages/airflow/hooks/dbapi.py", line 177, in run
with closing(self.get_conn()) as conn:
File "/home/8085/.local/lib/python3.8/site-packages/airflow/providers/postgres/hooks/postgres.py", line 115, in get_conn
self.conn = psycopg2.connect(**conn_args)
File "/home/8085/.local/lib/python3.8/site-packages/psycopg2/__init__.py", line 124, in connect
conn = psycopg2.connect("dbname=airflow user=abc password=ubantu host=127.0.0.1 port=5432")
File "/home/8085/.local/lib/python3.8/site-packages/psycopg2abc/__init__.py", line 124, in connect
conn = psycopg2.connect("dbname=airflow user=abc password=abc host=127.0.0.1 port=5432")
File "/home/8085/.local/lib/python3.8/site-packages/psycopg2/__init__.py", line 124, in connect
conn = psycopg2.connect("dbname=airflow user=abc password=abc host=127.0.0.1 port=5432")
[Previous line repeated 974 more times]
RecursionError: maximum recursion depth exceeded
[2021-09-17 05:23:33,907] {taskinstance.py:1544} INFO - Marking task as FAILED. dag_id=exercise1, task_id=create_t1_table, execution_date=20210917T092331, start_date=20210917T092333, end_date=20210917T092333
[2021-09-17 05:23:33,953] {local_task_job.py:149} INFO - Task exited with return code 1
I can't tell from the logs what is going wrong here; it appears that even after providing the Redshift connection ID, the PostgresOperator is using the default Postgres connection configured while installing the Airflow webserver, but I could be wrong.
Any idea how I can resolve this or get more logging out of Airflow? (Note: I already tried different Airflow log levels in the Airflow config; it didn't help either.)
redshift - the connection is defined properly and I can connect to Redshift using another standalone Python utility as well as psql, so there is no issue with the Redshift cluster.
-Thanks,
Resolved:
Somehow the following file was referring to the airflow Postgres DB created during the Airflow installation rather than connecting to the local Postgres:
File "/home/8085/.local/lib/python3.8/site-packages/psycopg2/__init__.py", line 124, in connect
conn = psycopg2.connect("dbname=airflow user=abc password=abc host=127.0.0.1 port=5432")
I had to recreate the Airflow DB from scratch to resolve the issue.
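As a side note (an illustrative check of my own, not part of the original resolution), you can inspect which database a connection ID actually resolves to before recreating anything, assuming the Postgres provider is installed:
from airflow.providers.postgres.hooks.postgres import PostgresHook

hook = PostgresHook(postgres_conn_id="redshift_default")
conn = hook.get_connection("redshift_default")
# This should print the Redshift cluster endpoint, not 127.0.0.1.
print(conn.host, conn.port, conn.schema)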

Airflow unexpected argument 'mounts'

I'm trying to set up an Airflow ETL pipeline that extracts images from a .bag file. I want to do the extraction inside Docker and I'm using the DockerOperator. The Docker image is pulled from a private GitLab repository. The script I want to run is a Python script inside a Docker container. The .bag file is on my external SSD, so I'm trying to mount it inside the container. Is there something wrong with the code, or is it a different kind of problem?
Error:
[2021-09-16 10:39:17,010] {docker.py:246} INFO - Starting docker container from image registry.gitlab.com/url/of/gitlab:a24a3f05
[2021-09-16 10:39:17,010] {taskinstance.py:1462} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/filip/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1164, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/filip/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1282, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/filip/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1312, in _execute_task
result = task_copy.execute(context=context)
File "/home/filip/.local/lib/python3.6/site-packages/airflow/providers/docker/operators/docker.py", line 343, in execute
return self._run_image()
File "/home/filip/.local/lib/python3.6/site-packages/airflow/providers/docker/operators/docker.py", line 265, in _run_image
return self._run_image_with_mounts(self.mounts, add_tmp_variable=False)
File "/home/filip/.local/lib/python3.6/site-packages/airflow/providers/docker/operators/docker.py", line 287, in _run_image_with_mounts
privileged=self.privileged,
File "/usr/lib/python3/dist-packages/docker/api/container.py", line 607, in create_host_config
return HostConfig(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'mounts'
[2021-09-16 10:39:17,014] {taskinstance.py:1512} INFO - Marking task as FAILED. dag_id=ETL-test, task_id=docker_extract, execution_date=20210916T083912, start_date=20210916T083915, end_date=20210916T083917
[2021-09-16 10:39:17,062] {local_task_job.py:151} INFO - Task exited with return code 1
[2021-09-16 10:39:17,085] {local_task_job.py:261} INFO - 0 downstream tasks scheduled from follow-on schedule check
This is my code :
from airflow import DAG
from airflow.utils.dates import days_ago
from datetime import datetime, timedelta
from airflow.operators.dummy import DummyOperator
from airflow.providers.docker.operators.docker import DockerOperator
from docker.types import Mount
from airflow.operators.bash_operator import BashOperator

ssd_dir = Mount(source='/media/filip/external-ssd', target='/external-ssd', type='bind')

dag = DAG(
    'ETL-test',
    default_args = {
        'owner' : 'admin',
        'description' : 'Extract data from bag, simple test',
        'depend_on_past' : False,
        'start_date' : datetime(2021, 9, 13),
    },
)

start_dag = DummyOperator(
    task_id='start_dag',
    dag=dag
)

extract = DockerOperator(
    api_version="auto",
    task_id='docker_extract',
    image='registry.gitlab.com/url/of/gitlab:a24a3f05',
    container_name='extract-test',
    mounts=[ssd_dir],
    auto_remove=True,
    force_pull=False,
    mount_tmp_dir=False,
    command='python3 rgb_image_extraction.py --bagfile /external-ssd/2021-09-01-13-17-10.bag --output_dir /external-ssd/airflow --camera_topic /kirby1/vm0/stereo/left/color/image_rect --every_n_img 20 --timestamp_as_name',
    docker_conn_id='gitlab_registry',
    dag=dag
)

test = BashOperator(
    task_id='print_hello',
    bash_command='echo "hello world"',
    dag=dag
)

start_dag >> extract >> test
I think you have an old docker Python library installed. If you want to make sure Airflow 2.1.0 works, you should always use the constraints mechanism described in https://airflow.apache.org/docs/apache-airflow/stable/installation.html, otherwise you risk ending up with outdated dependencies.
For example, if you use Python 3.6, the right constraints file is https://raw.githubusercontent.com/apache/airflow/constraints-2.1.3/constraints-3.6.txt, where the docker Python library is pinned to 5.0.0. I bet you have a much older version.
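As a quick sanity check (my own illustration, not part of the answer above), you can confirm which docker Python library version Airflow is importing and that the Mount type is available:
import docker
from docker.types import Mount

# The 2.1.3 constraints mentioned above pin docker to 5.0.0; an older docker
# library without 'mounts' support in its HostConfig would explain the TypeError.
print(docker.__version__)

# The same bind mount as in the DAG above.
ssd_dir = Mount(source='/media/filip/external-ssd', target='/external-ssd', type='bind')
print(ssd_dir)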

Airflow: How to pass data from a decorated task to SimpleHttpOperator?

I recently started using Apache Airflow. I am using the TaskFlow API with one decorated task with id Get_payload and a SimpleHttpOperator. Task Get_payload gets data from a database, does some data manipulation, and returns a dict as the payload.
Problem
Unable to pass data from the previous task into the next task. Yes, I am aware of XComs, but the whole purpose of using the TaskFlow API is to avoid direct interactions with XComs. I get the error below when get_data is passed directly to the data property of SimpleHttpOperator.
airflow.exceptions.AirflowException: 400:BAD REQUEST
What have I tried so far?
As mentioned in this SO answer, I used template_field in my custom sensor to define the field in which to expect the data from the previous task. In the case of SimpleHttpOperator I cannot edit the operator to do the same. So how can I solve this similarly for SimpleHttpOperator?
I have checked this SO answer and this one as well.
DAG:
from airflow.decorators import dag, task
from airflow.providers.http.operators.http import SimpleHttpOperator
from datetime import datetime

default_args = {
    "owner": "airflow",
    "start_date": datetime(2021, 1, 1),
}

@dag(default_args=default_args, schedule_interval=None, tags=["Http Operators"])
def http_operator():
    @task(multiple_outputs=True)
    def Get_payload(**kwargs):
        # STEP 1: Get data from database.
        # STEP 2: Manipulate data.
        # STEP 3: Return payload.
        data = {
            "key_1": "Value 1",
            "key_2": "Value 2",
            "key_3": "Value 3",
            "key_4": "Value 4",
        }
        return data

    get_data = Get_payload()

    ml_api = SimpleHttpOperator(
        task_id="some_api",
        http_conn_id="http_conn_id",
        method="POST",
        endpoint="/some-path",
        data=get_data,
        headers={"Content-Type": "application/json"},
    )

    get_data >> ml_api

http_operator_dag = http_operator()
Full log:
[2021-08-28 20:28:12,947] {taskinstance.py:903} INFO - Dependencies all met for <TaskInstance: http_operator.clf_api 2021-08-28T20:28:10.265689+00:00 [queued]>
[2021-08-28 20:28:12,970] {taskinstance.py:903} INFO - Dependencies all met for <TaskInstance: http_operator.clf_api 2021-08-28T20:28:10.265689+00:00 [queued]>
[2021-08-28 20:28:12,970] {taskinstance.py:1094} INFO -
--------------------------------------------------------------------------------
[2021-08-28 20:28:12,971] {taskinstance.py:1095} INFO - Starting attempt 1 of 1
[2021-08-28 20:28:12,971] {taskinstance.py:1096} INFO -
--------------------------------------------------------------------------------
[2021-08-28 20:28:12,982] {taskinstance.py:1114} INFO - Executing <Task(SimpleHttpOperator): clf_api> on 2021-08-28T20:28:10.265689+00:00
[2021-08-28 20:28:12,987] {standard_task_runner.py:52} INFO - Started process 19229 to run task
[2021-08-28 20:28:12,991] {standard_task_runner.py:76} INFO - Running: ['***', 'tasks', 'run', 'http_operator', 'clf_api', '2021-08-28T20:28:10.265689+00:00', '--job-id', '71', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/Http_Operator.py', '--cfg-path', '/tmp/tmp4l9hwi4q', '--error-file', '/tmp/tmpk1yrhtki']
[2021-08-28 20:28:12,993] {standard_task_runner.py:77} INFO - Job 71: Subtask clf_api
[2021-08-28 20:28:13,048] {logging_mixin.py:109} INFO - Running <TaskInstance: http_operator.clf_api 2021-08-28T20:28:10.265689+00:00 [running]> on host d332abee08c8
[2021-08-28 20:28:13,126] {taskinstance.py:1251} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=***
AIRFLOW_CTX_DAG_ID=http_operator
AIRFLOW_CTX_TASK_ID=clf_api
AIRFLOW_CTX_EXECUTION_DATE=2021-08-28T20:28:10.265689+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-08-28T20:28:10.265689+00:00
[2021-08-28 20:28:13,128] {http.py:111} INFO - Calling HTTP method
[2021-08-28 20:28:13,141] {base.py:70} INFO - Using connection to: id: ML_API. Host: <IP-REMOVED>, Port: None, Schema: , Login: dexter, Password: ***, extra: {}
[2021-08-28 20:28:13,144] {http.py:140} INFO - Sending 'POST' to url: http://<IP-REMOVED>/classify
[2021-08-28 20:28:13,841] {http.py:154} ERROR - HTTP error: BAD REQUEST
[2021-08-28 20:28:13,842] {http.py:155} ERROR - <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>400 Bad Request</title>
<h1>Bad Request</h1>
<p>Failed to decode JSON object: Expecting value: line 1 column 1 (char 0)</p>
[2021-08-28 20:28:13,874] {taskinstance.py:1462} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/http/hooks/http.py", line 152, in check_response
response.raise_for_status()
File "/home/airflow/.local/lib/python3.8/site-packages/requests/models.py", line 953, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: BAD REQUEST for url: http://<IP-REMOVED>/classify
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1164, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1282, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1312, in _execute_task
result = task_copy.execute(context=context)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/http/operators/http.py", line 113, in execute
response = http.run(self.endpoint, self.data, self.headers, self.extra_options)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/http/hooks/http.py", line 141, in run
return self.run_and_check(session, prepped_request, extra_options)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/http/hooks/http.py", line 198, in run_and_check
self.check_response(response)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/http/hooks/http.py", line 156, in check_response
raise AirflowException(str(response.status_code) + ":" + response.reason)
airflow.exceptions.AirflowException: 400:BAD REQUEST
[2021-08-28 20:28:13,882] {taskinstance.py:1505} INFO - Marking task as FAILED. dag_id=http_operator, task_id=clf_api, execution_date=20210828T202810, start_date=20210828T202812, end_date=20210828T202813
[2021-08-28 20:28:13,969] {local_task_job.py:151} INFO - Task exited with return code 1
[2021-08-28 20:28:14,043] {local_task_job.py:261} INFO - 0 downstream tasks scheduled from follow-on schedule check
As suggested by @Josh Fell in the comments, I had two mistakes in my DAG:
Wrap the data in json.dumps(data) before returning it from Get_payload.
Remove multiple_outputs=True from the task decorator of Get_payload.
Final code:
import json
from airflow.decorators import dag, task
from airflow.providers.http.operators.http import SimpleHttpOperator
from datetime import datetime

default_args = {
    "owner": "airflow",
    "start_date": datetime(2021, 1, 1),
}

@dag(default_args=default_args, schedule_interval=None, tags=["Http Operators"])
def http_operator():
    @task()
    def Get_payload(**kwargs):
        # STEP 1: Get data from database.
        # STEP 2: Manipulate data.
        # STEP 3: Return payload.
        data = {
            "key_1": "Value 1",
            "key_2": "Value 2",
            "key_3": "Value 3",
            "key_4": "Value 4",
        }
        return json.dumps(data)

    get_data = Get_payload()

    ml_api = SimpleHttpOperator(
        task_id="some_api",
        http_conn_id="http_conn_id",
        method="POST",
        endpoint="/some-path",
        data=get_data,
        headers={"Content-Type": "application/json"},
    )

    get_data >> ml_api

http_operator_dag = http_operator()
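A side note on why this works (my reading, not stated in the original answer): data is one of SimpleHttpOperator's templated fields, so the value returned by Get_payload() is pulled from XCom at render time rather than being passed as a plain Python object. A minimal check of that assumption:
from airflow.providers.http.operators.http import SimpleHttpOperator

# 'data' being a templated field is what lets the TaskFlow return value render into it.
print(SimpleHttpOperator.template_fields)  # expected to include 'endpoint', 'data' and 'headers'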

Make Airflow show print-statements on run-time (not after task is completed)

Say I have the following DAG (stuff omitted for clarity)
#dag.py
from airflow.operators.python import PythonOperator

def main():
    print("Task 1")
    #some code
    print("Task 2")
    #some more code
    print("Done")
    return 0

t1 = PythonOperator(python_callable=main)
t1
Say the program fails at #some more code due to e.g. RAM issues; I just get an error in my log, e.g.
[2021-05-25 12:49:54,211] {process_utils.py:137} INFO - Output:
[2021-05-25 12:52:44,605] {taskinstance.py:1482} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/python.py", line 493, in execute
super().execute(context=serializable_context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/python.py", line 117, in execute
return_value = self.execute_callable()
File "/usr/local/lib/python3.6/site-packages/airflow/operators/python.py", line 531, in execute_callable
string_args_filename,
File "/usr/local/lib/python3.6/site-packages/airflow/utils/process_utils.py", line 145, in execute_in_subprocess
raise subprocess.CalledProcessError(exit_code, cmd)
subprocess.CalledProcessError: Command '['/tmp/venv2wbjnabi/bin/python', '/tmp/venv2wbjnabi/script.py', '/tmp/venv2wbjnabi/script.in', '/tmp/venv2wbjnabi/script.out', '/tmp/venv2wbjnabi/string_args.txt']' died with <Signals.SIGKILL: 9>.
[2021-05-25 13:00:55,733] {taskinstance.py:1532} INFO - Marking task as FAILED. dag_id=test_dag, task_id=clean_data, execution_date=20210525T105621, start_date=20210525T105732, end_date=20210525T110055
[2021-05-25 13:00:56,555] {local_task_job.py:146} INFO - Task exited with return code 1
but none of the print statements are printed, so I don't know where the program failed (I only know it now due to debugging).
I assume, because of that, that Airflow doesn't flush before the task is marked as "success". Is there a way to make Airflow flush at runtime / print at runtime?
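Two commonly suggested workarounds (illustrative only, not from this thread; they assume a plain PythonOperator rather than a virtualenv-backed task) are to emit messages through the logging module, which Airflow's task log handlers write out as they are emitted, or to force a flush of stdout explicitly:
import logging
import sys

log = logging.getLogger(__name__)

def main():
    log.info("Task 1")           # routed through Airflow's task log handlers
    # some code
    print("Task 2", flush=True)  # force the flush instead of relying on buffering
    # some more code
    sys.stdout.flush()           # explicit equivalent
    print("Done")
    return 0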

Apache Airflow dagrun_operator in Airflow Version 1.10.10

On migrating Airflow from v1.10.2 to v1.10.10, one of our DAGs has a task of dagrun_operator type.
A code snippet of the task looks something like the below. Please assume that the DAG dag_process_pos exists.
The DAG that is being triggered by the TriggerDagRunOperator is dag_process_pos. It starts with a task of type dummy_operator [just a hint in case this could be the troublemaker].
task_trigger_dag_positional = TriggerDagRunOperator(
    trigger_dag_id="dag_process_pos",
    python_callable=set_up_dag_run_preprocessing,
    task_id="trigger_preprocess_dag",
    on_failure_callback=log_failure,
    execution_date=datetime.now(),
    provide_context=False,
    owner='airflow')

def set_up_dag_run_preprocessing(context, dag_run_obj):
    ti = context['ti']
    dag_name = context['ti'].task.trigger_dag_id
    dag_run = context['dag_run']
    trans_id = dag_run.conf['transaction_id']
    routing_info = ti.xcom_pull(task_ids="json_validation", key="route_info")
    new_file_path = routing_info['file_location']
    new_file_name = os.path.basename(routing_info['new_file_name'])
    file_path = os.path.join(new_file_path, new_file_name)
    batch_id = "123-AD-FF"
    dag_run_obj.payload = {'inputfilepath': file_path,
                           'transaction_id': trans_id,
                           'Id': batch_id}
The DAG runs all fine; in fact, the python callable of the task mentioned above runs through to its last line. Then it errors out.
[2020-06-09 11:36:22,838] {taskinstance.py:1145} ERROR - No row was found for one()
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/dagrun_operator.py", line 95, in execute
replace_microseconds=False)
File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 141, in trigger_dag
replace_microseconds=replace_microseconds,
File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 98, in _trigger_dag
external_trigger=True,
File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/airflow/models/dag.py", line 1471, in create_dagrun
run.refresh_from_db()
File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/airflow/models/dagrun.py", line 109, in refresh_from_db
DR.run_id == self.run_id
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3446, in one
raise orm_exc.NoResultFound("No row was found for one()")
sqlalchemy.orm.exc.NoResultFound: No row was found for one()
After this, the on_failure_callback of that task is executed and all the code of that callable runs perfectly OK, as expected. The question here is why the dagrun_operator failed after the python callable.
P.S.: The DAG that is being triggered by the TriggerDagRunOperator, in this case dag_process_pos, starts with a task of type dummy_operator.
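For context (my own illustration, not from the original post): in Airflow 1.10.x the dict assigned to dag_run_obj.payload in the python_callable becomes the conf of the triggered run, which the first task of dag_process_pos would typically read back like this (the callable name is hypothetical):
def read_trigger_conf(**context):
    # Consume the payload set by set_up_dag_run_preprocessing above.
    conf = context['dag_run'].conf or {}
    print(conf.get('inputfilepath'), conf.get('transaction_id'), conf.get('Id'))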
