Apache Airflow dagrun_operator in Airflow Version 1.10.10 - airflow

On migrating Airflow from v1.10.2 to v1.10.10, one of our DAGs has a task of dagrun_operator type.
A code snippet of the task is below. Please assume that the DAG dag_process_pos exists.
The DAG being triggered by the TriggerDagRunOperator is dag_process_pos, which starts with a task of type DummyOperator [just a hint, in case this could be the troublemaker].
task_trigger_dag_positional = TriggerDagRunOperator(
    trigger_dag_id="dag_process_pos",
    python_callable=set_up_dag_run_preprocessing,
    task_id="trigger_preprocess_dag",
    on_failure_callback=log_failure,
    execution_date=datetime.now(),
    provide_context=False,
    owner='airflow')
def set_up_dag_run_preprocessing(context, dag_run_obj):
    ti = context['ti']
    dag_name = context['ti'].task.trigger_dag_id
    dag_run = context['dag_run']
    trans_id = dag_run.conf['transaction_id']
    routing_info = ti.xcom_pull(task_ids="json_validation", key="route_info")
    new_file_path = routing_info['file_location']
    new_file_name = os.path.basename(routing_info['new_file_name'])
    file_path = os.path.join(new_file_path, new_file_name)
    batch_id = "123-AD-FF"
    dag_run_obj.payload = {'inputfilepath': file_path,
                           'transaction_id': trans_id,
                           'Id': batch_id}
The DAG runs all fine; in fact, the python callable of the task runs through to its last line. Then the task errors out.
[2020-06-09 11:36:22,838] {taskinstance.py:1145} ERROR - No row was found for one()
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/dagrun_operator.py", line 95, in execute
    replace_microseconds=False)
  File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 141, in trigger_dag
    replace_microseconds=replace_microseconds,
  File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 98, in _trigger_dag
    external_trigger=True,
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/models/dag.py", line 1471, in create_dagrun
    run.refresh_from_db()
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/models/dagrun.py", line 109, in refresh_from_db
    DR.run_id == self.run_id
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3446, in one
    raise orm_exc.NoResultFound("No row was found for one()")
sqlalchemy.orm.exc.NoResultFound: No row was found for one()
After that, the on_failure_callback of the task is executed, and all code of that callable runs perfectly, as expected. The question here is why the dagrun_operator failed after the python callable completed.
P.S.: The DAG triggered by the TriggerDagRunOperator, in this case dag_process_pos, starts with a task of type DummyOperator.
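One thing worth checking (an assumption on my part, not something the traceback proves): execution_date=datetime.now() is evaluated when the DAG file is parsed, not when the task executes, so the operator can carry a stale timestamp into trigger_dag. A plain-Python sketch of the pitfall (FakeOperator is an illustrative stand-in, not Airflow's API):

```python
import time
from datetime import datetime

class FakeOperator:
    """Illustrative stand-in for an operator; not Airflow's API."""
    def __init__(self, execution_date):
        # captured once, when this "DAG file" is parsed
        self.execution_date = execution_date

    def execute(self):
        # every run reuses the parse-time value
        return self.execution_date

# mimics TriggerDagRunOperator(..., execution_date=datetime.now())
op = FakeOperator(execution_date=datetime.now())
time.sleep(0.05)
# by execution time the stored date already lags behind the clock
assert op.execute() < datetime.now()
```

If the stale date turns out to be the problem, omitting execution_date (so the operator assigns the trigger time itself) is worth trying.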

Related

AirflowException("Task received SIGTERM signal")

I'm running Airflow with Docker Swarm on 5 servers. After two months of use, errors like the one below started appearing in DAGs that use a custom Hive operator (similar to the inner function); no error occurred during the first two months, and nothing changed in the DAGs.
Also, when I retry the DAG, it sometimes succeeds and sometimes fails.
The really weird thing about this issue is that the Hive job itself did not fail: after the task was marked as failed in the Airflow webserver (SIGTERM), the query completed 1-10 minutes later.
As a result, the flow is like this:
Task start -> 5-10 mins -> error (SIGTERM, Airflow) -> 1-10 mins -> Hive job success (Hadoop log)
[2023-01-09 08:06:07,583] {local_task_job.py:208} WARNING - State of this instance has been externally set to up_for_retry. Terminating instance.
[2023-01-09 08:06:07,588] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 135213
[2023-01-09 08:06:07,588] {taskinstance.py:1236} ERROR - Received SIGTERM. Terminating subprocesses.
[2023-01-09 08:13:42,510] {taskinstance.py:1463} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/airflow/dags/common/operator/hive_q_operator.py", line 81, in execute
    cur.execute(statement)  # hive query custom operator
  File "/home/airflow/.local/lib/python3.8/site-packages/pyhive/hive.py", line 454, in execute
    response = self._connection.client.ExecuteStatement(req)
  File "/home/airflow/.local/lib/python3.8/site-packages/TCLIService/TCLIService.py", line 280, in ExecuteStatement
    return self.recv_ExecuteStatement()
  File "/home/airflow/.local/lib/python3.8/site-packages/TCLIService/TCLIService.py", line 292, in recv_ExecuteStatement
    (fname, mtype, rseqid) = iprot.readMessageBegin()
  File "/home/airflow/.local/lib/python3.8/site-packages/thrift/protocol/TBinaryProtocol.py", line 134, in readMessageBegin
    sz = self.readI32()
  File "/home/airflow/.local/lib/python3.8/site-packages/thrift/protocol/TBinaryProtocol.py", line 217, in readI32
    buff = self.trans.readAll(4)
  File "/home/airflow/.local/lib/python3.8/site-packages/thrift/transport/TTransport.py", line 62, in readAll
    chunk = self.read(sz - have)
  File "/home/airflow/.local/lib/python3.8/site-packages/thrift_sasl/__init__.py", line 173, in read
    self._read_frame()
  File "/home/airflow/.local/lib/python3.8/site-packages/thrift_sasl/__init__.py", line 177, in _read_frame
    header = self._trans_read_all(4)
  File "/home/airflow/.local/lib/python3.8/site-packages/thrift_sasl/__init__.py", line 210, in _trans_read_all
    return read_all(sz)
  File "/home/airflow/.local/lib/python3.8/site-packages/thrift/transport/TTransport.py", line 62, in readAll
    chunk = self.read(sz - have)
  File "/home/airflow/.local/lib/python3.8/site-packages/thrift/transport/TSocket.py", line 150, in read
    buff = self.handle.recv(sz)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1238, in signal_handler
    raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
I already restarted the Airflow server and nothing changed.
Is there any helpful guide for me? Thanks :)
Here is the failed task's log (Flower log):
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/celery_executor.py", line 88, in execute_command
    _execute_in_fork(command_to_exec)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/celery_executor.py", line 99, in _execute_in_fork
    raise AirflowException('Celery command failed on host: ' + get_hostname())
airflow.exceptions.AirflowException: Celery command failed on host: 8be4caa25d17
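For context on why the exception surfaces in the middle of a Thrift socket read: the Airflow task runner installs a SIGTERM handler that raises from whatever frame happens to be executing. A minimal stdlib sketch of that mechanism (TaskTerminated is an illustrative name, not Airflow's class):

```python
import os
import signal

class TaskTerminated(Exception):
    """Illustrative stand-in for airflow.exceptions.AirflowException."""

def signal_handler(signum, frame):
    # Raising here makes the traceback point at whatever line was running
    # when SIGTERM arrived, e.g. a blocking Thrift recv().
    raise TaskTerminated("Task received SIGTERM signal")

signal.signal(signal.SIGTERM, signal_handler)

def long_blocking_call():
    # simulate the scheduler terminating this task process
    os.kill(os.getpid(), signal.SIGTERM)

try:
    long_blocking_call()
except TaskTerminated as exc:
    print(exc)
```

The "State of this instance has been externally set to up_for_retry" warning just above the SIGTERM means something outside the process changed the task state, after which the local task job sent the signal; so the place to investigate is why the state flipped, not the Hive query itself.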

Airflow DagRunAlreadyExists even after providing the custom run id and execution date

I am getting a DagRunAlreadyExists exception even after providing a custom run id and execution date.
This occurs when there are multiple requests within the same second.
Here is the MWAA CLI call:
def get_unique_key():
    from datetime import datetime
    import random
    import shortuuid
    import string
    timestamp = datetime.now().strftime(DT_FMT_HMSf)
    random_str = timestamp + ''.join(random.choice(string.digits + string.ascii_letters) for _ in range(8))
    uuid_str = shortuuid.ShortUUID().random(length=12)
    return '{}{}'.format(uuid_str, random_str)

execution_date = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S.%f")
dag_run_id = get_unique_key()
workflow_id = "my_workflow"
conf = json.dumps({"foo": "bar"})
"dags trigger {0} -c '{1}' -r {2} -e {3}".format(workflow_id, conf, dag_run_id, execution_date)
and here is the error log from the MWAA CLI, in case it helps to debug the issue:
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/airflow/__main__.py", line 48, in main
    args.func(args)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/dag_command.py", line 138, in dag_trigger
    dag_id=args.dag_id, run_id=args.run_id, conf=args.conf, execution_date=args.exec_date
  File "/usr/local/lib/python3.7/site-packages/airflow/api/client/local_client.py", line 30, in trigger_dag
    dag_id=dag_id, run_id=run_id, conf=conf, execution_date=execution_date
  File "/usr/local/lib/python3.7/site-packages/airflow/api/common/experimental/trigger_dag.py", line 125, in trigger_dag
    replace_microseconds=replace_microseconds,
  File "/usr/local/lib/python3.7/site-packages/airflow/api/common/experimental/trigger_dag.py", line 75, in _trigger_dag
    f"A Dag Run already exists for dag id {dag_id} at {execution_date} with run id {run_id}"
airflow.exceptions.DagRunAlreadyExists: A Dag Run already exists for dag id my_workflow at 2022-10-18T06:10:28+00:00 with run id CL4Adauihkvz121928332658Gp6bsTWU
The problem is that the execution_date resolution is seconds; Airflow ignores the milliseconds.
You can see in the error that no milliseconds are mentioned in the execution_date (2022-10-18T06:10:28).
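If the triggering side can be funneled through one process, a workaround is to hand out execution dates that never repeat at second resolution. A sketch under that assumption (next_execution_date is a hypothetical helper, not an Airflow API):

```python
import threading
from datetime import datetime, timedelta

_lock = threading.Lock()
_last_issued = None

def next_execution_date():
    """Return an ISO timestamp, truncated to seconds, strictly later than
    any previously issued one -- so two triggers in the same wall-clock
    second still get distinct execution_dates."""
    global _last_issued
    with _lock:
        candidate = datetime.utcnow().replace(microsecond=0)
        if _last_issued is not None and candidate <= _last_issued:
            candidate = _last_issued + timedelta(seconds=1)
        _last_issued = candidate
        return candidate.strftime("%Y-%m-%dT%H:%M:%S")

# two calls in the same second yield different dates
a, b = next_execution_date(), next_execution_date()
assert a != b
```

Each value passed to `dags trigger ... -e` is then unique even for back-to-back requests; the trade-off is that under bursts the issued execution_date can run slightly ahead of wall-clock time.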

Make Airflow show print-statements on run-time (not after task is completed)

Say I have the following DAG (stuff omitted for clarity):
# dag.py
from airflow.operators.python import PythonOperator

def main():
    print("Task 1")
    # some code
    print("Task 2")
    # some more code
    print("Done")
    return 0

t1 = PythonOperator(python_callable=main)
t1
Say the program fails at "# some more code" due to e.g. RAM issues. I then just get an error in my log, e.g.:
[2021-05-25 12:49:54,211] {process_utils.py:137} INFO - Output:
[2021-05-25 12:52:44,605] {taskinstance.py:1482} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/python.py", line 493, in execute
    super().execute(context=serializable_context)
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/python.py", line 117, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.6/site-packages/airflow/operators/python.py", line 531, in execute_callable
    string_args_filename,
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/process_utils.py", line 145, in execute_in_subprocess
    raise subprocess.CalledProcessError(exit_code, cmd)
subprocess.CalledProcessError: Command '['/tmp/venv2wbjnabi/bin/python', '/tmp/venv2wbjnabi/script.py', '/tmp/venv2wbjnabi/script.in', '/tmp/venv2wbjnabi/script.out', '/tmp/venv2wbjnabi/string_args.txt']' died with <Signals.SIGKILL: 9>.
[2021-05-25 13:00:55,733] {taskinstance.py:1532} INFO - Marking task as FAILED. dag_id=test_dag, task_id=clean_data, execution_date=20210525T105621, start_date=20210525T105732, end_date=20210525T110055
[2021-05-25 13:00:56,555] {local_task_job.py:146} INFO - Task exited with return code 1
but none of the print statements are printed, so I don't know where the program failed (I only know now after debugging).
I assume this means Airflow doesn't flush output before the task is marked as finished. Is there a way to make Airflow flush/print at runtime?
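The traceback shows the callable runs in a child process (execute_in_subprocess, i.e. a virtualenv-style operator), so its stdout is block-buffered: output only reaches the parent when the child flushes or exits, and a SIGKILLed child never flushes. A small sketch of the fix, flushing each print (the child script here is a stand-in for the task callable):

```python
import subprocess
import sys
import textwrap

# Child script standing in for the task callable. Without flush=True (or
# running the child with python -u / PYTHONUNBUFFERED=1) these prints would
# sit in the child's buffer and be lost if it were SIGKILLed mid-run.
child = textwrap.dedent("""
    print("Task 1", flush=True)
    print("Task 2", flush=True)
    print("Done", flush=True)
""")

proc = subprocess.Popen([sys.executable, "-c", child],
                        stdout=subprocess.PIPE, text=True)
lines = [line.rstrip() for line in proc.stdout]  # lines arrive as they are flushed
proc.wait()
print(lines)
```

Inside the real callable the same idea applies: use print(..., flush=True), or set PYTHONUNBUFFERED=1 in the worker environment, so partial output lands in the task log before a crash.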

Trouble branching DAGs in Airflow Taskflow API

I am new to Airflow. Using the Taskflow API, I am trying to dynamically change the flow of a DAG: if a condition is met, the two-step workflow should be executed a second time.
After defining the two functions/tasks, if I fix the DAG sequence as below, everything works fine:
@dag(default_args=default_args, schedule_interval=None, start_date=days_ago(2))
def genesis(**kwargs):
    @task()
    def extract():
        print("X")

    @task()
    def add_timeframe(extracted_data):
        print("Y")

    extracted_data = extract()
    timeframe_data = add_timeframe(extracted_data)
However, if I write any conditional logic to trigger the second run (either inside a task or after the function/task definitions), I get the error below. The error seems to be about setting upstream tasks, but the older task.set_upstream(task2) calls don't work with the Taskflow API in Airflow 2.0.
All the examples of conditional branching I could find were based on the non-Taskflow API. Please help.
@dag(default_args=default_args, schedule_interval=None, start_date=days_ago(2))
def genesis(**kwargs):
    @task()
    def extract():
        print("X")
        if <condition>:
            extracted_data2 = extract()
            timeframe_data2 = add_timeframe(extracted_data2)

    @task()
    def add_timeframe(extracted_data):
        print("Y")

    extracted_data = extract()
    timeframe_data = add_timeframe(extracted_data)
ERROR - Tried to create relationships between tasks that don't have DAGs yet. Set the DAG for at least one task and try again: [<Task(_PythonDecoratedOperator): add_timeframe>, <Task(_PythonDecoratedOperator): extract>]
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1086, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1260, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1300, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/operators/python.py", line 233, in execute
    return_value = self.python_callable(*self.op_args, **self.op_kwargs)
  File "/opt/airflow/dags/genesis.py", line 77, in extract
    timeframe_data = add_timeframe(extracted_data)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/operators/python.py", line 294, in factory
    **kwargs,
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 91, in __call__
    obj.set_xcomargs_dependencies()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 722, in set_xcomargs_dependencies
    apply_set_upstream(arg)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 711, in apply_set_upstream
    apply_set_upstream(elem)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 708, in apply_set_upstream
    self.set_upstream(arg.operator)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1239, in set_upstream
    self._set_relatives(task_or_task_list, upstream=True)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1205, in _set_relatives
    "task and try again: {}".format([self] + task_list)
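The frame `File "/opt/airflow/dags/genesis.py", line 77, in extract` in the traceback points at the root cause: the conditional block runs inside extract() at task-execution time, so add_timeframe(...) is invoked on the worker, where there is no active DAG to attach the new task to; the wiring has to happen while the @dag function body runs. A plain-Python sketch of that parse-time/run-time distinction (Dag, task, and define_dag are illustrative stand-ins, not Airflow's API):

```python
# Illustrative stand-ins for Airflow's @dag/@task machinery: a "dag" can only
# collect tasks while its definition function is running (parse time).
class Dag:
    def __init__(self):
        self.tasks = []

current_dag = None

def task(fn):
    def factory(*args):
        if current_dag is None:
            # the moral equivalent of Airflow's "tasks that don't have DAGs yet"
            raise RuntimeError("no DAG context: task factory called at run time")
        current_dag.tasks.append(fn.__name__)
        return fn(*args)
    return factory

@task
def extract():
    return "X"

@task
def add_timeframe(data):
    return data + "Y"

def define_dag():
    global current_dag
    current_dag = Dag()
    data = extract()        # OK: wired while the DAG body runs
    add_timeframe(data)     # OK: same
    built, current_dag = current_dag, None
    return built

dag = define_dag()
print(dag.tasks)  # ['extract', 'add_timeframe']
```

Calling one of the factories after define_dag() returns mirrors what the broken DAG does inside extract at execution time, and it raises the error. In real Airflow the fix has the same shape: keep every conditional call to a @task factory inside the @dag function body, or branch at run time with a branching operator instead of calling task factories from within a task.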

Airflow HttpSensor won't work

I'm trying to create an HttpSensor in Airflow using the following code:
wait_to_launch = HttpSensor(
    task_id="wait-to-launch",
    endpoint='http://' + socket.gethostname() + ":8500/v1/kv/launch-cluster?raw",
    response_check=lambda response: True if 'oui' == response.content else False,
    dag=dag
)
But I keep getting this error:
Traceback (most recent call last):
  File "http_sensor_test.py", line 30, in <module>
    dag=dag
  File "/home/me/.local/lib/python2.7/site-packages/airflow/utils/decorators.py", line 86, in wrapper
    result = func(*args, **kwargs)
  File "/home/me/.local/lib/python2.7/site-packages/airflow/operators/sensors.py", line 663, in __init__
    self.hook = hooks.http_hook.HttpHook(method='GET', http_conn_id=http_conn_id)
  File "/home/me/.local/lib/python2.7/site-packages/airflow/utils/helpers.py", line 436, in __getattr__
    raise AttributeError
AttributeError
What am I missing?
You are running into a known issue; see AIRFLOW-1030. A fix has been merged (#2180), but it is not yet in a released version of Airflow. The fix is slated for the next release (1.9.0), but that could be weeks or months away. Until then you can run a fork of Airflow with the change, or add the updated version of the HttpSensor as a custom operator (plugin).
