Airflow HttpSensor won't work - airflow

I'm trying to create a HttpSensor in Airflow using the following code:
wait_to_launch = HttpSensor(
task_id="wait-to-launch",
endpoint='http://' + socket.gethostname() + ":8500/v1/kv/launch-cluster?raw",
response_check=lambda response: True if 'oui'==response.content else False,
dag=dag
)
But I keep getting this error:
Traceback (most recent call last):
File "http_sensor_test.py", line 30, in <module>
dag=dag
File "/home/me/.local/lib/python2.7/site-packages/airflow/utils/decorators.py", line 86, in wrapper
result = func(*args, **kwargs)
File "/home/me/.local/lib/python2.7/site-packages/airflow/operators/sensors.py", line 663, in __init__
self.hook = hooks.http_hook.HttpHook(method='GET', http_conn_id=http_conn_id)
File "/home/me/.local/lib/python2.7/site-packages/airflow/utils/helpers.py", line 436, in __getattr__
raise AttributeError
AttributeError
What am I missing?

You are running into a known issue, see AIRFLOW-1030. A fix has been merged (#2180), but unfortunately is not yet on a released version of airflow. The fix is marked for the next release (1.9.0), but it could be weeks/months until that is out. You can run a fork of airflow with this change or add the updated version of the HttpSensor as a custom operator (plugin).

Related

AirflowException("Task received SIGTERM signal")

I'm running Airflow with Docker swarm on 5 servers. After using 2 months, there are some errors on Dags like this. These errors occurred in dags that using a custom hive operator (similar to the inner function ) and no error occurred before 2 months. (Nothing changed with Dags...)
Also, if I tried to retry dag, sometimes it succeeded and sometimes it failed.
The really weird thing about this issue is that hive job was not failed. After the task was marked as failed in the airflow webserver (Sigterm), the query was complete after 1~10 mins.
As a result, flow is like this.
Task start -> 5~10 mins -> error (sigterm, airflow) -> 1~10 mins -> hive job success (hadoop log)
[2023-01-09 08:06:07,583] {local_task_job.py:208} WARNING - State of this instance has been externally set to up_for_retry. Terminating instance.
[2023-01-09 08:06:07,588] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 135213
[2023-01-09 08:06:07,588] {taskinstance.py:1236} ERROR - Received SIGTERM. Terminating subprocesses.
[2023-01-09 08:13:42,510] {taskinstance.py:1463} ERROR - Task failed with exception
Traceback (most recent call last):
File "/opt/airflow/dags/common/operator/hive_q_operator.py", line 81, in execute
cur.execute(statement) # hive query custom operator
File "/home/airflow/.local/lib/python3.8/site-packages/pyhive/hive.py", line 454, in execute
response = self._connection.client.ExecuteStatement(req)
File "/home/airflow/.local/lib/python3.8/site-packages/TCLIService/TCLIService.py", line 280, in ExecuteStatement
return self.recv_ExecuteStatement()
File "/home/airflow/.local/lib/python3.8/site-packages/TCLIService/TCLIService.py", line 292, in recv_ExecuteStatement
(fname, mtype, rseqid) = iprot.readMessageBegin()
File "/home/airflow/.local/lib/python3.8/site-packages/thrift/protocol/TBinaryProtocol.py", line 134, in readMessageBegin
sz = self.readI32()
File "/home/airflow/.local/lib/python3.8/site-packages/thrift/protocol/TBinaryProtocol.py", line 217, in readI32
buff = self.trans.readAll(4)
File "/home/airflow/.local/lib/python3.8/site-packages/thrift/transport/TTransport.py", line 62, in readAll
chunk = self.read(sz - have)
File "/home/airflow/.local/lib/python3.8/site-packages/thrift_sasl/__init__.py", line 173, in read
self._read_frame()
File "/home/airflow/.local/lib/python3.8/site-packages/thrift_sasl/__init__.py", line 177, in _read_frame
header = self._trans_read_all(4)
File "/home/airflow/.local/lib/python3.8/site-packages/thrift_sasl/__init__.py", line 210, in _trans_read_all
return read_all(sz)
File "/home/airflow/.local/lib/python3.8/site-packages/thrift/transport/TTransport.py", line 62, in readAll
chunk = self.read(sz - have)
File "/home/airflow/.local/lib/python3.8/site-packages/thrift/transport/TSocket.py", line 150, in read
buff = self.handle.recv(sz)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1238, in signal_handler
raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
I already restarted the airflow server and there was nothing changed.
Here is the failed task's log (flower log)
Is there any helpful guide for me?
Thanks :)
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/celery/app/trace.py", line 412, in trace_task
R = retval = fun(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/celery/app/trace.py", line 704, in __protected_call__
return self.run(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/celery_executor.py", line 88, in execute_command
_execute_in_fork(command_to_exec)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/celery_executor.py", line 99, in _execute_in_fork
raise AirflowException('Celery command failed on host: ' + get_hostname())
airflow.exceptions.AirflowException: Celery command failed on host: 8be4caa25d17

Airflow DagRunAlreadyExists even after providing the custom run id and execution date

I am Getting DagRunAlreadyExists exception even after providing the custom run id and execution date.
This occurs when there are multiple request within a second.
Here is the MWAA CLI call
def get_unique_key():
from datetime import datetime
import random
import shortuuid
import string
timestamp = datetime.now().strftime(DT_FMT_HMSf)
random_str = timestamp + ''.join(random.choice(string.digits + string.ascii_letters) for _ in range(8))
uuid_str = shortuuid.ShortUUID().random(length=12)
return '{}{}'.format(uuid_str, random_str)
execution_date = datetime.utcnow().strftime("%Y-%m-%dT%H:%m:%S.%f")
dag_run_id = get_unique_key()
workflow_id = "my_workflow"
conf = json.dumps({"foo": "bar"})
"dags trigger {0} -c '{1}' -r {2} -e {3}".format(workflow_id, conf, dag_run_id, execution_date)
and here is the error log from MWAA CLI. If this can help to debug the issue.
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/dag_command.py", line 138, in dag_trigger
dag_id=args.dag_id, run_id=args.run_id, conf=args.conf, execution_date=args.exec_date
File "/usr/local/lib/python3.7/site-packages/airflow/api/client/local_client.py", line 30, in trigger_dag
dag_id=dag_id, run_id=run_id, conf=conf, execution_date=execution_date
File "/usr/local/lib/python3.7/site-packages/airflow/api/common/experimental/trigger_dag.py", line 125, in trigger_dag
replace_microseconds=replace_microseconds,
File "/usr/local/lib/python3.7/site-packages/airflow/api/common/experimental/trigger_dag.py", line 75, in _trigger_dag
f"A Dag Run already exists for dag id {dag_id} at {execution_date} with run id {run_id}"
airflow.exceptions.DagRunAlreadyExists: A Dag Run already exists for dag id my_workflow at 2022-10-18T06:10:28+00:00 with run id CL4Adauihkvz121928332658Gp6bsTWU
The problem is that the execution_date resolution is seconds . Airflow ignoring the milliseconds.
You can see in the error that no milliseconds mentioned in the execution_date (2022-10-18T06:10:28)

Trouble branching DAGs in Airflow Taskflow API

I am new to Airflow. Using Taskflow API, I am trying to dynamically change the flow of DAGs. If a condition is met, the two step workflow should be executed a second time.
After defining two functions/tasks, if I fix the DAG sequence as below, everything works fine.
#dag(default_args=default_args, schedule_interval=None, start_date=days_ago(2))
def genesis(**kwargs):
#task()
def extract():
print("X")
#task()
def add_timeframe():
print("Y")
extracted_data = extract()
timeframe_data = add_timeframe(extracted_data)
However, I write any conditional logic to trigger the second run (either inside a DAG or after the function/task definitions), I get the error below. The error seems to be about setting upstream tasks. But the older "task.set_upstream(task2)" commands don't work in Taskflow Airflow 2.0.
All examples I could find of conditionally branching were based on the non Taskflow API. Please help.
#dag(default_args=default_args, schedule_interval=None, start_date=days_ago(2))
def genesis(**kwargs):
#task()
def extract():
print("X")
if <condition>:
extracted_data2 = extract()
timeframe_data2 = add_timeframe(extracted_data2)
#task()
def add_timeframe():
print("Y")
extracted_data = extract()
timeframe_data = add_timeframe(extracted_data)
ERROR - Tried to create relationships between tasks that don't have DAGs yet. Set the DAG for at least one task and try again: [<Task(_PythonDecoratedOperator): add_timeframe>, <Task(_PythonDecoratedOperator): extract>]
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1086, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1260, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1300, in _execute_task
result = task_copy.execute(context=context)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/operators/python.py", line 233, in execute
return_value = self.python_callable(*self.op_args, **self.op_kwargs)
File "/opt/airflow/dags/genesis.py", line 77, in extract
timeframe_data = add_timeframe(extracted_data)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/operators/python.py", line 294, in factory
**kwargs,
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 91, in __call__
obj.set_xcomargs_dependencies()
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 722, in set_xcomargs_dependencies
apply_set_upstream(arg)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 711, in apply_set_upstream
apply_set_upstream(elem)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 708, in apply_set_upstream
self.set_upstream(arg.operator)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1239, in set_upstream
self._set_relatives(task_or_task_list, upstream=True)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 1205, in _set_relatives
"task and try again: {}".format([self] + task_list)

Apache Airflow dagrun_operator in Airflow Version 1.10.10

On Migrating Airflow from V1.10.2 to V1.10.10 One of our DAG have a task which is of dagrun_operator type.
Code snippet of the task looks something as below. Please assume that DAG dag_process_pos exists.
The DAG that is being triggered by the TriggerDagRunOperator is dag_process_pos. That starts with task of type dummy_operator [ Just a hint if this could be the trouble maker ]
task_trigger_dag_positional = TriggerDagRunOperator(
trigger_dag_id="dag_process_pos",
python_callable=set_up_dag_run_preprocessing,
task_id="trigger_preprocess_dag",
on_failure_callback=log_failure,
execution_date=datetime.now(),
provide_context=False,
owner='airflow')
def set_up_dag_run_preprocessing(context, dag_run_obj):
ti = context['ti']
dag_name = context['ti'].task.trigger_dag_id
dag_run = context['dag_run']
trans_id = dag_run.conf['transaction_id']
routing_info = ti.xcom_pull(task_ids="json_validation", key="route_info")
new_file_path = routing_info['file_location']
new_file_name = os.path.basename(routing_info['new_file_name'])
file_path = os.path.join(new_file_path, new_file_name)
batch_id = "123-AD-FF"
dag_run_obj.payload = {'inputfilepath': file_path,
'transaction_id': trans_id,
'Id': batch_id}
The DAG runs all fine. In fact the python callable of the task mentioned until the last line. Then it errors out.
[2020-06-09 11:36:22,838] {taskinstance.py:1145} ERROR - No row was found for one()
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/dagrun_operator.py", line 95, in execute
replace_microseconds=False)
File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 141, in trigger_dag
replace_microseconds=replace_microseconds,
File "/usr/local/lib/python3.6/site-packages/airflow/api/common/experimental/trigger_dag.py", line 98, in _trigger_dag
external_trigger=True,
File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/airflow/models/dag.py", line 1471, in create_dagrun
run.refresh_from_db()
File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/airflow/models/dagrun.py", line 109, in refresh_from_db
DR.run_id == self.run_id
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3446, in one
raise orm_exc.NoResultFound("No row was found for one()")
sqlalchemy.orm.exc.NoResultFound: No row was found for one()
After which the on_failure_callback of that task is executed and all code of that callable runs perfectly ok as is expected. The query here is why did the dagrun_operator fail after the python callable.
P.S : The DAG that is being triggered by the TriggerDagRunOperator , in this case dag_process_pos starts with task of typedummy_operator

why I got the errors PartitionOwnedError and ConsumerStoppedException when starting a few consumers

I use pykafka to fetch message from kafka topic, and then do some process and update to mongodb. As the pymongodb can update only one item every time, so I start 100 processes. But when starting, some processes occoured errors "PartitionOwnedError and ConsumerStoppedException". I don't know why.
Thank you.
kafka_cfg = conf['kafka']
kafka_client = KafkaClient(kafka_cfg['broker_list'])
topic = kafka_client.topics[topic_name]
balanced_consumer = topic.get_balanced_consumer(
consumer_group=group,
auto_commit_enable=kafka_cfg['auto_commit_enable'],
zookeeper_connect=kafka_cfg['zookeeper_list'],
zookeeper_connection_timeout_ms = kafka_cfg['zookeeper_conn_timeout_ms'],
consumer_timeout_ms = kafka_cfg['consumer_timeout_ms'],
)
while(1):
for msg in balanced_consumer:
if msg is not None:
try:
value = eval(msg.value)
id = long(value.pop("id"))
value["when_update"] = datetime.datetime.now()
query = {"_id": id}}
result = collection.update_one(query, {"$set": value}, True)
except Exception, e:
log.error("Fail to update: %s, msg: %s", e, msg.value)
>
Traceback (most recent call last):
File "dump_daily_summary.py", line 182, in <module>
dump_daily_summary.run()
File "dump_daily_summary.py", line 133, in run
for msg in self.balanced_consumer:
File "/data/share/python2.7/lib/python2.7/site-packages/pykafka-2.5.0.dev1-py2.7-linux-x86_64.egg/pykafka/balancedconsumer.py", line 745, in __iter__
message = self.consume(block=True)
File "/data/share/python2.7/lib/python2.7/site-packages/pykafka-2.5.0.dev1-py2.7-linux-x86_64.egg/pykafka/balancedconsumer.py", line 734, in consume
raise ConsumerStoppedException
pykafka.exceptions.ConsumerStoppedException
>
Traceback (most recent call last):
File "dump_daily_summary.py", line 182, in <module>
dump_daily_summary.run()
File "dump_daily_summary.py", line 133, in run
for msg in self.balanced_consumer:
File "/data/share/python2.7/lib/python2.7/site-packages/pykafka-2.5.0.dev1-py2.7-linux-x86_64.egg/pykafka/balancedconsumer.py", line 745, in __iter__
message = self.consume(block=True)
File "/data/share/python2.7/lib/python2.7/site-packages/pykafka-2.5.0.dev1-py2.7-linux-x86_64.egg/pykafka/balancedconsumer.py", line 726, in consume
self._raise_worker_exceptions()
File "/data/share/python2.7/lib/python2.7/site-packages/pykafka-2.5.0.dev1-py2.7-linux-x86_64.egg/pykafka/balancedconsumer.py", line 271, in _raise_worker_exceptions
raise ex
pykafka.exceptions.PartitionOwnedError
PartitionOwnedError: check if there are some background process consuming in the same consumer_group, maybe there are not enough available partitions for starting another consumer.
ConsumerStoppedException: you can try upgrading your pykafka version (https://github.com/Parsely/pykafka/issues/574)
I met the same problem like you. But, I confused about others' solutions like adding enough partitions for consumers or updating the version of pykafka.
In fact, mine satisfied those conditions above.
Here is the version of tools:
python 2.7.10
kafka 2.11-0.10.0.0
zookeeper 3.4.8
pykafka 2.5.0
Here is my code:
class KafkaService(object):
def __init__(self, topic):
self.client_hosts = get_conf("kafka_conf", "client_host", "string")
self.topic = topic
self.con_group = topic
self.zk_connect = get_conf("kafka_conf", "zk_connect", "string")
def kafka_consumer(self):
"""kafka-consumer client, using pykafka
:return: {"id": 1, "url": "www.baidu.com", "sitename": "baidu"}
"""
from pykafka import KafkaClient
consumer = ""
try:
kafka = KafkaClient(hosts=str(self.client_hosts))
topic = kafka.topics[self.topic]
consumer = topic.get_balanced_consumer(
consumer_group=self.con_group,
auto_commit_enable=True,
zookeeper_connect=self.zk_connect,
)
except Exception as e:
logger.error(str(e))
while True:
message = consumer.consume(block=False)
if message:
print "message:", message.value
yield message.value
The two exceptions(ConsumerStoppedException and PartitionOwnedError), are raised by the function consum(block=True) of pykafka.balancedconsumer.
Of course, I recommend you to read the source code of that function.
There is a argument block=True, after altering it to False, the programme can not fall into the exceptions.
Then the kafka consumers work fine.
This behavior is affected by a longstanding bug that was recently discovered and is currently being fixed. The workaround we've used in production at Parse.ly is to run our consumers in an environment that handles automatically restarting them when they crash with these errors until all partitions are owned.

Resources