Currently our Airflow metrics are sent to Datadog via https://docs.datadoghq.com/integrations/airflow/?tab=host. There is a metric airflow.executor.queued_tasks (the number of queued tasks on the executor), but I'm looking for the names of the queued tasks. Any idea how to get them?
More context: I would like to get an alert if a task has been queued for more than X minutes.
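The airflow.executor.queued_tasks gauge is not tagged with task names, so the task-level detail has to come from Airflow itself, for example by querying the metadata database for task instances stuck in the queued state. Below is a minimal sketch, assuming Airflow 2.x (the TaskInstance.queued_dttm column and the create_session helper); long_queued_tasks is a hypothetical helper name:

from datetime import datetime, timedelta, timezone

from airflow.models import TaskInstance
from airflow.utils.session import create_session
from airflow.utils.state import State

def long_queued_tasks(minutes=10):
    """Return (dag_id, task_id, queued_dttm) for tasks queued longer than `minutes`."""
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=minutes)
    with create_session() as session:
        tis = (
            session.query(TaskInstance)
            .filter(
                TaskInstance.state == State.QUEUED,
                TaskInstance.queued_dttm < cutoff,
            )
            .all()
        )
        return [(ti.dag_id, ti.task_id, ti.queued_dttm) for ti in tis]

You could run something like this from a small monitoring DAG or a cron job, push the result to Datadog as a custom metric or event, and alert on that.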
I have a DAG with an HttpSensor set to mode="reschedule". This task hits an API that returns True after 100 seconds for a given run. I thought that setting my sensor to mode="reschedule" would free the worker so that it could pick up a new task or start working on a new run. That does not seem to be the case: I can have at most 16 parallel runs, whether their sensor tasks are in up_for_reschedule or running status.
Am I misunderstanding what mode="reschedule" does?
Or is there a limit on the number of concurrent runs set somewhere else? If so, is there a way to cap the number of workers (so as not to run out of resources) while allowing an unlimited number of runs?
The rescheduled task:
from airflow.providers.http.sensors.http import HttpSensor

wait_for_sleep = HttpSensor(
    task_id='http_sensor_check',
    http_conn_id='',  # no named connection; the full URL is passed via `endpoint`
    endpoint="http://0.0.0.0:8081/long_sleep/{{ run_id }}",
    request_params={},
    response_check=lambda response: response.text == "True",
    poke_interval=60,  # seconds between pokes
    mode='reschedule',  # release the worker slot between pokes
    dag=dag,
)
I also scaled up the number of concurrent runs the DAG could have so that it exceeded max_active_tasks_per_dag, thinking this would give me 16 tasks in the running state and an unlimited number in the up_for_reschedule state. However, that does not appear to be the case; see the sketch below.
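For what it's worth, the 16-run ceiling is most likely max_active_runs_per_dag, whose default is 16: a sensor in up_for_reschedule does free its worker slot, but the run it belongs to still counts as an active run. Below is a minimal sketch of the DAG-level overrides, assuming Airflow 2.4+ parameter names (max_active_tasks, max_active_runs, schedule); the DAG id is made up:

from datetime import datetime

from airflow import DAG

dag = DAG(
    dag_id='long_sleep_example',  # hypothetical id for illustration
    start_date=datetime(2024, 1, 1),
    schedule=None,
    max_active_tasks=16,  # cap on task instances actually occupying worker slots
    max_active_runs=32,   # raise this to allow more runs in up_for_reschedule
)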
I cannot start my DAG because of this error. I thought it was caused by the sensor, but after editing my DAG a couple of times the problem persists, so it does not seem to be connected to the DAG's code.
Error message: "Executor reports task instance finished (failed) although the task says its queued. Was the task killed externally?"
A gRPC server queues incoming requests and serves them based on the maxWorker configuration passed when the server starts up. How do I print the number of items in the queue as a metric? Essentially, I would like to keep track of the number of requests in the waiting state.
You can pass your own executor to serverBuilder.executor(). Note that you are then responsible for shutting down the executor service after the server has terminated.
The solution for grpc-python is similar to grpc-java: you can pass your own future-based executor to the server constructor and monitor task submission yourself.
gRPC Python API: grpc.server(thread_pool, ...)
The executor class to extend: concurrent.futures.ThreadPoolExecutor
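Putting that together for grpc-python, below is a minimal sketch of an instrumented executor. grpc.server() and concurrent.futures.ThreadPoolExecutor are the real APIs; the class name and the waiting property are made up for illustration:

import threading
from concurrent import futures

import grpc

class QueueMonitoringExecutor(futures.ThreadPoolExecutor):
    """ThreadPoolExecutor that counts tasks submitted but not yet started,
    i.e. requests still waiting for a free worker thread."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._lock = threading.Lock()
        self._waiting = 0

    def submit(self, fn, *args, **kwargs):
        with self._lock:
            self._waiting += 1

        def wrapped(*a, **kw):
            # Runs on a worker thread, so the task has left the queue.
            with self._lock:
                self._waiting -= 1
            return fn(*a, **kw)

        return super().submit(wrapped, *args, **kwargs)

    @property
    def waiting(self):
        with self._lock:
            return self._waiting

executor = QueueMonitoringExecutor(max_workers=10)
server = grpc.server(executor)  # register servicers, then poll executor.waiting

Since the server hands each incoming RPC to this executor, executor.waiting approximates the number of requests waiting for a worker, and you can log it or export it to your metrics system.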
A JMS queue has two consumers: a synchronous and an asynchronous Java application process, both waiting for responses.
1) The synchronous application sends a request and waits up to 60 seconds for the response, matched by JMS correlation ID.
2) The asynchronous thread listens constantly on the same queue.
In this scenario, when a response arrives on the queue within 60 seconds, I would expect the load to be distributed across both the synchronous and the asynchronous application. However, for some unknown reason almost all response messages are consumed by the synchronous process, and only occasionally are messages picked up by the asynchronous process.
Are there any factors that could cause the synchronous application to pick up almost all the messages?
There is usually no guarantee that the load will be distributed evenly, especially between a synchronous and an asynchronous consumer. The synchronous consumer has to poll, wait, poll, wait, while the asynchronous consumer is probably waiting on the socket in a separate thread until a message arrives and then invokes your callback. So the asynchronous consumer will almost always be there first; a toy simulation of that race follows below.
Any chance you can switch to topics and discard the messages you don't want? Or change your synchronous consumer to be asynchronous? Another alternative would be to build a small 'async' gateway in front of your synchronous consumer: a little application that consumes asynchronously and copies each received message to a second queue, where the synchronous consumer picks it up. Depending on your JMS provider, it might support this kind of 'JMS bridge' already - which one are you using?
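To make the 'parked versus polling' point concrete, here is a toy Python simulation using a plain queue.Queue rather than JMS (all names are made up): one thread blocks on the queue like an asynchronous MessageListener, while the other polls and sleeps like a synchronous receive loop. The parked consumer takes nearly every message:

import queue
import threading
import time

q = queue.Queue()
counts = {'async': 0, 'sync': 0}

def async_style():
    while True:
        if q.get() is None:  # parked on the queue until a message arrives
            return
        counts['async'] += 1

def sync_style():
    while True:
        try:
            msg = q.get_nowait()  # poll ...
        except queue.Empty:
            time.sleep(0.02)      # ... wait, then poll again
            continue
        if msg is None:
            return
        counts['sync'] += 1

threads = [threading.Thread(target=async_style),
           threading.Thread(target=sync_style)]
for t in threads:
    t.start()
for _ in range(200):
    q.put('response')
    time.sleep(0.005)
q.put(None)
q.put(None)  # one shutdown sentinel per consumer
for t in threads:
    t.join()
print(counts)  # the parked consumer typically wins almost every message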
I have a NiFi processor that calls an external service that can take days to return a result. During this time the processor can call Thread.sleep() periodically to relinquish the CPU.
The issue is that even if Thread.sleep() is called in the onTrigger() method, the NiFi processor will not read in and handle new FlowFiles, since it is waiting for onTrigger() to finish. From NiFi's perspective the processor is still blocked waiting for the asynchronous call to finish.
Is there a way to maintain concurrency when asynchronous calls are being made in the onTrigger() method of a NiFi processor?
Val Bonn's suggestion of pushing asynchronous FlowFiles back to a WAIT queue works well. As asynchronous requests come in, Java Process objects are created and held in memory. The FlowFile is then routed to a WAIT relationship, which is connected back into the processor. Periodically, FlowFiles from the WAIT queue are checked against their corresponding Process to see whether it has completed; if so, the FlowFile is routed to a SUCCESS relationship, otherwise it is penalized. This allows many long-running asynchronous processes to be kicked off without allocating precious CPU resources to each incoming request. One source of complexity was handling processor shutdowns invoked from the UI. In those situations an onStopped method is invoked that waits for all in-memory processes to complete and archives their stderr and stdout to disk. When the processor is started again, the archive is read back in and paired against any FlowFiles in the WAIT queue.