Polling multiple SQS messages using Airflow SQSSensor

I am using these SQSSensor settings to poll messages:
fetch_sqs_message = SQSSensor(
    task_id="...",
    sqs_queue="...",
    aws_conn_id="aws_default",
    max_messages=10,
    wait_time_seconds=30,
    poke_interval=60,
    timeout=300,
    dag=dag
)
I would assume that every time it polls, it should fetch up to 10 messages; my queue had around 5 messages in it when I tested this.
But each time I trigger the DAG, it only pulls 1 message at a time, which I found out from the SQS message count.
Why is it doing this? How can I get it to poll as many messages as possible?

Recently, a new feature was added to SQSSensor so that the sensor can poll SQS multiple times instead of only once.
You can check out this merged PR.
For example, if num_batches is set to 3, SQSSensor will poll the queue 3 times before returning the results.
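For illustration, a minimal sketch of what that might look like; the num_batches parameter follows the description above, while the queue URL, task id, and exact class name (SqsSensor in newer amazon provider releases, SQSSensor in older ones) are assumptions to check against your installed provider version. Since a single SQS ReceiveMessage call returns at most 10 messages and may return fewer, polling several batches increases how many messages you collect per poke.

from airflow.providers.amazon.aws.sensors.sqs import SqsSensor  # named SQSSensor in older providers

fetch_sqs_messages = SqsSensor(
    task_id="fetch_sqs_messages",  # illustrative task id
    sqs_queue="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",  # example queue URL
    aws_conn_id="aws_default",
    max_messages=10,       # SQS returns at most 10 messages per ReceiveMessage call
    num_batches=3,         # poll the queue 3 times before returning, as described above
    wait_time_seconds=20,  # SQS long polling allows at most 20 seconds per call
    poke_interval=60,
    timeout=300,
    dag=dag,
)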
Disclaimer: I contributed to this feature.

Related

How to trigger Airflow DAG from AWS SQS?

I would like to trigger an Airflow DAG based on SQS messages. I am quite new to Airflow, but this is how I think it should be done:
Option 1
Use the Airflow SQS Sensor. From my understanding, this waits on SQS messages to proceed with the execution of an already triggered DAG. Does this mean a DAG would always need to be running and waiting for SQS messages to catch any eventual new messages and process them? Does this also mean I should schedule my DAG on a very short interval, so that when an SQS message gets handled by one DAG run, another is created to handle the next SQS messages?
Option 2
Add a Lambda (or something similar) that watches for SQS messages and uses the Airflow API to trigger DAGs when needed.
Eventually, I would like to minimise the number of interactions needed to trigger a DAG so I would like to use an Airflow built-in way of watching SQS.
Thank you
Both options are valid; however, Option 2 is basically an alternative implementation of the sensor. I think the better solution is Option 1 with some modification:
Use SQSSensor but with mode='reschedule'. That way, every once in a while the sensor "wakes up" and checks whether the criteria are met. Note that this is not like sleep(x): when the criteria aren't met, Airflow releases the worker for other tasks that need to run and returns the SQSSensor to the scheduling queue.
You can read more about the sensor modes in the docs.
from airflow.providers.amazon.aws.sensors.sqs import SQSSensor
SQSSensor(
    task_id='test_task',
    dag=dag,
    sqs_queue='your_queue',
    aws_conn_id='aws_default',
    mode='reschedule')
Note that the sensor will run indefinitely until the criteria are met. You can set a timeout on the sensor task, as sketched below (there are other possible causes of a timeout, like cluster policy and other defaults, but that is another topic).
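As a rough sketch of combining those settings (the poke_interval and timeout values below are arbitrary placeholders, not recommendations):

from airflow.providers.amazon.aws.sensors.sqs import SQSSensor

SQSSensor(
    task_id='test_task',
    dag=dag,
    sqs_queue='your_queue',
    aws_conn_id='aws_default',
    mode='reschedule',    # frees the worker slot between pokes
    poke_interval=300,    # re-check the queue every 5 minutes
    timeout=6 * 60 * 60)  # fail the task if no message arrives within 6 hours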

Azure Service Bus Topic trigger in Function App - limiting the number of messages read

I have a Service Bus trigger in an Azure Function App which reads the messages (which are in JSON format) coming from the subscription. I would like to know if there is a way to limit the number of requests processed by the Service Bus trigger. So, for example, if my trigger fires and there are 20 messages to be processed, I would like only the first 10 to be processed, and then the next 10. How can I achieve that?
I am asking this because I am doing some manipulation with the received messages: first I build a list of the information and run some SQL queries over it in C#, and I would prefer my code NOT to handle all the messages at once.
You can configure this in host.json. Here's the documentation:
learn.microsoft.com
Just add "maxConcurrentCalls": 10 to the messageHandlerOptions, and it will process at most 10 messages simultaneously.
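For reference, a minimal host.json sketch of where that setting lives, assuming a Functions v2/v3 runtime with the in-process Service Bus extension (the exact path can differ between extension versions, so verify against the linked documentation):

{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "messageHandlerOptions": {
        "maxConcurrentCalls": 10
      }
    }
  }
}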

grpc client completion queue not shutting down

My code performs the following:
1) Create a gRPC channel
2) Start monitoring the completion queue in a different thread
3) Issue shutdown on the completion queue
After executing step 3, I expect cq.Next(&tag, &ok) to return false, as there are no pending events after the above 3 steps. But I observe that cq.Next(&tag, &ok) never returns false. Please let me know if I am missing something.
Thanks,
Ikshu
In order to get channel state notifications, a tag was being added to the queue, and that would always post some events, so cq->Next() never returned false. I fixed this by achieving the same functionality with the already existing standard API for channel state. So closing the bug.

rabbitmq: message is not consumed by consumer, but publisher is able to publish message

I am using RabbitMQ for messaging in my service. Let's suppose there are 2 microservices, A and B.
There are 3 more exchanges, each with its respective queue, in between.
A is the publisher and B is the consumer here. When sending a message from A, it successfully lands in the queue (I can see in the console that the queue count increases). But the consumer is not able to receive messages; previously it was working.
For the other exchanges and queues, the consumer is working fine.
I tried purging the queue and restarting the application; that didn't help. There were always 4 unacked messages in the queue and the rest were ready to go. Finally I deleted the queue, the exchange, and the respective routing key and recreated them, and then everything worked fine.
Can anyone help me understand what happened here? Why didn't it work?
Sometimes, when message processing fails and we throw an error, the message gets stuck and goes into an infinite loop (queue -> processing -> queue -> ...) if the method keeps throwing errors.
For the other messages in the queue, we can process them by increasing the batch concurrency, but the unacked message will remain; it will only go away if someone stops the consumer.
Now I have one question: can I set a retry limit for processing unacked messages? If someone knows about this, please help.
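One common way to break the redelivery loop described in this answer (not necessarily what this particular setup uses) is to reject a failing message without requeueing it, optionally routing it to a dead-letter exchange. A minimal sketch with the Python pika client, where the queue name and the process() helper are hypothetical:

import pika

def on_message(ch, method, properties, body):
    try:
        process(body)  # hypothetical business logic
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        # requeue=False keeps the message from looping back into the queue;
        # it is dropped, or routed to a dead-letter exchange if one is configured.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_consume(queue="my.queue", on_message_callback=on_message)
channel.start_consuming()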

How to prevent a Hangfire recurring job from restarting after 30 minutes of continuous execution

I am working on an ASP.NET MVC 5 web application, and I am facing a problem using the Hangfire tool to run long-running background jobs. The problem is that if the job execution exceeds 30 minutes, Hangfire automatically initiates another job, so I end up having two similar jobs running at the same time.
Now I have the following:
Asp.net mvc-5
IIS-8
Hangfire 1.4.6
Windows server 2012
Now I have defined a Hangfire recurring job to run at 17:00 each day. The background job mainly scans our network for servers and VMs and updates the DB, and the recurring job sends an email after completing the execution.
The recurring job used to work well when its execution took less than 30 minutes. But today, as our system has grown, the recurring job completed after 40 minutes instead of the usual 22-25 minutes, and I received 2 emails instead of one (and the time between the emails was around 30 minutes). I re-ran the job manually and noted that the problem is as follows:
"when the recurring job reaches 30 minutes of continuous execution, a
new instance of the recurring job will start, so I will have two
instances instead of one running at the same time, so that why I received 2 emails."
Now, if the recurring job takes less than 30 minutes (for example 29 minutes) I do not face any problem, but if the recurring job's execution exceeds 30 minutes then, for one reason or another, Hangfire will initiate a new job.
Although when I access the Hangfire dashboard during the execution of the job I can see that there is only one active job, when I monitor our DB with the SQL profiler I can see that there are two jobs accessing the DB. This happens 30 minutes after the beginning of the recurring job (at 17:30 in our case), and that is why I received 2 emails, which means 2 recurring jobs were running in the background instead of one.
So can anyone advise on this, please: how can I prevent Hangfire from automatically initiating a new recurring job if the current recurring job's execution exceeds 30 minutes?
Thanks
Did you look at the InvisibilityTimeout setting in the Hangfire docs?
The default SQL Server job storage implementation uses a regular table as a job queue. To be sure that a job will not be lost in case of unexpected process termination, it is deleted from the queue only upon successful completion.
To make it invisible to other workers, an UPDATE statement with an OUTPUT clause is used to fetch a queued job and update the FetchedAt value (which signals to other workers that it was fetched) in an atomic way. Other workers see the fetched timestamp and ignore the job. But to handle process termination, they ignore the job only for a specified amount of time (30 minutes by default).
Although this mechanism ensures that every job will be processed, sometimes it may cause either long retry latency or multiple executions of the same job. Consider the following scenario:
Worker A fetched a job (which runs for an hour) and started it at 12:00.
Worker B fetched the same job at 12:30, because the default invisibility timeout had expired.
Worker C did not fetch the same job at 13:00, because by then it had been deleted upon successful completion.
If you are using cancellation tokens, the token will be set for Worker A at 12:30, and for Worker B at 13:00. This may mean that your long-running job is never executed to completion. If you aren't using cancellation tokens, the job will be executed concurrently by Worker A and Worker B (from 12:30 on), but Worker C will not fetch it, because it will have been deleted after successful completion.
So, if you have long-running jobs, it is better to configure the invisibility timeout interval:
var options = new SqlServerStorageOptions
{
    InvisibilityTimeout = TimeSpan.FromMinutes(30) // default value
};
GlobalConfiguration.Configuration.UseSqlServerStorage("<name or connection string>", options);
As of Hangfire 1.5 this option is now Obsolete. Jobs that are being worked on are invisible to other workers.
Say goodbye to confusing invisibility timeout with unexpected background job retries after 30 minutes (by default) when using SQL Server. New Hangfire.SqlServer implementation uses plain old transactions to fetch background jobs and hide them from other workers.
Even after ungraceful shutdown, the job will be available for other workers instantly, without any delays.
I was having trouble finding documentation on how to do this properly for a PostgreSQL database; every example I saw used SQL Server. I found that the invisibility timeout is a property on the PostgreSqlStorageOptions object, here: https://github.com/frankhommers/Hangfire.PostgreSql/blob/master/src/Hangfire.PostgreSql/PostgreSqlStorageOptions.cs#L36. Luckily, through trial and error, I was able to figure out that UsePostgreSqlStorage has an overload that accepts this object. For .NET Core 2.0, when you are setting up the Hangfire PostgreSQL DB in the ConfigureServices method of the Startup class, add this (the default timeout is 30 minutes):
services.AddHangfire(config =>
    config.UsePostgreSqlStorage(Configuration.GetConnectionString("Hangfire1ConnectionString"), new PostgreSqlStorageOptions
    {
        InvisibilityTimeout = TimeSpan.FromMinutes(720)
    }));
I had this problem when using Hangfire.MemoryStorage as the storage provider. With memory storage you need to set FetchNextJobTimeout in the MemoryStorageOptions; otherwise, by default, jobs will time out after 30 minutes and a new job will be executed.
var options = new MemoryStorageOptions
{
    FetchNextJobTimeout = TimeSpan.FromDays(1)
};
GlobalConfiguration.Configuration.UseMemoryStorage(options);
I would just like to point out that even though the following is stated:
As of Hangfire 1.5 this option is now Obsolete. Jobs that are being worked on are invisible to other workers.
Say goodbye to confusing invisibility timeout with unexpected background job retries after 30 minutes (by default) when using SQL Server. New Hangfire.SqlServer implementation uses plain old transactions to fetch background jobs and hide them from other workers.
Even after ungraceful shutdown, the job will be available for other workers instantly, without any delays.
It seems that for many people using MySQL, PostgreSQL, or MongoDB, InvisibilityTimeout is still the way to go: https://github.com/HangfireIO/Hangfire/issues/1197