Google Cloud Tasks: some tasks remain in queue instead of being processed - google-cloud-tasks

I have a Google Cloud Tasks queue (rate: 10/s, bucket: 200, concurrent: 1) that dispatches tasks to a worker in an App Engine service (Python 2.7 runtime). Tasks are normally added to the queue at about 3-4/s. Each task is processed one at a time (no concurrency).
In general, each task is processed very quickly (in less than 1 s). Surprisingly, the queue sometimes randomly "pauses" a small subset of 5-20 tasks. New incoming tasks are processed as usual, but the paused ones stay on the queue for several minutes, even when the worker is idle and could process them. After 7-9 minutes, they are processed automatically without any other interaction. The issue is that this delay is too long to be acceptable :(
While "paused", I can manually execute those tasks by clicking on the "Run" button and they are immediately processed. So I'd rule out some kind of limitation on the worker side.
I tried redeploying the queue.yaml. I also tried pausing and resuming the queue. Neither had any effect.
No errors are reported. Tasks are not retried, just ignored for some minutes.
Has anybody experienced this behavior? Any help will be appreciated. Thanks.
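For readers unfamiliar with the rate and bucket settings mentioned above, here is a minimal sketch of a generic token bucket, assuming the queue limits behave like one. The class name `TokenBucket` and the method `try_dispatch` are hypothetical and are not part of any Cloud Tasks API.

```python
class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/s, holds at most `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # the bucket starts full
        self.last = 0.0                # time of the last refill, in seconds

    def try_dispatch(self, now):
        # Refill according to elapsed time, capped at the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1           # one token is spent per dispatched task
            return True
        return False


# Same numbers as the queue in the question: rate 10/s, bucket 200.
bucket = TokenBucket(rate=10, capacity=200)
# A burst of 250 tasks at t=0: only the first 200 find a token.
burst = sum(bucket.try_dispatch(now=0.0) for _ in range(250))
# One second later, 10 more tokens have been added.
later = sum(bucket.try_dispatch(now=1.0) for _ in range(50))
```

With tasks arriving at 3-4/s against a refill rate of 10/s, a bucket like this would never run empty, which suggests the rate and bucket settings alone would not explain the delay.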

Cloud Tasks now uses gcloud (Cloud SDK) to manage the queue configuration. queue.yaml is a part of the legacy App Engine SDK for App Engine Task Queues. Uploading a queue.yaml when using Cloud Tasks may cause your queue to be disabled or paused.
To learn more about queue management, see Using Queue Management versus queue.yaml.
To learn more about migrating from Task Queues to Cloud Tasks, see Migrating from Task Queues to Cloud Tasks.

Related

RabbitMQ - fair distribution of (user related) task processing

We have a multi-user application which is working on tasks initiated by the users.
These tasks are running asynchronously therefore we are using RabbitMQ technology for the task distribution.
In the first version of the program, the tasks of every user were sent immediately to a single worker queue, so we faced the following problem: if "User A" sends his tasks quicker than "User B", then "User B" has to wait until the tasks of "User A" are all completed.
In the next version of the program we want fair distribution, so we introduced per-user queues. In the first phase the tasks of every user are sent to the user's own queue, and we created a consumer which reads messages from all of the users' queues and forwards them to the worker queue we used in the first version of the program. We expected that, by limiting the number of messages in the worker queue, we would read messages equally from the users' queues. But it is not working: all of the users' tasks are immediately forwarded to the worker queue, as if the message limit on the worker queue did not exist.
I think it should work, but we missed some configuration...
Thank you, if somebody could help us.
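The intended behaviour can be sketched in plain Python, independent of RabbitMQ: take one message per user in turn and stop forwarding once the worker queue reaches its limit. The function `forward_fairly` and the `limit` parameter are hypothetical names for illustration; this is a simulation of the idea, not broker configuration.

```python
from collections import deque


def forward_fairly(user_queues, worker_queue, limit):
    """Move messages round-robin from per-user queues into worker_queue,
    stopping once worker_queue holds `limit` messages."""
    moved = 0
    while len(worker_queue) < limit:
        progressed = False
        for user, pending in user_queues.items():
            # Take at most one message per user per round.
            if pending and len(worker_queue) < limit:
                worker_queue.append((user, pending.popleft()))
                moved += 1
                progressed = True
        if not progressed:   # every user queue is empty
            break
    return moved


user_queues = {
    "A": deque(["a1", "a2", "a3", "a4"]),
    "B": deque(["b1"]),
}
worker_queue = deque()
moved = forward_fairly(user_queues, worker_queue, limit=3)
```

Here "User B" gets a slot in the worker queue even though "User A" submitted four tasks first, and the remaining tasks of "User A" stay in that user's own queue until the worker queue has room again.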

Scheduling thousands of tasks with Airflow

We are considering using Airflow for a project that needs to make thousands of calls a day to external APIs in order to download external data, where each call might take many minutes.
One option we are considering is to create a task for each distinct API call; however, this will lead to thousands of tasks. Rendering all those tasks in the UI is going to be challenging. We are also worried about the scheduler, which may struggle with so many tasks.
The other option is to have just a few parallel long-running tasks and then implement our own scheduler within those tasks. We can add custom code to a PythonOperator, which will query the database and decide which API to call next.
Perhaps Airflow is not well suited for such a use case and it would be easier and better to implement such a system outside of Airflow? Does anyone have experience with running thousands of tasks in Airflow and can shed some light on pros and cons on the above use case?
One task per call would kill Airflow as it still needs to check on the status of each task at every heartbeat - even if the processing of the task (worker) is separate e.g. on K8s.
Not sure where you plan on running Airflow but if on GCP and a download is not longer than 9 min, you could use the following:
task (PythonOperator) -> pubsub -> cloud function (to retrieve) -> pubsub -> function (to save result to backend).
The latter function may not be required but we (re)use a generic and simple "bigquery streamer".
Finally, you query in a downstream AF task (PythonSensor) the number of results in the backend and compare with the number of requests published.
We do this quite efficiently for 100K API calls to a third-party system we host on GCP as we maximize parallelism. The nice thing about GCF is that you can tweak the architecture to use and the concurrency, instead of provisioning a VM or container to run the tasks.

Play framework job queue

How does play handle asynchronous jobs when they are called using the now() method?
Are they executed immediately, or are they stored in a queue and processed by a fixed number of threads? What sort of control do we have over that?
When you call now(), your job is put into a ScheduledThreadPoolExecutor via submit(). Since the executor uses a fixed-size pool, your job may end up being queued. Also, the pool is shared with your scheduled jobs, so you may have contention with them in addition to any jobs you spawned on demand.
You can adjust the size of the pool in your application's configuration, using the play.jobs.pool setting. The default value is 10.
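The queuing effect of a fixed-size pool can be illustrated with a Python analogue. This is not Play's API: it uses `concurrent.futures.ThreadPoolExecutor`, and `POOL_SIZE` is a hypothetical stand-in for the play.jobs.pool setting.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2            # hypothetical stand-in for play.jobs.pool
lock = threading.Lock()
running = 0              # jobs executing right now
peak = 0                 # highest number of jobs seen executing at once


def job(n):
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.05)     # simulate some work
    with lock:
        running -= 1
    return n


with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    # Six jobs are submitted at once; the extra ones wait in the pool's queue.
    futures = [pool.submit(job, n) for n in range(6)]
    results = [f.result() for f in futures]
```

All six jobs complete, but never more than `POOL_SIZE` of them run at the same time; the rest wait their turn, which is the behaviour described for jobs started with now().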

Long-running ASP.NET tasks

I know there's a bunch of APIs out there that do this, but I also know that the hosting environment (being ASP.NET) puts restrictions on what you can reliably do in a separate thread.
I could be completely wrong, so please correct me if I am, this is however what I think I know.
A request typically times out after 120 seconds (this is configurable), and the ASP.NET runtime will eventually kill a request that's taking too long to complete.
The hosting environment, typically IIS, employs process recycling and can at any point decide to recycle your app. When this happens all threads are aborted and the app restarts. I'm not sure how aggressive it is, however. It would be kind of stupid to assume that it would abort a normal ongoing HTTP request, but I would expect it to abort a thread because it doesn't know anything about the unit of work of a thread.
If you had to create a programming model that could easily and reliably run a long-running task, one that would have to run for days, how would you accomplish this from within an ASP.NET application?
The following are my thoughts on the issue:
I've been thinking along the lines of hosting a WCF service in a Win32 service and talking to the service through WCF. This is however not very practical, because the only reason I would choose to do so is to send tasks (units of work) from several different web apps. I'd then eventually ask the service for status updates and act accordingly. My biggest concern with this is that it would NOT be a particularly great experience if I had to deploy every task to the service for it to be able to execute some instructions. There's also the issue of input: how would I feed this service with data if I had a large data set and needed to chew through it?
What I typically do right now is this
SELECT TOP 10 *
FROM WorkItem WITH (ROWLOCK, UPDLOCK, READPAST)
WHERE WorkCompleted IS NULL
It allows me to use a SQL Server database as a work queue and periodically poll the database with this query for work. If the work item completed with success, I mark it as done and proceed until there's nothing more to do. What I don't like is that I could theoretically be interrupted at any point and if I'm in-between success and marking it as done, I could end up processing the same work item twice. I might be a bit paranoid and this might be all fine but as I understand it there's no guarantee that that won't happen...
I know there have been similar questions on SO before, but none really has a definitive answer. This is a really common thing, yet the ASP.NET hosting environment is ill-equipped to handle long-running work.
Please share your thoughts.
Have a look at NServiceBus:
"NServiceBus is an open source communications framework for .NET with built-in support for publish/subscribe and long-running processes."
It is a technology built upon MSMQ, which means that your messages don't get lost since they are persisted to disk. Nevertheless, the framework has impressive performance and an intuitive API.
John,
I agree that ASP.NET is not suitable for Async tasks as you have described them, nor should it be. It is designed as a web hosting platform, not a back of house processor.
We have had similar situations in the past and we have used a solution similar to what you have described. In summary, keep your WCF service under ASP.NET, use a "Queue" table with a Windows Service as the "QueueProcessor". The client should poll to see if work is done (or use messaging to notify the client).
We used a table that contained the process and its information (e.g. InvoicingRun). On that table was a status (Pending, Running, Completed, Failed). The client would submit a new InvoicingRun with a status of Pending. A Windows service (the processor) would poll the database to get any runs that were in the pending stage (you could also use SQL Notification so you don't need to poll). If a pending run was found, it would move it to running, do the processing and then move it to completed/failed.
In the case where the process failed fatally (e.g. DB down, process killed), the run would be left in a running state, and human intervention was required. If the process failed in a non-fatal way (exception, error), the run would be moved to failed, and you could choose to retry or have human intervention.
If there were multiple processors, the first one to move the run to a running state got that job. You can use this method to prevent the job from being run twice. An alternative is to do the select and then the update to running under a transaction. Make sure either of these runs outside of any larger transaction. Sample (rough) SQL:
UPDATE InvoicingRun
SET Status = 2 -- Running
WHERE ID = 1
AND Status = 1 -- Pending
IF @@ROWCOUNT = 0
SELECT Cast(0 as bit)
ELSE
SELECT Cast(1 as bit)
Rob
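The conditional UPDATE in the answer above can be tried out end to end with a small sketch. It assumes SQLite (via Python's `sqlite3` module) instead of SQL Server, so the cursor's `rowcount` plays the role of `@@ROWCOUNT`; the helper `try_claim` and the numeric status codes are hypothetical.

```python
import sqlite3

PENDING, RUNNING = 1, 2  # status codes, as in the rough SQL above


def try_claim(conn, run_id):
    """Return True only if this caller moved the run from Pending to Running."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "UPDATE InvoicingRun SET Status = ? WHERE ID = ? AND Status = ?",
            (RUNNING, run_id, PENDING),
        )
    # Exactly one updated row means we won the race for this run.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE InvoicingRun (ID INTEGER PRIMARY KEY, Status INTEGER)")
conn.execute("INSERT INTO InvoicingRun (ID, Status) VALUES (1, ?)", (PENDING,))
conn.commit()

first = try_claim(conn, 1)   # the first processor claims the run
second = try_claim(conn, 1)  # a second attempt finds it already Running
```

Because the status check and the status change happen in one statement, a second processor cannot claim a run that is already running.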
Use a simple background tasks / jobs framework like Hangfire and apply these best-practice principles to the design of the rest of your solution:
Keep all actions as small as possible; to achieve this, you should:
Divide long-running jobs into batches and queue them (in a Hangfire queue or on a bus of another sort)
Make sure your small jobs (batched parts of long jobs) are idempotent (have all the context they need to run in any order). This way you don't have to use a queue which maintains a sequence, because then you can
Parallelise the execution of jobs in your queue depending on how many nodes you have in your web server farm. You can even control how much load this subjects your farm to (as a trade off to servicing web requests). This ensures that you complete the whole job (all batches) as fast and as efficiently as possible, while not compromising your cluster from servicing web clients.
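As a rough illustration of the batching advice above, here is a framework-free sketch. It does not use Hangfire; `split_into_batches` and `BatchProcessor` are hypothetical names, and an in-memory set stands in for whatever durable store would record completed batches.

```python
def split_into_batches(items, size):
    """Divide a long job's work items into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


class BatchProcessor:
    def __init__(self):
        self.done = set()   # ids of batches that were already applied
        self.total = 0      # the job's result: here simply a running sum

    def process(self, batch_id, batch):
        # Idempotent: a batch that was already applied is skipped.
        if batch_id in self.done:
            return False
        self.total += sum(batch)
        self.done.add(batch_id)
        return True


batches = split_into_batches(list(range(1, 11)), size=4)
processor = BatchProcessor()
# Batches arrive out of order, and batch 0 is delivered twice.
for batch_id in (2, 0, 1, 0):
    processor.process(batch_id, batches[batch_id])
```

The result is the same regardless of delivery order or duplicate delivery, which is what lets the batches be spread across nodes without a sequence-preserving queue.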
Have you thought about using Workflow Foundation instead of your custom implementation? It also allows you to persist states. Tasks could be defined as workflows in this case.
Just some thoughts...
Michael

Keeping my web app running after Browser close

I have an ASPX web application that updates or adds files in a database. The clients access it through the browser, and one of the requirements is that they can start the update and be able to close the browser while the update continues. It appears to run for a little bit after I close the browser but then it stops. How can you keep the application running in ASP.NET?
That's something you could very well solve with WF (Workflow Foundation). Create a workflow for the task that should survive closing the browser. Workflows have their own threads and lifecycles separate from ASP.NET.
The web application will keep running in the application pool, but this will be recycled eventually. As long as the user's session runs, the application should be kept alive, so by upping the session timeout you may fix the problem.
A better approach though would be to move the long-running task into a service instead, but that may require a rewrite of your application.
Usually for long-running or asynchronous processing, you want to dispatch the request to a back-end service to handle. Trying to keep the web-app alive to finish processing can lead to problems, especially with HTTP and session timeouts.
A common pattern for this is to put the request on a message queue and let a back-end service process it when it can.
I would create a separate windows service that you can push jobs onto from your web application, then check the status of the job(s) when the user logs in again.
The Windows service won't be tied to the ASP.NET app domain, so it will continue to run regardless of what's happening in your web application.
I've run into this pattern and you have to decouple the work from the HTTP request. The way we've solved it is to abstract the computing to be done as an event to be scheduled. So, say a user at a browser takes an action that requires a (relatively) long-lived computation on the back end. This computation is given a name like 'doXYZForUser' and a parameter vector like (userId, params...) and sent off to the work queue. Some time in the future the user logs in again and can see what the status of their job is.
I'm running a Java stack and a Java Message Service (JMS) but the principle is the same. The request from the browser queues up an event and the browser gets an ACK back saying the event is on the work queue. The queue is managed by an entirely separately running process which in .NET I believe is just called the Message Queue. The job comes up on the queue and gets processed, and the results can be placed in a separate table containing a reference to the user that kicked off the job, so the next time they log in job status/results can be returned.
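The flow described above (enqueue, ACK, process later, look up status) can be sketched in a few lines. This is a single-process illustration only: an in-memory `queue.Queue` and a thread stand in for the message queue and the separate back-end process, the dictionaries stand in for the results table, and all names (`submit`, `worker`, `job-1`) are hypothetical. Nothing here is durable.

```python
import queue
import threading

jobs = queue.Queue()   # stands in for the message queue
status = {}            # job_id -> "queued" or "done"; stands in for a status table
results = {}           # job_id -> (user_id, result)


def submit(job_id, user_id, params):
    """Called from the web request: enqueue the work and return immediately."""
    status[job_id] = "queued"
    jobs.put((job_id, user_id, params))
    return "ACK"


def worker():
    """Back-end loop: process jobs until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            break
        job_id, user_id, params = item
        results[job_id] = (user_id, sum(params))  # placeholder computation
        status[job_id] = "done"
        jobs.task_done()


t = threading.Thread(target=worker)
t.start()
ack = submit("job-1", user_id=42, params=[1, 2, 3])
jobs.join()      # in this demo, wait until the queued job has been processed
jobs.put(None)   # tell the worker to stop
t.join()
```

The request handler only ever calls `submit`; when the user comes back later, the application reads `status` and `results` for that user's jobs instead of waiting on the computation.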
