Symfony2 Job Queue or Parallel Processing? - symfony

Does anyone know how to run a number of processes in the background, either through a job queue or through parallel processing?
I have a number of maintenance updates that take time to run and want to do this in the background.

I would recommend a Gearman server. It has proved quite stable. It runs entirely outside of Symfony2, so you need to have the server up and running (I don't know what your hosting options are), but it distributes jobs perfectly. In its leanest configuration it keeps all jobs in memory, but you can configure it to use an SQLite database as a backup, so if the server reboots or the Gearman daemon crashes for any reason, you can just start it again and your jobs will be preserved. I know it has been tested with very large loads (adding up to 1k jobs per second), and it stood its ground. It's probably more stable nowadays; I'm speaking from experience two years ago, when we offloaded some long-running tasks in a ZF application to background processing via Gearman. How it works should be fairly self-explanatory from the image below:
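The idea of an in-memory queue with an SQLite backup can be sketched in a few lines of Python. This is only an illustration of the concept, not Gearman's API or its actual storage format: the class name `PersistentJobQueue`, the `jobs` table, and the method names are all assumptions made for the example.

```python
import sqlite3
from collections import deque


class PersistentJobQueue:
    """In-memory FIFO of jobs, mirrored to SQLite so it survives a restart.

    Hypothetical illustration only; this is not how Gearman is implemented.
    """

    def __init__(self, db_path):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()
        # Reload any jobs left over from a previous run, oldest first.
        rows = self.db.execute("SELECT id, payload FROM jobs ORDER BY id")
        self.pending = deque(rows.fetchall())

    def submit(self, payload):
        # Write to disk first, then add to the in-memory queue.
        cur = self.db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
        self.db.commit()
        self.pending.append((cur.lastrowid, payload))

    def take(self):
        """Return the next payload, or None when the queue is empty."""
        if not self.pending:
            return None
        job_id, payload = self.pending.popleft()
        # Simplification: the job is forgotten as soon as a worker takes it.
        self.db.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
        self.db.commit()
        return payload

    def close(self):
        self.db.close()
```

Opening a second `PersistentJobQueue` on the same file after a crash would pick up whatever jobs had been submitted but not yet taken, which is the property the answer describes.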

Check out RabbitMQ. It's the most popular option according to knpbundles.com.

Take a look at http://github.com/mmoreram/rsqueue-bundle
It uses Redis as the queue core and will be maintained.

Take a look at the enqueue library. There are a lot of transports (AMQP, STOMP, AmazonSQS, Redis, Filesystem, Doctrine DBAL and more) to choose from. It is easy to use and feature-rich. That would be enough for a simple job queue, though if you need something more sophisticated, look at enqueue/job-queue. It can run an exclusive job (only one job running at a given time), a job with sub-jobs, or a job with something to do after it has finished.
Of course, there is a bundle for it.

Related

How to setup bidirectional rsync?

I tend to run simulations on a cluster that produces files larger than 100MB and I can't sync my computer with the cluster. So I considered setting up rsync between the two by following this link.
However, I believe this is just a cron job to sync the backup server with the main server and doesn't work in both directions. What are the step-by-step instructions for setting up bidirectional rsync?
Both systems run Linux.
Rsync isn't really the right tool for this job. You can sort of get it to work, using cron jobs and extremely carefully chosen parameters, but there's significant danger of data loss, especially if you want file deletion to propagate.
I'd recommend a tool like Syncthing for bidirectional sync. You want something that maintains an independent database of what's changed and what hasn't, and real-time updates are nice to have too.
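Why an independent record of the last synced state matters can be shown with a small sketch. It is not how Syncthing works internally; the function name `plan_sync`, the path-to-hash dictionaries, and the action strings are assumptions made for illustration. Without the `previous` map, a file present on only one side is ambiguous: it may be new there or deleted on the other side.

```python
def plan_sync(previous, side_a, side_b):
    """Decide what to do per path, given path -> content-hash maps.

    previous: state recorded after the last successful sync.
    side_a, side_b: current state of each machine.
    A missing key means the file does not exist there.
    """
    actions = {}
    for path in sorted(set(previous) | set(side_a) | set(side_b)):
        old = previous.get(path)
        a = side_a.get(path)
        b = side_b.get(path)
        if a == b:
            continue  # Already identical (or deleted on both sides).
        if a == old:
            # Only B changed since the last sync.
            actions[path] = "delete on A" if b is None else "copy B -> A"
        elif b == old:
            # Only A changed since the last sync.
            actions[path] = "delete on B" if a is None else "copy A -> B"
        else:
            # Both sides changed differently: needs a person or a policy.
            actions[path] = "conflict"
    return actions
```

A plain two-way rsync has no `previous` to consult, which is where the risk of resurrecting deleted files or losing data comes from.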

Free up workers for processing while waiting for long DB queries (uWSGI)?

I am maintaining an API server for my company, which runs a Python Flask app in uWSGI on top of nginx.
...
@app.route('/getquick', methods=["GET"])
def GET_GET_IP_DATA():
    sp_final = "CALL sp_quick()"
    cursor.execute(sp_final)
@app.route('/get_massive_log', methods=["POST"])
def get_massive_log():
    sp_final = "CALL sp_slow()"
    cursor.execute(sp_final)
...
While the first request, /getquick, gets processed very quickly, /get_massive_log can take up to five seconds due to a rather long and complex MySQL query. The server can handle a few of these queries but starts producing broken pipe errors when it is called too much.
The problem is that the other /getquick requests get blocked by these long I/O requests.
My manager suggested that I use gevent to somehow free up the server to process the other requests while waiting for the MySQL queries, but I am not sure if I am looking in the correct direction.
I am using pymysql to run queries, which Google seems to suggest works with gevent on top of uWSGI, but I have not been able to produce better results with it.
I have googled for days now, and while I am trying to understand threads, concurrency, and asynchronous requests, I don't know where to start digging to find a solution. Is it even possible? Any suggestions or even pointers on where to research would be greatly appreciated.
EDIT: Perhaps my question wasn't too clear, so I'll try to restate it:
What's the best way to free up workers for processing other requests while waiting for long database queries with uwsgi?
You need to learn about uWSGI offloading:
"Offloading is a way to optimize tiny tasks, delegating them to one or more threads. These threads run such tasks in a non-blocking/evented way allowing for a huge amount of concurrency."
You can read about the offloading subsystem in the docs.

Python: Prioritizing tasks and Running asynchronous tasks without a lock

Right now I'm using Gevent, and I wanted to ask two questions:
Is there a way to execute specific tasks so that they never run asynchronously (instead of using a Lock in each of these tasks)?
Is there a way to prioritize spawned tasks in Gevent? For example, a group of tasks spawned with low priority that are executed only when all of the other tasks are done, or two tasks that listen on different sockets, each handling its socket's requests at a different priority.
If it's not possible in Gevent, is there any other library that can do it?
Edit
Maybe Celery can help me here?
If you want to manage computing resources, Python async libraries can't help here because, AFAIK, none of them has a priority scheduler. All greenthreads are equal.
Task queues generally have a notion of priority, so Celery or Beanstalk is one way to do it.
If your problem does not require task (re)execution guarantees, persistence, or multi-machine work distribution, then I would just start a few worker processes, assign them CPU, IO, and disk priorities using the OS, and send work/results via UNIX DGRAM sockets. It is a kind of ad hoc, simpler version of a task queue. If you go this way, please share your work as an open source project; I believe there's demand for this kind of solution.
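The notion of priority that task queues provide can be sketched with the standard library alone. This is a single-process illustration, not Celery's or Beanstalk's API; the class `PriorityTasks` and the constants `HIGH` and `LOW` are names invented for the example.

```python
import itertools
import queue

HIGH, LOW = 0, 10  # Lower number is served first (illustrative values).


class PriorityTasks:
    """Hypothetical helper: queue callables and run them by priority."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        # The counter breaks ties so equal priorities keep FIFO order
        # and the callables themselves are never compared.
        self._counter = itertools.count()

    def submit(self, priority, func, *args):
        self._queue.put((priority, next(self._counter), func, args))

    def run_all(self):
        """Run queued tasks in priority order and return their results."""
        results = []
        while True:
            try:
                _, _, func, args = self._queue.get_nowait()
            except queue.Empty:
                return results
            results.append(func(*args))
```

Because a single consumer drains the queue one task at a time, tasks submitted this way also never run concurrently with each other, which may cover the first question without a per-task Lock.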

Load balance background tasks in Azure Web App

I am developing an ASP.NET application that will be hosted as an Azure web app. Part of the app will continuously record multiple web-based cameras by retrieving a snapshot every N seconds. I would like to design the app so that the processes that record the cameras can be run on multiple instances. I would like it to load balance between all instances, but not duplicate effort for any one camera.
For example, if I have 100 cameras, and am running on 2 instances, I want each instance to get 50 cameras to process. If I have 5 instances, each instance should get 20 cameras to process. As I add cameras or scale instances up/down I would like for the system to load balance the work evenly.
If it's feasible, I would rather not spin up dedicated VMs just for processing cameras, due to increased cost.
I'm somewhat familiar with Akka.NET, Hangfire, and WebJobs, but am unclear if these will help in this scenario. I have used Hangfire and WebJobs to do background processing, but not with this sort of load-balancing requirement. Will these or some other framework or tool help me load balance these background tasks evenly across Azure Web App Instances? How should I go about setting up these or another framework to do this?
I honestly don't think you want to try to "balance" the servers. I think you just want to make sure the work is well distributed. If I were you, I would use a queue system like SQS to queue up all of the cameras that need a snapshot and let each instance worker dequeue one at a time and process it.
A good approach could be to have a master server responsible for queueing up the snapshots, and then have all of your worker servers simply work out of this shared queue. Even if one server happens to process more than the others, that is fine since the others were working out of the same queue. It just means that this server was able to process its jobs more quickly than the others.
To be honest, there are a lot of ways to approach this. You could do something as simple as having a shared list of your cameras, with a timestamp for the last snapshot, and work off of that. Each server would request a camera, look at the list to find one that was stale, update the timestamp, and perform the snapshot for that camera. The downside to something like this is that you are going to struggle with non-atomic operations and the possibility of multiple workers making the request at the same time and both working on the same camera. These are the types of things that a queue system will help you with, because as soon as a queue item is in flight, it is no longer available to other workers. Also, because each server is responsible for invalidating its items once it is finished, if a server were to crash mid-snapshot, this work would simply go back into the queue.
No matter which solution you choose, it is going to boil down to having a central system/list for serving up stale cameras.
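The "shared list with a timestamp" approach only works if finding a stale camera and updating its timestamp happen as one atomic step. A minimal sketch of that step follows, under loud assumptions: `CameraRegistry` and `claim_stale` are hypothetical names, and the in-process `threading.Lock` stands in for whatever cross-instance mechanism (a database transaction, a lease, or a real queue) would be needed across multiple servers.

```python
import threading
import time


class CameraRegistry:
    """Shared list of cameras with the time each was last claimed."""

    def __init__(self, camera_ids, interval_seconds):
        self.interval = interval_seconds
        self.last_claimed = {cam: 0.0 for cam in camera_ids}
        # Assumption: a lock is enough here because everything runs in one
        # process; separate servers would need a shared atomic store.
        self._lock = threading.Lock()

    def claim_stale(self, now=None):
        """Atomically pick the stalest camera that is due, or return None."""
        now = time.time() if now is None else now
        with self._lock:
            due = [c for c, t in self.last_claimed.items()
                   if now - t >= self.interval]
            if not due:
                return None
            camera = min(due, key=lambda c: self.last_claimed[c])
            # Updating the timestamp inside the lock is what prevents two
            # workers from claiming the same camera.
            self.last_claimed[camera] = now
            return camera
```

Each worker would call `claim_stale()` in a loop and take a snapshot of whatever camera it gets back. Unlike a queue with in-flight items, this sketch does not return work to the pool early if a worker crashes; the camera simply becomes due again after the interval.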
The Azure WebJob SDK uses the Storage Account you set up to balance the work between the various instances that are running your Jobs. You can gain finer control by using a Queue to divide up the work that needs doing and then scale your App Service Plan based on the Queue length.
Here's a rough picture of that architecture:

How come my app dies on AppHarbor?

I have a web app that will run forever (at least for a few days) on my local machine using the technique (hack?) described in Jeff Atwood's post: https://blog.stackoverflow.com/2008/07/easy-background-tasks-in-aspnet/
However, when I run it on AppHarbor, my app doesn't run for more than an hour or so (I'm not sure exactly when it dies). As long as I hit the site it stays up, so I'm assuming it is being killed after an idle period, but I'm not sure why.
My app doesn't save any state or persist anything. It makes web service calls and survives errors in any calls.
I turned on a ping service to keep my app alive, but I'm curious: why does this work on my local machine but not on AppHarbor?
The guys behind AppHarbor pay for EC2 instances for all running apps, so they naturally want to limit CPU usage as much as possible. One way to achieve this is to shut down unused applications very quickly and only restart them when someone actually tries to access them. Paid hosting should not be limited in this way.
(As far as I have been informed, they are able to host around 100k sites on fewer than twenty medium instances, which is certainly quite impressive and calls for a very economical use of resources.)
To overcome the limitation you would need a cron job to ping your AppHarbor site. But this is of course quite a recursive problem, since you need AppHarbor to act as the cron job ;)
AppHarbor recycles the Application Pool frequently to keep sleeping websites from using idle CPU time. This is simply the price you pay for using a shared website hosting plan.
If you really want to run a background job then you should be using AppHarbor's background workers, since this is exactly the type of task they were built to run.
http://support.appharbor.com/kb/getting-started/background-workers
Simply build a new console application that runs your logic and include it in your solution. When you push the code the workers will be started automatically. If you happen to already have other exe's in your solution make sure to edit the app.config and set the 'deploy background worker' value to false.
