Concurrency issues with Symfony and Doctrine

Good morning,
I have an issue when selecting the next record with Doctrine under concurrency. I have installed supervisord inside a Docker container that starts multiple processes running the same "dispatch" command. The dispatch command basically gets the next job in the queue from the database and sends it to the right executor. Right now I have two Docker containers that each run multiple processes through supervisord, and these two containers are on two different servers. I'm also using Doctrine optimistic locking. The Doctrine query to find the next job in the queue is the following:
$qb = $this->createQueryBuilder('job')
    ->andWhere('job.status = :todo')
    ->setMaxResults(1)
    ->orderBy('job.priority', 'DESC')
    ->addOrderBy('job.createdAt', 'ASC')
    ->setParameters(array("todo" => Job::STATUS_TO_DO));

return $qb->getQuery()->getOneOrNullResult();
The issue is that when a worker tries to get the next job with the above query, it frequently runs into an OptimisticLockException, which in itself is fine: it means the record is already being used by another worker. When an OptimisticLockException occurs it is caught, the worker stops, and another one starts. But I lose a lot of time this way, because it takes a worker multiple tries to finally claim a job instead of hitting the exception.
I thought about getting a random job id in the above Doctrine query.
What's your take on this? Is there a better way to handle this?
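For reference, a commonly suggested alternative (not from the original post) is to claim the job atomically inside a transaction with a pessimistic write lock, so workers serialize on the row instead of repeatedly failing with optimistic-lock retries. A minimal sketch, assuming the same Job entity and repository as above plus a hypothetical STATUS_IN_PROGRESS constant and setStatus() method:

use Doctrine\DBAL\LockMode;

// Inside the Job repository: atomically claim the next job.
public function claimNextJob(): ?Job
{
    $em = $this->getEntityManager();
    $claimed = null;

    // transactional() opens a transaction and flushes/commits at the end;
    // a pessimistic lock requires an active transaction.
    $em->transactional(function () use ($em, &$claimed) {
        $job = $this->createQueryBuilder('job')
            ->andWhere('job.status = :todo')
            ->setParameter('todo', Job::STATUS_TO_DO)
            ->orderBy('job.priority', 'DESC')
            ->addOrderBy('job.createdAt', 'ASC')
            ->setMaxResults(1)
            ->getQuery()
            ->setLockMode(LockMode::PESSIMISTIC_WRITE) // SELECT ... FOR UPDATE
            ->getOneOrNullResult();

        if ($job !== null) {
            $job->setStatus(Job::STATUS_IN_PROGRESS); // hypothetical status
        }

        $claimed = $job;
    });

    return $claimed;
}

With plain FOR UPDATE on MySQL, other workers block on the locked row rather than erroring; on MySQL 8+ a native SKIP LOCKED query can avoid even that wait.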

I finally figured it out. There was a delay between one of the servers and the remote MySQL database, so updates were not seen right away, and that triggered the OptimisticLockException. I fixed it by moving the MySQL DB to Azure, which is much faster than the old server and causes no delays.

Related

Long running php process with Doctrine

I created a Symfony 3 command that is expected to run for days (or even weeks). It uses Doctrine 2 for reading some initial data and for writing the execution status from time to time. The SQL queries are expected to take a few milliseconds.
My concern is that the whole process will eventually crash if the MySQL connection closes due to inactivity.
Question: is Doctrine keeping the database connection open between flush calls? Or, is it reconnecting every time flush is called?
AFAIK Symfony will open a connection to the database the first time Doctrine is used in your app and close it when the HTTP request has been handled (or if you specifically tell Doctrine to close it). Once connected, Doctrine keeps the connection active until you explicitly close it (it stays open before, during, and after flush()).
In your case you should probably open and close the db connection explicitly when you need it. Something like the following code could solve your problem:
// When you need the DB
/** @var \Doctrine\DBAL\Connection $connection */
$connection = $this->get('doctrine')->getConnection();

// Check if the connection is still active and, if not, connect to the db
if (!$connection->isConnected()) {
    $connection->connect();
}

// Your code to update the database goes here.

// Once you're done with the db update, close the connection.
if ($connection->isConnected()) {
    $connection->close(); // close the db connection
}
This will avoid DB connection timeouts and the like; however, you should be quite careful about memory leaks if this script will be running as long as you say. Using Symfony might not be the best approach to this problem.
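As a side note (not part of the original answer), two things commonly done in long-running Doctrine commands to keep memory in check are disabling the SQL logger and clearing the entity manager between batches. A rough sketch, assuming Doctrine ORM 2 / DBAL 2 and a hypothetical processBatch() work function:

// DBAL keeps every executed query in its SQL logger when debugging is on,
// which grows memory without bound in a long-lived process.
$em->getConnection()->getConfiguration()->setSQLLogger(null);

foreach ($batches as $batch) {
    processBatch($batch, $em); // hypothetical work function

    $em->flush();
    // Detach all managed entities so PHP can garbage-collect them.
    $em->clear();
}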
You can simply ping the connection every 1000 seconds or so, i.e. at an interval below MySQL's connection timeout (the wait_timeout setting).
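A minimal sketch of that idea (the interval and the ping query are illustrative, not from the original answer), assuming $connection is the DBAL connection from the snippet above:

$lastPing = time();

while (true) {
    doSomeWork(); // hypothetical unit of work

    // Touch the connection before MySQL's wait_timeout can expire.
    if (time() - $lastPing > 1000) {
        try {
            $connection->executeQuery('SELECT 1');
        } catch (\Exception $e) {
            // The server already dropped us; reconnect.
            $connection->close();
            $connection->connect();
        }
        $lastPing = time();
    }
}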
The best thing to do would be to run a supervising process (e.g. supervisord) that restarts the process as soon as your app stops. Then you can simply tell your script to exit before the connection is dropped (it's a configured value; in MySQL it's the wait_timeout variable). The supervising process will notice your app has died and restart it.
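The "exit before the timeout" part can be as simple as the following sketch (the limit is illustrative; in practice keep it below the server's actual wait_timeout):

$startedAt  = time();
$maxRuntime = 3600; // seconds, below MySQL's wait_timeout

while (true) {
    doSomeWork(); // hypothetical unit of work

    // Exit cleanly; supervisord notices and starts a fresh process
    // with a fresh database connection.
    if (time() - $startedAt > $maxRuntime) {
        exit(0);
    }
}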

Horizontal scaling and cron jobs

I was recently forced to move my app to Amazon and use auto-scaling, and I have stumbled onto an issue with cron jobs and automatic scaling.
I have a cron job running every 15 minutes which checks whether subscriptions should be charged; the query selects all subscriptions that are past due and attempts to charge them. It changes their status once processed, but they are fetched in a batch, and the process takes 1-3 minutes.
If I have multiple instances with the same cron job, it could fire simultaneously and charge the subscriptions multiple times. This has actually happened once.
What is the best approach here? Somehow locking the table?
I am using Amazon Elastic Beanstalk and Symfony 3.
At the very least you can use a dedicated micro instance for subscription charging (not auto-scaled, of course), just with cron jobs. It's the simplest way and also the safest: it's obviously safer if you move your subscription-handling logic off the front-end servers, which could potentially be hacked, to a server behind a VPC subnet that isn't reachable from the public network.
But if you don't want to do that, you can still use another approach. You mentioned you use Beanstalk, and Beanstalk allows you to use delayed jobs.
So a possible approach is:
1) When you create the subscription, you can calculate when it should be charged and then push a job with the calculated delay to a Beanstalk tube.
2) A worker then gets the job (with the subscription) on time. Only one worker will get a particular job, so this works even with autoscaling.
3) In the worker, you check the subscription (it may have been deleted or become inactive, etc.) and, if it is ready to be charged, run the charging code. Then calculate the next charging time and push a new delayed job (with the subscription) to the queue.
Beanstalk has a Symfony bundle and a powerful PHP library.
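Purely as an illustration (not code from the original answer), here is a rough sketch of that delayed-job flow with the pda/pheanstalk library against a beanstalkd server; the tube name, payload, and the getNextChargeAt() accessor are made up:

use Pheanstalk\Pheanstalk;

$pheanstalk = Pheanstalk::create('queue.example.internal');

// Producer: when the subscription is created, schedule its first charge.
$delay = $subscription->getNextChargeAt()->getTimestamp() - time();
$pheanstalk->useTube('subscription-charges');
$pheanstalk->put(
    json_encode(['subscription_id' => $subscription->getId()]),
    Pheanstalk::DEFAULT_PRIORITY,
    max(0, $delay), // seconds until the job becomes reservable
    120             // time-to-run before the job is released back
);

// Consumer (any of the auto-scaled workers): only one worker reserves a given job.
$pheanstalk->watch('subscription-charges');
$job  = $pheanstalk->reserve();
$data = json_decode($job->getData(), true);
// ... load the subscription, charge it, push the next delayed job ...
$pheanstalk->delete($job);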
You can make your job run on only one instance, i.e. make your functionality (charging subscriptions) run on only one of the instances.
You can use the AWS API to fetch all instances and then match them against the currently running one:
require 'aws-sdk-ec2' # (or 'aws-sdk' for SDK v2)
require 'net/http'

ec2 = Aws::EC2::Resource.new(
  region: 'region',
  credentials: Aws::Credentials.new('IAM_KEY', 'IAM_SECRET')
)

# The instance metadata endpoint returns the id of the instance we run on.
metadata_endpoint = 'http://169.254.169.254/latest/meta-data/'
current_server_id = Net::HTTP.get(URI.parse(metadata_endpoint + 'instance-id'))

instances = []
ec2.instances.each do |i|
  instances << i.id if i.state.name == 'running'
end

# Only the "first" running instance performs the work.
if instances.first == current_server_id
  # your functionality
end
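Since the question is about a Symfony 3 / PHP app, here is a rough PHP equivalent of the same idea (not from the original answer), using the aws/aws-sdk-php v3 package; the region and the selection rule are placeholders:

use Aws\Ec2\Ec2Client;

$ec2 = new Ec2Client([
    'region'  => 'eu-west-1', // placeholder
    'version' => 'latest',
    // credentials resolved from the environment or the instance profile
]);

// Ask the instance metadata service which instance we are running on.
$currentId = trim(file_get_contents('http://169.254.169.254/latest/meta-data/instance-id'));

// Collect the ids of all running instances.
$result = $ec2->describeInstances([
    'Filters' => [
        ['Name' => 'instance-state-name', 'Values' => ['running']],
    ],
]);

$ids = [];
foreach ($result['Reservations'] as $reservation) {
    foreach ($reservation['Instances'] as $instance) {
        $ids[] = $instance['InstanceId'];
    }
}

// Let only one instance (here: the lexically first id) run the cron work.
sort($ids);
if (isset($ids[0]) && $ids[0] === $currentId) {
    // charge subscriptions here
}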

Pagodabox or PHPfog + Gearman

All,
I'm looking for a good way to do some job backgrounding through either of these two services.
I see PHPFog supports IronWorks, but I need something more realtime. Through these cloud-based PaaS services I'm not able to use popen(background.php --token=1234), so I'm thinking the best solution might be to try to kick off a Gearman worker to handle the job. (Actually my preferred method would be to use websockets to keep a connection open and receive feedback from the job, rather than long-polling a db table through AJAX, but neither of these services supports websockets.)
Question 1 is, is there a better solution than using gearman to offload the job?
Question 2 is, http://help.pagodabox.com/customer/portal/articles/430779 I see pagodabox supports 'worker listeners' ... has anybody set this up with gearman? Would it work?
Thanks
I am using PagodaBox with a background worker in an application I am building right now. Basically, PagodaBox daemonizes a PHP process for you (meaning it will continually run in the background), so all you really have to do is create a script that checks a database table for tasks to run, runs them, and then sleeps a bit so it's not running too many queries against your database.
This is a simplified version of what I have running:
// Remove time limit
set_time_limit(0);

// Show ALL errors
error_reporting(-1);

// Run daemon
echo "--- Starting Daemon ---\n";

while (true) {
    // Query 'work_queue' table for new tasks
    // Loop over items and do whatever tasks are associated with them
    // Update row to mark task as completed

    // Wait a bit
    sleep(30);
}
A benefit to this approach is that it's easy to test via CLI:
php tasks.php
You will see all the echo statements come through in the console as it's running, and of course it's much easier to do than a more complicated setup with other dependencies like Gearman.
So whenever you add a new task to the table, the maximum amount of time you'll wait for that task to be picked up is 30 seconds (or whatever your sleep time is). This is preferable to cron jobs, because if you set up a cron job to run every minute (the lowest possible interval) and the work takes longer than a minute, another cron process will start working on the same queue, and you can end up with a lot of duplicated task work that is hard to debug and troubleshoot. If instead you have either a single background worker that runs all tasks, or multiple background workers that each handle a different task type, you will never run into this issue.

How can I remove Host Instance Zombies from BTMessageBox

After moving most of our BT applications from a BizTalk 2009 to a BizTalk 2010 environment, we began the work of removing old applications and unused hosts. In this process we ended up with a zombie host instance.
This resulted in bts_CleanupDeadProcesses starting to fail with the error "Executed as user: RH\sqladmin. Could not find stored procedure 'dbo.int_ProcessCleanup_ProcessLabusHost'. [SQLSTATE 42000] (Error 2812). The step failed."
After looking at the CleanupDeadProcesses job, I found the zombie host instance in the BTMsgBox.ProcessHeartBeats table, with dtNextHeartbeatTime set to the time when the host was removed.
(I'm assuming that the Host Instance Processes don't exist in your services any longer, and that the SQL Agent job fails)
From looking at the source of the [dbo].[bts_CleanupDeadProcesses] job, it loops through the dbo.ProcessHeartbeats table with a cursor (btsProcessCurse, lol) looking for 'dead' heartbeats.
Each process instance has its own cleanup sproc int_ProcessCleanup_[HostName] and a sproc for the heartbeat watchdog to call, viz bts_ProcessHeartbeat_[HostName] (although FWIW the sproc calls it @ApplicationName), filtered by WHERE (s.dtNextHeartbeatTime < @dtCurrentTime).
It is thus tempting to just delete the record for your deleted / zombie host (or, if you aren't that brave, to simply update the dtNextHeartbeatTime on the heartbeat record for your dead host instance to sometime next century). Either way, the SQL Agent job should then skip the dead instances.
An alternative could be to try and re-create the Host and Instances with the same name through the Admin Console, just to delete them (properly) again. This might however cause additional problems as BizTalk won't be able to create the 2 SPROCs above because of the undeleted objects.
Obviously, though, I wouldn't do this on your prod environment until you've confirmed that it works with a trial run first.
It looks like someone else got stuck with a similar situation here
And there is also a good dive into the details of how the heartbeat mechanism works by XiaoDong Zhu here
Have you tried BTSTerminator? That works for one-off cleanups.
http://www.microsoft.com/en-us/download/details.aspx?id=2846

Doctrine2: Cannot find concurrently persisted entity with findById

I have the following setup:
A regular Symfony2 web request can create and persist a Job entity, which also creates a Gearman job; let's say this occurs in process 1. The Gearman job is executed by a Gearman worker, which is passed the Job entity's ID.
I also use Symfony to create the Gearman worker; this runs as a PHP CLI process, let's call it process 2.
For those not familiar with Gearman the worker code operates something like so:
for loop 5 times
    get job from gearman (blocking method call)
    get job entity from database
    do stuff
Essentially this code keeps a Symfony2 instance running to handle 5 Jobs before the worker dies.
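To make the shape of that loop concrete, here is a rough sketch (not the original poster's code) using the pecl gearman extension's GearmanWorker; the function name and the $doctrine service are assumptions:

$worker = new GearmanWorker();
$worker->addServer(); // defaults to 127.0.0.1:4730

// 'process_job' is a hypothetical function name registered by the producer.
$worker->addFunction('process_job', function (GearmanJob $gearmanJob) use ($doctrine) {
    $jobId = $gearmanJob->workload(); // the Job entity's ID

    $job = $doctrine
        ->getRepository('AcmeJobBundle:Job')
        ->findOneById($jobId);

    // ... do stuff with $job ...
});

// Handle 5 jobs, then let the process die (a supervisor can restart it).
for ($i = 0; $i < 5; $i++) {
    $worker->work(); // blocks until a job arrives
}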
My issue is this: on the first job that the worker handles, Doctrine2 is able to retrieve the created Job from the database without issue, using the following code:
$job = $this->doctrine
    ->getRepository('AcmeJobBundle:Job')
    ->findOneById($job->workload()); // workload is the job id
However, once this job completes and the for loop moves on to wait for a second job, let's say one arriving from another Symfony2 web request in process 3 that creates the Job with ID 2, the call to the Doctrine2 repository returns null even though the entity is definitely in the database.
Restarting the worker solves the issue, so when it carries out its first loop it can pick up Job 2.
Does anyone know why this happens? Does the first call of getRepository or findOneById do some sort of table caching from MySQL that doesn't allow it to see the subsequently added Job 2?
Does MySQL only show a snapshot of the DB to a given connection as long as it is held open?
I've also tried resetting the entityManager before making the second call to findOneBy to no avail.
Thanks for any advice in advance, this one is really stumping me.
Update:
I've created a single-process test case to rule out whether it was the concurrency causing the problem, and the test case executes as expected. It seems the only time the repository can't find Job 2 is when it is added to the DB by another process.
// Job 1 already exists
$job = $this->doctrine
    ->getRepository('AcmeJobBundle:Job')
    ->findOneById(1);

$job->getId(); // this is fine.

$em->persist(new Job()); // creates job 2
$em->flush();

$job = $this->doctrine
    ->getRepository('AcmeJobBundle:Job')
    ->findOneById(2);

$job->getId(); // this is fine too, no exception.
Perhaps one process tries to load the entity before it has been saved by the second process.
Doctrine caches loaded entities by their ID, so that when you request the same object a second time it is returned from memory without another query to the database. You can read more about Doctrine's identity map here.
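To illustrate that point (this is not from the original answer), one thing to try is clearing the entity manager's identity map between jobs so the next lookup actually hits the database; note that the connection's transaction isolation level (MySQL defaults to REPEATABLE READ) can cause a similar "stale snapshot" effect if a transaction is left open. A minimal sketch:

for ($i = 0; $i < 5; $i++) {
    // ... block until Gearman hands us a job id ($jobId) ...

    // Detach everything loaded so far so find() queries the database
    // instead of returning a stale entity from the identity map.
    $em->clear();

    $job = $em->getRepository('AcmeJobBundle:Job')->find($jobId);

    // ... do stuff with $job ...
}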
