Doctrine2: Cannot find concurrently persisted entity with findById - symfony

I have the following setup:
A regular Symfony2 web request can create and persist a Job entity, which also creates a Gearman job; let's say this occurs in process 1. The Gearman job is executed by a Gearman worker, which is passed the Job entity's ID.
I also use Symfony to create a Gearman worker; this runs as a PHP CLI process, let's call it process 2.
For those not familiar with Gearman the worker code operates something like so:
loop 5 times:
    get job from Gearman (blocking method call)
    get Job entity from database
    do stuff
Essentially this code keeps a Symfony2 instance running to handle 5 Jobs before the worker dies.
My issue is this: on the first job that the worker handles, Doctrine2 is able to retrieve the created Job from the database without issue, using the following code:
$job = $this->doctrine
    ->getRepository('AcmeJobBundle:Job')
    ->findOneById($job->workload()); // workload is the job id
However, once this job completes and the for loop increments to wait for a second job (let's say this one arrives from another Symfony2 web request in process 3, creating the Job with ID 2), the call to the Doctrine2 repository returns null even though the entity is definitely in the database.
Restarting the worker solves the issue, so on its first loop it can pick up Job 2.
Does anyone know why this happens? Does the first call to getRepository or findOneById do some sort of table caching from MySQL that prevents it from seeing the subsequently added Job 2?
Does MySQL only show a snapshot of the DB to a given connection for as long as the connection is held open?
I've also tried resetting the entity manager before making the second call to findOneById, to no avail.
Thanks for any advice in advance, this one is really stumping me.
Update:
I've created a single-process test case to rule out whether the concurrency was causing the problem, and the test case executes as expected. It seems the only time the repository can't find Job 2 is when it is added to the DB by another process.
// Job 1 already exists
$job = $this->doctrine
    ->getRepository('AcmeJobBundle:Job')
    ->findOneById(1);
$job->getId(); // this is fine

$em->persist(new Job()); // creates Job 2
$em->flush();

$job = $this->doctrine
    ->getRepository('AcmeJobBundle:Job')
    ->findOneById(2);
$job->getId(); // this is fine too, no exception

Perhaps one process tries to load the entity before it has been saved by the second process.
Doctrine caches loaded entities by their ID, so when you make a second request for the same object it is loaded without another query to the database. You can read more about the Doctrine identity map in the Doctrine documentation.
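Based on that, a minimal sketch of the worker loop with the identity map cleared between jobs (the `nextGearmanJob()` helper is hypothetical, standing in for the blocking Gearman call in the question; `EntityManager::clear()` is the standard Doctrine call that detaches all managed entities):

```php
<?php
// Sketch: clear the identity map between jobs so each lookup hits the database.
for ($i = 0; $i < 5; $i++) {
    $gearmanJob = $this->nextGearmanJob(); // hypothetical blocking Gearman call

    $em = $this->doctrine->getManager();
    $em->clear(); // detach all managed entities; stale identity-map entries are dropped

    $job = $em->getRepository('AcmeJobBundle:Job')
        ->findOneById($gearmanJob->workload()); // workload is the Job id

    // ... do stuff with $job ...
}
```

Note that if the cause is MySQL's REPEATABLE READ snapshot rather than the identity map, the worker's open transaction also has to end (commit or roll back) before newly inserted rows become visible to it.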


Correct Approach For Airflow DAG Project

I am trying to see whether Airflow is the right tool for some functionality I need in my project. We want to use it as a scheduler for running a sequence of jobs that start at a particular time (or possibly on demand):
1. The first task is to query the database for the list of job IDs to sequence through.
2. For each job in the sequence, send a REST request to start the job.
3. Wait until the job completes or fails (via REST call or DB query).
4. Go to the next job in the sequence.
I am looking for recommendations on how to break the functionality discussed above into an Airflow DAG. So far my approach would be to:
create a Hook for the database and another for the REST server,
create a custom operator that handles starting and monitoring the "job" (steps 2 and 3),
use a sensor to handle waiting for the job to complete.
Thanks

CloudFoundry App instances - EF Core database migration

I've written a .NET Core REST API which migrates/updates the database (using Entity Framework Core) in Startup.cs. Currently only one instance is running in the production environment, but it seems to be recommended to run two instances in production.
What happens while executing the cf push command? Are both instances stopped automatically or do I need to execute cf stop?
In addition, how do I prevent both instances from updating the database?
I've read about the CF_INSTANCE_INDEX environment variable. Is it OK to only start the database migration when CF_INSTANCE_INDEX is 0? Or does Cloud Foundry provide the following mechanism: start the first instance, and only once it is up and running, start the second instance?
What happens while executing the cf push command? Are both instances stopped automatically or do I need to execute cf stop?
Yes, your app will stop. The new code will stage (i.e. buildpacks run) and produce a droplet. Then the system will bring up all the requested instances using the new droplet.
In addition, how do I prevent both instances from updating the database? I've read about the CF_INSTANCE_INDEX environment variable. Is it OK to only start the database migration when CF_INSTANCE_INDEX is 0?
You can certainly do it that way. The instance number is guaranteed to be unique and the zeroth instance will always exist, so if you limit to the zeroth instance then it's guaranteed to only run once.
Another option is to run your migration as a task (i.e. cf run-task). This runs in its own container, so it would only run once regardless of the number of instances you have. This SO post has some tips about running a migration as a task.
Or does CloudFoundry provide the next mechanism: start the first instance and when this one is up-and-running, the second instance will be started?
It does, it's the --strategy=rolling flag for cf push.
See https://docs.cloudfoundry.org/devguide/deploy-apps/rolling-deploy.html
I'm not sure that this feature would work for ensuring your migration runs only once. According to the docs (See "How it works" section at the link above), your new and old containers could overlap for a short period of time. If that's the case, running the migration could potentially break the old instances. It'll be a short period of time, just until they get replaced with new instances, but maybe something to consider.

Concurrency issues with Symfony and Doctrine

Good morning,
I have an issue selecting the next record with Doctrine under concurrency. I have installed supervisord inside a Docker container; it starts multiple processes running the same "dispatch" command. The dispatch command basically gets the next job in the queue from the DB and sends it to the right executor. Right now I have two Docker containers that each run multiple processes through supervisord, and these two containers are on two different servers. I'm also using Doctrine optimistic locking. The Doctrine query to find the next job in the queue is the following:
$qb = $this->createQueryBuilder('job')
    ->andWhere('job.status = :todo')
    ->setMaxResults(1)
    ->orderBy('job.priority', 'DESC')
    ->addOrderBy('job.createdAt', 'ASC')
    ->setParameters(array('todo' => Job::STATUS_TO_DO));

return $qb->getQuery()->getOneOrNullResult();
The issue is that when a worker tries to get the next job with the above query, it frequently runs into an OptimisticLockException. That in itself is fine, as it means the record is already taken by another worker; when it happens, the exception is caught, the worker stops, and another one starts. But I lose a lot of time this way, because workers need multiple tries before they finally get the next job instead of an optimistic lock exception.
I thought about getting a random job id in the above Doctrine query.
What's your take on this? Is there a better way to handle this?
I finally figured it out. There was a delay between one of the servers and the remote MySQL instance, so updates were not visible right away, and that triggered the OptimisticLockException. I fixed it by moving the MySQL DB to Azure, which is much faster than the old server and causes no delays.
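If contention remains a problem, an alternative worth sketching (not part of the original answer) is to claim the next job under a pessimistic write lock, so only one worker can read and update a given row at a time. The query below mirrors the one in the question and assumes the same Job entity; the `setStatus()` call is a hypothetical setter used to mark the job as claimed:

```php
<?php
use Doctrine\DBAL\LockMode;

// Sketch: claim the next job inside a transaction with SELECT ... FOR UPDATE,
// so concurrent workers cannot pick the same row.
$em->getConnection()->beginTransaction();
try {
    $query = $em->createQueryBuilder()
        ->select('job')
        ->from('AcmeJobBundle:Job', 'job')
        ->andWhere('job.status = :todo')
        ->orderBy('job.priority', 'DESC')
        ->addOrderBy('job.createdAt', 'ASC')
        ->setMaxResults(1)
        ->setParameter('todo', Job::STATUS_TO_DO)
        ->getQuery()
        ->setLockMode(LockMode::PESSIMISTIC_WRITE); // requires an open transaction

    $job = $query->getOneOrNullResult();
    if ($job !== null) {
        $job->setStatus(Job::STATUS_IN_PROGRESS); // hypothetical setter
        $em->flush();
    }
    $em->getConnection()->commit();
} catch (\Exception $e) {
    $em->getConnection()->rollBack();
    throw $e;
}
```

The trade-off is that workers serialize on the locked row instead of failing and retrying, which can be faster than repeated OptimisticLockException round-trips when contention is high.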

How to reschedule a coordinator job in OOZIE without restarting the job?

When I changed the start time of a coordinator job in job.properties in Oozie, the job does not pick up the changed time; instead it keeps running at the old scheduled time.
Old job.properties:
startMinute=08
startTime=${startDate}T${startHour}:${startMinute}Z
New job.properties:
startMinute=07
startTime=${startDate}T${startHour}:${startMinute}Z
The job is not running at the changed time (the 7th minute); it still runs at the 8th minute of every hour.
Can you please let me know how I can make the job pick up the updated properties (the changed timing) without restarting or killing it?
You can't really change the timing of the coordinator via any method provided by Oozie (v3.3.2). When you submit a job, its properties are stored in the database, whereas the actual workflow lives in HDFS.
Every time the coordinator executes, the workflow must be present at the path specified in the properties at submission time, but the properties file itself is not needed. What I mean is that the properties file does not come into the picture after the job has been submitted.
One hack is to update the time directly in the database with an SQL query, but I am not sure about the implications: the property might become inconsistent across the database.
You have to kill the job and resubmit a new one.
Note: Oozie provides a way to change the concurrency, end time, and pause time, as described in the official docs.

How to get the user who initiated the process in IBM BPM 8.5?

How do I get the user who initiated a process in IBM BPM 8.5? I want to reassign my task to the user who actually initiated the process. How can this be achieved in IBM BPM?
There are several ways to get who initiated a task, but who initiated a process instance is somewhat different.
You can do one of the following:
Add a private variable and assign it tw.system.user_loginName in the POST of the start event; you can then read that variable to find the user who initiated the process. (It will be null or undefined if the process was started by a REST API call or a UCA.)
Place a tracking group after the start event, add an input variable username to it, and assign it tw.system.user_loginName. Whenever the process is started, an entry will be inserted into the DB table, and you can retrieve the value from that view in the Performance DB.
There might also be a table logging process instance details where you can find the user ID directly.
I suggest looking at the getStarter() method of the ProcessInstanceData API.
Official Documentation on API
This link on IBM Developerworks should help you too: Process Starter
Unfortunately there's no out-of-the-box way to do this: nothing recorded in the process instance indicates "who" started it. I presume this is because there are many ways to launch a process instance: from the Portal, via a message event, from an API call, etc.
Perhaps the best way to handle this is to add a required input parameter to your BPD and supply "who" started the process when you launch it. Unfortunately you can't supply any inputs from the OOTB Portal "New" action, but you can easily build your own launcher.
If you want to route the first task in the process to the user who started it, the easiest approach is to simply put the start point in the lane and, on the activity, set routing to "Last User In Lane". This covers the use case without requiring you to do the bookkeeping to track the user.
It's been a while since I implemented this, so I can't remember whether it works elegantly if you have system steps before the first task, but that can easily be handled by moving the system steps into the human service, so they execute as part of that call rather than as a separate step in the BPD.
Define a variable of string type, and use a script task to capture the login user and assign it to that variable, so the initiator of the task is available throughout the process.
You can use this line of code to achieve the same:
tw.system.user_loginName
