I am trying to do some Bulk inserts into my database. I have read the article about it on the doctrine side and wanted to use sporadic flushs and clears in order to prevent high memory consumptions. Unfortunately all entities get detached in the process, not only the ones I am inserting, but also the relations to it.
I tried to remerge them or use references instead. In my current case I am using a reference and still I get the following error message:
[Doctrine\ORM\ORMInvalidArgumentException]
A new entity was found through the relationship
'Strego\TippBundle\Entity\Bet#betRound' that was not configured to
cascade persist operations for entity: LoadTest GameGroup. To
solve this issue: Either explicitly call EntityManager#persist() on
this unknown entity or configure cascade persist this association in
the mapping for example #ManyToOne(..,cascade={"persist"}).
The relevant coding is this:
// BetRound
print(PHP_EOL."Search for BetROund");
$betRounds = $em->getRepository('StregoTippBundle:BetRound')->findAll();
print(PHP_EOL.'found Betrrounds:'.count($betRounds));
$betRound = current($betRounds);
...
// References
$betRoundRef = $em->getReference('Strego\\TippBundle\\Entity\\BetRound', $betRound->getId());
and here the insert:
foreach($gameRefs as $game){
$bet = new GameBet();
$bet->setBetround($betRoundRef);
$bet->setUser($genuser);
$bet->setGame($game);
$bet->setScoreT1($this->getScore());
$bet->setScoreT2($this->getScore());
$bet->recalculatePoints();
$em->persist($bet);
}
if(($i % self::$batchSize) == 0){
$em->persist($userGroup);
$em->persist($mySelf);
$em->flush();
$em->clear();
$em->merge($betRound);
$em->merge($userGroup);
$em->merge($mySelf);
}
My whole Fixture for loading this data can be found here: https://gist.github.com/KeKs0r/a3006768db267311bb35
When calling the clear method everything is detached (Detaching entities).
You'll need to reload each previously loaded entity (in your case $betRoundRef, $genuser and probably $game too).
Have a look at this Stack Overflow answer
Related
We are migrating a project from a more basic ORM to using Symfony+Doctrine. In the project we have a lot of cron jobs looking like this:
$rows = $someRepository->getRows();
foreach ($rows as $row) {
try {
$db->beginTransaction(); //simple begin transaction in db
//do some handling of data
// Maybe load some other entities and update those
// ...
$db->commit();
} catch (Throwable $t) {
//log error
//clear entity cache
$db->rollback(); //simple rollback in db
}
}
When we did it this way, all changes within the try catch was atomic while it at the same time was possible to recover from an error and continue on the next $row.
In Symfony+Doctrine, I simply cannot figure out how to mimic this behaviour. The recommendation from Doctrine to handle an exception is closing the EntityManager, but how do you recover?
The ORM does this implicitly on flush, so most of the time you can avoid the hassle of doing so on your own.
However, if you want clear demarcation you can still do it explicitly, in a similar manner you did so far.
More reading and examples here: https://www.doctrine-project.org/projects/doctrine-orm/en/2.7/reference/transactions-and-concurrency.html
EDIT related to the comment below:
Instead of injecting the manager, you should inject the registry.
After that on catch, you can check if the $em->isOpen(), and call $registry->resetManager() if not.
I suspect this will also reset the unit of work, so you might encounter detached entities. In that case you should do $em->merge();
One thing to note here is that an expection is not considered normal in doctrine, so they are closing the manager because of that. You might think that this is overcompicated - yes it is, because you are working against the philosophy here. Validate your data if you can. Read this section: https://www.doctrine-project.org/projects/doctrine-orm/en/2.7/reference/transactions-and-concurrency.html#exception-handling
As for the why: (This is not offical, just based on my knowledge) The managers internal unit of work is a stateful object. When an exception occures during a transaction that state will remain the same, but couln't be persisted to the database. If they let this go that would mean the EM would try to apply all state changes again, and would encounter the same exception again. So no point in leaving it open in the same state, a reset is needed.
With Doctrine and Symfony in my PHPUnit test method :
// Change username for user #1 (Sheriff Woody to Chuck Norris)
$form = $crawler->selectButton('Update')->form([
'user[username]' => 'Chuck Norris',
]);
$client->submit($form);
// Find user #1
$user = $em->getRepository(User::class)->find(1);
dump($user); // Username = "Sheriff Woody"
$user = $em->createQueryBuilder()
->from(User::class, 'user')
->andWhere('user.id = :userId')
->setParameter('userId', 1)
->select('
user
')
->getQuery()
->getOneOrNullResult()
;
dump($user); // Username = "Chuck Norris"
Why my two methods to fetch the user #1 return different results ?
diagnosis / explanation
I assume* you already created the User object you're editing via crawler before in that function and checked that it is there. This leads to it being a managed entity.
It is in the nature of data, to not sync itself magically with the database, but some automatism must be in place or some method executed to sync it.
The find() method will always try to use the cache (unless explicitly turned off, also see side note). The query builder won't, if you explicitly call getResult() (or one of its varieties), since you explicitly want a query to be executed. Executing a different query might lead to the cache not being hit, producing the current result. (it should update the first user object though ...) [updated, due to comment from Arno Hilke]
((( side note: Keeping objects in sync is hard. It's mainly about having consistency in the database, but all of ACID is wanted. Any process talking to the database should assume, that it only is working with the state at the moment of its first query, and is the only user of the database. Unless additional constraints must be met and inconsistent reads can occur, in which case isolation levels should be raised (See also: transactions or more precisely: isolation). So, automatically syncing is usually not wanted. Doctrine uses certain assumptions for performance gains (mainly: isolation / locking is optimistic). However, in your particular case, all of those things are of no actual concern... since you actually want a non-repeatable read. )))
(* otherwise, the behavior you're seeing would be really unexpected)
solution
One easy solution would be, to actively and explicitly sync the data from the database by either calling $em->refresh($user), or - before fetching the user again - to call $em->clear(), which will detach all entities (clearing the cache, which might have a noticable performance impact) and allowing you to call find again with the proper results being returned.
Please note, that detaching entities means, that any object previously returned from the entity manager should be discarded and fetched again (not via refresh).
alternate solution 1 - everything is requests
instead of checking the database, you could instead do a different request to a page that displays the user's name and checks that it has changed.
alternate solution 2 - using only one entity manager
using only one entity manager (that is: sharing the entity manager / database in the unit test with the server on the request) may be a reasonable solution, but it comes with its own set of problems. mainly omitted commits and flushes may avoid detection.
alternate solution 3 - using multiple entity managers
using one entity manager to set up the test, since the server is using a new entity manager to perform its work, you should theoretically - to do this actually properly - create yet another entity manager to check the server's behavior.
comment: the alternate solutions 1,2 and 3 would work with the highest isolation level, the initial solution probably wouldn't.
To my surprise, I didn't find standard update and delete operations on InvenItemService. So in order to fullfill our client's requirements, I ran the AIF update wizard and added these two operations. I thought its easy and found the process of doing that very quick. Before doing that, I set the update property of the AxdItem query to Yes. Later, while debugging the update operations, I figured I had to modify the updateList() and Update() methods on AxdItem class as accordingly to provide method definitions.
public AifResult updateList( AifEntityKeyList _entityKeyList,
AifDocumentXml _xml,
AifEndpointActionPolicyInfo _actionPolicyInfo,
AifConstraintListCollection _constraintListCollection)
{
//throw error(strFmt("#SYS94920"));
return super(_entityKeyList, _xml, _actionPolicyInfo, _constraintListCollection);
}
AifResult update( AifEntityKey _entityKey ,
AifDocumentXml _xml,
AifEndpointActionPolicyInfo _actionPolicyInfo,
AifConstraintList _constraintList)
{
//throw error(strFmt("#SYS94920"));
return super(_entityKey, _xml, _actionPolicyInfo, _constraintList);
}
Now while trying to update an existing item in AX, I am getting following AIF exception.
Cannot edit a record in Item sales order settings (InventItemSalesSetup).
The operation cannot be completed, since the record was not selected for update. Remember TTSBEGIN/TTSCOMMIT as well as the FORUPDATE clause.
Then I changed the update property of all the child data sources on AxdItem Query and re-ran the wizard. Ran CIL again and getting the following exception now.
Cannot edit a record in Item sales order settings (InventItemSalesSetup).
An update conflict occurred due to another user process deleting the record or changing one or more fields in the record.
Any suggestions/ideas?
I have tried several things and spent too much time but failed.
Seems like you trying update the record that already has been deleted in method removeDefaultItemOrderSetup of AxdItem class
This will give you a hint what happens
https://dynamicsuser.net/ax/f/developers/72116/aif-update-cannot-be-run-twice
I have an Application that has a relationship to ApplicationFile:
/**
* #ORM\OneToMany(
* targetEntity="AppBundle\Entity\ApplicationFile",
* mappedBy="application",
* cascade={"remove"},
* orphanRemoval=true
* )
*/
private $files;
A file entity has a field that stores binary data, and can be up to 2MB in size. When iterating over a large list of applications and their files, PHP memory usage grows. I want to keep it down.
I've tried this:
$applications = $this->em->getRepository('AppBundle:Application')->findAll();
foreach ($applications as $app) {
...
foreach ($app->getFiles() as $file) {
...
$this->em->detach($file);
}
$this->em->detach($app);
}
Detaching the object should tell the entity manager to stop caring about this object and de-referencing it, but it surprisingly has no effect on the amount of memory usage - it keeps increasing.
Instead, I have to manually load the application files (instead of retrieving them through the association method), and the memory usage does not increase. This works:
$applications = $this->em->getRepository('AppBundle:Application')->findAll();
foreach ($applications as $app) {
...
$appFiles = $this
->em
->getRepository('AppBundle:ApplicationFile')
->findBy(array('application' => $application));
foreach ($appFiles as $file) {
...
$this->em->detach($file);
}
$this->em->detach($app);
}
I used xdebug_debug_zval to track references to the $file object. In the first example, there's an extra reference somewhere, which explains why memory is ballooning - PHP is not able to garbage collect it!
Does anyone know why this is? Where is this extra reference and how do I remove it?
EDIT: Explicitly calling unset($file) at the end of its loop has no effect. There are still TWO references to the object at this point (proven with xdebug_debug_zval). One contained in $file (which I can unset), but there's another somewhere else that I cannot unset. Calling $this->em->clear() at the end of the main loop has no effect either.
EDIT 2: SOLUTION: The answer by #origaminal led me to the solution, so I accepted his answer instead of providing my own.
In the first method, where I access the files through the association on $application, this has a side effect of initializing the previously uninitialized $files collection on the $application object I'm iterating over in the outer loop.
Calling $em->detach($application) and $em->detach($file) only tells Doctrine's UOW to stop tracking the objects, but it doesn't affect the array of $applications I'm iterating on, which now have populated collection of $files which eat up memory.
I have to unset each $application object after I'm done with it to remove all references to the loaded $files. To do this, I modified the loops as such:
$applications = $em->getRepository('AppBundle:Application')->findAll();
$count = count($applications);
for ($i = 0; $i < $count; $i++) {
foreach ($applications[$i]->getFiles() as $file) {
$file->getData();
$em->detach($file);
unset($file);
}
$em->detach($applications[$i]);
unset($applications[$i]);
// Don't NEED to force GC, but doing so helps for testing.
gc_collect_cycles();
}
Cascade
EntityManager::detach should indeed remove all references Doctrine has to the enities. But it does not do the same for associated entities automatically.
You need to cascade this action by adding detach the cascade option of the association:
/**
* #ORM\OneToMany(
* targetEntity="AppBundle\Entity\ApplicationFile",
* mappedBy="application",
* cascade={"remove", "detach"},
* orphanRemoval=true
* )
*/
private $files;
Now $em->detach($app) should be enough to remove references to the Application entity as well as its associated ApplicationFile entities.
Find vs Collection
I highly doubt that loading the ApplicationFile entities through the association, in stead of using the repository to findBy() them, is the source of your issue.
Sure that when loaded through the association, the Collection will have a reference to those child-entities. But when the parent entity is dereferenced, the entire tree will be garbage collected, unless there are other references to those child entities.
I suspect the code you show is pseudo/example code, not the actual code in production. Please examine that code thoroughly to find those other references.
Clear
Sometimes it worth clearing the entire EntityManager and merging a few entities back in. You could try $em->clear() or $em->clear('AppBundle\Entity\ApplicationFile').
Clear has no effect
You're saying that clearing the EntityManager has no effect. This means the references you're searching for are not within the EntityManager (of UnitOfWork), because you've just cleared that.
Doctrine but not Doctrine
Are you using any event-listeners or -subscribers? Any filters? Any custom mapping types? Multiple EntityManagers? Anything else that could be integrated into Doctrine or its life-cycle, but is not necessarily part of Doctrine itself?
Especially event-listeners/subscribers are often overlooked when searching for the source of issues. So I'd suggest you start to look there.
If we are speaking about your first implementation you have extra links to the collection in the PersistentCollection::coll of the Application::files property - this object is created by Doctrine on Application instantiation.
With detach you are just deleting UoW links to the object.
There are different ways to fix this but a lot of hacks should be applied. Most nice way probably to detach also Application object and unset it.
But it is still preferable to use more advanced ways for a batch processing: some were listed in the other answer. The current way forces doctrine to make use proxies and throws extra queries to DB to get the files of the current object.
Edit
The difference between the first and the second implementation is that there are no circular references in the second case: Application::files stays with uninitialized PersistenceCollection (with no elements in coll).
To check this - can you try to drop the files association explicitly?
The trick is in PHP's garbage collector, that works a bit odd. First off all each time the scripts need memory it will allocate memory from RAM, even if you use unset(), $object = null, or other tricks to free the memory, the allocated memory will not be returned to Operating System till the script is not finished and the process associated with it killed.
How to fix that ?
Usually is done on Linux Systems
Create commands that runs the needed script with limit, offset parameters and re-run the needed scripts in small batches more times. In this way, the script will use less memory, and each time the memory will be freed when the script will be finished.
Get rid of Doctrine it balloons by himself the memory, PDO is much faster and less costly.
For that kind of task where objects in memory could lead to that leaks, you should use Doctrine2 iterators
$appFiles = $this
->em
->getRepository('AppBundle:ApplicationFile')
->findBy(array('application' => $application));
should be refactored in order to return a query object and not an ArrayCollection, then from that query object, you can easily call iterate() method and clean the memory after every object inspection
Edit
You have "hidden references" because detach operation will not delete the object in memory, it only tells to EntityManager not to handle it anymore. This is why you should use my solution or unset() the object with php function.
Simple thing, but doesn't work. We have at the bottom part of script
$oMan = $this->getContainer()->get('doctrine')->getManager();
// add entry of calling
$oLastCall = new CronLastCall();
$oLastCall->setType('key');
$oMan->persist($oLastCall);
$oMan->flush();
we insert to db once we create it, then do some stuff that can take a few minutes. Then call this one.
$oLastCall->setDateEnd(new \DateTime('now'));
$oMan->flush();
after this one - exist from method\action. So regarding logic (and doctrine2 manual I read) entity that were created already become 'managed' (we persist it) and we can simply update it. (I call flush (at the end) to update this entity, but it not updating.)
Where is trouble?
As far as I can tell, there's nothing wrong with the code you've shown. But if the $oLastCall object becomes detached between the first and second code blocks, you have to re-attach (merge) it to the manager so that it detects the changes for the second flush.
Merging can be done in this way:
$oLastCallMerged = $oMan->merge($oLastCall);
$oLastCallMerged->setDateEnd(new \DateTime('now'));
$oMan->flush();
You can also check the state (MANAGED/NEW/DETACHED/REMOVED) of an object using this code:
$oMan->getUnitOfWork()->getEntityState($oLastCall);
If that doesn't help (i.e. detachment of the object isn't your problem), you need to give more info about the context this code runs in and any errors you get. Is this code part of a Console Command or a regular web-app Controller? Do you get any output or errors when running it in 'dev' environment? (Check .../app/logs/dev.log.) Does the $oLastCall object stay in memory while waiting for the stuff that takes some minutes, or do you reload it from somewhere?
Btw, objects doesn't magically get detached by themselves. They'll only be detached if you load them from a different source than the entity manager (for example storing them in the session between requests) OR if you explicitly detach them by calling $oMan->detach($entity) or $oMan->clear().
Edit
You can also check if Doctrine detects the change by echoing out the changeset using $oMan->getUnitOfWork()->getEntityChangeSet($oLastCall) before and after the change, e.g:
error_log(json_encode($oMan->getUnitOfWork()->getEntityChangeSet($oLastCall)));
$oLastCall->setDateEnd(new \DateTime('now'));
error_log(json_encode($oMan->getUnitOfWork()->getEntityChangeSet($oLastCall)));
After flushing the manager all objects will be removed internally from the manager. So if you want to update the object later you need to call persist again before flushing.