I'm running two daemons that query an external service roughly every second, 24/7. Each of them inserts or updates rows in the same local database after every loop, but they work on different objects of the same entities.
Since they run 24/7, after some tests I decided to clear the entity manager after every loop to avoid keeping a huge number of managed entities around and using a lot of memory.
So, in both of them, I run something like this after every loop:
$this->entityManager->flush();
....
$this->entityManager->clear(MyClass::class);
$this->entityManager->clear(MyOtherClass::class);
....
What I want to ask is: if DaemonA clears the entities and DaemonB hasn't flushed its persisted changes yet, what happens? When DaemonA flushes, does it affect the entities in DaemonB in any way? Could some objects get lost? Could some get duplicated? If so, what can I do to avoid this kind of thing?
As I said, they work on different objects of the same entities, e.g. DaemonA works on MyOtherClass objects 1, 2, 3 and DaemonB on MyOtherClass objects 4, 5, 6.
Both daemons are Symfony commands constructed like this:
class DaemonA extends Command
{
    private $entityManager;

    public function __construct(EntityManagerInterface $entityManager)
    {
        $this->entityManager = $entityManager;
        parent::__construct();
    }

    ...
}
That’s a lot of questions, so let’s go through them step by step.
Before we start, remember how Doctrine works internally: if an entity, or a set of entities, is requested through a query or a repository, Doctrine loads the entity data from the database, creates the entities, populates them with data, tracks changes and syncs those changes back to the database. Doctrine entities have states; usually they are in the managed state unless you detach them. When you clear the entity manager, its entities become detached.
Now, to answer your questions:
if DaemonA clears the entities and DaemonB hasn't flushed the persisted changes yet, what happens?
Clearing the entity manager only means that the entities become detached and, if they aren’t referenced anywhere else, get garbage-collected (I think). At the database level, this is irrelevant.
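For illustration, a minimal sketch of what that means for the unit of work (MyClass and setFoo() are placeholders, not from your code):
$entity = $em->find(MyClass::class, 1);   // managed
$em->clear();                             // $entity is now detached
$entity->setFoo('bar');
$em->flush();                             // no UPDATE is issued: the change isn't tracked any more
// to keep working with the object, re-fetch it (or merge it back into the entity manager)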
When DaemonA flushes, does it affect in any way the entities in DaemonB?
Yes, but not while DaemonB keeps running and doesn’t reload the entities from the database. If DaemonA modifies some entities and DaemonB modifies the same entities without reloading them in the meantime, the modifications made by DaemonB will win and persist.
Could some get duplicated?
Only if you persist detached entities (they would have a new ID, though). However, persisting detached entities doesn’t make sense anyway.
If so, what can I do to avoid this kind of things?
Locks and transactions!
Every modification of your database that involves more than one query must be wrapped in a transaction. This avoids inconsistencies if something goes wrong or a concurrent request modifies the data. On the PHP level, the transaction should additionally be wrapped in a try/catch block.
If you are modifying an entity, lock it. Doctrine supports different types of locking; pick the one that suits your scenario best.
The code in one of your daemons might look like this:
// LockMode is Doctrine\DBAL\LockMode
try {
    $em->beginTransaction();

    $entity = $em->find($entityClassName, $id);

    // lock the entity for all reading and writing
    $em->lock($entity, LockMode::PESSIMISTIC_WRITE);

    // ... modify $entity here ...

    $em->flush();
    $em->commit();
} catch (\Exception $e) {
    $em->rollback();
    throw $e;
}
Note that depending on your locking strategy and the general implementation of your system, the daemons may block each other by locking the database, up to the point where your system runs out of resources.
For example, pessimistic write locking is more secure (it makes sure that other processes don’t read the data until the modification is complete), but other processes will have to wait until the lock is released.
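If blocking is a concern, optimistic locking is the lighter alternative: a minimal sketch, assuming you add a version column to the entity and handle the conflict yourself (class and field names are placeholders):
use Doctrine\ORM\Mapping as ORM;
use Doctrine\ORM\OptimisticLockException;

/** @ORM\Entity */
class MyOtherClass
{
    // ... id and other fields ...

    /** @ORM\Version @ORM\Column(type="integer") */
    private $version;
}

// in the daemon loop:
try {
    $entity = $em->find(MyOtherClass::class, $id);
    // ... modify $entity here ...
    $em->flush(); // throws if another process changed the row since it was read
} catch (OptimisticLockException $e) {
    // reload and retry, or log and skip this iteration
}

With optimistic locking nobody waits; the losing process simply gets an exception and has to retry.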
Be mindful and do test under heavy-load scenarios!
I'm struggling to find a way to call persist() and flush() after the final flush (I mainly want to do it in the postFlush event).
I collect the necessary entities in the onFlush event (with their change sets) and wait until all the entities I collected have been flushed, so that I can get their auto-incremented IDs.
At that point I have an array with all the needed entities, their change sets and their IDs.
Then I want to create new entities (let's call them "traces") based on fields of the previously collected entities and persist & flush these "traces" to the database.
But I'm really stuck here, as I can't know the entities' IDs in the onFlush event, and I can't persist & flush them in the postFlush event, where they already have their IDs set.
Currently the Doctrine documentation states the following:
postFlush is called at the end of EntityManager#flush(). EntityManager#flush() can NOT be called safely inside its listeners.
And if I dare to do it anyway, it ends up in recursion and PHP fails with an error.
Which approach may I take here?
I believe you could check whether what's being flushed isn't a "traces" entity itself and only then perform the "traces" creation. That shouldn't loop; see the sketch below.
You might also want to look at Symfony's EventDispatcher. You could dispatch your events manually, which might be cleaner.
More details on "traces" would be helpful; from what I can imagine it is some kind of changelog or history, so for that I might suggest EntityAuditBundle. It works pretty well with Doctrine and is not hard to set up; I am using it myself.
Up till now our Doctrine entities have done their cache busting via lifecycle events. The APCu cache entry was deleted based on the entity's id in combination with a cache key constant:
/**
 * @ORM\PostPersist()
 */
public function postPersist(LifecycleEventArgs $event)
{
    apc_delete(sprintf(self::CACHE_KEY, $event->getEntity()->getId()));
}
This was possible because of the procedural apc_delete() function, but it will never allow us to upgrade to PSR-6, because PSR-6 works with a CacheItemPool that should be injected as a service.
As we are never going to inject the cache pool into the entities, my guess is that we should create an EventSubscriber or EventListener for more than half of the entities we have. This possible overhead frightens me a bit.
Will the subscriber / listener restructuring add a lot of overhead, and is that the right way to go? Should we add one global listener/subscriber for all entities (1..n) that handles all events or would it be better to add one listener/subscriber for every entity (n..m)?
Will the subscriber / listener restructuring add a lot of overhead, and is that the right way to go?
A postPersist lifecycle callback is also added as a listener to Doctrine's EventManager: http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/events.html
Based on this, I would guess that there would be no significant overhead.
However, I wouldn't base my decision on that but instead try it out. You can use the Symfony profiler to see whether any overhead exists and how big it is.
A really cool service I use for this kind of comparison is Blackfire.io.
Should we add one global listener/subscriber for all entities (1..n) that handles all events or would it be better to add one listener/subscriber for every entity (n..m)?
I would create a global listener for all entities. If you need different caching logic for a particular entity, you can always handle it as a separate case.
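As a rough sketch of what such a global listener could look like with a PSR-6 pool injected as a service (the CACHE_KEY constant convention is carried over from your entities; everything else is a placeholder):
use Doctrine\ORM\Event\LifecycleEventArgs;
use Psr\Cache\CacheItemPoolInterface;

class CacheInvalidationListener
{
    private $cachePool;

    public function __construct(CacheItemPoolInterface $cachePool)
    {
        $this->cachePool = $cachePool;            // injected as a service, PSR-6 compliant
    }

    public function postPersist(LifecycleEventArgs $args)
    {
        $entity = $args->getEntity();

        // only act on entities that define the cache key convention
        if (defined(get_class($entity).'::CACHE_KEY')) {
            $key = sprintf(constant(get_class($entity).'::CACHE_KEY'), $entity->getId());
            $this->cachePool->deleteItem($key);
        }
    }
}

In Symfony you would register it once with a doctrine.event_listener tag (event: postPersist) and add equivalent methods for postUpdate/postRemove if those should bust the cache as well.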
I have several entities, each with its own form type. Instead of saving the entity straight away on save, I want to store a copy of the changes we want to perform in the DB.
We'd then send a message to the user who can approve the change; they would review the original and the changed field(s) and approve or reject. If approved, the entity would be properly flushed.
To solve the issue I was thinking about:
1) doing a persist
2) getting the changesets (both the one related to "normal" fields, and the one relative to collections)
3) storing it in DB
4) Performing $em->refresh() to discard changes.
Later what I need is to get the changeset(s) back, ask the (other) user to approve them and flush them.
Is this doable? What I'm especially concerned about is that the entity manager that generated the first changeset is not the same one we'll use to perform the flush; I basically need to "load" a changeset.
Any idea on how to solve the issue (this way, or another way ;) )?
Another solution (working only for "normal" fields, not references coming from other entities to the current one, like a many-to-many) would be to clone the current entity, store it, and then, once approved, copy the field(s) from the clone to the original. But since it does not work for all fields, if the previous solution does not work we'd limit the feature to "normal" fields.
Thank you!
SN
Well, you could just treat the modifications as entities themselves, so that every change is stored in the database, and then all the changes that were approved are executed against the entity.
So, for example, if you have some Books stored in the database, and you want to make sure that all the modifications made to these are approved, just add a model that would contain the changeset that has to be processed, and a handler that would apply these changes:
<?php

class UpdateBookCommand
{
    // If you'll store these commands in a database, perhaps this field would be a relation,
    // or you could just store the ID
    public $bookId;

    public $newTitle;
    public $newAuthor;

    // Perhaps this field should be somehow protected from unauthorized changes
    public $isApproved;
}

class UpdateBookHandler
{
    private $bookRepository;
    private $em;

    public function __construct($bookRepository, $em)
    {
        $this->bookRepository = $bookRepository;
        $this->em = $em;
    }

    public function handle(UpdateBookCommand $command)
    {
        if (!$command->isApproved) {
            throw new NotAuthorizedException();
        }

        $book = $this->bookRepository->find($command->bookId);

        $book->setTitle($command->newTitle);
        $book->setAuthor($command->newAuthor);

        $this->em->persist($book);
        $this->em->flush();
    }
}
Next, in your controller you would just have to make sure that the commands are somehow stored (in a database or maybe even in a message queue), and the handler gets called when the changesets could possibly get applied.
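As a rough sketch of that controller side (all names are placeholders, and it assumes UpdateBookCommand is itself mapped as an entity so it can be stored):
// 1) The user submits a change: store the command instead of touching the Book.
$command = new UpdateBookCommand();
$command->bookId = $book->getId();
$command->newTitle = $form->get('title')->getData();
$command->newAuthor = $form->get('author')->getData();
$command->isApproved = false;
$em->persist($command);
$em->flush();

// 2) Later, when a reviewer approves the change:
$command->isApproved = true;
$handler = new UpdateBookHandler($bookRepository, $em);
$handler->handle($command);   // applies the stored changes to the Book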
P.S. Perhaps I could have explained this a bit better, but mostly the inspiration for this solution comes from the CQRS pattern that's explained quite well by Martin Fowler. However, I guess in your case a full-blown CQRS implementation is unnecessary and a simpler solution should work.
In Short
I seem to have landed on a MAJOR anti-pattern of saving objects WAY too many times. I've read through the limited Objectify docs and can't seem to find the right pattern to use.
Details
I have multiple objects I want to store. They are all transient (they don't exist in the database yet) and they have a one-to-many relationship. I don't want to sit and call ofy().save() on every last object in my hierarchy.
In the following example, a Player has a List of Cards.
My Model:
@Entity
public class Player {
    @Id private Long id = null; // will be generated
    private List<Ref<Card>> cards = new ArrayList<Ref<Card>>();
    // getters and setters here
}

@Entity
public class Card {
    @Id private Long id = null; // will be generated
    // lots of other fields and getters and setters here
}
My Operation:
I need to create a new player and new card, with the player having a reference to the card in his List "cards."
IDEAL SOLUTION:
I would like to just create the player and card java objects, set their relationships, and pass them to Objectify to be saved. Like this:
Player player = new Player();
Card card = new Card();
player.getCards().add(Ref.create(card));
ofy().save().entity(player).now();
That will fail. The 3rd line attempts to create a new Ref for the Card, which cannot be done because the Card doesn't have an id yet; the id is only assigned once the entity has been persisted. It seems I must never associate an object with another until one of them has already been saved.
Current Crappy Solution
So, my solution must be to save the Card first, and then relate it to the Player, then save the player.
Player player = new Player();
Card card = new Card();
ofy().save().entity(card).now();          // save the card first so it gets an id
player.getCards().add(Ref.create(card));
ofy().save().entity(player).now();
This is insane. It seems reasonable at first, but my app is dealing with many more relationships than just this, and with this pattern my algorithm will be a spiderweb of checking for transient objects inside collections before saving the entity I'm actually concerned with.
There MUST be some way to tell Objectify to just SAVE all child/related entities along with the entity I've requested, and furthermore generate the Ids necessary instead of throwing an Exception at me.
Furthermore, I'll also need this sort of "recursive save" solution even when none of my objects are transient (i.e. they all have IDs already). I can't waste my time iterating through collections and then all the collections WITHIN those collections and saving them all. I'm going to need some way of telling Objectify to just SAVE THIS WHOLE HIERARCHY OF OBJECTS I just passed you.
I've been reading around the @Load annotation and I feel like maybe there's something in there I'm missing... I don't know. I need help; the documentation is slim.
UPDATED SOLUTION
For posterity -
Using the allocateId() method decouples the entire ID generation constraint from the database and you get a VERY clean pattern, particularly if you do as I did:
All database @Entity classes get a private constructor and a static public factory for creating transient objects. This static factory method (createTransient()) always allocates a new ID. All client code can then use this method for acquiring new transient objects, or the obvious Objectify load for acquiring existing persisted instances. Simple. Done. Lovely.
I recommend two things (a sketch combining both follows below):
Allocate ids manually when you construct your objects using ObjectifyFactory.allocateId(). Do not use the "save with null autogenerates" feature. As you've noticed, it's a PITA to deal with entity objects that have null ids, so don't allow them to exist.
Use deferred saves. ofy().defer().save().entity(blah); You can save almost any number of things this way and they'll only get saved once on commit (or closing of the objectify session). Deferring save on the same entity multiple times produces only a single save.
This pattern of leaving ids null and filling them in on save is a holdover from the JPA days. It didn't work very well with JPA either; there were plenty of frustrating edge cases dealing with entities missing ids (especially when you wanted to put them in maps or sets). The best solution is to simply guarantee that no entity is ever missing an id in the first place.
Note that you'll want to allocate the id in a custom constructor, not the no-args constructor that Objectify uses to build your entity on load. Allocating an id is cheap but still a call to the GAE service layer and you don't want to do this on every load.
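A rough sketch of how both recommendations fit together (createTransient() is the factory name from the discussion, not Objectify API; the Player class is assumed to follow the same pattern):
import com.googlecode.objectify.Ref;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import static com.googlecode.objectify.ObjectifyService.factory;
import static com.googlecode.objectify.ObjectifyService.ofy;

@Entity
public class Card {
    @Id private Long id;

    private Card() {}                                         // used by Objectify on load

    public static Card createTransient() {
        Card card = new Card();
        card.id = factory().allocateId(Card.class).getId();   // id exists before any save
        return card;
    }
}

// Client code: build the whole graph first, then defer all the saves.
Player player = Player.createTransient();
Card card = Card.createTransient();
player.getCards().add(Ref.create(card));                      // works because the id is already set
ofy().defer().save().entities(player, card);                  // written once, on commit or session close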
Some background:
Working with:
.NET 4.5 (thinking of migrating to 4.5.1 if it's painless)
Web Forms
Entity Framework 5, Lazy Loading enabled
Context Per Request
IIS 8
Windows 2012 Datacenter
Point of concern: Memory Usage
In the project we are currently working on, probably our first bigger one, we often read large chunks of data coming from CSV imports that are likely to stay the same for very long periods of time.
Unless someone explicitly re-imports the CSV data, it is guaranteed to stay the same. This happens in more than one place in our project, and a similar approach is used for some regular documents that are often read by the users. We've decided to cache this data in the HttpRuntime cache.
It goes like this; we pull about 15,000 records consisting mostly of strings.
//myObject and related methods are placeholders
public static List<myObject> GetMyCachedObjects()
{
    if (CacheManager.Exists(KeyConstants.keyConstantForMyObject))
    {
        return CacheManager.Get(KeyConstants.keyConstantForMyObject) as List<myObject>;
    }
    else
    {
        List<myObject> myObjectList = framework.objectProvider.GetMyObjects();
        CacheManager.Add(KeyConstants.keyConstantForMyObject, myObjectList, true, 5000);
        return myObjectList;
    }
}
The data retrieval for the above method is very simple and looks like this:
public List<myObject> GetMyObjects()
{
    return context.myObjectsTable.AsNoTracking().ToList();
}
There are probably things to be said about the code structure, but that's not my concern at the moment.
I began profiling our project as soon as I saw high memory usage and found many parts where our code could be optimized. I had never faced 300 simultaneous users before, and our internal tests were not enough to reveal the memory issues. I've identified and fixed numerous memory leaks, but I'd like to understand some Entity Framework related unknowns.
Given the above example, and using ANTS Profiler, I've noticed that 'myObject', and other similar objects, reference many System.Data.Entity.DynamicProxies.myObject instances; additionally, there are lots of EntityKeys holding on to integers. They don't take up much memory, but their count is relatively high.
For instance, 124 instances of 'myObject' reference nearly 300 System.Data.Entity.DynamicProxies.
Usually it looks like this, whatever the object is:
some cache entry -> some object I've cached (and I now noticed many of them had been detached from the dbContext prior to caching) -> the dynamic proxies -> the ObjectContext. I have no idea how to untie them.
My progress:
I did some research and found out that I might be caching something Entity Framework related together with those objects. I've pulled them with AsNoTracking, but those DynamicProxies are still in memory and probably hold on to other things as well.
Important: I've observed some live instances of ObjectContext (74), slowly growing, but no instances of my unitOfWork, which holds the dbContext. Those seem to be disposed properly on a per-request basis.
I know how to detach, attach or modify state of an entry from my dbContext, which is wrapped in a unitOfWork, and I often do it. However that doesn't seem to be enough or I am asking for the impossible.
Questions:
Basically, what am I doing wrong with my caching approach when it comes to Entity Framework?
Is the growing number of ObjectContexts in memory a concern? I know the cache will eventually expire, but I'm worried about open connections or anything else this context might be holding on to.
Should I be detaching everything from the context before inserting it into the cache?
If yes, what is the best approach? Especially with a List, I cannot think of anything else but iterating over the collection and calling Detach one by one.
Bonus question: about 40% of the consumed memory is free (unallocated); I have no idea why .NET reserves so much free memory in advance.
You can try using a non-entity class with only the properties you need, populated via a Select projection.
public class MyObject2
{
    public int ID { get; set; }
    public string Name { get; set; }
}

public List<MyObject2> GetObjects()
{
    return framework.provider.GetObjects().Select(
        x => new MyObject2
        {
            ID = x.ID,
            Name = x.Name
        }).ToList();
}
Since you will be storing plain C# objects, you will not have to worry about dynamic proxies, and you will not have to call Detach on anything at all. Also, you can store only the few properties you actually need.
Even if you disable tracking, you will still see dynamic proxies, because EF uses a dynamic class derived from your class that stores extra metadata for the entity (e.g. relation information such as the names of foreign keys to other entities).
Steps to reduce memory here:
Re-new the context often.
Don't try to delete content from the context or set entries to detached; the context hangs around like a fart in a phone box.
E.g. context = new MyContext();
But if possible you should be using short-lived contexts:
using (var context = new MyContext()) { .... } // short-lived contexts are best practice
With your Context you can set Configurations
this.Configuration.LazyLoadingEnabled = false;
this.Configuration.ProxyCreationEnabled = false; //<<<<<<<<<<< THIS one
this.Configuration.AutoDetectChangesEnabled = false;
You can disable proxies if you still feel they are hogging memory.
But that may be unnecessary if you wrap the context in a using block in the first place.
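A rough sketch of pushing those settings into the context itself, so every short-lived context comes up without proxies or change tracking (MyContext, the connection string name and myObjectsTable are placeholders based on the question):
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

public class MyContext : DbContext
{
    public MyContext() : base("name=MyConnectionString")
    {
        Configuration.ProxyCreationEnabled = false;    // no DynamicProxies end up in the cache
        Configuration.LazyLoadingEnabled = false;
        Configuration.AutoDetectChangesEnabled = false;
    }

    public DbSet<myObject> myObjectsTable { get; set; }
}

public static class MyObjectLoader
{
    // short-lived context, plain entities, nothing left referencing an ObjectContext
    public static List<myObject> LoadAll()
    {
        using (var context = new MyContext())
        {
            return context.myObjectsTable.AsNoTracking().ToList();
        }
    }
}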
I would redesign the solution a bit:
You are storing all data as a single entry in the cache; I would move away from this and have an entry per cache item.
You are using the HttpRuntime cache; I would use AppFabric Caching, also from MS, also free.
I'm not sure where you are calling that code from; I would call it on application start, so all the data is in memory when the user needs it.
You are using Entity SQL; for this I would use an EntityDataReader: http://msdn.microsoft.com/en-us/library/system.data.entityclient.entitydatareader(v=vs.110).aspx
See also:
http://msdn.microsoft.com/en-us/data/hh949853.aspx