I want to use a local (un-backed) mongo collection as a cache, but I need to cap it so that it doesn't explode in size. Does anyone know of a good way to do this?
e.g.
Cache._createCappedCollection(1024 * 1024 * 10); // 10MB
The above works for a server-backed Mongo collection, but if the collection is unbacked it throws: Error("Can only call _createCappedCollection on server collections")
Related
Does anyone know how to limit an array so new items get pushed in and old ones are discarded in the same write?
I'm guessing this isn't possible but it sure would be handy.
// Store the notification
// New document in the user's notifications subcollection
const usersCollection = db.collection('users').doc(uid).collection('notifications').doc();
// Write this notification to the database as well
await usersCollection.update({
  count: admin.firestore.FieldValue.increment(1),
  notifications: admin.firestore.FieldValue.arrayUnion({
    'symbol': symbol,
    'companyname': companyname,
    'change': priceDifference,
    'changeDirection': directionOperatorHandler,
    'updatedPrice': symbolLatestPrice,
    'timestamp': currentTimestamp,
  })
});
Written in TypeScript.
Alternatively, I was thinking of running a scheduled cloud function every week to go through and trim down the arrays based on the timestamp.
The reason I'm using an array to store my notifications is that I'm expecting a lot of writes.
There is no simple configuration for this. Your code should implement your requirements by doing the following (a sketch follows the list):
Reading the document
Modifying the array in memory
Checking that the size is within limits
Writing the document back
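For example, here is a minimal sketch of that sequence in the Admin SDK / TypeScript style used in the question. MAX_NOTIFICATIONS, the document path, and the use of a transaction (to avoid racing a concurrent write) are all assumptions, not part of the original code.
import * as admin from 'firebase-admin';

const MAX_NOTIFICATIONS = 50; // assumed cap

async function trimNotifications(uid: string, notificationDocId: string) {
  const docRef = admin.firestore()
    .collection('users').doc(uid)
    .collection('notifications').doc(notificationDocId);

  await admin.firestore().runTransaction(async (tx) => {
    // Read the document
    const snap = await tx.get(docRef);
    if (!snap.exists) return;

    // Modify the array in memory and check that the size is within limits
    const notifications: any[] = snap.get('notifications') || [];
    if (notifications.length <= MAX_NOTIFICATIONS) return;

    // Keep only the newest entries, assuming `timestamp` is a numeric epoch value
    notifications.sort((a, b) => a.timestamp - b.timestamp);
    const trimmed = notifications.slice(-MAX_NOTIFICATIONS);

    // Write the document back
    tx.update(docRef, {
      notifications: trimmed,
      count: trimmed.length,
    });
  });
}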
The goal here is to delete as fast as possible while keeping Firebase Realtime Database instance utilization under 100%.
I have 360 GB of data in Firebase Realtime Database. Now I want to delete most of the data that is not needed. I have a script that deletes data using firebase database:remove /node1/child1 (https://firebase.googleblog.com/2019/03/large-deletes-in-realtime-database.html)
Node Structure
"node1":{
"child1":{
"thousand of child's node here i want to delete"
},
"child2":{
"thousand of child's node here i want to delete"
},
"child3":{
"child3 is required can not delete this one "
}
}
I was thinking: if I set the path /node1/child1 to null instead of using firebase database:remove, will that remove all the children of child1? And what is the difference between the two approaches?
You should use firebase database:remove, as detailed in this blog post. By just calling remove() or update(null), you will lock the database until all data is deleted, which could take many minutes or even hours with a dataset that large.
The CLI command will instead chunk and batch the deletes into reasonable sizes, keeping your database from being completely locked up. In fact, with database:remove you don't need to batch manually -- you can just pass it the largest node that you need deleted and it will automatically take care of batching for you.
If you pass a null value to a Firebase path, it is the same as removing that path.
Passing null for the new value is equivalent to calling remove(); namely, all data at this location and all child locations will be deleted.
Firebase documentation
The implementation of the remove method in the Firebase Admin SDK for Java also uses this set-to-null approach.
/**
 * Set the value at this location to 'null'
 *
 * @return The ApiFuture for this operation.
 */
public ApiFuture<Void> removeValueAsync() {
  return setValueAsync(null);
}
Firebase source code in Java
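For completeness, the same equivalence can be seen from the Node Admin SDK (a small sketch; the path is the one from the question):
import * as admin from 'firebase-admin';

async function deleteChild1() {
  const ref = admin.database().ref('/node1/child1');
  // These two calls do the same thing: delete everything under /node1/child1
  // in a single operation (no chunking, unlike the database:remove CLI command).
  await ref.set(null);
  // await ref.remove(); // equivalent
}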
I have an Application that has a relationship to ApplicationFile:
/**
 * @ORM\OneToMany(
 *     targetEntity="AppBundle\Entity\ApplicationFile",
 *     mappedBy="application",
 *     cascade={"remove"},
 *     orphanRemoval=true
 * )
 */
private $files;
A file entity has a field that stores binary data, and can be up to 2MB in size. When iterating over a large list of applications and their files, PHP memory usage grows. I want to keep it down.
I've tried this:
$applications = $this->em->getRepository('AppBundle:Application')->findAll();
foreach ($applications as $app) {
    ...
    foreach ($app->getFiles() as $file) {
        ...
        $this->em->detach($file);
    }
    $this->em->detach($app);
}
Detaching the object should tell the entity manager to stop caring about it and dereference it, but surprisingly it has no effect on memory usage - it keeps increasing.
If I instead load the application files manually (rather than retrieving them through the association), memory usage does not increase. This works:
$applications = $this->em->getRepository('AppBundle:Application')->findAll();
foreach ($applications as $app) {
    ...
    $appFiles = $this
        ->em
        ->getRepository('AppBundle:ApplicationFile')
        ->findBy(array('application' => $app));
    foreach ($appFiles as $file) {
        ...
        $this->em->detach($file);
    }
    $this->em->detach($app);
}
I used xdebug_debug_zval to track references to the $file object. In the first example, there's an extra reference somewhere, which explains why memory is ballooning - PHP is not able to garbage collect it!
Does anyone know why this is? Where is this extra reference and how do I remove it?
EDIT: Explicitly calling unset($file) at the end of its loop has no effect. There are still TWO references to the object at that point (proven with xdebug_debug_zval). One is contained in $file (which I can unset), but there's another somewhere else that I cannot unset. Calling $this->em->clear() at the end of the main loop has no effect either.
EDIT 2: SOLUTION: The answer by @origaminal led me to the solution, so I accepted his answer instead of providing my own.
In the first method, where I access the files through the association on $application, a side effect is that it initializes the previously uninitialized $files collection on the $application object I'm iterating over in the outer loop.
Calling $em->detach($application) and $em->detach($file) only tells Doctrine's UOW to stop tracking the objects, but it doesn't affect the array of $applications I'm iterating over, which now has populated $files collections that eat up memory.
I have to unset each $application object after I'm done with it to remove all references to the loaded $files. To do this, I modified the loops as such:
$applications = $em->getRepository('AppBundle:Application')->findAll();
$count = count($applications);
for ($i = 0; $i < $count; $i++) {
    foreach ($applications[$i]->getFiles() as $file) {
        $file->getData();
        $em->detach($file);
        unset($file);
    }
    $em->detach($applications[$i]);
    unset($applications[$i]);
    // Don't NEED to force GC, but doing so helps for testing.
    gc_collect_cycles();
}
Cascade
EntityManager::detach should indeed remove all references Doctrine has to the entities. But it does not do the same for associated entities automatically.
You need to cascade this action by adding detach to the cascade option of the association:
/**
 * @ORM\OneToMany(
 *     targetEntity="AppBundle\Entity\ApplicationFile",
 *     mappedBy="application",
 *     cascade={"remove", "detach"},
 *     orphanRemoval=true
 * )
 */
private $files;
Now $em->detach($app) should be enough to remove references to the Application entity as well as its associated ApplicationFile entities.
Find vs Collection
I highly doubt that loading the ApplicationFile entities through the association, instead of using the repository to findBy() them, is the source of your issue.
Sure, when loaded through the association, the Collection will hold references to those child entities. But when the parent entity is dereferenced, the entire tree will be garbage collected, unless there are other references to those child entities.
I suspect the code you show is pseudo/example code, not the actual code in production. Please examine that code thoroughly to find those other references.
Clear
Sometimes it is worth clearing the entire EntityManager and merging a few entities back in. You could try $em->clear() or $em->clear('AppBundle\Entity\ApplicationFile').
Clear has no effect
You're saying that clearing the EntityManager has no effect. This means the references you're searching for are not within the EntityManager (or its UnitOfWork), because you've just cleared that.
Doctrine but not Doctrine
Are you using any event-listeners or -subscribers? Any filters? Any custom mapping types? Multiple EntityManagers? Anything else that could be integrated into Doctrine or its life-cycle, but is not necessarily part of Doctrine itself?
Especially event-listeners/subscribers are often overlooked when searching for the source of issues. So I'd suggest you start to look there.
If we are speaking about your first implementation, you have extra references to the collection in PersistentCollection::$coll of the Application::$files property - this object is created by Doctrine when the Application is instantiated.
With detach you are just deleting the UnitOfWork's links to the object.
There are different ways to fix this, but most of them require hacks. Probably the nicest way is to also detach the Application object and unset it.
But it is still preferable to use more advanced approaches for batch processing: some were listed in the other answer. The current approach forces Doctrine to use proxies and issues extra queries to the DB to get the files of the current object.
Edit
The difference between the first and the second implementation is that there are no circular references in the second case: Application::$files stays an uninitialized PersistentCollection (with no elements in $coll).
To check this, can you try to drop the files association explicitly?
The trick is in PHP's garbage collector, which works a bit oddly. First of all, each time the script needs memory it allocates memory from RAM; even if you use unset(), $object = null, or other tricks to free memory, the allocated memory will not be returned to the operating system until the script finishes and its process is killed.
How to fix that?
This is usually done on Linux systems:
Create commands that run the needed script with limit and offset parameters, and re-run the script in small batches several times. This way the script uses less memory, and the memory is freed each time a run finishes.
Get rid of Doctrine; it balloons memory on its own. PDO is much faster and less costly.
For this kind of task, where objects in memory can lead to such leaks, you should use Doctrine 2 iterators.
$appFiles = $this
    ->em
    ->getRepository('AppBundle:ApplicationFile')
    ->findBy(array('application' => $app));
should be refactored to return a query object rather than an array, and from that query object you can call the iterate() method and free memory after inspecting each object (a sketch follows below).
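A minimal sketch of that refactoring (the DQL and entity names follow the question; iterate() is Doctrine 2's row-by-row hydration API, replaced by toIterable() in newer versions):
$query = $this->em->createQuery(
    'SELECT f FROM AppBundle:ApplicationFile f WHERE f.application = :application'
)->setParameter('application', $app);

foreach ($query->iterate() as $row) {
    $file = current($row);     // each row is a one-element array holding the entity
    // ... inspect $file ...
    $this->em->detach($file);  // release it from the UnitOfWork
    unset($file, $row);
}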
Edit
You have "hidden references" because detach operation will not delete the object in memory, it only tells to EntityManager not to handle it anymore. This is why you should use my solution or unset() the object with php function.
I'd like to clone a minimongo collection so I can do some calculations, get a result, then push those results back to the server.
Assuming this is a suitable pattern, how best can I clone a minimongo collection?
It appears that the object no longer has a ._deepcopy (as of 1.0.4), and attempting an EJSON.clone exceeds the call stack size for even tiny collections. Underscore's _.clone() only copies by reference.
Alternatively, I could just edit the local collection via collection._collection.update. But if that's the case, what would happen if on the off chance the server updated or removed a doc while it was processing? I watched this video, but am still unclear on that scenario: https://www.eventedmind.com/feed/meteor-how-does-the-client-synchronize-writes-with-the-server
The why behind your pattern escapes me, but one solution could be to define a null collection (docs), copy the records you need into it, do your work, and then copy the results back into the original collection for automatic sync to the server (sketched below).
myLocalCollection = new Mongo.Collection(null);
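Roughly, the copy / compute / copy-back flow could look like this (a sketch only; SourceCollection and the result field are placeholders for your own collection and calculation):
// Copy the docs you need into the local collection (never synced to the server).
SourceCollection.find().forEach(function (doc) {
  myLocalCollection.insert(doc);
});

// ... run your calculations against myLocalCollection ...

// Push the results back so Meteor syncs them to the server.
myLocalCollection.find().forEach(function (doc) {
  SourceCollection.update(doc._id, { $set: { result: doc.result } });
});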
Is there any way in Meteor to do bulk changes to published collections on the server side, like updating or inserting hundreds or thousands of records, without each individual record being sent to all subscribers one by one?
I am periodically pulling in third-party data and want to gather all the updates or inserts from one pull into a single batch update, so all clients receive it as one change package rather than thousands of mini-updates. Doing it one by one creates a big bottleneck in my app at the moment.
If there is no support in Meteor for this at the moment, should I just write the updates directly to Mongo and let Meteor pick them up on the next Mongo poll?
// imagine myChanges array with 1000 items
myChanges.forEach(function(change) {
  // this will trigger the sync with clients immediately... 1000 times
  // currently this will practically hang my server
  // i want to gather the changes here instead
  MyCollection.update({_id: change.docId}, change);
});
// and trigger the sync here instead
Thanks,
Reio
Depending on how complex your application is, or how many publish calls you have, this could be achieved by rolling your own publish functions.
i.e. Instead of
Meteor.publish("myCollection", function() {
return myCollection.find(); } );
Create your own Publish / Cursor and add hooks to it.
Meteor.publish("myCollection", function() {
globalObserver = myCollection.find().observe({
added: function(item) {
publication.added(item);
}
// And So on
});
Your batch updates would then need to interact with this publish function in some way, causing it to stop and re-initialise the observer once your updates are complete.
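As for the question's own fallback of writing the changes directly to Mongo and letting Meteor pick them up on its next poll, a minimal sketch could use the raw driver collection. rawCollection() exposes the underlying Node MongoDB driver collection, and the { docId, fields } shape of each change is an assumption, not from the original code.
// Sketch only: apply all changes in one round trip, bypassing per-document
// writes through Meteor; its observers pick the results up via oplog/poll.
async function applyBatch(myChanges) {
  const ops = myChanges.map(function (change) {
    return {
      updateOne: {
        filter: { _id: change.docId },
        update: { $set: change.fields }, // assumed shape of `change`
      },
    };
  });
  await MyCollection.rawCollection().bulkWrite(ops, { ordered: false });
}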