Doctrine Batch Processing doesn't clear memory? - symfony

I am trying to use a technique suggested by Doctrine 2 for batch processing a large number of objects. The technique suggests that by iterating over the result and detaching each entity after processing it, memory usage should stay minimal (the docs speak of an increase of only a few KB while processing 10,000 records).
However, when I try this, I do not see any objects being freed. In fact, I am retrieving a little more than 2000 assets and this increases my memory usage by 90 MB, so clearly these objects are not freed. Can anyone tell me what I am doing wrong? My code looks as follows:
$profiles = //array of Profile entities
$qb = $this->createQueryBuilder('a')
    ->addSelect('file')
    ->leftJoin($profileNodeEntityName, 'pn', Join::WITH, 'pn.icon = a OR pn.asset = a')
    ->leftJoin(
        $profileEntityName,
        'p',
        Join::WITH,
        'pn.profile = p OR p.logo = a OR p.background = a OR p.pricelistAsset = a OR p.pdfTplBlancAsset = a OR p.pdfTplFrontAsset = a OR p.pdfTplBackAsset = a'
    )
    ->innerJoin('a.currentFile', 'file')
    ->where('p IN (:profiles)')
    ->setParameter('profiles', $profiles)
    ->distinct(true);
$iterableResult = $qb->getQuery()->iterate();
$start = memory_get_usage() / 1024;
while (($row = $iterableResult->next()) !== false) {
    // process $row[0]
    $this->getEntityManager()->detach($row[0]);
}
$end = memory_get_usage() / 1024 - $start;
// $end is more or less equal to 90000, i.e. 90 MB
Thanks!

You should also detach the related entities (here the joined file, profile node, and profile objects), or set cascade={"detach"} on the associations so that detaching the asset cascades to them.
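For example, a minimal sketch assuming annotation mapping; the Asset/File class names and the currentFile property are inferred from the query above and may differ in your code:

use Doctrine\ORM\Mapping as ORM;

class Asset
{
    /**
     * With cascade={"detach"}, detaching an Asset also detaches the File
     * that was loaded through the fetch join.
     * @ORM\ManyToOne(targetEntity="File", cascade={"detach"})
     */
    private $currentFile;
}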

Related

Android: Why is Room so slow?

I am working on a simple database procedure in Kotlin using Room, and I can't explain why the process is so slow, especially on the Android Studio emulator.
The table I am working on is this:
@Entity(tableName = "folders_items_table", indices = arrayOf(Index(value = ["folder_name"]), Index(value = ["item_id"])))
data class FoldersItems(
    @PrimaryKey(autoGenerate = true)
    var uid: Long = 0L,
    @ColumnInfo(name = "folder_name")
    var folder_name: String = "",
    @ColumnInfo(name = "item_id")
    var item_id: String = ""
)
What I am trying to do is simple: check whether a folder/item combination is already present; if it is not, insert a new record, otherwise ignore it. On the emulator, it takes up to 7-8 seconds to insert 100 records. On a real device it is much faster, but it still takes around 3-4 seconds, which is not acceptable for just 100 records. It looks like the insert query is particularly slow.
Here is the procedure that makes what I have just described (inside a coroutine):
val vsmFoldersItems = FoldersItems()
items.forEach {
    val itmCk = database.checkFolderItem(item.folder_name, it)
    if (itmCk == 0L) {
        val newFolderItemHere = vsmFoldersItems.copy(
            folder_name = item.folder_name,
            item_id = it
        )
        database.insertFolderItems(newFolderItemHere)
    }
}
the variable "items" is an array of Strings.
Here is the DAO definitions of the above-called functions:
#Query("SELECT uid FROM folders_items_table WHERE folder_name = :folder AND item_id = :item")
fun checkFolderItem(folder: String, item: String): Long
#Insert
suspend fun insertFolderItems(item: FoldersItems)
Placing the loop inside a single transaction should significantly reduce the time taken.
The reason is that each transaction (by default, each SQL statement that changes the database) results in a disk write, so that's 100 disk writes for your loop.
If you begin a transaction before the loop, mark it successful when the loop completes, and then end it, only a single disk write is required.
What I am unsure of is exactly how to do this with a suspend function (I'm not that familiar with Kotlin).
As such, I'd suggest either dropping the suspend or having another Dao for use within loops.
Then have something like :-
val vsmFoldersItems = FoldersItems()
your_RoomDatabase.beginTransaction()
items.forEach {
    val itmCk = database.checkFolderItem(item.folder_name, it)
    if (itmCk == 0L) {
        val newFolderItemHere = vsmFoldersItems.copy(
            folder_name = item.folder_name,
            item_id = it
        )
        database.insertFolderItems(newFolderItemHere)
    }
}
your_RoomDatabase.setTransactionSuccessful() // <<< if NOT set, ALL updates will be rolled back
your_RoomDatabase.endTransaction()
You may wish to refer to:-
https://developer.android.com/reference/androidx/room/RoomDatabase
You may wish to refer especially to runInTransaction.
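For completeness, the room-ktx artifact also offers a suspending withTransaction extension, so the whole loop can stay inside a coroutine. Here is a sketch under assumptions: yourRoomDatabase is the RoomDatabase instance and database is the DAO from the question.

import androidx.room.withTransaction

suspend fun insertMissingItems(folderName: String, items: List<String>) {
    val template = FoldersItems()
    // One transaction for the whole loop: a single disk write instead of 100.
    yourRoomDatabase.withTransaction {
        items.forEach {
            if (database.checkFolderItem(folderName, it) == 0L) {
                database.insertFolderItems(
                    template.copy(folder_name = folderName, item_id = it)
                )
            }
        }
    }
}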

Hazelcast Distributed Map Heap Size Control

I'm using Hazelcast as an embedded distributed map in my APIs, as a MemTable that accumulates entries before they are sent to another storage layer. My question is:
Can I control the heap size using the LocalMapStats object provided for IMap?
I was reading about that object, and I thought methods such as getHeapSize(), or getOwnedEntryMemoryCost() plus getBackupEntryMemoryCost(), might give me the memory cost to compare against a threshold and then decide what to do with the data.
Thanks in advance.
You can identify the heap cost of the map via the API. This can also be done easily via the scripting console in the Hazelcast Management Center portal. The code is as below:
function findOverallDataSizeImap() {
    var objs = hazelcast.getDistributedObjects();
    var len = objs.length;
    var output = '';
    var totalSizeInMB = 0.0;
    for (var i = 0; i < len; i++) {
        if (objs[i] instanceof com.hazelcast.core.IMap) {
            // getHeapCost() returns bytes, so divide by 1024 * 1024 for MB
            var sizeInMB = objs[i].getLocalMapStats().getHeapCost() / (1024 * 1024);
            output = output + ' Name : ' + objs[i].getName() + ' Size (MB) : ' + sizeInMB + ' \n';
            totalSizeInMB = totalSizeInMB + sizeInMB;
        }
    }
    output = output + ' Total Size (MB) = ' + totalSizeInMB;
    return output;
}
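From inside the embedded member, the same check is available on the map itself. A sketch, assuming the map is named "memtable" and FLUSH_THRESHOLD_BYTES is your own limit:

import com.hazelcast.core.IMap;

IMap<String, Object> memTable = hazelcastInstance.getMap("memtable");
// getHeapCost() returns the heap cost in bytes (owned entries, backups, etc.)
long heapCostBytes = memTable.getLocalMapStats().getHeapCost();
if (heapCostBytes > FLUSH_THRESHOLD_BYTES) {
    // threshold exceeded: flush the accumulated entries to the other storage
}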

How do MaxItemCount from FeedOptions and RetrievedDocumentCount from QueryMetrics work in Cosmos DB, and why do they never match?

I am currently facing a query performance issue with Cosmos DB. I am quite sure I have followed most of the performance tips from the Microsoft documentation, but the query still takes > 1 second.
Connection policy
private static readonly ConnectionPolicy ConnectionPolicy = new ConnectionPolicy
{
    ConnectionMode = ConnectionMode.Direct,
    ConnectionProtocol = Protocol.Tcp,
    RequestTimeout = new TimeSpan(1, 0, 0),
    MaxConnectionLimit = 1000,
    RetryOptions = new RetryOptions
    {
        MaxRetryAttemptsOnThrottledRequests = 10,
        MaxRetryWaitTimeInSeconds = 60
    }
};
Document Client
this.Client = new DocumentClient(new Uri(config.DocumentDBURI), config.DocumentDBKey, ConnectionPolicy);
Document Query
FeedOptions options = new FeedOptions
{
    MaxItemCount = config.getSearchLimit, //// which is 100
    PartitionKey = new PartitionKey(partitionKey),
    RequestContinuation = responseContinuation
};

var documentQuery = Client.CreateDocumentQuery<SearchByAttributesResult>(
    this.TenantCollectionUri,
    querySpec,
    options).AsDocumentQuery();
Query 1
SELECT p.Doc.id, p.Doc.Name, p.Doc.isOrganization, p.Doc.organizationLegalName,
       p.Doc.isFactoryAutoUpdate, p.Doc.StartDate, p.Doc.EndDate,
       p.Doc.InactiveReasonCode, p.Doc.Specialty.specialty AllSpecialty, Address
FROM p
JOIN Address IN p.Doc.Address.address
WHERE (p.Doc.EndDate = null OR (p.Doc.StartDate <= @STARTDATE AND p.Doc.EndDate >= @ENDDATE))
  AND CONTAINS(p.Doc.Name, @PROVIDERNAME)
  AND Address.alpha2Code = @ALPHA2CODE
Query 2
SELECT p.Doc.id, p.Doc.Name, p.Doc.isOrganization, p.Doc.organizationLegalName,
       p.Doc.isFactoryAutoUpdate, p.Doc.StartDate, p.Doc.EndDate,
       p.Doc.InactiveReasonCode, p.Doc.Specialty.specialty AllSpecialty, Address
FROM p
JOIN Address IN p.Doc.Address.address
WHERE (p.Doc.EndDate = null OR (p.Doc.StartDate <= @STARTDATE AND p.Doc.EndDate >= @ENDDATE))
  AND STARTSWITH(Address.postalCode, @POSTALCODE)
  AND Address.alpha2Code = @ALPHA2CODE
The query above changes based on the user's search conditions.
I have only 900 documents in my collection, but the query still always takes > 1 second.
I am trying to understand a few points here:
Though I set MaxItemCount to 100, why am I seeing a RetrievedDocumentCount of 900 in QueryMetrics?
Is the use of CONTAINS/STARTSWITH causing this performance issue?
What am I doing wrong here, and how can I improve the query performance to sub-second (< 0.5 s) times?
First things first, MaxItemCount doesn't mean that you will get the top 100 documents.
It means that every iteration of ExecuteNextAsync will return up to 100 documents at a time, but the query as a whole still returns everything that matches.
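In other words, the usual drain loop reads every matching document page by page (a sketch; names follow the question's code):

// Each page holds at most MaxItemCount (here 100) documents, but draining
// the query still touches every matching document, which is why QueryMetrics
// reports RetrievedDocumentCount = 900.
var results = new List<SearchByAttributesResult>();
while (documentQuery.HasMoreResults)
{
    FeedResponse<SearchByAttributesResult> page =
        await documentQuery.ExecuteNextAsync<SearchByAttributesResult>();
    results.AddRange(page);
}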
If you want to limit your results to the top 100, then in LINQ use the .Take(100) method before you call AsDocumentQuery, or in SQL use the TOP keyword.
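For instance, a sketch of the LINQ form (the Where predicate is a hypothetical stand-in for the real filter):

var documentQuery = Client.CreateDocumentQuery<SearchByAttributesResult>(
        this.TenantCollectionUri, options)
    .Where(r => r.Name.Contains(providerName)) // hypothetical predicate
    .Take(100)  // translates to TOP 100 in the generated SQL
    .AsDocumentQuery();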
In terms of performance, the query is slow for three reasons:
You are checking for records between a range of dates.
You are using the CONTAINS/STARTSWITH functions.
You are joining.
At this point, if changing the schema isn't an option, I would recommend reading more about Indexing and optimising it based on the querying requirements of your application.

How to dynamically add select aliases inside a query

Context: given that the following query:
$queryBuilder = $this->createQueryBuilder('cv')
    ->leftJoin('cv.user', 'u')
    ->where('cv.game = :game')
    ->setParameter('game', $game);
will trigger 1+X distinct queries (one to fetch all the CVs, then, if cv.user is accessed in the template, X more queries to fetch the users).
If I want to optimize and reduce those multiple queries to one single query, I do this:
$queryBuilder = $this->createQueryBuilder('cv')
    ->select('cv, u')
    ->leftJoin('cv.user', 'u')
    ->where('cv.game = :game')
    ->setParameter('game', $game);
This way, I save X queries.
Now, my problem: in my repository I have conditional joins, and I want to add select aliases at different places in my code.
Like this (simplified example):
$queryBuilder = $this->createQueryBuilder('cv')
    ->select('cv, u')
    ->leftJoin('cv.user', 'u')
    ->where('cv.game = :game')
    ->setParameter('game', $game);

if ($myCondition === true) {
    $queryBuilder->add('select', 'l');
    $queryBuilder->join('cv.level', 'l');
}
But it seems that ->add('select', ...) does not stack the way addWhere() does.
Are there any solutions other than a custom one like this:
$queryBuilder = $this->createQueryBuilder('cv')
    ->leftJoin('cv.user', 'u')
    ->where('cv.game = :game')
    ->setParameter('game', $game);

$aliases = array('cv', 'u');
if ($myCondition === true) {
    $aliases[] = 'l';
    $queryBuilder->join('cv.level', 'l');
}
$queryBuilder->select(implode(',', $aliases));
Thanks.
// Replace
$queryBuilder->add('select', 'l');
// With
$queryBuilder->addSelect('l');
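Applied to the conditional block from the question, that gives:

if ($myCondition === true) {
    $queryBuilder->addSelect('l') // stacks onto the current select, like addWhere()
        ->join('cv.level', 'l');
}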
And a bit of unsolicited advice. I know how much unoptimized queries bother most people, including myself. However, consider doing some benchmarks on large data sets. It's surprising how fast lazy loading is; there is very little difference even with thousands of queries.

PHPUnit_Framework_TestCase memory leak with large DataProvider

When I run PHPUnit, it appears to have a memory leak when running many tests inside a single test class, but I don't know whether this is a bug or expected behaviour.
To reproduce:
I create a simple testHello() with a silly assertTrue(true).
I feed it from providerHello(), just feeding 3 dummy params.
With $numberOfTests = 1;, consumed memory is 5.75MB.
PHPUnit output = Time: 0 seconds, Memory: 5.75Mb
With $numberOfTests = 10000;, I don't expect the memory to grow much beyond the size of the new array, but the used memory is 99.75MB, which I feel is too much.
PHPUnit output = Time: 4 seconds, Memory: 99.75Mb
I added a dirty echo() in the provider, just to know how much memory the array made the script consume.
With 1 test: Memory = 5294552 (5.2MB)
With 10,000 tests: Memory = 15735352 (15.7MB)
The questions:
Why do I lose 84MB along the way? (99.75 really consumed - 15.75 really used by the array)
Is it normal that it allocates memory at each iteration (probably in its internal setUp()) but does not free the same amount in the internal tearDown()?
Am I doing anything wrong?
My version:
phpunit --version gives PHPUnit 3.6.10 by Sebastian Bergmann.
This is the code:
<?php
class DemoTest extends \PHPUnit_Framework_TestCase
{
    /** @dataProvider providerHello */
    public function testHello( $a, $b, $c )
    {
        $this->assertTrue( true );
    }

    public function providerHello()
    {
        $numberOfTests = 10000;
        $data = array();
        for( $i = 0; $i < $numberOfTests; $i++ )
        {
            $data[] = array( 1, 2, 3 );
        }
        echo( "Memory = " . memory_get_peak_usage() . PHP_EOL );
        return $data;
    }
}
?>
You need to set backupGlobals and backupStaticAttributes to false in your phpunit.xml file. If you don't use a config file, you can also do so on the command line:
--no-globals-backup
--static-backup
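For example, a minimal phpunit.xml sketch with just the two relevant attributes:

<phpunit backupGlobals="false"
         backupStaticAttributes="false">
    <!-- testsuites, etc. -->
</phpunit>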
