Doctrine2 OFFSET and LIMIT - symfony

I want to use limit and offset in my query, but the number of records returned does not match. When I'm not using offset and limit, the query returns 26 objects; after setting the methods
->setMaxResults(5)
->setFirstResult(10)
the number is 1 ...
What's going on?

What you are seeing is probably a typical problem you get when fetch-joining in DQL. It derives from the fact that offset and limit are applied to a result set that is not yet hydrated and still has to be normalized (see the documentation about first and max results).
If you want to avoid the problem (even with more complex joined or fetch-joined results), you will need to use the ORM DQL Paginator API. Using the paginator basically triggers multiple queries to:
compute the number of records in the resultset according to your offset/limit
compute the different identifiers of the root entity of your query (with applied max/first results)
retrieve joined results (without applied first/max results)
Its usage is quite simple:
$query = $em->createQuery($fetchJoinQuery);
$query->setFirstResult(20);
$query->setMaxResults(100);

$paginator = new \Doctrine\ORM\Tools\Pagination\Paginator($query);

foreach ($paginator as $result) {
    var_dump($result->getId());
}
This will print the IDs of 100 items, starting from the one at offset 20, regardless of the number of joined or fetch-joined rows.
While this may seem inefficient, it is the safest way to handle the problem of fetch-joined results causing apparently scrambled offsets and limits. You can see exactly how this is handled by diving into the internals of the ORM Paginator.

Related

Cosmos DB .NET SDK order by a dynamic field (parameterized)

I use the .NET SDK to retrieve some items in a Cosmos DB instance using continuationTokens to be able to retrieve paginated pieces of data. So far this works.
I use a generic Get function to retrieve the items:
var query = container.GetItemQueryIterator<T>(
    new QueryDefinition("SELECT * FROM c"),
    continuationToken: continuationToken,
    requestOptions: new QueryRequestOptions()
    {
        MaxItemCount = itemCount
    });
However, I would like to add a dynamic order by field, where the caller can decide on which field the results should be ordered. I tried adding a parameterized field like:
new QueryDefinition("SELECT * FROM c order by #orderBy")
    .WithParameter("#orderBy", "fieldname")
But this does not work; I keep getting syntax errors while executing. Is it actually possible to dynamically add an ORDER BY clause?
The .WithParameter() fluent syntax can only be used for values (such as in the WHERE clause) in QueryDefinition, so you will have to construct your SQL string with the ORDER BY clause appended dynamically.
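A minimal sketch of that approach (the orderByField variable and the allow-list values are hypothetical; adjust to your model). Because the field name is concatenated into the SQL string rather than parameterized, validate it against an allow-list to avoid injection:

// hypothetical allow-list of sortable fields; never concatenate raw user input
var allowedOrderByFields = new HashSet<string> { "name", "createdDate" };
if (!allowedOrderByFields.Contains(orderByField))
    throw new ArgumentException($"Cannot order by '{orderByField}'.");

var query = container.GetItemQueryIterator<T>(
    new QueryDefinition($"SELECT * FROM c ORDER BY c.{orderByField}"),
    continuationToken: continuationToken,
    requestOptions: new QueryRequestOptions { MaxItemCount = itemCount });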
One thing to keep in mind is that unless this is a small workload with less than 20 GB of data, this container will not scale unless you use the partition key in your queries. The other consideration is that ORDER BY gets much better performance when you use composite indexes. But if results can be sorted on a wide range of properties, writes may get very expensive from all of the individual composite indexes.
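For reference, composite indexes are declared in the container's indexing policy; a hedged sketch with illustrative paths (a composite index needs two or more paths, for example a filter property plus the sort property):

{
  "indexingMode": "consistent",
  "compositeIndexes": [
    [
      { "path": "/department", "order": "ascending" },
      { "path": "/createdDate", "order": "descending" }
    ]
  ]
}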
In all cases, if this is meant to scale you should measure and benchmark high concurrency operations.

DynamoDBScanExpression withLimit returns more records than Limit

I have to list all the records from a DynamoDB table, without any filter expression.
I want to limit the number of records, hence I am using DynamoDBScanExpression with setLimit.
DynamoDBScanExpression scanExpression = new DynamoDBScanExpression();
....
// Set ExclusiveStartKey
....
scanExpression.setLimit(10);
However, the scan operation always returns more than 10 results!
Is this the expected behaviour and if so how?
Python Answer
It is not possible to cap the total number of results of a scan() with a limit, but it is possible with a query.
A query searches through items, the rows in the database. It starts at the top or bottom of the list and finds items based on set criteria; you must have a partition key and a sort key to do this.
A scan, on the other hand, searches through the ENTIRE table, not through ordered items, and as a result is NOT ordered.
Since queries work on ordered items and a scan works on the entire table, only queries can support a meaningful limit.
To answer the OP's question: essentially, it doesn't work because you're using scan, not query.
Here is an example using the low-level CLIENT syntax (the more advanced version; sorry, I don't have a simpler example that uses the resource syntax, but you can google that):
def retrieve_latest_item(self):
    result = self.dynamodb_client.query(
        TableName="cleaning_company_employees",
        KeyConditionExpression="works_night_shift = :value",
        # key attributes must be string, number, or binary,
        # so the boolean flag is stored as a string here
        ExpressionAttributeValues={":value": {"S": "true"}},
        ScanIndexForward=False,  # descending order, i.e. latest first
        Limit=3,
    )
    return result
Here are the DynamoDB module docs.
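As for the Java mapper in the question itself: setLimit() caps how many items are evaluated per request, and scan() returns a lazily loaded PaginatedScanList that keeps fetching further pages as you iterate, which is why more than 10 items come back. A hedged sketch using scanPage(), which returns a single page (Employee is an illustrative mapped class):

DynamoDBMapper mapper = new DynamoDBMapper(dynamoDbClient);
DynamoDBScanExpression scanExpression = new DynamoDBScanExpression()
        .withLimit(10); // at most 10 items evaluated for this single page
ScanResultPage<Employee> page = mapper.scanPage(Employee.class, scanExpression);
List<Employee> results = page.getResults(); // no more than 10 items
// pass page.getLastEvaluatedKey() to setExclusiveStartKey() to fetch the next page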

How to order a Doctrine Nested Set Tree when querying

I have a table that acts as a nested set.
The problem is that the table has the fields 'short_name' and 'long_name', and in different places I have to order the results accordingly.
I'm using Doctrine's "HYDRATE_RECORD_HIERARCHY" hydration mode when querying the tree.
The problem is that, as far as I can tell, this hydration mode is limited in that the query has to contain the clause "ORDER BY lft ASC".
Is there any way to get a sorted result set, or do I have to apply some kind of sorting after the query has returned?
Since I'm getting back a Doctrine Collection (I'd really like to stay away from the array representation), it's not that trivial to sort it afterwards.
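For reference, a minimal sketch of the kind of query being described, assuming Doctrine 1's API and an illustrative Category model:

$tree = Doctrine_Query::create()
    ->select('c.id, c.lft, c.rgt, c.short_name, c.long_name')
    ->from('Category c')
    ->orderBy('c.lft ASC') // the ordering HYDRATE_RECORD_HIERARCHY relies on
    ->execute(array(), Doctrine_Core::HYDRATE_RECORD_HIERARCHY);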

Optimizing doctrine performance: select * is a bad idea?

We are trying to optimize a project that is consuming a lot of memory. All of our queries are written using this kind of syntax:
$qb->select(array('e'))
->from('MyBundle:Event', 'e');
This is converted into a query that selects every field of the table, like this:
SELECT t0.id AS id1,
       t0.field1 AS field12,
       t0.field2 AS field23,
       t0.field3 AS field34,
       t0.field4 AS field45
FROM event t0
Is it a good idea for performance to use the Partial Object Syntax to hydrate only some predefined fields? I don't really know whether it will improve performance, and I would have the disadvantage that the other fields will be null. What do you usually do in your select queries with Doctrine?
Regards.
My two cents
I suppose that hydration (object hydration with lazy loading, of course) is good as long as you don't know how many and which fields to pull from the DB tables and put into objects. If you know you have to retrieve all the fields, it is better to get them once and work with them, instead of running a time-consuming query every time.
However, as a good practice, when I have retrieved and used my objects I unset them explicitly (unless they are the last instructions of my function, which will return and implicitly unset them).
Update
$my_obj_repo = $this->getDoctrine()->getManager()->getRepository('MyBundleName:my_obj');
$my_obj = $my_obj_repo->fooHydrateFunction(12); // here I don't pull all the data out of the db

// do stuff with this object, like extracting or manipulating data
if ($my_obj->getBarField() == null) { // the only field loaded by fooHydrateFunction, so no lazy loading
    $my_obj->setBarField($request->query->get('bar'));
    $entity_manager = $this->getDoctrine()->getManager();
    $entity_manager->persist($my_obj);
    $entity_manager->flush();
}

// here my object isn't necessary anymore
unset($my_obj); // this is a PHP instruction

// continue with my business logic
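For reference, a minimal sketch of the Partial Object Syntax the question asks about (field names are illustrative); fields left out of the selection stay null, which is exactly the trade-off to weigh:

$events = $em->createQueryBuilder()
    ->select('partial e.{id, field1}') // hydrate only id and field1
    ->from('MyBundle:Event', 'e')
    ->getQuery()
    ->getResult();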

Linq 'contains' query taking too long

I have this query:
var newComponents = from ic in importedComponents
                    where !existingComponents.Contains(ic)
                    select ic;
importedComponents and existingComponents are of type List<ImportedComponent> and exist only in memory (they are not tied to a data context). In this instance, importedComponents has just over 6,100 items and existingComponents has 511 items.
This statement is taking too long to complete (I don't know how long; I stop the script after 20 minutes). I've tried the following, with no improvement in execution speed:
var existingComponentIDs = from ec in existingComponents
                           select ec.ID;
var newComponents = from ic in importedComponents
                    where !existingComponentIDs.Contains(ic.ID)
                    select ic;
Any help will be much appreciated.
The problem is the quadratic complexity of this algorithm. Put the IDs of all existing components into a HashSet and use the HashSet.Contains method; it has O(1) lookup cost, compared to O(N) for Contains/Any on a list.
The morelinq project contains a method that does all of that in one convenient step: ExceptBy.
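A minimal sketch of the HashSet approach (assuming ID is an int; use your actual key type):

var existingIDs = new HashSet<int>(existingComponents.Select(c => c.ID));
var newComponents = importedComponents
    .Where(ic => !existingIDs.Contains(ic.ID)) // O(1) hash lookup per item
    .ToList();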
You could use Except to get the set difference:
var existingComponentIDs = existingComponents.Select(c => c.ID);
var importedComponentIDs = importedComponents.Select(c => c.ID);
var newComponentIDs = importedComponentIDs.Except(existingComponentIDs);
var newComponents = from ic in importedComponents
                    join newID in newComponentIDs on ic.ID equals newID
                    select ic;
foreach (var c in newComponents)
{
    // insert into database?
}
See "Why is LINQ JOIN so much faster than linking with WHERE?". In short: the Join method can set up a hash table to use as an index to quickly zip two tables together.
Well, based on the logic and numbers you provided, you are basically performing 3,117,100 comparisons (6,100 × 511) when you run that statement. Obviously that is not entirely accurate, because the condition may be satisfied before running through the entire array, but you get my point.
With collections this large you are going to want to use a collection where you can index your key (in this case your component ID) to reduce the overhead of the search. The thing to remember is that even though LINQ looks like SQL, there are no magic indexes here; it is mainly for convenience. In fact, I have seen articles where a LINQ lookup is actually slightly slower than a brute-force lookup.
EDIT: If possible, I would suggest trying a Dictionary or SortedList for your values. I believe either one would give slightly better lookup performance.
