I have a table that acts as a nested set.
The problem is that the table has fields 'short_name' and 'long_name', and in different places I have to order the results by one or the other.
I'm using Doctrine's "HYDRATE_RECORD_HIERARCHY" hydration mode when querying the tree.
The problem is that, as far as I can tell, this hydration mode requires the query to contain the clause "ORDER BY lft ASC".
Is there any way to get a sorted result set, or do I have to apply some kind of sorting after the query has returned?
Since I'm getting back a Doctrine Collection (I'd really like to stay away from the array representation), it's not trivial to sort it afterwards.
I've been reading the DynamoDB docs and was unable to understand whether it makes sense to query a Global Secondary Index using the 'contains' operator.
My problem is as follows: my DynamoDB document has a list of embedded objects, and every object has a 'code' field which is unique:
{
  "entities": [
    {"code": "entity1Code", "name": "entity1Name"},
    {"code": "entity2Code", "name": "entity2Name"}
  ]
}
I want to be able to get all documents that contain entities with entity.code = X.
For this purpose I'm considering adding a Global Secondary Index that would contain all entity codes present in the current document, separated by commas. So the example above would look like:
{
  "entities": [
    {"code": "entity1Code", "name": "entity1Name"},
    {"code": "entity2Code", "name": "entity2Name"}
  ],
  "entitiesGlobalSecondaryIndex": "entity1Code,entity2Code"
}
And then I would like to apply a filter expression on entitiesGlobalSecondaryIndex, something like: entitiesGlobalSecondaryIndex contains entity1Code.
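For concreteness, this is roughly the filter I have in mind (a Python/boto3 sketch, shown here on a Scan since I'm not sure a Query can do it; the table name "documents" is just a placeholder):

import boto3
from boto3.dynamodb.conditions import Attr

# Hypothetical table; "entitiesGlobalSecondaryIndex" is the comma-separated
# codes attribute described above.
table = boto3.resource("dynamodb").Table("documents")

response = table.scan(
    FilterExpression=Attr("entitiesGlobalSecondaryIndex").contains("entity1Code")
)
matching_documents = response["Items"]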
Would this be efficient, or does using a global secondary index make no sense in this way, meaning DynamoDB will simply check the condition against every document, similar to a scan?
Any help is much appreciated, thanks.
The contains operator of a query cannot be run on a partition key. In order for a query to use any sort of operator (contains, begins_with, >, <, etc.) you must have a range attribute, a.k.a. your sort key.
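To illustrate the distinction (a Python/boto3 sketch; the table and key names are made up, not from the question):

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("documents")

# Key conditions: the partition key supports only equality; operators such
# as begins_with or between are available only on the sort (range) key.
table.query(
    KeyConditionExpression=Key("pk").eq("partition-1") & Key("sk").begins_with("entity")
)

# contains() is not a key condition at all - it can appear only in a filter
# expression, which is evaluated after the items have been read.
table.query(
    KeyConditionExpression=Key("pk").eq("partition-1"),
    FilterExpression=Attr("entitiesGlobalSecondaryIndex").contains("entity1Code"),
)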
You can very well set up a GSI with some value as your PK and this code as your SK. However, GSIs are a replication of the table - there is a slight potential for the data in a GSI to lag behind the master copy. If you don't run this query against the GSI very often, you're probably safe from that.
However, if you are trying to do this against the entire table at once, then it's no better than a scan.
If what you need is for a specific code to return all of its documents at once, then you could do a GSI with that code as the PK. If you add a date field as the SK of this GSI, it would even be time sorted. If you query against that code in that index, you'll get every single one of them.
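Sketched in Python/boto3 (the index name "code-date-index" and the attribute names are assumptions):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("documents")

# GSI with the entity code as its partition key and a date as its sort key:
# one Query returns every document for that code, already time sorted.
response = table.query(
    IndexName="code-date-index",
    KeyConditionExpression=Key("code").eq("entity1Code"),
    ScanIndexForward=False,  # newest first
)
documents = response["Items"]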
Since you may have multiple codes, and if there aren't too many per document, you could maybe use a sparse index: if you have an entity with code "AAAA", then you also have an attribute named AAAA (or AAAAflag or something). It is always null/absent unless the entities list contains that code. If you build a GSI on this AAAAflag attribute, it will only contain documents that contain that entity code and ignore every document where the attribute does not exist. This may work for you if you can also provide a good PK to keep the numbers well partitioned, and if you don't have too many codes.
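Roughly like this (boto3 again; "documents" and "AAAAflag-index" are hypothetical names):

import boto3

table = boto3.resource("dynamodb").Table("documents")

# Write the flag attribute only when the document contains entity code
# "AAAA"; items without the attribute never appear in a GSI keyed on it.
table.put_item(
    Item={
        "pk": "doc-123",
        "entities": [{"code": "AAAA", "name": "some entity"}],
        "AAAAflag": "AAAA",  # hypothetical attribute backing the sparse index
    }
)

# Scanning the sparse index touches only the flagged documents,
# not the whole table.
flagged = table.scan(IndexName="AAAAflag-index")["Items"]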
Filter expressions, by the way, are different from all of the above. Filter expressions are run on the data that would be returned, after it has already been read out of the table. This is useful if you have a multi-access-pattern setup but don't want a particular call to get all the documents associated with a particular PK, in the interest of keeping the data your code works with concise. A query with a filter expression still reads everything the query matches, but only presents what makes it past the filter.
If you are only querying against a particular PK at any given time and want to know whether it contains any entities of X, then a filter expression would work perfectly. Of course, this is only per PK and not for your entire table.
If all you need is numbers, then you could keep a count attribute on the document, or a meta document in that partition that holds these values and can be queried directly.
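For example (a boto3 sketch; the key and attribute names are made up):

import boto3

table = boto3.resource("dynamodb").Table("documents")

# Maintain the number on a meta item in the partition instead of counting
# documents at read time; ADD creates the attribute if it doesn't exist yet.
table.update_item(
    Key={"pk": "partition-1", "sk": "meta"},
    UpdateExpression="ADD entityCount :one",
    ExpressionAttributeValues={":one": 1},
)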
Lastly, and I have no idea whether this would work, if your entities attribute is a map type you might very well be able to filter against the entity code - maybe even with entities.code.contains(value) if it were an SK - but I don't know if that is possible.
Say I have tables of photos and users.
Given that I have a list of users I'm following [user1, user2, ...], I want to get a list of photos from the people I'm following.
How can I query the table of photos where photo.createdBy is in [user1, user2, user3, ...]?
I saw that DynamoDB has a batch operation, but that takes a primary key, and in this case we would be querying against a secondary index (createdBy).
Is there a way to do a query like this in DynamoDB?
If you are querying purely on photo.createdBy, then you should create a global secondary index:
To speed up queries on non-key attributes, you can create a global secondary index. A global secondary index contains a selection of attributes from the table, but they are organized by a primary key that is different from that of the table. The index key does not need to have any of the key attributes from the table; it doesn't even need to have the same key schema as the table.
This will, of course, only retrieve items for one user at a time. To refine the results when more items are returned, use a FilterExpression:
With a Query or a Scan operation, you can provide an optional filter expression to refine the results returned to you. A filter expression lets you apply conditions to the data after it is queried or scanned, but before it is returned to you. Only the items that meet your conditions are returned.
This can be applied to a Query or a Scan, but be careful of using too many read capacity units when scanning for matching entries.
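A sketch of the per-user approach in Python/boto3 (the index name "createdBy-index" is an assumption):

import boto3
from boto3.dynamodb.conditions import Key

photos = boto3.resource("dynamodb").Table("photos")
following = ["user1", "user2", "user3"]

# One indexed Query per followed user; each call reads only that user's
# photos instead of scanning the whole table.
feed = []
for user in following:
    response = photos.query(
        IndexName="createdBy-index",
        KeyConditionExpression=Key("createdBy").eq(user),
    )
    feed.extend(response["Items"])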
I have a database with a 1..many relationship between two tables, call them Color and Car: a Color is associated with many Cars. In my case, it's critical that Colors can be deleted at any time. There is no cascade delete, so if a Color is deleted, the Car's Color_ID field points to something that doesn't exist. This is OK. They are related via an FK named Color_ID.
The problem comes in when I do this:
var query = context.Cars.Include(x => x.Colors);
This only returns Cars that have an associated Color record that exists. What I really want is ALL the Cars, even if their color doesn't exist, so I can do model binding with a GridView, i.e.
<asp:Label runat="server" Text='<%# Item.Colors == null ? "Color Deleted!" : Item.Colors %>' />
All of this works fine if I remove the .Include() and resort to lazy loading; then Item.Colors is null. Perfect. However, I'm seriously concerned about doing way too many database queries for a massive result set, which is certainly possible.
One solution to avoid excessive db queries is to return an anonymous type from the datasource query with all the specific related bits of info I need for the grid, and convert all my "Item" style bindings to good ol' Eval(). But then I lose the strong typing and the simplicity that Value Provider attributes bring. I'd hate to re-write all that.
Am I right that I have to choose one or the other? How can I shape my query to return all the Car records, even if there is no Color record? I think I'm screwed if I try to eager load with .Include(). I need something like an .IncludeWithNulls().
UPDATE: Just thought of this. I don't know how ugly this is as far as query cost, but it works. Is there a better way?
// Cars whose Color record still exists (the eager-loaded set)
var query = context.Cars.Include(x => x.Colors);
// Cars whose Color record has been deleted
var query2 = context.Cars.Where(x => !context.Colors.Any(y => y.Color_ID == x.Color_ID));
return query.Union(query2);
The problem was an incorrect end multiplicity. What I really needed was not 1..many but 0..many. That way, Entity Framework generates a left outer join instead of an inner join from the .Include(). Which makes sense: there may be zero actual Color records in the example above. The thing that confused me was that in the SQL database I never set those foreign key fields to nullable, because at the time of creation they always required a valid foreign key. So I set them to nullable, fixed up my .edmx model, and everything is working. I did have to add a few more null checks here and there, such as the one in my question above, that weren't strictly necessary before, since the .Include() now pulls in records that reference missing related entities, but no big deal.
So I lose out on the non-null checking at the db level, but I gain some consistent logic in my LINQ queries for how those tables actually relate and what I expect to get back.
I want to use limit and offset in my query, but the number of records returned does not match. Without offset and limit the function returns 26 objects, but after setting
->setMaxResults(5)
->setFirstResult(10)
the number is 1 ...
What's going on?
What you are probably seeing is a typical problem you get when fetch-joining in DQL. It is a very simple issue that derives from the fact that offset and limit are applied to a result set that is not yet hydrated and still has to be normalized (see the documentation about first and max results).
If you want to avoid the problem (even with more complex joined or fetch-joined results), you will need to use the ORM DQL Paginator API. Using the paginator basically triggers multiple queries to:
compute the number of records in the result set according to your offset/limit
compute the different identifiers of the root entity of your query (with applied max/first results)
retrieve joined results (without applied first/max results)
Its usage is quite simple:
$query = $em->createQuery($fetchJoinQuery);

// Apply offset/limit to the query; the paginator takes care of applying
// them to the root entity rather than to the joined rows.
$query->setFirstResult(20);
$query->setMaxResults(100);

$paginator = new \Doctrine\ORM\Tools\Pagination\Paginator($query);

foreach ($paginator as $result) {
    var_dump($result->getId());
}
This will print 100 items starting from the one at offset 20, regardless of the number of joined or fetch-joined results.
While this may seem inefficient, it's the safest way to handle the problem of fetch-joined results causing apparently scrambled offsets and limits. You can see how this is handled directly by diving into the internals of the ORM Paginator.
I have an Entity Framework 4 design that allows referenced tables to be deleted (no cascade delete) without modifying the entities pointing to them. So, for example, entity A has a foreign key reference to entity B in the ID field. B can be deleted (and there are no FK constraints in the database to stop that), so if I look at A.B.ID it is always a valid field (since all this does is return the ID field in A), even if there is no B record with that ID due to a previous deletion. This is by design: I don't want cascading deletes; I need the A records to stick around for a while for auditing purposes.
The problem is that filtering out the non-existing deleted records is not as easy as it sounds. So for example if I do this:
from c in A
select c.B.somefield;
This results in an OUTER JOIN in the generated SQL, so it's picking up all the A records even if they refer to missing B records. So the hack I've been using to solve this (since I can't figure out a better way!) is to add a where clause that checks a string field in the referenced B records. If that field in the B entity is null, I assume B doesn't exist.
from c in A
where c.B.somestringfield != null
select c.B.somefield;
This seems to work IF B.somestringfield is a string. If it is an integer, it doesn't work!
This is all such a hack to me. I've thought of a few solutions but they are just not practical:
Query all tables that reference B when a B is deleted and null out their foreign keys. This is so ugly; I don't want to have to remember to do this if I add another entity that references B in the future. Not to mention a huge performance delay resolving all the references whenever I delete something.
Add a string field to every table that I can count on being there that I can check to see if the entity exists. Blech, I don't want to add a database field just for this.
Implement a soft delete and keep all the referential integrity intact - essentially set up cascading deletes, but this is going to result in huge database bloat since I can't clean up a massive number of records due to the references. No go.
I thought I had this problem licked with the "check if a field in the referenced entity is null" trick but it breaks under conditions that I don't completely understand (what if I don't have any strings in the referenced table? What kinds of fields will work? Integers won't.)
As an example if I have an integer field "count" in entity B and I check to see if it's null like:
from c in A
where c.B.count != null
select c.B.count;
I get a bunch of records with null for count mixed in with the results, and in fact the query bombs out with an "InvalidOperationException: The cast to value type 'Int32' failed because the materialized value is null. Either the result type's generic parameter or the query must use a nullable type."
So I need to do
from c in A
where c.B.count != null
select new { count = (int?)c.B.count };
to even see the null records. So this is pretty baffling to me how that query can result in null records in the results at all.
I just discovered something: if I do an explicit join like this, the SQL is an INNER JOIN and everything works great:
from c in A
join j in B on c.B.ID equals j.ID
select c;
But this sucks. I'll have to modify a ton of queries to add explicit join clauses instead of enjoying the convenience of the relationship fields I get with EF. It kinda defeats the purpose and adds a bunch more code to maintain.
Your first code snippet creates an OUTER JOIN because B is an optional navigation property of entity A. For a required navigation property, EF would create an INNER JOIN (explained in more detail here: https://stackoverflow.com/a/7640489/270591).
So, the only alternative I see to your last code snippet (using explicit join in LINQ) - aside from using direct SQL - is to make your navigation property required.
This is still a very ugly hack in my opinion, and it might have unexpected behaviour in other situations. Making a navigation property required or optional adds a "semantic meaning" to the relationship, namely: if there is a foreign key != NULL, there must be a related entity, and EF expects that you haven't removed the enforcement of the FK constraint in the database.