I've been reading a DynamoDB docs and was unable to understand if it does make sense to query on Global Secondary Index with a usage of 'contains' operator.
My problem is as follows: my dynamoDB document has a list of embedded objects, every object has a 'code' field which is unique:
{
"entities":[
{"code":"entity1Code", "name":"entity1Name"},
{"code":"entity2Code", "name":"entity2Name"}
]
}
I want to be able to get all documents that contain entities with entity.code = X.
For this purpose I'm considering adding a Global Secondary Index that would contain all entity.codes that are present in current db document separated by a comma. So the example above would look like:
{
"entities":[
{"code":"entity1Code", "name":"entity1Name"},
{"code":"entity2Code", "name":"entity2Name"}
],
"entitiesGlobalSecondaryIndex":"entityCode1,entityCode2"
}
And then I would like to apply filter expression on entitiesGlobalSecondaryIndex something like: entitiesGlobalSecondaryIndex contains entityCode1.
Would this be efficient or using global secondary index does not make sense in this way and DynamoDB will simply check the condition against every document which is similar so scan?
Any help is very appreciated,
Thanks
The contains operator of a query cannot be run on a partition Key. In order for a query to use any sort of operators (contains, begins with, > < ect...) you must have a range attributes- aka your Sort Key.
You can very well set up a GSI with some value as your PK and this code as your SK. However, GSIs are replication of the table - there is a slight potential for the data ina GSI to lag behind that of the master copy. If the query you're doing against this GSI isn't very often, then you're probably safe from that.
However. If you are trying to do this to the entire table at once then it's no better than a scan.
If what you need is a specific Code to return all its documents at once, then you could do a GSI with that as the PK. If you add a date field as the SK of this GSI it would even be time sorted. If you query against that code in that index, you'll get every single one of them.
Since you may have multiple codes, if they aren't too many per document, you maybe could use a Sparse Index - if you have an entity with code "AAAA" then you also have an attribute named AAAA (or AAAAflag or something.) It is always null/does not exist Unless the entities contains that code. If you do a GSI on this AAAflag attribute, it will only contain documents that contain that entity code, and ignore all where this attribute does not exist on a given document. This may work for you if you can also provide a good PK on this to keep the numbers well partitioned and if you don't have too many codes.
Filter expressions by the way are different than all of the above. Filter expressions are run on tbe data that would be returned, after it is already read out of the table. This is useful I'd you have a multi access pattern setup, but don't want a particular call to get all the documents associated with a particular PK - in the interests of keeping the data your code is working with concise. The query with a filter expression still retrieves everything from that query, but only presents what makes it past the filter.
If are only querying against a particular PK at any given time and you want to know if it contains any entities of x, then a Filter expressions would work perfectly. Of course, this is only per PK and not for your entire table.
If all you need is numbers, then you could do a count attribute on the document, or a meta document on that partition that contains these values and could be queried directly.
Lastly, and I have no idea if this would work or not, if your entities attribute is a map type you might very well be able to filter against entities code - and maybe even with entities.code.contains(value) if it was an SK - but I do not know if this is possible or not
In CosmosDB using the SQL API ( hope API might not matter ) and queries that do not use ORDER BY over an specific Logic Partition ( e.g. WHERE CustomerId = 123 ), wondering if the response will return the results always in the same order.
A use case could be something like an Audit log, where it is possible that TimeStamp _ts is not granular enough so likely to find at some point the same value twice and the source or events doesn't allow to create an sequence that can be used for ordering.
wondering if the response will return the results always in the same
order.
Based on my previous test, if you do not set any sort rules, it will be sorted as default based on the time created in the database,whatever it is partitioned or not.
In above sample documents, the sort will not be changed if I change the id,partition key(that's name) or ts.
DAO.fetch(query) allows us to get a collection of entities from the sqlite database that meets the query condition. query can be a map or string []. How can we specify ordering with the ORDER BY clause and also how do we apply the LIMIT and OFFSET clauses or do we have to default to db.execute(query)?
Currently ORDER BY, LIMIT, and OFFSET clauses aren't supported. It wouldn't be hard to add. Please file an RFE.
Alternatively it wouldn't be difficult to add this in your own DAO subclass. You can see how fetch(query) is implemented here.
In SQLite3, GROUP BY clause changes the order of extracted data. Is there any way to keep the original order?
SQL tables are unordered; there is no "original order".
Queries never return any guaranteed order unless you actually use ORDER BY.
I have a table that acts as a nested set.
The problem is, that the table have fields 'short_name' and 'long_name' and in different places i have to order the results accordingly.
I'm using Doctrine's "HYDRATE_RECORD_HIERARCHY" hydration mode when querying the tree.
The problem is: as far as i can tell this hydration mode is limited in terms that the query have to contain the orderBy clause "ORDER BY lft ASC"
Is there any way to get a sorted result set or do i have to apply some type of sorting after the query has been returned?
Since i'm getting back a Doctrine Collection (i'd really like to stay away from the array representation) it's not that trivial to sort it afterwards.