Mapping a dynamodb query result - amazon-dynamodb

I have a table with a composite key: it has both a partition key and a sort key. I know that the Java SDK allows me to query by just the partition key. However, if I do this, the docs say I will get back an iterator, ItemCollection<QueryOutcome>. This means that to work with this data, I have to iterate over the entire collection.
It would be easier if I could get back a Map<T, V> whose key is the sort key. That way, I could quickly find the rows for a particular sort key. Is this possible? I would rather not iterate over the whole collection just to find the items with a certain sort key value.

If you just want the item with a certain sort key (and you know its partition key), that's a GetItem. Don't do a Query.
You may be confused by DynamoDB's use of the word Query. The Query operation is not the only way to query the database; it is one read operation that happens to be named Query.
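A minimal sketch of such a GetItem request, assuming Python and boto3's low-level client (neither comes from the question, which uses the Java SDK). The table name Orders and the key attributes customerId and orderDate are hypothetical; the function only builds the request parameters, so running it sends nothing to AWS.

```python
# Build GetItem parameters in DynamoDB's low-level attribute-value format.
# Assumption: they would be passed to boto3 as client.get_item(**params).
# "Orders", "customerId" and "orderDate" are hypothetical names.


def build_get_item(table: str, pk_name: str, pk_value: str,
                   sk_name: str, sk_value: str) -> dict:
    """Parameters for fetching exactly one item by its full primary key."""
    return {
        "TableName": table,
        # With a composite key, GetItem needs both the partition and sort key.
        "Key": {
            pk_name: {"S": pk_value},
            sk_name: {"S": sk_value},
        },
    }


params = build_get_item("Orders", "customerId", "c-42", "orderDate", "2024-01-15")
print(params)
```

GetItem returns a single item (or nothing), so there is no collection to iterate over.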

Related

DynamoDB Best practice to select all items from a table with pagination (Without PK)

I simply want to get a paginated list of products from my table. The pagination part is relatively clear with last_evaluated_key; however, all the examples query on the PK or SK, whereas I just want paginated results sorted by createdAt.
My product id (a unique UUID) is not very useful in this case. Is my only option to scan the whole table?
Yes, you will use Scan. DynamoDB has two types of read operation, Query and Scan. You can Query for one-and-only-one Partition Key (and optionally a range of Sort Key values if your table has a compound primary key). Everything else is a Scan.
Scan operations read every item (at most 1 MB per request), optionally filtered. Filters are applied after the read, and results are unsorted.
The SDKs have pagination helpers like paginateScan to make life easier.
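The Scan-plus-pagination approach can be sketched as follows, assuming Python and boto3's low-level client. scan_page is a hypothetical stand-in for client.scan, so the loop can run against a fake two-page table, and createdAt is assumed to be an ISO-8601 string, which sorts correctly as text. Because Scan results are unsorted, the sort happens on the client after every page has been read.

```python
from typing import Callable, Iterator

# scan_page stands in for boto3's client.scan: it accepts Scan parameters and
# returns a dict with "Items" and, if more pages remain, "LastEvaluatedKey".
ScanFn = Callable[..., dict]


def scan_all(scan_page: ScanFn, table: str) -> Iterator[dict]:
    """Yield every item in the table, following LastEvaluatedKey page by page."""
    params: dict = {"TableName": table}
    while True:
        response = scan_page(**params)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:  # no LastEvaluatedKey: this was the final page
            return
        params["ExclusiveStartKey"] = last_key  # resume after the last item read


def products_by_created_at(scan_page: ScanFn, table: str) -> list:
    """Scan returns items unsorted, so sort by createdAt on the client."""
    return sorted(scan_all(scan_page, table), key=lambda item: item["createdAt"]["S"])


# A fake two-page table, so the loop can run without AWS.
PAGES = [
    {"Items": [{"id": {"S": "b"}, "createdAt": {"S": "2024-03-01T10:00:00Z"}}],
     "LastEvaluatedKey": {"id": {"S": "b"}}},
    {"Items": [{"id": {"S": "a"}, "createdAt": {"S": "2024-01-01T09:00:00Z"}}]},
]


def fake_scan(**params) -> dict:
    return PAGES[1] if "ExclusiveStartKey" in params else PAGES[0]


print([item["id"]["S"] for item in products_by_created_at(fake_scan, "Products")])
```

With real boto3, client.get_paginator("scan") handles the same LastEvaluatedKey bookkeeping, much like paginateScan in the JavaScript SDK.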
Re: cost. Ask yourself: "Is Scan returning lots of data (MB) I don't actually need?" If the answer is "No", you are fine. The more you overfetch, however, the greater the cost benefit of Query over Scan.

How to filter DynamoDb by object property value

I have a DynamoDB table:
How should I filter the entries in the table to those where access.role = "ADMIN"?
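You would be best served by setting up a Global Secondary Index (GSI). Set the index's partition key to that attribute, and its sort key to some other attribute that you can guarantee will be unique. Then use your SDK of choice, or the Query option in the console, select the index, and query for partition_key = ADMIN. Note that GSI key attributes must be top-level scalar attributes (string, number, or binary), so a nested value like access.role has to be copied into its own top-level attribute first.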
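Assuming access.role has been copied into a top-level attribute called role, and a GSI named role-index uses it as its partition key (both names are hypothetical), the Query could look like this in Python with boto3's low-level client. The #role placeholder guards against clashes with DynamoDB's reserved-word list.

```python
# Query parameters for a GSI; pass them as client.query(**params) in boto3.
# "Users", "role-index" and the top-level "role" attribute are hypothetical.


def build_role_query(table: str, index: str, role: str) -> dict:
    return {
        "TableName": table,
        "IndexName": index,  # read from the GSI instead of the base table
        "KeyConditionExpression": "#role = :role",
        "ExpressionAttributeNames": {"#role": "role"},
        "ExpressionAttributeValues": {":role": {"S": role}},
    }


params = build_role_query("Users", "role-index", "ADMIN")
print(params["KeyConditionExpression"], params["ExpressionAttributeValues"])
```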
However, be aware: an index is a replica of the table's data (or of the attributes you project into it). DynamoDB maintains it quickly, but GSI updates are asynchronous, so reads from a GSI are eventually consistent and your index can briefly be out of sync with the table. If you are not calling the index very often, you are pretty much fine. If you are calling it very often, you should restructure your table.
DynamoDB is not a SQL database. When designing a DynamoDB schema, you have to consider how you will access your data: your access patterns. Design your data so that the partition key is the value you will always have when looking something up (i.e. I will always have a user ID number), and the sort keys are the individual documents related to that PK (i.e. a user has a document with his profile data, a document with his profile picture URL, a document with the list of his friends' user numbers, a document that is ... etc.).
Then you use indexes for things like your question, which you won't be doing very often.
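To make the access-pattern idea concrete, here is a toy in-memory model in Python (not DynamoDB code): the partition key is the user ID and each sort key names one document for that user. The pk/sk attribute names and the user#... and profile/friends values are hypothetical. Querying one partition key returns all of that user's documents ordered by sort key, which is what a real Query on that key gives you.

```python
# Toy model of the layout above: one partition per user, one item per document.
# Attribute names ("pk", "sk") and all values are hypothetical examples.
ITEMS = [
    {"pk": "user#123", "sk": "profile", "name": "Ana"},
    {"pk": "user#123", "sk": "profilePicture", "url": "https://example.com/a.png"},
    {"pk": "user#123", "sk": "friends", "userIds": ["456", "235"]},
    {"pk": "user#456", "sk": "profile", "name": "Ben"},
]


def query_partition(items: list, pk: str) -> list:
    """Mimic a Query on one partition key: every document for that user,
    ordered by sort key, as DynamoDB returns a partition's items."""
    return sorted((i for i in items if i["pk"] == pk), key=lambda i: i["sk"])


print([i["sk"] for i in query_partition(ITEMS, "user#123")])
```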

Querying on Global Secondary indexes with a usage of contains operator

I've been reading the DynamoDB docs and couldn't work out whether it makes sense to query a Global Secondary Index using the 'contains' operator.
My problem is as follows: my DynamoDB document has a list of embedded objects, and every object has a 'code' field which is unique:
{
"entities":[
{"code":"entity1Code", "name":"entity1Name"},
{"code":"entity2Code", "name":"entity2Name"}
]
}
I want to be able to get all documents that contain entities with entity.code = X.
For this purpose I'm considering adding an attribute, backed by a Global Secondary Index, that contains all the entity codes present in the current document, separated by commas. So the example above would look like:
{
"entities":[
{"code":"entity1Code", "name":"entity1Name"},
{"code":"entity2Code", "name":"entity2Name"}
],
"entitiesGlobalSecondaryIndex":"entity1Code,entity2Code"
}
Then I would apply a filter expression on entitiesGlobalSecondaryIndex, something like: entitiesGlobalSecondaryIndex contains entity1Code.
Would this be efficient? Or does using a global secondary index this way make no sense, with DynamoDB simply checking the condition against every document, which is similar to a scan?
Any help is very appreciated,
Thanks
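The contains operator cannot be used in a Query's key condition. The key condition must test the partition key for equality; range operators (begins_with, BETWEEN, <, >, etc.) are only allowed on the range attribute, aka your sort key, and contains is not supported in a key condition at all: it is only available in filter (and condition) expressions.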
You can set up a GSI with some value as its PK and this code as its SK. However, GSIs are replicas of the table that are updated asynchronously, so there is a slight potential for the data in a GSI to lag behind the base table. If you don't query this GSI very often, you're probably safe from that.
However, if you are trying to do this across the entire table at once, it's no better than a scan.
If what you need is to return all the documents for a specific code at once, you could create a GSI with that code as its PK. If you add a date field as the SK of this GSI, the results would even be time-sorted. Querying that code in that index returns every single one of those documents.
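A sketch of the Query against such an index, assuming Python with boto3's low-level client and a hypothetical GSI named code-date-index whose partition key is a top-level code attribute and whose sort key is a createdAt date string. This only works if each document carries a single code value, since a GSI key attribute must be a scalar, not a list.

```python
# Query a hypothetical "code-date-index" GSI: partition key "code",
# sort key "createdAt". Pass the result as client.query(**params) in boto3.


def build_code_query(table: str, code: str, newest_first: bool = False) -> dict:
    return {
        "TableName": table,
        "IndexName": "code-date-index",
        "KeyConditionExpression": "#code = :code",
        "ExpressionAttributeNames": {"#code": "code"},
        "ExpressionAttributeValues": {":code": {"S": code}},
        # Items come back in index sort key (createdAt) order;
        # ScanIndexForward=False reverses that to newest-first.
        "ScanIndexForward": not newest_first,
    }


params = build_code_query("Documents", "entity1Code", newest_first=True)
print(params["ScanIndexForward"])
```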
Since a document may have multiple codes, if there aren't too many per document, you could maybe use a sparse index: if a document has an entity with code "AAAA", it also gets an attribute named AAAA (or AAAAflag, or something similar). That attribute does not exist unless the entities list contains that code. If you build a GSI on this AAAAflag attribute, the index will only contain documents that contain that entity code, and will skip every document where the attribute does not exist. This may work for you if you can also give the index a good PK to keep the items well partitioned, and if you don't have too many codes.
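The sparse-index behaviour can be modelled in a few lines of plain Python (a toy model, not DynamoDB code). add_code_flags and the "<code>flag" naming scheme are hypothetical; the point is that a GSI only receives items that have its key attribute, so an index keyed on AAAAflag holds exactly the documents containing code AAAA.

```python
# Toy model of a sparse GSI: an item is written to the index only if it has
# the index key attribute; items without it are simply absent from the index.


def add_code_flags(doc: dict) -> dict:
    """Copy each entity code into its own top-level flag attribute, e.g. code
    "AAAA" becomes attribute "AAAAflag" (a hypothetical naming scheme)."""
    flagged = dict(doc)
    for entity in doc.get("entities", []):
        flagged[entity["code"] + "flag"] = "1"
    return flagged


def sparse_index(docs: list, key_attr: str) -> list:
    """The documents that would appear in a GSI keyed on key_attr."""
    return [d for d in docs if key_attr in d]


docs = [
    add_code_flags({"id": "d1", "entities": [{"code": "AAAA", "name": "a"}]}),
    add_code_flags({"id": "d2", "entities": [{"code": "BBBB", "name": "b"}]}),
    add_code_flags({"id": "d3", "entities": [{"code": "AAAA", "name": "a"},
                                             {"code": "BBBB", "name": "b"}]}),
]
print([d["id"] for d in sparse_index(docs, "AAAAflag")])
```

Here every flag has the constant value "1"; as the answer says, a real index needs a partition key with enough distinct values to spread the items across partitions.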
Filter expressions, by the way, are different from all of the above. A filter expression runs on the data that would be returned, after it has already been read from the table. This is useful if you have a multi-access-pattern setup but don't want a particular call to get all the documents associated with a particular PK, in the interest of keeping the data your code works with concise. A query with a filter expression still reads everything matched by the key condition (and consumes read capacity for it), but only returns what makes it past the filter.
If you are only querying against a particular PK at any given time, and you want to know whether it contains any entities of x, then a filter expression would work perfectly. Of course, this is only per PK, not for your entire table.
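A sketch of a per-partition Query with a filter, assuming Python with boto3's low-level client and a hypothetical entityCodes attribute stored as a string set (for a set, contains tests membership; for a plain string like the comma-separated one in the question, it tests for a substring). The simulate helper is a toy model of the response counters: ScannedCount is how many items were read, Count how many survived the filter.

```python
# Query one partition, then filter on a hypothetical "entityCodes" string set.
# Pass the parameters to boto3 as client.query(**params). "pk" is hypothetical.


def build_filtered_query(table: str, pk_value: str, code: str) -> dict:
    return {
        "TableName": table,
        "KeyConditionExpression": "#pk = :pk",
        # Applied after the partition's items are read, before they are returned.
        "FilterExpression": "contains(entityCodes, :code)",
        "ExpressionAttributeNames": {"#pk": "pk"},
        "ExpressionAttributeValues": {":pk": {"S": pk_value}, ":code": {"S": code}},
    }


def simulate(partition_items: list, code: str) -> dict:
    """Toy model of the response: every item is read, only matches are returned."""
    returned = [i for i in partition_items if code in i.get("entityCodes", set())]
    return {"Items": returned, "Count": len(returned),
            "ScannedCount": len(partition_items)}


items = [{"entityCodes": {"entity1Code", "entity2Code"}},
         {"entityCodes": {"entity3Code"}},
         {}]
result = simulate(items, "entity1Code")
print(result["Count"], result["ScannedCount"])
```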
If all you need is counts, you could keep a count attribute on the document, or a meta document in that partition that holds these values and can be read directly.
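For the count idea, a minimal sketch in Python with boto3's low-level client: an UpdateItem that atomically increments a counter on a per-partition meta item. The pk/sk names, the META sort key and the counter attribute name are hypothetical. ADD treats a missing number attribute as 0, and UpdateItem creates the item if it does not exist yet.

```python
# Atomic counter on a per-partition "META" item; pass to client.update_item(**params).
# "pk", "sk", "META" and the counter attribute name are hypothetical.


def build_increment(table: str, pk_value: str, counter: str, delta: int = 1) -> dict:
    return {
        "TableName": table,
        "Key": {"pk": {"S": pk_value}, "sk": {"S": "META"}},
        # ADD on a missing number attribute starts from 0, so no initialisation is needed.
        "UpdateExpression": "ADD #counter :delta",
        "ExpressionAttributeNames": {"#counter": counter},
        # Numbers travel as strings in DynamoDB's attribute-value format.
        "ExpressionAttributeValues": {":delta": {"N": str(delta)}},
    }


params = build_increment("Documents", "tenant#1", "entity1CodeCount")
print(params["UpdateExpression"], params["ExpressionAttributeValues"])
```

Reading the count back is then a single GetItem on the META item; a negative delta decrements it when a code is removed.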
Lastly, your entities attribute is a list of maps. A filter expression can reach into it with a document path such as entities[0].code = :code, but only at a specific list index: there is no wildcard that matches the code of every element. And a key attribute (sort key included) must be a top-level string, number, or binary value, so entities.code cannot be used as an SK.

How to get attribute from a list of partition keys in DynamoDB - is scan my only option?

I've got a list of partition keys from one table.
userId["123","456","235"]
I need to get an attribute that they all share, like "username".
What would be the best practice to get them all at once?
Is scan my only option knowing that I know all my partition keys?
Do I know the sort key? Yes, but only the beginning of it, so I don't think I can use batchGetItem.
Scan is only appropriate if you don't know the partition keys. Because you know the partition keys you want to search, you can achieve the desired behavior with multiple Query operations.
A Query searches all documents with the specified partition key; you can only query one partition key per request, so you'll need multiple queries, but this will still be significantly more efficient than a single Scan operation.
If you're only looking for documents with a sort key that begins with something, you can include it in your KeyConditionExpression along with the partition key.
For example, if you wanted to only return documents whose sort key begins with a certain string, you could pass something like userId = :user_id AND begins_with(#SortKey, :str) as the key condition expression.
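A sketch of the one-Query-per-partition-key approach in Python, assuming boto3's low-level client. query is a hypothetical stand-in for client.query (the fake below lets the code run without AWS), and sk, the sort key attribute name, is hypothetical. ProjectionExpression limits the returned attributes to username, although read capacity is still charged for the full items read.

```python
from typing import Callable


def build_user_query(table: str, user_id: str, sk_prefix: str) -> dict:
    return {
        "TableName": table,
        # One partition key per Query, plus a begins_with on the sort key.
        "KeyConditionExpression": "userId = :user_id AND begins_with(#SortKey, :str)",
        "ProjectionExpression": "#username",  # only return the attribute we need
        "ExpressionAttributeNames": {"#SortKey": "sk", "#username": "username"},
        "ExpressionAttributeValues": {":user_id": {"S": user_id}, ":str": {"S": sk_prefix}},
    }


def usernames(query: Callable[..., dict], table: str,
              user_ids: list, sk_prefix: str) -> dict:
    """Run one Query per user ID and collect the usernames found.
    For brevity, LastEvaluatedKey is ignored: more than 1 MB of results for one
    user would need an ExclusiveStartKey/LastEvaluatedKey loop."""
    found = {}
    for user_id in user_ids:
        response = query(**build_user_query(table, user_id, sk_prefix))
        found[user_id] = [item["username"]["S"] for item in response.get("Items", [])]
    return found


def fake_query(**params) -> dict:
    """Stand-in for client.query that returns one item per user."""
    user_id = params["ExpressionAttributeValues"][":user_id"]["S"]
    return {"Items": [{"username": {"S": "name-" + user_id}}]}


print(usernames(fake_query, "Users", ["123", "456", "235"], "profile#"))
```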
You can efficiently achieve the result with a PartiQL SELECT statement. It lets you query a list of partition keys with the IN operator and apply additional conditions on other attributes without causing a full table scan.
To ensure that a SELECT statement does not result in a full table scan, the WHERE clause condition must specify a partition key. Use the equality or IN operator.
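A sketch of the PartiQL version, assuming Python with boto3's low-level client (client.execute_statement(**params)). The table name Users and the sort key attribute sk are hypothetical; identifiers are double-quoted, and values are passed as ? parameters rather than spliced into the statement.

```python
# PartiQL SELECT over several partition keys with IN; the WHERE clause names the
# partition key, so DynamoDB does not fall back to a full table scan.


def build_partiql(user_ids: list, sk_prefix: str) -> dict:
    placeholders = ", ".join("?" for _ in user_ids)
    statement = (
        'SELECT "username" FROM "Users" '
        f'WHERE "userId" IN [{placeholders}] AND begins_with("sk", ?)'
    )
    # Parameters fill the ? markers in order: the user IDs, then the sk prefix.
    parameters = [{"S": uid} for uid in user_ids] + [{"S": sk_prefix}]
    return {"Statement": statement, "Parameters": parameters}


params = build_partiql(["123", "456", "235"], "profile#")
print(params["Statement"])
```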

Does dynamodb support something like an "in" clause in its queries?

Say I have a table of photos and users.
Given a list of users I'm following [user1,user2,...], I want to get a list of photos by the people I'm following.
How can I query the table of photos where photo.createdBy in [user1,user2,user3...]
I saw that DynamoDB has a batch operation, but that takes a primary key, and in this case we would be querying against a secondary index (createdBy).
Is there a way to do a query like this in dynamodb?
If you are querying purely on photo.createdBy, then you should create a global secondary index:
To speed up queries on non-key attributes, you can create a global secondary index. A global secondary index contains a selection of attributes from the table, but they are organized by a primary key that is different from that of the table. The index key does not need to have any of the key attributes from the table; it doesn't even need to have the same key schema as a table.
This will, of course, only retrieve the photos of one user per Query, so you'll need one Query per followed user. To narrow down the returned items further, use a FilterExpression:
With a Query or a Scan operation, you can provide an optional filter expression to refine the results returned to you. A filter expression lets you apply conditions to the data after it is queried or scanned, but before it is returned to you. Only the items that meet your conditions are returned.
This can be applied to a Query or a Scan, but be careful about consuming too many read capacity units when scanning for matching entries.
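A sketch of the per-user approach in Python, assuming boto3's low-level client and a hypothetical GSI named createdBy-index on the Photos table with createdBy as its partition key. A Query's key condition only accepts equality on the partition key (no IN), so the code issues one Query per followed user and merges the results; query is a hypothetical stand-in for client.query.

```python
from typing import Callable


def build_photos_query(user: str) -> dict:
    return {
        "TableName": "Photos",
        "IndexName": "createdBy-index",  # hypothetical GSI keyed on createdBy
        "KeyConditionExpression": "createdBy = :user",
        "ExpressionAttributeValues": {":user": {"S": user}},
    }


def photos_of(query: Callable[..., dict], following: list) -> list:
    """One Query per followed user, results concatenated (pagination omitted)."""
    photos = []
    for user in following:
        photos.extend(query(**build_photos_query(user)).get("Items", []))
    return photos


def fake_query(**params) -> dict:
    """Stand-in for client.query: each user has exactly one photo."""
    user = params["ExpressionAttributeValues"][":user"]["S"]
    return {"Items": [{"photoId": {"S": user + "-p1"}, "createdBy": {"S": user}}]}


print(len(photos_of(fake_query, ["user1", "user2", "user3"])))
```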
