DynamoDB Query - GSI - amazon-dynamodb

Say if I had a DynamoDB table:
UserId: S
BookName: S
BorrowedTimestamp: S
HasReturned: B
UserId (partition) and BookName (range) would be keys on the base table.
However I want to query using the other non-key fields e.g. BorrowedTimestamp > 3days and HasReturned is false.
I think I'd need to set up a GSI for this query to work, but it doesn't sound right to have a two-valued field, HasReturned, as the partition key (with BorrowedTimestamp as range key). Is that correct, or am I missing something?

No, you don't need a GSI, but it might be more efficient depending on your circumstances.
Let's take your example of BorrowedTimestamp > 3days. I'm going to assume this is for a particular user, so you have a UserId to query on.
You could do a Query with a KeyConditionExpression on UserId, then a FilterExpression of BorrowedTimestamp > 3days. Let's say the user has 10 books and 2 of them have a BorrowedTimestamp > 3days. This query is charged for reading all 10 items (roughly 10 RCU, Read Capacity Units, if you count one unit per item; the exact figure depends on item size). That's because a FilterExpression only filters items out of the result set; DynamoDB has already read all 10 items by then.
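As a rough sketch of this first approach, the request parameters might be built like this in Python. The table name BorrowedBooks, the ISO-8601 string format for BorrowedTimestamp, and storing HasReturned as a BOOL are all assumptions made for illustration; "older than 3 days" is expressed as a timestamp earlier than a cutoff.

```python
from datetime import datetime, timedelta, timezone


def overdue_query_params(user_id, now, days=3):
    """Build Query parameters: key condition on UserId, filter on the rest."""
    # Assumption: BorrowedTimestamp holds ISO-8601 UTC strings, which sort
    # lexicographically in chronological order.
    cutoff = (now - timedelta(days=days)).isoformat()
    return {
        "TableName": "BorrowedBooks",  # hypothetical table name
        "KeyConditionExpression": "UserId = :uid",
        # The filter runs after the items are read, so it trims the result
        # set but not the read cost.
        "FilterExpression": "BorrowedTimestamp < :cutoff AND HasReturned = :returned",
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":cutoff": {"S": cutoff},
            ":returned": {"BOOL": False},  # assumption: stored as BOOL
        },
    }


filter_params = overdue_query_params(
    "user-1", datetime(2024, 1, 10, tzinfo=timezone.utc)
)
# With boto3 this could be sent as: boto3.client("dynamodb").query(**filter_params)
```

The function only builds the request, so it can be inspected or unit-tested without touching AWS.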
Now let's say you have a GSI where the partition key is UserId and the range key is BorrowedTimestamp. Your KeyConditionExpression could specify both the partition key (UserId) and the range key (BorrowedTimestamp > 3days). The result would be exactly the same. However, this time you would only be charged for the 2 matching items (2 RCUs on the same per-item count), and those RCUs would come from the index capacity, not the table capacity.
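A minimal sketch of the GSI variant follows, under the same assumptions about the data (ISO-8601 timestamp strings, a hypothetical BorrowedBooks table). The index name is made up. The difference is that the timestamp condition sits in the key condition, so only matching items are read.

```python
from datetime import datetime, timedelta, timezone


def overdue_gsi_query_params(user_id, now, days=3):
    """Build Query parameters that put the timestamp in the key condition."""
    cutoff = (now - timedelta(days=days)).isoformat()
    return {
        "TableName": "BorrowedBooks",  # hypothetical table name
        "IndexName": "UserId-BorrowedTimestamp-index",  # hypothetical GSI name
        # Both parts are key conditions on the GSI, so DynamoDB reads only
        # the items in the matching range of the index.
        "KeyConditionExpression": "UserId = :uid AND BorrowedTimestamp < :cutoff",
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":cutoff": {"S": cutoff},
        },
    }


gsi_params = overdue_gsi_query_params(
    "user-1", datetime(2024, 1, 10, tzinfo=timezone.utc)
)
```

A FilterExpression on HasReturned could still be added on top if the index projects that attribute.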
Fewer RCUs sounds good, but remember that with provisioned capacity you have to purchase throughput for the table and for the GSI separately. This can be less efficient because you can't share purchased throughput between queries that use your primary key and queries that use the GSI.
Finally, if you didn't want to specify a UserId at all, you would use a Scan. Scans often don't scale well because they always read every item in the table, but whether one works for you really depends on a lot of things (like how often you will run the scan, how many items you will have in the table, etc.).

Related

DynamoDB Best practice to select all items from a table with pagination (Without PK)

I simply want to get a paginated list of products back from my table. The pagination part is relatively clear with last_evaluated_key, but all the examples query on the PK or SK, whereas I just want paginated results sorted by createdAt.
My product id (a unique UUID) is not very useful in this case. Is the only remaining option to scan the whole table?
Yes, you will use Scan. DynamoDB has two types of read operation, Query and Scan. You can Query for one-and-only-one Partition Key (and optionally a range of Sort Key values if your table has a compound primary key). Everything else is a Scan.
Scan operations read every item in the table, returning at most 1 MB per request, optionally filtered. Filters are applied after the read. Results are unsorted.
The SDKs have pagination helpers like paginateScan to make life easier.
Re: Cost. Ask yourself: "Is Scan returning a lot of data (in MB) that I don't actually need?" If the answer is "No", you are fine. The more you are overfetching, however, the greater the cost benefit of Query over Scan.
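For illustration, a paginated Scan can be sketched as a loop that feeds each page's LastEvaluatedKey back in as ExclusiveStartKey, then sorts client-side because Scan results are unsorted. The FakeClient below is a stand-in invented for this example so the loop can run without AWS; with boto3 you would pass boto3.client("dynamodb") instead. The Products table name and item shape are assumptions.

```python
def scan_all(client, **scan_kwargs):
    """Yield every item from a Scan, following LastEvaluatedKey across pages."""
    while True:
        page = client.scan(**scan_kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break  # no more pages
        scan_kwargs["ExclusiveStartKey"] = last_key


class FakeClient:
    """Hypothetical stand-in for a DynamoDB client; serves two canned pages."""

    def __init__(self):
        self.pages = [
            {
                "Items": [{"id": {"S": "b"}, "createdAt": {"S": "2024-02-01"}}],
                "LastEvaluatedKey": {"id": {"S": "b"}},
            },
            {"Items": [{"id": {"S": "a"}, "createdAt": {"S": "2024-01-01"}}]},
        ]
        self.calls = []

    def scan(self, **kwargs):
        self.calls.append(dict(kwargs))
        return self.pages[len(self.calls) - 1]


client = FakeClient()
# Scan returns items in no useful order, so sort by createdAt afterwards.
products = sorted(
    scan_all(client, TableName="Products"),
    key=lambda item: item["createdAt"]["S"],
)
```

Sorting after the fact means the whole table has to be read before the first sorted page can be shown, which is the overfetching cost described above.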

DynamoDB BatchGetItem vs Query

Currently I use table.query to get items matching a partition key, sorted by sort key. The new requirement is to handle a batch query: a couple of hundred partition keys to match, ideally still sorted by sort key within each partition key's results. I found BatchGetItem, which can handle up to 100 items per request, but it looks like there is no sorting. Is one item here one row in DDB, or all rows in one partition key?
From a performance (query speed) and price perspective, which one should I use? And do I have to sort the results myself if I use BatchGetItem? Ideally I'd like a solution that is fast, cost effective, and sorted by sort key within each partition key, but the first two are top priority and I can do the sorting if I have to. Thanks
Query() is cheaper...
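BatchGetItem() runs as individual GetItem() calls, each costing 1 RCU (assuming your item is no larger than 4 KB and the read is strongly consistent).
Let's say your items are 100 bytes each: Query() can return 40 of them for 1 RCU, because it adds up the item sizes before rounding up to the next 4 KB, whereas returning the same 40 via BatchGetItem() will cost 40 RCU.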
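The arithmetic behind that comparison can be sketched as below, assuming strongly consistent reads, where one RCU covers up to 4 KB. The item sizes are hypothetical numbers chosen for this example: forty 100-byte items that all live under one partition key.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one strongly consistent read unit covers up to 4 KB


def query_rcu(item_sizes):
    """Query sums the sizes of all items read, then rounds up to 4 KB blocks."""
    return math.ceil(sum(item_sizes) / RCU_BLOCK_BYTES)


def batch_get_rcu(item_sizes):
    """BatchGetItem rounds each item up to a 4 KB block on its own."""
    return sum(math.ceil(size / RCU_BLOCK_BYTES) for size in item_sizes)


sizes = [100] * 40  # hypothetical: forty 100-byte items in one partition
print(query_rcu(sizes), batch_get_rcu(sizes))  # prints "1 40"
```

The gap narrows as items approach 4 KB each, since both operations then round up to roughly one unit per item.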

DynamoDB NOT EQUALS on GSI sort key

As the title suggests, I'm in a situation where I need to fetch all records from a DynamoDB table GSI, given that I know the hash key and I know the sort key value that I want to avoid.
The table looks like this:
Id - Primary Key,
AId - GSI hash key,
BId - GSI sort key
I need an efficient query to get records by a query like this
AId = 1 and BId != 2.
DynamoDB doesn't support <> operator when querying on hash and sort keys, it's only present on filter expressions, but those are not allowed on any of the primary key fields either.
So what would be the solution here? Scanning is probably not a good idea, unless it would be possible to scan on a partition, but that doesn't seem to be supported either.
So the only solution that is obvious to me at this point is querying by the partition key and then filtering it out client side.
Assuming that your sort key is actually numeric, as shown in your example, your best option would be to issue two separate queries:
AId = 1 and BId < 2
AId = 1 and BId > 2
Actually, as I write this, I think it would work regardless of the type of the sort key.
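The two-query idea could be sketched like this. The table and index names are hypothetical, and BId is assumed to be numeric as in the example; the results of both queries would be concatenated on the client.

```python
def not_equal_queries(a_id, excluded_b_id):
    """Build two GSI queries that together cover every BId except one value."""
    base = {
        "TableName": "Records",        # hypothetical table name
        "IndexName": "AId-BId-index",  # hypothetical GSI name
    }
    queries = []
    for op in ("<", ">"):
        queries.append({
            **base,
            "KeyConditionExpression": f"AId = :a AND BId {op} :b",
            "ExpressionAttributeValues": {
                ":a": {"N": str(a_id)},
                ":b": {"N": str(excluded_b_id)},  # assumption: numeric sort key
            },
        })
    return queries


below, above = not_equal_queries(1, 2)
```

Each dict could be passed to a boto3 client's query call; neither query reads the excluded item, so nothing has to be filtered out afterwards.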

dynamodb query to select all items that match a set of values

In a DynamoDB table I would like to select all items where an attribute's value matches one of a set of values. For example, my table has a current_status attribute, so I would like all items that have either a 'NEW' or an 'ASSIGNED' value.
If I apply a GSI to the current_status attribute it looks like I have to do this in two queries? Or instead do a scan?
AWS recommends avoiding Scan where you can. Use it only when there is no other option and you have a fairly small amount of data.
You need to use a GSI here. Putting current_status directly in the PK of the GSI would result in a hot partition issue.
The right solution is to put a random number in the PK of the GSI, ranging from 0..N, where N is the number of shards you choose. Put the status in the SK of the GSI, followed by a timestamp or some other distinguishing information. Then, when you want to query by current_status, execute N queries in parallel, one per PK value, each with SK begins_with current_status. N should be decided based on the amount of data you have. If each query reads less than 4 KB of data, this parallel query operation consumes about N read units in total, without the hot partition issue. The links below give the details:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-gsi-sharding.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
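A minimal sketch of that sharding scheme might look like the following. The attribute names gsi_pk and gsi_sk, the table and index names, and the "#" separator are all assumptions made for this example; shards are numbered 0 to N-1 here.

```python
import random


def sharded_gsi_keys(status, created_at, item_id, shard_count, rng=random):
    """Compute GSI key attributes for an item at write time."""
    return {
        # Random shard number spreads writes for one status across partitions.
        "gsi_pk": {"N": str(rng.randrange(shard_count))},
        # Status first, so begins_with can select on it.
        "gsi_sk": {"S": f"{status}#{created_at}#{item_id}"},
    }


def status_queries(status, shard_count):
    """Build one Query per shard; run them in parallel and merge the results."""
    return [
        {
            "TableName": "Tasks",               # hypothetical table name
            "IndexName": "status-shard-index",  # hypothetical GSI name
            "KeyConditionExpression": "gsi_pk = :shard AND begins_with(gsi_sk, :prefix)",
            "ExpressionAttributeValues": {
                ":shard": {"N": str(shard)},
                ":prefix": {"S": f"{status}#"},
            },
        }
        for shard in range(shard_count)
    ]


new_queries = status_queries("NEW", 4)
```

Fetching both 'NEW' and 'ASSIGNED' items this way means running the per-shard queries once for each status, so 2N queries in total.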

How to query dynamoDB without using hashKey

I have a dynamoDB table with two attributes:
A: primary partition key
B: primary sort key
I want to query this table using attribute B, since I don't know the value of A. Is it possible to do so?
Is it possible to use B as the key of a GSI (global secondary index), given that B is already a sort key? If so, how do I create the index and query the table using B?
You need the partition key to run a Query; you can't query on the sort key alone. Without the partition key, you can only Scan.
So, the only way out for you is to create a GSI with B as the partition-key.
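As an illustration, such an index and a query against it might look like this. The index name B-index, the capacity numbers, and B being a string attribute are assumptions for the sketch; the definition is written in the shape that CreateTable accepts inside its GlobalSecondaryIndexes list.

```python
# Hypothetical index definition with B as the GSI partition key.
B_INDEX = {
    "IndexName": "B-index",
    "KeySchema": [{"AttributeName": "B", "KeyType": "HASH"}],
    "Projection": {"ProjectionType": "ALL"},
    # Only relevant for provisioned-mode tables; the numbers are placeholders.
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}


def query_by_b_params(table_name, b_value):
    """Build Query parameters that look items up by B through the GSI."""
    return {
        "TableName": table_name,
        "IndexName": B_INDEX["IndexName"],
        # A name placeholder sidesteps any clash with reserved words.
        "KeyConditionExpression": "#b = :b",
        "ExpressionAttributeNames": {"#b": "B"},
        "ExpressionAttributeValues": {":b": {"S": b_value}},  # assumption: B is a string
    }


by_b = query_by_b_params("MyTable", "some-b-value")
```

Because several items can share the same value of B, the query may return more than one item per lookup.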
Update
Yes, you can use the table's range key as the partition key of a GSI.
The drawbacks of using a GSI are:
The number of GSIs per table is limited (the default quota is now 20 rather than 5), so choose wisely what you index. Unlike a local secondary index, which can only be specified during table creation, a GSI can be added to or deleted from an existing table.
A GSI will cost you additional money: with provisioned capacity you need to assign Provisioned Throughput to it separately from the table.
A GSI is eventually consistent, meaning that DynamoDB does not guarantee that data written to the table is immediately available for querying through the GSI. The documentation states that propagation is usually near-immediate, but it can take up to seconds for a newly written item to become visible in the index.
