How to query dynamoDB without using hashKey - amazon-dynamodb

I have a dynamoDB table with two attributes:
A: primary partition key
B: primary sort key
I want to query this table using attribute B since I don't know the value of A. Is it possible to do so?
Is it possible to make B the key of a GSI (global secondary index), and if so, how do I create it and query the table using B, given that B is already the sort key?

A Query needs the partition key; you can't query using the sort key alone. Without the partition key, your only option on the base table is a Scan.
So the way out for you is to create a GSI with B as its partition key.
Update
Yes, you can use the table's range (sort) key as the partition key of a GSI.
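A minimal sketch of both steps as plain request objects for the AWS SDK for JavaScript: an UpdateTable request that adds a GSI keyed on B, and a Query request against that index. The table name MyTable, the index name B-index and the string type for B are assumptions for illustration. The Query object uses the DocumentClient style (plain JS values); UpdateTable goes to the low-level client.

```javascript
// Hypothetical names: adjust to your own table and index.
const TABLE_NAME = "MyTable";
const INDEX_NAME = "B-index";

// UpdateTable request that adds a GSI whose partition key is the table's
// sort key B. Pass capacity only for provisioned tables; leave it out for
// on-demand (PAY_PER_REQUEST) tables.
function buildCreateIndexParams(readUnits, writeUnits) {
  const create = {
    IndexName: INDEX_NAME,
    KeySchema: [{ AttributeName: "B", KeyType: "HASH" }],
    Projection: { ProjectionType: "ALL" },
  };
  if (readUnits !== undefined && writeUnits !== undefined) {
    create.ProvisionedThroughput = {
      ReadCapacityUnits: readUnits,
      WriteCapacityUnits: writeUnits,
    };
  }
  return {
    TableName: TABLE_NAME,
    // Every attribute used in a key schema must be declared with its type.
    AttributeDefinitions: [{ AttributeName: "B", AttributeType: "S" }],
    GlobalSecondaryIndexUpdates: [{ Create: create }],
  };
}

// Query request (DocumentClient style, plain JS values) that finds items
// by B alone, through the index.
function buildQueryByBParams(bValue) {
  return {
    TableName: TABLE_NAME,
    IndexName: INDEX_NAME,
    KeyConditionExpression: "#b = :b",
    ExpressionAttributeNames: { "#b": "B" },
    ExpressionAttributeValues: { ":b": bValue },
  };
}

console.log(JSON.stringify(buildQueryByBParams("some-value"), null, 2));
```

With SDK v2 the query object could be passed to new AWS.DynamoDB.DocumentClient().query(params).promise(); with SDK v3, to a QueryCommand from @aws-sdk/lib-dynamodb, and the UpdateTable object to an UpdateTableCommand from @aws-sdk/client-dynamodb. Because B was only unique together with A, one B value can match several items in the index, and the Query returns all of them (paginated).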
The drawbacks of using a GSI are:
The number of GSIs per table is limited (by default 20, an adjustable quota; it used to be 5), so choose carefully what you need to index. A GSI can be added to or deleted from an existing table with UpdateTable, but building one on a large table takes time.
A GSI costs extra: every table write that touches the index's attributes is also written to the index, and on a provisioned-capacity table you must assign throughput to the GSI separately.
GSIs are eventually consistent: DynamoDB does not guarantee that an item written to the table is visible through the GSI the moment the write succeeds. The documentation states that propagation is usually almost immediate, but it can take up to a few seconds, and a Query on a GSI cannot request a strongly consistent read.

Related

Fetch last item of the aws dynamodb table

I want to fetch the last item (row) of my DynamoDB table, but I can't find any resources on how to do it. My primary key is id, which holds an incrementing number for each row: 1, 2, 3, ...
This is my function.
async function readMessage(){
const params = {
TableName: table,
};
return dynamo.getItem(params).promise();
}
I'm not sure what I should add to my params.
DynamoDB has two types of primary keys:
Partition key – A simple primary key, composed of one attribute known as the partition key.
Partition key and sort key – Referred to as a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.
When fetching an item by partition key, you need to specify the exact partition key. You cannot fetch the max/min partition key.
Instead, you may want to create a sort key with a timestamp (or the ID if it's a sequential number) and use the sort key to fetch the last item.
Check out the AWS docs on Choosing the Right Partition Key for more info.
The proper way to design a DynamoDB table is based on its expected access patterns. If this is something you need, consider using id as the sort key instead of the partition key, with the same partition key value on every item; you can then query the table in descending order (ScanIndexForward set to false) while limiting the result to 1 item (Limit set to 1).
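A minimal sketch of that design, assuming a hypothetical table whose partition key pk holds the same value (here "MESSAGE") on every item and whose numeric sort key is id. It builds the Query request and includes a small local stand-in that mimics what DynamoDB does with it. Note that a single shared partition key value sends all traffic for these items to one partition, which may matter at high volume.

```javascript
// Hypothetical schema: partition key "pk" (same value on every item) and
// numeric sort key "id".
function buildLatestItemQuery(tableName, pkValue) {
  return {
    TableName: tableName,
    KeyConditionExpression: "#pk = :pk",
    ExpressionAttributeNames: { "#pk": "pk" },
    ExpressionAttributeValues: { ":pk": pkValue },
    ScanIndexForward: false, // descending sort-key order: highest id first
    Limit: 1, // stop after the first (latest) item
  };
}

// Local stand-in for the request above, for reasoning offline: keep the
// items in the partition, sort by id descending, return the first one.
function simulateLatest(items, pkValue) {
  const matches = items.filter((item) => item.pk === pkValue);
  matches.sort((a, b) => b.id - a.id);
  return matches.slice(0, 1);
}

const items = [
  { pk: "MESSAGE", id: 1, text: "first" },
  { pk: "MESSAGE", id: 3, text: "third" },
  { pk: "MESSAGE", id: 2, text: "second" },
];
console.log(simulateLatest(items, "MESSAGE")[0].text); // prints third
```

Limit is applied before any FilterExpression, so adding a filter to this request could return zero items even when a matching item exists further down.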
If instead you don't want to change the schema of your items and don't mind making at least two requests, you have two options, neither of them optimal:
If none of your items ever gets deleted, first count the items (for example with a Scan using Select: COUNT, which still reads the whole table) and use that count to know which item was written last.
Alternatively, you could keep a "special" record in your DynamoDB table that acts as a counter and is incremented every time one of your other items is written. On retrieval, you first read this special record and then use its value to fetch the actual latest item.
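The counter-record option could look roughly like this, as request objects in DocumentClient style. The counter item's key (id 0, on the assumption that real ids start at 1) and its attribute name latestId are hypothetical choices for illustration.

```javascript
// Hypothetical layout: the counter is an ordinary item whose key is id = 0
// (real ids start at 1) and whose "latestId" attribute holds the highest id.
const COUNTER_KEY = { id: 0 };

// UpdateItem request: ADD atomically increments the number (creating it if
// it doesn't exist yet) and UPDATED_NEW returns the new value.
function buildIncrementCounterParams(tableName) {
  return {
    TableName: tableName,
    Key: COUNTER_KEY,
    UpdateExpression: "ADD #latest :one",
    ExpressionAttributeNames: { "#latest": "latestId" },
    ExpressionAttributeValues: { ":one": 1 },
    ReturnValues: "UPDATED_NEW",
  };
}

// Read path, step 1: fetch the counter item.
function buildReadCounterParams(tableName) {
  return { TableName: tableName, Key: COUNTER_KEY };
}

// Read path, step 2: use the counter's value to fetch the latest item.
function buildGetLatestParams(tableName, counterItem) {
  return { TableName: tableName, Key: { id: counterItem.latestId } };
}
```

Here writing a new item and incrementing the counter are two separate requests; if they must never drift apart, wrapping both in a single TransactWriteItems call is one option.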
Together, the partition key and the sort key make up an item's primary key in DynamoDB, so each combination must be unique; otherwise the existing item will be overwritten.
In almost all my use cases, I choose an attribute of the object as the partition key, such as the brand, an email or a class, and a timestamp as the sort key. That way you always know the partition key, which you need to retrieve items, and you can then query your DynamoDB table with conditions on the sort key. For more extensive examples using Python, see the AWS page https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.04.html, which shows how to query DynamoDB items.
There are also other ways to define the keys of your table; for those, I advise you to check https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html
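A minimal sketch of the partition-key-plus-timestamp pattern, assuming a hypothetical table events with partition key brand and an ISO-8601 string sort key named timestamp. ISO-8601 strings written in the same format and time zone sort lexicographically in time order, which is what makes a BETWEEN condition on them work. The local simulation mirrors BETWEEN's inclusive bounds.

```javascript
// Hypothetical table "events": partition key "brand", sort key "timestamp"
// (ISO-8601 strings). Placeholders avoid clashes with reserved words.
function buildRangeQuery(tableName, brand, fromIso, toIso) {
  return {
    TableName: tableName,
    KeyConditionExpression: "#pk = :pk AND #ts BETWEEN :from AND :to",
    ExpressionAttributeNames: { "#pk": "brand", "#ts": "timestamp" },
    ExpressionAttributeValues: { ":pk": brand, ":from": fromIso, ":to": toIso },
  };
}

// Local stand-in: same partition, inclusive bounds, ascending sort-key order
// (the default order of a Query).
function simulateRange(items, brand, fromIso, toIso) {
  return items
    .filter((i) => i.brand === brand && i.timestamp >= fromIso && i.timestamp <= toIso)
    .sort((a, b) => (a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0));
}
```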

Querying DynamoDb timestamp data in range

I want to migrate my data from DynamoDB to Redshift. I don't want to scan the whole table at once, as this might result in throttling.
My table is as below:
accountId (hash key), lastUpdatedTime.
I thought I could create a GSI on lastUpdatedTime and then query for, say, the data between day 1 and day 5, and on the next run the data between day 6 and day 7.
But even with a GSI, my understanding is that it will scan the whole table, as I won't have any hash key to provide; I just have a range of timestamps to query.
Creating a GSI is indeed the right solution. However, creating the GSI can be slow and expensive if the index projects all attributes. I would recommend creating the GSI on lastUpdatedTime with the KEYS_ONLY projection, so that it holds only the table's partition key (and sort key, if you have one). Then, when you scan the index, you retrieve only the item keys, and you can fetch each full item by its key as you migrate it.
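A rough sketch of that migration flow as request objects: define the KEYS_ONLY index, scan it page by page, and fetch the full items with BatchGetItem, which accepts at most 100 keys per request. The table name accounts, the index name lastUpdatedTime-index and the numeric type of lastUpdatedTime are assumptions for illustration.

```javascript
// Hypothetical names: table "accounts", index "lastUpdatedTime-index",
// lastUpdatedTime stored as a number (for example epoch milliseconds).
const INDEX_NAME = "lastUpdatedTime-index";
const MAX_BATCH_GET_KEYS = 100; // BatchGetItem limit per request

// UpdateTable request: a KEYS_ONLY GSI holds just the table key
// (accountId) and the index key (lastUpdatedTime).
function buildKeysOnlyIndexParams(tableName) {
  return {
    TableName: tableName,
    AttributeDefinitions: [{ AttributeName: "lastUpdatedTime", AttributeType: "N" }],
    GlobalSecondaryIndexUpdates: [{
      Create: {
        IndexName: INDEX_NAME,
        KeySchema: [{ AttributeName: "lastUpdatedTime", KeyType: "HASH" }],
        Projection: { ProjectionType: "KEYS_ONLY" },
      },
    }],
  };
}

// One page of a Scan over the small index; pass the previous page's
// LastEvaluatedKey to continue, and Limit to cap the items read per call.
function buildIndexScanParams(tableName, exclusiveStartKey, limit) {
  const params = { TableName: tableName, IndexName: INDEX_NAME };
  if (exclusiveStartKey !== undefined) params.ExclusiveStartKey = exclusiveStartKey;
  if (limit !== undefined) params.Limit = limit;
  return params;
}

// BatchGetItem keys must be exactly the table's primary key, so drop the
// index key that KEYS_ONLY index items also carry.
function toTableKeys(indexItems) {
  return indexItems.map((item) => ({ accountId: item.accountId }));
}

// Split the keys into BatchGetItem requests of at most 100 keys each.
function buildBatchGetRequests(tableName, keys) {
  const requests = [];
  for (let i = 0; i < keys.length; i += MAX_BATCH_GET_KEYS) {
    requests.push({
      RequestItems: { [tableName]: { Keys: keys.slice(i, i + MAX_BATCH_GET_KEYS) } },
    });
  }
  return requests;
}
```

A BatchGetItem response may contain UnprocessedKeys, which need to be sent again (preferably with backoff); the sketch leaves that retry loop out. Setting Limit on each index scan page and pausing between pages is one way to keep the migration from throttling the table.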
I recommend reading up on GSIs here: https://docs.aws.amazon.com/fr_fr/amazondynamodb/latest/developerguide/GSI.html

DynamoDB Query - GSI

Say I had a DynamoDB table:
UserId: S
BookName: S
BorrowedTimestamp: S
HasReturned: B
UserId (partition) and BookName (range) would be keys on the base table.
However, I want to query using the other, non-key fields, e.g. BorrowedTimestamp more than 3 days ago and HasReturned is false.
I think I'd need to setup a GSI for this query to work, but it doesn't sound right having a binary field, HasReturned, as the partition key (with BorrowedTimestamp as range key). Is that correct, or am I missing something?
No, you don't need a GSI, but it might be more efficient depending on your circumstances.
Let's take your example of BorrowedTimestamp > 3 days. I'm going to assume this is for a particular user, so you have a userid to query.
You could run a Query with a KeyConditionExpression on userid and a FilterExpression of BorrowedTimestamp > 3 days. Let's say the user has 10 books and 2 of them have a BorrowedTimestamp > 3 days. This query consumes read capacity for all 10 items, not just the 2 it returns (10 RCUs if each item is about 4 KB and the read is strongly consistent). That's because a FilterExpression only filters items out of the result set after the read: DynamoDB still had to read all 10 items.
Now let's say you have a GSI whose partition key is userid and whose range key is BorrowedTimestamp. Your KeyConditionExpression can then specify both the userid partition key and the BorrowedTimestamp > 3 days condition on the range key. The result would be exactly the same. However, this time the query reads only the 2 matching items, so it consumes capacity for 2 items instead of 10, and that capacity comes from the index, not the table.
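A sketch of the two requests side by side, plus the read-capacity arithmetic. The table name Loans, the index name UserId-BorrowedTimestamp-index and the ISO-8601 timestamp format are assumptions; with ISO strings, "borrowed more than 3 days ago" becomes BorrowedTimestamp < cutoff. The capacity function follows the documented Query rule: sum the sizes of the items read, round up to the next 4 KB, charge 1 RCU per 4 KB for strongly consistent reads and half that for eventually consistent ones (the only kind a GSI supports).

```javascript
// "Borrowed more than 3 days ago" as an ISO-8601 cutoff string.
function threeDaysAgoIso(now) {
  return new Date(now.getTime() - 3 * 24 * 60 * 60 * 1000).toISOString();
}

// Option 1: key condition on UserId only; the timestamp test is a filter,
// so every item for the user is read (and paid for) before filtering.
function buildFilterQuery(userId, cutoff) {
  return {
    TableName: "Loans",
    KeyConditionExpression: "UserId = :u",
    FilterExpression: "BorrowedTimestamp < :cutoff",
    ExpressionAttributeValues: { ":u": userId, ":cutoff": cutoff },
  };
}

// Option 2: the hypothetical GSI's sort key is BorrowedTimestamp, so the
// range test is part of the key condition and only matching items are read.
function buildIndexQuery(userId, cutoff) {
  return {
    TableName: "Loans",
    IndexName: "UserId-BorrowedTimestamp-index",
    KeyConditionExpression: "UserId = :u AND BorrowedTimestamp < :cutoff",
    ExpressionAttributeValues: { ":u": userId, ":cutoff": cutoff },
  };
}

// RCUs consumed by one Query, given the sizes of the items it reads.
function queryReadUnits(itemSizesBytes, stronglyConsistent) {
  const total = itemSizesBytes.reduce((sum, size) => sum + size, 0);
  const units = Math.ceil(total / 4096);
  return stronglyConsistent ? units : units / 2;
}

// 10 items of 4 KB each, of which 2 match; eventually consistent reads.
console.log(queryReadUnits(new Array(10).fill(4096), false)); // 5
console.log(queryReadUnits(new Array(2).fill(4096), false)); // 1
```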
Fewer RCUs sounds good, but remember that (with provisioned capacity) you purchase throughput for your base table and for each GSI separately. This can be less efficient, because purchased throughput can't be shared between queries that use the table's primary key and queries that use the GSI.
Finally, if you didn't want to specify a userid at all, you would use a Scan. Scans sometimes don't scale well because they always read every item in the table, but whether one works for you depends on many things (such as how often you run the scan and how many items the table holds).

AWS DynamoDB Query based on non-primary keys

I'm new to AWS DynamoDB and wanted to clarify something. Is it possible to query a table and filter based on a non-primary-key attribute? My table looks like the following:
Store
Id: PrimaryKey
Name: simple string
Location: simple string
Now I want to query on Name, but from what I know, I have to give the key as well? Apart from that, I could use a scan, but then I would be loading all the data.
From the docs:
The Query operation finds items based on primary key values. You can query any table or secondary index that has a composite primary key (a partition key and a sort key).
DynamoDB requires queries to always use the partition key.
In your case your options are:
create a Global Secondary Index that uses Name as its partition key
use a Scan + Filter if the table is relatively small, or if you expect the result set will include the majority of the records in the table
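Both options as request objects, assuming a hypothetical GSI named Name-index whose partition key is Name. NAME is on DynamoDB's reserved-word list, so the attribute is referenced through the placeholder #n. A Scan returns at most 1 MB of data per page (measured before the filter is applied), so the scan sketch accepts the previous page's LastEvaluatedKey to continue.

```javascript
// Option 1: Query the hypothetical GSI "Name-index" keyed on Name.
function buildQueryByNameParams(name) {
  return {
    TableName: "Store",
    IndexName: "Name-index",
    KeyConditionExpression: "#n = :name",
    ExpressionAttributeNames: { "#n": "Name" },
    ExpressionAttributeValues: { ":name": name },
  };
}

// Option 2: Scan + filter. Every item is read (and paid for); the filter
// only decides which ones are returned.
function buildScanByNameParams(name, exclusiveStartKey) {
  const params = {
    TableName: "Store",
    FilterExpression: "#n = :name",
    ExpressionAttributeNames: { "#n": "Name" },
    ExpressionAttributeValues: { ":name": name },
  };
  if (exclusiveStartKey !== undefined) params.ExclusiveStartKey = exclusiveStartKey;
  return params;
}
```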
There are a few design principles that you can follow while using DynamoDB. If you are coming from a relational background, you have probably already run into the query limitations imposed by the primary key attributes.
Design your tables for querying, and separate hot and cold data.
Create indexes for querying on non-key attributes. You have two options: a Global Secondary Index, which you can define at any time, and a Local Secondary Index, which you must specify at table creation time.
With a Global Secondary Index, you can promote any non-key attribute to the index's partition key and choose another attribute as its sort key for querying. With a Local Secondary Index, you can promote any non-key attribute to the sort key while keeping the table's partition key.
Using indexes for queries is also important for using provisioned throughput efficiently.
Although indexes cost extra throughput (every write to an indexed attribute is also written to the index), they can also save read throughput: if you project just the attributes you need to read, queries against the index read much less data. Check the following example.
Let's say you have a DynamoDB table with items of 40 KB. If you read 10 items directly from the table to build a list, that consumes 100 read capacity units (one unit reads 4 KB, so each item needs 10 units, times 10 items). If you instead define an index that projects only the attributes needed for the list, about 4 KB per item, the same read consumes only 10 read capacity units (one per item), which makes a huge difference in cost. (These figures assume strongly consistent reads; eventually consistent reads cost half in both cases.)
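The arithmetic behind that example, written out for strongly consistent reads (one RCU per 4 KB, rounded up to the next 4 KB); eventually consistent reads halve both figures, so the 10:1 ratio holds either way:

```latex
\begin{aligned}
\text{table:}\quad & 10 \times \left\lceil \frac{40\ \text{KB}}{4\ \text{KB}} \right\rceil = 10 \times 10 = 100\ \text{RCU} \\
\text{index:}\quad & 10 \times \left\lceil \frac{4\ \text{KB}}{4\ \text{KB}} \right\rceil = 10 \times 1 = 10\ \text{RCU}
\end{aligned}
```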
With DynamoDB, it's really important how you define indexes: optimize them not only for query capability but also for throughput.
You cannot query on a non-primary-key attribute in DynamoDB.
If you still want to do that, you can use a Scan, but a Scan is a costly operation in DynamoDB and is not recommended for large tables: it reads every item in the table, which hurts performance, and AWS charges you for every item it reads.
There are two ways to achieve it:
Keep the store Id as the partition key of your DynamoDB table and add Name or Location as the sort key (only one of them, as DynamoDB accepts only one attribute as the sort key by design).
Create Global Secondary Indexes on the non-key attributes you query most frequently.
A GSI in DynamoDB can use one of three projection types. In your case, choose INCLUDE so that Name, Location and the store Id are all available in the index.
KEYS_ONLY – Each item in the index consists only of the table partition key and sort key values, plus the index key values. The KEYS_ONLY option results in the smallest possible secondary index.
INCLUDE – In addition to the attributes described in KEYS_ONLY, the secondary index will include other non-key attributes that you specify.
ALL – The secondary index includes all of the attributes from the source table. Because all of the table data is duplicated in the index, an ALL projection results in the largest possible secondary index.
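A sketch of adding such an index to the existing table with the INCLUDE projection, assuming a hypothetical on-demand table Store (partition key Id) and a new index Name-index. The table key Id and the index key Name are projected automatically, so only Location needs to be listed.

```javascript
// Projection object for each of the three types; only INCLUDE takes a list.
function buildProjection(type, nonKeyAttributes) {
  if (type === "INCLUDE") {
    return { ProjectionType: "INCLUDE", NonKeyAttributes: nonKeyAttributes };
  }
  // KEYS_ONLY and ALL take no attribute list.
  return { ProjectionType: type };
}

// UpdateTable request creating the hypothetical GSI "Name-index" on an
// on-demand table (so no ProvisionedThroughput is given).
function buildNameIndexParams(projection) {
  return {
    TableName: "Store",
    AttributeDefinitions: [{ AttributeName: "Name", AttributeType: "S" }],
    GlobalSecondaryIndexUpdates: [{
      Create: {
        IndexName: "Name-index",
        KeySchema: [{ AttributeName: "Name", KeyType: "HASH" }],
        Projection: projection,
      },
    }],
  };
}

const params = buildNameIndexParams(buildProjection("INCLUDE", ["Location"]));
console.log(JSON.stringify(params.GlobalSecondaryIndexUpdates[0].Create.Projection));
// prints {"ProjectionType":"INCLUDE","NonKeyAttributes":["Location"]}
```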

Creating Dynamodb table, 3 search columns apart from partition key, is it possible?

Hi, I have created a DynamoDB table, but I get errors when I try to perform table.GetItem with only the username (image attached).
I found that this table is poorly designed, so I thought of recreating it. My question is how to set attributes, local secondary indexes and global secondary indexes for a table with one primary key and 3 search columns.
or
Is it possible to have 3 more search columns (User_email, Username, Usertype) apart from the partition key column (user_ID) in DynamoDB?
For a table with a composite primary key, the GetItem API requires both the partition key and the sort key. However, you can use the Query API with only the partition key value; the sort key is not mandatory for Query.
GetItem rule:
For the primary key, you must provide all of the attributes. For example, with a simple primary key, you only need to provide a value for the partition key. For a composite primary key, you must provide values for both the partition key and the sort key.
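The difference between the two calls, sketched as DocumentClient-style request objects for a hypothetical Users table with partition key user_ID and sort key Username. The isCompleteKey helper is an illustrative check of the GetItem rule, not an SDK function.

```javascript
// Hypothetical composite key of the Users table.
const KEY_SCHEMA = ["user_ID", "Username"];

// Illustrative check of the GetItem rule: the Key must contain every
// primary-key attribute and nothing else.
function isCompleteKey(key) {
  const names = Object.keys(key);
  return names.length === KEY_SCHEMA.length && KEY_SCHEMA.every((n) => names.includes(n));
}

// GetItem names the complete primary key...
function buildGetItemParams(userId, username) {
  return { TableName: "Users", Key: { user_ID: userId, Username: username } };
}

// ...whereas Query needs only the partition key; the sort key is optional.
function buildQueryByUserIdParams(userId) {
  return {
    TableName: "Users",
    KeyConditionExpression: "user_ID = :id",
    ExpressionAttributeValues: { ":id": userId },
  };
}
```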
You can define up to 5 local secondary indexes per table and, by default, up to 20 global secondary indexes (an adjustable quota).
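For the question as asked, one answer is a table keyed on user_ID with one GSI per search column. A sketch of the CreateTable request, assuming a hypothetical table name Users, string attributes and on-demand billing (so no ProvisionedThroughput):

```javascript
// Hypothetical search columns, each getting its own GSI.
const SEARCH_COLUMNS = ["User_email", "Username", "Usertype"];

function buildUsersTableParams() {
  return {
    TableName: "Users",
    BillingMode: "PAY_PER_REQUEST",
    // Declare only attributes used as a key in the table or an index.
    AttributeDefinitions: ["user_ID", ...SEARCH_COLUMNS].map((name) => ({
      AttributeName: name,
      AttributeType: "S",
    })),
    KeySchema: [{ AttributeName: "user_ID", KeyType: "HASH" }],
    GlobalSecondaryIndexes: SEARCH_COLUMNS.map((name) => ({
      IndexName: `${name}-index`,
      KeySchema: [{ AttributeName: name, KeyType: "HASH" }],
      Projection: { ProjectionType: "ALL" },
    })),
  };
}
```

CreateTable accepts several GSIs at once, whereas adding them to an existing table takes one UpdateTable call per index. An index keyed only on Usertype has few distinct key values, which tends to concentrate its traffic on few partitions.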
An LSI is attached to a specific partition key value, whereas a GSI spans all partition key values. Since items with the same partition key value are stored together in DynamoDB, a "local" secondary index only covers items that are stored together (in the same partition). Thus, the purpose of an LSI is to query items that have the same partition key value but different sort key values. For example, consider a DynamoDB table that tracks orders for customers, where CustomerId is the partition key: an LSI with the order date as its sort key would let you query one customer's orders by date.
With a local secondary index, there is a limit on item collection sizes: for every distinct partition key value, the total sizes of all table and index items cannot exceed 10 GB. This might constrain the number of sort keys per partition key value.
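Taking the Orders example, a sketch of a CreateTable request with an LSI, plus a Query that uses it. The sort key OrderId, the LSI sort key OrderDate (ISO-8601 strings), the index name and on-demand billing are all assumptions for illustration. An LSI requires a table with a composite primary key and must be defined at creation time.

```javascript
// Hypothetical Orders table: partition key CustomerId, sort key OrderId,
// plus an LSI that re-sorts each customer's items by OrderDate.
function buildOrdersTableParams() {
  return {
    TableName: "Orders",
    BillingMode: "PAY_PER_REQUEST",
    AttributeDefinitions: [
      { AttributeName: "CustomerId", AttributeType: "S" },
      { AttributeName: "OrderId", AttributeType: "S" },
      { AttributeName: "OrderDate", AttributeType: "S" },
    ],
    KeySchema: [
      { AttributeName: "CustomerId", KeyType: "HASH" },
      { AttributeName: "OrderId", KeyType: "RANGE" },
    ],
    LocalSecondaryIndexes: [{
      IndexName: "OrderDate-index",
      KeySchema: [
        { AttributeName: "CustomerId", KeyType: "HASH" }, // same partition key as the table
        { AttributeName: "OrderDate", KeyType: "RANGE" },
      ],
      Projection: { ProjectionType: "ALL" },
    }],
  };
}

// One customer's orders placed on or after a given date, via the LSI.
function buildOrdersSinceParams(customerId, sinceIso) {
  return {
    TableName: "Orders",
    IndexName: "OrderDate-index",
    KeyConditionExpression: "CustomerId = :c AND OrderDate >= :since",
    ExpressionAttributeValues: { ":c": customerId, ":since": sinceIso },
  };
}
```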

Resources