How to fetch multiple rows from DynamoDB using a non-primary key

select * from tableName where columnName="value";
How can I fetch a similar result from DynamoDB using Java without using the primary key as my attribute? (I need to group data based on the value of a particular column.)
I have gone through articles about BatchGetItem and QuerySpec, but all of them require me to pass the primary key.
Can someone give a lead here?

The short answer is that you can't. Query always needs the partition key of the table or index you are querying, and GetItem needs the table's full primary key.
You have two options:
Perform a Scan operation on the table and filter by columnName="value". However, this requires DynamoDB to read every item in the table, so it is likely to be slow and expensive.
Add a Global Secondary Index to your table. This requires you to define a primary key for the index that contains the columnName you want to query.
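The Scan option can be sketched as follows. This is a minimal sketch, not a definitive implementation: it is written in JavaScript against the AWS SDK v2 DocumentClient (an assumption; the question asks about Java, where the request carries the same fields), and `scanAll` and `buildScanParams` are hypothetical helper names. The client is passed in so that the parameter builder can be exercised without AWS access.

```javascript
// Assumed setup (AWS SDK for JavaScript v2):
//   const AWS = require("aws-sdk");
//   const docClient = new AWS.DynamoDB.DocumentClient();

// Build Scan parameters roughly equivalent to:
//   select * from tableName where columnName = "value"
function buildScanParams(tableName, columnName, value, startKey) {
  const params = {
    TableName: tableName,
    FilterExpression: "#col = :val",
    // A name placeholder avoids clashes with DynamoDB reserved words.
    ExpressionAttributeNames: { "#col": columnName },
    ExpressionAttributeValues: { ":val": value },
  };
  if (startKey) {
    params.ExclusiveStartKey = startKey; // resume after the previous page
  }
  return params;
}

// Read every page of the scan; each call returns at most 1 MB of data.
async function scanAll(docClient, tableName, columnName, value) {
  const items = [];
  let startKey;
  do {
    const page = await docClient
      .scan(buildScanParams(tableName, columnName, value, startKey))
      .promise();
    items.push(...page.Items);
    startKey = page.LastEvaluatedKey; // undefined once the table is exhausted
  } while (startKey);
  return items;
}
```

The filter is applied after the items are read, so the scan still consumes read capacity for the whole table.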

Fetch the last item of an AWS DynamoDB table

I want to fetch the last item/row of my DynamoDB table, but I am not finding any resources. My primary key is id, a series of incrementing numbers such as 1, 2, 3... for each row respectively.
This is my function.
async function readMessage() {
  const params = {
    TableName: table,
  };
  return dynamo.getItem(params).promise();
}
I am not sure what I should be adding to my params.
DynamoDB has two types of primary keys:
Partition key – A simple primary key, composed of one attribute known as the partition key.
Partition key and sort key – Referred to as a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.
When fetching an item by partition key, you need to specify the exact partition key. You cannot fetch the max/min partition key.
Instead, you may want to create a sort key with a timestamp (or the ID if it's a sequential number) and use the sort key to fetch the last item.
Check out the AWS docs on Choosing the Right Partition Key for more info.
The proper way to design a DynamoDB table is around its expected access patterns. If this is something you need, consider using this id as the sort key instead of the partition key, then query the table in descending order while limiting the number of items to 1.
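A descending query limited to one item might look like the sketch below. It assumes the table has been redesigned so that a known partition key groups the messages and the sequential id (or a timestamp) is the sort key. The attribute name `channel` and the value `general` are hypothetical, and the client is assumed to be an AWS SDK v2 DocumentClient.

```javascript
// Build Query parameters that return only the item with the highest sort key
// inside one partition.
function buildLatestItemParams(tableName, partitionKeyName, partitionKeyValue) {
  return {
    TableName: tableName,
    KeyConditionExpression: "#pk = :pk",
    ExpressionAttributeNames: { "#pk": partitionKeyName },
    ExpressionAttributeValues: { ":pk": partitionKeyValue },
    ScanIndexForward: false, // traverse the sort key in descending order
    Limit: 1, // stop after the first (that is, the latest) item
  };
}

// docClient is assumed to be an AWS.DynamoDB.DocumentClient instance.
async function readLatestMessage(docClient, tableName) {
  const result = await docClient
    .query(buildLatestItemParams(tableName, "channel", "general"))
    .promise();
  return result.Items[0]; // undefined when the partition is empty
}
```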
If you don't want to change the schema of your items and you don't mind making at least two operations, you have two suboptimal options:
If none of your items ever gets deleted, count the items first and use the count to work out the id of the latest item written. Keep in mind that an exact count itself requires scanning the whole table.
Alternatively, you could keep a "special" record in your DynamoDB table that is basically a counter, incremented whenever one of your "other" items gets written. On retrieval, you first read the value of this special record and then use it to fetch the actual item.
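The counter-record idea could be sketched like this, assuming the numeric `id` primary key from the question. The counter is stored under the reserved key `id: 0` in an attribute called `lastId`; both choices are hypothetical, as are the function names. The client is assumed to be an AWS SDK v2 DocumentClient.

```javascript
const COUNTER_KEY = { id: 0 }; // hypothetical key reserved for the counter record

// Parameters for an atomic increment that also returns the new counter value.
function buildIncrementParams(tableName) {
  return {
    TableName: tableName,
    Key: COUNTER_KEY,
    UpdateExpression: "ADD #last :one", // ADD creates the attribute if missing
    ExpressionAttributeNames: { "#last": "lastId" },
    ExpressionAttributeValues: { ":one": 1 },
    ReturnValues: "UPDATED_NEW",
  };
}

// Write a new message under the next id handed out by the counter.
async function writeMessage(docClient, tableName, message) {
  const counter = await docClient.update(buildIncrementParams(tableName)).promise();
  const id = counter.Attributes.lastId;
  await docClient.put({ TableName: tableName, Item: { ...message, id } }).promise();
  return id;
}

// Two reads: first the counter, then the item it points at.
async function readLatestMessage(docClient, tableName) {
  const counter = await docClient
    .get({ TableName: tableName, Key: COUNTER_KEY })
    .promise();
  if (!counter.Item) return undefined; // nothing has been written yet
  const latest = await docClient
    .get({ TableName: tableName, Key: { id: counter.Item.lastId } })
    .promise();
  return latest.Item;
}
```

Between the increment and the put there is a short window in which the counter points at an item that does not exist yet, so a reader should be prepared for an empty result.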
The combination of the partition key and the sort key makes up the primary key of your item in DynamoDB, so the combination must be unique; otherwise the existing item will be overwritten.
In almost all my use cases, I choose an attribute of the object as the partition key, such as the brand, an email, or a class, and a timestamp as the sort key. This way you always know the partition key, which is needed to retrieve the values, and you can then query DynamoDB by filtering on the sort key. For more extensive examples using Python, check the AWS page https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.04.html, which shows how you can query your DynamoDB items.
There are also other ways to define the keys in DynamoDB; for those I advise you to check https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html

How to future-proof a DynamoDB table design against these possible requirement changes (swapping primary key columns)?

I have the following data structure
item_id String
version String
_id String
data String
_id is simply a UUID to identify the item. There is no need to search for a row by this field yet.
As of now, item_id, an id generated by an external system, is the primary key, i.e. given the item_id, I want to be able to retrieve version, _id and data from the DynamoDB table.
item_id -> (version, _id, data)
Therefore I am setting item_id as the partition key.
I have two questions for future-proofing (evolution of) the above "schema":
In the future, if I want to incorporate version (version number of the item) into the primary key, can I just modify the table and add it to be the partition key?
If I also want to make the data searchable by _id, is it feasible to modify the table to assign _id to be the partition key (it is a unique value because it is a UUID) and reassign item_id to be a search key?
I want to avoid creating a new DynamoDB table and migrating data to get new key structures, because it may lead to downtime.
You cannot update primary keys in DynamoDB. From the docs:
You cannot use UpdateItem to update any primary key attributes. Instead, you will need to delete the item, and then use PutItem to create a new item with new attributes.
If you wanted to make data searchable by _id, you could introduce a secondary index with the _id field as the partition key of the index.
For example, let's say your data looked like this:
If you defined a secondary index on _id, the index would look like this (same data as the previous example, just a different logical view):
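As a stand-in for the two views, here is a small in-memory sketch using made-up values for the question's attributes (every value below is hypothetical). It only illustrates the idea that an index is the same items arranged under a different key; it does not call DynamoDB.

```javascript
// Hypothetical items, keyed in the table by item_id.
const tableItems = [
  { item_id: "ext-1", version: "1", _id: "uuid-a", data: "foo" },
  { item_id: "ext-2", version: "3", _id: "uuid-b", data: "bar" },
];

// Arrange the same items under another attribute, roughly what a secondary
// index maintains for you. Items lacking the attribute are left out, which
// mirrors how an index only contains items that have its key attribute.
function indexBy(items, keyName) {
  const view = new Map();
  for (const item of items) {
    const key = item[keyName];
    if (key === undefined) continue;
    if (!view.has(key)) view.set(key, []);
    view.get(key).push(item);
  }
  return view;
}

const byItemId = indexBy(tableItems, "item_id"); // the table's own view
const byUuid = indexBy(tableItems, "_id"); // the secondary index's view

console.log(byItemId.get("ext-2")); // lookup by the table key
console.log(byUuid.get("uuid-b")); // same item, found through the index key
```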
DynamoDB doesn't currently have any native versioning functionality, so you'll have to incorporate that into your data model. Fortunately, there's lots of discussion about this use case on the web. AWS publishes DynamoDB "Best Practices" documentation, including an example of versioning.

How do I query DynamoDB without specifying a partition key value?

I have a simple table with orderID as the PK and userID as the SK. I found out that in DynamoDB you need to specify both PK and SK to use Query. In that case, how can I get all the orders for userID x, given that I can't leave out orderID since it is the partition key of this table? Another approach that works but is not recommended is a Scan with a filter, which reads the whole table and then filters the result; it will eventually slow down as the table grows. How do you handle this scenario?
You can create a Global Secondary Index (GSI) with userID as its partition key and query that index.
You can read more about indexes and GSIs in the AWS docs.
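Querying such an index could look like the sketch below. The table name `Orders` and the index name `userID-index` are assumptions, and the client is assumed to be an AWS SDK v2 DocumentClient passed in by the caller.

```javascript
// Build Query parameters that target a GSI whose partition key is userID.
function buildOrdersByUserParams(tableName, indexName, userID, startKey) {
  const params = {
    TableName: tableName,
    IndexName: indexName, // without this, Query runs against the base table
    KeyConditionExpression: "userID = :u",
    ExpressionAttributeValues: { ":u": userID },
  };
  if (startKey) {
    params.ExclusiveStartKey = startKey;
  }
  return params;
}

// Collect all orders of one user, following pagination.
async function getOrdersForUser(docClient, userID) {
  const orders = [];
  let startKey;
  do {
    const page = await docClient
      .query(buildOrdersByUserParams("Orders", "userID-index", userID, startKey))
      .promise();
    orders.push(...page.Items);
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return orders;
}
```

Unlike a Scan with a filter, this reads only the index entries for the given userID.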

AWS DynamoDB Query based on non-primary keys

I'm new to AWS DynamoDB and wanted to clarify something. Is it possible to query a table and filter based on a non-primary key attribute? My table looks like the following:
Store
Id: PrimaryKey
Name: simple string
Location: simple string
Now I want to query on the Name, but I think I have to give the key as well from what I know? Apart from that I can use the scan but then I will be loading all the data.
From the docs:
The Query operation finds items based on primary key values. You can query any table or secondary index that has a composite primary key (a partition key and a sort key).
DynamoDB requires queries to always use the partition key.
In your case your options are:
Create a Global Secondary Index that uses Name as its partition key.
Use a Scan with a filter if the table is relatively small, or if you expect the result set to include the majority of the records in the table.
There are a few design principles that you can follow when using DynamoDB. If you are coming from a relational background, you have already run into the limits on querying by anything other than primary key attributes.
Design your tables for querying and for separating hot and cold data.
Create indexes for querying on non-key attributes. You have two options: a Global Secondary Index, which you can define at any time, and a Local Secondary Index, which you need to specify at table creation time.
With a Global Secondary Index you can promote any non-key attribute to be the partition key of the index and select another attribute as its sort key. With a Local Secondary Index, you can promote any non-key attribute to be the sort key while keeping the same partition key.
Using indexes for queries is also important for making efficient use of provisioned throughput.
Although indexes consume throughput of their own, they can also save read throughput: if you project only the attributes you actually need to read, the benefit can be large. Consider the following example.
Let's say you have a DynamoDB table whose items are 40 KB each. If you read directly from the table to list 10 items, it consumes 100 read capacity units (10 units per item, since one unit can read 4 KB, multiplied by 10 items). If you define an index that projects only the attributes needed for the listing, at 4 KB per item, the same listing consumes only 10 read capacity units (one unit per item), which makes a huge difference in cost.
With DynamoDB, how you define indexes really matters, not only for query capability but also for throughput.
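The arithmetic above can be checked with a few lines. This sketch follows the answer's per-item accounting, which corresponds to strongly consistent reads at 4 KB per unit; the function names are illustrative only.

```javascript
const KB_PER_READ_UNIT = 4; // one read unit covers up to 4 KB, as stated above

// Units needed to read a single item of the given size.
function readUnitsPerItem(itemSizeKB) {
  return Math.ceil(itemSizeKB / KB_PER_READ_UNIT);
}

// Units needed to read itemCount items of the same size, item by item.
function readUnits(itemSizeKB, itemCount) {
  return readUnitsPerItem(itemSizeKB) * itemCount;
}

console.log(readUnits(40, 10)); // 100: full 40 KB items read from the table
console.log(readUnits(4, 10)); // 10: 4 KB projections read from the index
```

Eventually consistent reads cost half as much, so both totals would be halved while the tenfold ratio stays the same.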
You cannot query on a non-primary-key attribute in DynamoDB.
If you still want to do that, you can use a Scan, but Scan is a costly operation in DynamoDB. If the table is large it will hurt performance, and it is not recommended because it reads every item in the table and AWS charges you for every item it reads for that request.
There are two ways to achieve it:
Keep the store Id as the partition key of the DynamoDB table and add Name or Location as the sort key (only one of them, since DynamoDB accepts only a single attribute as the sort key by design).
Create Global Secondary Indexes for querying on the non-key attributes you need most frequently.
There are three projection options when creating a GSI in DynamoDB. In your case, select the INCLUDE option and add Name, Location and the store Id to the index.
KEYS_ONLY – Each item in the index consists only of the table partition key and sort key values, plus the index key values. The KEYS_ONLY option results in the smallest possible secondary index.
INCLUDE – In addition to the attributes described in KEYS_ONLY, the secondary index will include other non-key attributes that you specify.
ALL – The secondary index includes all of the attributes from the source table. Because all of the table data is duplicated in the index, an ALL projection results in the largest possible secondary index.
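Adding such an index to an existing table might be sketched as below, using the UpdateTable operation of the low-level AWS SDK v2 client. The index name `Name-index`, the function name, and the capacity numbers are assumptions. The table's own key (Id) is always projected into the index, so only Location is listed as a non-key attribute.

```javascript
// Build UpdateTable parameters that create a GSI on Name with an INCLUDE
// projection. Pass throughput only for tables in provisioned capacity mode.
function buildAddNameIndexParams(tableName, throughput) {
  const create = {
    IndexName: "Name-index", // hypothetical index name
    KeySchema: [{ AttributeName: "Name", KeyType: "HASH" }],
    Projection: {
      ProjectionType: "INCLUDE",
      NonKeyAttributes: ["Location"],
    },
  };
  if (throughput) {
    create.ProvisionedThroughput = throughput;
  }
  return {
    TableName: tableName,
    // Every attribute used in an index key schema has to be declared here.
    AttributeDefinitions: [{ AttributeName: "Name", AttributeType: "S" }],
    GlobalSecondaryIndexUpdates: [{ Create: create }],
  };
}

// Assumed usage with the low-level client (not the DocumentClient):
//   const AWS = require("aws-sdk");
//   const dynamodb = new AWS.DynamoDB();
//   await dynamodb.updateTable(buildAddNameIndexParams("Store", {
//     ReadCapacityUnits: 5,
//     WriteCapacityUnits: 5,
//   })).promise();
```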

DynamoDB GSI BatchGetItem

Is it possible to retrieve rows from a DynamoDB Global Secondary Index using the BatchGetItem API? If my aim is to retrieve data from the main table based on some non-key attribute as well, but in batches of 100 items, does a GSI not fit here?
Also, is there a BatchGetItem equivalent for Query? Say a table has a partition key and a sort key, and the same partition key can have multiple sort keys: can I retrieve multiple partition keys using BatchGetItem with just the partition key, or does it not fit here?
There is no way to specify the index name in the BatchGetItem API operation according to the docs. That means using BatchGetItem (and GetItem for that matter) on a secondary index isn't possible. Both of these operate on the primary index.
If you want to retrieve data from a secondary index, you need to use Query or Scan. Both support the IndexName attribute according to the documentation. When using Query you have to specify the partition key and can optionally filter based on the sort key. If you don't filter on the sort key, you will get all items with the partition key, which should take care of your second requirement.
To retrieve data from a secondary index based on different partition keys, you'd need to issue multiple Query operations for the separate values of these keys; there is no batching here.
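Issuing one Query per partition key value and running them concurrently could be sketched as follows. The function names are hypothetical, the client is assumed to be an AWS SDK v2 DocumentClient, and for brevity each query reads only its first page of results.

```javascript
// Build Query parameters for a single partition key value on an index.
function buildIndexQueryParams(tableName, indexName, keyName, keyValue) {
  return {
    TableName: tableName,
    IndexName: indexName,
    KeyConditionExpression: "#k = :v",
    ExpressionAttributeNames: { "#k": keyName },
    ExpressionAttributeValues: { ":v": keyValue },
  };
}

// Run one Query per key value in parallel and merge the results.
async function queryManyPartitions(docClient, tableName, indexName, keyName, keyValues) {
  const pages = await Promise.all(
    keyValues.map((value) =>
      docClient
        .query(buildIndexQueryParams(tableName, indexName, keyName, value))
        .promise()
    )
  );
  return pages.flatMap((page) => page.Items);
}
```

For a large number of key values it may be worth limiting how many queries run at once to avoid throttling.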
You can use PartiQL with WHERE IN clause for that:
SELECT * FROM Orders WHERE OrderID IN [100, 300, 234]
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-reference.select.html
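Running a statement like this from code might look like the sketch below, which uses the ExecuteStatement operation of the low-level AWS SDK v2 client with `?` placeholders. It assumes OrderID is a numeric partition key; the helper name is hypothetical. If the WHERE clause does not pin down the partition key, the statement is expected to fall back to a full table scan.

```javascript
// Build a parameterized PartiQL SELECT with an IN list of numeric key values.
function buildInStatement(tableName, keyName, values) {
  const placeholders = values.map(() => "?").join(", ");
  return {
    Statement: `SELECT * FROM "${tableName}" WHERE "${keyName}" IN [${placeholders}]`,
    // The low-level client expects typed AttributeValues; numbers go in as strings.
    Parameters: values.map((value) => ({ N: String(value) })),
  };
}

// Assumed usage with the low-level client:
//   const AWS = require("aws-sdk");
//   const dynamodb = new AWS.DynamoDB();
//   const result = await dynamodb
//     .executeStatement(buildInStatement("Orders", "OrderID", [100, 300, 234]))
//     .promise();
//   // result.Items holds the matches in AttributeValue format
```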
