DynamoDB - simple explanation - amazon-dynamodb

Question
Having gone through the verbose AWS documentation, I need some help clarifying the basic keywords and concepts of DynamoDB.
Kindly confirm whether these are correct.
Hash key
The key that decides the partition of the item, so it is also called the 'partition' key.
Primary key
A hash key, or a (hash-key, range-key) pair, that uniquely identifies one item in the table. A (hash-key, range-key) pair is also called a 'composite' key.
If the primary key has only a hash key, "hash-key" and "primary-key" can be used interchangeably (but doing so can cause confusion).
Local secondary index
In simple terms, an "alternative range key" to be used with the hash-key of the primary key.
Besides the range key in the primary key (hash-key, range-key), we can have additional range keys that are used with the same hash-key.
Global secondary index
Alternative (hash-key, range-key) pairs for Query.
KeyConditions
For a query on a table or index, the hash key condition must always be an equality; other comparison conditions are allowed only on the range key portion of the table/index primary key.
Expression attribute name
Many words, such as status, are reserved and cannot be used directly as attribute names in DynamoDB expressions. An expression attribute name is a way to get around this restriction: a placeholder prefixed with '#' stands in for the reserved word. Perhaps a design error of DynamoDB.
Key condition expression
The SQL WHERE-like part of a Query, which requires the hash-key of the primary key. It seems to identify the one partition to get items from; a range-key condition can then be added to narrow down the items.
KeyConditions
For a query on a table/index, you can have conditions only on the table/index primary key attributes. You must always provide the partition key name and value as an EQ condition. You can optionally provide a second condition, referring to the sort(aka range) key.
Filter expression
The SQL WHERE-like part that can be used with both Query and Scan, but only on non-key attributes.
Filter Expressions for Query
A filter expression cannot contain partition key or sort key attributes. You need to specify those attributes in the key condition expression, not the filter expression.
If used in Query in addition to the key expression, the unmatched items are thrown away.
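The terms above can be seen side by side in the parameters of a single Query request. This is only a minimal sketch: the table `Orders`, its partition key `customerId`, its sort key `orderDate`, the non-key attribute `status`, and the helper name `buildOrderQuery` are all hypothetical. The values use the low-level attribute-value format (`{ S: ... }`).

```javascript
// Builds the parameter object for a DynamoDB Query request.
// Table and attribute names are hypothetical.
function buildOrderQuery(customerId, fromDate, status) {
  return {
    TableName: "Orders",
    // Key condition: equality on the partition key, plus an optional
    // condition on the sort key.
    KeyConditionExpression: "customerId = :cid AND orderDate >= :from",
    // Filter: evaluated on non-key attributes after the items are read.
    FilterExpression: "#st = :status",
    // "status" is a reserved word, so it is referenced through a '#' placeholder.
    ExpressionAttributeNames: { "#st": "status" },
    ExpressionAttributeValues: {
      ":cid": { S: customerId },
      ":from": { S: fromDate },
      ":status": { S: status },
    },
  };
}

const orderQuery = buildOrderQuery("c-42", "2024-01-01", "SHIPPED");
console.log(JSON.stringify(orderQuery, null, 2));
```

Because the filter runs after the key condition has selected the items, it reduces what is returned but not what is read.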

Related

Fetch last item of the aws dynamodb table

I want to fetch the last item/row of my DynamoDB table, but I am not finding any resources. My primary key is id, holding a series of incrementing numbers such as 1, 2, 3... for each row respectively.
This is my function.
async function readMessage() {
  const params = {
    TableName: table,
  };
  return dynamo.getItem(params).promise();
}
I am not sure what I should be adding to my params.
DynamoDB has two types of primary keys:
Partition key – A simple primary key, composed of one attribute known as the partition key.
Partition key and sort key – Referred to as a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.
When fetching an item by partition key, you need to specify the exact partition key. You cannot fetch the max/min partition key.
Instead, you may want to create a sort key with a timestamp (or the ID if it's a sequential number) and use the sort key to fetch the last item.
Check out the AWS docs on Choosing the Right Partition Key for more info.
The proper way to design a table in DynamoDB is based on its expected access patterns. If this is something you need, consider using this id as the sort key instead of the partition key, then query the table in descending order while limiting the number of items to 1.
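A minimal sketch of such a "latest item" query, assuming a hypothetical table `Messages` whose partition key is `channel` and whose sort key is the incrementing `id`. The helper name `buildLatestItemQuery` is illustrative; the values use the low-level attribute-value format.

```javascript
// Builds Query parameters that return only the item with the highest sort key
// within one partition. Table and key names are hypothetical.
function buildLatestItemQuery(tableName, partitionKeyName, partitionKeyValue) {
  return {
    TableName: tableName,
    KeyConditionExpression: "#pk = :pk",
    ExpressionAttributeNames: { "#pk": partitionKeyName },
    ExpressionAttributeValues: { ":pk": { S: partitionKeyValue } },
    ScanIndexForward: false, // read in descending sort-key order
    Limit: 1, // stop after the first (highest) item
  };
}

const latestParams = buildLatestItemQuery("Messages", "channel", "general");
console.log(latestParams);
```

With an SDK v2 client like the one in the question, the object would be passed to `dynamo.query(latestParams).promise()` rather than to `getItem`.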
If you don't want to change the schema of your items, and you don't mind making at least two operations, you have two non-optimal options:
If none of your items ever gets deleted, do a count first and use it to work out the id of the latest item written.
Alternatively, keep a "special" record in your DynamoDB table that is basically a counter, incremented whenever one of your "other" items is written. On retrieval, first read the value of this special record, then use it to fetch the actual item.
The combination of the partition key and sort key makes up the primary key of your item in DynamoDB, so the combination must be unique; otherwise the existing item will be overwritten.
In almost all my use cases, I choose an object attribute as the partition key, such as the brand, an email or a class, and a timestamp as the sort key. This way you always know the partition key, which is needed to retrieve the values, and you can then query DynamoDB with conditions on the sort key. For more extensive examples using Python, check the AWS page https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.04.html, which shows how you can query your DynamoDB items.
There are also other ways to define the keys in DynamoDB; for those I advise you to check https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html

How to get attribute from a list of partition keys in DynamoDB - is scan my only option?

I've got a list of partition keys from one table.
userId["123","456","235"]
I need to get an attribute that they all share, like "username".
What would be the best practice to get them all at once?
Is scan my only option knowing that I know all my partition keys?
Do I know the sort key? Yes, but only the beginning of it. Therefore I
don't think I could use batchGetItem.
Scan is only appropriate if you don't know the partition keys. Because you know the partition keys you want to search, you can achieve the desired behavior with multiple Query operations.
A Query searches all documents with the specified partition key; you can only query one partition key per request, so you'll need multiple queries, but this will still be significantly more efficient than a single Scan operation.
If you're only looking for documents with a sort key that begins with something, you can include it in your KeyConditionExpression along with the partition key.
For example, if you wanted to only return documents whose sort key begins with a certain string, you could pass something like userId = :user_id AND begins_with(#SortKey, :str) as the key condition expression.
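The one-query-per-partition-key approach can be sketched as follows. The table name `Users`, the sort key attribute name `sortKey`, the prefix value, and the helper name `buildUserQueries` are assumptions for illustration; the values use the low-level attribute-value format.

```javascript
// Builds one Query parameter object per partition key, each restricted to
// sort keys that start with a known prefix. Names are hypothetical.
function buildUserQueries(userIds, sortKeyPrefix) {
  return userIds.map((userId) => ({
    TableName: "Users",
    KeyConditionExpression: "userId = :user_id AND begins_with(#SortKey, :str)",
    // Only return the shared attribute we are interested in.
    ProjectionExpression: "#un",
    ExpressionAttributeNames: { "#SortKey": "sortKey", "#un": "username" },
    ExpressionAttributeValues: {
      ":user_id": { S: userId },
      ":str": { S: sortKeyPrefix },
    },
  }));
}

const userQueries = buildUserQueries(["123", "456", "235"], "profile#");
console.log(userQueries.length); // one request per partition key
```

The resulting requests are independent of each other, so they can be sent concurrently (for example with `Promise.all`).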
You can achieve the result efficiently with a PartiQL SELECT statement. It lets you query a list of partition keys with the IN operator and apply additional conditions on other attributes without causing a full table scan.
To ensure that a SELECT statement does not result in a full table
scan, the WHERE clause condition must specify a partition key. Use the
equality or IN operator.
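A minimal sketch of how such a statement and its parameters could be assembled for an ExecuteStatement request. The table name `Users`, the attribute names, and the helper name `buildPartiqlSelect` are hypothetical; the parameters use the low-level attribute-value format.

```javascript
// Builds the input for a PartiQL ExecuteStatement request that selects one
// attribute for several partition keys. Names are hypothetical.
function buildPartiqlSelect(tableName, userIds, sortKeyPrefix) {
  // One '?' placeholder per partition key value.
  const placeholders = userIds.map(() => "?").join(", ");
  return {
    Statement:
      `SELECT "username" FROM "${tableName}" ` +
      `WHERE "userId" IN [${placeholders}] AND begins_with("sortKey", ?)`,
    // Parameters are bound to the placeholders in order.
    Parameters: [...userIds.map((id) => ({ S: id })), { S: sortKeyPrefix }],
  };
}

const statementInput = buildPartiqlSelect("Users", ["123", "456"], "profile#");
console.log(statementInput.Statement);
```

Binding the values through `Parameters` rather than string concatenation keeps the key values out of the statement text.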

Dynamodb key made up of 3 fields

Say I have an RDBMS table with a composite primary key, e.g. field1, field2, field3, which uniquely identifies a record in the table. How can I model this in DynamoDB, as it appears the primary key in DynamoDB can only be made up of two fields (i.e. a partition key and a sort key)?
You may need to combine them into one value, such as by concatenation with a field delimiter, e.g. field1_field2_field3 as the partition key. If you need sorting, you may also use a sort key. Note that a partition key can only be matched exactly; a prefix search such as field1_ works only when the concatenated value is stored in a sort key (using begins_with).
Reference: https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/
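The concatenation approach can be sketched with two small helpers. The names `joinKey` and `splitKey` are hypothetical, and the delimiter check is an added assumption: the scheme only round-trips if no field value contains the delimiter itself.

```javascript
const DELIMITER = "_";

// Joins several field values into one key string, e.g. for use as a
// partition key. Rejects values containing the delimiter, since they could
// not be split back apart unambiguously.
function joinKey(...parts) {
  for (const part of parts) {
    if (String(part).includes(DELIMITER)) {
      throw new Error(`Key part "${part}" contains the delimiter "${DELIMITER}"`);
    }
  }
  return parts.join(DELIMITER);
}

// Splits a concatenated key back into its original field values.
function splitKey(key) {
  return key.split(DELIMITER);
}

const compositeKey = joinKey("field1", "field2", "field3");
console.log(compositeKey); // field1_field2_field3
console.log(splitKey(compositeKey)); // [ 'field1', 'field2', 'field3' ]
```

If the individual fields are also needed in filters or projections, they can additionally be stored as separate non-key attributes.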

Invalid KeySchema: The second KeySchemaElement is not a RANGE key type

In my CloudFormation script, I'm creating a DynamoDB table (Datasets) with two keys - let's call them CatalogId and DatasetId. They are both URIs that are outside of my control, but suffice it to say that together they make a unique ID.
I made both of them HASH keys in the primary KeySchema / index. When I did that, CF gave me the following error:
Invalid KeySchema: The second KeySchemaElement is not a RANGE key type
What am I doing wrong?
The answer is that only one of the keys can be a HASH key in the primary index. The second key must be of RANGE type, even if you never plan on comparing it with > or <. I'd love it if somebody could elaborate on why I can't have two HASH keys. Why doesn't Dynamo just concatenate the two keys internally and create one primary key?
As you mentioned, DynamoDB doesn't have that option. It expects the client to concatenate as String and store that value in one field (i.e. Hash key in the above case).
If you still need those attributes as separate fields, you can also store them individually as non-key attributes.
Q: Are composite attribute indexes possible?
No. But you can concatenate attributes into a string and use this as a key.
Example:
First and last name as composite key
Concatenate first and last name and store that as hash key
Save the first name as a non-key attribute
Save the last name as a non-key attribute
I know it is a little redundant. This is just a workaround to keep things clear.

limit offset, sorting and aggregation challenges in DynamoDB

I am using DynamoDB to store my device events (in JSON format) in a table for further analysis, and using the scan APIs to display the result set on the UI, which requires the following:
To define a limit and offset for records, say 10 records per page, meaning the result set should be paginated (e.g. page 1 has records 1-10, page 2 has records 11-20, and so on). I found an API like scanRequest.withLimit(10), but its limit means something different from limit/offset. Does the DynamoDB API support limit and offset?
I also need to sort the result set on user-selected fields such as Date, Serial Number, etc., but I haven't found any sorting/order-by APIs.
I may also need aggregation, e.g. on Device Name, Date, etc., which also doesn't seem to be available in DynamoDB.
This situation has led me to think about other NoSQL database solutions. Please assist me with the issues mentioned above.
The right way to think about DynamoDB is as a key-value store with support for indexes.
"Amazon DynamoDB supports key-value data structures. Each item (row) is a key-value pair where the primary key is the only required attribute for items in a table and uniquely identifies each item. DynamoDB is schema-less. Each item can have any number of attributes (columns). In addition to querying the primary key, you can query non-primary key attributes using Global Secondary Indexes and Local Secondary Indexes."
https://aws.amazon.com/dynamodb/details/
A table can have 2 types of keys:
Hash Type Primary Key—The primary key is made of one attribute, a hash attribute. DynamoDB builds an unordered hash index on this primary key attribute. Each item in the table is uniquely identified by its hash key value.
Hash and Range Type Primary Key—The primary key is made of two attributes. The first attribute is the hash attribute and the second one is the range attribute. DynamoDB builds an unordered hash index on the hash primary key attribute, and a sorted range index on the range primary key attribute. Each item in the table is uniquely identified by the combination of its hash and range key values. It is possible for two items to have the same hash key value, but those two items must have different range key values.
What kind of primary key have you set up for your Device Events table? I would suggest that you denormalize your data (i.e. pull specific attributes out of the json) and build additional indexes on those attributes that you want to sort and aggregate on: Date, Serial Number, etc. If I know what kind of primary key you have set up on your table, I can point you in the right direction to build these indices so that you can get what you need via the query method. The scan method will be inefficient for you because it reads every row in the table.
Lastly, with regard to your "limit offset" question, I think that you're looking for the ExclusiveStartKey request parameter, which you set to the LastEvaluatedKey that DynamoDB returns in the response to your previous query.
ExclusiveStartKey is what lets you paginate. You don't have to take its value from the LastEvaluatedKey of a previous response; you can also pass the primary key of an item you already know, so it can act as an offset. DynamoDB returns LastEvaluatedKey whenever it stops before the end of the result set, either because Limit items were evaluated or because the 1 MB page size was reached; if it is absent, there are no more results.
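A minimal sketch of key-based pagination, assuming each response's LastEvaluatedKey (when present) is fed back as the next request's ExclusiveStartKey. The helper name `nextPageParams`, the table `DeviceEvents`, and the sample responses are hypothetical; no request is actually sent here.

```javascript
// Returns the parameters for the next page of a Query or Scan, or null when
// the previous response indicates that there are no more results.
function nextPageParams(baseParams, pageSize, previousResponse) {
  const params = { ...baseParams, Limit: pageSize };
  if (previousResponse === undefined) {
    return params; // first page: no start key
  }
  if (!previousResponse.LastEvaluatedKey) {
    return null; // no LastEvaluatedKey: nothing left to read
  }
  // Continue right after the last item evaluated by the previous request.
  return { ...params, ExclusiveStartKey: previousResponse.LastEvaluatedKey };
}

const baseParams = { TableName: "DeviceEvents" };
const firstPage = nextPageParams(baseParams, 10);

// Hypothetical response to the first request.
const firstResponse = { Items: [], LastEvaluatedKey: { deviceId: { S: "d-10" } } };
const secondPage = nextPageParams(baseParams, 10, firstResponse);

// Hypothetical final response: no LastEvaluatedKey.
const lastPage = nextPageParams(baseParams, 10, { Items: [] });

console.log(firstPage, secondPage, lastPage);
```

Unlike a numeric offset, this only moves forward from a known key, so jumping straight to an arbitrary page number requires remembering the start keys of earlier pages.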

Resources