DynamoDB key uniqueness across primary and global secondary index - amazon-dynamodb

I'm working on creating a DynamoDB table to hold comments associated to a single object.
Comments are posted to an object at a specific time, I am using the time posted as the range so comments are sortable in time descending order. I have a global secondary index of the userId of the user who posted the comment, this should allow me to get all comments posted by a given user.
My question is, will this key be unique? I'm worried that since it is technically possible for two users to post a comment to the same objectId at the same time, the comment hash and range key will be identical.
My hope is that since it should be impossible for the same user to post two comments on the same object at the same time the global secondary index will make the key unique.
Comment table:
Hash Key Range Key Global Secondary Index Hash
---------------------------------------------------------------------------------------
| objectId | datePosted | userId |
| (not unique) | (not unique if multiple users | (unique across objectId and |
| | post for the same object # same time) | datePosted) |
---------------------------------------------------------------------------------------

DynamoDB indexes have nothing to do with uniqueness. Global and local indexes are allowed to have duplicate hash key & range key pairs. Only the hash key and range key of the table itself are unique.
In your example, it would be possible for two different users to comment on an object at the same exact moment and produce a duplicate objectId, datePosted key. There are a couple ways to deal with this. You could use a PutItem request with a condition that the primary key is null as mentioned in the API reference. That would cause the second comment save to fail and you could report an error to the user or simply try again with an updated time-stamp. Without the condition, the second comment will overwrite the first. Alternatively, you could make the range key of the table a composite value of datePosted concatenated with userId. That way, the range keys will always be unique, but will still be sorted in date time order. This is a common practice with DynamoDB.

Related

Fetch last item of the aws dynamodb table

So I wanted to fetch the last item/row of my dynamodb table but i am not finding resources. My primary key is id having series of incremented numbers such as 1,2,3... for each row respectively.
This is my function.
async function readMessage(){
const params = {
TableName: table,
};
return dynamo.getItem(params).promise();
}
I am not sure as to what i should be adding in my params.
DynamoDB has two types of primary keys:
Partition key – A simple primary key, composed of one attribute known as the partition key.
Partition key and sort key – Referred to as a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.
When fetching an item by partition key, you need to specify the exact partition key. You cannot fetch the max/min partition key.
Instead, you may want to create a sort key with a timestamp (or the ID if it's a sequential number) and use the sort key to fetch the last item.
Check out the AWS docs on Choosing the Right Partition Key for more info.
The proper way to design a table in DynamoDB is based on its expected access patterns; if this is something you need perhaps you should consider using this id as Sort Key instead of Primary Key and then query the table in descending order while also limiting the amount of items to 1.
If instead you don't want to change the schema of your items and you don't care about making at least two operations to do this you have two, not optimal options:
If none of your items ever gets deleted, just make a count first and use that information to know what's the latest item that was written.
Alternatively, if you could consider keeping a "special" record in your DynamoDB table that is basically a count that gets always incremented/written when one of your "other" items gets written. Upon retrieval you first retrieve the value of this special record and use this info to retrieve the actual one.
The combination of the partition key and sort key, makes the primary key of your item in the dynamoDB, so their combination must be unique, otherwise the item will be overwritten.
In almost all my use-cases, I select the primary key as an object attribute, like the brand, an email or a class and then, for the sort key I select the TimeStamp. So in this way, you always know the partition key, we need it to retrieve the values and then you can query your dynamoDB by making some filters by the sort key. For more extensive examples using Python, check the AWS page: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.04.html, where it shows, how you can query your DynamoDB items.
There is also other ways to define the keys in your Dynamo and for that I advise you to check https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html

How to query and order on two separate sort keys in DynamoDB?

GROUPS
userID: string
groupID: string
lastActive: number
birthday: number
Assume I have a DynamoDB table called GROUPS which stores items with these attributes. The table records which users are joined to which groups. Users can be in multiple groups at the same time. Therefore, the composite primary key would most-commonly be:
partition key: userID
sort key: groupID
However, if I wanted to query for all users in a specific group, within a specific birthday range, sorted by lastActive, is this possible and if so what index would I need to create?
Could I synthesize lastActive and userID to create a synthetic sort key, like so:
GROUPS
groupID: string
lastActiveUserID: string (i.e. "20201230T09:45:59-abc123")
birthday: number
Which would make for a different composite primary key where the partition key is groupID and the sort key is lastActiveUserID, which would sort the participants by when they were last active, and then a secondary index to filter by birthday?
As written, no this isn't possible.
within a specific birthday range
implies sk_birthday between :start and :end
sorted by lastActive
implies lastActive as a sort key.
which are mutually exclusive...I can't devise a sort key that would be able to contain both values in a usable format.
You could have a Global Secondary Index with a hash key of group-id and lastActive as a sort key, then filter on birthday. But, that only affects the data returned, it doesn't affect the data read nor the cost to read that data. Additionally, since DDB only reads 1MB of data at a time, you'd have to call it repeatedly in a loop if it's possibly a given group has more than 1MB worth of members.
Also, when your index has a different partition (hash) key than your table, that is a global secondary index (GSI). If your index has the same partition key but a different sort key than the table, that can be done with a local secondary index (LSI)
However for any given query, you can only use the table or a given index. You can't use multiple indexes at the same time
Now having said all that, what exactly to you mean by "specific birthday range" If the range in question is a defined period, by month, by week. Perhaps you could have a GSI where the hash key is "group-id#birthday-period" and sort key is lastActive
So for instance, "give me GROUPA birthdays for next month"
Query(hs = "GROUPA#NOVEMBER")
But if you wanted November and December, you'd have to make two queries and combine & sort the results yourself.
Effective and efficient use of DDB means avoiding Scan() and avoiding the use of filterExpressions that you know will throw away lots of the data read.

Choosing Primary key for DynamoDB

A bit of context: I am trying to build an inventory to list my AWS resources in various accounts and I am planning to use DynamoDB to store the data. These will be the columns for my table: ResourceARN, ResourceName, ResourceType, StandardTag, IsDeleted, LastUpdateTime and ResourceCreationDate ( this field is available only for a few resource types like Ec2).
Question: I want to query my DDB table using account ID, resource type and tag name. I am stumped on choosing the primary key for the table. Since primary key should be unique and has to have 1:many relationship. Hence, I cannot use a combination of resourceType and account Id. Nor can I use resourceArn as my primary key since it is 1:1 relationship. Also, using the resourceARN as the sort key does not make sense to me. I understand that I can use a simple scan operation, but that is very costly and will take time if I add more data in my DDB.
I would appreciate any suggestions or guidance over the same.
Short answer
Partition key: Account ID
Sort key: <resource type>/<resource ID>
Rationale
It's a common pattern for a sort key to be a string concatenating multiple attributes. Since sort keys can be queried by prefix, you can leverage this in your queries:
Get all account resources: query all sort keys on the Account ID partition key
Get all EC2 instances of an account: query with partition key = <your account ID> and sort key begins_with('ec2-instance').
You may notice that ARNs follow such a hierarchy as well (what's probably not a coincidence). This would be effectively using a subset of the ARN as the sort key.
Some notes:
DynamoDB is about attributes as much as about columns. You don't need to include ResourceCreationDate in the records which don't have it, and doing so will save you space (see next point).
Attribute names count as storage for every record, which impacts cost and also throughput. It's common to use shorthand for names for this reason (rct instead of ResourceCreationTime for example).
You can use LSIs (Local Secondary Indexes) to order by creation and update times if you need this.

How to use boolean attribute to create combined sort key

I have a DynamoDB table called Message with the following attributes:
message_id: number (partition key)
user_id: number (sort key)
incoming: boolean
subject: string
I want to create a global secondary index with user_id for the partition key, and the combined value of incoming and subject for the sort key.
Global secondary index:
user_id: partition key
incoming#subject: sort key
Do I have to manually cast the incoming attribute to a string (where true becomes "1", and false becomes "0") before combining it with subject? What is the standard way to handle such a scenario?
As far as I know, I don't think that you can have different attributes like incoming#subject only in a global secondary index and separate incoming and subject attributes only in the original table. The attributes in your index will reflect the ones in the table. The difference between the two representations is that they have a different partition key and sort key. So, you can't "combine" incoming#subject just in the index without having this attribute in the table as well.
However, having incoming#subject in both the table and the index would solve your problem since its value would be determined outside of the database (when you write into the table). You should thus be able to "cast" it to whatever you want when you insert or update the data--whether it is true#my_subject_here or 1#another_subject.
Let me know if that works for you!

How to design DynamoDB table to facilitate searching by time ranges, and deleting by unique ID

I'm new to DynamoDB - I already have an application where the data gets inserted, but I'm getting stuck on extracting the data.
Requirement:
There must be a unique table per customer
Insert documents into the table (each doc has a unique ID and a timestamp)
Get X number of documents based on timestamp (ordered ascending)
Delete individual documents based on unique ID
So far I have created a table with composite key (S:id, N:timestamp). However when I come to query it, I realise that since my id is unique, because I can't do a wildcard search on ID I won't be able to extract a range of items...
So, how should I design my table to satisfy this scenario?
Edit: Here's what I'm thinking:
Primary index will be composite: (s:customer_id, n:timestamp) where customer ID will be the same within a table. This will enable me to extact data based on time range.
Secondary index will be hash (s: unique_doc_id) whereby I will be able to delete items using this index.
Does this sound like the correct solution? Thank you in advance.
You can satisfy the requirements like this:
Your primary key will be h:customer_id and r:unique_id. This makes sure all the elements in the table have different keys.
You will also have an attribute for timestamp and will have a Local Secondary Index on it.
You will use the LSI to do requirement 3 and batchWrite API call to do batch delete for requirement 4.
This solution doesn't require (1) - all the customers can stay in the same table (Heads up - There is a limit-before-contact-us of 256 tables per account)

Resources