I want to store and retrieve data from a DynamoDB table.
Each item is a review that a user gave on a feature of an app, with the following attributes:
user string
feature string
appVersion string
timestamp string
rate int
description string
There are multiple features across multiple versions of the app, and a user can give multiple reviews on these features. So I would like to use (user, appVersion, feature, timestamp) as a primary key.
But it does not seem possible to use that many attributes in a primary key in DynamoDB.
The first solution I implemented is to use user as a Partition Key, and a hash of (appVersion, feature, timestamp) as a Sort Key (in a new field named reviewID).
My problem is that I want to retrieve an item for a given user, feature, and appVersion without knowing the timestamp value (say, the item with the latest timestamp, or the list of all items matching the three fields).
Without knowing the timestamp, I can't build the Sort Key needed to retrieve my item. But if I remove the timestamp from the Sort Key, I will not be able to store multiple items with the same (user, appVersion, feature).
What would be the proper way to handle this use case?
I am thinking about using a hash of (user, appVersion, feature) as a Partition Key and the timestamp as a Sort Key. Would this be a correct solution?
Put the timestamp at the end of your SK and then when you Query the data you use begins_with on the SK.
PK        SK
UserID    appVersion#feature#timestamp
This will allow you to query the data flexibly. For example, if you want all of a user's reviews for a specific appVersion:
SELECT * FROM Mytable WHERE PK = 'x' AND begins_with(SK, '{VERSION ID}')
This is done using a Query command.
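A minimal sketch of that layout, showing how the sort key and the Query parameters could be built. The helper names are hypothetical, the attribute names PK and SK follow the table above, and the values are written as plain JavaScript values as a DocumentClient-style client would accept them. It also assumes timestamps are ISO 8601 strings (so they sort chronologically) and that appVersion and feature never contain "#".

```javascript
// Hypothetical helpers; PK/SK are the attribute names from the layout above.
const SEP = "#";

// Build the sort key "appVersion#feature#timestamp".
function buildSortKey(appVersion, feature, timestamp) {
  return [appVersion, feature, timestamp].join(SEP);
}

// Query parameters: all reviews by one user for a given version and feature.
function reviewsQuery(tableName, user, appVersion, feature) {
  return {
    TableName: tableName,
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
    ExpressionAttributeValues: {
      ":pk": user,
      // Trailing separator so that version "1.2" does not also match "1.20".
      ":prefix": appVersion + SEP + feature + SEP,
    },
  };
}

// Same key condition, read newest first and stop after one item.
function latestReviewQuery(tableName, user, appVersion, feature) {
  return {
    ...reviewsQuery(tableName, user, appVersion, feature),
    ScanIndexForward: false,
    Limit: 1,
  };
}
```

Because the timestamp is the last component of the sort key, reading the matching range in descending order and limiting it to one item yields the latest review without knowing its timestamp.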
The answer from Lee Hannigan will work, I like it.
However, keep in mind that accessing a PK is very fast because it's hash-based.
I am thinking about using a hash of (user, appVersion, feature) as a
Partition Key, and the timestamp as a Sort Key, would this be a
correct solution?
This might also work. The table would look like this:
PK                                                       SK
User#{user}#AppVersion#{appVersion}#Feature#{feature}    TimeStamp#{timestamp}
If you always know the user, appVersion, and feature, this will be the better fit: the PK lookup is hash-based, and only the timestamp is left in the SK, where the lookup is O(log N).
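As a rough sketch of this second layout, the keys could be assembled like this. The function names are hypothetical, a "#" is assumed between every component of the partition key, and the values are plain JavaScript values as a DocumentClient-style client would accept them.

```javascript
// Hypothetical helper: build both key attributes for one review.
function buildKeys(user, appVersion, feature, timestamp) {
  return {
    PK: `User#${user}#AppVersion#${appVersion}#Feature#${feature}`,
    SK: `TimeStamp#${timestamp}`,
  };
}

// Query parameters for the newest review of one (user, appVersion, feature).
// Assumes timestamps are ISO 8601 strings, so string order is time order.
function latestInPartitionQuery(tableName, user, appVersion, feature) {
  const { PK } = buildKeys(user, appVersion, feature, "");
  return {
    TableName: tableName,
    KeyConditionExpression: "PK = :pk",
    ExpressionAttributeValues: { ":pk": PK },
    ScanIndexForward: false, // descending sort key order
    Limit: 1,
  };
}
```

The trade-off is that this layout can no longer answer "all reviews by this user" with a single Query, since each (user, appVersion, feature) combination lives in its own partition.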
One way:
HASH string "modelName": "user"
RANGE string "id": "b0d5be50-4fae-11ed-981f-dbffcc56c88a"
A time-based (version 1) UUID like this one can itself serve as the timestamp.
When searching, you can query in reverse order.
Another way:
HASH string "modelName": "user"
RANGE string "createdAt": "2019-10-12T07:20:50.52Z"
For createdAt, use the RFC 3339 time format.
When searching, you can query in reverse order.
Put down on paper what you need and you'll find other ways to manage the HASH/RANGE index.
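To illustrate the createdAt variant: RFC 3339 timestamps in UTC with a fixed precision sort the same way as strings and as dates, which is what makes them usable as a RANGE key. The sketch below checks that on a few made-up values and builds Query parameters that read newest first. It reads "reverse" above as descending sort key order, which is an assumption, and the function name is hypothetical.

```javascript
// Made-up sample values, all UTC with millisecond precision.
const createdAtValues = [
  "2019-10-12T07:20:50.520Z",
  "2018-01-03T00:00:00.000Z",
  "2019-10-12T07:20:51.000Z",
];

// Plain string sort versus a sort on the parsed dates.
const sortedAsStrings = [...createdAtValues].sort();
const sortedAsDates = [...createdAtValues].sort(
  (a, b) => new Date(a) - new Date(b)
);

// Query parameters: items with modelName = "user", newest createdAt first.
function newestFirstQuery(tableName, modelName) {
  return {
    TableName: tableName,
    KeyConditionExpression: "#m = :m",
    ExpressionAttributeNames: { "#m": "modelName" },
    ExpressionAttributeValues: { ":m": modelName },
    ScanIndexForward: false, // descending order of the RANGE key
  };
}
```

Note that the same property does not hold for version 1 UUID strings, whose leading characters are the low bits of the time, so the UUID variant identifies the creation time but does not sort by it.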
Related
I know AWS questions rarely get an answer, but I will try my luck. It's a pretty easy one, but I cannot find an answer to it.
Let's say I have the simplest table ever, with two 'columns': token (Partition Key) and userId.
What I want is to have 1,000 tokens, and once a user signs up, a token gets assigned to that user. Basically, the userId property in the table will be populated. Also, I want to be able to query, not scan, by both token and userId.
In order to query by userId I can use it as a GSI key, but it states that GSI keys should not be null. All my entries will have the token property from the beginning, but the userId will be empty until a token actually gets assigned.
What can I use for this scenario? I thought about a Sort Key as it's a Sparse Index, but as far as I know I cannot query by the Sort Key alone.
Just do what you're thinking there and create a GSI based on the userId. For the base table, don't include a userId attribute at all until the token has one. This isn't relational. You don't need every row to have all attributes. Only the key attributes have to be provided.
So:
Early on, all items are just the token attribute with no other attributes
You have a GSI with userId as its PK
Then add userId attributes to the tokens when assigned, and that will get propagated to the GSI
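The three steps above could be sketched as the following request parameters. The table name, index name, and function names are hypothetical, and values are plain JavaScript values as a DocumentClient-style client would accept them. The condition expression on the update is an addition of this sketch, not part of the answer: it refuses to assign a token that does not exist or already has a user.

```javascript
const TABLE = "Tokens";       // hypothetical table name
const INDEX = "userId-index"; // hypothetical GSI name, userId as its PK

// Step 1: an unassigned token is stored with no userId attribute at all,
// so it does not appear in the GSI.
function newTokenPut(token) {
  return { TableName: TABLE, Item: { token } };
}

// Step 3: assigning a token adds the userId attribute, which makes the
// item show up in the GSI.
function assignTokenUpdate(token, userId) {
  return {
    TableName: TABLE,
    Key: { token },
    UpdateExpression: "SET #u = :u",
    ConditionExpression: "attribute_exists(#t) AND attribute_not_exists(#u)",
    ExpressionAttributeNames: { "#t": "token", "#u": "userId" },
    ExpressionAttributeValues: { ":u": userId },
  };
}

// Query the GSI (step 2) to find the token assigned to a user.
function tokenByUserQuery(userId) {
  return {
    TableName: TABLE,
    IndexName: INDEX,
    KeyConditionExpression: "#u = :u",
    ExpressionAttributeNames: { "#u": "userId" },
    ExpressionAttributeValues: { ":u": userId },
  };
}
```

Looking up by token stays a plain key lookup on the base table; only the lookup by userId goes through the index.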
I want to fetch the last item/row of my DynamoDB table, but I am not finding any resources on how to do it. My primary key is id, a series of incrementing numbers (1, 2, 3, ...), one per row.
This is my function.
async function readMessage() {
  const params = {
    TableName: table,
  };
  return dynamo.getItem(params).promise();
}
I am not sure what I should add to my params.
DynamoDB has two types of primary keys:
Partition key – A simple primary key, composed of one attribute known as the partition key.
Partition key and sort key – Referred to as a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.
When fetching an item by partition key, you need to specify the exact partition key. You cannot fetch the max/min partition key.
Instead, you may want to create a sort key with a timestamp (or the ID if it's a sequential number) and use the sort key to fetch the last item.
Check out the AWS docs on Choosing the Right Partition Key for more info.
The proper way to design a table in DynamoDB is based on its expected access patterns. If this is something you need, perhaps you should consider using this id as the Sort Key instead of the Partition Key, and then query the table in descending order while limiting the number of items to 1.
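A minimal sketch of such a query, assuming the table is redesigned so that all messages share one partition key value and id is the sort key. The attribute name pk, its value, and the function name are hypothetical. Values are written in the typed form ({ S: ... }) that the low-level client from the question expects.

```javascript
// Query parameters for "the item with the highest sort key in a partition".
function lastItemQuery(tableName, partitionValue) {
  return {
    TableName: tableName,
    KeyConditionExpression: "#pk = :pk",
    ExpressionAttributeNames: { "#pk": "pk" }, // hypothetical partition key
    ExpressionAttributeValues: { ":pk": { S: partitionValue } },
    ScanIndexForward: false, // descending sort key order
    Limit: 1,                // stop after the first (highest) item
  };
}
```

With the client from the question, this would be sent with something like dynamo.query(lastItemQuery(table, "MESSAGES")).promise(), and the last item, if any, would be the first element of the returned Items.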
If instead you don't want to change the schema of your items, and you don't mind making at least two operations, you have two less-than-optimal options:
If none of your items ever gets deleted, just make a count first and use that information to know which item was written last.
Alternatively, you could keep a "special" record in your DynamoDB table that is basically a counter, incremented whenever one of your "other" items gets written. Upon retrieval, you first read the value of this special record and then use it to retrieve the actual item.
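The counter-record option could look roughly like this. It assumes id is a Number attribute and reserves id 0 for the counter record; that reserved id, the attribute name lastId, and the function names are all hypothetical. Values use the typed form of the low-level client from the question.

```javascript
const COUNTER_ID = "0"; // hypothetical: id reserved for the counter record

// Atomically increment the counter and return its new value.
// ADD creates the attribute, starting from 0, if it does not exist yet.
function incrementCounterUpdate(tableName) {
  return {
    TableName: tableName,
    Key: { id: { N: COUNTER_ID } },
    UpdateExpression: "ADD #c :one",
    ExpressionAttributeNames: { "#c": "lastId" },
    ExpressionAttributeValues: { ":one": { N: "1" } },
    ReturnValues: "UPDATED_NEW",
  };
}

// getItem parameters for a given id (the counter record or a real item).
function getByIdParams(tableName, id) {
  return { TableName: tableName, Key: { id: { N: String(id) } } };
}
```

Writing an item would first run the update and use the returned lastId as the new item's id. Reading the last item would take two getItem calls: one for the counter record, then one for the id it holds.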
The combination of the partition key and sort key makes up the primary key of your item in DynamoDB, so the combination must be unique; otherwise the existing item will be overwritten.
In almost all my use cases, I choose an attribute of the object as the partition key, such as the brand, an email, or a class, and then the timestamp as the sort key. This way you always know the partition key, which is needed to retrieve the values, and you can then query your DynamoDB table by filtering on the sort key. For more extensive examples using Python, check the AWS page https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.04.html, which shows how you can query your DynamoDB items.
There are also other ways to define the keys in DynamoDB, and for that I advise you to check https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html
I have the following data structure
item_id String
version String
_id String
data String
_id is simply a UUID to identify the item. There is no need to search for a row by this field yet.
As of now, item_id, an id generated by an external system, is the primary key, i.e. given the item_id, I want to be able to retrieve version, _id and data from the DynamoDB table.
item_id -> (version, _id, data)
Therefore I am setting item_id as the partition key.
I have two questions for future-proofing (evolution of) the above "schema":
In the future, if I want to incorporate version (version number of the item) into the primary key, can I just modify the table and add it to the partition key?
If I also want to make the data searchable by _id, is it feasible to modify the table to assign _id to be the partition key (it is a unique value because it is a UUID) and reassign item_id to be a search key?
I want to avoid creating a new DynamoDB table and migrating data to create new key structures, because it may lead to downtime.
You cannot update primary keys in DynamoDB. From the docs:
You cannot use UpdateItem to update any primary key attributes. Instead, you will need to delete the item, and then use PutItem to create a new item with new attributes.
If you wanted to make data searchable by _id, you could introduce a secondary index with the _id field as the partition key of the index.
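As a sketch, adding such an index to an existing table is an UpdateTable call with parameters along these lines, and the index is then read with a Query that names it. The index name and function names are hypothetical, and the values in the query are plain JavaScript values as a DocumentClient-style client would accept them. A table in provisioned capacity mode would also need a ProvisionedThroughput entry inside Create, which is omitted here on the assumption of on-demand mode.

```javascript
const ID_INDEX = "_id-index"; // hypothetical index name

// UpdateTable parameters that add a GSI with _id as its partition key.
function addIdIndexParams(tableName) {
  return {
    TableName: tableName,
    AttributeDefinitions: [{ AttributeName: "_id", AttributeType: "S" }],
    GlobalSecondaryIndexUpdates: [
      {
        Create: {
          IndexName: ID_INDEX,
          KeySchema: [{ AttributeName: "_id", KeyType: "HASH" }],
          Projection: { ProjectionType: "ALL" }, // copy all attributes
        },
      },
    ],
  };
}

// Query parameters to look an item up by _id through the index.
function itemByIdQuery(tableName, id) {
  return {
    TableName: tableName,
    IndexName: ID_INDEX,
    KeyConditionExpression: "#id = :id",
    ExpressionAttributeNames: { "#id": "_id" },
    ExpressionAttributeValues: { ":id": id },
  };
}
```

The base table keeps item_id as its partition key, so no items have to be rewritten or migrated to gain the second lookup path.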
For example, let's say your data looked like this:
If you defined a secondary index on _id, the index would look like this (same data as the previous example, just a different logical view):
DynamoDB doesn't currently have any native versioning functionality, so you'll have to incorporate that into your data model. Fortunately, there's lots of discussion about this use case on the web. AWS has a document of DynamoDB "Best Practices", including an example of versioning.
Given a DynamoDB table that looks similar to:
sessionId: String
deviceType: String (mobile/tablet/computer/...)
networkType: String (wifi/ethernet/3g/4g/...)
There may be some other fields.
I need to be able to look up a session id given the other parameters. SQLish:
SELECT sessionId WHERE deviceType="Mobile"
SELECT sessionId WHERE networkType in (wifi, ethernet) AND deviceType="Tablet"
But from what I understand, querying in DynamoDB always requires the partition key (sessionId).
Is there an alternative layout to this table that will allow for better querying? We're still in setup phase, so it can be changed.
To be efficient and cost-effective, I suggest creating two Global Secondary Indexes (GSIs), one with "deviceType" as its PK and one with "networkType" as its PK. I don't have enough information to suggest an SK. There is no need to project all attributes, because you only want to retrieve sessionId, which is projected by default because it is the table's PK.
To sum up the data model:
         PK             Attributes
Table:   sessionId      deviceType, networkType, ...
GSI_1:   deviceType     sessionId, networkType, ...
GSI_2:   networkType    sessionId, deviceType, ...
For example, when querying GSI_1, you'll use PK="Mobile" to retrieve all related sessionId values.
Doing it this way is really fast and cost-effective, as opposed to a scan.
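The two lookups from the question could be expressed against GSI_1 roughly as follows. The table name and function names are hypothetical, and values are plain JavaScript values as a DocumentClient-style client would accept them. The second function adds a filter on networkType, which is a choice of this sketch: it assumes networkType is projected into GSI_1, as in the summary above, and that the list of network types is not empty.

```javascript
const TABLE = "Sessions"; // hypothetical table name

// SELECT sessionId WHERE deviceType = ?
function sessionsByDevice(deviceType) {
  return {
    TableName: TABLE,
    IndexName: "GSI_1",
    KeyConditionExpression: "#d = :d",
    ProjectionExpression: "#s",
    ExpressionAttributeNames: { "#d": "deviceType", "#s": "sessionId" },
    ExpressionAttributeValues: { ":d": deviceType },
  };
}

// SELECT sessionId WHERE deviceType = ? AND networkType IN (...)
function sessionsByDeviceAndNetworks(deviceType, networkTypes) {
  const values = { ":d": deviceType };
  // One placeholder per network type: :n0, :n1, ...
  const placeholders = networkTypes.map((networkType, i) => {
    values[`:n${i}`] = networkType;
    return `:n${i}`;
  });
  return {
    TableName: TABLE,
    IndexName: "GSI_1",
    KeyConditionExpression: "#d = :d",
    FilterExpression: `#n IN (${placeholders.join(", ")})`,
    ProjectionExpression: "#s",
    ExpressionAttributeNames: {
      "#d": "deviceType",
      "#n": "networkType",
      "#s": "sessionId",
    },
    ExpressionAttributeValues: values,
  };
}
```

The filter is applied after the items for the device type have been read, so the second query still consumes read capacity for every item with that deviceType.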
It appears that DynamoDB's query method must include the partition key as part of the filter. How can a query be performed if you do not know the partition key?
For example, you have a User table with the attribute userid set as the partition key. Now we want to look up a user by their phone number. Is it possible to perform the query without the partition key? Using the scan method, this goal can be achieved, but at the expense of pulling every item from the table before the filter is applied, as far as I know.
You'll need to set up a global secondary index (GSI), using your phoneNumber column as the index hash key.
You can create a GSI by calling UpdateTable.
Once you create the index, you'll be able to call Query with your IndexName, to pull user records based on the phone number.
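A sketch of the Query parameters for that lookup. The index name and function name are hypothetical, the table name User comes from the question, and values are plain JavaScript values as a DocumentClient-style client would accept them.

```javascript
// Query parameters to find users by phone number through the GSI.
function usersByPhoneQuery(phoneNumber) {
  return {
    TableName: "User",
    IndexName: "phoneNumber-index", // hypothetical GSI name
    KeyConditionExpression: "#p = :p",
    ExpressionAttributeNames: { "#p": "phoneNumber" },
    ExpressionAttributeValues: { ":p": phoneNumber },
  };
}
```

Unlike the table's primary key, a GSI key does not have to be unique, so the result can contain more than one user if several records share the same phone number.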