nullable GSI vs Sparse Index - amazon-dynamodb

I know AWS questions rarely get an answer, but I will try my luck. It's a pretty easy one, but I cannot find an answer to it.
Let's say I have the simplest table ever, with 2 'columns': token (Partition Key) and userId.
What I want is to have 1000 tokens, and once a user signs up, a token gets assigned to that user. Basically, the userId property in the table will be populated. Also, I want to be able to query, not scan, by both token and userId.
In order to query by userId I can use it as a GSI key, BUT it is stated that GSI keys should not be null. All my entries will have the token property from the beginning, but userId will be empty until it actually gets assigned.
What can I use for this scenario? I thought about a Sort Key, as it's a Sparse Index, but as far as I know I cannot query ONLY by the Sort Key itself.

Just do what you’re thinking there and create a GSI based on the userID. For the base table don’t include a userID attribute at all until the token has one. This isn’t relational. You don’t need every row to have all attributes. Only the key attributes have to be provided.
So:
Early on all items are just the token attributes with no other attributes
You have a GSI with userID as its PK
Then add userID attributes to the tokens when assigned, and that will get propagated to the GSI
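The steps above can be sketched as follows. This is only a sketch: it assumes the parameter shapes of the AWS SDK for JavaScript DocumentClient, and the table name Tokens and index name userId-index are hypothetical. The functions only build request parameters, so they run without AWS credentials.

```javascript
// Hypothetical names: adjust to your own table and index.
const TABLE = "Tokens";
const USER_INDEX = "userId-index";

// Unassigned token: only the key attribute is written, so the item
// does not appear in the GSI at all (the index stays sparse).
function putUnassignedToken(token) {
  return { TableName: TABLE, Item: { token } };
}

// Assign a token: once userId is set, the item shows up in the GSI.
// The condition rejects the update if the token already has a userId.
function assignToken(token, userId) {
  return {
    TableName: TABLE,
    Key: { token },
    UpdateExpression: "SET userId = :u",
    ConditionExpression: "attribute_not_exists(userId)",
    ExpressionAttributeValues: { ":u": userId },
  };
}

// Query (not scan) by userId through the GSI.
function queryByUser(userId) {
  return {
    TableName: TABLE,
    IndexName: USER_INDEX,
    KeyConditionExpression: "userId = :u",
    ExpressionAttributeValues: { ":u": userId },
  };
}
```

With a DocumentClient instance these objects would be passed to put, update and query respectively; querying by token stays a plain key lookup on the base table.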

Related

DynamoDB : Good practice to use a timestamp field in a Primary Key

I want to store and retrieve data from a DynamoDB table.
My data (an item = a review a user gave on a feature of an app) have the following attributes :
user string
feature string
appVersion string
timestamp string
rate int
description string
There are multiple features, on multiple versions of the app, and a user can give multiple reviews on these features. So I would like to use (user, appVersion, feature, timestamp) as a primary key.
But it does not seem to be possible to use that many attributes in a primary key in DynamoDB.
The first solution I implemented is to use user as the Partition Key, and a hash of (appVersion, feature, timestamp) as the Sort Key (in a new field named reviewID).
My problem is that I want to retrieve an item for a given user, feature and appVersion without knowing the timestamp value (let's say I want the item with the latest timestamp, or the list of all items matching the 3 fields).
Without knowing the timestamp, I can't build the Sort Key necessary to retrieve my item. But if I remove the timestamp from the Sort Key, I will not be able to store multiple items having the same (user, appVersion, feature).
What would be the proper way to handle this use case?
I am thinking about using a hash of (user, appVersion, feature) as the Partition Key, and the timestamp as the Sort Key. Would this be a correct solution?
Put the timestamp at the end of your SK, and then when you Query the data you use begins_with on the SK.
PK      SK
UserID  appVersion#feature#timestamp
This will allow you to dynamically query the data. For example, say you want all of a user's reviews for a specific app version. In pseudo-SQL:
SELECT * FROM Mytable WHERE PK = 'x' AND begins_with(SK, '{VERSION ID}')
This is done using a Query command.
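A rough sketch of that Query, assuming DocumentClient-style parameters, key attributes literally named PK and SK as in the table above, and a hypothetical table name Reviews. The helper names are made up for illustration.

```javascript
// Build the composite sort key "appVersion#feature#timestamp".
function reviewSortKey(appVersion, feature, timestamp) {
  return [appVersion, feature, timestamp].join("#");
}

// Query parameters for one user's reviews, narrowed by a sort key prefix.
// feature is optional: omit it to get every feature of that app version.
function queryReviews(userId, appVersion, feature) {
  // The trailing "#" keeps version "1.2" from also matching "1.20".
  let prefix = appVersion + "#";
  if (feature !== undefined) prefix += feature + "#";
  return {
    TableName: "Reviews", // hypothetical table name
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
    ExpressionAttributeValues: { ":pk": userId, ":prefix": prefix },
    // Descending sort key order. With a full appVersion#feature# prefix and
    // lexicographically sortable timestamps (e.g. ISO 8601 in UTC), the
    // newest review comes first.
    ScanIndexForward: false,
  };
}
```

Adding Limit: 1 to the parameters for a full appVersion#feature# prefix would return only the latest review, which covers the "without knowing the timestamp" case from the question.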
The answer from Lee Hannigan will work, I like it.
However, keep in mind that accessing a PK is very fast because it's hash-based.
"I am thinking about using a hash of (user, appVersion, feature) as a Partition Key, and the timestamp as a Sort Key, would this be a correct solution?"
This might also work. The table would look like this:
PK                                                     SK
User#{user}#AppVersion#{appVersion}#Feature#{feature}  TimeStamp#{timestamp}
If you always know the user, the appVersion, and the feature, this will be more optimal, because more of the lookup is handled by the hash-based PK, whereas an SK lookup is O(log N).
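As an illustration of this second layout, here is a sketch that builds the composite partition key and the parameters for fetching the latest review. It assumes DocumentClient-style parameters, a key attribute named PK, and a hypothetical table name Reviews; the key format follows the spirit of the table above.

```javascript
// Composite partition key built from the three values that are always known.
function reviewPartitionKey(user, appVersion, feature) {
  return `User#${user}#AppVersion#${appVersion}#Feature#${feature}`;
}

// Latest review for (user, appVersion, feature): read the partition in
// descending sort key order and stop after the first item.
function latestReviewQuery(user, appVersion, feature) {
  return {
    TableName: "Reviews", // hypothetical table name
    KeyConditionExpression: "PK = :pk",
    ExpressionAttributeValues: {
      ":pk": reviewPartitionKey(user, appVersion, feature),
    },
    ScanIndexForward: false, // descending by sort key (the timestamp)
    Limit: 1,
  };
}
```

Dropping Limit and ScanIndexForward returns all reviews for that combination in ascending timestamp order. The trade-off is that "all reviews by a user across versions" is no longer a single Query with this key design.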
One way:
HASH   string  "modelName": "user"
RANGE  string  "id": "b0d5be50-4fae-11ed-981f-dbffcc56c88a"
The UUID itself can be used as a timestamp (this one is a time-based, version 1 UUID).
When searching, you could search using a reverse index.
Another way:
HASH   string  "modelName": "user"
RANGE  string  "createdAt": "2019-10-12T07:20:50.52Z"
For createdAt, use the RFC 3339 time format.
When searching, you could search using a reverse index.
Put down on paper what you need and you'll find other ways to manage the HASH/RANGE index.

How to filter DynamoDb by object property value

I have a DynamoDB table:
How should I filter the entries in the DB table to get all items where access.role = "ADMIN"?
You would be best served by setting up a Global Secondary Index (GSI). You set the index's Partition Key to that attribute, and the Sort Key to some other attribute that you can guarantee will be unique. Note that an index key has to be a top-level attribute of type string, number, or binary, so a nested value such as access.role first has to be copied into a top-level attribute. Then you use your SDK of choice or the Query option in the console, select the index, and query for partition_key = ADMIN.
However, be aware: indexes are a replication of the table (of whichever attributes you project into them). Dynamo is very good at this and relatively fast at doing so, but GSIs are updated asynchronously, so there is still the possibility that your index will be briefly out of sync with the actual data. If you are not making the call against the index very often, you are pretty much fine. If you are calling it very often, then you should restructure your table.
Dynamo is not SQL. When setting up a Dynamo schema you have to consider how you will access your data: your Access Patterns. You should design your data with your Partition Key as the data you will have when looking up (i.e. I will always have a user ID number) and your Sort Keys as the individual documents related to that PK (i.e. a user has a document that is his profile data, a document that is his profile picture URL, a document that is a list of his friends' user numbers, a document that is ... etc.)
Then you use indexes for things like your question that you won't be doing very often.
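A minimal sketch of the index approach, assuming DocumentClient-style parameters and that the role is kept in a top-level attribute. The attribute name accessRole, the index name accessRole-index and the table name Users are all hypothetical.

```javascript
// Copy the nested access.role into a top-level attribute before writing the
// item, so that the value can serve as the partition key of the index.
function withIndexedRole(item) {
  const role = item.access && item.access.role;
  return role === undefined ? { ...item } : { ...item, accessRole: role };
}

// Query the index for every item with the given role.
function queryByRole(role) {
  return {
    TableName: "Users",            // hypothetical table name
    IndexName: "accessRole-index", // hypothetical index name
    KeyConditionExpression: "accessRole = :r",
    ExpressionAttributeValues: { ":r": role },
  };
}
```

Items written without a role simply never appear in the index, so the query for "ADMIN" only reads the items it returns.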

Using a GUID as entity Id vs the entity's "actual" Id

In every Cosmos DB repository example I've seen, the id/row key has been generated like this: {partitionKey}:{Guid.NewGuid()}. I'm working on a web API where the user won't necessarily have any way of knowing what this random GUID is. But they will know the EmployeeId, ProjectId etc. of the respective object, so I'm wondering if there are any issues with using e.g. EmployeeId as both the partition key and the id?
There's nothing technically wrong with setting the id and the partition key to the same value. However, you will have just one document per partition, and that's bad design IMHO, as all your read queries that span more than one document (e.g. listing all employees) will be cross-partition queries.
One approach could be to set the partition key as the type of the entity (Employee, Project etc.) and then set the id as the unique identifier of the entity (employee id, project id etc.).
To be honest, if you know the partition key AND the item id, you can do a Point read which is the fastest.
We used to also take the approach of using random guids for all item IDs, but this means you will always need to know this id and partition key. Sometimes a more functional key as the item ID makes more sense so have a good thought about it!
And remember, an item ID is not globally unique; it is only unique within a partition key.
So you could have two items with the same item ID and different partition keys.
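To make the uniqueness rule concrete, here is a toy in-memory model. It is not the Cosmos DB SDK; the Container class and its methods are invented purely to illustrate that an item is identified by the pair (partition key, id).

```javascript
// Toy model of a container: items are addressed by (partitionKey, id).
class Container {
  constructor() {
    this.partitions = new Map(); // partitionKey -> Map(id -> item)
  }

  // Reject a duplicate id only when it collides inside the same partition.
  create(partitionKey, item) {
    if (!this.partitions.has(partitionKey)) {
      this.partitions.set(partitionKey, new Map());
    }
    const partition = this.partitions.get(partitionKey);
    if (partition.has(item.id)) {
      throw new Error(`id ${item.id} already exists in partition ${partitionKey}`);
    }
    partition.set(item.id, item);
  }

  // "Point read": needs both values and touches a single partition.
  read(partitionKey, id) {
    const partition = this.partitions.get(partitionKey);
    return partition ? partition.get(id) : undefined;
  }
}

const demo = new Container();
demo.create("Employee", { id: "42", name: "Ada" });
demo.create("Project", { id: "42", title: "Migration" }); // same id, other partition
```

Here the entity type is the partition key and the functional identifier is the id, as suggested above, so a caller who knows the EmployeeId can read the item directly.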

Fetch last item of the aws dynamodb table

So I wanted to fetch the last item/row of my DynamoDB table, but I am not finding resources. My primary key is id, holding a series of incremented numbers such as 1, 2, 3... for each row respectively.
This is my function.
async function readMessage() {
  const params = {
    TableName: table,
  };
  return dynamo.getItem(params).promise();
}
I am not sure what I should be adding to my params.
DynamoDB has two types of primary keys:
Partition key – A simple primary key, composed of one attribute known as the partition key.
Partition key and sort key – Referred to as a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.
When fetching an item by partition key, you need to specify the exact partition key. You cannot fetch the max/min partition key.
Instead, you may want to create a sort key with a timestamp (or the ID if it's a sequential number) and use the sort key to fetch the last item.
Check out the AWS docs on Choosing the Right Partition Key for more info.
The proper way to design a table in DynamoDB is based on its expected access patterns. If this is something you need, perhaps you should consider using this id as the Sort Key instead of the Partition Key, and then query the table in descending order while limiting the number of items to 1.
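A sketch of that "descending order, limit 1" query, assuming the schema has been changed so that all messages share a constant partition key value and the incrementing id is the sort key. The attribute name pk and the value "messages" are hypothetical, and the parameters use the DocumentClient shape rather than the low-level client from the question.

```javascript
// Parameters for "give me the item with the highest sort key".
function latestItemQuery(tableName) {
  return {
    TableName: tableName,
    KeyConditionExpression: "pk = :pk",
    ExpressionAttributeValues: { ":pk": "messages" }, // hypothetical constant
    ScanIndexForward: false, // descending sort key order
    Limit: 1,                // stop after the first (highest) item
  };
}

// Pull the single item out of a Query response, or null if the table is empty.
function firstItem(result) {
  return result.Items && result.Items.length > 0 ? result.Items[0] : null;
}

// Usage with an SDK v2 DocumentClient (not executed here):
//   const result = await docClient.query(latestItemQuery("Messages")).promise();
//   const last = firstItem(result);
```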
If instead you don't want to change the schema of your items and you don't mind making at least two operations to do this, you have two non-optimal options:
If none of your items ever gets deleted, just do a count first and use that information to work out which item was written last.
Alternatively, you could consider keeping a "special" record in your DynamoDB table that is basically a counter that always gets incremented when one of your "other" items gets written. Upon retrieval you first read the value of this special record and then use it to retrieve the actual item.
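The "special record" option could look roughly like this. It is a sketch under assumptions: DocumentClient-style parameters, a counter record stored under the otherwise unused id 0, and a counter attribute named lastId. All of these names are hypothetical.

```javascript
const COUNTER_ID = 0; // hypothetical: real items start at id 1

// Atomically bump the counter when writing a new item. ADD creates the
// attribute with the given value if it does not exist yet.
function incrementCounter(tableName) {
  return {
    TableName: tableName,
    Key: { id: COUNTER_ID },
    UpdateExpression: "ADD lastId :one",
    ExpressionAttributeValues: { ":one": 1 },
    ReturnValues: "UPDATED_NEW", // the response carries the new lastId
  };
}

// Plain key lookup, used first for the counter record and then for the
// item whose id the counter points at.
function getById(tableName, id) {
  return { TableName: tableName, Key: { id } };
}

// Usage with an SDK v2 DocumentClient (not executed here):
//   const counter = await docClient.get(getById("Messages", COUNTER_ID)).promise();
//   const last = await docClient.get(getById("Messages", counter.Item.lastId)).promise();
```

This costs two reads per lookup, which is why it is listed above as a non-optimal option.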
The combination of the partition key and sort key makes up the primary key of your item in DynamoDB, so the combination must be unique; otherwise the existing item will be overwritten.
In almost all my use cases, I select an object attribute such as the brand, an email or a class as the partition key, and then I select the timestamp as the sort key. This way you always know the partition key, which is needed to retrieve the values, and you can then query DynamoDB by filtering on the sort key. For more extensive examples using Python, check the AWS page https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.04.html, which shows how you can query your DynamoDB items.
There are also other ways to define the keys in DynamoDB; for those I advise you to check https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html

Query on non-key attribute

It appears that dynamodb's query method must include the partition key as part of the filter. How can a query be performed if you do not know the partition key?
For example, you have a User table with the attribute userid set as the partition key. Now we want to look up a user by their phone number. Is it possible to perform the query without the partition key? Using the scan method, this goal can be achieved, but at the expense of pulling every item from the table before the filter is applied, as far as I know.
You'll need to set up a global secondary index (GSI), using your phoneNumber attribute as the index hash key.
You can create a GSI by calling UpdateTable.
Once you create the index, you'll be able to call Query with your IndexName, to pull user records based on the phone number.
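A sketch of both calls as plain parameter objects. The index name phoneNumber-index is hypothetical, the UpdateTable parameters follow the low-level API shape, and the Query parameters assume a DocumentClient.

```javascript
// UpdateTable parameters that add a GSI keyed on phoneNumber.
function createPhoneIndexParams(tableName) {
  return {
    TableName: tableName,
    // Every index key attribute has to be declared with its type.
    AttributeDefinitions: [
      { AttributeName: "phoneNumber", AttributeType: "S" },
    ],
    GlobalSecondaryIndexUpdates: [
      {
        Create: {
          IndexName: "phoneNumber-index", // hypothetical index name
          KeySchema: [{ AttributeName: "phoneNumber", KeyType: "HASH" }],
          Projection: { ProjectionType: "ALL" },
          // Tables in provisioned capacity mode also need a
          // ProvisionedThroughput entry here.
        },
      },
    ],
  };
}

// Query parameters that look a user up by phone number through the index.
function queryByPhone(tableName, phoneNumber) {
  return {
    TableName: tableName,
    IndexName: "phoneNumber-index",
    KeyConditionExpression: "phoneNumber = :p",
    ExpressionAttributeValues: { ":p": phoneNumber },
  };
}
```

Because the index is not required to be unique, the Query can return more than one user for the same phone number.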
