So, with a very simple DynamoDB table, with Primary key let's say:
Address: string (e.g. '1 someRd,someCity,someCounty')
and a GSI: Postcode: string
If I try to manually add an item to the table with just an Address field and no Postcode, it throws an error: "One or more parameter values are not valid. A value specified for a secondary index key is not supported. The AttributeValue for a key attribute cannot contain an empty string value. IndexName: Postcode, IndexKey: Postcode"
I assumed that sparse GSIs were allowed, as per: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-general-sparse-indexes.html
So, if a table has a GSI, does the GSI key attribute need to be included on every item in the table?
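For what it's worth, the error text mentions an empty string value, which suggests the item was submitted with Postcode set to '' rather than with the attribute left out. A sparse index relies on the index key attribute being absent from items that should not be indexed. A minimal sketch, assuming items are plain Python dicts before they are written; the helper name drop_empty_index_keys is hypothetical:

```python
def drop_empty_index_keys(item, index_key_names):
    """Return a copy of item without index key attributes that are '' or None.

    Items that lack the index key attribute are simply not written to the
    GSI, which is what makes the index sparse.
    """
    return {
        name: value
        for name, value in item.items()
        if not (name in index_key_names and value in ("", None))
    }


item = {"Address": "1 someRd,someCity,someCounty", "Postcode": ""}
cleaned = drop_empty_index_keys(item, {"Postcode"})
print(cleaned)  # {'Address': '1 someRd,someCity,someCounty'}
```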
I want to store and retrieve data from a DynamoDB table.
My data (an item = a review a user gave on a feature of an app) has the following attributes:
user string
feature string
appVersion string
timestamp string
rate int
description string
There are multiple features, on multiple versions of the app, and a user can give multiple reviews on these features. So I would like to use (user, appVersion, feature, timestamp) as a primary key.
But it does not seem to be possible to use that many attributes in a primary key in DynamoDB.
The first solution I implemented is to use user as a Partition Key, and a hash of (appVersion, feature, timestamp) as a Sort Key (in a new field named reviewID).
My problem is that I want to retrieve an item for a given user, feature, and appVersion without knowing the timestamp value (let's say I want the item with the latest timestamp, or the list of all items matching the 3 fields).
Without knowing the timestamp, I can't build the Sort Key necessary to retrieve my item. But if I remove the timestamp from the Sort Key, I will not be able to store multiple items having the same (user, appVersion, feature).
What would be the proper way to handle this use case?
I am thinking about using a hash of (user, appVersion, feature) as a Partition Key, and the timestamp as a Sort Key. Would this be a correct solution?
Put the timestamp at the end of your SK and then when you Query the data you use begins_with on the SK.
PK        SK
UserID    appVersion#feature#timestamp
This will allow you to query the data dynamically. For example, say you want all of a user's votes for a specific app version:
SELECT * FROM "Mytable" WHERE PK = 'x' AND begins_with(SK, '{VERSION ID}')
This is done using a Query command.
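As a rough sketch of the same idea in Python: the functions below build the composite sort key and a parameter dict in the shape of the low-level Query API (the kind of dict that could be passed as keyword arguments to a boto3 client's query call). The names make_sort_key and build_prefix_query and the table name are hypothetical, and the sketch assumes none of the values contain the '#' delimiter. The prefix ends with '#' so that version '1.2' does not also match '1.20'.

```python
def make_sort_key(app_version, feature, timestamp):
    """Composite sort key: appVersion#feature#timestamp."""
    return "#".join([app_version, feature, timestamp])


def build_prefix_query(table_name, user_id, app_version, feature=None):
    """Query parameters for one user, narrowed by version and optionally feature."""
    parts = [app_version] + ([feature] if feature is not None else [])
    prefix = "#".join(parts) + "#"  # trailing delimiter closes the last segment
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": user_id},
            ":prefix": {"S": prefix},
        },
    }


params = build_prefix_query("Mytable", "user-1", "1.2", "login")
print(params["ExpressionAttributeValues"][":prefix"]["S"])  # 1.2#login#
```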
The answer from Lee Hannigan will work, I like it.
However, keep in mind that accessing a PK is very fast because it's hash-based.
I am thinking about using a hash of (user, appVersion, feature) as a
Partition Key, and the timestamp as a Sort Key, would this be a
correct solution?
This might also work. The table would look like this:
PK                                                        SK
User#{user}#AppVersion#{appVersion}#Feature#{feature}     TimeStamp#{timestamp}
If you always know the user, appVersion, and feature, this will be more efficient: the partition key lookup is hash-based, and the remaining sort key lookup on the timestamp is O(log N).
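A minimal sketch of how the "latest review" lookup could work with this layout, again as a parameter dict in the shape of the low-level Query API. The function names and table name are hypothetical, and it assumes the timestamp in the sort key uses a format whose string order matches time order (for example fixed-width UTC ISO 8601).

```python
def make_partition_key(user, app_version, feature):
    """Composite partition key with '#' between every segment."""
    return f"User#{user}#AppVersion#{app_version}#Feature#{feature}"


def latest_review_query(table_name, user, app_version, feature):
    """Query parameters returning only the item with the greatest sort key."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {
            ":pk": {"S": make_partition_key(user, app_version, feature)},
        },
        "ScanIndexForward": False,  # read the sort key in descending order
        "Limit": 1,                 # stop after the first (latest) item
    }


print(make_partition_key("alice", "1.2", "login"))
# User#alice#AppVersion#1.2#Feature#login
```

Dropping Limit and ScanIndexForward from the dict would return every review for that user, version, and feature in ascending timestamp order.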
One way:
HASH string "modelName": "user"
RANGE string "id": "b0d5be50-4fae-11ed-981f-dbffcc56c88a"
The UUID itself can be used as a timestamp.
When searching, you could read the index in reverse order.
Another way:
HASH string "modelName": "user"
RANGE string "createdAt": "2019-10-12T07:20:50.52Z"
For createdAt, use the RFC 3339 time format.
When searching, you could read the index in reverse order.
Put down on paper what you need and you'll find other ways to arrange the HASH/RANGE keys.
I have a DynamoDB table called Message with the following attributes:
message_id: number (partition key)
user_id: number (sort key)
incoming: boolean
subject: string
I want to create a global secondary index with user_id for the partition key, and the combined value of incoming and subject for the sort key.
Global secondary index:
user_id: partition key
incoming#subject: sort key
Do I have to manually cast the incoming attribute to a string (where true becomes "1", and false becomes "0") before combining it with subject? What is the standard way to handle such a scenario?
As far as I know, you can't have a combined attribute like incoming#subject only in the global secondary index while the original table holds only the separate incoming and subject attributes. The attributes in your index reflect the ones in the table; the difference between the two is that the index uses a different partition key and sort key. So you can't "combine" incoming#subject just in the index without having this attribute in the table as well.
However, having incoming#subject in both the table and the index would solve your problem since its value would be determined outside of the database (when you write into the table). You should thus be able to "cast" it to whatever you want when you insert or update the data--whether it is true#my_subject_here or 1#another_subject.
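A small sketch of computing the combined attribute at write time, assuming items are built as plain Python dicts before being written. The helper name build_message_item is hypothetical, and the "1"/"0" encoding is just the one suggested in the question, not a DynamoDB requirement.

```python
def build_message_item(message_id, user_id, incoming, subject):
    """Build a Message item that carries the combined GSI sort key attribute."""
    incoming_flag = "1" if incoming else "0"  # encoding chosen by the writer
    return {
        "message_id": message_id,
        "user_id": user_id,
        "incoming": incoming,
        "subject": subject,
        # Stored on the table item so the index can use it as its sort key.
        "incoming#subject": f"{incoming_flag}#{subject}",
    }


item = build_message_item(42, 7, True, "my_subject_here")
print(item["incoming#subject"])  # 1#my_subject_here
```

Any update to incoming or subject would need to rewrite the combined attribute as well, since nothing keeps the two in sync automatically.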
Let me know if that works for you!
Question
Having gone through the verbose AWS documentation, I need some help to clarify the basic keywords and concepts of DynamoDB.
Kindly confirm whether these are correct.
Hash key
The key which decides the partition of the item, so it is also called the 'partition' key.
Primary key
A hash key, or (hash-key, range-key) pair that can identify only 1 item in the table. A (hash-key, range-key) pair is also called 'composite' key.
If the primary key has only a hash-key, "hash-key" and "primary-key" can be used interchangeably (but doing so can cause confusion).
Local secondary index
In simple terms, an "alternative range key" to be used with the hash-key of the primary key.
Besides the range key in the primary key (hash-key, range-key), we can have additional range keys that can be used with the hash-key of the primary key.
Global secondary index
Alternative (hash-key, range-key) pairs for Query.
KeyConditions
For a query on a table, you can have range conditions only on the range key portion of the table/index primary key. The hash key condition must always be an equality.
Expression attribute name
Many reserved words, such as status, cannot be used directly as attribute names in DynamoDB expressions. An expression attribute name is a placeholder, prefixed with '#', that gets around this restriction. Perhaps a design error of DynamoDB.
Key condition expression
The SQL WHERE-like part of a Query, which needs the hash-key of the primary key. It seems it identifies the one partition to get items from; additionally, we can use a range-key condition to narrow down the items.
KeyConditions
For a query on a table/index, you can have conditions only on the table/index primary key attributes. You must always provide the partition key name and value as an EQ condition. You can optionally provide a second condition, referring to the sort(aka range) key.
Filter expression
A SQL WHERE-like part that can be used with both Query and Scan, but only on non-key attributes.
Filter Expressions for Query
A filter expression cannot contain partition key or sort key attributes. You need to specify those attributes in the key condition expression, not the filter expression.
If used in Query in addition to the key expression, the unmatched items are thrown away.
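To make the three expression concepts concrete, here is a hedged sketch of a single Query parameter dict (low-level Query API shape) that uses a key condition expression, a filter expression, and an expression attribute name for the reserved word status. The key attribute names pk and sk, the function name, and the table name are hypothetical.

```python
def build_filtered_query(table_name, partition_value, sort_prefix, status):
    """Query parameters combining key condition, filter, and a name placeholder."""
    return {
        "TableName": table_name,
        # Key condition: equality on the partition key, optional sort key condition.
        "KeyConditionExpression": "pk = :pk AND begins_with(sk, :prefix)",
        # Filter: non-key attributes only; applied to items the key condition matched.
        "FilterExpression": "#s = :status",
        # 'status' is a reserved word, so it is referenced through a placeholder.
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {
            ":pk": {"S": partition_value},
            ":prefix": {"S": sort_prefix},
            ":status": {"S": status},
        },
    }


params = build_filtered_query("MyTable", "user-1", "2024-", "OPEN")
print(sorted(params))
```

Because the filter runs after the key condition has selected the items, the discarded items are still read; the filter only trims what is returned.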
I have a use case where I want to create a DynamoDB table which contains only 2 attributes: a List of String (for example, Countries) and a Boolean value.
I am extracting this value for each country and implementing different logic depending on whether it is true or false.
My question is: what is the best way (best practice) to design this DynamoDB table?
I thought of the following options:
1. Boolean value as a key: use the boolean value as the key and the List as another attribute.
2. Add a row for each country: create a separate record with the Country value as the key and the flag as an attribute.
3. Use the List of countries as the key and the boolean value as another attribute. (I don't think this can be a good choice.)
What could be the best practice while designing tables like this?
Thank You,
Prasad
From AWS DynamoDB Docs, NamingRulesDataTypes:
When you create a table or a secondary index, you must specify the names and data types of each primary key attribute (partition key and sort key). Furthermore, each primary key attribute must be defined as type string, number, or binary.
There are many options to model your table, but keep in mind you have to respect the rules cited above.
Your second case is a good one:
Add a row for each country. Create a separate record with Country value as key and flag as an attribute.
Partition key: country - string
Non-key attribute (you do not have to define it at table creation): flag - boolean
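A sketch of what this design could look like as CreateTable parameters and an item in the low-level attribute-value format. The function names and the on-demand billing mode are assumptions made for illustration; only the key attribute appears in AttributeDefinitions, while flag is just set on each item.

```python
def country_table_definition(table_name):
    """CreateTable parameters: only the key attribute is declared."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "country", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "country", "KeyType": "HASH"},
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def country_item(country, flag):
    """One item per country; the boolean is an ordinary non-key attribute."""
    return {"country": {"S": country}, "flag": {"BOOL": flag}}


print(country_item("France", True))
# {'country': {'S': 'France'}, 'flag': {'BOOL': True}}
```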
In my CloudFormation script, I'm creating a DynamoDB table (Datasets) with two keys; let's call them CatalogId and DatasetId. They are both URIs that are outside of my control, but suffice it to say that together they make a unique ID.
I made both of them HASH keys in the primary KeySchema / index. When I did that, CF gave me the following error:
Invalid KeySchema: The second KeySchemaElement is not a RANGE key type
What am I doing wrong?
The answer is that only one of the keys can be a HASH key in the primary index. The second key must be of RANGE type, even if you never plan on comparing it with > or <. I'd love it if somebody could elaborate on why I can't have two HASH keys. Why doesn't Dynamo just concatenate the two keys internally and create one primary key?
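For illustration, the corrected key definition could look like the following, written here as a Python dict in the shape of CreateTable parameters (CloudFormation's AWS::DynamoDB::Table uses the same AttributeName/KeyType fields). The function name is hypothetical, and string types are assumed because both keys are URIs.

```python
def datasets_key_definition():
    """Key definition with exactly one HASH key followed by one RANGE key."""
    return {
        "AttributeDefinitions": [
            {"AttributeName": "CatalogId", "AttributeType": "S"},
            {"AttributeName": "DatasetId", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "CatalogId", "KeyType": "HASH"},   # partition key
            {"AttributeName": "DatasetId", "KeyType": "RANGE"},  # sort key
        ],
    }


key_types = [k["KeyType"] for k in datasets_key_definition()["KeySchema"]]
print(key_types)  # ['HASH', 'RANGE']
```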
As you mentioned, DynamoDB doesn't have that option. It expects the client to concatenate the values into a String and store that value in one field (i.e. the Hash key in the above case).
If you still need those attributes as separate fields, you can store them individually as non-key attributes.
Q: Are composite attribute indexes possible?
No. But you can concatenate attributes into a string and use this as a key.
Example:
First and last name as composite key
Concatenate first and last name and store that as hash key
Save the first name as a non-key attribute
Save the last name as a non-key attribute
I know it is a little redundant. This is just a workaround to keep things clear.
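A minimal sketch of that workaround, assuming items are plain Python dicts before being written. The attribute name full_name, the helper name, and the '#' delimiter are hypothetical choices; the delimiter check is there because a delimiter that can occur inside the parts would make two different name pairs collide.

```python
def build_person_item(first_name, last_name, delimiter="#"):
    """Concatenate the parts into the hash key and keep them as separate attributes."""
    if delimiter in first_name or delimiter in last_name:
        raise ValueError("delimiter must not occur in the name parts")
    return {
        "full_name": f"{first_name}{delimiter}{last_name}",  # hash key
        "first_name": first_name,                            # non-key attribute
        "last_name": last_name,                              # non-key attribute
    }


print(build_person_item("Ada", "Lovelace")["full_name"])  # Ada#Lovelace
```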