I am moving my database from a SQL database to DynamoDB. I currently have a table with these attributes:
tenantId (PartitionKey)
resourceId (RangeKey)
type
role
name
I have the following query at the moment:
Get all the resources belonging to tenant ten that have type t, role r, and a name containing n. Any of type, role, and name may be null, in which case it is not used as a filter.
Using filters it is possible to run this query in DynamoDB, but after reading the following article https://aws.amazon.com/blogs/database/querying-on-multiple-attributes-in-amazon-dynamodb/ I realized it may be an expensive query, as DynamoDB retrieves the data first and only then filters it server side. That page suggests creating a GSI on the following concatenated value:
tenantId-type-role-name
With this index I can easily filter for ten, t, r, and n. But if I only need to filter by tenantId, type, and name, how should I query the GSI to get all the records that have tenant ten, type t, and a name containing n, with no restriction on role? (The contains operator seems to be supported only in filters.)
I am wondering if I need to create a GSI for each combination, something like:
tenantId-type
tenantId-role
tenantId-name
tenantId-type-role
...
Thanks in advance for your help
Before you build GSIs to simplify your queries, think about storing your data in a different format.
For example, how many resources do you expect per tenant? Could you store your data like this:
{
tenant: 123, //(partition)
resources: [
{ type: 'type1', role: 'role1', name: 'somename1'},
{ type: 'type2', role: 'role2', name: 'somename2'},
{ type: 'type3', role: 'role3', name: 'somename3'}
]
}
With this format, all of a tenant's resources come back in a single read by partition key, which is fast and scales well. You can then apply your contains logic in code. A DynamoDB item can be up to 400 KB in size, so you could probably store several thousand resources per item in the above format.
Also note that each GSI has its own read/write capacity, which is consumed whenever you write to the table. If you take the GSI approach and write to that table a lot, you'll see surprisingly high write usage.
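As a minimal sketch of the "filter in code" step, assuming the item has already been read and deserialized into plain Python data, the optional filters from the question could be applied like this. The function name filter_resources and its parameter names are hypothetical, not part of any AWS SDK.

```python
def filter_resources(resources, type_=None, role=None, name_contains=None):
    """Return the resources that match every filter that is not None."""
    matches = []
    for res in resources:
        if type_ is not None and res.get("type") != type_:
            continue
        if role is not None and res.get("role") != role:
            continue
        if name_contains is not None and name_contains not in res.get("name", ""):
            continue
        matches.append(res)
    return matches


# One item per tenant, shaped like the example above.
tenant_item = {
    "tenant": 123,
    "resources": [
        {"type": "type1", "role": "role1", "name": "somename1"},
        {"type": "type2", "role": "role2", "name": "somename2"},
        {"type": "type3", "role": "role3", "name": "somename3"},
    ],
}

# Filter on type and name only; role is left unrestricted.
filtered = filter_resources(
    tenant_item["resources"], type_="type2", name_contains="name"
)
```

Leaving a filter as None skips it, which mirrors the "null means no filter" behaviour described in the question.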
A bit of context: I am trying to build an inventory to list my AWS resources in various accounts and I am planning to use DynamoDB to store the data. These will be the columns for my table: ResourceARN, ResourceName, ResourceType, StandardTag, IsDeleted, LastUpdateTime and ResourceCreationDate ( this field is available only for a few resource types like Ec2).
Question: I want to query my DDB table using account ID, resource type, and tag name. I am stumped on choosing the primary key for the table. The primary key has to be unique, but one account ID and resource type map to many resources (a 1:many relationship), so I cannot use a combination of resourceType and account ID. Nor can I use resourceArn as my primary key, since it is a 1:1 relationship. Using the resourceARN as the sort key does not make sense to me either. I understand that I can use a simple Scan operation, but that is very costly and will get slower as I add more data to my table.
I would appreciate any suggestions or guidance over the same.
Short answer
Partition key: Account ID
Sort key: <resource type>/<resource ID>
Rationale
It's a common pattern for a sort key to be a string concatenating multiple attributes. Since sort keys can be queried by prefix, you can leverage this in your queries:
Get all account resources: query all sort keys on the Account ID partition key
Get all EC2 instances of an account: query with partition key = <your account ID> and sort key begins_with('ec2-instance').
You may notice that ARNs follow such a hierarchy as well (which is probably not a coincidence). This approach effectively uses a subset of the ARN as the sort key.
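The two access patterns above can be sketched as request parameters for a DynamoDB Query. This is only an illustration: the table name Inventory and the attribute names AccountId and ResourceKey are assumptions, and the helper functions are hypothetical. The dictionaries use the low-level typed format ({"S": ...}) that the boto3 client's query call accepts.

```python
def make_sort_key(resource_type, resource_id):
    """Build the '<resource type>/<resource ID>' sort key."""
    return f"{resource_type}/{resource_id}"


def build_resource_query(account_id, resource_type=None):
    """Build Query parameters for one account, optionally one resource type."""
    params = {
        "TableName": "Inventory",  # hypothetical table name
        "KeyConditionExpression": "AccountId = :acct",
        "ExpressionAttributeValues": {":acct": {"S": account_id}},
    }
    if resource_type is not None:
        # The trailing '/' keeps 'ec2-instance' from also matching a
        # longer type name that merely starts with the same characters.
        params["KeyConditionExpression"] += " AND begins_with(ResourceKey, :prefix)"
        params["ExpressionAttributeValues"][":prefix"] = {"S": resource_type + "/"}
    return params


ec2_params = build_resource_query("123456789012", "ec2-instance")
all_params = build_resource_query("123456789012")
```

With boto3 installed and credentials configured, either dictionary could be passed as boto3.client("dynamodb").query(**ec2_params).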
Some notes:
DynamoDB works with per-item attributes rather than fixed columns. You don't need to include ResourceCreationDate in the items which don't have it, and leaving it out will save you space (see next point).
Attribute names count toward the stored size of every item, which impacts cost and also throughput. It's common to use shorthand names for this reason (rcd instead of ResourceCreationDate, for example).
You can use LSIs (Local Secondary Indexes) to order by creation and update times if you need this. Note that LSIs have to be defined when the table is created.
Query Pattern:
Get all posts where upvotes > downvotes
Schema
Post = new Schema({
id: {
type: String,
hashKey: true
},
upvote: {
type: Number,
},
downvote: {
type: Number,
}
});
How to achieve this query pattern?
DynamoDB splits all the data according to the partition key (PK), i.e. your data is spread across multiple servers for storage.
So to retrieve data with a Query you need to pass at least the partition key.
I believe your use case is: get all posts where upvotes > downvotes.
Since this is a global query, not tied to any partition or specific item in the DynamoDB table, you need to use a secondary index (a sparse index via a GSI).
To achieve this, you can create an additional attribute called upvotes_gt_downvotes and store it only if upvotes are greater than downvotes (or store the difference, so it can be used in more queries). Additionally, you will need to make this new attribute (upvotes_gt_downvotes), along with a timestamp, the sort key (SK) of the GSI.
To get the result you will have to scan this GSI. Note that the GSI contains only the items that satisfy your query, and an item is removed from the GSI as soon as the attribute is deleted from it.
When a downvote makes this value zero or negative, you will have to delete the attribute when updating the item (deleting the attribute automatically removes the item from the GSI).
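A rough sketch of that bookkeeping, assuming the table is called Post, the key attribute is id as in the schema above, and the stored value is the difference between the two counters. The helper build_vote_update is hypothetical, and for simplicity it writes absolute counts rather than atomic increments.

```python
def build_vote_update(post_id, upvotes, downvotes):
    """Build UpdateItem parameters that keep the sparse-index attribute in sync."""
    params = {
        "TableName": "Post",  # assumed table name
        "Key": {"id": {"S": post_id}},
        "ExpressionAttributeValues": {
            ":up": {"N": str(upvotes)},
            ":down": {"N": str(downvotes)},
        },
    }
    diff = upvotes - downvotes
    if diff > 0:
        # The item qualifies: store the difference so the GSI picks it up.
        params["UpdateExpression"] = (
            "SET upvote = :up, downvote = :down, upvotes_gt_downvotes = :diff"
        )
        params["ExpressionAttributeValues"][":diff"] = {"N": str(diff)}
    else:
        # The item no longer qualifies: removing the attribute drops it from the GSI.
        params["UpdateExpression"] = (
            "SET upvote = :up, downvote = :down REMOVE upvotes_gt_downvotes"
        )
    return params


winning = build_vote_update("post-1", upvotes=5, downvotes=2)
losing = build_vote_update("post-2", upvotes=1, downvotes=1)
```

Either dictionary has the shape expected by the boto3 client's update_item call.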
First of all, I have table structure like this,
Users:{
UserId
Name
Email
SubTable1:[{
Column-111
Column-112
},
{
Column-121
Column-122
}]
SubTable2:[{
Column-211
Column-212
},
{
Column-221
Column-222
}]
}
As I am new to DynamoDB, I have a couple of questions about this:
1. Can I create structure like this?
2. Can we set primary key for subtables?
3. Luckily, I found DynamoDB helper class to do some operations into my DB.
https://www.gopiportal.in/2018/12/aws-dynamodb-helper-class-c-and-net-core.html
But I don't know how to fetch only a particular subtable.
4. Can we fetch only specific columns from my main table? I also need suggestions for the subtables.
Note: I am using .net core c# language to communicate with DynamoDB.
Can I create structure like this?
Yes
Can we set primary key for subtables?
No. A hash key can be set on top-level scalar attributes only (String, Number, or Binary).
Luckily, I found DynamoDB helper class to do some operations into my DB.
https://www.gopiportal.in/2018/12/aws-dynamodb-helper-class-c-and-net-core.html
But I don't know how to fetch only a particular subtable.
When you say subtables, I assume that you are referring to the array (List) attributes in the sample table above. To fetch data from a DynamoDB table with the Query API, you need the hash key. If you don't have the hash key, you can use the Scan API, which reads the entire table and is therefore a costly operation.
A GSI (Global Secondary Index) can be created to avoid the Scan operation. However, it can be created on scalar attributes only; a GSI can't be created on an array attribute.
The other option is to redesign the table to match your query access pattern.
Can we fetch only specific columns from my main table? I also need suggestions for the subtables.
Yes, you can fetch specific attributes using ProjectionExpression. This way you get only the required attributes in the result set.
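As an illustration of ProjectionExpression (shown here in Python rather than C#, with the request shape being the same across SDKs), the parameters below ask for UserId, Name, and only the first element of SubTable1. The table name Users, the assumption that UserId is a string hash key, and the helper build_projection_get are all hypothetical. Name is a DynamoDB reserved word, so it is referenced through an ExpressionAttributeNames placeholder.

```python
def build_projection_get(user_id):
    """Build GetItem parameters that return only selected attributes."""
    return {
        "TableName": "Users",  # assumed table name
        "Key": {"UserId": {"S": user_id}},  # assumes UserId is the hash key
        # Document paths such as SubTable1[0] select a single list element.
        "ProjectionExpression": "UserId, #name, SubTable1[0]",
        "ExpressionAttributeNames": {"#name": "Name"},
    }


get_params = build_projection_get("user-42")
```

Attribute names containing characters such as a hyphen (for example Column-111) would need the same kind of placeholder.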
I'm investigating whether to use AWS DynamoDb or Azure DocumentDb or google cloud for price and simplicity for my app and am wondering what the best approach is for a typical invite schema.
An invite has
userId : key (who created the invite)
gameId : key
invitationList : collection of userIds
The queries I would be running are
Get invites where userId == me
Get invites where my userId is in the invitationList
In Mongo, I would just set an index on the embedded invitationList, and in SQL I would set up a join table of gameId and invited UserIds.
Using DynamoDB or DocumentDB, could I do this in one "table", or would I have to set up a second, denormalized table that has one row per invited userId with a set of invited gameIds?
e.g.
A secondary table with
InvitedUserId : key
GameIds : Collection
Similar to hslriksen's answer, if certain criteria are met, I recommend that you denormalize all of this into a single document. Those criteria are:
The invitationList for games cannot grow unbounded.
Even at its maximum length, the array must fit within the document and transaction size limits.
However, different from hslriksen, I recommend that an example document look like this:
{
gameId: <some game key>,
userId: <some user id>,
invitationList: [<user id 1>, <user id 2>, ...]
}
You might also decide to use the built-in id field for games in which case the name above is wrong.
The key difference between what I propose and hslriksen's answer is that the invitationList is a pure array of foreign keys. This will allow indexes to be used for an ARRAY_CONTAINS clause in your query.
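For illustration, the two queries from the question could be written as parameterized DocumentDB SQL along these lines. The query and parameters layout mirrors the parameterized query format DocumentDB accepts; the helper names and the @me parameter name are assumptions made for this sketch.

```python
def build_created_query(user_id):
    """Invites created by the given user."""
    return {
        "query": "SELECT * FROM c WHERE c.userId = @me",
        "parameters": [{"name": "@me", "value": user_id}],
    }


def build_invited_query(user_id):
    """Invites whose invitationList contains the given user."""
    return {
        "query": "SELECT * FROM c WHERE ARRAY_CONTAINS(c.invitationList, @me)",
        "parameters": [{"name": "@me", "value": user_id}],
    }


created = build_created_query("user-1")
invited = build_invited_query("user-1")
```

Because invitationList holds plain user ids rather than nested objects, ARRAY_CONTAINS can compare each element directly against the parameter.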
Note, in DocumentDB, you would tend to store all entity types in the same big bucket and just distinguish them with a string type field or slightly better, an is_my_type boolean field.
For DocumentDB you could probably just keep this in one document per inviting user, where the document id could equal the key of the inviting user. If you have many games, you could use gameId as the partitionKey.
{
"id" : "gameKey+invitingUserKey",
"gameKey" : "someGameKey",
"invitingUserId": "key",
"invites": ["inviteKey1", "inviteKey2"]
}
This is based on a limited number of invites for a user/gameKey. It is, however, hard to determine the structure without knowing your query patterns. I find that the query patterns often dictate the document structure.
I am currently using DynamoDB and having a problem scanning. I am able to get paged results in forward order by using the ExclusiveStartKey. However, regardless of whether I set ScanIndexForward to true or false, I get results in forward order from my Scan operation. How can I get results in reverse order from a Scan in DynamoDB?
ScanIndexForward is a parameter of the Query operation, and it is the correct way to get items in descending order by the range key of the table or index you are querying. From the AWS API Reference:
A value that specifies ascending (true) or descending (false)
traversal of the index. DynamoDB returns results reflecting the
requested order determined by the range key. If the data type is
Number, the results are returned in numeric order. For type String,
the results are returned in order of ASCII character code values. For
type Binary, DynamoDB treats each byte of the binary data as unsigned
when it compares binary values.
Based on the docs for Scan, I conclude that there is no way to Scan in reverse. However, I would say that you are not using DynamoDB correctly if you need to do that. When designing a schema for a database like DynamoDB you should plan the schema based on your expected queries, to ensure that almost all application queries have a good index. Scans are meant more for sysadmin operations or for feeding into MapReduce or analytics. "A Scan operation always scans the entire table, then filters out values to provide the desired result, essentially adding the extra step of removing data from the result set." (Query and Scan Performance) That can lead to performance problems and other issues.
Using DynamoDB is fundamentally different from working with a traditional relational database and requires a big change in the way you think about using it. You need to decide whether DynamoDB's advantages (scalable storage and performance, reliability, and availability) are worth accepting its limitations.
As of now, a DynamoDB Scan cannot return sorted results.
You need to use a Query against a new global secondary index (GSI) with a hash key and a range key. The trick is to use a hash key that is assigned the same value for every item in your table.
I recommend adding a new attribute to every item, calling it "Status", and setting its value to "OK" or something similar.
Then your query to get all the results sorted would look like this:
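{
TableName: "YourTable",
IndexName: "Status-YourRange-index",
// KeyConditionExpression replaces the legacy KeyConditions parameter.
KeyConditionExpression: "#status = :status",
// Status is a DynamoDB reserved word, so it needs a placeholder.
ExpressionAttributeNames: { "#status": "Status" },
ExpressionAttributeValues: { ":status": "OK" },
// false returns items in descending order of the index's range key.
ScanIndexForward: false
}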
The docs for how to write GSI queries are found here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.Querying
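A Query like the one above returns results one page at a time, so reading everything in descending order means following LastEvaluatedKey with ExclusiveStartKey. The sketch below shows that loop under stated assumptions: query_all_descending is a hypothetical helper that takes any callable with the same keyword interface as an SDK query call, and fake_query is a stand-in that serves two canned pages so the example runs without AWS access.

```python
def query_all_descending(query_fn, params):
    """Collect every page of a Query, newest range-key values first."""
    items = []
    request = dict(params, ScanIndexForward=False)
    while True:
        page = query_fn(**request)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items
        # Resume the next page where the previous one stopped.
        request["ExclusiveStartKey"] = last_key


def fake_query(**kwargs):
    """Stand-in for a real query call: two pages, already in descending order."""
    if "ExclusiveStartKey" not in kwargs:
        return {"Items": [{"id": 3}, {"id": 2}], "LastEvaluatedKey": {"id": 2}}
    return {"Items": [{"id": 1}]}


all_items = query_all_descending(
    fake_query,
    {"TableName": "YourTable", "IndexName": "Status-YourRange-index"},
)
```

In real use, query_fn would be the query method of an SDK client or table object, and params would also carry the key condition on Status.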