DynamoDB secondary index is not unique? - amazon-dynamodb

I have set a secondary index with only a partition key (without a sort key), but I found that actually I can insert multiple items with the same partition key.
If I issue a query using the partition key in the secondary index, I'll get all the items where the partition key is equal to the given partition key value.
I'm a beginner with DynamoDB, and I want to know whether it's a good idea to set up a secondary index with only a partition key and then insert multiple items with the same partition key.
I'm using Amplify.js and have this GraphQL schema:
type UserMeta @model @key(fields: ["owner"]) @auth(rules: [
  { allow: owner, operations: [create, delete, update] },
  {
    allow: groups,
    groups: ["Admins"],
    operations: [update, delete]
  }
]) {
  familyName: String
  givenName: String
  facebookUrl: AWSURL
  twitterUrl: AWSURL
  description: String
  careers: [Career] @connection(keyName: "byOwner", fields: ["owner"])
  owner: String!
}
type Career @model @key(name: "byOwner", fields: ["owner"]) @auth(rules: [
  { allow: owner, operations: [create, delete, update] },
  {
    allow: groups,
    groups: ["Admins"],
    operations: [update, delete]
  }
]) {
  id: ID!
  company: String
  companyUrl: AWSURL
  industry: String
  occupation: String
  owner: String!
}
As you can see, the Career table has a secondary index, byOwner, whose partition key is owner (no sort key), but I can still query the careers of a UserMeta normally.
With a traditional RDBMS, values in an indexed column cannot be the same. I don't know why this is possible in DynamoDB. Is this a best practice in DynamoDB?
Should I set a sort key for the byOwner index? Maybe the sort key can be the id column?

With a traditional RDBMS, values in an indexed column cannot be the same. I don't know why this is possible in DynamoDB. Is this a best practice in DynamoDB?
Every RDBMS I've worked with allows both unique and non-unique indexes.
The only uniqueness available in DDB is for the table's primary key.
It's very common to have records with the same partition key. In the table, records with the same partition key must have a different sort key.
For indexes, duplicates are allowed and again, this is very common use case.
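To make the distinction concrete, here is a toy in-memory model of the Career table. It is plain JavaScript, not DynamoDB itself, and the class and method names are hypothetical. The table is keyed by its primary key id, so writing an existing id replaces that item, while the byOwner index simply maps each owner to every item that has that owner.

```javascript
// Toy model only: shows why the table key is unique but an index key is not.
class ToyCareerTable {
  constructor() {
    this.items = new Map();   // id -> item (primary key: must be unique)
    this.byOwner = new Map(); // owner -> array of items (duplicates allowed)
  }

  put(item) {
    // Writing an existing id replaces the old item, like PutItem does,
    // so first remove the old entry from the index.
    const previous = this.items.get(item.id);
    if (previous) {
      const oldList = this.byOwner.get(previous.owner);
      this.byOwner.set(previous.owner, oldList.filter((i) => i.id !== item.id));
    }
    this.items.set(item.id, item);
    const list = this.byOwner.get(item.owner) ?? [];
    list.push(item);
    this.byOwner.set(item.owner, list);
  }

  // Like a Query on the byOwner index: returns every item with that owner.
  queryByOwner(owner) {
    return [...(this.byOwner.get(owner) ?? [])];
  }
}
```

Putting two careers for the same owner succeeds, and querying by that owner returns both; only a second put with the same id overwrites.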

One difference between RDBMS and DynamoDB is the latter expects you to know your data access patterns and use that to inform what shape the data should take. So this question ...
Should I set a sort key for the byOwner index? Maybe the sort key can be the id column?
... can only be answered by knowing how you plan to load the Career objects.
If you're going to use a GraphQL query that only ever loads one at a time, like ...
type Query {
  career(owner: String!, id: ID!): Career
}
... then adding the ID as a sort key is well worth it. It would mean the GraphQL Resolver for a Career will always be able to retrieve exactly the right object each time.
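GetItem only works against a table's primary key, so reading from an index always goes through Query. A minimal sketch of the Query params a resolver could send, assuming byOwner were redefined with id as its sort key; the function name and table name are hypothetical, and the values use the plain-JavaScript style of the AWS SDK v2 DocumentClient's query method. The names are aliased through ExpressionAttributeNames, which also avoids clashes with DynamoDB reserved words such as owner.

```javascript
// Hypothetical helper: assumes the byOwner index has `id` as its sort key.
function careerByOwnerAndIdParams(tableName, owner, id) {
  return {
    TableName: tableName,
    IndexName: "byOwner",
    // Partition key AND sort key equality: matches at most one item.
    KeyConditionExpression: "#owner = :owner AND #id = :id",
    ExpressionAttributeNames: { "#owner": "owner", "#id": "id" },
    ExpressionAttributeValues: { ":owner": owner, ":id": id },
  };
}
```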
But if you'll need queries that will get a list of Career objects ...
type Query {
  careers(owner: String!, since: String): [Career]
}
... and by default you only want to retrieve something like the "most recently created careers", then you would be better served by creating another attribute tracking when the career was created -- say createdAt: String! -- and use that as the sort key. The Resolver would then receive the list of careers by that owner in a logical sequence, allowing it to only read the oldest (or newest) careers.
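A sketch of the corresponding Query params, assuming the byOwner index uses a createdAt sort key stored as an ISO-8601 UTC string (for example "2020-12-01T00:07:00.842Z"). Strings in that single format sort lexicographically in time order, which is what makes the >= comparison work. The helper name, table name, and default limit are hypothetical.

```javascript
// Hypothetical helper: assumes byOwner has `createdAt` (ISO-8601 UTC string) as its sort key.
function recentCareersParams(tableName, owner, since, limit = 10) {
  return {
    TableName: tableName,
    IndexName: "byOwner",
    KeyConditionExpression: "#owner = :owner AND #createdAt >= :since",
    ExpressionAttributeNames: { "#owner": "owner", "#createdAt": "createdAt" },
    ExpressionAttributeValues: { ":owner": owner, ":since": since },
    ScanIndexForward: false, // walk the sort key backwards: newest careers first
    Limit: limit,            // stop after this many items
  };
}
```

Flipping ScanIndexForward to true would return the oldest careers first instead.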
This answer has some related info on how to use GSIs and sort keys with AWS AppSync.

Related

Can a DynamoDB Condition Expression work on just the Partition Key of a table with a Composite Key

I have a DynamoDB table, with a composite key, which looks like this:
PK     | SK     | Type | Email         | Description
USER#A | USER#A | User | a@example.com |
USER#A | BUG#1  | Bug  |               | This looks ok
USER#B | BUG#2  | Bug  |               | My user wasn't created first!
I'd like to ensure that a "User" record exists before adding a related "Bug" record - So the 3rd item here is incorrect.
When I put a bug item with the condition attribute_exists(PK), the condition is never true. When I remove the condition, I end up with that third row: a Bug with no corresponding User.
My understanding is that attribute_exists() only looks at items with the combined composite key, and not across the whole table, regardless of which attribute you supply.
Is there a method of ensuring an item with the same Partition Key exists, while ignoring the Sort Key in this scenario?
DynamoDB condition expressions can be confusing, and the docs can compound that problem!
The DynamoDB condition expression works by 1) finding the item, 2) evaluating the condition expression, and finally 3) writing to the database if the condition evaluates to true.
I assume your put operation looks something like this:
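// ddbClient is assumed to be an AWS.DynamoDB.DocumentClient
ddbClient.put({
  TableName: "YOUR TABLE",
  Item: {
    PK: "USER#B",
    SK: "BUG#2",
    Type: "Bug",
    Description: "My user wasn't created first!"
  },
  // Evaluated against the item with this exact PK + SK, not the whole partition
  ConditionExpression: "attribute_exists(PK)"
})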
In this example, DynamoDB first tries to find the item with PK: "USER#B" SK: "BUG#2", which does not exist. As you're experiencing, this item will not be written to DynamoDB because an item with that primary key does not exist.
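The three steps (find the item, evaluate the condition, write) can be mimicked with a toy in-memory table keyed by the full PK + SK. This is a simplified model, not DynamoDB's implementation, and the helper names are hypothetical. It shows why a bug put for an existing user still fails the condition: the lookup uses the bug's own key, where nothing exists yet.

```javascript
// Toy table: Map from "PK|SK" to item. Only the FULL key identifies an item.
function keyOf(pk, sk) {
  return `${pk}|${sk}`;
}

// Mimics a put with ConditionExpression "attribute_exists(PK)".
function putIfPkExists(table, item) {
  // Step 1: find the item with the SAME PK and SK as the one being written.
  const existing = table.get(keyOf(item.PK, item.SK));
  // Step 2: evaluate the condition against that single item only.
  const conditionHolds = existing !== undefined && "PK" in existing;
  if (!conditionHolds) {
    throw new Error("ConditionalCheckFailedException");
  }
  // Step 3: write only when the condition is true.
  table.set(keyOf(item.PK, item.SK), item);
}
```

Even with USER#A/USER#A present, putting USER#A/BUG#1 fails, because no item with that exact key exists.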
The problem you are seeing, as you've alluded to in your question, is that a ConditionExpression applies only to a single item: the one being written. However, you are trying to conditionally put an item in the database by applying the condition to another item. That is a great candidate for a DynamoDB transaction.
Transactions let you group operations together in an all-or-nothing operation. If one of the operations in your transaction fails, the entire transaction will fail and none of the operations will apply.
You can achieve what you are after by taking this approach:
// ddbClient is assumed to be an AWS.DynamoDB.DocumentClient
ddbClient.transactWrite({
  TransactItems: [
    {
      Put: {
        TableName: "YOUR TABLE",
        Item: {
          PK: "USER#B",
          SK: "BUG#2",
          Type: "Bug"
        }
      }
    },
    {
      // Cancels the whole transaction unless the user item exists
      ConditionCheck: {
        TableName: "YOUR TABLE",
        Key: {
          PK: "USER#B",
          SK: "USER#B"
        },
        ConditionExpression: "attribute_exists(PK)"
      }
    }
  ]
})
In the above transaction, I'm using a ConditionCheck to confirm the existence of a user before entering the bug. If the user does not exist, the transaction will fail and the bug won't be written to DDB.
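The all-or-nothing behaviour can be pictured with another toy model: check every condition first, and apply the writes only if all checks pass. This is only an illustration of the semantics, not how DynamoDB executes transactions; the function name and the puts/conditionChecks shape are hypothetical, and the real service applies further rules (for example, two actions in one transaction may not target the same item).

```javascript
// Toy model of transaction semantics over a Map keyed by "PK|SK".
function toyTransactWrite(table, { puts, conditionChecks }) {
  // Phase 1: evaluate every condition before touching the table.
  for (const { PK, SK } of conditionChecks) {
    // attribute_exists(PK) on the checked key: that item must exist.
    if (!table.has(`${PK}|${SK}`)) {
      throw new Error("TransactionCanceledException");
    }
  }
  // Phase 2: every check passed, so apply all writes.
  for (const item of puts) {
    table.set(`${item.PK}|${item.SK}`, item);
  }
}
```

A bug for an existing user is written; a bug for USER#B is rejected and leaves the table untouched.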
For a more thorough explanation of DynamoDB Condition Expressions, I highly recommend you check out Understanding DynamoDB Condition Expressions by Alex DeBrie.

Dynamoose model update with hash key

I'm trying to execute an update against a dynamoose model. Here are the docs on calling model.update:
Model.update(key[, updateObj[, settings]],[ callback])
key can be a string representing the hashKey or an object containing the hashKey & rangeKey.
My schema has both a hash key (partition key) and range key (sort key) like this:
// create model
let model = dynamoose.model(
"SampleStatus",
{
id: {
type: String,
hashKey: true,
},
date: {
type: Date,
rangeKey: true,
},
status: String,
});
I've created an object like this (with a fixed timestamp for demoing):
let timestamp = 1606781220842; // Date.now()
model.create({
id: "1",
date: new Date(timestamp),
status: "pending",
});
I'd like to be able to update the status property by referencing just the id property like this:
model.update({id: "1"}, {status: "completed"})
// err: The provided key element does not match the schema
model.update("1", {status: "completed"})
// err: Argument of type 'string' is not assignable to parameter of type 'ObjectType'
Both calls fail with the errors shown in the comments above.
I can pass in the full composite key if I know the timestamp, so the following will work:
let timestamp = 1606781220842; // Date.now()
model.update({ id: "1", date: timestamp }, { status: "completed" });
However, that requires me holding onto the timestamp and persisting alongside the id.
The ID field, in my case, should be unique by itself, so I don't need both to create a key, but I wanted to add the date as a range key so it was sortable. Should I just update my schema so there's only a single hash key? I thought the docs saying that "key can be a string representing the hashKey" would let me pass in just the ID, but that throws an error at compile time (in TypeScript).
Any suggestions?
The solution here is to remove the rangeKey from the date property.
This is because in DynamoDB every document/item must have a unique “key”. This can either be the hashKey or hashKey + rangeKey.
Since you mention that your id property is unique, you probably want to use just the hashKey as the key, which should fix the issue.
In your example there could have been many documents with that id, so DynamoDB wouldn’t know which to update.
Don’t forget that this causes changes to your table so you might have to delete and recreate the table. But that should fix the problem you are running into.
Logically, there is nothing stopping you from inserting more than one entry into the same partition (in your case, the unique id). You could insert more than one item with the same id if it had a different date.
Therefore if you want to get an item by only its partition key, which is really a unique ID, you need to use a query to retrieve the item (as opposed to a GET), but the return signature will be a collection of items. As you know you only have one item in the partition, you can take the first item, and specify a limit of 1 to save RCU.
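A minimal sketch of that approach using the AWS SDK v2 DocumentClient's query method directly rather than Dynamoose's own query builder. The helper names are hypothetical, and the table name is a parameter because the underlying table name may differ from the model name.

```javascript
// Hypothetical helper: Query params for "the item in partition `id`".
function firstItemByIdParams(tableName, id) {
  return {
    TableName: tableName,
    KeyConditionExpression: "#id = :id", // partition key only, no range key needed
    ExpressionAttributeNames: { "#id": "id" },
    ExpressionAttributeValues: { ":id": id },
    Limit: 1, // only one item lives in this partition, so stop after the first
  };
}

// Query always returns a collection; take its first element (or undefined).
function firstItem(queryResult) {
  const items = queryResult.Items ?? [];
  return items.length > 0 ? items[0] : undefined;
}
```

Usage would look like `firstItem(await docClient.query(firstItemByIdParams(tableName, "1")).promise())`, where docClient is an AWS.DynamoDB.DocumentClient.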
// create model
let model = dynamoose.model(
"SampleStatus",
{
id: {
type: String,
hashKey: true,
"index": {
"name": "index_name",
"rangeKey": "date",
}
},
date: {
type: Date
},
status: String,
});
This makes id alone the table's hash key, so a key of just { id } identifies exactly one item, while the separate index (index_name) pairs id with date as its range key.
Ref: https://dynamoosejs.com/guide/Schema#index-boolean--object--array

How do I query by only part of a composite key in DynamoDB?

Let's say, I have Users writing reviews of Products.
User and Product are separate entities with their own ids.
Review is an entity with a composite id composed of userId and productId.
I have created a table review in DynamoDB with both userId and productId as HASH keys.
aws dynamodb create-table --table-name review \
--attribute-definitions \
AttributeName=user_id,AttributeType=S \
AttributeName=product_id,AttributeType=S \
--key-schema \
AttributeName=user_id,KeyType=HASH \
AttributeName=product_id,KeyType=RANGE \
--provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=5
Thus making userId+productId the composite key.
The review data object is held against that key.
Querying for a review by user and product is fine.
But how do I query for all reviews by a user or all reviews for a product?
With a single parameter, e.g. if I do a query by single key conditional expression with just "#user_id = :userId" or just "#product_id = :productId"
I get an error of the form
Query condition missed key schema element: user_id
or
Query condition missed key schema element: product_id
I have created a table review in DynamoDB with both userId and productId as HASH keys.
You've created a composite primary key for your review table, which consists of a Partition Key of userId and a Sort Key of productId. You did not create two HASH keys.
Logically, your review table will look something like this (I've made up some data for illustration purposes):
This table structure makes it easy to fetch reviews by user. Here's an example of a query for all reviews of USER#ABC
ddbClient.query({
  "TableName": "<YOUR TABLE NAME>",
  "KeyConditionExpression": "#userId = :userId",
  "ExpressionAttributeValues": {
    ":userId": {
      "S": "USER#ABC"
    }
  },
  "ExpressionAttributeNames": {
    "#userId": "userId"
  }
})
This will return a collection of items reviewed by USER#ABC.
DynamoDB will not allow you to fetch items by only specifying the Sort Key (e.g. productId). You always need to provide the Partition Key. So how do you get a list of Users who have reviewed a given product?
If you want to search for all Users that have reviewed a single Product, you could introduce a global secondary index that swaps the Partition Key and Sort Key of your table. This pattern is known as an inverted index. Using my example from above, an inverted index would look like this:
This would allow you to fetch users by productId:
ddbClient.query(
{
"TableName": "<YOUR TABLE NAME>",
"IndexName": "reviews_by_product_index",
"KeyConditionExpression": "#productId = :productId",
"ExpressionAttributeValues": {
":productId": {
"S": "PRODUCT#456"
}
},
"ExpressionAttributeNames": {
"#productId": "productId"
}
}
)
This query would return a collection of two items representing reviews for PRODUCT#456.
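That query assumes the inverted index already exists. A sketch of the UpdateTable params that could add it to the existing table through the low-level client's updateTable method: the index name matches the query above, the attribute names follow this answer's userId/productId (the question's create-table command used user_id/product_id), and the throughput numbers are hypothetical, mirroring the table's. Provisioned throughput is included only because the question's table uses provisioned capacity.

```javascript
// Hypothetical helper: UpdateTable params that add an inverted GSI.
function invertedIndexParams(tableName) {
  return {
    TableName: tableName,
    // Every attribute used in an index key schema must be declared here.
    AttributeDefinitions: [
      { AttributeName: "productId", AttributeType: "S" },
      { AttributeName: "userId", AttributeType: "S" },
    ],
    GlobalSecondaryIndexUpdates: [
      {
        Create: {
          IndexName: "reviews_by_product_index",
          KeySchema: [
            { AttributeName: "productId", KeyType: "HASH" }, // table's sort key becomes the partition key
            { AttributeName: "userId", KeyType: "RANGE" },   // table's partition key becomes the sort key
          ],
          Projection: { ProjectionType: "ALL" }, // copy every attribute into the index
          ProvisionedThroughput: { ReadCapacityUnits: 10, WriteCapacityUnits: 5 },
        },
      },
    ],
  };
}
```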
When working with a composite primary key, you can search based on conditions of the sort key as long as you also specify the partition key. That's a mouthful, but it allows you to perform queries like (in pseudocode)
query where partition key = "USER#ABC" and sort key begins_with "PRODUCT"
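In terms of actual Query params (same low-level attribute-value format as the queries above), that pseudocode could look like the sketch below. The helper name is hypothetical, and it assumes product IDs share a common prefix such as "PRODUCT".

```javascript
// Hypothetical helper: partition key equality plus a begins_with on the sort key.
function reviewsByPrefixParams(tableName, userId, productPrefix) {
  return {
    TableName: tableName,
    // The partition key condition is mandatory; begins_with can only narrow the sort key.
    KeyConditionExpression: "#userId = :userId AND begins_with(#productId, :prefix)",
    ExpressionAttributeNames: { "#userId": "userId", "#productId": "productId" },
    ExpressionAttributeValues: {
      ":userId": { S: userId },
      ":prefix": { S: productPrefix },
    },
  };
}
```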

Querying DynamoDB with partition key and secondary index in Javascript

Having the following table definition in DynamoDB
pk: userId (string)
sk: carId (string)
index: carModelIndex (string)
If I would like to query all the cars rented by a given user for a given model, how would I go about that?
Please note I am using the javascript sdk for dynamodb.

DynamoDB Descending Order fetch records

I have 100 records in a collection.
Collection name: 'users'
{
"name": 'senthilkumar',
"email": 'senthily88@gmail.com', // HashKey
"age": 21,
"created": 1465733486137, // RangeKey - timestamp
}
I need to fetch records as in the following SQL query:
select * from users order by created desc limit 10
How can I get records in that format from DynamoDB?
DynamoDB sorts Query results by the range (sort) key attribute. You can set the ScanIndexForward boolean parameter to true for ascending order or false for descending order.
resource: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
Use the KeyConditionExpression parameter to provide a specific value for the partition key. The Query operation will return all of the items from the table or index with that partition key value. You can optionally narrow the scope of the Query operation by specifying a sort key value and a comparison operator in KeyConditionExpression. You can use the ScanIndexForward parameter to get results in forward or reverse order, by sort key.
To save JSON data to DynamoDB, use put():
var Newparams = {
  TableName: this.SuffleTableName,
  Item: {
    "userId": userId,   // your partition (primary) key
    "addedAt": addedAt, // your sort key
    "status": status,   // additional data
  }
};
Fetch data from DynamoDB using query():
var QueryParam = {
  TableName: 'YOUR TABLE NAME HERE',
  IndexName: 'YOUR INDEX NAME HERE', // only if you are querying a secondary index you created
  KeyConditionExpression: "userId = :userId", // your partition key (names are case-sensitive)
  ExpressionAttributeValues: {
    ":userId": userId,
  },
  ScanIndexForward: false, // descending order; set to true for ascending order
  ExclusiveStartKey: LastEvalVal, // pagination: LastEvaluatedKey of the previous page (undefined on the first request)
  Limit: 10 // items per request
};
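The ExclusiveStartKey/LastEvaluatedKey pair in those params is how DynamoDB pages through results: each response carries a LastEvaluatedKey until there is nothing left. A sketch of a loop that follows it; queryAllPages is a hypothetical name, and queryPage is any async function returning { Items, LastEvaluatedKey }, for example `(p) => docClient.query(p).promise()` with an AWS.DynamoDB.DocumentClient.

```javascript
// Follows LastEvaluatedKey until the results are exhausted or maxPages is reached.
async function queryAllPages(queryPage, baseParams, maxPages = Infinity) {
  const items = [];
  let startKey; // undefined on the first request
  let pages = 0;
  do {
    const params = { ...baseParams }; // never mutate the caller's params
    if (startKey !== undefined) params.ExclusiveStartKey = startKey;
    const page = await queryPage(params);
    items.push(...(page.Items ?? []));
    startKey = page.LastEvaluatedKey; // undefined when there are no more pages
    pages += 1;
  } while (startKey !== undefined && pages < maxPages);
  return items;
}
```

For the "order by created desc limit 10" case, passing maxPages = 1 with Limit: 10 and ScanIndexForward: false stops after the first page.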
If you want to return all rows in your table, you cannot use the Query API, because it requires you to provide a partition key value to filter your results by. Assuming your partition key is name, you could only use Query to bring back the subset of results that have name equal to a given value, e.g. name = senthilkumar.
If you want to return all rows in your table, you must use the Scan API: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.ReadData.Scan.html
Note that Scan does not return results in an overall sorted order: items come back grouped by partition, and only items within the same partition are ordered by the range key. Scan has no equivalent of ScanIndexForward, so you cannot reverse sort with the Scan API. You would need to sort the result set yourself in the application tier, using whatever language you're writing your code in.
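A sketch of that application-tier step, emulating `order by created desc limit 10` after a Scan. It assumes items already holds every scanned item (all pages) and that each item has a numeric created timestamp as in the question; the function name is hypothetical.

```javascript
// Sorts a copy of the scanned items newest-first and keeps the first n.
function topNByCreatedDesc(items, n = 10) {
  return [...items] // copy so the caller's array is not reordered
    .sort((a, b) => b.created - a.created)
    .slice(0, n);
}
```

This still reads the entire table before sorting, which is why it does not scale.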
Scan does not scale well and it is not possible to use Scan to create a paginated, reverse sorted solution if your table contains items with unique partition keys.
If this is your situation, and if you want to return paginated + reverse sorted sets back from DynamoDB, you will need to re-consider the design of your table and which columns are the partition key/range key/index so that you can use the Query API.
