ConditionExpression for PutItem not evaluating to false - amazon-dynamodb

I am trying to guarantee uniqueness in my DynamoDB table, across the partition key and other attributes (but not the sort key). Something is wrong with my ConditionExpression, because it is evaluating to true and the same values are getting inserted, leading to data duplication.
Here is my table design:
email: partition key (String)
id: sort key (Number)
firstName (String)
lastName (String)
Note: The id (sort key) holds a randomly generated unique number. I know... this looks like a bad design, but that is the use case I have to support.
Here is the NodeJS code with PutItem:
const dynamodb = new AWS.DynamoDB({apiVersion: '2012-08-10'})
const params = {
  TableName: <table-name>,
  Item: {
    "email": { "S": "<email>" },
    "id": { "N": "<someUniqueRandomNumber>" },
    "firstName": { "S": "<firstName>" },
    "lastName": { "S": "<lastName>" }
  },
  ConditionExpression: "attribute_not_exists(email) AND attribute_not_exists(firstName) AND attribute_not_exists(lastName)"
}
dynamodb.putItem(params, function(err, data) {
  if (err) {
    console.error("Put failed")
  } else {
    console.log("Put succeeded")
  }
})

The documentation https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.OperatorsAndFunctions.html says the following:
attribute_not_exists (path)
True if the attribute specified by path does not exist in the item.
Example: Check whether an item has a Manufacturer attribute.
attribute_not_exists (Manufacturer)
It specifically says "item", not "items" or "any item", so it really means that the condition checks only the item that would be overwritten. Since your sort key is a random number, each put targets a new item, so the condition always evaluates to true.
Any implementation that checked a non-key attribute across all records would require a scan of the entire table, which would not perform well.
Here is an interesting article which covers how to deal with unique attributes in DynamoDB: https://advancedweb.hu/how-to-properly-implement-unique-constraints-in-dynamodb/ - single-table design together with transactions would be a possible solution for you, if you can allow the additional partition keys in your table. Any other solution will be challenging under your current schema. DynamoDB has its own way of doing things, and it can be frustrating to push it to do things it was not designed for.
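To make the marker-item pattern from that article concrete, here is a minimal sketch against this table: write the real item and a marker item whose partition key encodes the attribute combination that must stay unique, both in one transaction. The uniq#... key format and the fixed marker sort key of 0 are illustrative conventions, not part of your schema:

const AWS = require('aws-sdk')
const docClient = new AWS.DynamoDB.DocumentClient()

const email = 'jane@example.com', firstName = 'Jane', lastName = 'Doe'
docClient.transactWrite({
  TransactItems: [
    {
      // The real item.
      Put: {
        TableName: '<table-name>',
        Item: { email: email, id: 12345, firstName: firstName, lastName: lastName },
      }
    },
    {
      // Marker item: its key exists purely to reserve the combination. The
      // fixed sort key (0) means attribute_not_exists checks a single item.
      Put: {
        TableName: '<table-name>',
        Item: { email: `uniq#${email}#${firstName}#${lastName}`, id: 0 },
        ConditionExpression: 'attribute_not_exists(email)'
      }
    }
  ]
}).promise()
  .then(() => console.log('Put succeeded'))
  .catch((err) => console.error('Put failed (possibly a duplicate)', err))

If the marker already exists, the whole transaction is cancelled and neither item is written.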

Related

How to force the DynamoDB query's ExclusiveStartKey to use exact match?

I'm using DynamoDB for my new serverless RESTful API with Node.js.
The Restful API supports query for resources with the limit and lastKey query parameters for key pagination.
Assume there's a table like below:
PK     | SK
-------|-------------
School | firstSchool
School | secondSchool
School | thirdSchool
PK is partition key, and SK is sort key.
I use SK for key pagination.
If I call the api with http://somewhere/api/school?limit=1&lastKey=secondSchool, ExclusiveStartKey in query will be {"PK" : "School", "SK" : "secondSchool"}, and the returned item will be {"PK" : "School", "SK" : "thirdSchool"}.
It works well in that case, but the problem is that the same result is returned for a URL like http://somewhere/api/school?limit=1&lastKey=seco.
In this case, ExclusiveStartKey in the query will be {"PK" : "School", "SK" : "seco"}.
It seems DynamoDB doesn't require an exact match for the SK value in ExclusiveStartKey.
Is there any way to force DynamoDB to use exact match for ExclusiveStartKey?
I attach my test code below:
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocument } = require("@aws-sdk/lib-dynamodb");

const ddbClient = new DynamoDBClient({
  region: AWS_REGION,
  endpoint: AWS_DYNAMODB_END_POINT,
  credentials: {
    accessKeyId: AWS_ACCESSKEY_ID,
    secretAccessKey: AWS_SECRET_ACCESS_KEY,
  },
});
const ddbDocClient = DynamoDBDocument.from(ddbClient);

(async () => {
  try {
    const data = await ddbDocClient.query({
      TableName: "Table Name",
      KeyConditionExpression: "#pk = :pk",
      ExpressionAttributeNames: {
        "#pk": "PK",
      },
      ExpressionAttributeValues: {
        ":pk": "Test",
      },
      Limit: 1,
      ExclusiveStartKey: { PK: "Test", SK: "Seco" },
    });
    console.log(data);
  } catch (err) {
    console.log("Error", err);
  }
})();
The ExclusiveStartKey is meant mainly for paging through large Scan or Query results - i.e., retrieving the next page of results after the previous page ended with a LastEvaluatedKey - and you are supposed to pass exactly that key (not some subset of it...) as the ExclusiveStartKey of the next request.
You are trying to do something different, and to achieve it you can't use ExclusiveStartKey, but you can use something else:
The Query request has a KeyConditionExpression. You can specify sk > :value as the key condition expression (and don't pass ExclusiveStartKey), and you'll get all the sort keys greater than that :value, such as your string "seco". Please note, however, that because your sort key value is truncated, the result may actually include one or more extra items before the first key you want (e.g., the keys "seco" and "secoaaaa" come before "secondSchool"), so you may need to drop them from the results yourself.
The KeyConditionExpression is implemented efficiently - DynamoDB knows how to skip directly to that sort key in the partition and doesn't charge you for reading the entire partition, so in this respect it is just as good as ExclusiveStartKey.
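A minimal sketch of that approach, reusing the ddbDocClient and example values from the test code above:

(async () => {
  try {
    // Sketch: page with a key condition instead of ExclusiveStartKey, so a
    // truncated lastKey such as "seco" still lands at the right spot in sort order.
    const data = await ddbDocClient.query({
      TableName: "Table Name",
      KeyConditionExpression: "#pk = :pk AND #sk > :lastKey",
      ExpressionAttributeNames: { "#pk": "PK", "#sk": "SK" },
      ExpressionAttributeValues: { ":pk": "School", ":lastKey": "seco" },
      Limit: 1,
    });
    // Caveat from above: keys like "secoaaaa" also sort after "seco", so drop
    // any returned items that still precede the key you actually meant.
    console.log(data.Items);
  } catch (err) {
    console.log("Error", err);
  }
})();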

Dynamoose model update with hash key

I'm trying to execute an update against a dynamoose model. Here are the docs on calling model.update:
Model.update(key[, updateObj[, settings]],[ callback])
key can be a string representing the hashKey or an object containing the hashKey & rangeKey.
My schema has both a hash key (partition key) and range key (sort key) like this:
// create model
let model = dynamoose.model(
  "SampleStatus",
  {
    id: {
      type: String,
      hashKey: true,
    },
    date: {
      type: Date,
      rangeKey: true,
    },
    status: String,
  });
I've created an object like this (with a fixed timestamp for demoing)
let timestamp = 1606781220842; // Date.now()
model.create({
  id: "1",
  date: new Date(timestamp),
  status: "pending",
});
I'd like to be able to update the status property by referencing just the id property like this:
model.update({id: "1"}, {status: "completed"})
// err: The provided key element does not match the schema
model.update("1", {status: "completed"})
// err: Argument of type 'string' is not assignable to parameter of type 'ObjectType'
But both attempts fail with the errors shown above.
I can pass in the full composite key if I know the timestamp, so the following will work:
let timestamp = 1606781220842; // Date.now()
model.update({ id: "1", date: timestamp }, { status: "completed" });
However, that requires me holding onto the timestamp and persisting alongside the id.
The ID field, in my case, should by itself be unique, so I don't need both fields to form a key, but I wanted to add the date as a range key so results would be sortable. Should I just update my schema so there's only a single hash key? I was thinking the docs saying "key can be a string representing the hashKey" would let me pass in just the ID, but that throws an error at compile time (in TypeScript).
Any suggestions?
The solution here is to remove the rangeKey from the date property.
This is because in DynamoDB every document/item must have a unique “key”. This can either be the hashKey or hashKey + rangeKey.
Since you mention that your id property is unique, you probably want to use just the hashKey as the key, which should fix the issue.
In your example there could have been many documents with that id, so DynamoDB wouldn’t know which to update.
Don’t forget that this causes changes to your table so you might have to delete and recreate the table. But that should fix the problem you are running into.
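A minimal sketch of that fix, keeping the rest of the schema from the question (date stays as a plain attribute, just no longer part of the key):

// create model: id alone is now the full key
let model = dynamoose.model(
  "SampleStatus",
  {
    id: {
      type: String,
      hashKey: true,
    },
    date: Date, // still stored, just not part of the key
    status: String,
  });

// This now works: the key object matches the schema's key (id only).
model.update({ id: "1" }, { status: "completed" });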
Logically there is nothing stopping you from inserting more than one entry into the same partition (in your case, the unique id). You could insert more than one item with the same id, as long as each had a different date.
Therefore, if you want to get an item by only its partition key, which is really a unique ID, you need to use a Query to retrieve the item (as opposed to a GetItem), and the return value will be a collection of items. Since you know there is only one item in the partition, you can take the first item and specify a limit of 1 to save RCUs (see the query sketch below).
// create model
let model = dynamoose.model(
  "SampleStatus",
  {
    id: {
      type: String,
      hashKey: true,
      index: {
        name: "index_name",
        rangeKey: "date",
      },
    },
    date: {
      type: Date,
    },
    status: String,
  });
You have to tell the schema that the hashKey alone is the item's key, and move date into a secondary index, where it still acts as a range key for sorting.
Ref: https://dynamoosejs.com/guide/Schema#index-boolean--object--array
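And a minimal sketch of the query-with-limit retrieval described above (dynamoose v2 query API; exact method names may vary between versions):

(async () => {
  // Sketch: retrieve by partition key alone via a query; the result is a
  // collection, so take its first element. limit(1) keeps the read cheap.
  const results = await model.query("id").eq("1").limit(1).exec();
  const item = results[0]; // the only item in the partition, if it exists
  console.log(item);
})();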

DynamoDB update - "ValidationException: An operand in the update expression has an incorrect data type"

I am trying to append to a string set (array of strings) column, which may or may not already exist, in a DynamoDB table. I referred to SO questions like this and this when writing my UpdateExpression.
My code looks like this.
const AWS = require('aws-sdk')
const dynamo = new AWS.DynamoDB.DocumentClient()

const updateParams = {
  // The table definitely exists.
  TableName: process.env.DYNAMO_TABLE_NAME,
  Key: {
    email: user.email
  },
  // The column may or may not exist, which is why I am combining list_append with if_not_exists.
  UpdateExpression: 'SET #column = list_append(if_not_exists(#column, :empty_list), :vals)',
  ExpressionAttributeNames: {
    '#column': 'items'
  },
  ExpressionAttributeValues: {
    ':vals': ['test', 'test2'],
    ':empty_list': []
  },
  ReturnValues: 'UPDATED_NEW'
}

dynamo.update(updateParams).promise().catch((error) => {
  console.log(`Error: ${error}`)
})
However, I am getting this error: ValidationException: An operand in the update expression has an incorrect data type. What am I doing incorrectly here?
[Update]
Thanks to Nadav Har'El's answer, I was able to make it work by amending the params to use the ADD operation instead of SET.
const updateParams = {
  TableName: process.env.DYNAMO_TABLE_NAME,
  Key: {
    email: user.email
  },
  UpdateExpression: 'ADD items :vals',
  ExpressionAttributeValues: {
    ':vals': dynamo.createSet(['test', 'test2'])
  }
}
A list and a string set are not the same type: a string set can only hold strings, while a list may hold values of any type (including nested lists and objects), its element types don't need to be the same, and it can also hold duplicate items. So if your original attribute is indeed, as you said, a string set and not a list, this explains why the operation cannot work.
To add items to a string set, use the ADD action, not SET. The operand you give to ADD must be a set, not a list (in the JavaScript DocumentClient that is what createSet produces, as in your update above), containing the elements to add. If the attribute already exists, the elements will be added to it (dropping duplicates), and if the attribute doesn't already exist, it will be set to the set of these elements. See the documentation here: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateItem.html#DDB-UpdateItem-request-UpdateExpression

Add to list only if string doesn't already exist in DynamoDB table

I'm trying with the following code
{
  ExpressionAttributeNames: {
    "#items": "items"
  },
  ExpressionAttributeValues: {
    ":item": [slug]
  },
  Key: {
    listId: listId,
    userId: userData.userId,
  },
  UpdateExpression: "SET #items = list_append(#items, :item)",
  ConditionExpression: "NOT contains(#items, :item)",
  TableName: process.env.listsTableName,
}
but the item is still added even if the string already exists in the list. What am I doing wrong?
The list structure is like so:
{
  Item: {
    userId: userData.userId,
    listId: crypto.createHash('md5').update(Date.now() + userData.userId).digest('hex'),
    listName: 'Wishlist',
    items: [],
  },
  TableName: process.env.listsTableName,
};
Later Edit: I know I should use SS, as it does the condition for me, but SS doesn't work in my context because a string set can't be empty.
As the documentation explains, the contains() function only works on a string value (checking for a substring) or a set value (checking for membership). But in your case, you don't have a set but rather a list, which is a different type in DynamoDB.
If all the items you want to add to this list are strings, and you don't want duplicates in the list anyway, the most efficient approach is to stop using a list and instead use the set-of-strings (a.k.a. SS) type. To add an item to the set (without duplicates), you would simply use "ADD #items :item" - no additional condition is needed, because duplicates will not be added.
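A minimal sketch of that approach with the DocumentClient, reusing the variables from the question. It assumes the items attribute is either absent or already stored as a string set (i.e., drop the items: [] initializer); since ADD creates the attribute on first use, the "SS can't be empty" limitation doesn't come up:

const AWS = require('aws-sdk')
const dynamo = new AWS.DynamoDB.DocumentClient()

// Sketch: ADD on a string set creates the attribute on the first write and
// silently drops duplicates, so no ConditionExpression is required.
dynamo.update({
  TableName: process.env.listsTableName,
  Key: {
    listId: listId,
    userId: userData.userId,
  },
  UpdateExpression: "ADD #items :item",
  ExpressionAttributeNames: { "#items": "items" },
  ExpressionAttributeValues: { ":item": dynamo.createSet([slug]) },
}).promise()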

DynamoDB consistent reads for Global Secondary Index

Why can't I get consistent reads for global secondary indexes?
I have the following setup:
The table: tblUsers (id as hash)
Global Secondary Index: tblUsersEmailIndex (email as hash, id as attribute)
Global Secondary Index: tblUsersUsernameIndex (username as hash, id as attribute)
I query the indexes to check whether a given email or username is already present, so I don't create a duplicate user.
Now, the problem is that I can't do consistent reads for queries on the indexes. But why not? This is one of the few occasions where I actually need up-to-date data.
According to AWS documentation:
Queries on global secondary indexes support eventual consistency only.
Changes to the table data are propagated to the global secondary indexes within a fraction of a second, under normal conditions. However, in some unlikely failure scenarios, longer propagation delays might occur. Because of this, your applications need to anticipate and handle situations where a query on a global secondary index returns results that are not up-to-date.
But how do I handle this situation? How can I make sure that a given email or username is not already present in the db?
You probably already went through this:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
The short answer is that you cannot do what you want to do with Global Secondary Indexes (ie it's always eventual consistency).
A solution here would be to have a separate table with the attribute you're interested in as a key, and do consistent reads there. You would need to ensure you update that table whenever you insert new entities, and you would also have to handle the edge case in which the insert succeeds there but not in the main table (i.e., you need to keep the two in sync).
Another solution would be to scan the whole table, but that would probably be overkill if the table is large.
Why do you care if somebody creates 2 accounts with the same email? You could just use the username as the primary hash key and just not enforce the email uniqueness.
When you call putItem, you can supply a ConditionExpression that must be satisfied for the put to succeed, which means you can check whether the email or username already exists.
ConditionExpression — (String)
A condition that must be satisfied in order for a conditional PutItem operation to succeed.
An expression can contain any of the following:
Functions: attribute_exists | attribute_not_exists | attribute_type | contains | begins_with | size
These function names are case-sensitive.
Comparison operators: = | <> | < | > | <= | >= | BETWEEN | IN
Logical operators: AND | OR | NOT
For more information on condition expressions, see Condition Expressions in the Amazon DynamoDB Developer Guide.
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB.html#putItem-property
I ran across this recently and wanted to share an update. In 2018, DynamoDB added transactions. If you really need to keep two items (either in the same or different tables) in 100% sync with no eventual consistency to worry about, TransactWriteItems and TransactGetItems are what you need.
It's better to avoid the transaction altogether, if you can, as others have suggested.
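For this question's use case, the transactional write might look roughly like the sketch below: create the user and reserve marker items for the email and username in one all-or-nothing call. This is only a sketch; the tblUniques marker table and its key format are assumptions, not an established schema.

const AWS = require('aws-sdk')
const docClient = new AWS.DynamoDB.DocumentClient()

// Sketch: three conditional puts in one transaction. If either marker
// already exists, the whole transaction is cancelled and no user is created.
docClient.transactWrite({
  TransactItems: [
    { Put: {
        TableName: 'tblUsers',
        Item: { id: 'user-123', email: 'a@example.com', username: 'alice' },
        ConditionExpression: 'attribute_not_exists(id)'
    } },
    { Put: {
        TableName: 'tblUniques', // hypothetical marker table, pk: "key" (S)
        Item: { key: 'email#a@example.com' },
        ConditionExpression: 'attribute_not_exists(#k)',
        ExpressionAttributeNames: { '#k': 'key' } // "key" is a reserved word
    } },
    { Put: {
        TableName: 'tblUniques',
        Item: { key: 'username#alice' },
        ConditionExpression: 'attribute_not_exists(#k)',
        ExpressionAttributeNames: { '#k': 'key' }
    } }
  ]
}).promise()
  .catch((err) => {
    // A TransactionCanceledException means one of the conditions failed,
    // i.e. the email or username is already taken.
    console.error('User creation failed', err)
  })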
You can't have strongly consistent reads on a GSI.
What you can do instead is model your schema to store two rows per user, e.g.:
user#uId as pk
email#emailId as pk
with pk being of type String.
Depending on your situation, and after considering all of the alternatives, it may be acceptable to add an automatic retry when you don't find anything on the GSI the first time, to work around the lack of strongly consistent reads. I didn't even think of this until I hit roadblocks with the other options, and then realized it was simple and didn't cause any issues for our particular use case.
{
  "TableName": "tokens",
  "ProvisionedThroughput": { "ReadCapacityUnits": 5, "WriteCapacityUnits": 5 },
  "AttributeDefinitions": [
    { "AttributeName": "data", "AttributeType": "S" },
    { "AttributeName": "type", "AttributeType": "S" },
    { "AttributeName": "token", "AttributeType": "S" }
  ],
  "KeySchema": [
    { "AttributeName": "data", "KeyType": "HASH" },
    { "AttributeName": "type", "KeyType": "RANGE" }
  ],
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "tokens-token",
      "KeySchema": [
        { "AttributeName": "token", "KeyType": "HASH" }
      ],
      "Projection": {
        "ProjectionType": "ALL"
      },
      "ProvisionedThroughput": { "ReadCapacityUnits": 2, "WriteCapacityUnits": 2 }
    }
  ],
  "SSESpecification": { "Enabled": true }
}
public async getByToken(token: string): Promise<TokenResponse> {
  let tokenResponse: TokenResponse;
  let tries = 1;
  // Can't perform a strongly consistent read on the GSI, so retry once to
  // ensure the token really doesn't exist.
  while (tries <= 2) {
    let item = await this.getItemByToken(token);
    if (item) return new TokenResponse(item);
    if (tries == 1) await this.sleep(1000);
    tries++;
  }
  return tokenResponse;
}
Since we don't care about performance when someone sends in a non-existent token (which should never happen anyway), we work around the problem without taking a performance hit, other than a possible one-second delay, one time, right after a token is created. If you just created the token, you wouldn't normally need to resolve it back to the data you just passed in, but if you happen to do that, we handle it transparently.
