DynamoDB consistent reads for Global Secondary Index - asp.net

Why can't I get consistent reads for global secondary indexes?
I have the following setup:
The table: tblUsers (id as hash)
Global Secondary Index: tblUsersEmailIndex (email as hash, id as attribute)
Global Secondary Index: tblUsersUsernameIndex (username as hash, id as attribute)
I query the indexes to check whether a given email or username is present, so I don't create a duplicate user.
Now, the problem is that I can't do consistent reads for queries on the indexes. But why not? This is one of the few occasions where I actually need up-to-date data.
According to AWS documentation:
Queries on global secondary indexes support eventual consistency only.
Changes to the table data are propagated to the global secondary indexes within a fraction of a second, under normal conditions. However, in some unlikely failure scenarios, longer propagation delays might occur. Because of this, your applications need to anticipate and handle situations where a query on a global secondary index returns results that are not up-to-date.
But how do I handle this situation? How can I make sure that a given email or username is not already present in the DB?

You probably already went through this:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
The short answer is that you cannot do what you want with a Global Secondary Index: reads from a GSI are always eventually consistent.
One solution would be to have a separate table with the attribute you're interested in as its key and do consistent reads there. You would need to update that table whenever you insert new entities, and you would also have to handle the edge case in which the insert succeeds there but fails in the main table (i.e., you need to keep the two in sync).
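The read side of that separate-table approach could look like the sketch below (tblUsersByEmail is a hypothetical lookup table with email as its hash key, not something from the question). Unlike a GSI query, a GetItem against a base table may set ConsistentRead:

```javascript
// Sketch: build a strongly consistent GetItem request against a
// hypothetical "tblUsersByEmail" lookup table (email as hash key).
// Low-level AWS SDK attribute-value format.
function buildEmailLookupParams(email) {
  return {
    TableName: "tblUsersByEmail",
    Key: { email: { S: email } },
    ConsistentRead: true // supported on base tables, unlike GSIs
  };
}
```

If `dynamodb.getItem(buildEmailLookupParams(email), cb)` returns no item, the email is free at the moment of the read.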
Another solution would be to scan the whole table, but that would probably be overkill if the table is large.
Also ask yourself why you care if somebody creates two accounts with the same email. You could just use the username as the primary hash key and simply not enforce email uniqueness.

When you call putItem, you can supply a ConditionExpression that must be satisfied for the item to be written, which lets you check whether the email or username already exists.
ConditionExpression — (String)
A condition that must be satisfied in order for a conditional PutItem operation to succeed.
An expression can contain any of the following:
Functions: attribute_exists | attribute_not_exists | attribute_type | contains | begins_with | size
These function names are case-sensitive.
Comparison operators: = | <> | < | > | <= | >= | BETWEEN | IN
Logical operators: AND | OR | NOT
For more information on condition expressions, see Condition Expressions in the Amazon DynamoDB Developer Guide.
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB.html#putItem-property
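For example, a conditional PutItem request body might look like the sketch below (a hypothetical table keyed on email; low-level AWS SDK for JavaScript format). Note that the condition is only evaluated against the existing item with the same key, so this enforces uniqueness of the table's own key attribute:

```javascript
// Sketch: conditional PutItem request body. "tblUsersByEmail" is a
// hypothetical table whose hash key is "email".
const params = {
  TableName: "tblUsersByEmail",
  Item: {
    email: { S: "jane@example.com" },
    id: { S: "u1" }
  },
  // Evaluated against the item with the same key: the put fails with
  // ConditionalCheckFailedException if this email was already written.
  ConditionExpression: "attribute_not_exists(email)"
};
// dynamodb.putItem(params, callback)
```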

I ran across this recently and wanted to share an update. In 2018, DynamoDB added transactions. If you really need to keep two items (either in the same or different tables) in 100% sync with no eventual consistency to worry about, TransactWriteItems and TransactGetItems are what you need.
It's better to avoid the transaction altogether, if you can, as others have suggested.
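As a rough sketch (AWS SDK for JavaScript v2 style; table names such as tblUsersByEmail are assumptions, not from the question), a TransactWriteItems request that creates the user and reserves the email atomically could look like this:

```javascript
// Sketch: one transaction that writes the user row and an email
// "claim" row. If either condition fails, neither item is written.
const params = {
  TransactItems: [
    {
      Put: {
        TableName: "tblUsers",
        Item: { id: { S: "u1" }, email: { S: "jane@example.com" } },
        ConditionExpression: "attribute_not_exists(id)"
      }
    },
    {
      Put: {
        TableName: "tblUsersByEmail",
        Item: { email: { S: "jane@example.com" }, id: { S: "u1" } },
        ConditionExpression: "attribute_not_exists(email)"
      }
    }
  ]
};
// dynamodb.transactWriteItems(params, callback)
```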

You can't have a strongly consistent read on a GSI.
What you can do is model your schema to have two rows per user, e.g.:
user#userId as pk
email#emailId as pk
with pk of type string.
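A sketch of that two-row layout (attribute names are assumptions; low-level attribute-value format). Writing both rows together, e.g. in one transaction, keeps them in sync, and a strongly consistent GetItem on the email# row then answers "is this email taken?" without a GSI:

```javascript
// Build the two rows for a new user: the user row and an email "claim"
// row. Both live in the same table, keyed by a string pk.
function rowsForNewUser(userId, email) {
  return [
    { pk: { S: "user#" + userId }, email: { S: email } },   // user row
    { pk: { S: "email#" + email }, userId: { S: userId } }  // email claim row
  ];
}
```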

Depending on your situation, and after considering all of the alternatives, it may be acceptable to add an automatic retry when the first read of the GSI finds nothing, to work around the lack of strongly consistent reads. I didn't even think of this until I hit roadblocks with the other options, and then realized it was simple and caused no issues for our particular use case.
{
    "TableName": "tokens",
    "ProvisionedThroughput": { "ReadCapacityUnits": 5, "WriteCapacityUnits": 5 },
    "AttributeDefinitions": [
        { "AttributeName": "data", "AttributeType": "S" },
        { "AttributeName": "type", "AttributeType": "S" },
        { "AttributeName": "token", "AttributeType": "S" }
    ],
    "KeySchema": [
        { "AttributeName": "data", "KeyType": "HASH" },
        { "AttributeName": "type", "KeyType": "RANGE" }
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "tokens-token",
            "KeySchema": [
                { "AttributeName": "token", "KeyType": "HASH" }
            ],
            "Projection": { "ProjectionType": "ALL" },
            "ProvisionedThroughput": { "ReadCapacityUnits": 2, "WriteCapacityUnits": 2 }
        }
    ],
    "SSESpecification": { "Enabled": true }
}
public async getByToken(token: string): Promise<TokenResponse | undefined> {
    // Can't perform a strongly consistent read on a GSI, so retry once
    // after a short delay to ensure the token really doesn't exist.
    for (let tries = 1; tries <= 2; tries++) {
        const item = await this.getItemByToken(token);
        if (item) return new TokenResponse(item);
        if (tries === 1) await this.sleep(1000);
    }
    return undefined;
}
Since we don't care about performance for someone sending in a non-existent token (which should never happen anyway), we work around the problem without taking any performance hit (other than a possible 1 second delay one time after the token is created). If you just created the token, you wouldn't need to resolve it back to the data you just passed in. But if you happen to do that, we handle it transparently.

Related

Query dynamodb db list items with IN clause

I have a DynamoDB table whose items have the structure below.
{
    "url": "some-url1",
    "dependencies": [
        "dependency-1",
        "dependency-2",
        "dependency-3",
        "dependency-4"
    ],
    "status": "active"
}
{
    "url": "some-url2",
    "dependencies": [
        "dependency-2"
    ],
    "status": "inactive"
}
{
    "url": "some-url3",
    "dependencies": [
        "dependency-1"
    ],
    "status": "active"
}
Here, url is defined as the partition key and there is no sort key.
The query needs to find all the records with a specific dependency and status.
For example: find all the records for which dependency-1 is present in the dependencies list and whose status is active.
For the records above, the 1st and 3rd should be returned.
Do I need to set a GSI on dependencies, or is this something which cannot be done in DynamoDB?
You cannot create a GSI on a nested value. You can, however, create a GSI on status, but you would need to be careful: it has low cardinality, meaning you could limit your throughput to 1,000 writes per second if all of the items being written to the table have the same status. Of course, if you never intend to scale that high, it's a non-issue.
Your other option is to use a Scan where you read your entire data set and use a FilterExpression to filter based on dependency and status.
Depending on the SDK you use you can find some example operations here:
https://github.com/aws-samples/aws-dynamodb-examples/tree/master/DynamoDB-SDK-Examples
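A sketch of the Scan approach (DocumentClient-style plain values; the table name is an assumption). status is a DynamoDB reserved word, hence the attribute-name alias:

```javascript
// Sketch: Scan request with a FilterExpression that matches items whose
// "dependencies" list contains "dependency-1" AND whose status is "active".
// Filtering happens after the read, so the whole table is still consumed.
const params = {
  TableName: "urls", // hypothetical table name
  FilterExpression: "contains(dependencies, :dep) AND #s = :status",
  ExpressionAttributeNames: { "#s": "status" }, // "status" is reserved
  ExpressionAttributeValues: {
    ":dep": "dependency-1",
    ":status": "active"
  }
};
// docClient.scan(params, callback)
```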

ConditionExpression for PutItem not evaluating to false

I am trying to guarantee uniqueness in my DynamoDB table, across the partition key and other attributes (but not the sort key). Something is wrong with my ConditionExpression, because it is evaluating to true and the same values are getting inserted, leading to data duplication.
Here is my table design:
email: partition key (String)
id: sort key (Number)
firstName (String)
lastName (String)
Note: The id (sort key) holds a randomly generated unique number. I know... this looks like a bad design, but that is the use case I have to support.
Here is the NodeJS code with PutItem:
const dynamodb = new AWS.DynamoDB({apiVersion: '2012-08-10'})
const params = {
    TableName: <table-name>,
    Item: {
        "email": { "S": "<email>" },
        "id": { "N": "<someUniqueRandomNumber>" },
        "firstName": { "S": "<firstName>" },
        "lastName": { "S": "<lastName>" }
    },
    ConditionExpression: "attribute_not_exists(email) AND attribute_not_exists(firstName) AND attribute_not_exists(lastName)"
}
dynamodb.putItem(params, function(err, data) {
    if (err) {
        console.error("Put failed")
    } else {
        console.log("Put succeeded")
    }
})
The documentation https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.OperatorsAndFunctions.html says the following:
attribute_not_exists (path)
True if the attribute specified by path does not exist in the item.
Example: Check whether an item has a Manufacturer attribute.
attribute_not_exists (Manufacturer)
It specifically says "item", not "items" or "any item", so it checks only the item that would be overwritten, i.e., the one with the same key. Since you have a random sort key, the put always creates a new item, so the condition is always true.
Any implementation that checked a non-key attribute across all records would require a scan of every item, which would not perform well.
Here is an interesting article that covers how to deal with unique attributes in DynamoDB: https://advancedweb.hu/how-to-properly-implement-unique-constraints-in-dynamodb/. Single-table design together with transactions would be a possible solution for you, if you can allow the additional partition keys in your table. Any other solution may be challenging under your current schema. DynamoDB has its own way of doing things, and it can be frustrating to try to push it to do things it was not designed for.

Use the same DynamoDB attribute as both HASH and RANGE key

I have a DynamoDB table that is indexed by a single numerical key. I'd like to be able both to retrieve items having a specific value of the key and to find its maximum value by querying for a single item in inverse sort order. When I try to define two indices on the same key, as in the excerpt below, I get the error 'Two keys can not have the same name'.
"KeySchema": [
{
"AttributeName": "logs",
"KeyType": "HASH"
},
{
"AttributeName": "logs",
"KeyType": "RANGE"
}
]
You could define your key schema with only the hash key, no range key
"KeySchema": [
{
"AttributeName": "logs",
"KeyType": "HASH"
}
]
To request an item using a specific 'logs' value, use GetItem.
To find the highest value you would need to perform a Scan. This would be a poor way to find the highest value as it would mean evaluating every item in your table. It would be slow and expensive.
You might want to reassess your approach. If you are simply trying to create unique IDs, this is not the right approach for DynamoDB. What you should do is:
Generate a long UUID
Do a GetItem to verify that the UUID is available. This is very cheap and fast in DynamoDB
Use the UUID

DocumentDB adding ORDER BY clause uses excessive RUs

I have a partitioned collection with about 400k documents in a particular partition. Ideally this would be more distributed, but I need to deal with all the documents in the same partition for transaction considerations. I have a query which includes the partition key and the document id, which returns quickly with 2.58 RUs of usage.
This query is dynamic and could potentially be constructed with an IN clause to search for multiple document ids, so I added an ORDER BY to ensure the results come back in a consistent order. Adding the clause, however, caused the RUs to skyrocket to almost 6,000! Given that the WHERE clause should filter the results down to a handful before sorting, I was surprised by this. It almost seems as if the ORDER BY is applied before the WHERE clause, which can't be right. Is there something under the covers with the ORDER BY clause that would explain this behavior?
Example document:
{
    "DocumentType": "InventoryRecord", (PartitionKey, String)
    "id": "7867f600-c011-85c0-80f2-c44d1cf09f36", (DocDB-assigned GUID, stored as string)
    "ItemNumber": "123345", (String)
    "ItemName": "Item1" (String)
}
With a Query looking like this:
SELECT * FROM c where c.DocumentType = 'InventoryRecord' and c.id = '7867f600-c011-85c0-80f2-c44d1cf09f36' order by c.ItemNumber
You should at least put a range index on ItemNumber. This should ensure there is an ordering as expected. The addition to your indexing policy would look like this:
{
    "path": "/ItemNumber/?",
    "indexes": [
        {
            "kind": "Range",
            "dataType": "String",
            "precision": -1
        }
    ]
}

How to query related records in Firebase?

Given this database structure in Firebase:
{
    "users": {
        "user1": { "items": { "id1": true } },
        "user2": { "items": { "id2": true } }
    },
    "items": {
        "id1": { "name": "foo1", "user": "user1" },
        "id2": { "name": "foo2", "user": "user2" }
    }
}
which is the more efficient way of querying the items belonging to a specific user?
The Firebase docs seem to suggest this:
var itemsRef = new Firebase("https://firebaseio.com/items");
var usersItemsRef = new Firebase("https://firebaseio.com/users/" + user.uid + "/items");
usersItemsRef.on("child_added", function(data) {
    itemsRef.child(data.key()).once("value", function(itemData) {
        // got the item
    });
});
but using the .equalTo() query works as well:
var ref = new Firebase("https://firebaseio.com/items");
ref.orderByChild("user").equalTo(user.uid).on("child_added", function(data) {
    // got the item
});
The latter code seems more concise and doesn't require denormalization of the item keys into the user records but it's unclear to me if it's a less efficient methodology (assuming I create an index on "user").
thanks.
This is rather old one, but when working on the firebase-backed app, I found myself dealing with similar issues quite often.
.equalTo is more time-efficient (especially if one user owns a big number of items). Although n+1 subscriptions do not lead to n+1 networking round trips to the cloud, there is some performance penalty for having that many open subscriptions.
Moreover, .equalTo approach does not lead to denormalization of your data.
There is a gotcha, however: when you want to secure the data, the .equalTo approach may stop working altogether.
To allow a user to call orderByChild("user").equalTo(user.uid), they must have read privilege on the 'items' collection, and that read permission applies to the whole sub-document rooted at /items.
Summary: If user1 is to be prevented from finding out about items of user2, you must use the BYOI (build your own index) approach. That way you can validate that user only reads items that are put to their index.
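A rough sketch of the BYOI idea in legacy Firebase security rules (rule paths and field names are assumptions, mirroring the data layout above): each user may read only their own index, and an individual item only if its user field matches their uid. Because there is no read permission at /items itself, the orderByChild query in the question would be rejected:

```json
{
  "rules": {
    "users": {
      "$uid": {
        "items": {
          ".read": "auth.uid === $uid"
        }
      }
    },
    "items": {
      "$itemId": {
        ".read": "data.child('user').val() === auth.uid"
      }
    }
  }
}
```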
Finally, a disclaimer :) I have used Firebase only for a short period of time, and all I have is a few benchmarks and the documentation. If I'm mistaken in any way, please correct me.
