I am attempting to create a leader board using DynamoDB for a quiz-style Alexa skill. I have set up the table, and users are added to it with their data, e.g.:
Item: {
  "PlatformId": 2,
  "UserId": 12345,
  "Score": 100,
  "NickName": "scott",
  "Sport": "football"
}
In my table the partition key is UserId and the sort key is PlatformId (which is the same for all users). I have a global secondary index that uses PlatformId as its partition key and Score as its sort key.
On this leader board I want users to be ranked, with the highest scorer at number 1. My first attempt was to scan the table using the secondary index. This nicely returned all the users sorted by score, but there could be thousands of users on this leader board, and I discovered that scanning a table with 10,000+ users takes longer than the 8-second response time Alexa skills have. This causes the skill to error and close.
Before hitting that limit, I was using LastEvaluatedKey to perform another scan whenever the first one didn't cover the entire table, but it was on this second scan that the response time limit was exceeded. Annoyingly, it just takes too long to scan the table.
dbHelper.prototype.scanGetUsers = async function (ad, newParams = null) {
  // Default: scan the GSI for users on this platform scoring at least User.Score
  const params = newParams || {
    TableName: tableName,
    IndexName: 'PlatformId-Score-index',
    FilterExpression: "Score >= :s AND PlatformId = :p",
    ProjectionExpression: `NickName, Sport, Score`,
    // Limit: 10,
    ExpressionAttributeValues: {
      ":p": User.PlatformId,
      ":s": User.Score,
    },
  };
  const items = [];
  try {
    let data;
    do {
      data = await docClient.scan(params).promise();
      console.log("scan users data succeeded:", JSON.stringify(data, null, 2));
      items.push(...data.Items);
      if (data.LastEvaluatedKey) {
        console.log("found a LastEvaluatedKey, continuing scan");
      }
      // Resume the next page where this one stopped (undefined on the last page)
      params.ExclusiveStartKey = data.LastEvaluatedKey;
    } while (data.LastEvaluatedKey);
  } catch (err) {
    console.error("Unable to scan users. Error JSON:", JSON.stringify(err, null, 2));
    throw err;
  }
  return items;
};
Is there a way around these issues that I haven't explored yet? Or is there an easier way to structure a leader board in DynamoDB?
You can try using the sort key with a Query instead of a Scan:
When you combine a partition key and sort key, they create a composite key, and that composite key is the primary key for individual items in a table. With a composite key, you gain the ability to use queries with a KeyConditionExpression against the sort key. In a query, you can use KeyConditionExpression to write conditional statements by using comparison operators that evaluate against a key and limit the items returned. In other words, you can use special operators to include, exclude, and match items by their sort key values.
The article this quote comes from explains how to set up and use composite keys.
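A minimal sketch of what that looks like for the leader board, assuming the `PlatformId-Score-index` GSI from the question (partition key `PlatformId`, sort key `Score`) and the v2 DocumentClient: instead of scanning, query the one platform partition in descending score order and stop after the first N items. `buildLeaderboardQuery` and the table name `Leaderboard` are hypothetical; the helper only builds the parameters you would pass to `docClient.query(params).promise()`.

```javascript
// Hypothetical helper: DocumentClient Query parameters for the top N scores.
// Assumes the GSI "PlatformId-Score-index" (partition key PlatformId, sort key Score)
// and that NickName, Sport and Score are projected into that index.
function buildLeaderboardQuery(tableName, platformId, topN) {
  return {
    TableName: tableName,
    IndexName: "PlatformId-Score-index",
    // A key condition (not a filter), so only this partition is read
    KeyConditionExpression: "PlatformId = :p",
    ExpressionAttributeValues: { ":p": platformId },
    ProjectionExpression: "NickName, Sport, Score",
    // Walk the sort key (Score) from highest to lowest
    ScanIndexForward: false,
    // Stop after topN items, so the read cost does not grow with the table
    Limit: topN,
  };
}

const params = buildLeaderboardQuery("Leaderboard", 2, 10);
console.log(JSON.stringify(params, null, 2));
```

For the users returned, their index in `data.Items` plus 1 is their rank. Working out the rank of an arbitrary user would still mean counting the items with a higher score, and a count still reads each of those items.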
Related
I don't get the concept of limits for Query/Scan in DynamoDB.
According to the docs:
A single Query operation can retrieve a maximum of 1 MB of data. This limit applies before any FilterExpression is applied to the results.
Let's say I have 10k items, 250 KB per item, and all of them match the query params.
If I run a simple query, do I get only 4 items?
If I use ProjectionExpression to retrieve only a single attribute (1 KB in size), will I get 1k items?
If I only need to count items (Select: 'COUNT'), will it count all items (10k)?
If I run a simple query, I get only 4 items?
Yes
If I use ProjectionExpression to retrieve only single attribute (1kb in size), will I get 1k items?
No. FilterExpression and ProjectionExpression are applied after the 1 MB of items has been read, so you still get 4 items.
If I only need to count items (select: 'COUNT'), will it count all items (10k)?
No, still just 4
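To put rough numbers on the answers above, here is a back-of-the-envelope sketch. It assumes a page holds about floor(1024 KB / item size) whole items (DynamoDB's exact cut-off around the 1 MB boundary may differ slightly) and that only the stored item size counts, since a ProjectionExpression does not shrink what is read. `pagesNeeded` is a hypothetical helper.

```javascript
// Rough estimate only: how many 1 MB pages are needed to read itemCount items.
function pagesNeeded(itemCount, itemSizeKB, pageLimitKB = 1024) {
  // Whole items that fit under the page limit
  const itemsPerPage = Math.floor(pageLimitKB / itemSizeKB);
  return Math.ceil(itemCount / itemsPerPage);
}

console.log(pagesNeeded(10000, 250)); // 250 KB items: 4 per page -> 2500 pages
console.log(pagesNeeded(10000, 1));   // if the items themselves were 1 KB: 1024 per page -> 10 pages
```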
The thing you are probably missing is that you can still get all 10k results, or the 10k count; you just need to fetch the results in pages. Basically, when your query completes, check the LastEvaluatedKey attribute, and if it is not empty, request the next set of results. Repeat until the attribute is empty, at which point you know you have all the results.
EDIT: I should say that some of the SDKs abstract this away for you. For example, the Java SDK's DynamoDBMapper has query and queryPage: query goes back to the server as many times as needed to get the full result set for you (i.e. in your case, all 10k results), while queryPage returns a single page.
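The paging loop can be sketched independently of the SDK. This is a minimal sketch, assuming `fetchPage` is any function that takes an optional start key and resolves to an object with `Items` and (possibly) `LastEvaluatedKey`; with the v2 DocumentClient that could be `(startKey) => docClient.query(startKey ? { ...params, ExclusiveStartKey: startKey } : params).promise()`. `fetchAllPages` and the in-memory `fakePage` stand-in are hypothetical names for illustration.

```javascript
// Keep requesting pages until the response has no LastEvaluatedKey.
async function fetchAllPages(fetchPage) {
  const items = [];
  let startKey; // undefined on the first request
  do {
    const page = await fetchPage(startKey);
    items.push(...page.Items);
    startKey = page.LastEvaluatedKey; // absent on the last page
  } while (startKey);
  return items;
}

// Hypothetical in-memory stand-in for a paginated query: 10 items, 4 per page.
const table = Array.from({ length: 10 }, (_, i) => ({ id: i }));
async function fakePage(startKey) {
  const start = startKey ? startKey.id + 1 : 0;
  const Items = table.slice(start, start + 4);
  const more = start + 4 < table.length;
  return more ? { Items, LastEvaluatedKey: { id: Items[Items.length - 1].id } } : { Items };
}

fetchAllPages(fakePage).then((items) => console.log(items.length)); // 10
```

Note that this only collects everything; each page is still a separate round trip, so it does not make reading a large result set faster.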
For any operation that returns items, you can request a subset of attributes to retrieve; however, doing so has no impact on the item size calculations. In addition, Query and Scan can return item counts instead of attribute values. Getting the count of items uses the same quantity of read capacity units and is subject to the same item size calculations. This is because DynamoDB has to read each item in order to increment the count.
Source: "Managing Throughput Settings on Provisioned Tables" (DynamoDB documentation)
Great explanation by @f-so-k.
This is how I am handling the query.
import AWS from 'aws-sdk';
// Low-level client: matches the { S: name } attribute-value format below
const dynamodb = new AWS.DynamoDB();
// Returns the first page that contains at least one item,
// or the last page if every page is empty
async function loopQuery(params) {
  let keepGoing = true;
  let result = null;
  while (keepGoing) {
    let newParams = params;
    if (result && result.LastEvaluatedKey) {
      newParams = {
        ...params,
        ExclusiveStartKey: result.LastEvaluatedKey,
      };
    }
    result = await dynamodb.query(newParams).promise();
    // The response field is "Count" (capital C)
    if (result.Count > 0 || !result.LastEvaluatedKey) {
      keepGoing = false;
    }
  }
  return result;
}
const params = {
  TableName: user,
  IndexName: 'userOrder',
  KeyConditionExpression: 'un = :n',
  ExpressionAttributeValues: {
    ':n': {
      S: name,
    },
  },
  ConsistentRead: false,
  ReturnConsumedCapacity: 'NONE',
  // Return every attribute projected into the index
  Select: 'ALL_PROJECTED_ATTRIBUTES',
};
const result = await loopQuery(params);
Edit:
import AWS from 'aws-sdk';
const dynamodb = new AWS.DynamoDB();
// Collects the items from every page until there is no LastEvaluatedKey
async function loopQuery(params) {
  let keepGoing = true;
  let result = null;
  let list = [];
  while (keepGoing) {
    let newParams = params;
    if (result && result.LastEvaluatedKey) {
      newParams = {
        ...params,
        ExclusiveStartKey: result.LastEvaluatedKey,
      };
    }
    result = await dynamodb.query(newParams).promise();
    // Add this page's items (not the whole response object)
    list = [...list, ...result.Items];
    if (!result.LastEvaluatedKey) {
      keepGoing = false;
    }
  }
  return list;
}
const params = {
  TableName: user,
  IndexName: 'userOrder',
  KeyConditionExpression: 'un = :n',
  ExpressionAttributeValues: {
    ':n': {
      S: name,
    },
  },
  ConsistentRead: false,
  ReturnConsumedCapacity: 'NONE',
  // Return every attribute projected into the index
  Select: 'ALL_PROJECTED_ATTRIBUTES',
};
const result = await loopQuery(params);
I've written an API Gateway endpoint that scans a DynamoDB table and returns the values matching some conditions. My code is below:
var params = {
TableName: 'CarsData',
FilterExpression: '#market_category = :market_category and #vehicle_size = :vehicle_size and #transmission_type = :transmission_type and #price_range = :price_range and #doors = :doors',
ExpressionAttributeNames: {
"#market_category": "market_category",
"#vehicle_size": "vehicle_size",
"#transmission_type": "transmission_type",
"#price_range": "price_range",
"#doors": "doors"
},
ExpressionAttributeValues: {
":market_category": body.market_category,
":vehicle_size": body.vehicle_size,
":transmission_type": body.transmission_type,
":price_range": body.price_range,
":doors": body.doors
}
}
dynamodb.scan(params).promise().then(function (data) {
var uw = data.Items;
console.log(data + "\n" + JSON.stringify(data) + "\n" + JSON.stringify(data.Items));
var res = {
"statusCode": 200,
"headers": {},
"body": JSON.stringify(uw)
};
ctx.succeed(res);
}).catch(function (err) {
console.log(err);
var res = {
"statusCode": 404,
"headers": {},
"body": JSON.stringify({ "status": "error" })
};
ctx.succeed(res);
});
When I run this code, I get the expected result. But while going through some online forums, I learned that scanning is expensive compared to querying. I can't figure out how to change my scan into a query. My primary key is ID. Please let me know how I can do this.
Thanks
A Scan operation is more expensive than a Query operation, both in performance and in cost. DynamoDB charges based on the read capacity units consumed while processing the request, not on the number of records returned.
A Query operation finds items based on the primary key (hash key) or the composite primary key (hash key and sort key).
Your schema should be redesigned with a composite primary key (hash key and sort key).
It's not necessary to have an Id column as the primary key like in an old-school RDBMS. If you are not using Id effectively, remove that column from your schema and define the key with other attributes. For example, I am using Market Category (market_category) as the hash key and Price Range (price_range) as the range key.
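A rough worked example of why the scan costs more, assuming eventually consistent reads (half a read capacity unit per 4 KB) and that a Query or Scan sums the sizes of all items it reads and rounds up to the next 4 KB. The 10,000-item table, the 1 KB item size and the 40 matching items are made-up numbers, and `readUnits` is a hypothetical helper.

```javascript
// Approximate RCUs for one Query/Scan that reads `bytesRead` bytes in total.
// Filters do not reduce this: filtered-out items were still read.
function readUnits(bytesRead, stronglyConsistent = false) {
  const blocks = Math.ceil(bytesRead / 4096); // 4 KB read units
  return stronglyConsistent ? blocks : blocks / 2;
}

const KB = 1024;
// Scan of a hypothetical 10,000-item table of 1 KB items, even if only 40 match the filter
console.log(readUnits(10000 * KB)); // 1250
// Query whose key condition reads only the 40 matching 1 KB items
console.log(readUnits(40 * KB)); // 5
```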
var params = {
"TableName": 'CarsData',
"ConsistentRead": true,
//Composite Primary Key in Key Condition Expression
"KeyConditionExpression": "#market_category = :market_category AND #price_range = :price_range",
//Remaining column in filter expression
"FilterExpression": '#vehicle_size = :vehicle_size and #transmission_type = :transmission_type and #doors = :doors',
"ExpressionAttributeNames": {
"#market_category": "market_category",
"#vehicle_size": "vehicle_size",
"#transmission_type": "transmission_type",
"#price_range": "price_range",
"#doors": "doors"
},
"ExpressionAttributeValues": {
":market_category": body.market_category,
":vehicle_size": body.vehicle_size,
":transmission_type": body.transmission_type,
":price_range": body.price_range,
":doors": body.doors
}
}
dynamodb.query(params).promise()
.then(function (data) {
console.log(data);
}).catch(function (err) {
console.log(err);
});
I hope this example gives you some insight into using a composite primary key.
Based on your usage, choose the most frequently queried columns as the hash and range keys.
I am trying to filter a list of maps from a DynamoDB table in the following format:
{
  id: "Number",
  users: [
    { userEmail: "abc@gmail.com", age: "23" },
    { userEmail: "de@gmail.com", age: "41" }
  ]
}
I need to get the data of the user whose userEmail is "abc@gmail.com". Currently I am doing it with the following DynamoDB request. Is there a more efficient way to solve this?
var params = {
  TableName: 'users',
  Key: {
    'id': id
  }
};
var docClient = new AWS.DynamoDB.DocumentClient();
docClient.get(params, function (err, data) {
  if (err) {
    console.error(err);
    return;
  }
  const users = data.Item.users;
  // The nested maps use the key "userEmail", not "email"
  const matches = users.filter(function (user) {
    return user.userEmail === userEmail;
  });
  // matches has the required user in it
});
You can only fetch a single item directly in DynamoDB by its key, so you need a table that looks like this:
Email (string) - partition key
Id (some-type) - user id
...other relevant user data
Unfortunately, since a nested field cannot be a partition key, you will have to maintain a separate table here; you won't be able to use an index in DynamoDB (neither an LSI nor a GSI).
Duplicating data is a common pattern in NoSQL, so there is nothing unusual about it. If you were using Java, you could use the DynamoDB transactions library to keep both tables in sync.
If you are not using Java, you could read the DynamoDB stream of the original table (where emails are nested fields) and update the new table (where emails are partition keys) whenever the original table is updated.
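A minimal sketch of the duplicated-table idea, assuming a second table (hypothetically named `users-by-email`) whose partition key is `Email`. `toEmailRows` turns one item of the original table, already converted to plain JavaScript (for example with `AWS.DynamoDB.Converter.unmarshall` on a stream record's `NewImage`), into one put request per nested user. The field names follow the question's sample data; the helper and table names are assumptions.

```javascript
// Hypothetical: map one original item { id, users: [{ userEmail, age }, ...] }
// to put parameters for an email-keyed lookup table.
function toEmailRows(item, lookupTable = "users-by-email") {
  return (item.users || []).map((user) => ({
    TableName: lookupTable,
    Item: {
      Email: user.userEmail, // partition key of the lookup table
      Id: item.id,           // points back to the original item
      age: user.age,
    },
  }));
}

const original = {
  id: 7,
  users: [
    { userEmail: "abc@gmail.com", age: "23" },
    { userEmail: "de@gmail.com", age: "41" },
  ],
};
console.log(toEmailRows(original).map((p) => p.Item.Email)); // [ 'abc@gmail.com', 'de@gmail.com' ]
```

Each params object could then be sent with `docClient.put(params).promise()`, and a lookup becomes `docClient.get({ TableName: "users-by-email", Key: { Email: userEmail } })`. Removing rows for users who were deleted from the list is not covered by this sketch.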
I have a DynamoDB table with a primary key (_id) being a simple int. I want to get the highest value for the primary key.
How do I return the item in the table with the highest _id?
I can use either the AWS JavaScript SDK or the Dynamoose library.
Partition keys are not stored in order. You would need to scan the entire table, stream over the items, map to the _id attribute and then return the maximum value.
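The scan-based approach can be sketched as follows, assuming `scanPage` wraps a v2 DocumentClient scan, e.g. `(startKey) => docClient.scan(startKey ? { ...params, ExclusiveStartKey: startKey } : params).promise()`, with `ProjectionExpression: "#id"` and `ExpressionAttributeNames: { "#id": "_id" }` so only `_id` comes back. `maxId` and the in-memory stand-in are hypothetical. This still reads every item, so its cost grows with the table.

```javascript
// Scan every page, keeping only the largest _id seen so far.
async function maxId(scanPage) {
  let max = -Infinity;
  let startKey;
  do {
    const page = await scanPage(startKey);
    for (const item of page.Items) {
      if (item._id > max) max = item._id;
    }
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return max; // -Infinity if the table is empty
}

// Hypothetical two-page stand-in for scan results (scan order is not sorted)
const pages = [
  { Items: [{ _id: 5 }, { _id: 42 }], LastEvaluatedKey: { _id: 42 } },
  { Items: [{ _id: 17 }] },
];
const fakeScan = async (startKey) => (startKey ? pages[1] : pages[0]);

maxId(fakeScan).then((m) => console.log(m)); // 42
```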
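You can create a global secondary index with _id as its sort key (the index still needs a partition key, such as an attribute shared by all the items you want to compare) and then query it in descending order with a limit of 1:
var params = {
  TableName: 'Devices',
  IndexName: 'status-_id-index', // hypothetical name of the GSI
  KeyConditionExpression: '#status = :status',
  ExpressionAttributeNames: { '#status': 'status' }, // "status" is a reserved word
  ExpressionAttributeValues: {
    ':status': status
  },
  ScanIndexForward: false, // true = ascending, false = descending
  Limit: 1 // only the item with the highest _id
};
docClient.query(params, function(err, data) {
  // data.Items[0] is the item with the highest _id
});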
I have a DynamoDB table that stores users' videos.
It's structured like this:
{
  "userid": 324234234234234234, // Hash key
  "videoid": 298374982364723648, // Range key
  "user": {
    "username": "mario"
  }
}
I want to update the username for all videos of a specific user. Is that possible with a simple update, or do I have to scan the complete table and update one item at a time?
var params = {
TableName: DDB_TABLE_SCENE,
Key: {
userid: userid,
},
UpdateExpression: "SET username = :username",
ExpressionAttributeValues: { ":username": username },
ReturnValues: "ALL_NEW",
ConditionExpression: 'attribute_exists (userid)'
};
docClient.update(params, function(err, data) {
if (err) fn(err, null);
else fn(err, data.Attributes.username);
});
I receive the following error; I suppose the range key is necessary:
ValidationException: The provided key element does not match the schema
DynamoDB does not support write operations across multiple items (i.e. on more than one item at a time). You will have to first scan or query the table, or otherwise build a list of all the items you'd like to update, and then update them one by one.
DynamoDB does provide a batching API (BatchWriteItem), but it only groups writes together in batches of up to 25, and those writes are puts and deletes, not partial updates. It is not a substitute for a multi-item update like the one you're trying to achieve.
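A minimal sketch of the "find, then update one by one" approach, assuming the v2 DocumentClient and the item layout from the question: query the partition for `userid` (no scan needed, since it is the hash key), then send one UpdateItem per video with the full key (`userid` plus `videoid`). `buildUsernameUpdates` and the table name `videos` are hypothetical. It uses placeholders because `user` is a DynamoDB reserved word, and it sets the nested `user.username` attribute rather than a top-level `username`.

```javascript
// Hypothetical helper: one UpdateItem parameter object per video of a user.
// `videos` would come from a query such as
// { TableName, KeyConditionExpression: "userid = :u", ExpressionAttributeValues: { ":u": userid } }
function buildUsernameUpdates(tableName, videos, username) {
  return videos.map((video) => ({
    TableName: tableName,
    // The full primary key: hash key AND range key
    Key: { userid: video.userid, videoid: video.videoid },
    // Placeholders, because "user" is a reserved word
    UpdateExpression: "SET #user.#username = :username",
    ExpressionAttributeNames: { "#user": "user", "#username": "username" },
    ExpressionAttributeValues: { ":username": username },
  }));
}

const videos = [
  { userid: 1, videoid: 10 },
  { userid: 1, videoid: 11 },
];
const updates = buildUsernameUpdates("videos", videos, "luigi");
console.log(updates.length); // 2
// Each entry would then be sent with docClient.update(params).promise()
```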