AWS-CDK get secondary indexes and metrics for those

I have a CDK stack with a DDB table created with some alarms over it. I have now successfully added a secondary index to it.
However, how can I programmatically list the secondary indexes that a table has, and how can I get the metrics for them?
Background: we have a module which creates the alarms for our DDB tables. That module receives the CDK table object and creates some alarms over it using methods like metricConsumedWriteCapacityUnits.
I want to extend that alarm creator module to also create alarms for the indexes of the table. For that I need to read the secondary indexes (more concretely, the global ones) to check whether the table has any; if it does, I then create the alarms. Those alarms should cover capacity consumption and throttled requests (but might be extended to other metrics).
Given a table CDK object, how can I list the secondary indexes it has?
Having retrieved the secondary indexes, how can I know if they are local or global?
Having retrieved a global secondary index, how can I get the metrics associated with it, i.e. capacity usage and throttled requests?

Given a table CDK object, how can I list the secondary indexes it has?
This is the tricky bit. The indexes are not available on the ITable, only on the CfnTable. Getting from the ITable to the CfnTable can be done through table.node.defaultChild. Then the variables holding the indexes have to be resolved in the CDK stack.
Having retrieved the secondary indexes, how can I know if they are local or global?
This is the easy bit. The indexes are kept in separate variables, globalSecondaryIndexes and localSecondaryIndexes.
Having retrieved a global secondary index, how can I get the metrics associated with it, i.e. capacity usage and throttled requests?
By passing { GlobalSecondaryIndexName: <indexName> } to the dimensions parameter of the metric call.
Below is a class, based on something I just built for work, that does all of the above. It compiles and should even run (I'm not 100% sure it will run, because I removed some intermediate scaffolding from our solution and haven't tested this exact code).
import * as cloudwatch from '@aws-cdk/aws-cloudwatch';
import * as dynamo from '@aws-cdk/aws-dynamodb';
import { Construct, Duration } from '@aws-cdk/core';

export class DynamoMonitor extends Construct {
  private static getIndexNames(dynamoTable: dynamo.ITable) {
    // Pull the names of any global secondary indexes on the table from the construct.
    // The indexes only exist on the CfnTable, reachable via node.defaultChild.
    const table = dynamoTable.node.defaultChild as dynamo.CfnTable;
    const indexes = dynamoTable.stack.resolve(table.globalSecondaryIndexes) as
      Array<dynamo.CfnTable.GlobalSecondaryIndexProperty> | undefined;
    const indexNames: string[] = [];
    if (indexes) {
      for (const index of indexes) {
        indexNames.push(index.indexName);
      }
    }
    return indexNames;
  }

  constructor(scope: Construct, id: string, dynamoTable: dynamo.ITable) {
    super(scope, id);
    const period = Duration.seconds(30);
    const threshold = 50;
    const evaluationPeriods = 5;
    const indexNames = DynamoMonitor.getIndexNames(dynamoTable);
    for (const indexName of indexNames) {
      // Scope each metric to the index via the GlobalSecondaryIndexName dimension.
      const throttleEvents = dynamoTable.metric('WriteThrottleEvents', {
        period: period,
        dimensions: { GlobalSecondaryIndexName: indexName },
        statistic: cloudwatch.Statistic.SAMPLE_COUNT,
        unit: cloudwatch.Unit.COUNT,
      });
      const consumedWriteCapacityUnits = dynamoTable.metricConsumedWriteCapacityUnits({
        label: 'ConsumedWriteCapacityUnits',
        dimensions: { GlobalSecondaryIndexName: indexName },
        period: period,
        statistic: cloudwatch.Statistic.SAMPLE_COUNT,
        unit: cloudwatch.Unit.COUNT,
      });
      // Express throttles as a percentage of consumed write capacity.
      const throttleRate = new cloudwatch.MathExpression({
        expression: '(throttleEvents/consumedWriteCapacityUnits) * 100',
        label: 'WriteThrottleRate',
        usingMetrics: {
          throttleEvents: throttleEvents,
          consumedWriteCapacityUnits: consumedWriteCapacityUnits,
        },
      });
      throttleRate.createAlarm(this, `WriteIndex${indexName}ThrottleRateAlarm`, {
        threshold: threshold,
        evaluationPeriods: evaluationPeriods,
        comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
        alarmDescription: `Write throttle rate alarm for index ${indexName}`,
        treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
      });
    }
  }
}
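For completeness, a minimal sketch of wiring the monitor into a stack; the table and index names here are hypothetical:
// Hypothetical usage inside a Stack constructor.
const table = new dynamo.Table(this, 'MyTable', {
  partitionKey: { name: 'pk', type: dynamo.AttributeType.STRING },
});
table.addGlobalSecondaryIndex({
  indexName: 'myGsi',
  partitionKey: { name: 'gsiPk', type: dynamo.AttributeType.STRING },
});
// Creates one throttle-rate alarm per global secondary index.
new DynamoMonitor(this, 'MyTableMonitor', table);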

Related

RTK Query and Mutation sequentially

I have a situation using RTK where I need to use mutations and queries sequentially.
In my use case, in a route like /status/:id/:version, I need to create a job based on id and version and then monitor the progress of creation (it takes around 30 seconds). I have a query and a mutation in this route:
const { id, version } = useParams()
const pollRef = useRef(1000)
const [createJob, { data: postData, error, isSuccess }] = useCreateJobMutation()
const { data, ... } = useGetJobIdQuery(postData[0].id, { pollingInterval: pollRef.current })
if (data.progress === 100) {
  pollRef.current = 0 // stop polling GET route
  // use data ...
}
useEffect(() => {
  createJob(newJob) // newJob is created based on id, version
}, [])
I need to wait for postData to be valid (not undefined); the issue is how to pass the result of the mutation to the query without violating the rules of hooks (I get "Error: Rendered more hooks than during the previous render.").
if (isSuccess) {
  useGetJobIdQuery(...) // violates the rules of hooks
}
useCreateJobMutation() and useGetJobIdQuery() work fine standalone, but not together.
You can skip queries:
import { skipToken } from '@reduxjs/toolkit/query/react'
useGetJobIdQuery(isSuccess ? jobId : skipToken)
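Putting it together, a minimal sketch of the whole flow inside the route component; the api-slice hooks come from the question, and the postData[0].id shape and newJob are assumptions carried over from it:
import { useEffect, useState } from 'react';
import { skipToken } from '@reduxjs/toolkit/query/react';

// Sketch: fire the mutation once, and only start the polled query
// after the mutation has produced a job id.
const [createJob, { data: postData, isSuccess }] = useCreateJobMutation();
const jobId = postData?.[0]?.id; // assumed response shape, per the question
const [pollingInterval, setPollingInterval] = useState(1000);

// The query is skipped until jobId exists, so the hook order never changes.
const { data } = useGetJobIdQuery(jobId ?? skipToken, { pollingInterval });

useEffect(() => {
  createJob(newJob); // newJob built from id and version, as in the question
}, []);

useEffect(() => {
  if (data?.progress === 100) setPollingInterval(0); // stop polling when done
}, [data]);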

Dynamo DB leader board within an Alexa Skill

I am attempting to create a leader board using DynamoDB for a quiz-style Alexa skill. I have set up the table, and users are added to the table with their appropriate data, e.g.:
Item: {
  "PlatformId": 2,
  "UserId": 12345,
  "Score": 100,
  "NickName": "scott",
  "Sport": "football",
}
In my table the primary key is their UserId and the sort key is the PlatformId (this is the same for all users). I have a global secondary index which sets the PlatformId as the partition key and the Score as the sort key.
In this leader board I want users to be ranked, with the highest scorer being number 1. My first attempt at this was to scan the table using the secondary index, which nicely returned all the users sorted by score. However, with the potential to have thousands of users on this leader board, I discovered that the time to scan a table with 10,000+ users exceeds the 8-second response time that Alexa skills have. This causes the skill to error and close.
Before the response time was exceeded, I was using the LastEvaluatedKey to perform an extra scan if the first one didn't cover the entire table, but on this second scan the response time limit was exceeded. Annoyingly, it's just taking too long to scan the table.
dbHelper.prototype.scanGetUsers = (ad, newParams = null) => {
  return new Promise((resolve, reject) => {
    let params = {};
    if (newParams != null) {
      params = newParams;
    } else {
      params = {
        TableName: tableName,
        IndexName: 'PlatformId-Score-index',
        FilterExpression: "Score >= :s AND PlatformId = :p",
        ProjectionExpression: "NickName, Sport, Score",
        // Limit: 10,
        ExpressionAttributeValues: {
          ":p": User.PlatformId,
          ":s": User.Score,
        },
      };
    }
    docClient.scan(params, function (err, data) {
      if (err || !data) {
        console.error("Unable to read item. Error JSON:", JSON.stringify(err, null, 2));
        return reject(JSON.stringify(err, null, 2));
      }
      console.log("scan users data succeeded:", JSON.stringify(data, null, 2));
      if (data.LastEvaluatedKey) {
        // the scan hit the 1 MB page limit: scan the next page and merge the items
        console.log("found a LastEvaluatedKey, continuing scan");
        params.ExclusiveStartKey = data.LastEvaluatedKey;
        return dbHelper.prototype.scanGetUsers(ad, params)
          .then((nextPage) => {
            data.Items = data.Items.concat(nextPage.Items);
            resolve(data);
          })
          .catch(reject);
      }
      resolve(data);
    });
  });
};
Is there a way to work around these issues that I haven't explored yet? Or a way to structure a leader board with DynamoDB that is easier to query?
You can try using the sort key:
When you combine a partition key and sort key, they create a composite key, and that composite key is the primary key for individual items in a table. With a composite key, you gain the ability to use queries with a KeyConditionExpression against the sort key. In a query, you can use KeyConditionExpression to write conditional statements by using comparison operators that evaluate against a key and limit the items returned. In other words, you can use special operators to include, exclude, and match items by their sort key values.
The article contains all the information on how to set it up and use it.
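Concretely, because your GSI already has PlatformId as its partition key and Score as its sort key, you can replace the Scan with a Query and let DynamoDB return only the top of the leader board, already sorted. A minimal sketch, reusing the index and attribute names from the question:
// Hypothetical top-N query against the PlatformId-Score-index GSI.
const params = {
  TableName: tableName,
  IndexName: 'PlatformId-Score-index',
  KeyConditionExpression: 'PlatformId = :p',
  ExpressionAttributeValues: { ':p': User.PlatformId },
  ProjectionExpression: 'NickName, Sport, Score',
  ScanIndexForward: false, // descending by the Score sort key
  Limit: 10,               // only fetch the top 10 entries
};

const data = await docClient.query(params).promise();
// data.Items is already ordered highest score first
console.log(data.Items);
Unlike a Scan, this reads only the items it returns, so the response time no longer grows with the total number of users.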

Firebase Realtime DB: Order query results by number of values for a key

I have a Firebase web Realtime DB with users, each of whom has a jobs attribute whose value is an object:
{
  "userid1": {
    "jobs": {
      "guid1": {},
      "guid2": {}
    }
  },
  "userid2": {
    "jobs": {
      "guid1": {},
      "guid2": {}
    }
  }
}
I want to query to get the n users with the most jobs. Is there an orderBy trick I can use to order the users by the number of values a given user has in their jobs attribute?
I specifically don't want to store an integer count of the number of jobs each user has, because I need to update users' jobs attribute as part of atomic updates that update other user attributes concurrently and atomically, and I don't believe transactions (like incrementing/decrementing counters) can be part of those atomic updates.
Here's an example of the kind of atomic update I'm doing. Note I don't have the user that I'm modifying in memory when I run the following update:
firebase.database().ref('/').update({
  [`/users/${user.guid}/pizza`]: true,
  [`/users/${user.guid}/jobs/${job.guid}/scheduled`]: true,
})
Any suggestions on patterns that would work with this data would be hugely appreciated!
Realtime Database transactions run on a single node in the JSON tree, so it would be quite difficult to integrate the update of a jobsCounter node within your atomic update to several nodes (i.e. to /users/${user.guid}/pizza and /users/${user.guid}/jobs/${job.guid}/scheduled). We would need to update at the /users/${user.guid} level and calculate the counter value there, etc.
An easier approach is to use a Cloud Function to update a user's jobsCounter node each time there is a change to one of the jobs nodes that implies a change in the counter. In other words, if a job node is added or removed, the counter is updated; if an existing node is only modified, the counter is not updated, since there was no change in the number of jobs.
const functions = require('firebase-functions');

exports.updateJobsCounter = functions.database.ref('/users/{userId}/jobs')
  .onWrite((change, context) => {
    if (!change.after.exists()) {
      // This is the case when no more jobs exist for this user
      const userJobsCounterRef = change.before.ref.parent.child('jobsCounter');
      return userJobsCounterRef.transaction(() => {
        return 0;
      });
    } else {
      if (!change.before.val()) {
        // This is the case when the first job is created
        const userJobsCounterRef = change.before.ref.parent.child('jobsCounter');
        return userJobsCounterRef.transaction(() => {
          return 1;
        });
      } else {
        const valObjBefore = change.before.val();
        const valObjAfter = change.after.val();
        const nbrJobsBefore = Object.keys(valObjBefore).length;
        const nbrJobsAfter = Object.keys(valObjAfter).length;
        if (nbrJobsBefore !== nbrJobsAfter) {
          // The number of jobs changed: update the jobsCounter node
          const userJobsCounterRef = change.after.ref.parent.child('jobsCounter');
          return userJobsCounterRef.transaction(() => {
            return nbrJobsAfter;
          });
        } else {
          // No need to update the jobsCounter node
          return null;
        }
      }
    }
  });
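With the counter maintained by the function above, getting the n users with the most jobs becomes an ordinary ordered query. A minimal client-side sketch, assuming n = 10 and the jobsCounter field written above:
// Hypothetical client-side query: the 10 users with the most jobs.
// orderByChild sorts ascending, so take the last 10 and reverse.
const snapshot = await firebase.database()
  .ref('/users')
  .orderByChild('jobsCounter')
  .limitToLast(10)
  .once('value');

const topUsers: Array<{ guid: string | null; jobsCounter: number }> = [];
snapshot.forEach((child) => {
  topUsers.push({ guid: child.key, jobsCounter: child.val().jobsCounter });
});
topUsers.reverse(); // highest jobsCounter first
For this to perform well at scale, also add an ".indexOn": "jobsCounter" rule under /users in your database rules.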

Query size limits in DynamoDB

I don't get the concept of limits for Query/Scan in DynamoDB.
According to the docs:
A single Query operation can retrieve a maximum of 1 MB of data. This limit applies before any FilterExpression is applied to the results.
Let's say I have 10k items, 250 KB per item, all of them fitting the query params.
If I run a simple query, do I get only 4 items?
If I use ProjectionExpression to retrieve only a single attribute (1 KB in size), will I get 1k items?
If I only need to count items (Select: 'COUNT'), will it count all items (10k)?
If I run a simple query, do I get only 4 items?
Yes.
If I use ProjectionExpression to retrieve only a single attribute (1 KB in size), will I get 1k items?
No. FilterExpressions and ProjectionExpressions are applied after the query has completed, so you still get 4 items.
If I only need to count items (Select: 'COUNT'), will it count all items (10k)?
No, still just 4.
The thing that you are probably missing here is that you can still get all 10k results, or the 10k count; you just need to get the results in pages. Some details here. Basically, when your query completes, check the LastEvaluatedKey attribute, and if it's not empty, use it to get the next set of results. Repeat this until the attribute is empty and you know you have all the results.
EDIT: I should say that some of the SDKs abstract this away for you. For example, the Java SDK has query and queryPage, where query will go back to the server multiple times to get the full result set for you (i.e. in your case, give you the full 10k results).
For any operation that returns items, you can request a subset of attributes to retrieve; however, doing so has no impact on the item size calculations. In addition, Query and Scan can return item counts instead of attribute values. Getting the count of items uses the same quantity of read capacity units and is subject to the same item size calculations. This is because DynamoDB has to read each item in order to increment the count.
Managing Throughput Settings on Provisioned Tables
Great explanation by @f-so-k.
This is how I am handling the query.
import AWS from 'aws-sdk';

const dynamodb = new AWS.DynamoDB();

async function loopQuery(params) {
  let keepGoing = true;
  let result = null;
  while (keepGoing) {
    let newParams = params;
    if (result && result.LastEvaluatedKey) {
      // resume the query where the previous page stopped
      newParams = {
        ...params,
        ExclusiveStartKey: result.LastEvaluatedKey,
      };
    }
    result = await dynamodb.query(newParams).promise();
    if (result.Count > 0 || !result.LastEvaluatedKey) {
      keepGoing = false;
    }
  }
  return result;
}

const params = {
  TableName: user,
  IndexName: 'userOrder',
  KeyConditionExpression: 'un = :n',
  ExpressionAttributeValues: {
    ':n': {
      S: name,
    },
  },
  ConsistentRead: false,
  ReturnConsumedCapacity: 'NONE',
  // omit ProjectionExpression to get all attributes back
};

const result = await loopQuery(params);
Edit:
import AWS from 'aws-sdk';

const dynamodb = new AWS.DynamoDB();

async function loopQuery(params) {
  let keepGoing = true;
  let result = null;
  let list = [];
  while (keepGoing) {
    let newParams = params;
    if (result && result.LastEvaluatedKey) {
      newParams = {
        ...params,
        ExclusiveStartKey: result.LastEvaluatedKey,
      };
    }
    result = await dynamodb.query(newParams).promise();
    // collect this page of items before deciding whether to continue
    list = [...list, ...result.Items];
    if (!result.LastEvaluatedKey) {
      keepGoing = false;
    }
  }
  return list;
}

const params = {
  TableName: user,
  IndexName: 'userOrder',
  KeyConditionExpression: 'un = :n',
  ExpressionAttributeValues: {
    ':n': {
      S: name,
    },
  },
  ConsistentRead: false,
  ReturnConsumedCapacity: 'NONE',
  // omit ProjectionExpression to get all attributes back
};

const result = await loopQuery(params);

Query List of Maps in DynamoDB

I am trying to filter a list of maps from a DynamoDB table which is of the following format:
{
  id: "Number",
  users: [
    { userEmail: "abc@gmail.com", age: "23" },
    { userEmail: "de@gmail.com", age: "41" }
  ]
}
I need to get the data of the user with userEmail "abc@gmail.com". Currently I am doing it with the following DynamoDB query. Is there a more efficient way to solve this?
var params = {
  TableName: 'users',
  Key: {
    'id': id
  }
};
var docClient = new AWS.DynamoDB.DocumentClient();
docClient.get(params, function (err, data) {
  if (!err) {
    const users = data.Item.users;
    const user = users.filter(function (user) {
      return user.userEmail == userEmail;
    });
    // the filtered array has the required user in it
  }
});
The only way you can get a single item in DynamoDB by some value is if you have a table with that value as the partition key. So you need to have a table that looks like:
Email (string) - partition key
Id (some-type) - user id
...other relevant user data
Unfortunately, since a nested field cannot be a partition key, you will have to maintain a separate table here and won't be able to use an index in DynamoDB (neither an LSI nor a GSI).
It's a common pattern in NoSQL to duplicate data, so there is nothing unusual about it. If you were using Java, you could use the transactions library to ensure that both tables stay in sync.
If you are not going to use Java, you could read the DynamoDB stream of the original table (where emails are nested fields) and update the new table (where emails are partition keys) whenever the original table is updated.
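A minimal sketch of that stream-based approach, assuming a Lambda attached to the original table's stream with NEW_IMAGE record data; the users-by-email table name and field names are assumptions for illustration:
import AWS from 'aws-sdk';

const docClient = new AWS.DynamoDB.DocumentClient();

// Hypothetical stream handler: mirror each user's email into a lookup
// table keyed by userEmail whenever the original item is written.
export const handler = async (event: any) => {
  for (const record of event.Records) {
    if (!record.dynamodb.NewImage) continue; // ignore deletes in this sketch
    const item = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);
    for (const user of item.users || []) {
      await docClient.put({
        TableName: 'users-by-email', // assumed lookup table, partition key: userEmail
        Item: { userEmail: user.userEmail, id: item.id, age: user.age },
      }).promise();
    }
  }
};
With that in place, fetching a user by email becomes a plain GetItem on the lookup table instead of a client-side filter over the nested list.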
