DynamoDB Mapper Query Doesn't Respect QueryExpression Limit

Imagine the following function which is querying a GlobalSecondaryIndex and associated Range Key in order to find a limited number of results:
@Override
public List<Statement> getAllStatementsOlderThan(String userId, String startingDate, int limit) {
    if (StringUtils.isNullOrEmpty(startingDate)) {
        startingDate = UTC.now().toString();
    }
    LOG.info("Attempting to find all Statements older than ({})", startingDate);

    Map<String, AttributeValue> eav = Maps.newHashMap();
    eav.put(":userId", new AttributeValue().withS(userId));
    eav.put(":receivedDate", new AttributeValue().withS(startingDate));

    DynamoDBQueryExpression<Statement> queryExpression = new DynamoDBQueryExpression<Statement>()
            .withKeyConditionExpression("userId = :userId and receivedDate < :receivedDate")
            .withExpressionAttributeValues(eav)
            .withIndexName("userId-index")
            .withConsistentRead(false);

    if (limit > 0) {
        queryExpression.setLimit(limit);
    }

    List<Statement> statementResults = mapper.query(Statement.class, queryExpression);
    LOG.info("Successfully retrieved ({}) values", statementResults.size());
    return statementResults;
}
List<Statement> results = statementRepository.getAllStatementsOlderThan(userId, UTC.now().toString(), 5);
assertThat(results.size()).isEqualTo(5); // NEVER passes
The limit isn't respected whenever I query against the database. I always get back all results that match my search criteria, so if I set startingDate to now, I get every item in the database, since they're all older than now.

You should use the queryPage function instead of query.
From the DynamoDBQueryExpression.setLimit documentation:
Sets the maximum number of items to retrieve in each service request
to DynamoDB.
Note that when calling DynamoDBMapper.query, multiple
requests are made to DynamoDB if needed to retrieve the entire result
set. Setting this will limit the number of items retrieved by each
request, NOT the total number of results that will be retrieved. Use
DynamoDBMapper.queryPage to retrieve a single page of items from
DynamoDB.
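For example, a minimal sketch of the page-based approach, reusing the mapper, eav map, limit and Statement class from the question:

DynamoDBQueryExpression<Statement> queryExpression = new DynamoDBQueryExpression<Statement>()
        .withKeyConditionExpression("userId = :userId and receivedDate < :receivedDate")
        .withExpressionAttributeValues(eav)
        .withIndexName("userId-index")
        .withConsistentRead(false)
        .withLimit(limit);                                        // with queryPage this caps the page size

QueryResultPage<Statement> page = mapper.queryPage(Statement.class, queryExpression);
List<Statement> firstPage = page.getResults();                    // at most 'limit' items
Map<String, AttributeValue> nextKey = page.getLastEvaluatedKey(); // null when there are no more pages

Feeding nextKey back in via queryExpression.withExclusiveStartKey(nextKey) fetches the following page.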

As they've rightly answered, the setLimit and withLimit functions only limit the number of records fetched in each individual request; internally, multiple requests are made to fetch the full result set.
If you want to limit the number of records fetched across all the requests, then you might want to use "Scan".
An example for the same can be found here.

Related

How to avoid scan operation in dynamodb

Post table
{
  ...otherPostFields,
  tags: string[]
}
User table
{
  ...otherUserFields,
  tags: string[]
}
I am trying to make a feed. I am first fetching the User to get the tags.
I don't want to use scan, since it's very expensive as it goes through all the records in the table. Is there a better approach?
Once I have the user's tags, I use a scan operation on the Post table:
const { tags } = Items[0] as IUser & Pick<CUser, 'tags'>;

const ExpressionAttributeValues = tags.reduce<Record<string, string>>((acc, tag, index) => {
  acc[`:tags${index}`] = tag;
  return acc;
}, {});

const FilterExpression = tags.reduce<string>((acc, _, index) => {
  if (index === 0) return `contains(tags, :tags${index})`;
  return `${acc} OR contains(tags, :tags${index})`;
}, '');

// expensive operation
const { Items: posts } = await client
  .scan({
    TableName: PostsTable.get(),
    FilterExpression,
    Limit: 10,
    ExpressionAttributeValues,
  })
  .promise();
You didn't state the schema of your DynamoDB table, nor which information you have before you make a read, so it's difficult to help you.
However, to answer your question in short: you are not doing an expensive read, as you are setting Limit=10, which will consume 5 RCU per request. If requests are infrequent (fewer than 5 per second), you still stay within DynamoDB's free tier of 25 RCU.
Update
I am trying to make a feed. I am first fetching User to get the tags.
Why not use a Query, as it seems you are trying to get a single user's tags?
One thing that I noticed is the above query does not return any document when the table has over 100k items. Why is that happening?
This is because DynamoDB only returns up to 1 MB of data per API call; if you require more than 1 MB, then you must paginate.
A single Query operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression. If LastEvaluatedKey is present in the response, you will need to paginate the result set. For more information, see Paginating the Results in the Amazon DynamoDB Developer Guide.
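To make that concrete, here is a rough sketch of paginating the poster's scan until 10 matching posts have been collected, written with the low-level AWS SDK for Java purely for illustration (dynamoDb is an AmazonDynamoDB client; the table name and tag value are made up). Because Limit bounds the items evaluated before the filter is applied, a single page can easily contain zero matches, which is why the unpaginated call appears to return nothing on a large table:

List<Map<String, AttributeValue>> posts = new ArrayList<>();
Map<String, AttributeValue> startKey = null;

do {
    ScanRequest request = new ScanRequest()
            .withTableName("Posts")                               // illustrative table name
            .withFilterExpression("contains(tags, :tags0)")       // same shape as the question's filter
            .withExpressionAttributeValues(Map.of(":tags0", new AttributeValue("music")))
            .withLimit(10)                                        // items evaluated per page, not matches
            .withExclusiveStartKey(startKey);                     // continue where the last page stopped

    ScanResult page = dynamoDb.scan(request);
    posts.addAll(page.getItems());
    startKey = page.getLastEvaluatedKey();                        // null once the table is exhausted
} while (posts.size() < 10 && startKey != null);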

Retrieve a result when the partition is not known (but row key is)

In my case (I happen to have only two types for each entry, so 2 partitions, and the row key is unique) I can write an iterative set of queries going over all possible partitions like this:
TableOperation retrieveOperation = TableOperation.Retrieve<JobStatus>(Mode.GreyScale.Description(), id);
TableResult query = await table.ExecuteAsync(retrieveOperation);
if (query.Result != null)
{
    return new OkObjectResult((JobStatus)query.Result);
}
else
{
    retrieveOperation = TableOperation.Retrieve<JobStatus>(Mode.Sepia.Description(), id);
    query = await table.ExecuteAsync(retrieveOperation);
    if (query.Result != null)
    {
        return new OkObjectResult((JobStatus)query.Result);
    }
}
return new NotFoundResult();
The thing is, that is clearly inefficient (imagine if there were hundreds of types!). Does Azure Table Storage provide an efficient means to query when you only know the row key?
Does Azure Table Storage provide an efficient means to query when you know only the row key?
The simple answer to your question is no, there's no efficient way to query a table when you only know the RowKey. The Table Service will do a full table scan, going from one partition to another, to find entities with a matching RowKey.
In your case, you would probably want to use TableQuery to create your query and then either call ExecuteQuery or ExecuteQuerySegmented to get query results.
TableQuery query = new TableQuery().Where("RowKey eq 'Your Row Key'");
var result = table.ExecuteQuery(query);

DynamoDB how to get the item count for a partition key using .net core?

How can I get the item count for a particular partition key using .NET Core, preferably using the Object Persistence Interface or the Document Interface?
Since I do not see any docs anywhere, I currently get the item count by retrieving all the items and counting them, but it is very expensive to do the reads.
What is the best practice for such an item-count request? Thank you.
DynamoDB is mostly a document-oriented key-value DB, so it is not optimized for common relational-DB functionality like item counts.
To minimize the data that is transmitted and to improve speed, you may want to do the following:
Create a Lambda function that returns the item count
This avoids transmitting data outside of AWS, which is slow and expensive.
Query options (a sketch follows this answer)
Use only keys in your projection expression, reducing the data that is transmitted from the DB.
Use the maximum page size, reducing the number of calls needed.
Stream Option
Streams could also be used for keeping counts; e.g. as described in
https://medium.com/signiant-engineering/real-time-aggregation-with-dynamodb-streams-f93547cfb244
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-aggregation.html
Related SO Question
Complexity of finding total records count with partition key in nosql dynamodb table?
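As a rough illustration of the "only keys in your projection expression" point above (low-level AWS SDK for Java; dynamoDb, the table and the attribute names are all made up for the example):

QueryRequest keysOnly = new QueryRequest()
        .withTableName("Stocks")                                  // hypothetical table
        .withKeyConditionExpression("TickerSymbol = :ticker")
        .withExpressionAttributeValues(Map.of(":ticker", new AttributeValue("AMZN")))
        .withProjectionExpression("TickerSymbol")                 // return only the key attribute
        .withLimit(1000);                                         // large page size, fewer round trips

int itemsOnThisPage = dynamoDb.query(keysOnly).getCount();        // still capped by the 1 MB page limit

The answer below goes one step further with Select = "COUNT", which skips returning items altogether.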
I just realized that, using the low-level interface, one can set Select = "COUNT" on a QueryRequest; then calling QueryAsync() or Query() will return only the count, as an integer. Please refer to the code sample below.
private static QueryRequest getStockRecordCountQueryRequest(string tickerSymbol, string prefix)
{
    string partitionName = ":v_PartitionKeyName";
    string sortKeyPrefix = ":v_sortKeyPrefix";

    var request = new QueryRequest
    {
        TableName = Constants.TableName,
        ReturnConsumedCapacity = ReturnConsumedCapacity.TOTAL,
        Select = "COUNT",
        KeyConditionExpression = $"{Constants.PartitionKeyName} = {partitionName} and begins_with({Constants.SortKeyName}, {sortKeyPrefix})",
        ExpressionAttributeValues = new Dictionary<string, AttributeValue>
        {
            { $"{partitionName}", new AttributeValue {
                  S = tickerSymbol
              }},
            { $"{sortKeyPrefix}", new AttributeValue {
                  S = prefix
              }}
        },
        // Optional parameters.
        ConsistentRead = false,
        ExclusiveStartKey = null,
    };

    return request;
}
But I would like to point out that this will still consume the same read units as retrieving all the items and counting them yourself. Since it only returns the count as an integer, though, it is a lot more efficient than transmitting the entire item list across the wire.
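One caveat worth adding (sketched with the AWS SDK for Java, but the same applies to the QueryRequest above): a COUNT query is still limited to roughly 1 MB of data scanned per request, so for a large partition the response carries a LastEvaluatedKey and the per-page counts have to be summed. The table and key names here are illustrative:

QueryRequest countRequest = new QueryRequest()
        .withTableName("Stocks")                                   // illustrative
        .withKeyConditionExpression("TickerSymbol = :ticker")
        .withExpressionAttributeValues(Map.of(":ticker", new AttributeValue("AMZN")))
        .withSelect(Select.COUNT);                                 // return counts only, no items

int total = 0;
Map<String, AttributeValue> startKey = null;
do {
    QueryResult page = dynamoDb.query(countRequest.withExclusiveStartKey(startKey));
    total += page.getCount();                                      // matching items on this page
    startKey = page.getLastEvaluatedKey();                         // present when more pages remain
} while (startKey != null);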
I think using DynamoDB Streams is a more proper way to get the counts for a large project. It is just a lot more complicated to implement.

DynamoDb - .NET Object Persistence Model - LoadAsync does not apply ScanCondition

I am fairly new to this realm and any help is appreciated.
I have a table in a DynamoDB database named Tenant, as below:
"TenantId" is the hash primary key and I have no other keys. I also have a field named "IsDeleted", which is a boolean.
(Table structure screenshot)
I am trying to run a query to get the record with a specified "TenantId" while it is not deleted ("IsDeleted" == 0).
I can get a correct result by running the following code (it returns 0 items):
var filter = new QueryFilter("TenantId", QueryOperator.Equal, "2235ed82-41ec-42b2-bd1c-d94fba2cf9cc");
filter.AddCondition("IsDeleted", QueryOperator.Equal, 0);

var dbTenant = await _genericRepository.FromQueryAsync(new QueryOperationConfig
{
    Filter = filter
}).GetRemainingAsync();
But no luck when I try to get it with the following code snippet; it returns 1 item, even though that item is deleted:
var queryFilter = new List<ScanCondition>();
var scanCondition = new ScanCondition("IsDeleted", ScanOperator.Equal, new object[] { 0 });
queryFilter.Add(scanCondition);

var dbTenant2 = await _genericRepository.LoadAsync("2235ed82-41ec-42b2-bd1c-d94fba2cf9cc", new DynamoDBOperationConfig
{
    QueryFilter = queryFilter,
    ConditionalOperator = ConditionalOperatorValues.And
});
Any idea why the ScanCondition has no effect?
Later I also tried this (it throws an exception):
var dbTenant2 = await _genericRepository.QueryAsync("2235ed82-41ec-42b2-bd1c-d94fba2cf9cc", new DynamoDBOperationConfig()
{
    QueryFilter = new List<ScanCondition>()
    {
        new ScanCondition("IsDeleted", ScanOperator.Equal, 0)
    }
}).GetRemainingAsync();
It throws with: "Message": "Must have one range key or a GSI index defined for the table Tenants"
Why does it complain about a range key or index? I'm calling:
public AsyncSearch<T> QueryAsync<T>(object hashKeyValue, DynamoDBOperationConfig operationConfig = null);
You simply can't query a table giving only a single primary key (only the hash key), because there is one and only one item for that primary key. The result of the Query would still be that single item, which is effectively a Load operation, not a Query. You can only query if you have a composite primary key in this case (hash (TenantId) and range key) or a GSI (which doesn't impose key uniqueness and therefore accepts duplicate keys in the index).
Your second code snippet attempts to filter the Load. DynamoDBOperationConfig's QueryFilter has this description:
// Summary:
// Query filter for the Query operation. Evaluates the query results and
// returns only the matching values. If you specify more than one condition, then
// by default all of the conditions must evaluate to true. To match only some conditions,
// set ConditionalOperator to Or. Note: Conditions must be against non-key properties.
So it works only with Query operations.
Edit: So after reading your comments on this...
I don't think conditional expressions are for read operations; the AWS documentation indicates they are for put or update operations. However, I'm not entirely sure about this, since I never needed to do a conditional Load. There is also no such thing as a CheckIfExists functionality in general; you have to read the item and see if it exists. A conditional load would still consume read throughput, so your only advantage would be NOT retrieving the item, in other words saving bandwidth (which is negligible for a single item).
My suggestion is to read the item and filter it in your application layer; don't query for it. However, if you really need it, you could also use TenantId as the hash key and isDeleted as the range key. If you do so, you always have to Query when you want to get a tenant, and in the query you can set the range key (isDeleted) to 0 or 1. This isn't how I would do it, though; as I said, I would just read the item and filter it in my application.
Another suggestion could be to set a GSI on the isDeleted field and write null (i.e. omit the attribute) when it is 0. This way the attribute is only present on an item when it is 1; a GSI on such an attribute is called a sparse index. Later, if you need to get all the tenants that are deleted (isDeleted = 1), you can simply scan that entire index without any conditions. Because you write null when the value is 0, DynamoDB won't put the item in the index in the first place.
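A rough sketch of that sparse-index idea, written in Java with the low-level SDK purely for illustration (the "IsDeleted-index" GSI name, the dynamoDb client and the tenantId/isDeleted variables are assumptions): write the IsDeleted attribute only when the tenant really is deleted, so the index only ever contains deleted tenants.

Map<String, AttributeValue> item = new HashMap<>();
item.put("TenantId", new AttributeValue(tenantId));
if (isDeleted) {
    // Only deleted tenants carry the GSI key attribute, so only they land in the index.
    item.put("IsDeleted", new AttributeValue().withN("1"));
}
dynamoDb.putItem(new PutItemRequest().withTableName("Tenants").withItem(item));

// Later: every deleted tenant, no filter conditions needed.
ScanResult deleted = dynamoDb.scan(new ScanRequest()
        .withTableName("Tenants")
        .withIndexName("IsDeleted-index"));    // hypothetical sparse GSI on IsDeleted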

Use vogels js to implement pagination

I am implementing a website with a DynamoDB + Node.js backend. I use Vogels.js on the server side to query DynamoDB and show results on a webpage. Because my query returns a lot of results, I would like to return only N (such as 5) results back to a user initially, and return the next N results when the user asks for more.
Is there a way I can run two vogels queries, with the second query starting from the place where the first query left off? Thanks.
Yes, vogels fully supports pagination on both query and scan operations.
For example:
var Tweet = vogels.define('tweet', {
  hashKey : 'UserId',
  rangeKey : 'PublishedDateTime',
  schema : {
    UserId : Joi.string(),
    PublishedDateTime : Joi.date().default(Date.now),
    content : Joi.string()
  }
});

// Fetch the 5 most recent tweets from user with id 555:
Tweet.query(555).limit(5).descending().exec(function (err, data) {
  var paginationKey = data.LastEvaluatedKey;

  // Fetch the next page of 5 tweets
  Tweet.query(555).limit(5).descending().startKey(paginationKey).exec();
});
Yes, it is possible. DynamoDB has something called "LastEvaluatedKey" which will serve your purpose.
Step 1) Query your table with the option "Limit" = number of records.
refer: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
Step 2) If your query has more records than the "Limit" value, DynamoDB will return a "LastEvaluatedKey", which you can pass in your next query as "ExclusiveStartKey" to get the next set of records, until there are no records left.
refer: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html#QueryAndScan.Query
Note: Be aware that to get a previous set of records you might have to store all the "LastEvaluatedKey" values and implement this at the application level.
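A small sketch of that bookkeeping, written in Java against the low-level SDK purely for illustration (buildQuery() stands for whatever builds your base QueryRequest with table name, key condition and Limit): remember the LastEvaluatedKey returned after each page, and re-run the query with the stored boundary to go back.

// Boundaries between pages, in order: pageBoundaries.get(i) is where page i+2 starts.
List<Map<String, AttributeValue>> pageBoundaries = new ArrayList<>();

QueryResult firstPage = dynamoDb.query(buildQuery());              // page 1: no ExclusiveStartKey
pageBoundaries.add(firstPage.getLastEvaluatedKey());

int n = 2;                                                          // any later page
QueryResult pageN = dynamoDb.query(buildQuery()
        .withExclusiveStartKey(pageBoundaries.get(n - 2)));         // page n starts where page n-1 ended
pageBoundaries.add(pageN.getLastEvaluatedKey());

// "Previous" re-runs the query with the boundary stored for the earlier page,
// or with no start key at all to return to page 1.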
