Boto3 dynamodb query on a table with 10Gb size - amazon-dynamodb

I have been trying to fetch all the records from one of my GSIs and have seen that there is an option to loop through using the LastEvaluatedKey in the response, but only if I do a scan. I did not find a better way to paginate a query in boto3. Is it possible to paginate using a query?
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('myTable')

res = table.query(
    IndexName='my-index',
    KeyConditionExpression=Key('myVal').eq(1)
)
while 'LastEvaluatedKey' in res:
    for item in res['Items']:
        print(item)  # returns only a subset of them

The documentation mentions the limit of boto3's table.query(): 1 MB of data per call. To go beyond that you can use the query paginator (DynamoDB.Paginator.Query), which returns an iterator (which makes sense). It seems you can replace your table.query with the paginator. Try it out.
Notes:
There is a catch with boto3.resource(): not all service features are implemented on the resource interface, and the DynamoDB pagination generator is one of those cases, so you have to use the low-level client:
import boto3
dyno_client = boto3.client('dynamodb')
paginator = dyno_client.get_paginator('query')
response_iterator = paginator.paginate(.....)
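For example, a minimal sketch of paginating the GSI query from the question with the low-level client, assuming myVal is a numeric attribute as in the question:
import boto3

dyno_client = boto3.client('dynamodb')
paginator = dyno_client.get_paginator('query')

# Each page is capped at 1 MB; the paginator follows LastEvaluatedKey for you.
response_iterator = paginator.paginate(
    TableName='myTable',
    IndexName='my-index',
    KeyConditionExpression='myVal = :v',
    ExpressionAttributeValues={':v': {'N': '1'}},
)

for page in response_iterator:
    for item in page['Items']:
        print(item)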

Related

DynamoDB PartiQL pagination using SDK

I'm currently working on pagination in DynamoDB using the JS AWS-SDK's executeStatement using PartiQL, but my returned object does not contain a NextToken (only the Items array), which is used to paginate.
This is what the code looks like (pretty simple):
const statement = `SELECT "user", "id" FROM "TABLE-X" WHERE "activity" = 'XXXX'`;
const params = { Statement: statement };

try {
    const posted = await dynamodb.executeStatement(params).promise();
    return { posted: posted };
} catch (err) {
    throw new Error(err);
}
I was wondering if anyone has dealt with pagination using PartiQL for DynamoDB.
Could this be because my partition key is a string type?
Still trying to figure it out.
Thanks in advance!
It turns out that if you want a NextToken, DO NOT use version 2 of the AWS SDK for JavaScript. Use version 3. Version 3 will always return a NextToken field, even if it is undefined.
From there you can figure out your limits, etc. (by default a page is up to 1 MB of data, so you won't actually get a NextToken until a result set exceeds that). You'll need to look into the DynamoDB v3 ExecuteStatement method.
You can also look into dynamodb paginators, which I've never used, but plan on studying.
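The question uses the JS SDK, but purely as an illustration of the same NextToken loop, here is a minimal sketch with boto3 in Python (the statement is the one from the question; the table and values are placeholders):
import boto3

dynamodb = boto3.client('dynamodb')

statement = "SELECT \"user\", \"id\" FROM \"TABLE-X\" WHERE \"activity\" = 'XXXX'"
items = []
next_token = None

# Keep calling ExecuteStatement, feeding each NextToken back in,
# until the service stops returning one.
while True:
    kwargs = {'Statement': statement}
    if next_token:
        kwargs['NextToken'] = next_token
    response = dynamodb.execute_statement(**kwargs)
    items.extend(response['Items'])
    next_token = response.get('NextToken')
    if not next_token:
        break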

Looking at datastore field that is not indexed

I have an id field that is indexed and a boolean field x that is not indexed. Is there any way to view all the entities with x set to true without the following?
having a set of ids to filter by
scrolling through the UI page by page
Unfortunately, no. Cloud Datastore requires an index to query on a property. You could write a script to generate the list of IDs. For example, in Python:
from google.cloud import datastore

client = datastore.Client()
query = client.query(kind='foo')
results = list(query.fetch())

for i in results:
    # Use .get() so entities that lack the 'x' property don't raise a KeyError.
    if i.get('x') is True:
        print('Entity {} with id {} has x = True'.format(i.key, i['id']))

Document count query ignores PartitionKey

I am looking to get the count of all documents in a chosen partition. The following code however will return the count of all documents in the collection and costs 0 RU.
var collectionLink = UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);
string command = "SELECT VALUE COUNT(1) FROM Collection c";

FeedOptions feedOptions = new FeedOptions()
{
    PartitionKey = new PartitionKey(BuildPartitionKey(contextName, domainName)),
    EnableCrossPartitionQuery = false
};

var count = client.CreateDocumentQuery<int>(collectionLink, command, feedOptions)
    .ToList()
    .First();
Adding a WHERE c.partition = 'blah' clause to the query will work, but it costs 3.71 RUs with 11 documents in the collection.
Why does the above code snippet return the count of the whole collection, and is there a better solution for getting the count of all documents in a chosen partition?
If the query includes a filter against the partition key, like SELECT * FROM c WHERE c.city = "Seattle", it is routed to a single partition. If the query does not have a filter on the partition key, then it is executed in all partitions, and the results are merged client side.
You can check the logical steps the SDK performs when you issue a query to Azure Cosmos DB in this official doc.
If the query is an aggregation like COUNT, the counts from individual partitions are summed to produce the overall count.
So when you just use SELECT VALUE COUNT(1) FROM Collection c, it is executed in all partitions and the results are merged client side.
If you want the count of all documents in a chosen partition, just add the WHERE c.partition = 'XX' filter to the query itself.
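For illustration only, a minimal sketch of such a partition-scoped count using the Python azure-cosmos SDK (the question uses the .NET SDK; the endpoint, key, database, container, and the 'blah' partition value are placeholders):
from azure.cosmos import CosmosClient

# Placeholders: point these at your own account, database and container.
url = "https://<your-account>.documents.azure.com:443/"
key = "<your-key>"
client = CosmosClient(url, credential=key)
container = client.get_database_client("mydb").get_container_client("mycoll")

# Filter on the partition key path inside the query itself and scope the
# request to that single partition.
count = list(container.query_items(
    query="SELECT VALUE COUNT(1) FROM c WHERE c.partition = @pk",
    parameters=[{"name": "@pk", "value": "blah"}],
    partition_key="blah",
))[0]
print(count)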
Hope it helps you.
I believe this is actually a bug since I am having the same problem with the partition key set in both the query and the FeedOptions.
A similar issue has been reported here:
https://github.com/Azure/azure-cosmos-dotnet-v2/issues/543
And Microsoft's response makes it sound like it is an SDK issue that is x64-specific.

filter by id in ndb model

I am using an ndb model as my database. What I am trying to do is filter the results based on a list of ids.
I have my model like this:
class Photo(ndb.Model):
    userid = ndb.StringProperty()
    source = ndb.StringProperty()
    handle = ndb.StringProperty()
    sourceid = ndb.StringProperty()
So I am trying a query like this:
queryset=Photo.query(Photo.key.id().IN(photoid_list))
I have also tried:
queryset=Photo.query(Photo.id().IN(photoid_list))
where photoid_list is the list of ids.
Please help me solve this.
I would suggest that you create keys from each id and then get them all at once:
photo_keys = [ndb.Key(Photo, id) for id in photoid_list]
photos = ndb.get_multi(photo_keys)
The advantage is that a get is faster than a query. Also, ndb will memcache the entities by key and make subsequent gets even faster.
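One small note on get_multi: if some ids in photoid_list have no stored entity, the corresponding slots in the result are None, so you may want to filter those out (a minimal sketch reusing the names above):
photo_keys = [ndb.Key(Photo, id) for id in photoid_list]
# get_multi returns None for keys with no stored entity; drop them.
photos = [p for p in ndb.get_multi(photo_keys) if p is not None]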
There are examples in the docs:
https://cloud.google.com/appengine/docs/python/ndb/queries#neq_and_in
So it looks like your query should be:
queryset=Photo.query(Photo.id.IN(photoid_list))

Use vogels js to implement pagination

I am implementing a website with a DynamoDB + Node.js backend. I use Vogels.js on the server side to query DynamoDB and show the results on a webpage. Because my query returns a lot of results, I would like to return only N (such as 5) results to the user initially, and return the next N results when the user asks for more.
Is there a way I can run two vogels queries where the second query starts from the place the first query left off? Thanks.
Yes, vogels fully supports pagination on both query and scan operations.
For example:
var Tweet = vogels.define('tweet', {
  hashKey : 'UserId',
  rangeKey : 'PublishedDateTime',
  schema : {
    UserId : Joi.string(),
    PublishedDateTime : Joi.date().default(Date.now),
    content : Joi.string()
  }
});

// Fetch the 5 most recent tweets from user with id 555:
Tweet.query(555).limit(5).descending().exec(function (err, data) {
  var paginationKey = data.LastEvaluatedKey;

  // Fetch the next page of 5 tweets
  Tweet.query(555).limit(5).descending().startKey(paginationKey).exec();
});
Yes it is possible. DynamoDB has something called "LastEvaluatedKey" which will serve your purpose.
Step 1) Query your table with the option "Limit" set to the number of records you want per page.
refer: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
Step 2) If your query has more records than the Limit value, DynamoDB returns a "LastEvaluatedKey", which you can pass in your next query as "ExclusiveStartKey" to get the next set of records, until there are no records left (the loop is sketched below).
refer: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html#QueryAndScan.Query
Note: Be aware that to get a previous set of records you might have to store all the "LastEvaluatedKeys" and implement this at the application level.
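This answer describes the raw Query API rather than vogels itself, so as a rough illustration of the Limit / ExclusiveStartKey loop, here is a minimal sketch in Python with boto3 (the table name, key name, and value are placeholders borrowed from the vogels example above):
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('tweet')

# First page of 5 records, newest first.
page = table.query(
    KeyConditionExpression=Key('UserId').eq('555'),
    ScanIndexForward=False,
    Limit=5,
)
items = page['Items']

# When the user asks for more, resume from LastEvaluatedKey.
while 'LastEvaluatedKey' in page:
    page = table.query(
        KeyConditionExpression=Key('UserId').eq('555'),
        ScanIndexForward=False,
        Limit=5,
        ExclusiveStartKey=page['LastEvaluatedKey'],
    )
    items.extend(page['Items'])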
