query by pages DynamoDB - amazon-dynamodb

My app needs to query a list of objects from DynamoDB, and I do so using an API written in Python (boto3).
I want the user to be able to view the items page by page and request the next or previous page.
How can I implement this in the API?
My table is structured with two keys:
'key1': user1 //string
'key2' : item1 //string
'key1': user1
'key2' : item2
'key1': user1
'key2' : item3
'key1': user1
'key2' : item4
I'm currently using query to get all the items for a user, but at large scale I would like to fetch them page by page. The scan operation returns a value called "LastEvaluatedKey", so I thought about passing it as the ExclusiveStartKey and limiting the result to the number of items per page, but that still scans the entire table, right?

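DynamoDB returns a LastEvaluatedKey whenever a query or scan operation stops before reaching the end of the result set, either because the results exceed 1 MB or because the Limit you set was reached.
A scan is still an inefficient operation, even if you are paginating the results, because it walks through the whole table rather than one user's items. A query reads only the items under the partition key you specify, and with Limit and ExclusiveStartKey it reads just one page of them. If you want to provide pagination in your application, you'll get the best performance by paginating over the results of a query operation. Avoid scan if you can!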
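As a rough illustration, paginating over a query could look like the sketch below. It assumes `table` is a boto3 Table resource for the table in the question, with `key1` as the partition key and `key2` as the sort key; the function name `fetch_page` and its parameters are made up for this example and are not part of boto3.

```python
from typing import Any, List, Optional, Tuple


def fetch_page(table: Any, user_id: str, page_size: int,
               start_key: Optional[dict] = None) -> Tuple[List[dict], Optional[dict]]:
    """Return one page of a user's items and the key to resume from.

    `table` is assumed to be a boto3 Table resource, e.g.
    boto3.resource("dynamodb").Table("items") (table name hypothetical).
    """
    kwargs = {
        # Only items under this partition key are read, not the whole table.
        "KeyConditionExpression": "key1 = :user",
        "ExpressionAttributeValues": {":user": user_id},
        # Limit caps how many items are read for this page.
        "Limit": page_size,
    }
    if start_key is not None:
        # Resume right after the last item of the previous page.
        kwargs["ExclusiveStartKey"] = start_key
    response = table.query(**kwargs)
    # LastEvaluatedKey is absent once the end of the result set is reached.
    return response.get("Items", []), response.get("LastEvaluatedKey")
```

The API would hand the returned key back to the client and accept it as `start_key` on the next request. The key only points forward, so one option for a "previous" button is to remember the start keys of the pages already visited.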

Related

Organizing a Cloud Firestore database

I can't work out the best way to organize the database for my app:
My users can create items identified by a unique ID.
The queries I need:
- Query 1: Get all the items created by a user
- Query 2: From the UID of an item, get its creator
My database is organized as follows:
Users database
user1 : {
item1_uid,
item2_uid
},
user2 : {
item3_uid
}
Items database
item1_uid : {
title,
description
},
item2_uid : {
title,
description
},
item3_uid : {
title,
description
}
Query 1 is quite simple, but for query 2 I need to parse the whole users database and list all the item IDs to see if the one I am looking for is there. It works right now, but I'm afraid that it will slow down the request as the database grows.
Should I add in the items data a row with the user ID? If yes, the query will be simpler, but I heard that I am not supposed to have the same data twice in the database because it can lead to conflicts when adding or removing items.
Should I add in the items data a row with the user ID?
Yes, this is a very common approach in the NoSQL world and is called denormalization. Denormalization is described, in this "famous" post about NoSQL data modeling, as "copying of the same data into multiple documents in order to simplify/optimize query processing or to fit the user’s data into a particular data model". In other words, the main driver of your data model design is the queries you plan to execute.
More concretely, you could have an extra field in your item documents that contains the ID of the creator. You could even have another one with, e.g., the name of the creator. This way, in one query, you can display the items and their creators.
Now, to keep these different documents in sync (for example, if you change the name of one user, you want it to be updated in the corresponding items), you can either use a Batched Write to modify several documents in one atomic operation, or rely on one or more Cloud Functions that detect changes to the user documents and reflect them in the item documents.
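To make the idea concrete, here is a small sketch of the denormalized layout. It uses plain in-memory dicts as stand-ins for the two collections rather than the Firestore SDK, and the field names `creator_uid` and `creator_name`, the sample data, and the helper functions are all assumptions made for illustration.

```python
from typing import Dict, List

# In-memory stand-ins for the users and items collections (hypothetical data).
users: Dict[str, dict] = {
    "user1": {"name": "Alice"},
    "user2": {"name": "Bob"},
}
items: Dict[str, dict] = {
    "item1_uid": {"title": "First", "creator_uid": "user1", "creator_name": "Alice"},
    "item2_uid": {"title": "Second", "creator_uid": "user1", "creator_name": "Alice"},
    "item3_uid": {"title": "Third", "creator_uid": "user2", "creator_name": "Bob"},
}


def items_created_by(user_uid: str) -> List[str]:
    """Query 1: IDs of the items whose creator_uid matches the user."""
    return sorted(uid for uid, doc in items.items() if doc["creator_uid"] == user_uid)


def creator_of(item_uid: str) -> str:
    """Query 2: a single item lookup, with no pass over the users."""
    return items[item_uid]["creator_uid"]


def rename_user(user_uid: str, new_name: str) -> None:
    """Update the user and every copy of the name stored on their items."""
    users[user_uid]["name"] = new_name
    for doc in items.values():
        if doc["creator_uid"] == user_uid:
            doc["creator_name"] = new_name
```

`rename_user` plays the role that a batched write or a Cloud Function would play in a real project: it is the one place where the duplicated name is kept consistent.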

How to return the total matches on a Cosmos db query

I have set up an API that queries our Cosmos DB and returns the JSON results to the front-end app. There is a user-defined limit on the number of results. If the number of results exceeds the limit, I pass the token back to the front end so it can call for the next group of rows. The issue is that I would like to return a count of the total number of matches to the application. I have looked at the query statistics but don't see a total count anywhere.
On the call to CreateDocumentQuery, I'm setting MaxItemCount to the limit, and RequestContinuation to either null or the continuationToken. Looking at QueryMetrics I found RetrievedDocumentCount, but that does not seem to have the correct value.
Thanks,
J
The x-ms-max-item-count request header controls how many documents are returned to the user per request.
The default value is 100.
If your query matches 150 documents, your request will return the first 100 documents along with a continuation token in the response header (x-ms-continuation). If there is a token, you need to send another request with that token to get the rest of the data.
The SDK should be doing that for you automatically. Can you share some of your code? I might have a better answer then.
You can check out my post about this too.
https://h-savran.blogspot.com/2019/04/introduction-to-continuation-tokens-in.html
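The continuation-token flow described in this answer can be sketched generically. The `fetch_page` callable below is hypothetical: it stands for whatever call sends the query with a given token and returns the documents plus the next token (or None when there is none). It is not a Cosmos DB SDK function.

```python
from typing import Callable, List, Optional, Tuple

# A page is the documents returned plus the token for the next request.
Page = Tuple[List[dict], Optional[str]]


def fetch_all(fetch_page: Callable[[Optional[str]], Page]) -> List[dict]:
    """Keep requesting pages until the service stops returning a token."""
    documents: List[dict] = []
    token: Optional[str] = None
    while True:
        docs, token = fetch_page(token)
        documents.extend(docs)
        if not token:
            # No continuation token means the result set is exhausted.
            break
    return documents
```

Counting the result of `fetch_all` does give the total number of matches, but only at the cost of reading every page.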

Firestore query get size of results without reading the documents?

I have an app that returns a list of health foods. There will be approximately 10000-20000 foods (documents) in the product collection.
These foods are queried by multiple fields using arrayContains. This may be categories, subcategories and when the user searches in the search bar it is an arrayContains on the keywords array.
With so many products I plan to paginate the results of query as I get the documents. The issue is that I need to know the amount of results to display the total of results to the user.
I have read that for a query you are charged one read, and that if you then get the documents you are charged further per document. Is there a way of getting the number of results for a query without getting all the documents?
I have seen this answer here:
Get size of the query in Firestore
But in this example they say to use a counter which doesn't seem practical as I am using a query on keyword when the user searches and I am using a mixture of categories, subcategories when the user filters.
Thanks
With so many products I plan to paginate the results of query as I get the documents.
That's a very good decision, since getting 10000-20000 foods (documents) at once is not an option. The first reason is cost: it will be quite expensive. The second is that you'll get an OutOfMemoryError when trying to load such an enormous amount of data.
The issue is that I need to know the amount of results to display the total of results to the user.
Firestore offers no way to know the size of the result set in advance.
Is there a way of getting the number of results for a query without getting all the documents?
No, you have to page through all the results that are returned by your query to get the total size.
But in this example they say to use a counter which doesn't seem practical as I am using a query on keyword when the user searches
That's correct, that solution doesn't fit your needs since it solves the problem of storing the number of all documents in a collection and not the number of documents that are returned by a query. As far as I know, it's just not scalable to provide that information, in the way this cloud hosted, NoSQL, realtime database needs to "massively scale".
For any future lurker, a "solution" to this problem is to paginate results with an offset until the query doesn't return any more documents. When the query snapshot is empty, return undefined for the next offset and handle it from there:
const LIMIT = 100
// Query-string values are strings, so parse the offset and default to 0.
const offset = parseInt(req.query.offset, 10) || 0
// get() returns a Promise, so this must run inside an async handler.
const results = await db.collection(COLLECTION)
  .offset(offset)
  .limit(LIMIT)
  .get()
const docs = results.docs.map(doc => doc.data())
res.status(200).send({
  data: docs,
  // Return the next offset or undefined if no docs are returned anymore.
  offset: docs.length > 0 ? offset + LIMIT : undefined
})

How to query for most recent submissions from users I follow using Firestore?

I've been reading many SO posts about specific querying situations and I've hit a roadblock on mine when trying to determine the best approach without going overboard on read requests.
I have an app where a user can follow multiple artists and on the home page it should display the upcoming submissions sorted by date. I want to limit the number of reads to 15, but the way my structure is set up, I cannot limit overall reads but just reads to a specific artist. Let me explain:
For a specific user, I save all artists that a user follows into an array:
user: {
userUsername: 'jlewallen18',
userArtistUIDs: [1,4,8,9]
}
To establish a connection to a submission from an artist a user follows, I add the artistUID to the submission:
submission: {
submissionArtistUID: 4,
submissionTitle: 'New Album',
submissionReleaseDate: 1550707200
}
Now the way I find what I need to display on the home page for a specific user is to loop through all the artist IDs the user follows and query for submissions with each ID. I can then subscribe to those observables using combineLatest from rxjs and receive my results.
this.submission$ = user.userArtistUIDs.map((id) => {
return this.db.collection$('submissions', ref => ref
.where('submissionArtistUID', '==', id));
});
This works great, but won't scale well against my read quota, because it queries for every single release from every artist I follow. So if I follow 300 artists, each with 3 releases, I would have 900 reads sent to the app, and I'd have to sort and slice the final array to cut it to 15 on the client side.
This is what I'm looking for:
With my list of artistIDs -> query for submissions that contain the
artistID in my array -> sort by ascending order -> limit to 15.
Thus my read count would only be 15. I know that for NoSQL there isn't a one trick pony, but I am struggling to figure out the optimal method as there will be more reads than writes on my application.
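Firestore has a limit() method that can be chained onto the query. It caps each per-artist query at 15 documents rather than the overall total, so the combined results would still need to be sorted and trimmed on the client.
Would this work?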
this.submission$ = user.userArtistUIDs.map((id) => {
return this.db.collection$('submissions', ref => ref
.where('submissionArtistUID', '==', id).limit(15));
});
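Because each per-artist query is limited separately, the combined results still have to be merged and cut down to 15. A minimal sketch of that client-side step is shown below, assuming the per-artist results have already been fetched into lists of dicts shaped like the submission above; the function name `latest_feed` and its parameters are made up for this example.

```python
import heapq
from typing import Dict, List


def latest_feed(per_artist: Dict[int, List[dict]], size: int = 15) -> List[dict]:
    """Merge per-artist results and keep the `size` earliest release dates."""
    all_submissions = (s for subs in per_artist.values() for s in subs)
    # nsmallest returns the items in ascending order of the key.
    return heapq.nsmallest(
        size, all_submissions, key=lambda s: s["submissionReleaseDate"]
    )
```

This keeps the payload shown to the user at 15 entries, but the number of reads still grows with the number of followed artists.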

How to retrieve range of data in firebase

I have user data keyed by UUID. I want to send mail to each user, but I don't want to send it to all of them at once, so I want to retrieve a range of users.
For example, I have 1000 users and I want to send mail to users 1-100, then 101-200, 201-300 and so on. How can I achieve this?
I have seen the startAt() and endAt() functions, but I don't know the user keys at the beginning and end of each range, so I can't retrieve a range that way.
To get the first 100 users you'd do a query on:
query = ref.orderByKey().limitToFirst(100)
Then as you process the users, keep track of the last key you've processed:
var lastSeenKey;
query.once("value", function(snapshot) {
snapshot.forEach(function(userSnapshot) {
lastSeenKey = userSnapshot.key;
});
});
And then to load the next 100 users, you make the query start at the last key you've seen:
query = ref.orderByKey().startAt(lastSeenKey).limitToFirst(101)
You'll note that we retrieve one extra item here, since the first user in this query will be the same as the last user in the previous query.
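The same batching logic can be sketched without the Firebase SDK. The example below mimics orderByKey().startAt(...).limitToFirst(...) on a plain dict, assuming string keys that sort lexicographically; the helper names `limit_to_first` and `batches` are made up for this illustration.

```python
from typing import Dict, Iterator, List, Optional


def limit_to_first(users: Dict[str, dict], count: int,
                   start_at: Optional[str] = None) -> List[str]:
    """Mimic orderByKey().startAt(start_at).limitToFirst(count) on a dict."""
    keys = sorted(k for k in users if start_at is None or k >= start_at)
    return keys[:count]


def batches(users: Dict[str, dict], size: int) -> Iterator[List[str]]:
    """Yield user keys in batches of `size`, resuming from the last key seen."""
    last_seen: Optional[str] = None
    while True:
        if last_seen is None:
            keys = limit_to_first(users, size)
        else:
            # startAt is inclusive, so fetch one extra key and drop the repeat.
            keys = limit_to_first(users, size + 1, start_at=last_seen)[1:]
        if not keys:
            return
        yield keys
        last_seen = keys[-1]
```

Each batch can be mailed before the next one is requested, and only the last key seen needs to be remembered between batches.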

Resources