How to avoid scan operation in dynamodb - amazon-dynamodb

Post table
{
...otherPostFields,
tags: string[]
}
User table
{
...otherUserFields,
tags: string[]
}
I am trying to make a feed
I am first fetching User to get the tags
I don't want to use scan since its very expensive as it goes through all the records in the table.
Any better approach?
Once I have the user tags I use scan operation on Post table
const { tags } = Items[0] as IUser & Pick<CUser, 'tags'>;
const ExpressionAttributeValues = tags.reduce<Record<string, string>>((acc, tag, index) => {
acc[`:tags${index}`] = tag;
return acc;
}, {});
const FilterExpression = tags.reduce<string>((acc, _, index) => {
if (index === 0) return `contains(tags, :tags${index})`;
return `${acc} OR contains(tags, :tags${index})`;
}, '');
// expensive operation
const { Items: posts } = await client
.scan({
TableName: PostsTable.get(),
FilterExpression,
Limit: 10,
ExpressionAttributeValues,
})
.promise();

You didn't state the schema of your DynamoDB table nor which information you have before you make a read so it's difficult to help you.
However, to answer your question in short, you are not doing an expensive read as you are setting Limit=10 which will consume 5 RCU per request. If requests are infrequent (less than 5 times per second) you still stay within DynamoDBs free tier of 25 RCU.
Update
I am trying to make a feed I am first fetching User to get the tags
Why not use a Query as you are trying to get a single users tag it seems.
One thing that I noticed is the above query does not return any document when the table has over 100k items. Why is that happening?
This is because DynamoDB only returns up to 1MB per API call, if you require more than 1MB the you must paginate.
A single Query operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression. If LastEvaluatedKey is present in the response, you will need to paginate the result set. For more information, see Paginating the Results in the Amazon DynamoDB Developer Guide.

Related

How do I know if there are more documents left to get from a firestore collection?

I'm using flutter and firebase. I use pagination, max 5 documents per page. How do I know if there are more documents left to get from a firestore collection. I want to use this information to enable/disable a next page button presented to the user.
limit: 5 (5 documents each time)
orderBy: "date" (newest first)
startAfterDocument: latestDocument (just a variable that holds the latest document)
This is how I fetch the documents.
collection.limit(5).orderBy("date", descending: true).startAfterDocument(latestDocument).get()
I thought about checking if the number of docs received from firestore is equal to 5, then assume there are more docs to get. But this will not work if I there are a total of n * 5 docs in the collection.
I thought about getting the last document in the collection and store this and compare this to every doc in the batches I get, if there is a match then I know I've reach the end, but this means one excess read.
Or maybe I could keep on getting docs until I get an empty list and assume I've reached the end of the collection.
I still feel there are a much better solution to this.
Let me know if you need more info, this is my first question on this account.
There is no flag in the response to indicate there are more documents. The common solution is to request one more document than you need/display, and then use the presence of that last document as an indicator that there are more documents.
This is also what the database would have to do to include such a flag in its response, which is probably why this isn't an explicit option in the SDK.
You might also want to check the documentation on keeping a distributed count of the number of documents in a collection as that's another way to determine whether you need to enable the UI to load a next page.
here's a way to get a large data from firebase collection
let latestDoc = null; // this is to store the last doc from a query
//result
const dataArr = []; // this is to store the data getting from firestore
let loadMore = true; // this is to check if there's more data or no
const initialQuery = async () => {
const first = db
.collection("recipes-test")
.orderBy("title")
.startAfter(latestDoc || 0)
.limit(10);
const data = await first.get();
data.docs.forEach((doc) => {
// console.log("doc.data", doc.data());
dataArr.push(doc.data()); // pushing the data into the array
});
//! update latest doc
latestDoc = data.docs[data.docs.length - 1];
//! unattach event listeners if no more docs
if (data.empty) {
loadMore = false;
}
};
// running this through this function so we can actual await for the
//docs to get from firebase
const run = async () => {
// looping until we get all the docs
while (loadMore) {
console.log({ loadMore });
await initialQuery();
}
};

Flutter & Firebase Get more than 10 Firebase Documents into a Stream<List<Map>>

With Flutter and Firestore, I am trying to get more than 10 documents into a Stream<List>. I can do this with a .where clause on a collection mapping the QuerySnapshot. However, the 10 limit is a killer.
I'm using the provider package in my app. So, in building a stream in Flutter with a StreamProvider, I can return a
Stream<List<Map from the entire collection. too expensive. 200 plus docs on these collections and too many users. Need to get more efficient.
Stream<List<Map using a .where from a Collection that returns a Stream List 10 max on the list...doesn't cut the mustard.
Stream<Map from a Document, that returns 1 stream of 1 document.
I need something in between 1 and 2.
I have a Collection with up to 500 Documents, and the user will choose any possible combination of those 500 to view. The user assembles class rosters to view their lists of users.
So I'm looking for a way to get individual streams of, say 30 documents, and then compile them into a List: But I need this List<Stream<Map to be a Stream itself so each individual doc is live, and I can also filter and sort this list of Streams. I'm using the Provider Package, and if possible would like to stay consistent with that. Here's where I am currently stuck:
So, my current effort:
Future<Stream<List<AttendeeData>>> getStreams() async {
List<Stream<AttendeeData>> getStreamsOutput = [];
for (var i = 0; i < teacherRosterList.length; i++) {
Stream thisStream = await returnTeacherRosterListStream(facility, teacherRosterList[i]);
getStreamsOutput.add(thisStream);
}
return StreamZip(getStreamsOutput).asBroadcastStream();
}
Feels like I'm cheating below: I get an await error if I put the snapshot directly in Stream thisStream above as Stream is not a future if I await, and if I don't await, it moves too fast and gets a null error.
Future<Stream<AttendeeData>> returnTeacherRosterListStream(String thisFacility, String thisID) async {
return facilityList.doc(thisFacility).collection('attendance').doc(thisID).snapshots().map(_teacherRosterListFromSnapshot);
}
}
Example of how I'm mapping in _teacherRosterListFromSnapshot (not having any problem here):
AttendeeData _teacherRosterListFromSnapshot(DocumentSnapshot doc) {
// return snapshot.docs.map((doc) {
return AttendeeData(
id: doc.data()['id'] ?? '',
authorCreatedUID: doc.data()['authorCreatedUID'] ?? '',
);
}
My StreamProvider Logic and the error:
return MultiProvider(
providers: [
StreamProvider<List<AttendeeData>>.value(
value: DatabaseService(
teacherRosterList: programList,
facility: user.claimsFacility,
).getStreams()),
]
Error: The argument type 'Future<Stream<List>>' can't be assigned to the parameter type 'Stream<List>'.
AttendeeData is my Map Class name.
So, the summary of questions:
Can I even do this? I'm basically Streaming a List of Streams of Maps....is this a thing?
If I can, how do I do it?
a. I can't get this into the StreamProvider because getStreams is a Future...how can I overcome this?
I can get the data in using another method from StreamProvider, but it's not behaving like a Stream and the state isn't updating. i'm hoping to just get this into Provider, as I'm comfortable there, and I can manage state very easily that way. However, beggars can't be choosers.
Solved this myself, and since there is a dearth of good start to finish answers, I submit my example for the poor souls who come after me trying to learn these things on their own. I'm a beginner, so this was a slog:
Objective:
You have any number of docs in a collection and you want to submit a list of any number of docs by their doc number and return a single stream of a list of those mapped documents. You want more than 10 (firestore limit on .where query), less than all the docs...so somewhere between a QuerySnapshot and a DocumentSnapshot.
Solution: We're going to get a list of QuerySnapshots, we're going to combine them and map them and spit them out as a single stream. So we're getting 10each in chunks (the max) and then some odd number left over. I plug mine into a Provider so I can get it whenever and wherever I want.
So from my provider I call this as the Stream value:
Stream<List<AttendeeData>> filteredRosterList() {
var chunks = [];
for (var i = 0; i < teacherRosterList.length; i += 10) {
chunks.add(teacherRosterList.sublist(i, i + 10 > teacherRosterList.length ? teacherRosterList.length : i + 10));
} //break a list of whatever size into chunks of 10.
List<Stream<QuerySnapshot>> combineList = [];
for (var i = 0; i < chunks.length; i++) {
combineList.add(*[point to your collection]*.where('id', whereIn: chunks[i]).snapshots());
} //get a list of the streams, which will have 10 each.
CombineLatestStream<QuerySnapshot, List<QuerySnapshot>> mergedQuerySnapshot = CombineLatestStream.list(combineList);
//now we combine all the streams....but it'll be a list of QuerySnapshots.
//and you'll want to look closely at the map, as it iterates, consolidates and returns as a single stream of List<AttendeeData>
return mergedQuerySnapshot.map(rosterListFromTeacherListDocumentSnapshot);
}
Here's a look at how I mapped it for your reference (took out all the fields for brevity):
List<AttendeeData> rosterListFromTeacherListDocumentSnapshot(List<QuerySnapshot> snapshot) {
List<AttendeeData> listToReturn = [];
snapshot.forEach((element) {
listToReturn.addAll(element.docs.map((doc) {
return AttendeeData(
id: doc.data()['id'] ?? '',
authorCreatedUID: doc.data()['authorCreatedUID'] ?? '',
);
}).toList());
});
return listToReturn;
}

What's the best way to paginate and filters large set of data in Firebase?

I have a large Firestore collection with 10,000 documents.
I want to show these documents in a table by paging and filtering the results at 25 at a time.
My idea, to limit the "reads" (and therefore the costs), was to request only 25 documents at a time (using the 'limit' method), and to load the next 25 documents at the page change.
But there's a problem. In order to show the number of pages I have to know the total number of documents and I would be forced to query all the documents to find that number.
I could opt for an infinite scroll, but even in this case I would never know the total number of results that my filter has found.
Another option would be to request all documents at the beginning and then paging and filtering using the client.
so, what is the best way to show data in this type of situation by optimizing performance and costs?
Thanks!
You will find in the Firestore documentation a page dedicated to Paginating data with query cursors.
I paste here the example which "combines query cursors with the limit() method".
var first = db.collection("cities")
.orderBy("population")
.limit(25);
return first.get().then(function (documentSnapshots) {
// Get the last visible document
var lastVisible = documentSnapshots.docs[documentSnapshots.docs.length-1];
console.log("last", lastVisible);
// Construct a new query starting at this document,
// get the next 25 cities.
var next = db.collection("cities")
.orderBy("population")
.startAfter(lastVisible)
.limit(25);
});
If you opt for an infinite scroll, you can easily know if you have reached the end of the collection by looking at the value of documentSnapshots.size. If it is under 25 (the value used in the example), you know that you have reached the end of the collection.
If you want to show the total number of documents in the collection, the best is to use a distributed counter which holds the number of documents, as explained in this answer: https://stackoverflow.com/a/61250956/3371862
Firestore does not provide a way to know how many results would be returned by a query without actually executing the query and reading each document. If you need a total count, you will have to somehow track that yourself in another document. There are plenty of suggestions on Stack Overflow about counting documents in collections.
Cloud Firestore collection count
How to get a count of number of documents in a collection with Cloud Firestore
However, the paging API itself will not help you. You need to track it on your own, which is just not very easy, especially for flexible queries that could have any number of filters.
My guess is you would be using Mat-Paginator and the next button is disabled because you cannot specify the exact length? In that case or not, a simple workaround for this is to get (pageSize +1) documents each time from the Firestore sorted by a field (such as createdAt), so that after a new page is loaded, you will always have one document in the next page which will enable the "next" button on the paginator.
What worked best for me:
Create a simple query
Create a simple pagination query
Combine both (after validating each one works separately)
Simple Pagination Query
const queryHandler = query(
db.collection('YOUR-COLLECTION-NAME'),
orderBy('ORDER-FIELD-GOES-HERE'),
startAt(0),
limit(50)
)
const result = await getDocs(queryHandler)
which will return the first 50 results (ordered by your criteria)
Simple Query
const queryHandler = query(
db.collection('YOUR-COLLECTION-NAME'),
where('FIELD-NAME', 'OPERATOR', 'VALUE')
)
const result = await getDocs(queryHandler)
Note that the result object has both query field (with relevant query) and docs field (to populate actual data)
So... combining both will result with:
const queryHandler = query(
db.collection('YOUR-COLLECTION-NAME'),
where('FIELD-NAME', 'OPERATOR', 'VALUE'),
orderBy('FIELD-NAME'),
startAt(0),
limit(50)
)
const result = await getDocs(queryHandler)
Please note that the field in the where clause and in orderBy must be the same one! Also, it is worth mentioning that you may be required to create an index (for some use cases) or that this operation will fail while using equality operators and so on.
My tip: inspect the error itself where you will find a detailed description describing why the operation failed and what should be done in order to fix it (see an example output using js client in image below)
Firebase V9 functional approach. Don't forget to enable persistence so you won't get huge bills. Don't forget to use where() function if some documents have restrictions in rules. Firestore will throw error if even one document is not allowed to read by user. In case bellow documents has to have isPublic = true.
firebase.ts
function paginatedCollection(collectionPath: string, initDocumentsLimit: number, initQueryConstraint: QueryConstraint[]) {
const data = vueRef<any[]>([]) // Vue 3 Ref<T> object You can change it to even simple array.
let snapshot: QuerySnapshot<DocumentData>
let firstDoc: QueryDocumentSnapshot<DocumentData>
let unSubSnap: Unsubscribe
let docsLimit: number = initDocumentsLimit
let queryConst: QueryConstraint[] = initQueryConstraint
const onPagination = (option?: "endBefore" | "startAfter" | "startAt") => {
if (option && !snapshot) throw new Error("Your first onPagination invoked function has to have no arguments.")
let que = queryConst
option === "endBefore" ? que = [...que, limitToLast(docsLimit), endBefore(snapshot.docs[0])] : que = [...que, limit(docsLimit)]
if (option === "startAfter") que = [...que, startAfter(snapshot.docs[snapshot.docs.length - 1])]
if (option === "startAt") que = [...que, startAt(snapshot.docs[0])]
const q = query(collection(db, collectionPath), ...que)
const unSubscribtion = onSnapshot(q, snap => {
if (!snap.empty && !option) { firstDoc = snap.docs[0] }
if (option === "endBefore") {
const firstDocInSnap = JSON.stringify(snap.docs[0])
const firstSaved = JSON.stringify(firstDoc)
if (firstDocInSnap === firstSaved || snap.empty || snap.docs.length < docsLimit) {
return onPagination()
}
}
if (option === "startAfter" && snap.empty) {
onPagination("startAt")
}
if (!snap.empty) {
snapshot = snap
data.value = []
snap.forEach(docSnap => {
const doc = docSnap.data()
doc.id = docSnap.id
data.value = [...data.value, doc]
})
}
})
if (unSubSnap) unSubSnap()
unSubSnap = unSubscribtion
}
function setLimit(documentsLimit: number) {
docsLimit = documentsLimit
}
function setQueryConstraint(queryConstraint: QueryConstraint[]) {
queryConst = queryConstraint
}
function unSub() {
if (unSubSnap) unSubSnap()
}
return { data, onPagination, unSub, setLimit, setQueryConstraint }
}
export { paginatedCollection }
How to use example in Vue 3 in TypeScript
const { data, onPagination, unSub } = paginatedCollection("posts", 8, [where("isPublic", "==", true), where("category", "==", "Blog"), orderBy("createdAt", "desc")])
onMounted(() => onPagination()) // Lifecycle function
onUnmounted(() => unSub()) // Lifecycle function
function next() {
onPagination('startAfter')
window.scrollTo({ top: 0, behavior: 'smooth' })
}
function prev() {
onPagination('endBefore')
window.scrollTo({ top: 0, behavior: 'smooth' })
}
You might have problem with knowing which document is last one for example to disable button.

DynamoDB Mapper Query Doesn't Respect QueryExpression Limit

Imagine the following function which is querying a GlobalSecondaryIndex and associated Range Key in order to find a limited number of results:
#Override
public List<Statement> getAllStatementsOlderThan(String userId, String startingDate, int limit) {
if(StringUtils.isNullOrEmpty(startingDate)) {
startingDate = UTC.now().toString();
}
LOG.info("Attempting to find all Statements older than ({})", startingDate);
Map<String, AttributeValue> eav = Maps.newHashMap();
eav.put(":userId", new AttributeValue().withS(userId));
eav.put(":receivedDate", new AttributeValue().withS(startingDate));
DynamoDBQueryExpression<Statement> queryExpression = new DynamoDBQueryExpression<Statement>()
.withKeyConditionExpression("userId = :userId and receivedDate < :receivedDate").withExpressionAttributeValues(eav)
.withIndexName("userId-index")
.withConsistentRead(false);
if(limit > 0) {
queryExpression.setLimit(limit);
}
List<Statement> statementResults = mapper.query(Statement.class, queryExpression);
LOG.info("Successfully retrieved ({}) values", statementResults.size());
return statementResults;
}
List<Statement> results = statementRepository.getAllStatementsOlderThan(userId, UTC.now().toString(), 5);
assertThat(results.size()).isEqualTo(5); // NEVER passes
The limit isn't respected whenever I query against the database. I always get back all results that match my search criteria so if I set the startingDate to now then I get every item in the database since they're all older than now.
You should use queryPage function instead of query.
From DynamoDBQueryExpression.setLimit documentation:
Sets the maximum number of items to retrieve in each service request
to DynamoDB.
Note that when calling DynamoDBMapper.query, multiple
requests are made to DynamoDB if needed to retrieve the entire result
set. Setting this will limit the number of items retrieved by each
request, NOT the total number of results that will be retrieved. Use
DynamoDBMapper.queryPage to retrieve a single page of items from
DynamoDB.
As they've rightly answered the setLimit or withLimit functions limit the number of records fetched only in each particular request and internally multiple requests take place to fetch the results.
If you want to limit the number of records fetched in all the requests then you might want to use "Scan".
Example for the same can be found here

Use vogels js to implement pagination

I am implementing a website with a dynamodb + nodejs backend. I use Vogels.js in server side to query dynamodb and show results on a webpage. Because my query returns a lot of results, I would like to return only N (such as 5) results back to a user initially, and return the next N results when the user asks for more.
Is there a way I can run two vogels queries with the second query starts from the place where the first query left off ? Thanks.
Yes, vogels fully supports pagination on both query and scan operations.
For example:
var Tweet = vogels.define('tweet', {
hashKey : 'UserId',
rangeKey : 'PublishedDateTime',
schema : {
UserId : Joi.string(),
PublishedDateTime : Joi.date().default(Date.now),
content : Joi.string()
}
});
// Fetch the 5 most recent tweets from user with id 555:
Tweet.query(555).limit(5).descending().exec(function (err, data) {
var paginationKey = data.LastEvaluatedKey;
// Fetch the next page of 5 tweets
Tweet.query(555).limit(5).descending().startKey(paginationKey).exec()
});
Yes it is possible, DynamoDB has some thing called "LastEvaluatedKey" which will server your purpose.
Step 1) Query your table with option "Limit" = number of records
refer: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
Step 2) If your query has more records than the "Limit value", DynamoDB will return a "LastEvaluatedKey" which you can pass in your next query as "ExclusiveStartKey" to get next set of records until there are no records left
refer: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html#QueryAndScan.Query
Note: Be aware that to get previous set of records you might have to store all the "LastEvaluatedKeys" and implement this at application level

Resources