Going over read-quota in Firebase Firestore

I'm trying to figure out if there's a reasonable way of doing this:
My problem:
I'm exceeding my daily quota for reads in Firestore pretty fast.
My database and what I do:
My database looks like this (simplified):
sessions: { // collection
  sessionId: { // document
    users: { // collection
      userId: { // document
        id: string
        items: { // collection
          itemId: trackObject
        }
      }
    }
  }
}
Now I want to retrieve, from one session, all users and their items. Most sessions have 2-3 users, but some users have around 3000 items. I basically want to retrieve an array like this:
[
  {
    userId,
    items: [
      ...items
    ],
  },
  ...users
]
How I go about it currently:
So I get all users:
const usersRef = db.collection(`sessions/${sessionId}/users`);
const userSnapshots = await usersRef.get();
const userDocs = userSnapshots.docs;
Then for each user I retrieve their items:
(I use a for-loop, which can be discussed, but anyhow:)
const user = userDocs[i].data();
const itemsRef = usersRef.doc(user.id).collection('items');
const itemSnapshots = await itemsRef.get();
const items = itemSnapshots.docs;
Finally I retrieve the actual items through a map:
user.items = items.map(doc => doc.data());
return user;
My theory:
So it looks like if I do this on a session where a user has 3000 items, the code will perform over 3000 read operations on Firestore. After just 17 runs I've eaten up my 50,000 operations a day.
This reasoning is somewhat based on this answer.
My question:
Is there any other way of doing this? Like getting all the tracks in one read-call? Should I see if I can fit all the items into an array field on the user document instead of storing them as a collection? Is the free version of Firestore simply not designed for this many documents being retrieved in one go?

If you're trying to reduce the number of document reads, you'll need to reduce the number of documents that you need to read to implement your use-case.
For example, it is fairly unlikely that a user of your app will want to read the details of all 3000 items. So you might want to limit how many items you initially read, and load the additional items only on demand.
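For instance, here's a minimal sketch of that pattern against the v8-style API used in the question (the addedAt ordering field and PAGE_SIZE are assumptions for illustration):

// Read only the first page of a user's items instead of all ~3000.
const PAGE_SIZE = 25; // hypothetical page size
const itemsRef = db.collection(`sessions/${sessionId}/users/${userId}/items`);

// First page: costs at most PAGE_SIZE document reads.
const firstPage = await itemsRef.orderBy('addedAt').limit(PAGE_SIZE).get();
let items = firstPage.docs.map(doc => doc.data());

// Only when the user asks for more, continue after the last visible doc.
const lastDoc = firstPage.docs[firstPage.docs.length - 1];
const nextPage = await itemsRef.orderBy('addedAt').startAfter(lastDoc).limit(PAGE_SIZE).get();
items = items.concat(nextPage.docs.map(doc => doc.data()));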
Also consider if each item needs to be its own document, or whether you could combine all items of a user into a single document. For example, if you never query the individual items, there is no need to store them as separate documents.
Another thing to consider is whether you can combine common items into a single document. An example of this: even if you keep the items in a separate subcollection, keep the names and IDs of the most recent 30 items for a user in the user's document. This allows you to easily show a user and their 30 most recent items. Doing this you're essentially pre-rendering those 30 items for each user, significantly reducing the number of documents you need to read.
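A sketch of that denormalization, assuming an item write path like the question's and a recentItems array field on the user document (both names are illustrative):

// When writing a new item, mirror a compact copy of it into the
// user document, trimmed to the 30 most recent entries.
const userRef = db.doc(`sessions/${sessionId}/users/${userId}`);
const itemRef = userRef.collection('items').doc(itemId);

await db.runTransaction(async tx => {
  const userSnap = await tx.get(userRef);
  const recent = userSnap.get('recentItems') || [];
  recent.unshift({ id: itemId, name: item.name }); // keep entries small
  tx.set(itemRef, item);
  tx.update(userRef, { recentItems: recent.slice(0, 30) });
});

Rendering a session's users with their latest items then costs one read per user instead of one read per item.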
To learn more about data modeling considerations, see:
Cloud Firestore Payments
Going over read-quota in firebase firestore
the video series "Getting to know Cloud Firestore", specifically "What is a NoSQL Database? How is Cloud Firestore structured?" and "How to Structure Your Data"
this article on NoSQL data modeling

Related

Getting the number of documents in a collection in Firebase in Dart/Flutter

I'm trying to retrieve the number of courses created; the created courses show up in the "Courses" collection.
This is the code I used:
String course = '';
FirebaseFirestore.instance
    .collection("Courses")
    .get()
    .then((QuerySnapshot querySnapshot) {
  course = querySnapshot.docs.length.toString();
});
Here's a screenshot of my Firebase structure.
Collections don't know how many documents they contain, so to get the count you have to retrieve all of the documents and then count them. That can be time-consuming as well as a lot of reading (i.e. cost).
The simple solution is to keep the count in another collection, or at a known document location within the collection.
Generically speaking, suppose we have three collections, Courses, Users and Locations and each one could have 0 to thousands of documents.
Users
   user_0
      ...some fields
   user_1
      ...some fields
Courses
   course_0
      ...some fields
   course_1
      ...some fields
Locations
   location_0
      ...some fields
   location_1
      ...some fields
As previously mentioned, if there are a limited number of documents simply reading Users (for example) and getting the count of the documents from the snapshot works and is simple. However, as Users grows so does the document count and cost.
The better and scalable solution is to keep another collection of counts:
Document_Counts
   users_collection
      count: 2
   courses_collection
      count: 2
   locations_collection
      count: 2
Then when you want the number of users, simply read the Document_Counts/users_collection document and get the count from the count field.
Firestore has a lightning-fast and simple increment and decrement function, so as a document is added to Users, for example, increment that same count field.
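For example (JavaScript SDK shown here; the Flutter plugin has the same FieldValue.increment), keeping the counter in sync looks roughly like this:

const countRef = db.collection('Document_Counts').doc('courses_collection');

// When a course is created, bump the counter atomically.
await db.collection('Courses').add({ name: 'Algebra I' }); // example data
await countRef.update({ count: firebase.firestore.FieldValue.increment(1) });

// Getting the number of courses is then a single document read.
const snap = await countRef.get();
console.log('Courses:', snap.get('count'));

In production you would typically do the add and the increment in a batched write, or increment from a Cloud Function trigger, so the two writes can't drift apart.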
With Cloud Firestore's newer aggregation queries, there is a new way to count documents in a collection. According to the reference notes, the count is not billed as a read per document but as a metadata request:
"[AggregateQuery] represents the data at a particular location for retrieving metadata without retrieving the actual documents."
Example:
final CollectionReference<Map<String, dynamic>> courseList =
    FirebaseFirestore.instance.collection('Courses');

Future<int> countCourses() async {
  AggregateQuerySnapshot query = await courseList.count().get();
  debugPrint('The number of courses: ${query.count}');
  return query.count;
}
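For reference, the JavaScript (web v9) SDK exposes the same aggregation as getCountFromServer; a minimal sketch:

import { getFirestore, collection, getCountFromServer } from 'firebase/firestore';

const db = getFirestore();

// One aggregation request instead of one read per document.
const snapshot = await getCountFromServer(collection(db, 'Courses'));
console.log('The number of courses:', snapshot.data().count);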

How to shard data in Realtime Database for a chat app?

I am building a chat app and want to use Realtime Database.
I expect my database to reach the quota of 200k simultaneous connections.
So I have read the documentation about scaling and sharding the data.
However, I don't understand how to handle this for a chat app.
Let's say I have a groups reference that contains the IDs of the users inside each group, and the messages for each group.
If I want to scale, I need to create a new DB instance and start writing groups there too, as the first DB may have more than 200k simultaneous connections.
That means users may belong to groups in multiple databases, which already seems weird and not such a good idea.
So I would like to know:
How can I shard the groups reference?
How can I (or even should I) make users connect to multiple DBs according to the groups they belong to?
It seems to be a very complicated way to do things... Am I not understanding this correctly?
I'm sure there are plenty of ways to shard a database but here's how I've done it. This involves selecting a shard while creating a new chat. For this answer, let's assume there are 4 users: U1, U2, U3 and U4, and 2 shards (excluding the default): shard1 and shard2.
Whenever a user creates a new chat, select a shard and create a new node for that chat. You should store the list of a user's chats somewhere else along with the shard ID; the default database instance seems great for that, but Firestore works too. So an object containing the information of a chat will look something like:
{
  chatID: "c40f15af19a94b6f84117747337b9f7a",
  createdBy: "U1",
  users: ["U1", "U2", "U3"],
  shardId: "shard2"
}
Now you have a list of chatIDs along with their shards, so just connect your listeners. Again, it depends on what the expected behavior is. In my case I only had to listen to the data selected by the user (i.e. the active chat).
Try to divide chats evenly across all shards. Pick the shard with the least number of active chats (you will have to store the number of chats created per shard somewhere else, such as the default shard), or something like round-robin may be useful. At the same time, take the user creating the chat into account.
Incrementing the count of chats present in a shard when a new chat is created may be a good way to track this.
In the end I think it's just about how you divide your chats across shards, and there are many algorithms you can use. Having a list of the user's chats containing the shard name, as above, seems an easy way to do it. I personally prefer Firestore for storing the list of chats so it's easier to query based on the creator of a chat, the chats where user U2 is a participant, and so on.
Creating new chats using a Cloud Function (or your servers) is preferred so no one can just spam a single database shard by reverse engineering the app.
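A minimal sketch of such a function, assuming shard counters live in a single Firestore document meta/shardCounts (all of these names are illustrative, not from the answer above):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.createChat = functions.https.onCall(async (data, context) => {
  const db = admin.firestore();
  const countsRef = db.doc('meta/shardCounts'); // e.g. { shard1: 12, shard2: 9 }

  return db.runTransaction(async (tx) => {
    const counts = (await tx.get(countsRef)).data() || {};
    // Pick the shard with the fewest chats; round-robin would also work.
    const shardId =
      Object.keys(counts).sort((a, b) => counts[a] - counts[b])[0] || 'shard1';

    const chatRef = db.collection('chats').doc();
    tx.set(chatRef, {
      createdBy: context.auth.uid,
      users: data.users,
      shardId,
    });
    tx.set(countsRef, { [shardId]: admin.firestore.FieldValue.increment(1) }, { merge: true });
    return { chatId: chatRef.id, shardId };
  });
});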
This way all your messages will be stored in the Realtime Database, but the basic information of the chats is in Firestore (not necessary, but it makes chats easier to query). When a user opens the chat app, load the chats they are part of. Here's a sample:
const db = firebase.firestore()

// Loading the user's chats
const chatsSnapshot = await db.collection("chats").where("members", "array-contains", "myUID").get()
const chatsInfo = chatsSnapshot.docs.map((c) => ({ ...c.data(), id: c.id }))

// Realtime DB shards
const shards = {
  shard1: firebase.database(app1),
  shard2: firebase.database(app2),
  shard3: firebase.database(app3)
}

// Run a loop over chatsInfo and render the chats in your app
for (const chat of chatsInfo) {
  // Limit to the first N messages if necessary
  const chatRef = shards[chat.shardId].ref(chat.id);
  chatRef.on('value', (snapshot) => {
    const data = snapshot.val();
    // Render messages
  });
}
You don't need to load all the chats as I've shown above. Load messages only for the chat that is active.

Is it possible to fetch all documents whose sub-collection contains a specific document ID?

I am trying to fetch all documents whose sub-collection contain a specific document ID. Is there any way to do this?
For example, if the boxed document under the 'enquiries' sub-collection exists, then I need the boxed document ID from the 'books' collection. I couldn't figure out how to go backwards to get the parent document ID.
I make the assumption that all the sub-collections have the same name, i.e. enquiries. Then, you could do as follows:
Add a field docId in your enquiries document that contains the document ID.
Execute a Collection Group query in order to get all the documents with the desired docId value (Firestore.instance.collectionGroup("enquiries").where("docId", isEqualTo: "ykXB...").getDocuments()).
Then, you loop over the results of the query, and for each document you call the parent() method twice (the first call gives you the CollectionReference, the second gives you the DocumentReference of the parent document).
You just have to use the id property and you are done.
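The same idea with the web v9 JavaScript SDK (shown here instead of the Flutter API above) would look roughly like:

import { getFirestore, collectionGroup, query, where, getDocs } from 'firebase/firestore';

const db = getFirestore();

// Find every 'enquiries' document whose docId field matches...
const q = query(collectionGroup(db, 'enquiries'), where('docId', '==', 'ykXB...'));
const snapshot = await getDocs(q);

snapshot.forEach((enquiryDoc) => {
  // ...then walk up: enquiries collection -> parent book document.
  const bookRef = enquiryDoc.ref.parent.parent;
  console.log('Parent book ID:', bookRef.id);
});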
Try the following:
Firestore.instance.collection("books").where("author", isEqualTo: "Arumugam").getDocuments().then((value) {
  value.documents.forEach((result) {
    var id = result.documentID;
    Firestore.instance.collection("books").document(id).collection("enquiries").getDocuments().then((querySnapshot) {
      querySnapshot.documents.forEach((result) {
        print(result.data);
      });
    });
  });
});
First you need to retrieve the ID under the books collection; to be able to do that you have to do a query, for example where("author", isEqualTo: "Arumugam"). After retrieving the ID you can then do a query to retrieve the documents inside its enquiries collection.
For example, if the boxed document under 'enquiries' sub-collection exists, then I need the boxed document ID from 'books' collection.
There is no way you can do that in a single go.
I couldn't figure out how to go backwards to get the parent document ID.
There is no going back in Firestore as you probably were thinking. In Firebase Realtime Database we have a method named getParent(), which does exactly what you want but in Firestore we don't.
Queries in Firestore are shallow, meaning they only get items from the collection that the query is run against. Firestore doesn't support queries across different collections in one go. A single query may only use the properties of documents in a single collection. So the solution to your problem is to perform two get() calls. The first one would be to check that the document exists in the enquiries subcollection, and if it does, simply make another get() call to get the document from the books collection.
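A sketch of those two get() calls with the web v9 SDK, assuming you already hold bookId and enquiryId:

import { getFirestore, doc, getDoc } from 'firebase/firestore';

const db = getFirestore();

// 1) Check that the enquiry exists under the book...
const enquirySnap = await getDoc(doc(db, 'books', bookId, 'enquiries', enquiryId));
if (enquirySnap.exists()) {
  // 2) ...and only then fetch the parent book document.
  const bookSnap = await getDoc(doc(db, 'books', bookId));
  console.log(bookSnap.id, bookSnap.data());
}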
Renaud Tarnec's answer is great for fetching the IDs of the relevant books.
If you need to fetch more than the ID, there is a trick you could use in some scenarios. I imagine your goal is to show some sort of an index of all books associated with a particular enquiry ID. If the data you'd like to show in that index is not too long (can be serialized in less than 1500 bytes) and if it is not changing frequently, you could try to use the document ID as the placeholder for that data.
For example, let's say you wanted to display a list of book titles and authors corresponding to some enquiryId. You could create the book ID in the collection with something like so:
// Assuming the admin SDK, with booksColRef = db.collection('books')
const bookId = nanoid();
const author = 'Brandon Sanderson';
const title = 'Mistborn: The Final Empire';
// If title + author are not unique, you could add the bookId to the array
const uniquePayloadKey = Buffer.from(JSON.stringify([author, title])).toString('base64url');
await booksColRef.doc(uniquePayloadKey).set({ bookId });
await booksColRef.doc(uniquePayloadKey).collection('enquiries').doc(enquiryId).set({ enquiryId });
Then, after running the collection group query per Renaud Tarnec's answer, you could extract that serialized information with a regexp on the path, and deserialize. E.g.:
// Assuming the Web v9 SDK
const books = query(collectionGroup(db, 'enquiries'), where('enquiryId', '==', enquiryId));
return getDocs(books).then(snapshot => {
  const data = [];
  snapshot.forEach(doc => {
    // Pull the serialized [author, title] payload out of the document path
    const payload = doc.ref.path.match(/books\/(.*)\/enquiries/)[1];
    // Convert base64url back to base64 before decoding
    const [author, title] = JSON.parse(atob(payload.replace(/-/g, '+').replace(/_/g, '/')));
    data.push({ author, title });
  });
  return data;
});
The "store payload in ID" trick can be used only to present some basic information for your child-driven search results. If your book document has a lot of information you'd like to display once the user clicks on one of the books returned by the enquiry, you may want to store this in separate documents whose IDs are the real bookIds. The bookId field added under the unique payload key allows such lookups when necessary.
You can reuse the same data structure for returning book results from different starting points, not just enquiries, without duplicating this structure. If you stored many authors per book, for example, you could add an authors sub-collection to search by. As long as the information you want to display in the resulting index page is the same and can be serialized within the 1500-byte limit, you should be good.
The (quite substantial) downside of this approach is that it is not possible to rename document IDs in Firestore. If some of the details in the payload change (e.g. an admin fixes a book title), you will need to re-create the document and all the sub-collections under it, and delete the old data. This can be quite costly - at least 1 read, 1 write, and 1 delete for every document in every sub-collection. So keep in mind it may not be pragmatic for fast-changing data.
The 1500-byte limit for key names is documented in Usage and Limits.
If you are concerned about potential hotspots this can generate per Best Practices for Cloud Firestore, I imagine that adding the bookId as a prefix to the uniquePayloadKey (with a delimiter that allows you to throw it away) would do the trick - but I am not certain.

Which is a more optimal Firestore schema for getting a Social Media feed?

I'm toying with several ideas for using Firestore for a social media feed. So far, the ideas I've had haven't panned out, so for this one I'm hoping to get the community's feedback.
The idea is to allow users to post information, or to record their activity, and to any user following/subscribed to that information, display it. The posts information would be in a root collection called posts.
The approaches, as far as I can tell, require roughly the same number of reads and writes.
One idea is to have within the users/{userId} have a field called posts which is an array of documentIds that I'm interested in pulling for the user. This would allow me to pull directly from posts and get the most up-to-date version of the data.
Another approach seems more Firebasey, which is to store documents within users/{userId}/feeds that are copies of the posts themselves. I can use the same postID as the data in posts. Presumably, if I need to update the data for any revision, I can use a collection group query to get all collections called feeds where the docID is equal (or just create a field to do a proper "where", "==", docId).
Third approach is all about updating the list of people who should view the posts. This seems better as long as the list of posts is shorter than the lists of followers. Instead of maintaining all posts on every follower, you're maintaining all followers on each post. For every new follower, you need to update all posts.
This list would not be a user's own posts. Instead it would be a list of all the posts to show that user.
Three challengers:
users/{userId} with field called feed - an array of doc Ids that point to the global posts. Get that feed, get all docs by ID. Every array would need to be updated for every single follower each time a user has activity.
users (coll)
-> uid (doc)
-> uid.feed: [postId1, postId2, postId3, ...] (field)
posts (coll)
-> postId (doc)
Query (pseudo):
doc(users/{uid}).get(doc)
feed = doc.feed
for postId in feed:
doc(posts/{postId}).get(doc)
users/{userId}/feed which has a copy of all posts that you would want this user to see. Every activity/post would need to be added to every relevant feed list.
users (coll)
-> uid (doc)
-> feed: (coll)
-> postId1 (doc)
-> postId2
-> postId3
posts (coll)
-> postId (doc)
Query (pseudo):
collection(users/{uid}/feed).get(docs)
for post in docs:
doc(posts/{post}).get(doc)
posts/{postId} with a followers array field listing every user who should see the post. Every new follower would need to be added to all of the author's posts.
users (coll)
-> uid (doc)
posts (coll)
-> postId (doc)
-> postId.followers: [followerId1, followerId2, ...] (field)
Query (pseudo):
collection(posts).where(followers, 'array_contains', uid).get(docs)
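For concreteness, here's approach 3's pseudo-query as real (v8-style) JavaScript; createdAt is an assumed ordering field:

// One query returns the feed directly; no per-post fan-out reads.
const feedSnap = await db
  .collection('posts')
  .where('followers', 'array-contains', uid)
  .orderBy('createdAt', 'desc')
  .limit(20)
  .get();
const posts = feedSnap.docs.map((doc) => ({ id: doc.id, ...doc.data() }));

Note that combining array-contains with orderBy requires a composite index.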
Reads/Writes
1. Updating the Data
For the author user of every activity, find all users following that user. Currently, the users are stored as documents in a collection, so this is followerNumber document reads. For each of the users, update their array by prepending the postId; this would be followerNumber document writes.
1. Displaying the Data/Feed
For each fetch of the feed: get array from user document (1 doc read). For each postId, call, posts/{postId}
This would be numberOfPostsCalled document reads.
2. Updating the Data
For the author user of every activity, find all users following that user. Currently, the users are stored as documents in a collection, so this is followerNumber document reads. For each of the users, add a new document with ID postId to users/{userId}/feed; this would be followerNumber document writes.
2. Displaying the Data/Feed
For each fetch of the feed: get a certain number of posts from users/{userId}/feed
This would be numberOfPostsCalled document reads.
This second approach requires me to keep all of the documents up to date in the event of an edit. So despite this approach seeming more firebase-esque, the approach of holding a postId and fetching that directly seems slightly more logical.
3. Updating the Data
For every new follower, each post authored by the person being followed needs to be updated. The new follower is appended to an array called followers.
3. Displaying the Data
For each fetch of the feed: get a certain number of posts from posts where uid == viewerUid
Nice! When we talk about what is more optimal, we really need a point of comparison or a quality attribute; I'll assume you care about speed (not necessarily performance) and costs.
This is how I would solve the problem, it involves several collections but my goal is 1 query only.
user (col)
{
  "abc": {},
  "qwe": {}
}
posts (col)
{
  "123": {},
  "456": {}
}
users_posts (col)
{
  "abc": {
    "posts_ids": ["123"]
  }
}
So far so good. The problem is, I need to do several queries to get all the posts' information... This is where Cloud Functions come into the game. You can create a 4th collection where you pre-calculate your feed:
users_dashboard (col)
{
  "abc": {
    posts: [
      { id: "123", /* ... */ },
      { id: "456", /* ... */ }
    ]
  }
}
The cloud function would look like this:
/* on your front end you can manage the add or delete ids from user posts */
export const calculateDashboard = functions.firestore.document(`users_posts/{doc}).onWrite(async(change, _context) {
const firestore = admin.firestore()
const dashboardRef = firestore.collection(`users_dashboard`)
const postRef = firestore.collection(`posts`)
const user = change.after.data()
const payload = []
for (const postId of user.posts_ids) {
const data = await postRef.doc(postId).get().then((doc) => doc.exists ? doc.data() : null)
payload.push(data)
}
// Maybe you want to exponse only certain props... you can do that here
return dashboardRef.doc(user.id).set(payload)
})
The max document size is 1 MiB (1,048,576 bytes), which is plenty of data to store, so you can fit a lot of posts here. Let's talk about costs: I used to think Firestore favored several small docs, but I've found in practice that it works equally well with large documents across a large number of docs.
Now on your dashboard you only need one query:
const dashboard = firestore.collection(`users_dashboard`).doc(userID).get()
This is a very opinionated way to solve this problem. You could avoid using users_posts, but maybe you don't want to trigger this process for changes that aren't post-related.
It looks like your second approach is best in this situation. I don't really understand what #andresmijares was trying to do; he mentioned something like storing posts in a single document, which is not a good approach: imagine you have more than 20K posts (which is what I think a document can hold), then the document won't be able to store any more data. A better approach is to store each post as a document inside a collection (just like in your 2nd option). So let's recall the best approach here:
1) You share a post into the posts collection and into the Feed collections of the users who follow you. Maybe this can be done with a Cloud Function (see the sketch below), and let's not forget to aggregate (also with Cloud Functions) the number of posts that needs to appear in the user's profile.
2) You follow a user and copy all of their posts from the posts collection into your Feed collection; this way you get to see all of their posts in your feed.
With this approach there will be a lot of writes at once, but reads will be fast. If your app is about reading more and writing less, then there's nothing to worry about, unless I'm wrong.
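A rough sketch of the fan-out for step 1 as a Cloud Function trigger (the followers subcollection and the authorId field are assumptions for illustration):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.fanOutPost = functions.firestore
  .document('posts/{postId}')
  .onCreate(async (snapshot, context) => {
    const db = admin.firestore();
    const post = snapshot.data();

    // Everyone following the author gets a copy in their Feed collection.
    const followers = await db.collection(`users/${post.authorId}/followers`).get();

    const batch = db.batch();
    followers.forEach((follower) => {
      const feedRef = db.doc(`users/${follower.id}/Feed/${context.params.postId}`);
      batch.set(feedRef, post);
    });
    return batch.commit(); // note: a single batch holds at most 500 writes
  });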

Google Cloud Firestore documents limit

I've been working with Google Cloud Firestore. I'm about to import 13000+ records from a CSV to the firestore back-end. I'll be using this collection for look up and auto-completion purposes.
I'm curious and concerned to know if this is a good idea. Also, I'm looking for some suggestions on what techniques should I be using to make retrieval of this this data as efficient as possible. I'm working with Angular 5 and using AngularFire2 to connect with Firestore.
The document itself is really small such as:
{
  address: {
    state: "NSW",
    street: "19 XYZ Road",
    suburb: "Darling Point"
  },
  user: {
    name: "ABC",
    company: "Property Management Company"
  },
  file_no: "AB996"
}
Most of the searching would be based on file_no property of the document.
Update
I just imported all 13k+ records to Firestore. It is really efficient. However, I have one issue. After importing the records, I'm getting the message on my Firestore console that my daily limit for Read Operations is reached (0.05 of 0.05 Million Ops). I just wrote data and displayed those records in a Data Table. I used the following query:
this.propertyService
  .getSnapshotChanges()
  .subscribe(properties => {
    this.properties = properties;
    this.loadingIndicator = false;
  });

getSnapshotChanges(): Observable<any> {
  return this.afs.collection(this.propertiesCollection).snapshotChanges()
    .map((actions) => {
      return actions.map((snapshot) => {
        const data = snapshot.payload.doc.data();
        data.id = snapshot.payload.doc.id;
        return data;
      });
    });
}
How does this make me exceed my read limit?
The number of documents in a collection is of no consequence when you use Cloud Firestore. That's actually one of its bigger perks: no matter how many documents are in a collection, queries will take the same amount of time.
Say you add 130 documents and (for the sake of example) it takes 1 second to get 10 documents out of the collection. That's the performance you'll get no matter how many documents are in the collection. So with 1300 documents it will also take 1 second, with 13K it will take 1 second, and with 13M it will also take 1 second.
As for the quota: loading all 13K+ records into a data table costs one document read per document, so a handful of full loads of that table will quickly consume the 50K free daily reads.
The problem most developers run into is making their use-case fit within the API of Firestore. For example: the only way to search for strings is with a so-called prefix match; there is no support for full-text search. This means that you can search for Prop* and find Property Management Company, but not for *Man* to find it.
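A sketch of that prefix match in (v8-style) JavaScript, using the '\uf8ff' high-code-point trick; the collection name is an assumption mirroring the question:

// Matches every document whose user.company starts with "Prop".
const results = await db
  .collection('properties')
  .orderBy('user.company')
  .startAt('Prop')
  .endAt('Prop\uf8ff')
  .get();
const matches = results.docs.map((doc) => doc.data());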
