Which is a more optimal Firestore schema for getting a Social Media feed? - firebase

I'm toying with several ideas for using Firestore for a social media feed. So far, the ideas I've had haven't panned out, so for this one I'm hoping to get the community's feedback.
The idea is to let users post information or record their activity, and to display it to any user following or subscribed to that information. The post data would live in a root collection called posts.
The approaches, as far as I can tell, require roughly the same number of reads and writes.
One idea is to have a field called posts within users/{userId}, holding an array of the document IDs that I'm interested in pulling for the user. This would allow me to pull directly from posts and get the most up-to-date version of the data.
Another approach seems more Firebasey: store documents within users/{userId}/feeds that are copies of the posts themselves. I can use the same postID as the data in posts. Presumably, if I need to update the data for any review, I can use a collection group query to get all collections called feeds where the docID is equal (or just create a field to do a proper "where", "==", docId).
Third approach is all about updating the list of people who should view the posts. This seems better as long as the list of posts is shorter than the lists of followers. Instead of maintaining all posts on every follower, you're maintaining all followers on each post. For every new follower, you need to update all posts.
This list would not be a user's own posts. Instead it would be a list of all the posts to show that user.
Three challengers:
users/{userId} with field called feed - an array of doc Ids that point to the global posts. Get that feed, get all docs by ID. Every array would need to be updated for every single follower each time a user has activity.
users (coll)
-> uid (doc)
-> uid.feed: [postId1, postId2, postId3, ...] (field)
posts (coll)
-> postId (doc)
Query (pseudo):
doc(users/{uid}).get(doc)
feed = doc.feed
for postId in feed:
doc(posts/{postId}).get(doc)
users/{userId}/feed which has a copy of all posts that you would want this user to see. Every activity/post would need to be added to every relevant feed list.
users (coll)
-> uid (doc)
-> feed: (coll)
-> postId1 (doc)
-> postId2
-> postId3
posts (coll)
-> postId (doc)
Query (pseudo):
collection(users/{uid}/feed).get(docs)
for post in docs:
doc(posts/{post}).get(doc)
posts/{postId} with a field called followers_array, listing every user who should see that post. Each time someone follows a user, all of that user's posts would need to be updated.
users (coll)
-> uid (doc)
posts (coll)
-> postId (doc)
-> postId.followers_array[followerId, followerId2, ...] (field)
Query (pseudo):
collection(posts).where(followers_array, 'array-contains', uid).get(docs)
Reads/Writes
1. Updating the Data
For the author of every activity, find all users following that user. Currently, the users are stored as documents in a collection, so this is followerNumber document reads. For each of those users, update their array by prepending the postId; this would be followerNumber document writes.
1. Displaying the Data/Feed
For each fetch of the feed: get array from user document (1 doc read). For each postId, call, posts/{postId}
This would be numberOfPostsCalled document reads.
2. Updating the Data
For the author of every activity, find all users following that user. Currently, the users are stored as documents in a collection, so this is followerNumber document reads. For each of those users, add a new document with ID postId to users/{userId}/feed; this would be followerNumber document writes.
2. Displaying the Data/Feed
For each fetch of the feed: get a certain number of posts from users/{userId}/feed
This would be numberOfPostsCalled document reads.
This second approach requires me to keep all of the documents up to date in the event of an edit. So despite this approach seeming more firebase-esque, the approach of holding a postId and fetching that directly seems slightly more logical.
3. Updating the Data
For every new follower, each post authored by the person being followed needs to be updated. The new follower is appended to an array called followers.
3. Displaying the Data
For each fetch of the feed: get a certain number of posts from posts where the followers array contains viewerUid
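To compare the three options side by side, the tallies above can be written down as a small cost model. This is only a sketch: the names (estimateOps, followers, postsShown, authorPosts) are made up for illustration, the write of the post document itself is left out of all three options, and any step the question does not tally is counted as zero.

```typescript
// Hypothetical cost model; names are illustrative and counts follow the question's own tallies.
interface OpCount {
  reads: number;
  writes: number;
}

interface SchemaCost {
  onNewPost: OpCount;     // an author publishes one post
  onNewFollower: OpCount; // someone starts following the author
  onFeedFetch: OpCount;   // a viewer loads their feed
}

function estimateOps(
  option: 1 | 2 | 3,
  followers: number,   // followers of the author
  postsShown: number,  // posts displayed per feed fetch
  authorPosts: number  // existing posts by the followed author
): SchemaCost {
  const none: OpCount = { reads: 0, writes: 0 }; // "not tallied in the question"
  switch (option) {
    case 1: // array of post IDs on users/{uid}
      return {
        onNewPost: { reads: followers, writes: followers },
        onNewFollower: none,
        onFeedFetch: { reads: 1 + postsShown, writes: 0 },
      };
    case 2: // copies of posts in users/{uid}/feed
      return {
        onNewPost: { reads: followers, writes: followers },
        onNewFollower: none,
        onFeedFetch: { reads: postsShown, writes: 0 },
      };
    case 3: // followers array on each post
      return {
        onNewPost: none,
        onNewFollower: { reads: 0, writes: authorPosts },
        onFeedFetch: { reads: postsShown, writes: 0 },
      };
    default:
      throw new Error("unknown option");
  }
}
```

For example, estimateOps(3, 1000, 10, 50) puts all the write cost on the follow event (50 writes), which matches the remark that the third option looks better while posts per author stay fewer than followers per author.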

Nice. When I talk about what is more optimal I really need a point or a quality attribute to compare on, so I will assume you care about speed (not necessarily performance) and costs.
This is how I would solve the problem. It involves several collections, but my goal is one query only.
user (col)
{
"abc": {},
"qwe": {}
}
posts (col)
{
"123": {},
"456": {}
}
users_posts (col)
{
"abc": {
"posts_ids": ["123"]
}
}
So far so good, the problem is, I need to do several queries to get all the posts information... This is where cloud functions get into the game. You can create a 4th collection where you can pre-calculate your feed
users_dashboard
{
"abc": {
posts: [
{
id: "123", /.../
}, {
id: "456", /.../
}
]
}
}
The cloud function would look like this:
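/* On your front end you can manage adding or deleting IDs from the user's posts */
import * as functions from 'firebase-functions'
import * as admin from 'firebase-admin'

admin.initializeApp()

export const calculateDashboard = functions.firestore.document('users_posts/{doc}').onWrite(async (change, context) => {
  const firestore = admin.firestore()
  const dashboardRef = firestore.collection('users_dashboard')
  const postRef = firestore.collection('posts')
  // change.after.data() is undefined when the users_posts document was deleted
  const user = change.after.data()
  if (!user) return null
  const payload = []
  for (const postId of user.posts_ids) {
    const doc = await postRef.doc(postId).get()
    // Skip posts that no longer exist
    if (doc.exists) payload.push({ id: doc.id, ...doc.data() })
  }
  // Maybe you want to expose only certain props... you can do that here
  // set() needs an object, and the user ID is the document ID matched by {doc}
  return dashboardRef.doc(context.params.doc).set({ posts: payload })
})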
The max document size is 1 MiB (1,048,576 bytes), which is plenty of room, so you can store a lot of posts here. Let's talk about costs: I used to think Firestore favored having many small docs, but I've found in practice it works equally well with large docs.
Now on your dashboard you only need query:
const dashboard = await firestore.collection('users_dashboard').doc(userID).get()
This is a very opinionated way to solve this problem. You could avoid using users_posts, but maybe you don't want to trigger this process for changes other than post-related ones.
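To get a feel for how many posts fit under the 1 MiB limit mentioned above, a back-of-the-envelope helper like the one below can be used. It is a rough sketch: maxPostsPerDocument is a made-up name, the average post size is a number you have to measure yourself, and the stored size of a real document also includes field names and other overhead, so treat the result as an upper bound.

```typescript
// Rough capacity estimate for a single precomputed feed document.
// The 1 MiB figure comes from the answer; the average post size is an assumption you supply.
const MAX_DOC_BYTES = 1048576;

function maxPostsPerDocument(avgPostBytes: number, maxDocBytes: number = MAX_DOC_BYTES): number {
  if (avgPostBytes <= 0) throw new RangeError("avgPostBytes must be positive");
  // Whole posts only; ignores per-document overhead
  return Math.floor(maxDocBytes / avgPostBytes);
}

console.log(maxPostsPerDocument(2048)); // 512
```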

It looks like your second approach is best in this situation. I don't really understand what @andresmijares was trying to do; he mentioned storing posts in a document, which is not a good approach. Imagine you have more than 20K posts (which is what I think a document can hold): the document won't be able to store any more data. A better approach is to store each post as a document inside a collection (just like in your 2nd option). So let's recap the best approach.
1) You share a post in the posts collection and in the feed collection of each of your followers. Maybe this can be done with a cloud function, and let's not forget to aggregate (also with cloud functions) the number of posts that needs to appear in the user's profile.
2) You follow a user and copy all of their posts from the posts collection into your feed collection; this way you get to see all of their posts on your feed.
With this approach there will be a lot of writes up front, but reads will be fast. If your app reads more than it writes, then there's nothing to worry about, unless I'm wrong.

Related

Is it possible to fetch all documents whose sub-collection contains a specific document ID?

I am trying to fetch all documents whose sub-collection contain a specific document ID. Is there any way to do this?
For example, if the boxed document under 'enquiries' sub-collection exists, then I need the boxed document ID from 'books' collection. I couldn't figure out how to go backwards to get the parent document ID.
I make the assumption that all the sub-collections have the same name, i.e. enquiries. Then, you could do as follows:
Add a field docId in your enquiries document that contains the document ID.
Execute a Collection Group query in order to get all the documents with the desired docId value (Firestore.instance.collectionGroup("enquiries").where("docId", isEqualTo: "ykXB...").getDocuments()).
Then, you loop over the results of the query and, for each DocumentReference, call the parent() method twice (the first call gives you the CollectionReference and the second gives you the DocumentReference of the parent document).
You just have to use the id property and you are done.
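The "go up twice" step can be pictured on the document path itself, since paths alternate collection and document segments. Here is a minimal sketch in plain TypeScript; parentDocId is a hypothetical helper that works on path strings only, and in real code you would use the SDK's parent references as described above.

```typescript
// Paths alternate collection/document segments: books/{bookId}/enquiries/{enquiryId}
function parentDocId(path: string): string | null {
  const segments = path.split("/").filter((s) => s.length > 0);
  // A document inside a sub-collection has at least 4 segments and an even count
  if (segments.length < 4 || segments.length % 2 !== 0) return null;
  // Last = document ID, second to last = sub-collection, third to last = parent document ID
  return segments[segments.length - 3];
}

console.log(parentDocId("books/book1/enquiries/enquiry1")); // "book1"
```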
Try the following:
Firestore.instance.collection("books").where("author", isEqualTo: "Arumugam").getDocuments().then((value) {
  value.documents.forEach((result) {
    var id = result.documentID;
    Firestore.instance.collection("books").document(id).collection("enquiries").getDocuments().then((querySnapshot) {
      querySnapshot.documents.forEach((result) {
        print(result.data);
      });
    });
  });
});
First you need to retrieve the ID under the books collection; to do that you have to run a query, for example where("author", isEqualTo: "Arumugam"). After retrieving the ID you can then run a query to retrieve the documents inside the enquiries collection.
For example, if the boxed document under 'enquiries' sub-collection exists, then I need the boxed document ID from 'books' collection.
There is no way you can do that in a single go.
I couldn't figure out how to go backwards to get the parent document ID.
There is no going back in Firestore as you probably were thinking. In Firebase Realtime Database we have a method named getParent(), which does exactly what you want but in Firestore we don't.
Queries in Firestore are shallow, meaning that they only get items from the collection that the query is run against. Firestore doesn't support queries across different collections in one go. A single query may only use the properties of documents in a single collection. So the solution to your problem is to perform two get() calls. The first one checks whether the document exists in the enquiries subcollection, and if it does, a second get() call fetches the document from the books collection.
Renaud Tarnec's answer is great for fetching the IDs of the relevant books.
If you need to fetch more than the ID, there is a trick you could use in some scenarios. I imagine your goal is to show some sort of an index of all books associated with a particular enquiry ID. If the data you'd like to show in that index is not too long (can be serialized in less than 1500 bytes) and if it is not changing frequently, you could try to use the document ID as the placeholder for that data.
For example, let's say you wanted to display a list of book titles and authors corresponding to some enquiryId. You could create the book ID in the collection with something like so:
// Assuming admin SDK
const bookId = nanoid();
const author = 'Brandon Sanderson';
const title = 'Mistborn: The Final Empire';
// If title + author are not unique, you could add the bookId to the array
const uniquePayloadKey = Buffer.from(JSON.stringify([author, title])).toString('base64url');
booksColRef.doc(uniquePayloadKey).set({ bookId })
booksColRef.doc(uniquePayloadKey).collection('enquiries').doc(enquiryId).set({ enquiryId })
Then, after running the collection group query per Renaud Tarnec's answer, you could extract that serialized information with a regexp on the path, and deserialize. E.g.:
// Assuming Web 9 SDK
const books = query(collectionGroup(db, 'enquiries'), where('enquiryId', '==', enquiryId));
return getDocs(books).then(snapshot => {
  const data = []
  snapshot.forEach(doc => {
    const payload = doc.ref.path.match(/books\/(.*)\/enquiries/)[1];
    // atob() expects standard base64, so map the base64url characters back first
    const [author, title] = JSON.parse(atob(payload.replace(/-/g, '+').replace(/_/g, '/')));
    data.push({ author, title })
  });
  return data;
});
The "store payload in ID" trick can be used only to present some basic information for your child-driven search results. If your book document has a lot of information you'd like to display once the user clicks on one of the books returned by the enquiry, you may want to store this in separate documents whose IDs are the real bookIds. The bookId field added under the unique payload key allows such lookups when necessary.
You can reuse the same data structure for returning book results from different starting points, not just enquiries, without duplicating this structure. If you stored many authors per book, for example, you could add an authors sub-collection to search by. As long as the information you want to display in the resulting index page is the same and can be serialized within the 1500-byte limit, you should be good.
The (quite substantial) downside of this approach is that it is not possible to rename document IDs in Firestore. If some of the details in the payload change (e.g. an admin fixes a book title), you will need to re-create all the sub-collections under the new ID and delete the old data. This can be quite costly: at least 1 read, 1 write, and 1 delete for every document in every sub-collection. So keep in mind it may not be pragmatic for fast-changing data.
The 1500-byte limit for key names is documented in Usage and Limits.
If you are concerned about potential hotspots this can generate per Best Practices for Cloud Firestore, I imagine that adding the bookId as a prefix to the uniquePayloadKey (with a delimiter that allows you to throw it away) would do the trick - but I am not certain.
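The prefix idea from the last paragraph could look roughly like this. It is a sketch under assumptions: the "~" delimiter and both helper names are made up, and whether a prefix actually avoids hotspots is as uncertain here as it is in the answer. The delimiter is chosen because it is outside the base64url alphabet (A-Z, a-z, 0-9, "-", "_"), so it can never appear inside the encoded payload.

```typescript
// Hypothetical delimiter; must not occur in the bookId or in the base64url payload.
const DELIMITER = "~";

// Build the document ID as "<bookId>~<payloadKey>"
function makePrefixedKey(bookId: string, payloadKey: string): string {
  if (bookId.indexOf(DELIMITER) !== -1) {
    throw new Error("bookId must not contain the delimiter");
  }
  return bookId + DELIMITER + payloadKey;
}

// Same regexp as in the answer, then throw away everything up to the delimiter
function payloadFromPath(path: string): string | null {
  const match = path.match(/books\/(.*)\/enquiries/);
  if (!match) return null;
  const key = match[1];
  const cut = key.indexOf(DELIMITER);
  // Keys written without a prefix are returned unchanged
  return cut === -1 ? key : key.slice(cut + 1);
}
```

The returned string is still the base64url payload, so it would be decoded exactly as in the snippet above.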

How to update avatar/photoUrl on multiple locations in the Firestore?

In Firestore I am trying to update avatar (photoUrl) on multiple locations when the user changes the avatar.
Data structure: https://i.imgur.com/yOAgXwV.png
Cloud function:
exports.updateAvatars = functions.firestore.document('users/{userID}').onUpdate((change, context) => {
const userID = context.params.userID;
const imageURL = change.after.data().avatar;
console.log(userID);
console.log(imageURL);
});
With this cloud function, I am watching on user update changes. When the user changes avatar I got userID and a photo URL. Now I want to update all "todos" where participant UID is equal to userID.
You could use the array-contains operator for your query. This allows you to search for array values inside your documents.
https://firebase.google.com/docs/firestore/query-data/queries#array_membership
Once you have found your documents, you have to update the URL on each one and write the updated documents back to the database.
But this causes a lot of unnecessary writes, especially if your todos collection gets bigger and you want to archive the todos data.
Simpler solution:
I suspect a to-do list does not have more than 10 users. If so, this is a lot of trouble just to save some document reads.
I would recommend that you just load all the user documents you need in order to show the avatars on the to-do list.
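The simpler solution amounts to joining todos with users at read time instead of copying the avatar around. A small sketch of that join, with hypothetical shapes: the real data structure is only shown in the linked image, so participantUids and the Todo and UserDoc interfaces are assumptions (only the avatar field name comes from the cloud function above).

```typescript
// Hypothetical document shapes, for illustration only
interface UserDoc {
  uid: string;
  avatar: string;
}

interface Todo {
  title: string;
  participantUids: string[];
}

// Look up each participant's current avatar from already-loaded user documents
function avatarsForTodo(todo: Todo, usersByUid: Map<string, UserDoc>): string[] {
  const avatars: string[] = [];
  for (const uid of todo.participantUids) {
    const user = usersByUid.get(uid);
    // Skip participants whose user document was not loaded
    if (user) avatars.push(user.avatar);
  }
  return avatars;
}
```

Because the avatar is read from the user document every time, changing it needs one write and no fan-out.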

Store and Query Posts in Firestore in a performant way

So I need to store Posts that are created by Users. The data model is the problem: putting all existing Posts in a Posts collection with a creatorUserID field makes it possible to show the posts belonging to a user.
Now a User has a subcollection called Followers with the IDs of the people following them. The problem is that I'm not sure what a query would look like that shows only the Posts of people that the User follows.
I'm also worried about performance when there are 10 million+ Posts in the collection.
In order to query a document in Firestore the data you want to query by needs to be on the Document you want to query, there is no way of querying a collection by the data of a document from another collection. This is why your use-case is a bit tricky. It might not seem very elegant, but this is a way of solving it:
We use two collections, users and posts.
User
- email: string
- followingUserIDs: array with docIds of users you are following
Posts
- postName: string
- postText: string
- creatorUserID: string
To find all the posts belonging to all the users the logged in user is following, we can do the following in the code:
1 Retrieve the logged in user document
2 For each id in the "followingUserIDs" array I would query Firestore for the Posts. In JavaScript it would be something like:
followingUserIDs.map(followingUserId => {
return firestore.collection('Posts', ref => ref.where('creatorUserID',
'==', followingUserId));
})
3 Combine the result from all the queries into one array of posts

How to query for most recent submissions from users I follow using Firestore?

I've been reading many SO posts about specific querying situations and I've hit a roadblock on mine when trying to determine the best approach without going overboard on read requests.
I have an app where a user can follow multiple artists and on the home page it should display the upcoming submissions sorted by date. I want to limit the number of reads to 15, but the way my structure is set up, I cannot limit overall reads but just reads to a specific artist. Let me explain:
For a specific user, I save all artists that a user follows into an array:
user: {
userUsername: 'jlewallen18',
userArtistUIDs: [1,4,8,9]
}
To establish a connection to a submission from an artist a user follows, I add the artistUID to the submission:
submission: {
submissionArtistUID: 4,
submissionTitle: 'New Album'
submissionReleaseDate: 1550707200
}
Now the way I find what I need to display on the home page for a specific user is to loop through all the artist IDs the user follows and query for submissions with that ID. I can then subscribe to that observable using combineLatest from rxjs and receive my results.
this.submission$ = user.userArtistUIDs.map((id) => {
return this.db.collection$('submissions', ref => ref
.where('submissionArtistUID', '==', id));
});
This works great, but wont scale well against my read quota, because it queries for every single release from every artist I follow. So if I follow 300 artists, each with 3 releases, I would have 900 reads sent to the app and I'd have to sort and slice the final array to cut it to 15 on the client side.
This is what I'm looking for:
With my list of artistIDs -> query for submissions that contain the artistID in my array -> sort by ascending order -> limit to 15.
Thus my read count would only be 15. I know that for NoSQL there isn't a one trick pony, but I am struggling to figure out the optimal method as there will be more reads than writes on my application.
With Firestore there is a limit() method that can be chained onto the query.
Would this work?
this.submission$ = user.userArtistUIDs.map((id) => {
return this.db.collection$('submissions', ref => ref
.where('submissionArtistUID', '==', id).limit(15));
});
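Note that limit(15) here applies to each per-artist query, so reads are capped at 15 per followed artist rather than 15 overall, and the combined results still need to be sorted and cut on the client. A sketch of that last step follows. It assumes each per-artist query is also ordered by submissionReleaseDate (that ordering is not in the snippet above), because only then are the overall first 15 guaranteed to be among the fetched ones. mergeAndLimit is a made-up helper name.

```typescript
interface Submission {
  submissionArtistUID: number;
  submissionTitle: string;
  submissionReleaseDate: number; // Unix seconds, as in the question
}

// Merge the per-artist result lists and keep the `limit` earliest release dates overall
function mergeAndLimit(perArtist: Submission[][], limit: number): Submission[] {
  const all: Submission[] = [];
  for (const list of perArtist) {
    for (const s of list) all.push(s);
  }
  // Ascending by release date, matching the "sort by ascending order" requirement
  all.sort((a, b) => a.submissionReleaseDate - b.submissionReleaseDate);
  return all.slice(0, limit);
}
```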

Firebase Firestore, query a users friend's posts

I am looking create a social-media feed using Firebase. My data is structured like this:
users: {
uid: {
... // details
}
}
friends: {
uid: {
friends: { // sub collection
fuid: {
... // details
}
}
}
}
posts: {
postId: {
postedBy: uid
... // details
}
}
Now I am trying to get the posts from all friends of the user, limit it to the most recent 10 posts, and then create a scrolling directive that queries the next set of 10 posts so that the user doesn't have to query and load posts^N for friends^N on the page load. But I'm not really sure how to query firebase in an effective manner like this, for the user's friends and then their posts.
I have the scrolling directive working, taken from Jeff Delaney's Infinite Scrolling Lesson on AngularFirebase.com. But it only handles the posts (boats in the tutorial) collection as a whole, without selectively querying within that collection (to check if the user is a friend).
The only solution that I could think of was to query all of the user's friends posts, store that in an array, and then chunk load the results in the DOM based on the last batch of posts that were loaded. This just seems like it could be really inefficient in the long-haul if the user has 100's of friends, with 100's of posts each.
If I get it right, you are duplicating the post for each user in the user's friend list, right? I don't think that is a good idea if your app scales. At this time, the cost for 100k doc writes is $0.18, so:
Imagine that a user of your app has 1000 friends. When he posts anything, you are making 1000 writes in the database. Imagine that you have 1000 active users like him. You have just made 1,000,000 writes and paid $1.80.
Now even worse: you probably have, on each post, a duplicated field for the user's displayName and a profileImageUrl. Imagine that this user has 500 posts in his history and has just changed his profile picture. You will have to update one of the fields for each post on each of his 1000 friends' feeds, right? You will be doing 1000 * 500 = 500,000 writes just for updating the profileImageUrl! And if the user didn't like the photo? He tries 3 new photos, and now in 10 minutes you have made 2,000,000 writes in the database. This means you will be charged $3.60. It may not seem like much, but keep in mind that we're talking about a single user at a single moment. With 1000 users changing their profile picture 4 times in the same day, you are paying $3,600.00.
Take a look at this article: https://proandroiddev.com/working-with-firestore-building-a-simple-database-model-79a5ce2692cb#7709
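The arithmetic above is easy to re-run with your own numbers. A minimal sketch, using the $0.18 per 100,000 writes quoted in this answer (which may be out of date); writeCostUsd is a made-up helper name.

```typescript
// Price quoted in the answer: $0.18 per 100,000 document writes (may be out of date)
const USD_PER_100K_WRITES = 0.18;

function writeCostUsd(writes: number, usdPer100k: number = USD_PER_100K_WRITES): number {
  // Round to whole cents to avoid floating-point noise
  return Math.round((writes / 100000) * usdPer100k * 100) / 100;
}

const perPhotoChange = 1000 * 500;      // 1000 friends' feeds x 500 posts each
const fourChanges = 4 * perPhotoChange; // 2,000,000 writes for one user

console.log(writeCostUsd(1000 * 1000));        // 1.8  (1000 users posting to 1000 friends)
console.log(writeCostUsd(fourChanges));        // 3.6  (one user, four photo changes)
console.log(writeCostUsd(1000 * fourChanges)); // 3600 (1000 such users in a day)
```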
I ended up solving this issue by leveraging Firebase Functions. I have two collections, one is called Posts and the other is called Feeds. When a user adds a post, it gets added to the Posts collection. When this happens, it triggers a Firebase Function, which then grabs the posting user's UID.
Once it has the UID, it queries another collection called Friends/UID/Friends and grabs all of their friend's UID's.
Once it has the UID's, it creates a batch add (in case the user has more than 500 friends), and then adds the post to their friend's Feeds/UID/Posts collection.
The reason I chose this route, was a number of reasons.
Firebase does not allow you to query with array lists (the user's friends).
I did not want to filter out posts from non-friends.
I did not want to download excessive data to the user's device.
I had to paginate the results in order from newest to oldest.
By using the above solution, I am now able to query the Feeds/UID/Posts/ collection in a way that returns the next 10 results every time, without performance or data issues. The only limitation I have not been able to get around completely is that it takes a few seconds to add the post to the user's personal feed, as the Function needs time to spin up. But this can be mitigated by increasing the memory allocation for that particular function.
I also do the above listed for posts that are edited and or deleted.
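Splitting the friend list for the batched writes mentioned above (the "more than 500 friends" case) comes down to a chunking helper. A sketch in plain TypeScript: chunk is a hypothetical utility, not part of the Firebase SDK, and the batch size of 500 is simply the figure this answer uses.

```typescript
// Split a list into consecutive groups of at most `size` items
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// 1200 friends become three groups: 500, 500 and 200
const friendUids: string[] = [];
for (let i = 0; i < 1200; i++) friendUids.push("friend" + i);
console.log(chunk(friendUids, 500).map((group) => group.length)); // [ 500, 500, 200 ]
```

Each group would then get its own batch in the Function, with one write per friend to Feeds/UID/Posts.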
I think I have a solution for Firestore social feed queries. I'm not sure if it works, but here it is:
A Friends collection keeps each user's list of friend UUIDs as an array in a document, one document per user. So when the user logs in, a cloud function first fetches the friends list with one read, since all friend IDs are in one document. We also store a lastChecked timestamp on this document: every time we fetch the friends array, we record the date.
Now a cloud function can check the friends' posts. As I understand it, IN queries currently allow an array of up to 10 UUIDs, so if the user has 100 friends, the query will finish in ten rounds. Now we have something to serve.
Instead of serving the posts directly, we create a collection for every user. We put all the collected data into documents, sliced by day. Let's pretend we already have older posts in this usersfeed collection (one document per day). We have the last-checked time on our friends document, so we query from the last checked date to now. This way we only fetch unseen posts and slice them by day (if they span more than one day, of course).
While this happens in the cloud function, we have already served the previous feed document. When the collection gets a new document, Firestore is already listening and adds it. If the user scrolls down, we get the previous day's document. So every document holds the data of more than one post, as a map or array.
This saves many reads, I guess.
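The "slice it to days" step could be sketched like this. The FeedPost shape, the createdAtMs field and the UTC day key are all assumptions made for illustration; the answer does not say how days are delimited.

```typescript
// Hypothetical post shape; createdAtMs is milliseconds since the Unix epoch
interface FeedPost {
  id: string;
  createdAtMs: number;
}

// UTC calendar day as "YYYY-MM-DD", usable as the ID of the daily feed document
function dayKey(ms: number): string {
  return new Date(ms).toISOString().slice(0, 10);
}

// Group newly fetched posts into one bucket per day
function sliceByDay(posts: FeedPost[]): Map<string, FeedPost[]> {
  const days = new Map<string, FeedPost[]>();
  for (const post of posts) {
    const key = dayKey(post.createdAtMs);
    const bucket = days.get(key);
    if (bucket) bucket.push(post);
    else days.set(key, [post]);
  }
  return days;
}
```

Each map entry would become one document in the user's feed collection, so scrolling back one day costs one read.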
