I am aware it would be very difficult to query by a value that does not exist in an array but is there a way to do this without doing exactly that?
Here is my scenario - I have a subscription-based service where people can opt in and "follow" a specific artist. In my backend, this creates a subscription doc with their followerId, the id of the artist they want to follow (called artistId), and an array called pushed. The artist can add new releases, and then send each follower a notification of a specific song in the future. I would like to keep track of which follower has been pushed which release, and this is done in the aforementioned pushed array. I need a way to find which followers have already been pushed a specific release and so...
I was thinking of combining two queries but I am not sure if it is possible. Something like:
db.collection('subscriptions').where('artistId', '==', artistId)
db.collection('subscriptions').where('artistId', '==', artistId).where('pushed', 'array-contains', releaseId)
And then take the intersection of both query results and subtract from the 1st query to get the followers that have not been pushed a specific release.
Is this possible? Or is there a better way?
There is no way to query Firestore for documents that don't have a certain field or value. It's not "very difficult", but simply not possible. To learn more on why that is, see:
Firestore get documents where value not in array?
Firestore: how to perform a query with inequality / not equals
The Get to know Cloud Firestore episode on how Firestore queries work.
Your workaround is possible, and technically not even very complex. The only thing to keep in mind is that you'll need to load all of the artist's subscription documents, so the performance will be linear in the number of followers that artist has. This may be fine for your app at the moment, but it's definitely something to measure.
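As a rough sketch of that workaround, assuming the two queries from the question have already been run and their documents mapped to plain objects, the "subtract" step is a set difference on document ID. The helper name and the { id, followerId } shape are hypothetical, not something from the original post:

```javascript
// Hypothetical helper: followers of an artist who have NOT been pushed a release.
// `allSubs` stands in for the results of the first query (all subscriptions of the artist),
// `pushedSubs` for the results of the second query (those whose `pushed` contains the release).
// Each item is assumed to look like { id, followerId }.
function followersNotPushed(allSubs, pushedSubs) {
  const pushedIds = new Set(pushedSubs.map((sub) => sub.id));
  return allSubs
    .filter((sub) => !pushedIds.has(sub.id))
    .map((sub) => sub.followerId);
}

const allSubs = [
  { id: 's1', followerId: 'alice' },
  { id: 's2', followerId: 'bob' },
  { id: 's3', followerId: 'carol' },
];
const pushedSubs = [{ id: 's2', followerId: 'bob' }];

console.log(followersNotPushed(allSubs, pushedSubs)); // [ 'alice', 'carol' ]
```

Since every subscription document already carries its own pushed array, filtering the results of the first query in memory should give the same answer with a single query.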
Typically my workaround is to track not what releases were pushed to a user, but when the last push was sent to a user. Say that a release has a "notifications sent timestamp" and each user has a "last received notifications timestamp". By comparing the two you can query for users who haven't received a notification about a specific release yet.
The exact data model for this will depend on your exact use-case, and you might need to track multiple timestamps for each user (e.g. for each artist they follow). But I find that in general I can come up with a reasonable solution based on this model.
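To make the timestamp idea concrete, here is a minimal in-memory sketch. The field names notificationsSentAt and lastNotifiedAt are my own placeholders, and timestamps are assumed to be millisecond numbers:

```javascript
// Assumed shapes (placeholders, not from the original post):
//   release:      { id, notificationsSentAt }    when the push for this release went out
//   subscription: { followerId, lastNotifiedAt } when this follower last received a push
function followersStillToNotify(subscriptions, release) {
  // A follower whose last notification predates the release's send time
  // has not been notified about that release yet.
  return subscriptions
    .filter((sub) => sub.lastNotifiedAt < release.notificationsSentAt)
    .map((sub) => sub.followerId);
}

const release = { id: 'r1', notificationsSentAt: 2000 };
const subscriptions = [
  { followerId: 'alice', lastNotifiedAt: 1000 },
  { followerId: 'bob', lastNotifiedAt: 2000 },
  { followerId: 'carol', lastNotifiedAt: 2500 },
];

console.log(followersStillToNotify(subscriptions, release)); // [ 'alice' ]
```

In Firestore the same comparison would be a range filter, roughly where('lastNotifiedAt', '<', release.notificationsSentAt) next to the artistId equality filter, which is why this model is queryable while "value not in array" is not.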
If you go the Elasticsearch route, you need to keep your database and the Elasticsearch server in sync. You also need to set up firewall rules in Google Cloud Platform to keep arbitrary requests away from your server, since they may incur bandwidth costs.
The not-in operator is now available in Firestore!
citiesRef.where('country', 'not-in', ['USA', 'Japan']);
See the docs for a full list of examples:
https://firebase.google.com/docs/firestore/query-data/queries#in_not-in_and_array-contains-any
citiesRef.where('country', 'not-in', [['USA']]);
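Notice the double array in [['USA']]. With it, each comparison value is itself an array, so the query filters out docs whose 'country' field is exactly the array ['USA'] (not every array that merely contains 'USA' among other values).
A single array ['USA'] assumes that 'country' is a string.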
I am new to Firestore and building an event planning app, but I am unsure of the best way to structure the data, taking into account query speed and Firestore costs based on reads. In both options I can think of, I have a users collection and an events collection.
Option 1:
In the users collection, each user has an array of eventIds for events they are hosting and also events they are attending. Then I query the events collection for those eventIds of that user so I can list the appropriate events to the user
Option 2:
For each event in the events collection, there is a hostId and an array of attendeeIds. So I would query the events collection for events where hostId === user.id and where attendeeIds.includes(user.id).
I am trying to figure out which is best from a performance and a cost perspective, taking into account there could be thousands of events to iterate through. Is it better to search the events collection by eventId, as it will stop iterating when all events are found, or is that slow since it will be searching for one eventId at a time? Maybe there is a better way to do this that I haven't mentioned above. I would really appreciate the feedback.
In addition to @Dharmaraj's answer, please note that neither solution is better than the other in terms of performance. In Firestore, the query performance depends on the number of documents you request (read) and not on the number of documents you are searching. It doesn't really matter if you search for 10 documents in a collection of 100 documents or in a collection that contains 100 million documents; the response time will always be the same.
From a billing perspective, yes, the first solution will imply an additional document to read, since you first need to actually read the user document. However, reading the array and getting all the corresponding events will also be very fast.
Please bear in mind that in the NoSQL world, we always structure a database according to the queries that we intend to perform. So if a query returns the documents that you're interested in and produces the fewest reads, then that's the solution you should go ahead with. Also remember that you'll always have to pay for a number of reads equal to the number of documents the query returns.
Regarding security, both solutions can be secured relatively easily. Now it's up to you to decide which one works better for your use case.
I would recommend going with option 2 because it might save you some reads:
You won't have to query the user's document in the first place and then run another query like where(documentId(), "in", [...userEvents]) or fetch each of them individually if you have many.
When trying to write security rules, you can directly check if an event belongs to the user trying to update the event by resource.data.hostId == request.auth.uid.
When using the first option, you'll have to query the user's document in security rules to check if this event ID is present in that events array (that may cost you another read). Check out the documentation for more information on billing.
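One practical detail of option 2, sketched under my own assumptions: unless the host is also stored in attendeeIds, listing a user's events likely takes two queries (one on hostId, one with array-contains on attendeeIds) whose results are merged on the client. The helper below is hypothetical and works on plain objects shaped like { id, ... }:

```javascript
// Hypothetical helper: merge the results of the "hosted by me" query and the
// "attended by me" query, dropping events that appear in both.
function mergeEvents(hosted, attending) {
  const byId = new Map();
  for (const event of [...hosted, ...attending]) {
    // A Map keeps the position of the first insertion, so duplicates do not reorder the list.
    byId.set(event.id, event);
  }
  return [...byId.values()];
}

const hosted = [{ id: 'e1' }, { id: 'e2' }];
const attending = [{ id: 'e2' }, { id: 'e3' }];

console.log(mergeEvents(hosted, attending).map((e) => e.id)); // [ 'e1', 'e2', 'e3' ]
```

Storing the host's ID in attendeeIds as well would let a single array-contains query cover both cases, at the cost of keeping the two fields consistent.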
There are several articles (firestore and firebase realtime database) explaining how to build a user presence system but I cannot find a resource for a friend presence system.
A simple user presence system is not perfect for some applications such as chat apps where there are millions of users and each user wants to listen to only his/her friends. I've found similar questions:
exact same question on stackoverflow
exact same issue on github
Two ok solutions with a realtime database are: (solutions are from the above stackoverflow post)
Use many listeners (one for each friend) with a collection of users. Possibly have a cap on the number of friends to keep track of.
Each user has friends collections and whenever a user's status changes, his/her status changes wherever he/she shows up in some user's friends collection as well.
Is there a better way to do this? What kind of databases do chat apps like Discord, WhatsApp, etc. use to build their friend presence systems?
I came up with two approaches that might be worth looking into. Note that I have not tested how they will scale longer term, as I just pushed to prod. First step: write a user's presence to their user document (this needs Firebase, Cloud Functions, and Cloud Firestore, per https://firebase.google.com/docs/firestore/solutions/presence).
Then take either approach:
Create an array field on your user documents (users>{userID}) called friends. Every time you add a friend, add your ID to their array, and vice versa. Then, on the client, run a function like:
db.collection("users").where("friends", "array-contains", clientUserId).onSnapshot(...)
In doing so, all documents with a friends field that contains the clientUserId will be listened to for real-time updates. For some reason, my team didn't approve of this design, but it works. If anyone can share their opinion as to why, I'd appreciate it.
Create a friends sub-collection like so: users>{userID}>friends. When you add a friend, add a document to your friends sub-collection with the ID equal to your friend's userID. When a user logs on, run a get query for all documents in this collection. Get the doc IDs and store them in an array (call it friendIDs). Now for the tricky part. It would be ideal if you could use the in operator with unlimited comparison values, because then you could just run a single onSnapshot like so:
this.unSubscribeFriends = db.collection("users").where(firebase.firestore.FieldPath.documentId(), "in", friendIDs).onSnapshot((querySnapshot) => { /* get presence data */ })
Since this onSnapshot is attached to this.unSubscribeFriends, you just need to call it once to detach the listener:
componentWillUnmount() {
this.unSubscribeFriends && this.unSubscribeFriends()
}
Because a given user's friends can definitely grow into the hundreds, I had to create a new array called chunkedFriendsArray consisting of a chunked version of friendIDs (chunked as in every 10 string IDs are spliced into a new array, to work around the in operator's limit of 10 comparison values). Thus, I had to map over chunkedFriendsArray and set an onSnapshot like the one above for every array of max length 10 inside chunkedFriendsArray. The problem with this is that all the listeners are attached to the same const (or this.unSubscribeFriends in my case). I have to call this.unSubscribeFriends as many times as there are chunked arrays in chunkedFriendsArray:
componentWillUnmount() {
this.state.chunkedFriendsArray.forEach((doc) => {
this.unSubscribeFriends && this.unSubscribeFriends()
})
}
It feels weird having many listeners attached to the same const (method this.unSubscribeFriends) and calling the same exact one to stop listening to them. I'm sure this will lead to bugs in my production code.
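One way around the shared-const problem, as a sketch rather than tested production code: keep every unsubscribe function returned by onSnapshot in an array and expose a single function that calls each of them. Here subscribe is a hypothetical stand-in for the real onSnapshot call on one chunk of IDs, so the example runs without Firebase:

```javascript
// Split an array into chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// `subscribe` is a hypothetical callback: it receives one chunk of friend IDs,
// attaches a listener for it, and returns that listener's unsubscribe function
// (which is what onSnapshot returns).
function listenInChunks(friendIds, subscribe, size = 10) {
  const unsubscribers = chunk(friendIds, size).map((ids) => subscribe(ids));
  // One function that detaches every listener exactly once.
  return () => {
    unsubscribers.forEach((unsubscribe) => unsubscribe());
    unsubscribers.length = 0; // a second call becomes a no-op
  };
}

// Usage with a fake subscribe that only counts active listeners.
let active = 0;
const fakeSubscribe = () => {
  active += 1;
  return () => {
    active -= 1;
  };
};

const friendIds = Array.from({ length: 25 }, (_, i) => `user${i}`);
const unsubscribeAll = listenInChunks(friendIds, fakeSubscribe);
console.log(active); // 3
unsubscribeAll();
console.log(active); // 0
```

In the component, this.unSubscribeFriends would then hold the single function returned by listenInChunks, and componentWillUnmount would call it once.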
There are other decentralized approaches, but the two I listed are my best attempts at avoiding having a bunch of decentralized presence data.
I have a collection of conversations, each conversation having a hidden<Map> where each participant is the key, having a boolean value, so I can see if he archived the conversation on his end or not. Therefore, the query looks like this:
store.conversations
.where( 'participants', 'array-contains', uid )
.where( `hidden.${uid}`, '==', false )
.orderBy( 'createdAt', 'desc' )
The problem arises when adding orderBy, which makes it a "range" query. Given that each document has a different set of keys in the hidden<Map>, Firestore is suggesting the following index, which obviously wouldn't work:
participants Arrays
hidden.`48m6lKjwvKUOboAxlc0ppX2R7qF2` Ascending
createdAt Descending
How do I get around this? I guess flattening the Map would be a solution, but not the most elegant one. Any advice?
Firestore is suggesting the following, which obviously wouldn't work:
participants Arrays
hidden.`48m6lKjwvKUOboAxlc0ppX2R7qF2` Ascending
createdAt Descending
You can create such an index and it will work, but the problem arises if your app becomes popular and you have a large number of users. Because the user ID is part of the indexed field path, you'll have to create an index for every user, and this is not such a good idea because when it comes to indexes, there are some limitations. According to the official documentation regarding Firestore usage and limits:
Maximum number of composite indexes for a database: 200
That is a number you can reach very quickly.
I guess flattening the Map would be a solution
You're guessing right. This practice is also called denormalization and is common when it comes to Firebase. If you are new to NoSQL databases, I recommend you watch the video Denormalization is normal with the Firebase Database for a better understanding. It is for the Firebase Realtime Database, but the same rules apply to Cloud Firestore.
Also, when you are duplicating data, there is one thing you need to keep in mind. In the same way you are adding data, you need to maintain it. In other words, if you want to update/delete an item, you need to do it in every place that it exists.
For more information please also see my answer from the following post:
What is denormalization in Firebase Cloud Firestore?
So you can denormalize your database and create conversations without the need of creating indexes. For your use-case, you should consider augmenting your data structure to allow a reverse lookup by creating a new collection or subcollection named userConversations that can hold as documents all the conversations that a user has. For a simple query, there is no index needed.
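A minimal sketch of that reverse lookup, assuming the conversation shape from the question. The path layout under users/{uid}/userConversations and the helper name are my own assumptions:

```javascript
// Assumed conversation shape, based on the question:
//   { id, participants: [uid, ...], hidden: { [uid]: boolean }, createdAt }
// Returns one denormalized entry per participant. The `path` layout is hypothetical.
function toUserConversations(conversation) {
  return conversation.participants.map((uid) => ({
    path: `users/${uid}/userConversations/${conversation.id}`,
    data: {
      conversationId: conversation.id,
      // The per-user flag becomes a plain boolean field with a fixed name.
      hidden: conversation.hidden[uid] === true,
      createdAt: conversation.createdAt,
    },
  }));
}

const entries = toUserConversations({
  id: 'c1',
  participants: ['u1', 'u2'],
  hidden: { u1: false, u2: true },
  createdAt: 1547608677790,
});

console.log(entries.length); // 2
console.log(entries[1].data.hidden); // true
```

Because hidden and createdAt now have the same field names in every user's subcollection, any index a query on them needs no longer depends on the user ID. The trade-off is the one mentioned above: each entry has to be updated whenever the conversation or its archived state changes.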
I have two Firestore collections, Users and Posts. Below are simplified examples of what the typical document in each contains.
*Note that the document IDs in the friends subcollection are equal to the document IDs of the corresponding user documents. Optionally, I could also add a uid field to the friends documents and/or the Users documents. Also, there is a reason not relevant to this question that we have friends as a subcollection of each user, but if need be we could change it into a unified root-level Friends collection.
This setup makes it very easy to query for posts, sorted chronologically, by any given user by simply looking for Posts documents whose owner field is equal to the document reference of that user.
I achieve this in iOS/Swift with the following, though we are building this app for iOS, Android, and web.
guard let uid = Auth.auth().currentUser?.uid else {
print("No UID")
return
}
let firestoreUserRef = firestore.collection("Users").document(uid)
firestorePostsQuery = firestore.collection("Posts").whereField("owner", isEqualTo: firestoreUserRef).order(by: "timestamp", descending: true).limit(to: 25)
My question is how to query Posts documents that have owner values contained in the user's friends subcollection, sorted chronologically. In other words, how to get the posts belonging to the user's friends, sorted chronologically.
For a real-world example, consider Twitter, where a given user's feed is populated by all tweets that have an owner property whose value is contained in the user's following list, sorted chronologically.
Now, I know from the documentation that Firestore does not support logical OR queries, so I can't just chain all of the friends together. Even if I could, that doesn't really seem like an optimal approach for anyone with more than a small handful of friends.
The only option I can think of is to create a separate query for each friend. There are several problems with this, however. The first being the challenges presenting (in a smooth manner) the results from many asynchronous fetches. The second being that I can't merge the data into chronological order without re-sorting the set manually on the client every time one of the query snapshots is updated (i.e., real-time update).
Is it possible to build the query I am describing, or am I going to have to take this less-than-optimal approach? This seems like a fairly common query use-case, so I'll be surprised if there is not a way to do this.
Sorting chronologically is easy with the .orderBy method, provided you are using a Unix timestamp, e.g. 1547608677790. However, that leaves you with a potential mountain of queries to iterate through (one per friend).
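For reference, the client-side merge that the per-friend approach requires could look roughly like this. It assumes each query's results have been mapped to plain objects with a numeric timestamp field, and the helper name is hypothetical:

```javascript
// Hypothetical helper: combine the result lists of several per-friend queries
// into one feed, newest first, capped at `limit` posts.
function mergeFeeds(resultsPerFriend, limit = 25) {
  return resultsPerFriend
    .flat()
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, limit);
}

const friendA = [
  { id: 'p1', timestamp: 300 },
  { id: 'p2', timestamp: 100 },
];
const friendB = [{ id: 'p3', timestamp: 200 }];

console.log(mergeFeeds([friendA, friendB], 2).map((p) => p.id)); // [ 'p1', 'p3' ]
```

This has to be re-run on every snapshot update from any of the listeners, which is the cost the question is worried about.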
So, I think you want to re-think the data store schema.
Take advantage of Cloud Functions for Firebase triggers. When a new post is written, have a Cloud Function work out which users should see it. Each user could have array-type properties containing unread posts, read posts, etc.
Something like that would be fast and least taxing.
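Here is a sketch of the fan-out step such a function might perform, written as a pure function so it runs without Firebase. The unreadPosts field name and the usersById shape are assumptions on my part:

```javascript
// Hypothetical fan-out step: given the current user records, the IDs of the
// post owner's friends, and a new post ID, return updated records in which
// every friend has the post in their `unreadPosts` array.
function addUnreadPost(usersById, friendIds, postId) {
  const updated = { ...usersById };
  for (const friendId of friendIds) {
    const user = updated[friendId];
    if (!user) continue; // skip friend IDs with no user record
    const unread = user.unreadPosts || [];
    if (!unread.includes(postId)) {
      // Copy instead of mutating, so the input records stay untouched.
      updated[friendId] = { ...user, unreadPosts: [...unread, postId] };
    }
  }
  return updated;
}

const users = {
  alice: { unreadPosts: [] },
  bob: { unreadPosts: ['p0'] },
};
const updatedUsers = addUnreadPost(users, ['alice', 'bob', 'ghost'], 'p1');

console.log(updatedUsers.bob.unreadPosts); // [ 'p0', 'p1' ]
console.log(users.bob.unreadPosts); // [ 'p0' ]
```

In a real trigger the per-user update would likely be an arrayUnion write on each friend's document. Keep in mind that Firestore documents have a size limit, so a long-lived unread list may be better kept in a subcollection.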