I am new to Firestore and building an event planning app but I am unsure what the best way to structure the data is taking into account the speed of queries and Firestore costs based on reads etc. In both options I can think of, I have a users collection and an events collection
Option 1:
In the users collection, each user has an array of eventIds for events they are hosting and also events they are attending. Then I query the events collection for those eventIds of that user so I can list the appropriate events to the user
Option 2:
For each event in the events collection, there is a hostId and an array of attendeeIds. So I would query the events collection for events where hostId === user.id and for events where attendeeIds.includes(user.id)
I am trying to figure out which is best from a performance and a cost perspective, taking into account that there could be thousands of events to iterate through. Is it better to search the events collection by eventId, since it will stop iterating when all events are found, or is that slow since it will be searching for one eventId at a time? Maybe there is a better way to do this that I haven't mentioned above. Would really appreciate the feedback.
In addition to @Dharmaraj's answer, please note that neither solution is better than the other in terms of performance. In Firestore, query performance depends on the number of documents you request (read) and not on the number of documents you are searching. It doesn't really matter if you request 10 documents from a collection of 100 documents or from a collection that contains 100 million documents; the response time will be the same.
From a billing perspective, yes, the first solution will imply an additional document to read, since you first need to actually read the user document. However, reading the array and getting all the corresponding events will also be very fast.
Please bear in mind, that in the NoSQL world, we are always structuring a database according to the queries that we intend to perform. So if a query returns the documents that you're interested in, and produces the fewest reads, then that's the solution you should go ahead with. Also remember, that you'll always have to pay a number of reads that is equal to the number of documents the query returns.
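To make the read counting above concrete, here is a rough sketch of how the two options compare. The function names and the example numbers are hypothetical, and it assumes the billing rule that every query costs at least one read even when it returns nothing. It is an illustration, not a billing calculator.

```javascript
// Rough read-count model for the two layouts discussed above.
// Function names and parameters are hypothetical, for illustration only.

// Option 1: read the user document, then fetch each event it references.
function readsOption1(eventCount) {
  return 1 + eventCount;
}

// Option 2: one query for hosted events and one for attended events.
// Each query is billed per returned document, with a minimum of one read
// even when it returns nothing (assumption stated in the lead-in).
function readsOption2(hostedCount, attendingCount) {
  return Math.max(1, hostedCount) + Math.max(1, attendingCount);
}

console.log(readsOption1(5));    // 1 user doc + 5 events
console.log(readsOption2(2, 3)); // 2 hosted + 3 attended
```

In this model the difference between the options is a single read per listing, which is consistent with the point that the choice should follow from the queries you need rather than from raw performance.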
Regarding security, both solutions can be secured relatively easily. Now it's up to you to decide which one works better for your use case.
I would recommend going with option 2 because it might save you some reads:
You won't have to query the user's document in the first place and then run another query like where(documentId(), "in", [...userEvents]) or fetch each of them individually if you have many.
When trying to write security rules, you can directly check if an event belongs to the user trying to update the event by resource.data.hostId == request.auth.uid.
When using the first option, you'll have to read the user's document in security rules to check whether the event ID is present in that events array (which may cost you another read). Check out the documentation for more information on billing.
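If you do go with option 1, the "in" query mentioned above accepts only a limited number of values per query, so a long userEvents array has to be split into batches. A minimal sketch of that batching step follows. The helper name chunkIds is hypothetical, and the batch size of 10 is an assumption: the cap has changed over time, so check the current limit in the documentation.

```javascript
// Split a list of event IDs into batches small enough for one "in" query each.
// batchSize = 10 is an assumed limit, not a value taken from the current docs.
function chunkIds(ids, batchSize = 10) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical data: a user document that references 23 events.
const userEvents = Array.from({ length: 23 }, (_, i) => `event${i + 1}`);
const eventBatches = chunkIds(userEvents);

// Each batch would then go into its own query, e.g.
// where(documentId(), "in", batch)
console.log(eventBatches.map((batch) => batch.length)); // batch sizes: 10, 10, 3
```

Each batch is a separate query, so option 1 also multiplies the number of round trips as the array grows.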
I am aware it would be very difficult to query by a value that does not exist in an array but is there a way to do this without doing exactly that?
Here is my scenario - I have a subscription-based service where people can opt in and "follow" a specific artist. In my backend, this creates a subscription doc with their followerId, the id of the artist they want to follow (called artistId), and an array called pushed. The artist can add new releases, and then send each follower a notification of a specific song in the future. I would like to keep track of which follower has been pushed which release, and this is done in the aforementioned pushed array. I need a way to find which followers have already been pushed a specific release and so...
I was thinking of combining two queries but I am not sure if it is possible. Something like:
db.collection('subscriptions').where('artistId', '==', artistId)
db.collection('subscriptions').where('artistId', '==', artistId).where('pushed', 'array-contains', releaseId)
And then take the intersection of both query results and subtract from the 1st query to get the followers that have not been pushed a specific release.
Is this possible? Or is there a better way?
There is no way to query Firestore for documents that don't have a certain field or value. It's not "very difficult", but simply not possible. To learn more on why that is, see:
Firestore get documents where value not in array?
Firestore: how to perform a query with inequality / not equals
The Get to know Cloud Firestore episode on how Firestore queries work.
Your workaround is possible, and technically not even very complex. The only thing to keep in mind is that you'll need to load all subscriptions for the artist, so the performance will be linear in the number of followers that artist has. This may be fine for your app at the moment, but it's definitely something to measure.
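A minimal client-side sketch of that workaround follows. It assumes the subscription documents for one artist have already been loaded with the first query from the question and are represented here as plain objects. The helper name followersNotPushed and the sample data are hypothetical.

```javascript
// Subscription documents are modelled as plain objects, as they might look
// after loading every subscription for one artist with the first query.
function followersNotPushed(subscriptions, releaseId) {
  return subscriptions
    // A missing "pushed" array is treated as "nothing pushed yet".
    .filter((sub) => !(sub.pushed || []).includes(releaseId))
    .map((sub) => sub.followerId);
}

// Hypothetical sample data.
const artistSubscriptions = [
  { followerId: "f1", artistId: "a1", pushed: ["r1", "r2"] },
  { followerId: "f2", artistId: "a1", pushed: ["r1"] },
  { followerId: "f3", artistId: "a1", pushed: [] },
];

const pendingFollowers = followersNotPushed(artistSubscriptions, "r2");
console.log(pendingFollowers); // f2 and f3 have not been pushed release r2
```

Because each subscription document already carries its pushed array, the second query is not strictly needed in this sketch: filtering the results of the first query is enough.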
Typically my workaround is to track not what releases were pushed to a user, but when the last push was sent to a user. Say that a release has a "notifications sent timestamp" and each user has a "last received notifications timestamp". By comparing the two you can query for users who haven't received a notification about a specific release yet.
The exact data model for this will depend on your exact use-case, and you might need to track multiple timestamps for each user (e.g. for each artist they follow). But I find that in general I can come up with a reasonable solution based on this model.
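As a rough illustration of the timestamp comparison, the sketch below filters in-memory objects. The field names notificationsSentAt and lastNotifiedAt and the numeric timestamps are assumptions made for the example. In Firestore the same comparison would be expressed as a range filter on the follower's timestamp field.

```javascript
// Select followers whose last received notification predates the release's
// "notifications sent" timestamp. Timestamps are plain numbers here.
// In Firestore this would be a range filter such as
// where("lastNotifiedAt", "<", release.notificationsSentAt).
function followersToNotify(release, subs) {
  return subs
    .filter((sub) => sub.lastNotifiedAt < release.notificationsSentAt)
    .map((sub) => sub.followerId);
}

// Hypothetical sample data.
const latestRelease = { id: "r2", notificationsSentAt: 2000 };
const followerSubs = [
  { followerId: "f1", lastNotifiedAt: 2000 }, // already notified
  { followerId: "f2", lastNotifiedAt: 1500 },
  { followerId: "f3", lastNotifiedAt: 0 },    // never notified
];

const toNotify = followersToNotify(latestRelease, followerSubs);
console.log(toNotify); // f2 and f3
```

The advantage over the pushed array is that the condition becomes a positive comparison that an index can serve, instead of a "does not contain" check.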
If you use Elasticsearch, you need to keep your database and your Elasticsearch server in sync. You also need to set up firewall rules on Google Cloud Platform to keep arbitrary requests away from your server, since they may incur bandwidth costs.
The not-in operator is now available in Firestore!
citiesRef.where('country', 'not-in', ['USA', 'Japan']);
See the docs for a full list of examples:
https://firebase.google.com/docs/firestore/query-data/queries#in_not-in_and_array-contains-any
citiesRef.where('country', 'not-in', [['USA']]);
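Notice the double array around [['USA']]. Each element of the outer list is compared with the whole field value, so this form applies when 'country' is itself an array: it excludes docs whose 'country' field is exactly ['USA']. It does not exclude every doc whose 'country' array merely contains 'USA'.
A single array ['USA'] assumes that 'country' is a string.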
Is there any better way to get multiple specific documents from a collection in Firestore?
Let's say I have this collection:
--Feeds (collection)
--feedA (doc)
--comments (collection)
--commentA (doc)
users_in_conversation: [abcdefg, hijklmn, ...] //Field contains list of all user in conversation
Then I need to retrieve the user data (name and avatar) from the Users collection. Currently I run one query per user, but that will be slow when there are many people in the conversation.
What's the best way to retrieve specific users?
Thanks!
Retrieving the additional names is actually a lot faster than most developers expect, as the requests can often be pipelined over a single HTTP/2 connection. But if you're noticing performance problems, edit your question to show the code you use, the data you have, and the performance you're getting.
A common way to reduce the need to load additional documents is by duplicating data. For example, if you store the name and avatar of the user in each comment document, you won't need to look up the user profile every time you read a comment.
If you come from a background in relational databases, this sort of data duplication may be very unexpected. But it's actually quite common in NoSQL databases.
You will of course then have to consider how to deal with updates to the user profile, for which I recommend reading: How to write denormalized data in Firebase. While this is for Firebase's other database, the same concepts apply to Firestore. I also in general recommend watching Getting to know Cloud Firestore.
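A small sketch of the duplication described above, using plain objects in place of Firestore documents. The field names authorId, authorName and authorAvatar and both helper names are hypothetical.

```javascript
// Build a comment document that carries a copy of the author's display data,
// so rendering the comment needs no extra lookup in the Users collection.
function buildCommentDoc(user, text) {
  return {
    text,
    authorId: user.id,
    authorName: user.name,
    authorAvatar: user.avatar,
  };
}

// When a profile changes, every comment by that user has to be rewritten.
// This is the update cost that comes with denormalized data.
function applyProfileUpdate(comments, user) {
  return comments.map((comment) =>
    comment.authorId === user.id
      ? { ...comment, authorName: user.name, authorAvatar: user.avatar }
      : comment
  );
}

// Hypothetical sample data.
const alice = { id: "abcdefg", name: "Alice", avatar: "alice.png" };
const feedComments = [buildCommentDoc(alice, "Nice post!")];
const refreshedComments = applyProfileUpdate(feedComments, { ...alice, name: "Alice B." });
console.log(refreshedComments[0].authorName); // the copied name follows the profile
```

The trade-off is cheaper reads in exchange for more writes whenever a profile changes, which usually pays off when comments are read far more often than profiles are edited.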
I have tried several solutions, but I think this one is the best for this case:
When a user posts a comment, add the feed/post id to an array field named discussions in the user document.
When a user loads a feed/post, get all users whose discussions array contains that feed/post id (using array-contains).
It's efficient and costs fewer reads.
Several questions have been asked about this topic, but I can't find one that answers my question. As described here, there is no clear explanation of whether the minimum charge applies only to query.get() or to real-time listeners as well. Quoted:
There is a minimum charge of one document read for each query that you perform, even if the query returns no results.
The reason I am asking this question, even though the answer may seem obvious to some, is the phrase *for each query that you perform* in that statement, which could refer only to a one-time trigger, e.g. with the get() method.
Scenario: If 10 users are listening to changes in a collection with queries, i.e. query.addSnapshotListener(), and a change occurs in one document that matches the query filter of only two users, are the other eight charged one read too?
Database used: Firestore
In this scenario I would say no, the other eight would not be counted as reads because the documents they are listening to have not been updated or have not been added/removed from that collection based on their filters (query params). The reads aren't based on changes to the collection but rather changes to the stream of documents you are specifically listening to. Because that 1 document change was not part of the documents that the other 8 users were listening to then there is no new read for them. However, if that 1 document change led to that document now matching the query filters of those other 8, then yes there would be 8 new reads for those users. Hope that makes sense.
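The scenario can be modelled with a few lines of code. This is only a rough model of the reasoning in the answer above, not of Firestore's full billing rules: it looks at a single changed document and at listeners whose filter matches it after the change. The listener objects and the category field are hypothetical.

```javascript
// Return the users whose listener would receive (and be billed for) the
// changed document, i.e. those whose query filter matches it.
function listenersBilledForChange(activeListeners, changedDoc) {
  return activeListeners
    .filter((listener) => listener.matches(changedDoc))
    .map((listener) => listener.userId);
}

// Ten hypothetical listeners: users 1 and 2 filter on "music",
// the other eight filter on "sports".
const tenListeners = Array.from({ length: 10 }, (_, i) => ({
  userId: `user${i + 1}`,
  matches: (doc) => doc.category === (i < 2 ? "music" : "sports"),
}));

const billedUsers = listenersBilledForChange(tenListeners, {
  id: "doc1",
  category: "music",
});
console.log(billedUsers); // user1 and user2 only
```

In this model the other eight listeners see no change in their result set, so no read is counted for them.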
It's also worth noting that enabling offline persistence via the SDK, together with Firestore's caching, helps limit reads, as does using a singleton Observable that multiple instances in your app subscribe to, as opposed to opening multiple streams of the same query throughout your app. This doesn't apply to the question directly but, being in the same vein, it's worth mentioning.
I need an optimal way to store a lot of individual fields in firestore. Here is the problem:
I get JSON data from some API. It contains a list of users. I need to tell if those users are active, i.e. have been online in the past n days.
I cannot query each user in the list from the api against firestore, because there could be hundreds of thousands of users in that list, and therefore hundreds of thousands of queries and reads, which is way too expensive.
There is no way to use a list as a map for querying as far as I know in firestore, so that's not an option.
What I initially did was have a cloud function go through and find all the active users maybe once every hour, and place them in firebase realtime database in the structure:
activeUsers{
uid1: true
uid2: true
uid3: true
etc...
}
and every time I need to check which users are active, I get all fields under activeUsers (which is constrained to a maximum of 100,000 fields, approx. 3~5 MB).
I was going to use that as my final mechanism, but I just realised that Firebase Realtime Database charges for the amount of bandwidth used, not the number of reads. Therefore it could get very expensive doing this over and over whenever a user makes this request. And I cannot query every single result from the Realtime Database because, while it does not charge per read (I think), it would be very slow to carry out hundreds of thousands of queries.
Now I have decided to use Cloud Firestore as my final hope, since it charges primarily for the number of reads and writes as opposed to data downloaded and uploaded. I am going to use Cloud Functions again to check the active users every hour, and I'm going to try to figure out the best way to store that data within a few documents. I was thinking of 10,000 fields per document with all the active users; then, when a user needs to get the active users, they get all the documents (that would be 10 if there are 100,000 total active users) and map those client side to filter the active users.
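The plan described above could be sketched as follows, with plain objects standing in for Firestore documents. The helper names shardActiveUids and filterActive are hypothetical, and the sketch does not check Firestore's per-document size and index limits, which would need to be verified for 10,000 fields per document.

```javascript
// Split the active user IDs into "documents" of at most perDoc fields each,
// using the same { uid: true } shape as the activeUsers structure above.
function shardActiveUids(uids, perDoc = 10000) {
  const docs = [];
  for (let i = 0; i < uids.length; i += perDoc) {
    const fields = {};
    for (const uid of uids.slice(i, i + perDoc)) {
      fields[uid] = true;
    }
    docs.push(fields);
  }
  return docs;
}

// Client side: merge all shard documents into one Set, then keep only the
// users from the API list that appear in it.
function filterActive(apiUids, shardDocs) {
  const active = new Set(shardDocs.flatMap((doc) => Object.keys(doc)));
  return apiUids.filter((uid) => active.has(uid));
}

// Hypothetical data, with a small shard size to keep the example readable.
const activeUids = Array.from({ length: 25 }, (_, i) => `uid${i + 1}`);
const shardDocs = shardActiveUids(activeUids, 10);
const activeFromApi = filterActive(["uid3", "uid99", "uid25"], shardDocs);
console.log(shardDocs.length, activeFromApi); // 3 shards; uid3 and uid25 are active
```

With this layout a full check costs one read per shard document, regardless of how many users the API list contains.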
So I really have 2 questions. 1, If I do it this way, what is the best way to store that data in firestore, is it the way I suggested? And 2, is there an all around better way to be performing this check of active users against the list returned from the api? Have I got it all wrong?
You could use firebase storage to store all the users in a text file, then download that text file every time?
Well this is three years old, but I'll answer here.
What you have done is not efficient and not a good approach. What I would do is as follows:
Make a separate collection for all active users, and store each active user's unique field, such as the ID, there.
Then query that collection, and update it when needed.