I'm creating a React/Firebase website that has a collection of documents, each of which contains a rating from 1 to 10. Every document has an author attached. The average rating of all of an author's documents should be calculated and presented.
Here are my current two solutions:
Calculate the average from all the documents with the same author
Add the statistic to the author himself, such that every time the author adds a new document it will update his statistic
My reasoning for the second one is that the website doesn't have to calculate the average rating each time it's requested. Would this be a bad idea, or is there no real problem with reading all the documents and calculating the average on demand in the first place?
Your second approach is in fact a best practice when working with NoSQL databases. If you calculate the average on demand across a dynamic number of documents, the cost of that operation will grow as you add more documents to the database.
For this reason you'll want to calculate all aggregates on write, and store them in the database. With that approach looking up an aggregate value is a simple read.
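As a rough illustration of calculating the aggregate on write, here is a minimal sketch of the arithmetic only. The `AuthorStats` shape and the `addRating` name are assumptions made for this example, not part of any Firebase API. In a real app the update would run inside a Firestore transaction or a Cloud Function so that concurrent writes don't overwrite each other.

```typescript
// Hypothetical shape of the aggregate stored on each author document.
interface AuthorStats {
  ratingCount: number;   // how many rated documents the author has
  ratingSum: number;     // sum of all ratings, kept so the average stays exact
  averageRating: number; // the value the website reads and displays
}

const emptyStats: AuthorStats = { ratingCount: 0, ratingSum: 0, averageRating: 0 };

// Returns the updated aggregate after the author adds a document with `rating`.
// Only the stored aggregate is touched; no other documents are read.
function addRating(stats: AuthorStats, rating: number): AuthorStats {
  if (rating < 1 || rating > 10) {
    throw new RangeError(`rating must be between 1 and 10, got ${rating}`);
  }
  const ratingCount = stats.ratingCount + 1;
  const ratingSum = stats.ratingSum + rating;
  return { ratingCount, ratingSum, averageRating: ratingSum / ratingCount };
}

// Example: three documents rated 8, 6 and 10 give an average of 8.
let demoStats = emptyStats;
for (const rating of [8, 6, 10]) {
  demoStats = addRating(demoStats, rating);
}
```

Storing the sum and count next to the average keeps each update constant-time, whatever the number of documents the author already has.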
Also see:
The Firebase documentation on aggregation queries
The Firebase documentation on distributed counters
How to get a count of number of documents in a collection with Cloud Firestore
Leaderboard ranking with Firebase
Related
I'm creating my first ever project with Firebase, and I've come to the point where I need some statistics based on user input. I know Firebase (and NoSQL databases in general) is not ideal for statistics, but it works for me in every other case, so I would like to give it a try.
What I have:
I'm working on an application where people can invite a friend to work for their company, so I have a collection of "referrals" where the ID of each referral is the UserID of the user the referral belongs to, and under it there is a subcollection named "items" where the data is stored.
What my data looks like:
Each item has this data:
applicant
appliedDate
position (includes positionId and the department the position belongs to)
status
What I want is to let users generate statistics based on:
date range
status
department
What I was thinking about:
It's probably not the best idea to let Firebase iterate over all referrals every time a user makes a request, as that may get really expensive. What I was thinking of is using Cloud Functions to update the statistics whenever something changes, e.g. when a new applicant applies I would increase the total counter by one, and do the same for the counter of the specific department. However, I feel like this may work for totals or for predefined queries such as "LAST MONTH", but it starts to get tricky once I don't know which dates the user will select.
Any idea how I can design something like this?
Thanks a lot!
What you're considering is the idiomatic approach to calculating aggregates in Firestore, and in most NoSQL databases. If you follow this pattern, Firestore is quite well suited to storing statistics.
It's ad-hoc statistics, like the unknown date range, that are trickier. Usually this comes down to storing the right values so that you no longer need to read an unknown number of documents to calculate a value.
For example, if you store counters for the statistics per month, week, day and hour, you can satisfy a wide range of date ranges with a limited number of read operations. You may need to read multiple documents, but the number of documents to read depends on the range, and not on the total number of documents in the database.
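To make the per-period counters above more concrete, here is a minimal sketch that covers a date range with the fewest month and day buckets. The key formats (`YYYY-MM` for a month counter, `YYYY-MM-DD` for a day counter) and the function names are assumptions for illustration; the counters themselves would be documents maintained on write, represented here by a plain in-memory object.

```typescript
// Zero-pads a month or day number to two digits.
function twoDigits(n: number): string {
  return (n < 10 ? "0" : "") + n;
}

// Returns the counter keys needed to cover [start, end] inclusive (UTC days).
// A whole month inside the range becomes one "YYYY-MM" key; days in partially
// covered months become individual "YYYY-MM-DD" keys.
function bucketKeys(start: Date, end: Date): string[] {
  const keys: string[] = [];
  let cursor = new Date(Date.UTC(start.getUTCFullYear(), start.getUTCMonth(), start.getUTCDate()));
  const last = Date.UTC(end.getUTCFullYear(), end.getUTCMonth(), end.getUTCDate());
  while (cursor.getTime() <= last) {
    const year = cursor.getUTCFullYear();
    const month = cursor.getUTCMonth();
    const day = cursor.getUTCDate();
    const endOfMonth = Date.UTC(year, month + 1, 0); // day 0 of next month = last day of this one
    if (day === 1 && endOfMonth <= last) {
      keys.push(`${year}-${twoDigits(month + 1)}`);
      cursor = new Date(Date.UTC(year, month + 1, 1)); // jump to the next month
    } else {
      keys.push(`${year}-${twoDigits(month + 1)}-${twoDigits(day)}`);
      cursor = new Date(Date.UTC(year, month, day + 1)); // advance one day
    }
  }
  return keys;
}

// Sums the stored counters for the given keys; a missing counter counts as 0.
function sumCounters(keys: string[], counters: Record<string, number>): number {
  return keys.reduce((total, key) => total + (counters[key] ?? 0), 0);
}

// 30 Jan 2024 to 2 Mar 2024 spans 33 days but needs only 5 counter reads.
const demoKeys = bucketKeys(new Date(Date.UTC(2024, 0, 30)), new Date(Date.UTC(2024, 2, 2)));
```

The number of keys depends only on the shape of the range, not on how many referrals exist. Adding week or hour buckets would follow the same idea.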
Of course, for the most flexible ad-hoc querying, you may still want to consider another solution, such as BigQuery, which was made precisely for this use-case.
If you have a lot of entries you would usually have a document that sums the price for each transaction. I could read that document to show the overall value in the overview.
My problem is that the price (price * amount) for each entry changes every 5 min. Because of that I can't save this sum value of all documents.
In order to calculate I would need the price I bought at that point in time and the amount.
That's basically what I'm saving for each transaction, so saving it again somewhere else makes no sense.
I can't just do a document with one value that I can update all the time because each transaction is bought for a different price and a different amount of this item.
I could have thousands of transactions and the 20k Firestore limit would not be enough for that summary document.
The view for the transactions shows the latest 50 and is paged, that's fine for Firestore but I can't read all documents for the overall sum price.
Is there no other option than reading all transactions? I thought maybe of using Firebase Cloud Storage and saving a document there with all transactions, just for the summary page.
I'm building an app which works like this: the user of the app is the manager of a team, he/she asks some questions to the team and collects the data in the app. Monthly, a report is generated by using this data. There is no use case/scenario where user will need to see all data at once, i.e. not filtered by month.
That being said, I thought about modelling the data this way:
- persons/{personId}:
- name
- answersByPerson/{personId}:
- personName
- byMonth/{YYYYMM}: (using month as key)
- month
- collectedAnswers/{uuid}:
- answer_to_q1 ... (these are all yes or no questions)
- answer_to_qn
- aggregationsByPerson/{personId}: (this should be computed by cloud function)
- month
- byMonth/{YYYYMM}: (also using month as key)
- sum_q1... (count amount answered with 'yes')
- sum_qn
- reportByPerson/{personId}:
- personName
- month
- score (computed from aggregations)
So I have these questions:
Is it bad for me to use year/month as keys to my documents? (I'd make sure in my app to overwrite data if the key exists)
Is it bad for me to reuse the personId as keys in answersByPerson collection? The idea is that I wouldn't have to fetch the persons collection, nor filter the answer collection by personId.
Is it overengineering for me to use monthly buckets? I thought that maybe I'd save some money if I fetched collection('answersByPerson').doc('$personId').collection($month) instead of fetching collection('answersByPerson').doc('$personId').where(...).
Also, would it make sense for me to put the aggregations inside the answers collection? Would I be able to update them without using a cloud function, or could this lead to issues with synchronization?
edit: I've searched about this and it seems that the term "bucketing" is not that common, I've taken it from this article.
Firestore charges for the number of documents read and the bandwidth consumed; it explicitly does not charge for the number of documents it has to search through. If you can write a query to get exactly the documents you need from the combined collection, then the cost will be exactly the same between these two operations. More uniquely: so will the performance, as Firestore's performance depends only on the amount of data you retrieve and not on the size of the collection.
I have been using Postgres to store time-series sensor data, but I am weighing the cost of using Firestore because I prefer its serverless nature. My only concern is the cost of Firestore, because I am paying for every read. I want to be able to display this sensor information in my web app. Currently I am taking data every 10 seconds and there are over 400 sensor points (400 columns per row in my Postgres table).
At the moment, if a user queries for a week's worth of data, that's about 60,000 rows, but I optimise it by taking every Nth value to "feather" the data. By taking every 20th row, for example, I reduce the returned data to 3,000 rows, which is manageable, and the chart still shows a clear trend.
I want to be able to do this in Firestore to save costs, because if a user queries for a week's data, I am paying for 60,000 document reads, and I can't display all of those data points in the web app anyway. I have tried searching for ways to query Firestore for every Nth row of data, but haven't found any concrete solutions.
Does anybody have any recommendation how I can optimise my Firestore costs for time series data or perhaps any other cheap serverless methods to manage this data?
Firestore doesn't offer any way to "feather" data from queries, as you say. What you could do instead is put an integer in each document that describes its "Nth" value, then query for only those "N" that you want.
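One way to read that suggestion is sketched below. It assumes each reading gets a running sequence number at write time and stores `seq % 20` in a field, here called `sampleSlot`; both the field name and the fixed stride of 20 are assumptions for this example. The `feathered` function is an in-memory stand-in for an equality filter on that field, so only the matching documents would be read and billed.

```typescript
// Hypothetical document shape for one sensor reading.
interface Reading {
  seq: number;        // running sequence number assigned when the reading is written
  sampleSlot: number; // seq modulo the stride, stored so it can be filtered on
  value: number;
}

// Assumed stride: keep every 20th reading when charting a long time range.
const STRIDE = 20;

// Builds the document to store, tagging it with its slot at write time.
function tagReading(seq: number, value: number): Reading {
  return { seq, sampleSlot: seq % STRIDE, value };
}

// In-memory stand-in for a query that filters on sampleSlot == 0.
function feathered(readings: Reading[]): Reading[] {
  return readings.filter((reading) => reading.sampleSlot === 0);
}

// A week at one reading every 10 seconds is roughly 60,000 readings.
const weekOfReadings: Reading[] = [];
for (let seq = 0; seq < 60000; seq++) {
  weekOfReadings.push(tagReading(seq, Math.sin(seq / 1000)));
}
const chartPoints = feathered(weekOfReadings); // 3,000 readings
```

The trade-off is that the stride is fixed when the data is written. Supporting several zoom levels would mean storing more than one such field, or a level number per document.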
I'm wondering what the better structure for my Firestore database is.
I want to create some sort of appointment manager for employees, where I can show every employee their appointments for a given date. I have thought of these two options:
Option 1:
Every employee has a collection Appointments where I save all the upcoming appointments. The appointment documents would have a column date.
When I want to load all appointments for a date I would have to query all appointments by this date.
Option 2:
Every employee has a collection Workdays with documents for each day. These workday documents would have the column date. And then a collection with Appointments where I save the appointments for a workday.
When I want to load all appointments, I would have to query the Workdays collection for the correct date and then load all its Appointments.
I expect an average workday to contain 10-20 appointments. And let's say I save appointments for the next 30 days. For option 1, I would then have to query 300-600 documents down to 10-20.
For option 2 I would have 30 documents and query them for 1 document, then load around 10-20 documents.
So in option 2 I would have to query fewer documents, but I would have to wait until the query is finished and then load 10-20 further documents. While for option 1, I would have to query more documents but once this query is finished I wouldn't have to load any more documents.
I'm wondering which option is faster for my use case - any thoughts?
Documents are not necessarily tabular (columnar). Keep it simple, follow their documentation, choose a data structure, and do not overthink optimizing search. Leave query optimization to the Firebase platform/team, as there are several search approaches which might be implemented, depending on the type of data you are querying for. Examples include (source: Wikipedia):
Dijkstra's algorithm, Kruskal's algorithm, the nearest neighbour algorithm, and Prim's algorithm.
Again, provided you basically follow their data structure guidelines, the optimal search approach should be baked into the Firebase/Firestore platform and may be optimized by them when possible. In short, the speed of the compute platform will amaze you. Focus on higher-level tasks relating to your particular app.
If the total number of documents read in each case is the same, the faster option will be the one that reduces the number of round trips between the client and server. So, fewer total queries would be better. The total number of documents in the collection is not going to affect the results very much.
With Firestore, performance of queries scales with the size of the result set (total number of documents read), not with the total size of the collection.
The first option is rather straightforward, and is definitely the way you'd do it with a relational database. The date column could become a natural foreign key to any potential workday table, assuming one is even needed.
The second option is more complicated because there are three data cases:
Workday does not exist
Workday does exist but has no appointments in the list
Workday exists and has appointments
In terms of performance, they are not likely to be very different, but if there is a significant gap, I'd bet on option 1 being more efficient.