Suggestions for organizing data in Firestore [closed] - firebase

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more.
Closed 3 months ago.
Does anyone have good ideas about what kind of data model makes sense in Firestore for time-dependent data?
I have a lot of event data that I would like to store in Firestore and then run an analysis on.
Each event has a timestamp, and I would like to run the aggregated analysis over, for example, 1 day of data, 7 days of data, X days of data, 1 month, X months, etc.
How should this be set up in Firestore? 7 days of event data is already a lot of data, and I can't return it all to the client and do the analysis there.
If I aggregate some predefined set of days beforehand in Firestore, the analysis is then locked to only those ranges and I can't choose an arbitrary number of days.
I would also need to keep updating the aggregated data every time there is new data.
Any help much appreciated!

As I understand it, you're looking to perform a query similar to:
SELECT hits, COUNT(*) FROM event_type_api WHERE start_date > TODAY - X GROUP BY hits
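For comparison, here is roughly what that SQL computes, written as plain JavaScript over an in-memory array of events. It is only a sketch: the field names hits and start_date are taken from the SQL above, and the sample events are made up. Note that it needs every matching event downloaded first, which is exactly the cost the question is trying to avoid.

```javascript
// Hypothetical event shape: { hits: string, start_date: Date }.
// Field names mirror the SQL above; adjust them to your own documents.
function countHitsSince(events, days, now = new Date()) {
  const cutoff = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  const counts = new Map();
  for (const event of events) {
    // WHERE start_date > TODAY - X
    if (event.start_date > cutoff) {
      // GROUP BY hits with COUNT(*)
      counts.set(event.hits, (counts.get(event.hits) || 0) + 1);
    }
  }
  return counts;
}

const now = new Date("2024-01-10T00:00:00Z");
const events = [
  { hits: "login", start_date: new Date("2024-01-09T12:00:00Z") },
  { hits: "login", start_date: new Date("2024-01-08T12:00:00Z") },
  { hits: "search", start_date: new Date("2024-01-09T08:00:00Z") },
  { hits: "login", start_date: new Date("2023-12-01T00:00:00Z") }, // older than 7 days
];
console.log(countHitsSince(events, 7, now)); // login: 2, search: 1
```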
Firestore is a NoSQL database, but that doesn't mean you cannot find out how many documents a query matches. You can't do it with SQL, but you can count them. Reading every document in a collection just to learn how many there are is costly, which is why you should use the count() aggregation query, which does the counting on the server without downloading the documents. There is also no GROUP BY in Firestore. However, you can achieve almost the same thing.
Assuming you create a collection called "hits" whose documents have a timestamp field named "timestamp", you can perform the following query:
val since = Timestamp(Date(System.currentTimeMillis() - x * 24L * 60 * 60 * 1000)) // x days ago
val queryByTimestamp = db.collection("hits").whereGreaterThan("timestamp", since)
If you want to know how many documents the query returns, you need to call count() like this:
queryByTimestamp.count().get(AggregateSource.SERVER)
    .addOnSuccessListener { snapshot -> val numberOfDocuments = snapshot.count }
The last thing is related to grouping. Firestore doesn't support grouping (there is no GROUP BY). However, there are two solutions for this use case. You can get all the documents in the hits collection and then group them in your application code by whatever fields you need, or you can create separate collections of pre-grouped documents to satisfy your needs. The latter is the recommended approach. And remember that duplicating data is quite a common practice when it comes to NoSQL databases.
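A minimal sketch of the pre-grouped approach, assuming one aggregate document per day (a hypothetical layout: a document keyed by a date such as "2024-01-09" that holds a count per event type). Each new event updates only its own day's counts, and an arbitrary X-day range is answered by reading and summing X daily documents instead of every event. A JavaScript Map stands in for the Firestore collection here; with the real SDK the per-event update would typically be an increment on that day's document.

```javascript
// Hypothetical layout: one pre-grouped "document" per UTC day, keyed "YYYY-MM-DD",
// holding a count per event type, e.g. { login: 2, search: 1 }.
// This Map stands in for the Firestore collection of daily documents.
const dailyBuckets = new Map();

function dayKey(date) {
  return date.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
}

// Called once per new event; in Firestore this would be an increment on that day's document.
function recordEvent(type, timestamp) {
  const key = dayKey(timestamp);
  const bucket = dailyBuckets.get(key) || {};
  bucket[type] = (bucket[type] || 0) + 1;
  dailyBuckets.set(key, bucket);
}

// Sums the buckets for the `days` days ending at `endDate` (inclusive).
// Any number of days works, and it costs `days` bucket reads rather than one read per event.
function totalsForLastDays(days, endDate) {
  const totals = {};
  for (let i = 0; i < days; i++) {
    const day = new Date(endDate.getTime() - i * 24 * 60 * 60 * 1000);
    const bucket = dailyBuckets.get(dayKey(day)) || {};
    for (const [type, count] of Object.entries(bucket)) {
      totals[type] = (totals[type] || 0) + count;
    }
  }
  return totals;
}

recordEvent("login", new Date("2024-01-09T12:00:00Z"));
recordEvent("login", new Date("2024-01-08T09:30:00Z"));
recordEvent("search", new Date("2024-01-02T18:00:00Z"));

const end = new Date("2024-01-09T00:00:00Z");
console.log(totalsForLastDays(1, end)); // { login: 1 }
console.log(totalsForLastDays(7, end)); // { login: 2 }
console.log(totalsForLastDays(8, end)); // { login: 2, search: 1 }
```

If month-scale ranges are common, a second set of monthly documents maintained the same way could cut the number of reads further.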

Related

Firestore data structure suggestion for a survey app [duplicate]

This question already has answers here:
Firestore: How to get random documents in a collection
(15 answers)
Closed 1 year ago.
My current DB structure: I have a collection that stores all the survey question documents, and under each question document there is an Answers subcollection that stores all the users who answered the question. The challenging part is how to randomly load 8 questions that a specific user has not answered, without querying the entire question collection. What is the least costly approach? Any suggestions are helpful. Thanks
The best option you have is to store all the question IDs in your application in an array field of a single document. If you are worried about the 1 MiB document size limit, consider sharding the IDs over multiple documents.
Every time a user answers 8 questions, add those IDs to an array in the user's document.
The challenging part is that how do I randomly load 8 questions that are not answered by a specific user, without query the entire question collection?
To load 8 random questions, all you have to do is download both arrays and remove from the question IDs all the IDs the user has already answered. This leaves you with an array of only the question IDs the user hasn't answered. Pick 8 of those IDs, add them to the user's document, and so on.
Remember that you also need to keep the data in sync: each time you add a new question, add its ID to the array as well, and remove the ID when you delete a question.
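The steps above can be sketched in plain JavaScript as follows. allQuestionIds and answeredIds are hypothetical names for the two arrays you downloaded; the function removes the answered IDs and then picks up to 8 of the remaining ones at random with a partial Fisher-Yates shuffle. Loading the chosen question documents and adding their IDs to the user's array (for example with arrayUnion) is left to your Firestore code.

```javascript
// allQuestionIds: array from the (possibly sharded) "all question IDs" document.
// answeredIds: array from the user's document.
// Both names are hypothetical; load them however your app reads those documents.
function pickUnansweredQuestions(allQuestionIds, answeredIds, count = 8, random = Math.random) {
  const answered = new Set(answeredIds);
  // filter() returns a new array, so the shuffle below never mutates allQuestionIds.
  const candidates = allQuestionIds.filter((id) => !answered.has(id));
  const n = Math.min(count, candidates.length);
  // Partial Fisher-Yates shuffle: only the first n positions need to be randomized.
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(random() * (candidates.length - i));
    [candidates[i], candidates[j]] = [candidates[j], candidates[i]];
  }
  return candidates.slice(0, n);
}

const all = ["q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11", "q12"];
const answered = ["q2", "q5", "q7"];
const next = pickUnansweredQuestions(all, answered);
console.log(next); // 8 distinct IDs, none of them q2, q5 or q7
```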
You should store the IDs of the questions the user has answered in the user's document. Then you can query enough documents whose IDs don't match those IDs by using a not-in where clause (note that not-in accepts only a limited number of values, so this works best while the answered list is small). https://firebase.google.com/docs/firestore/query-data/get-data#get_multiple_documents_from_a_collection

Cosmos DB data modelling to optimize search [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I watched this video on data modelling in Cosmos DB.
In the video, it is explained that if you can model your data such that your most common queries are in-partition queries, you can minimize RUs, which in turn minimizes cost and maximizes performance.
The example used in the video is a blogging system. They showed that by moving things around such that Blog Posts and Comments are stored as separate entities in the same collection, all partitioned by blogId, they could achieve a low RU charge for a common query.
They then showed that searching for all blog posts by a specific user, being a cross-partition query, is very expensive. So they duplicate all blog post data and add each blog post as a separate entity to the users collection, which is already partitioned by userId. Searching for posts by a user is now cheap. The argument is that storage is much cheaper than CPU time, so this is a fine trade-off.
My question is: do I continue to follow this pattern when I want to make more things efficiently searchable? For example, I want to be able to search on blog topic (of which there could be many per blog post), a discrete blog rating, and so on.
I feel like extending this pattern for each search term is unsustainable. In these cases, do I just have to live with high RU searches or is there some clever way of making things efficient?
This essentially comes down to whether the cost of using change feed to copy data from one container to another is less than the cost of doing cross-partition queries. Answering that requires knowing your application's access patterns and measuring the average cost of these queries versus the cost of using change feed to make another copy. Change feed consumes 2 RU/s when it polls the container, then 1 RU for each read of 1 KB or less from the source container and ~8 RU for each insert of 1 KB or less into the target container, depending on your index policy. Multiply that by the rate at which data is inserted or updated, then calculate it per day or per month to compare costs.
If what you're looking for is free-text search over your data, you may want to look at Azure Search. This is simpler than the change feed approach, but Azure Search can be quite expensive as well.
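A rough back-of-the-envelope version of that calculation in JavaScript. It uses the per-operation figures quoted above, treats polling as a constant 2 RU/s (an assumption), and takes the average cross-partition query charge as an input you would measure yourself; the workload numbers in the example are made up.

```javascript
// Rough daily RU comparison using the figures quoted above.
// Assumptions: polling is a constant 2 RU/s, each insert/update is copied once,
// and avgQueryRu is the RU charge you measured for one cross-partition query.
const SECONDS_PER_DAY = 24 * 60 * 60;

function changeFeedRuPerDay(writesPerDay, itemSizeKb, insertRuPerKb = 8) {
  const kbUnits = Math.ceil(itemSizeKb); // charged per 1 KB "or less"
  const pollingRu = 2 * SECONDS_PER_DAY;
  const readRu = writesPerDay * kbUnits * 1; // read from the source container
  const insertRu = writesPerDay * kbUnits * insertRuPerKb; // insert into the target container
  return pollingRu + readRu + insertRu;
}

function crossPartitionRuPerDay(queriesPerDay, avgQueryRu) {
  return queriesPerDay * avgQueryRu;
}

const copyCost = changeFeedRuPerDay(100000, 1);
const queryCost = crossPartitionRuPerDay(50000, 30);
console.log(copyCost, queryCost); // 1072800 1500000
console.log(copyCost < queryCost ? "copy via change feed" : "keep cross-partition queries");
```

Divide the daily totals by 86,400 if you want to compare them with provisioned RU/s.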

Firebase Realtime Database: Good Practice (Filter, Ordering, etc..) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I am a beginner with Firebase (and with NoSQL).
I would like to learn good practices for filtering data.
For example, with this simple Realtime Database:
How would you go about filtering these posts?
Example 1: List all posts (without any filter):
firebase.database().ref('post/').once('value').then((snapshot) => {
});
Example 2: List all posts ordered by createdAt, limited to 3:
firebase.database().ref('post/').orderByChild('createdAt').limitToLast(3).once('value').then((snapshot) => {
});
Example 3: List all posts ordered by createdAt, ending at 1605972663986, limited to 3:
firebase.database().ref('post/').orderByChild('createdAt').endAt(1605972663986).limitToLast(3).once('value').then((snapshot) => {
});
But:
How can I get all posts of user "et6e1AKrhk2GwqjCAKUHK5Bjlgu2" ordered by "createdAt" and limited to 3?
How can I get all posts in categories [9, 12] ordered by "createdAt"?
How can I get all posts excluding categories [2, 4] ordered by "createdAt"?
Should I retrieve all the posts and then filter them myself? (Is that a good approach? If I have 100 million posts, what should I do?)
Sorry if my questions seem stupid, but I am a complete beginner and I don't know the good practices yet. (Currently I have over 1000 messages and I need to filter them, with pagination, based on the current user's settings.)
Thank you
Cloud Firestore charges per document read (the Realtime Database charges for the data you download), so reading 100 million posts at once wouldn’t be a good idea, as I’m sure you know.
The queries you asked about seem rather simple and should not be a problem. Firebase does have limitations on queries, though (for example, the Realtime Database can only order or filter on one property per query), and you should review the documentation.
When I started with Firebase I initially used the Realtime Database, but eventually switched to Cloud Firestore, as I found its querying with where conditions much more powerful. (Firestore also has some major limitations: https://firebase.google.com/docs/firestore/query-data/queries#query_limitations )
It may be necessary to add fields to your documents for the sole purpose of filtering and querying. Sometimes you will need to sort data client-side, which shouldn’t be a big deal.
I suggest you run tests to make sure you can query all your data properly, add any fields you need, and confirm the database fits your needs and its limitations before you dive in too deep. Your DB structure looks good!
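One way to apply the "add fields just for querying" idea to the first question (a sketch, not the only option): since the Realtime Database orders by one child per query, you could store a hypothetical combined field uid_createdAt and run orderByChild('uid_createdAt').startAt(uid + '_').endAt(uid + '_\uf8ff').limitToLast(3), with an ".indexOn" rule for that field. The JavaScript below emulates that query in memory so the logic can be checked, and does the category include/exclude filtering client-side on the page of results already downloaded. The sample posts are made up.

```javascript
// Hypothetical extra field "uid_createdAt" stored on every post purely for querying.
// Zero-padding the millisecond timestamp makes string order match numeric order.
function userCreatedKey(uid, createdAt) {
  return `${uid}_${String(createdAt).padStart(13, "0")}`;
}

// In-memory equivalent of:
//   ref('post/').orderByChild('uid_createdAt').startAt(uid + '_').endAt(uid + '_\uf8ff').limitToLast(3)
function latestPostsOfUser(posts, uid, limit = 3) {
  const start = `${uid}_`;
  const end = `${uid}_\uf8ff`;
  return posts
    .filter((p) => p.uid_createdAt >= start && p.uid_createdAt <= end)
    .sort((a, b) => (a.uid_createdAt < b.uid_createdAt ? -1 : a.uid_createdAt > b.uid_createdAt ? 1 : 0))
    .slice(-limit);
}

// Category include/exclude (e.g. [9, 12] or "not [2, 4]") done client-side.
function filterByCategory(posts, { include, exclude } = {}) {
  return posts.filter(
    (p) => (!include || include.includes(p.category)) && (!exclude || !exclude.includes(p.category))
  );
}

const posts = [
  { uid: "alice", createdAt: 1605972663986, category: 9 },
  { uid: "alice", createdAt: 1605972000000, category: 2 },
  { uid: "bob", createdAt: 1605972700000, category: 12 },
  { uid: "alice", createdAt: 1605971000000, category: 12 },
  { uid: "alice", createdAt: 1605970000000, category: 4 },
].map((p) => ({ ...p, uid_createdAt: userCreatedKey(p.uid, p.createdAt) }));

const latest = latestPostsOfUser(posts, "alice");
console.log(latest.map((p) => p.createdAt)); // [1605971000000, 1605972000000, 1605972663986]
console.log(filterByCategory(latest, { exclude: [2, 4] }).map((p) => p.category)); // [12, 9]
```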

how to query in flutter cloud firestore with multiple conditions(multiple where clauses)? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
widget.filteredRef= FirebaseFirestore.instance
.collection('users')
.doc(widget.uid)
.collection('incomes')
.where('date',isGreaterThanOrEqualTo: startTimeStamp,isLessThanOrEqualTo: endTimeStamp)
.where('amount',isGreaterThanOrEqualTo: amountFrom,isLessThanOrEqualTo: amountTo)
.snapshots();
Hi, I'm new to Flutter. I want to build a query that retrieves a user's income data within a specific range of dates and amounts, as a list, from Cloud Firestore. For example, dates from 01/01/2020 to 01/02/2020 with amounts from 0 to 1000. But using two where clauses, two orderBy calls, or one orderBy with one where clause does not work. Please tell me how to get the results as a stream.
Here is a link to my Firestore income document:
https://drive.google.com/file/d/1lAbF1t641nBK4hRn5zLT-4eNUQ-FyajW/view?usp=sharing
You will need to create an index in the Firebase Console itself. This is because Firebase basically stores long sorted arrays of the documents and does something like Query = CorrectIndex[start:end], which it cannot do if the data is not sorted. It creates such indexes for every individual field of the document, but not for combinations of fields (that would be too expensive for things you probably do not need). So you need to tell Firebase to build a correctly sorted index for every combination you want to use, so it can be fast when you start querying. If I am not mistaken, your error message already contains a link to your Firebase console. Click it, and you will be guided through the process.
The index that link sets up is, however, not completely correct: you will need to recreate it to use a "CollectionGroup" scope. Everything else can stay the same. Anyway, a good video series about Firebase is this one: https://www.youtube.com/watch?v=Ofux_4c94FI.
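A toy illustration of the "Query = CorrectIndex[start:end]" idea, in JavaScript. This is a simplified mental model, not how Firestore is actually implemented: each single-field index is an array of [value, docId] pairs sorted by value, and a range query on that field is one contiguous slice found with two binary searches. Dates are encoded as YYYYMMDD numbers purely for brevity, and the sample documents are made up.

```javascript
// First position whose value is >= `value`.
function lowerBound(index, value) {
  let lo = 0, hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// First position whose value is > `value`.
function upperBound(index, value) {
  let lo = 0, hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] <= value) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// "Query = CorrectIndex[start:end]": IDs of documents whose value is in [from, to].
function rangeQuery(index, from, to) {
  return index.slice(lowerBound(index, from), upperBound(index, to)).map(([, id]) => id);
}

const incomes = [
  { id: "a", date: 20200105, amount: 1500 },
  { id: "b", date: 20200110, amount: 300 },
  { id: "c", date: 20191220, amount: 200 },
  { id: "d", date: 20200120, amount: 800 },
];
// One single-field index per field, each sorted on its own values.
const dateIndex = incomes.map((d) => [d.date, d.id]).sort((x, y) => x[0] - y[0]);
const amountIndex = incomes.map((d) => [d.amount, d.id]).sort((x, y) => x[0] - y[0]);

console.log(rangeQuery(dateIndex, 20200101, 20200201)); // [ 'a', 'b', 'd' ]
console.log(rangeQuery(amountIndex, 0, 1000)); // [ 'c', 'b', 'd' ]
```

Each range on its own is a single slice, but the documents matching both ranges (b and d) are not one slice of either index, which is why a query that combines the two fields needs more than the automatic single-field indexes.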

Using Firestore as a Cache for SQL Aggregation [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I'm not sure if this is the place to ask this, but I have a best practices question.
I have a dashboarding service fed by Salesforce data that displays the number of Task X performed this week (X being Opportunities Closed - Won, Leads Created, etc.).
Currently, the data is pulled regularly and stored in a SQL database, which is exposed through a REST API that the client app calls to get the aggregations between two dates; it will additionally be fed by webhook calls from Salesforce insert triggers.
I want to know if using a Firestore collection as a cache for aggregated SQL data is a good idea, or if there is a better approach. The benefits I see are reduced traffic on my SQL server and instant updates (if the "cache" (Firestore) is updated, the client's value updates instantly as well).
When data is pulled from SF or a new record is received via the Insert Trigger/Webhook, I can update the Firestore record and the client will receive the change immediately.
My idea for a Firestore document would be:
{
user: "123",
sfOwnerId: "124",
sfTaskType: "Opportunities Closed Won This Week",
count: 23
}
Is this a good idea? Is there a better one out there?
Thank you in advance!
Your strategy of storing the aggregated data is what the Firestore documentation suggests for aggregation, so I think it's a pretty solid idea.
An alternative strategy would be to store only the raw Salesforce data in Firestore as it comes in, not aggregated, and let the client perform the aggregation. This can be achieved by subscribing to real-time updates on a query of the collection. In this setup, you would perform the calculation within the onSnapshot callback (assuming you're using the Web SDK).
The advantage here is a possible performance improvement, since the aggregation approach in the documentation relies on Cloud Functions, and Cloud Functions often suffer from "cold start" latency.
Note: Several of the recommendations in this document center around what is known as a cold start. Functions are stateless, and the execution environment is often initialized from scratch, which is called a cold start. Cold starts can take significant amounts of time to complete. It is best practice to avoid unnecessary cold starts, and to streamline the cold start process to whatever extent possible (for example, by avoiding unnecessary dependencies).
Source
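A minimal sketch of the client-side alternative described in this answer: the aggregation itself is a pure function you could call from the onSnapshot callback, for example onSnapshot(q, (snap) => render(countByTaskType(snap.docs.map((d) => d.data()), "124"))) with the modular Web SDK. The record fields mirror the document shape from the question; render, the query q, and the sample records are hypothetical.

```javascript
// Counts raw Salesforce records per task type, optionally for a single owner.
// Intended to run inside the onSnapshot callback on every update.
function countByTaskType(records, ownerId) {
  const counts = {};
  for (const r of records) {
    if (ownerId !== undefined && r.sfOwnerId !== ownerId) continue;
    counts[r.sfTaskType] = (counts[r.sfTaskType] || 0) + 1;
  }
  return counts;
}

const records = [
  { sfOwnerId: "124", sfTaskType: "Opportunity Closed Won" },
  { sfOwnerId: "124", sfTaskType: "Lead Created" },
  { sfOwnerId: "124", sfTaskType: "Lead Created" },
  { sfOwnerId: "999", sfTaskType: "Lead Created" },
];
console.log(countByTaskType(records, "124")); // { 'Opportunity Closed Won': 1, 'Lead Created': 2 }
```

Keep in mind that with this approach every client downloads every raw record, and each document read is billed, so it suits small collections better than large ones.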
