Need inputs for database structure in Firebase

1. My question concerns the Firebase database structure for the requirement below, and how to normalize the data.
I will have about 0.5M member records, with functions classified under team categories, identified across multiple states -> regions -> zones.
2. Next, when a new member fills in the form, the region field should auto-populate based on the chosen state. The same goes for zones.
As a newbie to Firebase, I plan to create a key_id for each state, and to generate key_ids for regions with the push method as well. Will this help in addressing most operational queries on members and their functions across states / regions / zones?
3. Will normalizing the data by districts / members help?
I plan to duplicate the state, region and team category attributes in the members data, to get an efficient structure and reduce query time over a vast number of records.
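One way to picture the structure described above is the following sketch, which uses plain Python dictionaries in place of a Realtime Database JSON tree. Every key name (`locations`, `members`, `regionTeam`, etc.) is hypothetical, and readable placeholders stand in for push IDs. It shows the lookups that could auto-populate the region and zone fields, and a member record that duplicates the state, region and team-category attributes so members can be filtered without extra reads.

```python
# Hypothetical Realtime Database tree expressed as nested dicts.
# Readable placeholders stand in for push IDs such as "-Nx3...".
db = {
    "locations": {
        "state_1": {
            "name": "State One",
            "regions": {
                "region_1": {"name": "North", "zones": {"zone_1": "Zone A", "zone_2": "Zone B"}},
                "region_2": {"name": "South", "zones": {"zone_3": "Zone C"}},
            },
        },
    },
    "members": {},
}


def regions_for_state(state_id):
    """Return {regionId: regionName}, e.g. to fill the region field once a state is picked."""
    regions = db["locations"][state_id]["regions"]
    return {region_id: region["name"] for region_id, region in regions.items()}


def zones_for_region(state_id, region_id):
    """Return {zoneId: zoneName} for the chosen region."""
    return dict(db["locations"][state_id]["regions"][region_id]["zones"])


def add_member(member_id, name, state_id, region_id, zone_id, team_category):
    """Store a member with the location and team attributes duplicated on the record."""
    state = db["locations"][state_id]
    region = state["regions"][region_id]
    db["members"][member_id] = {
        "name": name,
        "stateId": state_id,
        "stateName": state["name"],              # duplicated for display
        "regionId": region_id,
        "regionName": region["name"],            # duplicated for display
        "zoneId": zone_id,
        "zoneName": region["zones"][zone_id],    # duplicated for display
        "teamCategory": team_category,
        # Hypothetical combined key for "region + team" filters.
        "regionTeam": f"{region_id}_{team_category}",
    }


def members_where(field, value):
    """In-memory stand-in for orderByChild(field).equalTo(value)."""
    return [m for m in db["members"].values() if m[field] == value]
```

The combined `regionTeam` field is there because, as far as I know, a Realtime Database query can only order or filter on one child at a time, so combined filters usually need a pre-built combined field.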

Related

Firebase / NoSQL - How to aggregate data for statistics

I'm creating my first ever project with Firebase, and I've come to the point where I need some statistics based on user input. I know Firebase (and NoSQL databases in general) is not ideal for statistics, but it works for me in every other case, so I would like to give it a try.
What I have:
I'm working on an application where people can invite a friend to work for their company, so I have a collection of "referrals" where the ID of each referral document is the UserID of the user the referral belongs to, and each has a subcollection named "items" where the data is stored.
What my data looks like:
Each item has these fields:
applicant
appliedDate
position (which includes positionId and the department the position belongs to)
status
What I want is to let users generate statistics based on:
date range
status
department
What I was thinking about:
It's probably not a good idea to let Firebase iterate over all referrals whenever a user makes a request, as that could get really expensive. What I was thinking of is using Cloud Functions to update the statistics whenever something changes, e.g. when a new applicant applies I increase the total counter by one, and do the same for that department's counter. However, I feel like this works only for totals or for predefined queries such as "LAST MONTH"; once I don't know which dates the user will select, it starts to get tricky.
Any idea how I can design something like this?
Thanks a lot!
What you're considering is the idiomatic approach to calculating aggregates in Firestore, and in most NoSQL databases. If you follow this pattern, Firestore is quite well suited to storing statistics.
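A minimal sketch of that counter-on-write idea, assuming a single aggregate document with one counter map per dimension (`byStatus` and `byDepartment` are hypothetical names). In a real project the function body would run inside a Firestore write trigger in Cloud Functions and use atomic increments; here it just mutates an in-memory dict so the logic can be run on its own. Handling both the old and the new version of an item means a status change moves one count from the old status to the new one instead of counting the item twice.

```python
from collections import defaultdict

# Hypothetical aggregate document: a total plus one counter map per dimension.
stats = {"total": 0, "byStatus": defaultdict(int), "byDepartment": defaultdict(int)}


def on_item_written(before, after):
    """Adjust counters for a create (before is None), update, or delete (after is None).

    In a real deployment this would run in a Cloud Functions write trigger and use
    atomic increments; here it mutates the in-memory `stats` dict.
    """
    if before is not None:
        # Remove the old version's contribution.
        stats["total"] -= 1
        stats["byStatus"][before["status"]] -= 1
        stats["byDepartment"][before["position"]["department"]] -= 1
    if after is not None:
        # Add the new version's contribution.
        stats["total"] += 1
        stats["byStatus"][after["status"]] += 1
        stats["byDepartment"][after["position"]["department"]] += 1
```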
It's ad-hoc statistics, like the unknown date range, that are trickier. Usually this comes down to storing the right values, so that you don't need to read an unknown number of documents to calculate a value.
For example, if you store counters for the statistics per month, week, day and hour, you can satisfy a wide range of date ranges with a limited number of read operations. You may need to read multiple documents, but the number of documents to read depends on the range, and not on the total number of documents in the database.
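To make the month/day idea concrete, here is a rough sketch that keeps one counter per month (`"2024-02"`) and one per day (`"2024-02-11"`). In Firestore each key could be its own document, but the key format and helper names are assumptions for illustration. A date range is answered with month buckets for whole months and day buckets for the partial months at either end, so the number of reads depends on the range, not on how many referrals exist. Week and hour buckets are left out for brevity; separate counters per status or department would follow the same pattern.

```python
from datetime import date, timedelta


def month_key(d):
    return f"{d.year:04d}-{d.month:02d}"


def day_key(d):
    return d.isoformat()  # "YYYY-MM-DD"


def first_of_next_month(d):
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def record(counters, d, n=1):
    """Increment both the day bucket and the month bucket for an event on date d."""
    for key in (day_key(d), month_key(d)):
        counters[key] = counters.get(key, 0) + n


def count_range(counters, start, end):
    """Sum events with start <= date <= end.

    Whole months are read from month buckets, partial months from day buckets.
    Returns (total, keys_read) so you can see how many buckets were touched.
    """
    total, keys_read = 0, []
    d = start
    while d <= end:
        next_month = first_of_next_month(d)
        if d.day == 1 and next_month - timedelta(days=1) <= end:
            key, d = month_key(d), next_month        # the whole month is in range
        else:
            key, d = day_key(d), d + timedelta(days=1)
        keys_read.append(key)
        total += counters.get(key, 0)
    return total, keys_read
```

For 2024-01-31 to 2024-03-10 this reads 12 buckets (one day in January, the February month bucket and ten days in March), however many referral items exist.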
Of course, for the most flexible ad-hoc querying, you may still want to consider another solution, such as BigQuery, which was made precisely for this use-case.

Firestore data modelling for travel planner

I'm building a travel-planning app using Flutter. The app will help people plan their travel by offering a few options to choose from: cheapest, fastest, shortest, etc.
I'm quite new to Firebase and I need some advice on the data structure. I was thinking of including public transportation such as trains, and each train will have its own schedule. What is the best way to structure this schedule in Firestore, so that I can create a view that displays the train schedule?
If it were me, I would want to map out how I intend to interact with the application. I usually draw this out by hand, as I find it much quicker than trying to model something electronically. At the same time, I review similar apps and try to identify data that are missing. Start simple and build up the various data points that you want to include.
The main thing to note is that Cloud Firestore is a NoSQL, document-oriented database; there are no tables or rows. You store data in documents, which contain sets of key-value pairs and are organised into collections.
Once you have a basic structure, you can opt for one of the Firestore data structures and test it out:
Documents
Multiple collections
Subcollections within documents
Nested data
A typical use case: in a chat app, you might want to store a user's three most recently visited chat rooms as a nested list in their profile.
Subcollections
You can create collections within documents when you have data that might expand over time; for example, you might create a collection of users or messages within chat room documents.
Root-level collections
You can create collections at the root level to organise distinct data sets, for example one collection for users and another for rooms and messages.
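As a rough illustration for the train schedule, here is a sketch using plain Python in place of Firestore, with a hypothetical root-level `trains` collection and a root-level `departures` collection holding one document per train stop. The collection and field names are assumptions, not a recommended final schema. Times are zero-padded `HH:MM` strings so that string order matches time order, which is what an `orderBy` on that field would rely on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Departure:
    """One document in a hypothetical root-level `departures` collection."""
    train_id: str
    station_id: str
    departs_at: str      # zero-padded "HH:MM" so string order == time order
    stop_sequence: int   # position of this stop in the train's route


# Hypothetical root-level `trains` collection: train metadata only.
trains = {
    "T100": {"name": "Coastal Express"},
    "T200": {"name": "Airport Link"},
    "T300": {"name": "Early Commuter"},
}

departures = [
    Departure("T100", "central", "08:00", 1),
    Departure("T100", "harbor", "08:30", 2),
    Departure("T200", "central", "09:15", 1),
    Departure("T200", "airport", "09:50", 2),
    Departure("T300", "central", "07:45", 1),
]


def next_departures(station_id, after, limit=3):
    """Like where(stationId == s).where(departsAt >= t).orderBy(departsAt).limit(n)."""
    matches = [d for d in departures if d.station_id == station_id and d.departs_at >= after]
    return sorted(matches, key=lambda d: d.departs_at)[:limit]


def timetable(train_id):
    """All stops of one train in route order, for a per-train schedule view."""
    stops = [d for d in departures if d.train_id == train_id]
    return sorted(stops, key=lambda d: d.stop_sequence)
```

In Firestore, the `next_departures` query (equality on the station, range and order on the time) would need a composite index. Storing stops as a subcollection under each train document instead would make the per-train timetable simple, but a per-station departure board would then need a collection group query.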

How to achieve sorting by any attribute of an item in DynamoDB

I have a DynamoDB structure as follows.
I have patients, with patient information stored in their documents.
I have claims, with claim information stored in their documents.
I have payments, with payment information stored in their documents.
Every claim belongs to a patient. A patient can have one or more claims.
Every payment belongs to a patient. A patient can have one or more payments.
I created only one DynamoDB table, since the AWS DynamoDB documentation indicates that using a single table where possible is the best solution. So I ended up with the following:
In this table, ID is the partition key and EntryType is the sort key. Every claim and payment holds its owner's ID.
My access patterns are as follows:
Listing all patients in the DB, paginated and sorted by creation date.
Listing all claims in the DB, paginated and sorted by creation date.
Listing all payments in the DB, paginated and sorted by creation date.
Listing claims of a particular patient.
Listing payments of a particular patient.
I can achieve these with two global secondary indexes. I can list patients, claims and payments sorted by their creation date by using a GSI with EntryType as a partition key and CreationDate as a sort key. Also I can list a patient's claims and payments by using another GSI with EntryType partition key and OwnerID sort key.
My problem is that this approach only lets me sort by creation date. My patients and claims have many more attributes (around 25 each), and I need to sort by each of them as well. But Amazon DynamoDB limits each table to at most 20 GSIs. I tried creating GSIs on the fly (dynamically, on request), but that also turned out to be very inefficient, since creating a GSI copies the items into another partition (as far as I know). So what is the best way to sort patients by patient name, claims by claim description, and so on for any other field?
Sorting in DynamoDB happens only on the sort key. In your data model, your sort key is EntryType, which doesn't support any of the access patterns you've outlined.
You could create a secondary index on the fields you want to sort by (e.g. creationDate). However, that pattern can be limiting if you want to support sorting by many attributes.
I'm afraid there is no simple solution to your problem. While this is super simple in SQL, DynamoDB sorting just doesn't work that way. Instead, I'll suggest a few ideas that may help get you unstuck:
Client Side Sorting - Use DDB to efficiently query the data your application needs, and let the client worry about sorting the data. For example, if your client is a web application, you could use javascript to dynamically sort the fields on the fly, depending on which field the user wants to sort by.
Consider using KSUIDs for your IDs - I noticed most of your access patterns involve sorting by CreationDate. A KSUID (K-Sortable Globally Unique ID) is a globally unique ID that is sortable by generation time. It's a great option when your application needs to create unique IDs and sort by creation timestamp. If you build a KSUID into your sort keys, your query results will automatically be sorted by creation date.
Reorganize Your Data - If you have the flexibility to redesign how you store your data, you could accommodate several of your access patterns with fewer secondary indexes (example below).
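A minimal sketch of generating a KSUID in Python, following the format published by segmentio/ksuid as I understand it: a 4-byte big-endian timestamp counted from a custom epoch of 1400000000, followed by 16 random bytes, base62-encoded and left-padded to 27 characters. Because the timestamp comes first and the encoding is fixed-length with an ASCII-ordered alphabet, plain string comparison sorts the IDs by creation time (to one-second precision). For production, a maintained KSUID library is the safer choice.

```python
import os
import time

KSUID_EPOCH = 1_400_000_000   # custom epoch (seconds) used by the KSUID format
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  # ASCII order
ENCODED_LEN = 27              # 20 bytes always fit in 27 base62 digits


def ksuid(unix_seconds=None, payload=None):
    """Return a 27-character, time-sortable ID string.

    Layout: 4-byte big-endian (timestamp - KSUID_EPOCH) + 16 random payload bytes,
    encoded as a fixed-width base62 number.
    """
    seconds = int(time.time()) if unix_seconds is None else int(unix_seconds)
    payload = os.urandom(16) if payload is None else payload
    if len(payload) != 16:
        raise ValueError("payload must be 16 bytes")
    raw = (seconds - KSUID_EPOCH).to_bytes(4, "big") + payload  # OverflowError if out of range
    n = int.from_bytes(raw, "big")
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(BASE62[remainder])
    # Left-pad so every ID has the same length and compares correctly as a string.
    return "".join(reversed(digits)).rjust(ENCODED_LEN, "0")
```

A sort key would then be built as something like `"CLAIM#" + ksuid()`, where the `CLAIM#` prefix is a hypothetical naming choice.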
Finally, I notice that your table example is very "flat" and doesn't appear to be modeling the relationships in a way that supports any of your access patterns (without adding indexes). Perhaps it's just an example data set to highlight your question about sorting, but I wanted to address a different way to model your data in the event you are unfamiliar with these patterns.
For example, consider your access patterns that require you to fetch a patient's claims and payments, sorted by creation date. Here's one way that could be modeled:
This design handles four access patterns:
get patient claims, sorted by date created.
get patient payments, sorted by date created.
get patient info (names, etc...)
get patient claims, payments and info (in a single query).
The queries would look like this (in pseudocode):
query where PK = "PATIENT#UUID1" and SK < "PATIENT#UUID1"
query where PK = "PATIENT#UUID1" and SK > "PATIENT#UUID1"
query where PK = "PATIENT#UUID1" and SK = "PATIENT#UUID1"
query where PK = "PATIENT#UUID1"
These queries take advantage of sort keys being sorted lexicographically. When you ask DDB to fetch the PATIENT#UUID1 partition with a sort key less than "PATIENT#UUID1", it returns only the CLAIM items, because CLAIM comes before PATIENT when sorted alphabetically. I access the PAYMENT items for the given patient with the same pattern, since PAYMENT sorts after PATIENT. I've used KSUIDs in this scenario, which gives you the added benefit of having the CLAIM and PAYMENT items sorted by creation date!
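To check the lexicographic trick without a table, here is a small in-memory sketch of one partition. The items and the `CLAIM#0001`-style suffixes (standing in for KSUIDs) are hypothetical, and `query` only mimics a DynamoDB Query with a key condition on PK plus an optional comparison on SK; it is not a DynamoDB client.

```python
import bisect

# Hypothetical items from one partition (PK = "PATIENT#UUID1"). DynamoDB keeps
# items within a partition ordered by sort key, so we sort by SK here too.
# The numeric suffixes stand in for KSUIDs.
ITEMS = sorted(
    [
        {"PK": "PATIENT#UUID1", "SK": "PATIENT#UUID1", "name": "Jane Doe"},
        {"PK": "PATIENT#UUID1", "SK": "CLAIM#0002", "description": "Follow-up visit"},
        {"PK": "PATIENT#UUID1", "SK": "PAYMENT#0001", "amount": 50},
        {"PK": "PATIENT#UUID1", "SK": "CLAIM#0001", "description": "X-ray"},
    ],
    key=lambda item: item["SK"],
)
SORT_KEYS = [item["SK"] for item in ITEMS]


def query(op=None, value=None):
    """Mimic: Query where PK = "PATIENT#UUID1" [and SK <op> value]."""
    if op is None:
        return list(ITEMS)
    if op == "<":
        return ITEMS[: bisect.bisect_left(SORT_KEYS, value)]
    if op == ">":
        return ITEMS[bisect.bisect_right(SORT_KEYS, value):]
    if op == "=":
        start = bisect.bisect_left(SORT_KEYS, value)
        return ITEMS[start: bisect.bisect_right(SORT_KEYS, value)]
    raise ValueError(f"unsupported operator: {op}")
```

Against a real table with boto3, the first query would look roughly like `table.query(KeyConditionExpression=Key("PK").eq("PATIENT#UUID1") & Key("SK").lt("PATIENT#UUID1"))`, using `Key` from `boto3.dynamodb.conditions` and a `Table` resource named `table` (hypothetical).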
While this pattern may not solve all of your sorting problems, I hope it gives you some ideas of how you can model your data to support a variety of access patterns with sorting functionality as a side effect.

Why is using nested collections in Firestore not good

I am making an app and I am trying to figure out why using nested collections is frowned upon in Firestore. It is an expense-tracking app, and the data is only relevant to the logged-in user; that user never cares about any other user. I have found two ways to structure the data, one with a few more levels of nesting than the other. In the structures below:
collectionName: valueNames
  subcollectionName: valueName
Structure 1 (Not as nested):
user:
  month: totalSpent, startDate, endDate
  transactions: categoryId, amount, timestamp
  categories: monthId, name, totalSpent
Structure 2 (More nested):
user:
  month: totalSpent, name, startDate, endDate
    categories: name, totalSpent
      transactions: categoryName, amount, timestamp
Can someone tell me the advantages of structure 1 over structure 2? Structure 2 seems easier to query, and I do not have to keep track of multiple IDs; I can just get the subcollection. It would also make it easier to keep track of previous months to show the user later, when they want to analyze their spending.
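Structure 1 allows you to view transactions and categories across multiple months. You cannot query across subcollections (see Is it possible to query multiple document sub-collections in Cloud Firestore?), so with Structure 2 you would not be able to query all transactions across months or categories. (Firestore has since added collection group queries, which query every subcollection with the same ID at once; with Structure 2 you would then need to store the user's ID on each transaction to restrict such a query to one user.)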
Explained
With Structure 2 you would need to query the months first, then pick a single month and query the categories within that month, then pick a category (or iterate over each one) and query for the transactions in that category. To aggregate category spending for the year you would need to make 12 calls, one for each month.
With Structure 1 you could query all transactions, filtered by date range, by category, or by a combination of both. You could also query all categories for the year in one go and sum the values for a yearly overview. Structure 1 gives you much more flexible and efficient queries.
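A rough in-memory sketch of the difference, counting how many queries each structure needs to total one category's spending over a date range. The data, the ISO date strings, and the assumption that Structure 2 category documents are keyed by category name (so they can be addressed directly without listing them first) are all hypothetical.

```python
# Structure 1: one `transactions` subcollection directly under the user.
structure1_transactions = [
    {"categoryId": "food", "amount": 12.5, "timestamp": "2024-01-03"},
    {"categoryId": "rent", "amount": 900.0, "timestamp": "2024-01-05"},
    {"categoryId": "food", "amount": 20.0, "timestamp": "2024-02-11"},
    {"categoryId": "food", "amount": 7.5, "timestamp": "2024-03-09"},
]

# Structure 2: month -> categories -> transactions.
structure2 = {
    "2024-01": {
        "food": [{"amount": 12.5, "timestamp": "2024-01-03"}],
        "rent": [{"amount": 900.0, "timestamp": "2024-01-05"}],
    },
    "2024-02": {"food": [{"amount": 20.0, "timestamp": "2024-02-11"}]},
    "2024-03": {"food": [{"amount": 7.5, "timestamp": "2024-03-09"}]},
}


def spend_structure1(category, start, end):
    """One query: where categoryId == category and start <= timestamp <= end."""
    rows = [t for t in structure1_transactions
            if t["categoryId"] == category and start <= t["timestamp"] <= end]
    return sum(t["amount"] for t in rows), 1  # (total, queries issued)


def spend_structure2(category, start, end):
    """List the months, then query that category's transactions in each month."""
    queries, total = 1, 0.0  # the first query lists the month documents
    for categories in structure2.values():
        queries += 1  # one transactions query per month
        for t in categories.get(category, []):
            if start <= t["timestamp"] <= end:
                total += t["amount"]
    return total, queries
```

With a full year of data, Structure 2 needs 13 queries for the same question, while Structure 1 stays at one.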
Summary
Remember, Firestore is not like the Firebase Realtime Database, where you can read all the data under a given node of the tree at once. You need to make a query at each level of the tree (each collection) to pull data.
There is nothing wrong with creating those collections, as long as you remember to delete them along with the parent document. In short, deleting a document does not delete the subcollections it contains. This is how it works:
Each document in Cloud Firestore has a path, and its subcollections live under that path; they are not stored inside the document itself. So when you delete a document, every field gets deleted, but the documents in its subcollections are left behind. They still exist (and still take up storage), but they are orphaned: you can only reach them if you already know their full path, since the parent document no longer exists.
Subcollections are an improvement over a plain JSON data tree, but since Cloud Firestore was in beta when this was written, some features (like deleting subcollections along with their documents) might be released when it graduates from beta or later on.
The main advantage of subcollections is that they save the user's data: when you fetch a document, its subcollection data is not fetched, i.e. queries are shallow.
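A small sketch of the deletion behaviour, modelling the database as a flat map from full document path to fields. This is a simplification, and the path strings and helper names are hypothetical, not the Firestore SDK. A plain delete removes only the document's own entry, reads are shallow, and the subcollection documents are modelled as still existing under their full paths until something deletes everything under the document's path prefix.

```python
# Hypothetical flat store: full document path -> fields. Every document is
# addressed by its own full path, including documents in subcollections.
store = {
    "users/alice": {"name": "Alice"},
    "users/alice/expenses/e1": {"amount": 10},
    "users/alice/expenses/e2": {"amount": 25},
    "users/bob": {"name": "Bob"},
}


def get_document(path):
    """Shallow read: returns only the document's own fields, never its subcollections."""
    return store.get(path)


def delete_document(path):
    """Plain delete: removes only this document; subcollection documents stay behind."""
    store.pop(path, None)


def delete_recursive(path):
    """Delete the document and every document nested beneath its path."""
    prefix = path + "/"
    for key in [k for k in store if k.startswith(prefix)]:
        del store[key]
    store.pop(path, None)
```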

Trying to understand database denormalization, Is this database denormalized?

I've been struggling for a couple of days to figure out the best way to design a database for a large data set on Firebase; I even posted a question on the Database Administrators site.
I came up with a design, but I don't know whether it's what's called denormalized data. I want to minimize the time it takes to query data without making inserting/updating data too hard.
Here's my design:
Is that the right database design for this kind of data?
(Please check my question on the Database Administrators site for more details about the nature of the data.)
Here is also a short description of the nature of the data:
I have an affiliator_category, which may be banks, clubs or organisations. Each category contains a number of affiliators, and each affiliator contains a number of stores divided into store_category; each store has a number of offers.
On the user side (the one who does the shopping), a user has memberships in several affiliators and a number of spendings.
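One common way to denormalize this kind of data in the Realtime Database is a fan-out write: store each offer once under a top-level node and write small index entries wherever it needs to be listed, all in one multi-path update. The sketch below simulates that with nested Python dicts; the node names (`offers`, `storeOffers`, `affiliatorOffers`) are hypothetical. With the real SDK, the dictionary returned by `fan_out_offer` would be passed to a single `update()` call on the root reference, which applies all the paths atomically.

```python
def set_path(tree, path, value):
    """Write value at a slash-separated path, creating parent nodes as needed."""
    keys = path.split("/")
    node = tree
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def apply_update(tree, updates):
    """Apply a multi-path update dict ({"a/b/c": value, ...}) to the in-memory tree."""
    for path, value in updates.items():
        set_path(tree, path, value)


def fan_out_offer(offer_id, offer, store_id, affiliator_id):
    """Build one update that writes an offer plus its denormalized index entries."""
    summary = {"title": offer["title"], "storeId": store_id}  # small duplicated copy
    return {
        f"offers/{offer_id}": dict(offer, storeId=store_id, affiliatorId=affiliator_id),
        f"storeOffers/{store_id}/{offer_id}": True,             # membership index
        f"affiliatorOffers/{affiliator_id}/{offer_id}": summary,
    }
```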
