A newbie here.
I need help with referencing Firebase sub-collections in a structured way, so that a user can select and pass information through a sub-collection.
=> Tournaments => Cities => Cairo => Year => High Goal => Team A
It works like this: from the root I have a list of cities, let's say
1. Cairo
2. Alexandria
3. Sixth October
I want to keep a record of the tournaments hosted by these cities each year. Let's say
1. 2019
2. 2018
3. 2017
Each year there are 3 different cups competed for, let's say
1. High goal
2. Medium goal
3. Low goal
Every competed cup has teams that participate in the tournament
1. Team A
2. Team B
3. Team C
I have added a visual representation of the app designed in Adobe XD.
Data modeling for NoSQL databases depends as much on the use-cases of your app as it does on the data that you store. So there is no "perfect" data model, nor are there nearly as many best practices (or normal forms) for NoSQL databases as there are for relational data models.
Firestore (which you seem to be looking to use) offers a few tools for modeling data:
The discrete unit of storage is called a document. Each document contains fields of various types, including nested fields, and a document can be up to 1MB in size.
Documents are stored in named collections.
You can nest collections under a document, and build hierarchies that way.
Each document has a unique path of the form /collection1/docid1/collection2/docid2, etc.
To write to a document, you must know its exact path.
You can query a collection for a subset of the documents in there.
You can query across all collections with the same name, no matter their path in the database.
The performance of queries depends solely on the number of documents you retrieve, and not on the number of documents in the collection(s).
There are probably quite a few more rules, but these should be enough to get you started.
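For instance, here is a rough, unofficial sketch of how your tournament hierarchy could map onto nested collections with the JavaScript (v8) Web SDK. All collection, document and field names below (cities, years, cups, teams, name) are just assumptions for illustration, not a recommended model:

const db = firebase.firestore();

// Path: /cities/cairo/years/2019/cups/high-goal/teams/team-a
const teamRef = db
  .collection("cities").doc("cairo")
  .collection("years").doc("2019")
  .collection("cups").doc("high-goal")
  .collection("teams").doc("team-a");

// Writing requires knowing the exact path (see the rules above).
teamRef.set({ name: "Team A" });

// Query only the teams of one cup in one year in one city...
db.collection("cities").doc("cairo")
  .collection("years").doc("2019")
  .collection("cups").doc("high-goal")
  .collection("teams").get();

// ...or query across every "teams" subcollection, regardless of path
// (a collection group query, which needs a collection group index).
db.collectionGroup("teams").where("name", "==", "Team A").get();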
I typically recommend writing a list of your top 3-5 use-cases, and determining what reads/queries you need for that. With those queries, you can then start defining your data model, and implementing your application code.
Then each time you add a use-case, you figure out how to read/write the data for that use-case, and potentially change/expand the data model to allow for the new and existing use-cases. If you get stuck when adding a specific use-case, report back here and we can try to help.
Some good additional material to get started:
NoSQL data modeling
Getting to know Cloud Firestore
Firebase for SQL developers, which is for Firebase's other NoSQL database, but is a great primer on NoSQL modeling too.
Related
I'm building an app that involves travel planning using Flutter. This app will help people plan their travel by providing a few options for them to choose from: cheapest, fastest, shortest, etc.
I'm quite new to Firebase and I need some advice on the data structure. I was thinking of including public transportation such as trains, and each train will have its own schedule. What is the best way to structure this schedule inside Firestore, so that I can create a view that displays the train schedule?
If it were me, I would want to map out how I intend to interact with the application. I usually draw this out by hand as I find it much quicker than trying to model something electronically. I review other similar apps at the same time and try and identify data that are missing. Start simple and build up the various data points that you want to include.
The main thing to note is that Cloud Firestore is a NoSQL, document-oriented database; there are no tables or rows. You store data in documents, which contain sets of key-value pairs and are organised into collections.
Once you have a basic structure, you can opt for one of the Firestore data structures and test it out:
Documents
Multiple collections
Subcollections within documents
Nested data
A typical use case for this might involve a chat app, and you want to store a user's three most recently visited chat rooms as a nested list in their profile.
Subcollections
You can create collections within documents when you have data that might expand over time; for example, you might create a collection of users or messages within chat room documents.
Root-level collections
You can create collections at the root level to organise distinct data sets, for example one collection for users and another for rooms and messages.
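As a hedged example (not the only way to do it), a train's schedule could live in a subcollection under the train document. The collection, document and field names below (trains, schedule, station, departureTime) are assumptions for illustration, using the JavaScript (v8) Web SDK:

const db = firebase.firestore();

// One document per train.
db.collection("trains").doc("train-001").set({
  name: "Example Express",
  line: "North-South",
});

// Each schedule entry is its own document in a subcollection,
// so the schedule can grow without hitting the 1MB document limit.
db.collection("trains").doc("train-001").collection("schedule").add({
  station: "Central",
  departureTime: new Date("2020-01-01T08:30:00"),
});

// A view of the next departures for one train, ordered by time.
db.collection("trains").doc("train-001").collection("schedule")
  .orderBy("departureTime")
  .limit(10)
  .get();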
The Firestore docs don't have an in-depth discussion of the trade-offs involved in using sub-collections vs top-level collections, but do point out that they are less flexible and less 'scalable'. Given that you sacrifice flexibility by setting up your data in sub-collections, there must be some definite plus sides besides a mentally satisfying structure.
For example how does the time for a firestore query on a single key across a large collection compare with getting all items from a much smaller collection?
Say we want to query a large collection 'People' for all people in a family unit. Alternatively, we could partition the data by family in the first place, into family units.
People -> person: {family: 'Smith'}
versus
Families -> family: {name:'Smith'} -> People -> person
I would expect the latter to be more efficient, but is this correct? Are there any big-O estimates for each?
Any other advantages of sub-collections (eg for transactions)?
I've got some key points about subcollections that you need to be aware of when modeling your database.
1 – Subcollections give you a more structured database.
2 – Queries are indexed by default: query performance is proportional to the size of your result set, not your data set. So it does not matter how big your collection is; performance depends only on the size of your result set.
3 – Each document has a max size of 1MB. For instance, if you have an array of orders in your customer document, it might be a good idea to create a subcollection of orders for each customer, because you cannot foresee how many orders a customer will have. By doing this you don't need to worry about the max size of your document (see the sketch after this list).
4 – Pricing: Firestore charges you for document reads, writes and deletes. Therefore, when you create many subcollections instead of using arrays in the documents, you will need to perform more reads, writes and deletes, thus increasing your bill.
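As a rough illustration of point 3 (the collection and field names are invented for the example, and the v8 Web SDK is assumed):

const db = firebase.firestore();
const customerRef = db.collection("customers").doc("customer-1");

// Keeping orders as an ever-growing array inside the customer document
// risks the 1MB document limit:
// customerRef.update({ orders: firebase.firestore.FieldValue.arrayUnion(order) });

// As a subcollection, each order is its own document (and, per point 4,
// its own billed read/write):
customerRef.collection("orders").add({
  total: 49.99,
  createdAt: firebase.firestore.FieldValue.serverTimestamp(),
});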
To answer the original question about efficiency:
Querying all people with the family 'Smith' from the people top-level collections really is not any slower than asking for all the people in the 'Smith' family sub-collection.
This is explained in the How to Structure Your Data episode of the Get to Know Cloud Firestore video series.
There are some trade-offs between top-level collections and sub-collections to be aware of. Depending on the specific queries you intend to use you may need to create composite indexes to query top-level collections or collection group indexes to query sub-collections. Both these index types count towards the 200 index exemptions limit.
These trade-offs are discussed in detail near the bottom of the Understanding Collection Group Queries blog post and in the Maps, Arrays and Subcollections, Oh My! episode of the Get to Know Cloud Firestore video series.
I've linked to the relevant parts of both videos.
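To make the comparison concrete, here is a non-authoritative sketch of the two layouts from the question, using the JavaScript (v8) Web SDK; field names like family and age are only illustrative:

const db = firebase.firestore();

// Top-level collection, filtered on a field:
db.collection("People").where("family", "==", "Smith").get();

// Sub-collection under the family document (no filter needed):
db.collection("Families").doc("Smith").collection("People").get();

// To search across all "People" sub-collections at once you need a
// collection group query (and a collection group index):
db.collectionGroup("People").where("age", ">=", 18).get();

Either way, you pay for (and wait on) the same number of document reads for the same result set, which is why the two read paths perform the same.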
I was wondering about the same thing. The documentation mainly talks about arrays vs sub-collections. My conclusion is that there are no clear advantages of using a sub-collection over a top-level collection. Sub collections had some clear technical limitations before, but I think those are removed with the recent introduction of collection group queries.
Here are some advantages of both approaches:
Sub collection:
Your database "feels" more structured, as you will have fewer top-level collections listed.
No need to store a reference/foreign key/id of the parent document, as it is implied by the database structure. You can get to the parent via the sub-collection document ref (see the sketch below).
Top-level collection:
Documents are easier to delete. Using sub collections you need to make sure to first delete all sub collection documents before you delete the parent document. There is no API for this so you might need to roll your own helper functions.
Having the parent id directly in each (sub) document might make it easier to process query results, depending on the application.
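A minimal sketch of both points, assuming the v8 Web SDK and made-up collection names (items, statistics):

const db = firebase.firestore();

// Sub-collection: the parent is implied by the path and reachable from the ref.
const statRef = db.collection("items").doc("item-1")
  .collection("statistics").doc("stat-1");
const parentItemRef = statRef.parent.parent; // -> items/item-1

// Deleting a parent does NOT delete its sub-collection documents, so you need
// a helper along these lines (simplified, no batching or pagination):
async function deleteItemWithStatistics(itemRef) {
  const stats = await itemRef.collection("statistics").get();
  await Promise.all(stats.docs.map((d) => d.ref.delete()));
  await itemRef.delete();
}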
Todd answered this in a Firebase YouTube video:
1) There's a limit to how many documents you can create per minute in a single collection if the documents have an always-increasing value (like a timestamp).
2) Very large collections don't do as well from a performance standpoint when you're offline. But they are generally good options to consider.
Migrating from the Realtime Database to Cloud Firestore needs a total redesign of the database. For this I created an example with some main design decisions.
See picture and the database design in the spreadsheet below.
My two questions are:
1 - When I have a one-to-many relation, is it also an option to store the information as an array within the document? See line 8 in the database design.
2 - Should I include only a reference, or duplicate all the information in the one-to-many relation? See line 38 in the database model.
https://docs.google.com/spreadsheets/d/13KtzSwR67-6TQ3V9X73HGsI2EQDG9FA8WMN9CCHKq48/edit?usp=sharing
In general: keep the data store as shallow as possible, i.e., avoid subcollections and nesting.
Data can be related one-to-one, one-to-many, or many-to-many. Firestore is an automatically indexed realtime datastore; it is often subscribed to rather than queried once (the realtime nature of the system).
Regarding the Firestore data model, always ask: "How will I query this data store?" Use subcollections, arrays, and maps sparingly (rarely) and only if you must (and you most likely don't need to). Use auto-IDs rather than human-readable IDs, e.g. use 000kztLDGafF4uKb8Cal rather than banana for document IDs.
As app functionality increases, server-side scripting with Cloud Functions for Firebase and/or the Admin SDK becomes an invaluable tool for managing (creating and indexing) many-to-many data relationships. For example, full-text search is not supported in Firestore, which can seem like a barrier to implementing robust search functionality in your app.
In conclusion, try to avoid subcollections, nesting, arrays, and maps. Follow the keep-it-simple-stupid (KISS) principle. Once your app scales up and/or requires more functionality, server-side scripting can be utilized to keep your app responsive (fast) while offering robust features.
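As a loose illustration of the server-side point (all names here are hypothetical: bookings, userBookings, userId, tripId), a first-generation Cloud Function in Node.js can maintain a flat, query-friendly mirror of a relationship whenever a document is written:

const functions = require("firebase-functions");
const admin = require("firebase-admin");
admin.initializeApp();

exports.indexBooking = functions.firestore
  .document("bookings/{bookingId}")
  .onWrite(async (change, context) => {
    if (!change.after.exists) return null; // document was deleted
    const booking = change.after.data();
    // Mirror a flat record keyed by user, so the app can query it cheaply.
    return admin.firestore()
      .collection("userBookings")
      .doc(`${booking.userId}_${context.params.bookingId}`)
      .set({ userId: booking.userId, tripId: booking.tripId });
  });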
For Question 1 there's a solution in the Firestore docs:
https://cloud.google.com/firestore/docs/solutions/arrays
Instead of using an array, you use a map of values and set them to true, which allows you to query for them, like so:
teachers: {
  "teacherid1": true,
  "teacherid2": true,
  "teacherid3": true
}
And for Question 2, you just need to save the teacher IDs, because if you have those you can easily query for the corresponding data.
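To tie both answers together, a tentative sketch (collection and field names are assumed, v8 Web SDK):

const db = firebase.firestore();

// Question 1: with the map-of-booleans shape you can filter on a single key.
db.collection("classes").where("teachers.teacherid1", "==", true).get();

// Question 2: store only the teacher id, then fetch the full teacher document
// on demand instead of duplicating its data.
db.collection("teachers").doc("teacherid1").get();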
I have a collection, itemsCollection, which contains a very large amount of small itemDocs. Each itemDoc has a subcollection, statistics. Each itemDoc also has a field "owner" which indicates which user owns the itemDoc.
itemsCollection
  itemDoc1
    statistics
  itemDoc2
    statistics
  itemDoc3
    statistics
  itemDoc4
    statistics
  ...
I also have a collection, usersCollection, which contains basic user info.
usersCollection
  user1
  user2
  user3
  ...
Since each itemDoc belongs to a specific user, it's necessary to display to each user which itemDocs they own. I have been using the query:
db.collection("itemsCollection").where("owner", "==", "user1")
I am wondering if this will scale effectively, i.e. when itemsCollection grows to millions of records? If not, is the best solution to duplicate each itemDoc and its statistics subcollection as a subcollection in the user document, or should I be doing something else?
As Alex Dufter, the product manager from Firebase, explained at Firebase Dev Summit 2017, Firestore was inspired in many ways by the feedback they received on the Firebase Realtime Database over the years. They faced two types of issues:
Data modelling and querying: the Firebase Realtime Database cannot query over multiple properties, which usually means duplicating data or doing client-side filtering, and we all already know how messy that gets.
The Realtime Database does not scale automatically.
With this new product, they say that you can now build an app and grow it to planetary scale without changing a single line of code. Cloud Firestore is also a NoSQL database, but one that was built specifically for mobile and web app development. It's flexible enough to build all kinds of apps and scalable enough to grow to any size.
Because the new database was built with these issues in mind, duplicating data is not needed anymore. You will not have to worry about that query; even if your data grows to millions of records, it will scale automatically. One thing to remember: if you use multiple conditions, don't forget to create the indexes by simply adding them in the Firebase console. Here are two simple examples from the official documentation:
citiesRef.whereEqualTo("state", "CO").whereEqualTo("name", "Denver");
citiesRef.whereEqualTo("state", "CA").whereLessThan("population", 1000000);
The first assertion is that document-style NoSQL databases such as MarkLogic and Mongo should store each piece of information in a nested/complex object.
Consider the following model
<patient>
<patientid>1000</patientid>
<firstname>Johnny</firstname>
<claim>
<claimid>1</claimid>
<claimdate>2015-01-02</claimdate>
<charge><amount>100</amount><code>374.3</code></charge>
<charge><amount>200</amount><code>784.3</code></charge>
</claim>
<claim>
<claimid>2</claimid>
<claimdate>2015-02-02</claimdate>
<charge><amount>300</amount><code>372.2</code></charge>
<charge><amount>400</amount><code>783.1</code></charge>
</claim>
</patient>
In the relational world this would be modeled as a patient table, claim table, and claim charge table.
Our primary desire is to simultaneously feed downstream applications with this data, but also perform analytics on it. Since we don't want to write a complex program for every measure, we should be able to put a tool on top of this. For example Tableau claims to have a native connection with MarkLogic, which is through ODBC.
When we create views using range indexes on our document model, the SQL against it in MarkLogic returns excessive repeating results. The charge numbers are also double counted with sum functions. It does not work.
The thought is that through these index, view, and possibly fragment techniques of MarkLogic, we can define a semantic layer that resembles a relational structure.
The documentation hints that you should create 1 object per table, but this seems to be against the preferred document db structure.
What is the data modeling and application pattern to store large amounts of document data and then provide a turnkey analytics tool on top of it?
If the ODBC connection is always going to return bad data and not be aware of relationships, then the claim of ODBC support against NoSQL made by all of these tools is not true.
References
https://docs.marklogic.com/guide/sql/setup
https://docs.marklogic.com/guide/sql/tableau
http://www.marklogic.com/press-releases/marklogic-and-tableau-build-connection/
https://developer.marklogic.com/learn/arch/data-model
For your question: "What is the data modeling and application pattern to store large amounts of document data and then provide a turnkey analytics tool on top of it?"
The rule of thumb I use is that when I want to count "objects", I model them as separate documents. So if you want to run queries that count patients, claims, and charges, you would put them in separate documents.
That doesn't mean we're constraining MarkLogic to only relational patterns. In UML terms, a one-to-many relationship can be a composition or an aggregation. In a relational model, I have no choice but to model those as separate tables. But in a document model, I can do separate documents per object or roll them all together - the choice is usually based on how I want to query the data.
So your first assertion is partially true - in a document store, you have the option of nesting all your related data, but you don't have to. Also note that because MarkLogic is schema-agnostic, it's straightforward to transform your data as your requirements evolve (corb is a good option for this). Certain requirements may require denormalization to help searches run efficiently.
Brief example - a person can have many names (aliases, maiden name) and many addresses (different homes, work address). In a relational model, I'd need a persons table, a names table, and an addresses table. But I'd consider the names to be a composite relationship - the lifecycle of a name equals that of the person - and so I'd rather nest those names into a person document. An address OTOH has a lifecycle independent of the person, so I'd make that an address document and toss an element onto the person document for each related address. From an analytics perspective, I can now ask lots of interesting questions about persons and their names, and persons and addresses - I just can't get counts of names efficiently, because names aren't in separate documents.
I guess MarkLogic is a little atypical compared to other document stores. It works best when you don't store an entire table as one document, but one record per document. MarkLogic indexing is optimized for this approach, and handles searching across millions of documents easily that way. You will see that as soon as you store records as documents, results in Tableau will improve greatly.
Splitting documents into such small fragments also allows higher performance and a lower footprint. MarkLogic doesn't hold the data as persisted DOM trees that allow random access. Instead, it streams the data in a very efficient way and relies on index resolution to pull relevant fragments quickly.
HTH!