Firebase: How flat should my data structure be? - firebase

I'm building an app that tracks the user's location and updates Firebase. I've read the documentation about structuring data but still have a few questions.
I'm considering structuring the data in one of two ways, but can't decide between them.
users
  $id
    -position
    -other attr
vs:
user_position
  $id
users
  $id
    -other attr
In which scenario would the first design work best, and in which the second?

If you only keep one position per user (as the singular user_position suggests), there is no useful difference between the two structures. A user's position in that case is just another attribute, one that happens to have two values (lat and lon).
But if you want to keep multiple positions per user, then your first structure is mixing entity types: users and user_positions. This is an anti-pattern when it comes to Firebase Database.
The two most common reasons are:
Say you want to show a list of user names (or any specific, single-value attribute). With the first structure you will also need to read the list of all positions of all users, just to get the list of names. With the second structure, you just read the user's attributes. If that is still much more data than you need, consider also keeping a list of /user_names for optimal read performance.
Many developers end up wanting different access rules for the user positions and the other user attributes. In the first structure that is only possible by pushing the read permission from the top /users down to lower in the tree. In the second structure, you can just give separate permissions to /users and /user_positions.
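To make the first reason concrete, here is a small sketch in plain JavaScript, with no Firebase SDK involved. It models both layouts as in-memory objects and compares how much JSON a client would have to download just to list user names. The sample data and the helpers `readPath`, `payloadSize` and `listNames` are made up for illustration; the only database behaviour it assumes is that reading a node returns everything beneath it.

```javascript
// Structure 1: positions nested under each user (mixed entity types).
const nested = {
  users: {
    alice: { name: "Alice", positions: { p1: { lat: 52.37, lon: 4.89 }, p2: { lat: 52.38, lon: 4.9 } } },
    bob: { name: "Bob", positions: { p1: { lat: 48.85, lon: 2.35 } } },
  },
};

// Structure 2: users and their positions live in separate top-level lists.
const flat = {
  users: { alice: { name: "Alice" }, bob: { name: "Bob" } },
  user_positions: {
    alice: { p1: { lat: 52.37, lon: 4.89 }, p2: { lat: 52.38, lon: 4.9 } },
    bob: { p1: { lat: 48.85, lon: 2.35 } },
  },
};

// Simulates a read: returns the whole subtree stored below the path.
function readPath(tree, path) {
  return path.split("/").filter(Boolean).reduce((node, key) => node[key], tree);
}

// Approximates the download size as the length of the JSON text.
function payloadSize(value) {
  return JSON.stringify(value).length;
}

function listNames(users) {
  return Object.values(users).map((user) => user.name);
}

const namesNested = listNames(readPath(nested, "/users"));
const namesFlat = listNames(readPath(flat, "/users"));

// Same names either way, but the nested layout drags every position along.
console.log(namesNested, payloadSize(readPath(nested, "/users")));
console.log(namesFlat, payloadSize(readPath(flat, "/users")));
```

The gap between the two sizes grows with every position stored, which is why the separated layout scales better for list views.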

Related

Firestore data structure for two use cases

I would appreciate some guidance on how to structure data stored within an app. While there are some reasons for the first way, I'm concerned it wouldn't be able to operate efficiently for the second case.
Simplified, the app would contain a list of Places by State. The main use case would be viewing Places within a selected State. The second use case would be that individual users could save specific Places they liked into their profile and view them all at once (showing all state Places in one list).
Option 1- Places saved in one "places" collection, which has a field of "state."
Main use: To show these places by state, the app would query where the "state" field matches the state.
Secondary use: When a user saved the place, the app would save the docID for each place into the user's profile, each of which would need to be retrieved to show the list of places.
Option 2- Have one collection per state.
Main use: To show these places by state, the app would pull all documents within the query and list them out.
Secondary use: When a user saved the place to the user's profile, the app would save the docID for each place into the user's profile, distributed across the different collections, each of which would need to be retrieved to show the list of places.
Goals:
Use the same place document to appear in both the State lists and the user's profile.
Minimize the number of calls/slowness as much as possible in the Secondary use case.
I have been reviewing Firestore data storage guidelines, but I would appreciate any thoughts from experienced developers regarding this data structure.
There is no "perfect", "best" or "correct" solution for structuring a Firestore database. We usually structure the database according to the queries that we intend to perform.
Regarding storing all the places in a single collection vs. having one collection per state, please note that there is no difference in terms of speed or cost. You'll always pay a number of reads equal to the number of documents that your query returns. However, if you need to display, for example, all places of all states in your app, then having a collection for each state will require a separate query for each state.
Furthermore, regarding saving a list of places in a user's profile vs. storing only the IDs, it's a matter of measurement. You should measure how often the details of the places change. Remember that if a place changes, you have to update that data everywhere it exists. So if it doesn't change often, you can save the entire place object; otherwise, save only the ID.
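The trade-off between saving IDs and saving whole place objects can be sketched with plain JavaScript objects standing in for Firestore documents. Everything here is hypothetical (the sample places, the profile shapes `savedPlaceIds` and `savedPlaces`, and the helper names); the read and write counts simply apply the rule stated above that every document fetched or written counts once.

```javascript
// In-memory stand-in for a single "places" collection, keyed by document ID.
const places = new Map([
  ["p1", { name: "Lake Park", state: "WI" }],
  ["p2", { name: "Old Mill", state: "WI" }],
  ["p3", { name: "Red Canyon", state: "UT" }],
]);

// Main use case: one query filtered on "state"; cost = documents returned.
function placesByState(state) {
  const docs = [...places.values()].filter((place) => place.state === state);
  return { docs, reads: docs.length };
}

// Variant A: the profile stores only IDs.
// Showing the saved list costs the profile read plus one read per place.
function savedPlacesFromIds(profile) {
  const docs = profile.savedPlaceIds.map((id) => places.get(id));
  return { docs, reads: 1 + docs.length };
}

// Variant B: the profile stores full copies of the places.
// Showing the saved list costs only the profile read.
function savedPlacesFromCopies(profile) {
  return { docs: Object.values(profile.savedPlaces), reads: 1 };
}

// The price of variant B: editing a place means rewriting every copy of it.
function writesToUpdatePlace(placeId, profilesWithCopies) {
  const holders = profilesWithCopies.filter((profile) => placeId in profile.savedPlaces);
  return 1 + holders.length; // the place itself plus each profile holding a copy
}
```

With IDs, reads grow with the length of the saved list; with copies, writes grow with the number of users who saved a place, which is why the answer suggests measuring how often places change.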

Firestore: How to design a data model that makes it possible to query documents that do not exist in an array?

I'm trying to find a way to properly design my data model with Firestore. I'm looking for something similar to what Tinder does: showing you people that you haven't swiped yet, based on your location.
So I ended up with something like:
A User1 has an array of "met people"
A "haven't yet met" user (User2) is also a User with the same document model
They all belong to the same "Users" collection
I want to query all the users that User1 hasn't swiped yet
I know that you can't do something like "array_not_contains" or "!=" because all fields that you query need to be indexed.
So I wonder: is it possible to model the data to make this work, or is the only solution to drop Firebase because this kind of query is not possible at all?
One alternative would be to store all the relationships (with their status) between all users in a collection. But that also means that whenever a user signs up, I have to create as many documents as I have users, which is really ugly and produces an enormous number of documents.
EDIT:
Thanks again for your answer and sorry for my late answer.
There is no need to create a new database call since you already got all the users from that area in the first place.
Not if I have a large response set; I will limit it to a number (5 in the example below).
And even if I don't limit the number, in the next db call, how can I know that new people have been added, and how do I retrieve only those?
I will not remove them from the Users collection, as they can be shown to other users.
P.S.: I forgot User4 in the Users collection pictures.
For User1, get the first 5 matches, remove existing ones, show User5.
For User2, get the first 5 matches, remove existing ones, show User4, User5.
After the users' choices, users are added to their lists. The Users collection stays the same.
For User1, get the first 5 matches, remove existing ones: nothing to show, even if I have a User6 and User7.
To fix that I launch a second query to get the new ones, but the more the user uses the app, the more queries I may need to run to try to show him existing users in his area.
Maybe I've misunderstood what you called the "initial list"; for me it is the list object retrieved from my db containing all users (with a limit).
EDIT 2:
You can check Alex Mamo's answers to see how to query documents that do not exist in an array.
Let me explain my use case and why I think that won't work.
I want to be able to search for all users near me. To do that in Firebase, I store a Geopoint. Geopoints can't really be used out of the box with Firebase for now, so I use Geofirestore in a Cloud Function.
I store and update users' Geopoints based on their locations, which means a user's location changes over time.
I limit the number of users returned by this function.
In my initial state I retrieve users near me (User1); I get 3 and 4.
Let's say that I store the last checked userId to use it later as a cursor for my query (User4).
Now my Geopoint changes, and the users in this area change too.
I request the next batch of users near me, and I use my previous userId/document with "startAfter" (more on this here); see the image below, that won't work.
If I use the cursor (User4), I'll get 5 but not 2, because in the returned list, if I order by ID, 2 comes before 4.
Worse, as shown below, the returned list may not even have User4 in it, in which case the cursor is pointless.
My example is a bit simplified and does not take into account what is described in the first answer and my first edit (limited subset of users, data design).
A possible database structure for your app might be:
Firestore-root
   |
   --- users (collection)
         |
         --- uid (document)
              |
              --- acceptedUsers: ["uidOne", "uidTwo"]
              |
              --- declinedUsers: ["uidThree", "uidFour"]
              |
              --- //Other user properties
The mechanism is simple. When you first want to show a user profile to the current (authenticated) user, you create a query that returns all users in the user's area. Depending on the user's decision, you add the corresponding uid to either the acceptedUsers array or the declinedUsers array. When you want to show another user, use the same query, but this time with an extra operation. Once the query returns the users within the user's location, add all those users to a list. Compare the list coming from the database with your existing arrays and remove from the list every user that appears in either array. This way you'll have a list that contains only users the current user hasn't seen yet. This extra step is needed to make sure the ID of the user does not exist in one of those arrays. In the end, simply choose a random user from the list and show the details to the user. That's it!
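The client-side filtering step described above might look roughly like this in plain JavaScript. The object shapes (`uid`, `acceptedUsers`, `declinedUsers`) follow the structure shown earlier; the function names are hypothetical, and excluding the current user's own uid is an extra assumption that the description does not mention.

```javascript
// Returns the nearby users that the current user has neither accepted nor declined.
function unseenUsers(nearbyUsers, currentUser) {
  const seen = new Set([
    ...currentUser.acceptedUsers,
    ...currentUser.declinedUsers,
    currentUser.uid, // assumption: never show users their own profile
  ]);
  return nearbyUsers.filter((user) => !seen.has(user.uid));
}

// Picks a random element, or null when nothing is left to show.
function pickRandom(list) {
  if (list.length === 0) return null;
  return list[Math.floor(Math.random() * list.length)];
}

// Example data standing in for the result of the "users in my area" query.
const currentUser = { uid: "u1", acceptedUsers: ["u2"], declinedUsers: ["u3"] };
const nearby = ["u1", "u2", "u3", "u4", "u5"].map((uid) => ({ uid }));

const candidates = unseenUsers(nearby, currentUser);
console.log(candidates.map((user) => user.uid)); // the uids left after filtering
console.log(pickRandom(candidates));
```

Using a `Set` keeps the membership check cheap even when the two arrays grow to thousands of entries.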
One alternative would be to store all the relationships (with their status) between all users in a collection. But that also means that whenever a user signs up, I have to create as many documents as I have users, which is really ugly and produces an enormous number of documents.
This is not an option. It means that each time a user joins your app you need to write an enormous amount of data, which will be very costly. Since everything in Firestore is about the number of reads and writes, I think you should reconsider this approach. Please see Firestore usage and limits.
Edit:
Let's consider an initial list of users that has 10 records. In other words, there are 10 users in total within that area. You say that 7 users have already been seen, which leaves only the 3 remaining users in the list.
So I display the 3 (or I do another request to get some more) and he checks the 3.
Yes, you should display those 3 users and then remove them one by one from the initial list. There is no need to create a new database call since you already got all the users from that area in the first place. Once the list is empty, you should display a message to the user that there are no more users to swipe in that particular area.
When will I create another database call?
Only when needed, which means you create another call once new users enter that area. Let's say 3 new users arrive: you now get a list of 3 users and apply the same algorithm.
The more my user uses the app, the more difficult it is to show people that he hasn't seen, because his list becomes bigger.
If you think that the arrays will grow beyond what a document can hold, then you should consider storing the users in a collection and not in an array. The problem in this case is that there are limits on how much data you can put into a document. According to the official documentation regarding usage and limits:
Maximum size for a document: 1 MiB (1,048,576 bytes)
As you can see, you are limited to 1 MiB total of data in a single document. When we are talking about storing text (uids), you can store quite a lot, but as your array gets bigger, be careful about this limitation.
But if you stay within these limits, which I personally think you will, you have nothing to worry about.
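For a rough feel of how many uids fit under that limit, here is a back-of-the-envelope estimate in JavaScript. It rests on two assumptions that are not stated above: that a stored string costs its UTF-8 byte length plus one byte, and that a uid is 28 characters long. It also ignores the document name, field names and any other fields, so the real ceiling is somewhat lower.

```javascript
const MAX_DOCUMENT_BYTES = 1048576; // 1 MiB, the limit quoted above

// Assumed storage cost of one string value: UTF-8 bytes + 1.
function storedStringBytes(text) {
  return new TextEncoder().encode(text).length + 1;
}

// Upper bound on how many uids of this shape fit into one document.
// reservedBytes is a hypothetical allowance for everything else in the document.
function maxUidsPerDocument(sampleUid, reservedBytes = 0) {
  return Math.floor((MAX_DOCUMENT_BYTES - reservedBytes) / storedStringBytes(sampleUid));
}

const sampleUid = "a".repeat(28); // placeholder with an assumed uid length
console.log(maxUidsPerDocument(sampleUid)); // 36157 under these assumptions
```

Tens of thousands of swipes per user is a lot, but the accepted and declined arrays share the same document, so a very active user could still approach the limit.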
Edit2:
Not if I have a large response set; I will limit it to a number (5 in the example below). And even if I don't limit the number, in the next db call, how can I know that new people have been added, and how do I retrieve only those?
I will not remove them from the Users collection, as they can be shown to other users.
If you have a large amount of data (many users in a single area), yes, it's a good idea to limit the results, but a much better idea would be to load the data in smaller chunks. In short: get 5 users, remove them one by one until the list has zero users, load another 5 users, and so on. This can be done using my answer from the following post:
Is there a way to paginate queries by combining query cursors using FirestoreRecyclerAdapter?
The initial list is the list that you get when you first query the database. In this case, the initial list will contain 5 users.
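The "get 5, work through them, get the next 5" loop can be sketched as a generator in plain JavaScript. This is only an illustration: `fetchPage` is a hypothetical stand-in that slices an in-memory array by offset, whereas a real Firestore query would typically continue from a cursor such as the last document of the previous chunk.

```javascript
const PAGE_SIZE = 5;

// Hypothetical stand-in for a paginated query: up to `limit` users starting at `offset`.
function fetchPage(allUsers, offset, limit) {
  return allUsers.slice(offset, offset + limit);
}

// Yields users one at a time and loads the next chunk only when the current one is used up.
function* usersInChunks(allUsers, pageSize = PAGE_SIZE) {
  let offset = 0;
  while (true) {
    const chunk = fetchPage(allUsers, offset, pageSize);
    if (chunk.length === 0) return; // no more users in this area
    offset += chunk.length;
    while (chunk.length > 0) {
      yield chunk.shift(); // show one user, then drop it from the list
    }
  }
}

// Example: 12 users are served as chunks of 5, 5 and 2.
const area = Array.from({ length: 12 }, (_, i) => `u${i + 1}`);
for (const uid of usersInChunks(area)) {
  console.log(uid);
}
```

The caller never sees the chunk boundaries; it just keeps asking for the next user until the generator is exhausted.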

How to easily maintain/update data duplicates in firebase

Question in short: how to easily update a value that is duplicated in multiple locations?
I have spent days trying to grasp data structure design in Firebase.
I have studied many resources like:
Firebase data structure and url
https://firebase.google.com/docs/database/web/structure-data
Then I got the point that duplicating some data to speed up reads is a key point in Firebase.
A typical design: the user's display name is duplicated in multiple locations, such as the article list, comment list, follower list, following list, etc. I cannot imagine not duplicating this piece of data and instead retrieving it asynchronously, one by one, from the users node.
What if a user updates his display name? It seems that we need to update the value in all places, which is a pain to maintain in the long term, isn't it?

Firebase data structure and url to use

I'm really new to Firebase and want to try out a simple mixed-client app on it (Android, JS). I have a users table and a tasks table. The very first question that comes to my mind is how to store them (and thus what the URL should look like). For example, based on the tasks table, should I use:
/tasks/{userid}/task1, /tasks/{userid}/task2, ...
Or
/{userid}/tasks/task1, /{userid}/tasks/task2, ...
The next question, based on the answer to the first one: why use one version over the other?
In my opinion, the first version is good because domains are separated.
The second approach is good because data is stored per-user which may make some of the operations easier.
Any ideas/suggestions?
Update: For the current case, let's say there are following features:
show list of tasks for each user
add new task to the list
edit/delete a task by user.
Simple operations.
This answer might come in late, but here's how I feel about the question after a year's experience with Firebase.
For your very first question, it totally depends on which data your application will mostly read, and how and in which order (kind of like sorting) you expect to read it.
1. Your first proposed data structure, that is "/tasks/{userid}/task1", "/tasks/{userid}/task2", ..., is good if the application will often read the tasks per user, with the added advantage of being able to sort the data by any task "attribute", if I may call it that.
Say each task has a priority attribute; then:
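// Get all of a user's tasks with a priority of 25.
// `auth` is assumed to hold the signed-in user; the path must be a
// template literal (backticks) for ${auth.uid} to be interpolated.
var userTasksRef = firebase.database().ref(`tasks/${auth.uid}`);
userTasksRef.orderByChild("priority").equalTo(25).on(
  "value", // or whichever event you need, e.g. "child_added"
  (snapshot) => {
    // do something important here.
  });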
2. I'd strongly advise against the second approach, because generally most, if not all, of the data associated with a user will be stored under the "/{userid}/" node. With Firebase's read mechanism, whenever you need more than one datum at that path level, you have to fetch it together with all the other data under that user's node (tasks and everything else included). I wouldn't want that behavior in my database. That said, this approach still lets you store the tasks per user, or you can make multiple RESTful requests and collect the required data datum by datum. I suggest fanning out the data structure if you run into this situation. It is a perfectly valid structure only if the application never needs a single datum at the first level of the path on its own, but always needs the whole block of data at that level together with everything beneath it (the 2nd, 3rd, ... levels).
Given the use cases you've described, and if the structure you've shown is your entire database structure, I'd say it isn't enough to cover them.
I suggest reading the docs here. Their documentation is great and exhaustive.
As a pick, the first approach is the better way to model this use case in NoSQL, and more specifically in Firebase's NoSQL database.

Riak solution for querying data by books or unique pages

Consider a set of data called Library, which contains a set of Books and each book contains a set of Pages.
Let's say you are using Riak to store this data, and you need to be able to access the data in two possible ways:
- Query for a particular page (with a unique id)
- Query for all pages in a particular book (with a unique name)
Additionally, you need to be able to easily update and delete pages of a particular Book.
What would be the best way to accomplish this in Riak?
Obviously Riak Search will do the trick, but it may be inefficient for what I am trying to do. I am wondering if it makes sense to set up buckets where each bucket can be a Book (which would make for potentially millions of "Book" buckets). Maybe that is a bad idea...
Can this be accomplished with secondary indexes?
I am trying to keep this simple...
I am new to Riak and I am trying to find the best way to accomplish something that is probably relatively simple. I would appreciate any help from the Stack Overflow community. Thanks!
A common way to model master-detail relationships in Riak is to have the master record contain a list of detail record IDs, possibly together with some information about the detail record that may be useful when deciding which detail records to retrieve.
In your example, you could have two buckets called 'books' and 'pages'. The master record in the 'books' bucket will contain metadata and information about the book as a whole together with a list of pages that are included in the book. Each page would contain the ID of the 'pages' record holding the page data as well as the corresponding page number. If you e.g. wanted to be able to query by chapter, you could also add information about which chapters a certain page belongs to.
The 'pages' bucket would contain the text of the page and possibly links to images and other media data that are included on that page. This data could be stored in yet another bucket.
In order to get a specific page or a range of pages, one would first retrieve the master record from the 'books' bucket and then, based on the contents of that record, the appropriate pages. Even though this requires several GET operations, they are all direct lookups based on keys, which is the most efficient and scalable way to retrieve data from Riak, so it will perform and scale well.
This approach also makes it simple to change the order of pages and/or chapters as only the master record needs to be updated. Adding, deleting or modifying pages would however require both the master record as well as one or more detail records to be updated, added or deleted.
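The master-detail layout described above can be sketched in plain JavaScript, with two `Map` objects standing in for the 'books' and 'pages' buckets so that every `get` models a direct key lookup. No Riak client is involved, and the record shapes and function names are assumptions made for illustration.

```javascript
// In-memory stand-ins for two buckets; each Map.get models a key-based GET.
const books = new Map(); // master records, keyed by book name
const pages = new Map(); // detail records, keyed by page ID

// Adding a page touches both the detail record and the master record.
function addPage(bookName, pageId, pageNumber, text) {
  pages.set(pageId, { text });
  const book = books.get(bookName) || { pages: [] };
  book.pages.push({ pageId, pageNumber });
  books.set(bookName, book);
}

// Query 1: a particular page by its unique ID (one lookup).
function getPage(pageId) {
  return pages.get(pageId);
}

// Query 2: all pages of a book (one lookup for the master, one per page).
function getPagesOfBook(bookName) {
  const book = books.get(bookName);
  if (!book) return [];
  return [...book.pages]
    .sort((a, b) => a.pageNumber - b.pageNumber)
    .map((entry) => pages.get(entry.pageId));
}

// Deleting a page also requires updating the master record.
function deletePage(bookName, pageId) {
  pages.delete(pageId);
  const book = books.get(bookName);
  if (book) {
    book.pages = book.pages.filter((entry) => entry.pageId !== pageId);
  }
}
```

Reordering pages only rewrites the master record, while adds and deletes write to both buckets, which matches the trade-off described above. In a real cluster the two writes are not atomic, so a failure between them has to be handled by the application.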
You can most certainly also solve this problem by adding secondary indexes to the objects and querying based on these. Secondary index queries in Riak do however have to include processing on a covering set (generally ring size / n_val) of partitions in order to fulfil the request, and therefore put a bit more load on the system and generally result in higher latencies than retrieving a single object containing keys through a direct key lookup (which only needs to involve the partitions where the object is actually stored).
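As a rough illustration of that difference, the following JavaScript snippet compares how many partitions each kind of request involves. The ring size of 64 and n_val of 3 are example values chosen for the arithmetic, not read from any cluster, and the covering-set size is only the approximation given above.

```javascript
// Partitions involved in a direct key lookup: those holding replicas of the object.
function partitionsForKeyLookup(nVal) {
  return nVal;
}

// Approximate covering set for a secondary index query: ring size / n_val.
function partitionsFor2iQuery(ringSize, nVal) {
  return Math.ceil(ringSize / nVal);
}

const ringSize = 64; // example value
const nVal = 3; // example value

console.log(partitionsForKeyLookup(nVal)); // 3
console.log(partitionsFor2iQuery(ringSize, nVal)); // 22
```

The key lookup stays constant as the cluster grows, whereas the covering set grows with the ring size.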
Although maintaining a separate object containing indexes adds a bit of extra work when inserting or deleting pages/entries, this approach will generally result in more efficient reads, as only direct key lookups are required. If your application is heavy on reads, it probably makes sense to use this approach, while secondary indexes could be more efficient for a write heavy application as inserts and modifications are made cheaper at the expense of more expensive reads. You can however always add secondary indexes just in case in order to keep your options open.
In cases like this I would usually recommend performing some benchmarks to test the solutions and check which one best matches your particular performance and scaling requirements.
The most efficient way is to store the whole book as one object, and duplicate its pages as separate objects.
Pros:
you will be able to select any object by its key (the cheapest operation in Riak is a KV query)
every query will have predictable latency
this is the natural way of storing data in Riak
Cons:
If you need to update any page, you must update the whole book and then the page. As Riak doesn't have atomic operations, you must think about how to recover from any failure situation (for example: the book was updated, but the page was not).
Riak is about availability and predictable latency, so if you use something like 2i to collect results, the query time becomes unpredictable and will grow with the number of pages.
