In Firebase, is modeling many-to-many relationships using a separate endpoint a good idea?

Suppose I have a typical users & groups data model where a user can be in many groups and a group can have many users. It seems to me that the Firebase docs recommend that I model my data by replicating user IDs inside groups and group IDs inside users, like this:
{
  "usergroups": {
    "bob": {
      "groups": {
        "one": true,
        "two": true
      }
    },
    "fred": {
      "groups": {
        "one": true
      }
    }
  },
  "groupusers": {
    "one": {
      "users": {
        "bob": true,
        "fred": true
      }
    },
    "two": {
      "users": {
        "bob": true
      }
    }
  }
}
In order to maintain this structure, whenever my app updates one side of the relationship (e.g., adds a user to a group), it also needs to update the other side of the relationship (e.g., add the group to the user).
I'm concerned that eventually someone's computer will crash in the middle of an update or something else will go wrong and the two sides of the relationship will get out of sync. Ideally I'd like to put the updates inside a transaction so that either both sides get updated or neither side does, but as far as I can tell I can't do that with the current transaction support in firebase.
Another approach would be to use the upcoming firebase triggers to update the other side of the relationship, but triggers are not available yet and it seems like a pretty heavyweight solution to post a message to an external server just to have that server keep redundant data up to date.
So I'm thinking about another approach, where the many-to-many user-group memberships are stored at a separate endpoint:
{
  "memberships": {
    "id1": {
      "user": "bob",
      "group": "one"
    },
    "id2": {
      "user": "bob",
      "group": "two"
    },
    "id3": {
      "user": "fred",
      "group": "one"
    }
  }
}
I can add indexes on "user" and "group", and issue the Firebase queries .orderByChild("user").equalTo(...) and .orderByChild("group").equalTo(...) to determine the groups for a particular user and the users for a particular group, respectively.
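To make the lookups concrete, here's a small sketch in plain JavaScript of what those two queries compute, using the memberships object above. The real Firebase calls would be ref.child("memberships").orderByChild("user").equalTo("bob") and the analogous one for "group"; the helper names here are invented for illustration:

```javascript
// The memberships node from the example above, as a plain object.
var memberships = {
  id1: { user: "bob", group: "one" },
  id2: { user: "bob", group: "two" },
  id3: { user: "fred", group: "one" }
};

// Equivalent of .orderByChild("user").equalTo(uid): all groups for a user.
function groupsForUser(memberships, uid) {
  return Object.keys(memberships)
    .filter(function (id) { return memberships[id].user === uid; })
    .map(function (id) { return memberships[id].group; });
}

// Equivalent of .orderByChild("group").equalTo(gid): all users in a group.
function usersInGroup(memberships, gid) {
  return Object.keys(memberships)
    .filter(function (id) { return memberships[id].group === gid; })
    .map(function (id) { return memberships[id].user; });
}
```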
What are the downsides to this approach? We no longer have to maintain redundant data, so why is this not the recommended approach? Is it significantly slower than the recommended replicate-the-data approach?

In the design you propose, you'd always need to access three locations to show a user and her groups:
1. the users child, to determine the properties of the user
2. the memberships, to determine what groups she's a member of
3. the groups child, to determine the properties of each group
In the denormalized example from the documentation, your code would only need to access #1 and #3, since the membership information is embedded into both users and groups.
If you denormalize one step further, you'd end up storing all relevant group information for each user and all relevant user information for each group. With such a data structure, you'd only need to read a single location to show all information for a group or a user.
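As a sketch (the group and user names embedded here are invented), that fully denormalized structure might look like:

```json
{
  "usergroups": {
    "bob": {
      "groups": {
        "one": { "name": "Group One" },
        "two": { "name": "Group Two" }
      }
    }
  },
  "groupusers": {
    "one": {
      "users": {
        "bob": { "name": "Bob" },
        "fred": { "name": "Fred" }
      }
    }
  }
}
```

Reading /groupusers/one now yields everything needed to render that group's member list in a single request.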
Redundancy is not necessarily a bad thing in a NoSQL database; it is often introduced precisely because it speeds up reads.
For the moment I would go with a secondary process that periodically scans the data and reconciles any irregular data it finds. Of course that also means that regular client code needs to be robust enough to handle such irregular data (e.g. a group that points to a user, where that user's record doesn't point to the group).
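A minimal sketch of what such a reconciliation pass might check, in plain JavaScript over the two nodes from the first example. The function name is invented; a real process would also need the mirror check (group links with no user backlink) and a way to repair what it finds:

```javascript
// Find user->group links that have no matching group->user backlink.
function findMissingBacklinks(usergroups, groupusers) {
  var missing = [];
  Object.keys(usergroups).forEach(function (uid) {
    var groups = usergroups[uid].groups || {};
    Object.keys(groups).forEach(function (gid) {
      var users = (groupusers[gid] && groupusers[gid].users) || {};
      if (users[uid] !== true) {
        missing.push({ user: uid, group: gid });
      }
    });
  });
  return missing;
}
```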
Alternatively, you could set up some advanced .validate rules that ensure the two sides are always in sync. I've just always found that this takes more time to implement, so I never bothered.
You might also want to read this answer: Firebase data structure and url

Related

Firebase RTD, atomic "move" ... delete and add from two "tables"?

In Firebase Realtime Database, it's a pretty common transactional thing that you have
"table" A - think of it as "pending"
"table" B - think of it as "results"
Some state happens, and you need to "move" an item from A to B.
To be clear, this would likely be done by a Cloud Function.
Obviously, this operation has to be atomic, and you have to be guarded against race conditions and so on.
So, for item 123456, you have to do three things
read A/123456/
delete A/123456/
write the value to B/123456
all atomically, with a lock.
In short what is the Firebase way to achieve this?
There's already the awesome ref.transaction system, but I don't think it's relevant here.
Perhaps using triggers in a perverted manner?
IDK
Just for anyone googling here, it's worth noting that the mind-boggling new Firestore (it's hard to imagine anything being more mind-boggling than traditional Firebase, but there you have it...) has built-in .......
This question is about good old traditional Firebase Realtime.
Gustavo's answer allows the update to happen with a single API call, which either completely succeeds or fails. And since it doesn't have to use a transaction, it has far fewer contention issues. It just loads the value from the key it wants to move, and then writes a single update.
The problem is that somebody might have modified the data in the meantime. So you need to use security rules to catch that situation and reject it. So the recipe becomes:
read the value of the source node
write the value to its new location while deleting the old location in a single update() call
the security rules validate the operation, either accepting or rejecting it
if rejected, the client retries from #1
Doing so essentially reimplements Firebase Database transactions with client-side code and (some admittedly tricky) security rules.
To be able to do this, the update becomes a bit more tricky. Say that we have this structure:
"key1": "value1",
"key2": "value2"
And we want to move value1 from key1 to key3, then Gustavo's approach would send this JSON:
ref.update({
  "key1": null,
  "key3": "value1"
})
We can easily validate this operation with these rules:
".validate": "
  !data.child('key3').exists() &&
  !newData.child('key1').exists() &&
  newData.child('key3').val() === data.child('key1').val()
"
In words:
There is currently no value in key3.
There is no value in key1 after the update.
The new value of key3 is the current value of key1.
This works great, but unfortunately means that we're hardcoding key1 and key3 in our rules. To prevent hardcoding them, we can add the keys to our update statement:
ref.update({
  _fromKey: "key1",
  _toKey: "key3",
  key1: null,
  key3: "value1"
})
The difference is that we added two keys with known names to indicate the source and destination of the move. Now with this structure we have all the information we need, and we can validate the move with:
".validate": "
  !data.child(newData.child('_toKey').val()).exists() &&
  !newData.child(newData.child('_fromKey').val()).exists() &&
  newData.child(newData.child('_toKey').val()).val() === data.child(newData.child('_fromKey').val()).val()
"
It's a bit longer to read, but each line still means the same as before.
And in the client code we'd do:
function move(from, to) {
  ref.child(from).once("value").then(function(snapshot) {
    var value = snapshot.val();
    var updates = {
      _fromKey: from,
      _toKey: to
    };
    updates[from] = null;
    updates[to] = value;
    ref.update(updates).catch(function() {
      // the update failed, wait half a second and try again
      setTimeout(function() {
        move(from, to);
      }, 500);
    });
  });
}
move("key1", "key3");
If you feel like playing around with the code for these rules, have a look at: https://jsbin.com/munosih/edit?js,console
There are no "tables" in Realtime Database, so I'll use the term "location" instead to refer to a path that contains some child nodes.
Realtime Database provides no way to atomically transaction on two different locations. When you perform a transaction, you have to choose a single location, and you may only make changes under that single location.
You might think that you could just transact at the root of the database. This is possible, but those transactions may fail in the face of concurrent non-transaction write operations anywhere within the database. It's a requirement that there must be no non-transactional writes anywhere at the location where transactions take place. In other words, if you want to transact at a location, all clients must be transacting there, and no clients may write there without a transaction.
This rule is certainly going to be problematic if you transact at the root of your database, where clients are probably writing data all over the place without transactions. So, if you want perform an atomic "move", you'll either have to make all your clients use transactions all the time at the common root location for the move, or accept that you can't do this truly atomically.
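If you do choose a common ancestor that only transactional writers touch, the update function you'd pass to ref.transaction could look like this sketch (plain JavaScript; the node names A and B follow the example above, and makeMove is an invented helper):

```javascript
// Update function for ref.transaction() on the common parent of A and B.
// Moves the child at A/<id> to B/<id>; returns the new parent value,
// or the parent unchanged if there is nothing to move.
function makeMove(id) {
  return function (parent) {
    parent = parent || { A: {}, B: {} };
    if (!parent.A || !(id in parent.A)) {
      return parent; // nothing to move
    }
    var value = parent.A[id];
    delete parent.A[id];
    parent.B = parent.B || {};
    parent.B[id] = value;
    return parent;
  };
}

// Usage (assuming parentRef covers both A and B):
// parentRef.transaction(makeMove("123456"));
```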
Firebase works with dictionaries, a.k.a. key-value pairs. To change data at more than one location in the same operation, you can take the base reference and pass a dictionary containing "all the instructions", for instance in Swift:
let reference = Database.database().reference() // base reference
let tableADict = ["TableA/SomeID" : NSNull()] // value that will be deleted on table A
let tableBDict = ["TableB/SomeID" : true] // value that will be appended on table B, instead of true you can put another dictionary, containing your values
You should then merge both dictionaries into one (how to do it here: How do you add a Dictionary of items into another Dictionary); let's call it finalDict.
Then you can update those values, and both tables will be updated at the same time, deleting from A and "moving to" B:
reference.updateChildValues(finalDict) // update everything on the same time with only one transaction, w/o having to wait for one callback to update another table
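For comparison, the same single-call move in the JavaScript SDK boils down to building one multi-location update dictionary (the paths here are invented to match the Swift example, and buildMoveUpdate is a hypothetical helper):

```javascript
// Build a multi-location update that deletes from "table" A
// and writes the value to "table" B in one call.
function buildMoveUpdate(id, value) {
  var updates = {};
  updates["TableA/" + id] = null;   // null deletes the node
  updates["TableB/" + id] = value;  // value appears at the new location
  return updates;
}

// Usage: firebase.database().ref().update(buildMoveUpdate("SomeID", true));
```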

Are Firebase Object Keys Guessable?

Say I have data like so:
{
  "users": {
    "$authId": {
      "name": "",
      "propertiesById": {
        "uniqueId1": true,
        "uniqueId2": true
      }
    }
  },
  "properties": {
    "uniqueId1": {
      "key": "val"
    },
    "uniqueId2": {
      "key": "val"
    }
  }
}
Assuming that I have the proper rules in place for users to be able to only read/write their own user object, how do I go about reading my properties safely?
I know that in rules, I can go to root, find the user object and ensure that the property that I'm trying to read exists in the propertiesById, but that seems like it would impact performance since these will be very large collections of properties.
How would I, if at all possible, go about writing a rule that says: "users cannot grab the entire properties collection, only individual properties by key"? Also, if I were able to write this rule, would it be safe? I'm unclear on how difficult it would be to guess the keys generated by Firebase push().
While Firebase push IDs are hard to guess, they are definitely guessable. You should not rely on the user not being able to guess a push ID for security.
Given that though, it is very easy to do precisely that:
{
  "rules": {
    "properties": {
      "$uniqueId": {
        ".read": true,
        ".write": true
      }
    }
  }
}
These rules allow any user to read any specific property, as long as they know that property exists. So no one can read /properties, but everyone can read /properties/uniqueId1 and /properties/uniqueId2.
But again: this is not secure as push IDs are at some level reasonably guessable.
From the Firebase blog post explaining push IDs:
We also get questions on whether developers can rely on push IDs to be unguessable by others, which can be important if you're trying to do security via unguessable Firebase paths. While push IDs are generally very hard to guess, if you’re relying on unguessable IDs you should generate them yourself using a more secure mechanism.

Should fields be added sparingly or generously in a GraphQL API?

This is a general question, I'm making an example just to better illustrate what I mean.
Assume I have a User model, and a Tournament model, and that the Tournament model has a key/value map of user ids and their scores. When exposing this as a GraphQL API, I could expose it more or less directly like so:
Schema {
  tournament: {
    scores: [{
      user: User
      score: Number
    }]
  }
  user($id: ID) {
    id: ID
    name: String
  }
}
This gives access to all the data. However, in many cases it might be useful to get a user's scores in a certain tournament, or going from the tournament, get a certain user's scores. In other words, there are many edges that seem handy that I could add:
Schema {
  tournament: {
    scores: [{
      user: User
      score: Number
    }]
    userScore($userID: ID): Number # New edge!
  }
  user($id: ID) {
    id: ID
    name: String
    tournamentScore($tournamentID: ID): Number # New edge!
  }
}
This would probably be more practical to consume for the client, covering more use cases in a handy way. On the other hand the more I expose, the more I have to maintain.
My question is: In general, is it better to be "generous" and expose many edges between nodes where applicable (because it makes it easier for the client), or is it better to code sparingly and only expose as much as needed to get the data (because it will be less to maintain)?
Of course, in this trivial example it won't make much difference either way, but I feel like these might be important questions when designing larger API's.
I could write it as a comment but I can't help emphasizing the following point as an answer:
Always always follow YAGNI principle. The less to maintain, the better. A good API design is not about how large it is, it's about how good it meets the needs, how easy it is to use.
You can always add the new fields (what you call edge in your example) later when you need them. KISS is good.
Or you could do this
Schema {
  tournament: {
    scores(user_ids: [ID]): [{
      user: User
      score: Number
    }]
  }
  user($id: ID) {
    id: ID
    name: String
    tournaments(tournament_ids: [ID]): [{
      tournament: Tournament
      score: Number
    }]
  }
}
and since user_ids and tournament_ids are not mandatory, a user can make the decision to get all edges, some, or one.
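For example (field names follow the schema sketch above; the ID values are invented), a client that only cares about one tournament could ask:

```graphql
{
  user(id: "42") {
    name
    tournaments(tournament_ids: ["t1"]) {
      score
    }
  }
}
```

Omitting tournament_ids entirely would instead return the user's score in every tournament.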

Users sees one part of deeply-nested state, should visible properties be at top level?

I'm working on a game. Originally, the user was in a single dungeon, with properties:
// state
{
  health: 95,
  creatures: [ {}, {} ],
  bigBoss: {},
  lightIsOn: true,
  goldReward: 54,
  // .. you get the idea
}
Now there are many kingdoms, and many dungeons, and we may want to fetch this data asynchronously.
Is it better to represent that deeply-nested structure in the user's state, effectively caching all the other possible dungeons as they are loaded? In that case, every time we want to update a property (e.g. the action TURN_ON_LIGHT), we need to find exactly which dungeon we're talking about. Or is it better to update the top-level properties every time we move to a new dungeon?
The state below shows nesting. Most of the information is irrelevant to my presentational objects and actions, they only care about the one dungeon the user is currently in.
// state with nesting
{
  health: 95,
  kingdom: 0,
  dungeon: 1,
  kingdoms: [
    {
      dungeons: [
        {
          creatures: [ {}, {} ],
          bigBoss: {},
          lightIsOn: true,
          goldReward: 54
        },
        {
          creatures: [ {}, {}, {} ],
          bigBoss: {},
          lightIsOn: false,
          goldReward: 79
        },
        {
          // ...
        }
      ]
    },
    {
      // ...
    }
  ]
}
One of the things that's holding me back is that all the clean reducers, which previously could just take an action like TURN_ON_LIGHT and update the top-level property lightIsOn, allowing for very straight-forward reducer composition, now have to reach into the state and update the correct property depending on the kingdom and dungeon that we are currently in. Is there a nice way of composing the reducers that would keep this clean?
The recommended approach for dealing with nested or relational data in Redux is to normalize it, similar to how you would structure a database. Use objects with IDs as keys and the items as values to allow direct lookup by IDs, use arrays of IDs to indicate ordering, and any other part of your state that needs to reference an item should just store the ID, not the item itself. This keeps your state flatter and makes it more straightforward to update a given item.
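A sketch of the dungeon state from the question in that normalized shape (the IDs and the selector name are invented), with a selector so components never walk the nesting themselves:

```javascript
// Normalized state: items keyed by ID, ordering kept in an array of IDs,
// and the "current" dungeon referenced by ID only.
var state = {
  health: 95,
  currentDungeonId: "d2",
  dungeons: {
    byId: {
      d1: { id: "d1", lightIsOn: true, goldReward: 54 },
      d2: { id: "d2", lightIsOn: false, goldReward: 79 }
    },
    allIds: ["d1", "d2"]
  }
};

// Selector: direct lookup by ID, no searching through nested arrays.
function getCurrentDungeon(state) {
  return state.dungeons.byId[state.currentDungeonId];
}
```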
As part of this, you can use multiple levels of connected components in your UI. One typical technique with Redux is to have a connected parent component that retrieves the IDs of multiple items, and renders <SomeConnectedChild itemID={itemID} /> for each ID. That connected child would then look up its own data using that ID, and pass the data to any presentational children below it. Actions dispatched from that subtree would reference the item's ID, and the reducers would be able to update the correct normalized item entry based on that.
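With the ID carried in the action, the reducer for the dungeons slice stays flat; a sketch under that assumption (the action shape and reducer name are invented):

```javascript
// Reducer over the dungeons-by-ID map; the action carries the target ID,
// so the reducer never needs to know about kingdoms or nesting.
function dungeonsById(byId, action) {
  switch (action.type) {
    case "TURN_ON_LIGHT": {
      var updated = Object.assign({}, byId);
      updated[action.dungeonId] = Object.assign({}, byId[action.dungeonId], {
        lightIsOn: true
      });
      return updated;
    }
    default:
      return byId;
  }
}
```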
The Redux FAQ has further discussion on this topic: http://redux.js.org/docs/FAQ.html#organizing-state-nested-data. Some of the articles on Redux performance at https://github.com/markerikson/react-redux-links/blob/master/react-performance.md#redux-performance describe the "pass an ID" approach, and https://medium.com/@adamrackis/querying-a-redux-store-37db8c7f3b0f is a good reference as well. Finally, I just gave an example of what a normalized state might look like over at https://github.com/reactjs/redux/issues/1824#issuecomment-228609501.
edit:
As a follow-up, I recently added a new section to the Redux docs, on the topic of "Structuring Reducers". In particular, this section includes chapters on "Normalizing State Shape" and "Updating Normalized Data".

How to structure data in Firebase to avoid N+1 selects?

Since Firebase security rules cannot be used to filter children, what's the best way to structure data for efficient queries in a basic multi-user application? I've read through several guides, but they seem to break down when scaled past the examples given.
Say you have a basic messaging application like WhatsApp. Users can open chats with other groups of users to send private messages between themselves. Here's my initial idea of how this could be organized in Firebase (a bit similar to this example from the docs):
{
  users: {
    $uid: {
      name: string,
      chats: {
        $chat_uid: true,
        $chat2_uid: true
      }
    }
  },
  chats: {
    $uid: {
      messages: {
        message1: 'first message',
        message2: 'another message'
      }
    }
  }
}
Firebase permissions could be set up to only let users read chats that are marked true in their user object (and restrict adding arbitrarily to the chats object, etc).
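A sketch of such a rule (node names follow the structure above; untested, and write restrictions are omitted for brevity):

```json
{
  "rules": {
    "chats": {
      "$chat_id": {
        ".read": "root.child('users').child(auth.uid).child('chats').child($chat_id).val() === true"
      }
    }
  }
}
```

Because .read lives under $chat_id rather than under chats, a user can read any single chat they belong to, but can never list the whole chats node.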
However this layout requires N+1 selects for several common scenarios. For example: to build the home screen, the app has to first retrieve the user's chats object, then make a get request for each thread to get its info. Same thing if a user wants to search their conversations for a specific string: the app has to run a separate request for every chat they have access to in order to see if it matches.
I'm tempted to set up a node.js server to run root-authenticated queries against the chats tree and skip the client-side firebase code altogether. But that's defeating the purpose of Firebase in the first place.
Is there a way to organize data like this using Firebase permissions and avoid the N+1 select problem?
It appears that n+1 queries do not necessarily need to be avoided, and that Firebase is engineered specifically to offer good performance when doing n+1 selects, despite this being counter-intuitive for developers coming from a relational-database background.
An example of n+1 in the Firebase 2.4.2 documentation is followed by a reassuring message:
// List the names of all Mary's groups
var ref = new Firebase("https://docs-examples.firebaseio.com/web/org");
// fetch a list of Mary's groups
ref.child("users/mchen/groups").on('child_added', function(snapshot) {
  // for each group, fetch the name and print it
  var groupKey = snapshot.key();
  ref.child("groups/" + groupKey + "/name").once('value', function(snapshot) {
    console.log("Mary is a member of this group: " + snapshot.val());
  });
});
Is it really okay to look up each record individually? Yes. The Firebase protocol uses web sockets, and the client libraries do a great deal of internal optimization of incoming and outgoing requests. Until we get into tens of thousands of records, this approach is perfectly reasonable. In fact, the time required to download the data (i.e. the byte count) eclipses any other concerns regarding connection overhead.
