Should I be further denormalizing? [duplicate] - firebase

I've read the Firebase docs on Structuring Data. Data storage is cheap, but the user's time is not. We should optimize for get operations, and write in multiple places.
So then I might store a list node and a list-index node, with some duplicated data between the two, at the very least the list name.
I'm using ES6 and promises in my JavaScript app to handle the async flow, mainly of fetching a ref key from Firebase after the first data push.
let addIndexPromise = new Promise( (resolve, reject) => {
let newRef = ref.child('list-index').push(newItem);
resolve( newRef.key()); // ignore reject() for brevity
});
addIndexPromise.then( key => {
ref.child('list').child(key).set(newItem);
});
How do I make sure the data stays in sync in all places, knowing my app runs only on the client?
For sanity check, I set a setTimeout in my promise and shut my browser before it resolved, and indeed my database was no longer consistent, with an extra index saved without a corresponding list.
Any advice?

Great question. I know of three approaches to this, which I'll list below.
I'll take a slightly different example for this, mostly because it allows me to use more concrete terms in the explanation.
Say we have a chat application, where we store two entities: messages and users. In the screen where we show the messages, we also show the name of the user. So to minimize the number of reads, we store the name of the user with each chat message too.
users
so:209103
name: "Frank van Puffelen"
location: "San Francisco, CA"
questionCount: 12
so:3648524
name: "legolandbridge"
location: "London, Prague, Barcelona"
questionCount: 4
messages
-Jabhsay3487
message: "How to write denormalized data in Firebase"
user: so:3648524
username: "legolandbridge"
-Jabhsay3591
message: "Great question."
user: so:209103
username: "Frank van Puffelen"
-Jabhsay3595
message: "I know of three approaches, which I'll list below."
user: so:209103
username: "Frank van Puffelen"
So we store the primary copy of the user's profile in the users node. In the message we store the uid (so:209103 and so:3648524) so that we can look up the user. But we also store the user's name in the messages, so that we don't have to look this up for each user when we want to display a list of messages.
So what happens when I go to the Profile page on the chat service and change my name from "Frank van Puffelen" to just "puf"?
Transactional update
Performing a transactional update is probably the first approach that comes to mind for most developers. We always want the username in messages to match the name in the corresponding profile.
Using multipath writes (added on 20150925)
Since Firebase 2.3 (for JavaScript) and 2.4 (for Android and iOS), you can achieve atomic updates quite easily by using a single multi-path update:
function renameUser(ref, uid, name) {
  var updates = {}; // all paths to be updated and their new values
  updates['users/'+uid+'/name'] = name;
  var query = ref.child('messages').orderByChild('user').equalTo(uid);
  query.once('value', function(snapshot) {
    snapshot.forEach(function(messageSnapshot) {
      updates['messages/'+messageSnapshot.key()+'/username'] = name;
    });
    ref.update(updates);
  });
}
This will send a single update command to Firebase that updates the user's name in their profile and in each message.
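The map that gets handed to update() can also be built separately from the query, which makes the fan-out easier to inspect. A minimal sketch, assuming the messages have already been loaded into a plain object keyed by push ID; buildRenameUpdates is a hypothetical helper, not part of the Firebase API:

```javascript
// Hypothetical helper: collects every path that must change for a rename
// into one object, suitable for a single multi-path ref.update(updates).
function buildRenameUpdates(uid, name, messages) {
  var updates = {};
  updates['users/' + uid + '/name'] = name;
  Object.keys(messages).forEach(function (key) {
    // Only messages written by this user carry a copy of the name.
    if (messages[key].user === uid) {
      updates['messages/' + key + '/username'] = name;
    }
  });
  return updates;
}

// Sample data mirroring the structure shown earlier.
var sampleMessages = {
  '-Jabhsay3487': { user: 'so:3648524', username: 'legolandbridge' },
  '-Jabhsay3591': { user: 'so:209103', username: 'Frank van Puffelen' },
  '-Jabhsay3595': { user: 'so:209103', username: 'Frank van Puffelen' }
};
var renameUpdates = buildRenameUpdates('so:209103', 'puf', sampleMessages);
console.log(Object.keys(renameUpdates).length); // 3 paths: one profile, two messages
```

Because all three paths travel in one update, either every copy of the name changes or none does.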
Previous atomic approach
So when the user changes the name in their profile:
var ref = new Firebase('https://mychat.firebaseio.com/');
var uid = "so:209103";
var nameInProfileRef = ref.child('users').child(uid).child('name');
nameInProfileRef.transaction(function(currentName) {
return "puf";
}, function(error, committed, snapshot) {
if (error) {
console.log('Transaction failed abnormally!', error);
} else if (!committed) {
console.log('Transaction aborted by our code.');
} else {
console.log('Name updated in profile, now update it in the messages');
var query = ref.child('messages').orderByChild('user').equalTo(uid);
query.on('child_added', function(messageSnapshot) {
messageSnapshot.ref().update({ username: "puf" });
});
}
console.log("Wilma's data: ", snapshot.val());
}, false /* don't apply the change locally */);
Pretty involved and the astute reader will notice that I cheat in the handling of the messages. First cheat is that I never call off for the listener, but I also don't use a transaction.
If we want to securely do this type of operation from the client, we'd need:
security rules that ensure the names in both places match. But the rules need to allow enough flexibility for them to temporarily be different while we're changing the name. So this turns into a pretty painful two-phase commit scheme.
change all username fields for messages by so:209103 to null (some magic value)
change the name of user so:209103 to 'puf'
change the username in every message by so:209103 that is null to puf.
that query requires an and of two conditions, which Firebase queries don't support. So we'll end up with an extra property uid_plus_name (with value so:209103_puf) that we can query on.
client-side code that handles all these transitions transactionally.
This type of approach makes my head hurt. And usually that means that I'm doing something wrong. But even if it's the right approach, with a head that hurts I'm way more likely to make coding mistakes. So I prefer to look for a simpler solution.
Eventual consistency
Update (20150925): Firebase released a feature to allow atomic writes to multiple paths. This works similarly to the approach below, but with a single command. See the updated section above to read how this works.
The second approach depends on splitting the user action ("I want to change my name to 'puf'") from the implications of that action ("We need to update the name in profile so:209103 and in every message that has user = so:209103").
I'd handle the rename in a script that we run on a server. The main method would be something like this:
function renameUser(ref, uid, name) {
  ref.child('users').child(uid).update({ name: name });
  var query = ref.child('messages').orderByChild('user').equalTo(uid);
  query.once('value', function(snapshot) {
    snapshot.forEach(function(messageSnapshot) {
      // a snapshot has no update(); write through its reference
      messageSnapshot.ref().update({ username: name });
    });
  });
}
Once again I take a few shortcuts here, such as using once('value' (which is in general a bad idea for optimal performance with Firebase). But overall the approach is simpler, at the cost of not having all data completely updated at the same time. But eventually the messages will all be updated to match the new value.
Not caring
The third approach is the simplest of all: in many cases you don't really have to update the duplicated data at all. In the example we've used here, you could say that each message recorded the name as I used it at that time. I didn't change my name until just now, so it makes sense that older messages show the name I used at that time. This applies in many cases where the secondary data is transactional in nature. It doesn't apply everywhere of course, but where it applies "not caring" is the simplest approach of all.
Summary
While the above are just broad descriptions of how you could solve this problem and they are definitely not complete, I find that each time I need to fan out duplicate data it comes back to one of these basic approaches.

To add to Frank's great reply, I implemented the eventual consistency approach with a set of Firebase Cloud Functions. The functions get triggered whenever a primary value (e.g. the user's name) gets changed, and then propagate the changes to the denormalized fields.
It is not as fast as a transaction, but for many cases it does not need to be.

Related

Firebase RTD, atomic "move" ... delete and add from two "tables"?

In Firebase Realtime Database, it's a pretty common transactional thing that you have
"table" A - think of it as "pending"
"table" B - think of it as "results"
Some state happens, and you need to "move" an item from A to B.
So, I certainly mean this would likely be a cloud function doing this.
Obviously, this operation has to be atomic and you have to be guarded against racetrack effects and so on.
So, for item 123456, you have to do three things
read A/123456/
delete A/123456/
write the value to B/123456
all atomically, with a lock.
In short what is the Firebase way to achieve this?
There's already the awesome ref.transaction system, but I don't think it's relevant here.
Perhaps using triggers in a perverted manner?
IDK
Just for anyone googling here, it's worth noting that the mind-boggling new Firestore (it's hard to imagine anything being more mind-boggling than traditional Firebase, but there you have it...), the new Firestore system has built-in .......
This question is about good old traditional Firebase Realtime.
Gustavo's answer allows the update to happen with a single API call, which either completely succeeds or fails. And since it doesn't have to use a transaction, it has far fewer contention issues. It just loads the value from the key it wants to move, and then writes a single update.
The problem is that somebody might have modified the data in the meantime. So you need to use security rules to catch that situation and reject it. So the recipe becomes:
read the value of the source node
write the value to its new location while deleting the old location in a single update() call
the security rules validate the operation, either accepting or rejecting it
if rejected, the client retries from #1
Doing so essentially reimplements Firebase Database transactions with client-side code and (some admittedly tricky) security rules.
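The recipe above can be simulated without a database to see why the retry matters. A minimal sketch, assuming a plain object stands in for the data and a hypothetical applyMove function stands in for the server applying the update and evaluating the rules; none of this is Firebase API:

```javascript
// In-memory stand-in for the server side: applies a multi-path update to a
// copy of the data and rejects it unless the "move" conditions hold.
// Illustrative only; the real check lives in the security rules.
function applyMove(data, updates, from, to) {
  var newData = Object.assign({}, data);
  Object.keys(updates).forEach(function (key) {
    if (updates[key] === null) {
      delete newData[key];          // writing null deletes the node
    } else {
      newData[key] = updates[key];
    }
  });
  var valid =
    !(to in data) &&                // destination is empty before the update
    !(from in newData) &&           // source is gone after the update
    newData[to] === data[from];     // destination holds the source's current value
  if (!valid) {
    throw new Error('PERMISSION_DENIED');
  }
  return newData;
}

var db = { key1: 'value1', key2: 'value2' };

// Someone changes key1 after we read it, so our first attempt is stale.
var staleValue = db.key1;
db = Object.assign({}, db, { key1: 'changed' });

var attempts = 0;
var moved = null;
while (moved === null) {
  attempts++;
  try {
    moved = applyMove(db, { key1: null, key3: staleValue }, 'key1', 'key3');
  } catch (e) {
    staleValue = db.key1;           // re-read the source and retry (step 1)
  }
}
console.log(attempts, moved); // 2 attempts; key1 is gone and key3 holds 'changed'
```

The first attempt is rejected because the value being written no longer matches the source, which is exactly the situation the rules are meant to catch.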
To be able to do this, the update becomes a bit more tricky. Say that we have this structure:
"key1": "value1",
"key2": "value2"
And we want to move value1 from key1 to key3, then Gustavo's approach would send this JSON:
ref.update({
"key1": null,
"key3": "value1"
})
We can easily validate this operation with these rules:
".validate": "
!data.child('key3').exists() &&
!newData.child('key1').exists() &&
newData.child('key3').val() === data.child('key1').val()
"
In words:
There is currently no value in key3.
There is no value in key1 after the update
The new value of key3 is the current value of key1
This works great, but unfortunately means that we're hardcoding key1 and key3 in our rules. To prevent hardcoding them, we can add the keys to our update statement:
ref.update({
_fromKey: "key1",
_toKey: "key3",
key1: null,
key3: "value1"
})
The difference is that we added two keys with known names, to indicate the source and destination of the move. Now with this structure we have all the information we need, and we can validate the move with:
".validate": "
!data.child(newData.child('_toKey').val()).exists() &&
!newData.child(newData.child('_fromKey').val()).exists() &&
newData.child(newData.child('_toKey').val()).val() === data.child(newData.child('_fromKey').val()).val()
"
It's a bit longer to read, but each line still means the same as before.
And in the client code we'd do:
function move(from, to) {
  ref.child(from).once("value").then(function(snapshot) {
    var value = snapshot.val();
    var updates = {
      _fromKey: from,
      _toKey: to
    };
    updates[from] = null;
    updates[to] = value;
    ref.update(updates).catch(function() {
      // the update failed, wait half a second and try again
      setTimeout(function() {
        move(from, to);
      }, 500);
    });
  });
}
move("key1", "key3");
If you feel like playing around with the code for these rules, have a look at: https://jsbin.com/munosih/edit?js,console
There are no "tables" in Realtime Database, so I'll use the term "location" instead to refer to a path that contains some child nodes.
Realtime Database provides no way to atomically transact on two different locations. When you perform a transaction, you have to choose a single location, and you may only make changes under that single location.
You might think that you could just transact at the root of the database. This is possible, but those transactions may fail in the face of concurrent non-transaction write operations anywhere within the database. It's a requirement that there must be no non-transactional writes anywhere at the location where transactions take place. In other words, if you want to transact at a location, all clients must be transacting there, and no clients may write there without a transaction.
This rule is certainly going to be problematic if you transact at the root of your database, where clients are probably writing data all over the place without transactions. So, if you want to perform an atomic "move", you'll either have to make all your clients use transactions all the time at the common root location for the move, or accept that you can't do this truly atomically.
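If both locations do sit under a common parent that every client writes to only through transactions, the move itself reduces to a small update function. A minimal sketch, assuming children named A and B under that parent; moveItem and parentRef are hypothetical names, and the transaction call is shown only as a comment:

```javascript
// Hypothetical update function for a transaction on the common parent of
// A and B. It takes the current value of the parent and returns the new one.
function moveItem(current, id) {
  if (current && current.A && id in current.A) {
    current.B = current.B || {};
    current.B[id] = current.A[id];
    delete current.A[id];
  }
  return current; // unchanged if there was nothing to move
}

// Assumed wiring (not run here):
// parentRef.transaction(function (current) { return moveItem(current, '123456'); });

var before = { A: { '123456': { state: 'done' } }, B: {} };
var afterMove = moveItem(before, '123456');
console.log(JSON.stringify(afterMove)); // {"A":{},"B":{"123456":{"state":"done"}}}
```

The function returns its input untouched when the item is missing, so a rerun after a concurrent move simply becomes a no-op.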
Firebase works with dictionaries, a.k.a. key-value pairs. To change data in more than one table in the same operation, you can take the base reference and a dictionary containing "all the instructions", for instance in Swift:
let reference = Database.database().reference() // base reference
let tableADict = ["TableA/SomeID" : NSNull()] // value that will be deleted on table A
let tableBDict = ["TableB/SomeID" : true] // value that will be appended on table B, instead of true you can put another dictionary, containing your values
You should then merge both dictionaries into one (how to do it here: How do you add a Dictionary of items into another Dictionary); let's call it finalDict.
Then you can update those values, and both tables will be updated, deleting from A and "moving to" B:
reference.updateChildValues(finalDict) // update everything on the same time with only one transaction, w/o having to wait for one callback to update another table

How to structure data in Firebase to avoid N+1 selects?

Since Firebase security rules cannot be used to filter children, what's the best way to structure data for efficient queries in a basic multi-user application? I've read through several guides, but they seem to break down when scaled past the examples given.
Say you have a basic messaging application like WhatsApp. Users can open chats with other groups of users to send private messages between themselves. Here's my initial idea of how this could be organized in Firebase (a bit similar to this example from the docs):
{
users: {
$uid: {
name: string,
chats: {
$chat_uid : true,
$chat2_uid: true
}
}
},
chats: {
$uid: {
messages: {
message1: 'first message',
message2: 'another message'
}
}
}
}
Firebase permissions could be set up to only let users read chats that are marked true in their user object (and restrict adding arbitrarily to the chats object, etc).
However this layout requires N+1 selects for several common scenarios. For example: to build the home screen, the app has to first retrieve the user's chats object, then make a get request for each thread to get its info. Same thing if a user wants to search their conversations for a specific string: the app has to run a separate request for every chat they have access to in order to see if it matches.
I'm tempted to set up a node.js server to run root-authenticated queries against the chats tree and skip the client-side firebase code altogether. But that's defeating the purpose of Firebase in the first place.
Is there a way to organize data like this using Firebase permissions and avoid the N+1 select problem?
It appears that n+1 queries do not necessarily need to be avoided and that Firebase is engineered specifically to offer good performance when doing n+1 selects, despite being counter-intuitive for developers coming from a relational database background.
An example of n+1 in the Firebase 2.4.2 documentation is followed by a reassuring message:
// List the names of all Mary's groups
var ref = new Firebase("https://docs-examples.firebaseio.com/web/org");
// fetch a list of Mary's groups
ref.child("users/mchen/groups").on('child_added', function(snapshot) {
  // for each group, fetch the name and print it
  var groupKey = snapshot.key();
  ref.child("groups/" + groupKey + "/name").once('value', function(snapshot) {
    console.log("Mary is a member of this group: " + snapshot.val());
  });
});
Is it really okay to look up each record individually? Yes. The Firebase protocol uses web sockets, and the client libraries do a great deal of internal optimization of incoming and outgoing requests. Until we get into tens of thousands of records, this approach is perfectly reasonable. In fact, the time required to download the data (i.e. the byte count) eclipses any other concerns regarding connection overhead.

How to prevent collection modification via console for an otherwise secure update operation?

I am fairly new to Meteor and am just trying to figure out meteor security.
I am writing a quiz app that allows a logged in user to save their scores. I have created a collection which consists of a user id and an array of scores. The way I expose a push of new score is a method on the server side:
Meteor.methods({
'pushScore' : function(playerId, playerScore) {
UserScores.upsert({ userId : playerId}, {$push : {scores : playerScore}});
}
});
I call the method on click of a button from the client like so:
if (Meteor.userId()){
Meteor.call('pushScore', Meteor.userId(), Session.get("score"));
}
I have the following concerns here:
Obviously the user can manipulate the score value in "Session" and cheat the system. What could be an alternate secure mechanism to keep track of the running score while a quiz is being taken?
The other one is probably a bigger concern. How do I prevent the user from just firing a console call to my method "pushScore" and again cheat the system by adding, say a score of 100?
Is there an inherent flaw in the way I have designed here?
This is just a sample application, but I can easily imagine a real world scenario which could mimic this. What would be a best practice in such a scenario?
Thanks in advance.
Cheers..
As @Peppe suggested, you should move the logic to the server somehow. The main rule for Meteor security (and web security in general) is
You cannot trust the client.
The reason for that is what you've already mentioned: if there is something a client can do, then there is no way to stop a rogue user from doing the same thing from the browser console, or even from writing their own malicious client that exploits the leak.
In your case, that means that if the client is able to add points to scores, then the user is able to do so as well, regardless of what security measures you employ. You can make this more or less difficult, but your system has a designed leak which cannot be completely closed.
Thus, the only bulletproof solution is to make the server decide when to assign points. I assume that in a quiz app the user gets points when they choose the right answer to a question. So instead of checking that on the client, create a server-side method that receives the question ID and answer ID, and increases the user's score if the answer is correct. Then make sure the user cannot just call this method with every possible answer, in a way that fits your quiz design: for example, give negative points if a wrong answer is chosen, or allow answering the same question only once in a period of time.
Finally, make sure the client doesn't just get the correct answer ID in the data it receives.
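The scoring decision described above can be sketched as a plain function that the server-side method would call. This is a minimal sketch under stated assumptions: QUESTIONS, gradeAnswer, and the in-memory answered map are hypothetical stand-ins for whatever collections the app uses, and inside a real Meteor method the user would come from this.userId rather than from an argument.

```javascript
// Hypothetical server-only data: the correct answers never leave the server.
var QUESTIONS = {
  q1: { correctAnswerId: 'a2', points: 10 },
  q2: { correctAnswerId: 'a1', points: 10 }
};

function has(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Decides how many points an answer earns. `answered` records the question
// IDs this user has already answered, so each question counts only once.
function gradeAnswer(answered, questionId, answerId) {
  if (!has(QUESTIONS, questionId)) {
    throw new Error('Unknown question: ' + questionId);
  }
  if (has(answered, questionId)) {
    return 0; // repeated attempts earn nothing
  }
  answered[questionId] = true;
  var question = QUESTIONS[questionId];
  return question.correctAnswerId === answerId ? question.points : 0;
}

var answered = {};
var total = 0;
total += gradeAnswer(answered, 'q1', 'a2'); // correct
total += gradeAnswer(answered, 'q1', 'a2'); // replayed from the console, ignored
total += gradeAnswer(answered, 'q2', 'a3'); // wrong
console.log(total); // 10
```

The client only ever sends a question ID and an answer ID; it never supplies a score, so there is nothing useful to forge from the console.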
In a nutshell, there are 2 common solutions to your problem:
if you're using a Meteor.method, don't pass any arguments in the Meteor.call; the server can and should gather the data it plans to insert/update on the server side.
you can add a validation function to the collection using the collection "allow" method to verify any updates from the client, in that case you don't need the Meteor.method and can just update from the client and validate it server-side.
Security (insert/update/delete operations) in meteor works in the same way as security in any other framework: before executing an action taken by the user, make sure the user has the rights to perform it. Security may appear as a weakness in Meteor, but it does not suffer from it any more than other frameworks (though, it's easier to exploit it in Meteor through the console).
The best way to solve it probably varies from case to case, but here's an example: if a user posts a post, the user should gain 5 points. Here's a bad way to solve it:
if(Meteor.isClient){
// Insert the post and increase points.
Posts.insert({userId: Meteor.userId(), post: "The post."})
Meteor.users.update(Meteor.userId(), {$inc: {'profile.points': 5}})
}
if(Meteor.isServer){
Posts.allow({
insert: function(userId, doc){
check(doc, {
_id: String,
userId: String,
post: String
})
// You must be yourself.
if(doc.userId != userId){
return false
}
return true
}
})
Meteor.users.allow({
update: function(userId, doc, fieldNames, modifier){
check(modifier, {
$inc: {
'profile.points': Number
}
})
if(modifier.$inc['profile.points'] != 5){
return false
}
return true
}
})
}
What makes it bad? The user can increase his points without posting a post. Here's a better solution:
if(Meteor.isClient){
// Insert the post and increase points.
Meteor.call('postAndIncrease', {userId: Meteor.userId(), post: "The post."})
}
if(Meteor.isServer){
Meteor.methods({
postAndIncrease: function(post){
check(post, {
userId: String,
post: String
})
// You must be yourself.
if(post.userId != this.userId){
return false
}
Posts.insert(post)
Meteor.users.update(this.userId, {$inc: {'profile.points': 5}})
}
})
}
Better, but still bad. Why? Because of the latency (the post is created on the server, not the client). Here's a better solution:
if(Meteor.isClient){
// Insert the post and increase points.
Posts.insert({userId: Meteor.userId(), post: "The post."})
}
if(Meteor.isServer){
Posts.allow({
insert: function(userId, doc){
check(doc, {
_id: String,
userId: String,
post: String
})
// You must be yourself.
if(doc.userId != userId){
return false
}
return true
}
})
Posts.find().observe({
added: function(post){
// When new posts are added, the user gain the points.
Meteor.users.update(post.userId, {$inc: {'profile.points': 5}})
}
})
}
The only disadvantage this solution suffers from is the latency of the increment of the points, but it is something we must live with (at least at the moment). Using observe on the server may also be a disadvantage, but I think you can get past it by using the collection-hooks package instead.

Meteor.users information from collection helpers

I'm trying to get some information via the collection-helpers package for non-logged in users and I'm obviously missing something fundamental here as I'm getting nowhere.
I have a relationship set up that is happily returning the profile.name element for the owner of a document, as long as that happens to coincide with the logged in user, but I'm getting nothing back for non-logged in users (because of the security on the client side).
I've added a new publication on both client and server as
// User Profile
Meteor.publish("userProfile", function() {
return Meteor.users.find({_id: this.userId},
{fields: {'profile': 1}});
});
and have subscribed to this publication in the js associated with the page I'm trying to display it in
// Don't need this to be reactive, so
Meteor.subscribe("userProfile");
but am still not getting access to the profile data in the document with
<h4>Posted by: {{projOwner.profile.name}}</h4>
where projOwner looks like
projectDocs.helpers({
projOwner: function() {
console.log(this.owner._id);
var owner = Meteor.users.findOne(this.owner._id);
//console.log("Owner is: " +owner);
return owner;
}
});
What am I doing wrong??
In a publish function, this.userId is always the id of the currently logged in user. The profile of the current user is automatically published so that function doesn't do anything useful.
The real problem here is you need to get the correct subset of users published to the client. Maybe that's the project owner of the document you are looking at, maybe it's all of the users in a group, etc. Without knowing more about your problem it's hard to say.
An easy place to start with is just publishing all of the users to make sure your code works, and then try reducing the set. Remember that publish functions can take arguments, so you could pass in, for example, the id of a project and then publish the owner like so:
Meteor.publish('projectOwner', function(projectId) {
check(projectId, String);
var project = Projects.findOne(projectId);
return Meteor.users.find(project.owner, {
fields: {'profile': 1}
});
});

How to avoid race conditions on cursor.observe?

Race conditions
In my Meteor application, I set up an observe within a publish that inserts some new data under certain conditions. The point is that sometimes we have duplicated subscriptions, and a race condition leads to duplicate inserted data.
If it is not possible to have "singleton observers":
How can we avoid race conditions and duplicated inserted data on database?
Example:
Meteor.publish("fortuneUpdate", function () {
  // capture the user id here: inside the observe callback, `this` is no longer the publish context
  var userId = this.userId;
  var selector = {user: userId, seen:false};
  DailyFortunes.find(selector).observe({
    removed: function(doc, beforeIndex){
      if(DailyFortunes.find(selector).count()<1)
        createDailyFortune(userId);
    }
  });
});
This question has been moved from How cursor.observe works and how to avoid multiple instances running?
According to Tom, it is not possible, for now, to ensure that calls to subscribe that have the same arguments are shared.
So, if you are having the same problem I had, of redundant data created inside observers, I suggest, as a workaround, that you:
Create robust indexes that prevent repeated data creation. A compound key is probably what you need here.
Handle duplicate key error exceptions inside your observer, ignoring the ones caused by race conditions.
example:
Collection.find(selector).observe({
removed: function(document){
try {
// Workaround to avoid race conditions > https://stackoverflow.com/q/13095647/599991
createNewDocument();
} catch (e) {
// XXX string parsing sucks, maybe
// https://jira.mongodb.org/browse/SERVER-3069 will get fixed one day
if (e.name !== 'MongoError') throw e;
var match = e.err.match(/^E11000 duplicate key error index: ([^ ]+)/);
if (!match) throw e;
//if match, just do nothing.
}
self.flush();
}
});
This is an odd pattern. Can you share some example code?
Generally I'd either expect to see mutations in a method, or setting up an observe inside Meteor.startup() on the server. (The latter is tricky if you're running multiple server processes, but so are many other things in a multi process regime. We'll have a better pattern down the line.)
Because it can be arbitrary JS, a publish function has to run once per subscribing client. It may log new subscriptions, set up per-client server state, or vary its behavior based on this.userId or even a random source. For example, consider a subscription that returns 10 randomly selected documents from a DB collection to each subscribed client!
So the place to optimize the case of many clients subscribing to the same data set is at the DB query layer: if a thousand clients are subscribed to the same DB query, we'll just run that underlying query once.
