How to structure data in Firebase to avoid N+1 selects? - firebase

Since Firebase security rules cannot be used to filter children, what's the best way to structure data for efficient queries in a basic multi-user application? I've read through several guides, but they seem to break down when scaled past the examples given.
Say you have a basic messaging application like WhatsApp. Users can open chats with other groups of users to send private messages between themselves. Here's my initial idea of how this could be organized in Firebase (a bit similar to this example from the docs):
{
users: {
$uid: {
name: string,
chats: {
$chat_uid : true,
$chat2_uid: true
}
}
},
chats: {
$uid: {
messages: {
message1: 'first message',
message2: 'another message'
}
}
}
}
Firebase permissions could be set up to only let users read chats that are marked true in their user object (and restrict adding arbitrarily to the chats object, etc).
However this layout requires N+1 selects for several common scenarios. For example: to build the home screen, the app has to first retrieve the user's chats object, then make a get request for each thread to get its info. Same thing if a user wants to search their conversations for a specific string: the app has to run a separate request for every chat they have access to in order to see if it matches.
I'm tempted to set up a node.js server to run root-authenticated queries against the chats tree and skip the client-side firebase code altogether. But that's defeating the purpose of Firebase in the first place.
Is there a way to organize data like this using Firebase permissions and avoid the N+1 select problem?

It appears that N+1 queries do not necessarily need to be avoided: Firebase is engineered specifically to perform well when doing N+1 selects, counter-intuitive as that is for developers coming from a relational database background.
An example of n+1 in the Firebase 2.4.2 documentation is followed by a reassuring message:
// List the names of all Mary's groups
var ref = new Firebase("https://docs-examples.firebaseio.com/web/org");
// fetch a list of Mary's groups
ref.child("users/mchen/groups").on('child_added', function(snapshot) {
  // for each group, fetch the name and print it
  var groupKey = snapshot.key();
  ref.child("groups/" + groupKey + "/name").once('value', function(snapshot) {
    console.log("Mary is a member of this group: " + snapshot.val());
  });
});
Is it really okay to look up each record individually? Yes. The Firebase protocol uses web sockets, and the client libraries do a great deal of internal optimization of incoming and outgoing requests. Until we get into tens of thousands of records, this approach is perfectly reasonable. In fact, the time required to download the data (i.e. the byte count) eclipses any other concerns regarding connection overhead.
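As a rough illustration of the same idea with promises, the sketch below starts all N name lookups together and waits for them as a batch. The `db` object and the `readPath` helper are hypothetical in-memory stand-ins, not part of the Firebase API; in a real app each `readPath` call would be a `once('value')` read on the corresponding path.

```javascript
// Hypothetical in-memory stand-in for the database contents.
const db = {
  users: { mchen: { groups: { alpha: true, beta: true } } },
  groups: { alpha: { name: 'Alpha Team' }, beta: { name: 'Beta Team' } }
};

// Hypothetical helper: resolves with the value at a slash-separated path,
// or null when nothing is stored there.
function readPath(path) {
  const value = path.split('/').reduce(
    (node, key) => (node == null ? undefined : node[key]), db);
  return Promise.resolve(value === undefined ? null : value);
}

// The "1" read fetches the index of group keys; the "N" reads are then
// started together instead of one after another.
function groupNamesFor(uid) {
  return readPath('users/' + uid + '/groups').then(index => {
    const keys = Object.keys(index || {});
    return Promise.all(keys.map(key => readPath('groups/' + key + '/name')));
  });
}

groupNamesFor('mchen').then(names => console.log(names));
```

Because the lookups are issued without waiting on each other, the total time is closer to that of the slowest single read than to the sum of all reads.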

Related

I want to sync user contacts to firebase firestore in one go

I am building a chat application somewhat like WhatsApp. When a user creates a new group, I want to show which of the contacts on their device are registered app users. To do that I have to compare every contact number against the users in Firebase Firestore, and a typical user can have more than 500 contacts on their device. Moreover, Firestore's query limitations mean I cannot compare more than one number at a time, so the whole process takes almost 6-7 minutes, and each read operation costs money.
How can I overcome this, or what is a better way to deal with this particular scenario?
You can store the contacts of the user on device and only send them to firestore as backup. You can then sync your local database with firestore on app start.
The operations you need cannot be done robustly in Firebase. If you still want to search Firebase data, you need a third-party search solution such as Elastic Search alongside your Firebase data to perform complex searches.
For local database you can use Room library: https://developer.android.com/topic/libraries/architecture/room
For using Elastic Search with Firebase have a look at this utility Flashlight: https://github.com/FirebaseExtended/flashlight .
The OP requested a structure and some code (Swift, Firebase Database) as a solution. I will present two options.
If you want to use a Firebase Query to see if the phone numbers exist, a possible structure would be
users
uid_0
contact_name: "Larry"
contact_phone: "111-222-3333"
uid_1
contact_name: "Joe"
contact_phone: "444-555-6666"
and then the swift code to query for existing numbers
let phoneNumbers = ["111-222-3333","444-555-6666"] //an array of numbers to look for
let myQueryRef = self.ref.child("users")
for contactPhone in phoneNumbers {
let queryRef = myQueryRef.queryOrdered(byChild: "contact_phone").queryEqual(toValue: contactPhone)
queryRef.observeSingleEvent(of: .childAdded, with: { snapshot in
if snapshot.exists() {
print("found \(contactPhone)") //or add to array etc
}
})
}
Having queries in a tight loop like this is generally not recommended, but it usually works fine for me with a low number of iterations. However, queries have a lot more overhead than plain observers.
IMO, a better and considerably faster option is to keep a node of just phone numbers. Then iterate over the ones you are looking for and use .observe to see if that node exists.
phone_numbers
111-222-3333: true
444-555-6666: true
and then the code to see if the ones from the array exist
let phoneNumbers = ["111-222-3333","444-555-6666"] //an array of numbers to look for
let phoneNumberRef = self.ref.child("phone_numbers")
for contactPhone in phoneNumbers {
let ref = phoneNumberRef.child(contactPhone)
ref.observeSingleEvent(of: .value, with: { snapshot in
if snapshot.exists() {
print("found \(contactPhone)")
}
})
}
In testing, this second solution is much faster than the first solution.

Firebase control server for maintaining counters and aggregates

It's a known issue that Firebase doesn't have an easy way to count items. I'm planning to create an app that relies heavily on counts and other aggregates. I fear creating this app's counters with the rules as suggested here will be incredibly complex and hard to maintain.
So I thought about this pattern:
I will keep a server that listens to all items entered in the database and updates all counters and aggregates. The server will hold the UID of a special admin, and only that admin will be allowed to update counters.
This way, users will not have to download entire nodes in order to get a count, plus I won't have to deal with issues that arise from maintaining counters by clients.
Does this pattern make sense? Am I missing something?
Firebase has recently released Cloud Functions. As mentioned on the documentation:
Cloud Functions is a hosted, private, and scalable Node.js environment
where you can run JavaScript code.
With Cloud Functions, you don't need to create your own server. You can simply write JavaScript functions and upload them to Firebase. Firebase will be responsible for triggering the functions whenever an event occurs.
For example, let's say you want to count the number of likes in a post. You should have a structure similar to this one:
{
"posts" : {
"randomKey" : {
"likes_count":5,
"likes" : {
"userX" : true,
"userY" : true,
"userZ" : true,
...
}
}
}
}
And your JavaScript function would be written like this:
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp(functions.config().firebase);
// Keeps track of the length of the 'likes' child list in a separate attribute.
exports.countlikes = functions.database.ref('/posts/{postid}/likes').onWrite(event => {
  // {postid} is a path wildcard; parent is a property of the reference, not a method
  return event.data.ref.parent.child('likes_count').set(event.data.numChildren());
});
This code recomputes likes_count from the number of children every time there is a write on the likes node.
This sample is available on GitHub.

Should I be further denormalizing? [duplicate]

I've read the Firebase docs on Structuring Data. Data storage is cheap, but the user's time is not. We should optimize for get operations, and write in multiple places.
So then I might store a list node and a list-index node, with some duplicated data between the two, at very least the list name.
I'm using ES6 and promises in my javascript app to handle the async flow, mainly of fetching a ref key from firebase after the first data push.
let addIndexPromise = new Promise( (resolve, reject) => {
let newRef = ref.child('list-index').push(newItem);
resolve( newRef.key()); // ignore reject() for brevity
});
addIndexPromise.then( key => {
ref.child('list').child(key).set(newItem);
});
How do I make sure the data stays in sync in all places, knowing my app runs only on the client?
For sanity check, I set a setTimeout in my promise and shut my browser before it resolved, and indeed my database was no longer consistent, with an extra index saved without a corresponding list.
Any advice?
Great question. I know of three approaches to this, which I'll list below.
I'll take a slightly different example for this, mostly because it allows me to use more concrete terms in the explanation.
Say we have a chat application, where we store two entities: messages and users. In the screen where we show the messages, we also show the name of the user. So to minimize the number of reads, we store the name of the user with each chat message too.
users
so:209103
name: "Frank van Puffelen"
location: "San Francisco, CA"
questionCount: 12
so:3648524
name: "legolandbridge"
location: "London, Prague, Barcelona"
questionCount: 4
messages
-Jabhsay3487
message: "How to write denormalized data in Firebase"
user: so:3648524
username: "legolandbridge"
-Jabhsay3591
message: "Great question."
user: so:209103
username: "Frank van Puffelen"
-Jabhsay3595
message: "I know of three approaches, which I'll list below."
user: so:209103
username: "Frank van Puffelen"
So we store the primary copy of the user's profile in the users node. In the message we store the uid (so:209103 and so:3648524) so that we can look up the user. But we also store the user's name in the messages, so that we don't have to look this up for each user when we want to display a list of messages.
So now what happens when I go to the Profile page on the chat service and change my name from "Frank van Puffelen" to just "puf".
Transactional update
Performing a transactional update is probably the approach that first comes to mind for most developers. We always want the username in messages to match the name in the corresponding profile.
Using multipath writes (added on 20150925)
Since Firebase 2.3 (for JavaScript) and 2.4 (for Android and iOS), you can achieve atomic updates quite easily by using a single multi-path update:
function renameUser(ref, uid, name) {
var updates = {}; // all paths to be updated and their new values
updates['users/'+uid+'/name'] = name;
var query = ref.child('messages').orderByChild('user').equalTo(uid);
query.once('value', function(snapshot) {
snapshot.forEach(function(messageSnapshot) {
updates['messages/'+messageSnapshot.key()+'/username'] = name;
})
ref.update(updates);
});
}
This will send a single update command to Firebase that updates the user's name in their profile and in each message.
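To make the shape of that single update more concrete, here is a minimal sketch that builds the updates object as a plain function. It assumes the keys of the user's messages have already been collected (the `messageKeys` parameter and the `buildRenameUpdates` name are illustrative, not part of the Firebase API); the result is what would be passed to `ref.update()`.

```javascript
// Builds the multi-path update for a rename.
// messageKeys is assumed to hold the keys of all messages written by uid.
function buildRenameUpdates(uid, name, messageKeys) {
  const updates = {};
  // Primary copy of the name in the user's profile.
  updates['users/' + uid + '/name'] = name;
  // Duplicated copy of the name in each of the user's messages.
  messageKeys.forEach(key => {
    updates['messages/' + key + '/username'] = name;
  });
  return updates;
}

const renameUpdates = buildRenameUpdates(
  'so:209103', 'puf', ['-Jabhsay3591', '-Jabhsay3595']);
console.log(renameUpdates);
```

Every key in the object is a full path from the root, which is what lets one update command touch the profile and all the messages together.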
Previous atomic approach
So when the user changes the name in their profile:
var ref = new Firebase('https://mychat.firebaseio.com/');
var uid = "so:209103";
var nameInProfileRef = ref.child('users').child(uid).child('name');
nameInProfileRef.transaction(function(currentName) {
return "puf";
}, function(error, committed, snapshot) {
if (error) {
console.log('Transaction failed abnormally!', error);
} else if (!committed) {
console.log('Transaction aborted by our code.');
} else {
console.log('Name updated in profile, now update it in the messages');
var query = ref.child('messages').orderByChild('user').equalTo(uid);
query.on('child_added', function(messageSnapshot) {
messageSnapshot.ref().update({ username: "puf" });
});
}
console.log("Name in profile: ", snapshot.val());
}, false /* don't apply the change locally */);
Pretty involved and the astute reader will notice that I cheat in the handling of the messages. First cheat is that I never call off for the listener, but I also don't use a transaction.
If we want to securely do this type of operation from the client, we'd need:
security rules that ensure the names in both places match. But the rules need to allow enough flexibility for them to temporarily be different while we're changing the name. So this turns into a pretty painful two-phase commit scheme.
change all username fields for messages by so:209103 to null (some magic value)
change the name of user so:209103 to 'puf'
change the username in every message by so:209103 that is null to puf.
that query requires an and of two conditions, which Firebase queries don't support. So we'll end up with an extra property uid_plus_name (with value so:209103_puf) that we can query on.
client-side code that handles all these transitions transactionally.
This type of approach makes my head hurt. And usually that means that I'm doing something wrong. But even if it's the right approach, with a head that hurts I'm way more likely to make coding mistakes. So I prefer to look for a simpler solution.
Eventual consistency
Update (20150925): Firebase released a feature to allow atomic writes to multiple paths. This works similarly to the approach below, but with a single command. See the updated section above to read how this works.
The second approach depends on splitting the user action ("I want to change my name to 'puf'") from the implications of that action ("We need to update the name in profile so:209103 and in every message that has user = so:209103").
I'd handle the rename in a script that we run on a server. The main method would be something like this:
function renameUser(ref, uid, name) {
ref.child('users').child(uid).update({ name: name });
var query = ref.child('messages').orderByChild('user').equalTo(uid);
query.once('value', function(snapshot) {
snapshot.forEach(function(messageSnapshot) {
messageSnapshot.ref().update({ username: name }); // update() lives on the reference, not the snapshot
})
});
}
Once again I take a few shortcuts here, such as using once('value') (which is in general a bad idea for optimal performance with Firebase). But overall the approach is simpler, at the cost of not having all data updated at the same time: the messages will only eventually be updated to match the new value.
Not caring
The third approach is the simplest of all: in many cases you don't really have to update the duplicated data at all. In the example we've used here, you could say that each message recorded the name as I used it at that time. I didn't change my name until just now, so it makes sense that older messages show the name I used at that time. This applies in many cases where the secondary data is transactional in nature. It doesn't apply everywhere of course, but where it applies "not caring" is the simplest approach of all.
Summary
While the above are just broad descriptions of how you could solve this problem and they are definitely not complete, I find that each time I need to fan out duplicate data it comes back to one of these basic approaches.
To add to Frank's great reply, I implemented the eventual consistency approach with a set of Firebase Cloud Functions. The functions get triggered whenever a primary value (e.g. the user's name) gets changed, and then propagate the changes to the denormalized fields.
It is not as fast as a transaction, but for many cases it does not need to be.

How do I get the number of children in a protected Firebase collection?

I have a protected firebase collection for users of my site, just an array of user objects. The permission rules for users allow an authenticated user to access only their user object in the list of users and no one else.
I'm trying to set up a simple way to get the count of all users in the collection with this permission scheme so that I can display a total user count on my site. However, there doesn't seem to be a way to get a count of all users without running into a permission problem.
Any ideas about how to fix this?
I suppose I could store a count at a publicly readable firebase location that gets incremented and decremented whenever a user is added/removed, but I'd rather not store the data twice and worry about mismatches.
I suppose I could also have an authenticated watcher on my server that bypasses the permission requirement and sends to the client (either through firebase by writing to public location or exposed as an api) a user count.
Ideally I'd like to have everything client side at the moment, so please let me know if there's a simple permissions based solution to this.
Thanks!
Data duplication is pretty much the norm in NoSQL, so storing a counter is perfectly reasonable. Check out the Firebase article on denormalization.
This pretty much sums up the approaches as I understand them.
Using a counter
It's fast and it's fairly simple, assuming you're using good DRY principles and centralizing all your manipulations of the records. Utilize a transaction to update the counter each time a record is added or removed:
function addUser(user) {
// do your add stuff...
updateCounter(1);
}
function removeUser(user) {
// do your remove stuff...
updateCounter(-1);
}
function updateCounter(amt) {
userCounter.transaction(function(currentValue) {
// currentValue is null when the counter does not exist yet, so default it to 0
return (currentValue || 0) + amt;
});
}
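The update function passed to a transaction receives null when nothing is stored at the counter yet, and it may be called more than once if the value changes underneath it, so it helps to keep it a pure function of the current value. The sketch below shows that step in isolation; `applyDelta` and the in-memory `runTransaction` stand-in are hypothetical names used only for illustration.

```javascript
// Pure update step: safe to call repeatedly with whatever value is current.
function applyDelta(currentValue, amt) {
  // A missing counter arrives as null; treat it as 0.
  return (currentValue || 0) + amt;
}

// Hypothetical stand-in for a transaction on an in-memory value.
function runTransaction(store, updateFn) {
  store.value = updateFn(store.value);
  return store.value;
}

const demoCounter = { value: null };            // counter not created yet
runTransaction(demoCounter, v => applyDelta(v, 1));   // a user is added
runTransaction(demoCounter, v => applyDelta(v, 1));   // another user is added
runTransaction(demoCounter, v => applyDelta(v, -1));  // a user is removed
console.log(demoCounter.value);
```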
Separate public and secured data
Store sensitive data (email addresses, things people can't see) in a private path, keep their public user data readable.
This prevents the need to synchronize a counter. It does mean, however, that clients must download the entire list of public users to create a count. So keep the public profiles small (a name, a timestamp, not much else) so it works into the tens of thousands without taking seconds.
"users": {
".read": true,
"$user": {
// don't try to put a ".read" here; it won't remove access
// after the parent path allows it
}
},
"users_secured": {
"$user": {
".read": "auth.id === $user"
}
}
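One way to feed those two paths from a single user record is to split it before writing. This is only a sketch under assumptions: the field names (`name`, `joinedAt`, `email`) and the helper names are hypothetical, and which fields count as public is an application decision.

```javascript
// Hypothetical list of fields that are safe to expose publicly.
const PUBLIC_FIELDS = ['name', 'joinedAt'];

// Splits one user record into the part stored under "users"
// and the part stored under "users_secured".
function splitUser(user) {
  const publicPart = {};
  const securedPart = {};
  Object.keys(user).forEach(field => {
    const target = PUBLIC_FIELDS.includes(field) ? publicPart : securedPart;
    target[field] = user[field];
  });
  return { publicPart, securedPart };
}

// Counting only needs the small public records.
function countUsers(publicUsers) {
  return Object.keys(publicUsers || {}).length;
}

const parts = splitUser({ name: 'Mary', joinedAt: 1400000000, email: 'mary@example.com' });
console.log(parts.publicPart, parts.securedPart);
```

Keeping the public part limited to a few small fields is what keeps the client-side count cheap.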
Utilize a server process
Easy and painless; uber fast for clients, easily handles hundreds of thousands of profiles as long as they have a small footprint. Requires you to maintain something. Heroku and Nodejitsu will host this for free until you have users coming out of your ears.
var Firebase = require('firebase');
var fb = new Firebase(process.env.FBURL);
fb.auth( process.env.SECRET, function() {
fb.child('users').on('value', function(snap) {
fb.child('user_counter').set( snap.numChildren() );
});
});

How to avoid race conditions on cursor.observe?

Race conditions
In my Meteor application, I set up an observe inside a publish function that inserts new data under certain conditions. The problem is that we sometimes have duplicate subscriptions, and the resulting race condition causes the data to be inserted twice.
If it is not possible to have "singleton observers":
How can we avoid race conditions and duplicated inserted data on database?
Example:
Meteor.publish("fortuneUpdate", function () {
  var self = this; // capture the publish context for use inside the callback
  var selector = {user: self.userId, seen: false};
  DailyFortunes.find(selector).observe({
    removed: function (doc, beforeIndex) {
      if (DailyFortunes.find(selector).count() < 1)
        createDailyFortune(self.userId);
    }
  });
});
This question has been moved from How cursor.observe works and how to avoid multiple instances running?
According to Tom, it is not possible, for now, to ensure that calls to subscribe that have the same arguments are shared.
So, if you are having the same problem I had, with redundant data being created inside observers, I suggest the following workaround:
Create robust indexes that prevent duplicate data from being created. Compound keys are probably what you need here.
Catch duplicate key error exceptions inside your observer and ignore them, so that losing the race is harmless.
example:
Collection.find(selector).observe({
removed: function(document){
try {
// Workaround to avoid race conditions > https://stackoverflow.com/q/13095647/599991
createNewDocument();
} catch (e) {
// XXX string parsing sucks, maybe
// https://jira.mongodb.org/browse/SERVER-3069 will get fixed one day
if (e.name !== 'MongoError') throw e;
var match = e.err.match(/^E11000 duplicate key error index: ([^ ]+)/);
if (!match) throw e;
//if match, just do nothing.
}
self.flush();
}
});
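The workaround can be sketched without a database as follows. The `UniqueCollection` class is a hypothetical in-memory stand-in for a collection with a unique compound index on (user, day), and checking the numeric error code 11000 (MongoDB's duplicate key code) instead of parsing the message is an assumption about what the driver exposes on the error object.

```javascript
// Hypothetical in-memory collection enforcing a unique compound key (user, day).
class UniqueCollection {
  constructor() {
    this.docs = new Map();
  }
  insert(doc) {
    const key = doc.user + '|' + doc.day;
    if (this.docs.has(key)) {
      const err = new Error('E11000 duplicate key error index: ' + key);
      err.code = 11000; // duplicate key error code
      throw err;
    }
    this.docs.set(key, doc);
  }
}

// Inserts the document unless another writer got there first.
// Returns true if this call created it, false if it already existed.
function insertOnce(collection, doc) {
  try {
    collection.insert(doc);
    return true;
  } catch (e) {
    if (e.code !== 11000) throw e; // unrelated errors still propagate
    return false;                  // lost the race; nothing to do
  }
}

const fortunes = new UniqueCollection();
console.log(insertOnce(fortunes, { user: 'u1', day: '2012-10-27' }));
console.log(insertOnce(fortunes, { user: 'u1', day: '2012-10-27' }));
```

With the unique index doing the real work, two observers racing to create the same document end up with exactly one stored copy.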
This is an odd pattern. Can you share some example code?
Generally I'd either expect to see mutations in a method, or setting up an observe inside Meteor.startup() on the server. (The latter is tricky if you're running multiple server processes, but so are many other things in a multi process regime. We'll have a better pattern down the line.)
Because it can be arbitrary JS, a publish function has to run once per subscribing client. It may log new subscriptions, set up per-client server state, or vary its behavior based on this.userId or even a random source. For example, consider a subscription that returns 10 randomly selected documents from a DB collection to each subscribed client!
So the place to optimize the case of many clients subscribing to the same data set is at the DB query layer: if a thousand clients are subscribed to the same DB query, we'll just run that underlying query once.
