Firebase control server for maintaining counters and aggregates - firebase

It's a known issue that Firebase doesn't have an easy way to count items. I'm planning to create an app that relies heavily on counts and other aggregates. I fear that creating this app's counters with the rules as suggested here will be incredibly complex and hard to maintain.
So I thought about this pattern:
I will keep a server that listens to all items entered in the database and updates all counters and aggregates. The server will hold the UID of a special admin account, and only that account will be allowed to update counters.
This way, users will not have to download entire nodes in order to get a count, plus I won't have to deal with issues that arise from maintaining counters by clients.
Does this pattern make sense? Am I missing something?

Firebase has recently released Cloud Functions. As mentioned in the documentation:
Cloud Functions is a hosted, private, and scalable Node.js environment
where you can run JavaScript code.
With Cloud Functions, you don't need to run your own server. You can simply write JavaScript functions and upload them to Firebase. Firebase is then responsible for triggering the functions whenever an event occurs.
For example, let's say you want to count the number of likes in a post. You should have a structure similar to this one:
{
  "posts" : {
    "randomKey" : {
      "likes_count":5,
      "likes" : {
        "userX" : true,
        "userY" : true,
        "userZ" : true,
        ...
      }
    }
  }
}
And your JavaScript function would be written like this:
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp(functions.config().firebase);
// Keeps track of the length of the 'likes' child list in a separate attribute.
exports.countlikes = functions.database.ref('/posts/{postid}/likes').onWrite(event => {
  return event.data.ref.parent.child('likes_count').set(event.data.numChildren());
});
This code sets likes_count to the current number of children under the likes node every time that node is written to.
This sample is available on GitHub.
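As an alternative to re-counting the whole likes list on every write, a trigger on a single like could apply a +1/-1 delta instead. The following is only a minimal sketch of that idea: the helper names likeDelta and applyDelta, the per-like trigger path, and the (change, context) signature in the commented wiring are assumptions and are not part of the sample above.

```javascript
// Returns +1 when a like is added, -1 when it is removed, and 0 otherwise.
// `beforeExists` / `afterExists` stand for whether the like existed before
// and after the write (hypothetical inputs taken from the trigger payload).
function likeDelta(beforeExists, afterExists) {
  if (!beforeExists && afterExists) return 1;
  if (beforeExists && !afterExists) return -1;
  return 0;
}

// Applies a delta to the current counter value, treating a missing counter
// as 0 and never going below 0. Suitable as a transaction update function.
function applyDelta(current, delta) {
  return Math.max(0, (current || 0) + delta);
}

// Possible wiring (assumed, shown for illustration only):
// exports.countlike = functions.database
//   .ref('/posts/{postid}/likes/{uid}')
//   .onWrite((change) => {
//     const delta = likeDelta(change.before.exists(), change.after.exists());
//     const counter = change.after.ref.parent.parent.child('likes_count');
//     return counter.transaction((current) => applyDelta(current, delta));
//   });
```

The trade-off is that re-counting is self-correcting, while a delta that is applied twice leaves the counter off by one until it is recomputed.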

Related

Firebase cloud functions: How to wait for a document to be created and update it afterwards

Here is the situation:
1. I have collections 'lists', 'stats', and 'posts'.
2. From the frontend, there is a scenario where the user uploads content. The frontend function creates a document under 'lists' and, after that document is created, creates another document under 'posts'.
3. I have a CF that listens for the creation of a document under 'lists' and creates a new document under 'stats'.
4. I have a CF that listens for the creation of a document under 'posts' and updates the document created under 'stats'.
The intended order of things to happen is 2->3->4. However, apparently, step 4 is triggered before step 3, so there is no relevant document under 'stats' to update, and an error is thrown.
Is there a way to make the function wait for the document creation under 'stats' and update only after it is created? I thought about using setTimeout() for the function in step 4, but I guess there might be a better way.
Below is the code that I am using for steps 3 and 4. Can someone advise? Thanks!
// This listens to a creation of a document under 'lists' and creates a new document
// with the same document ID under 'stats'.
exports.statsCreate = functions.firestore
  .document('lists/{listid}').onCreate((snap, context) => {
    const listidpath = snap.ref.path;
    const pathfinder = listidpath.split('/');
    const listid = pathfinder[pathfinder.length - 1];
    return db.collection('stats').doc(listid).set({
      postcount: 0,
    });
  });
// This listens to a creation of a document under 'posts' and updates the corresponding
// document under 'stats'. There is a field under 'posts' with the list ID to make this possible.
// How do I make sure the update operation happens only after the document is actually there?
exports.statsUpdate = functions.firestore
  .document('posts/{postid}').onCreate((snap, context) => {
    const data = snap.data();
    return db.collection('stats').doc(data.listid).update({
      postcount: admin.firestore.FieldValue.increment(1)
    });
  });
I can see at least two "easy" solutions:
Solution #1: In your front end, set a listener on the to-be-created stat document (with onSnapshot()), and only create the post document once the stat one has been created. Note, however, that this solution will not work if the user does not have read access to the stats collection.
Solution #2: Use the "retry on failure" option for background Cloud Functions. Within your statsUpdate Cloud Function, intentionally throw an exception if the stat doc is not found; the CF will then be retried until the stat doc has been created.
A third solution would be to use a Callable Cloud Function, called from your front-end. This Callable Cloud Function would write the three docs in the following order: list, stat and post. Then the statsUpdate Cloud Function would be triggered in the background (or you could include its business logic in the Callable Cloud Function as well).
One of the drawbacks of this solution is that the Cloud Function may encounter some cold start effect. In this case, from an end-user perspective, the process may take more time than the above solutions. However, note that you can specify a minimum number of container instances to be kept warm and ready to serve requests.
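Solution #2 could look roughly like the sketch below. It assumes that retries have been enabled for the function at deploy time, and it takes the Firestore handle and the increment helper as parameters so the logic can be tried without deploying. The factory name makeStatsUpdate and the commented wiring (including the failurePolicy option) are assumptions for illustration.

```javascript
// Builds the handler for the 'posts' onCreate trigger.
// `db` is expected to expose Firestore-style collection().doc().get()/update();
// `increment` is expected to behave like FieldValue.increment.
function makeStatsUpdate(db, increment) {
  return async function statsUpdate(snap) {
    const { listid } = snap.data();
    const statRef = db.collection('stats').doc(listid);
    const statSnap = await statRef.get();
    if (!statSnap.exists) {
      // Rejecting marks this invocation as failed; with retries enabled,
      // the function is run again later, by which time the doc may exist.
      throw new Error(`stats/${listid} does not exist yet`);
    }
    return statRef.update({ postcount: increment(1) });
  };
}

// Possible wiring (assumed, shown for illustration only):
// exports.statsUpdate = functions
//   .runWith({ failurePolicy: true })
//   .firestore.document('posts/{postid}')
//   .onCreate(makeStatsUpdate(db, (n) => admin.firestore.FieldValue.increment(n)));
```

Because a retried function can keep failing for a long time if the stat doc is never created, it is worth deciding up front when to give up, for example by checking the age of the event.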
PS: Note that in the statsCreate CF, you don't need to extract the listid with:
const listidpath=snap.ref.path;
const pathfinder=listidpath.split('/');
const listid=pathfinder[pathfinder.length-1];
Just do:
const listid = context.params.listid;
The context parameter provides information about the Cloud Function's execution.

Firebase RTDB load balancing with cloud functions

In my app, I'm using multiple RTDB instances, together with the RTDB management APIs, to try to balance the load dynamically.
Because I don't know the future load, I will start with one RTDB instance and create more if a specified usage threshold is exceeded.
In my case, this requires the following:
1. create a new RTDB instance through the management API
2. apply rules to that RTDB instance
3. apply cloud functions to that RTDB instance
Steps 1 and 2 can be done programmatically inside a cloud function, but I'm having trouble with 3.
Is this possible?
Is there any workaround, or are there future plans to support 3?
I'm thinking about two options: deploying a function from a function, or allowing RTDB triggers to apply to every instance.
As you can check here in the management API documentation, the response body of the create method returns a DatabaseInstance object which has the following structure:
{
  "name": string,
  "project": string,
  "databaseUrl": string,
  "type": enum (DatabaseInstanceType),
  "state": enum (State)
}
So, you can get the databaseUrl value, store it somewhere, and later send it to your cloud function as a parameter, assuming it is an HTTP function. In the function, all you have to do is use the following code to access it:
let app = admin.app();
let ref = app.database('YOUR_SECOND_INSTANCE_URL_HERE').ref();
EDIT
For triggered functions this is not possible, since you would need to know the instance names or URLs when deploying the function in order to apply the trigger to them. If you already have the instance names, you could try doing something like this community answer, although I am not sure it will suit your app's needs.

How do I setup firebase realtime DB triggers for multiple databases using cloud functions?

I have a specific use case where I have multiple Realtime Database instances in a single project (and this number will grow), and I want to set up Cloud Functions triggers on all of them. Is there a way, inside the callback, to get the name of the DB on which the function was triggered?
import * as functions from 'firebase-functions';
import mongoose from 'mongoose';
export const updateData = functions.database.ref('/someendpoint/{code}').onUpdate(async (change, context) => {
  // Placeholder for some function that returns the DB name; this is the step I would like to know how to do.
  const dbName = getFireBaseDBName();
  await mongoose.connect(`mongo-db-string-connection/${dbName}`, {useNewUrlParser: true});
  const Code = context.params.code;
  const Schema = new mongoose.Schema({}, { collection: `someendpoint`, strict: false });
  const Model = mongoose.model(`someendpoint`, Schema);
  const after = change.after.val();
  await Model.deleteMany({code: Code});
  await Model.create({after, ...{code:Code}});
});
I need the DB name so that I can save to the database with the same name on Mongo.
For example:
Given that I have a Firebase project 'My-Project' with multiple Realtime Database instances, say 'db1', 'db2', 'db3':
When the trigger fires, I want to save/update/delete the data in the MongoDB database so that it stays in sync with my Firebase Realtime Database.
So it's crucial that I get not only the data stored in db1 but also the name 'db1', so that the right data can be altered in Mongo.
Please keep in mind that more databases will be added to My-Project, so somewhere down the line it'll be 'db100'.
First thing - I'll say that the way you're using database shards for multi-tenancy isn't really the way they're meant to be used. The Firebase team recommends using separate projects for multi-tenancy, one for each tenant, in order to keep users and their data isolated. The reason that database shards exist is to help developers deal with the scaling limitations of Realtime Database.
All that said, the triggers for Realtime Database don't directly provide the name of the shard in the callback. You will need to write one function for each shard, as required by the API, and described in the documentation.
To control when and where your function should trigger, call ref(path)
to specify a path, and optionally specify a database instance with
instance('INSTANCE_NAME'). If you do not specify an instance, the
function deploys to the default database instance for the Firebase
project. For example:
Default database instance: functions.database.ref('/foo/bar')
Instance named "my-app-db-2": functions.database.instance('my-app-db-2').ref('/foo/bar')
Since you have to hard-code the name of the shard in the function code, you might as well just type it again inside the function itself. Or put the names in global variables and use them inside each function.
If you want to see an example of how to share code between each function declared for each instance, read this other question: How to trigger firebase function on all database instances rather than default one?
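One way to share a single handler across shards is to generate one trigger per instance name and pass that name into the handler. The sketch below only illustrates the idea: buildShardTriggers, the export naming scheme, and the handler signature are hypothetical, and the list of instance names still has to be known at deploy time.

```javascript
// Builds one onUpdate trigger per database instance, all sharing `handler`.
// `functions` is the firebase-functions module (passed in so the helper can
// be exercised with a stub); `handler` receives the instance name first.
function buildShardTriggers(functions, instanceNames, path, handler) {
  const triggers = {};
  for (const name of instanceNames) {
    // Export names here use underscores in place of hyphens (naming scheme
    // chosen for this sketch only).
    const exportName = `updateData_${name.replace(/-/g, '_')}`;
    triggers[exportName] = functions.database
      .instance(name)
      .ref(path)
      .onUpdate((change, context) => handler(name, change, context));
  }
  return triggers;
}

// Possible wiring in index.js (assumed, shown for illustration only):
// const functions = require('firebase-functions');
// Object.assign(exports, buildShardTriggers(
//   functions, ['db1', 'db2', 'db3'], '/someendpoint/{code}',
//   (dbName, change, context) => syncToMongo(dbName, change, context)));
```

Adding a shard then means adding its name to the list and redeploying, which matches the constraint that triggers cannot be attached to instances created after deployment.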

Firebase function document.create and user.create triggers firing multiple times

I'm trying to keep track of the number of documents in collections and the number of users in my Firebase project. I set up some .create triggers to update a stats document using increment, but sometimes the .create functions trigger multiple times for a single creation event. This happens with both Firestore documents and new users. Any ideas?
const functions = require('firebase-functions');
const admin = require('firebase-admin');
const firestore = require('@google-cloud/firestore')
admin.initializeApp();
const db = admin.firestore()
/* for counting documents created */
exports.countDoc = functions.firestore
  .document('collection/{docId}')
  .onCreate((change, context) => {
    const docId = context.params.docId
    // Return the promise so the function does not finish before the write completes.
    return db.doc('stats/doc').update({
      'docsCreated': firestore.FieldValue.increment(1)
    })
  });
/* for counting users created */
exports.countUsers = functions.auth.user().onCreate((user) => {
  // Return the promise so the function does not finish before the write completes.
  return db.doc('stats/doc').update({
    'usersCreated': firestore.FieldValue.increment(1)
  })
});
Thanks!
Here is some advice on how to make your functions idempotent.
The FieldValue.arrayUnion() and FieldValue.arrayRemove() functions safely add and remove elements of an array field, without creating duplicates and without errors if the element being removed does not exist.
You can create array fields called 'users' and 'docs' in your documents and have the triggered functions add data to them with FieldValue.arrayUnion(). With that approach, you can get the actual counts on the client side by reading the users and docs fields and checking their length.
You should expect that a background trigger could possibly be executed multiple times per event. This should be very rare, but not impossible. It's part of the guarantee that Cloud Functions gives you for "at-least-once execution". Since the internal infrastructure is entirely asynchronous with respect to the execution of your code on a dedicated server instance, that infrastructure might not receive the signal that your function finished successfully. In that case, it triggers the function again in order to ensure delivery.
It's recommended that you write your function to be idempotent in order to handle this situation, if it's important for your app. This is not always a very simple thing to implement correctly, and could also add a lot of weight to your code. There are also many ways to do this for different sorts of scenarios. But the choice is yours.
Read more about it in the documentation for execution guarantees.
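One common way to make a counting trigger idempotent is to record the ID of each processed event and skip the increment when that ID has been seen before. The sketch below assumes the trigger's context exposes an eventId, uses a hypothetical processedEvents collection as the marker store, and takes the Firestore handle as a parameter so the logic can be tried with a stub.

```javascript
// Builds an onCreate handler that increments docsCreated at most once per event.
// `db` is expected to expose Firestore-style collection(), doc() and
// runTransaction(); `increment` is expected to behave like FieldValue.increment.
function makeCountDoc(db, increment) {
  return function countDoc(snap, context) {
    // 'processedEvents' is a hypothetical collection used only as a marker store.
    const eventRef = db.collection('processedEvents').doc(context.eventId);
    const statsRef = db.doc('stats/doc');
    return db.runTransaction(async (tx) => {
      const marker = await tx.get(eventRef);
      if (marker.exists) {
        return false; // Duplicate delivery: the counter was already updated.
      }
      tx.set(eventRef, { processed: true });
      tx.update(statsRef, { docsCreated: increment(1) });
      return true;
    });
  };
}

// Possible wiring (assumed, shown for illustration only):
// exports.countDoc = functions.firestore
//   .document('collection/{docId}')
//   .onCreate(makeCountDoc(db, (n) => firestore.FieldValue.increment(n)));
```

The marker write and the increment happen in the same transaction, so a repeated delivery either sees the marker or conflicts with the first attempt. The marker documents accumulate over time and would need periodic cleanup.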

How to structure data in Firebase to avoid N+1 selects?

Since Firebase security rules cannot be used to filter children, what's the best way to structure data for efficient queries in a basic multi-user application? I've read through several guides, but they seem to break down when scaled past the examples given.
Say you have a basic messaging application like WhatsApp. Users can open chats with other groups of users to send private messages between themselves. Here's my initial idea of how this could be organized in Firebase (a bit similar to this example from the docs):
{
  users: {
    $uid: {
      name: string,
      chats: {
        $chat_uid : true,
        $chat2_uid: true
      }
    }
  },
  chats: {
    $uid: {
      messages: {
        message1: 'first message',
        message2: 'another message'
      }
    }
  }
}
Firebase permissions could be set up to only let users read chats that are marked true in their user object (and restrict adding arbitrarily to the chats object, etc).
However this layout requires N+1 selects for several common scenarios. For example: to build the home screen, the app has to first retrieve the user's chats object, then make a get request for each thread to get its info. Same thing if a user wants to search their conversations for a specific string: the app has to run a separate request for every chat they have access to in order to see if it matches.
I'm tempted to set up a node.js server to run root-authenticated queries against the chats tree and skip the client-side firebase code altogether. But that's defeating the purpose of Firebase in the first place.
Is there a way to organize data like this using Firebase permissions and avoid the N+1 select problem?
It appears that n+1 queries do not necessarily need to be avoided, and that Firebase is engineered specifically to offer good performance when doing n+1 selects, even though this is counter-intuitive for developers coming from a relational database background.
An example of n+1 in the Firebase 2.4.2 documentation is followed by a reassuring message:
// List the names of all Mary's groups
var ref = new Firebase("https://docs-examples.firebaseio.com/web/org");
// fetch a list of Mary's groups
ref.child("users/mchen/groups").on('child_added', function(snapshot) {
  // for each group, fetch the name and print it
  var groupKey = snapshot.key();
  ref.child("groups/" + groupKey + "/name").once('value', function(snapshot) {
    console.log("Mary is a member of this group: " + snapshot.val());
  });
});
Is it really okay to look up each record individually? Yes. The Firebase protocol uses web sockets, and the client libraries do a great deal of internal optimization of incoming and outgoing requests. Until we get into tens of thousands of records, this approach is perfectly reasonable. In fact, the time required to download the data (i.e. the byte count) eclipses any other concerns regarding connection overhead.
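Applied to the chat layout from the question, the home screen could issue all per-chat lookups at once instead of one after another. The sketch below is only an illustration: loadChatsForUser and the two reader functions it takes are hypothetical, and the readers stand in for whatever calls the app uses to fetch the user's chat IDs and a single chat.

```javascript
// Loads every chat a user belongs to by issuing the individual lookups
// concurrently. `fetchChatIds(uid)` and `fetchChat(chatId)` are hypothetical
// async readers supplied by the caller.
async function loadChatsForUser(uid, fetchChatIds, fetchChat) {
  const chatIds = await fetchChatIds(uid);
  // Start every lookup before awaiting any of them, so they are in flight together.
  const chats = await Promise.all(chatIds.map((chatId) => fetchChat(chatId)));
  // Pair each result with its ID, preserving the order of chatIds.
  return chatIds.map((chatId, i) => ({ id: chatId, ...chats[i] }));
}

// Possible readers against the structure above (assumed, for illustration only):
// const fetchChatIds = async (uid) =>
//   Object.keys((await ref.child('users/' + uid + '/chats').once('value')).val() || {});
// const fetchChat = async (chatId) =>
//   (await ref.child('chats/' + chatId).once('value')).val();
```

This keeps the security rules per chat intact, since each lookup is still an ordinary read of a path the user is allowed to see.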

Resources