Import large data (JSON) into Firebase periodically

We are in a situation where we will have to update large amounts of data (ca. 5 million records) in Firebase periodically. At the moment we have a few JSON files that are around ~1 GB in size.
Existing third-party solutions (here and here) have some reliability issues (they import object by object, or need an open connection) and are quite disconnected from the Google Cloud Platform ecosystem, so I wonder if there is now an "official" way, e.g. using the new Google Cloud Functions? Or a combination with App Engine / Google Cloud Storage / Google Cloud Datastore.
I would really like to not have to deal with authentication — something that Cloud Functions seems to handle well — but I assume the function would time out (?)
With the new Firebase tooling available, how do I:
Have long-running Cloud Functions do the data fetching / inserts? (Does that make sense?)
Get the JSON files into and out of somewhere inside the Google Cloud Platform?
Does it make sense to first put the large data into Google Cloud Datastore (i.e. because it is too expensive to store in Firebase), or can the Firebase Realtime Database be reliably treated as a large data store?

I'm finally posting the answer, as it aligns with the new Google Cloud Platform tooling of 2017.
The newly introduced Google Cloud Functions have a limited run time of approximately 9 minutes (540 seconds). However, Cloud Functions are able to create a Node.js read stream from Cloud Storage like so (@google-cloud/storage on npm):
var gcs = require('@google-cloud/storage')({
  // You don't need extra authentication when running the function
  // online in the same project; keyFilename is only needed when
  // running outside of it.
  projectId: 'grape-spaceship-123',
  keyFilename: '/path/to/keyfile.json'
});
// Reference an existing bucket.
var bucket = gcs.bucket('json-upload-bucket');
var remoteReadStream = bucket.file('superlarge.json').createReadStream();
Even though it is a remote stream, it is highly efficient. In tests I was able to parse JSON files larger than 3 GB in under 4 minutes while doing simple JSON transformations.
Since we are working with Node.js streams, any JSONStream library can efficiently transform the data on the fly (JSONStream on npm), processing the data asynchronously just like a large array with event streams (event-stream on npm).
var JSONStream = require('JSONStream');
var es = require('event-stream');

remoteReadStream.pipe(JSONStream.parse('objects.*'))
  .pipe(es.map(function (data, callback) {
    console.error(data);
    // Insert data into Firebase here.
    callback(null, data); // ! Return data if you want to make further transformations.
  }));
Return null in the callback at the end of the pipe to prevent a memory leak from blocking the whole function.
If you do heavier transformations that require a longer run time, either use a "job db" in Firebase to track where you are, only do e.g. 100,000 transformations per invocation, and call the function again; or set up an additional function that listens for inserts into a "forimport db" and asynchronously transforms each raw JSON record into your target format and production system. This splits import and computation.
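As a rough sketch of that "job db" checkpoint pattern (the /jobs/import path, the batch size, and the processing step are hypothetical, not part of the original setup):

var admin = require('firebase-admin');
admin.initializeApp();

var BATCH_SIZE = 100000; // hypothetical number of records per invocation

// Read the stored offset, process one batch, then persist the new offset
// so the next invocation can continue where this one stopped.
function processNextBatch() {
  var jobRef = admin.database().ref('jobs/import'); // hypothetical "job db" path
  return jobRef.once('value').then(function (snapshot) {
    var offset = (snapshot.val() && snapshot.val().offset) || 0;
    // ... stream and transform records [offset, offset + BATCH_SIZE) here ...
    return jobRef.update({ offset: offset + BATCH_SIZE });
  });
}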
Additionally, you can run Cloud Functions code in a Node.js App Engine app, but not necessarily the other way around.

Related

Is there a way to add a new field to all of the documents in a Firestore collection?

I have a collection that needs to be updated. There's a need to add a new field and fill it out based on an existing field.
Let's say I have a collection called documents:
documents/{documentId}: {
  existingField: ['foo', 'bar'],
  myNewField: ['foo', 'bar']
}
documents/{anotherDocumentId}: {
  existingField: ['baz'],
  myNewField: ['baz']
}
// ... and so on
I already tried firing up a local Cloud Function from the emulator that loops over each document and writes to production data based on the logic I need. The problem is that the function can only live up to a maximum of 30 seconds. What I need is some kind of console tool that I can run as admin (using a service account) to quickly manage my needs.
How do you handle such cases?
Firebase does not provide a console or tool to do migrations.
You can write a program to run on your development machine that uses one of the backend SDKs (like the Firebase Admin SDK) to query, iterate, and update the documents, and let it run as long as you want.
There is nothing specific built into the API for this type of data migration. You'll have to update each document in turn, which typically involves also reading all documents (or at least their IDs).
While it is possible to do this on Cloud Functions, I find it easier to do it with a local Node.js script, as that doesn't have the runtime limits Cloud Functions imposes.
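For example, here's a minimal sketch of such a local migration script (the collection name documents and the service account file path are placeholders taken from the question, not a prescribed layout):

const admin = require('firebase-admin');

admin.initializeApp({
  credential: admin.credential.cert(require('./service-account.json')),
});

async function migrate() {
  // Read all documents; for very large collections you'd page through them instead.
  const snapshot = await admin.firestore().collection('documents').get();
  for (const doc of snapshot.docs) {
    // Derive the new field from the existing one, as described in the question.
    await doc.ref.update({ myNewField: doc.get('existingField') });
  }
  console.log(`Updated ${snapshot.size} documents`);
}

migrate().catch(console.error);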

Reading the Realtime Database directly vs. using Cloud Functions

I have been reading this article about reading the Realtime Database directly vs. calling Cloud Functions that return database data.
If I am returning a fairly large chunk of data, e.g. a JSON object holding 50 user comments, from a Cloud Function, does this count as outbound (egress) data? If so, does this cost $0.12 per GB per month?
The comments are stored like so with an incremental key.
comments: [0 -> {text: "Adsadsads"},
           1 -> {text: "dadsacxdg"},
           etc.]
Furthermore, I have read that you can call goOffline() and goOnline() using the client SDKs to stop concurrent connections. Are there any costs associated with closing and opening database connections, or is it just the speed aspect of opening a connection every time you read?
Would it be more cost effective to call a Cloud Function that returns the set of 50 comments, or to let the devices read the comments directly from the database but open/close before/after each read, using orderByKey(), once(), startAt() and limitToFirst()?
e.g. something like this (note that once() comes last, and with orderByKey() the startAt() argument is a string key):
ref('comments').orderByKey().startAt('0').limitToFirst(50).once('value')
Thanks
If your Cloud Function reads data from the Realtime Database and returns (part of) that data to the caller, you pay for the data that is read from the database (at $1/GB) and then also for the data that your Cloud Function returns to the user (at $0.12/GB).
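As a rough, purely hypothetical calculation: if 50 comments weigh about 50 KB, then a million such function calls would read about 50 GB from the database (≈ $50) and produce about 50 GB of egress (≈ $6), so at these prices the database read, not the egress, dominates the cost.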
Opening a connection to the database means data is sent from the database to the client, and you are charged for this data (typically a few KB).
Which one is more cost effective is something you can calculate once you have all parameters. I'd recommend against premature cost optimization though: Firebase has a pretty generous free tier on its Realtime Database, so I'd start reading directly from the database and seeing how much traffic that generates. Also: if you are explicitly managing the connection state, and seem uninterested in the realtime nature of Firebase, there might be better/cheaper alternatives than Firebase to fill your needs.

Attach Firebase Cloud Function or cache its data from Cloud Function call

I have a frontend component that consists of a chart and several different filters that allow users to filter by data type. However, the data they are filtering is relatively large, so I do not want to load all of it into the webpage; instead I have a Firebase Cloud Function handle the filtering. The issue is that users will usually do a bunch of filtering while using this component, so it does not make sense for the Cloud Function to repeatedly download the necessary data. Is there a way to "attach" the Cloud Function to the call and have it update without having to re-retrieve the data, or, if that is not possible, to somehow cache the retrieved Firebase data somewhere accessible to the Cloud Function?
// Note: onCall() is an HTTPS trigger and is not available on a database ref,
// so the handler below is declared as a callable function instead.
exports.handleChartData = functions.https.onCall((data, context) => {
  // can I cache data here somehow,
  // or can I have this function read in updates from user-selected filters
  // without having to retrieve the data again?
});
You can write data to the local /tmp disk. Just be aware that:
There is no guarantee that the data will be there next time, as instances are spun up and down as needed. So you will need to check if the file exists on each call, and be ready to create it when it doesn't exist yet.
The /tmp disk space is a RAM disk, so any files written there will come out of the memory you've allocated for your Cloud Functions containers.
You can't reliably keep listeners alive across calls, so you won't be able to update the cache.
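A minimal sketch of that check-then-create pattern (the chart-data.json file name and the loadChartDataFromDatabase() helper are hypothetical):

const fs = require('fs');
const path = require('path');

const CACHE_FILE = path.join('/tmp', 'chart-data.json'); // hypothetical cache file

async function getChartData() {
  // The instance may have been recycled, so the cached file may be gone.
  if (fs.existsSync(CACHE_FILE)) {
    return JSON.parse(fs.readFileSync(CACHE_FILE, 'utf8'));
  }
  const data = await loadChartDataFromDatabase(); // hypothetical fetch from the database
  fs.writeFileSync(CACHE_FILE, JSON.stringify(data));
  return data;
}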
Also see:
Write temporary files from Google Cloud Function
the documentation on cleaning up temporary files
Firebase cloud function [ Error: memory limit exceeded. Function invocation was interrupted.] on youtube video upload

How to generate a download URL from Google Cloud Storage (I came from Firebase)

Just trying to figure out something in Google Cloud that seemed trivial in Firebase.
It seems as though if you're making a Node.js app (I'm talking to it through Unity actually, but it's a desktop application) you can't use firebase-storage for some odd reason; you have to use google-cloud. Even the firebase-admin tools use Cloud Storage to do storage from here.
Nevertheless, I got it working and am uploading the files to Firebase Storage. The problem is that in Firebase you could specify a specific file and then call storage().ref().child(filelocation).GetDownloadURL(): this would generate a unique URL, valid for some set time, that can be used publicly without having to give read access to all anonymous users.
I did some research and found I need to use something called gsutil to generate my own signed URLs, but it's so complicated (I'm a newbie to this whole server stuff) that I don't even know where to start getting this working in my Node server.
Any pointers? I'm really stuck here.
------- If anyone's interested, this is what I'm trying to do at a high level -------
I'm sending 3D model data to the Node app from Unity.
The Node app publishes this model on Sketchfab.
Then it puts the model data onto my own storage, along with some additional data specially made for my app.
After it gets saved to storage, it gets saved to my Firebase DB in my global model database,
to be accessed later by users, who fetch the download URL of this storage file, which is sent back to the Unity user(s).
I would just download the files into my Node app, but I want to reduce server load; it's supposed to be just a middleman between Unity and Firebase.
(I would've done it straight from Unity, but apparently Firebase isn't for desktop Windows apps.)
Figured it out:
var firebase_admin = require("firebase-admin");
var storage = firebase_admin.storage();
var bucket = storage.bucket();

// childSnapshot, expDate, finalData, resolve and reject come from the
// surrounding (elided) code: a database query wrapped in a Promise.
bucket.file(childSnapshot.val().modelLink).getSignedUrl({
  action: 'read',
  expires: expDate
}, function (err, url) {
  if (err) {
    reject(err);
  } else {
    finalData.ModelDownloadLink = url;
    console.log("Download model DL url: " + url);
    resolve();
  }
});

Can Firebase Remote Config be accessed from Cloud Functions?

I'm using Firebase as a simple game server and have some settings that are relevant for both the client and the backend, so I would like to keep them in Remote Config for consistency. But I am not sure whether I can access Remote Config from my Cloud Functions in a simple way (I don't consider going through the REST interface a "simple" way).
As far as I can tell there is no mention of it in the docs, so I guess it's not possible, but does anyone know for sure?
firebaser here
There is a public REST API that allows you to read and set Firebase Remote Config conditions. This API requires that you have full administrative access to the Firebase project, so it must only be used in a trusted environment (such as your development machine, a server you control, or Cloud Functions).
There is no public API to get Firebase Remote Config settings from a client environment at the moment. Sorry I don't have better news.
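For reference, a minimal sketch of reading the template through that REST API from a trusted environment (assuming application default credentials are available, e.g. on Cloud Functions or via a service account on your machine):

const { GoogleAuth } = require('google-auth-library');

async function fetchRemoteConfig(projectId) {
  // The Remote Config REST API requires an OAuth2 token with this scope.
  const auth = new GoogleAuth({
    scopes: ['https://www.googleapis.com/auth/firebase.remoteconfig'],
  });
  const client = await auth.getClient();
  const res = await client.request({
    url: `https://firebaseremoteconfig.googleapis.com/v1/projects/${projectId}/remoteConfig`,
  });
  return res.data; // the Remote Config template as JSON
}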
This is probably only included in newer versions of firebase-admin (version 8 or 9 and above, if I'm not mistaken).
// We first need to import the remoteConfig function.
import { remoteConfig } from 'firebase-admin';

// Then in your cloud function we use it to fetch our remote config values.
const remoteConfigTemplate = await remoteConfig().getTemplate().catch(e => {
  // Your error handling if fetching fails...
});

// Next it is just a matter of extracting the values, which is kinda convoluted.
// Let's say you want to extract the `game_version` field from remote config:
const gameVersion = remoteConfigTemplate.parameters.game_version.defaultValue.value;
So parameters is always followed by the name of the field that you defined in the Firebase console's Remote Config, in this example game_version.
It's a mouthful (or typeful), but that's how you get it.
Also note that if the value is stored as a JSON string, you will need to parse it before use, commonly with JSON.parse(gameVersion).
A similar process is outlined in the Firebase docs.
