Can I use Firebase Cloud Functions for a search engine?

Firebase recently released its Cloud Functions integration, which lets us upload JavaScript functions to run without needing our own servers.
Is it possible to build a search engine using those functions? My idea is to use the local disk (a tmpfs volume) to keep the indexed data in memory, and to index the new data on each write event. Does tmpfs keep data between function calls (instances)?
Can Cloud Functions be used for this purpose, or should I use a dedicated server for indexing data?
Another question related to this: when Cloud Functions read data from the Firebase Realtime Database, does that consume network bandwidth or just disk reads? How is it counted in the pricing?
Thanks

You could certainly try that. Cloud Functions have a local file system that is typically used to maintain state during a run. See this answer for more: Write temporary files from Google Cloud Function
But there are (as far as I know) no guarantees that state will be maintained between runs of your function. Or even that the function will be running on the same container next time. You may be running on a newly created container next time. Or when there's a spike in invocations, your function may be running on multiple containers at once. So you'd potentially have to rebuild the search index for every run of your function.
I would instead look at integrating an external dedicated search engine, such as Algolia in this example: https://github.com/firebase/functions-samples/tree/master/fulltext-search. Have a look at the code: even with comments and license it's only 55 lines!
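For a sense of its shape, here is a minimal sketch of that sync pattern, assuming firebase-functions v1 and the algoliasearch v4 client; the `posts` path, `blog-posts` index name, and config keys are illustrative rather than taken from the sample:

```js
const functions = require('firebase-functions');
const algoliasearch = require('algoliasearch');

// Credentials stored via:
// firebase functions:config:set algolia.app_id=... algolia.api_key=...
const client = algoliasearch(
  functions.config().algolia.app_id,
  functions.config().algolia.api_key
);
const index = client.initIndex('blog-posts'); // hypothetical index name

// Re-index a post whenever it is written in the Realtime Database.
exports.indexPost = functions.database
  .ref('/posts/{postId}')
  .onWrite((change, context) => {
    const post = change.after.val();
    if (!post) {
      // The post was deleted: remove it from the search index too.
      return index.deleteObject(context.params.postId);
    }
    // Algolia uses objectID as the primary key of an index record.
    return index.saveObject({ objectID: context.params.postId, ...post });
  });
```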
Alternatively you could find a persistent storage service (Firebase Database and Firebase Storage being two examples) and use that to persist the search index. So you'd run the code to update the search index in Cloud Functions, but would store the resulting index files in a more persistent location.

GCF team member + former Google Search member here. Cloud Functions would not be suitable for an in-memory search engine, for a few reasons.
A search engine is wise to separate its indexing and serving machines; at scale you'll want to worry about read and write hot-spotting differently.
As Frank alluded to, you're not guaranteed to get the same instance across multiple requests. I'd like to strengthen his concern: you will never get the same instance across two different Cloud Functions. Each Cloud Function has its own backend infrastructure that is provisioned and scaled independently.
I know it's tempting to cut dependencies, but cutting out persistent storage isn't the way. Your serving layer can use caching to speed up requests, but durable storage makes sure you don't have to reindex the whole corpus if your Cloud Function crashes or you deploy an update (each guarantees the whole instance is scrapped and recreated).
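To illustrate that split, here is a minimal sketch of a serving function that caches in global scope but treats Cloud Storage as the durable home of the index; the `search/index.json` path and the toy lookup are assumptions, not a real search implementation:

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Global scope survives between invocations on a warm instance, but a crash
// or redeploy scraps the instance, so this is strictly a cache.
let indexCache = null;

async function loadIndex() {
  if (indexCache) return indexCache; // warm instance: reuse the cached copy
  // The durable copy lives in Cloud Storage (hypothetical path).
  const [contents] = await admin.storage().bucket()
    .file('search/index.json').download();
  indexCache = JSON.parse(contents.toString());
  return indexCache;
}

exports.search = functions.https.onRequest(async (req, res) => {
  const index = await loadIndex();
  // Toy lookup; a real serving layer would tokenize, rank, and paginate.
  res.json(index[req.query.term] || []);
});
```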

Related

Firebase - Perform Analytics from database/firestore data

I am using Firebase as my authentication and database platform in my React Native-Expo app. I have not yet decided whether I will use the Realtime Database or Firestore.
I need to perform statistical analysis on daily data gathered from my users, which is stored in the database. For example, users type in their daily intake of protein, and from it I would like to calculate their weekly average and expected monthly average, and provide suggestions for types of food if protein intake is too low, etc.
What would be the best approach in order to achieve the result wanted in my specific situation?
I am really unfamiliar with this and stepping into uncharted territory on how I can accomplish it. I have read that Firebase Analytics generates various basic analytics regarding usage of the app, the number of crash-free users, etc. But can it perform analytics on custom events? Can I create a custom event for Firebase Analytics to keep track of a certain node in my database and output analytics from that? And then, of course, if yes, does it work with React Native-Expo or do I need to detach from Expo? In addition, I have read that Firebase Analytics can be combined with Google BigQuery. Would this be an alternative for my case?
Are there any other ways of performing such data analysis on my data stored in Firebase database? For example, export the data and use Python and SciKit Learn?
Whatever opinion or advice you may have, I would be grateful if you could share it!
You're not alone - many people building web apps on GCP have this question, and there is no single answer.
I'm not too familiar with Firebase Analytics, but I can answer the question for Firestore and for your custom analytics (e.g. weekly average protein consumption).
The first thing to point out is that Firestore, unlike other NoSQL databases, is storage only. You can't perform aggregations in real time like you can with MongoDB, so the calculations have to be done somewhere else.
The best practice recommended by GCP in this case is indeed to do a regular export of your Firestore data into BQ (BigQuery) and run your analytical calculations there. You could also, when a user inputs some data, send it to Pub/Sub and use one of GCP Dataflow's streaming templates to stream the data into BQ, and have everything in near real time.
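A minimal sketch of the fan-out step of that pipeline, assuming an existing Pub/Sub topic; the collection path, topic name, and fields are made up:

```js
const functions = require('firebase-functions');
const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();

// When a user logs an entry, forward it to Pub/Sub; a Dataflow streaming
// template (Pub/Sub -> BigQuery) picks it up on the other end.
exports.fanOutEntry = functions.firestore
  .document('users/{userId}/entries/{entryId}')
  .onCreate((snapshot, context) =>
    pubsub.topic('intake-entries').publishMessage({
      json: { userId: context.params.userId, ...snapshot.data() },
    })
  );
```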
Here's the issue with that, however: while this solution gives you real time and is very scalable, it gets expensive fast, and if you're more used to Python than SQL for analytics, it can be a steep learning curve. Here's an alternative I've used for smaller web apps, which scales well for <100k users and costs <$20 a month at GCP's current pricing:
Write a Python script that grabs the data from Firestore (using the Firestore Python SDK), generates the analytics you need, and writes the results back to a Firestore collection (there is a sketch of this service after these steps)
Create an endpoint for that function using Flask or Django
Deploy that server application on Cloud Run, preventing unauthenticated invocations (you'll only be calling it from within GCP) - see this article, steps 1 and 2 only. You can also deploy the Python script(s) to GCP's Vertex AI or hosted Jupyter notebooks if you're more comfortable with that
Use Cloud Scheduler to call that function every x minutes - see these docs for authentication
Have your React app query the "analytics results" collection to get the results
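The steps above describe a Python/Flask service; to keep the code samples in this thread in one language, here is the same shape sketched in Node/Express instead. The collection names, fields, and seven-day window are assumptions:

```js
const express = require('express');
const admin = require('firebase-admin');
admin.initializeApp();

const db = admin.firestore();
const app = express();

// Cloud Scheduler calls this endpoint every x minutes (keep unauthenticated
// invocations blocked, as described in steps 1 and 2 of the linked article).
app.post('/recompute', async (req, res) => {
  const cutoff = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
  const entries = await db.collectionGroup('proteinEntries')
    .where('date', '>=', cutoff).get();

  // Group intake by user, then average it over the week.
  const byUser = {};
  entries.forEach((doc) => {
    const { userId, grams } = doc.data();
    (byUser[userId] = byUser[userId] || []).push(grams);
  });

  // Write the results back for the app to read (batches cap at 500 writes,
  // which is fine for a sketch).
  const batch = db.batch();
  for (const [userId, grams] of Object.entries(byUser)) {
    const avg = grams.reduce((a, b) => a + b, 0) / grams.length;
    batch.set(db.collection('analyticsResults').doc(userId),
      { weeklyAvgProtein: avg }, { merge: true });
  }
  await batch.commit();
  res.sendStatus(204);
});

app.listen(process.env.PORT || 8080);
```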
My solution is a FlutterWeb-based dashboard that displays relevant data in (near) real time, like the regular Flutter iOS/Android app, plus some aggregated data.
The aggregated data is compiled using a few Node.js-based database triggers that do the analytic lifting, and hence is also near real time. If you study the pricing you will learn that function invocations are pretty cheap, unless of course you happen to make a 'desphew' :)
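Such a trigger might look like the following sketch; the paths and the running-total shape are assumptions about this kind of setup, not the poster's actual code:

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Keep a running aggregate up to date on every write, so the dashboard reads
// one small node instead of scanning the raw data.
exports.updateDailyTotal = functions.database
  .ref('/entries/{userId}/{entryId}/grams')
  .onWrite((change, context) => {
    const before = change.before.val() || 0;
    const after = change.after.val() || 0;
    const totalRef = admin.database()
      .ref(`/aggregates/${context.params.userId}/dailyTotal`);
    // A transaction keeps the counter consistent under concurrent writes.
    return totalRef.transaction((total) => (total || 0) + after - before);
  });
```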
I came up with a great solution.
I used the built-in Firebase BigQuery export extension. Then I used Cube.js (deployed on GCP Cloud Run with Docker) on top of BigQuery.
Cube.js just makes everything so easy. You don't need to hand-write every query; it tries to optimize the queries for you. On top of that, it uses caching, so you won't get big bills on GCP. I think this is the best solution I was able to find, and it is infinitely scalable and totally real-time.
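For a flavor of what that looks like, a minimal Cube.js data schema over a hypothetical BigQuery table (as produced by the export) might be:

```js
// schema/ProteinIntake.js -- table and field names are hypothetical; yours
// come from whatever the BigQuery export produces.
cube(`ProteinIntake`, {
  sql: `SELECT user_id, grams, entry_date
        FROM firestore_export.protein_entries`,

  measures: {
    avgGrams: { sql: `grams`, type: `avg` },
    totalGrams: { sql: `grams`, type: `sum` },
  },

  dimensions: {
    userId: { sql: `user_id`, type: `string` },
    entryDate: { sql: `entry_date`, type: `time` },
  },
});
```

When the dashboard asks for, say, avgGrams per week, Cube.js generates the BigQuery SQL and caches the result.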
Also, if you are a small startup, it is mostly free on GCP - there are free limits on Cloud Run and BigQuery.
Note: I am not affiliated in any way with Cube.js.

Performance difference Firestore through Firebase Functions vs Firestore SDK

Our team is developing a mobile app and currently uses (Firebase) Firestore for our backend. We wrapped every DB access in a Firebase Function in order to clean up the object returned to the client app.
Does this approach introduce any (additional) non-negligible overhead compared to accessing Firestore directly?
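For context, a wrapper in that style is typically a callable function that strips server-only fields before returning, along these lines (collection and field names are invented for illustration):

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Callable wrapper that cleans up the document before the client sees it.
exports.getProfile = functions.https.onCall(async (data, context) => {
  if (!context.auth) {
    throw new functions.https.HttpsError('unauthenticated', 'Sign in first.');
  }
  const snap = await admin.firestore()
    .collection('profiles').doc(data.profileId).get();
  // Drop fields the client app should never receive.
  const { internalNotes, ...publicFields } = snap.data() || {};
  return publicFields;
});
```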
Yes and no, depending on your use case.
If you have a small number of users with relatively low usage (in terms of the given quota), Cloud Functions is a reasonable choice. As stated in the documentation, Cloud Functions for Firebase offers generous quotas in terms of resource limits, time limits and rate limits, with good pricing, especially on the Spark plan (free).
The advantage of using Cloud Functions is that it provides fast, scalable computing that can shorten the processing time of a specific function compared with using the mobile phone's CPU. In some cases phones have low computing power (consider your whole user base; not everyone owns a high-spec phone), so offloading this work to a Cloud Function can provide a better user experience (UX).
Note: I do agree with Doug that cost is one factor, but we should also consider performance and other perspectives.
Yes, at the very least, now your path to get data has two hops instead of one. Before, you directly accessed the database using a channel that's optimized for returning the query results. Now, you have to pay the cost of an additional hop to Cloud Functions, which makes the query. And it's possible that the results returned to the client are bigger than if you made the query directly.
Perhaps the biggest loss you'll experience is the client side caching of documents that's automatically performed by the client (enabled by default on Android and iOS). If you repeat a query and none of the documents have changed, you get immediate results from the cache instead of having to wait for the server. And you won't have to pay for document reads for cache hits. So, if you aren't also caching your results, you're also paying the monetary cost of Cloud Functions and the query to Firestore for every request.
Yes, but the answer could be different based on the situation.
If a client wants to fetch a record exactly as it is in the database, the Firebase SDK might be faster because there is no overhead of calling Cloud Functions.
If we have heavy processing to do after fetching a record, then Cloud Functions + the Firebase Admin SDK could be faster, because the processing unit behind Cloud Functions can be faster than a mobile CPU. And even if the direct request responds faster, the client app could display a message that something was fetched and is currently being processed; with that, the user experience during the heavy processing could still be acceptable.
The only case I can come up with where Cloud Functions could always win is when the server reduces the data size, so that the overhead introduced by Cloud Functions (including processing time) is compensated by the shorter network transfer. This also has the advantage of saving the client's data plan.

An alternative to Firebase Functions? Is it okay to run them on a VM?

I am using Firebase Functions for an Uber-like product, but I can't get the expected performance. In particular, it takes a long time to load data from the Realtime Database: up to 2-3 seconds for a read.
This may be due to cold starts, which are discussed here: Why is Cloud Functions for Firebase taking 25 seconds?
So I decided to move the functionality of these functions to a VM instance. Using Firebase onWrite listeners and the Admin SDK, similar functionality can be achieved on a virtual machine.
Is it okay to do so? Will I run into any scalability issues?
It is definitely possible to run similar code on your own hardware/VM. In fact that is how many of Firebase's own back-end processes ran, before Cloud Functions was available.
What you'll miss is the auto-scaling of Cloud Functions, though. Your machine/VM will always be running and, unlike Cloud Functions, has a fixed, limited capacity (a cap on how much it can handle).
Cloud Functions, on the other hand, scales down to zero when there are no requests, and scales up to meet demand as needed. Whether that is needed for your use case, only you can determine.
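For reference, the VM equivalent of a database trigger is a long-running Admin SDK listener, roughly as below; the paths, file names, and URL placeholder are illustrative:

```js
// Plain Node.js process on the VM (kept alive with e.g. systemd or pm2).
const admin = require('firebase-admin');

admin.initializeApp({
  credential: admin.credential.cert(require('./service-account.json')),
  databaseURL: 'https://<your-project>.firebaseio.com', // placeholder
});

// Rough equivalent of an onCreate trigger: react to every new ride request.
// Unlike Cloud Functions, this process must stay up, and you scale it yourself.
admin.database().ref('/rideRequests').on('child_added', (snapshot) => {
  const request = snapshot.val();
  console.log('New ride request', snapshot.key, request);
  // ...match a driver, write the result back, etc.
});
```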

What is the best way to store a large constant hashmap in node.js (firebase cloud functions)?

If I try to download or read the file from the Firebase database or Firebase Storage, it will incur unnecessarily large quota costs and be financially unsustainable. And since Firebase only gives functions an in-memory file system with a "tmp" directory, it is impossible to deploy any files there.
The reason I think my request should be very reasonable is that I could accomplish it by literally declaring the entire 80 MB of hashmap data in the code. Maybe I'll write a script that enumerates all those fields in JS and puts them inside index.js itself? Then it never has to download from anywhere and won't incur any quotas. However, this seems like a very desperate solution to what seems to be a simple problem.
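One less desperate variant of that idea, assuming the data can live in a `bigmap.json` shipped with the source and the bundle stays within deployment size limits (the file name and endpoint are hypothetical): files deployed alongside index.js are part of the function's read-only code bundle, so requiring one involves no download and no storage quota, only enough function memory to hold the parsed map.

```js
const functions = require('firebase-functions');
// Parsed once per instance thanks to Node's require cache; the function's
// memory allocation must be large enough to hold the whole map.
const BIG_MAP = require('./bigmap.json'); // hypothetical 80 MB data file

exports.lookup = functions.https.onRequest((req, res) => {
  const value = BIG_MAP[req.query.key];
  res.json(value === undefined ? { found: false } : { found: true, value });
});
```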
Cross post of https://www.quora.com/How-can-I-keep-objects-stored-in-RAM-with-Cloud-Functions

Firebase and indexing/search

I am considering using Firebase for an application that should allow people to use full-text search over a collection of a few thousand objects. I like the idea of delivering a client-only application (not having to worry about hosting the data), but I am not sure how to handle search. The data will be static, so the indexing itself is not a big deal.
I assume I will need some additional service that runs queries and returns Firebase object handles. I can spin up such a service at some fixed location, but then I have to worry about its availability and scalability. Although I don't expect too much traffic for this app, it could peak at a couple of thousand concurrent users.
Architectural thoughts?
Long-term, Firebase may have more advanced querying, so hopefully it'll support this sort of thing directly without you having to do anything special. Until then, you have a few options:
Write server code to handle the searching. The easiest way would be to run some server code responsible for the indexing/searching, as you mentioned. Firebase has a Node.js client, so that would be an easy way to interface the service with Firebase. All of the data transfer could still happen through Firebase, but you would write a Node.js service that watches for client "search requests" at some designated location in Firebase and then "responds" by writing the result set back into Firebase for the client to consume (see the sketch after these options).
Store the index in Firebase with clients automatically updating it. If you want to get really clever, you could try implementing a server-less scheme where clients automatically index their data as they write it... So the index for the full-text search would be stored in Firebase, and when a client writes a new item to the collection, it would be responsible for also updating the index appropriately. And to do a search, the client would directly consume the index to build the result set. This actually makes a lot of sense for simple cases where you want to index one field of a complex object stored in Firebase, but for full-text-search, this would probably be pretty gnarly. :-)
Store the index in Firebase with server code updating it. You could try a hybrid approach where the index is stored in Firebase and is used directly by clients to do searches, but rather than have clients update the index, you'd have server code that updates the index whenever new items are added to the collection. This way, clients could still search for data when your server is down. They just might get stale results until your server catches up on the indexing.
Until Firebase has more advanced querying, #1 is probably your best bet if you're willing to run a little server code. :-)
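A minimal sketch of option #1, written against today's Admin SDK (this answer predates it); the /search paths and the placeholder runSearch are made up:

```js
// Long-running Node.js process acting as the search backend.
const admin = require('firebase-admin');
admin.initializeApp(); // plus credentials/databaseURL for your project

const db = admin.database();

// Clients push { id, term } under /search/requests; the service answers
// under /search/responses/<id> and removes the handled request.
db.ref('/search/requests').on('child_added', async (snapshot) => {
  const { id, term } = snapshot.val();
  const results = await runSearch(term);
  await db.ref(`/search/responses/${id}`).set(results);
  await snapshot.ref.remove();
});

async function runSearch(term) {
  // Placeholder: query whatever index you maintain (in memory, SQLite, ...).
  return [];
}
```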
Google's currently suggested method for full-text search seems to be syncing with either Algolia or BigQuery via Cloud Functions for Firebase.
Here's Firebase's Algolia Full-text search integration example, and their BigQuery integration example that could be extended to support full search.
