Let me start by saying that I am a beginner with Google Cloud Platform.
I am developing a web application in React using Firebase, so all data is saved in Firestore.
Now I need a relational database, and I am very confused about which is better: Cloud SQL or BigQuery.
My idea was to keep one part of the data in Cloud SQL and the other part in Firestore.
When an event happens, the data from Cloud SQL and Firestore is merged and uploaded to BigQuery for analysis.
Example:
In Firestore I have a product with an array field where IDs are stored. These IDs refer to records in the database saved in Cloud SQL. When an order is placed, it is added to a collection in Firestore and appended to the data in BigQuery.
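To make the idea concrete, this is roughly the kind of Cloud Function I have in mind; the collection, table, and dataset names here are just placeholders, not my real schema:

```typescript
import * as functions from "firebase-functions";
import { BigQuery } from "@google-cloud/bigquery";
import mysql from "mysql2/promise";

const bigquery = new BigQuery();

// Placeholder Cloud SQL connection; in practice this would go through the
// Cloud SQL Auth Proxy or a private IP.
const pool = mysql.createPool({
  host: process.env.CLOUDSQL_HOST,
  user: process.env.CLOUDSQL_USER,
  password: process.env.CLOUDSQL_PASSWORD,
  database: "shop",
});

// When an order document is created in Firestore, look up the related rows in
// Cloud SQL and stream the merged result into a BigQuery table for analysis.
export const onOrderCreated = functions.firestore
  .document("orders/{orderId}")
  .onCreate(async (snap, context) => {
    const order = snap.data();

    // The order document carries the array of IDs that point at rows in Cloud SQL.
    const [products] = await pool.query(
      "SELECT id, name, price FROM products WHERE id IN (?)",
      [order.productIds]
    );

    await bigquery
      .dataset("analytics")
      .table("orders")
      .insert([
        {
          order_id: context.params.orderId,
          created_at: new Date().toISOString(),
          products: JSON.stringify(products),
        },
      ]);
  });
```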
My problem is that, from what I have read, Cloud SQL does not support autoscaling, while BigQuery does.
So my question is: can you autoscale Cloud SQL?
If it can't be done, is it correct to use BigQuery exclusively?
Is there another solution on GCP that allows you to have a relational database but with autoscaling?
Edit 1
This is a very simplified model of part of the database on Cloud SQL / BigQuery.
I'll use a query with two or three inner joins to get all the values I need.
I don't know how to make this non-relational so that I can use Firestore without a lot of data duplication; I am open to any kind of advice.
I'm not sure I understood correctly, but I reckon you would like to get some data from one data source, combine/process it with data from a Firestore collection, and load/stream the result into BigQuery, all of that at run time. The question is about the choice of that data source: either Cloud SQL or BigQuery.
Am I right that, from your point of view, the main Cloud SQL drawback is a lack of scalability (autoscaling), and that you would like to consider BigQuery instead of Cloud SQL because of that?
It is not clear what request/query rate you expect, or where the data is located (whether there are any requirements on global access), so it may be difficult to discuss the situation. Anyway...
Thinking about BigQuery: in my opinion this is a great "database" (the best, from my point of view), but mainly for analytical purposes. Each query has some initial latency (a query job won't execute faster than some threshold), which cannot be significantly reduced, and there are no binary indexes on BigQuery tables. It means that your query will take a few seconds (let's assume 3 or more) every time you run it (unless the result is taken from the cache). If the number of requests is significant, it may become expensive in BigQuery and expensive in the component used to process that task (e.g. a Cloud Function triggered by some event), as the latter has to wait (and do nothing) during the query time.
In addition, BigQuery is very good at loading or streaming data into it, but not very good at regular data updates inside it; there are plenty of limitations. Thus, depending on your context, it may not be a very good idea to maintain operational data in BigQuery.
If I rule out BigQuery:
Can we sacrifice 'autoscalability' and use Cloud SQL?
Can we use a Firestore collection instead of Cloud SQL (and sacrifice the 'relational' property)?
Can we use Cloud SQL and keep the amount of data in the tables used for querying under control, so that there are no delays?
I'm not sure I managed to help, but at least I provided some thoughts about the problem.
'Now I need a relational database, and I am very confused about which is better: Cloud SQL or BigQuery.'
Please be aware that BigQuery cannot be used as a substitute for a relational database; it is oriented toward running analytical queries, not simple CRUD operations and queries (like Cloud SQL). That doesn't mean BigQuery can't handle normalized data and joins. It absolutely can. It just performs better on denormalized data, because BigQuery is essentially an OLAP engine. So denormalize whenever possible (please read here).
You can use read replicas to scale Cloud SQL. Read replica instances allow data from the primary instance to be replicated to one or more read replicas. This setup can provide increased read throughput. Please see this.
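As a rough illustration of that pattern (the hostnames and table here are made up for the sketch), the application sends writes to the primary and routes read-heavy queries to the replica:

```typescript
import mysql from "mysql2/promise";

// Hypothetical hosts: a Cloud SQL primary instance and a read replica created from it.
const primary = mysql.createPool({
  host: "10.0.0.2",
  user: "app",
  password: process.env.DB_PASSWORD,
  database: "shop",
});
const replica = mysql.createPool({
  host: "10.0.0.3",
  user: "app",
  password: process.env.DB_PASSWORD,
  database: "shop",
});

// Writes always go to the primary.
export async function createOrder(productId: number, quantity: number): Promise<void> {
  await primary.execute(
    "INSERT INTO orders (product_id, quantity) VALUES (?, ?)",
    [productId, quantity]
  );
}

// Read-heavy queries go to the replica, which is where the extra read throughput comes from.
export async function listOrders(productId: number) {
  const [rows] = await replica.execute(
    "SELECT * FROM orders WHERE product_id = ?",
    [productId]
  );
  return rows;
}
```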
Related
I am wondering why aggregation (like SUM) is not built into Firestore when other NoSQL databases like MongoDB have it. Is it inherent to the design of Firebase? Do you think it can be added soon?
This is a good question, actually.
Firestore was built for certain use cases, which are not the use cases MongoDB is built for. MongoDB can be used for a lot of use cases, even those covered by Firestore.
Basically, the dev team's main idea was to build a document database that is easy to use, managed, and lightweight. This led to leaving out features like sharding, aggregations, and so on, even though the dev team knew some of them would be useful.
So they decided to leverage the possibilities offered by a cloud platform and built it to support everything the platform (Firebase) could offer, such as Firebase Functions.
So, in the end, the answer is:
no, Firestore will never support aggregation functions, or at least that's not in the plans as of now;
it is still possible to obtain a SUM by using a Firebase Function that triggers every time you perform a write operation, so you can update the SUM value. You'll need to store that value somewhere else in your Firestore database, but that's a pretty good solution, and it is even documented here as an example. The only thing you have to remember is that the sum value will be "eventually consistent": there may be moments when the persisted value differs from the real value, because the trigger has yet to fire, or because the function that updates the value has yet to finish. But this is the way Google designed Firestore and Firebase, so it's a good practice and a pattern we can use (see the sketch below).
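A minimal sketch of that pattern, assuming entries are written to an "entries" collection with an "amount" field and the running total is kept in an "aggregates/sum" document (all of these names are just placeholders):

```typescript
import * as functions from "firebase-functions";
import * as admin from "firebase-admin";

admin.initializeApp();

// Every time a new entry is written, atomically add its amount to the stored total.
export const updateSum = functions.firestore
  .document("entries/{entryId}")
  .onCreate(async (snap) => {
    const amount = snap.data().amount ?? 0;

    // FieldValue.increment keeps the update atomic, so concurrent writes don't
    // overwrite each other; the total is still only eventually consistent with
    // respect to the write that triggered the function.
    await admin
      .firestore()
      .doc("aggregates/sum")
      .set({ total: admin.firestore.FieldValue.increment(amount) }, { merge: true });
  });
```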
There is also a third option when it comes to aggregation queries in Firestore (other than counters and client-side aggregations).
You could also mirror your database to some OLAP-capable database and do your aggregations there. Either build the sync mechanism yourself by listening to your data, or use dedicated services.
Currently there are two Firebase Extensions you can install that do exactly that for you: "Firestore Big Query Stream" and "Firea.io". Both automatically mirror your data to another database and then allow queries over that database.
This will allow you to use much more powerful query languages (SQL / MQL) over your data.
(full disclosure I am one of the founders of Firea.io)
I am building an application that has a single collection (itinerary data) that will have many (40,000+) entries. This data needs to be queryable and included in Firestore.
When I attempted to import the data set, I realized that executing so many writes would be costly and use up most of my allowance, so bulk importing this data isn't an option, unless there is a way to do so without executing so many writes.
My mentor floated the idea of serving the itinerary data from a separate API and pulling it into Firestore on demand. This would spread the burden of writes over time.
I'm curious about my options here and would like some advice on how to execute this.
What would my client-side request look like? Would it involve using a Cloud Function? How do I ensure the data in Firestore is up to date if my API data changes?
I am using Firebase as my authentication and database platform in my React Native (Expo) app. I have not yet decided whether I will use the Realtime Database or Firestore.
I need to perform statistical analysis on daily data gathered from my users, which is stored in the database. For example, users type in their daily protein intake, and from that I would like to calculate their weekly average and expected monthly average, provide suggestions for types of food if protein intake is too low, and so on.
What would be the best approach in order to achieve the result wanted in my specific situation?
I am stepping into uncharted territory here and am unfamiliar with how to accomplish this. I have read that Firebase Analytics generates different basic analytics regarding usage of the app, the number of crash-free users, etc. But can it perform analytics on custom events? Can I create a custom event for Firebase Analytics to keep track of a certain node in my database, and output analytics from that? And then, of course, if yes, does it work with React Native-Expo, or do I need to detach from Expo? In addition, I have read that Firebase Analytics can be combined with Google BigQuery. Would this be an alternative for my case?
Are there any other ways of performing such data analysis on my data stored in Firebase database? For example, export the data and use Python and SciKit Learn?
Whatever opinion or advice you may have, I would be grateful if you could share it!
You're not alone - many people building web apps on GCP have this question, and there is no single answer.
I'm not too familiar with Firebase Analytics, but I can answer the question for Firestore and for your custom analytics (e.g. weekly average protein consumption).
The first thing to point out is that Firestore, unlike other NoSQL databases, is storage only. You can't perform aggregations in real time like you can with MongoDB, so the calculations have to be done somewhere else.
The best practice recommended by GCP in this case is indeed to do a regular export of your Firestore data into BQ (BigQuery) and run the analytical calculations there. You could also, when a user inputs some data, send it to Pub/Sub and use one of GCP Dataflow's streaming templates to stream the data into BQ, so that everything is available in near real time.
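For the Pub/Sub route, the producer side can be as small as this sketch (the topic name is a placeholder; a Dataflow "Pub/Sub to BigQuery" streaming template would be subscribed to it):

```typescript
import { PubSub } from "@google-cloud/pubsub";

const pubsub = new PubSub();

// Publish each user entry as a JSON message; a Dataflow streaming template
// subscribed to this topic then writes every message into a BigQuery table
// in near real time.
export async function publishIntake(userId: string, protein: number): Promise<void> {
  await pubsub.topic("daily-intake").publishMessage({
    json: { userId, protein, loggedAt: new Date().toISOString() },
  });
}
```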
Here's the issue with that, however: while this solution gives you real time and is very scalable, it gets expensive fast, and if you're more used to Python than SQL for analytics, it can be a steep learning curve. Here's an alternative I've used for smaller web apps, which scales well to <100k users and costs <$20 a month at GCP's current pricing:
Write a Python script that grabs the data from Firestore (using the Firestore Python SDK), generates the analytics you need on it, and writes the results back to a Firestore collection
Create an endpoint for that function using Flask or Django
Deploy that server application on Cloud Run, preventing unauthenticated invocations (you'll only be calling it from within GCP) - see this article, steps 1 and 2 only. You can also deploy the Python script(s) to GCP's Vertex AI or hosted Jupyter notebooks if you're more comfortable with that
Use Cloud Scheduler to call that function every x minutes - see these docs for authentication
Have your React app query the "analytics results" collection to get the results (a minimal sketch follows below)
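For the last step, something like this would work on the client, assuming the scheduled job writes each user's results to a hypothetical "analyticsResults/{userId}" document:

```typescript
import { initializeApp } from "firebase/app";
import { getFirestore, doc, getDoc } from "firebase/firestore";

const app = initializeApp({ /* your Firebase config */ });
const db = getFirestore(app);

// Read the precomputed analytics document for the signed-in user.
export async function fetchAnalytics(userId: string) {
  const snap = await getDoc(doc(db, "analyticsResults", userId));
  return snap.exists() ? snap.data() : null;
}
```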
My solution is a Flutter Web based dashboard that displays relevant data in (near) real time, like the regular Flutter iOS/Android app, and likewise some aggregated data.
The aggregated data is compiled using a few Node.js based triggers in the database that do the analytic lifting, and hence it is also near real time. If you study the pricing you will learn that function invocations are pretty cheap, unless of course you happen to make a 'desphew' :)
I came up with a great solution.
I used the built-in Firebase BigQuery extension. Then I used Cube.js (deployed on GCP: Cloud Run with Docker) on top of BigQuery.
Cube.js just makes everything so easy. You don't need to write queries manually; it tries to optimize the queries for you. On top of that, it uses caching, so you won't get big bills on GCP. I think this is the best solution I was able to find, and it is infinitely scalable and totally real time.
Also, if you are a small startup, it is mostly free with GCP's free limits on Cloud Run and BigQuery.
Note: I am not affiliated in any way with Cube.js.
I am using AWS DynamoDB in order to store information.
I have two machines running separate codes, that accessing the information in the database.
One of the machines is writing into the database and the second one is reading.
Since the second one does not know whether or not the information in the database has been changed, I need to somehow monitor my database for changes.
I know that there is something called DynamoDB Streams that can provide information about changes made in your database, and I already have that code implemented.
The question is as follows: if I am monitoring the database constantly, I need to query this stream all the time, let's say once every minute.
What is the difference between doing that and actually querying the database every minute?
Is it much more efficient?
Is it less costly (resources, moneywise)?
Is there any other, more efficient way of monitoring changes in the database in a specific table from the code?
Any help would be appreciated, thank you.
Most people I have seen do something like this do it with DynamoDB Streams + Lambda for best results. Definitely check out the DynamoDB docs and the Lambda docs on this topic.
There's also an example in the docs of monitoring DynamoDB where changes fire off a message to an SNS topic.
DynamoDB Streams is more efficient and near real time. Think of using Lambda in this way like you would use a trigger in a relational database. Why go to the extra effort of polling when the patterns are this well defined and people use them all the time?
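A minimal sketch of such a handler (the attribute names and downstream action are assumptions), attached to the table's stream as a Lambda event source:

```typescript
import type { DynamoDBStreamEvent } from "aws-lambda";

// The Lambda is attached to the table's stream (e.g. NEW_AND_OLD_IMAGES), so it
// only runs when something actually changes, instead of the reader polling every minute.
export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
  for (const record of event.Records) {
    if (record.eventName === "INSERT" || record.eventName === "MODIFY") {
      // NewImage is the item after the change, in DynamoDB attribute-value format.
      console.log("Change detected:", JSON.stringify(record.dynamodb?.NewImage));
      // React to the change here, e.g. notify the reading process or publish to SNS.
    }
  }
};
```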
Google just released Cloud Firestore, their new Document Database for apps.
I have been reading the documentation, but I don't see a lot of differences between Firestore and the Firebase Realtime Database.
The main point is that Firestore uses documents and collections, which make querying easier compared to the Realtime Database, which is a traditional NoSQL database with a JSON base.
I would like to know a bit more about their differences and usages, and whether Firestore has simply come to replace the Firebase Realtime Database.
I wrote an entire blog post all about this very question, and I recommend you check it out (or the official documentation) for a more complete answer.
But if you want the quick(-ish) summary, here it is:
Better querying and more structured data -- While the Realtime Database is just a giant JSON tree, Cloud Firestore is a little more structured. All your data consists of documents (which are basically key-value stores) and collections (which are collections of documents). Documents will also frequently point to subcollections, which contain other documents, which themselves can contain other documents, and so on.
This structured data helps you out in two ways. First, all queries are shallow, meaning that you can request a document without grabbing all the data underneath. This means you can keep your data stored hierarchically in a way that makes more sense to you without having to worry about keeping your database shallow. Second, you have more powerful queries. For instance, you can now query across multiple fields without having to create those "combo" fields that combine (and denormalize) data from other parts of your database. In some cases, Cloud Firestore will just run those queries directly, and in other cases, it will automatically create and maintain indexes for you.
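For example, a compound query like the following (the collection and field names are made up here) filters on two independent fields directly; depending on the fields involved, Firestore may ask you to create a composite index the first time you run it:

```typescript
import { getFirestore, collection, query, where, getDocs } from "firebase/firestore";

const db = getFirestore();

// Filter on two separate fields directly; no denormalized "city_price" combo field needed.
export async function cheapRestaurantsIn(city: string) {
  const q = query(
    collection(db, "restaurants"),
    where("city", "==", city),
    where("priceLevel", "<=", 2)
  );
  const snapshot = await getDocs(q);
  return snapshot.docs.map((d) => d.data());
}
```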
Designed to Scale -- Cloud Firestore will be able to scale better than the Realtime Database. It's important to note that your queries scale to the size of your result set, not your data set. So searching will remain fast no matter how large your data set might become.
Easier manual fetching of data -- Like the Realtime Database, you can set up listeners in Cloud Firestore to stream in changes in real time. But if you don't want that kind of behavior and just want a simple "fetch my data" call, Cloud Firestore has that as well, and it's built in as a primary use case. (They're much better than the once() calls in Realtime-Database-land.)
Multi-region support -- This basically means more reliability, as your data is replicated across multiple data centers at once. But you still have strong consistency, meaning you can always make a query and be assured that you're getting the latest version of your data.
Different pricing model -- While the Realtime Database primarily charges based on storage or network bandwidth, Cloud Firestore primarily charges based on the number of operations you perform. Will this be better, or worse? It depends on your app.
For powering a news app, turn-based multiplayer game, or something like your own version of Stack Overflow, Cloud Firestore will probably look pretty favorable from a pricing standpoint. For something like a real-time group drawing app where you're sending across multiple updates a second to multiple people, it probably will be more expensive than the Realtime Database.
Why you still might want to use the Realtime Database -- It comes down to a few reasons.
That whole "it'll probably be cheaper for apps that make lots of frequent updates" thing I mentioned previously,
It's been around for a long time and has been battle tested by thousands of apps,
It's got lower latency, and when you need something with reliably low latency for a real-timey feel, the Realtime Database might work better.
For most new apps, we recommend you check out Cloud Firestore. But if you have an app that's already on the Realtime Database, I don't really recommend switching just for the sake of switching, unless you have a compelling reason to do so.
Reasons to choose Cloud Firestore over Realtime Database
It is an improved version
The Firebase Realtime Database was enough for basic applications, but it was not powerful enough to handle complex requirements. That is why Cloud Firestore was introduced. Here are some major changes:
The basic file structure is improved.
Offline support for the web client.
Supports more advanced querying.
Write and transaction operations are atomic.
Reliability and performance improvements
Scaling will be automatic.
Will be more secure.
Pricing
In Cloud Firestore, rates are lower, even though it charges primarily for operations performed in your database, along with bandwidth and storage. You can set a daily spending limit too. Here are the complete details about billing.
Future plans of Google
When they discovered the flaws in the Realtime Database, they created another product rather than improving the old one. Even though there are no reliable details revealing their current stance on the Realtime Database, it is time to start considering that it is likely to be abandoned.
A suggested link from Google as well:
Firebase Real-time Database vs FireStore
Extracted from the Google docs, a small summary here:
Firebase Realtime Database is a JSON-based NoSQL DB, meant for mobile apps, regional, and typically used to store and sync data between users/devices in real time with extremely low latency.
Firestore is a JSON-like NoSQL DB meant for high-concurrency, global, easily auto-scaling persistence, designed for any client (not only mobile apps), with typical use cases such as asset tracking, real-time analytics, retail product catalogs, social user profiles, gaming leaderboards, chat-based applications, etc.
Cloud Firestore is Firebase's database for mobile app development. It builds on the successes of the Realtime Database with a new, more intuitive data model. Cloud Firestore also features richer, faster queries and scales further than the Realtime Database.
Realtime Database is Firebase's original database. It's an efficient, low-latency solution for mobile apps that require synced states across clients in realtime.
To choose between Firebase Realtime database and Cloud firestore based on your application requirements, read official documentation here.