Query Times : Cloud Datastore - google-cloud-datastore

I have a simple data loaded in Cloud Datastore. A "kind" with few "entities".
I am using Cloud Datastore Java client and I am consistently seeing a 100+ ms for the response times.
The document is a simple string based key-value-pair data and is indexed. The higher SLAs were observed through the API explorer console as well as when running the client code in a cluster.
Is this SLA normal for the cloud datastore ?. I am looking for SLAs < ~50ms for a simple lookup.

Without knowing the specifics of your project or the nature of the queries you're making, it's difficult to say whether these are normal response times. It's possible you can improve performance of your queries by following the guidelines that are laid out in the Best Practices.
The latency of queries to Datastore is not currently covered by any service level agreement (SLA).

Related

BigQuery vs Cloud SQL autoscaling?

I declare that I am a beginner in using Google Cloud Platform.
I am developing a web application in react using firebase, so all data is saved on firestore.
Now I need to have a relational database, and I am very confused as to which is the best between Cloud SQL and BigQuery.
My idea was to have one part of the data on Cloud SQL and the other part on Firestore.
When an event happens, the data from Cloud SQL and firestore are merged and uploaded to BigQuery for analysis.
Example:
On Firestore I have a product that has an array field where IDs are
stored. These IDs are related to the Database saved on Cloud SQL. When
an order is placed it is added to a collection on Firestore and
appended to the database on BigQuery.
My problem is that from what I have read there is no possibility of autoscaling on Cloud SQL, while on BigQuery it does.
So my question is can you autoscale on CloudSQL?
If it can't be done, is it correct to use BigQuery exclusively?
Is there another solution on GCP that allows you to have a relational database but with autoscaling?
Edit 1
This is the very simplified model of a part of the database on CloudSQL / BigQuery
I'll use a 2/3 inner join query to get all the values I need.
I don't know how to make it non-relational and therefore be able to use firestore without having a large duplication of data, I am open to any kind of advice
Not sure that I understood correctly, but I reckon you would like to get some data (from one data source), combine/process that data with the data from a Firestore collection, and load/stream the result into BigQuery. All of that - is operationally in run time. The question is about the choice of that data source - either a Cloud SQL or a BigQuery.
Am I right that from you point of view the main Cloud SQL drawback - is a lack of scalability (autoscale). And you would like to consider a BigQuery instead of the Cloud SQL due to the 'autoscale'?
It is not clear what is the rate of the request/queries you expect, and where the data is located (any requirements on a global access), so it may be difficult to discuss the situation. Anyway...
Thinking about BigQuery, in my opinion, - this is a great "database" (the best from my point of view), but mainly for analytical purposes... Each query has some 'initial' latency (the query job won't be executed faster than some threshold), which cannot be significantly minimised, and there is no binary indexes in BigQuery tables. It means that your query will take a few seconds (let's assume 3 or more) every time you run it (unless the result is taken from the cache). If the number of requests is significant - it may become expensive (in BigQuery) and expensive in the component, which is used to process that task (i.e. Cloud Function triggered by some event) - as the later has to wait (and do nothting) during the query time.
In addition, BigQuery is very good in loading or steeaming data into it, but not very good in regular data updates inside it - there are plenty of limitations. Thus, depending on your context, it may be not very good idea to maintain operational data in BigQuery.
If I rule out the BigQuery -
Can we sacrifice 'autoscalability' for the Cloud SQL?
Can we use a Firestore collection instead of the Cloud SQL (and sacrifice the 'relational' property?
Can we use Cloud SQl and handle the the amount of data in tables which are used for querying, so there is no delays?
Not sure if I managed to help, but at least I provided some thoughts about the problem.
'Now I need to have a relational database, and I am very confused as to which is the best between Cloud SQL and BigQuery.'
Please be aware that BigQuery cannot be used to substitute a relational Database, and it is oriented on running analytical queries, not for simple CRUD operations and queries (Like in Cloud SQL). That doesn’t mean BigQuery can’t handle normalized data and joins. It absolutely can. It just performs better on denormalized stuff because BigQuery is essentially an OLAP engine. So, denormalize whenever possible (please read here).
You can use read replications to scale Cloud SQL. Read Replica instances allow data from the master instance to be replicated to one or more slaves. This setup can provide increased read throughput. Please see this.

Is there a Google API answering about Firestore database either Metrics or Health Checks or Current Active Connectios or Exceptions or Performance

Context: I am total Google Cloud begginer and I have just convinced my company headers to use Firestore Realtime Database for pushing transaction status to our mobile application. We have around 4 millions users that will use significantly our application for small money transfers. Now-a-days we use the concept of polling from Android/IOS to our Microservice endpoints and it will replaced by Firebase SDK imported to our Mobile app which will listen/observe to our Firestore Collection following few Firestore Rules. Since all money transfer will be confirmed/denied in short time (from few seconds to 1 or 2 minutes) the idea of replacing polling by a real reactive approach straigh from Firestore sounded and is already ongoing coding.
The issue: Firstly I don't what to compare solutions. It is just my reality: the prodution support operators must look after our internal Dashboard. Isn't allowed to them look at Google Dashboard Console (please accept this for this question). I need get on demand metrics of our FIrestore. It is nothing to do with Google pricing. It is just our demand: they want to see metrics like:
how many users listening at the same time now
how many users took some exception during connection
is there any user holding connection for more than X minute
when was the connection pick this morning
any exception of any type surrounding our Firestore database
I read Code Samples carefully follow the sample step-by-step trying to figure out some idea if there is some API providing the answers I am looking for.
So, my straight question is: is there such type of Google API providing metrics about my Firestore Database? Maybe following the same idea we found in Performance Monitor which works on Mobile side also some similar aproach on Firestore side.
*** Edited
Future readers may find worth read also about a way to get Firestore metrics info striagh from curl/postman
A couple of things: You mentioned both Firestore and Realtime Database; just wanted to make sure that you are aware that those are two different databases offered under the Firebase umbrella.
how many users listening at the same time now
is there any user holding connection for more than X minute
Yes, there's a dashboard: https://support.google.com/firebase/answer/6317517?hl=en. Including lots of options, like users active in the last 30 mins.
how many users took some exception during connection
any exception of any type surrounding our Firestore database
Yes, you can track errors and other logging via Stack Driver logging. These can give you reports on your cloud functions.
https://cloud.google.com/functions/docs/monitoring
Where can I find Stackdriver in Firebase console?
when was the connection pick this morning
For this one, I'm not sure if you mean A. when did somebody log on in the morning, or B. what was the time that there was the peak \ most usage. If B see 1. If A,
Real-time database has the concept of presence, which lets you know if a user is currently logged in or not. See examples here from the official documentation:
https://firebase.google.com/docs/firestore/solutions/presence
and this post
How to make user presence mechanism using Firebase?
Also applies to your
is there any user holding connection for more than X minute
..............
Edit in response to comments: I believe you are experiencing the XY problem https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem where you are focused on a particular solution, even though your problem has other solutions. User metrics, database events, and errors are all accessible through both dashboards and cloud functions. You can cURL cloud functions if you wish, or set up cron functions to auto report, or set up database trigger functions to log errors. So, while the exact way you want this to work may not exist, you just need to connect existing tools to get the result you want.

Performance difference Firestore through Firebase Functions vs Firestore SDK

Our team is developing a mobile app and is currently in use of (Firebase) Firestore for our backend. We wrapped every DB access with Firebase Functions in order to clean up the object returned to the client app.
Does this approach introduce any (additional) unignorable overhead compared to accessing to Firestore directly?
Yes but No depending on your use case.
If you have small amount of users with relatively low usage (in terms of the given quota), it is recommended to apply Cloud Functions. As stated in the documentation, Firebase Cloud Function offers big quota in terms of Resource limits, Time limits and Rate limits with good pricing especially for the Spark plan (FREE).
The advantage of using Cloud Functions is that it has a high speed and scalable computing / processing unit which could shorten the processing time of a specific function as compared with using the mobile phone CPU which in some cases the mobile phones has low computing power (have to consider various users as not everyone own a high spec phone), in order to provide better user experience (UX), all this hassle can be done by Cloud Function!
Note: I do agree with Doug where cost is one of the factor, but we should also consider the performance and other perspective.
Yes, at the very least, now your path to get data has two hops instead of one. Before, you directly accessed the database using a channel that's optimized for returning the query results. Now, you have to pay the cost of an additional hop to Cloud Functions, which makes the query. And it's possible that the results returned to the client are bigger than if you made the query directly.
Perhaps the biggest loss you'll experience is the client side caching of documents that's automatically performed by the client (enabled by default on Android and iOS). If you repeat a query and none of the documents have changed, you get immediate results from the cache instead of having to wait for the server. And you won't have to pay for document reads for cache hits. So, if you aren't also caching your results, you're also paying the monetary cost of Cloud Functions and the query to Firestore for every request.
Yes, but the answer could be different based on the situation.
If a client wants to fetch a record exactly as in the database, the Firebase SDK might be faster because there is no overhead calling the Firebase Functions.
If we have a heavy processing after fetching a record, then Firebase Functions + Firebase Admin SDK could be faster because the processing unit in Firebase Functions could be faster than mobile CPU. However, if the request responds faster, the client app could display an additional message that something was fetched and currently in process during the heavy processing, the user experience could be acceptable.
The only case I can come up with Firebase Functions could always win is that the server reduces the data size so that the overhead introduced by Firebase Functions (including processing time) was compensated by the shorter network delay. This also has advantage of saving client's data plan.

What are the limitations of performing a Firestore query over a large collection

I would like to use Firebase Firestore for my next project but I need some help understanding the limitations of what is possible with Firestore queries.
Basically, I would like to perform queries over collections with large amounts of documents but upon doing research I have come across conflicting information.
This video (from the Firebase team): https://youtu.be/W3xIOQu0h1w?t=11m50s states that you can perform a query over a collection with "billions" of documents and maintain the same level of performance compared to a query over a collection with a few documents.
Then, I came across this github issue where the poster states that queries are taking too long and demand alot from the system. A member of the Firebase team answers by stating that performing a query over a collection containing 35k documents is beyond the "performance envelope".
So can someone point me in the right direction concerning Firestore queries and its limitations.
Please let me know if I was not clear in any part of this post.
That GitHub issue you linked to is specifically talking about offline searches. This means the Firestore backend service is not available, so the search is performed on the client. This is a very different situation than when the client is online and the service can perform the query in a massively scalable way. (Client apps are never massively scalable.)

What's the difference between Cloud Firestore and the Firebase Realtime Database?

Google just released Cloud Firestore, their new Document Database for apps.
I have been reading the documentation but I don't see a lot of differences between Firestore and Firebase DB.
The main point is that Firestore uses documents and collections which allow the easy use of querying compared to Firebase, which is a traditional noSQL database with a JSON base.
I would like to know a bit more about their differences, or usages, or whether Firestore just came to replace Firebase DB?
I wrote an entire blog post all about this very question, and I recommend you check it out (or the official documentation) for a more complete answer.
But if you want the quick(-ish) summary, here it is:
Better querying and more structured data -- While the Realtime Database is just a giant JSON tree, Cloud Firestore is a little more structured. All your data consists of documents (which are basically key-value stores) and collections (which are collections of documents). Documents will also frequently point to subcollections, which contain other documents, which themselves can contain other documents, and so on.
This structured data helps you out in two ways. First, all queries are shallow, meaning that you can request a document without grabbing all the data underneath. This means you can keep your data stored hierarchically in a way that makes more sense to you without having to worry about keeping your database shallow. Second, you have more powerful queries. For instance, you can now query across multiple fields without having to create those "combo" fields that combine (and denormalize) data from other parts of your database. In some cases, Cloud Firestore will just run those queries directly, and in other cases, it will automatically create and maintain indexes for you.
Designed to Scale -- Cloud Firestore will be able to scale better than the Realtime Database. It's important to note that your queries scale to the size of your result set, not your data set. So searching will remain fast no matter how large your data set might become.
Easier manual fetching of data -- Like the Realtime Database, you can set up listeners in Cloud Firestore to stream in changes in real-time. But if you don't want that kind of behavior, and just want a simple "fetch my data" call, Cloud Firestore has that as well, and it's built in as a primary use case. (They're much better than the once calls in Realtime Database-land)
Multi region support -- This basically means more reliability, as your data is shared across multiple data centers at once. But you still have strong consistency, meaning you can always make a query and be assured that you're getting the latest version of your data.
Different pricing model -- While the Realtime Database primarily charges based on storage or network bandwidth, Cloud Firestore primarily charges based on the number of operations you perform. Will this be better, or worse? It depends on your app.
For powering a news app, turn-based multiplayer game, or something like your own version of Stack Overflow, Cloud Firestore will probably look pretty favorable from a pricing standpoint. For something like a real-time group drawing app where you're sending across multiple updates a second to multiple people, it probably will be more expensive than the Realtime Database.
Why you still might want the to use the Realtime Database -- It comes down to a few reasons.
That whole "it'll probably be cheaper for apps that make lots of frequent updates" thing I mentioned previously,
It's been around for a long time and has been battle tested by thousands of apps,
It's got better latency and when you need something with reliably low latency for a real-timey feel, the Realtime Database might work better.
For most new apps, we recommend you check out Cloud Firestore. But if you have an app that's already on the Realtime Database, I don't really recommend switching just for the sake of switching, unless you have a compelling reason to do so.
Reasons to choose Cloud Firestore over Realtime Database
It is an improved version
Firebase database was enough for basic applications. But it was not powerful enough to handle complex requirements. That is why Cloud Firestore is introduced. Here are some major changes.
The basic file structure is improved.
Offline support for the web client.
Supports more advanced querying.
Write and transaction operations are atomic.
Reliability and performance improvements
Scaling will be automatic.
Will be more secure.
Pricing
In Cloud Firestore, rates have lowered even though it charges primarily on operations performed in your database along with bandwidth and storage. You can set a daily spending limit too. Here is the complete details about billing.
Future plans of Google
When they discovered the flaws with Real-time Database, they created another product rather than improving the old one. Even though there are no reliable details revealing their current standings on Real-time Database, it is the time to start thinking that it is likely to be abandoned.
Suggest link from google as well :
Firebase Real-time Database vs FireStore
Extracted from google docs, a small sumamry here:
FireBase Real Time DB is JSON based NO SQL DB, meant for mobile apps, regional, and used typically to store and sync data between users/devices in realtime / extremely low latency.
FireStore is JSON 'like' NOSQL DB meant for high concurrency, global, easily auto scaling persistence, designed for any clients (not only mobile apps) with typical use cases such as asset tracking, real time analytics, building retail product catalogs, social user profile, gaming leaderboards, chat based applications etc.
Cloud Firestore is Firebase's database for mobile app
development. It builds on the successes of the Realtime Database with
a new, more intuitive data model. Cloud Firestore also features
richer, faster queries and scales further than the Realtime Database.
Realtime Database is Firebase's original database. It's an efficient,
low-latency solution for mobile apps that require synced states
across clients in realtime.
To choose between Firebase Realtime database and Cloud firestore based on your application requirements, read official documentation here.

Resources