How to consume changes to Google Cloud Datastore as a stream? - google-cloud-datastore

The Cloud Dataflow page implicates that this would be possible, but I haven't found a way of observing change events in the Google Cloud Datastore docs. How is it done?

As far as I am aware, the integration of Cloud Datastore with Dataflow is through DatastoreIO (now based on DatastoreV1), which can only be used as a bounded source for batch jobs.
I have been trying to find an alternative solution that would allow you to use Datastore (directly or indirectly) as an unbounded source (for instance creating a Pub/Sub topic where Datastore changes are published and can be consumed from Dataflow), but I do not think that would be a viable solution given that, as you said, there is no easy way to detect changes (addition of entities, modification of entities, etc.) in Datastore.
For now, I have filed an internal request to improve the documentation to either modify the image so that it does not imply that Cloud Datastore can be used with a Streaming Pipeline, or clarify this use case.

Related

Trigger an R function from Google storage bucket?

From this link https://code.markedmondson.me/googleCloudRunner/articles/usecase-r-event-driven-pubsub.html, it's possible to trigger an R function from pub/sub.
But does Google offer possibility to trigger an R function when a new file is uploaded to a bucket?
Thanks.
You should be able to do this using EventArc and Cloud Run, see Create a trigger for Cloud Run.
NOTE Cloud Run supports any runtime (that you can containerize) and -- although opinionated (constrained set of runtime versions), Cloud Functions next generation (gen2) creates Cloud Run services.
One other way to do this is less practical: Cloud Storage triggers with Cloud Functions. R isn't a supported runtime for Cloud Functions and so, while not impossible, I don't recommend this approach.
There are probably others, GCP's eventing is confusing and my take is that Google's supporting different approaches for customer continuity reasons and doesn't provide clear guidance on the best solutions for "green field" applications.
I think (!?) Eventarc is the strategic solution and recommend using that if you can.

How do I send data from Pub/Sub to Google Cloud Firestore?

I'm setting up a dataflow in Google Cloud Platform. I have retrieved data from IoT devices and sent to Pub/Sub, and I now want to store in Firestore. I am using Firebase for my web app where data should be displayed etc.
I realise this should be a simple task, but am still not finding too much information about it (probably too straight forward). Have looked at using Cloud Functions to retrieve from Pub/Sub and add to Firestore, which ought to work (although I got stuck at credentials), but is there an easier way that I am missing?
Your thinking on using Cloud Functions as the recipient of a PubSub message coming from Cloud IoT Core and in that Cloud Function performing an insert into Firestore is exactly correct.
Here is an example showing just that:
https://blog.usejournal.com/build-a-weather-station-with-google-cloud-iot-cloud-firestore-mongoose-os-android-jetpack-350556d7a
If we Google search using "gcp pubsub firestore" we will find many others.
If you use this technique and run into puzzles/problems, don't hesitate to raise new questions (but first search and see if they have been raised before). In your new questions, please be as detailed as possible. There are many samples available for all sorts of related areas. Have a study of these.

Using Firebase as a backend for recurring tasks

I am working on a project that, currently, is 100% Firebase. Ideally, given I'm fully Firebase, I'd like to stay with Firebase for a next task which is updating some of the records based on external API calls once per day.
I'm currently using Firebase Functions for triggered events, not using it for API calls, everything that happens in the functions is after a user does something, and doesn't respond back to any clients (only responds back to the database for updates).
Is Firestore Cloud Functions a good place to run something like this that could call an external API and then update as necessary? I saw the scheduled functions that require the Blaze plan, have considered it but not sure if there's another approach that's better built for this task.
Cloud Functions that trigger on Firestore events probably aren't what you're looking for. Firestore triggers only fire when something in your Cloud Firestore database has changed. That means you need something that's writing to some document in the database in order to get the code to run. Which means you need a way to schedule that operation.
No matter what kind of trigger you write, you will need to be on a billing plan in order to make external requests anyway. So even if you somehow managed to put together a solution that uses Firestore triggers, your project would still need to be on a billing plan.
This approach is perfectly okay - in fact, I am using the exact same approach in my project which has 100% Firebase back-end. The overall (Firebase) Cloud Functions gives flexibility in terms of invocation i.e. they can be invoked based on trigger (e.g. storage or database event) or can be called with the HTTP end-point. So, depending on your need you can either use Firestore trigger or database trigger or call an end-point.
Switching to Blaze plan is perfectly fine since otherwise we can't call an external end-point. I switched to Blaze plan just a few months back and didn't pay anything for that as my usage is within the free limit.

Google Firestore a subset or superset of Google Cloud Datastore?

Google announced Firestore, the new document datastore on the block.
I have been developing an application using Google Cloud Datastore for over six months now and after reading the blog, I feel Firestore seems to be a better choice.
The concept of the alternate collection-document-subcollection looks excellent to me because while designing schema for datastore I was aware I will be unable to query nested fields. Now with firestore subcollections, I get full query capabilities which is a game changer for me (I can get maximum data with minimum queries).
As a counter argument, the flowchart suggests me to use datastore because I do not have any mobile clients.
Will it be a good idea to use Firestore just like Datastore ?
(I will conveniently ignore the mobile client/realtime updates/syncing features!)
Update 2 (01/31/19)
As of today, Cloud Firestore is no longer in Beta and is Generally Available:
https://cloud.google.com/blog/products/databases/announcing-cloud-firestore-general-availability-and-updates
This means that Cloud Datastore is no longer an option for new projects (you can keep using it on existing projects). New projects that want to use the Datastore API can use Cloud Firestore in Datastore mode.
Update 1
As you we have noticed, we've expanded Cloud Firestore since this question was posted.
This means Cloud Firestore now has 2 modes:
The original launch was 'Native mode'
The new launch adds 'Datastore mode'
'Datastore mode' is the 3rd gen of Cloud Datastore. 1st was called Master/Slave Datastore, 2nd was High-Replication Datastore (HRD) that was rebranded as Cloud Datastore in 2013.
The below answer is still largely relevant since both modes are currently mutually exclusive, so you need to pick one or the other.
The main differences are the improves of Cloud Firestore in Datastore mode over Cloud Datastore. The biggest ones are:
Write through-put per entity group now unbounded (was 1 write/second)
Transactions no longer limited to 25 entity groups
All queries now strongly consistent.
Also note Cloud Firestore regardless of mode is beta, so the new Service-Level Agreement (SLA) doesn't go into effect until the product reaches General Availability (GA).
Original Answer
Cloud Datastore (CD) and Cloud Firestore (CF) are similar, however different in significant ways.
CF is mobile-centric with direct from mobile client functionality with the Firebase SDKs and Rules functionality. CD is server-centric with a wider range of server client libraries, as well as some mature frameworks on App Engine Standard that bundle in memcache functionality.
CF has a newer storage layer that is strongly consistency in the same way as Cloud Spanner, however, it's still in beta without an SLA. CD's storage layer is only strongly consistent within entity-groups and eventually consistent across entity-groups, however, it is GA with a 99.95% SLA for the Multi-Region locations.
CF is only available in the US Multi-Region at this time. CD is available Cloud across a dozen locations including places in the Americas, Europe, Asia, and Australia.
CF during beta has a guideline limit of 2500 writes/second while we build experience monitoring and tuning the system prior to GA, whereas CD will happily handle >1M writes/second (please reach out to your account rep first though).
CF and CD's set of query capabilities are overlapping but not the same. Overall CD has a broader set of query capabilities we haven't built in CF yet, so you'd have more flexibility in CD.
Overall, I'd consider this list to see if any of the differences make or break what you're trying to build then pick the DB that fits closest to your needs.
Firestore is the 3rd generation architecture and replacement for Datastore, essentially available in 2 modes: Native mode and Datastore mode.
Documentation regarding the choices: https://cloud.google.com/datastore/docs/firestore-or-datastore
Video overview: https://www.youtube.com/watch?v=SYG-BgXoJFQ
I'd say that Datastore is now a subset of Firestore:
Cloud Firestore is the next major version of Cloud Datastore and a re-branding of the product.
See Choosing between Cloud Firestore and Cloud Datastore
Cloud Firestore can operate in "Datastore mode", making it backwards- compatible with Cloud Datastore. Some time after Cloud Firestore is released for general availability, Google will begin contacting owners of existing Cloud Datastore databases to schedule an automatic upgrade to Cloud Firestore in Datastore mode. See auto upgrade
Google documentation says:
Firestore is the new version of Datastore and removes several
Datastore limitations.
I think cloud firestore also has nodejs client and its not centric to mobile. Actually thats the difference between Firebase realtime database which was mobile-centri vs Cloud Firestore which is anything centric.

Streaming into Dataflow from Datastore?

Is it possible to use the Datastore as an input on a streaming basis? I.e. any time an entity is saved to the datastore it streams that to a dataflow project?
Currently we do not stream out of Datastore automatically, but I've made a note of your interest in it. One approach you can consider is to monitor any relevant source from a App Engine and publish its contents to PubSub.

Resources