I have been reading this article about reading the Realtime Database directly vs. calling Cloud Functions that return database data.
If I am returning a fairly large chunk of data from a Cloud Function, e.g. a JSON object holding 50 user comments, does this count as outbound (egress) data? If so, does this cost $0.12 per GB per month?
The comments are stored like so, with an incremental key:
comments: [0 -> {text: "Adsadsads"},
           1 -> {text: "dadsacxdg"},
           etc.]
Furthermore, I have read you can call goOffline() and goOnline() using the client SDKs to stop concurrent connections. Are there any costs associated with closing and opening database connections, or is it just the speed aspect of opening a connection every time you read?
Would it be more cost effective to call a Cloud Function that returns the set of 50 comments, or to let the devices read the comments directly from the database but open/close the connection before/after each read, using orderByKey(), startAt(), limitToFirst() and once()?
e.g. something like this:
ref('comments').orderByKey().startAt('0').limitToFirst(50).once('value')
Thanks
If your Cloud Function reads data from Realtime Database and returns (part of) that data to the caller, you pay for the data that is read from the database (at $1/GB) and then also for the data that your Cloud Function returns to the user (at $0.12/GB).
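To make that concrete, here is a minimal sketch of the Cloud Function path, assuming an HTTPS callable function; the function name and data shape are illustrative, not from the question:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Hypothetical callable function that returns the first 50 comments.
exports.getComments = functions.https.onCall(async () => {
  // This read is the database download charge mentioned above (~$1/GB).
  const snapshot = await admin.database()
    .ref('comments')
    .orderByKey()
    .limitToFirst(50)
    .once('value');

  // The returned payload is what counts as Cloud Functions egress (~$0.12/GB).
  return snapshot.val();
});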
Opening a connection to the database means data is sent from the database to the client, and you are charged for this data (typically a few KB).
Which one is more cost effective is something you can calculate once you have all parameters. I'd recommend against premature cost optimization though: Firebase has a pretty generous free tier on its Realtime Database, so I'd start reading directly from the database and seeing how much traffic that generates. Also: if you are explicitly managing the connection state, and seem uninterested in the realtime nature of Firebase, there might be better/cheaper alternatives than Firebase to fill your needs.
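For comparison, a minimal sketch of the direct-read approach with explicit connection management, using the Web client SDK's namespaced (v8-style) API; the path and key format come from the question, everything else is illustrative:

firebase.database().goOnline();

firebase.database()
  .ref('comments')
  .orderByKey()
  .startAt('0')        // keys are strings, even when they look numeric
  .limitToFirst(50)
  .once('value')
  .then(function (snapshot) {
    snapshot.forEach(function (child) {
      console.log(child.key, child.val().text);
    });
    // Drop the concurrent connection once the one-time read is done.
    firebase.database().goOffline();
  });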
In the Firestore documentation (https://firebase.google.com/docs/firestore/manage-data/transactions#transactions) it says: "Read operations must come before write operations". What's the rationale for this rule enforced in the Node.js SDK?
Here's what I know:
Read operations are associated with the transaction id; write operations are submitted in the same transaction as batch writes; all reads and writes within the transaction take effect at commit time.
The Firestore server SDK caches write operations in a write batch and commits them at the end; if the write batch is not empty, read operations fail: https://github.com/googleapis/nodejs-firestore/blob/master/dev/src/transaction.ts#L123
The Firestore SDKs are based on Google gRPC APIs: https://github.com/googleapis/googleapis/blob/master/google/firestore/v1/firestore.proto
What if we removed this empty-write-batch check? The server SDK would still cache all write operations and commit them at the end. What am I missing?
What I'm trying to do is manage collections in different modules; they need to reuse the same transaction to perform an atomic operation (e.g. create an order, reduce the inventory and subtract the user's balance), and each module needs to read from some collections and make some changes.
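To illustrate the pattern I'm after, here is a minimal sketch that keeps all reads before any writes while sharing one transaction across modules; the module functions, collection names and field names are hypothetical:

const { Firestore } = require('@google-cloud/firestore');
const db = new Firestore();

// Module A: performs all reads for the operation on the shared transaction.
async function readOrderInputs(tx, inventoryRef, userRef) {
  const [inventorySnap, userSnap] = await tx.getAll(inventoryRef, userRef);
  return { inventory: inventorySnap.data(), user: userSnap.data() };
}

// Module B: queues writes on the same transaction; nothing reads after this.
function applyOrder(tx, orderRef, inventoryRef, userRef, data, order) {
  tx.create(orderRef, order);
  tx.update(inventoryRef, { stock: data.inventory.stock - order.qty });
  tx.update(userRef, { balance: data.user.balance - order.total });
}

async function placeOrder(order) {
  await db.runTransaction(async (tx) => {
    const orderRef = db.collection('orders').doc();
    const inventoryRef = db.collection('inventory').doc(order.sku);
    const userRef = db.collection('users').doc(order.uid);

    const data = await readOrderInputs(tx, inventoryRef, userRef);
    applyOrder(tx, orderRef, inventoryRef, userRef, data, order);
  });
}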
I created several new pipelines in Azure Data Factory to process the Cosmos DB Change Feed (which goes into Blob Storage for ADF processing into on-prem SQL Server), and I'd like to "resnap" the data from the leases collection to force a full re-sync. Is there a way to do this?
For clarity, my set-up is:
Change Feed ->
Azure Function to process the changes ->
Blob Storage to hold the JSON documents ->
Azure Data Factory, which picks up the Blob Storage documents and maps them to on-prem SQL Server stored proc inserts/updates.
The easiest and simplest way to do it is to simply delete the lease documents and make sure that the StartFromBeginning setting is set to true. Once restarted, the change feed service will recreate the leases (if the appropriate setting is configured to true) and reprocess all the documents.
The other way to do it is to update every single lease document and reset the continuation token "checkpoint" to null; however, I don't recommend this method, since you might accidentally miss a lease, which can lead to issues.
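If you want to script the first approach, a minimal sketch with the @azure/cosmos Node.js SDK could look like this; the database name, lease container name and its /id partition key are assumptions about your setup:

const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient({
  endpoint: process.env.COSMOS_ENDPOINT,
  key: process.env.COSMOS_KEY
});
const leases = client.database("changefeed-db").container("leases");

async function deleteAllLeases() {
  // Read every lease document and delete it; on restart the change feed
  // recreates the leases and, with StartFromBeginning, replays all documents.
  const { resources } = await leases.items.readAll().fetchAll();
  for (const lease of resources) {
    await leases.item(lease.id, lease.id).delete(); // lease container assumed partitioned on /id
  }
  console.log(`Deleted ${resources.length} lease documents`);
}

deleteAllLeases().catch(console.error);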
One child node of my Firebase Realtime Database has become huge (around 20 GB), and I need to purge this and insert the extracted data of the last month from the backup into the Firebase Realtime Database using the Python Admin SDK.
In the documentation, I see the following options:
set - Write or replace data to a defined path, like messages/users/
update - Update some of the keys for a defined path without replacing all of the data
push - Add to a list of data in the database. Every time you push a new node onto a list, your database generates a unique key, like messages/users//
transaction - Use transactions when working with complex data that could be corrupted by concurrent updates
However, I want to add/insert the data from the Firebase backup. I have to insert because the app is used in production and I cannot afford to overwrite data.
Is there any method available to insert/add the data and not overwrite the data?
Any help/support is greatly appreciated.
There is no way to do this in Firebase Realtime Database without reading the current value of each location.
The only operation that allows you to update data based on its existing value is a transaction. A Firebase transaction gives you the (likely) current value at a location, and you then return what the new value should become.
But if the data you're restoring is (largely) the same as the data you have in the database, you might be able to use an update() call with sufficiently deep paths.
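For illustration, a minimal sketch of that multi-path update() pattern with the Admin SDK is below (shown in Node.js; the Python Admin SDK exposes the same idea via db.reference().update()). The paths and values are hypothetical:

const admin = require('firebase-admin');
admin.initializeApp(); // assumes default credentials for the project

const db = admin.database();

// Each key is a full path, so only these exact locations are written and
// sibling data under the same parent is left untouched.
const restore = {
  'messages/users/alice/msg1/text': 'restored message',
  'messages/users/alice/msg2/text': 'another restored message'
};

db.ref().update(restore)
  .then(() => console.log('Restore batch written'))
  .catch((err) => console.error('Restore failed', err));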
I researched in several places and could not find any direction on what options there are to archive old data from Cosmos DB into cold storage. I see that for DynamoDB in AWS it is mentioned that you can move DynamoDB data into S3, but I'm not sure what the options are for Cosmos DB. I understand there is a time-to-live option where the data will be deleted after a certain date, but I am interested in archiving rather than deleting. Any direction would be greatly appreciated. Thanks
I don't think there is a single-click built-in feature in Cosmos DB to achieve that.
Still, as you mentioned appreciating any direction, I suggest you consider the DocumentDB Data Migration Tool.
Notes about the Data Migration Tool:
You can specify a query to extract only the cold data (for example, by a creation date stored within the documents).
It supports exporting to various targets (JSON file, blob storage, DB, another Cosmos DB collection, etc.).
It compacts the data in the process: it can merge documents into a single array document and zip it.
Once you have the configuration set up, you can script it to be triggered automatically using your favorite scheduling tool.
You can easily reverse the source and target to restore the cold data to the active store (or to dev, test, backup, etc.).
To remove exported data, you could use the mentioned TTL feature, but that could cause data loss should your export step fail. I would suggest writing and executing a stored procedure to query and delete all exported documents with a single call. That SP would not execute automatically, but could be included in the automation script and executed only if the data was exported successfully first.
See: Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs.
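A minimal sketch of such a stored procedure (Cosmos DB stored procedures are written in JavaScript) might look like this; the query it receives, e.g. one selecting documents older than your archive cutoff, is up to you:

function bulkDeleteExported(query) {
  var collection = getContext().getCollection();
  var collectionLink = collection.getSelfLink();
  var response = getContext().getResponse();
  var deleted = 0;

  tryQueryAndDelete();

  function tryQueryAndDelete() {
    // Find the next page of documents matching the caller-supplied query.
    var accepted = collection.queryDocuments(collectionLink, query, {}, function (err, documents) {
      if (err) throw err;
      if (documents.length > 0) {
        deleteDocument(documents, 0);
      } else {
        response.setBody({ deleted: deleted });
      }
    });
    if (!accepted) response.setBody({ deleted: deleted });
  }

  function deleteDocument(documents, index) {
    if (index >= documents.length) {
      // This page is done; query again in case more documents match.
      tryQueryAndDelete();
      return;
    }
    var accepted = collection.deleteDocument(documents[index]._self, {}, function (err) {
      if (err) throw err;
      deleted++;
      deleteDocument(documents, index + 1);
    });
    // If the request was not accepted, the SP is near its execution limit:
    // return the count so the caller can invoke it again.
    if (!accepted) response.setBody({ deleted: deleted });
  }
}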
UPDATE:
These days Cosmos DB has added the Change Feed; this really simplifies writing a carbon copy somewhere else.
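For example, a minimal sketch of an Azure Functions Cosmos DB trigger (JavaScript) that forwards changed documents to a blob output binding for cold storage; the binding names are assumptions you would configure in function.json:

module.exports = async function (context, documents) {
  if (documents && documents.length > 0) {
    // Write the changed documents to the archive blob defined by the
    // hypothetical "archiveBlob" output binding.
    context.bindings.archiveBlob = JSON.stringify(documents);
    context.log(`Archiving ${documents.length} changed documents`);
  }
};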
We are in the situation that we will have to update large amounts of data (ca. 5 million records) in Firebase periodically. At the moment we have a few JSON files that are around ~1 GB in size.
As existing third-party solutions (here and here) have some reliability issues (importing object by object, or needing an open connection) and are quite disconnected from the Google Cloud Platform ecosystem, I wonder if there is now an "official" way, e.g. using the new Google Cloud Functions, or a combination with App Engine / Google Cloud Storage / Google Cloud Datastore.
I would really like not to deal with authentication, something that Cloud Functions seems to handle well, but I assume the function would time out (?)
With the new Firebase tooling available, how do I:
Have long-running Cloud Functions do the data fetching / inserts? (Does that make sense?)
Get the JSON files into and out of somewhere inside the Google Cloud Platform?
Does it make sense to first put the large data into Google Cloud Datastore (i.e. too $$$ expensive to store in Firebase), or can the Firebase Realtime Database be reliably treated as a large data store?
I'm finally posting the answer, as it aligns with the new Google Cloud Platform tooling of 2017.
The newly introduced Google Cloud Functions have a limited run-time of approximately 9 minutes (540 seconds). However, Cloud Functions are able to create a Node.js read stream from Cloud Storage like so (@google-cloud/storage on npm):
var gcs = require('@google-cloud/storage')({
  // You don't need extra authentication when running the function
  // online in the same project
  projectId: 'grape-spaceship-123',
  keyFilename: '/path/to/keyfile.json'
});

// Reference an existing bucket.
var bucket = gcs.bucket('json-upload-bucket');
var remoteReadStream = bucket.file('superlarge.json').createReadStream();
Even though it is a remote stream, it is highly efficient. In tests I was able to parse JSON files larger than 3 GB in under 4 minutes, doing simple JSON transformations.
As we are working with Node.js streams now, the JSONStream library can efficiently transform the data on the fly (JSONStream on npm), dealing with the data asynchronously just like a large array with event streams (event-stream on npm).
var JSONStream = require('JSONStream');
var es = require('event-stream');

remoteReadStream
  .pipe(JSONStream.parse('objects.*'))
  .pipe(es.map(function (data, callback) {
    console.error(data);
    // Insert data into Firebase here.
    callback(null, data); // Return data if you want to make further transformations.
  }));
At the end of the pipe, call the callback with only null (i.e. don't pass the data on) to prevent a memory leak that blocks the whole function.
If you do heavier transformations that require a longer run time, either use a "job db" in Firebase to track where you are, do only, say, 100,000 transformations per invocation and call the function again, or set up an additional function which listens for inserts into a "forimport db" and asynchronously transforms the raw JSON record into your target format and production system, splitting import and computation.
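A rough sketch of that "job db" checkpoint idea; the path importJobs/superlarge and the batch size are assumptions for illustration:

var admin = require('firebase-admin');
admin.initializeApp();

var jobRef = admin.database().ref('importJobs/superlarge');
var BATCH_SIZE = 100000;

function runBatch() {
  return jobRef.once('value').then(function (snap) {
    var offset = (snap.val() && snap.val().offset) || 0;
    var processed = 0;

    // ... stream and transform records as shown above, skipping the first
    // `offset` records and stopping once `processed` reaches BATCH_SIZE ...

    // Persist the new checkpoint so the next invocation (triggered again by
    // your scheduler or by re-calling the function) continues from here.
    return jobRef.set({ offset: offset + processed, updatedAt: Date.now() });
  });
}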
Additionally, you can run Cloud Functions code in a Node.js App Engine app, but not necessarily the other way around.