The goal here is to delete data as fast as possible while keeping Firebase Realtime Database instance utilization under 100%.
I have 360 GB of data in a Firebase Realtime Database, and I want to delete most of the data, which is no longer needed. I have a script that deletes data using firebase database:remove /node1/child1 (https://firebase.googleblog.com/2019/03/large-deletes-in-realtime-database.html).
Node Structure
"node1":{
"child1":{
"thousand of child's node here i want to delete"
},
"child2":{
"thousand of child's node here i want to delete"
},
"child3":{
"child3 is required can not delete this one "
}
}
I was wondering: if I update the path /node1/child1 to null instead of running firebase database:remove /node1/child1, will that remove all the children of child1? And what is the difference between the two approaches?
You should use firebase database:remove, as detailed in this blog post. Just calling remove() or update(null) will lock the database until all the data is deleted, which could take many minutes or even hours with a dataset that large.
The CLI command instead chunks and batches the deletes into reasonable sizes, keeping your database from being completely locked. In fact, with database:remove you don't need to batch manually: you can just pass it the largest node you need deleted and it will automatically take care of the batching for you.
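For the structure above, that means one command per removable child. A sketch (assuming the default database instance; /node1/child3 is simply left alone):

firebase database:remove /node1/child1
firebase database:remove /node1/child2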
If you pass a null value to a Firebase path, it is the same as removing that path.
Passing null for the new value is equivalent to calling remove(); namely, all data at this location and all child locations will be deleted.
Firebase documentation
The implementation of the remove method in the Firebase Admin SDK for Java also uses this set-to-null approach:
/**
 * Set the value at this location to 'null'.
 *
 * @return The ApiFuture for this operation.
 */
public ApiFuture<Void> removeValueAsync() {
  return setValueAsync(null);
}
Firebase source code in Java
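The same equivalence holds in the JavaScript SDKs. A minimal sketch with the Firebase Admin SDK for Node.js (initialization details assumed); each call below deletes the whole subtree under /node1/child1 in a single operation, which is exactly the behavior that locks the database on large nodes:

const admin = require("firebase-admin");
admin.initializeApp(); // assumes default credentials

async function deleteChild1() {
  const ref = admin.database().ref("/node1/child1");
  // Any one of these is enough; all three are equivalent.
  await ref.remove();
  await ref.set(null);
  await admin.database().ref("/node1").update({ child1: null });
}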
I'm relying on Firebase Firestore's offline capabilities, so I'm not using await on my queries, as stated in the Access Data Offline Firebase doc. I'm expecting that when I write something I'll see it reflected immediately on my read stream; however, I only get an update once the server/remote has been updated. Basically:
I update something in the DB. Note that I'm not using await:
_db.doc(parentDoc).collection(DocInnerCollection).doc(childDoc).update({
"name": value,
});
I expect my listeners to be updated immediately. Note that I've set includeMetadataChanges to true as stated in the doc above.
_db.doc(parentDoc)
    .collection(DocInnerCollection)
    .orderBy('start_date', descending: true)
    .limitToLast(1)
    .snapshots(includeMetadataChanges: true)
    .map((snapshot) {
      print(snapshot.metadata.isFromCache);
    });
However, I get no such update and instead I only get an update when the server has been updated.
You're requesting only one document with .limitToLast(1), but ordering on start_date says nothing about which document was updated most recently. This essentially means that the chances of that one document being the newly updated document are close to zero.
If you want the latest (not just last) document, you need some ordering criteria to determine what latest means. Typically you'd do this by:
Adding a lastUpdated field to your documents, and setting that to firebase.firestore.FieldValue.serverTimestamp().
Ordering your query on that timestamp with orderBy('lastUpdated', 'desc').
And then limiting to the first result with limit(1).
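Putting those three steps together, a minimal sketch using the Web (v8-style) API that the serverTimestamp() reference above comes from; db, childDoc, value, and the collection name are assumptions carried over from the question:

const db = firebase.firestore(); // assumes the app is already initialized

// 1. Write the server timestamp along with the update.
db.collection("items").doc(childDoc).update({
  name: value,
  lastUpdated: firebase.firestore.FieldValue.serverTimestamp(),
});

// 2 & 3. Order on that timestamp and take the single latest result.
db.collection("items")
  .orderBy("lastUpdated", "desc")
  .limit(1)
  .onSnapshot({ includeMetadataChanges: true }, (snapshot) => {
    console.log(snapshot.metadata.fromCache);
  });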
What would a Cosmos DB stored procedure look like that sets the PumperID field on every record to a default value?
We need to do this to repair some data, so the procedure would visit every record that has a PumperID field (not all docs have one) and set it to a default value.
Assuming a one-time data maintenance task, arguably the simplest solution is to create a single-purpose .NET Core console app that uses the SDK to query for the items that require changes and perform the updates. I've used this approach to rename properties, for example. It works for any Cosmos database and doesn't require deploying any stored procs or anything else.
Ideally it is designed to be idempotent, so it can be run multiple times if several passes are required to catch new data coming in. If the item count is large, you can optionally use the SDK operations to scale throughput up on start and back down when finished. For performance, run it close to the endpoint, on an Azure Virtual Machine or Function.
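For a compact picture of that query-then-update loop, here is a sketch in Node.js with the @azure/cosmos SDK rather than the .NET console app described above; the database, container, and partition key details are assumptions:

const { CosmosClient } = require("@azure/cosmos");

async function repairPumperIds() {
  const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);
  const container = client.database("mydb").container("mycontainer");

  // Visit only the items that actually have a PumperID field.
  const query = "SELECT * FROM c WHERE IS_DEFINED(c.PumperID)";
  const { resources: items } = await container.items.query(query).fetchAll();

  for (const item of items) {
    item.PumperID = "default-value";
    // Idempotent: re-running simply writes the same value again.
    // Assumes the container's partition key path is /partitionKey.
    await container.item(item.id, item.partitionKey).replace(item);
  }
}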
For scenarios where you want to iterate through every item in a container and update a property, the best means to accomplish this is to use the Change Feed Processor and run the operation in an Azure function or VM. See Change Feed Processor to learn more and examples to start with.
With Change Feed you will want to start it to read from the beginning of the container. To do this see Reading Change Feed from the beginning.
Then, within your delegate, you will read each item off the change feed, check its value, and call ReplaceItemAsync() to write it back if it needs to be updated.
static async Task HandleChangesAsync(IReadOnlyCollection<MyType> changes, CancellationToken cancellationToken)
{
    Console.WriteLine("Started handling changes...");
    foreach (MyType item in changes)
    {
        if (item.PumperID == null)
        {
            item.PumperID = "some value";
            // call ReplaceItemAsync(), etc.
        }
    }
    Console.WriteLine("Finished handling changes.");
}
Imagine I have the following pseudocode in Flutter/Dart:
Stream<List<T>> list() {
  Query query = FirebaseFirestore.instance.collection("items");
  return query.snapshots().map((snapshot) {
    return snapshot.docs.map((doc) {
      return standardSerializers.deserializeWith(serializer, doc.data());
    }).toList();
  });
}
I am listening to the whole collection of "items" in my database. Let's say for simplicity there are 10 documents in total and I constantly listen for changes.
I listen to this stream somewhere in my code, and let's say this query returns all 10 "items" the first time it is called. This counts as 10 reads: OK, fine. But if I modify one of these documents directly in the Firestore web interface (or elsewhere), the listener fires and I have the impression that another 10 reads are counted, even though I only modified one document. Checking the usage tab of my cloud project reinforces this suspicion.
Is this the case that 10 document reads are counted even if just one document is modified for this query?
If the answer is yes, the next question would be: "Imagine I wanted two calls to list(), one with orderBy 'rating' and another with orderBy 'time' (random attributes); if one of these documents changes, does that mean 20 reads for 1 update?"
Either I am missing something, or Firestore isn't suited to my use case, or I should change my architecture, or I miscounted.
Is there any way to retrieve just the changed documents? (I can obviously implement a cache, a local DB, and a timestamp system to avoid useless reads if Firestore does not do this.)
pubspec.yaml =>
firebase_database: ^4.0.0
firebase_auth: ^0.18.0+1
cloud_firestore: ^0.14.0+2
This probably applies to all environments like iOS and Android, as it is essentially a more general Firestore question, but the example is in Flutter/Dart since that is what I am using, just in case it has something to do with the flutterfire plugin.
Thank you in advance.
Q1: Is this the case that 10 document reads are counted even if just one document is modified for this query?
No, as detailed in the documentation:
When you listen to the results of a query [Note: or a collection or subcollection], you are charged for a read each time a document in the result set is added or updated. You are also charged for a read when a document is removed from the result set because the document has changed. (In contrast, when a document is deleted, you are not charged for a read.)
Also, if the listener is disconnected for more than 30 minutes (for example, if the user goes offline), you will be charged for reads as if you had issued a brand-new query. [Note: so 10 reads in your example.]
Q2: If the answer is yes, the next question...
The answer to Q1 is "no" :-)
Q3: Is there any way to just retrieve the changed documents?
Yes, see this part of the doc, which explains how to catch the actual changes to query results between query snapshots instead of processing the entire query snapshot each time. In Flutter you should use the docChanges property.
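A minimal sketch of docChanges with the Web API (in Flutter the equivalent is the docChanges list on QuerySnapshot); db and the collection name are assumptions:

db.collection("items").onSnapshot((snapshot) => {
  snapshot.docChanges().forEach((change) => {
    // Only documents that changed since the previous snapshot appear here,
    // with type "added", "modified", or "removed".
    if (change.type === "modified") {
      console.log("Modified:", change.doc.id, change.doc.data());
    }
  });
});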
I am using the firebase-tools shell CLI to test Firestore cloud functions.
My functions respond to the onCreate trigger for all documents in a certain collection, by using a wildcard, and then mutate that document with an update call.
const functions = require('firebase-functions');

exports.myFunction = functions.firestore
  .document('myCollection/{documentId}')
  .onCreate(event => {
    const ref = event.data.ref;
    return ref.update({ some: "mutation" });
  });
In the shell I run something like this (passing some fake auth data required by my database permissions):
myFunction({some: "data"}, { auth: { variable: { uid: "jj5BpbX2PxU7fQn87z10d4Ks6oA3" } } } )
However, this results in an error, because the update tries to mutate a document that is not in the database:
Error: no entity to update
In the documentation about unit testing it is explained how you would create mocks for event.data in order to execute the function without touching the actual database.
However, I am trying to invoke a real function which should operate on the database. A mock would not make sense; otherwise this would be nothing more than a unit test.
I'm wondering what the strategy should be for invoking a function like this?
Using the ID of an existing document lets the function execute successfully, but this seems cumbersome because you need to look it up in the database for every test, and it might not be there anymore at some point.
I think it would be very helpful if the shell would somehow create a new document from the data you pass in, and run the trigger from that. Would this be possible maybe, or is there another way?
The Cloud Functions emulator can only emulate events that could happen within your project. It doesn't emulate the actual change to the database that would have triggered it.
As you're discovering, when your function depends on that actual change previously occurring, you can run into problems. The fact of the matter is that it's entirely possible that the created document may have already been deleted by the time you're handling the event in the function (imagine a user acts quickly to delete, but the event is delayed for whatever reason).
All that said, perhaps you want to use set() with SetOptions that indicate you want to merge instead of overwrite. Bear in mind that if the document was previously deleted (with good reason) before the event triggered, you'll unconditionally recreate the document, which may not be what the user wanted.
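A sketch of that suggestion, keeping the question's pre-v1.0 functions API:

exports.myFunction = functions.firestore
  .document('myCollection/{documentId}')
  .onCreate(event => {
    // set() with merge creates the document if it no longer exists,
    // instead of failing the way update() does.
    return event.data.ref.set({ some: "mutation" }, { merge: true });
  });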
So, from Firebase functions, I'm listening to this event -
exports.populateVairations_delete = functions.database
  .ref('/parentA/parentB/child')
  .onDelete(event => {
    // I know how to get the previous value for what I'm listening to...
    const val = event.data.previous.val();
    ...
  });
This function is also invoked when deleting the parent, which is exactly what I want. But when a parent is deleted, how do I access data from /parentA before it is deleted?
onDelete triggers are always executed after the delete has occurred. There's no way to prevent a delete from happening with a function. Your onDelete code will be delivered an event that contains only the data that was deleted. The event object itself can't be used to see other parts of the database.
If you need to access other parts of the database inside a database trigger, you can use the Admin SDK to make those queries. There is a lot of official sample code that illustrates how to do this.
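A sketch of that approach, again with the question's pre-v1.0 functions API; note the Admin SDK reads the database's current state after the delete, not its state from before it:

const admin = require('firebase-admin');
admin.initializeApp(); // initialization details assumed

exports.populateVairations_delete = functions.database
  .ref('/parentA/parentB/child')
  .onDelete(event => {
    const val = event.data.previous.val(); // the deleted data
    // Query other parts of the database with the Admin SDK.
    return admin.database().ref('/parentA').once('value').then(snapshot => {
      console.log('parentA after the delete:', snapshot.val());
    });
  });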
With context.resource.name you could get a string containing the data path.
Just use the Admin SDK for Firebase. It gives you administrator access to the Firebase DB, and from there you can do basically anything with the database.