I have a scenario where most of the documents I want to delete are in a collection called "expired". I do not want to overload my servers by running a long-running process that iterates over documents and deletes them one by one with xdmp:document-delete; I would rather remove them in bulk.
So my question is: how does xdmp:collection-delete work?
Does it iterate over documents and delete them?
or
Does it do something like DROP TABLE in SQL, making it "instantaneous"?
I want to know what the background process for xdmp:collection-delete is. I wonder if anyone can outline the flow of how this function handles documents for deletion, as I want to understand the process in more depth than just an overview of what it does.
xdmp:collection-delete() will delete all documents in the collection in a single transaction. While it's not instantaneous, it should be fast, as it just needs to set the deletion timestamp of each document.
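If you would rather trigger that bulk delete from outside the server, the REST API can do the equivalent in a single request. Here is a minimal sketch in Python; the host, port, credentials, and the use of DELETE /v1/search with a collection parameter are assumptions about a typical REST app server setup, so adjust them to your environment.

import requests
from requests.auth import HTTPDigestAuth

# Assumed: a MarkLogic REST app server on localhost:8000 with digest auth.
# DELETE /v1/search restricted to a collection removes every document in it.
resp = requests.delete(
    "http://localhost:8000/v1/search",
    params={"collection": "expired"},
    auth=HTTPDigestAuth("admin", "admin"),
)
resp.raise_for_status()  # expect 204 No Content on success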
You could also use CoRB to delete documents one by one, and increase the thread count for parallel processing.
Related
I have a very large collection of approximately 2 million documents, all of which are outdated and need to be deleted.
I need to do this operation only once; the new data has a TTL (time to live), so I won't run into this problem again.
Should I use the Firestore console UI to delete them, or is there a better way to do this? Is it possible to do this in one shot, or should I split it up?
There's no single approach that is clearly better here.
The simplest option is probably to delete the documents from the console, but I often also use the Firebase CLI's firestore:delete command, and writing your own logic through the API is equally fine. Any of these can work, all of them will need to read the documents before deleting them, and none of them is going to be significantly faster than the others.
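If you go the API route, the usual pattern with the google-cloud-firestore Python client looks roughly like the sketch below. The collection name and batch size are placeholders; the idea is simply to page through the collection and delete in write batches (a batch is capped at 500 writes).

from google.cloud import firestore

db = firestore.Client()  # assumes default credentials and project

def delete_collection(coll_ref, batch_size=500):
    # Read a page of documents, delete them in one write batch, and repeat
    # until the collection is empty.
    while True:
        docs = list(coll_ref.limit(batch_size).stream())
        if not docs:
            break
        batch = db.batch()
        for snap in docs:
            batch.delete(snap.reference)
        batch.commit()

delete_collection(db.collection("outdated"))  # "outdated" is a placeholder name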
Getting a bunch of documents from a Firestore collection is best done through a query, I fully understand that. However, in certain situations where we want to get a bunch of specific documents based on their document ID from a single collection (or even spanning multiple collections), is it performant in Firestore to loop through those document IDs (whether it's 10 or 1,000) on the client and perform a getDocument() call on each one?
That's the best way to get the job done. It's also the only way, unless you want to use an "in" query to fetch in batches (which I'm told is actually slower than fetching each one individually).
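For reference, fetching a list of specific IDs looks roughly like this with the google-cloud-firestore Python client; the collection name and document IDs are placeholders.

from google.cloud import firestore

db = firestore.Client()  # assumes default credentials and project

doc_ids = ["a1", "b2", "c3"]  # placeholder document IDs
items = db.collection("items")  # placeholder collection name

# One get() per document reference; missing documents come back with
# exists == False rather than raising an error.
for doc_id in doc_ids:
    snap = items.document(doc_id).get()
    if snap.exists:
        print(snap.id, snap.to_dict())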
I need to know how to clear a DocumentDB collection before inserting new documents. I am using a Data Factory pipeline activity to fetch data from an on-prem SQL Server and insert it into a DocumentDB collection. The frequency is set to every 2 hours, so when the next cycle runs, I want to first clear the existing data in the DocumentDB collection. How do I do that?
The easiest way is to programmatically delete the collection and recreate it with the same name. Our test scripts do this automatically. There is the potential for this to fail due to a subtle race condition, but we've found that adding a half second delay between the delete and recreate avoids this.
Alternatively, it would be possible to fetch every document id and then delete the documents one at a time. This would be most efficiently done from a stored procedure (sproc) so you don't have to send it all over the wire, but it would still consume RUs and take time.
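A rough sketch of the delete-and-recreate approach with the current azure-cosmos Python SDK is below (the original DocumentDB SDK uses slightly different names). The endpoint, key, database, container name, and partition key path are all placeholders, and the half-second sleep mirrors the workaround described above.

import time
from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint, key, and names; substitute your own.
client = CosmosClient("https://your-account.documents.azure.com:443/", credential="your-key")
db = client.get_database_client("your-database")

db.delete_container("target-collection")
time.sleep(0.5)  # small delay to avoid the race condition mentioned above
db.create_container(id="target-collection", partition_key=PartitionKey(path="/id"))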
Is it possible to delete all entities in a collection using the shell? I need to refresh the names in a very large collection and have a script that will accomplish this, but I need to delete them all first. Failing that, can I delete an app and start over?
Thanks!
You cannot delete all entities in a collection in one go, but you can select all entities on a page in the Admin Console and delete the selected entities. That is certainly faster than doing it individually, but for a really large collection, this may take some time.
This is the best way currently:
DELETE /yourcollectionname?select * where created > 0&limit=100
You can do the equivalent of this in the Data Explorer. Just fill out the query and the limit boxes.
Or, run this command via curl
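Scripted, that request looks roughly like the sketch below (shown in Python rather than raw curl). The org, app, collection name, and access token are placeholders, and passing the select statement through the ql parameter is an assumption about the API form; the query and limit match the command above.

import requests

# Placeholder Usergrid org/app/collection and access token.
base = "https://api.usergrid.com/your-org/your-app"
token = "your-access-token"

# Same query as above: delete up to 100 matching entities per request; repeat until empty.
resp = requests.delete(
    base + "/yourcollectionname",
    params={"ql": "select * where created > 0", "limit": 100},
    headers={"Authorization": "Bearer " + token},
)
resp.raise_for_status()
print(resp.json())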
We will have a better solution in the near future.
I need to manage the acquisition of many records per hour, about 1,000,000 records, and every second I need to get the last inserted value for every primary key. It works quite well with sharding. I was thinking of using a capped collection to keep only the last record for every primary key. To do this, I make two separate inserts. Is there a way in MongoDB to create some kind of trigger that propagates an insert on one collection to another collection?
MongoDB does not have any support for triggers or similar behavior.
The only way to do this is to make it happen in your code. So the code that writes the first entry should also write the second.
People have definitely requested triggers. If they are necessary for your solution, please cast a vote on the feature request.
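A minimal sketch of the dual write suggested above, using PyMongo; the database name, collection names, and capped-collection size are placeholders.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["telemetry"]  # placeholder database name

# Create the capped collection once if it does not already exist.
if "latest" not in db.list_collection_names():
    db.create_collection("latest", capped=True, size=1024 * 1024)

def record(doc):
    # Write the full record, then mirror it into the capped collection,
    # which is what a trigger would otherwise do.
    db.readings.insert_one(doc)
    db.latest.insert_one(doc)

record({"sensor_id": 42, "value": 3.7})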
I disagree that triggers are needed. MongoDB was created to be very fast and to provide only basic functionality; that is the power of this solution.
I think the best approach here is to implement trigger-like behavior inside your application, as part of the data access layer.