How can I mass-update my data in Firebase?

For a while I was using the deprecated .removeOnDisconnect() function to manage client presence, so some documents now permanently show that there are multiple viewers even if there is only one looking at their document. To fix this, I want to delete all children of "clients" for each scratchpad. I read through the docs, but couldn't find a good way to do this. Any suggestions?
My data tree looks like this:
scratchpad.firebaseio.com/:scratchpad_id/clients/:client_id

Firebase doesn't have an operation like you describe (yet).
I'd recommend attaching a "child_added" callback at the root of your Firebase, and then for each child, deleting the "clients" location. This would require you to sync the entire Firebase, but for a server running Node.js that may not be a big deal.
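A minimal Node.js sketch of that approach, using today's firebase-admin SDK (which postdates this answer); the database URL comes from the question, and the credential setup is assumed to be configured elsewhere:

    const admin = require('firebase-admin');

    admin.initializeApp({
      databaseURL: 'https://scratchpad.firebaseio.com',
    });

    // Fires once for every existing scratchpad (and any added later).
    admin.database().ref('/').on('child_added', (snapshot) => {
      // Delete the stale "clients" children under this scratchpad.
      snapshot.ref.child('clients').remove()
        .then(() => console.log(`cleared clients for ${snapshot.key}`))
        .catch((err) => console.error(err));
    });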

Related

Deleting a very large collection in firestore from the firebase console

I have a very large collection of approximately 2 million documents, all of them outdated and in need of deletion.
I need to do this operation only one time; the new data has a TTL (time to live), so I won't run into this problem again.
Should I use the Firestore console UI to delete them, or is there a better way? Is it possible to do this in one shot, or should I split it up?
There's no single approach that is definitively better here.
The simplest option is probably to delete the documents from the console, but I often also use the Firebase CLI's firestore:delete command, and writing your own logic through the API is equally fine. Any of these can work; all of them will need to read the documents before deleting them, and none is going to be significantly faster than the others.
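For reference, this is roughly what the CLI route looks like (the collection name here is a placeholder; --recursive also removes nested subcollections):

    firebase firestore:delete outdatedCollection --recursive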

Changing data structure after app has been published - Firestore

I have just published an app that uses Firestore as a backend.
I want to change how the data is structured;
for example, if some documents are stored in subcollections like 'PostsCollection/userId/SubcolletionPosts/postIdDocument', I want to move each of those postIdDocument documents into the first collection, 'PostsCollection'.
Obviously, doing so would prevent users of the previous app version from writing to and reading from the right collection, and all data would be lost.
Since I don't know how to approach this issue, I want to ask you what is the best approach that also big companies use when changing the data structure of their projects.
So the approach I have used is document versioning. There is an explanation here.
You basically version your documents so that when your app reads them, it knows how to update them to the desired version. In your case, documents would start with no version and need to get to version 1, which means copying the sub-collection documents up to the top-level collection and removing the sub-collection before working with the document.
Yes, it is more work, but it allows an iterative approach to document changes. And sometimes a script is written to update everything to the desired state and new code is deployed 😛, which usually happens when someone wants it done yesterday. With many documents, that can have its own issues.
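A rough Node.js sketch of that upgrade step (the version field and the migration helper are illustrative assumptions; the collection names come from the question):

    const admin = require('firebase-admin');
    admin.initializeApp();
    const db = admin.firestore();

    // Migrate one user's posts from the old nested layout to version 1:
    // copy each subcollection document to the top-level collection, then
    // delete the original and stamp the parent with its new version.
    async function migrateUserPosts(userId) {
      const userRef = db.collection('PostsCollection').doc(userId);
      const userSnap = await userRef.get();
      if (userSnap.get('version') === 1) return; // already migrated

      const posts = await userRef.collection('SubcolletionPosts').get();
      const batch = db.batch(); // note: a batch holds at most 500 writes
      posts.forEach((doc) => {
        batch.set(db.collection('PostsCollection').doc(doc.id), doc.data());
        batch.delete(doc.ref);
      });
      batch.set(userRef, { version: 1 }, { merge: true });
      await batch.commit();
    }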

Modeling one to one chat on firebase

I'm building a one-to-one messaging feature. The intent behind it is the following:
there is a unique project, and people (two or more) can chat about the project, so we can think of a project as a room. I've been looking at different modeling structures; the most common is something like the following:
Chats
  - projectId (room)
    - messages
      - message
      - userId
      - name
      - profilePicture
      - posted (timestamp)
But I've been thinking about a flat structure, something like:
Messages
  - projectId
  - message
  - userId
  - name
  - profilePicture
  - posted
The chat feature is going to have a huge impact on the web app I'm building; that being said, it's quite important to make the right decision (I'm sure there isn't always a right or wrong, but consider the purpose of the chat).
Just some questions that come to my mind:
are there any performance implications of using a flat structure?
what are the advantages of using a nested structure like the one in example #1?
which solution is cheaper (reads/writes)?
There are benefits to both of the solutions you proposed. Let's dive into them:
performance: they are pretty similar from this point of view. If you want to get a chat from Firestore, in the second case you simply query for the messages of a particular chat and parse the required information from the first document you receive (since each message carries the userId, name, profilePicture, etc.). With the first approach the operation is just as straightforward, since you are already asking for a Chat document.
structure: the first solution is the one I prefer, because it's clear what it does, and since Firestore is schemaless it enforces a clear design. With the second approach you are basically flattening your DB, but you are also exposing your messages to privacy issues. Setting up security rules in the first case is pretty straightforward: simply let users access only the chats they are involved in. In the second case, though, all users could potentially read each other's messages, which is probably not something you want.
cost: this basically depends on what you will do with these documents. The cost of Firestore depends both on the number of documents read/written and on the amount of data you store. Here the first solution is clearly better, since you are not adding redundancy for fields like profilePicture, name, userId, etc. These fields logically belong to the Chat entity, not to its messages.
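To make the two read paths concrete, here is a sketch of both query shapes with the Firestore Node.js SDK (collection and field names mirror the structures above):

    const admin = require('firebase-admin');
    admin.initializeApp();
    const db = admin.firestore();

    // Option 1: nested structure - messages live under the chat document.
    async function getNestedMessages(projectId) {
      const snap = await db
        .collection('Chats').doc(projectId)
        .collection('messages')
        .orderBy('posted')
        .get();
      return snap.docs.map((d) => d.data());
    }

    // Option 2: flat structure - one top-level collection filtered by
    // projectId (this where + orderBy combination needs a composite index).
    async function getFlatMessages(projectId) {
      const snap = await db
        .collection('Messages')
        .where('projectId', '==', projectId)
        .orderBy('posted')
        .get();
      return snap.docs.map((d) => d.data());
    }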
I hope this helps since properly setting up a database is vital for any good project.

Deleting very large collections in Firestore

I need to delete very large collections in Firestore.
Initially I used client-side batch deletes, but the documentation changed and started to discourage that with the comments:
Deleting collections from an iOS client is not recommended.
Deleting collections from a Web client is not recommended.
Deleting collections from an Android client is not recommended.
https://firebase.google.com/docs/firestore/manage-data/delete-data?authuser=0
I switched to a cloud function as recommended in the docs. The cloud function gets triggered when a document is deleted and then deletes all documents in a subcollection as proposed in the above link in the section on "NODE.JS".
The problem I am running into now is that the cloud function seems to manage around 300 deletes per second. With the maximum cloud function runtime of 9 minutes, I can manage up to 162,000 deletes this way. But the collection I want to delete currently holds 237,560 documents, which makes the cloud function time out about halfway through.
I cannot trigger the cloud function again with an onDelete trigger on the parent document, as this one has already been deleted (which triggered the initial call of the function).
So my question is: What is the recommended way to delete large collections in Firestore? According to the docs it's not client side but server side, but the recommended solution does not scale for large collections.
Thanks!
When you have too much work to perform in a single Cloud Function execution, you will need to either find a way to shard that work across multiple invocations, or continue the work in a subsequent invocation after the first. This is not trivial, and you have to put some thought and work into constructing the best solution for your particular situation.
For a sharding solution, you will have to figure out how to split up the document deletes ahead of time, and have your master function kick off subordinate functions (probably via pubsub), passing each one the arguments it needs to figure out which shard to delete. For example, you might kick off a function whose sole purpose is to delete documents whose IDs begin with 'a', another for 'b', and so on, querying for them and then deleting them.
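A rough sketch of that fan-out, assuming a hypothetical 'delete-shard' Pub/Sub topic with a worker function subscribed to it:

    const { PubSub } = require('@google-cloud/pubsub');
    const pubsub = new PubSub();

    // Master function: publish one message per document-ID prefix so each
    // subordinate invocation only has to delete its own shard.
    async function fanOutDeletes(collectionPath) {
      const prefixes = '0123456789abcdefghijklmnopqrstuvwxyz'.split('');
      await Promise.all(prefixes.map((prefix) =>
        pubsub.topic('delete-shard').publishMessage({
          json: { collectionPath, prefix },
        })
      ));
    }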
For a continuation solution, you might just start deleting documents from the beginning, go for as long as you can before timing out, remember where you left off, and then kick off a subordinate function to pick up where the prior one stopped.
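And a trimmed-down sketch of the continuation approach with the Admin SDK (the batch size, time budget, and 'delete-continue' topic are assumptions; the 9-minute ceiling comes from the question):

    const admin = require('firebase-admin');
    const { PubSub } = require('@google-cloud/pubsub');

    admin.initializeApp();
    const db = admin.firestore();
    const pubsub = new PubSub();

    const BATCH_SIZE = 300;
    const TIME_BUDGET_MS = 8 * 60 * 1000; // stop well before 9 minutes

    async function deleteCollection(collectionPath) {
      const start = Date.now();
      while (Date.now() - start < TIME_BUDGET_MS) {
        const snap = await db.collection(collectionPath).limit(BATCH_SIZE).get();
        if (snap.empty) return; // nothing left - we are done
        const batch = db.batch();
        snap.docs.forEach((doc) => batch.delete(doc.ref));
        await batch.commit();
      }
      // Out of time: hand the remaining work to the next invocation.
      await pubsub.topic('delete-continue').publishMessage({
        json: { collectionPath },
      });
    }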
You should be able to use one of these strategies to limit the amount of work done per function invocation, but the implementation details are entirely up to you to work out.
If, for some reason, neither of these strategies is viable, you will have to manage your own server (perhaps via App Engine) and message it (via pubsub) to perform a single unit of long-running work in response to a Cloud Function.

How to use multiple namespaces in DoctrineCacheBundle cacheDriver?

I know I can set up multiple namespaces for DoctrineCacheBundle in the config.yml file. But can I use one driver with multiple namespaces?
The case is that in my app I want to cache all queries for all of my entities. The problem is with flushing the cache on create/update actions: I want to flush only part of my cached queries. My app is used by multiple clients, so when a client updates something in his data, for instance in the Article entity, I want to clear the cache only for this client and only for Article. I could add proper IDs to each query and remove them manually, but the queries are used dynamically. In my API, the mobile app sends a version number for which the DB should return data, so I don't know what kind of IDs will be used in the end.
Unfortunately, I don't think what you want to do can be solved with some configuration magic. What you want is some sort of indexed cache, and for that you have to find a more powerful tool.
You can take a look at Doctrine's second-level cache. I don't know how good it is now (I tried it once when it was in beta and it did not make the cut for me).
Or you can build your own cache manager. If you do, I recommend using Redis: its data structures will help you keep your indexes (this can be simulated with memcached, but it requires more work). Here is what I mean by indexes:
You will have a key like client_1_articles, where 1 is the client id. In that key you store all the ids of the articles of client 1. For every article id you have a key like article_x, where x is the id of the article. In this example, client_1_articles is a rudimentary index that will help you if, at some point, you want to invalidate all the cached articles coming from client 1.
The abstract implementation for the above example ends up being a graph-like structure over your cache, with possibly:
- composed indexes: 'client_1:category_1' => {article_1, article_2}
- multiple indexes for one item, e.g. 'category_1' => {article_1, article_2, article_3} and 'client_1' => {article_1, article_3}
- etc.
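The original setup is PHP/Symfony, but the index idea itself is storage-level; here is the same pattern sketched with Node.js and node-redis for illustration (key names mirror the example above):

    const { createClient } = require('redis');

    async function demo() {
      const client = createClient();
      await client.connect();

      // Cache an article and register it in the per-client index set.
      await client.set('article_1', JSON.stringify({ title: 'Hello' }));
      await client.sAdd('client_1_articles', 'article_1');

      // Invalidate every cached article belonging to client 1,
      // then drop the index set itself.
      const keys = await client.sMembers('client_1_articles');
      if (keys.length) await client.del(keys);
      await client.del('client_1_articles');

      await client.quit();
    }

    demo().catch(console.error);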
Hope this helps you in some way. At least that was my solution to a similar problem.
Good luck with your project,
Alexandru Cosoi
