Firestore unused Index limitation? - firebase

I have an application that uses Firebase Firestore as a database. I have currently run out of composite indexes: the database has reached the limit of 200 and I can't add more.
I have manually deleted indexes to some extent, but since my application is pretty large it is very difficult to find the unused indexes by hand. It also takes time to recheck the same queries in multiple parts of the application.
I am looking for a solution: either a better way to identify the unused indexes than searching manually, or an option to extend the index limit.

You may file a support ticket with Google Cloud Support or Firebase Support to get the list of unused indexes; there is no automated way to do this.
Additionally, there is no way to increase the limit.
Changing the way your data is modeled so that you need fewer composite indexes to support your queries might also be an alternative, as mentioned here.
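If it helps the manual review, the existing composite indexes can at least be dumped programmatically and compared against the queries in your code. A minimal sketch, assuming the Node.js @google-cloud/firestore Admin client and the default database; the "-" collection-group wildcard and the project ID are assumptions:

```typescript
// Sketch: list the project's composite indexes so they can be reviewed
// against the queries that actually run in the application.
// Assumes @google-cloud/firestore is installed and application default
// credentials are configured; "my-project" is a placeholder.
import { v1 } from "@google-cloud/firestore";

const adminClient = new v1.FirestoreAdminClient();

async function listCompositeIndexes(projectId: string): Promise<void> {
  // "-" is assumed here as a collection-group wildcard so that indexes from
  // all collection groups of the default database are returned.
  const parent = `projects/${projectId}/databases/(default)/collectionGroups/-`;
  const [indexes] = await adminClient.listIndexes({ parent });

  for (const index of indexes) {
    const fields = (index.fields ?? [])
      .map((f) => `${f.fieldPath} ${f.order ?? f.arrayConfig ?? ""}`)
      .join(", ");
    console.log(`${index.name}\n  [${fields}]`);
  }
}

listCompositeIndexes("my-project").catch(console.error);
```

Any index whose field combination no longer matches a query in the codebase is a candidate for deletion, but confirming that still requires the manual cross-check described above.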

Related

Deleting a very large collection in firestore from the firebase console

I have a very large collection of approximately 2 million documents, all of them outdated and in need of deletion.
I need to do this operation only once; the new data has a TTL (time to live), so I won't run into this problem again.
Should I use the Firestore console UI to delete them, or is there a better way to do this? Is it possible to do it in one shot, or should I split it?
There's no single approach that is clearly better here.
The simplest option is probably to delete the documents from the console, but I often also use the Firebase CLI's firestore:delete command, and writing your own logic through the API is equally fine. Any of these can work; all of them will need to read the documents before deleting them, and none is going to be significantly faster than the others.
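For reference, a minimal sketch of the "write your own logic" route, assuming the Node.js Admin SDK; the collection name and batch size are placeholders:

```typescript
// Sketch: delete a large collection in pages of batched writes.
// Assumes firebase-admin is installed and default credentials are available;
// "outdatedDocs" is a placeholder collection name.
import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.firestore();

async function deleteCollection(path: string, batchSize = 500): Promise<void> {
  const collectionRef = db.collection(path);

  while (true) {
    // Like the console and the CLI, this still has to read the documents
    // before it can delete them.
    const snapshot = await collectionRef.limit(batchSize).get();
    if (snapshot.empty) break;

    // A batched write supports up to 500 operations.
    const batch = db.batch();
    snapshot.docs.forEach((doc) => batch.delete(doc.ref));
    await batch.commit();
  }
}

deleteCollection("outdatedDocs").catch(console.error);
```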

How to find which kinds are not being used in Google Datastore

Is there any way to list the kinds that are not being used by our App Engine app in Google's Datastore, without having to look into our code and/or logic? :)
I'm not talking about indexes, which I can list by issuing
gcloud datastore indexes list
and then compare with the datastore-indexes.xml or index.yaml.
I tried checking the Datastore kind statistics and other metadata, but I could not find anything useful to help me with this.
Should I give up on Datastore providing me useful stats and instead write something to collect statistics (like data size) over a long period, to get at least a clue of which kinds are not being used, and only after that research look into our app code to see whether the kind's model was removed?
Example:
select bytes from __Stat_Kind__
Store it somewhere and keep updating it for a period. If the kind's byte size does not change, then the kind is probably not being used anymore.
The idea is to do some cleaning in datastore.
I would like to find which kinds are not being used anymore, maybe for a long time, or were created manually to be used once. You know, like a table in Oracle that no one knows the purpose of, where the statistics show it was only used once, five years ago. I'm trying to achieve the same in Datastore: I want to know which kinds are not being used anymore, or were last used a while ago, then ask around and back them up or delete them if no owner is found.
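A minimal sketch of that periodic statistics snapshot, assuming the Node.js @google-cloud/datastore client and the built-in __Stat_Kind__ statistics kind; where and how often the snapshots are stored is left open:

```typescript
// Sketch: read Datastore's built-in per-kind statistics so byte and entity
// counts can be stored and compared between runs.
// Assumes @google-cloud/datastore is installed and default credentials are set.
import { Datastore } from "@google-cloud/datastore";

const datastore = new Datastore();

interface KindStat {
  kind: string;
  bytes: number;
  count: number;
  timestamp: Date;
}

async function snapshotKindStats(): Promise<KindStat[]> {
  // __Stat_Kind__ entities carry kind_name, bytes, count and timestamp
  // properties, one entity per kind.
  const query = datastore.createQuery("__Stat_Kind__");
  const [entities] = await datastore.runQuery(query);

  return entities.map((e: any) => ({
    kind: e.kind_name,
    bytes: e.bytes,
    count: e.count,
    timestamp: e.timestamp,
  }));
}

// Run on a schedule, persist the result, and flag kinds whose bytes/count
// never change between snapshots.
snapshotKindStats().then((stats) => console.table(stats)).catch(console.error);
```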
It's an interesting question.
I think you would be best placed to audit your code and instill an organizational practice that requires this documentation to be maintained in future as a business/technical pre-production requirement.
IIRC, Datastore doesn't automatically timestamp entities, and keys (rightly) aren't incremental. So there appears to be no intrinsic mechanism to track changes, short of taking a snapshot (expensive) and comparing your in-flight and backup copies for changes (also expensive, and inconclusive).
One challenge with identifying a kind that appears to be unchanging is that it could be referenced (rarely) by another kind, so, while it does not change, it is still required.
Auditing your code and documenting it for posterity should not only give you a definitive answer (and identify owners), it also pays off a significant technical debt and helps you avoid this problem and probably future ones (e.g. GDPR-like requirements) that will arise.
Assuming you are referring to records being created/updated, I can think of the following options.
Via the Cloud Console (Datastore > Dashboard): this lists all your kinds and the number of records in each kind. Theoretically, you can take a screenshot and compare the counts later, so that you know which kinds have grown and which haven't.
Created/LastModified date columns: I usually add these two columns to most of my Datastore kinds. If you have them, you can write a function that queries them. For example, run a query against each kind, sorted in descending order of creation (or last-modified) date, and pull only the first record from each; that tells you the last time a record was created or modified in that kind.
I would write such a function as part of my app, put it behind a page that requires admin privileges (only the app creator can run it), and then just clicking a link in my app would give me the information.
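A minimal sketch of that second option, assuming the Node.js @google-cloud/datastore client and an indexed lastModified property on each kind; the kind names are placeholders:

```typescript
// Sketch: for each kind, pull the single most recently modified record to see
// when the kind was last touched. Assumes every kind has an indexed
// "lastModified" property; the kind names below are placeholders.
import { Datastore } from "@google-cloud/datastore";

const datastore = new Datastore();
const KINDS = ["Customer", "Order", "LegacyImport"];

async function lastActivityPerKind(): Promise<void> {
  for (const kind of KINDS) {
    const query = datastore
      .createQuery(kind)
      .order("lastModified", { descending: true })
      .limit(1);

    const [entities] = await datastore.runQuery(query);
    const latest = entities[0];
    console.log(kind, latest ? latest.lastModified : "no records found");
  }
}

// Expose this behind an admin-only page or endpoint, as suggested above.
lastActivityPerKind().catch(console.error);
```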

Firestore Realtime Updates 1M Limit

When using Firestore and subscribing to document updates, the documentation states a limit of 1M concurrent mobile/web connections per database.
https://firebase.google.com/docs/firestore/quotas#realtime_updates
Is that a hard limit (enforced/throttled in code)? Or is it a theoretical limit (like you're safe up to 1M, then things get dicey)? Is it possible to get an uplift?
Trying to understand how to support a large user base without needing to shard the database (which is one of the advantages of Firestore). Even at 5M users, it seems you would start having problems because you'd probably hit times when >20% of those users were on your app simultaneously.
As you already noticed, the maximum size of a single document in Firestore is 1 MiB. Trying to store a large number of objects (maps) that may exceed this limit is generally considered bad design.
You should reconsider your app's logic and think about the reason why you need more than 1 MiB in a single document, rather than making each object its own document. To keep using Firestore, you should change the way you hold the data: from a single document to a collection. For collections there is no such limitation; you can add as many documents as you want. According to the official documentation regarding the Cloud Firestore data model:
Cloud Firestore is optimized for storing large collections of small documents.
IMHO, you should take advantage of this feature.
For details, I recommend you see my answer to this post, where I have explained some practices regarding storing data in arrays (documents), maps, or collections.
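As an illustration of that restructuring, a minimal sketch assuming the Node.js Admin SDK; the collection and field names are hypothetical:

```typescript
// Sketch: store each object as its own small document in a (sub)collection
// instead of packing objects into a map inside one 1 MiB-limited document.
// Assumes firebase-admin is installed; all names are placeholders.
import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.firestore();

interface Reading {
  value: number;
  recordedAt: Date;
}

// Instead of growing one document, e.g.
//   db.doc(`devices/${deviceId}`).update({ [`readings.${id}`]: reading })
// write one small document per object:
async function addReading(deviceId: string, reading: Reading): Promise<void> {
  await db
    .collection("devices")
    .doc(deviceId)
    .collection("readings")
    .add(reading); // collections have no practical limit on document count
}
```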
Edit:
Without sharding, I'm afraid it is not an option. In this case sharding will work for sure, so in my opinion that's certainly a reasonable option.

How can I remove automatic Google Cloud Firestore indexes?

I have a small web application that uses Firestore for the data.
I am currently storing around 8 million objects.
The data size in the Firestore database is 10 times bigger than it is in "raw" format (JSON files).
Reading this wiki, I came to the conclusion that there is a high chance the automatic indexes are responsible for the humongous size of the database.
I have read the indexing documentation; it explains how to create and remove custom indexes from the console, but I haven't found a way to remove the automatic indexes.
Since I am only doing direct access to the objects, I don't need field indexing at all.
During the beta there is no way to remove the automatically generated indexes in Firestore.
So far you are only able to manage composite indexes yourself.
Source: a Google Firestore employee mentioned this as part of an answer to another question on Stack Overflow, though I can't find the link to the post anymore.

DocumentDB and how to create folders?

New to DocumentDB, and I am trying to determine the best way to store documents. We are uploading documents every 15 minutes, and I need to keep them as easily separated by upload as possible. At first glance, I thought I could have a database and a collection for each upload. Then I discovered you can only have 3 collections per database. This leaves me with either adding a naming convention or trying to use folders and paths. According to the same source (http://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/), we are limited to 100 paths per collection. This leaves folders. I have been looking, but I haven't found anything concrete on creating folders within a collection. The object API doesn't have an obvious add/create method.
Is this possible? If so, are we limited in how many we can have (assuming I stay within the allowed collection/database size)?
You could define a sequential naming convention and create a range index in the collection's indexing policy. That way, if you need to retrieve a range of documents, you can do so in a way that leverages DocumentDB's indexing capabilities efficiently.
As a recommendation, examine the request charge response header on the requests you fire off during your tests. This lets you gauge how efficient your setup is (how demanding it is on the database), which translates into your cost structure for the service.
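A sketch of that approach, using the current @azure/cosmos SDK rather than the original DocumentDB client; the account, database, container, and id scheme are all placeholders:

```typescript
// Sketch: give each document an id prefixed with its upload timestamp, then
// query a contiguous range of uploads. Uses the newer @azure/cosmos SDK, not
// the original DocumentDB client; all names and the id scheme are placeholders.
import { CosmosClient } from "@azure/cosmos";

const client = new CosmosClient({
  endpoint: "https://my-account.documents.azure.com:443/", // placeholder
  key: process.env.COSMOS_KEY ?? "",
});
const container = client.database("uploads-db").container("uploads");

// e.g. id = "20240101T0915-<docId>" for the batch uploaded at 09:15
async function getUploadRange(fromPrefix: string, toPrefix: string) {
  const querySpec = {
    query: "SELECT * FROM c WHERE c.id >= @from AND c.id < @to",
    parameters: [
      { name: "@from", value: fromPrefix },
      { name: "@to", value: toPrefix },
    ],
  };
  const { resources, requestCharge } = await container.items
    .query(querySpec)
    .fetchAll();

  // The request charge is the efficiency signal mentioned above.
  console.log(`RU charge: ${requestCharge}`);
  return resources;
}
```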
Sorry about the comment. What we ended up doing was just dumping everything into one collection. The Azure DocumentDB query language (i.e. SQL-like) seems robust enough to handle detailed queries, though I am not sure what the efficiency will be like once we have a ton of documents in there.
