Is there a limit on the number of paths in Firebase multi-path updates?

How many different paths are allowed inside a multi-path update (maximum)?
What is the ideal number of different paths that can be written to simultaneously without causing any issues/warnings?
Basically, to summarize it all: how many locations can be written simultaneously before Firebase can no longer handle it?
I am looking to run a script which resets various paths. The number of locations can be huge, so to optimize this operation I was thinking of using a multi-location update.

If you're running a script which performs a huge number of writes, multi-path updates are exactly what you need. Don't forget that multi-path updates are atomic operations (all or nothing), which means that if one of the operations doesn't succeed, all the others will be cancelled.
As for the number of updates, there is no limit: you can add as many paths as you want.
One last warning: make sure all of the paths are correct and the value you're writing is the one you really want to write. Developers (beginners and experts alike) sometimes make mistakes when specifying the paths, and end up deleting the whole database, or a good part of it ends up with data that belongs to another node.
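As a rough sketch of what such a reset script could look like with the Firebase JS SDK (v9 modular API); the paths and the reset value of 0 are made up for illustration:

    import { getDatabase, ref, update } from "firebase/database";

    const db = getDatabase();

    // Hypothetical list of locations that should all be reset in one atomic write.
    const pathsToReset = ["scores/user1", "scores/user2", "scores/user3"];

    async function resetLocations(): Promise<void> {
      // Build a single update object: each key is a path, each value the new data.
      const updates: Record<string, unknown> = {};
      for (const path of pathsToReset) {
        updates[path] = 0; // all of these writes succeed or fail together
      }
      await update(ref(db), updates);
    }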

Related

Firebase Database data consumption optimization (observing node only partly)

In my database, I have Post and User models.
The User model has a lot of information, but when I load the posts, I only need 3 out of roughly 20 properties.
What I am currently doing is just loading the entire node. This is obviously not very efficient.
My question: is it more efficient to observe all 3 values individually (making 3 connections) or just observe the entire node once (making only a single connection)?
I don't know exactly which would be more expensive (higher consumption), as making 3 connections is probably not better than 1.
Kind regards
Firebase always loads complete nodes. While it is possible to get a subset of nodes with queries, that doesn't apply here.
So you will either have to load all nodes and do the subselection client-side, or you'll have to create another higher level node that only contains the three properties that you're interested in.
Which one to choose depends highly on your use case, and is (honestly) largely subjective. The main options:
You can reduce bandwidth a bit by only loading the three properties, but if you store them as a duplicate you'll end up paying for the storage of the duplicated information.
You can also store the three properties separately, without duplicating them. But that means that if you need all properties, you'll have to execute two read operations, which adds some overhead and complicates the code.
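If you go with the duplicate option, a minimal sketch of how the summary node could be kept in sync (Firebase JS SDK v9; the field names displayName, photoUrl and karma are made up):

    import { getDatabase, ref, update } from "firebase/database";

    const db = getDatabase();

    // When the full user profile is written, also write a slimmed-down copy
    // under /userSummaries so post listings only need to load three properties.
    async function saveUser(
      uid: string,
      user: { displayName: string; photoUrl: string; karma: number; [key: string]: unknown }
    ): Promise<void> {
      await update(ref(db), {
        [`users/${uid}`]: user,
        [`userSummaries/${uid}`]: {
          displayName: user.displayName,
          photoUrl: user.photoUrl,
          karma: user.karma,
        },
      });
    }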

How to find which kinds are not being used in Google Datastore

Is there any way to list the kinds that are not being used by our App Engine app in Google's Datastore, without having to look into our code and/or logic? :)
I'm not talking about indexes, which I can list by issuing a
gcloud datastore indexes list
and then compare with the datastore-indexes.xml or index.yaml.
I tried to check datastore kinds statistics and other metadata but I could not find anything useful to help me on this matter.
Should I give up on finding ways for Datastore to provide me useful stats, and instead code something that keeps collecting Datastore statistics (like data size) over a long period, so that I have at least a clue about which kinds are not being used, and only after this research look into our app code to see whether the kind's model was removed?
Example:
select bytes from __Stat_Kind__
Store it somewhere and keep updating it for a period. If the kind's bytes size does not change, then probably the kind is not being used anymore.
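A rough sketch of how that stats query could be collected from Node.js (assuming the @google-cloud/datastore client; you would persist the results somewhere and compare them between runs):

    import { Datastore } from "@google-cloud/datastore";

    const datastore = new Datastore();

    // The built-in __Stat_Kind__ entities expose per-kind statistics
    // such as kind_name, bytes and count.
    async function snapshotKindStats(): Promise<void> {
      const [stats] = await datastore.runQuery(datastore.createQuery("__Stat_Kind__"));
      for (const stat of stats) {
        console.log(stat.kind_name, stat.bytes, stat.count);
      }
    }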
The idea is to do some cleaning in Datastore.
I would like to find which kinds are not being used anymore, maybe for a long time, or which were created manually to be used only once. You know, like a table in Oracle that no one knows what it is used for, and when you look at that table's statistics you see it was only used once, five years ago. I'm trying to achieve the same in Datastore: I want to know which kinds are not being used anymore, or were last used a while ago, then ask around and back them up/delete them if no owner is found.
It's an interesting question.
I think you would be best placed to audit your code and instill an organizational practice that requires this documentation to be maintained in future as a business/technical pre-production requirement.
IIRC, Datastore doesn't automatically timestamp entities, and keys (rightly) aren't incremental. So there appears to be no intrinsic mechanism to track changes, short of taking a snapshot (expensive) and comparing your in-flight and backup copies for changes (also expensive, and inconclusive).
One challenge with identifying a kind that appears to be non-changing is that it could be referenced (rarely) by another kind, so while it does not change, it is still required.
Auditing your code and documenting it for posterity should not only provide you with a definitive answer (and identify owners), but it also pays off a significant technical debt that has been incurred and avoids this problem and the similar (e.g. GDPR-like) requirements that will arise in the future.
Assuming you are referring to records being created/updated, I can think of the following options.
Via the Cloud Console (Datastore > Dashboard): this lists all your kinds and the number of records in each kind. Theoretically, you can take a screenshot and compare the counts, so that you know which kinds have grown and which have not.
Use of Created/LastModified date columns: I usually add these two columns to most of my Datastore tables. If you have them, you can write a query that uses them. For example, sort a kind in descending order of creation (or last-modified) date and pull only the first record; this tells you the last time a record was created or modified in that kind (see the sketch below).
I would write such a function as part of my app, put it behind a page that requires admin privileges (only the app creator can run it), and then just clicking a link in my app would give me the information.
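A minimal sketch of that second option with the Node.js @google-cloud/datastore client, assuming each kind has a lastModified property:

    import { Datastore } from "@google-cloud/datastore";

    const datastore = new Datastore();

    // For a given kind, fetch the single most recently modified entity.
    async function lastModifiedFor(kind: string): Promise<Date | undefined> {
      const query = datastore
        .createQuery(kind)
        .order("lastModified", { descending: true })
        .limit(1);
      const [entities] = await datastore.runQuery(query);
      return entities[0]?.lastModified;
    }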

firebase database equivalent of MySQL transaction

I'm looking for something where I can thread a single object through multiple updates to multiple firebase.database.References (before performing a commit), then commit it at the end, and if the commit is unsuccessful no changes are made to any of my Firebase references.
Does this exist? I thought firebase.database.Transaction would be similar, since it is an atomic update and it does involve a callback which says whether it has been committed or not; but the update function, I believe, only works on a single object, and it doesn't seem to return a transactionId or anything I could pass to other firebase.database.Transactions.
UPDATE
This Transaction's update seems to return a Transaction, which would lend itself to chaining: https://firebase.google.com/docs/reference/js/firebase.firestore.Transaction
However, this is a different Transaction (Firestore, not the Realtime Database one):
Firebase Database transactions perform an update to a single location based on the current value of that same location. They explicitly do not work across multiple locations, since that would limit their scalability. Sometimes developers work around this by performing a transaction higher up in their JSON tree (at the first common point of the locations). I'd recommend against that, as that would limit the scalability even further.
The only way to efficiently update multiple locations with one API call is a multi-location update. This does not, however, have reading of the current value built in.
So if you want to update multiple locations based on their current value, you'll have to perform the read operation in your application code, turn that into a multi-location update, and then use security rules to ensure all of those updates follow your application rules. This is a quite non-trivial approach, so I rarely see it done in practice. See my answer here for an example: Is the way the Firebase database quickstart handles counts secure?
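A rough sketch of that read-then-multi-location-update pattern (Firebase JS SDK v9; the posts/votes structure is made up for illustration, and security rules would still need to validate the combination):

    import { getDatabase, ref, get, update } from "firebase/database";

    const db = getDatabase();

    async function upvotePost(postId: string, uid: string): Promise<void> {
      // Read the current value in application code...
      const countSnap = await get(ref(db, `posts/${postId}/voteCount`));
      const current = countSnap.val() ?? 0;

      // ...then write all affected locations in one atomic multi-location update.
      await update(ref(db), {
        [`posts/${postId}/voteCount`]: current + 1,
        [`votes/${postId}/${uid}`]: true,
      });
    }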

Aggregation on FireStore/CloudDatastore. Use Cloud Functions onCreate/Update?

I want to create an expense tracker and one of the things I want to find out is how much did I spend in each month per category.
How should I do this in FireStore/DataStore?
1. Pull down the required data and do the aggregation locally? Seems very slow.
2. Perform the aggregation every time a transaction is created/updated and save it in a table? But this may result in many invocations of the function, which may be costly.
Is there a better way? It seems like 2 is currently the best option, but I wonder if there's any way I can reduce the costs?
I note that I may not need the aggregated data in real time, so is there a way to debounce the Cloud Function execution? At times I will batch-insert a bunch of transactions. I wonder if there's a way to disable functions for certain queries and manually call them after the batch has finished, for example?
The two approaches you describe are indeed the most common.
The best approach mostly depends on the number of transactions you have. If you have few transactions, then it may be totally fine to do the aggregation on each client. But as you get more transactions, the overhead of downloading the data will become prohibitive and you're more likely to want to keep a running total in the database.
I'd normally recommend keeping the total up to date with every transaction. You can even do that with client-side code, by using transactions (to prevent multiple users from overwriting each other's updates) and server-side security rules (to prevent malicious actors from writing an aggregate that doesn't match its transaction).
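A minimal sketch of that client-side approach with the Firestore JS SDK (v9); the collection and field names (expenses, totals, amount) are made up:

    import { getFirestore, doc, collection, runTransaction, serverTimestamp } from "firebase/firestore";

    const db = getFirestore();

    // Add an expense and update the per-category monthly total in one transaction.
    async function addExpense(category: string, month: string, amount: number): Promise<void> {
      const totalRef = doc(db, "totals", `${month}_${category}`);
      const expenseRef = doc(collection(db, "expenses")); // auto-generated id

      await runTransaction(db, async (tx) => {
        const totalSnap = await tx.get(totalRef);
        const current = totalSnap.exists() ? (totalSnap.data().amount as number) : 0;
        tx.set(expenseRef, { category, month, amount, createdAt: serverTimestamp() });
        tx.set(totalRef, { amount: current + amount });
      });
    }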
If you want to aggregate in batches, you'll want to run code periodically, either in a server you control, or in Cloud Functions.
There is nothing built into Cloud Functions to debounce document writes. You could probably keep a debounce counter in Firestore, but that would mean reading/writing a document on each transaction.
It seems more reasonable to run a function on a timer, as described in this blog post and shown in this video. But you'll need to make sure your data structure in that case allows the code to detect which transactions it needs to aggregate.
One way to do this is to ensure the transactions can be ordered in some way, e.g. by giving them a timestamp, and to have your aggregation code keep track (likely in the database) of the last timestamp it has already aggregated. Then whenever the aggregator runs, it:
reads the current aggregated value
queries the database for transactions that have been added since it last ran
loops over those transactions, updating the aggregated value
writes the aggregated value and the last timestamp back to the database in a transaction (to ensure either both are written, or neither is written)
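A sketch of such a scheduled aggregator, assuming Cloud Functions (v1 scheduled functions) and the Firebase Admin SDK; the expenses collection, createdAt field, and aggregates/summary document are all made-up names:

    import * as functions from "firebase-functions";
    import * as admin from "firebase-admin";

    admin.initializeApp();
    const db = admin.firestore();

    export const aggregateExpenses = functions.pubsub
      .schedule("every 60 minutes")
      .onRun(async () => {
        const summaryRef = db.doc("aggregates/summary");

        await db.runTransaction(async (tx) => {
          // Read the current aggregated value and the last-aggregated checkpoint.
          const summarySnap = await tx.get(summaryRef);
          const lastRun = summarySnap.get("lastTimestamp") ?? new Date(0);
          let total = summarySnap.get("total") ?? 0;

          // Only pick up transactions added since the last aggregation run.
          const newExpenses = await tx.get(
            db.collection("expenses")
              .where("createdAt", ">", lastRun)
              .orderBy("createdAt")
          );

          let newest = lastRun;
          newExpenses.forEach((docSnap) => {
            total += docSnap.get("amount");
            newest = docSnap.get("createdAt");
          });

          // Write the running total and the checkpoint back atomically.
          tx.set(summaryRef, { total, lastTimestamp: newest }, { merge: true });
        });
      });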

Riak: are my 2is broken?

We're seeing some weird behavior with a cleanup cronjob and Riak:
The objects we store (postboxes) have a 2i for the modification date (a Unix timestamp).
There's a cronjob running frequently that deletes all postboxes that have not been modified within 180 days. However, we've found evidence that some (very few) postboxes that were modified in the last three days were deleted by this cronjob.
After reviewing and debugging every line of code several times over, I am confident that this is not a problem with the cronjob.
I also traced back all delete calls to that bucket, and no one else is deleting objects there.
Of course I also checked with Riak, reading the postboxes with r=ALL: they're definitely gone (and they were stored with w=QUORUM).
I also checked the logs: updating the postboxes did succeed (no errors were reported back from the write operations).
This leaves me with two possible causes:
Riak loses data (which I am not willing to believe that easily)
the secondary indexes are corrupt and queries against them return wrong keys
So my questions are:
Can 2is actually break?
Is it possible to verify that?
Am I missing something completely different?
Cheers,
Matthias
Secondary index queries in Riak are coverage queries, which means that they will only use one of the stored replicas, and not perform a quorum read.
As you are writing with w=QUORUM, it is possible that one (or more) of the replicas does not get updated (if you have n_val set to 3 or higher) while the operation is still deemed successful. If that replica is the one selected for the coverage query, you could end up deleting based on the old value. In order to avoid this, you will need to perform updates with w=ALL.
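For illustration only, here is roughly what a write with w=all could look like against Riak's HTTP interface from Node.js; the endpoint shape follows the classic /buckets/.../keys/... API, and the host, port and bucket names are assumptions:

    // Hypothetical host/port and bucket; adjust for your cluster.
    const riak = "http://127.0.0.1:8098";

    async function putPostbox(bucket: string, key: string, body: unknown): Promise<void> {
      // w=all asks Riak to only report success once every replica has acknowledged the write.
      const res = await fetch(`${riak}/buckets/${bucket}/keys/${key}?w=all`, {
        method: "PUT",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(body),
      });
      if (!res.ok) {
        throw new Error(`Riak PUT failed: ${res.status}`);
      }
    }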
