Having consistency during multi path updates when the paths are not deterministic and are variable - firebase

I need help with a scenario involving multi-path updates to fanned-out data. If we compute the set of paths to update and a new path is added before the update is written, the data at the newly added path ends up inconsistent.
For example, below is the data for blog posts. A post can be tagged with multiple terms such as "tag1" and "tag2". To find how many posts carry a specific tag, I can fan out the post data to the tags path as well:
/posts/postid1: {"Title": "Title 1", "body": "About Firebase", "tags": {"tag1": true, "tag2": true}}
/tags/tag1/postid1: {"Title": "Title 1", "body": "About Firebase"}
/tags/tag2/postid1: {"Title": "Title 1", "body": "About Firebase"}
Now consider two concurrent operations:
1a) User1 wants to modify the title of postid1 and builds the following multi-path update:
/posts/postid1/Title: "Title 1 modified"
/tags/tag1/postid1/Title: "Title 1 modified"
/tags/tag2/postid1/Title: "Title 1 modified"
1b) At the same time, User2 wants to add tag3 to postid1 and builds the following multi-path update:
/posts/postid1/tags: {"tag1": true, "tag2": true, "tag3": true}
/tags/tag3/postid1: {"Title": "Title 1", "body": "About Firebase"}
Both updates can succeed one after the other, leaving /tags/tag3/postid1 out of sync because it still holds the old title.
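To make the race concrete, here is a minimal sketch of how a client might assemble update 1a. The helper name buildTitleUpdate is hypothetical; the point is only that the tag list is whatever the client read earlier, so a tag added afterwards is not covered.

```javascript
// Hypothetical helper: builds one multi-path update for a title change.
// `tags` is the tag list the client read earlier, so it may already be stale.
function buildTitleUpdate(postId, tags, newTitle) {
  const update = {};
  update[`/posts/${postId}/Title`] = newTitle;
  for (const tag of tags) {
    update[`/tags/${tag}/${postId}/Title`] = newTitle;
  }
  return update;
}

// User1 read the post while it only had tag1 and tag2.
const user1Update = buildTitleUpdate('postid1', ['tag1', 'tag2'], 'Title 1 modified');
console.log(Object.keys(user1Update));
// With the Realtime Database JS SDK, this object would be passed to
// update() on the root reference so all paths are written together.
```

Nothing in this object touches /tags/tag3/postid1, which is why User2's copy keeps the old title if 1b lands around the same time.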
I can think of security rules to handle this, but I am not sure whether this is correct or will work.
For example, we can add updatedAt and lastUpdated fields and check that we are updating the same version of the post that we read:
"posts": {
  "$postid": {
    ".write": true,
    ".read": true,
    ".validate": "
      newData.hasChildren(['userId', 'updatedAt', 'lastUpdated', 'Title']) && (
        !data.exists() ||
        data.child('updatedAt').val() === newData.child('lastUpdated').val())"
  }
}
For the tags we do not want to repeat that check; instead we can check that /tags/$tag/$postid/updatedAt is the same as /posts/$postid/updatedAt:
"tags": {
  "$tag": {
    "$postid": {
      ".write": true,
      ".read": true,
      ".validate": "
        newData.hasChildren(['userId', 'updatedAt', 'lastUpdated', 'Title']) && (
          newData.child('updatedAt').val() === root.child('posts').child($postid).child('updatedAt').val())"
    }
  }
}
With this, "/posts/$postid" has concurrency control built in, and users can only overwrite the version they read.
Also, "/posts/$postid" becomes the source of truth, and the other fan-out paths check that their updatedAt field matches the one at that primary path.
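The check I have in mind for /posts/$postid can be mirrored as a plain function, which makes it easier to reason about. This is only a sketch of the rule's logic under my own assumptions (the function name and the sample values are made up); it says nothing about how Firebase evaluates rules across several paths.

```javascript
// Mirrors the ".validate" rule for /posts/$postid as a plain function.
// `existing` is the stored post (or null); `incoming` is the proposed write.
function passesVersionCheck(existing, incoming) {
  const required = ['userId', 'updatedAt', 'lastUpdated', 'Title'];
  const hasAll = required.every(
    (key) => incoming[key] !== undefined && incoming[key] !== null
  );
  if (!hasAll) return false;
  // New posts pass; otherwise the writer must have read the current version.
  return existing === null || existing.updatedAt === incoming.lastUpdated;
}

const stored = { userId: 'u1', Title: 'Title 1', updatedAt: 100, lastUpdated: 90 };
const fresh = { userId: 'u1', Title: 'Title 1 modified', updatedAt: 200, lastUpdated: 100 };
const stale = { userId: 'u2', Title: 'Other title', updatedAt: 210, lastUpdated: 90 };
console.log(passesVersionCheck(stored, fresh)); // true
console.log(passesVersionCheck(stored, stale)); // false
```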
Will this bring consistency, or are there still problems? Could it hurt performance at scale?
Are multi-path updates and rules atomic together? By that I mean: for multi-path updates like 1a and 1b above, are the rules evaluated together, or is each rule evaluated separately in isolation?

Unfortunately, Firebase does not provide any guarantees, or mechanisms, to provide the level of determinism you're looking for. I have had the best luck front-ending such updates with an API stack (GCF and Lambda are both very easy, server-less methods of doing this). The updates can be made in that layer, and even serialized if absolutely necessary. But there isn't a safe way to do this in Firebase itself.
There are numerous "hack" options you could apply. You could, for example, have a simple lock mechanism using a dedicated collection for tracking write locks. Clients could post to a lock collection, then verify that their key was the only member of that collection, before performing a write. But I hope you'll agree with me that such cooperative systems have too many potential edge cases, potential security issues, and so on. In Firebase, it is best to design such that this component is not a requirement in the first place.
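As a rough illustration of the API-layer approach mentioned above, the sketch below uses a plain in-memory object in place of the database and rebuilds every fan-out copy from the post itself on each change. All names here (store, syncFanOut, changeTitle, addTag) are hypothetical, and a real backend would read and write through the Admin SDK and would still need a queue or transaction so that requests cannot interleave across asynchronous calls.

```javascript
// In-memory stand-in for the database; all names here are illustrative.
const store = {
  posts: {
    postid1: { Title: 'Title 1', body: 'About Firebase', tags: { tag1: true, tag2: true } },
  },
  tags: {
    tag1: { postid1: { Title: 'Title 1', body: 'About Firebase' } },
    tag2: { postid1: { Title: 'Title 1', body: 'About Firebase' } },
  },
};

// Rewrites every fan-out copy from the post itself (the source of truth).
function syncFanOut(db, postId) {
  const { tags, ...copy } = db.posts[postId];
  for (const tag of Object.keys(tags)) {
    db.tags[tag] = db.tags[tag] || {};
    db.tags[tag][postId] = { ...copy };
  }
}

function changeTitle(db, postId, title) {
  db.posts[postId].Title = title;
  syncFanOut(db, postId);
}

function addTag(db, postId, tag) {
  db.posts[postId].tags[tag] = true;
  syncFanOut(db, postId);
}

// Each request runs to completion before the next, so either order is consistent.
addTag(store, 'postid1', 'tag3');
changeTitle(store, 'postid1', 'Title 1 modified');
console.log(store.tags.tag3.postid1.Title);
```

The design choice is that clients never compute the list of paths themselves; the single writer derives it from the current state of the post at the moment of the write.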

Related

Is there a way with Hasura to do a mutation based on the result of a query, within the same GraphQL call (Hasura transaction)?

I tried to search for an example but, I presume it's not doable. I am looking to hopefully be proven wrong or to find an official confirmation that it's not doable.
Before using Hasura, I was doing transactional SQL queries that ensured that data was kept consistent.
For example, I would like to create a password reset token when a user requests it, but only if the user can be found by email address. Right now, I have to do 2 queries:
1) Try to find a user with the specified email address
2) Insert and assign the token to this user id
In that case, it's not too bad, but now if I want to consume that token, I have to do 3 queries:
1) Find the valid token
2) Change the password of the user associated with that token
3) Delete the token
Obviously, if something goes wrong and the token is not deleted, this could be an issue - so I would be curious to see if there would be ways to merge these queries/mutations into transactions.
Sounds like supporting nested updates would solve this problem for you with the least amount of effort. We are working on an RFC for the feature and hope to start development soon. Please follow this GitHub issue in our community for future updates.
https://github.com/hasura/graphql-engine/issues/1573
This comment outlines the current scope of the proposed feature. The RFC will provide a more complete explanation.
https://github.com/hasura/graphql-engine/issues/1573#issuecomment-1338057350
You can apply changes to rows that you filter by certain criteria. Here is a sample mutation:
mutation PasswordUpdate($id: uuid!, $token: String!, $new_password: String!) {
  update_user(
    where: {id: {_eq: $id}, token: {_eq: $token}}
    _set: {token: null, password: $new_password}
  ) {
    affected_rows
  }
}
That query deletes the token and sets a password for all users (hopefully just one) that have the token assigned.
After some research here is what I found:
For the first example:
1) Try to find a user with the specified email address
2) Insert and assign the token to this user id
There are no solutions for this today and as answered by #damel, there is an ongoing RFC to support nested mutations: https://github.com/hasura/graphql-engine/issues/1573#issuecomment-1338057350
Hopefully, this feature will be out soon, but in the meantime, for most cases, it's not such a big deal to have multiple queries as it is possible to catch errors on the first query.
For the second example:
1) Find the valid token
2) Change the password of the user associated with that token
3) Delete the token
When sending multiple mutations in the same query, Hasura treats them as a transaction as announced in 2020.
Of course, it would be nice to do this in the same query (similar to the first example) but since there is a transaction on the mutation, for this case it's still not a problem.
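As a sketch of what that looks like in practice, the request below puts the password change and the token deletion into one mutation document, so they run in the same transaction. The table and column names (user, password_reset_token, token, user_id) and the helper name are assumptions for illustration, and the user id is expected to come from the earlier "find the valid token" query.

```javascript
// Table and column names below are assumptions; adjust them to the real schema.
const CONSUME_TOKEN = `
  mutation ConsumeToken($user_id: uuid!, $token: String!, $new_password: String!) {
    update_user(where: {id: {_eq: $user_id}}, _set: {password: $new_password}) {
      affected_rows
    }
    delete_password_reset_token(
      where: {token: {_eq: $token}, user_id: {_eq: $user_id}}
    ) {
      affected_rows
    }
  }
`;

// Hypothetical helper: builds the JSON body to POST to the /v1/graphql endpoint.
function buildConsumeTokenRequest(userId, token, newPassword) {
  return {
    query: CONSUME_TOKEN,
    variables: { user_id: userId, token: token, new_password: newPassword },
  };
}

const body = buildConsumeTokenRequest('user-id-from-step-1', 'abc123', 'hashed-password');
console.log(JSON.stringify(body.variables));
```

If either mutation fails, the other is rolled back, so the token cannot survive a successful password change.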
I am sure there are probably cases where this can become a problem but I am not exposed to them right now. Nevertheless, it would be great if the RFC makes it to production, giving more options to Hasura users.

Request.auth.metadata in security rules?

I have a Firebase project where I'd like for users to be able to see when other users created their profiles. My initial hope was that I could use "user.metadata.creationTime" on the frontend to pass the date into the user's extra info document and verify that it is correct by having "request.resource.data.datecreated == request.auth.metadata.creationTime" as a Database Rule, but it looks like it is not possible according to the documentation.
Is there any way I can verify that the creation date is correct on the backend?
More info edit: Below is the code that is being triggered when a user creates a new account on my profile. The three values are displayed publicly. I'm creating a niche gear for sale page so being able to see when a user first created their account could be helpful when deciding if a seller is sketchy. I don't want someone to be able to make it seem like they have been around for longer than they have been.
db.collection('users').doc(user.uid).set({
  username: "Username-156135",
  bio: "Add a bio",
  created: user.metadata.creationTime
});
Firestore rules:
match /users/{id} {
  allow get;
  allow create, update: if request.resource.data.username is string &&
    request.resource.data.bio is string &&
    request.resource.data.created == request.auth.metadata.creationTime;
}
user.metadata.creationTime, according to the API documentation, is a string with no documented format. I suggest not using it. In fact, what you're trying to do seems impossible, since that value isn't listed in the API documentation for request.auth.
What I suggest you do instead is use a Firebase Auth onCreate trigger with Cloud Functions to automatically create that document with the current time as a proper timestamp. Then, in security rules, I wouldn't even give the user the ability to change that field, so you can be sure it was only ever set accurately by the trigger. You might be interested in this solution overall.
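A minimal sketch of that trigger might look like the following. The helper buildProfile is hypothetical and kept free of SDK imports so the document shape can be checked on its own; the wiring in the trailing comment assumes the firebase-functions (v1 API) and firebase-admin packages.

```javascript
// Hypothetical helper: the profile document written when an account is created.
// `createdAt` is supplied by the server, never by the client.
function buildProfile(createdAt) {
  return {
    username: 'Username-156135', // placeholder default, as in the question
    bio: 'Add a bio',
    created: createdAt,
  };
}

console.log(buildProfile(new Date(0)).created.toISOString());

// Wiring sketch, assuming firebase-functions (v1) and firebase-admin:
//
//   exports.createProfile = functions.auth.user().onCreate((user) =>
//     admin.firestore().collection('users').doc(user.uid)
//       .set(buildProfile(admin.firestore.FieldValue.serverTimestamp())));
```

With the document created server-side, the rules for /users/{id} can simply refuse any client write that changes the created field.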

Tracking if a User 'likes' a post

This is more of a theoretical question about how the database should be set up, and less about programming.
Let's say I have a news feed full of cards, each of which contains a message and a like count. Each user is able to like a message. I want to show a user whether they have already liked a particular card (the same way you can see the posts you liked on Facebook, even if you come back days later).
How would you implement that with a Firestore-type database? Speed is definitely a concern.
Storing it locally isn't an option. My guess is that each card object would have to reference a collection that just keeps a list of the people who liked it. The only problem is that this means a lot more querying, which feels like it would be slow.
Is there a better way to do this?
TL;DR
This approach requires more to set up, i.e. a cron service, plus knowledge of Firestore Security Rules and Cloud Functions for Firebase. With that said, the following is the best approach I've come up with. Please note that only the required pseudo-rules are shown.
Firestore structure with some rules
/*
  allow read
  allow update if auth.uid == admin_uid and the
    admin is updating total_likes ...
*/
messages/{message_key} : {
  total_likes: <int>,
  other_field:
  [,...]
}
/*
  allow read
  allow write if newData == {updated: true} and
    docId exists under /messages
*/
messages_updated/{message_key} : {
  updated: true
}
/*
  allow read
  allow create if auth.uid == liker_uid && !counted && !delete and
    liker_uid/message_key match those in the docId...
  allow update if auth.uid == admin_uid && the admin is
    toggling counted from false -> true ...
  allow update if auth.uid == liker_uid && the liker is
    toggling delete ...
  allow delete if auth.uid == admin_uid && delete == true and
    counted == true
*/
likes/{liker_uid + '#' + message_key} : {
  liker_uid:,
  message_key:,
  counted: <bool>,
  delete: <bool>,
  other_field:
  [,...]
}
count_likes/{request_id}: {
  message_key:,
  request_time: <timestamp>
}
Functions
Function A
Triggered every X minutes to count message likes for potentially all messages.
1) Query /messages_updated for BATCH_SIZE docs.
2) For each, set its docId to true in a local object.
3) Go to step 1 if BATCH_SIZE docs were retrieved (there's more to read in).
4) For each message_key in the local object, add to /count_likes a doc with fields request_time and message_key.
Function B
Triggered onCreate of count_likes/{request_id}.
1) Delete the created doc's message_key from /messages_updated.
2) Let delta_likes = 0.
3) Query /likes for docs where message_key == the created doc's message_key and where counted == false.
4) For each, try to update counted to true (in parallel, not atomically). If successful, increment delta_likes by 1.
5) Query /likes for docs where message_key == the created doc's message_key, where delete == true and where counted == true.
6) For each doc, try to delete it (in parallel, not atomically). If successful, decrement delta_likes by 1.
7) If delta_likes != 0, transact the total likes for this message under /messages by delta_likes.
8) Delete this doc from /count_likes.
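The counting pass of Function B can be sketched in memory as follows. This is not Firestore code: the array stands in for the /likes docs of a single message, the function name countLikes is made up, and a real function would query and update each doc individually as described above.

```javascript
// In-memory sketch of Function B's counting pass for one message.
// `likes` stands in for the /likes docs of that message.
function countLikes(likes) {
  let delta = 0;
  // Pass 1: mark uncounted likes as counted and add them to the delta.
  for (const like of likes) {
    if (!like.counted) {
      like.counted = true;
      delta += 1;
    }
  }
  // Pass 2: remove counted likes that are marked for deletion.
  const remaining = likes.filter((like) => !(like.delete && like.counted));
  delta -= likes.length - remaining.length;
  return { delta, remaining };
}

const likes = [
  { liker_uid: 'a', counted: false, delete: false }, // new like: +1
  { liker_uid: 'b', counted: true, delete: true },   // counted earlier, now unliked: -1
  { liker_uid: 'c', counted: false, delete: true },  // liked and unliked before counting: 0
];
const result = countLikes(likes);
console.log(result.delta, result.remaining.length); // 0 1
```

Running the pass again over the remaining docs yields a delta of 0, which is what makes a repeated invocation for the same event harmless.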
Function C (optional)
Triggered every Y minutes to delete /count_likes requests that were never met.
1) Query docs under /count_likes that have request_time older than Z.
2) For each doc, delete it.
On the client
to see if you liked a message, query under /likes for a doc where liker_uid equals your uid, where message_key equals the message's key and where delete == false. if a doc exists, you have liked it.
to like a message, batch.set a like under /likes and batch.set a /messages_updated. if this batch fails, try a batch_two.update on the like by updating its delete field to false and batch_two.set its /messages_updated.
to unlike a message, batch.update on the like by updating its delete field to true and batch.set its /messages_updated.
Pros of this approach
this can be extended to counters for other things, not just messages.
a user can see if they've liked something.
a user can only like something once.
a user can spam toggle a like button and this still works.
any user can see who's liked what message by querying /likes by message_key.
any user can see all the messages any user has liked by querying /likes by liker_uid.
only a cloud function admin updates your like counts.
if a function is fired multiple times for the same event, this function is safe, meaning like counts will not be incremented multiple times for the same like.
if a function is not fired for some event, this approach still works. It just means that the count will not update until the next time someone else likes the same message.
likes are denormalized to only one root level collection, instead of the two that would be required if you had the like under the message's likes subcollection and under the liker's messages_liked subcollection.
like counts for each message are updated in batches, i.e. if something has been liked 100 times, only 1 transaction of 100 is required, not 100 transactions of 1. this significantly reduces write rate conflicts caused by like counter transactions.
Cons of this approach
Counts are only updated however often your cron job fires.
Relies on a cron service to fire and in general there's just more to set up.
Requires the function to authenticate with limited privileges to perform secure writes under /likes. In the Realtime Database this is possible. In Firestore, it's possible, but a bit hacky. If you can wait and don't want to use the hacky approach, use the regular unrestricted admin in development until Firestore supports authenticating with limited privileges.
May be costly depending on your standpoint. There are function invocations and read/write counts you should think about.
Things to consider
When you transact the count in Function B, you may want to consider trying this multiple times in case the max write rate of 1/sec is exceeded and the transaction fails.
In Function B, you may want to implement batch reading like in Function A if you expect to be counting a lot of likes per message.
If you need to update anything else periodically for the message (in another cron job), you may want to consider merging that function into Function B so the write rate of 1/sec isn't exceeded.

Firebase database rules – `data.exists()` always seems to be true, possible bug?

I am trying to secure my firebase database to allow the creation of new records, but not allow the deletion of existing records. Ultimately, I plan to utilise Firebase authentication in my app as well, and allow users to update existing records if they are the author, but I am trying to get the simple case working first.
However! No matter what I try in the database rules simulator, despite what the documentation seems to suggest, the value of data.exists() seems to always be true. From what I can understand from the documentation, the variable data represents a record in the database as it was before an operation took place. That is to say, for creates, data would not exist, and for updates/deletes, data would refer to a real record that exists in the database. This does not seem to be the case, to the point where I am actually suspecting a bug in Firebase, as when setting the following rules on my database, all write operations are disallowed:
{
  "rules": {
    ".read": true,
    ".write": "!data.exists()"
  }
}
No matter what values I put into the simulator, be it Location or Data. I have even written a small EmberJS app to verify if the Simulator is telling the truth and it too, is denied permission for all write operations.
I really have no idea where to go from here as I am pretty much out of things to try. I tried deleting all records from my database, which lets the simulator think it can perform write operations, but my test app still gets PERMISSION_DENIED, so I don't know what's causing inconsistencies there.
Is my understanding of the predefined data variable correct? If so, why can't I write the rules I want? I have seen snippets literally trying to achieve my "create only, no-delete" rule that seem to line up with my understanding.
Last note: I am trying this in a totally new Firebase project with JUST the rules above, and only a few records of junk data laying around my database.
Because you have placed the !data.exists() at the root location of your database, data refers to the entire database. You will only be able to write to the database when it is completely empty.
You indicate that you run your tests with "only a few records of junk data laying around my database". Those records will cause data.exists() to be true.
You can achieve your goal by placing the !data.exists() rule in your tree at the specific location where you want to require that no data already exists. This is typically done at a location with a wildcard key, as in the example you linked:
{
  "rules": {
    // default rules are false if not specified
    "posts": {
      ".read": true, // everyone can read all posts
      "$postId": {
        // a new post can be created if it does not exist
        // existing posts can only be edited by their original "author"
        ".write": "!data.exists() && newData.exists() || data.child('author').val() == auth.uid",
        ".validate": "newData.hasChildren(['title', 'author', 'timestamp'])"
      }
    }
  }
}
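To see why the placement matters, here is a small stand-in for data.exists() over a plain object that plays the role of the database. The function existsAt and the sample data are made up for illustration; the real evaluation happens inside Firebase.

```javascript
// Minimal stand-in for `data.exists()`: looks up the value at a path in a
// plain object that plays the role of the database.
function existsAt(db, path) {
  let node = db;
  for (const key of path.split('/').filter(Boolean)) {
    if (node === null || typeof node !== 'object' || !(key in node)) return false;
    node = node[key];
  }
  if (node === null || node === undefined) return false;
  // An empty object holds no data, which the database treats as absent.
  return typeof node !== 'object' || Object.keys(node).length > 0;
}

const db = { junk: { a: 1 } };
console.log(!existsAt(db, '/'));            // false: rule at the root, database not empty
console.log(!existsAt(db, '/posts/post1')); // true: rule at /posts/$postId, nothing there yet
```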

How to create time-expiring data with Firebase Rules?

This talk mentions time-expiring data using Firebase rules at 22:55
https://www.youtube.com/watch?v=PUBnlbjZFAI
How can one do this?
I couldn't find any information about it.
I recommend two solutions.
1) Use cloud functions to record a message path and the date it was posted. Then every hour sort that list by date, pick all the expired ones, and create a deep update object to null out every expired message. Nowadays you can use Cron Scheduler to handle the periodic flush.
2) Make a rule that says anyone can delete expired messages and make it so that clients automatically delete expired messages when they are in a chat room.
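For the first option, the deep update object could be built along these lines. The shape of the index (message path mapped to posting time in milliseconds) and the function name are assumptions about what the Cloud Function recorded, not a fixed format.

```javascript
// `index` maps a message path to the time it was posted (ms since epoch).
// This shape is an assumption about what the Cloud Function recorded.
function buildExpiryUpdate(index, now, maxAgeMs) {
  const update = {};
  for (const [path, postedAt] of Object.entries(index)) {
    if (now - postedAt > maxAgeMs) {
      update[path] = null; // writing null removes the node
    }
  }
  return update;
}

const index = {
  '/messages/roomA/m1': 1000,
  '/messages/roomA/m2': 900000,
};
console.log(buildExpiryUpdate(index, 1000000, 600000));
// The result would then be passed to update() on the root reference.
```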
Written here: https://firebase.google.com/docs/database/security/securing-data
You can't have it auto-delete your data, but you can make the data unreadable (which is the same thing from the user's standpoint). Just send a timestamp child field with your data and check against it.
{
  "rules": {
    "messages": {
      "$message": {
        // only messages from the last ten minutes can be read
        ".read": "data.child('timestamp').val() > (now - 600000)",
        // new messages must have a string content and a number timestamp
        ".validate": "newData.hasChildren(['content', 'timestamp']) && newData.child('content').isString() && newData.child('timestamp').isNumber()"
      }
    }
  }
}
Same question here.
You can't do it using firebase rules. You should either have a NodeJS backend removing your old data or clients doing it for you. For example, before a client retrieves data, he could remove old data.
