Maximum write rate to a document on Firestore - firebase

I am using Firestore to figure out, in real-time, each user's share of the cost of an item. Example:
/tickets/100/ticket-item/1:
{
  name: 'Red Dead Redemption',
  price: '5000',
  payers (array of maps): [
    {
      name: 'John',
      share: '1666'
    },
    {
      name: 'Jane',
      share: '1667'
    },
    {
      name: 'Jack',
      share: '1667'
    }
  ]
}
Given that the max write rate to a document is 1/second, will the write always fail if two users add themselves to the same ticket item doc at the exact same time?
I know that this can be mitigated to an extent by using transactions, but a transaction will only re-execute a finite number of times. Let's say it re-executes up to 5 times. If 6 users write to the same ticket item doc at the exact same time, can I expect one of these writes to fail?
I would appreciate any and all advice regarding how to handle this.

will the write always fail if two users add themselves to the same ticket item doc at the exact same time?
Yes, it will. So if you are sure you'll have situations in which two or even more users will try to write/update data in a single document at the exact same time, I recommend being careful about this limitation, because you might start to see some of these write operations fail.
I know that this can be mitigated to an extent by using transactions
It's a good idea but please be aware that transactions will fail when the client is offline.
If 6 users write to the same ticket item doc at the exact same time, can I expect one of these writes to fail?
As the docs state, a transaction will only re-execute a finite number of times. But please also note that in case of a transaction failure:
A failed transaction returns an error and does not write anything to the database.
So all you have to do is take some action in case of a transaction failure.
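For illustration, here is a minimal retry sketch (JavaScript, Firestore Web SDK with the namespaced API; the document path mirrors the question, and splitShares is a hypothetical helper that recomputes each payer's share):
// Sketch: add a payer inside a transaction, and retry the whole
// transaction a few extra times if it keeps failing under contention.
// `splitShares` is a hypothetical helper, not part of any SDK.
async function addPayer(db, name, maxAttempts = 3) {
  const ref = db.collection('tickets').doc('100')
    .collection('ticket-item').doc('1');
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await db.runTransaction(async (tx) => {
        const snap = await tx.get(ref);
        const item = snap.data();
        const payers = [...item.payers, { name: name, share: '0' }];
        tx.update(ref, { payers: splitShares(item.price, payers) });
      });
      return; // success
    } catch (err) {
      if (attempt === maxAttempts) throw err; // give up and surface the error
    }
  }
}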

I'm researching the same problem.
Maybe this is a solution: move "payers" into a separate collection with a ticket_id field?
That way you'll have no per-document limitations, since each payer writes to their own document.
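A minimal sketch of that model (JavaScript, Firestore Web SDK; collection and field names are assumptions), where each payer is its own document so concurrent users never contend on a single doc:
// Sketch: payers live in their own top-level collection, linked back
// to the ticket item by a ticket_item_id field.
async function addPayerDoc(db, ticketItemId, name, share) {
  await db.collection('payers').add({
    ticket_item_id: ticketItemId,
    name: name,
    share: share,
  });
}
// Reading all payers for a ticket item is then a simple query.
function payersFor(db, ticketItemId) {
  return db.collection('payers')
    .where('ticket_item_id', '==', ticketItemId)
    .get();
}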


Is there a way with Hasura to do a mutation based on the result of a query, within the same GraphQL call (Hasura transaction)?

I tried to search for an example, but I presume it's not doable. I am hoping to be proven wrong, or to find official confirmation that it's not doable.
Before using Hasura, I was doing transactional SQL queries that ensured that data was kept consistent.
For example, I would like to create a password reset token if a user requests it, only if the user can be found using an email address. Right now, I have to do 2 queries:
Try to find a user with the specified email address
Insert and assign the token to this user id
In that case, it's not too bad, but now if I want to consume that token, I have to do 3 queries:
Find the valid token
Change the password to the user associated with that token
Delete the token
Obviously, if something goes wrong and the token is not deleted, this could be an issue - so I would be curious to see if there would be ways to merge these queries/mutations into transactions.
Sounds like supporting nested updates would solve this problem for you with the least amount of effort. We are working on an RFC for the feature and hope to start development soon. Please follow this GitHub issue on our community for future updates.
https://github.com/hasura/graphql-engine/issues/1573
This comment outlines the current scope of the proposed feature. The RFC will provide a more complete explanation.
https://github.com/hasura/graphql-engine/issues/1573#issuecomment-1338057350
You can apply changes to rows that you filter by certain criteria. Here is a sample mutation:
mutation PasswordUpdate($id: uuid!, $token: String!, $new_password: String!) {
  update_user(
    where: {id: {_eq: $id}, token: {_eq: $token}}
    _set: {token: null, password: $new_password}
  ) {
    affected_rows
  }
}
That mutation clears the token and sets a new password for all users (hopefully just one) that have the token assigned.
After some research here is what I found:
For the first example:
Try to find a user with the specified email address
Insert and assign the token to this user id
There is no solution for this today; as answered by #damel, there is an ongoing RFC to support nested mutations: https://github.com/hasura/graphql-engine/issues/1573#issuecomment-1338057350
Hopefully this feature will be out soon, but in the meantime, for most cases it's not such a big deal to have multiple queries, as it is possible to catch errors on the first query.
For the second example:
Find the valid token
Change the password to the user associated with that token
Delete the token
When sending multiple mutations in the same request, Hasura treats them as a transaction, as announced in 2020.
Of course, it would be nice to do this in the same query (similar to the first example) but since there is a transaction on the mutation, for this case it's still not a problem.
I am sure there are probably cases where this can become a problem but I am not exposed to them right now. Nevertheless, it would be great if the RFC makes it to production, giving more options to Hasura users.
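For illustration, here is a hedged sketch of that transactional behavior (JavaScript; the endpoint URL, table, and column names are assumptions): both top-level mutation fields go in one request and execute in order inside a single database transaction, so the token is never deleted unless the password update also commits.
// Sketch: two top-level mutation fields in one request run inside a
// single Hasura transaction. Names below are assumptions.
async function consumeResetToken(userId, token, newPassword) {
  const query = `
    mutation ConsumeToken($userId: uuid!, $token: String!, $password: String!) {
      update_user(
        where: { id: { _eq: $userId } }
        _set: { password: $password }
      ) { affected_rows }
      delete_password_reset_token(
        where: { token: { _eq: $token } }
      ) { affected_rows }
    }`;
  const res = await fetch('https://my-hasura.example.com/v1/graphql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      query: query,
      variables: { userId: userId, token: token, password: newPassword },
    }),
  });
  // If either mutation errors, Hasura rolls back the whole transaction.
  return res.json();
}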

Firestore: Better understand how security rules work when fetching other document

I'm really digging Firestore, but it's hard to find answers to specific questions, so here I am. This is just to be sure I understood properly how security rules work :)
Here's my schema:
/databases/{database}/documents/Bases/Base1 {
roles: { // map
user1: {admin: true}
},
Items: { // SubCollection
item1: {
name: "Hello World"
},
...n,
item10: {
name: "Good Bye World"
}
}
}
I want my user1 to fetch all 10 items in Base1. The query is pretty simple: db.collection('Bases').doc('Base1').collection('Items').get()
But I also want to be sure that user1 is an admin in Base1. So I'm setting these security rules:
match /bases/{baseId}/items/{itemId} {
  allow read: if request.auth != null
    && get(/databases/$(database)/documents/bases/$(baseId)).data.roles[request.auth.uid].admin == true
}
Which works, all good. Here are the questions:
1/ I understand this rule's get() will cost me one read (which is very cheap, I know). Is it one read per query OR one read per document that needs to be validated, i.e. 10 reads in my case?
2/ I assume the answer to 1/ is that it'll cost 10 reads. But since I'm always querying the same $(baseId), the cache will kick in, and even though it's not guaranteed, it should drastically reduce the number of charged reads (theoretically 1 read even if I'm fetching 1000 docs)?
Any other advice on how to handle these kinds of schemas is welcome. I know read ops are very cheap, but I like to understand where I'm going :)
Thanks SO :)
The cost of a get() in security rules only applies once per query. It does not apply per document fetched.
Since you have one query, it will cost one read.

Tracking if a User 'likes' a post

This is more of a theoretical question about how the database should be set up, and less about programming.
Let's say I have a news feed full of cards, each containing a message and a like count. Each user is able to like a message, and I want to show a user whether they have already liked a particular card. (The same way you can see the posts you've liked on Facebook, even if you come back days later.)
How would you implement that with this Firestore-type database? Speed is definitely a concern.
Storing it locally isn't an option. My guess would be that on each card object you would have to reference a collection that just kept a list of the people who liked it. The only thing is that this is a lot more querying, which feels like it would be slow.
is there a better way to do this?
TL;DR
This approach requires more to set up, i.e. a cron service, knowledge of Firestore Security Rules, and Cloud Functions for Firebase. With that said, the following is the best approach I've come up with. Please note, only the pseudo-rules that are required are shown.
Firestore structure with some rules
/*
allow read
allow update if auth.uid == admin_uid and the
admin is updating total_likes ...
*/
messages/{message_key} : {
total_likes: <int>,
other_field:
[,...]
}
/*allow read
allow write if newData == {updated: true} and
docId exists under /messages
*/
messages_updated/{message_key} : {
updated: true
}
/*
allow read
allow create if auth.uid == liker_uid && !counted && !delete and
liker_uid/message_key match those in the docId...
allow update if auth.uid == admin_uid && the admin is
toggling counted from false -> true ...
allow update if auth.uid == liker_uid && the liker is
toggling delete ...
allow delete if auth.uid == admin_uid && delete == true and
counted == true
*/
likes/{liker_uid + '#' + message_key} : {
liker_uid:,
message_key:,
counted: <bool>,
delete: <bool>,
other_field:
[,...]
}
count_likes/{request_id}: {
message_key:,
request_time: <timestamp>
}
Functions
Function A
Triggered every X minutes to count message likes for potentially all messages.
query /messages_updated for BATCH_SIZE docs
for each, set its docId to true in a local object.
go to step 1 if BATCH_SIZE docs were retrieved (there's more to read in)
for each message_key in local object, add to /count_likes a doc w/ fields request_time and message_key.
Function B
Triggered onCreate of count_likes/{request_id}
Delete the created doc's message_key from /messages_updated.
let delta_likes = 0
query /likes for docs where message_key == the created doc's message_key and where counted == false.
for each, try to update counted to true (in parallel, not atomically)
if successful, increment delta_likes by 1.
query /likes for docs where message_key == the created doc's message_key, where delete == true and where counted == true.
for each doc, try to delete it (in parallel, not atomically)
if successful, decrement delta_likes by 1
if delta_likes != 0, transact the total likes for this message under
/messages by delta_likes.
delete this doc from /count_likes.
Function C (optional)
Triggered every Y minutes to delete /count_likes requests that were never met.
query docs under /count_likes that have request_time older than Z.
for each doc, delete it.
On the client
to see if you liked a message, query under /likes for a doc where liker_uid equals your uid, where message_key equals the message's key and where delete == false. if a doc exists, you have liked it.
to like a message, batch.set a like under /likes and batch.set its /messages_updated doc. if this batch fails, try a batch_two.update on the like by updating its delete field to false and batch_two.set its /messages_updated doc (see the sketch after this list).
to unlike a message, batch.update on the like by updating its delete field to true and batch.set its /messages_updated.
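A minimal sketch of the like flow above (JavaScript, Firestore Web SDK with the namespaced API; assumes the structure and rules sketched earlier):
// Sketch: like a message with a batch; if the batch fails because the
// like doc already exists (rules reject re-creating it), fall back to
// flipping its `delete` flag to false, per the steps above.
function likeMessage(db, uid, messageKey) {
  const likeRef = db.collection('likes').doc(uid + '#' + messageKey);
  const updatedRef = db.collection('messages_updated').doc(messageKey);
  const batch = db.batch();
  batch.set(likeRef, {
    liker_uid: uid,
    message_key: messageKey,
    counted: false,
    delete: false,
  });
  batch.set(updatedRef, { updated: true });
  return batch.commit().catch(() => {
    const batchTwo = db.batch();
    batchTwo.update(likeRef, { delete: false });
    batchTwo.set(updatedRef, { updated: true });
    return batchTwo.commit();
  });
}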
Pros of this approach
this can be extended to counters for other things, not just messages.
a user can see if they've liked something.
a user can only like something once.
a user can spam toggle a like button and this still works.
any user can see who's liked what message by querying /likes by message_key.
any user can see all the messages any user has liked by querying /likes by liker_uid.
only a cloud function admin updates your like counts.
if a function is fired multiple times for the same event, this function is safe, meaning like counts will not be incremented multiple times for the same like.
if a function is not fired for some event, this approach still works. It just means that the count will not update until the next time someone else likes the same message.
likes are denormalized to only one root-level collection, instead of the two that would be required if you had the like under the message's likes subcollection and under the liker's messages_liked subcollection.
like counts for each message are updated in batches, ie if something has been liked 100 times, only 1 transaction of 100 is required, not 100 transactions of 1. this reduces write rate conflicts due to like counter transactions significantly.
Cons of this approach
Counts are only updated as often as your cron job fires.
Relies on a cron service to fire and in general there's just more to set up.
Requires the function to authenticate with limited privileges to perform secure writes under /likes. In the Realtime Database this is possible. In Firestore, it's possible, but a bit hacky. If you can wait and don't want to use the hacky approach, use the regular unrestricted admin in development until Firestore supports authenticating with limited privileges.
May be costly depending on your standpoint. There are function invocations and read/write counts you should think about.
Things to consider
When you transact the count in Function B, you may want to consider trying this multiple times in case the max write rate of 1/sec is exceeded and the transaction fails (see the sketch after this list).
In Function B, you may want to implement batch reading like in Function A if you expect to be counting a lot of likes per message.
If you need to update anything else periodically for the message (in another cron job), you may want to consider merging that function into Function B so the write rate of 1/sec isn't exceeded.
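Regarding the first consideration, here is a minimal sketch (JavaScript, Admin SDK inside a Cloud Function; field names follow the structure above) of transacting delta_likes with a few extra attempts:
// Sketch: apply delta_likes to the message's total_likes in a
// transaction. runTransaction already retries on contention; the outer
// loop adds a few more attempts if the 1 write/sec limit keeps biting.
const admin = require('firebase-admin');
async function applyDelta(messageKey, deltaLikes, maxAttempts = 3) {
  const db = admin.firestore();
  const ref = db.collection('messages').doc(messageKey);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await db.runTransaction(async (tx) => {
        const snap = await tx.get(ref);
        const total = snap.get('total_likes') || 0;
        tx.update(ref, { total_likes: total + deltaLikes });
      });
      return; // success
    } catch (err) {
      if (attempt === maxAttempts) throw err;
    }
  }
}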

Having quite a lot of issues when write massive record to firebase database

My last question led me to decide to write a huge amount of data to the Firebase database for testing purposes.
Here is the outcome:
1,000 records: Nothing significant happened; it worked fine.
10,000 records: Responses from all other Firebase read operations returned only after the write operation completed.
100,000 records: Same as with 10,000 records but took much longer, and I couldn't perform any Firebase operation unless I force-closed the app and reopened it. The screen hung for some time, probably because I ran the loop on the main thread (iOS).
1 million records: I was afraid to, so I never tried.
The reason I need to write such an amount of data is that I'm building a social app (Android, iOS, web). It used SQL before, but I think it is time to switch to Firebase. By studying this I got an idea of how to build a user feed without using the IN clause. The data structure looks like this:
users
user1
name: bob
user2
name: alice
follows:
user1: true
posts
post1
author: user1
text: 'Hi there'
feeds
user2
post1: true
As in the example, if one of the users has 61 million followers, a post will need a record inserted into 61 million feeds/$uid/ paths, while the write operation barely survives 100k. On this link it is suggested not to do this on the client side, but the big point of Firebase is that it is backendless; how am I supposed to write other than from the client side?
So my question is: is there an efficient way to keep other read operations from being interrupted by this kind of massive write operation? Or is there better data modeling for this?
I really need help. I'd really appreciate even just a comment.
Let's start with an example Firebase structure using observers and events to capture feeds from specific users.
We have a users node which stores data about all of the users, in this case just their name. We also have a feeds node which stores the feeds. In this case we could also call it messages, as we are just storing messages that users create.
users
uid_0
name: "Scott"
uid_1
name: "Frank"
uid_2
name: "Leroy"
feeds
feed_0
uid: uid_1
msg: "some message from uid_1"
feed_1
uid: uid_2
msg: "a message from uid_2"
Assume that Scott (uid_0) wants to 'subscribe' to any feeds from Frank and Leroy.
func addSomeFeeds() {
    self.addFeed(uidFeed: "uid_1")
    self.addFeed(uidFeed: "uid_2")
}

func addFeed(uidFeed: String) {
    let feedsRef = self.ref.child("feeds")
    let feedQuery = feedsRef.queryOrdered(byChild: "uid").queryEqual(toValue: uidFeed)
    feedQuery.observe(.childAdded, with: { snapshot in
        let feedDict = snapshot.value as! [String: Any]
        let msg = feedDict["msg"] as! String
        print(msg)
    })
}
and the output
some message from uid_1
a message from uid_2
and then if uid_2 adds another feed (with a message)
Feed From uid_2
The above code is run on Scott's device and attaches an observer to the Feeds node that 'watches' for any feeds added by Frank or Leroy.
This could be expanded to watch for changes via .childChanged, so if a feed has multiple 'posts' in it, Scott's app will be notified of changes within that feed any time one of its posts is updated.
Another option is to add a more generic observer to the feeds node, whereby the app is notified of any feed added, changed or removed. In that case, when the app receives a snapshot of a feed, simply compare it to an array of feeds the user is interested in: if it doesn't match one of those, ignore it; otherwise, notify the user.
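As a rough sketch of that generic-observer option (JavaScript Web SDK this time; followedUids is assumed to be loaded from wherever the app keeps the user's follows):
// Sketch: one observer on /feeds; filter client-side against the uids
// this user follows, ignoring everything else.
const followedUids = ['uid_1', 'uid_2']; // assumed loaded elsewhere
firebase.database().ref('feeds').on('child_added', (snapshot) => {
  const feed = snapshot.val();
  if (followedUids.includes(feed.uid)) {
    console.log(feed.msg); // notify the user
  }
  // otherwise, silently ignore it
});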

Having consistency during multi path updates when the paths are not deterministic and are variable

I need help with a scenario where we do multi-path updates to fanned-out data. If, between calculating the set of paths and performing the update, a new path is added somewhere, the data in the newly added path will be inconsistent.
For example, below is the data for blog posts. Posts can be tagged with multiple terms like "tag1", "tag2". In order to find how many posts are tagged with a specific tag, I can fan out the posts data to the tags path as well:
/posts/postid1: {"Title": "Title 1", "body": "About Firebase", "tags": {"tag1": true, "tag2": true}}
/tags/tag1/postid1: {"Title": "Title 1", "body": "About Firebase"}
/tags/tag2/postid1: {"Title": "Title 1", "body": "About Firebase"}
Now consider, concurrently:
1a) User1 wants to modify the title of postid1 and builds the following multi-path update:
/posts/postid1/Title: "Title 1 modified"
/tags/tag1/postid1/Title: "Title 1 modified"
/tags/tag2/postid1/Title: "Title 1 modified"
1b) At the same time, User2 wants to add tag3 to postid1 and builds the following multi-path update:
/posts/postid1/tags: {"tag1": true, "tag2": true, "tag3": true}
/tags/tag3/postid1: {"Title": "Title 1", "body": "About Firebase"}
So apparently both updates can succeed one after the other, and /tags/tag3/postid1 ends up out of sync, since it still has the old title.
I can think of security rules to handle this, but I'm not sure whether this is correct or will work.
For example, we can have updatedAt and lastUpdated fields and check that we are updating our own version of the post that we read:
posts":{
"$postid":{
".write":true,
".read":true,
".validate": "
newData.hasChildren(['userId', 'updatedAt', 'lastUpdated', 'Title']) && (
!data.exists() ||
data.child('updatedAt').val() === newData.child('lastUpdated').val())"
}
}
For tags we do not want to repeat that check; instead we can check whether /tags/$tag/$postid/updatedAt is the same as /posts/$postid/updatedAt:
"tags":{
"$tag":{
"$postid":{
".write":true,
".read":true,
".validate": "
newData.hasChildren(['userId', 'updatedAt', 'lastUpdated', 'Title']) && (
newData.child('updatedAt').val() === root.child('posts').child('$postid').val().child('updatedAt').val())”
}
}
}
By this, "/posts/$postid" has concurrency control on it, and users can only write on top of their own reads.
Also, "/posts/$postid" becomes the source of truth, and all other fan-out paths check whether their updatedAt field matches that of the primary source-of-truth path.
Will this bring consistency, or are there still problems? Could it bring performance down when done at scale?
Are multi-path updates and rules atomic together? By that I mean: are rules evaluated separately, in isolation, for each path of a multi-path update like 1a and 1b above?
Unfortunately, Firebase does not provide any guarantees, or mechanisms, to provide the level of determinism you're looking for. I have had the best luck front-ending such updates with an API stack (GCF and Lambda are both very easy, serverless ways of doing this). The updates can be made in that layer, and even serialized if absolutely necessary. But there isn't a safe way to do this in Firebase itself.
There are numerous "hack" options you could apply. You could, for example, have a simple lock mechanism using a dedicated collection for tracking write locks. Clients could post to a lock collection, then verify that their key was the only member of that collection, before performing a write. But I hope you'll agree with me that such cooperative systems have too many potential edge cases, potential security issues, and so on. In Firebase, it is best to design such that this component is not a requirement in the first place.
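For illustration, a minimal sketch of that API-layer approach (a callable Cloud Function using the Admin SDK; the function name and payload are assumptions): the tag list is re-read immediately before a single atomic multi-path update, which shrinks the race window, and this layer is also where updates could be queued or serialized if absolutely necessary.
// Sketch: perform the fan-out server-side. Reading the live tag list
// right before the atomic multi-path update narrows the window in
// which a concurrently added tag could be missed.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();
exports.updateTitle = functions.https.onCall(async (data, context) => {
  const postId = data.postId;
  const title = data.title;
  const db = admin.database();
  const tagsSnap = await db.ref(`posts/${postId}/tags`).once('value');
  const updates = {};
  updates[`posts/${postId}/Title`] = title;
  tagsSnap.forEach((tag) => {
    updates[`tags/${tag.key}/${postId}/Title`] = title;
  });
  return db.ref().update(updates); // one atomic multi-path write
});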
