What is the best way to enforce per-user quota on data stored in Firebase?
My users will be able to create documents with a unique id on the following path:
/documents/id/contents
The id will be uniquely generated using a transaction. The id will be reserved by using a verification rule (contents.id == auth.id)
However, how do I prevent a user from spamming the db (by randomly allocating ids to themselves)? Can I have a rule that counts the number of ids allocated to a user and rejects new ones if the count is too high?
There's currently no good way to do this.
In some cases you could fully enumerate the children that are allowed to exist (child1, child2, etc), and grant read / write for each one. This won't work for large numbers though or for ids you don't know beforehand.
We do have plans to build features to restrict the number of allowed children and to provide other features to enforce quotas on users.
I am building a community-type app based on Firestore where users should have granular control over what kind of information they share with whom.
Users can have properties such as name, birthdate, etc., and for each of them they can decide to share it with one of the following groups/roles:
Private
Contacts
Admin (Admins of organizations that user is a member of)
Organization (Members of organizations that the user is a member of)
Public (All users of the app)
As documents in Firestore will always be retrieved as a whole, I already know that I somehow will have to segregate my user properties by access level.
I've got two approaches so far:
Approach 1
Store each user property in a separate document that contains a field access level
Store some metadata in, for example /user/12345/meta/roles, so that I can point the security rules to those documents to validate access
Benefits:
Easy structure
Flexible
(Almost) no data duplication
Drawbacks:
Lots of document reads for getting a user's profile
Approach 2
Store user profile in, for example /user/12345/profile/private and duplicate the public information into /user/12345/profile/public, and do the same for each access level
Benefits:
Reduced document reads
Drawbacks:
Complexity
It feels wrong to duplicate that much data
Does anyone have any experience with this and any suggestions or alternative approaches they can share?
Follow-up question:
Let’s say I store the list of members of an organization in a subcollection, that is only accessible for members of the organization (for privacy reasons). Doesn’t that mean that when querying that list of members from client side, I have to do it „blindly“, meaning I can’t know if the user can access that document until I actually try? The fact that the query might fail would tell me that the user is not actually a member of that organization.
Would you consider this kind of query, which is set up for failure, bad practice? Are there any alternatives that still keep the member list private?
I think you are moving from a SQL environment to NoSQL, which is why Approach 2 doesn't feel like the right way to proceed.
Approach 2 is actually the right way to proceed. It has a couple of advantages:
1.) Reduced document reads mean cost savings. Firestore charges by the number of reads and writes, so reducing them is generally the way to go. Also, the cost of the extra storage caused by duplication will always be less than the cost of the extra reads as you scale up your application.
2.) In a NoSQL database you are allowed to duplicate data, provided it increases the read/search speed from the database.
I don't see the second approach as complex, because that's the tradeoff you make when choosing NoSQL over SQL.
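As a rough illustration of Approach 2, the sketch below derives one document per access level from a single list of properties, so each audience needs only one read. The names `ProfileProperty`, `VISIBLE_IN`, and `buildProfileDocuments` are hypothetical, and the mapping of which sharing settings appear in which document is an assumption to adjust to your own rules. Writing the resulting objects to paths such as /user/12345/profile/public is left out.

```typescript
// The five sharing options listed in the question.
type AccessLevel = "private" | "contacts" | "admin" | "organization" | "public";

interface ProfileProperty {
  value: string;
  sharedWith: AccessLevel; // the single level the user picked for this property
}

const LEVELS: AccessLevel[] = ["private", "contacts", "admin", "organization", "public"];

// ASSUMPTION: which sharing settings each per-level document includes.
// For example, the "contacts" document also carries everything that is public.
const VISIBLE_IN: Record<AccessLevel, AccessLevel[]> = {
  private: ["private", "contacts", "admin", "organization", "public"],
  contacts: ["contacts", "public"],
  admin: ["admin", "public"],
  organization: ["organization", "public"],
  public: ["public"],
};

// Builds the duplicated per-level documents from one source of truth.
function buildProfileDocuments(
  properties: Record<string, ProfileProperty>
): Record<AccessLevel, Record<string, string>> {
  const docs: Record<AccessLevel, Record<string, string>> = {
    private: {}, contacts: {}, admin: {}, organization: {}, public: {},
  };
  for (const level of LEVELS) {
    for (const name of Object.keys(properties)) {
      const prop = properties[name];
      // Copy the property into every document whose audience may see it.
      if (VISIBLE_IN[level].indexOf(prop.sharedWith) !== -1) {
        docs[level][name] = prop.value;
      }
    }
  }
  return docs;
}
```

Regenerating all five documents from the same input whenever a property or its sharing setting changes keeps the duplicates consistent with each other.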
I have a collection of usernames that map to their character ids. App allows users to search by username, when username is submitted I get the document from firestore and check if it exists or not.
Right now there are no limits on how fast users can query usernames. Ideally, I want to allow each user to query this collection only once every 2 seconds.
I was able to find this answer https://stackoverflow.com/a/56487579/911930 but if I understood security rules correctly this example imposes "Global" delay on the collection i.e. if user no.1 queries usernames, user no.2 can't query them for 5s. This is obv not ideal for my use case, as I want this rule imposed per user as opposed to globally.
Is this achievable with security rules?
The link you provide describes a write rate limit, both globally and per user (see the section "The final example is a per-user write rate-limit").
There is no way to implement a read rate limit in Firestore security rules. If that is a hard requirement for your app, the most common approach is to make all read operations go through Cloud Functions, where you can enforce the limit.
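A minimal sketch of the check such a function could run before performing the read, assuming the 2-second per-user interval from the question. The class name `PerUserRateLimiter` and the in-memory `Map` are illustrative assumptions. Function instances do not share memory, so a real deployment would need to keep the timestamps in a shared store instead.

```typescript
// Per-user limiter: each uid may pass at most once per minIntervalMs.
class PerUserRateLimiter {
  private readonly minIntervalMs: number;
  // ASSUMPTION: in-memory state, which only works within a single process.
  private readonly lastAllowedMs = new Map<string, number>();

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true and records the time if the user may read now,
  // false if the previous allowed read was too recent.
  tryAcquire(uid: string, nowMs: number = Date.now()): boolean {
    const last = this.lastAllowedMs.get(uid);
    if (last !== undefined && nowMs - last < this.minIntervalMs) {
      return false; // rejected reads do not reset the window
    }
    this.lastAllowedMs.set(uid, nowMs);
    return true;
  }
}
```

The function handler would call `tryAcquire` with the authenticated user's uid and only run the username lookup when it returns true. Because the state is keyed by uid, one user's queries never delay another user's.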
I have the following one-to-many relationship:
Account 1--* User
The Account contains global account-level information, which is mutable.
The User contains user-level information, which is also mutable.
When the user signs-in, they need both Account and User information. (I only know the UserId at this point).
I ideally want to design the schema such that a single query is necessary. However, I cannot determine how to do this without duplicating the Account into each User and thus requiring some background Lambda job to propagate changes to Account attributes across all User objects -- which, for the record, seems like more resource usage (and code to maintain) than simply normalizing the data and having 2 queries on each sign-in: fetch user, then fetch account (using an FK inside the user object that identifies the account).
Is it possible to design a schema that allows one query to fetch both and doesn't require a non-transactional background job to propagate updates? (Transactional batch updates are out of the question, since there's >25 users.) And if not, is the 2-query idea the best / an acceptable method?
I'll focus on one angle in your question: the 2-query idea. In many cases it is indeed an acceptable method, better than the alternatives. In many NoSQL uses, every user-visible request results in significantly more than two database requests, and it is often stated that this is the reason why NoSQL systems care about low tail latencies (i.e., even 99th percentile latencies should be low).
You didn't say why you wanted to avoid the 2-query solution. The 2-query implementation you presented has two downsides:
It is more costly: you need to do two queries instead of one, which (when the reads are shorter than 4 KB) costs twice as much as a single read.
Latency doubles if you need to do the first query, and only then can do the second query.
There may be tricks you can use to solve both problems, depending on more details of your use case:
For the latency: You didn't say what is a "user id" in your application. If it is some sort of unique numeric identifier, maybe it can be set up such that the account id can be determined from the user id directly, without a table lookup (e.g., the first bits of the user id are the account id). If this is the case, you can start both lookups at the same time, and not double the latency. The cost will still be double, but not the latency.
For the cost: If there is a large number of users per account (you said there are more than 25 - I don't know if it's much more or not), it may be useful to cache the Account data, so that not every user lookup will need to read the Account data again - it might often be cached. If Account information rarely changes and consistency of it is not a big deal (I don't know if it is...), you can also get by with doing an "eventual consistency" read for the Account information - which costs half of the regular "consistent" read.
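The latency trick can be sketched as follows, assuming numeric user ids whose upper bits encode the account id. The 20-bit split, the `Store` interface, and the function names are hypothetical stand-ins for whatever id layout and table client you actually use.

```typescript
// ASSUMPTION: the low 20 bits of a user id identify the user within the
// account, and the remaining upper bits are the account id.
const LOCAL_ID_BITS = 20;

function accountIdFromUserId(userId: number): number {
  return Math.floor(userId / Math.pow(2, LOCAL_ID_BITS));
}

// Hypothetical minimal lookup interface standing in for a table read.
interface Store<T> {
  get(id: number): Promise<T | undefined>;
}

// Starts both lookups at once, so total latency is roughly that of the
// slower read rather than the sum of both. The read cost is still two reads.
async function loadSignInData<U, A>(
  userId: number,
  users: Store<U>,
  accounts: Store<A>
): Promise<{ user: U | undefined; account: A | undefined }> {
  const [user, account] = await Promise.all([
    users.get(userId),
    accounts.get(accountIdFromUserId(userId)),
  ]);
  return { user, account };
}
```

If the account id cannot be derived from the user id, the two reads have to run one after the other, as in the original 2-query design.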
I think the following scheme will be useful for you.
You will store both account and user records in the same table
You want to get both account metadata and linked users in a single query
PK: account SK: recordId
=== Account record ===
account: 123512321 recordId: METADATA attributes: name, environment, ownerId...
=== User record ===
account: 123512321 recordId: USERID#34543543 attributes: name, email, phone...
With this denormalization of the data, you can retrieve both account metadata and related users in a single query. You can also change the account metadata without a need to apply any change to related users.
BONUS: you can also link other types of assets to the account record
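To show how the results of that single query could be consumed, here is a small in-memory sketch. `TableItem`, `queryByAccount`, and `splitAccountItems` are hypothetical names, and the array filter merely stands in for a real query on the partition key.

```typescript
// One item shape for both record types, following the PK/SK layout above.
interface TableItem {
  account: string;  // partition key
  recordId: string; // sort key: "METADATA" or "USERID#<id>"
  attributes: Record<string, string>;
}

interface AccountWithUsers {
  metadata: TableItem | undefined;
  users: TableItem[];
}

// Stand-in for a single query on the partition key (ASSUMPTION: in-memory).
function queryByAccount(table: TableItem[], account: string): TableItem[] {
  return table.filter((item) => item.account === account);
}

// Separates the account metadata record from the user records.
function splitAccountItems(items: TableItem[]): AccountWithUsers {
  const result: AccountWithUsers = { metadata: undefined, users: [] };
  for (const item of items) {
    if (item.recordId === "METADATA") {
      result.metadata = item;
    } else if (item.recordId.indexOf("USERID#") === 0) {
      result.users.push(item);
    }
  }
  return result;
}
```

Note that the query is keyed by the account value, so the caller has to know the account id before it can fetch both record types in one request.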
Teams in my app can have members with different roles (owner, admin, member). I need to be able to enforce security rules based on those roles and perform queries on teams and users based on those access roles. I've read the group access control solution in the Firebase documentation (that implements team membership only, not team roles) and studied Firestore indexing and query constraints.
Below are schema options I've considered. I'm needing confirmation of my assumptions and advice on the best solution.
Option 1: The most natural choice would be to maintain a team.roles map (object) that maps uid (key) => "owner" or "admin" or "member" (team role value). We can mirror this on user.teams. However, since a uid is now a nested field name, when a query needs to find all members who are either "admin" or "member" I think we're forced to create a custom index per uid! You might make it work for a single where clause on the roles.${uid} field, but the moment you add an additional where clause, or one that includes an array-contains condition, I think you're forced to create a composite index per uid. Right?
If you have large lists of team members, that can be a problem, as you'll need to implement server-side sorting with pagination in the query, which I think forces you to do custom indexing, so in that case this is a non-viable solution.
Option 2: Replace the roles map with 3 object fields, one per role, each having uids as keys with value true. You can now use a simple query condition to find a "member", "admin" or "owner". You can also use a single update() to add/remove users from a role.
However, I think you must once again avoid creating any composite indexes, as they will include a uid, so an unlimited number of indexes would be required! Right?
Option 3: Replace the role map with 3 array fields, one per role, each containing a list of uid's. You can now use an in condition on a single role and can safely combine it with other conditions. This fixes the per uid indexing issue. Users can also be an owner or admin without being considered to be a member. However once you've used up the single in condition, you can no longer use another one on the same query so that can seriously constrain the types of queries you can now perform on the remaining fields (Firebase team please note the single in condition is a serious pain point that warps your schemas).
For Options 1 & 2, uids are nested field names, so I think we also have an upper limit of 40,000 users (max index limit) in the system, or maybe it's only 40,000 per team. A benefit of this approach is that adding/removing users to/from roles is easy to do as a single operation (no need to read then update, you can just do a merge update).
Are there other best practice options? Thanks for your help!
Context: I am putting together a time tracking application using Firebase as my backend. My current node structure has Time Entries and Clients at the root like so:
Time Entry
Entry ID
UserID
clientID, hours, date, description, etc
Clients
ClientID
name, projects, etc
This structure works fine if I'm just adding and pulling time entries based on the user, but I want to start putting together reports on a per client basis. Currently, this means making a separate HTTP request for each user and then filtering by the clientID to get at the data.
The rule structure for Firebase grants access to all child nodes once access is given to the parent node, so one big list doesn't work as it can't restrict users from seeing or editing each other's entries.
Question: Is there a way to structure the nodes that would allow for restricting users to only managing their own time entries, as well as allow for one query to pull all entries tied to a client?
** The only solution I could come up with was duplicating the entries into a single node used just for reporting purposes, but this doesn't seem like a sustainable option
@AL. your answer was what I ended up going with after scouring the docs across the web. Duplicating the data is the best route to take.
The new Firestore beta seems to provide some workarounds for this.
The way that I would do this is with Cloud Firestore.
Create a root collection clients and a document for each client. This partitions the data into easily manageable chunks, so that a client admin can see all data for their company.
Within the client document, create a sub-collection called timeEntries. When a user writes to this, they must include a userId field (you can enforce this in the rules) which is equal to request.auth.uid
https://firebase.google.com/docs/firestore/security/rules-conditions#data_validation
You can now create read rules which allow an admin to query any document in the timeEntries sub-collection, but an individual user must query with userId = request.auth.uid in order to only return the entries that they have created.
https://firebase.google.com/docs/firestore/security/rules-conditions#security_rules_and_query_results
Within your users/{uid} collection or clients/{clientId} collection, you can easily create a flag to identify admin users and check this when reading data.
https://firebase.google.com/docs/firestore/security/rules-conditions#access_other_documents