Teams in my app can have members with different roles (owner, admin, member). I need to be able to enforce security rules based on those roles and perform queries on teams and users based on those access roles. I've read the group access control solution in the Firebase documentation (that implements team membership only, not team roles) and studied Firestore indexing and query constraints.
Below are the schema options I've considered. I need confirmation of my assumptions and advice on the best solution.
Option 1: The most natural choice would be to maintain a team.roles map (object) that maps uid (key) => "owner", "admin" or "member" (team role value). We can mirror this on user.teams. However, since a uid is now a nested field name, when a query needs to find all members who are either "admin" or "member" I think we're forced to create a custom index per uid! You might make it work for a single where clause on the roles.${uid} field, but the moment you add an additional where clause, or one that includes an array-contains condition, I think you're forced to create a composite index per uid. Right?
If you have large lists of team members that can be a problem, as you'll need to implement server-side sorting with pagination in the query, which I think forces you to do custom indexing. In that case this is a non-viable solution.
Option 2: Replace the roles map with 3 object fields, one per role, each having uids as keys with value true. You can now use a simple query condition to find a "member", "admin" or "owner". You can also use a single update() to add/remove users from a role.
However, I think you must once again avoid creating any composite indexes, as they would include a uid, so an unlimited number of indexes would be required! Right?
Option 3: Replace the role map with 3 array fields, one per role, each containing a list of uids. You can now use an in condition on a single role and can safely combine it with other conditions. This fixes the per-uid indexing issue. Users can also be an owner or admin without being considered to be a member. However, once you've used up the single in condition, you can no longer use another one in the same query, which can seriously constrain the types of queries you can perform on the remaining fields (Firebase team, please note: the single in condition is a serious pain point that warps your schemas).
For Options 1 & 2, uids are nested field names, so I think we also have an upper limit of 40,000 users (max index limit) in the system, or maybe it's only 40,000 per team. A benefit of this approach is that adding users to and removing them from roles is easy to do as a single operation (no need to read then update; you can just do a merge update).
Are there other best practice options? Thanks for your help!
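To make the three shapes concrete, here is a minimal in-memory sketch in Python. Plain dicts stand in for a team document, no Firestore SDK is involved, and the uids (`alice`, `bob`), the field names `owners`/`admins`/`members` and the helper names are all hypothetical. It only illustrates that options 1 and 2 put the uid into the field path a query would filter on, while option 3 keeps the field path fixed and stores the uid as a value.

```python
from typing import Optional

# Plain dicts standing in for a team document; uids are made-up examples.

# Option 1: one map, uid -> role.
team_opt1 = {"roles": {"alice": "owner", "bob": "member"}}

# Option 2: one map per role, uid -> True.
team_opt2 = {"owners": {"alice": True}, "admins": {}, "members": {"bob": True}}

# Option 3: one array per role.
team_opt3 = {"owners": ["alice"], "admins": [], "members": ["bob"]}


def filter_field_opt1(uid: str) -> str:
    """Field path a query would filter on in option 1 (differs per uid)."""
    return f"roles.{uid}"


def filter_field_opt2(uid: str, role: str) -> str:
    """Field path a query would filter on in option 2 (also differs per uid)."""
    return f"{role}s.{uid}"


def filter_field_opt3(role: str) -> str:
    """Field path in option 3: fixed per role, the uid is a value instead."""
    return f"{role}s"


def role_of(team: dict, uid: str) -> Optional[str]:
    """Look up a user's role in the option 3 shape, or None if absent."""
    for role in ("owner", "admin", "member"):
        if uid in team[f"{role}s"]:
            return role
    return None
```

With these shapes, `filter_field_opt1("alice")` and `filter_field_opt1("bob")` differ, which is the source of the per-uid index concern, whereas `filter_field_opt3("member")` is the same for every user.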
I do not have much experience and I would like to know if there is an easy way to create user rights and privileges, so that each user can access only specific records from the database tables, based on the level he belongs to.
More specifically, suppose we have a group of companies where this group has some companies and these companies have some branches and the branches have some users.
I want the user belonging to the "group of companies" level to have access to and view all the entries in the database related to that group and what is below it (its companies and the branches of these companies).
The user who belongs to the "company" level should have access and see only the files of this company and the branches that this company may have in the database.
The user belonging to the "Branch" level should only be able to access and view this branch's records in the database.
And finally, the user belonging to the "End User" level should have access to and see only the records created by that user in the database.
Of course level "administrator" will have access to all records in the database.
I thought of creating a user table with a "User_Level" field and adding a USER_ID column to each table, so that I can find the level of a user. But how can I restrict access based on the Group of Companies, the Company or the Branch the user belongs to?
In APEX you can create authorization schemes to determine what components a user has access to within an application, but that is just a part of the answer to this question. Your question is about filtering the data that is shown to a user based on certain criteria.
There are a couple of possible solutions to this. Since this is a very broad question I'm just going to give you pointers/concepts to start your research. It's up to you to determine which solution or combination is most suitable for your implementation.
Concept: Multi-Tenancy
If the data is used by multiple tenants then add a tenant_id to each table that has tenant specific data. In your case a tenant should be a branch. A simple design could be a groups table (to hold branch - companies - company groups), a group_members table (to define relationship between branch - companies - company groups OR between any group and a user) and a users table.
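A minimal sketch of that design in plain Python, assuming a `groups` relation with a parent link and a `group_members` relation that ties each user to one group. The group ids, user names and helper names are invented for illustration, and a real implementation would live in database tables rather than dicts.

```python
from collections import defaultdict

# groups: group id -> parent group id (None for the top-level company group).
GROUP_PARENT = {
    "grp": None,
    "company_a": "grp",
    "company_b": "grp",
    "branch_a1": "company_a",
    "branch_a2": "company_a",
    "branch_b1": "company_b",
}

# group_members (user rows only): user -> the group the user belongs to.
USER_GROUP = {"gina": "grp", "carl": "company_a", "bea": "branch_a1"}


def visible_groups(user):
    """Return the user's own group plus everything below it in the hierarchy."""
    children = defaultdict(list)
    for gid, parent in GROUP_PARENT.items():
        if parent is not None:
            children[parent].append(gid)
    seen = set()
    stack = [USER_GROUP[user]]
    while stack:
        gid = stack.pop()
        if gid not in seen:
            seen.add(gid)
            stack.extend(children[gid])
    return seen


def visible_rows(user, rows):
    """Keep only rows whose tenant_id (a branch) the user is allowed to see."""
    allowed = visible_groups(user)
    return [row for row in rows if row["tenant_id"] in allowed]
```

Here a branch-level user such as `bea` sees only rows tagged with `branch_a1`, while the company-level user `carl` sees rows from both branches of `company_a`.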
Concept: VPD
This is a feature of the Oracle database that allows a transparent implementation of security rules. In the application you'll define a simple select like
SELECT * FROM emp
But the VPD implementation will automatically add a where clause to the query to only show the records defined in the VPD policy. This makes developing the application a lot easier since there is less room for errors. Note that this database option might not be included in your licence. There is also something called "Poor Man's VPD" that does not use the VPD option. Search for how to implement this in your APEX application.
Just do it all by hand: This is the least preferred option but it can be done. For every component where a select is done, manually add a where clause to restrict the returned rows. However, this is very maintenance intensive and there is a ton of room for errors. Obviously the data model will still have to support the striping of the data.
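The effect of such a policy can be imitated in a few lines. This is a toy illustration only, not how Oracle implements VPD: a hypothetical `run_with_policy` helper wraps the application's query and applies a tenant predicate around it, using SQLite from Python's standard library. The `emp` columns and the `tenant_id` values are made up.

```python
import sqlite3


def run_with_policy(conn, sql, tenant_id):
    """Run the application's query with a tenant predicate added around it."""
    # The application never writes this WHERE clause itself.
    wrapped = f"SELECT * FROM ({sql}) WHERE tenant_id = ?"
    return conn.execute(wrapped, (tenant_id,)).fetchall()


# Hypothetical sample data: two employees in two different branches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, tenant_id TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", "branch_a1"), ("Bob", "branch_b1")],
)

# The application still issues the plain query from the text.
rows = run_with_policy(conn, "SELECT * FROM emp", "branch_a1")
```

The application code stays a plain `SELECT * FROM emp`, and only the row for `branch_a1` comes back.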
This blog post by Jeffrey Kemp might give you some pointers as well: https://jeffkemponoracle.com/2017/11/convert-an-apex-application-to-multi-tenant/ - go through the "further reading" section at the bottom.
You can create a procedure or function and use it in your app's Shared Components -> Authorization Schemes, for example as a PL/SQL function returning boolean: return true for the users who should see the component and false to hide it.
Then select this authorization scheme on the APEX components you want to protect, such as items and pages.
I have the following one-to-many relationship:
Account 1--* User
The Account contains global account-level information, which is mutable.
The User contains user-level information, which is also mutable.
When the user signs-in, they need both Account and User information. (I only know the UserId at this point).
I ideally want to design the schema such that a single query is necessary. However, I cannot determine how to do this without duplicating the Account into each User and thus requiring some background Lambda job to propagate changes to Account attributes across all User objects -- which, for the record, seems like more resource usage (and code to maintain) than simply normalizing the data and having 2 queries on each sign-in: fetch user, then fetch account (using an FK inside the user object that identifies the account).
Is it possible to design a schema that allows one query to fetch both and doesn't require a non-transactional background job to propagate updates? (Transactional batch updates are out of the question, since there's >25 users.) And if not, is the 2-query idea the best / an acceptable method?
I'll focus on one angle in your question: the 2-query idea. In many cases it is indeed an acceptable method, better than the alternatives. In many NoSQL uses, every user-visible request results in significantly more than two database requests, and it is often stated that this is the reason why NoSQL systems care about low tail latencies (i.e., even 99th percentile latencies should be low).
You didn't say why you wanted to avoid the 2-query solution. The 2-query implementation you presented has two downsides:
It is more costly: you need to do two queries instead of one, which (when the reads are shorter than 4 KB) costs double what a single read costs.
Latency doubles if you need to do the first query, and only then can do the second query.
There may be tricks you can use to solve both problems, depending on more details of your use case:
For the latency: You didn't say what is a "user id" in your application. If it is some sort of unique numeric identifier, maybe it can be set up such that the account id can be determined from the user id directly, without a table lookup (e.g., the first bits of the user id are the account id). If this is the case, you can start both lookups at the same time, and not double the latency. The cost will still be double, but not the latency.
For the cost: If there is a large number of users per account (you said there are more than 25 - I don't know if it's much more or not), it may be useful to cache the Account data, so that not every user lookup will need to read the Account data again - it might often be cached. If Account information rarely changes and consistency of it is not a big deal (I don't know if it is...), you can also get by with doing an "eventual consistency" read for the Account information - which costs half of the regular "consistent" read.
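Both tricks can be sketched in a few lines of Python. The bit split (24 bits for the account, 40 for the user number) is an assumption chosen for illustration, as are the function names. The cost helper assumes the usual rule that a strongly consistent read of up to 4 KB costs one read unit and an eventually consistent read costs half of that.

```python
import math

ACCOUNT_BITS = 24  # assumed: high bits of the user id hold the account id
USER_BITS = 40     # assumed: low bits hold the per-account user number


def make_user_id(account_id, user_number):
    """Pack the account id into the high bits of a numeric user id."""
    if not 0 <= account_id < (1 << ACCOUNT_BITS):
        raise ValueError("account_id out of range")
    if not 0 <= user_number < (1 << USER_BITS):
        raise ValueError("user_number out of range")
    return (account_id << USER_BITS) | user_number


def account_id_of(user_id):
    """Recover the account id without any table lookup."""
    return user_id >> USER_BITS


def read_units(item_bytes, consistent=True):
    """Read units for one item: 4 KB blocks, halved for eventual consistency."""
    units = math.ceil(item_bytes / 4096)
    return units if consistent else units / 2
```

Because `account_id_of` needs no lookup, the User and Account reads can be started at the same time. Under the stated pricing assumption, a consistent User read plus an eventually consistent Account read of small items comes to 1.5 read units instead of 2.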
I think the following scheme will be useful.
You will store both account and user records in the same table
You want to get both account metadata and linked users in a single query
PK: account SK: recordId
=== Account record ===
account: 123512321 recordId: METADATA attributes: name, environment, ownerId...
=== User record ===
account: 123512321 recordId: USERID#34543543 attributes: name, email, phone...
With this denormalization of the data, you can retrieve both account metadata and related users in a single query. You can also change the account metadata without a need to apply any change to related users.
BONUS: you can also link other types of assets to the account record
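As a rough illustration of why this layout needs only one query, here is an in-memory imitation in Python. It does not use any DynamoDB SDK: `query_partition` is a hypothetical stand-in for a query on the partition key, and the attribute values (`Acme`, `Ann`) are invented.

```python
# Items share one table; "account" is the partition key, "recordId" the sort key.
TABLE = [
    {"account": "123512321", "recordId": "METADATA", "name": "Acme"},
    {"account": "123512321", "recordId": "USERID#34543543", "name": "Ann"},
    {"account": "999", "recordId": "METADATA", "name": "Other"},
]


def query_partition(table, account):
    """Return every item in one partition, ordered by sort key."""
    items = [item for item in table if item["account"] == account]
    return sorted(items, key=lambda item: item["recordId"])


def split_account_and_users(items):
    """Separate the single METADATA item from the USERID# items."""
    metadata = next(i for i in items if i["recordId"] == "METADATA")
    users = [i for i in items if i["recordId"].startswith("USERID#")]
    return metadata, users
```

Calling `split_account_and_users(query_partition(TABLE, "123512321"))` returns the account metadata and its users from a single pass over one partition. Note that the lookup key is the account id, so the caller needs to know it up front.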
I am trying to model 2 concepts in Firestore and also associate them:
collection: users
key/document_id: email
document: profile info
collection: topics
key/document_id: random
document: metadata with a field indicating email of user (to use for lookups)
My goal is to
"reference" topics in users for easy lookups, but I'm not sure how to do it other than with a sub collection.
Based on the email, which will be passed as part of auth, have a security rule that allows writes in a collection only on a path or field based on that email.
Are both of the above feasible in Firebase? Appreciate any pointers!
Preamble: There isn't ONE and only ONE correct approach in NoSQL data modelling
Your approach seems valid, however I would suggest the following adaptations:
"Reference topics in users for easy lookups":
To "reference topics in users for easy lookups" you could duplicate the list of topics in an array in the user profile. You will then be able to use array-contains (and other array membership methods) for your queries. (Note however the limitation of the in operator).
Advantage of this approach: you only need to query one document to get all the topics of a user. Possible drawback: there is a limit on the size for a document (and for a single field value) which is maximum 1 MiB (1,048,576 bytes), see the doc.
You can easily keep in sync the topics array and the topics sub-collection by combining a batched write and the arrayUnion() and arrayRemove() methods.
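The sync idea can be pictured with a small plain-Python imitation. It does not call the Firebase SDK: `array_union` and `array_remove` below only mimic the set-like behaviour of arrayUnion() and arrayRemove(), and `add_topic` stands in for a batched write by producing both updated structures together. All names and sample data are hypothetical.

```python
def array_union(current, *elements):
    """Append each element that is not already present, keeping order."""
    result = list(current)
    for element in elements:
        if element not in result:
            result.append(element)
    return result


def array_remove(current, *elements):
    """Drop every occurrence of the given elements."""
    return [item for item in current if item not in elements]


def add_topic(user_doc, topics_subcollection, topic_id, data):
    """Return updated copies of the user doc and the topics sub-collection."""
    new_user = dict(user_doc)
    new_user["topics"] = array_union(user_doc.get("topics", []), topic_id)
    new_topics = dict(topics_subcollection)
    new_topics[topic_id] = data
    return new_user, new_topics
```

Adding a topic id that is already in the array leaves the array unchanged, so repeating the same write does not create duplicates.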
Use the user ID instead of the email for doc Ids and Security Rules:
Instead of using the email as the users collection document ID and using it in Security Rules, use the user ID. See the examples in the doc.
I'm building a SaaS system that allows users to define their own data models and enter data according to those models. It's a bit like airtable.
One user might model a bookshop, and would have a Book model, with title and ISBN fields. Another user might model medical records, and would have "date of last visit" as a field.
In the case of the bookshop, I want users to be able to search on title and ISBN. In the case of the medical records, I want users to be able to search on the date of the last visit.
I am using Firestore as my backend.
Firestore requires an index to enable a search. So that approach will not scale as # of customers increases.
My thought therefore was to have a Firestore instance for each customer, and those specific instances would have the necessary indexes.
I'm sure there are downsides to doing this though.
What would folks recommend to best solve this need?
What you are trying to achieve is somewhat unusual, since you would not provide even a few standard common properties for each user of your bookshop.
When you want to perform a search in a Cloud Firestore database, you need the exact name of the property on which you want to search. Having dynamic properties might not help you solve the search feature. However, you can create a document with a property of type array that holds the names of all the properties the users have chosen and perform a search on every property, but this solution will be much too expensive.
In my opinion, a possible solution might be to create at least a few common properties, so you can have the properties on which you can search. When someone creates, for example, a book shop you can display at the beginning all available properties a user can choose. Once you create a shop, you can have different users with different shop properties. This means that if a user does not choose a property, when you perform a search on that property, the results won't contain his/her products. This will work, only if you have predefined properties.
In my firestore database, I use the same collection name in different parts of my hierarchy. For example, imagine a stackoverflow-like site with the following 2 collections
/questions/{questionId}/votes/
/questions/{questionId}/answers/{answerId}/votes/
So now I want to create an index on one of these 2 collections. I would expect firestore to require some kind of "path-with-wildcards" like I've used above to identify the data to be indexed. However, instead, they only require the collection name: in this case, "votes".
So if I put an index on "votes" does it apply to both of these collections? Is there any way to put an index on one of these collection and not the other? Is it a best practice to use unique collection names to avoid this issue?
TL;DR:
Yes. Indexes are based on the collection id. This applies to both the ones we create automatically for you on single fields, as well as the composite indexes you create manually. If they are semantically different indexes we recommend you give them unique ids, so you could use question_votes and answer_votes.
More Info
Collection id is the identifier of the collection, excluding the full path. In your case, this is votes as you've noted.
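A tiny Python sketch of that definition, using a hypothetical `collection_id` helper that takes the last segment of a collection path. It shows that both paths from the question resolve to the same collection id.

```python
def collection_id(path):
    """Return the last segment of a collection path (its collection id)."""
    segments = [segment for segment in path.strip("/").split("/") if segment]
    # Collection paths alternate collection/document and end on a collection,
    # so they have an odd number of segments.
    if len(segments) % 2 == 0:
        raise ValueError("path does not point to a collection")
    return segments[-1]


question_votes = collection_id("/questions/{questionId}/votes/")
answer_votes = collection_id("/questions/{questionId}/answers/{answerId}/votes/")
```

Both values are `"votes"`, which is why an index defined on `votes` covers both collections.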
The queries we currently serve use the subset of indexes for a specific path, although we have plans in the future to allow you to do a query that spans all collections with the same collection id (the collection group). This small bit of info adds some context to why.
A second reason is there is a 200 composite index limit in the system, so if someone had a data model structured like the following, /users/{user_id}/blog_posts/{post_id}, there would be no real way for them to create composite indexes on blog_posts for more than a handful of users (not to mention the operational burden of creating new indexes for every user!)