I've been working with RDBMSs for decades, but I'm having a hard time wrapping my head around the "right way" to structure a specific data model in DynamoDB. I'm completely new to NoSQL, but I understand that I should create as few tables as possible and that the structure depends on how I plan to query the database. I've attached an image of the table relationships (a relational DB table diagram) that I would create in a relational database to meet my application requirements, which are:
The application will have multiple organizations
Organizations can have zero-or-more staff members
Staff can be associated with one or more organizations
Staff can have an independent role for each of their organization associations
Some Queries the data-model will need to support:
Lookup Staff member based on email address (for login)
Lookup Staff member based on ID (for RESTful API reference)
Lookup Organization based on ID (for RESTful API reference)
Any pointers in the right direction would be great, as would a good reference on DynamoDB design.
Thanks
I'm trying to create my first DynamoDB based project and I'm having some trouble figuring out the best practices working with a NoSQL database.
My use case is currently storing users and teams. I have a table whose partition key is either USER#{userId} or TEAM#{teamId}. If the PK is TEAM#{teamId}, I store records with an SK of either TEAM#{teamId} for the team details, or USER#{userId} for the user's details within the team (acceptedInvite, joinDate, etc.). I also have a GSI on the userId/email column that lets me query all the teams a user has been invited to, or the user's team, depending on the value of the acceptedInvite field. Screenshots of the current table structure are attached:
The table
The GSI
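The key layout described above can be sketched with plain Python dictionaries. This is only an illustration of the item shapes: the `name` and `userId` attribute names, the sample values, and the two helper lookups (in-memory stand-ins for the GSI query and the table query) are assumptions, since the screenshots are not reproduced here.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


def team_item(team_id: str, name: str) -> Item:
    # Team details: PK and SK both carry the team key.
    return {"PK": f"TEAM#{team_id}", "SK": f"TEAM#{team_id}", "name": name}


def membership_item(team_id: str, user_id: str,
                    accepted_invite: bool, join_date: str) -> Item:
    # A user's details within a team live under the team's partition.
    return {
        "PK": f"TEAM#{team_id}",
        "SK": f"USER#{user_id}",
        "userId": user_id,  # assumed attribute that the GSI is keyed on
        "acceptedInvite": accepted_invite,
        "joinDate": join_date,
    }


def accepted_team_ids(items: List[Item], user_id: str) -> List[str]:
    # Stand-in for the GSI lookup by userId, filtered on acceptedInvite.
    return [
        i["PK"].split("#", 1)[1]
        for i in items
        if i.get("userId") == user_id and i.get("acceptedInvite")
    ]


def team_members(items: List[Item], team_id: str) -> List[Item]:
    # Stand-in for: PK = TEAM#{teamId} AND begins_with(SK, "USER#")
    return [
        i for i in items
        if i["PK"] == f"TEAM#{team_id}" and i["SK"].startswith("USER#")
    ]


TABLE = [
    team_item("t1", "Platform"),
    membership_item("t1", "u1", True, "2023-01-10"),
    membership_item("t1", "u2", True, "2023-02-01"),
    membership_item("t2", "u1", False, "2023-03-05"),
]
```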
In my application I have an access pattern of getting a team's team members, given a user id.
Currently, I'm doing two queries in my lambda function:
Get the user's team by querying the GSI on PK = {userId} and filtering on acceptedInvite = true
Get the team data by querying the table on PK = {teamId} and SK begins_with USER#
This works fine, but I'm concerned that I need to perform two separate DynamoDB calls in my API function.
I'm wondering whether there's a better way to represent this access pattern, and whether multiple DynamoDB calls are actually that bad, since I cannot see another way to do this.
Any kind of feedback is appreciated!
The best way to avoid making two queries like this is to supply the API caller with all the information needed to make a single DynamoDB request. In your case this means supplying the caller with the teamId. You can do this either as part of a list operation response or, if it is the authenticated user, as part of their claims in a JWT.
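A minimal sketch of that idea, assuming the teamId arrives in the caller's already-verified JWT claims under a hypothetical `teamId` claim name. The function only builds the parameters for one low-level Query call; the table name is a placeholder.

```python
from typing import Any, Dict


def team_id_from_claims(claims: Dict[str, Any]) -> str:
    # "teamId" is an assumed claim name; use whatever your token issuer sets.
    return claims["teamId"]


def team_members_query(table_name: str, team_id: str) -> Dict[str, Any]:
    # Parameters for a single Query: all USER# items in the team's partition.
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"TEAM#{team_id}"},
            ":sk": {"S": "USER#"},
        },
    }


claims = {"sub": "u1", "teamId": "t42"}  # sample decoded claims
params = team_members_query("AppTable", team_id_from_claims(claims))
```

With boto3, these parameters could be passed to the low-level client as `client.query(**params)`, which replaces the GSI lookup plus table query with one request.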
I do not have much experience and I would like to know if there is an easy way to create user rights and privileges, so that each user can access only specific records from the database tables, based on the level he belongs to.
More specifically, suppose we have a group of companies: the group contains companies, the companies have branches, and the branches have users.
I want a user at the "group of companies" level to be able to access and view all entries in the database related to that group and everything below it (its companies and the branches of those companies).
A user at the "company" level should be able to access and view only the records of that company and of any branches it has in the database.
A user at the "Branch" level should only be able to access and view that branch's records in the database.
And finally, a user at the "End User" level should be able to access and view only the records that user created in the database.
Of course level "administrator" will have access to all records in the database.
I thought of creating a user table with a "User_Level" field and storing USER_ID in every table, so that I can look up a user's level. But how can I restrict access based on the group of companies, the company, or the branch the user belongs to?
In APEX you can create authorization schemes to determine which components a user has access to within an application, but that is only part of the answer to this question. Your question is about filtering the data that is shown to a user based on certain criteria.
There are a couple of possible solutions to this. Since this is a very broad question I'm just going to give you pointers/concepts to start your research. Up to you to determine what solution/combination is most suitable for your implementation.
Concept: Multi-Tenancy
If the data is used by multiple tenants then add a tenant_id to each table that has tenant specific data. In your case a tenant should be a branch. A simple design could be a groups table (to hold branch - companies - company groups), a group_members table (to define relationship between branch - companies - company groups OR between any group and a user) and a users table.
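As a rough sketch of how such a groups/group_members design can drive row filtering, the in-memory example below resolves which branches (tenants) a user may see by walking down the group hierarchy. The group names, the user names, and the single-membership-per-user simplification are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set

# groups: group id -> parent group id (None for the top-level company group)
GROUP_PARENT: Dict[str, Optional[str]] = {
    "holding": None,
    "company_a": "holding",
    "company_b": "holding",
    "branch_a1": "company_a",
    "branch_a2": "company_a",
    "branch_b1": "company_b",
}

# group_members: user -> the group the user is attached to
USER_GROUP: Dict[str, str] = {
    "alice": "holding",    # group-of-companies level
    "bob": "company_a",    # company level
    "carol": "branch_a1",  # branch level
}


def visible_groups(user: str) -> Set[str]:
    # Start at the user's own group and repeatedly add child groups.
    visible = {USER_GROUP[user]}
    changed = True
    while changed:
        changed = False
        for group, parent in GROUP_PARENT.items():
            if parent in visible and group not in visible:
                visible.add(group)
                changed = True
    return visible


def can_see(user: str, record_tenant_id: str) -> bool:
    # A record is visible if its tenant_id (a branch) is within reach.
    return record_tenant_id in visible_groups(user)
```

In a database, the same walk would typically be a hierarchical query over the groups table, with the result used to filter on tenant_id.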
Concept: VPD
This is a feature of the Oracle database that allows a transparent implementation of security rules. In the application you define a simple select like
SELECT * FROM emp
But the VPD implementation will automatically add a where clause to the query so that only the records allowed by the VPD policy are shown. This makes developing the application a lot easier, since there is less room for errors. Note that this database option might not be included in your licence. There is also something called "Poor Man's VPD" that does not use the VPD option. Google how to implement this in your APEX application.
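The effect can be imitated in a few lines to make the idea concrete. This is not Oracle VPD (which is configured inside the database through security policies), only an application-side illustration using SQLite: the `scoped_rows` helper, the `ename` and `tenant_id` columns, and the sample rows are all assumptions.

```python
import sqlite3
from typing import List, Tuple


def scoped_rows(conn: sqlite3.Connection, query: str,
                tenant_id: str) -> List[Tuple]:
    # Wrap the caller's plain query and append the tenant predicate,
    # so the caller never writes the where clause itself.
    wrapped = f"SELECT * FROM ({query}) WHERE tenant_id = ?"
    return conn.execute(wrapped, (tenant_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, tenant_id TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("KING", "branch_a1"), ("SCOTT", "branch_a1"), ("ADAMS", "branch_b1")],
)

# The application still issues the simple select; the filter is added for it.
rows = scoped_rows(conn, "SELECT * FROM emp", "branch_a1")
```

The point of the real feature is that the predicate is enforced by the database itself, so no code path can forget it.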
Just do it all by hand: This is the least preferred option, but it can be done. For every component where a select is done, manually add a where clause to restrict the returned rows. However, this is very maintenance intensive and leaves a ton of room for errors. Obviously, the data model will still have to support the striping of the data.
This blog post by Jeffrey Kemp might give you some pointers as well: https://jeffkemponoracle.com/2017/11/convert-an-apex-application-to-multi-tenant/ - go through the "further reading" section at the bottom.
You can create a PL/SQL function and use it in an authorization scheme under your app's Shared Components -> Authorization Schemes, for example a scheme of the "PL/SQL function returning boolean" kind: return true for the users who should see the component and false to hide it.
Then select this authorization scheme on the APEX components you want to protect, such as items and pages.
Background: I have a relational db background and have never built anything for DynamoDB that wasn't just used for fast writes with very few reads. I am trying to learn DynamoDB patterns by migrating one of my help desk apps from MySQL to DynamoDB.
The application is a fairly simple one from a data storage perspective. A user submits a request and that request generates 1 or more tickets.
Setup: I have screens where people see initial requests and that request's tickets and search views that allow support to query on a bunch of attributes of a ticket (last name of user, status of ticket, use case of ticket, phone number of user, dept of user). This design in a SQL db is pretty straightforward but in Dynamo, I'm really being thrown for a loop on how to structure primary/sort keys and secondary indexes (if necessary).
I created one collection for requests and one collection for tickets. The individual requests have an array of ticket ids that belong to it. The ticket item has an attribute that stores the request id so that I can search that way. But what I am hung up on, is how do I incorporate searching on a ticket/request's attributes without having to do a full scan?
I read about composite keys and perhaps creating a composite sort key similar to: ## so that I can search on each of those fields directly without having to know the primary key (ticket id).
Question: How do you design dynamo collections/tables that require querying a lot of different attribute values without relying on a primary key?
This is typically something that DynamoDB is not good at, which is not to say it cannot be done. The strength and speed of DynamoDB come from having well-known access patterns and designing your schema for those patterns. In general, if you don't know what your users will search for, or there are many different possible queries, it's better to look at something like RDS or a native SQL DB. That being said, a possible direction would be to create a separate list for each of the fields and duplicate the data. This could all be done in the same table.
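One way to read "a list per field, with duplicated data, in the same table" is sketched below: each ticket is written once under its own key and once more per searchable attribute, so each attribute value becomes a partition that can be queried directly. The key formats, attribute names, and the in-memory `find` helper are assumptions for illustration, not a prescribed schema.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]

# Assumed names for the searchable ticket attributes from the question.
SEARCHABLE = ("lastName", "status", "useCase", "phone", "dept")


def ticket_items(ticket: Item) -> List[Item]:
    ticket_key = f"TICKET#{ticket['ticketId']}"
    # Primary copy, addressable by ticket id.
    items = [{"PK": ticket_key, "SK": "TICKET", **ticket}]
    # One duplicated copy per searchable field, keyed by the field's value.
    for field in SEARCHABLE:
        if field in ticket:
            items.append({
                "PK": f"{field.upper()}#{ticket[field]}",
                "SK": ticket_key,
                **ticket,
            })
    return items


def find(items: List[Item], field: str, value: str) -> List[str]:
    # Stand-in for a Query on PK = FIELD#value; no scan of other partitions.
    pk = f"{field.upper()}#{value}"
    return [i["ticketId"] for i in items if i["PK"] == pk]


TABLE: List[Item] = []
TABLE += ticket_items({"ticketId": "T1", "lastName": "Nguyen", "status": "open",
                       "useCase": "password", "phone": "555-0100", "dept": "HR"})
TABLE += ticket_items({"ticketId": "T2", "lastName": "Okafor", "status": "open",
                       "useCase": "vpn", "phone": "555-0101", "dept": "IT"})
```

The cost of this layout is write amplification: every update to a ticket has to touch all of its copies, which is why it only pays off when the set of search fields is small and known up front.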
I see that a CosmosDb can support both graph queries as well as more traditional SQL like queries - however I'm a bit confused about what kind of underlying schema is best at the collections level. If I were to model something in MongoDb or SQL Server, or Neo4j, I would have very different schemas. Also - it seems like I can query using more traditional SQL-like syntax - which makes it confusing about what's right or efficient underneath. Sometimes, making something easy to query does not mean that one should assume that it's an efficient query.
Is CosmosDb at its heart a document database that I should model accordingly, or is it a very different beast?
Example use case
Here's an example- let's say I have:
a user profile
multiple post types (photo, blog, question)
users can like photos
users can comment on photos, blogs, questions
With a sql database I would have tables:
profiles
photos
blogs
questions
and join tables with referential integrity to support the actions:
photoLikes
blogComments
photoComments
questionComments
With a graph database
I would just have the same core tables
profiles
photos
blogs
questions
and just create graph relationship types for like and comment - relying on the code business logic to enforce the rule that you can't like blogs, etc..
With a document db like MongoDb
Again, I might have the same core tables
profiles
photos
blogs
questions
Comments would be sub collections under each - and there would be a question of whether we want to keep the likes as an embedded collection under each profile, or under photos.. and we would have to increment and sync a like count to the other collection (depending on the use case we might create a like collection as well). Comments would be tucked under each photo, blog or question as an embedded collection and not have their own top-level collection.
So my question is this:
How do we model this schema in CosmosDB? Should we model it like a traditional Document Database like MongoDb, or does having access to a graph query allow us additional freedoms like not having to denormalize fields for actions such as "like?"
Azure Cosmos DB database engine is designed to be fully schema-agnostic.
A container (which can be a graph, a collection of documents, or a table) is a schema-agnostic container of arbitrary user-generated content that is automatically indexed upon ingest. To better understand the details, I suggest reading "Schema-Agnostic Indexing with Azure DocumentDB" - http://www.vldb.org/pvldb/vol8/p1668-shukla.pdf, which applies equally to Cosmos DB.
How do we model this schema in CosmosDB? Should we model it like a traditional Document Database like MongoDb, or does having access to a graph query allow us additional freedoms like not having to denormalize fields for actions such as "like?"
When you start modeling data in Azure Cosmos DB, you need to consider: 1. Is your application read heavy or write heavy? 2. How is your application going to query and update data? And so on. Normally, denormalized data models provide better read performance, while normalized models provide better write performance.
This article explains, with examples, how to model document data for NoSQL databases, and shares some scenarios for using embedded data models, normalized data models, and hybrid data models, which should be helpful.
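To make the embedded-versus-normalized trade-off concrete for the photo example from the question, here is a small sketch using plain dictionaries as documents. The field names (`ownerId`, `postId`, `likeCount`, and so on) and sample values are assumptions, not a recommended Cosmos DB schema.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]

# Embedded (denormalized): one read returns the photo with its comments,
# but every new comment rewrites the whole photo document.
photo_embedded: Doc = {
    "id": "photo1",
    "type": "photo",
    "ownerId": "profile1",
    "likeCount": 1,
    "comments": [
        {"authorId": "profile2", "text": "Nice shot"},
        {"authorId": "profile3", "text": "Where is this?"},
    ],
}

# Normalized (referenced): comments are separate documents that point
# back to the photo, so writes are small but reads need a second lookup.
docs: List[Doc] = [
    {"id": "photo1", "type": "photo", "ownerId": "profile1", "likeCount": 1},
    {"id": "c1", "type": "comment", "postId": "photo1",
     "authorId": "profile2", "text": "Nice shot"},
    {"id": "c2", "type": "comment", "postId": "photo1",
     "authorId": "profile3", "text": "Where is this?"},
]


def comments_embedded(photo: Doc) -> List[str]:
    return [c["text"] for c in photo["comments"]]


def comments_normalized(all_docs: List[Doc], post_id: str) -> List[str]:
    return [d["text"] for d in all_docs
            if d["type"] == "comment" and d["postId"] == post_id]
```

Both shapes answer the same question; which one fits depends on whether the application is read heavy or write heavy, as noted above.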
I have a question about the best practice/pattern for ensuring proper authorization in my SOA application. I have a bunch of services that allow users to access certain data (stored in a DB). An example of a typical scenario is...
We have an EMPLOYEE table that has, say, a one-to-many relationship with PROJECT, so an EMPLOYEE can have many PROJECTS. Each employee also belongs to a region. The users of the system are allowed to edit information on employees and projects. However, each user manages data for only a few regions and can therefore modify only employees (and their projects) that belong to a region the user manages. So a user may have access to regions A, B, and C, and can edit employee/project data for a Region A employee but not for a Region Z employee.
I have one service that lets you edit an employee and another that lets you edit a project. Similarly, a project has relationships to other entities, e.g. SCHEDULE, and I have services to edit those as well.
However, my problem is this: whether a user can edit a project or schedule by calling the corresponding service is determined by the region of the employee to which those projects and schedules are related. So for every service that modifies a project, a schedule, or any other entity in the hierarchy of data starting at employee, I have to query the corresponding employee and enforce the region constraints. This can become a very expensive operation, considering the database calls and the number of joins I have to make on every service call (my real example has a lot of such entities and corresponding services). Is there a more elegant, lightweight solution for my scenario?
First of all, I'd call these requirements regular business rules rather than authorization. Authorization is usually much more generic in nature, meaning whether a user is allowed access to a specific system or is allowed to invoke a certain function (regardless of parameters).
Second, and based on a business rules view, you should take into account the consistency needs around these business rules before dividing things up into services.
Third, relating to your statement "my real example has lot of such entities and corresponding services", it's usually not a good idea to have services that correspond to entities.
So, in summary, you need to redesign your service boundaries such that the rules that need to be enforced (in a highly consistent manner) are contained within a single service. One way to do that is to take different parts of a given entity and put them in different services - provided that there aren't any business rules that demand consistency across those diffe