How do I model a 1:N:N relationship in DynamoDB?

I have a school with many classes/grades (1-10), and each class has many students. I need to store each student's record on a yearly basis so that I can partition better. So it's basically Class -> N years -> N students. How do I model this problem so I can store it in DynamoDB?

In NoSQL, the design depends on the query access pattern (QAP), i.e. how you would like to retrieve the data. As you have not mentioned a QAP, I have assumed a typical scenario and provided the design below.
Table : Student
Partition Key : Student Id
Sort Key : year
Other attributes: Student name, class etc.
The year is defined as the sort key because a student can study in multiple grades (1-10) during different years. For example:
2010 - He/She could be on grade 5
2011 - He/She could be on grade 6
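As a rough illustration, here is a minimal boto3 sketch of fetching one student's yearly records with this key design; the table name Student, the attribute names StudentId/Year, and the student ID are placeholders for whatever you actually choose:

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Student")  # placeholder table name

# All yearly records for one student, optionally narrowed to a range of years
resp = table.query(
    KeyConditionExpression=Key("StudentId").eq("S-1001")
    & Key("Year").between(2010, 2012)
)
records = resp["Items"]
```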
If you would like to get all the student IDs for a particular year, you can create a GSI (Global Secondary Index) on the year field.
Partition Key for the index : year
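Assuming such an index exists, a query against it might look like the sketch below (the index name year-index is made up):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Student")  # placeholder table name

# All student records for a given year, via the GSI on the year attribute
resp = table.query(
    IndexName="year-index",  # assumed index name
    KeyConditionExpression=Key("Year").eq(2011),
)
students_in_2011 = resp["Items"]
```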
If you have any other access pattern, please update the question so that we can discuss the answer for that particular query access pattern (QAP).

Related

Single table DynamoDB design tips

I have an old application I am modernizing and bringing to AWS. I will be using DynamoDB for the database and am looking to go with a single table design. This is a multitenant application.
The applications will consist of Organisations, Outlets, Customers & Transactions.
Everything stems from an organisation: an organisation can have multiple outlets, outlets can have multiple customers, and customers can have multiple transactions.
Access patterns are expected to be as follows:
Fetch a customer by its ID
Search for a customer by name or email
Get all customers for a given outlet
Get all transactions for a customer
Get all transactions for an outlet
Get all transactions for an outlet during a given time period (timestamps will be stored with each transaction)
Get all outlets for a given organisation
Get an outlet by its ID
I've been reading up on single table designs and using the primary key and sort keys to enable this sort of access, but right now I can't quite figure out the table/schema design.
The customer will have the OutletID and OrganisationID attached, so I should always know those IDs.
Data Structure (can be modified)
Organisations:
id
Name
Owner
List of Outlets
createdAt (timestamp)
Outlets:
OrganisationId
Outlet Name
Number of customers
Number of transactions
createdAt (timestamp)
Customers:
id
OrganisationID
OutletID
firstName
lastName
email
total transactions
total spent
createdAt (timestamp)
Transactions:
id
customerID
OrganisationID
OutletID
createdAt (timestamp)
type
value
You're off to a great start by having a thorough understanding of your entities and access patterns! I've taken a stab at modeling for these access patterns, but keep in mind this is not the only way to model a solution. Data modeling in DynamoDB is iterative, so it is very likely that this specific design won't fit 100% of your use cases.
With that disclaimer out of the way, let's get into it!
I've modeled your access patterns using a single table named data with global secondary indexes (GSIs) named GSI1 and GSI2. Each GSI has partition and sort keys named GSI1PK/GSI1SK and GSI2PK/GSI2SK respectively.
The base table models the following access patterns:
Fetch customer by ID: getItem where PK=CUST#<id> and SK = A
Fetch all transactions for a customer: query where PK=CUST#<id> and SK begins_with TX
Fetch an outlet by ID: getItem where PK=OUT#<id> and SK = A
Fetch all customers for an outlet: query where PK=OUT#<id>#CUST
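For illustration, here is a minimal boto3 sketch of those base-table lookups; the concrete IDs (CUST#1, OUT#42) are made up:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("data")

# Fetch a customer by ID
customer = table.get_item(Key={"PK": "CUST#1", "SK": "A"}).get("Item")

# Fetch all transactions for a customer
transactions = table.query(
    KeyConditionExpression=Key("PK").eq("CUST#1") & Key("SK").begins_with("TX")
)["Items"]

# Fetch all customers for an outlet
customers = table.query(
    KeyConditionExpression=Key("PK").eq("OUT#42#CUST")
)["Items"]
```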
That last access pattern may require a bit more explanation. I've chosen to model the relationship between outlets and customers using a unique PK/SK pattern where PK is OUT#<id>#CUST and SK is CUST#<id>. When your application records a transaction for a particular customer, it can insert two records into DDB using a batch write operation. The batch write operation would perform two operations:
Write a new Transaction into the Customer partition (e.g. PK = CUST#1 and SK = TX#<id>)
Write a new record to the CUSTOMERLIST partition (e.g. PK = OUT#<id>#CUST and SK = CUST#<id>). If this record already exists, DynamoDB will just overwrite the existing record, which is fine for your use case.
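A rough sketch of that write using boto3's batch writer; the item attributes and IDs here are hypothetical:

```python
import boto3

table = boto3.resource("dynamodb").Table("data")

# batch_writer buffers the puts and sends them as BatchWriteItem requests
with table.batch_writer() as batch:
    # 1) The transaction goes into the customer's partition
    batch.put_item(Item={
        "PK": "CUST#1",
        "SK": "TX#9f2c",
        "GSI1PK": "OUT#42",                # lets GSI1 find transactions by outlet
        "GSI1SK": "2023-07-01T12:00:00Z",  # timestamp for time-range queries
        "type": "sale",
        "value": 1999,
    })
    # 2) The customer-list record goes into the outlet's CUSTOMERLIST partition;
    #    re-writing it on every transaction simply overwrites the same item
    batch.put_item(Item={
        "PK": "OUT#42#CUST",
        "SK": "CUST#1",
    })
```

Note that a batch write is not atomic; if you need both writes to succeed or fail together, a TransactWriteItems request is an alternative.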
Moving on to GSI1:
GSI1 supports the following operations:
Fetch outlets by organization: query GSI1 where GSI1PK = ORG#<id>
Fetch transactions by outlet: query GSI1 where GSI1PK = OUT#<id>
Fetch transactions by outlet for a given time period: query GSI1 where GSI1PK = OUT#<id> and GSI1SK between <period1> and <period2>
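Sketched with boto3 (the outlet ID and ISO-8601 timestamps are examples, and the between condition assumes GSI1SK stores a lexicographically sortable timestamp string):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("data")

# Transactions for one outlet within a time window, via GSI1
resp = table.query(
    IndexName="GSI1",
    KeyConditionExpression=Key("GSI1PK").eq("OUT#42")
    & Key("GSI1SK").between("2023-07-01T00:00:00Z", "2023-07-31T23:59:59Z"),
)
outlet_transactions = resp["Items"]
```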
And finally, there's GSI2:
GSI2 supports the following operations:
Fetch transactions by organization: query GSI2 where GSI2PK = ORG#<id>
Fetch transactions by organization for a given time period: query GSI2 where GSI2PK=ORG#<id> and GSI2SK between <period1> and <period2>
For your final access pattern, you've asked to support searching for customers by email or name. DynamoDB is really good at finding items by their primary key, but it is not good for search where fuzzy or partial matches are expected. If you need an exact match on email or name, you could do that in DynamoDB by incorporating the email/name in the primary key of the customer item.
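One way to get an exact-match email lookup (purely a sketch; the CUSTEMAIL# prefix and attribute names are made up) is to write a small lookup item alongside the customer and fetch it with get_item:

```python
import boto3

table = boto3.resource("dynamodb").Table("data")

# Written whenever a customer is created or their email changes
table.put_item(Item={
    "PK": "CUSTEMAIL#jane@example.com",
    "SK": "A",
    "customerId": "CUST#1",
})

# Exact-match lookup by email
match = table.get_item(Key={"PK": "CUSTEMAIL#jane@example.com", "SK": "A"}).get("Item")
```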
I hope this gives you some ideas on how to model your access patterns!

How to query and order on two separate sort keys in DynamoDB?

GROUPS
userID: string
groupID: string
lastActive: number
birthday: number
Assume I have a DynamoDB table called GROUPS which stores items with these attributes. The table records which users are joined to which groups. Users can be in multiple groups at the same time. Therefore, the composite primary key would most-commonly be:
partition key: userID
sort key: groupID
However, if I wanted to query for all users in a specific group, within a specific birthday range, sorted by lastActive, is this possible and if so what index would I need to create?
Could I synthesize lastActive and userID to create a synthetic sort key, like so:
GROUPS
groupID: string
lastActiveUserID: string (e.g. "20201230T09:45:59-abc123")
birthday: number
Which would make for a different composite primary key where the partition key is groupID and the sort key is lastActiveUserID, which would sort the participants by when they were last active, and then a secondary index to filter by birthday?
As written, no, this isn't possible.
within a specific birthday range
implies sk_birthday between :start and :end
sorted by lastActive
implies lastActive as a sort key.
which are mutually exclusive...I can't devise a sort key that would be able to contain both values in a usable format.
You could have a Global Secondary Index with a hash key of group-id and lastActive as a sort key, then use a filter expression on birthday. But that only affects the data returned; it doesn't affect the data read nor the cost to read that data. Additionally, since DDB only reads 1MB of data at a time, you'd have to call it repeatedly in a loop if it's possible a given group has more than 1MB worth of members.
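To make the 1MB-page point concrete, a paginated query against such a GSI might look like this boto3 sketch (the index name, group ID, and birthday range are all made up):

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("GROUPS")

items = []
kwargs = {
    "IndexName": "groupID-lastActive-index",                 # assumed GSI name
    "KeyConditionExpression": Key("groupID").eq("GROUPA"),
    "FilterExpression": Attr("birthday").between(631152000, 662688000),  # example range
    "ScanIndexForward": False,                               # most recently active first
}
while True:
    resp = table.query(**kwargs)
    items.extend(resp["Items"])
    if "LastEvaluatedKey" not in resp:                       # no more 1MB pages left
        break
    kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]
```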
Also, when your index has a different partition (hash) key than your table, that is a global secondary index (GSI). If your index has the same partition key but a different sort key than the table, that can be done with a local secondary index (LSI).
However, for any given query you can only use the table or a single index; you can't use multiple indexes at the same time.
Now, having said all that, what exactly do you mean by "specific birthday range"? If the range in question is a defined period (by month, by week), perhaps you could have a GSI where the hash key is "group-id#birthday-period" and the sort key is lastActive.
So for instance, "give me GROUPA birthdays for next month"
Query(hs = "GROUPA#NOVEMBER")
But if you wanted November and December, you'd have to make two queries and combine & sort the results yourself.
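Sketched in boto3 under that assumption (the GSI and attribute names groupBirthdayPeriod / lastActive are placeholders), the two-month case would be two queries merged and re-sorted client-side:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("GROUPS")

def members_for(period):
    # assumed GSI: hash key "groupBirthdayPeriod", range key "lastActive"
    resp = table.query(
        IndexName="groupBirthdayPeriod-lastActive-index",
        KeyConditionExpression=Key("groupBirthdayPeriod").eq(period),
    )
    return resp["Items"]

rows = members_for("GROUPA#NOVEMBER") + members_for("GROUPA#DECEMBER")
rows.sort(key=lambda item: item["lastActive"], reverse=True)  # re-sort the merged result
```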
Effective and efficient use of DDB means avoiding Scan() and avoiding the use of filterExpressions that you know will throw away lots of the data read.

How best to perform a query on primary partition key only, for a table which has both partition key and sort key?

Ok, I have a table with a primary partition key (Employee ID) and sort key (Project ID). Now I want a list of all projects an employee works on. I also want a list of all employees working on a project. The relationship is many-to-many. I have created the schema in AppSync (GraphQL). AppSync created the required queries and mutations for the type (EmployeeProjects). Now ListEmployeeProjects takes a filter input with different attributes. My question is: when I do the two searches on Employee ID or Project ID only, will it be a complete table scan? How efficient will that be? If it is a table scan, can I reduce the time complexity by creating indexes (GSI or LSI)? The end product will have a huge amount of data, so I cannot test the app with such data beforehand. My project works fine, but I am worried about the problems that might arise later on with a lot of data. Can someone please help?
You don't need to (and should not) perform a Scan for this.
To get all of the projects an employee is working on, you just need to perform a Query on the base table, specifying employee ID as the partition key.
To get all of the employees on a project, you should create a GSI on the table. The partition key should be project ID and sort key should be employee ID. Then perform a Query on the GSI, using partition key of project ID.
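As a rough boto3 sketch (the table name, index name, attribute names, and IDs below are assumptions, not the AppSync-generated names):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("EmployeeProjects")  # assumed table name

# All projects an employee works on: query the base table on its partition key
projects = table.query(
    KeyConditionExpression=Key("employeeId").eq("E-100")
)["Items"]

# All employees on a project: query the GSI keyed by project ID
employees = table.query(
    IndexName="projectId-employeeId-index",                   # assumed GSI name
    KeyConditionExpression=Key("projectId").eq("P-200")
)["Items"]
```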
In order to model this correctly, you will probably want three tables:
Employee Table
Project Table
Employee-Project reference table (i.e. just two attributes of employee ID and project ID)

How to store one-to-many relationship data in DynamoDB

As per my data model, I need to store many-to-many relationship data items in DynamoDB.
Example :
I have a field called studentId, and every studentId will have several subjects assigned to it.
Requirement :
For a given studentId, I need to store all the subjects, so that I can get all the subjects assigned to a given student.
Similarly, for a given subjectId, I need to know the studentIds to whom that subject has been assigned.
I am planning to store this in DynamoDB as follows:
Table1 : StudentToSubjects :
Hash Key : StudentId,
RangeKey: subjectId
so that if I query using only the hash key, it would give me all the rows having that hash key, with all the different range keys.
Secondary index (GSI):
secondary HashKey: subjectId
Secondary RangeKey: studentId
I wanted to know if this makes sense or the right thing to do. Or there are better ways to solve this problem.
Your design looks OK, but you need to think it through before finalizing it. Say you have implemented this design; after 10 years, when you query the table for a particular subject (via the secondary index/GSI), you will get all the students of the past 10 years, which you might not need.
I would probably go with the following:
Student Master:
Hash Key: studentId
subjectIds (Number-set or String-set)
Subject Master:
Hash Key: subjectId
Range Key: Year
studentIds (Number-set or String-set)
The advantage of this is that you will consume fewer queries; for a particular subject or student you will consume only 1 read (if the item size is less than 4 KB).
Again, this is just scratching the surface; think of all the queries before finalizing the schema.
Edit: You don't need to repeat the studentId; it will remain unique.
It would look something like this:
studentId -> s1
subjectIds -> sub1,sub2,subN (This is a set)
studentId -> s2
subjectIds -> sub3,sub4
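To show how the set would be maintained, here is a minimal boto3 sketch (the table name StudentMaster and the IDs are placeholders) that assigns a new subject by adding it to the student item's string set:

```python
import boto3

students = boto3.resource("dynamodb").Table("StudentMaster")  # assumed table name

# Assign subject "sub5" to student "s1" by adding it to the subjectIds string set
students.update_item(
    Key={"studentId": "s1"},
    UpdateExpression="ADD subjectIds :s",
    ExpressionAttributeValues={":s": {"sub5"}},  # a Python set is stored as a DynamoDB string set
)
```

The mirror update on the Subject Master item (adding the studentId to its studentIds set) would look the same.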
For the available data types, you can refer to: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataModel.html#DataModel.DataTypes

SSAS - "Count" measure with multiple members

I am new to SSAS and I'm having some trouble with a "count of rows" measure.
I'm doing this all with named queries from a MySQL database and have created two logical tables in my DSV. The first table is called "Person" and has a primary key of "Person GUID". The second is called "Role" and has a primary key of "Role GUID" and a foreign key of "Person GUID". This is acting as a dimension table and also has an attribute of "Role Name".
What I want to do is be able to select a role from my dimension table and have this show me the number of people in that role, using a "count of records" measure from the Person table. The problem is, people can hold multiple roles, and the way it is structured in my "Role" table is that there is a separate row for each role that a person might have...in other words, "Person GUID", which is how it is mapped to the measure group, could be duplicated many times.
This is not working in SSAS; it doesn't seem to be giving me an accurate count. It appears to be considering only the role of the first instance of a particular Person GUID.
I know that I must be looking at this the wrong way...any help that anyone could offer would be much appreciated. I understand that I could just do a count of rows on the "Roles" table and then be done with it but because I have other dimensions that I want to correlate with it that are also mapped to the "Person" table, this isn't an acceptable solution for me. (These other dimensions have "Person GUID" as the primary key and thus don't have the same problem)
Sounds like you need to model a many-to-many dimension relationship between the person fact table and the role dimension.
Your current role dimension sounds like it needs to be split out into a new bridge table mapping persons to roles. The other table is a simplified role table joining to the mapping table.
