How to store one to many relationship data in dynamodb - amazon-dynamodb

As per my data model, I need to store many-to-many relationship data in DynamoDB.
Example:
I have a field called studentId, and every student has several subjects assigned.
Requirement:
For a given studentId, I need to store all the subjects and be able to get all the subjects assigned to that student.
Similarly, for a given subjectId, I need to know the studentIds that subject has been assigned to.
I am planning to store this in DynamoDB as follows:
Table1: StudentToSubjects
Hash Key: studentId
Range Key: subjectId
so that if I query using only the hash key, I get all the rows with that hash key and all their different range keys.
Secondary index:
Secondary Hash Key: subjectId
Secondary Range Key: studentId
I wanted to know whether this makes sense, or whether there are better ways to solve this problem.
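As a rough illustration of the layout described in the question, the sketch below builds the keyword arguments one could pass to boto3's `create_table`. The index name `SubjectToStudents`, the string key types, the `KEYS_ONLY` projection and on-demand billing are assumptions made for the example, not part of the question.

```python
def student_subject_table_definition(table_name="StudentToSubjects",
                                     index_name="SubjectToStudents"):
    """Build create_table keyword arguments for the table plus its inverted GSI.

    index_name, the "S" key types, KEYS_ONLY and PAY_PER_REQUEST are
    illustrative assumptions.
    """
    return {
        "TableName": table_name,
        # Only attributes used in a key schema are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "studentId", "AttributeType": "S"},
            {"AttributeName": "subjectId", "AttributeType": "S"},
        ],
        # Base table: all subjects of one student share a hash key.
        "KeySchema": [
            {"AttributeName": "studentId", "KeyType": "HASH"},
            {"AttributeName": "subjectId", "KeyType": "RANGE"},
        ],
        # GSI: the same two attributes with their roles swapped.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": [
                    {"AttributeName": "subjectId", "KeyType": "HASH"},
                    {"AttributeName": "studentId", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With a configured client this would be used as `client.create_table(**student_subject_table_definition())`.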

Your design looks OK, but think it through before finalizing it. Say you have implemented this design: after 10 years, when you query the GSI for a particular subject, you will get all the students of the past 10 years, which you might not need.
I would probably go with the following:
Student Master:
Hash Key: studentId
subjectIds (Number Set or String Set)
Subject Master:
Hash Key: subjectId
Range Key: Year
studentIds (Number Set or String Set)
The advantage is that you consume fewer reads: for a particular subject or student you need only one read (if the item is smaller than 4 KB).
Again, this only scratches the surface; think of all your queries before finalizing the schema.
Edit: You don't need to repeat the studentId; it remains unique.
It would look something like this:
studentId -> s1
subjectIds -> sub1, sub2, subN (this is a set)
studentId -> s2
subjectIds -> sub3, sub4
For the data types, see http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataModel.html#DataModel.DataTypes
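A minimal sketch of how the set-based layout could be maintained: the function below builds the two `update_item` requests (in the low-level client format) that add a subject to a student's set and the student to the subject's set for one year. The table names `StudentMaster` and `SubjectMaster` and the string/number types are assumptions for illustration.

```python
def enroll_requests(student_id, subject_id, year):
    """Build update_item kwargs for both master tables.

    Table names and attribute types are illustrative assumptions.
    ADD on a set attribute creates the set if it is missing and
    ignores values that are already present.
    """
    student_update = {
        "TableName": "StudentMaster",
        "Key": {"studentId": {"S": student_id}},
        "UpdateExpression": "ADD subjectIds :subject",
        "ExpressionAttributeValues": {":subject": {"SS": [subject_id]}},
    }
    subject_update = {
        "TableName": "SubjectMaster",
        "Key": {"subjectId": {"S": subject_id}, "Year": {"N": str(year)}},
        "UpdateExpression": "ADD studentIds :student",
        "ExpressionAttributeValues": {":student": {"SS": [student_id]}},
    }
    return student_update, subject_update
```

Note that these are two separate writes; if both sets must change together, they would have to go through a transaction.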

Related

Random Sampling of size N in Dynamo DB without full Table scan

I am new to DynamoDB and am having trouble finding a way to get random items without a full table scan; most of the algorithms I found rely on full table scans.
I am also considering the case where we have no additional information about the table (column names and column types are unknown).
Is there a way to do this?
You can randomly sample by using a randomly generated exclusive start key for the scan or query operation. The exclusive start key does not have to match a record in the table. It just needs to follow the key structure of the table/index.
As with most questions about queries in DynamoDB, how you structure your data depends on how you want to query it.
For something like random sampling, you have to make it conform to the following core constraints of DynamoDB:
You have to provide a partition key
You can provide a sort key
So with a "single table" type design, you could structure your data something like this:
PK            | SK                                   | myVal
--------------|--------------------------------------|------
my_dict       | 6caaf1e3-eb8d-404a-a2ae-97d6682b0224 | foo
my_dict       | 1c5496e8-c660-4b4e-980f-4abfb1942863 | bar
my_dict       | 56551340-fff8-4824-a5be-70fcaece2e1a | baz
my_other_dict | 520a7b37-233c-49dd-87da-77d871d98c92 | test1
my_other_dict | 65ccd54e-72c3-499d-a3a7-0cd989252607 | test2
The PK is the identifier for your collection of random things to look up. The SK is a random UUID. And myVal contains the value you want to be returned.
You can query this db the following way:
SELECT * FROM "my-table" WHERE PK = 'my_dict' AND SK < '06a04e20-b239-48f2-a205-552eb61fef35'
By querying with a UUID as the SK bound, you get the items whose UUIDs sort next to the one you query for. By using a new random UUID each time you query, you get a random result back.
The particular query above actually returns nothing, because every SK under my_dict sorts above '06a04e20-...', so you need to retry until you get a result.
Also, I haven't done the math (who has?), but I'd imagine that periodic queries like this won't generate perfectly random distributions, especially for small data sets.
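To make the idea concrete, here is a small in-memory sketch, not a DynamoDB call: a sorted list stands in for the sort keys of one partition, and a random UUID is used as the probe. It is a variant of the query above: it takes the first key at or after the probe (roughly what a query with `SK >= probe` and a limit of 1 would return) and wraps around to the first key instead of retrying. The function name `pick_random` is hypothetical.

```python
import bisect
import uuid


def pick_random(sorted_keys, probe=None):
    """Return the first key >= probe, wrapping to the start if none exists.

    sorted_keys stands in for the sort keys of one partition, already in
    ascending order. probe defaults to a fresh random UUID string.
    """
    if not sorted_keys:
        raise ValueError("partition is empty")
    if probe is None:
        probe = str(uuid.uuid4())
    index = bisect.bisect_left(sorted_keys, probe)
    # index == len(sorted_keys) means the probe sorts after every key.
    return sorted_keys[index % len(sorted_keys)]


keys = sorted([
    "6caaf1e3-eb8d-404a-a2ae-97d6682b0224",
    "1c5496e8-c660-4b4e-980f-4abfb1942863",
    "56551340-fff8-4824-a5be-70fcaece2e1a",
])
sample = pick_random(keys)
```

A key is chosen with probability proportional to the gap between it and its predecessor, so the result is only approximately uniform, and noticeably skewed for small data sets.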

OR-query on multiple attributes in Amazon DynamoDB

I have a table like this:
Transports

id (PK) | createDt | shipperId | carrierId | consigneeId
--------|----------|-----------|-----------|------------
1       | 23       | contact3  | contact2  | contact1
2       | 24       | contact1  | contact2  | contact3
3       | 28       | contact3  | contact2  | contact4
My access pattern is:
Find all transports where a contact was either shipper, carrier or consignee, sorted by createDt. E.g. entering contact1 should return records 1 and 2.
How can I do this in DynamoDB?
I thought about creating a GSI, but then I would need a separate GSI for each column, which means I would have to merge the query results across the columns myself. Perhaps there is an easier way.
I'd create a GSI on the table and split your single record up into multiple ones.
That would make writes slightly more complex, because you write multiple entities, but I'd do something like this:
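PK       | SK           | type              | GSI1PK           | GSI1SK        | other attributes
---------|--------------|-------------------|------------------|---------------|-------------------------------------------------
TRANSP#1 | TRANSP#1     | transport         |                  |               | createDt, (shipperId, carrierId, consigneeId)...
TRANSP#1 | CONTACT#SHIP | shipper-contact   | CONTACT#contact3 | TRANSP#1#SHIP | ...
TRANSP#1 | CONTACT#CARR | carrier-contact   | CONTACT#contact2 | TRANSP#1#CARR | ...
TRANSP#1 | CONTACT#CONS | consignee-contact | CONTACT#contact1 | TRANSP#1#CONS | ...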
To get all information about a given transport ID, you query with PK=TRANSP#<id>.
To get just the basic information about a given transport, you can do a GetItem on PK=TRANSP#<id> and SK=TRANSP#<id>. (You could also duplicate the contact info here if it's fairly static.)
To get all transports a contact is involved in, you query GSI1 with GSI1PK=CONTACT#<id> and GSI1SK beginning with TRANSP.
If you really need server-side sorting, you might choose a different GSI1SK, maybe prefix it with the dt value, but I'd probably just do that client side.
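The split into multiple items can be sketched in plain Python. The code below is an in-memory illustration, not a DynamoDB call: `transport_items` turns one Transports row into the base item plus three contact items, and `transports_for_contact` emulates the GSI1 lookup followed by client-side sorting. Copying `createDt` onto each contact item is an assumption made so that sorting needs no second read.

```python
ROLES = (
    ("SHIP", "shipper", "shipperId"),
    ("CARR", "carrier", "carrierId"),
    ("CONS", "consignee", "consigneeId"),
)


def transport_items(transport):
    """Split one transport row into a base item plus one item per contact role."""
    pk = f"TRANSP#{transport['id']}"
    items = [{"PK": pk, "SK": pk, "type": "transport",
              "createDt": transport["createDt"]}]
    for code, role, field in ROLES:
        items.append({
            "PK": pk,
            "SK": f"CONTACT#{code}",
            "type": f"{role}-contact",
            "GSI1PK": f"CONTACT#{transport[field]}",
            "GSI1SK": f"{pk}#{code}",
            # Assumption: createDt is duplicated here for client-side sorting.
            "createDt": transport["createDt"],
        })
    return items


def transports_for_contact(items, contact_id):
    """Emulate the GSI1 query, sort by createDt, drop duplicate transports."""
    hits = [item for item in items
            if item.get("GSI1PK") == f"CONTACT#{contact_id}"
            and item["GSI1SK"].startswith("TRANSP")]
    hits.sort(key=lambda item: item["createDt"])
    ordered = []
    for item in hits:
        if item["PK"] not in ordered:  # same contact may hold two roles
            ordered.append(item["PK"])
    return ordered


rows = [
    {"id": 1, "createDt": 23, "shipperId": "contact3", "carrierId": "contact2", "consigneeId": "contact1"},
    {"id": 2, "createDt": 24, "shipperId": "contact1", "carrierId": "contact2", "consigneeId": "contact3"},
    {"id": 3, "createDt": 28, "shipperId": "contact3", "carrierId": "contact2", "consigneeId": "contact4"},
]
all_items = [item for row in rows for item in transport_items(row)]
```

With the sample data from the question, `transports_for_contact(all_items, "contact1")` yields the transports with ids 1 and 2, as the access pattern requires.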

Do I need a sort key or should I use AWS DAX

I have a dilemma, and I know I should have used an SQL DB from the beginning.
I am unsure if I can use a sort key for my particular use case. I have a table whose items have multiple attributes: brand, model ref, reference... What I am trying to do is let the user select the brand, then the model, then the reference, etc., then get all products that match those criteria and compute the mean of their prices.
Doing a scan of the whole table, which has 300K+ items, is not very cost-effective to say the least, but this is the situation I am in.
My question is how can I most cost effectively do what I want to do?
Let the table T have only a partition key: ID.
For the sake of simplicity, let your client choose n = 3 attributes: brand, model-ref, reference.
Now define a Global Secondary Index (GSI) with partition key brand_model-ref_reference and sort key ID. Each item stores that attribute as its three values joined with "#". I suggest using Projection: ALL.
Thus, when your client has chosen its 3 values: a, b, c, all you have to do is to query the GSI with brand_model-ref_reference = "a#b#c". You will efficiently fetch all and only the items you need to compute your average. The size of the table is no longer of any importance.
Notes:
With this solution you have to fix the number of criteria in advance, and the client must choose a value for every one of them. Not so nice.
If there are more constraints, this solution stops being useful as is. Use it as a hint. :)
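A minimal sketch of the composite-key idea, using a plain list in place of the GSI: `composite_key` builds the "a#b#c" value, and `mean_price` averages the matching items. The `price` attribute name is an assumption (the question only says the items have prices), and the values are assumed not to contain "#" themselves.

```python
from statistics import mean

GSI_PARTITION_KEY = "brand_model-ref_reference"


def composite_key(brand, model_ref, reference):
    """Join the three chosen values the same way they are stored on each item."""
    return "#".join((brand, model_ref, reference))


def mean_price(items, brand, model_ref, reference):
    """Average the prices of items whose composite key matches.

    The list comprehension stands in for a GSI query on the composite key.
    Returns None when nothing matches.
    """
    key = composite_key(brand, model_ref, reference)
    prices = [item["price"] for item in items
              if item[GSI_PARTITION_KEY] == key]
    return mean(prices) if prices else None


products = [
    {"ID": "1", GSI_PARTITION_KEY: composite_key("a", "b", "c"), "price": 100},
    {"ID": "2", GSI_PARTITION_KEY: composite_key("a", "b", "c"), "price": 200},
    {"ID": "3", GSI_PARTITION_KEY: composite_key("a", "b", "d"), "price": 999},
]
```

Because the index is queried by the exact composite value, the work done depends on the number of matching items, not on the size of the table.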

Recommended Schema for DynamoDB calendar/event like structure

I'm pretty new to DynamoDB design and trying to get the correct schema for my application. In this app different users will enter various attributes about their day. For example "User X, March 1st 12:00-2:00, Tired". There could be multiple entries for a given time, or overlapping times (e.g. tired from 12-2 and eating lunch from 12-1).
I'll need to query based on user and time ranges. Common queries:
Give me all the "actions" for user X between time t1 and t2
Give me all the start times for action Z for user X
My initial thought was that the partition key would be the user ID and the range key the start time, but that won't work because of duplicate start times, right?
A second thought:
UserID - Partition Key
StartTime - RangeKey
Action - JSON document of all actions for that start time
[{"action": "Lunch", "endTime": "1pm"}, {"action": "tired", "endTime": "2pm"}]
Any recommendation on a proper schema?
There isn't really a single solution here. You will need to evaluate several options depending on your use case: how much data you have, how often you query, by which fields, and so on.
But one good solution is to structure your schema like this:
Generated UUID as partition key
UserID
Start time (in Unix epoch time or ISO 8601 format)
Advantages
Can handle multiple time zones
Can easily query by userID and start date (you will need a secondary index with partition key userID and sort key start time)
More even distribution of your data across DynamoDB partitions, with fewer hot keys, because of the randomly generated partition key.
Disadvantages
More data for every item because of the UUID (16 bytes as binary, 36 characters in the usual string form)
Additional cost for the new secondary index. Note that scanning the table is generally much more expensive than maintaining a secondary index.
This is pretty close to your initial thought. To give a more precise answer, we would need a lot more information about how many writes and reads you are planning and what kinds of queries you need.
You are right in that UserID as Partition key and StartTime as rangeKey would be the obvious choice, if it wasn't for the fact of your overlapping activities.
I would consider going for
UserID - Partition Key
StartTime + uuid - RangeKey
StartTime - Plain old attribute
Datetimes in DynamoDB just get stored as strings anyway. So the idea here is that you have StartTime + some uuid as your rangekey, which gives you a sortable table based on datetime whilst also assuring you have unique primary keys. You could then store the StartTime in a separate attribute or have a function for adding/removing the uuid from the StartTime + uuid attribute.
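The helper functions mentioned in this answer could look roughly like the sketch below. The "#" separator, the function names and the use of fixed-width ISO 8601 timestamps are assumptions for illustration. `range_bounds` returns bounds for a BETWEEN condition on the range key; the upper bound is padded with "~", which sorts after every character a UUID can contain, so entries starting exactly at t2 are still included.

```python
import uuid

SEPARATOR = "#"  # assumed separator between start time and UUID


def make_sort_key(start_time_iso):
    """Append a random UUID so equal start times still give unique keys."""
    return f"{start_time_iso}{SEPARATOR}{uuid.uuid4()}"


def start_time_of(sort_key):
    """Strip the UUID suffix again to recover the start time."""
    return sort_key.rsplit(SEPARATOR, 1)[0]


def range_bounds(t1_iso, t2_iso):
    """Lower and upper range-key bounds covering start times t1..t2 inclusive.

    Assumes all timestamps use the same fixed-width ISO 8601 format, so that
    string order matches chronological order.
    """
    return t1_iso, f"{t2_iso}{SEPARATOR}~"


lunch = make_sort_key("2024-03-01T12:00:00Z")
tired = make_sort_key("2024-03-01T12:00:00Z")
low, high = range_bounds("2024-03-01T12:00:00Z", "2024-03-01T14:00:00Z")
```

Here `lunch` and `tired` share a start time but differ as keys, and both fall between `low` and `high`.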

How do I model 1:N:N relationship in Dynamo DB

I have a school with many classes/grades (1-10), and each class has many students. I need to store a student's record on a yearly basis so that I can partition better. So it's basically Class -> N years -> N students. How do I model this in DynamoDB?
In NoSQL, the design depends on the query access pattern (QAP). As you have not mentioned the QAP, that is, how you would like to retrieve the data, I have assumed a typical scenario and provided the design below.
Table : Student
Partition Key : Student Id
Sort Key : year
Other attributes: Student name, class etc.
The year is defined as the sort key because a student can study in multiple grades (1-10) during different years. For example:
2010 - He/She could be in grade 5
2011 - He/She could be in grade 6
If you would like to get all the student IDs for a particular year, you can create a GSI (Global Secondary Index) on the year field.
Partition Key for the index : year
If you have any other access pattern, please update the question so that we can discuss the answer for that particular query access pattern (QAP).
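The two access patterns assumed in this answer could be expressed as the following request builders (low-level client format). The attribute name `studentId`, the index name `YearIndex` and the attribute types are assumptions for the sketch. `year` is on DynamoDB's reserved-word list, so the expressions refer to it through a placeholder.

```python
def student_history_query(student_id, table_name="Student"):
    """All yearly records of one student, ordered by the year sort key."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "studentId = :sid",
        "ExpressionAttributeValues": {":sid": {"S": student_id}},
    }


def students_in_year_query(year, table_name="Student", index_name="YearIndex"):
    """All student records for one year, read through the GSI on year.

    index_name is a hypothetical name for the GSI described above.
    """
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # "#yr" avoids using the reserved word directly in the expression.
        "KeyConditionExpression": "#yr = :yr",
        "ExpressionAttributeNames": {"#yr": "year"},
        "ExpressionAttributeValues": {":yr": {"N": str(year)}},
    }
```

Either dictionary can be passed to a client as `client.query(**params)`.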
