As a lockdown project I'm introducing myself to the concept of multi-tenant applications. My simple application has a tenant who has an online shop front. The shop has product categories, each containing many products. My initial thought on the database schema is as follows:
+====================================================================================+
| Primary Key | Sort Key (GSI PK) | Attribute 1 (GSI SK) | Attribute 2 | Attribute 3 |
|-------------|-------------------|----------------------|-------------|-------------|
| TENANT-uuid | CATEGORY-uuid | categoryName | ... | ... |
| TENANT-uuid | PRODUCT-uuid | productName | ... | ... |
| TENANT-uuid | PRODUCT-uuid | productName | ... | ... |
+====================================================================================+
So our GSI looks like this:
+=======================================================================================+
| Primary Key | Sort Key | Attribute 1 (PK) | Attribute 2 (SK) | Attribute 3 |
|---------------|-------------------|------------------|------------------|-------------|
| CATEGORY-uuid | categoryName | TENANT-uuid | CATEGORY-uuid | ... |
| PRODUCT-uuid | productName | TENANT-uuid | PRODUCT-uuid | ... |
| PRODUCT-uuid | productName | TENANT-uuid | PRODUCT-uuid | ... |
+=======================================================================================+
If I were to implement the following role policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:PutItem"
],
"Resource": [
"arn:aws:dynamodb:XXX:XXX:table/XXX"
],
"Condition": {
"ForAllValues:StringEquals": {
"dynamodb:LeadingKeys": [
"TENANT-uuid"
]
}
}
}
]
}
How does the LeadingKeys condition work if we're running a query on an index?
Update 1
So upon further inspection it seems one way to do this (for this situation) is to have a GSI with the partition key as the TENANT-uuid and the sort key as the item's parent. I've realised I should probably add slightly more information as follows.
Our desired outcomes are:
1. Get list of tenant's categories -> Query with PK = TENANT-uuid and SK BeginsWith "CATEGORY"
2. Get list of tenant's products -> Query with PK = TENANT-uuid and SK BeginsWith "PRODUCT"
3. Get list of products in a specific tenant's category -> ???
4. Get single tenant's category -> Query with PK = TENANT-uuid and SK = CATEGORY-uuid
5. Get single tenant's product -> Query with PK = TENANT-uuid and SK = PRODUCT-uuid
As it stands the only one that was an issue was number 3. A little reorganisation of the schema as follows seems to work. However it does limit our ability to sort our data slightly.
Table
+----------------------+---------------+-----------------+-------------+
| TenantID (PK/GSI PK) | ItemType (SK) | Data - (GSI SK) | Attribute 2 |
+----------------------+---------------+-----------------+-------------+
| TENANT-uuid | CATEGORY-1 | Category Name | ... |
+----------------------+---------------+-----------------+-------------+
| TENANT-uuid | PRODUCT-1 | CATEGORY-1 | ... |
+----------------------+---------------+-----------------+-------------+
| TENANT-uuid | PRODUCT-2 | CATEGORY-1 | ... |
+----------------------+---------------+-----------------+-------------+
Index
+---------------+---------------+------------+-------------+
| TenantID (PK) | Data (SK) | ItemType | Attribute 2 |
+---------------+---------------+------------+-------------+
| TENANT-uuid | Category Name | CATEGORY-1 | ... |
+---------------+---------------+------------+-------------+
| TENANT-uuid | CATEGORY-1 | PRODUCT-1 | ... |
+---------------+---------------+------------+-------------+
| TENANT-uuid | CATEGORY-1 | PRODUCT-2 | ... |
+---------------+---------------+------------+-------------+
So now, for number 3, to get a list of products in a specific tenant's category we query the index with PK = TENANT-uuid and SK=CATEGORY-uuid
This allows us to meet the leadingKeys condition.
However, I'm not sure if this is the best solution. For the time being, in my little project, it works.
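As a rough illustration of the reorganised schema above, the following Python sketch models the table and the index as an in-memory list and runs the access patterns against it. Nothing here talks to DynamoDB; the helper name `query` and the sample IDs are made up for the example, and the index is assumed to use the `Data` attribute as its sort key, as in the tables above.

```python
# In-memory sketch of the reorganised table; not a real DynamoDB client.
ITEMS = [
    {"TenantID": "TENANT-a", "ItemType": "CATEGORY-1", "Data": "Category Name"},
    {"TenantID": "TENANT-a", "ItemType": "PRODUCT-1", "Data": "CATEGORY-1"},
    {"TenantID": "TENANT-a", "ItemType": "PRODUCT-2", "Data": "CATEGORY-1"},
    {"TenantID": "TENANT-b", "ItemType": "PRODUCT-9", "Data": "CATEGORY-7"},
]


def query(sort_attr, tenant_id, equals=None, begins_with=None):
    """Mimic a Query: the partition key must match exactly, and the sort key
    is optionally constrained by equality or by a prefix."""
    result = []
    for item in ITEMS:
        if item["TenantID"] != tenant_id:
            continue
        sort_value = item[sort_attr]
        if equals is not None and sort_value != equals:
            continue
        if begins_with is not None and not sort_value.startswith(begins_with):
            continue
        result.append(item)
    return sorted(result, key=lambda item: item[sort_attr])


# Table queries use ItemType as the sort key; index queries use Data.
categories = query("ItemType", "TENANT-a", begins_with="CATEGORY")
products = query("ItemType", "TENANT-a", begins_with="PRODUCT")
in_category = query("Data", "TENANT-a", equals="CATEGORY-1")  # number 3, on the index
one_product = query("ItemType", "TENANT-a", equals="PRODUCT-1")

print([item["ItemType"] for item in in_category])
```

Every call passes the tenant ID as the partition key, on the table and on the index alike, which is the property the LeadingKeys condition relies on.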
After almost giving up, I have found a solution. See this SO post describing how you can use wildcards in the IAM policy. Then, in your GSIs, you could prefix each of your IDs with the tenant ID. Using your second table as an example, replace CATEGORY-uuid with TENANT-uuid-CATEGORY-uuid.
And then your policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:PutItem"
],
"Resource": [
"arn:aws:dynamodb:XXX:XXX:table/XXX"
],
"Condition": {
"ForAllValues:StringLike": {
"dynamodb:LeadingKeys": [
"TENANT-uuid*"
]
}
}
}
]
}
I tested this quickly and it works just fine, and this is the approach I plan to use in my multi-tenant app.
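To make the prefix matching concrete, here is a small Python sketch that approximates how a StringLike pattern with a trailing `*` behaves. It uses `fnmatch.fnmatchcase` as a stand-in for IAM's matcher, which is an assumption (IAM is not involved, and `fnmatch` also treats `[` specially, which IAM does not), and the tenant IDs are invented. One thing it shows is that a pattern without a delimiter also matches any other tenant whose ID starts with the same characters.

```python
from fnmatch import fnmatchcase


def leading_key_allowed(partition_key, pattern):
    """Approximate an IAM StringLike check on a single leading key."""
    return fnmatchcase(partition_key, pattern)


# Keys prefixed with the tenant ID, as suggested for the GSI.
own_item = leading_key_allowed("TENANT-1-CATEGORY-42", "TENANT-1*")
other_tenant = leading_key_allowed("TENANT-2-CATEGORY-42", "TENANT-1*")
prefix_clash = leading_key_allowed("TENANT-10-CATEGORY-42", "TENANT-1*")
with_delimiter = leading_key_allowed("TENANT-10-CATEGORY-42", "TENANT-1-*")

print(own_item, other_tenant, prefix_clash, with_delimiter)
```

With fixed-length IDs such as UUIDs the clash cannot occur, but including the separator in the pattern (for example `TENANT-uuid-*`) seems the safer habit.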
I have a table with a DATETIME field, which is indexed by a BTree. Now I want to query it with the following statement:
SELECT
count(us.CITY) as metric,
us.CITY as Name,
us.LATITUDE as latitude,
us.LONGITUDE as longitude
FROM
FACT
LEFT JOIN
USER us
ON
us.ID_USER = FACT.USER
WHERE
ASSESSMENT_DATE BETWEEN FROM_UNIXTIME(1601568552) AND FROM_UNIXTIME(1604028277)
GROUP BY us.CITY, us.LATITUDE, us.LONGITUDE;
EXPLAIN:
+------+-------------+-------+--------+----------------------------+---------+---------+------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+----------------------------+---------+---------+------------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | FACT | ALL | INDEX_FACT_ASSESSMENT_DATE | NULL | NULL | NULL | 762621 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | us | eq_ref | PRIMARY | PRIMARY | 46 | dwh0.FACT.USER,dwh0.FACT.ENV | 1 | |
+------+-------------+-------+--------+----------------------------+---------+---------+------------------------------+--------+----------------------------------------------+
2 rows in set (0.001 sec)
Interestingly, just by changing the dates manually into DATETIME format strings, it uses the index. But the FROM_UNIXTIME() function should, in my opinion, return exactly the same thing...
SELECT
count(us.CITY) as metric,
us.CITY as Name,
us.LATITUDE as latitude,
us.LONGITUDE as longitude
FROM
FACT
LEFT JOIN
USER us
ON
us.ENV = FACT.ENV AND us.ID_USER = FACT.USER
WHERE
-- ASSESSMENT_DATE BETWEEN FROM_UNIXTIME(1596649101) AND FROM_UNIXTIME(1599108827)
ASSESSMENT_DATE BETWEEN '2020-08-05 11:30:11.987' AND '2020-09-03 11:30:11.987'
GROUP BY us.CITY, us.LATITUDE, us.LONGITUDE;
EXPLAIN:
+------+-------------+-------+--------+----------------------------+----------------------------+---------+------------------------------+--------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+----------------------------+----------------------------+---------+------------------------------+--------+--------------------------------------------------------+
| 1 | SIMPLE | FACT | range | INDEX_FACT_ASSESSMENT_DATE | INDEX_FACT_ASSESSMENT_DATE | 5 | NULL | 132008 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | us | eq_ref | PRIMARY | PRIMARY | 46 | dwh0.FACT.USER,dwh0.FACT.ENV | 1 | |
+------+-------------+-------+--------+----------------------------+----------------------------+---------+------------------------------+--------+--------------------------------------------------------+
2 rows in set (0.001 sec)
Has anyone run into a problem like this? The WHERE clause is generated by Grafana, so I cannot change it, but I can change the rest of the query if that helps.
Thanks for suggestions!
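Sorry for bothering. After around 10^5 more inserts, the index is used in both cases. Maybe it was just bad luck with the optimizer's row estimates at the time: in the first EXPLAIN the index was listed in possible_keys, but a full scan of an estimated 762621 rows was chosen instead.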
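One thing worth checking when comparing the two EXPLAIN outputs is whether both queries cover the same time window. The sketch below converts the epoch values from the queries into DATETIME-style strings in Python. It assumes UTC, whereas FROM_UNIXTIME() uses the session time zone, so the real values may be shifted by a fixed offset; the function name is made up.

```python
from datetime import datetime, timezone


def from_unixtime_utc(epoch_seconds):
    """Format an epoch value like a DATETIME literal, assuming UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Bounds used in the first query (FROM_UNIXTIME form).
print(from_unixtime_utc(1601568552), from_unixtime_utc(1604028277))
# Bounds from the commented-out line in the second query.
print(from_unixtime_utc(1596649101), from_unixtime_utc(1599108827))
```

The first query's bounds fall in October 2020, while the literals in the second query cover August to September 2020, so the optimizer is estimating a different set of rows in each case. That alone can change whether the index is chosen.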
I already have a GSI with author as the partition key and status as the sort key, and I use it to query the data I need. However, I can't figure out how to sort the returned data in chronological order.
DynamoDb table
-----------------------------------------------
| id | post | author | status | createdAt |
-----------------------------------------------
| 1 | post1 | author1 | publish | 2019-12-10 |
-----------------------------------------------
| 2 | post2 | author2 | draft | 2019-12-11 |
-----------------------------------------------
| 3 | post3 | author1 | publish | 2019-12-12 |
-----------------------------------------------
and the query
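// Query the GSI (partition key: author, sort key: status).
var params = {
    TableName: "TABLENAME",
    IndexName: "gsi-authorStatus",
    // "status" is a DynamoDB reserved word, so it is referenced through an alias.
    KeyConditionExpression: "author = :author AND #status = :status",
    ExpressionAttributeNames: {
        "#status": "status"
    },
    ExpressionAttributeValues: {
        ":author": JSON.stringify(event.bodyJSON.author),
        ":status": JSON.stringify(event.bodyJSON.status)
    }
};
dynamo.query(params, function (err, data) {
    if (err) {
        console.error('Error with ', err);
        context.fail(err);
    } else {
        // Items come back ordered by the index sort key (status), not by createdAt.
        context.succeed(data);
    }
});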
My query gives me the items where author and status match, but it returns them in what looks like a random order. Is it possible to use createdAt to order the returned data by latest first?
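A Query returns items ordered by the index's sort key, so with status as the sort key there is no ordering by createdAt among the matches. One possible approach, sketched below in Python on in-memory data rather than against DynamoDB, is a sort key that combines both values. The attribute name `statusCreatedAt` and the helper are hypothetical; in a real query this would correspond to a begins_with condition on that key with ScanIndexForward set to false.

```python
# In-memory sketch; the statusCreatedAt attribute is an assumed addition.
POSTS = [
    {"id": 1, "author": "author1", "status": "publish", "createdAt": "2019-12-10"},
    {"id": 2, "author": "author2", "status": "draft", "createdAt": "2019-12-11"},
    {"id": 3, "author": "author1", "status": "publish", "createdAt": "2019-12-12"},
]

for post in POSTS:
    # ISO-formatted dates sort chronologically as plain strings.
    post["statusCreatedAt"] = post["status"] + "#" + post["createdAt"]


def query_latest_first(author, status):
    """Mimic: author = :author AND begins_with(statusCreatedAt, :prefix),
    read in descending sort-key order."""
    prefix = status + "#"
    matches = [
        post for post in POSTS
        if post["author"] == author and post["statusCreatedAt"].startswith(prefix)
    ]
    return sorted(matches, key=lambda post: post["statusCreatedAt"], reverse=True)


print([post["id"] for post in query_latest_first("author1", "publish")])
```

For small result sets, sorting the returned items by createdAt in application code after the query would also work, without changing the index.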
Entity Model:
I've read the AWS guide on modeling relational data in DynamoDB, but I'm confused about how to apply it to my access patterns.
Access Pattern
+-------------------------------------------+------------+------------+
| Access Pattern | Params | Conditions |
+-------------------------------------------+------------+------------+
| Get TEST SUITE detail and check that |TestSuiteID | |
| USER_ID belongs to project has test suite | &UserId | |
+-------------------------------------------+------------+------------+
| Get TEST CASE detail and check that | TestCaseID | |
| USER_ID belongs to project has test case | &UserId | |
+-------------------------------------------+------------+------------+
| Remove PROJECT ID, all TEST SUITE | ProjectID | |
| AND TEST CASE also removed | &UserId | |
+-------------------------------------------+------------+------------+
So I modeled the relational entity data as the guide suggests:
+-------------------------+---------------------------------+
| Primary Key | Attributes |
+-------------------------+ +
| PK | SK | |
+------------+------------+---------------------------------+
| user_1 | USER | FullName | |
+ + +----------------+----------------+
| | | John Doe | |
+ +------------+----------------+----------------+
| | prj_01 | JoinedDate | |
+ + +----------------+----------------+
| | | 2019-04-22 | |
+ +------------+----------------+----------------+
| | prj_02 | JoinedDate | |
+ + +----------------+----------------+
| | | 2019-05-26 | |
+------------+------------+----------------+----------------+
| user_2 | USER | FullName | |
+ + +----------------+----------------+
| | | Harry Potter | |
+ +------------+----------------+----------------+
| | prj_01 | JoinedDate | |
+ + +----------------+----------------+
| | | 2019-04-25 | |
+------------+------------+----------------+----------------+
| prj_01 | PROJECT | Name | Description |
+ + +----------------+----------------+
| | | Facebook Test | Do some stuffs |
+ +------------+----------------+----------------+
| | t_suite_01 | | |
+ + +----------------+----------------+
| | | | |
+------------+------------+----------------+----------------+
| prj_02 | PROJECT | Name | Description |
+ + +----------------+----------------+
| | | Instagram Test | ... |
+------------+------------+----------------+----------------+
| t_suite_01 | TEST_SUITE | Name | |
+ + +----------------+----------------+
| | | Test Suite 1 | |
+ +------------+----------------+----------------+
| | t_case_1 | | |
+ + +----------------+----------------+
| | | | |
+------------+------------+----------------+----------------+
| t_case_1 | TEST_CASE | Name | |
+ + +----------------+----------------+
| | | Test Case 1 | |
+------------+------------+----------------+----------------+
If I only have UserId and TestCaseId as parameters, how could I get the test case detail and verify that the user has permission?
I've thought about storing the complex hierarchical data within a single item, something like this:
+------------+-------------------------+
| t_suite_01 | user_1#prj_1 |
+------------+-------------------------+
| t_suite_02 | user_1#prj_2 |
+------------+-------------------------+
| t_case_01 | user_1#prj_1#t_suite_01 |
+------------+-------------------------+
| t_case_02 | user_2#prj_1#t_suite_01 |
+------------+-------------------------+
Question: What is the best way to handle this case? I would appreciate any suggestions on this approach.
I think the schema below does what you want. Create a Partition Key only GSI on the "GSIPK" attribute and query as follows:
1. Get Test Suite Detail and Validate User: Query GSI - PK == ProjectId, FilterCondition [SK == TestSuiteId || PK == UserId]
2. Get Test Case Detail and Validate User: Query GSI - PK == TestCaseId, FilterCondition [SK == TestSuiteId:TestCaseId || PK == UserId]
3. Remove Project: Query GSI - PK == ProjectId, remove all items returned.
Queries 1 and 2 come back with 1 or 2 items. One is the detail item and the other is the user permissions for the test suite or test case. If only one item is returned, then it's the detail item and the user has no access.
The first question you should ask is: why do I want to use a key-value document DB over a relational DB when I clearly have strong relations in my data?
The answer might be: I need single-digit-millisecond queries at any scale (millions of records). Or: I want to save money using DynamoDB on-demand. If this is not the case, you might be better off with a relational DB.
Let's say you have to go with DynamoDB. If so, most of the patterns applicable to relational DBs are anti-patterns when it comes to NoSQL. There is a useful talk from the last re:Invent about design patterns for DynamoDB, and I'd advise watching it: https://youtu.be/HaEPXoXVf2k.
For your data I'd think about taking a similar approach and having two tables: users and projects.
Projects should store the subset of test suites as a map of arrays of objects, and the test cases as a map of arrays of objects. You could also add the list of user IDs in a map of strings. Of course, you will need to maintain this list when users join or leave the project(s).
This should satisfy your access patterns.
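A minimal Python sketch of what such a project item might look like, and how the test case access pattern could be checked against it. All attribute names (`userIds`, `testSuites`, `testCases`, `projectIds`), the exact shapes of the maps, and the lookup helper are assumptions for illustration; plain dictionaries stand in for the two tables.

```python
# Plain dicts stand in for the "users" and "projects" tables.
USERS = {
    "user_1": {"fullName": "John Doe", "projectIds": ["prj_01", "prj_02"]},
    "user_2": {"fullName": "Harry Potter", "projectIds": ["prj_01"]},
}
PROJECTS = {
    "prj_01": {
        "name": "Facebook Test",
        "userIds": ["user_1", "user_2"],
        "testSuites": {"t_suite_01": {"name": "Test Suite 1"}},
        "testCases": {"t_case_1": {"name": "Test Case 1", "suite": "t_suite_01"}},
    },
    "prj_02": {
        "name": "Instagram Test",
        "userIds": ["user_1"],
        "testSuites": {},
        "testCases": {},
    },
}


def get_test_case(user_id, test_case_id):
    """Return the test case detail if one of the user's projects contains it,
    otherwise None (unknown user, no membership, or no such test case)."""
    user = USERS.get(user_id)
    if user is None:
        return None
    for project_id in user["projectIds"]:
        project = PROJECTS.get(project_id)
        if project is None or user_id not in project["userIds"]:
            continue
        detail = project["testCases"].get(test_case_id)
        if detail is not None:
            return detail
    return None


print(get_test_case("user_2", "t_case_1"))
print(get_test_case("user_2", "t_case_missing"))
```

Since a single DynamoDB item is limited to 400 KB, this layout only fits projects with a bounded number of test suites and test cases.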
I'm having some trouble getting App Maker to respect the order of a many-to-many relation.
Let's say I have two models:
Model 1 has an ID and a many-to-many relation to model 2 which also has an ID.
App Maker generates three tables:
DESCRIBE model_1;
+--------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+----------------+
| Id | int(11) | NO | PRI | NULL | auto_increment |
+--------------------+--------------+------+-----+---------+----------------+
DESCRIBE model_2;
+--------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+----------------+
| Id | int(11) | NO | PRI | NULL | auto_increment |
+--------------------+--------------+------+-----+---------+----------------+
DESCRIBE model_1_Has_model_2;
+------------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+---------+------+-----+---------+-------+
| parentModel1_fk | int(11) | NO | MUL | NULL | |
| childModel2_fk | int(11) | NO | MUL | NULL | |
+------------------+---------+------+-----+---------+-------+
Now let's say I have a model_1 object with ID 1 and three model_2 objects with IDs 1, 2, 3. If I assign model_1.childModel_2 to [model_2_ID_1, model_2_ID_2] the model_1_Has_model_2 table will contain:
parentModel1_fk | childModel2_fk
--------------------------------
1 | 1
1 | 2
Now let's say I splice model_1.childModel_2 using model_1.childModel_2.splice(0, 1) and then insert model_2 ID 3 in index 0 using model_1.childModel_2.splice(0, 0, model_2_ID_3). I would expect my table to contain the following:
parentModel1_fk | childModel2_fk
--------------------------------
1 | 3
1 | 1
However it contains the opposite:
parentModel1_fk | childModel2_fk
--------------------------------
1 | 1
1 | 3
Is there any way I can stop this behavior short of clearing the entire relation and then setting it to my new expected order?
The short answer is no. App Maker is just creating a new record, not rearranging the table. Otherwise it would have to edit all the records below the desired insertion point (which could be a prohibitively time consuming transaction). If this is the desired functionality, you'll have to do it manually.
I would seriously consider creating your own join table that will allow you to have additional columns, where you can store the desired sort order.
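As a rough sketch of that idea, the following Python snippet uses an in-memory SQLite database (standing in for the SQL tables behind the app, which is an assumption) with a hand-made join table that carries a `sort_order` column. The table and column names are hypothetical, and App Maker's own relation API is not involved; the point is only that the ordering comes from the extra column rather than from insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE model_1_model_2_ordered (
           parent_fk  INTEGER NOT NULL,
           child_fk   INTEGER NOT NULL,
           sort_order INTEGER NOT NULL
       )"""
)


def set_children(parent_id, child_ids):
    """Rewrite one parent's rows so sort_order follows the list order."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "DELETE FROM model_1_model_2_ordered WHERE parent_fk = ?",
            (parent_id,),
        )
        conn.executemany(
            "INSERT INTO model_1_model_2_ordered VALUES (?, ?, ?)",
            [(parent_id, child, pos) for pos, child in enumerate(child_ids)],
        )


def get_children(parent_id):
    """Read the children back in their stored order."""
    rows = conn.execute(
        "SELECT child_fk FROM model_1_model_2_ordered "
        "WHERE parent_fk = ? ORDER BY sort_order",
        (parent_id,),
    )
    return [row[0] for row in rows]


children = [1, 2]
set_children(1, children)
children.remove(2)      # drop child 2 ...
children.insert(0, 3)   # ... and put child 3 first, the order the question expects
set_children(1, children)
print(get_children(1))
```

Rewriting all of a parent's rows on every change is the simplest option; for long lists, updating only the affected `sort_order` values would be cheaper.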
I am trying to do the following.
Connected to DevCluster
[cqlsh 5.0.1 | Cassandra 3.10.0.1695 | DSE 5.1.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
user#cqlsh:test> desc table del28;
CREATE TABLE test.del28 (
sno int PRIMARY KEY,
dob date,
name range_dates,
ssss_details map<text, date>,
ssss_range map<text, frozen<map<date, date>>> );
CREATE INDEX idx_ssss_range ON test.del28 (keys(ssss_range));
CREATE INDEX ssss_details_idx ON test.del28 (values(ssss_details));
CREATE INDEX ssss_range_idx ON test.del28 (values(ssss_range));
user#cqlsh:test> select * from del28;
sno | dob | name | ssss_details | ssss_range
-----+------+--------------------------------------+----------------------------------------------+---------------------------------
5 | null | {start: 2014-03-05, end: 2018-04-05} | {'hello': 2014-05-05} | {'1': {2018-04-05: 2012-02-05}}
8 | null | {start: 2018-03-04, end: 2018-08-02} | {'hello8': 2018-08-08} | {'8': {2018-08-08: 2012-02-08}}
2 | null | {start: 2018-03-04, end: 2018-05-05} | {'hello': 2018-05-05} | {'1': {2018-07-08: 2018-09-01}}
4 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello1': 2014-05-02} | {'1': {2018-04-08: 2012-02-04}}
7 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello4': 2014-05-03, 'hello5': 2014-05-02} | {'2': {2018-04-08: 2012-02-04}}
6 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello2': 2014-05-02, 'hello3': 2014-05-03} | {'2': {2018-04-08: 2012-02-04}}
9 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello7': 2014-05-02, 'hello8': 2014-05-03} | {'2': {2018-04-08: 2012-02-04}}
3 | null | {start: 2014-03-04, end: 2018-04-02} | {'hello': 2014-05-02} | {'1': {2018-04-08: 2012-02-04}}
(8 rows)
My question is: can I filter on ssss_range, and if so, how? If not, what is the best way to store this data? The idea is that each entry is a number or text key followed by a pair of dates, for example house1: {2012-04-05: 2013-02-05}, house2: {2013-04-08: 2014-02-04}, and so on, for one particular user; each pair of dates is a period during which that person stayed there. I tried splitting the dates out into the 'name' column, but that did not work for me either. There is also a lot of other information attached to this record.
I should be able to query by house1, house2, etc., i.e. something like where aaa = 'house1'. I should also be able to query by dates, i.e. something like where from_date > '' and to_date < ''.
I am fine with changing the way the data is stored if that makes it easier to query. Any type of collection or data type is fine.
Please suggest the right approach.
Thanks
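One direction that might be worth exploring, sketched here in Python with in-memory data rather than CQL, is to store one row per stay (record number, house, start date, end date) instead of a nested map, so that the house and the dates become ordinary columns that can be filtered. The field names and sample values below are made up, and how this maps onto partition and clustering keys in Cassandra would depend on which queries matter most.

```python
from datetime import date
from typing import NamedTuple, Optional


class Stay(NamedTuple):
    sno: int          # record number of the person
    house: str        # e.g. "house1"
    from_date: date
    to_date: date


STAYS = [
    Stay(5, "house1", date(2012, 4, 5), date(2013, 2, 5)),
    Stay(5, "house2", date(2013, 4, 8), date(2014, 2, 4)),
    Stay(8, "house1", date(2018, 3, 4), date(2018, 8, 2)),
]


def find_stays(house: Optional[str] = None,
               after: Optional[date] = None,
               before: Optional[date] = None):
    """Filter by house and/or by from_date > after and to_date < before."""
    result = []
    for stay in STAYS:
        if house is not None and stay.house != house:
            continue
        if after is not None and not stay.from_date > after:
            continue
        if before is not None and not stay.to_date < before:
            continue
        result.append(stay)
    return result


print(find_stays(house="house1"))
print(find_stays(after=date(2013, 1, 1), before=date(2015, 1, 1)))
```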