Modeling Relational Data in DynamoDB (nested relationship) - amazon-dynamodb

Entity Model:
I've read the AWS guide about modeling relational data in DynamoDB, but it's quite confusing for my access patterns.
Access Pattern
+-------------------------------------------+------------+------------+
| Access Pattern | Params | Conditions |
+-------------------------------------------+------------+------------+
| Get TEST SUITE detail and check that |TestSuiteID | |
| USER_ID belongs to project has test suite | &UserId | |
+-------------------------------------------+------------+------------+
| Get TEST CASE detail and check that | TestCaseID | |
| USER_ID belongs to project has test case | &UserId | |
+-------------------------------------------+------------+------------+
| Remove PROJECT ID, all TEST SUITE | ProjectID | |
| AND TEST CASE also removed | &UserId | |
+-------------------------------------------+------------+------------+
So, I modeled the relational entity data as described in the guide.
+------------+------------+---------------------------------------------------+
| PK         | SK         | Attributes                                        |
+------------+------------+---------------------------------------------------+
| user_1     | USER       | FullName: John Doe                                |
| user_1     | prj_01     | JoinedDate: 2019-04-22                            |
| user_1     | prj_02     | JoinedDate: 2019-05-26                            |
| user_2     | USER       | FullName: Harry Potter                            |
| user_2     | prj_01     | JoinedDate: 2019-04-25                            |
| prj_01     | PROJECT    | Name: Facebook Test, Description: Do some stuffs  |
| prj_01     | t_suite_01 |                                                   |
| prj_02     | PROJECT    | Name: Instagram Test, Description: ...            |
| t_suite_01 | TEST_SUITE | Name: Test Suite 1                                |
| t_suite_01 | t_case_1   |                                                   |
| t_case_1   | TEST_CASE  | Name: Test Case 1                                 |
+------------+------------+---------------------------------------------------+
If I only have a UserID and a TestCaseID as parameters, how can I get the test case detail and verify that the UserID has permission to it?
I've thought about storing the complex hierarchical data within a single item. Something like this:
+------------+-------------------------+
| t_suite_01 | user_1#prj_1 |
+------------+-------------------------+
| t_suite_02 | user_1#prj_2 |
+------------+-------------------------+
| t_case_01 | user_1#prj_1#t_suite_01 |
+------------+-------------------------+
| t_case_02 | user_2#prj_1#t_suite_01 |
+------------+-------------------------+
Question: What is the best way to handle this case? I'd appreciate any suggestions on this approach.

I think the schema below does what you want. Create a partition-key-only GSI on the "GSIPK" attribute and query as follows:
Get test suite detail and validate user: query the GSI with PK == ProjectId and the filter condition [SK == TestSuiteId || PK == UserId].
Get test case detail and validate user: query the GSI with PK == TestCaseId and the filter condition [SK == TestSuiteId:TestCaseId || PK == UserId].
Remove project: query the GSI with PK == ProjectId and remove all items returned.
Queries 1 and 2 come back with one or two items. One is the detail item and the other is the user's permission item for the test suite or test case. If only one item comes back, it's the detail item and the user has no access.
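For illustration, here is a minimal boto3 sketch of access pattern 1 under that scheme. The table name (TestManagement), the GSI name (GSIPK-index), and the attribute names PK/SK/GSIPK are assumptions for the example, not something defined in the original post:

import boto3
from boto3.dynamodb.conditions import Attr, Key

# Assumed names: table "TestManagement", GSI "GSIPK-index", attributes PK/SK/GSIPK.
table = boto3.resource("dynamodb").Table("TestManagement")

def get_test_suite_for_user(project_id, test_suite_id, user_id):
    """Fetch the test suite detail item and check the caller's access in one query."""
    resp = table.query(
        IndexName="GSIPK-index",
        KeyConditionExpression=Key("GSIPK").eq(project_id),
        FilterExpression=Attr("SK").eq(test_suite_id) | Attr("PK").eq(user_id),
    )
    items = resp.get("Items", [])
    detail = next((item for item in items if item.get("SK") == test_suite_id), None)
    has_access = any(item.get("PK") == user_id for item in items)
    return detail if has_access else None

If the query returns both the detail item and the user's membership item, access is granted; with only the detail item, it is denied.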

The first question you should ask is: why do I want to use a key-value document DB over a relational DB when I clearly have strong relations in my data?
The answer might be: I need single-digit-millisecond queries at any scale (millions of records). Or: I want to save money by using DynamoDB on-demand. If that is not the case, you might be better off with a relational DB.
Let's say you have to go with DynamoDB. If so, most patterns applicable to relational DBs are anti-patterns when it comes to NoSQL. There is a useful talk from last re:Invent about design patterns for DynamoDB, and I'd advise watching it: https://youtu.be/HaEPXoXVf2k.
For your data I'd think about taking a similar approach and having two tables: users and projects.
A project should store its test suites as a map/array of objects and its test cases as a map/array of objects. Plus, you could add the list of user IDs as a map of strings. Of course, you will need to maintain this list when users join or leave the project(s).
This should satisfy your access patterns.
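To make that concrete, here is a rough sketch of what such a denormalized project item could look like; the attribute names are illustrative, not taken from the original post:

# Hypothetical shape of a single "project" item holding its suites, cases and members.
project_item = {
    "project_id": "prj_01",
    "name": "Facebook Test",
    "description": "Do some stuffs",
    "user_ids": ["user_1", "user_2"],  # must be maintained when users join/leave
    "test_suites": {
        "t_suite_01": {
            "name": "Test Suite 1",
            "test_cases": [
                {"test_case_id": "t_case_1", "name": "Test Case 1"},
            ],
        },
    },
}

With this shape, checking whether a user may see a suite or case is a lookup in user_ids, and removing a project deletes its suites and cases in a single operation.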

Related

How to replace empty spaces with values from an adjacent column that needs to be separated?

Hi everyone, sorry for my English. I need to extract the domain from some email addresses in a table. Then, if an address's domain ends in a country code, that code must be moved into the country column, which is incomplete, of a table listing the participants of a congress. This is for a relatively large database. I put an example below.
| email                         | country |
| ----------------------------- | ------- |
| naco#gmail.com                | CO      |
| monic45814#gmail.com          | AR      |
| jsalazar#chapingo.mx          |         |
| andresramirez#urosario.edu.co |         |
| jeimy861491#hotmail.com       | CL      |
| jytvc#hotmail.com             |         |
Outcome should be
| email                         | country |
| ----------------------------- | ------- |
| naco#gmail.com                | CO      |
| monic45814#gmail.com          | AR      |
| jsalazar#chapingo.mx          | MX      |
| andresramirez#urosario.edu.co | CO      |
| jeimy861491#hotmail.com       | CL      |
| jytvc#hotmail.com             | *NA*    |
Thank you so much.
You can use str_extract to get the string after the last occurrence of "." and if_else to skip rows that already have a country as well as rows whose e-mail doesn't end with a country code:
library(dplyr)
library(stringr)

df %>%
  mutate(country = if_else(
    is.na(country) & str_extract(email, "[^.]+$") != "com",
    toupper(str_extract(email, "[^.]+$")),
    country
  ))
Small but not-so-small PS: I would always recommend providing fake data when you are posting personal data such as e-mail addresses.
Here is a solution in base R.
Suppose:
df<-data.frame(email,country)
Then:
df$country <- ifelse(
  is.na(df$country) & sub(".*(.*?)[\\.|:]", "", df$email) != "com",
  sub(".*(.*?)[\\.|:]", "", df$email),
  paste(df$country)
)
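Purely for reference, since the thread itself is R: a rough equivalent in Python/pandas, using made-up sample data with the same email/country columns:

import numpy as np
import pandas as pd

# Sample rows mirroring the question; '#' stands in for '@' as in the post.
df = pd.DataFrame({
    "email": ["naco#gmail.com", "jsalazar#chapingo.mx",
              "andresramirez#urosario.edu.co", "jytvc#hotmail.com"],
    "country": ["CO", np.nan, np.nan, np.nan],
})

# Take the text after the last "." and use it only where country is missing
# and the suffix is not "com".
tld = df["email"].str.extract(r"([^.]+)$", expand=False)
df["country"] = np.where(df["country"].isna() & (tld != "com"),
                         tld.str.upper(), df["country"])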

MariaDB DATETIME Index not working with Between FROM_UNIXTIME()

I have a table with a DATETIME field, which is indexed with a BTree. Now I want to query it with the following statement:
SELECT
    count(us.CITY) as metric,
    us.CITY as Name,
    us.LATITUDE as latitude,
    us.LONGITUDE as longitude
FROM FACT
LEFT JOIN USER us
    ON us.ID_USER = FACT.USER
WHERE ASSESSMENT_DATE BETWEEN FROM_UNIXTIME(1601568552) AND FROM_UNIXTIME(1604028277)
GROUP BY us.CITY, us.LATITUDE, us.LONGITUDE;
EXPLAIN:
+------+-------------+-------+--------+----------------------------+---------+---------+------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+----------------------------+---------+---------+------------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | FACT | ALL | INDEX_FACT_ASSESSMENT_DATE | NULL | NULL | NULL | 762621 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | us | eq_ref | PRIMARY | PRIMARY | 46 | dwh0.FACT.USER,dwh0.FACT.ENV | 1 | |
+------+-------------+-------+--------+----------------------------+---------+---------+------------------------------+--------+----------------------------------------------+
2 rows in set (0.001 sec)
Interestingly, just by changing the dates manually into DATETIME-formatted strings, it uses the index. But in my opinion FROM_UNIXTIME() should return exactly the same thing...
SELECT
    count(us.CITY) as metric,
    us.CITY as Name,
    us.LATITUDE as latitude,
    us.LONGITUDE as longitude
FROM FACT
LEFT JOIN USER us
    ON us.ENV = FACT.ENV AND us.ID_USER = FACT.USER
WHERE
    -- ASSESSMENT_DATE BETWEEN FROM_UNIXTIME(1596649101) AND FROM_UNIXTIME(1599108827)
    ASSESSMENT_DATE BETWEEN '2020-08-05 11:30:11.987' AND '2020-09-03 11:30:11.987'
GROUP BY us.CITY, us.LATITUDE, us.LONGITUDE;
EXPLAIN:
+------+-------------+-------+--------+----------------------------+----------------------------+---------+------------------------------+--------+--------------------------------------------------------+
| id   | select_type | table | type   | possible_keys              | key                        | key_len | ref                          | rows   | Extra                                                  |
+------+-------------+-------+--------+----------------------------+----------------------------+---------+------------------------------+--------+--------------------------------------------------------+
|    1 | SIMPLE      | FACT  | range  | INDEX_FACT_ASSESSMENT_DATE | INDEX_FACT_ASSESSMENT_DATE | 5       | NULL                         | 132008 | Using index condition; Using temporary; Using filesort |
|    1 | SIMPLE      | us    | eq_ref | PRIMARY                    | PRIMARY                    | 46      | dwh0.FACT.USER,dwh0.FACT.ENV |      1 |                                                        |
+------+-------------+-------+--------+----------------------------+----------------------------+---------+------------------------------+--------+--------------------------------------------------------+
2 rows in set (0.001 sec)
Has anyone run into such a problem? The WHERE clause is generated by Grafana, so I cannot change it, but I can change everything else if that would help.
Thanks for any suggestions!
Update: sorry for the bother... after around 10^5 more inserts, the index is used in both cases. Maybe it was just bad luck, or the index statistics simply caught up with the data so the optimizer's cost estimate for the range scan changed.

DynamoDB intersection select with pagination

I have the following DB schema and I'd like to find the best way to select the list of sort keys that are common to both PK_A and PK_B:
+------+---------+
| PK   | SortKey |
+------+---------+
| PK_A | SK_A    |
| PK_A | SK_B    |
| PK_A | SK_C    |
| PK_B | SK_B    |
| PK_B | SK_C    |
| PK_B | SK_D    |
+------+---------+
So when I select by PK_A and PK_B, it should return only SK_B and SK_C.
Any help is appreciated.
Simple answer: you can't do it (in one call).
DynamoDB is not a relational database; operations such as intersection are not supported.
You'd need to query() once for each partition key and then calculate the intersection yourself, for example as in the sketch below.
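A minimal boto3 sketch of that approach, including pagination; the table and attribute names (MyTable, PK, SortKey) are assumptions for the example:

import boto3
from boto3.dynamodb.conditions import Key

# Assumed names: table "MyTable", key attributes "PK" and "SortKey".
table = boto3.resource("dynamodb").Table("MyTable")

def sort_keys_for(pk):
    """Collect every sort key under one partition key, following pagination."""
    keys = set()
    kwargs = {"KeyConditionExpression": Key("PK").eq(pk)}
    while True:
        resp = table.query(**kwargs)
        keys.update(item["SortKey"] for item in resp["Items"])
        if "LastEvaluatedKey" not in resp:
            return keys
        kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]

def common_sort_keys(pk_a, pk_b):
    """The intersection itself has to be computed client-side."""
    return sort_keys_for(pk_a) & sort_keys_for(pk_b)

# common_sort_keys("PK_A", "PK_B") -> {"SK_B", "SK_C"}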

WSO2 analytics server database is growing

I am using WSO2 API Manager along with its Analytics server, configured with MySQL as the database.
After a year of PROD use, I found that a couple of tables from the Analytics module consume most of the DB space, around 95%.
I would like to know the significance of these tables, as well as the risks involved if we delete them.
The table names are:
+---------------------------------+------------------+------------+
| Database                        | Table            | Size in MB |
+---------------------------------+------------------+------------+
| wso2_analytics_event_store      | anx___7lsekeca_  |     665.03 |
| wso2_analytics_event_store      | anx___7lmnf2xa_  |     638.00 |
| wso2_analytics_event_store      | anx___7lqcf_8o_  |     636.14 |
| wso2_analytics_event_store      | anx___7lmk3tr0_  |     398.13 |
| analytics_processed_data_store  | anx___7lpteea4_  |     282.75 |
| analytics_processed_data_store  | anx___7lsn7ita_  |     249.97 |
| wso2_analytics_event_store      | anx___7lsgqyce_  |     209.25 |
| wso2_analytics_event_store      | anx___7lmno15m_  |     207.25 |
| wso2_analytics_event_store      | anx___7lver1fy_  |     191.16 |
+---------------------------------+------------------+------------+
You can enable data purging for the analytics tables; the procedure is described in the docs.
Ref: https://docs.wso2.com/display/AM220/Purging+Analytics+Data

Asterisk dial function answered extension

When I dial multiple extensions with the Dial application, I can't find out which extension answered.
I'm using Dial with these parameters, Dial(SIP/1001&SIP/1002&SIP/1003,30,tTr), and I'm checking the results in the real-time CDR table in MySQL. But when I look at the CDR record in the table, it looks like this:
calldate:    2018-04-06 17:10:17
clid:        "05555555555" <05555555555>
src:         05555555555
dst:         aa
dcontext:    aaContext
channel:     SIP/908500000000-000000f7
dstchannel:  SIP/908500000000-000000f8
lastapp:     Dial
lastdata:    SIP/1001&SIP/1002&SIP/1003
duration:    462
billsec:     435
disposition: ANSWERED
amaflags:    3
accountcode: (empty)
userfield:   (empty)
uniqueid:    1523049017.247
linkedid:    1523049017.247
sequence:    269
peeraccount: (empty)
So I can see which channel answered, but there is no extension in it.
You have 3 options:
1) CEL (Channel Event Logging). Newer Asterisk versions let you log many more events per call.
2) Dial via local channels, e.g. Local/1001@ext&Local/1002@ext&Local/1003@ext. This way you will get more CDRs, one per leg, in the ext context.
3) Use an on-answer macro (or gosub) with Dial and record which extension answered.
