Need guidance on data modelling in DynamoDB for the example below

I have a flowchart like the one below (a logical design). Sorry for the badly drawn diagram; I just made it in Paint. Now I need guidance on how to turn this representation into a DynamoDB table definition. I am new to DynamoDB and have been reading about the different data types it offers. I can see the Map type working for some of the attributes, but I still don't have a full picture of how to model this efficiently.
What I need is to put this logical design into a single DynamoDB table. I need a table definition (i.e. which data type suits which attribute).
Note: the Yes/No entries in green font in the diagram are actual values; everything else is a hierarchy label.
Note 1: the parent in the diagram can be treated as customer_id.
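As a hedged starting point (the diagram itself is not available here), this is a minimal Python sketch of one way to map such a hierarchy onto a single item. Assumptions: customer_id (the diagram's parent) is the partition key, the hierarchy labels become keys of a nested Map (M), and the green Yes/No leaves become BOOL. The label names, the table name and the helper to_attribute_value are hypothetical. Note that a DynamoDB table definition only declares key attributes; every other attribute is schemaless, so the "data type per attribute" is decided by what you write into each item.

```python
import json


def to_attribute_value(value):
    """Convert a plain Python value into DynamoDB's typed AttributeValue JSON.

    Assumptions: hierarchy labels become Map (M) keys, the diagram's green
    "Yes"/"No" leaves are stored as BOOL, other strings as S, numbers as N.
    """
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, str):
        if value in ("Yes", "No"):
            return {"BOOL": value == "Yes"}
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # DynamoDB transmits numbers as strings
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Only the key attribute is declared; the nested map needs no declaration.
TABLE_DEFINITION = {
    "TableName": "customer-hierarchy",  # hypothetical name
    "AttributeDefinitions": [{"AttributeName": "customer_id", "AttributeType": "S"}],
    "KeySchema": [{"AttributeName": "customer_id", "KeyType": "HASH"}],
    "BillingMode": "PAY_PER_REQUEST",
}

# Hypothetical labels standing in for the diagram's hierarchy.
hierarchy = {
    "category_a": {"option_1": "Yes", "option_2": "No"},
    "category_b": {"option_3": "Yes"},
}

item = {"customer_id": {"S": "C001"}, "hierarchy": to_attribute_value(hierarchy)}
print(json.dumps(item, indent=2))
```

With boto3, a dict shaped like TABLE_DEFINITION can be passed as client.create_table(**TABLE_DEFINITION) and the item as client.put_item(TableName=..., Item=item). If branches of the hierarchy are read or updated independently, or the map could approach DynamoDB's 400 KB item size limit, one item per branch (same partition key, branch path as sort key) is a common alternative.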

Related

AWS DynamoDB Naming Convention

I am trying to create a naming convention for different objects in DynamoDB, such as tables, partition and sort keys, LSIs, GSIs, attributes, etc. I have read a lot of articles and there is no common way to do this, so I want to learn from real-world examples and choose the one that best fits our needs.
The infrastructure I am working on is based on microservices. In addition, some of our development environments share the same AWS account. Based on this, I ended up with something like this:
Tables: [Environment].[Service Name].[Table Name].ddb-table
GSIs/LSIs: [Environment].[Service Name].[Table Name].[GSI/LSI Name].ddb-[gsi/lsi]
Partition Key: pk ??? (as I understand it, keys should have abstract names, because a single table stores different kinds of data under the same key)
Sort Key: sk ??? (as I understand it, keys should have abstract names, because a single table stores different kinds of data under the same key)
Attributes: meaningful but as short as possible, since attribute names are stored with every item in the table
Different elements are separated by dot (.)
Words within a name are separated by dashes (kebab-case) and are lower case
Tables/GSIs/LSIs are in singular form
Here is an example:
Table: dev.user-service.user-order.ddb-table
LSI: dev.user-service.user-order.lsi1pk.ddb-lsi
GSI: dev.user-service.user-order.gsi1pk.ddb-gsi
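If this convention is adopted, a small check (for example in CI or infrastructure code) can keep names consistent for humans. One answer here advises that code should never predict table names, so this is a hedged sketch of a naming check, not a name builder. The regular expressions are my reading of the pattern above, not an established standard, and follows_convention is a hypothetical helper.

```python
import re

# One lower-case, kebab-case segment such as "dev", "user-service" or "lsi1pk".
SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"

# [environment].[service-name].[table-name].ddb-table
TABLE_NAME = re.compile(rf"{SEGMENT}\.{SEGMENT}\.{SEGMENT}\.ddb-table")

# [environment].[service-name].[table-name].[index-name].ddb-gsi or .ddb-lsi
INDEX_NAME = re.compile(rf"{SEGMENT}\.{SEGMENT}\.{SEGMENT}\.{SEGMENT}\.ddb-(?:gsi|lsi)")


def follows_convention(name: str) -> bool:
    """Return True if the whole name matches the table or index pattern."""
    return bool(TABLE_NAME.fullmatch(name) or INDEX_NAME.fullmatch(name))


for name in [
    "dev.user-service.user-order.ddb-table",
    "dev.user-service.user-order.lsi1pk.ddb-lsi",
    "dev.user-service.user-order.gsi1pk.ddb-gsi",
    "Dev.UserService.user_order.ddb-table",  # upper case and underscore
]:
    print(name, follows_convention(name))
```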
What naming conventions do you follow?
Thanks a lot in advance!
My advice:
Use PK and SK as your partition key and sort key.
Don't put table names into code. Use Parameter Store. For example, if you ever do a table restore, it will be to a new table name, and if you want to send traffic to the new name you won't want to change code.
So don't get too attached to any particular table name. Never have code predict a table name; keep names consistent only to help humans.
Don't put regions in your table names. When you switch to Global Tables they all keep the same name. Awkward!
GSIs can be called GSI1, GSI2, etc. GSI keys are GSI1PK and GSI1SK, etc.
Tag your tables with their name if you ever want to track per-table costs later.
Short yet meaningful attribute names are nice because they reduce storage and can reduce RCU/WCU consumption if your items are near the 4 KB (read) or 1 KB (write) boundaries.
Use different accounts for dev, staging, and production. If you also want to put the environment names into table names to help you spot "OMG I'm in production", that's fine.
If you have lots of attributes as the item payload which aren't used for GSIs or filtering and are always returned together, consider just storing them as a string or binary which gets parsed client side. You can even compress them. It's more efficient and lower latency because it skips the data marshaling.
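A hedged Python sketch of the last tip and the capacity-unit point before it. Assumptions: the always-returned-together attributes are serialized as JSON, compressed with zlib and stored as a single Binary (B) attribute; the helper names and the payload are hypothetical. The capacity helpers apply the 1 KB per WCU and 4 KB per strongly consistent RCU rounding to the payload size only; a real item's size also counts attribute names and the other attributes, so treat the numbers as estimates.

```python
import json
import math
import zlib


def pack_payload(payload: dict) -> bytes:
    """Serialize attributes that are always returned together and compress them."""
    text = json.dumps(payload, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"))


def unpack_payload(blob: bytes) -> dict:
    """Client-side inverse of pack_payload."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def write_units(size_bytes: int) -> int:
    """Standard writes consume 1 WCU per 1 KB of item size, rounded up."""
    return math.ceil(size_bytes / 1024)


def strong_read_units(size_bytes: int) -> int:
    """Strongly consistent reads consume 1 RCU per 4 KB, rounded up."""
    return math.ceil(size_bytes / 4096)


# Hypothetical payload: a repetitive history that is never filtered on.
payload = {"history": [{"status": "shipped", "carrier": "example-carrier"}] * 100}

raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
blob = pack_payload(payload)

print("raw bytes:", len(raw), "WCU:", write_units(len(raw)), "RCU:", strong_read_units(len(raw)))
print("packed bytes:", len(blob), "WCU:", write_units(len(blob)), "RCU:", strong_read_units(len(blob)))
```

The trade-off is that a packed attribute can no longer be used in filter expressions, condition expressions or indexes, which is why the tip limits it to attributes that are never used that way.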

ER diagram - design issues

There are 3 entities:
vehicle_model
vehicle
extra_options (such as open top, leather seats, etc.)
Vehicle model can have a subset of the extra options.
A vehicle can have a subset of its model's extras.
I've been trying for hours to figure out how to represent this as an ER diagram, without success. I thought about a ternary relationship, and although I don't understand it completely, I don't think it's the right approach.
I thought about creating two more entities, model_ext and vehicle_ext, so that vehicle_ext would be connected to model_ext, but this isn't a good design.
This is my first ER diagram design. I'm really lost (I've read the ER-diagram chapter in Silberschatz, "Database System Concepts" three times already), so any ideas would be appreciated.
Did you try adding a new table, say 'vehicle_vehicle_model_extra_options_map'? (You can give this table any short name; for clarity, I use the _map suffix as a standard way of naming map tables.)
Note the two nullable foreign key columns in this table.
Basically, a vehicle can have many extra_options and a vehicle_model can have many extra_options, which is why the new table was added.
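A hedged sketch of that map table, using SQLite through Python's sqlite3 as a convenient stand-in for whatever database you use. The column names other than the table name are my assumptions, as is the CHECK that exactly one of the two nullable foreign keys is set on each row. The final query is one way to find vehicle extras that the vehicle's model does not offer, a rule the map table does not enforce on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle_model (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE extra_options (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE vehicle (
    id INTEGER PRIMARY KEY,
    vehicle_model_id INTEGER NOT NULL REFERENCES vehicle_model(id)
);
-- Map table with two nullable foreign keys. Assumption: each row links an
-- option either to a model or to a vehicle, never both and never neither.
CREATE TABLE vehicle_vehicle_model_extra_options_map (
    id INTEGER PRIMARY KEY,
    vehicle_id INTEGER REFERENCES vehicle(id),
    vehicle_model_id INTEGER REFERENCES vehicle_model(id),
    extra_option_id INTEGER NOT NULL REFERENCES extra_options(id),
    CHECK ((vehicle_id IS NULL) <> (vehicle_model_id IS NULL))
);

INSERT INTO vehicle_model VALUES (1, 'Roadster');
INSERT INTO extra_options VALUES (1, 'open top'), (2, 'leather seats');
INSERT INTO vehicle VALUES (10, 1);
-- The Roadster model offers only 'open top'.
INSERT INTO vehicle_vehicle_model_extra_options_map (vehicle_model_id, extra_option_id)
VALUES (1, 1);
-- Vehicle 10 has 'open top' (offered) and 'leather seats' (not offered).
INSERT INTO vehicle_vehicle_model_extra_options_map (vehicle_id, extra_option_id)
VALUES (10, 1), (10, 2);
""")

# The map table alone does not enforce "a vehicle's extras are a subset of its
# model's extras", so this query lists the vehicle rows that break the rule.
violations = conn.execute("""
SELECT m.vehicle_id, m.extra_option_id
FROM vehicle_vehicle_model_extra_options_map AS m
JOIN vehicle AS v ON v.id = m.vehicle_id
WHERE NOT EXISTS (
    SELECT 1
    FROM vehicle_vehicle_model_extra_options_map AS mm
    WHERE mm.vehicle_model_id = v.vehicle_model_id
      AND mm.extra_option_id = m.extra_option_id
)
""").fetchall()
print(violations)  # [(10, 2)]
```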

Visual Paradigm for UML - Allow duplicate names

Is there a way to allow duplicate names in ERD diagrams in Visual Paradigm for UML? I am trying to document database changes, and I want every database version in a separate diagram. Changes to the database are incremental, so there are many objects with the same name in different diagrams. Currently I am forced to use suffixes: the first diagram contains a table named "METER", and the second contains a table named "METER2", which is not the real name of the database object. In fact, the "METER2" table doesn't exist; its real name is "METER".
Use different Models. Go to "Model Explorer", create a new model and then you can use the same entity names in different ERD diagrams.

How to setup data model for customizable application

I have an ASP.NET data entry application that is used by multiple clients. The application consists of multiple data entry modules that are common to all clients.
I now have multiple clients that want their own custom module added which will typically consist of a dozen or so data points. Some values will be text, others numeric, some will be dropdown selections, etc.
I need suggestions for handling the data model for this. I have two ideas. The first would be to create a new table for each new module for each client. This is pretty clean, but I don't particularly like it. The other would be to have one table with columns for each custom data point for each client. This table would end up with a lot of columns and a lot of NULL values. I don't really like either solution and suspect there's a better way to do this, so any feedback will be appreciated.
I'm using SQL Server 2008.
As always with these questions, "it depends".
The dreaded key-value table.
This approach relies on a table which lists the fields and their values as individual records.
CustomFields(clientId int, fieldName sysname, fieldValue varbinary(max))
Benefits:
Infinitely flexible
Easy to implement
Easy to index
Non-existing values take no space
Disadvantage:
Listing all records with their complete set of fields requires a very dirty (pivoting) query
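To show why that listing query gets dirty, here is a small sketch in SQLite through Python's sqlite3 (SQL Server offers PIVOT or the same conditional aggregation). Simplifying assumptions: the field names Region and Tier are hypothetical, fieldValue is TEXT instead of varbinary so the output is readable, (clientId, fieldName) is the primary key, and each clientId is treated as one record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomFields (
    clientId   INTEGER NOT NULL,
    fieldName  TEXT    NOT NULL,
    fieldValue TEXT,
    PRIMARY KEY (clientId, fieldName)
);
INSERT INTO CustomFields VALUES
  (1, 'Region', 'North'), (1, 'Tier', 'Gold'),
  (2, 'Region', 'South');
""")

# Pivot: one output column per known field name. Every new field means
# another MAX(CASE ...) line, which is what makes this query "dirty".
rows = conn.execute("""
SELECT clientId,
       MAX(CASE WHEN fieldName = 'Region' THEN fieldValue END) AS Region,
       MAX(CASE WHEN fieldName = 'Tier'   THEN fieldValue END) AS Tier
FROM CustomFields
GROUP BY clientId
ORDER BY clientId
""").fetchall()
print(rows)  # [(1, 'North', 'Gold'), (2, 'South', None)]
```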
The Microsoft way
Microsoft's answer to this kind of problem is "sparse columns" (introduced in SQL Server 2008)
Benefits:
Blessed by the people who design SQL Server
records can be queried without having to apply fancy pivots
Fields without data don't take space on disk
Disadvantage:
Many technical restrictions
a new field requires DDL (ALTER TABLE)
The xml tax
You can add an xml field to the table which will be used to store all the "extra" fields.
Benefits:
unlimited flexibility
can be indexed
storage efficient (when it fits in a page)
With some XPath gymnastics, the fields can be included in a flat recordset.
a schema can be enforced with XML schema collections
Disadvantages:
not clearly visible what's in the field
XQuery support in SQL Server has gaps, which sometimes makes getting your data out a real nightmare
There may be more solutions, but to me these are the main contenders. Which one to choose:
Key-value seems appropriate when the number of extra fields is limited (say, no more than 10-20 or so).
Sparse columns are more suitable for data with many properties that are filled in infrequently; they are more appropriate when you can have many extra fields.
An xml column is very flexible but a pain to query. It is appropriate for solutions that write rarely and query rarely, i.e. don't run aggregates etc. on the data stored in this field.
I'd suggest you go with the first option you described. I wouldn't overthink it. The second option you outlined would be a bad idea, in my opinion.
If there are fields common to all the modules you're adding to the system, you should consider keeping those in a single table and then having other tables with the fields specific to a particular module, related back to the primary key in the common table. This is basically table inheritance (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server) and will centralize the common module data and make it easier to query across modules.
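A minimal sketch of that table-inheritance layout, in SQLite through Python's sqlite3 rather than SQL Server. The table names ModuleEntry and InspectionEntry, their columns and the sample rows are hypothetical. The common table carries the shared fields; each module table holds only its own fields, and its primary key is also a foreign key to the common row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
-- Fields common to every module live in one table.
CREATE TABLE ModuleEntry (
    entryId  INTEGER PRIMARY KEY,
    clientId INTEGER NOT NULL,
    module   TEXT    NOT NULL
);
-- A client-specific module keeps only its own fields, keyed by the common PK.
CREATE TABLE InspectionEntry (
    entryId       INTEGER PRIMARY KEY REFERENCES ModuleEntry(entryId),
    inspectorName TEXT,
    score         INTEGER
);
INSERT INTO ModuleEntry VALUES (1, 7, 'inspection'), (2, 7, 'intake'), (3, 9, 'inspection');
INSERT INTO InspectionEntry VALUES (1, 'Lee', 92), (3, 'Kim', 78);
""")

# Cross-module queries only touch the common table.
per_module = conn.execute(
    "SELECT module, COUNT(*) FROM ModuleEntry GROUP BY module ORDER BY module"
).fetchall()

# Module-specific data is a join back to the common row.
inspections = conn.execute("""
SELECT e.entryId, e.clientId, i.inspectorName, i.score
FROM ModuleEntry AS e JOIN InspectionEntry AS i ON i.entryId = e.entryId
ORDER BY e.entryId
""").fetchall()

print(per_module)   # [('inspection', 2), ('intake', 1)]
print(inspections)  # [(1, 7, 'Lee', 92), (3, 9, 'Kim', 78)]
```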

Add new columns in asp .net application

I am facing this question in a new little project:
The system to be built will allow users to add new columns to a table in the system and then maintain the data. I think there are two ways to implement this:
1) Create a few tables, including a "columns" table with "columnName", "columnValue", "datatype", etc. to store the column definitions, and another table "XXColumn" to store the column values entered by users, and use a stored procedure to query/update the column data.
2) Create the column in the table schema when a user enters a new column; maintaining the table data is then just as normal.
Which way do you reckon is better? Or do you have any other suggestions?
Some additional info: the data volume is small, and I need to create reports.
Any good recommendations would require a much better understanding of your requirements, but here are some comments on the options you mentioned, as well as some additional thoughts.
1) Entity-Attribute-Value (EAV) Design: This is the option you describe, where you have a table with columns for ColumnName, Type and Value. This option has the advantage of accommodating unlimited new columns easily, but I have found it painful when the time comes to retrieve meaningful data. For example, say you have rows in this EAV table for {Color, varchar}{Red, Green, Blue} and {Size, varchar}{Small, Medium, Large}. If you want to find all the small green items, you need something like this (untested SQL, of course):
SELECT *
FROM ITEMS
WHERE ITEMID IN (SELECT VLS.ITEMID
                 FROM ITEM_ATTRIBUTES ATT INNER JOIN ITEM_VALUES VLS
                   ON ATT.AttributeID = VLS.AttributeID
                 WHERE ATT.ColumnName = 'Color' AND VLS.Value = 'Green')
  AND ITEMID IN (SELECT VLS.ITEMID
                 FROM ITEM_ATTRIBUTES ATT INNER JOIN ITEM_VALUES VLS
                   ON ATT.AttributeID = VLS.AttributeID
                 WHERE ATT.ColumnName = 'Size' AND VLS.Value = 'Small')
Contrast this with having actual columns on the items table for color and size:
SELECT *
FROM ITEMS
WHERE COLOR = 'Green' AND SIZE = 'Small'
In addition, you will have a difficult time maintaining data integrity, if that is important for this app (and it is almost always important, even when you are told otherwise). In the example above, you will need to implement extra logic if "Color" should be limited to Blue, Green, and Red. You will need even more logic if certain colors only come in certain sizes (for example, blue items are only available in small and medium).
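The two queries above can be checked with Python's sqlite3 as a stand-in for SQL Server. The table layout is my guess at what the EAV query assumes, and the three items are hypothetical; the same facts are stored both as EAV rows and as real columns so the results can be compared. On the flat table, the integrity rules from the previous paragraph (allowed colors, blue only in small and medium) become ordinary CHECK constraints, which is much harder to express over EAV rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- EAV tables, shaped the way the query above assumes (a guess at the schema).
CREATE TABLE ITEM_ATTRIBUTES (AttributeID INTEGER PRIMARY KEY, ColumnName TEXT, Type TEXT);
CREATE TABLE ITEM_VALUES (ITEMID INTEGER, AttributeID INTEGER, Value TEXT);

-- Real columns; CHECK constraints carry the integrity rules from the text.
CREATE TABLE ITEMS (
    ITEMID INTEGER PRIMARY KEY,
    NAME   TEXT,
    COLOR  TEXT CHECK (COLOR IN ('Red', 'Green', 'Blue')),
    SIZE   TEXT CHECK (SIZE IN ('Small', 'Medium', 'Large')),
    CHECK (COLOR <> 'Blue' OR SIZE IN ('Small', 'Medium'))  -- blue: small/medium only
);

INSERT INTO ITEM_ATTRIBUTES VALUES (1, 'Color', 'varchar'), (2, 'Size', 'varchar');
INSERT INTO ITEMS VALUES (1, 'shirt', 'Green', 'Small'),
                         (2, 'scarf', 'Green', 'Large'),
                         (3, 'sock',  'Red',   'Small');
-- The same facts stored as EAV rows, for comparison.
INSERT INTO ITEM_VALUES VALUES (1, 1, 'Green'), (1, 2, 'Small'),
                               (2, 1, 'Green'), (2, 2, 'Large'),
                               (3, 1, 'Red'),   (3, 2, 'Small');
""")

EAV_QUERY = """
SELECT ITEMID, NAME FROM ITEMS
WHERE ITEMID IN (SELECT VLS.ITEMID
                 FROM ITEM_ATTRIBUTES ATT INNER JOIN ITEM_VALUES VLS
                   ON ATT.AttributeID = VLS.AttributeID
                 WHERE ATT.ColumnName = 'Color' AND VLS.Value = 'Green')
  AND ITEMID IN (SELECT VLS.ITEMID
                 FROM ITEM_ATTRIBUTES ATT INNER JOIN ITEM_VALUES VLS
                   ON ATT.AttributeID = VLS.AttributeID
                 WHERE ATT.ColumnName = 'Size' AND VLS.Value = 'Small')
"""
FLAT_QUERY = "SELECT ITEMID, NAME FROM ITEMS WHERE COLOR = 'Green' AND SIZE = 'Small'"

print(conn.execute(EAV_QUERY).fetchall())   # [(1, 'shirt')]
print(conn.execute(FLAT_QUERY).fetchall())  # [(1, 'shirt')]

# With real columns, the database itself rejects a blue item in size Large.
try:
    conn.execute("INSERT INTO ITEMS VALUES (4, 'hat', 'Blue', 'Large')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```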
2) User-Defined Columns: Just giving the user the ability to add additional columns to the table has the advantage of making data retrieval simpler, but all the data integrity issues remain. Also, your app usually requires extra logic to deal with the unknown columns.
3) Pre-Existing Custom Columns: I have worked with a few apps, such as CRMs, that provide a dozen or more columns already in place for user definition. Basically, the designers put in columns like "Text1","Text2","Text3","Number1","Number2", etc. The users then provide header and description information for these columns, and that is what the app uses for display purposes. This model has the advantage of easy data retrieval, as well as a pre-defined DB schema which should simplify app logic. Data integrity issues remain, however. The other obvious downside is that you will run out of pre-defined columns, which is what you are usually trying to avoid with this type of solution.
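Option 3 might look roughly like this, sketched in SQLite through Python's sqlite3; the tables Contact and CustomColumnLabel, the helper labelled_values and the sample values are all hypothetical. The app reads the per-client labels to decide what to display, and a CHECK keeps each label pointing at a spare column that actually exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Spare, generically named columns created up front by the designers.
CREATE TABLE Contact (
    contactId INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    Text1 TEXT, Text2 TEXT, Text3 TEXT,
    Number1 REAL, Number2 REAL
);
-- Header text each client gives to the spare columns it uses.
CREATE TABLE CustomColumnLabel (
    clientId   INTEGER NOT NULL,
    columnName TEXT    NOT NULL CHECK (columnName IN
                   ('Text1', 'Text2', 'Text3', 'Number1', 'Number2')),
    header     TEXT    NOT NULL,
    PRIMARY KEY (clientId, columnName)
);
INSERT INTO Contact (contactId, name, Text1, Number1) VALUES (1, 'Ann', 'North', 5000);
INSERT INTO CustomColumnLabel VALUES (5, 'Text1', 'Region'), (5, 'Number1', 'Credit limit');
""")


def labelled_values(conn, client_id, contact_id):
    """Return {header: value} for the spare columns this client has labelled."""
    labels = conn.execute(
        "SELECT columnName, header FROM CustomColumnLabel WHERE clientId = ?",
        (client_id,),
    ).fetchall()
    cur = conn.execute("SELECT * FROM Contact WHERE contactId = ?", (contact_id,))
    names = [d[0] for d in cur.description]  # column names of the fetched row
    row = dict(zip(names, cur.fetchone()))
    return {header: row[column] for column, header in labels}


print(labelled_values(conn, 5, 1))
```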
As with most design issues, there are tradeoffs to each solution. My experience has been that while many users/clients say they want solutions like these, in reality they are simply trying to ensure they don't get trapped with an app that can't grow with their needs. I have found that there are actually very few places where a design like this is needed. I can almost always create a design that addresses the expansion desires of the client without putting them into the role of database designer.
"The system to be built will allow user to add new columns to a table in the system..."
Really - that's the user story? Sounds like you've already made up your mind on the solution, to me.
Whether it's a good idea or not to allow a user to extend schemas is pretty context dependent. I'd have little problem in an admin-like, limited use way. But it'd be a horribly bad idea in a MySpace type way. I suspect your situation lies somewhere between those 2 extremes.
Extending the schema would lead to far more efficient queries, since you could add indexes and such, but it does expose your users to some relational rules. Also, the extension would (probably) lock the entire table, and concurrent edits would need to be dealt with.
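To illustrate that trade-off, here is a hedged sketch of user-driven schema extension with an optional index, in SQLite through Python's sqlite3 (in SQL Server the statements would be ALTER TABLE ... ADD and CREATE INDEX). The function add_user_column, the type whitelist and the identifier rule are assumptions. Because a column name cannot be passed as a query parameter, user input is validated before it is placed in the DDL. Locking and concurrent edits are not handled here.

```python
import re
import sqlite3

# Hypothetical whitelist of user-facing types mapped to SQL types.
ALLOWED_TYPES = {"text": "TEXT", "number": "REAL", "integer": "INTEGER"}
# Identifiers cannot be bound as query parameters, so validate them instead.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,29}")


def add_user_column(conn, table, column, user_type, indexed=False):
    """Add a user-requested column to a trusted table name, optionally indexed."""
    if not IDENTIFIER.fullmatch(column):
        raise ValueError(f"invalid column name: {column!r}")
    if user_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {user_type!r}")
    existing = {row[1].lower() for row in conn.execute(f"PRAGMA table_info({table})")}
    if column.lower() in existing:
        raise ValueError(f"column already exists: {column!r}")
    conn.execute(f'ALTER TABLE {table} ADD COLUMN "{column}" {ALLOWED_TYPES[user_type]}')
    if indexed:
        # A real column can be indexed, which is where the efficiency gain comes from.
        conn.execute(f'CREATE INDEX "ix_{table}_{column}" ON {table} ("{column}")')


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
add_user_column(conn, "items", "color", "text", indexed=True)
conn.execute("INSERT INTO items (name, color) VALUES ('shirt', 'Green')")
print(conn.execute("SELECT name FROM items WHERE color = 'Green'").fetchall())
```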
If this is centrally hosted by you, I would suggest NOT allowing user-input data to change the schema of the database (i.e. drive the creation of new tables).
Rather, you may want to look into using XML fields in SQL to store data with variable field names, or a more generic table structure. This technique works pretty well if we're not talking about crazy amounts of data.
Is it possible you're looking at your solution sideways? It sounds like you need a mapping table (sort of like your #1). You have a table, say "objects" for example, a table called "properties" which holds what you're calling columns and then a table that holds the values, so it just has object_id, property_id, value.
To put it more clearly than I did, take a look at the Entity-attribute-value model.
