sql server database design - asp.net

I am planning to create a website using ASP.NET and SQL Server. However, my plan for the database design leaves me wondering if there is a better way.
The website will serve as a repository of information for various users. I figure I would have two databases, a Membership and Profile database.
The profile database would contain user data for all users, where each user may have ~20 tables. I would create the tables when the user account is created and generate a key used to name the tables. The tables are not directly related.
For Example a set of tables for two different users could look like:
User1 Tables - TransactionTable_Key1, AssetTable_Key1, ResearchTable_Key1 ....;
User2 Tables - TransactionTable_Key2, AssetTable_Key2, ResearchTable_Key2 ....;
The Key1, Key2 etc.. values would be retrieved based on the MembershipID data when the account was created. This could result in a very large number of tables over time. I'm not sure if this will limit scalability by setting up the database in this way. Any recommendations?
Edit: I should mention that some of these tables would contain 20k+ rows.

Realistically it sounds like you only really need one database for this.
From the way you worded your question, it sounds like you're trying to dynamically create tables for users as they create accounts. I wouldn't recommend this method.
What you want to do is create a master table that contains a primary key for each individual user. I'm assuming this is the Membership table. Then create the ~20 tables that you need for the profiles of these members. Every record, no matter the number of users that you have, will go into these tables. These 20 tables would need to have a foreign key pointing to the unique identifier of the Membership table.
When you want to query a Member for their user information, just select from the tables where the membership table's primary Id matches the foreign key in the profile tables.
This would result in only a few tables in the end and is easily maintainable and follows better database design.

Your ORM layer (EF, LINQ, DAL code) will hate having to deal with one set of tables per tenant. It is much better to have either one set of tables for all tenant in a single database, or a separate database per tenant. The later is only better if schema upgrade has to be vetted by tenant (like Salesforce.com has). If you can afford to upgrade all tenant to a new schema at once then there is no reason for database per tenant.
When you design a schema that hold multiple tenant the important things to remember are
don't use heaps, all tables must be clustered index
add the tenant ID as the leftmost key to every clustered
add the tenant ID as the leftmost key to every non-clustered index too
add the Left.tenantID = right.tenantID predicate to every join
add the table.TenantID = #currentTenantID to every query
These are fairly simple rules and if you obey them (with no exceptions) you will get a perfect partitioning per tenant of every query (no query will ever ever scan rows in a range of a different tenant) so you eliminate contention between tenants. To be more through, you can disable lock escalation to make sure no tenant escalates to block every other tenant.
This design also lends itself to table partitioning and to sharing the database for scale-out.

You definitely don't want to create a set of tables for each user, and you would want these only in one database. Even with SQL Server 2008's large capacity for tables (note really total objects in database), it would quickly become unmanageable. Your best bet is to use 20 tables, and separate them via a column into user areas. You might consider partitioning the tables by this user value, but that should be tested for performance reasons too.

Yes, since the tables only contain id, key, and value, why not make one single table?
Have the columns:
id, user ID, key, value
Put an Index on the user ID field.
A key idea behind a relational database is that the table structure does not change. You create a solid set of tables, and these are the "bones" of your application.
Cheers,
Daniel

Neal,
The solution really depends on your requirement. If security and data access are concern and you have only a handful of users, you can set up a different db for each user with access for him set to only his/her database.
Other wise, what Daniel Williams suggested is a good alternative where you have one DB and tables laid out with a indexed column partitioning the users data rows.

It's hard to tell from the summary, but it looks like you are designing for dynamic attribution by user. This design approach is called EAV (Entity-Attribute-Value) and consists of a simple base collection key (UserID, SiteID, ProductID...) and then rows consisting of name/value pairs. In a more complex version, categories are sometimes added as "super columns" to the tuple/row and provide sub-groupings for a set of name/value pairs.
Designing in this way moves responsibility for data type integrity, relational integrity and tuple integrity to the application layer.
The risk with doing this in a relational system involves the breaking of the tuple or row into a set of rows. Updates, deletes, missing values and the definition of a tuple are no longer easily accessible through human interaction. As your application evolves and the definition of a tuple changes, it becomes almost impossible to tell if a name/value pair is missing because it's part of an earlier-version tuple or because it was unintentionally deleted. Ad-hoc research as well becomes harder to manage as business analysts must keep an understanding of the virtual structure either in their heads or in documentation provided.
If you are looking to implement an EAV model, I would suggest you look at a non-relational solution (nosql) like MongoDB or CouchDB. These stores allow a developer to save and retrieve "documents" or json-formatted messages that are essentially made up of a collection of name/value pairs and can look very much like a serialized object. The advantage here is that you can store dynamic attribution without breaking your tuple. You always know that you have a complete tuple because you can store and retrieve it as a single "blob" of information that can be serialized and deserialized at-will. You can also update single attributes within the tuple, if that's a concern.
MongoDB also provides some database-like features such as multiple-attribute indexes, a query engine that is robust in comparison to other similar non-relational offerings and a sharding solution that is much less trouble than trying to do it with MySQL.
I hope this helps.

Related

Are multiple dynamoDB queries in a single API request bad practice

I'm trying to create my first DynamoDB based project and I'm having some trouble figuring out the best practices working with a NoSQL database.
My usecase currently is storing users and teams. I have a table that has a partition key of either USER#{userId} or TEAM{#teamId}. If the PK is TEAM{#teamId} I store records with SK either TEAM#{teamId} for team details, or USER#{userId} for the user's details in the team (acceptedInvite, joinDate etc). I also have a GSI based on the userId/email column that allows me to query all the teams a user has been invted to, or the user's team, depending on the value of acceptedInvite field. Attached screenshots of the table structure at the moment:
The table
The GSI
In my application I have an access pattern of getting a team's team members, given a user id.
Currently, I'm doing two queries in my lambda function:
Get user's team, by querying the GSI on PK = {userId} and fitler acceptedInvite = true
Get the team data by querying the table on PK = {teamId} and SK begins_with USER#
This works fine, but I'm concerned I need to preform two separate DynamoDB calls in my API function.
I'm wondering if there's a better way to represent this access pattern and if multiple dynamoDB calls are actually that bad, since I cannot see another way to do this.
Any kind of feedback is appreciated!
The best way to avoid making two queries like this is to supply the API caller with all the information needed to make a single DynamoDB request. For your case this means supplying the caller with the teamId. You can do this as either as part of a list operation response, or if it is the authenticated user, then as part of their claims in a JWT.

DynamoDB Modeling Multiple Query Elements

Background: I have a relational db background and have never built anything for DynamoDB that wasn't just used for fast writes with very few reads. I am trying to learn DynamoDB patterns by migrating one of my help desk apps from MySQL to DynamoDB.
The application is a fairly simple one from a data storage perspective. A user submits a request and that request generates 1 or more tickets.
Setup: I have screens where people see initial requests and that request's tickets and search views that allow support to query on a bunch of attributes of a ticket (last name of user, status of ticket, use case of ticket, phone number of user, dept of user). This design in a SQL db is pretty straightforward but in Dynamo, I'm really being thrown for a loop on how to structure primary/sort keys and secondary indexes (if necessary).
I created one collection for requests and one collection for tickets. The individual requests have an array of ticket ids that belong to it. The ticket item has an attribute that stores the request id so that I can search that way. But what I am hung up on, is how do I incorporate searching on a ticket/request's attributes without having to do a full scan?
I read about composite keys and perhaps creating a composite sort key similar to: ## so that I can search on each of those fields directly without having to know the primary key (ticket id).
Question: How do you design dynamo collections/tables that require querying a lot of different attribute values without relying on a primary key?
This is typically something that DynamoDB is not good at, not to say it definitely cannot be done. The strength and speed for DynamoDB comes from having well known access patterns and designing your schema for these patterns. In general if you don't know what your users will search for, or there are many different possible queries, it's better to look at something like RDS or a native SQL DB. That being said a possible direction to solve this could be to create multiple lists for each of the fields and duplicate the data. This could all be done in the same table.

DynamoDB optimized search for common parent

So Im designing currently three tables, an organization, organization_relationships, members.
Organization
OrgID PK
Metdata..
Org_Relationships
ParentOrgID PK
ChildOrgID Range/GSI
Member
OrgID PK
MemberID Range/GSI
One way that I need to access data, is by determining whether two members share a parent organization. With the way this is right now, I would basically have to do a weird search on the tables, that requires multiple calls to the table to determine whether two members belong to the same parent organization. With that being said is there a more efficient way of designing the table to do this without requiring multiple calls to the table.
The reason you're having to perform multiple queries is because you've modeled the relationship across several tables. This is a common approach when using traditional relational databases, but could be considered an anti-pattern with NoSQL databases.
Keep in mind that DynamoDB does not have a join operation like SQL databases. Therefore, it is a best practice to store related data in the same DynamoDB table. This can be counter-intuitive if you're used to working with relational DBs.
There are several ways to model your data in DynamoDB. The approach you choose depends on your access patterns. In other words, you store your data in a way that makes it easier to get the data your application needs.
For example, here's one way to model Users and Organizations:
The primary key is made up of a user id (e.g. USER#) and a sort key of META. This record (called an "item") in DynamoDB is where I'll define various user attributes. In this example, I've provided a name and an org attribute.
For illustrative purposes, I've also created a global secondary index (GSI) that swaps the partition key/sort key pattern in your base table. Your GSI will look like this:
This lets you fetch all users by organization.
If I wanted to check if two users are in the same organization, I can either query the GSI, or fetch both user records and compare the org fields.
This is just an example meant to give you a starting point with NoSQL design. The key takeaways here are:
NoSQL (or non-relational) data modeling is different than SQL (relational) data modeling.
You want to store related data in the same table.
How you store your data depends entirely on how you plan to use the data.

Is there a way to enforce a schema constraint on an AWS DynamoDB table?

I'm a MSSQL developer who recently was tasked with building a new application using DynamoDB since we use AWS and we wanted a highly scaleable database service.
My biggest concern is data integrity. For example, I have a table for all my users where every row needs to have a username, email, and name field, all strings, with a verified field that's an int. Is there anyway to require all entries in that table to have those fields and to be of that particular type?
Since the application is in PHP I'm using Kettle as my ORM which should prevent me from messing up the data integrity but another developer voiced a concern about if we ever add another application or if someone manually changes some types via the console.
https://github.com/inouet/kettle
Currently, no, you are responsible for maintaining the integrity of your items with respect to the existence of attributes that are not keys on the base table. However, you can use LSI and GSI to enforce data types of attributes (notwithstanding my qualm that this is not a recommended pattern, as it could cause partition heat especially for attributes whose range of values is small). For example, verified seems like it might take only 0 or 1 as a value, so if you create a GSI with PK=verified where verified is a Number, writes to the base table may get throttled by the verified GSI.

Storing messages and threads in Windows Azure Table Storage

I am designing a simple messaging service using ASP.NET MVC / Windows Azure Table Storage. I have two kinds of entities - messages and message threads. Relation between them is simple - each thread can have multiple messages but the message can only be assigned to one thread.
Table storage is not a relational DB, so representing relations is always a bit tricky. I need to decide between 2 approaches:
Having one big table for threads and one for messages. And having threadId as a partition key of message entity so that messages are partitioned by threads.
Dynamically creating a special table for each message thread and having threadId as a name of the table.
I tend to prefer the second because it fits better into architecture of the rest of the service. But there will obviously be large number of tables created in a storage account.
Do you think this may be a problem?
You could also consider having just one table, that stores both Thread and Message entities. This would give you transaction support, and you could use Lucifure's hybrid approach on this table.
Creating a large number of tables may be an issue, depending on how you want to manage them. The underlying REST API for listing tables works like a query for table entities. It only returns the first 1000 tables, after that you have to use a continuation token. All of the storage explorers I've seen don't allow you to query tables based on name, they simply like the first 1000 tables. If you end up with 20000 threads, it could take you a while to get to the table you want.
One way you could mitigate this is to put your message table in its own storage account. This way your storage account with all of your other tables won't get crowded out by all of these dynamic tables that you will be creating and possibly deleting.
Deleting is actually one of the ways in which using a separate table for each thread would be easier. To delete all of the related messages you simply have to delete one table rather than iterating over each message and deleting it.
Everything else however will be more complicated than keeping all of the messages in one table. If this is core functionality to your app and you can dedicate enough time to develop it this way, one table per thread is probably a good idea. Otherwise the easy way to do things is with one big table.
You may consider a hybrid approach to keep the number of tables to a manageable level, depending on your scalability needs.
My experience has been that date based partitioning at the table level is a very effective approach and can be leverage across the board.
For example you could partition tables based on date and with a granularity of day or month. So a table name like “Thread201202” could be used for all threads started in February 2012.
Your thread id would implicitly include the “201202” and be something like “201202-myid01” although you would not need to explicitly store it in the partition key since it would be implied in the table name.
Aged threads could then be easily disposed by deleting tables say more than a year old.

Resources