Database design - one database, multiple sites

I know this question has been asked so many times before but I couldn't find the exact answer for mine. So please let me ask it here.
We built a CMS to control one site. Now the company is expanding and we have a couple more sites with almost identical core structure. We decided to go with one database for easy maintenance later on.
We have about 10 tables. For example, Pages, News, Settings (different for each site), ...
Adding an App_ID (or Site_ID) column to each of these tables, so we know which records belong to which site, doesn't sound like a good idea.
For instance,
PAGES:
PageID | Body | SiteID
1 | abc | 1
2 | cde | 1
3 | aafd | 2
4 | gsgs | 2
5 | feg | 3
I think it is very redundant to add this SiteID column to every table in this database.
I looked carefully at the Multi-Tenant Architecture but I don't know how to apply it to our sites CMS.
What is the best way to handle this situation, please help. Any enlightenment is appreciated.

We've recently been reviewing various strategies for multi-tenant single database.
If you don't want to add a site identifier to your tables, you could give each tenant its own schema and set of tables. Each tenant could have its own connection string that only provides access to its schema. You would need to switch connections at runtime, which is easy enough with both EF and NH if you are using them.
However, the option we chose was to introduce an additional level into our application such that each component of the application (in your case News, Pages etc.) was represented as a feature in the database.
Each site then has a collection of feature "instances" and the data stored for each of those features (a blog might have posts, tags, categories) has a reference to the feature instance (not the site).
This adds complexity, but we have found it to be extremely flexible, and it decouples our feature data from the site (making it possible to move feature instances between sites if we wanted to).
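A minimal sketch of what such a feature-instance layout could look like, using Python's built-in sqlite3 as a stand-in for the real database. The table and column names (sites, feature_instances, pages, instance_id) are assumptions made for illustration, not the schema the answer's author actually uses.

```python
import sqlite3

# Hypothetical schema: pages point at a feature instance, not at a site.
SCHEMA = """
CREATE TABLE sites (site_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature_instances (
    instance_id INTEGER PRIMARY KEY,
    site_id     INTEGER NOT NULL REFERENCES sites(site_id),
    feature     TEXT NOT NULL              -- e.g. 'pages', 'news'
);
CREATE TABLE pages (
    page_id     INTEGER PRIMARY KEY,
    instance_id INTEGER NOT NULL REFERENCES feature_instances(instance_id),
    body        TEXT NOT NULL
);
"""

def pages_for_site(conn, site_id):
    """Return page bodies reachable from a site through its feature instances."""
    rows = conn.execute(
        """SELECT p.body FROM pages p
           JOIN feature_instances fi ON fi.instance_id = p.instance_id
           WHERE fi.site_id = ? ORDER BY p.page_id""",
        (site_id,),
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO sites VALUES (?, ?)", [(1, "site-a"), (2, "site-b")])
conn.executemany("INSERT INTO feature_instances VALUES (?, ?, ?)",
                 [(10, 1, "pages"), (20, 2, "pages")])
conn.executemany("INSERT INTO pages VALUES (?, ?, ?)",
                 [(1, 10, "abc"), (2, 10, "cde"), (3, 20, "aafd")])

# Moving a whole feature instance to another site is a single UPDATE;
# the page rows themselves are not touched.
conn.execute("UPDATE feature_instances SET site_id = 2 WHERE instance_id = 10")
```

The trade-off is an extra join on every read in exchange for data tables that carry no site column at all.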

Mixed answer. I work on a similar project: same database structure, several sites.
We tried several things.
We are fans of "best practices", "database normalization" and "design patterns", but we ended up using a practical approach rather than a theoretical one.
We had one or more databases for each site / company, and each database table had a "site_id", and it worked.
We had some cases where a single company decided to split its site database by division, so one database became several databases, sometimes within the same database server, sometimes on different database servers.
We had a case where a company with a single site bought a smaller company and added a new site with the same database structure, and after 5 years they merged the data.
Several sites plus "site_id" worked well.
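For reference, here is a small sketch of the plain "site_id" approach using the PAGES sample from the question. It uses Python's sqlite3 purely for illustration; the SiteScope helper is a hypothetical name, and the idea is only that every query goes through one object that always applies the site filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (page_id INTEGER PRIMARY KEY, body TEXT, "
    "site_id INTEGER NOT NULL)"
)
# An index on site_id keeps per-site lookups cheap as the table grows.
conn.execute("CREATE INDEX idx_pages_site ON pages(site_id)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [(1, "abc", 1), (2, "cde", 1), (3, "aafd", 2), (4, "gsgs", 2), (5, "feg", 3)],
)

class SiteScope:
    """Hypothetical helper: binds a connection to one site so callers
    cannot forget the WHERE site_id = ? condition."""

    def __init__(self, conn, site_id):
        self.conn = conn
        self.site_id = site_id

    def pages(self):
        return self.conn.execute(
            "SELECT page_id, body FROM pages WHERE site_id = ? ORDER BY page_id",
            (self.site_id,),
        ).fetchall()

site_two_pages = SiteScope(conn, 2).pages()
```

Keeping the filter in one place addresses the usual worry about scattering the site condition across every query.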

Multiple databases would be the recommended way since it allows simpler backup and restoration of a single site. Also, if you find you need to introduce replication slaves, replication can be more efficient with the use of separate databases. Most multi-site hosting solutions I know of, use multiple databases.
However, if you are intent on a single database, the other option is to duplicate the tables with a prefix indicating which site it belongs to.

Related

At what point do you need more than one table in dynamodb?

I am working on an asset tracking system that also manages the concept of "projects". The users of this application perform maintenance activities on their customer's assets, so they need an action log where actions on an asset start life as a task in a project. For example, "Fix broken frame" might be a task where an action would have something like "Used parts a, b, and c to fix the frame" with a completed time and the employee who performed the action.
The conceptual data model for the application starts with a Customer that has multiple locations and each location has multiple assets. Each asset should have an associated action log so it is easy to view previous actions applied to that asset.
To me, that should all go in one table based upon the logical ownership of that data. Customer owns Locations which own Assets which own Actions.
I believe I should have a second table for projects as this data is tangential to the Customer/Location/Asset data. However, because I read so much about how it should all be one table, I'm not sure if this delineation only exists because I've modeled the data incorrectly because I can't get over the 3NF modeling that I've used for my entire career.

Single-table design doesn't forbid you from creating multiple tables. Instead, it encourages you to use a single table per microservice (meaning: store correlated data, which you want to access together, in the same table).
Let's look at some anecdotes from experts:
Rick Houlihan tweeted over a year ago
Using a table per entity in DynamoDB is like deploying a new server for each table in RDBMS. Nobody does that. As soon as you segregate items across tables you can no longer group them on a GSI. Instead you must query each table to get related items. This is slow and expensive.
Alex DeBrie responded to a tweet last August
Think of it as one table per service, not across your whole architecture. Each service should own its own table, just like with other databases. The key around single table is more about not requiring a table per entity like in an RDBMS.
Based on this, you should ask yourself ...
How related is the data?
If you'd build using a relational database, would you store it in separate databases?
Are those actually 2 separate micro services, or is it part of the same micro service?
...
Based on the answers to those (and similar) questions you can argue to either keep it in one table, or to split it across 2 tables.
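To make the "store it together" option concrete, here is a rough sketch of how the Customer / Location / Asset / Action hierarchy from the question might share one table through a composite key. It is plain Python with no AWS calls: the PK/SK attribute names and the "LOC#...#ASSET#..." format are assumptions for illustration, and the query function only mimics a DynamoDB Query with partition-key equality plus begins_with on the sort key.

```python
def make_item(customer, location=None, asset=None, action_ts=None, **attrs):
    """Build an item whose sort key encodes its place in the hierarchy."""
    parts = []
    if location:
        parts += ["LOC", location]
    if asset:
        parts += ["ASSET", asset]
    if action_ts:
        parts += ["ACTION", action_ts]
    sort_key = "#".join(parts) if parts else "META"
    return {"PK": f"CUSTOMER#{customer}", "SK": sort_key, **attrs}

def query(table, pk, sk_prefix=""):
    """Stand-in for a Query: PK must match, SK must start with sk_prefix,
    results come back ordered by SK."""
    hits = [i for i in table if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(hits, key=lambda i: i["SK"])

table = [
    make_item("c1", name="Acme"),
    make_item("c1", "l1"),
    make_item("c1", "l1", "a1"),
    make_item("c1", "l1", "a1", "2024-02-01", note="Replaced part b"),
    make_item("c1", "l1", "a1", "2024-01-05", note="Fixed frame"),
    make_item("c1", "l1", "a2"),
]

# The action log of one asset is a single key-prefix lookup.
action_log = query(table, "CUSTOMER#c1", "LOC#l1#ASSET#a1#ACTION#")
```

Projects could then live in a second table if, per the questions above, they really belong to a separate service.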

Migrate Plone users and groups to relational data

I have a Plone 4 site which contains a lot of users and groups which are stored in the ZODB. Over time, we added some functionality which uses relational data (in a PostgreSQL database); some tables have fields which contain user or group ids.
However, currently the users and groups are defined in ZODB rather than the RDB, so we don't have proper foreign keys here. Thus, the obvious idea is to migrate the user and groups data to the RDB - those who/which are used by the Plone site, at least; I assume emergency users need to be an exception to this (but those are not members of any groups anyway).
Would this be a good thing to do?
Are there reasons to do it only partly, or should I transfer everything including group memberships? (Since memberships are stored as lists of users (and/or groups) with the containing group, I could imagine a reverse table which holds all groups a user is member of, and which is maintained by a trigger function.)
Are there any special tools to use?
Thank you!

IMHO it depends on what you want to achieve. In Plone you have PAS, so technically it doesn't really matter where you put users, groups and user-group relationships.
You can store users/groups in:
Plone (by default)
SQL - pas.plugins.sqlalchemy
LDAP/AD - Products.PloneLDAP
There are also many other plugins for authentication, like RPX, Google+, etc.
You can enable, disable and modify the behavior of every plugin through PAS.
Does it make sense to NOT use Plone users?
Of course, if you want to share user credentials (for example via LDAP), or if you need the user information in other apps, etc.
Migration
Should be very simple if the PAS plugins you are using support "Properties" and "User enumeration".
Get the data from one plugin and put it into the other with a simple Python script. Both support the same API.

the tool you're looking for is https://pypi.python.org/pypi/pas.plugins.sqlalchemy/0.3
I've used this in a webportal where users are "shared" with a newsletter system.
I have 200 users and no problems.
I think the only "good reason" to store users in an external DB rather than in ZODB/Plone is a use case like mine.
Have you ever thought about extending Plone users (e.g. https://plone.org/products/collective.examples.userdata)? With plone.api you can easily manipulate users' properties in your code.

Create new database programmatically in Asp.Net MVC application?

I have worked on a timesheet application in MVC 2 for internal use in our company. Now other small companies have shown interest in the application. I hadn't considered this use of the application, but it got me interested in what it might imply.
I believe I could make it work for several clients by modifying the database (Sql Server accessed by Entity Framework model). But I have read some people advocating multiple databases (one for each client).
Intuitively, this feels like a good idea, since I wouldn't risk having the data of various clients mixed up in the same database (which shouldn't happen of course, but what if it did...). But how would a multiple database solution be implemented specifically?
I.e. with a single database I could just have a client register and all the data needed would be added by the application the same way it is now when there's just one client (my own company).
But with a multiple database solution, how would I create a new database programmatically when a user registers? Please note that I have done all database stuff using Linq to Sql, and I am not very familiar with regular SQL programming...
I would really appreciate a clear detailed explanation of how this could be done (as well as input on whether it is a good idea or if a single database would be better for some reason).
EDIT:
I have also seen discussions about the single database alternative, suggesting that you would then add ClientId to each table... But wouldn't that be hard to maintain in the code? I would have to add "where" conditions to a lot of linq queries I assume... And I assume having a ClientId on each table would mean that each table would need to have a many to one relationship to the Client table? Wouldn't that be a very complex database structure?
As it is right now (without the Client table) I have the following tables (1 -> * designates one to many relationship):
Customer 1 -> * Project 1 -> * Task 1 -> * TimeSegment 1 -> * Employee
Also, Customer has a one to many relationship directly with TimeSegment, for convenience to simplify some queries.
This has worked very well so far. Wouldn't it be possible to simply have a Client table (or UserCompany or whatever one might call it) with a one to many relationship with Customer table? Wouldn't the data integrity be sufficient for the other tables since the rest is handled by the relationships?

as far as whether to use a single database or multiple databases, it really all depends on the use cases. more databases means more management needs, potentially more disk space needs, etc. there are a lot more things to consider here than just how to create the database, such as how you will automate setting up the backup process, etc. i personally would use one database with a good authentication system that would filter the data to the appropriate client.
as to creating a database, check out this blog post. it describes how to use SMO (sql management objects) in c#.net to create a database. they are a really neat tool, and you'll definitely want to familiarize yourself with them.
to deal with the follow up question, yes, a single, top level relationship between clients and customers should be enough to limit the new customers to their appropriate data.
without any real knowledge about your application i can't say how complex adding that table will be, but assuming your data layer is up to snuff, i would assume you'd really only need to limit the customers class by the current client, and then get all the rest of your data based on the customers that are available.
did that make any sense?
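A small sketch of the single top-level relationship described above, with Python's sqlite3 standing in for SQL Server and Entity Framework. The Client table and the ClientID column are the hypothetical additions; Customer and Project follow the question's model, and everything below Customer is reached through joins without a ClientID of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Client   (ClientID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT,
                       ClientID INTEGER NOT NULL REFERENCES Client(ClientID));
CREATE TABLE Project  (ProjectID INTEGER PRIMARY KEY, Name TEXT,
                       CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID));
INSERT INTO Client   VALUES (1, 'My company'), (2, 'Other company');
INSERT INTO Customer VALUES (10, 'Customer A', 1), (20, 'Customer B', 2);
INSERT INTO Project  VALUES (100, 'Website', 10), (200, 'Audit', 20),
                            (300, 'Intranet', 10);
""")

def projects_for_client(conn, client_id):
    """Projects are limited to a client only through their Customer row."""
    rows = conn.execute(
        """SELECT p.Name FROM Project p
           JOIN Customer c ON c.CustomerID = p.CustomerID
           WHERE c.ClientID = ? ORDER BY p.ProjectID""",
        (client_id,),
    )
    return [r[0] for r in rows]
```

The same pattern extends to Task and TimeSegment by adding further joins up the chain to Customer.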

See my answer here, it applies to your case as well: c# database architecture

I need to Edit 100,000+ Products

I'm looking at accepting a project that would require me to clean up an existing e-commerce website. It's been relatively successful and has over 100,000 individual products - loaded both by the client and its publishers.
The site wasn't originally designed for this many products and has become fairly disorganized.
SO, the client has asked I look at a more robust search option - filterable and so forth. I completely agree it needs to be improved, but after looking at the database, I can tell that there are dozens and dozens of categories and not everything is labeled correctly etc.
Is there any database management software that could help me clean up 100,000 entries quickly? Make categories consistent - fix uppercase/lowercase problems etc.
Are there any companies out there that I can source just this particular part of the project to?
It's a massive amount of data entry. At 2 minutes per product, 100,000 products come to roughly 3,300 hours, which is well over a year of full-time work just to complete the database cleanup. I either need to get it down to a matter of seconds per product or find a company that specializes in this type of work.
I don't even know what to search for on Google.
Thanks guys!
--
Thanks everyone for your ideas! I have a lot of options now so I feel a lot more comfortable heading in to this project. Right now I think the direction we will go is to build a tool that allows the client to hire data entry people that can update it as necessary. Then I will work as a consultant, taking care of any UPDATE-WHERE type functions as necessary.
Thanks again!

If there are inconsistencies like the ones you are describing, it sounds like the problem may be more an issue of a bad data model (i.e. lack of normalization) than just dirty data. If good normalization is in place, cleaning up categories should be as simple as updating a single record per category; but if the category name is used instead of a foreign key, you will most likely need to perform a series of UPDATE WHERE statements to clean up the text.
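As an illustration of the UPDATE WHERE style of cleanup, here is a minimal sketch in Python with sqlite3. The products table, its category text column and the normalize_category helper are assumptions for the example; the real site's schema is not known.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category TEXT)"
)
conn.executemany(
    "INSERT INTO products (name, category) VALUES (?, ?)",
    [("Chess set", "Games"), ("Checkers", "games "),
     ("Dice", "GAMES"), ("Bread", "food")],
)

def normalize_category(conn, canonical):
    """Rewrite every case/whitespace variant of a category to one spelling.
    Returns the number of rows that were actually changed."""
    cur = conn.execute(
        "UPDATE products SET category = ? "
        "WHERE lower(trim(category)) = lower(?) AND category <> ?",
        (canonical, canonical, canonical),
    )
    return cur.rowcount

changed = normalize_category(conn, "Games")
categories = {row[0] for row in conn.execute("SELECT DISTINCT category FROM products")}
```

One statement per canonical category handles capitalization and stray whitespace in bulk; genuinely mislabeled products still need a human decision.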
You may want to look into an ETL (extract, transform, load) tool that can help with bulk data transformation. I'm not familiar with ETL tools for MySQL, but I'm sure they exist. SQL Server has a built-in service called SQL Server Integration Services (SSIS) that provides the ability to extract data from an existing data source, perform bulk changes or transformations, and then reload the data back into a destination database. Tools like this may help speed up the process of standardizing capitalization, punctuation, changing categories etc.
Even still, don't overlook the possibility that the data model may need tweaking to help prevent this type of situation in the future.
Edit: Wikipedia has a list of opensource ETL products that you may want to investigate.

In any case you'll probably need to do more than "clean the data", which means you'll need to build new normalized tables. So start there: build a new database that is fully normalized and import the data "as is", with all the duplicate categories, etc.
for example, new tables:
Items
ItemID int identity/auto number
ItemName string
CategoryID int
....
Categories
CategoryID int identity/auto number
CategoryName string
....
import the bad data into the new system:
Items
ItemID | ItemName | CategoryID
1 | thing A | 1
2 | thing B | 2
3 | thing C | 3
4 | thing D | 1
Categories
CategoryID | CategoryName
1 | Game
2 | food
3 | games
now you can consolidate the data using the PKs:
UPDATE Items
SET CategoryID=1
WHERE CategoryID=3;
DELETE FROM Categories
WHERE CategoryID=3;
You might just write an application where the customer can do the consolidation. Let them select the duplicates on a screen and merge them into a selected parent category; the application then runs the merge SQL from above.
If there are issues of needing to have a clean cut over date, create an application that generates a series of "Map" tables, where you store the CategoryNameOld="games" and the CategoryNameNew="Game" and use these when you do the conversion/load of the bad data into the new system's tables.
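One possible way to drive the merge from such a map table is sketched below in Python with sqlite3. The Items and Categories data mirror the example above; the CategoryMap table name and the consolidate function are assumptions, and this version applies the map after the load rather than during it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT);
CREATE TABLE Items (ItemID INTEGER PRIMARY KEY, ItemName TEXT, CategoryID INTEGER);
CREATE TABLE CategoryMap (CategoryNameOld TEXT, CategoryNameNew TEXT);
INSERT INTO Categories VALUES (1, 'Game'), (2, 'food'), (3, 'games');
INSERT INTO Items VALUES (1, 'thing A', 1), (2, 'thing B', 2),
                         (3, 'thing C', 3), (4, 'thing D', 1);
INSERT INTO CategoryMap VALUES ('games', 'Game');
""")

def consolidate(conn):
    """Repoint items from each old category to its replacement,
    then delete the old category. Returns the number of merges."""
    pairs = conn.execute(
        """SELECT o.CategoryID, n.CategoryID
           FROM CategoryMap m
           JOIN Categories o ON o.CategoryName = m.CategoryNameOld
           JOIN Categories n ON n.CategoryName = m.CategoryNameNew"""
    ).fetchall()
    for old_id, new_id in pairs:
        conn.execute("UPDATE Items SET CategoryID = ? WHERE CategoryID = ?",
                     (new_id, old_id))
        conn.execute("DELETE FROM Categories WHERE CategoryID = ?", (old_id,))
    return len(pairs)

merged = consolidate(conn)
```

The map rows are exactly what a merge screen for the customer would produce, so the same function can sit behind that tool.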

I would implement the new search system or whatever and build them a tool that would allow them to easily go through and cleanup the listings, re-categorize, etc. This task requires domain knowledge, so they're the best ones to do it.
Do some number crunching so they can prioritize the list and clean in order of importance.

Keep in mind that one of your options is to build a crappy interface that somebody can use to edit records, hire half a dozen data-entry people from a temp agency, spend two days training them, and let them go to town.

SQL Server hosting only offers 1GB databases. How do I split my data up?

Using ASP.NET and Windows Stack.
Purpose:
I've got a website that takes in over 1GB of data about every 6 months. So as you can tell my database can become huge.
Problem:
Most hosting providers only offer databases in 1GB increments. This means that every time I go over another 1GB, I will need to create another database. I have absolutely no experience with this type of setup and I'm looking for some advice on what to do.
Wondering:
Do I move the membership stuff over to a separate database? This still won't solve much because of the size of the other data I have.
Do I archive data into another database? If I do, how do I allow users to access it?
If I split the data between two databases, do I name the tables the same?
I query all my data with LINQ. So establishing a few different connections wouldn't be a horrible thing.
Is there a hosting provider that anyone knows of that can scale their databases?
I just want to know what to do? How can I solve this dilemma? I don't have the advertising dollars coming in to spend more than $50 a month so far...
While http://www.ultimahosts.net/windows/vps/ seems to offer the best solution for the best price, they still split the databases up. So where do I go from here?
Again, I am a total amateur with multiple databases. I've only used one at a time..

I'd be genuinely surprised if they actually impose a hard 1GB per DB limit and create a new one for each additional GB, but on the assumption that that actually is the case -
Designate a particular database as your master database. This is the only one your app will directly connect to.
Create a clone of all the tables you'll need in your second (and third, fourth etc) databases.
Within your master database, create a view that does a UNION on the tables as a cross-DB query - SELECT * FROM Master..TableName UNION SELECT * FROM DB2..TableName UNION SELECT * FROM DB3..TableName
For writing, you'll need to use sprocs to locate the relevant records and update them, but you shouldn't have a major problem there. In principle you could extend the view above to return which DB the record was in if you wanted.
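The steps above can be sketched roughly as follows. This uses Python's sqlite3 with an attached in-memory database in place of SQL Server's cross-database names, so the syntax differs from Master..TableName. The orders table, the source_db column and the update_note function are hypothetical, and UNION ALL is used here instead of UNION so that no rows are dropped by deduplication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                 # plays the "master" database
conn.execute("ATTACH DATABASE ':memory:' AS db2")  # the overflow database

# Clone the same table structure into every database.
for schema in ("main", "db2"):
    conn.execute(f"CREATE TABLE {schema}.orders (id INTEGER PRIMARY KEY, note TEXT)")
conn.execute("INSERT INTO main.orders VALUES (1, 'first')")
conn.execute("INSERT INTO db2.orders VALUES (2, 'second')")

# One view over all copies; source_db records where each row lives.
conn.execute("""
    CREATE TEMP VIEW all_orders AS
    SELECT 'main' AS source_db, id, note FROM main.orders
    UNION ALL
    SELECT 'db2' AS source_db, id, note FROM db2.orders
""")

def update_note(conn, order_id, note):
    """Locate the database holding the row, then update it there."""
    row = conn.execute(
        "SELECT source_db FROM all_orders WHERE id = ?", (order_id,)
    ).fetchone()
    if row is None:
        return False
    # The schema name comes from the view's own constants, not from user input.
    conn.execute(f"UPDATE {row[0]}.orders SET note = ? WHERE id = ?",
                 (note, order_id))
    return True

rows = conn.execute("SELECT source_db, id, note FROM all_orders ORDER BY id").fetchall()
```

Reads only ever touch the view, while writes do the lookup-then-update that the answer assigns to stored procedures.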

Answering this question is very hard, for it requires knowing at least some basic facts about the data model, the way the data is queried, etc. Also, as suggested by rexem, a better understanding of the usage model may allow using normalization to limit the growth (and, I'd add, may also allow introducing compression, if applicable).
I'm more puzzled by the general approach and business model (and I do understand the need to keep costs down with a startup application based on ad revenue). Wouldn't you be able to contract an amount of storage that will fit your needs for the next 6 months and then, when you start outgrowing this space, purchase additional storage (for an extra 6 months or a year; by then you may be "rich")? This may not even require anything on your end (depending on the way the hosting service manages racks etc.) or, at worst, may require you to copy the old database to the new (bigger) storage.
In this fashion, you wouldn't need to split the database in any artificial way, and could hence focus on customer-oriented features rather than on optimizing queries that need to compile info from multiple servers.

I believe the solution is much simpler than that: even if your provider manages databases in 1 GB increments, it does not mean that you have N databases of 1 GB each; it means that once you reach 1 GB the database can be increased to 2 GB, 3 GB and so on...
Regards
Massimo

You would have multiple questions to answer:
It seems the current hosting provider cannot be very reliable if it works the way you say: they create a new database every time the initial one grows beyond 1GB. This sounds strange... At the very least they should increase the storage for the current DB and notify you that you'll be charged more... Find another hosting solution with better options...
Is there any information in your current DB that could be archived? That's a very important question, since you may be carrying "useless" data that could be archived into separate databases and queried only for special requests. As others have told you already, that is difficult for us to evaluate since we do not know the data model.
Can you split the data model into two totally different stores and replicate only the common information between them? You could use SQL Server Replication (http://technet.microsoft.com/en-us/library/ms151198.aspx) to maintain the same membership information between the databases.
If the data model cannot be split, then I do not see any practical way to use multiple databases - just find a bigger storage solution.

You may want to look for a better hosting provider.
Even SQL Express supports a 4GB database, and it's free. Some hosts don't like using SQL Express in a shared environment, but disk space is so cheap these days that finding a plan that starts at or grows in chunks of more than 1GB should be pretty easy.

You should go for a Windows VPS solution. Most of the Windows VPS providers will offer SQL 2008 Web Edition that can support up to 10 GB of database space ...