Create new database programmatically in Asp.Net MVC application? - asp.net

I have worked on a timesheet application in MVC 2 for internal use in our company. Now other small companies have shown interest in the application. I hadn't considered this use of the application, but it got me interested in what it might imply.
I believe I could make it work for several clients by modifying the database (SQL Server accessed through an Entity Framework model). But I have read some people advocating multiple databases (one for each client).
Intuitively, this feels like a good idea, since I wouldn't risk having the data of various clients mixed up in the same database (which shouldn't happen of course, but what if it did...). But how would a multiple database solution be implemented specifically?
I.e. with a single database I could just have a client register and all the data needed would be added by the application the same way it is now when there's just one client (my own company).
But with a multiple database solution, how would I create a new database programmatically when a user registers? Please note that I have done all database work using LINQ to SQL, and I am not very familiar with regular SQL programming...
I would really appreciate a clear detailed explanation of how this could be done (as well as input on whether it is a good idea or if a single database would be better for some reason).
EDIT:
I have also seen discussions about the single database alternative, suggesting that you would then add ClientId to each table... But wouldn't that be hard to maintain in the code? I would have to add "where" conditions to a lot of LINQ queries, I assume... And I assume having a ClientId on each table would mean that each table would need to have a many to one relationship to the Client table? Wouldn't that be a very complex database structure?
As it is right now (without the Client table) I have the following tables (1 -> * designates one to many relationship):
Customer 1 -> * Project 1 -> * Task 1 -> * TimeSegment 1 -> * Employee
Also, Customer has a one to many relationship directly with TimeSegment, for convenience to simplify some queries.
This has worked very well so far. Wouldn't it be possible to simply have a Client table (or UserCompany or whatever one might call it) with a one to many relationship with Customer table? Wouldn't the data integrity be sufficient for the other tables since the rest is handled by the relationships?

As far as whether to use a single database or multiple databases, it really all depends on the use cases. More databases mean more management needs, potentially more disk space needs, etc. There are a lot more things to consider here than just how to create the database, such as how you will automate backup creation. Personally, I would use one database with a good authentication system that filters the data to the appropriate client.
As for creating a database, check out this blog post. It describes how to use SMO (SQL Server Management Objects) in C#/.NET to create a database. They are a really neat tool, and you'll definitely want to familiarize yourself with them.
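To give you an idea, here is a minimal sketch of creating a database with SMO; the server name, database name, and naming convention are placeholders you would replace with your own:

using Microsoft.SqlServer.Management.Smo;

// Requires references to Microsoft.SqlServer.Smo.dll and
// Microsoft.SqlServer.ConnectionInfo.dll (installed with the SQL Server client tools).
public static void CreateClientDatabase(string serverName, string databaseName)
{
    // Connect to the SQL Server instance (integrated security by default).
    Server server = new Server(serverName);

    // Create an empty database with default settings; the files go to the
    // server's default data directory.
    Database database = new Database(server, databaseName);
    database.Create();
}

// Hypothetical usage when a new client registers:
// CreateClientDatabase(@".\SQLEXPRESS", "Timesheet_" + clientName);

You would still need to create the schema afterwards, for example by running your existing DDL script against the new database.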
To deal with the follow-up question: yes, a single, top-level relationship between clients and customers should be enough to limit new customers to their appropriate data.
Without any real knowledge of your application I can't say how complex adding that table will be, but assuming your data layer is up to snuff, you'd really only need to filter the Customers class by the current client, and then get the rest of your data based on the customers that are available.
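For example, a rough sketch only, assuming you add a ClientId foreign key to Customer and keep the navigation properties you already have (db and currentClientId are placeholders for your context and the logged-in client):

// One Where clause at the top restricts everything to the current client.
var customers = db.Customers.Where(c => c.ClientId == currentClientId);

// Everything else is reached through the existing relationships.
var timeSegments = customers
    .SelectMany(c => c.Projects)
    .SelectMany(p => p.Tasks)
    .SelectMany(t => t.TimeSegments);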
Did that make any sense?

See my answer here; it applies to your case as well: c# database architecture

Related

Does DynamoDB GSI overloading give performance benefits or just flexibility

Does GSI Overloading provide any performance benefits, e.g. by allowing cached partition keys to be more efficiently routed? Or is it mostly about preventing you from running out of GSIs? Or maybe opening up other query patterns that might not be so immediately obvious.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html
E.g. if you have a base table and you want to partition it so you can query a specific attribute (which becomes the PK of the GSI) over two dimensions, does it make any difference whether you create one overloaded GSI or two non-overloaded GSIs?
For an example of what I'm referring to see the attached image:
https://drive.google.com/file/d/1fsI50oUOFIx-CFp7zcYMij7KQc5hJGIa/view?usp=sharing
The base table has documents which can be in a published or draft state. Each document is owned by a single user. I want to be able to query by user to find:
Published documents by date
Draft documents by date
I'm asking in relation to the more recent DynamoDB best practice that implies that all applications only require one table. Some of the techniques being shown in this documentation show how a reasonably complex relational model can be squashed into 1 DynamoDB table and 2 GSIs and yet still support 10-15 query patterns.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-relational-modeling.html
I'm trying to understand why someone would go down this route as it seems incredibly complicated.
The idea – in a nutshell – is to avoid the overhead of doing joins at the database layer, or of going back to the database to effectively do the join at the application layer. By having the data already sliced in the format your application requires, all you really need is one select * from table where x = y call that returns multiple entities in one go (in your example that could be Users and Documents). This means it will be extremely efficient and scalable at the db level. But it also means you'll be less flexible, as you need to know the access patterns in advance and model your data accordingly.
See Rick Houlihan's excellent talk on this https://www.youtube.com/watch?v=HaEPXoXVf2k for why you'd want to do this.
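To make that concrete, here is a rough sketch of such a pre-joined access pattern from application code using the AWS SDK for .NET. The table, index, and attribute names (GSI1PK/GSI1SK) are assumptions based on the overloading convention, not your actual schema:

using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;
using System.Collections.Generic;
using System.Threading.Tasks;

public static async Task<List<Dictionary<string, AttributeValue>>> GetPublishedDocuments(
    IAmazonDynamoDB client, string userId)
{
    // One query against the overloaded GSI returns all published documents
    // for the user, newest first - no join needed.
    var request = new QueryRequest
    {
        TableName = "AppTable",                      // hypothetical single table
        IndexName = "GSI1",                          // hypothetical overloaded index
        KeyConditionExpression = "GSI1PK = :user AND begins_with(GSI1SK, :prefix)",
        ExpressionAttributeValues = new Dictionary<string, AttributeValue>
        {
            [":user"] = new AttributeValue { S = "USER#" + userId },
            [":prefix"] = new AttributeValue { S = "PUBLISHED#" } // sort key encodes status + date
        },
        ScanIndexForward = false                     // descending by date
    };

    var response = await client.QueryAsync(request);
    return response.Items;
}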
I don't think it has any performance benefits, at least none that are called out in the documentation – which makes sense, since it's the same query and storage engine.
That being said, I think there are some practical reasons for why you'd want to go with a single table as it allows you to keep your infrastructure somewhat simple: you don't have to keep track of metrics and/or provisioning settings for separate tables.
My opinion would be that the deciding factors are the cost of storage and provisioned throughput.
Apart from that, I'm not sure it matters much with the new limit of 20 GSIs per table.

Storing messages and threads in Windows Azure Table Storage

I am designing a simple messaging service using ASP.NET MVC / Windows Azure Table Storage. I have two kinds of entities - messages and message threads. Relation between them is simple - each thread can have multiple messages but the message can only be assigned to one thread.
Table storage is not a relational DB, so representing relations is always a bit tricky. I need to decide between 2 approaches:
Having one big table for threads and one for messages. And having threadId as a partition key of message entity so that messages are partitioned by threads.
Dynamically creating a special table for each message thread and having threadId as a name of the table.
I tend to prefer the second because it fits better into the architecture of the rest of the service. But there will obviously be a large number of tables created in the storage account.
Do you think this may be a problem?
You could also consider having just one table, that stores both Thread and Message entities. This would give you transaction support, and you could use Lucifure's hybrid approach on this table.
Creating a large number of tables may be an issue, depending on how you want to manage them. The underlying REST API for listing tables works like a query for table entities: it only returns the first 1000 tables, and after that you have to use a continuation token. All of the storage explorers I've seen don't allow you to query tables by name; they simply list the first 1000 tables. If you end up with 20,000 threads, it could take you a while to get to the table you want.
One way you could mitigate this is to put your message table in its own storage account. This way your storage account with all of your other tables won't get crowded out by all of these dynamic tables that you will be creating and possibly deleting.
Deleting is actually one of the ways in which using a separate table for each thread would be easier. To delete all of the related messages you simply have to delete one table rather than iterating over each message and deleting it.
Everything else however will be more complicated than keeping all of the messages in one table. If this is core functionality to your app and you can dedicate enough time to develop it this way, one table per thread is probably a good idea. Otherwise the easy way to do things is with one big table.
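For reference, a minimal sketch of the "one big table" layout with threadId as the partition key, using the classic Microsoft.WindowsAzure.Storage client (the entity shape and names are my own illustration):

using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// ThreadId as PartitionKey keeps all messages of a thread in one partition.
public class MessageEntity : TableEntity
{
    public MessageEntity() { }
    public MessageEntity(string threadId, string messageId)
    {
        PartitionKey = threadId;
        RowKey = messageId;
    }
    public string Body { get; set; }
}

// Fetch every message in a thread with a single partition query.
public static IEnumerable<MessageEntity> GetThreadMessages(CloudTable table, string threadId)
{
    var query = new TableQuery<MessageEntity>().Where(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, threadId));
    return table.ExecuteQuery(query);
}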
You may consider a hybrid approach to keep the number of tables to a manageable level, depending on your scalability needs.
My experience has been that date-based partitioning at the table level is a very effective approach and can be leveraged across the board.
For example you could partition tables based on date and with a granularity of day or month. So a table name like “Thread201202” could be used for all threads started in February 2012.
Your thread id would implicitly include the “201202” and be something like “201202-myid01” although you would not need to explicitly store it in the partition key since it would be implied in the table name.
Aged threads could then be easily disposed of by deleting tables, say, more than a year old.
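A small sketch of what that naming scheme could look like in code (continuing with the same storage client; the names are illustrative):

using System;
using Microsoft.WindowsAzure.Storage.Table;

// Route a thread to a month-based table, e.g. "Thread201202" for February 2012.
public static CloudTable GetThreadTable(CloudTableClient client, DateTime startedUtc)
{
    string tableName = "Thread" + startedUtc.ToString("yyyyMM");
    CloudTable table = client.GetTableReference(tableName);
    table.CreateIfNotExists();   // cheap no-op if the table already exists
    return table;
}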

Caching result of SELECT statement for reuse in multiple queries

I have a reasonably complex query to extract the Id field of the results I am interested in based on parameters entered by the user.
After extracting the relevant Ids I am using the resulting set of Ids several times, in separate queries, to extract the actual output record sets I want (by joining to other tables, using aggregate functions, etc).
I would like to avoid running the initial query separately for every set of results I want to return. I imagine my situation is a common pattern so I am interested in what the best approach is.
The database is in MS SQL Server and I am using .NET 3.5.
It would definitely help if the question contained some measurements of the unoptimized solution (data sizes, timings). There is a variety of techniques that could be considered here, some listed in the other answers. I will assume that the reason why you do not want to run the same query repeatedly is performance.
If all the uses of the set of cached IDs consist of joins of the whole set to additional tables, the solution should definitely not involve caching the set of IDs outside of the database. Data should not travel there and back again if you can avoid it.
In some cases (when cursors or extremely complex SQL are not involved) it may be best (even if counterintuitive) to perform no caching and simply join the repetitive SQL to all desired queries. After all, each query needs to be traversed based on one of the joined tables and then the performance depends to a large degree on availability of indexes necessary to join and evaluate all the remaining information quickly.
The most intuitive approach to "caching" the set of IDs within the database is a temporary table (if named #something, it is private to the connection and therefore usable by parallel independent clients; or it can be named ##something and be global). If the table is going to have many records, indexes are necessary. For optimum performance, the index should be a clustered index (only one per table is allowed), or it should be created only after the set has been constructed, when index creation is slightly faster.
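As an illustration of the temp table pattern (the tables, columns, and filter are invented, not your actual schema):

using System.Data.SqlClient;

public static void RunReports(string connectionString, int customerId)
{
    // #RelevantIds lives for the lifetime of this one connection,
    // so every follow-up query must run on the same SqlConnection.
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        var setup = new SqlCommand(@"
            CREATE TABLE #RelevantIds (Id INT NOT NULL);

            INSERT INTO #RelevantIds (Id)
            SELECT o.Id FROM dbo.Orders o      -- stand-in for the complex filtering query
            WHERE o.CustomerId = @customerId;

            CREATE CLUSTERED INDEX IX_RelevantIds ON #RelevantIds (Id);", conn);
        setup.Parameters.AddWithValue("@customerId", customerId);
        setup.ExecuteNonQuery();

        // Reuse the cached IDs in as many result queries as needed.
        var report = new SqlCommand(@"
            SELECT d.* FROM dbo.OrderDetails d
            JOIN #RelevantIds r ON r.Id = d.OrderId;", conn);
        using (var reader = report.ExecuteReader())
        {
            // ... read the first result set ...
        }
    }
}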
Indexed views are clearly preferable to temporary tables, except when the underlying data is read-only during the whole process, or when you can and want to ignore such updates to keep the whole set of reports consistent as far as the set goes. However, the ability of indexed views to always accurately project the underlying data comes at the cost of slowing down those updates.
One other answer to this question mentions stored procedures. This is largely a way of organizing your code. However, if you go this way, it is preferable to avoid using temporary tables, because references to a temporary table prevent pre-compilation of the stored procedure; go for views or indexed views if you can.
Regardless of the approach you choose, do not guess at the performance characteristics and query optimizer behavior. Learn to display query execution plans (within SQL Server Management Studio) and make sure that you see index accesses as opposed to nested loops combining multiple large sets of data; only add indexes that demonstrably and drastically change the performance of your queries. A well chosen index can often change the performance of a query by a factor of 1000, so this is somewhat complex to learn but crucial for success.
And last but not least, make sure you use UPDATE STATISTICS when repopulating the database (and nightly in production), or your query optimizer will not be able to put the indexes you have created to their best uses.
If you are planning to cache the result set in your application code, ASP.NET has its Cache object, and a WinForms application can simply hold the data in an object and reuse it.
If you plan to do the same in SQL Server, you might consider using indexed views to find the IDs. The view will be materialized, so you can get the results faster. You might even consider using a staging table to hold the IDs temporarily.
With SQL Server 2008 you can pass table variables as parameters to SQL. Just cache the IDs and then pass them as a table variable to the queries that fetch the data. The only caveat of this approach is that you have to predefine the table type as a UDT.
http://msdn.microsoft.com/en-us/library/bb510489.aspx
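A sketch of the table-valued parameter approach. It assumes you have already created a table type, e.g. CREATE TYPE dbo.IdList AS TABLE (Id INT PRIMARY KEY); the query, table, and column names are placeholders:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static DataTable ToIdTable(IEnumerable<int> ids)
{
    // Shape must match the user-defined table type dbo.IdList.
    var table = new DataTable();
    table.Columns.Add("Id", typeof(int));
    foreach (int id in ids) table.Rows.Add(id);
    return table;
}

public static void QueryWithIds(SqlConnection conn, IEnumerable<int> cachedIds)
{
    var cmd = new SqlCommand(@"
        SELECT d.* FROM dbo.OrderDetails d
        JOIN @ids i ON i.Id = d.OrderId;", conn);

    var p = cmd.Parameters.Add("@ids", SqlDbType.Structured);
    p.TypeName = "dbo.IdList";        // the predefined UDT mentioned above
    p.Value = ToIdTable(cachedIds);

    using (var reader = cmd.ExecuteReader())
    {
        // ... read results ...
    }
}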
For SQL Server, Microsoft generally recommends using stored procedures whenever practical.
Here are a few of the advantages:
http://blog.sqlauthority.com/2007/04/13/sql-server-stored-procedures-advantages-and-best-advantage/
* Execution plan retention and reuse
* Query auto-parameterization
* Encapsulation of business rules and policies
* Application modularization
* Sharing of application logic between applications
* Access to database objects that is both secure and uniform
* Consistent, safe data modification
* Network bandwidth conservation
* Support for automatic execution at system start-up
* Enhanced hardware and software capabilities
* Improved security
* Reduced development cost and increased reliability
* Centralized security, administration, and maintenance for common routines
It's also worth noting that, unlike other RDBMS vendors (like Oracle, for example), MSSQL automatically caches all execution plans:
http://msdn.microsoft.com/en-us/library/ms973918.aspx
However, for the last couple of versions of SQL Server, execution plans are cached for all T-SQL batches, regardless of whether or not they are in a stored procedure.
The best approach depends on how often the Id changes, or how often you want to look it up again.
One technique is to simply store the result in the ASP.NET object cache, using the Cache object (also accessible from HttpRuntime.Cache). For example (from a page):
this.Cache["key"] = "value";
There are many possible variations on this theme.
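For instance, a small sketch of caching the ID list for a few minutes (the cache key and timeout are arbitrary):

using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static List<int> GetRelevantIds(string cacheKey, Func<List<int>> runExpensiveQuery)
{
    // Try the cache first; fall back to the expensive query and cache the result.
    var ids = HttpRuntime.Cache[cacheKey] as List<int>;
    if (ids == null)
    {
        ids = runExpensiveQuery();
        HttpRuntime.Cache.Insert(
            cacheKey, ids,
            null,                               // no cache dependency
            DateTime.UtcNow.AddMinutes(10),     // absolute expiration
            Cache.NoSlidingExpiration);
    }
    return ids;
}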
You can use Memcached to cache values in memory.
As far as I can see, there are some .NET ports available.
How frequently does the data change that you'll be querying? To me, this sounds like a perfect scenario for data warehousing, where you flatten the data for quicker retrieval and create the tables exactly as your 'DTO' wants to see the data. This method differs from an indexed view in that it's simply a table with quick seek operations, which can be improved further if you set up the indexes properly on the columns you plan to query.
You can create a global temporary table on the fly, insert the records for the current request, and then access that table in the joins of your subsequent queries, for reusability.

Is it a good practice to avoid ad hoc sql altogether in ASP.NET applications?

Instead, create only stored procedures and call them from the code?
There is a place for dynamic SQL and/or ad hoc SQL, but it needs to be justified based on the particular usage needs.
Stored procedures are by far a best practice for almost all situations and should be strongly considered first.
This issue is a little bigger than just procs or ad hoc, because the database has a wide variety of tools to define its interface, including tables, views, functions and procedures.
People here have mentioned the execution plans and parameterization but, by far, the most important thing in my mind is that any technique which relies on exposed base tables to users means that you lose any ability for the database to change its underlying implementation or control security vertically or horizontally. At the very least, I would expose only views to a typical application/user/role.
In a scenario where the application or user's account only has access to EXEC SPs, then there is no possibility of that account being able to even have a hope of using a SQL injection of the form: "; SELECT name, password from USERS;" or "; DELETE FROM USERS;" or "; DROP TABLE USERS;" because the account doesn't have anything but EXEC (and certainly no DDL). You can control column visibility at the SP level and not have to deny select on an employee salary column, for example.
In other words, unless you are comfortable granting db_datareader to public (because that's effectively what you are doing when you LINQ-to-tables), then you need some sort of realistic security in your application, and SPs are the only way to go, with LINQ-to-views possibly being acceptable.
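For completeness, this is roughly what the application side looks like when the account only has EXEC rights (the procedure name and parameter are invented):

using System.Data;
using System.Data.SqlClient;

public static void LoadEmployeeSummary(SqlConnection conn, int departmentId)
{
    // The connection's login needs nothing beyond EXECUTE on this procedure.
    var cmd = new SqlCommand("dbo.GetEmployeeSummary", conn)
    {
        CommandType = CommandType.StoredProcedure
    };
    cmd.Parameters.AddWithValue("@DepartmentId", departmentId);

    using (var reader = cmd.ExecuteReader())
    {
        // ... map rows to your view model ...
    }
}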
Depends entirely on what you're doing.
As a general rule a stored proc will have its query plan cached better than a dynamically generated SQL statement. It will also be slightly easier to maintain indexes for.
However, dynamically generated SQL statements can have their query plans cached, so the difference is marginal.
Dynamically generated SQL statements can also introduce a security risk - always parameterise them.
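For example, a parameterised version of an ad hoc query might look like this (table and column names are placeholders):

using System.Data.SqlClient;

public static int GetUserId(SqlConnection conn, string userName)
{
    // User input goes in as a parameter, never concatenated into the SQL text.
    var cmd = new SqlCommand(
        "SELECT UserId FROM dbo.Users WHERE UserName = @userName;", conn);
    cmd.Parameters.AddWithValue("@userName", userName);

    object result = cmd.ExecuteScalar();
    return result == null ? -1 : (int)result;
}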
That said, sprocs are a pain to maintain and update; they separate DB logic and .NET code in a way that makes it harder for developers to piece together what a data access method is doing.
Also, to fix or update a SQL string you just change code. To fix or update a sproc you have to change the database - often a much messier option.
So I wouldn't recommend that as a 'one size fits all' best practice.
There is no right or wrong answer here. There are benefits to both which can be easily obtained through a google search. Different projects with different requirements may lead you to different solutions. It's not as black or white as you might want it to be. You might as well throw ORMs into the mix. If you prefer sql queries in your data layer as opposed to stored procs, make sure you use parametrized queries.
SQL in an SP: easy to maintain. SQL in the app: a pain in the butt to maintain.
It's so much faster and easier to hop onto a SQL instance, modify an SP, test it, then deploy the SP, instead of having to modify the code in the app, test it, then deploy the app.
It depends on the data distribution in your table. Prepared query plans and stored procedures get cached, and the plan itself depends on the table statistics.
Suppose you're building a blog and that your posts table has a user_id. And that you're frequently doing stuff like:
select posts.* from posts where user_id = ? order by published desc limit 20;
Suppose indexes on posts (user_id) and posts (published desc).
Additionally, suppose that you have two authors: author1, who wrote 3 posts a long time ago, and author2, who has written 10k posts since.
In this case, the query plan of the ad hoc query will be very different depending on whether you're fetching the author1 posts or the author2 posts:
For author1, the database will decide to use the index on user_id and sort the results.
For author2, the database will read the first 20 rows using the index on published.
If you prepare the statement, the planner will pick either of the two. Suppose the second (which I think is likely): applied to author1, this means going through the whole table by way of the index -- which is much slower than the optimal plan.
If simplicity is your goal, then an ORM would be a good practice for your simple database operations.
ORMs like Entity Framework, nHibernate, LINQ to SQL, etc. will manage the code creation of the data access and repository layers and provide you with strongly typed objects representing your tables. This can lead to a cleaner, more maintainable architecture.
Save the stored procedures for your more complex queries. This is where you can take advantage of advanced SQL and cached query plans.
Dynamic SQL - Bad
Stored Procedures - Better
Linq-To-SQL or Linq-to-EF (or ORM tools) - Best
You do not want dynamic SQL inside your application since you do not have compile-time checking. Stored procedures will at least be checked, but they are still not part of a cohesive unit and they move business logic into the database layer. Linq-To-EF allows business logic to stay inside your application and gives you compile-time checking of syntax.

Use ASP.NET Profile or not?

I need to store a few attributes of an authenticated user (I am using Membership API) and I need to make a choice between using Profiles or adding a new table with UserId as the PK. It appears that using Profiles is quick and needs less work upfront. However, I see the following downsides:
The profile values are squished into a single ntext column. At some point in the future, I will have SQL scripts that may need to update users' attributes. Querying an ntext column and trying to update a value sounds a little buggy to me.
If I choose to add a new user specific property and would like to assign a default for all the existing users, would it be possible?
My first impression has been that using Profiles may cause maintenance headaches in the long run. Thoughts?
There was an article on MSDN (now on ASP.NET: http://www.asp.net/downloads/sandbox/table-profile-provider-samples) that discusses how to build a Table Profile Provider. The idea is to store the profile data in proper table columns rather than a single serialized row, making it easier to query with plain SQL.
More onto that point, SQL Server 2005/2008 provides support for getting data via services and CLR code. You could conceivably access the Profile data via the API instead of the underlying tables directly.
As to point #2, you can set defaults for properties, and while this will not update other profiles immediately, a profile will be updated the next time it is accessed.
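A small sketch of that, assuming a profile property (here called "Theme") defined in web.config with a defaultValue:

using System.Web.Profile;

// Existing users pick up the defaultValue the first time the property is read;
// saving writes the value back into their serialized profile row.
public static string GetOrInitTheme(string userName)
{
    ProfileBase profile = ProfileBase.Create(userName);
    string theme = (string)profile.GetPropertyValue("Theme");   // falls back to the default
    profile.SetPropertyValue("Theme", theme);
    profile.Save();
    return theme;
}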
Seems to me you have answered your own question. If your point 1 is likely to happen, then a SQL table is the only sensible option.
Check out this question...
ASP.NET built in user profile vs. old stile user class/tables
The first hint that the built-in profiles are badly designed is their use of delimited data in a relational database. There are a few cases that delimited data in a RDBMS makes sense, but this is definitely not one of them.
Unless you have a specific reason to use ASP.Net Profiles, I'd suggest you go with the separate tables instead.
