Ways to circumvent a bad database schema? - asp.net

Our team has been asked to write a Web interface to an existing SQL Server backend that has its roots in Access.
One of the requirements/constraints is that we must limit changes to the SQL backend. We can create views and stored procedures but we have been asked to leave the tables/columns as-is.
The SQL backend is less than ideal. Most of the relationships are implicit due to a lack of foreign keys. Some tables lack primary keys. Table and column names are inconsistent and include characters like spaces, slashes, and pound signs.
Other than getting a new job or asking them to reconsider this requirement, can anyone provide any good patterns for addressing this deficiency?
NOTE: We will be using SQL Server 2005 and ASP.NET with the .NET Framework 3.5.

Views and Synonyms can get you so far, but fixing the underlying structure is clearly going to require more work.
I would certainly try again to convince the stakeholders that by not fixing the underlying problems they are accruing technical debt, and that this debt will slow development more and more until it becomes crippling. Whilst you can work around it, the debt will still be there.
Even with a data access layer, the underlying problems on the database such as the keys / indexes you mention will cause problems if you attempt to scale.

Easy: make sure that you have robust Data Access and Business Logic layers. You must avoid the temptation to program directly to the database from your ASPX codebehinds!
Even with a robust database schema, I now make it a practice never to work with SQL in the codebehinds - a practice that developed only after learning the hard way that it has its drawbacks.
Here are a few tips to help the process along:
First, investigate the ObjectDataSource class. It will allow you to build a robust BLL that can still feed controls like the GridView without using direct SQL. They look like this:
<asp:ObjectDataSource ID="ObjectDataSource1" runat="server"
OldValuesParameterFormatString="original_{0}"
SelectMethod="GetArticles" <-- The name of the method in your BLL class
OnObjectCreating="OnObjectCreating" <-- Needed to provide an instance whose constructor takes arguments (see below)
TypeName="MotivationBusinessModel.ContentPagesLogic"> <-- The BLL Class
<SelectParameters>
<asp:SessionParameter DefaultValue="News" Name="category" <-- Pass parameters to the method
SessionField="CurPageCategory" Type="String" />
</SelectParameters>
</asp:ObjectDataSource>
If constructing an instance of your BLL class requires that you pass arguments, you'll need the OnObjectCreating hook. In your codebehind, implement it like so:
public void OnObjectCreating(object sender, ObjectDataSourceEventArgs e)
{
e.ObjectInstance = new ContentPagesLogic(sessionObj);
}
Next, implementing the BLL requires a few more things that I'll save you the trouble of Googling. Here's an implementation that matches the calls above.
namespace MotivationBusinessModel <-- My business model namespace
{
public class ContentPagesItem <-- The class that "stands in" for a table/query - a List of these is returned after I pull the corresponding records from the db
{
public int UID { get; set; }
public string Title { get; set; } <-- My DAL requires properties but they're a good idea anyway
....etc...
}
[DataObject] <-- Needed to make this class pop up when you are looking up a data source from a GridView, etc.
public class ContentPagesLogic : BusinessLogic
{
public ContentPagesLogic(SessionClass inSessionObj) : base(inSessionObj)
{
}
[DataObjectMethodAttribute(DataObjectMethodType.Select, true)] <-- Needed to make this *function* pop up as a data source
public List<ContentPagesItem> GetArticles(string category) <-- Note the use of a generic list - which is IEnumerable
{
using (BSDIQuery qry = new BSDIQuery()) <-- My DAL - just for perspective
{
return
qry.Command("Select UID, Title, Content From ContentLinks ")
.Where("Category", category)
.OrderBy("Title")
.ReturnList<ContentPagesItem>();
// ^-- This is a simple table query but it could be any type of join, View or sproc call.
}
}
}
}
Second, it is easy to add DAL/BLL dlls to your project as additional projects and then add a reference to the main web project. Doing this not only gives your DAL and BLL their own identities but it makes Unit testing a snap.
Third, I almost hate to admit it, but this might be one place where Microsoft's Entity Framework comes in handy. I generally dislike LINQ to Entities, but it does permit the kind of code-side specification of data relationships you are lacking in your database.
Finally, I can see why changes to your db structure (e.g. moving fields around) would be a problem but adding new constraints (especially indices) shouldn't be. Are they afraid that a foreign key will end up causing errors in other software? If so...isn't this sort of a good thing; you have to have some pain to know where the disease lies, no?
At the very least, you should be pushing for the ability to add indexes as needed for performance reasons. Also, I agree with others that Views can go a long way toward making the structure more sensible. However, this really isn't enough in the long run. So...go ahead and build Views (Stored Procedures too) but you should still avoid coding directly to the database. Otherwise, you are still anchoring your implementation to the database schema and escaping it in the future will be harder than if you isolate db interactions to a DAL.

I've been here before. Management thinks it will be faster to use the old schema as-is and move forward. I'd try to convince them that a new schema would make this and all future development faster and more robust.
If you are still shot down and cannot do a complete redesign, approach them with a compromise where you can rename columns and tables and add PKs, FKs, and indexes. This shouldn't take much time and will help a lot.
Short of that, you'll have to grind it out. I'd encapsulate everything in stored procedures, where you can add all sorts of checks to enforce data integrity. I'd still sneak in as many of the PK, FK, index, and column-rename changes as possible.
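From the ASP.NET side, calling one of those guard-everything procedures would look roughly like the sketch below. The procedure name and parameters are hypothetical; the point is that a RAISERROR from an integrity check inside the proc surfaces to the web tier as a SqlException that you can translate into a validation message.
using System;
using System.Data;
using System.Data.SqlClient;

public void InsertOrder(string connectionString, int customerId, DateTime orderDate)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("usp_Order_Insert", conn))   // hypothetical sproc name
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@CustomerId", customerId);
        cmd.Parameters.AddWithValue("@OrderDate", orderDate);

        conn.Open();
        try
        {
            cmd.ExecuteNonQuery();
        }
        catch (SqlException ex)
        {
            // A manual FK/unique check inside the proc (RAISERROR) lands here;
            // translate it into something the page can show the user.
            throw new InvalidOperationException("Order failed validation: " + ex.Message, ex);
        }
    }
}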

Since you stated that you need to "limit" changes to the database schema, it seems that you could possibly add a primary key field to those tables that do not have one. Assuming existing applications aren't performing any "SELECT * ..." statements, this shouldn't break existing code. After that's been done, create views of the tables using a uniform naming approach, and set up INSTEAD OF triggers on the views for INSERT, UPDATE, and DELETE statements. It will have a minor performance hit, but it will allow for a uniform interface to the database. Any new tables that are added should then conform to the new standard.
Very hacky way of handling it, but it's an approach I've taken in the past with a limited access to modify the existing schema.

I would do an "end-around" if faced with this problem. Tell your bosses that the best way to handle this is to create a "reporting" database, which is essentially a dynamic copy of the original. You would create scripts or a separate data-pump application to update your reporting database with changes to the original, and propagate changes made to your database back to the original, and then write your web interface to communicate only with your "reporting" database.
Once you have this setup in place, you'd be free to rationalize your copy of the database and bring it back to the realm of sanity by adding primary keys, indexes, normalizing it etc. Even in the short-term, this is a better alternative than trying to write your web interface to communicate with a poorly-designed database, and in the long-term your version of the database would probably end up replacing the original.

can anyone provide any good patterns for addressing this deficiency
Seems that you have been denied the only possibility to "address this deficiency".
I suggest adding integrity-check logic to your upcoming stored procedures. Foreign keys, unique keys: it can all be enforced manually. Not nice work, but doable.

One thing you could do is create lots of views, so that the table and column names can be more reasonable, getting rid of the troublesome characters. I tend to prefer not having spaces in my table/column names so that I don't have to use square brackets everywhere.
Unfortunately you will have problems due to the lack of foreign keys, but you may want to create indexes on columns that must be unique, such as a name, to help out.
At least on the bright side you shouldn't really have many joins to do, so your SQL should be simple.
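As a rough sketch of the renaming-view idea (connectionString and every object/column name below are invented), the view would normally be created once by a deployment script; it is run through ADO.NET here only to keep the example in C#:
string createView = @"
CREATE VIEW dbo.CustomerOrders AS
SELECT [Customer #]        AS CustomerId,
       [Order Date/Time]   AS OrderDate,
       [Ship To Address]   AS ShipToAddress
FROM   [dbo].[Customer Orders]";

using (var conn = new System.Data.SqlClient.SqlConnection(connectionString))
using (var cmd = new System.Data.SqlClient.SqlCommand(createView, conn))
{
    conn.Open();
    cmd.ExecuteNonQuery();
    // From here on, application code can query dbo.CustomerOrders
    // without square brackets or the original column names.
}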

If you use LinqToSql, you can manually create relationships in the DBML. Additionally, you can rename objects to your preferred standard (instead of the slashes, etc).
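If you prefer code over the DBML designer, the same renaming and hand-declared relationships can be expressed with LINQ to SQL's mapping attributes. This is only a sketch; the table and column names below are invented examples of the slash/pound style the question describes.
using System.Data.Linq;
using System.Data.Linq.Mapping;

[Table(Name = "Orders")]
public class Order
{
    [Column(Name = "Order/Ref", IsPrimaryKey = true)]
    public int OrderId { get; set; }
}

[Table(Name = "Order Items")]
public class OrderItem
{
    [Column(Name = "Item#", IsPrimaryKey = true)]
    public int ItemId { get; set; }

    [Column(Name = "Order/Ref")]
    public int OrderId { get; set; }

    private EntityRef<Order> _order;

    // Declared by hand because the database has no real foreign key.
    [Association(Storage = "_order", ThisKey = "OrderId", OtherKey = "OrderId", IsForeignKey = true)]
    public Order Order
    {
        get { return _order.Entity; }
        set { _order.Entity = value; }
    }
}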

I would start with indexed views on each of the tables, where the required unique clustered index effectively defines a primary key. And perhaps more indexed views in places where you want indexes on the tables. That will start to address some of your performance issues and all of your naming issues.
I would treat these views as if they were raw tables. This is what your stored procedures or data access layer should be manipulating, not the base tables.
In terms of enforcing integrity rules, I prefer to put that in a data access layer in the code with only minimal integrity checks at the database layer. If you prefer the integrity checks in the database, then you should use sprocs extensively to enforce them.
And don't underestimate the possibility of finding a new job ;-)

It really depends how much automated test coverage you have. If you have little or none, you can't make any changes without breaking stuff (If it wasn't already broken to start with).
I would recommend that you sensibly refactor things as you make other changes, but don't do a large-scale refactoring on the DB, it will introduce bugs and/or create too much testing work.
Some things like missing primary keys are really difficult to work-around, and should be rectified sooner rather than later. Inconsistent table names can basically be tolerated forever.

I do a lot of data integration with poorly-designed databases, and the method that I've embraced is to use a cross-referenced database that sits alongside the original source db; this allows me to store views, stored procedures, and maintenance code in an associated database without touching the original schema. The one exception I make to touching the original schema is that I will implement indexes if needed; however, I try to avoid that if at all possible.
The beauty of this method is you can isolate new code from old code, and you can create views that address poor entity definitions, naming errors, etc without breaking other code.
Stu

Related

Whose responsibility should it be to paginate: controller / domain service / repository?

My question might seem strange to pros, but please take into account that I am coming from the Ruby on Rails world =)
So, I am learning ASP.NET Core. And I like what I am seeing in it compared to Rails. But there is always that but... Let me describe the theoretical problem.
Let's say I have a Product model. And there are over 9000 records in the database. It is obvious that I have to paginate them. I've read this article, but something seems wrong to me here, since the controller shouldn't use the context directly. It has to use some repository (but that example might be written that way only for simplicity).
So my question is: who should be responsible for pagination? Should it be the controller, which will receive some queryable object from the repository and take only the records it needs? Or should it be my own business service, which does the same? Or should the repository have a method like public IEnumerable<Product> ListProducts(int offset, int page)?
One Domain-Driven-Design solution to this problem is to use a Specification. The Specification design pattern describes a query in an object. So you might create a PagedProduct specification which would take in any necessary parameters (pageSize, pageNumber, filter). Then one of your repository methods (usually a List() overload) would accept an ISpecification and would be able to produce the expected result given the specification. There are several benefits to this approach. The specification has a name (as opposed to just a bunch of LINQ) that you can reason about and discuss. It can be unit tested in isolation to ensure correctness. And it can easily be reused if you need the same behavior (say on an MVC View action and a Web API action).
I cover the Specification pattern in the Pluralsight Design Patterns Library.
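A minimal sketch of what a paged-products specification and the repository overload could look like; the interface and class names are illustrative, not taken from a particular library:
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface ISpecification<T>
{
    IQueryable<T> Apply(IQueryable<T> query);
}

public class PagedProductsSpecification : ISpecification<Product>
{
    private readonly int _pageNumber;
    private readonly int _pageSize;
    private readonly string _nameFilter;

    public PagedProductsSpecification(int pageNumber, int pageSize, string nameFilter = null)
    {
        _pageNumber = pageNumber;
        _pageSize = pageSize;
        _nameFilter = nameFilter;
    }

    public IQueryable<Product> Apply(IQueryable<Product> query)
    {
        if (!string.IsNullOrEmpty(_nameFilter))
            query = query.Where(p => p.Name.Contains(_nameFilter));

        // Deterministic ordering before paging.
        return query.OrderBy(p => p.Id)
                    .Skip((_pageNumber - 1) * _pageSize)
                    .Take(_pageSize);
    }
}

public interface IProductRepository
{
    // The repository applies the specification to its underlying IQueryable source.
    IReadOnlyList<Product> List(ISpecification<Product> spec);
}
The controller (or a Web API action) then just news up a PagedProductsSpecification and hands it to the repository, which keeps the paging logic named, testable, and out of both the controller and the raw LINQ.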
First, I would like to remind you that all the examples you linked are overly simplified, so you shouldn't take them as the correct way. Simple things with fewer abstraction layers are easier to oversee and understand (at least in beginner examples, where the reader may not know where to look for what), and that's why they are presented like that.
Regarding the question: I would say none of the above. If I had to decide between them, I would say the service and/or the repository, but that depends on how you define your storage layer, etc.
"None of the above", then what? My preference is to implement an intermediary layer between the service layer and the Web UI layer. The service layer exposes manipulation functionality, but for read operations it exposes the whole collection as an IQueryable (not an IEnumerable), so that you can utilize LINQ-to-whatever-storage.
Why am I doing this, many may ask. Because almost all the time you will use specialized viewmodels. To display the list of products on an admin page, for example, you would need to display values of columns from the products table, but you are very likely to need to display the product's category as well. Very rarely do you need data from only one table, and by exposing the items as an IQueryable<T> you get the benefit of being able to do Selects like this:
public IEnumerable<ProductAdminTableViewModel> GetProducts(int page, int pageSize)
{
    return backingQueryable.Select(prod => new ProductAdminTableViewModel
    {
        Id = prod.Id,
        Category = prod.Category.Name, // your provider will likely resolve this to a Join
        Name = prod.Name
    }).Skip((page - 1) * pageSize).Take(pageSize).ToList();
}
As commented, by using the backing store as an IQueryable you will be able to do projections before your query hits the DB and thus you can avoid any nasty Select N+1s.
The reason that this sits in an intermediary layer is simply that you do not want to add references to your web project in either your repository or your service layer (project); but because of this you cannot implement the viewmodel-specific queries in your service layer, simply because the viewmodels cannot be resolved there. This implies that the viewmodels reside in this same intermediary project as well, and the MVC project then only contains views, controllers, and the ASP.NET MVC-related plumbing of your app. I usually call this intermediate layer 'SolutionName.Web.Core', and it references the service layer so that it can access the IQueryable<T>-returning method.

Is returning a DataTable or DataSet from the DAL a wrong approach?

I am creating my assignment project from scratch. I started off by creating a DAL that handles connections to SQL Server and can execute stored procedures.
The project will later be a CMS or blog.
My DAC or DAL reads the connection string from web.config, accepts an array of SqlParameters along with a stored procedure name, and returns the output of the executed stored procedure in a DataTable.
I am calling my DAC as follows (using class functions and sometimes using code-behind):
Dim dt As DataTable = Dac.ReturnDataTable("category_select_bypage", New SqlParameter("@pageid", pageid), New SqlParameter("@id", id))
This is working fine, and I was planning to complete the whole application by calling this DAC, inserting and retrieving data using stored procedures or maybe simple queries later.
But I showed this work to my teacher and he told me that I am using the wrong approach and my code does not follow a correct DAL approach.
So I am now quite confused about DAL vs. DAC. Anyhow, here are my main questions:
1. What really is the difference between a DAL and a DAC?
2. Using my approach, what kind of application is it good for building? Can I make a shopping cart using this approach?
3. What is wrong with my approach?
I think you can start with these links:
- http://msdn.microsoft.com/en-us/library/aa581778.aspx
- http://martinfowler.com/eaaCatalog/ (Data Source Architectural Patterns)
- http://www.simple-talk.com/dotnet/.net-framework/.net-application-architecture-the-data-access-layer/ (it talks about DataSets; I guess you can use them while the application is small or pretty simple: they're well known and deeply integrated into many VS tools).
Take a look at the SO question too: How to organize Data Access Layer (DAL) in ASP.NET
Adriano provided a very good link. Certainly a must-read. But I do see one thing. When creating logical layers in an application, each layer is not aware of the inner workings of those beneath it. For example, the presentation layer has no idea how the data layer gets data; it's just magic. The reason for this is: what if you decide not to use the SqlClient and choose a different technology? By using this kind of "technology hiding" between layers you can make that change quite easily.
So given what you have, I'm assuming the DataTable is in the presentation (or application) layer. And if this is true, your DAC method call should not expose anything related to what or how it is retrieving data. In your case this rule is violated by the SqlParameter parameters. Perhaps you could pass strings or ints instead. For example, a method on the Dac class such as:
public DataTable GetPage(int pageId, int id)
Never the less, best of luck. I'm glad to see those willing to learn and those willing to teach.
I don't think returning DataSets could readily be described as 'wrong'.
Your DAL serves purely as an abstraction so that (hypothetically) you can swap databases, or change from a database to an XML file or whatever. The rest of the application is unchanged.
As long as your business logic is happy with a contract that passes DataSets up from the DAL, I don't think it's an architectural problem. It might be suboptimal, since DataSets can be quite expensive in terms of resources, but they are very well supported in .NET (they are the foundation of simpler architectures like the Transaction Script or Table patterns).
What you might do instead is have some custom objects and have the DAL transform its SQL query results into those objects, so they can then be processed by the business logic layer.
But then I can think of situations where you just might want DataSets…
All subjective, of course ;)
Function in Data Access SQLServer Layer:
public override DataTable abc()
{
//Connection with database
}
Function in Data Access Layer:
public abstract DataTable abc();
Function in Business Logic Layer:
public DataTable abc()
{
//Preparing the list
}
Function in Presentation Layer:
using (DataTable DT = (new BusinessLogicLayerClass()).abc())
{
}
The above approach is based on returning a DataTable, but the DAL should instead return a list of objects to the presentation layer. The list preparation should be done in the Business Logic Layer.
The problem with DataSet/DataTable is that it's a representation of your database. Using it everywhere creates tight coupling to your database.
That means that you have to update all dependent code every time you make a change in the db.
It's better to create simple classes which represent the database. By doing so, you'll only have to change the internals of those objects (or the mapping layer) when the db is changed.
I recommend that you read about the Repository pattern. It's the most common way to abstract away the data source.
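For example, reusing the category_select_bypage procedure from the question, a repository that maps the reader into simple objects might look roughly like this; the Category class and the id/name column names are assumptions made for illustration.
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public class Category
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class CategoryRepository
{
    private readonly string _connectionString;

    public CategoryRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public List<Category> GetByPage(int pageId, int id)
    {
        var result = new List<Category>();
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand("category_select_bypage", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@pageid", pageId);
            cmd.Parameters.AddWithValue("@id", id);

            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    result.Add(new Category
                    {
                        Id = reader.GetInt32(reader.GetOrdinal("id")),       // assumed column names
                        Name = reader.GetString(reader.GetOrdinal("name"))
                    });
                }
            }
        }
        return result;
    }
}
Callers (the business layer, or code-behind during the assignment) then work with List<Category> and never see DataTables, SqlParameters, or column names.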

EF4 ASP.NET - Managing Entity Edits between HTTP Posts and Rollback

I am struggling with the following use-case:
User amends an existing order. The order is complex - lots of related 'entities' (addresses, post options, suppliers, makes, models, various items etc). Across multiple http posts.
User wants to discard the changes.
--
I have an order entity and as the user is editing this I am making various changes to the entity associations e.g changing order.address, order.items.add(item)...
In a single post this is fine, but across posts I don't know how best store state. If I store the entities then I cannot save the changes as they are across different data contexts. I have read that it is bad practice to store the data context in the session state i.e. long-lived context. I can't save changes after each edit/post because I cannot roll-back (?). I really would like to work with the entities during the editing process rather than one big save at the end (taking UI settings and applying these in one chunk).
This must be a pretty common problem - it's driving me mad. Any help really appreciated.
Cheers!
We have a similar problem where we are building a complex business object through a multi-page wizard.
Instead of creating a partially complete business object at each step of the wizard, we create a dedicated wizard object that looks pretty similar to the business object, populate that through the wizard. At each step in the wizard, the wizard object is saved into the database. At the end the user can accept it and it is converted to a real business object and then becomes visible to everyone else, or they can bin it and no-one else ever knows it existed.
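A very rough sketch of what such a dedicated wizard object might look like (all names hypothetical): it deliberately holds loose values rather than real entity associations, so nothing touches the "real" order tables until the user accepts.
public class OrderWizardDraft
{
    public int DraftId { get; set; }
    public int CurrentStep { get; set; }

    // Loose, denormalized copies of what the user has entered so far,
    // saved to a drafts table at the end of each wizard step/post.
    public string ShippingAddress { get; set; }
    public string SupplierCode { get; set; }
    public string ItemIdsCsv { get; set; }

    public bool IsAccepted { get; set; }   // set when converted to a real Order
}
On the final step you map the draft to real entities in one unit of work; on "discard" you simply delete the draft row, so there is nothing to roll back.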
If this kind of approach is not suitable, I suspect you're looking at some kind of difference tracking, either at the entity or the database level. Neither is simple to implement, work with, or manage in a system. The former would involve calculating and storing n changes to the entities and developing an algorithm to undo them; the latter depends on your RDBMS, but might include versioned rows or similar.
Yes, it's pretty common for us. In most scenarios we use the MVC approach. Even without actual ASP.NET MVC projects, we use a similar ViewModel with our Views/Pages/Scenarios etc., where there is no direct/single entity mapping to the business layer (in other words, Business.Entities). This is pretty much similar to DTOs.
It is always easy to use disconnected EF. We retrieve data and discard the context, then transform the entities into ViewModels/DTOs if necessary. When you need to persist changes, all you have to do is create a new context, find the latest entity instance, and apply the changes.
The Views/Pages/Controllers will be managing these ViewModels/DTOs. Tracking Changed and Deleted content can be done by introducing a HistoryList<T> (you can extend a List<T> to implement this).
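A minimal sketch of such a HistoryList<T> (illustrative only): it simply remembers what was removed so the deletions can be replayed against a fresh context when you persist.
using System.Collections.Generic;

public class HistoryList<T> : List<T>
{
    private readonly List<T> _removed = new List<T>();

    public IEnumerable<T> RemovedItems
    {
        get { return _removed; }
    }

    // Note: 'new' hides List<T>.Remove, so callers must use the HistoryList<T>
    // type directly (not a List<T> reference) for removals to be tracked.
    public new bool Remove(T item)
    {
        if (base.Remove(item))
        {
            _removed.Add(item);
            return true;
        }
        return false;
    }
}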
Once done, using a Controller/Workflow/Component you can observe the ViewModel/DTO and do the necessary changes to your Entities using a new Context to retrieve and persist.
It involves a bit of coding, and I would say it's not a perfect solution, since it has its own pros and cons.
/KP

(LINQ-To-SQL) Creating classes first, database table second, how to auto-connect the two?

I'm creating a data model first using the LINQ-To-SQL graphical designer by using right-click->Add->Class. My idea is that I'll set up everything first using test repositories, design the entire website, then as a final step, create a database using the LINQ-To-SQL classes as a model for the database tables and relationships. My reasoning is that it's easy to edit the classes, but hard to modify DB tables (especially if there's already data in them), so by doing the database part last, it becomes much easier to design the structure.
My question is, is there an automatic way to link the two once I have the DB tables created? I know you can manually fill out the class properties for the LINQ-To-SQL entities but this is pretty cumbersome if you have a lot of tables to deal with. The other option is to delete your manually-created classes and drag the tables from the database into the designer to auto-generate the classes, but I'm not sure if this is the best way of doing it.
Linq to Sql is intended to be a relatively thin ORM layer over a database. While you can of course just add properties to a data context and use them as a sort of mock, you are correct, it isn't really easy to work with.
Instead of relying solely on Linq to Sql generated classes to give you freedom from the database implementation, you may want to look into the repository design pattern. It allows you to have a smooth separation between your database, domain model, and your middle tier; I have used it on two projects now, and have been able to (for the most part) build everything top-down, leaving the actual database for last. Below is a link to a good tutorial on the pattern (better than I could scribble down here).
https://web.archive.org/web/20110503184234/http://blogs.hibernatingrhinos.com/nhibernate/archive/2008/10/08/the-repository-pattern.aspx
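Very roughly, the shape is: the site talks to an interface, a throwaway in-memory implementation backs your test repositories while you design, and a LINQ to SQL implementation is dropped in once the tables exist. All names below are illustrative.
using System.Collections.Generic;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface ICustomerRepository
{
    IList<Customer> GetAll();
    void Add(Customer customer);
}

// Used while the database doesn't exist yet; later replaced by a
// LINQ to SQL-backed repository without changing any calling code.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> _customers = new List<Customer>();

    public IList<Customer> GetAll()
    {
        return _customers;
    }

    public void Add(Customer customer)
    {
        _customers.Add(customer);
    }
}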
Depending on your database permissions, you may call your DataContext's DeleteDatabase() and CreateDatabase() methods as an ungraceful way of resyncing your classes and tables. This is not much of an option once you have actual data in the database, but it does work while you are in your development stages.
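For example (development only; MyDataContext stands for whatever DataContext your model generates, and this destroys any existing data):
using (var dc = new MyDataContext(connectionString))
{
    if (dc.DatabaseExists())
        dc.DeleteDatabase();   // drop the out-of-date database

    dc.CreateDatabase();       // recreate it from the current L2S model
}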
Take a look at my add-in (which you can download from http://www.huagati.com/dbmltools/ , free 45-day trial licenses are also available from the same site).
It can generate SQL-DDL diff scripts containing the SQL-DDL statements for updating your database with only the portions that have changed in the L2S model (e.g. adding missing columns, missing tables, missing FKs etc), instead of the L2S out-of-the-box support for recreating the entire db from scratch.
It also supports syncing the other way; updating the model from the database.

Saving MFC Model as SQLite database

I am playing with a CAD application using MFC. I was thinking it would be nice to save the document (model) as an SQLite database.
Advantages:
- I avoid file format changes (SQLite takes care of that)
- Free query engine
- Undo stack is simplified (table name, column name, new value, and so on...)
Opinions?
This is a fine idea. Sqlite is very pleasant to work with!
But remember the old truism (I can't get an authoritative answer from Google about where it originally comes from) that storing your data in a relational database is like parking your car by driving it into the garage, disassembling it, and putting each piece into a labeled cabinet.
Geometric data, consisting of points and lines and segments that refer to each other by name, is a good candidate for storing in database tables. But when you start having composite objects, with a hierarchy of subcomponents, it might require a lot less code just to use serialization and store/load the model with a single call.
So that would be a fine idea too.
But serialization in MFC is not nearly as much of a win as it is in, say, C#, so on balance I would go ahead and use SQL.
This is a great idea but before you start I have a few recommendations:
- Be careful that each database is uniquely identifiable in some way besides its file name, such as by having a table within the database that describes the file.
- Take a look at some of the MFC-based examples and wrappers already available before creating your own. The ones I have seen have borrowed from each other to create a better result. Google: MFC SQLite Wrapper.
- Using an SQLite database is also useful for maintaining state. Think ahead about how you would manage that, keeping in mind which features are included in SQLite and which are missing.
- You can also think now about how you might extend your application to the web, by making sure your database table structure is easily exportable to other SQL database systems, as well as easy enough to extend to a backup system.
