In order to fully use LinqToSql in an ASP.NET 3.5 application, it is necessary to create DataContext classes (which is usually done using the designer in VS 2008). From the UI perspective, the DataContext is a design of the sections of your database that you would like to expose through LinqToSql, and it is integral in setting up the ORM features of LinqToSql.
My question is: I am setting up a project that uses a large database where all tables are interconnected in some way through Foreign Keys. My first inclination is to make one huge DataContext class that models the entire database. That way I could in theory (though I don't know if this would be needed in practice) use the Foreign Key connections that are generated through LinqToSql to easily go between related objects in my code, insert related objects, etc.
However, after giving it some thought, I am now thinking that it may make more sense to create multiple DataContext classes, each one relating to a specific namespace or logically interrelated section within my database. My main concern is that instantiating and disposing one huge DataContext class all the time for individual operations that relate to specific areas of the Database would impose an unnecessary burden on application resources. Additionally, it is easier to create and manage smaller DataContext files than one big one. The thing that I would lose is that there would be some distant sections of the database that would not be navigable through LinqToSql (even though a chain of relationships connects them in the actual database). Additionally, there would be some table classes that would exist in more than one DataContext.
Any thoughts or experience on whether multiple DataContexts (corresponding to DB namespaces) are appropriate in place of (or in addition to) one very large DataContext class (corresponding to the whole DB)?
I disagree with John's answer. The DataContext (or Linq to Entities ObjectContext) is more of a "unit of work" than a connection. It manages change tracking, etc. See this blog post for a description:
Lifetime of a LINQ to SQL DataContext
The four main points of this blog post are that a DataContext:
Is ideally suited for a "unit of work" approach
Is also designed for "stateless" server operation
Is not designed for long-lived usage
Should be used very carefully after any SubmitChanges() operation
Considering that, I don't think using more than one DataContext would do any harm; in fact, creating different DataContexts for different types of work would help make your LinqToSql implementation more usable and organized. The only downside is you wouldn't be able to use sqlmetal to auto-generate your dbml.
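To make the "unit of work" point concrete, here is a minimal sketch of scoping a DataContext to a single operation; NorthwindDataContext, Customers, CustomerId and Name are placeholders for whatever your designer actually generates, not names from the question.

```csharp
using System;
using System.Linq;

public static class CustomerService
{
    public static void RenameCustomer(int customerId, string newName)
    {
        // One DataContext per unit of work: create it, do the work, dispose it.
        // NorthwindDataContext and its members are illustrative placeholders.
        using (var db = new NorthwindDataContext())
        {
            var customer = db.Customers.Single(c => c.CustomerId == customerId);
            customer.Name = newName;
            db.SubmitChanges();
        } // disposed here: no long-lived, shared context
    }
}
```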
I'd been wrangling over the same question whilst retrofitting LINQ to SQL over a legacy DB. Our database is a bit of a whopper (150 tables), and after some thought and experimentation I elected to use multiple DataContexts. Whether this is considered an anti-pattern remains to be seen, but for now it makes life manageable.
I think John is correct.
"My main concern is that instantiating and disposing one huge DataContext class all the time for individual operations that relate to specific areas of the Database would be impose an unnecessary imposition on application resources"
How do you support that statement? What is your experiment that shows that a large DataContext is a performance bottleneck? Having multiple datacontexts is a lot like having multiple databases and makes sense in similar scenarios, that is, hardly ever. If you are working with multiple datacontexts you need to keep track of which objects belong to which datacontext and you can't relate objects that are not in the same data context. That is a costly design smell for no real benefit.
#Evan "The DataContext (or Linq to Entities ObjectContext) is more of a "unit of work" than a connection"
That is precisely why you should not have more than one datacontext. Why would you want more than one "unit of work" at a time?
I have to disagree with the accepted answer. In the question posed, the system has a single large database with strong foreign key relationships between almost every table (also the case where I work). In this scenario, breaking it up into smaller DataContexts (DC's) has two immediate and major drawbacks (both mentioned by the question):
You lose relationships between some tables. You can try to choose your DC boundaries wisely, but you will eventually run into a situation where it would be very convenient to use a relationship from a table in one DC to a table in another, and you won't be able to.
Some tables may appear in multiple DC's. This means that if you want to add table-specific helper methods, business logic, or other code in partial classes, the types won't be compatible across DC's. You can work around this by inheriting each entity class from its own specific base class, which gets messy. Also, schema changes will have to be duplicated across multiple DC's.
Now those are significant drawbacks. Are there advantages big enough to overcome them? The question mentions performance:
My main concern is that instantiating and disposing one huge DataContext class all the time for individual operations that relate to specific areas of the Database would impose an unnecessary burden on application resources.
Actually, it is not true that a large DC takes significantly more time to instantiate or use in a typical unit of work. In fact, after the first instance is created in a running process, subsequent copies of the same DC can be created almost instantaneously.
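If you want to check that claim against your own model rather than take it on faith, a rough (and unscientific) timing sketch like the one below shows the difference between the first construction and later ones; MyHugeDataContext is a stand-in for your generated context, and the parameterless constructor assumes a connection string configured in your settings.

```csharp
using System;
using System.Diagnostics;

class ContextTimingDemo
{
    static void Main()
    {
        // First construction pays the one-time cost of building the mapping metadata.
        var first = Stopwatch.StartNew();
        using (var db = new MyHugeDataContext()) { }
        first.Stop();

        // Subsequent constructions reuse the cached mapping and are far cheaper.
        var rest = Stopwatch.StartNew();
        for (int i = 0; i < 1000; i++)
        {
            using (var db = new MyHugeDataContext()) { }
        }
        rest.Stop();

        Console.WriteLine("First: {0} ms, next 1000: {1} ms",
            first.ElapsedMilliseconds, rest.ElapsedMilliseconds);
    }
}
```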
The only real advantage from multiple DC's for a single, large database with thorough foreign key relationships is that you can compartmentalize your code a little better. But you can already do this with partial classes.
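For example, because the designer emits the entity classes as partial, table-specific helpers can live in their own files without splitting the model across multiple DataContexts; the Customer, Orders and ShippedDate names below are illustrative, not taken from the question.

```csharp
using System.Linq;

// The designer generates "public partial class Customer" in the .designer.cs file;
// this partial class adds behaviour in a separate file without touching generated code.
public partial class Customer
{
    public bool HasOutstandingOrders()
    {
        // Orders is the association property LINQ to SQL generates from the FK.
        return this.Orders.Any(o => o.ShippedDate == null);
    }
}
```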
Also, the unit of work concept is not really relevant to the original question. Unit of work typically refers to how much work a single DC instance is doing, not how much work a DC class is capable of doing.
In my experience with LINQ to SQL and LINQ to Entities, a DataContext is synonymous with a connection to the database. So if you were to use multiple data stores you would need to use multiple DataContexts. My gut reaction is you wouldn't notice too much of a slowdown with a DataContext that encompasses a large number of tables. If you did, however, you could always split the database logically at points where you can isolate tables that don't have any relationship to other sets of tables, and create multiple contexts.
Related
Can using Modules or Shared/Static references to the BLL/DAL improve the performance of an ASP.NET website?
I am working on a site that consists of two projects: one the website, the other a VB.NET class library which acts as a combination of DAL and BLL.
The library is used to communicate with databases and sometimes transform/validate the data going into/coming from the DBs.
Currently each page on the site that needs db access (vast majority) will create an instance of the relevant class in the library to access specific tables.
As I understand it this leads to a class from the library being instantiated and garbage collected for each request, with the possibility of multiple concurrent instances if multiple users view the same page.
If I converted the classes to modules (shared/static classes), would performance increase and memory be saved, since only one instance of each module exists at a time and a new instance does not have to be created for each request?
(if so, does anyone know if having TableAdapters as global variables in the modules would cause problems due to threading?)
Alternatively, would making the references to the Library class in the ASP.NET page shared/static have the same effect? (except I would have to re-write a lot less)
I'm no expert, but think that the absence of examples of this static class / session object model in books and online is indicative of it being a bad idea.
I inherited a Linq-To-Sql application where the db contexts were static, and after n requests the whole thing just fell apart. The standard model for L2Sql is the Unit-of-Work pattern (define a task or set of tasks - do them and close). Let the framework worry about connection pooling and efficient GC.
Are you just trying to be efficient or do you have performance issues? If the latter it's usually more effective to look at caching or improving query efficiency (use stored procedures, root out queries in loops) than looking at object instantiation.
Statics don't play well with unit tests either (another reason why they have dropped out of fashion).
Instances are only a problem if they are not collected by the GC (a memory leak). Instances are more flexible than statics as well, because you can configure the instance for the specific context you are using.
When an application has poor performance or memory problems, it's usually a sign that:
instances are not properly released (IDisposable)
the amount of data retrieved is too big (not paging large sets of data)
a large number of queries are executed (select n+1, or just a lot of queries)
poorly constructed SQL statements (missing indexes, FKs, too many joins, etc.)
too many remote calls (either to other servers, or disk)
These are the first things I would check; then start looking at the number of instantiated objects. Chances are that correcting the above list will solve most performance bottlenecks.
Can using Modules or Shared/Static references to the BLL/DAL improve the performance of an ASP.NET website?
It's possible, but it depends heavily on how you use your data. One tradeoff in using a single shared instance of an object instead of one per request is that you will need to apply locking unless the objects are strictly read-only, and locking can both slow things down and complicate your code (not to mention being a common source of bugs).
However, if each object is going to contain the exact same data, then the tradeoff may be worth it -- even more so if it can save a DB round-trip.
You might consider using either a Singleton or a small number of parameterized objects rather than a static, though -- and use caching to manage them. That would give you the flexibility to let go of objects that you no longer need, which is harder to do when you're dealing with statics.
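As a rough illustration of that trade-off, here is one way (using invented names like CountryLookup and LoadFromDatabase) to share effectively read-only reference data across requests through the ASP.NET cache while keeping initialization thread-safe; this is a sketch, not a drop-in design.

```csharp
using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static class CountryLookup
{
    private static readonly object Sync = new object();

    public static IDictionary<string, string> GetAll()
    {
        // Shared, read-only data: safe to hand to many requests at once,
        // as long as callers never mutate the returned dictionary.
        var cached = HttpRuntime.Cache["Countries"] as IDictionary<string, string>;
        if (cached != null) return cached;

        lock (Sync)
        {
            cached = HttpRuntime.Cache["Countries"] as IDictionary<string, string>;
            if (cached == null)
            {
                cached = LoadFromDatabase(); // hypothetical DAL call
                HttpRuntime.Cache.Insert("Countries", cached, null,
                    DateTime.UtcNow.AddMinutes(30), Cache.NoSlidingExpiration);
            }
            return cached;
        }
    }

    private static IDictionary<string, string> LoadFromDatabase()
    {
        // Placeholder: in a real app this would go through the DAL.
        return new Dictionary<string, string> { { "US", "United States" } };
    }
}
```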
I read in an article that it's not good practice to pass DataSets between the different layers of a .NET web application (DAL -> BAL -> Pages, and vice versa). Is that correct?
Please give your suggestions.
Thanks
SNA
On the one hand, the problem with datasets and datatables is that they expose database implementation details like column names and types outside of your data access layer. Change a column name in your database or query and odds are that change is propagated to your dataset as well, forcing a re-compile of any tier that uses the dataset. So if you retrieve data into a dataset you should convert it to use strongly-typed business objects before passing it on.
On the other hand, a dataset doesn't care what kind of database it belongs to. You can use them with Access, Oracle, SQL Server, MySQL, anything. So there is some generic-ness there that can make them useful when passing data between tiers. And just like the business layer shouldn't care about database details, the data layer shouldn't really need to know what the business objects are, so there's a good argument that you should use them for data interchange at that level.
My normal procedure is to have a sort of one-way "translation" tier between the business and data access layers, so that the business layer only deals with business objects and the data layer only returns generic data. This currently takes one of two forms:
I'll write my data access methods to return datatables or datareaders, then the translation tier will use a factory pattern to convert those rows into the desired strongly-typed business objects.
or
I'll use C# iterator blocks to convert a datareader into an IEnumerable<IDataRecord> in the data access layer, and the translation tier will use them to change that IEnumerable<IDataRecord> into an IEnumerable<MyBusinessObject>, such that the code only ever iterates over the result set one time.
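Here is a rough sketch of that second approach; the Customers query, the connection string parameter and MyBusinessObject are all invented for illustration.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class CustomerData
{
    // Data access layer: streams raw records without materializing them all.
    public static IEnumerable<IDataRecord> GetCustomerRecords(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT Id, Name FROM Customers", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                    yield return reader; // caller must consume before the reader closes
            }
        }
    }
}

public static class CustomerTranslator
{
    // Translation tier: turns generic records into strongly typed business objects,
    // still iterating the result set only once.
    public static IEnumerable<MyBusinessObject> ToBusinessObjects(IEnumerable<IDataRecord> records)
    {
        foreach (var record in records)
        {
            yield return new MyBusinessObject
            {
                Id = record.GetInt32(0),
                Name = record.GetString(1)
            };
        }
    }
}

// Hypothetical business object used above.
public class MyBusinessObject
{
    public int Id { get; set; }
    public string Name { get; set; }
}
```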
There is nothing wrong with passing around datasets but it's not a great practice.
Pros:
Easy to pass around and use in .NET apps
No having to code wrapper classes
Lots of functionality built into DataSets
Cons:
Data type that is not really type safe.
Your data field names can change and all parts of your app will still compile fine, until they blow up at runtime.
Heavy object. Dataset does a ton of stuff and you probably don't need 90% of it.
Having non-.NET apps talk to your DAL or BAL is not going to be very clean.
There's nothing wrong about passing DataSets from your DAL to your BAL.
I think this stackoverflow question on DAL best practices sums up the two schools of thought pretty well.
I am in the middle of a "discussion" with a colleague about the best way to implement the data layer in a new application.
One viewpoint is that the data layer should be aware of business objects (our own classes that represent an entity), and be able to work with that object natively.
The opposing viewpoint is that the data layer should be object-agnostic, and purely handle simple data types (strings, bools, dates, etc.)
There is no problem with passing DataSets across layers. If you observe, you will notice that a DataSet is passed by reference and not by value, so there is no performance issue here.
Now what you read is also right, but you have to understand the context. If you are passing the dataset across remote boundaries, that is not a recommended practice.
There's nothing fundamentally wrong with doing that, although the basic idea of having a DAL, BLL and UI layer is that each layer can abstract what's beneath it. E.g. the BLL shouldn't have any knowledge of how the database is structured, because the DAL abstracts that away. If a dataset is being loaded in the DAL then passed straight through the BLL to the pages, it kind of sounds like the BLL is pointless.
The strongest statement often seen about DataSets is not to pass them into or out of a web service. That goes beyond exposing implementation details, and includes exposing details of the platform (.NET).
Although it's possible to change "table" and "column" names in a DataSet from those in the underlying database, you're still largely stuck with the underlying structure of the database. To abstract that, I would use Entity Framework. It allows you, for instance, to define a "Customer" entity which takes data from multiple tables and puts it into a single entity. Code using the entity doesn't need to know whether it is implemented as one table, two tables, or whatever.
Even there, you should not pass these entities outside of a web service boundary. They still pass implementation details outside of the implementation. For instance, properties of the base classes get serialized, even though these are just implementation details.
As far as I've understood, the DataSet requires the db connection to be open, for as long as it is used, which will reduce performance in your application as it keeps the connection open until the content is rendered.
Instead, I recommend using generic collections, such as IEnumerable<myType> or IQueryable<myType>, where myType is a custom type which you fill with your data.
Is it possible to get a LINQ to SQL DataContext to run completely in-memory? Without it touching the database?
I am doing some very rapid prototyping, and want to minimize the surface area for major changes since the UI is changing so fast. However, the data model already exists.
Data access is handled through the use of I[Model]Repository classes that return the actual LINQ to SQL data classes, so I currently have some concrete InMemory[Model]Repository classes that shove stuff in cache. The implementation is a little cumbersome however.
So... is it possible to simply override enough of the DataContext behavior to have it run in-memory and never touch the database? My assumption is that it is not possible, but I thought I would go fishing anyway.
You can only do this if you are prepared to wrap access to the datacontext with your own interface. Then for rapid prototyping you can write your own datacontext alternative that implements this interface and instead uses lists and LINQ to Objects to perform in-memory queries.
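To flesh that out a little, one possible shape for such a wrapper is sketched below; IOrderRepository, Order and InMemoryOrderRepository are invented names, and the real implementation would run the same interface against the DataContext instead of a List<T>.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; in the real app this would be the LINQ to SQL class.
public class Order
{
    public int Id { get; set; }
    public string CustomerName { get; set; }
}

public interface IOrderRepository
{
    IEnumerable<Order> GetByCustomer(string customerName);
    void Add(Order order);
}

// Prototype implementation: LINQ to Objects over an in-memory list,
// never touching the database.
public class InMemoryOrderRepository : IOrderRepository
{
    private readonly List<Order> _orders = new List<Order>();

    public IEnumerable<Order> GetByCustomer(string customerName)
    {
        return _orders.Where(o => o.CustomerName == customerName).ToList();
    }

    public void Add(Order order)
    {
        _orders.Add(order);
    }
}
```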
I'm thinking through data access for an ASP.NET application. Coming from a company that uses a lot of Windows applications with client data sets, there is a natural tendency towards a DataSet approach for dealing with data.
I'm more keen on a Business Object approach and I don't like the idea of caching a DataSet in the session then applying an update.
Does anyone have any experience / help to pass on about the pros and cons of both approaches?
You are smart to be thinking of designing a Data Layer in your app. In an ASP.NET application this will help you standardize and pretty dramatically simplify your data access. You will need to learn how to create and use ObjectDataSources but this is quite straightforward.
The other advantage of a data access layer (built using a separate project/DLL) is that it makes Unit testing much simpler. I'd also encourage you to build a Business Layer to do much of the processing of data (the business layer, for example, would be responsible for pulling ObjectDataSources from the DAL to hand to the UI code). Not only does this let you encapsulate your business logic, it improves the testability of the code as well.
You do not want to be caching DataSets (or DAL objects, for that matter) in the session! You will build a Web app so that record modifications work through a Unique ID (or other primary key spec) and feed changes directly to the DAL as they are made. If you were to cache everything you would dramatically reduce the scalability of your app.
Update: Others on this thread are promoting the idea of using ORMs. I would be careful about adopting a full-blown ORM for reasons that I have previously outlined here and here. I do agree, though, that it would be wise to avoid DataSets. In my own work, I make extensive use of DataReaders to fill my ObjectDataSources (which is trivial due to the design of my DAL) and find it to be very efficient.
DataSets can be incredibly inefficient compared even to other ADO.NET objects like DataReaders. I would suggest going towards the BO/ORM route based off what you are saying.
If you're going to follow Microsoft's direction, then the trend is definitely towards LINQ (ORM) vs. DataSets. When DataSets came into being (ASP.NET 1.0), LINQ wasn't even possible. With LINQ you get type-safety and built-in functions to Create / Update / Delete from the database.
Microsoft has even tried to make the transition easier through LINQ to DataSet.
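For reference, LINQ to DataSet lets you query an existing DataTable with the same LINQ syntax via the System.Data.DataSetExtensions assembly; the Orders table and its columns below are made up for the example.

```csharp
using System;
using System.Data;
using System.Linq; // plus a project reference to System.Data.DataSetExtensions

class LinqToDataSetDemo
{
    static void Main()
    {
        var orders = new DataTable("Orders");
        orders.Columns.Add("Id", typeof(int));
        orders.Columns.Add("Total", typeof(decimal));
        orders.Rows.Add(1, 20m);
        orders.Rows.Add(2, 75m);

        // AsEnumerable() and Field<T>() come from System.Data.DataSetExtensions.
        var bigOrders = orders.AsEnumerable()
                              .Where(r => r.Field<decimal>("Total") > 50m)
                              .Select(r => r.Field<int>("Id"));

        foreach (int id in bigOrders)
            Console.WriteLine(id);
    }
}
```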
We're about to do a big update to an existing ASP app that used DataSet objects heavily; although I am not looking forward to the pain, I am going to insist on going down the BO route. Just the thought of trying to make datasets work now causes me to break out in a sweat.
I think we are going to go down the LINQ route and use lightweight entity objects.
The company where I work also makes heavy use of DataSets, even though there is a business layer; the BL mainly loads datasets from the DB.
I personally dislike this approach. There is also a practice of directly modifying the datasets after load/before save to meet some immediate needs here and there. To me it really violates the idea of business objects, but that's how it is done.
ORM frameworks can really save you a great deal of time, especially in enterprise applications with lots of views with similar buttons and operations.
But it's also easy to lose control, and from that point it slowly turns into a mess.
Both options are good when used in right cases. Just don't mix them. Decide to do it one way and follow it.
Mark Brittingham's answer is accurate for two-tier applications. But what if I want to use a service tier? DataSets are serializable. Typed DataSets save time over hand-coding your own objects. Typed DataSets are extendable. Linq to Entities has performance issues; Linq to SQL is now dead. Linq to DataSet will always be an option.
I will use Typed DataSets and a multi-layered architecture to save time and organize code. I've tried hand-coded BOs, and the extra development and maintenance time is not worth it.
What is the best way to implement DTOs?
My understanding is that they are one way to transfer data between objects. For example, in an ASP.Net app, you might use a DTO to send data from the code-behind to the business logic layer component.
What about other options, like just sending the data as method parameters? (Would this be easiest in cases where there is less data to send?)
What about a static class that just holds data, that can be referenced by other objects (a kind of global assembly data storage class)? (Does this break encapsulation too much?)
What about a single generic DTO used for every transfer? It may be a bit more trouble to use, but reduces the number of classes needed to work with (reduces object clutter).
Thanks for sharing your thoughts.
I've used DTO's to:
Pass data between the UI and service tiers of a standard 3-tier app.
Pass data as method parameters to encapsulate a large number (5+) of parameters.
The 'one DTO to rule them all' approach could get messy; your best bet is to go with specific DTO's for each feature/feature group, taking care to name them so they're easy to match between the features they're used in.
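As a small illustration of that, a feature-specific DTO might look something like the sketch below; all names are made up for the example.

```csharp
// Feature-specific DTO: named after the screen/feature it serves so it is
// easy to match up, and carrying only the fields that feature needs.
public class CustomerRegistrationDto
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public string CountryCode { get; set; }
}

// It also collapses a long parameter list into one argument:
// registrationService.Register(dto) instead of Register(first, last, email, country, ...).
```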
I've never seen static DTO's in the way you mention and would hesitate at creating DTO singletons like you describe.
I keep it simple and map one DTO class to one db table. They are lightweight so I can send them everywhere, including over the wire.
I wish it could be that simple. Though DTOs originated with the network-distributed tiers of a system, there can be a whole load of issues if domain objects are returned to View layers. Here are some of them:
1. By exposing domain objects to the View layer, Views become aware of the structure of the domain objects, which lets the view make assumptions about how related objects are available. For example, if a domain object "Person" was returned to a view to which it is "bound", and on some other view the "Address" of the Person is to be bound, there would be a tendency for the Application layer to use a semantic like person.getAddress(), which would fail since at that point the Address domain object might not have been loaded. In essence, with domain objects becoming available to View layers, views can always make assumptions about how data is made available.
2. When domain objects are bound to views (more so in thick clients), there will always be a tendency for view-centric logic to creep inside these objects, making them logically corrupt.
Basically, from my experience I have seen that making domain objects available to Views creates architectural issues, but there are issues with the use of DTO's also, since DTO's create additional work in terms of creating Assemblers (DTO to Domain objects and the reverse) and a proliferation of analogous objects like a Patient domain object, a Patient DTO, and perhaps a Patient bean bound to the view.
Clearly there are no right answers for this, especially in a thick-client system.
I borrowed this short, incomplete but true answer to the DTO cliché from:
http://www.theserverside.com/discussions/thread.tss?thread_id=32389#160505
I think it's pretty common to use DataSet/DataTable as the "one DTO to rule them all". It's easy to load them from the database, and persist the values back, and they can be easily serialized.
I would definitely say they are more trouble to use. They do provide all of the plumbing, but programming against them is a pain (lots of casting, null checks, magic strings, etc). It would be interesting to see a good set of extension methods to make working with them a little more "natural".
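As a taste of what such extension methods might look like (beyond the Field<T> helpers that System.Data.DataSetExtensions already provides), here is a hedged sketch; ValueOrDefault is an invented helper, not a framework method.

```csharp
using System;
using System.Data;

public static class DataRowHelpers
{
    // Illustrative extension: read a column as T, returning a fallback
    // instead of forcing the caller to cast and null-check every time.
    public static T ValueOrDefault<T>(this DataRow row, string columnName, T fallback)
    {
        if (!row.Table.Columns.Contains(columnName)) return fallback;
        object value = row[columnName];
        if (value == null || value == DBNull.Value) return fallback;
        return (T)Convert.ChangeType(value, typeof(T));
    }
}

// Usage: decimal total = row.ValueOrDefault("OrderTotal", 0m);
```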
DTOs are used to send data over the wire, not between objects. Check out this post:
POCO vs DTO
Thanks for all the helpful ideas...
A summary + my take on this:
--If there is a small amount of data to move and not too many places to move it, regular parameters may suffice
--If there is a lot of data and/or many objects to move it to, a specially created object may be easiest (DTO object).
--A global data object that can be referenced (rather than passed) by various objects would seem to be frowned on... however, I wonder if there isn't sometimes a place for it within a particular sub-system? It is one way to reduce the amount of data passing. It does push the limits of "good encapsulation", however in specific instances within specific layers, perhaps it could add simplicity to a particular assembly of classes. Thus one would lose class-level encapsulation, but could still have assembly-level encapsulation.