Entity Attribute Value (EAV) vs. XML Column for New Product Attributes - asp.net

I have an existing, mature schema to which we need to add some new Product attributes. For example, we have Products.Flavor, and now need to add new attributes such as Weight, Fragrance, etc. Rather than continue to widen the Products table, I am considering a couple of other options. First is a new Attributes table, which will effectively be a property bag for arbitrary attributes, and a ProductsAttributes table to store the mappings (and values) for a particular product's attributes. This is the Entity-Attribute-Value (EAV) pattern, as I've come to understand it. The other option is to add a new column to the Products table called Attributes, which is of type XML. Here, we can arbitrarily add attributes to any product instance without adding new tables.
What are the pros/cons to each approach? I'm using SQL Server 2008 and ASP.NET 4.0.

This is (imho) one of the classic database design issues. Call it "attribute creep", perhaps, as once you start, there will always be another attribute or property to add. The key decision is: do you store the data within the database using the basic tools provided by the database (tables and columns) to structure and format the data, or do you store the data in some other fashion (XML and name/value pairs being the most common alternatives)? Simply put, if you store the data in a form other than that supported by the DBMS, then you lose the power of the DBMS to manage, maintain, and work with that data. This is not much of a problem if you only need to store it as "blob data" (dump it all in, pump it all out), but once you have to seek, sort, or filter by this data, it can get very ugly very fast.
With that said, I do have strong opinions on name/value pairs and XML, but alas, none are positive. If you do have to store your data this way, and yes, it can be an entirely valid business/design decision, then I would recommend looking long and hard at how the data you need to store in the database will be used and accessed in the future. Weigh the pros and cons of each methodology in light of how it will be used, and pick the one that's easiest to manage and maintain. (Don't pick the one that's easiest to implement; you'll be supporting it for a lot longer than you'll be writing it.)
(It's long, but the "RLH" essay is a classic example of name/value pairs run amok.)
(Oh, and if you're using it, look into SQL Server 2008's "Sparse Columns" option. Doesn't sound like what you need, but you never know.)
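For reference, a minimal sketch of the two structures the question describes (column names and types are my assumptions, not the actual schema):

-- Option 1: EAV via an attribute property bag plus a mapping table.
CREATE TABLE Attributes (
    AttributeId int IDENTITY PRIMARY KEY,
    Name nvarchar(100) NOT NULL
);
CREATE TABLE ProductsAttributes (
    ProductId int NOT NULL,      -- FK to Products
    AttributeId int NOT NULL,    -- FK to Attributes
    Value nvarchar(max) NULL,
    PRIMARY KEY (ProductId, AttributeId)
);

-- Option 2: a single XML column on the existing table.
ALTER TABLE Products ADD Attributes xml NULL;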

Related

Best way to store key value pairs in BizTalk

We have requirements where we need to store key-value pair type data for quick retrieval in a BizTalk map.
Is there a best practice by which we can store it? The data stored should be easy to maintain and should have a caching mechanism for easy retrieval, as the number of key-value pairs to be stored can be high, ranging from 1 to 100 or more.
We do not need to store confidential information, so I am not leaning towards SSO. But is it still a preferable method?
We need to use it in BizTalk map and data retrieval might happen for each row of data so there is performance pressure as well.
For this I would use the XRef functoids, Get Common Value and Get Application Value. They give you the ability to have key-value pairs (with an additional element so you can scope them per application), and they also do caching.
I wrote a blog post about it: BizTalk Pattern: Translating Reference Data in a Map using Xref.
You do have the xRef Functoids, but those are somewhat difficult to maintain and use beyond their original design requirement.
What I have done in a similar situation is to pre-fetch the lookup tables as SQL table types into an Orchestration, then use a multi-input map, passing in the business message and the lookup tables. That way, all lookups are internal to the transform.
Retrieving the entire lookup table once is in many cases less taxing than doing many individual lookups.
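As a sketch of the pre-fetch idea (the table, procedure, and column names here are illustrative, not BizTalk-specific), the orchestration would call something like this once and pass the whole result set into the multi-input map:

CREATE TABLE RefValues (
    AppId nvarchar(50) NOT NULL,
    CommonValue nvarchar(100) NOT NULL,
    AppValue nvarchar(100) NOT NULL
);

CREATE PROCEDURE GetRefValues @AppId nvarchar(50) AS
BEGIN
    SET NOCOUNT ON;
    -- One round trip returns the whole lookup table for the application.
    SELECT CommonValue, AppValue
    FROM RefValues
    WHERE AppId = @AppId;
END;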

How to setup data model for customizable application

I have an ASP.NET data entry application that is used by multiple clients. The application consists of multiple data entry modules that are common to all clients.
I now have multiple clients that want their own custom module added which will typically consist of a dozen or so data points. Some values will be text, others numeric, some will be dropdown selections, etc.
I'm in need of suggestions for handling the data model for this. I have two thoughts on how to handle it. First would be to create a new table for each new module for each client. This is pretty clean but I don't particularly like it. My other thought is to have one table with columns for each custom data point for each client. This table would end up with a lot of columns and a lot of NULL values. I don't really like either solution and suspect there's a better way to do this, so any feedback you have will be appreciated.
I'm using SQL Server 2008.
As always with these questions, "it depends".
The dreaded key-value table.
This approach relies on a table which lists the fields and their values as individual records.
CustomFields(clientId int, fieldName sysname, fieldValue varbinary(max))
Benefits:
Infinitely flexible
Easy to implement
Easy to index
Non-existing values take no space
Disadvantage:
Showing a list of all records with complete field list is a very dirty query
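To illustrate, a sketch of the pivot needed to flatten the key-value rows back into one record per client (the field names are made up; every field needs its own CASE branch, maintained by hand):

SELECT clientId,
       MAX(CASE WHEN fieldName = 'Weight'    THEN CAST(fieldValue AS nvarchar(100)) END) AS Weight,
       MAX(CASE WHEN fieldName = 'Fragrance' THEN CAST(fieldValue AS nvarchar(100)) END) AS Fragrance
FROM CustomFields
GROUP BY clientId;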
The Microsoft way
The Microsoft answer to this kind of problem is "sparse columns", introduced in SQL Server 2008.
Benefits:
Blessed by the people who design SQL Server
Records can be queried without having to apply fancy pivots
Fields without data don't take space on disk
Disadvantages:
Many technical restrictions
A new field requires DDL
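A sketch of what the sparse approach looks like (table and column names assumed for illustration):

-- Each new custom field is an ALTER TABLE (DDL) against production:
ALTER TABLE CustomData ADD Weight decimal(10,2) SPARSE NULL;
ALTER TABLE CustomData ADD Fragrance nvarchar(50) SPARSE NULL;

-- But they are then queried like ordinary columns, no pivoting needed:
SELECT recordId, Weight, Fragrance
FROM CustomData
WHERE Fragrance IS NOT NULL;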
The xml tax
You can add an xml field to the table which will be used to store all the "extra" fields.
Benefits:
Unlimited flexibility
Can be indexed
Storage efficient (when it fits in a page)
With some XPath gymnastics the fields can be included in a flat recordset
Schema can be enforced with schema collections
Disadvantages:
Not clearly visible what's in the field
XQuery support in SQL Server has gaps, which makes getting your data a real nightmare sometimes
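A sketch of the XML variant and the XPath gymnastics (element names are illustrative):

ALTER TABLE CustomData ADD ExtraFields xml NULL;

-- A primary XML index helps the query side (requires a clustered PK on the table):
CREATE PRIMARY XML INDEX IX_CustomData_ExtraFields ON CustomData(ExtraFields);

-- Pulling a field into a flat recordset takes value() with a singleton XPath:
SELECT recordId,
       ExtraFields.value('(/fields/weight)[1]', 'decimal(10,2)') AS Weight
FROM CustomData;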
There are maybe more solutions, but to me these are the main contenders. Which one to choose:
Key-value seems appropriate when the number of extra fields is limited (say no more than 10-20 or so).
Sparse columns are more suitable for data with many properties which are filled out infrequently. Sounds more appropriate when you can have many extra fields.
The XML column is very flexible, but a pain to query. Appropriate for solutions that write rarely and query rarely, i.e. don't run aggregates etc. on the data stored in this field.
I'd suggest you go with the first option you described. I wouldn't overthink it. The second option you outlined would be a bad idea, in my opinion.
If there are fields common to all the modules you're adding to the system you should consider keeping those in a single table then have other tables with the fields specific to a particular module related back to the primary key in the common table. This is basically table inheritance (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server) and will centralize the common module data and make it easier to query across modules.
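A quick sketch of that table-inheritance shape (names illustrative): the common fields live once in the parent table, and each module's specific fields sit in a child table keyed back to it.

CREATE TABLE ModuleEntry (
    EntryId int IDENTITY PRIMARY KEY,
    ClientId int NOT NULL,
    CreatedOn datetime NOT NULL
);

CREATE TABLE ClientAModuleEntry (
    EntryId int PRIMARY KEY REFERENCES ModuleEntry(EntryId),  -- 1:1 with the parent row
    SomeText nvarchar(100) NULL,
    SomeNumber decimal(10,2) NULL
);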

How to implement gapless, user-friendly IDs in NHibernate?

I'm designing an application where my Order objects need to have a sequential and user-friendly Id field. I'm avoiding the HiLo algorithm because of the rather large gaps it produces (see here). Naturally, Guid values would make my corporate users go bananas. I'm also avoiding Oracle sequences because of the major disadvantages of it:
(From: NHibernate POID Generators revealed)
Post insert generators, as the name suggests, assign the ids after the entity is stored in the database. A select statement is executed against the database. They have many drawbacks, and in my opinion they must be used only on brownfield projects. Those generators are what WE DO NOT SUGGEST as NH Team. Some of the drawbacks are the following:
Unit Of Work is broken with the use of those strategies. It doesn't matter if you're using FlushMode.Commit, each Save results in an insert statement against the DB. As a best practice, we should defer insertions to the commit, but using a post insert generator makes it commit on save (which is what UoW doesn't do).
Those strategies nullify the batcher; you can't take advantage of sending multiple queries at once (as it must go to the database at the time of Save).
Any ideas/experience on implementing user-friendly IDs without major gaps between them?
Edit:
User friendly Id fields are ones my corporate users can memorize and even discuss and/or have phone conversations talking about a particular Order by its code, e.g. "I'm calling to know why the order #1625 was denied.".
The Id doesn't need to be strictly gapless, but I am worried that my users would get confused when they see gaps like 100, 201, 305. For my older projects, I currently implement NHibernate using Oracle sequences, which occasionally lose a few values when exceptions are thrown, yet keep a rather tidy order. The downside is how they break the Unit of Work, which results in additional hits to the database for every Save command, with or without Session.Flush.
One option would be to keep a key-table that simply stores an incrementing value. This can introduce a few problems, namely possible locking issues as well as additional hits to the database.
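A sketch of that key table (names assumed); the compound SET both increments and reads the counter in a single atomic statement, which keeps the locking window short:

CREATE TABLE OrderKeys (
    name varchar(50) PRIMARY KEY,
    nextId int NOT NULL
);

DECLARE @id int;
-- Run inside the same transaction as the Order insert; a rollback
-- returns the number to the pool, which is what keeps it gapless.
UPDATE OrderKeys
SET @id = nextId = nextId + 1
WHERE name = 'Order';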
Another option might be to refine what you mean by "User-friendly Id". This could consist of a combination of a Date/Time and a customer-specific sequence (or including the customer id as well). Also, your order id does not necessarily have to be the actual key on the table. There is nothing to say that you can't use a surrogate key with a separate "calculated" column which represents the order id.
The bottom-line is that it sounds like you want to use a surrogate key, but have the benefits of a natural key. It can be very difficult to have it both ways and a lot comes down to how you actually plan on using the data, how users interpret the data, and personal preference.
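For the calculated-column idea above, a hedged sketch (the code format is entirely illustrative):

CREATE TABLE Orders (
    OrderId int IDENTITY PRIMARY KEY,   -- surrogate key; gaps here are harmless
    CustomerId int NOT NULL,
    CustomerSeq int NOT NULL,           -- per-customer counter, e.g. from a key table
    -- The code users quote on the phone is derived, not the actual key:
    OrderCode AS ('C' + CAST(CustomerId AS varchar(10)) + '-' + CAST(CustomerSeq AS varchar(10)))
);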

What is the best way to implement multilingual domain objects using NHibernate?

What is the best way to design domain objects that can have multi-lingual fields? An example would be a Product class with a multi-lingual Description.
I have found a few links but could not decide which approach is best.
http://fabiomaulo.blogspot.com/2009/06/localized-property-with-nhibernate.html
(This stores all localised language data in one field. Can be a problem if we query from SQL.)
http://ayende.com/Blog/archive/2006/12/26/LocalizingNHibernateContextualParameters.aspx
(This one has a warning at the beginning that it is a hack and no longer supported)
http://www.webdevbros.net/2009/06/24/create-a-multi-languaged-domain-model-with-nhibernate-and-c/
(This does not describe how multilingual data will be structured in the database.)
Anyone having experience with using NHibernate with multi-lingual data. Is there a better way?
The third option looks great. The hibernate mapping is given, but not the database schema - if that's what you are missing, then I'll sketch it out here:
dictionary
----------
ID: int - identity
name: nvarchar(255)
phrase
------
dictionary_id:int (fkey dictionary.ID)
culture_id:int (LCID)
phrase:nvarchar(255) - this is the default size - seems too small
According to this blog entry, 255 is the default string length for String values. To overcome the short string length on the phrase text, you can change the <element> tag to
<element column="phrase" type="String" length="4001"></element>
To use this in your domain model, you add a PhraseDictionary property to your entity where you want translatable text, e.g. the title property or description property.
I think the article describes a great approach, and is the one that I would go for.
EDIT: In response to the comments, make the length less than 4001 if you know the absolute maximum size is less than that, as this will typically be faster. Also, NHibernate will lazily fetch the collection, but it may fetch all the items at once. You can profile to determine if this has any performance implications. (If you have only a handful of languages then I doubt you will see a difference.) If you have many languages (say 50+) then it may be worthwhile creating custom properties to fetch the localized text. These will issue queries to fetch specifically the text required. More importantly, you may be able to fetch all the text for a given entity in one query, rather than each localized text property as a separate query.
Note that this extra effort is only needed if profiling gives you reason to be concerned about the performance. Chances are that the implementation in the article as is will function more than adequately.
I only have experience with Hibernate, but since NHibernate is so similar:
One option is to define a component type MultilingualString with members for each language (this assumes the set of languages is known at coding time). This type is also a convenient location to place a getter for the string by language id.
class MultiLingualString {
    String english;
    String chinese;
    String klingon;

    String forLanguage(Language lang) {
        switch (lang) {
            // you can guess what goes here
        }
    }
}
This results in the strings for all languages being stored in separate columns in the database while the representation in the object world retains fine granularity.
The advantage is that no join is required to fetch the strings. On the other hand, the only way not to fetch a string with this approach is to use a projection, which is a severe limitation if the strings are large, numerous and rarely needed.
If you do this a lot, writing a UserType might be worth it.
From a strictly database-oriented standpoint with SQL Server, you should have one table with all of the base data (record key, dates, numbers, etc.) and one table with all of the translatable string data. Let's call the two tables Base and Base_Description.
Base ensures that there is a single key for each record; the key might be a string or an auto-generated id, depending on your particular use case.
The Base_Description table is related to the Base table, but also contains a value to select the language that the data is in. In my projects we use the langid column from sys.languages because we can set the language of the connection with SET LANGUAGE and then grab it with @@LANGID for most operations.
In our testing we found this to be significantly faster than having multiple fields for each language, it also allows you to add other languages more easily. We are also using SQL Server Full-Text indexing and it fully works with this method. You should index in the neutral language and then you can pick the language to search against at run time (also filtering against the LangID column in Base_Description).
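A minimal sketch of that layout (names illustrative; the langid filter uses @@LANGID as described):

CREATE TABLE Base (
    BaseId int IDENTITY PRIMARY KEY,
    CreatedOn datetime NOT NULL
);

CREATE TABLE Base_Description (
    BaseId int NOT NULL REFERENCES Base(BaseId),
    LangId smallint NOT NULL,            -- langid from sys.languages
    Description nvarchar(4000) NOT NULL,
    PRIMARY KEY (BaseId, LangId)
);

-- Fetch descriptions in the connection's current language:
SET LANGUAGE French;
SELECT b.BaseId, d.Description
FROM Base b
JOIN Base_Description d ON d.BaseId = b.BaseId AND d.LangId = @@LANGID;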
Do your requirements include the domain objects actually having multiple-language properties in the same object? And, if so, is it unlimited translations stored in the object (in a collection, say - in which case I would say that it would need to be just like any master/detail or parent/child collection) or fixed translations, in which case the languages (and thus the mapping to results of a stored proc or whatever) have to be determined statically anyway?
In many internationalized applications I worked on, the data was in only one language - customer names, the product names (there was no point in mapping even identical products used in one country to products in another, they all had different distributors and different SKUs, and of course localized pricing). The interface was also only in one language (at a time). So all the domain objects only required one language at a time. Thus the language of the translation would be determined when the object was instantiated.
We had translation user interfaces which allowed users to update the translated texts, but these only required two languages at a time (local and the default). I can see this being closest to what you are talking about. I guess that you would have child collections for each translatable property with all the possible translations in the collection. This would probably be closest to the second solution in the third article you linked. Of course, at this point you would also need to see if you want eager/lazy loading etc.

n-tier design, lookup tables, and custom entities

In an n-tiered application where you are using custom entities, how do you find yourself handling data needed from lookup tables? Do you create entities for each of these lookup tables or employ some other strategy?
For example, I have a "Ratings" lookup table that will be used to populate a dropdownlist. Would you create a ratings object with a ratingid and rating property and pass that to your UI, or is there a more efficient way to go about it?
Appreciate your thoughts.
I'd suggest that the solution will depend on how often the lookup data changes, whether or not it needs to be editable, and whether or not you're enforcing referential integrity at the database. I think it makes the schema more understandable if you put each lookup type into a separate table.
I generally don't create entities for each lookup table, but instead will load most of the common lookups into structures that are easily re-used by the application - for an asp.net app, for example, I'll create hashtables or ordered dictionaries which can easily be bound to most web controls.
And, horror of horrors, I sometimes create a singleton to manage access to all these lookups, which can be stored as static vars or in the cache, depending on requirements.
We separate out the different lookup types into different objects. It seems like a little more work up front, but it gives us the ability to make changes to each individual object when we need to, such as adding additional information to an object.
