What is the best way to implement multilingual domain objects using NHibernate? - asp.net

What is the best way to design domain objects that can have multilingual fields? An example would be a Product class whose Description is multilingual.
I have found a few links but could not decide which approach is best.
http://fabiomaulo.blogspot.com/2009/06/localized-property-with-nhibernate.html
(This stores all localised language data in one field. Can be a problem if we query from Sql)
http://ayende.com/Blog/archive/2006/12/26/LocalizingNHibernateContextualParameters.aspx
(This one has a warning at the beginning that it is a hack and no longer supported)
http://www.webdevbros.net/2009/06/24/create-a-multi-languaged-domain-model-with-nhibernate-and-c/
(This does not describe how multilingual data will be structured in the database.)
Does anyone have experience using NHibernate with multilingual data? Is there a better way?

The third option looks great. The hibernate mapping is given, but not the database schema - if that's what you are missing, then I'll sketch it out here:
dictionary
----------
ID: int - identity
name: nvarchar(255)
phrase
------
dictionary_id:int (fkey dictionary.ID)
culture_id:int (LCID)
phrase:nvarchar(255) - this is the default size - seems too small
According to this blog entry, 255 is the default string length for String values. To overcome the short string length on the phrase text, you can change the <element> tag to
<element column="phrase" type="String" length="4001"></element>
To use this in your domain model, you add a PhraseDictionary property to your entity wherever you want translatable text, e.g. the title property or description property.
I think the article describes a great approach, and it is the one that I would go for.
EDIT: In response to the comments, make the length less than 4001 if you know the absolute maximum size is less than that, as this will typically be faster. Also, NHibernate will lazily fetch the collection, but it may fetch all the items at once. You can profile to determine if this has any performance implications. (If you have only a handful of languages then I doubt you will see a difference.) If you have many languages (Say 50+) then it may be worthwhile creating custom properties to fetch the localized text. These will issue queries to fetch specifically the text required. More importantly, you may be able to fetch all the text for a given entity in one query, rather than each localized text property as a separate query.
Note that this extra effort is only needed if profiling gives you reason to be concerned about the performance. Chances are that the implementation in the article as is will function more than adequately.
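As a rough illustration of the dictionary/phrase schema and of fetching all of an entity's localized text in one query, here is a minimal Python sketch against an in-memory SQLite database. The product table, its two *_dictionary_id columns, the composite key on phrase, and the sample rows are assumptions made for the example; they are not taken from the article's mapping.

```python
import sqlite3

# In-memory SQLite stands in for the real database; column types are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dictionary (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT
    );
    CREATE TABLE phrase (
        dictionary_id INTEGER NOT NULL REFERENCES dictionary(id),
        culture_id    INTEGER NOT NULL,   -- LCID, e.g. 1033 = en-US, 1031 = de-DE
        phrase        TEXT NOT NULL,
        PRIMARY KEY (dictionary_id, culture_id)   -- assumed: one phrase per culture
    );
    -- Hypothetical entity: each translatable property points at a dictionary row.
    CREATE TABLE product (
        id                        INTEGER PRIMARY KEY,
        title_dictionary_id       INTEGER REFERENCES dictionary(id),
        description_dictionary_id INTEGER REFERENCES dictionary(id)
    );
    INSERT INTO dictionary (id, name) VALUES (1, 'product 1 title'), (2, 'product 1 description');
    INSERT INTO phrase VALUES
        (1, 1033, 'Chair'), (1, 1031, 'Stuhl'),
        (2, 1033, 'A wooden chair'), (2, 1031, 'Ein Holzstuhl');
    INSERT INTO product VALUES (1, 1, 2);
""")


def localized_product(conn, product_id, culture_id):
    """Fetch title and description for one culture in a single query."""
    row = conn.execute(
        """
        SELECT t.phrase, d.phrase
        FROM product p
        LEFT JOIN phrase t ON t.dictionary_id = p.title_dictionary_id
                          AND t.culture_id = ?
        LEFT JOIN phrase d ON d.dictionary_id = p.description_dictionary_id
                          AND d.culture_id = ?
        WHERE p.id = ?
        """,
        (culture_id, culture_id, product_id),
    ).fetchone()
    if row is None:
        return None  # no such product
    return {"title": row[0], "description": row[1]}


print(localized_product(conn, 1, 1031))
```

A missing translation comes back as None for that property, so the caller can decide whether to fall back to a default culture.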

I only have experience with Hibernate, but since NHibernate is so similar:
One option is to define a component type MultiLingualString with members for each language (this assumes the set of languages is known at coding time). This type is also a convenient place to put a getter for the string by language id.
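// Language is assumed to be an enum like this; the original answer does not define it.
enum Language { ENGLISH, CHINESE, KLINGON }
class MultiLingualString {
    String english;
    String chinese;
    String klingon;
    // Returns the stored text for the requested language.
    String forLanguage(Language lang) {
        switch (lang) {
            case ENGLISH: return english;
            case CHINESE: return chinese;
            case KLINGON: return klingon;
            default: throw new IllegalArgumentException("Unsupported language: " + lang);
        }
    }
}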
This results in the strings for all languages being stored in separate columns in the database while the representation in the object world retains fine granularity.
The advantage is that no join is required to fetch the strings. On the other hand, the only way not to fetch a string with this approach is to use a projection, which is a severe limitation if the strings are large, numerous and rarely needed.
If you do this a lot, writing a UserType might be worth it.

From a strictly database-oriented standpoint with SQL Server, you should have one table with all of the base data (record key, dates, numbers, etc.) and one table with all of the translatable string data. Let's call the two tables Base and Base_Description.
Base ensures that there is a single key for each record, the key might be a string or auto-generated id depending on your particular use case.
The Base_Description table is related to the Base table, but also contains a value to select the language that the data is in. In my projects we use the langid column from sys.syslanguages because we can set the language of the connection with SET LANGUAGE and then grab it with @@LANGID for most operations.
In our testing we found this to be significantly faster than having multiple fields for each language; it also allows you to add other languages more easily. We are also using SQL Server Full-Text indexing and it fully works with this method. You should index in the neutral language and then you can pick the language to search against at run time (also filtering against the LangID column in Base_Description).

Do your requirements include the domain objects actually having multiple-language properties in the same object? And, if so, is it unlimited translations stored in the object (in a collection, say - in which case I would say that it would need to be just like any master/detail or parent/child collection) or fixed translations, in which case the languages (and thus the mapping to results of a stored proc or whatever) have to be determined statically anyway?
In many internationalized applications I worked on, the data was in only one language - customer names, the product names (there was no point in mapping even identical products used in one country to products in another, they all had different distributors and different SKUs, and of course localized pricing). The interface was also only in one language (at a time). So all the domain objects only required one language at a time. Thus the language of the translation would be determined when the object was instantiated.
We had translation user interfaces which allowed users to update the translated texts, but these only required two languages at a time (local and the default). I can see this being closest to what you are talking about. I guess that you would have child collections for each translatable property with all the possible translations in the collection. This would probably be closest to the second solution in the third article you linked. Of course, at this point you would also need to see if you want eager/lazy loading etc.

Related

REST URI - GET Resource batch using array of ID's

The title is probably poorly worded, but I'm trying my hand at creating a REST API with Symfony. I've studied a few public APIs to get a feel for it, and a common principle seems to be dealing with a single resource path at a time. However, the data I'm working with has a lot of levels (7-8), and each level is only guaranteed to be unique under its parent (the whole path makes a composite key).
In this structure, I'd like to get all children resources from all or several parents. I know about filtering data using the queryParam at the end of a URI, but it seems like specifying the parent id(s) as an array is better.
As an example, let's say I have companies in my database, which own routers, which delegate traffic for some number of devices. The REST URI to get all devices for a router might look like this:
/devices/company/:c_id/routers/:r_id/getdevices
but then the user has to crawl through all :r_id's to get all the devices for a company. The suggestions I've seen all involve moving the :r_id out of the path and using it in the query string:
/devices/company/:c_id/getdevices?router_id[]=1&router_id[]=2
I get it, but I wouldn't want to use it at that point.
Instead, what seems functionally better, yet philosophically questionable, is doing this:
/devices/company/:c_id/routers/:[r_ids]/getdevices
Where [r_ids] is a stringified array of ids that can be decoded into an array of integers/strings server-side. This also frees up the query-parameter string to focus on filtering devices by attributes (age, price, total traffic, status).
However, I am new to all of this and having trouble finding what is "standard". Is this a reasonable solution?
I'll add that I've tested the array string out in Symfony and it works great. But I can't tell if it can become a vehicle for malicious queries, since I intend to use Doctrine's DBAL. I'll take tips on that too (although it seems like a problem regardless for string IDs).
"However, I am new to all of this and having trouble finding what is 'standard'. Is this a reasonable solution?"
TL;DR: yes, it's fine.
You would probably see an identifier like that described using a level 4 URI Template, with your list of identifiers encoded via a path segment expansion.
Your example template might look something like:
/devices/company{/c_id}/routers{/r_ids}/devices
And you would need to communicate to the template consumer that c_id is a company id, and r_ids is a list of router identifiers, or whatever.
You've seen simplified versions of this on the web: URI templates are generalizations of web forms that read information from input controls and encode the inputs into the query string.
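To make the template above concrete, here is a minimal Python sketch of path segment expansion for a list value (RFC 6570 without the explode modifier, so the list becomes one comma-separated segment), plus a server-side parser that accepts only integer ids. The function names are made up for the example, and the integer-only rule is an assumption; string ids would need their own validation.

```python
from urllib.parse import quote, unquote


def expand_path_segment(value):
    """Expand one {/var} expression; a list becomes a single comma-separated segment."""
    items = value if isinstance(value, (list, tuple)) else [value]
    if not items:
        return ""  # an empty list contributes nothing to the URI
    return "/" + ",".join(quote(str(item), safe="") for item in items)


def devices_uri(c_id, r_ids):
    # Mirrors the template /devices/company{/c_id}/routers{/r_ids}/devices
    return ("/devices/company" + expand_path_segment(c_id)
            + "/routers" + expand_path_segment(r_ids) + "/devices")


def parse_router_ids(segment):
    """Server side: decode the segment into integers and reject anything else."""
    ids = []
    for part in segment.split(","):
        part = unquote(part)
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"invalid router id: {part!r}")
        ids.append(int(part))
    return ids


print(devices_uri(7, [1, 2, 3]))
print(parse_router_ids("1,2,3"))
```

Because the parser yields plain integers or raises, the resulting list can be handed to the data layer as bound parameters rather than spliced into SQL text.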

How should I store localized versions of user-entered data in my database?

I am working for a client on a web app that requires localization in 3 languages (English and 2 others). I understand how to use resources in an ASP.NET application to display localized versions of static data. However, I am not sure how to approach the issue of localized user-entered data. For example, an administrator may want to add some new metadata to the application (e.g. a new product category). This will eventually need to be translated into all 3 languages, but it will initially be entered in whatever language the administrator knows. Since this kind of data is not static, we store it in the database. Should we add a culture code to the primary key to differentiate different localized versions of the same data? Is there a "best practice" or pattern I'm not aware of for this kind of problem?
Have a child table for your entity, with a composite PK of MainItemID and LanguageCode (EN, DE, FR etc.). This child table stores your language-specific text.
If you always have English, or it is a fallback then you could have the child table for DE, FR etc and the main table for English. A LEFT JOIN and ISNULL will take care of this.
Either way is OK depending on your exact needs, which I suspect call for the first one. Of course, you'd need to ensure you have at least one child row on data entry of, say, a new product category.
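The fallback variant (English in the main table, other languages in the child table) can be sketched as follows in Python with an in-memory SQLite database. The table names, column names, and sample rows are invented for the example, and COALESCE is used here where T-SQL code might use ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL               -- English text lives in the main table
    );
    CREATE TABLE category_text (
        main_item_id  INTEGER NOT NULL REFERENCES category(id),
        language_code TEXT NOT NULL,     -- 'DE', 'FR', ...
        name          TEXT NOT NULL,
        PRIMARY KEY (main_item_id, language_code)
    );
    INSERT INTO category VALUES (1, 'Garden'), (2, 'Kitchen');
    INSERT INTO category_text VALUES (1, 'DE', 'Garten');  -- category 2 is not translated yet
""")


def category_names(conn, language_code):
    """Return (id, name) pairs, falling back to English when no translation exists."""
    rows = conn.execute(
        """
        SELECT c.id, COALESCE(t.name, c.name)
        FROM category c
        LEFT JOIN category_text t
               ON t.main_item_id = c.id AND t.language_code = ?
        ORDER BY c.id
        """,
        (language_code,),
    )
    return rows.fetchall()


print(category_names(conn, "DE"))
```

The language filter sits in the join condition rather than the WHERE clause, so untranslated rows survive the LEFT JOIN and pick up the English text.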
I would suggest you make a table to track the Language and then use the languageID as a foreign key in the other table instead of language code.
Language(LanguageID, Name)
And then in the other tables use that LanguageID as a foreign key.
e.g. you are storing localized text in the table
LocalizedTextTable(ID,text,LanguageID)
My solution was to create a string column which holds encoded data for all supported languages. Special application logic is required to insert and extract the data.
A specialized text editor that supports multilingual data helped a lot too.

Entity Attribute Value (EAV) vs. XML Column for New Product Attributes

I have an existing, mature schema to which we need to add some new Product attributes. For example, we have Products.Flavor, and now need to add new attributes such as Weight, Fragrance, etc. Rather than continue to widen the Products table, I am considering a couple of other options. First is a new Attributes table, which will effectively be a property bag for arbitrary attributes, and a ProductsAttributes table to store the mappings (and values) for a particular product's attributes. This is the Entity-Attribute-Value (EAV) pattern, as I've come to understand it. The other option is to add a new column to the Products table called Attributes, which is of type XML. Here, we can arbitrarily add attributes to any product instance without adding new tables.
What are the pros/cons to each approach? I'm using SQL Server 2008 and ASP.NET 4.0.
This is (imho) one of the classic database design issues. Call it "attribute creep", perhaps, as once you start, there will always be another attribute or property to add. The key decision is: do you store the data within the database using the basic tools provided by the database (tables and columns) to structure and format the data, or do you store the data in some other fashion (XML and name/value pairs being the most common alternatives)? Simply put, if you store the data in a form other than that supported by the DBMS, then you lose the power of the DBMS to manage, maintain, and work with that data. This is not much of a problem if you only need to store it as "blob data" (dump it all in, pump it all out), but once you start having to seek, sort, or filter by this data, it can get very ugly very fast.
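To give a feel for how filtering looks under the EAV layout from the question, here is a minimal Python sketch with an in-memory SQLite database. The table and column names follow the question loosely, and the sample rows and the query are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE attributes (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE products_attributes (
        product_id   INTEGER NOT NULL REFERENCES products(id),
        attribute_id INTEGER NOT NULL REFERENCES attributes(id),
        value        TEXT NOT NULL,      -- every value is stored as text
        PRIMARY KEY (product_id, attribute_id)
    );
    INSERT INTO products VALUES (1, 'Soap'), (2, 'Candle'), (3, 'Shampoo');
    INSERT INTO attributes VALUES (1, 'Weight'), (2, 'Fragrance');
    INSERT INTO products_attributes VALUES
        (1, 1, '100'), (1, 2, 'Lavender'),
        (2, 1, '250'), (2, 2, 'Lavender'),
        (3, 1, '300');
""")


def lavender_products_lighter_than(conn, max_weight):
    """Filter on two attributes: each one costs two extra joins and the weight needs a cast."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM products p
        JOIN products_attributes w ON w.product_id = p.id
        JOIN attributes wa ON wa.id = w.attribute_id AND wa.name = 'Weight'
        JOIN products_attributes f ON f.product_id = p.id
        JOIN attributes fa ON fa.id = f.attribute_id AND fa.name = 'Fragrance'
        WHERE f.value = 'Lavender'
          AND CAST(w.value AS INTEGER) < ?
        ORDER BY p.name
        """,
        (max_weight,),
    )
    return [name for (name,) in rows]


print(lavender_products_lighter_than(conn, 200))
```

With plain Weight and Fragrance columns on the products table, the same filter would be a single-table WHERE clause with typed comparisons.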
With that said, I do have strong opinions on name/value pairs and XML, but alas, none are positive. If you do have to store your data this way, and yes it can be an entirely valid business/design decision, then I would recommend looking long and hard at how the data you need to store in the database will be used and accessed in the future. Weigh the pros and cons of each methodology in light of how it will be used, and pick the one that's easiest to manage and maintain. (Don't pick the one that's easiest to implement; you'll be supporting it for a lot longer than you'll be writing it.)
(It's long, but the "RLH" essay is a classic example of name/value pairs run amok.)
(Oh, and if you're using it, look into SQL Server 2008's "Sparse Columns" option. Doesn't sound like what you need, but you never know.)

Which pattern most closely matches scenario detailed and is it good practice?

I have seen a particular pattern a few times over the last few years. Please let me describe it.
In the UI, each new record (e.g., a new customer's details) is stored on the form without saving to the database. This has clearly been done so as not to clutter the database or cause unnecessary database hits.
While in the UI state, these objects are identified using a Guid. When these are saved to the database, their associated Guids are not stored. Instead, they are assigned a database Int as their primary key.
The form can cope with a mixture of items retrieved from the database (using Int) as well as those that have not yet been committed (using Guid).
When inspecting the form (using Firebug) to see which key was used, we found a two part delimited combined key had been used. The first part is a guid (an empty guid if drawn from the database) and the second part is the integer (zero is stored if it is not drawn from the database). As one part of the combined key will always uniquely identify a record, it works rather well.
Is this good practice or not? Can anyone tell me the pattern name or suggest one if it is not already named?
There are a couple patterns at play here.
Identity Field Pattern
Defined in P of EAA as "Saves a database ID field in an object to maintain identity between an in-memory object and a database row." This part is obvious.
Transaction Script and Metadata Mapping
In general, the ASP.NET DataBound controls use something like a Transaction Script pattern in conjunction with a Metadata Mapping pattern. Fowler defines Metadata Mapping as "holding details of object-relational mapping in metadata". If you have ever written a data source control, the Metadata Mapping aspect of this pattern seems obvious.
The Transaction Script pattern "organizes business logic by procedures where each procedure handles a single request from the presentation." In order to encapsulate the logic of maintaining both presentation state and data-state it is necessary for the intermediary object to indicate:
If a database record exists
How to identify the backend data record, to populate the UI control
How to identify the data and the UI control if there is no current data record, so that presentation data can be updated from the backend datastore.
The presence of the new client data entry Guid and the data-record integer Id provide adequate information to determine all of this with only a single call to the database. This could be accomplished by just using integers (and perhaps giving a unique negative integer for each unpersisted UI data item), but it is probably more explicit to have two separate fields.
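As an illustration of the two-part key described in the question, here is a minimal Python sketch that parses such a key and reports whether the record has been persisted. The "|" delimiter and all names are assumptions; the question does not say which delimiter the form actually used.

```python
import uuid
from typing import NamedTuple

EMPTY_GUID = uuid.UUID(int=0)  # 00000000-0000-0000-0000-000000000000


class RecordKey(NamedTuple):
    client_guid: uuid.UUID   # empty GUID when the row came from the database
    db_id: int               # 0 when the row has not been saved yet

    @property
    def is_persisted(self):
        return self.db_id != 0


def parse_key(raw, delimiter="|"):
    """Split 'guid|int' (delimiter assumed) and check that exactly one half carries identity."""
    guid_part, id_part = raw.split(delimiter)
    key = RecordKey(uuid.UUID(guid_part), int(id_part))
    if (key.client_guid == EMPTY_GUID) == (key.db_id == 0):
        raise ValueError(f"ambiguous key: {raw!r}")
    return key


saved = parse_key("00000000-0000-0000-0000-000000000000|42")
unsaved = parse_key(f"{uuid.uuid4()}|0")
print(saved.is_persisted, unsaved.is_persisted)
```

The explicit check that one half is empty and the other is not is what lets a single value answer both "does a row exist?" and "which identifier should I use?".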
Good or Bad Practice?
It depends. ASP.NET is a pretty successful software project, and this pattern seems to work consistently. However, this type of ASP.NET web control has a very specific scope of application - to encapsulate interaction between a UI and a database about data objects with simple mappings. The concerns do seem a little blurred, but for many applicable scenarios this will still be acceptable. The pattern is valid wherever a Row Data Gateway would be acceptable. If there is more than one database row affected by a web control, then this approach will not be functional. In these more complex cases, either an Active Record implementation or the combination of a Domain Model and a Repository implementation would be better suited.
Whether a pattern is good or bad practice really depends on the scenario in which it is being applied. It seems like people tend to advocate more complex design structures, because they can be applied to more scenarios without failing. However, in a very simple application where the mappings between data records and the UI are direct, this pattern is very useful because it creates the intended result while minimizing the amount of performance and development overhead.
I don't think there is a specific pattern for that.
Is it good practice? I don't think so. First, it's not very object oriented. How about:
using System;
interface ICommittable
{
    /// <summary>
    /// Gets or sets a value indicating whether the entity was already committed to the database.
    /// </summary>
    bool IsCommitted { get; set; }
    /// <summary>
    /// Gets or sets the ID of the entity, used either in database or generated by UI or an underlying BL.
    /// </summary>
    Guid Id { get; set; }
}
Instead, what they do is to mix three separate data entries in one in a non obvious way:
The ID
Another ID (why?)
A fact that the entity was committed or not.
Especially, having two separate IDs is extremely confusing and will require not only a good documentation, but some time for a new developer to understand what's happening here.
If the purpose was to create new entities without querying the database for a new ID, they could use GUIDs everywhere: when a new entity is created, you generate its ID with Guid.NewGuid(); then, if needed, you commit everything, with this GUID being the identifier in the database too (there is little chance of a collision between already saved GUIDs and a new one, so I wouldn't worry about that).
Much simpler, isn't it?
It's also not easy to do a few things. For example, how do you compare two entities? Remember that:
Two committed entities which have different GUIDs are not equal,
Two not committed entities which have different IDs are not equal,
A committed entity may be equal to a non committed entity, even if their GUIDs and their IDs will be different.
To conclude, it seems like a lack of refactoring. Probably they were modifying a project where entities were already identified in the database by their id (int) unique key, so instead of refactoring this, they just added GUIDs, thus making the overall thing:
More difficult to understand,
Very difficult to work with and to modify in future.
If I'm not wrong it's the repository pattern: http://martinfowler.com/eaaCatalog/repository.html
This is well described in the Evans Domain Driven Design book and has proven to work well under specific circumstances.

Store in DB or not to store?

There are a few string lists in my web application that I don't know whether to store in the DB or just in a class.
E.g. I have 7 major browsers with which users enter the site. I want to save these stats, so I need to create a browser column in the UserLogin database. I don't want to waste space and resources by saving the full browser name in each login row. So I either need to save a browserID field and hook it up to a Browsers table that stores the names, following DB normalization rules, or have a sort of Dataholder abstract class with a list of browsers from which I can retrieve a browser name by its ID...
The question is: what should I do? These few data lists contain no more than 200 items each, so I think it makes sense to have them as an abstract class, but then again I don't know whether MS SQL will handle multiple joins well. Think of the case where I have a user with country, IP, language, browser and a few more stats.
thanks
I have been on both sides of the fence about this.
My rule of thumb is:
If one of these lists changes, will I have to do changes to the code, too?
(e.g.: in your case, if someone writes "yet another browser" tomorrow, will I need to write code that caters for it?)
If the answer is "most probably yes" or "definitely" you can leave it inside code.
In all other cases (even just a "maybe, 50%-50%") you'd better put it in the DB, or at the very least a property file.
And please consider this, too: if you expect to have to provide statistics based on this data (e.g.: "how many users use Explorer") you'd better put it in the DB anyway: it becomes part of your domain data and therefore it must be there.
About the "domain data" part.
The information stored in your DB is the "domain data" of your application. It is, in a sense, a (hopefully consistent) representation of what your application is about - it represents the "known universe" for your application.
If you agree to this definition, then you must also accept that it does not make sense to have 99.9% of your "reality" in the DB, and 0.1% outside of it - if nothing else, it makes some operations cumbersome (if you only store the smallint you can't create meaningful reports without either post-processing them using the class to decode "1" into "Firefox" or providing some other key for the end-user).
It also makes it impossible for you to leverage some inherent DB techniques like foreign keys (if you just use a smallint without correlating it to any other table, who guarantees that "10" is an acceptable value in your domain?).
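The two points above (readable reports and foreign-key validation) can be sketched in Python with an in-memory SQLite database. The table names, column names, and sample logins are assumptions for the example; SQLite only enforces foreign keys once the pragma is switched on, which is specific to SQLite rather than MS SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE browsers (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE user_logins (
        id         INTEGER PRIMARY KEY,
        user_name  TEXT NOT NULL,
        browser_id INTEGER NOT NULL REFERENCES browsers(id)
    );
    INSERT INTO browsers VALUES (1, 'Firefox'), (2, 'Explorer');
    INSERT INTO user_logins (user_name, browser_id) VALUES
        ('ann', 1), ('bob', 2), ('cho', 1);
""")


def logins_per_browser(conn):
    """Report with browser names, with no decoding step in application code."""
    rows = conn.execute("""
        SELECT b.name, COUNT(*)
        FROM user_logins l
        JOIN browsers b ON b.id = l.browser_id
        GROUP BY b.name
        ORDER BY b.name
    """)
    return rows.fetchall()


def try_login(conn, user_name, browser_id):
    """Insert a login row; return False if the browser id is not a known browser."""
    try:
        conn.execute(
            "INSERT INTO user_logins (user_name, browser_id) VALUES (?, ?)",
            (user_name, browser_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(logins_per_browser(conn))
print(try_login(conn, "dan", 10))
```

Here the database itself rejects browser id 10, which is the guarantee a bare smallint column cannot give.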
MS SQL handles multiple joins really well; it's up to you where you want to store the data. You can also consider XML as another option. I would consider the database or XML; it is easier to change the values there than if the values are in code (you have to recompile and redeploy to change them in production).
HTH.
