Using GUID as entity Id

I have long used the HiLo algorithm to generate primary key IDs. It is very useful: the IDs are readable and make it fast to locate an entity.
The same is not true of GUIDs, yet I see many ABP examples using them.
Any suggestions on choosing between GUID and HiLo? (Leaving aside the fact that a GUID is almost universally unique.)
I find GUIDs too cumbersome for debugging.
Would adding an extra column with an integer ID be too much?

Related

Is it best practice to reference the IdentityUser primary throughout the application?

I'm relatively new to .NET Core MVC / Web API and I'm currently implementing a multi user, multi role system.
The default IdentityUser that I extend from has a Guid primary key (which I know I can override).
In all other systems so far, I referenced the user's auto-incremented int PK to link data that belongs to a certain user.
The question now is whether this is Microsoft's intended use of the IdentityUser primary key, and whether it's OK to reference this (big) Guid in all tables. Alternatively, I could create a separate mapping table (IdentityUser.Id, auto-incremented UserId) and reference the int UserId instead. Or there may be another solution that I don't know yet.
I ask especially because I have read multiple times that this Guid is supposed to be kept secret, but when I start to reference the Guid everywhere the likelihood of leakage increases.
The database is where all your secrets are stored, so you don't need to worry if your GUIDs are used as foreign keys in other tables: they are still in your database. Maybe what you read about keeping it secret meant that you should not expose it to client-side apps, although I still disagree, because one advantage of a GUID is that even if a user obtains one GUID, he cannot guess the others (unlike auto-incremented integers).

Which pattern most closely matches scenario detailed and is it good practice?

I have seen a particular pattern a few times over the last few years. Please let me describe it.
In the UI, each new record (e.g., a new customer's details) is stored on the form without saving to the database. This has clearly been done so as not to clutter the database or cause unnecessary database hits.
While in the UI state, these objects are identified using a Guid. When they are saved to the database, their associated Guids are not stored. Instead, they are assigned a database Int as their primary key.
The form can cope with a mixture of items retrieved from the database (using Int) and items that have not yet been committed (using Guid).
When inspecting the form (using Firebug) to see which key was used, we found a two part delimited combined key had been used. The first part is a guid (an empty guid if drawn from the database) and the second part is the integer (zero is stored if it is not drawn from the database). As one part of the combined key will always uniquely identify a record, it works rather well.
Is this Good practice or not? Can anyone tell me the pattern name or suggest one if it is not already named?
There are a couple patterns at play here.
Identity Field Pattern
Defined in P of EAA as "Saves a database ID field in an object to maintain identity between an in-memory object and a database row." This part is obvious.
Transaction Script and Metadata Mapping
In general, the ASP.NET DataBound controls use something like a Transaction Script pattern in conjunction with a Metadata Mapping pattern. Fowler defines Metadata Mapping as "holding details of object-relational mapping in metadata". If you have ever written a data source control, the Metadata Mapping aspect of this pattern seems obvious.
The Transaction Script pattern "organizes business logic by procedures where each procedure handles a single request from the presentation." In order to encapsulate the logic of maintaining both presentation state and data-state it is necessary for the intermediary object to indicate:
If a database record exists
How to identify the backend data record, to populate the UI control
How to identify the data and the UI control if there is no current data record, so that presentation data can be updated from the backend datastore.
The presence of the new client data entry Guid and the data-record integer Id provide adequate information to determine all of this with only a single call to the database. This could be accomplished by just using integers (and perhaps giving a unique negative integer for each unpersisted UI data item), but it is probably more explicit to have two separate fields.
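To make the two-part key concrete, here is a minimal C# sketch of how such a key could be modelled and parsed. It is not the control's actual format: the "|" delimiter, the FormKey type and all member names are assumptions made for illustration.

```csharp
using System;

// Hypothetical model of the two-part form key described above.
// The "|" delimiter and every name here are assumptions, not the real control's format.
public readonly struct FormKey
{
    public Guid ClientId { get; }    // Guid.Empty when the row came from the database
    public int DatabaseId { get; }   // 0 when the row has not been saved yet

    public FormKey(Guid clientId, int databaseId)
    {
        // Exactly one half of the key may carry a value.
        if ((clientId == Guid.Empty) == (databaseId == 0))
            throw new ArgumentException("Exactly one of clientId and databaseId must be set.");
        ClientId = clientId;
        DatabaseId = databaseId;
    }

    // True when the record already exists in the database.
    public bool IsPersisted => DatabaseId != 0;

    public static FormKey ForNewRecord() => new FormKey(Guid.NewGuid(), 0);

    public static FormKey ForSavedRecord(int id) => new FormKey(Guid.Empty, id);

    // Parses "<guid>|<int>" as it might appear in the rendered form.
    public static FormKey Parse(string text)
    {
        string[] parts = text.Split('|');
        if (parts.Length != 2)
            throw new FormatException("Expected '<guid>|<int>'.");
        return new FormKey(Guid.Parse(parts[0]), int.Parse(parts[1]));
    }

    public override string ToString() => $"{ClientId}|{DatabaseId}";
}
```

With this shape, a single check of IsPersisted tells the intermediary whether to look the record up by integer or to treat it as UI-only data.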
Good or Bad Practice?
It depends. ASP.NET is a pretty successful software project, and this pattern seems to work consistently. However, this type of ASP.NET web control has a very specific scope of application - to encapsulate interaction between a UI and a database about data objects with simple mappings. The concerns do seem a little blurred, but for many applicable scenarios this will still be acceptable. The pattern is valid wherever a Row Data Gateway would be acceptable. If more than one database row is affected by a web control, then this approach will not work. In these more complex cases, either an Active Record implementation or the combination of a Domain Model and a Repository implementation would be better suited.
Whether a pattern is good or bad practice really depends on the scenario in which it is being applied. It seems like people tend to advocate more complex design structures, because they can be applied to more scenarios without failing. However, in a very simple application where the mappings between data records and the UI are direct, this pattern is very useful because it creates the intended result while minimizing the amount of performance and development overhead.
I don't think there is a specific pattern for that.
Is it good practice? I don't think so. First, it's not very object oriented. How about:
interface ICommittable
{
    /// <summary>
    /// Gets or sets a value indicating whether the entity was already committed to the database.
    /// </summary>
    bool IsCommitted { get; set; }

    /// <summary>
    /// Gets or sets the ID of the entity, used either in database or generated by UI or an underlying BL.
    /// </summary>
    Guid Id { get; set; }
}
Instead, what they do is to mix three separate data entries in one in a non obvious way:
The ID
Another ID (why?)
A fact that the entity was committed or not.
In particular, having two separate IDs is extremely confusing and will require not only good documentation, but also some time for a new developer to understand what's happening here.
If the purpose was to create new entities without querying the database for a new ID, they could use GUIDs everywhere: when a new entity is created, you generate its ID with Guid.NewGuid(); then, if needed, you commit everything, and this GUID is the identifier in the database too (the chance of a collision between already saved GUIDs and a new one is so small that I wouldn't worry about it).
Much simpler, isn't it?
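As a rough illustration of the GUID-everywhere idea, here is a minimal sketch. The Customer class and its members are hypothetical; the only real API used is Guid.NewGuid(). Equality depends on the single ID alone, so a committed entity and its uncommitted copy compare equal.

```csharp
using System;

// Hypothetical entity that keeps one Guid identity from creation to storage.
public sealed class Customer : IEquatable<Customer>
{
    public Guid Id { get; }
    public string Name { get; set; }
    public bool IsCommitted { get; private set; }

    // New entity: the ID is generated in memory, no database round trip needed.
    public Customer(string name) : this(Guid.NewGuid(), name, false) { }

    // Entity rebuilt from a stored row (or from any other known ID).
    public Customer(Guid id, string name, bool isCommitted)
    {
        Id = id;
        Name = name;
        IsCommitted = isCommitted;
    }

    // Call after the row has been written; the ID does not change.
    public void MarkCommitted() => IsCommitted = true;

    // Two entities are the same entity when their IDs match,
    // regardless of whether either one has been committed.
    public bool Equals(Customer other) => other != null && Id == other.Id;

    public override bool Equals(object obj) => Equals(obj as Customer);

    public override int GetHashCode() => Id.GetHashCode();
}
```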
It's also not easy to do a few things. For example, how do you compare two entities? Remember that:
Two committed entities which have different GUIDs are not equal,
Two not committed entities which have different IDs are not equal,
A committed entity may be equal to a non committed entity, even if their GUIDs and their IDs will be different.
To conclude, it seems like a lack of refactoring. Probably they were modifying a project where entities were already identified in the database by their id (int) unique key, so instead of refactoring this, they just added GUIDs, thus making the overall thing:
More difficult to understand,
Very difficult to work with and to modify in future.
If I'm not wrong it's the repository pattern: http://martinfowler.com/eaaCatalog/repository.html
This is well described in the Evans Domain Driven Design book and has proven to work well under specific circumstances.

What is the best way to implement multilingual domain objects using NHibernate?

What is the best way to design the Domain objects which can have multi-lingual fields. An example can be a Product class with Description being multi-lingual.
I have found few links but could not decide which one is the best way.
http://fabiomaulo.blogspot.com/2009/06/localized-property-with-nhibernate.html
(This stores all localised language data in one field. Can be a problem if we query from Sql)
http://ayende.com/Blog/archive/2006/12/26/LocalizingNHibernateContextualParameters.aspx
(This one has a warning at the beginning that it is a hack and no longer supported)
http://www.webdevbros.net/2009/06/24/create-a-multi-languaged-domain-model-with-nhibernate-and-c/
(This does not describe how multilingual data will be structured in the database.)
Anyone having experience with using NHibernate with multi-lingual data. Is there a better way?
The third option looks great. The hibernate mapping is given, but not the database schema - if that's what you are missing, then I'll sketch it out here:
dictionary
----------
ID: int - identity
name: nvarchar(255)
phrase
------
dictionary_id:int (fkey dictionary.ID)
culture_id:int (LCID)
phrase:nvarchar(255) - this is the default size - seems too small
According to this blog entry, 255 is the default string length for String values. To overcome the short string length on the phrase text, you can change the <element> tag to
<element column="phrase" type="String" length="4001"></element>
To use this in your domain model, you add a PhraseDictionary property to your entity wherever you want translatable text, e.g. the title property or description property.
I think the article describes a great approach, and it is the one that I would go for.
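For orientation, this is roughly how the object side of that schema could look. It is only a sketch: the article's real PhraseDictionary class may differ, the NHibernate mapping is omitted, and the member names, the fallback rule and the Product class are my assumptions.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory shape of the "phrase" rows for one dictionary entry.
// The class in the linked article may look different; the mapping is not shown.
public class PhraseDictionary
{
    // Key: culture LCID (the culture_id column); value: the translated phrase.
    private readonly Dictionary<int, string> phrases = new Dictionary<int, string>();

    // LCID used when no phrase exists for the requested culture.
    public int DefaultLcid { get; }

    public PhraseDictionary(int defaultLcid)
    {
        DefaultLcid = defaultLcid;
    }

    public void Set(int lcid, string phrase) => phrases[lcid] = phrase;

    // Returns the phrase for the LCID, falling back to the default culture,
    // or null when neither translation exists.
    public string Get(int lcid)
    {
        if (phrases.TryGetValue(lcid, out string phrase)) return phrase;
        return phrases.TryGetValue(DefaultLcid, out phrase) ? phrase : null;
    }
}

// Hypothetical entity with one translatable property.
public class Product
{
    public int Id { get; set; }

    // 1033 is the LCID for en-US, used here as the assumed default language.
    public PhraseDictionary Description { get; set; } = new PhraseDictionary(1033);
}
```

Falling back to a default language is a design choice of this sketch, not something the article prescribes.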
EDIT: In response to the comments, make the length less than 4001 if you know the absolute maximum size is less than that, as this will typically be faster. Also, NHibernate will lazily fetch the collection, but it may fetch all the items at once. You can profile to determine if this has any performance implications. (If you have only a handful of languages then I doubt you will see a difference.) If you have many languages (Say 50+) then it may be worthwhile creating custom properties to fetch the localized text. These will issue queries to fetch specifically the text required. More importantly, you may be able to fetch all the text for a given entity in one query, rather than each localized text property as a separate query.
Note that this extra effort is only needed if profiling gives you reason to be concerned about the performance. Chances are that the implementation in the article as is will function more than adequately.
I only have experience with Hibernate, but since NHibernate is so similar:
One option is to define a component type MultiLingualString with members for each language (this assumes the set of languages is known at coding time). This type is also a convenient place to put a getter for the string by language id.
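class MultiLingualString {
    String english;
    String chinese;
    String klingon;

    // Language is assumed to be an enum with one constant per supported language.
    String forLanguage(Language lang) {
        switch (lang) {
            case ENGLISH: return english;
            case CHINESE: return chinese;
            case KLINGON: return klingon;
            default: throw new IllegalArgumentException("Unsupported language: " + lang);
        }
    }
}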
This results in the strings for all languages being stored in separate columns in the database while the representation in the object world retains fine granularity.
The advantage is that no join is required to fetch the strings. On the other hand, the only way not to fetch a string with this approach is to use a projection, which is a severe limitation if the strings are large, numerous and rarely needed.
If you do this a lot, writing a UserType might be worth it.
From a strictly database oriented standpoint with SQL Server, you should have one table with all of the base data (record key, dates, numbers, etc.) and one table with all of the translatable string data. Let's call the two tables Base and Base_Description.
Base ensures that there is a single key for each record, the key might be a string or auto-generated id depending on your particular use case.
The Base_Description table is related to the Base table, but also contains a value to select the language that the data is in. In my projects we use the langid column from sys.syslanguages because we can set the language of the connection with SET LANGUAGE and then grab it with @@LANGID for most operations.
In our testing we found this to be significantly faster than having multiple fields for each language; it also allows you to add other languages more easily. We are also using SQL Server Full-Text indexing and it fully works with this method. You should index in the neutral language and then you can pick the language to search against at run time (also filtering against the LangID column in Base_Description).
Do your requirements include the domain objects actually having multiple-language properties in the same object? And, if so, is it unlimited translations stored in the object (in a collection, say - in which case I would say that it would need to be just like any master/detail or parent/child collection) or fixed translations, in which case the languages (and thus the mapping to results of a stored proc or whatever) have to be determined statically anyway?
In many internationalized applications I worked on, the data was in only one language - customer names, the product names (there was no point in mapping even identical products used in one country to products in another, they all had different distributors and different SKUs, and of course localized pricing). The interface was also only in one language (at a time). So all the domain objects only required one language at a time. Thus the language of the translation would be determined when the object was instantiated.
We had translation user interfaces which allowed users to update the translated texts, but these only required two languages at a time (local and the default). I can see this being closest to what you are talking about. I guess that you would have child collections for each translatable property with all the possible translations in the collection. This would probably be closest to the second solution in the third article you linked. Of course, at this point you would also need to see if you want eager/lazy loading etc.

Why does aspnet_users use guid for id rather than incrementing int? bonus points for help on extending user fields

Why does aspnet_users use guid for id rather than incrementing int?
Also, is there any reason not to use this in other tables as the primary key? It feels a bit odd, as most apps I've worked with in the past just use the normal int system.
I'm also about to start using this id to match against an extended details table for extra user prefs etc. I was also considering a link table with a guid and an int in it, but decided against it, as I don't think I actually need a public int user id.
Although I would like to have the int (it feels easier to do a user lookup, e.g. stackoverflow.com/users/12345/user-name), as I am just going to have the username, I don't think I need to carry this item around and incur the extra complexity of lookups when I need to find a user's int.
Thanks for any help with any of this.
It ensures uniqueness across disconnected systems. Any data store which may need to interface with another previously unconnected datastore can potentially encounter collisions - e.g. they both used int to identify users, now we have to go through a complex resolution process to choose new IDs for the conflicting ones and update all references accordingly.
The downside to using a standard uniqueidentifier in SQL (with newid()) as the primary key is that GUIDs are not sequential, so as new rows are created they are inserted at some arbitrary position in the physical database page, instead of appended to the end. This causes severe page fragmentation for systems that have any substantial insert rate. It can be corrected by using newsequentialid() instead. I discussed this in more detail here.
In general, it's best practice to either use newsequentialid() for your GUID primary key, or not use GUIDs for the primary key at all. You can always have a secondary indexed column that stores a GUID, which you can use to keep your references unique.
GUIDs as a primary key are quite popular with certain groups of programmers who don't really (don't want to or don't know to) care about their underlying database.
GUIDs are cool because they're (almost) guaranteed to be unique, and you can create them "ahead of time" on the client app in .NET code.
Unfortunately, those folks aren't aware of the terrible downsides (horrendous index fragmentation and thus performance loss) of those choices when it comes to SQL Server. Lots of programmers really just don't care one bit... and then blame SQL Server for being slow as a dog.
If you want to use GUIDs for your primary keys (and they do have some really good uses, as Rex M. pointed out - in replication scenarios mostly), then OK, but make sure to use an INT IDENTITY column as your clustering key in SQL Server to minimize index fragmentation and thus performance losses.
Kimberly Tripp, the "Queen of SQL Server Indexing", has a bunch of really good and insightful articles on the topic - see some of my favorites:
GUIDs as Primary and/or clustering key
The clustered index key debate continues....
The clustered index key debate....again!
Indexes in SQL Server 2005/2008 Best Practices
and basically anything she ever publishes on her blog is worth reading.

Should I use the username, or the user's ID to reference authenticated users in ASP.NET

So in my simple learning website, I use the built in ASP.NET authentication system.
I am now adding a user table to save things like the user's zip code, DOB, etc. My question is:
In the new table, should the key be the user name (the string) or the user ID, which is that GUID-looking number they use in the asp_ tables?
If the best practice is to use that ugly guid, does anyone know how to get it? It does not seem to be as easily accessible as the name (System.Web.HttpContext.Current.User.Identity.Name).
If you suggest I use neither (not the guid nor the userName fields provided by ASP.NET authentication), then how do I do it with ASP.NET authentication? One option I like is to use the email address of the user as the login, but how do I make the ASP.NET authentication system use an email address instead of a user name? (Or is there nothing to do there, and it is just me deciding that I "know" userName is actually an email address?)
Please note:
I am not asking on how get a GUID in .NET, I am just referring to the userID column in the asp_ tables as guid.
The user name is unique in ASP.NET authentication.
You should use some unique ID, either the GUID you mention or some other auto generated key. However, this number should never be visible to the user.
A huge benefit of this is that all your code can work on the user ID, but the user's name is not really tied to it. Then, the user can change their name (which I've found useful on sites). This is especially useful if you use email address as the user's login... which is very convenient for users (then they don't have to remember 20 IDs in case their common user ID is a popular one).
You should use the UserID.
It's the ProviderUserKey property of MembershipUser.
Guid UserID = new Guid(Membership.GetUser(User.Identity.Name).ProviderUserKey.ToString());
I would suggest using the username as the primary key in the table if the username is going to be unique, there are a few good reasons to do this:
The primary key will be a clustered index and thus search for a users details via their username will be very quick.
It will stop duplicate usernames from appearing
You don't have to worry about using two different pieces of information (username or guid)
It will make writing code much easier because of not having to lookup two bits of information.
I would use a userid. If you want to use a user name, you are going to make the "change the username" feature very expensive.
I would say use the UserID so Usernames can still be changed without affecting the primary key. I would also set the username column to be unique to stop duplicate usernames.
If you'll mainly be searching on username rather than UserID then make Username a clustered index and set the Primary key to be non clustered. This will give you the fastest access when searching for usernames, if however you will be mainly searching for UserIds then leave this as the clustered index.
Edit : This will also fit better with the current ASP.Net membership tables as they also use the UserID as the primary key.
I agree with Palmsey,
Though there seems to be a little error in his code:
Guid UserID = new Guid(Membership.GetUser(User.Identity.Name)).ProviderUserKey.ToString());
should be
Guid UserID = new Guid(Membership.GetUser(User.Identity.Name).ProviderUserKey.ToString());
This is old but I just want people who find this to note a few things:
The aspnet membership database IS optimized when it comes to accessing user records. The clustered index seek (optimal) in sql server is used when a record is searched for using loweredusername and applicationid. This makes a lot of sense as we only have the supplied username to go on when the user first sends their credentials.
The guid userid will give a larger index size than an int, but this is not really significant because we often only retrieve 1 record (user) at a time. In terms of fragmentation, the number of reads usually greatly outweighs the number of writes and edits to a users table - people simply don't update that info all that often.
The regsql script that creates the aspnet membership tables can be edited so that instead of using NEWID as the default for userid, it uses NEWSEQUENTIALID(), which delivers better performance (I have profiled this).
Profile. Someone creating a "new learning website" should not try to reinvent the wheel. One of the websites I have worked for used an out-of-the-box version of the aspnet membership tables (excluding the horrible profile system) and the users table contained nearly 2 million user records. Even with such a high number of records, selects were still fast because, as I said to begin with, the database indexes focus on loweredusername+applicationid to perform a clustered index seek for these records. Generally speaking, if SQL Server is doing a clustered index seek to find 1 record, you don't have any problems, even with huge numbers of records, provided that you don't add columns to the tables and start pulling back too much data.
Worrying about a guid in this system, to me, based on actual performance and experience of the system, is premature optimization. If you have an int for your userid but the system performs sub-optimal queries because of your custom index design etc. the system won't scale well. The Microsoft guys did a generally good job with the aspnet membership db and there are many more productive things to focus on than changing userId to int.
I would use an auto incrementing number usually an int.
You want to keep the size of the key as small as possible. This keeps your index small and benefits any foreign keys as well. Additionally, you are not tightly coupling the data design to external user data (this holds true for the aspnet GUID as well).
Generally GUIDs don't make good primary keys, as they are large and inserts can happen at potentially any data page within the table rather than at the last data page. The main exception to this is if you are running multiple replicated databases. GUIDs are very useful for keys in this scenario, but I am guessing you only have one database so this is not a problem.
If you're going to be using LinqToSql for development, I would recommend using an Int as a primary key. I've had many issues when I had relationships built off of non-Int fields, even when the nvarchar(x) field had constraints to make it a unique field.
I'm not sure if this is a known bug in LinqToSql or what, but I've had issues with it on a current project and I had to swap out PKs and FKs on several tables.
I agree with Mike Stone. I would also suggest only using a GUID in the event you are going to be tracking an enormous amount of data. Otherwise, a simple auto incrementing integer (Id) column will suffice.
If you do need the GUID, .NET is lovely enough that you can get one by a simple...
Dim guidProduct As Guid = Guid.NewGuid()
or
Guid guidProduct = Guid.NewGuid();
I'm agreeing with Mike Stone also. My company recently implemented a new user table for outside clients (as opposed to internal users who authenticate through LDAP). For the external users, we chose to store the GUID as the primary key, and store the username as varchar with unique constraints on the username field.
Also, if you are going to store the password field, I highly recommend storing the password as a salted, hashed binary in the database. This way, if someone were to hack your database, they would not have access to your customer's passwords.
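A minimal sketch of what "salted, hashed binary" could look like in C#. I am using PBKDF2 here as one common choice; the answer above does not name an algorithm. The PasswordHasher class, the sizes and the iteration count are illustrative assumptions, and the static Pbkdf2 and GetBytes calls need .NET 6 or later.

```csharp
using System;
using System.Security.Cryptography;

// Sketch of salted password hashing with PBKDF2 (requires .NET 6 or later).
// The sizes and iteration count are illustrative assumptions, not tuned values.
public static class PasswordHasher
{
    private const int SaltSize = 16;          // bytes of random salt per password
    private const int HashSize = 32;          // bytes of derived hash
    private const int Iterations = 100_000;   // PBKDF2 work factor

    // Returns the random salt and the derived hash; store both as binary columns.
    public static (byte[] Salt, byte[] Hash) HashPassword(string password)
    {
        byte[] salt = RandomNumberGenerator.GetBytes(SaltSize);
        byte[] hash = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, Iterations, HashAlgorithmName.SHA256, HashSize);
        return (salt, hash);
    }

    // Recomputes the hash with the stored salt and compares in constant time.
    public static bool Verify(string password, byte[] salt, byte[] expectedHash)
    {
        byte[] actual = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, Iterations, HashAlgorithmName.SHA256, expectedHash.Length);
        return CryptographicOperations.FixedTimeEquals(actual, expectedHash);
    }
}
```

The salt is not secret; it only has to be unique per user, so it can sit in the same row as the hash.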
I would use the guid in my code and, as already mentioned, an email address as the username. It is, after all, already unique and memorable for the user. Maybe even ditch the guid (very debatable).
Someone mentioned using a clustered index on the GUID if this was being used in your code. I would avoid this, especially if INSERTs are frequent: random GUIDs land in the middle of the index, which forces page splits and fragments the index as rows are inserted. Clustered indexes work well on auto-increment IDs, though, because new records are only appended.
