One requirement is that when persisting my C# objects to the database I must decide the database ID (surrogate primary key) in code.
Second requirement is that the database type for the key must be int or char(x)... so no uniqueidentifier or binary(16) or the like.
These are unchangeable requirements.
What would be the best way to go about handling this?
One idea is the base64 encoded GUIDs looking like "XSiZtdXcKU68QWe7N96Dig". These are easily created in code and are to me acceptable in URLs if necessary. But will it be too expensive regarding performance (indexing, size) having all primary and foreign keys be char(22)? Off hand I really like this idea.
Another idea would be to create a code version of a database sequence creating incremented integers for me. But I don't know if this is plausible and would need some guidance to secure the reliability. The sequencer must know har far it has come and what about threads that I don't control etc.
I imagine that no table involved will ever exceed 1.000.000 rows... will probably be far less.
You could have a table called "sequences". For each table there would be a row with a counter. Then, when you need another number, fetch it from the counter table and increment it. Put it in a transaction and you will have uniqueness.
However this will suffer in terms of performance, of course.
A simple incrementing int would be the easiest way to ensure uniqueness. This is what the database will do if you let it. If you set the table row to auto_increment, the database will do this for you automatically.
There are no security issues with this, but since you will be handling it yourself instead of letting the database engine take care of it, you will need to ensure that you don't generate the same id twice. This should be simple if you are on a single threaded system, but if your program is distributed you will need to put some effort into ensuring the uniqueness.
Seeing that you have an ASP.NET app, you could do the following (hoping and assuming all users must authenticate themselves before using your app!):
Assign each user a unique "UserID" in your database (can be INT, or CHAR)
Assign each user a "HighestSequentialID" (INT) in your database
When the user logs on, read those values from the database and store them in e.g. a custom principal, or in a cookie, or something else
whenever the user is about to insert a row, create a segmented ID: (UserID).(User's sequential number) and store it as "VARCHAR(20)" - e.g. your UserID is 15 and thus this user's entries would have unique IDs of "15.00001", "15.00002" and so on.
when the user logs off (or at any other time), update its new, highest used sequential ID in the database so that next time around, you'll know what this user has used last
Again - you'll have to do a lot more housekeeping work yourself, and it's always prone to a mishap (assigning a duplicate user ID, or misinterpreting the highest sequential number for that user).
I would strongly recommend trying to get these requirements changed - with these in place, all solutions will be sub-optimal at best, while using the database to handle this would be totally painless.
Marc
For a table below 1.000.000 rows, I would not be too terribly concerned about a char(22) Primary key. Of course the ideal solution for a situation like this would be for each object to have something unique about it that you could leverage for the key, even if it is a multi-part key. The next ideal solution would be to have the requirements changed :)
Related
I'm designing an application where my Order objects need to have a sequential and user-friendly Id field. I'm avoiding the HiLo algorithm because of the rather large gaps it produces (see here). Naturally, Guid values would make my corporate users go bananas. I'm also avoiding Oracle sequences because of the major disadvantages of it:
(From: NHibernate POID Generators revealed)
Post insert generators, as the name
suggest, assigns the id’s after the
entity is stored in the database. A
select statement is executed against
database. They have many drawbacks,
and in my opinion they must be used
only on brownfield projects. Those
generators are what WE DO NOT SUGGEST
as NH Team.
> Some of the drawbacks are the
following:
Unit Of Work is broken with the use of
those strategies. It doesn’t matter if
you’re using FlushMode.Commit, each
Save results in an insert statement
against DB. As a best practice, we
should defer insertions to the commit,
but using a post insert generator
makes it commit on save (which is what
UoW doesn’t do).
Those strategies
nullify batcher, you can’t take the
advantage of sending multiple queries
at once(as it must go to database at
the time of Save).
Any ideas/experience on implementing user-friendly IDs without major gaps between them?
Edit:
User friendly Id fields are ones my corporate users can memorize and even discuss and/or have phone conversations talking about a particular Order by its code, e.g. "I'm calling to know why the order #1625 was denied.".
The Id doesn't need to be strictly gapless, but I am worried that my users would get confused when they see gaps like 100, 201, 305. For my older projects, I currently implement NHibernate using Oracle sequences which occasionally lose a few sequences when exceptions are thrown, but yet keep a rather tidy order to them. The downside to them is how they break the Unit of Work which results in additional hits to the database for every Save command with or without the Session.Flush.
One option would be to keep a key-table that simply stores an incrementing value. This can introduce a few problems, namely possible locking issues as well as additional hits to the database.
Another option might be to refine what you mean by "User-friendly Id". This could consist of a combination of a Date/Time and a customer-specific sequence (or including the customer id as well). Also, your order id does not necessarily have to be the actual key on the table. There is nothing to say that you can't use a surrogate key with a separate "calculated" column which represents the order id.
The bottom-line is that it sounds like you want to use a surrogate key, but have the benefits of a natural key. It can be very difficult to have it both ways and a lot comes down to how you actually plan on using the data, how users interpret the data, and personal preference.
I use SQL Server and when I create a new table I make a specific field an auto increment
primary key. The problem is some people told me making the field an auto increment for the primary key means when deleting any record (they don't care about the auto increment field number) the field increases so at some point - if the type of my field is integer for example - the range of integer will be consumed totally and i will be in trouble. So they tell me not to use this feature any more.
The best solution is making this through the code by getting the max of my primary key then if the value does not exist the max will be 1 other wise max + 1.
Any suggestions about this problem? Can I use the auto increment feature?
I want also to know the cases which are not preferable to use auto increment ..and the alternatives...
note :: this question is general not specific to any DBMS , i wanna to know is this true also for DBMSs like ORACLE ,Mysql,INFORMIX,....
Thanks so much.
You should use identity (auto increment) columns. The bigint data type can store values up to 2^63-1 (9,223,372,036,854,775,807). I don't think your system is going to reach this value soon, even if you are inserting and deleting lots of records.
If you implement the method you propose properly, you will end up with a lot of locking problems. Otherwise, you will have to deal with exceptions thrown because of constraint violation (or even worse - non-unique values, if there is no primary key constraint).
An int datatype in SQL Server can hold values from -2,147,483,648 through 2,147,483,647.
If you seed your identity column with -2,147,483,648, e.g. FooId identity(-2,147,483,648, 1) then you have over 4 billion values to play with.
If you really think this is still not enough, you could use a bigint, which can hold values from -9,223,372,036,854,775,808 through 9,223,372,036,854,775,807, but this almost guaranteed to be overkill. Even with large data volumes and/or a large number of transactions, you will probably either run out of disk space or exhaust the lifetime of your application before you exhaust the identity values when using an int, and almost certainly when using a bigint.
To summarise, you should use an identity column and you should not care about gaps in the values since a) you have enough candidate values and b) it's an abstract number with no logical meaning.
If you were to implement the solution you suggest, with the code deriving the next identity column, you would have to consider concurrency, since you will have to synchronise access to the current maximum identity value between two competing transactions. Indeed, you may end up introducing a significant performance degradation, since you will have to first read the max value, calculate and then insert (not to mention the extra work involved in synchronising concurrent transactions). If, however, you use an identity column, concurrency will be handled for you by the database engine.
The solution they suggest can, and most likely will, create a concurrency problem and/or scalability problem. If two sessions use the Max technique you describe at the same time, they can come up with the same number and then both try to add it at the same time. This will create a constraint violation.
You can work around that problem by locking the table or catching exceptions, and keep re-inserting.. but that's a really bad way to do things. Locking will reduce performance and cause scalability issues (and if you're planning as many records as to be worried about overflowing an int then you will need scalability).
Identity fields are atomic operations. Two sessions cannot create the same identity field, so this problem is non-existent when using it.
If you're concerned that an identity field may overflow, then use a larger datatype, such as bigint. You would be hard pressed to generate enough records to overflow that.
Now, there are valid reasons NOT to use an identity field, but this is not one of them.
Continue to use the identity feature with PK in SQL Server. In mysql, there is also auto increment feature. Don't worry that you run out of integer range, you will run out of hard disk space before that happens.
I would advice AGAINST using the Identity/Auto-increment, because:
It's implementation is broken in SQL server 2005/2008. Read more
It doesn't work well if you are going to use an ORM to map your database to objects. Read more
I would advice you to use the Hi/Lo generator if you usually access your database through a program and don't depend on sending insert statements manually to the DB. You can read more about it in the second link.
This is just a general question irrespective of database architecture.
I am maintaining an ASP.NET web application. The structure is such that,
Say on 'Add a new employee' webform
The primary key (or the record id to
be saved with) is initially loaded on form
load event & displayed as a label
So when the form loads, the record id to save with is shown to the user
Positives:
End user already knows what the id/serial of the form is (even before he saves the form)
So on form save when he is directed
to gridview screen (with all entries)
he can search records easily
(although the most recent one is at
the top anyway)
Negatives:
If he does not save the form, say he
just cancels after loading the data entry form,
the id/key initially fetched is
wasted (in my case it is a sequence
field fetched on form load from database)
What do you guys do in these scenarios ? Which approach would you recommend for 'web applications'? And how to facilitate the user with a different approach ? Is our current approach recommended (To me,it wastes the ids/sequence from database)
I'd always recommend not presenting the identity field value for the record being created until the record has been created. The "create a temporary placeholder record first to obtain the identity field value ahead of time" approach can, as you mention, result in wasted IDs, unless you have a process in place to reclaim them.
You can always pop-up a message box when the user presses save that tells them the identity field value of the newly created record.
In this situation you could use a GUID created by the application itself. The database would then only have the PK set to be a Unique Identifier (GUID) and that it must not be null. In this situation you are not wasting any unique keys as each call to get a new GUID should be definition produce a (mathmatically) unique identifier. It is worth noting that if you use this method, it is best to make sure your PK is not set up to be clustered. The resulting index reorganisation upon insert could quickly result in an application that suffers performance hits.
For one: I wouldn't care so much about wasted id values. When you are in danger of running out of int32 values (and when has that happened to you last?), use int64. The user experience is way much more important than wasting a few id values.
Having said that, I would not want the primary key to be anything the user would want to type in. If you are having a primary key that users need to type in, chances are it then is (or will be requested to be) more than just an int32/64 value and carries (will carry) meaning in its composition and/or formatting. Primary keys should not have that. (Tons of reasons google for meaningless primary keys or other such terms).
If you need a meaningful key, make it a secondary index that is in no way related to the primary key. If a part of that is still a sequential number taken from some counter value in your database. Decide whether functionally it is a problem for gaps to appear in the sequence. (The tax people generally don't want gaps in invoice numbers). If functionally it is no problem, then certainly don't start worrying about it technically. If functionally it is a problem, then yes, you have no option but to wait for the save in order to show it to the user. But, please, when you do, don't do it in a popup. They are horribly intrusive as they have to be dismissed. Just put up an informative message on the screen where the user is sent after (s)he saves the new employee. Much like gmail is telling you about actions you have performed just above the list of messages.
Why does aspnet_users use guid for id rather than incrementing int?
Also is there any reason no to use this in other tables as the primary key? It feels a bit odd as I know most apps I've worked with in the past just use the normal int system.
I'm also about to start using this id to match against an extended details table for extra user prefs etc. I was also considering using a link table with a guid and an int in it, but decided that as I don't think I actually need to have user id as a public int.
Although I would like to have the int (feels easier to do a user lookup etc stackoverflow.com/users/12345/user-name ) , as I am just going to have the username, I don't think I need to carry this item around, and incure the extra complexity of lookups when I need to find a users int.
Thanks for any help with any of this.
It ensures uniqueness across disconnected systems. Any data store which may need to interface with another previously unconnected datastore can potentially encounter collisions - e.g. they both used int to identify users, now we have to go through a complex resolution process to choose new IDs for the conflicting ones and update all references accordingly.
The downside to using a standard uniqueidentifier in SQL (with newid()) as the primary key is that GUIDs are not sequential, so as new rows are created they are inserted at some arbitrary position in the physical database page, instead of appended to the end. This causes severe page fragmentation for systems that have any substantial insert rate. It can be corrected by using newsequentialid() instead. I discussed this in more detail here.
In general, its best practice to either use newsequentialid() for your GUID primary key, or just don't use GUIDs for the primary key. You can always have a secondary indexed column that stores a GUID, which you can use to keep your references unique.
GUIDs as a primary key are quite popular with certain groups of programmers who don't really (don't want to or don't know to) care about their underlying database.
GUIDs are cool because they're (almost) guaranteed to be unique, and you can create them "ahead of time" on the client app in .NET code.
Unfortunately, those folks aren't aware of the terrible downsides (horrendous index fragmentation and thus performance loss) of those choices when it comes to SQL Server. Lots of programmer really just don't care one bit..... and then blame SQL Server for being slow as a dog.
If you want to use GUIDs for your primary keys (and they do have some really good uses, as Rex M. pointed out - in replication scenarios mostly), then OK, but make sure to use a INT IDENTITY column as your clustering key in SQL Server to minimize index fragmentation and thus performance losses.
Kimberly Tripp, the "Queen of SQL Server Indexing", has a bunch of really good and insightful articles on the topic - see some of my favorites:
GUIDs as Primary and/or clustering key
The clustered index key debate continues....
The clustered index key debate....again!
Indexes in SQL Server 2005/2008 Best Practices
and basically anything she ever publishes on her blog is worth reading.
So in my simple learning website, I use the built in ASP.NET authentication system.
I am adding now a user table to save stuff like his zip, DOB etc. My question is:
In the new table, should the key be the user name (the string) or the user ID which is that GUID looking number they use in the asp_ tables.
If the best practice is to use that ugly guid, does anyone know how to get it? it seems to not be accessible as easily as the name (System.Web.HttpContext.Current.User.Identity.Name)
If you suggest I use neither (not the guid nor the userName fields provided by ASP.NET authentication) then how do I do it with ASP.NET authentication? One option I like is to use the email address of the user as login, but how to I make ASP.NET authentication system use an email address instead of a user name? (or there is nothing to do there, it is just me deciding I "know" userName is actually an email address?
Please note:
I am not asking on how get a GUID in .NET, I am just referring to the userID column in the asp_ tables as guid.
The user name is unique in ASP.NET authentication.
You should use some unique ID, either the GUID you mention or some other auto generated key. However, this number should never be visible to the user.
A huge benefit of this is that all your code can work on the user ID, but the user's name is not really tied to it. Then, the user can change their name (which I've found useful on sites). This is especially useful if you use email address as the user's login... which is very convenient for users (then they don't have to remember 20 IDs in case their common user ID is a popular one).
You should use the UserID.
It's the ProviderUserKey property of MembershipUser.
Guid UserID = new Guid(Membership.GetUser(User.Identity.Name).ProviderUserKey.ToString());
I would suggest using the username as the primary key in the table if the username is going to be unique, there are a few good reasons to do this:
The primary key will be a clustered index and thus search for a users details via their username will be very quick.
It will stop duplicate usernames from appearing
You don't have to worry about using two different peices of information (username or guid)
It will make writing code much easier because of not having to lookup two bits of information.
I would use a userid. If you want to use an user name, you are going to make the "change the username" feature very expensive.
I would say use the UserID so Usernames can still be changed without affecting the primary key. I would also set the username column to be unique to stop duplicate usernames.
If you'll mainly be searching on username rather than UserID then make Username a clustered index and set the Primary key to be non clustered. This will give you the fastest access when searching for usernames, if however you will be mainly searching for UserIds then leave this as the clustered index.
Edit : This will also fit better with the current ASP.Net membership tables as they also use the UserID as the primary key.
I agree with Palmsey,
Though there seems to be a little error in his code:
Guid UserID = new Guid(Membership.GetUser(User.Identity.Name)).ProviderUserKey.ToString());
should be
Guid UserID = new Guid(Membership.GetUser(User.Identity.Name).ProviderUserKey.ToString());
This is old but I just want people who find this to note a few things:
The aspnet membership database IS optimized when it comes to accessing user records. The clustered index seek (optimal) in sql server is used when a record is searched for using loweredusername and applicationid. This makes a lot of sense as we only have the supplied username to go on when the user first sends their credentials.
The guid userid will give a larger index size than an int but this is not really significant because we often only retrieve 1 record (user) at a time and in terms of fragmentation, the number of reads usually greately outweighs the number of writes and edits to a users table - people simply don't update that info all that often.
the regsql script that creates the aspnet membership tables can be edited so that instead of using NEWID as the default for userid, it can use NEWSEQUENTIALID() which delivers better performance (I have profiled this).
Profile. Someone creating a "new learning website" should not try to reinvent the wheel. One of the websites I have worked for used an out of the box version of the aspnet membership tables (excluding the horrible profile system) and the users table contained nearly 2 million user records. Even with such a high number of records, selects were still fast because, as I said to begin with, the database indexes focus on loweredusername+applicationid to peform clustered index seek for these records and generally speaking, if sql is doing a clustered index seek to find 1 record, you don't have any problems, even with huge numbers of records provided that you dont add columns to the tables and start pulling back too much data.
Worrying about a guid in this system, to me, based on actual performance and experience of the system, is premature optimization. If you have an int for your userid but the system performs sub-optimal queries because of your custom index design etc. the system won't scale well. The Microsoft guys did a generally good job with the aspnet membership db and there are many more productive things to focus on than changing userId to int.
I would use an auto incrementing number usually an int.
You want to keep the size of the key as small as possible. This keeps your index small and benefits any foreign keys as well. Additonally you are not tightly coupling the data design to external user data (this holds true for the aspnet GUID as well).
Generally GUIDs don't make good primary keys as they are large and inserts can happen at potentially any data page within the table rather than at the last data page. The main exception to this is if you are running mutilple replicated databases. GUIDs are very useful for keys in this scenario, but I am guessing you only have one database so this is not a problem.
If you're going to be using LinqToSql for development, I would recommend using an Int as a primary key. I've had many issues when I had relationships built off of non-Int fields, even when the nvarchar(x) field had constraints to make it a unique field.
I'm not sure if this is a known bug in LinqToSql or what, but I've had issues with it on a current project and I had to swap out PKs and FKs on several tables.
I agree with Mike Stone. I would also suggest only using a GUID in the event you are going to be tracking an enormous amount of data. Otherwise, a simple auto incrementing integer (Id) column will suffice.
If you do need the GUID, .NET is lovely enough that you can get one by a simple...
Dim guidProduct As Guid = Guid.NewGuid()
or
Guid guidProduct = Guid.NewGuid();
I'm agreeing with Mike Stone also. My company recently implemented a new user table for outside clients (as opposed to internal users who authenticate through LDAP). For the external users, we chose to store the GUID as the primary key, and store the username as varchar with unique constraints on the username field.
Also, if you are going to store the password field, I highly recommend storing the password as a salted, hashed binary in the database. This way, if someone were to hack your database, they would not have access to your customer's passwords.
I would use the guid in my code and as already mentioned an email address as username. It is, after all, already unique and memorable for the user. Maybe even ditch the guid (v. debateable).
Someone mentioned using a clustered index on the GUID if this was being used in your code. I would avoid this, especially if INSERTs are high; the index will be rebuilt every time you INSERT a record. Clustered indexes work well on auto increment IDs though because new records are appended only.