Formulas to generate a unique id? - guid

I would like to get a few ideas on generating unique IDs without using a GUID. Preferably, I would like the unique value to be of type int32.
I'm looking for something that can be used as a database primary key and is also URL friendly.
Can these be considered unique?
1) (int)DateTime.Now.Ticks
2) (int)DateTime.Now * RandomNumber
Any other ideas?
Thanks
EDIT: Well, I am trying to practise Domain Driven Design and all my entities need to have an ID upon creation to be valid. I could in theory call into the DB to get an auto-incremented number, but I would rather steer clear of this, as it lets DB-related concerns leak into the Domain.

It depends on how unique you need it to be and how many items you need to give IDs to. Your best bet may be assigning them sequentially; if you try to get fancy you'll likely run into the Birthday Paradox (collisions are more likely than you might expect) or (as in your case 1) above) be forced to limit the rate at which you can issue them.
Your 1) above is a little better than 2) for most cases; it's rate limited (you can't issue more than 1 ID per tick) but not susceptible to the Birthday Paradox. Your 2) is just throwing bits away. It might be slightly better to XOR with the random number, but in any case I don't think the random number is buying you anything; it just hides the problem and makes it harder to fix.
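To put a number on the Birthday Paradox remark, here is a rough sketch using the standard approximation p ≈ 1 - exp(-n(n-1) / (2·2^bits)) for n uniformly random IDs. The class and method names are made up for illustration. By this estimate, roughly 77,000 random int32 values already give about a 50% chance of at least one collision.

```csharp
using System;

public static class BirthdayEstimate
{
    // Approximate probability that at least two of `count` uniformly random
    // IDs drawn from a space of 2^bits values are equal.
    public static double CollisionProbability(double count, int bits)
    {
        double space = Math.Pow(2, bits);
        return 1.0 - Math.Exp(-count * (count - 1) / (2.0 * space));
    }

    public static void Main()
    {
        foreach (double n in new double[] { 1000, 10000, 77163, 1000000 })
        {
            Console.WriteLine("{0,10:N0} random int32 IDs -> p(collision) ~ {1:P2}",
                n, CollisionProbability(n, 32));
        }
    }
}
```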

Are these considered Globally Unique?
1) (int)DateTime.Now.Ticks
2) (int)DateTime.Now * RandomNumber
Neither option is globally unique.
Option 1 - This is only unique if you can guarantee no more than one ID is generated per tick. From your description, it does not sound like this would work.
Option 2 - Random numbers are pseudo random, but not guaranteed to be unique. With that already in mind, we can reduce the DateTime portion of this option to a similar problem to option 1.
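One more thing to watch with option 1: Ticks is a 64-bit count of 100 ns intervals, so the (int) cast keeps only the low 32 bits. A small sketch (the helper name is made up) shows that two instants 2^32 ticks apart, a little over 7 minutes, truncate to the same int:

```csharp
using System;

public static class TickTruncation
{
    // Mirrors (int)DateTime.Now.Ticks: keeps only the low 32 bits of the tick count.
    public static int Truncate(long ticks)
    {
        return unchecked((int)ticks);
    }

    public static void Main()
    {
        long ticks = DateTime.UtcNow.Ticks;
        long later = ticks + (1L << 32);   // exactly 2^32 ticks later

        Console.WriteLine(Truncate(ticks) == Truncate(later)); // True
        Console.WriteLine(TimeSpan.FromTicks(1L << 32));       // 00:07:09.4967296
    }
}
```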
If you want a globally unique ID that is an int32, one good way would be a synchronous service of some sort that returns sequential IDs. I guess it depends on what your definition of global means. If you had larger than an int32 to work with, and you mean global on a given network, then maybe you could use IP address with a sequence number appended, where the sequence number is generated synchronously across processes.
If you have other unique identifiers besides IP address, then that would obviously be a better choice for displaying as part of a URL.
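For the simplest case of the "synchronous service" idea, a single process handing out sequential int32 IDs, a minimal sketch could look like this. The class name is hypothetical, and a real service would also have to persist the last issued value and coordinate across processes, which this does not attempt.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class SequentialIdGenerator
{
    private int _last;

    // lastIssued: the highest ID already handed out (e.g. loaded at startup).
    public SequentialIdGenerator(int lastIssued)
    {
        _last = lastIssued;
    }

    // Thread-safe: each caller gets a distinct value.
    // Note: Interlocked.Increment wraps to int.MinValue after int.MaxValue.
    public int Next()
    {
        return Interlocked.Increment(ref _last);
    }

    public static void Main()
    {
        var generator = new SequentialIdGenerator(1000);
        var seen = new ConcurrentDictionary<int, bool>();
        Parallel.For(0, 10000, i => seen.TryAdd(generator.Next(), true));
        Console.WriteLine(seen.Count); // 10000 distinct IDs
    }
}
```

Because the generator is a plain object, a domain layer can take it as a dependency without knowing whether the numbers come from memory, a database sequence, or a remote service.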

You can use the RNGCryptoServiceProvider class, if you are using .NET
RNGCryptoServiceProvider Class
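A minimal sketch of that idea follows. It uses the base class RandomNumberGenerator, which RNGCryptoServiceProvider derives from, because newer .NET versions mark RNGCryptoServiceProvider itself as obsolete. The class and method names are made up. Keep in mind that random values are only unlikely to collide, not guaranteed unique.

```csharp
using System;
using System.Security.Cryptography;

public static class RandomIds
{
    // Returns a cryptographically random, non-negative 63-bit value.
    public static long NextInt64Id()
    {
        byte[] buffer = new byte[8];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(buffer);
        }
        // Clear the sign bit so the ID is never negative.
        return BitConverter.ToInt64(buffer, 0) & long.MaxValue;
    }

    public static void Main()
    {
        Console.WriteLine(NextInt64Id());
    }
}
```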

Related

Using auto-number database fields theory

I was on "another" programming forum, and we were talking about getting the next number from an auto-increment field BEFORE an insert takes place (there is a way using ADOX). This was in an MS-Access database btw.
Anyway, the discussion veered off into the area of whether you SHOULD use auto-increment fields for things like invoice numbers, PO numbers, bill of lading numbers, or anything else that needs a unique, incrementing number.
My thoughts were "why not"? Other people are arguing that an Invoice number (for instance) should be managed as a separate table and incremented with code, not using an auto-number field.
Can someone give me a good reason why that would be true?
I've used auto-number fields for years for just this type of thing and have never had problem one.
Your thoughts?
I have always avoided auto_increment numbers. As it turns out, for good reason. But originally my reason was that this was what the professor told us.
Facebook had a major breach a few years ago, simply because they were using AUTO_INCREMENT fields for user IDs. It doesn't take a calculator to figure out that if my ID is 10320 there is likely someone with ID 10319, etc.
When debugging (or proofing a design), having a key that is implicit in the data it represents makes things a heck of a lot easier.
Having keys that are implicit in the data reduces the potential for corrupted data (typos and user guessing).
Implicit keys require the developer to think about their data. I have never come across a table using implicit keys that was not normalized.
Other than the fact that deadlines often run tight, there is no great reason for auto increment.
Normally I use an autonumbering field for the ID so I don't need to think about how it's generated.
Recordset operations like insert and delete alter the sequence, skipping blocks of numbers.
When you manage CustomerID, Invoice Numbers and so on, it's better to have full control over them instead of leaving them under the system's control.
You can create a function that generates the desired numbers for you using a rule (e.g. the invoice number can be a function that includes the invoicing date).
With autonumbering you can't manage this.
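As an illustration of such a rule, here is a sketch with a made-up format (INV, the invoicing date, and a per-day sequence number). The names and the format are assumptions, and where the daily sequence comes from is left open.

```csharp
using System;
using System.Globalization;

public static class InvoiceNumbers
{
    // Hypothetical format: INV-yyyyMMdd-NNNN
    public static string Build(DateTime invoiceDate, int dailySequence)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "INV-{0:yyyyMMdd}-{1:D4}", invoiceDate, dailySequence);
    }

    public static void Main()
    {
        Console.WriteLine(Build(new DateTime(2024, 1, 15), 42)); // INV-20240115-0042
    }
}
```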
Beyond that there are NO FIXED RULES about what to do and what not to do.
It's just your practice and experience and the degree of freedom you want to have.
Bye:-)

Alternative to GUID with Scalability in mind and Friendly URL

I've decided to use a GUID as the primary key for many of my project's DB tables. I think it is a good practice, especially with scalability, backup and restore in mind. The problem is that I don't want to use the regular GUID, so I am looking for an alternative approach. I was actually interested to know what Pinterest is using as a primary key. When you look at the URL you see something like this:
http://pinterest.com/pin/275001120966638272/
I prefer the numerical representation, even if it is stored as a string. Is there any way to achieve this?
Furthermore, YouTube also uses a different kind of hashing technique which I can't figure out:
http://www.youtube.com/watch?v=kOXFLI6fd5A
This reminds me of a URL-shortener-like scheme.
I prefer the shortest one, but I know that it won't guarantee to be unique. I first thought about doing something like this:
DateTime dt1970 = new DateTime(1970, 1, 1);
DateTime current = DateTime.Now;
TimeSpan span = current - dt1970;
Console.WriteLine(span.TotalMilliseconds);
Result Example:
1350433430523.66
This prints the total milliseconds since 1970. But what happens if I have hundreds of thousands of writes per second?
I mainly prefer a solution that is not a BIGINT auto-increment, because it causes far fewer headaches when scaling the DB with 3rd party tools and makes backup/restore less problematic, since I can transfer data between servers and such if I want.
Another, more sophisticated approach is to tailor the solution to my application. In the database, the primary key will also contain the username (unique and can't be changed by the user), so I can combine the numerical value of the name with the millisecond number, which will give me a unique numerical string. Because a user doesn't insert data at such a high rate, the numerical ID is guaranteed to be unique. I can also remove the last 5 figures and still get a unique ID, because I assume that a user won't insert data at more than 1 row per second at most, but I probably won't do that (what do you think about this idea?)
So I ask for your help. My data is assumed to grow very big, 2TB a year with tens of thousands of new rows each second. I want URLs to look as "friendly" as possible, and prefer not to use the 'regular' GUID.
I am developing my app using ASP.NET 4.5 and MySQL
Thanks.
Collision Table
For YouTube-like IDs you can see this answer. They basically keep a database table of all the random video IDs they generate. When they need a new one, they check the table for collisions. If they find a collision, they generate a new one.
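A rough sketch of that generate-check-retry loop is below. The HashSet stands in for the database table (in a real system a unique constraint would do the check), and the class name, alphabet and ID length are assumptions, not YouTube's actual scheme.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public sealed class RandomIdAllocator
{
    // 64 URL-safe symbols, so masking a random byte with 63 picks one uniformly.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    // Stand-in for the table of IDs that have already been issued.
    private readonly HashSet<string> _issued = new HashSet<string>(StringComparer.Ordinal);

    public string Allocate(int length)
    {
        while (true)
        {
            string candidate = NewCandidate(length);
            // Add returns false on a collision; in that case just try again.
            if (_issued.Add(candidate))
            {
                return candidate;
            }
        }
    }

    private static string NewCandidate(int length)
    {
        byte[] bytes = new byte[length];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        char[] chars = new char[length];
        for (int i = 0; i < length; i++)
        {
            chars[i] = Alphabet[bytes[i] & 63];
        }
        return new string(chars);
    }

    public static void Main()
    {
        var allocator = new RandomIdAllocator();
        Console.WriteLine(allocator.Allocate(11));
    }
}
```

The retry loop stays cheap only while the ID space is much larger than the number of IDs issued; with very short IDs it would spin more and more as the table fills up.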
Long Primary Keys
You could use a long (e.g. 275001120966638272) as a primary key, however if you have multiple servers generating unique identifiers you'll have to partition them somehow or introduce a global lock, so each server doesn't generate the same unique identifier.
Twitter Snowflake ID's
One solution to the partitioning problem with long IDs is to use snowflake IDs. This is what Twitter uses to generate its IDs. All generated IDs are made up of the following parts:
Epoch timestamp in millisecond precision - 41 bits (gives us 69 years with a custom epoch)
Configured machine id - 10 bits (gives us up to 1024 machines)
Sequence number - 12 bits (A local counter per machine that rolls over every 4096)
One extra bit is reserved for future purposes. Since the ID's use timestamp as the first component, they are time sortable (which is very important for query performance).
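The bit layout described above can be sketched as plain shifting and masking. This only shows how the three fields are packed into and read back from a long; it is not Twitter's implementation, the names are made up, and managing the clock and the per-machine counter is left out.

```csharp
using System;

public static class SnowflakeLayout
{
    private const int TimestampBits = 41;
    private const int MachineBits = 10;
    private const int SequenceBits = 12;

    // Packs: [1 reserved bit][41-bit timestamp][10-bit machine id][12-bit sequence]
    public static long Compose(long millisSinceEpoch, long machineId, long sequence)
    {
        if (millisSinceEpoch < 0 || millisSinceEpoch >= (1L << TimestampBits))
            throw new ArgumentOutOfRangeException("millisSinceEpoch");
        if (machineId < 0 || machineId >= (1L << MachineBits))
            throw new ArgumentOutOfRangeException("machineId");
        if (sequence < 0 || sequence >= (1L << SequenceBits))
            throw new ArgumentOutOfRangeException("sequence");

        return (millisSinceEpoch << (MachineBits + SequenceBits))
             | (machineId << SequenceBits)
             | sequence;
    }

    public static long TimestampOf(long id) { return id >> (MachineBits + SequenceBits); }
    public static long MachineOf(long id) { return (id >> SequenceBits) & ((1L << MachineBits) - 1); }
    public static long SequenceOf(long id) { return id & ((1L << SequenceBits) - 1); }

    public static void Main()
    {
        long id = Compose(123456789L, 5, 17);
        Console.WriteLine("{0} -> t={1}, machine={2}, seq={3}",
            id, TimestampOf(id), MachineOf(id), SequenceOf(id));
    }
}
```

Because the timestamp occupies the most significant bits, IDs composed later compare greater, which is what makes them sortable by time.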
Base64 Encoded GUID's
You can use ShortGuid which encodes a GUID as a base64 string. The downside is that the output is a little ugly (e.g. 00amyWGct0y_ze4lIsj2Mw) and it's case sensitive which may not be good for URL's if you are lower-casing them.
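The underlying idea can be sketched in a few lines: base64-encode the 16 GUID bytes, swap the two characters that are awkward in URLs, and drop the padding, which leaves 22 characters. This is an illustrative sketch with made-up names, not the ShortGuid library's own code.

```csharp
using System;

public static class UrlSafeGuid
{
    // 16 bytes -> 24 base64 chars ending in "==" -> 22 chars after trimming.
    public static string Encode(Guid guid)
    {
        return Convert.ToBase64String(guid.ToByteArray())
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    public static Guid Decode(string text)
    {
        // Undo the character swaps and restore the two padding characters.
        string base64 = text.Replace('-', '+').Replace('_', '/') + "==";
        return new Guid(Convert.FromBase64String(base64));
    }

    public static void Main()
    {
        Guid original = Guid.NewGuid();
        string shortForm = Encode(original);
        Console.WriteLine("{0} -> {1} -> {2}", original, shortForm, Decode(shortForm));
    }
}
```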
Base32 Encoded GUID's
There is also base32 encoding of GUID's, which you can see this answer for. These are slightly longer than ShortGuid above (e.g. lt7fz44kdqlu5pt7wnyzmu4ov4) but the advantage is that they can be all lower case.
Multiple Factors
One alternative I have been thinking about is to introduce multiple factors, e.g. if Pinterest used a username and an ID for extra uniqueness:
https://pinterest.com/some-user/1
Here the ID 1 is unique to the user some-user and could be the number of posts they've made i.e. their next post would be 2. You could also use YouTube's approach with their video ID but specific to a user, this could lead to some ridiculously short URL's.
The first, simplest and most practical scheme for unique keys is an increasing sequence that follows the write order.
This represents the record number inside one database and provides unique numbering on a local scale, which is often all the application requires.
Next, a numerical approach based on a concatenation of time and counters is commonly used to ensure that concurrent transactions have unique IDs before writing.
When the system becomes highly threaded and distributed, as in highly concurrent situations, some constraints need to be relaxed before they become a penalty for scaling.
Universally unique identifier as primary key
Yes, it's a good practice.
A key reference system can provide independence from the underlying database system.
This provides one more level of integrity for the database when the evoked scenario occurs : backup, restore, scale, migrate and perhaps prove some authenticity.
This article Generating Globally Unique Identifiers for Use with MongoDB
by Alexander Marquardt (a Senior Consulting Engineer at MongoDB) covers the question in detail and gives some insight about database and informatics.
UUIDs are 128 bits long. They carry enough entropy to make labels unique in practice.
They can be represented by a 32-character hex string.
That is enough to represent about 3.4×10^38 distinct values.
Here are a few more questions that can occur when considering the overall principle and the analysis:
Should primary keys of the database and the Uniform Resource Locator be kept as two different entities?
Does this numbering destroy the sequentiality in the system?
Does providing a machine host number (h), followed by a user number (u) and time (t) along with a write index (i), guarantee that the PK huti stays unique?
Now considering the DB system:
primary keys should be preserved as numerical (be it hexa)
the database system relies on it and this implies performance considerations.
their size should be fixed,
the system must answer rapidly to tell if it's potentially dealing with a PK or not.
Hashids
YouTube-style short IDs can be reproduced with hashids.
It's a good choice:
the hashes are short and their length can be controlled,
the alphabet can be customized,
it is reversible (and as such interesting as a short reference to the primary keys),
it can use a salt,
it is designed to encode positive numbers.
However, once an ID is derived by hashing or truncation rather than by a reversible encoding, there is a probability that a collision happens. Collisions can be detected: the unique constraint is violated before the row is stored, and in that case the generation should be run again.
Consider the comment to this answer to figure out how much entropy it's possible to get from a shorten sha1+b64 recipe.
Anticipating the collision scenario calls for estimating the future size of the database, that is, the potential number of records. Recommended reading: Z. Bloom, How Long Does An ID Need To Be?
Milliseconds since epoch
Cited from the previous article, which provides most of the answer to the problem at hand with a nice synthetic style
It may not be necessary for you to encode every time since 1970
however. If you are only interested in keeping recent records close to
each other, you only need enough values to ensure that you don’t have
more values with the same prefix than your database can cache at once
What you could do is convert a GUID into a numeric-only string by converting all the letters in the GUID into numbers. Here is an example of what that would look like. It's a bit long, but if that is not a problem this could be one way of generating the keys.
1004234499987310234371029731000544986101469898102
Here is the code I used to generate the string above. I would probably recommend using a long primary key instead; although it can be a bit of a pain, it's probably a safer way to do it than the function below.
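string generateKey()
{
    // The "N" format yields the 32 hex digits without hyphens.
    string hex = Guid.NewGuid().ToString("N");
    var newKey = new System.Text.StringBuilder();
    foreach (char c in hex)
    {
        if (char.IsLetter(c))
        {
            // Replace each hex letter (a-f) with its character code (97-102).
            newKey.Append((int)c);
        }
        else
        {
            // Decimal digits are kept unchanged.
            newKey.Append(c);
        }
    }
    return newKey.ToString();
}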
Edit:
I did some testing taking only the first 20 digits, and out of 5000000 generated keys 4999978 were unique. When using the first 25 digits, 5000000 out of 5000000 were unique. I would recommend doing some more testing if you go with this method.
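If a purely numeric form of a GUID is the goal, another way to get there is to treat the 16 bytes as one unsigned 128-bit number and print it in decimal. This is a different technique from the letter-replacement function above. It is a sketch with made-up names, but unlike the function above it has a fixed length (39 digits) and can be converted back to the original GUID.

```csharp
using System;
using System.Numerics;

public static class NumericGuid
{
    // 2^128 - 1 has 39 decimal digits, so pad to a fixed width of 39.
    public static string ToDecimalString(Guid guid)
    {
        byte[] raw = guid.ToByteArray();             // 16 bytes, little-endian for BigInteger
        byte[] unsigned = new byte[raw.Length + 1];  // extra 0x00 keeps the value non-negative
        Array.Copy(raw, unsigned, raw.Length);
        return new BigInteger(unsigned).ToString().PadLeft(39, '0');
    }

    public static Guid FromDecimalString(string digits)
    {
        byte[] bytes = BigInteger.Parse(digits).ToByteArray();
        byte[] raw = new byte[16];                   // missing high bytes stay zero
        Array.Copy(bytes, raw, Math.Min(bytes.Length, 16));
        return new Guid(raw);
    }

    public static void Main()
    {
        Guid original = Guid.NewGuid();
        string digits = ToDecimalString(original);
        Console.WriteLine("{0} -> {1}", original, digits);
        Console.WriteLine(FromDecimalString(digits) == original); // True
    }
}
```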

ASP.NET: Generating Order ID?

I am getting ready to launch a website I designed in ASP.NET.
The problem is, I don't want my customers to have a super low order ID (example: #00000001).
How would I generate a unique (and random) order ID, so the customer would get an order number like K20434034?
Set your Identity Seed for your OrderId to a large number. Then when you present an order number to the user, you could have a constant that you prepend to the order id (like all orders start with K), or you could generate a random character string and store that on the order record as well.
There are multiple options from both the business tier and database:
Consider
a random number has a chance of collision
it is probably best not to expose an internal ID, especially a sequential one
a long value will annoy users if they ever have to type or speak it
Options
Generate a cryptographically random number (an Int64 generated with RNGCryptoServiceProvider has a very low chance of collision or predictability)
use an auto-incremented column which begins at some arbitrary number other than zero
use UNIQUEIDENTIFIER (or System.Guid) and base 62 encode the bytes
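For the last option, a possible sketch of base 62 encoding is shown below. The alphabet order and the names are assumptions; any fixed ordering of the 62 characters works as long as encoder and decoder agree. A 128-bit GUID comes out at no more than 22 characters.

```csharp
using System;
using System.Numerics;
using System.Text;

public static class Base62
{
    private const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Interprets the bytes as an unsigned little-endian number and writes it in base 62.
    public static string Encode(byte[] data)
    {
        byte[] unsigned = new byte[data.Length + 1]; // extra 0x00 keeps the value non-negative
        Array.Copy(data, unsigned, data.Length);
        BigInteger value = new BigInteger(unsigned);
        if (value.IsZero)
        {
            return "0";
        }

        var text = new StringBuilder();
        while (value > 0)
        {
            BigInteger remainder;
            value = BigInteger.DivRem(value, 62, out remainder);
            text.Insert(0, Alphabet[(int)remainder]);
        }
        return text.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Encode(Guid.NewGuid().ToByteArray()));
    }
}
```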
I suggest you just start the identity seed at some higher number if all you care about is that they don't think the number is low. The problem with random is that there is always the chance for collisions, and it gets more and more expensive to check for duplicates as the number of existing order IDs piles up.
Make the column's data type UNIQUEIDENTIFIER. This data type will give you an ID in the format shown below. Hope this fulfills the need.
B85E62C3-DC56-40C0-852A-49F759AC68FB.

Shorter GUID using CRC

I am making a website in ASP.NET and want to have a user profile which can be accessed via a URL with the user's ID at the end. A unique identifier is obviously a bad choice as it is long and (correct me if I am wrong) not really URL friendly.
I was wondering: if I produced a unique identifier on the ASP page and then hashed it using CRC (or something similar), would it still be as unique (or even unique at all) as just a GUID?
For example:
The GUID 6f1a7841-190b-4c7a-9f23-98709b6f8848 equals CRC E6DC2D44.
Thanks
A CRC of a GUID would not be unique, no. That would be some awesome compression algorithm otherwise, to be able to put everything into just 4 bytes.
Also, if your users are stored in the database with a GUID key, you'd have trouble finding the user that matches up to this particular CRC.
You'd be better off using a plain old integer to uniquely identify a user. If you want to have the URL unguessable, you can combine it with a second ticket (or token) parameter that's randomly generated. It doesn't have to be unique, because you use the integer ID for identifying the user. You can think of it more or less as a password.
Any calculated hash contains less information (bits) than the original data and can never be as unique. There are always collisions.
If the users have a username then why not use that? It should be unique (I would hope!) and would probably be short and URL friendly. It would also be easy for users to remember, too, and fits in the with the ASP.NET membership scheme (since usernames are the "primary key" in membership providers). I don't see any security issue as (presumably) only authenticated users would be able to access it, anyway?
No, it won't be as unique, because you're losing information from it. If you take a 32 character hex string and convert it to an 8 character hex string then, by definition, you're losing 75% of the data.
What you can do is use more characters to represent the data. A GUID's string form uses an alphabet of only 16 characters (base 16), so you could use a higher base (e.g. base 64), which lets you encode the same amount of information in fewer characters.
I don't see any problem with a normal GUID in an HTTP URL. If you want a shorter form of the GUID, use the following.
var gid = Guid.NewGuid().ToString("N");
This will give a GUID without any hyphen or special characters.
A GUID is globally unique, meaning that you won't run into clashes, hopefully ever. These are usually based on some sort of time-based calculation with randomness injected. If you want to shorten something using a hash, such as CRC, then uniqueness is not automatic, but as long as you manage uniqueness yourself (checking whether the hash is already assigned to another user and, if so, regenerating until you get a unique one) you could use almost anything.
This is the way a lot of url-shorteners work.
If you use a CRC of a UUID/GUID as ID you could also use a shorter ID in the first place.
The idea of an UUID/GUID as ID is IMO that you can create IDs on disconnected systems and should have no problem with duplicate IDs.
Besides, who is going to enter the URL for the profile page by hand anyway?
Also I see no problems with URL friendliness of an UUID/GUID - there are no chars which are not allowed by http.
How are users identified in the database (or any other place you use to store your data)?
If they are identified using this GUID, I hope you have a really good reason for it, because it makes searching for a specific ID more expensive (even when using a binary tree); more space is also needed to store these values.
If they are identified by a unique integer value, why not use this to call the user profile?
You can shorten a GUID to 20 printable ASCII characters, with it still being unique and without losing any information.
Take a look at this blog post by Jeff Atwood:
Equipping our ASCII Armor

Is it possible for an ASP.NET server to generate the same GUID to more than one user?

I have seen the GUID Collisions discussions but just wanted your thoughts on whether there could be a GUID collision if two clients accessed the same web page that generates the GUID at exactly the same time (probably down to the microsecond)?
It's theoretically possible, but highly unlikely.
No. If that happens rush out and buy a lottery ticket!
On a single server, no, it isn't possible. Version 4 GUIDs are made up (amongst other things) of a pseudo-random value (122 of the 128 bits), and as I understand it, the generator cycles through all values before repeating.
If creating on more than one server, then it is possible to have a guid clash, although that is highly highly unlikely.
Refer to RFC 4122, specifically section 4.1.5. Modern Windows uses v4 UUIDs, I believe.
From Wikipedia, the free encyclopedia
A Globally Unique Identifier or GUID (pronounced /ˈguːɪd/ or /ˈgwɪd/) is a special type of identifier used in software applications in order to provide a reference number which is unique in any context (hence, "Globally"), for example, in defining the internal reference for a type of access point in a software application, or for creating unique keys in a database. While each generated GUID is not guaranteed to be unique, the total number of unique keys (2^128 or 3.4×10^38) is so large that the probability of the same number being generated twice is very small. For example, consider the observable universe, which contains about 5×10^22 stars; every star could then have 6.8×10^15 universally unique GUIDs.
If you generate lots and lots of GUIDs, the likelihood of a collision rises because of the birthday paradox. Naive intuition says a GUID collision should be highly unlikely, but in practice it happens from time to time.
Sure, it's a waste of time to handle these collisions programmatically, but you should still write your code so that if one happens, it fails loudly, not quietly and undetected.

Resources