I maintain a WCF service. Since it's been decided to put a version of the service on a public server for demo/testing purposes, I need to add some kind of security controlling who can access its functions. My idea is to add a key parameter to each function that the client must supply in order to verify its access.
But since the software is licensed for a period of time, and it's installed locally on the customer's server when it's bought, I thought an elegant solution would be to embed the expiration date into the key itself, so I don't have to ship a license file or anything similar.
My idea is that, given a certain date (the expiration date), I could generate a short string (say 8 characters, letters and numbers) that appears random to the user and that he can't tamper into another valid key, but which I can decode to recover the date it was generated from.
I thought about encrypting a plain date, but the algorithms I know produce very user-unfriendly results.
I appreciate any suggestions, thank you very much!
You could try changing the date to a single number, such as the number of days since 4 July 2017, or some other semi-random starting date. If that is too transparent, then use some type of format-preserving encryption to encrypt the day count to the same number of digits using a standard key.
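A minimal sketch of that idea, assuming the key only needs to carry a day count (all names here are illustrative): the expiration date is turned into days since a custom epoch and written in base 36, then decoded the same way in reverse. On its own this is only obfuscation; a real key would additionally run the digits through format-preserving encryption or append a truncated keyed MAC so the customer can't forge a later date.

using System;

static class ExpiryKey
{
    // Any semi-random start date works, as long as encoder and decoder agree on it.
    static readonly DateTime Epoch = new DateTime(2017, 7, 4);
    const string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Expiration date -> short base-36 day count (assumes expiry is not before the epoch).
    public static string Encode(DateTime expiry)
    {
        int days = (int)(expiry.Date - Epoch).TotalDays;
        string key = "";
        do { key = Alphabet[days % 36] + key; days /= 36; } while (days > 0);
        return key.PadLeft(4, '0');
    }

    // Short key -> expiration date.
    public static DateTime Decode(string key)
    {
        int days = 0;
        foreach (char c in key) days = days * 36 + Alphabet.IndexOf(c);
        return Epoch.AddDays(days);
    }
}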
I've decided to use GUIDs as primary keys for many of my project's DB tables. I think it is good practice, especially with scalability, backup and restore in mind. The problem is that I don't want to use the regular GUID, and I'm searching for an alternative approach. I was actually interested to know what Pinterest is using as a primary key. When you look at the URL you see something like this:
http://pinterest.com/pin/275001120966638272/
I prefer the numerical representation, even if it is stored as a string. Is there any way to achieve this?
Furthermore, YouTube also uses a different kind of hashing technique which I can't figure out:
http://www.youtube.com/watch?v=kOXFLI6fd5A
This reminds me of a URL-shortener-like scheme.
I prefer the shortest one, but I know it isn't guaranteed to be unique. My first thought was to do something like this:
// Milliseconds elapsed since the Unix epoch
DateTime dt1970 = new DateTime(1970, 1, 1);
DateTime current = DateTime.Now;
TimeSpan span = current - dt1970;
Console.WriteLine(span.TotalMilliseconds);
Result Example:
1350433430523.66
This prints the total milliseconds since 1970. But what happens if I have hundreds of thousands of writes per second?
I mainly prefer a non-BIGINT-auto-increment solution because it causes far fewer headaches when scaling the DB using 3rd-party tools, and it makes backup/restore less problematic, since I can transfer data between servers and such if I want.
Another, more tailored approach is to fit the solution to my application. In the database, the primary key will also contain the username (which is unique and can't be changed by the user), so I can combine the numerical value of the name with the millisecond count, which gives me a unique numerical string. Because a single user doesn't insert data at such a high rate, that numerical ID is guaranteed to be unique. I could even remove the last 5 digits and still get a unique ID, because I assume a user won't insert data more than once per second at most, but I would probably not do that (what do you think about this idea?).
So I ask for your help. My data is expected to grow very big: 2 TB a year, with tens of thousands of new rows each second. I want URLs to look as "friendly" as possible, and I prefer not to use the 'regular' GUID.
I am developing my app using ASP.NET 4.5 and MySQL
Thanks.
Collision Table
For YouTube-like IDs you can see this answer. They are basically keeping a database table of all the random video IDs they generate. When they request a new one, they check the table for a collision. If they find one, they try to generate a new ID.
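A rough illustration of that collision-table approach, assuming a hypothetical existsInDatabase check against the ID table (the alphabet and length are just examples):

using System;
using System.Security.Cryptography;

static class RandomIdGenerator
{
    // 64-character URL-safe alphabet, similar to what YouTube-style IDs use.
    const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    public static string NextId(Func<string, bool> existsInDatabase)
    {
        using (var rng = RandomNumberGenerator.Create())
        {
            while (true)
            {
                var bytes = new byte[11];
                rng.GetBytes(bytes);

                var chars = new char[bytes.Length];
                for (int i = 0; i < bytes.Length; i++)
                    chars[i] = Alphabet[bytes[i] % Alphabet.Length];

                string id = new string(chars);
                if (!existsInDatabase(id))   // collision check against the ID table
                    return id;               // otherwise loop and try again
            }
        }
    }
}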
Long Primary Keys
You could use a long (e.g. 275001120966638272) as a primary key; however, if you have multiple servers generating unique identifiers, you'll have to partition them somehow or introduce a global lock, so that two servers don't generate the same identifier.
Twitter Snowflake IDs
One solution to the partitioning problem with long IDs is to use snowflake IDs. This is what Twitter uses to generate its IDs. Each generated ID is made up of the following parts:
Epoch timestamp in millisecond precision - 41 bits (gives us 69 years with a custom epoch)
Configured machine id - 10 bits (gives us up to 1024 machines)
Sequence number - 12 bits (A local counter per machine that rolls over every 4096)
One extra bit is reserved for future purposes. Since the IDs use the timestamp as their first component, they are time-sortable (which is very important for query performance).
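A minimal single-machine sketch of such a generator (not Twitter's actual code; the custom epoch and machine id here are assumptions):

using System;

// Sketch of a snowflake-style generator:
// 41-bit millisecond timestamp | 10-bit machine id | 12-bit sequence.
class SnowflakeIdGenerator
{
    const long CustomEpoch = 1288834974657L;  // any fixed epoch agreed on by all machines
    readonly long _machineId;                 // 0..1023, configured per machine
    readonly object _lock = new object();
    long _lastTimestamp = -1;
    long _sequence;

    public SnowflakeIdGenerator(long machineId)
    {
        _machineId = machineId & 0x3FF;
    }

    public long NextId()
    {
        lock (_lock)
        {
            long ts = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
            if (ts == _lastTimestamp)
            {
                _sequence = (_sequence + 1) & 0xFFF;   // rolls over every 4096
                if (_sequence == 0)                    // exhausted this millisecond: wait for the next one
                    while ((ts = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds()) <= _lastTimestamp) { }
            }
            else
            {
                _sequence = 0;
            }
            _lastTimestamp = ts;
            return ((ts - CustomEpoch) << 22) | (_machineId << 12) | _sequence;
        }
    }
}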
Base64 Encoded GUIDs
You can use ShortGuid, which encodes a GUID as a base64 string. The downside is that the output is a little ugly (e.g. 00amyWGct0y_ze4lIsj2Mw) and it's case-sensitive, which may not be good for URLs if you are lower-casing them.
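For illustration, this is roughly what such an encoding boils down to (a sketch, not the library's exact code):

// Base64-encode the 16 GUID bytes and make the result URL-safe (22 characters).
Guid guid = Guid.NewGuid();
string shortGuid = Convert.ToBase64String(guid.ToByteArray())
    .Replace("/", "_")
    .Replace("+", "-")
    .Substring(0, 22);               // drop the trailing "==" padding

// And back again:
Guid roundTripped = new Guid(Convert.FromBase64String(
    shortGuid.Replace("_", "/").Replace("-", "+") + "=="));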
Base32 Encoded GUIDs
There is also base32 encoding of GUIDs, which you can see this answer for. These are slightly longer than the ShortGuid above (e.g. lt7fz44kdqlu5pt7wnyzmu4ov4), but the advantage is that they can be all lower case.
Multiple Factors
One alternative I have been thinking about is to introduce multiple factors, e.g. if Pinterest used a username and an ID for extra uniqueness:
https://pinterest.com/some-user/1
Here the ID 1 is unique to the user some-user and could be the number of posts they've made, i.e. their next post would be 2. You could also use YouTube's approach with their video IDs but specific to a user; this could lead to some ridiculously short URLs.
The first, simplest and most practical scheme for unique keys is an increasing numbering sequence in write order. It represents the record number inside one database and provides unique numbering on a local scale: this is the (often met) application-level requirement.
Next, a numerical approach based on a concatenation of time and counters is commonly used to ensure that concurrent transactions in the same batch get unique ids before writing.
When the system becomes highly threaded and distributed, as in highly concurrent situations, some of these constraints need to be relaxed before they become a penalty for scaling.
Universally unique identifier as primary key
Yes, it's a good practice.
A key reference system can provide independence from the underlying database system.
This provides one more level of integrity for the database when the scenarios evoked above occur: backup, restore, scale, migrate and perhaps prove some authenticity.
The article Generating Globally Unique Identifiers for Use with MongoDB by Alexander Marquardt (a Senior Consulting Engineer at MongoDB) covers the question in detail and gives some insight into databases and information systems.
UUIDs are 128 bits long. They introduce an amount of entropy high enough to ensure the practical uniqueness of labels. They can be represented as 32-hex-character strings: enough to write several thousands of billions of billions of decimal numbers.
Here are a few more questions that can occur when considering the overall principle and the analysis:
should the primary keys of the database and the Uniform Resource Locators be kept as two different entities?
does this numbering destroy the sequentiality of the system?
does providing a machine host number (h), followed by a user number (u) and a time (t) along with a write index (i), guarantee that the PK huti stays unique?
Now considering the DB system:
primary keys should be kept numerical (even if written in hex): the database system relies on this, and it has performance implications.
their size should be fixed: the system must be able to answer rapidly whether it is potentially dealing with a PK or not.
Hashids
The hashing technique used by YouTube is hashids.
It's a good choice (a usage sketch follows this list):
the hashes are short and the length can be controlled,
the alphabet can be customized,
it is reversible (and as such interesting as a short reference to the primary keys),
it can use a salt,
it's designed to hash positive numbers.
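A usage sketch, assuming the Hashids.net package (the salt and values are placeholders):

// Requires the Hashids.net NuGet package.
using HashidsNet;

var hashids = new Hashids("my project salt", 8);   // salt + minimum hash length

string hash = hashids.Encode(347);        // short, alphanumeric, looks random
int[] numbers = hashids.Decode(hash);     // [347] -- reversible, unlike a cryptographic hash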
However, it is a hash, and as such the probability exists that a collision happens. Collisions can be detected because the unique constraint is violated before the value is stored; in such a case the generation should simply be run again.
Consider the comments on this answer to figure out how much entropy you can get from a shortened sha1+b64 recipe.
Anticipating the collision scenario calls for estimating the future size of the database, that is, the potential number of records. Recommended reading: Z. Bloom, How Long Does An ID Need To Be?
Milliseconds since epoch
Quoted from the previous article, which provides most of the answer to the problem at hand in a nicely concise style:
It may not be necessary for you to encode every time since 1970 however. If you are only interested in keeping recent records close to each other, you only need enough values to ensure that you don't have more values with the same prefix than your database can cache at once.
What you could do is make a GUID numeric-only by converting all the letters in the GUID into numbers. Here is an example of what that would look like. It's a bit long, but if that is not a problem, this could be one way of generating the keys.
1004234499987310234371029731000544986101469898102
Here is the code I used to generate the string above. I would probably recommend using a long primary key instead; although it can be a bit of a pain, it's probably a safer way to do it than the function below.
string generateKey()
{
    Guid guid = Guid.NewGuid();
    string newKey = "";

    // Walk the hex digits of the GUID (hyphens removed) and
    // replace every letter with its numeric character code.
    foreach (char c in guid.ToString().Replace("-", ""))
    {
        if (char.IsLetter(c))
            newKey += (int)c;   // e.g. 'a' becomes 97
        else
            newKey += c;        // digits are kept as-is
    }

    return newKey;
}
Edit:
I did some testing: taking only the first 20 digits, 4999978 out of 5000000 generated keys were unique. When using the first 25 digits, it was 5000000 out of 5000000. I would recommend doing some more testing if you go with this method.
We have an ASP.NET website and an SQL database hosted in the US. Whenever I use the function Now() in VB.NET or GETDATE() in SQL, I get the US's current time. The problem is, the client is in the Philippines, which is in the GMT+8 time zone. My question is: is there any way I can set the time zone of a specific database and website so that when I use those functions, I'll get the Philippines' current time? How do you deal with this? As much as possible, we don't want to do subtraction or addition on the result of those functions since, in the future, clients will be from other countries. It would give us headaches updating the code if we did that.
Thank you in advance!
Given that your clients may be in different time zones, you should store a timezone for the clients, that they (or you) can set as a preference for their account. Store all dates+times as UTC, and then convert to their timezone when displaying results in your interface.
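For example (a sketch; the time-zone id here is an assumption, in practice it would come from the stored preference):

// Store UTC...
DateTime utcNow = DateTime.UtcNow;

// ...and convert only when displaying. "Singapore Standard Time" is the Windows
// time-zone id for UTC+8, which covers the Philippines.
TimeZoneInfo clientZone = TimeZoneInfo.FindSystemTimeZoneById("Singapore Standard Time");
DateTime clientLocal = TimeZoneInfo.ConvertTimeFromUtc(utcNow, clientZone);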
This question has already been addressed to a great extent in the following question:
How to work with time zones in ASP.NET?
Follow-up:
Unfortunately, the SQL server date is a system-level setting, so it's not really something that can be manipulated on a per-session basis. It sounds like you will need to make some code changes, but you can isolate them.
Do you have a session-level variable which contains the client time zone offset? If not, create one.
Create a small date/time utility class.
In the utility class, provide three methods (a rough sketch follows this list) to:
(1) get the current date/time (offset to the client's time zone)
(2) pass in a database date/time to return the time offset for the client's TZ.
(3) pass in a time from the client to subtract out the client's TZ difference.
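A rough sketch of such a class, where ClientOffset would be loaded from the session-level variable above (all names here are illustrative):

using System;

static class ClientTime
{
    // Client's offset relative to the database/server time,
    // loaded from the session-level variable mentioned above.
    public static TimeSpan ClientOffset { get; set; }

    // (1) current date/time shifted into the client's time zone
    public static DateTime NowForClient()
    {
        return DateTime.Now.Add(ClientOffset);
    }

    // (2) a database date/time shifted into the client's time zone
    public static DateTime FromDatabase(DateTime dbValue)
    {
        return dbValue.Add(ClientOffset);
    }

    // (3) a client-entered date/time shifted back for storage
    public static DateTime ToDatabase(DateTime clientValue)
    {
        return clientValue.Subtract(ClientOffset);
    }
}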
You will have to make code changes, but you can probably use those utility functions to wrap inputs and outputs everywhere, centralizing the logic. Microsoft has a page about mis-steps to avoid when using the DateTime class and manipulating time zones:
http://msdn.microsoft.com/en-us/library/ms973825.aspx#datetime_topic1a
I am making a website in ASP.NET and want to have a user profile which can be accessed via a URL with the user's id at the end. A unique identifier (GUID) is obviously a bad choice, as it is long and (correct me if I am wrong) not really URL-friendly.
I was wondering: if I produced a unique identifier on the ASP page and then hashed it using CRC (or something similar), would it still be as unique (or even unique at all) as just the GUID?
For example:
The GUID 6f1a7841-190b-4c7a-9f23-98709b6f8848 equals CRC E6DC2D44.
Thanks
A CRC of a GUID would not be unique, no. That would be some awesome compression algorithm otherwise, to be able to put everything into just 4 bytes.
Also, if your users are stored in the database with a GUID key, you'd have trouble finding the user that matches up to this particular CRC.
You'd be better off using a plain old integer to uniquely identify a user. If you want the URL to be unguessable, you can combine it with a second ticket (or token) parameter that's randomly generated. The token doesn't have to be unique, because you use the integer ID to identify the user. You can think of it more or less as a password.
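For instance, the token could be generated like this (a sketch; the length and URL shape are arbitrary):

using System;
using System.Security.Cryptography;

// An unguessable token to pair with the integer user id,
// e.g. /profile/42/s0m3R4nd0mT0k3n (illustrative URL).
static string NewProfileToken()
{
    var bytes = new byte[16];
    using (var rng = RandomNumberGenerator.Create())
        rng.GetBytes(bytes);

    return Convert.ToBase64String(bytes)
        .Replace("+", "-").Replace("/", "_").TrimEnd('=');   // URL-safe
}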
Any calculated hash contains less information (bits) than the original data and can never be as unique. There are always collisions.
If the users have a username then why not use that? It should be unique (I would hope!) and would probably be short and URL friendly. It would also be easy for users to remember, too, and fits in the with the ASP.NET membership scheme (since usernames are the "primary key" in membership providers). I don't see any security issue as (presumably) only authenticated users would be able to access it, anyway?
No, it won't be as unique, because you're losing information from it. If you take a 32 character hex string and convert it to an 8 character hex string then, by definition, you're losing 75% of the data.
What you can do is use more characters to represent the data. A GUID uses an alphabet of only 16 characters (base 16), so you could use a higher base (e.g. base 64), which lets you encode the same amount of information in fewer characters.
I don't see any problem with the normal GUID in HTTP URL. If you want the shorted form of Guid use the below.
var gid = Guid.NewGuid().ToString("N");
This will give you the GUID without any hyphens or special characters.
A GUID is globally unique, meaning that you won't run into clashes, hopefully ever. These are usually based on some sort of time-based calculation with randomness injected. If you want to shorten something using a hash such as CRC, then the uniqueness is not automatic, but as long as you manage the uniqueness yourself (checking whether the hash is already assigned to another user and, if so, regenerating until you get a unique one), you could use almost anything.
This is the way a lot of url-shorteners work.
If you use a CRC of a UUID/GUID as the ID, you could just as well use a shorter ID in the first place.
The point of a UUID/GUID as an ID is, IMO, that you can create IDs on disconnected systems and should have no problem with duplicate IDs.
Anyway, who is going to enter the URL of the profile page by hand?
Also, I see no problems with the URL-friendliness of a UUID/GUID: it contains no characters that are disallowed in URLs.
How are users identified in the database (or any other place you use to store your data)?
If they are identified by this GUID, I'd say you should have a really good reason for it, because it makes searching for a specific ID really complicated (even when using a binary tree); more space is also needed to store these values.
If they are identified by a unique integer value, why not use that to address the user profile?
You can shorten a GUID to 20 printable ASCII characters, with it still being unique and without losing any information.
Take a look at this blog post by Jeff Atwood:
Equipping our ASCII Armor
I'm planning a distributed system of applications that will communicate with different types of RDBMS. One of the requirements is consistent handling of DateTimes across all RDBMS types. All DateTime values must be at millisecond precision, include the TimeZone info and be stored in a single column.
Since different RDBMS's handle dates and times differently, I'm worried I can't rely on their native column types in this case and so I'll have to come up with a different solution. (If I'm wrong here, you're welcome to show me the way.)
The solution, whatever it may be, should ideally allow for easy sorting and comparisons on the SQL level. Other aspects, such as readability and ability to use SQL datetime functions, are not important, since this will all be handled by a gateway service.
I'm toying with an idea of storing my DateTime values in an unsigned largeint column type (8 bytes). I haven't made sure if all RDBMS's in question (MSSQL, Oracle, DB2, PostgreSQL, MySQL, maybe a few others) actually /have/ such a type, but at this point I just assume they do.
As for the storage format... For example, 2009-01-01T12:00:00.999+01:00 could be stored as something like A20090101120000999BB (where A and BB hold the time-zone information, as explained below), which fits within 8 bytes.
The minimum DateTime I'd be able to store this way would be 0001-01-01T00:00:00.000+xx:xx, and the maximum would be 8000-12-31T23:59:59.999+xx:xx, which gives me more than enough of a span.
Since maximum unsigned largeint value is 18446744073709551615, this leaves me with the following 3 digits (marked by A and BB) to store the TimeZone info: AxxxxxxxxxxxxxxxxxBB.
Taking into account the maximum year span of 0001..8000, A can be either 0 or 1, and BB can be anywhere from 00 to 99.
And now the questions:
What do you think about my proposed solution? Does it have merit or is it just plain stupid?
If no better way exists, how do you propose the three remaining digits be used for TimeZone info best?
One of the requirements is consistent handling of DateTimes across all RDBMS types.
Be aware that date-time handling capabilities vary radically across various database systems. This ranges from virtually no support (SQLite) to excellent (Postgres). Some such as Oracle have legacy data-types that may confuse the situation, so study carefully without making assumptions.
Rather than establish a requirement that broadly says we must support "any or all database", you should get more specific. Research exactly what databases might realistically be candidates for deployment in the real-world. A requirement of "any or all databases" is naïve and unrealistic because databases vary in many capabilities — date-time handling is just the beginning of your multi-database support concerns.
The SQL standard barely touches on the subject of date-time, broadly defining a few types with little discussion of the nuances and complexities of date-time work.
Also be aware that most programming platforms provide atrociously poor support for date-time handling. Note that Java leads the industry in this field, with its brilliantly designed java.time classes. That framework evolved from the Joda-Time project for Java which was ported to .Net platform as NodaTime.
All DateTime values must be at millisecond precision,
Good that you have specified that important detail. Understand that various systems resolve date-time values to whole seconds, milliseconds, microseconds, nanoseconds, or something else.
include the TimeZone info and be stored in a single column.
Define time zone precisely.
Understand the difference between an offset-from-UTC and a time zone: The first is a number of hours-minutes-seconds plus-or-minus, the second has a name in format Continent/Region and is a history of past, present, and future changes to the offset used by the people of a particular region.
The 2-4 letter abbreviations such as CST, PST, IST, and so on are not formal time zone names, are not standardized, and are not even unique (avoid them).
Since different RDBMS's handle dates and times differently, I'm worried I can't rely on their native column types in this case and so I'll have to come up with a different solution.
The SQL standard does define a few types that are supported by some major databases.
TIMESTAMP WITH TIME ZONE represents a moment, a specific point on the timeline. I vaguely recall hearing of a database that actually stored the incoming time zone. But most, such as Postgres, use the time zone indicated on the incoming value to adjust into UTC, then store that UTC value, and lastly, discard the zone info. When retrieved, you get back a UTC value. Beware of tools and middleware with the confusing anti-feature of applying a default time zone after retrieval and before display to the user.
TIMESTAMP WITHOUT TIME ZONE represents a date with time-of-day, but purposely lacking the context of a time zone or offset. Without a zone/offset, such a value does not represent a moment. You could apply a time zone to determine a moment in a range of about 26-27 hours, the range of time zones around the globe.
There are other types in the standard as well such as date-only (DATE) and time-only (TIME).
See this table I made for Java, but in this context the column of SQL-standard types is the relevant one. Be aware that TIME WITH TIME ZONE makes no sense logically and should not be used.
If you have narrowed down your list of candidate databases, study their documentation to learn if they have a type akin to the standard types in which you are interested, and what the name of that type is (not always the standard name).
I'm toying with an idea of storing my DateTime values in an unsigned largeint column type (8 bytes).
A 64-bit value is not likely appropriate. For example, the java.time classes use a pair of numbers, a number of whole seconds since the epoch reference of first moment of 1970 in UTC, plus another number for the count of nanoseconds in the fractional second.
It is really best to use the database's data-time data types if they are similar across your list of candidate databases. Using a count-from-epoch is inherently ambiguous, which makes identifying erroneous data difficult.
Storing your own count-from-epoch number is possible. If you must go that way, be sure the entire team understands what epoch reference was chosen. At least a couple dozen have been in use in various computing systems. Beware of staff persons assuming a particular epoch reference is in use.
Another way to define your own date-time tracking is to use text in the standard ISO 8601 formats. Such strings will alphabetically sort as chronological. One exception to that sorting is the optional but commonly used Z at the end to indicate an offset-from-UTC of zero (pronounced “Zulu”).
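For example, in .NET the round-trip ("o") format produces such strings (a sketch):

// ISO 8601 round-trip format; these strings sort alphabetically in chronological
// order as long as the offset is kept consistent (e.g. always UTC).
string iso = DateTimeOffset.UtcNow.ToString("o");   // e.g. "2009-01-01T11:00:00.9990000+00:00"
DateTimeOffset parsed = DateTimeOffset.Parse(iso);  // round-trips without losing the offset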
The minimum DateTime I'd be able to store this way would be 0001-01-01T00:00:00.000+xx:xx,
Taking into account the maximum year span of 0001..8000
Are you really storing values from the time of Christ? Is this software really going to be around executing transactions for the year 8000?
This is an area where the responsible stakeholders should define their real needs. For example, for many business systems you may need only data from the year of the product's launch and run out only a century or two into the future.
The minimum/maximum value range varies widely between different databases. If you choose to use a built-in data type in each database system, investigate its limits. Some, for example, may go only to the year 2038, the common Y2038 problem.
To sum up my recommendation:
Get real about your date-time needs: min/max range, resolution, and various types (moment versus not a moment, date-only, etc.).
Get real about your possible databases for deployment.
If you need enterprise-quality reliability in a classic RDBMS, your candidate list is likely only a few: Postgres, Microsoft SQL Server, Oracle, and maybe IBM Db2.
Keep this list of supported databases as short as possible. Each database you agree to support is a huge commitment, now and in the future.
Be sure your chosen database(s) have a database driver available for your chosen programming language(s). For example JDBC for Java.
If at all possible, use the built-in data types offered by the database.
Be sure you and your team understand date-time handling. Many do not, in my experience, as (a) the subject is rarely taught, and (b) many programmers & admins mistakenly believe their quotidian intuitive understanding of date-time is sufficient for programming work. (Ignorance is bliss, as they say.)
Identify other areas of functionality beyond date-time handling, and compare which databases support those areas.
I would suggest you store the datetime information as milliseconds since 1970 (Java style).
It's a standard way of storing datetime information, and it's also more space-efficient than your suggestion, because in your scheme some digits are "wasted", i.e. the month digits can only hold 00-12 (instead of 00-99), and so on.
You didn't specify your development language, but I am sure you can find many code snippets that transform a date to milliseconds.
If you are developing in .NET, it has a similar concept of ticks (you can use this information as well).
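For example, on .NET 4.6 or later (a sketch):

// Milliseconds since the 1970-01-01 UTC epoch (Java style)...
DateTimeOffset moment = DateTimeOffset.UtcNow;
long unixMillis = moment.ToUnixTimeMilliseconds();

// ...or .NET ticks (100-nanosecond units since 0001-01-01):
long ticks = moment.UtcTicks;

// And back from milliseconds:
DateTimeOffset fromMillis = DateTimeOffset.FromUnixTimeMilliseconds(unixMillis);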
Regarding the time zone, I would add another column to store only the time-zone indication.
Remember that any format you choose should preserve ordering between two dates, i.e. if D1 > D2 then format(D1) > format(D2); this way you can query the DB for changes since some date, or for changes between two dates.
I am working on cleaning up a bug in a large code base where no one was paying attention to local time vs. UTC time.
What we want is a way of globally ignoring time zone information on DateTime objects sent to and from our ASP.NET web services. I've got a solution for retrieve operations. Data is only returned in datasets, and I can look for DateTime columns and set the DateTimeMode to Unspecified. That solves my problem for all data passed back and forth inside a data set.
However, DateTime objects are also often passed directly as parameters to the web methods. I'd like to strip off any incoming time zone information. Rather than searching through our client code and using DateTime.SpecifyKind(..) to set all DateTime variables to Unspecified, I'd like to do some sort of global ASP.NET override to monitor incoming parameters and strip out the time zone information.
Is such a thing possible? Or is there another easier way to do what I want to do?
Just to reiterate -- I don't care about time zones, everyone is in the same time zone. But a couple of users have machines badly configured, wrong time zones, etc. So when they send in July 1, 2008, I'm getting June 30, 2008 22:00:00 on the server side where it's automatically converting it from their local time to the server's local time.
Update: Another possibility would be to make a change in the client-side .NET code to alter the way DateTime objects with Kind 'Unspecified' are serialized.
I have dealt with this often in many applications, services, and on different platforms (.NET, Java, etc.). Please believe me that you do NOT want the long term consequences of pretending that you don't care about the time zone. After chasing lots of errors that are enormously difficult and expensive to fix, you will wish you had cared.
So, rather than stripping the time zone, you should either capture the correct time zone or force a specific time zone. If you reasonably can, get the various data sources fixed to provide a correct time zone. If they are out of your control, then force them either to the server's local time zone or to UTC.
The general industry convention is to force everything to UTC, and to set all production hardware clocks to UTC (that means servers, network devices like routers, etc.). Then you should translate to/from the user's local time zone in the UI.
If you fix it correctly now, it can be easy and cheap. If you intentionally break it further because you think that will be cheaper, then you will have no excuses later when you have to untangle the awful mess.
Note that this is similar to the common issue with strings: there is no such thing as plain text (a string devoid of a character encoding), and there is no such thing as a plain (time-zone-free) date/time. Pretending otherwise is the source of much pain, heartache, and embarrassing errors.
OK, I do have a workaround for this, which depends on the fact that I only actually need the date portion of the DateTime. I attach this attribute to every Date or DateTime parameter in the system:
<XmlElement(DataType:="date")>
This changes the generated WSDL to have the type s:date instead of s:dateTime. (Note that simply making the type of the .NET method parameter a Date rather than a DateTime did NOT accomplish this.) So the client now only sends the date portion of the DateTime: no time info, no time zone info.
If I ever need to send a Date and Time value to the server, I'll have to use some other workaround, like making it a string parameter.
I've had issues with the time zone information as well. The problem is that I'm already providing the datetime fields in UTC. Then serialization occurs and the local offset becomes part of the date/time, so the dates/times for our vendor in a different time zone were pretty messed up. I got around this problem by using the T-SQL CONVERT function on the datetime fields in the SELECT statement I use to populate my datasets. This converts the fields to strings, which translate nicely to datetime values automatically on the client side. If you just want to pass the date, you can use style code 101 to provide just the date. I used 126 to provide the date and time exactly as they appear in my database columns, with the time zone information stripped out.