Is there a best way to store datetime value in documentdb?
Obviously I will be storing this information in UTC and ISO 8601 formats. Are there any gotchas with this?
I should be able to query based on this datetime value, such as startDateTime < currentValue and currentValue <= endDateTime. What should I do to get maximum performance on these types of queries?
In your case, the only key point you didn't mention is that you need a range index with full precision (-1) on the ISO 8601 strings.
Some other general guidelines:
Store all events in canonical form: 2016-07-18T01:23:45.678Z
Store everything in zulu/GMT time. End every string with a Z. Never store it with +03:00. Make sure you shift local time input from the user to zulu time before running queries with that input.
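To make the canonical Zulu form concrete, here is a minimal Python sketch. Python is used only for illustration, since DocumentDB just sees the resulting strings. It assumes the user's input arrives as an offset-aware datetime, and the helper name to_canonical_utc is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_canonical_utc(dt: datetime) -> str:
    """Render an offset-aware datetime as YYYY-MM-DDTHH:MM:SS.fffZ (always UTC)."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("no offset attached; refusing to guess the user's time zone")
    utc = dt.astimezone(timezone.utc)
    # Fixed width: exactly three fractional digits, then a literal Z.
    return utc.replace(tzinfo=None).isoformat(timespec="milliseconds") + "Z"


# Local input at +03:00 is shifted to Zulu before it is stored or used in a query.
user_input = datetime(2016, 7, 18, 4, 23, 45, 678000, tzinfo=timezone(timedelta(hours=3)))
print(to_canonical_utc(user_input))  # 2016-07-18T01:23:45.678Z
```

Rejecting naive values is a design choice: a datetime without an offset cannot be shifted to Zulu without guessing.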
I also recommend that you use the coarsest granularity that fits your situation. So, if you are referring to the entire month of March 2016, simply use 2016-03, leaving off the -01T00:00:00.000Z. This mostly applies to the literals you use when running queries. Assuming the events are stored in canonical form, then 2016-07 < 2016-07-18T01:23:45.678Z is true. This recommendation is mostly for the benefit of the user, but it won't cause any performance degradation, and it's possible that it could be a very slight improvement in some circumstances.
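The coarse literal works because the strings are compared character by character, so a prefix such as "2016-03" sorts before every timestamp that starts with it. A small Python sketch of the same comparison a range query performs, using made-up sample values:

```python
# Hypothetical sample of canonical Zulu strings as they might be stored.
events = [
    "2016-02-29T23:59:59.999Z",
    "2016-03-01T00:00:00.000Z",
    "2016-03-31T23:59:59.999Z",
    "2016-04-01T00:00:00.000Z",
]


def in_range(ts: str, start: str, end: str) -> bool:
    """Half-open range test, start <= ts < end, using plain string comparison."""
    return start <= ts < end


# "2016-03" is a prefix of every March timestamp, so it sorts before all of them;
# "2016-04" likewise sorts before every April timestamp.
march = [e for e in events if in_range(e, "2016-03", "2016-04")]
print(march)  # ['2016-03-01T00:00:00.000Z', '2016-03-31T23:59:59.999Z']
print("2016-07" < "2016-07-18T01:23:45.678Z")  # True
```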
I have a request to alter current columns which are of type 'time': instead of capturing just the time, I need to capture so-called "utc time".
My idea is to create a fixed code list with all time zones, and then to reference it from the appropriate table as an FK.
My questions are:
Can a column of type 'time' also hold time zone information (for example 15:00:00 +2, i.e. GMT+2), and if not, could you suggest another type for that column?
Should I separate it into two columns? For example: [15:00:00] - StartTime, [+2:00] - UtcOffset
EF Insert: When I insert into the db, should I convert my DateTime object for that particular column to, for example, DateTimeOffset?
Thanks in advance.
From the comments in your question, it sounds like you are building an appointment scheduling system. I'll base my answer on that, because your specific questions aren't quite aligned to the scenario you described.
First, it's important to understand that the relationship between a time zone and an offset is a one-to-many relationship. One time zone can have multiple offsets. In other words, a time zone is not an offset, but rather a time zone has an offset.
A time zone represents a geographic region where the local time is the same throughout. It is identified by a string ID, such as "America/Los_Angeles" (an IANA time zone ID) or "Pacific Standard Time" (a Windows time zone ID). In .NET, you use them with the TimeZoneInfo object, via the Id instance property or methods like FindSystemTimeZoneById.
An offset is like -07:00 or +05:30 or even +13:45. Any given offset applies only at a particular date and time. For example, in America/Los_Angeles, either -08:00 or -07:00 apply depending on whether daylight saving time is in effect at a given point in time. Keep in mind that DST is not the only reason for offsets to be different - many time zones have changed their standard time at some point in their history.
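One time zone producing two offsets is easy to see with Python's zoneinfo module. This is a sketch, assuming Python 3.9+ with IANA time zone data available (on Windows that usually means installing the tzdata package); the .NET equivalent would go through TimeZoneInfo, which this does not show.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# One time zone, identified by its IANA id...
la = ZoneInfo("America/Los_Angeles")

# ...but a different offset depending on the date, because of DST.
winter = datetime(2021, 1, 15, 12, 0, tzinfo=la)
summer = datetime(2021, 7, 15, 12, 0, tzinfo=la)

print(winter.isoformat())  # 2021-01-15T12:00:00-08:00
print(summer.isoformat())  # 2021-07-15T12:00:00-07:00
```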
Also, it's called an offset because it deviates from UTC by a certain amount. UTC itself always has a zero offset, denoted either by +00:00 or sometimes by Z. It's similar to GMT, except in how it is defined. UTC applies universally, everywhere. GMT technically applies only on the prime meridian. They both refer to the zero offset. You should prefer to say "UTC" in most cases.
Next, you should separate your application logic between future scheduling and present/past record keeping.
Present/past is the easier of the two. Since the moment in time has actually occurred, the local time and its offset from UTC are fixed forever. You can store the local time and its offset together in a single .NET DateTimeOffset structure (mapped to a datetimeoffset field in SQL Server). In other words, you can simply store 2021-07-27T12:00:00-08:00.
Note that you could instead store the equivalent UTC date and time, which would be 2021-07-27T20:00:00+00:00. However you've then lost the local time, and thus would need to convert back using the original time zone if you wanted to see that time. Some people prefer that, but I think it's more useful to store the original value.
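The trade-off between the two storage options can be sketched with Python's offset-aware datetime, which plays the role of DateTimeOffset here (an analogy, not the .NET API):

```python
from datetime import datetime, timezone

# Option 1: keep the original local time together with its offset.
recorded = datetime.fromisoformat("2021-07-27T12:00:00-08:00")

# Option 2: keep only the UTC equivalent.
as_utc = recorded.astimezone(timezone.utc)

print(recorded.isoformat())  # 2021-07-27T12:00:00-08:00
print(as_utc.isoformat())    # 2021-07-27T20:00:00+00:00
print(recorded == as_utc)    # True: both denote the same instant
```

Both values compare equal because they are the same instant, but only the first one still tells you that the wall-clock time was 12:00.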
For future scheduling, the situation is a bit different. Consider that the offset might not be the same for one appointment as it will be for the next appointment in the same time zone. Also consider that the definitions for which offset apply might change in between the time you schedule the appointment and when it comes around. (The likelihood of that increases the further out you schedule.)
Thus, for each location you should not store an offset, but rather a time zone identifier. Add a TimeZoneId to your object that stores each location (or each appointment depending on your model schema). Use TimeZoneInfo.GetSystemTimeZones to list the available time zones. The DisplayName property can be shown to your user, and the selected Id property gets assigned to the TimeZoneId.
Next you have to consider if you are scheduling a single appointment or creating a recurring appointment pattern.
For a single appointment you simply need the local date and time for that appointment. You can use a .NET DateTime struct (use datetime2 in SQL). Don't apply an offset, and don't convert to UTC. Just store the information provided.
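A sketch of the "store what was given, resolve later" idea in Python, assuming IANA zone ids and zoneinfo (Python 3.9+) in place of .NET's TimeZoneInfo. The Appointment class and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


@dataclass
class Appointment:
    local_start: datetime  # naive wall-clock time, exactly as entered (cf. datetime2)
    time_zone_id: str      # IANA id, playing the role of TimeZoneId

    def __post_init__(self) -> None:
        if self.local_start.tzinfo is not None:
            raise ValueError("store the local time only; no offset, no UTC conversion")

    def start_utc(self) -> datetime:
        # Resolved with whatever time zone rules are installed *now*.
        aware = self.local_start.replace(tzinfo=ZoneInfo(self.time_zone_id))
        return aware.astimezone(timezone.utc)


appt = Appointment(datetime(2021, 7, 27, 12, 0), "America/Los_Angeles")
print(appt.start_utc().isoformat())  # 2021-07-27T19:00:00+00:00
```

Because the conversion happens at read time, an update to the installed time zone data changes the computed instant without touching the stored row. Ambiguous or skipped local times around DST transitions are not validated in this sketch.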
For recurring appointments, you need to think through the information provided and store exactly what is given. For example, if the appointment is at 10:00 every other Tuesday, you'll need to store "10:00", "Tuesdays" and "2 week intervals". The data types for each will vary depending on how you choose to store and apply them. For example, you might use a time type for 10:00, but you could store the other values using integers. Appointments of different patterns could get stored in different ways.
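One possible shape for the "10:00 every other Tuesday" example, sketched in Python. The field names are hypothetical, and the anchor date is an assumption not mentioned above: "every other Tuesday" needs some starting Tuesday to know which alternating weeks apply.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta
from typing import Iterator


@dataclass
class RecurrencePattern:
    local_time: time     # "10:00"
    weekday: int         # "Tuesdays", in date.weekday() numbering (Monday = 0, Tuesday = 1)
    interval_weeks: int  # "2 week intervals"
    anchor: date         # assumed: one known occurrence, to fix which weeks apply

    def occurrences(self, count: int) -> Iterator[datetime]:
        """Yield the next `count` local (naive) occurrences, starting at the anchor."""
        if self.anchor.weekday() != self.weekday:
            raise ValueError("anchor must fall on the pattern's weekday")
        step = timedelta(weeks=self.interval_weeks)
        day = self.anchor
        for _ in range(count):
            yield datetime.combine(day, self.local_time)
            day += step


every_other_tuesday = RecurrencePattern(time(10, 0), 1, 2, date(2021, 7, 27))
print([o.isoformat() for o in every_other_tuesday.occurrences(3)])
# ['2021-07-27T10:00:00', '2021-08-10T10:00:00', '2021-08-24T10:00:00']
```

The occurrences are local (naive) date-times; turning them into instants still requires the location's time zone id.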
Alternatively, some like to store patterns using a string containing a CRON expression. You can google that for more details.
Now you have everything you need to both schedule an appointment and record that appointment after it happens. But there's one part missing - you'll likely want some table of upcoming appointments that are easily queryable. For that, you've got a few options:
You can create a separate table via a background job of some kind. Periodically it would query all the appointments, use their information to compute the next upcoming appointment time, and insert it. You can store that in a DateTimeOffset, either as local time or as UTC. (SQL Server will always compute indexes on the equivalent UTC time either way.)
You could just add another field to your appointments that shows the next actual appointment time. You can then compute the next upcoming appointment whenever the appointment is created or updated, or when that appointment occurs, and update the table accordingly.
With either approach keep in mind that you will want to periodically check for time zone data updates (either via Windows Update or keeping the tzdata package current on Linux). You will also want to periodically re-compute future appointment times, in case that time zone data has been modified in a way that affects the appointments.
If all of this sounds super complicated - sorry, but it is. Doing scheduling worldwide across time zones right is challenging. If you want it simpler, you might want to look into a pre-made solution such as Quartz.NET, which you can integrate into your application.
I've been searching all morning and can't seem to get a handle on this (though I do have a few possible theories). This might be a duplicate, but please take into account that none of the questions I found gave a definitive answer; they were all too open to interpretation.
In SQL Server (>= 2012), is a table column of type datetime stored without regard to any time zone offset, or how does it work? From my investigation, it would seem that datetimeoffset is the type that includes the offset with the date/time, while datetime simply omits it?
When I read the data from the database and use CONVERT(datetimeoffset, [My Column]), it gives me 2016-09-21 16:49:54.7170000 +00:00, while both the server and I are in UTC+02:00. This reinforces my belief. Am I correct?
What I'm trying to achieve, is allow data to be saved FROM various tz offsets (via a function), then saved into the database in UTC and finally convert the datetime value back to a (possibly different) offset. I don't care about DST / etc as the users browser will give me the current offset at the time of the saving and the viewing user will give me their tz offset at the time of viewing. For historic reports the exact time of day (DST dependent) is irrelevant.
Currently the database tables already use datetime as opposed to datetimeoffset. My observation is that it's completely fine to continue with this, though at some point it might be good to change to datetimeoffset in order to start recording the historic tz offset?
Any clarity will be greatly appreciated.
TL;DR: Yes. The DateTime (and DateTime2) data type is not time-zone aware.
The long version:
The official documentation of DateTime clearly states that the DateTime data type does not support time zones (nor daylight saving time). The same is true for DateTime2.
You can see in both pages there's a table that describes the data type's properties, and in that table, for both data types, the value for "Time zone offset aware and preservation" and for "Daylight saving aware" is "No".
Time zone offset aware and preservation: No
Daylight saving aware: No
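Since neither type keeps an offset, the workflow from the question (convert to UTC on save, apply the viewer's offset on read) has to live in application code. A minimal Python sketch of that logic, assuming the database driver passes naive datetime values back and forth; the helper names are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def to_db_utc(local: datetime) -> datetime:
    """Shift an offset-aware value to UTC and drop the offset, since the column cannot keep it."""
    if local.tzinfo is None:
        raise ValueError("need the user's offset to convert to UTC")
    return local.astimezone(timezone.utc).replace(tzinfo=None)


def from_db_utc(stored: datetime, viewer_offset: timedelta) -> datetime:
    """Re-attach UTC (known only by convention) and shift to the viewer's offset."""
    return stored.replace(tzinfo=timezone.utc).astimezone(timezone(viewer_offset))


saved = to_db_utc(datetime(2016, 9, 21, 18, 49, 54, tzinfo=timezone(timedelta(hours=2))))
print(saved)  # 2016-09-21 16:49:54
shown = from_db_utc(saved, timedelta(hours=-5))
print(shown.isoformat())  # 2016-09-21T11:49:54-05:00
```

The "UTC" in the column exists only as a team convention; nothing in a datetime or datetime2 value records it.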
The description of DateTime is as follows:
Defines a date that is combined with a time of day with fractional seconds that is based on a 24-hour clock.
The description of DateTime2 is as follows:
Defines a date that is combined with a time of day that is based on 24-hour clock.
datetime2 can be considered as an extension of the existing datetime type that has a larger date range, a larger default fractional precision, and optional user-specified precision.
The only data type that is timezone aware is DateTimeOffset:
Defines a date that is combined with a time of a day that has time zone awareness and is based on a 24-hour clock.
By the way, choosing DateTime2 over DateTime is recommended both by the official Microsoft documentation:
Note
Use the time, date, datetime2 and datetimeoffset data types for new work. These types align with the SQL Standard. They are more portable. time, datetime2 and datetimeoffset provide more seconds precision. datetimeoffset provides time zone support for globally deployed applications.
And by SQL Server professionals: Why You Should Never Use DATETIME Again!:
Datetime also has a bug/feature when implicitly converting string literals of the format yyyy-mm-dd / yyyy-mm-dd hh:mm:ss: Datetime interprets them according to the session's language and DATEFORMAT settings, while Datetime2 always converts them correctly.
Check out this SO post about it.
I’m creating Android apps and need to save the date/time of record creation. The SQLite docs say, however, "SQLite does not have a storage class set aside for storing dates and/or times" and it's "capable of storing dates and times as TEXT, REAL, or INTEGER values".
Is there a technical reason to use one type over another? And can a single column store dates in any of the three formats from row to row?
I will need to compare dates later. For instance, in my apps I will show all records created between date A and date B. I am worried that not having a true DATETIME column could make comparisons difficult.
SQLite does not have a specific datetime type. You can use TEXT, REAL or INTEGER types, whichever suits your needs.
Straight from the docs:
SQLite does not have a storage class set aside for storing dates and/or times. Instead, the built-in Date And Time Functions of SQLite are capable of storing dates and times as TEXT, REAL, or INTEGER values:
TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.
Applications can choose to store dates and times in any of these formats and freely convert between formats using the built-in date and time functions.
SQLite built-in Date and Time functions can be found here.
One of the powerful features of SQLite is allowing you to choose the storage type. Advantages/disadvantages of each of the three different possibilities:
ISO8601 string
String comparison gives valid results
Stores fractional seconds, up to three decimal digits
Needs more storage space
You will directly see its value when using a database browser
Needs parsing for other uses
"default current_timestamp" column modifier will store using this format
Real number
High precision for fractional seconds
Longest time range
Integer number
Lowest storage space
Quick operations
Smaller time range (when handled as 32-bit values)
Possible year 2038 problem (with 32-bit values)
If you need to compare different types or export to an external application, you're free to use SQLite's own datetime conversion functions as needed.
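A small sketch using Python's built-in sqlite3 module, showing a "created between date A and date B" query on an ISO 8601 TEXT column plus SQLite's built-in conversion functions. The table and column names are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO record (created_at) VALUES (?)",
    [
        ("2023-01-05 08:00:00.000",),
        ("2023-02-10 12:30:00.000",),
        ("2023-03-15 18:45:00.000",),
    ],
)

# Records created between date A (inclusive) and date B (exclusive). Plain
# string comparison is chronological because every value has the same layout.
rows = conn.execute(
    "SELECT id FROM record WHERE created_at >= ? AND created_at < ? ORDER BY id",
    ("2023-02-01", "2023-04-01"),
).fetchall()
print(rows)  # [(2,), (3,)]

# Built-in functions convert between the TEXT, INTEGER, and REAL forms.
(as_unix,) = conn.execute("SELECT strftime('%s', '1970-01-02 00:00:00')").fetchone()
(as_julian,) = conn.execute("SELECT julianday('1970-01-01 00:00:00')").fetchone()
(as_text,) = conn.execute("SELECT datetime(86400, 'unixepoch')").fetchone()
print(as_unix, as_julian, as_text)  # 86400 2440587.5 1970-01-02 00:00:00
```

Date B is exclusive here (a half-open range), which avoids having to spell out the last millisecond of the final day.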
As the SQLite documentation quoted above says, there is no dedicated date/time storage class; dates and times can be stored as TEXT, REAL, or INTEGER values.
Having said that, I would use INTEGER and store seconds since Unix epoch (1970-01-01 00:00:00 UTC).
For practically all date and time matters I prefer to keep things very, very simple: seconds stored in integers.
Integers will always be supported as integers in databases, flat files, etc. You do a little math, cast it into another type, and you can format the date any way you want.
Doing it this way, you don't have to worry when [insert current favorite database here] is replaced with [future favorite database] which coincidentally didn't use the date format you chose today.
It's just a little math overhead (e.g. a couple of helper methods; it takes two seconds, and I'll post a gist if necessary), and it simplifies a lot of date/time operations later.
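The "little math" can be as small as the following Python sketch. The helper names are hypothetical; integer arithmetic on timedelta is used so that the result is floored exactly rather than going through a float:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_seconds(dt: datetime) -> int:
    """Whole seconds since 1970-01-01 00:00:00 UTC, floored."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: cannot tell which instant it is")
    return (dt - EPOCH) // timedelta(seconds=1)


def from_epoch_seconds(seconds: int) -> datetime:
    return EPOCH + timedelta(seconds=seconds)


stamp = to_epoch_seconds(datetime(2016, 7, 18, 1, 23, 45, tzinfo=timezone.utc))
print(from_epoch_seconds(stamp).isoformat())  # 2016-07-18T01:23:45+00:00

# The largest signed 32-bit value is where 32-bit Unix time runs out.
print(from_epoch_seconds(2**31 - 1).isoformat())  # 2038-01-19T03:14:07+00:00
```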
Store it in a field of type long (milliseconds since the epoch). See Date.getTime() and new Date(long).
I'm trying to implement a ReservationController, which is responsible for taking reservations of something for a specific time range. So far I assumed that using one column for the date (DateTime) and two columns for the time span (2x Time) in the database would be a good idea. Especially for queries on date this approach is easier, because I know that the DateTime column is always set to midnight (12 am), so I can just query DateTime.Today, for instance. But now I'm running into trouble with reservations that cross the day boundary (e.g. 10 pm today to 1 am tomorrow). Could you please give me some advice on a common solution for this problem (what database schema should I use)?
I would have thought just two DateTimes would be enough? You can still query whether the start datetime or end datetime is today (i.e. >= today midnight and < tomorrow midnight).
Perhaps I am missing something - were there other queries you need to do, or were you worried about optimisation of this query? It should be fine, if you add one or more indexes for the DateTime columns.
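Two DateTime columns also handle the reservation that crosses midnight. A sketch in Python with SQLite standing in for the real database (table, column, and index names are hypothetical). The overlap test "starts before the end of the day and ends after its start" catches reservations that begin yesterday, end tomorrow, or span the entire day:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reservation (
        id        INTEGER PRIMARY KEY,
        starts_at TEXT NOT NULL,  -- ISO 8601, same offset for every row
        ends_at   TEXT NOT NULL,
        CHECK (ends_at > starts_at)
    )
    """
)
conn.execute("CREATE INDEX ix_reservation_span ON reservation (starts_at, ends_at)")
conn.executemany(
    "INSERT INTO reservation (starts_at, ends_at) VALUES (?, ?)",
    [
        ("2024-05-09 09:00", "2024-05-09 10:00"),  # 1: yesterday only
        ("2024-05-09 22:00", "2024-05-10 01:00"),  # 2: crosses midnight into today
        ("2024-05-10 14:00", "2024-05-10 15:00"),  # 3: today only
        ("2024-05-10 22:00", "2024-05-11 01:00"),  # 4: today into tomorrow
        ("2024-05-11 09:00", "2024-05-11 10:00"),  # 5: tomorrow only
    ],
)


def overlapping(day_start: str, next_day_start: str) -> list:
    """Ids whose [starts_at, ends_at) intersects [day_start, next_day_start)."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT id FROM reservation WHERE starts_at < ? AND ends_at > ? ORDER BY id",
            (next_day_start, day_start),
        )
    ]


print(overlapping("2024-05-10", "2024-05-11"))  # [2, 3, 4]
```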
I'm planning a distributed system of applications that will communicate with different types of RDBMS. One of the requirements is consistent handling of DateTimes across all RDBMS types. All DateTime values must be at millisecond precision, include the TimeZone info and be stored in a single column.
Since different RDBMS's handle dates and times differently, I'm worried I can't rely on their native column types in this case and so I'll have to come up with a different solution. (If I'm wrong here, you're welcome to show me the way.)
The solution, whatever it may be, should ideally allow for easy sorting and comparisons on the SQL level. Other aspects, such as readability and ability to use SQL datetime functions, are not important, since this will all be handled by a gateway service.
I'm toying with an idea of storing my DateTime values in an unsigned largeint column type (8 bytes). I haven't made sure if all RDBMS's in question (MSSQL, Oracle, DB2, PostgreSQL, MySQL, maybe a few others) actually /have/ such a type, but at this point I just assume they do.
As for the storage format... For example, 2009-01-01T12:00:00.999+01:00 could be stored similar to ?20090101120000999??, which fits in under 8 bytes.
The minimum DateTime I'd be able to store this way would be 0001-01-01T00:00:00.000+xx:xx, and the maximum would be 8000-12-31T23:59:59.999+xx:xx, which gives me more than enough of a span.
Since maximum unsigned largeint value is 18446744073709551615, this leaves me with the following 3 digits (marked by A and BB) to store the TimeZone info: AxxxxxxxxxxxxxxxxxBB.
Taking into account the maximum year span of 0001..8000, A can be either 0 or 1, and BB can be anywhere from 00 to 99.
And now the questions:
What do you think about my proposed solution? Does it have merit or is it just plain stupid?
If no better way exists, how do you propose the three remaining digits be used for TimeZone info best?
One of the requirements is consistent handling of DateTimes across all RDBMS types.
Be aware that date-time handling capabilities vary radically across various database systems. This ranges from virtually no support (SQLite) to excellent (Postgres). Some such as Oracle have legacy data-types that may confuse the situation, so study carefully without making assumptions.
Rather than establish a requirement that broadly says we must support "any or all database", you should get more specific. Research exactly what databases might realistically be candidates for deployment in the real-world. A requirement of "any or all databases" is naïve and unrealistic because databases vary in many capabilities — date-time handling is just the beginning of your multi-database support concerns.
The SQL standard barely touches on the subject of date-time, broadly defining a few types with little discussion of the nuances and complexities of date-time work.
Also be aware that most programming platforms provide atrociously poor support for date-time handling. Note that Java leads the industry in this field, with its brilliantly designed java.time classes. That framework evolved from the Joda-Time project for Java, which was ported to the .NET platform as Noda Time.
All DateTime values must be at millisecond precision,
Good that you have specified that important detail. Understand that various systems resolve date-time values to whole seconds, milliseconds, microseconds, nanoseconds, or something else.
include the TimeZone info and be stored in a single column.
Define time zone precisely.
Understand the difference between an offset-from-UTC and a time zone: The first is a number of hours-minutes-seconds plus-or-minus, the second has a name in format Continent/Region and is a history of past, present, and future changes to the offset used by the people of a particular region.
The 2-4 letter abbreviations such as CST, PST, IST, and so on are not formal time zone names, are not standardized, and are not even unique (avoid them).
Since different RDBMS's handle dates and times differently, I'm worried I can't rely on their native column types in this case and so I'll have to come up with a different solution.
The SQL standard does define a few types that are supported by some major databases.
TIMESTAMP WITH TIME ZONE represents a moment, a specific point on the timeline. I vaguely recall hearing of a database that actually stored the incoming time zone. But most, such as Postgres, use the time zone indicated on the incoming value to adjust into UTC, then store that UTC value, and lastly, discard the zone info. When retrieved, you get back a UTC value. Beware of tools and middleware with the confusing anti-feature of applying a default time zone after retrieval and before display to the user.
TIMESTAMP WITHOUT TIME ZONE represents a date with time-of-day, but purposely lacking the context of a time zone or offset. Without a zone/offset, such a value does not represent a moment. You could apply a time zone to determine a moment in a range of about 26-27 hours, the range of time zones around the globe.
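Python's naive and aware datetimes make a handy analogy for the two standard types (an analogy only; this is not database code). Applying the two extreme current offsets, UTC+14 and UTC-12, to the same wall-clock value shows how far apart the resulting moments can be. Assumes Python 3.9+ with IANA tz data available:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Like TIMESTAMP WITHOUT TIME ZONE: a date and time-of-day, not yet a moment.
wall_clock = datetime(2024, 1, 1, 12, 0)

# Like TIMESTAMP WITH TIME ZONE: applying a zone yields a moment, kept here in UTC.
earliest = wall_clock.replace(tzinfo=ZoneInfo("Pacific/Kiritimati")).astimezone(timezone.utc)
# "Etc/GMT+12" means UTC-12 (the POSIX-style sign is inverted).
latest = wall_clock.replace(tzinfo=ZoneInfo("Etc/GMT+12")).astimezone(timezone.utc)

print(earliest.isoformat())  # 2023-12-31T22:00:00+00:00
print(latest.isoformat())    # 2024-01-02T00:00:00+00:00
print(latest - earliest)     # 1 day, 2:00:00
```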
There are other types in the standard as well such as date-only (DATE) and time-only (TIME).
See this table I made for Java; in this context the column of SQL standard types is relevant. Be aware that TIME WITH TIME ZONE makes no sense logically and should not be used.
If you have narrowed down your list of candidate databases, study their documentation to learn if they have a type akin to the standard types in which you are interested, and what the name of that type is (not always the standard name).
I'm toying with an idea of storing my DateTime values in an unsigned largeint column type (8 bytes).
A 64-bit value is not likely appropriate. For example, the java.time classes use a pair of numbers, a number of whole seconds since the epoch reference of first moment of 1970 in UTC, plus another number for the count of nanoseconds in the fractional second.
It is really best to use the database's date-time data types if they are similar across your list of candidate databases. Using a count-from-epoch is inherently ambiguous, which makes identifying erroneous data difficult.
Storing your own count-from-epoch number is possible. If you must go that way, be sure the entire team understands what epoch reference was chosen. At least a couple dozen have been in use in various computing systems. Beware of staff persons assuming a particular epoch reference is in use.
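To see why an undocumented epoch is dangerous, here is one hypothetical stored number read against three real epoch references (Unix, NTP, and Apple's Cocoa reference date), sketched in Python with naive datetimes standing for UTC:

```python
from datetime import datetime, timedelta

stored = 1_000_000_000  # a bare count of seconds found in some column

# Naive datetimes here stand for UTC.
epochs = {
    "Unix (1970-01-01)": datetime(1970, 1, 1),
    "NTP (1900-01-01)": datetime(1900, 1, 1),
    "Cocoa (2001-01-01)": datetime(2001, 1, 1),
}
readings = {name: epoch + timedelta(seconds=stored) for name, epoch in epochs.items()}
for name, moment in readings.items():
    print(f"{name}: {moment}")
# The Unix reading is 2001-09-09 01:46:40; the other two are decades away from it.
```

Nothing in the number itself says which reading is right, which is exactly the ambiguity described above.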
Another way to define your own date-time tracking is to use text in the standard ISO 8601 formats. Such strings sort alphabetically in chronological order, as long as every value uses the same offset and the same fixed width. One exception to that sorting is the optional but commonly used Z at the end to indicate an offset-from-UTC of zero (pronounced “Zulu”).
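The Z caveat shows up as soon as precision varies, because '.' sorts before 'Z'; mixing offsets breaks the ordering as well. A Python sketch with made-up values:

```python
from datetime import datetime

# Same offset, mixed precision: '.' (0x2E) sorts before 'Z' (0x5A),
# so the later instant ends up first.
mixed_precision = ["2016-07-18T01:23:45Z", "2016-07-18T01:23:45.678Z"]
print(sorted(mixed_precision))
# ['2016-07-18T01:23:45.678Z', '2016-07-18T01:23:45Z']

# Mixed offsets: 03:00+02:00 is 01:00Z, earlier than 02:00Z, yet it sorts later.
mixed_offsets = ["2016-07-18T02:00:00Z", "2016-07-18T03:00:00+02:00"]
print(sorted(mixed_offsets))
# ['2016-07-18T02:00:00Z', '2016-07-18T03:00:00+02:00']


def instant(text: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so normalize it to "+00:00" first.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


# Parsing before sorting gives the true chronological order.
print(sorted(mixed_offsets, key=instant))
# ['2016-07-18T03:00:00+02:00', '2016-07-18T02:00:00Z']
```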
The minimum DateTime I'd be able to store this way would be 0001-01-01T00:00:00.000+xx:xx,
Taking into account the maximum year span of 0001..8000
Are you really storing values from the time of Christ? Is this software really going to be around executing transactions for the year 8000?
This is an area where the responsible stakeholders should define their real needs. For example, for many business systems you may need only data from the year of the product's launch and run out only a century or two into the future.
The minimum/maximum value range varies widely between different databases. If you choose to use a built-in data type in each database system, investigate its limits. Some, for example, may go only to the year 2038, the common Y2038 problem.
To sum up my recommendation:
Get real about your date-time needs: min/max range, resolution, and various types (moment versus not a moment, date-only, etc.).
Get real about your possible databases for deployment.
If you need enterprise-quality reliability in a classic RDBMS, your candidate list is likely only a few: Postgres, Microsoft SQL Server, Oracle, and maybe IBM Db2.
Keep this list of supported databases as short as possible. Each database you agree to support is a huge commitment, now and in the future.
Be sure your chosen database(s) have a database driver available for your chosen programming language(s). For example JDBC for Java.
If at all possible, use the built-in data types offered by the database.
Be sure you and your team understand date-time handling. Many do not, in my experience, as (a) the subject is rarely taught, and (b) many programmers & admins mistakenly believe their quotidian intuitive understanding of date-time is sufficient for programming work. (Ignorance is bliss, as they say.)
Identify other areas of functionality beyond date-time handling, and compare which databases support those areas.
I would suggest storing the datetime information as milliseconds since 1970 (Java style).
It's a standard way of storing datetime information, and it's also more space-efficient than your suggestion, because in your suggestion some digits are "wasted", i.e. the month digits can only hold 01-12 (instead of 00-99) and so on.
You didn't specify what is your development language but I am sure you can find many code snippets that transform date to milliseconds.
If you are developing in .NET, it has a similar concept of ticks (100-nanosecond intervals since 0001-01-01), which you can use as well.
Regarding the time zone, I would add another column to store only the time zone indication.
Remember that any format you choose should preserve the ordering between two dates, i.e. if D1 > D2 then format(D1) > format(D2). That way you can query the DB for changes since some date, or for changes between two dates.
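A sketch of this suggestion in Python with SQLite: milliseconds since 1970 in one integer column, plus a separate column for the zone indication. Here that column is assumed to hold a fixed offset in minutes (an assumption; storing a zone name instead would be another option), and the table and helper names are hypothetical:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt: datetime) -> int:
    """Milliseconds since 1970-01-01T00:00:00Z; preserves ordering at ms resolution."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def restore(at_ms: int, offset_min: int) -> datetime:
    """Rebuild the instant and re-express it at the originally recorded offset."""
    utc = EPOCH + timedelta(milliseconds=at_ms)
    return utc.astimezone(timezone(timedelta(minutes=offset_min)))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE change_log ("
    "id INTEGER PRIMARY KEY, at_ms INTEGER NOT NULL, offset_min INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX ix_change_log_at ON change_log (at_ms)")

samples = [
    datetime(2009, 1, 1, 12, 0, 0, 999000, tzinfo=timezone(timedelta(hours=1))),  # 11:00:00.999Z
    datetime(2009, 1, 1, 11, 30, tzinfo=timezone(timedelta(hours=-5))),           # 16:30:00Z
    datetime(2009, 1, 1, 12, 0, tzinfo=timezone.utc),                              # 12:00:00Z
]
for dt in samples:
    conn.execute(
        "INSERT INTO change_log (at_ms, offset_min) VALUES (?, ?)",
        (to_millis(dt), dt.utcoffset() // timedelta(minutes=1)),
    )

# "Changes since some date" is a plain integer comparison on an indexed column.
since = to_millis(datetime(2009, 1, 1, 12, 0, tzinfo=timezone.utc))
ids = [row[0] for row in conn.execute(
    "SELECT id FROM change_log WHERE at_ms >= ? ORDER BY at_ms", (since,)
)]
print(ids)  # [3, 2]
```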