lubridate date parsing for dates starting with 0 - r

I am using the following code to parse dates, but it doesn't seem to work for formats such as 04 Aug 2017 or 05-Aug-2017. Basically, it fails when the date starts with 0 and several order formats are used together, as below.
For the example below, the output is 2014-04-20 UTC:
library(lubridate)
dateStr <- "04-Apr-2014"
newdate <- parse_date_time(dateStr,orders =c("m d y","m-d-y","m/d/y","d m y","d-m-y","d/m/Y","d B y","d-B-y","d/B/y","B d y","B-d-y","B/d/y","y m d","y d m","y-m-d","y-d-m","y/m/d"),locale = "eng")
newdate

This is not a bug, more perhaps a side-effect of a "feature".
This comes down to the "relaxed" extensions that lubridate supports. For instance, m in the strict sense is a month number, but lubridate also expands it to include abbreviated and full month names. Similarly, y is typically just the two-digit year, but is extended to include the century as well. (As with polymorphic code, this flexibility comes at a cost: the potential for getting things wrong.)
Further, lubridate::parse_date_time tries to be smart by supporting heterogeneous date-times (from its man page), so "09-01-01" and "090101" will parse to the same thing.
In this case, since you use m and y, it tries to go with numeric only, and matches the 14 to y, ignores all non-numeric (since you suggested numeric), and sees 20 as the day. If you remove all month-leading formatting strings, it no longer tries to find that order of things.
So, mitigation against this problem:
reduce the number of possible orders= formats; the more you offer, the more it can go wrong
remove all formatting strings that start with "m", only feasible if you are certain your dates will not start with month
if you have some control over the types of strings you are getting, then restrict the use of numeric-versus-named months, perhaps giving the parser a better shot
don't use parse_date_time; try one of the other functions instead (e.g., dmy, or something outside lubridate)
file a bug if you feel strongly enough about this, though you leave yourself open to it when you try "a gazillion" formatting strings
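As a rough illustration of the first and fourth suggestions, the sketch below parses the string from the question with a single day-first order. It assumes an English locale for month names and that every input really is day-month-year; the names d1, d2 and d3 are only for illustration.

```r
library(lubridate)

dateStr <- "04-Apr-2014"

# dmy() assumes day-month-year, so the leading "04" is read as the day
d1 <- dmy(dateStr)

# parse_date_time() restricted to one day-first order; lubridate's relaxed
# "m" also accepts month names, so separators need not be spelled out
d2 <- parse_date_time(dateStr, orders = "dmy")

# Base R alternative with an explicit strptime-style format
d3 <- as.Date(dateStr, format = "%d-%b-%Y")

print(d1)
print(d2)
print(d3)
```

If the inputs can also arrive month-first, a single order like this is not enough, and the ambiguity described above comes back.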

Related

Strange behaviour with parse_time()

I am trying to parse a string representing a period consisting of minutes, seconds, and milliseconds. My preferred functions for this would come from the readr package, where seconds and milliseconds may be seen jointly as partial seconds. Apparently, within this package there is a silent assumption that minutes are represented as two digits, i.e. padded with zeros.
readr::parse_time("1:23.456", format="%M:%OS") # doesn't work
readr::parse_time("01:23.456", format="%M:%OS") # works
The ms function from lubridate handles this straight out of the box:
lubridate::ms("1:23.456")
Any workaround for this so I can use parse_time and other functions in readr without resorting to pad with zeros myself?
The format specifications can be found here:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/strptime.html and
https://readr.tidyverse.org/reference/parse_datetime.html
The issue here is that %M refers to minutes written as two digits, from "00" to "59". Note that this is exactly the specification you passed, so it is part of the specified format rather than an assumption of the package. As far as I'm aware, there is no minutes specifier accepting varying string lengths that can be passed to the format argument. (This differs from the day specifier, which accepts a single digit.)
Lubridate's ms function uses a different method to parse time strings. It is more robust because it can handle many time formats, including the one in the question, without a format being given.
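If padding by hand is the part to avoid, one possible workaround is to pad programmatically, or to skip format strings and compute the duration directly. This is only a base R sketch; the vector x and the assumption that inputs always look like minutes:seconds are mine.

```r
x <- c("1:23.456", "01:23.456", "12:03.5")

# Option 1: left-pad a single-digit minute so that "%M:%OS" can match
padded <- sub("^([0-9]):", "0\\1:", x)

# Option 2: split on ":" and compute the duration in seconds
parts <- strsplit(x, ":", fixed = TRUE)
secs <- vapply(
  parts,
  function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
  numeric(1)
)

print(padded)
print(secs)
```

The padded strings can then be passed to readr::parse_time with the original format, while secs is already numeric and needs no further parsing.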

How to convert date and time into a numeric value

As a new and self-taught R user I am struggling with converting date and time character values into numbers so that I can group unique combinations of data. I'm hoping someone has come across this before and knows how I might go about it.
I'd like to convert a field of DateTime data (30/11/2012 14:35) to a numeric version of the date and time (seconds from 1970 maybe??) so that I can back-reference the date and time if needed.
I have searched the R help and online help and only seem to be able to find POSIXct and strptime, which seem to convert the other way in the examples I've seen.
I will need to apply the conversion to a large dataset so I need to set the formatting for a field not an individual value.
I have tried to modify some python code but to no avail...
Any help with this, including pointers to tools I should read about would be much appreciated.
You can do this with base R just fine, but there are some shortcuts for common date formats in the lubridate package:
library(lubridate)
d <- dmy_hm("30/11/2012 14:35")  # day/month/year with hours and minutes
> as.numeric(d)
[1] 1354286100
From ?POSIXct:
Class "POSIXct" represents the (signed) number of seconds since the
beginning of 1970 (in the UTC timezone) as a numeric vector.
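For the base R route applied to a whole column, a minimal sketch might look like the following. The data frame df, its column names, and the choice of tz = "UTC" are assumptions for illustration; use the time zone your data was recorded in.

```r
# Hypothetical data frame with a character date-time column
df <- data.frame(
  DateTime = c("30/11/2012 14:35", "01/12/2012 09:00"),
  stringsAsFactors = FALSE
)

# Parse the whole column at once; tz = "UTC" is an assumption
df$parsed <- as.POSIXct(df$DateTime, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Seconds since 1970-01-01 00:00:00 UTC
df$secs <- as.numeric(df$parsed)

# Convert the numbers back to date-times when needed
df$back <- as.POSIXct(df$secs, origin = "1970-01-01", tz = "UTC")

print(df)
```

Because the numeric column is just seconds, it can be used directly for grouping or matching and turned back into a date-time afterwards.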

Want only the time portion of a date-time object in R

I have a vector of times in R, all_symbol$Time, and I am trying to find out how to get JUST the times (or convert the times to strings without losing information). I use
strptime(all_symbol$Time[j], format="%H:%M:%S")
which for some reason assumes the date is today and returns
[1] "2013-10-18 09:34:16"
Date and time formatting in R is quite annoying. I am trying to get the time only without adding too many packages (really any--I am on a school computer where I cannot install libraries).
Once you use strptime you will of necessity get a date-time object and the default behavior for no date in the format string is to assume today's date. If you don't like that you will need to prepend a string that is the date of your choice.
@James' suggestion is equivalent to what I was going to suggest:
format(all_symbol$Time[j], format="%H:%M:%S")
The only package I know of that has time classes (i.e time of day with no associated date value) is package:chron. However I find that using format as a way to output character values from POSIXt objects lends itself well to functions that require factor input.
In the decade since this was written there is now a package named “hms” that has some sort of facility for hours, minutes, and seconds.
hms: Pretty Time of Day
Implements an S3 class for storing and formatting time-of-day values, based on the 'difftime' class.
I came across the same problem recently and found this post and others, such as "R: How to handle times without dates?", inspiring. I'd like to contribute a little for whoever has similar questions.
If you only want to use base R, take advantage of as.POSIXct(..., format = "...") to transform your date-time into a standard format. Then, you can use substr to extract the time. e.g. substr("2013-10-01 01:23:45 UTC", 12, 16) gives you 01:23.
If you can use package lubridate, functions like mdy_hms will make life much easier. And substr works most of the time.
If you want to compare the time, it should work if they are in Date or POSIXt objects. If you only want the time part, maybe force it into numeric (you may need to transform it back later). e.g. as.numeric(hm("00:01")) gives 60, which means it's 60 seconds after 00:00:00. as.numeric(hm("23:59")) will give 86340.
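The base R ideas in this thread (format for a string, a number for comparisons) can be sketched without any extra packages. The example value and the tz = "UTC" setting are assumptions.

```r
x <- as.POSIXct("2013-10-18 09:34:16", tz = "UTC")

# Time of day as a character string
time_chr <- format(x, "%H:%M:%S")

# Time of day as seconds since midnight, which compares and sorts numerically
secs <- as.numeric(difftime(x, trunc(x, "days"), units = "secs"))

# Rebuild an "HH:MM:SS" string from the seconds
rebuilt <- sprintf(
  "%02d:%02d:%02d",
  secs %/% 3600, (secs %% 3600) %/% 60, secs %% 60
)

print(time_chr)
print(secs)
print(rebuilt)
```

The sprintf step assumes whole seconds; with fractional seconds the last field would need a different format.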

Specific date format conversion problems in R

Basically I want to know why as.Date(200322,format="%Y%W") gives me NA. While we are at it, I would appreciate any advice on a data structure for repeated cross-section (aka pseudo-panel) in R.
I did get aggregate() to (sort of) work, but it is not flexible enough - it misses data on columns when I omit the missed values, for example.
Specifically, I have a survey that is repeated weekly for a couple of years with a bunch of similar questions answers to which I would like to combine, average, condition and plot in both dimensions. Getting the date conversion right should presumably help me towards my goal with zoo package or something similar.
Any input is appreciated.
Update: thanks for string suggestion, but as you can see in your own example, %W part doesn't work - it only identifies the year while setting the current day while I need to set a specific week (and leave the day blank).
Use a string as first argument in as.Date() and select a specific weekday (format %w, value 0-6). There are seven possible dates in each week, therefore strptime needs more information to select a unique date. Otherwise the current day and month are returned.
> as.Date(paste("200947", "0", sep="-"), format="%Y%W-%w")
[1] "2009-11-22"
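For a weekly survey, the same idea can be applied to a whole vector of year-week labels by pinning every week to one fixed weekday. The sketch below uses Monday (weekday 1); the labels in yw are made up, and how %W handles edge cases such as week 00 is worth checking against your own data.

```r
# Year-week labels as they might come from a weekly survey (hypothetical)
yw <- c("200322", "200947")

# Append weekday 1 (Monday) so that each label identifies exactly one date
d <- as.Date(paste(yw, "1", sep = "-"), format = "%Y%W-%w")

# Going the other way: label a known Monday with its year and %W week number
label <- format(as.Date("2009-11-23"), "%Y%W")

print(d)
print(label)
```

Having one real Date per week makes it easier to hand the series to zoo or to aggregate by week.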

C#: Convert AS/400 date into DateTime

Dates in DB2 AS/400 are an integer, containing the number of days since sometime around the turn of the 20th century.
Question 1: Does anyone know the IBM DB2/AS400 "zero" date? e.g.:
12/30/1899
12/31/1899
1/1/1900
Question 2: Given an "AS/400" date (e.g. 40010) how can you convert that to a CLR DateTime?
DateTime d = new DateTime(40010); //invalid
Some other "zero" dates are:
OLE Automation: 12/30/1899
SQL Server: 1/1/1900
I don't think AS/400 dates are stored internally as some number of days from an epoch date [1] (this is the more common term for what you are calling "zero date"). As Tracy Probst said, this is definitely NOT what date fields in native AS/400 physical files look like. [2]
But that's immaterial if whatever method you are using to extract the data is giving it to you as the number of days since an epoch. Ideally, you should find out what the intended date is by looking directly at the AS/400, or asking someone who can. If the date on the AS/400 is 2009-07-30 and what you are getting is 40022, then you can be pretty confident the epoch date is Jan 1, 1900. If you are getting 40024, then the epoch is Dec 30, 1899. (Though it's of course best to compare a bunch of dates, preferably from different years to guard against possible use of Julian dates.)
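The arithmetic behind that check is easy to reproduce. Here is a small sketch in R (any language with date arithmetic would do) that computes the day count each candidate epoch would give for a known date; the variable names are illustrative only.

```r
# Date known from the AS/400 side
known <- as.Date("2009-07-30")

# Candidate epochs mentioned in the question
epochs <- as.Date(c("1899-12-30", "1899-12-31", "1900-01-01"))

# Day count each epoch would produce for the known date
offsets <- as.numeric(known - epochs)
names(offsets) <- format(epochs)

# Once the epoch is settled, convert a raw value (1900-01-01 assumed here)
converted <- as.Date(40010, origin = "1900-01-01")

print(offsets)
print(converted)
```

Whichever offset matches the raw value you actually receive tells you which epoch is in use.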
Also, as Tracy commented on his own answer, it's exceedingly common for dates to be stored in generic numeric fields (which is what I would guess if your retrieval method is reporting Decimal as the data type), in which case it really has nothing to do with DB2's internal date format anyway. You should be aware that by far the most common date formats stored in AS/400 numeric fields are the following, or variations thereof:
yyyymmdd (Gregorian, ISO 4-digit year)
mmddyy (Gregorian, U.S. 2-digit year)
yyyyddd (so-called Julian, 4-digit year)
yyddd (so-called Julian, 2-digit year)
yymmdd
cyymmdd (IBM's crazy invention with century flag)
The ddd in the Julian dates is the number of days from the beginning of year. The c in IBM's crazy date is 0 for 19yy or 1 for 20yy. I have not heard of anyone who stores days-since-epoch on "The Four Hundred" but maybe you've encountered a convert from another platform. The mainframe heritage of the AS/400 strongly favors human-readable dates.
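To make the formats concrete, here is a minimal R sketch decoding three of them. The raw values are hypothetical and all chosen to encode 2009-07-30; the cyymmdd trick relies on the century flag described above.

```r
# Hypothetical raw numeric values, all meant to encode 2009-07-30
yyyymmdd <- 20090730
yyyyddd  <- 2009211
cyymmdd  <- 1090730

# Gregorian, 4-digit year
d1 <- as.Date(sprintf("%.0f", yyyymmdd), format = "%Y%m%d")

# So-called Julian: %j is the day of the year (001-366)
d2 <- as.Date(sprintf("%.0f", yyyyddd), format = "%Y%j")

# c = 0 means 19yy and c = 1 means 20yy, so adding 19000000 yields yyyymmdd
d3 <- as.Date(sprintf("%.0f", cyymmdd + 19000000), format = "%Y%m%d")

print(c(d1, d2, d3))
```

The two-digit-year formats (mmddyy, yyddd, yymmdd) additionally need a rule for choosing the century, which depends on the data.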
[1] The AS/400 (now called IBM i) does have its own data type for dates, and this data type actually does consist internally of a number of days from an epoch. But that epoch is many thousands of years in the past, not somewhere near the turn of the 20th century, and not even near the beginning of the Common Era. IBM likes to call this number of days the Scaliger number, but for most people who study this stuff, it's called the Julian Day Number. As you may have noticed from the main part of my answer, IBM uses the word "Julian" to mean something completely different (and not even related to the Julian calendar). Namely, IBM's so-called "Julian date" is really the ordinal date from ISO 8601.
[2] The internal format of the date data type is very low-level and mostly hidden from the user (including most programmers). The DSPPFM command, which ostensibly shows the "actual contents" of a file, is at least one step "too late": the value it reports has already been converted from the internal, 4-byte "Scaliger number" to a human-readable form.
Question 1:
I have no idea what the start date is for DB2. Google isn't very helpful anyway. Don't you have any sample data you could use to figure it out?
Update: are you sure the date is stored as a number of days? I found this page that suggests otherwise.
Question 2:
Assuming 1900-01-01 as the start date in this example, where days is the AS/400 date value.
DateTime myDate = new DateTime(1900, 1, 1).AddDays(days);
I don't know the answer for 1. But for 2, you can do something like this:
private DateTime AS400 = new DateTime(1900, 1, 1);
...
DateTime myClrDT = AS400.AddDays(days);
Question 1:
As far as I can tell, there is no "zero date" in an AS/400 physical file. If I do a DSPPFM on a physical file with a timestamp field in it, the value is stored as a readable timestamp in the format yyyy-MM-ddhh.mm.ss. For example: "2005-08-0207.06.33" for 08/02/2005 at 7:06:33 AM. There can be a zero date within a particular programming language, and that's really where you need to focus. The AS/400 ODBC driver returns the date in a SQL_TYPE_TIMESTAMP field.
Question 2:
It should be as simple as:
DateTime d = Convert.ToDateTime(reader["DateField"]);
I invite other C# experts to edit the response with better C# code.
I have just 5 months of experience with DB2 (working on AS/400), so I can only show you how we work with dates. It's true that we use a 'zero' date when calculating date fields. In our system, the 'zero' date is 12/31/1971 0:00.
I don't know if this is the only 'zero' date on the AS/400.
In our system files, a date is stored as the number of days from the 'zero' date (length = 5).
So every time we read the date field from a file, we convert it to get the date in the format dd/mm/yyyy or yyyy-mm-dd (depending on the environment where the query runs). The function is:
date(field+719892), where field is the field holding the stored date and 719892 is the number of days added to each unconverted date (it appears to be the number of days between some date x and 12/31/1971; you can calculate x).
I'll give you one more example:
select date(15+719892) as date1 from library1.file1
The result is: date1=1972-01-15
marc_s had a comment that confused the "zero" dates with "minimum" dates in SQL Server. Just so everyone gets to see the example:
SELECT
CAST(0 AS datetime) AS dateTimeZero,
CAST(0 AS smalldatetime) AS smallDateTimeZero
dateTimeZero smallDateTimeZero
======================= ===================
1900-01-01 00:00:00.000 1900-01-01 00:00:00
