Issue in converting date format to numeric format in R - r

I had a dataset that looked like this:
#df
id date
1 2016-08-30 10:46:46.810
I tried to remove the hour part and only keep the date. This function worked:
df$date <- format(as.POSIXct(strptime(df$date,"%Y-%m-%d %H:%M:%S")) ,format = "%Y-%m-%d")
and the date now looks like this:
id date
1 2016-08-30
This is what I was looking for. The problem is that I want to do some calculations on this data, so I have to convert it to a number:
temp <- as.numeric(df$date )
It gives me the following warning:
Warning message:
NAs introduced by coercion
and results in
NA
Does anyone know where the issue is?

It's pretty easy as you have a standard format (see ISO 8601) which inter alia the anytime package supports (and it supports other, somewhat regular formats):
R> library(anytime)
R> at <- anytime("2016-08-30 10:46:46.810")
R> at
[1] "2016-08-30 10:46:46.80 CDT"
R> ad <- anydate("2016-08-30 10:46:46.810")
R> ad
[1] "2016-08-30"
R>
The key, though, is to understand the relationship between the underlying date formats. You will have to read and try a bit more on that. Here, in essence we just have
R> as.Date(anytime("2016-08-30 10:46:46.810"))
[1] "2016-08-30"
R>
The anytime package has a few other tricks such as automagic conversion from integer, character, factor, ordered, ...
As for the second part of your question, you were so close, and then you spoiled it again with format(), which creates a character representation.
You almost always want Date representation instead:
R> ad <- as.Date(anytime("2016-08-30 10:46:46.810"))
R> as.integer(ad)
[1] 17043
R> as.numeric(ad)
[1] 17043
R> ad + 1:3
[1] "2016-08-31" "2016-09-01" "2016-09-02"
R>

Not format(). format gives you a character vector (string), and this confuses as.numeric because there are weird non-numeric characters in there. As far as the parser is concerned, you might as well have asked as.numeric("ripe red tomatoes").
Use as.Date() instead. e.g.
as.Date(as.POSIXct(df$date, format="%Y-%m-%d %H:%M:%S"))
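A minimal sketch of the difference, using the timestamp from the question. The variable names (stamp, as_text, as_date) are made up for illustration, and tz = "UTC" is an assumption added only to keep the result independent of the local time zone:

```r
# Timestamp from the question, kept as a plain string.
stamp <- "2016-08-30 10:46:46.810"

# format() returns character, so as.numeric() cannot parse it and yields NA.
as_text <- format(as.POSIXct(stamp, tz = "UTC"), "%Y-%m-%d")
suppressWarnings(as.numeric(as_text))  # NA

# as.Date() keeps a real Date, which is stored as days since 1970-01-01.
as_date <- as.Date(as.POSIXct(stamp, tz = "UTC"))
as.numeric(as_date)                    # 17043
```

The number 17043 is the same value the first answer obtains via anytime.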

Related

Date conversion problems in R

I have a date field in the third column of imported data as a string. I am trying to convert it to a proper date field.
mydata[1,3]
[1] 04/01/1957
The field is initially typed as a factor. I try to convert it to a date with:
mydata$Date <- as.Date(mydata$Date, "%m/%d/%y")
However, this seems to convert incorrectly. The new output is:
mydata[1,3]
[1] "2019-04-01"
The mistake is that you use %y instead of %Y.
%y is a two-digit year and %Y a four-digit year.
Check https://www.statmethods.net/input/dates.html
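To see the two specifiers side by side, here is a short sketch on the value from the question (the names raw, wrong and right are illustrative only):

```r
raw <- "04/01/1957"

# %y consumes only two digits ("19"), which is mapped to 2019;
# the trailing "57" is ignored.
wrong <- as.Date(raw, format = "%m/%d/%y")

# %Y consumes the full four-digit year.
right <- as.Date(raw, format = "%m/%d/%Y")

format(wrong)  # "2019-04-01"
format(right)  # "1957-04-01"
```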
It's a duplicate a few dozen times over. anytime::anydate(mydata[1,3]) will do it for you by first converting to character.
Here is your example, made reproducible:
R> dfact <- factor(c("04/01/1957", "05/01/1957", "06/01/1957"))
R> mydata <- data.frame(a=dfact, b=dfact, c=dfact)
R> mydata[1,3]
[1] 04/01/1957
Levels: 04/01/1957 05/01/1957 06/01/1957
R> anytime::anydate(mydata[1,3])
[1] "1957-04-01"
R>
Note also that your format string is wrong: you have a four-digit year, for which you need %Y instead of %y. And what you (as well as the other answers / comments) missed as well is that as.character() is mandatory here:
R> mydata[1,3]
[1] 04/01/1957
Levels: 04/01/1957 05/01/1957 06/01/1957
R> as.Date(mydata[1,3], "%m/%d/%Y") # used to be wrong, now works
[1] "1957-04-01"
R> as.Date(as.character(mydata[1,3]), "%m/%d/%Y") # better but more work
[1] "1957-04-01"
R> anytime::anydate(mydata[1,3]) # easiest
[1] "1957-04-01"
R>
Edit: I at first overlooked that R now seems to add the as.character() step which used to be mandatory. Small steps. anydate() still helps by allowing us to skip the format string for a fairly large number of possible formats.
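A small base-R sketch of why spelling out the as.character() step is still a reasonable habit with factors: the numbers stored inside a factor are level codes, not dates. The names codes and parsed are illustrative, not part of the original answer:

```r
dfact <- factor(c("04/01/1957", "05/01/1957", "06/01/1957"))

# The integer codes of a factor are level indices, not dates.
codes <- as.integer(dfact)   # 1 2 3

# Converting to character first makes the intent explicit on any R version.
parsed <- as.Date(as.character(dfact), format = "%m/%d/%Y")
format(parsed)               # "1957-04-01" "1957-05-01" "1957-06-01"
```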

Convert YYYYMM factor format to YYYY-MM format

I have data in YYYYMM format and I wish to convert it to YYYY-MM format.
Example: 201805 should become 2018-05.
How could I do this, please?
We can use as.yearmon from zoo to convert it to yearmon object and then do the format
library(zoo)
format(as.yearmon(as.character(v1), "%Y%m"), "%Y-%m")
#[1] "2018-05"
data
v1 <- 201805
I like the idea of using actual dates here. If the day component does not matter to you, you can arbitrarily set each of your dates to the first of the month. Then we can let R's date functions do the heavy lifting.
x <- "201805"
x <- paste0(x, "01")
x
[1] "20180501"
y <- format(as.Date(x, format = "%Y%m%d"), "%Y-%m-%d")
substr(y, 1, 7)
[1] "2018-05"
You could use regular expressions:
data <- "201805"
sub("(\\d{4})", "\\1-", data)
[1] "2018-05"
Another variant, using only lookarounds:
sub("(?<=\\d{4})(?=\\d{2})", "-", data, perl=TRUE)
How about the following? (I am assuming that the OP does not need to validate the variable's value here.)
val="201805"
sub("(..$)","-\\1",val)
Or, to perform the substitution only when the last two characters are digits, try the following:
val="201805"
sub("(\\d{2}$)","-\\1",val)
[1] "2018-05"
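The regular-expression answers above work on a single string. As a rough sketch, they could be wrapped in a helper that also accepts a factor or numeric column and returns NA for anything that is not a plausible YYYYMM value. The function name yyyymm_to_dash and its validation rule are assumptions made for this illustration, not something the answers propose:

```r
# Hypothetical helper: insert a hyphen into YYYYMM values.
yyyymm_to_dash <- function(x) {
  x <- as.character(x)                          # handles factor and numeric input
  ok <- grepl("^[0-9]{4}(0[1-9]|1[0-2])$", x)   # four-digit year, month 01-12
  out <- rep(NA_character_, length(x))
  out[ok] <- sub("^([0-9]{4})([0-9]{2})$", "\\1-\\2", x[ok])
  out
}

yyyymm_to_dash(factor(c("201805", "201812", "201813")))
# "2018-05" "2018-12" NA
```

Returning NA rather than stopping keeps the helper usable on a whole data frame column; whether that is the right behaviour depends on the data.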
Very similar to some of the others, but because I find the package useful I will mention it:
library(lubridate)
date <- "201805"
format(ymd(paste0(date,"01")), "%Y-%m")
Lubridate can make life easy if the formats start to vary.
Here is another option, albeit a longer one:
somestring <- "201805"
# replace the first four characters with "-", leaving "-05"
stringi::stri_sub(somestring, 1, 4) <- "-"
# take the year from a fresh copy of the input
somestring1 <- "201805"
somestring2 <- substring(somestring1, 1, 4)
# paste the year in front of "-05"
paste0(somestring2, somestring)
Result:
"2018-05"

R convert character "111213" into proper time which is "11:12:13"

R convert character "111213" into time "11:12:13".
strptime("111213", format="%H%m%s") gives NA
and
strptime("111213", "%H%m%s") gives 1970-01-01 01:00:13 CET
I think the canonical answer would be as in my comment:
format(strptime("111213", format="%H%M%S"), "%H:%M:%S")
#[1] "11:12:13"
where you can read ?strptime for all the details. format is a generic function, and in this specific case we are using format.POSIXlt.
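The attempt in the question fails because the conversion specifications are case sensitive: %m means month, and ?strptime documents lowercase %s as seconds since the epoch rather than seconds of the minute. A brief sketch of the corrected call follows; tz = "UTC" is an assumption added here so the result does not depend on the local time zone:

```r
raw <- "111213"

# %H = hour, %M = minute, %S = second.
parsed <- strptime(raw, format = "%H%M%S", tz = "UTC")

parsed$hour   # 11
parsed$min    # 12
parsed$sec    # 13

# Because no date was given, strptime() fills in today's date,
# so only the time fields are formatted here.
format(parsed, "%H:%M:%S")  # "11:12:13"
```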
Another solution is simply to manipulate the string:
paste(substring("111213", c(1,3,5), c(2,4,6)), collapse = ":")
#[1] "11:12:13"
This makes sense because your input is really not a Date-Time: there is no Date.
We can use
library(chron)
times(gsub("(.{2})(?=\\d)", "\\1:", "111213", perl = TRUE))
#[1] 11:12:13
To manipulate times, you can use the hms package.
By default, it works with the %H:%M:%S (or %X) format.
For your specific time format ("111213"), you need to go through the base function as.difftime:
hms::as_hms(as.difftime("111213", format = "%H%M%S"))  # as.hms() in older hms releases
#> 11:12:13
If we also include the date in a similar "integer" format, we can use the command:
strptime("20181017 112233", format="%Y%m%d %H%M%S")

Why is Date being returned as type 'double'?

I'm having some trouble working with the as.Date function in R. I have a vector of dates that I'm reading in from a .csv file that are coming in as a factor of integers or as character (depending on how I read in the file, but this doesn't seem to have anything to do with the issue), formatted as %m/%d/%Y.
I'm going through the file row by row, pulling out the date field and trying to convert it for use elsewhere using the following code:
tmpDtm <- as.Date(as.character(tempDF$myDate), "%m/%d/%Y")
This seems to give me what I want: for example, if I apply it to a starting value of 12/30/2014, I get the value "2014-12-30" returned. However, if I examine this value using typeof(), R tells me that its data type is 'double'. Additionally, if I try to bind it to other values using c() or cbind() and store the result in a data frame, it ends up stored as 16434, which looks to me like some internal storage value of a date. I'm pretty sure that is what it is, because if I try to convert that value again using as.Date(), it throws an error asking for an origin.
So, two questions: Is this as expected? If so, is there a more appropriate way to convert a date so that I actually end up with a date-typed object?
Thank you
Dates are internally represented as double, as you can see in the following example:
> typeof(as.Date("09/12/16", "%m/%d/%y"))
[1] "double"
It is still marked as class Date, as in
> class(as.Date("09/12/16", "%m/%d/%y"))
[1] "Date"
and because it is a double, you can do computations with it. But because it is of class Date, these computations lead to Dates:
> as.Date("09/12/16", "%m/%d/%y") + 1
[1] "2016-09-13"
> as.Date("09/12/16", "%m/%d/%y") + 31
[1] "2016-10-13"
EDIT
I asked about c() and cbind() because they can be associated with some strange behaviour. See the following example, where switching the order within c() changes not the type but the class of the result:
> c(as.Date("09/12/16", "%m/%d/%y"), 1)
[1] "2016-09-12" "1970-01-02"
> c(1, as.Date("09/12/16", "%m/%d/%y"))
[1] 1 17056
> class(c(as.Date("09/12/16", "%m/%d/%y"), 1))
[1] "Date"
> class(c(1, as.Date("09/12/16", "%m/%d/%y")))
[1] "numeric"
EDIT 2 - c() and cbind force objects to be of one type. The first edit shows an anomaly of coercion, but generally, the vector must be of one shared type. cbind shares this behavior because it coerces to matrix, which in turn coerces to a single type.
For more help on typeof and class see this link
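A compact sketch tying these points together with the date from the question. The object names d, m and df are illustrative, and using data.frame() instead of cbind() is offered here as one way to keep the class, not as the answer's own recommendation:

```r
d <- as.Date("12/30/2014", format = "%m/%d/%Y")

typeof(d)    # "double": how the value is stored
class(d)     # "Date": how the value behaves
unclass(d)   # 16434: days since 1970-01-01

# cbind() builds a matrix, which cannot carry the Date class.
m <- cbind(id = 1, date = d)

# data.frame() keeps each column's class.
df <- data.frame(id = 1, date = d)
class(df$date)  # "Date"

# Recovering a Date from the bare number needs the origin.
as.Date(16434, origin = "1970-01-01")  # "2014-12-30"
```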
This is as expected. You used typeof(); you probably should have used class():
R> Sys.Date()
[1] "2016-09-12"
R> typeof(Sys.Date()) # this more or less gives you how it is stored
[1] "double"
R> class(Sys.Date()) # whereas this gives you _behaviour_
[1] "Date"
R>
Minor advertisement: I have a new package anytime, currently in incoming at CRAN, which deals with this as it converts "anything" to POSIXct (via anytime()) or Date (via anydate()).
E.g.:
R> anydate("12/30/2014") # no format needed
[1] "2014-12-30"
R> anydate(as.factor("12/30/2014")) # converts from factor too
[1] "2014-12-30"
R>

convert factor to date with empty cells

I have a factor vector x looking like this:
""
"1992-02-13"
"2011-03-10"
""
"1998-11-30"
Can I convert this vector to a date vector (using as.Date())?
Trying the obvious way gives me:
> x <- as.Date(x)
Error in charToDate(x) :
character string is not in a standard unambiguous format
At the moment I solve this problem like this:
> levels(x)[1] <- NA
> x <- as.Date(x)
But this doesn't look too elegant...
Thank you in advance!
You simply need to tell as.Date what format to expect in your character vector:
xd <- as.Date(x, format="%Y-%m-%d")
xd
[1] NA "1992-02-13" "2011-03-10" NA "1998-11-30"
To illustrate that these are indeed dates:
xd[3] - xd[2]
Time difference of 6965 days
PS. This conversion using as.Date works regardless of whether your data is a character vector or a factor.
When you pull in the data with read.csv, or others, you can set
read.csv(...,na.strings=c(""))
to avoid having to deal with this entirely.
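As a sketch of that approach, the example below reads a small inline CSV in place of a real file. The column names id and date and the text = input are assumptions made so the example is self-contained:

```r
# Inline CSV standing in for a real file; the column names are made up.
csv <- "id,date\n1,\n2,1992-02-13\n3,2011-03-10\n4,\n5,1998-11-30"

# Empty cells are read as NA instead of "".
df <- read.csv(text = csv, na.strings = "", stringsAsFactors = FALSE)

# With NA in place of "", as.Date() takes its format from the first non-NA value.
df$date <- as.Date(df$date)
df$date
# NA "1992-02-13" "2011-03-10" NA "1998-11-30"
```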
I usually convert factors to a POSIX* type class using the function strptime. First argument is your vector and the second argument is the "pattern" by which the date/time is constructed (a % sign + a specific letter). You basically tell R that first you have a year, then you have a -, then a month and so on. See ?strptime for a full list of conversion specifications.
x <- factor(c("1992-02-13", "2011-03-10", "1998-11-30"))
(x.date <- strptime(x, format = "%Y-%m-%d"))
[1] "1992-02-13" "2011-03-10" "1998-11-30"
class(x.date)
[1] "POSIXlt" "POSIXt"
The same principle holds for as.Date. You tell R to "make this a date/time object and here are the instructions on how to make it".
(as.Date(x, "%Y-%m-%d"))
[1] "1992-02-13" "2011-03-10" "1998-11-30"

Resources