Strptime not able to convert date and time - r

My attempt to format date like this results in NA NA. Neither the date nor the time is getting converted. What am I doing wrong?
x <- strptime(c("2013-12-12", "08:43:24.967"),"%Y-%m-%d %H:%M:%OS")

With the format string that you have supplied, strptime expects a vector of date-time strings. You have a vector containing date and time as separate vector elements. This is incorrect.
Instead of passing c("2013-12-12", "08:43:24.967") (two elements, date then time), you need to pass "2013-12-12 08:43:24.967" (one element, date-time).
The data you have can be put in the proper format with paste:
strptime(paste("2013-12-12", "08:43:24.967"),format="%Y-%m-%d %H:%M:%OS")
[1] "2013-12-12 08:43:24"
The fractional seconds are not printed above because, with the default options(digits.secs=NULL), R prints no fractional digits. The parsed value still contains them. To display them, use an output format string such as "%OS3", or set the number of digits to print (e.g. options(digits.secs=3)).
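As a minimal sketch of the point above, the parsed value can be inspected directly to confirm that the fractional seconds were captured even though the default print hides them. The variable name x and the explicit tz = "UTC" are choices made here for illustration and are not part of the original answer.

```r
# Parse one combined date-time string. tz = "UTC" is an assumption made
# here only to keep the example independent of the local time zone.
x <- strptime(paste("2013-12-12", "08:43:24.967"),
              format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# strptime returns a POSIXlt object; its sec component keeps the fraction.
x$sec

# Ask for three fractional digits on output. The last printed digit can be
# affected by floating-point representation.
format(x, "%Y-%m-%d %H:%M:%OS3")
```

Checking x$sec is usually the quickest way to tell a parsing problem apart from a display setting.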

You need to pass a single date-time string as the first argument to strptime, followed by the format string. If you do not need the milliseconds, remove them from the string and use %S instead of %OS.
You can use a statement like this:
strptime("2013-12-12 08:43:24", "%Y-%m-%d %H:%M:%S")

Related

Behaviour of as.Date when format = ""

I have a simple question.
How does the function as.Date work?
Can anyone explain why a simple "" changes everything in the code below?
as.Date("03/04/2019")
#[1] "0003-04-20"
as.Date("03/04/2019", "")
#[1] "2019-07-16"
This is directly from the help page which you view with ?as.Date:
## S3 method for class 'character'
as.Date(x, format, tryFormats = c("%Y-%m-%d", "%Y/%m/%d"),
optional = FALSE, ...)
x: an object to be converted.
format: character string. If not specified, it will try tryFormats one by one on the first non-NA element, and give an error if none works. Otherwise, the processing is via strptime.
So, in the first case we do not provide format. In this case as.Date tries the following formats: "year-month-day" and "year/month/day". The second one "works" in this case so we get the following return:
as.Date("03/04/2019")
[1] "0003-04-20"
"03" is interpreted as the year, "04" as the month and "2019" as the day. Since a day can have at most two digits, only the first two digits ("20") are used.
In the second case we provide format ourselves:
as.Date("03/04/2019", format = "")
[1] "2019-07-16"
If format is provided, as.Date does not try any formats. Instead, the result of strptime("03/04/2019", format = "") is converted to a Date and returned.
In the help page of strptime you can find the following:
For strptime the input string need not specify the date completely: it
is assumed that unspecified seconds, minutes or hours are zero, and an
unspecified year, month or day is the current one.
So you provide a format but it does not contain anything so the current date is returned.
In any case, you can simply specify the format yourself:
as.Date("03/04/2019", format = "%d/%m/%Y")
[1] "2019-04-03"
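A short sketch of why the explicit format matters for a string like this one: the same text is a valid date under both a day-first and a month-first reading, and only the format string says which is meant. The variable names s, d_dmy, d_mdy and d_bad are hypothetical.

```r
s <- "03/04/2019"

# Day first: 3 April 2019
d_dmy <- as.Date(s, format = "%d/%m/%Y")
format(d_dmy, "%Y-%m-%d")

# Month first: 4 March 2019
d_mdy <- as.Date(s, format = "%m/%d/%Y")
format(d_mdy, "%Y-%m-%d")

# A string that does not match the supplied format yields NA
# instead of a silently wrong date.
d_bad <- as.Date("2019-04-03", format = "%d/%m/%Y")
is.na(d_bad)
```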

Parsing dates in R from strings with multiple formats

I have a tibble in R with about 2,000 rows. It was imported from Excel using read_excel. One of the fields is a date field: dob. It imported as a string, and has dates in three formats:
"YYYY-MM-DD"
"DD-MM-YYYY"
"XXXXX" (ie, a five-digit Excel-style date)
Let's say I treat the column as a vector.
dob <- c("1969-02-02", "1986-05-02", "34486", "1995-09-05", "1983-06-05",
"1981-02-01", "30621", "01-05-1986")
I can see that I probably need a solution that uses both parse_date_time and as.Date.
If I use parse_date_time:
library(lubridate)
dob_fixed <- parse_date_time(dob, c("ymd", "dmy"))
This fixes them all, except the five-digit one, which returns NA.
I can fix the five-digit one, by using as.integer and as.Date:
dob_fixed2 <- as.Date(as.integer(dob), origin = "1899-12-30")
Ideally I would run one and then the other, but because each returns NA on the strings that don't work I can't do that.
Any suggestions for doing all? I could simply change them in Excel and re-import, but I feel like that's cheating!
We create a logical index from the NA values left by the first run and use it to select the elements for the second run:
i1 <- is.na(dob_fixed)
dob_fixed[i1] <- as.Date(as.integer(dob[i1]), origin = "1899-12-30")
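An alternative sketch that uses base R only: instead of filling in the NA values afterwards, each string is classified by its shape first and then parsed with the matching rule. This is a different technique from the NA-index approach above, and the helper name parse_mixed_dates is hypothetical. It assumes the column contains only the three formats listed in the question.

```r
dob <- c("1969-02-02", "1986-05-02", "34486", "1995-09-05", "1983-06-05",
         "1981-02-01", "30621", "01-05-1986")

# Hypothetical helper: classify each string by pattern, then parse it.
parse_mixed_dates <- function(x) {
  out <- rep(as.Date(NA), length(x))

  is_ymd    <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  is_dmy    <- grepl("^[0-9]{2}-[0-9]{2}-[0-9]{4}$", x)
  is_serial <- grepl("^[0-9]{5}$", x)

  out[is_ymd] <- as.Date(x[is_ymd], format = "%Y-%m-%d")
  out[is_dmy] <- as.Date(x[is_dmy], format = "%d-%m-%Y")
  # Excel serial dates count days from 1899-12-30, as in the question.
  out[is_serial] <- as.Date(as.integer(x[is_serial]), origin = "1899-12-30")

  out  # anything that matched no pattern stays NA
}

dob_fixed <- parse_mixed_dates(dob)
dob_fixed
```

Strings that fit none of the patterns remain NA, which makes unexpected formats easy to spot with is.na(dob_fixed).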

R: why a '%b.%Y' date class is not "Date"?

Sometimes I work with data like this:
sep-2018
From date like this:
Sys.Date()
[1] "2018-09-21"
To have this result, I generally use:
format(Sys.Date(),'%b-%Y')
But its class is not a date:
class(format(Sys.Date(),'%b-%Y'))
[1] "character"
Why is it not a date? Is it possible to get it with class "Date", and how?
An external library like zoo behaves the same way.
library(zoo)
class(format(as.yearmon(format(Sys.Date()), "%Y-%m-%d"), "%b.%Y"))
[1] "character"
Using "%m.%Y" also produces a character string, but it does not create the same ordering issue, for example.
The format command takes the date and outputs a printable string based on the format you provide. To quote the documentation:
An object of similar structure to x containing character representations of the
elements of the first argument x in a common format, and in the current
locale's encoding.
Also, a Date variable is stored as a numeric type internally (number of days since 1970-01-01)
dput(Sys.Date())
#structure(17795, class = "Date")
structure(0, class = "Date")
#[1] "1970-01-01"
So to pinpoint the date, you need day, month and year fields. If you don't have all three, it will probably return NA or an error. Similarly for time classes. If you don't have the data then you can just use some dummy values, and use format to print only the fields you want.
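A minimal sketch of the dummy-value idea, assuming the month and year arrive as "YYYY-MM" strings (the vector ym is made up for illustration). A fixed day of 01 is pasted on so that a real Date can be built, and format is used only at display time.

```r
ym <- c("2018-09", "2017-12", "2018-01")

# Add a dummy day so the string identifies a full calendar date.
d <- as.Date(paste0(ym, "-01"), format = "%Y-%m-%d")
class(d)

# Dates sort chronologically; format is applied only for display.
shown <- format(sort(d), "%m.%Y")
shown
```

Keeping the column as Date and formatting late avoids the ordering problems that come with sorting month-year strings as text.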
As Rohit says, format does not output a Date object, but a string in the format of your choice.
To get a Date object from a string like "sep-2018" you could use readr::parse_date().
(my_date <- readr::parse_date("sep-2018", format = '%b-%Y'))
#[1] "2018-09-01"
class(my_date)
#[1] "Date"

How do I change the format of a char vector containing milliseconds to timeseries vector in R

I have a DF in R which has two character columns. The first column is a time series array and the second column contains continuous numbers. The time series field has time recorded in milliseconds. I am trying to convert this array to a date array. However whichever method I use to convert the same, I lose the milliseconds information.
Following is the dataframe:
time = c("08-08-2016 09:16:33.430","08-08-2016 09:16:37.930")
values <- c(45,21)
my_data <- data.frame(time,values)
I would like to preserve the millisecond information. However, when I convert the time char array using the following method, I lose the milliseconds (output: 2016-08-08 09:16:33, 2016-08-08 09:16:37).
my_data$time=strptime(my_data$time,format="%m-%d-%Y %H:%M:%S.%OS")
I also tried using as.POSIXct, as.Date functions but could not resolve. Can someone please help?
Use %OS instead of %S, not in addition to it. "%m-%d-%Y %H:%M:%OS" is the format string required:
options(digits.secs=6)
as.POSIXct(my_data$time, format="%m-%d-%Y %H:%M:%OS")
#[1] "2016-08-08 09:16:33.43 AEST" "2016-08-08 09:16:37.93 AEST"
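To check that the milliseconds survive the conversion regardless of how they are printed, one option is to look at the difference between the two parsed times. This is a sketch under the assumption of a fixed time zone (tz = "UTC" is chosen here for reproducibility); the names parsed and gap are hypothetical.

```r
time <- c("08-08-2016 09:16:33.430", "08-08-2016 09:16:37.930")
values <- c(45, 21)

# %OS reads the seconds together with their fractional part.
parsed <- as.POSIXct(time, format = "%m-%d-%Y %H:%M:%OS", tz = "UTC")
my_data <- data.frame(time = parsed, values = values)

# The gap between the two timestamps is 4.5 seconds only if the
# fractional seconds were kept.
gap <- as.numeric(difftime(my_data$time[2], my_data$time[1], units = "secs"))
gap
```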
You have a standard-enough format so that the anytime package can parse this automagically without additional input from you:
library(anytime)
timevec <- c("08-08-2016 09:16:33.430","08-08-2016 09:16:37.930")
anytime(timevec)
[1] "2016-08-08 09:16:33.43 CDT" "2016-08-08 09:16:37.93 CDT"
I tend to have options(digits.secs=6) set by default which is why the display also shows the fractional seconds.

error in getting the correct date using strptime in R

I'm using strptime to extract date and the result is a wrong year
Where is the error in the below code:
strptime('8/29/2013 14:13', "%m/%d/%y")
[1] "2020-08-29 PDT"
What are the other ways to extract date and time as separate columns.
The data I have is in this format - 8/29/2013 14:13
I want to split this into two columns, one is 8/29/2013 and the other is 14:13.
You have a four-digit year, so you need to use %Y (%y reads only two digits, which is why "20" became 2020):
strptime('8/29/2013 14:13', "%m/%d/%Y" )
[1] "2013-08-29 CEST"
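The difference between the two specifiers can be seen side by side in the sketch below. The names wrong and right, the explicit tz = "UTC", and the added %H:%M in the second format (so that the time is parsed as well) are choices made here for illustration.

```r
s <- "8/29/2013 14:13"

# %y consumes only two digits ("20"), so the year becomes 2020 and the
# rest of the string is ignored.
wrong <- strptime(s, "%m/%d/%y", tz = "UTC")
format(wrong, "%Y")

# %Y consumes the full four-digit year; %H:%M also picks up the time.
right <- strptime(s, "%m/%d/%Y %H:%M", tz = "UTC")
format(right, "%Y-%m-%d %H:%M")
```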
Do you really want date and time in separate columns? It is usually much easier to deal with a single date-time object.
Here's one possibility to separate time and date from the string.
For convenience, we could first convert the string into a POSIX object:
datetime <- '8/29/2013 14:13'
datetime.P <- as.POSIXct(datetime, format='%m/%d/%Y %H:%M')
Then we can use as.Date() to extract the date from this object and use format() to display it in the desired format:
format(as.Date(datetime.P),"%m/%d/%Y")
#[1] "08/29/2013"
To store the time separately we can use, e.g., the strftime() function:
strftime(datetime.P, '%H:%M')
#[1] "14:13"
strftime() is vectorized, so the same calls also work when datetime is a vector of several character strings with date and time in the format described in the OP; no loop is needed.
Example
datetime <- c('8/29/2013 14:13', '9/15/2014 12:03')
datetime.P <- as.POSIXct(datetime, format='%m/%d/%Y %H:%M')
format(as.Date(datetime.P),"%m/%d/%Y")
#[1] "08/29/2013" "09/15/2014"
strftime(datetime.P, '%H:%M')
#[1] "14:13" "12:03"
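For the two-column layout asked about in the question, a minimal sketch on a data frame could look like this. The data frame df and its column names are hypothetical, and tz = "UTC" is an assumption made so that the result does not depend on the local time zone. Both pieces are taken from the parsed object with format(), which avoids a time zone shift that can occur when going through as.Date() on a POSIXct value.

```r
df <- data.frame(datetime = c("8/29/2013 14:13", "9/15/2014 12:03"),
                 stringsAsFactors = FALSE)

# Parse once, then derive both columns from the parsed value.
parsed <- as.POSIXct(df$datetime, format = "%m/%d/%Y %H:%M", tz = "UTC")

df$date <- format(parsed, "%m/%d/%Y")  # character column, e.g. "08/29/2013"
df$time <- format(parsed, "%H:%M")     # character column, e.g. "14:13"
df
```

Both new columns are character strings; keeping parsed as an extra column preserves a real date-time value for sorting and arithmetic.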
Hope this helps.

Resources