strptime returning NA - r

I am getting NA value for my date string using strptime in R.
I looked at various existing answers, but none of them worked for me.
Here is my code:
startDate=strptime("Wed May 25 01:51:32 UTC 2016", format="%a %B %d %H:%m:%S %Z %Y", tz="UTC")
print(startDate)
Any help would be appreciated.

"%H:%m:%S" should be "%H:%M:%S". Once you change that, you'll get an error because %Z is not valid for input.
If all the datetime strings have UTC timezone, this will work:
R> strptime("Wed May 25 01:51:32 UTC 2016", "%a %B %d %H:%M:%S UTC %Y", "UTC")
[1] "2016-05-25 01:51:32 UTC"
If not, then you can extract the year and prepend it to the string, because strptime will ignore all characters after those specified by the format string.
R> dts <- "Wed May 25 01:51:32 UTC 2016"
R> dtf <- "%Y %a %B %d %H:%M:%S"
R> strptime(paste(substring(dts, nchar(dts)-3), dts), dtf, "UTC")
[1] "2016-05-25 01:51:32 UTC"
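A minimal sketch of why the original format string returned NA, using a time-only string so that no locale-dependent day or month names are involved. The variable names `bad` and `good` are only illustrative. `%m` is the month (01-12) while `%M` is the minute (00-59), so "51" cannot be parsed by `%m`:

```r
# %m expects a month number, %M expects minutes.
bad  <- strptime("01:51:32", format = "%H:%m:%S", tz = "UTC")
good <- strptime("01:51:32", format = "%H:%M:%S", tz = "UTC")

is.na(bad)    # TRUE: "51" is not a valid month
is.na(good)   # FALSE

# Only the time fields were supplied, so print just those.
format(good, "%H:%M:%S")   # "01:51:32"
```
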

Related

Adjust data time zone in R

For some reason I can't adjust the time zone by as.POSIXlt.
time <- "Wed Jun 22 01:53:56 +0000 2016"
t <- strptime(time, format = '%a %b %d %H:%M:%S %z %Y')
t
[1] "2016-06-21 21:53:56"
Can't change the time zone
as.POSIXlt(t, "EST")
[1] "2016-06-21 21:53:56"
as.POSIXlt(t, "Australia/Darwin")
[1] "2016-06-21 21:53:56"
Can change the time zone for Sys.time()
as.POSIXlt(Sys.time(), "EST")
[1] "2016-09-26 01:47:22 EST"
as.POSIXlt(Sys.time(), "Australia/Darwin")
[1] "2016-09-26 16:19:48 ACST"
How to solve it?
Try this:
time <- "Wed Jun 22 01:53:56 +0000 2016"
strptime(time, format = '%a %b %d %H:%M:%S %z %Y')
#[1] "2016-06-22 07:23:56"
strptime(time, format = '%a %b %d %H:%M:%S %z %Y', tz="EST")
#[1] "2016-06-21 20:53:56"
strptime(time, format = '%a %b %d %H:%M:%S %z %Y', tz="Australia/Darwin")
#[1] "2016-06-22 11:23:56"
strptime returns a POSIXlt object. Calling as.POSIXlt on t just returns t. There is no as.POSIXlt.POSIXlt method, so as.POSIXlt.default is dispatched. And you can see the first if statement checks if x inherits the POSIXlt class, and returns x if that's true.
str(t)
# POSIXlt[1:1], format: "2016-06-21 20:53:56"
print(as.POSIXlt.default)
# function (x, tz = "", ...)
# {
# if (inherits(x, "POSIXlt"))
# return(x)
# if (is.logical(x) && all(is.na(x)))
# return(as.POSIXlt(as.POSIXct.default(x), tz = tz))
# stop(gettextf("do not know how to convert '%s' to class %s",
# deparse(substitute(x)), dQuote("POSIXlt")), domain = NA)
# }
# <bytecode: 0x2d6aa18>
# <environment: namespace:base>
You either need to use as.POSIXct instead of strptime and specify the timezone you want, then convert to POSIXlt:
ct <- as.POSIXct(time, tz = "Australia/Darwin", format = "%a %b %d %H:%M:%S %z %Y")
t <- as.POSIXlt(ct)
Or use strptime, convert t to POSIXct (which pins down the instant in time), and then convert back to POSIXlt in the time zone you want:
t <- strptime(time, format = "%a %b %d %H:%M:%S %z %Y")
t <- as.POSIXlt(as.POSIXct(t), tz = "Australia/Darwin")
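As a rough illustration of the POSIXct route, the sketch below parses once and then displays the same instant in several zones. It assumes a numeric-only timestamp (`stamp` is a made-up variant of the string in the question) so that the result does not depend on the session locale, and it assumes the "Australia/Darwin" zone is available in the system's time zone database:

```r
# Hypothetical numeric-only timestamp carrying the same instant as the question's string.
stamp <- "2016-06-22 01:53:56 +0000"

# %z reads the numeric UTC offset; the result is stored as an instant (POSIXct).
ct <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S %z", tz = "UTC")

# The same instant shown in different zones; ct itself is not modified.
format(ct, tz = "UTC", usetz = TRUE)
format(ct, tz = "Australia/Darwin", usetz = TRUE)   # 11:23:56 on 2016-06-22

# Converting to POSIXlt with an explicit tz gives broken-down fields in that zone.
darwin <- as.POSIXlt(ct, tz = "Australia/Darwin")
darwin$hour   # 11
```

Keeping the value as POSIXct and choosing the zone only when formatting or converting avoids the situation in the question, where the POSIXlt object was returned unchanged.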

How to convert string to date format in R

I have a column of strings in the following format:
Wed, 6 Dec 2000 08:47:00 -0800 (PST)
How can I convert this into date format using lubridate or another package? I have done this before, but there was no -0800 (PST) at the end.
Thank you.
I was able to get a result using strptime() without even worrying about the timezone name at the end:
> x <- "Wed, 6 Dec 2000 08:47:00 -0800 (PST)"
> strptime(x, "%a, %d %b %Y %H:%M:%S %z")
[1] "2000-12-07 00:47:00"
However, if you want to remove the timezone name, you can use substr() to do this:
> strptime(substr(x, 1, nchar(x)-6), "%a, %d %b %Y %H:%M:%S %z")
[1] "2000-12-07 00:47:00"
We can also use parse_date_time
library(lubridate)
parse_date_time(x, "adbY HMS z", tz = "US/Pacific")
#[1] "2000-12-06 08:47:00 PST"
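The base R outputs above are printed in the answerer's local time zone, which can be confusing. A small sketch, assuming you would rather see the result in UTC: passing `tz = "UTC"` makes the output independent of the machine, and temporarily switching `LC_TIME` to the "C" locale is an extra precaution (not part of the original answers) so that "Wed" and "Dec" are recognised in any session:

```r
# Use the C locale for English day/month names, and remember the old setting.
old <- Sys.setlocale("LC_TIME", "C")

x <- "Wed, 6 Dec 2000 08:47:00 -0800 (PST)"

# %z consumes "-0800"; the trailing " (PST)" lies past the end of the format and is ignored.
ct <- as.POSIXct(x, format = "%a, %d %b %Y %H:%M:%S %z", tz = "UTC")
format(ct, usetz = TRUE)   # "2000-12-06 16:47:00 UTC"

# Restore the previous locale.
Sys.setlocale("LC_TIME", old)
```
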

How do I specify POSIX (time) format for 3 letter tz in R, in order to ignore it?

For output, the specification is %Z (see ?strptime). But for input, how does that work?
To clarify, it'd be great for the time zone abbreviation to be parsed into useful information by as.POSIXct(), but more central to the question is how to get the function to at least ignore the time zone.
Here is my best workaround, but is there a particular format code to pass to as.POSIXct() that will work for all time zones?
times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")
as.POSIXct(times, format="%a %b %d %H:%M:%S %Z %Y") # nope! strptime can't handle %Z in input
formats <- paste("%a %b %d %H:%M:%S", gsub(".+ ([A-Z]{3}) [0-9]{4}$", "\\1", times),"%Y")
as.POSIXct(times, format=formats) # works
Edit: Here is the output from the last line, as well as its class (from a separate call); the output is as expected. From the console:
> as.POSIXct(times, format=formats)
[1] "2015-07-03 00:15:00 EDT" "2015-07-03 00:15:00 EDT"
> attributes(as.POSIXct(times, format=formats))
$class
[1] "POSIXct" "POSIXt"
$tzone
[1] ""
The short answer is, "no, you can't." Those are abbreviations and they are not guaranteed to uniquely identify a specific timezone.
For example, is "EST" Eastern Standard Time in the US or Australia? Is "CST" Central Standard Time in the US or Australia, or is it China Standard Time, or is it Cuba Standard Time?
I just noticed that you're not trying to parse the timezone abbreviation, you are simply trying to avoid it. I don't know of a way to tell strptime to ignore arbitrary characters. I do know that it will ignore anything in the character representation of the time after the end of the format string. For example:
R> # The year is not parsed, so the current year is used
R> as.POSIXct(times, format="%a %b %d %H:%M:%S")
[1] "2015-07-03 00:15:00 UTC" "2015-07-03 00:15:00 UTC"
Other than that, a regular expression is the only thing I can think of that solves this problem. Unlike your example, I would use the regex on the input character vector to remove all 3-5 character timezone abbreviations.
R> times_no_tz <- gsub(" [[:upper:]]{3,5} ", " ", times)
R> as.POSIXct(times_no_tz, format="%a %b %d %H:%M:%S %Y")
[1] "2015-07-03 00:15:00 UTC" "2015-07-03 00:15:00 UTC"
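A slightly stricter variant of the regex approach, sketched under the assumption that the abbreviation always sits directly before a four-digit year at the end of the string. The helper name `strip_tz_abbr` is made up for illustration, and `tz = "UTC"` is an arbitrary choice: the abbreviation is discarded, so both inputs end up as the same clock time in UTC.

```r
# Hypothetical helper: drop a 3-5 letter upper-case abbreviation that
# immediately precedes a trailing 4-digit year.
strip_tz_abbr <- function(x) {
  sub(" [[:upper:]]{3,5} ([0-9]{4})$", " \\1", x)
}

# Use the C locale so that "Fri" and "Jul" are recognised in any session.
old <- Sys.setlocale("LC_TIME", "C")

times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")
cleaned <- strip_tz_abbr(times)   # "Fri Jul 03 00:15:00 2015" for both elements
parsed <- as.POSIXct(cleaned, format = "%a %b %d %H:%M:%S %Y", tz = "UTC")
format(parsed, usetz = TRUE)

# Restore the previous locale.
Sys.setlocale("LC_TIME", old)
```
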

Conversion from character to date/time returns NA

I often use as.POSIXct to convert characters to POSIXct, but I get NA sometimes and I don't know why. For example:
DATE <- "Fri Apr 10 11:57:47 2015"
DATE_in_posix <- as.POSIXct(DATE, format="%a %b %d %H:%M:%S %Y")
I tried this too:
DATE_in_posix <- as.POSIXct(DATE, format="%a %h %d %H:%M:%S %Y")
But result for both is always:
> DATE_in_posix
[1] NA
Maybe the input for as.POSIXct is too long? And when it's too long what could be the solution?
It's probably because "Fri" and "Apr" are not the correct abbreviations in your locale.
Use Sys.setlocale("LC_TIME", locale) to set your R session's locale to one that will correctly interpret English abbreviations. See the Examples section of ?Sys.setlocale for how to specify locale in the above function call.
For example, on my Ubuntu machine it would be:
> Sys.setlocale("LC_TIME", "en_US.UTF-8")
> as.POSIXct("Fri Apr 10 11:57:47 2015", format="%a %b %d %H:%M:%S %Y")
[1] "2015-04-10 11:57:47 CDT"
Thanks a lot Henrik!!!
I changed the LC_TIME category like this, and now it works:
Sys.getlocale(category = "LC_TIME")
[1] "German_Germany.1252"
Sys.setlocale("LC_TIME", "English")
[1] "English_United States.1252"
DATE_in_posix<-as.POSIXct(DATE,format="%a %b %d %H:%M:%S %Y")
> DATE_in_posix
[1] "2015-04-10 11:57:47 CEST"
And strptime now works too, of course:
DATE_in_posix<-strptime(DATE,format="%a %b %d %H:%M:%S %Y")
> DATE_in_posix
[1] "2015-04-10 11:57:47 CEST"
Thank you so much guys and have a nice weekend!
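If you would rather not change the locale for the whole session, one option is to switch it only while parsing. This is a sketch, not something from the answers above: `parse_english` is a hypothetical helper, and it assumes the "C" locale (which uses English day and month names) is acceptable for your input.

```r
# Hypothetical helper: parse English date strings regardless of the session locale.
parse_english <- function(x, format, tz = "UTC") {
  old <- Sys.getlocale("LC_TIME")
  # Restore the caller's locale even if parsing fails.
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  as.POSIXct(x, format = format, tz = tz)
}

DATE <- "Fri Apr 10 11:57:47 2015"
res <- parse_english(DATE, "%a %b %d %H:%M:%S %Y")
format(res, usetz = TRUE)   # "2015-04-10 11:57:47 UTC"
```

The default `tz = "UTC"` is only there to make the printed result reproducible; pass your own zone name if the strings are in local time.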
