Character string with timezone convert to date - r

I am trying to convert a vector of dates that I read from a csv file using read.table. These were read as a vector of character strings, and I am trying to convert it to a date vector using as.Date.
The date vector has elements of the below type
dateString
"Wed Dec 11 00:00:00 ICT 2013"
On trying to convert using the below command,
as.Date(dateString,"%a %b %e %H:%M:%S %Z %Y")
Error in strptime(x, format, tz = "GMT") :
use of %Z for input is not supported
What would be the right format to use in strptime, or in as.Date?

Just use the anytime() function from the anytime package:
R> anytime::anytime("Wed Dec 11 00:00:00 ICT 2013")
[1] "2013-12-11 CST"
R>
There is also a utctime() variant that does not impose your local time, and much more. There have been a number of questions about this here by now, so just search.
And if you want a date, it works the same way:
R> anytime::anydate("Wed Dec 11 00:00:00 ICT 2013")
[1] "2013-12-11"
R>
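If adding a package is not an option, a base R sketch along the following lines may also work. It assumes the zone abbreviation is always a run of 3 to 5 capital letters between spaces, that the session uses an English (or C) LC_TIME locale so that "Wed" and "Dec" are recognised, and that dropping the zone is acceptable because only the calendar date is wanted. The names cleaned and result are just for illustration.

```r
dateString <- "Wed Dec 11 00:00:00 ICT 2013"

# Remove the zone abbreviation (assumed: 3-5 capital letters between spaces),
# since strptime() cannot read %Z on input.
cleaned <- sub(" [[:upper:]]{3,5} ", " ", dateString)

# Parse what is left; the time of day is read and then dropped by as.Date().
result <- as.Date(cleaned, format = "%a %b %d %H:%M:%S %Y")
format(result)
```

Note that this keeps the calendar date exactly as written in the string; no shift between zones is applied.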

Related

How to convert a string into a date-time object

The data is character and I want it to be date-time. I have the cheat sheet with me but there isn't any format that I can use that satisfies the weird date format. Any suggestions?
x <- 'Fri Dec 11 12:10:51 PST 2020'
You can use the anytime package
> library(anytime)
> anytime("Fri Dec 11 12:10:51 PST 2020")
[1] "2020-12-11 12:10:51 CST"
>
> class(anytime("Fri Dec 11 12:10:51 PST 2020"))
[1] "POSIXct" "POSIXt"
>
It has three key advantages:
it can guess the format (as here)
it converts all sorts of input formats (incl. character, factor, ...)
it is pretty fast (as the parser is C++ from Boost)
It is pretty standard for most methods to ignore the timezone attribute. So the PST became my local time, i.e. Central.
In base R, you could do:
x <- 'Fri Dec 11 12:10:51 PST 2020'
as.POSIXct(x, format = '%a %b %d %T PST %Y')
See ?strptime for detailed format specifications.
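The literal PST in that format string only matches the text; it does not tell R which zone the clock time belongs to, so the value is read in the session's local zone. A small sketch of making the zone explicit follows. Mapping PST to America/Los_Angeles is an assumption about the data, not something R infers, and an English LC_TIME locale is assumed for the day and month names.

```r
x <- "Fri Dec 11 12:10:51 PST 2020"

# "PST" in the format is matched as plain text; tz= supplies the actual zone.
# America/Los_Angeles is an assumed mapping for "PST".
p <- as.POSIXct(x, format = "%a %b %d %T PST %Y", tz = "America/Los_Angeles")

format(p, "%Y-%m-%d %H:%M:%S %Z")    # wall-clock time in the assumed zone
format(p, tz = "UTC", usetz = TRUE)  # the same instant expressed in UTC
```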

Converting date from 2019-07-04 14:01 +0000 to MM/dd/yyyy format

I am trying to convert the date format 2019-07-04 14:01 +0000 to mm/dd/yyyy format.
I am using this:
as.Date(strptime(d <- Twitter$time, "%b %d %Y %H:%M %p"))
I've also tried:
ymd_hms(Twitter$time)
However it returns NA values. Is there any way to convert this format to MM/dd/yyyy in R?
As we are not interested in the time component, convert the column to Date class with as.Date (no format is required here, as the input is already in the default format) and then use format to change how it is displayed:
format(as.Date(str1), "%m/%d/%Y")
#[1] "07/04/2019"
data
str1 <- "2019-07-04 14:01 +0000"
There are always two steps: parse, and format.
You can use as.Date() as shown or anydate() from the anytime package (which will also work for different input formats as shown here):
R> inp <- anytime::anydate(c("2019-07-04 14:01 +0000", "04-Jul-2019 14:02"))
R> inp
[1] "2019-07-04" "2019-07-04"
R> format(inp, "%m/%d/%Y")
[1] "07/04/2019" "07/04/2019"
R>
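If the offset at the end of the string should be honoured rather than ignored, one possible base R sketch is to parse with %z and then format. The vector times below is made up for illustration, with a second value chosen so that the offset changes the calendar date; tz = "UTC" is an arbitrary choice of display zone.

```r
times <- c("2019-07-04 14:01 +0000", "2019-07-04 23:30 -0500")

# Step 1: parse. %z reads the numeric UTC offset; results are held in UTC.
parsed <- as.POSIXct(times, format = "%Y-%m-%d %H:%M %z", tz = "UTC")

# Step 2: format. The second value falls on the next day once shifted to UTC.
out <- format(parsed, "%m/%d/%Y")
out
```

Whether the date should be taken in UTC or in the original local zone depends on the analysis, so treat the choice of tz as a decision to make rather than a default.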

Dealing with date-time string that has day of the week

I have a date-time string that has day of the week and some meta-data in the string.
d <- "Fri, 14 Jul 2000 06:59:00 -0700 (PDT)"
I need to convert it into a date-time object (e.g. I have a column of these in a data.table) for further analysis. I have dealt with this using regexes to strip off meta-data from the string. Is there a better approach?
What I have is:
m <- regexpr("^\\w+,\\s+", d, perl=TRUE)
regmatches(d, m)
m <- regexpr("\\s-?\\d+\\s\\(\\w+\\)$", d, perl=TRUE)
regmatches(d, m)
ds <- sub("^\\w+,\\s+", "", d)
ds <- sub("\\s-?\\d+\\s\\(\\w+\\)$", "", ds)
Now I can convert this to date-time objects of class Date, POSIXlt or POSIXct for use in analysis.
dd <- strptime(ds, format="%d %b %Y %H:%M:%S")
dd <- as.Date(ds, format="%d %b %Y %H:%M:%S")
dd <- as.POSIXct(ds, format="%d %b %Y %H:%M:%S")
I wrote the anytime package to help with (among other things) these silly format strings -- so it heuristically just tries a number of them (and focuses on sane ones).
The input you have here qualifies (and is in fact a pretty common form):
R> anytime("Fri, 14 Jul 2000 06:59:00 -0700 (PDT)")
[1] "2000-07-14 06:59:00 CDT"
R>
We do not currently try to capture the timezone offset information at the end, so you have to deal with that after the fact. The display is in CDT which is my local timezone.
There is some more information about anytime on its webpage.
Assuming the format of the string is going to be constant across your data:
time = trimws(unlist(strsplit(d, "[,-]"))[2])
#[1] "14 Jul 2000 06:59:00"
tz = unlist(strsplit(d, "[,-]"))[3]
tz = gsub("[^A-Z]", "", tz)
#[1] "PDT"
as.Date(time, format = "%d %b %Y")
#[1] "2000-07-14"
as.POSIXct(time, format = "%d %b %Y %H:%M:%S") # specify the timezone with tz
#[1] "2000-07-14 06:59:00 IST"
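Another option that avoids the regular expressions, sketched here on the assumption that every string carries a numeric offset such as -0700: let %z read the offset and rely on strptime() ignoring whatever follows the end of the format, here the trailing (PDT). English day and month names are assumed for the locale, and America/Los_Angeles is an assumed zone used only for display.

```r
d <- "Fri, 14 Jul 2000 06:59:00 -0700 (PDT)"

# %z consumes "-0700"; the trailing " (PDT)" lies past the format and is ignored.
dd <- as.POSIXct(d, format = "%a, %d %b %Y %H:%M:%S %z", tz = "UTC")

format(dd, usetz = TRUE)                              # instant in UTC
format(dd, tz = "America/Los_Angeles", usetz = TRUE)  # back in Pacific time
```

Because as.POSIXct() is vectorised, the same call should work on a whole column of such strings.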

How to convert string to date format in R

I have a column of strings in the following format:
Wed, 6 Dec 2000 08:47:00 -0800 (PST)
How can I convert this into date format using lubridate or another package? I have done this before, but there was no -0800 (PST) at the end.
Thank you.
I was able to get a result using strptime() without even worrying about the timezone name at the end:
> x <- "Wed, 6 Dec 2000 08:47:00 -0800 (PST)"
> strptime(x, "%a, %d %b %Y %H:%M:%S %z")
[1] "2000-12-07 00:47:00"
However, if you want to remove the timezone name, you can use substr() to do this:
> strptime(substr(x, 1, nchar(x)-6), "%a, %d %b %Y %H:%M:%S %z")
[1] "2000-12-07 00:47:00"
We can also use parse_date_time
library(lubridate)
parse_date_time(x, "adbY HMS z", tz = "US/Pacific")
#[1] "2000-12-06 08:47:00 PST"
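The strptime() result above reads 2000-12-07 00:47:00 rather than 08:47 presumably because %z converts the input to the session's time zone. A short sketch of controlling that with an explicit tz is below; fmt, utc and pacific are illustrative names, and America/Los_Angeles is an assumed zone for the (PST) label.

```r
x <- "Wed, 6 Dec 2000 08:47:00 -0800 (PST)"
fmt <- "%a, %d %b %Y %H:%M:%S %z"

# Same instant, two display zones: only the printed clock time differs.
utc     <- as.POSIXct(x, format = fmt, tz = "UTC")
pacific <- as.POSIXct(x, format = fmt, tz = "America/Los_Angeles")

format(utc, usetz = TRUE)
format(pacific, usetz = TRUE)
utc == pacific  # compares the underlying instants, not the printed text
```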

How do I specify POSIX (time) format for 3 letter tz in R, in order to ignore it?

For output, the specification is %Z (see ?strptime). But for input, how does that work?
To clarify, it'd be great for the time zone abbreviation to be parsed into useful information by as.POSIXct(), but more core to the question is how to get the function to at least ignore the time zone.
Here is my best workaround, but is there a particular format code to pass to as.POSIXct() that will work for all time zones?
times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")
as.POSIXct(times, format="%a %b %d %H:%M:%S %Z %Y") # nope! strptime can't handle %Z in input
formats <- paste("%a %b %d %H:%M:%S", gsub(".+ ([A-Z]{3}) [0-9]{4}$", "\\1", times),"%Y")
as.POSIXct(times, format=formats) # works
Edit: Here is the output from the last line, as well as its class (from a separate call); the output is as expected. From the console:
> as.POSIXct(times, format=formats)
[1] "2015-07-03 00:15:00 EDT" "2015-07-03 00:15:00 EDT"
> attributes(as.POSIXct(times, format=formats))
$class
[1] "POSIXct" "POSIXt"
$tzone
[1] ""
The short answer is, "no, you can't." Those are abbreviations and they are not guaranteed to uniquely identify a specific timezone.
For example, is "EST" Eastern Standard Time in the US or Australia? Is "CST" Central Standard Time in the US or Australia, or is it China Standard Time, or is it Cuba Standard Time?
I just noticed that you're not trying to parse the timezone abbreviation, you are simply trying to avoid it. I don't know of a way to tell strptime to ignore arbitrary characters. I do know that it will ignore anything in the character representation of the time after the end of the format string. For example:
R> # The year is not parsed, so the current year is used
R> as.POSIXct(times, format="%a %b %d %H:%M:%S")
[1] "2015-07-03 00:15:00 UTC" "2015-07-03 00:15:00 UTC"
Other than that, a regular expression is the only thing I can think of that solves this problem. Unlike your example, I would use the regex on the input character vector to remove all 3-5 character timezone abbreviations.
R> times_no_tz <- gsub(" [[:upper:]]{3,5} ", " ", times)
R> as.POSIXct(times_no_tz, format="%a %b %d %H:%M:%S %Y")
[1] "2015-07-03 00:15:00 UTC" "2015-07-03 00:15:00 UTC"
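If the abbreviations in a particular data set are known to be unambiguous, one way to keep the zone information is to map each abbreviation to a zone name yourself before parsing. The sketch below is only that: tz_lookup is a hypothetical table whose entries are assumptions about this data, and the other names are illustrative.

```r
times <- c("Fri Jul 03 00:15:00 EDT 2015", "Fri Jul 03 00:15:00 GMT 2015")

# Hypothetical lookup table; extend or change it to suit your own data.
tz_lookup <- c(EDT = "America/New_York", GMT = "UTC")

abbr        <- sub(".* ([[:upper:]]{3,5}) [0-9]{4}$", "\\1", times)
times_no_tz <- sub(" [[:upper:]]{3,5} ", " ", times)

# Parse each string in its own zone, keeping seconds since the epoch.
secs <- mapply(
  function(s, z) as.numeric(as.POSIXct(s, format = "%a %b %d %H:%M:%S %Y", tz = z)),
  times_no_tz, unname(tz_lookup[abbr]),
  USE.NAMES = FALSE
)

# A POSIXct vector has a single tzone attribute, so show everything in UTC.
parsed <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(parsed, usetz = TRUE)
```

An abbreviation missing from the table yields NA as its zone name, so it is worth checking anyNA(tz_lookup[abbr]) before relying on the result.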
