Convert character string with timezone to date in R

I have a vector of character strings that look like this. I want to convert them to dates. The time-zone characters are causing trouble.
> a
[1] "07/17/2014 5:01:22 PM EDT" "7/17/2014 2:01:05 PM PDT" "07/17/2014 4:00:48 PM CDT" "07/17/2014 3:05:16 PM MDT"
If I use: strptime(a, "%d/%m/%Y %I:%M:%S %p %Z") I get [1] NA
If I omit the "%Z" for the time zone and use this:
strptime(a, "%m/%d/%Y %I:%M:%S %p", tz = "EST5EDT") I get
[1] "2014-07-17 17:01:22 EDT"
Since my strings contain various time zones (PDT, CDT, EDT, MDT), I can't default all of them to EST5EDT. One way around this is to split the vector into separate vectors for each time zone, remove the letters PDT / EDT etc., and apply the right time zone with strptime ("EST5EDT", "CST6CDT", etc.). Is there any other way to solve this?

If the date is always the first part of each element of the character vector and is always followed by the time, splitting the elements on whitespace is a possibility. If only the date is needed:
dates <- sapply(a, function(x) strsplit(x, split = " ")[[1]][1])
dates <- as.Date(as.character(dates), format = "%m/%d/%Y")
[1] "2014-07-17" "2014-07-17" "2014-07-17" "2014-07-17"
If the time is also needed:
datetime <- sapply(a, function(x) paste(strsplit(x, split = " ")[[1]][1:3],
collapse = " "))
datetime <- strptime(as.character(datetime), format = "%m/%d/%Y %I:%M:%S %p")
[1] "2014-07-17 17:01:22 CEST" "2014-07-17 14:01:05 CEST"
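Note that this drops the time-zone suffix, so every element is interpreted in a single time zone (the session's local one by default, CEST in the output above). You can set a different time zone using the tz argument here.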
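If the original zone of each element matters, one alternative to splitting the vector by hand is to map each abbreviation to a full time-zone name and parse element by element. The following is only a sketch: the lookup table tz_lookup is a hypothetical helper covering just the four abbreviations in the question, the Olson names are an assumption about which regions the abbreviations refer to, and %p parsing depends on the locale.

```r
a <- c("07/17/2014 5:01:22 PM EDT", "7/17/2014 2:01:05 PM PDT",
       "07/17/2014 4:00:48 PM CDT", "07/17/2014 3:05:16 PM MDT")

# Hypothetical lookup table: abbreviation -> Olson time-zone name (assumed mapping)
tz_lookup <- c(EDT = "America/New_York", CDT = "America/Chicago",
               MDT = "America/Denver",   PDT = "America/Los_Angeles")

abbr  <- sub(".* ", "", a)        # last whitespace-separated token, e.g. "EDT"
stamp <- sub(" [A-Z]+$", "", a)   # everything before the abbreviation
stopifnot(all(abbr %in% names(tz_lookup)))  # fail loudly on unknown zones

# Parse each element in its own zone; keep seconds since the epoch
secs <- mapply(function(s, z) {
  as.numeric(as.POSIXct(s, format = "%m/%d/%Y %I:%M:%S %p", tz = z))
}, stamp, tz_lookup[abbr], USE.NAMES = FALSE)

# A POSIXct vector carries one time zone, so express all instants in UTC
utc <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(utc, "%Y-%m-%d %H:%M:%S")
```

Because a POSIXct vector can hold only one tz attribute, the result is shown in UTC; the instants themselves still reflect each element's original zone.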

Related

String column to Dtm(date time) column in R

How does one convert a column from str to dtm? I've tried as.Date and strptime and none of those work. Say I have a table with a column with 3 values (2003/11/04 19:29, 2001/04/02 21:32, 2003/10/28 09:51) in the str format. How would I convert this column so that it is in the dtm format? Thank you in advance!
Check ?strptime for different format arguments. You can do:
x <- c('2003/11/04 19:29', '2001/04/02 21:32', '2003/10/28 09:51')
as.POSIXct(x, format = "%Y/%m/%d %H:%M", tz = "UTC")
#Can also be done with `strptime`
#strptime(x, format = "%Y/%m/%d %H:%M", tz = "UTC")
#[1] "2003-11-04 19:29:00 UTC" "2001-04-02 21:32:00 UTC" "2003-10-28 09:51:00 UTC"
Or with lubridate
lubridate::ymd_hm(x)
Replace x with column name df$column_name.
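As a minimal sketch of that last step, assuming a data frame named df with a character column named column_name (both names are placeholders taken from the sentence above, not from the question):

```r
# Hypothetical data frame holding the three example values as strings
df <- data.frame(
  column_name = c("2003/11/04 19:29", "2001/04/02 21:32", "2003/10/28 09:51"),
  stringsAsFactors = FALSE
)

# Overwrite the character column with a parsed date-time column
df$column_name <- as.POSIXct(df$column_name, format = "%Y/%m/%d %H:%M", tz = "UTC")

class(df$column_name)   # a POSIXct column rather than character
```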

formatting time in R error

I have a Time column in my df with value 1.01.2016 0:00:05. I want it without the seconds and therefore used df$Time <- as.POSIXct(df$Time, format = "%d.%m.%Y :%H:%M", tz = "Asia/Kolkata"). But I get NA value. What is the problem here?
I suspect there are two things working here: the storage of a time object (POSIXt), and the representation of that object.
The string you present is (I believe) not a proper POSIXt (whether POSIXct or POSIXlt) object for R, which means it is just a character string. In that case, you can remove the seconds with:
gsub(':[^:]*$', '', '1.01.2016 0:00:05')
# [1] "1.01.2016 0:00"
However, that is still just a string, not a date or time object. If you parse it into a time-object that R knows about:
as.POSIXct("1.01.2016 0:00:05", format = "%d.%m.%Y %H:%M:%S", tz = "Asia/Kolkata")
# [1] "2016-01-01 00:00:05 IST"
then you now have a time object that R knows something about ... and it defaults to representing it (printing it on the console) with seconds-precision. Typically, all that is available to change for the console-printing is the precision of the seconds, as in
options("digits.secs")
# $digits.secs
# NULL
Sys.time()
# [1] "2018-06-26 18:21:06 PDT"
options("digits.secs"=3)
Sys.time()
# [1] "2018-06-26 18:21:10.090 PDT"
then you can get more. But alas, I do not think there is an R option to say "always print my POSIXt objects in this way". So your only choice is (at the point where you no longer need it to be a time-like object) to change it back into a string with no time-like value:
x <- as.POSIXct("1.01.2016 0:00:05", format = "%d.%m.%Y %H:%M:%S", tz = "Asia/Kolkata")
x
# [1] "2016-01-01 00:00:05 IST"
?strptime
# see that day-of-month can either be "%d" for 01-31 or "%e" for 1-31
format(x, format="%e.%m.%Y %H:%M")
# [1] " 1.01.2016 00:00"
(This works equally well for a vector.)
I would suggest converting to POSIXt and back to a string rather than using my gsub example, because as.POSIXct will tell you (by returning NA) when the string does not match the date-time format you are expecting, whereas gsub will happily do something wrong or nothing.
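To illustrate the difference on a vector, here is a small sketch. The second element is a made-up malformed value added only for the demonstration:

```r
# First value is from the question; the second is a hypothetical bad entry
x <- c("1.01.2016 0:00:05", "2016-01-01 00:00")

# gsub strips whatever follows the last colon, valid timestamp or not
gsub(":[^:]*$", "", x)

# Parsing flags the entry that does not fit the expected format with NA
parsed <- as.POSIXct(x, format = "%d.%m.%Y %H:%M:%S", tz = "Asia/Kolkata")
is.na(parsed)

# Convert back to strings without the seconds
out <- format(parsed, "%d.%m.%Y %H:%M")
out
```

The NA makes the bad row easy to find with is.na(), which the string-only approach cannot offer.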
Try as.POSIXlt:
> test <- "1.01.2016 0:00:05"
> as.POSIXlt(test, format = "%d.%m.%Y %H:%M:%S", tz = "Asia/Kolkata")
[1] "2016-01-01 00:00:05 IST"

NA returned while using strptime

I have this data frame, which gives me Date and Time columns. I am trying to combine these two columns, but strptime is returning NA. I want to understand why this is happening.
x <- data.frame(date = "1/2/2007", time = "00:00:02")
y <- strptime(paste(x$date,x$time,sep = " "), format = "%b/%d/%y %H:%M:%S")
We need %m and %Y in place of %b and %y (%b - Abbreviated month name in the current locale on this platform. %y - Year without century (00–99)).
strptime(paste(x$date,x$time,sep = " "), "%m/%d/%Y %H:%M:%S")
#[1] "2007-01-02 00:00:02 IST"
For understanding the format, it is better to check ?strptime
Or we can use mdy_hms from lubridate
library(lubridate)
with(x, mdy_hms(paste(date, time)))
#[1] "2007-01-02 00:00:02 UTC"
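The effect of each specifier can be checked side by side. This is a sketch using the data frame from the question; the two-digit-year string at the end is an invented example, and tz = "UTC" is chosen only so the output does not depend on the local time zone:

```r
x <- data.frame(date = "1/2/2007", time = "00:00:02", stringsAsFactors = FALSE)
stamp <- paste(x$date, x$time)

# %b expects a month name such as "Jan", so the numeric month "1" fails
bad <- strptime(stamp, format = "%b/%d/%y %H:%M:%S", tz = "UTC")
is.na(bad)

# %m (numeric month) and %Y (four-digit year) match the actual input
good <- strptime(stamp, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
format(good, "%Y-%m-%d %H:%M:%S")

# %y is the right choice only when the year has two digits (hypothetical input)
short <- strptime("1/2/07 00:00:02", format = "%m/%d/%y %H:%M:%S", tz = "UTC")
format(short, "%Y-%m-%d %H:%M:%S")
```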

Formatting Date Strings in R

I have two columns of differently formatted date strings that I need to make the same format,
the first is in the form:
vt_dev_date = "6/20/2016 7:45"
the second is in the form
vt_other = "2016-06-14 20:21:29.0"
If I could get them both in the same form, down to the minute, that would be great. I have tried
strptime(vt_dev_date,format = "%Y-%m-%d %H:%M")
strptime(vt_other,"%Y-%m-%d %H:%M")
and for the second one, it works and I get
"2016-06-14 20:21:00 EDT"
But for the first string, it seems that because the month and hour are not padded with zeros, none of the formatting tricks will work, because if I try
test_string <- "06/20/2016 07:45"
strptime(test_string,format = "%m/%d/%Y %H:%M")
[1] "2016-06-20 07:45:00 EDT"
It works, but I don't think going through every row in the column and padding each date is a great option. Any help would be appreciated.
Thanks,
josh
How about using lubridate, as follows:
> library(lubridate)
> x <- c("6/20/2016 7:45","2016-06-14 20:21:29.0")
> x
[1] "6/20/2016 7:45" "2016-06-14 20:21:29.0"
> parse_date_time(x, orders = c("mdy hm", "ymd hms"))
[1] "2016-06-20 07:45:00 UTC" "2016-06-14 20:21:29 UTC"
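For a base R alternative, note that ?strptime describes leading zeros as optional on input for fields such as %m, %d and %H, so padding should not be needed; the first attempt in the question fails because its format string uses the year-first layout. A sketch that tries one format and falls back to the other, with tz = "UTC" as an arbitrary choice:

```r
x <- c("6/20/2016 7:45", "2016-06-14 20:21:29.0")

# Try each layout on the whole vector; non-matching elements become NA
first  <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "UTC")
second <- as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Fill the gaps left by the first format with results from the second
miss <- is.na(first)
parsed <- first
parsed[miss] <- second[miss]

# Common representation, down to the minute
format(parsed, "%Y-%m-%d %H:%M")
```

Since the second format stops at the minute, the seconds in "20:21:29.0" are ignored rather than rounded.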

Parse timestamp with a.m./p.m

I have a file that formats time stamps like 25/03/2011 9:15:00 p.m.
How can I parse this text to a Date-Time class with either strptime or as.POSIXct?
Here is what almost works:
> as.POSIXct("25/03/2011 9:15:00", format="%d/%m/%Y %I:%M:%S", tz="UTC")
[1] "2011-03-25 09:15:00 UTC"
Here is what is not working, but I'd like to have working:
> as.POSIXct("25/03/2011 9:15:00 p.m.", format="%d/%m/%Y %I:%M:%S %p", tz="UTC")
[1] NA
I'm using R version 2.13.2 (2011-09-30) on MS Windows. My working locale is "C":
Sys.setlocale("LC_TIME", "C")
It appears the AM/PM indicator can't include punctuation. Try it after removing the punctuation:
td <- "25/03/2011 9:15:00 p.m."
tdClean <- gsub("(.)\\.?[Mm]\\.?","\\1m",td)
as.POSIXct(tdClean, format="%d/%m/%Y %I:%M:%S %p", tz="UTC")
# [1] "2011-03-25 21:15:00 UTC"
Just came across this. As another option, you can use the stringr package.
library(stringr)
data$date2 <- str_sub(data$date, end = -4)
# this removes the punctuation but holds onto the A/P values
data$date2 <- str_c(data$date2, 'm')
# adds the required m
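Both approaches can be combined into one vectorised base R step. The sketch below assumes the indicator always appears at the end of the string as "a.m." or "p.m." (in either case) and that the locale accepts "am"/"pm" for %p, as the "C" locale in the question does:

```r
td <- c("25/03/2011 9:15:00 p.m.", "25/03/2011 9:15:00 a.m.")

# Rewrite a trailing "a.m." / "p.m." as "am" / "pm"; other text is untouched
clean <- sub("([AaPp])\\.[Mm]\\.$", "\\1m", td)

parsed <- as.POSIXct(clean, format = "%d/%m/%Y %I:%M:%S %p", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S")
```

Anchoring the pattern at the end of the string keeps it from touching any other "m" in the input.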
