Trying to reverse-convert a vector of time zones - r

I have this database of time stamps (AlertTime), and I know what time zone these are in (TimeZone). I know how to convert these dates to POSIXct, or how to handle them if they were all UTC, but I'm struggling to get them identified as their local time stamps because most functions don't accept a vector for tz.
I do need both the local time stamp properly formatted (AlertTimeLocal) and the UTC equivalent (AlertTimeUTC).
AlertTime               TimeZone              AlertTimeLocal (desired)   AlertTimeUTC (desired)
11 May 2020, 06:22 PM   America/Denver        2020-05-11 18:22:00 MDT    2020-05-12 00:22:00 UTC
11 MAY 2020, 04:11 AM   America/Los_Angeles   2020-05-11 04:11:00 PDT    2020-05-11 11:11:00 UTC
10 MAY 2020, 03:38 PM   America/Chicago       2020-05-10 15:38:00 CDT    2020-05-10 20:38:00 UTC
I was using this code but it doesn't seem to do anything anymore:
FreshAir$AlertTimeLocal <- mapply(function(x,y) {format(x, tz=y, usetz=TRUE)}, FreshAir$AlertTime, FreshAir$TimeZone)
Would a hacky solution be to set all the RAW time stamps to UTC, then convert them to the equivalent time zone in the other direction?

We can use force_tzs from lubridate
library(lubridate)
library(dplyr)
df1 %>%
mutate(AlertTimeLocal = dmy_hm(AlertTime),
AlertTimeUTC = force_tzs(AlertTimeLocal, tzones = TimeZone))
# AlertTime TimeZone AlertTimeLocal AlertTimeUTC
#1 11 May 2020, 06:22 PM America/Denver 2020-05-11 18:22:00 2020-05-12 00:22:00
#2 11 MAY 2020, 04:11 AM America/Los_Angeles 2020-05-11 04:11:00 2020-05-11 11:11:00
#3 10 MAY 2020, 03:38 PM America/Chicago 2020-05-10 15:38:00 2020-05-10 20:38:00
Update
If we need to store as separate time zones, we can use a list column
library(purrr)
df2 <- df1 %>%
mutate(AlertTime2 = dmy_hm(AlertTime),
AlertTimeUTC = force_tzs(AlertTime2, tzones = TimeZone),
AlertTimeLocal = map2(AlertTime2, TimeZone, ~ force_tz(.x, tzone = .y)))
df2$AlertTimeLocal
#[[1]]
#[1] "2020-05-11 18:22:00 MDT"
#[[2]]
#[1] "2020-05-11 04:11:00 PDT"
#[[3]]
#[1] "2020-05-10 15:38:00 CDT"
data
df1 <- structure(list(AlertTime = c("11 May 2020, 06:22 PM",
"11 MAY 2020, 04:11 AM",
"10 MAY 2020, 03:38 PM"), TimeZone = c("America/Denver",
"America/Los_Angeles",
"America/Chicago")), class = "data.frame", row.names = c(NA,
-3L))

I think a tidy solution might look cleaner, but if you want a base R solution, here's an alternative using @akrun's df1:
# parse each time stamp in its own zone, then combine the instants into one vector
df1$AlertTimeLocal <- df1$AlertTimeUTC <-
do.call(c, unname(Map(as.POSIXct, df1$AlertTime, tz = df1$TimeZone, format = "%d %b %Y, %I:%M %p")))
attr(df1$AlertTimeUTC, "tzone") <- "UTC"
attr(df1$AlertTimeLocal, "tzone") <- "US/Mountain"
df1
# AlertTime TimeZone AlertTimeUTC AlertTimeLocal
# 1 11 May 2020, 06:22 PM America/Denver 2020-05-12 00:22:00 2020-05-11 18:22:00
# 2 11 MAY 2020, 04:11 AM America/Los_Angeles 2020-05-11 11:11:00 2020-05-11 05:11:00
# 3 10 MAY 2020, 03:38 PM America/Chicago 2020-05-10 20:38:00 2020-05-10 14:38:00
Something that has not been discussed, though: in R, you cannot have different time zones within one vector of POSIXt. That is, in a vector, time zone is an attribute of the vector, not of the element. If you need individual time zones for each time in that column, you'll need to do a list-column. This works but is not always supported well by utilities/functions that work on data.frame.
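To make the point about the tzone attribute concrete, here is a minimal base R sketch. The sample strings, the zone vector and the variable names are only illustrative. It shows that changing the attribute changes how the vector prints without moving the underlying instant, and that a plain list can hold one POSIXct per zone.

```r
# A single instant, parsed as Denver local time
x <- as.POSIXct("2020-05-11 18:22:00", tz = "America/Denver")

# Changing the attribute changes the display, not the instant
y <- x
attr(y, "tzone") <- "UTC"
same_instant <- as.numeric(x) == as.numeric(y)
utc_text <- format(y, "%Y-%m-%d %H:%M:%S", usetz = TRUE)

# One POSIXct per element, each keeping its own tzone, stored in a list
stamps <- c("2020-05-11 18:22:00", "2020-05-11 04:11:00")
zones  <- c("America/Denver", "America/Los_Angeles")
local_list <- unname(Map(function(s, z) as.POSIXct(s, tz = z), stamps, zones))
local_tz <- vapply(local_list, function(p) attr(p, "tzone"), character(1))
```

The list is what you would store in a list-column; the single vector with one tzone is what most data.frame tooling expects.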

Related

How to convert a column of UTC timestamps into several different timezones?

I have a dataset with dates stored in the DB as UTC, however, the timezone is actually different.
mydat <- data.frame(
time_stamp=c("2022-08-01 05:00:00 UTC","2022-08-01 17:00:00 UTC","2022-08-02 22:30:00 UTC","2022-08-04 05:00:00 UTC","2022-08-05 02:00:00 UTC"),
timezone=c("America/Chicago","America/New_York","America/Los_Angeles","America/Denver","America/New_York")
)
I want to apply the timezone to the UTC saved timestamps, over the entire column.
I looked into the with_tz function in the lubridate package, but I don't see how to reference the "timezone" column, rather than hardcoding in a value.
Such as if I try
with_tz(mydat$time_stamp, tzone = mydat$timezone)
I get the following error
Error in as.POSIXlt.POSIXct(x, tz) : invalid 'tz' value
However, if I try
mydat$time_stamp2 <- with_tz(mydat$time_stamp,"America/New_York")
that will render a new column without issue. How can I do this just referencing column values?
Welcome to StackOverflow. This is a nice, common, and tricky problem! The following should do what you ask for:
Code
mydat <- data.frame(time_stamp=c("2022-08-01 05:00:00 UTC",
"2022-08-01 17:00:00 UTC",
"2022-08-02 22:30:00 UTC",
"2022-08-04 05:00:00 UTC",
"2022-08-05 02:00:00 UTC"),
timezone=c("America/Chicago", "America/New_York",
"America/Los_Angeles", "America/Denver",
"America/New_York"))
mydat$utc <- anytime::utctime(mydat$time_stamp, tz="UTC")
mydat$format <- ""
for (i in seq_len(nrow(mydat)))
mydat[i, "format"] <- strftime(mydat[i,"utc"],
"%Y-%m-%d %H:%M:%S",
tz=mydat[i,"timezone"])
Output
> mydat
time_stamp timezone utc format
1 2022-08-01 05:00:00 UTC America/Chicago 2022-08-01 05:00:00 2022-08-01 00:00:00
2 2022-08-01 17:00:00 UTC America/New_York 2022-08-01 17:00:00 2022-08-01 13:00:00
3 2022-08-02 22:30:00 UTC America/Los_Angeles 2022-08-02 22:30:00 2022-08-02 15:30:00
4 2022-08-04 05:00:00 UTC America/Denver 2022-08-04 05:00:00 2022-08-03 23:00:00
5 2022-08-05 02:00:00 UTC America/New_York 2022-08-05 02:00:00 2022-08-04 22:00:00
Comment
We first parse your data as UTC. I once wrote a helper function for that in my anytime package (there are alternatives, but this is how I do it...). We then need to format from the given (numeric !!) UTC representation to the given timezone. We need a loop for this as the tz argument to strftime() is not vectorized.
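If you prefer not to write the for loop by hand, the same per-row formatting can be sketched with vapply(), which still calls format() once per element because tz takes a single zone. This is only an illustration in base R; the two sample rows and the names utc, zones and local_text are mine, not part of the answer above.

```r
# Two UTC instants and the zone each one should be displayed in
utc <- as.POSIXct(c("2022-08-01 05:00:00", "2022-08-02 22:30:00"), tz = "UTC")
zones <- c("America/Chicago", "America/Los_Angeles")

# format() accepts one tz at a time, so iterate over the row index
local_text <- vapply(
  seq_along(utc),
  function(i) format(utc[i], "%Y-%m-%d %H:%M:%S", tz = zones[i]),
  character(1)
)
local_text
```

The result is a character vector, which is the only way to keep several zones side by side in one ordinary column.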
Dirk gave a great answer that uses (mostly) base R tooling, if that is a requirement of yours. I wanted to also add an answer that uses the clock package that I developed because it doesn't require working rowwise over your data frame. clock has a function called sys_time_info() that retrieves low level information about a UTC time point in a specific time zone. It is one of the few functions where it makes sense to have a vectorized zone argument (which you need here) and returns an offset from UTC that will be useful here for converting to a "local" time.
As others have mentioned, you won't be able to construct a date-time vector that stores multiple time zones in it, but if you just need to see what the local time would have been in those zones, this can still be useful.
library(clock)
mydat <- data.frame(
time_stamp=c("2022-08-01 05:00:00 UTC","2022-08-01 17:00:00 UTC","2022-08-02 22:30:00 UTC","2022-08-04 05:00:00 UTC","2022-08-05 02:00:00 UTC"),
timezone=c("America/Chicago","America/New_York","America/Los_Angeles","America/Denver","America/New_York")
)
# Parse into a "sys-time" type, which can be thought of as a UTC time point
mydat$time_stamp <- sys_time_parse(mydat$time_stamp, format = "%Y-%m-%d %H:%M:%S")
mydat
#> time_stamp timezone
#> 1 2022-08-01T05:00:00 America/Chicago
#> 2 2022-08-01T17:00:00 America/New_York
#> 3 2022-08-02T22:30:00 America/Los_Angeles
#> 4 2022-08-04T05:00:00 America/Denver
#> 5 2022-08-05T02:00:00 America/New_York
# "Low level" information about DST, the time zone abbreviation,
# and offset from UTC in that zone. This is one of the few functions where
# it makes sense to have a vectorized `zone` argument.
info <- sys_time_info(mydat$time_stamp, mydat$timezone)
info
#> begin end offset dst abbreviation
#> 1 2022-03-13T08:00:00 2022-11-06T07:00:00 -18000 TRUE CDT
#> 2 2022-03-13T07:00:00 2022-11-06T06:00:00 -14400 TRUE EDT
#> 3 2022-03-13T10:00:00 2022-11-06T09:00:00 -25200 TRUE PDT
#> 4 2022-03-13T09:00:00 2022-11-06T08:00:00 -21600 TRUE MDT
#> 5 2022-03-13T07:00:00 2022-11-06T06:00:00 -14400 TRUE EDT
# Add the offset to the sys-time and then convert to a character column
# (these times don't really represent sys-time anymore since they are now localized)
mydat$localized <- as.character(mydat$time_stamp + info$offset)
mydat
#> time_stamp timezone localized
#> 1 2022-08-01T05:00:00 America/Chicago 2022-08-01T00:00:00
#> 2 2022-08-01T17:00:00 America/New_York 2022-08-01T13:00:00
#> 3 2022-08-02T22:30:00 America/Los_Angeles 2022-08-02T15:30:00
#> 4 2022-08-04T05:00:00 America/Denver 2022-08-03T23:00:00
#> 5 2022-08-05T02:00:00 America/New_York 2022-08-04T22:00:00
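For comparison, the offset idea can also be sketched in base R without clock. The helper utc_offset_seconds() below is a hypothetical name of mine: it formats each instant as wall-clock text in the target zone, re-reads that text as if it were UTC, and takes the difference in seconds.

```r
# Hypothetical helper: offset from UTC (in seconds) of each instant in its zone
utc_offset_seconds <- function(instant, zone) {
  fmt <- "%Y-%m-%d %H:%M:%S"
  vapply(seq_along(instant), function(i) {
    wall <- format(instant[i], fmt, tz = zone[i])  # local wall-clock text
    as.numeric(as.POSIXct(wall, format = fmt, tz = "UTC")) - as.numeric(instant[i])
  }, numeric(1))
}

utc <- as.POSIXct(c("2022-08-01 05:00:00", "2022-08-02 22:30:00"), tz = "UTC")
zones <- c("America/Chicago", "America/Los_Angeles")

offsets <- utc_offset_seconds(utc, zones)
# As in the clock answer, the shifted values are only meaningful as text
localized <- format(utc + offsets, "%Y-%m-%d %H:%M:%S")
```

It still loops over rows internally, so the clock version remains the better choice for large data.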

Transform string in date through Lubridate with variation in month, day, year hour min am/pm and time zone

I need some help with a lubridate function over different time zones. I have two vectors of the kind:
date1 = c("February 11th 2017, 6:05am PST", "April 24th 2018, 4:09pm PDT")
date2 = c("2013-12-14 00:58:00 CET", "2013-06-19 18:00:00 CEST")
I would like to use lubridate functions (I tried mdy_hm) to transform these strings into date format, and then take the difference (in days) across the two strings while taking into account the difference in time zone, where D in PDT stands for Day Light and S in PST stands for Standard time zone for Pacific time (https://www.timeanddate.com/time/zones/pdt and https://www.timeanddate.com/time/zones/pst) and similarly for CET (https://time.is/CET) and CEST (https://time.is/CEST). Could you please help me?
First thing I did was to setup a tibble with your 2 date vectors
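# packages used throughout this answer
library(tibble)
library(dplyr)
library(stringr)
library(lubridate)
tibble(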
date1 = c("February 11th 2017, 6:05am PST", "April 24th 2018, 4:09pm PDT"),
date2 = c("2013-12-14 00:58:00 CET", "2013-06-19 18:00:00 CEST"),
) %>%
{. ->> my_dates}
my_dates
# # A tibble: 2 x 2
# date1 date2
# <chr> <chr>
# February 11th 2017, 6:05am PST 2013-12-14 00:58:00 CET
# April 24th 2018, 4:09pm PDT 2013-06-19 18:00:00 CEST
Then, make a tibble of the timezone abbreviations and their offset from UTC
# setup timezones and UTC offsets
tribble(
~tz, ~offset,
'PST', -8,
'PDT', -7,
'CET', +1,
'CEST', +2
) %>%
{. ->> my_tz}
my_tz
# # A tibble: 4 x 2
# tz offset
# <chr> <dbl>
# PST -8
# PDT -7
# CET 1
# CEST 2
Then, we tidy the datetimes up by removing the character suffix after the day number in date1 (the 'th' bit after '11th'). We also pull out the timezone code and put that in a separate column; the timezone column allows us to left_join() my_tz in, giving us the UTC offset.
We use string-handling functions from the stringr package, and regular expressions to find, extract and replace the components. A very handy tool for testing regex patterns can be found here https://regex101.com/r/5pr3LL/1/
my_dates %>%
mutate(
# remove the character suffix after the day number (eg 11th)
day_suffix = str_extract(date1, '[0-9]+[a-z]+') %>% str_extract('[a-z]+'),
date1 = str_replace(date1, day_suffix, ''),
day_suffix = NULL,
# extract timezone info
date1_tz = str_extract(date1, '[a-zA-Z]+$'),
date2_tz = str_extract(date2, '[a-zA-Z]+$'),
) %>%
# join in timezones for date1
left_join(my_tz, by = c('date1_tz' = 'tz')) %>%
rename(
offset_date1 = offset
) %>%
# join in timezones for date2
left_join(my_tz, by = c('date2_tz' = 'tz')) %>%
rename(
offset_date2 = offset
) %>%
{. ->> my_dates_info}
my_dates_info
# # A tibble: 2 x 6
# date1 date2 date1_tz date2_tz offset_date1 offset_date2
# <chr> <chr> <chr> <chr> <dbl> <dbl>
# February 11 2017, 6:05am PST 2013-12-14 00:58:00 CET PST CET -8 1
# April 24 2018, 4:09pm PDT 2013-06-19 18:00:00 CEST PDT CEST -7 2
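The same clean-up can be sketched in base R with sub(), in case stringr is not available. The patterns are my own and assume the zone abbreviation is always the last word and that the ordinal suffix directly follows the day number.

```r
date1 <- c("February 11th 2017, 6:05am PST", "April 24th 2018, 4:09pm PDT")

# Trailing run of letters = time zone abbreviation
tz_abbr <- sub(".*\\s([A-Za-z]+)$", "\\1", date1)

# Drop the ordinal suffix that follows the day number (st, nd, rd, th)
cleaned <- sub("(\\d+)(st|nd|rd|th)", "\\1", date1)

# Drop the trailing abbreviation so only the date-time text remains
cleaned <- sub("\\s[A-Za-z]+$", "", cleaned)
```

Requiring a digit before the suffix keeps the pattern from touching month names such as "August".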
So now, we can use lubridate::as_datetime() to convert date1 and date2 to dttm (datetime) format. as_datetime() takes a character-format datetime and converts it to datetime format. You must specify the format of the character string using symbols and abbreviations explained here. For example, here we use %B to refer to the full name of the month, %d is the day number and %Y is the (4-digit) year number etc.
Note: because we don't specify the timezone inside as_datetime(), the underlying timezone stored with these datetimes defaults to UTC (as seen by using tz()). This is why we call these columns date*_orig, to remind us that the clock time is still in the original datetime's timezone. Then we subtract the offset from the datetime object: a local time at UTC-8 is 8 hours behind UTC, so UTC = local time - offset. We now have these times in UTC (and the underlying timezone signature of these values is UTC, so that's ideal).
# now define datetimes in local and UTC timezones (note: technically the tz is UTC for both)
my_dates_info %>%
mutate(
date1_orig = as_datetime(date1, format = '%B %d %Y, %I:%M%p '),
date1_utc = date1_orig - hours(offset_date1),
date2_orig = as_datetime(date2, format = '%Y-%m-%d %H:%M:%S'),
date2_utc = date2_orig - hours(offset_date2),
) %>%
{. ->> my_dates_utc}
my_dates_utc
# # A tibble: 2 x 10
# date1 date2 date1_tz date2_tz offset_date1 offset_date2 date1_orig date1_utc date2_orig date2_utc
# <chr> <chr> <chr> <chr> <dbl> <dbl> <dttm> <dttm> <dttm> <dttm>
# February 11 2017, 6:05am PST 2013-12-14 00:58:00 CET PST CET -8 1 2017-02-11 06:05:00 2017-02-11 14:05:00 2013-12-14 00:58:00 2013-12-13 23:58:00
# April 24 2018, 4:09pm PDT 2013-06-19 18:00:00 CEST PDT CEST -7 2 2018-04-24 16:09:00 2018-04-24 23:09:00 2013-06-19 18:00:00 2013-06-19 16:00:00
Now that we have both sets of dates in datetime format, and in the same timezone, we can calculate time differences between them.
# now calculate difference between them
my_dates_utc %>%
select(date1_utc, date2_utc) %>%
mutate(
difference_days = interval(start = date1_utc, end = date2_utc) %>% time_length(unit = 'days')
)
# # A tibble: 2 x 3
# date1_utc date2_utc difference_days
# <dttm> <dttm> <dbl>
# 2017-02-11 14:05:00 2013-12-13 23:58:00 -1156.
# 2018-04-24 23:09:00 2013-06-19 16:00:00 -1770.
This should be fine for small-scale operations. If you had more than 2 different datetime format vectors, it would be worth considering a more complex operation where you transform the data from wide to long format. This would save repeating the same/similar code for each column, like we have done for date1 and date2 in this example.
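As a cross-check of the offset arithmetic, here is a small base R sketch. The lookup vector and the helper to_utc() are illustrative names of mine, and the inputs are assumed to be already cleaned to ISO-style text. A zone at UTC-8 is 8 hours behind UTC, so the UTC time is the local time minus the offset.

```r
# Offsets from UTC in hours, keyed by abbreviation (same values as my_tz)
offsets <- c(PST = -8, PDT = -7, CET = 1, CEST = 2)

# Hypothetical helper: read the wall-clock text as UTC, then remove the offset
to_utc <- function(wall_clock, abbr) {
  as.POSIXct(wall_clock, tz = "UTC") - unname(offsets[abbr]) * 3600
}

d1 <- to_utc("2017-02-11 06:05:00", "PST")
d2 <- to_utc("2013-12-14 00:58:00", "CET")

# Signed difference in days, from d1 to d2
diff_days <- as.numeric(difftime(d2, d1, units = "days"))
```

A fixed lookup table like this only works for abbreviations you list yourself; it does not know about daylight saving rules.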

How to know if a as.POSIXct date time is AM/PM in r?

I have a column with date and time in the as.POSIXct format e.g. "2019-02-23 12:45". I want to identify if the time is AM or PM and add AM or PM to the date and time?
the following code creates an example dataset for representation:
ID <- data.frame(c(1,2,3,4))
DATE <- data.frame(as.POSIXct(c("2019-02-25 07:30", "2019-03-25 14:30", "2019-03-25 12:00", "2019-03-25 00:00"),format="%Y-%m-%d %H:%M"))
DATEAMPM <- data.frame(c("2019-02-25 07:30 AM", "2019-03-25 14:30 PM", "2019-03-25 12:00 PM", "2019-03-25 00:00 AM"))
AMPMFLAG <- data.frame(c(0,1,1,0))
test <- cbind(ID,DATE,DATEAMPM,AMPMFLAG)
names(test) <- c("PID","DATE","DATEAMPM","AMPMFLAG")
Would like to create the DATEAMPM and AMPMFLAG columns as represented in the code above.
I have seen character strings of the form "2019-09-23 08:45 PM" converted to "2019-09-23 20:45" by specifying the format argument as below, but I do not know how to go the other way around and incorporate AM/PM into the date time
as.POSIXct(strptime(,format="%Y-%m-%d %I:%M %p"))
Appreciate your help
We can use format to get the data with AM/PM
test$DATEAMPM <- format(test$DATE, "%Y-%m-%d %I:%M %p")
test$AMPMFLAG <- +(grepl("PM", test$DATEAMPM))
test
# PID DATE DATEAMPM AMPMFLAG
#1 1 2019-02-25 07:30:00 2019-02-25 07:30 AM 0
#2 2 2019-03-25 14:30:00 2019-03-25 02:30 PM 1
#3 3 2019-03-25 12:00:00 2019-03-25 12:00 PM 1
#4 4 2019-03-25 00:00:00 2019-03-25 12:00 AM 0
Also note that when you convert 14:30:00 in AM/PM it would be 02:30 PM and not 14:30 PM.
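If you would rather not depend on how the locale spells AM/PM, the flag can also be derived from the 24-hour value. This is just a sketch; the variable names are mine, and tz = "UTC" is an assumption to keep the example reproducible.

```r
times <- as.POSIXct(
  c("2019-02-25 07:30", "2019-03-25 14:30", "2019-03-25 12:00", "2019-03-25 00:00"),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)

# Hour on the 24-hour clock; 12 and above is PM
hour24 <- as.integer(format(times, "%H"))
ampm_flag <- as.integer(hour24 >= 12)
```

Noon counts as PM and midnight as AM, which matches the flags in the question.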

Using dplyr::if_else() in R to change the time zone of POSIXct timestamps based on value of another variable

I'm working with some timestamps in POSIXct format. Right now they are all showing up as being in the timezone "UTC", but in reality some are known to be in the "America/New_York" timezone. I'd like to correct the timestamps so that they all read as the correct times.
I initially used an ifelse() statement along with lubridate::with_tz(). This didn't work as expected because ifelse() didn't return values in POSIXct.
Then I tried dplyr::if_else() based on other posts here, and that's not working as expected either.
I can change a single timestamp or even a list of timestamps to a different timezone using with_tz() (so I know it works), but when I use it within if_else() the output is such that all the values are returned given the "yes" argument in if_else().
library(lubridate)
library(dplyr)
x <- data.frame("ts" = as.POSIXct(c("2017-04-27 13:44:00 UTC",
"2017-03-10 12:22:00 UTC", "2017-03-22 10:24:00 UTC"), tz = "UTC"),
"tz" = c("UTC","EST","UTC"))
x <- mutate(x, ts_New = if_else(tz == "UTC", with_tz(ts, "America/New_York"), ts))
Expected results are below where ts_New has timestamps adjusted to new time zone but only when values in tz = "UTC". Timestamps with tz = "America/New_York" shouldn't change.
ts tz ts_NEW
1 2017-04-27 13:44:00 UTC 2017-04-27 09:44:00
2 2017-03-10 12:22:00 EST 2017-03-10 12:22:00
3 2017-03-22 10:24:00 UTC 2017-03-22 06:24:00
Actual results are below where all ts_New timestamps are adjusted to new time zone regardless of value in tz
x
ts tz ts_New
1 2017-04-27 13:44:00 UTC 2017-04-27 09:44:00
2 2017-03-10 12:22:00 EST 2017-03-10 07:22:00
3 2017-03-22 10:24:00 UTC 2017-03-22 06:24:00
This doesn't answer your original question about why with_tz doesn't work with if_else but here is one workaround. We subtract 4 hours (the difference between UTC and EDT, which is in effect on both of the UTC-labeled dates) where tz == "UTC".
library(dplyr)
library(lubridate)
x %>% mutate(ts_New = if_else(tz == "UTC", ts - hours(4), ts))
# ts tz ts_New
#1 2017-04-27 13:44:00 UTC 2017-04-27 09:44:00
#2 2017-03-10 12:22:00 EST 2017-03-10 12:22:00
#3 2017-03-22 10:24:00 UTC 2017-03-22 06:24:00
Or in base R
x$ts_New <- x$ts
inds <- x$tz == "UTC"
x$ts_New[inds] <- x$ts_New[inds] - 4 * 60 * 60
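A possible way to avoid hard-coding the 4 hours, sketched in base R: changing the tzone attribute only changes how the instants are displayed, so the rows whose clock time is already New York time are re-read in that zone, and the whole vector is then displayed in America/New_York. The names fixed and is_local are mine, and I am assuming that "EST" in the tz column means the clock time is local New York time.

```r
ts <- as.POSIXct(c("2017-04-27 13:44:00", "2017-03-10 12:22:00", "2017-03-22 10:24:00"),
                 tz = "UTC")
tz_flag <- c("UTC", "EST", "UTC")

# Changing the attribute alone never moves the instants
shifted <- ts
attr(shifted, "tzone") <- "America/New_York"
unchanged <- identical(as.numeric(ts), as.numeric(shifted))

# Re-read the mislabeled rows as New York wall-clock time
is_local <- tz_flag == "EST"
fixed <- ts
fixed[is_local] <- as.POSIXct(format(ts[is_local], "%Y-%m-%d %H:%M:%S"),
                              tz = "America/New_York")

# One display zone for the whole vector
attr(fixed, "tzone") <- "America/New_York"
ts_new <- format(fixed, "%Y-%m-%d %H:%M:%S")
```

Because the zone database decides the offset, this also gives the right answer for winter dates, when New York is 5 hours behind UTC.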

How to rearrange date and time

Could you please tell me how to rearrange the datetime of data set A to make it compatible with the datetime of data set B (which is in GMT+10 format)?
Thank you.
**data set A**
sitecode status start end
ANS0009 spike 11/09/2013 04:45:00 PM (GMT+11) 11/09/2013 05:00:00 PM (GMT+11)
ARM0064 spike 05/03/2014 11:00:00 AM (GMT+10) 05/03/2014 11:15:00 AM (GMT+10)
BAS0059 dry 13/01/2013 00:00:00 AM (GMT+11) 29/03/2013 11:45:00 PM (GMT+11)
BAS0059 spike 11/03/2014 10:15:00 AM (GMT+10) 11/03/2014 10:30:00 AM (GMT+10)
BLC0097 failure 12/20/2012 05:00:00 PM (GMT+11) 12/31/2012 11:45:00 PM (GMT+11)
BLC0097 spike 24/12/2015 04:59:45 PM (GMT+10) 24/12/2015 05:01:50 PM (GMT+10)
**data set B**
sitecode status start end
EUM0056 record 2012-12-01 11:00:00 2013-10-06 01:45:00
EUM0056 missing 2013-10-06 01:45:00 2013-10-06 03:00:00
EUM0056 record 2013-10-06 03:00:00 2014-03-11 20:15:00
MDL0026 record 2012-12-07 11:00:00 2013-04-04 19:45:00
MDL0026 missing 2013-04-04 19:45:00 2014-02-27 23:00:00
MDL0026 record 2014-02-27 23:00:00 2014-10-05 01:45:00
We could use lubridate to parse multiple formats after splitting each string in two to separate the (GMT+...) suffix. A local time labeled GMT+11 is 11 hours ahead of GMT, so we subtract the extracted offset to get the time in GMT; adding 10 hours to that result then gives the GMT+10 clock time used by data set B.
library(lubridate)
library(stringr)
v1 <- strsplit(str1, "\\s+(?=\\()", perl = TRUE)[[1]]
# hours() needs a number, so convert the extracted offset first
parse_date_time(v1[1], c("%d/%m/%Y %I:%M:%S %p", "%m/%d/%Y %I:%M:%S %p"),
tz= "GMT", exact = TRUE) - lubridate::hours(as.numeric(str_extract(v1[2], "\\d+")))
#[1] "2013-09-11 05:45:00 GMT"
Using the full dataset example
datA[c("start", "end")] <- lapply(datA[c("start", "end")], function(x){
m1 <- do.call(rbind, strsplit(x, "\\s+(?=\\()", perl = TRUE))
parse_date_time(m1[,1], c("%d/%m/%Y %I:%M:%S %p", "%m/%d/%Y %I:%M:%S %p"),
tz = "GMT", exact = TRUE) - lubridate::hours(as.numeric(str_extract(m1[,2], "\\d+"))
)})
data
str1 <- "11/09/2013 04:45:00 PM (GMT+11)"
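For reference, a base R sketch of the same idea that goes straight to GMT+10 clock time. The helper to_gmt10() is a hypothetical name, it assumes day/month/year order and an English locale for %p, and the result is labeled UTC only because base R needs some zone; read it as GMT+10 wall-clock time.

```r
# Hypothetical helper: shift "(GMT+N)" clock times to GMT+10 clock time
to_gmt10 <- function(x, format = "%d/%m/%Y %I:%M:%S %p") {
  # Hours ahead of GMT, taken from the "(GMT+N)" suffix
  offset_h <- as.numeric(sub(".*\\(GMT\\+(\\d+)\\).*", "\\1", x))
  # Read the clock time as written; trailing text is ignored by the parser
  wall <- as.POSIXct(x, format = format, tz = "UTC")
  # A GMT+11 clock runs one hour ahead of a GMT+10 clock
  wall - (offset_h - 10) * 3600
}

shifted <- to_gmt10(c("11/09/2013 04:45:00 PM (GMT+11)",
                      "05/03/2014 11:00:00 AM (GMT+10)"))
shifted_text <- format(shifted, "%Y-%m-%d %H:%M:%S")
```

Rows written month-first would need a second format, as in the parse_date_time() call above.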
require(lubridate)
exampleA <- c("11/09/2013 04:45:00 PM (GMT+11)",
"11/09/2013 04:45:00 PM (GMT+10)")
exampleA <- as.data.frame(exampleA)
# flag the rows that are one hour ahead of GMT+10
exampleA$flag <- 0
exampleA$flag[grep("\\(GMT\\+11\\)", exampleA$exampleA)] <- 1
# strip only the offset suffix so that AM/PM is still parsed
exampleA$exampleA <- gsub(" \\(GMT\\+1[01]\\)", "", exampleA$exampleA)
exampleA$exampleA <- mdy_hms(exampleA$exampleA)
# shift only the flagged rows back by one hour
exampleA$exampleA[exampleA$flag == 1] <- exampleA$exampleA[exampleA$flag == 1] - 3600
exampleB <- c("2013-11-09 15:45:00", "2013-11-09 16:45:00")
exampleB <- ymd_hms(exampleB)
# Proof it works
exampleA$exampleA == exampleB
[1] TRUE TRUE
If you have a mix of formats in one data set (i.e. mdy, dmy, etc.) you can deal with this by using if statements, either in a function which you can apply or in a for loop, and test whether a certain position has a value > 12 to determine the format, then use the appropriate lubridate function to convert it.
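A rough base R sketch of that test: if the second field is greater than 12 it cannot be a month, so the string must be month-first. The function name parse_mixed_date() is made up, and treating ambiguous strings (both fields 12 or less) as day-first is an assumption you would need to check against your own data.

```r
# Hypothetical helper: choose d/m/Y or m/d/Y per element, then parse
parse_mixed_date <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  second <- as.integer(vapply(parts, `[`, character(1), 2))
  # Second field > 12 means it is a day, so the string is month-first
  fmt <- ifelse(second > 12, "%m/%d/%Y", "%d/%m/%Y")
  days <- vapply(seq_along(x),
                 function(i) as.numeric(as.Date(x[i], format = fmt[i])),
                 numeric(1))
  as.Date(days, origin = "1970-01-01")
}

parsed <- parse_mixed_date(c("24/12/2015", "12/20/2012", "11/09/2013"))
parsed_text <- format(parsed)
```

The last value is ambiguous and comes out as 11 September only because of the day-first default.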

Resources