Epoch time and local time, different time zone - r

I want to convert epoch time to local time. As you can see below, I have different time zones and I want the local time for each row. How can I do the conversion taking each time zone into account?
df <- data.frame(Epoch_Time = c(1460230930, 1460231830, 1459929664),
                 Time_Zone = c("UTC−12:00", "UTC+10:00", "UTC-9:00"))

You need to store your epoch time as a POSIX date-time; then you can manipulate it more easily.
library(dplyr)
library(lubridate)
df <- tibble(
  time_epoch = as.POSIXct(
    c(1460230930, 1460231830, 1459929664), tz = "UTC", origin = "1970-01-01"
  ),
  time_zone = c("UTC-12:00", "UTC+10:00", "UTC-09:00")
)
df <- mutate(df,
  time_zone  = as.numeric(substr(time_zone, 4, 6)),
  time_local = as.character(time_epoch + hours(time_zone))
)
df
# # A tibble: 3 x 3
#   time_epoch          time_zone time_local
#   <dttm>                  <dbl> <chr>
# 1 2016-04-09 21:42:10       -12 2016-04-09 11:42:10
# 2 2016-04-09 21:57:10        10 2016-04-10 09:57:10
# 3 2016-04-06 10:01:04        -9 2016-04-06 03:01:04
Notes:
- I haven't put the effort in to properly generalise the conversion from your UTC offset strings, only enough for this example. Ideally you want Olson names instead of offsets; OlsonNames() lists the ones R knows about.
- time_local is stored as character: you cannot store a date/time column with multiple time zones, because the time zone is a single attribute of the whole column; see attributes(df$time_epoch).
attributes(df$time_epoch)
# $class
# [1] "POSIXct" "POSIXt"
#
# $tzone
# [1] "UTC"

Related

dplyr::if_else changes datetime (POSIXct) values

I'm working with a dataset that has a lot of timestamps. There are some invalid timestamps which I try to identify and set to NA. Because if_else() forces me to have the same data type in both arms, I'm using as.POSIXct(NA) to encode such missing values.
Interestingly, the results differ when I invert the test (and swap the true and false arguments) in if_else().
Here is some code to illustrate my problems:
x <- tibble(
  A = parse_datetime("2020-08-18 19:00"),
  B = if_else(TRUE, A, as.POSIXct(NA)),
  C = if_else(FALSE, as.POSIXct(NA), A)
)
> x
# A tibble: 1 x 3
  A                   B                   C
  <dttm>              <dttm>              <dttm>
1 2020-08-18 19:00:00 2020-08-18 19:00:00 2020-08-18 21:00:00
Any idea why C is two hours later?
Follow-up:
Based on the great answers below, I think a more readable solution is to generate a missing datetime object with parse_datetime(NA_character_) and use that in the code instead of as.POSIXct(NA).
R> NA_datetime_ <- parse_datetime(NA_character_)
R> x <- tibble(
     A = parse_datetime("2020-08-18 19:00"),
     B = if_else(TRUE, A, NA_datetime_),
     C = if_else(FALSE, NA_datetime_, A)
   )
R> map(x, lubridate::tz)
$A
[1] "UTC"
$B
[1] "UTC"
$C
[1] "UTC"
First, note that parse_datetime() returns a date-time object whose tzone attribute defaults to UTC. You can check this with lubridate::tz(x$A) or attributes(x$A).
The documentation of if_else() says that the true and false arguments must be the same type, and that all other attributes are taken from true. Hence, in part C of your tibble:
C = if_else(FALSE, as.POSIXct(NA), A)
as.POSIXct(NA) doesn't carry a tzone attribute, so A's tzone is dropped and the result falls back to your local time zone. C is not actually two hours later: the three columns hold the same instant but print in different time zones. To fix it, give the NA an explicit tzone attribute, i.e. replace it with
as.POSIXct(NA_character_, tz = "UTC")
Note: you must use NA_character_ instead of NA, because the tz argument in as.POSIXct() only works on character objects.
Finally, revise your code as follows:
x <- tibble(
  A = parse_datetime("2020-08-18 19:00"),
  B = if_else(TRUE, A, as.POSIXct(NA_character_, tz = "UTC")),
  C = if_else(FALSE, as.POSIXct(NA_character_, tz = "UTC"), A)
)
# # A tibble: 1 x 3
#   A                   B                   C
#   <dttm>              <dttm>              <dttm>
# 1 2020-08-18 19:00:00 2020-08-18 19:00:00 2020-08-18 19:00:00
Remember to check their time zones.
R> lubridate::tz(x$A)
[1] "UTC"
R> lubridate::tz(x$B)
[1] "UTC"
R> lubridate::tz(x$C)
[1] "UTC"
This is a time zone problem:
lubridate::tz(x$A)
[1] "UTC"
lubridate::tz(x$B)
[1] "UTC"
lubridate::tz(x$C)
[1] ""
This is due to the way if_else(condition, true, false) works: it takes its attributes from the true argument, which for C is as.POSIXct(NA) and carries no tzone.
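A minimal check of that rule, using two explicit time zones instead of NA (my own illustration, assuming dplyr's documented behaviour that the output's attributes follow true):
library(dplyr)
a <- as.POSIXct("2020-08-18 19:00:00", tz = "UTC")
b <- as.POSIXct("2020-08-18 19:00:00", tz = "Europe/Berlin")
attr(if_else(TRUE,  a, b), "tzone")  # expected "UTC": attributes come from true (= a)
attr(if_else(FALSE, b, a), "tzone")  # expected "Europe/Berlin": attributes come from true (= b)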

How to clean a time column in r

I have a time column in R as:
22:34:47
06:23:15
7:35:15
5:45
How can I make all the time values in the column follow the hh:mm:ss format? I have used
as_date(a$time, tz = NULL) but I am not able to get the format I want.
Here is an option with parse_date_time, which can take multiple formats:
library(lubridate)
format(parse_date_time(time, c("HMS", "HM"), tz = "GMT"), "%H:%M:%S")
#[1] "22:34:47" "06:23:15" "07:35:15" "05:45:00"
data
time <- c("22:34:47", "06:23:15", "7:35:15", "5:45")
Nothing a bit of formatting can't take care of:
x <- c("22:34:47","06:23:15","7:35:15","5:45")
format(
pmax(
as.POSIXct(x, format="%T", tz="UTC"),
as.POSIXct(x, format="%R", tz="UTC"), na.rm=TRUE
),
"%T"
)
#[1] "22:34:47" "06:23:15" "07:35:15" "05:45:00"
The pmax() means that where seconds are present, that parse is preferred over the hh:mm-only one.
You could go functional if you wanted a similar result with less typing and an easier path to a reusable function:
do.call(pmax, c(lapply(c("%T", "%R"), as.POSIXct, x = x, tz = "UTC"), na.rm = TRUE))
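A sketch of wrapping that idea into a reusable helper (the function name and defaults are my own, not from the answer):
parse_times <- function(x, formats = c("%T", "%R"), tz = "UTC") {
  # parse with each candidate format, then take the element-wise maximum so
  # the more complete parse (with seconds) wins over the hh:mm-only parse
  parsed <- lapply(formats, function(f) as.POSIXct(x, format = f, tz = tz))
  do.call(pmax, c(parsed, na.rm = TRUE))
}
format(parse_times(x), "%T")
#[1] "22:34:47" "06:23:15" "07:35:15" "05:45:00"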
Using a tidyverse approach with dplyr and hms verbs.
library(dplyr)
library(hms)
a <- tibble(time = c("22:34:47", "06:23:15", "7:35:15", "5:45"))

a %>%
  mutate(
    time = case_when(
      is.na(parse_hms(time)) ~ parse_hm(time),
      TRUE ~ parse_hms(time)
    )
  )
# # A tibble: 4 x 1
#   time
#   <time>
# 1 22:34
# 2 06:23
# 3 07:35
# 4 05:45
Note that case_when() could be replaced with dplyr::if_else(). The reason for the conditional is that parse_hms() returns NA for values that have no seconds component.
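An alternative sketch of my own uses coalesce() to fall back to parse_hm() whenever parse_hms() fails; both arms are hms, so the class should be preserved. As with the case_when() version, parse_hm() may emit parsing warnings for the values that do contain seconds.
a %>%
  mutate(time = coalesce(parse_hms(time), parse_hm(time)))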
If you also want the output to be a POSIXct value, you can adapt the case_when() solution accordingly:
a %>%
  mutate(
    time = case_when(
      is.na(parse_hms(time)) ~ as.POSIXct(parse_hm(time)),
      TRUE ~ as.POSIXct(parse_hms(time))
    )
  )
# # A tibble: 4 x 1
#   time
#   <dttm>
# 1 1970-01-01 22:34:47
# 2 1970-01-01 06:23:15
# 3 1970-01-01 07:35:15
# 4 1970-01-01 05:45:00
Note that this sets the date to the origin, which is 1970-01-01 by default.
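If you need a real calendar date rather than the 1970-01-01 origin, one sketch is to add the parsed time of day to a chosen date (the date below is an assumption for illustration):
a %>%
  mutate(
    time = case_when(
      is.na(parse_hms(time)) ~ parse_hm(time),
      TRUE ~ parse_hms(time)
    ),
    # an hms value is a number of seconds, so it can be added to a midnight POSIXct
    datetime = as.POSIXct("2020-01-15", tz = "UTC") + time
  )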

Vectorised time zone conversion with lubridate

I have a data frame with a column of date-time strings:
library(tidyverse)
library(lubridate)
testdf = data_frame(
  mytz = c('Australia/Sydney', 'Australia/Adelaide', 'Australia/Perth'),
  mydt = c('2018-01-17T09:15:00', '2018-01-17T09:16:00', '2018-01-17T09:18:00'))
testdf
# A tibble: 3 x 2
#   mytz               mydt
#   <chr>              <chr>
# 1 Australia/Sydney   2018-01-17T09:15:00
# 2 Australia/Adelaide 2018-01-17T09:16:00
# 3 Australia/Perth    2018-01-17T09:18:00
I want to convert these date-time strings to POSIX date-time objects with their respective timezones:
testdf %>% mutate(mydt_new = ymd_hms(mydt, tz = mytz))
Error in mutate_impl(.data, dots) :
Evaluation error: tz argument must be a single character string.
In addition: Warning message:
In if (tz != "UTC") { :
the condition has length > 1 and only the first element will be used
I get the same result if I use ymd_hms without a timezone and pipe it into force_tz. Is it fair to conclude that lubridate doesn't support any sort of vectorisation when it comes to timezone operations?
Another option is map2(). It may be better to store the output with different time zones in a list column, as a flat datetime vector would get coerced to a single tz.
library(tidyverse)
out <- testdf %>%
  mutate(mydt_new = map2(mydt, mytz, ~ ymd_hms(.x, tz = .y)))
If required, it can be unnested:
out %>%
  unnest
The values in the list are:
out %>%
  pull(mydt_new)
#[[1]]
#[1] "2018-01-17 09:15:00 AEDT"
#[[2]]
#[1] "2018-01-17 09:16:00 ACDT"
#[[3]]
#[1] "2018-01-17 09:18:00 AWST"
The error "tz argument must be a single character string" indicates that more than one time zone is being passed to ymd_hms(). To make sure only one time zone goes into the function at a time, I used rowwise(). Note that I am not in an Australian time zone, so I am not sure whether my output is identical to yours.
testdf <- data_frame(mytz = c('Australia/Sydney', 'Australia/Adelaide', 'Australia/Perth'),
                     mydt = c('2018-01-17 09:15:00', '2018-01-17 09:16:00', '2018-01-17 09:18:00'))

testdf %>%
  rowwise %>%
  mutate(mydt_new = ymd_hms(mydt, tz = mytz))

  mytz               mydt                mydt_new
  <chr>              <chr>               <dttm>
1 Australia/Sydney   2018-01-17 09:15:00 2018-01-17 06:15:00
2 Australia/Adelaide 2018-01-17 09:16:00 2018-01-17 06:46:00
3 Australia/Perth    2018-01-17 09:18:00 2018-01-17 09:18:00
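If expressing every row in one output zone is acceptable (a flat POSIXct column can only carry a single tzone), lubridate's force_tzs() is vectorised over time zones. A sketch, reusing the testdf above and returning everything in UTC:
testdf %>%
  mutate(mydt_utc = force_tzs(ymd_hms(mydt),        # parse the clock times (as UTC)
                              tzones = mytz,        # reinterpret each in its own zone
                              tzone_out = "UTC"))   # express the instants in one output zone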

Convert serial number to character representation in R

I am importing some weather data, but the timestamp is split across different columns. I want to join these columns and create POSIXct objects from them.
datenum <- c()
for (i in 1:dim(weather)[1]) {
  # different columns of data
  date_string <- paste0(weather$Year.UTC[i], '-', weather$Month.UTC[i], '-',
                        weather$Day.UTC[i], '-', weather$Hour.UTC[i])
  # for i = 1, date_string = "2012-12-31-23"
  datenum[i] <- as.POSIXct(date_string, format = "%Y-%m-%d-%H", tz = "GMT", origin = "1960-01-01")
  # for i = 1, datenum[1] = 1356994800 (numeric)
}
as.Date(datenum[1], origin = "1960-01-01")
# Gives character = "7285-07-27"
To visually confirm that I am doing it right, I would like to see a string in the form "yyyy-mm-dd HH:MM:SS", which is what I try to obtain with as.Date. The origin is the same when converting to a serial number and back to a character, but the date is completely wrong. What am I doing wrong?
Why so complicated?
weather <- data.frame(Year.UTC = c(2012, 2013),
                      Month.UTC = c(1, 2),
                      Day.UTC = c(1, 2),
                      Hour.UTC = c(22, 23))

weather <- within(weather, datetime <-
  as.POSIXct(paste(Year.UTC, Month.UTC, Day.UTC, Hour.UTC, sep = "-"),
             format = "%Y-%m-%d-%H", tz = "UTC"))

#  Year.UTC Month.UTC Day.UTC Hour.UTC            datetime
#1     2012         1       1       22 2012-01-01 22:00:00
#2     2013         2       2       23 2013-02-02 23:00:00
As you see, you don't need a loop at all.
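For completeness, a minimal sketch of why the original loop returned bare numbers: assigning into a plain vector with [<- keeps only the underlying seconds and drops the POSIXct class, and as.Date() on a plain number then interprets it as days since the origin, not seconds. Reinterpreting the number with as.POSIXct() recovers the timestamp:
datenum <- c()
datenum[1] <- as.POSIXct("2012-12-31 23:00", tz = "GMT")
class(datenum)   # "numeric" -- the POSIXct class has been dropped
as.POSIXct(datenum[1], origin = "1970-01-01", tz = "GMT")
# [1] "2012-12-31 23:00:00 GMT"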

How to change the format of a 13 digit number to a date in R

I imported a dataset into R and one of the variables is a date but it is showing as a 13 digit number as such 1269576000000.
How can I change this number into a date? I am not sure what the format should be, but I'm guessing that this number also contains time information (hours, minutes, seconds).
Is there any code to directly change the format of this variable in R?
Thanks.
The most common form would be the number of seconds since Jan 1, 1970; at least that is what the POSIX standard uses. Unlike Simon0101, I think you should be using as.POSIXct, because you will generally want to put such results in data frames, and POSIXlt objects get messed up in that environment. You apparently are being given the time in milliseconds, however:
> as.POSIXct(1269576000000, origin="1970-01-01")
[1] "42201-04-06 17:00:00 PDT" # not a sensible result
> as.POSIXct(1269576000000/1000, origin="1970-01-01")
[1] "2010-03-26 05:00:00 PDT"
So it was neither the number of fractional days nor seconds but rather the number of milliseconds since the origin.
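For a whole column, the same division by 1000 applies; a sketch, assuming a data frame dat with the 13-digit values in a column ts_ms (both names are mine, for illustration):
dat$timestamp <- as.POSIXct(dat$ts_ms / 1000, origin = "1970-01-01", tz = "UTC")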
You are looking for as.POSIXlt, which converts a numeric value to a date-time by treating it as the (possibly fractional) number of seconds that have passed since an origin date, which is why it is important to know which date is used as the origin (and whether it is counted as day 0 or day 1!) by whatever generated your data:
x <- 1269576000000
# Guessing at the origin
as.POSIXlt( x/1e3, tz="GMT", origin="1970-01-01")
[1] "2010-03-26 04:00:00 GMT"
And to display fractional seconds, set the option digits.secs, i.e.
options(digits.secs=3)
x <- 1269576000500
as.POSIXlt( x/1e3, tz="GMT", origin="1970-01-01")
[1] "2010-03-26 04:00:00.5 GMT"
Which can easily be added to a dataframe (I am not sure why #DWin thinks this is a problem):
x <- 1269576000000
x <- seq( x , by = 500 , length.out = 10 )
df <- data.frame( ID = 1:10 , Time = as.POSIXlt( x/1e3, tz="GMT", origin="1970-01-01") )
df
   ID                  Time
1   1 2010-03-26 04:00:00.0
2   2 2010-03-26 04:00:00.5
3   3 2010-03-26 04:00:01.0
4   4 2010-03-26 04:00:01.5
5   5 2010-03-26 04:00:02.0
6   6 2010-03-26 04:00:02.5
7   7 2010-03-26 04:00:03.0
8   8 2010-03-26 04:00:03.5
9   9 2010-03-26 04:00:04.0
10 10 2010-03-26 04:00:04.5

Resources