Vectorised time zone conversion with lubridate

I have a data frame with a column of date-time strings:
library(tidyverse)
library(lubridate)
testdf <- tibble(
mytz = c('Australia/Sydney', 'Australia/Adelaide', 'Australia/Perth'),
mydt = c('2018-01-17T09:15:00', '2018-01-17T09:16:00', '2018-01-17T09:18:00'))
testdf
# A tibble: 3 x 2
# mytz mydt
# <chr> <chr>
# 1 Australia/Sydney 2018-01-17T09:15:00
# 2 Australia/Adelaide 2018-01-17T09:16:00
# 3 Australia/Perth 2018-01-17T09:18:00
I want to convert these date-time strings to POSIX date-time objects with their respective timezones:
testdf %>% mutate(mydt_new = ymd_hms(mydt, tz = mytz))
Error in mutate_impl(.data, dots) :
Evaluation error: tz argument must be a single character string.
In addition: Warning message:
In if (tz != "UTC") { :
the condition has length > 1 and only the first element will be used
I get the same result if I use ymd_hms without a timezone and pipe it into force_tz. Is it fair to conclude that lubridate doesn't support any sort of vectorisation when it comes to timezone operations?

Another option is map2. A POSIXct vector carries a single tzone attribute, so it may be better to keep the results in a list column; combining them into one vector may coerce everything to a single time zone.
library(tidyverse)
out <- testdf %>%
mutate(mydt_new = map2(mydt, mytz, ~ymd_hms(.x, tz = .y)))
If required, the list column can be unnested (which may coerce the values to a single time zone):
out %>%
unnest(mydt_new)
The values in the list are
out %>%
pull(mydt_new)
#[[1]]
#[1] "2018-01-17 09:15:00 AEDT"
#[[2]]
#[1] "2018-01-17 09:16:00 ACDT"
#[[3]]
#[1] "2018-01-17 09:18:00 AWST"
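If a single regular column is preferred over a list column, one option is to store every value as the same instant in UTC. The following is a minimal base-R sketch of that idea; `to_utc_instant` is a hypothetical helper name, not a lubridate function. (Recent lubridate versions also appear to provide `force_tzs()` for this purpose; check your installed version before relying on it.)

```r
# Hypothetical helper: parse each string in its own time zone, keep the result
# as seconds since the epoch, then rebuild a single POSIXct vector in UTC.
to_utc_instant <- function(dt, tz) {
  secs <- mapply(
    function(x, z) as.numeric(as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S", tz = z)),
    dt, tz
  )
  as.POSIXct(unname(secs), origin = "1970-01-01", tz = "UTC")
}

mytz <- c("Australia/Sydney", "Australia/Adelaide", "Australia/Perth")
mydt <- c("2018-01-17T09:15:00", "2018-01-17T09:16:00", "2018-01-17T09:18:00")

utc <- to_utc_instant(mydt, mytz)
format(utc, "%Y-%m-%d %H:%M:%S")
# [1] "2018-01-16 22:15:00" "2018-01-16 22:46:00" "2018-01-17 01:18:00"
```

The original zone names can stay in the mytz column, so the local wall-clock time can be recovered whenever it is needed.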

The message "tz argument must be a single character string" indicates that more than one time zone was passed to ymd_hms(). To make sure that only one time zone is passed per call, I used rowwise(). Note that I am not in an Australian time zone, so mydt_new below is printed in my local time zone and may look different on your machine.
testdf <- tibble(mytz = c('Australia/Sydney', 'Australia/Adelaide', 'Australia/Perth'),
mydt = c('2018-01-17 09:15:00', '2018-01-17 09:16:00', '2018-01-17 09:18:00'))
testdf %>%
rowwise %>%
mutate(mydt_new = ymd_hms(mydt, tz = mytz))
mytz mydt mydt_new
<chr> <chr> <dttm>
1 Australia/Sydney 2018-01-17 09:15:00 2018-01-17 06:15:00
2 Australia/Adelaide 2018-01-17 09:16:00 2018-01-17 06:46:00
3 Australia/Perth 2018-01-17 09:18:00 2018-01-17 09:18:00
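The rowwise() output above is shown in one zone because a POSIXct vector has only one tzone attribute. If the goal is to display each row in its own zone, one option is to format the values into strings row by row. This is a base-R sketch under that assumption; the `utc` values are the three instants from the question, written out by hand.

```r
# The three instants from the question, stored in UTC.
utc <- as.POSIXct(
  c("2018-01-16 22:15:00", "2018-01-16 22:46:00", "2018-01-17 01:18:00"),
  tz = "UTC"
)
mytz <- c("Australia/Sydney", "Australia/Adelaide", "Australia/Perth")

# A POSIXct vector has exactly one time zone attribute for all elements.
attr(utc, "tzone")
# [1] "UTC"

# Format each element in its own zone; the result is character, not POSIXct.
local_label <- mapply(
  function(i, z) format(utc[i], format = "%Y-%m-%d %H:%M:%S", tz = z),
  seq_along(utc), mytz
)
local_label
# [1] "2018-01-17 09:15:00" "2018-01-17 09:16:00" "2018-01-17 09:18:00"
```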

Related

String to Date with leading X character

I'm trying to convert the Date column to date format but I keep getting an error. I think the problem might be that the date is a character and has an X before the year:
HMC.Close Date
1 39.71 X2007.01.03
2 40.04 X2007.01.04
3 38.67 X2007.01.05
4 38.89 X2007.01.08
5 38.91 X2007.01.09
6 37.94 X2007.01.10
This is the code I've been running:
stock_honda <- expand.grid("HMC" = HMC$HMC.Close) %>%
"Date" = as.Date(row.names(as.data.frame(HMC))) %>%
subset(Date >"2021-02-28" & Date < "2022-03-11")
Error in charToDate(x) :
character string is not in a standard unambiguous format

You can use gsub to first remove the "X" that is causing the problem and then use ymd from the lubridate package to convert the strings into dates. You can also do the conversion inside mutate(across(...)) from dplyr to keep everything in tidyverse style.
library(dplyr)
library(lubridate)
df |>
# Mutate Date to remove X and convert it to Date
mutate(across(Date, function(x){
ymd(gsub("X","", x))
}))
# HMC.Close Date
#1 39.71 2007-01-03
#2 40.04 2007-01-04
#3 38.67 2007-01-05
#4 38.89 2007-01-08
#5 38.91 2007-01-09
#6 37.94 2007-01-10
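The same conversion can be sketched in base R without extra packages. The vector `dates_raw` below is a hand-typed copy of the first values in the question; both variants rely only on as.Date() with an explicit format.

```r
dates_raw <- c("X2007.01.03", "X2007.01.04", "X2007.01.05")

# Variant 1: strip the leading "X", then parse year.month.day.
d1 <- as.Date(sub("^X", "", dates_raw), format = "%Y.%m.%d")

# Variant 2: treat the "X" as a literal character in the format string.
d2 <- as.Date(dates_raw, format = "X%Y.%m.%d")

d1
# [1] "2007-01-03" "2007-01-04" "2007-01-05"
identical(d1, d2)
# [1] TRUE
```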

Here is a pipeline that avoids prepending "X" to the dates in the first place:
library(quantmod)
getSymbols(c("FCAU.VI", "TYO", "VWAGY", "HMC"), na.rm = TRUE)
library(tidyverse)
stock_honda <- (HMC
%>% as.data.frame()
%>% rownames_to_column("Date")
%>% select(Date, HMC.Close)
%>% mutate(across(Date, lubridate::ymd))
%>% filter(between(Date, as.Date("2021-02-28"), as.Date("2022-03-11")))
)
It would be nice if there were a version of between that avoided the need to explicitly convert to dates. (filter("2021-02-28" < Date, Date < "2022-03-11") would also work for the last step.)
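The question does not show where the "X" came from. A likely cause (an assumption here) is that the dates were used as column names at some point, because data.frame() sanitises names with make.names() by default. A small sketch of that behaviour:

```r
# make.names() replaces invalid characters with "." and prefixes an "X"
# when a name would otherwise start with a digit.
make.names(c("2007-01-03", "HMC.Close"))
# [1] "X2007.01.03" "HMC.Close"

# data.frame() applies the same rule unless check.names = FALSE.
df_checked <- data.frame(`2007-01-03` = 39.71)
names(df_checked)
# [1] "X2007.01.03"

df_raw <- data.frame(`2007-01-03` = 39.71, check.names = FALSE)
names(df_raw)
# [1] "2007-01-03"
```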

R: readxl and date format

I read in an Excel file in which one column contains dates in different formats: Excel serial numbers (e.g. 43596) and text (e.g. "01.01.2020").
To convert the Excel serial numbers one can use as.Date(as.numeric(df$date), origin = "1899-12-30").
To convert the text one can use as.Date(df$date, format = "%d.%m.%Y").
These work for individual values, but when I try ifelse as:
df$date <- ifelse(length(df$date)==5,
as.Date(as.numeric(df$date), origin = "1899-12-30"),
as.Date(df$date, format = "%d.%m.%Y"))
or a for loop:
for (i in length(x)) {
if(nchar(x[i])==5) {
y[i] <- as.Date(as.numeric(x[i]), origin = "1899-12-30")
} else {x[i] <- as.Date(x[i], , format = "%d.%m.%Y"))}
} print(x)
It does not work because of:
"character string is not in a standard unambiguous format"
Maybe you could advise a better way to convert the different date formats into a single one?

I have two solutions for it.
The first is to change the code, which I don't like because it depends on the Excel date formats:
> df <- tibble(date = c("01.01.2020","43596"))
>
> df$date <- as.Date(ifelse(nchar(df$date)==5,
+ as.Date(as.numeric(df$date), origin = "1899-12-30"),
+ as.Date(df$date, format = "%d.%m.%Y")), origin = "1970-01-01")
Warning message:
In as.Date(as.numeric(df$date), origin = "1899-12-30") :
NAs introduced by coercion
>
> df$date
[1] "2020-01-01" "2019-05-11"
>
The second is to save the document as CSV and read it with the read_csv() function from the readr package, which sidesteps the problem.
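The outer as.Date(..., origin = "1970-01-01") in the first solution is needed because base ifelse() drops the Date class and returns plain day counts. A short sketch of that behaviour, using an arbitrary example date:

```r
d <- as.Date("2020-01-01")

# ifelse() keeps the values but strips attributes such as the class.
res <- ifelse(TRUE, d, d)
class(res)
# [1] "numeric"
res
# [1] 18262

# Converting back requires the Unix epoch as origin.
as.Date(res, origin = "1970-01-01")
# [1] "2020-01-01"
```

ifelse() also evaluates both branches on the whole vector, which is why the first solution emits a coercion warning for the text dates.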

You could use sapply to apply ifelse to each value:
df$date <- as.Date(sapply(df$date,function(date) ifelse(nchar(date)==5,
as.Date(as.numeric(date), origin = "1899-12-30"),
as.Date(date, format = "%d.%m.%Y"))),
origin="1970-01-01")
df
# A tibble: 6 x 2
contract date
<dbl> <date>
1 231429 2019-05-11
2 231437 2020-01-07
3 231449 2021-01-01
4 231459 2020-03-03
5 231463 2020-10-27
6 231466 2011-03-17
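A vectorised alternative is to split the column by shape and fill a Date vector in two assignments, which avoids ifelse() altogether. This is a minimal sketch; `parse_mixed_dates` is a hypothetical helper name, and it assumes that every all-digit entry is an Excel serial number and everything else is dd.mm.yyyy text.

```r
# Hypothetical helper: convert Excel serials and "dd.mm.yyyy" strings to Date.
parse_mixed_dates <- function(x) {
  out <- rep(as.Date(NA), length(x))   # start with an all-NA Date vector
  is_serial <- grepl("^[0-9]+$", x)    # assumption: digits only = Excel serial
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
  out[!is_serial] <- as.Date(x[!is_serial], format = "%d.%m.%Y")
  out
}

mixed <- c("43596", "07.01.2020", "01.01.2021")
parsed <- parse_mixed_dates(mixed)
parsed
# [1] "2019-05-11" "2020-01-07" "2021-01-01"
```

Because each branch only sees the entries it can handle, no coercion warnings are produced.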

A tidyverse solution using rowwise
library(dplyr)
library(lubridate)
df %>%
rowwise() %>%
mutate(date_new=as.Date(ifelse(grepl("\\.",date),
as.character(dmy(date)),
as.character(as.Date(as.numeric(date), origin="1899-12-30"))))) %>%
ungroup()
# A tibble: 6 × 3
contract date date_new
<dbl> <chr> <date>
1 231429 43596 2019-05-11
2 231437 07.01.2020 2020-01-07
3 231449 01.01.2021 2021-01-01
4 231459 03.03.2020 2020-03-03
5 231463 44131 2020-10-27
6 231466 40619 2011-03-17

dplyr::if_else changes datetime (POSIXct) values

I'm working with a dataset that has a lot of timestamps. There are some invalid timestamps which I try to identify and set to NA. Because if_else() forces me to have the same data type in both arms, I'm using as.POSIXct(NA) to encode such missing values.
Interestingly, the results differ when I invert the test (and change the true and false argument) in if_else().
Here is some code to illustrate my problems:
x <- tibble(
A = parse_datetime("2020-08-18 19:00"),
B = if_else(TRUE, A, as.POSIXct(NA)),
C = if_else(FALSE, as.POSIXct(NA), A)
)
> x
# A tibble: 1 x 3
A B C
<dttm> <dttm> <dttm>
1 2020-08-18 19:00:00 2020-08-18 19:00:00 2020-08-18 21:00:00
Any idea, why C is two hours later?
Follow-up:
Based on the great answers below, I think a more readable solution should perhaps generate a missing datetime object with parse_datetime(NA_character_) and use this in the code instead of as.POSIXct().
R> NA_datetime_ <- parse_datetime(NA_character_)
R> x <- tibble(
A = parse_datetime("2020-08-18 19:00"),
B = if_else(TRUE, A, NA_datetime_),
C = if_else(FALSE, NA_datetime_, A)
)
R> map(x, lubridate::tz)
$A
[1] "UTC"
$B
[1] "UTC"
$C
[1] "UTC"

First, you need to know that parse_datetime() returns a date-time object whose tzone attribute defaults to UTC. You can check this with lubridate::tz(x$A) and attributes(x$A).
The documentation of if_else() says that the true and false arguments must be the same type and that all other attributes are taken from true. Hence, in column C of your tibble:
C = if_else(FALSE, as.POSIXct(NA), A)
as.POSIXct(NA) doesn't have a tzone attribute, so A's tzone is dropped and the result falls back to your local time zone. C is not actually two hours later: the three columns hold the same instant but are displayed in different time zones. To fix it, give the NA a tzone attribute, i.e. replace as.POSIXct(NA) with
as.POSIXct(NA_character_, tz = "UTC")
Note: Use NA_character_ instead of NA here. With a logical NA, as.POSIXct() may ignore the tz argument (it did in the R version used for this answer), whereas character input is parsed with the requested time zone.
Finally, revise your code as
x <- tibble(
A = parse_datetime("2020-08-18 19:00"),
B = if_else(TRUE, A, as.POSIXct(NA_character_, tz = "UTC")),
C = if_else(FALSE, as.POSIXct(NA_character_, tz = "UTC"), A)
)
# # A tibble: 1 x 3
# A B C
# <dttm> <dttm> <dttm>
# 1 2020-08-18 19:00:00 2020-08-18 19:00:00 2020-08-18 19:00:00
Remember to check their time zones.
R > lubridate::tz(x$A)
[1] "UTC"
R > lubridate::tz(x$B)
[1] "UTC"
R > lubridate::tz(x$C)
[1] "UTC"
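The "same instant, different display" point can be sketched in base R. The zone "Europe/Berlin" is only an assumption, chosen because it is two hours ahead of UTC in August, like the asker's output.

```r
a <- as.POSIXct("2020-08-18 19:00:00", tz = "UTC")

# Copy the value and change only the display time zone.
b <- a
attr(b, "tzone") <- "Europe/Berlin"   # assumed local zone, for illustration

# The underlying instant (seconds since the epoch) is unchanged.
a == b
# [1] TRUE

# Only the printed wall-clock time differs.
format(a, "%Y-%m-%d %H:%M:%S")
# [1] "2020-08-18 19:00:00"
format(b, "%Y-%m-%d %H:%M:%S")
# [1] "2020-08-18 21:00:00"
```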

This is a time zone problem:
lubridate::tz(x$A)
[1] "UTC"
lubridate::tz(x$B)
[1] "UTC"
lubridate::tz(x$C)
[1] ""
This is due to the way if_else(condition, true, false) works: it takes its attributes from the true argument, which for C is as.POSIXct(NA) and has no time zone set.

Mutate and format multiple date columns [duplicate]

I have a tibble containing some date columns formatted as strings:
library(tidyverse)
df<-tibble(dates1 = c("2020-08-03T00:00:00.000Z", "2020-08-03T00:00:00.000Z"),
dates2 = c("2020-08-05T00:00:00.000Z", "2020-08-05T00:00:00.000Z"))
I want to convert the strings from YMD-HMS to DMY-HMS. Can someone explain to me why this doesn't work:
df %>%
mutate_at(vars(starts_with("dates")), as.Date, format="%d/%m/%Y %H:%M:%S")
Whereas this does?
df %>% mutate(dates1 = format(as.Date(dates1), "%d/%m/%Y %H:%M:%S")) %>%
mutate(dates2 = format(as.Date(dates2), "%d/%m/%Y %H:%M:%S"))
Finally, is it possible to assign these columns as 'datetime' columns (e.g. dttm) rather than chr once the date formatting has taken place?

The format argument you are passing goes to as.Date, whereas what you really want is to pass it to the format function. You can use an anonymous function for that, or the formula syntax:
library(dplyr)
df %>%
mutate(across(starts_with("dates"), ~format(as.Date(.), "%d/%m/%Y %H:%M:%S")))
# A tibble: 2 x 2
# dates1 dates2
# <chr> <chr>
#1 03/08/2020 00:00:00 05/08/2020 00:00:00
#2 03/08/2020 00:00:00 05/08/2020 00:00:00
R displays dates and date-times in a standard representation, Y-M-D H:M:S. You can change the representation with format, but the output is then character, as above. To get real date-time (dttm) columns, parse the strings instead:
df %>%
mutate(across(starts_with("dates"), lubridate::ymd_hms))
# dates1 dates2
# <dttm> <dttm>
#1 2020-08-03 00:00:00 2020-08-05 00:00:00
#2 2020-08-03 00:00:00 2020-08-05 00:00:00
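The difference between a parsed date-time and its formatted label can also be sketched in base R. The vector `iso` copies the first column of the question; stripping the fractional seconds and the trailing "Z" with sub() is a simplification that assumes all values are in UTC.

```r
iso <- c("2020-08-03T00:00:00.000Z", "2020-08-05T00:00:00.000Z")

# Parse: drop ".000Z" and read the rest as UTC. The result is a date-time.
stamp <- as.POSIXct(
  sub("\\.[0-9]+Z$", "", iso),
  format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"
)
class(stamp)
# [1] "POSIXct" "POSIXt"

# Format: choose any display layout. The result is character.
label <- format(stamp, "%d/%m/%Y %H:%M:%S")
label
# [1] "03/08/2020 00:00:00" "05/08/2020 00:00:00"
class(label)
# [1] "character"
```

Keeping the parsed column for computation and formatting only when printing or exporting avoids losing the date-time type.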

How to clean a time column in r

I have a time column in R as:
22:34:47
06:23:15
7:35:15
5:45
How can I make all the time values in the column follow the hh:mm:ss format? I have tried
as_date(a$time, tz=NULL), but I am not able to get the format I want.

Here is an option with parse_date_time which can take multiple formats
library(lubridate)
format(parse_date_time(time, c("HMS", "HM"), tz = "GMT"), "%H:%M:%S")
#[1] "22:34:47" "06:23:15" "07:35:15" "05:45:00"
data
time <- c("22:34:47", "06:23:15", "7:35:15", "5:45")

Nothing a bit of formatting can't take care of:
x <- c("22:34:47","06:23:15","7:35:15","5:45")
format(
pmax(
as.POSIXct(x, format="%T", tz="UTC"),
as.POSIXct(x, format="%R", tz="UTC"), na.rm=TRUE
),
"%T"
)
#[1] "22:34:47" "06:23:15" "07:35:15" "05:45:00"
The pmax means that a value parsed with seconds (%T) is taken in preference to the same value parsed as just hh:mm (%R); with na.rm=TRUE, entries that only match %R are still kept.
If you want less typing and something that is easier to turn into a reusable function, you could take a functional approach:
do.call(pmax, c(lapply(c("%T","%R"), as.POSIXct, x=x, tz="UTC"), na.rm=TRUE))
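Another way to reach the same result is to pad the short values with ":00" before parsing, so that a single format is enough. This is a base-R sketch; `pad_seconds` is a hypothetical helper name, and it assumes the only two shapes present are h:mm and h:mm:ss.

```r
# Hypothetical helper: normalise "h:mm" and "h:mm:ss" strings to "hh:mm:ss".
pad_seconds <- function(x) {
  hm_only <- grepl("^[0-9]{1,2}:[0-9]{2}$", x)   # values without seconds
  x[hm_only] <- paste0(x[hm_only], ":00")        # append zero seconds
  format(as.POSIXct(x, format = "%H:%M:%S", tz = "UTC"), "%H:%M:%S")
}

cleaned <- pad_seconds(c("22:34:47", "06:23:15", "7:35:15", "5:45"))
cleaned
# [1] "22:34:47" "06:23:15" "07:35:15" "05:45:00"
```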

Using a tidyverse approach with dplyr and the parsing functions from hms:
library(dplyr)
library(hms)
a <- tibble(time = c("22:34:47", "06:23:15", "7:35:15", "5:45"))
a %>%
mutate(
time = case_when(
is.na(parse_hms(time)) ~ parse_hm(time),
TRUE ~ parse_hms(time)
)
)
# # A tibble: 4 x 1
# time
# <time>
# 1 22:34
# 2 06:23
# 3 07:35
# 4 05:45
Note that case_when could be replaced with if_else (base ifelse would drop the hms class). The conditional is needed because parse_hms returns NA for values without seconds.
If you want the output to be a POSIXct value instead, you can adapt the previous solution:
a %>%
mutate(
time = case_when(
is.na(parse_hms(time)) ~ as.POSIXct(parse_hm(time)),
TRUE ~ as.POSIXct(parse_hms(time))
)
)
# # A tibble: 4 x 1
# time
# <dttm>
# 1 1970-01-01 22:34:47
# 2 1970-01-01 06:23:15
# 3 1970-01-01 07:35:15
# 4 1970-01-01 05:45:00
Note this will set the date to origin, which is 1970-01-01 by default.
