Convert date format from 31-May-2020 to time series data in R

My dates are formatted like 01-June-2020, and I would like to convert them to time series data in R. I tried as.Date, but it returns NAs.
Here is the data:
dput(head(TData))
structure(list(Date = c("31-May-20", "01-Jun-20", "02-Jun-20",
"03-Jun-20", "04-Jun-20", "07-Jun-20"), Price = c(7213.03, 7288.81,
7285.23, 7222.41, 7207.78, 7267.86), Open = c(7050.66, 7213.03,
7288.81, 7285.23, 7222.41, 7207.78), High = c(7338.96, 7288.81,
7321.36, 7311.85, 7207.78, 7277.7), Low = c(7149.71, 7202.14,
7277.63, 7202.39, 7129.25, 7233.67), Vol. = c("307.44M", "349.59M",
"343.52M", "286.85M", "234.18M", "225.87M"), `Change %` = c("2.30%",
"1.05%", "-0.05%", "-0.86%", "-0.20%", "0.83%")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))

We have to specify the format. By default, as.Date expects YYYY-MM-DD, i.e. %Y-%m-%d. Here the format is a 2-digit day (%d), followed by the abbreviated month name (%b) and a 2-digit year (%y):
TData$Date <- as.Date(TData$Date, '%d-%b-%y')
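As a minimal sketch of the same conversion on a plain character vector (the name raw_dates is made up for illustration; %b matches month names in the current locale, so this assumes an English or C locale):

```r
# Hypothetical input in the same layout as TData$Date:
# 2-digit day, abbreviated month name, 2-digit year
raw_dates <- c("31-May-20", "01-Jun-20", "02-Jun-20")

# The layout is not ISO (YYYY-MM-DD), so the format has to be given explicitly
parsed <- as.Date(raw_dates, format = "%d-%b-%y")

class(parsed)   # the result is a Date vector, not character
format(parsed)  # printed in the default ISO layout
```

If the month names are in another language than the session locale, the result would be NA, which is one common reason for unexpected NAs.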
If we want to create a time series object, one option is xts:
library(lubridate)
library(xts)
library(dplyr)
TData %>%
mutate(Date = dmy(Date)) %>%
select(Date, where(is.numeric)) %>%
{xts(.[-1], order.by = .$Date)}
Price Open High Low
2020-05-31 7213.03 7050.66 7338.96 7149.71
2020-06-01 7288.81 7213.03 7288.81 7202.14
2020-06-02 7285.23 7288.81 7321.36 7277.63
2020-06-03 7222.41 7285.23 7311.85 7202.39
2020-06-04 7207.78 7222.41 7207.78 7129.25
2020-06-07 7267.86 7207.78 7277.70 7233.67
Or we may use tsibble:
library(tsibble)
TData %>%
mutate(Date = dmy(Date)) %>%
select(Date, where(is.numeric)) %>%
as_tsibble(index = Date)
-output
# A tsibble: 6 x 5 [1D]
Date Price Open High Low
<date> <dbl> <dbl> <dbl> <dbl>
1 2020-05-31 7213. 7051. 7339. 7150.
2 2020-06-01 7289. 7213. 7289. 7202.
3 2020-06-02 7285. 7289. 7321. 7278.
4 2020-06-03 7222. 7285. 7312. 7202.
5 2020-06-04 7208. 7222. 7208. 7129.
6 2020-06-07 7268. 7208. 7278. 7234.

We can also use lubridate package functions. Since the months are stored as abbreviated month names, we use %b instead of %m, and %y because the years have two digits:
library(lubridate)
library(dplyr)
TData %>%
mutate(Date = as_date(Date, format = "%d-%b-%y"))
# A tibble: 6 x 7
Date Price Open High Low Vol. `Change %`
<date> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 2020-05-31 7213. 7051. 7339. 7150. 307.44M 2.30%
2 2020-06-01 7289. 7213. 7289. 7202. 349.59M 1.05%
3 2020-06-02 7285. 7289. 7321. 7278. 343.52M -0.05%
4 2020-06-03 7222. 7285. 7312. 7202. 286.85M -0.86%
5 2020-06-04 7208. 7222. 7208. 7129. 234.18M -0.20%
6 2020-06-07 7268. 7208. 7278. 7234. 225.87M 0.83%

Related

How to parse time-date (not date-time)

I have a dataframe with a few columns that contain time/ date information. I'm familiar with using lubridate to parse date-time (ie mm/dd/yyyy hh:mm:ss), but this dataframe has date time in reverse order (ie hh:mm:ss mm/dd/yyyy). How do I get this to read as a date/time? The column is currently reading as a character which is useless to me. Below is an example of what my dataset looks like. I can't make the "time_date" column read as a date -time.
df <- tribble(~activity, ~time_date,
"run", "15:06:17 03/08/2016",
"skip", "09:01:00 03/08/2016")
You should first convert it to a date-time with the right input format; after that you can use strftime to print it in the desired format (strftime returns character strings):
datetimes <- as.POSIXct(df$time_date, format = "%H:%M:%S %m/%d/%Y")
df$time_date <- strftime(datetimes, format = "%m/%d/%Y %H:%M:%S")
df
#> # A tibble: 2 × 2
#> activity time_date
#> <chr> <chr>
#> 1 run 03/08/2016 15:06:17
#> 2 skip 03/08/2016 09:01:00
Created on 2023-01-04 with reprex v2.0.2
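If the column should stay a date-time rather than text, a minimal sketch is to keep the POSIXct result and skip the strftime step. The tz = "UTC" argument is an assumption here; the question does not say which time zone applies.

```r
# Same strings as in the question: time first, then the date
time_date <- c("15:06:17 03/08/2016", "09:01:00 03/08/2016")

# Describe the input order in the format string; the result is POSIXct
parsed <- as.POSIXct(time_date, format = "%H:%M:%S %m/%d/%Y", tz = "UTC")

class(parsed)  # "POSIXct" "POSIXt"
diff(parsed)   # date-time arithmetic works because the class is kept
```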
With dplyr and lubridate on character class data.
library(dplyr)
library(lubridate)
df %>%
rowwise() %>%
mutate(dd = strsplit(time_date, " "),
date_time = mdy_hms(paste(unlist(dd)[2], unlist(dd)[1])),
dd = NULL) %>%
ungroup()
# A tibble: 2 × 3
activity time_date date_time
<chr> <chr> <dttm>
1 run 15:06:17 03/08/2016 2016-03-08 15:06:17
2 skip 09:01:00 03/08/2016 2016-03-08 09:01:00
Alternatively, using str_extract from stringr:
library(stringr)
df %>%
mutate(date_time = mdy_hms(paste(str_extract(time_date, " \\d+/.+"),
str_extract(time_date, "\\d+:.+ "))))
# A tibble: 2 × 3
activity time_date date_time
<chr> <chr> <dttm>
1 run 15:06:17 03/08/2016 2016-03-08 15:06:17
2 skip 09:01:00 03/08/2016 2016-03-08 09:01:00
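The same swap of the two halves can also be sketched in base R with a regular expression before parsing. This is an illustration rather than part of the answers above; it assumes exactly one space between the time and the date, and tz = "UTC" is an arbitrary choice.

```r
time_date <- c("15:06:17 03/08/2016", "09:01:00 03/08/2016")

# Capture "<time> <date>" and emit "<date> <time>"
swapped <- sub("^([^ ]+) ([^ ]+)$", "\\2 \\1", time_date)

# The swapped strings are now in the usual month/day/year hour:minute:second order
date_time <- as.POSIXct(swapped, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
```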
You can use lubridate::parse_date_time() and specify the order as "HMS mdy":
library(dplyr)
library(lubridate)
df %>%
mutate(date_time = parse_date_time(time_date, "HMS mdy"))
# A tibble: 2 × 3
activity time_date date_time
<chr> <chr> <dttm>
1 run 15:06:17 03/08/2016 2016-03-08 15:06:17
2 skip 09:01:00 03/08/2016 2016-03-08 09:01:00

Conditionally mutate column across list of dataframes in R

I am working with a large list of dataframes that use inconsistent date formats. I would like to conditionally mutate across the list so that any dataframe that contains a string will use one date format, and those that do not contain the string use another format. In other words, I want to distinguish between dataframes launched in year 2019 (which use mdy) and those launched in all others years (which use dmy).
The following code will conditionally mutate rows within a dataframe, but I am unsure how to conditionally mutate across the entire column.
dataframes %>% map(~.x %>%
mutate(date_time = if_else(str_detect(date_time, "/19 "),
mdy_hms(date_time), dmy_hms(date_time))))
Thank you!
edit
Data and code example. There are dataframes that contain a mixture of years.
library(tidyverse)
library(lubridate)
dataframes <- list(
tibble(date_time = c("07/06/19 01:00:00 PM", "07/06/20 01:00:00 PM"), num = 1:2), # July 6th
tibble(date_time = c("06/07/20 01:00:00 PM", "06/07/21 01:00:00 PM"), num = 1:2) # July 6th
)
dataframes %>%
map(~.x %>%
mutate(date_time = if_else(str_detect(date_time, "/19 "),
mdy_hms(date_time), dmy_hms(date_time)),
date = date(date_time),
month = month(date_time),
doy = yday(date_time)))
[[1]]
# A tibble: 2 × 5
date_time num date month doy
<dttm> <int> <date> <dbl> <dbl>
1 2019-07-06 13:00:00 1 2019-07-06 7 187
2 2020-06-07 13:00:00 2 2020-06-07 6 159
[[2]]
# A tibble: 2 × 5
date_time num date month doy
<dttm> <int> <date> <dbl> <dbl>
1 2020-07-06 13:00:00 1 2020-07-06 7 188
2 2021-07-06 13:00:00 2 2021-07-06 7 187
If you are trying to determine the format of the date column for the whole data.frame based on the presence of any date from 2019, then a small tweak of your code should work.
Instead of checking each record for the presence of "/19 ", set the condition of if_else() to any(str_detect(...)), which returns TRUE if any of the values match. However, the result of any() always has length 1, so you then need to rep() it to the number of rows in the data.frame using dplyr::n().
library(tidyverse)
library(lubridate)
dataframes <- list(
tibble(date_time = c("07/06/19 01:00:00 PM", "07/06/20 01:00:00 PM"), num = 1:2), # July 6th
tibble(date_time = c("06/07/20 01:00:00 PM", "06/07/21 01:00:00 PM"), num = 1:2) # July 6th
)
dataframes %>%
map( ~ .x %>%
mutate(
date_time = if_else(str_detect(date_time, "/19 ") %>%
any() %>%
rep(n()),
mdy_hms(date_time),
dmy_hms(date_time)),
date = date(date_time),
month = month(date_time),
doy = yday(date_time)
))
#> [[1]]
#> # A tibble: 2 × 5
#> date_time num date month doy
#> <dttm> <int> <date> <dbl> <dbl>
#> 1 2019-07-06 13:00:00 1 2019-07-06 7 187
#> 2 2020-07-06 13:00:00 2 2020-07-06 7 188
#>
#> [[2]]
#> # A tibble: 2 × 5
#> date_time num date month doy
#> <dttm> <int> <date> <dbl> <dbl>
#> 1 2020-07-06 13:00:00 1 2020-07-06 7 188
#> 2 2021-07-06 13:00:00 2 2021-07-06 7 187
Created on 2022-07-20 by the reprex package (v2.0.1)
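Because the decision is made once per data frame, it can also be written as an ordinary scalar if/else that picks a format string. The following is a base R sketch of that idea, not the answer's method; the helper name parse_one is hypothetical, tz = "UTC" is assumed, and %p relies on an English or C locale for "AM"/"PM".

```r
dataframes <- list(
  data.frame(date_time = c("07/06/19 01:00:00 PM", "07/06/20 01:00:00 PM"), num = 1:2),
  data.frame(date_time = c("06/07/20 01:00:00 PM", "06/07/21 01:00:00 PM"), num = 1:2)
)

# Hypothetical helper: choose one format for the whole data frame
parse_one <- function(d) {
  has_2019 <- any(grepl("/19 ", d$date_time, fixed = TRUE))
  fmt <- if (has_2019) {
    "%m/%d/%y %I:%M:%S %p"  # month/day/year when any 2019 record is present
  } else {
    "%d/%m/%y %I:%M:%S %p"  # day/month/year otherwise
  }
  d$date_time <- as.POSIXct(d$date_time, format = fmt, tz = "UTC")
  d
}

parsed <- lapply(dataframes, parse_one)
```

Only one parser runs per data frame here, whereas if_else() evaluates both mdy_hms() and dmy_hms() on every row before choosing.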

do() superseded! Alternative is to use across(), nest_by(), and summarise, how?

I'm doing something quite simple. Given a dataframe of start dates and end dates for specific periods I want to expand/create a full sequence for each period binned by week (with the factor for each row), then output this in a single large dataframe.
For instance:
library(tidyverse)
library(lubridate)
# Dataset
start_dates = ymd_hms(c("2019-05-08 00:00:00",
"2020-01-17 00:00:00",
"2020-03-03 00:00:00",
"2020-05-28 00:00:00",
"2020-12-10 00:00:00",
"2021-05-07 00:00:00",
"2022-01-04 00:00:00"), tz = "UTC")
end_dates = ymd_hms(c( "2019-10-24 00:00:00",
"2020-03-03 00:00:00",
"2020-05-28 00:00:00",
"2020-12-10 00:00:00",
"2021-05-07 00:00:00",
"2022-01-04 00:00:00",
"2022-01-19 00:00:00"), tz = "UTC")
df1 = data.frame(studying = paste0("period", 1:7), start_dates, end_dates)
It was suggested to me to use do(), which currently works fine, but I hate it when things are superseded. I also have a way of doing it using map2. But the documentation (https://dplyr.tidyverse.org/reference/do.html) suggests you can use nest_by(), across() and summarise() to do the same job as do(). How would I go about getting the same result? I've tried a lot of things but I just can't seem to get it.
# do() way to do it
df1 %>%
group_by(studying) %>%
do(data.frame(week=seq(.$start_dates,.$end_dates,by="1 week")))
# transmute() way to do it
df1 %>%
transmute(weeks = map2(start_dates, end_dates, seq, by = "1 week"), studying) %>%
unnest(cols = c(weeks))
As the documentation of ?do suggests, we can now use summarise and replace the . with across():
library(tidyverse)
library(lubridate)
df1 %>%
group_by(studying) %>%
summarise(week = seq(across()$start_dates,
across()$end_dates,
by = "1 week"))
#> `summarise()` has grouped output by 'studying'. You can override using the
#> `.groups` argument.
#> # A tibble: 134 x 2
#> # Groups: studying [7]
#> studying week
#> <chr> <dttm>
#> 1 period1 2019-05-08 00:00:00
#> 2 period1 2019-05-15 00:00:00
#> 3 period1 2019-05-22 00:00:00
#> 4 period1 2019-05-29 00:00:00
#> 5 period1 2019-06-05 00:00:00
#> 6 period1 2019-06-12 00:00:00
#> 7 period1 2019-06-19 00:00:00
#> 8 period1 2019-06-26 00:00:00
#> 9 period1 2019-07-03 00:00:00
#> 10 period1 2019-07-10 00:00:00
#> # … with 124 more rows
Created on 2022-01-19 by the reprex package (v0.3.0)
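In dplyr 1.1.0 and later, summarise() calls that return more than one row per group are deprecated in favour of reframe(), and the columns can be referenced directly without across(). A minimal sketch on a small made-up data frame (periods and its values are hypothetical; this requires dplyr >= 1.1.0):

```r
library(dplyr)

# Hypothetical, shortened version of df1 with two periods
periods <- data.frame(
  studying    = c("periodA", "periodB"),
  start_dates = as.Date(c("2020-01-01", "2020-02-01")),
  end_dates   = as.Date(c("2020-01-15", "2020-02-10"))
)

# reframe() may return any number of rows per group;
# .by groups for this call only, so the result is ungrouped
weeks <- periods %>%
  reframe(week = seq(start_dates, end_dates, by = "1 week"), .by = studying)

weeks
```

On older dplyr versions reframe() does not exist, so the summarise() form above remains the option there.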
You can also use tidyr::complete:
df1 %>%
group_by(studying) %>%
complete(start_dates = seq(from = start_dates, to = end_dates, by = "1 week")) %>%
select(-end_dates, weeks = start_dates)
# A tibble: 134 x 2
# Groups: studying [7]
studying weeks
<chr> <dttm>
1 period1 2019-05-08 00:00:00
2 period1 2019-05-15 00:00:00
3 period1 2019-05-22 00:00:00
4 period1 2019-05-29 00:00:00
5 period1 2019-06-05 00:00:00
6 period1 2019-06-12 00:00:00
7 period1 2019-06-19 00:00:00
8 period1 2019-06-26 00:00:00
9 period1 2019-07-03 00:00:00
10 period1 2019-07-10 00:00:00
# ... with 124 more rows
Although marked Experimental the help file for group_modify does say that
‘group_modify()’ is an evolution of ‘do()’
and, in fact, the code for the example in the question using group_modify is nearly the same as with do.
# with group_modify
df2 <- df1 %>%
group_by(studying) %>%
group_modify(~ data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))
# with do
df0 <- df1 %>%
group_by(studying) %>%
do(data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))
identical(df2, df0)
## [1] TRUE
Not sure if this is exactly what you are looking for, but here is my attempt with rowwise and unnest:
df1 %>%
rowwise() %>%
mutate(week = list(seq(start_dates, end_dates, by = "1 week"))) %>%
select(studying, week) %>%
unnest(cols = c(week))
Another approach:
library(tidyverse)
df1 %>%
group_by(studying) %>%
summarise(df = tibble(weeks = seq(start_dates, end_dates, by = 'week'))) %>%
unnest(df)
#> `summarise()` has grouped output by 'studying'. You can override using the `.groups` argument.
#> # A tibble: 134 × 2
#> # Groups: studying [7]
#> studying weeks
#> <chr> <dttm>
#> 1 period1 2019-05-08 00:00:00
#> 2 period1 2019-05-15 00:00:00
#> 3 period1 2019-05-22 00:00:00
#> 4 period1 2019-05-29 00:00:00
#> 5 period1 2019-06-05 00:00:00
#> 6 period1 2019-06-12 00:00:00
#> 7 period1 2019-06-19 00:00:00
#> 8 period1 2019-06-26 00:00:00
#> 9 period1 2019-07-03 00:00:00
#> 10 period1 2019-07-10 00:00:00
#> # … with 124 more rows
Created on 2022-01-20 by the reprex package (v2.0.1)

How to keep vector as date class in R when changing format of dates

I have the following vector of dates in R. I would like to convert it from character class into Date class, however I only want to display the dates as year-month (%Y-%m) instead of year-month-day (%Y-%m-%d)
library(tidyverse)
dates <- structure(list(Date = c("2022-03-24", "2022-04-21", "2022-05-24",
"2022-07-22", "2022-09-01")), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
dates %>%
mutate(Date = format(as.Date(Date), '%Y-%m'))
When using format to convert the dates to %Y-%m, is there a way to keep the class of the vector as <date> instead of character <chr>?
# A tibble: 5 x 1
Date
<chr>
1 2022-03-24
2 2022-04-21
3 2022-05-24
4 2022-07-22
5 2022-09-01
We may use floor_date to set the day of each date to the first of the month, so that the column stays a Date (a Date always includes a day, so it cannot hold only year-month):
library(lubridate)
library(dplyr)
dates %>%
mutate(Date2 = floor_date(ymd(Date), "month"))
-output
# A tibble: 5 × 2
Date Date2
<chr> <date>
1 2022-03-24 2022-03-01
2 2022-04-21 2022-04-01
3 2022-05-24 2022-05-01
4 2022-07-22 2022-07-01
5 2022-09-01 2022-09-01
Or another option is to assign the day
dates$Date2 <- ymd(dates$Date)
day(dates$Date2) <- 1
> dates
# A tibble: 5 × 2
Date Date2
<chr> <date>
1 2022-03-24 2022-03-01
2 2022-04-21 2022-04-01
3 2022-05-24 2022-05-01
4 2022-07-22 2022-07-01
5 2022-09-01 2022-09-01
A base R option -
You can replace the last two digits of the date with 01.
as.Date(sub('\\d{2}$', '01', dates$Date))
#[1] "2022-03-01" "2022-04-01" "2022-05-01" "2022-07-01" "2022-09-01"
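The common thread in these answers is to separate storage from display: keep a real Date (pinned to the first of the month) and apply the %Y-%m format only when printing or labelling. A minimal base R sketch, with made-up variable names:

```r
dates <- as.Date(c("2022-03-24", "2022-04-21", "2022-09-01"))

# Store: a Date pinned to the first day of each month
month_start <- as.Date(format(dates, "%Y-%m-01"))
class(month_start)  # still "Date", so sorting and date arithmetic keep working

# Display: format() returns character, so use it only at output time
labels <- format(month_start, "%Y-%m")
```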

Set up data in order to use Prophet() in R

I want to use the Prophet() function in R, but I cannot transform my column "YearWeek" into a Date column.
I have a column "YearWeek" that stores values from 201401 up to 201937 i.e. starting in 2014 week 1 up to 2019 week 37.
I don't know how to declare this column as a date in the form yyyy-ww needed to use the Prophet() function.
Does anyone know how to do this?
Thank you in advance.
One solution could be to append 01 to the end of your yyyyww values and parse the result with the format %Y%U%w.
Data:
library(tidyverse)
df <- cross2(2014:2019, str_pad(1:52, width = 2, pad = 0)) %>%
map_df(set_names, c("year", "week")) %>%
transmute(date = paste(year, week, sep = "")) %>%
arrange(date)
head(df)
#> # A tibble: 6 x 1
#> date
#> <chr>
#> 1 201401
#> 2 201402
#> 3 201403
#> 4 201404
#> 5 201405
#> 6 201406
Now let's append the 01 and convert to date:
df %>%
mutate(date = paste(date, "01", sep = ""),
new_date = as.Date(date, "%Y%U%w"))
#> # A tibble: 312 x 2
#> date new_date
#> <chr> <date>
#> 1 20140101 2014-01-05
#> 2 20140201 2014-01-12
#> 3 20140301 2014-01-19
#> 4 20140401 2014-01-26
#> 5 20140501 2014-02-02
#> 6 20140601 2014-02-09
#> 7 20140701 2014-02-16
#> 8 20140801 2014-02-23
#> 9 20140901 2014-03-02
#> 10 20141001 2014-03-09
#> # ... with 302 more rows
Created on 2019-10-10 by the reprex package (v0.3.0)
More information about the numeric week of the year (%U) can be found in ?strptime.
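A variation on the idea above, sketched under assumptions: %w and %u each describe the weekday as a single digit, so appending just one digit (here "1", Monday, with %u) avoids relying on how the extra character in "01" is handled. Prophet expects a data frame with a date column named ds and a numeric column named y; the y values below are placeholders, not real data.

```r
# Hypothetical YearWeek values in yyyyww form
year_week <- c("201401", "201402", "201937")

# Append a weekday digit (1 = Monday for %u) so each string names one specific day
ds <- as.Date(paste0(year_week, "1"), format = "%Y%U%u")

# Shape the result the way Prophet expects: columns ds and y
prophet_input <- data.frame(ds = ds, y = c(10, 12, 11))
prophet_input
```

%U counts weeks starting on Sunday, with the first Sunday of the year opening week 1; if the source data uses ISO weeks instead, the resulting dates would be shifted and a different conversion is needed.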

Resources