How to create and populate dummy rows in tidyverse? - r

I am working with some monthly data and I would like to convert it to daily data by creating and populating some dummy rows, as the question suggests.
For example, say I have the following data:
date index
2013-04-30 232
2013-05-31 232
2013-06-30 233
Is there an "easy" way, preferably through tidyverse, to convert the above data into daily data, assuming I keep the index constant throughout the month? For example, I would like to create another 29 rows for April, ranging from 2013-04-01 to 2013-04-29, with the index of the last day of the month, which would be 232 for April. The same should be applied to the rest of the months (I have more data than just those three months).
Any intuitive suggestions will be greatly appreciated :)

Using complete and fill from tidyr you could do:
dat <- structure(list(
date = structure(c(15825, 15856, 15886), class = "Date"),
index = c(232L, 232L, 233L)
), class = "data.frame", row.names = c(
NA,
-3L
))
library(tidyr)
dat |>
complete(date = seq(as.Date("2013-04-01"), as.Date("2013-06-30"), "day")) |>
fill(index, .direction = "up")
#> # A tibble: 91 × 2
#> date index
#> <date> <int>
#> 1 2013-04-01 232
#> 2 2013-04-02 232
#> 3 2013-04-03 232
#> 4 2013-04-04 232
#> 5 2013-04-05 232
#> 6 2013-04-06 232
#> 7 2013-04-07 232
#> 8 2013-04-08 232
#> 9 2013-04-09 232
#> 10 2013-04-10 232
#> # … with 81 more rows
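The date range above is hard-coded. A minimal sketch of the same approach that derives the range from the data instead, assuming lubridate is available and that every row falls on the last day of its month (the names all_days and daily are only illustrative):

```r
library(tidyr)
library(lubridate)

dat <- data.frame(
  date = as.Date(c("2013-04-30", "2013-05-31", "2013-06-30")),
  index = c(232L, 232L, 233L)
)

# Every day from the first day of the earliest month to the latest date
all_days <- seq(floor_date(min(dat$date), "month"), max(dat$date), by = "day")

daily <- dat |>
  complete(date = all_days) |>        # add the missing days with index = NA
  fill(index, .direction = "up")      # copy each month-end value backwards
```

Because the month-end rows are the only non-missing values, filling "up" gives every day the index of its own month end.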

Related

Set up data in order to use Prophet() in R

I want to use the Prophet() function in R, but I cannot convert my column "YearWeek" to a Date column with as.Date().
I have a column "YearWeek" that stores values from 201401 up to 201937 i.e. starting in 2014 week 1 up to 2019 week 37.
I don't know how to declare this column as a date in the form yyyy-ww needed to use the Prophet() function.
Does anyone know how to do this?
Thank you in advance.
One solution could be to append a 01 to the end of your yyyy-ww formatted dates.
Data:
library(tidyverse)
df <- cross2(2014:2019, str_pad(1:52, width = 2, pad = 0)) %>%
map_df(set_names, c("year", "week")) %>%
transmute(date = paste(year, week, sep = "")) %>%
arrange(date)
head(df)
#> # A tibble: 6 x 1
#> date
#> <chr>
#> 1 201401
#> 2 201402
#> 3 201403
#> 4 201404
#> 5 201405
#> 6 201406
Now let's append the 01 and convert to date:
df %>%
mutate(date = paste(date, "01", sep = ""),
new_date = as.Date(date, "%Y%U%w"))
#> # A tibble: 312 x 2
#> date new_date
#> <chr> <date>
#> 1 20140101 2014-01-05
#> 2 20140201 2014-01-12
#> 3 20140301 2014-01-19
#> 4 20140401 2014-01-26
#> 5 20140501 2014-02-02
#> 6 20140601 2014-02-09
#> 7 20140701 2014-02-16
#> 8 20140801 2014-02-23
#> 9 20140901 2014-03-02
#> 10 20141001 2014-03-09
#> # ... with 302 more rows
Created on 2019-10-10 by the reprex package (v0.3.0)
More information about the numeric week-of-year formats (%U, %W) can be found in ?strptime.
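Applied directly to a numeric YearWeek column, the same idea can be sketched as follows. The data frame weekly and its y values are made up for illustration, and the ds/y column names are the ones Prophet expects for its input. Since %w reads a single weekday digit, appending "0" (Sunday) appears to be enough to reproduce the dates shown above:

```r
# Hypothetical weekly data: YearWeek as an integer such as 201401
weekly <- data.frame(
  YearWeek = c(201401L, 201402L, 201937L),
  y = c(10, 12, 9)  # made-up values
)

# Append weekday 0 (Sunday) and parse year, week of year (%U) and weekday (%w)
weekly$ds <- as.Date(paste0(weekly$YearWeek, "0"), format = "%Y%U%w")

weekly[, c("ds", "y")]
```

With %U, week 1 starts on the first Sunday of the year, so these dates follow the US convention and not ISO 8601 week numbering.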

How to assign day of year values starting from an arbitrary date and take care of missing values?

I have an R dataframe df_demand with a date column (depdate) and a dependent variable column bookings. The duration is 365 days starting from 2017-11-02 and ending at 2018-11-01, sorted in ascending order.
We have booking data for only 279 days in the year.
dplyr::arrange(df_demand, depdate)
depdate bookings
1 2017-11-02 43
2 2017-11-03 27
3 2017-11-05 27
4 2017-11-06 22
5 2017-11-07 39
6 2017-11-08 48
.
.
279 2018-11-01 60
I want to introduce another column day_of_year in the following way:
depdate day_of_year bookings
1 2017-11-02 1 43
2 2017-11-03 2 27
3 2017-11-04 3 NA
4 2017-11-05 4 27
.
.
.
365 2018-11-01 365 60
I am trying to find the best possible way to do this.
In Python, I could use something like :
df_demand['day_of_year'] = df_demand['depdate'].sub(df_demand['depdate'].iat[0]).dt.days + 1
I wanted to know about an R equivalent of the same.
When I run
typeof(df_demand_2$depdate)
the output is
"double"
Am I missing something?
You can create a row for every date using the complete function from the tidyr package.
First, I'm creating a data frame with some sample data:
df <- data.frame(
depdate = as.Date(c('2017-11-02', '2017-11-03', '2017-11-05')),
bookings = c(43, 27, 27)
)
Next, I'm performing two operations. First, using tidyr::complete, I'm specifying all the dates I want in my analysis. I can do that using seq.Date, creating a sequence from the first to the last day.
Once that is done, the day_of_year column is simply equal to the row number.
df_complete <- tidyr::complete(df,
depdate = seq.Date(from = min(df$depdate), to = max(df$depdate), by = 1)
)
df_complete$day_of_year <- 1:nrow(df_complete)
> df_complete
#> # A tibble: 4 x 3
#> depdate bookings day_of_year
#> <date> <dbl> <int>
#> 1 2017-11-02 43 1
#> 2 2017-11-03 27 2
#> 3 2017-11-04 NA 3
#> 4 2017-11-05 27 4
An equivalent solution with the pipe operator (re-exported by dplyr from magrittr):
library(dplyr)
library(tidyr)
df %>%
complete(depdate = seq.Date(from = min(df$depdate), to = max(df$depdate), by = 1)) %>%
mutate(day_of_year = row_number())
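For a direct R counterpart of the Python line in the question, the day offset can also be computed by subtracting dates, without adding rows. This is a minimal sketch on a small stand-in for df_demand; it also shows why typeof() reports "double" for a date column:

```r
# Small stand-in for df_demand (one day is missing on purpose)
df_demand <- data.frame(
  depdate = as.Date(c("2017-11-02", "2017-11-03", "2017-11-05")),
  bookings = c(43, 27, 27)
)

# Days elapsed since the first date, plus one
df_demand$day_of_year <-
  as.integer(df_demand$depdate - min(df_demand$depdate)) + 1L

class(df_demand$depdate)   # "Date": the class R dispatches on
typeof(df_demand$depdate)  # "double": Dates are stored as days since 1970-01-01
```

Here the missing 2017-11-04 leaves a gap in day_of_year (1, 2, 4) rather than an NA row, so use complete() first if every day needs a row.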

R- create dataset by removing duplicates based on a condition - filter

I have a data frame where for each day, I have several prices.
I would like to modify my data frame with the following code :
newdf <- Data %>%
filter(
if (Data$Date == Data$Echeance) {
Data$Close == lag(Data$Close,1)
} else {
Data$Close == Data$Close
}
)
However, it is not giving me what I want, that is :
create a new data frame where the variable Close takes its normal value, unless the day of Date is equal to the day of Echeance. In this case, take the following Close value.
I added filter because I wanted to remove the duplicate dates, and keep only one date per day where Close satisfies the condition above.
There is no error message, it just doesn't give me the right database.
Here is a glimpse of my data:
Date Echeance Compens. Open Haut Bas Close
1 1998-03-27 00:00:00 1998-09-10 00:00:00 125. 828 828 820 820. 197
2 1998-03-27 00:00:00 1998-11-10 00:00:00 128. 847 847 842 842. 124
3 1998-03-27 00:00:00 1999-01-11 00:00:00 131. 858 858 858 858. 2
4 1998-03-30 00:00:00 1998-09-10 00:00:00 125. 821 821 820 820. 38
5 1998-03-30 00:00:00 1998-11-10 00:00:00 129. 843 843 843 843. 1
6 1998-03-30 00:00:00 1999-01-11 00:00:00 131. 860 860 860 860. 5
Thanks a lot in advance.
Sounds like a use case for ifelse, with dplyr:
library(dplyr)
Data %>%
mutate(Close = ifelse(Date==Echeance, lead(Close,1), Close))
Here is an example:
dat %>%
mutate(var_new = ifelse(date1==date2, lead(var,1), var))
# A tibble: 3 x 4
# date1 date2 var var_new
# <date> <date> <int> <int>
# 1 2018-03-27 2018-03-27 10 11
# 2 2018-03-28 2018-01-01 11 11
# 3 2018-03-29 2018-02-01 12 12
The function lead shifts the vector by one position, so each row sees the value of the following row (the last row gets NA). Also note that I created var_new just to show the difference, but you can mutate var directly.
Data used:
dat <- tibble(date1 = seq(from=as.Date("2018-03-27"), to=as.Date("2018-03-29"), by="day"),
date2 = c(as.Date("2018-03-27"), as.Date("2018-01-01"), as.Date("2018-02-01")),
var = 10:12)
dat
# A tibble: 3 x 3
# date1 date2 var
# <date> <date> <int>
# 1 2018-03-27 2018-03-27 10
# 2 2018-03-28 2018-01-01 11
# 3 2018-03-29 2018-02-01 12
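The question also asks for a single row per day. One way to combine that with the lead() idea is sketched below. The tibble prices is made up, and the sketch assumes that "the following Close" means the next contract on the same Date, ordered by Echeance:

```r
library(dplyr)

# Hypothetical data: two contracts per day, one of them expiring on 1998-03-27
prices <- tibble(
  Date     = as.Date(c("1998-03-27", "1998-03-27", "1998-03-30", "1998-03-30")),
  Echeance = as.Date(c("1998-03-27", "1998-11-10", "1998-09-10", "1998-11-10")),
  Close    = c(820, 842, 821, 843)
)

one_per_day <- prices %>%
  arrange(Date, Echeance) %>%
  group_by(Date) %>%
  # On an expiry day, take the Close of the next contract of the same day
  mutate(Close = if_else(Date == Echeance, lead(Close), Close)) %>%
  slice(1) %>%   # keep the first (nearest) contract for each day
  ungroup()
```

If the expiring contract is the only row for its day, lead() has nothing to take and Close becomes NA.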

Convert weekly Data frame to monthly data frame in R

My data looks like below, with different node_desc having weekly data for 4 years
ID1 ID2 DATE_ value
1: 00001 436 2014-06-29 175.8164
2: 00001 436 2014-07-06 188.9264
3: 00001 436 2014-07-13 167.5376
4: 00001 436 2014-07-20 160.7907
5: 00001 436 2014-07-27 185.3018
6: 00001 436 2014-08-03 179.5748
I would like to convert the data frame to monthly. I am trying the code below
df %>%
tq_transmute(select = c(value,ID1),
mutate_fun = apply.monthly,
FUN = mean)
But my output looks like below
DATE_ value
<dttm> <dbl>
1 2014-06-29 00:00:00 144.
2 2014-07-27 00:00:00 143.
3 2014-08-31 00:00:00 143.
4 2014-09-28 00:00:00 152.
5 2014-10-26 00:00:00 156.
6 2014-11-30 00:00:00 166.
But I would like to have ID1, ID2, Date (monthly) and value (either the mean or the max of the 4 weeks) instead of just date and value, because I have data for different ID1's over 4 years. Can someone help me in R?
Here's my take
dta <- data.frame(id1=rep("00001",6),id2=rep("436",6),
date_=as.Date(c("29jun2014","6jul2014","13jul2014","20jul2014","27jul2014","3aug2014"),"%d%B%Y"),
value=c(175.8164,188.9264,167.5376,160.7907,185.3018,179.5748))
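And dplyr would do the rest. Here I summarize the data by taking the mean value per id1, id2 and month (as.yearmon comes from zoo and keeps the year, so the same month in different years is not merged)
library(dplyr)
library(zoo)
my_dta <- dta %>% mutate(month_ = as.yearmon(date_))
my_dta %>% group_by(id1, id2, month_) %>% summarise(mvalue = mean(value))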
The problem you have is that your dataset doesn't have daily data. The apply.monthly function comes from xts, but tidyquant wraps a lot of functions so that they work in a more tidy way. apply.monthly needs an xts object, which is basically a matrix with a time index.
Also note that apply.monthly returns the last available day of each month in your time series. Looking at your example set, the last day it returns for July 2014 will be the 27th. If a month has 5 records (weeks), the mean is taken over those 5 records. It will never be exactly one month, as weekly data never lines up exactly with calendar months.
But with tidyquant you can get sort of a monthly result with ID1 and ID2 with your data if you join the outcome with the original data. See code below. I haven't removed any unwanted columns.
df1 %>%
tq_transmute(select = c(value, ID1),
mutate_fun = apply.monthly,
FUN = mean) %>%
mutate(DATE_ = as.Date(DATE_)) %>%
inner_join(df1, by = "DATE_")
# A tibble: 3 x 5
#   DATE_      value.x ID1   ID2   value.y
#   <date>       <dbl> <fct> <fct>   <dbl>
# 1 2014-06-29    176. 00001 436      176.
# 2 2014-07-27    176. 00001 436      185.
# 3 2014-08-03    180. 00001 436      180.
data:
df1 <- data.frame(ID1 = rep("00001", 6),
ID2 = rep("436", 6),
DATE_ = as.Date(c("2014-06-29", "2014-07-06", "2014-07-13", "2014-07-20", "2014-07-27", "2014-08-03")),
value = c(175.8164,188.9264,167.5376,160.7907,185.3018,179.5748)
)
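If the goal is only a monthly mean (or max) per ID1 and ID2, a plain dplyr grouping may be simpler than going through apply.monthly. A minimal sketch, assuming lubridate is available and labelling each month by its first day (the column name month is only illustrative):

```r
library(dplyr)
library(lubridate)

df1 <- data.frame(
  ID1 = rep("00001", 6),
  ID2 = rep("436", 6),
  DATE_ = as.Date(c("2014-06-29", "2014-07-06", "2014-07-13",
                    "2014-07-20", "2014-07-27", "2014-08-03")),
  value = c(175.8164, 188.9264, 167.5376, 160.7907, 185.3018, 179.5748)
)

monthly <- df1 %>%
  mutate(month = floor_date(DATE_, "month")) %>%   # e.g. 2014-07-01 for all of July
  group_by(ID1, ID2, month) %>%
  summarise(value = mean(value), .groups = "drop")  # swap mean() for max() if needed
```

Each week is assigned to the month in which its date falls, so a week that spans two months counts entirely towards one of them.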

How to replace non numerical values in a dataset with R

I have a dataset that looks like this:
Date Electricity
janv-90 23
juin-90 24
juil-90 34
janv-91 42
juin-91 27
juil-91 13
But I want it to look like this:
Date Electricity
190 23
690 24
790 34
191 42
691 27
791 13
Note that my dataset goes from 90 to 10 (namely 1990 to 2010).
Since your months are in French, I took a slightly longer route; otherwise, R already has the English month names as constants in month.abb and month.name.
# first I create a look-up vector
month.abb.french <- c("janv", "fevr", "mars", "avril",
"mai", "juin", "juil", "aout", "sept",
"oct", "nov", "dec")
# extract the months
month <- unlist(strsplit(df$Date, "-"))[c(TRUE, FALSE)]
# similarily extract the years
year <- unlist(strsplit(df$Date, "-"))[c(FALSE, TRUE)]
# month
#[1] "janv" "juin" "juil" "janv" "juin" "juil"
# year
#[1] "90" "90" "90" "91" "91" "91"
df$newcol <- paste0(match(month, month.abb.french), year)
# Date Electricity newcol
#1: janv-90 23 190
#2: juin-90 24 690
#3: juil-90 34 790
#4: janv-91 42 191
#5: juin-91 27 691
#6: juil-91 13 791
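We can just use match, sub and paste0 to get the expected output. Note that month.abb holds the English abbreviations, so the match needs a French lookup vector
fr <- c("janv", "fevr", "mars", "avril", "mai", "juin", "juil", "aout", "sept", "oct", "nov", "dec")
df$Date <- as.numeric(paste0(match(sub("-.*", "", df$Date), fr), sub(".*-", "", df$Date)))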
df
# Date Electricity
# 1 190 23
# 2 690 24
# 3 790 34
# 4 191 42
# 5 691 27
# 6 791 13
Or using tidyverse by separating the 'Date' column into two columns ('Date' and 'val') by the - delimiter, then match the 'Date' with the mon_ab from the locale() and finally unite the 'Date' and 'val' columns together
library(dplyr)
library(tidyr)
library(readr)
separate(df, Date, into = c("Date", "val")) %>%
mutate(Date = match(Date, sub("\\.$", "", locale("fr")[[1]]$mon_ab))) %>%
unite(Date, Date, val, sep="")
# Date Electricity
#1 190 23
#2 690 24
#3 790 34
#4 191 42
#5 691 27
#6 791 13
data
df <- structure(list(Date = c("janv-90", "juin-90", "juil-90", "janv-91",
"juin-91", "juil-91"), Electricity = c(23L, 24L, 34L, 42L, 27L,
13L)), .Names = c("Date", "Electricity"), class = "data.frame", row.names = c(NA,
-6L))
