I want to calculate all moving averages in one statement rather than repeating myself. Is this possible using quantmod or does it require some clever use of tidyeval and/or purrr?
library(tidyquant)
library(quantmod)
library(zoo)
tibble(date = as.Date('2018-01-01') + days(1:100),
value = 100 + cumsum(rnorm(100))) %>%
tq_mutate(mutate_fun = rollapply, select = "value", width = 10, FUN = mean, col_rename = "rm10") %>%
tq_mutate(mutate_fun = rollapply, select = "value", width = 5, FUN = mean, col_rename = "rm5") %>%
gather(series, value, -date) %>%
ggplot(aes(date, value, color = series)) +
geom_line()
Here is a solution using data.table's new frollmean()-function
data.table v1.12.0 or higher required.
sample data
library( data.table )
set.seed(123)
dt <- data.table( date = as.Date('2018-01-01') + 1:100,
value = 100 + cumsum(rnorm(100)))
code
#set windows you want to roll on
windows <- c(5,10)
#create a rm+window column for each roll
dt[, ( paste0( "rm", windows ) ) := lapply( windows, function(x) frollmean( value, x)) ]
output
head( dt, 15 )
# date value rm5 rm10
# 1: 2018-01-02 99.43952 NA NA
# 2: 2018-01-03 99.20935 NA NA
# 3: 2018-01-04 100.76806 NA NA
# 4: 2018-01-05 100.83856 NA NA
# 5: 2018-01-06 100.96785 100.2447 NA
# 6: 2018-01-07 102.68292 100.8933 NA
# 7: 2018-01-08 103.14383 101.6802 NA
# 8: 2018-01-09 101.87877 101.9024 NA
# 9: 2018-01-10 101.19192 101.9731 NA
# 10: 2018-01-11 100.74626 101.9287 101.0867
# 11: 2018-01-12 101.97034 101.7862 101.3398
# 12: 2018-01-13 102.33015 101.6235 101.6519
# 13: 2018-01-14 102.73092 101.7939 101.8482
# 14: 2018-01-15 102.84161 102.1239 102.0485
# 15: 2018-01-16 102.28577 102.4318 102.1802
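As a possible shortcut, frollmean() also accepts a vector of window sizes and then returns a list with one element per window, so the lapply() wrapper may not be needed. A minimal sketch on the same sample data, built with plain integer day offsets so that only data.table has to be loaded:

```r
library(data.table)

set.seed(123)
dt <- data.table(date  = as.Date("2018-01-01") + 1:100,
                 value = 100 + cumsum(rnorm(100)))

windows <- c(5, 10)

# with a vector of windows, frollmean() returns a list: one rolling mean per window
dt[, (paste0("rm", windows)) := frollmean(value, windows)]

head(dt, 12)
```

The first window - 1 rows of each new column are NA, as in the output above.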
plot
#plot molten data
library(ggplot2)
ggplot( data = melt(dt, id.vars = c("date") ),
aes(x = date, y = value, colour = variable)) +
geom_line()
update - grouped data
library(data.table)
library(ggplot2)
set.seed(123)
#changed the sample data a bit, to get different values for grp=1 and grp=2
dt <- data.table(grp = rep(1:2, each = 100), date = rep(as.Date('2018-01-01') + 1:100, 2), value = 100 + cumsum(rnorm(200)))
dt[, ( paste0( "rm", windows ) ) := lapply( windows, function(x) frollmean( value, x)), by = "grp" ]
ggplot( data = melt(dt, id.vars = c("date", "grp") ),
aes(x = date, y = value, colour = variable)) +
geom_line() +
facet_wrap(~grp, nrow = 1)
In this example I use the AAPL adjusted close price, downloaded with the getSymbols function from quantmod.
Let's say you want SMAs with the following lengths:
smaLength = c(30,35,40,46,53,61,70,81,93)
Now create the SMA like so:
lapply(smaLength,function(x) SMA(AAPL$AAPL.Adjusted,x)) %>% do.call(cbind,.) %>% tail()
result:
SMA SMA.1 SMA.2 SMA.3 SMA.4 SMA.5 SMA.6 SMA.7 SMA.8
2019-03-04 167.3703 165.2570 163.3706 162.1362 161.5904 162.9735 164.7770 169.3341 175.4143
2019-03-05 168.0162 165.9396 164.0682 162.5499 161.7934 162.8342 164.6408 168.9595 174.9418
2019-03-06 168.7454 166.6585 164.7488 162.9638 162.0062 162.8110 164.6165 168.6446 174.5135
2019-03-07 169.3866 167.2323 165.3086 163.3320 162.1409 162.7868 164.5661 168.2780 174.0284
2019-03-08 170.0820 167.7646 165.8150 163.6764 162.3807 162.8711 164.5855 167.8407 173.5334
2019-03-11 170.8092 168.4419 166.4589 164.1471 162.8097 163.0354 164.6573 167.4864 173.0806
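The default column names (SMA, SMA.1, ...) do not say which length each column belongs to. A small sketch of one way to label them follows. It assumes a synthetic price series named prices as a stand-in for AAPL$AAPL.Adjusted, so that it runs without a download, and uses a shorter list of lengths:

```r
library(xts)
library(TTR)

set.seed(1)
# synthetic stand-in for the adjusted close series (an assumption, not AAPL data)
prices <- xts(100 + cumsum(rnorm(200)),
              order.by = as.Date("2018-01-01") + 1:200)

smaLength <- c(30, 35, 40)

# one SMA per length, bound column-wise into a single xts object
smas <- do.call(cbind, lapply(smaLength, function(n) SMA(prices, n)))

# label each column with its window length
colnames(smas) <- paste0("SMA", smaLength)

tail(smas)
```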
Define the input, then lapply over the widths to create a rolling mean for each one, merge the results together, and finally plot them.
library(ggplot2)
library(magrittr)
library(zoo)
set.seed(123)
w <- c(1, 5, 10)
zoo(100 * cumsum(rnorm(100)), as.Date("2018-01-01") + 1:100) %>%
lapply(w, rollmeanr, x = .) %>%
do.call("merge", .) %>%
setNames(w) %>%
autoplot(facet = NULL)
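To see what the merge step produces without the plot, here is a small sketch on a toy series. The series z and the widths are made up for illustration. Each right-aligned rolling mean is shorter than the input, and merge() lines the results up by date, filling the gaps with NA:

```r
library(zoo)

# toy series: the values 1..10 on ten consecutive days
z <- zoo(1:10, as.Date("2018-01-01") + 1:10)
w <- c(2, 5)

# one right-aligned rolling mean per width, merged by date
m <- do.call("merge", lapply(w, function(k) rollmeanr(z, k)))
colnames(m) <- paste0("w", w)

m
```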
I have a set of 10,000 (x, y) data points. They are to be partitioned along the x-axis into non-overlapping bins of 10 data points each. From this I need a new dataset in which x is the mean of the 10 x-values in a bin and y is the maximum of the 10 y-values. The final dataset should contain 1,000 (x, y) pairs.
I have a sample in Excel and want to perform this task in R.
In tidyverse:
library(tidyverse)
df %>%
arrange(x) %>%
group_by(grp = gl(n(), 10, n())) %>%
summarise(x = mean(x), y = max(y))
In Base R
n <- nrow(df)
do.call(rbind.data.frame, by(df[order(df$x),], gl(n, 10, n),
function(x) cbind(x = mean(x$x), y = max(x$y))))
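The same binning idea can be sketched with integer division and tapply(). The data frame below is made-up sample data with 30 rows, so three bins of 10 come out:

```r
set.seed(42)
# hypothetical sample data: 30 points
df <- data.frame(x = runif(30), y = rnorm(30))

# sort along the x-axis, then label rows 1-10 as bin 0, rows 11-20 as bin 1, ...
df  <- df[order(df$x), ]
bin <- (seq_len(nrow(df)) - 1) %/% 10

# mean of x and maximum of y within each bin
out <- data.frame(x = as.numeric(tapply(df$x, bin, mean)),
                  y = as.numeric(tapply(df$y, bin, max)))
out
```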
I created some sample data as you did not provide those.
I use the library data.table but you could do similar in dplyr or base.
library(data.table)
dt <- data.table(
x = sample(40:50, 50, replace = TRUE),
y = sample(1000:3000, 50)
)
dt[, grp := gl(.N, 10, .N)] # edit based on Onyambu's solution
dt[, .(x_avg = mean(x), y_max = max(y)), by = grp]
# grp x_avg y_max
# 1: 1 44.7 2765
# 2: 2 45.3 2861
# 3: 3 44.7 2831
# 4: 4 46.2 2947
# 5: 5 46.7 2684
I was trying to add one day to the to-variable - but only if from is not missing:
df <- data.frame(from = c("2020-01-01", "2020-02-01", ""),
to = c("2020-01-05", "2020-02-20", "2020-03-04"))
df <- df %>% mutate(
from = as.Date(from),
to = as.Date(to),
to = ifelse(!is.na(from), to + 1, to)
)
df
Obviously, this doesn't work :(. Can anyone tell me how to do it?
Try this. ifelse() drops the Date class and returns the underlying numbers. Use if_else() from dplyr instead, which preserves the class. Here is the code:
library(dplyr)
#Data
df <- data.frame(from = c("2020-01-01", "2020-02-01", ""),
to = c("2020-01-05", "2020-02-20", "2020-03-04"))
#Variables
df <- df %>% mutate(
from = as.Date(from),
to = as.Date(to),
to = if_else(!is.na(from), to + 1, to)
)
#Output
df
Output:
df
from to
1 2020-01-01 2020-01-06
2 2020-02-01 2020-02-21
3 <NA> 2020-03-04
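The difference between the two functions can be checked on a small vector. This sketch uses made-up values and also shows a base R alternative that avoids the problem by assigning into a logical subset:

```r
library(dplyr)

d    <- as.Date(c("2020-01-05", "2020-02-20"))
keep <- c(TRUE, FALSE)

# ifelse() strips attributes, so the result is a plain number
class(ifelse(keep, d + 1, d))

# if_else() keeps the Date class
class(if_else(keep, d + 1, d))

# base R alternative: modify only the selected elements in place
d2 <- d
d2[keep] <- d2[keep] + 1
d2
```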
We can also do
library(lubridate)
library(dplyr)
df %>%
mutate(across(everything(), ymd),
to = case_when(!is.na(from)~ to+1, TRUE ~ to))
-output
# from to
#1 2020-01-01 2020-01-06
#2 2020-02-01 2020-02-21
#3 <NA> 2020-03-04
I have the following dataset dt
Y date segment
10 2019-11-11 1
12 2019-11-12 1
9 2019-11-13 1
...
..
14 2019-12-15 5
12 2019-12-16 5
10 2019-12-17 5
I want to build an autoregressive model such that
Y(segment, dat)_{t} = beta1*Y(segment,dat)_{t-1} + beta2*Y(segment,dat)_{t-2}...
I have no problems with just one segment, as I would do something like this:
library(dynlm)
Y <- dt$Y
AR2 <- dynlm(ts(Y) ~ L(ts(Y)) + L(ts(Y), 2) )
I am not sure how to do this with multiple segments at the same time.
The simplest approach is to use lm() like this:
library(tidyverse)
dt <- tibble(
Y = sample(1:50, 200, replace=TRUE),
date = rep(seq(as.Date("2019-11-11"), by="1 day", length=40),5),
segment = rep(1:5, rep(40, 5))
)
dt <- dt %>%
arrange(segment, date) %>%
group_by(segment) %>%
mutate(
Y1 = dplyr::lag(Y,1),
Y2 = dplyr::lag(Y,2)
) %>%
ungroup()
fit <- lm(Y ~ Y1 + Y2, data=dt)
Created on 2020-08-27 by the reprex package (v0.3.0)
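Note that a single lm() call on the stacked data estimates one set of coefficients shared by all segments. If separate coefficients per segment are wanted instead, one possible sketch is to split the lagged data and fit one model per segment. The data below is randomly generated, as in the answer:

```r
library(dplyr)

set.seed(1)
dt <- tibble(
  Y = sample(1:50, 200, replace = TRUE),
  date = rep(seq(as.Date("2019-11-11"), by = "1 day", length.out = 40), 5),
  segment = rep(1:5, each = 40)
) %>%
  arrange(segment, date) %>%
  group_by(segment) %>%
  # lags are computed within each segment, so they never cross a boundary
  mutate(Y1 = lag(Y, 1), Y2 = lag(Y, 2)) %>%
  ungroup()

# one AR(2) regression per segment
fits  <- lapply(split(dt, dt$segment), function(d) lm(Y ~ Y1 + Y2, data = d))

# one row of coefficients per segment
coefs <- t(sapply(fits, coef))
coefs
```

The grouped lag leaves the first two rows of each segment with missing lags, and lm() drops those rows by default.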
I have a list (bbb) with 5 elements in it, i.e., each element for a year, like 2010, 2011, ... , 2014:
The first one in the list is this:
> bbb[1]
$`2010`
Date Average
X2010.01.01 2010-01-01 2.079090e-03
X2010.01.02 2010-01-02 5.147627e-04
X2010.01.03 2010-01-03 2.997464e-04
X2010.01.04 2010-01-04 1.375538e-04
X2010.01.05 2010-01-05 1.332109e-04
The second one in the list is this:
> bbb[2]
$`2011`
Date Average
X2011.01.01 2011-01-01 1.546253e-03
X2011.01.02 2011-01-02 1.152864e-03
X2011.01.03 2011-01-03 1.752446e-03
X2011.01.04 2011-01-04 2.639658e-03
X2011.01.05 2011-01-05 5.231150e-03
X2011.01.06 2011-01-06 8.909878e-04
And so on.
Here is my question:
How can I save all of these list's elements in 1 sheet of an Excel file to have something like this:
Your help would be highly appreciated.
You can do this using dcast.
bbb <- list(`2010` = data.frame(date = as.Date("2010-01-01") + 0:4,
avg = 1:5),
`2011` = data.frame(date = as.Date("2011-01-01") + 0:5,
avg = 11:16),
`2012` = data.frame(date = as.Date("2012-01-01") + 0:9,
avg = 21:30),
`2013` = data.frame(date = as.Date("2013-01-01") + 0:7,
avg = 21:28))
df <- do.call("rbind", bbb)
df$year <- format(df$date, format = "%Y")
df$month_date <- format(df$date, format = "%b-%d")
library(data.table)
library(openxlsx)
df_dcast <- dcast(as.data.table(df), month_date ~ year, value.var = "avg") # data.table's dcast expects a data.table
write.xlsx(df_dcast, "example1.xlsx")
Or using spread
library(dplyr)
library(tidyr)
df2 <- df %>%
select(-date) %>%
spread(key = year, value = avg)
write.xlsx(df2, "example2.xlsx")
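In current versions of tidyr, spread() is superseded by pivot_wider(). A minimal sketch of the equivalent reshape, using a small made-up data frame with the same three columns:

```r
library(tidyr)

# hypothetical long data: one row per day and year
long <- data.frame(
  month_date = c("Jan-01", "Jan-02", "Jan-01", "Jan-02"),
  year       = c("2010", "2010", "2011", "2011"),
  avg        = c(1, 2, 11, 12)
)

# one column per year, one row per day
wide <- pivot_wider(long, names_from = year, values_from = avg)
wide
```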
This isn't very pretty, but it's the best I could think of right now: loop through the list of data frames and join them by date, like this:
library(tidyverse)
library(lubridate)
bbb <- list(`2010` = tibble(date = c('01-01-2010', '01-02-2010', '01-03-2010', '01-04-2010', '01-05-2010'),
average = 11:15),
`2011` = tibble(date = c('01-01-2011', '01-02-2011', '01-03-2011', '01-04-2011', '01-05-2011'),
average = 1:5),
`2012` = tibble(date = c('01-01-2012', '01-02-2012', '01-03-2012', '01-04-2012', '01-05-2012'),
average = 6:10))
for (i in seq_along(bbb)) {
if(i == 1){
df <- bbb[[i]] %>%
mutate(
date = paste(day(as.Date(date, format = '%m-%d-%Y')),
month(as.Date(date, format = '%m-%d-%Y'), label = TRUE),
sep = '-')
)
colnames(df) <- c('date', names(bbb[i])) # Assuming your list of dataframes has just 2 columns: date and average
} else {
join_df <- bbb[[i]] %>%
mutate(
date = paste(day(as.Date(date, format = '%m-%d-%Y')),
month(as.Date(date, format = '%m-%d-%Y'), label = TRUE),
sep = '-')
)
colnames(join_df) <- c('date', names(bbb[i]))
df <- full_join(df, join_df, by = 'date')
}
}
This loops through the list of dataframes and reformats the dates to Day-Month.
# A tibble: 5 x 4
date `2010` `2011` `2012`
<chr> <int> <int> <int>
1 1-Jan 11 1 6
2 2-Jan 12 2 7
3 3-Jan 13 3 8
4 4-Jan 14 4 9
5 5-Jan 15 5 10
You could then write that out with write_xlsx() from the writexl package.
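The loop can probably be shortened with purrr. The sketch below assumes the dates have already been reformatted to Day-Month strings, and uses a small made-up list. It renames each average column after its list name, then joins everything by date. The output file name in the last comment is hypothetical:

```r
library(dplyr)
library(purrr)

# hypothetical list with already reformatted dates
bbb <- list(
  `2010` = tibble(date = c("1-Jan", "2-Jan"), average = 11:12),
  `2011` = tibble(date = c("1-Jan", "2-Jan", "3-Jan"), average = 1:3)
)

# rename "average" to the element's name (the year)
renamed <- imap(bbb, function(d, yr) rename(d, !!yr := average))

# join all data frames by date, keeping dates that appear in any year
wide <- reduce(renamed, function(a, b) full_join(a, b, by = "date"))
wide

# writexl::write_xlsx(wide, "averages.xlsx")  # hypothetical file name
```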
library(tidyverse)
set.seed(1)
graph.data <- tibble(cal.date = as.Date(40100:40129, origin = "1899-12-30"),
random_num = rnorm(30, 8, 5))
This is the data frame we're working with here.
# A tibble: 30 x 2
cal.date random_num
<date> <dbl>
1 2009-10-14 4.87
2 2009-10-15 8.92
3 2009-10-16 3.82
4 2009-10-17 16.0
5 2009-10-18 9.65
6 2009-10-19 3.90
7 2009-10-20 10.4
8 2009-10-21 11.7
9 2009-10-22 10.9
10 2009-10-23 6.47
# ... with 20 more rows
I'm trying to nest(sp? lexical scope) two functions, which I call child_function and parent_function.
child_function <- function(df, variable, hor.line = 6) {
variable <- enquo(variable)
df <- mutate(mutation = 2 * !!variable, horizontal.line = hor.line)
}
parent_function <- function(df, date, variable, hor.line = 6) {
date <- enquo(date)
variable <- enquo(variable)
hor.line <- enquo(hor.line)
df <- child_function(df, !!variable, !!hor.line) %>% print()
p <- ggplot(df, aes(date, mutation)) +
geom_point() +
geom_hline(aes(yintercept = !!hor.line))
p
}
When I test it all out with the line below I get "Error in !variable : invalid argument type".
parent_function(graph.data, date = cal.date, variable = random_num, hor.line=8)
I imagine I'm not using the proper dplyr tidyeval syntax. What's wrong with my functions?
Needed a bit of a cleanup, but now it should work:
library(tidyverse)
set.seed(1)
graph.data <- tibble(cal.date = as.Date(40100:40129, origin = "1899-12-30"),
random_num = rnorm(30, 8, 5))
child_function <- function(df, variable, hor.line = 6) {
variable <- enquo(variable)
df <- mutate(df, mutation := 2 * !! variable, horizontal.line := hor.line)
}
parent_function <- function(df, date, variable, hor.line = 6) {
date <- enquo(date)
variable <- enquo(variable)
df <- child_function(df, !! variable, hor.line) %>% print()
p <- ggplot(df, aes(!! date, mutation)) +
geom_point() +
geom_hline(aes(yintercept = hor.line))
p
}
parent_function(graph.data, date = cal.date, variable = random_num, hor.line=8)
I think the main issue was that sometimes you put !! or enquo where there was no need and vice versa.
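With rlang 0.4.0 or later, the enquo() and !! pair can usually be replaced by the embrace operator, which makes it harder to misplace either half. A minimal sketch of passing a column through two function levels follows. The function names double_column and wrapper are made up for illustration:

```r
library(dplyr)

# inner function: {{ variable }} captures and unquotes the column in one step
double_column <- function(df, variable) {
  mutate(df, mutation = 2 * {{ variable }})
}

# outer function: forward the column to the inner function the same way
wrapper <- function(df, variable) {
  double_column(df, {{ variable }})
}

out <- wrapper(tibble(a = 1:3), a)
out
```

Plain values such as hor.line need no quoting at all and can be passed on as ordinary arguments.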