I am trying to build a rolling regression function based on the example here, but in addition to returning the predicted values, I would like to return some rolling model diagnostics (i.e. coefficients, t-values, and maybe R^2). I would like the results to be returned in discrete objects based on the type of result. The example provided in the link above successfully creates the rolling predictions, but I need some assistance packaging and writing out the rolling model diagnostics:
In the end, I would like the function to return each type of result as its own object:
Predictions
Coefficients
T values
R^2
Below is the code:
require(zoo)
require(dynlm)
## Create Some Dummy Data
set.seed(12345)
x <- rnorm(100, mean = 3, sd = 2)
y <- rep(NA, 100)
y[1] <- x[1]
for (i in 2:100) y[i] <- 1 + x[i-1] + 0.5*y[i-1] + rnorm(1, 0, 0.5)
int <- 1:100
dummydata <- data.frame(int=int,x=x,y=y)
zoodata <- as.zoo(dummydata)
rolling.regression <- function(series) {
  mod <- dynlm(formula = y ~ L(y) + L(x), data = as.zoo(series)) # fit the model on this window
  nextOb <- max(series[, 'int']) + 1 # the first row that follows the window
  if (nextOb <= nrow(zoodata)) { # you can't predict past the last observation
    # 1) Make predictions
    predicted <- predict(mod, newdata = data.frame(x = zoodata[nextOb, 'x'], y = zoodata[nextOb, 'y']))
    attributes(predicted) <- NULL
    c(predicted = predicted, square.res = (predicted - zoodata[nextOb, 'y'])^2)
    # 2) Extract coefficients
    #coefficients <- coef(mod)
    # 3) Extract rolling coefficient t-values
    #tvalues <- ????(mod)
    # 4) Extract rolling R^2
    #rsq <-
  }
}
rolling.window <- 20
results.z <- rollapply(zoodata, width=rolling.window, FUN=rolling.regression, by.column=F, align='right')
So after figuring out how to extract the t-values from the model (i.e. mod), what do I need to do to make the function return three (3) separate objects (i.e. Predictions, Coefficients, and T-values)?
I am fairly new to R, really new to functions, and extremely new to zoo, and I'm stuck.
Any assistance would be greatly appreciated.
I hope I understood you correctly; here is a small edit of your function:
rolling.regression <- function(series) {
  mod <- dynlm(formula = y ~ L(y) + L(x), data = as.zoo(series)) # fit the model on this window
  nextOb <- max(series[, 'int']) + 1 # the first row that follows the window
  if (nextOb <= nrow(zoodata)) { # you can't predict past the last observation
    # 1) Make predictions
    predicted <- predict(mod, newdata = data.frame(x = zoodata[nextOb, 'x'], y = zoodata[nextOb, 'y']))
    attributes(predicted) <- NULL

    # Solution 1: quicker to write
    # c(predicted = predicted,
    #   square.res = (predicted - zoodata[nextOb, 'y'])^2,
    #   summary(mod)$coef[, 1],
    #   summary(mod)$coef[, 3],
    #   AdjR = summary(mod)$adj.r.squared)

    # Solution 2: get the column names right
    c(predicted = predicted,
      square.res = (predicted - zoodata[nextOb, 'y'])^2,
      coef_intercept = summary(mod)$coef[1, 1],
      coef_Ly = summary(mod)$coef[2, 1],
      coef_Lx = summary(mod)$coef[3, 1],
      tValue_intercept = summary(mod)$coef[1, 3],
      tValue_Ly = summary(mod)$coef[2, 3],
      tValue_Lx = summary(mod)$coef[3, 3],
      AdjR = summary(mod)$adj.r.squared)
  }
}
rolling.window <- 20
results.z <- rollapply(zoodata, width=rolling.window, FUN=rolling.regression, by.column=F, align='right')
head(results.z)
predicted square.res coef_intercept coef_Ly coef_Lx tValue_intercept tValue_Ly tValue_Lx AdjR
20 10.849344 0.721452 0.26596465 0.5798046 1.049594 0.38309211 7.977627 13.59831 0.9140886
21 12.978791 2.713053 0.26262820 0.5796883 1.039882 0.37741499 7.993014 13.80632 0.9190757
22 9.814676 11.719999 0.08050796 0.5964808 1.073941 0.12523824 8.888657 15.01353 0.9340732
23 5.616781 15.013297 0.05084124 0.5984748 1.077133 0.08964998 9.881614 16.48967 0.9509550
24 3.763645 6.976454 0.26466039 0.5788949 1.068493 0.51810115 11.558724 17.22875 0.9542983
25 9.433157 31.772658 0.38577698 0.5812665 1.034862 0.70969330 10.728395 16.88175 0.9511061
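If you then want the three separate objects the question asks for, one way (a sketch based on the column names above) is to split the rollapply result afterwards:
predictions <- results.z[, "predicted"]
coefficients <- results.z[, grep("^coef_", colnames(results.z))]
tvalues <- results.z[, grep("^tValue_", colnames(results.z))]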
To see how it works, make a small example with a regression:
x <- rnorm(1000); y <- 2*x + rnorm(1000)
reg <- lm(y ~ x)
summary(reg)$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.02694322 0.03035502 0.8876033 0.374968
x 1.97572544 0.03177346 62.1816310 0.000000
As you can see, calling summary first and then extracting its coefficient table (coef(summary(reg)) works as well) gives you a table with estimates, standard errors, and t-values. The estimates are stored in column 1 of that table and the t-values in column 3, and that is how I obtain them in the updated rolling.regression function.
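Note that the columns can also be selected by name rather than by index, which reads more clearly:
coefs <- summary(reg)$coef
coefs[, "Estimate"]  # same as coefs[, 1]
coefs[, "t value"]   # same as coefs[, 3]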
EDIT
I updated my solution; it now also contains the adjusted R^2. If you want the plain R^2 instead, use r.squared in place of adj.r.squared.
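For reference, both values live in the summary object of any fitted model mod:
summary(mod)$adj.r.squared  # adjusted R^2
summary(mod)$r.squared      # plain R^2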
EDIT 2
A quick-and-dirty hack for naming the columns:
rolling.regression <- function(series) {
  mod <- dynlm(formula = y ~ L(y) + L(x), data = as.zoo(series)) # fit the model on this window
  nextOb <- max(series[, 'int']) + 1 # the first row that follows the window
  if (nextOb <= nrow(zoodata)) { # you can't predict past the last observation
    # 1) Make predictions
    predicted <- predict(mod, newdata = data.frame(x = zoodata[nextOb, 'x'], y = zoodata[nextOb, 'y']))
    attributes(predicted) <- NULL
    # Build variable names from the coefficient table
    strVar <- c("Intercept", paste0("L", 1:(nrow(summary(mod)$coef) - 1)))
    vec <- c(predicted = predicted,
             square.res = (predicted - zoodata[nextOb, 'y'])^2,
             AdjR = summary(mod)$adj.r.squared,
             summary(mod)$coef[, 1],
             summary(mod)$coef[, 3])
    names(vec)[4:length(vec)] <- c(paste0("Coef_", strVar), paste0("tValue_", strVar))
    vec
  }
}
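Running rollapply as before then gives the same values with programmatically built names; since the coefficient table here has an intercept plus two lagged terms, the names should come out as predicted, square.res, AdjR, Coef_Intercept, Coef_L1, Coef_L2, tValue_Intercept, tValue_L1 and tValue_L2:
results.z <- rollapply(zoodata, width = rolling.window, FUN = rolling.regression, by.column = FALSE, align = 'right')
colnames(results.z)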
Related
So, just a touch of backstory. I've been learning biostatistics for the past 4-5 months at university, with 6 months of biomathematics before that. I only started deep-diving into programming around 5 days ago.
I've been trying to redo t.test() with my own function.
test2 = function(t, u){
  T = (mean(t) - u) / (sd(t) / sqrt(length(t)))
  t1 = round(T, digits = 5)
  df = length(t)
  cat(paste('t - value =', t1,
            '\n', 'df =', df - 1,
            '\n', 'Alternative hypothesis: the sample mean is not equal to the hypothetical mean'))
}
I tried searching for the p-value formula; I found one, but when I used it, my value was different from the one t.test() gives.
The t-value and the df do match t.test().
I highly appreciate any help, thank you.
P.s: Don't worry about the last line; it just states the alternative hypothesis (originally in Hungarian).
The p-value can be derived from the cumulative distribution function of the t distribution, pt. Using this, and making the notation more conventional with sample x and population mean mu, we can use something like:
test2 <- function(x, u){
  t <- (mean(x) - u) / (sd(x) / sqrt(length(x)))
  df <- length(x) - 1
  # two-sided p-value; 2 * pt(-abs(t), df) also handles negative t correctly
  cat('t-value =', t, ', df =', df, ', p =', 2 * pt(-abs(t), df = df), '\n')
}
set.seed(123) # remove this for other random values
## random sample
x <- rnorm(10, mean=5.5)
## population mean
mu <- 5
## own function
test2(x, mu)
## one sample t-test from R
t.test(x, mu=mu)
Our own test2 gives:
t-value = 1.905175 , df = 9, p = 0.08914715
and R's t.test gives:
One Sample t-test
data: x
t = 1.9052, df = 9, p-value = 0.08915
alternative hypothesis: true mean is not equal to 5
95 percent confidence interval:
4.892330 6.256922
sample estimates:
mean of x
5.574626
The definitive source of what R is doing is the source code. If you look at the source code for stats:::t.test.default (which you can see by typing stats:::t.test.default into the console, without parentheses, and hitting Enter), you'll find that for a one-sample test like the one you're trying to do above, it computes the following:
nx <- length(x)
mx <- mean(x)
vx <- var(x)
df <- nx - 1
stderr <- sqrt(vx/nx)
tstat <- (mx - mu)/stderr
if (alternative == "less") {
    pval <- pt(tstat, df)
} else if (alternative == "greater") {
    pval <- pt(tstat, df, lower.tail = FALSE)
} else {
    pval <- 2 * pt(-abs(tstat), df)
}
These are the relevant pieces (there's a lot more code in there, too).
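To tie this back to the example above, the two-sided branch reproduces the p-value we computed earlier (same seed and data as before):
set.seed(123)
x <- rnorm(10, mean = 5.5)
tstat <- (mean(x) - 5) / (sd(x) / sqrt(length(x)))
2 * pt(-abs(tstat), df = length(x) - 1)
#[1] 0.08914715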
I have a problem when using replicate to repeat the function.
I tried to use the bootstrap to fit
a quadratic model, using concentration as the predictor and Total_lignin as the response, and to report an estimate of the maximum with a corresponding standard error.
My idea was to create a function called bootFun that essentially does everything within one iteration of a for loop. bootFun takes in only the data set, the predictor, and the response to use (both variable names in quotes).
However, the SD is 0, which cannot be correct. I do not know where the mistake is. Could you please help me with it?
# Load the libraries
library(dplyr)
library(tidyverse)
# Read the .csv and only use M.giganteus and S.ravennae.
dat <- read_csv('concentration.csv') %>%
filter(variety == 'M.giganteus' | variety == 'S.ravennae') %>%
arrange(variety)
# Check the data
head(dat)
# sample size
n <- nrow(dat)
# A function to do one iteration
bootFun <- function(dat, pred, resp){
  # Draw a bootstrap sample of size n from the dataset
  sample <- sample_n(dat, n, replace = TRUE)
  # A quadratic model fit
  formula <- paste0('resp', '~', 'pred', '+', 'I(pred^2)')
  fit <- lm(formula, data = sample)
  # Derive the value of the predictor at the maximum
  max <- -fit$coefficients[2]/(2*fit$coefficients[3])
  return(max)
}
max <- bootFun(dat = dat, pred = 'concentration', resp = 'Total_lignin' )
# Iterated times
N <- 5000
# Use 'replicate' function to do a loop
maxs <- replicate(N, max)
# An estimate of the max of predictor and corresponding SE
mean(maxs)
sd(maxs)
First, the reason your SD is 0: replicate(N, max) just repeats the single number already stored in max N times, so all the "replicates" are identical; you would have to replicate the call itself, e.g. replicate(N, bootFun(dat, 'concentration', 'Total_lignin')). Alternatively, function boot from package boot (shipped with R) eases the job of calling the bootstrap function repeatedly. Its first argument must be the data set, its second argument is an indices argument that the user does not set, and other arguments can also be passed to it; in this case those other arguments are the predictor and the response names.
library(boot)
bootFun <- function(dat, indices, pred, resp){
  # Resample rows according to the indices supplied by boot()
  dat.sample <- dat[indices, ]
  # A quadratic model fit
  formula <- paste0(resp, '~', pred, '+', 'I(', pred, '^2)')
  formula <- as.formula(formula)
  fit <- lm(formula, data = dat.sample)
  # Derive the value of the predictor at the maximum
  max <- -fit$coefficients[2]/(2*fit$coefficients[3])
  return(max)
}
N <- 5000
set.seed(1234) # Make the bootstrap results reproducible
results <- boot(dat, bootFun, R = N, pred = 'concentration', resp = 'Total_lignin')
results
#
#ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
#Call:
#boot(data = dat, statistic = bootFun, R = N, pred = "concentration",
# resp = "Total_lignin")
#
#
#Bootstrap Statistics :
# original bias std. error
#t1* -0.4629808 -0.0004433889 0.03014259
#
results$t0 # this is the statistic, not bootstrapped
#concentration
# -0.4629808
mean(results$t) # bootstrap value
#[1] -0.4633233
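The standard error the question asks for is simply the standard deviation of the bootstrap replicates, which is what the std. error column of the printout reports:
sd(results$t)
#[1] 0.03014259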
Note that to fit a polynomial, function poly is much simpler than writing the polynomial terms out explicitly one by one:
formula <- paste0(resp, '~ poly(', pred, ',2, raw = TRUE)')
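With raw = TRUE the coefficients keep the usual linear and quadratic interpretation, so the location of the maximum is computed exactly as before, for instance:
fit <- lm(Total_lignin ~ poly(concentration, 2, raw = TRUE), data = dat)
-coef(fit)[2]/(2*coef(fit)[3])  # same estimate as with the explicit terms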
Check the distribution of the bootstrapped statistic.
op <- par(mfrow = c(1, 2))
hist(results$t)
qqnorm(results$t)
qqline(results$t)
par(op)
Test data
set.seed(2020) # Make the results reproducible
x <- cumsum(rnorm(100))
y <- x + x^2 + rnorm(100)
dat <- data.frame(concentration = x, Total_lignin = y)
I have an array of outputs from hundreds of segmented linear models (made using the segmented package in R). I want to be able to use these outputs on new data, using the predict function. To be clear, I do not have the segmented linear model objects in my workspace; I just saved and reimported the relevant outputs (e.g. the coefficients and breakpoints). For this reason I can't simply use the predict.segmented function from the segmented package.
Below is a toy example based on this link that seems promising, but does not match the output of the predict.segmented function.
library(segmented)
set.seed(12)
xx <- 1:100
zz <- runif(100)
yy <- 2 + 1.5*pmax(xx-35,0) - 1.5*pmax(xx-70,0) +
15*pmax(zz-0.5,0) + rnorm(100,0,2)
dati <- data.frame(x=xx,y=yy,z=zz)
out.lm<-lm(y~x,data=dati)
## S3 method for class 'lm':
o <- segmented(out.lm, seg.Z=~x, psi=list(x=c(30,60)),
               control=seg.control(display=FALSE))
# Note that coefficients with U in the name are differences in slopes, not slopes.
# Compare:
slope(o)
coef(o)[2] + coef(o)[3]
coef(o)[2] + coef(o)[3] + coef(o)[4]
# prediction
pred <- data.frame(x = 1:100)
pred$dummy1 <- pmax(pred$x - o$psi[1,2], 0)
pred$dummy2 <- pmax(pred$x - o$psi[2,2], 0)
pred$dummy3 <- I(pred$x > o$psi[1,2]) * (coef(o)[2] + coef(o)[3])
pred$dummy4 <- I(pred$x > o$psi[2,2]) * (coef(o)[2] + coef(o)[3] + coef(o)[4])
names(pred)[-1]<- names(model.frame(o))[-c(1,2)]
# compute the prediction, using standard predict function
# computing confidence intervals further
# suppose that the breakpoints are fixed
pred <- data.frame(pred, predict(o, newdata= pred,
interval="confidence"))
# Try prediction using the predict.segment version to compare
test <- predict.segmented(o)
plot(pred$fit, test, ylim = c(0, 100))
abline(0,1, col = "red")
# At least one segment not being predicted correctly?
Can I use the base R predict() function (not the predict.segmented() function) with the coefficients and break points saved from segmented linear models?
UPDATE
I figured out that the code above has issues (don't use it). Through some reverse engineering of the predict.segmented() function, I produced the design matrix and used that to predict values instead of using predict() directly. I do not consider this a full answer to the original question yet, because predict() can also produce confidence intervals for the prediction and I have not implemented that; the question is still open for someone to add confidence intervals.
library(segmented)
## Define function for making matrix of dummy variables (this is based on code from predict.segmented())
dummy.matrix <- function(x.values, x_names, psi.est = TRUE, nameU, nameV, diffSlope, est.psi) {
  # Creates a model matrix with dummy variables for a segmented lm with two breakpoints.
  # Inputs (obj in the examples below refers to the fitted segmented lm object):
  #   x.values: the x values of the segmented lm
  #   x_names: the name of the column of x values
  #   psi.est: legacy from the predict.segmented function; leave it set to TRUE
  #   nameU: names (class character) of the 3rd and 4th coefs, which are "U1.x" "U2.x" for an lm with two breaks. Example: names(c(obj$coef[3], obj$coef[4]))
  #   nameV: names (class character) of the 5th and 6th coefs, which are "psi1.x" "psi2.x" for an lm with two breaks. Example: names(c(obj$coef[5], obj$coef[6]))
  #   diffSlope: the coefficients (class numeric) holding the slope differences, called U1.x and U2.x for an lm with two breaks. Example: c(obj$coef[3], obj$coef[4])
  #   est.psi: the estimated break points (class numeric) from segmented.lm. Example: c(obj$psi[1,2], obj$psi[2,2])
  n <- length(x.values)
  k <- length(est.psi)
  PSI <- matrix(rep(est.psi, rep(n, k)), ncol = k)
  newZ <- matrix(x.values, nrow = n, ncol = k, byrow = FALSE)
  dummy1 <- pmax(newZ - PSI, 0)
  if (psi.est) {
    V <- ifelse(newZ > PSI, -1, 0)
    dummy2 <- if (k == 1) V * diffSlope else V %*% diag(diffSlope)
    newd <- cbind(x.values, dummy1, dummy2)
    colnames(newd) <- c(x_names, nameU, nameV)
  } else {
    newd <- cbind(x.values, dummy1)
    colnames(newd) <- c(x_names, nameU)
  }
  # if (!x_names %in% names(coef(obj.seg)))
  #   newd <- newd[, -1, drop = FALSE]
  return(newd)
}
## Test dummy matrix function----------------------------------------------
set.seed(12)
xx <- 1:100
zz <- runif(100)
yy <- 2 + 1.5*pmax(xx-35,0) - 1.5*pmax(xx-70,0) + 15*pmax(zz-.5,0) + rnorm(100,0,2)
dati <- data.frame(x=xx, y=yy, z=zz)
out.lm <- lm(y ~ x, data=dati)
# 1 segmented variable, 2 breakpoints: you have to specify starting values (vector) for psi:
o <- segmented(out.lm, seg.Z=~x, psi=c(30,60),
               control=seg.control(display=FALSE))
slope(o)
plot.segmented(o)
summary(o)
# Test dummy matrix fn with the same dataset
newdata <- dati
nameU1 <- c("U1.x", "U2.x")
nameV1 <- c("psi1.x", "psi2.x")
diffSlope1 <- c(o$coef[3], o$coef[4])
est.psi1 <- c(o$psi[1,2], o$psi[2,2])
test <- dummy.matrix(x.values = newdata$x, x_names = "x", psi.est = TRUE,
nameU = nameU1, nameV = nameV1, diffSlope = diffSlope1, est.psi = est.psi1)
# Predict response variable using matrix multiplication
col1 <- matrix(1, nrow = dim(test)[1])
test <- cbind(col1, test) # Now test is the same as model.matrix(o)
predY <- coef(o) %*% t(test)
plot(predY[1,])
lines(predict.segmented(o), col = "blue") # good, predict.segmented gives same answer
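The same matrix works for genuinely new x values, and, treating the breakpoints as fixed (as the snippet in the question already assumed), approximate confidence intervals can be sketched from the coefficient covariance matrix. This is only a sketch; it assumes vcov(o) returns the covariance of all six coefficients of the segmented fit:
# Predict at new x values
new.x <- seq(0, 110, by = 5)
newmat <- dummy.matrix(x.values = new.x, x_names = "x", psi.est = TRUE,
                       nameU = nameU1, nameV = nameV1,
                       diffSlope = diffSlope1, est.psi = est.psi1)
newmat <- cbind(1, newmat)  # prepend the intercept column
fit.new <- as.vector(newmat %*% coef(o))
# Approximate pointwise confidence intervals, breakpoints held fixed
se.new <- sqrt(diag(newmat %*% vcov(o) %*% t(newmat)))
lwr <- fit.new - qt(0.975, df.residual(o)) * se.new
upr <- fit.new + qt(0.975, df.residual(o)) * se.new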
I need to apply lm() to an enlarging subset of my dataframe dat, while making prediction for the next observation. For example, I am doing:
fit model                predict
---------                -------
dat[1:3, ]               dat[4, ]
dat[1:4, ]               dat[5, ]
.                        .
.                        .
dat[1:(nrow(dat)-1), ]   dat[nrow(dat), ]
I know what I should do for a particular subset (related to this question: predict() and newdata - How does this work?). For example to predict the last row, I do
dat1 = dat[1:(nrow(dat)-1), ]
dat2 = dat[nrow(dat), ]
fit = lm(log(clicks) ~ log(v1) + log(v12), data=dat1)
predict.fit = predict(fit, newdata=dat2, se.fit=TRUE)
How can I do this automatically for all subsets, and potentially extract what I want into a table?
From fit, I'd need the summary(fit)$adj.r.squared;
From predict.fit I'd need predict.fit$fit value.
Thanks.
(Efficient) solution
This is what you can do:
p <- 3 ## number of parameters in lm()
n <- nrow(dat) - 1
## a function to return what you desire for subset dat[1:x, ]
bundle <- function(x) {
  fit <- lm(log(clicks) ~ log(v1) + log(v12), data = dat, subset = 1:x, model = FALSE)
  pred <- predict(fit, newdata = dat[x+1, ], se.fit = TRUE)
  c(summary(fit)$adj.r.squared, pred$fit, pred$se.fit)
}
## rolling regression / prediction
result <- t(sapply(p:n, bundle))
colnames(result) <- c("adj.r2", "prediction", "se")
Note I have done several things inside the bundle function:
I have used the subset argument to select the subset to fit;
I have used model = FALSE so the model frame is not saved, which saves memory.
Overall, there is no explicit loop; sapply does the iteration:
Fitting starts from p, the minimum number of data points required to fit a model with p coefficients;
Fitting terminates at nrow(dat) - 1, as we need at least the final row for prediction.
Test
Example data (with 30 "observations")
dat <- data.frame(clicks = runif(30, 1, 100), v1 = runif(30, 1, 100),
v12 = runif(30, 1, 100))
Applying code above gives results (27 rows in total, truncated output for 5 rows)
adj.r2 prediction se
[1,] NaN 3.881068 NaN
[2,] 0.106592619 3.676821 0.7517040
[3,] 0.545993989 3.892931 0.2758347
[4,] 0.622612495 3.766101 0.1508270
[5,] 0.180462206 3.996344 0.2059014
The first column is the adjusted R-squared value for the fitted model, and the second column is the prediction. The first value of adj.r2 is NaN because the first model fits 3 coefficients to 3 data points, leaving zero residual degrees of freedom, so no sensible statistic is available. The same happens to se: with an exact fit (all residuals are 0) the residual standard error is undefined, so the prediction comes without an uncertainty estimate.
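You can verify the NaN directly by fitting the first window on its own:
fit3 <- lm(log(clicks) ~ log(v1) + log(v12), data = dat[1:3, ])
summary(fit3)$adj.r.squared  # NaN: three coefficients, three points, zero residual df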
I just made up some random data to use for this example. I'm calling the object data because that is what it was called in the question at the time I wrote this solution (call it anything you like).
data <- data.frame(v1=rnorm(100),v2=rnorm(100),clicks=rnorm(100))
data1 = data[1:(nrow(data)-1), ]
data2 = data[nrow(data), ]
for(i in 3:nrow(data)){
  nam <- paste("predict", i, sep = "")
  nam1 <- paste("fit", i, sep = "")
  nam2 <- paste("summary_fit", i, sep = "")
  fit <- lm(clicks ~ v1 + v2, data = data[1:i, ])
  tmp <- predict(fit, newdata = data2, se.fit = TRUE)
  tmp1 <- fit
  tmp2 <- summary(fit)
  assign(nam, tmp)
  assign(nam1, tmp1)
  assign(nam2, tmp2)
}
All of the results you want will be stored in the data objects this creates.
For example:
> summary_fit10$r.squared
[1] 0.3087432
You mentioned in the comments that you'd like a table of results. You can programmatically create tables of results from the 3 types of output objects like this:
rm(data,data1,data2,i,nam,nam1,nam2,fit,tmp,tmp1,tmp2)
frames <- ls()
frames.fit <- frames[1:98] #change index or use pattern matching as needed
frames.predict <- frames[99:196]
frames.sum <- frames[197:294]
fit.table <- data.frame(intercept=NA,v1=NA,v2=NA,sourcedf=NA)
for(i in 1:length(frames.fit)){
  tmp <- get(frames.fit[i])
  fit.table <- rbind(fit.table, c(tmp$coefficients[[1]], tmp$coefficients[[2]], tmp$coefficients[[3]], frames.fit[i]))
}
}
fit.table
> fit.table
intercept v1 v2 sourcedf
2 -0.0647017971121678 1.34929652763687 -0.300502017324518 fit10
3 -0.0401617893034109 -0.034750571912636 -0.0843076273486442 fit100
4 0.0132968863522573 1.31283604433593 -0.388846211083564 fit11
5 0.0315113918953643 1.31099122173898 -0.371130010135382 fit12
6 0.149582794027583 0.958692838785998 -0.299479715938493 fit13
7 0.00759688947362175 0.703525856001948 -0.297223988673322 fit14
8 0.219756240025917 0.631961979610744 -0.347851129205841 fit15
9 0.13389223748979 0.560583832333355 -0.276076134872669 fit16
10 0.147258022154645 0.581865844000838 -0.278212722024832 fit17
11 0.0592160359650468 0.469842498721747 -0.163187274356457 fit18
12 0.120640756525163 0.430051839741539 -0.201725012088506 fit19
13 0.101443924785995 0.34966728554219 -0.231560038360121 fit20
14 0.0416637001406594 0.472156988919337 -0.247684504074867 fit21
15 -0.0158319749710781 0.451944113682333 -0.171367482879835 fit22
16 -0.0337969739950376 0.423851304105399 -0.157905431162024 fit23
17 -0.109460218252207 0.32206642419212 -0.055331391802687 fit24
18 -0.100560410735971 0.335862465403716 -0.0609509815266072 fit25
19 -0.138175283219818 0.390418411384468 -0.0873106257144312 fit26
20 -0.106984355317733 0.391270279253722 -0.0560299858019556 fit27
21 -0.0740684978271464 0.385267011513678 -0.0548056844433894 fit28
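As a design note, the same bookkeeping is usually easier with a list than with assign() and ls(), since all fits then live in one object. A minimal sketch of that alternative, assuming data as created at the start:
fits <- list()
for(i in 3:nrow(data)){
  fits[[paste0("fit", i)]] <- lm(clicks ~ v1 + v2, data = data[1:i, ])
}
t(sapply(fits, coef))  # one row of coefficients per fit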
I've been trying to backtest the predictability of a regression (trying to get one-step-ahead predictions) by implementing a rolling window regression and recording, for each day in the past, the difference between the estimate and the last available day in a column.
I tried to apply Christoph_J's answer at Rolling regression return multiple objects.
There is no syntax error in the code; however, I'm not sure whether there is a semantic error. Is the value in row i of the "predicted" column the ex-ante prediction of the row i value of the OpCl column?
library(zoo)
library(dynlm)
library(quantmod)
sp <- getSymbols("^GSPC", auto.assign=FALSE)
sp$GSPC.Adjusted <- NULL
colnames(sp) <- gsub("^GSPC\\.","",colnames(sp))
sp$Number<-NA
sp$Number<-1:nrow(sp)
sp$OpCl <- OpCl(sp)
sp$ClHi <- HiCl(sp)
sp$LoCl <- LoCl(sp)
sp$LoHi <- LoHi(sp)
#### LAG
spLag <- lag(sp)
colnames(spLag) <- paste(colnames(sp),"lag",sep="")
sp <- na.omit(merge(sp, spLag))
### REGRESSION
f <- OpCl ~ Openlag + Highlag + OpCllag + ClHilag
OpClLM <- lm(f, data=sp)
#sp$OpClForecast <- NA
#sp$OpClForecast <- tail(fitted(OpClLM),1)
#####################################################
rolling.regression <- function(series) {
  mod <- dynlm(formula = OpCl ~ L(Open) + L(High) + L(OpCl) + L(ClHi),
               data = as.zoo(series))
  nextOb <- min(series[,6])+1 # To get the first row that follows the window
  if (nextOb <= nrow(sp)) { # You won't predict the last one
    # 1) Make predictions
    predicted <- predict(mod, newdata=data.frame(OpCl=sp[nextOb,'OpCl'],
                                                 Open=sp[nextOb,'Open'], High=sp[nextOb,'High'],
                                                 ClHi=sp[nextOb,'ClHi']))
    attributes(predicted) <- NULL
    # Solution: get the column names right
    c(predicted=predicted,
      AdjR = summary(mod)$adj.r.squared)
  }
}
rolling.window <- 300
results.sp <- rollapply(sp, width=rolling.window,
FUN=rolling.regression, by.column=F, align='right')
sp<-cbind(sp,results.sp)
View(sp)