Linear Regression Analysis - rolling across rows - r

I need a suggestion on how to get the results of my regression analysis into an object.
I want to perform the regression analysis row-wise, with a rolling window of 20 days.
The object Slope should store the slope from each day's regression over that window.
#Loading Library
require(quantmod)
#Initiation of Example
mc_result <- matrix(sample(c(1:200)), ncol = 200, nrow =1)
mc_result1 <- matrix(sample(c(1:200)), ncol =200, nrow =1)
mc_result <- rbind(mc_result, mc_result1)
a <- c(1:200)
Slope <- matrix(ncol=2, nrow=181)
Caution: the loop below does not work.
It should apply rollapply row-wise and save the results for each day in the object Slope.
The output has the shape I expect, but the slope values should change from window to window. At the moment the slope value is constant and I don't know why.
for (i in 1:2) {
Slope[,i] <- rollapply(data =mc_result[i,], width=20,
FUN = function(z)
summary(lm(mc_result[i,] ~ a, data = as.data.frame(z)))$coefficients[2], by.column = FALSE)
}

I think you want the following. In your code, neither mc_result[i,] nor a rolls over the indices of the data: only z changes, and z is never used in the formula. Every window therefore fits the same full dataset, which is why the regression coefficients do not change. Roll over the indices instead and use them to subset both variables:
#Loading Library
require(quantmod)
#Initiation of Example
mc_result <- matrix(sample(c(1:200)), ncol = 200, nrow =1)
mc_result1 <- matrix(sample(c(1:200)), ncol =200, nrow =1)
mc_result <- rbind(mc_result, mc_result1)
a <- c(1:200)
Slope <- matrix(ncol=2, nrow=181)
for (i in 1:2) {
Slope[,i] <- rollapply(data = 1:200, width=20,
FUN = function(z) {
summary(lm(mc_result[i,z] ~ a[z]))$coefficients[2]
}, by.column = FALSE)
}
head(Slope)
[,1] [,2]
[1,] 1.3909774 2.0278195
[2,] 1.0315789 2.8421053
[3,] 1.5082707 2.8571429
[4,] 0.0481203 1.6917293
[5,] 0.2969925 0.2060150
[6,] 1.3526316 0.6842105
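As a cross-check on the idea of rolling over indices, here is a minimal base-R sketch that does not use rollapply at all. The names y, width, starts, roll_slope and closed are illustrative assumptions, not part of the original code. It fits one regression per 20-observation window and compares the slope with the closed-form value cov(a, y) / var(a) for simple linear regression.

```r
set.seed(1)
y <- sample(1:200)      # one row of example data
a <- seq_along(y)       # time index, as in the question
width <- 20

# Start index of every full window: 200 - 20 + 1 = 181 windows
starts <- seq_len(length(y) - width + 1)

# Slope from lm() on each window; idx selects the SAME rows of y and a
roll_slope <- sapply(starts, function(s) {
  idx <- s:(s + width - 1)
  coef(lm(y[idx] ~ a[idx]))[[2]]
})

# Closed-form slope of a simple regression, for comparison
closed <- sapply(starts, function(s) {
  idx <- s:(s + width - 1)
  cov(a[idx], y[idx]) / var(a[idx])
})

length(roll_slope)              # number of windows
all.equal(roll_slope, closed)   # the two computations should agree
```

Because both y and a are subset by the same window indices, the slope changes from window to window, which is the behaviour the original loop was missing.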

Related

loop over variable names

I am trying to build various regression models with different columns (independent variables in my dataset).
set.seed(0)
True = rnorm(20, 100, 10)
v = matrix(rnorm(120, 10, 3), nrow = 20)
dt = data.frame(cbind(True, v))
colnames(dt) = c('True', paste0('ABC', 1:6))
So the independent variable I want to use in each model is "ABCi": when i = 1, use ABC1, and so on. Each model is fitted on the first 80% of the observations, and I then make predictions for the remaining 20%.
I tried this:
reg.pred = rep(0, ncol(dt))
for (i in 1:nrow(dt)){
reg = lm(True~paste0('ABC', i), data = dt[(1:(0.8*nrow(dt))),])
reg.pred[i] = predict(reg, data = dt[(0.8*nrow(dt)):nrow(dt),])
}
Not working... giving errors like:
Error in model.frame.default(formula = True ~ paste0("ABC", i), data = dt[(1:(0.8 * :
variable lengths differ (found for 'paste0("ABC", i)')
Not sure how can I retrieve the variable name in a loop... Any suggestion is appreciated!
You do not technically need to use as.formula() as @Sonny suggests, but you cannot mix a character representation of the formula with formula notation, so you need to fix that. Once you do, you'll notice other issues with your code that @Sonny either did not notice or opted not to address.
Most notably, the line
reg.pred = rep(0, ncol(dt))
implies you want a single prediction from each model, but
predict(reg, data = dt[(0.8*nrow(dt)):nrow(dt),])
implies you want a prediction for each of the observations not in the training set (you'll need a +1 after 0.8*nrow(dt) for that by the way).
I think the following should fix all your issues:
set.seed(0)
True = rnorm(20, 100, 10)
v = matrix(rnorm(120, 10, 3), nrow = 20)
dt = data.frame(cbind(True, v))
colnames(dt) = c('True', paste0('ABC', 1:6))
# Make a matrix for the predicted values; each column is for a model
reg.pred = matrix(0, nrow = 0.2*nrow(dt), ncol = ncol(dt)-1)
for (i in 1:(ncol(dt)-1)){
# Get the name of the predictor we want here
this_predictor <- paste0("ABC", i)
# Make a character representation of the lm formula
lm_formula <- paste("True", this_predictor, sep = "~")
# Run the model
reg = lm(lm_formula, data = dt[(1:(0.8*nrow(dt))),])
# Get the appropriate test data
newdata <- data.frame(dt[(0.8*nrow(dt)+1):nrow(dt), this_predictor])
names(newdata) <- this_predictor
# Store predictions
reg.pred[ , i] = predict(reg, newdata = newdata)
}
reg.pred
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 100.2150 100.8394 100.7915 99.88836 97.89952 105.7201
# [2,] 101.2107 100.8937 100.9110 103.52487 102.13965 104.6283
# [3,] 100.0426 101.0345 101.2740 100.95785 102.60346 104.2823
# [4,] 101.1055 100.9686 101.5142 102.56364 101.56400 104.4447
In this matrix of predictions, each column is from a different model, and the rows correspond to the last four rows of your data (the rows not in your training set).
You can use as.formula
f <- as.formula(
paste("True",
paste0('ABC', i),
sep = " ~ "))
reg = lm(f, data = dt[(1:(0.8*nrow(dt))),])
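A third way to build the formula, sketched here as an alternative rather than as what either answer used, is stats::reformulate(), which constructs a formula from character vectors. The names n_train, train, test, predictors and preds are assumptions introduced for this illustration. Note that predict() takes the new observations through its newdata argument.

```r
set.seed(0)
True <- rnorm(20, 100, 10)
v <- matrix(rnorm(120, 10, 3), nrow = 20)
dt <- data.frame(cbind(True, v))
colnames(dt) <- c("True", paste0("ABC", 1:6))

# Explicit train/test split: first 80% of rows vs. the remainder
n_train <- floor(0.8 * nrow(dt))
train <- dt[seq_len(n_train), ]
test  <- dt[(n_train + 1):nrow(dt), ]

predictors <- setdiff(names(dt), "True")

# One model per predictor; each column of preds holds that model's predictions
preds <- sapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "True"), data = train)
  predict(fit, newdata = test)
})

dim(preds)   # rows = held-out observations, columns = models
```

Looping over the column names instead of an integer index also avoids mixing up the number of rows and the number of predictors.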

Rolling Granger Causality

I am using the MSBVAR package in R to calculate Granger causality between two variables. The data and commands are the same as those used in the package:
data(IsraelPalestineConflict)
granger.test(IsraelPalestineConflict, p=6)
It gives following results:
F-statistic p-value
p2i -> i2p 17.63100 0.000000e+00
i2p -> p2i 10.91235 7.134737e-12
I want to apply this function in a loop or with rollapply and save the results to a file. I tried the following after reading past answers on rollapply, but as I am new to R I don't know how to make it work.
rollapply(zoo(IsraelPalestineConflict),width=1275,
FUN = function(t)
{ t = granger.test(IsraelPalestineConflict, p=6);
},
by.column=FALSE, align="right")
But it gives the same results, with the first column replaced by years, and I don't know how to save the F-statistics and p-values with rollapply.
F-statistic p-value
2003.8077 17.63100 0.000000e+00
2003.8269 10.91235 7.134737e-12
Kind answer is requested, please.
Perhaps you want this:
granger.test.c <- function(x) c(granger.test(x, p = 6))
rollapplyr(IsraelPalestineConflict, 1275, granger.test.c, by.column = FALSE )
This creates a list of the above for p = 2, 3, 4, 5:
granger.test.c <- function(x, p) c(granger.test(x, p = p))
p <- 2:5
roll <- function(p, DF) rollapplyr(DF, 1275, granger.test.c, by.column = FALSE, p = p )
L <- lapply(p, roll, DF = IsraelPalestineConflict)
names(L) <- p
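To show the general pattern (one statistic vector per window, collected into a table and written to a file) without depending on MSBVAR, here is a rough base-R sketch. The function granger_f is a hand-rolled, hypothetical stand-in that tests whether lags of x help predict y. It is not MSBVAR's granger.test and may differ from it in details. The simulated data, the window width of 100 and the lag order of 2 are likewise assumptions chosen only for illustration.

```r
# Hypothetical helper: F-test of "lags of x add nothing" in a regression of y
granger_f <- function(y, x, p) {
  Ey <- embed(y, p + 1)              # columns: y_t, y_{t-1}, ..., y_{t-p}
  Ex <- embed(x, p + 1)
  resp  <- Ey[, 1]
  ylags <- Ey[, -1, drop = FALSE]
  xlags <- Ex[, -1, drop = FALSE]
  restricted   <- lm(resp ~ ylags)
  unrestricted <- lm(resp ~ ylags + xlags)
  tab <- anova(restricted, unrestricted)
  c(F = tab$F[2], p.value = tab$`Pr(>F)`[2])
}

set.seed(42)
n <- 300
x <- rnorm(n)
y <- c(0, 0.8 * x[-n]) + rnorm(n, sd = 0.5)   # y depends on lagged x

width <- 100
starts <- seq_len(n - width + 1)

# One row of results per window
res <- t(sapply(starts, function(s) {
  idx <- s:(s + width - 1)
  granger_f(y[idx], x[idx], p = 2)
}))
res <- data.frame(window_end = starts + width - 1, res)

# Save the rolling statistics to a CSV file
out_file <- tempfile(fileext = ".csv")
write.csv(res, out_file, row.names = FALSE)
head(res)
```

The key point carries over to rollapply: the function applied to each window must use the window it is given and return a plain numeric vector, so that the results stack into rows.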

How to extract the p.value and estimate from cor.test() in a data.frame?

In this example, I have temperature values from 50 different sites, and I would like to correlate Site1 with all 50 sites. I want to extract only the components "p.value" and "estimate" returned by cor.test() into two different columns of a data.frame.
My attempt works, but I don't know why!
I would like to know how to simplify my code, because at the moment I have to run the "for" loop twice to get my results.
Here is my example:
# Temperature data
data <- matrix(rnorm(500, 10:30, sd=5), nrow = 100, ncol = 50, byrow = TRUE,
dimnames = list(c(paste("Year", 1:100)),
c(paste("Site", 1:50))) )
# Empty data.frame
df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value="")
# Extraction
for (i in 1:50) {
df1 <- cor.test(data[,1], data[,i] )
df[,2:3] <- df1[c("estimate", "p.value")]
}
for (i in 1:50) {
df1 <- cor.test(data[,1], data[,i] )
df[i,2:3] <- df1[c("estimate", "p.value")]
}
df
I will appreciate very much your help :)
I might offer up the following as well (masking the loops):
result <- do.call(rbind,lapply(2:50, function(x) {
cor.result<-cor.test(data[,1],data[,x])
pvalue <- cor.result$p.value
estimate <- cor.result$estimate
return(data.frame(pvalue = pvalue, estimate = estimate))
})
)
First of all, I'm guessing you had a typo in your code: you should have rnorm(5000, ...) if you want unique values. Otherwise you're going to cycle through those 500 numbers 10 times.
Anyway, a simple way of doing this would be:
data <- matrix(rnorm(5000, 10:30, sd=5), nrow = 100, ncol = 50, byrow = TRUE,
dimnames = list(c(paste("Year", 1:100)),
c(paste("Site", 1:50))) )
# Empty data.frame
df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value="")
estimates = numeric(50)
pvalues = numeric(50)
for (i in 1:50){
test <- cor.test(data[,1], data[,i])
estimates[i] = test$estimate
pvalues[i] = test$p.value
}
df$Estimate <- estimates
df$P.value <- pvalues
df
Edit: I believe your issue is in the line df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value=""). If you run typeof(df$Estimate), you see it's expecting an integer (the empty string has been stored as a factor, which is the default in R versions before 4.0), while typeof(test$estimate) shows it returns a double, so R doesn't know what you're trying to do with those two values. You can redo your code like this:
df <- data.frame(label=paste("Site", 1:50), Estimate=numeric(50), P.value=numeric(50))
for (i in 1:50){
test <- cor.test(data[,1], data[,i])
df$Estimate[i] = test$estimate
df$P.value[i] = test$p.value
}
to make it a little more concise.
Similar to the answer of colemand77:
Create a cor function:
cor_fun <- function(x, y, method){
tmp <- cor.test(x, y, method= method)
cbind(r=tmp$estimate, p=tmp$p.value) }
Apply it over the columns of the data. You can transpose the result to get p and r by row:
t(apply(data, 2, cor_fun, data[, 1], "spearman"))
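Putting the pieces from the answers together, a single pass that builds the final data.frame directly might look like the sketch below. The names stats and df are illustrative, and the seed is an assumption added only to make the run repeatable.

```r
set.seed(1)
data <- matrix(rnorm(5000, 10:30, sd = 5), nrow = 100, ncol = 50, byrow = TRUE,
               dimnames = list(paste("Year", 1:100), paste("Site", 1:50)))

# One cor.test per site; keep only the two components of interest
stats <- t(sapply(seq_len(ncol(data)), function(i) {
  ct <- cor.test(data[, 1], data[, i])
  c(Estimate = unname(ct$estimate), P.value = ct$p.value)
}))

# Numeric columns from the start, so no type conflict on assignment
df <- data.frame(label = colnames(data), stats)
head(df)
```

Creating the data.frame after the numbers exist sidesteps the empty-string placeholder columns entirely.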

p-value matrix of x and y variables from anova output

I have many X and Y variables (something like 500 x 500). The following is just a small example:
yvars <- data.frame (Yv1 = rnorm(100, 5, 3), Y2 = rnorm (100, 6, 4),
Yv3 = rnorm (100, 14, 3))
xvars <- data.frame (Xv1 = sample (c(1,0, -1), 100, replace = T),
X2 = sample (c(1,0, -1), 100, replace = T),
Xv3 = sample (c(1,0, -1), 100, replace = T),
D = sample (c(1,0, -1), 100, replace = T))
I want to extract p-values and make a matrix like this:
Yv1 Y2 Yv3
Xv1
X2
Xv3
D
Here is my attempt to loop the process:
prob = NULL
anova.pmat <- function (x) {
mydata <- data.frame(yvar = yvars[, x], xvars)
for (i in seq(length(xvars))) {
prob[[i]] <- anova(lm(yvar ~ mydata[, i + 1],
data = mydata))$`Pr(>F)`[1]
}
}
sapply (yvars,anova.pmat)
Error in .subset(x, j) : only 0's may be mixed with negative subscripts
What could be the solution ?
Edit:
For the first Y variable:
prob <- NULL
mydata <- data.frame(yvar = yvars[, 1], xvars)
for (i in seq(length(xvars))) {
prob[[i]] <- anova(lm(yvar ~ mydata[, i + 1],
data = mydata))$`Pr(>F)`[1]
}
prob
[1] 0.4995179 0.4067040 0.4181571 0.6291167
Edit again:
for (j in seq(length (yvars))){
prob <- NULL
mydata <- data.frame(yvar = yvars[, j], xvars)
for (i in seq(length(xvars))) {
prob[[i]] <- anova(lm(yvar ~ mydata[, i + 1],
data = mydata))$`Pr(>F)`[1]
}
}
Gives the same result as above !!!
Here is an approach that uses plyr to loop over the columns of a dataframe (treating it as a list) for each of the xvars and yvars, returning the appropriate p-value, arranging it into a matrix. Adding the row/column names is just extra.
library("plyr")
probs <- laply(xvars, function(x) {
laply(yvars, function(y) {
anova(lm(y~x))$`Pr(>F)`[1]
})
})
rownames(probs) <- names(xvars)
colnames(probs) <- names(yvars)
Here is one solution, which consists in generating all combinations of Y- and X-variables to test (we cannot use combn) and run a linear model in each case:
dfrm <- data.frame(y=gl(ncol(yvars), ncol(xvars), labels=names(yvars)),
x=gl(ncol(xvars), 1, labels=names(xvars)), pval=NA)
## little helper function to create formula on the fly
fm <- function(x) as.formula(paste(unlist(x), collapse="~"))
## merge both datasets
full.df <- cbind.data.frame(yvars, xvars)
## apply our LM row-wise
dfrm$pval <- apply(dfrm[,1:2], 1,
function(x) anova(lm(fm(x), full.df))$`Pr(>F)`[1])
## arrange everything in a rectangular matrix of p-values
res <- matrix(dfrm$pval, nc=3, dimnames=list(levels(dfrm$x), levels(dfrm$y)))
Sidenote: With high-dimensional datasets, relying on the QR decomposition to compute the p-value of a linear regression is time-consuming. It is easier to compute the matrix of Pearson linear correlations for all pairwise comparisons and transform the r statistic into a Fisher-Snedecor F using the relation $F = \nu_a r^2 / (1 - r^2)$, where the degrees of freedom are defined as $\nu_a = (n - 2) - \#\{(x_i = \mathrm{NA}), (y_i = \mathrm{NA})\}$ (that is, $(n - 2)$ minus the number of pairwise missing values; if there are no missing values, this formula is the usual coefficient $R^2$ in regression).
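The correlation shortcut from the sidenote can be sketched as follows for the case with no missing values, where the degrees of freedom reduce to n - 2. The names r, Fstat, pmat and check are assumptions made for this illustration, and the X variables are treated as numeric, as in the lm() calls above.

```r
set.seed(123)
n <- 100
yvars <- data.frame(Yv1 = rnorm(n, 5, 3), Y2 = rnorm(n, 6, 4),
                    Yv3 = rnorm(n, 14, 3))
xvars <- data.frame(Xv1 = sample(c(1, 0, -1), n, replace = TRUE),
                    X2  = sample(c(1, 0, -1), n, replace = TRUE),
                    Xv3 = sample(c(1, 0, -1), n, replace = TRUE),
                    D   = sample(c(1, 0, -1), n, replace = TRUE))

# All pairwise Pearson correlations at once: rows = X, columns = Y
r <- cor(xvars, yvars)

# F statistic with 1 and n - 2 degrees of freedom, then its upper-tail p-value
Fstat <- (n - 2) * r^2 / (1 - r^2)
pmat <- r
pmat[] <- pf(Fstat, df1 = 1, df2 = n - 2, lower.tail = FALSE)

# Slow reference: one lm() per (X, Y) pair
check <- sapply(yvars, function(y) {
  sapply(xvars, function(x) anova(lm(y ~ x))$`Pr(>F)`[1])
})

all.equal(pmat, check, check.attributes = FALSE)
```

For a 500 x 500 problem the single cor() call replaces 250,000 model fits, which is where the time saving comes from.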

applying Paired t-test of column of two different matrix-R code

I have two matrices. I would like to apply a paired t test column by column and print the t-value, degrees of freedom, confidence interval and p value for each column. I started with the code below.
D1 and D2 are two matrices:
for (j in 1:n){
t.test(D1[,j],D2[,j],paired=T)
}
Also, how can I print each result from this loop?
Here's how I'd approach the problem:
#Make some random data
m1 <- matrix(rnorm(100), ncol = 5)
m2 <- matrix(rnorm(100), ncol = 5)
#Define a function to run your t.test, grab the relevant stats, and put them in a data.frame
f <- function(x,y){
test <- t.test(x,y, paired=TRUE)
out <- data.frame(stat = test$statistic,
df = test$parameter,
pval = test$p.value,
conl = test$conf.int[1],
conh = test$conf.int[2]
)
return(out)
}
#iterate over your columns via sapply
sapply(seq(ncol(m1)), function(x) f(m1[,x], m2[,x]))
#-----
[,1] [,2] [,3] [,4] [,5]
stat -0.7317108 1.73474 -0.0658436 0.6252509 -0.6161323
df 19 19 19 19 19
pval 0.4732743 0.09898052 0.9481902 0.5392442 0.5451188
conl -1.097654 -0.1259523 -0.7284456 -0.5680937 -0.7523431
conh 0.5289878 1.345625 0.6840117 1.052094 0.4101385
You may want to transpose the output, since each column holds the results for one test.
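If a table with one row per column pair is preferred over a transposed matrix, a variant along these lines should work. The names res, conf.low and conf.high are illustrative choices, and the seed is an assumption for repeatability.

```r
set.seed(1)
m1 <- matrix(rnorm(100), ncol = 5)
m2 <- matrix(rnorm(100), ncol = 5)

# One paired t-test per column, each returned as a one-row data.frame
res <- do.call(rbind, lapply(seq_len(ncol(m1)), function(j) {
  tt <- t.test(m1[, j], m2[, j], paired = TRUE)
  data.frame(column    = j,
             t         = unname(tt$statistic),
             df        = unname(tt$parameter),
             p.value   = tt$p.value,
             conf.low  = tt$conf.int[1],
             conf.high = tt$conf.int[2])
}))

print(res)   # prints every test's result, one row per column
```

Because each column of res has a single type, the result can be filtered or sorted directly, for example by p.value.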
