How to create a loop for a linear model in R - r

I am here to ask for your help.
I have to run a series of OLS regressions on multiple dependent variables, using the same set of independent variables for each one.
I.e. I have a data frame of size (1510x5), in which each column represents the return of a portfolio, and I would like to regress each column against the same set of independent variables (1510x4), which in my case are the factors from the Carhart model. Since, besides the values of the coefficients, I am interested in both their p-values and the R2 of each regression, is there a way to build a loop that allows me to store this information?
What I have tried so far is:
for (i in 1:ncol(EW_Portfolio)) {
lmfit <- lm(EW_Portfolio[, i] ~ FFM)
summary(lmfit_i)
}
in the hope that, every time the loop repeated itself, I could see the result of each individual regression.

The easiest would be to store it in a list:
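resultsList <- list()
for (i in seq_len(ncol(EW_Portfolio))) {
  # FFM must be a numeric matrix here; wrap it in as.matrix() if it is a data frame
  lmfit <- lm(EW_Portfolio[, i] ~ FFM)
  # Store the summary of the model just fitted (the object is lmfit, not lmfit_i)
  resultsList[[i]] <- summary(lmfit)
}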
You can then access the results you mention:
resultsList[[1]]$coefficients
resultsList[[1]]$r.squared
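If you would rather have everything in one table, here is a minimal sketch that collects the R-squared and the coefficient p-values of every regression into a matrix with one row per portfolio. The data below are simulated stand-ins (the sizes and the factor names MKT, SMB, HML, MOM are my assumptions), so substitute your own EW_Portfolio and FFM.

```r
# Simulated stand-ins for the question's objects (assumed shapes, made-up data)
set.seed(1)
FFM <- matrix(rnorm(200 * 4), ncol = 4,
              dimnames = list(NULL, c("MKT", "SMB", "HML", "MOM")))
EW_Portfolio <- as.data.frame(matrix(rnorm(200 * 5), ncol = 5))

# One row per portfolio: R-squared plus the p-value of every coefficient
stats <- t(sapply(seq_len(ncol(EW_Portfolio)), function(i) {
  s <- summary(lm(EW_Portfolio[, i] ~ FFM))
  c(r.squared = s$r.squared, s$coefficients[, "Pr(>|t|)"])
}))
rownames(stats) <- colnames(EW_Portfolio)
stats
```

The p-values sit in the "Pr(>|t|)" column of the coefficient table that summary() returns, which is why that column is indexed by name.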

It may be something like this; I wasn't sure about the p-values.
data("mtcars")
formulas <- list(
mpg ~ disp,
mpg ~ disp + wt
)
res <- vector("list", length = length(formulas))
my.r2 <- vector("list", length = length(formulas))
my.sum <- vector("list", length = length(formulas))
for(i in seq_along(formulas)){
res[[i]] <- lm(formulas[[i]], data = mtcars)
my.r2[[i]] <- (summary(res[[i]]))$adj.r.squared
my.sum[[i]] <- (summary(res[[i]]))
}
res
unlist(my.r2)
my.sum
lapply(formulas, lm, data = mtcars)
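For the p-values, one possibility (a sketch on the same mtcars formulas, not the only way to do it) is to pull the "Pr(>|t|)" column out of each model's coefficient table:

```r
formulas <- list(
  mpg ~ disp,
  mpg ~ disp + wt
)
fits <- lapply(formulas, lm, data = mtcars)

# coef(summary(fit)) is the coefficient table; its last column holds the p-values
p.values <- lapply(fits, function(fit) coef(summary(fit))[, "Pr(>|t|)"])
p.values
```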

Related

R - Loop Need Assistance with writing a loop and storing the results in a data frame in R

I have an enormous data set. I want to sample it K times, run a linear regression and extract the RMSE each time to store in a data frame.
pseudo code:
rmse <- emptyDataFrame{}
for (i in 1:100)
sample_n(df, n, replace=True)
model <- lm(y ~ ., data = df)
rmse <- sqrt(mean(y_pred - y)^2))
Can anyone give me the missing details?
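You can add [i] after rmse to store each value at a different index. Also, I don't know how your sample_n function works, but you probably need to save its output in a new variable and pass that to lm.
Also, the formula for RMSE is sqrt(mean((y_pred - y)^2)): the square goes inside mean().
rmse <- numeric(100)
for (i in 1:100) {
  # sample_n() as in the question (dplyr has a function of this name); note TRUE, not True
  df_sampled <- sample_n(df, n, replace = TRUE)
  model <- lm(y ~ ., data = df_sampled)
  # Assumption: y_pred is the model's prediction on the sampled rows
  y_pred <- predict(model, newdata = df_sampled)
  rmse[i] <- sqrt(mean((y_pred - df_sampled$y)^2))
}
as.data.frame(rmse)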
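In case you want to try this without any extra package, here is a self-contained base R sketch of the same idea. The data frame, K and n below are made up for illustration, and the RMSE is computed in-sample (on the rows that were drawn), which may or may not be what you need.

```r
set.seed(42)
# Made-up data standing in for the real data set
df <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
df$y <- 1 + 2 * df$x1 - df$x2 + rnorm(500)

K <- 100  # number of resamples
n <- 200  # rows drawn per resample

rmse <- vapply(seq_len(K), function(k) {
  idx <- sample.int(nrow(df), n, replace = TRUE)  # row indices, with replacement
  fit <- lm(y ~ ., data = df[idx, ])
  sqrt(mean(residuals(fit)^2))                    # in-sample RMSE of this fit
}, numeric(1))

rmse_df <- data.frame(iteration = seq_len(K), rmse = rmse)
head(rmse_df)
```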

fitting linear regression models with different predictors using loops

I want to fit regression models using a single predictor variable at a time. In total I have 7 predictors and 1 response variable. I want to write a chunk of code that picks one predictor variable from the data frame and fits a model. I would then like to extract the regression coefficient (not the intercept) and its sign, and store them in 2 vectors. Here's my code:
for (x in (1:7))
{
fit <- lm(distance ~ FAA_unique_with_duration_filtered[x] , data=FAA_unique_with_duration_filtered)
coeff_values<-summary(fit)$coefficients[,1]
coeff_value<-coeff_values[2]
append(coeff_value_vector,coeff_value , after = length(coeff_value_vector))
append(RCs_sign_vector ,sign(coeff_values[2]) , after = length(RCs_sign_vector))
}
Here x should select the first column, then the second, and so on. However, I am getting the following error:
Error in model.frame.default(formula = distance ~ FAA_unique_with_duration_filtered[x], :
invalid type (list) for variable 'FAA_unique_with_duration_filtered[x]'
Is there a way to do this using loops?
You don't really need loops for this.
Suppose we want to regress y1, the 5th column of the built-in anscombe dataset, separately on each of the first 4 columns.
Then:
a <- anscombe
reg <- function(i) coef(lm(y1 ~., a[c(5, i)]))[[2]] # use lm
coefs <- sapply(1:4, reg)
signs <- sign(coefs)
# or
a <- anscombe
reg <- function(i) cov(a$y1, a[[i]]) / var(a[[i]]) # use formula for slope
coefs <- sapply(1:4, reg)
signs <- sign(coefs)
Alternately the following where reg is either of the reg definitions above.
a <- anscombe
coefs <- numeric(4)
for(i in 1:4) coefs[i] <- reg(i)
signs <- sign(coefs)
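As for the error itself: FAA_unique_with_duration_filtered[x] is a one-column data frame (a list), which is what model.frame complains about, and append() does not modify its argument unless you assign the result back. If you do want to go through the predictors by name, a sketch along these lines builds each formula with reformulate(); it uses anscombe again, so the column names are only placeholders for yours.

```r
a <- anscombe
predictors <- names(a)[1:4]  # "x1" ... "x4", stand-ins for the 7 real predictors

coefs <- vapply(predictors, function(p) {
  # reformulate("x1", response = "y1") builds the formula y1 ~ x1
  fit <- lm(reformulate(p, response = "y1"), data = a)
  coef(fit)[[2]]             # slope only, intercept dropped
}, numeric(1))

signs <- sign(coefs)
coefs
signs
```

Because vapply() keeps the names, coefs and signs come back labelled by predictor.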

How can I extract one specific coefficient from multiple lavaan models?

I wrote a function to run several lavaan models at once (from 5 different datasets). In the output I get the 5 different outputs. However, I would like to extract one specific estimate from each of these models, because I am using these in a meta-analysis (and I have many more models)
Here is my code for running the model:
df_list <- list ('Y1'=emo_dyn_1,'Y2'=emo_dyn_2,'Y3'=emo_dyn_3,'Y4'=emo_dyn_4,'Y5'=emo_dyn_5)
model <- 'DepB ~ isdNA + imeanNA + sex + age'
fun = function(emo_dyn){
fit=sem(model,
data=emo_dyn,
estimator = "MLR",
missing = "ml.x")
summ = summary(fit, standardized = TRUE)
list(fit = fit,summary = summ)
}
results <- lapply(df_list,fun)
names(results) <- names(df_list)
results
And this is how I extract the coefficient. It kinda makes it a dataframe and then I extract the specific value from it. Not sure if that is the best option. It is about the standardized estimate of a specific path. But it is just copy and paste and I am sure this goes easier, but I don't know how to write this loop.
emo_dyn_1_est<-standardizedSolution(results$Y1$fit) # Standardised coefficients
emo_dyn_1_est_1<-emo_dyn_1_est[1, 4]
emo_dyn_1_est_1
emo_dyn_2_est<-standardizedSolution(results$Y2$fit) # Standardised coefficients
emo_dyn_2_est_2<-emo_dyn_2_est[1, 4]
emo_dyn_2_est_2
emo_dyn_3_est<-standardizedSolution(results$Y3$fit) # Standardised coefficients
emo_dyn_3_est_3<-emo_dyn_3_est[1, 4]
emo_dyn_3_est_3
emo_dyn_4_est<-standardizedSolution(results$Y4$fit) # Standardised coefficients
emo_dyn_4_est_4<-emo_dyn_4_est[1, 4]
emo_dyn_4_est_4
emo_dyn_5_est<-standardizedSolution(results$Y5$fit) # Standardised coefficients
emo_dyn_5_est_5<-emo_dyn_5_est[1, 4]
emo_dyn_5_est_5
lavaan has the parameterEstimates function so you can do something like:
df_list <- list ('Y1'=emo_dyn_1,'Y2'=emo_dyn_2,'Y3'=emo_dyn_3,'Y4'=emo_dyn_4,'Y5'=emo_dyn_5)
model <- 'DepB ~ isdNA + imeanNA + sex + age'
fun <- function(emo_dyn){
fit <- sem(model,
data=emo_dyn,
estimator = "MLR",
missing = "ml.x")
fit
}
results <- lapply(df_list,fun)
names(results) <- names(df_list)
## Get a specific parameter
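get_param <- function(fit, coef_pos) {
  # "std.all" matches the est.std column of standardizedSolution() used in the question;
  # "std.lv" would standardize only with respect to latent variables
  param <- parameterEstimates(fit, standardized = TRUE)[coef_pos, "std.all"]
  param
}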
lapply(results, get_param, coef_pos = 1)
I made one change: in your lapply to get the results I only kept the model fit. If you want all the summaries you can just do lapply(results, summary). The get_param function assumes that you know the position in the results table of the parameter you want.
If you want to keep your existing lapply for the results then something like this would work:
results_fit_only <- lapply(results, "[[", "fit")
lapply(results_fit_only, get_param, coef_pos = 1)

Get list of R-squared values for linear regression model as we incrementally add predictors

I have a regression that predicts y based on 14 x-values (x1 through x14). I want to write a loop that does a regression where each iteration of the loop adds one more predictor to the regression, then tells me what the r-squared is. Here is my code:
rsqvals <- rep(NA, 15)
for (i in 1:15){
simtemp2 <- simdata[, 1:i]
modeL <- lm(y ~ ., data=simtemp2)
rsqvals[i] <- summary(modeL)$r.squared
}
where simdata is my data frame and simtemp2 is the columns I want. I suspect the problem has something to do with the fact that I can't type simdata[, 1:i], but I'm not sure why not. Any help appreciated!
It looks like you are subsetting the data.frame too much on the first iteration. In your first iteration, you would get simtemp2 <- simdata[,1:1]. The result of this operation is a vector in simtemp2. Even if you convert simtemp2 back into a data.frame, lm() will not like it as a parameter. Try starting at 2 and see if this works:
rsqvals <- rep(NA, 15)
interceptonly <- lm(y~1,data=simdata) ### no features, only the intercept
### this isn't statistically meaningful, but I put it here for completeness
rsqvals[1] <- summary(interceptonly)$r.squared
for (i in 2:15){
simtemp2 <- simdata[, 1:i]
modeL <- lm(y ~ ., data=simtemp2)
rsqvals[i] <- summary(modeL)$r.squared
}
print(rsqvals)
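Another way to avoid subsetting columns by position is to build the formula from the predictor names, adding one name per iteration. A minimal sketch on mtcars (the choice of mpg as the response and the order of the predictors are just assumptions for the example):

```r
predictors <- setdiff(names(mtcars), "mpg")  # 10 candidate predictors

rsqvals <- vapply(seq_along(predictors), function(k) {
  # Formula with the first k predictors, e.g. mpg ~ cyl + disp for k = 2
  f <- reformulate(predictors[seq_len(k)], response = "mpg")
  summary(lm(f, data = mtcars))$r.squared
}, numeric(1))

names(rsqvals) <- predictors  # label each value by the predictor added at that step
rsqvals
```

Since each model nests the previous one, the R-squared values can only stay the same or go up as predictors are added.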

R: Regression of each variable depending on all the others

In R, I have the following data.frame:
df <- data.frame(var1,var2,var3)
I would like to fit a regression function, like multinom, for each variable with respect to the others, without using the variable names explicitly. In other words, I would like to obtain this result:
fit1 <- multinom(var1 ~ ., data=df)
fit2 <- multinom(var2 ~ ., data=df)
fit3 <- multinom(var3 ~ ., data=df)
But in a for loop, without using the variable names (so that I can use the same code for any data.frame). Something similar to this:
for (i in colnames(df))
{
fit[i] <- lm(i ~ ., data=df)
}
(This code does not work.)
Maybe my question is trivial, but I have no idea on how to proceed.
Thanks!
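You need an extra step that builds the formula object with a string operation:
# Name the list up front so that fit[[i]] fills the existing slots instead of appending new ones
fit <- setNames(vector(mode = "list", length = ncol(df)), colnames(df))
for (i in colnames(df)) {
  fm <- as.formula(paste0(i, " ~ ."))
  fit[[i]] <- lm(fm, data = df)
}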
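The same thing can be written without a loop. This is only a sketch on the numeric columns of iris (my choice of example data); lm could be swapped for another fitting function that accepts a formula and a data argument.

```r
df <- iris[, 1:4]  # example data: four numeric columns

# setNames(nm = ...) yields a named character vector, so the result is a named list
fits <- lapply(setNames(nm = colnames(df)), function(v) {
  lm(as.formula(paste(v, "~ .")), data = df)
})

# For instance, the R-squared of each "one variable on all the others" regression
sapply(fits, function(f) summary(f)$r.squared)
```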
