Using a for loop to extract coefficients from multiple models - r

I have multiple cox models (with one variable static in all models) and am trying to extract the coefficient for that variable.
In all models the coefficient is indexed the same way: for example, in model1 it is model1[[8]][1], in model2 it is model2[[8]][1], etc. I attempted to write a for loop in R, shown below, but it's not working.
Could someone explain why I am getting an error when running the following code?
for (i in 1:5) {
coef[i] <- exp(summary(model[i])[[8]][1])
}
I get the following error "object 'model' not found".
Many thanks in advance
A

Here is an example of what I meant in my comment
data(iris)
model1 <- lm(data = iris, Sepal.Length ~ Sepal.Width + Species)
model2 <- lm(data = iris, Sepal.Length ~ Sepal.Width)
You can do this so you don't have to type all the models.
model.list<-mget(grep("model[0-9]+$", ls(),value=T))
ls() lists all the objects you have, and grep() keeps those whose names end in "model" followed by a number.
coefs<-lapply(model.list,function(x)coef(x)[2])
unlist(coefs)
Sepal.Width Sepal.Width
0.8035609 -0.2233611
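Positional indexing such as [2] or [[8]][1] can silently pick the wrong term if the predictors are ordered differently across models, so one hedged variation is to pull the shared coefficient by name. The sketch below reuses the two lm fits from this answer. For Cox models the same pattern should carry over with coef(), wrapped in exp() for the hazard ratio, but that is an assumption about the asker's models and is not tested here.

```r
# Two example fits that share the predictor Sepal.Width
model1 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
model2 <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Collect the models by constructing their names explicitly
model.list <- mget(paste0("model", 1:2))

# Extract the shared coefficient by name instead of by position;
# vapply also checks that each result is a single number
shared <- vapply(model.list, function(m) coef(m)[["Sepal.Width"]], numeric(1))
shared
```

The result is a numeric vector named after the models, which is convenient for building a small results table.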

Here's a generalized example:
model1 <- 1:5
model2 <- 2:6
I can execute a function like mean to find the average of each vector with a for loop:
for(i in 1:2) print(mean(get(paste0('model', i))))
#[1] 3
#[1] 4
It works. But a more standard approach is to use a list. Then I can apply the desired function with built-in functions like sapply:
lst <- list(model1, model2)
sapply(lst, mean)
#[1] 3 4
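As for why the loop in the question fails: model[i] subsets an object literally called model; it does not build the name model1. A minimal sketch of a corrected loop, using the toy vectors above, looks up each object by its constructed name and stores the result in a preallocated vector. The name res is just an illustrative choice; it also avoids assigning into coef, which is an existing function.

```r
model1 <- 1:5
model2 <- 2:6

# Preallocate the result, then look each object up by its constructed name
res <- numeric(2)
for (i in seq_along(res)) {
  res[i] <- mean(get(paste0("model", i)))
}
res

# Same computation from a named list, with the return type checked by vapply
lst <- list(model1 = model1, model2 = model2)
vapply(lst, mean, numeric(1))
```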

Related

Regarding multiple linear models simultaneously

I am trying to run many linear regression models simultaneously. Please help me write code for this.
I am working with two data frames. The first data frame has 100 dependent variables and the second has 100 independent variables. I want simple linear models like
lm1 <- lm(data_frame_1[[1]] ~ data_frame_2[[1]])
lm2 <- lm(data_frame_1[[2]] ~ data_frame_2[[2]])
and so on. That means I have to run 100 regression models. I want to do this simultaneously. Please help me write code to run all these models simultaneously.
It is not that clear what you mean by simultaneously. But maybe doing a loop is fine in your case?
model.list = list()
for (i in 1:100){
model.list[[i]] = lm(data_frame_1[[i]] ~ data_frame_2[[i]])
}
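A self-contained version of this loop can be sketched with the built-in anscombe data standing in for the two 100-column data frames. The names data_frame_1 and data_frame_2 follow the question; using anscombe, and the temporary data frame d, are assumptions made only for illustration.

```r
# Stand-ins for the asker's data: 4 responses and 4 predictors
data_frame_1 <- anscombe[paste0("y", 1:4)]  # dependent variables
data_frame_2 <- anscombe[paste0("x", 1:4)]  # independent variables

# Preallocate the list, then fit one model per column pair
n <- ncol(data_frame_1)
model.list <- vector("list", n)
for (i in seq_len(n)) {
  d <- data.frame(y = data_frame_1[[i]], x = data_frame_2[[i]])
  model.list[[i]] <- lm(y ~ x, data = d)
}

# Pull the slope out of every fitted model
slopes <- vapply(model.list, function(m) coef(m)[["x"]], numeric(1))
slopes
```

Fitting on a small data frame with fixed column names y and x keeps the coefficient names identical across models, which makes the extraction step simpler.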
Using dataframe_1 and dataframe_2 defined in the Note at the end, we define a function LM that takes an x name and a y name and regresses y on x using the columns from those data frames. The result is a list of lm objects. Note that the Call: line in the printed output of each list component correctly identifies which columns were used.
LM <- function(xname, yname) {
fo <- formula(paste(yname, "~", xname))
do.call("lm", list(fo, quote(cbind(dataframe_1, dataframe_2))))
}
Map(LM, names(dataframe_1), names(dataframe_2))
giving:
$x1
Call:
lm(formula = y1 ~ x1, data = cbind(dataframe_1, dataframe_2))
Coefficients:
(Intercept) x1
3.0001 0.5001
... etc ...
Note
Using the built-in anscombe data frame, define dataframe_1 as the x columns and dataframe_2 as the y columns.
dataframe_1 <- anscombe[grep("x", names(anscombe))]
dataframe_2 <- anscombe[grep("y", names(anscombe))]

how to pass variable through model.matrix() in r

I'm creating a function that performs cross-validation and ridge regression to select predictors for a model. The inputs of my function are dataframe and the desired outcome variable outcome (what is being predicted). I'm using model.matrix() to create an x matrix that I will pass to glmnet(). My function uses outcome as the object argument in model.matrix(), but it looks like outcome is the wrong data type to pass through model.matrix(). Using model.matrix() normally, I would write something like model.matrix(Weight~.,dataframe). In this case, however, model.matrix won't work as model.matrix(outcome~.,dataframe) or model.matrix(dataframe$outcome~.,dataframe). Any ideas?
If 'outcome' is the object that stores the string "Weight", then we can paste with formula
model.matrix(formula(paste(outcome, "~ .")), dataframe)
A reproducible example with 'iris' dataset
data(iris)
outcome <- "Species"
m1 <- model.matrix(formula(paste(outcome, "~ .")), iris)
m2 <- model.matrix(Species ~ ., iris)
identical(m1, m2)
#[1] TRUE
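To connect this back to the function in the question, here is a hedged sketch of a helper that takes the outcome name as a string and returns the x matrix and y vector in the shape that glmnet() expects. The helper name make_xy and the use of mtcars are hypothetical; reformulate() is a base R alternative to pasting the formula together. glmnet itself is not called here.

```r
# Build predictors and response from a data frame and an outcome name
make_xy <- function(dataframe, outcome) {
  # Equivalent to formula(paste(outcome, "~ ."))
  f <- reformulate(".", response = outcome)
  # Drop the intercept column, since glmnet adds its own intercept
  x <- model.matrix(f, dataframe)[, -1, drop = FALSE]
  y <- dataframe[[outcome]]
  list(x = x, y = y)
}

xy <- make_xy(mtcars, "mpg")
dim(xy$x)
# xy$x and xy$y could then be passed on, e.g. glmnet(xy$x, xy$y, alpha = 0)
```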

predict method for felm from lfe package

Does anyone have a nice clean way to get predict behavior for felm models?
library(lfe)
model1 <- lm(data = iris, Sepal.Length ~ Sepal.Width + Species)
predict(model1, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Works
model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Does not work
UPDATE (2020-04-02): The answer from Grant below using the new package fixest provides a more parsimonious solution.
As a workaround, you could combine felm, getfe, and demeanlist as follows:
library(lfe)
lm.model <- lm(data=demeanlist(iris[, 1:2], list(iris$Species)), Sepal.Length ~ Sepal.Width)
fe <- getfe(felm(data = iris, Sepal.Length ~ Sepal.Width | Species))
predict(lm.model, newdata = data.frame(Sepal.Width = 3)) + fe$effect[fe$idx=="virginica"]
The idea is that you use demeanlist to center the variables, then lm to estimate the coefficient on Sepal.Width using the centered variables, giving you an lm object over which you can run predict. Then run felm+getfe to get the conditional mean for the fixed effect, and add that to the output of predict.
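The arithmetic behind this workaround can be checked without lfe. The sketch below is a base R stand-in, not the lfe code itself: ave() does the centering that demeanlist performs, and the group effect is recovered as the group mean of the response minus the slope times the group mean of the predictor. With a single fixed effect this should reproduce the prediction from the dummy-variable lm fit in the question.

```r
# Center the response and predictor within each Species group
y_c <- iris$Sepal.Length - ave(iris$Sepal.Length, iris$Species)
x_c <- iris$Sepal.Width - ave(iris$Sepal.Width, iris$Species)

# Slope from the centered data (no intercept needed after centering)
slope <- coef(lm(y_c ~ 0 + x_c))[["x_c"]]

# Group effect = group mean of y minus slope times group mean of x
y_bar <- tapply(iris$Sepal.Length, iris$Species, mean)
x_bar <- tapply(iris$Sepal.Width, iris$Species, mean)
group_effect <- y_bar - slope * x_bar

# Prediction for Sepal.Width = 3 in the virginica group
pred <- unname(group_effect["virginica"]) + slope * 3

# Reference: the dummy-variable lm fit from the question
ref <- predict(lm(Sepal.Length ~ Sepal.Width + Species, data = iris),
               newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
all.equal(pred, unname(ref))
```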
Late to the party, but the new fixest package has a predict method. It supports high-dimensional fixed effects (and clustering, etc.) using a very similar syntax to lfe. Somewhat remarkably, it is also considerably faster than lfe for the benchmark cases that I've tested.
library(fixest)
model_feols <- feols(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict(model_feols, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
# Works
This might not be the answer that you are looking for, but it seems that the author did not add any functionality to the lfe package in order to make predictions on external data by using the fitted felm model. The primary focus seems to be on the analysis of the group fixed effects. However, it's interesting to note that in the documentation of the package the following is mentioned:
The object has some resemblance to an 'lm' object, and some
postprocessing methods designed for lm may happen to work. It may
however be necessary to coerce the object to succeed with this.
Hence, it might be possible to coerce the felm object to an lm object in order to obtain some additional lm functionality (if all the required info is present in the object to perform the necessary computations).
The lfe package is intended to be run on very large datasets and effort was made to conserve memory: As a direct result of this, the felm object does not use/contain a qr decomposition, as opposed to the lm object. Unfortunately, the lm predict procedure relies on this information in order to compute the predictions. Hence, coercing the felm object and executing the predict method will fail:
> model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
> class(model2) <- c("lm","felm") # coerce to lm object
> predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
Error in qr.lm(object) : lm object does not have a proper 'qr' component.
Rank zero or should not have used lm(.., qr=FALSE).
If you really must use this package to perform the predictions then you could maybe write your own simplified version of this functionality by using the information that you have available in the felm object. For example, the OLS regression coefficients are available via model2$coefficients.
This should work for cases where you wish to ignore the group effects in the prediction, are predicting for new X's, and only want confidence intervals. It first looks for a clustervcv attribute, then robustvcv, then vcv.
predict.felm <- function(object, newdata, se.fit = FALSE,
interval = "none",
level = 0.95){
if(missing(newdata)){
stop("predict.felm requires newdata and predicts for all group effects = 0.")
}
tt <- terms(object)
Terms <- delete.response(tt)
attr(Terms, "intercept") <- 0
m.mat <- model.matrix(Terms, data = newdata)
m.coef <- as.numeric(object$coef)
fit <- as.vector(m.mat %*% object$coef)
fit <- data.frame(fit = fit)
if(se.fit | interval != "none"){
if(!is.null(object$clustervcv)){
vcov_mat <- object$clustervcv
} else if (!is.null(object$robustvcv)) {
vcov_mat <- object$robustvcv
} else if (!is.null(object$vcv)){
vcov_mat <- object$vcv
} else {
stop("No vcv attached to felm object.")
}
se.fit_mat <- sqrt(diag(m.mat %*% vcov_mat %*% t(m.mat)))
}
if(interval == "confidence"){
t_val <- qt((1 - level) / 2 + level, df = object$df.residual)
fit$lwr <- fit$fit - t_val * se.fit_mat
fit$upr <- fit$fit + t_val * se.fit_mat
} else if (interval == "prediction"){
stop("interval = \"prediction\" not yet implemented")
}
if(se.fit){
return(list(fit=fit, se.fit=se.fit_mat))
} else {
return(fit)
}
}
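The interval calculation inside this function can be sanity-checked on an ordinary lm fit, where predict() provides a reference. The sketch below is an assumption-laden check, not part of the answer: it uses lm and vcov() instead of a felm object, keeps the intercept, and picks two arbitrary new rows.

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
newdata <- data.frame(Sepal.Width = c(3, 3.5), Petal.Length = c(4, 5))

# Design matrix for the new rows, built from the model terms without the response
X <- model.matrix(delete.response(terms(fit)), data = newdata)

# Point predictions and their standard errors: sqrt(diag(X V X'))
est <- as.vector(X %*% coef(fit))
se <- sqrt(diag(X %*% vcov(fit) %*% t(X)))

# 95% confidence interval using the residual degrees of freedom
t_val <- qt(0.975, df = fit$df.residual)
manual <- cbind(fit = est, lwr = est - t_val * se, upr = est + t_val * se)

# Reference values from predict.lm
ref <- predict(fit, newdata, interval = "confidence")
all.equal(manual, ref, check.attributes = FALSE)
```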
To extend the answer from pbaylis, I created a slightly longwinded function that extends nicely to allow for more than one fixed effect. Note that you have to manually enter the original dataset used in the felm model. The function returns a list with two items: the vector of predictions, and a dataframe based on the new_data that includes the predictions and fixed effects as columns.
predict_felm <- function(model, data, new_data) {
require(dplyr)
# Get the names of all the variables
y <- model$lhs
x <- rownames(model$beta)
fe <- names(model$fe)
# Demean according to fixed effects
data_demeaned <- demeanlist(data[c(y, x)],
as.list(data[fe]),
na.rm = T)
# Create formula for LM and run prediction
lm_formula <- as.formula(
paste(y, "~", paste(x, collapse = "+"))
)
lm_model <- lm(lm_formula, data = data_demeaned)
lm_predict <- predict(lm_model,
newdata = new_data)
# Collect coefficients for fe
fe_coeffs <- getfe(model) %>%
select(fixed_effect = effect, fe_type = fe, idx)
# For each fixed effect, merge estimated fixed effect back into new_data
new_data_merge <- new_data
for (i in fe) {
fe_i <- fe_coeffs %>% filter(fe_type == i)
by_cols <- c("idx")
names(by_cols) <- i
new_data_merge <- left_join(new_data_merge, fe_i, by = by_cols) %>%
select(-matches("^idx"))
}
if (length(lm_predict) != nrow(new_data_merge)) stop("unmatching number of rows")
# Sum all the fixed effects
all_fixed_effects <- base::rowSums(select(new_data_merge, matches("^fixed_effect")))
# Create dataframe with predictions
new_data_predict <- new_data_merge %>%
mutate(lm_predict = lm_predict,
felm_predict = all_fixed_effects + lm_predict)
return(list(predict = new_data_predict$felm_predict,
data = new_data_predict))
}
model2 <- felm(data = iris, Sepal.Length ~ Sepal.Width | Species)
predict_felm(model = model2, data = iris, new_data = data.frame(Sepal.Width = 3, Species = "virginica"))
# Returns prediction and data frame
I think what you're looking for might be the lme4 package. I was able to get a predict to work using this:
library(lme4)
data(iris)
model2 <- lmer(data = iris, Sepal.Length ~ (Sepal.Width | Species))
predict(model2, newdata = data.frame(Sepal.Width = 3, Species = "virginica"))
1
6.610102
You may have to play around a little to specify the particular effects you're looking for, but the package is well-documented so it shouldn't be a problem.

Accessing fitted.values when using ddply

I am using ddply to execute glm on subsets of my data. I am having difficulty accessing the estimated Y values. I am able to get the model parameter estimates using the below code, but all the variations I've tried to get the fitted values have fallen short. The dependent and independent variables in the glm model are column vectors, as is the "Dmsa" variable used in the ddply operation.
Define the model:
Model <- function(df){coef(glm(Y~D+O+B+A+log(M), family=poisson(link="log"), data=df))}
Execute the model on subsets:
Modrpt <- ddply(msadata, "Dmsa", Model)
Printing Modrpt gives the model coefficients, but no Y estimates.
I know that if I wasn't using ddply, I can access the glm estimated Y values by using the code:
Model <- glm(Y~D+O+B+A+log(M), family=poisson(link="log"), data=msadata)
fits <- Model$fitted.values
I have tried both of the following to get the fitted values for the subsets, but no luck:
fits <- fitted.values(ddply(msadata, "Dmsa", Model))
fits <- ddply(msadata, "Dmsa", fitted.values(Model))
I'm sure this is very easy to code... unfortunately, I'm just learning R. Does anyone know where I am going wrong?
You can use an anonymous function in your call to ddply e.g.
require(plyr)
data(iris)
model <- function(df){
lm( Petal.Length ~ Sepal.Length + Sepal.Width , data = df )
}
ddply( iris , "Species" , function(x) fitted.values( model(x) ) )
This has the advantage that you can also, without rewriting your model function, get the coef values by doing
ddply( iris , "Species" , function(x) coef( model(x) ) )
As @James points out, this will fall down if you have splits of unequal size; it is better to use dlply, which puts the result of each subset in its own list element.
(I make no claims for statistical relevance or correctness of the example model - it is just an example)
I'd recommend doing this in two steps:
library(plyr)
# First fit the models
models <- dlply(iris, "Species", lm,
formula = Petal.Length ~ Sepal.Length + Sepal.Width )
# Next, extract the fitted values
ldply(models, fitted.values)
# Or maybe
ldply(models, function(m) data.frame(fitted = fitted.values(m)))
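The same two-step idea can be sketched in base R, with split() and lapply() standing in for dlply. This is a substitute technique rather than the plyr code above; it keeps the fitted values in a list, so groups of unequal size are not a problem.

```r
# Step 1: fit one model per Species
models <- lapply(split(iris, iris$Species), function(df) {
  lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = df)
})

# Step 2a: coefficients as a matrix, one row per group
coefs <- t(sapply(models, coef))
coefs

# Step 2b: fitted values as a list, one vector per group
fits <- lapply(models, fitted)
lengths(fits)
```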

Creating variable names from varying lists

I am trying to create variable names from lists in R, but am struggling!
What I would ultimately like to do is to use previously created lists to create a formula for a multiple linear regression, whereby each value within the list will identify one of the explanatory variables of the regression formula.
I am starting with x lists of variable lengths (GoodModels_LMi, where i goes from 1
to x) and use each list to create a separate formula.
for (i in 1:x){
lm(formula created from appropriate list)
i<-i+1
}
The lists correspond to variable numbers to be chosen from a data matrix (AllData). So for
example if:
GoodModels_LM1<-c(2,4,8)
I would like my regression formula to be:
AllData[,1]~AllData[,2]+AllData[,4]+AllData[,8]
I have been trying to use as.formula() and paste() to achieve this, however, I am not sure how to create the second part of my formula.
as.formula(paste("AllData[,",i,"]~",paste(?????????)))
I know that this below is not right, but is as close as I have come:
paste("AllData[,",paste("GoodModels_LM",i,sep=""),"]",collapse="+")
I have also looked into assign(), but have not succeeded as the value argument was the same as the x argument.
Thanks very much for any help with this!
Olivia
Your formula should contain the column names, not the actual data. Here is a small demo using iris.
Imagine you want to run a regression using columns 2, 4, and 5 from iris. First, construct a formula using paste():
vars <- c(2, 4, 5)
frm <- paste("Sepal.Length ~ ", paste(names(iris)[vars], collapse=" + "))
frm
"Sepal.Length ~ Sepal.Width + Petal.Width + Species"
So, the object frm is a string containing a formula that you can pass to lm():
lm(frm, iris)
Call:
lm(formula = frm, data = iris)
Coefficients:
(Intercept) Sepal.Width Petal.Width
2.5211 0.6982 0.3716
Speciesversicolor Speciesvirginica
0.9881 1.2376
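To cover the loop over several index lists from the question, here is a minimal sketch under stated assumptions: AllData is taken to be a data frame (a matrix could be converted with as.data.frame), the response is its first column, and the index vectors are kept in one list called GoodModels rather than in separate GoodModels_LMi objects. The numeric columns of iris are a stand-in for the real data. reformulate() is a base R alternative to paste() that returns a formula directly.

```r
# Stand-in for the asker's data; column 1 is the response
AllData <- iris[1:4]

# Hypothetical index lists, one per model
GoodModels <- list(c(2, 4), c(2, 3, 4))

# Build one formula per index list from the column names and fit it
fits <- lapply(GoodModels, function(idx) {
  f <- reformulate(names(AllData)[idx], response = names(AllData)[1])
  lm(f, data = AllData)
})

# Inspect the formulas that were actually used
lapply(fits, formula)
```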
