I found this way of looping over variables in an lm() when the variable names are stored as characters (http://www.ats.ucla.edu/stat/r/pages/looping_strings.htm):
models <- lapply(varlist, function(x) {
lm(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
})
My first question is: Is there a more efficient/faster way?
What if I want to loop over different data instead of looping over variables?
Example:
reg1 <- lm(a~b, data=dataset1)
reg2 <- lm(a~b, data=dataset2)
Can I apply something similar to the code shown above? Using the substitute function for the data did not work.
Thank You!
The substitute in your example is used to construct the formula. If you want to apply lm to a number of data.frames, use:
lapply(list(dataset1, dataset2), lm, formula = a ~ b)
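As a minimal sketch of the same idea on built-in data: the two halves of mtcars below are only stand-ins for dataset1 and dataset2, and the list names first and second are my own assumption. Naming the list elements keeps the fitted models easy to tell apart afterwards.

```r
# Two stand-in data sets with identical columns (hypothetical split of mtcars)
datasets <- list(first = mtcars[1:16, ], second = mtcars[17:32, ])

# formula is passed by name, so each data frame is matched to lm()'s `data` argument
fits <- lapply(datasets, lm, formula = mpg ~ wt)

# One column of coefficients per data set
coef_matrix <- sapply(fits, coef)
print(coef_matrix)
```

Each element of fits is an ordinary lm object, so summary(fits$first) and similar calls work as usual.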
I am trying to create a simple linear model in R inside a for loop, where one of the variables is specified as a parameter and looped through, creating a different model on each pass of the loop. The following does NOT work:
model <- lm(test_par[i] ~ weeks, data=all_data_plant)
If I tried the same model with the "test_par[i]" replaced with the variable's explicit name, it works just as expected:
model <- lm(weight_dry ~ weeks, data=all_data_plant)
I tried reformulate and paste ineffectively. Any thoughts?
Maybe try something like this:
n <- 2  # placeholder: set this to the column position of the first variable
m <- 5  # placeholder: set this to the column position of the last variable
lm_models <- lapply(n:m, function(x) lm(all_data_plant[,x] ~ weeks, data=all_data_plant))
You can pass the "formula" argument of lm() as a character string built with paste(). Here is a working example:
data("trees")
test_par <- names(trees)
model <- lm(Girth ~ Height, data = trees)
model <- lm("Girth ~ Height", data = trees) # character formula works
model <- lm(paste(test_par[1], "~ Height"), data=trees)
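The original test_par[i] ~ weeks fails because the left-hand side is the expression test_par[i], which evaluates to a character string rather than to the column's values. A small sketch of the loop with reformulate(), again on the trees data; the choice of Girth and Volume as responses is only an assumption for illustration:

```r
data("trees")

# Hypothetical set of response variables to loop over
responses <- c("Girth", "Volume")

models <- list()
for (resp in responses) {
  # Builds e.g. Girth ~ Height from the character name
  f <- reformulate("Height", response = resp)
  models[[resp]] <- lm(f, data = trees)
}

# Show the formula each model was fitted with
print(lapply(models, formula))
```

Storing the models in a named list means each fit can be retrieved by its response name, for example models$Volume.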
I am trying to create multiple linear regression models from a list of variable combinations (I also have them separately as a data-frame if that is more useful!)
The list of variables looks like this:
Vars
x1+x2+x3
x1+x2+x4
x1+x2+x5
x1+x2+x6
x1+x2+x7
The loop I'm using looks like this:
for (i in 1:length(var_list)){
lm(independent_variable ~ var_list[i],data = training_data)
i+1
}
However, lm() does not recognize the string in var_list[i] (x1+x2+x3, etc.) as model input.
Does anyone know how to fix it?
Thanks for your help.
You don't even have to use loops. lapply should work nicely.
training_data <- as.data.frame(matrix(sample(1:64), nrow = 8))
colnames(training_data) <- c("independent_variable", paste0("x", 1:7))
Vars <- as.list(c("x1+x2+x3",
"x1+x2+x4",
"x1+x2+x5",
"x1+x2+x6",
"x1+x2+x7"))
allModelsList <- lapply(paste("independent_variable ~", Vars), as.formula)
allModelsResults <- lapply(allModelsList, function(x) lm(x, data = training_data))
If you need the model summaries, you can add:
allModelsSummaries = lapply(allModelsResults, summary)
For example, you can access the R² of the model lm(independent_variable ~ x1+x2+x3) like this:
allModelsSummaries[[1]]$r.squared
I hope it helps.
We can create the formula with paste
out <- vector('list', length(var_list))
for (i in seq_along(var_list)){
out[[i]] <- lm(paste('independent_variable', '~', var_list[i]),
data = training_data)
}
Or otherwise, it can be done with reformulate
lm(reformulate(var_list[i], 'independent_variable'), data = training_data)
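A self-contained sketch of the reformulate route, collecting R² for every combination. It uses mtcars so it can be run as is; the predictor combinations below are stand-ins for x1+x2+x3 and so on, and mpg plays the role of independent_variable.

```r
# Stand-in predictor combinations (assumed for illustration)
var_list <- c("wt+hp", "wt+qsec", "wt+drat")

# reformulate() parses each string into the right-hand side of the formula
fits <- lapply(var_list, function(rhs) {
  lm(reformulate(rhs, response = "mpg"), data = mtcars)
})
names(fits) <- var_list

# One R-squared value per model, as a named numeric vector
r2 <- vapply(fits, function(fit) summary(fit)$r.squared, numeric(1))
print(round(r2, 3))
```

vapply is used instead of sapply so that a model failing to return a single number would raise an error rather than silently change the result's shape.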
I have a dataframe with many variables. I want to fit a linear regression that explains the last one using the others. Since that was too much to type, I thought about creating a string with the independent variables, e.g. Var1 + Var2 +...+ VarK. I achieved it by pasting "+" onto all column names except the last one with this code:
ExVar <- toString(paste(names(datos)[1:11], "+ ", collapse = ''))
I also had to remove the last "+":
ExVar <- substr(ExVar, 1, nchar(ExVar)-2)
So I copied and pasted the ExVar string within the lm() function and the result looked like this:
m1 <- lm(calidad ~ Var1 + Var2 +...+ VarK)
The question is: Is there any way to use "ExVar" within the lm() function as a string, not as a variable, to have a cleaner code?
For better understanding:
If I use this code:
m1 <- lm(calidad ~ ExVar)
It interprets ExVar as an independent variable.
The following will all produce the same results. I am providing multiple methods because there are simpler ways of doing what you are asking (see examples 2 and 3) than writing the expression as a string.
First, I will generate some example data:
n <- 100
p <- 11
dat <- array(rnorm(n*p),c(n,p))
dat <- as.data.frame(dat)
colnames(dat) <- paste0("X",1:p)
If you really want to specify the model as a string, this example code will help:
ExVar <- toString(paste(names(dat[2:11]), "+ ", collapse = ''))
ExVar <- substr(ExVar, 1, nchar(ExVar)-3)
model1 <- paste("X1 ~ ",ExVar)
fit1 <- lm(eval(parse(text = model1)),data = dat)
Otherwise, note that the 'dot' notation will specify all other variables in the model as predictors.
fit2 <- lm(X1 ~ ., data = dat)
Or, you can select the predictors and outcome variables by column, if your data is structured as a matrix.
dat <- as.matrix(dat)
fit3 <- lm(dat[,1] ~ dat[,-1])
All three of these fit objects have the same estimates:
fit1
fit2
fit3
If you have a dataframe and you want to explain the last column using all the rest, you can use the code below:
lm(calidad~.,dat)
or you can use
lm(rev(dat))  # only if the last column is your response variable
Either of the two above will give you the results needed.
To do it your way:
EXV=as.formula(paste0("calidad~",paste0(names(dat)[-12],collapse = '+')))
lm(EXV,dat)
There is no need to do it this way, since lm() does this itself when you use the dot notation from the first snippet above.
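To check that the string-built formula and the dot notation agree, here is a small sketch on simulated data. The column names Var1 to Var3 and the four-column layout are assumptions made only for the example.

```r
set.seed(42)

# Simulated stand-in for the real data: three predictors and the response last
dat <- as.data.frame(matrix(rnorm(60), ncol = 4))
names(dat) <- c("Var1", "Var2", "Var3", "calidad")

# Build calidad ~ Var1 + Var2 + Var3 from the column names
predictors <- setdiff(names(dat), "calidad")
fit_string <- lm(reformulate(predictors, response = "calidad"), data = dat)

# Same model via the dot notation
fit_dot <- lm(calidad ~ ., data = dat)

print(all.equal(coef(fit_string), coef(fit_dot)))
```

Using setdiff on the names avoids hard-coding the position of the response column.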
Built-in functions in R can be used in formula objects, for example
reg1 = lm(y ~ log(x), data = data1)
How can I write my functions such that they can be used in formula objects?
fnMyFun = function(x) {
return(x^2)
}
reg2 = lm(y ~ fnMyFun(x), data = data1)
What you've got certainly works. One problem is that different modelling functions handle formulas in different ways. I think that as long as you return something that model.matrix can make sense of, you'll be fine. That would mean:
The function is vectorised, i.e. given a vector of length N, it returns a result that is also of length N.
It has to return an atomic vector or matrix (not a list, and not of type raw).
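A minimal sketch of a user-defined function inside a formula. The helper square and the simulated data1 are hypothetical and only serve to make the example runnable.

```r
# Hypothetical vectorised helper: a length-N input gives a length-N output
square <- function(x) x^2

set.seed(1)
data1 <- data.frame(x = 1:20)
data1$y <- 3 + 2 * square(data1$x) + rnorm(20)

# The function call is evaluated when the model frame is built
reg2 <- lm(y ~ square(x), data = data1)
print(coef(reg2))

# predict() re-applies square() to the new x values
print(predict(reg2, newdata = data.frame(x = c(21, 22))))
```

The coefficient is labelled with the call itself, square(x), which is how the term appears in the model output.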
I'm having some difficulty figuring out how to approach this problem. I have a data frame that I am splitting into distinct sites (link5). Once it is split, I basically want to run a linear regression model on each subset. Here is the code I'm working with, but it's definitely not correct. Also, it would be great if I could output the model results to a new data frame such that each site has one row with the model parameter estimates; that is just a wish and not a necessity right now. Thank you for any help!
les_events <- split(les, les$link5)
result <- lapply(les_events) {
lm1 <-lm(cpe~K,data=les_events)
coef <- coef(lm1)
q.hat <- -coef(lm1)[2]
les_events$N0.hat <- coef(lm1[1]/q.hat)
}
You have a number of issues:
You haven't passed a function (the FUN argument) to lapply.
The bit inside {} is almost, but not quite, the body you want for that function.
Something like the following will return the coefficients from your models:
result <- lapply(les_events, function(DD){
lm1 <-lm(cpe~K,data=DD)
coef <- coef(lm1)
data.frame(as.list(coef))
})
This will return a list of data.frames containing columns for each coefficient.
lapply(les_events, lm, formula = 'cpe~K')
will return a list of linear model objects, which may be more useful.
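For the wish of one row per site, a base R sketch that binds the per-group coefficients into a single data frame. It uses mtcars split by cyl as a stand-in for les split by link5, and the column names group, intercept and slope are my own choice.

```r
# Stand-in for split(les, les$link5)
groups <- split(mtcars, mtcars$cyl)

# Fit one model per group and keep its coefficients as a one-row data frame
rows <- lapply(names(groups), function(g) {
  cf <- coef(lm(mpg ~ wt, data = groups[[g]]))
  data.frame(group = g, intercept = cf[[1]], slope = cf[[2]])
})

# Stack the one-row data frames: one row per group
coef_table <- do.call(rbind, rows)
print(coef_table)
```

Naming the columns explicitly sidesteps the automatic renaming of "(Intercept)" that data.frame() would otherwise apply.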
For a more general split / apply / combine approach, use plyr or data.table.
data.table
library(data.table)
DT <- data.table(les)
# Fit the model within each link5 group and return its coefficients as columns
result <- DT[, {lm1 <- lm(cpe ~ K, data = .SD)
as.list(coef(lm1))}, by = link5]
plyr
library(plyr)
result <- ddply(les, .(link5), function(DD){
lm1 <-lm(cpe~K,data=DD)
coef <- coef(lm1)
data.frame(as.list(coef))
})
# or to return a list of linear model objects
dlply(les, .(link5), function(DD){ lm(cpe ~ K, data = DD)})