Is there a way to iterate over a formula in R?
Let's say we have a formula given as: as.formula(hp ~ factor(gear) + qsec + am)
I need to iterate over the elements of the formula so that I can create 3 models (3 because we use 3 regressors, not counting dummies).
The first model should be as.formula(hp ~ factor(gear)), the second as.formula(hp ~ factor(gear) + qsec), and the last as.formula(hp ~ factor(gear) + qsec + am).
Can we somehow use just one regressor in the first iteration, then two, and then three?
I need to automate this inside a function, so a "by hand" approach won't work.
My approach: loop over how many elements to include; in each iteration, build a string with sprintf and paste (using the collapse option) and coerce it to a formula.
elements <- c("factor(gear)", "qsec", "am")
for (i in seq_along(elements)) {
  fmla <- as.formula(sprintf("hp ~ %s", paste(elements[1:i], collapse = " + ")))
  print(fmla)
  print(summary(lm(fmla, data = mtcars)))
}
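If you want to keep the fitted models for later rather than only printing them, a small variation is to collect them in a list. This sketch swaps sprintf/paste for base R's reformulate(), which builds the formula from the term labels and the response name; the name `models` is only illustrative.

```r
elements <- c("factor(gear)", "qsec", "am")

# Fit hp ~ (first i terms) for i = 1, 2, 3 and keep every fit in a list
models <- lapply(seq_along(elements), function(i) {
  lm(reformulate(elements[1:i], response = "hp"), data = mtcars)
})

# Show the formula behind each fit as a plain string
sapply(models, function(m) deparse(formula(m)))
```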
If you need to parse the formula you were given, you could do something like this before running the loop above (it may need to be modified for your specific setup):
library(stringr)
input_fmla <- "as.formula(hp ~ factor(gear) + qsec + am)"
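# strip the leading "as.formula(<response> ~ " and the trailing ")"
temp <- str_remove_all(input_fmla, "(as.formula\\([^ ]* ~ |\\)$)")
# split the remaining right-hand side on "+" and trim the surrounding spaces
elements <- trimws(str_split(temp, pattern = "\\+")[[1]])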
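If you have the formula object itself rather than a string, the regular expression can be skipped: terms() already knows the right-hand-side terms. A minimal sketch, where `nested_formulas` is a hypothetical helper name; keep.order = TRUE stops terms() from moving interaction terms behind the main effects. (For the string in input_fmla, eval(parse(text = input_fmla)) would turn it into a formula first.)

```r
# Hypothetical helper: y ~ a + b + c  ->  list(y ~ a, y ~ a + b, y ~ a + b + c)
nested_formulas <- function(f) {
  labels <- attr(terms(f, keep.order = TRUE), "term.labels")  # RHS terms as written
  response <- deparse(f[[2]])                                  # LHS, e.g. "hp"
  lapply(seq_along(labels), function(i) reformulate(labels[1:i], response = response))
}

fs <- nested_formulas(hp ~ factor(gear) + qsec + am)
sapply(fs, deparse)
```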
Related
How do I calculate relative importance with the relaimpo package in R when I want to run it for several groups? For example, in the mtcars data frame I want to calculate the relative importance of several variables on mpg for each value of cyl. I calculated the relative importance of the variables on mpg overall, but I don't know how to do it per group. I tried inserting group_by(cyl), but that didn't work. How would I do that in R?
library(relaimpo)
df <- mtcars
model <- lm(mpg ~ disp + hp + drat + wt, data=df)
rel_importance = calc.relimp(model, type = "lmg", rela=TRUE)
rel_importance
I'm not familiar with this package, but in general, if you want to apply a function by group in R, you can split the data frame into a list with one data frame per group and then apply the function to each element of the list.
In this case:
cyl_list <- split(df, df$cyl)
rel_importance_cyl <- lapply(
  cyl_list,
  # \(df) is the shorthand anonymous-function syntax, available since R 4.1.0
  \(df) {
    model <- lm(mpg ~ disp + hp + drat + wt, data = df)
    calc.relimp(model, type = "lmg", rela = TRUE)
  }
)
names(rel_importance_cyl) # "4" "6" "8"
You can access this list either by name (e.g. rel_importance_cyl[["4"]]) or by index (e.g. rel_importance_cyl[[1]]), to see the values for each group.
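If you would rather see all groups side by side than browse a list, sapply() can simplify the per-group results into a matrix. The sketch below shows the same split-apply-combine pattern with coef(lm(...)) standing in for calc.relimp(), so it runs without relaimpo. For the relaimpo results, I believe the shares live in the object's lmg slot, so something like sapply(rel_importance_cyl, \(r) r@lmg) should give the analogous table (not verified here).

```r
# One data frame per cylinder count
by_cyl <- split(mtcars, mtcars$cyl)

# Fit the same model in every group; sapply() binds the coefficient vectors
# into a matrix with one column per group ("4", "6", "8")
coef_by_cyl <- sapply(by_cyl, function(d) coef(lm(mpg ~ wt, data = d)))
coef_by_cyl
```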
I am trying to iterate over multiple columns for a glm function in R.
view(mtcars)
names <- names(mtcars[-c(1,2)])
for(i in 1:length(names)){
print(paste0("Starting iterations for ",names[i]))
model <- glm(mpg ~ cyl + paste0(names[i]), data=mtcars, family = gaussian())
summary(model)
print(paste0("Iterations for ",names[i], " finished"))
}
However, I am getting the following error:
[1] "Starting iterations for disp"
Error in model.frame.default(formula = mpg ~ cyl + paste0(names[i]), data = mtcars, :
variable lengths differ (found for 'paste0(names[i])')
I'm not sure how to correct this.
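mpg ~ cyl + paste0(names[i]) (or even mpg ~ cyl + names[i]) does not do what you want in a formula: the term is evaluated as a single character string rather than looked up as a column of mtcars, which is why model.frame complains that the variable lengths differ. Use
reformulate(c("cyl", names[i]), "mpg")
instead, which builds the formula mpg ~ cyl + <column> from the variable names.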
Since you need to build your model formula dynamically from a string, you need as.formula. Alternatively, consider reformulate, which takes the right-hand-side variable names and the response name:
...
fml <- reformulate(c("cyl", names[i]), "mpg")
model <- glm(fml, data=mtcars, family = gaussian())
summary(model)
...
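Putting the pieces together, here is one runnable version of the loop that keeps the fitted models in a named list instead of discarding them on each iteration. It is a sketch: `predictors` and `models` are illustrative names, and setdiff() simply drops the response and the covariate that every model shares.

```r
# Every column except the response (mpg) and the shared covariate (cyl)
predictors <- setdiff(names(mtcars), c("mpg", "cyl"))

models <- lapply(predictors, function(p) {
  fml <- reformulate(c("cyl", p), response = "mpg")  # e.g. mpg ~ cyl + disp
  glm(fml, data = mtcars, family = gaussian())
})
names(models) <- predictors

summary(models$disp)
```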
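glm takes a formula, which you can create from a string with as.formula(). Also note that inside a for loop, summary(model) is not printed automatically, so wrap it in print():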
predictors <- names(mtcars[-c(1,2)])
for(predictor in predictors){
print(paste0("Starting iterations for ",predictor))
model <- glm(as.formula(paste0("mpg ~ cyl + ",predictor)),
data=mtcars,
family = gaussian())
print(summary(model))
print(paste0("Iterations for ",predictor, " finished"))
}
This question already has answers here:
Creating a loop through a list of variables for an LM model in R
(2 answers)
Closed 2 years ago.
Just as the title.
Here is a simple example.
If I want to explore the relationship between vs (dependent variable) and each of mpg, cyl, and disp (independent variables) separately, I can write:
library(tidyverse)
mtcars <- as_tibble(mtcars) %>%
mutate(mpg10 = mpg*10, cyl10 = cyl*10, disp10 = disp*10)
x = c('mpg', 'cyl', 'disp')
# y ~ x style
models <- map(x, ~ lm(substitute(vs ~ i, list(i = as.name(.))), data = mtcars))
Now I want to go a step further: if mpg is in the model, mpg10 should be included too; if cyl is in the model, cyl10 should be included too; and so on. Like this:
# y ~ x1 + x2 style
model1 <- lm(vs ~ mpg + mpg10, data = mtcars)
model2 <- lm(vs ~ cyl + cyl10, data = mtcars)
model3 <- lm(vs ~ disp + disp10, data = mtcars)
I don't know how to do this with the map() function or a for loop.
Any help will be highly appreciated!
You can use grep to find all the column names that contain each variable name, and reformulate to build the formula to pass to lm.
purrr::map(x, ~lm(reformulate(grep(.x, names(mtcars), value = TRUE),
'vs'), data = mtcars))
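One caveat: grep() does pattern matching, so any column whose name merely contains the pattern would be pulled in too (a hypothetical column mpg_log would end up in the mpg model). If the paired columns always follow the name + "10" convention from the question, building the exact names with paste0() avoids that. A base-R sketch, with transform() standing in for the mutate() call:

```r
cars <- transform(mtcars, mpg10 = mpg * 10, cyl10 = cyl * 10, disp10 = disp * 10)
x <- c("mpg", "cyl", "disp")

# For each base name v, use exactly v and paste0(v, "10") as predictors
models <- lapply(x, function(v) {
  lm(reformulate(paste0(v, c("", "10")), response = "vs"), data = cars)
})
names(models) <- x
```

In this toy data each "10" column is an exact multiple of its base column, so lm() reports NA for the second coefficient; with genuinely different paired variables that will not happen.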
I would like to create a new formula in R based on another formula; the only difference should be one additional term:
For example, I have:
formula = as.formula(price ~ speed + hp + mpg)
formula2 = as.formula(paste0(format(formula), "+ factor(DEPARTMENT)-1"))
However, the code is not working. The result I want is:
formula2 = price ~ speed + hp + mpg + factor(DEPARTMENT) - 1
?update.formula seems to be exactly what you are looking for!
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/update.formula.html
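A minimal sketch of update() on the formula from the question: the dot on each side stands for "what was already there", so . ~ . + factor(DEPARTMENT) - 1 keeps the response and the existing terms, adds the factor, and drops the intercept. If the extra term only arrives as a string, it can be pasted into the update template first (shown with a hypothetical `extra` variable).

```r
f <- price ~ speed + hp + mpg

# Keep the response and existing terms, add the factor, drop the intercept
f2 <- update(f, . ~ . + factor(DEPARTMENT) - 1)
f2

# Same result when the new term is only known as a string
extra <- "factor(DEPARTMENT)"
f3 <- update(f, as.formula(paste(". ~ . +", extra, "- 1")))
```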
I have a data frame Data_Group_7_8 and would like to fit a linear regression based on a factor analysis.
The factor analysis grouped columns 1:4 as MR1 and columns 16:20 as MR2. I want to use columns 1:4 as the independent variables and 16:20 as the dependent variables, and tried the following code:
mdl <- lm(select(1:4) ~ select(16:20), data=Data_Group_7_8)
summary(mdl)
Unfortunately, that doesn't work, but the following does:
df2 <- data.frame(x=Data_Group_7_8 %>% select(1:4),y=Data_Group_7_8 %>% select(16:20))
lrm <- lm(x.Themenwelt_1+ x.Themenwelt_2+ x.Themenwelt_3+ x.Product_demonstration ~ y.Inspired_by_1+ y.Inspired_by_2+ y.Inspired_by_3+ y.Inspired_by_4+ y.Inspired_by_5, data=df2)
summary(lrm)
Is there a way to select the variables (Themenwelt_1, etc.) directly from the original Data_Group_7_8 (as I tried in the first snippet) instead of writing them all out from a new data frame? I have to run 60 different analyses on this data frame.
R allows you to build a formula from a string using as.formula(str). Each side needs its terms joined with +, and the LHS and RHS are separated by a tilde. You can get the column names with names(); then it is just a matter of pasting them together: first collapse each side's character vector into a single string with collapse = ' + ', then join the two sides with ' ~ '. Here is an example with the built-in mtcars dataset:
library(dplyr)

regFormula <- function(dat, range1, range2){
  dat %>%
    select(all_of(range1)) %>%      # columns for the left-hand side
    names() %>%
    paste(collapse = ' + ') %>%
    paste(dat %>%
            select(all_of(range2)) %>%  # columns for the right-hand side
            names() %>%
            paste(collapse = ' + '),
          sep = ' ~ ') %>%
    as.formula()
}
regFormula(mtcars,1:3,4:5)
# mpg + cyl + disp ~ hp + drat
# <environment: 0x000000000cf55c90>
You can use this directly as the formula in your linear model.
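A caveat worth checking before running the 60 analyses: on the left-hand side of a model formula, + is ordinary arithmetic, so lm(a + b ~ x) regresses the single summed variable a + b on x (which is also what the working lm call in the question does). If the goal is instead one regression per dependent variable with shared predictors, wrapping the left-hand columns in cbind() gives a multivariate ("mlm") fit. A base-R sketch, with `reg_formula_mv` as a hypothetical helper name:

```r
# Hypothetical helper: cbind(y1, y2, ...) ~ x1 + x2 + ... from column positions
reg_formula_mv <- function(dat, lhs_cols, rhs_cols) {
  lhs <- paste0("cbind(", paste(names(dat)[lhs_cols], collapse = ", "), ")")
  rhs <- paste(names(dat)[rhs_cols], collapse = " + ")
  as.formula(paste(lhs, "~", rhs))
}

f <- reg_formula_mv(mtcars, 1:2, 4:5)  # cbind(mpg, cyl) ~ hp + drat
fit <- lm(f, data = mtcars)
coef(fit)  # one column of coefficients per response
```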