Using lapply on a list of models - r

I have generated a list of models, and would like to create a summary table.
As an example, here are two models:
x <- seq(1:10)
y <- sin(x)^2
model1 <- lm(y ~ x)
model2 <- lm(y ~ x + I(x^2) + I(x^3))
and two functions, the first generating the equation from the components of the formula
get.model.equation <- function(x) {
x <- as.character((x$call)$formula)
x <- paste(x[2],x[1],x[3])
}
and the second generating the name of the model as a string
get.model.name <- function(x) {
x <- deparse(substitute(x))
}
With these, I create a summary table
model.list <- list(model1, model2)
AIC.data <- lapply(X = model.list, FUN = AIC)
AIC.data <- as.numeric(AIC.data)
model.models <- lapply(X = model.list, FUN = get.model.equation)
model.summary <- cbind(model.models, AIC.data)
model.summary <- as.data.frame(model.summary)
names(model.summary) <- c("Model", "AIC")
model.summary$AIC <- unlist(model.summary$AIC)
rm(AIC.data)
model.summary[order(model.summary$AIC),]
Which all works fine.
I'd like to add the model name to the table using get.model.name
x <- get.model.name(model1)
Which gives me "model1" as I want.
So now I apply the function to the list of models
model.names <- lapply(X = model.list, FUN = get.model.name)
but now instead of model1 I get X[[1L]]
How do I get model1 rather than X[[1L]]?
I'm after a table that looks like this:
Model   Formula                    AIC
model1  y ~ x                      11.89136
model2  y ~ x + I(x^2) + I(x^3)    15.03888

Do you want something like this?
model.list <- list(model1 = lm(y ~ x),
model2 = lm(y ~ x + I(x^2) + I(x^3)))
sapply(X = model.list, FUN = AIC)
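Because `model.list` is named, the names can supply the Model column directly, which avoids `deparse(substitute())` altogether. A minimal base-R sketch of the full table, assuming the `x` and `y` from the question and using `deparse(formula(m))` in place of the question's `get.model.equation`:

```r
# Data from the question
x <- seq(1:10)
y <- sin(x)^2

# Naming the list elements is what carries the model names through
model.list <- list(model1 = lm(y ~ x),
                   model2 = lm(y ~ x + I(x^2) + I(x^3)))

model.summary <- data.frame(
  Model   = names(model.list),
  # width.cutoff + collapse guard against long formulas being split
  Formula = vapply(model.list,
                   function(m) paste(deparse(formula(m), width.cutoff = 500L), collapse = " "),
                   character(1)),
  AIC     = vapply(model.list, AIC, numeric(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Sort by AIC, lowest first
model.summary[order(model.summary$AIC), ]
```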

I'd do something like this:
model.list <- list(model1 = lm(y ~ x),
model2 = lm(y ~ x + I(x^2) + I(x^3)))
# changed Reduce('rbind', ...) to do.call(rbind, ...) (Hadley's comment)
do.call(rbind,
lapply(names(model.list), function(x)
data.frame(model = x,
formula = get.model.equation(model.list[[x]]),
AIC = AIC(model.list[[x]])
)
)
)
# model formula AIC
# 1 model1 y ~ x 11.89136
# 2 model2 y ~ x + I(x^2) + I(x^3) 15.03888

Another option, with ldply (but see Hadley's comment below for a more efficient use of ldply):
# prepare data
x <- seq(1:10)
y <- sin(x)^2
dat <- data.frame(x,y)
# create a list of named models; these are obviously not suited to the data here, they just make the workflow work
models <- list(model1=lm(y~x, data = dat),
model2=lm(y~I(1/x), data=dat),
model3=lm(y ~ log(x), data = dat),
model4=nls(y ~ I(1/x*a) + b*x, data = dat, start = list(a = 1, b = 1)),
model5=nls(y ~ (a + b*log(x)), data=dat, start = setNames(coef(lm(y ~ log(x), data=dat)), c("a", "b"))),
model6=nls(y ~ I(exp(1)^(a + b * x)), data=dat, start = list(a=0,b=0)),
model7=nls(y ~ I(1/x*a)+b, data=dat, start = list(a=1,b=1))
)
library(plyr)
library(AICcmodavg) # for small sample sizes
# build table with model names, function, AIC and AICc
data.frame(cbind(ldply(models, function(x) cbind(AICc = AICc(x), AIC = AIC(x))),
model = sapply(1:length(models), function(x) deparse(formula(models[[x]])))
))
.id AICc AIC model
1 model1 15.89136 11.89136 y ~ x
2 model2 15.78480 11.78480 y ~ I(1/x)
3 model3 15.80406 11.80406 y ~ log(x)
4 model4 16.62157 12.62157 y ~ I(1/x * a) + b * x
5 model5 15.80406 11.80406 y ~ (a + b * log(x))
6 model6 15.88937 11.88937 y ~ I(exp(1)^(a + b * x))
7 model7 15.78480 11.78480 y ~ I(1/x * a) + b
It's not immediately obvious to me how to replace the .id with a column name in the ldply function, any tips?
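One way to sidestep the `.id` column is to build the table in base R, where the column name is chosen directly. (Recent plyr versions also document an `.id` argument for `ldply()`, e.g. `.id = "model"`; check `?ldply` for your installed version.) The sketch below uses a subset of the models above and computes AICc by hand with the usual small-sample correction, AICc = AIC + 2k(k + 1)/(n - k - 1), where k is the parameter count reported by `logLik()` (including the residual variance). The helper `aicc` is a hypothetical stand-in for `AICcmodavg::AICc`; for these models it gives the AICc - AIC gap of 4 seen in the table, but it is worth cross-checking on other model classes.

```r
# Data and a few of the models from above (nls is part of base stats)
x <- seq(1:10)
y <- sin(x)^2
dat <- data.frame(x, y)
models <- list(model1 = lm(y ~ x, data = dat),
               model2 = lm(y ~ I(1/x), data = dat),
               model7 = nls(y ~ I(1/x*a) + b, data = dat, start = list(a = 1, b = 1)))

# Hypothetical helper: small-sample corrected AIC; k counts every
# estimated parameter, including the residual variance, via logLik()
aicc <- function(m) {
  k <- attr(logLik(m), "df")
  n <- nobs(m)
  AIC(m) + 2 * k * (k + 1) / (n - k - 1)
}

# The model names become an ordinary, explicitly named column
tab <- data.frame(
  model   = names(models),
  formula = vapply(models, function(m) paste(deparse(formula(m)), collapse = " "), character(1)),
  AICc    = vapply(models, aicc, numeric(1)),
  AIC     = vapply(models, AIC, numeric(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
tab
```

If the `ldply` result is kept instead, renaming afterwards also works: `names(res)[names(res) == ".id"] <- "model"`.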

Related

How to set all coefficients to one in model?

To fix a certain coefficient in a regression to one, we can use the offset function.
I want to set all coefficients to 1.
Let's take this example:
set.seed(42)
y <- rnorm(100)
df <- data.frame("Uni" = runif(100), "Exp" = rexp(100), "Wei" = rweibull(100, 1))
lm(y ~ offset(Uni) + offset(Exp) + offset(Wei), data = df)
Call:
lm(formula = y ~ offset(Uni) + offset(Exp) + offset(Wei), data = df)
Coefficients:
(Intercept)
-2.712
This code works, but what if I have a large number of variables, e.g. 800, and I want to do this for all of them? Writing out all their names would not be efficient. Is there a smarter way to do it?
I think I found one solution if we do it this way:
set.seed(42)
# Assign everything to one data frame
df <- data.frame("Dep" = rnorm(100), "Uni" = runif(100),
"Exp" = rexp(100), "Wei" = rweibull(100, 1))
varnames <- names(df)[-1]
# Create formula for the sake of model creation
form <- paste0("offset","(",varnames, ")",collapse = "+")
form <- as.formula(paste0(names(df)[1], "~", form))
lm(form, data = df)
1) terms/update. The following one-liner will produce the indicated formula:
update(formula(terms(y ~ ., data = df)), ~ offset(.))
## y ~ offset(Uni + Exp + Wei)
2) reformulate/sprintf. Another approach is the following (here df is the second data frame, which includes Dep):
reformulate(sprintf("offset(%s)", names(df)), "y")
## y ~ offset(Dep) + offset(Uni) + offset(Exp) + offset(Wei)
3) rowSums. Another approach is to sum each row and use the total as a single offset:
lm(y ~ offset(rowSums(df)))
4) lm.fit. We could use lm.fit, in which case we don't need a formula:
lm.fit(cbind(y^0), y, offset = rowSums(df))
5) mean. If you only need the intercept, it is just:
mean(y - rowSums(df))
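As a quick sanity check, the offset formula, the single rowSums offset, `lm.fit()` and the closed-form mean should all return the same intercept. A small sketch using the question's simulated data (the data frame without `Dep`, with `y` kept separate):

```r
set.seed(42)
y  <- rnorm(100)
df <- data.frame(Uni = runif(100), Exp = rexp(100), Wei = rweibull(100, 1))

# 1) Formula built from all column names, each wrapped in offset()
fo <- reformulate(sprintf("offset(%s)", names(df)), response = "y")
b1 <- coef(lm(fo, data = df))

# 2) One offset equal to the row sums
b2 <- coef(lm(y ~ offset(rowSums(df))))

# 3) lm.fit with an intercept-only design matrix (a column of ones)
b3 <- lm.fit(cbind(y^0), y, offset = rowSums(df))$coefficients

# 4) Closed form: the intercept is the mean of y minus the offset
b4 <- mean(y - rowSums(df))

c(b1, b2, b3, b4)
```

With every slope fixed at one, least squares only has the intercept left to estimate, which is why all four agree.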

R convert regression model fit to a function

I want to quickly extract the fit of a regression model to a function.
So I want to get from:
# generate some random data
set.seed(123)
x <- rnorm(n = 100, mean = 10, sd = 4)
z <- rnorm(n = 100, mean = -8, sd = 3)
y <- 9 * x - 10 * x ^ 2 + 5 * z + 10 + rnorm(n = 100, 0, 30)
df <- data.frame(x, y, z)
plot(df$x,df$y)
model1 <- lm(formula = y ~ x + I(x^2) + z, data = df)
summary(model1)
to a model_function(x) that describes the fitted values for me.
Of course I could do this by hand in a way like this:
model_function <- function(x, z, model) {
fit <- coefficients(model)["(Intercept)"] + coefficients(model)["x"]*x + coefficients(model)["I(x^2)"]*x^2 + coefficients(model)["z"]*z
return(fit)
}
fit <- model_function(df$x,df$z, model1)
which I can compare to the actual fitted values and (with some rounding errors) works perfectly.
all(round(as.numeric(model1$fitted.values),5) == round(fit,5))
But of course this is not a universal solution (e.g. more variables etc.).
So to be clear:
Is there an easy way to extract the fitted values relationship as a function with the coefficients that were just estimated?
Note: I know of course about predict and the ability to generate fitted values from new data - but I'm really looking for that underlying function. Maybe that's possible through predict?
Grateful for any help!
If you want an actual function you can do something like this:
get_func <- function(mod) {
vars <- as.list(attr(mod$terms, "variables"))[-(1:2)]
funcs <- lapply(vars, function(x) list(quote(`*`), 1, x))
terms <- mapply(function(x, y) {x[[2]] <- y; as.call(x)}, funcs, mod$coefficients[-1],
SIMPLIFY = FALSE)
terms <- c(as.numeric(mod$coefficients[1]), terms)
body <- Reduce(function(a, b) as.call(list(quote(`+`), a, b)), terms)
vars <- setNames(lapply(seq_along(vars), function(x) NULL), sapply(vars, as.character))
f <- as.function(c(do.call(alist, vars), body))
formals(f) <- formals(f)[!grepl("\\(", names(formals(f)))]
f
}
Which allows:
my_func <- get_func(model1)
my_func
#> function (x = NULL, z = NULL)
#> 48.6991866925322 + 3.31343108778127 * x + -9.77589420188036 * I(x^2) + 5.38229596972984 * z
#> <environment: 0x00000285a1982b48>
and
my_func(x = 1:10, z = 3)
#> [1] 58.38361 32.36936 -13.19668 -78.31451 -162.98413 -267.20553
#> [7] -390.97872 -534.30371 -697.18048 -879.60903
and
plot(1:10, my_func(x = 1:10, z = 3), type = "b")
At the moment this does not work with interaction terms etc., but it should work for most simple linear models.
Any of these give the fitted values:
fitted(model1)
predict(model1)
model.matrix(model1) %*% coef(model1)
y - resid(model1)
X <- model.matrix(model1); X %*% qr.solve(X, y)
X <- cbind(1, x, x^2, z); X %*% qr.solve(X, y)
Any of these give the predicted values for any particular x and z:
cbind(1, x, x^2, z) %*% coef(model1)
predict(model1, list(x = x, z = z))
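If the goal is a reusable function rather than the written-out formula, one option is a closure that captures the fitted model and defers to `predict()`. Because `predict()` rebuilds the model matrix from the stored terms, this also copes with `I()`, `poly()` and interactions. The helper name `as_function` is illustrative, not an existing API, and the sketch assumes the data and `model1` from the question:

```r
set.seed(123)
x <- rnorm(n = 100, mean = 10, sd = 4)
z <- rnorm(n = 100, mean = -8, sd = 3)
y <- 9 * x - 10 * x ^ 2 + 5 * z + 10 + rnorm(n = 100, 0, 30)
df <- data.frame(x, y, z)
model1 <- lm(formula = y ~ x + I(x^2) + z, data = df)

# Hypothetical helper: wrap a fitted model as an ordinary R function.
# Predictors are passed by name and recycled into a data frame for predict().
as_function <- function(mod) {
  force(mod)
  function(...) unname(predict(mod, newdata = data.frame(...)))
}

model_function <- as_function(model1)
model_function(x = 1:10, z = 3)
```

The trade-off is that the returned function's body does not display the coefficients; it reproduces the fitted relationship rather than printing it.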

Is there any way to construct real regression equation by taking parameters from models in R?

data is:
d <- data.frame(x = rnorm(100, 0, 1),
y = rnorm(100, 0, 1),
z = rnorm(100, 0, 1))
function to fit 4 models
library(splines)
func <-function(d){
fit1 <- lm( y~ x + z, data = d)
fit2 <- lm( y~x + I(z^2), data = d)
fit3 <- lm( y~poly(x,3) + z, data = d)
fit4 <- lm( y~ns(x, 3) + z, data = d)
l <- list(fit1, fit2, fit3, fit4)
names(l) <- paste0("fit", 1:4)
return(l)
}
mods <- func(d)
mods[[1]]
library(stargazer)
stargazer(mods, type = "text")
I want to construct the actual regression equation of each model, taking the parameters and independent variables from the fitted models automatically in R, if that is possible. For example, for the fit1 model, intercept = -0.20612, x = 0.17443, z = 0.03203, so the equation would be something like y = -0.206 + 0.174x + 0.032z. I want to list these equations for all models in a table along with common useful statistics such as R2, p-value, adjusted R2, number of observations, etc. stargazer does not give me the output I want. Is there a way to do this in R without doing it manually in Excel?
Thanks in advance!
We can map over mods using J.R.'s regEq function and broom::glance to get each model's R2, p-value, and adjusted R2.
library(purrr)
library(broom)
map_dfr(mods,
function(x) data.frame('Eq'=regEq(lmObj = x, dig = 3), broom::glance(x), stringsAsFactors = FALSE),
.id='Model')
Model Eq r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
1 fit1 y = 0.091 - 0.022*x - 0.027*z 0.0012601436 -0.01933243 1.028408 0.06119408 0.9406769 3 -143.1721 294.3441 304.7648
2 fit2 y = 0.093 - 0.022*x - 0.003*I(z^2) 0.0006154188 -0.01999045 1.028740 0.02986619 0.9705843 3 -143.2043 294.4087 304.8294
3 fit3 y = 0.093 - 0.248*poly(x, 3)1 - 0.186*poly(x, 3)2 - 0.581*poly(x, 3)3 - 0.031*z 0.0048717358 -0.03702840 1.037296 0.11627016 0.9764662 5 -142.9909 297.9819 313.6129
4 fit4 y = 0.201 + 0.08*ns(x, 3)1 - 0.385*ns(x, 3)2 - 0.281*ns(x, 3)3 - 0.031*z 0.0032813558 -0.03868575 1.038125 0.07818877 0.9887911 5 -143.0708 298.1416 313.7726
deviance df.residual
1 102.5894 97
2 102.6556 97
3 102.2184 95
4 102.3818 95
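`regEq` itself is not shown here. As a rough, dependency-free alternative (a sketch, not J.R.'s function), the hypothetical helper `model_row` below pastes the rounded coefficients onto their names and pulls R², adjusted R², the overall F-test p-value and the number of observations from `summary()`. It assumes the `mods` list from the question and a model with an intercept:

```r
library(splines)  # for ns()

set.seed(123)
d <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
mods <- list(fit1 = lm(y ~ x + z, data = d),
             fit2 = lm(y ~ x + I(z^2), data = d),
             fit3 = lm(y ~ poly(x, 3) + z, data = d),
             fit4 = lm(y ~ ns(x, 3) + z, data = d))

# Hypothetical helper: one row per model with the equation and summary stats
model_row <- function(fit, digits = 3) {
  co <- round(coef(fit), digits)
  resp <- all.vars(formula(fit))[1]
  # Intercept first (assumed present), then " +/- coef*name" per coefficient
  rhs <- paste0(co[1], paste0(ifelse(co[-1] < 0, " - ", " + "),
                              abs(co[-1]), "*", names(co)[-1], collapse = ""))
  s <- summary(fit)
  f <- s$fstatistic
  data.frame(Eq = paste(resp, "=", rhs),
             r.squared = s$r.squared,
             adj.r.squared = s$adj.r.squared,
             p.value = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)),
             nobs = nobs(fit),
             stringsAsFactors = FALSE)
}

# Stack the rows and label them with the list names
tab <- do.call(rbind, lapply(mods, model_row))
tab <- data.frame(Model = names(mods), tab, row.names = NULL, stringsAsFactors = FALSE)
tab
```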
The problem is that your models do not fit neatly into one table: for example, fit3 returns 5 estimates while fit1 returns just 3.
If you are comfortable with lists, they are a great way of storing this kind of information:
library(broom)
library(tidyverse)
library(splines)
d <- data.frame(x = rnorm(100, 0, 1),
y = rnorm(100, 0, 1),
z = rnorm(100, 0, 1))
func <-function(d){
fit1 <- lm( y~ x + z, data = d)
fit2 <- lm( y~x + I(z^2), data = d)
fit3 <- lm( y~poly(x,3) + z, data = d)
fit4 <- lm( y~ns(x, 3) + z, data = d)
l <- list(fit1, fit2, fit3, fit4)
names(l) <- paste0("fit", 1:4)
return(l)
}
mods <- func(d)
list_representation<- map(mods,tidy)
Assuming mods as shown in the Note at the end, and that what is wanted is a character vector with a text representation of each formula with the coefficients substituted, we have the following.
The fit2text function takes a fitted object and outputs a character string with the text representation of the formula. The round argument gives the number of digits that the coefficients are rounded to in the result. The rmI argument, if TRUE, removes any I(...) and just leaves the ... inside assuming, for ease of implementation, that the expression inside does not contain any parentheses. If FALSE then I is not removed.
Other statistics can be extracted from summary(mods[[1]]) or broom::glance(mods[[1]])
fit2text <- function(fit, round = 2, rmI = TRUE) {
fo <- formula(fit)
resp <- all.vars(fo)[1]
co <- round(coef(fit), round)
labs <- c(if (attr(terms(fit), "intercept") == 1) "", labels(fit))
p <- gsub("\\+ *-", "- ", paste(resp, "~ ", paste(paste(co, labs), collapse = " + ")))
p2 <- if (rmI) gsub("I\\(([^)]+)\\)", "\\1", p) else p
gsub(" +", " ", p2)
}
sapply(mods, fit2text)
giving:
fit1
"y ~ -0.11 - 0.05 x + 0.03 z"
fit2
"y ~ -0.07 - 0.05 x - 0.04 z^2"
fit3
"y ~ -0.11 - 0.43 poly(x, 3) - 1.05 z + 0.27 + 0.04 poly(x, 3)"
fit4
"y ~ -0.55 + 0.23 ns(x, 3) + 0.79 z - 0.25 + 0.04 ns(x, 3)"
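In the output for fit3 and fit4, `labels(fit)` returns the term labels (`poly(x, 3)`, `z`), which are fewer than the five coefficients, so `paste()` recycles them and the labels no longer line up with their coefficients. A minimal variant, assuming the same `mods`, labels each coefficient by its own name from `names(coef(fit))` instead; the name `fit2text2` is just for illustration:

```r
library(splines)

set.seed(123)
d <- data.frame(x = rnorm(100, 0, 1), y = rnorm(100, 0, 1), z = rnorm(100, 0, 1))
mods <- list(fit1 = lm(y ~ x + z, data = d),
             fit2 = lm(y ~ x + I(z^2), data = d),
             fit3 = lm(y ~ poly(x, 3) + z, data = d),
             fit4 = lm(y ~ ns(x, 3) + z, data = d))

fit2text2 <- function(fit, round = 2, rmI = TRUE) {
  resp <- all.vars(formula(fit))[1]
  co <- round(coef(fit), round)
  # One label per coefficient; the intercept gets an empty label
  labs <- names(co)
  labs[labs == "(Intercept)"] <- ""
  p <- paste(resp, "~", paste(paste(co, labs), collapse = " + "))
  p <- gsub("\\+ *-", "- ", p)  # turn "+ -0.05" into "- 0.05"
  if (rmI) p <- gsub("I\\(([^)]+)\\)", "\\1", p)
  trimws(gsub(" +", " ", p))
}

sapply(mods, fit2text2)
```

For fit3 this yields separate terms for `poly(x, 3)1`, `poly(x, 3)2` and `poly(x, 3)3`, each with its own coefficient.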
Note
The code in the question was not reproducible because the library calls were missing, it used random numbers without a set.seed and there were some further errors in the code. For clarity, we provide the following reproducible code that we used to provide the input for the above answer.
library(splines)
set.seed(123)
d <- data.frame(x = rnorm(100, 0, 1),
y = rnorm(100, 0, 1),
z = rnorm(100, 0, 1))
# function to fit 4 models
func <-function(d){
fit1 <- lm( y~ x + z, data = d)
fit2 <- lm( y~x + I(z^2), data = d)
fit3 <- lm( y~poly(x,3) + z, data = d)
fit4 <- lm( y~ns(x, 3) + z, data = d)
l <- list(fit1, fit2, fit3, fit4)
names(l) <- paste0("fit", 1:4)
return(l)
}
mods <- func(d)

Calling Variables in a Formula in r

I want to generalize my function. I have defined form = y ~ x and want to refer to y so that I can use it within the function. y should be user-defined (dynamic), which is why I need a way to get it from form. This is part of the code I tried; newdata is calculated within the function.
form = y ~ x
newdata = y
trial = function(form, x){
y = newdata
reg = lm(form, data = data.frame(x, newdata))
reg
}
If you want your function to work, you could try:
form = y ~ x
newdata = 3*(1:100)+2
trial = function(f=form, x){
y = newdata
reg = lm(f, data = data.frame(x, y=newdata))
reg
}
trial(x=1:100) # or trial(form,1:100)
Call:
lm(formula = f, data = data.frame(x, y = newdata))
Coefficients:
(Intercept) x
2 3 # as expected
The thing is, in your previous function, the form was an argument and R did not understand that it was the form you defined. Using f with a default value of form solves this issue, and naming the column y in data.frame(x, y = newdata) lets lm find the response in the data.
Is that what you wanted?
Note that if you want to call the function and define y in the call, you could do:
form = y ~ x
trial = function(f=form, x, y){
reg = lm(f, data = data.frame(x, y))
reg
}
trial(x=1:100,y=3*(1:100)+2)
Call:
lm(formula = f, data = data.frame(x, y))
Coefficients:
(Intercept) x
2 3
From what I understood from your comments, this might be closer to what you expected:
form = y ~ x
trial = function(f=form, x, y){
y = 5*y+2
reg = lm(f, data = data.frame(x, y))
reg
}
trial(x=1:100,y=3*(1:100))
Call:
lm(formula = f, data = data.frame(x, y))
Coefficients:
(Intercept) x
2 15
You call the function specifying x and giving a "first value" for y. Then, in the function, y is transformed (quite an easy transformation here) and then the regression is done.
Note that when you call the function, y is not the variable y but the argument in the function. If you have a variable z equal to 3*(1:100) you can do trial(x=1:100,y=z)
I tried this out and it turned out to be what I needed
form = y ~ x
trial = function(form, x){
.
.
.
newdata = something
.
.
new.form <- as.formula(call("~", form[[2]], form[[3]]))
assign(deparse(form[[2]]), newdata)
reg = lm(new.form, data = data.frame(x, newdata))
reg
}
Something like this?
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm(weight ~ group)
form <- as.formula(y ~ x)
trial = function(form,y, x){
reg = lm(form, data = data.frame(x=x, y=y))
reg
}
trial(form,weight,group)
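The part of the question about getting `y` from the formula can also be handled by reading the variable names off the formula with `all.vars()` and naming the data columns to match, so nothing has to be assigned in the calling environment. A sketch assuming a single predictor, where `trial2` and its `transform_y` argument are hypothetical names for illustration:

```r
# Hypothetical variant: the response and predictor names are read from the
# formula, so the caller can use any names, e.g. weight ~ dose
trial2 <- function(form, x, y, transform_y = identity) {
  vars <- all.vars(form)              # e.g. c("y", "x") for y ~ x
  dat <- data.frame(transform_y(y), x)
  names(dat) <- vars[1:2]             # response first, predictor second
  lm(form, data = dat)
}

fit <- trial2(weight ~ dose, x = 1:100, y = 3 * (1:100),
              transform_y = function(v) 5 * v + 2)
coef(fit)  # intercept 2, slope 15, as in the example above
```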

What is the difference between x^2 and I(x^2) in R?

What is the difference between these two models in R?
model1 <- glm(y ~ x + x^2, family = binomial(link = logit), weights = numbers)
model2 <- glm(y ~ x + I(x^2), family = binomial(link = logit), weights = numbers)
Also, what is the equivalent of I(x^2) in SAS?
The I() function means 'as is', whereas the ^n (to the power of n) operator means 'include these variables and all interactions up to n-way'.
This means:
I(X^2) literally regresses Y against X squared, and
X^2 means include X and all 2-way interactions of X, but since there is only one variable there is no interaction, so it returns only itself, i.e. X. Note that in your formula you write X + X^2, which translates to X + X, and in formula syntax a term is only counted once, i.e. one of the two Xs is dropped.
Demonstration:
Y <- runif(100)
X2 <- runif(100)
df <- data.frame(Y, X2)
b <- lm( Y ~ X2 + X2^2 + X2,data=df)
> b
Call:
lm(formula = Y ~ X2 + X2^2 + X2, data = df)
Coefficients:
(Intercept) X2
0.48470 0.05098
a <- lm( Y ~ X2 + I(X2^2),data=df)
> a
Call:
lm(formula = Y ~ X2 + I(X2^2), data = df)
Coefficients:
(Intercept) X2 I(X2^2)
0.47545 0.11339 -0.06682
Hope it helps!
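The expansion rules can also be checked without fitting anything, by asking `terms()` which terms a formula produces. A small sketch; the variable names are placeholders, since `terms()` on a formula does not need the data to exist:

```r
# ^ on a single variable: x^2 is just x, so x + x^2 keeps one term
attr(terms(y ~ x + x^2), "term.labels")
# "x"

# I() protects the arithmetic, so I(x^2) is a separate squared term
attr(terms(y ~ x + I(x^2)), "term.labels")
# "x" "I(x^2)"

# ^ on a sum: all main effects plus interactions up to order 2
attr(terms(y ~ (a + b + c)^2), "term.labels")
# "a" "b" "c" "a:b" "a:c" "b:c"
```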
