NLME GLS Model Formula [duplicate] - r

When I run lm(sformula), the printed call shows the variable name sformula rather than the string assigned to it. I suspect this comes from how R handles function arguments in general and is not specific to linear regression.
The examples below illustrate the issue. Example 1 gives the undesired output, lm(formula = sformula). Example 2 gives the output I would like, lm(formula = "y~x").
x <- 1:10
y <- x * runif(10)
sformula <- "y~x"
## Example: 1
lm(sformula)
## Call:
## lm(formula = sformula)
## Example: 2
lm("y~x")
## Call:
## lm(formula = "y~x")

How about eval(call("lm", sformula))?
lm(sformula)
#Call:
#lm(formula = sformula)
eval(call("lm", sformula))
#Call:
#lm(formula = "y~x")
lm usually takes a data argument as well, so let's include it:
mydata <- data.frame(y = y, x = x)
eval(call("lm", sformula, quote(mydata)))
#Call:
#lm(formula = "y~x", data = mydata)
The above call() + eval() combination can be replaced by do.call():
do.call("lm", list(formula = sformula))
#Call:
#lm(formula = "y~x")
do.call("lm", list(formula = sformula, data = quote(mydata)))
#Call:
#lm(formula = "y~x", data = mydata)
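The behaviour is general to R, as the question suspects: a function that records its call with match.call() stores the argument expression as written, not its value, whereas call() and do.call() build the call from values that are already evaluated. A minimal sketch follows, using a hypothetical helper show_call() that is not part of lm and exists only to make the mechanism visible:

```r
# show_call() is a hypothetical helper for illustration; it only returns
# the call it was invoked with, captured via match.call().
show_call <- function(formula) match.call()

sformula <- "y~x"

# Called directly: the unevaluated argument expression (a symbol) is stored.
direct <- show_call(sformula)
print(direct)

# Called through do.call(): the list element is already a value, so the
# string itself ends up in the recorded call.
built <- do.call("show_call", list(formula = sformula))
print(built)
```

The same distinction explains why lm(sformula) prints the variable name while the do.call() version prints the string.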

Related

Cannot use non-standard evaluation in a self-defined function in R

I want to write a function that extracts some information from a gam model.
I can do this without a user-defined function (df is the result I want):
library(mgcv)
library(tidyverse)
model = gam(mpg ~ cyl, data = mtcars)
result = summary(model)$p.table
estimate = result[2,1]
se = result[2,2]
df = data.frame(estimate = estimate, se = se)
df
Then I wrapped it in a user-defined function:
my_gam <- function(y, x, data){
  model = gam(y ~ x, data = data)
  result = summary(model)$p.table
  estimate = result[2,1]
  se = result[2,2]
  df = data.frame(estimate = estimate, se = se)
  df
}
But I cannot get my function to work:
my_gam(y = mpg, x = cyl, data = mtcars)
Error in eval(predvars, data, env) : object 'cyl' not found
my_gam(y = 'mpg', x = 'cyl', data = mtcars)
Error in gam(y ~ x, data = data) :
Not enough (non-NA) data to do anything meaningful
Is there a way to get the same df as in the first code block when I run my_gam(y = mpg, x = cyl, data = mtcars)?
Any help will be highly appreciated!
You can use reformulate/as.formula to construct the formula.
library(mgcv)
my_gam <- function(y, x, data){
  model = gam(reformulate(x, y), data = data)
  result = summary(model)$p.table
  estimate = result[2,1]
  se = result[2,2]
  df = data.frame(estimate = estimate, se = se)
  df
}
my_gam(y = 'mpg', x = 'cyl', data = mtcars)
#   estimate     se
# 1   -2.876 0.3224
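To make the role of reformulate() concrete, here is a small sketch of what it returns on its own. It uses only base R; the second predictor wt is another mtcars column added purely for illustration, and lm() stands in for gam() so that no extra package is needed.

```r
# reformulate(termlabels, response) builds a formula from character strings.
f1 <- reformulate("cyl", response = "mpg")
print(f1)

# Several predictors are joined with "+" (wt is used here only as an
# illustrative second column of mtcars).
f2 <- reformulate(c("cyl", "wt"), response = "mpg")
print(f2)

# The result is an ordinary formula object, so a modelling function can
# take it in place of a formula typed by hand.
fit <- lm(f1, data = mtcars)
print(coef(fit))
```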
We can also construct the formula with paste, which is fast:
my_gam <- function(y, x, data){
  model <- gam(as.formula(paste(y, "~", x)), data = data)
  result <- summary(model)$p.table
  estimate <- result[2,1]
  se <- result[2,2]
  df <- data.frame(estimate = estimate, se = se)
  df
}
my_gam(y = 'mpg', x = 'cyl', data = mtcars)
#   estimate        se
# 1 -2.87579 0.3224089
Or another option is to pass a formula as argument
my_gam <- function(fmla, data){
  model <- gam(fmla, data = data)
  result <- summary(model)$p.table
  estimate <- result[2,1]
  se <- result[2,2]
  df <- data.frame(estimate = estimate, se = se)
  df
}
my_gam(mpg ~ cyl, data = mtcars)
#   estimate        se
# 1 -2.87579 0.3224089
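The answers above take strings or a formula. If the unquoted call from the question, my_gam(y = mpg, x = cyl, data = mtcars), is what is wanted, one possible approach is to capture the argument names with substitute() and pass them to reformulate(). The sketch below is an assumption-laden illustration, not the answerers' method: my_fit() is a hypothetical name, and lm() is swapped in for gam() so the example runs with base R only.

```r
# my_fit() is a hypothetical variant of my_gam(); lm() replaces gam() here
# only to keep the sketch free of package dependencies.
my_fit <- function(y, x, data) {
  # substitute() captures the argument expressions unevaluated; deparse()
  # turns the bare names into strings such as "mpg" and "cyl".
  y_name <- deparse(substitute(y))
  x_name <- deparse(substitute(x))
  model <- lm(reformulate(x_name, response = y_name), data = data)
  result <- summary(model)$coefficients
  data.frame(estimate = result[2, 1], se = result[2, 2])
}

out <- my_fit(y = mpg, x = cyl, data = mtcars)
print(out)
```

One caveat of this design: substitute() sees whatever expression the caller typed, so the function behaves as intended only when column names are written directly in the call, not when they are forwarded from another function.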

How to use a variable in lm() function in R?

Let us say I have a dataframe (df) with two columns called "height" and "weight".
Let's say I define:
x = "height"
How do I use x within my lm() function? Neither df[x] nor just using x works.
Two ways:
Create a formula with paste
x = "height"
lm(paste0(x, '~', 'weight'), df)
Or use reformulate
lm(reformulate("weight", x), df)
Using a reproducible example with the mtcars dataset:
x = "cyl"
lm(paste0(x, '~', 'mpg'), data = mtcars)
#Call:
#lm(formula = paste0(x, "~", "mpg"), data = mtcars)
#
#Coefficients:
#(Intercept)          mpg
#    11.2607      -0.2525
and the same with
lm(reformulate("mpg", x), mtcars)
We can use glue to create the formula
x <- "height"
lm(glue::glue('{x} ~ weight'), data = df)
Using a reproducible example with mtcars
x <- 'cyl'
lm(glue::glue('{x} ~ mpg'), data = mtcars)
#Call:
#lm(formula = glue::glue("{x} ~ mpg"), data = mtcars)
#Coefficients:
#(Intercept) mpg
# 11.2607 -0.2525
When you run x = "height" you are assigning a character string to the variable x.
Consider this data frame:
df <- data.frame(
height = c(176, 188, 165),
weight = c(75, 80, 66)
)
If you want a regression using height and weight you can either do this:
lm(height ~ weight, data = df)
# Call:
# lm(formula = height ~ weight, data = df)
#
# Coefficients:
# (Intercept) weight
# 59.003 1.593
or this:
lm(df$height ~ df$weight)
# Call:
# lm(formula = df$height ~ df$weight)
#
# Coefficients:
# (Intercept) df$weight
# 59.003 1.593
If you really want to use x instead of height, you must have a variable called x (in your df or in your environment). You can do that by creating a new variable:
x <- df$height
y <- df$weight
lm(x ~ y)
# Call:
# lm(formula = x ~ y)
#
# Coefficients:
# (Intercept) y
# 59.003 1.593
Or by changing the names of existing variables:
names(df) <- c("x", "y")
lm(x ~ y, data = df)
# Call:
# lm(formula = x ~ y, data = df)
#
# Coefficients:
# (Intercept) y
# 59.003 1.593
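The question notes that neither df[x] nor x alone works. A further option, sketched here under the assumption that the data frame from this answer is used, is double-bracket indexing: df[x] returns a one-column data frame, while df[[x]] returns the column vector itself, which a formula can use.

```r
df <- data.frame(
  height = c(176, 188, 165),
  weight = c(75, 80, 66)
)
x <- "height"

# df[x] is a one-column data frame; df[[x]] is the numeric vector itself.
print(class(df[x]))
print(class(df[[x]]))

# The extracted column can serve as the response directly.
fit_by_name <- lm(df[[x]] ~ weight, data = df)
fit_literal <- lm(height ~ weight, data = df)
print(coef(fit_by_name))
```

The printed call then shows df[[x]] rather than height, so building the formula with paste or reformulate is still preferable when the call needs to be readable.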

R - Predicted variables not included in linear regression graph

Here's the relevant code snippet. How do I get the predicted values to display in the plot?
df <- data.frame(X = 2010:2022, Y = c(11539282, 11543332, 11546969, 11567845, 11593741, 11606027, 11622554, 11658609, rep(NA, 5)))
model.1 <- lm(formula = Y ~ X, data = df)
predict(object = model.1, newdata = df)
plot(X, Y, ylim=c(11500000,11750000))
lines(sort(X), fitted(model.1)[order(X)])
Make these changes:
when creating the model use na.action = na.exclude
use the formula methods for plot and lines
use fitted(model.2) as the predicted values
no sorting is needed as X is already sorted
giving this code:
model.2 <- lm(Y ~ X, df, na.action = na.exclude)
plot(Y ~ X, df)
lines(fitted(model.2) ~ X, df)
or use abline in which case this shorter code can be used:
model.3 <- lm(Y ~ X, df)
plot(Y ~ X, df)
abline(model.3)
In either case we get a plot of the points with the fitted line drawn over them (image not preserved).
Added
Based on clarification in the comments we could do this (or if you want an even wider range try ylim = extendrange(pred, f = .10) to extend the range by 10%, say, on either side).
pred <- predict(model.3, df)
plot(Y ~ X, df, ylim = range(pred))
lines(pred ~ X, df)
giving a plot with the predicted values drawn as a line (image not preserved).
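The difference between the approaches comes down to how many values each function returns when Y has missing entries. The sketch below reuses the data from the question and compares the lengths; the object names fit_omit and fit_excl are chosen here for illustration, and the default na.action is assumed to be na.omit (the R default).

```r
df <- data.frame(
  X = 2010:2022,
  Y = c(11539282, 11543332, 11546969, 11567845, 11593741,
        11606027, 11622554, 11658609, rep(NA, 5))
)

fit_omit <- lm(Y ~ X, data = df)                           # default na.action
fit_excl <- lm(Y ~ X, data = df, na.action = na.exclude)

# Default: rows with missing Y are dropped, so only 8 fitted values remain.
n_omit <- length(fitted(fit_omit))
# na.exclude: fitted values are padded with NA back to all 13 rows.
n_excl <- length(fitted(fit_excl))
# predict() with newdata computes a value for every row of X.
pred <- predict(fit_omit, newdata = df)

print(c(n_omit, n_excl, length(pred), sum(is.na(pred))))
```

This is why fitted() from the original model cannot be plotted against all 13 X values, while predict(model, df) can.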

Using lapply on a list of models

I have generated a list of models, and would like to create a summary table.
As an example, here are two models:
x <- seq(1:10)
y <- sin(x)^2
model1 <- lm(y ~ x)
model2 <- lm(y ~ x + I(x^2) + I(x^3))
and two functions, the first generating the equation from the components of the formula
get.model.equation <- function(x) {
  x <- as.character((x$call)$formula)
  x <- paste(x[2],x[1],x[3])
}
and the second generating the name of the model as a string
get.model.name <- function(x) {
  x <- deparse(substitute(x))
}
With these, I create a summary table
model.list <- list(model1, model2)
AIC.data <- lapply(X = model.list, FUN = AIC)
AIC.data <- as.numeric(AIC.data)
model.models <- lapply(X = model.list, FUN = get.model.equation)
model.summary <- cbind(model.models, AIC.data)
model.summary <- as.data.frame(model.summary)
names(model.summary) <- c("Model", "AIC")
model.summary$AIC <- unlist(model.summary$AIC)
rm(AIC.data)
model.summary[order(model.summary$AIC),]
Which all works fine.
I'd like to add the model name to the table using get.model.name
x <- get.model.name(model1)
Which gives me "model1" as I want.
So now I apply the function to the list of models
model.names <- lapply(X = model.list, FUN = get.model.name)
but now instead of model1 I get X[[1L]]
How do I get model1 rather than X[[1L]]?
I'm after a table that looks like this:
Model   Formula                   AIC
model1  y ~ x                     11.89136
model2  y ~ x + I(x^2) + I(x^3)   15.03888
Do you want something like this?
model.list <- list(model1 = lm(y ~ x),
model2 = lm(y ~ x + I(x^2) + I(x^3)))
sapply(X = model.list, FUN = AIC)
I'd do something like this:
model.list <- list(model1 = lm(y ~ x),
model2 = lm(y ~ x + I(x^2) + I(x^3)))
# changed Reduce('rbind', ...) to do.call(rbind, ...) (Hadley's comment)
do.call(rbind,
        lapply(names(model.list), function(x)
          data.frame(model = x,
                     formula = get.model.equation(model.list[[x]]),
                     AIC = AIC(model.list[[x]])
          )
        )
)
#    model                 formula      AIC
# 1 model1                   y ~ x 11.89136
# 2 model2 y ~ x + I(x^2) + I(x^3) 15.03888
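The reason deparse(substitute(x)) stops working inside lapply is that the function then receives lapply's own indexing expression instead of the original variable name. The sketch below, using the models from the question, shows this and why a named list avoids the problem; the exact text of the indexing expression can differ between R versions, so it is printed but not relied on.

```r
x <- 1:10
y <- sin(x)^2
model1 <- lm(y ~ x)
model2 <- lm(y ~ x + I(x^2) + I(x^3))

get.model.name <- function(x) deparse(substitute(x))

# Inside lapply() the argument expression is lapply's internal indexing
# code, not the name of the original variable.
unnamed <- list(model1, model2)
from_substitute <- unlist(lapply(unnamed, get.model.name))
print(from_substitute)

# Storing the names in the list keeps them available afterwards, and
# sapply() carries them through to its result.
named <- list(model1 = model1, model2 = model2)
print(names(named))
print(sapply(named, AIC))
```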
Another option, with ldply, but see hadley's comment below for a more efficient use of ldply:
# prepare data
x <- seq(1:10)
y <- sin(x)^2
dat <- data.frame(x,y)
# create list of named models obviously these are not suited to the data here, just to make the workflow work...
models <- list(model1=lm(y~x, data = dat),
               model2=lm(y~I(1/x), data=dat),
               model3=lm(y ~ log(x), data = dat),
               model4=nls(y ~ I(1/x*a) + b*x, data = dat, start = list(a = 1, b = 1)),
               model5=nls(y ~ (a + b*log(x)), data=dat, start = setNames(coef(lm(y ~ log(x), data=dat)), c("a", "b"))),
               model6=nls(y ~ I(exp(1)^(a + b * x)), data=dat, start = list(a=0,b=0)),
               model7=nls(y ~ I(1/x*a)+b, data=dat, start = list(a=1,b=1))
)
library(plyr)
library(AICcmodavg) # for small sample sizes
# build table with model names, function, AIC and AICc
data.frame(cbind(ldply(models, function(x) cbind(AICc = AICc(x), AIC = AIC(x))),
                 model = sapply(1:length(models), function(x) deparse(formula(models[[x]])))
))
  .id    AICc     AIC      model
1 model1 15.89136 11.89136 y ~ x
2 model2 15.78480 11.78480 y ~ I(1/x)
3 model3 15.80406 11.80406 y ~ log(x)
4 model4 16.62157 12.62157 y ~ I(1/x * a) + b * x
5 model5 15.80406 11.80406 y ~ (a + b * log(x))
6 model6 15.88937 11.88937 y ~ I(exp(1)^(a + b * x))
7 model7 15.78480 11.78480 y ~ I(1/x * a) + b
It's not immediately obvious to me how to replace the .id with a column name in the ldply function, any tips?
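One way to sidestep the .id column is to build the table in base R, where the name column can be called anything. The sketch below is an alternative to the ldply approach, not a fix for it: it uses two of the lm models from above, omits AICc to avoid the AICcmodavg dependency, and model_name is a hypothetical column name. plyr's documentation also appears to list an .id argument for ldply, which would be worth checking against the installed version.

```r
x <- 1:10
y <- sin(x)^2
dat <- data.frame(x, y)

models <- list(
  model1 = lm(y ~ x, data = dat),
  model3 = lm(y ~ log(x), data = dat)
)

# model_name is a hypothetical column name chosen for illustration.
tab <- data.frame(
  model_name = names(models),
  AIC = vapply(models, AIC, numeric(1)),
  formula = vapply(models,
                   function(m) paste(deparse(formula(m)), collapse = " "),
                   character(1)),
  row.names = NULL
)
print(tab)
```

If the ldply result is kept instead, the column can be renamed afterwards with names(result)[names(result) == ".id"] <- "model_name".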

Resources