Calling Variables in a Formula in R

I want to generalize my function in R. I have defined form = y ~ x and want to extract y so that I can use it within the function. y should be user-defined (dynamic), which is why I need a way to pull it out of form. This is part of the code I tried; newdata is calculated within the function.
form = y ~ x
newdata = y
trial = function(form, x){
  y = newdata
  reg = lm(form, data = data.frame(x, newdata))
  reg
}

If you want your function to work, you could try:
form = y ~ x
newdata = 3*(1:100)+2
trial = function(f = form, x){
  y = newdata
  reg = lm(f, data = data.frame(x, y = newdata))
  reg
}
trial(x=1:100) # or trial(form,1:100)
Call:
lm(formula = f, data = data.frame(x, y = newdata))

Coefficients:
(Intercept)            x
          2            3      # as expected
The thing is, in your previous function form was an argument, so R did not understand that it was the form you had defined. Using f with a default value of form solves this issue.
Was that what you wanted?
Note that if you want to call the function and define y in the call, you could do:
form = y ~ x
trial = function(f = form, x, y){
  reg = lm(f, data = data.frame(x, y))
  reg
}
trial(x=1:100,y=3*(1:100)+2)
Call:
lm(formula = f, data = data.frame(x, y))

Coefficients:
(Intercept)            x
          2            3
From what I understood from your comments this might be closer to what you expected :
form = y ~ x
trial = function(f = form, x, y){
  y = 5*y + 2
  reg = lm(f, data = data.frame(x, y))
  reg
}
trial(x=1:100,y=3*(1:100))
Call:
lm(formula = f, data = data.frame(x, y))

Coefficients:
(Intercept)            x
          2           15
You call the function specifying x and giving a "first value" for y. Then, inside the function, y is transformed (a fairly simple transformation here) and the regression is run.
Note that when you call the function, y is not a global variable but the function's argument. If you have a variable z equal to 3*(1:100), you can do trial(x = 1:100, y = z).
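For example, with the version of trial() defined just above:
z <- 3*(1:100)
trial(x = 1:100, y = z)   # same as trial(x = 1:100, y = 3*(1:100))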

I tried this out and it turned out to be what I needed:
form = y ~ x
trial = function(form, x){
  .
  .
  .
  newdata = something
  .
  .
  new.form <- as.formula(call("~", form[[2]], form[[3]]))
  assign(deparse(form[[2]]), newdata)
  reg = lm(new.form, data = data.frame(x, newdata))
  reg
}
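A minimal self-contained sketch of the same pattern, with a placeholder computation standing in for the elided steps (newdata = 3*x + 2 here is only an assumption for illustration):
form <- y ~ x
trial <- function(form, x){
  newdata <- 3*x + 2                                    # stand-in for the real computation
  new.form <- as.formula(call("~", form[[2]], form[[3]]))
  assign(deparse(form[[2]]), newdata)                   # creates a variable named after the left-hand side of form
  reg <- lm(new.form, data = data.frame(x, newdata))
  reg
}
trial(form, 1:100)   # recovers intercept 2 and slope 3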

Something like this?
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm(weight ~ group)
form <- as.formula(y ~ x)
trial = function(form, y, x){
  reg = lm(form, data = data.frame(x = x, y = y))
  reg
}
trial(form,weight,group)
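As a quick check (not part of the original answer), both calls fit the same model, so the estimates agree; only the coefficient names differ, because the helper renames the variables to x and y:
coef(lm(weight ~ group))
coef(trial(form, weight, group))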

Related

How to set all coefficients to one in model?

To fix a certain coefficient in a regression to one, we can use the offset function.
I want to set all coefficients to 1.
Let's take this example:
set.seed(42)
y <- rnorm(100)
df <- data.frame("Uni" = runif(100), "Exp" = rexp(100), "Wei" = rweibull(100, 1))
lm(y ~ offset(Uni) + offset(Exp) + offset(Wei), data = df)

Call:
lm(formula = y ~ offset(Uni) + offset(Exp) + offset(Wei), data = df)

Coefficients:
(Intercept)
     -2.712
This code works; however, what if I have a huge amount of data, e.g. 800 variables, and I want to do this for all of them? Writing out all their names would not be efficient. Is there a solution that lets us do this more cleverly?
I think I found one solution if we do it this way:
set.seed(42)
# Assign everything to one data frame
df <- data.frame("Dep" = rnorm(100), "Uni" = runif(100),
                 "Exp" = rexp(100), "Wei" = rweibull(100, 1))
varnames <- names(df)[-1]
# Create formula for the sake of model creation
form <- paste0("offset","(",varnames, ")",collapse = "+")
form <- as.formula(paste0(names(df)[1], "~", form))
lm(form, data = df)
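For reference (a quick check, not part of the original post), the constructed formula looks like this:
form
## Dep ~ offset(Uni) + offset(Exp) + offset(Wei)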
1) terms/update The following one-liner will produce the indicated formula.
update(formula(terms(y ~ ., data = df)), ~ offset(.))
## y ~ offset(Uni + Exp + Wei)
2) reformulate/sprintf Another approach is:
reformulate(sprintf("offset(%s)", names(df)), "y")
## y ~ offset(Dep) + offset(Uni) + offset(Exp) + offset(Wei)
3) rowSums Another approach is to simply sum each row:
lm(y ~ offset(rowSums(df)))
4) lm.fit We could use lm.fit in which case we don't need a formula:
lm.fit(cbind(y^0), y, offset = rowSums(df))
5) mean If you only need the coefficient then it is just:
mean(y - rowSums(df))
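A quick consistency check of approaches 3) and 5), assuming the first df above (columns Uni, Exp and Wei only):
coef(lm(y ~ offset(rowSums(df))))   # intercept only
mean(y - rowSums(df))               # same value, the -2.712 reported above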

R convert regression model fit to a function

I want to quickly extract the fit of a regression model to a function.
So I want to get from:
# generate some random data
set.seed(123)
x <- rnorm(n = 100, mean = 10, sd = 4)
z <- rnorm(n = 100, mean = -8, sd = 3)
y <- 9 * x - 10 * x ^ 2 + 5 * z + 10 + rnorm(n = 100, 0, 30)
df <- data.frame(x,y)
plot(df$x,df$y)
model1 <- lm(formula = y ~ x + I(x^2) + z, data = df)
summary(model1)
to a model_function(x) that describes the fitted values for me.
Of course I could do this by hand in a way like this:
model_function <- function(x, z, model) {
  fit <- coefficients(model)["(Intercept)"] +
    coefficients(model)["x"] * x +
    coefficients(model)["I(x^2)"] * x^2 +
    coefficients(model)["z"] * z
  return(fit)
}
fit <- model_function(df$x,df$z, model1)
which I can compare to the actual fitted values and which (allowing for some rounding error) matches perfectly.
all(round(as.numeric(model1$fitted.values),5) == round(fit,5))
But of course this is not a universal solution (e.g. more variables etc.).
So to be clear:
Is there an easy way to extract the fitted relationship as a function, built from the coefficients that were just estimated?
Note: I know of course about predict and the ability to generate fitted values from new data - but I'm really looking for that underlying function. Maybe that's possible through predict?
Grateful for any help!
If you want an actual function you can do something like this:
get_func <- function(mod) {
  vars <- as.list(attr(mod$terms, "variables"))[-(1:2)]
  funcs <- lapply(vars, function(x) list(quote(`*`), 1, x))
  terms <- mapply(function(x, y) {x[[2]] <- y; as.call(x)}, funcs, mod$coefficients[-1],
                  SIMPLIFY = FALSE)
  terms <- c(as.numeric(mod$coefficients[1]), terms)
  body <- Reduce(function(a, b) as.call(list(quote(`+`), a, b)), terms)
  vars <- setNames(lapply(seq_along(vars), function(x) NULL), sapply(vars, as.character))
  f <- as.function(c(do.call(alist, vars), body))
  formals(f) <- formals(f)[!grepl("\\(", names(formals(f)))]
  f
}
Which allows:
my_func <- get_func(model1)
my_func
#> function (x = NULL, z = NULL)
#> 48.6991866925322 + 3.31343108778127 * x + -9.77589420188036 * I(x^2) + 5.38229596972984 * z
<environment: 0x00000285a1982b48>
and
my_func(x = 1:10, z = 3)
#> [1] 58.38361 32.36936 -13.19668 -78.31451 -162.98413 -267.20553
#> [7] -390.97872 -534.30371 -697.18048 -879.60903
and
plot(1:10, my_func(x = 1:10, z = 3), type = "b")
At the moment this would not work with interaction terms, etc., but it should work for most simple linear models.
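As a quick sanity check (using df, z and model1 from the question), the generated function should reproduce the model's own fitted values up to floating-point noise:
all.equal(as.numeric(my_func(x = df$x, z = z)), as.numeric(fitted(model1)))
## should be TRUE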
Any of these give the fitted values:
fitted(model1)
predict(model1)
model.matrix(model1) %*% coef(model1)
y - resid(model1)
X <- model.matrix(model1); X %*% qr.solve(X, y)
X <- cbind(1, x, x^2, z); X %*% qr.solve(X, y)
Any of these give the predicted values for any particular x and z:
cbind(1, x, x^2, z) %*% coef(model1)
predict(model1, list(x = x, z = z))
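For example, predictions at a few hypothetical new points (the values below are only illustrative, not from the original answer):
newx <- c(0, 5, 10)
newz <- c(-8, -8, -8)
predict(model1, list(x = newx, z = newz))
cbind(1, newx, newx^2, newz) %*% coef(model1)   # same numbers, as a column vector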

R - Predicted variables not included in linear regression graph

Here's the relevant code snippet. How do I get the predicted values to display in the plot?
df <- data.frame(X = 2010:2022, Y = c(11539282, 11543332, 11546969, 11567845, 11593741, 11606027, 11622554, 11658609, rep(NA, 5)))
model.1 <- lm(formula = Y ~ X, data = df)
predict(object = model.1, newdata = df)
plot(X, Y, ylim=c(11500000,11750000))
lines(sort(X), fitted(model.1)[order(X)])
Make these changes:
when creating the model use na.action = na.exclude
use the formula methods for plot and lines
use fitted(model.2) as the predicted values
no sorting is needed as X is already sorted
giving this code:
model.2 <- lm(Y ~ X, df, na.action = na.exclude)
plot(Y ~ X, df)
lines(fitted(model.2) ~ X, df)
or use abline in which case this shorter code can be used:
model.3 <- lm(Y ~ X, df)
plot(Y ~ X, df)
abline(model.3)
In either case we get a plot of Y against X with the fitted regression line.
Added
Based on clarification in the comments we could do this (or if you want an even wider range try ylim = extendrange(pred, f = .10) to extend the range by 10%, say, on either side).
pred <- predict(model.3, df)
plot(Y ~ X, df, ylim = range(pred))
lines(pred ~ X, df)
giving a plot whose y-axis range is extended to include all of the predicted values.

Extracting x-axis intercept from a linear fit in R

I have some data generated using the following lines of code,
x <- c(1:10)
y <- x^3
z <- y-20
s <- z/3
t <- s*6
q <- s*y
x1 <- cbind(x,y,z,s,t,q)
x1 <- data.frame(x1)
I would like to plot x versus y, s, and t, so I melt the data frame x1 first:
library(reshape2)
xm <- melt(x1, id = names(x1)[1], measure = names(x1)[c(2, 4, 5)], variable = "cols")
Then I plot them along with their linear fits using the following code,
library(ggplot2)
plt <- ggplot(xm, aes(x = x, y = value, color = cols)) +
  geom_point(size = 3) +
  labs(x = "x", y = "y") +
  geom_smooth(method = "lm", se = FALSE)
plt
The generated plot shows y, s and t against x, each with its linear fit.
Now I would like to interpolate the x-intercept of the linear fit, i.e. the point in the plot where the y value is 0.
The following lines of code, as shown here, extract the slope and y-intercept.
fits <- by(xm[-2], xm$cols, function(i) coef(lm(value ~ x, i)))
data.frame(cols = names(fits), do.call(rbind, fits))
Is there any way to extract the x-intercept other than calculating it manually from the slope and y-intercept?
Thanks for the help!
You could do inverse prediction as implemented in package chemCal for calibrations if you don't want to calculate this yourself:
library(chemCal)
res <- by(xm[-2], xm$cols, function(i) inverse.predict(lm(value ~ x, i), 0)$Prediction)
res[1:3]
# xm$cols
#        y        s        t
# 2.629981 2.819734 2.819734
Edit:
Maybe you prefer this:
library(plyr)
res <- ddply(xm, .(cols),
             function(i) data.frame(xinter = inverse.predict(lm(value ~ x, i), 0)$Prediction))
# cols xinter
# 1 y 2.629981
# 2 s 2.819734
# 3 t 2.819734
I don't think you can avoid computing the linear equation, though of course you don't have to do it by hand (unless you want to). For example:
by(xm[-2], xm$cols, function(i) {
  fit <- lm(value ~ x, i)
  print(fit)
  solve(coef(fit)[-1], -coef(fit)[1])
})
Call:
lm(formula = value ~ x, data = i)

Coefficients:
(Intercept)            x
     -277.2        105.4

Call:
lm(formula = value ~ x, data = i)

Coefficients:
(Intercept)            x
     -99.07        35.13

Call:
lm(formula = value ~ x, data = i)

Coefficients:
(Intercept)            x
     -594.4        210.8

xm$cols: y
[1] 2.629981
------------------------------------------------------------
xm$cols: s
[1] 2.819734
------------------------------------------------------------
xm$cols: t
[1] 2.819734
What was solved is basically -277.2 + 105.4*x = 0 for x, i.e. 105.4*x = 277.2 (the solve() call), giving x = 2.629981. It seems your lines 's' and 't' intersect the y = 0 axis at the same spot. If I understood correctly, your problem isn't extrapolation, since your x-range covers the intercept, but rather interpolation.
P.S. I think your code was missing require("reshape").
EDIT:
result <- c(by(xm[-2], xm$cols, function(i) {
  fit <- lm(value ~ x, i)
  print(fit)
  solve(coef(fit)[-1], -coef(fit)[1])
}))
print(result)
       y        s        t
2.629981 2.819734 2.819734
I found a way to calculate the x-intercept. First, create a data frame with the y-intercept and slope values:
par <- data.frame(cols = names(fits), do.call(rbind, fits))
Then rename the column headers to accurately denote the values:
colnames(par)[2] <- "y_intercept"
colnames(par)[3] <- "slope"
# Calculate the x-intercept by using the formula -(y_intercept)/slope
x_incpt <- -par[2]/par[3]
colnames(x_incpt) <- "x_intercept"
which gives the following result:
  x_intercept
y    2.629981
s    2.819734
t    2.819734

Using lapply on a list of models

I have generated a list of models, and would like to create a summary table.
As an example, here are two models:
x <- seq(1:10)
y <- sin(x)^2
model1 <- lm(y ~ x)
model2 <- lm(y ~ x + I(x^2) + I(x^3))
and two helper functions, the first generating the equation from the components of the formula:
get.model.equation <- function(x) {
  x <- as.character((x$call)$formula)
  x <- paste(x[2], x[1], x[3])
}
and the second generating the name of the model as a string:
get.model.name <- function(x) {
  x <- deparse(substitute(x))
}
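As a quick check (using model1 defined above), the first helper returns the formula as a string:
eqn <- get.model.equation(model1)
eqn
## [1] "y ~ x"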
With these, I create a summary table
model.list <- list(model1, model2)
AIC.data <- lapply(X = model.list, FUN = AIC)
AIC.data <- as.numeric(AIC.data)
model.models <- lapply(X = model.list, FUN = get.model.equation)
model.summary <- cbind(model.models, AIC.data)
model.summary <- as.data.frame(model.summary)
names(model.summary) <- c("Model", "AIC")
model.summary$AIC <- unlist(model.summary$AIC)
rm(AIC.data)
model.summary[order(model.summary$AIC),]
Which all works fine.
I'd like to add the model name to the table using get.model.name
x <- get.model.name(model1)
Which gives me "model1" as I want.
So now I apply the function to the list of models
model.names <- lapply(X = model.list, FUN = get.model.name)
but now instead of model1 I get X[[1L]]
How do I get model1 rather than X[[1L]]?
I'm after a table that looks like this:
Model Formula AIC
model1 y ~ x 11.89136
model2 y ~ x + I(x^2) + I(x^3) 15.03888
Do you want something like this?
model.list <- list(model1 = lm(y ~ x),
                   model2 = lm(y ~ x + I(x^2) + I(x^3)))
sapply(X = model.list, FUN = AIC)
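This returns a named vector of AIC values, matching the values in the question's table:
#   model1   model2
# 11.89136 15.03888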
I'd do something like this:
model.list <- list(model1 = lm(y ~ x),
                   model2 = lm(y ~ x + I(x^2) + I(x^3)))
# changed Reduce('rbind', ...) to do.call(rbind, ...) (Hadley's comment)
do.call(rbind,
        lapply(names(model.list), function(x)
          data.frame(model = x,
                     formula = get.model.equation(model.list[[x]]),
                     AIC = AIC(model.list[[x]]))))
# model formula AIC
# 1 model1 y ~ x 11.89136
# 2 model2 y ~ x + I(x^2) + I(x^3) 15.03888
Another option, with ldply, but see hadley's comment below for a more efficient use of ldply:
# prepare data
x <- seq(1:10)
y <- sin(x)^2
dat <- data.frame(x,y)
# create a list of named models (obviously these are not suited to the data here, just to make the workflow work...)
models <- list(model1 = lm(y ~ x, data = dat),
               model2 = lm(y ~ I(1/x), data = dat),
               model3 = lm(y ~ log(x), data = dat),
               model4 = nls(y ~ I(1/x*a) + b*x, data = dat, start = list(a = 1, b = 1)),
               model5 = nls(y ~ (a + b*log(x)), data = dat, start = setNames(coef(lm(y ~ log(x), data = dat)), c("a", "b"))),
               model6 = nls(y ~ I(exp(1)^(a + b * x)), data = dat, start = list(a = 0, b = 0)),
               model7 = nls(y ~ I(1/x*a) + b, data = dat, start = list(a = 1, b = 1))
)
library(plyr)
library(AICcmodavg) # for small sample sizes
# build table with model names, function, AIC and AICc
data.frame(cbind(ldply(models, function(x) cbind(AICc = AICc(x), AIC = AIC(x))),
                 model = sapply(1:length(models), function(x) deparse(formula(models[[x]])))))
.id AICc AIC model
1 model1 15.89136 11.89136 y ~ x
2 model2 15.78480 11.78480 y ~ I(1/x)
3 model3 15.80406 11.80406 y ~ log(x)
4 model4 16.62157 12.62157 y ~ I(1/x * a) + b * x
5 model5 15.80406 11.80406 y ~ (a + b * log(x))
6 model6 15.88937 11.88937 y ~ I(exp(1)^(a + b * x))
7 model7 15.78480 11.78480 y ~ I(1/x * a) + b
It's not immediately obvious to me how to replace the .id with a proper column name in the ldply call; any tips?
