R convert regression model fit to a function - r

I want to quickly extract the fit of a regression model to a function.
So I want to get from:
# generate some random data
set.seed(123)
x <- rnorm(n = 100, mean = 10, sd = 4)
z <- rnorm(n = 100, mean = -8, sd = 3)
y <- 9 * x - 10 * x ^ 2 + 5 * z + 10 + rnorm(n = 100, 0, 30)
df <- data.frame(x,y)
plot(df$x,df$y)
model1 <- lm(formula = y ~ x + I(x^2) + z, data = df)
summary(model1)
to a model_function(x) that describes the fitted values for me.
Of course I could do this by hand in a way like this:
model_function <- function(x, z, model) {
fit <- coefficients(model)["(Intercept)"] + coefficients(model)["x"]*x + coefficients(model)["I(x^2)"]*x^2 + coefficients(model)["z"]*z
return(fit)
}
fit <- model_function(df$x,df$z, model1)
which I can compare to the actual fitted values and (with some rounding errors) works perfectly.
all(round(as.numeric(model1$fitted.values),5) == round(fit,5))
But of course this is not a universal solution (e.g. more variables etc.).
So to be clear:
Is there an easy way to extract the fitted values relationship as a function with the coefficients that were just estimated?
Note: I know of course about predict and the ability to generate fitted values from new data - but I'm really looking for that underlying function. Maybe that's possible through predict?
Grateful for any help!

If you want an actual function you can do something like this:
get_func <- function(mod) {
vars <- as.list(attr(mod$terms, "variables"))[-(1:2)]
funcs <- lapply(vars, function(x) list(quote(`*`), 1, x))
terms <- mapply(function(x, y) {x[[2]] <- y; as.call(x)}, funcs, mod$coefficients[-1],
SIMPLIFY = FALSE)
terms <- c(as.numeric(mod$coefficients[1]), terms)
body <- Reduce(function(a, b) as.call(list(quote(`+`), a, b)), terms)
vars <- setNames(lapply(seq_along(vars), function(x) NULL), sapply(vars, as.character))
f <- as.function(c(do.call(alist, vars), body))
formals(f) <- formals(f)[!grepl("\\(", names(formals(f)))]
f
}
Which allows:
my_func <- get_func(model1)
my_func
#> function (x = NULL, z = NULL)
#> 48.6991866925322 + 3.31343108778127 * x + -9.77589420188036 * I(x^2) + 5.38229596972984 * z
<environment: 0x00000285a1982b48>
and
my_func(x = 1:10, z = 3)
#> [1] 58.38361 32.36936 -13.19668 -78.31451 -162.98413 -267.20553
#> [7] -390.97872 -534.30371 -697.18048 -879.60903
and
plot(1:10, my_func(x = 1:10, z = 3), type = "b")
At the moment, this would not work with interaction terms, etc, but should work for most simple linear models

Any of these give the fitted values:
fitted(model1)
predict(model1)
model.matrix(model1) %*% coef(model1)
y - resid(model1)
X <- model.matrix(model1); X %*% qr.solve(X, y)
X <- cbind(1, x, x^2, z); X %*% qr.solve(X, y)
Any of these give the predicted values for any particular x and z:
cbind(1, x, x^2, z) %*% coef(model1)
predict(model1, list(x = x, z = z))

Related

Estimating quantile correlations with rolling window

I would like to estimate the quantile correlations between two variables, say Y and X, using the rolling window. I am using the R package QCSIS for this purpose. I tried to do the following
library(QCSIS)
#Generate some random variables
n <- 4000
x <- rnorm(n)
y <- 2 * x + rt(n,df = 1)
tau <- 9 / 10
#calculate the static quantile correlation
fit<-qc(x = x, y = y, tau = tau)
fit$rho
#calculate the rolling window quantile correlations
s<-260 #The window size
Rho.mat <- matrix(0,1,(n-s+1)) #create empty matrix to store the correlation coefficients
#running the loop
for(i in 1:(n-s+1)) {
fit <- qc(x = x, y = y, tau = tau)
Rho.mat[,i] <- fit$rho
}
However, this code does not give the quantile correlation for each window and only repeats the static quantile correlation! Most of the other solutions I found online are related to linear regression and do not fit with the function I am using. That is why I am using a loop.
Use rollapplyr as follows to avoid loops:
library(zoo)
rollapplyr(cbind(x, y), s, function(z) qc(z[, 1], z[, 2], tau)$rho,
fill = NA, by.column = FALSE)
or over the indexes:
rollapplyr(seq_along(x), s, function(ix) qc(x[ix], y[ix], tau)$rho, fill = NA)
We can check the result like this:
library(zoo)
r.roll <- rollapplyr(cbind(x, y), s, function(z) qc(z[, 1], z[, 2], tau)$rho,
fill = NA, by.column = FALSE)
r.for <- x
for(i in seq_along(r.for)) {
r.for[i] <- if (i < s) NA else {
ix <- seq(to = i, length = s)
qc(x[ix], y[ix], tau = tau)$rho
}
}
identical(r.roll, r.for)
## [1] TRUE

Why do MASS:lm.ridge coefficents differ from those calculated manually?

When performing ridge regression manually, as it is defined
solve(t(X) %*% X + lbd*I) %*%t(X) %*% y
I get different results from those calculated by MASS::lm.ridge. Why? For ordinary linear regression the manual method (computing the pseudoinverse) works fine.
Here is my Minimal, Reproducible Example:
library(tidyverse)
ridgeRegression = function(X, y, lbd) {
Rinv = solve(t(X) %*% X + lbd*diag(ncol(X)))
t(Rinv %*% t(X) %*% y)
}
# generate some data:
set.seed(0)
tb1 = tibble(
x0 = 1,
x1 = seq(-1, 1, by=.01),
x2 = x1 + rnorm(length(x1), 0, .1),
y = x1 + x2 + rnorm(length(x1), 0, .5)
)
X = as.matrix(tb1 %>% select(x0, x1, x2))
# sanity check: force ordinary linear regression
# and compare it with the built-in linear regression:
ridgeRegression(X, tb1$y, 0) - coef(summary(lm(y ~ x1 + x2, data=tb1)))[, 1]
# looks the same: -2.94903e-17 1.487699e-14 -2.176037e-14
# compare manual ridge regression to MASS ridge regression:
ridgeRegression(X, tb1$y, 10) - coef(MASS::lm.ridge(y ~ x0 + x1 + x2 - 1, data=tb1, lambda = 10))
# noticeably different: -0.0001407148 0.003689412 -0.08905392
MASS::lm.ridge scales the data before modelling - this accounts for the difference in the coefficients.
You can confirm this by checking the function code by typing MASS::lm.ridge into the R console.
Here is the lm.ridge function with the scaling portion commented out:
X = as.matrix(tb1 %>% select(x0, x1, x2))
n <- nrow(X); p <- ncol(X)
#Xscale <- drop(rep(1/n, n) %*% X^2)^0.5
#X <- X/rep(Xscale, rep(n, p))
Xs <- svd(X)
rhs <- t(Xs$u) %*% tb1$y
d <- Xs$d
lscoef <- Xs$v %*% (rhs/d)
lsfit <- X %*% lscoef
resid <- tb1$y - lsfit
s2 <- sum(resid^2)/(n - p)
HKB <- (p-2)*s2/sum(lscoef^2)
LW <- (p-2)*s2*n/sum(lsfit^2)
k <- 1
dx <- length(d)
div <- d^2 + rep(10, rep(dx,k))
a <- drop(d*rhs)/div
dim(a) <- c(dx, k)
coef <- Xs$v %*% a
coef
# x0 x1 x2
#[1,] 0.01384984 0.8667353 0.9452382

How to define function arguments based on data.frame columns (R)?

I have a script that runs maximum likelihood estimation for a linear model. The model has several variables and I need to vary them occasionally, maybe add or drop some. The usual way to define the likelihood function is like this:
LL <- function(beta0, beta1, beta2, mu, sigma){
R = y - beta0*X$x0 + beta1*X$x1 + beta2*X$x2
R = dnorm(R, mu, sigma, log = T)
-sum(R)
}
I have dependent variable in vector y and covariates in data.frame X:
X <- data.frame(x0 = 1, x1 = runif(100), x2 = runif(100)*2)
y <- X$x0 + X$x1 + X$x2 + rnorm(100)
Now the amount of variables is subject to change by application and I need to reformulate the function so that it will take as many covariates as there are columns in the data.frame X. I was already able to reformulate this to a more general form:
cols <- 0:(ncol(X)-1)
betas <- paste0("beta", cols)
eqR <- paste0("y - ", paste0(betas, "*X$x", cols, collapse = " - "))
LL <- function(beta0, beta1, beta2, mu, sigma){
R = as.formula(eqR)
R = dnorm(R, mu, sigma, log = T)
-sum(R)
}
I'm still struggling to find a way to dynamically define the function so that it would take the same number of beta arguments as there are columns in the covariate matrix. Ellipsis is perhaps useful here? I also tried with do.call:
LL <- function(betas, mu, sigma){
R <- do.call(dnorm(as.formula(eqR), mu, sigma, log = T), betas)
-sum(R)
}
That doesn't work when you fit the model, which has another stumbling block in the list of initial values:
require(stats4)
fit <- mle(LL, start = list(beta0 = 0, beta1 = 0, beta2 = 0, mu = 0, sigma = 1))
Any ideas for this?
EDIT:
I made some advance with bbmle package:
require(bbmle)
dfModel <- cbind(y, X)
cols <- 0:(ncol(X)-1)
betas <-paste0("beta",cols)
betaList <- as.list(rep(0), length(betas)))
names(betaList) <- betas
initList <- c(betaList, mu = 0, sigma = 1)
fitML <- mle2(mu ~ dnorm(mean = y - beta0*x0 - beta1*x1 - beta2*x2, sd = sigma),
start = initList,
data = dfModel)
The above example works. But when I try to define the function beforehand with as.formula, I can't get it working. So the following does not work.
eqR <- paste0("y - ", paste0(betas, "*x", cols, collapse = " - "))
fitML <- mle2(mu ~ dnorm(mean = as.formula(eqR), sd = sigma),
start = initList,
data = dfModel)
The error message is:
Error in eval(expr, envir, enclos) : object 'beta0' not found
I suspect that this might have something to do with scoping - conflict between dnorm and as.formula? I just can't find workaround for that.
Try this:
betas = c(0,0,0)
X <- data.frame(x0 = 1, x1 = runif(100), x2 = runif(100)*2)
y <- apply(X,1,sum) + rnorm(100)
where betas is (b0, b1, b2, ...etc) and its length must be equal to the number of columns of X.
Since X could have a different number of columns y should be defined as above.
Your LL function should change to:
LL <- function(betas, mu, sigma){
R = y - as.matrix(X) %*% as.matrix(betas)
R = dnorm(R, mu, sigma, log = T)
-sum(R)
}
where %*% is the matrix product. This is the same as doing b[1]*X[,1] + b[2]*X[,2] + b[3]*X[,3] + ... + b[n]*X[,n]
With these changes, you could have data frame X with any number of columns, betas an array of the same length as columns of X.
I hope I understood what you needed.

fitting function for a given data set

I'm trying to fitting the following function y(x)=a*( 1 + (x^2)/(b^2) )^t to a particular set of data , where, a, b and t are constants that want to determine by fitting.
I try the following, for example
len <- 24
x = runif(len)
y = x^3 + runif(len, min = -0.1, max = 0.1)
plot(x, y)
s <- seq(from = 0, to = 1, length = 50)
lines(s, s^3, lty = 2)
df <- data.frame(x, y)
m <- nls(y~a*( 1 + (x^2)/(b^2) )^t, data = df, start = list(a=1,t=0, b=1), trace = T)
> Error in nlsModel(formula, mf, start, wts) :
singular gradient matrix at initial parameter estimates
Can someone help me to set this function to these points, even if the fitting becomes bad, the important is to get fit this function, ie that she run on the data
thanks everyone
Because your data are changing randomly, for some situations the value of a is close to zero and your function becomes zero. The curve fit procedure fails at that point. Randomizing the start parameters might work for some situations.
A slightly more stable output can be computed using the LM algorithm:
require("minpack.lm")
LMCurveFit <- function(df) {
# The function to be fit
FitFunction <- function(params, x) {
with (
as.list(params), {
a*(1 + x^2/b^2)^t
}
)
}
# Residual
Residual <- function(params, x, y) {
FitFunction(params, x) - y
}
# Sum of squares of residuals
ssqfun <- function(params, x, y) {
sum(Residual(params, x, y)^2)
}
# Normalize the data
x_max = max(x)
y_max = max(y)
df$x = df$x/x_max
df$y = df$y/y_max
# Define start parameters
a_start = 0.1
b_start = 1.0
t_start = 1.0
param_start = c(a = a_start,
b = b_start,
t = t_start)
# Do LM fit
nls.out <- nls.lm(par = param_start,
fn = Residual,
control = nls.lm.control(nprint=0,
ftol=.Machine$double.eps,
ptol=.Machine$double.eps,
maxfev=10000, maxiter=1024),
x = df$x,
y = df$y)
# Revert scaling
nls.out$par[1] = nls.out$par[1]*y_max
nls.out$par[2] = nls.out$par[2]*x_max
# Get the parameters
params_fit = coef(nls.out)
print(params_fit)
# Compute predicted values
predicted = FitFunction(as.list(params_fit), df$x*x_max)
}
# LM fit
pred_y = LMCurveFit(df)
lines(x, pred_y)

Extracting x-axis intercept from a linear fit in R

I have some data generated using the following lines of code,
x <- c(1:10)
y <- x^3
z <- y-20
s <- z/3
t <- s*6
q <- s*y
x1 <- cbind(x,y,z,s,t,q)
x1 <- data.frame(x1)
I would like to plot x versus y,s, and t so I melt the data frame x1 first,
library(reshape2)
xm <- melt(x1, id=names(x1)[1], measure=names(x1)[c(2, 4, 5)], variable = "cols"`)
Then I plot them along with their linear fits using the following code,
library(ggplot2)
plt <- ggplot(xm, aes(x = x, y = value, color = cols)) +
geom_point(size = 3) +
labs(x = "x", y = "y") +
geom_smooth(method = "lm", se = FALSE)
plt
The plot which is generated is shown below,
Now I would liked to interpolate the x-intercept of the linear fit. The point in the plot where y axis value is 0.
The following lines of code as shown here, extracts the slope and y-intercept.
fits <- by(xm[-2], xm$cols, function(i) coef(lm(value ~ x, i)))
data.frame(cols = names(fits), do.call(rbind, fits))
Is there any way how I can extract the x-intercept other than manually calculating from the slope and y-intercept?
Thanks for the help!
You could do inverse prediction as implemented in package chemCal for calibrations if you don't want to calculate this yourself:
library(chemCal)
res <- by(xm[-2], xm$cols, function(i) inverse.predict(lm(value ~ x, i), 0)$Prediction)
res[1:3]
#xm$cols
#y s t
#2.629981 2.819734 2.819734
Edit:
Maybe you prefer this:
library(plyr)
res <- ddply(xm, .(cols),
function(i) data.frame(xinter=inverse.predict(lm(value ~ x, i), 0)$Prediction))
# cols xinter
# 1 y 2.629981
# 2 s 2.819734
# 3 t 2.819734
I don't think you can avoid computing the linear equation, though of course you don't have to do it by hand (unless you want to). For example:
by(xm[-2], xm$cols, function(i) {
fit <- lm(value~x, i); print(fit); solve(coef(fit)[-1], -coef(fit)[1] )}
)
Call:
lm(formula = value ~ x, data = i)
Coefficients:
(Intercept) x
-277.2 105.4
Call:
lm(formula = value ~ x, data = i)
Coefficients:
(Intercept) x
-99.07 35.13
Call:
lm(formula = value ~ x, data = i)
Coefficients:
(Intercept) x
-594.4 210.8
xm$cols: y
[1] 2.629981
-----------------------------------------------------------------------------------------------------------------
xm$cols: s
[1] 2.819734
-----------------------------------------------------------------------------------------------------------------
xm$cols: t
[1] 2.819734
What was solved is basically -277.2 + 105.4*x = 0 for x -> 105.4*x = 277.2 (the solve-function call) -> x = 2.629981. Seems your lines 's' and 't' intersect the y=0 axis at the same spot. If I understood correctly, your problem isn't extrapolation since your x-range covers the intercept but instead interpolation.
Ps. I think your code was missing: require("reshape")
EDIT:
result <- c(by(xm[-2], xm$cols, function(i) { fit <- lm(value~x, i); print(fit); solve(coef(fit)[-1], -coef(fit)[1] )} )); print(result)
> print(result)
y s t
2.629981 2.819734 2.819734
I found a way to calculate the x-intercept, first create a data frame with the y-intercept and slope values,
par <- data.frame(cols = names(fits), do.call(rbind, fits))
Then rename column header names to accurately denote the values,
colnames(par)[2] <- "y_intercept"
colnames(par)[3] <- "slope"
# Calculate the x-intercept by using the formula -(y_intercept)/slope
x_incpt <- -par[2]/par[3]
colnames(x_incpt) <- "x_intercept"
Which gives the following result,
x_intercept
y 2.629981
s 2.819734
t 2.819734

Resources