I was wondering how to obtain slope estimates, with standard errors and p-values, for each segment of an lme model that uses linear B-splines.
I can get the slope estimates using predict, but not the standard errors and p-values.
Here is an example:
rm(list = ls())
library(splines)
library(nlme)
getY <- function(x) ifelse(x < 7, x * 1.3, x * 0.6) + rnorm(length(x))
set.seed(123)
data <- data.frame(Id = numeric(0), X = numeric(0), Y = numeric(0))
for (i in 1:10) {
X <- sample(1:10, 4)
Y <- getY(X) + rnorm(1, 0.5)
Id <- rep(i, 4)
data <- rbind(data, cbind(Id = Id, X = X, Y = Y))
}
gdata <- groupedData(Y ~ X | Id, data)
mod <- lme(fixed = Y ~ bs(X, degree = 1, knots = 7), data = gdata,
           random = ~ 1 | Id)
summary(mod)
Linear mixed-effects model fit by REML
Data: gdata
AIC BIC logLik
158.2 166.2 -74.09
Random effects:
Formula: ~1 | Id
(Intercept) Residual
StdDev: 1.217 1.389
Fixed effects: Y ~ bs(X, degree = 1, knots = 7)
Value Std.Error DF t-value p-value
(Intercept) 3.098 0.5817 28 5.326 0e+00
bs(X, degree = 1, knots = 7)1 4.031 0.7714 28 5.225 0e+00
bs(X, degree = 1, knots = 7)2 3.253 0.7258 28 4.481 1e-04
Correlation:
(Intr) b(X,d=1,k=7)1
bs(X, degree = 1, knots = 7)1 -0.597
bs(X, degree = 1, knots = 7)2 -0.385 0.233
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.469915 -0.628202 0.005586 0.541398 1.748387
Number of Observations: 40
Number of Groups: 10
plot(augPred(mod))
pred1 <- predict(mod, data.frame(X = 1:2), level = 0)
pred2 <- predict(mod, data.frame(X = 8:9), level = 0)
(slope1 <- diff(pred1))
1 0.6718
(slope2 <- diff(pred2))
1 -0.2594
Wouldn't you just take the differences of a predict result?
predict(mod, newdata=data.frame(X=1:10, Id=1) )
1 1 1 1 1 1 1 1 1
3.449572 4.121362 4.793152 5.464941 6.136731 6.808521 7.480311 7.220928 6.961544
1
6.702161
attr(,"label")
[1] "Predicted values"
So:
plot(predict(mod, newdata = data.frame(X = 1:10, Id = 1)), ylim = c(-2, 8))
lines(1:9, diff(predict(mod, newdata = data.frame(X = 1:10, Id = 1))))
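To also get the standard errors and p-values asked for, note that each segment slope is a linear contrast of the fixed effects, so the delta method applies. A sketch, assuming the fitted data span X = 1 to 10 so that the default bs() boundary knots below match those used in the fit:
xg <- data.frame(X = 1:10)
B <- model.matrix(~ bs(X, degree = 1, knots = 7), xg)
c1 <- B[2, ] - B[1, ]  # contrast for the first-segment slope (X = 1 to 2)
c2 <- B[9, ] - B[8, ]  # contrast for the second-segment slope (X = 8 to 9)
beta <- fixef(mod)  # fixed effects
V <- vcov(mod)      # their approximate covariance matrix
est <- c(sum(c1 * beta), sum(c2 * beta))
se <- sqrt(c(c1 %*% V %*% c1, c2 %*% V %*% c2))
tval <- est / se
pval <- 2 * pt(-abs(tval), df = 28)  # df as reported by summary(mod) above
cbind(slope = est, SE = se, t = tval, p = pval)
The point estimates should match the diff(predict(...)) values above.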
I want to build a function additive_glm that will allow the user to pass additional arguments to the glm function if needed.
Let's consider data:
set.seed(42)
bin_var <- sample(0:1, 125, T)
indep_1 <- rnorm(125)
indep_2 <- rexp(125)
df <- data.frame("Norm" = indep_1, "Exp" = indep_2)
And my function additive_glm:
additive_glm <- function(y, x, glm_args = NULL){
do.call("glm", c(list(
formula = y ~ ., data = base::quote(as.data.frame(x)),
family = binomial(link = 'logit')
), glm_args))
}
But now if I want to run my function:
additive_glm(bin_var, df)
I get:
Call: glm(formula = y ~ ., family = structure(list(family = "binomial",
link = "logit", linkfun = function (mu)
.Call(C_logit_link, mu), linkinv = function (eta)
.Call(C_logit_linkinv, eta), variance = function (mu)
mu * (1 - mu), dev.resids = function (y, mu, wt)
.Call(C_binomial_dev_resids, y, mu, wt), aic = function (y,
n, mu, wt, dev)
{
m <- if (any(n > 1))
n
else wt
-2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m *
y), round(m), mu, log = TRUE))
}, mu.eta = function (eta)
.Call(C_logit_mu_eta, eta), initialize = expression({
if (NCOL(y) == 1) {
if (is.factor(y))
y <- y != levels(y)[1L]
n <- rep.int(1, nobs)
y[weights == 0] <- 0
if (any(y < 0 | y > 1))
stop("y values must be 0 <= y <= 1")
mustart <- (weights * y + 0.5)/(weights + 1)
m <- weights * y
if (any(abs(m - round(m)) > 0.001))
warning("non-integer #successes in a binomial glm!")
}
else if (NCOL(y) == 2) {
if (any(abs(y - round(y)) > 0.001))
warning("non-integer counts in a binomial glm!")
n <- y[, 1] + y[, 2]
y <- ifelse(n == 0, 0, y[, 1]/n)
weights <- weights * n
mustart <- (n * y + 0.5)/(n + 1)
}
else stop("for the 'binomial' family, y must be a vector of 0 and 1's\nor a 2 column matrix where col 1 is no. successes and col 2 is no. failures")
}), validmu = function (mu)
all(is.finite(mu)) && all(mu > 0 & mu < 1), valideta = function (eta)
TRUE, simulate = function (object, nsim)
{
ftd <- fitted(object)
n <- length(ftd)
ntot <- n * nsim
wts <- object$prior.weights
if (any(wts%%1 != 0))
stop("cannot simulate from non-integer prior.weights")
if (!is.null(m <- object$model)) {
y <- model.response(m)
if (is.factor(y)) {
yy <- factor(1 + rbinom(ntot, size = 1, prob = ftd),
labels = levels(y))
split(yy, rep(seq_len(nsim), each = n))
}
else if (is.matrix(y) && ncol(y) == 2) {
yy <- vector("list", nsim)
for (i in seq_len(nsim)) {
Y <- rbinom(n, size = wts, prob = ftd)
YY <- cbind(Y, wts - Y)
colnames(YY) <- colnames(y)
yy[[i]] <- YY
}
yy
}
else rbinom(ntot, size = wts, prob = ftd)/wts
}
else rbinom(ntot, size = wts, prob = ftd)/wts
}), class = "family"), data = as.data.frame(x))
Coefficients:
(Intercept) Norm Exp
0.2235 -0.2501 -0.2612
Degrees of Freedom: 124 Total (i.e. Null); 122 Residual
Null Deviance: 173.2
Residual Deviance: 169.7 AIC: 175.7
So I get exactly what I want; however, it is preceded by a huge Call section. I looked for ways to get rid of it, but without success. Do you know how to omit this huge block of unnecessary output?
1) Put the family argument within quote(...). Only the line marked ## is changed.
additive_glm <- function(y, x, glm_args = NULL){
do.call("glm", c(list(
formula = y ~ ., data = base::quote(as.data.frame(x)),
family = quote(binomial(link = 'logit')) ##
), glm_args))
}
additive_glm(bin_var, df)
giving:
Call: glm(formula = y ~ ., family = binomial(link = "logit"), data = as.data.frame(x))
Coefficients:
(Intercept) Norm Exp
0.32821 -0.06504 -0.05252
Degrees of Freedom: 124 Total (i.e. Null); 122 Residual
Null Deviance: 171
Residual Deviance: 170.7 AIC: 176.7
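Why the quote() matters: do.call evaluates each element of its argument list before constructing the call, so without quote() the evaluated family object itself is spliced into the call that glm stores and later deparses. A minimal illustration, using a throwaway function f that just returns its own call:
f <- function(x) sys.call()
do.call("f", list(x = binomial()))          # call embeds the deparsed family object
do.call("f", list(x = quote(binomial())))   # prints: f(x = binomial())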
2) Another possibility is:
additive_glm2 <- function(y, x, ...){
glm(y ~ ., data = as.data.frame(x), family = binomial(link = "logit"), ...)
}
additive_glm2(bin_var, df)
giving:
Call: glm(formula = y ~ ., family = binomial(link = "logit"), data = as.data.frame(x))
Coefficients:
(Intercept) Norm Exp
0.32821 -0.06504 -0.05252
Degrees of Freedom: 124 Total (i.e. Null); 122 Residual
Null Deviance: 171
Residual Deviance: 170.7 AIC: 176.7
I don't understand why you are using do.call. I would do this:
additive_glm <- function(y, x, family = binomial(link = 'logit'), ...){
mc <- match.call()
yname <- mc[["y"]]
xname <- mc[["x"]]
x[[as.character(yname)]] <- y
assign(as.character(xname), x)
eval(substitute(glm(yname ~ ., data = xname, family = family, ...), env = environment()))
}
additive_glm(bin_var, df)
#Call: glm(formula = bin_var ~ ., family = binomial(link = "logit"),
# data = df)
#
#Coefficients:
#(Intercept) Norm Exp
# 0.32821 -0.06504 -0.05252
#
#Degrees of Freedom: 124 Total (i.e. Null); 122 Residual
#Null Deviance: 171
#Residual Deviance: 170.7 AIC: 176.7
Note the nicely printed call.
I am trying to fit a crossed nonlinear random-effects model, analogous to the linear random-effects models mentioned in this question and in this mailing list post, using the nlme package. However, I get an error regardless of what I try. Here is an example:
library(nlme)
#####
# simulate data
set.seed(18112003)
na <- 30
nb <- 30
sigma_a <- 1
sigma_b <- .5
sigma_res <- .33
n <- na*nb
a <- gl(na,1,n)
b <- gl(nb,na,n)
u <- gl(1,1,n)
x <- runif(n, -3, 3)
y_no_noise <- x + sin(2 * x)
y <-
x + sin(2 * x) +
rnorm(na, sd = sigma_a)[as.integer(a)] +
rnorm(nb, sd = sigma_b)[as.integer(b)] +
rnorm(n, sd = sigma_res)
#####
# works in the linear model where we know the true parameter
fit <- lme(
# somehow we found the right values
y ~ x + sin(2 * x),
random = list(u = pdBlocked(list(pdIdent(~ a - 1), pdIdent(~ b - 1)))))
vv <- VarCorr(fit)
vv2 <- vv[c("a1", "b1"), ]
storage.mode(vv2) <- "numeric"
print(vv2,digits=4)
#R Variance StdDev
#R a1 1.016 1.0082
#R b1 0.221 0.4701
#####
# now try to do the same with `nlme`
fit <- nlme(
y ~ c0 + sin(c1),
fixed = list(c0 ~ x, c1 ~ x - 1),
random = list(u = pdBlocked(list(pdIdent(~ a - 1), pdIdent(~ b - 1)))),
start = c(0, 0.5, 1))
#R Error in nlme.formula(y ~ c0 + sin(c1), fixed = list(c0 ~ x, c1 ~ :
#R 'random' must be a formula or list of formulae
The lme example is similar to the one on pages 163-166 of "Mixed-effects Models in S and S-PLUS", with only 2 random effects instead of 3.
I should have used two-sided formulas, as described in help("nlme"):
fit <- nlme(
y ~ c0 + c1 + sin(c2),
fixed = list(c0 ~ 1, c1 ~ x - 1, c2 ~ x - 1),
random = list(u = pdBlocked(list(pdIdent(c0 ~ a - 1), pdIdent(c1 ~ b - 1)))),
start = c(0, 0.5, 1))
# fixed effects estimates
fixef(fit)
#R c0.(Intercept) c1.x c2.x
#R -0.1788218 0.9956076 2.0022338
# covariance estimates
vv <- VarCorr(fit)
vv2 <- vv[c("c0.a1", "c1.b1"), ]
storage.mode(vv2) <- "numeric"
print(vv2,digits=4)
#R Variance StdDev
#R c0.a1 0.9884 0.9942
#R c1.b1 0.2197 0.4688
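As a further check on these estimates, approximate confidence intervals for the variance components are available from intervals() (a sketch; it can fail when the estimated information matrix is nearly singular):
intervals(fit, which = "var-cov")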
I am trying to estimate confidence intervals for several parameters of a nonlinear model using bootstrapping. Right now, I bootstrap each parameter individually, so I have to refit the model several times.
Here is an example:
library(boot)
library(dplyr)  # for %>% and slice() used below
# generate some data:
x <- rnorm(300, mean = 5, sd = 2)
y <- x^2 * rnorm(300, mean = 1.5, sd = 1) + rnorm(300, mean = 3, sd = 1)
data <- data.frame(x = x, y = y)
# this is my model: nls(y ~ b1*x^2+b2, data = data, start = list(b1 = 1.5,b2 = 3))
# functions for bootstrapping:
getParamB1 <- function(x1, idx){
data <- x1 %>%
dplyr::slice(idx)
model <- nls(y ~ b1*x^2+b2, data = data, start = list(b1 = 1.5,b2 = 3))
coef(model)[['b1']]
}
getParamB2 <- function(x1, idx){
data <- x1 %>%
dplyr::slice(idx)
model <- nls(y ~ b1*x^2+b2, data = data, start = list(b1 = 1.5,b2 = 3))
coef(model)[['b2']]
}
# Calculate bootstrap confidence intervals
btrpB1 <- boot(data, statistic = getParamB1, R=200)
btrpB2 <- boot(data, statistic = getParamB2, R=200)
ciB1 <- boot.ci(btrpB1)
ciB2 <- boot.ci(btrpB2)
This is of course not very nice code. Is there a way to estimate confidence intervals for several parameters (here b1 and b2) at once?
How about this?
library(boot)
# generate some data:
x <- rnorm(300, mean = 5, sd = 2)
y <- x^2 * rnorm(300, mean = 1.5, sd = 1) + rnorm(300, mean = 3, sd = 1)
df <- data.frame(x = x, y = y)
m1 <- nls(y ~ b1 * x^2 + b2, data = df, start = list(b1 = 1.5, b2 = 3))
boot.coef <- function(mod, data, indices) {
  # recreate, under the data name stored in the fitted model, a resampled copy
  assign(deparse(mod$data), data[indices, ])
  # refit the model on the resampled data and return all coefficients at once
  m <- eval(mod$call)
  coef(m)
}
results <- boot(data = df, statistic = boot.coef,
R = 1000, mod = m1)
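Since the statistic now returns both coefficients at once, the per-parameter intervals come from boot.ci() via its index argument:
boot.ci(results, type = "perc", index = 1)  # interval for b1
boot.ci(results, type = "perc", index = 2)  # interval for b2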
DF <- data.frame(factor1 = rep(1:4, 1000), factor2 = rep(1:4, each = 1000),
                 base = rnorm(4000, 0, 1), dep = rnorm(4000, 400, 5))
DF$f1_1 = DF$factor1 == 1
DF$f1_2 = DF$factor1 == 2
DF$f1_3 = DF$factor1 == 3
DF$f1_4 = DF$factor1 == 4
DF$f2_1 = DF$factor2 == 1
DF$f2_2 = DF$factor2 == 2
DF$f2_3 = DF$factor2 == 3
DF$f2_4 = DF$factor2 == 4
I want to run the following regression:
Dep = (f1_1 + f1_2 + f1_3 + f1_4)*(f2_1 + f2_2 + f2_3 + f2_4)*(base+base^2+base^3+base^4+base^5)
Is there a smarter way to do it?
You should code factor1 and factor2 as real factor variables. Also, it is better to use poly for polynomials. Here is what we can do:
DF <- data.frame(factor1=rep(1:4,1000), factor2 = rep(1:4,each=1000),
base = rnorm(4000,0,1), dep = rnorm(4000,400,5))
DF$factor1 <- as.factor(DF$factor1)
DF$factor2 <- as.factor(DF$factor2)
fit <- lm(dep ~ factor1 * factor2 * poly(base, degree = 5), data = DF)
By default, poly generates an orthogonal basis for numerical stability. If you want ordinary polynomials like base + base^2 + base^3 + ..., use poly(base, degree = 5, raw = TRUE).
Be aware that you will get a lot of parameters from this model, as you are fitting a fifth-order polynomial for each pair of levels of factor1 and factor2; the full interaction here has 4 x 4 x 6 = 96 coefficients.
Consider a small example.
set.seed(0)
f1 <- sample(gl(3, 20, labels = letters[1:3])) ## randomized balanced factor
f2 <- sample(gl(3, 20, labels = LETTERS[1:3])) ## randomized balanced factor
x <- runif(3 * 20) ## numerical covariate
y <- rnorm(3 * 20) ## toy response
fit <- lm(y ~ f1 * f2 * poly(x, 2))
#Call:
#lm(formula = y ~ f1 * f2 * poly(x, 2))
#
#Coefficients:
# (Intercept) f1b f1c
# -0.5387 0.8776 0.1572
# f2B f2C poly(x, 2)1
# 0.5113 1.0139 5.8345
# poly(x, 2)2 f1b:f2B f1c:f2B
# 2.4373 1.0666 0.1372
# f1b:f2C f1c:f2C f1b:poly(x, 2)1
# -1.4951 -1.4601 -6.2338
# f1c:poly(x, 2)1 f1b:poly(x, 2)2 f1c:poly(x, 2)2
# -11.0760 -2.3668 1.9708
# f2B:poly(x, 2)1 f2C:poly(x, 2)1 f2B:poly(x, 2)2
# -3.7127 -5.8253 5.6227
# f2C:poly(x, 2)2 f1b:f2B:poly(x, 2)1 f1c:f2B:poly(x, 2)1
# -7.3582 20.9179 11.6270
#f1b:f2C:poly(x, 2)1 f1c:f2C:poly(x, 2)1 f1b:f2B:poly(x, 2)2
# 1.2897 11.2041 12.8096
#f1c:f2B:poly(x, 2)2 f1b:f2C:poly(x, 2)2 f1c:f2C:poly(x, 2)2
# -9.8476 10.6664 4.5582
Note that even with 3 levels per factor and just a quadratic polynomial, we already end up with 3 x 3 x 3 = 27 coefficients.
Using I() forces the formula to treat +, -, *, and / as arithmetic operators rather than formula operators. Example: lm(y ~ I(x1 + x2)).
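For instance, on some toy data, the two formulas below estimate different models: the first fits separate coefficients for x1 and x2, while the I() version fits a single coefficient for their sum:
set.seed(1)
x1 <- rnorm(50)
x2 <- rnorm(50)
y <- 2 * (x1 + x2) + rnorm(50)
coef(lm(y ~ x1 + x2))      # two slope coefficients
coef(lm(y ~ I(x1 + x2)))   # one coefficient for the sum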
I would like to call glm in my function, a minimum example is:
my.glm <- function(...){
fit <- glm(...)
summary(fit)
}
However, it gives an error.
a <- data.frame(x=rpois(100, 2), y=rnorm(100) )
glm(x ~ 1, offset=y, family=poisson, data=a)
my.glm(x ~ 1, offset=y, family=poisson, data=a) # error eval(expr, envir, enclos)
What can I do?
The naive version fails because glm internally rebuilds and re-evaluates its own call, and arguments hidden inside ... cannot be resolved there. You can use match.call in the wrapper to expand the ..., and modify its output to make it a call to glm:
my.glm <- function(...){
cl <- match.call()
cl[1] <- call("glm")
fit <- eval(cl)
summary(fit)
}
my.glm(x ~ 1, offset=y, family=poisson, data=a)
Call:
glm(formula = x ~ 1, family = poisson, data = a, offset = y)
Deviance Residuals:
Min 1Q Median 3Q Max
-7.1789 -0.8575 0.3065 1.5343 4.4896
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.07628 0.07433 1.026 0.305
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 346 on 99 degrees of freedom
Residual deviance: 346 on 99 degrees of freedom
AIC: 559.46
Number of Fisher Scoring iterations: 6
A solution in case you want to modify some things and then pass them to glm() along with ... (i.e., ... holds additional arguments passed to glm()). This requires the rlang package, though there is probably a way to do it without (see the base-R sketch after the output below).
glm_wrap <- function(data, formula, ...) {
#e.g. modify data and formula
data$new <- data$x + rnorm(nrow(data))
f <- update(formula, .~. + new)
#construct new call
new_call <- as.call(c(list(rlang::sym("glm"), formula = f, data = data), rlang::exprs(...)))
eval(new_call)
}
The resulting call is unfortunately long and ugly though.
df <- data.frame(y = 1:10, x = rnorm(10), z = runif(10, 1, 3))
glm_wrap(data = df, formula = y~x, family = gaussian(link = "log"), offset = log(z))
#>
#> Call: glm(formula = y ~ x + new, family = gaussian(link = "log"), data = structure(list(
#> y = 1:10, x = c(0.788586544201169, -0.191055916962356, -0.709038064642618,
#> -1.43594109422505, 0.139431523468874, 1.58756249459749, -0.699123220004699,
#> 0.824223253644347, 0.979299697212903, -0.766809343110728),
#> z = c(1.40056129638106, 1.53261906700209, 1.59653351828456,
#> 2.90909940004349, 2.1954998113215, 2.77657635230571, 2.63835062459111,
#> 2.78547951159999, 2.52235971018672, 1.20802361145616), new = c(-1.4056733559404,
#> -0.590623492831404, -0.460389391631124, 0.376223909604533,
#> -0.0865283753921801, 1.42297343043252, -0.391232902630507,
#> 0.835906008542682, 1.49391399054269, -0.861719595343475)), row.names = c(NA,
#> -10L), class = "data.frame"), offset = log(z))
#>
#> Coefficients:
#> (Intercept) x new
#> 0.87768 0.05808 0.03074
#>
#> Degrees of Freedom: 9 Total (i.e. Null); 7 Residual
#> Null Deviance: 79.57
#> Residual Deviance: 77.64 AIC: 56.87
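As for doing it without rlang: a base-R sketch of the same idea using match.call(), with a hypothetical glm_wrap2 that also keeps the printed call short by leaving data as a symbol:
glm_wrap2 <- function(data, formula, ...) {
  # same toy modification as above
  data$new <- data$x + rnorm(nrow(data))
  f <- update(formula, . ~ . + new)
  # rebuild the call: swap in glm, the new formula, and the local data
  cl <- match.call()
  cl[[1]] <- quote(glm)
  cl$formula <- f
  cl$data <- quote(data)  # the printed call shows `data`, not the deparsed frame
  eval(cl, envir = environment())
}
glm_wrap2(data = df, formula = y ~ x, family = gaussian(link = "log"), offset = log(z))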