Getting Confidence Intervals from predicted values from a nlme model from package medrc - r

I am trying to figure out how to get confidence intervals from predicted values from a model run on medrc (nlme model). The code worked on the regular drc package model, which does not use random effects, so I assume there is something I am not doing right with this nlme model to get CI because I am getting errors.
Below is an example data frame of the data I am using
df <- data.frame(Geno = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,8,8,8,8,
9,9,9,9,10,10,10,10,11,11,11,11,12,12,12,12,13,13,13,13,14,14,14,14),
Treatment = c(3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",
3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",
3,6,9,"MMM",3,6,9,"MMM"),
Temp = c(32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,
32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,
32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,
32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,
32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535),
PAM = c(0.62225,0.593,0.35775,0.654,0.60625,0.5846667,0.316,0.60875,0.62275,0.60875,0.32125,
0.63725,0.60275,0.588,0.32275,0.60875,0.65225,0.6185,0.29925,0.64525,0.61925,0.61775,
0.11725,0.596,0.603,0.6065,0.2545,0.59025,0.586,0.5895,0.27025,0.59125,0.6345,0.6135,
0.3755,0.622,0.53375,0.552,0.2485,0.51925,0.6375,0.6256667,0.3575,0.63975,0.59375,0.6055,
0.333,0.64125,0.55275,0.51025,0.319,0.55725,0.6375,0.64725,0.348,0.66125))
df$Geno <- as.factor(df$Geno)
With this data, I am running this model that has 3 parameters for the dose-response curve model, b =slope, d= max, e= ED50.
model <- medrm(PAM ~ Temp,
data=df,
random= d + e ~ 1|Geno,
fct=LL.3(),
control=nlmeControl(msMaxIter = 2000, maxIter=2000, minScale=0.00001, tolerance=0.1, pnlsTol=1))
summary(model)
plot(model)
From this model I want to make prediction values for different temperatures along the model
model_preddata = data.frame(Temp = seq(28,39, length.out = 100))
model_pred = as.data.frame(predict(model, newdata = model_preddata, interval = 'confidence'))
with this I get an error but I can make it predict the PAM values if I add this
model_pred = as.data.frame(predict(model, newdata = model_preddata, interval = 'confidence', level = 0))
However this does not give me the lower and upper bounds columns like it does when I run this code with other non mixed effect models.
Can anyone help me figure out how to get the CI from the predicted values of this model

Related

Different prediction results for a glm between sjPlot get_model_data and predict.glm, especially, but not only, with an averaged model

I run a glm with negative binomial to test the effect of two categorical variables on count data. I then used model average for all possible models and tried to predict the model's response using two methods (see code below):
using predict
using get_model_data from sjPlot
The model fit gives the same results, but the CI are much bigger with predict. I also tried this with a simple model (not averaged) and got closer results, but still different.
m1 = glm.nb(C~ A*B, data = Route) #A has two levels and B has three levels
d1 = dredge(m1)
m.avg1 = model.avg(d1, fit = T)
newdat = expand.grid(B= c("B1","B2","B3"), A = c("A1", "A2"))
as.data.frame(predict(m.avg1, newdata = newdat, se.fit = T, type = "response")) %>%
mutate(CI = 1.96*(se.fit), CI.u = fit+CI, CI.l = fit-CI) %>%
cbind(.,newdat)
get_model_data(m.avg1,type="pred", terms = c("B", "A"), se = T) #although I use se = T, I do not get se in the output, just 95% CI
Here is a link to the data on google sheets
An example of the results:
With predict, for A1*B1 I get predicted fit = 13.24, 95% CI = [10.23, 16.24]
With get_model_data, for A1*B1 I get predicted fit = 13.24, 95% CI = [13.01, 13.46]
When doing the same with for the original model m1 I get closer results - a difference of only 0.31 in CI between the two options (for the same example as above). Yet, this is still a difference which keeps me up at night...

Similar to cox regression hazard model, can we get survival curves and hazard ratios using survivalsvm?

I am a beginner, trying to do survival analysis using machine learning on the lung cancer dataset. I know how to do the survival analysis using the Cox proportional hazard model. Cox proportional hazard model provides us the hazard ratios, which are nothing but the exponential of the regression coefficients. I wonder if, we can do the same thing using machine learning. As a beginner, I am trying survivalsvm from the R language. Please see the link for this. I am using the inbuilt cancer data for doing survival analysis. Following is the R code, given at this link.
library(survival)
library(survivalsvm)
set.seed(123)
n <- nrow(veteran)
train.index <- sample(1:n, 0.7 * n, replace = FALSE)
test.index <- setdiff(1:n, train.index)
survsvm.reg <- survivalsvm(Surv(diagtime, status) ~ .,
subset = train.index, data = veteran,
type = "regression", gamma.mu = 1,
opt.meth = "quadprog", kernel = "add_kernel")
print(survsvm.reg)
pred.survsvm.reg <- predict(object = survsvm.reg,
newdata = veteran, subset = test.index)
print(pred.survsvm.reg)
Can anyone help me to get the hazard ratios or survival curve for this dataset? Also, how to interpret the output of this function
This question is kind of old now but I'm going to answer anyway because this is a difficult problem and I struggled with {survivalsvm} when I first used it.
So depending on the type argument you get different outputs. In your case type = "regression" means you are plotting Shivaswamy's (hope i spelt correctly) SVCR which predicts the time until an event takes place, so these are survival time predictions.
In order to convert this to a survival curve you have to make some assumptions about the shape of the survival distribution. So for example, let's say you think the survival time is Normally distributed with N(mu, sigma). Then you can use your predicted survival time as mu and either predict or make an assumption about sigma.
Below is an example using your code and my {distr6} package, which enables quick computation of many distributions and printing and plotting of functions:
library(survival)
library(survivalsvm)
set.seed(123)
n <- nrow(veteran)
train.index <- sample(1:n, 0.7 * n, replace = FALSE)
test.index <- setdiff(1:n, train.index)
survsvm.reg <- survivalsvm(Surv(diagtime, status) ~ .,
subset = train.index, data = veteran,
type = "regression", gamma.mu = 1,
opt.meth = "quadprog", kernel = "add_kernel")
print(survsvm.reg)
pred.survsvm.reg <- predict(object = survsvm.reg,
newdata = veteran, subset = test.index)
# load distr6
library(distr6)
# create a vector of normal distributions each with
# mean as the predicted time and with variance 1
# `decorators = "ExoticStatistics"` adds survival function
v = VectorDistribution$new(distribution = "Normal",
params = data.frame(mean = as.numeric(pred.survsvm.reg$predicted)),
shared_params = list(var = 1),
decorators = "ExoticStatistics")
# survival function evaluated at times = 1:10
v$survival(1:10)
# plot survival function for first individual
plot(v[1], fun = "survival")
# plot hazard function for first individual
plot(v[1], fun = "hazard")

Bootstrapping in R: Predict

I am running a program where I conduct an OLS regression and then I subtract the coefficients from the actual observations to keep the residuals.
model1 = lm(data = final, obs ~ day + poly(temp,2) + prpn + school + lag1) # linear model
predfit = predict(model1, final) # predicted values
residuals = data.frame(final$obs - predfit) # obtain residuals
I want to bootstrap my model and then do the same with the bootstrapped coefficients. I try doing this the following way:
lboot <- lm.boot(model1, R = 1000)
predfit = predict(lboot, final)
residuals = data.frame(final$obs - predfit) # obtain residuals
However, that does not work. I also try:
boot_predict(model1, final, R = 1000, condense = T, comparison = "difference")
and that also does not work.
How can I bootstrap my model and then predict based of that?
If you're trying to fit the best OLS using bootstrap, I'd use the caret package.
library(caret)
#separate indep and dep variables
indepVars = final[,-final$obs]
depVar = final$obs
#train model
ols.train = train(indepVars, depVar, method='lm',
trControl = trainControl(method='boot', number=1000))
#make prediction and get residuals
ols.pred = predict(ols.train, indepVars)
residuals = ols.pred - final$obs

Predict probability distribution rather than point estimate from R linear and poisson models

I'm trying to predict a probability distribution (or parts of the distribution, for example, the inner 95%) from a linear and poisson model in R. Currently, all that I get from the prediction is a point estimate, and I can of course generate the standard errors as well. Is there a way to generate a distribution for the prediction, aside from perhaps running a Bayesian analysis and generating the posterior distribution?
To give a simple example, I'll use the Batting table from the Lahman baseball database and predict Runs from Hits. I'll do a linear model and a poisson model (not that the poisson distribution is the best for this data, but it will just serve as an example).
library(Lahman)
library(tidyverse)
df <- Batting
df <- df %>% filter(yearID > 2009, AB > 200)
## linear model
fit_lm <- lm(R ~ H, data = df)
summary(fit_lm)
predict(fit_lm, newdata = list(H = 50))
## poisson model
fit_pois <- glm(R ~ H, data = df, family = "poisson")
summary(fit_pois)
predict(fit_pois, newdata = list(H = 50), type = "response")
Could I potentially just use the predicted $fit and the $se to generate the distribution?
predict(fit_lm, newdata = list(H = 50), se = T)
hist(rnorm(n = 1000, mean = 50, sd = 2*0.34))

How to check for overdispersion in a GAM with negative binomial distribution?

I fit a Generalized Additive Model in the Negative Binomial family using gam from the mgcv package. I have a data frame containing my dependent variable y, an independent variable x, a factor fac and a random variable ran. I fit the following model
gam1 <- gam(y ~ fac + s(x) + s(ran, bs = 're'), data = dt, family = "nb"
I have read in Negative Binomial Regression book that it is still possible for the model to be overdisperesed. I have found code to check for overdispersion in glm but I am failing to find it for a gam. I have also encountered suggestions to just check the QQ plot and standardised residuals vs. predicted residuals, but I can not decide from my plots if the data is still overdisperesed. Therefore, I am looking for an equation that would solve my problem.
A good way to check how well the model compares with the observed data (and hence check for overdispersion in the data relative to the conditional distribution implied by the model) is via a rootogram.
I have a blog post showing how to do this for glm() models using the countreg package, but this works for GAMs too.
The salient parts of the post applied to a GAM version of the model are:
library("coenocliner")
library('mgcv')
## parameters for simulating
set.seed(1)
locs <- runif(100, min = 1, max = 10) # environmental locations
A0 <- 90 # maximal abundance
mu <- 3 # position on gradient of optima
alpha <- 1.5 # parameter of beta response
gamma <- 4 # parameter of beta response
r <- 6 # range on gradient species is present
pars <- list(m = mu, r = r, alpha = alpha, gamma = gamma, A0 = A0)
nb.alpha <- 1.5 # overdispersion parameter 1/theta
zprobs <- 0.3 # prob(y == 0) in binomial model
## simulate some negative binomial data from this response model
nb <- coenocline(locs, responseModel = "beta", params = pars,
countModel = "negbin",
countParams = list(alpha = nb.alpha))
df <- setNames(cbind.data.frame(locs, nb), c("x", "yNegBin"))
OK, so we have a sample of data drawn from a negative binomial sampling distribution and we will now fit two models to these data:
A Poisson GAM
m_pois <- gam(yNegBin ~ s(x), data = df, family = poisson())
A negative binomial GAM
m_nb <- gam(yNegBin ~ s(x), data = df, family = nb())
The countreg package is not yet on CRAN but it can be installed from R-Forge:
install.packages("countreg", repos="http://R-Forge.R-project.org")
Then load the packages and plot the rootograms:
library("countreg")
library("ggplot2")
root_pois <- rootogram(m_pois, style = "hanging", plot = FALSE)
root_nb <- rootogram(m_nb, style = "hanging", plot = FALSE)
Now plot the rootograms for each model:
autoplot(root_pois)
autoplot(root_nb)
This is what we get (after plotting both using cowplot::plot_grid() to arrange the two rootograms on the same plot)
We can see that the negative binomial model does a bit better here than the Poisson GAM for these data — the bottom of the bars are closer to zero throughout the range of the observed counts.
The countreg package has details on how you can add an uncertain band around the zero line as a form of goodness of fit test.
You can also compute the Pearson estimate for the dispersion parameter using the Pearson residuals of each model:
r$> sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
[1] 28.61546
r$> sum(residuals(m_nb, type = "pearson")^2) / df.residual(m_nb)
[1] 0.5918471
In both cases, these should be 1; we see substantial overdispersion in the Poisson GAM, and some under-dispersion in the Negative Binomial GAM.

Resources