Maximum Likelihood Estimation three-parameter Weibull for right censored data - r

I am trying to estimate the parameters of the three-parameter Weibull distribution by maximum likelihood (ML) for censored data.
I've worked it out using the package flexsurv, where I've defined my own density function.
I've also followed the instructions given in the documentation of the function flexsurv::flexsurvreg to build the list with all the information required to do the MLE with a custom density function.
In the following you can see what I've done so far.
library(FAdist)
library(flexsurv)
set.seed(1)
thres <- 3500
data <- rweibull(n = 1000, shape = 2.2, scale = 25000) + thres
y <- sample(c(0, 1), size = 1000, replace = TRUE)
df1 <- data.frame(x = data, status = y)
dweib3 <- function(x, shape, scale, thres, log = FALSE) {
dweibull(x - thres, shape, scale, log = log)
}
pweib3 <- function(q, shape, scale, thres, log_p = FALSE) {
pweibull(q - thres, shape, scale, log.p = log_p)
}
# Not required
#qweib3 <- function(p, shape, scale, thres, log.p = FALSE) {
# if (log.p == TRUE) {
# p <- exp(p)
# }
# qwei3 <- thres + qweibull(p, shape, scale)
# return(qwei3)
#}
dweib3 <- Vectorize(dweib3)
pweib3 <- Vectorize(pweib3)
custom.weibull <- list(name = "weib3",
pars = c('shape', 'scale', 'thres'), location = 'scale',
transforms = c(log, log, log),
inv.transforms = c(exp, exp, exp),
inits = function(t) {
c(1.2 / sqrt((var(log(t)))), exp(mean(log(t)) + (.572 / (1.2 / sqrt((var(log(t))))))), .5 * min(t))
}
)
ml <- flexsurvreg(Surv(df1$x, df1$status) ~ 1, data = df1, dist = custom.weibull)
The variable y should represent the status of a unit where 1 is a failure and 0 is an unfailed unit until censoring.
The initial values for shape and scale are taken from the moments, which are also defined in the fitdistrplus package.
The threshold parameter needs a constraint, since it must be strictly smaller than the minimum of the data. Capping the threshold at .99 * t_min would be satisfactory (this is something I haven't implemented yet).
The output of the above MLE is the following:
> ml
Call:
flexsurvreg(formula = Surv(df1$x, df1$status) ~ 1, data = df1,
dist = custom.weibull)
Estimates:
est L95% U95% se
shape 2.37e+00 2.12e+00 2.65e+00 1.33e-01
scale 3.52e+04 3.32e+04 3.74e+04 1.08e+03
thres 2.75e+03 1.51e+03 5.02e+03 8.44e+02
N = 1000, Events: 481, Censored: 519
Total time at risk: 25558684
Log-likelihood = -5462.027, df = 3
AIC = 10930.05
The estimated parameters are poor, even allowing for the censoring.
I repeated this procedure a few times with other randomly generated data; the estimates are always pretty far away from the "truth".
Therefore I need either an improvement to my code or another way to estimate the parameters of a three-parameter Weibull with MLE.
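One thing that may contribute to the bias: in the code above the status flags are drawn at random and the times themselves are never truncated, so observations flagged as censored are in fact exact failure times. The following is a minimal sketch, not a definitive implementation, of a direct ML fit with optim on data that are censored for real. The censoring times t_cens, the starting values, and the logistic reparameterisation of the threshold (which keeps it strictly between 0 and the smallest observation) are assumptions made for illustration.

```r
# Minimal sketch: direct ML fit of a three-parameter Weibull to right-censored data.
set.seed(1)
n <- 1000
true_thres <- 3500
t_fail <- rweibull(n, shape = 2.2, scale = 25000) + true_thres
t_cens <- runif(n, min = 10000, max = 60000)   # hypothetical censoring times
x      <- pmin(t_fail, t_cens)                 # observed time
status <- as.integer(t_fail <= t_cens)         # 1 = failure, 0 = censored

# Negative log-likelihood on an unconstrained scale:
# shape = exp(par[1]), scale = exp(par[2]), thres = min(x) * plogis(par[3])
negloglik <- function(par, x, status) {
  shape <- exp(par[1])
  scale <- exp(par[2])
  thres <- min(x) * plogis(par[3])             # always 0 < thres < min(x)
  z <- x - thres
  ll_event <- dweibull(z, shape, scale, log = TRUE)
  ll_cens  <- pweibull(z, shape, scale, lower.tail = FALSE, log.p = TRUE)
  -sum(ifelse(status == 1, ll_event, ll_cens)) # density for failures, survival for censored
}

start <- c(log(2), log(mean(x)), 0)            # assumed starting values
fit <- optim(start, negloglik, x = x, status = status,
             method = "Nelder-Mead", control = list(maxit = 5000))

est <- c(shape = exp(fit$par[1]),
         scale = exp(fit$par[2]),
         thres = min(x) * plogis(fit$par[3]))
print(est)
```

Failures contribute the log-density and censored units the log-survival function, which is the usual right-censored likelihood. Working on the transformed scale means no box constraints are needed, so the threshold cap asked about above is enforced by construction.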

Related

mle2 (bbmle) parameter estimates on boundary, NaNs produced in object@vcov

I'm trying to run a MLE for an infectious disease compartmental transmission model (SEIR, in my case SSEIR) with the mle2 command, trying to fit a curve of predicted number of weekly deaths to that of observed weekly deaths similar to this:
[Figure: plot of predicted vs. observed weekly deaths]
However, the parameter estimates seem to always be on the (sensible) boundaries I provide and SEs, z-values, p-values are NA.
I set up the SEIR model and then solve it with the ode solver. Using that model output and the observed data, I calculate a negative log likelihood, which I then submit to the mle2 function.
When I first set it up, there were multiple errors that stopped the script from running, but now that those are resolved, I cannot seem to find the root of why the fitting doesn't work.
I am certain that the boundaries I set for the parameter estimation are sensible. The parameters are transition rates between compartments and are therefore defined as (for example) delta = 1/duration of infectiousness, so there are very real biological boundaries on what the parameters can be.
I am aware that I am trying to fit a lot of parameters with not that much data, but the same problem persists when I try only fitting one, so that cannot be the root of it.
library(deSolve)
library(bbmle)
#data
gdta <- c(0, 36.2708172419082, 1.57129615346629, 28.1146409459558, 147.701669719614, 311.876708482584, 512.401145459178, 563.798275104372, 470.731269976821, 292.716043742125, 153.604156195608, 125.760068922451, 198.755685044427, 143.847282793854, 69.2693867232681, 42.2093135487066, 17.0200426587424)
#build seir function
seir <- function(time, state, parameters) {
with(as.list(c(state, parameters)), {
dS0 <- - beta0 * S0 * (I/N)
dS1 <- - beta1 * S1 * (I/N)
dE <- beta0 * S0 * (I/N) + beta1 * S1 * (I/N) - delta * E
dI <- delta * E - gamma * I
dR <- gamma * I
return(list(c(dS0, dS1, dE, dI, dR)))
})
}
# build function to run seir, include ode solver
run_seir <- function(time, state, beta0, beta1, delta, gamma, sigma, N, startInf) {
parameters <- c(beta0, beta1, delta, gamma)
names(parameters) <- c("beta0", "beta1", "delta", "gamma")
init <- c(S0 = (N - startInf)*(sigma) ,
S1 = (N - startInf) * (1-sigma),
E = 0,
I = startInf,
R = 0)
state_est <- as.data.frame(ode(y = init, times = times, func = seir, parms = parameters))
return(state_est)
}
times <- seq(0, 16, by = 1) #sequence
states <- c("S0", "S1", "E", "I", "R")
# run the run_seir function to see if it works
run_seir(time = times, state= states, beta0 = 1/(1.9/7), beta1 = 0.3*(1/(1.9/7)), delta = 1/(4.1/7), gamma = 1/(4.68/7), sigma = 0.7, N = 1114100, startInf = 100)
#build calc likelihood function
calc_likelihood <- function(times, state, beta0, beta1, delta, gamma, sigma, N, startInf, CFR) {
model.output <- run_seir(time, state, beta0, beta1, delta, gamma, sigma, N, startInf)
LL <- sum(dpois(round(as.numeric(gdta)), (model.output$I)/(1/delta)*CFR, log = TRUE))
print(LL)
return(LL)
}
# run calc_likelihood function
calc_likelihood(time = times, state = states, beta0 = 1/(1.9/7), beta1 = 0.3*(1/(1.9/7)), delta = 1/(4.1/7), gamma = 1/(4.68/7), sigma = 0.7, N = 1114100, startInf = 100, CFR = 0.02)
#MLE
#parameters that are supposed to be fixed
fixed.pars <- c(N=1114100, startInf=100, CFR = 0.02)
#parameters that mle2 is supposed to estimate
free.pars <- c(beta0 = 1/(1.9/7), beta1 = 0.3*(1/(1.9/7)),
delta = 1/(4.1/7), gamma = 1/(4.68/7), sigma = 0.7)
#lower bound
lower_v <- c(beta0 = 0, beta1 = 0, delta = 0, gamma = 0, sigma = 0)
#upper bound
upper_v <- c(beta0 = 15, beta1 = 15, delta = 15, gamma = 15, sigma = 1)
#sigma = 1, this is not a typo
#mle function - need to use L-BFGS-B since we need to include boundaries
test2 <- mle2(calc_likelihood, start = as.list(free.pars), fixed = as.list(fixed.pars),method = "L-BFGS-B", lower = lower_v, upper = upper_v)
summary(test2)
After I run mle2, I get a warning saying:
Warning message:
In mle2(calc_likelihood, start = as.list(free.pars), fixed = as.list(fixed.pars), :
some parameters are on the boundary: variance-covariance calculations based on Hessian may be unreliable
and if I look at summary(test2):
Warning message:
In sqrt(diag(object@vcov)) : NaNs produced
Based on the research I've done so far, I understand that the second error might be due to the estimates being on the boundaries, so my question really is how to address the first one.
If I run mle2 with only lower boundaries, I get parameter estimates in the millions, which cannot be correct.
I am fairly certain that my model specification for the SEIR is correct, but after staring at this code and trying to resolve this issue for a week, I'm open to any input on how to make the fitting work.
Thanks,
JJ
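A possible cause worth checking: mle2 minimises the function it is given (its first argument is the negative log-likelihood), whereas calc_likelihood above returns the log-likelihood itself. Minimising a log-likelihood pushes the parameters towards whatever bound makes the fit worst. The sketch below reproduces that behaviour on a toy Poisson problem using only base R; the data obs and the bounds are invented for illustration and are not the SEIR model from the question.

```r
# Toy illustration of the sign convention: optimisers such as optim()
# minimise, so they must be handed the NEGATIVE log-likelihood.
set.seed(1)
obs <- rpois(50, lambda = 4)                     # hypothetical count data

loglik <- function(lambda) sum(dpois(obs, lambda, log = TRUE))
negloglik <- function(lambda) -loglik(lambda)

# Wrong: minimising the log-likelihood drives lambda onto a bound
wrong <- optim(1, loglik, method = "L-BFGS-B", lower = 0.01, upper = 15)

# Right: minimising the negative log-likelihood recovers the sample mean
right <- optim(1, negloglik, method = "L-BFGS-B", lower = 0.01, upper = 15)

print(c(wrong = wrong$par, right = right$par, sample_mean = mean(obs)))
```

If this is what is happening in the question, returning -LL from calc_likelihood would be the corresponding change. Whether the remaining parameters are then identifiable from 17 weekly counts is a separate matter.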

Adjusting ODE model output using a Rogan-Gladen estimator in R

I have made an ODE model in R using the package deSolve. Currently the output of the model gives me the "observed" prevalence of a disease (i.e. the prevalence not accounting for diagnostic imperfection).
However, I want to adjust the model to output the "true" prevalence, using a simple adjustment formula called the Rogan-Gladen estimator (http://influentialpoints.com/Training/estimating_true_prevalence.htm):
True prevalence =
(Apparent prev. + (Specificity-1)) / (Specificity + (Sensitivity-1))
As you will see in the code below, I have attempted to adjust only one of the differential equations (diggP).
Running the model without adjustment gives an expected output (a proportion between 0 and 1). However, attempting to adjust the model using the RG-estimator gives a spurious output (a proportion less than 0).
Any advice on what might be going wrong here would be very much appreciated.
# Load required packages
library(tidyverse)
library(broom)
library(deSolve)
# Set time (age) for function
time = 1:80
# Defining exponential decay of lambda over age
y1 = 0.003 + (0.15 - 0.003) * exp(-0.05 * time) %>% jitter(10)
df <- data.frame(t = time, y = y1)
fit <- nls(y ~ SSasymp(time, yf, y0, log_alpha), data = df)
fit
# Values of lambda over ages 1-80 years
data <- as.matrix(0.003 + (0.15 - 0.003) * exp(-0.05 * time))
lambda<-as.vector(data[,1])
t<-as.vector(seq(1, 80, by=1))
foi<-cbind(t, lambda)
foi[,1]
# Making lambda varying by time useable in the ODE model
input <- approxfun(x = foi[,1], y = foi[,2], method = "constant", rule = 2)
# Model
ab <- function(time, state, parms) {
with(as.list(c(state, parms)), {
# lambda, changing by time
import<-input(time)
# Derivatives
# RG estimator:
#True prevalence = (apparent prev + (sp-1)) / (sp + (se-1))
diggP<- (((import * iggN) - iggR * iggP) + (sp_igg-1)) / (sp_igg + (se_igg-1))
diggN<- (-import*iggN) + iggR*iggP
dtgerpP<- (0.5*import)*tgerpN -tgerpR*tgerpP
dtgerpN<- (0.5*-import)*tgerpN + tgerpR*tgerpP
# Return results
return(list(c(diggP, diggN, dtgerpP, dtgerpN)))
})
}
# Initial values
yini <- c(iggP=0, iggN=1,
tgerpP=0, tgerpN=1)
# Parameters
pars <- c(iggR = 0, tgerpR = (1/8)/12,
se_igg = 0.95, sp_igg = 0.92)
# Solve model
results<- ode(y=yini, times=time, func=ab, parms = pars)
# Plot results
plot(results, xlab="Time (years)", ylab="Proportion")
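A likely reason for the negative values: the Rogan-Gladen formula converts one prevalence into another, but in the model above it is applied to diggP, which is a rate of change rather than a prevalence. A hedged alternative is to leave the differential equations unadjusted and apply the estimator to the solved prevalence afterwards. The sketch below shows the adjustment as a standalone function; the name rogan_gladen, the example prevalences, and the truncation to [0, 1] are assumptions for illustration.

```r
# Rogan-Gladen adjustment applied to prevalence values (not to derivatives).
rogan_gladen <- function(apparent, se, sp) {
  true_prev <- (apparent + sp - 1) / (se + sp - 1)
  # Apparent prevalence below 1 - sp gives a negative estimate;
  # truncating to [0, 1] is a common convention (an assumption here).
  pmin(pmax(true_prev, 0), 1)
}

apparent <- c(0.05, 0.08, 0.20, 0.50)            # hypothetical apparent prevalences
adjusted <- rogan_gladen(apparent, se = 0.95, sp = 0.92)
print(round(adjusted, 4))
```

With the model from the question this would mean restoring diggP <- import * iggN - iggR * iggP inside ab() and then calling rogan_gladen(results[, "iggP"], se = 0.95, sp = 0.92) on the ode output. Note that with these test characteristics any apparent prevalence under 0.08 maps to a raw estimate below zero, which is what the initial value iggP = 0 would produce.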

Bootstrap parameter estimate of non-linear optimization in R: Why is it different than the regular parameter estimate?

Here's the short version of my question. The code is below.
I calculated the parameters for the non-linear von Bertalanffy growth equation in R using optim(), and now I am trying to add 95% confidence intervals to the von B growth coefficient K by bootstrapping. For at least one of the years of data, when I summarize the bootstrapped output of the growth coefficient K, the mean and median parameter estimates from bootstrapping are quite different than the estimated parameter:
>summary(temp.store) # summary of bootstrap values
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.002449 0.005777 0.010290 0.011700 0.016970 0.056720
> est.K # point-estimate from the optimization
[1] 0.01655956
I suspect the discrepancy is because there are errors in the bootstrap of the random draw that bias the result, although I have used try() to stop the optimization from crashing when there is a combination of input values that cause an error. So I would like to know what to do to fix that issue. I think I'm doing things correctly, because the fitted curve looks right.
Also, I have run this code for data from other years, and in at least one other year, the bootstrap estimate and the regular estimate are very close.
Long-winded version:
The von Bertalanffy growth curve (VBGC) for length is given by:
L(t) = L.inf * [1 - exp(-K*(t-t0))] (Eq. 3.1.0.1, from FAO)
where L(t) is the fish's length, L.inf is the asymptotic maximum length, K is the growth coefficient, t is the time step and t0 is when growth began. L(t) and t are the observed data. Usually time or age is measured in years, but here I am looking at juvenile fish data and I have made t the day of the year ("doy") starting with January 1 = 1.
To estimate the starting parameters for the optimization, I have used a linearization of the VBGC equation.
doy <- c(156,205,228,276,319,380)
len <- c(36,56,60,68,68,71)
data06 <- data.frame(doy,len)
Function to get starting parameters for the optimization:
get.init <-function(dframe){ # linearization of the von B
l.inf <- 80 # by eyeballing max juvenile fish
# make a response variable and store it in the data frame:
# Eqn. 3.3.3.1 in FAO document
dframe$vonb.y <- - log(1 - (dframe$len)/l.inf )
lin.vonb <- lm(vonb.y ~ doy, data=dframe)
icept <- lin.vonb$coef[1] # 0.01534013 # intercept is a
slope <- k.lin <- lin.vonb$coef[2] # slope is the K param
t0 <- - icept/slope # get t0 from this relship: intercept= -K * t0
pars <- c(l.inf,as.numeric(slope),as.numeric(t0))
}
Sums of squares for von Bertalanffy growth equation
vbl.ssq <- function(theta, data){
linf=theta[1]; k=theta[2]; t0=theta[3]
# name variables for ease of use
obs.length=data$len
age=data$doy
#von B equation
pred.length=linf*(1-exp(-k*(age-t0)))
#sums of squares
ssq=sum((obs.length-pred.length)^2)
}
Estimate parameters
#Get starting parameter values
theta_init <- get.init(dframe=data06)
# optimize VBGC by minimizing sums of square differences
len.fit <- optim(par=theta_init, fn=vbl.ssq, method="BFGS", data=data06)
est.linf <- len.fit$par[1] # vonB len-infinite
est.K <- len.fit$par[2] # vonB K
est.t0 <- len.fit$par[3] # vonB t0
Bootstrapping
# set up for bootstrap loop
tmp.frame <- data.frame()
temp.store <- vector()
# bootstrap to get 95% conf ints on growth coef K
for (j in 1:1000){
# choose indices at random, with replacement
indices <- sample(1:length(data06[,1]),replace=T)
# values from original data corresponding to those indices
new.len <- data06$len[indices]
new.doy <- data06$doy[indices]
tmp.frame <- data.frame(new.doy,new.len)
colnames(tmp.frame) <- c("doy","len")
init.par <- get.init(tmp.frame)
# now get the vonB params for the randomly selected samples
# using try() to keep optimizing errors from crashing the program
try( len.fit.bs <- optim(par=init.par, fn=vbl.ssq, method="BFGS", data=tmp.frame))
tmp.k <- len.fit.bs$par[2]
temp.store[j] <- tmp.k
}
95% confidence interval for K parameter
k.ci <- quantile(temp.store,c(0.025,0.975))
# 2.5% 97.5%
#0.004437702 0.019784178
Here's the problem:
#>summary(temp.store)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.002449 0.005777 0.010290 0.011700 0.016970 0.056720
#
# est.K [1] 0.01655956
Example of error:
Error in optim(par = init.par, fn = vbl.ssq, method = "BFGS", data = tmp.frame) :
non-finite finite-difference value [2]
I don't believe I am making any errors with the optimization because the VBGC fit looks reasonable. Here are the plots:
plot(x=data06$doy,y=data06$len,xlim=c(0,550),ylim=c(0,100))
legend(x="topleft",legend=paste("Length curve 2006"), bty="n")
curve(est.linf*(1-exp(-est.K*(x-est.t0))), add=T,type="l")
plot(x=2006,y=est.K, main="von B growth coefficient for length; 95% CIs",
ylim=c(0,0.025))
arrows(x0=2006,y0=k.ci[1],x1=2006,y1=k.ci[2], code=3,
angle=90,length=0.1)
First of all, you have a very small number of values, possibly too few to trust the bootstrap method. In addition, a high proportion of fits fail for the classic bootstrap, because resampling often leaves too few distinct x values.
Here is an implementation using nls with a selfstarting model and the boot package.
doy <- c(156,205,228,276,319,380)
len <- c(36,56,60,68,68,71)
data06 <- data.frame(doy,len)
plot(len ~ doy, data = data06)
fit <- nls(len ~ SSasympOff(doy, Asym, lrc, c0), data = data06)
summary(fit)
#profiling CI
proCI <- confint(fit)
# 2.5% 97.5%
#Asym 68.290477 75.922174
#lrc -4.453895 -3.779994
#c0 94.777335 126.112523
curve(predict(fit, newdata = data.frame(doy = x)), add = TRUE)
#classic bootstrap
library(boot)
set.seed(42)
boot1 <- boot(data06, function(DF, i) {
tryCatch(coef(nls(len ~ SSasympOff(doy, Asym, lrc, c0), data = DF[i,])),
error = function(e) c(Asym = NA, lrc = NA, c0 = NA))
}, R = 1e3)
#proportion of unsuccessful fits
mean(is.na(boot1$t[, 1]))
#[1] 0.256
#bootstrap CI
boot1CI <- apply(boot1$t, 2, quantile, probs = c(0.025, 0.5, 0.975), na.rm = TRUE)
# [,1] [,2] [,3]
#2.5% 69.70360 -4.562608 67.60152
#50% 71.56527 -4.100148 113.9287
#97.5% 74.79921 -3.697461 151.03541
#bootstrap of the residuals
data06$res <- residuals(fit)
data06$fit <- fitted(fit)
set.seed(42)
boot2 <- boot(data06, function(DF, i) {
DF$lenboot <- DF$fit + DF[i, "res"]
tryCatch(coef(nls(lenboot ~ SSasympOff(doy, Asym, lrc, c0), data = DF)),
error = function(e) c(Asym = NA, lrc = NA, c0 = NA))
}, R = 1e3)
#proportion of unsuccessful fits
mean(is.na(boot2$t[, 1]))
#[1] 0
#(residuals) bootstrap CI
boot2CI <- apply(boot2$t, 2, quantile, probs = c(0.025, 0.5, 0.975), na.rm = TRUE)
# [,1] [,2] [,3]
#2.5% 70.19380 -4.255165 106.3136
#50% 71.56527 -4.100148 113.9287
#97.5% 73.37461 -3.969012 119.2380
proCI[2,1]
CIs_k <- data.frame(lwr = c(exp(proCI[2, 1]),
exp(boot1CI[1, 2]),
exp(boot2CI[1, 2])),
upr = c(exp(proCI[2, 2]),
exp(boot1CI[3, 2]),
exp(boot2CI[3, 2])),
med = c(NA,
exp(boot1CI[2, 2]),
exp(boot2CI[2, 2])),
estimate = exp(coef(fit)[2]),
method = c("profile", "boot", "boot res"))
library(ggplot2)
ggplot(CIs_k, aes(y = estimate, ymin = lwr, ymax = upr, x = method)) +
geom_errorbar() +
geom_point(aes(color = "estimate"), size = 5) +
geom_point(aes(y = med, color = "boot median"), size = 5) +
ylab("k") + xlab("") +
scale_color_brewer(name = "", type = "qual", palette = 2) +
theme_bw(base_size = 22)
As you see, the bootstrap CI is wider than the profile CI, and bootstrapping the residuals results in a narrower estimated CI. All of them are almost symmetric. Furthermore, the medians are close to the point estimates.
As a first step of investigating what goes wrong in your code, you should look at the proportion of failed fits from your procedure.
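To follow up on that last suggestion, here is a sketch of how the optim-based loop from the question could record failures explicitly. In the original loop, a failed try() leaves len.fit.bs holding the previous iteration's fit, so the failure is silently replaced by a duplicate value. The version below stores NA instead and reports the failure rate. It reuses the data and model from the question; the compact helper definitions, the number of replicates B, and the treatment of non-converged fits as failures are assumptions.

```r
doy <- c(156, 205, 228, 276, 319, 380)
len <- c(36, 56, 60, 68, 68, 71)
data06 <- data.frame(doy, len)

# Sum of squares for the von Bertalanffy curve (same model as in the question)
vbl.ssq <- function(theta, data) {
  pred <- theta[1] * (1 - exp(-theta[2] * (data$doy - theta[3])))
  sum((data$len - pred)^2)
}

# Starting values from the linearised curve, with l.inf fixed at 80 as in the question
get.init <- function(dframe, l.inf = 80) {
  y  <- -log(1 - dframe$len / l.inf)
  cf <- coef(lm(y ~ doy, data = dframe))
  c(l.inf, as.numeric(cf[2]), as.numeric(-cf[1] / cf[2]))
}

set.seed(42)
B <- 200                                   # number of replicates (assumed)
k.boot <- rep(NA_real_, B)
for (j in seq_len(B)) {
  bs <- data06[sample(nrow(data06), replace = TRUE), ]
  k.boot[j] <- tryCatch({
    fit <- optim(get.init(bs), vbl.ssq, method = "BFGS", data = bs)
    if (fit$convergence == 0) fit$par[2] else NA_real_
  }, error = function(e) NA_real_)         # failed fits stay NA
}

fail.rate <- mean(is.na(k.boot))           # proportion of unsuccessful fits
k.ci <- quantile(k.boot, c(0.025, 0.975), na.rm = TRUE)
print(fail.rate)
print(k.ci)
```

A high failure rate would indicate that the percentile interval is based on a selected subset of resamples and should be interpreted with caution.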

How to estimate the Kalman Filter with 'KFAS' R package, with an AR(1) transition equation?

I am using 'KFAS' package from R to estimate a state-space model with the Kalman filter. My measurement and transition equations are:
y_t = Z_t * x_t + \eps_t (measurement)
x_t = T_t * x_{t-1} + R_t * \eta_t (transition),
with \eps_t ~ N(0,H_t) and \eta_t ~ N(0,Q_t).
So, I want to estimate the variances H_t and Q_t, but also T_t, the AR(1) coefficient. My code is as follows:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
fit <- fitSSM(ss_model, inits = c(0,0.6,0), method = 'L-BFGS-B')
But it returns: "Error in is.SSModel(do.call(updatefn, args = c(list(inits, model), update_args)),: System matrices (excluding Z) contain NA or infinite values, covariance matrices contain values larger than 1e+07"
The NA definitions for the variances work well, as documented in the package's paper. However, it seems this cannot be done for the AR coefficients. Does anyone know how I can do this?
Note that I am aware of the SSMarima function, which eases the definition of the transition equation as ARIMA models. Although I am able to estimate the AR(1) coef. and Q_t this way, I still cannot estimate the \eps_t variance (H_t). Moreover, I am migrating my Kalman filter codes from EViews to R, so I need to learn SSMcustom for other models that are more complicated.
Thanks!
It seems that you are missing something in your example, as your error message comes from the function fitSSM. If you want to use fitSSM for estimating general state space models, you need to provide your own model updating function. The default behaviour can only handle NA's in covariance matrices H and Q. The main goal of fitSSM is just to get started with simple stuff. For complex models and/or large data, I would recommend using your self-written objective function (with help of logLik method) and your favourite numerical optimization routines manually for maximum performance. Something like this:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
objf <- function(pars, model, estimate = TRUE) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
if (estimate) {
-logLik(model)
} else {
model
}
}
opt <- optim(c(1, 0.5, 1), objf, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100), model = ss_model)
ss_model_opt <- objf(opt$par, ss_model, estimate = FALSE)
Same with fitSSM:
updatefn <- function(pars, model) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
model
}
fit <- fitSSM(ss_model, c(1, 0.5, 1), updatefn, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100))
identical(ss_model_opt, fit$model)

Confidence Intervals for Lethal Dose (LD) for Logistic Regression in R

I want to find the Lethal Dose (LD50) with its confidence interval in R. Other software such as Minitab, SPSS and SAS provides three different versions of such confidence intervals. I could not find such intervals in any package in R (I also used the findFn function from the sos package).
How can I find such intervals? I coded one type of interval based on the Delta method (I am not sure about its correctness), but I would prefer to use an established function from an R package. Thanks
MWE:
dose <- c(10.2, 7.7, 5.1, 3.8, 2.6, 0)
total <- c(50, 49, 46, 48, 50, 49)
affected <- c(44, 42, 24, 16, 6, 0)
finney71 <- data.frame(dose, total, affected)
fm1 <- glm(cbind(affected, total-affected) ~ log(dose),
family=binomial(link = logit), data=finney71[finney71$dose != 0, ])
summary(fm1)$coef
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.886912 0.6429272 -7.601035 2.937717e-14
log(dose) 3.103545 0.3877178 8.004650 1.198070e-15
library(MASS)
xp <- dose.p(fm1, p=c(0.50, 0.90, 0.95)) # from MASS
xp.ci <- xp + attr(xp, "SE") %*% matrix(qnorm(1 - 0.05/2)*c(-1,1), nrow=1)
zp.est <- exp(cbind(xp, attr(xp, "SE"), xp.ci[,1], xp.ci[,2]))
dimnames(zp.est)[[2]] <- c("LD", "SE", "LCL","UCL")
zp.est
LD SE LCL UCL
p = 0.50: 4.828918 1.053044 4.363708 5.343724
p = 0.90: 9.802082 1.104050 8.073495 11.900771
p = 0.95: 12.470382 1.133880 9.748334 15.952512
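For reference, the delta-method interval for the LD50 can also be written out by hand, which makes it easier to see what dose.p is doing. This is a sketch in base R only; the object names (fm, grad, se_log) are chosen for illustration. On the log-dose scale the LD50 is -b0/b1, and its variance is approximated by g' V g, where g is the gradient of -b0/b1 with respect to (b0, b1) and V is the coefficient covariance matrix.

```r
dose     <- c(10.2, 7.7, 5.1, 3.8, 2.6)    # zero-dose group dropped, as in the MWE
total    <- c(50, 49, 46, 48, 50)
affected <- c(44, 42, 24, 16, 6)

fm <- glm(cbind(affected, total - affected) ~ log(dose),
          family = binomial(link = "logit"))

b <- coef(fm)
V <- vcov(fm)

log_ld50 <- -b[[1]] / b[[2]]                     # LD50 on the log-dose scale
grad     <- c(-1 / b[[2]], b[[1]] / b[[2]]^2)    # d(-b0/b1)/d(b0, b1)
se_log   <- sqrt(drop(t(grad) %*% V %*% grad))   # delta-method standard error

# Wald interval on the log scale, back-transformed to the dose scale
ci <- exp(log_ld50 + c(-1, 1) * qnorm(0.975) * se_log)
print(c(LD50 = exp(log_ld50), LCL = ci[1], UCL = ci[2]))
```

Building the interval on the log scale and exponentiating afterwards keeps the limits positive and gives an interval that is asymmetric on the dose scale.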
From the package drc, you can get the ED50 (same calculation), along with confidence intervals.
library(drc) # Directly borrowed from the drc manual
mod <- drm(affected/total ~ dose, weights = total,
data = finney71[finney71$dose != 0, ], fct = LL2.2(), type = "binomial")
#intervals on log scale
ED(mod, c(50, 90, 95), interval = "fls", reference = "control")
Estimated effective doses
(Back-transformed from log scale-based confidence interval(s))
Estimate Lower Upper
1:50 4.8289 4.3637 5.3437
1:90 9.8021 8.0735 11.9008
1:95 12.4704 9.7483 15.9525
Which matches the manual output.
The "finney71" data is included in this package, and your calculation of confidence intervals exactly matches the example given by the drc folks, down to the "# from MASS" comment. You should give credit to them, rather than claiming you wrote the code.
There are a few other ways to figure this out. One is the nonparametric (case-resampling) bootstrap, which is conveniently available through the boot package.
First, we'll refit the model.
library(boot)
finney71 <- finney71[finney71$dose != 0,] # pre-clean data
fm1 <- glm(cbind(affected, total-affected) ~ log(dose),
family=binomial(link = logit),
data=finney71)
And for illustration, we can figure out the LD50 and LD75.
statfun <- function(dat, ind) {
mod <- update(fm1, data = dat[ind,])
coefs <- coef(mod)
c(exp(-coefs[1]/coefs[2]),
exp((log(0.75/0.25) - coefs[1])/coefs[2])) # LD75: (logit(0.75) - intercept) / slope
}
boot_out <- boot(data = finney71, statistic = statfun, R = 1000)
The boot.ci function can work out a variety of confidence intervals for us, using this object.
boot.ci(boot_out, index = 1, type = c('basic', 'perc', 'norm'))
##BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
##Based on 999 bootstrap replicates
##
##CALL :
##boot.ci(boot.out = boot_out, type = c("basic", "perc", "norm"),
## index = 1)
##Intervals :
##Level Normal Basic Percentile
##95% ( 3.976, 5.764 ) ( 4.593, 5.051 ) ( 4.607, 5.065 )
The confidence intervals using the normal approximation are thrown off quite a bit by a few extreme values, which the basic and percentile-based intervals are more robust to.
One interesting thing to note: if the sign of the slope is sufficiently unclear, we can get some rather extreme values (simulated as in this answer, and discussed more thoroughly in this blog post by Andrew Gelman).
set.seed(1)
x <- rnorm(100)
z = 0.05 + 0.1*x*rnorm(100, 0, 0.05) # small slope and more noise
pr = 1/(1+exp(-z))
y = rbinom(1000, 1, pr)
sim_dat <- data.frame(x, y)
sim_mod <- glm(y ~ x, data = sim_dat, family = 'binomial')
statfun <- function(dat, ind) {
mod <- update(sim_mod, data = dat[ind,])
-coef(mod)[1]/coef(mod)[2]
}
sim_boot <- boot(data = sim_dat, statistic = statfun, R = 1000)
hist(sim_boot$t[,1], breaks = 100,
main = "Bootstrap of simulated model")
The delta method above gives us mean = 6.448, lower ci = -36.22, and upper ci = 49.12, and all of the bootstrap CIs give us similarly extreme estimates.
##Level Normal Basic Percentile
##95% (-232.19, 247.76 ) ( -20.17, 45.13 ) ( -32.23, 33.06 )
