I need to fit x-y data with a model, which is non-analytic. I have a function f(x) that calculates the model for each x numerically, but there is no analytical equation. For the fit, I use optim in R. I minimise RMS between the model and the data. It works well and returns reasonable parameters.
I would like to find confidence intervals (or at least standard errors) on the best-fitting parameters. I found on internet that this can be done from the Hessian matrix, but only if maximising log-likelihood function. I don't know how to do this, all I have is x, y and f(x) from which I find RMS. Alas, I have no good way of estimating errors on y.
How can I find confidence intervals on my fit parameters?
Edit: perhaps an example in R might help explaining what I'm asking for. This example uses a simple analytic function to fit the data, in my real case the function is non-analytic, so I cannot use, e.g., nls.
set.seed(666)
# generate data
x <- seq(100) / 100
y <- 0.5 * x + rnorm(100, sd = 0.03) + 0.2
# function to fit
f <- function(x, a, b) {
a * x + b
}
# error function to minimise: RMS
errfun <- function(par, x, y) {
a <- par[1]
b <- par[2]
err <- sqrt(sum((f(x, a, b) - y)^2))
}
# use optim to fit the model to the data
par <- c(1, 0)
res <- optim(par, errfun, gr=NULL, x, y)
# best-fitting parameters
best_a <- res$par[1]
best_b <- res$par[2]
The best fitting parameters are a = 0.50 and b = 0.20. I need to find 95% confidence intervals on these.
This is a job for the bootstrap:
(1) create a large number of synthetic datasets x*. These are created by sampling from x with replacement the same number of data as were in x. For example, if your data is (1,2,3,4,5,6) an x* might be (5,2,4,4,2,3) (note that values might appear multiple times, or not at all because we are sampling with replacement)
(2) For each x*, calculate f(x*). If there are other parameters which don't depend on the data, don't change them. (so f(x,a,b,c) becomes f(x*,a,b,c) as long as a,b,c don't depend on x. Call these quantities f*.
(3) You can estimate anything you want from these f*. If you want the standard deviation of f(x), take the standard deviation of f*. If you want the 95% confidence interval, take the range from the 2.5 to the 97.5 percentiles of f*. More formally, if you want to estimate g(f(x)) you estimate it as g(f(x*)).
I should say this is a very practically-oriented explanation of the bootstrap. I have glossed over many theoretical details, but the bootstrap is near-universally applicable (basically as long as the thing you are trying to estimate actually exists, you are usually okay).
To apply this to the example you have given in your code:
x <- seq(100) / 100
y <- 0.5 * x + rnorm(100, sd = 0.03) + 0.2
# function to fit
f <- function(x, a, b) {
a * x + b
}
# error function to minimise: RMS
errfun <- function(par, x, y) {
a <- par[1]
b <- par[2]
err <- sqrt(sum((f(x, a, b) - y)^2))
}
# this is the part where we bootstrap
# use optim to fit the model to the data
best_a <- best_b <- numeric(10000)
for(i in 1:10000){
j <- sample(100,replace=TRUE)
x.boot <- x[j]; y.boot <- y[j]
par <- c(1, 0)
res <- optim(par, errfun, gr=NULL, x.boot, y.boot)
# best-fitting parameters
best_a[i] <- res$par[1]
best_b[i] <- res$par[2]
}
# now, we look at the *vector* best_a
# for example, if you want the standard deviation of a,
sd(best_a)
# or a 95% confidence interval for b,
quantile(best_b,c(0.025,0.975))
Related
I have read the maxLik document on how to do constrained optimization. However, I do not understand how I can do it. I have a custom likelihood function as below. The value of rho should be between 0 and 1 (making that two constraints). Now how exactly do I put those constraints? I have 3 parameters.
I have seen an almost similar question here with 3 constraints and 3 parameters but I am really a novice and do not understand the proposed hints on how to include the constraints ? how to use maxLik() to do the constrained optimization in R
require(maxLik)
data<- matrix(rnorm(3600,5,1),ncol=20)
Y=data[,c(1:20)]
Y <- as.matrix(Y, ncol=20)
p=4
T=nrow(Y)
X <- Y[p:(T-1),1:4]
unos <- rep(1,T)
X <- cbind(unos, X)
set.seed(101)
loglik <- function(theta) {
eta <- theta[1]
n <- theta[2]
rho <- theta[3]
coefis=as.matrix(c(mu0=0.0112, mu1=0.0115, mu2=0.009, mu3=0.021,
mu4=0.01237),ncol=1) #coefficients for the intercept and four lags
resi= Y- X%*%coefis
y <- Custom_lik(resi, eta, n, rho, T) #my custom likelihood function
return(-y[[1]])
}
m <- maxLik(loglik, start=c(eta=1.1, n=1.5, rho=0.5))
i have this model
Where TD is a binary variable, and Strata is a numeric variable equals to {1,2,3}. I need to get 95% CI for this two linear combinations:
I have this function to construct confidence intervals
pwp_gt_int <- coxph(Surv(tstart2,tstop2,status==1) ~ TD+ TD:strata(event)
mod_summ <- summary(pwp_gt_int)
coefs <- modsum$coefficients
X <- model.matrix(pwp_gt_int)
dof <- nrow(X) - ncol(X)
coefs_var <- vcov(pwp_gt_int)
halfCI <- qt(0.975, dof) * sqrt(diag(coefs_var))
matrix(c(coefs - halfCI, coefs + halfCI), nrow=3)
but i need something like this
coefs[2] = coefs[1] + 2*coefs[2]
coefs[3] = coefs[1] + 3*coefs[3]
matrix(c(coefs - halfCI, coefs + halfCI), nrow=3)
But the CI's i got are not plausible, i'm think im not getting right the variance-covariance matrix for the linear combinations.
Please help.
It looks like you're asking for two different things - one is the variance of a linear combination and the other is a confidence interval (and as such, a variance) for a non-linear combination. The linear combination is relatively easy. We know that the variance of a linear combination is:
where A is a matrix of constants and V(b) is the variance-covariance matrix of the random variables (in this case, the coefficients). If your coefficient vector has three values in it, and you want to do as you suggest in your last block of code, then the you would define:
or in R as:
A = matrix(c(1,1,2,0,0,3), ncol=3)
Then, you could make the linear combinations and their variances with:
b <- matrix(coef(pwp_gt_int)[1:3], ncol=1)
V <- vcov(pwp_gt_int)[1:3,1:3]
lincom <- A %*% b
v_lincom <- A %*% V %*% t(A)
sds <- sqrt(diag(v_lincom))
crit <- qt(.975, dof)
cis <- cbind(lincom - crit*sds, sincom + crit*sds)
That would be the confidence interval for the linear combination. The problem is that there isn't such an easy formula for the variance of a non-linear combination. Further, the confidence intervals may be asymmetric. One thing you could do is an end-point transformation, where you take lincom and cis and then exponentiate all of them. Another option would be a parametric bootstrap. Here's what that would look like.
B <- MASS::mvrnorm(2500, b, V)
nlcom <- exp(A %*% b)
nlsim <- exp(A %*% t(B))
nlcis <- apply(nlsim, 1, quantile, c(.025,.975))
Now, nlcis would have the confidence bounds for the non-linear combination. This should work given your data, but without the data to try it out, I'm not sure.
I am doing a simulation study for a mixed effect model (three levels; observations nested within subjects within schools):
f <- lmer(measurement ~ time + race + gender + s_ses +
fidelity + (1 + time|school/subject), mydata_long, REML=0)
The model allows the intercept and time slope to vary across subjects and schools. I am wondering how I can fix the variances to be specific values. I do know how to do that when there is only random intercept:
VarCorr(f)['subject:school']<-0.13
VarCorr(f)['school']<-0.20
However, when there is a random slope, these codes don't work since there are different components in the variance aspect (see the attached picture).
How can I fix the variances of subject: school (Intercept), subject:school time, school (Intercept), and school time to specific values in this case. Any suggestions?
A simulation example. The hardest part is getting the random-effects parameters correctly specified: the key things you need to know are (1) internally the random effects variance matrix is scaled by the residual variance; (2) for vector-valued random effects (like this random-slopes model), the variance-covariance matrix is specified in terms of its Cholesky factor: if we want covariance matrix V, there is a lower-triangular matrix such that C %*% t(C) == V. We compute C using chol(), then read off the elements of the lower triangle (including the diagonal) in column-major order (see helper functions below).
Set up experimental design (simplified from yours, but with the same random effects components):
mydata_long <- expand.grid(time=1:40,
school=factor(letters[1:25]),
subject=factor(LETTERS[1:25]))
Helper functions to convert from
a vector of standard deviations, one or more correlation parameters (in lower-triangular/column major order), and a residual standard deviation
to
a vector of "theta" parameters as used internally by lme4 (see description above)
... and back the other way (conv_chol)
conv_sc <- function(sdvec,cor,sigma) {
## construct symmetric matrix with cor in lower/upper triangles
cormat <- matrix(1,nrow=length(sdvec),ncol=length(sdvec))
cormat[lower.tri(cormat)] <- cor
cormat[upper.tri(cormat)] <- t(cormat)[upper.tri(cormat)]
## convert to covariance matrix and scale by 1/sigma^2
V <- outer(sdvec, sdvec)*cormat/sigma^2
## extract lower triangle in column-major order
return(t(chol(V))[lower.tri(V,diag=TRUE)])
}
conv_chol <- function(ch, s) {
m <- matrix(NA,2,2)
m[lower.tri(m,diag=TRUE)] <- ch
m[upper.tri(m)] <- 0
V <- m %*% t(m) * s^2
list(sd=sqrt(diag(V)), cor=cov2cor(V)[1,2])
}
If you want to start from covariance matrices rather than standard deviations and correlations you can modify the code to skip some steps (starting and ending with V).
Pick some values and convert (and back-convert, to check)
tt1 <- conv_sc(c(0.7, 1.2), 0.3, 0.5)
tt2 <- conv_sc(c(1.4, 0.2), -0.2, 0.5)
tt <- c(tt1, tt2)
conv_chol(tt1, s=0.5)
conv_chol(tt2, s=0.5)
Set up formula and simulate:
form <- m ~ time + (1 + time|school/subject)
set.seed(101)
mydata_long$m <- simulate(form[-2], ## [-2] drops the response
family=gaussian,
newdata=mydata_long,
newparams=list(theta=tt,
beta=c(1,1),
sigma=0.5))[[1]]
f <- lmer(form, data=mydata_long, REML=FALSE)
VarCorr(f)
The fitted results are close to what we requested above ...
Groups Name Std.Dev. Corr
subject:school (Intercept) 0.66427
time 1.16488 0.231
school (Intercept) 1.78312
time 0.22459 -0.156
Residual 0.49772
Now do the same thing 200 times, to explore the distribution of estimates:
simfun <- function() {
mydata_long$m <- simulate(form[-2],
family=gaussian,
newdata=mydata_long,
newparams=list(theta=tt,
beta=c(1,1),
sigma=0.5))[[1]]
f <- lmer(form, data=mydata_long, REML=FALSE)
return(as.data.frame(VarCorr(f))[,"sdcor"])
}
set.seed(101)
res <- plyr::raply(200,suppressMessages(simfun()),.progress="text")
Here plyr::raply() is used for convenience, you can do this however you like (for loop, lapply(), replicate(), purrr::map() ...)
par(las=1)
boxplot(res)
## add true values to the plot
points(1:7,c(0.7,1.2,0.3,1.4,0.2,-0.3,0.5),col=2,cex=3,lwd=3)
Suppose I have some data and I fit them to a gamma distribution, how to find the interval probability for Pr(1 < x <= 1.5), where x is an out-of-sample data point?
require(fitdistrplus)
a <- c(2.44121289,1.70292449,0.30550832,0.04332383,1.0553436,0.26912546,0.43590885,0.84514809,
0.36762336,0.94935435,1.30887437,1.08761895,0.66581035,0.83108270,1.7567334,1.00241339,
0.96263021,1.67488277,0.87400413,0.34639636,1.16804671,1.4182144,1.7378907,1.7462686,
1.7427784,0.8377457,0.1428738,0.71473956,0.8458882,0.2140742,0.9663167,0.7933085,
0.0475603,1.8657773,0.18307362,1.13519144)
fit <- fitdist(a, "gamma",lower = c(0, 0))
Someone does not like my above approach, which is conditional on MLE; now let's see something unconditional. If we take direct integration, we need a triple integration: one for shape, one for rate and finally one for x. This is not appealing. I will just produce Monte Carlo estimate instead.
Under Central Limit Theorem, MLE are normally distributed. fitdistrplus::fitdist does not give standard error, but we can use MASS::fitdistr which would performs exact inference here.
fit <- fitdistr(a, "gamma", lower = c(0,0))
b <- fit$estimate
# shape rate
#1.739737 1.816134
V <- fit$vcov ## covariance
shape rate
shape 0.1423679 0.1486193
rate 0.1486193 0.2078086
Now we would like to sample from parameter distribution and get samples of target probability.
set.seed(0)
## sample from bivariate normal with mean `b` and covariance `V`
## Cholesky method is used here
X <- matrix(rnorm(1000 * 2), 1000) ## 1000 `N(0, 1)` normal samples
R <- chol(V) ## upper triangular Cholesky factor of `V`
X <- X %*% R ## transform X under desired covariance
X <- X + b ## shift to desired mean
## you can use `cov(X)` to check it is very close to `V`
## now samples for `Pr(1 < x < 1.5)`
p <- pgamma(1.5, X[,1], X[,2]) - pgamma(1, X[,1], X[,2])
We can make a histogram of p (and maybe do a density estimation if you want):
hist(p, prob = TRUE)
Now, we often want sample mean for predictor:
mean(p)
# [1] 0.1906975
Here goes an example that uses MCMC techniques and a Bayesian mode of inference to estimate the posterior probability that a new observation falls in the interval (1:1.5). This is an unconditional estimate, as opposed to the conditional estimate obtained by integrating the gamma-distribution with maximum-likelihood parameter estimates.
This code requires that JAGS be installed on your computer (free and easy to install).
library(rjags)
a <- c(2.44121289,1.70292449,0.30550832,0.04332383,1.0553436,0.26912546,0.43590885,0.84514809,
0.36762336,0.94935435,1.30887437,1.08761895,0.66581035,0.83108270,1.7567334,1.00241339,
0.96263021,1.67488277,0.87400413,0.34639636,1.16804671,1.4182144,1.7378907,1.7462686,
1.7427784,0.8377457,0.1428738,0.71473956,0.8458882,0.2140742,0.9663167,0.7933085,
0.0475603,1.8657773,0.18307362,1.13519144)
# Specify the model in JAGS language using diffuse priors for shape and scale
sink("GammaModel.txt")
cat("model{
# Priors
shape ~ dgamma(.001,.001)
rate ~ dgamma(.001,.001)
# Model structure
for(i in 1:n){
a[i] ~ dgamma(shape, rate)
}
}
", fill=TRUE)
sink()
jags.data <- list(a=a, n=length(a))
# Give overdispersed initial values (not important for this simple model, but very important if running complicated models where you need to check convergence by monitoring multiple chains)
inits <- function(){list(shape=runif(1,0,10), rate=runif(1,0,10))}
# Specify which parameters to monitor
params <- c("shape", "rate")
# Set-up for MCMC run
nc <- 1 # number of chains
n.adapt <-1000 # number of adaptation steps
n.burn <- 1000 # number of burn-in steps
n.iter <- 500000 # number of posterior samples
thin <- 10 # thinning of posterior samples
# Running the model
gamma_mod <- jags.model('GammaModel.txt', data = jags.data, inits=inits, n.chains=nc, n.adapt=n.adapt)
update(gamma_mod, n.burn)
gamma_samples <- coda.samples(gamma_mod,params,n.iter=n.iter, thin=thin)
# Summarize the result
summary(gamma_samples)
# Compute improper (non-normalized) probability distribution for x
x <- rep(NA, 50000)
for(i in 1:50000){
x[i] <- rgamma(1, gamma_samples[[1]][i,1], rate = gamma_samples[[1]][i,2])
}
# Find which values of x fall in the desired range and normalize.
length(which(x>1 & x < 1.5))/length(x)
Answer:
Pr(1 < x <= 1.5) = 0.194
So pretty close to the conditional estimate, but this is not guaranteed to generally be the case.
You can just use pgamma with estimated parameters in fit.
b <- fit$estimate
# shape rate
#1.739679 1.815995
pgamma(1.5, b[1], b[2]) - pgamma(1, b[1], b[2])
# [1] 0.1896032
Thanks. But how about P(x > 2)?
Check out the lower.tail argument:
pgamma(q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE, log.p = FALSE)
By default, pgamma(q) evaluates Pr(x <= q). Setting lower.tail = FALSE gives Pr(x > q). So you can do:
pgamma(2, b[1], b[2], lower.tail = FALSE)
# [1] 0.08935687
Or you can also use
1 - pgamma(2, b[1], b[2])
# [1] 0.08935687
I want to use y=a^(b^x) to fit the data below,
y <- c(1.0385, 1.0195, 1.0176, 1.0100, 1.0090, 1.0079, 1.0068, 1.0099, 1.0038)
x <- c(3,4,5,6,7,8,9,10,11)
data <- data.frame(x,y)
When I use the non-linear least squares procedure,
f <- function(x,a,b) {a^(b^x)}
(m <- nls(y ~ f(x,a,b), data = data, start = c(a=1, b=0.5)))
it produces an error: singular gradient matrix at initial parameter estimates. The result is roughly a = 1.1466, b = 0.6415, so there shouldn't be a problem with intial parameter estimates as I have defined them as a=1, b=0.5.
I have read in other topics that it is convenient to modify the curve. I was thinking about something like log y=log a *(b^x), but I don't know how to deal with function specification. Any idea?
I will expand my comment into an answer.
If I use the following:
y <- c(1.0385, 1.0195, 1.0176, 1.0100, 1.0090, 1.0079, 1.0068, 1.0099, 1.0038)
x <- c(3,4,5,6,7,8,9,10,11)
data <- data.frame(x,y)
f <- function(x,a,b) {a^b^x}
(m <- nls(y ~ f(x,a,b), data = data, start = c(a=0.9, b=0.6)))
or
(m <- nls(y ~ f(x,a,b), data = data, start = c(a=1.2, b=0.4)))
I obtain:
Nonlinear regression model
model: y ~ f(x, a, b)
data: data
a b
1.0934 0.7242
residual sum-of-squares: 0.0001006
Number of iterations to convergence: 10
Achieved convergence tolerance: 3.301e-06
I always obtain an error if I use 1 as a starting value for a, perhaps because 1 raised to anything is 1.
As for automatically generating starting values, I am not familiar with a procedure to do that. One method I have read about is to simulate curves and use starting values that generate a curve that appears to approximate your data.
Here is the plot generated using the above parameter estimates using the following code. I admit that maybe the lower right portion of the line could fit a little better:
setwd('c:/users/mmiller21/simple R programs/')
jpeg(filename = "nlr.plot.jpeg")
plot(x,y)
curve(1.0934^(0.7242^x), from=0, to=11, add=TRUE)
dev.off()