How to extract the estimates from a stanfit object

Let fit be a stanfit object. Then I can extract estimates of the parameter AAA in the following manner:
Expected A Posteriori (EAP)
EAP <- as.data.frame(summary(fit)[[1]])["AAA","mean"]
95% Credible Interval
lower.CI <- as.data.frame(summary(fit)[[1]])["AAA","2.5%"]
upper.CI <- as.data.frame(summary(fit)[[1]])["AAA","97.5%"]
But I am not sure whether this is the canonical method. If there is simpler code, please let me know.
Note
EAP
EAP <- get_posterior_mean(fit, pars = "AAA")

For a mean, the get_posterior_mean function is perhaps a bit more canonical. For quantiles, I would just do something like quantile(extract(fit, pars = "AAA")[[1]], probs = c(0.1, 0.9)). However, the endpoints of 95% credible intervals are not estimated very precisely with the default settings for Stan.
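Putting the two together, a minimal sketch (assuming fit is the stanfit object from the question and AAA is a scalar parameter):
library(rstan)
EAP <- get_posterior_mean(fit, pars = "AAA")      # per-chain and overall posterior means
draws <- rstan::extract(fit, pars = "AAA")[[1]]   # permuted posterior draws for AAA
CI <- quantile(draws, probs = c(0.025, 0.975))    # 95% credible interval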

Related

Why do the results of the bootstrapping methods differ when the same seed is used?

I want to generate 95% confidence intervals for the R2 of a linear model. While developing the code and using the same seed for both approaches, I found that doing the bootstrap manually doesn't give me the same results as using the boot function from the boot package. I am wondering now if I am doing something wrong, or why this is happening?
On the other hand, in order to calculate the 95% CI I was trying to use the confint function, but I'm getting the error "$ operator is invalid for atomic vectors". Is there a way to avoid this error?
Here is a reproducible example to explain my concerns
#creating the dataframe
a <- rpois(n = 100, lambda = 10)
b <- rnorm(n = 100, mean = 5, sd = 1)
DF<- data.frame(a,b)
#bootstrapping manually
set.seed(123)
x <- nrow(DF)
B_manually <- data.frame(replicate(100, summary(lm(a ~ b, data = DF[sample(x, replace = TRUE), ]))$r.squared))
names(B_manually)[1] <- "r_squared"
#bootstrapping using the boot function from the boot package
set.seed(123)
library(boot)
B_boot <- boot(DF, function(data,indices)
summary(lm(a~b, data[indices,]))$r.squared,R=100)
head(B_manually) == head(B_boot$t)
r_squared
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
#Why do the results of the manual vs the boot function approach differ if I'm using the same seed?
# 2nd question: using the confint function to determine the 95% CI gives me an error
confint(B_manually$r_squared, level = 0.95, method = "quantile")
confint(B_boot$t, level = 0.95, method = "quantile")
#Error: $ operator is invalid for atomic vectors
#NOTE: I already used boot.ci to determine the 95% confidence interval, as well as the
#quantile function, but the results of these CIs differ from each other
#and I just wanted to compare with the confint function.
quantile(B_boot$t, c(0.025, 0.975))
boot.ci(B_boot, index = 1, type = "perc")
Thanks in advance for any help!
The boot package does not use replicate with sample to generate the indices. Check the importance.array function in the source code for boot: it basically generates all the indices in one go, so there is no reason to assume that you will end up with the same indices or the same result. Take a step back: the purpose of the bootstrap is to use random resampling to obtain an estimate of your parameters, and different implementations of the bootstrap should give you similar estimates.
For example, you can see that the two distributions of R^2 are very similar:
set.seed(111)
a <- rpois(n = 100, lambda = 10)
b <- rnorm(n = 100, mean = 5, sd = 1)
DF<- data.frame(a,b)
set.seed(123)
x <- nrow(DF)
B_manually <- data.frame(replicate(999, summary(lm(a ~ b, data = DF[sample(x, replace = TRUE), ]))$r.squared))
library(boot)
B_boot <- boot(DF, function(data,indices)
summary(lm(a~b, data[indices,]))$r.squared,R=999)
par(mfrow=c(2,1))
hist(B_manually[,1],breaks=seq(0,0.4,0.01),main="dist of R2 manual")
hist(B_boot$t,breaks=seq(0,0.4,0.01),main="dist of R2 boot")
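(A side note, not from the original answer: if you want to inspect the exact indices that boot drew, the boot.array function returns them.)
idx <- boot.array(B_boot, indices = TRUE)  # one row of resampled indices per replicate
dim(idx)                                   # 999 replicates x 100 observations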
The confint function you are using is meant for an lm object and estimates a confidence interval for the coefficients; see the help page. It takes the standard error of a coefficient and multiplies it by the critical t-value to give you a confidence interval. You can check out this book page for the formula. The objects from your bootstrapping are not lm objects, so this function doesn't work; it is not meant for arbitrary estimates.
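To see what confint is actually for, compare it on a real lm fit; for the bootstrap replicates, plain quantiles give the percentile interval (a sketch using the objects defined above):
confint(lm(a ~ b, data = DF), level = 0.95)  # works: CIs for the regression coefficients
quantile(B_boot$t, c(0.025, 0.975))          # percentile CI for the bootstrapped R-squared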

Fitting a confidence interval to dlmForecast in R

I've fit a Dynamic Linear Model to some data using dlmFilter in R [from the dlm package]. From that filter I predicted 7 steps ahead using the dlmForecast function. The predicted outcome is very good, but I would like to add a 95% confidence interval and [after a lot of testing] have struggled to do so.
I've mocked up some similar code, below:
library(dlm)
data <- c(20.68502, 17.28549, 12.18363, 13.53479, 15.38779, 16.14770, 20.17536, 43.39321, 42.91027, 49.41402, 59.22262, 55.42043)
mod.build <- function(par) {
dlmModPoly(1, dV = exp(par[1]), dW = exp(par[2]))
}
# Maximum likelihood estimates of the variance parameters
mle <- dlmMLE(data, rep(0, 2), mod.build)
if (mle$convergence == 0) print("converged") else print("did not converge")
mod1 <- mod.build(mle$par)
mod1Filt <- dlmFilter(data, mod1)
fut1 <- dlmForecast(mod1Filt, nAhead = 7)
The forecast outcome appears to be very good [although the model to some extent over-fits the data due to the small number of observations]. However, I would like to add a 95% confidence interval and have struggled to figure out how to do so.
Any advice would be appreciated.
Cheers
hwidth <- qnorm(0.025, lower.tail = FALSE) * sapply(fut1$Q, function(x) sqrt(diag(x)))
ci <- as.vector(t(fut1$f)) + outer(hwidth, c(-1, 1))  # columns: lower and upper 95% limits
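One way to visualize the result (the plotting code is my own sketch, not part of the original answer):
plot(seq_along(data), data, type = "o", xlim = c(1, length(data) + 7), ylim = range(data, ci), xlab = "time", ylab = "value")
matlines(length(data) + 1:7, cbind(as.vector(t(fut1$f)), ci), lty = c(1, 2, 2), col = c(2, 4, 4))  # forecast with 95% limits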

confidence interval around predicted value from complex inverse function

I'm trying to get a 95% confidence interval around some predicted values, but haven't been able to achieve this.
Basically, I estimated a growth curve like this:
set.seed(123)
dat=data.frame(size=rnorm(50,10,3),age=rnorm(50,5,2))
S <- function(t,ts,C,K) ((C*K)/(2*pi))*sin(2*pi*(t-ts))
sommers <- function(t,Linf,K,t0,ts,C)
Linf*(1-exp(-K*(t-t0)-S(t,ts,C,K)+S(t0,ts,C,K)))
model <- nls(size~sommers(age,Linf,K,t0,ts,C),data=dat,
start=list(Linf=10,K=4.7,t0=2.2,C=0.9,ts=0.1))
I have independent size measurements for which I would like to predict the age. I therefore calculated the inverse of the function, which is not very straightforward, like this:
model.out=coef(model)
S.out <- function(t)
((model.out[[4]]*model.out[[2]])/(2*pi))*sin(2*pi*(t-model.out[[5]]))
sommers.out <- function(t)
model.out[[1]]*(1-exp(-model.out[[2]]*(t-model.out[[3]])-S.out(t)+S.out(model.out[[3]])))
inverse = function (f, lower = -100, upper = 100) {
function (y) uniroot((function (x) f(x) - y), lower = lower, upper = upper)[1]
}
sommers.inverse = inverse(sommers.out, 0, 25)
x= sommers.inverse(10) #this works with my complete dataset, but not with this fake one
Although this works fine, I need to know the 95% confidence interval around this estimate (x). For linear models there is, for example, predict(..., interval = "confidence"). I could also somehow bootstrap the function to get the quantiles associated with the parameters (I didn't find how), and then use the extremes of those to calculate the maximum and minimum predictable values. But that doesn't really look like the right way of doing this...
Any help would be greatly appreciated.
EDIT after answer:
So this worked (explained in Ben Bolker's book; see the answer below):
library(MASS)  # for mvrnorm
vmat <- mvrnorm(1000, mu = coef(mfit), Sigma = vcov(mfit))
dist <- numeric(1000)
for (i in 1:1000) { dist[i] <- sommers_inverse(9.938, vmat[i, ]) }
quantile(dist, c(0.025, 0.975))
On the rather bad fake data I gave, this of course works rather horribly. But on the real data (which I have trouble recreating), this is OK!
Unless I'm mistaken, you're going to have to use either regular (parametric) bootstrapping or a method called "population prediction intervals" (e.g., see section 5 of chapter 7 of Bolker 2008), which assumes that the joint sampling distribution of your parameters is multivariate Normal. However, I think you may have bigger problems, unless I've somehow messed up your model in adapting it ...
Generate data (note that random data may actually be bad for testing your model - see below ...):
set.seed(123)
dat <- data.frame(size=rnorm(50,10,3),age=rnorm(50,5,2))
S <- function(t,ts,C,K) ((C*K)/(2*pi))*sin(2*pi*(t-ts))
sommers <- function(t,Linf,K,t0,ts,C)
Linf*(1-exp(-K*(t-t0)-S(t,ts,C,K)+S(t0,ts,C,K)))
Plot the data and the initial curve estimate:
plot(size~age,data=dat,ylim=c(0,16))
agevec <- seq(0,10,length=1001)
lines(agevec,sommers(agevec,Linf=10,K=4.7,t0=2.2,ts=0.1,C=0.9))
I had trouble with nls so I used minpack.lm::nls.lm, which is slightly more robust. (There are other options here, e.g. calculating the derivatives and providing the gradient function, or using AD Model Builder or Template Model Builder, or using the nls2 package.)
For nls.lm we need a function that returns the residuals:
sommers_fn <- function(par,dat) {
with(c(as.list(par),dat),size-sommers(age,Linf,K,t0,ts,C))
}
library(minpack.lm)
mfit <- nls.lm(fn=sommers_fn,
par=list(Linf=10,K=4.7,t0=2.2,C=0.9,ts=0.1),
dat=dat)
coef(mfit)
## Linf K t0 C ts
## 10.6540185 0.3466328 2.1675244 136.7164179 0.3627371
Here's our problem:
plot(size~age,data=dat,ylim=c(0,16))
lines(agevec,sommers(agevec,Linf=10,K=4.7,t0=2.2,ts=0.1,C=0.9))
with(as.list(coef(mfit)), {
lines(agevec,sommers(agevec,Linf,K,t0,ts,C),col=2)
abline(v=t0,lty=2)
abline(h=c(0,Linf),lty=2)
})
With this kind of fit, the results of the inverse function are going to be extremely unstable, as the inverse function is many-to-one, with the number of inverse values depending sensitively on the parameter values ...
sommers_pred <- function(x,pars) {
with(as.list(pars),sommers(x,Linf,K,t0,ts,C))
}
sommers_pred(6,coef(mfit)) ## s(6)=9.93
sommers_inverse <- function (y, pars, lower = -100, upper = 100) {
uniroot(function(x) sommers_pred(x,pars) -y, c(lower, upper))$root
}
sommers_inverse(9.938, coef(mfit)) ## 0.28
If I pick my interval very carefully I can get back the correct answer ...
sommers_inverse(9.938, coef(mfit), 5.5, 6.2)
Maybe your model will be better behaved with more realistic data. I hope so ...

How to fit a normal cumulative distribution function to data

I have generated some data which is effectively a cumulative distribution; the code below gives an example of X and Y from my data:
X<- c(0.09787761, 0.10745590, 0.11815422, 0.15503521, 0.16887488, 0.18361325, 0.22166727,
0.23526786, 0.24198808, 0.25432602, 0.26387961, 0.27364063, 0.34864672, 0.37734113,
0.39230736, 0.40699061, 0.41063824, 0.42497043, 0.44176913, 0.46076456, 0.47229330,
0.53134509, 0.56903577, 0.58308938, 0.58417653, 0.60061901, 0.60483849, 0.61847521,
0.62735245, 0.64337353, 0.65783302, 0.67232004, 0.68884473, 0.78846000, 0.82793293,
0.82963446, 0.84392010, 0.87090024, 0.88384044, 0.89543314, 0.93899033, 0.94781219,
1.12390279, 1.18756693, 1.25057774)
Y<- c(0.0090, 0.0210, 0.0300, 0.0420, 0.0580, 0.0700, 0.0925, 0.1015, 0.1315, 0.1435,
0.1660, 0.1750, 0.2050, 0.2450, 0.2630, 0.2930, 0.3110, 0.3350, 0.3590, 0.3770, 0.3950,
0.4175, 0.4475, 0.4715, 0.4955, 0.5180, 0.5405, 0.5725, 0.6045, 0.6345, 0.6585, 0.6825,
0.7050, 0.7230, 0.7470, 0.7650, 0.7950, 0.8130, 0.8370, 0.8770, 0.8950, 0.9250, 0.9475,
0.9775, 1.0000)
plot(X,Y)
I would like to obtain the median, mean and some quantile information (say, for example, 5%, 95%) from this data. The way I was thinking of doing this was to fit a defined distribution to the data and then use the fitted distribution to get my quantiles, mean and median values.
The question is how to fit the most appropriate cumulative distribution function to this data (I expect this may well be the Normal Cumulative Distribution Function).
I have seen lots of ways to fit a PDF but I can't find anything on fitting a CDF.
(I realise this may seem a basic question to many of you but it has me struggling!!)
Thanks in advance
Perhaps you could use nlm to find parameters that minimize the squared differences between your observed Y values and those expected for a normal distribution. Here is an example using your data:
fn <- function(x) {
mu <- x[1]
sigma <- exp(x[2])  # exp keeps sigma positive
sum((Y - pnorm(X, mu, sigma))^2)  # sum of squared CDF residuals
}
est <- nlm(fn, c(1, 1))$estimate
plot(X, Y)
curve(pnorm(x, est[1], exp(est[2])), add = TRUE)
Unfortunately I don't know an easy way with this method to constrain sigma > 0 without the exp transformation of the variable, but the fit seems reasonable.
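From there, the summaries asked for in the question follow directly from the fitted parameters (a sketch using est from above):
mu <- est[1]       # fitted mean (and median, by symmetry of the normal)
sigma <- exp(est[2])  # fitted standard deviation
qnorm(c(0.05, 0.5, 0.95), mean = mu, sd = sigma)  # 5% quantile, median, 95% quantile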

Confidence interval for Weibull distribution

I have wind data that I'm using to perform extreme value analysis (calculate return levels). I'm using R with packages 'evd', 'extRemes' and 'ismev'.
I'm fitting GEV, Gumbel and Weibull distributions, in order to estimate the return levels (RL) for some period T.
For the GEV and Gumbel cases, I can get RL's and Confidence Intervals using the extRemes::return.level() function.
Some code:
require(ismev)
require(MASS)
data(wind)
x = wind[, 2]
rperiod = 10
fit <- fitdistr(x, 'weibull')
s <- fit$estimate['shape']
b <- fit$estimate['scale']
rlevel <- qweibull(1 - 1/rperiod, shape = s, scale = b)
## CI around rlevel
## ci.rlevel = ??
But for the Weibull case, I need some help to generate the CI's.
I suspect the excruciatingly correct answer will be that the joint confidence region is an ellipse or some bent-sausage shape, but you can extract variance estimates for the parameters from the fit object with the vcov function and then build standard errors, for which +/- 1.96 SEs should be informative:
> sqrt(vcov(fit)["shape", "shape"])
[1] 0.691422
> sqrt(vcov(fit)["scale", "scale"])
[1] 1.371256
> s +c(-1,1)*sqrt(vcov(fit)["shape", "shape"])
[1] 6.162104 7.544948
> b +c(-1,1)*sqrt(vcov(fit)["scale", "scale"])
[1] 54.46597 57.20848
The usual way to calculate a CI for a single parameter is to assume a Normal distribution and use theta +/- 1.96*SE(theta). In this case you have two parameters, so doing that with both of them would give you a "box", the 2D analog of an interval. The truly correct answer would be something more complex in the 'scale'-by-'shape' parameter space and might be most easily achieved with simulation methods, unless you have a better grasp of the theory than I have.
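A minimal simulation sketch of that idea (mine, not from the original answer), assuming the fitdistr estimates are approximately multivariate Normal; it propagates the joint parameter uncertainty through qweibull to get a CI for the return level:
library(MASS)  # already loaded above; provides mvrnorm
set.seed(1)
sim <- mvrnorm(10000, mu = fit$estimate, Sigma = vcov(fit))  # draws of (shape, scale)
rl.sim <- qweibull(1 - 1/rperiod, shape = sim[, "shape"], scale = sim[, "scale"])
ci.rlevel <- quantile(rl.sim, c(0.025, 0.975))  # 95% CI for the 10-year return level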
