Why are the estimates of the Weibull parameters from R2OpenBUGS so different from the values used to generate the data set with rweibull? What's wrong with my fit?
library(R2OpenBUGS)

# simulate data: shape = 2, scale = 10
data <- rweibull(200, 2, 10)

model <- function(){
  v ~ dgamma(0.0001, 0.0001)
  lambda ~ dgamma(0.0001, 0.0001)
  for(i in 1:n){
    y[i] ~ dweib(v, lambda)
  }
}

y <- data
n <- length(y)
data <- list("y", "n")  # pass data to OpenBUGS by name
inits <- function(){ list(v = 1, lambda = 1) }
params <- c("v", "lambda")

model.file <- file.path(tempdir(), "model.txt")
write.model(model, model.file)

weibull <- bugs(data, inits, params, model.file,
                n.iter = 3000, n.burnin = 2000, n.chains = 3)
print(weibull, 4)
The result obtained is:
Current: 3 chains, each with 3000 iterations (first 2000 discarded)
Cumulative: n.sims = 3000 iterations saved
mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff
v 2.0484 0.1044 1.8450 1.9780 2.0500 2.1180 2.2470 1.0062 780
lambda 0.0097 0.0026 0.0056 0.0078 0.0093 0.0112 0.0159 1.0063 830
deviance 1145.6853 1.8403 1144.0000 1144.0000 1145.0000 1146.0000 1151.0000 1.0047 770
pD = 1.6 and DIC = 1147.0
R parameterizes the Weibull using shape (= 2 in your case) and scale (= 10) by default. BUGS uses shape and lambda, where lambda = (1/scale)^shape, so you should expect lambda to be approximately (1/10)^2 = 0.01, which is close to your median of 0.0093.
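As a check, you can convert the posterior medians from the output above back to R's scale parameterization (a quick sketch):

# BUGS dweib(v, lambda): lambda = (1/scale)^v, so scale = lambda^(-1/v)
v_hat <- 2.05          # posterior median of the shape v
lambda_hat <- 0.0093   # posterior median of lambda
lambda_hat^(-1/v_hat)  # ~9.8, close to the true scale of 10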
This question on CrossValidated, and this paper in the R Journal, compare parameterizations.
Below is a simple example of linear regression with 2 predictors, which works well. However, with only 1 predictor (see the second script), the following error message appears:
Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=x; dims declared=(20,1); dims found=(20)
The problem is that a one-column matrix is automatically collapsed to a vector and therefore no longer matches the declared dimensions. One solution would be to declare x as a vector, but the problem is that I'm running the same script with varying numbers of predictors (could be 1 or more).
Stan script:
write("// Stan model for simple linear regression
data {
int<lower=0> N; // number of data items
int<lower=0> K;// number of predictors
matrix[N, K] x;// predictor matrix
vector[N] y;// outcome vector
}
parameters {
real alpha; // intercept
vector[K] beta; // coefficients for predictors
real<lower=0> sigma; // error scale
}
model {
y ~ normal(x * beta + alpha, sigma); // likelihood
}", "ex_dimension.stan")
R script with 2 predictors (working):
library(rstan)

N <- 20
K <- 2
x1 <- 1:N + rnorm(N, 0, 0.5)
x2 <- rnorm(N, 2, 1)
x <- cbind(x1, x2)
a <- 2
b <- 3
y <- a*x1 + b*x2 + rnorm(N, 0, 1)
stan_data <- list(N = N,
                  K = K,
                  x = x,
                  y = y)
fit <- stan(file = "ex_dimension.stan",
            data = stan_data,
            warmup = 500,
            iter = 2000,
            chains = 4,
            cores = 4,
            thin = 1,
            control = list(adapt_delta = 0.8))
fit
Script with 1 predictor (not working):
stan_data <- list(N = N,
                  K = 1,
                  x = x[, 1],
                  y = y)
fit <- stan(file = "ex_dimension.stan",
            data = stan_data,
            warmup = 500,
            iter = 2000,
            chains = 4,
            cores = 4,
            thin = 1,
            control = list(adapt_delta = 0.8))
fit
Subset the matrix with drop = FALSE to avoid collapsing it to a vector and thereby passing the wrong input to the Stan model (see also e.g. Advanced R - Subsetting Chapter).
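To see what goes wrong (a minimal illustration with an arbitrary matrix m):

m <- matrix(rnorm(40), nrow = 20)
dim(m[, 1])                # NULL -- a single-column subset collapses to a vector
dim(m[, 1, drop = FALSE])  # 20 1 -- stays a one-column matrix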
library(rstan)
stan_data <- list(N = N, K = 1, x = x[, 1, drop = FALSE], y = y)
fit <- stan(
model_code = "// Stan model for simple linear regression
data {
int<lower=0> N; // number of data items
int<lower=0> K; // number of predictors
matrix[N, K] x; // predictor matrix
vector[N] y; // outcome vector
}
parameters {
real alpha; // intercept
vector[K] beta; // coefficients for predictors
real<lower=0> sigma; // error scale
}
model {
y ~ normal(x * beta + alpha, sigma); // likelihood
}",
data = stan_data,
chains = 1
)
fit
#> Inference for Stan model: 4f8ba0f0c644593f519910e9d2741995.
#> 1 chains, each with iter=2000; warmup=1000; thin=1;
#> post-warmup draws per chain=1000, total post-warmup draws=1000.
#>
#> mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
#> alpha 6.26 0.06 1.20 3.93 5.49 6.25 7.04 8.68 470 1
#> beta[1] 2.00 0.00 0.10 1.81 1.94 2.00 2.06 2.19 453 1
#> sigma 2.70 0.02 0.50 1.87 2.35 2.62 2.97 3.88 458 1
#> lp__ -28.15 0.06 1.21 -31.12 -28.80 -27.84 -27.23 -26.74 366 1
#>
#> Samples were drawn using NUTS(diag_e) at Thu Aug 15 12:41:19 2019.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split chains (at
#> convergence, Rhat=1).
I am replicating a negative binomial regression model in R. When calculating robust standard errors, my output does not match Stata's standard errors.
The original Stata code is
nbreg displaced eei lcostofwar cfughh roadskm lpopdensity ltkilled, robust nolog
I have attempted both manual calculation and vcovHC from sandwich. However, neither produces the same results.
My regression model is as follows:
mod1 <- glm.nb(displaced ~ eei + costofwar_log + cfughh + roadskm + popdensity_log + tkilled_log, data = mod1_df)
With vcovHC I have tried every option from HC0 to HC5.
Attempt 1:
cov_m1 <- vcovHC(mod1, type = "HC0", sandwich = T)
se <- sqrt(diag(cov_m1))
Attempt 2:
mod1_rob <- coeftest(mod1, vcovHC = vcov(mod1, type = "HC0"))
The closest match has been HC0 with vcov = sandwich, but none of the SEs are correct.
Any suggestions?
EDIT
My output is as follows (using HC0):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.3281183 1.5441312 0.8601 0.389730
eei -0.0435529 0.0183359 -2.3753 0.017536 *
costofwar_log 0.2984376 0.1350518 2.2098 0.027119 *
cfughh -0.0380690 0.0130254 -2.9227 0.003470 **
roadskm 0.0020812 0.0010864 1.9156 0.055421 .
popdensity_log -0.4661079 0.1748682 -2.6655 0.007688 **
tkilled_log 1.0949084 0.2159161 5.0710 3.958e-07 ***
The Stata output I am attempting to replicate is:
Estimate Std. Error
(Intercept) 1.328 1.272
eei -0.044 0.015
costofwar_log 0.298 0.123
cfughh -0.038 0.018
roadskm 0.002 0.0001
popdensity_log -0.466 0.208
tkilled_log 1.095 0.209
The dataset is found here and the recoded variables are:
mod1_df <- table %>%
select(displaced, eei_01, costofwar, cfughh, roadskm, popdensity,
tkilled)
mod1_df$popdensity_log <- log(mod1_df$popdensity + 1)
mod1_df$tkilled_log <- log(mod1_df$tkilled + 1)
mod1_df$costofwar_log <- log(mod1_df$costofwar + 1)
mod1_df$eei <- mod1_df$eei_01*100
Stata uses the observed Hessian for its computations; glm.nb() uses the expected Hessian. Therefore, the default bread() employed by the sandwich() function differs, leading to different results. Other R packages employ the observed Hessian for their variance-covariance estimate (e.g., gamlss), but these do not supply an estfun() method for the sandwich package.
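For reference, this is what sandwich() computes under the hood, so only the bread changes between the two variants (a sketch; bread(), meat(), and estfun() are the sandwich package's own building blocks):

library(sandwich)
## sandwich() combines a "bread" and a "meat": V = (1/n) * B %*% M %*% B.
## Only the bread uses the Hessian, which is where Stata and glm.nb differ.
B <- bread(mod1)            # roughly n * inverse (expected) Hessian
M <- meat(mod1)             # crossprod of the score contributions / n
n <- NROW(estfun(mod1))
all.equal(B %*% M %*% B / n, sandwich(mod1))  # should be TRUE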
Hence, below I simply set up a dedicated bread_obs() function that extracts the ML estimates from a negbin object, sets up the negative log-likelihood, computes the observed Hessian numerically via numDeriv::hessian() and computes the "bread" from it (omitting the estimate for log(theta)):
bread_obs <- function(object, method = "BFGS", maxit = 5000, reltol = 1e-12, ...) {
  ## data and estimated parameters
  Y <- model.response(model.frame(object))
  X <- model.matrix(object)
  par <- c(coef(object), "log(theta)" = log(object$theta))

  ## dimensions
  n <- NROW(X)
  k <- length(par)

  ## negative binomial log-likelihood
  nll <- function(par) suppressWarnings(-sum(dnbinom(Y,
    mu = as.vector(exp(X %*% head(par, -1))),
    size = exp(tail(par, 1)), log = TRUE)))

  ## covariance matrix based on the observed Hessian, scaled to a "bread"
  rval <- numDeriv::hessian(nll, par)
  rval <- solve(rval) * n
  rval[-k, -k]
}
With that function I can compare the sandwich() output based on the expected Hessian with the output based on bread_obs() (the observed Hessian).
s_exp <- sandwich(mod1)                      # default: expected-Hessian bread
s_obs <- sandwich(mod1, bread. = bread_obs)  # observed-Hessian bread
cbind("Coef" = coef(mod1),
      "SE (Exp)" = sqrt(diag(s_exp)),
      "SE (Obs)" = sqrt(diag(s_obs)))
## Coef SE (Exp) SE (Obs)
## (Intercept) 1.328 1.259 1.259
## eei -0.044 0.017 0.015
## costofwar_log 0.298 0.160 0.121
## cfughh -0.038 0.015 0.018
## roadskm 0.002 0.001 0.001
## popdensity_log -0.466 0.135 0.207
## tkilled_log 1.095 0.179 0.208
This still has slight differences compared to Stata but these are likely numerical differences from the optimization etc.
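If you want to rule out optimizer noise on the R side, one thing you can try is tightening glm.nb's convergence tolerance before re-running the comparison (a sketch; control is passed through to the underlying IWLS fit):

mod1_tight <- MASS::glm.nb(displaced ~ eei + costofwar_log + cfughh + roadskm +
                             popdensity_log + tkilled_log,
                           data = mod1_df,
                           control = glm.control(epsilon = 1e-12, maxit = 100))
sqrt(diag(sandwich(mod1_tight, bread. = bread_obs)))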
If you create a dedicated bread() method for negbin objects,
bread.negbin <- bread_obs
then method dispatch will use it when you call sandwich(mod1).
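A quick check that the dispatch works (with bread.negbin defined in your workspace as above):

s_obs2 <- sandwich(mod1)   # dispatches to bread.negbin()
all.equal(s_obs2, s_obs)   # should be TRUE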
In R you need to provide the degrees-of-freedom correction manually, so try the following, which I borrowed from this source:
dfa <- (G / (G - 1)) * (N - 1) / pm1$df.residual
# display with cluster VCE and df-adjustment
firm_c_vcov <- dfa * vcovHC(pm1, type = "HC0", cluster = "group", adjust = TRUE)
coeftest(pm1, vcov = firm_c_vcov)
Here G is the number of panels in your data set, N is the number of observations, and pm1 is your estimated model. Obviously, you could drop the clustering.
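As a worked example of the size of the adjustment (hypothetical numbers):

G <- 50; N <- 500   # 50 panels, 500 observations
df_resid <- 493     # residual degrees of freedom of the model
(G / (G - 1)) * (N - 1) / df_resid
# [1] 1.032827 -- each variance entry is inflated by about 3.3%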
I have a question about how to calculate differences between the coefficients of categorical variables from a glm in RStan.
As an example, I used the iris dataset in R to check whether I can calculate the posterior distribution of differences between coefficients.
First, I ran a basic glm as below and tested the pairwise differences between coefficients.
library(tidyverse)
library(magrittr)
library(multcomp)
iris_glm <-
glm(Sepal.Length ~ Species, data = iris)
multcomp::glht(iris_glm, linfct = mcp(Species = "Tukey")) %>%
summary(.) %>%
broom::tidy()
lhs rhs estimate std.error statistic p.value
1 versicolor - setosa 0 0.930 0.1029579 9.032819 0.000000e+00
2 virginica - setosa 0 1.582 0.1029579 15.365506 0.000000e+00
3 virginica - versicolor 0 0.652 0.1029579 6.332686 4.294805e-10
Next, I fit a Bayesian glm using Stan with the code below, calculating the posterior distributions of the differences between coefficients in the generated quantities section.
# Make the model matrix for Rstan
iris_mod <-
model.matrix(Sepal.Length ~ Species, data = iris) %>%
as.data.frame(.)
# Input data
stan_data <-
list(N = nrow(iris_mod),
SL = iris$Sepal.Length,
Intercept = iris_mod$`(Intercept)`,
versicolor = iris_mod$Speciesversicolor,
virginica = iris_mod$Speciesvirginica)
# Stan code (saved as iris.stan)
data {
  int N;
  real<lower=0> SL[N];
  int<lower=1> Intercept[N];
  int<lower=0, upper=1> versicolor[N];
  int<lower=0, upper=1> virginica[N];
}
parameters {
  real beta0;
  real beta1;
  real beta2;
  real<lower=0> sigma;
}
transformed parameters {
  real mu[N];
  for(n in 1:N)
    mu[n] = beta0*Intercept[n] + beta1*versicolor[n] + beta2*virginica[n];
}
model {
  for(n in 1:N) SL[n] ~ normal(mu[n], sigma);
}
generated quantities {
  real diff_beta0_beta1;
  real diff_beta1_beta2;
  real diff_beta0_beta2;
  diff_beta0_beta1 = (beta0 + beta1) - beta0;            // = beta1
  diff_beta1_beta2 = (beta0 + beta1) - (beta0 + beta2);  // = beta1 - beta2
  diff_beta0_beta2 = (beta0 + beta2) - beta0;            // = beta2
}
library(rstan)
fit_stan <-
stan(file = "iris.stan", data = stan_data, chains = 4,
seed = 1234)
# confirmation of posterior distribution
print(fit_stan, pars = c("diff_beta0_beta1", "diff_beta1_beta2",
"diff_beta0_beta2"))
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
diff_beta0_beta1 0.92 0 0.1 0.73 0.86 0.92 0.99 1.13 2041 1
diff_beta1_beta2 0.65 0 0.1 0.45 0.58 0.65 0.72 0.86 4000 1
diff_beta0_beta2 1.58 0 0.1 1.38 1.51 1.58 1.64 1.78 1851 1
Finally, I got essentially the same results from the frequentist and Bayesian methods.
I think this is the correct way, but I'm not sure because I couldn't find any information or examples of it.
I'd also like to confirm that this approach can be extended to other error distributions (Poisson, gamma, binomial, negative binomial, etc.).
If there are better approaches or any advice, please let me know.
You can calculate any function (including the difference in coefficients) of draws (such as those produced by Stan) from any proper posterior distribution.
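Concretely, you can also skip the generated quantities block and compute any contrast directly from the posterior draws in R. A sketch using the fit_stan object from above (note that (beta0 + beta2) - (beta0 + beta1) simplifies to beta2 - beta1):

draws <- rstan::extract(fit_stan)
diff_virg_vers <- draws$beta2 - draws$beta1     # virginica - versicolor
quantile(diff_virg_vers, c(0.025, 0.5, 0.975))  # posterior interval
mean(diff_virg_vers > 0)                        # posterior P(beta2 > beta1)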
I have a glm model that works. Since I'd like to add (ridge) regularization I thought I'd switch to glmnet. For some reason I cannot get glmnet to work. It seems to always predict the first class, never the second, which results in low accuracy and kappa = 0.
Below is some code to reproduce the problem. What am I doing wrong?
The test data it generates looks like this:
Since the data cannot be linearly separated, two polynomial terms A^2 and B^2 are added.
A glm model predicts the data correctly (with accuracy = 1 and kappa = 1). Here is its prediction boundary:
While a glmnet model always has kappa = 0, no matter what lambda it tries:
lambda Accuracy Kappa Accuracy SD Kappa SD
0 0.746 0 0.0295 0
1e-04 0.746 0 0.0295 0
0.01 0.746 0 0.0295 0
0.1 0.746 0 0.0295 0
1 0.746 0 0.0295 0
10 0.746 0 0.0295 0
Code to reproduce the problem:
library(caret)
# generate test data
set.seed(42)
n <- 500; m <- 100
data <- data.frame(A=runif(n, 98, 102), B=runif(n, 98, 102), Type="foo")
data <- subset(data, sqrt((A-100)^2 + (B-100)^2) > 1.5)
data <- rbind(data, data.frame(A=rnorm(m, 100, 0.25), B=rnorm(m, 100, 0.25), Type="bar"))
# add a few polynomial features to match ellipses
polymap <- function(data) cbind(data, A2=data$A^2, B2=data$B^2)
data <- polymap(data)
plot(x=data$A, y=data$B, pch=21, bg=data$Type, xlab="A", ylab="B")
# train a binomial glm model
model.glm <- train(Type ~ ., data=data, method="glm", family="binomial",
preProcess=c("center", "scale"))
# train a binomial glmnet model with ridge regularization (alpha = 0)
model.glmnet <- train(Type ~ ., data=data, method="glmnet", family="binomial",
preProcess=c("center", "scale"),
tuneGrid=expand.grid(alpha=0, lambda=c(0, 0.0001, 0.01, 0.1, 1, 10)))
print(model.glm) # <- Accuracy = 1, Kappa = 1 - good!
print(model.glmnet) # <- Accuracy = low, Kappa = 0 - bad!
Calling glmnet directly (without caret) results in the same problem:
library(glmnet)
x <- as.matrix(subset(data, select = -c(Type)))
y <- data$Type
model.glmnet2 <- cv.glmnet(x = x, y = y, family = "binomial", type.measure = "class")
preds <- predict(model.glmnet2, x, type = "class", s = "lambda.min")
# all predictions are class 1...
EDIT: Plot of the scaled data and the decision boundary found by glm:
Model: -37 + 6317*A + 6059*B - 6316*A2 - 6059*B2
You should center and scale the data prior to making polynomial versions of the predictors. Numerically, things work better that way (see the collinearity check after the results below):
set.seed(42)
n <- 500; m <- 100
data <- data.frame(A=runif(n, 98, 102), B=runif(n, 98, 102), Type="foo")
data <- subset(data, sqrt((A-100)^2 + (B-100)^2) > 1.5)
data <- rbind(data, data.frame(A=rnorm(m, 100, 0.25), B=rnorm(m, 100, 0.25), Type="bar"))
data2 <- data
data2$A <- scale(data2$A, scale = TRUE)
data2$B <- scale(data2$B, scale = TRUE)
data2$A2 <- data2$A^2
data2$B2 <- data2$B^2
# train a binomial glm model
model.glm2 <- train(Type ~ ., data=data2, method="glm")
# train a binomial glmnet model with ridge regularization (alpha = 0)
model.glmnet2 <- train(Type ~ ., data=data2, method="glmnet",
tuneGrid=expand.grid(alpha=0,
lambda=c(0, 0.0001, 0.01, 0.1, 1, 10)))
From these:
> getTrainPerf(model.glm2)
TrainAccuracy TrainKappa method
1 1 1 glm
> getTrainPerf(model.glmnet2)
TrainAccuracy TrainKappa method
1 1 1 glmnet
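The reason the order matters: with A around 100, the raw A and A^2 columns are almost perfectly correlated, so separating the classes needs huge offsetting coefficients (like the +/-6300 pair in the glm fit above), which the ridge penalty shrinks away. Centering before squaring removes the collinearity (a quick check on the same range as the simulated data):

A <- runif(500, 98, 102)
cor(A, A^2)               # ~1: nearly perfect collinearity
As <- as.vector(scale(A))
cor(As, As^2)             # ~0: collinearity gone after centering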
Is anyone using R2OpenBUGS? Should I rather be using R2WinBUGS? ...
I am trying to model final (2-year) treatment outcomes (e.g. success, death, default or failure) for my sample of patients with a (single) intermediate (3-month) outcome.
R2OpenBUGS is giving me a strange posterior on the multinomial node, in which two of the outcomes are constant, the other two outcomes are equal, and the total number of outcomes is greater than the cohort size.
What am I doing wrong? Many thanks in advance! Code and output are below.
library(R2OpenBUGS)
model <- function() {
  # Prior: distribution of final outcomes for the treatment cohort N_tx
  outc[1:4] ~ dmulti(p.outc[], N_tx)
  p.outc[1] <- 164/1369
  p.outc[2] <- 907/1369
  p.outc[3] <- 190/1369
  p.outc[4] <- 108/1369
  # Prior: distribution of the intermediate outcome (proxy of the final outcome)
  # for each final-outcome cohort (e.g. the proportion of patients with final
  # outcome 1 that exhibited the intermediate outcome)
  cr_1 ~ dunif(0.451, 0.609)
  cr_2 ~ dunif(0.730, 0.787)
  cr_3 ~ dunif(0.559, 0.700)
  cr_4 ~ dunif(0.148, 0.312)
  # Probability p of the intermediate outcome given the priors above
  p <- (outc[1]*cr_1 + outc[2]*cr_2 + outc[3]*cr_3 + outc[4]*cr_4) / N_tx
  # Likelihood for the number of culture conversions at 3 months among those
  # still on treatment at month 6 (excludes confirmed deaths and defaulters)
  cs ~ dbin(p, N_tx)
}
# N_tx is the number of patients in our cohort
N_tx <- 100
# cs is the number of patient exhibiting the intermediate outcome
cs <- 80
data <- list("N_tx", "cs")
inits <- function() { list(outc=c(round(164/1369*N_tx),
round(907/1369*N_tx),
round(190/1369*N_tx),
round(108/1369*N_tx)),
cr_1=87/(87+77),
cr_2=689/(689+218),
cr_3=120/(120+70),
cr_4=24/(24+84))
}
params <- c("outc")
model.file <- file.path(tempdir(), "model.txt")
write.model(model, model.file)
out <- bugs(data, inits, params, model.file, n.iter = 100000, debug = TRUE)
all(out$summary[,"Rhat"] < 1.1)
out$mean["outc"]
out$sd["outc"]
print(out, digits=5)
And here are some of the outputs:
> all(out$summary[,"Rhat"] < 1.1)
[1] TRUE
>
> out$mean["outc"]
$outc
[1] 15.53095 66.00000 14.00000 15.53095
> out$sd["outc"]
$outc
[1] 3.137715 0.000000 0.000000 3.137715
>
> print(out, digits=5)
Inference for Bugs model at "C:\",
Current: 3 chains, each with 1e+05 iterations (first 50000 discarded)
Cumulative: n.sims = 150000 iterations saved
mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff
outc[1] 15.53095 3.13771 10.00000 13.000 15.000 18.000 22.00 1.00100 130000
outc[2] 66.00000 0.00000 66.00000 66.000 66.000 66.000 66.00 1.00000 1
outc[3] 14.00000 0.00000 14.00000 14.000 14.000 14.000 14.00 1.00000 1
outc[4] 15.53095 3.13771 10.00000 13.000 15.000 18.000 22.00 1.00100 130000
deviance 8.59096 2.23382 5.08097 6.927 8.323 9.963 13.66 1.00102 55000
For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
DIC info (using the rule, pD = var(deviance)/2)
pD = 2.5 and DIC = 11.1
DIC is an estimate of expected predictive error (lower deviance is better).