I'm trying to implement a Bayesian ANCOVA that accounts for heteroscedasticity in R using JAGS. However, despite going through several tutorials on Bayesian simple regression and ANOVA, I can't understand how to prepare the model file for JAGS. Here is my code so far:
y1 = rexp(57, rate=0.8)          # dependent variable
x1 = rbeta(57, 6, 2)             # continuous covariate (hist() would return a histogram object, not the draws)
x2 = rep(c(1, 2), length.out=57) # categorical factor (length.out handles the odd n)
groups = 2
n = 57
# list of variables
lddados <- list(g=groups, n=n, y=y1, x1=x1, x2=x2)
sink('reglin.txt') # file name here
cat('
# model
{
for(i in 1:n){
mu[i] = a0 + a[i]
y[i] = a0 + x1*a[ x2[i] ] + ε[i]
}
# priors
y ~ dgamma(0.001, 0.01)
for(i in 1:n){
inter[i] ~ dgamma(0.001, 0.001)
coef[i] ~ dnorm(0.0, 1.0E-6)
# likelihood
# ... got stuck here ...
}
} # ------ end of model
')
sink()
I'm currently trying out ANCOVA using rjags myself...
From my understanding, I would try something like this (untested):
require(rjags)
require(coda)
model_string <- "
model {
for ( i in 1:n ){
mu[i] <- a0 + a[x2[i]] + a3 * x1[i] # linear predictor
y[i] ~ dnorm(mu[i], prec) # y is norm. dist.
}
# priors
a0 ~ dnorm(0, 1.0E-6) # intercept
a[1] ~ dnorm(0, 1.0E-6) # effect of x1 at x2 level 1
a[2] ~ dnorm(0, 1.0E-6) # effect of x1 at x2 level 2
a3 ~ dnorm(0, 1.0E-6) # regression coefficient for x1 (covariate)
prec ~ dgamma(0.001, 0.001) # precision (inverse of variance)
}
"
# initial values for the mcmc
inits_list <- list(a0=0, a=c(0,0), a3=0, prec=100)  # names must match the model's parameters
# model, initial values and data (a named list; e.g. the lddados list built in the question)
jags_model <- jags.model(textConnection(model_string), data=lddados, inits=inits_list, n.adapt = 500, n.chains = 3, quiet = TRUE)
# burn-in
update(jags_model, 10000)
# run the mcmc chains using the coda package
mcmc_samples <- coda.samples(jags_model, c("mu", "a0", "a", "a3", "prec"), n.iter = 100000)
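Since your question is specifically about heteroscedasticity, which the model above ignores (a single prec for everyone), one possible extension (equally untested, and assuming the variance differs between the two levels of x2) is to give each group its own precision:
model_string_het <- "
model {
  for ( i in 1:n ){
    mu[i] <- a0 + a[x2[i]] + a3 * x1[i]  # same linear predictor as above
    y[i] ~ dnorm(mu[i], prec[x2[i]])     # group-specific precision
  }
  # priors
  a0 ~ dnorm(0, 1.0E-6)
  a[1] ~ dnorm(0, 1.0E-6)
  a[2] ~ dnorm(0, 1.0E-6)
  a3 ~ dnorm(0, 1.0E-6)
  for (g in 1:2) {
    prec[g] ~ dgamma(0.001, 0.001)       # one precision per level of x2
  }
}
"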
Tell me if it works...
Recommended books: McCarthy, M., Bayesian Methods for Ecology, and Kruschke, J.K., Doing Bayesian Data Analysis.
I am trying to extract the confidence intervals for my panel logit regression. I am using the following code:
model <- bife(dependent_variable ~ x1 + x2 | area, data = df, model = 'logit')
confint(model)
Running confint gives me NA values for all the coefficients and their confidence intervals.
Is this because of the 'bife' object? The model itself runs fine.
It's the bife:::vcov.bife method, which doesn't produce dimnames. Until the author fixes this, we can help ourselves by writing a confint.bife method that assigns the coefficient names to the vcov.
confint.bife <- function(object, parm, level = 0.95, ...) {
  cf <- coef(object)
  pnames <- names(cf)
  if (missing(parm)) parm <- pnames
  else if (is.numeric(parm)) parm <- pnames[parm]
  a <- (1 - level)/2
  a <- c(a, 1 - a)
  pct <- stats:::format.perc(a, 3)
  fac <- qnorm(a)
  ci <- array(NA, dim = c(length(parm), 2L),
              dimnames = list(parm, pct))
  # attach the missing dimnames to the vcov, then proceed as confint.default does
  vc <- `dimnames<-`(vcov(object), list(pnames, pnames))
  ses <- sqrt(diag(vc))[parm]
  ci[] <- cf[parm] + ses %o% fac  # Wald intervals: estimate +/- z * se
  ci
}
library('bife')
mod <- bife(LFP ~ I(AGE^2) + log(INCH) + KID1 + KID2 + KID3 +
factor(TIME) | ID, psid)
confint(mod)
# 2.5 % 97.5 %
# I(AGE^2) -0.003787755 -0.001185755
# log(INCH) -0.606681358 -0.236717893
# KID1 -1.393748723 -1.008131941
# KID2 -0.830532213 -0.485097762
# KID3 -0.248997085 0.012550225
# factor(TIME)2 -0.244728227 0.303869081
# factor(TIME)3 -0.190434814 0.438179674
# factor(TIME)4 0.117647679 0.870167422
# factor(TIME)5 0.635239557 1.547524672
# factor(TIME)6 0.613792831 1.689971248
# factor(TIME)7 0.639896725 1.876532219
# factor(TIME)8 0.585828050 2.017753781
# factor(TIME)9 0.753717289 2.381327746
I am trying to manually calculate the R-squared given by lm() in R.
Considering:
fit <- lm(obs_values ~ preds_values, df)
with sd(df$obs_values) == sd(df$preds_values) and mean(df$obs_values) == mean(df$preds_values).
To do so I can extract the residuals with
res_a <- residuals(fit)
and then plug them into the formula:
ss_tot <- sum((df$obs_values - mean(df$obs_values))^2)
r_squared <- 1 - sum(res_a^2) / ss_tot
Here I get the expected R-squared.
Now I would like to get the residuals manually.
It should be as trivial as
res_b <- df$obs_values - df$preds_values
but for some reason res_b is different from res_a...
You can't just do y - x in a regression y ~ x to get the residuals. Where have the regression coefficients gone?
fit <- lm(y ~ x)
b <- coef(fit)
resi <- y - (b[1] + b[2] * x)
You have many options:
## Residuals manually
# option 1: rebuild the fitted values from the coefficients
beta_hat <- coef(fit)
obs_values_hat <- beta_hat["(Intercept)"] + beta_hat["preds_values"] * df$preds_values
u_hat <- df$obs_values - obs_values_hat # residuals
# option 2: use the fitted values stored in the model object
obs_values_hat <- fitted(fit)
u_hat <- df$obs_values - obs_values_hat # residuals
# (option 3 - not manual) or just u_hat <- resid(fit)
## R-squared manually
# option 1: explained variance over total variance
var(obs_values_hat) / var(df$obs_values)
# option 2: one minus residual variance over total variance
1 - var(u_hat) / var(df$obs_values)
# option 3: squared correlation between observed and fitted values
cor(df$obs_values, obs_values_hat)^2
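As a quick sanity check, here is the whole computation end to end on simulated data (a made-up example whose column names match the question):
set.seed(1)
df <- data.frame(obs_values = rnorm(100))
df$preds_values <- df$obs_values + rnorm(100) # an arbitrary predictor
fit <- lm(obs_values ~ preds_values, df)
b <- coef(fit)
res_manual <- df$obs_values - (b[1] + b[2] * df$preds_values)
all.equal(unname(res_manual), unname(residuals(fit))) # TRUE
ss_tot <- sum((df$obs_values - mean(df$obs_values))^2)
all.equal(1 - sum(res_manual^2) / ss_tot, summary(fit)$r.squared) # TRUE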
I have a question about this model in JAGS: I want to fit a Bayesian linear regression in which y[i] follows not a normal distribution but a gamma.
The model is this:
"model {
# Priors:
a ~ dnorm(0, 0.0001) # mean, precision = N(0, 10^4)
b ~ dnorm(0, 0.0001)
shape ~ dunif(0, 100)
# Likelihood data model:
for (i in 1:N) {
linear_predictor[i] <- a + b * x[i]
# dgamma(shape, rate) in JAGS:
y[i] ~ dgamma(shape, shape / exp(linear_predictor[i]))
}
}
"
What should I change to make this code usable for a multiple linear regression with this data?
dataListGamma = list(
x = x,
y = y,
Nx = dim(x)[2],
Ntotal = dim(x)[1]
)
I'm receiving this error:
Error in node (shape/(exp(linear_predictor[1331])))
How is this possible? I can't understand it, and if I run the model again, the index at which the problem occurs changes.
Something like this (making b a vector with identical, independent priors for each element, and building the linear predictor with inprod(), because JAGS does not allow a node like linear_predictor[i] to be assigned more than once, so it cannot be grown incrementally in a for loop) should work:
model {
  # Priors:
  a ~ dnorm(0, 0.0001) # mean, precision = N(0, 10^4)
  for (j in 1:Nx) {
    b[j] ~ dnorm(0, 0.0001)
  }
  shape ~ dunif(0, 100)
  # Likelihood data model:
  for (i in 1:Ntotal) {
    linear_predictor[i] <- a + inprod(b, x[i, ])
    y[i] ~ dgamma(shape, shape / exp(linear_predictor[i]))
  }
}
As for the error: dgamma needs a strictly positive rate, so sampling fails whenever shape / exp(linear_predictor[i]) overflows or underflows for some observation, which easily happens with extreme starting values; the failing index changes between runs because the initial values are generated randomly.
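For completeness, a minimal sketch of how this could be compiled and run with rjags (assuming x is a numeric matrix and the model text above is stored in a string called gamma_model_string, a name made up here; supplying explicit initial values also helps avoid the "Error in node" failures at initialisation):
library(rjags)
inits <- list(a = 0, b = rep(0, dataListGamma$Nx), shape = 1) # keeps exp(linear_predictor) finite at the start
jm <- jags.model(textConnection(gamma_model_string),
                 data = dataListGamma, inits = inits, n.chains = 3, n.adapt = 1000)
update(jm, 5000) # burn-in
post <- coda.samples(jm, variable.names = c("a", "b", "shape"), n.iter = 10000)
summary(post)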
When I run a clustered standard error panel specification with plm and lfe, I get results that differ at the second significant figure. Does anyone know why they differ in their calculation of the SEs?
set.seed(572015)
library(lfe)
library(plm)
library(lmtest)
# clustering example
x <- c(sapply(sample(1:20), rep, times = 1000)) + rnorm(20*1000, sd = 1)
y <- 5 + 10*x + rnorm(20*1000, sd = 10) + c(sapply(rnorm(20, sd = 10), rep, times = 1000))
facX <- factor(sapply(1:20, rep, times = 1000))
mydata <- data.frame(y=y,x=x,facX=facX, state=rep(1:1000, 20))
model <- plm(y ~ x, data = mydata, index = c("facX", "state"), effect = "individual", model = "within")
plmTest <- coeftest(model,vcov=vcovHC(model,type = "HC1", cluster="group"))
lfeTest <- summary(felm(y ~ x | facX | 0 | facX))
data.frame(lfeClusterSE=lfeTest$coefficients[2],
plmClusterSE=plmTest[2])
lfeClusterSE plmClusterSE
1 0.06746538 0.06572588
The difference is in the degrees-of-freedom adjustment. This is the usual first guess when looking for differences in supposedly similar standard errors (see e.g., Different Robust Standard Errors of Logit Regression in Stata and R). Here, the problem can be illustrated when comparing the results from (1) plm+vcovHC, (2) felm, (3) lm+cluster.vcov (from package multiwayvcov).
First, I refit all models:
m1 <- plm(y ~ x, data = mydata, index = c("facX", "state"),
effect = "individual", model = "within")
m2 <- felm(y ~ x | facX | 0 | facX, data = mydata)
m3 <- lm(y ~ facX + x, data = mydata)
All lead to the same coefficient estimates. For m3 the fixed effects are explicitly reported while they are not for m1 and m2. Hence, for m3 only the last coefficient is extracted with tail(..., 1).
all.equal(coef(m1), coef(m2))
## [1] TRUE
all.equal(coef(m1), tail(coef(m3), 1))
## [1] TRUE
The non-robust standard errors also agree.
se <- function(object) tail(sqrt(diag(object)), 1)
se(vcov(m1))
## x
## 0.07002696
se(vcov(m2))
## x
## 0.07002696
se(vcov(m3))
## x
## 0.07002696
And when comparing the clustered standard errors we can now show that felm uses the degrees-of-freedom correction while plm does not:
se(vcovHC(m1))
## x
## 0.06572423
m2$cse
## x
## 0.06746538
se(cluster.vcov(m3, mydata$facX))
## x
## 0.06746538
se(cluster.vcov(m3, mydata$facX, df_correction = FALSE))
## x
## 0.06572423
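The two clustered values are in fact related by the usual finite-sample adjustment sqrt(G/(G-1) * (N-1)/(N-K)), where G is the number of clusters, N the number of observations, and K the number of estimated coefficients including the fixed effects; a quick check with the numbers from this example confirms it:
G <- 20         # clusters (levels of facX)
N <- 20 * 1000  # observations
K <- 20 + 1     # 20 fixed effects + 1 slope
dfc <- sqrt(G/(G - 1) * (N - 1)/(N - K))
0.06572423 * dfc
## 0.06746538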
Simulate an AR(1) in R as follows:
# True parameters
b0 <- 1 # intercept
b1 <- 0.9 # coefficient
trueMean <- b0 / (1-b1) # equals 10
set.seed(8236)
capT <- 1000
eps <- rnorm(capT)
y <- rep(NA,capT)
y[1] <- b0 + b1*trueMean + eps[1] # Initialize the series
for(t in 2:capT) y[t] = b0 + b1*y[t-1] + eps[t]
reg1 <- ar(y)
reg2 <- arima(y, order=c(1,0,0))
reg3 <- lm(y[2:capT] ~ y[1:(capT-1)])
The reg1 and reg3 estimates are both close to the true values. However, reg2, which uses the arima function, estimates an "intercept" close to the true mean of 10. Any clue as to why this is happening?
Got the answer on this page: http://www.stat.pitt.edu/stoffer/tsa2/Rissues.htm
It seems arima() reports the mean but calls it the intercept!
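In other words, arima() fits the AR(1) in mean-adjusted form, (y[t] - m) = b1 * (y[t-1] - m) + e[t], so the value labelled "intercept" is the estimated mean m; the implied intercept of the regression form is b0 = m * (1 - b1), which can be recovered from the fit:
m <- coef(reg2)["intercept"] # actually the estimated mean, close to 10
phi <- coef(reg2)["ar1"]     # the AR coefficient, close to 0.9
m * (1 - phi)                # implied intercept, close to the true b0 = 1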