MCMClogit confusion - r

Could anybody explain to me why
simulatedCase <- rbinom(100,1,0.5)
simDf <- data.frame(CASE = simulatedCase)
posterior_m0 <<- MCMClogit(CASE ~ 1, data = simDf, b0 = 0, B0 = 1)
always results in a MCMC acceptance ratio of 0? Any explanation would be greatly appreciated!

I think your problem is the model formula, since logistic regression models have no error term. Thus you model CASE ~ 1 should be replaced by something like CASE ~ x (the predictor variable x is mandatory). Here is your example, modified:
CASE <- rbinom(100,1,0.5)
x <- 1:100
posterior_m0 <- MCMClogit (CASE ~ x, b0 = 0, B0 = 1)
classic_m0 <- glm (CASE ~ x, family=binomial(link="logit"), na.action=na.pass)
So I think your problem is not related to the MCMCpack library (disclaimer: I have never used this package).

For anyone stumbling into this same problem :
It seems that the MCMClogit function cannot handle anything but B0=0 if your model only has an intercept.
If you add a covariate, then you can specify a precision just fine.
I would consider other packages (such as arm or rjags) if you really want to sample from this model. For a list of options available for Bayesian regression, see http://cran.r-project.org/web/views/Bayesian.html

Related

Estimating multiple OLS with AR residuals

I am new to modeling in R, so I'm stumbling a bit...
I have a model in Eviews, which I have to translate to R and make further upgrades.
The model is multiple OLS with AR(1) of residuals.
I implemented it like this
model1 <- lm(y ~ x1 + x2 + x3, data)
data$e <- dplyr:: lag(residuals(model1), 1)
model2 <- lm(y ~ x1 + x2 + x3 + e, data)
My issue is the same as it is in this thread and I expected it: while parameter estimations are similar, they are different enought that I cannot use it.
I am planing of using ARIMA from stats package, but the problem is implementation. How to make AR(1) on residuals, and make other variables as they are?
Provided I understood you correctly, you can supply external regressors to your arima model through the xreg argument.
You don't provide sample data so I don't have anything to play with, but your model should translate to something like
model <- arima(data$y, xreg = as.matrix(data[, c("x1", "x2", "x3")]), order = c(1, 0, 0))
Explanation: The first argument data$y contains your time series data. xreg contains your external regressors as a matrix, with every column containing as many observations for that regressor as you have time points. order = c(1, 0, 0) defines an AR(1) model.

(mis) understanding priors in MCMCglmm

I'm new to this package; I usually use Stan for Bayesian models, but I'm using a lot of data and hoping I can get the models to run faster in MCMCglmm. I have read the course notes as well as the readme on github. They are really helpful and great descriptions, but I still do not understand how to set priors.
I have a simple mixed effects model, and here is some example code. (I know that others have asked this question, but I had trouble finding one with reproducible code and those answers helped users with a specific question, while I'm trying to understand what the values in a prior list mean).
library(MCMCglmm)
# inverse logit function for simulation
inv.logit <- function(x){
exp(x)/(exp(x)+1)
}
# simulate some data
set.seed(123)
n <- 1000
covariates <- replicate(3, rnorm(n, 0, .5))
X <- cbind(rep(1, length(covariates[,1])),covariates)
colnames(X) <- c('int', 'X1', 'X2', 'X3')
coefs <- c(1, -1, -.5, 0.3)
resp <- X %*% coefs
psi <- inv.logit(resp)
ind <- rep(1:10, floor(n/10)) # assigning individuals for random effect, but it's not important
n_inds <- length(unique(ind))
y <- rbinom(n, 1, psi)
df <- data.frame(ind, y, X)
# priors: I really don't know what I'm doing here
prior1 <- list(R = list(V = 2, n = 1, fix=1),
G = list(G1 = list(V = diag(3), n = 3)))
# run the model
glmm <- MCMCglmm(y ~ X1 + X2 + X3,
random= ~ us(1 + ind), # having random slope and random intercept for ind
family = "categorical",
data = df,
prior=prior1
)
If I were running this in Stan, I would set priors for each fixed effect covariate: Beta ~ normal(0, sigma_beta) and the hyperprior: sigma_beta ~ gamma(2,1). (Although, I'd also be happy just setting the prior as Beta ~ normal(0,100) or something like this.) I would use similar priors for the random effects. I understand that MCMCglmm is more limited in distributions, but I really don't understand the notation. To be clear, I'm not really interested in specifying priors in this example model, I'm trying to understand how one does it.
Is there somewhere I can find a definition of what is exactly meant by each of the values that goes into the prior (e.g., V, n, alpha, ...) and how these values correspond to what we would write in a full model description of the priors? Or is someone willing to explain it to more simple minded people like me? The descriptions in the course notes and github were not able to answer my question.
Thank you!

I am using glmnet to address the multicollinearity issue, and for best lambda i want to calculate VIF between variables

I am using glmnet and for the best lambda I want to check the VIF between variables. Can anyone suggest how can I accomplish this?
Below is the code I am following and fielddfm is the data frame containing the independent variables:
x<- model.matrix(depvar ~ ., fielddfm) [,-1]
y <- depvar
lambda <- 10^seq(10, -2, length = 100)
ridge.mod <- glmnet(x, y, alpha = 0, lambda = lambda)
predict(ridge.mod, s = 0, exact = T, type = 'coefficients')
cv.out <- cv.glmnet(x, y, alpha = 0, nfolds = 3)
bestlam <- cv.out$lambda.min
ridge.pred <- predict(ridge.mod, s = bestlam, newx = x)
predict(ridge.mod, type = "coefficients", s = bestlam)'
Here, I get the coefficients for different promotion vehicles but I want to know, VIF values for the best lambda for different independent variables
Could yo please suggest how can I achieve this?
Since a) VIF is a function of your predictors rather than your model and b) a ridge regression keeps all variables irrespective of lambda, you could get the VIFs from an arbitrarily-fitted linear model. For example:
vifs = car::vif(lm(y ~ ., data = X))
where y is your response and X is your dataframe of predictors. Note that the results are independent of the values contained in y.
Given the above however, It's a little dubious whether this question makes sense in the first place...

change null hypothesis in lmtest in R

I have a linear model generated using lm. I use the coeftest function in the package lmtest go test a hypothesis with my desired vcov from the sandwich package. The default null hypothesis is beta = 0. What if I want to test beta = 1, for example. I know I can simply take the estimated coefficient, subtract 1 and divide by the provided standard error to get the t-stat for my hypothesis. However, there must be functionality for this already in R. What is the right way to do this?
MWE:
require(lmtest)
require(sandwich)
set.seed(123)
x = 1:10
y = x + rnorm(10)
mdl = lm(y ~ x)
z = coeftest(mdl, df=Inf, vcov=NeweyWest)
b = z[2,1]
se = z[2,2]
mytstat = (b-1)/se
print(mytstat)
the formally correct way to do this:
require(multcomp)
zed = glht(model=mdl, linfct=matrix(c(0,1), nrow=1, ncol=2), rhs=1, alternative="two.sided", vcov.=NeweyWest)
summary(zed)
Use an offset of -1*x
mdl<-lm(y~x)
mdl2 <- lm(y ~ x-offset(x) )
> mdl
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
0.5255 0.9180
> mdl2
Call:
lm(formula = y ~ x - offset(x))
Coefficients:
(Intercept) x
0.52547 -0.08197
You can look at summary(mdl2) to see the p-value (and it is the same as in mdl.
As far as I know, there is no default function to test the model coefficients against arbitrary value (1 in your case). There is the offset trick presented in the other answer, but it's not that straightforward (and always be careful with such model modifications). So, your expression (b-1)/se is actually a good way to do it.
I have two notes on your code:
You can use summary(mdl) to get the t-test for 0.
You are using lmtest with covariance structure (which will change the t-test values), but your original lm model doesn't have it. Perhaps this could be a problem? Perhaps you should use glm and specify the correlation structure from the start.

Linear Regression with a known fixed intercept in R

I want to calculate a linear regression using the lm() function in R. Additionally I want to get the slope of a regression, where I explicitly give the intercept to lm().
I found an example on the internet and I tried to read the R-help "?lm" (unfortunately I'm not able to understand it), but I did not succeed. Can anyone tell me where my mistake is?
lin <- data.frame(x = c(0:6), y = c(0.3, 0.1, 0.9, 3.1, 5, 4.9, 6.2))
plot (lin$x, lin$y)
regImp = lm(formula = lin$x ~ lin$y)
abline(regImp, col="blue")
# Does not work:
# Use 1 as intercept
explicitIntercept = rep(1, length(lin$x))
regExp = lm(formula = lin$x ~ lin$y + explicitIntercept)
abline(regExp, col="green")
Thanls for your help.
You could subtract the explicit intercept from the regressand and then fit the intercept-free model:
> intercept <- 1.0
> fit <- lm(I(x - intercept) ~ 0 + y, lin)
> summary(fit)
The 0 + suppresses the fitting of the intercept by lm.
edit To plot the fit, use
> abline(intercept, coef(fit))
P.S. The variables in your model look the wrong way round: it's usually y ~ x, not x ~ y (i.e. the regressand should go on the left and the regressor(s) on the right).
I see that you have accepted a solution using I(). I had thought that an offset() based solution would have been more obvious, but tastes vary and after working through the offset solution I can appreciate the economy of the I() solution:
with(lin, plot(y,x) )
lm_shift_up <- lm(x ~ y +0 +
offset(rep(1, nrow(lin))),
data=lin)
abline(1,coef(lm_shift_up))
I have used both offset and I(). I also find offset easier to work with (like BondedDust) since you can set your intercept.
Assuming Intercept is 10.
plot (lin$x, lin$y)
fit <-lm(lin$y~0 +lin$x,offset=rep(10,length(lin$x)))
abline(fit,col="blue")

Resources