I want to fit a mixed-effects model with sign constraints (positive or negative) on the parameters.
For joint modeling, I estimate the parameters with the nlme package, which the JM package requires.
In the lme function of the nlme package, it seems that optimization details can be specified by using lmeControl().
https://stat.ethz.ch/R-manual/R-devel/library/nlme/html/lmeControl.html
With the "L-BFGS-B" method of the optim function, which is listed as an option there, it looks like the minimum and maximum values of the estimated parameters can be specified.
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/optim.html
However, lmeControl() only accepts "optim" or "nlminb" as a character string, and from reading the source code it does not look like I can supply my own optimization function.
https://github.com/cran/nlme/blob/master/R/lme.R
Is it possible to create a parameter constrained mixed effects model using the nlme package?
Thank you.
memo:
Specifying the upper and lower options in the lme and lmeControl functions had no effect.
Giving the lmeControl function an optimization function did not work either.
(Maybe I could not write the function properly.)
require(JM)
require(nlme)
require(tidyverse)
data(aids)
data(aids.id)
#fit mixed effect model
fitLME <- lme(sqrt(CD4) ~ obstime + obstime:drug, random = ~ obstime | patient, data = aids)
fitLME %>% summary()
#create lme control.
ctrl <- lmeControl(optimMethod = "L-BFGS-B", opt ="optim", lower = c(0,0, 0), upper = c(1,20,1))
#fit with lme control
fitLME_ctrl <- lme(sqrt(CD4) ~ obstime + obstime:drug, random = ~ obstime | patient, data = aids, control = ctrl)
#this does not change result.
fitLME_ctrl %>% summary()
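For reference, here is a minimal sketch of what box constraints look like when optim is called directly with method = "L-BFGS-B". It does not involve lme at all: the toy data and the rss objective are made up for illustration, and it only shows that lower and upper are arguments of optim itself rather than entries of its control list.

```r
# Toy regression data (hypothetical): the true slope is negative
set.seed(1)
x <- runif(100)
y <- 2 - 3 * x + rnorm(100, sd = 0.2)

# Residual sum of squares for intercept par[1] and slope par[2]
rss <- function(par) sum((y - par[1] - par[2] * x)^2)

# Unconstrained fit
fit_free <- optim(c(0, 0), rss, method = "BFGS")

# Box-constrained fit: the slope is forced to be non-negative
fit_box <- optim(c(0, 0), rss, method = "L-BFGS-B",
                 lower = c(-Inf, 0), upper = c(Inf, Inf))

fit_free$par
fit_box$par
```

In the constrained fit the slope ends up on the boundary at 0, which is the behaviour I would like to get for selected lme parameters.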
Related
I'm estimating a mixed effects model using simulated data. The basis of this is a conjoint experiment: there are N number of countries in the study with P participants and each respondent is shown the experiment twice. This means that there are NxPx2 observations. Heterogeneity is introduced into the data at the country level and so I run a mixed effect model using lmer with random effects varying by country to account for this variance. However, because each respondent does the experiment twice, I also want to cluster my standard errors at the individual level. My data and model looks something like this:
library(lme4)
library(dplyr) # provides %>%, mutate() and n()
data(iris)
# generating IDs for observations
iris <- iris %>% mutate(id = rep(1:(n()/2), each = 2))
#run model
mod <- lmer(Sepal.Length~Sepal.Width+Petal.Length+Petal.Width + (Sepal.Width+Petal.Length+Petal.Width || Species), data=iris, REML = FALSE, control = lmerControl(optimizer = 'bobyqa'))
I then attempt to get clustered SEs using the parameters package:
library(parameters)
param <- model_parameters(
  mod,
  robust = TRUE,
  vcov_estimation = "CR",
  vcov_type = "CR1",
  vcov_args = list(cluster = iris$id)
)
This returns an error:
Error in vcovCR.lmerMod(obj = new("lmerModLmerTest", vcov_varpar = c(0.00740122363004, : Non-nested random effects detected. clubSandwich methods are not available for such models.
I'm not married to any one method. I just want to return clustered SEs for this type of model specification. As of now I can't find any package that does this. Does anyone know how this can be done, or whether such a model even makes sense? I'm new to MLMs, but if I were to run this as a simple linear model I would use lm_robust and cluster by individual, so it makes sense to me that I should do the same here as well.
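For comparison, the plain linear-model version mentioned above can be sketched by hand in base R. This is only the lm analogue: it ignores the Species-level random effects entirely, and the small-sample factor G/(G-1) is my assumption of a CR1-type correction, so the numbers are not a substitute for clustered SEs on the mixed model.

```r
data(iris)
# Same pairing of rows into "respondents" as in the model above
iris$id <- rep(seq_len(nrow(iris) / 2), each = 2)

fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

X <- model.matrix(fit)
u <- residuals(fit)
G <- length(unique(iris$id))          # number of clusters

bread  <- solve(crossprod(X))         # (X'X)^{-1}
scores <- rowsum(X * u, group = iris$id)  # per-cluster sums of x_i * u_i
meat   <- crossprod(scores)

# Cluster-robust sandwich with a G / (G - 1) correction (assumed CR1-type)
vcov_cl <- (G / (G - 1)) * bread %*% meat %*% bread
se_cl   <- sqrt(diag(vcov_cl))

cbind(estimate = coef(fit), cluster_se = se_cl)
```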
I found that the predict function is currently not implemented for cumulative link mixed models fitted with the clmm function in the ordinal R package. While predict is implemented for clmm2 in the same package, I chose clmm instead because the latter allows more than one random effect. I also fitted several clmm models and performed model averaging using the model.avg function in the MuMIn package. Ideally, I want to predict probabilities using the averaged model. However, while MuMIn supports clmm models, predict does not work with the averaged model either.
Is there a way to hack the predict function so that the function not only could predict probabilities from a clmm model, but also predict using model averaged coefficients from clmm (i.e. object of class "averaging")? For example:
require(ordinal)
require(MuMIn)
mm1 <- clmm(SURENESS ~ PROD + (1|RESP) + (1|RESP:PROD), data = soup,
link = "probit", threshold = "equidistant")
## test random effect:
mm2 <- clmm(SURENESS ~ PROD + (1|RESP) + (1|RESP:PROD), data = soup,
link = "logistic", threshold = "equidistant")
#create a model selection object
mm.sel<-model.sel(mm1,mm2)
##perform a model average
mm.avg<-model.avg(mm.sel)
#create new data and predict
new.data<-soup
##predict with individual model
predict(mm1, new.data)
I got the following error message:
In UseMethod("predict") :
no applicable method for predict applied to an object of class "clmm"
##predict with model average
predict(mm.avg, new.data)
Another error is returned:
Error in predict.averaging(mm.avg, new.data) :
predict for models 'mm1' and 'mm2' caused errors
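To make clear what "model averaged coefficients" means here, the sketch below computes Akaike weights and a weighted average of coefficients by hand. As far as I understand, this is the kind of weighting model.avg applies by default, but the AICc values and coefficients below are invented for illustration and are not output from mm1 or mm2.

```r
# Hypothetical AICc values for two candidate models
aicc <- c(mm1 = 1002.3, mm2 = 1000.1)

# Akaike weights: exp(-delta / 2), normalised to sum to 1
delta   <- aicc - min(aicc)
weights <- exp(-delta / 2) / sum(exp(-delta / 2))

# Hypothetical coefficient estimates, one row per model
coefs <- rbind(mm1 = c(PRODTest = 0.75, threshold.1 = -1.10),
               mm2 = c(PRODTest = 1.25, threshold.1 = -1.90))

# Weighted average of each coefficient across models
avg_coefs <- colSums(coefs * weights)

weights
avg_coefs
```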
I've been using clmm as well and yes I confirm predict.clmm is NOT (yet?) implemented. I didn't yet check the source code for fake.predict.clmm. It might work. If it doesn't, you're stuck with doing stuff by hand or using predict.clmm2.
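For the "by hand" route, here is a minimal sketch of the cumulative-link calculation for fixed effects only, that is, with the random effects set to zero. The thresholds, coefficients and design matrix are hypothetical values, not taken from any fitted clmm object; with a real fit you would have to plug in its estimated thresholds and slopes yourself.

```r
# P(Y = j) = F(theta_j - eta) - F(theta_{j-1} - eta),
# with theta_0 = -Inf and theta_K = +Inf
cumulative_probs <- function(eta, theta, inv_link = plogis) {
  cuts <- c(-Inf, theta, Inf)
  # Matrix of cumulative probabilities: one row per observation, one column per cut
  cum <- matrix(inv_link(outer(-eta, cuts, "+")), nrow = length(eta))
  cum[, -1, drop = FALSE] - cum[, -ncol(cum), drop = FALSE]
}

theta <- c(-1, 0, 1.5)                 # hypothetical thresholds (4 response levels)
beta  <- c(x1 = 0.8, x2 = -0.5)        # hypothetical fixed-effect slopes
X     <- cbind(x1 = c(0, 1, 1), x2 = c(0, 0, 1))

eta   <- drop(X %*% beta)              # linear predictor without intercept
probs <- cumulative_probs(eta, theta)  # use inv_link = pnorm for a probit link
probs
```

Each row of probs holds the predicted probabilities of the response levels for one row of X and sums to 1.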
I found a potential solution (pasted below) but have not been able to make it work for my data.
Solution here: https://gist.github.com/mainambui/c803aaf857e54a5c9089ea05f91473bc
I think the problem is the number of coefficients I am using but am not experienced enough to figure it out. Hopefully this helps someone out though.
This is the model and newdata that I am using, though it is actually a model averaged version. Same predictors though.
ma10 <- clmm(Location3 ~ Sex * Grass3 + Sex * Forb3 + (1|Tag_ID), data =
IP_all_dunes)
ma_1 <- model.avg(ma10, ma8, ma5)##top 3 models
new_ma<- data.frame(Sex = c("m","f","m","f","m","f","m","f"),
Grass3 = c("1","1","1","1","0","0","0","0"),
Forb3 = c("0","0","1","1","0","0","1","1"))
# Arguments:
# - model = a clmm model
# - modelAvg = a clmm model average (object of class averaging)
# - newdata = a dataframe of new data to apply the model to
# Returns a dataframe of predicted probabilities for each row and response level
fake.predict.clmm <- function(modelAvg, newdata) {
  # Actual prediction function
  pred <- function(eta, theta, cat = 1:(length(theta) + 1), inv.link = plogis) {
    Theta <- c(-1000, theta, 1000)
    sapply(cat, function(j) inv.link(Theta[j + 1] - eta) - inv.link(Theta[j] - eta))
  }
  # Multiply each row by the coefficients
  #coefs <- c(model$beta, unlist(model$ST))##turn off if a model average is used
  beta <- modelAvg$coefficients[2,3:12]
  coefs <- c(beta, unlist(modelAvg$ST))
  xbetas <- sweep(newdata, MARGIN=2, coefs, `*`)
  # Make predictions
  Theta<-modelAvg$coefficients[2,1:2]
  #pred.mat <- data.frame(pred(eta=rowSums(xbetas), theta=model$Theta))
  pred.mat <- data.frame(pred(eta=rowSums(xbetas), theta=Theta))
  #colnames(pred.mat) <- levels(model$model[,1])
  a<-attr(modelAvg, "modelList")
  colnames(pred.mat) <- levels(a[[1]]$model[,1])
  pred.mat
}
I have run a zero-inflated negative binomial model using the glmmADMB package in R. From what I understand, the pz parameter is the zero-inflation parameter and it is fitted by the package to the model that you run- the pz value that best fits your data is searched for and the package starts searching from pz=0.2. This is the default and can be changed.
After you run the model, does anyone know how to find what pz value is chosen for the data?
The zero-inflation estimate can be obtained (along with its standard deviation) from the model object. See below using built-in data from the glmmADMB package:
library(glmmADMB)
# munge data
Owls = transform(Owls, Nest = reorder(Nest, NegPerChick),
logBroodSize = log(BroodSize), NCalls = SiblingNegotiation)
# fit model
fit_zinb = glmmadmb(NCalls ~ (FoodTreatment + ArrivalTime) * SexParent +
offset(logBroodSize) + (1 | Nest),
data = Owls, zeroInflation = TRUE,
family = "nbinom")
# overall summary, check for match
summary(fit_zinb)
# zero-inflation estimate
fit_zinb$pz
# zero-inflation standard deviation
fit_zinb$sd_pz
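To see what the estimate means, the sketch below plugs a zero-inflation probability into the usual zero-inflated negative binomial mixture. The values of pz, mu and size are hypothetical and are not taken from fit_zinb; the point is only that pz is the extra probability mass placed on zero on top of the zeros the negative binomial already produces.

```r
pz   <- 0.2   # hypothetical zero-inflation probability
mu   <- 3     # hypothetical negative binomial mean
size <- 1.5   # hypothetical negative binomial dispersion

# Zero-inflated density: structural zero with probability pz,
# otherwise a draw from the negative binomial
dzinb <- function(y, pz, mu, size) {
  pz * (y == 0) + (1 - pz) * dnbinom(y, mu = mu, size = size)
}

p_zero <- dzinb(0, pz, mu, size)   # probability of observing a zero
p_zero
sum(dzinb(0:1000, pz, mu, size))   # the probabilities sum to (about) 1
```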
I'm doing a replication of an estimation done with Stata's xtregar command, but I'm using R instead.
The xtregar command implements the method from Baltagi and Wu (1999) "Unequally spaced panel data regressions with AR(1) disturbances" paper. As Stata describes it:
xtregar fits cross-sectional time-series regression models when the disturbance term is first-order autoregressive. xtregar offers a within estimator for fixed-effects models and a GLS estimator for random-effects models. xtregar can accommodate unbalanced panels whose observations are unequally spaced over time.
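As I understand the unequally spaced AR(1) setup, the correlation between two disturbances of the same panel unit decays with the time gap between them, so it is rho raised to that gap. A minimal sketch of this correlation structure for one unit follows; the helper ar1_corr, the years and the value of rho are hypothetical, and this is not the xtregar estimator itself.

```r
# AR(1) correlation matrix for possibly unequally spaced time points:
# corr(e_t, e_s) = rho^|t - s|
ar1_corr <- function(times, rho) {
  gaps <- abs(outer(times, times, "-"))
  rho^gaps
}

years <- c(2001, 2002, 2005, 2006)  # hypothetical unit with a gap of three years
R <- ar1_corr(years, rho = 0.6)
round(R, 3)
```

Observations one year apart have correlation 0.6, while the pair 2002 and 2005 has correlation 0.6^3.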
So far, for the fixed-effects model, I used the plm package for R. The attempt looks like this:
plm(data=A, y ~ x1 + x2, effect = "twoways", model = "within")
Nevertheless, it is not complete (compared with the xtregar description), and the results are not quite like the ones Stata provides. Furthermore, Stata's command requires setting a panel variable and a time variable, a feature that is (as far as I can tell) absent in the plm environment.
Should I settle with plm or is there another way of doing this?
PS: I searched several websites thoroughly but failed to find an equivalent to Stata's xtregar.
Update
After reading Croissant and Millo (2008) "Panel Data Econometrics in R: The plm Package", specifically section 7.4 "Some useful 'econometric' models in nlme", I used something like this for the random-effects part of the estimation:
gls(data=A, y ~ x1 + x2, correlation = corAR1(0, form = ~ year | pays), na.action = na.exclude)
Nevertheless the following has results closer to those of Stata
lme(data=A, y ~ x1 + x2, random = ~ 1 | pays, correlation = corAR1(0, form = ~ year | pays), na.action = na.exclude)
Try {panelAR}. This is a package for regressions in panel data that addresses AR1 type of autocorrelations.
Unfortunately, I do not own Stata, so I cannot test which correlation method in panelCorrMethod replicates its results.
library(panelAR)
model <-
  panelAR(formula = y ~ x1 + x2,
          data = A,
          panelVar = 'pays',
          timeVar = 'year',
          autoCorr = 'ar1',
          rho.na = TRUE,
          bound.rho = TRUE,
          panelCorrMethod = 'phet' # You might need to change this parameter. 'phet' uses the Huber-White sandwich estimator for panel heteroskedasticity, but other methods are available.
  )
This question regards how to code variable selection in a probit model with marginal effects (either directly or by calling some pre-existing package).
I'm conducting a little probit regression of the effects of free and commercial availability of films on the level of piracy of those films as a TLAPD-related blog post.
The easy way of running a probit in R is typically through glm, i.e.:
probit <- glm(y ~ x1 + x2, data=data, family =binomial(link = "probit"))
but that's problematic for interpretation because it doesn't supply marginal effects.
Typically, if I want marginal effects from a probit regression I define this function (I don't recall the original source, but it's a popular function that gets re-posted a lot):
mfxboot <- function(modform, dist, data, boot = 500, digits = 3) {
  x <- glm(modform, family = binomial(link = dist), data)
  # get marginal effects: average density of the linear predictor times the coefficients
  pdf <- ifelse(dist == "probit",
                mean(dnorm(predict(x, type = "link"))),
                mean(dlogis(predict(x, type = "link"))))
  marginal.effects <- pdf * coef(x)
  # start bootstrap
  bootvals <- matrix(rep(NA, boot * length(coef(x))), nrow = boot)
  set.seed(1111)
  for (i in 1:boot) {
    samp1 <- data[sample(1:dim(data)[1], replace = TRUE, dim(data)[1]), ]
    x1 <- glm(modform, family = binomial(link = dist), samp1)
    # use the bootstrap fit x1 here (the original version reused x by mistake)
    pdf1 <- ifelse(dist == "probit",
                   mean(dnorm(predict(x1, type = "link"))),
                   mean(dlogis(predict(x1, type = "link"))))
    bootvals[i, ] <- pdf1 * coef(x1)
  }
  res <- cbind(marginal.effects, apply(bootvals, 2, sd), marginal.effects / apply(bootvals, 2, sd))
  # format string for rounding, e.g. "%.3f"
  fmt <- paste("%.", digits, "f", sep = "")
  if (names(x$coefficients[1]) == "(Intercept)") {
    # drop the intercept row; drop = FALSE keeps a matrix with a single predictor
    res1 <- res[2:nrow(res), , drop = FALSE]
    res2 <- matrix(as.numeric(sprintf(fmt, res1)), nrow = dim(res1)[1])
    rownames(res2) <- rownames(res1)
  } else {
    # the original call omitted res and misplaced a parenthesis
    res2 <- matrix(as.numeric(sprintf(fmt, res)), nrow = dim(res)[1])
    rownames(res2) <- rownames(res)
  }
  colnames(res2) <- c("marginal.effect", "standard.error", "z.ratio")
  return(res2)
}
Then run the regression like this:
mfxboot(modform = "y ~ x1 + x2",
dist = "probit",
data = piracy)
but using that approach I don't know that I can run any variable selection algorithms like forward, backward, stepwise, etc.
What's the best way to solve this problem? Is there a better way of running probits in R that reports marginal effects and also allows for automated model selection? Or should I focus on using mfxboot and doing variable selection with that function?
Thanks!
It is not clear why there is a problem. Model (variable) selection and computing of the marginal effects for a given model are sequential, and there is no reason to try to combine the two.
Here is how you might go about computing marginal effects and their bootstrapped standard errors after model (variable) selection:
Perform variable selection using your preferred model selection procedure (including bootstrap model selection techniques as discussed below, not to be confused with the bootstrap you will use to compute the standard errors of the marginal effects for the final model).
Here is an example on the dataset supplied in this question. Note also that this is in no way an endorsement of the use of stepwise variable selection methods.
#================================================
# read in data, and perform variable selection for
# a probit model
#================================================
dfE = read.csv("ENAE_Probit.csv")
formE = emploi ~ genre +
filiere + satisfaction + competence + anglais
glmE = glm(formula = formE,
family = binomial(link = "probit"),
data = dfE)
# perform model (variable) selection
glmStepE = step(object = glmE)
Now pass the selected model to a function that computes the marginal effects.
#================================================
# function: compute marginal effects for logit and probit models
# NOTE: this assumes that an intercept has been included by default
#================================================
fnMargEffBin = function(objBinGLM) {
stopifnot(objBinGLM$family$family == "binomial")
vMargEff = switch(objBinGLM$family$link,
probit = colMeans(outer(dnorm(predict(objBinGLM,
type = "link")),
coef(objBinGLM))[, -1]),
logit = colMeans(outer(dlogis(predict(objBinGLM,
type = "link")),
coef(objBinGLM))[, -1])
)
return(vMargEff)
}
# test the function
fnMargEffBin(glmStepE)
Here is the output:
> fnMargEffBin(glmStepE)
genre filiere
0.06951617 0.04571239
To get the standard errors of the marginal effects, you could bootstrap them using, for example, the Boot function from the car package, since it provides a neat interface for bootstrapping statistics derived from glm estimates.
#================================================
# compute bootstrap std. err. for the marginal effects
#================================================
library(car) # provides Boot()
margEffBootE = Boot(object = glmStepE, f = fnMargEffBin,
                    labels = names(coef(glmStepE))[-1], R = 100)
summary(margEffBootE)
Here is the output:
> summary(margEffBootE)
R original bootBias bootSE bootMed
genre 100 0.069516 0.0049706 0.045032 0.065125
filiere 100 0.045712 0.0013197 0.011714 0.048900
Appendix:
As a matter of theoretical interest, there are two ways to interpret your bootstrapped variable selection ask.
You can perform model selection (variable selection) by using a bootstrap model fit criterion as the measure of fit. The theory for this is outlined in Shao (1996), and requires a subsampling approach.
You then compute marginal effects and their bootstrap standard errors conditional on the best model selected above.
You can perform variable selection on multiple bootstrap samples, and arrive at either one best model by looking at the variables retained across the multiple bootstrap model selections, or use a model averaging estimator. The theory for this is discussed in Austin and Tu (2004).
You then compute marginal effects and their bootstrap standard errors conditional on the best model selected above.
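The second interpretation can be sketched roughly as follows: run the selection procedure on each bootstrap sample and tabulate how often each variable is retained. This is only an illustration on simulated data (the data frame dat, the variables x1 to x3 and the number of resamples are all made up), it uses step purely for convenience, and it is not a full implementation of the procedures in the cited papers.

```r
# Simulated data (hypothetical): only x1 truly affects the outcome
set.seed(123)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, pnorm(-0.3 + dat$x1))

candidates <- c("x1", "x2", "x3")
n_boot <- 50
kept <- matrix(FALSE, nrow = n_boot, ncol = length(candidates),
               dimnames = list(NULL, candidates))

for (b in seq_len(n_boot)) {
  # Resample rows with replacement and redo the selection from the full model
  boot_dat <- dat[sample(n, replace = TRUE), ]
  full <- glm(y ~ x1 + x2 + x3, family = binomial(link = "probit"),
              data = boot_dat)
  chosen <- step(full, trace = 0)
  kept[b, ] <- candidates %in% attr(terms(chosen), "term.labels")
}

# Share of bootstrap samples in which each variable was retained
inclusion_freq <- colMeans(kept)
inclusion_freq
```

Variables retained in most resamples are candidates for the final model, on which the marginal effects and their bootstrap standard errors are then computed as shown earlier.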