ggcoef_model error when two random intercepts - r

When trying to graph the conditional fixed effects of a glmmTMB model with two random intercepts in GGally I get the error:
There was an error calling "tidy_fun()". Most likely, this is because the
function supplied in "tidy_fun=" was misspelled, does not exist, is not
compatible with your object, or was missing necessary arguments (e.g. "conf.level=" or "conf.int="). See error message below.
Error: Error in "stop_vctrs()":
! Can't recycle "..1" (size 3) to match "..2" (size 2).
I have tinkered with the issue, and it seems to be related to the two random intercepts included in the model. I have also tried extracting the coefficient and standard error information separately through broom.mixed::tidy and then feeding the data frame into GGally::ggcoef(), to no avail. Any suggestions?
library(glmmTMB)
library(GGally)
# Example with built-in randu data set
data(randu)
randu$A <- factor(rep(c(1,2), 200))
randu$B <- factor(rep(c(1,2,3,4), 100))
# Model
test <- glmmTMB(y ~ x + z + (0 +x|A) + (1|B), family="gaussian", data=randu)
# A few of my attempts at graphing--works fine when only one random effects term is in model
ggcoef_model(test)
ggcoef_model(test, tidy_fun = broom.mixed::tidy)
ggcoef_model(test, tidy_fun = broom.mixed::tidy, conf.int = T, intercept=F)
ggcoef_model(test, tidy_fun = broom.mixed::tidy(test, effects="fixed", component = "cond", conf.int = TRUE))

Some (old!) bugs that have recently been fixed (here, here) would make confidence interval reporting on random-effect (RE) parameters break for any model with multiple random terms (I think). I believe that if you are able to install updated versions of both glmmTMB and broom.mixed:
remotes::install_github("glmmTMB/glmmTMB/glmmTMB@ci_tweaks")
remotes::install_github("bbolker/broom.mixed")
then ggcoef_model(test) will work.
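If installing development versions is not an option, one thing that may be worth trying is a wrapper that asks broom.mixed for the fixed effects only, so that no confidence intervals are requested for the random-effect parameters. This is an untested sketch: `tidy_fixed` is a hypothetical helper name, and whether it avoids the error depends on the installed package versions.

```r
# Hypothetical helper (not part of GGally or broom.mixed): restrict tidying
# to the fixed effects and pass every other argument through unchanged.
tidy_fixed <- function(x, ...) {
  broom.mixed::tidy(x, effects = "fixed", ...)
}

# Usage, assuming `test` is the fitted glmmTMB model from the question and
# GGally is installed:
# GGally::ggcoef_model(test, tidy_fun = tidy_fixed)
```

Note that tidy_fun= expects a function, not the result of calling one, which is why the last attempt in the question cannot work as written.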

Related

error when using causalweights package in R

I was trying to estimate a causal effect using inverse probability weighting from the causalweights package. However, I keep running into the following error message:
Error in model.frame.default(formula = d ~ x, drop.unused.levels = TRUE) :
variable lengths differ (found for 'x')
I want to estimate the causal effect taking into consideration a matrix including multiple control variables. When using a single control from the data-set, R manages to generate an estimate, but when I try to use the matrix including all my control variables, I receive the above-mentioned error message.
My code is below. It produces estimates when x is a single control, but fails when x is my predefined matrix of multiple controls:
attach(data_clean2)
controls <- cbind(marits_1, nationality1, mother_tongue1, educ1,
lastj_fct1, child_subsidies, contr_2y,
unempl_r, gdp_gr, insured_earn)
ipw_atet <- treatweight(y = duration_ue2, # take initial data
d = treatment,
x = controls,
ATET = TRUE, # if = FALSE, estimates ATE (default)
trim = (1-pscore_max0),
boot = 2)
Has anyone encountered similar problems and found a solution?
Thanks in advance
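The message itself comes from model.frame(), which stops when the variables in a formula do not all have the same number of observations. A minimal sketch with made-up data (the objects `d` and `x` below are hypothetical stand-ins, not the data from the question) reproduces it and shows a quick check that could be run on the treatment vector and the control matrix before calling treatweight():

```r
set.seed(1)
d <- rbinom(10, 1, 0.5)           # 10 treatment indicators
x <- matrix(rnorm(16), nrow = 8)  # control matrix with only 8 rows

# Same call as in the error message; capture the message instead of stopping
msg <- tryCatch(
  {
    model.frame(d ~ x, drop.unused.levels = TRUE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Quick diagnostic: the treatment vector and the control matrix must agree
lengths_agree <- length(d) == nrow(x)
print(lengths_agree)
```

If the same comparison fails on the real objects, the mismatch is probably introduced when the matrix is built, for example by a column that comes from a different or outdated copy of the data.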

Unable to conduct variable importance in r

I'm trying to test the variable importance before running the actual regression. But, when I attempt to do so, I get this error:
Error in varImp(regressor, scale = FALSE) :
trying to get slot "responses" from an object (class "randomForest.formula") that is not an S4 object
I've tried looking up the error, but there wasn't much information available online. What can I do to fix this?
all = read.csv('https://raw.githubusercontent.com/bandcar/massShootings/main/all.csv')
# Check Variable importance with randomForest
regressor <- randomForest::randomForest(total_victims ~ . , data = all, importance=TRUE) # fit the random forest with default parameter
caret::varImp(regressor, scale = FALSE) # get variable importance, based on mean decrease in accuracy
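One alternative that may be worth trying is randomForest's own importance() accessor, which works directly on the fitted object. A minimal sketch on the built-in mtcars data (a stand-in for the data in the question; it assumes the randomForest package is installed):

```r
library(randomForest)

set.seed(42)
# Regression forest; importance = TRUE is needed for the permutation measure
rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)

# type = 1 selects the permutation-based measure (%IncMSE for regression);
# scale = FALSE returns it without dividing by its standard error
imp <- importance(rf, type = 1, scale = FALSE)
print(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE])
```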

Error while using the weights option in nlme in r

Sorry, this is cross-posted from https://stats.stackexchange.com/questions/593717/nlme-regression-with-weights-syntax-in-r, but I thought it might be more appropriate to post it here.
I am trying to fit a power curve to model some observations with nlme. However, I know some observations to be less reliable than others (the reliability of each OBSID is reflected in WEIV in the dummy data), relatively independently of variance. I quantified this beforehand and wish to include it as weights in my model. Moreover, I know that part of my variance is correlated with my independent variable, so I cannot use the variance directly as weights.
This is my model:
coeffs_start = lm(log(DEPV)~log(INDV), filter(testdummy10,DEPV!=0))$coefficients
nlme_fit <- nlme(DEPV ~ a*INDV^b,
data = testdummy10,
fixed=a+b~ 1,
random = a~ 1,
groups = ~ PARTID,
start = c(a=exp(coeffs_start[1]), b=coeffs_start[2]),
verbose = F,
method="REML",
weights=varFixed(~WEIV))
This is some sample dummy data (I know it is not a great fit but it's fake data anyway) : https://github.com/FlorianLeprevost/dummydata/blob/main/testdummy10.csv
This runs well without the "weights" argument, but when I add it I get this error and I am not sure why because I believe it is the correct syntax:
Error in recalc.varFunc(object[[i]], conLin) :
dims [product 52] do not match the length of object [220]
In addition: Warning message:
In conLin$Xy * varWeights(object) :
longer object length is not a multiple of shorter object length
Thanks in advance!
This looks like a very long-standing bug in nlme. I have a patched version on GitHub, which you can install via remotes::install_github() as below:
remotes::install_github("bbolker/nlme")
testdummy10 <- read.csv("testdummy10.csv") |> subset(DEPV>0 & INDV>0)
coeffs_start <- coef(lm(log(DEPV)~log(INDV), testdummy10))
library(nlme)
nlme_fit <- nlme(DEPV ~ a*INDV^b,
data = testdummy10,
fixed=a+b~ 1,
random = a~ 1,
groups = ~ PARTID,
start = c(a=exp(coeffs_start[1]),
b=coeffs_start[2]),
verbose = FALSE,
method="REML",
weights=varFixed(~WEIV))
packageVersion("nlme") ## 3.1.160.9000

Is it possible to use lqmm with a mira object?

I am using the package lqmm to run a linear quantile mixed model on an imputed object of class mira from the package mice. I tried to make a reproducible example:
library(lqmm)
library(mice)
summary(airquality)
imputed<-mice(airquality,m=5)
summary(imputed)
fit1<-lqmm(Ozone~Solar.R+Wind+Temp+Day,random=~1,
tau=0.5, group= Month, data=airquality,na.action=na.omit)
fit1
summary(fit1)
fit2<-with(imputed, lqmm(Ozone~Solar.R+Wind+Temp+Day,random=~1,
tau=0.5, group= Month, na.action=na.omit))
"Error in lqmm(Ozone ~ Solar.R + Wind + Temp + Day, random = ~1, tau = 0.5, :
`data' must be a data frame"
Yes, it is possible to get lqmm() to work in mice. Viewing the code for lqmm(), it turns out that it's a picky function. It requires that the data argument is supplied, and although it appears to check if the data exists in another environment, it doesn't seem to work in this context. Fortunately, all we have to do to get this to work is capture the data supplied from mice and give it to lqmm().
fit2 <- with(imputed,
lqmm(Ozone ~ Solar.R + Wind + Temp + Day,
data = data.frame(mget(ls())),
random = ~1, tau = 0.5, group = Month, na.action = na.omit))
The explanation is that ls() gets the names of the variables available, mget() gets those variables as a list, and data.frame() converts them into a data frame.
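The same trick can be tried on an ordinary data frame, since with() evaluates its expression in an environment built from the columns. A small sketch with a made-up data frame (`df` is purely illustrative); note that ls() returns names in alphabetical order, so the rebuilt data frame may have its columns reordered:

```r
df <- data.frame(b = 1:3, a = c(2.5, 3.5, 4.5))

# Inside with(), ls() sees the columns and mget() fetches them as a named list
rebuilt <- with(df, data.frame(mget(ls())))

print(names(rebuilt))
print(nrow(rebuilt))
```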
The next problem you're going to find is that mice::pool() requires tidy() and glance() methods to properly pool the multiple imputations. It looks like neither broom nor broom.mixed has those defined for lqmm. I threw together a very quick and dirty implementation, which you could use if you can't find anything else.
To get pool(fit2) to run you'll need to create the function tidy.lqmm() as below. Then pool() will assume the sample size is infinite and perform the calculations accordingly. You can also create the glance.lqmm() function before running pool(fit2), which will tell pool() the residual degrees of freedom. Afterwards you can use summary(pooled) to find the p-values.
tidy.lqmm <- function(x, conf.int = FALSE, conf.level = 0.95, ...) {
broom:::as_tidy_tibble(data.frame(
estimate = coef(x),
std.error = sqrt(
diag(summary(x, covariance = TRUE,
R = 50)$Cov[names(coef(x)),
names(coef(x))]))))
}
glance.lqmm <- function(x, ...) {
broom:::as_glance_tibble(
logLik = as.numeric(stats::logLik(x)),
df.residual = summary(x, R = 2)$rdf,
nobs = stats::nobs(x),
na_types = "rii")
}
Note: lqmm uses bootstrapping to estimate the standard error. By default it uses R = 50 bootstrapping replicates, which I've copied in the tidy.lqmm() function. You can change that line to increase the number of replicates if you like.
WARNING: Use these functions and the results with caution. I know just enough to be dangerous. To me it looks like these functions work to give sensible results, but there are probably intricacies that I'm not aware of. If you can find a more authoritative source for similar functions that work, or someone who is familiar with lqmm or pooling mixed models, I'd trust them more than me.

Strange glmulti results: Why are interaction variables from the candidate model dropped/not included?

I have been using glmulti to obtain model averaged estimates and relative importance values for my variables of interest. In running glmulti I specified a candidate model for which all variables and interactions were included based on a priori knowledge (see code below).
After running the glmulti model I studied the results using the functions summary() and weightable(). There seem to be a number of strange things going on with the results which I do not understand.
First of all, when I run my candidate model with the lme4 glmer() function I obtain an AIC value of 2086. In the glmulti output this candidate model (with exactly the same formula) has a higher AIC value (2107), as a result of which it appears at position 8 out of 26 in the list of all potential models (as obtained through the weightable() function).
What seems to be causing this problem is that the logArea:Habitat interaction is dropped from the candidate model, despite level=2 being specified. The function summary(output_new@objects[[8]]) provides a different formula (without the logArea:Habitat interaction variable) compared to the formula provided through weightable(). This explains why the candidate model AIC value is not the same as obtained through lme4, but I do not understand why the interaction variable logArea:Habitat is missing from the formula. The same is happening for other possible models. It seems that for all models with 2 or more interactions, one interaction is dropped.
Does anyone have an explanation for what is going on? Any help would be much appreciated!
Best,
Robert
Note: I have created a subset of my data (https://drive.google.com/open?id=1rc0Gkp7TPdnhW6Bw87FskL5SSNp21qxl) and simplified the candidate model by removing variables in order to decrease model run time. (The problem remains the same)
newdat <- Data_ommited2[, c("Presabs","logBodymass", "logIsolation", "Matrix", "logArea", "Protection","Migration", "Habitat", "Guild", "Study","Species", "SpeciesStudy")]
glmer.glmulti <- function (formula, data, random, ...) {
glmer(paste(deparse(formula), random), data = data, family=binomial(link="logit"),contrasts=list(Matrix=contr.sum, Habitat=contr.treatment, Protection=contr.treatment, Guild=contr.sum),glmerControl(optimizer="bobyqa", optCtrl = list(maxfun = 100000)))
}
output_new <- glmulti(y = Presabs ~ Matrix + logArea*Protection + logArea*Habitat,
data = sampledata,
random = '+(1|Study)+(1|Species)+(1|SpeciesStudy)',
family = binomial,
method = 'h',
level=2,
marginality=TRUE,
crit = 'aic',
fitfunc = glmer.glmulti,
confsetsize = 26)
print(output_new)
summary(output_new)
weightable(output_new)
I found a post (https://stats.stackexchange.com/questions/341356/glmulti-package-in-r-reporting-incorrect-aicc-values) of someone who encountered the same problem and it appears that the problem was caused by this line of code:
glmer.glmulti <- function (formula, data, random, ...) {
glmer(paste(deparse(formula), random), data = data, family=binomial(link="logit"))
}
By changing this part of the code into the following the problem was solved:
glmer.glmulti<-function(formula,data,random,...) {
newf <- formula
newf[[3]] <- substitute(f+r,
list(f=newf[[3]],
r=reformulate(random)[[2]]))
glmer(newf,data=data,
family=binomial(link="logit"))
}
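A likely reason the original wrapper misbehaves is that deparse() splits long expressions into several strings (its default width.cutoff is 60), so paste() returns more than one string and any piece after the first can be lost when the text is turned back into a formula. A minimal base-R sketch, using a made-up formula long enough to trigger the split (the two extra terms are only there for length):

```r
f <- Presabs ~ Matrix + logArea * Protection + logArea * Habitat +
  logBodymass + logIsolation
random <- "+(1|Study)+(1|Species)+(1|SpeciesStudy)"

# deparse() returns several strings for a long formula, so paste() does too
pieces <- deparse(f)
pasted <- paste(pieces, random)
print(length(pasted))

# Building the formula as a language object avoids the round trip through text
newf <- f
newf[[3]] <- substitute(f + r,
                        list(f = newf[[3]],
                             r = reformulate(random)[[2]]))
print(all.vars(newf))
```

Because the second construction never converts the formula to text, every fixed-effect term survives regardless of how long the formula is.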
