I am using the MCMCglmm package to perform mixed-model analysis. Suppose my data look like this:
df_test <- data.frame(ID = 1:10, a = 171:180, b = 71:80 + rnorm(10),
age = factor(c(rep("young", 3), rep("mid", 4), rep("old", 3)),
levels = c("young", "mid", "old")))
For linear models, I can easily run summary(Manova(lm(cbind(a, b) ~ age + 0, data = df_test))), with Manova() from the car package, to get the table and see the clear effect of age (please ignore the collinearity issue here).
However, what if I use MCMCglmm to account for the random effect of ID? Say the MCMCglmm object is called "mc_model": how can I use Manova or a similar method to look at the effects of multiple categorical variables, each with numerous levels?
Thanks!
I have an R coding question.
This is my first time asking a question here, so apologies if I am unclear or do something wrong.
I am trying to use a Generalized Linear Mixed Model (GLMM) with Poisson error family to test for any significant effect on a count response variable by three separate dichotomous variables (AGE = ADULT or JUVENILE, SEX = MALE or FEMALE and MEDICATION = NEW or OLD) and an interaction between AGE and MEDICATION (AGE:MEDICATION).
There is some dependency in my data: it was collected from a total of 22 different sites (coded as a SITE vector with 33 distinct levels) over a total of 21 different years (coded as a YEAR vector with 21 distinct levels and treated as a categorical variable). Unfortunately, not every SITE was sampled in every YEAR; some were sampled in more years than others.
The data is also quite sparse, in that I do not have a great number of measurements of the response variable (coded as COUNT and an integer vector) per SITE per YEAR.
My Poisson GLMM is constructed using the following code:
model <- glmer(data = mydata,
family = poisson(link = "log"),
formula = COUNT ~ SEX + AGE + MEDICATION + AGE:MEDICATION + (1|SITE/YEAR),
offset = log(COUNT.SAMPLE.SIZE),
nAGQ = 0)
In order to try and obtain more reliable estimates for the fixed effect coefficients (particularly given the sparse nature of my data), I am trying to obtain 95% confidence intervals for the fixed effect coefficients through non-parametric bootstrapping.
I have come across the "glmmboot" package, which can be used to conduct non-parametric bootstrapping of GLMMs. I try to run the bootstrap using the following code:
library(glmmboot)
bootstrap_model(base_model = model,
base_data = mydata,
resamples = 1000)
When I run this code, I receive the following message:
Performing case resampling (no random effects)
Naturally, though, my model does have random effects, namely (1|SITE/YEAR).
If I try to tell the function to resample from a specific block by adding the "resample_specific_blocks" argument, i.e.:
library(glmmboot)
bootstrap_model(base_model = model,
base_data = mydata,
resamples = 1000,
resample_specific_blocks = "YEAR")
Then I get the following error message:
Performing block resampling, over SITE
Error: Invalid grouping factor specification, YEAR:SITE
I get a similar error message if I try to set 'resample_specific_blocks' to "SITE".
If I then try to set 'resample_specific_blocks' to "SITE:YEAR" or "SITE/YEAR", I get the following error message:
Error in bootstrap_model(base_model = model, base_data = mydata, resamples = 1000, :
No random columns from formula found in resample_specific_blocks
I have tried explicitly nesting YEAR within SITE and then adapting the model accordingly using the code:
mydata <- within(mydata, SAMPLE <- factor(SITE:YEAR))
model.refit <- glmer(data = mydata,
family = poisson(link = "log"),
formula = COUNT ~ SEX + AGE + MEDICATION + AGE:MEDICATION + (1|SITE) + (1|SAMPLE),
offset = log(COUNT.SAMPLE.SIZE),
nAGQ = 0)
bootstrap_model(base_model = model.refit,
base_data = mydata,
resamples = 1000,
resample_specific_blocks = "SAMPLE")
But unfortunately I just get this error message:
Error: Invalid grouping factor specification, SITE
The same error message comes up if I set resample_specific_blocks argument to SITE, or if I just remove the resample_specific_blocks argument.
I believe that the case_bootstrap() function in the lmeresampler package could be another option, but according to its help page I would need to write a function of my own, and unfortunately I have no experience with creating functions in R.
If anyone has any suggestions on how I can get the bootstrap_model() function in the glmmboot package to recognise the random effects in my model/dataframe, or any suggestions for alternative methods on conducting non-parametric bootstrapping to create 95% confidence intervals for the coefficients of the fixed effects in my model, it would be greatly appreciated! Many thanks in advance, and for reading such a lengthy question!
For reference, I include links to the RDocumentation and GitHub for the glmmboot package:
https://www.rdocumentation.org/packages/glmmboot/versions/0.6.0
https://github.com/ColmanHumphrey/glmmboot
The following code creates a reproducible example using the grouseticks data set from lme4:
#Load in required packages
library(tidyverse)
library(lme4)
library(glmmboot)
library(psych)
#Load in the grouseticks dataframe
data("grouseticks")
tibble(grouseticks)
#Create dummy vectors for SEX, AGE and MEDICATION
set.seed(1)
SEX <-sample(1:2, size = 403, replace = TRUE)
SEX <- as.factor(ifelse(SEX == 1, "MALE", "FEMALE"))
set.seed(2)
AGE <- sample(1:2, size = 403, replace = TRUE)
AGE <- as.factor(ifelse(AGE == 1, "ADULT", "JUVENILE"))
set.seed(3)
MEDICATION <- sample(1:2, size = 403, replace = TRUE)
MEDICATION <- as.factor(ifelse(MEDICATION == 1, "OLD", "NEW"))
grouseticks$SEX <- SEX
grouseticks$AGE <- AGE
grouseticks$MEDICATION <- MEDICATION
#Use the INDEX vector to create a vector of sample sizes per LOCATION
#per YEAR
grouseticks$INDEX <- 1
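sample.sizes <- grouseticks %>%
group_by(LOCATION, YEAR) %>%
summarise(SAMPLE.SIZE = sum(INDEX), .groups = "drop")
#Combine the dataframes together into the dataframe to be used in the
#model (this join was missing, so mydata was never created)
mydata <- left_join(grouseticks, sample.sizes, by = c("LOCATION", "YEAR"))
mydata$SAMPLE.SIZE <- as.integer(mydata$SAMPLE.SIZE)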
#Create the Poisson GLMM model
model <- glmer(data = mydata,
family = poisson(link = "log"),
formula = TICKS ~ SEX + AGE + MEDICATION + AGE:MEDICATION + (1|LOCATION/YEAR),
nAGQ = 0)
#Attempt non-parametric bootstrapping on the model to get 95%
#confidence intervals for the coefficients of the fixed effects
set.seed(1)
Model.bootstrap <- bootstrap_model(base_model = model,
base_data = mydata,
resamples = 1000)
Model.bootstrap
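As one possible alternative to bootstrap_model(), a cluster (block) bootstrap can be written by hand. The following is only a minimal sketch, not the glmmboot method: it resamples whole LOCATION clusters from lme4's grouseticks data with replacement, relabels each draw so that a cluster drawn twice counts as two clusters, refits the model, and takes percentile intervals of the fixed effects. The helper names (resample_clusters, fit_once), the BOOT_ID column, the zHEIGHT predictor, and the choice of 100 resamples are all assumptions made for illustration.

```r
library(lme4)

data("grouseticks", package = "lme4")
# Standardised height as a simple numeric fixed effect (illustrative choice)
grouseticks$zHEIGHT <- as.numeric(scale(grouseticks$HEIGHT))

# Draw whole clusters with replacement and give every draw a fresh label
# (hypothetical helper, not part of any package)
resample_clusters <- function(data, cluster) {
  ids <- unique(as.character(data[[cluster]]))
  drawn <- sample(ids, length(ids), replace = TRUE)
  pieces <- lapply(seq_along(drawn), function(i) {
    piece <- data[as.character(data[[cluster]]) == drawn[i], , drop = FALSE]
    piece$BOOT_ID <- i
    piece
  })
  out <- do.call(rbind, pieces)
  out$BOOT_ID <- factor(out$BOOT_ID)
  out
}

# Refit the Poisson GLMM on one resampled data set and return the fixed effects
fit_once <- function(data) {
  m <- glmer(TICKS ~ zHEIGHT + (1 | BOOT_ID), data = data,
             family = poisson(link = "log"), nAGQ = 0)
  fixef(m)
}

set.seed(1)
n_boot <- 100  # small for speed; more resamples give more stable intervals
boot_est <- t(replicate(n_boot, fit_once(resample_clusters(grouseticks, "LOCATION"))))

# Percentile 95% intervals, one column per fixed-effect coefficient
boot_ci <- apply(boot_est, 2, quantile, probs = c(0.025, 0.975))
boot_ci
```

Resampling at the top grouping level keeps each cluster's observations together. With categorical fixed effects, a resample can in principle lose a factor level entirely, so the number of coefficients returned should be checked before the results are combined.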
I want to fit a very simple mixed-effects model, with a couple of fixed effects and random intercepts (no random slopes), using the mlogit package in R. My categorical outcome variable has three levels, so I cannot use the lme4 package.
However, I keep googling and stack-ing and CRAN-ing (?) about this, but nowhere am I able to find a good solution. Any help out there on how to do this with the mlogit package? -- Or are there any similar alternatives in other R packages (or in SPSS, Stata or Minitab, or via packages in Python/Julia)?
See code below for my data structure and what type of model I would like to fit (I know how to fit a fixed-effects only model with mlogit (cf. fixed_model below); I just want to add random intercepts):
library(mlogit)
library(dfidx)
# Make variables:
Outcome = c("y","z","y","z","x","z","y","x","x","x","z",
"y","z","x","x","y","z","x", "x", "y")
Predictor = rep(c("M", "F"), 10)
RandomIntercept = rep(c("A", "B", "C", "D"), 5)
# Make data frame
df <- data.frame(Outcome, Predictor, RandomIntercept)
# Make mlogit-ready dataframe:
df_mlogit <- dfidx(df, choice = "Outcome", shape = "wide", id.var = "RandomIntercept")
# Display first observations:
head(df_mlogit)
# Make fixed-effect-only model:
fixed_model <- mlogit::mlogit(Outcome ~ 1 | Predictor, data = df_mlogit, reflevel = "x")
#Display results:
fixed_model
# The kind of model I want, in lme4-syntax:
dream_model <- lme4::glmer(Outcome ~ Predictor + (1|RandomIntercept), family = "binomial")
I am using glmulti to run hierarchical linear models and select the best model. I have 4 predictors (A, B, C, D) of the DV, and my goal is to run all main-effect models plus all combinations of two-way interaction effects (i.e., A:B, A:C, A:D, and so on). How do the following two models differ from each other?
library(glmulti)
# wrapper
glmer.glmulti <- function(formula, data, random = ""){
glmer(paste(deparse(formula), random), data = data, family = binomial)}
# model 1
glmulti(DV ~ A+B+C+D, level = 2, fitfunction = glmer.glmulti, random = "+ (1|ID)",
method = "g", data = df)
# model 2
glmulti(DV ~ A*B*C*D, level = 2, fitfunction = glmer.glmulti, random = "+ (1|ID)",
method = "g", data = df)
I know that "when an interaction between two factors is included in a model, then adding or not these factors as main effects does not change the model" (Calcagno, 2010). It seems that model 1 and model 2 should produce the same results, because A*B*C*D essentially includes the main effect of each predictor. But the two calls select different best models.
Thanks!
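For reference, the terms that each formula expands to in base R can be listed with terms(). This sketch only shows what the formulas themselves contain; it makes no claim about how glmulti builds its candidate set from them. The object names are made up for illustration.

```r
# Term labels implied by each formula (no data needed)
main_only <- attr(terms(DV ~ A + B + C + D), "term.labels")
full_star <- attr(terms(DV ~ A * B * C * D), "term.labels")
two_way   <- attr(terms(DV ~ (A + B + C + D)^2), "term.labels")

length(main_only)  # 4 main effects
length(full_star)  # 15 terms: main effects plus all 2-, 3- and 4-way interactions
length(two_way)    # 10 terms: main effects plus the 6 pairwise interactions

# Terms present in A*B*C*D but absent from the pairwise-only formula
setdiff(full_star, two_way)
```

So A*B*C*D is not simply "main effects plus pairwise interactions": taken as a formula, it also carries the three-way and four-way terms.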
So, I am working with a big dataset (55965 points). I am trying to fit an LME that accounts for spatial correlation, but R returns this error:
Error: 'sumLenSq := sum(table(groups)^2)' = 3.13208e+09 is too large.
Too large or no groups in your correlation structure?
I cannot subset it since I need all the points. My questions are:
Is there some setting I can change in the function?
If not, is there another package with a similar function that could handle such a big dataset?
Here is a reproducible example:
require(nlme)
my.data<- matrix(data = 0, nrow = 55965, ncol = 3)
my.data<- as.data.frame(my.data)
dummy <- rep(1, 55965)
my.data$dummy<- dummy
my.data$V1<- seq(780, 56744)
my.data$V2<- seq_len(55965)
my.data$X<- seq(49.708, 56013.708)
my.data$Y<-seq(-12.74094, -55977.7409)
null.model <- lme(fixed = V1~ V2, data = my.data, random = ~ 1 | dummy, method = "ML")
spatial_model <- update(null.model, correlation = corGaus(1, form = ~ X + Y), method = "ML")
Since you have assigned a grouping factor with only one level, there are no groups in the data, which is what the error message reports. If you just want to account for spatial autocorrelation, with no other random effects, use gls from the same package.
Edit: A further note on two different approaches to modelling spatial autocorrelation: corGaus (and the other corSpatial-type functions) implement spatial correlation models for the regression residuals, which is different from, say, a spatial random effect added to the model based on county/district/grid identity.
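The figure in the error message can be reproduced in base R from the quantity it names, sum(table(groups)^2). The sketch below uses the single-level dummy factor from the question, then a purely hypothetical split into 100 blocks (the block variable is an assumption for illustration, not something in the original data).

```r
n <- 55965

# One grouping level, as in the question: every point is in the same group
dummy <- rep(1, n)
sum_len_sq <- sum(as.numeric(table(dummy))^2)
formatC(sum_len_sq, format = "e", digits = 5)  # "3.13208e+09", as in the error
sum_len_sq > .Machine$integer.max              # TRUE

# Hypothetical: the same points split into 100 blocks of roughly equal size
block <- rep(seq_len(100), length.out = n)
sum_len_sq_blocked <- sum(as.numeric(table(block))^2)
sum_len_sq_blocked
```

With a single group, the quantity grows with the square of the full sample size; with many smaller groups it is far smaller. Restricting the correlation structure to within-group pairs changes the model, though, so whether that is acceptable depends on the analysis.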
I am trying to replicate some growth mixture modeling results from Mplus in R, using the lcmm package. I am having some trouble converting the model specification from the Mplus framework to a linear mixed-effects model framework in R.
In Mplus, a 2-class growth mixture model was specified like this:
VARIABLE: NAMES ARE y1-y4;
CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE;
MODEL:
%OVERALL%
i s | y1@0 y2@1 y3@2 y4@3;
OUTPUT:
TECH1;
In R, using the lcmm() package, I think the model specification should look something like this:
m1 <- hlme(y ~ time,
mixture = ~ time,
random = ~ time,
subject = 'id',
ng = 2,
idiag = FALSE,
data = dat.long)
However, I am not sure if I am specifying the mixture and random arguments correctly. Any suggestions would be appreciated!
Data is here: http://www.statmodel.com/usersguide/chap8/ex8.1.dat
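Since hlme() works on long-format data, the wide columns y1-y4 need to be stacked first, with a time variable that matches the Mplus time scores 0, 1, 2, 3. The following is a minimal base-R sketch of that reshaping step. The toy values and the id column are made up for illustration and are not taken from ex8.1.dat.

```r
# Toy wide data: one row per subject, four repeated measures (made-up values)
wide <- data.frame(
  id = 1:3,
  y1 = c(0.2, 1.1, -0.4),
  y2 = c(0.9, 1.8, 0.1),
  y3 = c(1.7, 2.9, 0.3),
  y4 = c(2.4, 3.6, 0.8)
)

# Stack y1..y4 into a single column y, coding time as 0..3
dat_long <- reshape(
  wide,
  direction = "long",
  varying   = paste0("y", 1:4),
  v.names   = "y",
  timevar   = "time",
  times     = 0:3,
  idvar     = "id"
)

# Order by subject, then time, and drop the automatic row names
dat_long <- dat_long[order(dat_long$id, dat_long$time), ]
rownames(dat_long) <- NULL
dat_long
```

Coding time as 0 to 3 means the intercept refers to the first measurement occasion, which mirrors fixing the first time score at 0 in the Mplus growth statement.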