GLM BACI analysis in R

I am trying to conduct a BACI analysis in R using logistic regression. Due to the use of reference levels in the output of GLMs, I am having difficulty interpreting my results. Has anyone had any luck retrieving a summary of all pairwise interactions?
(Depth is a continuous predictor variable, but I can convert it to a categorical if necessary.)
Towards <- c(4,7,9,0,15,10,11,23,1,4)
Total <- c(6,14,10,7,15,12,20,41,5,8)
Depth <- c(-.3,-.25,-.21,-.17,-.05,0,0,.25,.5,.56)
DPM <- c("Pre","Post","Pre","Pre","Post","Pre","Post","Post","Post","Pre")
Proximity <- c("Far","Near","East","East","East","Near","Far", "Far","Near","Far")
Area <- c("DPM","Control","Control","DPM","Control","Control",
"DPM","DPM","Control","DPM")
Data <- data.frame(Towards, Total, Depth, DPM, Proximity, Area)
mod <- glm(cbind(Towards, Total-Towards) ~ DPM*Site*Depth,
data=LogReg, family=binomial('logit'))

Related

Asymptotic regression function not correlating with raw data

I'm trying to model raw data by an asymptotic function with the equation $$f(x) = a + (b-a)(1-\exp(-c x))$$ using R. To do so I used the following code:
rawData <- import("path/StackTestData.tsv")
# executing regression
X <- rawData$x
Y <- rawData$y
model <- drm(Y ~ X, fct = DRC.asymReg())
# creating the regression function
f_0_ <- model$coefficients[1] #value for y if x=0
steepness <- model$coefficients[2]
plateau <- model$coefficients[3]
eq <- function(x){f_0_+(plateau-f_0_)*(1-exp(-steepness*x))}
# plotting the regression function together with the raw data
ggplot(rawData,aes(x=x,y=y)) +
geom_line(col="red") +
stat_function(fun=eq,col="blue") +
ylim(10,12.5)
In some cases, I got a proper regression function. However, with the attached data I don't get one. The regression function is not showing any correlation with the raw data whatsoever, as shown in the figure below. Can you perhaps offer a better solution for performing the asymptotic regression or do you know where the error lies?
Best Max
R 4.1.2 was used with RStudio 1.4.1106. For ggplot the package ggpubr was loaded; for DRC.asymReg() the packages aomisc and drc were loaded.

Getting standardized coefficients for a glmer model?

I've been asked to provide standardized coefficients for a glmer model, but am not sure how to obtain them. Unfortunately, the beta function does not work on glmer models:
Error in UseMethod("beta") :
no applicable method for 'beta' applied to an object of class "c('glmerMod', 'merMod')"
Are there other functions I could use, or would I have to write one myself?
Another problem is that the model contains several continuous predictors (which operate on similar scales) and 2 categorical predictors (one with 4 levels, one with six levels). The purpose of using the standardized coefficients would be to compare the impact of the categorical predictors to those of the continuous ones, and I'm not sure that standardized coefficients are the appropriate way to do so. Are standardized coefficients an acceptable approach?
The model is as follows:
model=glmer(cbind(nr_corr,maximum-nr_corr) ~ (condition|SUBJECT) + categorical_1 + categorical_2 + continuous_1 + continuous_2 + continuous_3 + continuous_4 + categorical_1:categorical_2 + categorical_1:continuous_3, data, control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=100000)), family = binomial)
reghelper::beta simply standardizes the numeric variables in the dataset. So, assuming your categorical variables are factors rather than numeric dummy variables or other contrast encodings, we can standardize the numeric variables ourselves fairly simply:
# value = TRUE returns the variable names rather than their positions
vars <- grep('^continuous(.*)?', all.vars(formula(model)), value = TRUE)
f <- function(var, data)
  scale(data[[var]])
data[, vars] <- lapply(vars, f, data = data)
update(model, data = data)
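To see what standardizing a predictor does to its coefficient, here is a minimal sketch using lm() on made-up data. The data and the names d, fit_raw and fit_std are assumptions for illustration only, not taken from the question. The same rescaling logic should carry over to the fixed effects of a glmer fit, though there the refit only agrees up to optimizer tolerance.

```r
# Illustrative data (made up for this sketch, not from the question)
set.seed(1)
d <- data.frame(x = rnorm(50, mean = 10, sd = 3))
d$y <- 2 + 0.5 * d$x + rnorm(50)

fit_raw <- lm(y ~ x, data = d)

# Standardize the predictor: subtract the mean, divide by the standard deviation
d_std <- d
d_std$x <- as.numeric(scale(d$x))
fit_std <- update(fit_raw, data = d_std)

# The standardized slope equals the raw slope multiplied by sd(x)
slope_check <- all.equal(unname(coef(fit_std)["x"]),
                         unname(coef(fit_raw)["x"]) * sd(d$x))
slope_check
```

So a standardized coefficient is the expected change in the response per one standard deviation of the predictor, which is why it has no obvious counterpart for factor levels.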
Now for the more general case we can more or less just as easily create our own beta.merMod function. However, we will need to take into account whether or not it makes sense to standardize y. For example, for a Poisson model only non-negative integer values make sense. A further question is whether to scale the random slope effects, and whether it even makes sense to ask this in the first place. In the function below I assume that categorical variables are encoded as character or factor and not numeric or integer.
beta.merMod <- function(model,
                        x = TRUE,
                        y = !family(model) %in% c('binomial', 'poisson'),
                        ran_eff = FALSE,
                        skip = NULL,
                        ...){
  # Extract all names from the model formula
  vars <- all.vars(form <- formula(model))
  lhs <- all.vars(form[[2]])
  # Get the names of the random-effect grouping factors from the model
  ranef <- names(ranef(model))
  # Remove ranef and lhs from vars
  rhs <- vars[!vars %in% c(lhs, ranef)]
  # Extract the data used for the model
  env <- environment(form)
  call <- getCall(model)
  data <- get(dname <- as.character(call$data), envir = env)
  # Collect the variables to standardize
  vars <- character()
  if(isTRUE(x))
    vars <- c(vars, rhs)
  if(isTRUE(y))
    vars <- c(vars, lhs)
  if(isTRUE(ran_eff))
    vars <- c(vars, ranef)
  # Standardize numeric columns only; leave factors and characters untouched
  data[, vars] <- lapply(vars, function(var){
    if(is.numeric(data[[var]]))
      data[[var]] <- scale(data[[var]])
    data[[var]]
  })
  # Update the model and change the data into the new data.
  update(model, data = data)
}
The function works for both linear and generalized linear mixed effect models (not tested for nonlinear models), and is used just like other beta functions from reghelper:
library(reghelper)
library(lme4)
# Linear mixed effect model
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
fm2 <- beta(fm1)
fixef(fm1) - fixef(fm2)
(Intercept) Days
-47.10279 -19.68157
# Generalized mixed effect model
data(cbpp)
# create numeric variable correlated with period
cbpp$nv <-
rnorm(nrow(cbpp), mean = as.numeric(levels(cbpp$period))[as.numeric(cbpp$period)])
gm1 <- glmer(cbind(incidence, size - incidence) ~ nv + (1 | herd),
family = binomial, data = cbpp)
gm2 <- beta(gm1)
fixef(gm1) - fixef(gm2)
(Intercept) nv
0.5946322 0.1401114
Note, however, that unlike beta, the function returns the updated model, not a summary of the model.
Another problem is that the model contains several continuous predictors (which operate on similar scales) and 2 categorical predictors (one with 4 levels, one with six levels). The purpose of using the standardized coefficients would be to compare the impact of the categorical predictors to those of the continuous ones, and I'm not sure that standardized coefficients are the appropriate way to do so. Are standardized coefficients an acceptable approach?
Now that is a great question and one better suited for stats.stackexchange, and not one I'm certain of the answer to.
Again, thank you so much, Oliver! For anybody who is interested in the answer regarding the last part of my question,
Another problem is that the model contains several continuous
predictors (which operate on similar scales) and 2 categorical
predictors (one with 4 levels, one with six levels). The purpose of
using the standardized coefficients would be to compare the impact of
the categorical predictors to those of the continuous ones, and I'm
not sure that standardized coefficients are the appropriate way to do
so. Are standardized coefficients an acceptable approach?
you can find the answer here. The tl;dr is that using standardized regression coefficients is not the best approach for mixed models anyways, let alone one such as mine...

Regression Modelling of Linear, Exponential, and Power Curves in R

Please note this is cross-posted: https://stats.stackexchange.com/questions/427649/regression-modelling-of-linear-exponential-and-power-curves
I am trying to model reaction time (and other) data across trials (trial 1-5) using different mathematical functions. Specifically I model linear, exponential, and power functions using linear mixed effect models by transforming data and use AIC/BIC to compare fits:
Linear: lmer(ReactionTime ~ Trial + (Trial | Subjects), data = lmerdata)
Exponential: lmer(log(ReactionTime) ~ Trial + (Trial | Subjects), data = lmerdata)
Power: lmer(log(ReactionTime) ~ log(Trial) + (Trial | Subjects), data = lmerdata)
By doing this, the exponential and power equations imply a different distribution for errors than the linear equation. The consequence of this is inflated exponential and power function fits relative to the linear fit.
Is there a way to account for this using lmer()? Alternatively, how would this be done using non-linear mixed effects modelling? I've attempted to do it with nlme(), nlmer(), glmer() but all methods end up running into issues (e.g., do not converge).
Here is sample data:
#Create Empty Matrix
lmerdata <- matrix(NA, 20, 3)
#Add Participant IDs
lmerdata[, 1] <- rep(1:4, 5)
#Add Trial Counts
lmerdata[, 2] <- as.numeric(sort(rep(1:5, 4)))
#Add Reaction Time Data
lmerdata[, 3] <- c(2.184308,2.754287,2.396167,1.305267,1.943866,1.70844,2.586035,1.261954,1.768063,1.76659,2.242142,1.489634,1.62544,1.677268,2.378175,1.550744,1.481052,1.424327,1.738102,1.247097)
#Name Columns
colnames(lmerdata) <- c('Subjects', 'Trial', 'ReactionTime')
#Convert to Data Frame
lmerdata <- as.data.frame(lmerdata)
#Turn Subjects into Factor
lmerdata$Subjects <- as.factor(lmerdata$Subjects)
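One commonly used way to put the log-response fits on the same likelihood scale as the untransformed fit is a Jacobian adjustment: if log(y) is modelled, the log-likelihood of y itself is the log-likelihood of log(y) minus sum(log(y)), so 2 * sum(log(y)) is added to the AIC. The sketch below is only an illustration under simplifying assumptions: it uses lm() and ignores the random effects, and the names rt, trial, jac and aic are mine. With lmer() the same correction should apply, provided the models are fitted with REML = FALSE so the likelihoods are comparable.

```r
# Reaction times and trial numbers from the sample data above (subjects pooled)
rt <- c(2.184308, 2.754287, 2.396167, 1.305267, 1.943866, 1.70844, 2.586035,
        1.261954, 1.768063, 1.76659, 2.242142, 1.489634, 1.62544, 1.677268,
        2.378175, 1.550744, 1.481052, 1.424327, 1.738102, 1.247097)
trial <- rep(1:5, each = 4)

fit_lin <- lm(rt ~ trial)            # linear
fit_exp <- lm(log(rt) ~ trial)       # exponential
fit_pow <- lm(log(rt) ~ log(trial))  # power

# Jacobian term for the log transformation of the response
jac <- 2 * sum(log(rt))

# AIC values expressed on the scale of the untransformed response
aic <- c(linear      = AIC(fit_lin),
         exponential = AIC(fit_exp) + jac,
         power       = AIC(fit_pow) + jac)
aic
```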

Pooling sandwich variance estimator over multiply imputed datasets

I am running a Poisson regression on multiply imputed data to predict a common binary outcome. After running mice, I have obtained a stacked data frame comprising the raw data and five imputed datasets. Here is a toy example:
df <- mice::nhanes
imp <- mice(df) #impute data
com <- complete(imp, "long", TRUE) #creates data frame
I now want to:
Run the regression on each imputed dataset
Calculate robust standard errors using a sandwich variance estimator
Combine / pool the results of both analyses
I can run the regression on the mids object using the with and pool commands:
fit.pois.mids <- with(imp, glm(hyp ~ age + bmi + chl, family = poisson))
summary(pool(fit.pois.mids))
I can also run the regression on each of the imputed datasets before combining them:
imp.df <- split(com, com$.imp); names(imp.df) <- c("raw", "imp1", "imp2", "imp3", "imp4", "imp5") #creates list of data frames representing each imputed dataset
fit.pois <- lapply(imp.df, function(x) {
fit <- glm(hyp ~ age + bmi + chl, data = x, family = poisson)
fit
})
summary(MIcombine(fit.pois))
Similarly, I can calculate the standard errors for each imputed dataset:
sand <- lapply(fit.pois, function(x) {
se <- coeftest(x, vcov = sandwich)
se
})
Unfortunately, MIcombine does not seem to return p-values. This post suggests using Zelig, but for that matter, I may as well just use mice. Further, it does not appear to be possible to combine the estimates of the standard errors:
summary(MIcombine(sand))
Error in UseMethod("vcov") :
no applicable method for 'vcov' applied to an object of class "coeftest"
For the sake of simplicity, it seems that mice is a better option for pooling the results of the regression; however, I am wondering how I would go about updating (i.e., pooling and combining) the standard errors. What are some ways this could be addressed?
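One possible approach, sketched here under assumptions rather than offered as the definitive answer, is to pass the coefficient vectors and the sandwich covariance matrices to MIcombine() separately through its results and variances arguments, so that Rubin's rules are applied to the robust variances. Only the imputed datasets are pooled (the raw data are left out), and the p-values are computed by hand from the pooled estimates. The object names fits, pooled, est, se and p are mine.

```r
library(mice)      # multiple imputation
library(sandwich)  # robust covariance estimators
library(mitools)   # MIcombine()

set.seed(123)
imp <- mice(nhanes, printFlag = FALSE)

# Fit the Poisson model on each imputed dataset (raw data excluded)
fits <- lapply(seq_len(imp$m), function(i) {
  glm(hyp ~ age + bmi + chl, data = complete(imp, i), family = poisson)
})

# Pool the coefficients together with the sandwich covariance matrices
pooled <- MIcombine(results   = lapply(fits, coef),
                    variances = lapply(fits, sandwich))

# Pooled estimates, robust standard errors and two-sided p-values
est <- coef(pooled)
se  <- sqrt(diag(vcov(pooled)))
p   <- 2 * pt(-abs(est / se), df = pooled$df)
cbind(estimate = est, robust_se = se, p_value = p)
```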

Plot Multiple Imputation Results

I have successfully completed a multiple imputation on the missing data of my questionnaire research using the MICE package in R and performed a linear regression on the pooled imputed variables. I can't seem to work out how to extract single pooled variables and plot in a graph. Any ideas?
e.g.
>imp <- mice(questionnaire)
>fit <- with(imp, lm(APE~TMAS+APB+APA+FOAP))
>summary(pool(fit))
I want to plot pooled APE by TMAS.
Reproducible Example using nhanes:
> library(mice)
> nhanes
> imp <-mice(nhanes)
> fit <-with(imp, lm(bmi~chl+hyp))
> fit
> summary(pool(fit))
I would like to plot pooled chl against pooled bmi (for example).
Best I have been able to achieve is
> mat <-complete(imp, "long")
> plot(mat$chl~mat$bmi)
Which I believe gives the combined plot of all 5 imputations and is not quite what I am looking for (I think).
The underlying with.mids() function lets the regression be carried out on each imputed data frame. So it is not one regression, but 5 regressions that happened. pool() just averages the estimated coefficients and adjusts the variances for the statistical inference according to the amount of imputation.
So there aren't single pooled variables to plot. What you could do is average the 5 imputed sets and recreate some kind of "regression line" based on the pooled coefficients, e.g.:
# Averaged imputed data
combchl <- tapply(mat$chl,mat$.id,mean)
combbmi <- tapply(mat$bmi,mat$.id,mean)
combhyp <- tapply(mat$hyp,mat$.id,mean)
# coefficients
coefs <- pool(fit)$qbar
# regression results
x <- data.frame(
int = rep(1,25),
chl = seq(min(combchl),max(combchl),length.out=25),
hyp = seq(min(combhyp),max(combhyp),length.out=25)
)
y <- as.matrix(x) %*%coefs
# a plot
plot(combbmi~combchl)
lines(x$chl,y,col="red")
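The pooling step described above (averaging the coefficients and adjusting the variances) follows Rubin's rules. As a rough base-R sketch for a single coefficient, using made-up numbers that are not output from the nhanes example:

```r
# Made-up estimates of one coefficient from m = 5 imputed datasets
est <- c(0.52, 0.47, 0.55, 0.50, 0.46)
# Made-up squared standard errors (within-imputation variances)
se2 <- c(0.010, 0.012, 0.011, 0.009, 0.013)

m     <- length(est)
qbar  <- mean(est)               # pooled estimate: average of the estimates
ubar  <- mean(se2)               # average within-imputation variance
b     <- var(est)                # between-imputation variance
total <- ubar + (1 + 1 / m) * b  # total variance of the pooled estimate

c(estimate = qbar, se = sqrt(total))
```

The between-imputation term is what inflates the standard error to reflect the uncertainty due to the missing data.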

Resources