Ordinal logistic regression with random effects and a quantitative predictor variable in R

I want to fit an ordinal logistic regression (my response variable is ordinal) with two random effects and a quantitative predictor variable in an interaction (my formula is: ordinal_variable ~ quantitative_variable:habitat + (1|community) + (1|species)).
I was analyzing my data with clmm (see the script below) and got the results I expected; however, I noticed that clmm was designed to be used when the response and predictor variables are factors.
I then tried polmer (see the script below), but did not get any output.
Would someone have any suggestions on how to analyze this data?
library(ordinal)
model1 <- clmm(as.factor(ordinal_variable)~
quantitative_variable:habitat + (1|community) + (1|species),
data=baseline)
summary(model1)
library(MPDiR)
library(lme4)
model2 <- polmer(as.factor(ordinal_variable) ~
quantitative_variable:habitat + (community - 1 | Obs) +
(species - 1 | Obs), data=baseline)
summary(model2)
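For what it's worth, clmm from the ordinal package does handle continuous (quantitative) predictors; only the response has to be a factor, ideally an ordered one. A minimal sketch along those lines, which also adds the main effects that the interaction-only term quantitative_variable:habitat omits (variable names taken from the question, so treat this as illustrative):
library(ordinal)
# code the response as an ordered factor up front rather than inside the formula
baseline$ordinal_variable <- ordered(baseline$ordinal_variable)
model1b <- clmm(ordinal_variable ~ quantitative_variable * habitat +
                  (1 | community) + (1 | species),
                data = baseline)
summary(model1b)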

Related

Mixed effect model not returning all coefficients

I am running some mixed effect models in R. The code is:
m <- lmer(DV ~ FE^2 + FE + (FE^2 | ME) + (FE | ME), data=data, REML=FALSE)
When getting the coefficients with:
coefs <- data.frame(coef(m)[1])
I get a data frame with the coefficients. However, when counting the number of groups, a few are missing. Why might this be? The model failed, but my assumption is that it should still always produce a coefficient table showing the coefficients for each level of the random effect.
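One thing worth checking first: in an R model formula, FE^2 is interpreted by the formula algebra (it expands crossings up to second order), so with a single variable it reduces to just FE rather than a squared term; you need I(FE^2) for a literal quadratic. A hedged sketch of what was probably intended, collapsed to a single random-effects term (DV, FE, and ME as named in the question):
library(lme4)
m <- lmer(DV ~ I(FE^2) + FE + (FE | ME), data = data, REML = FALSE)
coefs <- coef(m)$ME  # per-group intercepts and slopes for grouping factor ME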

plot - How to plot interaction of logistic regression model (glm) with multiple imputed data (MICE)?

I created an interaction term with iv*sex and imputed the data with mice. Then I used the imputed data to run a logistic regression model (glm):
model <- with(data=imp, glm(dv~control+iv+sex+iv*sex, family="binomial"))
The following are the abbreviations of the variable names:
dependent variable = dv, independent variable = iv, moderator = sex, interaction term = iv*sex
There is a significant interaction for iv*sex, and I would like to plot a graph of the interaction, but I couldn't find out how. Any solution offered would be greatly appreciated. Thanks!
I've just run into the same issue, and I solved it with the effects package.
e <- effects::effect("iv*sex", model)
e <- as.data.frame(e)
ggplot2::ggplot(e, ggplot2::aes(iv, fit, color = sex, group = sex)) +
  ggplot2::geom_point() +
  ggplot2::geom_line()
"fit" is your dependent variable; in your case, "dv".

Model analysis in R (logistic regression)

I have a data file (1 million rows) with one outcome variable, status (yes/no), three continuous variables, and 5 nominal variables (5 categories in each variable).
I want to predict the outcome, i.e. status.
I would like to know which type of analysis is good for building the model.
I have seen logit, probit, and logistic regression. I am confused about where to start and how to identify the variables that are most likely to be useful for the analysis.
data file:
gender,region,age,company,speciality,jobrole,diag,labs,orders,status
M,west,41,PA,FPC,Assistant,code18,27,3,yes
M,Southwest,65,CV,FPC,Worker,code18,69,11,no
M,South,27,DV,IMC,Assistant,invalid,62,13,no
M,Southwest,18,CV,IMC,Worker,code8,6,1,yes
PS: Using R language.
Any help would be greatly appreciated. Thanks!
Given the three, most people start their analysis with logistic regression.
Note that logistic and logit are the same thing.
When deciding between logistic and probit, go for logistic.
Probit usually returns results faster, but logistic has the edge in interpretation, since its coefficients are log odds ratios.
Now, to settle on variables: you can vary the number of variables that you include in your model.
model1 <- glm(status ~., data = df, family = binomial(link = 'logit'))
Now, check the model summary and the importance of the predictor variables.
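For example, a quick way to see significance and effect sizes for the full model above (exponentiated logistic coefficients are odds ratios):
summary(model1)    # coefficient table with Wald z-tests
exp(coef(model1))  # odds ratios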
model2 <- glm(status ~ gender + region + age + company + speciality + jobrole + diag + labs, data = df, family = binomial(link = 'logit'))
By reducing the number of variables, you will be better able to identify which variables are important.
Also, ensure that you have performed data cleaning prior to this.
Avoid including highly correlated variables; you can check for them using cor().
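For instance, with the columns shown in the sample data above, the continuous candidates are age, labs, and orders, so a quick check could look like:
# pairwise correlations among the continuous predictors
cor(df[, c("age", "labs", "orders")])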

Mixed Modelling - Different Results between lme and lmer functions

I am currently working through Andy Field's book, Discovering Statistics Using R. Chapter 14 is on Mixed Modelling and he uses the lme function from the nlme package.
The model he creates, using speed dating data, is such:
speedDateModel <- lme(dateRating ~ looks + personality +
                        gender + looks:gender + personality:gender +
                        looks:personality,
                      random = ~1|participant/looks/personality,
                      data = speedData)
I tried to recreate a similar model using the lmer function from the lme4 package; however, my results are different. I thought I had the proper syntax, but maybe not?
speedDateModel.2 <- lmer(dateRating ~ looks + personality + gender +
looks:gender + personality:gender +
(1|participant) + (1|looks) + (1|personality),
data = speedData, REML = FALSE)
Also, when I look at the coefficients of these models, I notice that they only produce random intercepts for each participant. I was then trying to create a model that produces both random intercepts and slopes, but I can't seem to get the syntax correct for either function. Any help would be greatly appreciated.
The only difference between the lme and the corresponding lmer formula should be that the random and fixed components are aggregated into a single formula:
dateRating ~ looks + personality +
  gender + looks:gender + personality:gender +
  looks:personality + (1|participant/looks/personality)
using (1|participant) + (1|looks) + (1|personality) is only equivalent if looks and personality have unique values at each nested level.
It's not clear which continuous variable you want to use to define your slopes: if you have a continuous variable x and groups g, then (x|g), or equivalently (1+x|g), will give you a random-slopes model (x should also be included in the fixed-effects part of the model, i.e. the full formula should be y ~ x + (x|g) ...)
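For concreteness, a minimal random-slopes sketch using those generic names (x continuous, g a grouping factor, mydata a hypothetical data frame containing y, x, and g):
library(lme4)
# random intercept and a random slope for x, varying by g
m_slopes <- lmer(y ~ x + (x | g), data = mydata)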
Update: I got the data, or rather a script file that allows one to reconstruct the data, from here. Field makes a common mistake in his book, one I have made several times in the past: since there is only a single observation in the data set for each participant/looks/personality combination, the three-way interaction has one level per observation. In a linear mixed model, this means the variance at the lowest level of nesting is confounded with the residual variance.
You can see this in two ways:
lme appears to fit the model just fine, but if you try to calculate confidence intervals via intervals(), you get
intervals(speedDateModel)
## Error in intervals.lme(speedDateModel) :
## cannot get confidence intervals on var-cov components:
## Non-positive definite approximate variance-covariance
If you try this with lmer you get:
## Error: number of levels of each grouping factor
## must be < number of observations
In both cases, this is a clue that something's wrong. (You can overcome this in lmer if you really want to: see ?lmerControl.)
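(For reference, the relevant switch appears to be check.nobs.vs.nlev, documented in ?lmerControl; something like the following would force the fit, though the confounding problem remains:)
lmer(..., control = lmerControl(check.nobs.vs.nlev = "ignore"))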
If we leave out the lowest grouping level, everything works fine:
sd2 <- lmer(dateRating ~ looks + personality +
              gender + looks:gender + personality:gender +
              looks:personality +
              (1|participant/looks),
            data = speedData)
Compare lmer and lme fixed effects:
all.equal(fixef(sd2), fixef(speedDateModel))  ## TRUE
The starling example here gives another example and further explanation of this issue.

Stata's xtlogit (fe, re) equivalent in R?

Stata allows fixed-effects and random-effects specifications of logistic regression through the xtlogit, fe and xtlogit, re commands respectively. I was wondering what the equivalent commands for these specifications are in R.
The only similar specification I am aware of is the mixed effects logistic regression
mymixedlogit <- glmer(y ~ x1 + x2 + x3 + (1 | x4), data = d, family = binomial)
but I am not sure whether this maps to any of the aforementioned commands.
The glmer command is used to quickly fit logistic regression models with varying intercepts and varying slopes (or, equivalently, a mixed model with fixed and random effects).
To fit a varying-intercept multilevel logistic regression model in R (that is, a random-effects logistic regression model), you can run the following using the built-in "mtcars" data set:
library(lme4)
data(mtcars)
head(mtcars)
m <- glmer(mtcars$am ~ 1 + mtcars$wt + (1|mtcars$gear), family="binomial")
summary(m)
# and you can examine the fixed and random effects
fixef(m); ranef(m)
To fit a varying-intercept model in Stata, you of course use the xtlogit command (using the similar, but not identical, built-in "auto" data set in Stata):
sysuse auto
xtset gear_ratio
xtlogit foreign weight, re
I'll add that I find the entire reference to "fixed" versus "random" effects ambiguous, and I prefer to refer to the structure of the model itself (e.g., are the intercepts varying? which slopes are varying, if any? is the model nested in 2 levels or more? are the levels cross-classified or not?). For a similar view, see Andrew Gelman's thoughts on "fixed" versus "random" effects.
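As for the fixed-effects side of the question: a commonly used R analogue of Stata's xtlogit, fe is conditional logistic regression via clogit in the survival package, with the panel variable wrapped in strata() (sketched on the same mtcars example; whether this is sensible depends on your design):
library(survival)
m_fe <- clogit(am ~ wt + strata(gear), data = mtcars)
summary(m_fe)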
Update: Ben Bolker's excellent comment below points out that in R it's more informative, when using predict(), to pass the data = mtcars argument instead of, say, the dollar notation:
data(mtcars)
m1 <- glmer(mtcars$am ~ 1 + mtcars$wt + (1|mtcars$gear), family="binomial")
m2 <- glmer(am ~ 1 + wt + (1|gear), family="binomial", data=mtcars)
p1 <- predict(m1); p2 <- predict(m2)
names(p1) # not that informative: just row numbers ("1", "2", ...)
names(p2) # very informative: the mtcars row names (car models)
