Reporting interactions in a linear model in r - r

I am trying to report all the interactions in a linear model that reads:
mod1.lme <- lm(volume ~ Group * Treatment + Group + Treatment, data = df)
Group is a factor variable with 3 levels: A, B and C.
The result that I currently get is for (I made up the data):
These two estimates are in reference to Treatment:A, but I would like to see each effect independently. So the output that I would like to get is:
Treatment:A
Treatment:B
Treatment:C
If I eliminate the intercept adding -1 at the end I get:
What is the best way to code this?
Thanks

The reason you are seeing the output that you are, is that one of the factor levels of Treatment becomes a reference level. When interpreting the model the coefficients become "the difference in effect from the reference level". This is necessary as long as the model includes an intercept, so the only way to get the interpretation with all coefficients shown is to remove the intercept as shown below.
mod1.lme <- lm(volume ~ Group * Treatment - 1, data = df)
Edit:
To change the name of the interaction effect, one would have to edit the name manually
sum.lm <- summary(mod1.lme)
rownames(sum.lm$coef) <- c("groupA","groupB","groupC", "groupA:Treatment", "groupB:Treatment", "groupC:Treatment")
or alternatively use another package for summaries such as sjPlot
library(sjPlot)
tab_model(mod1.lme, pred.labels = c("groupA","groupB","groupC", "groupA:Treatment", "groupB:Treatment", "groupC:Treatment"))

Related

glmer extracts coefficient only for main (predictor) factor, not contrast

here is the GLMER model
model <- glmer(ACC~Group*M_O*Lblock+ (1| Subject) + (1| hand),data = learndata_long3,family="binomial")
while the 'Lblock' factor has 9 levels, others have 2 levels.
The results generate like this:
summary(model)$coefficients
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.437931021 0.16334362 2.68104155 7.339340e-03
Group1 -0.032138148 0.14961572 -0.21480463 8.299196e-01
M_O1 0.135726477 0.04115871 3.29763642 9.750230e-04
Lblock1 0.301264476 0.08343952 3.61057288 3.055214e-04
Lblock2 0.623913565 0.08247767 7.56463576 3.889529e-14
Lblock3 1.022046512 0.08235930 12.40960689 2.317880e-35
Lblock4 1.399407518 0.08337615 16.78426631 3.181367e-63
Lblock5 1.741198402 0.08541505 20.38514752 2.265326e-92
Lblock6 2.065315516 0.08843600 23.35378765 1.261292e-120
Lblock7 2.268393650 0.09075950 24.99345703 7.201546e-138
Lblock8 2.637079325 0.09707420 27.16560426 1.656429e-162
ALL I want is extract each factor, like"
Estimate : Group / M_O / Lblock
how can I do? just sum up and then mean the block? or ?
Very new to these fields, thanks for your help
Thanks for clarifying a bit. I think what you are expecting is something similar to the output of an ANOVA? But this will not be possible with your data, as you have two random effects specified.
As you are running a logistic regression, you should read up a bit on how to interpret them. (I'm just putting this here because you said you were new to this)
https://stats.idre.ucla.edu/stata/output/logistic-regression-analysis/
Now, if you want to test the contribution of one of your factors to to the model, you have to create nested models, and compare them with a likelihood-ratio test using the anova() function in R.
For example, let's say that you had the same model you had above, but without any interactions specified:
m1 <- glmer(ACC~Group+M_O+Lblock+ (1| Subject) + (1| hand),data = learndata_long3,family="binomial")
And then one without the Group predictor:
m2 <- glmer(ACC~M_O+Lblock+ (1| Subject) + (1| hand),data = learndata_long3,family="binomial")
Then we compare whether having the Group predictor significantly improved the model:
anova(m1,m2)
This will give your a p-value telling you if the addition of Group significantly improves model fit.
If this seems all like a lot, which it is if you're not familiar with model comparison, I'd recommend looking at this tutorial from Bodo Winter. It's directed at people who are new mixed-models, and want a conceptual foundation of what is going on. I don't know what field you are in, but I think the examples are pretty accessible to everyone.
https://arxiv.org/abs/1308.5499
Please let me know if you need any other clarifications or have any questions during the tutorial.

LMER Factor vs numeric Interaction

I am attempting to use lmer to model my data.
My data has 2 independent variables and a dependent variable.
The first is "Morph" and has values "Identical", "Near", "Far".
The second is "Response" which can be "Old" or "New".
The dependent variable is "Fix_Count".
So here is a sample dataframe and what I currently have for running the linear model.
Subject <- c(rep(1, times = 6), rep(2, times = 6))
q <- c("Identical", "Near", "Far")
Morph <- c(rep(q, times = 4))
t <- c(rep("old", times = 3),rep("new", times=3))
Response <- c(rep(t, times = 2))
Fix_Count <- sample(1:9, 12, replace = T)
df.main <- data.frame(Subject,Morph, Response, Fix_Count, stringsAsFactors = T)
df.main$Subject <- as.factor(df.main$Subject)
res = lmer(Fix_Count ~ (Morph * Response) + (1|Subject), data=df.main)
summary(res)
And the output looks like this:
The issue is I do not want it to do combination but an overall interaction of Morph:Response.
I can get it to do this by converting Morph to numeric instead of factor. However I'm not sure conceptually that makes sense as the values don't properly represent 1,2,3 but low-mid-high (ordered but qualitative).
So: 1. Is it possible to run lmer to get interaction effects between 2 factor variables?
2. Or do you think numeric is a fine way to class "Identica", "Near", "Far"?
3. I have tried setting contrasts to see if that can help, but sometimes I get an error and other times it seems like nothing is changed. If contrasts would help, could you explain how I would implement this?
Thank you so much for any help you can offer. I have also posted this question to stack exchange as I am unsure if this is a coding issue or a stats issue. However I can remove it from the less relevant forum once I know.
Best, Kirk
Two problems I see. First, you should be using a factor variable for Subject. It's clearly not a continuous or integer variable. And to (possibly) address part of your question, there is an interaction function designed to work with regression formulas. I'm pretty sure that the formula interface will interpret the "*" operator that you used as a call to interaction, but the labeling of the output may be different and perhaps more to your liking. I get the same number of coefficients with:
res = lmer(Fix_Count ~ interaction(Morph , Response) + (1|Subject), data=df.main)
But that's not an improvement.
However, they differ from the model created with Morph*Response. Probably there is a different set of contrast options.
The way to get an overall statistical test of the interaction is to compare nested models:
res_simple = lmer(Fix_Count ~ Morph + Response + (1|Subject), data=df.main)
And then do an anova for the model comparison:
anova(res,res_simple)
refitting model(s) with ML (instead of REML)
Data: df.main
Models:
res_simple: Fix_Count ~ Morph + Response + (1 | Subject)
res: Fix_Count ~ interaction(Morph, Response) + (1 | factor(Subject))
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
res_simple 6 50.920 53.830 -19.460 38.920
res 8 54.582 58.461 -19.291 38.582 0.3381 2 0.8445
My opinion is that it is sufficiently close to the boundary for stats vs coding that it could have been acceptable on either forum. (You are not supposed to cross post, however.) If you are satisfied with a coding answer then we are done. If you need help with understanding model comparison, then you may need to edit your CV.com's question to request a more theory-based answer than mine. (I checked to make sure the anova results are the same regardless of whether you use the interaction function or the "*" operator.)

I get many predictions after running predict.lm in R for 1 row of new input data

I used ApacheData data with 83784 rows to build a linear regression model:
fit <-lm(tomorrow_apache~ as.factor(state_today)
+as.numeric(daily_creat)
+ as.numeric(last1yr_min_hosp_icu_MDRD)
+as.numeric(bun)
+as.numeric(urin)
+as.numeric(category6)
+as.numeric(category7)
+as.numeric(other_fluid)
+ as.factor(daily)
+ as.factor(age)
+ as.numeric(apache3)
+ as.factor(mv)
+ as.factor(icu_loc)
+ as.factor(liver_tr_before_admit)
+ as.numeric(min_GCS)
+ as.numeric(min_PH)
+ as.numeric(previous_day_creat)
+ as.numeric(previous_day_bun) ,ApacheData)
And I want to use this model to predict a new input so I give each predictor variable a value:
predict(fit, data=data.frame(state_today=1, daily_creat=2.3, last1yr_min_hosp_icu_MDRD=3, bun=10, urin=0.01, category6=10, category7=20, other_fluid=0, daily=2 , age=25, apache3=12, mv=1, icu_loc=1, liver_tr_before_admit=0, min_GCS=20, min_PH=3, previous_day_creat=2.1, previous_day_bun=14))
I expect a single value as a prediction to this new input, but I get many many predictions! I don't know why is this happening. What am I doing wrong?
Thanks a lot for your time!
You may also want to try the excellent effects package in R (?effects). It's very useful for graphing the predicted probabilities from your model by setting the inputs on the right-hand side of the equation to particular values. I can't reproduce the example you've given in your question, but to give you an idea of how to quickly extract predicted probabilities in R and then plot them (since this is vital to understanding what they mean), here's a toy example using the in-built data sets in R:
install.packages("effects") # installs the "effects" package in R
library(effects) # loads the "effects" package
data(Prestige) # loads in-built dataset
m <- lm(prestige ~ income + education + type, data=Prestige)
# this last step creates predicted values of the outcome based on a range of values
# on the "income" variable and holding the other inputs constant at their mean values
eff <- effect("income", m, default.levels=10)
plot(eff) # graphs the predicted probabilities

How to obtain Tukey compact letter display from a GLM with interactions

I have set of data that I've analyzed with a generalized linear model that has three categorical factors in 3-way interaction (factorA, factorB, factorC) and a fourth continuous factor (factorD) that is simply added in the model. I am trying to obtain a set of Tukey letter groups (ie, compact letter display) from the model but haven't found a way to include the interaction successfully. I'm not interested in including factorD, just the three in the interaction.
I have gotten the Tukey-adjusted pairwise comparisons with this:
lsmeans(my.glm, factorA*factorB*factorC)
But I was not able to figure out how to produce a compact letters display from that. It can be done with multcomp package but I could only find ways to do it with main effects with that package, not interactions.
So then I tried the agricolae package, as this post (https://stats.stackexchange.com/questions/31547/how-to-obtain-the-results-of-a-tukey-hsd-post-hoc-test-in-a-table-showing-groupe) discusses that that should work. However, following the instructions in that answer led to a non-functional response from HSD.test. Specifically, I could get the main effects tests to work fine, e.g. HSD.test(my.glm,"factorA") but I could not get the interactions to work. I tried this:
intxns<-with(my.data, interaction(factorA,factorB,factorC))
HSD.test(my.glm,"intxns",group=TRUE)
But a get an error that indicates the HSD.test function didn't recognize "intxns" as a valid object, it looks like this (also, I checked the intxns object and it looks good and the number of rows matched the number of residuals of my glm):
Name: inxtns
factorA factorB factorC factorD
I get that same error if I just put nonsense into the factor field in the HSD.test function call. I checked the inxtns object and it looks good and the number of rows matched the number of residu
The agricolae notes don't actually cover the use of interactions in HSD.test, but I assume it can work.
Does anyone know how to get HSD.test to work with interactions? Or is there any other function you've gotten to work to produce compact letter displays for a glm with interactions?
I've been working on this for a number of days now and haven't been able to find a solution, hopefully I'm not missing something obvious.
Thanks!
I don't know how you've specified your glm model, but for HSD.test, it's looking to match the particular treatment name with the same name specified in the glm formula as well as the data frame. This is why your main effect, factorA will work, but not the 3-way interaction. For multiple comparison tests on interactions, I find it easiest to generate the interactions separately and add them to the data frame as additional columns. The glm model can then be specified using the new variables which code for the interaction.
For example,
set.seed(42)
glm.dat <- data.frame(y = rnorm(1000), factorA = sample(letters[1:2],
size = 1000, replace = TRUE),
factorB = sample(letters[1:2], size = 1000, replace = TRUE),
factorC = sample(letters[1:2], size = 1000, replace = TRUE))
# Generate interactions explicitly and add them to the data.frame
glm.dat$factorAB <- with(glm.dat, interaction(factorA, factorB))
glm.dat$factorAC <- with(glm.dat, interaction(factorA, factorC))
glm.dat$factorBC <- with(glm.dat, interaction(factorB, factorC))
glm.dat$factorABC <- with(glm.dat, interaction(factorA, factorB, factorC))
# General linear model
glm.mod <- glm(y ~ factorA + factorB + factorC + factorAB + factorAC +
factorBC + factorABC, family = 'gaussian', data = glm.dat)
# Multiple comparison test
library(agricolae)
comp <- HSD.test(glm.mod, trt = "factorABC", group = TRUE)
giving
comp$groups giving
trt means M
1 a.a.a 0.070052189 a
2 a.b.b 0.035684571 a
3 b.a.a 0.020517535 a
4 b.b.b -0.008153257 a
5 a.b.a -0.036136140 a
6 a.a.b -0.078891136 a
7 b.a.b -0.080845419 a
8 b.b.a -0.115808772 a

R: How to make column of predictions for logistic regression model?

So I have a data set called x. The contents are simple enough to just write out so I'll just outline it here:
the dependent variable, Report, in the first column is binary yes/no (0 = no, 1 = yes)
the subsequent 3 columns are all categorical variables (race.f, sex.f, gender.f) that have all been converted to factors, and they're designated by numbers (e.g. 1= white, 2 = black, etc.)
I have run a logistic regression on x as follows:
glm <- glm(Report ~ race.f + sex.f + gender.f, data=x,
family = binomial(link="logit"))
And I can check the fitted probabilities by looking at summary(glm$fitted).
My question: How do I create a fifth column on the right side of this data set x that will include the predictions (i.e. fitted probabilities) for Report? Of course, I could just insert the glm$fitted as a column, but I'd like to try to write a code that predicts it based on whatever is in the race, sex, gender columns for a more generalized use.
Right now I the follow code which I will hope create a predicted column as well as lower and upper bounds for the confidence interval.
xnew <- cbind(xnew, predict(glm5, newdata = xnew, type = "link", se = TRUE))
xnew <- within(xnew, {
PredictedProb <- plogis(fit)
LL <- plogis(fit - (1.96 * se.fit))
UL <- plogis(fit + (1.96 * se.fit))
})
Unfortunately I get the error:
Error in eval(expr, envir, enclos) : object 'race.f' not found
after the cbind code.
Anyone have any idea?
There appears to be a few typo in your codes; First Xnew calls on glm5 but your model as far as I can see is glm (by the way using glm as name of your output is probably not a good idea). Secondly make sure the variable race.f is actually in the dataset you wish to do the prediction from. My guess is R can't find that variable hence the error.

Resources