Compact letter display from pairwise test - r

I would like to create a compact letter display from a post-hoc test I did on a linear mixed effects model (lmer).
Here is an example of what I would like when I do a pairwise t-test:
df <- read.table("https://pastebin.com/raw/Dzfh7b2f", header = TRUE, sep = "")
library(rcompanion)
library(multcompView)
# pairwise t-tests with Bonferroni-adjusted p-values
PT <- pairwise.t.test(df$fit, df$treatment, p.adjust.method = "bonferroni")
PT <- PT$p.value
PT1 <- fullPTable(PT)
multcompLetters(PT1,
                compare = "<",
                threshold = 0.05,
                Letters = letters,
                reversed = FALSE)
This works out great because, from the pairwise.t.test, it is easy to extract the p-values and create the table I would like.
Now let's say I run a linear model and do a pairwise comparison. I would like to also create a compact letter display from the extracted p-values, as I did above:
library(multcomp)
# model is the fitted lmer model
mult <- summary(glht(model, linfct = mcp(treatment = "Tukey")), test = adjusted("holm"))
mult
I can see the p-values, but have spent the last 2-3 hours trying to figure out how to simply extract those values (as I did above with pairwise.t.test) and subsequently create a compact letter display table.
Any help is much appreciated. All the best

Find more details here.
mod <- lm(Sepal.Width ~ Species, data = iris)
mod_means_contr <- emmeans::emmeans(object = mod,
                                    pairwise ~ "Species",
                                    adjust = "tukey")
mod_means <- multcomp::cld(object = mod_means_contr$emmeans,
                           Letters = letters)
### Bonus plot
library(ggplot2)
ggplot(data = mod_means,
       aes(x = Species, y = emmean)) +
  geom_point() +
  geom_errorbar(aes(ymin = lower.CL,
                    ymax = upper.CL),
                width = 0.2) +
  geom_text(aes(label = gsub(" ", "", .group)),
            position = position_nudge(x = 0.2)) +
  labs(caption = "Means followed by a common letter are\nnot significantly different according to the Tukey-test")
Created on 2021-06-03 by the reprex package (v2.0.0)

Thanks to the suggestion by @Roland, the answer is simply:
mult <- summary(glht(model, linfct = mcp(treatment = "Tukey")), test = adjusted("holm"))
letter_display <- cld(mult)
letter_display
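If you specifically want the adjusted p-values themselves, mirroring the pairwise.t.test workflow, they can be pulled out of the summary object and fed to multcompLetters. A minimal sketch, assuming mult is the summary(glht(...)) object from above; the test$pvalues and test$coefficients components are where summary.glht stores its results:
# extract the adjusted p-values and name them "grp1-grp2" so that
# multcompView::multcompLetters() can consume them directly
pvals <- mult$test$pvalues
names(pvals) <- gsub(" ", "", names(mult$test$coefficients))  # "B - A" -> "B-A"
multcompLetters(pvals, threshold = 0.05)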

Related

How to generate a compact letter display for pairwise TukeyHSD

I'm having trouble generating a compact letter display for my results.
I've run an ANOVA followed by Tukey's HSD to generate the p-values for each pair, but I do not know how (or if it is possible?) to assign letters to these p-values to show which pairs are significantly different from each other.
csa.anova <- aov(rate ~ temp * light, data = csa.per.chl)
summary(csa.anova)
TukeyHSD(csa.anova)
This runs the tests I need, but I don't know how to assign letters to each p value to show which pairs are significant.
Find more details here. The approach is the same emmeans + multcomp::cld recipe shown in the answer to the first question above: fit the model, compute the estimated marginal means with pairwise Tukey contrasts, and pass the emmeans component to cld(Letters = letters) to obtain the compact letter display (the full code and bonus plot appear earlier in this post).
You need to install the multcomp package first. It can compute the Tukey HSD test and returns an object that has summary and plot methods. The package also has a function, cld(), to print the compact letter display. As an example, we can use the iris data set that comes with R:
library(multcomp)
data(iris)
iris.aov <- aov(Petal.Length ~ Species, iris)
iris.tukey <- glht(iris.aov, linfct = mcp(Species = "Tukey"))
cld(iris.tukey)
#     setosa versicolor  virginica
#        "a"        "b"        "c"

How do I plot a single numerical covariate using emmeans (or other package) from a model?

After variable selection I usually end up with a model containing a numerical covariate (2nd or 3rd degree polynomial). What I want is to plot it, preferably using the emmeans package. Is there a way of doing it?
I can do it using predict:
library(ggplot2)
m1 <- lm(mpg ~ poly(disp, 2), data = mtcars)
df <- cbind(disp = mtcars$disp, predict.lm(m1, interval = "confidence"))
df <- as.data.frame(df)
ggplot(data = df, aes(x = disp, y = fit)) +
  geom_line() +
  geom_ribbon(aes(ymin = lwr, ymax = upr, x = disp, y = fit), alpha = 0.2)
I haven't figured out a way of doing it using emmip or emtrends.
For illustration purposes, how could I do it using a mixed model via lme?
library(nlme)
m1 <- lme(mpg ~ poly(disp, 2), random = ~ 1 | factor(am), data = mtcars)
I suspect that your issue is due to the fact that, by default, covariates are reduced to their means in emmeans. You can use the at or cov.reduce arguments to specify a larger number of values. See the documentation for ref_grid and vignette("basics", "emmeans"), or the index of vignette topics.
Using sjPlot:
library(sjPlot)
plot_model(m1, terms = "disp [all]", type = "pred")
gives the same graphic.
Using emmeans:
library(emmeans)
em1 <- ref_grid(m1, at = list(disp = seq(min(mtcars$disp), max(mtcars$disp), 1)))
emmip(em1, ~ disp, CIs = TRUE)
returns a graphic with a small difference in layout. An alternative is to save the result to an object and plot it the way I want:
d1 <- emmip(em1, ~ disp, CIs = TRUE, plotit = FALSE)
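From there the plot can be built by hand. A minimal sketch, assuming the data frame returned by emmip(..., plotit = FALSE) carries the disp, yvar, LCL and UCL columns it produces in recent emmeans versions:
# plot the stored emmip results with ggplot2 (assumed column names)
library(ggplot2)
ggplot(d1, aes(x = disp, y = yvar)) +
  geom_line() +
  geom_ribbon(aes(ymin = LCL, ymax = UCL), alpha = 0.2)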

ggplot GLM fitted curve without interaction

I want to add the fitted function from a GLM to a ggplot. By default, it automatically creates the plot with an interaction. I am wondering if I can plot the fitted function from the model without the interaction. For example,
library(ggplot2)
dta <- read.csv("http://www.ats.ucla.edu/stat/data/poisson_sim.csv")
dta <- within(dta, {
  prog <- factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))
  id <- factor(id)
})
plt <- ggplot(dta, aes(math, num_awards, col = prog)) +
  geom_point(size = 2) +
  geom_smooth(method = "glm", se = FALSE,
              method.args = list(family = "poisson"))
print(plt)
gives the plot with the interaction.
However, I want the plot from the model
`num_awards` = β0 + β1 * `math` + β2 * `prog` + error
I tried to get this the following way:
mod <- glm(num_awards ~ math + prog, data = dta, family = "poisson")
fun.gen <- function(awd) exp(mod$coef[1] + mod$coef[2] * awd)
fun.acd <- function(awd) exp(mod$coef[1] + mod$coef[2] * awd + mod$coef[3])
fun.voc <- function(awd) exp(mod$coef[1] + mod$coef[2] * awd + mod$coef[4])
ggplot(dta, aes(math, num_awards, col = prog)) +
  geom_point() +
  stat_function(fun = fun.gen, col = "red") +
  stat_function(fun = fun.acd, col = "green") +
  stat_function(fun = fun.voc, col = "blue") +
  geom_smooth(method = "glm", se = FALSE,
              method.args = list(family = "poisson"), linetype = "dashed")
The output plot is
Is there any simple way in ggplot to do this efficiently?
Ben's idea of plotting predicted values of the response for specific model terms inspired me to improve the type = "y.pc" option of the sjp.glm function. A new update is on GitHub, with version number 1.9.4-3.
Now you can plot predicted values for specific terms: one is used along the x-axis, and a second one is used as a grouping factor:
sjp.glm(mod, type = "y.pc", vars = c("math", "prog"))
which gives you the following plot:
The vars argument is needed in case your model has more than two terms, to specify the term for the x-axis range and the term for the grouping.
You can also facet the groups:
sjp.glm(mod, type = "y.pc", vars = c("math", "prog"), show.ci = T, facet.grid = T)
There's no way that I know of to trick geom_smooth() into doing this, but you can do a little better than you've done. You still have to fit the model yourself and add the lines, but you can use the predict() method to generate the predictions and load them into a data frame with the same structure as the original data ...
mod <- glm(num_awards ~ math + prog, data = dta, family = "poisson")
## generate prediction frame
pframe <- with(dta,
               expand.grid(math = seq(min(math), max(math), length = 51),
                           prog = levels(prog)))
## add predicted values (on response scale) to prediction frame
pframe$num_awards <- predict(mod, newdata = pframe, type = "response")
ggplot(dta, aes(math, num_awards, col = prog)) +
  geom_point() +
  geom_smooth(method = "glm", se = FALSE,
              method.args = list(family = "poisson"), linetype = "dashed") +
  geom_line(data = pframe)  ## use prediction data here
## (inherits aesthetics etc. from main ggplot call)
(The only difference here is that, the way I've done it, the predictions span the full horizontal range for all groups, as if you had specified fullrange = TRUE in geom_smooth().)
In principle it seems as though the sjPlot package should be able to handle this sort of thing, but it looks like the relevant bit of code for doing this plot type is hard-coded to assume a binomial GLM ... oh well.
I'm not sure, but you wrote "without interaction" - maybe you are looking for effect plots? (If not, excuse me for assuming something completely wrong...)
You can, for instance, use the effects package for this.
dta <- read.csv("http://www.ats.ucla.edu/stat/data/poisson_sim.csv")
dta <- within(dta, {
  prog <- factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))
  id <- factor(id)
})
mod <- glm(num_awards ~ math + prog, data = dta, family = "poisson")
library(effects)
plot(allEffects(mod))
Another option would be the sjPlot package, as Ben suggested - however, the current version on CRAN only supports logistic regression models properly for effect plots. But in the current development version on GitHub I added support for various model families and link functions, so if you like, you can download that snapshot. The sjPlot package uses ggplot instead of lattice (which is used by the effects package, I think):
sjp.glm(mod, type = "eff", show.ci = T)
Or in non-faceted way:
sjp.glm(mod, type = "eff", facet.grid = F, show.ci = T)

Specify regression line intercept (R & ggplot2)

BACKGROUND
My current plot looks like this:
PROBLEM
I want to force the regression line to start at 1 for station_1.
CODE
library(ggplot2)
# READ IN DATA
var_x = c(2001:2011, 2001:2011)
var_y = c(1.000000, 1.041355, 1.053106, 1.085738, 1.126375, 1.149899, 1.210831, 1.249480, 1.286305, 1.367923, 1.486978,
          1.000000, 0.9849343, 0.9826141, 0.9676000, 0.9382975, 0.9037476, 0.8757748, 0.8607960, 0.8573634, 0.8536138, 0.8258877)
var_z = c(rep("Station_1", 11), rep("Station_2", 11))
df_data = data.frame(var_x, var_y, var_z)
out = ggplot(df_data, aes(x = var_x, y = var_y, group = var_z))
out = out + geom_line(aes(linetype = var_z), size = 1)
out = out + theme_classic()
# SELECT DATA FOR Station_1
PFI_data = subset(df_data, var_z == "Station_1")
# PLOT REGRESSION FOR Station_1
out = out + stat_smooth(data = PFI_data,
                        method = lm,
                        formula = y ~ x,
                        se = TRUE, size = 1.4, colour = "blue", linetype = 1)
Any help would be appreciated - this has been driving me crazy for too long!
First of all, you should be careful when forcing a regression line through some fixed point. Here's a link to a discussion of why.
Now, from a technical perspective, I'm relying heavily on these questions and answers: one, two. The outline of my solution is the following: precompute the desired intercept, run a regression without it, then add the intercept to the resulting prediction.
I'm using the internal ggplot2:::predictdf.default function to save some typing. The rbind(df, df) part may look strange, but it's a simple hack to make geom_smooth work properly, since there are two factor levels in var_z.
# Previous code should remain intact, replace the rest with this:
# SELECT DATA FOR Station_1
PFI_data <- subset(df_data, var_z == "Station_1")
names(PFI_data) <- c("x", "y", "z")
x0 <- df_data[df_data$var_z == "Station_1", "var_x"][1]
y0 <- df_data[df_data$var_z == "Station_1", "var_y"][1]
# regression through the fixed point (x0, y0): shift the data, fit without intercept
model <- lm(I(y - y0) ~ I(x - x0) + 0, data = PFI_data)
xrange <- range(PFI_data$x)
xseq <- seq(from = xrange[1], to = xrange[2])
df <- ggplot2:::predictdf.default(model, xseq, se = TRUE, level = 0.95)
df <- rbind(df, df)
df[c("y", "ymin", "ymax")] <- df[c("y", "ymin", "ymax")] + y0
out + geom_smooth(aes_auto(df), data = df, stat = "identity")
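If you'd rather not rely on a ggplot2 internal, the same result can be had with plain predict(). A minimal sketch using the same model, x0, y0 and xseq objects as above; the explicit geom_ribbon/geom_line pairing and inherit.aes = FALSE are my choices, not from the original answer:
# predict on the shifted scale (the model fits I(y - y0)), then add y0 back
pred <- predict(model, newdata = data.frame(x = xseq),
                interval = "confidence", level = 0.95)
df2 <- data.frame(x = xseq,
                  y = pred[, "fit"] + y0,
                  ymin = pred[, "lwr"] + y0,
                  ymax = pred[, "upr"] + y0)
out + geom_ribbon(aes(x = x, ymin = ymin, ymax = ymax), data = df2,
                  alpha = 0.2, inherit.aes = FALSE) +
  geom_line(aes(x = x, y = y), data = df2,
            colour = "blue", inherit.aes = FALSE)
With inherit.aes = FALSE, the rbind(df, df) duplication hack is no longer needed, since these layers ignore the group = var_z aesthetic of the main call.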

A replacement for method = 'loess'

This is where I'm at so far:
I have a data frame df with two columns A and B (both containing real numbers), where B is dependent on A. I plot the columns against each other:
p = ggplot(df, aes(A, B)) + geom_point()
and see that the relationship is non-linear. Adding:
p = p + geom_smooth(method = 'loess', span = 1)
gives a 'good' line of best fit. Given a new value a of A I then use the following method to predict the value of B:
B.loess = loess(B ~ A, span = 1, data = df)
predict(B.loess, newdata = a)
So far, so good. However, I then realise I can't extrapolate using loess (presumably because it is non-parametric?!). Extrapolating seems fairly natural here - the relationship looks like some kind of power-type behaviour, e.g.:
x = c(1:10)
y = 2^x
df = data.frame(A = x, B = y)
This is where I get unstuck. Firstly, what methods can I use to plot a line of best fit to this kind of ('power') data without using loess? Pathetic attempts such as:
p = ggplot(df, aes(A, B)) + geom_point() +
geom_smooth(method = 'lm', formula = log(y) ~ x)
give me errors. Also, assuming I am actually able to plot a line of best fit that I am happy with, I am having trouble using predict in the same way I did with loess. For example's sake, suppose I am happy with the line of best fit:
p = ggplot(df, aes(A, B)) + geom_point() +
geom_smooth(method = 'lm', formula = y ~ x)
then if I want to predict what value B would take if A was equal to 11 (theoretically 2^11), the following method does not work:
B.lm = lm(B ~ A)
predict(B.lm, newdata = 11)
Any help much appreciated. Cheers.
First, to answer your last question: you need to provide a data.frame whose column names are the predictors.
B.lm <- lm(B ~ A, data = df)
predict(B.lm, newdata = data.frame(A = 11))
#        1
# 683.3333
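Since the example data are exactly exponential (B = 2^A), a simple parametric alternative to loess is to fit a linear model on the log scale and back-transform the prediction. A minimal sketch, not from the original answer:
# log-linear fit: log(B) = a + b * A, i.e. B = exp(a) * exp(b)^A
B.log <- lm(log(B) ~ A, data = df)
exp(predict(B.log, newdata = data.frame(A = 11)))  # ~2048, i.e. 2^11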
As an alternative to loess you can try higher-degree polynomial regressions. In this plot I compare a third-degree polynomial to loess using latticeExtra (which makes it easier to add the xspline interpolation) in a layering syntax similar to ggplot2:
library(latticeExtra)  # for ggplot2like(), layer() and panel.smoother()
library(grid)          # for grid.xspline()
xyplot(A ~ B, data = df, par.settings = ggplot2like(),
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         grid.xspline(x, y, ..., default.units = "native")  ## xspline interpolation
       }) +
  layer(panel.smoother(y ~ poly(x, 3), method = "lm"), style = 1) +  ## polynomial
  layer(panel.smoother(y ~ x, span = 0.9), style = 2)                ## loess
The default surface for loess.control is "interpolate", which, unsurprisingly, doesn't allow extrapolation. The alternative, "direct", allows you to extrapolate, though a question remains as to whether this is meaningful.
predict(loess(hp ~ disp, mtcars), newdata = 1000)
# [1] NA
predict(loess(hp ~ disp, mtcars, control = loess.control(surface = "direct")), newdata = 1000)
# [1] -785.0545
