Changing order of letters of significance in TUKEY test - r

I have created a boxplot using ggplot from the results of my Tukey test. I have added the letters of significance above my boxes, but the letters are not in order. I want my first sample to be "a", followed by "b" and then "c".
I used the following code:
value_max =
Rosettes %>%
group_by(Genotype) %>%
summarize(max_value = max(X0.5xMS))
hsd=HSD.test(aov(X0.5xMS~Genotype, data=Rosettes), trt = "Genotype", group = T)
sig.letters <- hsd$groups[order(row.names(hsd$groups)), ]
p <- ggplot(data = Rosettes, aes(x = Genotype, y = X0.5xMS)) +
geom_boxplot(aes(fill = Genotype,)) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+ theme(axis.text.x = element_text(angle = 90)) +
geom_text(data = value_max, aes(x=Genotype, y = 0.1 + max_value, label = sig.letters$groups), vjust=0)+
stat_boxplot(geom = 'errorbar', width = 0.1)+
ggtitle("Rosette Tukey Results \n 0.5xMS") + theme(plot.title = element_text(hjust=0.5))+
xlab("Genotype") + ylab("Rosette Area (cm2)"); p
This code has given me the desired graph; the order of the letters is my only issue. If anyone could help, I would be very grateful.

Does this help? I know it is not your data and not your functions (aov and HSD.test), but I believe it does what you want. (Please find more about the compact letter display here.)
library(emmeans)
library(multcomp)
library(multcompView)
# set up model
model <- lm(weight ~ group, data = PlantGrowth)
# get (adjusted) weight means per group
model_means <- emmeans(object = model,
specs = "group")
# add letters to each mean
model_means_cld <- cld(object = model_means,
adjust = "Tukey",
Letters = letters,
reversed = TRUE, # <---- this one here!
alpha = 0.05)
# show output
model_means_cld
#> group emmean SE df lower.CL upper.CL .group
#> trt2 5.53 0.197 27 5.02 6.03 a
#> ctrl 5.03 0.197 27 4.53 5.53 ab
#> trt1 4.66 0.197 27 4.16 5.16 b
#>
#> Confidence level used: 0.95
#> Conf-level adjustment: sidak method for 3 estimates
#> P value adjustment: tukey method for comparing a family of 3 estimates
#> significance level used: alpha = 0.05
Created on 2021-10-18 by the reprex package (v2.0.1)
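If you would rather keep the HSD.test() output from the question, the letters can also be relabelled after the fact, because the letters in a compact letter display are arbitrary group labels. The following is only a sketch in base R: relabel_letters() is a hypothetical helper (not part of agricolae or emmeans), and it assumes the letter strings are already sorted in the order the boxes appear on the x axis.

```r
# Hypothetical helper: rename compact-letter-display labels so that the
# first letter encountered (reading left to right along the x axis)
# becomes "a", the next new one "b", and so on.
relabel_letters <- function(groups) {
  chars <- strsplit(trimws(as.character(groups)), "")
  seen  <- unique(unlist(chars))                     # letters in order of first appearance
  map   <- setNames(letters[seq_along(seen)], seen)  # old letter -> new letter
  vapply(chars,
         function(ch) paste(sort(unname(map[ch])), collapse = ""),
         character(1))
}

# Made-up letter strings, in x-axis order, purely for illustration
old_letters <- c("b", "ab", "a")
new_letters <- relabel_letters(old_letters)
new_letters
```

With the question's objects, the assumed usage would be label = relabel_letters(sig.letters$groups) inside geom_text(). The grouping itself is unchanged; only the names of the groups are swapped.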

Related

Is there an R Function to make a funnel plot using P Values and Confidence Intervals Only?

Within my data I only have the confidence intervals, Odds Ratios and PValues and I would like to make a Funnel Plot to prove a false heterogeneity within my meta-analysis in R. But I can't seem to find the function to make this work... I have attached some example data so you can see what I am working with...
Article Lower Upper OR PValue
debbie 1.5 1.8 1.67 0.0001
Michelle 1.25 1.67 1.45 0.025
Richard 0.02 1.08 0.9 0.009
Any help would be greatly appreciated, even if you can only point me in the right direction...
I can find plenty of articles that explain how to make funnel plots in R using the funnel.plots function; however, none of them work with the data I have, so I am a little stuck.
If we take the natural log of the Upper and Lower variables, they should each be 1.96 standard errors from the log of the odds ratio. We can therefore calculate the standard error for each study.
We can then draw a funnel plot with the log odds ratio on the y axis and the inverse of the calculated standard error (the precision) on the x axis.
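As a minimal sketch of that calculation in base R, using the three example rows from the question: the column names log_or, se and precision are my own, and dividing the full width of the log interval by 2 x 1.96 is one reasonable variant (it is a little less sensitive to rounding of the reported OR than using only the upper bound).

```r
# The three studies from the question (values copied from the table)
studies <- data.frame(
  Article = c("debbie", "Michelle", "Richard"),
  Lower   = c(1.5, 1.25, 0.02),
  Upper   = c(1.8, 1.67, 1.08),
  OR      = c(1.67, 1.45, 0.9)
)

z <- qnorm(0.975)  # approximately 1.96 for a 95% interval

# On the log scale the interval is symmetric: log(OR) +/- z * SE,
# so its full width is 2 * z * SE.
studies$log_or    <- log(studies$OR)
studies$se        <- (log(studies$Upper) - log(studies$Lower)) / (2 * z)
studies$precision <- 1 / studies$se   # x axis of the funnel plot

studies
```

If a reported interval is far from symmetric around log(OR), as in the third row, the two ways of backing out the standard error will disagree, which is worth checking before plotting.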
With your own data set, there are only 3 points, so let's make a larger data set to demonstrate this. We will create a table with the same columns, based on sample data which should have an average odds ratio of 1:
set.seed(1)
data <- do.call(rbind, lapply(1:100, function(i) {
n <- round(10^runif(1, 1, 4))
dat <- data.frame(x = sample(0:1, size = n, replace = TRUE),
y = sample(0:1, size = n, replace = TRUE))
coefs <- summary(glm(y ~ x, family = binomial, data = dat))$coef
data.frame(Article = as.character(i),
Lower = round(exp(coefs[2, 1] - 1.96 * coefs[2, 2]), 2),
Upper = round(exp(coefs[2, 1] + 1.96 * coefs[2, 2]), 2),
OR = round(exp(coefs[2, 1]), 2),
PValue = coefs[2, 4])
}))
head(data)
#> Article Lower Upper OR PValue
#> 1 1 0.58 4.22 1.56 0.3797660
#> 2 2 0.29 3.31 0.98 0.9764189
#> 3 3 0.90 1.28 1.07 0.4417953
#> 4 4 0.57 1.50 0.92 0.7421806
#> 5 5 0.56 17.33 3.11 0.1952396
#> 6 6 0.96 1.13 1.04 0.3525411
If we take the log of the odds ratio columns, and calculate the standard error, we can then plot the points in a funnel with 95% confidence intervals. Furthermore, we can color the points that have a p value of less than 0.05 to show these are outside our funnel:
library(dplyr)
library(ggplot2)
data %>%
mutate(across(Lower:OR, log)) %>%
mutate(se = (Upper-OR) /1.96) %>%
ggplot(aes(1 / se, OR)) +
geom_hline(yintercept = 0, linetype = 2) +
geom_point(aes(color = ifelse(PValue < 0.05, 'red', 'gray50'))) +
geom_function(fun = ~ 1.96/(.x), linetype = 2, color = 'red3') +
geom_function(fun = ~ -1.96/(.x), linetype = 2, color = 'red3') +
coord_cartesian(ylim = c(-3, 3)) +
labs(x = 'Inverse standard error', y = 'Log odds ratio') +
theme_minimal(base_size = 16) +
scale_color_identity()
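The dashed funnel lines are drawn at plus or minus 1.96 divided by the precision, so a point falls outside the funnel exactly when its Wald z statistic exceeds 1.96 in absolute value, which is the same as a two-sided p value below 0.05. A small base R sketch of that equivalence, using made-up log odds ratios and standard errors purely for illustration:

```r
# Illustrative (made-up) log odds ratios and standard errors
log_or <- c(0.6, 0.5, 0.1)
se     <- c(0.5, 0.2, 0.1)

z_crit    <- qnorm(0.975)   # approximately 1.96
precision <- 1 / se

# A point is outside the funnel if |log OR| exceeds the funnel boundary
outside_funnel <- abs(log_or) > z_crit / precision

# Two-sided Wald p value for each study
p_value <- 2 * pnorm(-abs(log_or / se))

data.frame(log_or, se, precision, outside_funnel, p_value)
```

This only holds when the p values come from the same Wald test that the confidence intervals are based on; p values from a different test need not line up with the funnel.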

How do you plot smooth components of different GAMs in same panel?

I have two GAMs which have the same predictor variables but different dependent (response) variables. I would like to combine the two GAMs into a set of plots where the smooth component (partial residuals) of each predictor variable is in the same panel (differentiated with e.g. color). Reproducible example:
# Required packages
require(mgcv)
require(mgcViz)
# Dataset
data("swiss")
# GAM models
fit1 <- mgcv::gam(Fertility ~ s(Examination) + s(Education), data = swiss)
fit2 <- mgcv::gam(Agriculture ~ s(Examination) + s(Education), data = swiss)
# Converting GAM objects to a gamViz objects
viz_fit1 <- mgcViz::getViz(fit1)
viz_fit2 <- mgcViz::getViz(fit2)
# Make plotGAM objects
trt_fit1 <- plot(viz_fit1, allTerms = T) + l_fitLine()
trt_fit2 <- plot(viz_fit2, allTerms = T) + l_fitLine()
# Print plots
print(trt_fit1, pages = 1)
print(trt_fit2, pages = 1)
Plot of fit1 looks like this:
And fit2 like this:
So I would like to combine the two Examination smooths into one panel and the two Education smooths into another, showing the dependent variable (from the different GAMs) with a different color/linetype.
You could also do this using my {gratia} 📦 and the compare_smooths() function:
library("gratia")
library("mgcv")
# Dataset
data("swiss")
# GAM models
fit1 <- gam(Fertility ~ s(Examination) + s(Education),
data = swiss, method = "REML")
fit2 <- gam(Agriculture ~ s(Examination) + s(Education),
data = swiss, method = "REML")
# create an object that contains the info to compare smooths
comp <- compare_smooths(fit1, fit2)
# plot
draw(comp)
This produces
The output from compare_smooths() is a nested data frame (tibble):
r$> comp
# A tibble: 4 × 5
model smooth type by data
<chr> <chr> <chr> <chr> <list>
1 fit1 s(Education) TPRS NA <tibble [100 × 3]>
2 fit2 s(Education) TPRS NA <tibble [100 × 3]>
3 fit1 s(Examination) TPRS NA <tibble [100 × 3]>
4 fit2 s(Examination) TPRS NA <tibble [100 × 3]>
So if you want to customise the plot, you'll need to know how to work with nested data frames, or just do
library("tidyr")
unnest(comp, data)
which gets you:
r$> unnest(comp, data)
# A tibble: 400 × 8
model smooth type by est se Education Examination
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 fit1 s(Education) TPRS NA 1.19 3.48 1 NA
2 fit1 s(Education) TPRS NA 1.37 3.20 1.53 NA
3 fit1 s(Education) TPRS NA 1.56 2.94 2.05 NA
4 fit1 s(Education) TPRS NA 1.75 2.70 2.58 NA
5 fit1 s(Education) TPRS NA 1.93 2.49 3.10 NA
6 fit1 s(Education) TPRS NA 2.11 2.29 3.63 NA
7 fit1 s(Education) TPRS NA 2.28 2.11 4.15 NA
8 fit1 s(Education) TPRS NA 2.44 1.95 4.68 NA
9 fit1 s(Education) TPRS NA 2.59 1.82 5.20 NA
10 fit1 s(Education) TPRS NA 2.72 1.71 5.73 NA
# … with 390 more rows
To create your own plots then, we proceed from the unnested data frames and add the confidence interval
ucomp <- unnest(comp, data) %>%
add_confint()
Then plot each panel in turn
library("ggplot2")
library("dplyr")
p_edu <- ucomp |>
filter(smooth == "s(Education)") |> # <-- only one comparison at a time
ggplot(aes(x = Education, y = est)) +
geom_ribbon(aes(ymin = lower_ci, ymax = upper_ci, fill = model),
alpha = 0.2) +
geom_line(aes(colour = model)) +
scale_fill_brewer(palette = "Set1") + # <-- change fill scale
scale_colour_brewer(palette = "Set1") + # <-- change colour scale
geom_rug(data = swiss, # <-- rug
mapping = aes(x = Education, y = NULL),
sides = "b", alpha = 0.4) +
labs(title = "s(Education)", y = "Estimate",
colour = "Model", fill = "Model")
p_exam <- ucomp |>
filter(smooth == "s(Examination)") |>
ggplot(aes(x = Examination, y = est)) +
geom_ribbon(aes(ymin = lower_ci, ymax = upper_ci, fill = model),
alpha = 0.2) +
geom_line(aes(colour = model)) +
scale_fill_brewer(palette = "Set1") + # <-- change fill scale
scale_colour_brewer(palette = "Set1") + # <-- change colour scale
geom_rug(data = swiss, # <-- rug
mapping = aes(x = Examination, y = NULL),
sides = "b", alpha = 0.4) +
labs(title = "s(Examination)", y = "Estimate",
colour = "Model", fill = "Model")
Now use the {patchwork} package to put the plots together
library("patchwork")
p_edu + p_exam + plot_layout(guides = "collect")
which produces
This is all {ggplot2}, so if you want more control over the colours you'll need to look at other scales (see ?scale_fill_manual, for example), or supply another ready-made discrete scale if you want to use an existing palette.
I could make some of this easier in {gratia} - I could allow users to provide a scale to be used for the colour and fill, and also if they supply the raw data I could draw the rugs too.
If you want them in the same plot, you can pull the data from your fit with trt_fit1[["plots"]][[1]]$data$fit and plot them yourself. I looked at the plot style from the mgcViz github. You can add a second axis or scale as necessary.
library(tidyverse)
exam_dat <-
bind_rows(trt_fit1[["plots"]][[1]]$data$fit %>% mutate(fit = "Fit 1"),
trt_fit2[["plots"]][[1]]$data$fit %>% mutate(fit = "Fit 2"))
ggplot(data = exam_dat, aes(x = x, y = y, colour = fit)) +
geom_line() +
labs(x = "Examination", y = "s(Examination)") +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
To simply get them side by side in the same figure, you could use gridExtra, as each panel in trt_fit1 and trt_fit2 holds a ggplot object.
gridExtra::grid.arrange(
trt_fit1[["plots"]][[2]]$ggObj,
trt_fit2[["plots"]][[2]]$ggObj,
nrow = 1)
Created on 2022-02-18 by the reprex package (v2.0.1)

Add geom_smooth to ggplot facets conditionally based on p-value

I'm using ggplot to visualize many linear regressions and facet them by groups. I'd like geom_smooth() to show the trend line as one color if P < 0.05, a different color if P < 0.10, and not show it at all if P ≥ 0.10.
I managed to do this using a loop to extract P-values from lm() for each regression, then join them with the data used for plotting. Then I add another column of color names to pass to aes(), determined conditionally from the P-values, and use scale_color_identity() to achieve my goal.
Here's an example:
library(tidyverse)
#make mtcars a tibble and cyl a factor, for convenience
mtcars1 <- as_tibble(mtcars) %>% dplyr::mutate(cyl = as.factor(cyl))
#initialize a list to store p-values from lm() for each level of factor
p.list <- vector(mode = "list", length = length(levels(mtcars1$cyl)))
names(p.list) <- levels(mtcars1$cyl)
#loop to calculate p-values for each level of mtcars$cyl
for(i in seq_along(levels(mtcars1$cyl))){
mtcars.sub <- mtcars1 %>% dplyr::filter(cyl == levels(.$cyl)[i])
lm.pval <- mtcars.sub %>%
dplyr::distinct(cyl) %>%
dplyr::mutate(P =
summary(lm(mpg ~ disp, data = mtcars.sub))$coefficients[2,4] ##extract P-value
)
p.list[[i]] <- lm.pval
}
#join p-values to dataset and add column to use with scale_color_identity()
mtcars.p <- mtcars1 %>% dplyr::left_join(dplyr::bind_rows(p.list, .id = "cyl"), by = "cyl") %>%
dplyr::mutate(p.color = ifelse(P < 0.05, "black",
ifelse(P < 0.10, "lightblue", NA)))
#plot
ggplot(data = mtcars.p, aes(x = disp, y = mpg)) +
geom_smooth(method = "lm",
se = FALSE,
aes(color = p.color)) +
geom_point() +
scale_color_identity(name = NULL,
na.translate = FALSE,
labels = c("P < 0.05", "P < 0.10"),
guide = "legend") +
facet_wrap(~cyl, scales = "free")
This seems like too many initial steps for something that should be relatively easy. Are these steps necessary, or is there a more efficient way of doing this? Can ggplot or any other packages out there do this on their own, without having to first extract p-values from lm()?
After specifying your regression function, you can include the line function within ggplot:
myline<-lm(mpg ~ disp, data = mtcars)
ggplot(data = mtcars, aes(x = disp, y = mpg)) +
geom_abline(slope = coef(myline)[[2]], intercept = coef(myline)[[1]], color='blue')+
geom_point(color='red') +
scale_color_identity(name = NULL,
na.translate = FALSE,
labels = c("P < 0.05", "P < 0.10"),
guide = "legend") +
facet_wrap(~cyl, scales = "free")
Alternatively, geom_smooth() can fit the line itself with method = 'lm'. It has no slope or intercept arguments (ggplot2 ignores them with a warning), so they are dropped here:
geom_smooth(method = 'lm', se = FALSE, color = 'blue') +
We can simplify the steps with a group-by operation. Also, instead of extracting each component by hand, tidy() from broom returns the model output as a tibble:
library(broom)
library(dplyr)
library(tidyr)
mtcars1 %>%
group_by(cyl) %>%
summarise(out = list(tidy(lm(mpg ~ disp, data = cur_data())))) %>%
unnest(out)
Output:
# A tibble: 6 x 6
cyl term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 4 (Intercept) 40.9 3.59 11.4 0.00000120
2 4 disp -0.135 0.0332 -4.07 0.00278
3 6 (Intercept) 19.1 2.91 6.55 0.00124
4 6 disp 0.00361 0.0156 0.232 0.826
5 8 (Intercept) 22.0 3.35 6.59 0.0000259
6 8 disp -0.0196 0.00932 -2.11 0.0568
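For comparison, here is a minimal base R sketch of the same idea without any packages: fit one regression per cylinder group, pull out the p value of the disp slope, and map it to the line colours described in the question. The object names p_by_cyl and line_colour are my own; the colour rule is copied from the question.

```r
# Fit mpg ~ disp separately within each cylinder group and
# extract the p value of the disp coefficient
p_by_cyl <- sapply(split(mtcars, mtcars$cyl), function(d) {
  summary(lm(mpg ~ disp, data = d))$coefficients["disp", "Pr(>|t|)"]
})

# Map each p value to a line colour; NA means "draw no trend line"
line_colour <- ifelse(p_by_cyl < 0.05, "black",
               ifelse(p_by_cyl < 0.10, "lightblue", NA))

# One row per facet, ready to be joined back onto the plotting data
p_table <- data.frame(cyl = names(p_by_cyl),
                      P = unname(p_by_cyl),
                      p.color = unname(line_colour))
p_table
```

The p values still have to be computed before plotting; this only replaces the explicit loop and list bookkeeping with a single split-and-apply step.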

Splitting data frame in two parts (w.r.t. NA) and plotting one conditional graph concerning each part

consider the following data frame
set.seed(1357)
DS <- data.frame("group" = c("a", "b", rep(letters[3:6], each=2)),
"condition" = c(NA, NA, rep(c("cond1","cond2"), times=4)),
"oddsratio" = round(abs(rnorm(10,3)),2),
"lower" = round(abs(rnorm(10,0)),2),
"upper" = round(abs(rnorm(10,6)),2))
> DS
group condition oddsratio lower upper
1 a <NA> 3.37 0.66 4.59
2 b <NA> 4.77 0.31 6.37
3 c cond1 2.02 0.92 5.86
4 c cond2 3.37 0.71 7.38
5 d cond1 1.15 0.13 3.30
6 d cond2 3.74 0.25 7.28
7 e cond1 2.81 0.89 3.37
8 e cond2 4.15 1.87 5.32
9 f cond1 3.22 0.72 4.88
10 f cond2 2.02 0.54 7.43
"lower" and "upper" are lower and upper bound of the confidence interval, respectively.
I want to plot odds ratios with the corresponding confidence intervals over groups "a" to "f", grouped by "condition". For the two groups "a" and "b" we have NA under "condition". ggplot should ignore this and also plot the odds ratio and confidence interval for groups "a" and "b", without considering "condition". I guess I should split my data frame with respect to the column "condition" and then use ggplot for each part, but I don't know how! Here is my code, which gives odds ratios and confidence intervals only for groups "c", "d", "e" and "f". I would be thankful for any help.
p <- ggplot(DS, aes(x = group, y = oddsratio, color=condition)) +
theme_bw() +
geom_point(size = 3, position=position_dodge(.4)) +
geom_errorbar(aes(x = group, ymin = lower, ymax = upper), position=position_dodge(.4)) +
scale_color_manual(labels = c("cond1","cond2"), values = c("black","grey")) +
labs(x = "Population",
y = "Odds Ratio with 95% confidence interval")
p
As commented, consider converting the NA values into a new condition such as "NA", then run plot but adjust scale_color_manual() to add one more label and value for the new group:
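# Set the levels explicitly so they line up with the labels and values given to scale_color_manual()
DS$condition <- factor(ifelse(is.na(DS$condition), "NA", as.character(DS$condition)), levels = c("cond1", "cond2", "NA"))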
ggplot(DS, aes(x = group, y = oddsratio, color=condition)) +
theme_bw() +
geom_point(size = 3, position=position_dodge(.4)) +
geom_errorbar(aes(x = group, ymin = lower, ymax = upper), position=position_dodge(.4)) +
scale_color_manual(labels = c("cond1","cond2", "NA"), values = c("black","darkgrey", "grey")) +
labs(x = "Population",
y = "Odds Ratio with 95% confidence interval")

ggplot2: how to get robust confidence interval for predictions in geom_smooth?

consider this simple example
dataframe <- tibble(x = c(1,2,3,4,5,6),
y = c(12,24,24,34,12,15))
> dataframe
# A tibble: 6 x 2
x y
<dbl> <dbl>
1 1 12
2 2 24
3 3 24
4 4 34
5 5 12
6 6 15
dataframe %>% ggplot(., aes(x = x, y = y)) +
geom_point() +
geom_smooth(method = 'lm', formula = y~x)
Here the standard errors are computed with the default option. However, I would like to use the robust variance-covariance matrix available in the packages sandwich and lmtest.
That is, using vcovHC(mymodel, "HC3")
Is there a way to get that in a simple way using the geom_smooth() function?
UPDATE: 2021-03-17 It was recently pointed out to me that the ggeffects package handles different VCOVs automatically, including the trickier HAC case that I originally demonstrated below. Quick example of the latter:
library(ggeffects)
library(sandwich) ## For HAC and other robust VCOVs
d <- data.frame(x = c(1,2,3,4,5,6),
y = c(12,24,24,34,12,15))
reg1 <- lm(y ~ x, data = d)
plot(ggpredict(reg1, "x", vcov.fun = "vcovHAC"))
#> Loading required namespace: ggplot2
## This gives you a regular ggplot2 object. So you can add layers as you
## normally would. E.g. If you'd like to compare with the original data...
library(ggplot2)
last_plot() +
geom_point(data = d, aes(x, y)) +
labs(caption = 'Shaded region indicates HAC 95% CI.')
Created on 2021-03-17 by the reprex package (v1.0.0)
My original answer follows below...
HC robust SEs (simple)
This is easily done now thanks to the estimatr package and its family of lm_robust functions. E.g.
library(tidyverse)
library(estimatr)
d <- data.frame(x = c(1,2,3,4,5,6),
y = c(12,24,24,34,12,15))
d %>%
ggplot(aes(x = x, y = y)) +
geom_point() +
geom_smooth(method = 'lm_robust', formula = y~x, fill="#E41A1C") + ## Robust (HC) SEs
geom_smooth(method = 'lm', formula = y~x, col = "grey50") + ## Just for comparison
labs(
title = "Plotting HC robust SEs in ggplot2",
subtitle = "Regular SEs in grey for comparison"
) +
theme_minimal()
Created on 2020-03-08 by the reprex package (v0.3.0)
HAC robust SEs (a bit more legwork)
The one caveat is that estimatr does not yet offer support for HAC (i.e. heteroscedasticity and autocorrelation consistent) SEs a la Newey-West. However, it is possible to obtain these manually with the sandwich package... which is kind of what the original question was asking anyway. You can then plot them using geom_ribbon().
I'll say for the record that HAC SEs don't make much sense for this particular data set. But here's an example of how you could do it, riffing off this excellent SO answer on a related topic.
library(tidyverse)
library(sandwich)
d <- data.frame(x = c(1,2,3,4,5,6),
y = c(12,24,24,34,12,15))
reg1 <- lm(y~x, data = d)
## Generate a prediction DF
pred_df <- data.frame(fit = predict(reg1))
## Get the design matrix
X_mat <- model.matrix(reg1)
## Get HAC VCOV matrix and calculate SEs
v_hac <- NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) ## HAC VCOV (adjusted for small data sample)
#> Warning in meatHAC(x, order.by = order.by, prewhite = prewhite, weights =
#> weights, : more weights than observations, only first n used
var_fit_hac <- rowSums((X_mat %*% v_hac) * X_mat) ## Point-wise variance for predicted mean
se_fit_hac <- sqrt(var_fit_hac) ## SEs
## Add these to pred_df and calculate the 95% CI
pred_df <-
pred_df %>%
mutate(se_fit_hac = se_fit_hac) %>%
mutate(
lwr_hac = fit - qt(0.975, df=reg1$df.residual)*se_fit_hac,
upr_hac = fit + qt(0.975, df=reg1$df.residual)*se_fit_hac
)
pred_df
#> fit se_fit_hac lwr_hac upr_hac
#> 1 20.95238 4.250961 9.149822 32.75494
#> 2 20.63810 2.945392 12.460377 28.81581
#> 3 20.32381 1.986900 14.807291 25.84033
#> 4 20.00952 1.971797 14.534936 25.48411
#> 5 19.69524 2.914785 11.602497 27.78798
#> 6 19.38095 4.215654 7.676421 31.08548
## Plot it
bind_cols(
d,
pred_df
) %>%
ggplot(aes(x = x, y = y, ymin=lwr_hac, ymax=upr_hac)) +
geom_point() +
geom_ribbon(fill="#E41A1C", alpha=0.3, col=NA) + ## Robust (HAC) SEs
geom_smooth(method = 'lm', formula = y~x, col = "grey50") + ## Just for comparison
labs(
title = "Plotting HAC SEs in ggplot2",
subtitle = "Regular SEs in grey for comparison",
caption = "Note: Do HAC SEs make sense for this dataset? Definitely not!"
) +
theme_minimal()
Created on 2020-03-08 by the reprex package (v0.3.0)
Note that you could also use this approach to manually calculate and plot other robust SE predictions (e.g. HC1, HC2,etc.) if you so wished. All you would need to do is use the relevant sandwich estimator. For instance, using vcovHC(reg1, type = "HC2") instead of NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) will give you an identical HC-robust CI to the first example that uses the estimatr package.
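To make the mechanics of that last note concrete, here is a sketch that builds an HC3 covariance matrix by hand in base R and turns it into a point-wise confidence band, following the same design-matrix step used above. The HC3 matrix is hand-rolled only to keep the example free of package dependencies; in practice sandwich::vcovHC(reg1, type = "HC3") should give the same matrix. The names v_hc3, se_fit_hc3 and pred_hc3 are my own.

```r
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(12, 24, 24, 34, 12, 15))
reg1 <- lm(y ~ x, data = d)

X <- model.matrix(reg1)   # design matrix (intercept and x)
e <- residuals(reg1)      # OLS residuals
h <- hatvalues(reg1)      # leverage of each observation

# HC3 sandwich: (X'X)^-1  X' diag(e^2 / (1 - h)^2) X  (X'X)^-1
bread <- solve(crossprod(X))
meat  <- crossprod(X * (e / (1 - h)))
v_hc3 <- bread %*% meat %*% bread

# Point-wise SE of the fitted mean: sqrt of diag(X V X')
se_fit_hc3 <- sqrt(rowSums((X %*% v_hc3) * X))

crit <- qt(0.975, df = reg1$df.residual)
pred_hc3 <- data.frame(x   = d$x,
                       fit = unname(fitted(reg1)),
                       lwr = unname(fitted(reg1) - crit * se_fit_hc3),
                       upr = unname(fitted(reg1) + crit * se_fit_hc3))
pred_hc3
```

The resulting lwr and upr columns can be passed to geom_ribbon() in the same way as lwr_hac and upr_hac in the HAC example.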
I am very new to this whole robust SE thing, but I was able to generate the following:
zz = '
x y
1 1 12
2 2 24
3 3 24
4 4 34
5 5 12
6 6 15
'
df <- read.table(text = zz, header = TRUE)
df
library(sandwich)
library(lmtest)
lm.model<-lm(y ~ x, data = df)
coef(lm.model)
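## vcovHC() returns the coefficient covariance matrix, so convert it to point-wise SEs of the fitted values
X <- model.matrix(lm.model)
se = sqrt(rowSums((X %*% vcovHC(lm.model, type = "HC3")) * X))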
fit = predict(lm.model)
predframe <- with(df,data.frame(x,
y = fit,
lwr = fit - 1.96 * se,
upr = fit + 1.96 * se))
library(ggplot2)
ggplot(df, aes(x = x, y = y))+
geom_point()+
geom_line(data = predframe)+
geom_ribbon(data = predframe, aes(ymin = lwr,ymax = upr), alpha = 0.3)
