Plot pvalue information for mean comparisons by grouping variable

I've put together a plot to view groups separately, but now I want to include significance levels for pairwise mean comparisons in the plot. I can run the comparisons outside of the plot, but what would be the most efficient way of including them in it?
Current Plot
library(tidyverse)
dsub <- diamonds[ sample(nrow(diamonds), 10000), ]
dsub <- dsub %>%
filter(clarity %in% c('VS2', 'VS1', 'VVS2'))
ggplot(dsub, aes(x = cut, y = carat, fill = clarity)) +
geom_boxplot(outlier.size = 0) +
geom_point(pch = 21, position = position_jitterdodge())
Now I want to add the comparisons within each level of the cut variable between all levels of the clarity variable. I prefer using ggpubr but couldn't see how this could be achieved.

EDITED to take OP preference for output into account
Ahhhh... okay well let me at least save you a bunch of vertical space and neaten things up by overcoming the fact that rstatix doesn't honor the order of your factors and ggpubr wants its groups as character not factor.
library(ggplot2)
library(dplyr)
dsub <- diamonds[ sample(nrow(diamonds), 10000), ]
dsub <- dsub %>%
filter(clarity %in% c('VS2', 'VS1', 'VVS2'))
dsub <- droplevels(dsub)
dsub_stats <-
dsub %>%
group_by(cut) %>%
rstatix::wilcox_test(carat~clarity) %>%
mutate(group1 = factor(group1,
ordered = TRUE,
levels = c("VS2", "VS1", "VVS2"))) %>%
arrange(cut, group1) %>%
mutate(group1 = as.character(group1)) %>%
rstatix::add_xy_position(x='cut')
ggpubr::ggboxplot(dsub, x = "cut", y = "carat",
color = "clarity",
add='jitter') +
ggpubr::stat_pvalue_manual(dsub_stats,
label = "p.adj.signif",
tip.length = 0.01)
Created on 2020-09-24 by the reprex package (v0.3.0)
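The reordering step can be seen in isolation with base R alone. A minimal sketch, assuming a made-up vector of group labels (g is illustrative, not taken from the wilcox_test output): sorting a character vector is alphabetical, while sorting a factor follows its levels, and as.character() then hands plain strings back.

```r
# Hypothetical group labels, standing in for the group1 column
g <- c("VS1", "VS2", "VS1")

# Character vectors sort alphabetically
alphabetical <- sort(g)

# Factors sort by level order, here the order of the clarity levels kept in dsub
f <- factor(g, levels = c("VS2", "VS1", "VVS2"))
by_level <- as.character(sort(f))

print(alphabetical)
print(by_level)
```

That is why the answer converts group1 to a factor before arrange() and back to character afterwards.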

library(tidyverse)
library(rstatix)
library(ggpubr)
dsub <- diamonds[ sample(nrow(diamonds), 10000), ]
dsub <- dsub %>%
filter(clarity %in% c('VS2', 'VS1', 'VVS2'))
dsub_stats <- dsub %>%
group_by(cut) %>%
wilcox_test(carat~clarity) %>% add_xy_position(x='cut')
ggboxplot(dsub, x = "cut", y = "carat",
color = "clarity",
add='jitter'
) +
stat_pvalue_manual(dsub_stats, label = "p.adj.signif", tip.length = 0.01)

Related

Error in is.finite(x); need to add an additional line to a line chart (dplyr)

I have already attempted to search for this problem to no result. Have managed to reproduce the error below.
The problem: I'm trying to add a fourth line that represents the aggregate mean of all letters for each year. So far, I'm only able to generate the mean values for each letter. Everything runs fine until the last geom_line(), which is meant to draw the aggregate mean but throws the error instead. I've also tried inserting abline(). Another option would be to add the "Mean" values under Letters so that they are plotted anyway, but I believe there is a simpler method.
library(tidyverse)
Letters <- rep(c("A","B","C"),20)
Years <- rep(c(1990:1999),6)
Numbers <- runif(60, min = 0, max = 20)
df <- data.frame(Letters, Years, Numbers) %>%
group_by(Letters,Years) %>%
summarise(Letter_Mean= mean(Numbers),.groups = 'drop')
meanallletters <- df %>%
group_by(Years) %>%
summarise(all_mean = mean(Numbers),.groups = 'drop') %>%
select(-Years)
lineplotsample <- df %>%
ggplot(aes(x=Years, y=Letter_Mean, color = Letters))
## this doesn't work
lineplotsample + geom_line() + geom_point() + geom_line(aes(Years, y= meanallletters))
## this works, but missing the line representing aggregate mean
lineplotsample + geom_line() + geom_point()

I would summarize the data and then bind it to the bottom of the original data, like this:
library(tidyverse)
Letters <- rep(c("A","B","C"),20)
Years <- rep(c(1990:1999),6)
Numbers <- runif(60, min = 0, max = 20)
df <- data.frame(Letters, Years, Numbers) %>%
group_by(Letters,Years) %>%
summarise(Letter_Mean= mean(Numbers),.groups = 'drop')
meanallletters <- df %>%
group_by(Years) %>%
summarise(Letters = "All",
Letter_Mean = mean(Letter_Mean)) %>%
bind_rows(df,.) %>%
ungroup %>%
mutate(Letters = factor(Letters, levels=c("A", "B", "C", "All")))
meanallletters %>%
ggplot(aes(x=Years, y=Letter_Mean, color = Letters)) +
geom_line() +
geom_point()
Created on 2023-02-12 by the reprex package (v2.0.1)
Here's a more general way of specifying the levels. It also deals with the situation where Letters is initially a factor.
library(tidyverse)
Letters <- rep(LETTERS,20)
Years <- rep(c(1990:1999),26)
Numbers <- runif(26*10, min = 0, max = 20)
df <- data.frame(Letters, Years, Numbers) %>%
group_by(Letters,Years) %>%
summarise(Letter_Mean= mean(Numbers),.groups = 'drop')
meanallletters <- df %>%
mutate(Letters = as.character(Letters)) %>%
group_by(Years) %>%
summarise(Letters = "All",
Letter_Mean = mean(Letter_Mean)) %>%
bind_rows(df,.) %>%
ungroup %>%
mutate(Letters = factor(Letters, levels=c(levels(as.factor(df$Letters)), "All")))
meanallletters %>%
ggplot(aes(x=Years, y=Letter_Mean, color = Letters)) +
geom_line() +
geom_point()
Created on 2023-02-12 by the reprex package (v2.0.1)
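Another option, if you would rather not add an extra level to Letters, is to give the extra geom_line() its own data frame. This is a sketch under assumptions rather than the method above: the data are simulated again, and the names overall and p are made up for illustration. In the question, y = meanallletters maps a whole data frame to an aesthetic, which is probably what ggplot2 trips over.

```r
library(ggplot2)

set.seed(42)
# Simulated per-letter yearly means, standing in for df from the question
df <- data.frame(
  Letters = rep(c("A", "B", "C"), each = 10),
  Years = rep(1990:1999, 3),
  Letter_Mean = runif(30, min = 0, max = 20)
)

# One row per year: the mean across all letters
overall <- aggregate(Letter_Mean ~ Years, data = df, FUN = mean)

p <- ggplot(df, aes(x = Years, y = Letter_Mean, colour = Letters)) +
  geom_line() +
  geom_point() +
  # Separate layer with its own data; inherit.aes = FALSE drops the colour mapping
  geom_line(
    data = overall,
    aes(x = Years, y = Letter_Mean),
    colour = "black", linetype = "dashed",
    inherit.aes = FALSE
  )

print(p)
```

The trade-off is that the dashed line does not appear in the legend, which the bind_rows() approach gives you for free.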

R: How do you order error plots by variance?

Is it possible to order error plots in R by their variance? So that they are from greatest variance to least?
Code:
library(ggplot2)
df <- ToothGrowth
df$dose <- as.factor(df$dose)
head(df, 3)
library(dplyr)
df.summary <- df %>%
group_by(dose) %>%
summarise(
sd = sd(len, na.rm = TRUE),
len = mean(len)
)
df.summary
f <- ggplot(
df.summary,
aes(x = dose, y = len, ymin = len-sd, ymax = len+sd)
)
f + geom_pointrange()
# Standard error bars
f + geom_errorbar(width = 0.2) +
geom_point(size = 1.5)
Any help at all would be greatly appreciated!

Continue the pipe, coercing dose to an ordered factor whose level order is given by the numeric sd, from greatest to least. Then plot as in the question. All that needs to change is the mutate below.
df.summary <- df %>%
group_by(dose) %>%
summarise(
sd = sd(len, na.rm = TRUE),
len = mean(len)
) %>%
mutate(i = order(sd, decreasing = TRUE),
dose = ordered(dose, levels = dose[i])) %>%
select(-i)
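Base R's reorder() should give the same ordering without the helper column. A sketch, assuming the same ToothGrowth summary computed with aggregate() instead of dplyr (df_sd is a made-up name):

```r
# Per-dose standard deviation of len
df_sd <- aggregate(len ~ dose, data = ToothGrowth, FUN = sd)
names(df_sd)[2] <- "sd"

# reorder() sorts levels by increasing score, so negate sd for greatest-first
df_sd$dose <- reorder(factor(df_sd$dose), -df_sd$sd)

print(levels(df_sd$dose))
```

In a plot, the same idea can be written directly in the mapping as aes(x = reorder(dose, -sd), ...).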

How to fix "Breaks and labels are different lengths" when using ggplot2 for faceted plots?

Consider the following example:
library(ggplot2)
library(RColorBrewer)
library(magrittr)
library(dplyr)
df <- data.frame(x = seq(0, 70, 0.5),
y = seq(0, 70, 0.5),
val = rnorm(141),
group =rep(1:3,47))
max_val_plot <- df$val %>% max() %>% round(0)
min_val_plot <- df$val %>% min() %>% round(0)
breaks_plot <-seq(min_val_plot,max_val_plot,0.1)
n <- breaks_plot %>% length()
getPalette <- colorRampPalette(brewer.pal(9, "RdBu"))
colors_plot <-getPalette(n)
labels_plot <- breaks_plot %>%
as.character()
labels_plot[!1:0]=' '
df %>%
ungroup() %>%
ggplot(aes(x=x,y=y,fill=val))+
geom_raster()+
facet_grid(~group)+
theme_bw(base_size = 20)+
scale_fill_stepsn(
name = "",
colours = colors_plot,
breaks = breaks_plot,
labels = labels_plot
)
Although labels and breaks are of equal length, the error "Breaks and labels are different lengths" is returned, apparently because of the multiple groups and the faceting in the plotting code.
How can I fix this?
Thanks!

One option to fix your issue would be to pass a function to the labels argument of scale_fill_xxx to create the labels on the fly instead of providing the labels as a vector.
library(ggplot2)
library(RColorBrewer)
library(magrittr)
library(dplyr)
set.seed(123)
df %>%
ungroup() %>%
ggplot(aes(x = x, y = y, fill = val)) +
geom_raster() +
facet_grid(~group) +
theme_bw(base_size = 20) +
scale_fill_stepsn(
name = "",
colours = colors_plot,
breaks = breaks_plot,
labels = function(x) { x <- as.character(x); x[!1:0] <- " "; x}
)
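To see what the labels function does on its own, here is a small sketch; blank_every_other is a made-up name, and c(FALSE, TRUE) is simply what !1:0 evaluates to. The logical index is recycled along the vector, so every second label is blanked, whatever number of breaks the scale passes in.

```r
# Same logic as the anonymous function passed to labels
blank_every_other <- function(x) {
  x <- as.character(x)
  x[c(FALSE, TRUE)] <- " "  # recycled: positions 2, 4, 6, ...
  x
}

print(blank_every_other(seq(-1, 1, 0.5)))
```

Because the labels are built from the breaks the function receives, the two always have the same length.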

R: ggplot, filling density plot with different colors around the mean value

Problem
I am trying to fill the density plot with different colors around the mean. For example, the left part of density plot from a vertical line of the mean will be filled with blue, and the right part with red. I tried the below method with three facets. Within each facet, by setting fill = color, it separates the plot into two density plots around the mean. I want to have only one plot filled with two colors. Can I get some help here?
Sample Data and Current Method
library(tidyverse)
library(tibble)
library(data.table)
rename <- dplyr::rename
select <- dplyr::select
set.seed(10002)
id <- sample(1:1000, 1000, replace=F)
set.seed(10003)
group <- sample(c('A','B','C'), 1000, replace=T)
set.seed(10001)
value1 <- sample(1:300, 1000, replace=T)
set.seed(10004)
value2 <- sample(1:300, 1000, replace=T)
sample <-
data.frame(id, group, value1, value2)
mu <-
sample %>%
gather(state, value, -group, -id) %>%
plyr::ddply(c("group"), summarise, grp.mean=mean(value)) # ddply comes from plyr, which is not loaded above
p <-
sample %>%
gather(state, value, -group, -id) %>%
left_join(
mu,
by = 'group'
) %>%
distinct %>%
mutate(color = ifelse(value <= grp.mean, 'leq', 'greater')) %>%
select(-grp.mean) %>%
ggplot(aes(x = value, fill = color)) +
geom_density(alpha=0.4) +
geom_vline(
data = mu,
aes(xintercept = grp.mean, color = group),
linetype = "dashed"
) +
facet_wrap(.~group)
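One possible approach, sketched under assumptions: compute the density curve per group yourself with density(), tag each point of the curve by which side of the group mean it falls on, and draw the curve as a ribbon filled by that tag. The data below are simulated, and dat, dens and p are made-up names rather than objects from the code above.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the gathered sample data
dat <- data.frame(
  group = rep(c("A", "B", "C"), each = 200),
  value = c(rnorm(200, 100, 30), rnorm(200, 150, 40), rnorm(200, 200, 20))
)

# One density curve per group; each x position is tagged by its side of the group mean
dens <- do.call(rbind, lapply(split(dat, dat$group), function(d) {
  k <- density(d$value)
  data.frame(
    group = as.character(d$group[1]),
    x = k$x,
    y = k$y,
    side = ifelse(k$x <= mean(d$value), "leq", "greater")
  )
}))

p <- ggplot(dens, aes(x = x, fill = side)) +
  geom_ribbon(aes(ymin = 0, ymax = y), alpha = 0.4) +
  facet_wrap(~group)

print(p)
```

The two ribbons meet one grid step apart at the mean, so a hairline gap can show there; a dashed geom_vline() at the mean, as in the original plot, covers it.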

Use scale_x_continuous with labeller function that also takes a data frame as an argument as well as default breaks

Here's a code block:
# scale the log of price per group (cut)
my_diamonds <- diamonds %>%
mutate(log_price = log(price)) %>%
group_by(cut) %>%
mutate(scaled_log_price = scale(log_price) %>% as.numeric) %>% # scale within each group as opposed to overall
nest() %>%
mutate(mean_log_price = map_dbl(data, ~ .x$log_price %>% mean)) %>%
mutate(sd_log_price = map_dbl(data, ~ .x$log_price %>% sd)) %>%
unnest %>%
select(cut, price, scaled_log_price:sd_log_price) %>%
ungroup
# for each cut, find the back transformed actual values (exp) of each unit of zscore between -3:3
for (i in -3:3) {
my_diamonds <- my_diamonds %>%
mutate(!! paste0('mean_', ifelse(i < 0 , 'less_', 'plus_'), abs(i), 'z') := map2(.x = mean_log_price, .y = sd_log_price, ~ (.x + (i * .y)) %>% exp) %>% unlist)
}
my_diamonds_split <- my_diamonds %>% group_split(cut)
split_names <- my_diamonds %>% mutate(cut = as.character(cut)) %>% group_keys(cut) %>% pull(cut)
names(my_diamonds_split) <- split_names
I now have a variable my_diamonds_split that is a list of data frames. I would like to loop over these data frames and each time create a new ggplot.
I can use a custom labeller function with a single df, but I don't know how to do this within a loop:
labeller <- function(x) {
paste0(x,"\n", scales::dollar(sd(ex_df$price) * x + mean(ex_df$price)))
}
ex_df <- my_diamonds_split$Ideal
ex_df %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
scale_x_continuous(label = labeller, limits = c(-3, 3))
This creates a plot for the 'Ideal' cut of diamonds. Each x-axis tick also carries two pieces of information: the z-score values at -2, 0 and 2, and the corresponding raw dollar values of 3.8K, 3.9K and 11.8K.
When I define the labeller function, I must specify the df to scale with. I tried using the dot instead of ex_df, hoping that ggplot would pick up the current df on each iteration:
labeller <- function(x) {
paste0(x,"\n", scales::dollar(sd(.$price) * x + mean(.$price)))
}
ex_df <- my_diamonds_split$Ideal
ex_df %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
scale_x_continuous(label = labeller, limits = c(-3, 3))
Returns:
Error in is.data.frame(x) : object '.' not found
I then tried writing the function to accept an argument for the df to scale with:
labeller <- function(x, df) {
paste0(x,"\n", scales::dollar(sd(df$price) * x + mean(df$price)))
}
ex_df <- my_diamonds_split$Ideal
ex_df %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
scale_x_continuous(label = labeller(df = ex_df), limits = c(-3, 3)) # because when it comes to running in real life, I will try something like labeller(df = my_diamonds_split[[i]])
Error in paste0(x, "\n", scales::dollar(sd(df$price) * x + mean(df$price))) :
argument "x" is missing, with no default
Bearing in mind that the scaling must be done per iteration, how could I loop over my_diamonds_split, and on each iteration generate a ggplot per above?
labeller <- function(x) {
# how can I make df variable
paste0(x,"\n", scales::dollar(sd(df$price) * x + mean(df$price)))
}
for (i in split_names) {
my_diamonds_split[[i]] %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
scale_x_continuous(label = labeller, # <--- here, labeller must be defined with df$price except that will differ on each iteration
limits = c(-3, 3))
}

There's a hacky way to get this result in facets. Basically, after converting to z scores, you add different amounts (say, multiples of 1000) to each group's z scores. Then you set all the breaks to this collection of points and label them with pre-calculated labels.
library(ggplot2)
library(dplyr)
f <- function(x) {
y <- diamonds$price[diamonds$cut == x]
paste(seq(-3, 3), scales::dollar(round(mean(y) + seq(-3, 3) * sd(y))), sep = "\n")
}
breaks <- as.vector(sapply(levels(diamonds$cut), f))
diamonds %>%
group_by(cut) %>%
mutate(z = scale(price) + 3 + 1000 * as.numeric(cut)) %>%
ggplot(aes(z)) +
geom_point(aes(x = z - 2, y = 1), alpha = 0) +
geom_density() +
scale_x_continuous(breaks = as.vector(sapply(1:5 * 1000, "+", 0:6)),
labels = breaks) +
facet_wrap(vars(cut), scales = "free_x") +
theme(text = element_text(size = 16),
axis.text.x = element_text(size = 6))
You would have to increase the plot size to make the dollar values more visible of course.
Created on 2020-08-04 by the reprex package (v0.3.0)
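If you want one plot per cut from a loop rather than facets, a function factory may be simpler. This is a sketch under assumptions, not part of the approach above: make_labeller, splits and plots are made-up names, and the z-scores are taken on price itself so that they agree with the dollar labels. make_labeller() returns a new one-argument function each time, with that data frame's mean and sd baked in, which is the form scale_x_continuous() expects.

```r
library(ggplot2)

# Returns a labelling function bound to the mean and sd of df$price
make_labeller <- function(df) {
  m <- mean(df$price)
  s <- sd(df$price)
  function(x) paste0(x, "\n", scales::dollar(s * x + m))
}

splits <- split(diamonds, diamonds$cut)

# One plot per cut, each with its own labeller
plots <- lapply(splits, function(d) {
  d$z <- as.numeric(scale(d$price))
  ggplot(d, aes(x = z)) +
    geom_density() +
    scale_x_continuous(labels = make_labeller(d), limits = c(-3, 3))
})

print(plots$Ideal)
```

Inside a for loop the call would look the same, for example labels = make_labeller(my_diamonds_split[[i]]).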
