How to set a dual y-axis in a geom_bar plot in ggplot2?

I'd like to draw a bar plot like this one, but with dual y-axes:
(https://i.stack.imgur.com/ldMx0.jpg)
The first three indices range from 0 to 1,
so I want the left y-axis (corresponding to NSE, KGE, VE) to range from 0 to 1,
and the right y-axis (corresponding to PBIAS) to range from -15 to 5.
Here are my data and code:
library("ggplot2")
## data
data <- data.frame(
value=c(0.82,0.87,0.65,-3.39,0.75,0.82,0.63,1.14,0.85,0.87,0.67,-7.03),
sd=c(0.003,0.047,0.006,4.8,0.003,0.028,0.006,4.77,0.004,0.057,0.014,4.85),
index=c("NSE","KGE","VE","PBIAS","NSE","KGE","VE","PBIAS","NSE","KGE","VE","PBIAS"),
period=c("all","all","all","all","calibration","calibration","calibration","calibration","validation","validation","validation","validation")
)
## fix index sequence
data$index <- factor(data$index, levels = c('NSE','KGE','VE',"PBIAS"))
data$period <- factor(data$period, levels = c('all','calibration', 'validation'))
## bar plot
ggplot(data, aes(x=index, y=value, fill=period))+
geom_bar(position="dodge", stat="identity")+
geom_errorbar(aes(ymin=value-sd, ymax=value+sd),
position = position_dodge(0.9), width=0.2 ,alpha=0.5, size=1)+
theme_bw()
I tried to scale and shift the second y-axis,
but the PBIAS bars were removed because their values fall outside the scale limits:
(https://i.stack.imgur.com/n6Jfm.jpg)
Here is my code with the dual y-axis:
## bar plot (scale and shift the second y-axis with slope 20 and intercept -15)
ggplot(data, aes(x=index, y=value, fill=period))+
geom_bar(position="dodge", stat="identity")+
geom_errorbar(aes(ymin=value-sd, ymax=value+sd),
position = position_dodge(0.9), width=0.2 ,alpha=0.5, size=1)+
theme_bw()+
scale_y_continuous(limits = c(0,1), name = "value", sec.axis = sec_axis(~ 20*.- 15, name="value"))
Any advice on how to shift the bars, or on another solution?

Taking a different approach: instead of using a dual axis, one option is to make two separate plots and glue them together using patchwork. IMHO that is much easier than fiddling with rescaling the data (that's the step you missed: a secondary axis only relabels the axis, so you also have to rescale the data yourself), and it makes it clearer that the indices are measured on different scales:
library(ggplot2)
library(patchwork)
data$facet <- data$index %in% "PBIAS"
plot_fun <- function(.data) {
ggplot(.data, aes(x = index, y = value, fill = period)) +
geom_bar(position = "dodge", stat = "identity") +
geom_errorbar(aes(ymin = value - sd, ymax = value + sd),
position = position_dodge(0.9), width = 0.2, alpha = 0.5, size = 1
) +
theme_bw()
}
p1 <- subset(data, !facet) |> plot_fun() + scale_y_continuous(limits = c(0, 1))
p2 <- subset(data, facet) |> plot_fun() + scale_y_continuous(limits = c(-15, 15), position = "right")
p1 + p2 +
plot_layout(guides = "collect", widths = c(3, 1))
A second, similar option is to use ggh4x, whose ggh4x::facetted_pos_scales lets you set the limits for facet panels individually. One drawback: the panels have the same width. (I did not manage to make this approach work with facet_grid and space = "free".)
library(ggplot2)
library(ggh4x)
data$facet <- data$index %in% "PBIAS"
ggplot(data, aes(x = index, y = value, fill = period)) +
geom_bar(position = "dodge", stat = "identity") +
geom_errorbar(aes(ymin = value - sd, ymax = value + sd),
position = position_dodge(0.9), width = 0.2, alpha = 0.5, size = 1
) +
facet_wrap(~facet, scales = "free") +
facetted_pos_scales(
y = list(
facet ~ scale_y_continuous(limits = c(-15, 15), position = "right"),
!facet ~ scale_y_continuous(limits = c(0, 1), position = "left")
)
) +
theme_bw() +
theme(strip.text.x = element_blank())
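For completeness, the rescaling step mentioned at the top of this answer could be sketched as follows. This is only an illustration under assumptions: the helper names to_primary and to_secondary are made up for the example, and the right-hand range is taken as -15 to 15 (as in the plots above) so that the PBIAS error bars fit inside the 0 to 1 limits of the left axis.

```r
library(ggplot2)

# Same data as in the question
data <- data.frame(
  value = c(0.82, 0.87, 0.65, -3.39, 0.75, 0.82, 0.63, 1.14, 0.85, 0.87, 0.67, -7.03),
  sd = c(0.003, 0.047, 0.006, 4.8, 0.003, 0.028, 0.006, 4.77, 0.004, 0.057, 0.014, 4.85),
  index = factor(rep(c("NSE", "KGE", "VE", "PBIAS"), 3),
                 levels = c("NSE", "KGE", "VE", "PBIAS")),
  period = factor(rep(c("all", "calibration", "validation"), each = 4))
)

# Hypothetical helpers: map the assumed secondary range [-15, 15]
# onto the primary range [0, 1] and back
to_primary <- function(x) (x + 15) / 30
to_secondary <- function(y) 30 * y - 15

# Rescale only the PBIAS rows; the other indices are already on the primary scale
is_pbias <- data$index == "PBIAS"
data$y    <- ifelse(is_pbias, to_primary(data$value), data$value)
data$ymin <- ifelse(is_pbias, to_primary(data$value - data$sd), data$value - data$sd)
data$ymax <- ifelse(is_pbias, to_primary(data$value + data$sd), data$value + data$sd)

p <- ggplot(data, aes(x = index, y = y, fill = period)) +
  geom_col(position = "dodge") +
  geom_errorbar(aes(ymin = ymin, ymax = ymax),
                position = position_dodge(0.9), width = 0.2, alpha = 0.5) +
  # The secondary axis must apply the inverse of the data rescaling
  scale_y_continuous(limits = c(0, 1), name = "NSE / KGE / VE",
                     sec.axis = sec_axis(~ 30 * . - 15, name = "PBIAS")) +
  theme_bw()
p
```

Note that the bars still start at 0 on the left axis, which corresponds to -15 on the right axis, so the PBIAS bars do not start at PBIAS = 0. That is one more reason why the two-panel approaches above are easier to read.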

Related

Log10 Y-Axis starting from 0

I created a bar plot to show differences in water accumulation across different sites and layers. Because one value is way higher than the others, I want to put the y-axis on a log10 scale. It all works, but the result looks rather unintuitive. Is it possible to set the lower limit of the y-axis to 0 so that the bar with the value 0.2 does not point downwards?
Here is the code I used:
p2 <- ggplot(data_summary2, aes(x= Site, y= small_mean, fill= Depth, Color= Depth))+
geom_bar(stat = "identity", position = "dodge", alpha=1)+
geom_errorbar(aes(ymin= small_mean - sd, ymax= small_mean + sd),
position = position_dodge(0.9),width=0.25, alpha= 0.6)+
scale_fill_brewer(palette = "Greens")+
geom_text(aes(label=small_mean),position=position_dodge(width=0.9), vjust=-0.25, hjust= -0.1, size= 3)+
#geom_text(aes(label= Tukey), position= position_dodge(0.9), size=3, vjust=-0.8, hjust= -0.5, color= "gray25")+
theme_bw()+
theme(legend.position = c(0.2, 0.9),legend.direction = "horizontal")+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
labs(fill="Depth [cm]")+
theme(axis.text.x = element_text(angle = 25, size =9, vjust = 1, hjust=1))+
scale_x_discrete(labels= c( "Site 1\n(Hibiscus tillaceus)","Site 2 \n(Ceiba pentandra)","Site 3 \n(Clitoria fairchildiana)","Site 4 \n(Pachira aquatica)"))+
#theme(legend.position = c(0.85, 0.7))+
labs(x= "Sites\n(Type of Tree)", y= "µg Deep-Water/ g rhizosphere soil", title = "Average microbial Deep-water incorporation per Site", subtitle = "Changes over Time and Depth")+
facet_grid(.~Time, scale = "free")
p2 + scale_y_continuous(trans = "log10")
This is what the plot looks like:
On a log scale there is no 0, so the only sensible place for bars to start from is y = 10^0, i.e. 1.
However, you can create a pseudolog scale using scales::pseudo_log_trans to get 0 included on the axis, so that all the bars go in the same direction. I'm borrowing from this answer. NOTE: it's important to add 0 to the breaks to make it clear that this is a pseudolog scale. Compare the two plots below:
library(tidyverse)
library(scales)
# make up data with positive values above and below 1
d <- tibble(grp = LETTERS[1:5],
val = 10^(-2:2))
# normal plot with true log scale doesn't contain 0
d %>%
ggplot(aes(x = grp, y = val, fill = grp)) +
geom_col() +
ggtitle("On True Log Scale Bars Start at y = 1") +
scale_y_log10() # or if you prefer: scale_y_continuous(trans = "log10")
# set range of 'linear' portion of pseudolog scale
sigma <- min(d$val)
# plot on pseudolog to get all bars to extend to 0
d %>%
ggplot(aes(x = grp, y = val, fill = grp)) +
geom_col() +
ggtitle("On Pseudolog Scale Bars Start at y = 0") +
scale_y_continuous(
trans = pseudo_log_trans(base = 10, sigma = sigma),
breaks = c(0, 10^(-2:2)),
labels = label_number(accuracy = 0.01)
)
Created on 2021-12-30 by the reprex package (v2.0.1)
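To see why the pseudolog scale can include 0, it may help to look at the transformation object directly. The following is a small sketch, not part of the original answer; the input values in x are made up, and sigma is set to 0.01 to match min(d$val) above.

```r
library(scales)

sigma <- 0.01
tr <- pseudo_log_trans(base = 10, sigma = sigma)

# Made-up values, including 0
x <- c(0, 0.01, 0.1, 1, 10, 100)
y <- tr$transform(x)

# 0 gets a finite axis position, unlike log10(0), which is -Inf
y[1]

# Far above sigma, one decade spans roughly one unit, as on a log10 axis
decade <- diff(tr$transform(c(10, 100)))
decade

# The inverse maps axis positions back to the data values
back <- tr$inverse(y)
back
```

Close to 0 the transformation is approximately linear, which is why sigma is described above as the range of the 'linear' portion of the scale.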

Overlaying histogram with different y-scales

I'm struggling with the following issue:
I want to plot two histograms, but since one of the two classes has far fewer observations than the other, I need to add a second y-axis to allow a direct comparison of the values.
Below are the code I am using at the moment and the result.
Thank you in advance!
ggplot(data,aes(x= x ,group=class,fill=class)) + geom_histogram(position="identity",
alpha=0.5, bins = 20)+ theme_bw()
Consider the following situation where you have 800 versus 200 observations:
library(ggplot2)
df <- data.frame(
x = rnorm(1000, rep(c(1, 2), c(800, 200))),
class = rep(c("A", "B"), c(800, 200))
)
ggplot(df, aes(x, fill = class)) +
geom_histogram(bins = 20, position = "identity", alpha = 0.5,
# Note that y = stat(count) is the default behaviour
mapping = aes(y = stat(count)))
You could scale the counts for each group to a maximum of 1 by using y = stat(ncount):
ggplot(df, aes(x, fill = class)) +
geom_histogram(bins = 20, position = "identity", alpha = 0.5,
mapping = aes(y = stat(ncount)))
Alternatively, you can set y = stat(density) to have the total area integrate to 1.
ggplot(df, aes(x, fill = class)) +
geom_histogram(bins = 20, position = "identity", alpha = 0.5,
mapping = aes(y = stat(density)))
Note that from ggplot2 3.3.0 onwards, after_stat() replaces stat(), e.g. aes(y = after_stat(count)); stat() was deprecated in ggplot2 3.4.0.
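As a sketch of the ncount variant in the newer after_stat() notation (assuming ggplot2 3.3.0 or later; the seed is added here only for reproducibility), the built layer data can be inspected to confirm that each class peaks at 1:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  x = rnorm(1000, rep(c(1, 2), c(800, 200))),
  class = rep(c("A", "B"), c(800, 200))
)

p <- ggplot(df, aes(x, fill = class)) +
  geom_histogram(aes(y = after_stat(ncount)),
                 bins = 20, position = "identity", alpha = 0.5)

# Inspect the computed bar heights: the tallest bar of each class has height 1
built <- layer_data(p)
peaks <- tapply(built$y, built$group, max)
peaks
```

Because both classes are rescaled to the same maximum, their shapes can be compared directly even though the raw counts differ by a factor of four.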
How about comparing them side by side with facets?
ggplot(data,aes(x= x ,group=class,fill=class)) +
geom_histogram(position="identity",
alpha=0.5,
bins = 20) +
theme_bw() +
facet_wrap(~class, scales = "free_y")

Preventing wrong density plots when coloring histograms according to groups

Based on some dummy data, I created a histogram with a density plot:
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each=200)),
weight = c(rnorm(200, 55), rnorm(200, 58))
)
a <- ggplot(wdata, aes(x = weight))
a + geom_histogram(aes(y = ..density..,
# color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
# aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
The histogram of weight shall be colored corresponding to sex, so I use aes(y = ..density.., color = sex) for geom_histogram():
a + geom_histogram(aes(y = ..density..,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
# aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
As intended, the density plot stays the same (one curve for both groups), but the histogram bars jump up in scale (they seem to be treated individually now):
How do I prevent this from happening? I need individually colored histogram bars but a joint density plot for all coloring groups.
P.S.
Using aes(color = sex) for geom_density() gets everything back to original scales - but I don't want individual density plots (like below):
a + geom_histogram(aes(y = ..density..,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
EDIT:
As it has been suggested, dividing by the number of groups in geom_histogram()'s aesthetics with y = ..density../2 may approximate the solution. Nevertheless, this only works with symmetric distributions like in the first output below:
a + geom_histogram(aes(y = ..density../2,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
which yields
Less symmetric distributions, however, may cause trouble using this approach. See those below, where for 5 groups, y = ..density../5 was used. First original, then manipulation (with position = "stack"):
Since the distribution is heavy on the left, dividing by 5 underestimates on the left and overestimates on the right.
EDIT 2: SOLUTION
As suggested by Andrew, the below (complete) code solves the problem:
library(ggplot2)
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each = 200)),
weight = c(rnorm(200, 55), rnorm(200, 58))
)
binwidth <- 0.25
a <- ggplot(wdata,
aes(x = weight,
# Pass binwidth to aes() so it will be found in
# geom_histogram()'s aes() later
binwidth = binwidth))
# Basic plot w/o colouring according to 'sex'
a + geom_histogram(aes(y = ..density..),
binwidth = binwidth,
colour = "black",
fill = "white",
position = "stack") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
# Use fixed scale for sake of comparability
scale_x_continuous(limits = c(52, 61)) +
scale_y_continuous(limits = c(0, 0.25))
# Plot w/ colouring according to 'sex'
a + geom_histogram(aes(x = weight,
# binwidth will only be found if passed to
# ggplot()'s aes() (as above)
y = ..count.. / (sum(..count..) * binwidth),
color = sex),
binwidth = binwidth,
fill="white",
position = "stack") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
# Use fixed scale for sake of comparability
scale_x_continuous(limits = c(52, 61)) +
scale_y_continuous(limits = c(0, 0.25)) +
guides(color = "none")
Note:
binwidth = binwidth needed to be passed to ggplot()'s aes(), otherwise the pre-specified binwidth would not be found by geom_histogram()'s aes(). Further, position = "stack" is specified, so that both versions of the histogram are comparable. Plots for dummy data and the more complex distribution below:
Solved - Thanks for your help!
I don't think you can do it using y=..density.., but you can recreate the same thing like this...
binwidth <- 0.25 #easiest to set this manually so that you know what it is
a + geom_histogram(aes(y = ..count.. / (sum(..count..) * binwidth),
color = sex),
binwidth = binwidth,
fill="white",
position = "identity") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
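The difference between the two normalisations can be checked numerically without plotting. This is a rough base R sketch using the same dummy data; the breaks vector is an assumption chosen so that both groups share one set of bins of width binwidth.

```r
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)
binwidth <- 0.25

# Shared bins covering the whole weight range (assumed for this sketch)
breaks <- seq(floor(min(wdata$weight)), ceiling(max(wdata$weight)), by = binwidth)

# Bin counts per group; one column per level of sex
counts <- sapply(split(wdata$weight, wdata$sex),
                 function(w) hist(w, breaks = breaks, plot = FALSE)$counts)

# Per-group density: each column is divided by its own total
per_group <- prop.table(counts, margin = 2) / binwidth

# Joint density: every bar is divided by the overall total
joint <- counts / (sum(counts) * binwidth)

# Total bar area: each group integrates to 1 on its own, so 2 in total,
# whereas the joint version integrates to 1 like the overall density curve
area_per_group <- sum(per_group) * binwidth
area_joint <- sum(joint) * binwidth
c(per_group = area_per_group, joint = area_joint)
```

This is why the coloured bars appear too tall next to a single overall density curve when each group is normalised separately, and why dividing by the overall count brings them back in line.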

Combine/Overlay boxplot with histogram in R

I need to combine the boxplot with the histogram using ggplot2. So far I have this code.
library(dplyr)
library(ggplot2)
data(mtcars)
dat <- mtcars %>% dplyr::select(carb, wt) %>%
dplyr::group_by(carb) %>% dplyr::mutate(mean_wt = mean(wt), carb_count = n())
plot<-ggplot(data=mtcars, aes(x=carb, y=..count..)) +
geom_histogram(alpha=0.3, position="identity", lwd=0.2,binwidth=1)+
theme_bw()+
theme(panel.border = element_rect(colour = "black", fill=NA, size=0.7))+
geom_text(data=aggregate(mean_wt~carb+carb_count,dat,mean), aes(carb, carb_count+0.5, label=round(mean_wt,1)), color="black")
plot + geom_boxplot(data = mtcars,mapping = aes(x = carb, y = 6*wt,group=carb),
color="black", fill="red", alpha=0.2,width=0.1,outlier.shape = NA)+
scale_y_continuous(name = "Count",
sec.axis = sec_axis(~./6, name = "Weight"))
This results in
However, I don't want the secondary y-axis to be the same length as the primary y-axis. I want the secondary y-axis to be smaller and in the top right corner only. Let's say the secondary y-axis should span 20 to 30 on the primary y-axis, and the box plot should scale with that axis.
Can anyone help me with this?
Here's one approach, where I adjusted the secondary axis formula and tweaked the way it's labeled. (EDIT: adjusted to make boxplots bigger, per OP comment.)
plot + geom_boxplot(data = mtcars,
# Adj'd scaling so each 1 wt = 2.5 count
aes(x = carb, y = (wt*2.5)+10,group=carb),
color="black", fill="red", alpha=0.2,
width=0.5, outlier.shape = NA)+ # Wider width
scale_y_continuous(name = "Count", # Adj'd labels to limit left to 0, 5, 10
breaks = 5*0:5, labels = c(5*0:2, rep("", 3)),
# Adj'd scaling to match the wt scaling
sec.axis = sec_axis(~(.-10)/2.5, name = "Weight",
breaks = c(0:5))) +
theme(axis.title.y.left = element_text(hjust = 0.15, vjust = 1),
axis.title.y.right = element_text(hjust = 0.15, vjust = 1))
You might also consider an alternative using the patchwork package, coincidentally written by the same developer who implemented secondary scales in ggplot2...
# Alternative solution using patchwork
library(patchwork)
plot2 <- ggplot(data=mtcars, aes(x=carb, y=..count..)) +
theme_bw()+
theme(panel.border = element_rect(colour = "black", fill=NA, size=0.7))+
geom_boxplot(data = mtcars,
aes(x = carb, y = wt, group=carb),
color="black", fill="red", alpha=0.2,width=0.1,outlier.shape = NA) +
scale_y_continuous(name = "Weight") +
scale_x_continuous(labels = NULL, name = NULL,
expand = c(0, 0.85), breaks = c(2,4,6,8))
plot2 + plot + plot_layout(nrow = 2, heights = c(1,3)) +
labs(x=NULL)
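For the secondary-axis approach, the slope and offset (2.5 and 10 above) were picked by hand. If you would rather derive them from a target band on the primary axis, a helper along these lines might work. It is only a sketch: band_map is a made-up name, and the band of 20 to 30 is taken from the question.

```r
# Hypothetical helper: build a linear map that places range(x) onto `band`
# on the primary axis, plus the inverse needed by sec_axis()
band_map <- function(x, band) {
  rng <- range(x)
  slope <- diff(band) / diff(rng)
  intercept <- band[1] - slope * rng[1]
  list(
    forward = function(v) intercept + slope * v,  # data -> primary axis
    inverse = function(y) (y - intercept) / slope # primary axis -> data
  )
}

m <- band_map(mtcars$wt, c(20, 30))

# The weights now occupy exactly the 20 to 30 band of the count axis
band_range <- range(m$forward(mtcars$wt))
band_range
```

In the plot, m$forward(wt) would take the place of (wt*2.5)+10 in the boxplot's aes(), and sec_axis(~ m$inverse(.)) would take the place of ~(.-10)/2.5, so that the data rescaling and the axis labels stay consistent.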

Grouped scatterplot over grouped boxplot in R using ggplot2

I am creating a grouped boxplot with a scatterplot overlay using ggplot2. I would like to group each scatterplot datapoint with the grouped boxplot that it corresponds to.
However, I'd also like the scatterplot points to be different symbols. I seem to be able to get my scatterplot points to group with my grouped boxplots OR get my scatterplot points to be different symbols... but not both simultaneously. Below is some example code to illustrate what's happening:
library(scales)
library(ggplot2)
# Generates Data frame to plot
Gene <- c(rep("GeneA",24),rep("GeneB",24),rep("GeneC",24),rep("GeneD",24),rep("GeneE",24))
Clone <- c(rep(c("D1","D2","D3","D4","D5","D6"),20))
variable <- c(rep(c(rep("Day10",6),rep("Day20",6),rep("Day30",6),rep("Day40",6)),5))
value <- c(rnorm(24, mean = 0.5, sd = 0.5),rnorm(24, mean = 10, sd = 8),rnorm(24, mean = 1000, sd = 900),
rnorm(24, mean = 25000, sd = 9000), rnorm(24, mean = 8000, sd = 3000))
value <- sqrt(value*value)
Tdata <- cbind(Gene, Clone, variable)
Tdata <- data.frame(Tdata)
Tdata <- cbind(Tdata,value)
# Creates the Plot of All Data
# The below code groups the data exactly how I'd like but the scatter plot points are all the same shape
# and I'd like them to each have different shapes.
ln_clr <- "black"
bk_clr <- "white"
point_shapes <- c(0,15,1,16,2,17)
blue_cols <- c("#EFF2FB","#81BEF7","#0174DF","#0000FF","#0404B4")
lp1 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25,
size = 0.7, coef = 4) +
geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3,
alpha = 1, colour = ln_clr) +
geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7,
pch=15)
lp1 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
labels = trans_format("log10", math_format(10^.x)))
ggsave("Scatter Grouped-Wrong Symbols.png")
#*************************************************************************************************************************************
# The below code doesn't group the scatterplot data how I'd like but the points each have different shapes
lp2 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25,
size = 0.7, coef = 4) +
geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3,
alpha = 1, colour = ln_clr) +
geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7,
aes(shape=Clone))
lp2 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
labels = trans_format("log10", math_format(10^.x)))
ggsave("Scatter Ungrouped-Right Symbols.png")
If anyone has any suggestions I'd really appreciate it.
Thank you
Nathan
To get the boxplots to appear, the shape aesthetic needs to be inside geom_point, rather than in the main call to ggplot. The reason for this is that when the shape aesthetic is in the main ggplot call, it applies to all the geoms, including geom_boxplot. However, applying a shape=Clone aesthetic causes geom_boxplot to create a separate boxplot for each level of Clone. Since there's only one row of data for each combination of variable and Clone, no boxplot is produced.
That the shape aesthetic affects geom_boxplot seems counterintuitive to me, but maybe there's a reason for it that I'm not aware of. In any case, moving the shape aesthetic into geom_point solves the problem by applying the shape aesthetic only to geom_point.
Then, to get the points to appear with the correct boxplot, we need to group by Gene. I also added theme_classic to make it easier to see the plot (although it's still very busy):
ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', width=0.25, size=0.7, coef=4, position=position_dodge(0.85)) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, position=position_dodge(0.85)) +
geom_point(position=position_jitterdodge(dodge.width=0.85), size=1.8, alpha=0.7,
aes(shape=Clone, group=Gene)) +
scale_fill_manual(values=blue_cols) + labs(y="Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x))) +
theme_classic()
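The grouping behaviour described above can be checked by counting rows per group. ggplot2 groups a layer by the interaction of all discrete variables mapped in it, so the sketch below compares the group sizes with and without Clone. It rebuilds only the grouping columns of the example data; the values are not needed for the count.

```r
# Grouping columns of the example data (5 genes x 4 days x 6 clones)
Gene <- rep(paste0("Gene", LETTERS[1:5]), each = 24)
Clone <- rep(paste0("D", 1:6), 20)
variable <- rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5)
Tdata <- data.frame(Gene, Clone, variable)

# Rows per group when grouping by day and gene only
rows_gene <- table(interaction(Tdata$variable, Tdata$Gene))

# Rows per group once shape = Clone is also part of the grouping
rows_gene_clone <- table(interaction(Tdata$variable, Tdata$Gene, Tdata$Clone))

range(rows_gene)
range(rows_gene_clone)
```

With Clone included, every group contains a single observation, which explains why the boxplots and the dodging break down unless group = Gene is set explicitly for the points.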
I think the plot would be easier to understand if you use faceting for Gene and the x-axis for variable. Putting time on the x-axis seems more intuitive, while faceting frees up the color aesthetic for the points. With six different clones, it's still difficult (for me at least) to differentiate the point markers, but this looks cleaner to me than the previous version.
library(dplyr)
ggplot(Tdata %>% mutate(Gene=gsub("Gene","Gene ", Gene)),
aes(x=gsub("Day","",variable), y=value)) +
stat_boxplot(geom='errorbar', width=0.25, size=0.7, coef=4) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, width=0.5) +
geom_point(aes(fill=Clone), position=position_jitter(0.2), size=1.5, alpha=0.7, shape=21) +
theme_classic() +
facet_grid(. ~ Gene) +
labs(y = "Fold Change", x="Day") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x)))
If you really need to keep the points, maybe it would be better to separate the boxplots and points with some manual dodging:
set.seed(10)
ggplot(Tdata %>% mutate(Day=as.numeric(substr(variable,4,5)),
Gene = gsub("Gene","Gene ", Gene)),
aes(x=Day - 2, y=value, group=Day)) +
stat_boxplot(geom ='errorbar', width=0.5, size=0.5, coef=4) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, width=4) +
geom_point(aes(x=Day + 2, fill=Clone), size=1.5, alpha=0.7, shape=21,
position=position_jitter(width=1, height=0)) +
theme_classic() +
facet_grid(. ~ Gene) +
labs(y="Fold Change", x="Day") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x)))
One more thing: For future reference, you can simplify your data creation code:
Gene = rep(paste0("Gene",LETTERS[1:5]), each=24)
Clone = rep(paste0("D",1:6), 20)
variable = rep(rep(paste0("Day", seq(10,40,10)), each=6), 5)
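# abs() mirrors the sqrt(value*value) step in the question and keeps the values
# non-negative for the log scale
value = abs(rnorm(24*5, mean=rep(c(0.5,10,1000,25000,8000), each=24),
sd=rep(c(0.5,8,900,9000,3000), each=24)))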
Tdata = data.frame(Gene, Clone, variable, value)
