Log10 Y-Axis starting from 0 - r

I created a bar plot to show differences in water accumulation across sites and layers. Because one value is much higher than the others, I want to put the y-axis on a log10 scale. It all works, but the result looks rather unintuitive. Is it possible to set the lower limit of the y-axis to 0 so that the bar with the value 0.2 does not point downwards?
Here is the code I used:
p2 <- ggplot(data_summary2, aes(x= Site, y= small_mean, fill= Depth, Color= Depth))+
geom_bar(stat = "identity", position = "dodge", alpha=1)+
geom_errorbar(aes(ymin= small_mean - sd, ymax= small_mean + sd),
position = position_dodge(0.9),width=0.25, alpha= 0.6)+
scale_fill_brewer(palette = "Greens")+
geom_text(aes(label=small_mean),position=position_dodge(width=0.9), vjust=-0.25, hjust= -0.1, size= 3)+
#geom_text(aes(label= Tukey), position= position_dodge(0.9), size=3, vjust=-0.8, hjust= -0.5, color= "gray25")+
theme_bw()+
theme(legend.position = c(0.2, 0.9),legend.direction = "horizontal")+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
labs(fill="Depth [cm]")+
theme(axis.text.x = element_text(angle = 25, size =9, vjust = 1, hjust=1))+
scale_x_discrete(labels= c( "Site 1\n(Hibiscus tillaceus)","Site 2 \n(Ceiba pentandra)","Site 3 \n(Clitoria fairchildiana)","Site 4 \n(Pachira aquatica)"))+
#theme(legend.position = c(0.85, 0.7))+
labs(x= "Sites\n(Type of Tree)", y= "µg Deep-Water/ g rhizosphere soil", title = "Average microbial Deep-water incorporation per Site", subtitle = "Changes over Time and Depth")+
facet_grid(.~Time, scale = "free")
p2 + scale_y_continuous(trans = "log10")
This is what the plot looks like:

On a log scale there is no 0, so the only sensible place for bars to start from is y = 10^0 = 1.
However, you can create a pseudolog scale using scales::pseudo_log_trans to get 0 included on the axis, so that all the bars go in the same direction. I'm borrowing from this answer. NOTE: it's important to add 0 to the breaks to make it clear that this is a pseudolog scale. Compare the two plots below:
library(tidyverse)
library(scales)
# make up data with positive values above and below 1
d <- tibble(grp = LETTERS[1:5],
val = 10^(-2:2))
# normal plot with true log scale doesn't contain 0
d %>%
ggplot(aes(x = grp, y = val, fill = grp)) +
geom_col() +
ggtitle("On True Log Scale Bars Start at y = 1") +
scale_y_log10() # or if you prefer: scale_y_continuous(trans = "log10")
# set range of 'linear' portion of pseudolog scale
sigma <- min(d$val)
# plot on pseudolog to get all bars to extend to 0
d %>%
ggplot(aes(x = grp, y = val, fill = grp)) +
geom_col() +
ggtitle("On Pseudolog Scale Bars Start at y = 0") +
scale_y_continuous(
trans = pseudo_log_trans(base = 10, sigma = sigma),
breaks = c(0, 10^(-2:2)),
labels = label_number(accuracy = 0.01)
)
Created on 2021-12-30 by the reprex package (v2.0.1)
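To see why the pseudolog scale can include 0, here is a minimal base-R sketch of the formula that, as far as I know, scales::pseudo_log_trans applies: asinh(x / (2 * sigma)) / log(base). The helper name pseudo_log10 is hypothetical and only used for illustration.

```r
# Hypothetical helper mirroring the pseudo-log formula (assumption: base 10)
pseudo_log10 <- function(x, sigma = 1) {
  asinh(x / (2 * sigma)) / log(10)
}

sigma <- 0.01                         # same value as min(d$val) in the example above
x <- c(0, 0.01, 0.1, 1, 10, 100)

# 0 maps to 0, so every bar has a finite baseline to start from
pseudo_log10(0, sigma)

# For x much larger than sigma the result approaches log10(x / sigma),
# i.e. a log10 scale shifted by a constant; near 0 it is roughly linear
cbind(x,
      pseudo = pseudo_log10(x, sigma),
      log10_shifted = log10(x / sigma))
```

A smaller sigma keeps the scale log-like closer to 0, at the cost of squeezing the linear part near the baseline.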

Related

how to set dual Y axis in geom_bar plot in ggplot2?

I'd like to draw a bar plot like this, but with a dual y-axis:
(https://i.stack.imgur.com/ldMx0.jpg)
The first three indices range from 0 to 1,
so I want the left y-axis (corresponding to NSE, KGE, VE) to range from 0 to 1,
and the right y-axis (corresponding to PBIAS) to range from -15 to 5.
The following is my data and code:
library("ggplot2")
## data
data <- data.frame(
value=c(0.82,0.87,0.65,-3.39,0.75,0.82,0.63,1.14,0.85,0.87,0.67,-7.03),
sd=c(0.003,0.047,0.006,4.8,0.003,0.028,0.006,4.77,0.004,0.057,0.014,4.85),
index=c("NSE","KGE","VE","PBIAS","NSE","KGE","VE","PBIAS","NSE","KGE","VE","PBIAS"),
period=c("all","all","all","all","calibration","calibration","calibration","calibration","validation","validation","validation","validation")
)
## fix index sequence
data$index <- factor(data$index, levels = c('NSE','KGE','VE',"PBIAS"))
data$period <- factor(data$period, levels = c('all','calibration', 'validation'))
## bar plot
ggplot(data, aes(x=index, y=value, fill=period))+
geom_bar(position="dodge", stat="identity")+
geom_errorbar(aes(ymin=value-sd, ymax=value+sd),
position = position_dodge(0.9), width=0.2 ,alpha=0.5, size=1)+
theme_bw()
I tried to scale and shift the second y-axis,
but the PBIAS bars were removed because they fall outside the scale limits, as follows:
(https://i.stack.imgur.com/n6Jfm.jpg)
The following is my code with the dual y-axis:
## bar plot (scale and shift the second y-axis with slope 20 and intercept -15)
ggplot(data, aes(x=index, y=value, fill=period))+
geom_bar(position="dodge", stat="identity")+
geom_errorbar(aes(ymin=value-sd, ymax=value+sd),
position = position_dodge(0.9), width=0.2 ,alpha=0.5, size=1)+
theme_bw()+
scale_y_continuous(limits = c(0,1), name = "value", sec.axis = sec_axis(~ 20*.- 15, name="value"))
Any advice on how to move the PBIAS bars, or any other solution?
Taking a different approach: instead of using a dual axis, one option would be to make two separate plots and glue them together using patchwork. IMHO that is much easier than fiddling around with rescaling the data (that's the step you missed, i.e. if you want to have a secondary axis you also have to rescale the data), and it makes it clearer that the indices are measured on different scales:
library(ggplot2)
library(patchwork)
data$facet <- data$index %in% "PBIAS"
plot_fun <- function(.data) {
ggplot(.data, aes(x = index, y = value, fill = period)) +
geom_bar(position = "dodge", stat = "identity") +
geom_errorbar(aes(ymin = value - sd, ymax = value + sd),
position = position_dodge(0.9), width = 0.2, alpha = 0.5, size = 1
) +
theme_bw()
}
p1 <- subset(data, !facet) |> plot_fun() + scale_y_continuous(limits = c(0, 1))
p2 <- subset(data, facet) |> plot_fun() + scale_y_continuous(limits = c(-15, 15), position = "right")
p1 + p2 +
plot_layout(guides = "collect", widths = c(3, 1))
A second, similar option would be to use ggh4x, which via ggh4x::facetted_pos_scales allows setting the limits for facet panels individually. One drawback: the panels have the same width. (I failed to make this approach work with facet_grid and space = "free".)
library(ggplot2)
library(ggh4x)
data$facet <- data$index %in% "PBIAS"
ggplot(data, aes(x = index, y = value, fill = period)) +
geom_bar(position = "dodge", stat = "identity") +
geom_errorbar(aes(ymin = value - sd, ymax = value + sd),
position = position_dodge(0.9), width = 0.2, alpha = 0.5, size = 1
) +
facet_wrap(~facet, scales = "free") +
facetted_pos_scales(
y = list(
facet ~ scale_y_continuous(limits = c(-15, 15), position = "right"),
!facet ~ scale_y_continuous(limits = c(0, 1), position = "left")
)
) +
theme_bw() +
theme(strip.text.x = element_blank())
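For reference, the rescaling step that a secondary axis would require can be sketched in base R. This is only an illustration of the arithmetic, assuming the slope 20 and intercept -15 from the question; the helper names to_left and to_right are hypothetical.

```r
# PBIAS rows from the question's data, recreated so the sketch is self-contained
pbias <- data.frame(
  period = c("all", "calibration", "validation"),
  value  = c(-3.39, 1.14, -7.03),
  sd     = c(4.8, 4.77, 4.85)
)

# The question's secondary axis is right = 20 * left - 15,
# so the data must be mapped the other way: left = (right + 15) / 20
to_left  <- function(right) (right + 15) / 20
to_right <- function(left) 20 * left - 15

pbias$value_left <- to_left(pbias$value)
pbias$ymin_left  <- to_left(pbias$value - pbias$sd)
pbias$ymax_left  <- to_left(pbias$value + pbias$sd)
pbias
```

With this mapping the PBIAS values land inside the 0 to 1 range of the left axis, but the upper error bar for the calibration period ends slightly above 1, so limits of c(0, 1) would still drop it. The bars would also start at 0 on the left axis, which corresponds to -15 rather than 0 on the PBIAS axis, which is one more reason separate panels may be easier to read.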

Raincloud plot - histogram?

I would like to create a raincloud plot, and I have successfully done so. But I would like to know whether I can use a histogram instead of the density curve (it suits my dataset better).
This is my code, in case it is useful:
ATSC <- ggplot(data = data, aes(y = atsc, x = numlecteur, fill = numlecteur)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .5) +
geom_point(aes(y = atsc, color = numlecteur), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
geom_point(data = sumld, aes(x = numlecteur, y = mean), position = position_nudge(x = 0.25), size = 2.5) +
geom_errorbar(data = sumld, aes(ymin = lower, ymax = upper, y = mean), position = position_nudge(x = 0.25), width = 0) +
guides(fill = FALSE) +
guides(color = FALSE) +
scale_color_brewer(palette = "Spectral") +
scale_y_continuous(breaks=c(0,2,4,6,8,10), labels=c("0","2","4","6","8","10"))+
scale_fill_brewer(palette = "Spectral") +
coord_flip() +
theme_bw() +
expand_limits(y=c(0, 10))+
xlab("Lecteur") + ylab("Age total sans check")+
raincloud_theme
I thought I could use geom_histogram(), but it doesn't work.
Thank you in advance for your help!
(sources : https://peerj.com/preprints/27137v1.pdf
https://neuroconscience.wordpress.com/2018/03/15/introducing-raincloud-plots/)
This is actually not that easy. There are a few challenges.
geom_histogram is "horizontal by nature", whereas the custom geom_flat_violin is vertical, as are boxplots. Hence the final call to coord_flip in that tutorial. In order to combine both, I think it is best to switch x and y, forget about coord_flip, and use ggstance::geom_boxploth instead.
Creating separate histograms for each category is another challenge. My workaround is to create facets and "merge them together".
The histograms are scaled much larger than the width of the points/boxplots. My workaround is to scale them via the after_stat function.
The last challenge is how to nudge the histograms to the right position above the boxplot and points. I am converting the discrete scale to a continuous one by mapping a constant numeric value to the global y aesthetic, and then using the facet labels as discrete labels.
library(tidyverse)
my_data<-read.csv("https://data.bris.ac.uk/datasets/112g2vkxomjoo1l26vjmvnlexj/2016.08.14_AnxietyPaper_Data%20Sheet.csv")
my_datal <-
my_data %>%
pivot_longer(cols = c("AngerUH", "DisgustUH", "FearUH", "HappyUH"), names_to = "EmotionCondition", values_to = "Sensitivity")
# use y = -... to position boxplot and jitterplot below the histogram
ggplot(data = my_datal, aes(x = Sensitivity, y = -.5, fill = EmotionCondition)) +
# after_stat for scaling
geom_histogram(aes(y = after_stat(count/100)), binwidth = .05, alpha = .8) +
# from ggstance
ggstance::geom_boxploth( width = .1, outlier.shape = NA, alpha = 0.5) +
geom_point(aes(color = EmotionCondition), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
# merged those calls to one
guides(fill = "none", color = "none") +
# scale_y_continuous(breaks = 1, labels = unique(my_datal$EmotionCondition))
scale_color_brewer(palette = "Spectral") +
scale_fill_brewer(palette = "Spectral") +
# facetting, because each histogram needs its own y
# strip position = left to fake discrete labels in continuous scale
facet_wrap(~EmotionCondition, nrow = 4, scales = "free_y" , strip.position = "left") +
# remove all continuous labels from the y axis
theme(axis.title.y = element_blank(), axis.text.y = element_blank(),
axis.ticks.y = element_blank())
Created on 2021-04-15 by the reprex package (v1.0.0)
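The scaling step can be looked at in isolation. The following base-R sketch uses made-up data (not the dataset above) and contrasts the fixed divisor used in after_stat(count/100) with a scaling relative to the tallest bin; the variable target_height is a hypothetical choice for illustration.

```r
set.seed(1)
x <- rnorm(200, mean = 0.5, sd = 0.15)   # made-up stand-in for one Sensitivity group

# Bin with the same width as in the answer (0.05); the range is chosen wide
# enough to cover all simulated values
h <- hist(x, breaks = seq(-1, 2, by = 0.05), plot = FALSE)
counts <- h$counts

# Fixed divisor, as in after_stat(count / 100): bar heights depend on sample size
fixed_scale <- counts / 100

# Alternative: scale relative to the tallest bin so it always reaches target_height
target_height <- 0.4                     # hypothetical height in y-axis units
relative_scale <- counts / max(counts) * target_height

range(fixed_scale)
range(relative_scale)
```

The fixed divisor keeps bar heights comparable between facets, while the relative version trades that comparability for a predictable maximum height.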

Scale density curve made with geom_density to similar height of geom_histogram?

I need to align the density line with the height of geom_histogram and keep count values on the y axis instead of density.
I have these 2 versions:
# Creating dataframe
library(ggplot2)
values <- c(rep(0,2), rep(2,3), rep(3,3), rep(4,3), 5, rep(6,2), 8, 9, rep(11,2))
data_to_plot <- as.data.frame(values)
# Option 1 ( y scale shows frequency, but geom_density line and geom_histogram are not matching )
ggplot(data_to_plot, aes(x = values)) +
geom_histogram(aes(y = ..count..), binwidth = 1, colour= "black", fill = "white") +
geom_density(aes(y=..count..), fill="blue", alpha = .2)+
scale_x_continuous(breaks = seq(0, max(data_to_plot$values), 1))
y scale shows frequency, but geom_density line and geom_histogram are not matching
# Option 2 (geom_density line and geom_histogram are matching, but y scale density = 1)
ggplot(data_to_plot, aes(x = values)) +
geom_histogram(aes(y = after_stat(ndensity)), binwidth = 1, colour= "black", fill = "white") +
geom_density(aes(y = after_stat(ndensity)), fill="blue", alpha = .2)+
scale_x_continuous(breaks = seq(0, max(data_to_plot$values), 1))
geom_density line and geom_histogram are matching, but y scale density = 1
What I need is the plot from Option 2, but with the y scale from Option 1. I can get it by using aes(y = 1.25*..count..) for this particular data, but my data is not static and this will not work for another dataset (just modify values to test):
# Option 3 (with coefficient in aes())
ggplot(data_to_plot, aes(x = values)) +
geom_histogram(aes(y = ..count..), binwidth = 1, colour= "black", fill = "white") +
geom_density(aes(y=1.25*..count..), fill="blue", alpha = .2)+
scale_x_continuous(breaks = seq(0, max(data_to_plot$values), 1))
Desired result: y scale shows frequency and geom_density line is matching with geom_histogram height
I cannot hardcode coefficient or bins.
This problem is close to the ones discussed here, but it did not work for my case:
Programatically scale density curve made with geom_density to similar height to geom_histogram?
How to put geom_density and geom_histogram on same counts scale
A density curve is normalized so that the area under it is 1 (with a bin width of 1, as here, its values stay below 1), whereas counts are multiples of 1. So it mostly does not make sense to plot both on the same y-axis.
The left plot shows the density line and histogram for data similar to yours; I just added some values. The height of each bar shows the proportion of counts for the corresponding x-value. The y-scale stays below 1.
The right plot shows the same as the left, but with another histogram added that shows the counts. The y-scale goes up and the two density plots shrink.
If you want to bring both onto the same scale, you could do this by calculating a scaling factor. I have used this scaling factor to add a secondary y-axis to the third plot, scaling the secondary y-axis accordingly.
To make clear what belongs to which scale, I have colored the second y-axis and the data belonging to it red.
library(ggplot2)
library(patchwork)
values <- c(rep(0,2),rep(1,4), rep(2,6), rep(3,8), rep(4,12), rep(5,7), rep(6,4),rep(7,2))
df <- as.data.frame(values)
p1 <- ggplot(df, aes(x = values)) +
stat_density(geom = 'line') +
geom_histogram(aes(y = ..density..), binwidth = 1,color = 'white', fill = 'red', alpha = 0.2)
p2 <- ggplot(df, aes(x = values)) +
stat_density(geom = 'line') +
geom_histogram(aes(y = ..count..), binwidth = 1, color = 'white', alpha = 0.2) +
geom_histogram(aes(y = ..density..), binwidth = 1, color = 'white', alpha = 0.2) +
ylab('density and counts')
# Largest bin proportion; equals the maximum of ..density.. because binwidth = 1
m <- max(table(df$values)/sum(table(df$values)))
# Largest bin count
mm <- max(table(df$values))
# Create Scaling factor for secondary axis
scaleF <- m/mm
p3 <- p1 + scale_y_continuous(
limits = c(0, m),
# Features of the first axis
name = "density",
# Add a second axis and specify its features
sec.axis = sec_axis( trans=~(./scaleF), name = 'counts')
) +
theme(axis.ticks.y.right = element_line(color = "red"),
axis.line.y.right = element_line(color = 'red'),
axis.text.y.right = element_text(color = 'red'),
axis.title.y.right = element_text(color = 'red')) +
annotate("segment", x = 5, xend = 7,
y = 0.25, yend = .25, colour = "pink", size=3, alpha=0.6, arrow=arrow())
p1 | p2 | p3
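The scaling factor used above can be checked without plotting. This base-R sketch, which assumes the same values and a bin width of 1, shows that histogram density and counts differ by the constant factor n * binwidth, and that scaleF reduces to 1 / n here.

```r
values <- c(rep(0, 2), rep(1, 4), rep(2, 6), rep(3, 8),
            rep(4, 12), rep(5, 7), rep(6, 4), rep(7, 2))
n <- length(values)
binwidth <- 1

# Bins centred on the integer values, like geom_histogram(binwidth = 1)
h <- hist(values, breaks = seq(-0.5, 7.5, by = binwidth), plot = FALSE)

# Density and count differ only by the constant factor n * binwidth
count_from_density <- h$density * n * binwidth

# The scaling factor from the answer, recomputed
m <- max(table(values) / sum(table(values)))
mm <- max(table(values))
scaleF <- m / mm

count_from_density
c(scaleF = scaleF, one_over_n = 1 / n)
```

Because the factor depends only on the sample size and the bin width, it does not need to be hardcoded per dataset. A smoothed kernel density multiplied by the same factor is on the count scale too, although it will generally not reach the top of the tallest bar.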

Preventing wrong density plots when coloring histograms according to groups

Based on some dummy data, I created a histogram with a density plot:
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each=200)),
weight = c(rnorm(200, 55), rnorm(200, 58))
)
a <- ggplot(wdata, aes(x = weight))
a + geom_histogram(aes(y = ..density..,
# color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
# aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
The histogram of weight shall be colored corresponding to sex, so I use aes(y = ..density.., color = sex) for geom_histogram():
a + geom_histogram(aes(y = ..density..,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
# aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
As I want it to, the density plot stays the same (overall for both groups), but the histograms jump up in scale (and seem to be treated individually now):
How do I prevent this from happening? I need individually colored histogram bars but a joint density plot for all coloring groups.
P.S.
Using aes(color = sex) for geom_density() gets everything back to original scales - but I don't want individual density plots (like below):
a + geom_histogram(aes(y = ..density..,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
EDIT:
As it has been suggested, dividing by the number of groups in geom_histogram()'s aesthetics with y = ..density../2 may approximate the solution. Nevertheless, this only works with symmetric distributions like in the first output below:
a + geom_histogram(aes(y = ..density../2,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
which yields
Less symmetric distributions, however, may cause trouble using this approach. See those below, where for 5 groups, y = ..density../5 was used. First original, then manipulation (with position = "stack"):
Since the distribution is heavy on the left, dividing by 5 underestimates on the left and overestimates on the right.
EDIT 2: SOLUTION
As suggested by Andrew, the below (complete) code solves the problem:
library(ggplot2)
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each = 200)),
weight = c(rnorm(200, 55), rnorm(200, 58))
)
binwidth <- 0.25
a <- ggplot(wdata,
aes(x = weight,
# Pass binwidth to aes() so it will be found in
# geom_histogram()'s aes() later
binwidth = binwidth))
# Basic plot w/o colouring according to 'sex'
a + geom_histogram(aes(y = ..density..),
binwidth = binwidth,
colour = "black",
fill = "white",
position = "stack") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
# Use fixed scale for sake of comparability
scale_x_continuous(limits = c(52, 61)) +
scale_y_continuous(limits = c(0, 0.25))
# Plot w/ colouring according to 'sex'
a + geom_histogram(aes(x = weight,
# binwidth will only be found if passed to
# ggplot()'s aes() (as above)
y = ..count.. / (sum(..count..) * binwidth),
color = sex),
binwidth = binwidth,
fill="white",
position = "stack") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
# Use fixed scale for sake of comparability
scale_x_continuous(limits = c(52, 61)) +
scale_y_continuous(limits = c(0, 0.25)) +
guides(color = "none")
Note:
binwidth = binwidth needed to be passed to ggplot()'s aes(), otherwise the pre-specified binwidth would not be found by geom_histogram()'s aes(). Further, position = "stack" is specified, so that both versions of the histogram are comparable. Plots for dummy data and the more complex distribution below:
Solved - Thanks for your help!
I don't think you can do it using y=..density.., but you can recreate the same thing like this...
binwidth <- 0.25 #easiest to set this manually so that you know what it is
a + geom_histogram(aes(y = ..count.. / (sum(..count..) * binwidth),
color = sex),
binwidth = binwidth,
fill="white",
position = "identity") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
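The arithmetic behind count / (sum(count) * binwidth) can be verified in base R. This is a sketch under the assumption that sum(count) runs over all groups, so each group's bars are normalised by the total sample size rather than by the group size.

```r
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)
binwidth <- 0.25

# Shared breaks so that the groups and the pooled data use identical bins
breaks <- seq(floor(min(wdata$weight)), ceiling(max(wdata$weight)), by = binwidth)

overall <- hist(wdata$weight, breaks = breaks, plot = FALSE)
by_sex <- lapply(split(wdata$weight, wdata$sex),
                 function(w) hist(w, breaks = breaks, plot = FALSE)$counts)

# Normalise each group's counts by the TOTAL n, not the group n
n_total <- nrow(wdata)
dens_by_sex <- lapply(by_sex, function(cnt) cnt / (n_total * binwidth))

# Stacking the group bars reproduces the pooled density histogram
stacked <- dens_by_sex$F + dens_by_sex$M
all.equal(stacked, overall$density)
```

Dividing by the group size instead would make each group integrate to 1 on its own, which is the jump in scale described in the question.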

Combine/Overlay boxplot with histogram in R

I need to combine the boxplot with the histogram using ggplot2. So far I have this code.
library(dplyr)
library(ggplot2)
data(mtcars)
dat <- mtcars %>% dplyr::select(carb, wt) %>%
dplyr::group_by(carb) %>% dplyr::mutate(mean_wt = mean(wt), carb_count = n())
plot<-ggplot(data=mtcars, aes(x=carb, y=..count..)) +
geom_histogram(alpha=0.3, position="identity", lwd=0.2,binwidth=1)+
theme_bw()+
theme(panel.border = element_rect(colour = "black", fill=NA, size=0.7))+
geom_text(data=aggregate(mean_wt~carb+carb_count,dat,mean), aes(carb, carb_count+0.5, label=round(mean_wt,1)), color="black")
plot + geom_boxplot(data = mtcars,mapping = aes(x = carb, y = 6*wt,group=carb),
color="black", fill="red", alpha=0.2,width=0.1,outlier.shape = NA)+
scale_y_continuous(name = "Count",
sec.axis = sec_axis(~./6, name = "Weight"))
This results in
However, I don't want the secondary y-axis to be the same length as the primary y-axis. I want the secondary y-axis to be smaller and in the top right corner only. Let's say the secondary y-axis should span 20-30 of the primary y-axis, and the boxplot should scale with that axis.
Can anyone help me with this?
Here's one approach, where I adjusted the secondary axis formula and tweaked the way it's labeled. (EDIT: adjusted to make boxplots bigger, per OP comment.)
plot + geom_boxplot(data = mtcars,
# Adj'd scaling so each 1 wt = 2.5 count
aes(x = carb, y = (wt*2.5)+10,group=carb),
color="black", fill="red", alpha=0.2,
width=0.5, outlier.shape = NA)+ # Wider width
scale_y_continuous(name = "Count", # Adj'd labels to limit left to 0, 5, 10
breaks = 5*0:5, labels = c(5*0:2, rep("", 3)),
# Adj'd scaling to match the wt scaling
sec.axis = sec_axis(~(.-10)/2.5, name = "Weight",
breaks = c(0:5))) +
theme(axis.title.y.left = element_text(hjust = 0.15, vjust = 1),
axis.title.y.right = element_text(hjust = 0.15, vjust = 1))
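The two formulas in the code above have to be exact inverses of each other: the data is mapped onto the count axis with wt * 2.5 + 10, and the secondary axis maps labels back with (. - 10) / 2.5. A small base-R check, with hypothetical helper names, makes this explicit.

```r
# Forward mapping applied to the data (weight -> position on the count axis)
to_count_axis <- function(wt) wt * 2.5 + 10

# Inverse mapping used by sec_axis (position on the count axis -> weight label)
to_weight_axis <- function(count) (count - 10) / 2.5

wt_range <- range(mtcars$wt)

# Where the boxplots end up on the primary (count) axis
to_count_axis(wt_range)

# Round trip: mapping forward and back recovers the original weights
to_weight_axis(to_count_axis(wt_range))

# Weight labels shown opposite the primary breaks 10, 15, 20, 25
to_weight_axis(c(10, 15, 20, 25))
```

Changing the multiplier stretches or shrinks the boxplots, and changing the offset moves them up or down, as long as both formulas are adjusted together.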
You might also consider an alternative using the patchwork package, coincidentally written by the same developer who implemented secondary scales in ggplot2...
# Alternative solution using patchwork
library(patchwork)
plot2 <- ggplot(data=mtcars, aes(x=carb, y=..count..)) +
theme_bw()+
theme(panel.border = element_rect(colour = "black", fill=NA, size=0.7))+
geom_boxplot(data = mtcars,
aes(x = carb, y = wt, group=carb),
color="black", fill="red", alpha=0.2,width=0.1,outlier.shape = NA) +
scale_y_continuous(name = "Weight") +
scale_x_continuous(labels = NULL, name = NULL,
expand = c(0, 0.85), breaks = c(2,4,6,8))
plot2 + plot + plot_layout(nrow = 2, heights = c(1,3)) +
labs(x=NULL)

Resources