stack bars with stat_binline() - r

I have data similar to the example below.
I want to visualise the spread of the outcome variable (value) for each group (name). The fill aesthetic marks the interval of interest; the example below uses the interquartile range.
I would expect the position="identity" to stack the bars on top of each other for the fill aesthetic (as it does for geom_bar). This is the behaviour that I want.
When I try position="stack", it's a mess.
I have looked at the stat_binline examples and the ggridges vignette, but neither has an example where the position is modified to stack the ridges (binned or not).
library(dplyr)
library(ggplot2)
library(ggridges)
set.seed(123)
size <- 1000
data.frame(
name=sample(LETTERS[1:5], size=size, replace=T),
value=c(sample(1:20, size=size*0.8, replace=T), rep(15, size*0.2))
) %>%
group_by(name) %>%
arrange(value) %>%
mutate(percentile=row_number()/n()) %>%
ungroup() %>%
mutate(in_interval=percentile > 0.25 & percentile < 0.75)%>%
ggplot(aes(x = value, y = name, height = stat(count), fill=in_interval)) +
stat_binline(position = "identity", alpha=0.3, bins=20, scale=0.9) +
coord_flip()
The overlap that I want to avoid is shown here. I want these bars to be stacked instead.
Thank you!

I reviewed the ggridges docs - https://wilkelab.org/ggridges/reference/stat_binline.html
The ggplot position page - https://ggplot2.tidyverse.org/reference/position_stack.html
And a few of the great [ggridges] tagged answers on SO -
https://stackoverflow.com/a/58557352/10276092
Add color gradient to ridgelines according to height
And all I've produced is a non-ggridges answer:
df %>%
ggplot(aes(x=value, fill=in_interval)) +
geom_histogram(bins=20) +
facet_grid(cols=vars(name)) +
coord_flip()
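The snippet above assumes a data frame called df. A minimal sketch of how it could be built from the question's pipeline is below; the names df and p are my own choices, and the plot relies on geom_histogram stacking the fill groups by default.

```r
library(dplyr)
library(ggplot2)

set.seed(123)
size <- 1000

# Same pipeline as in the question, stored as `df` (name assumed here)
df <- data.frame(
  name  = sample(LETTERS[1:5], size = size, replace = TRUE),
  value = c(sample(1:20, size = size * 0.8, replace = TRUE), rep(15, size * 0.2))
) %>%
  group_by(name) %>%
  arrange(value) %>%
  mutate(percentile = row_number() / n()) %>%  # rank within each name
  ungroup() %>%
  mutate(in_interval = percentile > 0.25 & percentile < 0.75)

# geom_histogram uses position = "stack" by default, so the two
# in_interval groups sit on top of each other instead of overlapping
p <- ggplot(df, aes(x = value, fill = in_interval)) +
  geom_histogram(bins = 20) +
  facet_grid(cols = vars(name)) +
  coord_flip()
```

Printing p draws one flipped histogram per name, with the interquartile part of each bar in a different fill.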

Related

Add a gradient fill to geom_col

Here is some basic code for a column plot:
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarise(
count = n()
) %>%
ggplot(
aes(
x = cut,
y = count,
fill = count
)
) +
geom_col() +
scale_fill_viridis_c(
option = "plasma"
)
I could not find any examples of what I would like to do, so I will try to explain it as best I can. I have applied a colour gradient to the fill aesthetic, which colours each whole column a single colour. Is it possible to have each column of the plot contain the full colour spectrum up to its respective value?
By which I mean the "Ideal" column of my plot would look exactly like the key in the legend. Then the "Premium" column would look like the key in the legend but cut off ~2/3 of the way up.
Thanks
You can do this fairly easily with a bit of data manipulation. Give each row within a group of your original data frame a sequential number that you can map to the fill scale, and add another column holding the value 1. Then you just plot using position_stack, so every row becomes a unit-height slice of its bar.
library(ggplot2)
library(dplyr)
diamonds %>%
group_by(cut) %>%
mutate(fill_col = seq_along(cut), height = 1) %>%
ggplot(aes(x = cut, y = height, fill = fill_col)) +
geom_col(position = position_stack()) +
scale_fill_viridis_c(option = "plasma")
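To see why this works, here is a small check on the reshaped data (the names stacked and totals are just for illustration). Each row contributes a slice of height 1, and slice k in a group carries fill value k, so a bar with n rows runs through fill values 1 to n.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# One unit-height slice per row, numbered within each cut
stacked <- diamonds %>%
  group_by(cut) %>%
  mutate(fill_col = seq_along(cut), height = 1) %>%
  ungroup()

# The stacked bar height and the largest fill value should both
# equal the row count of the group
totals <- stacked %>%
  group_by(cut) %>%
  summarise(bar_height = sum(height), top_fill = max(fill_col))

print(totals)
```

Note that this draws one rectangle per row of the data, so it can be slow for very large data frames.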

Color/fill bars in geom_col based on another variable?

I have an uncolored geom_col and would like it to display information about another (continuous) variable by displaying different shades of color in the bars.
Example
Starting with a geom_col
library(dplyr)
library(ggplot2)
set.seed(124)
iris[sample(1:150, 50), ] %>%
group_by(Species) %>%
summarise(n=n()) %>%
ggplot(aes(Species, n)) +
geom_col()
Suppose we want to color the bars according to how low/high mean(Sepal.Width) in each grouping
(note: I don't know if there's a way to provide 'continuous' colors to a ggplot, but, if not, the following colors would be fine to use)
library(RColorBrewer)
display.brewer.pal(n = 3, name= "PuBu")
brewer.pal(n = 3, name = "PuBu")
[1] "#ECE7F2" "#A6BDDB" "#2B8CBE"
The end result should be the same geom_col as above but with the bars colored according to how low/high mean(Sepal.Width) is.
Notes
This answer shows something similar but is highly manual, and is okay for 3 bars, but not sustainable for many plots with a high number of bars (since would require too many case_when conditions to be manually set)
This is similar but the coloring is based on a variable already displayed in the plot, rather than another variable
Note also, in the example I provide above, there are 3 bars and I provide 3 colors, this is somewhat manual and if there's a better (i.e. less manual) way to designate colors would be glad to learn it
What I've tried
I thought this would work, but it seems to ignore the colors I provide
library(RColorBrewer)
# fill info from: https://stackoverflow.com/questions/38788357/change-bar-plot-colour-in-geom-bar-with-ggplot2-in-r
set.seed(124)
iris[sample(1:150, 50), ] %>%
group_by(Species) %>%
summarise(n=n(), sep_mean = mean(Sepal.Width)) %>%
arrange(desc(n)) %>%
mutate(colors = brewer.pal(n = 3, name = "PuBu")) %>%
mutate(Species=factor(Species, levels=Species)) %>%
ggplot(aes(Species, n, fill = colors)) +
geom_col()
Do the following
add fill = sep_mean to aes()
add + scale_fill_gradient()
remove mutate(colors = brewer.pal(n = 3, name = "PuBu")) since the previous step takes care of colors for you
set.seed(124)
iris[sample(1:150, 50), ] %>%
group_by(Species) %>%
summarise(n=n(), sep_mean = mean(Sepal.Width)) %>%
arrange(desc(n)) %>%
mutate(Species=factor(Species, levels=Species)) %>%
ggplot(aes(Species, n, fill = sep_mean, label=sprintf("%.2f", sep_mean))) +
geom_col() +
scale_fill_gradient() +
labs(fill="Sepal Width\n(mean cm)") +
geom_text()
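If you want the gradient to run between the PuBu colours from the question rather than the default blues, scale_fill_gradient accepts low and high colours. A sketch, assuming the lightest and darkest of the three hex codes are the endpoints you want (summary_df and p are illustrative names):

```r
library(dplyr)
library(ggplot2)

set.seed(124)
summary_df <- iris[sample(1:150, 50), ] %>%
  group_by(Species) %>%
  summarise(n = n(), sep_mean = mean(Sepal.Width))

# Continuous fill from the lightest to the darkest PuBu colour
p <- ggplot(summary_df, aes(Species, n, fill = sep_mean)) +
  geom_col() +
  scale_fill_gradient(low = "#ECE7F2", high = "#2B8CBE") +
  labs(fill = "Sepal Width\n(mean cm)")
```

This scales to any number of bars, since the colour is interpolated from sep_mean instead of being assigned by hand.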

Density curves on multiple histograms sharing same y-axis

I need to overlay normal density curves on 3 histograms sharing the same y-axis. The curves need to be separate for each histogram.
My dataframe (example):
height <- seq(140,189, length.out = 50)
weight <- seq(67,86, length.out = 50)
fev <- seq(71,91, length.out = 50)
df <- as.data.frame(cbind(height, weight, fev))
I created the histograms for the data as:
library(ggplot2)
library(tidyr)
df %>%
gather(key=Type, value=Value) %>%
ggplot(aes(x=Value,fill=Type)) +
geom_histogram(binwidth = 8, position="dodge")
I am now stuck at how to overlay normal density curves for the 3 variables (separate curve for each histogram) on the histograms that I have generated. I won't mind the final figure showing either count or density on the y-axis.
Any thoughts on how to proceed from here?
Thanks in advance.
I believe that the code in the question is almost right; the code below just uses the answer in the link provided by @akrun.
Note that I have commented out the call to facet_wrap by placing a comment character before the last plus sign.
library(ggplot2)
library(tidyr)
df %>%
gather(key = Type, value = Value) %>%
ggplot(aes(x = Value, color = Type, fill = Type)) +
geom_histogram(aes(y = ..density..),
binwidth = 8, position = "dodge") +
geom_density(alpha = 0.25) #+
# facet_wrap(~ Type)
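Note that geom_density draws a kernel density estimate, not a fitted normal curve. If you specifically want normal curves, one option is to compute dnorm per variable from its own mean and standard deviation and add the result with geom_line. This is only a sketch; long, normal_curves and the grid of 100 points are my own choices.

```r
library(tidyr)
library(ggplot2)

height <- seq(140, 189, length.out = 50)
weight <- seq(67, 86, length.out = 50)
fev <- seq(71, 91, length.out = 50)
df <- data.frame(height, weight, fev)

long <- gather(df, key = Type, value = Value)

# One normal curve per variable, using that variable's mean and sd
normal_curves <- do.call(rbind, lapply(split(long, long$Type), function(d) {
  grid <- seq(min(d$Value), max(d$Value), length.out = 100)
  data.frame(
    Type = d$Type[1],
    Value = grid,
    density = dnorm(grid, mean = mean(d$Value), sd = sd(d$Value))
  )
}))

# Histograms on the density scale with the matching curve in each panel
p <- ggplot(long, aes(x = Value, fill = Type)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 8,
                 alpha = 0.4, position = "identity") +
  geom_line(data = normal_curves, aes(y = density, colour = Type)) +
  facet_wrap(~ Type, scales = "free_x")
```

Dropping the facet_wrap line puts all three histograms and curves on one shared panel.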

Geom tile white space issue when the x variable is spread unevenly across facet grids

I'm trying to produce a heat map of gene expression from samples of different conditions, faceted by the conditions:
require(reshape2)
set.seed(1)
expression.mat <- matrix(rnorm(100*1000),nrow=100)
df <- reshape2::melt(expression.mat)
colnames(df) <- c("gene","sample","expression")
df$condition <- factor(c(rep("C1",2500),rep("C2",3500),rep("C3",3800),rep("C4",200)),levels=c("C1","C2","C3","C4"))
I'd like to color by expression range:
df$range <- cut(df$expression,breaks=6)
The width parameter in ggplot's aes is supposed to control the width of the different facets. My question is how to find the optimal width value such that the figure is not distorted?
I played around a bit with this plot command:
require(ggplot2)
ggplot(df,aes(x=sample,y=gene,fill=range,width=100))+facet_grid(~condition,scales="free")+geom_tile(color=NA)+labs(x="condition",y="gene")+theme_bw()
Setting width to be below 100 leaves gaps in the last facet (with the lowest number of samples), and already at this value of 100 you can see that the right column in the first facet from left is distorted (wider than the columns to its left):
So my question is how to fix this/find a width that doesn't cause this.
Edit showing the issue with the sample variable faceted by condition
There is no C1 sample between 25 and 100, because those sample numbers belong to C2, C3 and C4.
Here is an illustration for the sample < 200.
ggplot(df[df$sample < 200, ],
aes(x=sample, y = gene, fill=range)) +
geom_tile() +
facet_grid(~condition)
The number of samples is not the same in all facets, and faceting on condition creates holes between the sample numbers for each condition.
One way around this problem is to create a sample2 number that restarts at 1 within each condition. I use the dplyr package for this.
library(dplyr)
sample2 <- df %>%
group_by(condition) %>%
distinct(sample) %>%
mutate(sample2 = 1:n())
df <- df %>%
left_join(sample2, by = c("condition", "sample"))
Then plot using sample2 as the x variable
ggplot(df,aes(x = sample2, y = gene,
fill = range))+
facet_grid(~condition) +
geom_tile(color=NA) + theme_bw()
Using the scales argument to vary scales on the x axis.
ggplot(df,aes(x = sample2, y = gene,
fill = range))+
facet_grid(~condition, scales = "free") +
geom_tile() + theme_bw()
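To make the re-indexing step concrete, here is the same approach on a tiny made-up data frame (toy and index are hypothetical names, and the sample numbers are invented so the gaps are easy to see):

```r
library(dplyr)

# Condition C1 owns samples 1, 2 and 9; C2 owns samples 5 and 6
toy <- data.frame(
  sample    = rep(c(1, 2, 5, 6, 9), each = 2),
  gene      = rep(1:2, times = 5),
  condition = rep(c("C1", "C1", "C2", "C2", "C1"), each = 2)
)

# Number the distinct samples 1, 2, 3, ... within each condition
index <- toy %>%
  group_by(condition) %>%
  distinct(sample) %>%
  mutate(sample2 = row_number()) %>%
  ungroup()

toy <- left_join(toy, index, by = c("condition", "sample"))
print(toy)
```

Sample 9 becomes sample2 = 3 within C1, so the tiles for each condition sit next to each other without gaps.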
Old answer using width
See for example this answer.
Adding a width aesthetic produces wider columns:
ggplot(df,aes(x = sample, y = gene,
fill = range, width = 50))+
facet_grid(~condition) +
geom_tile(color=NA) +
labs(x="condition",y="gene")+theme_bw()

ggplot2: Stack barcharts with group means

I have tried several things to make ggplot plot barcharts with means derived from factors in a dataframe, but I wasn't successful.
If you consider:
df <- as.data.frame(matrix(rnorm(60*2, mean=3,sd=1), 60, 2))
df$factor <- c(rep(factor(1:3), each=20))
I want to achieve a stacked, relative barchart like this:
This chart was created by manually calculating group means in a separate dataframe, melting it and using geom_bar(stat="identity", position = "fill") and scale_y_continuous(labels = percent_format()). I haven't found a way to use stat_summary with stacked barcharts.
In a second step, I would like to have errorbars attached to the breaks of each column. I have six treatments and three species, so errorbars should be OK.
For anything this complicated, I think it's loads easier to pre-calculate the numbers, then plot them. This is easily done with dplyr/tidyr (even the error bars):
gather(df, 'cat', 'value', 1:2) %>%
group_by(factor, cat) %>%
summarise(mean=mean(value), se=sd(value)/sqrt(n())) %>%
group_by(cat) %>%
mutate(perc=mean/sum(mean), ymin=cumsum(perc) -se/sum(mean), ymax=cumsum(perc) + se/sum(mean)) %>%
ggplot(aes(x=cat, y=perc, fill=factor(factor))) +
geom_bar(stat='identity') +
geom_errorbar(aes(ymax=ymax, ymin=ymin))
Of course this looks a bit strange because there are error bars around 100% in the stacked bars. I think you'd be way better off plotting the actual data points, plus means and error bars, and using faceting:
gather(df, 'cat', 'value', 1:2) %>%
group_by(cat, factor) %>%
summarise(mean=mean(value), se=sd(value)/sqrt(n())) %>%
ggplot(aes(x=cat, y=mean, colour=factor(factor))) +
geom_point(aes(y=value), position=position_jitter(width=.3, height=0), data=gather(df, 'cat', 'value', 1:2) ) +
geom_point(shape=5, size = 3) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.1) +
facet_grid(factor ~ .)
This way anyone can examine the data and see for themselves whether they are normally distributed.
