Linear color gradient inside ggplot histogram columns - r

I am trying to reproduce this plot with ggplot2:
From what I understand, it can be called a histogram with a linear color gradient.
I am stuck on this linear color gradient; I can't figure out how to reproduce it within each column.
I found one workaround in another post here: Trying to apply color gradient on histogram in ggplot
But it is quite an old one and does not look good with my data; also, it is more "categorical coloring" than "gradient coloring".
I also found this one: Plot background colour in gradient, but it only applies the gradient to the plot background and not to the columns.
This could be tested using the iris dataset:
ggplot(iris, aes(x=Species, fill=Petal.Width)) +
geom_histogram(stat = "count")
Here, the Petal.Width values of each Species would be used as a color gradient for each column of the histogram, with a color legend as in the example plot.
Any help is welcome!

As the data is not provided, I use a toy example.
The point is to have two variables: one for colouring (grad) and another for the x-axis (x in the example). You need to map group = desc(grad) so that higher values are placed higher up in each bar.
library(tidyverse)
n <- 10000
grad <- runif(n, min = 0, max = 100) %>% round()
x <- sample(letters, size = n, replace = TRUE)
tibble(x, grad) %>%
ggplot(aes(x = x, group = desc(grad), fill = grad)) +
geom_bar(stat = 'count') +
scale_fill_viridis_c()
Created on 2020-05-14 by the reprex package (v0.3.0)
Or, using iris, the example is like:
library(tidyverse)
ggplot(iris, aes(x=Species, group = desc(Petal.Width), fill=Petal.Width)) +
geom_bar() +
scale_fill_viridis_c()
Created on 2020-05-14 by the reprex package (v0.3.0)
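To see why desc() puts the highest values at the top, here is a small base R sketch that mimics how a single bar is stacked. It rests on two assumptions about ggplot2 internals: that an explicit group aesthetic is turned into integer ids in ascending order of its values, and that position_stack() places group 1 at the top of the bar (its default order). The grad values are made up for illustration.

```r
# Made-up fill values for the observations that fall into one bar
grad <- c(40, 10, 90, 10, 65)

# geom_bar() counts observations per distinct group value
counts <- table(grad)               # levels sorted ascending: 10, 40, 65, 90
vals <- as.numeric(names(counts))

# group = desc(grad) is -grad for numbers, so the largest value gets id 1
group_id <- rank(-vals)

# Assumption: group 1 ends up on top, so the highest id is stacked first
bottom_to_top <- order(group_id, decreasing = TRUE)
heights <- as.vector(counts)[bottom_to_top]
ymax <- cumsum(heights)
ymin <- ymax - heights

bar <- data.frame(grad = vals[bottom_to_top], ymin = ymin, ymax = ymax)
bar
```

Without desc(), the ids would run the other way, and the lowest values would sit at the top of each bar.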

Related

Align ggplot2 points without having them overlap?

Is there any way to make all the points on a boxplot visible, without them overlapping each other when they aren't unique?
Currently:
I want it to look like this (with the colours and other features):
I tried beeswarm and I'm getting the warning:
Warning in f(...) : The default behavior of beeswarm has changed in version 0.6.0. In versions <0.6.0, this plot would have been dodged on the y-axis. In versions >=0.6.0, grouponX=FALSE must be explicitly set to group on y-axis. Please set grouponX=TRUE/FALSE to avoid this warning and ensure proper axis choice.
even though I have geom_beeswarm(grouponY=TRUE)
You could do something like this ...
library(tidyverse)
tibble(x = "label", y = rnorm(100, 10, 10)) |>
ggplot(aes(x, y)) +
geom_jitter(width = 0.1) +
geom_boxplot(outlier.shape = NA)
Created on 2022-04-24 by the reprex package (v2.0.1)
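The width = 0.1 argument controls how far the points spread horizontally. As far as I know, position_jitter() relies on base R's jitter() with amount = width, which adds uniform noise between -width and +width. The base R sketch below (no ggplot2 needed) shows the resulting x positions for a single category drawn at x = 1. Fixing the seed, e.g. with position_jitter(width = 0.1, seed = 1), keeps the layout reproducible between renders.

```r
set.seed(1)  # reproducible noise

# A discrete x axis places the single category "label" at position 1
x_pos <- rep(1, 100)

# Uniform noise in [-0.1, 0.1], the horizontal spread requested by width = 0.1
x_jittered <- jitter(x_pos, amount = 0.1)

range(x_jittered)
```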
Slight modifications to Carl's answer could be to:
Move the geom_jitter layer after the geom_boxplot layer in the code, so that the points are drawn on top of the box; or
Make the box more transparent to allow the points to be visible:
library(tidyverse)
tibble(x = "label", y = rnorm(100, 10, 10)) %>%
ggplot(aes(x, y)) +
geom_boxplot(outlier.shape = NA, alpha = 0.5) + # alpha < 1 makes the box fill semi-transparent
geom_jitter(width = 0.1)
Alternatively, have you considered using a violin plot? It can show the density of the distribution more effectively, because the width of the violin at each y value is proportional to the estimated density of data points around that value.

How to annotate the area under the curve of density plot of specific interval?

The density plot is interesting, but the height is just a height (https://stats.stackexchange.com/questions/147885/how-to-interpret-height-of-density-plot).
So when visualizing it, it is always helpful to provide additional information, such as the percentage of Sepal.Length values between, say, 5 and 6: shade that area and annotate the chart with its percentage.
How can I do this with ggplot?
ggplot(iris, aes(x=Sepal.Length)) +
geom_density()
For example, in the plot below the area of interest is shaded and its percentage is shown (ideally as 12% instead of 0.12).
You might find scales::oob_censor() convenient. It converts out-of-bounds values to NA. You can use it both to restrict the filled area to the bounds and, by counting the non-NA values, to get the fraction of observations falling within the bounds (as a closed interval). A downside is that you will get a warning about missing values, which is harmless. You will have to set a suitable y-value for the text annotation manually, though.
library(ggplot2)
library(scales)
bounds <- c(5, 6)
ggplot(iris, aes(x=Sepal.Length)) +
stat_density(geom = "line") +
stat_density(
geom = "area",
aes(x = stage(Sepal.Length, after_stat = oob_censor(x, bounds))),
alpha = 0.3
) +
annotate(
"text", mean(bounds), y = 0.2,
label = percent(mean(!is.na(oob_censor(iris$Sepal.Length, bounds))))
)
#> Warning: Removed 370 rows containing missing values (position_stack).
Created on 2021-06-01 by the reprex package (v1.0.0)
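Note that the annotated percentage is the share of observations inside the bounds, while the shaded region is the area under the kernel density curve; the two are close but not identical. The base R sketch below computes both for the same bounds. It assumes density()'s defaults (bw = "nrd0", n = 512) stand in for stat_density()'s, and approximates the area with a simple Riemann sum.

```r
bounds <- c(5, 6)
x <- iris$Sepal.Length

# Share of observations in the closed interval [5, 6]; this is what
# mean(!is.na(oob_censor(x, bounds))) computes
obs_share <- mean(x >= bounds[1] & x <= bounds[2])

# Area under a Gaussian kernel density estimate between the bounds
d <- density(x)
step <- d$x[2] - d$x[1]                 # the grid is evenly spaced
inside <- d$x >= bounds[1] & d$x <= bounds[2]
kde_area <- sum(d$y[inside]) * step     # Riemann-sum approximation

sprintf("observations: %.1f%%, area under curve: %.1f%%",
        100 * obs_share, 100 * kde_area)
```

If the label should describe the shaded area itself, kde_area is the closer match; obs_share answers "what fraction of flowers fall in this range".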

ggplot add Normal Distribution while using `facet_wrap` [duplicate]

This question already has answers here:
using stat_function and facet_wrap together in ggplot2 in R
(6 answers)
Closed 2 years ago.
I'm looking to plot the following histograms:
library(palmerpenguins)
library(tidyverse)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram() +
facet_wrap(~species)
For each histogram, I would like to add a normal distribution curve using that species' mean and standard deviation.
Of course I'm aware that I could compute the group specific mean and SD before embarking on the ggplot command, but I wonder whether there is a smarter/faster way to do this.
I have tried:
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram() +
facet_wrap(~species) +
stat_function(fun = dnorm)
But this only gives me a thin line at the bottom:
Any ideas?
Thanks!
Edit
I guess what I'm trying to recreate is this simple command from Stata:
hist bill_length_mm, by(species) normal
which gives me this:
I understand that there are some suggestions here: using stat_function and facet_wrap together in ggplot2 in R
But I'm specifically looking for a short answer that does not require me creating a separate function.
A while ago I automated drawing theoretical densities with a function in the ggh4x package I wrote, which you might find convenient. You just have to make sure that the histogram and the theoretical density are on the same scale (for example, counts per x-axis unit).
library(palmerpenguins)
library(tidyverse)
library(ggh4x)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 1) +
stat_theodensity(aes(y = after_stat(count))) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
You can vary the bin size of the histogram, but then you have to adjust the theoretical density count too; typically, you multiply it by the binwidth.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 2) +
stat_theodensity(aes(y = after_stat(count)*2)) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
Created on 2021-01-27 by the reprex package (v0.3.0)
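The binwidth factor comes from how the two scales relate: a bar's count equals density × n × binwidth, so a normal curve on the count scale is n × binwidth × dnorm(x, mean, sd). The base R sketch below checks this with hist() and builds such a scaled curve. It uses iris$Sepal.Length as a stand-in so that it runs without palmerpenguins, and the bin width of 0.5 is arbitrary.

```r
x <- iris$Sepal.Length   # stand-in data; any numeric vector works
n <- length(x)
binwidth <- 0.5

breaks <- seq(floor(min(x)), ceiling(max(x)), by = binwidth)
h <- hist(x, breaks = breaks, plot = FALSE)

# hist() defines density = counts / (n * binwidth) for equal-width bins
all.equal(h$counts, h$density * n * binwidth)

# A fitted normal curve on the same (count) scale as the bars
curve_y <- n * binwidth * dnorm(h$mids, mean = mean(x), sd = sd(x))
round(data.frame(mid = h$mids, count = h$counts, normal = curve_y), 1)
```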
If this is too much of a hassle, you can always convert the histogram to density instead of the density to counts.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density))) +
stat_theodensity() +
facet_wrap(~species)
While the ggh4x package is the way to go in this case, a more generalizable approach uses tapply() together with the PANEL variable, which ggplot2 adds to the layer data when a facet is applied.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density)), bins = 30) +
facet_wrap(~species) +
geom_line(aes(y = dnorm(bill_length_mm,
mean = tapply(bill_length_mm, species, mean, na.rm = TRUE)[PANEL],
sd = tapply(bill_length_mm, species, sd, na.rm = TRUE)[PANEL])))
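The key trick is tapply(...)[PANEL]: tapply() returns one statistic per species in level order, and indexing that vector by a factor uses the factor's integer codes, so every row picks up the mean and SD of its own panel. This only lines up because facet_wrap(~species) numbers the panels in the order of the species levels. The base R sketch below reproduces the lookup outside ggplot2, using iris with Species standing in for PANEL (an assumption for illustration).

```r
# One statistic per species, in the order of levels(iris$Species)
means <- tapply(iris$Sepal.Length, iris$Species, mean)
sds   <- tapply(iris$Sepal.Length, iris$Species, sd)

# Stand-in for ggplot2's PANEL column: a factor whose codes follow the
# species levels, as with facet_wrap(~Species)
panel <- iris$Species

# Indexing by a factor uses its integer codes, so each row gets the
# statistics of its own group
row_mean <- as.numeric(means[panel])
row_sd   <- as.numeric(sds[panel])

# Normal density at each observation, as in the geom_line() layer
y <- dnorm(iris$Sepal.Length, mean = row_mean, sd = row_sd)
head(data.frame(Species = iris$Species, row_mean, row_sd, y))
```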

ggplot2: add line for average count values resulting from geom_freqpoly

I am trying to add an additional line to my geom_freqpoly plot that represents the average count per binwidth. I tried two different things, but neither was successful.
I tried adding the line as a geom_line, but got an error asking whether I mapped my stat in the wrong layer.
library(tidyverse)
iris %>%
ggplot(aes(x = Petal.Length, y = ..count..)) +
geom_freqpoly(aes(color = Species),
binwidth = 0.2) +
geom_line(aes(yintercept = "mean"))
#> Warning: Ignoring unknown aesthetics: yintercept
#> Error: Aesthetics must be valid computed stats. Problematic aesthetic(s): y = ..count...
#> Did you map your stat in the wrong layer?
I tried adding another geom_freqpoly, like:
library(tidyverse)
iris %>%
ggplot() +
geom_freqpoly(aes(x = Petal.Length, y = ..count.., color = Species),
binwidth = 0.2) +
geom_freqpoly(aes(x = Petal.Length, y = mean(..count..), color = "red"), binwidth = 0.2)
But the resulting line is not what I expect.
Using the iris dataset, I would expect the new line to represent the average count of species per bin (see image below), which is not what I am getting. My understanding is that geom_freqpoly divides a continuous variable (like Petal.Length) into bins (of width 0.2 in this case). So, for each bin, I want the average count across species, with a line connecting those points.
Created on 2020-05-23 by the reprex package (v0.3.0)
Based on the edit of your question, maybe this is what you expected.
The problem with your approach is that mean(..count..) simply computes the mean of the whole ..count.. vector, which gives you a single number and therefore a horizontal line. Instead, simply divide the count in each bin by the number of groups (species).
I'm not completely satisfied with my solution, because I would like to avoid the code n_distinct(iris$Species). I tried some approaches with tapply but failed. So, as a first step:
library(tidyverse)
ggplot(iris, aes(x = Petal.Length)) +
geom_freqpoly(aes(color = Species), binwidth = .2) +
geom_freqpoly(aes(y = after_stat(count) / n_distinct(iris$Species)), color = "red", binwidth = .2)
Created on 2020-05-28 by the reprex package (v0.3.0)
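Dividing by the number of species works because the average count per bin, taken across all species (including those with zero observations in that bin), equals the total count in the bin divided by the number of species. The base R sketch below checks this with cut() and table(); its bin boundaries are a simple assumption and will not exactly match the ones geom_freqpoly() computes.

```r
x <- iris$Petal.Length
binwidth <- 0.2

# Simple left-closed bins of width 0.2 (ggplot2's own bin boundaries differ)
breaks <- seq(floor(min(x)), ceiling(max(x)) + binwidth, by = binwidth)
bin <- cut(x, breaks = breaks, right = FALSE)

# One column per species; bins a species never reaches count as 0
counts <- table(bin, iris$Species)

avg_per_bin <- rowMeans(counts)
all.equal(avg_per_bin, rowSums(counts) / nlevels(iris$Species))
```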

Excluding cells from transparency in heatmap with ggplot

I am trying to generate a heatmap where I can show more than one level of information on each cell. For each cell I would like to show a different color depending on its value in one variable and then overlay this with a transparency (alpha) that shades the cell according to its value for another variable.
Similar questions have been addressed here (Place 1 heatmap on another with transparency in R) and here (Making a heatmap in R varying both color and transparency). In both cases, the suggestion is to use ggplot and overlay two geom_tiles, one with the colors and one with the transparency.
I have managed to overlay two geom_tiles (see code below). However, in my case, the problem is that the shading defined by the transparency (or "alpha") geom_tile also shades some cells that should remain as white or blank according to the colors (or "fill") geom_tile. I would like these cells to remain white even after overlaying the transparency.
#Create sample dataframe
df <- data.frame("x_pos" = c("A","A","A","B","B","B","C","C","C"),
"y_pos" = c("X","Y","Z","X","Y","Z","X","Y","Z"),
"col_var"= c(1,2,NA,4,5,6,NA,8,9),
"alpha_var" = c(7,12,0,3,2,15,0,6,15))
#Convert factor columns to numeric
df$col_var<- as.numeric(df$col_var)
df$alpha_var<- as.numeric(df$alpha_var)
#Cut display variable into breaks
df$col_var_cut <- cut(df$col_var,
breaks = c(0,3,6,10),
labels = c("cat1","cat2", "cat3"))
#Plot
library(ggplot2)
ggplot(df, aes (x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile () +
geom_text() +
scale_fill_manual(values=(brewer.pal(3, "RdYlBu")),na.value="white") +
geom_tile(aes(alpha = alpha_var), fill ="gray29")+
scale_alpha_continuous("alpha_var", range=c(0,0.7), trans = 'reverse')+
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
I would like cells "AZ" and "CX" in the heatmap resulting from the code above to be colored white instead of grey such that the alpha transparency doesn't apply to them. In my data, these cells have NA in the color variable (col_var) and can have a value of NA or 0 (as in the example code) in the transparency/alpha variable (alpha_var).
If this is not possible, then I would like to know whether there are other options to display both variables in a heatmap while keeping the NA cells of col_var white. I am happy to use other packages or alternative heatmap layouts, such as those where the size of each cell or the thickness of its border varies according to the values of alpha_var. However, I am not sure how I could achieve this either.
Thanks in advance and my apologies for the cumbersome bits in the example code (I am still learning R and this is my first time asking questions here).
You were not far. See below for a possible solution. The first plot shows an implementation of adding transparency within the geom_tile call itself - note I removed the trans = reverse specification from your plot.
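Part of why the NA cells end up grey in your original plot is the reversed alpha scale: cells AZ and CX have alpha_var = 0, the smallest value, so with trans = 'reverse' they receive the largest alpha (0.7) and become the darkest grey. The base R sketch below reproduces that mapping, assuming the scale negates the values and then rescales them linearly onto range = c(0, 0.7).

```r
# alpha_var values from the example data frame, in row order
alpha_var <- c(7, 12, 0, 3, 2, 15, 0, 6, 15)
alpha_range <- c(0, 0.7)   # range = c(0, 0.7) in scale_alpha_continuous()

# Assumption: trans = 'reverse' negates the values before rescaling
tv <- -alpha_var

# Linear rescale of the transformed values onto the alpha range
alpha <- alpha_range[1] +
  (tv - min(tv)) / (max(tv) - min(tv)) * diff(alpha_range)

data.frame(alpha_var, alpha = round(alpha, 3))
```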
Plot 2 just adds back the white tiles on top of the other plot - simple hack which you will often find necessary when wanting to plot certain data points differently.
Note I have added a few minor comments to your code below.
# Creating your data frame with a better name: df is also the name of a function in R
# (the F-distribution density, stats::df), so it is not recommended as an example name.
# Also note that I removed the quotation marks around the column names in the data frame call; they were not necessary. I also called as.numeric directly.
mydf <- data.frame(x_pos = c("A","A","A","B","B","B","C","C","C"),
y_pos = c("X","Y","Z","X","Y","Z","X","Y","Z"),
col_var = as.numeric(c(1,2,NA,4,5,6,NA,8,9)),
alpha_var = as.numeric(c(7,12,0,3,2,15,0,6,15)))
mydf$col_var_cut <- cut(mydf$col_var, breaks = c(0,3,6,10), labels = c("cat1","cat2", "cat3"))
#Plot
library(tidyverse)
library(RColorBrewer) # you forgot to add this to your reprex
ggplot(mydf, aes (x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile(aes(alpha = alpha_var)) +
geom_text() +
scale_fill_manual(values=(brewer.pal(3, "RdYlBu")), na.value="white")
#> Warning: Removed 2 rows containing missing values (geom_text).
# a bit hacky for quick and dirty solution. Note I am using dplyr::filter from the tidyverse
ggplot(mapping = aes(x = x_pos, y = y_pos, fill = col_var_cut, label = col_var)) +
geom_tile(data = filter(mydf, !is.na(col_var))) +
geom_tile(data = filter(mydf, !is.na(col_var)), aes(alpha = alpha_var), fill ="gray29")+
geom_tile(data = filter(mydf, is.na(col_var)), fill = 'white') +
geom_text(data = mydf) +
scale_fill_manual(values = (brewer.pal(3, "RdYlBu"))) +
scale_alpha_continuous("alpha_var", range=c(0,0.7), trans = 'reverse')
#> Warning: Removed 2 rows containing missing values (geom_text).
Created on 2019-07-04 by the reprex package (v0.2.1)
