ggplot add Normal Distribution while using `facet_wrap` [duplicate]

This question already has answers here:
using stat_function and facet_wrap together in ggplot2 in R
(6 answers)
Closed 2 years ago.
I'm looking to plot the following histograms:
library(palmerpenguins)
library(tidyverse)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram() +
facet_wrap(~species)
For each histogram, I would like to add a Normal Distribution to each histogram with each species mean and standard deviation.
Of course I'm aware that I could compute the group specific mean and SD before embarking on the ggplot command, but I wonder whether there is a smarter/faster way to do this.
I have tried:
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram() +
facet_wrap(~species) +
stat_function(fun = dnorm)
But this only gives me a thin line at the bottom:
Any ideas?
Thanks!
Edit
I guess what I'm trying to recreate is this simple command from Stata:
hist bill_length_mm, by(species) normal
which gives me this:
I understand that there are some suggestions here: using stat_function and facet_wrap together in ggplot2 in R
But I'm specifically looking for a short answer that does not require me creating a separate function.

A while ago I automated this drawing of theoretical densities with a function in the ggh4x package, which I wrote and which you might find convenient. You just have to make sure that the histogram and the theoretical density are on the same scale (for example, counts per x-axis unit).
library(palmerpenguins)
library(tidyverse)
library(ggh4x)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 1) +
stat_theodensity(aes(y = after_stat(count))) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
You can vary the bin size of the histogram, but you'd have to adjust the theoretical density count too. Typically you'd multiply by the binwidth.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 2) +
stat_theodensity(aes(y = after_stat(count)*2)) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
Created on 2021-01-27 by the reprex package (v0.3.0)
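To see why the density has to be scaled by the binwidth, here is a minimal base R sketch. It assumes simulated data rather than the penguins data, and the names x, binwidth, breaks and expected are only illustrative. The expected count in a bin is roughly n * binwidth * density at the bin midpoint.

```r
# Simulated data stands in for one facet's values (an assumption, not the penguins data)
set.seed(1)
x <- rnorm(500, mean = 40, sd = 3)
binwidth <- 2

# Breaks that are guaranteed to span the whole range of x
breaks <- seq(floor(min(x)) - binwidth, ceiling(max(x)) + binwidth, by = binwidth)
h <- hist(x, breaks = breaks, plot = FALSE)

# Expected count per bin: n * binwidth * normal density at the bin midpoint
expected <- length(x) * binwidth * dnorm(h$mids, mean = mean(x), sd = sd(x))

# Observed and expected counts side by side
print(round(cbind(mid = h$mids, observed = h$counts, expected = expected), 1))
```

With binwidth = 1 the factor drops out, which is why the first example needs no multiplication.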
If this is too much of a hassle, you can always convert the histogram to density instead of the density to counts.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density))) +
stat_theodensity() +
facet_wrap(~species)

While the ggh4x package is the way to go in this case, a more generalizable approach is with tapply and the use of the PANEL variable which is added to the data when a facet is applied.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density)), bins = 30) +
facet_wrap(~species) +
geom_line(aes(y = dnorm(bill_length_mm,
mean = tapply(bill_length_mm, species, mean, na.rm = TRUE)[PANEL],
sd = tapply(bill_length_mm, species, sd, na.rm = TRUE)[PANEL])))
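For comparison, the approach the question mentions (computing the group-specific mean and SD first) could look roughly like the sketch below. It is longer than the answers above, but it needs no extra package beyond those already loaded. The names stats, grid_x and curves are only illustrative.

```r
library(palmerpenguins)
library(tidyverse)

# Per-species mean and SD of bill length (NA values dropped)
stats <- penguins %>%
  group_by(species) %>%
  summarise(m = mean(bill_length_mm, na.rm = TRUE),
            s = sd(bill_length_mm, na.rm = TRUE))

# Common grid of x values covering the observed range
grid_x <- seq(min(penguins$bill_length_mm, na.rm = TRUE),
              max(penguins$bill_length_mm, na.rm = TRUE),
              length.out = 200)

# Cartesian product of species statistics and grid, then one normal curve per species
curves <- merge(stats, data.frame(bill_length_mm = grid_x), by = NULL)
curves$density <- dnorm(curves$bill_length_mm, mean = curves$m, sd = curves$s)

ggplot(penguins, aes(x = bill_length_mm)) +
  geom_histogram(aes(y = after_stat(density), fill = species), bins = 30) +
  geom_line(data = curves, aes(y = density)) +
  facet_wrap(~species)
```

Because curves contains a species column, facet_wrap places each curve in the matching panel.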

Related

r one sided geom_quasirandom

I am interested in plotting the distribution of some continuous variables by groups using the geom_quasirandom() function.
I would like to know how to plot only one side, for example only points on the right side rather than on both sides.
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom()
I am interested in a plot like this. I am not interested in solutions based on geom_beeswarm because I have a lot of datapoints and geom_beeswarm just creates a crowded mess for my dataset. So any solution specific to geom_quasirandom() will be helpful. Thanks.
It's funny, the first example below is actually in the README for {ggbeeswarm} but it doesn't seem to work with my versions of R and the package.
library(tidyverse)
library(ggbeeswarm)
# doesn't look asymmetric as shown in the vignette
ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm(side = 1L)
#> Warning: Ignoring unknown parameters: side
However using the {see} package you can get pretty close to what I think you're asking for.
library(see)
# this works if you turn off the violin half
ggplot(iris,aes(Species, Sepal.Length)) +
geom_violindot(fill = NA, color = NA, color_dots = "black", size_dots = 1, fill_dots = "black")
Created on 2022-02-08 by the reprex package (v2.0.1)
Here's one approach, pulling the guts out of a beeswarm and using dplyr to adjust these manually, and adding the Species labels back manually.
ggplot_build(ggplot(iris,aes(Species, Sepal.Length)) +
geom_quasirandom()) -> a
a$data[[1]] %>%
group_by(group) %>%
mutate(x_adj = group + abs(x - median(x))) %>%
ungroup() %>%
ggplot(aes(x_adj, y)) +
geom_point() +
scale_x_continuous(breaks = 1:3, labels = unique(iris$Species))
...or a hack to keep the automatic labeling by using an invisible layer:
...
ggplot(aes(x, y)) +
geom_point(alpha = 0) +
geom_point(aes(x = x_adj))

ggplot2: add line for average count values resulting from geom_freqpoly

I am trying to add an additional line to my geom_freqpoly plot that represents the average count per bin. I tried two different things, but neither was successful.
I tried adding the line as a geom_line but got an error asking if I map my stat in the wrong layer.
library(tidyverse)
iris %>%
ggplot(aes(x = Petal.Length, y = ..count..)) +
geom_freqpoly(aes(color = Species),
binwidth = 0.2) +
geom_line(aes(yintercept = "mean"))
#> Warning: Ignoring unknown aesthetics: yintercept
#> Error: Aesthetics must be valid computed stats. Problematic aesthetic(s): y = ..count...
#> Did you map your stat in the wrong layer?
I tried adding another geom_freqpoly, like:
library(tidyverse)
iris %>%
ggplot() +
geom_freqpoly(aes(x = Petal.Length, y = ..count.., color = Species),
binwidth = 0.2) +
geom_freqpoly(aes(x = Petal.Length, y = mean(..count..), color = "red"), binwidth = 0.2)
But the resulting line is not what I expect.
Using the iris dataset, I would expect the new line to represent the average count across Species for the defined binwidth (see image below), not what I am getting. My understanding is that geom_freqpoly divides a continuous variable (like Petal.Length) into bins (of width 0.2 in this case). So for each bin I want the average count per species, and a line connecting those points.
Created on 2020-05-23 by the reprex package (v0.3.0)
Based on the edit of your question maybe this is what you expected.
The problem with your approach is that mean(..count..) simply computes the mean of the whole vector ..count.., which gives you a single number and therefore a horizontal line. Instead, compute the counts per bin without grouping by Species and divide them by the number of groups (Species).
I'm not completely satisfied with my solution because I would like to avoid the code n_distinct(iris$Species). I tried some approaches with tapply but failed. So as a first step ...
library(tidyverse)
ggplot(iris, aes(x = Petal.Length)) +
geom_freqpoly(aes(color = Species), binwidth = .2) +
geom_freqpoly(aes(y = ..count.. / n_distinct(iris$Species)), color = "red", binwidth = .2)
Created on 2020-05-28 by the reprex package (v0.3.0)
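As a sanity check on the "divide by the number of groups" idea, here is a small base R sketch. The breaks are an illustrative choice and will not necessarily match the bins geom_freqpoly picks. It shows that the per-bin average across species equals the ungrouped bin count divided by the number of species.

```r
# Bin Petal.Length into 0.2-wide intervals (illustrative breaks)
breaks <- seq(0.9, 7.1, by = 0.2)
bins <- cut(iris$Petal.Length, breaks = breaks)

# One row per bin, one column per species
counts <- table(bins, iris$Species)

# Average count across species for each bin
avg_per_bin <- rowMeans(counts)

# Total count per bin divided by the number of species
total_per_bin <- rowSums(counts) / nlevels(iris$Species)

# The two vectors agree
print(all.equal(avg_per_bin, total_per_bin))
```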

Linear color gradient inside ggplot histogram columns

I am trying to reproduce this plot with ggplot2:
From what I understand, you could call it a histogram with a linear color gradient.
I am stuck on this linear color gradient; I can't figure out how to reproduce it within each column.
I found one workaround in another post here: Trying to apply color gradient on histogram in ggplot
But it is quite old and does not look good with my data; it is also more of a "categorical coloring" than a "gradient coloring".
I also found this one: Plot background colour in gradient, but it only applies the gradient to the plot background and not within the columns.
This could be tested using the iris dataset :
ggplot(iris, aes(x=Species, fill=Petal.Width)) +
geom_histogram(stat = "count")
Here the Petal.Width values of each Species would be used as a color gradient within each column of the histogram, with a color legend as in the example plot.
Any help is welcome !
As the data is not provided, I use a toy example.
The point is to have two variables: one for colouring (grad) and another for the x-axis (x in the example). You need group = desc(grad) so that higher values are stacked higher within each bar.
library(tidyverse)
n <- 10000
grad <- runif(n, min = 0, max = 100) %>% round()
x <- sample(letters, size = n, replace = TRUE)
tibble(x, grad) %>%
ggplot(aes(x = x, group = desc(grad), fill = grad)) +
geom_bar(stat = 'count') +
scale_fill_viridis_c()
Created on 2020-05-14 by the reprex package (v0.3.0)
Or, using iris, the example is like:
library(tidyverse)
ggplot(iris, aes(x=Species, group = desc(Petal.Width), fill=Petal.Width)) +
geom_histogram(stat = "count") +
scale_fill_viridis_c()
#> Warning: Ignoring unknown parameters: binwidth, bins, pad
Created on 2020-05-14 by the reprex package (v0.3.0)

Plotly and stat_summary produce "NA" in tooltip

I'd like to use stat_summary to show mean values of my data on a ggplot2 graph, and use plotly to show the data and mean values in the tooltip. I'm using the dev version of ggplot2 (ggplot2_2.2.1.9000) and I just reinstalled plotly (plotly_4.7.1).
stat_smooth works - you can see I'm hovering over the smoothed line and I get values in the tooltip.
library(plotly)
library(ggplot2)
p <- iris %>%
ggplot(aes(Sepal.Length, Sepal.Width, color = Species)) +
geom_point()
ggplotly(p + stat_smooth(se = FALSE))
However, stat_summary doesn't - and that's what I need for my "real-world" problem :). Note how Sepal.Width is NA. I could have sworn this worked for me last year.
ggplotly(p + stat_summary(fun.y = "mean", geom="line", size = 1))

How do I create a categorical scatterplot in R like boxplots?

Does anyone know how to create a scatterplot in R like the ones GraphPad Prism produces:
I tried using boxplots, but they don't display the data the way I want. The column scatterplots that GraphPad Prism can generate show the data better for me.
Any suggestions would be appreciated.
As @smillig mentioned, you can achieve this using ggplot2. The code below reproduces the plot that you are after pretty well, but be warned that it is quite tricky. First load the ggplot2 package and generate some data:
library(ggplot2)
dd = data.frame(values=runif(21), type = c("Control", "Treated", "Treated + A"))
Next change the default theme:
theme_set(theme_bw())
Now we build the plot.
Construct a base object - nothing is plotted:
g = ggplot(dd, aes(type, values))
Add on the points: adjust the default jitter and change glyph according to type:
g = g + geom_jitter(aes(pch=type), position=position_jitter(width=0.1))
Add on the "box": calculate where the box ends. In this case, I've chosen the average value. If you don't want the box, just omit this step.
g = g + stat_summary(fun = mean,
geom="bar", fill="white", colour="black")
Add on some error bars: calculate the upper/lower bounds and adjust the bar width:
g = g + stat_summary(
# 95% CI: mean +/- t quantile (n - 1 df) times the standard error sd/sqrt(n)
fun.max=function(i) mean(i) + qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
fun.min=function(i) mean(i) - qt(0.975, length(i) - 1)*sd(i)/sqrt(length(i)),
geom="errorbar", width=0.2)
Display the plot
g
In my R code above I used stat_summary to calculate the values needed on the fly. You could also create separate data frames and use geom_errorbar and geom_bar.
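A rough sketch of that alternative, computing the summary in a separate data frame first, might look like this. The helper ci and the data frame summ are hypothetical names, and the interval is assumed to be a 95% t-based confidence interval for the mean.

```r
library(ggplot2)

set.seed(1)
dd <- data.frame(values = runif(21),
                 type = rep(c("Control", "Treated", "Treated + A"), times = 7))

# Hypothetical helper: mean and 95% t-based confidence limits for one group
ci <- function(v) {
  n <- length(v)
  half <- qt(0.975, df = n - 1) * sd(v) / sqrt(n)
  c(mean = mean(v), lower = mean(v) - half, upper = mean(v) + half)
}

# One row per type with columns mean, lower, upper
summ <- do.call(rbind, lapply(split(dd$values, dd$type), ci))
summ <- data.frame(type = rownames(summ), summ, row.names = NULL)

# Bars and error bars come from summ; the points come from the raw data
ggplot(dd, aes(type, values)) +
  geom_col(data = summ, aes(x = type, y = mean),
           fill = "white", colour = "black", inherit.aes = FALSE) +
  geom_errorbar(data = summ, aes(x = type, ymin = lower, ymax = upper),
                width = 0.2, inherit.aes = FALSE) +
  geom_jitter(aes(shape = type), width = 0.1, height = 0)
```

Setting inherit.aes = FALSE keeps the summary layers from looking for the values column, which only exists in the raw data.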
To use base R, have a look at my answer to this question.
If you don't mind using the ggplot2 package, there's an easy way to make similar graphics with geom_boxplot and geom_jitter. Using the mtcars example data:
library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + geom_jitter() + theme_bw()
which produces the following graphic:
The documentation can be seen here: http://had.co.nz/ggplot2/geom_boxplot.html
I recently faced the same problem and found my own solution, using ggplot2.
As an example, I created a subset of the chickwts dataset.
library(ggplot2)
library(dplyr)
data(chickwts)
Dataset <- chickwts %>%
filter(feed == "sunflower" | feed == "soybean")
Since it is not possible to change the dots to symbols in geom_dotplot(), I used geom_jitter() as follows:
Dataset %>%
ggplot(aes(feed, weight, fill = feed)) +
geom_jitter(aes(shape = feed, col = feed), size = 2.5, width = 0.1)+
stat_summary(fun = mean, geom = "crossbar", width = 0.7,
col = c("#9E0142","#3288BD")) +
scale_fill_manual(values = c("#9E0142","#3288BD")) +
scale_colour_manual(values = c("#9E0142","#3288BD")) +
theme_bw()
This is the final plot:
For more details, you can have a look at this post:
http://withheadintheclouds1.blogspot.com/2021/04/building-dot-plot-in-r-similar-to-those.html?m=1
