How to show boxplot flipped with another plot without flipping it? [duplicate] - r

This question already has answers here:
Overlaying boxplot with histogram in ggplot2
(3 answers)
Closed 3 years ago.
I want to overlay a boxplot on a histogram. To avoid interfering with the histogram, I am forced to draw the boxplot like this:
library(ggplot2)
ggplot(iris) + geom_boxplot(aes(x = Sepal.Length, y = factor(0)))
However, the plot doesn't look right unless I swap x and y.
I want to draw a histogram and a boxplot on the same coordinates, but there seems to be no way to draw a horizontal boxplot without coord_flip(), which doesn't help here because it flips the whole plot.
ggplot(iris) +
geom_histogram(aes(x = Sepal.Length))+
geom_boxplot(aes(x = Sepal.Length, y = factor(0))) +
coord_flip()

Something like this?
library(ggplot2)
library(ggstance)
ggplot(iris, aes(x = Sepal.Length)) +
geom_histogram() +
geom_boxploth(aes(y = 3), width = 3, color = "blue", lwd = 2, alpha = .5) +
theme_minimal()

This works in the current development version of ggplot2, hopefully to be released soon.
library(ggplot2) # remotes::install_github("tidyverse/ggplot2")
packageVersion("ggplot2")
#> [1] '3.2.1.9000'
ggplot(iris) +
geom_histogram(aes(x = Sepal.Length))+
geom_boxplot(aes(x = Sepal.Length, y = factor(0)))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Created on 2019-11-12 by the reprex package (v0.3.0)

Related

Legend position based on coordinates in ggplot2

Using ggplot2's legend.position (and legend.justification), the two available parameters indicate the relative position of the legend, but what if I want to position the legend based on the coordinates of the plot?
I can't find a way to do it.
This is strange as annotate gives an x and y argument that allows such things.
Here is a toy example:
library(ggplot2)
ggplot(data = mtcars, aes(x = mpg,y = disp,color = factor(cyl))) +
geom_point() +
theme(legend.position = c(0.01,0.01),
legend.justification = c(0,0))
Which gives:
What about if I want the bottom-left corner of the legend to be on coordinates (10,100)?
I don't think there is an easy way to do it. The only approach I could think of is to build the plot object and extract the axis ranges, then use them to convert (10, 100) into a relative coordinate that can be passed to legend.position. Admittedly, this is very hacky...
library(tidyverse)
p <- ggplot(data = mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
geom_point()
ranges <- ggplot_build(p) %>%
pluck("layout", "panel_params", 1) %>%
`[`(c("x.range", "y.range"))
x <- (10 - ranges$x.range[1]) / (ranges$x.range[2] - ranges$x.range[1])
y <- (100 - ranges$y.range[1]) / (ranges$y.range[2] - ranges$y.range[1])
p + theme(legend.position = c(x, y), legend.justification = c(0, 0))
Created on 2021-07-21 by the reprex package (v1.0.0)
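The conversion in the answer above is a linear rescaling of a data coordinate onto the 0-1 scale of an axis range. Below is a minimal sketch of just that arithmetic in base R. The helper names `to_relative()` and `expanded_range()` are hypothetical, and the 5% padding is an assumption standing in for the ranges that `ggplot_build()` reports:

```r
# Hypothetical helper: map a data value onto the 0-1 scale of an axis range.
to_relative <- function(value, range) {
  (value - range[1]) / (range[2] - range[1])
}

# Assumption: approximate the panel range by padding the data range by 5%
# on each side; the answer reads the real ranges from ggplot_build() instead.
expanded_range <- function(x, mult = 0.05) {
  r <- range(x, na.rm = TRUE)
  r + c(-1, 1) * mult * diff(r)
}

x_range <- expanded_range(mtcars$mpg)
y_range <- expanded_range(mtcars$disp)

# Relative position of the data coordinate (10, 100)
legend_pos <- c(to_relative(10, x_range), to_relative(100, y_range))
legend_pos
```

The resulting vector plays the role of `c(x, y)` in the answer's `theme(legend.position = c(x, y), legend.justification = c(0, 0))` call.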

Overlay KDE and filled histogram with ggplot2 (R)

I'm quite new to R and I'm struggling to overlay a filled histogram divided into 6 classes with a KDE based on the whole distribution (not the individual distributions of the 6 classes).
I have a dataset with 4 columns (data1, data2, data3, origin), where the data columns are continuous and origin holds my categories (geographical locations). I can plot the histogram for data1 with the 6 classes, but when I add the KDE curve it is also split into 6 curves (one for each class). I think I have to override the first aes argument and supply a new one when I call geom_density, but I can't find out how to do so.
Translating my problem to the iris dataset, I would like one KDE curve for Sepal.Length overall, not one KDE curve per species. Here is my code and my results with the iris data.
ggplot(data=iris, aes(x=Sepal.Length, fill=Species)) +
geom_histogram() +
theme_minimal() +
geom_density(kernel="gaussian", bw= 0.1, alpha=.3)
The problem is that the histogram displays counts, so its total area is the number of observations times the bin width, whereas the density plot shows a density, which integrates to 1. To make the two compatible you have to use the 'computed variables' of the stat parts of the layers, which are accessible with after_stat(). You can either scale the density so that it matches the counts, or scale the histogram so that it integrates to 1.
Scaling the histogram to the density:
library(ggplot2)
ggplot(iris, aes(Sepal.Length, fill = Species)) +
geom_histogram(aes(y = after_stat(density)),
position = 'identity') +
geom_density(bw = 0.1, alpha = 0.3)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Scaling the density to counts. To do this properly, multiply the count computed variable by the binwidth parameter of the histogram.
ggplot(iris, aes(Sepal.Length, fill = Species)) +
geom_histogram(binwidth = 0.2, position = 'identity') +
geom_density(aes(y = after_stat(count * 0.2)),
bw = 0.1, alpha = 0.3)
Created on 2021-06-22 by the reprex package (v1.0.0)
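The binwidth factor comes from how counts and densities relate: in each bin, count = density × n × binwidth. Here is a small check of that identity with base R's `hist()`, used only as a stand-in for what `stat_bin()` computes (the break points are chosen for illustration):

```r
x <- iris$Sepal.Length
binwidth <- 0.2

# Bin the data without drawing anything
h <- hist(x, breaks = seq(4, 8, by = binwidth), plot = FALSE)

# Rescale the per-bin density back to counts
counts_from_density <- h$density * length(x) * binwidth

# Compare with the counts that hist() reports directly
all.equal(as.numeric(h$counts), counts_from_density)
```

The same factor explains why a density curve computed as `count` (density × n) still needs multiplying by the histogram's bin width before the two layers line up.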
As a side note, the default position argument for the histogram stacks bars on top of one another. Setting position = "identity" prevents this. Alternatively, you could set position = "stack" in the density layer.
EDIT: Sorry, I seem to have glossed over the 'I want 1 KDE for the entire Sepal.Length' part of the question. You'd have to set the group manually, like so:
ggplot(iris, aes(Sepal.Length, fill = Species)) +
geom_histogram(binwidth = 0.2) +
geom_density(bw = 0.1, alpha = 0.3,
aes(group = 1, y = after_stat(count * 0.2)))
I also found a nice tutorial on combining geom_histogram() and geom_density() with matching scales on sthda.com:
http://www.sthda.com/english/wiki/ggplot2-density-plot-quick-start-guide-r-software-and-data-visualization#combine-histogram-and-density-plots
Reprex from there is:
set.seed(1234)
df <- data.frame(
sex=factor(rep(c("F", "M"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5),
rnorm(200, mean=65, sd=5)))
)
library(ggplot2)
ggplot(df, aes(x=weight, color=sex, fill=sex)) +
geom_histogram(aes(y=..density..), alpha=0.5,position="identity") +
geom_density(alpha=.2)

ggplot add Normal Distribution while using `facet_wrap` [duplicate]

This question already has answers here:
using stat_function and facet_wrap together in ggplot2 in R
(6 answers)
Closed 2 years ago.
I'm looking to plot the following histograms:
library(palmerpenguins)
library(tidyverse)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram() +
facet_wrap(~species)
To each histogram, I would like to add a normal distribution curve using that species' mean and standard deviation.
Of course I'm aware that I could compute the group specific mean and SD before embarking on the ggplot command, but I wonder whether there is a smarter/faster way to do this.
I have tried:
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram() +
facet_wrap(~species) +
stat_function(fun = dnorm)
But this only gives me a thin line at the bottom:
Any ideas?
Thanks!
Edit
I guess what I'm trying to recreate is this simple command from Stata:
hist bill_length_mm, by(species) normal
which gives me this:
I understand that there are some suggestions here: using stat_function and facet_wrap together in ggplot2 in R
But I'm specifically looking for a short answer that does not require me creating a separate function.
A while ago I more or less automated the drawing of theoretical densities with a function in the ggh4x package I wrote, which you might find convenient. You just have to make sure that the histogram and the theoretical density are on the same scale (for example, counts per x-axis unit).
library(palmerpenguins)
library(tidyverse)
library(ggh4x)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 1) +
stat_theodensity(aes(y = after_stat(count))) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
You can vary the bin size of the histogram, but you'd have to adjust the theoretical density count too. Typically you'd multiply by the binwidth.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 2) +
stat_theodensity(aes(y = after_stat(count)*2)) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
Created on 2021-01-27 by the reprex package (v0.3.0)
If this is too much of a hassle, you can always convert the histogram to density instead of the density to counts.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density))) +
stat_theodensity() +
facet_wrap(~species)
While the ggh4x package is the way to go in this case, a more generalizable approach is with tapply and the use of the PANEL variable which is added to the data when a facet is applied.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density)), bins = 30) +
facet_wrap(~species) +
geom_line(aes(y = dnorm(bill_length_mm,
mean = tapply(bill_length_mm, species, mean, na.rm = TRUE)[PANEL],
sd = tapply(bill_length_mm, species, sd, na.rm = TRUE)[PANEL])))
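The `tapply()` calls in the last answer return one value per group, which is then indexed to give a mean and SD for every row. Below is a sketch of that lookup on its own, using `iris` and the integer codes of `Species` as a stand-in for `PANEL` (an assumption for illustration; in the answer the index comes from the facet panel):

```r
# One mean and one standard deviation per species
group_mean <- tapply(iris$Sepal.Length, iris$Species, mean, na.rm = TRUE)
group_sd   <- tapply(iris$Sepal.Length, iris$Species, sd, na.rm = TRUE)

# Indexing by the factor's integer codes expands them to one value per row
idx <- as.integer(iris$Species)
normal_density <- dnorm(iris$Sepal.Length,
                        mean = as.vector(group_mean[idx]),
                        sd   = as.vector(group_sd[idx]))

head(data.frame(Species = iris$Species,
                Sepal.Length = iris$Sepal.Length,
                density = normal_density))
```

Each row's density is evaluated against its own group's normal distribution, which is what lets a single layer draw a different curve in each facet.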

Draw two plots in R with ggplot and par

I have just started to study R, beginning with the iris dataset in the datasets package. To draw some graphs I need to use the ggplot2 package. How can I split the Plots window and draw two graphs?
I tried the following code, but only one graph is shown.
iris=datasets::iris
par(mfrow=c(2,1))
ggplot(iris, aes(x=Sepal.Length,y=Sepal.Width,color=Species))+ geom_point(size=3)
ggplot(iris, aes(x=Petal.Length,y=Petal.Width,color=Species))+ geom_point(size=3)
Use win.graph() to open a separate graphics window for the second plot (Windows only).
Since you have not provided a dataset, here is an example of a side-by-side plot that you can adapt:
library(ggplot2)
library(cowplot)
iris1 <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
geom_boxplot() + theme_bw()
iris2 <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
geom_density(alpha = 0.7) + theme_bw() +
theme(legend.position = c(0.8, 0.8))
plot_grid(iris1, iris2, labels = "AUTO")
Because ggplot2 is built on the grid graphics system rather than base graphics, par() has no effect on ggplot2 plots. You can set a tag for each plot with labs(tag = ...) and, with the patchwork package loaded, place plots side by side by adding them together:
library(ggplot2)
library(patchwork) # provides `+` for combining ggplot objects
iris <- datasets::iris
p1 <- ggplot(iris, aes(x=Sepal.Length,y=Sepal.Width,color=Species))+ geom_point(size=3) + labs(tag = "A")
p2 <- ggplot(iris, aes(x=Petal.Length,y=Petal.Width,color=Species))+ geom_point(size=3) + labs(tag = "B")
p1 + p2
For more sophisticated arrangements, see the other layout operators and plot_layout() in patchwork.

how to prevent axes from intersecting in ggplot2

I'm using ggplot2 to make line graphs of some log-transformed data that all have large values (between 10^6 and 10^8). Since the axes don't start at zero, I'd prefer not to have them intersect at the "origin."
Here's what the axes currently look like:
I'd prefer something more like one gets from base graphics (but I'm additionally using geom_ribbon and other fancy things I really like in ggplot2, so I'd prefer to find a ggplot2 solution):
Here's what I'm doing currently:
library(ggplot2)
library(scales) # for math_format()
mydata <- data.frame(Day = rep(1:8, 3),
Treatment = rep(c("A", "B", "C"), each=8),
Value = c(7.415929, 7.200486, 7.040555, 7.096490, 7.056413, 7.143981, 7.429724, 7.332760, 7.643673, 7.303994, 7.343151, 6.923636, 6.923478, 7.249170, 7.513370, 7.438630, 7.209895, 7.000063, 7.160154, 6.677734, 7.026307, 6.830495, 6.863329, 7.319219))
ggplot(mydata, aes(x=Day, y=Value, group=Treatment)) +
theme_classic() +
geom_line(aes(color = Treatment), size=1) +
scale_y_continuous(labels = math_format(10^.x)) +
coord_cartesian(ylim = c(6.4, 7.75), xlim=c(0.5, 8))
plot(mydata$Day, mydata$Value, frame.plot = FALSE) # non-intersecting axes
A workaround is to remove the axis lines with theme(axis.line=element_blank()) and then draw substitute axis lines with geom_segment(), one for the x axis and one for the y axis. The x, y, xend and yend values are read off your plot: along each axis the segment spans the smallest to the largest value shown on that axis, and its position on the other axis is the lower limit used in coord_cartesian(), so that the segment is drawn where the axis line would be.
ggplot(mydata, aes(x=Day, y=Value, group=Treatment)) +theme_classic() +
geom_line(aes(color = Treatment), size=1) +
scale_y_continuous(labels = math_format(10^.x))+
coord_cartesian(ylim = c(6.4, 7.75), xlim=c(0.5, 8))+
theme(axis.line=element_blank())+
geom_segment(x=2,xend=8,y=6.4,yend=6.4)+
geom_segment(x=0.5,xend=0.5,y=6.5,yend=7.75)
This is an older question, but since I was looking for this functionality recently I thought I'd flag the ggh4x package, which adds guides for truncating axes.
library(ggh4x)
#> Loading required package: ggplot2
ggplot(data.frame(x=0:10, y=0:10), aes(x, y)) +
geom_point() +
theme_classic() +
guides(x = "axis_truncated", y = "axis_truncated")
Created on 2023-02-17 with reprex v2.0.2
Apart from convenience, two nice things about the ggh4x option are that 1) it is stable across more complex plot compositions like faceting and 2) its dependencies are a subset of those belonging to ggplot2, so you aren't introducing a bunch of additional imports.
P.S. There's an open GitHub issue to bring this kind of "floating axes" functionality to the main ggplot2 library. It looks like it will eventually be incorporated.
