How to plot two histograms on the same axis scale? - r

I have two data frames, dataf1 and dataf2. They have the same structure and columns.
The three columns are named A, B, and C, and both data frames have 50 rows.
I would like to plot a histogram of column B for dataf1 and for dataf2. I can plot the two histograms separately, but they are not on the same scale. How can I either put them on the same plot using different colors, or plot two histograms with the same scale?
ggplot() + aes(dataf1$B)+ geom_histogram(binwidth=1, colour="black",fill="white")
ggplot() + aes(dataf2$B)+ geom_histogram(binwidth=1, colour="black", fill="white")

Combine your data into a single data frame with a new column marking which data frame the data originally came from. Then use that new column for the fill aesthetic for your plot.
dataf1$source="Data 1"
dataf2$source="Data 2"
dat_combined = rbind(dataf1, dataf2)
You haven't provided sample data, so here are a few examples of possible plots, using the built-in iris data frame. In the plots below, dat is analogous to dat_combined, Petal.Width is analogous to B, and Species is analogous to source.
dat = subset(iris, Species != "setosa") # We want just two species
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", alpha=0.5, binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="dodge", binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", binwidth=0.1) +
facet_grid(Species ~ .)
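Applied to data frames shaped like the ones in the question, the faceted version might look like the sketch below. The values in dataf1 and dataf2 are made up for illustration (the question did not include data). Facets share their x and y scales by default, which is what puts both histograms on the same scale.

```r
library(ggplot2)

# Hypothetical stand-ins for dataf1 and dataf2: 50 rows, columns A, B, C
set.seed(123)
dataf1 <- data.frame(A = 1:50, B = rnorm(50, mean = 10, sd = 2), C = runif(50))
dataf2 <- data.frame(A = 1:50, B = rnorm(50, mean = 13, sd = 3), C = runif(50))

# Tag each row with its origin, then stack the two data frames
dataf1$source <- "dataf1"
dataf2$source <- "dataf2"
dat_combined <- rbind(dataf1, dataf2)

# One panel per source; facet_grid keeps x and y scales fixed by default
p_facets <- ggplot(dat_combined, aes(x = B, fill = source)) +
  geom_histogram(binwidth = 1, colour = "black") +
  facet_grid(source ~ .)
p_facets
```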

As Zheyuan says, you just need to set the y limits for each plot to get them on the same scale. With ggplot2, one way to do this is with the lims command (though scale_y_continuous and coord_cartesian also work, albeit slightly differently). You should also never use data$column inside aes(). Instead, use the data argument for the data frame and unquoted column names inside aes(). Here's an example with some built-in data.
p1 = ggplot(mtcars, aes(x = mpg)) + geom_histogram() + lims(y = c(0, 13))
p2 = ggplot(iris, aes(x = Sepal.Length)) + geom_histogram() + lims(y = c(0, 13))
gridExtra::grid.arrange(p1, p2, nrow = 1)
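Rather than hard-coding the upper limit, the shared limits can be read from the plots themselves. The sketch below is one possible way to do that, using made-up dataf1 and dataf2 (assumptions, not the asker's data): layer_data() returns the computed bin counts, and coord_cartesian() applies the same x and y window to both plots without dropping any bars.

```r
library(ggplot2)

# Hypothetical stand-ins for the two data frames in the question
set.seed(1)
dataf1 <- data.frame(A = 1:50, B = rnorm(50, mean = 10, sd = 2), C = 0)
dataf2 <- data.frame(A = 1:50, B = rnorm(50, mean = 12, sd = 4), C = 0)

# Same binwidth and bin boundary so the bins line up across plots
h1 <- ggplot(dataf1, aes(x = B)) + geom_histogram(binwidth = 1, boundary = 0)
h2 <- ggplot(dataf2, aes(x = B)) + geom_histogram(binwidth = 1, boundary = 0)

# Tallest bar across both plots, taken from the computed layer data
max_count <- max(layer_data(h1)$count, layer_data(h2)$count)

# Common x range covering both data sets
x_range <- range(dataf1$B, dataf2$B)

# Apply the same viewing window to both plots
h1 <- h1 + coord_cartesian(xlim = x_range, ylim = c(0, max_count))
h2 <- h2 + coord_cartesian(xlim = x_range, ylim = c(0, max_count))
h1
h2
```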
To get two histograms on the same plot, the best way is to combine your data frames. A guess, without seeing what your data looks like:
dataf = rbind(dataf1["B"], dataf2["B"])
dataf$source = c(rep("f1", nrow(dataf1)), rep("f2", nrow(dataf2)))
ggplot(dataf, aes(x = B, fill = source)) +
geom_histogram(position = "identity", alpha = 0.7)

Related

How to create a faceting based on a column in second dataframe

I want to create a graph that looks like:
Now I found the cowplot package which gave me a quite similar result.
library(ggplot2)
library(cowplot)
library(data.table)
library(ggridges)
d = data.table(iris)
a = ggplot(data = d, aes(x=Sepal.Length, y=..count..)) +
geom_density_line() +
geom_density_line(data = d[Species == "virginica"], aes(), fill="lightblue", color="darkblue") +
theme_bw()
b = ggplot(data = d, aes(x=Sepal.Length, y=..count..)) +
geom_density_line() +
geom_density_line(data = d[Species == "versicolor"], aes(), fill="lightgreen", color="darkgreen") +
theme_bw()
cowplot::plot_grid(a, b, labels=NULL)
The result looks like:
But, there are two points that bother me:
It has a y-axis in both plots
With my real data where I have up to 10 grids, the code becomes very long
I think it must be possible to use facet_grid(), facet_wrap() or something similar to achieve this. But I don't know how I can use a column from the dataframe of the second geometry to create these subsets without changing/losing the greyish background plot.
We can feed one layer a version of the data without Species, so it calculates the whole thing as our background context, and another layer that includes Species to map that to fill and to the appropriate facet.
library(ggplot2); library(ggridges)
ggplot(data = iris, aes(Sepal.Length, ..count..)) +
geom_density_line(data = subset(iris, select = -Species)) +
geom_density_line(aes(fill = Species)) +
facet_wrap(~Species)
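The same idea can be sketched with plain ggplot2, in case ggridges is not available. This is an illustrative variant, not the answer's exact code: it swaps geom_density_line() for geom_density() and uses after_stat(count), which assumes ggplot2 3.3.0 or later. A layer whose data lacks the faceting column is drawn in every panel, which is what produces the grey background.

```r
library(ggplot2)

# Background data: same rows, but without the faceting column,
# so ggplot2 repeats this layer in every panel
iris_all <- subset(iris, select = -Species)

p_bg <- ggplot(iris, aes(x = Sepal.Length, y = after_stat(count))) +
  geom_density(data = iris_all, fill = "grey85", colour = "grey60") +
  geom_density(aes(fill = Species), alpha = 0.7) +
  facet_wrap(~Species) +
  theme_bw()
p_bg
```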

How to wrap a wrapped plot plus another plot? [duplicate]

I want the six plots in one plot. And I would like to specify the titles of each plot. How can I do that?
p<-ggplot(df, aes(x=COD_NEIGHB))+
geom_bar(stat="count", width=0.3, fill="steelblue")+
theme_minimal()
# histogram of the strata in the whole dataset
s<-ggplot(data = df, mapping = aes(x = COD_NEIGHB)) +
geom_bar(stat="count", width=0.3, fill="steelblue")+
facet_wrap(~ fold)
plot_grid(p, s, ncol=2,label_size = 2)
After that, I followed the suggestion:
df$fold <- as.character(df$fold)
# Duplicate data. Set category in the duplicated dataset to "all"
df_all <- df
df_all$fold <- "all"
# Row bind the datasets
df_all <- rbind(df, df_all)
ggplot(df_all, aes(x=COD_NEIGHB)) +
geom_bar(stat="count", width=0.3, fill="steelblue")+
facet_wrap(~fold)
But now the problem is the scale: the y-axis of each facet has to be on its proper scale.
Any ideas?
Thanks in advance!!!!
If I got you right, you want a plot with facets by category plus an additional facet showing the total data. One option to achieve this is to duplicate your dataset to add an additional category "all".
As no example data was provided I make use of mtcars to show you the basic idea:
library(ggplot2)
mtcars$cyl <- as.character(mtcars$cyl)
# Duplicate data. Set category in the duplicated dataset to "all"
mtcars_all <- mtcars
mtcars_all$cyl <- "all"
# Row bind the datasets
mtcars_all <- rbind(mtcars, mtcars_all)
ggplot(mtcars_all, aes(hp, mpg)) +
geom_point() +
facet_wrap(~cyl)
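For the bar-chart case in the question, the "all" facet will have much taller bars than the individual facets. One possible way to handle that, sketched here on a copy of mtcars (the column choices are only for illustration), is to let each facet have its own y-axis with scales = "free_y":

```r
library(ggplot2)

# Work on a copy so the built-in mtcars is left untouched
cars <- mtcars
cars$cyl <- as.character(cars$cyl)

# Duplicate the data and label the copy "all"
cars_all <- cars
cars_all$cyl <- "all"
cars_both <- rbind(cars, cars_all)

# Each facet gets its own y-axis, so the "all" panel does not
# squash the bars in the per-category panels
p_free <- ggplot(cars_both, aes(x = factor(gear))) +
  geom_bar(width = 0.3, fill = "steelblue") +
  facet_wrap(~cyl, scales = "free_y")
p_free
```

The trade-off is that bar heights are no longer directly comparable across panels, so the axis labels need to be read carefully.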
Another useful tool is the ggarrange() function from the ggpubr package. You can arrange multiple plots on one page or across multiple pages. You can also create a single common legend once you merge all your plots together.
Similar to previous answers, I used mtcars to demonstrate a simple use case:
#install.packages("ggpubr")
library(ggplot2)
library(ggpubr)
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point() +
theme_minimal()
p2 <- ggplot(mtcars, aes(y = mpg, x = factor(cyl))) + # factor() gives one box per cyl value
geom_boxplot() +
theme_minimal()
ggarrange(p1, p2, ncol = 2)
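A sketch of the common-legend and per-plot-title points, assuming ggpubr is installed. The titles and the colour mapping are made up for illustration; common.legend and legend are arguments of ggarrange().

```r
library(ggplot2)
library(ggpubr)

# Both plots map colour to the same variable so one legend can serve both
pa <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(title = "Weight vs. mpg", colour = "cyl") +
  theme_minimal()

pb <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, colour = factor(cyl))) +
  geom_boxplot() +
  labs(title = "mpg by cylinders", x = "cyl", colour = "cyl") +
  theme_minimal()

# Arrange side by side with a single shared legend below the plots
combined <- ggarrange(pa, pb, ncol = 2, common.legend = TRUE, legend = "bottom")
combined
```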

Problem when trying to plot two histograms using fill aesthetic

I've been trying to plot two histograms by using the fill aesthetic and a specific column with two levels. However, instead of displaying both desired histograms, my code displays one histogram with the whole data and another only for the second classification. I don't know whether there is a problem in my syntax or whether this is some kind of tricky issue.
library(tidyverse)
db1 <- data.frame(type=rep("A",100),val=rnorm(n=100,mean=50,sd=10))
db2 <- data.frame(type=rep("B",150),val=rnorm(n=150,mean=50,sd=10))
dbf <- bind_rows(db1,db2)
P1 <- ggplot(db1, aes(x=val)) + geom_histogram()
P2 <- ggplot(db2, aes(x=val)) + geom_histogram()
PF <- ggplot(dbf, aes(x=val)) + geom_histogram()
I want to get this, P1 and P2 overlaid:
ggplot(db1, aes(x=val)) + geom_histogram(fill="red", alpha=0.5) + geom_histogram(data=db2, aes(x=val),fill="green", alpha=0.5)
What I want
But this is the code I think should work, P1 and P2 with the fill aesthetic mapped to column type:
ggplot(dbf, aes(x=val)) + geom_histogram(aes(fill=type), alpha=0.5)
My code
It produces the combination of PF and P2, as if I had written:
ggplot(dbf, aes(x=val)) + geom_histogram(fill="red", alpha=0.5) + geom_histogram(data=db2, aes(x=val),fill="green", alpha=0.5)
What I get
Any help or idea will be highly appreciated!
All you need is to pass position = "identity" to your geom_histogram function.
library(tidyverse)
library(ggplot2)
db1 <- data.frame(type=rep("A",100),val=rnorm(n=100,mean=50,sd=10))
db2 <- data.frame(type=rep("B",150),val=rnorm(n=150,mean=50,sd=10))
dbf <- bind_rows(db1,db2)
ggplot(dbf, aes(x=val, fill = type)) + geom_histogram(alpha=0.5, position = "identity")
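The reason this works is that geom_histogram() stacks groups by default (position = "stack"), so one group's bars sit on top of the other's and the outline looks like the histogram of the whole data. A small sketch to check this, using layer_data() to inspect where the bars start (the data here are simulated the same way as in the question):

```r
library(ggplot2)

set.seed(42)
dbf <- rbind(
  data.frame(type = "A", val = rnorm(100, mean = 50, sd = 10)),
  data.frame(type = "B", val = rnorm(150, mean = 50, sd = 10))
)

# Default position is "stack": bars of one group start where the other ends
p_stack <- ggplot(dbf, aes(x = val, fill = type)) +
  geom_histogram(bins = 30, alpha = 0.5)

# position = "identity": every bar starts at zero, so the groups overlap
p_ident <- ggplot(dbf, aes(x = val, fill = type)) +
  geom_histogram(bins = 30, alpha = 0.5, position = "identity")

stacked  <- layer_data(p_stack)
overlaid <- layer_data(p_ident)

any(stacked$ymin > 0)    # some stacked bars start above zero
all(overlaid$ymin == 0)  # overlaid bars all start at zero
```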
Is your goal to show the overlap via the color combination? I'm not sure how to force geom_histogram to show the overlap, but geom_density does do what you want. You can play with the bandwidth (bw) to show more or less detail.
dbf %>% ggplot() +
aes(x = val, fill = type) +
geom_density(alpha = .5, bw = .5) +
scale_fill_manual(values = c("red","green"))

How to adjust the distance between the facet_grid frame and boxplots using ggplot_build & ggplot_gtable

We are presenting outcome data using boxplots and group these for different approaches using facet_grid with ggplot2 and geom_boxplot.
We would like to add more space between the boxplots and the frame of the facet_grid as shown in the graphic below.
The code we used included ggplot_build and ggplot_gtable.
Which parameter of ggplot_build needs to be set to get more space in the panels?
require(ggplot2)
require(grid)
dat <- rbind(data.frame(approach=1,product=1,value=seq(1,20,0.5)),
data.frame(approach=1,product=2,value=seq(5,15,0.3)),
data.frame(approach=1,product=3,value=seq(5,17,0.2)),
data.frame(approach=2,product=1,value=seq(1,13,0.3)),
data.frame(approach=2,product=2,value=seq(3,18,0.5)),
data.frame(approach=2,product=3,value=seq(4,25,0.7)),
data.frame(approach=3,product=1,value=seq(1,15,0.6)),
data.frame(approach=3,product=2,value=seq(3,16,0.5)),
data.frame(approach=3,product=3,value=seq(1,10,0.1)))
dat$product<-as.factor(dat$product)
gg1<-ggplot(dat, aes(x =product, y = value)) +
geom_boxplot() +
facet_grid(cols=vars(approach))
gt = ggplot_gtable(ggplot_build(gg1))
grid.draw(gt)
ggplot(dat, aes(x =product, y = value)) +
geom_boxplot() +
coord_cartesian(xlim = c(0.2, 3.8)) + # xlim takes two values (min, max); these are assumed limits around the discrete positions 1, 2, 3
facet_grid(cols=vars(approach))
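Another option that avoids ggplot_build entirely is to pad the discrete x scale. This is a sketch with made-up data in the same shape as the question's; expansion() assumes ggplot2 3.3.0 or later, and the amount of padding (add = 1) is an arbitrary choice to adjust to taste.

```r
library(ggplot2)

# Hypothetical data with the same columns as in the question
set.seed(1)
dat <- expand.grid(approach = 1:3, product = factor(1:3), rep = 1:20)
dat$value <- rnorm(nrow(dat), mean = 10, sd = 3)

p_pad <- ggplot(dat, aes(x = product, y = value)) +
  geom_boxplot() +
  # add one extra discrete unit of empty space on each side of every panel
  scale_x_discrete(expand = expansion(add = 1)) +
  facet_grid(cols = vars(approach))
p_pad
```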

How to use scale from previous plot in current plot with ggplot2?

I am using ggplot2 to produce a plot that has 3 facets. Because I am comparing two different data sets, I would like to then be able to plot a second data set using the same y scale for the facets as in the first plot. However, I cannot find a simple way to save the settings of the first plot to then re-use them with the second plot. Since each facet has its own y scale, it will be a pain to specify them by hand for the second plot. Does anyone know of a quick way of re-using scales? To make this concrete, here is how I am generating first my plot:
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p + facet_wrap(~ cyl, scales = "free_y")
EDIT
When applying one of the suggestions below, I found out that my problem was more specific than described in the original post, and it had to do specifically with the scaling of the error bars. Concretely, the error bars look weird when I rescale the second plot as suggested. Does anyone have any suggestions on how to keep the same scale for both plots and still display the error bars correctly? I am attaching an example below for concreteness:
#Create sample data
d1 <- data.frame(fixtype=c('ff','ff','fp','fp'), detype=c('det','pro','det','pro'),
diffscore=c(-1,-15,3,-17),se=c(2,3,1,2))
d2 <- data.frame(fixtype=c('ff','ff','fp','fp'), detype=c('det','pro','det','pro'),
diffscore=c(-1,-3,-2,-1),se=c(4,3,5,3))
#Plot for data frame 1, this is the scale I want to keep
lim_d1 <- aes(ymax = diffscore + se, ymin=diffscore - se)
ggplot(d1, aes(colour=detype, y=diffscore, x=detype)) +
geom_point(aes(size=1), shape=15) +
geom_errorbar(lim_d1, width=0.2,size=1) +
facet_wrap(~fixtype, nrow=2, ncol=2, scales = "free_y")
#Plot for data frame 2 original scale
lim_d2 <- aes(ymax = diffscore + se, ymin=diffscore - se)
ggplot(d2, aes(colour=detype, y=diffscore, x=detype)) +
geom_point(aes(size=1), shape=15) +
geom_errorbar(lim_d2, width=0.2,size=1) +
facet_wrap(~fixtype, nrow=2, ncol=2, scales = "free_y")
#Plot for data frame 2 adjusted scale. This is where things go wrong!
#As suggested below, first I plot the first plot, then I draw a blank screen and try
#to plot the second data frame on top.
lim_d2 <- aes(ymax = diffscore + se, ymin=diffscore - se)
ggplot(d1, aes(colour=detype, y=diffscore, x=detype)) +
geom_blank() +
geom_point(data=d2, aes(size=1), shape=15) +
geom_errorbar(lim_d2, width=0.2,size=1) +
facet_wrap(~fixtype, nrow=2, ncol=2, scales = "free_y")
#If the error bars are fixed, by adding data=d2 to geom_errorbar(), then
#the error bars are displayed correctly but the scale gets distorted again
lim_d2 <- aes(ymax = diffscore + se, ymin=diffscore - se)
ggplot(d1, aes(colour=detype, y=diffscore, x=detype)) +
geom_blank() +
geom_point(data=d2, aes(size=1), shape=15) +
geom_errorbar(data=d2,lim_d2, width=0.2,size=1) +
facet_wrap(~fixtype, nrow=2, ncol=2, scales = "free_y")
You may first call ggplot on your original data where you add a geom_blank as a first layer. This sets up a plot area, with axes and legends based on the data provided in ggplot.
Then add geoms which use data other than the original data. In the example, I use a simple subset of the original data.
From ?geom_blank: "The blank geom draws nothing, but can be a useful way of ensuring common scales between different plots.".
ggplot(data = mtcars, aes(mpg, wt)) +
geom_blank() +
geom_point(data = subset(mtcars, wt < 3)) +
facet_wrap(~ cyl, scales = "free_y")
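For the error-bar case in the question's edit, one possible variation is to build the plot on d2 and give the blank layer d1's data together with its ymin and ymax, so the error-bar extents of d1 are included when the scales are trained. This is a sketch, not a tested fix for the asker's real data. Note that the resulting scale covers the union of both data sets, so it will be wider than d1's own scale wherever d2 extends beyond it.

```r
library(ggplot2)

# Sample data from the question
d1 <- data.frame(fixtype = c('ff', 'ff', 'fp', 'fp'),
                 detype = c('det', 'pro', 'det', 'pro'),
                 diffscore = c(-1, -15, 3, -17), se = c(2, 3, 1, 2))
d2 <- data.frame(fixtype = c('ff', 'ff', 'fp', 'fp'),
                 detype = c('det', 'pro', 'det', 'pro'),
                 diffscore = c(-1, -3, -2, -1), se = c(4, 3, 5, 3))

p_same <- ggplot(d2, aes(x = detype, y = diffscore, colour = detype)) +
  # invisible layer: trains the y scale on d1's points and error-bar ends
  geom_blank(data = d1, aes(ymin = diffscore - se, ymax = diffscore + se)) +
  # visible layers: drawn from d2, the plot's default data
  geom_point(shape = 15, size = 3) +
  geom_errorbar(aes(ymin = diffscore - se, ymax = diffscore + se), width = 0.2) +
  facet_wrap(~fixtype, nrow = 2, scales = "free_y")
p_same
```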
Here is an ugly hack that assumes you have an identical facetting layout in both plots.
It replaces the panel element of the ggplot build.
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p1 <- p + facet_wrap(~ cyl, scales = "free_y") + labs(title = 'original')
# create "other" data.frame
n <- nrow(mtcars)
set.seed(201405)
mtcars2 <- mtcars[sample(seq_len(n ),n-15),]
# create this second plot
p2 <- p1 %+% mtcars2 + labs(title = 'new data')
# and a copy so we can attempt to fix
p3 <- p2 + labs(title = 'new data original scale')
# use ggplot_build to construct the plots for rendering
p1b <- ggplot_build(p1)
p3b <- ggplot_build(p3)
# replace the 'panel' information in plot 2 with that
# from plot 1 (note: newer ggplot2 versions store this under 'layout', not 'panel')
p3b[['panel']] <- p1b[['panel']]
# render the revised plot
# for comparison
library(gridExtra)
grid.arrange(p1 , p2, ggplot_gtable(p3b))
