How to create facets based on a column in a second data frame - r

I want to create a graph that looks like:
Now I found the cowplot package which gave me a quite similar result.
library(ggplot2)
library(cowplot)
library(data.table)
library(ggridges)
d = data.table(iris)
a = ggplot(data = d, aes(x=Sepal.Length, y=..count..)) +
geom_density_line() +
geom_density_line(data = d[Species == "virginica"], aes(), fill="lightblue", color="darkblue") +
theme_bw()
b = ggplot(data = d, aes(x=Sepal.Length, y=..count..)) +
geom_density_line() +
geom_density_line(data = d[Species == "versicolor"], aes(), fill="lightgreen", color="darkgreen") +
theme_bw()
cowplot::plot_grid(a, b, labels=NULL)
The result looks like:
But, there are two points that bother me:
It has a y-axis in both plots
With my real data where I have up to 10 grids, the code becomes very long
I think it must be possible to use facet_grid(), facet_wrap() or something similar to achieve this. But I don't know how I can use a column from the dataframe of the second geometry to create these subsets without changing/losing the greyish background plot.

We can feed one layer a version of the data without Species, so that its density is computed from the whole data set and drawn in every facet as background context, and a second layer that keeps Species, which is mapped to fill and decides the facet each curve lands in.
library(ggplot2); library(ggridges)
ggplot(data = iris, aes(Sepal.Length, ..count..)) +
geom_density_line(data = subset(iris, select = -Species)) +
geom_density_line(aes(fill = Species)) +
facet_wrap(~Species)
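The reason this works is that a layer whose data lacks the faceting variable is repeated in every panel. A minimal sketch to check that, assuming ggplot2 3.3.0 or later (for after_stat()) and using plain geom_density() in place of ggridges::geom_density_line() so that only ggplot2 is needed; the name `background` is made up for illustration:

```r
library(ggplot2)

# Same rows as iris, but without the faceting column (hypothetical name).
background <- subset(iris, select = -Species)

p <- ggplot(iris, aes(Sepal.Length, after_stat(count))) +
  # Layer 1: no Species column, so it is drawn in every facet.
  geom_density(data = background, fill = "grey80", colour = NA) +
  # Layer 2: has Species, so each curve appears only in its own facet.
  geom_density(aes(fill = Species), alpha = 0.6) +
  facet_wrap(~Species) +
  theme_bw()

# Inspect the computed data of each layer without drawing the plot.
background_panels <- unique(layer_data(p, 1)$PANEL)
species_panels <- unique(layer_data(p, 2)$PANEL)
length(background_panels)  # number of panels the grey layer appears in
```

With ten groups the code stays the same length, and because facet_wrap() uses fixed scales by default, the y-axis is shared across the panels of each row.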

Related

How to plot plots using different datasets using ggplot2

I am trying to plot a line and a dot using ggplot2. I looked at but it assumes the same dataset is used. What I tried to do is
library(ggplot2)
df = data.frame(Credible=c(0.2, 0.3),
len=c(0, 0))
zero=data.frame(x0=0,y0=0)
ggplot(data=df, aes(x=Credible, y=len, group=1)) +
geom_line(color="red")+
geom_point()+
labs(x = "Credible", y = "")
ggplot(data=zero, aes(x=x0, y=y0, group=1)) +
geom_point(color="green")+
labs(x = "Credible", y = "")
but it generates just the second plot (the dot).
Thank you
Given the careful and reproducible way you created your question I am not just referring to the old answer as it may be harder to transfer the subsetting etc.
You initialize a new ggplot object whenever you run ggplot(...).
If you want to add a layer on top of an existing plot you have to operate on the same object, something like this:
ggplot(data=df, aes(x=Credible, y=len, group=1)) +
geom_line(color="red")+
geom_point()+
labs(x = "Credible", y = "") +
geom_point(data=zero, color="green", aes(x=x0, y=y0, group=1))
Note how the second geom_point specifies its data source and aesthetics explicitly, so that they are not inherited from the initial ggplot() call.
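A layer's own aes() overrides the plot-level mappings of the same name, while any others (here group) are still inherited. If you would rather drop the plot-level mappings entirely for one layer, geom_point() also accepts inherit.aes = FALSE. A small sketch with the data from the question; the variable name `green_layer` is only for illustration:

```r
library(ggplot2)

df <- data.frame(Credible = c(0.2, 0.3), len = c(0, 0))
zero <- data.frame(x0 = 0, y0 = 0)

p <- ggplot(df, aes(x = Credible, y = len, group = 1)) +
  geom_line(color = "red") +
  geom_point() +
  # This layer uses its own data and its own mappings only.
  geom_point(data = zero, aes(x = x0, y = y0),
             color = "green", inherit.aes = FALSE) +
  labs(x = "Credible", y = "")

# The third layer holds exactly the one point taken from `zero`.
green_layer <- layer_data(p, 3)
green_layer[, c("x", "y", "colour")]
```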

Problem when trying to plot two histograms using fill aesthetic

I've been trying to plot two histograms by using the fill aesthetic and a specific column with two levels. However, instead of displaying both desired histograms, my code displays one histogram with the whole data and another only for the second classification. I don't know whether there is a problem in my syntax or whether this is some kind of tricky issue.
library(tidyverse)
db1 <- data.frame(type=rep("A",100),val=rnorm(n=100,mean=50,sd=10))
db2 <- data.frame(type=rep("B",150),val=rnorm(n=150,mean=50,sd=10))
dbf <- bind_rows(db1,db2)
P1 <- ggplot(db1, aes(x=val)) + geom_histogram()
P2 <- ggplot(db2, aes(x=val)) + geom_histogram()
PF <- ggplot(dbf, aes(x=val)) + geom_histogram()
I want to get this, P1 and P2
ggplot(db1, aes(x=val)) + geom_histogram(fill="red", alpha=0.5) + geom_histogram(data=db2, aes(x=val),fill="green", alpha=0.5)
What I want
But the code I think should work, P1 and P2 with the fill aesthetic for column type
ggplot(dbf, aes(x=val)) + geom_histogram(aes(fill=type), alpha=0.5)
My code
Produces the combination of PF and P2
ggplot(dbf, aes(x=val)) + geom_histogram(fill="red", alpha=0.5) + geom_histogram(data=db2, aes(x=val),fill="green", alpha=0.5)
What I get
Any help or idea will be highly appreciated!
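All you need is to pass position = "identity" to geom_histogram. The default, position = "stack", piles the bars of the two groups on top of each other, which is why the outline looks like the histogram of the whole data.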
library(tidyverse)
library(ggplot2)
db1 <- data.frame(type=rep("A",100),val=rnorm(n=100,mean=50,sd=10))
db2 <- data.frame(type=rep("B",150),val=rnorm(n=150,mean=50,sd=10))
dbf <- bind_rows(db1,db2)
ggplot(dbf, aes(x=val, fill = type)) + geom_histogram(alpha=0.5, position = "identity")
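To see the difference between the two positions without looking at the picture, one can compare where the bars start. This is a sketch on simulated data like the question's; the seed and the bins = 30 setting are arbitrary choices:

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only to make the sketch reproducible
dbf <- rbind(
  data.frame(type = "A", val = rnorm(100, mean = 50, sd = 10)),
  data.frame(type = "B", val = rnorm(150, mean = 50, sd = 10))
)

base <- ggplot(dbf, aes(x = val, fill = type))
# Default position is "stack": one group's bars sit on top of the other's.
p_stack <- base + geom_histogram(alpha = 0.5, bins = 30)
# position = "identity": every bar starts at zero, so the groups overlap.
p_identity <- base + geom_histogram(alpha = 0.5, bins = 30, position = "identity")

stacked <- layer_data(p_stack)
overlaid <- layer_data(p_identity)

any(stacked$ymin > 0)    # are some stacked bars lifted off the axis?
all(overlaid$ymin == 0)  # do all overlaid bars start at the axis?
```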
Is your goal to show the overlap via the color combination? I'm not sure how to force geom_histogram to show the overlap, but geom_density does do what you want. You can play with the bandwidth (bw) to show more or less detail.
dbf %>% ggplot() +
aes(x = val, fill = type) +
geom_density(alpha = .5, bw = .5) +
scale_fill_manual(values = c("red","green"))

How to format the scatterplots of data series in R

I have been struggling to create a decent-looking scatterplot in R. I wouldn't have thought it would be so difficult.
After some research, it seemed to me that ggplot would be a good choice, since it allows plenty of formatting. However, I'm struggling to understand how it works.
I'd like to create a scatterplot of two data series, displaying the points with two different colours, and perhaps different shapes, and a legend with series names.
Here is my attempt, based on this:
year1 <- mpg[which(mpg$year==1999),]
year2 <- mpg[which(mpg$year==2008),]
ggplot() +
geom_point(data = year1, aes(x=cty,y=hwy,color="yellow")) +
geom_point(data = year2, aes(x=cty,y=hwy,color="green")) +
xlab('cty') +
ylab('hwy')
Now, this looks almost OK, but with non-matching colors (unless I suddenly became color-blind). Why is that?
Also, how can I add series names and change symbol shapes?
Don't build 2 different dataframes:
df <- mpg[which(mpg$year%in%c(1999,2008)),]
df$year<-as.factor(df$year)
ggplot() +
geom_point(data = df, aes(x=cty,y=hwy,color=year,shape=year)) +
xlab('cty') +
ylab('hwy')+
scale_color_manual(values=c("green","yellow"))+
scale_shape_manual(values=c(2,8))+
guides(colour = guide_legend("Year"),
shape = guide_legend("Year"))
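This will work with the way you currently have it set up. A colour name inside aes() is treated as a data value and mapped through the default colour scale, so set literal colours and shapes outside aes():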
ggplot() +
geom_point(data = year1, aes(x=cty,y=hwy), col = "yellow", shape=1) +
geom_point(data = year2, aes(x=cty,y=hwy), col="green", shape=2) +
xlab('cty') +
ylab('hwy')
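The non-matching colours in the question come from mapping versus setting. A short sketch that compares the colours ggplot2 actually computes in each case; the names `p_mapped` and `p_set` are just for illustration:

```r
library(ggplot2)

year1 <- mpg[mpg$year == 1999, ]

# Inside aes(): "yellow" is a constant data value, mapped by the colour scale.
p_mapped <- ggplot() +
  geom_point(data = year1, aes(x = cty, y = hwy, color = "yellow"))

# Outside aes(): "yellow" is used literally as the point colour.
p_set <- ggplot() +
  geom_point(data = year1, aes(x = cty, y = hwy), color = "yellow")

mapped_colours <- unique(layer_data(p_mapped)$colour)
set_colours <- unique(layer_data(p_set)$colour)

mapped_colours  # a colour from the default palette, not yellow
set_colours
```

The mapped version also gets a legend whose key is labelled "yellow", which is why the legend text and the drawn colour disagree.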
You want:
library(ggplot2)
ggplot(mpg, aes(cty, hwy, color=as.factor(year)))+geom_point()

How to plot two histograms on the same axis scale?

I have two dataframes: dataf1, dataf2. They have the same structure and columns.
The three column names are A, B and C, and both data frames have 50 rows.
I would like to plot the histogram of column B on dataf1 and dataf2. I can plot two histograms separately but they are not of the same scale. I would like to know how to either put them on the same histogram using different colors or plot two histograms of the same scale?
ggplot() + aes(dataf1$B)+ geom_histogram(binwidth=1, colour="black",fill="white")
ggplot() + aes(dataf2$B)+ geom_histogram(binwidth=1, colour="black", fill="white")
Combine your data into a single data frame with a new column marking which data frame the data originally came from. Then use that new column for the fill aesthetic for your plot.
dataf1$source="Data 1"
dataf2$source="Data 2"
dat_combined = rbind(dataf1, dataf2)
You haven't provided sample data, so here are a few examples of possible plots, using the built-in iris data frame. In the plots below, dat is analogous to dat_combined, Petal.Width is analogous to B, and Species is analogous to source.
dat = subset(iris, Species != "setosa") # We want just two species
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", alpha=0.5, binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="dodge", binwidth=0.1)
ggplot(dat, aes(Petal.Width, fill=Species)) +
geom_histogram(position="identity", colour="grey40", binwidth=0.1) +
facet_grid(Species ~ .)
As Zheyuan says, you just need to set the y limits for each plot to get them on the same scale. With ggplot2, one way to do this is with the lims command (though scale_y_continuous and coord_cartesian also work, albeit slightly differently). You also should never use data$column inside aes(). Instead, use the data argument for the data frame and unquoted column names inside aes(). Here's an example with some built-in data.
p1 = ggplot(mtcars, aes(x = mpg)) + geom_histogram() + lims(y = c(0, 13))
p2 = ggplot(iris, aes(x = Sepal.Length)) + geom_histogram() + lims(y = c(0, 13))
gridExtra::grid.arrange(p1, p2, nrow = 1)
To get two histograms on the same plot, the best way is to combine your data frames. A guess, without seeing what your data looks like:
dataf = rbind(dataf1["B"], dataf2["B"])
dataf$source = c(rep("f1", nrow(dataf1)), rep("f2", nrow(dataf2)))
ggplot(dataf, aes(x = B, fill = source)) +
geom_histogram(position = "identity", alpha = 0.7)
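If separate panels on a common scale are preferred over overlapping bars, faceting the combined data frame is another option, since facet_wrap() shares both axes by default. A sketch on simulated stand-ins for dataf1 and dataf2 (the real data was not posted, so the values here are invented purely for illustration):

```r
library(ggplot2)

set.seed(1)  # arbitrary seed for reproducibility
# Simulated stand-ins with the structure described in the question.
dataf1 <- data.frame(A = 1:50, B = rpois(50, 5), C = runif(50))
dataf2 <- data.frame(A = 1:50, B = rpois(50, 8), C = runif(50))

# Tag each row with its origin, then stack the two data frames.
dataf <- rbind(cbind(dataf1, source = "f1"),
               cbind(dataf2, source = "f2"))

# One panel per source; x and y scales are fixed across panels by default.
p <- ggplot(dataf, aes(x = B, fill = source)) +
  geom_histogram(binwidth = 1, colour = "black") +
  facet_wrap(~source, ncol = 1)

panels <- unique(layer_data(p)$PANEL)
length(panels)
```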

ggplot2: plotting order of factors within a geom

I have a (dense) dataset that consist of 5 groups, so my data.frame looks something like x,y,group. I can plot this data and colour the points based on their group using:
p= ggplot(dataset, aes(x,y))
p = p + geom_point(aes(colour = group))
My only remaining problem is that I want to control which group is on top. At the moment this looks like it is decided randomly (at least I can't figure out what puts a dot on top). Is there any way in ggplot2 to tell geom_point what the order of dots should be?
The order aesthetic is probably what you want.
library(ggplot2)
d <- ggplot(diamonds, aes(carat, price, colour = cut))
d + geom_point()
dev.new()
d + geom_point(aes(order = sample(seq_along(carat))))
The documentation is at ?aes_group_order
When you create the factor variable, you can influence the ordering using the levels parameter
f = factor(c('one', 'two'), levels = c('one', 'two'))
dataset = data.frame(x=1:2, y=1:2, group=f)
p = ggplot(dataset, aes(x,y))
p = p + geom_point(aes(colour = group))
Now, ggplot uses this order for the legend.
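As far as I know, the order aesthetic was deprecated in ggplot2 2.0.0, and geom_point draws points in the row order of the data, so later rows end up on top. Under that assumption, a sketch that moves one group to the end of the data frame; the data is simulated and the names `top_group` and `reordered` are made up for illustration:

```r
library(ggplot2)

set.seed(1)  # arbitrary seed for reproducibility
# Simulated dense data with three groups (illustrative only).
dataset <- data.frame(
  x = rnorm(300),
  y = rnorm(300),
  group = factor(sample(c("a", "b", "c"), 300, replace = TRUE))
)

top_group <- "b"  # the group that should be drawn last, i.e. on top

# order() on a logical puts FALSE before TRUE, so top_group rows go last.
reordered <- dataset[order(dataset$group == top_group), ]

p <- ggplot(reordered, aes(x, y)) +
  geom_point(aes(colour = group), size = 3)

tail(reordered$group, 1)
```

The factor levels, and therefore the legend order and the colours, are unchanged by this; only the drawing order differs.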
