Stratifying a density plot by different groups using ggplot2 in R

I have a data frame in R called x that has hundreds of rows. Each row is a person. I have two variables, Height, which is continuous, and Country, which is a factor. I want to plot a smoothed histogram of all of the heights of the individuals. I want to stratify it by Country. I know that I can do that with the following code:
library(ggplot2)
ggplot(x, aes(x=Height, colour = (Country == "USA"))) + geom_density()
This plots everyone from the USA as one color (true) and everyone from any other country as the other color (false). However, what I would really like to do is plot everyone from the USA in one color and everyone from Oman, Nigeria, and Switzerland as the other color. How would I adapt my code to do this?

I made up some data for illustration:
head(iris)
table(iris$Species)
df <- iris
df$Species2 <- ifelse(df$Species == "setosa", "blue",
ifelse(df$Species == "virginica", "red", ""))
library(ggplot2)
p <- ggplot(df, aes(x = Sepal.Length, colour = (Species == "setosa")))
p + geom_density() # Your example
# Now let's choose the other created column
p <- ggplot(df, aes(x = Sepal.Length, colour = Species2))
p + geom_density() + facet_wrap(~Species2)
Edit: to get rid of the "countries" that you don't want in the plot, just subset them out of the data frame you pass to ggplot. (Note that the legend labels are simply the values of Species2, so they don't exactly match the colours drawn; that can be changed by renaming the values in the data frame itself.)
p <- ggplot(df[df$Species2 %in% c("blue", "red"),], aes(x = Sepal.Length, colour = Species2))
p + geom_density() + facet_wrap(~Species2)
And to overlay the lines just take out the facet_wrap:
p + geom_density()
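Applied to the original question, the same idea might look like the sketch below. The data frame x is simulated here because the real data was not posted, and the column name Group is a hypothetical addition. Countries outside the four of interest get NA and are dropped before plotting.

```r
library(ggplot2)

# Simulated stand-in for the asker's data frame (assumption: real data not shown)
set.seed(1)
x <- data.frame(
  Height  = rnorm(300, mean = 170, sd = 10),
  Country = factor(sample(c("USA", "Oman", "Nigeria", "Switzerland", "Japan"),
                          300, replace = TRUE))
)

# Hypothetical grouping column: USA vs. the three named countries, NA otherwise
x$Group <- ifelse(x$Country == "USA", "USA",
           ifelse(x$Country %in% c("Oman", "Nigeria", "Switzerland"),
                  "Oman/Nigeria/Switzerland", NA))

# Keep only the rows that belong to one of the two groups
x_sub <- x[!is.na(x$Group), ]

# One density curve per group, overlaid in a single panel
p <- ggplot(x_sub, aes(x = Height, colour = Group)) + geom_density()
p
```

Because Group is an ordinary column, the legend shows the group names rather than TRUE/FALSE.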

I enjoyed working through the excellent answer above. Here are my mods.
df <- iris
df$Species2 <- ifelse(df$Species == "setosa", "blue",
ifelse(df$Species == "virginica", "red", ""))
homes2006 <- df
names(homes2006)[names(homes2006)=="Species"] <- "ownership"
homes2006a <- as.data.frame(sapply(homes2006, gsub,
pattern ="setosa", replacement = "renters"))
homes2006b <- as.data.frame(sapply(homes2006a, gsub, pattern = "virginica",
replacement = "home-owners"))
homes2006c <- as.data.frame(sapply(homes2006b, gsub, pattern = "versicolor",
replacement = "home-owners"))
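## sapply() + gsub() coerces every column to character (or factor in R < 4.0),
## so go through as.character() to recover the original numeric values
homes2006c[,1] <- as.numeric(as.character(homes2006c[,1]))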
library(ggplot2)
p <- ggplot(homes2006c, aes(x = Sepal.Length,
colour = (ownership == "home-owners")))
p + ylab("number of households") +
xlab("monthly income (NIS)") +
ggtitle("income distribution by home ownership") +
geom_density()
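A lighter alternative, offered as a sketch rather than as the answerer's method, is to recode only the Species column. No other column is touched, so Sepal.Length stays numeric and needs no conversion afterwards. The name homes is a hypothetical stand-in for the homes2006 objects above, and the y label is changed to "density" because that is what geom_density() draws.

```r
library(ggplot2)

homes <- iris

# Recode a single column; the numeric columns are left untouched
homes$ownership <- ifelse(homes$Species == "setosa", "renters", "home-owners")

p <- ggplot(homes, aes(x = Sepal.Length, colour = ownership)) +
  geom_density() +
  xlab("monthly income (NIS)") +
  ylab("density") +
  ggtitle("income distribution by home ownership")
p
```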

Related

Two ggplot with subset in pipe

I would like to plot two lines in one plot (both have the same axes), but one of the lines uses only a subset of the values in the data frame.
I tried this:
DF%>% ggplot(subset(., Cars == "A"), aes(Dates, sold_A)) +geom_line()+ ggplot(., (Dates, sold_ALL))
but this error occurred
object '.' not found
(1) You can't add a ggplot object to a ggplot object.
(2) Try taking the subset out of the call to ggplot.
DF %>%
subset(Cars == "A") %>%
ggplot(aes(Dates, sold_A)) +
geom_line() +
geom_line(data = DF, aes(Dates, sold_ALL))
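For a version that runs on its own, the sketch below invents a small data set. The sold_ALL column and all of the values are assumptions, since the original DF was not posted. Here sold_ALL is a per-day total that is repeated on both cars' rows.

```r
library(ggplot2)
library(dplyr)

# Hypothetical data: 20 days, two cars, plus an overall total per day
set.seed(1)
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 20)
total <- rpois(20, 50)
DF <- data.frame(
  Dates    = rep(dates, 2),
  Cars     = rep(c("A", "B"), each = 20),
  sold_A   = rpois(40, 20),
  sold_ALL = rep(total, 2)
)

p <- DF %>%
  subset(Cars == "A") %>%          # the subset becomes the plot's default data
  ggplot(aes(Dates, sold_A)) +
  geom_line() +
  # this layer overrides the default data with the full data frame
  geom_line(data = DF, aes(Dates, sold_ALL), colour = "blue")
p
```

The first layer inherits the subset from ggplot(), while the second layer supplies its own data argument, so the two lines come from different rows of the same data frame.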
I think you are misunderstanding how ggplot works. If we are attempting to do it your way, we could do:
DF %>% {ggplot(subset(., Cars == "A"), aes(Dates, sold_A)) +
geom_line(colour = "red") +
geom_line(data = subset(., Cars == "B"), colour = "blue") +
lims(y = c(0, 60))}
But it would be easier and better to map the variable Cars to the colour aesthetic, so your plot would be as simple as:
DF %>% ggplot(aes(Dates, sold_A, color = Cars)) + geom_line() + lims(y = c(0, 60))
Note that as well as being simpler code, we get the legend for free.
Data
Obviously, we didn't have your data for this question, but here is a constructed data set with the same name and same column variables:
set.seed(1)
Dates <- rep(seq(as.Date("2020-01-01"), by = "day", length = 20), 2)
Cars <- rep(c("A", "B"), each = 20)
sold_A <- rpois(40, rep(c(20, 40), each = 20))
DF <- data.frame(Dates, Cars, sold_A)
If you want only one plot, you need to remove ggplot(., aes(Dates, sold_ALL)) and move its data and mapping directly into a layer such as geom_line(data=., aes(Dates, sold_ALL)). Then use the sage advice from @MrFlick. Here is an example using the iris data:
library(ggplot2)
library(dplyr)
#Example
iris %>%
{ggplot(subset(., Species == "setosa"), aes(Sepal.Length, Sepal.Width)) +
geom_point()+
geom_point(data=.,aes(Petal.Length, Petal.Width),color='blue')}
The call ggplot(., aes(Dates, sold_ALL)) creates a new canvas and a new plot rather than adding a layer to the existing one.
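The "object '.' not found" error itself comes from operator precedence: %>% binds more tightly than +, so anything after the first + is evaluated outside the pipe, where no "." exists. The sketch below shows the same effect with plain numbers instead of plots; it assumes only that magrittr (which provides %>%) is installed.

```r
library(magrittr)

# %>% binds more tightly than +, so length(.) sits outside the pipe and R
# looks for a variable named "." in the calling environment.
res <- tryCatch(c(1, 2, 3) %>% sum() + length(.), error = function(e) e)
inherits(res, "error")

# Braces make the whole expression the right-hand side of the pipe,
# so "." is available everywhere inside them.
total <- c(1, 2, 3) %>% { sum(.) + length(.) }
total
```

This is why the answers that keep "." inside the plot wrap the entire ggplot expression in braces.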

Density over histogram using ggplot2

I have a "long" format data frame which contains two columns: the first holds the values and the second holds the sex [Male = 1, Female = 2]. I wrote some code to make a histogram of the entire dataset (code below).
ggplot(kz6, aes(x = values)) +
geom_histogram()
However, I also want to add a density over the histogram to emphasize the difference between the sexes, i.e. I want to combine 3 plots: a histogram for the entire dataset and 2 density plots, one for each sex. I tried to follow several examples, but it still does not work. The code for the density alone works, while the combinations of histogram + density do not.
density <- ggplot(kz6, aes(x = x, fill = factor(sex))) +
geom_density()
both <- ggplot(kz6, aes(x = values)) +
geom_histogram() +
geom_density()
both_2 <- ggplot(kz6, aes(x = values)) +
geom_histogram() +
geom_density(aes(x = kz6[kz6$sex == 1,]))
P.S. Some examples contain y=..density... What does it mean? How should I interpret it?
To plot a histogram and superimpose two densities, defined by a categorical variable, use appropriate aesthetics in the call to geom_density, like group or colour.
ggplot(kz6, aes(x = values)) +
# after_stat(density) is the current spelling of ..density.. (ggplot2 >= 3.3.0)
geom_histogram(aes(y = after_stat(density)), bins = 20) +
geom_density(aes(group = sex, colour = sex), adjust = 2)
Data creation code.
I will create a test data set from built-in data set iris.
kz6 <- iris[iris$Species != "virginica", 4:5]
kz6$sex <- "M"
kz6$sex[kz6$Species == "versicolor"] <- "F"
kz6$Species <- NULL
names(kz6)[1] <- "values"
head(kz6)
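On the P.S.: y = ..density.. asks the histogram to plot density instead of counts, meaning each bar's height is its count divided by (total count × bin width), so the bar areas sum to 1 and sit on the same scale as a density curve. The sketch below checks that arithmetic with base R's hist() on simulated values; it illustrates the scaling and is not a reproduction of ggplot2's internals.

```r
# Simulated values (assumption: any numeric vector would do)
set.seed(42)
values <- rnorm(200, mean = 170, sd = 10)

h <- hist(values, breaks = 20, plot = FALSE)
bin_width <- diff(h$breaks)

# Counts add up to the number of observations
n_obs <- sum(h$counts)

# Density = count / (n * bin width)
manual_density <- h$counts / (n_obs * bin_width)

# Bar areas (height * width) add up to 1, like the area under a density curve
total_area <- sum(h$density * bin_width)

n_obs
total_area
all.equal(h$density, manual_density)
```

Without this rescaling the histogram's y axis is in counts, which is usually far larger than the density curve's range, so the curve looks flat along the bottom of the plot.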

How do I facet by geom / layer in ggplot2?

I'm hoping to recreate the gridExtra output below with ggplot's facet_grid, but I'm unsure of what variable ggplot identifies with the layers in the plot. In this example, there are two geoms...
require(tidyverse)
a <- ggplot(mpg)
b <- geom_point(aes(displ, cyl, color = drv))
c <- geom_smooth(aes(displ, cyl, color = drv))
d <- a + b + c
# output below
gridExtra::grid.arrange(
a + b,
a + c,
ncol = 2
)
# Equivalent with gg's facet_grid
# needs a categorical var to iter over...
d$layers
#d + facet_grid(. ~ d$layers??)
The gridExtra output that I'm hoping to recreate is:
A hacky way of doing this is to make as many copies of the existing data frame as you need facets, tagging each copy with a value that identifies its facet. Combine the copies (with rbind or a union) into one data frame. Then set up the ggplot and the geoms, filter each geom's data to the rows with its tag, and facet on the tag column.
This can be seen below:
df1 <- data.frame(
graph = "point_plot",
mpg
)
df2 <- data.frame(
graph = "spline_plot",
mpg
)
df <- rbind(df1, df2)
ggplot(df, mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(data = filter(df, graph == "point_plot")) +
geom_smooth(data = filter(df, graph == "spline_plot"), se=FALSE) +
facet_grid(. ~ graph)
If you really want to show different plots on different facets, one hacky way would be to make separate copies of the data and subset those...
mpg2 <- mpg %>% mutate(facet = 1) %>%
bind_rows(mpg %>% mutate(facet = 2))
ggplot(mpg2, aes(displ, cyl, color = drv)) +
geom_point(data = subset(mpg2, facet == 1)) +
geom_smooth(data = subset(mpg2, facet == 2)) +
facet_wrap(~facet)

ggplot2: create a plot using selected facets with part data

I would like to create a plot by:
(1) Using part of the data to create a base plot with facet_grid of two columns.
(2) Using the remaining part of the data to plot on top of the existing facets, but using only a single column.
The sample code:
library(ggplot2)
library(gridExtra)
library(dplyr) # needed for %>% and filter() below
df2 <- data.frame(Class=rep(c('A','B','C'),each=20),
Type=rep(rep(c('T1','T2'),each=10), 3),
X=rep(rep(1:10,each=2), 3),
Y=c(rep(seq(3,-3, length.out = 10),2),
rep(seq(1,-4, length.out = 10),2),
rep(seq(-2,-8, length.out = 10),2)))
g2 <- ggplot() + geom_line(data = df2 %>% filter(Class %in% c('B','C')),
aes(X,Y,color=Class, linetype=Type)) +
facet_grid(Type~Class)
g3 <- ggplot() + geom_line(data = df2 %>% filter(Class == 'A'),
aes(X,Y,color=Class, linetype=Type)) +
facet_wrap(~Type)
grid.arrange(g2, g3)
The output plots:
How can I include the g3 plot on the g2 plot? The resulting plot should include the two g3 lines twice, on two facets.
I assume the plot below is what you were looking for.
library(dplyr)
library(ggplot2)
df_1 <- filter(df2, Class %in% c('B','C')) %>%
dplyr::rename(Class_1 = Class)
df_2 <- filter(df2, Class == 'A')
g2 <- ggplot() +
geom_line(data = df_1,
aes(X, Y, color = Class_1, linetype = Type)) +
geom_line(data = df_2,
aes(X, Y, color = Class, linetype = Type)) +
facet_grid(Type ~ Class_1)
g2
Explanation
For tasks like this I find it better to work with two datasets. Since the variable df2$Class has three unique values (A, B and C), faceting by Type ~ Class does not give you the desired plot, because you want the data for df2$Class == "A" to be displayed in each of the facets.
That's why I renamed the variable Class in df_1 to Class_1: this variable contains only two unique values, B and C.
Faceting by Type ~ Class_1 allows you to plot the data for df2$Class == "A" on top without it being faceted by Class.
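The behaviour this relies on is that a layer whose data lacks the faceting variable is drawn in every panel. The sketch below demonstrates it on a small made-up data set; the names main and ref and all values are hypothetical, and facet_wrap is used in place of facet_grid only to keep the example short.

```r
library(ggplot2)

# Hypothetical data: two faceted classes plus one reference line
main <- data.frame(Class_1 = rep(c("B", "C"), each = 5),
                   X = rep(1:5, 2),
                   Y = c(1:5, 5:1))
ref <- data.frame(X = 1:5, Y = rep(3, 5))   # deliberately has no Class_1 column

p <- ggplot() +
  geom_line(data = main, aes(X, Y, colour = Class_1)) +
  geom_line(data = ref, aes(X, Y), linetype = "dashed") +
  facet_wrap(~ Class_1)
p

# The reference layer's rows are repeated once per panel
ref_built <- layer_data(p, 2)
table(ref_built$PANEL)
```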
Edit
Based on the comment below, here is a solution using only one dataset:
g2 + geom_line(data = filter(df2, Class == 'A')[, -1],
aes(X, Y, linetype = Type, col = "A"))
Similar / same question: ggplot2:: Facetting plot with the same reference plot in all panels

Scatterplot on top of line plot ggplot

Example data:
set.seed(245)
cond <- rep( c("control","treatment"), each=10)
xval <- round(10+ rnorm(20), 1)
yval <- round(10+ rnorm(20), 1)
df <- data.frame(cond, xval, yval)
df$xval[cond=="treatment"] <- df$xval[cond=="treatment"] + 1.5
I would like the "treatment" condition to be plotted as a line and the "control" data to be plotted as a scatter plot. So far, I have found a workaround where I specify both as lines but choose to have the control line drawn as 'blank' in scale_linetype_manual:
plot <-ggplot(data=df, aes(x=xval, y=yval, group=cond, colour=cond))+
geom_line(aes(linetype=cond))+
geom_point(aes(shape=cond))+
scale_linetype_manual(values=c('blank', 'solid'))
However, there must be a more straightforward way of plotting the control as a scatter plot and the treatment as a line plot. Eventually, I'd like to remove the geom_point from the treatment line. The way it is now, that would remove the points from the control as well, leaving me with nothing for the control.
Any insight would be helpful. Thanks.
I hope I have understood you correctly. You may use a "treatment" subset of the data for geom_line, and a "control" subset for geom_point.
After the subsetting, there is only one "cond" for geom_line ("treatment") and one for geom_point ("control"). Thus, I have removed the aes mapping between "cond" and colour, linetype and shape respectively. You may wish to set these aesthetics to desired values instead. Similarly, no need for group in this solution.
ggplot(data = subset(df, cond == "treatment"), aes(x = xval, y = yval)) +
geom_line() +
geom_point(data = subset(df, cond == "control"))
Update following comment from OP, "Now, what if my data had actually three "conditions" where 2 of the conditions would be plotted as lines and the other 1 is scatterplot."
# some data
set.seed(123)
cond <- rep( c("contr","treat", "post-treat"), each = 10)
xval <- rnorm(30)
yval <- rnorm(30)
df <- data.frame(cond, xval, yval)
# plot
ggplot(data = subset(df, cond %in% c("treat", "post-treat")), aes(x = xval, y = yval)) +
geom_line(aes(group = cond, colour = cond)) +
geom_point(data = subset(df, cond == "contr"))
