R facet_wrap and geom_density with multiple groups

Here's my dataframe:
df <- data.frame(state = sample(c(0, 1), replace = TRUE, size = 100),
X1 = rnorm(100, 0, 1),
X2 = rnorm(100, 1, 2),
X3 = rnorm(100, 2, 3))
What I would like to do is to plot for each variable X1, X2, X3 two densities/histograms (given the value of state) on the same plot BUT in such a way that all of the plots are on the same facet. I've done these things separately:
ggplot() +
geom_density(data = df, aes(x = X1, group = state, fill = state), alpha = 0.5, adjust = 2) +
xlab("X1") +
ylab("Density")
ggplot(gather(df[df$state == 0, 2:4]), aes(value)) +
geom_density() +
facet_wrap(~key, scales = 'free_x')
but I struggle to make it work together.

I'm assuming that you want the three facets for variables X1, X2 and X3, each with two curves filled by state.
You'll need to convert state to a factor, to make it a categorical variable, using dplyr::mutate(). I would also use the newer tidyr::pivot_longer() instead of gather: this will generate columns name + value by default.
Your data but with a seed to make it reproducible and named df1:
set.seed(1001)
df1 <- data.frame(state = sample(c(0, 1), replace = TRUE, size = 100),
X1 = rnorm(100, 0, 1),
X2 = rnorm(100, 1, 2),
X3 = rnorm(100, 2, 3))
The plot:
library(dplyr)
library(tidyr)
library(ggplot2)
df1 %>%
pivot_longer(-state) %>%
mutate(state = as.factor(state)) %>%
ggplot(aes(value)) +
geom_density(aes(fill = state), alpha = 0.5) +
facet_wrap(~name)
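To see what the reshaping step hands to ggplot, here is a minimal sketch, assuming tidyr 1.0.0 or later (where pivot_longer() is available). The name long is illustrative; df1 and the seed follow the answer. It only checks the shape of the long data: one row per observation and variable, with the default column names name and value that facet_wrap(~name) relies on.

```r
library(tidyr)

set.seed(1001)
df1 <- data.frame(state = sample(c(0, 1), replace = TRUE, size = 100),
                  X1 = rnorm(100, 0, 1),
                  X2 = rnorm(100, 1, 2),
                  X3 = rnorm(100, 2, 3))

# Stack X1, X2, X3 into two columns: `name` (the variable) and `value`
long <- pivot_longer(df1, -state)
long$state <- as.factor(long$state)  # categorical, so fill gets discrete colours

names(long)         # "state" "name" "value"
nrow(long)          # 300: 100 rows x 3 variables
unique(long$name)   # "X1" "X2" "X3"
```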

Related

ggplot scale alpha to only one variable

Is there a straightforward way to use alpha on only one variable using ggplot2?
I would have imagined that scale_alpha_manual(values = c(0, 1)) would work like scale_color_manual(). Ultimately, I am interested in doing an animation where a colour appears gradually.
df = data.frame(time = 1:100, x1 = rnorm(100, 1, 5), x2 = rnorm(100, 1, 5)) %>%
melt(id.vars = 'time')
df %>%
ggplot(aes(time, value, colour = variable)) +
geom_line() +
scale_color_manual(values = c('black', 'blue')) +
scale_alpha_manual(values = c(0, 1))
I am trying to get something like this but with an alpha
You could use the alpha as an aesthetic:
library(dplyr)     # provides %>%
library(reshape2)  # provides melt()
library(ggplot2)
df = data.frame(time = 1:100, x1 = rnorm(100, 1, 5), x2 = rnorm(100, 1, 5)) %>%
melt(id.vars = 'time')
df %>%
ggplot(aes(time, value, colour = variable, alpha=variable)) +
geom_line() +
scale_color_manual(values = c('black', 'blue')) +
scale_alpha_manual(values = c(0.3, 1))
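One way to check which alpha each series actually receives is to inspect the built layer data. This is a sketch under assumptions: the long data frame is built directly in base R instead of with melt(), and p and built are illustrative names. layer_data() is part of ggplot2.

```r
library(ggplot2)

set.seed(1)
# Long format built by hand: one row per time point and series
df <- data.frame(time = rep(1:100, times = 2),
                 variable = rep(c("x1", "x2"), each = 100),
                 value = rnorm(200, 1, 5))

p <- ggplot(df, aes(time, value, colour = variable, alpha = variable)) +
  geom_line() +
  scale_color_manual(values = c("black", "blue")) +
  scale_alpha_manual(values = c(0.3, 1))

# Computed aesthetics per series: x1 is drawn at alpha 0.3, x2 fully opaque
built <- layer_data(p)
unique(built[, c("colour", "alpha")])
```

Setting the first value to 0 instead of 0.3 hides that series entirely, which could serve as the starting frame of a fade-in animation.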

ggplot2 density plotting different size of data in R

I have two data sets, their size is 500 and 1000. I want to plot density for these two data sets in one plot.
I searched Google and found these threads:
r-geom-density-values-in-y-axis
ggplot2-plotting-two-or-more-overlapping-density-plots-on-the-same-graph/
The data sets in those threads all have the same size:
df <- data.frame(x = rnorm(1000, 0, 1), y = rnorm(1000, 0, 2), z = rnorm(1000, 2, 1.5))
But if my data sets have different sizes, I assume I should normalize the data first in order to compare the densities between them.
Is it possible to make a density plot from data sets of different sizes in ggplot2?
By default, all densities are scaled to unit area. If you have two datasets with different amounts of data, you can plot them together like so:
df1 <- data.frame(x = rnorm(1000, 0, 2))
df2 <- data.frame(y = rnorm(500, 1, 1))
ggplot() +
geom_density(data = df1, aes(x = x),
fill = "#E69F00", color = "black", alpha = 0.7) +
geom_density(data = df2, aes(x = y),
fill = "#56B4E9", color = "black", alpha = 0.7)
However, from your latest comment, I take it that's not what you want. Instead, you want the areas under the density curves to be scaled relative to the amount of data in each group. You can do that by mapping y to the computed count variable, written ..count.. in older ggplot2 code and after_stat(count) since ggplot2 3.3.0:
df1 <- data.frame(x = rnorm(1000, 0, 2), label = rep('df1', 1000))
df2 <- data.frame(x = rnorm(500, 1, 1), label = rep('df2', 500))
df <- rbind(df1, df2)
ggplot(df, aes(x, y = after_stat(count), fill = label)) +
  geom_density(color = "black", alpha = 0.7) +
  scale_fill_manual(values = c("#E69F00", "#56B4E9"))
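The relationship between the two scalings can be checked numerically with base R's density(), independent of ggplot2. A minimal sketch (the helper name area_under and the seed are assumptions for illustration): a kernel density estimate integrates to roughly 1 regardless of sample size, and multiplying it by the number of observations, which is how ggplot2 documents its computed count variable (density * number of points), gives an area roughly equal to the sample size.

```r
set.seed(42)
x_big   <- rnorm(1000, 0, 2)
x_small <- rnorm(500, 1, 1)

# Riemann-sum approximation of the area under a curve on an evenly spaced grid
area_under <- function(x, y) sum(y) * (x[2] - x[1])

d_big   <- density(x_big)
d_small <- density(x_small)

area_under(d_big$x, d_big$y)      # close to 1
area_under(d_small$x, d_small$y)  # close to 1

# Scaling by the number of observations
area_under(d_big$x, d_big$y * length(x_big))        # close to 1000
area_under(d_small$x, d_small$y * length(x_small))  # close to 500
```

With the count scaling, the larger data set therefore covers about twice the area of the smaller one.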

Producing a "fuzzy" RD plot with ggplot2

My question is similar to this but the answers there will not work for me. Basically, I'm trying to produce a regression discontinuity plot with a "fuzzy" design that uses all the data for the treatment and control groups, but only plots the regression line within the "range" of the treatment and control groups.
Below, I've simulated some data and produced the fuzzy RD plot with base graphics. I'm hoping to replicate this plot with ggplot2. Note that the most important part of this is that the light blue regression line is fit using all the blue points, while the peach colored regression line is fit using all the red points, despite only being plotted over the ranges in which individuals were intended to receive treatment. That's the part I'm having a hard time replicating in ggplot.
I'd like to move to ggplot because I'd like to use faceting to produce this same plot across various units in which participants were nested. In the code below, I show a non-example using geom_smooth. When there's no fuzziness within a group, it works fine, but otherwise it fails. If I could get geom_smooth to be limited to only specific ranges, I think I'd be set. Any and all help is appreciated.
Simulate data
library(MASS)
mu <- c(0, 0)
sigma <- matrix(c(1, 0.7, 0.7, 1), ncol = 2)
set.seed(100)
d <- as.data.frame(mvrnorm(1e3, mu, sigma))
# Create treatment variable
d$treat <- ifelse(d$V1 <= 0, 1, 0)
# Introduce fuzziness
d$treat[d$treat == 1][sample(100)] <- 0
d$treat[d$treat == 0][sample(100)] <- 1
# Treatment effect
d$V2[d$treat == 1] <- d$V2[d$treat == 1] + 0.5
# Add grouping factor
d$group <- gl(9, 1e3/9)
Produce regression discontinuity plot with base
library(RColorBrewer)
pal <- brewer.pal(5, "RdBu")
color <- d$treat
color[color == 0] <- pal[1]
color[color == 1] <- pal[5]
plot(V2 ~ V1,
data = d,
col = color,
bty = "n")
abline(v = 0, col = "gray", lwd = 3, lty = 2)
# Fit model
m <- lm(V2 ~ V1 + treat, data = d)
# predicted achievement for treatment group
pred_treat <- predict(m,
newdata = data.frame(V1 = seq(-3, 0, 0.1),
treat = 1))
# predicted achievement for control group
pred_no_treat <- predict(m,
newdata = data.frame(V1 = seq(0, 4, 0.1),
treat = 0))
# Add predicted achievement lines
lines(seq(-3, 0, 0.1), pred_treat, col = pal[4], lwd = 3)
lines(seq(0, 4, 0.1), pred_no_treat, col = pal[2], lwd = 3)
# Add legend
legend("bottomright",
legend = c("Treatment", "Control"),
lty = 1,
lwd = 2,
col = c(pal[4], pal[2]),
box.lwd = 0)
non-example with ggplot
d$treat <- factor(d$treat, labels = c("Control", "Treatment"))
library(ggplot2)
ggplot(d, aes(V1, V2, group = treat)) +
geom_point(aes(color = treat)) +
geom_smooth(method = "lm", aes(color = treat)) +
facet_wrap(~group)
Notice the regression lines extending past the treatment range for groups 1 and 2.
There's probably a more graceful way to make the lines with geom_smooth, but it can be hacked together with geom_segment. Munge the data.frames outside of the plotting call if you like.
ggplot(d, aes(x = V1, y = V2, color = factor(treat, labels = c('Control', 'Treatment')))) +
geom_point(shape = 21) +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
geom_segment(data = data.frame(t(predict(m, data.frame(V1 = c(-3, 0), treat = 1)))),
aes(x = -3, xend = 0, y = X1, yend = X2), color = pal[4], size = 1) +
geom_segment(data = data.frame(t(predict(m, data.frame(V1 = c(0, 4), treat = 0)))),
aes(x = 0, xend = 4, y = X1, yend = X2), color = pal[2], size = 1)
Another option is geom_path:
df <- data.frame(V1 = c(-3, 0, 0, 4), treat = c(1, 1, 0, 0))
df <- cbind(df, V2 = predict(m, df))
ggplot(d, aes(x = V1, y = V2, color = factor(treat, labels = c('Control', 'Treatment')))) +
geom_point(shape = 21) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
geom_path(data = df, size = 1)
For the edit with facets, if I understand what you want correctly, you can fit a model for each group with lapply and predict for each group. Here I recombine with dplyr::bind_rows instead of do.call(rbind, ...) so that the .id parameter inserts the group number from the list element name, though there are other ways to do the same thing.
df <- data.frame(V1 = c(-3, 0, 0, 4), treat = c('Treatment', 'Treatment', 'Control', 'Control'))
m_list <- lapply(split(d, d$group), function(x){lm(V2 ~ V1 + treat, data = x)})
df <- dplyr::bind_rows(lapply(m_list, function(x){cbind(df, V2 = predict(x, df))}), .id = 'group')
ggplot(d, aes(x = V1, y = V2, color = treat)) +
geom_point(shape = 21) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
geom_path(data = df, size = 1) +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
facet_wrap(~group)
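The .id argument is what ties each set of predictions to its facet. A small sketch with made-up numbers (preds_a, preds_b and combined are hypothetical stand-ins for the per-group predictions): bind_rows() takes the names of the list elements and stores them in a new column, here group, which facet_wrap(~group) can then match against the group column of the point data.

```r
library(dplyr)

# Stand-ins for two per-group prediction data frames
preds_a <- data.frame(V1 = c(-3, 0), V2 = c(1.0, 0.5))
preds_b <- data.frame(V1 = c(-3, 0), V2 = c(0.8, 0.2))

# The list names become the values of the new `group` column
combined <- bind_rows(list(`1` = preds_a, `2` = preds_b), .id = "group")
combined$group   # "1" "1" "2" "2"
```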

ggplot2 - boxplot multiple data.frames while staying in order

I apologize if this is more for SO instead of CV.
I am attempting to include a second boxplot into an existing boxplot that is ordered by the mean of the values plotted. When I include the boxplot from the second data.frame (representing a control sample for the other plots), the original plot loses its ordering.
Below is an example:
x1 <- data.frame("V1" = runif(100, 0, 100), "siteno" = "X1") #mean = 50.3
x2 <- data.frame("V1" = runif(100, 200, 450), "siteno" = "X2") #mean = 322.4
x3 <- data.frame("V1" = runif(100, 50, 150), "siteno" = "X3") #mean = 97.8
xData <- rbind(x1,x2,x3)
xData$siteno <- with(xData, reorder(siteno, V1, mean))
zData <- data.frame("V1" = runif(300, 0, 450), "siteno" = "Z1") #mean = 224.2
#orders xData correctly
ggplot(xData, aes(x = siteno , y = V1)) +
stat_summary(fun.y=mean, colour="red", geom="point") +
geom_boxplot (aes(fill=siteno), alpha=.5, width=1, position = position_dodge(width = 1), outlier.colour = "dark gray", outlier.size = 1)
this produces the below plot with x variables correctly ordered by mean:
If I try the code below to add the control data, the order of the x variables is lost:
x1 <- data.frame("V1" = runif(100, 0, 100), "siteno" = "X1") #mean = 50.3
x2 <- data.frame("V1" = runif(100, 200, 450), "siteno" = "X2") #mean = 322.4
x3 <- data.frame("V1" = runif(100, 50, 150), "siteno" = "X3") #mean = 97.8
xData <- rbind(x1,x2,x3)
xData$siteno <- with(xData, reorder(siteno, V1, mean))
zData <- data.frame("V1" = runif(300, 0, 450), "siteno" = "Z1") #mean = 224.2
#orders xData correctly
ggplot(xData, aes(x = siteno , y = V1)) +
stat_summary(fun.y=mean, colour="red", geom="point") +
geom_boxplot (aes(fill=siteno), alpha=.5, width=1, position = position_dodge(width = 1), outlier.colour = "dark gray", outlier.size = 1) +
geom_boxplot(data=zData, aes(x = siteno , y = V1))
this produces the following plot with no ordering of the x variables:
The point of my graph is to show the test values ordered by their mean and then have the control values boxplot off to the right for visual reference. I imagine there could be a solution that combines the xData and zData dataframes; I am willing to try that if there are some suggestions.
Thank you for your time.
When you combine two data frames in one plot, the original levels (and their order) are lost, and new levels that combine the data from both data frames are used. You don't get this behavior for the fill values because you don't provide a fill aesthetic for the second data frame. But for the discrete x scale both data frames are combined, and the new levels are X1, X2, X3 and Z1.
Without merging everything into one data frame, you can use scale_x_discrete() and, in its limits= argument, use levels() to get the original level order of siteno and append Z1 as the reference level.
ggplot(xData, aes(x = siteno , y = V1)) +
stat_summary(fun.y=mean, colour="red", geom="point") +
geom_boxplot (aes(fill=siteno), alpha=.5, outlier.colour = "dark gray",
outlier.size = 1) +
geom_boxplot(data=zData, aes(x = siteno , y = V1))+
scale_x_discrete(limits=c(levels(xData$siteno),"Z1"))
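How the limits vector is assembled can be checked without plotting. A sketch using only base R (the seed and the name x_limits are assumptions for illustration): reorder() sorts the factor levels by the group means, and appending "Z1" puts the control last.

```r
set.seed(123)
x1 <- data.frame(V1 = runif(100, 0, 100),   siteno = "X1")
x2 <- data.frame(V1 = runif(100, 200, 450), siteno = "X2")
x3 <- data.frame(V1 = runif(100, 50, 150),  siteno = "X3")
xData <- rbind(x1, x2, x3)

# Levels ordered by the mean of V1 within each site
xData$siteno <- with(xData, reorder(siteno, V1, mean))

# Test sites in mean order, control appended at the end
x_limits <- c(levels(xData$siteno), "Z1")
x_limits   # "X1" "X3" "X2" "Z1"
```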
Why not put them all in one data.frame and order all 4 levels in that?
data2 <- rbind(xData, zData)
ggplot(data2, aes(x = siteno , y = V1)) +
stat_summary(fun.y=mean, colour="red", geom="point") +
geom_boxplot (aes(fill=siteno), alpha=.5, width=1,
position = position_dodge(width = 1),
outlier.colour = "dark gray", outlier.size = 1)
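Capture the desired order, e.g., something like:
ord <- names(sort(tapply(xData$V1, xData$siteno, mean)))  # site names sorted by mean of V1
Then pass it to scale_x_discrete(), e.g. scale_x_discrete(limits = c(ord, "Z1")).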

How can I use different color palettes for different layers in ggplot2?

Is it possible to plot two sets of data on the same plot, but use different color palettes for each set?
testdf <- data.frame( x = rnorm(100),
y1 = rnorm(100, mean = 0, sd = 1),
y2 = rnorm(100, mean = 10, sd = 1),
yc = rnorm(100, mean = 0, sd = 3))
ggplot(testdf, aes(x, y1, colour = yc)) + geom_point() +
geom_point(aes(y = y2))
What I would like to see is one set of data, say y1, in blues (color set by yc), and the other set in reds (again color set by yc).
The legend should then show 2 color scales, one in blue, the other red.
Thanks for your suggestions.
If you translate the "blues" and "reds" to varying transparency, then it is not against ggplot's philosophy. So, using Thierry's Molten version of the data set:
ggplot(Molten, aes(x, value, colour = variable, alpha = yc)) + geom_point()
Should do the trick.
That's not possible with ggplot2. I think it goes against the philosophy of ggplot2 because it complicates the interpretation of the plot.
Another option is to use different shapes to separate the points.
testdf <- data.frame( x = rnorm(100),
y1 = rnorm(100, mean = 0, sd = 1),
y2 = rnorm(100, mean = 10, sd = 1),
yc = rnorm(100, mean = 0, sd = 3))
library(reshape2)  # provides melt()
Molten <- melt(testdf, id.vars = c("x", "yc"))
ggplot(Molten, aes(x, value, colour = yc, shape = variable)) + geom_point()
