The code below fills the area under each of the two density curves with color:
library(ggplot2)
#fake data
dat <- data.frame(dens = c(rnorm(100), rnorm(100, 2, 0.5))
, group = rep(c("C", "P"), each = 100))
#fill the area under the curve
ggplot(dat, aes(x = dens, fill = group)) + geom_density(alpha = 0.75)
How can I achieve the following two goals?
1) Only fill each curve within a specified interval. For example, interval [-1.5, 2.0] for group 'C' and [0.5, 2.8] for group 'P'.
2) Add a vertical segment (from x-axis to the curve) for each density. For example, at x=0.2 for group 'C' and at x=1.9 for group 'P'.
To get you started, here's a solution to your first question:
library(dplyr)
library(purrr)
library(tidyr)
library(ggplot2)
as.data.frame.density <- function(x) data.frame(x = x$x, y = x$y)
densities <- dat %>%
group_nest(group) %>%
mutate(dens = map(data, ~as.data.frame(density(.$dens)))) %>%
unnest(dens)
ggplot(densities, aes(x = x, y = y, group = group)) +
geom_density(stat = 'identity') +
geom_density(
aes(fill = group),
. %>% filter((group == "C" & between(x, -1.5, 2.0)) | (group == "P" & between(x, 0.5, 2.8))),
stat = 'identity',
alpha = 0.75
)
There are other ways of calculating the densities per group; dplyr is just one of them. It is probably a good idea to use the same bandwidth for both density estimates.
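As a minimal sketch of the equal-bandwidth idea, using base R only: the seed and the choice of averaging the two default bandwidths (bw.nrd0) are assumptions made for illustration, not something the answer prescribes.

```r
# Sketch: estimate both group densities with one shared bandwidth (base R only).
set.seed(42)  # assumed seed, only for reproducibility
dat <- data.frame(
  dens  = c(rnorm(100), rnorm(100, 2, 0.5)),
  group = rep(c("C", "P"), each = 100)
)

# Assumption: take the mean of the per-group default bandwidths as the shared value.
shared_bw <- mean(tapply(dat$dens, dat$group, bw.nrd0))

# One density per group, each estimated with the same bandwidth.
densities <- do.call(rbind, lapply(split(dat, dat$group), function(d) {
  est <- density(d$dens, bw = shared_bw)
  data.frame(group = d$group[1], x = est$x, y = est$y)
}))
rownames(densities) <- NULL
```

In the dplyr pipeline above, the equivalent change would be to pass bw = shared_bw to density() inside map().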
Adding the segments is similar to this approach: you just need to find the corresponding values in the densities data.frame.
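One possible way to find those values is to interpolate each group's density estimate at the requested x with approx(). This is a sketch under assumptions: the marks data frame and the seed are hypothetical names and values introduced here, and the heights should closely match (but may not be pixel-identical to) the curve that geom_density draws with its defaults.

```r
library(ggplot2)

set.seed(42)  # assumed seed, only for reproducibility
dat <- data.frame(
  dens  = c(rnorm(100), rnorm(100, 2, 0.5)),
  group = rep(c("C", "P"), each = 100)
)

# Hypothetical lookup table: the x position of the segment for each group.
marks <- data.frame(group = c("C", "P"), x = c(0.2, 1.9))

# Interpolate each group's estimated density at the requested x.
marks$y <- unname(mapply(function(g, x0) {
  est <- density(dat$dens[dat$group == g])
  approx(est$x, est$y, xout = x0)$y
}, marks$group, marks$x))

# Draw the filled densities plus one vertical segment per group.
p <- ggplot(dat, aes(x = dens, fill = group)) +
  geom_density(alpha = 0.75) +
  geom_segment(
    data = marks,
    aes(x = x, xend = x, y = 0, yend = y),
    inherit.aes = FALSE
  )
```

Printing p shows the plot. If the densities were precomputed with a custom bandwidth, the same bw would need to be passed to density() here so the segment tops land on the curve.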
I know how to plot several density curves/polygrams on one plot, but not conditional density plots.
Reproducible example:
require(ggplot2)
# generate data
a <- runif(200, min=0, max = 1000)
b <- runif(200, min=0, max = 1000)
c <- sample(c("A", "B"), 200, replace =T)
df <- data.frame(a,b,c)
# plot 1
ggplot(df, aes(a, fill = c)) +
geom_density(position='fill', alpha = 0.5)
# plot 2
ggplot(df, aes(b, fill = c)) +
geom_density(position='fill', alpha = 0.5)
In my real data I have a bunch of these paired conditional density plots and I would need to overlay one over the other to see (and show) how different (or similar) they are. Does anyone know how to do this?
One way would be to plot the two versions as layers. The overlapping areas will be slightly different, depending on the layer order, based on how alpha works in ggplot2. This may or may not be what you want. You might fiddle with the two alphas, or vary the border colors, to distinguish them more.
ggplot(df, aes(fill = c)) +
geom_density(aes(a), position='fill', alpha = 0.5) +
geom_density(aes(b), position='fill', alpha = 0.5)
For example, you might make it so the fill only applies to one layer, but the other layer distinguishes groups using the group aesthetic, and perhaps a different linetype. This one seems more readable to me, especially if there is a natural ordering to the two variables that justifies putting one in the "foreground" and one in the "background."
ggplot(df) +
geom_density(aes(a, group = c), position='fill', alpha = 0.2, linetype = "dashed") +
geom_density(aes(b, fill = c), position='fill', alpha = 0.5)
I'm not so sure if "on top of one another" is a great idea. Jon's ideas are probably the way to go. But what about just plotting side by side? Our brains can cope with that, and we can compare the panels pretty well.
Make it long, then use facet.
Another option might be an animated graph (see 2nd code chunk below).
require(ggplot2)
#> Loading required package: ggplot2
library(tidyverse)
a <- runif(200, min=0, max = 1000)
b <- runif(200, min=0, max = 1000)
#### BAAAAAD idea to call anything "c" in R!!! Don't do this. ever!
d <- sample(c("A", "B"), 200, replace =T)
df <- data.frame(a,b,d)
df %>% pivot_longer(cols = c(a,b)) %>%
ggplot(aes(value, fill = d)) +
geom_density(position='fill', alpha = 0.5) +
facet_grid(~name)
library(gganimate)
p <- df %>% pivot_longer(cols = c(a,b)) %>%
ggplot(aes(value, fill = d)) +
geom_density(position='fill', alpha = 0.5) +
labs(title = "{closest_state}")
p_anim <- p + transition_states(name)
animate(p_anim, duration = 2, fps = 5)
Created on 2022-06-14 by the reprex package (v2.0.1)
Although it is not the overlay you might have thought of, it facilitates the comparison of density curves:
library(tidyverse)
library(ggridges)
library(truncnorm)
DF <- tibble(
alpha = rtruncnorm(n = 200, a = 0, b = 1000, mean = 500, sd = 50),
beta = rtruncnorm(n = 200, a = 0, b = 1000, mean = 550, sd = 50)
)
DF <- DF %>%
pivot_longer(c(alpha, beta), names_to = "name", values_to = "meas") %>%
mutate(name = factor(name))
DF %>%
# after_stat() replaces the deprecated stat() (ggplot2 >= 3.3.0)
ggplot(aes(meas, name, fill = factor(after_stat(quantile)))) +
stat_density_ridges(
geom = "density_ridges_gradient",
calc_ecdf = TRUE,
quantiles = 4,
quantile_lines = TRUE
) +
scale_fill_viridis_d(name = "Quartiles")
I'm currently finishing off my Masters project and need to include some graphics for the write-up. Without boring you too much, I have some data which is associated with AR(1) parameters ranging from 0.1 to 0.9 by 0.1 increments. As such I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours, it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I also would like to draw a trend line for the median values since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like it was one graph, but I have had issues with the endpoints matching up since each facet is a separate graphical device. Can anyone give me some advice on how to do this? I am not particularly partial to the faceting, so if it is not needed I can do away with it.
I will try to upload sample data, but simulating 100 values for each of the 9 rhos would work just to get started, like:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))
You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
You can add scale = 1 as a parameter to stat_density_ridges if you don't like the amount of overlap.
Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
# plyr is not attached, so summarise must be namespaced as well
med <- plyr::ddply(df, "rho", plyr::summarise, m = median(val))
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))
You could do a variant on this using geom_violin instead of using histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made up data:
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
geom_violin() +
stat_summary(aes(group = 1), colour = "black",
geom = "line", fun = "median") # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
This produces a violin for each value of rho, and joins the medians for each violin.
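If you want the medians as explicit numbers (for example to check the line or to draw it yourself), a small sketch with base R's aggregate() could look like this. The seed is an assumption added for reproducibility; the data mirror the made-up example above.

```r
set.seed(1)  # assumed seed, only for reproducibility
df <- data.frame(
  rho = rep(c(0.1, 0.2, 0.3), each = 50),
  val = sample(1:10, 150, replace = TRUE)
)
df$val <- df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))

# One median per rho; these are the points the trend line passes through.
med <- aggregate(val ~ rho, data = df, FUN = median)
med
```

A data frame like med could then be handed to geom_line(data = med, aes(rho, val), inherit.aes = FALSE) instead of relying on stat_summary.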
I have a data set similar to the one below where I have a lot of data for certain groups and then only single observations for other groups. I would like my single observations to show up as points but the other groups with multiple observations to show up as lines (no points). My code is below:
EDIT: I'm attempting to find a way to do this without using multiple datasets in the geom_* calls because of the issues it causes with the legend. There was an answer that has since been deleted that was able to handle the legend but didn't get rid of the points on the lines. I would potentially like a single legend with points only showing up if they are a single observation.
library(tidyverse)
dat <- tibble(x = runif(10, 0, 5),
y = runif(10, 0, 20),
group = c(rep("Group1", 4),
rep("Group2", 4),
"Single Point 1",
"Single Point 2")
)
dat %>%
ggplot(aes(x = x, y = y, color = group)) +
geom_point() +
geom_line()
Created on 2019-04-02 by the reprex package (v0.2.1)
Only plot the data with 1 point in geom_point() and the data with >1 point in geom_line(). These can be precalculated in mutate().
dat = dat %>%
group_by(group) %>%
mutate(n = n() )
dat %>%
ggplot(aes(x = x, y = y, color = group)) +
geom_point(data = filter(dat, n == 1) ) +
geom_line(data = filter(dat, n > 1) )
Having the legend match this is trickier. This is the sort of thing that the override.aes argument in guide_legend() can be useful for.
In your case I would separately calculate the number of observations in each group first, since that is what the line vs point is based on.
sumdat = dat %>%
group_by(group) %>%
summarise(n = n() )
The result is in the same order as the factor levels in the legend, which is why this works.
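Because override.aes is matched to the legend keys purely by position, it may be worth checking that ordering assumption explicitly. The sketch below uses base R only; it assumes the group column is a character vector, so that both the legend and the counts follow the sorted unique values.

```r
# Group labels as in the example data.
group <- c(rep("Group1", 4), rep("Group2", 4), "Single Point 1", "Single Point 2")

# Counts per group; table() orders its entries by factor level.
counts <- table(factor(group))

# Assumption: a discrete colour legend built from a character column
# lists its keys in the same order as these factor levels.
legend_order <- levels(factor(group))
stopifnot(identical(names(counts), legend_order))

counts
```

If the group column were a factor with custom levels, the counts would need to be arranged in that level order before being used in override.aes.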
Now we need to remove lines and keep points whenever the group has only a single observation. 0 stands for a blank line and NA stands for no shape. I use an ifelse() statement for linetype and shape in override.aes, based on the number of observations per group.
dat %>%
ggplot(aes(x = x, y = y, color = group)) +
geom_point(data = filter(dat, n == 1) ) +
geom_line(data = filter(dat, n > 1) ) +
guides(color = guide_legend(override.aes = list(linetype = ifelse(sumdat$n == 1, 0, 1),
shape = ifelse(sumdat$n == 1, 19, NA) ) ) )
UPDATED:
I have the following data, for which I would like to draw a line between the groups, based on the slope between the 3 factor levels ("I", "II", "III").
set.seed(205)
dat = data.frame(t=rep(c("I","II","III"), each=10),
pairs=rep(1:10,3),
value=rnorm(30),
group=rep(c("A","B"), 15))
I have tried the following, but I cannot manage to change the color of the lines connecting "I" - "III" and "II" - "III":
ggplot(dat %>% group_by(pairs) %>%
mutate(slope = (value[t=="II"] - value[t=="I"])/( value[t=="II"])- value[t=="I"]),
aes(t, value, group=pairs, linetype=group, colour=slope > 0)) +
geom_point() +
geom_line()
This is a very similar issue to
Changing line color in ggplot based on slope
I hope I was able to explain my problem.
We can split apart the data, and get what you want:
library(dplyr)
library(ggplot2)
#calculate slopes for I and II
dat %>%
filter(t != "III") %>%
group_by(pairs) %>%
# use diff to calculate slope
mutate(slope = diff(value)) -> dat12
#calculate slopes for II and III
dat %>%
filter(t != "I") %>%
group_by(pairs) %>%
# use diff to calculate slope
mutate(slope = diff(value)) -> dat23
ggplot()+
geom_line(data = dat12, aes(x = t, y = value, group = pairs, colour = slope > 0,
linetype = group))+
geom_line(data = dat23, aes(x = t, y = value, group = pairs, colour = slope > 0,
linetype = group))+
theme_bw()
Since the data in dat came sorted by t, I used diff to calculate the slope.
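If the rows are not guaranteed to be sorted by t, one option is to sort explicitly before differencing. The following is a base R sketch under that assumption; the names shuffled, sorted, t_levels, and slope_to_next are hypothetical and only used for illustration.

```r
set.seed(205)
dat <- data.frame(t = rep(c("I", "II", "III"), each = 10),
                  pairs = rep(1:10, 3),
                  value = rnorm(30),
                  group = rep(c("A", "B"), 15))

# Shuffle the rows to show that the result does not depend on the input order.
shuffled <- dat[sample(nrow(dat)), ]

# Sort explicitly by pair and by time point before differencing.
t_levels <- c("I", "II", "III")
sorted <- shuffled[order(shuffled$pairs, match(shuffled$t, t_levels)), ]

# Change from each time point to the next within a pair (NA at the last point).
sorted$slope_to_next <- ave(sorted$value, sorted$pairs,
                            FUN = function(v) c(diff(v), NA))
```

Each row with a non-missing slope_to_next then describes one line segment, starting at that row's t and ending at the next one, which can be coloured by slope_to_next > 0.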
This is my first Stack Overflow post and I am a relatively new R user, so please go gently!
I have a data frame with three columns, a participant identifier, a condition (factor with 2 levels either Placebo or Experimental), and an outcome score.
set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo",10),rep("Experimental",10)),
Outcome = rnorm(20,15,2),
ID = factor(rep(1:10,2)))
I would like to construct a bar plot with two bars showing the mean outcome score for each condition, with the standard deviation as an error bar. I would then like to overlay lines connecting the points for each participant's score in each condition, so that the plot displays the individual responses as well as the group mean. If it is also possible, I would like to include an axis break.
I don't seem to be able to find any advice in other threads, apologies if I am repeating a question.
Many Thanks.
P.S. I realise that presenting data in this way will not be to everyone's taste. It is for a specific requirement!
This ought to work:
library(ggplot2)
library(dplyr)
dat.summ <- dat %>% group_by(Condition) %>%
summarize(mean.outcome = mean(Outcome),
sd.outcome = sd(Outcome))
ggplot(dat.summ, aes(x = Condition, y = mean.outcome)) +
geom_bar(stat = "identity") +
geom_errorbar(aes(ymin = mean.outcome - sd.outcome,
ymax = mean.outcome + sd.outcome),
color = "dodgerblue", width = 0.3) +
geom_point(data = dat, aes(x = Condition, y = Outcome),
color = "firebrick", size = 1.2) +
geom_line(data = dat, aes(x = Condition, y = Outcome, group = ID),
color = "firebrick", size = 1.2, alpha = 0.5) +
scale_y_continuous(limits = c(0, max(dat$Outcome)))
Some people are better with ggplot's stat functions and arguments than I am and might do it differently. I prefer to just transform my data first.
set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo",10),rep("Experimental",10)),
Outcome = rnorm(20,15,2),
ID = factor(rep(1:10,2)))
dat.w <- reshape(dat, direction = 'wide', idvar = 'ID', timevar = 'Condition')
means <- colMeans(dat.w[, 2:3])
sds <- apply(dat.w[, 2:3], 2, sd)
ci.l <- means - sds
ci.u <- means + sds
ci.width <- .25
bp <- barplot(means, ylim = c(0,20))
segments(bp, ci.l, bp, ci.u)
segments(bp - ci.width, ci.u, bp + ci.width, ci.u)
segments(bp - ci.width, ci.l, bp + ci.width, ci.l)
segments(x0 = bp[1], x1 = bp[2], y0 = dat.w[, 2], y1 = dat.w[, 3], col = 1:10)
points(c(rep(bp[1], 10), rep(bp[2], 10)), dat$Outcome, col = 1:10, pch = 19)
Here is a method using the transformations inside ggplot2:
ggplot(dat) +
stat_summary(aes(x=Condition, y=Outcome, group=Condition), fun="mean", geom="bar") + # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
stat_summary(aes(x=Condition, y=Outcome, group=Condition), fun.data="mean_se", geom="errorbar", col="green", width=.8, size=2) +
geom_line(aes(x=Condition, y=Outcome, group=ID), col="red")