How can I reformat this ridgeline plot so that it is a vertical ridgeline plot?
My real dataset is the actual PDF. For a minimum reproducible example, I generate distributions and extract the PDFs to use in a dummy function. The dataframe has a model name (for grouping), x values paired with PDF ordinates, and an id field that separates the different ridgeline levels (i.e., ridgeline y axis).
set.seed(123)
makedfs <- function(name, id, mu, sig) {
vals <- exp(rnorm(1000, mean=mu, sd=sig))
pdf <-density(vals)
model <- rep(name, length(pdf$x))
prox <- rep(id, length(pdf$x))
df <- data.frame(model, prox, pdf$x, pdf$y)
colnames(df) <- c("name", "id", "x", "pdf")
return(df)
}
df1 <- makedfs("model1", 0, log(1), 1)
df2 <- makedfs("model2", 0, log(0.5), 2)
df3 <- makedfs("model1", 1, log(0.2), 0.8)
df4 <- makedfs("model2", 1, log(1), 1)
df <- rbind(df1, df2, df3, df4)
Following the answer to "R Ridgeline plot with multiple PDFs can be overlayed at same level", I have a standard joyplot:
ggplot(df, aes(x=x, y=id, height = pdf, group = interaction(name, id), fill = name)) +
geom_ridgeline(alpha = 0.5, scale = .5) +
scale_y_continuous(limits = c(0, 5)) +
scale_x_continuous(limits = c(-6, 6))
I am trying the code below based on https://wilkelab.org/ggridges/reference/geom_vridgeline.html but it throws an error on the width parameter.
p <- ggplot(df, aes(x=id, y=x, width = ..density.., fill=id)) +
geom_vridgeline(stat="identity", trim=FALSE, alpha = 0.85, scale = 2)
Error in `f()`:
! Aesthetics must be valid computed stats. Problematic aesthetic(s): width = ..density...
Did you map your stat in the wrong layer?
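If you want the same graph, just vertically oriented, use the same mappings with geom_vridgeline: swap x and y, and map the precomputed pdf column to width instead of height. With stat = "identity" nothing is computed, so there is no ..density.. to map.
I swapped the limits you originally set so you can see that it's the same.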
ggplot(df, aes(x = id, y = x, width = pdf, fill = name,
group = interaction(name, id))) +
geom_vridgeline(alpha = 0.85, scale = .5) +
scale_x_continuous(limits = c(0, 5)) + # <-- note that the x & y switched
scale_y_continuous(limits = c(-6, 6))
How can I create a ridgeline plot where multiple densities can be overlayed at the same ordinate and distinguished by color?
My real dataset is the actual PDF. For a minimum reproducible example, I generate distributions and extract the PDFs to use in a dummy function. The dataframe has a model name (for grouping), x values paired with PDF ordinates, and an id field that separates the different ridgeline levels (i.e., ridgeline y axis).
Make example dataframe
makedfs <- function(name, id, mu, sig) {
vals <- exp(rnorm(1000, mean=mu, sd=sig))
pdf <-density(vals)
model <- rep(name, length(pdf$x))
prox <- rep(id, length(pdf$x))
df <- data.frame(model, prox, pdf$x, pdf$y)
colnames(df) <- c("name", "id", "x", "pdf")
return(df)
}
df1 <- makedfs("model1", 0, log(1), 1)
df2 <- makedfs("model2", 0, log(0.5), 2)
df3 <- makedfs("model1", 1, log(0.2), 0.8)
df4 <- makedfs("model2", 1, log(1), 1)
df <- rbind(df1, df2, df3, df4)
head(df,5)
name id x pdf
1 model1 0 -0.6541933 0.0003544569
2 model1 0 -0.5999428 0.0007800386
3 model1 0 -0.5456924 0.0016274229
4 model1 0 -0.4914420 0.0032231582
5 model1 0 -0.4371915 0.0060682580
A quick plot for the first two models looks like this:
plot(df1$x, df1$pdf, type ="l", col=1, xlim=c(-6,6), xlab = "x", ylab = "pdf")
lines(df2$x, df2$pdf, col=2)
legend("topleft", c("df1", "df2"), col = 1:2, lty = 1)
Ridgeline not working
I expected to see the above curves at y=0 on this ridgeline plot, but there is something wrong with the lines and fills for all PDF curves.
library(ggplot2)
library(ggridges)
p <- ggplot(df, aes(x=x, y=id, height = pdf, group = name, fill = name)) +
geom_ridgeline(alpha = 0.5, scale = 1) +
scale_y_continuous(limits = c(0, 5)) +
scale_x_continuous(limits = c(-6, 6))
How can I produce the expected ridgeline plot?
IMHO the issue is that you messed up the grouping. Instead of grouping by name you have to group by both name and id using e.g. interaction:
set.seed(123)
library(ggplot2)
library(ggridges)
ggplot(df, aes(x=x, y=id, height = pdf, group = interaction(name, id), fill = name)) +
geom_ridgeline(alpha = 0.5, scale = .5) +
scale_y_continuous(limits = c(0, 5)) +
scale_x_continuous(limits = c(-6, 6))
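To see why the grouping matters, here is a minimal base R sketch. The toy data frame is a hypothetical stand-in for the question's df, not part of the original answer. It only counts how many groups each grouping choice produces:

```r
# Hypothetical miniature of the question's data: two models at two ridge levels
toy <- data.frame(
  name = rep(c("model1", "model2"), times = 2),
  id   = rep(c(0, 1), each = 2)
)

# Grouping by name alone gives 2 groups, so the curves at id 0 and id 1
# that share a model name end up in the same group
n_by_name <- nlevels(factor(toy$name))

# Grouping by the name/id combination gives one group per curve
n_by_both <- nlevels(interaction(toy$name, toy$id))

print(c(by_name = n_by_name, by_both = n_by_both))
```

With only two groups, each model's curves from both levels are drawn as one shape, which would explain the tangled lines and fills in the question's plot.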
I'm working on a population pyramid that should be saved as a gif. Kind of like in this tutorial from Flowing Data, but with ggplot instead of plotrix.
My workflow:
1) Create a population pyramid
2) Create multiple pyramid plots in a for-loop
for (i in unique(d$jahr)) {
d_jahr <- d %>%
filter(jahr == i)
p <- ggplot(data = d_jahr, aes(x = anzahl, y = value, fill = art)) +
geom_bar(data = filter(d_jahr, art == "w"), stat = "identity") +
geom_bar(data = filter(d_jahr, art == "m"), stat = "identity") +
coord_flip() +
labs(title = paste(i), x = NULL, y = NULL)
ggsave(p,filename=paste("img/",i,".png",sep=""))
}
3) Save the plots as gif with the animation package
My problem:
All years have different values, so the x-axis has a different range in each plot. This looks odd in a gif, because the center of the plot jumps to the right, to the left, to the right...
Is it possible to fix the x-axis (in this case the y-axis, because of coord_flip()) over multiple plots that are created independently?
You can fix the range of an axis by setting the limits parameter:
library(ggplot2)
lst <- list(
data.frame(x = 1:100, y=runif(100, 0, 10)),
data.frame(x = 1:100, y=runif(100, 0, 100))
)
ylim <- range(do.call(c, lapply(lst, "[[", "y")))
for (x in seq(lst)) {
print(ggplot(lst[[x]], aes(x, y)) + geom_point() + scale_y_continuous(limits=ylim))
}
or by adding +ylim(ylim) instead of +scale_y_continuous(limits=ylim) (via #DeveauP).
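Applied to the pyramid loop from the question, the same idea might look like the sketch below. The data frame d here is invented for illustration, and it assumes one sex is stored as negative values so its bars extend to the left. Neither detail is confirmed by the question. The limits are computed once, before the loop, and made symmetric so that zero stays centred in every frame:

```r
# Hypothetical stand-in for the question's data (column names taken from the
# question; the values and the negative-for-one-sex encoding are assumptions)
d <- data.frame(
  jahr   = rep(c(2000, 2001), each = 4),
  art    = rep(c("w", "m"), times = 4),
  anzahl = rep(rep(c("0-19", "20-39"), each = 2), times = 2),
  value  = c(-10, 12, -8, 9, -30, 25, -20, 22)
)

# One symmetric range over all years, computed outside the loop
max_abs <- max(abs(d$value))
pyramid_limits <- c(-max_abs, max_abs)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  for (i in unique(d$jahr)) {
    d_jahr <- d[d$jahr == i, ]
    p <- ggplot(d_jahr, aes(x = anzahl, y = value, fill = art)) +
      geom_col() +
      scale_y_continuous(limits = pyramid_limits) +  # same limits in every plot
      coord_flip() +
      labs(title = i, x = NULL, y = NULL)
    print(p)
  }
}
```

Inside the real loop, print(p) would be replaced by the ggsave() call from the question.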
I recently came across a problem with ggplot2::geom_density that I am not able to solve. I am trying to visualise the density of a variable and compare it to a constant. To plot the density, I am using ggplot2::geom_density. The variable whose density I am plotting, however, happens to be a constant (this time):
df <- data.frame(matrix(1,ncol = 1, nrow = 100))
colnames(df) <- "dummy"
dfV <- data.frame(matrix(5,ncol = 1, nrow = 1))
colnames(dfV) <- "latent"
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.2, position = "identity") +
geom_vline(data = dfV, aes(xintercept = latent, color = 'ls'), size = 2)
This is OK and something I would expect. But, when I shift this distribution to the far right, I get a plot like this:
df <- data.frame(matrix(71,ncol = 1, nrow = 100))
colnames(df) <- "dummy"
dfV <- data.frame(matrix(75,ncol = 1, nrow = 1))
colnames(dfV) <- "latent"
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.2, position = "identity") +
geom_vline(data = dfV, aes(xintercept = latent, color = 'ls'), size = 2)
which probably means that the kernel estimation is still taking 0 as the centre of the distribution (right?).
Is there any way to circumvent this? I would like to see a plot like the one above, only with the centre of the kernel density at 71 and the vline at 75.
Thanks
Well I am not sure what the code does, but I suspect the geom_density primitive was not designed for a case where the values are all the same, and it is making some assumptions about the distribution that are not what you expect. Here is some code and a plot that sheds some light:
# Generate 10 data sets with 100 constant values from 0 to 90
# and then merge them into a single dataframe
dfs <- list()
for (i in 1:10){
v <- 10*(i-1)
dfs[[i]] <- data.frame(dummy=rep(v,100),facet=v)
}
df <- do.call(rbind,dfs)
# facet plot them
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.5, position = "identity") +
facet_wrap( ~ facet,ncol=5 )
Yielding:
So it is not doing what you thought it was, but it is also probably not doing what you want. You could of course make it "translation-invariant" (almost) by adding some noise like this for example:
set.seed(1234)
noise <- rnorm(100, 0, 1e-3)
dfs <- list()
for (i in 1:10){
v <- 10*(i-1)
dfs[[i]] <- data.frame(dummy=rep(v,100)+noise,facet=v)
}
df <- do.call(rbind,dfs)
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.5, position = "identity") +
facet_wrap( ~ facet,ncol=5 )
Yielding:
Note that there is apparently a random component to the geom_density function, and I can't see how to set the seed before each instance, so the estimated density is a bit different each time.
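One possible explanation for the shift dependence, offered as an assumption rather than something the answer above established: the default bandwidth rule in base R, bw.nrd0, falls back to abs(x[1]) when the data have zero spread. If geom_density relies on that default, the kernel gets wider the further the constant is from zero. The sketch below uses base R only:

```r
# Default bandwidth rule of density(); for constant data the spread is zero,
# so the rule falls back to the absolute value of the first observation
bw_at_1  <- bw.nrd0(rep(1, 100))
bw_at_71 <- bw.nrd0(rep(71, 100))
print(c(bw_at_1 = bw_at_1, bw_at_71 = bw_at_71))

# With an explicit bandwidth the estimate keeps the same shape and is simply
# centred on wherever the data sit
d1  <- density(rep(1, 100),  bw = 0.5)
d71 <- density(rep(71, 100), bw = 0.5)
peak_at_1  <- d1$x[which.max(d1$y)]
peak_at_71 <- d71$x[which.max(d71$y)]
print(c(peak_at_1 = peak_at_1, peak_at_71 = peak_at_71))
```

If this is indeed the cause, passing a fixed bandwidth to the layer, for example geom_density(bw = 0.5), may be a cleaner workaround than adding noise. The value 0.5 is arbitrary and chosen only for illustration.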
This is my first Stack Overflow post and I am a relatively new R user, so please go gently!
I have a data frame with three columns, a participant identifier, a condition (factor with 2 levels either Placebo or Experimental), and an outcome score.
set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo",10),rep("Experimental",10)),
Outcome = rnorm(20,15,2),
ID = factor(rep(1:10,2)))
I would like to construct a bar plot with two bars showing the mean outcome score for each condition, with the standard deviation as an error bar. I would then like to overlay lines connecting the points for each participant's score in each condition, so that the plot displays the individual responses as well as the group mean. If it is also possible, I would like to include an axis break.
I don't seem to be able to find any advice in other threads, apologies if I am repeating a question.
Many Thanks.
p.s. I realise that presenting data in this way will not be to everyone's taste. It is for a specific requirement!
This ought to work:
library(ggplot2)
library(dplyr)
dat.summ <- dat %>% group_by(Condition) %>%
summarize(mean.outcome = mean(Outcome),
sd.outcome = sd(Outcome))
ggplot(dat.summ, aes(x = Condition, y = mean.outcome)) +
geom_bar(stat = "identity") +
geom_errorbar(aes(ymin = mean.outcome - sd.outcome,
ymax = mean.outcome + sd.outcome),
color = "dodgerblue", width = 0.3) +
geom_point(data = dat, aes(x = Condition, y = Outcome),
color = "firebrick", size = 1.2) +
geom_line(data = dat, aes(x = Condition, y = Outcome, group = ID),
color = "firebrick", size = 1.2, alpha = 0.5) +
scale_y_continuous(limits = c(0, max(dat$Outcome)))
Some people are better with ggplot's stat functions and arguments than I am and might do it differently. I prefer to just transform my data first.
set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo",10),rep("Experimental",10)),
Outcome = rnorm(20,15,2),
ID = factor(rep(1:10,2)))
dat.w <- reshape(dat, direction = 'wide', idvar = 'ID', timevar = 'Condition')
means <- colMeans(dat.w[, 2:3])
sds <- apply(dat.w[, 2:3], 2, sd)
ci.l <- means - sds
ci.u <- means + sds
ci.width <- .25
bp <- barplot(means, ylim = c(0,20))
segments(bp, ci.l, bp, ci.u)
segments(bp - ci.width, ci.u, bp + ci.width, ci.u)
segments(bp - ci.width, ci.l, bp + ci.width, ci.l)
segments(x0 = bp[1], x1 = bp[2], y0 = dat.w[, 2], y1 = dat.w[, 3], col = 1:10)
points(c(rep(bp[1], 10), rep(bp[2], 10)), dat$Outcome, col = 1:10, pch = 19)
Here is a method using the transformations inside ggplot2:
ggplot(dat) +
stat_summary(aes(x=Condition, y=Outcome, group=Condition), fun="mean", geom="bar") +
stat_summary(aes(x=Condition, y=Outcome, group=Condition), fun.data="mean_se", geom="errorbar", col="green", width=.8, size=2) +
geom_line(aes(x=Condition, y=Outcome, group=ID), col="red")
I need to add a legend of the two lines (best fit line and 45 degree line) on TOP of my two plots. Sorry I don't know how to add plots! Please please please help me, I really appreciate it!!!!
Here is an example
type=factor(rep(c("A","B","C"),5))
xvariable=seq(1,15)
yvariable=2*xvariable+rnorm(15,0,2)
newdata=data.frame(type,xvariable,yvariable)
p = ggplot(newdata,aes(x=xvariable,y=yvariable))
p+geom_point(size=3)+ facet_wrap(~ type) +
geom_abline(intercept =0, slope =1,color="red",size=1)+
stat_smooth(method="lm", se=FALSE,size=1)
Here is another approach which uses aesthetic mapping to string constants to identify different groups and create a legend.
First an alternate way to create your test data (and naming it DF instead of newdata)
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
xvariable = 1:15,
yvariable = 2 * (1:15) + rnorm(15, 0, 2))
Now the ggplot code. Note that for both geom_abline and stat_smooth, the colour is set inside an aes call, which means each of the two values used will be mapped to a different colour and a guide (legend) will be created for that mapping.
ggplot(DF, aes(x = xvariable, y = yvariable)) +
geom_point(size = 3) +
geom_abline(aes(colour="one-to-one"), intercept =0, slope = 1, size = 1) +
stat_smooth(aes(colour="best fit"), method = "lm", se = FALSE, size = 1) +
facet_wrap(~ type) +
scale_colour_discrete("")
Try this:
# original data
type <- factor(rep(c("A", "B", "C"), 5))
x <- 1:15
y <- 2 * x + rnorm(15, 0, 2)
df <- data.frame(type, x, y)
# create a copy of original data, but set y = x
# this data will be used for the one-to-one line
df2 <- data.frame(type, x, y = x)
# bind original and 'one-to-one data' together
df3 <- rbind.data.frame(df, df2)
# create a grouping variable to separate stat_smoothers based on original and one-to-one data
df3$grp <- as.factor(rep(1:2, each = nrow(df)))
# plot
# use original data for points
# use 'double data' for the best-fit and one-to-one lines, set colours by group
# (group 1 is the original data, group 2 is the y = x copy)
ggplot(df, aes(x = x, y = y)) +
geom_point(size = 3) +
facet_wrap(~ type) +
stat_smooth(data = df3, aes(colour = grp), method = "lm", se = FALSE, size = 1) +
scale_colour_manual(values = c("red","blue"),
labels = c("best fit", "one-to-one"),
name = "") +
theme(legend.position = "top")
# If you rather want to stack the two keys in the legend you can add:
# guide = guide_legend(direction = "vertical")
#...as argument in scale_colour_manual
Please note that this solution does not extrapolate the one-to-one line outside the range of your data, which seemed to be the case for the original geom_abline.