Let's say I have the following dataset:
set.seed(42)
data <- data.frame(type = sample(LETTERS[1:2], 40, replace = TRUE),
condition = sample(c("Control", "Treatment"), 40, replace = TRUE),
measurement = runif(40))
And I'd like to create this faceted graph:
ggplot(data, aes(x = condition, y = measurement)) +
geom_point()+
facet_wrap(~type)
I'd also like to show a baseline (with geom_hline, for example) equal to the mean of the control values (mean(data$measurement[data$condition == "Control"])). But because the control values differ between types (that is, between the facets of the graph), I can't just calculate one single mean.
Is there any way to specify a different yintercept for geom_hline in each facet?
Something like this, but with the specified yintercept value, calculating the mean values for the control group for each individual facet:
ggplot(data, aes(x = condition, y = measurement)) +
geom_point()+
geom_hline(yintercept= mean(data$measurement[data$condition == "Control"]),
linetype="dashed",
color = "red", size=1)+
facet_wrap(~type)
Thanks a lot!
Best regards,
Eugene
You can use stat_summary with fun = mean and geom = "hline", passing only the control subset to its data argument. Then map yintercept to the y value computed by the stat with after_stat(y).
ggplot(data, aes(x = condition, y = measurement))+
geom_point() +
stat_summary(fun = mean, geom = "hline", aes(yintercept = after_stat(y)),
data = data[data$condition == "Control",], color = "red",
linetype = "dashed") +
facet_wrap(~type)
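A common alternative is to compute the per-facet control means yourself and give geom_hline() its own small data frame. The sketch below assumes ggplot2 3.3.0 or later; control_means and p_hline are illustrative names, not part of the original answer. Because the summary table keeps the type column, facet_wrap() routes each line to the matching panel.

```r
library(ggplot2)

set.seed(42)
data <- data.frame(type = sample(LETTERS[1:2], 40, replace = TRUE),
                   condition = sample(c("Control", "Treatment"), 40, replace = TRUE),
                   measurement = runif(40))

# One row per facet level: mean of the Control measurements within each type
control_means <- aggregate(measurement ~ type,
                           data = data[data$condition == "Control", ],
                           FUN = mean)

p_hline <- ggplot(data, aes(x = condition, y = measurement)) +
  geom_point() +
  # geom_hline() gets its own data; its 'type' column places each line in its facet
  geom_hline(data = control_means, aes(yintercept = measurement),
             linetype = "dashed", colour = "red") +
  facet_wrap(~type)
```

Precomputing the means makes the baseline values easy to inspect or reuse, at the cost of one extra object.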
Problem
I have some data points stored in a data.frame with three variables: x, y, and gender. My goal is to draw several lines fitted to all the data, plus lines fitted separately for males and females, over a scatter plot with the points coloured by gender. It sounds easy, but I keep running into problems.
My current approach is to generate a new set of x values, predict y for every model, combine the fitted lines in one data.frame, and convert it from wide to long format with the model name as the third variable. (From these posts, ggplot2: how to add the legend for a line added to a scatter plot? and Add legend to ggplot2 line plot, I learnt that mappings should be used instead of setting colours/legends separately.) However, while I get a multicolour line plot, the points are not coloured by gender (which is already a factor), contrary to what I expected from the posts I referenced.
I also know it might be possible to use aes(y = predict(model)), but I ran into other problems with that. I also tried colouring the points directly in aes and assigning colours separately to each line, but then no legend is generated unless I use lty, which makes the legend entries all the same colour.
I would appreciate any ideas, and I'm also open to changing the whole approach.
Code
Note that two pairs of lines overlap, so only two lines appear. Adding some jitter to the data might make them easier to tell apart.
slrmen <- lm(tc ~ x + I(x^2), data = data[data$gender == 0, ])
slrwomen <- lm(tc ~ x + I(x^2), data = data[data$gender == 1, ])
prdf <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(1,100)))
prdm <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(0,100)))
prdf$fit <- predict(fullmodel, newdata = prdf)
prdm$fit <- predict(fullmodel, newdata = prdm)
rawplotdata<-data.frame(x=prdf$x, fullf=prdf$fit, fullm=prdm$fit,
linf=predict(slrwomen, newdata = prdf),
linm=predict(slrmen, newdata = prdm))
plotdata<-reshape2::melt(rawplotdata,id.vars="x",
measure.vars=c("fullf","fullm","linf","linm"),
variable.name="fitmethod", value.name="y")
plotdata$fitmethod<-as.factor(plotdata$fitmethod)
plt <- ggplot() +
geom_line(data = plotdata, aes(x = x, y = y, group = fitmethod,
colour=fitmethod)) +
scale_colour_manual(name = "Fit Methods",
values = c("fullf" = "lightskyblue",
"linf" = "cornflowerblue",
"fullm"="darkseagreen", "linm" = "olivedrab")) +
geom_point(data = data, aes(x = x, y = y, fill = gender)) +
scale_fill_manual(values=c("blue","green")) ## This does not work as I expected...
show(plt)
Code for another approach (two lines omitted), which produces a multicolour plot but a legend in a single colour:
ggplot(data = prdf, aes(x = x, y = fit)) + # prdf and prdm are just data frames containing the x's and fitted values for different models
geom_line(aes(lty="Female"),colour = "chocolate") +
geom_line(data = prdm, aes(x = x, y = fit, lty="Male"), colour = "darkblue") +
geom_point(data = data, aes(x = x, y = y, colour = gender)) +
scale_colour_discrete(name="Gender", breaks=c(0,1),
labels=c("Male","Female"))
The problem comes from using the colour aesthetic for the lines and the fill aesthetic for the points in your first example. Your second example works because the colour aesthetic is used for both the lines and the points.
With the default point shape (19), mapping a variable to fill in geom_point has no visible effect, because that shape has no fill area.
For fill to work on points, set shape to one of 21 to 25 in geom_point(), outside of aes().
Perhaps this small reproducible example helps to illustrate the point:
Simulate data
set.seed(4821)
x1 <- rnorm(100, mean = 5)
set.seed(4821)
x2 <- rnorm(100, mean = 6)
data <- data.frame(x = rep(seq(20,80,length.out = 100),2),
tc = c(x1, x2),
gender = factor(c(rep("Female", 100), rep("Male", 100))))
Fit models
slrmen <- lm(tc ~ x + I(x^2), data = data[data$gender == "Male", ])
slrwomen <- lm(tc ~ x + I(x^2), data = data[data$gender == "Female", ])
newdat <- data.frame(x = seq(20,80,length.out = 200))
fitted.male <- data.frame(x = newdat,
gender = "Male",
tc = predict(object = slrmen, newdata = newdat))
fitted.female <- data.frame(x = newdat,
gender = "Female",
tc = predict(object = slrwomen, newdata = newdat))
Plot using colour aesthetics
Use the colour aesthetic for both points and lines (specify it in ggplot() so that every layer inherits it). The default point shape displays colour, so the points pick up the gender colours.
library(ggplot2)
ggplot(data, aes(x = x, y = tc, colour = gender)) +
geom_point() +
geom_line(data = fitted.male) +
geom_line(data = fitted.female) +
scale_colour_manual(values = c("tomato","blue")) +
theme_bw()
Plot using colour and fill aesthetics
Use the fill aesthetic for points and the colour aesthetic for lines (specify the aesthetics in each geom_* so that they are not inherited). This reproduces the problem.
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender)) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
To fix this, change the shape argument in geom_point to a point shape that can be filled (21:25).
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender), shape = 21) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
Created on 2021-09-19 by the reprex package (v2.0.1)
Note that the scales for colour and fill get merged automatically if the same variable is mapped to both aesthetics.
It seems to me that what you really want is ggplot2::stat_smooth, rather than computing the predictions yourself.
Borrowing the data from @scrameri:
ggplot(data, aes(x = x, y = tc, color = gender)) +
geom_point() +
stat_smooth(aes(linetype = "X^2"), method = 'lm',formula = y~x + I(x^2)) +
stat_smooth(aes(linetype = "X^3"), method = 'lm',formula = y~x + I(x^2) + I(x^3)) +
scale_color_manual(values = c("darkseagreen","lightskyblue"))
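The question also asks for lines fitted to all the data alongside the per-gender fits. One possible sketch, reusing the simulated data from the previous answer: a second stat_smooth() layer with group = 1 fits a single pooled model across both genders. The "pooled" label and the grey line colour are arbitrary choices, and se = FALSE only hides the confidence bands.

```r
library(ggplot2)

# Same simulated data as in the previous answer
set.seed(4821)
x1 <- rnorm(100, mean = 5)
set.seed(4821)
x2 <- rnorm(100, mean = 6)
data <- data.frame(x = rep(seq(20, 80, length.out = 100), 2),
                   tc = c(x1, x2),
                   gender = factor(rep(c("Female", "Male"), each = 100)))

p_fits <- ggplot(data, aes(x = x, y = tc)) +
  geom_point(aes(colour = gender)) +
  # One quadratic fit per gender (the colour mapping defines the groups)
  stat_smooth(aes(colour = gender), method = "lm",
              formula = y ~ x + I(x^2), se = FALSE) +
  # One pooled quadratic fit: group = 1 puts all points in a single group
  stat_smooth(aes(group = 1, linetype = "pooled"), colour = "grey30",
              method = "lm", formula = y ~ x + I(x^2), se = FALSE) +
  scale_colour_manual(values = c("tomato", "blue"))
```

Points and per-gender lines share one colour legend, and the pooled fit gets its own linetype legend entry.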
Long story short, I ran a bunch of stochastic simulations for each of 15 groups, and I have one integer per group that I need to add to each violin in the plot, but I can't figure out how to do it. Here's a reproducible example:
# Making data
df <- data.frame(c(rep(1,10), rep(2,10), rep(3,10)), sample.int(100, 30), c(rep(85,10), rep(60,10), rep(55,10)))
colnames(df) <- c("Group", "Data", "Extra")
# Grouping data
df$Group <- as.factor(df$Group)
# Plotting
Violin2 <- ggplot(data = df, aes(x = Group, y = Data))+
geom_violin(aes(fill = Group, color = Group))+
stat_summary(aes(y = Data), fun=mean, geom="point", color = "navyblue", shape = 17, size = 3)+
stat_summary(aes(y = Data), fun=median, geom="point", color = "black", shape = 16, size = 3)
#geom_point(aes(y = Extra, color = "#00BB66", shape = 16, size = 3)+
Violin2
So the df contains three groups (1, 2 and 3) for the values in the "Data" column. What I need to add are the integers from the "Extra" column of the df, as a single point on each violin (the three integers are 85, 60 and 55).
I initially tried to add a geom_point layer, and thought Extra would be grouped by Group, just as Data was, but that didn't work (Error: Discrete value supplied to continuous scale).
I've been searching around on here a lot, and can't find a solution, so any advice would be greatly appreciated! Thanks so much in advance for any help! :)
It's actually just one more line of code: ggplot lets you combine different geoms in one plot, which makes this kind of thing easy. Just add
geom_point(aes(y = Extra)) +
So the whole code would look like this:
ggplot(data = df, aes(x = Group, y = Data))+
geom_violin(aes(fill = Group, color = Group))+
geom_point(aes(y = Extra), size = 2, colour = "red") +
stat_summary(aes(y = Data), fun=mean, geom="point",
color = "navyblue", shape = 17, size = 3)+
stat_summary(aes(y = Data), fun=median, geom="point",
color = "black", shape = 16, size = 3)
I've coloured the points red and made them bigger, but you can change that.
Your example is almost working. The only thing to change is to not pass constant values (such as a colour) inside aes(); set constants like that outside aes() instead.
# Making data
library(ggplot2)
df <- data.frame(c(rep(1,10), rep(2,10), rep(3,10)), sample.int(100, 30), c(rep(85,10), rep(60,10), rep(55,10)))
colnames(df) <- c("Group", "Data", "Extra")
# Grouping data
df$Group <- as.factor(df$Group)
# Plotting
Violin2 <- ggplot(data = df, aes(x = Group, y = Data))+
geom_violin(aes(fill = Group, color = Group))+
stat_summary(aes(y = Data), fun=mean, geom="point", color = "navyblue", shape = 17, size = 3)+
stat_summary(aes(y = Data), fun=median, geom="point", color = "black", shape = 16, size = 3) +
geom_point(aes(y = Extra))
Violin2
Created on 2021-06-08 by the reprex package (v2.0.0)
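Because Extra is constant within each group, geom_point(aes(y = Extra)) in the answers above draws ten identical points on top of each other per violin. With opaque points that looks the same, but a sketch that draws exactly one point per group is to pass a deduplicated table to that layer. It assumes ggplot2 3.3.0 or later (for fun =); extra_points, violin_extra and the seed are illustrative.

```r
library(ggplot2)

# Stand-in for the question's data; the seed is arbitrary
set.seed(1)
df <- data.frame(Group = factor(rep(1:3, each = 10)),
                 Data = sample.int(100, 30),
                 Extra = rep(c(85, 60, 55), each = 10))

# Extra is constant within each Group, so keep one row per Group
extra_points <- unique(df[, c("Group", "Extra")])

violin_extra <- ggplot(df, aes(x = Group, y = Data)) +
  geom_violin(aes(fill = Group, colour = Group)) +
  stat_summary(fun = mean, geom = "point", colour = "navyblue", shape = 17, size = 3) +
  stat_summary(fun = median, geom = "point", colour = "black", shape = 16, size = 3) +
  # Layer-specific data: x = Group is inherited, y is replaced by Extra
  geom_point(data = extra_points, aes(y = Extra), colour = "red", size = 3)
```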
This is very similar to the question asked here. However, in that situation the fill aesthetic differs between the two plots. In my situation the fill aesthetic is the same for both plots, but I want different color schemes.
I would like to manually change the color in the boxplots and the scatter plots (for example making the boxes white and the points colored).
Example:
library(dplyr)
library(ggplot2)
n <- 4*3*10
myvalues <- rexp(n)
days <- ntile(rexp(n),4)
doses <- ntile(rexp(n), 3)
test <- data.frame(values =myvalues,
day = factor(days, levels = unique(days)),
dose = factor(doses, levels = unique(doses)))
p<- ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot( aes(fill = dose))+
geom_point( aes(fill = dose), alpha = 0.4,
position = position_jitterdodge())
Using scale_fill_manual() changes the fill of both the boxplots and the points.
I have found a hack: adding colour to geom_point, so that when I use scale_fill_manual() the point colours are not changed:
p<- ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot(aes(fill = dose), outlier.shape = NA)+
geom_point(aes(fill = dose, colour = factor(test$dose)),
position = position_jitterdodge(jitter.width = 0.1))+
scale_fill_manual(values = c('white', 'white', 'white'))
Are there more efficient ways of getting the same result?
You can use the group aesthetic to define the separate boxplots, so there is no need to set fill and then overwrite it:
ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot(aes(group = interaction(day, dose)), outlier.shape = NA)+
geom_point(aes(fill = dose, colour = dose),
position = position_jitterdodge(jitter.width = 0.1))
And you should never use data$column inside aes(); just use the bare column name. Using data$column works in simple cases but breaks whenever there are stat layers or facets.
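To get the different colour schemes the question asks for, you can then give the points their own palette. In the sketch below, which uses a simplified stand-in for the question's test data (no dplyr), only the points map colour, so scale_colour_manual() leaves the white boxes untouched. The three colours are arbitrary.

```r
library(ggplot2)

# Simplified stand-in for 'test': 4 days x 3 doses x 10 values each
set.seed(1)
test <- data.frame(values = rexp(120),
                   day = factor(rep(1:4, each = 30)),
                   dose = factor(rep(1:3, times = 40)))

p_box <- ggplot(test, aes(x = day, y = values)) +
  # No fill/colour mapping: boxes keep the default fill; group splits them by dose
  geom_boxplot(aes(group = interaction(day, dose)), outlier.shape = NA) +
  geom_point(aes(colour = dose),
             position = position_jitterdodge(jitter.width = 0.1)) +
  # Only the points map colour, so this palette does not touch the boxes
  scale_colour_manual(values = c("1" = "tomato", "2" = "steelblue", "3" = "darkgreen"))
```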
I'm trying to create a scatterplot where the points are jittered (geom_jitter), but I also want a black outline around each point. Currently I'm doing it with two geom_jitter layers, one for the fill and one for the outline:
beta <- paste("beta == ", "0.15")
ggplot(aes(x=xVar, y = yVar), data = data) +
geom_jitter(size=3, alpha=0.6, colour=my.cols[2]) +
theme_bw() +
geom_abline(intercept = 0.0, slope = 0.145950, size=1) +
geom_vline(xintercept = 0, linetype = "dashed") +
annotate("text", x = 2.5, y = 0.2, label=beta, parse=TRUE, size=5)+
xlim(-1.5,4) +
ylim(-2,2)+
geom_jitter(shape = 1,size = 3,colour = "black")
Because jitter offsets the data randomly, the two geom_jitter layers don't line up with each other. How do I make sure the outlines are in the same place as the filled points?
I've seen threads about this (e.g. Is it possible to jitter two ggplot geoms in the same way?), but they're quite old and I'm not sure whether anything has been added to ggplot since then that would solve this issue.
The code above works if I use the regular geom_point instead of geom_jitter, but I have too many overlapping points for that to be useful.
EDIT:
The solution in the posted answer works. However, it doesn't quite work for some of my other graphs, where I bin by another variable and use it to plot different colours:
ggplot(aes(x=xVar, y = yVar, color=group), data = data) +
geom_jitter(size=3, alpha=0.6, shape=21, fill="skyblue") +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_colour_brewer(name = "Title", direction = -1, palette = "Set1") +
xlim(-1.5,4) +
ylim(-2,2)
My group variable has 3 levels, and I want to colour each level with a different colour from the brewer Set1 palette. The current solution just colours everything skyblue. What should I map fill to so that the points use the correct palette?
You don't actually have to use two layers; you can use a point shape with a fillable interior (shapes 21 to 25) and set its fill:
# some random data
set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))
ggplot(aes(x = x, y = y), data = df) + geom_jitter(shape = 21, fill = 'skyblue')
The colour, size, and stroke aesthetics let you customize the exact look.
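If you do need two separate layers, as in the question, ggplot2 3.0.0 added a seed argument to position_jitter(). Layers that use the same seed, width and height on the same data get identical offsets, so a filled layer and an outline layer stay aligned. A sketch on the random data above; the seed 123 and the jitter amounts are arbitrary:

```r
library(ggplot2)

set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))

# A fixed seed (plus fixed width/height) makes every layer using it jitter identically
jit <- position_jitter(width = 0.2, height = 0.2, seed = 123)

p_two_layers <- ggplot(df, aes(x = x, y = y)) +
  geom_point(position = jit, size = 3, alpha = 0.6, colour = "skyblue") +  # fill layer
  geom_point(position = jit, size = 3, shape = 1, colour = "black")        # outline layer
```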
Edit:
For grouped data, map the fill aesthetic to the grouping variable, and use a scale_fill_* function to set the colour scale:
# more random data
set.seed(47)
df <- data.frame(x = runif(100), y = rnorm(100), group = sample(letters[1:3], 100, replace = TRUE))
ggplot(aes(x=x, y = y, fill=group), data = df) +
geom_jitter(size=3, alpha=0.6, shape=21) +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_fill_brewer(name = "Title", direction = -1, palette = "Set1")
I'm enjoying using tile density plots to represent probability densities. I often use the second (y) dimension to illustrate comparisons of densities between factors, but I'm having trouble introducing a third dimension. I want to use colour to represent the third dimension. How can I do this? (I've tried inserting aes references to Type in the example below but they appear to collide with the ..density.. aesthetic.)
Beginning with the following plot,
library(ggplot2)
dz <- data.frame(Type = c(rep("A", 100), rep("B", 100)),
Costs = c(rnorm(100), rnorm(100, 5, 1))
)
ggplot(dz, aes(x = Costs, y = 1)) +
stat_density(aes(fill = ..density..), geom = "tile", position = "identity") +
scale_fill_gradient(low = "white", high = "black")
What I want is a combination of the separate density plots for A and for B, with each Type shown in its own colour.
If you map fill to Type, and alpha to the density, you get more or less what you want:
ggplot(dz, aes(x = Costs, y = 1, fill=Type)) +
stat_density(aes(alpha = after_stat(density)), geom = "tile", position = "identity") +
scale_fill_manual(values=c("red", "blue"))