I'm enjoying using tile density plots to represent probability densities. I often use the second (y) dimension to illustrate comparisons of densities between factors, but I'm having trouble introducing a third dimension. I want to use colour to represent the third dimension. How can I do this? (I've tried inserting aes references to Type in the example below but they appear to collide with the ..density.. aesthetic.)
Beginning with the following plot,
library(ggplot2)
dz <- data.frame(Type = c(rep("A", 100), rep("B", 100)),
Costs = c(rnorm(100), rnorm(100, 5, 1))
)
ggplot(dz, aes(x = Costs, y = 1)) +
stat_density(aes(fill = ..density..), geom = "tile", position = "identity") +
scale_fill_gradient(low = "white", high = "black")
What I want is a combination of the following. For A:
and B:
If you map fill to Type, and alpha to the density, you get more or less what you want:
ggplot(dz, aes(x = Costs, y = 1, fill=Type)) +
stat_density(aes(alpha=..density..), geom = "tile", position = "identity") +
scale_fill_manual(values=c("red", "blue"))
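The same idea can be written with `after_stat()` instead of the `..density..` notation. This is a minimal sketch, assuming ggplot2 3.3.0 or later (where `after_stat()` is available); the seed and the named colour vector are additions for reproducibility and are not part of the original answer.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the sketch is reproducible
dz <- data.frame(
  Type  = rep(c("A", "B"), each = 100),
  Costs = c(rnorm(100), rnorm(100, 5, 1))
)

# fill encodes the factor, alpha encodes the estimated density
p <- ggplot(dz, aes(x = Costs, y = 1, fill = Type)) +
  stat_density(aes(alpha = after_stat(density)),
               geom = "tile", position = "identity") +
  scale_fill_manual(values = c(A = "red", B = "blue"))
p
```

Because the density is computed per group, each Type gets its own alpha ramp, and overlapping regions blend the two colours.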
Let's say I have the following dataset:
set.seed(42)
data <- data.frame(type = sample(LETTERS[1:2], 40, replace = T),
condition = sample(c("Control", "Treatment"), 40, replace = T),
measurement = runif(40))
And I'd like to create the facetted graph:
ggplot(data, aes( x= condition, y = measurement))+
geom_point()+
facet_wrap(~type)
I'd also like to show a baseline (with geom_hline, for example) equal to the mean of the control values (mean(data$measurement[data$condition == "Control"])). But because the control values differ between types (that is, between the facets of the graph), I can't just calculate one single mean.
Is there any way to specify a separate yintercept for geom_hline in each facet?
Something like this, but with the specified yintercept value, calculating the mean values for the control group for each individual facet:
ggplot(data, aes( x= condition, y = measurement))+
geom_point()+
geom_hline(yintercept= mean(data$measurement[data$condition == "Control"]),
linetype="dashed",
color = "red", size=1)+
facet_wrap(~type)
Thanks a lot!
Best regards,
Eugene
You can use stat_summary with fun = mean and geom = "hline", passing only the control subset to the data parameter. You can map yintercept to the y value calculated by the stat.
ggplot(data, aes(x = condition, y = measurement))+
geom_point() +
stat_summary(fun = mean, geom = "hline", aes(yintercept = after_stat(y)),
data = data[data$condition == "Control",], color = "red",
linetype = "dashed") +
facet_wrap(~type)
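An alternative, if you would rather see the baseline values explicitly, is to compute the per-facet control means first and hand them to geom_hline as its own data. This is a sketch of that approach, not part of the answer above; `baseline` is a name introduced here for illustration.

```r
library(ggplot2)

set.seed(42)
data <- data.frame(
  type        = sample(LETTERS[1:2], 40, replace = TRUE),
  condition   = sample(c("Control", "Treatment"), 40, replace = TRUE),
  measurement = runif(40)
)

# One row per facet: mean of the control measurements within each type
baseline <- aggregate(measurement ~ type,
                      data = data[data$condition == "Control", ],
                      FUN = mean)

# Because `baseline` contains the faceting variable, each panel
# only receives the line that belongs to its own type
p <- ggplot(data, aes(x = condition, y = measurement)) +
  geom_point() +
  geom_hline(data = baseline, aes(yintercept = measurement),
             linetype = "dashed", colour = "red") +
  facet_wrap(~type)
p
```

The precomputed table is also handy if the same baseline values are needed elsewhere, for example in labels.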
Long story short, I ran a bunch of stochastic simulations for each of 15 groups, and have one integer per group that I need to add to each violin in the plot, and can't seem to figure out how to do it. Here's a reproducible example:
# Making data
df <- data.frame(c(rep(1,10), rep(2,10), rep(3,10)), sample.int(100, 30), c(rep(85,10), rep(60,10), rep(55,10)))
colnames(df) <- c("Group", "Data", "Extra")
# Grouping data
df$Group <- as.factor(df$Group)
# Plotting
Violin2 <- ggplot(data = df, aes(x = Group, y = Data))+
geom_violin(aes(fill = Group, color = Group))+
stat_summary(aes(y = Data), fun=mean, geom="point", color = "navyblue", shape = 17, size = 3)+
stat_summary(aes(y = Data), fun=median, geom="point", color = "black", shape = 16, size = 3)
#geom_point(aes(y = Extra, color = "#00BB66", shape = 16, size = 3)+
Violin2
So here, I'm saying that within the df, there are three groups: 1, 2, and 3, that are applied to the "Data" column. What I need to add, are the integers from the "Extra" column of the df, as single points on each violin (so the three integers would be 85, 60, and 55).
I initially tried to add a geom_point layer, and thought Extra would be grouped by Group, just as Data was, but that didn't work (Error: Discrete value supplied to continuous scale).
I've been searching around on here a lot, and can't find a solution, so any advice would be greatly appreciated! Thanks so much in advance for any help! :)
This is the data:
And this is the plot so far:
So it's actually just one more line of code: you can stitch different geoms together in ggplot, which makes it really easy to do exactly what you're talking about. Just add
geom_point(aes(y = Extra)) +
So the whole code would look like this
ggplot(data = df, aes(x = Group, y = Data))+
geom_violin(aes(fill = Group, color = Group))+
geom_point(aes(y = Extra), size = 2, colour = "red") +
stat_summary(aes(y = Data), fun=mean, geom="point",
color = "navyblue", shape = 17, size = 3)+
stat_summary(aes(y = Data), fun=median, geom="point",
color = "black", shape = 16, size = 3)
I've coloured the points red and made them bigger but you can change that. That gives:
Your example works. The only thing to change is to avoid passing a constant value for the color argument inside aes(). Constants like that belong outside aes().
# Making data
library(ggplot2)
df <- data.frame(c(rep(1,10), rep(2,10), rep(3,10)), sample.int(100, 30), c(rep(85,10), rep(60,10), rep(55,10)))
colnames(df) <- c("Group", "Data", "Extra")
# Grouping data
df$Group <- as.factor(df$Group)
# Plotting
Violin2 <- ggplot(data = df, aes(x = Group, y = Data))+
geom_violin(aes(fill = Group, color = Group))+
stat_summary(aes(y = Data), fun=mean, geom="point", color = "navyblue", shape = 17, size = 3)+
stat_summary(aes(y = Data), fun=median, geom="point", color = "black", shape = 16, size = 3) +
geom_point(aes(y = Extra))
Violin2
Created on 2021-06-08 by the reprex package (v2.0.0)
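Since Extra holds the same value ten times per group, geom_point draws ten points on top of each other in each violin. A possible refinement, sketched here under the assumption that Extra is constant within each group, is to reduce the data to one row per group for that layer and to pass the constant colour, shape and size outside aes(). The name `extra_points` and the seed are introduced for this sketch.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the sketch is reproducible
df <- data.frame(
  Group = factor(rep(1:3, each = 10)),
  Data  = sample.int(100, 30),
  Extra = rep(c(85, 60, 55), each = 10)
)

# One row per group, so each violin gets a single point
extra_points <- unique(df[, c("Group", "Extra")])

p <- ggplot(df, aes(x = Group, y = Data)) +
  geom_violin(aes(fill = Group, colour = Group)) +
  # constants (colour, shape, size) go outside aes()
  geom_point(data = extra_points, aes(y = Extra),
             colour = "#00BB66", shape = 16, size = 3)
p
```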
I need to align the density line with the height of geom_histogram and keep count values on the y axis instead of density.
I have these 2 versions:
# Creating dataframe
library(ggplot2)
values <- c(rep(0,2), rep(2,3), rep(3,3), rep(4,3), 5, rep(6,2), 8, 9, rep(11,2))
data_to_plot <- as.data.frame(values)
# Option 1 ( y scale shows frequency, but geom_density line and geom_histogram are not matching )
ggplot(data_to_plot, aes(x = values)) +
geom_histogram(aes(y = ..count..), binwidth = 1, colour= "black", fill = "white") +
geom_density(aes(y=..count..), fill="blue", alpha = .2)+
scale_x_continuous(breaks = seq(0, max(data_to_plot$values), 1))
# Option 2 (geom_density line and geom_histogram are matching, but y scale density = 1)
ggplot(data_to_plot, aes(x = values)) +
geom_histogram(aes(y = after_stat(ndensity)), binwidth = 1, colour= "black", fill = "white") +
geom_density(aes(y = after_stat(ndensity)), fill="blue", alpha = .2)+
scale_x_continuous(breaks = seq(0, max(data_to_plot$values), 1))
What I need is the plot from Option 2, but with the y scale from Option 1. I can get it by using aes(y = 1.25*..count..) for this particular data, but my data is not static and this will not work for another dataset (just modify values to test):
# Option 3 (with coefficient in aes())
ggplot(data_to_plot, aes(x = values)) +
geom_histogram(aes(y = ..count..), binwidth = 1, colour= "black", fill = "white") +
geom_density(aes(y=1.25*..count..), fill="blue", alpha = .2)+
scale_x_continuous(breaks = seq(0, max(data_to_plot$values), 1))
Desired result: y scale shows frequency and geom_density line is matching with geom_histogram height
I cannot hardcode coefficient or bins.
This problem is close to the ones discussed here, but it did not work for my case:
Programatically scale density curve made with geom_density to similar height to geom_histogram?
How to put geom_density and geom_histogram on same counts scale
The area under a density curve is always 1, whereas counts are whole numbers. So it mostly does not make sense to plot both on the same y-axis.
The left plot (p1) shows the density line and histogram for data similar to yours; I just added some values. The height of each bar shows the proportion of counts at the corresponding x value, so the y-scale stays below 1.
The second plot (p2) shows the same as the left, but with another histogram added that shows the counts. The y-scale goes up, and the density line and density histogram shrink.
If you want to put both on the same scale, you could do this by calculating a scaling factor. I have used this scaling factor to add a secondary y-axis to the third plot and to scale that axis accordingly.
To make clear what belongs to which scale, I have coloured the second y-axis and the data belonging to it red.
library(ggplot2)
library(patchwork)
values <- c(rep(0,2),rep(1,4), rep(2,6), rep(3,8), rep(4,12), rep(5,7), rep(6,4),rep(7,2))
df <- as.data.frame(values)
p1 <- ggplot(df, aes(x = values)) +
stat_density(geom = 'line') +
geom_histogram(aes(y = ..density..), binwidth = 1,color = 'white', fill = 'red', alpha = 0.2)
p2 <- ggplot(df, aes(x = values)) +
stat_density(geom = 'line') +
geom_histogram(aes(y = ..count..), binwidth = 1, color = 'white', alpha = 0.2) +
geom_histogram(aes(y = ..density..), binwidth = 1, color = 'white', alpha = 0.2) +
ylab('density and counts')
# Find the maximum of ..density.. (largest proportion, valid for binwidth = 1)
m <- max(table(df$values)/sum(table(df$values)))
# Find the maximum count among df$values
mm <- max(table(df$values))
# Create Scaling factor for secondary axis
scaleF <- m/mm
p3 <- p1 + scale_y_continuous(
limits = c(0, m),
# Features of the first axis
name = "density",
# Add a second axis and specify its features
sec.axis = sec_axis( trans=~(./scaleF), name = 'counts')
) +
theme(axis.ticks.y.right = element_line(color = "red"),
axis.line.y.right = element_line(color = 'red'),
axis.text.y.right = element_text(color = 'red'),
axis.title.y.right = element_text(color = 'red')) +
annotate("segment", x = 5, xend = 7,
y = 0.25, yend = .25, colour = "pink", size=3, alpha=0.6, arrow=arrow())
p1 | p2 | p3
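If a single count axis is preferred over a secondary axis, one possible approach is to rescale the density curve so that its peak equals the tallest histogram bar. The sketch below uses the `scaled` variable computed by stat_density (the density rescaled to a maximum of 1) and multiplies it by the largest count. It assumes integer-valued data with binwidth = 1, so that the tallest bar equals the most frequent value; `max_count` is a name introduced here. Note that this matches peak heights only, so the area under the curve no longer equals the area of the bars.

```r
library(ggplot2)

values <- c(rep(0,2), rep(2,3), rep(3,3), rep(4,3), 5, rep(6,2), 8, 9, rep(11,2))
data_to_plot <- data.frame(values = values)

# Height of the tallest bar (assumes integer data and binwidth = 1)
max_count <- max(table(data_to_plot$values))

p <- ggplot(data_to_plot, aes(x = values)) +
  geom_histogram(aes(y = after_stat(count)), binwidth = 1,
                 colour = "black", fill = "white") +
  # `scaled` peaks at 1, so the curve peaks at max_count
  geom_density(aes(y = after_stat(scaled) * max_count),
               fill = "blue", alpha = 0.2) +
  scale_x_continuous(breaks = seq(0, max(data_to_plot$values), 1)) +
  ylab("count")
p
```

For other bin widths or non-integer data, the tallest bar would have to be computed from the actual binning instead of from table().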
I would like to apply the scale colour gradient also to the smooth line.
At the moment the code below fixes the color of the smooth line to red.
library(ggplot2)
a <- data.frame(year = 1:100, values = sin(1:100)*1000 + runif(100))
ggplot(a, aes(x = year, y = values, color = values )) + geom_line(size = 2) +
scale_colour_gradient2(
low = "blue",
mid = "white" ,
high = "red",
midpoint = 10
)+
geom_smooth(
data = a,
aes(x = year, y = values),
color = "red",
size = 2
)
But when I set color = values it doesn't work. Instead it takes the default blue.
geom_smooth(
data = a,
aes(x = year, y = values, color = values),
size = 2
)
Thanks in advance.
Use geom_smooth(aes(color=..y..)) to add a color aesthetic to geom_smooth. ..y.. is the vector of y-values internally calculated by geom_smooth to create the regression curve. In general, when you want to add an aesthetic to a summary value that's calculated internally, you need to map the aesthetic to that internal value. Here, the internal value is the ..y.. value of the smoothing function. In other cases it might be ..count.. for histograms or bar plots, or ..density.. for density plots.
Here's an example using your data. Note that I've tweaked a few of the plot parameters for illustration.
set.seed(48)
a <- data.frame(year = 1:100, values = sin(1:100)*1000 + runif(100))
ggplot(a, aes(x = year, y = values, color = values )) +
geom_line(size = 0.5) +
geom_smooth(aes(color=..y..), size=1.5, se=FALSE) +
scale_colour_gradient2(low = "blue", mid = "yellow" , high = "red",
midpoint=10) +
theme_bw()
Note that the color of the regression line does not change much because its y-values span a small range relative to the data. Here's another example with fake data that generates a more wide-ranging regression curve.
set.seed(1938)
a2 <- data.frame(year = seq(0,100,length.out=1000), values = cumsum(rnorm(1000)))
ggplot(a2, aes(x = year, y = values, color = values )) +
geom_line(size = 0.5) +
geom_smooth(aes(color=..y..), size=1.5, se=FALSE) +
scale_colour_gradient2(low = "blue", mid = "yellow" , high = "red",
midpoint=median(a2$values)) +
theme_bw()
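To check the point about the first example, where the smooth line barely changes colour, you can pull the fitted values out of the built plot and compare their range with the range of the raw data. This is a small sketch using `after_stat(y)`, the newer spelling of `..y..` (assuming ggplot2 3.3.0 or later); `smooth_y` is a name introduced here.

```r
library(ggplot2)

set.seed(48)
a <- data.frame(year = 1:100, values = sin(1:100) * 1000 + runif(100))

p <- ggplot(a, aes(x = year, y = values)) +
  geom_line() +
  geom_smooth(aes(colour = after_stat(y)), se = FALSE)

# Layer 2 is the smooth; its y column holds the fitted values
smooth_y <- ggplot_build(p)$data[[2]]$y

range(a$values)  # spread of the raw data
range(smooth_y)  # spread of the fitted curve, which the colour follows
```

If the second range is much narrower than the first, a colour scale that covers the raw data will show almost no variation along the smooth line.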
I'm trying to create a scatterplot where the points are jittered (geom_jitter), but I also want to create a black outline around each point. Currently I'm doing it by adding 2 geom_jitters, one for the fill and one for the outline:
beta <- paste("beta == ", "0.15")
ggplot(aes(x=xVar, y = yVar), data = data) +
geom_jitter(size=3, alpha=0.6, colour=my.cols[2]) +
theme_bw() +
geom_abline(intercept = 0.0, slope = 0.145950, size=1) +
geom_vline(xintercept = 0, linetype = "dashed") +
annotate("text", x = 2.5, y = 0.2, label=beta, parse=TRUE, size=5)+
xlim(-1.5,4) +
ylim(-2,2)+
geom_jitter(shape = 1,size = 3,colour = "black")
However, that results in something like this:
Because jitter randomly offsets the data, the 2 geom_jitters are not in line with each other. How do I ensure the outlines are in the same place as the fill points?
I've seen threads about this (e.g. Is it possible to jitter two ggplot geoms in the same way?), but they're pretty old and I'm not sure if anything new has been added to ggplot that would solve this issue.
The code above works if, instead of using geom_jitter, I use the regular geom_point, but I have too many overlapping points for that to be useful
EDIT:
The solution in the posted answer works. However, it doesn't quite cooperate for some of my other graphs where I'm binning by some other variable and using that to plot different colours:
ggplot(aes(x=xVar, y = yVar, color=group), data = data) +
geom_jitter(size=3, alpha=0.6, shape=21, fill="skyblue") +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_colour_brewer(name = "Title", direction = -1, palette = "Set1") +
xlim(-1.5,4) +
ylim(-2,2)
My group variable has 3 levels, and I want to colour each group level by a different colour in the brewer Set1 palette. The current solution just colours everything skyblue. What should I fill by to ensure I'm using the correct colour palette?
You don't actually have to use two layers; you can just use the fill aesthetic of a plotting character with a hole in it:
# some random data
set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))
ggplot(aes(x = x, y = y), data = df) + geom_jitter(shape = 21, fill = 'skyblue')
The colour, size, and stroke aesthetics let you customize the exact look.
Edit:
For grouped data, set the fill aesthetic to the grouping variable, and use scale_fill_* functions to set color scales:
# more random data
set.seed(47)
df <- data.frame(x = runif(100), y = rnorm(100), group = sample(letters[1:3], 100, replace = TRUE))
ggplot(aes(x=x, y = y, fill=group), data = df) +
geom_jitter(size=3, alpha=0.6, shape=21) +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_fill_brewer(name = "Title", direction = -1, palette = "Set1")
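If you do need two separate layers to share the same jitter (the original question), a possible route is position_jitter() with a fixed seed, passed to both layers. This is a sketch, not part of the answer above. It assumes a ggplot2 version where position_jitter() accepts a seed argument, and the seed value and jitter widths are arbitrary choices.

```r
library(ggplot2)

set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))

# Shared position object with a fixed seed (123 is arbitrary)
jit <- position_jitter(width = 0.2, height = 0.2, seed = 123)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(position = jit, size = 3, alpha = 0.6, colour = "skyblue") +
  geom_point(position = jit, size = 3, shape = 1, colour = "black")
p
```

Because both layers apply the same seeded jitter to the same data, the outline and the filled point land at identical positions.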