Let's say we observed two species of beetles. We want to compare their size using geom_violin() as done below:
df = data.frame(species=rep(c('species_a','species_b'),3), size=c(1,1.5,1.2,1.8,1.1,1.9))
ggplot(df, aes(x=species, y=size)) + geom_violin()
Knowing that the expected size range is [0.8,1.8] for species_a and [1.2, 1.8] for species_b...
ranges = list(species_a=c(0.8,1.8), species_b=c(1.2,1.8))
How can we easily add this range (with a grey shape for example) on the graph?
Put the ranges in a separate data frame with the species names and the minimum and maximum values:
ranges = data.frame(species=c('species_a','species_b'),
rmin=c(0.8,1.2),rmax=c(1.8,1.8))
ranges
species rmin rmax
1 species_a 0.8 1.8
2 species_b 1.2 1.8
Then pass the new data frame to geom_rect() to draw an area underneath geom_violin(). geom_blank() comes first so that the discrete x axis is set up from the original data frame.
ggplot(df, aes(x=species, y=size)) + geom_blank() +
geom_rect(data=ranges,aes(xmin=as.numeric(factor(species))-0.45,
xmax=as.numeric(factor(species))+0.45,
ymin=rmin,ymax=rmax),inherit.aes=FALSE)+
geom_violin()
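A self-contained version of this approach, as a sketch only. It assumes ggplot2 is installed, stores species as a factor so that as.numeric() returns the x positions 1 and 2, and adds a grey fill and transparency; the fill colour, the alpha value and the name p_ranges are my own choices, not part of the answer above.

```r
library(ggplot2)

df <- data.frame(species = rep(c("species_a", "species_b"), 3),
                 size = c(1, 1.5, 1.2, 1.8, 1.1, 1.9))

# Expected ranges; species is a factor so that it converts to positions 1 and 2
ranges <- data.frame(species = factor(c("species_a", "species_b")),
                     rmin = c(0.8, 1.2),
                     rmax = c(1.8, 1.8))

p_ranges <- ggplot(df, aes(x = species, y = size)) +
  geom_blank() +  # sets up the discrete x axis from df before the rectangles
  geom_rect(data = ranges,
            aes(xmin = as.numeric(species) - 0.45,
                xmax = as.numeric(species) + 0.45,
                ymin = rmin, ymax = rmax),
            fill = "grey70", alpha = 0.5,  # assumed styling for the "grey shape"
            inherit.aes = FALSE) +
  geom_violin(fill = NA)                   # transparent violins keep the range visible
p_ranges
```

Because the rectangle layer comes before geom_violin(), the grey area is drawn underneath the violins.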
You may try this:
# first, create data frame from list 'ranges'
df2 <- setNames(object = do.call(rbind.data.frame, ranges), nm = c("min_size", "max_size"))
df2$species <- rownames(df2)
# plot violins with 'df', and ranges with 'df2'.
# Set colour and size according to your own "data-ink ratio" preferences.
ggplot(data = df, aes(x = species)) +
geom_violin(aes(y = size)) +
geom_linerange(data = df2, aes(ymax = max_size, ymin = min_size), colour = "grey", size = 3)
How do I retain one variable (a single point) to not be jittered while keeping the jitter on the other categorical variable in ggplot?
Here is the code I am currently using and what the output looks like:
# load ggplot2
library(ggplot2)
library(hrbrthemes)
# A basic scatterplot with color depending on Type
p <- ggplot(dt, aes(x=Type, y=y, color=Type)) +
geom_jitter(shape=22,
alpha=0.5,
size=2) +
geom_hline(yintercept=c(1.4, 28.7, 2.65, 14.9)) +
labs(y = 'ng/g lipid', title = 'PCB 99')
# Log base 10 scale
p + scale_y_continuous(trans = 'log10')
One option is to split your data into the observations you want jittered and those you do not. Pass the first subset to geom_jitter() and the second to geom_point().
Using iris as example data:
library(ggplot2)
ggplot(iris, aes(x = Species, y = Sepal.Length, color = Species)) +
geom_jitter(
data = iris[iris$Species != "setosa", ],
shape = 22,
alpha = 0.5,
size = 2
) +
geom_point(
data = iris[iris$Species == "setosa", ],
alpha = 0.5,
size = 2
)
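The same idea applied to data shaped like the question's, as a rough sketch. The data frame below is a made-up stand-in for dt (the real one was not posted), and the category name "reference" and the variable is_fixed are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's dt: one category holds a single observation
set.seed(1)
dt <- data.frame(Type = c(rep("sample", 20), "reference"),
                 y = c(exp(rnorm(20, mean = 1)), 5))

is_fixed <- dt$Type == "reference"  # rows that should be drawn without jitter

p_jit <- ggplot(dt, aes(x = Type, y = y, color = Type)) +
  geom_jitter(data = dt[!is_fixed, ], shape = 22, alpha = 0.5, size = 2) +
  geom_point(data = dt[is_fixed, ], size = 2) +
  # Layers with different subsets can change the category order, so fix it explicitly
  scale_x_discrete(limits = c("reference", "sample")) +
  scale_y_continuous(trans = "log10")
p_jit
```

Setting the limits in scale_x_discrete() is optional, but it keeps the axis order independent of which layer is added first.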
** Edited with Repeatable Data **
I have a data.frame with plots of growth over time for 50 experimental treatments. I have plotted them as a faceted 5x10 plot grid. I also ordered them in a way that makes sense considering my experimental treatments.
I ran a regression function to find the growth rate in each treatment and saved the slope values in another data frame. I have plotted the data, the regression line, and the growth rate value. Now I want to color the background of each facet according to its regression slope, but I can't figure out how to map the fill color to a continuous variable, especially one from a different data frame with a different number of rows (the original df has 300 rows; the one I want to use has 50, one per treatment).
My code is as follows:
Df:
df <- data.frame(matrix(ncol = 3,nrow=300))
colnames(df) <- c("Trt", "Day", "Size")
df$Trt <- rep(1:50, each=6)
df$Day <- rep_len(1:6, length.out=300)
df$Size <- rep_len(c(3,5,8,9,12,12,3,7,10,16,17,20),length.out = 300)
Regression function and output dataframe:
regression=function(df){
reg_fun<-lm(formula=df$Size~df$Day)
slope<-round(coef(reg_fun)[2],3)
intercept<-round(coef(reg_fun)[1],3)
R2<-round(as.numeric(summary(reg_fun)[8]),3)
R2.Adj<-round(as.numeric(summary(reg_fun)[9]),3)
c(slope,intercept,R2,R2.Adj)
}
library(plyr)
slopevalues<-ddply(df,"Trt",regression)
colnames(slopevalues)<-c ("Trt","slope","intercept","R2","R2.Adj")
Plot:
ggplot(data=df, aes(x=Day, y=Size))+
geom_line() +
geom_point() +
xlab("Day") + ylab("Size (μm)")+
geom_smooth(method="lm",size=.5,se=FALSE)+
geom_text(data=slopevalues,
inherit.aes=FALSE,
aes(x =1, y = 16,hjust=0,
label=paste(slope)))+
facet_wrap(~ Trt, nrow=5)
What I want to do is color the backgrounds of the individual graphs according to the slope value (slopevalues$slope) on a gradient. My real data are not just 2 values repeated, so I want to do this on a gradient of colors according to that value.
Any advice welcome.
You can use geom_rect with infinite coordinates to do this:
ggplot(data=df, aes(x=Day, y=Size))+
## This is the only new bit
geom_rect(
aes(fill = slope, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf),
slopevalues,
inherit.aes = FALSE
) +
## New bit ends here
geom_line() +
geom_point() +
xlab("Day") + ylab("Size (μm)")+
geom_smooth(method="lm",size=.5,se=FALSE)+
geom_text(data=slopevalues,
inherit.aes=FALSE,
aes(x =1, y = 16,hjust=0,
label=paste(slope)))+
facet_wrap(~ Trt, nrow=5)
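A condensed, self-contained variant, as a sketch. It computes the slopes with base R instead of plyr, and adds a semi-transparent fill with an explicit gradient; the alpha value, the colours and the name p_bg are assumptions of mine, so adjust them to taste.

```r
library(ggplot2)

df <- data.frame(Trt = rep(1:50, each = 6),
                 Day = rep_len(1:6, length.out = 300),
                 Size = rep_len(c(3, 5, 8, 9, 12, 12, 3, 7, 10, 16, 17, 20),
                                length.out = 300))

# One row per treatment holding the slope of Size ~ Day
slopevalues <- do.call(rbind, lapply(split(df, df$Trt), function(d) {
  fit <- lm(Size ~ Day, data = d)
  data.frame(Trt = d$Trt[1], slope = round(unname(coef(fit)[2]), 3))
}))

p_bg <- ggplot(df, aes(x = Day, y = Size)) +
  # Infinite limits make the rectangle cover the whole panel of each facet
  geom_rect(data = slopevalues,
            aes(fill = slope, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf),
            alpha = 0.4, inherit.aes = FALSE) +
  geom_line() +
  geom_point() +
  scale_fill_gradient(low = "white", high = "steelblue", name = "Slope") +
  facet_wrap(~ Trt, nrow = 5)
p_bg
```

The rectangle layer works across data frames because slopevalues carries the faceting variable Trt, so each row is drawn only in its own panel.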
Suppose I make a violin plot, with say 10 violins, using the following code:
library(ggplot2)
library(reshape2)
df <- melt(data.frame(matrix(rnorm(500),ncol=10)))
p <- ggplot(df, aes(x = variable, y = value)) +
geom_violin()
p
I can add a dot representing the mean of each variable as follows:
p + stat_summary(fun=mean, geom="point", size=2, color="red")  # the argument was called fun.y before ggplot2 3.3.0
How can I do something similar but for arbitrary points?
For example, if I generate 10 new points, one drawn from each distribution, how could I plot those as dots on the violins?
You can pass any function to stat_summary() as long as it returns a single value, so sample() works here. Put extra arguments, such as size, in fun.args:
p + stat_summary(fun = sample, geom = "point", fun.args = list(size = 1))  # the argument was called fun.y before ggplot2 3.3.0
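For reference, a self-contained sketch of this approach. It assumes ggplot2 3.3.0 or later (where the argument is named fun), builds the long-format data with base R rather than reshape2, and uses a seed only to make the random draw repeatable; the column names mirror what melt() would produce, and p_pts is a name I introduced.

```r
library(ggplot2)

set.seed(42)
groups <- paste0("X", 1:10)
# Long-format data: 10 variables with 50 draws each
df <- data.frame(variable = factor(rep(groups, each = 50), levels = groups),
                 value = rnorm(500))

p <- ggplot(df, aes(x = variable, y = value)) +
  geom_violin()

p_pts <- p +
  # Red dot: mean of each violin
  stat_summary(fun = mean, geom = "point", size = 2, color = "red") +
  # Blue dot: one value drawn at random from each violin's own data
  stat_summary(fun = sample, fun.args = list(size = 1),
               geom = "point", size = 2, color = "blue")
p_pts
```

Note that sample() is re-run every time the plot is drawn, so the blue dots move between renders unless the seed is reset first.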
Assuming your points are qualified using the same group names (i.e., variable), you should be able to define them manually with:
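library(dplyr)  # provides group_by(), sample_n() and %>%
newdf <- group_by(df, variable) %>% sample_n(10)  # 10 random rows per violin; use sample_n(1) for one point each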
p + geom_point(data=newdf)
The points can be anything, including static numbers:
newdf <- data.frame(variable = unique(df$variable), value = seq(-2, 2, len=10))
p + geom_point(data=newdf)
I had a similar problem. Code below exemplifies the toy problem - How does one add arbitrary points to a violin plot? - and solution.
## Visualize data set that comes in base R
head(ToothGrowth)
## Make a violin plot with dose variable on x-axis, len variable on y-axis
# Convert dose variable to factor - Important!
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Plot
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1)
# Suppose you want to add 3 blue points
# [0.5, 10], [1,20], [2, 30] to the plot.
# Make a new data frame with these points
# and add them to the plot with geom_point().
TrueVals <- ToothGrowth[1:3,]
TrueVals$len <- c(10,20,30)
# Make dose variable a factor - Important for positioning points correctly!
TrueVals$dose <- as.factor(c(0.5, 1, 2))
# Plot with 3 added blue points
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1) +
geom_point(data = TrueVals, color = "blue")
I have a ggplot with a geom_text():
geom_text(y = 4, aes(label = text))
The variable text has the following format:
number1-number2
I want to know whether it is possible to define one color for number1 and another for number2 (for example, red and green).
Thanks!
One way, if the label texts for number1 and number2 are available as separate columns in the data frame, is to draw them in two geom_text() layers:
ggplot(data, aes(x,y)) + geom_text(label=data[,3], color="red", vjust=0) + geom_text(label=data[,4], color="blue", vjust=1)
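If the labels only exist as a single "number1-number2" column, they can be split first. The following is a minimal sketch under that assumption: the data frame dat, its values and the helper columns first and second are all hypothetical, and the horizontal alignment via hjust is only approximate.

```r
library(ggplot2)

# Hypothetical data: 'text' holds labels of the form "number1-number2"
dat <- data.frame(x = 1:3, y = 4, text = c("10-20", "5-7", "31-2"))

# Split each label at the hyphen into two new columns
parts <- do.call(rbind, strsplit(as.character(dat$text), "-", fixed = TRUE))
dat$first <- parts[, 1]
dat$second <- parts[, 2]

p_lab <- ggplot(dat, aes(x = x, y = y)) +
  # Left part ends just before the anchor, right part starts just after it
  geom_text(aes(label = paste0(first, " ")), hjust = 1, color = "red") +
  geom_text(label = "-", hjust = 0.5, color = "black") +
  geom_text(aes(label = paste0(" ", second)), hjust = 0, color = "darkgreen") +
  ylim(3, 5)
p_lab
```

This relies on every label containing exactly one hyphen; labels with negative numbers or extra hyphens would need a different split rule.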
You may also try annotate:
# data for plot
df <- data.frame(x = 1:5, y = 1:5)
# data for annotation
no1 <- "number1"
no2 <- "number2"
x_annot <- 4
y_annot <- 5
dodge <- 0.3
ggplot(data = df, aes(x = x, y = y)) +
geom_point() +
annotate(geom = "text", x = c(x_annot - dodge, x_annot, x_annot + dodge), y = y_annot,
label = c(no1, "-", no2),
col = c("red", "black", "green")) +
theme_classic()
I defined the labels and positions outside the annotate call, which possibly makes it easier to generate these variables more dynamically, e.g. if "number1" in fact could be calculated from the original data set, or positions be based on range of x and y.
I have data where I look at the difference in growth between a monoculture and a mixed culture for two different species. Additionally, I made a graph to make my data clear.
I want a barplot with error bars. The whole dataset is of course bigger, but this is the data.frame with the means used for the barplot:
plant          species    means
Mixed culture  Elytrigia  0.886625
Monoculture    Elytrigia  1.022667
Monoculture    Festuca    0.314375
Mixed culture  Festuca    0.078125
With this data I made a graph in ggplot2, where plant is on the x-axis and means on the y-axis, and I used a facet to divide the species.
This is my code:
limits <- aes(ymax = meansS$means + eS$se, ymin=meansS$means - eS$se)
dodge <- position_dodge(width=0.9)
myplot <- ggplot(data=meansS, aes(x=plant, y=means, fill=plant)) + facet_grid(. ~ species)
myplot <- myplot + geom_bar(stat="identity", position=dodge) + geom_errorbar(limits, position=dodge, width=0.25)
myplot <- myplot + scale_fill_manual(values=c("#6495ED","#FF7F50"))
myplot <- myplot + labs(x = "Plant treatment", y = "Shoot biomass (gr)")
myplot <- myplot + ggtitle("Plant competition")
myplot <- myplot + theme(legend.position = "none")
myplot <- myplot + theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank())
So far it is fine. However, I want to add two different horizontal lines in the two facets. For that, I used this code:
hline.data <- data.frame(z = c(0.511,0.157), species = c("Elytrigia","Festuca"))
myplot <- myplot + geom_hline(aes(yintercept = z), hline.data)
However, if I do that, I get a plot where there are two extra facets in which the two horizontal lines are plotted. Instead, I want the horizontal lines to be plotted in the facets with the bars, not in two new facets. Does anyone have an idea how to solve this?
I think it is clearer if I show the graph I get now:
Make sure that the variable species is identical in both datasets. If it is a factor in one of them, then it must be a factor in the other too.
library(ggplot2)
dummy1 <- expand.grid(X = factor(c("A", "B")), Y = rnorm(10))
dummy1$D <- rnorm(nrow(dummy1))
dummy2 <- data.frame(X = c("A", "B"), Z = c(1, 0))
# First attempt: the faceting variable X has not been converted to a factor in dummy2
ggplot(dummy1, aes(x = D, y = Y)) + geom_point() + facet_grid(~X) +
geom_hline(data = dummy2, aes(yintercept = Z))
# Second attempt: X is now a factor in both data frames
dummy2$X <- factor(dummy2$X)
ggplot(dummy1, aes(x = D, y = Y)) + geom_point() + facet_grid(~X) +
geom_hline(data = dummy2, aes(yintercept = Z))
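Applied to the data from the question, this could look like the sketch below. It rebuilds meansS from the posted table, leaves out the error bars because the se values were not posted, and uses geom_col() from current ggplot2; the names sp_levels and p_hl are my own.

```r
library(ggplot2)

# Means as posted in the question
meansS <- data.frame(
  plant = c("Mixed culture", "Monoculture", "Monoculture", "Mixed culture"),
  species = c("Elytrigia", "Elytrigia", "Festuca", "Festuca"),
  means = c(0.886625, 1.022667, 0.314375, 0.078125)
)
hline.data <- data.frame(z = c(0.511, 0.157),
                         species = c("Elytrigia", "Festuca"))

# Give the faceting variable the same type and the same levels in both data frames
sp_levels <- c("Elytrigia", "Festuca")
meansS$species <- factor(meansS$species, levels = sp_levels)
hline.data$species <- factor(hline.data$species, levels = sp_levels)

p_hl <- ggplot(meansS, aes(x = plant, y = means, fill = plant)) +
  geom_col() +
  # One line per species, drawn in the facet that matches its species value
  geom_hline(data = hline.data, aes(yintercept = z)) +
  facet_grid(. ~ species)
p_hl
```

With matching species values, each horizontal line lands in the existing facet for its species instead of creating new ones.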