ggplot2 - boxplot multiple data.frames while staying in order - r

I apologize if this is more for SO instead of CV.
I am attempting to add a second boxplot to an existing boxplot that is ordered by the mean of the plotted values. When I include the boxplot from the second data.frame (a control sample for the other plots), the original plot loses its ordering.
Below is an example:
x1 <- data.frame("V1" = runif(100, 0, 100), "siteno" = "X1") #mean = 50.3
x2 <- data.frame("V1" = runif(100, 200, 450), "siteno" = "X2") #mean = 322.4
x3 <- data.frame("V1" = runif(100, 50, 150), "siteno" = "X3") #mean = 97.8
xData <- rbind(x1,x2,x3)
xData$siteno <- with(xData, reorder(siteno, V1, mean))
zData <- data.frame("V1" = runif(300, 0, 450), "siteno" = "Z1") #mean = 224.2
#orders xData correctly
ggplot(xData, aes(x = siteno , y = V1)) +
stat_summary(fun.y=mean, colour="red", geom="point") +
geom_boxplot (aes(fill=siteno), alpha=.5, width=1, position = position_dodge(width = 1), outlier.colour = "dark gray", outlier.size = 1)
This produces the plot below, with the x variables correctly ordered by mean:
If I try the code below to add the control data, the order of the x variables is lost:
x1 <- data.frame("V1" = runif(100, 0, 100), "siteno" = "X1") #mean = 50.3
x2 <- data.frame("V1" = runif(100, 200, 450), "siteno" = "X2") #mean = 322.4
x3 <- data.frame("V1" = runif(100, 50, 150), "siteno" = "X3") #mean = 97.8
xData <- rbind(x1,x2,x3)
xData$siteno <- with(xData, reorder(siteno, V1, mean))
zData <- data.frame("V1" = runif(300, 0, 450), "siteno" = "Z1") #mean = 224.2
#adding the zData layer loses the xData ordering
ggplot(xData, aes(x = siteno , y = V1)) +
stat_summary(fun.y=mean, colour="red", geom="point") +
geom_boxplot (aes(fill=siteno), alpha=.5, width=1, position = position_dodge(width = 1), outlier.colour = "dark gray", outlier.size = 1) +
geom_boxplot(data=zData, aes(x = siteno , y = V1))
this produces the following plot with no ordering of the x variables:
The point of my graph is to show the test values ordered by their mean and then have the control values boxplot off to the right for visual reference. I imagine there could be a solution that combines the xData and zData dataframes; I am willing to try that if there are some suggestions.
Thank you for your time.

When you combine data from two data frames in one plot, the original levels (and their order) are lost, and new levels that combine the data from both data frames are used. You don't see this for the fill values because you don't map fill for the second data frame. For the discrete x scale, however, both data frames are combined, and the new levels are X1, X2, X3 and Z1.
Without merging everything into one data frame, you can use scale_x_discrete() and, in its limits= argument, call levels() to get the original level order of siteno and append Z1 as the reference level.
ggplot(xData, aes(x = siteno , y = V1)) +
stat_summary(fun.y=mean, colour="red", geom="point") +
geom_boxplot (aes(fill=siteno), alpha=.5, outlier.colour = "dark gray",
outlier.size = 1) +
geom_boxplot(data=zData, aes(x = siteno , y = V1))+
scale_x_discrete(limits=c(levels(xData$siteno),"Z1"))

Why not put them all in one data.frame and order all 4 levels there?
data2 <- rbind(xData, zData)
ggplot(data2, aes(x = siteno , y = V1)) +
stat_summary(fun.y=mean, colour="red", geom="point") +
geom_boxplot (aes(fill=siteno), alpha=.5, width=1,
position = position_dodge(width = 1),
outlier.colour = "dark gray", outlier.size = 1)
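A minimal sketch of the combined-data-frame idea with the level order made explicit, so it does not depend on how rbind() happens to merge the levels. The seed and the names site_order and allData are assumptions for illustration; the data mirror the question. Note that ggplot2 3.3.0 and later call the stat_summary() argument fun rather than fun.y.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only for reproducibility
x1 <- data.frame(V1 = runif(100, 0, 100),   siteno = "X1")
x2 <- data.frame(V1 = runif(100, 200, 450), siteno = "X2")
x3 <- data.frame(V1 = runif(100, 50, 150),  siteno = "X3")
xData <- rbind(x1, x2, x3)
zData <- data.frame(V1 = runif(300, 0, 450), siteno = "Z1")

# Order the test sites by their mean, then append the control level last.
site_order <- names(sort(tapply(xData$V1, xData$siteno, mean)))
allData <- rbind(xData, zData)
allData$siteno <- factor(allData$siteno, levels = c(site_order, "Z1"))

p <- ggplot(allData, aes(x = siteno, y = V1)) +
  geom_boxplot(aes(fill = siteno), alpha = 0.5,
               outlier.colour = "darkgray", outlier.size = 1) +
  stat_summary(fun = mean, colour = "red", geom = "point")
```

Printing p should draw the three test sites in ascending order of their means with Z1 at the far right, because the x axis follows the factor levels.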

Capture the desired order, e.g., something like:
ord <- names(sort(tapply(xData$V1, xData$siteno, mean)))
Then pass it to scale_x_discrete(), e.g. limits = c(ord, "Z1").

Related

Adding extra geom_point with different scales

I am trying to add another set of points on top of a geom_point. The problem is that the initial dataset has a factor with 3 levels while the second doesn't. I want the first set of points to have different colors and shapes according to the factor levels and the second to be uniform. The plot is like this:
plot = ggplot() +
geom_point(data1,
aes(x = x1, y = y1,
color = factor, shape = factor)) +
scale_color_manual(values = factor_color) +
scale_shape_manual(values = factor_shape)
When I add the other set of points,
plot +
geom_point(data2,
aes(x = x2, y = y2))
I get this error
Error: Insufficient values in manual scale. 4 needed but only 3
provided.
I understand why this happens.
But when I set the scales inside the second geom_point, color = "red" and shape = 1 I get this error
Error: Continuous value supplied to discrete scale
Is there a solution to this problem?
EDIT
Example data have this structure
data1 = data.frame(factor = factor(rep(letters[1:3], 3)),
x1 = rnorm(9),
y1 = rnorm(9))
data2 = data.frame(x2 = rnorm(6),
y2 = rnorm(6))
factor_color = scales::hue_pal()(3)
factor_shape = c(19, 15, 17)
In my example, I have no error.
I just guessed your data:
data1 <- data.frame(factor = factor(letters[1:4]),
x1 = rnorm(4),
y1 = rnorm(4))
data2 <- data.frame(factor = factor(letters[1:3]),
x2 = rnorm(3),
y2 = rnorm(3))
factor_color <- scales::hue_pal()(4)
factor_shape <- 1:4
Your code with two small changes: I specified data = in both geom_point() calls.
library(ggplot2)
plot <- ggplot() +
geom_point(data = data1,
aes(x = x1, y = y1,
color = factor, shape = factor)) +
scale_color_manual(values = factor_color) +
scale_shape_manual(values = factor_shape)
plot +
geom_point(data = data2,
aes(x = x2, y = y2))
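For the uniform second layer described in the question, here is a sketch built on the data structure from the EDIT. It assumes the errors come from two things: passing the data frame positionally (the first positional argument of geom_point() is mapping, not data) and putting constants where the manual scales try to handle them. Constants such as colour = "red" and shape = 1 are therefore set outside aes(). The seed is an assumption.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only for reproducibility
data1 <- data.frame(factor = factor(rep(letters[1:3], 3)),
                    x1 = rnorm(9), y1 = rnorm(9))
data2 <- data.frame(x2 = rnorm(6), y2 = rnorm(6))
factor_color <- scales::hue_pal()(3)
factor_shape <- c(19, 15, 17)

p <- ggplot() +
  # First layer: colour and shape follow the three factor levels.
  geom_point(data = data1,
             aes(x = x1, y = y1, colour = factor, shape = factor)) +
  # Second layer: fixed appearance, set outside aes() so the manual
  # scales never see these values.
  geom_point(data = data2, aes(x = x2, y = y2),
             colour = "red", shape = 1) +
  scale_colour_manual(values = factor_color) +
  scale_shape_manual(values = factor_shape)
```

Because the second layer maps only x and y, the legend should list just the three factor levels.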

Remove points with 0 density (no data) in stat_density_2d(geom = 'point')

I have two dataframes, one which I want to make a stat_density_2d plot using a 'raster' geom and one in which I want to use a 'point' geom. For the point geom I want to remove any point where there is no data though, as measured by a point size of 0.
The following is my code:
library(tidyverse)
set.seed(1)
#tibble for raster density plot
df <- tibble(x = runif(1000000, min = -7, max = 5),
y = runif(1000000, min = 0, max = 1000))
#tibble for point density plot
df2 <- tibble(x = runif(20000, min = -2, max = 2),
y = runif(20000, min = 0, max = 500))
#create the density plot
p1 <- ggplot(NULL, aes(x=x, y=y) ) +
stat_density_2d(data = df, aes(fill = stat(density)), geom = "raster", contour = FALSE) +
scale_fill_gradient(low="transparent", high="red") +
stat_density_2d(data = df2, geom = "point", aes(size = ..density..), n = 40, contour = FALSE) +
theme_bw() +
theme(text=element_text(size=18)) +
ylim(0, 1000) + xlim(-7, 5)
p1
which returns:
But where the points are smallest (outside the bounds specified in the df2 tibble) I don't want any density points to be shown. Is there any way to remove these?
Here's a hack, though I don't know how robust it is to differences in data.
BLUF: add scale_radius(range=c(-1,6)).
I reduced your data a lot so that it doesn't take 5 minutes to render.
set.seed(1)
df <- tibble(x = runif(1000, min = -7, max = 5),
y = runif(1000, min = 0, max = 1000))
df2 <- tibble(x = runif(20, min = -2, max = 2),
y = runif(20, min = 0, max = 500))
Four plots:
Your code (my data), no other change;
scale_radius();
scale_radius(range = c(-0.332088004, 6)); and
scale_radius(range = c(-1, 6)).
This is surely a hack, and I don't know how to find a more precise way of filtering out specific levels.
The modified code:
p1 <- ggplot(NULL, aes(x=x, y=y) ) +
stat_density_2d(data = df, aes(fill = stat(density)), geom = "raster", contour = FALSE) +
scale_fill_gradient(low="transparent", high="red") +
stat_density_2d(data = df2, geom = "point", aes(size = ..density..), n = 40, contour = FALSE) +
theme_bw() +
# scale_radius() +
# scale_radius(range = c(-0.332088004, 6)) +
scale_radius(range = c(-1, 6)) +
theme(text=element_text(size=18)) +
ylim(0, 1000) + xlim(-7, 5)
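A possible alternative to the scale_radius() hack is to compute the density grid yourself and filter it before plotting. The sketch below uses MASS::kde2d() on the reduced df2 from this answer; the 1% cutoff and the names grid, kept and cutoff are assumptions chosen for illustration, not values taken from the question.

```r
library(ggplot2)

set.seed(1)
df2 <- data.frame(x = runif(20, min = -2, max = 2),
                  y = runif(20, min = 0, max = 500))

# Evaluate the 2D kernel density on a 40 x 40 grid spanning the plot limits.
k <- MASS::kde2d(df2$x, df2$y, n = 40, lims = c(-7, 5, 0, 1000))
grid <- expand.grid(x = k$x, y = k$y)
grid$density <- as.vector(k$z)  # z[i, j] belongs to (x[i], y[j])

# Hypothetical cutoff: drop grid cells below 1% of the peak density.
cutoff <- 0.01 * max(grid$density)
kept <- grid[grid$density > cutoff, ]

p <- ggplot(kept, aes(x = x, y = y)) +
  geom_point(aes(size = density)) +
  xlim(-7, 5) + ylim(0, 1000) +
  theme_bw()
```

The raster layer for df can still be added underneath with stat_density_2d(data = df, ...). The cutoff makes the notion of "no data" explicit, at the cost of having to pick a threshold.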

Multiple data lines in a single ggplot2

I would like to plot multiple lines in a single ggplot, where each line would represent relationship between x and y given two or more parameters.
I know how to do that for one parameter:
Take following example data:
library(ggplot2)
library(reshape2)
rs = data.frame(seq(200, 1000, by=200),
runif(5),
runif(5),
rbinom(n = 5, size = 1, prob = 0.5))
names(rs) = c("x_", "var1", "var2", "par")
melted = melt(rs, id.vars="x_")
ggplot(data = melted,
aes(x = x_, y = value, group = variable, col = variable)) +
geom_point() +
geom_line(linetype = "dashed")
This plots three lines: one for var1, one for var2 and one for par.
However, I would like four lines: one for var1 given par=0 and another one for var1 given par=1, and the same then again for var2.
How would this scale up, for example if I want that the condition is a combination of multiple parameters (e.g. par2 + par)?
If you melt the data in a different way, you can use par to change the shape and linetype of your lines, so it's nice and clear which line is which:
rs_melt = melt(rs, id.vars = c("x_", "par"))
ggplot(rs_melt, aes(x = x_, y = value, colour = variable,
shape = factor(par), linetype = factor(par))) +
geom_line(size = 1.1) +
geom_point(size = 3) +
labs(shape = "par", linetype = "par")
Output:
You need to adjust your melt call and add a group column that combines both par and variable. I think the code below is what you want:
library(reshape)
library(ggplot2)
rs = data.frame(seq(200, 1000, by=200), runif(5), runif(5), rbinom(n = 5, size = 1, prob = 0.5))
names(rs)=c("x_", "var1", "var2", "par")
melted = melt(rs, id.vars=c("x_", "par"))
melted$group <- paste(melted$par, melted$variable)
ggplot(data=melted, aes(x=x_, y=value, group =group, col=group))+ geom_point() + geom_line(linetype = "dashed")
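For the follow-up about combining several parameters, one option is interaction(), which builds one group per observed combination. In this sketch par2 is a hypothetical second parameter that does not appear in the question, the seed is assumed, and the long format is built in base R as a stand-in for melt(rs, id.vars = c("x_", "par", "par2")) so that only ggplot2 is needed.

```r
library(ggplot2)

set.seed(42)  # assumed seed, only for reproducibility
rs <- data.frame(
  x_   = seq(200, 1000, by = 200),
  var1 = runif(5),
  var2 = runif(5),
  par  = rbinom(n = 5, size = 1, prob = 0.5),
  par2 = rbinom(n = 5, size = 1, prob = 0.5)  # hypothetical second parameter
)

# Long format: keep every parameter as an id column, stack var1 and var2.
ids <- rs[c("x_", "par", "par2")]
melted <- rbind(cbind(ids, variable = "var1", value = rs$var1),
                cbind(ids, variable = "var2", value = rs$var2))

# One factor level per observed combination of variable, par and par2.
melted$group <- interaction(melted$variable, melted$par, melted$par2,
                            sep = " / ", drop = TRUE)

p <- ggplot(melted, aes(x = x_, y = value, group = group, colour = group)) +
  geom_point() +
  geom_line(linetype = "dashed")
```

With only five x values, some combinations may contain a single point and so show no line. More parameters can be added as further arguments to interaction().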

Significance annotation in facets

I am trying to annotate the plot below in a pairwise fashion - in each facet compare corresponding samples in the variable. Essentially comparing CTR from pos to CTR from neg and so on. I can't seem to get it to work.
Here is my data and plots:
library(ggpubr)
#data.frame
samples <- rep(c('LA', 'EA', 'CTR'), 300)
variable <- sample(c('pos', 'neg'), 900, replace = T)
stim <- rep(c('rp','il'), 450)
population <- sample(c('EM','CM','TEMRA'), 900, replace = T)
values <- runif(900, min = 0, max = 100)
df <- data.frame(samples, variable, stim, population, values)
#test and comparisons
test_comparisons <- list(c('neg', 'pos'))
test <- compare_means(values ~ variable, data = df, method = 'wilcox.test',
group.by = c('samples', 'stim', 'population'))
#plot
ggplot(aes(x= variable, y = values, fill = samples), data = df) +
geom_boxplot(position = position_dodge(0.85)) +
geom_dotplot(binaxis='y', stackdir='center', position =
position_dodge(0.85), dotsize = 1.5) +
facet_grid(population ~ stim, scales = 'free_x') +
stat_compare_means(comparisons = test_comparisons, label = 'p.signif') +
theme_bw()
This only produces 1 comparison per facet between pos and neg instead of 3...What am I doing wrong?
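You can use the following code. The only change is that samples is added to the facet_grid() formula (population ~ stim+samples), so each sample gets its own column of panels and its own pos vs. neg comparison: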
samples <- rep(c('LA', 'EA', 'CTR'), 300)
variable <- sample(c('pos', 'neg'), 900, replace = T)
stim <- rep(c('rp','il'), 450)
population <- sample(c('EM','CM','TEMRA'), 900, replace = T)
values <- runif(900, min = 0, max = 100)
df <- data.frame(samples, variable, stim, population, values)
#test and comparisons
test_comparisons <- list(c('neg', 'pos'))
test <- compare_means(values ~ variable, data = df, method = 'wilcox.test',
group.by = c('samples', 'stim', 'population'))
#plot
ggplot(aes(x= variable, y = values, fill = samples), data = df) +
geom_boxplot(position = position_dodge(0.85)) +
geom_dotplot(binaxis='y', stackdir='center', position =
position_dodge(0.85), dotsize = 1.5) +
facet_grid(population ~ stim+samples, scales = 'free_x') +
stat_compare_means(comparisons = test_comparisons, label = 'p.signif') +
theme_bw()
Hope this will rectify your problem

How can I use different color palettes for different layers in ggplot2?

Is it possible to plot two sets of data on the same plot, but use different color palettes for each set?
testdf <- data.frame( x = rnorm(100),
y1 = rnorm(100, mean = 0, sd = 1),
y2 = rnorm(100, mean = 10, sd = 1),
yc = rnorm(100, mean = 0, sd = 3))
ggplot(testdf, aes(x, y1, colour = yc)) + geom_point() +
geom_point(aes(y = y2))
What I would like to see is one set of data, say y1, in blues (color set by yc), and the other set in reds (again color set by yc).
The legend should then show 2 color scales, one in blue, the other red.
Thanks for your suggestions.
If you translate the "blues" and "reds" to varying transparency, then it is not against ggplot's philosophy. So, using Thierry's Molten version of the data set:
ggplot(Molten, aes(x, value, colour = variable, alpha = yc)) + geom_point()
Should do the trick.
That's not possible with ggplot2. I think it is against the philosophy of ggplot2 because it complicates the interpretation of the plot.
Another option is to use different shapes to separate the points.
testdf <- data.frame( x = rnorm(100),
y1 = rnorm(100, mean = 0, sd = 1),
y2 = rnorm(100, mean = 10, sd = 1),
yc = rnorm(100, mean = 0, sd = 3))
library(reshape2) # provides melt()
Molten <- melt(testdf, id.vars = c("x", "yc"))
ggplot(Molten, aes(x, value, colour = yc, shape = variable)) + geom_point()
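A workaround that stays within ggplot2 and comes closer to the two-legend request relies on points having two colour-like aesthetics. This is a sketch under assumptions: one layer goes through the colour scale and the other, drawn with the fillable shape 21, goes through the fill scale. The palette endpoints, legend titles and seed are chosen only for illustration.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only for reproducibility
testdf <- data.frame(x  = rnorm(100),
                     y1 = rnorm(100, mean = 0,  sd = 1),
                     y2 = rnorm(100, mean = 10, sd = 1),
                     yc = rnorm(100, mean = 0,  sd = 3))

p <- ggplot(testdf, aes(x = x)) +
  # Layer 1: ordinary points, coloured through the colour scale (blues).
  geom_point(aes(y = y1, colour = yc)) +
  # Layer 2: shape 21 has a fill, so yc goes through the fill scale (reds).
  geom_point(aes(y = y2, fill = yc), shape = 21,
             colour = "transparent", size = 2) +
  scale_colour_gradient(low = "lightblue", high = "darkblue") +
  scale_fill_gradient(low = "mistyrose", high = "darkred") +
  labs(y = "value", colour = "yc (y1)", fill = "yc (y2)")
```

This only works for geoms that have both a colour and a fill, and it is limited to two palettes. The caveat about interpretability raised in the answer above still applies.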
