Stacked boxplot and scatter plot - group BOTH by same variable - r

I am trying to create a scatter plot stacked on a boxplot. Similar dummy data below. The boxplot behaves well, as I want one boxplot for each of the three "exp" variables both "before" AND "after" (as seen in graph below, 6 box plots).
The problem however is that I also want the scatter plot data to lie on top of the correct plot (divided by before/after). Now, the points are just in between the two box plots, as you can see.
library(ggplot2)
exp <- rep(c("smile", "neutral", "depressor"), each=5, times=2)
time <- rep(c("before", "after"), each = 15)
result <- rnorm(15, mean=50, sd=4)
result <- append(result, c(rnorm(15, mean=47, sd=3)))
data <- data.frame(exp, time, result)
ggplot(data, aes(exp, result, fill=time)) +
geom_boxplot() +
geom_point()
I would really appreciate some input, thanks in advance!

Does this solve your issue?
Here the time grouping is added in geom_point, and the points are dodged by the same width as the boxplots.
ggplot(data, aes(exp, result, fill=time)) +
stat_boxplot(width=0.5, position = position_dodge(1)) +
geom_boxplot(position = position_dodge(1), outlier.shape = NA)+
geom_point(aes(fill = time, group = time), color="black",
position = position_jitterdodge(jitter.width = .1, dodge.width = 1))
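To see why this lines up, here is a minimal sketch (not the answerer's exact code) that shares one dodge specification between the boxplot and point layers and then inspects where the points land. The seed, the width of 0.75, and the names `dat`, `dodge` and `point_x` are assumptions made for illustration.

```r
library(ggplot2)

set.seed(42)  # hypothetical seed, only so the sketch is reproducible
dat <- data.frame(
  exp    = rep(c("smile", "neutral", "depressor"), each = 5, times = 2),
  time   = rep(c("before", "after"), each = 15),
  result = c(rnorm(15, mean = 50, sd = 4), rnorm(15, mean = 47, sd = 3))
)

# One dodge object shared by both layers, so boxes and points shift together
dodge <- position_dodge(width = 0.75)

p <- ggplot(dat, aes(exp, result, fill = time)) +
  geom_boxplot(position = dodge, outlier.shape = NA) +
  geom_point(position = dodge, shape = 21)

# x positions at which the points (layer 2) were actually drawn
point_x <- as.numeric(layer_data(p, 2)$x)
```

If the points should also be spread horizontally, swapping the point layer's position for position_jitterdodge() with the same dodge.width should keep them over the right box.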

You could use geom_jitter like this:
exp <- rep(c("smile", "neutral", "depressor"), each=5, times=2)
time <- rep(c("before", "after"), each = 15)
result <- rnorm(15, mean=50, sd=4)
result <- append(result, c(rnorm(15, mean=47, sd=3)))
data <- data.frame(exp, time, result)
library(ggplot2)
ggplot(data, aes(exp, result, fill=time)) +
geom_boxplot() +
geom_jitter()
Created on 2022-08-30 with reprex v2.0.2

Related

Map straight shaded error onto a coord_polar plot

I was wondering if someone could please offer a suggestion on how to sort out the shaded error for each line in this radar plot. I've tried several different approaches but am not getting what I want.
library(ggplot2)
library(scales)
# Make some data
Group.no <- 3
Group.names <- c("1","2","3")
Metric.no <- 4
Metriclist <- c("M1", "M4", "M6","M8")
Metric <- c(rep(c(Metriclist), each = Group.no))
Group <- c(rep(c(Group.names), times = Metric.no))
Mg <- c(87.7, 93.8, 72.5, 190.3, 170.9, 138.4, 283.2, 248.7, 196.5, 340.6, 307.9, 240.9)
d <- data.frame(Metric, Group, Mg)
d$lowCI <- Mg-8
d$highCI <- Mg+8
# Plot data
Plot <- ggplot(d, aes(x = Metric, y = Mg, group = Group)) +
geom_polygon(aes(group = Group, colour = Group), fill = NA, size = 1.1) +
geom_ribbon(aes(x=Metric,y=Mg,ymin=lowCI,ymax=highCI, group = Group, fill=Group), alpha=.3) +
coord_polar(start = -((180/Metric.no)*(pi/180)))+
theme_light()
Plot
As you can see, geom_ribbon isn't plotting the lower and upper CIs in line with geom_polygon. I am at the limit of my knowledge on this. Can anyone offer a fix so that the shaded CIs track the polygon and both are drawn as straight lines?
Thanks in advance for any solutions!
With a bit of fiddling with the data, you can just about make it work. Create a separate dataframe that has the confidence intervals (note that we need to repeat the first point to make the intervals work correctly between M8 and M1) with:
i <- c(seq_along(d[[1]]), which(d$Metric=="M1"))
dCI <- rbind(data.frame(d[i,1:2], CI=d$lowCI[i], type="low") ,
data.frame(d[i,1:2], CI=d$highCI[i], type="high"))
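As a quick check of what the index vector does, the sketch below rebuilds the question's data with base R only and counts the vertices per group. The names `vertex_counts` and `g1_low` are made up for this illustration.

```r
# Same toy data as in the question
Metric <- rep(c("M1", "M4", "M6", "M8"), each = 3)
Group  <- rep(c("1", "2", "3"), times = 4)
Mg <- c(87.7, 93.8, 72.5, 190.3, 170.9, 138.4,
        283.2, 248.7, 196.5, 340.6, 307.9, 240.9)
d <- data.frame(Metric, Group, Mg)
d$lowCI  <- d$Mg - 8
d$highCI <- d$Mg + 8

# All rows, followed by the M1 rows a second time
i <- c(seq_len(nrow(d)), which(d$Metric == "M1"))

dCI <- rbind(
  data.frame(d[i, 1:2], CI = d$lowCI[i],  type = "low"),
  data.frame(d[i, 1:2], CI = d$highCI[i], type = "high")
)

# Vertices per Group/type combination (4 metrics plus the repeated M1)
vertex_counts <- table(dCI$Group, dCI$type)
print(vertex_counts)

# Within one group, the lower ring starts and ends on M1
g1_low <- dCI[dCI$Group == "1" & dCI$type == "low", ]
print(g1_low$Metric)
```

Each group's lower and upper ring returns to M1, which is what lets the shaded band span the gap between M8 and M1.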
Then we can add the confidence intervals as a polygon with:
Plot <- ggplot(d, aes(x = Metric, y = Mg, group = Group)) +
geom_polygon(aes(group = Group, colour = Group), fill = NA, size = 1.1) +
geom_polygon(aes(x=Metric,y=CI, group = Group, fill=Group), data=dCI, alpha=.3) +
coord_polar(start = -((180/Metric.no)*(pi/180)))+
theme_light()
Plot
This produces the output:

Removing the borders in geom_boxplot in ggplot2

This should seem relatively straightforward but I can't find an argument which would allow me to do this and I've searched Google and Stack for an answer.
Sample code:
library(ggplot2)
library(plotly)
dat <- data.frame(cond = factor(rep(c("A","B"), each=200)), rating = c(rnorm(200),rnorm(200, mean=.8)))
p <- ggplot(dat, aes(x=cond, y=rating, fill=cond)) + geom_boxplot()
p <- ggplotly(p)
This outputs the first graph; I would like something like the second.
I tried including colour=cond but that gets rid of the median.
Two possible hacks for consideration, using the same dataset as Marco Sandri's answer.
Hack 1. If you don't really need it to work in plotly, just static ggplot image:
ggplot(dat, aes(x=cond, y=rating, fill=cond)) +
geom_boxplot() +
geom_boxplot(aes(color = cond),
fatten = NULL, fill = NA, coef = 0, outlier.alpha = 0,
show.legend = FALSE)
This overlays the original boxplot with a version that's essentially an outline of the outer box, hiding the median (fatten = NULL), fill colour (fill = NA), whiskers (coef = 0) & outliers (outlier.alpha = 0).
However, it doesn't appear to work well with plotly. I've tested it with the dev version of ggplot2 (as recommended by plotly) to no avail. See output below:
Hack 2. If you need it to work in plotly:
library(dplyr)
ggplot(dat %>%
group_by(cond) %>%
mutate(rating.IQR = case_when(rating <= quantile(rating, 0.3) ~ quantile(rating, 0.25),
TRUE ~ quantile(rating, 0.75))),
aes(x=cond, y=rating, fill=cond)) +
geom_boxplot() +
geom_boxplot(aes(color = cond, y = rating.IQR),
fatten = NULL, fill = NA)
(ggplot output is same as above)
plotly doesn't seem to honour the coef = 0 and outlier.alpha = 0 arguments, so this hack creates a modified version of the y variable in which everything at or below P30 is set to P25 and everything above is set to P75. The result is a boxplot with no outliers and no whiskers, whose median coincides with the upper box limit at P75.
It's more cumbersome, but it works in plotly.
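The effect of collapsing the y variable can be checked without any plotting. This is a rough base R sketch for a single group; the seed and the names `rating_iqr` and `collapsed` are assumptions for illustration.

```r
set.seed(1)  # hypothetical seed for reproducibility
rating <- rnorm(200)

p25 <- unname(quantile(rating, 0.25))
p30 <- unname(quantile(rating, 0.30))
p75 <- unname(quantile(rating, 0.75))

# Collapse every observation onto one of two values, as in Hack 2
rating_iqr <- ifelse(rating <= p30, p25, p75)

# Minimum, quartiles and maximum of the collapsed variable
collapsed <- unname(quantile(rating_iqr, c(0, 0.25, 0.5, 0.75, 1)))
print(collapsed)
```

The minimum and lower quartile both sit at P25, and the median, upper quartile and maximum all sit at P75, so there is nothing left to draw as whiskers or outliers.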
Here is an inelegant solution based on grobs:
set.seed(1)
dat <- data.frame(cond = factor(rep(c("A","B"), each=200)),
rating = c(rnorm(200),rnorm(200, mean=.8)))
library(ggplot2)
library(plotly)
p <- ggplot(dat, aes(x=cond, y=rating, fill=cond)) + geom_boxplot()
# Generate a ggplot2 plot grob
g <- ggplotGrob(p)
# The first box-and-whiskers grob
box_whisk1 <- g$grobs[[6]]$children[[3]]$children[[1]]
pos.box1 <- which(grepl("geom_crossbar",names(box_whisk1$children)))
g$grobs[[6]]$children[[3]]$children[[1]]$children[[pos.box1]]$children[[1]]$gp$col <-
g$grobs[[6]]$children[[3]]$children[[1]]$children[[pos.box1]]$children[[1]]$gp$fill
# The second box-and-whiskers grob
box_whisk2 <- g$grobs[[6]]$children[[3]]$children[[2]]
pos.box2 <- which(grepl("geom_crossbar",names(box_whisk2$children)))
g$grobs[[6]]$children[[3]]$children[[2]]$children[[pos.box2]]$children[[1]]$gp$col <-
g$grobs[[6]]$children[[3]]$children[[2]]$children[[pos.box2]]$children[[1]]$gp$fill
library(grid)
grid.draw(g)
P.S. To my knowledge, the above code cannot be used for generating plotly graphs.

Violin plots with additional points

Suppose I make a violin plot, with say 10 violins, using the following code:
library(ggplot2)
library(reshape2)
df <- melt(data.frame(matrix(rnorm(500),ncol=10)))
p <- ggplot(df, aes(x = variable, y = value)) +
geom_violin()
p
I can add a dot representing the mean of each variable as follows:
# fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0)
p + stat_summary(fun = mean, geom = "point", size = 2, color = "red")
How can I do something similar but for arbitrary points?
For example, if I generate 10 new points, one drawn from each distribution, how could I plot those as dots on the violins?
You can give any function to stat_summary, provided it returns a single value, so one option is the function sample. Pass extra arguments, such as size, through fun.args:
p + stat_summary(fun = sample, geom = "point", fun.args = list(size = 1))
Assuming your points are qualified using the same group names (i.e., variable), you should be able to define them manually with:
library(dplyr)
newdf <- group_by(df, variable) %>% sample_n(10)
p + geom_point(data=newdf)
The points can be anything, including static numbers:
newdf <- data.frame(variable = unique(df$variable), value = seq(-2, 2, len=10))
p + geom_point(data=newdf)
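For the case in the question (one point drawn from each distribution), a dplyr-free variant might look like the sketch below. It uses base R's stack() in place of melt(); the seed and the name `wide` are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)  # hypothetical seed for reproducibility
wide <- data.frame(matrix(rnorm(500), ncol = 10))
df <- stack(wide)                     # base R alternative to melt()
names(df) <- c("value", "variable")   # match the column names used above

# One observed value drawn at random from each column
newdf <- data.frame(
  variable = factor(names(wide), levels = levels(df$variable)),
  value    = vapply(wide, function(col) sample(col, 1), numeric(1))
)

p <- ggplot(df, aes(x = variable, y = value)) +
  geom_violin() +
  geom_point(data = newdf, colour = "red", size = 2)
```

The only requirement is that the extra data frame uses the same column names (and factor levels) as the aesthetics mapped in the main plot.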
I had a similar problem. Code below exemplifies the toy problem - How does one add arbitrary points to a violin plot? - and solution.
## Visualize data set that comes in base R
head(ToothGrowth)
## Make a violin plot with dose variable on x-axis, len variable on y-axis
# Convert dose variable to factor - Important!
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Plot
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1)
# Suppose you want to add 3 blue points
# [0.5, 10], [1,20], [2, 30] to the plot.
# Make a new data frame with these points
# and add them to the plot with geom_point().
TrueVals <- ToothGrowth[1:3,]
TrueVals$len <- c(10,20,30)
# Make dose variable a factor - Important for positioning points correctly!
TrueVals$dose <- as.factor(c(0.5, 1, 2))
# Plot with 3 added blue points
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin(trim = FALSE) +
geom_boxplot(width=0.1) +
geom_point(data = TrueVals, color = "blue")

How to add different lines for facets

I have data where I look at the difference in growth between a monoculture and a mixed culture for two different species. Additionally, I made a graph to make my data clear.
I want a barplot with error bars, the whole dataset is of course bigger, but for this graph this is the data.frame with the means for the barplot.
plant          species    means
Mixed culture  Elytrigia  0.886625
Monoculture    Elytrigia  1.022667
Monoculture    Festuca    0.314375
Mixed culture  Festuca    0.078125
With this data I made a graph in ggplot2, where plant is on the x-axis and means on the y-axis, and I used a facet to divide the species.
This is my code:
limits <- aes(ymax = meansS$means + eS$se, ymin=meansS$means - eS$se)
dodge <- position_dodge(width=0.9)
myplot <- ggplot(data=meansS, aes(x=plant, y=means, fill=plant)) + facet_grid(. ~ species)
myplot <- myplot + geom_bar(stat = "identity", position = dodge) + geom_errorbar(limits, position = dodge, width = 0.25)
myplot <- myplot + scale_fill_manual(values=c("#6495ED","#FF7F50"))
myplot <- myplot + labs(x = "Plant treatment", y = "Shoot biomass (gr)")
myplot <- myplot + labs(title = "Plant competition")
myplot <- myplot + theme(legend.position = "none")
myplot <- myplot + theme(panel.grid.minor = element_blank(), panel.grid.major = element_blank())
So far it is fine. However, I want to add two different horizontal lines in the two facets. For that, I used this code:
hline.data <- data.frame(z = c(0.511,0.157), species = c("Elytrigia","Festuca"))
myplot <- myplot + geom_hline(aes(yintercept = z), hline.data)
However, if I do that, I get a plot where there are two extra facets in which the two horizontal lines are plotted. Instead, I want the horizontal lines to be plotted in the facets with the bars, not in two new facets. Does anyone have an idea how to solve this?
I think it makes it clearer if I put the graph I create now:
Make sure that the variable species is identical in both datasets. If it is a factor in one of them, then it must be a factor in the other too.
library(ggplot2)
dummy1 <- expand.grid(X = factor(c("A", "B")), Y = rnorm(10))
dummy1$D <- rnorm(nrow(dummy1))
dummy2 <- data.frame(X = c("A", "B"), Z = c(1, 0))
ggplot(dummy1, aes(x = D, y = Y)) + geom_point() + facet_grid(~X) +
geom_hline(data = dummy2, aes(yintercept = Z))
dummy2$X <- factor(dummy2$X)
ggplot(dummy1, aes(x = D, y = Y)) + geom_point() + facet_grid(~X) +
geom_hline(data = dummy2, aes(yintercept = Z))
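Applied to the question's own data, the same idea could be sketched as follows. The means come from the table in the question; the error bars are left out because the standard errors were not posted, and geom_col() is used here as a stand-in for the original bar layer.

```r
library(ggplot2)

# Means taken from the table in the question
meansS <- data.frame(
  plant   = c("Mixed culture", "Monoculture", "Monoculture", "Mixed culture"),
  species = factor(c("Elytrigia", "Elytrigia", "Festuca", "Festuca")),
  means   = c(0.886625, 1.022667, 0.314375, 0.078125)
)

hline.data <- data.frame(
  z = c(0.511, 0.157),
  # Same type and same levels as meansS$species
  species = factor(c("Elytrigia", "Festuca"), levels = levels(meansS$species))
)

p <- ggplot(meansS, aes(x = plant, y = means, fill = plant)) +
  geom_col() +
  geom_hline(data = hline.data, aes(yintercept = z)) +
  facet_grid(. ~ species)

# Panels that the hline layer (layer 2) was assigned to
hline_panels <- layer_data(p, 2)$PANEL
```

With matching factor levels, each line should be assigned to one of the two existing panels rather than to a panel of its own.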

Plotting two variables using ggplot2 - same x axis

I have two graphs with the same x axis - the range of x is 0-5 in both of them.
I would like to combine both of them to one graph and I didn't find a previous example.
Here is what I got:
c <- ggplot(survey, aes(often_post,often_privacy)) + stat_smooth(method="loess")
c <- ggplot(survey, aes(frequent_read,often_privacy)) + stat_smooth(method="loess")
How can I combine them?
The y axis is "often privacy" and in each graph the x axis is "often post" or "frequent read".
I thought I can combine them easily (somehow) because the range is 0-5 in both of them.
Many thanks!
Example code for Ben's solution.
library(ggplot2)
library(reshape2)
#Sample data
survey <- data.frame(
often_post = runif(10, 0, 5),
frequent_read = 5 * rbeta(10, 1, 1),
often_privacy = sample(10, replace = TRUE)
)
#Reshape the data frame
survey2 <- melt(survey, measure.vars = c("often_post", "frequent_read"))
#Plot using colour as an aesthetic to distinguish lines
(p <- ggplot(survey2, aes(value, often_privacy, colour = variable)) +
geom_point() +
geom_smooth()
)
You can use + to add further layers to the same ggplot object. For example, to plot points and smoothed lines for both pairs of columns:
ggplot(survey, aes(often_post,often_privacy)) +
geom_point() +
geom_smooth() +
geom_point(aes(frequent_read,often_privacy)) +
geom_smooth(aes(frequent_read,often_privacy))
Try this:
df <- data.frame(x=x_var, y=y1_var, type='y1')
df <- rbind(df, data.frame(x=x_var, y=y2_var, type='y2'))
ggplot(df, aes(x, y, group=type, col=type)) + geom_line()
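Made concrete for the survey columns in the question, the rbind approach might look like this sketch. The `survey` data frame here is a hypothetical stand-in generated at random, since the asker's data was not posted.

```r
library(ggplot2)

set.seed(1)  # hypothetical seed for reproducibility
# Hypothetical stand-in for the asker's survey data
survey <- data.frame(
  often_post    = runif(50, 0, 5),
  frequent_read = runif(50, 0, 5),
  often_privacy = sample(10, 50, replace = TRUE)
)

# Stack the two x variables into one long data frame, tagging their origin
df <- rbind(
  data.frame(x = survey$often_post,    y = survey$often_privacy, type = "often_post"),
  data.frame(x = survey$frequent_read, y = survey$often_privacy, type = "frequent_read")
)

# One smoothed line per original variable, on a shared x axis
p <- ggplot(df, aes(x, y, colour = type)) +
  geom_smooth(method = "loess", formula = y ~ x)
```

This gives the same long layout as the melt() answer, just built by hand, so colour (or group) can distinguish the two curves.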
