Adding additional points to ggplot Boxplot


I have built a simple boxplot using ggplot, and I am trying to add an additional theoretical data-point - 'theoretical' in the sense that it did not form part of the original boxplot, but is linked to another dataset I would like to make a comparison to...
Here is my boxplot at present with some dummy data.
library(ggplot2)
library(dplyr)    # provides %>%
library(viridis)  # provides scale_fill_viridis()
# create a dataset
data <- data.frame(
name=c( rep("A",10), rep("B",10), rep("B",10), rep("C",10), rep('D', 10) ),
value=c( rnorm(10, 10, 3), rnorm(10, 10, 1), rnorm(10, 4, 2), rnorm(10, 6, 2), rnorm(10, 8, 4) )
)
# Plot
data %>%
ggplot( aes(x=name, y=value, fill=name)) +
geom_boxplot() +
scale_fill_viridis(discrete = TRUE, alpha=0.5) +
geom_jitter(position=position_jitter(0.2), color="black", size=2.0, alpha=0.9, pch=21)
If I had the below array, where each value represents a theoretical data-point for each condition from a different distribution, how would I include that data-point on the above plot (with a different plot character)?
A_new <- c(5)
B_new <- c(6)
C_new <- c(10)
D_new <- c(7)
new_vals <- c(A_new, B_new, C_new, D_new)

You can do this by saving the original ggplot object in a variable and then adding additional layers via "+" later on.
x=data %>%
ggplot( aes(x=name, y=value, fill=name)) +
geom_boxplot() +
geom_jitter(position=position_jitter(0.2), color="black", size=2.0, alpha=0.9, pch=21)
new_data <- data.frame(name=c("A", "B", "C", "D"), value=new_vals)
x + geom_jitter(data=new_data, aes(x=name, y=value, fill=name), position=position_jitter(.2), color="blue", size=1.5, pch=20)
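Because the extra values are single theoretical points, it may be preferable to draw them without jitter so they stay at their exact positions. The following is only a sketch of that variation, not the answerer's code: it swaps geom_jitter for geom_point in the extra layer, and the seed, the four-group dummy data, the shape (23, a filled diamond) and the red fill are my own assumptions.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the dummy data is reproducible
data <- data.frame(
  name  = rep(c("A", "B", "C", "D"), each = 10),
  value = c(rnorm(10, 10, 3), rnorm(10, 4, 2), rnorm(10, 6, 2), rnorm(10, 8, 4))
)

# one theoretical value per condition, taken from the question
new_data <- data.frame(name = c("A", "B", "C", "D"), value = c(5, 6, 10, 7))

p <- ggplot(data, aes(x = name, y = value, fill = name)) +
  geom_boxplot() +
  geom_jitter(position = position_jitter(0.2), color = "black",
              size = 2, alpha = 0.9, pch = 21) +
  # extra layer: its own data, no jitter, a distinct plot character
  geom_point(data = new_data, shape = 23, fill = "red", color = "black", size = 3)
p
```

The extra layer inherits the x and y mappings from ggplot(), so new_data only needs columns called name and value.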

Related

Stacked boxplot and scatter plot - group BOTH by same variable

I am trying to create a scatter plot stacked on a boxplot. Similar dummy data below. The boxplot behaves well, as I want one boxplot for each of the three "exp" variables both "before" AND "after" (as seen in graph below, 6 box plots).
The problem however is that I also want the scatter plot data to lie on top of the correct plot (divided by before/after). Now, the points are just in between the two box plots, as you can see.
exp <- rep(c("smile", "neutral", "depressor"), each=5, times=2)
time <- rep(c("before", "after"), each = 15)
result <- rnorm(15, mean=50, sd=4)
result <- append(result, c(rnorm(15, mean=47, sd=3)))
data <- data.frame(exp, time, result)
ggplot(data, aes(exp, result, fill=time)) +
geom_boxplot() +
geom_point()
I would really appreciate some input, thanks in advance!

Does this solve your issue?
Here the time group is added in geom_point, and the points are dodged by the same width (1) as the boxes.
ggplot(data, aes(exp, result, fill=time)) +
stat_boxplot(width=0.5, position = position_dodge(1)) +
geom_boxplot(position = position_dodge(1), outlier.shape = NA)+
geom_point(aes(fill = time, group = time), color="black",
position = position_jitterdodge(jitter.width = .1, dodge.width = 1))
(Figures: the plot before and after the change; images not included.)
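If jitter is not needed, plain dodging may be enough. The sketch below is my own variation, not the answerer's code: it keeps the default boxplot and dodges the points by 0.75, which I believe matches the default total width of the dodged boxes. The seed and shape 21 (so that the fill colour shows on the points) are assumptions.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed for reproducible dummy data
exp    <- rep(c("smile", "neutral", "depressor"), each = 5, times = 2)
time   <- rep(c("before", "after"), each = 15)
result <- c(rnorm(15, mean = 50, sd = 4), rnorm(15, mean = 47, sd = 3))
data   <- data.frame(exp, time, result)

p <- ggplot(data, aes(exp, result, fill = time)) +
  geom_boxplot(outlier.shape = NA) +   # hide outliers, the raw points are drawn anyway
  geom_point(position = position_dodge(width = 0.75), shape = 21)
p

# inspect where the points ended up on the x axis
pts <- ggplot_build(p)$data[[2]]
head(pts[, c("x", "y", "group")])
```

The points are grouped automatically by the combination of exp and time (via fill), so each group is shifted to sit over its own box.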

You could use geom_jitter like this:
exp <- rep(c("smile", "neutral", "depressor"), each=5, times=2)
time <- rep(c("before", "after"), each = 15)
result <- rnorm(15, mean=50, sd=4)
result <- append(result, c(rnorm(15, mean=47, sd=3)))
data <- data.frame(exp, time, result)
library(ggplot2)
ggplot(data, aes(exp, result, fill=time)) +
geom_boxplot() +
geom_jitter()
Created on 2022-08-30 with reprex v2.0.2

How to add labels to facet_wrap(ed) geom_count plot?

I am creating a faceted plot using facet_wrap. I want text labels to be included inside the bubbles. Instead it seems the total is included as the label, i.e. all graphs have the same numbers but different bubble sizes (the sizes are correct).
My code:
Category1 <- c('A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B')
Category2 <- c('W','V','W','V','W','V','W','V','W','V','W','V','W','V','W','V','W','V','W','V')
Class <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
df <- data.frame(Category1, Category2, Class)
g <- ggplot(df, aes(Category1, Category2))
g <- g + facet_wrap(Class ~ ., nrow = 3) + geom_count(col="tomato3", show.legend=F) + scale_size_continuous(range = c(5, 10)) +
labs(subtitle="Count Plot", y="Category2", x="Category1", title="Cat1 vs Cat2")
g
g2 <- g + geom_text(data=ggplot_build(g)$data[[1]], aes(x, y, label=n), size=2) #+ scale_size(range = c(5, 15))
g2
I expect the text inside each bubble to show the count that determines its size. But in the actual result all graphs have the same numbers. I want a small bubble to show a correspondingly small number.

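The problem is that the data from ggplot_build(g)$data[[1]] does not contain the faceting variable Class (only a PANEL column), so geom_text cannot assign the labels to facets and draws all of them in every panel. You need to create the count data beforehand and use it for plotting.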
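To check this, it can help to inspect the built layer data directly. The sketch below rebuilds the question's plot and looks at the columns; the last step, merging the PANEL-to-Class mapping from the built layout back in, is my own assumption about one way to keep the ggplot_build route working, not something the answer proposes.

```r
library(ggplot2)

# same data as in the question, written with rep()
Category1 <- rep(c("A", "B", "C"), length.out = 20)
Category2 <- rep(c("W", "V"), length.out = 20)
Class     <- rep(1:4, length.out = 20)
df <- data.frame(Category1, Category2, Class)

g <- ggplot(df, aes(Category1, Category2)) +
  facet_wrap(Class ~ ., nrow = 3) +
  geom_count(col = "tomato3", show.legend = FALSE)

built      <- ggplot_build(g)
layer_data <- built$data[[1]]
names(layer_data)               # has PANEL and n, but no Class column
"Class" %in% names(layer_data)

# assumption: the built layout maps each PANEL to its Class value
labels <- merge(layer_data, built$layout$layout[, c("PANEL", "Class")], by = "PANEL")
g2 <- g + geom_text(data = labels, aes(x, y, label = n), size = 2)
```

Counting beforehand, as described next, is simpler and does not depend on the internal structure of the built plot.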
Create count data
library(tidyverse)
df_count <- df %>%
count(Class, Category1, Category2)
Plot
There are two ways to incorporate this new data.
Method 1
The first example I show is to use both df and df_count. This method will modify your code minimally:
g <- ggplot(df, aes(Category1, Category2))
g <- g + facet_wrap(Class ~ ., nrow = 3) + geom_count(col="tomato3", show.legend=F) +
geom_text(data = df_count, aes(Category1, Category2, label=n), size=2) +
scale_size_continuous(range = c(5, 10)) +
labs(subtitle="Count Plot", y="Category2", x="Category1", title="Cat1 vs Cat2")
g
The line geom_text(data = df_count, aes(Category1, Category2, label=n), size=2) + is added.
Method 2
This method uses only the count data. It uses geom_point() instead of geom_count() and alters the size using the variable n. This method is probably better in terms of code readability.
g_alternative <- ggplot(df_count, aes(Category1, Category2, label = n)) +
facet_wrap(Class ~ ., nrow = 3) +
geom_point(col="tomato3", aes(size = n), show.legend=F) +
geom_text() +
scale_size_continuous(range = c(5, 10)) +
labs(subtitle="Count Plot", y="Category2", x="Category1", title="Cat1 vs Cat2")
g_alternative

ggplot2 geom_tile diagonal line overlay

I'm looking for a way to produce a diagonal slash from the bottom left to the top right corner of a cell within a plot made using geom_tile.
The input is a melted data frame with two categorical factor columns, sample and gene. I'd like to use something like geom_segment, but I'm not able to specify fractional increments. Any ideas on the best way to accomplish this?
edit: Here is a reproducible example, I can't share one from my own data, as it's protected patient information.
df <- data_frame( gene = c('TP53','TP53','MTOR','BRACA1'),
sample = c('A','B','A','B'),
diagonal = c(FALSE,TRUE,TRUE,FALSE),
effect = c('missense', 'nonsense', 'missense', 'silent') )
ggplot(df, aes(sample, gene)) + geom_tile(aes(fill = effect))
what I'm looking for:

One way to do it:
library(ggplot2)
df <- data.frame(
x = rep(c(2, 5, 7, 9, 12), 2),
y = rep(c(1, 2), each = 5),
z = factor(1:10),
w = rep(diff(c(0, 4, 6, 8, 10, 14)), 2)
)
p <- ggplot(df, aes(x, y)) + geom_tile(aes(fill = z))
gb <- ggplot_build(p)
p + geom_segment(data=gb$data[[1]][1:2, ],
aes(x=xmin, xend=xmax, y=ymin, yend=ymax),
color="white")
In your example, you could also rely on the indices of the factor levels like this:
library(ggplot2)
df <- data.frame( gene = c('TP53','TP53','MTOR','BRACA1'),
sample = c('A','B','A','B'),
diagonal = c(FALSE,TRUE,TRUE,FALSE),
effect = c('missense', 'nonsense', 'missense', 'silent'),
stringsAsFactors = TRUE ) # needed since R 4.0.0, where strings are no longer converted to factors by default
df$cross <- c(FALSE,TRUE,TRUE,FALSE)
ggplot(df, aes(sample, gene)) +
geom_tile(aes(fill = effect)) +
geom_segment(data=transform(subset(df, cross), sample=as.numeric(sample), gene=as.numeric(gene)),
aes(x=sample-.49, xend=sample+.49, y=gene-.49, yend=gene+.49),
color="white", size=2)
(Note that I used data.frame and not dplyr::data_frame, so that both columns become factors. Since R 4.0.0 this also requires stringsAsFactors = TRUE, because as.numeric() on a character column returns NA.)
If you want a legend:
ggplot(df, aes(sample, gene)) +
geom_tile(aes(fill = effect)) +
geom_segment(data=transform(subset(df, !!cross), sample=as.numeric(sample), gene=as.numeric(gene)),
aes(x=sample-.49, xend=sample+.49, y=gene-.49, yend=gene+.49, color=cross),
size=2) +
scale_color_manual(values=c("TRUE"="white", "FALSE"=NA))
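The factor-index idea can also be written so that it does not depend on how data.frame treats strings. This is a minimal sketch under my own assumptions: the columns are converted with factor() explicitly, the helper columns x and y are hypothetical names, and the segment spans the full tile (plus or minus 0.5) rather than 0.49.

```r
library(ggplot2)

df <- data.frame(
  gene     = c("TP53", "TP53", "MTOR", "BRACA1"),
  sample   = c("A", "B", "A", "B"),
  diagonal = c(FALSE, TRUE, TRUE, FALSE),
  effect   = c("missense", "nonsense", "missense", "silent")
)

# explicit factors, so the level order is known regardless of R version
df$gene   <- factor(df$gene)
df$sample <- factor(df$sample)

# keep only the tiles that need a slash and turn their positions into integers
slashes   <- subset(df, diagonal)
slashes$x <- as.integer(slashes$sample)  # hypothetical helper column
slashes$y <- as.integer(slashes$gene)    # hypothetical helper column

p <- ggplot(df, aes(sample, gene)) +
  geom_tile(aes(fill = effect)) +
  geom_segment(
    data = slashes,
    aes(x = x - 0.5, xend = x + 0.5, y = y - 0.5, yend = y + 0.5),
    color = "white"
  )
p
```

A discrete axis places level i at position i, which is why integer codes plus or minus half a tile reach the tile corners.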

You can use geom_abline and tweak intercept and slope to get what you want. Note that it draws a single line across the whole panel rather than one line per tile. More info and examples here.
ggplot(df, aes(sample, gene)) +
geom_tile(aes(fill = effect)) +
geom_abline(intercept = 1, slope = 1, color="white", size=2)

If you don't actually want specific lines, but just want to highlight, you can simply draw dots:
ggplot(df, aes(sample, gene)) + geom_tile(aes(fill = effect)) +
geom_point(aes(sample, gene))
You can make it look like a line: geom_point(aes(sample, gene), shape='/', size=10, color='white')
To have the lines be only on some tiles, simply pass only the rows with those coordinates to geom_point: geom_point(data=filter(df, diagonal), aes(sample, gene))
Alternatively, you can hack it with a manual shape scale: geom_point(aes(sample, gene, shape=diagonal)) + scale_shape_manual(values=c(' ', '/'))

Axis Labels that are ggplot2 objects / grobs

I wish to use ggplot2 objects/grobs/plots as axis labels.
Here is my toy example:
library(dplyr)
library(ggplot2)
# master plot
df <- data_frame(y = c("unchanging", "increasing", "decreasing"), x = c(20, 50, 30))
ggplot(df, aes(x, y)) + geom_point()
# fxn generates ggplot2 object specifying a line plot from two points
two_pt_line_plot <- function(y1, y2) {
df <- data_frame(y = c(y1, y2), x = c("from", "to"))
ggplot(df, aes(x,y, group = 1)) + geom_line(size = 4) +
xlab(NULL) + ylab(NULL) +
scale_x_discrete(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0))
}
# make the three plot objects, name them appropriately.
grobs <- Map(two_pt_line_plot, c(.5,0,1), c(.5, 1, 0))
names(grobs) <- df$y
grobs
#> $unchanging
#> $increasing
#> $decreasing
I want to programmatically generate this:
The only thing I can currently think of is to somehow layer the plots over as facets, with the theming hacked to the max to make it look like it belongs. But I haven't been able to do that yet, and it seems like a very hack-y solution. I therefore thought I would throw it out there.

group by two columns in ggplot2

Is it possible to group by two columns, so that the cross product is drawn by geom_point() and geom_smooth()?
As example:
frame <- data.frame(
series = rep(c('a', 'b'), 6),
sample = rep(c('glass','water', 'metal'), 4),
data = c(1:12))
ggplot(frame, aes()) # ...
Such that the points 6 and 12 share a group, but not with 3.

Taking the example from this question, using interaction to combine two columns into a new factor:
# Data frame with two continuous variables and two factors
set.seed(0)
x <- rep(1:10, 4)
y <- c(rep(1:10, 2)+rnorm(20)/5, rep(6:15, 2) + rnorm(20)/5)
treatment <- gl(2, 20, 40, labels=letters[1:2])
replicate <- gl(2, 10, 40)
d <- data.frame(x=x, y=y, treatment=treatment, replicate=replicate)
ggplot(d, aes(x=x, y=y, colour=treatment, shape = replicate,
group=interaction(treatment, replicate))) +
geom_point() + geom_line()

For example, with interaction() inside a layer's aes():
qplot(round, price, data=firm, group=id, color=id, geom='line') +
geom_smooth(aes(group=interaction(size, type)))

Why not just paste those two columns together and use that variable as groups?
frame$grp <- paste(frame[,1],frame[,2])
A somewhat more formal way to do this would be to use the function interaction.
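Applied to the question's data, the interaction() approach might look like the sketch below. It assumes the columns are named series, sample and data (created with = rather than <-); the idx column and the choice of geom_line are my own additions for illustration.

```r
library(ggplot2)

frame <- data.frame(
  series = rep(c("a", "b"), 6),
  sample = rep(c("glass", "water", "metal"), 4),
  data   = 1:12
)
frame$idx <- seq_len(nrow(frame))  # hypothetical x variable: the row number

# one group per combination of series and sample
frame$grp <- interaction(frame$series, frame$sample)

p <- ggplot(frame, aes(x = idx, y = data, colour = grp, group = grp)) +
  geom_point() +
  geom_line()
p

# rows 6 and 12 are both b.metal, row 3 is a.metal
frame$grp[c(3, 6, 12)]
```

The same expression can be passed inline as aes(group = interaction(series, sample)) without creating the grp column.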
