ggplot2 geom_tile diagonal line overlay - r

I'm looking for a way to draw a diagonal slash from the bottom-left to the top-right corner of a cell within a plot made using geom_tile.
The input is a melted data frame with two categorical factor columns, sample and gene. I'd like to use something like geom_segment, but I'm not able to specify fractional increments. Any ideas on the best way to accomplish this?
edit: Here is a reproducible example, I can't share one from my own data, as it's protected patient information.
df <- data_frame( gene = c('TP53','TP53','MTOR','BRACA1'),
sample = c('A','B','A','B'),
diagonal = c(FALSE,TRUE,TRUE,FALSE),
effect = c('missense', 'nonsense', 'missense', 'silent') )
ggplot(df, aes(sample, gene)) + geom_tile(aes(fill = effect))
what I'm looking for:

One way to do it:
library(ggplot2)
df <- data.frame(
x = rep(c(2, 5, 7, 9, 12), 2),
y = rep(c(1, 2), each = 5),
z = factor(1:10),
w = rep(diff(c(0, 4, 6, 8, 10, 14)), 2)
)
p <- ggplot(df, aes(x, y)) + geom_tile(aes(fill = z))
gb <- ggplot_build(p)
p + geom_segment(data=gb$data[[1]][1:2, ],
aes(x=xmin, xend=xmax, y=ymin, yend=ymax),
color="white")
In your example, you could also rely on the indices of the factor levels like this:
library(ggplot2)
df <- data.frame( gene = c('TP53','TP53','MTOR','BRACA1'),
sample = c('A','B','A','B'),
diagonal = c(FALSE,TRUE,TRUE,FALSE),
effect = c('missense', 'nonsense', 'missense', 'silent'),
stringsAsFactors = TRUE ) # needed in R >= 4.0 so that sample and gene are factors
df$cross <- c(F,T,T,F)
ggplot(df, aes(sample, gene)) +
geom_tile(aes(fill = effect)) +
geom_segment(data=transform(subset(df, !!cross), sample=as.numeric(sample), gene=as.numeric(gene)),
aes(x=sample-.49, xend=sample+.49, y=gene-.49, yend=gene+.49),
color="white", size=2)
(Note that I used data.frame and not dplyr::data_frame, so that both columns become factors. In R 4.0 and later, data.frame only does this when called with stringsAsFactors = TRUE.)
If you want a legend:
ggplot(df, aes(sample, gene)) +
geom_tile(aes(fill = effect)) +
geom_segment(data=transform(subset(df, !!cross), sample=as.numeric(sample), gene=as.numeric(gene)),
aes(x=sample-.49, xend=sample+.49, y=gene-.49, yend=gene+.49, color=cross),
size=2) +
scale_color_manual(values=c("TRUE"="white", "FALSE"=NA))
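The factor-index trick above works because a discrete axis places each factor level at an integer position (1, 2, 3, ...) and a default tile extends half a unit on each side. The following is a minimal sketch of that arithmetic in base R; `tile_diagonals` is a hypothetical helper name, not part of ggplot2, and it assumes tiles of the default width and height of 1.

```r
# Same toy data as above, with explicit factors so the level indices are defined.
df <- data.frame(
  gene     = factor(c("TP53", "TP53", "MTOR", "BRACA1")),
  sample   = factor(c("A", "B", "A", "B")),
  diagonal = c(FALSE, TRUE, TRUE, FALSE)
)

# Hypothetical helper: corner-to-corner segment coordinates for flagged tiles.
# `half` is half the tile size (0.5 reaches the corners; 0.49 leaves a small gap).
tile_diagonals <- function(data, half = 0.5) {
  hit <- data[data$diagonal, ]
  x <- as.integer(hit$sample)  # level index on the x axis
  y <- as.integer(hit$gene)    # level index on the y axis
  data.frame(x = x - half, xend = x + half,
             y = y - half, yend = y + half)
}

segs <- tile_diagonals(df)
print(segs)
```

The resulting data frame can be passed as `data` to geom_segment with `aes(x = x, xend = xend, y = y, yend = yend)`.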

You can use geom_abline. You can tweak intercept and slope to get what you want. See the geom_abline documentation for more info and examples.
ggplot(df, aes(sample, gene)) +
geom_tile(aes(fill = effect)) +
geom_abline(intercept = 1, slope = 1, color="white", size=2)

If you don't actually want specific lines, but just want to highlight, you can simply draw dots:
ggplot(df, aes(sample, gene)) + geom_tile(aes(fill = effect)) +
geom_point(aes(sample, gene))
You can make it look like a line: geom_point(aes(sample, gene), shape='/', size=10, color='white')
To have the lines be only on some tiles, simply pass only the rows with those coordinates to geom_point: geom_point(data=filter(df, diagonal), aes(sample, gene))
Alternatively, you can hack it with a manual shape scale: geom_point(aes(sample, gene, shape=diagonal)) + scale_shape_manual(values=c(' ', '/'))

Related

Prevent reordering within facet_wrap()

I have an issue with ggplot() reordering my data. Example code is below. I reordered the levels of the factor feed to my liking, but after the str_extract() call in facet_wrap(), the facets revert to the original order. Is there a way to prevent that from occurring? For my actual code, it is important that I can use a regex within facet_wrap():
library(dplyr) # mutate()
library(stringr) # str_extract()
library(ggplot2)
data <- chickwts
data <- mutate(data, time = 1:nrow(data))
lvl <- c("linseed", "meatmeal", "sunflower", "soybean",
"casein", "horsebean")
data$feed <- factor(data$feed, levels = lvl)
ggplot(data, aes(x = time, y = weight, color = feed)) +
geom_line(size = 1) + geom_point(size = 1.75) +
facet_wrap(~str_extract(feed,"[a-z]+"))
You could put the factor inside the facet_wrap:
ggplot(data, aes(x = time, y = weight, color = feed)) +
geom_line(size = 1) + geom_point(size = 1.75) +
facet_wrap(~ factor(str_extract(feed,"[a-z]+"), levels = lvl))
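The reason this works is that str_extract() returns a plain character vector, so the custom level order is lost and the facets fall back to alphabetical order. A small check outside of ggplot, assuming the stringr package is installed, might look like this:

```r
library(stringr)

lvl <- c("linseed", "meatmeal", "sunflower", "soybean",
         "casein", "horsebean")
feed <- factor(chickwts$feed, levels = lvl)

# str_extract() yields character, which carries no level order.
extracted <- str_extract(as.character(feed), "[a-z]+")

default_levels  <- levels(factor(extracted))                # alphabetical
restored_levels <- levels(factor(extracted, levels = lvl))  # custom order

print(default_levels)
print(restored_levels)
```

Re-wrapping the extracted strings in factor(..., levels = lvl) is what restores the intended panel order.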

r ggplot when two colors overlap

I have some code to generate a plot; the only problem is that there are many overlapping colors.
When two colors overlap, how do I specify the dominant color?
For example, there are 4 black points where indicator = threshold, one at each of the 4 x-axis categories. However, the black points at "Wire" and "ACH" do not show up because they overlap with blue points, and the black point at "RDFI" barely shows up. How can I make black the dominant color when two colors overlap? Thanks in advance!
ggplot(df, aes(a-axis, y-axis), color=indicator)) +
geom_quasirandom(groupOnX=TRUE, na.rm = TRUE) +
labs(title= 'chart', x='x-axis', y= 'y-axis') +
scale_color_manual(name = 'indicator', values=c("#99ccff","#000000" ))
To specify the dominant color, you can use the function new_scale() and its aliases new_scale_color() and new_scale_fill() from the ggnewscale package.
As an example, let's overlay some measurements on a contour map of topography using the beloved volcano dataset:
library(ggplot2)
library(ggnewscale)
# Equivalent to melt(volcano)
topography <- expand.grid(x = 1:nrow(volcano),
y = 1:ncol(volcano))
topography$z <- c(volcano)
# point measurements of something at a few locations
set.seed(42)
measurements <- data.frame(x = runif(30, 1, 80),
y = runif(30, 1, 60),
thing = rnorm(30))
dominant point:
ggplot(mapping = aes(x, y)) +
geom_contour(data = topography, aes(z = z, color = stat(level))) +
# Color scale for topography
scale_color_viridis_c(option = "D") +
# geoms below will use another color scale
new_scale_color() +
geom_point(data = measurements, size = 3, aes(color = thing)) +
# Color scale applied to geoms added after new_scale_color()
scale_color_viridis_c(option = "A")
dominant contour:
ggplot(mapping = aes(x, y)) +
geom_point(data = measurements, size = 3, aes(color = thing)) +
scale_color_viridis_c(option = "A")+
new_scale_color() +
geom_contour(data = topography, aes(z = z, color = stat(level))) +
scale_color_viridis_c(option = "D")
Your problem may not lie with which color is dominant. You have selected colors that will show up often, and you may be losing the bottom of your y axis. The code in your example cannot possibly have produced that plot; it has errors.
Here is a simple example that shows one way to overcome your problem: simply overplot the threshold points after you have plotted the beeswarm.
library(dplyr)
library(ggbeeswarm)
distro <- data.frame(
'variable'=rep(c('runif','rnorm'),each=1000),
'value'=c(runif(2000, min=-3, max=3))
)
distro$indicator <- "NA"
distro[3,3] <- "Threshhold"
distro[163,3] <- "Threshhold"
ggplot2::ggplot(distro,aes(variable, value, color=indicator)) +
geom_quasirandom(groupOnX=TRUE, na.rm = TRUE, width=0.1) +
scale_color_manual(name = 'indicator', values=c("#99ccff","#000000")) +
geom_point(data = distro %>% filter(indicator == "Threshhold"))
You sort your data based on the color variable (your indicator).
Basically you want your black dots to be plotted last = on top of the other ones.
df <- df[order(df$indicator, decreasing = TRUE), ] # reorder whole rows, not just the column
#Tidyverse solution
df <- df %>% arrange(desc(indicator))
Depending on your levels, you may or may not need to reverse the sort.
Then you just plot.
pd <- tibble(x=rnorm(1000), y=1, indicator=sample(c("A","B"), replace=T, size = 1000))
ggplot(pd, aes(x=x,y=y,color=indicator)) + geom_point()
pd <- pd %>% arrange(indicator)
ggplot(pd, aes(x=x,y=y,color=indicator)) + geom_point()
pd <- pd %>% arrange(desc(indicator))
ggplot(pd, aes(x=x,y=y,color=indicator)) + geom_point()
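Since geom_point draws rows in data order, the key step is reordering whole rows so that the group you want on top comes last. A minimal base R sketch of that step follows; the level names "normal" and "threshold" are made up for illustration:

```r
set.seed(1)
pd <- data.frame(
  x = rnorm(8),
  y = 1,
  indicator = c("normal", "threshold", "normal", "normal",
                "threshold", "normal", "normal", "normal")
)

# Make the group that should be drawn on top the LAST factor level,
# then order entire rows by that factor (order() keeps ties stable).
pd$indicator <- factor(pd$indicator, levels = c("normal", "threshold"))
pd_sorted <- pd[order(pd$indicator), ]

print(pd_sorted)
```

Setting the level order explicitly avoids having to guess whether an ascending or descending sort puts the right group last.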

Adding reference lines to a bar-plot with ggplot in R

This is a minimal example that shows the plots I am trying to make.
Data looks like this:
plot1 = data.frame(
Factor1 = as.factor(rep('A', 4)),
Factor2 = as.factor(rep(c('C', 'D'), 2)),
Factor3 = as.factor(c( rep('E', 2), rep('F', 2))),
Y = c(0.225490, 0.121958, 0.218182, 0.269789)
)
plot2 = data.frame(
Factor1 = as.factor(rep('B', 4)),
Factor2 = as.factor(rep(c('C', 'D'), 2)),
Factor3 = as.factor(c( rep('E', 2), rep('F', 2))),
Y = c(-0.058585, -0.031686, 0.013141, 0.016249)
)
While the basic code for plotting looks like this:
require(ggplot2)
require(grid)
p1 <- ggplot(data=plot1, aes(x=Factor2, y=Y, fill=factor(Factor3))) +
ggtitle('Type: A') +
coord_cartesian(ylim = c(-0.10, 0.30)) +
geom_bar(position=position_dodge(.9), width=0.5, stat='identity') +
scale_x_discrete(name='Regime',
labels=c('C', 'D')) +
scale_y_continuous('Activations') +
scale_fill_brewer(palette='Dark2', name='Background:',
breaks=c('E','F'),
labels=c('E','F')) +
theme(axis.text=element_text(size=11),
axis.title.x=element_text(size=13, vjust=-0.75),
axis.title.y=element_text(size=13, vjust=0.75),
legend.text=element_blank(),
legend.title=element_blank(),
legend.position='none',
plot.title=element_text(hjust=0.5))
p2 <- ggplot(data=plot2, aes(x=Factor2, y=Y, fill=factor(Factor3))) +
ggtitle('Type: B') +
coord_cartesian(ylim = c(-0.10, 0.30)) +
geom_bar(position=position_dodge(.9), width=0.5, stat='identity') +
scale_x_discrete(name='Regime',
labels=c('C', 'D')) +
scale_y_continuous('Activations') +
scale_fill_brewer(palette='Dark2', name='Background:',
breaks=c('E','F'),
labels=c('E','F')) +
theme(axis.text=element_text(size=11),
axis.title.x=element_text(size=13, vjust=-0.75),
axis.title.y=element_blank(),
legend.text=element_text(size=11),
legend.title=element_text(size=13),
plot.title=element_text(hjust=0.5))
pushViewport(viewport(
layout=grid.layout(1, 2, heights=unit(4, 'null'),
widths=unit(c(1,1.17), 'null'))))
print(p1, vp=viewport(layout.pos.row=1, layout.pos.col=1))
print(p2, vp=viewport(layout.pos.row=1, layout.pos.col=2))
And the figure looks like this:
However, I would need something like this:
Thick black lines are the reference values. They are constant and the Figure presents that "reference situation". However, in other plots that I need to produce bars will change but the reference values should remain the same to make the comparisons straightforward and easy. I know I should be using geom_segment() but those lines in my attempts to make this work are just missing the bars.
Any help/advice? Thanks!
I was able to do this using geom_errorbarh. For instance, with the second figure:
p1 +
geom_errorbarh(
aes(xmin = as.numeric(Factor2)-.2,xmax = as.numeric(Factor2)+.2), #+/-.2 for width
position = position_dodge(0.9), size = 2, height = 0
)
OUTPUT:
And, if I understand the other plots you describe, you can specify the reference data in those, e.g. data = plot1.
If your references are not going to change, you can create a second dataset and merge it with the dataset you are going to plot.
Here, I first combine plot1 and plot2 by rows. Then, I create a new dataset that will be the reference dataset.
library(dplyr)
new_df = rbind(plot1, plot2)
ref_plot = new_df
ref_plot <- ref_plot %>% rename(Ref_value = Y)
Now you have new_df, which is the dataset to be plotted, and ref_plot, which contains the reference values for each condition.
Instead of using grid to create two separate plots and then combining them, I preferred to use facet_wrap, which puts all panels in the same figure. It is much more convenient and doesn't require writing the same code twice.
As mentioned by @AHart a few minutes before me, you can use an error-bar geom to draw your reference values on the plot. The difference is that I prefer to use geom_errorbar instead of geom_errorbarh.
Here is for the plot:
library(ggplot2)
new_df %>% left_join(ref_plot) %>%
ggplot(aes(x = Factor2, y = Y, fill = Factor3))+
geom_bar(stat = "identity", position = position_dodge())+
geom_errorbar(aes(ymin = Ref_value-0.00001, ymax = Ref_value+0.0001, group = Factor3), position = position_dodge(.9),width = 0.2)+
facet_wrap(.~Factor1, labeller = labeller(Factor1 = c(A = "Type A", B = "Type B"))) +
scale_x_discrete(name='Regime',
labels=c('C', 'D')) +
scale_fill_brewer(palette='Dark2', name='Background:',
breaks=c('E','F'),
labels=c('E','F')) +
theme(axis.text=element_text(size=11),
axis.title.x=element_text(size=13, vjust=-0.75),
axis.title.y=element_blank(),
legend.text=element_text(size=11),
legend.title=element_text(size=13),
plot.title=element_text(hjust=0.5))

How to add labels to facet_wrap(ed) geom_count plot?

I am creating a faceted plot using facet_wrap. I want text labels to be included inside the bubbles. Instead, it seems the total is used as the label, i.e. all panels have the same numbers but different bubble sizes (the sizes are correct).
(Edits)
My code:
Category1 <- c('A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B','C','A','B')
Category2 <- c('W','V','W','V','W','V','W','V','W','V','W','V','W','V','W','V','W','V','W','V')
Class <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
df <- data.frame(Category1, Category2, Class)
g <- ggplot(df, aes(Category1, Category2))
g <- g + facet_wrap(Class ~ ., nrow = 3) + geom_count(col="tomato3", show.legend=F) + scale_size_continuous(range = c(5, 10)) +
labs(subtitle="Count Plot", y="Category2", x="Category1", title="Cat1 vs Cat2")
g
g2 <- g + geom_text(data=ggplot_build(g)$data[[1]], aes(x, y, label=n), size=2) #+ scale_size(range = c(5, 15))
g2
I expect that the size of the bubble will be indicated by the text inside the bubble. But the actual result is all graphs have the same number. I want the small bubble to have small number proportional to its size.
The problem is that the data returned by ggplot_build does not have the same categories as the original. You need to create the count data beforehand and use it for plotting.
Create count data
library(tidyverse)
df_count <- df %>%
count(Class, Category1, Category2)
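As a quick sanity check, the count table should contain one row per Class/Category1/Category2 combination, which is what lets each facet show its own label. This sketch rebuilds the question's vectors with rep() (an equivalent shorthand, not the original code) and assumes dplyr is installed:

```r
library(dplyr)

# Same values as the question's vectors, written with rep() for brevity.
df <- data.frame(
  Category1 = rep(c("A", "B", "C"), length.out = 20),
  Category2 = rep(c("W", "V"), length.out = 20),
  Class     = rep(1:4, length.out = 20)
)

# One row per observed combination; column n holds the number of matching rows.
df_count <- df %>%
  count(Class, Category1, Category2)

print(df_count)
```

Because Class is a column of df_count, geom_text can place each n in the matching facet.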
Plot
There are two ways to incorporate this new data.
Method 1
The first example I show is to use both df and df_count. This method will modify your code minimally:
g <- ggplot(df, aes(Category1, Category2))
g <- g + facet_wrap(Class ~ ., nrow = 3) + geom_count(col="tomato3", show.legend=F) +
geom_text(data = df_count, aes(Category1, Category2, label=n), size=2) +
scale_size_continuous(range = c(5, 10)) +
labs(subtitle="Count Plot", y="Category2", x="Category1", title="Cat1 vs Cat2")
g
The line geom_text(data = df_count, aes(Category1, Category2, label=n), size=2) + is added.
Method 2
This method uses only the count data. It uses geom_point() instead of geom_count() and sets the size from the variable n. This method is probably better in terms of code readability.
g_alternative <- ggplot(df_count, aes(Category1, Category2, label = n)) +
facet_wrap(Class ~ ., nrow = 3) +
geom_point(col="tomato3", aes(size = n), show.legend=F) +
geom_text() +
scale_size_continuous(range = c(5, 10)) +
labs(subtitle="Count Plot", y="Category2", x="Category1", title="Cat1 vs Cat2")
g_alternative
The output looks like this:

R: How to spread (jitter) points with respect to the x axis?

I have the following code snippet in R:
dat <- data.frame(cond = factor(rep("A",10)),
rating = c(1,2,3,4,6,6,7,8,9,10))
ggplot(dat, aes(x=cond, y=rating)) +
geom_boxplot() +
guides(fill=FALSE) +
geom_point(aes(y=3)) +
geom_point(aes(y=3)) +
geom_point(aes(y=5))
This particular snippet of code produces a boxplot where one point goes over another (in the above case one point 3 goes over another point 3).
How can I move the point 3 so that the point remains in the same position on the y axis, but it is slightly moved left or right on the x axis?
This can be achieved by using the position_jitter function:
geom_point(aes(y=3), position = position_jitter(w = 0.1, h = 0))
Update:
To only plot the three supplied points you can construct a new dataset and plot that:
points_dat <- data.frame(cond = factor(rep("A", 3)), rating = c(3, 3, 5))
ggplot(dat, aes(x=cond, y=rating)) +
geom_boxplot() +
guides(fill=FALSE) +
geom_point(aes(x=cond, y=rating), data = points_dat, position = position_jitter(w = 0.05, h = 0))
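To confirm that the jitter only moves points horizontally, you can inspect the computed layer data. The sketch below uses ggplot2's layer_data() and passes a seed to position_jitter() so the result is reproducible; the seed value itself is arbitrary.

```r
library(ggplot2)

dat <- data.frame(cond = factor(rep("A", 10)),
                  rating = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10))
points_dat <- data.frame(cond = factor(rep("A", 3)), rating = c(3, 3, 5))

p <- ggplot(dat, aes(x = cond, y = rating)) +
  geom_boxplot() +
  geom_point(data = points_dat,
             position = position_jitter(width = 0.05, height = 0, seed = 42))

# Layer 2 is the jittered point layer; x is shifted, y is left untouched.
jittered <- layer_data(p, 2)
print(jittered[, c("x", "y")])
```

With height = 0 the y values stay at 3, 3 and 5, while x is spread within 0.05 of the category position.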
ggplot2 now includes position_dodge(). From the help's description: "Dodging preserves the vertical position of an geom while adjusting the horizontal position."
Thus you can either use it as geom_point(position = position_dodge(0.5)) or, if you want to dodge points that are connected by lines and need the dodge to be the same across both geoms, you can use something like:
dat <- data.frame(cond = rep(c("A", "B"), each=10), x=rep(1:10, 2), y=rnorm(20))
dodge <- position_dodge(.3) # how much jitter on the x-axis?
ggplot(dat, aes(x, y, group=cond, color=cond)) +
geom_line(position = dodge) +
geom_point(position = dodge)
ggplot2 now has a separate geom for this called geom_jitter, so you don't need to pass position = position_jitter() to geom_point. Here it is applied to the OP's example:
dat <- data.frame(cond = factor(rep("A",10)),
rating = c(1,2,3,4,6,6,7,8,9,10))
ggplot(dat, aes(x=cond, y=rating)) +
geom_boxplot() +
guides(fill=FALSE) +
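geom_jitter(data = data.frame(cond = factor("A"), rating = c(3, 3, 5)), width = 0.1, height = 0) # separate data: the 3 extra points do not match the 10 rows of dat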
