How to add duplicate labels using the directlabels package in R

I have the following data frame:
library(tidyverse)
library(directlabels)
dat <- structure(list(time.course = c("CONTROL", "DAY03", "DAY06", "DAY09",
"DAY12", "DAY15", "CONTROL", "DAY03", "DAY06", "DAY09", "DAY12",
"DAY15", "CONTROL", "DAY03", "DAY06", "DAY09", "DAY12", "DAY15",
"CONTROL", "DAY03", "DAY06", "DAY09", "DAY12", "DAY15"), log_delta = c(0,
0.620163956872191, 0.97251217133899, 0.788819459139427, 0.412543422847407,
0.401621905837411, 0, -0.168711062429047, -0.973481367557294,
-1.46433243027353, -1.34771037206345, -1.77709667157235, 0, -0.187344700204557,
-0.254280909246003, -0.335330756378048, -0.655121382977672, -1.1733031812697,
0, -0.0160729795971869, -0.628563089917479, -1.43060414378064,
-1.466051599194, -2.57510172892555), `UMAP cluster` = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L, 5L, 7L, 7L, 7L, 7L, 7L,
7L, 13L, 13L, 13L, 13L, 13L, 13L), .Label = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"
), class = "factor"), cell_name = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 5L, 5L, 5L, 5L, 5L, 5L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L), .Label = c("Macrophage", "Enteroendocrine",
"Endothelial", "Lymphatic", "Fibroblast", "T cell", "Myofibroblast",
"Absorbtice & secrectory cell", "Plasmacytoid DC", "Neutrophil",
"Plasma cell", "Cajal intestinal cell", "Glial cell", "Germinal center B cell"
), class = "factor")), row.names = c(NA, -24L), class = c("tbl_df",
"tbl", "data.frame"))
dat
It looks like this:
# A tibble: 24 x 4
time.course log_delta `UMAP cluster` cell_name
<chr> <dbl> <fct> <fct>
1 CONTROL 0 1 Macrophage
2 DAY03 0.620 1 Macrophage
3 DAY06 0.973 1 Macrophage
4 DAY09 0.789 1 Macrophage
5 DAY12 0.413 1 Macrophage
6 DAY15 0.402 1 Macrophage
7 CONTROL 0 5 Fibroblast
8 DAY03 -0.169 5 Fibroblast
9 DAY06 -0.973 5 Fibroblast
10 DAY09 -1.46 5 Fibroblast
11 DAY12 -1.35 5 Fibroblast
12 DAY15 -1.78 5 Fibroblast
13 CONTROL 0 7 Myofibroblast
14 DAY03 -0.187 7 Myofibroblast
15 DAY06 -0.254 7 Myofibroblast
16 DAY09 -0.335 7 Myofibroblast
17 DAY12 -0.655 7 Myofibroblast
18 DAY15 -1.17 7 Myofibroblast
19 CONTROL 0 13 Myofibroblast
20 DAY03 -0.0161 13 Myofibroblast
21 DAY06 -0.629 13 Myofibroblast
22 DAY09 -1.43 13 Myofibroblast
23 DAY12 -1.47 13 Myofibroblast
24 DAY15 -2.58 13 Myofibroblast
Notice that "Myofibroblast" occurs twice, as UMAP clusters 7 and 13.
I tried to plot this using the following code with the directlabels package:
ggplot(dat, aes(x = time.course, y = log_delta,
color = cell_name)) +
geom_line(aes(group = `UMAP cluster`)) +
scale_x_discrete(expand = c(0, 2.5)) +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
xlab("") +
ylab("log(proportion / control proportion)") +
theme(legend.title = element_blank()) +
geom_dl(aes(label = cell_name), method = list(dl.trans(x = x + 0.3), "last.bumpup", cex = 0.8))
The plot looks like this:
Notice that the Myofibroblast label does not appear at the end of the two (blue) lines.
What I want to do is to color the two Myofibroblast lines with two different colors and to tag each Myofibroblast line with its own label.
How can I achieve that?

It's not possible to do this with geom_dl as is, because it inherits the aesthetics and takes the values of the label column directly, so lines that share a cell_name cannot get separate labels. I can think of two solutions. The first is to create a new variable by combining the cluster ID with the cell type:
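# rename the backticked column so it can be used without quoting
colnames(dat)[3] = "UMAPcluster"
# combine cell type and cluster ID, e.g. "Myofibroblast 7" and "Myofibroblast 13"
dat <- dat %>% mutate(new = paste(cell_name, UMAPcluster))
# keep the order of first appearance for the legend and labels
dat$new <- factor(dat$new, levels = unique(dat$new))
ggplot(dat, aes(x = time.course, y = log_delta, group = new, col = new)) +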
geom_line() +
scale_x_discrete(expand = c(0, 2.5)) +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
xlab("") +
ylab("log(proportion / control proportion)") +
theme(legend.title = element_blank()) +
geom_dl(aes(label = new), method = list(dl.trans(x = x + 0.3), "last.bumpup", cex = 0.8))
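The key step in this first solution is that the grouping and label variable must be unique per line. As a minimal base-R sketch of just that step (assuming one row per cluster, with the cluster IDs and cell types from the printed tibble; `clusters` is a hypothetical name), pasting the two columns together is enough to separate the two Myofibroblast clusters:

```r
# One row per cluster/cell-type pair, as in the printed tibble
clusters <- data.frame(
  UMAPcluster = c(1, 5, 7, 13),
  cell_name = c("Macrophage", "Fibroblast", "Myofibroblast", "Myofibroblast"),
  stringsAsFactors = FALSE
)

# cell_name alone is not unique: two clusters share "Myofibroblast"
anyDuplicated(clusters$cell_name) > 0
# [1] TRUE

# Pasting the cluster ID onto the cell type gives one label per line:
# "Macrophage 1", "Fibroblast 5", "Myofibroblast 7", "Myofibroblast 13"
clusters$new <- paste(clusters$cell_name, clusters$UMAPcluster)

# Keep first-appearance order rather than alphabetical order
clusters$new <- factor(clusters$new, levels = unique(clusters$new))
levels(clusters$new)
```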
Alternatively, create a new data frame holding the last point of each cluster and annotate it with geom_text (or geom_label if you like boxes). It is better to keep the legend for the cluster so that it is clear what the colors mean.
LAB = dat %>% group_by(UMAPcluster) %>% top_n(1,wt=time.course)
ggplot(dat, aes(x = time.course, y = log_delta,color = UMAPcluster)) +
geom_line(aes(group = UMAPcluster)) +
scale_x_discrete(expand = c(0, 2.5)) +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
xlab("") +
ylab("log(proportion / control proportion)") +
geom_label(data=LAB,aes(label=cell_name),show.legend=FALSE,nudge_x=0.7)
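The LAB table simply holds the last time point of each cluster. As an alternative sketch in base R (not the answer's top_n() call), the same table can be built from an explicit time order instead of relying on "DAY15" happening to sort last alphabetically. The data are the question's values rebuilt with rep(); `time_points` and `ordered` are illustrative names:

```r
# Rebuild the example data (values copied from the question)
time_points <- c("CONTROL", "DAY03", "DAY06", "DAY09", "DAY12", "DAY15")
dat <- data.frame(
  time.course = rep(time_points, times = 4),
  log_delta = c(0, 0.620163956872191, 0.97251217133899, 0.788819459139427,
                0.412543422847407, 0.401621905837411,
                0, -0.168711062429047, -0.973481367557294, -1.46433243027353,
                -1.34771037206345, -1.77709667157235,
                0, -0.187344700204557, -0.254280909246003, -0.335330756378048,
                -0.655121382977672, -1.1733031812697,
                0, -0.0160729795971869, -0.628563089917479, -1.43060414378064,
                -1.466051599194, -2.57510172892555),
  UMAPcluster = factor(rep(c(1, 5, 7, 13), each = 6)),
  cell_name = rep(c("Macrophage", "Fibroblast", "Myofibroblast", "Myofibroblast"), each = 6),
  stringsAsFactors = FALSE
)

# Sort by cluster, then by the explicit time order
ordered <- dat[order(dat$UMAPcluster, match(dat$time.course, time_points)), ]

# Keep the last row of each cluster: one label position per line
LAB <- ordered[!duplicated(ordered$UMAPcluster, fromLast = TRUE), ]
LAB[, c("UMAPcluster", "time.course", "log_delta", "cell_name")]
```

Because `LAB` has one row per cluster, the two Myofibroblast clusters each get their own label, even though they share the same cell name.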

Related

Add text next to geom_segment in ggplot2

Hello, I have a data frame such as:
tab
X molecule gene start_gene end_gene start_scaff end_scaff strand direction COL1 COL2
1 7 scaffold_1254 G7 6708 11967 1 20072 backward -1 10 20
2 5 scaffold_7638 G5 9567 10665 1 15336 backward -1 18 1
3 4 scaffold_7638 G4 3456 4479 1 15336 forward 1 18 1
4 2 scaffold_15158 G2 10105 10609 1 13487 backward -1 5 9
5 6 scaffold_8315 G6 2760 3849 1 10827 forward 1 25 7
6 3 scaffold_7180 G3 9814 10132 1 10155 backward -1 21 9
7 1 scaffold_74038 G1 1476 2010 1 2010 forward 1 8 34
So far, with this code:
library(ggplot2)
library(ggrepel)  # geom_text_repel()
ggplot(tab, aes(x = start_scaff, xend = end_scaff,
y = molecule, yend = molecule)) +
geom_segment(size = 3, col = "grey80") +
geom_segment(aes(x = ifelse(direction == 1, start_gene, end_gene),
xend = ifelse(direction == 1, end_gene, start_gene)),
data = tab,
arrow = arrow(length = unit(0.1, "inches")), size = 2) +
geom_text_repel(aes(x = start_gene, y = molecule, label = gene),
data = tab, nudge_y = 0.5,size=2) +
scale_y_discrete(limits = rev(levels(tab$molecule))) +
theme_minimal()
I managed to get this plot:
I wondered if there is a way to add text just next to each geom_segment showing the COL1 and COL2 values, with the text colored by a threshold (green for values > 10, red for values <= 10),
to get something like this:
dput(tab)
structure(list(X = c(7L, 5L, 4L, 2L, 6L, 3L, 1L), molecule = structure(c(1L,
5L, 5L, 2L, 6L, 3L, 4L), .Label = c("scaffold_1254", "scaffold_15158",
"scaffold_7180", "scaffold_74038", "scaffold_7638", "scaffold_8315"
), class = "factor"), gene = structure(c(7L, 5L, 4L, 2L, 6L,
3L, 1L), .Label = c("G1", "G2", "G3", "G4", "G5", "G6", "G7"), class = "factor"),
start_gene = c(6708L, 9567L, 3456L, 10105L, 2760L, 9814L,
1476L), end_gene = c(11967L, 10665L, 4479L, 10609L, 3849L,
10132L, 2010L), start_scaff = c(1L, 1L, 1L, 1L, 1L, 1L, 1L
), end_scaff = c(20072L, 15336L, 15336L, 13487L, 10827L,
10155L, 2010L), strand = structure(c(1L, 1L, 2L, 1L, 2L,
1L, 2L), .Label = c("backward", "forward"), class = "factor"),
direction = c(-1L, -1L, 1L, -1L, 1L, -1L, 1L), COL1 = c(10L,
18L, 18L, 5L, 25L, 21L, 8L), COL2 = c(20L, 1L, 1L, 9L, 7L,
9L, 34L)), class = "data.frame", row.names = c(NA, -7L))
An approximation would be:
library(ggplot2)
library(dplyr)    # mutate()
library(ggrepel)  # geom_text_repel()
ggplot(tab, aes(x = start_scaff, xend = end_scaff,
y = molecule, yend = molecule)) +
geom_segment(size = 3, col = "grey80") +
geom_segment(aes(x = ifelse(direction == 1, start_gene, end_gene),
xend = ifelse(direction == 1, end_gene, start_gene)),
data = tab,
arrow = arrow(length = unit(0.1, "inches")), size = 2) +
geom_text_repel(aes(x = start_gene, y = molecule, label = gene),
data = tab, nudge_y = 0.5,size=2) +
scale_y_discrete(limits = rev(levels(tab$molecule))) +
theme_minimal() +
# TRUE means value <= 10 (red), FALSE means value > 10 (darkgreen)
geom_text(data = mutate(tab, COLr1 = COL1 <= 10), aes(color = COLr1, label = COL1), position = position_nudge(x=20000)) +
geom_text(data = mutate(tab, COLr2 = COL2 <= 10), aes(color = COLr2, label = COL2), position = position_nudge(x=22000)) +
geom_text(data = mutate(tab, txt = "-"), aes(label = txt), position = position_nudge(x=21100)) +
scale_color_manual(values = c("FALSE" = "darkgreen", "TRUE" = "red")) +
xlim(c(NA,23000)) +
theme(legend.position = "none")
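The coloring rule itself can also be computed before plotting. A minimal base-R sketch, assuming the COL1/COL2 values from dput(tab); `threshold_colour()` is a hypothetical helper, not part of ggplot2:

```r
# COL1/COL2 values copied from dput(tab) in the question
tab <- data.frame(
  gene = c("G7", "G5", "G4", "G2", "G6", "G3", "G1"),
  COL1 = c(10, 18, 18, 5, 25, 21, 8),
  COL2 = c(20, 1, 1, 9, 7, 9, 34),
  stringsAsFactors = FALSE
)

# Hypothetical helper: green above the threshold, red at or below it
threshold_colour <- function(v, threshold = 10) {
  ifelse(v > threshold, "darkgreen", "red")
}

tab$COL1_colour <- threshold_colour(tab$COL1)
tab$COL2_colour <- threshold_colour(tab$COL2)
tab[, c("gene", "COL1", "COL1_colour", "COL2", "COL2_colour")]
```

With ggplot2, such color columns could be mapped with aes(colour = COL1_colour) together with scale_colour_identity(), which uses the strings as colors directly, so there is no need to rely on the order of values in scale_color_manual(). Note that the boundary value 10 (gene G7) comes out red, as the question asks.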

How to make a side-by-side bar chart in ggplot and retain the original sorting

Thanks in advance for your help. I have been searching Stack Overflow and Google for this issue without success.
I need a side-by-side horizontal bar chart of each word and its frequency in the two files.
My data frame is shown below:
head(comp,10)
WORD FREQ RDFREQ
170 project 67 5
20 business 64 14
117 management 53 13
59 development 34 4
211 support 27 6
215 systems 25 10
102 information 22 2
201 software 21 6
203 solutions 20 2
220 technical 20 2
I have used melt to create the frequency frame as below:
dfp1 <- melt(comp, value.factor = TRUE)
head(dfp1,20)
WORD variable value
1 project FREQ 67
2 business FREQ 64
3 management FREQ 53
4 development FREQ 34
5 support FREQ 27
6 systems FREQ 25
7 information FREQ 22
8 software FREQ 21
9 solutions FREQ 20
10 technical FREQ 20
11 applications FREQ 19
12 planning FREQ 18
My code to plot is
g <- ggplot(dfp1, aes(x = WORD, y= value, order=- as.integer(value)))
g <- g + geom_bar(aes(fill = variable), position = "dodge", stat="identity")
g <- g + coord_flip()
g <- g + theme(axis.text.y = element_text(size=12))
g <- g + labs(x = "Keyword", y = "Count",
title = paste("File1 vs File2")
)
print(g)
The plot I get is sorted alphabetically by WORD instead of in descending order of frequency. Thanks again; I look forward to your responses.
Would this work for you?
library(tidyverse)  # %>%, gather(), fct_reorder() and ggplot()
comp <- structure(list(WORD = structure(c(5L, 1L, 4L, 2L, 8L, 9L, 3L,
6L, 7L, 10L), .Label = c("business", "development", "information",
"management", "project", "software", "solutions", "support",
"systems", "technical"), class = "factor"), FREQ = c(67L, 64L,
53L, 34L, 27L, 25L, 22L, 21L, 20L, 20L), RDFREQ = c(5L, 14L,
13L, 4L, 6L, 10L, 2L, 6L, 2L, 2L)), .Names = c("WORD", "FREQ",
"RDFREQ"), class = "data.frame", row.names = c("170", "20", "117",
"59", "211", "215", "102", "201", "203", "220"))
comp %>% gather(variable, Count, -WORD) %>%
mutate(Keyword = fct_reorder(WORD, Count, .desc = FALSE)) %>%
ggplot(aes(x = Keyword, y = Count, fill = variable)) +
geom_bar(stat = 'identity', position = "dodge") + coord_flip() +
theme(axis.text.y = element_text(size=12)) +
labs(title = paste("File1 vs File2") )
If you want to order exclusively by FREQ, you could do something like this:
comp %>% arrange(desc(FREQ)) %>% mutate(id = row_number()) %>%
gather(variable, Count, -WORD, -id) %>%
mutate(Keyword = fct_reorder(WORD, id, .desc = TRUE)) %>%
ggplot(aes(x = Keyword, y = Count, fill = variable)) +
geom_bar(stat = 'identity', position = "dodge") + coord_flip() +
theme(axis.text.y = element_text(size=12)) +
labs(title = paste("File1 vs File2"))
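Both answers work by setting the factor levels of the keyword, because ggplot2 draws a discrete axis in level order. A base-R sketch of that step, ordering exclusively by FREQ (the long-format table is built by hand here rather than with gather()):

```r
comp <- data.frame(
  WORD = c("project", "business", "management", "development", "support",
           "systems", "information", "software", "solutions", "technical"),
  FREQ = c(67, 64, 53, 34, 27, 25, 22, 21, 20, 20),
  RDFREQ = c(5, 14, 13, 4, 6, 10, 2, 6, 2, 2),
  stringsAsFactors = FALSE
)

# Long format: one row per word and file
long <- data.frame(
  WORD = rep(comp$WORD, times = 2),
  variable = rep(c("FREQ", "RDFREQ"), each = nrow(comp)),
  Count = c(comp$FREQ, comp$RDFREQ),
  stringsAsFactors = FALSE
)

# Levels in ascending FREQ order: after coord_flip() the first level is
# drawn at the bottom, so the most frequent word ("project") ends up on top
long$Keyword <- factor(long$WORD, levels = comp$WORD[order(comp$FREQ)])
levels(long$Keyword)
```

The resulting `long` data frame could then be plotted with ggplot(long, aes(x = Keyword, y = Count, fill = variable)) + geom_col(position = "dodge") + coord_flip().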

Applying scale_fill_gradient in ggplot2 conditionally

I am plotting the following data using geom_tile and geom_text in ggplot2:
mydf
Var1 Var2 dc1 bin
1 H G 0.93333333 0
2 G H 0.06666667 1
3 I G 0.80000000 0
4 G I 0.20000000 1
5 J G 0.33333333 1
6 G J 0.66666667 0
7 K G 0.57894737 1
8 G K 0.42105263 0
9 I H 0.80000000 0
10 H I 0.20000000 1
11 J H 0.25000000 0
12 H J 0.75000000 1
13 K H 0.20000000 0
14 H K 0.80000000 1
15 J I 0.12500000 0
16 I J 0.87500000 1
17 K I 0.32000000 0
18 I K 0.68000000 1
19 K J 0.28571429 0
20 J K 0.71428571 1
I am plotting 'Var1' vs 'Var2' and using the 'bin' variable as the geom_text label. Currently, each tile is filled with scale_fill_gradient according to the variable 'dc1'.
### Plotting
ggplot(mydf, aes(Var2, Var1, fill = dc1)) +
geom_tile(colour="gray20", size=1.5, family="bold", stat="identity", height=1, width=1) +
geom_text(data=mydf, aes(Var2, Var1, label = bin), color="black", size=rel(4.5)) +
scale_fill_gradient(low = "white", high = "firebrick3", space = "Lab", na.value = "gray20",
guide = "colourbar") +
scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
xlab("") +
ylab("") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="gray20", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.text = element_text(color="white", size=rel(1.5)),
panel.background = element_rect(fill="gray20"),
plot.background = element_rect(fill="gray20"),
legend.position = "none"
)
Which gives this:
What I am trying to do (unsuccessfully) is to make the fill conditional on the 'bin' variable. If bin == 1, I would like to fill according to 'dc1'. If bin == 0, I would like to fill with white.
This would give the following, which I created manually as an example of the desired plot:
I tried experimenting with scale_fill_gradient to introduce a second fill option, but cannot figure it out. Thanks for any help or pointers.
This is the dput for mydf:
structure(list(Var1 = structure(c(4L, 5L, 3L, 5L, 2L, 5L, 1L,
5L, 3L, 4L, 2L, 4L, 1L, 4L, 2L, 3L, 1L, 3L, 1L, 2L), .Label = c("K",
"J", "I", "H", "G"), class = "factor"), Var2 = structure(c(1L,
2L, 1L, 3L, 1L, 4L, 1L, 5L, 2L, 3L, 2L, 4L, 2L, 5L, 3L, 4L, 3L,
5L, 4L, 5L), .Label = c("G", "H", "I", "J", "K"), class = "factor"),
dc1 = c(0.933333333333333, 0.0666666666666667, 0.8, 0.2,
0.333333333333333, 0.666666666666667, 0.578947368421053,
0.421052631578947, 0.8, 0.2, 0.25, 0.75, 0.2, 0.8, 0.125,
0.875, 0.32, 0.68, 0.285714285714286, 0.714285714285714),
bin = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 1)), .Names = c("Var1", "Var2", "dc1", "bin"), row.names = c(NA,
-20L), class = "data.frame")
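Perhaps replace fill = dc1 with fill = dc1 * bin? Wherever bin == 0 the product is 0, which is the minimum of the fill scale, so those tiles get the low color (white); tiles with bin == 1 keep their dc1 value. A stripped-down version of your code: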
ggplot(data = mydf, aes(x = Var2, y = Var1, fill = dc1 * bin, label = bin)) +
geom_tile() +
geom_text() +
scale_fill_gradient(low = "white", high = "firebrick3")
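A variant of the same idea, sketched here as an alternative rather than a correction: instead of multiplying by bin, set the fill to NA where bin == 0 and let the na.value argument of scale_fill_gradient() supply the white. This keeps "not filled" distinct from "filled with a small value", and the gradient is computed over the bin == 1 tiles only. The column name `fill_value` is illustrative:

```r
library(ggplot2)

# Data copied from dput(mydf) in the question
mydf <- data.frame(
  Var1 = factor(c("H", "G", "I", "G", "J", "G", "K", "G", "I", "H",
                  "J", "H", "K", "H", "J", "I", "K", "I", "K", "J"),
                levels = c("K", "J", "I", "H", "G")),
  Var2 = factor(c("G", "H", "G", "I", "G", "J", "G", "K", "H", "I",
                  "H", "J", "H", "K", "I", "J", "I", "K", "J", "K"),
                levels = c("G", "H", "I", "J", "K")),
  dc1 = c(0.933333333333333, 0.0666666666666667, 0.8, 0.2,
          0.333333333333333, 0.666666666666667, 0.578947368421053,
          0.421052631578947, 0.8, 0.2, 0.25, 0.75, 0.2, 0.8, 0.125,
          0.875, 0.32, 0.68, 0.285714285714286, 0.714285714285714),
  bin = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
)

# bin == 0 tiles get NA, which the scale draws with na.value
mydf$fill_value <- ifelse(mydf$bin == 1, mydf$dc1, NA)

p <- ggplot(mydf, aes(x = Var2, y = Var1, fill = fill_value, label = bin)) +
  geom_tile(colour = "gray20") +
  geom_text() +
  scale_fill_gradient(low = "white", high = "firebrick3", na.value = "white")
print(p)
```

Setting na.value to a different color (for example a light grey) would make the unfilled tiles visually distinct from bin == 1 tiles with a small dc1.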

Displaying multiple boxplots per group in R

I have data of the form:
Day A B
1 1 4
1 2 5
1 3 6
2 2 2
2 3 4
2 5 6
3 6 7
3 4 6
I would like to display this on a single chart, with Day along the x-axis and, at each x position, a color-coded boxplot for each of A and B.
Here's a (slight) modification of an example from the ?boxplot help page. The examples there show off many common uses of the function.
tg <- data.frame(
dose=ToothGrowth$dose[1:30],
A=ToothGrowth$len[1:30],
B=ToothGrowth$len[31:60]
)
head(tg)
# dose A B
# 1 0.5 4.2 15.2
# 2 0.5 11.5 21.5
# 3 0.5 7.3 17.6
# 4 0.5 5.8 9.7
# 5 0.5 6.4 14.5
# 6 0.5 10.0 10.0
boxplot(A ~ dose, data = tg,
boxwex = 0.25, at = 1:3 - 0.2,
col = "yellow",
main = "Guinea Pigs' Tooth Growth",
xlab = "Vitamin C dose mg",
ylab = "tooth length",
xlim = c(0.5, 3.5), ylim = c(0, 35), yaxs = "i")
boxplot(B ~ dose, data = tg, add = TRUE,
boxwex = 0.25, at = 1:3 + 0.2,
col = "orange")
legend(2, 9, c("A", "B"),
fill = c("yellow", "orange"))
Try:
library(reshape2)  # melt()
library(ggplot2)
ddf = structure(list(Day = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), A = c(1L,
2L, 3L, 2L, 3L, 5L, 6L, 4L), B = c(4L, 5L, 6L, 2L, 4L, 6L, 7L,
6L)), .Names = c("Day", "A", "B"), class = "data.frame", row.names = c(NA,
-8L))
mm = melt(ddf, id.vars = 'Day')  # long format: Day, variable, value
ggplot(mm) + geom_boxplot(aes(x = factor(Day), y = value, fill = variable))
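If you prefer base graphics, the same grouping can be sketched with the formula interface of boxplot(): after stacking A and B into one column, `value ~ variable + Day` produces one box per variable and day. This is an alternative to the ggplot2 answer, with the long table built by hand instead of with melt():

```r
ddf <- data.frame(
  Day = c(1, 1, 1, 2, 2, 2, 3, 3),
  A = c(1, 2, 3, 2, 3, 5, 6, 4),
  B = c(4, 5, 6, 2, 4, 6, 7, 6)
)

# Stack A and B into one value column, labelled by variable
long <- data.frame(
  Day = rep(ddf$Day, times = 2),
  variable = rep(c("A", "B"), each = nrow(ddf)),
  value = c(ddf$A, ddf$B),
  stringsAsFactors = FALSE
)

# value ~ variable + Day gives the groups A.1, B.1, A.2, B.2, A.3, B.3
# (the first factor varies fastest), so A and B sit side by side per day
b <- boxplot(value ~ variable + Day, data = long,
             col = c("yellow", "orange"),
             xlab = "variable.Day", ylab = "value")
legend("topleft", legend = c("A", "B"), fill = c("yellow", "orange"))
```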

ggplot2 geom_line() should point at specified value

I have written the following code:
library(ggplot2)
data <- structure(list(x = c(1L, 6L, 3L, 4L, 2L, 3L, 6L, 1L, 5L, 2L,
1L, 5L), y = c(1L, 7L, 5L, 6L, 3L, 4L, 6L, 2L, 5L, 6L, 5L, 2L
), year = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("2010", "2011"), class = "factor"), matching = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("person1",
"person2", "person3", "person4", "person5", "person6"), class = "factor")), .Names = c("x",
"y", "year", "matching"), row.names = c(NA, -12L), class = "data.frame")
data$year <- factor(data$year)
colors <- c("#4cb5ee", "#a0d099", "red")
p <- ggplot(data, aes(x=x, y=y)) +
geom_point(aes(colour=year), shape=16, size=6) +
geom_line(aes(group=matching), arrow=arrow(length=unit(0.15,"cm")), colour="black", size=1) +
xlab("x") + ylab("y") +
scale_colour_manual("year", values=colors) +
scale_x_continuous(limits=c(1,7), breaks=seq(1,7, by=1)) +
scale_y_continuous(limits=c(1,7), breaks=seq(1,7, by=1))
print(p)
It gives the following output:
But I want the arrow drawn by geom_line() to always point at the point where year = 2011. I can't figure out why the arrow sometimes points at a year = 2010 point and sometimes at a year = 2011 point.
What I found out is that arrow takes several arguments:
arrow(angle = 30, length = unit(0.25, "inches"), ends = "last", type = "open")
So I could set ends = "first", but neither "first" nor "last" is correct for every line.
I tried adding a column to my data.frame indicating whether the arrow should end first or last, but it didn't give me the output I wanted.
Any help is highly appreciated :-)
Thanks in advance!
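geom_path should do the trick. geom_line() connects the points in order of their x values, whereas geom_path() connects them in the order they appear in the data; here every 2010 row comes before the matching 2011 row, so each arrow ends at the 2011 point: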
p <- ggplot(data, aes(x=x, y=y)) +
geom_point(aes(colour=year), shape=16, size=6) +
geom_path(aes(group=matching),
arrow=arrow(length=unit(0.15,"cm")),
colour="black", size=1) +
xlab("x") + ylab("y") +
scale_colour_manual("year", values=colors) +
scale_x_continuous(limits=c(1,7), breaks=seq(1,7, by=1)) +
scale_y_continuous(limits=c(1,7), breaks=seq(1,7, by=1))
print(p)
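To see why geom_line() and geom_path() differ here, this base-R sketch finds, for each person, the year of the last vertex (where the arrow head is drawn) under both orderings: data order, as geom_path() uses, and sorted by x, as geom_line() uses. The data are the question's values; the variable names are illustrative:

```r
data <- data.frame(
  x = c(1, 6, 3, 4, 2, 3, 6, 1, 5, 2, 1, 5),
  y = c(1, 7, 5, 6, 3, 4, 6, 2, 5, 6, 5, 2),
  year = rep(c("2010", "2011"), each = 6),
  matching = rep(paste0("person", 1:6), times = 2),
  stringsAsFactors = FALSE
)

by_person <- split(data, data$matching)

# geom_path(): vertices in data order, so the last one is the 2011 row
last_in_data_order <- sapply(by_person, function(d) tail(d$year, 1))

# geom_line(): vertices sorted by x, so the last one is whichever point has the larger x
last_sorted_by_x <- sapply(by_person, function(d) tail(d$year[order(d$x)], 1))

last_in_data_order  # "2011" for every person
last_sorted_by_x    # "2010" for person2, person4 and person5
```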
There is probably a more efficient way to do this, but one approach is to use geom_segment() instead of geom_line(). This will allow you to specify the beginning and ending points of the line with ease. We have to restructure the data so that we can specify x, y, xend, and yend. I will restructure with merge, though you could probably do this with cast or reshape.
zz <- merge(data[data$year == 2010 ,], data[data$year == 2011 ,]
, by = "matching", suffixes = 1:2)
matching x1 y1 year1 x2 y2 year2
1 person1 1 1 2010 6 6 2011
2 person2 6 7 2010 1 2 2011
3 person3 3 5 2010 5 5 2011
4 person4 4 6 2010 2 6 2011
5 person5 2 3 2010 1 5 2011
6 person6 3 4 2010 5 2 2011
We will then use two datasets in our call to ggplot:
ggplot() + #Blank call to ggplot
geom_point(data = data, aes(x=x, y=y, colour=year), shape=16, size=6) + #Points
geom_segment(data = zz, aes(x = x1, y = y1, xend = x2, yend = y2), #Segments
arrow = arrow(length = unit(0.15, "cm")), colour = "black", size = 1) +
xlab("x") + ylab("y") +
scale_colour_manual("year", values=colors) +
scale_x_continuous(limits=c(1,7), breaks=seq(1,7, by=1)) +
scale_y_continuous(limits=c(1,7), breaks=seq(1,7, by=1))
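The reshaping step can also be done with base R's reshape(), as the answer suggests. A minimal sketch with the question's data, widening by year so that each person gets one row:

```r
data <- data.frame(
  x = c(1, 6, 3, 4, 2, 3, 6, 1, 5, 2, 1, 5),
  y = c(1, 7, 5, 6, 3, 4, 6, 2, 5, 6, 5, 2),
  year = rep(c("2010", "2011"), each = 6),
  matching = rep(paste0("person", 1:6), times = 2),
  stringsAsFactors = FALSE
)

# One row per person; x and y are suffixed by year:
# x.2010, y.2010, x.2011, y.2011
zz <- reshape(data, idvar = "matching", timevar = "year", direction = "wide")
zz
```

The resulting columns could then be passed to geom_segment(data = zz, aes(x = x.2010, y = y.2010, xend = x.2011, yend = y.2011), arrow = arrow(length = unit(0.15, "cm"))).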
