R: How to make multiple plots in a graph using unique IDs - r

I have a dataframe df with 4 unique UID - 1001,1002,1003,1004.
I want to write a user-defined function in R that does the following:
Plots Turbidity against Time for each unique UID. Turbidity values are the ones in the Time_1, Time_2 and Time_3 columns. For example, UID = 1001 will have 4 plots in one graph
Add a legend to each graph such as M-L, F-L, M-R, and F-R (from columns Gen and Type)
Add a title to each graph. For example- UID:1001
Export the graphs as PDF, JPEG, or TIFF files, with 4 graphs per page
# dataset
Gen <- c('M','M','M','M','F','F','F','F','M','M','M','M','F','F','F','F')
Site <- rep('FRX',length(Gen))
Type <- c('L','L','L','L','L','L','L','L','R','R','R','R','R','R','R','R')
UID <- c(1001,1002,1003,1004,1001,1002,1003,1004,1001,1002,1003,1004,1001,1002,1003,1004)
Time_1 <- c(100.78,112.34,108.52,139.19,149.02,177.77,79.18,89.10,106.78,102.34,128.52,119.19,129.02,147.77,169.18,170.11)
Time_2 <- c(150.78,162.34,188.53,197.69,208.07,217.76,229.48,139.51,146.87,182.54,189.57,199.97,229.28,247.73,269.91,249.19)
Time_3 <- c(250.78,262.34,288.53,297.69,308.07,317.7,329.81,339.15,346.87,382.54,369.59,399.97,329.28,347.73,369.91,349.19)
df <- data.frame(Gen,Site,Type,UID,Time_1,Time_2,Time_3)
df
My attempt
library(ggplot2)
library(tidyr)
# See below for my thoughts/attempt- I am open to other R libraries and approaches
graphplotter <-function(x){
# 1. Convert from wide to long
data_long <- gather(df, time, turbidity, Time_1:Time_3, factor_key=TRUE)
data_long
#2. plot for each unique UID- 1001 to 1004 and add legend
basic <- ggplot(datalong, aes(time, turbidity, shape=Tree)) + geom_point() + geom_line()
basic + theme(
legend.position = c(.95, .95),
legend.justification = c("right", "top"),
legend.box.just = "right",
legend.margin = margin(6, 6, 6, 6))
#3. add title
print(basic+ labs( title= "UID: 1001, Tubidity against time", y="turbidity", x = "Time in hours"))
#4. export as pdf
pdf("turbdity-time.pdf")
par(mfrow = c(2, 2)) ## set the layout to be 2 by 2
sapply(1:4, function(i) plot(basic[,i]))
dev.off()
}
I want all four graphs to look something like this (ignore the circumference and age, should be turbidity and time).
Thanks

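I would use facet_wrap() to draw one panel per UID. The function below needs dplyr, tidyr, and ggplot2 to be loaded: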
graphplotter <-function(x){
x %>%
gather(., time, turbidity, Time_1:Time_3, factor_key=TRUE) %>%
mutate(label = (paste0(Gen, "-", Type))) %>%
#group_by(UID) %>%
ggplot(aes(color = label)) + geom_point(aes(time, turbidity, shape = label, group = label)) +
geom_line(aes(time, turbidity, group = label)) + facet_wrap(~UID) + theme(
legend.position = c(1, 1),
legend.justification = c("right", "top"),
legend.box.just = "right",
legend.margin = margin(1, 1, 1, 1),
legend.text = element_text(size = 7))
}
graphplotter(df)
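The answer above does not cover the export step from the question. One way to get four graphs on a single PDF page is base graphics with par(mfrow = c(2, 2)), as in the question's own attempt. The following is only a sketch under assumptions: `toy` is a made-up stand-in for the question's data, and the output path is hypothetical.

```r
# Made-up stand-in for the question's data: two series per UID, three time points
toy <- data.frame(
  UID    = rep(c(1001, 1002, 1003, 1004), each = 2),
  label  = rep(c("M-L", "F-L"), times = 4),
  Time_1 = c(100, 149, 112, 177, 108, 79, 139, 89),
  Time_2 = c(150, 208, 162, 217, 188, 229, 197, 139),
  Time_3 = c(250, 308, 262, 317, 288, 329, 297, 339)
)

out_file <- file.path(tempdir(), "turbidity-time.pdf")  # hypothetical output path
pdf(out_file)
par(mfrow = c(2, 2))  # 2 x 2 panels on one page
for (uid in unique(toy$UID)) {
  d <- toy[toy$UID == uid, ]
  # Transpose so that each column is one series and each row one time point
  y <- t(as.matrix(d[, c("Time_1", "Time_2", "Time_3")]))
  matplot(1:3, y, type = "b", pch = 1:2, lty = 1, col = 1:2,
          xaxt = "n", xlab = "Time", ylab = "Turbidity",
          main = paste0("UID: ", uid))
  axis(1, at = 1:3, labels = rownames(y))
  legend("topleft", legend = d$label, pch = 1:2, col = 1:2, lty = 1, bty = "n")
}
dev.off()
```

With the ggplot2 version, ggsave("turbidity-time.pdf", graphplotter(df)) should write the faceted plot to a file in a similar way, since the function returns the plot object.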

Related

How to aggregate data from years to decades and plot them?

This is the graph that I would like to reproduce:
To do that, I have to convert the Year column, because the x-axis of the graph is in decades. How could I accomplish this?
This is what I did to extract the data from the site (https://ourworldindata.org/famines) :
library(rvest)
library(dplyr)
library(tidyr)
library(ggplot2)
col_link <- "https://ourworldindata.org/famines#famines-by-world-region-since-1860"
col_page <- read_html(col_link)
col_table <- col_page %>% html_nodes("table#tablepress-73") %>%
html_table() %>% . [[1]]
data1 <- col_table %>%
select(Year, `Excess Mortality midpoint`)
Year `Excess Mortality midpoint`
<chr> <chr>
1 1846–52 1,000,000
2 1860-1 2,000,000
3 1863-67 30,000
4 1866-7 961,043
5 1868 100,000
6 1868-70 1,500,000
7 1870–1871 1,000,000
8 1876–79 750,000
9 1876–79 7,176,346
10 1877–79 11,000,000
# ... with 67 more rows
Firstly, to convert the periods to decades, you need to extract a year for each period, based on which the calculation will be made. From your comment above, it looks like you need to extract the end year for each period. Given the data, regular expressions are used below to do this (and packages dplyr and stringr).
col_table <- col_table %>%
mutate(Year = case_when(
grepl("^\\d{4}$",Year) ~ Year,
grepl("\\d{4}[–-]\\d{4}",Year) ~ str_sub(Year, start= -4),
grepl("\\d{4}[–-]\\d{2}$",Year) ~ paste0(str_sub(Year,1,2),str_sub(Year,-2)),
grepl("\\d{4}[–-]\\d{1}$",Year) ~ paste0(str_sub(Year,1,3),str_sub(Year,-1))))
This code detects the different cases and extracts the proper year. Below are examples of every case present in the dataset, with the result the code produces for each.
1868 -> 1868
1878-1880 -> 1880
1846–52 -> 1852
1860-1 -> 1861
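The same four cases can be checked in isolation. This is a minimal sketch in base R rather than the stringr code above; `end_year()` is a hypothetical helper name, and it assumes every period has the form "start", "start-end", or "start–end".

```r
# Hypothetical helper: return the end year of a period string such as "1846–52"
end_year <- function(period) {
  # Treat the en dash like a plain hyphen, then split on the hyphen
  parts <- strsplit(gsub("\u2013", "-", period, fixed = TRUE), "-", fixed = TRUE)
  vapply(parts, function(p) {
    first <- p[1]
    last  <- p[length(p)]
    # Keep the leading digits of the start year and append the end digits
    keep <- nchar(first) - nchar(last)
    as.integer(paste0(substr(first, 1, keep), last))
  }, integer(1))
}

end_year(c("1868", "1878-1880", "1846\u201352", "1860-1"))
```

The call returns 1868, 1880, 1852, and 1861, matching the examples listed above.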
Now we have the year, so the next step is to extract the decade. To do so, we need to make sure that Year column is numeric and apply the necessary calculation (check here for it: https://stackoverflow.com/a/48966643/8864619)
col_table <- col_table %>%
mutate(Decade = as.numeric(Year) - as.numeric(Year) %% 10)
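As a small worked example of the decade calculation, assuming a handful of illustrative end years rather than the scraped table:

```r
# Illustrative end years, as the previous step might produce them
years <- c(1852, 1861, 1868, 1880)

# Subtract the remainder of division by 10 to round down to the decade
decades <- years - years %% 10
decades
```

Here 1861 and 1868 both map to 1860, while 1880 stays 1880.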
To reproduce the plot we need to group by decade and make sure that the Excess Mortality midpoint column is numeric to be able to get the sum of victims per decade.
col_table <- col_table %>%
mutate(`Excess Mortality midpoint` = as.numeric(gsub(",", "", `Excess Mortality midpoint`))) %>%
group_by(Decade) %>%
summarize(val = sum(`Excess Mortality midpoint`)) %>%
ungroup()
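The conversion and the per-decade sum can be tried on a few made-up rows. This sketch uses base R's aggregate() as a stand-in for group_by() plus summarize(); the data frame `toy` and its column `midpoint` are assumptions for illustration only.

```r
# Illustrative rows only, not the scraped table
toy <- data.frame(
  Decade   = c(1850, 1860, 1860, 1880),
  midpoint = c("1,000,000", "2,000,000", "30,000", "750,000")
)

# Strip the thousands separators before converting to numeric
toy$midpoint <- as.numeric(gsub(",", "", toy$midpoint))

# Sum the victims within each decade
per_decade <- aggregate(midpoint ~ Decade, data = toy, FUN = sum)
per_decade
```

Without the gsub() step, as.numeric() would return NA for every value that contains a comma.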
For the plot itself, ggplot2 is used:
ylab <- c(5, 10, 15, 20, 25)
options(scipen=999)
p <- ggplot(data = col_table, aes(x=factor(Decade),y=val)) +
geom_bar(stat = "identity", fill = "navy") +
scale_x_discrete(labels = col_table %>% distinct(Decade) %>% mutate(Decade = paste0(Decade,"s")) %>% pull()) +
geom_text(aes(label=format(val,big.mark=",")), size=2,vjust=-0.3) +
scale_y_continuous(labels = paste(ylab, "millions"),breaks = 10^6 * ylab) +
ggtitle('Famine victims worldwide')+
theme(panel.background = element_blank(),
panel.border = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(size = 0.05, linetype = 'solid',
colour = "black"),
axis.title.x = element_blank(),
axis.title.y = element_blank())
p
So, putting everything together, the following code derives the end year and the decade for each row, aggregates the victims by decade, and creates the plot you want:
library(rvest)
library(dplyr)
library(stringr)
library(ggplot2)
col_link <- "https://ourworldindata.org/famines#famines-by-world-region-since-1860"
col_page <- read_html(col_link)
col_table <- col_page %>% html_nodes("table#tablepress-73") %>% html_table() %>% . [[1]]
col_table <- col_table %>%
mutate(Year = case_when(
grepl("^\\d{4}$",Year) ~Year,
grepl("\\d{4}[–-]\\d{4}",Year) ~ str_sub(Year, start= -4),
grepl("\\d{4}[–-]\\d{2}$",Year) ~ paste0(str_sub(Year,1,2),str_sub(Year,-2)),
grepl("\\d{4}[–-]\\d{1}$",Year) ~ paste0(str_sub(Year,1,3),str_sub(Year,-1)))) %>%
mutate(Decade = as.numeric(Year) - as.numeric(Year)%%10) %>%
mutate(`Excess Mortality midpoint` = as.numeric(gsub(",", "", `Excess Mortality midpoint`))) %>%
group_by(Decade) %>%
summarize(val = sum(`Excess Mortality midpoint`)) %>%
ungroup()
ylab <- c(5, 10, 15, 20, 25)
options(scipen=999)
p <- ggplot(data = col_table, aes(x=factor(Decade),y=val)) +
geom_bar(stat = "identity", fill = "navy") +
scale_x_discrete(labels = col_table %>% distinct(Decade) %>% mutate(Decade = paste0(Decade,"s")) %>% pull()) +
geom_text(aes(label=format(val,big.mark=",")), size=2,vjust=-0.3) +
scale_y_continuous(labels = paste(ylab, "millions"),breaks = 10^6 * ylab) +
ggtitle('Famine victims worldwide')+
theme(panel.background = element_blank(),
panel.border = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(size = 0.05, linetype = 'solid',
colour = "black"),
axis.title.x = element_blank(),
axis.title.y = element_blank())
p
Here's the reproduced plot:
First, split the year strings with strsplit, build a proper year matrix, combine it with the famine counts divided by the number of years, and reshape to long format (lines 1:6). Next, aggregate the sums by decade and draw the barplot.
r <- strsplit(data1$Year, '-|–|, ') |>
rapply(\(y) unlist(lapply(y, \(x) f(max(as.numeric(y)), x))), how='r') |>
{\(.) t(sapply(., \(x) `length<-`(x, max(lengths(.)))))}() |>
{\(.) cbind(`colnames<-`(., paste0('year.', seq_len(dim(.)[2]))),
n=dim(.)[2] - rowSums(is.na(.)))}() |>
{\(.) data.frame(., f=as.numeric(gsub('\\D', '',
data1$`Excess Mortality midpoint`))/
.[, 'n'])}()|>
reshape(1:3, direction='long') |>
stats:::aggregate.formula(formula=f ~ as.integer(substr(year, 1, 3)),
FUN=sum) |>
t()
## plot
op <- par(mar=c(5, 5, 4, 2)+.1) ## set/store old pars
b <- barplot(r, axes=FALSE, ylim=c(0, max(r[2, ])*1.05),
main='Famine victims')
asq <- seq(0, max(axTicks(2)), 2e6) ## tick positions; must exist before abline() uses them
abline(h=asq, col='lightgrey', lty=3)
barplot(r, names.arg=paste0(r[1, ], '0s'), col='#20254c',
cex.names=.8, axes=FALSE, add=TRUE)
axis(2, asq, labels=FALSE)
mtext(paste(asq/1e6, 'Million'), 2, 1, at=asq, las=2)
text(b, r[2, ] + 5e5, labels=formatC(r[2, ], format='d', big.mark=','), cex=.7)
box()
par(op) ## restore old pars
In line 2, I used this helper function f() to fill up the pseudo-years; define it before running the pipeline above:
f <- \(x1, x2, n1=nchar(x1)) {
u <- lapply(list(x1, x2), as.character)
s <- c(n1 - nchar(u[[2]]) + 1L, n1)
as.integer(`substr<-`(u[[1]], s[1], s[2], u[[2]]))
}
You can refine the aggregation method yourself to make the result exactly look like the original, but maybe this is better :)

How Insert an expression in legend in ggplot2?:: correct color + multiple lines and point

I am new to R and have not been able to correct the following graph.
Xb_exp, it should have blue dots.
Xb_dw, solid red line.
Xb_f, dotted line.
Xb_s, longdash line.
The legend expression should be as shown with the subscript.
I have not been able to correct it.
Is there a way to do this?
my data
CA <- c(3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
Xb_exp <- c(0.0231,0.0519,0.0839,0.1197,0.1595,0.1996,0.2384,0.2772,0.3153,0.3520,0.3887,0.4254,0.4615,0.4978,0.5339,0.5685,0.6000,0.6279,0.6528,0.6762,0.6974,0.7166,0.7346,0.7516,0.7669,0.7810,0.7940,0.8059)
Xb_dw <- c(0.0160,0.0516,0.0886,0.1259,0.1633,0.2006,0.2377,0.2749,0.3122,0.3496,0.3870,0.4245,0.4617,0.4984,0.5339,0.5678,0.5996,0.6288,0.6551,0.6786,0.6994,0.7179,0.7346,0.7499,0.7641,0.7774,0.7899,0.8018)
Xb_f <- c(0.0021,0.0031,0.0046,0.0067,0.0095,0.0131,0.0177,0.0234,0,0387,0.0483,0.0591,0.0709,0.0832,0.0955,0.1073,0.1181,0.1272,0.1345,0.1398,0.1443,0.1456,0.1468,0.1474,0.1476,0.1477,0.1477,0.1477,0.1477)
Xb_s <- c(0.0139,0.0484,0.0839,0.1192,0.1538,0.1874,0.2200,0.2515,0.2818,0.3108,0.3387,0.3653,0.3908,0.4151,0.4383,0.4604,0.4815,0.5015,0.5206,0.5387,0.5559,0.5722,0.5877,0.6024,0.6164,0.6264,0.6421,0.6040)
dat <- c(CA, Xb_exp, Xb_dw, Xb_f, Xb_s)
my code
labels = c(expression(X[b_exp]),expression(X[b_dw]),expression(X[b_f]),expression(X[b_s]))
color4 <- c("Xb_exp"="#3C5488FF", "Xb_dw"="#DC0000FF", "Xb_f"="#00A087FF", "Xb_s"="#4DBBD5FF")
Xb_D1 <- ggplot(data = dat) +
theme_bw() +
labs(x="Crank position (ºCA)", y= bquote('Burn fraction ('~X[b]~')')) +
geom_point(aes(x=CA, y=Xb_exp, colour="Xb_exp"), size=3) +
geom_line(aes(x=CA, y=Xb_dw,colour="Xb_dw"), size=1,linetype="solid") +
geom_line(aes(x=CA, y=Xb_f,colour="Xb_f"), size=1,linetype="dotted") +
geom_line(aes(x=CA, y=Xb_s,colour="Xb_s"), size=1,linetype="longdash") +
scale_colour_manual(values=color4, labels=labels) +
theme(legend.title = element_blank(),legend.position = c(0.8, 0.5),
legend.text = element_text(size = 12)) +
scale_x_continuous(limits = c(2,80))
plot(Xb_D1)
ggplot() requires a dataframe not a vector. If you modify your code with:
dat <- data.frame(CA, Xb_exp, Xb_dw, Xb_f, Xb_s)
and fix the typo in your Xb_f vector
Xb_f <- c(0.0021,0.0031,0.0046,0.0067,0.0095,0.0131,0.0177,0.0234,0.0387,0.0483,0.0591,0.0709,0.0832,0.0955,0.1073,0.1181,0.1272,0.1345,0.1398,0.1443,0.1456,0.1468,0.1474,0.1476,0.1477,0.1477,0.1477,0.1477)
Your remaining code will work as is, but the same result can be achieved more simply using the tidyverse approach below. Use pivot_longer() to stack the y variables against your x variable.
dat %>%
pivot_longer(Xb_exp:Xb_s) %>%
ggplot(aes(x = CA, y = value, colour = name)) +
geom_point() +
geom_line() +
scale_colour_manual(values=color4, labels=labels) +
theme_bw() +
theme(legend.title = element_blank(),legend.position = c(0.8, 0.5),
legend.text = element_text(size = 12)) +
scale_x_continuous(limits = c(2,80)) +
labs(x="Crank position (ºCA)", y= bquote('Burn fraction ('~X[b]~')'))
Ironically, setting this up with conventional plotting is rather simple:
Given all the data above:
linetypes4 <- c( Xb_exp=NA, Xb_dw="solid", Xb_f="dotted", Xb_s="longdash" )
plot(
NA, type="n", xlim=c(0,30), ylim=c(0,0.8),
xlab = "Crank position (ºCA)", ylab = bquote('Burn fraction ('~X[b]~')'),
panel.first = grid()
)
with( dat, {
points( x=CA, y=Xb_exp, pch=19, col=color4["Xb_exp"], cex=1.2 ) # cex, not size, scales base-graphics points
for( n in c("Xb_dw", "Xb_f", "Xb_s")) {
lines( x=CA, y=get(n), lty=linetypes4[n], col=color4[n], lwd=2 )
}
})
legend(
x = "right",
legend = labels,
col = color4,
lty = linetypes4,
pch = c(19,NA,NA,NA),
box.lwd = 0,
inset = .02
)
There are some errors in your code suggesting you didn't try what you pasted.
0,0387, in your data should likely be 0.0387, otherwise nothing is right (no data measures several hundreds in there)
c(CA, ... ) should likely be data.frame( CA, ... )
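Both problems are easy to reproduce on a shortened version of the data. A minimal sketch, with three values per series instead of the full vectors:

```r
CA     <- c(3, 4, 5)
Xb_exp <- c(0.0231, 0.0519, 0.0839)

flat <- c(CA, Xb_exp)           # one unnamed numeric vector, the columns are lost
dat  <- data.frame(CA, Xb_exp)  # keeps one named column per series

length(flat)
dim(dat)

# A comma instead of a decimal point silently adds an extra element
with_typo <- c(0.0234, 0,0387, 0.0483)
length(with_typo)
max(with_typo)
```

The vector with the typo has four elements instead of three, and one of them is 387, which is why the y-axis ends up far outside the 0 to 1 range.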
Now, the first problem is you are doing all the heavy lifting yourself, while ggplot sits there with nothing left to do. It was designed to set up colors and line types by group. You however need to transform the data first to take full advantage of that:
library(tidyr)   # pivot_longer()
library(dplyr)   # %>% and filter()
library(ggplot2)
CA <- c(3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
Xb_exp <- c(0.0231,0.0519,0.0839,0.1197,0.1595,0.1996,0.2384,0.2772,0.3153,0.3520,0.3887,0.4254,0.4615,0.4978,0.5339,0.5685,0.6000,0.6279,0.6528,0.6762,0.6974,0.7166,0.7346,0.7516,0.7669,0.7810,0.7940,0.8059)
Xb_dw <- c(0.0160,0.0516,0.0886,0.1259,0.1633,0.2006,0.2377,0.2749,0.3122,0.3496,0.3870,0.4245,0.4617,0.4984,0.5339,0.5678,0.5996,0.6288,0.6551,0.6786,0.6994,0.7179,0.7346,0.7499,0.7641,0.7774,0.7899,0.8018)
Xb_f <- c(0.0021,0.0031,0.0046,0.0067,0.0095,0.0131,0.0177,0.0234,0.0387,0.0483,0.0591,0.0709,0.0832,0.0955,0.1073,0.1181,0.1272,0.1345,0.1398,0.1443,0.1456,0.1468,0.1474,0.1476,0.1477,0.1477,0.1477,0.1477)
Xb_s <- c(0.0139,0.0484,0.0839,0.1192,0.1538,0.1874,0.2200,0.2515,0.2818,0.3108,0.3387,0.3653,0.3908,0.4151,0.4383,0.4604,0.4815,0.5015,0.5206,0.5387,0.5559,0.5722,0.5877,0.6024,0.6164,0.6264,0.6421,0.6040)
dat <- data.frame(CA, Xb_exp, Xb_dw, Xb_f, Xb_s)
color4 <- c("Xb_exp"="#3C5488FF", "Xb_dw"="#DC0000FF", "Xb_f"="#00A087FF", "Xb_s"="#4DBBD5FF")
linetypes <- c( Xb_dw="solid", Xb_f="dotted", Xb_s="longdash" )
dat2 <- pivot_longer( dat, cols=starts_with("Xb_") )
dat2.line <- dat2 %>% filter( name != "Xb_exp" )
dat2.point <- dat2 %>% filter( name == "Xb_exp" )
dat2 is now a long data set, with data category as a variable, not with a separate column for each data series. This is how ggplot likes it:
dat2
# A tibble: 112 x 3
CA name value
<dbl> <fct> <dbl>
1 3 Xb_exp 0.0231
2 3 Xb_dw 0.016
3 3 Xb_f 0.0021
4 3 Xb_s 0.0139
5 4 Xb_exp 0.0519
6 4 Xb_dw 0.0516
7 4 Xb_f 0.0031
8 4 Xb_s 0.0484
9 5 Xb_exp 0.0839
10 5 Xb_dw 0.0886
# … with 102 more rows
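To see the wide-to-long idea without any packages, here is a rough base R equivalent on two rows and two series. It uses stack() instead of pivot_longer(), so the row order differs from the tibble shown above; `wide` and `long` are illustrative names.

```r
# Two rows and two series taken from the data above
wide <- data.frame(
  CA     = c(3, 4),
  Xb_exp = c(0.0231, 0.0519),
  Xb_dw  = c(0.0160, 0.0516)
)

# stack() puts all values in one column and the source column name in another
s <- stack(wide[c("Xb_exp", "Xb_dw")])

long <- data.frame(
  CA    = rep(wide$CA, times = 2),  # repeat x once per stacked series
  name  = as.character(s$ind),
  value = s$values
)
long
```

Each row of `long` is now one (CA, series, value) observation, which is the shape that lets ggplot map `name` to colour or line type.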
I then split the data into what later goes to points and what goes to lines, just so the plot code is no uglier than it has to be:
Xb_D1 <- ggplot(data = dat2.line, aes(x=CA,y=value,color=name)) +
theme_bw() +
labs(x="Crank position (ºCA)", y= bquote('Burn fraction ('~X[b]~')')) +
geom_point( data = dat2.point, size=3) +
geom_line( aes(col=name,lty=name), size=1 ) +
scale_colour_manual(values=color4) +
scale_linetype_manual( values=linetypes, guide="none" ) +
guides(
# legend keys are in alphabetical order: Xb_dw, Xb_exp, Xb_f, Xb_s
color = guide_legend( override.aes=list( shape=c(NA,19,NA,NA), linetype=c("solid","blank","dotted","longdash") ) )
) +
theme(legend.title = element_blank(),legend.position = c(0.8, 0.5),
legend.text.align = 0,
legend.text = element_text(size = 12)) +
scale_x_continuous(limits = c(2,30))
print(Xb_D1)
no need to supply labels
use line type as you would use color with ggplot; it's just one more channel that can carry information (or aesthetic, as they like to call it over there)
align the legend text left; it looks nicer that way
more sophisticated is the use of override.aes to take the points away from the legend entries that shouldn't have them.
Now, I was unable to change the order of the data series in the legend, which can be a hassle. Is the order they are in still OK for you?
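On the ordering question: for a discrete variable, ggplot2 orders legend entries by factor level, and a character column is treated as if its levels were sorted alphabetically. A possible route, sketched here on a plain vector rather than on dat2, is to convert `name` to a factor with the levels in the desired order before plotting.

```r
series <- c("Xb_exp", "Xb_dw", "Xb_f", "Xb_s", "Xb_exp", "Xb_dw")

# Default: levels are sorted, so Xb_dw comes before Xb_exp
default_levels <- levels(factor(series))

# Explicit levels fix the order that a discrete legend would follow
name <- factor(series, levels = c("Xb_exp", "Xb_dw", "Xb_f", "Xb_s"))
levels(name)
```

If the order is changed this way, the vectors passed to override.aes need to be rearranged to match. Passing breaks = c("Xb_exp", "Xb_dw", "Xb_f", "Xb_s") to scale_colour_manual() is another commonly used route.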

Split dataframe and Create multipanel scatterplots from list of data frames

I have a dataframe like so:
set.seed(453)
year= as.factor(c(rep("1998", 20), rep("1999", 16)))
lepsp= c(letters[seq(from = 1, to = 20 )], c('a','b','c'),letters[seq(from =8, to = 20 )])
freq= c(sample(1:15, 20, replace=T), sample(1:18, 16,replace=T))
df<-data.frame(year, lepsp, freq)
df<-
df %>%
group_by(year) %>%
mutate(rank = dense_rank(-freq))
Frequencies freq of each lepsp within each year are ranked in the rank column. Larger freq values correspond to the smallest rank value and smaller freq values have the largest rank values. Some rankings are repeated if levels of lepsp have the same abundance.
I would like to split the df into multiple subsets by year. Then I would like to plot each subsetted dataframe in a multipanel figure. Essentially this is to create species abundance curves. The x-axis would be rank and the y-axis needs to be freq.
In my real dataframe I have 22 years of data. I would prefer the graphs to be displayed as 2 columns of 4 rows for a total of 8 graphs per page. Essentially I would have to repeat the solution offered here 3 times.
I also need to demarcate the 25%, 50% and 75% quartiles with vertical lines to look like this (desired result):
It would be great if each graph specified the year to which it belonged, but since all axes have the same names, I do not want x and y labels to be repeated for each graph.
I have tried to plot multiple lines on the same graph but it gets messy.
year.vec<-unique(df$year)
plot(sort(df$freq[df$year==year.vec[1]],
decreasing=TRUE),bg=1,type="b", ylab="Abundance", xlab="Rank",
pch=21, ylim=c(0, max(df$freq)))
for (i in 2:22){
points(sort(df$freq[df$year==year.vec[i]], decreasing=TRUE), bg=i,
type="b", pch=21)
}
legend("topright", legend=year.vec, pt.bg=1:22, pch=21)
I have also tried a loop, however it does not produce an output and is missing some of the arguments I would like to include:
jpeg('pract.jpg')
par(mfrow = c(6, 4)) # 4 rows and 2 columns
for (i in unique(levels(year))) {
plot(df$rank,df$freq, type="p", main = i)
}
dev.off()
Update
(Attempted result)
I found the following code after my post which gets me a little closer, but is still missing all the features I would like:
library(reshape2)
library(ggplot2)
library (ggthemes)
x <- ggplot(data = df2, aes(x = rank, y = rabun)) +
geom_point(aes(fill = "dodgerblue4")) +
theme_few() +
ylab("Abundance") + xlab("Rank") +
theme(axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.text.x = element_text(size = 15),
axis.text.y = element_text(size = 15),
plot.title = element_blank(), # we don't want individual plot titles as the facet "strip" will give us this
legend.position = "none", # we don't want a legend either
panel.border = element_rect(fill = NA, color = "darkgrey", size = 1.25, linetype = "solid"),
axis.ticks = element_line(colour = 'darkgrey', size = 1.25, linetype = 'solid')) # here, I just alter to colour and thickness of the plot outline and tick marks. You generally have to do this when faceting, as well as alter the text sizes (= element_text() in theme also)
x
x <- x + facet_wrap( ~ year, ncol = 4)
x
I prefer base R to modify graph features, and have not been able to find a method using base R that meets all my criteria above. Any help is appreciated.
Here's a ggplot approach. First off, I made some more data to get the 3x2 layout:
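df = df %>% ungroup() %>% mutate(year = as.numeric(as.character(year))) # year is a factor in the question, so convert it before adding
df = bind_rows(df, mutate(df, year = year + 4), mutate(df, year = year + 8))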
Then we do a little manipulation to generate the quantiles and labels by group:
df_summ =
df %>% group_by(year) %>%
do(as.data.frame(t(quantile(.$rank, probs = c(0, 0.25, 0.5, 0.75)))))
names(df_summ)[2:5] = paste0("q", 0:3)
df_summ_long = gather(df_summ, key = "q", value = "value", -year) %>%
inner_join(data.frame(q = paste0("q", 0:3), lab = c("Common", "Rare-75% -->", "Rare-50% -->", "Rare-25% -->"), stringsAsFactors = FALSE))
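To check what the quantile step computes, the per-year quartiles of rank can also be obtained in base R. A minimal sketch on made-up ranks; `toy` is not the question's data.

```r
# Made-up ranks for two years
toy <- data.frame(
  year = rep(c(1998, 1999), each = 5),
  rank = c(1, 2, 3, 4, 5, 1, 1, 2, 3, 9)
)

# One column per year, one row per quartile
q <- sapply(split(toy$rank, toy$year), quantile, probs = c(0.25, 0.5, 0.75))
q
```

Each column holds the x positions at which the vertical lines for that year's panel would be drawn.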
With the data in good shape, plotting is fairly simple:
library(ggthemes)
library(ggplot2)
ggplot(df, aes(x = rank, y = freq)) +
geom_point() +
theme_few() +
labs(y = "Abundance (% of total)", x = "Rank") +
geom_vline(data = df_summ_long[df_summ_long$q != "q0", ], aes(xintercept = value), linetype = 4, size = 0.2) +
geom_text(data = df_summ_long, aes(x = value, y = Inf, label = lab), size = 3, vjust = 1.2, hjust = 0) +
facet_wrap(~ year, ncol = 2)
There's some work left to do - mostly in the rarity text overlapping. It might not be such an issue with your actual data, but if it is you could pull the max y values into df_summ_long and stagger them a little bit, actually using y coordinates instead of just Inf to get it at the top like I did.

Add ticks in-between discrete groups on x-axis

I want to replace one of my grouped boxplots (below) with a before-after kind of plot, but keep it grouped. This one was made using ggboxplot() from ggpubr. I know there's also ggpaired(), but I couldn't manage to make it grouped like this one.
Thanks to this question I was able to create a grouped before-after graph like this one. I would now like to change the axis from 4 marks to just 2 (just "yes" and "no", since "before" and "after" are still in the legend).
Here's my code with dummy data:
library(tidyverse)
set.seed(123)
data.frame(ID = rep(LETTERS[1:10], 2),
consent = rep(sample(c("Yes", "No"), 10, replace = T), 2),
height = sample(rnorm(20, 170, sd = 10)),
ind = rep(c("before", "after"), each = 2)
) %>%
ggplot(aes(x = interaction(ind, consent), y = height, color = ind))+
geom_point()+
geom_line(aes(group = interaction(ID, consent)), color = "black")+
scale_x_discrete("response")
Is it even possible to reduce number of categories on axis? Or can I create grouped plot using ggpaired(), but without using facets?
One solution is to create a dummy numeric variable (with a value in between before and after for each group) and put it on the x-axis. Then you can change its labels.
# Generate OP data
library(tidyverse)
set.seed(123)
df <- data.frame(ID = rep(LETTERS[1:10], 2),
consent = rep(sample(c("Yes", "No"), 10, replace = T), 2),
height = sample(rnorm(20, 170, sd = 10)),
ind = rep(c("before", "after"), each = 2)
)
df$name <- paste(df$consent, df$ind)
# Generate dummy numeric variable for `name` combinations
foo <- data.frame(name = c("Yes before", "Yes", "Yes after",
"No before", "No", "No after"),
X = 1:6)
# name X
# 1 Yes before 1
# 2 Yes 2
# 3 Yes after 3
# 4 No before 4
# 5 No 5
# 6 No after 6
And now we just need to map name to X and put it on x-axis:
df <- merge(foo, df)
ggplot(df, aes(X, height))+
geom_point(aes(color = ind)) +
geom_line(aes(group = interaction(ID, consent))) +
scale_x_continuous(breaks = c(2, 5), labels = foo$name[c(2, 5)])
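The effect of the merge can be checked on a few made-up observations. In this sketch `obs` is a hypothetical stand-in for df with one row per name; the lookup table is the same as above.

```r
# Lookup table from above: positions 2 and 5 are reserved for the axis labels
foo <- data.frame(name = c("Yes before", "Yes", "Yes after",
                           "No before", "No", "No after"),
                  X = 1:6)

# Hypothetical observations, one per consent/ind combination
obs <- data.frame(name = c("Yes before", "Yes after", "No before", "No after"),
                  height = c(170, 172, 165, 168))

# merge() joins on the shared column `name` and keeps only matching rows
merged <- merge(foo, obs)
sort(merged$X)
```

The data only ever land on positions 1, 3, 4, and 6, so the breaks at 2 and 5 sit between each before/after pair and carry the "Yes" and "No" labels.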
#camille made me think about a facet-based solution. Apparently, it is possible to put facet labels not just at the bottom of the plot, but even under the axis, which solved my problem without having to modify my dataframe:
library(ggpubr) #for theme_pubr and JCO palette
ggplot(df, aes(x = ind, y = height, group = ID))+
geom_point(aes(color = ind), size = 3)+
geom_line()+
labs(y = "Height")+
facet_wrap(~ consent,
strip.position = "bottom", ncol = 5)+ #put facet label to the bottom
theme_pubr()+
color_palette("jco")+
theme(strip.placement = "outside", #move the facet label under axis
strip.text = element_text(size = 12),
strip.background = element_blank(),
axis.title.x = element_blank(),
legend.position = "none")
Result with dataframe from the question:

Incorporate more information about variables on axes into a heatmap in ggplot

I would like to annotate a heat-map by putting symbols next to the axis text.
For instance, let's say I am plotting out a distance matrix. Here's an example of such a matrix for environmental community similarity:
library(vegan)
library(tidyverse)
data(varespec)
data(varechem)
library(reshape2)
library(viridis)
vare.dist <- vegdist(varespec)
vare.hc <- hclust(as.dist(vare.dist))
vare.dist.long <- vare.dist %>% as.matrix %>% melt %>%
mutate(Var1 = factor(Var1, levels = unique(vare.hc$labels)[vare.hc$order]))%>%
mutate(Var2 = factor(Var2, levels = unique(vare.hc$labels)[vare.hc$order]))
vare.dist.long %>% #as.matrix %>% .[vare.hc$order, vare.hc$order] %>% melt %>%
ggplot(aes(x = Var1, y = Var2, fill = value)) + geom_tile() + scale_fill_viridis(direction = 1) +
theme(axis.text.x = element_text(angle = 270, hjust = 0, vjust = 0.5
))
Figure 1. A heatmap of distances between different sites in the varespec data set.
Let's say I want to indicate which sites have higher than average nitrogen, phosphorus and potassium. I want to show this information along the x-axis, reserving the y-axis for some other purpose. One ugly way of doing this would be to modify the text strings, and the text font, of the x-axis labels.
## Define a few helper functions
transmit_factor_order <- function(ordered, unordered){
# allows us to put a character vector or unordered factor in the same order as another factor
# ordered is an ordered factor
# unordered is an unordered factor or character vector
if(!is.factor(ordered)){
stop("'ordered' must be of class factor")
}
nrow = length(ordered)
df = data.frame(ordered, unordered, oldorder = 1:nrow)
df = df[order(df[,"ordered"]),]
df[,"unordered"] = factor(df[,"unordered"], levels = unique(df[,"unordered"]))
df = df[order(df[,"oldorder"]),]
df$unordered
}
medcode_chr <- function(vec, low = "", high = "o"){
# convert a vector of numbers into one of symbols (or numbers) with one value
# for lower than median values and one for higher than median values
sapply(vec, function(x){
if(x < median(na.omit(vec))){low}else{high}
}
)
}
## Actual Work
vare.dist.long %>%
# convert numeric vector to character
mutate(Var1_chr = as.character(Var1)) %>%
# append the environmental data
left_join(varechem %>% rownames_to_column %>% dplyr::select(rowname:K),
by = c('Var1_chr' = 'rowname')) %>%
# make new columns with symbols that we display if values are bigger than the median
mutate(highN = medcode_chr(N),
highP = medcode_chr(P, high = "+"),
highK = medcode_chr(K, low = 0, high = 1)) %>%
# make a new name, which is the number, appended to the symbols defined above
unite(Var1_Annotated, Var1, highN, highP, sep = " ", remove = FALSE) %>%
# make sure that newly named vector is in the same order as Var1
#(which was ordered for clustering purposes)
mutate(Var1_Annotated = transmit_factor_order(Var1, Var1_Annotated)) %>%
# do the same thing to the character version of Var1, which will be useful downstream
mutate(Var1_chr = transmit_factor_order(Var1, Var1_chr))-> vare.data
vare.data %>%
ggplot(aes(x = Var1_Annotated, y = Var2, fill = value)) + geom_tile() + scale_fill_viridis(direction = 1) +
theme(axis.text.x = element_text(angle = 270, hjust = 0, vjust = 0.5,
face = ifelse(vare.data$highK, "bold", "plain"),
colour = ifelse(vare.data$highK, "red", "blue")
))
Figure 2. Another heatmap of distances between different sites in the varespec data set. This time the x-axis names have been modified to show whether nitrogen is high or low (presence or absence of "o"), whether phosphorus is high or low (presence or absence of "+"), and whether potassium is high (red) or low (blue).
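The median-based coding used for these annotations can also be written in vectorised form. This is a hypothetical variant, not the medcode_chr() helper above: `med_code()` computes the median once and, unlike the original, returns NA for missing values instead of failing.

```r
# Hypothetical vectorised variant of the median-based coding
med_code <- function(vec, low = "", high = "o") {
  cutoff <- median(vec, na.rm = TRUE)  # computed once for the whole vector
  ifelse(vec < cutoff, low, high)      # values at or above the median count as high
}

symbols <- med_code(c(1, 5, 9))
flags   <- med_code(c(2, 8, 4, 6), low = 0, high = 1)
symbols
flags
```

Values equal to the median are coded as high, the same convention as in the original helper.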
This figure conveys the information that I need, but it's kind of ugly. I'd rather, say, place differently colored circles to show which sites have high nitrogen, phosphorus and potassium. I'm thinking something like the following, but actually part of the previous figure.
vare.data %>% dplyr::select(-c(Var2, value)) %>% unique %>% arrange(Var1_Annotated) %>%
mutate(highN = medcode_chr(N, 0, 1),
highP = medcode_chr(P, 0, 1),
highK = medcode_chr(K, 0, 1)) %>%
dplyr::select(-c(N,P,K, Var1, Var1_Annotated)) %>%
gather(key, value, -Var1_chr) %>%
filter(value == 1) %>%
ggplot(aes(x = Var1_chr, y = key, color = key)) + geom_point() +
theme(axis.text.x = element_text(angle = 270, hjust = 0, vjust = 0.5),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_blank()) +
coord_fixed(ratio = 0.3)
Figure 3. Colored points that I'd like to incorporate into Figure 1 to make it prettier than Figure 2 while conveying the same information.
Is there some way I can incorporate the colored dots (Figure 3) into the heatmap (Figure 1) so I can show the data about how the sites cluster and concurrently tell information about the different sites, as I do in Figure 2?
Thanks for any advice!
One (rather dirty) option would be to arrange both grobs, align them vertically, tinker with the grobs' vertical position, and edit the x-axis scales and titles.
library(cowplot)
A <- vare.dist.long %>%
ggplot(aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
scale_fill_viridis(direction = 1) +
theme(axis.text.x = element_blank(),
axis.title.x = element_blank(),
plot.margin = unit(c(1,1,-1.5,1), "cm") ## Note the -1.5, it tells the grob to print itself 1.5 cm below its normal position.
) #/theme
B <- vare.data %>% dplyr::select(-c(Var2, value)) %>% unique %>%
arrange(Var1_Annotated) %>%
mutate(highN = medcode_chr(N, 0, 1),
highP = medcode_chr(P, 0, 1),
highK = medcode_chr(K, 0, 1)) %>%
dplyr::select(-c(N,P,K, Var1, Var1_Annotated)) %>%
gather(key, value, -Var1_chr) %>%
filter(value == 1) %>%
ggplot(aes(x = Var1_chr, y = key, color = key)) +
geom_point() +
theme(axis.text.x = element_text(angle = 270, hjust = 0, vjust = 0.5),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
plot.margin = unit(c(0,1,1,1), "cm")) +
coord_fixed(ratio = 0.3)
cowplot::plot_grid(A,B, nrow = 2, align = "v")

Resources