R ggplot2 facet chart with annotations varying among the facets

I am trying to construct a facet plot using ggplot2 with an annotation that varies from one facet to the next. The annotation is to be located using the plot area coordinates between 0 and 1, rather than the usual (x,y) coordinates, and is to be in the same location for every facet. The annotation is to be constructed using the y aesthetic and the paste0() function.
My reprex shows one case that works, but this case does not include the part that comes from the y aesthetic and the annotation does not vary among the facets. The reprex also shows a second case where the percentage change in the last (most recent) value for the y aesthetic is added to the annotation, and this does not work. It is this second case that I want to solve.
The reprex uses the ggpp package, but I have also tried annotation_custom() instead of ggpp, without success. Any help is much appreciated.
Here is my reprex:
# Reprex for facets with placed annotation
library(ggplot2)
library(ggpp)
PC <- function(x) {y <- round(100*(x/lag(x)-1),1)}
df <- data.frame(tm=1:25,A=sample(1:100,25,replace=T),
B=sample(1:100,25,replace=T),C=sample(1:100,25,replace=T),
D=sample(1:100,25,replace=T))
df <- tidyr::pivot_longer(df,cols=2:5,names_to="City",values_to="Value")
# This works:
ggplot(df,aes(x=tm,y=Value))+
geom_line()+
scale_y_continuous(lim=c(-10,100))+
ggpp::geom_text_npc(aes(npcx = x, npcy = y, label=label),
data = data.frame(x = 0.05, y = 0.05,
label='% change in various cities'))+
facet_wrap(~City,scale="free_y")
# But this does not work:
ggplot(df,aes(x=tm,y=Value))+
geom_line()+
scale_y_continuous(lim=c(-10,100))+
ggpp::geom_text_npc(aes(npcx = x, npcy = y, label=label),
data = data.frame(x = 0.05, y = 0.05,
label=paste0("Last change in this city ",PC(y)[25],'%')))+
facet_wrap(~City,scale="free_y")

The solution below is a bit complicated, and there are probably simpler ones, but it works.
1. Function PC()
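Without loading package dplyr, your function PC calls stats::lag, not dplyr::lag. On a plain vector, stats::lag does not shift the values (it only changes their time-series attributes), so x/lag(x) - 1 is 0 everywhere. The function also assigns the result to y, which returns it only invisibly; it is cleaner to return the expression directly. The right version is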
PC <- function(x) {round(100*(x/dplyr::lag(x) - 1), 1)}
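A short base-R check of the lag problem, using a toy vector assumed for illustration. It also shows one way to avoid the dplyr dependency: shift the vector by hand with c(NA, head(x, -1)). The name pc_base is hypothetical and not part of the original answer.

```r
# Toy vector, assumed for illustration
x <- c(100, 110, 99, 120)

# stats::lag() leaves the values of a plain vector where they are,
# so every "change" computed this way is 0
naive <- 100 * (x / stats::lag(x) - 1)
print(as.numeric(naive))

# Hypothetical base-R replacement for PC(): compare each value with
# the previous one; the first element has no predecessor, so it is NA
pc_base <- function(x) {
  round(100 * (x / c(NA, head(x, -1)) - 1), 1)
}
print(pc_base(x))
```

dplyr::lag() does the same shift, so either version gives the same percentages.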
2. The data
The plot is created with data = df, but the label layer gets its own data set, so the y it uses no longer comes from df.
The ggpp::geom_text_npc layer cannot compute PC(y) correctly because PC(y) is evaluated when data.frame() is called, before ggplot2 is involved. At that point y is neither a column of df nor the y = 0.05 column being created. R looks it up in the calling environment, so the data.frame is ill formed.
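The scoping problem can be reproduced in base R with toy data (assumed). The arguments to data.frame() are evaluated immediately, so PC(y) fails unless some unrelated object called y happens to exist. The PC() defined here is a base-R stand-in for the dplyr version above.

```r
# Base-R stand-in for PC(), same formula as the dplyr version
PC <- function(x) round(100 * (x / c(NA, head(x, -1)) - 1), 1)
df <- data.frame(tm = 1:3, Value = c(50, 60, 45))

# 'y' is looked up when data.frame() is called, and no object named y exists
msg <- tryCatch(
  data.frame(x = 0.05, y = 0.05,
             label = paste0("Last change in this city ", PC(y)[3], "%")),
  error = conditionMessage
)
print(msg)

# Referring to the real column works, but it yields a single label for the
# whole data set rather than one per facet
ok <- data.frame(x = 0.05, y = 0.05,
                 label = paste0("Last change in this city ", PC(df$Value)[3], "%"))
print(ok$label)
```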
One way to fix this is to note that only 4 labels are plotted, one per city, and to compute each city's last change beforehand. This is very simple:
Value <- with(df, tapply(Value, City, \(y) PC(y)[length(y)]))
Value
# A B C D
# -24.1 -16.7 -91.3 46.9
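The label data then becomes the following. It needs a City column, named like the faceting variable, so that facet_wrap() sends each label to its own panel: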
df_labels <- data.frame(
x = rep(0.05, length(Value)), y = rep(0.05, length(Value)),
City = names(Value),
label = paste0("Last change in this city ", Value, "%")
)
3. The plot
Full reproducible example, from top to bottom.
# Reprex for facets with placed annotation
suppressPackageStartupMessages({
library(ggplot2)
library(ggpp)
})
set.seed(2022)
PC <- function(x) {round(100*(x/dplyr::lag(x) - 1), 1)}
df <- data.frame(tm=1:25,A=sample(1:100,25,replace=TRUE),
B=sample(1:100,25,replace=TRUE),
C=sample(1:100,25,replace=TRUE),
D=sample(1:100,25,replace=TRUE))
df <- tidyr::pivot_longer(df,cols=2:5,names_to="City",values_to="Value")
Value <- with(df, tapply(Value, City, \(y) PC(y)[length(y)]))
df_labels <- data.frame(
x = rep(0.05, length(Value)), y = rep(0.05, length(Value)),
City = names(Value),
label = paste0("Last change in this city ", Value, "%")
)
ggplot(df, aes(x = tm, y = Value)) +
geom_line() +
scale_y_continuous(limits = c(-10, 100)) +
ggpp::geom_text_npc(
data = df_labels,
mapping = aes(
npcx = x, npcy = y,
label = label
)
) +
facet_wrap(~ City, scales = "free_y")
Created on 2022-08-08 by the reprex package (v2.0.1)
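If you would rather not depend on ggpp, here is a sketch using plain ggplot2. This is an alternative technique, not the answer's. Positions of -Inf are drawn at the panel edges, so every facet gets the label in the same corner regardless of the free y scales. The trade-off is that you get a corner rather than an arbitrary npc position such as (0.05, 0.05). The toy data and the names last_pc and df_labels are assumptions. As above, the key point is that the label data frame carries a City column.

```r
library(ggplot2)

set.seed(2022)
# Toy long data, assumed: 25 time points for each of four cities
df <- data.frame(
  tm = rep(1:25, times = 4),
  City = rep(c("A", "B", "C", "D"), each = 25),
  Value = sample(1:100, 100, replace = TRUE)
)

# Last percentage change per city, computed in base R
last_pc <- sapply(split(df$Value, df$City), function(v) {
  n <- length(v)
  round(100 * (v[n] / v[n - 1] - 1), 1)
})

# One row per city; City matches the faceting variable
df_labels <- data.frame(
  City = names(last_pc),
  label = paste0("Last change in this city ", last_pc, "%")
)

p <- ggplot(df, aes(x = tm, y = Value)) +
  geom_line() +
  # x = -Inf, y = -Inf pins each label to the bottom-left corner of its panel;
  # negative hjust/vjust push the text inwards instead of centring it on the corner
  geom_text(
    data = df_labels,
    aes(x = -Inf, y = -Inf, label = label),
    hjust = -0.05, vjust = -0.5, inherit.aes = FALSE
  ) +
  facet_wrap(~ City, scales = "free_y")
p
```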

Related

Setting per-column y axis limits with facet_grid

Using R and ggplot2, I am plotting the development over time of several variables for several groups in my sample (days of the week, to be precise). An artificial sample (long data suitable for plotting) is:
library(tidyverse)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>% ggplot(mapping = aes(x = x, y = values)) + geom_line() + facet_grid(groups2 ~ groups1)
which gives a 7 × 2 grid of panels (rows: groups2, columns: groups1).
In this example, the first variable -- shown in the left column -- has unlimited range, while the second variable -- shown in the right column -- is weakly positive.
I would like to reflect this in my plot by allowing the Y axes to differ across the columns, i.e. by setting the Y axis limits separately for the two variables plotted. However, to allow easy visual comparison of the different groups for each of the two variables, I would also like to have identical Y axes within each column.
I've looked at the scales option to facet_grid(), but it does not seem to be able to do what I want. Specifically,
passing scales = "free_y" allows the Y axes to vary across rows, while
passing scales = "free_x" allows the X axes to vary across columns, but
there is no option to allow the Y axes to vary across columns (nor, presumably, the X axes across rows).
As usual, my attempts to find a solution have yielded nothing. Thank you very much for your help!
I think the easiest approach would be to create one plot per facet column and bind them with something like {patchwork}. To get the facet look, you can still add a faceting layer.
library(tidyverse)
library(patchwork)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
set.seed(42) ## always better to set a seed before using random functions
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>%
group_split(groups1) %>%
map({
~ggplot(.x, aes(x = x, y = values)) +
geom_line() +
facet_grid(groups2 ~ groups1)
}) %>%
wrap_plots()
Created on 2023-01-11 with reprex v2.0.2

Add a labelling function to just first or last ggplot label

I often find myself working with data that have long-tailed distributions, so that a huge share of the range of values lies in the top 1-2% of the data. When I plot the data, the upper outliers wash out the variation in the rest of the data, but I want to show those differences.
I know there are other ways of handling this, but I found that capping the values towards the end of the distribution and then applying a continuous color palette (i.e., in ggplot) is one way that works for me to represent the data. However, I want to ensure the legend stays accurate by adding a >= sign to the last legend label.
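The capping step described above can be sketched in base R with quantile() and pmin(). The 98th-percentile threshold is an assumption; choose whatever suits the data.

```r
set.seed(123)
z <- rnorm(1e3, mean = 50, sd = 15)

# Cap at the 98th percentile (threshold assumed for illustration)
cap <- quantile(z, 0.98, names = FALSE)
z_capped <- pmin(z, cap)

# Every value above the cap now sits exactly at the cap, which is why
# the top legend label should read ">=" rather than a plain number
sum(z_capped == cap)
```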
The picture below shows the kind of legend I'd like to achieve programmatically, with the >= sign drawn in messily in red.
I also know I can manually set breaks and labels, but I'd really like to do something like if (it's the last label) paste0(">=", label) else label (in pseudocode).
Reproducible example:
(I want to alter the plot legend to prefix just the last label)
library(ggplot2)
library(dplyr) # provides tibble() and %>%
set.seed(123)
x <- rnorm(1:1e3)
y <- rnorm(1:1e3)
z <- rnorm(1e3, mean = 50, sd = 15)
d <- tibble(x = x
,y = y
,z = z)
d %>%
ggplot(aes(x = x
,y = y
,fill = z
,color = z)) +
geom_point() +
scale_color_viridis_c()
One option would be to pass a function to the labels argument which replaces the last element or label with your desired label like so:
library(ggplot2)
set.seed(123)
x <- rnorm(1:1e3)
y <- rnorm(1:1e3)
z <- rnorm(1e3, mean = 50, sd = 15)
d <- data.frame(
x = x,
y = y,
z = z
)
ggplot(d, aes(
x = x,
y = y,
fill = z,
color = z
)) +
geom_point() +
scale_fill_continuous(labels = function(x) {
x[length(x)] <- paste0(">=", x[length(x)])
x
}, aesthetics = c("color", "fill"))
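The same labelling function can be pulled out into a named helper and tested on its own. The name label_last_ge is hypothetical. This variant prefixes the last non-missing break. The NA guard is defensive: I have not checked whether every ggplot2 version passes out-of-range breaks to the labels function as NA.

```r
# Hypothetical helper: prefix the last non-missing label with ">="
label_last_ge <- function(x) {
  if (all(is.na(x))) return(x)      # nothing to label
  last <- max(which(!is.na(x)))     # position of the last real break
  x[last] <- paste0(">=", x[last])  # coerces the whole vector to character
  x
}

label_last_ge(c(20, 40, 60))
label_last_ge(c(20, 40, 60, NA))
```

It can then be passed directly as scale_fill_continuous(labels = label_last_ge, aesthetics = c("color", "fill")).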

violin plot in R and values in X axis

I created these two violin plots in R, using:
install.packages("vioplot")
par(mfrow = c(1, 2))
vioplot::vioplot(HEL$Y,las=2,main="HEL$Y",col="deepskyblue",notch=TRUE)
vioplot::vioplot(ITA$Y,las=2,main="ITA$Y",col="aquamarine",notch=TRUE)
As a result, I get the following. However, I don't know why I get 1 and 2 on the X axis. How can I get rid of the 2?
Thanks for your help.
This mysterious behavior is due to the use of the argument "notch = TRUE". Example:
library(vioplot)
set.seed(456)
vioplot(rnorm(10), notch = TRUE)
My interpretation is that notch is not an argument of vioplot, so the function interprets it as data to add to the graph (see the little smudge at y = 1: that's where it wants to put the new data, since TRUE equals 1 when converted to numeric).
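The mechanism can be mimicked with a toy function. toy_plot is hypothetical and not vioplot's actual code. The assumption is that, like vioplot, it collects every unmatched argument in ... as another data set to draw.

```r
# Toy function, assumed to mimic how vioplot gathers its data: any
# argument that does not match a formal (like notch) ends up in ...
toy_plot <- function(x, ..., col = "grey") {
  datas <- list(x, ...)
  vapply(datas, length, integer(1))  # one entry per "data set" to draw
}

set.seed(456)
toy_plot(rnorm(10))                # a single data set
toy_plot(rnorm(10), notch = TRUE)  # TRUE is treated as a second data set
```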
To confirm that an unknown argument is interpreted as data to be plotted, here is a little experiment:
vioplot(rnorm(10), unknown_argument = rnorm(10))
And the result:
This is a ggplot2 solution in case you're interested.
library(ggplot2)
library(dplyr)
# Recreate similar data
HEL <- data.frame(Y = rnorm(50, 8, 3))
ITA <- data.frame(Y = rnorm(50, 9, 2))
# Join in a single dataframe and reshape to longer format
dat <- bind_rows(rename(HEL, hel_y = Y),
rename(ITA, ita_y = Y)) |>
tidyr::pivot_longer(everything(), values_drop_na = TRUE)
# Make the plots
dat |>
ggplot(aes(name, value)) +
geom_violin(aes(fill = name)) +
geom_boxplot(width = 0.1) +
scale_fill_manual(values = c("deepskyblue", "aquamarine")) +
theme(legend.position = "none")
Created on 2022-04-28 by the reprex package (v2.0.1)

R colour code plot by rownames for principal component analysis

I am attempting to complete a principal component analysis on a set of data containing columns of numeric data.
Assuming a dataset like this (in reality I have a preconfigured data frame; this one is for reproducibility):
v1 <- c(1,2,3,4,5,6,7)
v2 <- c(3,6,2,5,2,4,9)
v3 <- c(6,1,4,2,3,7,5)
dataset <-data.frame(v1,v2,v3)
row.names(dataset) <-c('New York', 'Seattle', 'Washington DC', 'Dallas', 'Chicago','Los Angeles','Minneapolis')
I have run my principal component analysis and successfully plotted it:
pca=prcomp(dataset,scale=TRUE)
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
What I want to do however is colour code my data points based on the city, which is the row names of my dataset. I also want to use these cities (i.e. rownames) as labels.
I've tried the following, but neither has worked:
## attempt 1 - I get row labels, but no chart
plot(pca$x[,1], pca$x[,2],col=rownames(dataset),pch=rownames(dataset),
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),cex=0.7,pos=3,col="darkgrey")
## attempt 2
datasetwithcity = rownames_to_column(dataset, var = "city")
head(datasetwithcity)
OnlyCities=datasetwithcity[,1]
OnlyCities
# this didn't work:
City_Labels=as.numeric(OnlyCities)
head(City_Labels)
# gets city labels, but loses points and no colour
plot(pca$x[,1], pca$x[,2],col=City_Labels,pch=City_Labels,
xlab="First PC",ylab="Second PC")
text(pca$x[,1], pca$x[,2],labels=rownames(dataset),
cex=0.7,pos=3,col="darkgrey")
There are many different ways to do this.
In base R, you could do:
plot(pca$x[,1], pca$x[,2],
xlab="First PC",ylab="Second PC", col = seq(nrow(pca$x)),
xlim = c(-2.5, 2.5), ylim = c(-2, 2))
text(pca$x[,1], pca$x[,2],cex=0.7,pos=3,col="darkgrey")
text(x = pca$x[,1], y = pca$x[,2], labels = rownames(pca$x), pos = 1)
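A base-R variant with a legend. The first attempt in the question fails because col = rownames(dataset) passes city names where colour names are expected. This sketch builds one colour per row instead. The palette choice ("Dark 3" from hcl.colors()) and the name city_cols are assumptions.

```r
v1 <- c(1, 2, 3, 4, 5, 6, 7)
v2 <- c(3, 6, 2, 5, 2, 4, 9)
v3 <- c(6, 1, 4, 2, 3, 7, 5)
dataset <- data.frame(v1, v2, v3)
row.names(dataset) <- c("New York", "Seattle", "Washington DC", "Dallas",
                        "Chicago", "Los Angeles", "Minneapolis")
pca <- prcomp(dataset, scale. = TRUE)

# col= needs colours, not city names: one colour per row, named by city
city_cols <- setNames(hcl.colors(nrow(pca$x), palette = "Dark 3"),
                      rownames(pca$x))

plot(pca$x[, 1], pca$x[, 2], col = city_cols, pch = 19,
     xlab = "First PC", ylab = "Second PC",
     xlim = c(-2.5, 2.5), ylim = c(-2, 2))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x),
     col = city_cols, pos = 3, cex = 0.7)
legend("bottomright", legend = names(city_cols), col = city_cols,
       pch = 19, cex = 0.7, bty = "n")
```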
Personally, I think the resulting aesthetics are nicer (and easier to change to suit your needs) with ggplot. The code is also a bit easier to read once you get used to the syntax.
library(ggplot2)
df <- as.data.frame(pca$x)
df$city <- rownames(df)
ggplot(df, aes(PC1, PC2, color = city)) +
geom_point(size = 3) +
geom_text(aes(label = city) , vjust = 2) +
lims(x = c(-2.5, 2.5), y = c(-2, 2)) +
theme_bw() +
theme(legend.position = "none")
Created on 2021-10-28 by the reprex package (v2.0.0)

ggplot facet_wrap with italics

I have a dataset I'm plotting, with facets by variables (in the toy dataset, densities of 2 species). I need to use the actual variable names to do 2 things: 1) italicize species names, and 2) have the 2 in n/m2 properly superscripted (or ASCII-ed, whichever is easier).
It's similar to this, but I can't seem to make it work for my case.
toy data
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10,
z = rep(c("Species1 density (n/m2)", "Species2 density (m/m2)"), each = 5),
z1 = rep(c("Area1", "Area2", "Area3", "Area4", "Area5"), each = 2))
ggplot(df) + geom_point(aes(x = x, y = y)) + facet_grid(z1 ~ z)
I get an error (variable z not found) when I try to use the code in the answer naively. How do I get around having 2 variables in the facetting?
A little modification gets the code from your link to work. I've changed the code to use tibble() (the successor to the deprecated data_frame()) so that the character vector is not converted to a factor. I've also taken the common text out of the z values so it can be added via the labeller (otherwise it would be a pain to make half the text italic).
library(tidyverse)
df <- tibble(
x = 1:10,
y = 1:10,
z = rep(c("Species1", "Species2"), each = 5),
z1 = rep(c("Area1", "Area2", "Area3", "Area4", "Area5"), each = 2)
)
ggplot(df) +
geom_point(aes(x = x, y = y)) +
facet_grid(z1 ~ z, labeller = label_bquote(cols = italic(.(z))~density~(n/m^2)))
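label_bquote() substitutes each facet value for .(z) in a plotmath expression. The sketch below does that substitution by hand with base R's bquote() for one species (the name sp is assumed) and previews it as a base-graphics title. In plotmath, (n/m^2) renders as "(n/m²)".

```r
# One facet value, assumed for illustration
sp <- "Species1"

# .(sp) is replaced by the value of sp; italic() and ^ are plotmath
lab <- bquote(italic(.(sp)) ~ density ~ (n/m^2))
print(lab)

# Base graphics also render plotmath calls, a quick way to preview the label
plot(1, main = lab)
```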
