Visualizing Vargha and Delaney's A with ggplot

I would like to visualize Vargha & Delaney's A in ggplot for educational purposes.
A is an effect size for comparing ordinal data from two groups; it depends on how each data point compares (upward, downward, or sideways) with every data point of the other group.
For this, I would like to show all upward, downward, and equal comparisons between data points in different colors. For an example of what I'm looking for, check out this rough scribble.
For reproducibility's sake, here is some data to try it with:
library(tidyverse)
data_VD <- tibble(
A = c(1, 2, 3, 6),
B = c(1, 3, 7, 9)
)
For how A is calculated, see https://journals.sagepub.com/doi/10.3102/10769986025002101, though it shouldn't be necessary for creating the plot.
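For readers who also want the number the plot illustrates, here is a minimal base R sketch of A. It assumes the usual definition A = P(X > Y) + 0.5 * P(X = Y), with X drawn from the first group and Y from the second; the function name vd_a is hypothetical.

```r
# Sketch of Vargha & Delaney's A: P(X > Y) + 0.5 * P(X == Y),
# estimated over all pairwise comparisons (x is the first group)
vd_a <- function(x, y) {
  diffs <- outer(x, y, "-")          # matrix of every x[i] - y[j]
  greater <- sum(diffs > 0)          # x above y ("downward" from x to y)
  ties <- sum(diffs == 0)            # sideways comparisons
  (greater + 0.5 * ties) / length(diffs)
}

data_VD <- data.frame(A = c(1, 2, 3, 6), B = c(1, 3, 7, 9))
vd_a(data_VD$A, data_VD$B)  # 0.25
vd_a(data_VD$B, data_VD$A)  # 0.75
```

The two directions always sum to 1, because every pair is either higher, lower, or tied.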

You could do:
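library(tidyverse)
# One row per (A, B) pair; change is -1 (B lower), 0 (tie) or 1 (B higher)
long_dat <- data_VD %>%
{expand.grid(A = .$A, B = .$B)} %>%
mutate(change = factor(sign(B - A)))
ggplot(pivot_longer(data_VD, everything()), aes(x = name, y = value)) +
# one segment per comparison (use size = 1.5 instead of linewidth on ggplot2 < 3.4.0)
geom_segment(data = long_dat, linewidth = 1.5,
aes(x = 'A', xend = 'B', y = A, yend = B, color = change)) +
geom_point(size = 4) +
# named values keep red = down, yellow = tie, green = up even if a level is absent
scale_color_manual(values = c('-1' = '#ed1e26', '0' = '#fff205', '1' = '#26b24f')) +
theme_classic(base_size = 20) +
scale_y_continuous(breaks = 1:10) +
labs(x = '', y = '') +
theme(legend.position = 'none')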
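To check the plot against the numbers, the segments of each color can be tallied in base R. This rebuilds the pairs with expand.grid() the same way long_dat does; the labels "down", "tie" and "up" are illustrative names for the -1/0/1 levels that the answer draws in red, yellow and green.

```r
data_VD <- data.frame(A = c(1, 2, 3, 6), B = c(1, 3, 7, 9))
# Every (A, B) pair, as in long_dat
pairs <- expand.grid(A = data_VD$A, B = data_VD$B)
# -1: B lower than A, 0: tie, 1: B higher than A
change <- factor(sign(pairs$B - pairs$A), levels = c(-1, 0, 1),
                 labels = c("down", "tie", "up"))
counts <- table(change)
counts  # down 3, tie 2, up 11: the number of segments of each color
```

These three counts are all A needs: for B over A it is the share of upward segments plus half the share of ties, (11 + 0.5 * 2) / 16 = 0.75.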

Related

How do I add triangles to a ggplot to point out dosing time points?

I'm working on my research project and I want to add something to my ggplot. I have concentration-time graphs and I want to show when each dose is given, with a triangle just above the graph, so that the reader can see the dosing times. An example of what I mean is added underneath.
The data is sensitive, so I can't give you that, but the idea is simple. It's concentration-time data. My code for the actual graph is:
library(ggplot2)
library(viridis) # provides scale_color_viridis()
ggplot(data = df, aes(x = "Time", y = "Concentration", col = "Species")) +
ylab("Concentration (mg/mL)") + xlab("Time (h)") +
geom_point() + scale_color_viridis(discrete = T, option = "F", begin = 0, end = 0.8) +
theme_bw() + scale_y_log10()
I know that there is an annotate() function, but I don't think it has an option for adding triangles to the graph. I haven't tried anything else yet, because I don't know what other options there are. I hope someone can help me with this problem.
Suppose your administration times are at 1, 6, 12 and 18 hours. Then you could do:
admin_times <- c(1, 6, 12, 18)
and
library(ggplot2)
ggplot(data = df, aes(x = Time, y = Concentration, col = Species)) +
ylab("Concentration (mg/mL)") +
scale_x_continuous("Time (h)", breaks = 0:4 * 6, limits = c(1, 24)) +
geom_point() +
scale_color_viridis_d(option = "F", begin = 0, end = 0.8) +
theme_bw() +
scale_y_log10() +
# shape 25 is a filled downward-pointing triangle, drawn above the data at each dose
annotate('point', x = admin_times, y = max(df$Concentration)*2,
shape = 25, size = 6, color = 'gray80', fill = 'gray80')
Note that you don't put quotation marks around column names inside aes() when creating a ggplot: a quoted string is treated as a constant value, not as a reference to the column.
Data used:
df <- data.frame(Time = rep(1:24, 2),
Concentration = dexp(c(1:24, 1:24),
rep(c(0.1, 0.15), each = 24)),
Species = rep(c('A', 'B'), each = 24))

Position stacked identity data sample size as geom_text directly over a bar using geom_bar from ggplot2

In this experiment, we tracked presence or absence of bacterial infection in our subject animals. We were able to isolate which type of bacteria was present in our animals and created a plot that has Week Since Experiment Start on the X axis, and Percentage of Animals Positive for bacterial infection on the Y axis. This is a stacked identity ggplot where each geom_bar contains the different identities of the bacteria that were in the infected animals each week. Here is a sample dataset with the corresponding ggplot code and result:
library(ggplot2)
library(RColorBrewer) # provides brewer.pal()
DummyData <- data.frame(matrix(ncol = 5, nrow = 78))
colnames(DummyData) <- c('WeeksSinceStart','BacteriaType','PositiveOccurences','SampleSize','NewSampleSize')
DummyData$WeeksSinceStart <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,9,9,9,9,9,10,10,10,10)
DummyData$BacteriaType <- c("BactA","BactB","BactD","BactB","BactE","BactA","BactS","BactF","BactE","BactH","BactJ","BactK","BactE","BactB","BactS","BactF","BactL","BactE","BactW","BactH","BactS","BactJ","BactQ","BactN","BactW","BactA","BactD","BactE","BactA","BactC","BactD","BactK","BactL","BactE","BactD","BactA","BactS","BactK","BactB","BactE","BactF","BactH","BactN","BactE","BactL","BactZ","BactE","BactC","BactR","BactD","BactJ","BactN","BactK","BactW","BactR","BactE","BactW","BactA","BactM","BactG","BactO","BactI","BactE","BactD","BactM","BactH","BactC","BactM","BactW","BactA","BactL","BactB","BactE","BactA","BactS","BactH","BactQ","BactF")
PosOcc <- seq(from = 1, to = 2, by = 1)
DummyData$PositiveOccurences <- rep(PosOcc, times = 13)
DummyData$SampleSize <- c(78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,29,29,29,29,29,10,10,10,10)
DummyData$NewSampleSize <- c(78,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,29,NA,NA,NA,NA,10,NA,NA,NA)
numcolor <- 20
plotcolors <- colorRampPalette(brewer.pal(8, "Set3"))(numcolor)
#GGplot for Dummy Data
DummyDataPlot <- ggplot(DummyData, aes(x = WeeksSinceStart, y = PositiveOccurences/SampleSize, fill = BacteriaType)) + geom_bar(position = "stack", stat = "identity") +
geom_text(label = DummyData$NewSampleSize, nudge_y = 0.1) +
scale_y_continuous(limits = c(0,0.6), breaks = seq(0, 1, by = 0.1)) + scale_x_continuous(limits = c(0.5,11), breaks = seq(0,10, by =1)) +
labs(
x = "Weeks Since Start",
y = "Proportion Positive") +
scale_fill_manual(values = plotcolors)
The problem: I cannot seem to find a way to position the labels from geom_text directly over each bar. I would also love to add the text "n = " to the sample size value directly over each bar. Thank you for your help!
I have tried different values for the position_dodge and nudge_y arguments with no success.
Sometimes the easiest approach is to do some data wrangling. Here, one option is to create a separate data frame for your labels, with one row per week whose y value is the total height of that week's stacked bar:
library(ggplot2)
library(dplyr)
dat_label <- DummyData |>
group_by(WeeksSinceStart) |>
summarise(y = sum(PositiveOccurences / SampleSize), SampleSize = unique(SampleSize))
ggplot(DummyData, aes(x = WeeksSinceStart, y = PositiveOccurences / SampleSize, fill = BacteriaType)) +
geom_bar(position = "stack", stat = "identity") +
geom_text(data = dat_label, aes(x = WeeksSinceStart, y = y, label = SampleSize), inherit.aes = FALSE, nudge_y = .01) +
#scale_y_continuous(limits = c(0, 0.6), breaks = seq(0, 1, by = 0.1)) +
scale_x_continuous(limits = c(0.5, 11), breaks = seq(0, 10, by = 1)) +
labs(
x = "Weeks Since Start",
y = "Proportion Positive"
) +
scale_fill_manual(values = plotcolors)
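The question also asked for an "n = " prefix, which the label table above does not add. A base R sketch of the same label table with that prefix, built with paste0(); the first few lines rebuild the question's WeeksSinceStart, PositiveOccurences and SampleSize columns in compact form (PositiveOccurences alternates 1, 2 as the recycled rep(PosOcc, times = 13) does), and the names prop and label are my own.

```r
# Compact rebuild of the relevant DummyData columns from the question
DummyData <- data.frame(
  WeeksSinceStart = rep(1:10, times = c(9, 9, 8, 9, 10, 8, 9, 7, 5, 4)),
  PositiveOccurences = rep(c(1, 2), length.out = 78),
  SampleSize = rep(c(78, 29, 10), times = c(69, 5, 4))
)
DummyData$prop <- DummyData$PositiveOccurences / DummyData$SampleSize

# Height of each stacked bar = sum of its segments
tops <- aggregate(prop ~ WeeksSinceStart, data = DummyData, FUN = sum)
# Sample size is constant within a week, so unique() gives one value
sizes <- aggregate(SampleSize ~ WeeksSinceStart, data = DummyData, FUN = unique)
dat_label <- merge(tops, sizes, by = "WeeksSinceStart")
dat_label$label <- paste0("n = ", dat_label$SampleSize)
dat_label
```

In the plot, this table could replace the answer's dat_label via geom_text(data = dat_label, aes(x = WeeksSinceStart, y = prop, label = label), inherit.aes = FALSE, vjust = -0.5), where a negative vjust pushes the text just above the bar. In the dplyr version, adding label = paste0("n = ", SampleSize) to summarise() should give the same labels.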

2D summary plot with counts as labels

I have measurements of a quantity (value) at specific points (lon and lat), like the example data below:
library(ggplot2)
set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15),
lat = runif(1000, 40, 60),
value = rnorm(1000))
I want to make a 2D summary (e.g. mean) of the measured values with color in space and on top of that I want to show the counts as labels.
I can make the summary plot and the label plot separately:
## Left plot
ggplot(dat) +
aes(x = lon, y = lat, z = value) +
stat_summary_hex(bins = 5, fun = "mean", geom = "hex")
## Right plot
ggplot(dat) +
aes(x = lon, y = lat, z = value) +
stat_binhex(aes(label = ..count..), bins = 5, geom = "text")
But when I combine both, I lose the summary:
ggplot(dat) +
aes(x = lon, y = lat, z = value) +
stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
stat_binhex(aes(label = ..count..), bins = 5, geom = "text")
I can achieve the opposite, counts as color and summary as labels:
ggplot(dat, aes(lon, lat, z = value)) +
geom_hex(bins = 5) +
stat_summary_hex(aes(label=..value..), bins = 5,
fun = function(x) round(mean(x), 3),
geom = "text")
While writing the question, which took some hours of testing, I found a solution: adding fill = NULL, or fill = mean(value), to the aes() of the text layer gives me what I want. Below are the code and the resulting plots; the only difference is the legend title.
But it feels very hacky, so I would appreciate a better solution.
ggplot(dat) +
aes(x = lon, y = lat, z = value) +
stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
stat_binhex(aes(label = ..count.., fill = NULL), bins = 5, geom = "text") +
theme_bw()
ggplot(dat) +
aes(x = lon, y = lat, z = value) +
stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
stat_binhex(aes(label = ..count.., fill = mean(value)), bins = 5, geom = "text") +
theme_bw()
I propose a completely different approach to this problem, but first it needs a bit of clarification. You write "I have measurements of a quantity (value) at specific points (lon and lat)", but you do not specify these points exactly. Your generated data contain 1000 distinct lon values and as many distinct lat values.
Anyway, see for yourself.
library(tidyverse)
set.seed(1)
dat <-
tibble(
lon = runif(1000, 1, 15),
lat = runif(1000, 40, 60),
value = rnorm(1000)
)
dat %>% distinct(lon) %>% nrow() #1000
dat %>% distinct(lat) %>% nrow() #1000
My guess is that for real data you have a much smaller set of values for lon and lat.
Let me round both coordinates to a grid with a step of 2.
grid = 2
datg = dat %>% mutate(
lon = round(lon/grid)*grid,
lat = round(lat/grid)*grid,
) %>%
group_by(lon, lat) %>%
summarise(
mean = mean(value),
label = n()
)
datg
After rounding, the data are grouped by these two variables, and then the statistics you are interested in (mean and number of observations) are calculated.
Also note that these statistics are computed at the intersections of the rounded lon and lat values, so we get a square grid. This is not the case in your solution: the hexagonal bins do not form a square grid, and the counts refer to whole hexagons rather than to specific points.
So let's make a graph.
datg %>% ggplot(aes(lon,lat,z=mean)) +
geom_contour_filled(binwidth = 0.25) +
geom_text(aes(label = label)) +
theme_bw()
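If you would rather not depend on dplyr for the rounding-and-grouping step, the same snap-to-grid summary can be sketched in base R with aggregate(). The column names lon_g, lat_g, value_mean and value_n are my own choices, not the answer's.

```r
set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15),
                  lat = runif(1000, 40, 60),
                  value = rnorm(1000))
grid <- 2
# Snap every measurement to the nearest grid node
dat$lon_g <- round(dat$lon / grid) * grid
dat$lat_g <- round(dat$lat / grid) * grid
# Mean and count per grid node, then join the two summaries
means <- aggregate(value ~ lon_g + lat_g, data = dat, FUN = mean)
counts <- aggregate(value ~ lon_g + lat_g, data = dat, FUN = length)
datg <- merge(means, counts, by = c("lon_g", "lat_g"),
              suffixes = c("_mean", "_n"))
head(datg)
```

The result can be plotted the same way as above, e.g. with aes(lon_g, lat_g, z = value_mean) and geom_text(aes(label = value_n)).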
Nothing stops you from making the grid coarser, say with a step of 4.
grid = 4
datg = dat %>% mutate(
lon = round(lon/grid)*grid,
lat = round(lat/grid)*grid,
) %>%
group_by(lon, lat) %>%
summarise(
mean = mean(value),
label = n()
)
datg %>% ggplot(aes(lon,lat,z=mean)) +
geom_contour_filled(binwidth = 0.25) +
geom_text(aes(label = label)) +
theme_bw()
With this approach we can easily add more information to the labels at each grid point, e.g. the mean value. This time we use grid = 1.5.
grid = 1.5
datg = dat %>% mutate(
lon = round(lon/grid)*grid,
lat = round(lat/grid)*grid,
) %>%
group_by(lon, lat) %>%
summarise(
mean = mean(value),
label = n(),
lab2 = paste0("(", round(mean, 2), ")")
)
datg %>% ggplot(aes(lon,lat,z=mean)) +
geom_contour_filled(binwidth = 0.25) +
geom_text(aes(label = label)) +
geom_text(aes(label = lab2), nudge_y = -.5, size = 3) +
theme_bw()
I hope this solution fits your needs better than the stat_binhex-based one.
The problem here is that both layers share the same fill scale: stat_binhex() maps fill to the count by default, even with geom = "text", so the scale is trained on both the counts and the means.
Because the ranges are so different (0-40 vs. -1.5 to 0.5), the larger range makes the values of the smaller range appear in (almost) the same color.
This is why displaying the count as color works, but the opposite doesn't seem to.
As an illustration, if you rescale the mean calculation, the color variations become visible:
rescaled_mean <- function(x) mean(x)*40
ggplot(dat) +
aes(x = lon, y = lat, z = value) +
stat_summary_hex(bins = 5, fun = "rescaled_mean", geom = "hex")+
stat_binhex(aes(label = ..count..), bins = 5, geom = "text") +
theme_bw()
To be fair, I find this very strange behaviour. I like your solution, though: I really don't find adding fill = NULL very hacky. On the contrary, I find it very elegant. Here is a more hacky approach with basically the same result, but one more line. It uses ggnewscale.
library(ggplot2)
set.seed(1)
dat <- data.frame(lon = runif(1000, 1, 15),
lat = runif(1000, 40, 60),
value = rnorm(1000))
ggplot(dat) +
aes(x = lon, y = lat,z = value) +
stat_summary_hex(bins = 5, fun = "mean", geom = "hex") +
ggnewscale::new_scale_fill() +
stat_binhex(aes(label = ..count..), bins = 5, geom = "text")
Created on 2022-02-17 by the reprex package (v2.0.1)

gganimate: two layers with different geometries and timepoints

The problem is similar to this question, but here the two layers use different geometries, geom_tile and geom_point. The idea is for tiles to appear at different locations only in frames 2, 5 and 8, and for the point to move along the diagonal in every frame.
When trying to run the following example, I get the error:
Error: time data must be the same class in all layers
Example
require(data.table)
require(ggplot2)
require(gganimate)
# 3 tiles along x = 10-y; present at time points 2, 5, 8
dtP1 = data.table(x = c(1, 5, 9),
y = c(9, 5, 1),
t = c(2, 5, 8))
# 9 points along x=y; present at every time point
dtP2 = data.table(x = 1:9,
y = 1:9,
t = 1:9)
p = ggplot() +
geom_tile(data = dtP1,
aes(x = x,
y = y),
color = "#000000") +
geom_point(data = dtP2,
aes(x = x,
y = y),
color = "#FF0000") +
gganimate::transition_time(t) +
gganimate::ease_aes('linear')
pAnim = gganimate::animate(p,
renderer = av_renderer("~/test.mp4"),
fps = 1,
nframes = 9,
height = 400, width = 400)
Does the following work for you?
library(dplyr)
p <- rbind(dtP1 %>% mutate(group = "group1"),
dtP2 %>% mutate(group = "group2")) %>%
tidyr::complete(t, group) %>%
ggplot(aes(x = x, y = y)) +
geom_tile(data = . %>% filter(group == "group1"),
color = "black") +
geom_point(data = . %>% filter(group == "group2"),
color = "red") +
ggtitle("{frame_time}") + # added this to show the frame explicitly; optional
transition_time(t) +
ease_aes('linear')
animate(p, nframes = 9, fps = 1)
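A likely cause of the original error is that t is a double vector in dtP1 (c(2, 5, 8)) but an integer vector in dtP2 (1:9), while the message says the time data must be the same class in all layers. A minimal base R check under that assumption, using data.frame in place of data.table (the column classes are the same):

```r
dtP1 <- data.frame(x = c(1, 5, 9), y = c(9, 5, 1), t = c(2, 5, 8))
dtP2 <- data.frame(x = 1:9, y = 1:9, t = 1:9)
class(dtP1$t)  # "numeric": c(2, 5, 8) is a double vector
class(dtP2$t)  # "integer": 1:9 is an integer sequence
# Converting one side gives the time column the same class in both layers
dtP2$t <- as.numeric(dtP2$t)
identical(class(dtP1$t), class(dtP2$t))  # TRUE
```

With data.table, dtP2[, t := as.numeric(t)] does the same conversion, after which the question's original two-layer plot may run as written; the rbind() in the answer presumably helps partly because it coerces t to a single class.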

ggplot Multiple facets and combined x axis

I am trying to create a plot to track results over days for multiple factors. Ideally, I would like the x-axis to be Day, with each day number centered under the reps for that day, the y-axis to be Result, and the facets to be Lot (1-4). I am having difficulty centering the day labels in a repeatable way, as the number of reps may vary.
I was using ideas shown in this post: Multi-row x-axis labels in ggplot line chart but have been unable to make any progress.
Here is some code I have been using and the plot that I have so far. The x axis is far too busy and I am trying to consolidate it.
data <- data.frame(System = rep(c("A", "B"), each = 120), Lot = rep(1:4, each = 30),
Day = rep(1:5, each = 6), Rep = rep(1:6, 40), Result = rnorm(240))
library(ggplot2)
ggplot(data, aes(x = interaction(Day, Rep, lex.order = TRUE), y = Result, color = System, group = System)) +
geom_point() +
geom_line() +
theme(legend.position = "bottom") +
facet_wrap(~Lot, ncol = 1) +
geom_vline(xintercept = (which(data$Rep == 1 & data$Day != 1)), color = "gray60")
I'm not 100% sure this is exactly what you are after, but this will center the day on the x-axis.
library(dplyr)
library(tidyr)
library(ggplot2)
df <- data.frame(System = rep(c("A", "B"), each = 120), Lot = rep(1:4, each = 30),
Day = rep(1:5, each = 6), Rep = rep(1:6, 40), Result = rnorm(240))
df <- df %>%
unite(Day_Rep, Day, Rep, sep = ".", remove = F) %>%
mutate(Day_Rep = as.numeric(Day_Rep))
ggplot(df, aes(x = Day_Rep, y = Result, color = System, group = System)) +
geom_point() +
geom_line() +
theme(legend.position = "bottom") +
facet_wrap(~Lot, ncol = 1) +
scale_x_continuous(labels = df$Day, breaks = df$Day + 0.5)+
geom_vline(xintercept = setdiff(unique(df$Day), 1))
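With Day_Rep values from 1.1 to 1.6, the center of a day's points is 1.35 rather than 1.5, and with 10 or more reps "1.10" would become the same number as "1.1". A minimal sketch of an alternative mapping that spreads each day's reps evenly inside (Day, Day + 1), so the points are always centered on Day + 0.5 whatever the number of reps; the helper day_rep_x() is hypothetical, and if the number of reps differs between panels it assumes you compute it per panel.

```r
# Hypothetical helper: rep k of n on a given day sits at Day + k / (n + 1),
# so the reps of every day are centered on Day + 0.5
day_rep_x <- function(day, rep_no) {
  n_rep <- ave(rep_no, day, FUN = max)  # number of reps on that day
  day + rep_no / (n_rep + 1)
}

# Example with a varying number of reps: 3 on day 1, 10 on day 2
day <- c(rep(1, 3), rep(2, 10))
rep_no <- c(1:3, 1:10)
x <- day_rep_x(day, rep_no)
tapply(x, day, mean)  # 1.5 and 2.5
```

In the plot you could then use aes(x = day_rep_x(Day, Rep), ...) with scale_x_continuous(breaks = unique(df$Day) + 0.5, labels = unique(df$Day)), and keep geom_vline(xintercept = setdiff(unique(df$Day), 1)), since each day's points stay strictly between Day and Day + 1.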
