How do you label a beeswarm plot in ggplot2? - r

I am trying to add labels to a beeswarm plot I am making with ggplot2. However, the labels point to the center line of the swarm rather than to the individual dots. Here is my code:
library(ggbeeswarm)
library(ggrepel)   # geom_label_repel() comes from ggrepel, which tidyverse does not load
library(tidyverse)
DataTest <- tibble(Category = c(LETTERS),
                   Year = runif(26, 2016, 2016),
                   Size = runif(26, min = 5, max = 10),
                   SalesGrowth = runif(26, -1, 1))
ggplot() +
  coord_flip() +
  geom_quasirandom(data = DataTest,
                   mapping = aes(factor(Year),
                                 SalesGrowth,
                                 size = Size)) +
  geom_label_repel(data = DataTest %>% filter(Category %in% c('A', 'B', 'C')),
                   mapping = aes(factor(Year),
                                 SalesGrowth,
                                 label = Category),
                   box.padding = 2) +
  scale_size_binned() +
  theme(legend.position = "none")
In the output, every label points at the middle of the swarm. I want each label to point to its respective dot.

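This can be achieved in two steps:
1. Use position_quasirandom() in geom_label_repel, so the labels are offset exactly like the points.
2. As a general rule when using ggrepel, pass the whole data to geom_label_repel and set the undesired labels to "" instead of filtering the data. Here this also matters for the positions: the quasirandom offsets are computed from all points in a group, so a filtered data set would get different offsets than the points.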
library(ggplot2)
library(ggbeeswarm)
library(ggrepel)
set.seed(42)   # set before generating the data so the example is reproducible
DataTest <- data.frame(Category = LETTERS,
                       Year = runif(26, 2016, 2016),
                       Size = runif(26, min = 5, max = 10),
                       SalesGrowth = runif(26, -1, 1))
ggplot() +
  coord_flip() +
  geom_quasirandom(data = DataTest,
                   mapping = aes(factor(Year),
                                 SalesGrowth,
                                 size = Size)) +
  # Same data and same position adjustment as the points, so each label gets
  # the same offset as its dot; unwanted labels are blanked, not filtered out
  geom_label_repel(data = DataTest,
                   mapping = aes(factor(Year),
                                 SalesGrowth,
                                 label = ifelse(Category %in% c('A', 'B', 'C'), Category, "")),
                   position = position_quasirandom(),
                   box.padding = 2, seed = 42) +
  scale_size_binned() +
  theme(legend.position = "none")
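Why filtering breaks the alignment can be illustrated without ggbeeswarm. The sketch below is a toy, simplified imitation of a quasirandom offset, not vipor's actual algorithm; the helpers van_der_corput() and toy_offsets() are hypothetical. Each point's sideways offset depends on its rank within the group and on the density of the whole group, so the same three points get different offsets when computed on their own than when computed among all 26.

```r
# Toy imitation of a quasirandom beeswarm offset (NOT vipor's exact algorithm):
# each point gets a van der Corput value chosen by the rank of its y value,
# scaled by the kernel density of the whole group at that y.
van_der_corput <- function(n, base = 2) {
  vapply(seq_len(n), function(i) {
    value <- 0
    denom <- 1
    while (i > 0) {
      denom <- denom * base
      value <- value + (i %% base) / denom
      i <- i %/% base
    }
    value
  }, numeric(1))
}

toy_offsets <- function(y, width = 0.4) {
  d <- density(y)                              # density of the whole group
  dens <- approx(d$x, d$y, xout = y)$y         # density at each point
  dens <- dens / max(dens)                     # scale to at most 1
  u <- van_der_corput(length(y))[rank(y, ties.method = "first")]
  (u - 0.5) * 2 * width * dens                 # sideways offset in [-width, width]
}

set.seed(42)
SalesGrowth <- runif(26, -1, 1)

full <- toy_offsets(SalesGrowth)            # offsets computed on all 26 points
filtered <- toy_offsets(SalesGrowth[1:3])   # offsets computed on the labelled rows only

# The first three points land in different places in the two computations
rbind(full = round(full[1:3], 3), filtered = round(filtered, 3))
```

The offsets are deterministic for a given data set, which is why reusing the full data with the same position adjustment reproduces the points' positions for the labels.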

Related

Position stacked identity data sample size as geom_text directly over a bar using geom_bar from ggplot2

In this experiment, we tracked presence or absence of bacterial infection in our subject animals. We were able to isolate which type of bacteria was present in our animals and created a plot that has Week Since Experiment Start on the X axis, and Percentage of Animals Positive for bacterial infection on the Y axis. This is a stacked identity ggplot where each geom_bar contains the different identities of the bacteria that were in the infected animals each week. Here is a sample dataset with the corresponding ggplot code and result:
library(ggplot2)
library(RColorBrewer)   # needed for brewer.pal()
DummyData <- data.frame(matrix(ncol = 5, nrow = 78))
colnames(DummyData) <- c('WeeksSinceStart','BacteriaType','PositiveOccurences','SampleSize','NewSampleSize')
DummyData$WeeksSinceStart <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,9,9,9,9,9,10,10,10,10)
DummyData$BacteriaType <- c("BactA","BactB","BactD","BactB","BactE","BactA","BactS","BactF","BactE","BactH","BactJ","BactK","BactE","BactB","BactS","BactF","BactL","BactE","BactW","BactH","BactS","BactJ","BactQ","BactN","BactW","BactA","BactD","BactE","BactA","BactC","BactD","BactK","BactL","BactE","BactD","BactA","BactS","BactK","BactB","BactE","BactF","BactH","BactN","BactE","BactL","BactZ","BactE","BactC","BactR","BactD","BactJ","BactN","BactK","BactW","BactR","BactE","BactW","BactA","BactM","BactG","BactO","BactI","BactE","BactD","BactM","BactH","BactC","BactM","BactW","BactA","BactL","BactB","BactE","BactA","BactS","BactH","BactQ","BactF")
PosOcc <- seq(from = 1, to = 2, by = 1)
DummyData$PositiveOccurences <- rep(PosOcc, times = 13)
DummyData$SampleSize <- c(78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,29,29,29,29,29,10,10,10,10)
DummyData$NewSampleSize <- c(78,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,29,NA,NA,NA,NA,10,NA,NA,NA)
numcolor <- 20
plotcolors <- colorRampPalette(brewer.pal(8, "Set3"))(numcolor)
#GGplot for Dummy Data
DummyDataPlot <- ggplot(DummyData, aes(x = WeeksSinceStart, y = PositiveOccurences/SampleSize, fill = BacteriaType)) + geom_bar(position = "stack", stat = "identity") +
geom_text(label = DummyData$NewSampleSize, nudge_y = 0.1) +
scale_y_continuous(limits = c(0,0.6), breaks = seq(0, 1, by = 0.1)) + scale_x_continuous(limits = c(0.5,11), breaks = seq(0,10, by =1)) +
labs(
x = "Weeks Since Start",
y = "Proportion Positive") +
scale_fill_manual(values = plotcolors)
The problem: I cannot seem to find a way to position the labels from geom_text directly over each bar. I would also love to add the text "n = " to the sample size value directly over each bar. Thank you for your help!
I have tried different values for position_dodge and nudge_y, with no success.
Sometimes the easiest approach is some data wrangling. One option is to create a separate data frame for your labels with one row per week, whose y value is the total height of that week's stacked bar:
library(ggplot2)
library(dplyr)
dat_label <- DummyData |>
group_by(WeeksSinceStart) |>
summarise(y = sum(PositiveOccurences / SampleSize), SampleSize = unique(SampleSize))
ggplot(DummyData, aes(x = WeeksSinceStart, y = PositiveOccurences / SampleSize, fill = BacteriaType)) +
geom_bar(position = "stack", stat = "identity") +
geom_text(data = dat_label, aes(x = WeeksSinceStart, y = y, label = SampleSize), inherit.aes = FALSE, nudge_y = .01) +
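  # The question's limits c(0, 0.6) would drop the nudged label of week 10,
  # whose stacked bar reaches exactly 0.6, so the scale is left out here:
  #scale_y_continuous(limits = c(0, 0.6), breaks = seq(0, 1, by = 0.1)) +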
scale_x_continuous(limits = c(0.5, 11), breaks = seq(0, 10, by = 1)) +
labs(
x = "Weeks Since Start",
y = "Proportion Positive"
) +
scale_fill_manual(values = plotcolors)
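The label table, including the "n = " text asked for in the question, can also be built in base R. This sketch uses a hypothetical two-week miniature of DummyData so the numbers are easy to check; with the real data the same calls apply unchanged.

```r
# Hypothetical miniature of DummyData: two weeks with three bacteria rows each
dummy <- data.frame(
  WeeksSinceStart = c(1, 1, 1, 2, 2, 2),
  BacteriaType = c("BactA", "BactB", "BactD", "BactH", "BactJ", "BactK"),
  PositiveOccurences = c(1, 2, 1, 2, 1, 2),
  SampleSize = c(78, 78, 78, 29, 29, 29)
)
dummy$prop <- dummy$PositiveOccurences / dummy$SampleSize

# Height of each stacked bar = sum of the per-row proportions in that week
bar_top <- aggregate(prop ~ WeeksSinceStart, data = dummy, FUN = sum)
# Each week has a single sample size, so max() just picks it out
week_n <- aggregate(SampleSize ~ WeeksSinceStart, data = dummy, FUN = max)

dat_label <- merge(bar_top, week_n, by = "WeeksSinceStart")
dat_label$label <- paste0("n = ", dat_label$SampleSize)   # text shown above each bar
dat_label
```

In the plot, this would be used as `geom_text(data = dat_label, aes(x = WeeksSinceStart, y = prop, label = label), inherit.aes = FALSE, vjust = -0.5)`; with dplyr, adding `label = paste0("n = ", unique(SampleSize))` to the `summarise()` call gives the same column.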

ggforce facet_zoom error with ggplot2 on R

I have a data.frame in R 4.0.2 with a continuous variable in one column and, in another, a categorical variable 'type' with two possible values (known or novel), which I use to color the data differently (with a palette from the ggsci 2.9 package). I plot a histogram (stat_bin) with ggplot2 3.3.2 and want to use the facet_zoom function of ggforce 0.3.2 to zoom in only on the data belonging to one of the types (using the zoom.data option, as in the volcano example at http://cran.univ-paris1.fr/web/packages/ggforce/vignettes/Visual_Guide.html#contextual-zoom). However, I get this error:
Error: Aesthetics must be either length 1 or the same as the data (2000): x
Reproducible example:
library(ggplot2)
library(ggsci)
library(ggforce)
testdata <- as.data.frame(sort(rnorm(1000)))
testdata$type <- "known"
testdata[501:1000,2] <- "novel"
# Working code
ggplot(testdata) +
stat_bin(aes(x=testdata[,1], fill = type), binwidth = 1, color="white") +
scale_fill_npg() + theme_light() +
facet_zoom(xlim = c(0, 4), ylim = c(0, 300), horizontal = TRUE, zoom.size = 0.3)
# Desired code
ggplot(testdata) +
stat_bin(aes(x=testdata[,1], fill = type), data = cbind(testdata, zoom = FALSE), binwidth = 1, color="white") +
stat_bin(aes(x=testdata[testdata$type == "novel",1]), data = cbind(testdata, zoom = TRUE), binwidth = 0.5) +
scale_fill_npg() + theme_light() +
facet_zoom(xlim = c(0, 4), ylim = c(0, 300), horizontal = TRUE, zoom.size = 0.3, zoom.data = zoom)
Thanks!
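The issue is that the second stat_bin receives the whole dataset as data, while the expression inside aes() returns only the 500 "novel" values, so the lengths do not match. Simply pass the subsetted data frame instead of trying to subset inside aes():
BTW: I also renamed the first column of your data to x and added set.seed(42) for reproducibility.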
library(ggplot2)
library(ggsci)
library(ggforce)
set.seed(42)
testdata <- data.frame(x = sort(rnorm(1000)))
testdata$type <- "known"
testdata[501:1000,2] <- "novel"
# Desired code
ggplot(testdata) +
stat_bin(aes(x = x, fill = type), data = cbind(testdata, zoom = FALSE), binwidth = 1, color="white") +
stat_bin(aes(x = x), data = cbind(testdata[testdata$type == "novel", ], zoom = TRUE), binwidth = 0.5) +
scale_fill_npg() + theme_light() +
facet_zoom(xlim = c(0, 4), ylim = c(0, 300), horizontal = TRUE, zoom.size = 0.3, zoom.data = zoom)
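The rule behind the error message is that each aesthetic must have either length 1 or the same length as the layer's data. The helper aes_length_ok() below is hypothetical and only mimics that check with plain vectors, comparing the question's approach (subset inside aes()) with the answer's (subset the data).

```r
# Hypothetical helper mimicking ggplot2's check: an aesthetic must be
# either length 1 or the same length as the layer's data.
aes_length_ok <- function(values, layer_data) {
  length(values) %in% c(1L, nrow(layer_data))
}

set.seed(42)
testdata <- data.frame(x = sort(rnorm(1000)))
testdata$type <- "known"
testdata[501:1000, 2] <- "novel"

# Question: the full data goes to the layer, the subset is done inside aes()
full_layer <- cbind(testdata, zoom = TRUE)
aes_values <- testdata[testdata$type == "novel", 1]
aes_length_ok(aes_values, full_layer)       # FALSE: 500 values vs 1000 rows

# Answer: subset the data itself and map the column directly
novel_layer <- cbind(testdata[testdata$type == "novel", ], zoom = TRUE)
aes_length_ok(novel_layer$x, novel_layer)   # TRUE: 500 values vs 500 rows
```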
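To show only the type == "novel" data in the zoomed panel, you can also let zoom.data do the subsetting: rows where it evaluates to NA are shown in all panels, and rows where it is FALSE only in the main panel. Try this: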
library(tidyverse)
library(ggsci)
library(ggforce)
testdata <- data.frame(values = sort(rnorm(1000)))
testdata$type <- "known"
testdata[501:1000,2] <- "novel"
# Desired code
ggplot(testdata) +
stat_bin(aes(x = values, fill = type),
binwidth = 1, color="white") +
scale_fill_npg() + theme_light() +
facet_zoom(zoom.data = ifelse(type == "novel", NA, FALSE),
xlim = c(0, 4), ylim = c(0, 300),
horizontal = TRUE)
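The zoom.data expression is evaluated once per row, and ggforce documents TRUE as "zoom panel only", FALSE as "context panel only" and NA as "all panels". The helper zoom_panels() below is hypothetical; it just spells out, for a few example rows, where the expression used above sends each row.

```r
# Hypothetical helper translating a zoom.data value into panel membership,
# following the documented rule: TRUE -> zoom only, FALSE -> context only, NA -> both
zoom_panels <- function(flag) {
  ifelse(is.na(flag), "both", ifelse(flag, "zoom only", "context only"))
}

type <- c("known", "novel", "known", "novel")
zoom_flag <- ifelse(type == "novel", NA, FALSE)   # same expression as in the answer
zoom_panels(zoom_flag)
# [1] "context only" "both"         "context only" "both"
```

So "known" rows stay in the main panel only, while "novel" rows appear in both, which leaves just the novel data in the zoom.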

How to: Two horizontal Barplots "on top of each other"

I have a data frame with two columns: Fail = (3, 3, 3, 1) and Pass = (50, 40, 50, 10).
I just want to make a bar plot of Fail and Pass:
b_f <- barplot(dat_record$Fail[1], horiz = TRUE, ylab = "FAIL", las = 2, col = "red", xlim = c(0,200))
b_p <- barplot(dat_record$Pass[2], horiz = TRUE, ylab = "PASS", las = 2, col = "green", xlim = c(0,200))
How can I put these two bar plots on top of each other in one graphic?
And a second question:
How can I do this properly with ggplot2? I tried, but it always failed with:
ggplot(dat_failpass, aes = (x = fail, fill = "red")+
geom_bar(position = "dodge")+
coord_flip()
Can someone answer these two questions or give me some tips? I'm new to this.
Thank you.
Since you want just the first value of each of the vectors Fail and Pass, this code should plot what you want:
library(ggplot2)
fail = c(3, 3, 3, 1)
pass = c(50, 40, 50, 10)
df = data.frame(value = c(fail[1], pass[1]), label = c('Fail', 'Pass'))
ggplot(df, aes(x = label, y = value)) +
geom_bar(stat = 'identity', position = 'stack') +
coord_flip() +
labs(y = 'Count') +
theme(axis.title.y = element_blank())
Let us know if this solution solved your problem.
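For the first, base R part of the question, a single barplot() call with a named vector draws both bars in one plot, one above the other. This is a sketch: dat_record is rebuilt here from the values given in the question, and, as in the answer above, the first value of each column is used.

```r
# Hypothetical stand-in for dat_record, using the values from the question
dat_record <- data.frame(Fail = c(3, 3, 3, 1), Pass = c(50, 40, 50, 10))

# One call with a named vector draws both bars in the same plot,
# one above the other, each with its own colour; barplot() returns the bar midpoints
mp <- barplot(c(FAIL = dat_record$Fail[1], PASS = dat_record$Pass[1]),
              horiz = TRUE, las = 1, col = c("red", "green"),
              xlim = c(0, 200))
```

Drawing everything in one call avoids having to overlay two separate plots, which base graphics does not do by default (each barplot() call starts a new plot).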
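Using your data in this format, here is the code for the plot (pivot_longer() reshapes the data into name/value pairs, and the four values of each column are stacked into a single bar):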
library(tidyverse)
#Data
df <- structure(list(Fail = c(3, 3, 3, 1), Pass = c(50, 40, 50, 10)),
                class = "data.frame", row.names = c(NA, -4L))
Code:
#Reshape and plot
df %>% pivot_longer(cols = everything()) %>%
#Plot
ggplot(aes(x=name,y=value))+
geom_bar(stat = 'identity',fill='gray')+
coord_flip()+
theme_bw()

ggplot Multiple facets and combined x axis

I am trying to create a plot to track results over days for multiple factors. Ideally, the x-axis would show Day, with each day number centered under that day's reps, the y-axis would show Result, and there would be one facet per Lot (1-4). I am having difficulty centering the day labels at the bottom in a way that still works when the number of reps varies.
I was using ideas shown in this post: Multi-row x-axis labels in ggplot line chart but have been unable to make any progress.
Here is some code I have been using and the plot that I have so far. The x axis is far too busy and I am trying to consolidate it.
data <- data.frame(System = rep(c("A", "B"), each = 120), Lot = rep(1:4, each = 30),
Day = rep(1:5, each = 6), Rep = rep(1:6, 40), Result = rnorm(240))
library(ggplot2)
ggplot(data, aes(x = interaction(Day, Rep, lex.order = TRUE), y = Result, color = System, group = System)) +
geom_point() +
geom_line() +
theme(legend.position = "bottom") +
facet_wrap(~Lot, ncol = 1) +
geom_vline(xintercept = (which(data$Rep == 1 & data$Day != 1)), color = "gray60")
I'm not 100% sure this is exactly what you are after, but this will center the day on the x-axis. It encodes each Day/Rep pair as a single number (Day.Rep), which assumes fewer than 10 reps per day.
library(dplyr)
library(tidyr)
library(ggplot2)
df <- data.frame(System = rep(c("A", "B"), each = 120), Lot = rep(1:4, each = 30),
                 Day = rep(1:5, each = 6), Rep = rep(1:6, 40), Result = rnorm(240))
# Encode each Day/Rep pair as one number, e.g. Day 2, Rep 3 -> 2.3
df <- df %>%
  unite(Day_Rep, Day, Rep, sep = ".", remove = FALSE) %>%
  mutate(Day_Rep = as.numeric(Day_Rep))
# Midpoint of each day's reps, used as that day's axis break
day_mid <- tapply(df$Day_Rep, df$Day, function(x) mean(range(x)))
ggplot(df, aes(x = Day_Rep, y = Result, color = System, group = System)) +
  geom_point() +
  geom_line() +
  theme(legend.position = "bottom") +
  facet_wrap(~Lot, ncol = 1) +
  scale_x_continuous(breaks = as.vector(day_mid), labels = names(day_mid)) +
  # Separator lines between consecutive days
  geom_vline(xintercept = setdiff(unique(df$Day), 1))
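If the number of reps varies between days (or exceeds 9, which the Day.Rep encoding cannot represent), a sketch that places points by a running index and computes one break per day may be more robust. It assumes every System/Lot series has the same reps on a given day; the column name x and the dropped reps on day 2 are hypothetical, only there to create an uneven day.

```r
set.seed(1)
df <- data.frame(System = rep(c("A", "B"), each = 120), Lot = rep(1:4, each = 30),
                 Day = rep(1:5, each = 6), Rep = rep(1:6, 40), Result = rnorm(240))
# Hypothetical: day 2 only has 4 reps, to mimic an uneven number of reps
df <- df[!(df$Day == 2 & df$Rep > 4), ]

# Running x position within each System/Lot series, in Day/Rep order
df <- df[order(df$System, df$Lot, df$Day, df$Rep), ]
df$x <- ave(df$Rep, df$System, df$Lot, FUN = seq_along)

# Center of each day's run of reps -> axis break for that day
day_mid <- tapply(df$x, df$Day, function(v) mean(range(v)))
# Halfway between the last rep of one day and the first rep of the next -> separator
day_sep <- tapply(df$x, df$Day, min)[-1] - 0.5
```

In the plot, map `x = x` and use `scale_x_continuous(breaks = as.vector(day_mid), labels = names(day_mid))` together with `geom_vline(xintercept = as.vector(day_sep))`.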

showing different units in each free_y of facet_grid

I have plotted two facets one on top of the other with two different ys (a percentage and a cost) and the same x (Years). I took most of the ideas from this post and some variations of the same.
I'd like to show the labels of the y axis as percentages for the rate and as £ for the costs, but I have been unable to change each y label format independently.
Below a reproducible example using facet_grid (I managed to create a similar thing with facet_wrap but I get stuck with the same problem).
I considered using grid.arrange() from the gridExtra package, but it seemed that would bring other issues with the legend.
library(tidyr)
library(dplyr)
library(ggplot2)
library(scales)
set.seed(12345)
# Strip labels for the two facets, keyed by the value of type_of_var
my_labels <- as_labeller(c(rate = "percentage", cost = "cost in pounds"))
df <- data.frame(
  rate = runif(10, 0, 1),
  cost = rnorm(10, 100, 40),
  years = seq(from = 2001, to = 2010)
)
df %>%
  gather(type_of_var,
         value,
         rate:cost) ->
  df2
df2 %>%
  ggplot(aes(x = years,
             y = value,
             ymin = 0,
             ymax = .1)) +
  facet_grid(type_of_var ~ .,
             scales = 'free_y',
             labeller = my_labels) +
  labs(x = "Year",
       y = "") +
  # data = function(d) ... keeps only the rows a layer should draw
  # (current ggplot2 no longer supports the plyr-style subset = .() argument)
  geom_point(data = function(d) filter(d, type_of_var == "rate")) +
  geom_line(data = function(d) filter(d, type_of_var == "rate"),
            colour = "grey") +
  ## the following two lines don't work
  # scale_y_continuous(subset = . (type_of_var == "rate"),
  #                    labels = percent) +
  geom_bar(data = function(d) filter(d, type_of_var == "cost"),
           stat = "identity") +
  theme_bw() +
  theme(strip.text.y = element_text(size = 15,
                                    colour = "black"),
        plot.title = element_text(lineheight = 0.8,
                                  face = "bold")) +
  scale_x_continuous(breaks = seq(2001, 2010, 1)) +
  labs(title = "free_y y axis labels")
Thanks
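As a fragile workaround, you could use a single labelling function that picks the format from the magnitude of each panel's breaks (with free scales, the function is called separately for each panel) and add it with scale_y_continuous(labels = label_fun):
label_fun <- function (x) {
  if (max(x, na.rm = TRUE) > 1) dollar(x, prefix = "£") else percent(x)
}
(This assumes you only deal with large amounts of money and small percentages.)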
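A dependency-free variant of the same idea is sketched below. It swaps scales::dollar()/percent() for base formatC() and paste0(), so the formatting may differ slightly from the answer's; the name label_fun_base() is hypothetical, and it keeps the same fragile assumption that breaks above 1 are money and the rest are rates.

```r
# Base-R sketch of a magnitude-switching axis labeller (hypothetical name).
# Fragile assumption, as in the answer: breaks above 1 are pounds, the rest are rates.
label_fun_base <- function(x) {
  if (max(x, na.rm = TRUE) > 1) {
    out <- paste0("\u00a3", formatC(x, format = "f", digits = 0, big.mark = ","))
  } else {
    out <- paste0(round(100 * x), "%")
  }
  out[is.na(x)] <- NA   # keep missing breaks missing
  out
}

label_fun_base(c(0, 50, 100, 150))   # breaks of the cost panel
label_fun_base(c(0, 0.25, 0.5))      # breaks of the rate panel
```

It would be used the same way, e.g. `scale_y_continuous(labels = label_fun_base)` added to the question's plot.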
