ggrepel: using position_dodge in combination with geom_label_repel

I'm trying to label the outliers in a geom_boxplot using ggrepel::geom_label_repel. It works nicely when there's only one grouping variable, but when I try it with multiple grouping variables I run into a problem. The position argument in ggrepel doesn't seem to work consistently; see this example:
library(tidyverse)
library(ggrepel)
set.seed(1337)
df <- tibble(x = rnorm(500),
g1 = factor(sample(c('A','B'), 500, replace = TRUE)),
g2 = factor(sample(c('A','B'), 500, replace = TRUE)),
rownames = 1:500)
is_outlier <- function(x) {
return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}
df_outliers <- df %>% group_by(g1, g2) %>% mutate(outlier=is_outlier(x))
ggplot(df_outliers, aes(x=g1, y=x, fill=g2)) +
geom_boxplot(width=0.3, position = position_dodge(0.5)) +
ggrepel::geom_label_repel(data=. %>% filter(outlier),
aes(label=rownames), position = position_dodge(0.8))
Is there a way to make the labels point to the accompanying dots using ggrepel?

You can try this:
ggplot(df_outliers,
aes(x=g1, y=x, fill=g2, label=rownames)) +
geom_boxplot(width = 0.3, position = position_dodge(0.5)) +
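geom_label_repel(data = . %>%
filter(outlier) %>%
# labels must be character so that "" is a valid filler in complete()
mutate(rownames = as.character(rownames)) %>%
group_by(g1) %>%
complete(g2, fill = list(x = 0, rownames = "")),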
position = position_dodge(0.5),
box.padding = 1,
min.segment.length = 0,
show.legend = FALSE)
Explanations:
The data source for geom_label_repel() follows aosmith's suggestion to add the missing B-A combination, filling in 0 for x (any number would do, as long as it is not the default NA) and "" for rownames (ggrepel won't plot empty labels, but it still takes them into account when dodging).
box.padding is set to 1 (increased from the default 0.25) to push the labels further away, so that the line segments are more visible.
min.segment.length is set to 0 (decreased from the default 0.5) to force line segments to be plotted, no matter how short they are.
(show.legend = FALSE is optional. I just don't like seeing the letter "a" show up in the legend.)
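To see what the complete() step does to the label data, here is a minimal sketch on a made-up outlier table. The values and the object names outliers and padded are hypothetical, and the label column is assumed to be character so that "" is a valid filler.

```r
library(dplyr)
library(tidyr)

# Toy outlier table (made-up values): g1 == "B" has no outlier for g2 == "A"
outliers <- tibble(
  g1 = factor(c("A", "A", "B"), levels = c("A", "B")),
  g2 = factor(c("A", "B", "B"), levels = c("A", "B")),
  x = c(3.1, -2.9, 3.4),
  rownames = c("12", "87", "301")
)

# Within each g1, add the g2 levels that are missing,
# using x = 0 and an empty label for the filler rows
padded <- outliers %>%
  group_by(g1) %>%
  complete(g2, fill = list(x = 0, rownames = "")) %>%
  ungroup()

print(padded)
```

padded gains one filler row for g1 = B, g2 = A, so every x position carries both fill groups and position_dodge() has the same number of slots to work with as the boxplot layer.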

Related

Arrows showing direction of path on every nth data point in R ggplot2 plot

I want to use ggplot2 to create a path with arrows in a plot. However, I have a lot of data points and so I only want the arrow on every nth datapoint. I adapted this answer for every nth label to put an observation point every nth data point, but if I try to use this with path I get straight lines between these points. I just want the arrow head.
The MWE below shows my attempt to get the two paths working together (I do want the full path as a line), and what worked for points (that I want to be directional arrows). In my real data set the arrows will point in different directions (so I can't just use a static arrow head as the observation symbol). I am also working with other filtering within the plots, and so creating new data frames that only keep some points is not a convenient solution.
MWE
library(tidyverse)
library(tidyr)
library(dplyr)
x <- seq(from = -100, to = 100, by = 0.01)
y <- x^3 - 2 * x + x
df <- data.frame(x, y)
df$t <- seq_len(nrow(df))
ggplot(data = df, aes(x = x, y = y)) +
geom_path(size = 0.1, aes(colour = t)) +
geom_path(aes(colour = t),data = . %>% filter(row_number() %% 2000 == 0), arrow = arrow(type = 'open', angle = 30, length = unit(0.1, "inches")))
ggplot(data = df, aes(x = x, y = y)) +
geom_path(size = 0.1, aes(colour = t)) +
geom_point(aes(colour = t),data = . %>% filter(row_number() %% 2000 == 0))
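You can try adding a grouping variable. geom_path() draws one connected path per group, so putting the filtered points into groups of two (gr) produces separate arrows instead of one continuous line through all of them.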
ggplot(data = df, aes(x = x, y = y)) +
geom_path(aes(colour = t, group =factor(gr)),data = . %>% filter(row_number() %% 2000 == 0) %>%
mutate(gr = gl(n()/2, 2)),
arrow = arrow(type = 'open', angle = 30, length = unit(0.1, "inches")))
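A different technique, offered only as a sketch: keep the full path and draw a very short geom_segment() from every 2000th point to its immediate successor, so that essentially only the arrowhead is visible and it follows the local direction of the path. The helper columns xend and yend are names introduced here for illustration, and the filtering stays inside the layer so no extra data frame is created.

```r
library(dplyr)
library(ggplot2)

x <- seq(from = -100, to = 100, by = 0.01)
df <- data.frame(x = x, y = x^3 - 2 * x + x)
df$t <- seq_len(nrow(df))

p <- ggplot(df, aes(x = x, y = y)) +
  # full path as a continuous line
  geom_path(aes(colour = t)) +
  # one tiny segment per 2000th point, pointing at the next data point
  geom_segment(
    data = . %>%
      mutate(xend = lead(x), yend = lead(y)) %>%
      filter(row_number() %% 2000 == 0, !is.na(xend)),
    aes(xend = xend, yend = yend, colour = t),
    arrow = arrow(type = "open", angle = 30, length = unit(0.1, "inches"))
  )

p
```

Because each segment runs from a point to its neighbour, the arrowhead orientation comes from the data rather than from a fixed symbol.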

Using geom_text, nudge half the length of a label in ggplot

I wish to add labels to points on a ggplot. The labels should be below each point. There may be multiple labels per point. If so, they should be left-justified. Each label may be a different length.
For each point, the shortest name should be centered below each point. Thus, I wish to nudge_x = half the length of the shortest name for each point.
How do I determine the length of a label so as to nudge half its value?
Example
library("tidyverse")
df <- tibble(
x = c("one", "two"),
y = c(2.5, 1.7),
company = c("Normal", "Short\nA_bit_longer")
)
company_nudge_x <- -0.1
company_nudge_y <- -0.2
ggplot(df, aes(x = x, y = y, group = x)) +
geom_point(size = 5) +
geom_line(aes(group = "x")) +
coord_cartesian(ylim = c(0.3, 2.7)) +
# Labels
geom_text(aes(label = company),
#nudge_x = company_nudge_x,
nudge_y = company_nudge_y,
hjust = 0) # left_justify text
A bit of a hack, but in this solution, for each row of data, you can:
get the first part of the label (if applicable)
count the number of characters
determine the nudge value based on that count (you might have to play around with it, adjusting the value for char_nudge)
And then apply that inside the geom_text() function, inside the aesthetics.
Two things to keep in mind:
Because you have a categorical variable on x, you need to convert it to a factor and then an integer in order to be able to add a nudge to it when using it for the position of the geom_text (thankfully, ggplot2 and as.factor() will both order the levels alphabetically);
This works best with a monospace font (you can see what happens if you remove the argument family = "mono": the l is not as wide as other letters, which results in a shift in position).
library(tidyverse)
# example dataframe
df <- tibble(
x = c("one", "two"),
y = c(2.5, 1.7),
company = c("Normalllllllllllll", "Short\nA_bit_longerrrrrr")
)
# set constants
company_nudge_y <- -0.2
char_nudge <- 0.03
# augment dataframe
df <- df %>%
mutate(comp_small = str_extract(company, "^.+"),
len_lab = nchar(comp_small),
nudge_x = -len_lab / 2 * char_nudge)
# plot it
ggplot(df, aes(x = x, y = y, group = x)) +
geom_point(size = 5) +
geom_line(aes(group = "x")) +
coord_cartesian(ylim = c(0.3, 2.7)) +
# Labels
geom_text(aes(x = as.integer(as.factor(x)) + nudge_x, # add the nudge
label = company),
family = "mono", # monospace font will work better
nudge_y = company_nudge_y,
hjust = 0) # left_justify text
Created on 2020-11-27 by the reprex package (v0.3.0)
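The code above measures the first line of each label. If the shortest line should be used instead, as the question asks, the character count could be computed along these lines. This is a sketch in base R; shortest_line_chars is a hypothetical helper name, and char_nudge is the same hand-tuned constant as above.

```r
# Hypothetical helper: number of characters in the shortest line of each label
shortest_line_chars <- function(labels) {
  lines <- strsplit(labels, "\n", fixed = TRUE)
  vapply(
    lines,
    function(l) if (length(l) == 0) 0L else min(nchar(l)),
    integer(1)
  )
}

company <- c("Normal", "Short\nA_bit_longer")
char_nudge <- 0.03  # tuning constant, adjust by eye

# shift left by half the width of the shortest line
nudge_x <- -shortest_line_chars(company) / 2 * char_nudge
print(nudge_x)
```

The resulting nudge_x can replace the one derived from comp_small in the mutate() call; the monospace caveat still applies.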

R: ggplot2: how to separate labels in stat_summary

I am trying to plot labels above bars with the stat_summary function and a custom function that I wrote. There are three bars, and each should be labeled with one of the letters a to c. However, instead of one label per bar, all three labels are placed on top of each other:
codes <- c ("a", "b", "c")
simple_y <- function(x) {
return (data.frame (y = mean (x) + 1, label = codes))
}
ggplot (iris, mapping = aes (x = Species, y = Sepal.Length)) +
geom_bar (stat = "summary", fun.y = "mean", fill = "blue", width = 0.7, colour = "black", size = 0.7) +
stat_summary (fun.data = simple_y, geom = "text", size = 10)
I do understand why this is not working: each time the simple_y function is called, it sees the whole codes vector. However, I have no clue how to tell R to separate the three labels. Is it possible to tell R to use the n-th element of an input vector on each successive call of a function?
Does anybody have a good hint?
I would consider doing something like this:
library(tidyverse)
labels <-
tibble(
Species = factor(c("setosa", "versicolor", "virginica")),
codes = c("a", "b", "c")
)
iris %>%
group_by(Species) %>%
summarize(Mean = mean(Sepal.Length)) %>%
ungroup() %>%
left_join(labels, by = "Species") %>%
ggplot(aes(x = Species, y = Mean)) +
geom_col(fill = "blue", width = 0.7, color = "black", size = 0.7) +
geom_text(aes(y = Mean + 0.3, label = codes), size = 6, show.legend = FALSE)
First, you can generate the data frame with means separately, avoiding the need for geom_bar and stat_summary. Then after joining the manual labels/codes to that summarized data frame, it's pretty straightforward to add them with geom_text.
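If you would rather stay with stat_summary(), one possible variation is to map the label as an aesthetic through a named lookup vector, so each bar only ever sees its own code. This is a sketch, not the answer's method: it assumes ggplot2 3.3.0 or later (the fun argument), and the 0.3 offset is an arbitrary choice.

```r
library(ggplot2)

# Named lookup vector: one code per Species level
codes <- c(setosa = "a", versicolor = "b", virginica = "c")

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_bar(stat = "summary", fun = "mean",
           fill = "blue", width = 0.7, colour = "black") +
  # the label is constant within each Species, so it survives the summary
  stat_summary(
    aes(label = codes[as.character(Species)]),
    fun = function(v) mean(v) + 0.3,
    geom = "text", size = 6
  )

p
```

The lookup by name also keeps the codes attached to the right species if the factor levels are ever reordered.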

position_dodge and nudge_y together

I am trying to add labels (letters) above a barplot using the ggplot2 function geom_text. My bars are separated using position=position_dodge(), so I need to apply the same to the new labels. However, I would also like to use nudge_y to separate the labels from the bars. If I try to use both together, R complains that I can use only one of the two options. I'd like to do something like this:
Tukey.labels <- geom_text(data=stats,
aes(x=factor(Treatment2), y=x.mean,
label=Tukey.dif),
size=4, nudge_y=3, # move letters in Y
position=position_dodge(0.5)) # move letters in X
The goal is to create something like this image. Does anybody know a way to shift all my labels by the same distance in y while using position_dodge at the same time? I could not find an answer to this in other posts.
Hard to troubleshoot without a reproducible example. Hopefully this helps:
library(dplyr); library(ggplot2); library(tibble) # tibble provides rownames_to_column()
ggplot(mtcars %>% rownames_to_column("car") ,
aes(as.factor(cyl), mpg, group = car)) +
geom_col(position = position_dodge(0.9)) +
geom_errorbar(aes(ymin = mpg - wt,
ymax = mpg + wt),
position = position_dodge(0.9)) +
geom_text(aes(label = gear, y = mpg + wt), vjust = -0.5,
position = position_dodge(0.9))
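The idea in the code above is to leave the horizontal placement to position_dodge() and fold the vertical shift into the y aesthetic (or vjust), so nudge_y is not needed. A stripped-down sketch with made-up summary data; the column names treatment, variety, x_mean and letter are hypothetical stand-ins for the asker's stats table.

```r
library(ggplot2)

# Made-up summary statistics standing in for the asker's `stats`
stats <- data.frame(
  treatment = rep(c("T1", "T2"), each = 2),
  variety = rep(c("V1", "V2"), times = 2),
  x_mean = c(10, 14, 12, 9),
  letter = c("a", "b", "ab", "a")
)

p <- ggplot(stats, aes(x = treatment, y = x_mean, group = variety)) +
  geom_col(aes(fill = variety), position = position_dodge(0.9)) +
  # same dodge width as the bars; the constant 3 plays the role of nudge_y
  geom_text(aes(y = x_mean + 3, label = letter),
            position = position_dodge(0.9))

p
```

The offset is in data units, just like nudge_y, so every label moves up by the same distance.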
In the spirit of the original question, one can easily combine ggplot's position_nudge and position_dodge like this:
position_nudgedodge <- function(x = 0, y = 0, width = 0.75) {
ggproto(NULL, PositionNudgedodge,
x = x,
y = y,
width = width
)
}
PositionNudgedodge <- ggproto("PositionNudgedodge", PositionDodge,
x = 0,
y = 0,
width = 0.3,
setup_params = function(self, data) {
l <- ggproto_parent(PositionDodge,self)$setup_params(data)
append(l, list(x = self$x, y = self$y))
},
compute_layer = function(self, data, params, layout) {
d <- ggproto_parent(PositionNudge,self)$compute_layer(data,params,layout)
d <- ggproto_parent(PositionDodge,self)$compute_layer(d,params,layout)
d
}
)
Then you can use it like this:
Tukey.labels <- geom_text(data=stats,
aes(x=factor(Treatment2), y=x.mean, label=Tukey.dif),
size=4,
position=position_nudgedodge(y=3,width=0.5)
)

Bar/Pie Chart Label from Data Frame Column

I am making a pie chart and want to label it with the value for each slice. I have the information in a data frame but the column in which to look should be defined in the function call.
The code is (decently) long, but I think only 1 line needs to be changed. I have tried mainsym, as.symbol, as.name, quote, and anything else I could think to throw at it, but to no avail.
Thanks
library(dplyr)
library(ggplot2)
library(gridExtra)
pie_chart <- function(df, main, labels, labels_title=NULL) {
mainsym <- as.symbol(main)
labelssym <- as.symbol(labels)
# convert the data into percentages. add label position and inner label text
df <- df %>%
mutate(perc = mainsym / sum(mainsym)) %>%
mutate(label_pos = 1 - cumsum(perc) + perc / 2,
inner_label_text = paste0(round(perc * 100), "%\n",main)) #NEED HELP HERE! Replace 'main' with something
#debug print statement
print(df)
# reorder the category factor levels to order the legend
df[[labels]] <- factor(df[[labels]], levels = unique(df[[labels]]))
p <- ggplot(data = df, aes_(x = factor(1), y = ~perc, fill = labelssym)) +
# make stacked bar chart with black border
geom_bar(stat = "identity", color = "black", width = 1) +
# add the percents and values to the interior of the chart
geom_text(aes(x = 1.25, y = label_pos, label = inner_label_text), size = 4) +
# convert to polar coordinates
coord_polar(theta = "y",direction=-1)
return(p)
}
set.seed(42)
donations <- data.frame(donation_total=sample(1:1E5,50,replace=TRUE))
donation_size_levels_same <- seq(0,2E6,10E3)
donations$bracket <- cut(donations$donation_total,breaks=donation_size_levels_same,right=FALSE,dig.lab = 50)
donations.by_bracket <- donations %>%
group_by(bracket) %>%
summarize(n=n(),total=sum(donation_total)) %>%
ungroup() %>%
arrange(bracket)
grid.arrange(
pie_chart(df=donations.by_bracket,main="n",labels="bracket",labels_title="Total Amount Donated"),
pie_chart(df=donations.by_bracket,main="total",labels="bracket",labels_title="Total Amount Donated"))
The label placement still needs some adjustment but this seems to address the labelling issue, if you just replace that one line (where you say need help here) as follows:
mutate(label_pos = 1 - cumsum(perc) + perc / 2,
inner_label_text = paste0(round(perc * 100), "%\n",as.character(df[[main]])))
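Another way to look a column up by its name inside mutate() is the .data pronoun from dplyr. The sketch below isolates the label computation in a hypothetical helper, add_pie_labels, and tries it on made-up data. It also uses .data[[main]] for perc, on the assumption that a bare symbol such as mainsym is not evaluated as a column there.

```r
library(dplyr)

# Hypothetical helper: percentage, label position and label text
# for the column whose name is passed in `main`
add_pie_labels <- function(df, main) {
  df %>%
    mutate(
      perc = .data[[main]] / sum(.data[[main]]),
      label_pos = 1 - cumsum(perc) + perc / 2,
      inner_label_text = paste0(round(perc * 100), "%\n", .data[[main]])
    )
}

# Made-up data for illustration
toy <- data.frame(bracket = c("low", "mid", "high"), n = c(5, 3, 2))
labelled <- add_pie_labels(toy, "n")
print(labelled)
```

Unlike df[[main]], the pronoun always refers to the data frame currently flowing through the pipe, which matters if rows are filtered or reordered before the mutate() step.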
