Using geom_text, nudge half the length of a label in ggplot

I wish to add labels to points on a ggplot. The labels should be below each point. There may be multiple labels per point. If so, they should be left-justified. Each label may be a different length.
For each point, the shortest name should be centered below the point. Thus, I want nudge_x to equal half the length of the shortest name for that point.
How do I determine the length of a label so as to nudge half its value?
Example
library("tidyverse")
df <- tibble(
x = c("one", "two"),
y = c(2.5, 1.7),
company = c("Normal", "Short\nA_bit_longer")
)
company_nudge_x <- -0.1
company_nudge_y <- -0.2
ggplot(df, aes(x = x, y = y, group = x)) +
geom_point(size = 5) +
geom_line(aes(group = "x")) +
coord_cartesian(ylim = c(0.3, 2.7)) +
# Labels
geom_text(aes(label = company),
#nudge_x = company_nudge_x,
nudge_y = company_nudge_y,
hjust = 0) # left_justify text

A bit of a hack, but in this solution, for each row of data, you can:
1. get the first part of the label (if applicable);
2. count the number of characters;
3. determine the nudge value based on that count (you might have to play around with it, adjusting the value of char_nudge).
Then apply that inside the geom_text() function, inside the aesthetics.
Two things to keep in mind:
1. Because you have a categorical variable on x, you need to convert it to a factor and then to an integer in order to be able to add a nudge to it when using it for the position of the geom_text (thankfully, ggplot2 and as.factor() will both order the levels alphabetically);
2. This works best with a monospace font (you can try and see what happens if you remove the argument family = "mono": the l is not as wide as other letters, which results in a shift in position).
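To make the factor-to-integer step concrete, here is a minimal base R sketch. The vector and the nudge value 0.09 are made up for illustration; the only point is that a discrete axis places its levels at positions 1, 2, ... in level order, so a nudge becomes plain arithmetic.

```r
# Discrete x values sit at integer positions 1, 2, ... in level order.
x <- c("two", "one", "two")

# as.factor() sorts the levels, so "one" = 1 and "two" = 2 here.
pos <- as.integer(as.factor(x))
pos

# A nudge is then ordinary arithmetic on those positions
# (0.09 is an arbitrary illustrative value, not taken from the answer).
nudged <- pos - 0.09
nudged
```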
library(tidyverse)
# example dataframe
df <- tibble(
x = c("one", "two"),
y = c(2.5, 1.7),
company = c("Normalllllllllllll", "Short\nA_bit_longerrrrrr")
)
# set constants
company_nudge_y <- -0.2
char_nudge <- 0.03
# augment dataframe
df <- df %>%
mutate(comp_small = str_extract(company, "^.+"),
len_lab = nchar(comp_small),
nudge_x = -len_lab / 2 * char_nudge)
# plot it
ggplot(df, aes(x = x, y = y, group = x)) +
geom_point(size = 5) +
geom_line(aes(group = "x")) +
coord_cartesian(ylim = c(0.3, 2.7)) +
# Labels
geom_text(aes(x = as.integer(as.factor(x)) + nudge_x, # add the nudge
label = company),
family = "mono", # monospace font will work better
nudge_y = company_nudge_y,
hjust = 0) # left_justify text
Created on 2020-11-27 by the reprex package (v0.3.0)
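The code above measures the first line of each label, whereas the question asks for the shortest line. A possible variation, sketched in base R under that assumption (shortest_line_nchar is a hypothetical helper name, and char_nudge is the same hand-tuned constant as above):

```r
# Hypothetical helper: number of characters in the shortest line of each label.
shortest_line_nchar <- function(labels) {
  vapply(
    strsplit(labels, "\n", fixed = TRUE),
    function(parts) if (length(parts) == 0) 0L else min(nchar(parts)),
    integer(1)
  )
}

company <- c("Normal", "Short\nA_bit_longer")
len_lab <- shortest_line_nchar(company)

# Same idea as in the answer: shift left by half the label width.
char_nudge <- 0.03                      # tuned by eye, as in the answer
nudge_x <- -len_lab / 2 * char_nudge
```

The resulting nudge_x could replace the column computed in the mutate() call; the monospace caveat still applies.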

Related

R control jitter function - avoid overplotting / non-random jitter

My problem seems simple: I am using ggplot2 with geom_jitter() to plot a variable (take my picture as an example).
Jitter now adds some random noise to the variable (the variable is just called "1" in this example) to prevent overplotting. So I have now random noise in the y-direction and clearly what otherwise would be completely overplotted is now better visible.
But here is my question:
As you can see, there are still some points that overplot each other. In my example, this could easily be prevented if the offsets in the y-direction were placed more strategically instead of being random noise.
Can I somehow alter the geom_jitter() behavior or is there a similar function in ggplot2 that does exactly this?
Not really a minimal example, but also not too long:
library("imputeTS")
library("ggplot2")
data <- tsAirgap
# 2.1 Create required data
# Get all indices of the data that comes directly before and after an NA
na_indx_after <- which(is.na(data[1:(length(data) - 1)])) + 1
# starting from index 2 moves all indexes one in front, so no -1 needed for before
na_indx_before <- which(is.na(data[2:length(data)]))
# Get the actual values to the indices and put them in a data frame with a label
before <- data.frame(id = "1", type = "before", input = na_remove(data[na_indx_before]))
after <- data.frame(id = "1", type = "after", input = na_remove(data[na_indx_after]))
all <- data.frame(id = "1", type = "source", input = na_remove(data))
# Get n values for the plot labels
n_before <- length(before$input)
n_all <- length(all$input)
n_after <- length(after$input)
# 2.4 Create dataframe for ggplot2
# join the data together in one dataframe
df <- rbind(before, after, all)
# Create the plot
gg <- ggplot(data = df) +
  geom_jitter(mapping = aes(x = id, y = input, color = type, alpha = type),
              width = 0.5, height = 0.5)
# no trailing comma after values: it would pass an empty argument
gg <- gg + ggplot2::scale_color_manual(
  values = c("before" = "skyblue1", "after" = "yellowgreen", "source" = "gray66")
)
gg <- gg + ggplot2::scale_alpha_manual(
  values = c("before" = 1, "after" = 1, "source" = 0.3)
)
gg + ggplot2::theme_linedraw() + theme(aspect.ratio = 0.5) + ggplot2::coord_flip()
So many good suggestions... here is what Ben's suggestion would look like for my example:
I changed parts of my code to:
gg <- ggplot(data = df, aes(x = input, color = type, fill = type, alpha = type)) +
geom_dotplot(binwidth = 15)
This would basically also work as intended for me. ggbeeswarm, as suggested by Jon, also worked great for my purpose.
I thought of a hack I really like, using ggrepel. It's normally used for labels, but nothing prevents you from making the label into a point.
df <- data.frame(x = rnorm(200),
col = sample(LETTERS[1:3], 200, replace = TRUE),
y = 1)
ggplot(df, aes(x, y, label = "●", color = col)) + # using unicode black circle
ggrepel::geom_text_repel(segment.color = NA,
box.padding = 0.01, key_glyph = "point")
A downside of this method is that ggrepel can take a lot of time for a large number of points, and will recalculate differently each time you change the plot size. A faster alternative would be to use ggbeeswarm::geom_quasirandom, which uses a deterministic process to define jitter that looks random.
ggplot(df, aes(x,y, color = col)) +
ggbeeswarm::geom_quasirandom(groupOnX = FALSE)
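For readers who want to see what "strategically placed offsets" could mean without an extra package, here is a rough base R sketch. It is not how ggbeeswarm or geom_dotplot work internally; stack_offsets is a hypothetical helper that bins the values and spreads the points in each bin evenly around zero.

```r
# Hypothetical helper: deterministic offsets instead of random jitter.
# Points falling into the same bin are spread evenly around 0.
stack_offsets <- function(values, binwidth, step = 0.05) {
  bin <- floor(values / binwidth)
  idx <- ave(seq_along(values), bin, FUN = seq_along)  # 1, 2, 3, ... within a bin
  n   <- ave(seq_along(values), bin, FUN = length)     # number of points in the bin
  (idx - (n + 1) / 2) * step
}

offsets <- stack_offsets(c(1, 1.2, 5), binwidth = 1)
offsets
```

The offsets could then be added to a constant position, for example aes(x = input, y = 1 + stack_offsets(input, 15)) with geom_point(); the binwidth and step would need tuning to the data and the point size.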

Alternating color of individual dashes in a geom_line

I'm wondering if in a geom_line you can make it so the colors of, say, the dashes within a single line alternate (rather than the colors differing between lines). For example, if I wanted this singular line to alternate red, green, and blue rather than being just red.
library(tidyverse)
ggplot(tibble(x = 1:10, y = 1:10), aes(x, y)) +
geom_line(linetype = "dashed", color = "red") # i'd like to say something like, color = c("red", "green", "blue") instead
While a little inefficient, a little-known thing about R's par(lty=) (that geom_line(linetype=) shares) is that it can be specified as on/off stretches. From ?par under Line Type Specification:
Line types can either be specified by giving an index into a small
built-in table of line types (1 = solid, 2 = dashed, etc, see
'lty' above) ...
(which is what most tutorials/howtos/plots tend to use)
... or directly as the lengths of on/off stretches of
line. This is done with a string of an even number (up to eight)
of characters, namely _non-zero_ (hexadecimal) digits which give
the lengths in consecutive positions in the string. For example,
the string '"33"' specifies three units on followed by three off
and '"3313"' specifies three units on followed by three off
followed by one on and finally three off. The 'units' here are
(on most devices) proportional to 'lwd', and with 'lwd = 1' are in
pixels or points or 1/96 inch.
So with your dat, one could do
dat <- tibble(x = 1:10, y = 1:10)
ggplot(dat, aes(x,y)) +
geom_line(linetype="1741", color="red", size=3) +
geom_line(linetype="1345", color="blue", size=3) +
geom_line(linetype="49", color="green", size=3)
to get
I could not get it to work without one blank space: the on/off stretches must always start with an "on", and end with an "off"; as such I could not find a pattern that didn't (at least once) end on an "on" without an imposed gap.
For further explanation, since we always must start with an "on", I start all three with at least a single pixel of "on"; the trick is to make the "long" stretch for the beginning to be the last line plotted, so it over-plots the others.
red: R.......RRRR.
     1       -4--
      ---7---    1
blu: B...BBBB.....
     1   -4--
      -3-    --5--
grn: GGGG.........
     -4--
         ----9----
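One detail worth checking is that the three patterns stay in step: each string has to add up to the same total length, otherwise the colors drift apart along the line. A small base R sketch (decode_lty is a hypothetical helper name) decodes the hexadecimal digits and compares the totals:

```r
# Hypothetical helper: turn a linetype string such as "1741" into its
# on/off stretch lengths (each character is one hexadecimal digit).
decode_lty <- function(lty) {
  strtoi(strsplit(lty, "", fixed = TRUE)[[1]], base = 16L)
}

patterns <- c("1741", "1345", "49")
periods <- vapply(patterns, function(p) sum(decode_lty(p)), integer(1))
periods
```

All three patterns used above sum to 13 units, which is why the dashes repeat in the same places.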
This has some advantages: regardless of size=, it scales the same, for instance when size= is omitted.
Using approx:
# number of points at which interpolation takes place
# increase if line takes sharp turns
n = 100
# number of segments along line, according to taste
n_seg = 20
# segment colors
cols = c("red", "green", "blue")
# interpolate
d = approx(dat$x, dat$y, n = n)
# create start and end points for segments
d2 = data.frame(x = head(d$x, -1), xend = d$x[-1],
y = head(d$y, -1), yend = d$y[-1])
# create vector of segment colors
d2$col = rep(cols, each = ceiling((n - 1) / n_seg), length.out = n - 1)
ggplot(d2, aes(x = x, xend = xend, y = y, yend = yend, color = col)) +
geom_segment() + scale_color_identity(guide = "none")
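The color vector is the least obvious line in that code. A short base R sketch with the same values of n, n_seg and cols shows what rep() produces when each and length.out are combined: every color is repeated in runs, and the result is recycled or cut to one entry per small segment.

```r
n <- 100
n_seg <- 20
cols <- c("red", "green", "blue")

# each = run length per color; length.out = one color per small segment
seg_col <- rep(cols, each = ceiling((n - 1) / n_seg), length.out = n - 1)

# Run-length encoding shows the visible colored stretches.
runs <- rle(seg_col)
length(runs$lengths)   # number of colored stretches
head(runs$values, 4)   # colors cycle in the order given
```

With these values the 99 small segments form 20 stretches, 19 of length 5 and a final one of length 4.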
This is an implementation of a new Stat based on GeomSegment which creates alternating segments of different colors. This works by passing the alternating colors to the data frame created in Stat$compute_group. GeomSegment uses StatIdentity, so no need to specifically map xend, yend and color.
BIG THANKS to Henrik for showing a very neat way of creating the segments. (my own way was very convoluted, and I'll leave it in this thread for posterity). The only remaining "problem" is that the segments might have different lengths in changing slopes - on the other hand, it might be visually desirable to have different segment lengths in this case.
library(ggplot2)
## attaching just for demonstration purpose
library(patchwork)
#' geom_colorpath
#' @description lines with alternating color "just for the effect".
#' @name colorpath
#' @examples
#' @export
StatColorPath <- ggproto("StatColorPath", Stat,
compute_group = function(data, scales, params,
n_seg = 20, n = 100, cols = c("black", "white")) {
# interpolate
d <- approx(data$x, data$y, n = n)
# create start and end points for segments
d2 <- data.frame(
x = head(d$x, -1), xend = d$x[-1],
y = head(d$y, -1), yend = d$y[-1]
)
# create vector of segment colors
d2$color <- rep(cols, each = ceiling((n - 1) / n_seg), length.out = n - 1)
d2
},
required_aes = c("x", "y")
)
#' @rdname colorpath
#' @import ggplot2
#' @inheritParams ggplot2::layer
#' @inheritParams ggplot2::geom_segment
#' @param n_seg number of segments along line, according to taste
#' @param n number of points at which interpolation takes place
#' increase if line takes sharp turns
#' @param cols vector of alternating colors
#' @export
geom_colorpath <- function(mapping = NULL, data = NULL, geom = "segment",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, cols = c("black", "white"),
n_seg = 20, n = 100, ...) {
layer(
stat = StatColorPath, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, cols = cols, n = n, n_seg = n_seg,...)
)
}
## examples
dat <- data.frame(x = seq(2,10, 2), y = seq(4,20, 4))
p1 <- ggplot(dat, aes(x = x, y = y)) +
geom_colorpath()+
ggtitle("Default colors")
p2 <- ggplot(dat, aes(x, y)) +
geom_colorpath(cols = c("red", "blue"))+
ggtitle("Two colors")
p3 <- ggplot(dat, aes(x, y)) +
geom_colorpath(cols = c("red", "blue", "green"))+
ggtitle("Three colors")
p4 <- ggplot(dat, aes(x, y)) +
geom_colorpath(cols = c("red", "blue", "green", "white"))+
ggtitle("Four colors")
wrap_plots(mget(ls(pattern = "p[1-9]")))
air_df <- data.frame(x = 1: length(AirPassengers), y = c(AirPassengers))
a1 <- ggplot(air_df, aes(x, y)) +
geom_colorpath(cols = c("red", "blue", "green"))+
ggtitle("Works also with more complex curves")
a2 <- ggplot(air_df, aes(x, y)) +
geom_colorpath(cols = c("red", "blue", "green"), n_seg = 150)+
ggtitle("... more color segments")
a1 / a2
Created on 2022-06-22 by the reprex package (v2.0.1)
Here's a ggplot hack that is simple, but works for two colors only (your question is the top result when searching for "alternating colored dashed line" and I wanted to put this option out there). It results in two lines being overlaid, one a solid line, the other a dashed line.
library(dplyr)
library(ggplot2)
library(reshape2)
# Create df
x_value <- 1:10
group1 <- c(0,1,2,3,4,5,6,7,8,9)
group2 <- c(0,2,4,6,8,10,12,14,16,18)
dat <- data.frame(x_value, group1, group2) %>%
mutate(group2_2 = group2) %>% # Duplicate the column that you want to be alternating colors
melt(id.vars = "x_value", variable.name = "group", value.name ="y_value") # Long format
# Put in your selected order
dat$group <- factor(dat$group, levels=c("group1", "group2", "group2_2"))
# Plot
ggplot(dat, aes(x=x_value, y=y_value)) +
geom_line(aes(color=group, linetype=group), size=1) +
scale_color_manual(values=c("black", "red", "black")) +
scale_linetype_manual(values=c("solid", "solid", "dashed"))
Unfortunately the legend still needs to be edited by hand.

Is there a way to make a line plot that connects empirical pairs of words with ggplot2?

I'm not sure what the correct name for this type of plot would be, but let's say we have a list of names (or letters here): data <- data.frame(letters[1:10])
Let's also say that we want to illustrate which of these names are connected based on some empirical decision, so we have a list of observations we want to connect in a plot like the following (done in PowerPoint):
Can this be done in ggplot?
Yes, it can be done in ggplot.
Let's start by setting up a data frame of letters, with associated positions on the x and y axis of a plot. We'll make the x values 1 and 2 (though this is arbitrary), and the y values 1:10 (also arbitrary, as long as they are evenly spaced)
labels <- data.frame(x = c(rep(1, 10), rep(2, 10)),
y = rep(1:10, 2),
labs = rep(LETTERS[10:1], 2),
stringsAsFactors = FALSE)
Now we also need some way of deciding which letters will be joined. Let's do this by having a simple data frame of "left" and "right" values, where each row describes which two letters will be joined:
set.seed(69)
joins <- data.frame(left = sample(LETTERS[1:10], 6, TRUE),
right = sample(LETTERS[1:10], 6, TRUE),
stringsAsFactors = FALSE)
joins
#> left right
#> 1 A G
#> 2 B B
#> 3 H J
#> 4 G D
#> 5 G J
#> 6 F B
Now we can assign start and end x and y co-ordinates for the lines by matching the letters in these two columns to the columns in our labels data frame:
joins$x <- rep(1.05, nrow(joins))
joins$xend <- rep(1.9, nrow(joins))
joins$y <- labels$y[match(joins$left, labels$labs)]
joins$yend <- labels$y[match(joins$right, labels$labs)]
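The lookup relies on match() returning the position of each letter in the label vector. A small base R sketch of the same idea, using one column of labels and three made-up letters to look up:

```r
labs <- LETTERS[10:1]   # "J" is first (y = 1), "A" is last (y = 10)
y <- 1:10

left <- c("A", "H", "G")      # illustrative letters to look up
y_left <- y[match(left, labs)]
y_left
```

Because the labels run from J up to A, "A" ends up at y = 10, i.e. at the top of the plot.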
This just leaves the plot. We want to get rid of all the axes, titles and legends so we use theme_void:
library(ggplot2)
ggplot(labels, aes(x, y)) +
geom_text(aes(label = labs), size = 8) +
geom_segment(data = joins, aes(xend = xend, yend = yend, color = left),
arrow = arrow(type = "closed", length = unit(0.02, "npc"))) +
coord_cartesian(xlim = c(0.5, 2.5)) +
theme_void() +
theme(legend.position = "none")
Created on 2020-07-10 by the reprex package (v0.3.0)
This solution could be tidied up, but it gives a start using geom_segment:
library(tidyverse)
tibble(x0 = 0, x1 = 1, y0 = sample(letters[1:10]), y1 = sample(letters[1:10])) %>%
mutate(y0 = factor(y0, levels = rev(letters[1:10])),
y1 = factor(y1, levels = rev(letters[1:10]))) %>%
ggplot(aes(x = x0, xend = x1, y = y0, yend = y1)) +
geom_segment(arrow = arrow(length = unit(0.03, "npc"))) +
geom_text(aes(x = x1, y = y1, label = y1), nudge_x = 0.01)

Plot grouped barplot with absolute and percent values + labels

I am quite new to R and especially to ggplot. For my next result I think I have to switch from plot() to ggplot(), which is where I need your help:
I have a data frame with numeric values. One column is an absolute number, the other is the corresponding percentage value. I have 3 such two-column indicators: a, b and c.
The row names are the observations (7 in the sample below) and are stored in the first column "X".
I want to plot them in a kind of grouped barplot, where the absolute and percent columns are next to each other for each of the 3 indicators.
Sample dataframe:
df = data.frame(X = c("e 1","e 1,5","e 2","e 2,5","e 3","e 3,5","e 4"),
a_abs=c(-0.3693,-0.0735,-0.019,0.0015,0,-0.0224,-0.0135),
a_per=c(-0.4736,-0.0943,-0.0244,0.0019,0,-0.0287,-0.0173),
b_abs=c(-0.384,-0.0733,-0.0173,0.0034,0,-0.0204,-0.0179),
b_per=c(-0.546,-0.1042,-0.0246,0.0048,0,-0.029,-0.0255),
c_abs=c(-0.3876,-0.0738,-0.019,0.0015,0,-0.0225,-0.0137),
c_per=c(-0.4971,-0.0946,-0.0244,0.0019,0,-0.0289,-0.0176))
Thanks to @jonspring, I got the following plot by using this code:
df3 <- df %>%
gather(column, value, -X) %>%
mutate(group = str_sub(column, end = 2),
stat = str_sub(column, start = 4)) %>%
select(-column) %>%
spread(stat, value) %>%
mutate(combo_label = paste(sep="\n",
scales::comma(abs, accuracy = 0.001),
scales::percent(per, accuracy = 0.01)))
df3$group = gsub(df3$group,pattern = "CK",replacement = "Cohen's\nKappa")
df3$group = gsub(df3$group,pattern = "JA",replacement = "Jaccard")
df3$group = gsub(df3$group,pattern = "KA",replacement = "Krippen-\ndorff's Alpha")
crg = ifelse(df3$abs< 0,"red","darkgreen")
ggplot(df3, aes(group, abs, label = combo_label)) +
geom_segment(aes(xend = group,
yend = 0),
color = crg) +
geom_point() +
geom_text(vjust = 1.5,
size = 3,
lineheight = 1.2) +
scale_y_continuous(expand = c(0.2,0)) +
facet_grid(~X) +
labs(x= "Exponent", y = "Wert")
When I zoom in so that the positive values are visible, the labels are drawn over the segments. How can I place them above or below the point depending on whether the value is positive or negative?
Zoom with coord_cartesian(ylim = c(-0.015,0.005))
Thank you for your helping hands.
EDIT: I have already found the solution. Just as for the color change from red to green, I used ifelse for the vjust parameter.
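The edit does not show the code, so the following is only a guess at what it might look like. It uses a small made-up data frame and illustrative vjust values of 1.5 and -0.5: a vjust above 1 pushes the text below the point, a negative vjust pushes it above.

```r
library(ggplot2)

# Made-up data in the shape of df3: one row per group, with a signed value.
df_demo <- data.frame(group = c("a", "b", "c"),
                      abs = c(-0.019, 0.0015, -0.0135))
df_demo$label <- format(df_demo$abs)

# Negative values: label below the point; positive values: label above it.
df_demo$vj <- ifelse(df_demo$abs < 0, 1.5, -0.5)

p <- ggplot(df_demo, aes(group, abs, label = label)) +
  geom_segment(aes(xend = group, yend = 0)) +
  geom_point() +
  geom_text(aes(vjust = vj), size = 3)
```

Mapping vjust inside aes() keeps the value attached to each row, which is safer than passing a separate vector when the data are faceted or reordered.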
There are a lot of varieties of ways to display this sort of data with ggplot. I highly recommend you check out https://r4ds.had.co.nz/data-visualisation.html if you haven't already.
One suggestion you'll find there is that ggplot almost always works better if you first convert your data into long (aka "tidy") form. This puts each of the dimensions of the data into its own column, so that you can map the dimension to a visual aesthetic. Here's one way to do that:
library(tidyverse)
df2 <- df %>%
gather(column, value, -X) %>%
mutate(group = str_sub(column, end = 1),
stat = str_sub(column, start = 3),
value_label = if_else(stat == "per",
scales::percent(value, accuracy = 0.1),
scales::comma(value, accuracy = 0.01)))
Now, the group a/b/c is in its own column, as is the type of data abs/per, the values are all together in one column, and we also have text labels that suit the type of data.
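As an aside, the same reshaping can be written with tidyr::pivot_longer(), which can split the column names in one step. This is a sketch on a cut-down copy of the data, assuming tidyr 1.0.0 or later is installed; it is an alternative spelling, not what the answer itself used.

```r
library(tidyr)

# Two observations and two indicators are enough to show the shape.
wide <- data.frame(X = c("e 1", "e 2"),
                   a_abs = c(-0.3693, -0.019),
                   a_per = c(-0.4736, -0.0244),
                   b_abs = c(-0.384, -0.0173),
                   b_per = c(-0.546, -0.0246))

# "a_abs" becomes group = "a", stat = "abs"; the numbers go to "value".
long <- pivot_longer(wide, cols = -X,
                     names_to = c("group", "stat"),
                     names_sep = "_",
                     values_to = "value")
long
```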
> head(df2)
      X column   value group stat value_label
1   e 1  a_abs -0.3693     a  abs       -0.37
2 e 1,5  a_abs -0.0735     a  abs       -0.07
3   e 2  a_abs -0.0190     a  abs       -0.02
4 e 2,5  a_abs  0.0015     a  abs        0.00
5   e 3  a_abs  0.0000     a  abs        0.00
6 e 3,5  a_abs -0.0224     a  abs       -0.02
With that out of the way, it's simpler to try out different combinations of ggplot options, which can help highlight different comparisons within the data.
For instance, if you want to compare the different observations within each group, you could put each group into a facet, and each observation along the x axis:
ggplot(df2, aes(X, value, label = value_label)) +
geom_segment(aes(xend = X, yend = 0), color = "blue") +
geom_point() +
geom_text(vjust = 2, size = 2) +
facet_grid(stat~group)
Or if you want to highlight how the different groups compared within each observation, you could swap them, like this:
ggplot(df2, aes(group, value, label = value_label)) +
geom_segment(aes(xend = group, yend = 0), color = "blue") +
geom_point() +
geom_text(vjust = 2, size = 2) +
facet_grid(stat~X)
You might also try combining the abs and per data, since they only vary slightly based on the different denominators applicable to each group and/or observation. To do that, it might be simpler to transform the data to keep each abs and per together:
df3 <- df %>%
gather(column, value, -X) %>%
mutate(group = str_sub(column, end = 1),
stat = str_sub(column, start = 3)) %>%
select(-column) %>%
spread(stat, value) %>%
mutate(combo_label = paste(sep="\n",
scales::comma(abs, accuracy = 0.01),
scales::percent(per, accuracy = 0.1)))
ggplot(df3, aes(group, abs, label = combo_label)) +
geom_segment(aes(xend = group, yend = 0), color = "blue") +
geom_point() +
geom_text(vjust = 1.5, size = 2, lineheight = 0.8) +
scale_y_continuous(expand = c(0.2,0)) +
facet_grid(~X)

ggrepel: using position_dodge in combination with geom_label_repel

I'm trying to label the outliers in a geom_boxplot using ggrepel::geom_label_repel. It works nicely when there's only one grouping variable, but when I try it for multiple grouping variables I run into a problem. The position argument in ggrepel doesn't seem to work very consistently for some reason, see this example:
library(tidyverse)
library(ggrepel)
set.seed(1337)
df <- tibble(x = rnorm(500),
g1 = factor(sample(c('A','B'), 500, replace = TRUE)),
g2 = factor(sample(c('A','B'), 500, replace = TRUE)),
rownames = 1:500)
is_outlier <- function(x) {
return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}
df_outliers <- df %>% group_by(g1, g2) %>% mutate(outlier=is_outlier(x))
ggplot(df_outliers, aes(x=g1, y=x, fill=g2)) +
geom_boxplot(width=0.3, position = position_dodge(0.5)) +
ggrepel::geom_label_repel(data=. %>% filter(outlier),
aes(label=rownames), position = position_dodge(0.8))
Is there a way to make the labels point to the accompanying dots using ggrepel?
You can try this:
ggplot(df_outliers,
aes(x=g1, y=x, fill=g2, label=rownames)) +
geom_boxplot(width = 0.3, position = position_dodge(0.5)) +
geom_label_repel(data = . %>%
filter(outlier) %>%
group_by(g1) %>%
complete(g2, fill = list(x = 0, rownames = "")),
position = position_dodge(0.5),
box.padding = 1,
min.segment.length = 0,
show.legend = FALSE)
Explanations:
The data source for geom_label_repel() follows aosmith's suggestion to add the missing B-A combination, filling in 0 for x (any number would do, as long as it's not the default NA) and "" for rownames (ggrepel won't plot empty labels, but will take them into account when dodging).
box.padding is set to 1 (increased from the default 0.25) to push the labels further away, so that the line segments are more visible.
min.segment.length is set to 0 (decreased from the default 0.5) to force line segments to be plotted, no matter how short they are.
(show.legend = FALSE is optional. I just don't like seeing "a" letter show up in the legend.)
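To see what the complete() step contributes, here is a small sketch on a made-up outlier table in which group B has no outlier for g2 = "B". It assumes dplyr and tidyr are installed. The labels are stored as character here so that the "" filler has the same type as the column; with integer labels, newer tidyr versions may refuse the "" fill.

```r
library(dplyr)
library(tidyr)

# Made-up outliers: g1 = "B" has none for g2 = "B".
outliers <- tibble(
  g1 = factor(c("A", "A", "B"), levels = c("A", "B")),
  g2 = factor(c("A", "B", "A"), levels = c("A", "B")),
  x = c(3.1, -2.9, 3.3),
  rownames = c("12", "48", "301")
)

# Within each g1 group, add a placeholder row for every missing g2 level,
# so position_dodge() sees the same set of groups as the boxplot layer.
padded <- outliers %>%
  group_by(g1) %>%
  complete(g2, fill = list(x = 0, rownames = "")) %>%
  ungroup()
padded
```

The padded table gains one row (g1 = "B", g2 = "B") with an empty label, which is what lets the labels dodge by the same amount as the boxes.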
