how to automate the legend in a ggplot chart? - r

Consider this simple example
library(dplyr)
library(forcats)
library(ggplot2)
# data_frame() is deprecated; tibble() is its current equivalent
mydata <- tibble(cat1 = c(1,1,2,2),
cat2 = c('a','b','a','b'),
value = c(10,20,-10,-20),
time = c(1,2,1,2))
mydata <- mydata %>% mutate(cat1 = factor(cat1),
cat2 = factor(cat2))
> mydata
# A tibble: 4 x 4
cat1 cat2 value time
<fct> <fct> <dbl> <dbl>
1 1 a 10.0 1.00
2 1 b 20.0 2.00
3 2 a -10.0 1.00
4 2 b -20.0 2.00
Now, I want to create a chart where I interact the two factor variables.
I know I can use interaction() inside aes() for that (see below).
My big problem is that I do not know how to automate the labelling (and the colouring) of the interactions, so that I can avoid the manual errors that creep in when using scale_colour_manual().
For instance:
ggplot(mydata,
aes(x = time, y = value, col = interaction(cat1, cat2) )) +
geom_point(size=15) + theme(legend.position="bottom")+
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
theme(legend.position="bottom",
legend.text=element_text(size=12, face = "bold")) +
scale_colour_manual(name = ""
, values=c("red","red4","royalblue","royalblue4")
, labels=c("1-b","1-a"
,"2-a","2-b"))
This produces a chart with the wrong labels, because of a deliberate mistake I made in scale_colour_manual(). The bright red dot is actually 1-a, not 1-b (note how the labels are simply the concatenation of the factor levels). With more factor levels, guessing the right order gets tricky.
Is there a way to automate this labeling (even better: labeling AND coloring)? Perhaps using forcats? Perhaps creating the labels as strings in the dataframe beforehand?
Thanks!

If the number of factor levels for cat1 / cat2 is not fixed (and could potentially be much larger than 2), I would calculate the appropriate colours with hsv() rather than assign them manually.
The colour cheatsheet here summarises the HSV colour model rather nicely:
Hue (h) is essentially your rainbow colour wheel, Saturation (s) determines how intense the colour is, and Value (v) how dark it is. Each parameter accepts values in the range [0, 1].
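To make the three parameters concrete, here is a minimal sketch using only hsv() from base R's grDevices. The particular values are just ones I picked for illustration:

```r
# hsv() returns hex colour strings; each argument must lie in [0, 1]
hsv(h = 0.5, s = 1, v = 1)        # fully saturated, full brightness: "#00FFFF" (cyan)
hsv(h = 0.5, s = 0.65, v = 0.65)  # same hue, but duller and darker

# hsv() is vectorised, so a whole set of hues can be generated in one call
hues <- hsv(h = (1:3) / 3, s = 1, v = 1)
hues
```

Keeping the hue fixed while varying s and v gives a family of visibly related colours, which is what makes it useful for a second factor nested within the first.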
Here's how I would adapt it for this use case:
mydata2 <- mydata %>%
# use "-" instead of the default "." since we are using that for the labels anyway
mutate(interacted.variable = interaction(cat1, cat2, sep = "-")) %>%
# cat1: assign hue evenly across the whole wheel,
# cat2: restrict both saturation & value to the [0.3, 1], as it can look too
# faint / dark otherwise
mutate(colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))
# create the vector of colours for scale_colour_manual()
manual.colour <- mydata2 %>% select(interacted.variable, colour) %>% unique()
colour.vector <- manual.colour$colour
names(colour.vector) <- manual.colour$interacted.variable
rm(manual.colour)
> colour.vector
1-a 1-b 2-a 2-b
"#3AA6A6" "#00FFFF" "#A63A3A" "#FF0000"
With the colours calculated automatically for any number of factors, plotting becomes quite straightforward:
ggplot(mydata2,
aes(x = time, y = value, colour = interacted.variable)) +
geom_point(size = 15) +
scale_colour_manual(name = "",
values = colour.vector,
breaks = names(colour.vector)) +
theme(legend.position = "bottom")
An illustration with more factor levels (the code is the same, except that guide_legend(byrow = TRUE) is added to the colour scale):
mydata3 <- data.frame(
cat1 = factor(rep(1:3, times = 5)),
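# cat2 must be a factor: as.integer() / levels() below fail on a character
# column, which is what data.frame() creates by default since R 4.0
cat2 = factor(rep(LETTERS[1:5], each = 3)),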
value = 1:15,
time = 15:1
) %>%
mutate(interacted.variable = interaction(cat1, cat2, sep = "-"),
colour = hsv(h = as.integer(cat1) / length(levels(cat1)),
s = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2)),
v = 0.3 + 0.7 * as.integer(cat2) / length(levels(cat2))))
manual.colour <- mydata3 %>% arrange(cat1, cat2) %>%
select(interacted.variable, colour) %>% unique()
colour.vector <- manual.colour$colour
names(colour.vector) <- manual.colour$interacted.variable
rm(manual.colour)
ggplot(mydata3,
aes(x = time, y = value, colour = interacted.variable)) +
geom_point(size = 15) +
scale_colour_manual(name = "",
values = colour.vector,
breaks = names(colour.vector),
guide = guide_legend(byrow = TRUE)) +
theme(legend.position = "bottom")
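If you need this in more than one place, the steps above could be wrapped into a small helper. This is only a sketch: the function name interaction_colours() and its arguments are my own invention, and it builds the colours for every possible combination of levels rather than only the ones present in the data.

```r
# Hypothetical helper (not part of ggplot2): returns a named colour vector
# with one entry per combination of the levels of f1 and f2
interaction_colours <- function(f1, f2, sep = "-") {
  f1 <- as.factor(f1)
  f2 <- as.factor(f2)
  n1 <- nlevels(f1)
  n2 <- nlevels(f2)
  # all pairs of level indices; the first column of expand.grid varies fastest
  idx <- expand.grid(i2 = seq_len(n2), i1 = seq_len(n1))
  cols <- hsv(h = idx$i1 / n1,
              s = 0.3 + 0.7 * idx$i2 / n2,
              v = 0.3 + 0.7 * idx$i2 / n2)
  # names match the labels produced by interaction(f1, f2, sep = sep)
  names(cols) <- paste(levels(f1)[idx$i1], levels(f2)[idx$i2], sep = sep)
  cols
}

colour.vector <- interaction_colours(factor(c(1, 1, 2, 2)),
                                     factor(c("a", "b", "a", "b")))
colour.vector
```

The result can be passed to scale_colour_manual(values = colour.vector, breaks = names(colour.vector)) exactly as before. Because the vector is named, ggplot2 matches colours to levels by name, so the order no longer matters.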

Related

R - (ggplot2 library) - Legends not showing on graphs

What I'm doing
I'm using a library for R called ggplot2, which allows for a lot of different options for creating graphics and other things. I'm using that to display two different data sets on one graph with different colours for each set of data I want to display.
The Problem
I'm also trying to get a legend to show up in my graph that will tell the user which set of data corresponds to which colour. So far, I've not been able to get it to show.
What I've tried
I've set legend.position to top/bottom/left/right, to make sure nothing was setting its position to "none" by default, which would have hidden it.
The Code
# PDF/Plot generation
pdf("activity-plot.pdf")
ggplot(data.frame("Time"=times), aes(x=Time)) +
#Data Set 1
geom_density(fill = "#1A3552", colour = "#4271AE", alpha = 0.8) +
geom_text(x=mean(times)-1, y=max(density(times)$y/2), label="Mean {1} Activity", angle=90, size = 4) +
geom_vline(aes(xintercept=mean(times)), color="cyan", linetype="dashed", size=1, alpha = 0.5) +
# Data Set 2
geom_density(data=data.frame("Time"=timesSec), fill = "gray", colour = "orange", alpha = 0.8) +
geom_text(x=mean(timesSec)-1, y=max(density(timesSec)$y/2), label="Mean {2} Activity", angle=90, size = 4) +
geom_vline(aes(xintercept=mean(timesSec)), color="orange", linetype="dashed", size=1, alpha = 0.5) +
# Main Graph Info
labs(title="Activity in the past 48 hours", subtitle="From {DATE 1} to {DATE 2}", caption="{LOCATION}") +
scale_x_continuous(name = "Time of Day", breaks=seq(c(0:23))) +
scale_y_continuous(name = "Activity") +
theme(legend.position="top")
dev.off()
Result
As pointed out by @Ben, you should pass the colour inside aes() to get the legend displayed.
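A minimal sketch of that point, assuming simulated values for times and timesSec and made-up legend labels "Set 1" / "Set 2": a constant string mapped inside aes() becomes a legend entry, and the actual colour is then chosen in the manual scale.

```r
library(ggplot2)

# Simulated stand-ins for the vectors in the question (an assumption)
set.seed(1)
times    <- sample(1:24, 200, replace = TRUE)
timesSec <- sample(1:24, 200, replace = TRUE)

p <- ggplot() +
  # fill is mapped inside aes(), so ggplot2 builds a legend for it
  geom_density(data = data.frame(Time = times),
               aes(x = Time, fill = "Set 1"), alpha = 0.8) +
  geom_density(data = data.frame(Time = timesSec),
               aes(x = Time, fill = "Set 2"), alpha = 0.8) +
  # the real colours are assigned to the labels here
  scale_fill_manual(name = NULL,
                    values = c("Set 1" = "#1A3552", "Set 2" = "gray")) +
  theme(legend.position = "top")
p
```

A fill set outside aes(), as in the question, is a fixed property of the layer, so there is nothing for a legend to describe.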
However, a better way to get a ggplot is to merge your two values "Time" and "Timesec" into a single dataframe and reshape your dataframe into a longer format. Here, to illustrate this, I created this dummy dataframe:
Time = sample(1:24, 200, replace = TRUE)
Timesec = sample(1:24, 200, replace = TRUE)
df <- data.frame(Time, Timesec)
Time Timesec
1 22 23
2 21 9
3 19 9
4 10 6
5 7 24
6 15 9
... ... ...
So, the first step is to reshape your dataframe into a longer format. Here, I'm using the pivot_longer() function from the tidyr package:
library(tidyr)
library(dplyr)
df %>% pivot_longer(everything(), names_to = "var",values_to = "val")
# A tibble: 400 x 2
var val
<chr> <int>
1 Time 22
2 Timesec 23
3 Time 21
4 Timesec 9
5 Time 19
6 Timesec 9
7 Time 10
8 Timesec 6
9 Time 7
10 Timesec 24
# … with 390 more rows
To add geom_vline and geom_text based on the mean of your values, an easy way is to create a second dataframe holding the mean and the maximum density needed for the plot:
library(tidyr)
library(dplyr)
df_lab <- df %>% pivot_longer(everything(), names_to = "var",values_to = "val") %>%
group_by(var) %>%
summarise(Mean = mean(val),
Density = max(density(val)$y))
# A tibble: 2 x 3
var Mean Density
<chr> <dbl> <dbl>
1 Time 11.6 0.0555
2 Timesec 12.1 0.0517
So, using df and df_lab, you can generate your entire plot. Here, we pass the colour and fill arguments inside aes() and use scale_color_manual and scale_fill_manual to set the appropriate colours:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>% pivot_longer(everything(), names_to = "var",values_to = "val") %>%
ggplot(aes(x = val, fill = var, colour = var))+
geom_density(alpha = 0.8)+
scale_color_manual(values = c("#4271AE", "orange"))+
scale_fill_manual(values = c("#1A3552", "gray"))+
geom_vline(inherit.aes = FALSE, data = df_lab,
aes(xintercept = Mean, color = var), linetype = "dashed", size = 1,
show.legend = FALSE)+
geom_text(inherit.aes = FALSE, data = df_lab,
aes(x = Mean-0.5, y = Density/2, label = var, color = var), angle = 90,
show.legend = FALSE)+
labs(title="Activity in the past 48 hours", subtitle="From {DATE 1} to {DATE 2}", caption="{LOCATION}") +
scale_x_continuous(name = "Time of Day", breaks=seq(c(0:23))) +
scale_y_continuous(name = "Activity") +
theme(legend.position="top")
Does this answer your question?

Geom_freqpoly with Predefined Count

I can plot geom_freqpoly without problems using the number of observations
ggplot(data=demo) +
geom_freqpoly(mapping=aes(x = value))
But I'd like to use the precalculated observation count contained in the data.
I tried using stat = "identity", but it doesn't give the same graph.
ggplot(data=demo) +
geom_freqpoly(mapping=aes(x = value, y = cnt), stat = "identity")
This is my sample data
demo <- tribble(
~value, ~cnt,
.25, 20,
.25, 30,
.1, 40
)
TL;DR: You didn't get the graph you want, because the data of pre-calculated counts you passed to ggplot was NOTHING like what was used to produce the freqpoly graph.
Since you didn't include code for the original demo used to generate graph 1, I'll venture a guess:
demo.orig <- data.frame(value = c(0.25, 0.25, 0.1))
p <- ggplot(demo.orig, aes(x = value)) +
geom_freqpoly()
p # show plot to verify its appearance, which matches the graph in the question
layer_data(p) # look at the calculated data used by geom_freqpoly
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
y count x xmin xmax width density ncount ndensity PANEL group colour size linetype alpha
1 0 0 0.09310345 0.09051724 0.09568966 0.005172414 0.00000 0.0 0.0 1 -1 black 0.5 1 NA
2 1 1 0.09827586 0.09568966 0.10086207 0.005172414 64.44444 0.5 0.5 1 -1 black 0.5 1 NA
3 0 0 0.10344828 0.10086207 0.10603448 0.005172414 0.00000 0.0 0.0 1 -1 black 0.5 1 NA
... (omitted to conserve space)
30 0 0 0.24310345 0.24051724 0.24568966 0.005172414 0.00000 0.0 0.0 1 -1 black 0.5 1 NA
31 2 2 0.24827586 0.24568966 0.25086207 0.005172414 128.88889 1.0 1.0 1 -1 black 0.5 1 NA
32 0 0 0.25344828 0.25086207 0.25603448 0.005172414 0.00000 0.0 0.0 1 -1 black 0.5 1 NA
From a small dataframe with only two unique values, stat_bin generated a much larger dataframe with the x-axis split into 30 bins (the default number), and count / y = 0 everywhere except for the two bins containing the original values.
> geom_freqpoly
function (mapping = NULL, data = NULL, stat = "bin", position = "identity",
..., na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
{
params <- list(na.rm = na.rm, ...)
if (identical(stat, "bin")) {
params$pad <- TRUE
}
layer(data = data, mapping = mapping, stat = stat, geom = GeomPath,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = params)
}
A quick check by printing geom_freqpoly to console shows that its underlying geom is simply GeomPath, which plots x/y pairs in sequential order.
In other words, if you want to get the peaks from graph 1, you need to provide a similar dataset, with rows indicating where y should drop to 0. While it's certainly possible to calculate this by digging into the code for StatBin$compute_group, I'd think it's simpler to expand from the data of pre-calculated counts and let ggplot do its normal job:
demo %>%
tidyr::uncount(cnt) %>%
ggplot(aes(x = value)) +
geom_freqpoly() +
theme_minimal()
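In case tidyr::uncount() is unfamiliar, here is a small sketch of what it does to the sample data from the question (rebuilt here as a plain data.frame):

```r
library(tidyr)

demo <- data.frame(value = c(0.25, 0.25, 0.1),
                   cnt   = c(20, 30, 40))

# uncount() repeats each row cnt times and then drops the cnt column
expanded <- uncount(demo, cnt)
nrow(expanded)          # one row per original observation, i.e. sum(demo$cnt)
table(expanded$value)   # how many observations each value now has
```

After this, the data looks like raw observations again, so geom_freqpoly() can bin it as usual. The downside is memory: the expanded dataframe has as many rows as the total count.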
Edit: solution without fully expanding dataframe of aggregated counts
Sample dataset with 2 groups:
demo <- data.frame(value = c(0.25, 0.5, 0.1, 0.25, 0.75, 0.1),
cnt = c(5, 2, 4, 3, 8, 7) * 10e8,
group = rep(c("a", "b"), each = 3))
Code:
library(ggplot2)
library(dplyr)
demo %>%
rename(x = value, y = cnt) %>% # rename here so approach below can be easily applied
# to other datasets with different column names
tidyr::nest(data = c(x, y)) %>% # nest to apply same approach for each group
mutate(data = purrr::map(
data,
function(d) ggplot2:::bin_vector( # cut x's range into appropriate bins
x = d$x,
bins = ggplot2:::bin_breaks_bins(
x_range = range(d$x),
bins = 30), # default bin count is 30; change if desired
pad = TRUE) %>%
select(x, xmin, xmax) %>%
# place y counts into the corresponding x bins (this is probably similar
# to interval join, but I don't have that package installed on my machine)
tidyr::crossing(d %>% rename(x2 = x)) %>%
mutate(y = ifelse(x2 >= xmin & x2 < xmax, y, 0)) %>%
select(-x2) %>%
group_by(x) %>%
filter(y == max(y)) %>%
ungroup() %>%
unique())) %>%
tidyr::unnest(cols = c(data)) %>% # unnest to get one flat dataframe back
ggplot(aes(x = x, y = y, colour = group)) + # plot as per normal
geom_path() +
theme_bw()
# package versions used: dplyr 1.0.0, ggplot2 3.3.1, tidyr 1.1.0, purrr 0.3.4
Based on the similar problem for histograms, the solution seems to be as simple as using the weight aesthetic: stat_bin then sums the weights in each bin instead of counting rows.
Using the sample data from the other answer, the solution would be
demo <- data.frame(value = c(0.25, 0.5, 0.1, 0.25, 0.75, 0.1),
cnt = c(5, 2, 4, 3, 8, 7) * 10e8,
group = rep(c("a", "b"), each = 3))
ggplot(demo, aes(value, weight = cnt, color = group)) + geom_freqpoly()
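To check that weight really does what we want, a quick sketch that inspects the binned data with layer_data(), using the three-row sample from the question (rebuilt as a plain data.frame, with bins = 30 spelled out to silence the message):

```r
library(ggplot2)

demo <- data.frame(value = c(0.25, 0.25, 0.1),
                   cnt   = c(20, 30, 40))

p <- ggplot(demo, aes(value, weight = cnt)) +
  geom_freqpoly(bins = 30)

binned <- layer_data(p)
# the non-zero bins should hold the summed weights (40 and 50 here)
binned[binned$count > 0, c("x", "count")]
sum(binned$count)   # total equals sum(demo$cnt)
```

Both rows with value 0.25 land in the same bin, so their counts are added together, which is the same result you would get from expanding the data first.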

Plot grouped barplot with absolute and percent values + labels

I am quite new to R, and especially to ggplot. For my next result I think I have to switch from plot() to ggplot(), and this is where I need your help:
I have a dataframe with numeric values. One column is an absolute number, the other is the corresponding percentage value. I have three such pairs of columns, for the indicators a, b and c.
The row names identify the observations (seven in the sample below) and are stored in the first column "X".
I want to plot them in a kind of grouped barplot, where the absolute and percent columns sit next to each other for each of the 3 indicators.
Sample dataframe:
df = data.frame(X = c("e 1","e 1,5","e 2","e 2,5","e 3","e 3,5","e 4"),
a_abs=c(-0.3693,-0.0735,-0.019,0.0015,0,-0.0224,-0.0135),
a_per=c(-0.4736,-0.0943,-0.0244,0.0019,0,-0.0287,-0.0173),
b_abs=c(-0.384,-0.0733,-0.0173,0.0034,0,-0.0204,-0.0179),
b_per=c(-0.546,-0.1042,-0.0246,0.0048,0,-0.029,-0.0255),
c_abs=c(-0.3876,-0.0738,-0.019,0.0015,0,-0.0225,-0.0137),
c_per=c(-0.4971,-0.0946,-0.0244,0.0019,0,-0.0289,-0.0176))
Thanks to @jonspring I got the following plot by using this code:
df3 <- df %>%
gather(column, value, -X) %>%
mutate(group = str_sub(column, end = 2),
stat = str_sub(column, start = 4)) %>%
select(-column) %>%
spread(stat, value) %>%
mutate(combo_label = paste(sep="\n",
scales::comma(abs, accuracy = 0.001),
scales::percent(per, accuracy = 0.01)))
df3$group = gsub(df3$group,pattern = "CK",replacement = "Cohen's\nKappa")
df3$group = gsub(df3$group,pattern = "JA",replacement = "Jaccard")
df3$group = gsub(df3$group,pattern = "KA",replacement = "Krippen-\ndorff's Alpha")
crg = ifelse(df3$abs< 0,"red","darkgreen")
ggplot(df3, aes(group, abs, label = combo_label)) +
geom_segment(aes(xend = group,
yend = 0),
color = crg) +
geom_point() +
geom_text(vjust = 1.5,
size = 3,
lineheight = 1.2) +
scale_y_continuous(expand = c(0.2,0)) +
facet_grid(~X) +
labs(x= "Exponent", y = "Wert")
plot output
When I zoom in so that the positive values are visible, the labels are drawn on top of the segments. How can I place them above or below the point, depending on whether the value is positive or negative?
Zoom with coord_cartesian(ylim = c(-0.015,0.005))
zoomed plot
Thank you for your helping hands.
EDIT: I found the solution already. As with the colour change from red to green, I used ifelse() for the vjust parameter.
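The EDIT describes the fix only in words, so here is a minimal sketch of it. The toy data frame and the offsets 1.5 / -0.5 are my own assumptions, and I map both the colour and vjust inside aes() rather than precomputing a vector like crg:

```r
library(ggplot2)

# Toy data (an assumption): one positive and two negative values
toy <- data.frame(group = c("a", "b", "c"),
                  abs   = c(-0.0135, 0.0015, -0.0224))
toy$label <- format(toy$abs)

p <- ggplot(toy, aes(group, abs, label = label)) +
  geom_segment(aes(xend = group, yend = 0, colour = abs < 0)) +
  geom_point() +
  # vjust > 1 pushes the label below the point, vjust < 0 pushes it above
  geom_text(aes(vjust = ifelse(abs < 0, 1.5, -0.5)), size = 3) +
  scale_colour_manual(values = c("TRUE" = "red", "FALSE" = "darkgreen"),
                      guide = "none")
p
```

Computing the condition inside aes() keeps it tied to the rows of the data, so it stays correct if the dataframe is later filtered or reordered.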
There are many ways to display this sort of data with ggplot. I highly recommend you check out https://r4ds.had.co.nz/data-visualisation.html if you haven't already.
One suggestion you'll find there is that ggplot almost always works better if you first convert your data into long (aka "tidy") form. This puts each of the dimensions of the data into its own column, so that you can map the dimension to a visual aesthetic. Here's one way to do that:
library(tidyverse)
df2 <- df %>%
gather(column, value, -X) %>%
mutate(group = str_sub(column, end = 1),
stat = str_sub(column, start = 3),
value_label = if_else(stat == "per",
scales::percent(value, accuracy = 0.1),
scales::comma(value, accuracy = 0.01)))
Now, the group a/b/c is in its own column, as is the type of data abs/per, the values are all together in one column, and we also have text labels that suit the type of data.
> head(df2)
X column value group stat value_label
1 e 1 a_abs -0.3693 a abs -0.37
2 e 1,5 a_abs -0.0735 a abs -0.07
3 e 2 a_abs -0.0190 a abs -0.02
4 e 2,5 a_abs 0.0015 a abs 0.00
5 e 3 a_abs 0.0000 a abs 0.00
6 e 3,5 a_abs -0.0224 a abs -0.02
With that out of the way, it's simpler to try out different combinations of ggplot options, which can help highlight different comparisons within the data.
For instance, if you want to compare the different observations within each group, you could put each group into a facet, and each observation along the x axis:
ggplot(df2, aes(X, value, label = value_label)) +
geom_segment(aes(xend = X, yend = 0), color = "blue") +
geom_point() +
geom_text(vjust = 2, size = 2) +
facet_grid(stat~group)
Or if you want to highlight how the different groups compared within each observation, you could swap them, like this:
ggplot(df2, aes(group, value, label = value_label)) +
geom_segment(aes(xend = group, yend = 0), color = "blue") +
geom_point() +
geom_text(vjust = 2, size = 2) +
facet_grid(stat~X)
You might also try combining the abs and per data, since they only vary slightly based on the different denominators applicable to each group and/or observation. To do that, it might be simpler to transform the data to keep each abs and per together:
df3 <- df %>%
gather(column, value, -X) %>%
mutate(group = str_sub(column, end = 1),
stat = str_sub(column, start = 3)) %>%
select(-column) %>%
spread(stat, value) %>%
mutate(combo_label = paste(sep="\n",
scales::comma(abs, accuracy = 0.01),
scales::percent(per, accuracy = 0.1)))
ggplot(df3, aes(group, abs, label = combo_label)) +
geom_segment(aes(xend = group, yend = 0), color = "blue") +
geom_point() +
geom_text(vjust = 1.5, size = 2, lineheight = 0.8) +
scale_y_continuous(expand = c(0.2,0)) +
facet_grid(~X)

Free colour scales in facet_grid

Say I have the following data frame:
# Load the packages used below (%>% and mutate() come from dplyr)
library(dplyr)
library(ggplot2)
# Set seed for RNG
set.seed(33550336)
# Create toy data frame
loc_x <- c(a = 1, b = 2, c = 3)
loc_y <- c(a = 3, b = 2, c = 1)
scaling <- c(temp = 100, sal = 10, chl = 1)
df <- expand.grid(loc_name = letters[1:3],
variables = c("temp", "sal", "chl"),
season = c("spring", "autumn")) %>%
mutate(loc_x = loc_x[loc_name],
loc_y = loc_y[loc_name],
value = runif(nrow(.)),
value = value * scaling[variables])
which looks like,
# > head(df)
# loc_name variables season loc_x loc_y value
# 1 a temp spring 1 3 86.364697
# 2 b temp spring 2 2 35.222573
# 3 c temp spring 3 1 52.574082
# 4 a sal spring 1 3 0.667227
# 5 b sal spring 2 2 3.751383
# 6 c sal spring 3 1 9.197086
I want to plot these data in a facet grid using variables and season to define panels, like this:
g <- ggplot(df) + geom_point(aes(x = loc_name, y = value), size = 5)
g <- g + facet_grid(variables ~ season)
g
As you can see, different variables have very different scales. So, I use scales = "free" to account for this.
g <- ggplot(df) + geom_point(aes(x = loc_name, y = value), size = 5)
g <- g + facet_grid(variables ~ season, scales = "free")
g
Mucho convenient. Now, say I want to do this, but plot the points by loc_x and loc_y and have value represented by colour instead of y position:
g <- ggplot(df) + geom_point(aes(x = loc_x, y = loc_y, colour = value),
size = 5)
g <- g + facet_grid(variables ~ season, scales = "free")
g <- g + scale_colour_gradient2(low = "#3366CC",
mid = "white",
high = "#FF3300",
midpoint = 50)
g
Notice that the colour scales are not free and, like the first figure, values for sal and chl cannot be read easily.
My question: is it possible to do an equivalent of scales = "free" but for colour, so that each row (in this case) has a separate colour bar? Or, do I have to plot each variable (i.e., row in the figure) and patch them together using something like cowplot?
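Using group_split() from dplyr (only in the development version when this was written; as far as I know it has been in released dplyr since 0.8.0):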
library(dplyr)
library(purrr)
library(ggplot2)
library(cowplot)
df %>%
group_split(variables, season) %>%
map(
~ggplot(., aes(loc_x, loc_y, color = value)) +
geom_point(size = 5) +
scale_colour_gradient2(
low = "#3366CC",
mid = "white",
high = "#FF3300",
midpoint = median(.$value)
) +
facet_grid(~ variables + season, labeller = function(x) label_value(x, multi_line = FALSE))
) %>%
plot_grid(plotlist = ., align = 'hv', ncol = 2)

Enforce same color palette for `color` and `fill` of a subset of data

Having the following sample dataset:
set.seed(20)
N <- 20
df1 <- data.frame(x = rnorm(N),
y = rnorm(N),
grp = paste0('grp_', sample(1:500, N, T)),
lab = sample(letters, N, T))
# x y grp lab
# 1 1.163 0.237 grp_104 w
# 2 -0.586 -0.144 grp_448 y
# 3 1.785 0.722 grp_31 m
# 4 -1.333 0.370 grp_471 z
# 5 -0.447 -0.242 grp_356 o
I want to plot all points but label only a subset of them (say, those with df1$x > 1). It works fine when I use the same color=grp aesthetic for both geom_point and geom_text:
ggplot(df1, aes(x=x,y=y,color=grp))+
geom_point(size=4) +
geom_text(aes(label=lab),data=df1[df1$x>1,],size=5,hjust=1,vjust=1)+
theme(legend.position="none")
But if I want to change points design to fill=grp, colors of labels do not match anymore:
ggplot(df1, aes(x=x,y=y))+
geom_point(aes(fill=grp),size=4,shape=21) +
geom_text(aes(label=lab,color=grp),data=df1[df1$x>1,],size=5,hjust=1,vjust=1)+
theme(legend.position="none")
I understand palette is different because levels of the subset are not the same as levels of the whole dataset. But what would be the simplest solution to enforce using the same palette?
The issue arises from different factor levels for the text and fill colours. We can avoid dropping unused factor levels by using drop = FALSE inside scale_*_discrete:
ggplot(df1, aes(x=x,y=y))+
geom_point(aes(fill=grp),size=4,shape=21) +
geom_text(aes(label=lab,color=grp),data=df1[df1$x>1,],size=5,hjust=1,vjust=1)+
theme(legend.position="none") +
scale_fill_discrete(drop = FALSE) +
scale_colour_discrete(drop = FALSE)
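To see why this works, here is a small base R sketch (the level names are made up): subsetting a factor keeps all of its levels, and drop = FALSE tells the scale to allocate a colour to every level, including those with no rows left in the subset.

```r
# A factor with three levels (hypothetical names)
grp <- factor(c("grp_1", "grp_2", "grp_3"))

# Subsetting keeps the full set of levels, even those no longer present
sub <- grp[c(1, 3)]
levels(sub)               # still three levels
levels(droplevels(sub))   # only the two levels actually present
```

By default a discrete scale behaves like the droplevels() call and spreads the palette over the levels it actually sees, which is why the text and fill colours drift apart. Note this only helps if grp is a factor; a character column carries no levels to keep.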
Update
With your real data we need to make sure that grp is in fact a factor.
# Load sample data
load("df1.Rdat")
# Make sure `grp` is a factor
library(tidyverse)
df1 <- df1 %>% mutate(grp = factor(grp))
# Or in base R
# df1$grp = factor(df1$grp)
# Same as before
ggplot(df1, aes(x=x,y=y))+
geom_point(aes(fill=grp),size=4,shape=21) +
geom_text(aes(label=lab,color=grp),data=df1[df1$x>1,],size=5,hjust=1,vjust=1)+
theme(legend.position="none") +
scale_fill_discrete(drop = FALSE) +
scale_colour_discrete(drop = FALSE)
One way is to leave the colour / fill palettes alone, & set all unwanted labels to be transparent instead:
ggplot(df1, aes(x = x, y = y)) +
geom_point(aes(fill = grp), size = 4, shape = 21) +
geom_text(aes(label = lab, color = grp,
alpha = x > 1),
size = 5, hjust = 1, vjust = 1) +
scale_alpha_manual(values = c("TRUE" = 1, "FALSE" = 0)) +
theme(legend.position = "none")
