I have created a line chart (plot) in R with labels on each data point. Due to the large number of data points, the plot becomes very crowded with labels. I would like to apply the labels only to the last N (say 4) data points. I have tried subset and tail in the geom_label_repel function, but either could not figure them out or got an error message. My data set consists of 99 values, spread over 3 groups (KPI).
I have the following code in R:
library(ggplot2)
library(ggrepel)
data.trend <- read.csv(file=....)
plot.line <- ggplot(data=data.trend, aes(x = Version, y = Value, group = KPI, color = KPI)) +
geom_line(aes(group = KPI), size = 1) +
geom_point(size = 2.5) +
# Labels defined here
geom_label_repel(
aes(Version, Value, fill = factor(KPI), label = sprintf('%0.1f%%', Value)),
box.padding = unit(0.35, "lines"),
point.padding = unit(0.4, "lines"),
segment.color = 'grey50',
show.legend = FALSE
)
In all fairness, I am quite new to R. Maybe I am missing something basic.
Thanks in advance.
The simplest approach is to set the data = parameter in geom_label_repel to only include the points you want labeled.
Here's a reproducible example:
set.seed(1235)
data.trend <- data.frame(Version = rnorm(25), Value = rnorm(25),
group = sample(1:2,25,T),
KPI = sample(1:2,25,T))
ggplot(data=data.trend, aes(x = Version, y = Value, group = KPI, color = KPI)) +
geom_line(aes(group = KPI), size = 1) +
geom_point(size = 2.5) +
geom_label_repel(aes(Version, Value, fill = factor(KPI), label = sprintf('%0.1f%%', Value)),
data = tail(data.trend, 4),
box.padding = unit(0.35, "lines"),
point.padding = unit(0.4, "lines"),
segment.color = 'grey50',
show.legend = FALSE)
Unfortunately, this messes slightly with the repel algorithm, making the label placement suboptimal with respect to the other points which are not labelled (you can see in the above figure that some points get covered by labels).
So, a better approach is to use color and fill to simply make the unwanted labels invisible (by setting both color and fill to NA for labels you want to hide):
ggplot(data=data.trend, aes(x = Version, y = Value, group = KPI, color = KPI)) +
geom_line(aes(group = KPI), size = 1) +
geom_point(size = 2.5) +
geom_label_repel(aes(Version, Value, fill = factor(KPI), label = sprintf('%0.1f%%', Value)),
box.padding = unit(0.35, "lines"),
point.padding = unit(0.4, "lines"),
show.legend = FALSE,
color = c(rep(NA,21), rep('grey50',4)),
fill = c(rep(NA,21), rep('lightblue',4)))
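A variation on the same idea, as a sketch rather than a definitive recipe: instead of hard-coding 21 and 4, derive the cut-off from nrow() and set the unwanted labels to the empty string. As far as I understand ggrepel's documented behaviour, empty labels are not drawn but their points still repel the remaining labels. The names n_label, is_last and the label column are my own, not from the question.

```r
library(ggplot2)
library(ggrepel)

set.seed(1235)
data.trend <- data.frame(Version = rnorm(25), Value = rnorm(25),
                         KPI = factor(sample(1:2, 25, TRUE)))

n_label <- 4  # hypothetical: number of trailing rows that keep their label
is_last <- seq_len(nrow(data.trend)) > nrow(data.trend) - n_label

# Rows outside the last n_label get an empty label, which ggrepel does not draw
data.trend$label <- ifelse(is_last, sprintf("%0.1f%%", data.trend$Value), "")

p <- ggplot(data.trend, aes(Version, Value, group = KPI, color = KPI)) +
  geom_line() +
  geom_point(size = 2.5) +
  geom_label_repel(aes(label = label, fill = KPI),
                   color = "black", show.legend = FALSE)
p
```

This keeps all rows in the label layer, so the cut-off adapts automatically when the data grows.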
If you want to show just the last label, using group_by and filter may work:
data = data.trend %>% group_by(KPI) %>% filter(Version == max(Version))
Full example:
suppressPackageStartupMessages(library(dplyr))
library(ggplot2)
library(ggrepel)
set.seed(1235)
data.trend <- data.frame(Version = rnorm(25), Value = rnorm(25),
group = sample(1:2,25,T),
KPI = sample(1:2,25,T))
ggplot(data = data.trend, aes(x = Version, y = Value, group = KPI, color = KPI)) +
geom_line(aes(group = KPI), size = 1) +
geom_point(size = 2.5) +
# Labels defined here
geom_label_repel(
data = data.trend %>% group_by(KPI) %>% filter(Version == max(Version)),
aes(Version, Value, fill = factor(KPI), label = sprintf('%0.1f%%', Value)),
color = "black",
fill = "white")
Or, if you want to show 4 random labels per KPI, use data.trend %>% group_by(KPI) %>% sample_n(4):
suppressPackageStartupMessages(library(dplyr))
library(ggplot2)
library(ggrepel)
set.seed(1235)
data.trend <- data.frame(Version = rnorm(25), Value = rnorm(25),
group = sample(1:2,25,T),
KPI = as.factor(sample(1:2,25,T)))
ggplot(data = data.trend, aes(x = Version, y = Value, group = KPI, color = KPI)) +
geom_line(aes(group = KPI), size = 1) +
geom_point(size = 2.5) +
# Labels defined here
geom_label_repel(
data = data.trend %>% group_by(KPI) %>% sample_n(4),
aes(Version, Value, fill = KPI, label = sprintf('%0.1f%%', Value)),
color = "black", show.legend = FALSE
)
Created on 2021-08-27 by the reprex package (v2.0.1)
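For the original request (labels on the last N points of each KPI rather than one or a random four), the same group_by pattern can probably be combined with dplyr's slice_max. This is only a sketch: the data below is a made-up stand-in with 3 KPIs and 33 versions each, and n_last and label_data are names I picked for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data: 3 KPIs x 33 versions = 99 values
set.seed(1)
data.trend <- data.frame(
  Version = rep(1:33, times = 3),
  KPI     = rep(c("A", "B", "C"), each = 33),
  Value   = runif(99, 0, 100)
)

n_last <- 4  # how many trailing versions to label per KPI

# Keep the n_last rows with the highest Version within each KPI
label_data <- data.trend %>%
  group_by(KPI) %>%
  slice_max(order_by = Version, n = n_last, with_ties = FALSE) %>%
  ungroup()

label_data
```

The result can then be passed as data = label_data to geom_label_repel, in the same way as the filtered data frames above.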
Related
With this data:
df <- data.frame(value =c(20, 50, 90),
group = c(1, 2,3))
I can get a bar chart:
df %>% ggplot(aes(x = group, y = value, fill = value)) +
geom_col() +
coord_flip()+
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
But I would like to have the colors of those bars to vary according to their corresponding values in value.
I have managed to change them using geom_raster:
ggplot() +
geom_raster(aes(x = c(0:20), y = .9, fill = c(0:20)),
interpolate = TRUE) +
geom_raster(aes(x = c(0:50), y = 2, fill = c(0:50)),
interpolate = TRUE) +
geom_raster(aes(x = c(0:90), y = 3.1, fill = c(0:90)),
interpolate = TRUE) +
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
This approach is not efficient when I have many groups in real data. Any suggestions to get it done more efficiently would be appreciated.
I found the accepted answer to a previous similar question, but it notes that "These numbers needs to be adjusted depending on the number of x values and range of y". I was looking for an approach where I do not have to adjust numbers based on the data. David Gibson's answer fits my purpose.
It does not look like this is supported natively in ggplot. I was able to get something close by adding additional rows (ranging from 0 to value) to the data, then using geom_tile and separating the tiles by specifying width.
library(tidyverse)
df <- data.frame(value = c(20, 50, 90),
group = c(1, 2, 3))
df_expanded <- df %>%
rowwise() %>%
summarise(group = group,
value = list(0:value)) %>%
unnest(cols = value)
df_expanded %>%
ggplot() +
geom_tile(aes(
x = group,
y = value,
fill = value,
width = 0.9
)) +
coord_flip() +
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
If this is too pixelated you can increase the number of rows generated by replacing 0:value with seq(0, value, by = 0.1) inside list().
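A rough sketch of the finer-grained expansion, written with base R for the data step so it does not depend on a particular dplyr version. The names step and df_fine are mine, and setting height = step is my assumption about how to keep the tiles contiguous.

```r
library(ggplot2)

df <- data.frame(value = c(20, 50, 90),
                 group = c(1, 2, 3))

step <- 0.1  # hypothetical resolution of the gradient

# One row per (group, position along the bar), from 0 up to the bar's value
df_fine <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  data.frame(group = df$group[i],
             value = seq(0, df$value[i], by = step))
}))

p <- ggplot(df_fine) +
  geom_tile(aes(x = group, y = value, fill = value),
            width = 0.9, height = step) +
  coord_flip() +
  scale_fill_viridis_c(option = "C") +
  theme(legend.position = "none")
p
```

A smaller step gives a smoother gradient at the cost of more rows to draw.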
This is a real hack using ggforce. This package has a geom that can take color gradients but it is for a line segment. I've just increased the size to make the line segment look like a bar. I made all the bars the same length to get the correct gradient, then covered a portion of each bar over with the same color as the background color to make them appear to be the correct length. Had to hide the grid lines, however. :-)
library(ggforce)  # provides geom_link
df %>%
ggplot() +
geom_link(aes(x = 0, xend = max(value), y = group, yend = group, color = stat(index)), size = 30) +
geom_link(aes(x = value, xend = max(value), y = group, yend = group), color = "grey", size = 31) +
scale_color_viridis_c(option = "C") +
theme(legend.position = "none", panel.background = element_rect(fill = "grey"),
panel.grid = element_blank()) +
ylim(0.5, max(df$group)+0.5 )
I want to change the color of my boxplots based on their grouping value or individually instead of the fill value in ggplot. How can I do this? I need to define the fill variable to be able to get the groups subdivided into types, just as I intended, but I cannot seem to control the color anymore.
Here is some example data and how I define the plot:
library(ggplot2)
data <- data.frame( id = rep(1:120),
group = as.factor(rep(1:3, times = 40)),
type = as.factor(rep(1:2, times = 60)),
value = rnorm(120)+1)
ggplot(data, aes(x = group, y = value, fill = type)) +
geom_boxplot(aes(fill = type)) +
geom_point(aes(fill = type), position = position_jitterdodge(dodge.width = 0.65, jitter.width = 0.1, jitter.height = 0.1))
Which results in this image:
Now the colours are based on the "type" grouping. But I would like to control the 6 individual boxplots separately. For example, I want to colour them like this:
colors <- c("#b76e79", "#80a3dd",
"#a5bf9f", "#e3f0cd",
"#8a9a5b", "#ffdead")
Adding scale_fill_manual does not seem to work the way I want.
ggplot(data, aes(x = group, y = value, fill = type)) +
geom_boxplot(aes(fill = type)) +
geom_point(aes(fill = type), position = position_jitterdodge(dodge.width = 0.65, jitter.width = 0.1, jitter.height = 0.1)) +
scale_fill_manual(values = colors)
Any suggestions?
We create a new column by the interaction of 'group', 'type' and use that in scale_fill_manual
library(dplyr)
library(ggplot2)
data <- data %>%
mutate(grptype = interaction(group, type))
gg <- ggplot(data, aes(x = group, y = value, fill = grptype)) +
geom_boxplot(aes(fill = grptype)) +
geom_point(aes(fill = grptype), position = position_jitterdodge(dodge.width = 0.65, jitter.width = 0.1, jitter.height = 0.1))
colors <- c("#b76e79", "#80a3dd",
"#a5bf9f", "#e3f0cd",
"#8a9a5b", "#ffdead")
gg +
scale_fill_manual(name="grptype",
labels = levels(data$grptype),
values = setNames(colors, levels(data$grptype))) +
theme(legend.title = element_text(size=12, color = "black", face="bold"),
legend.justification=c(0,1),
legend.position=c(0.05, 0.95),
legend.background = element_blank(),
legend.key = element_blank())
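One thing to watch, shown here as a small base R sketch: by default interaction() varies its first factor fastest, so the level order (and therefore the order in which the colors are assigned) is not the left-to-right order of the boxes. Passing lex.order = TRUE is one way to make the two agree. The vectors below mirror the question's data; grptype_lex and named_colors are names I made up.

```r
group <- factor(rep(1:3, times = 40))
type  <- factor(rep(1:2, times = 60))

# Default: the first factor varies fastest
grptype <- interaction(group, type)
levels(grptype)
#> [1] "1.1" "2.1" "3.1" "1.2" "2.2" "3.2"

# lex.order = TRUE: levels follow group first, then type
grptype_lex <- interaction(group, type, lex.order = TRUE)
levels(grptype_lex)
#> [1] "1.1" "1.2" "2.1" "2.2" "3.1" "3.2"

colors <- c("#b76e79", "#80a3dd",
            "#a5bf9f", "#e3f0cd",
            "#8a9a5b", "#ffdead")

# Naming the colors makes the box-to-color assignment explicit
named_colors <- setNames(colors, levels(grptype_lex))
named_colors
```

Either ordering works with scale_fill_manual as long as the values vector is named by the levels actually used for fill.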
So I create a boxplot of data and then add a set of points over that data. I want my legend to show what the geom_point layer represents. Thanks!
ggplot(data = NULL) +
geom_boxplot(data = discuss_impact_by_county,
aes(x=reorder(State,discuss, FUN = median),y=discuss),
outlier.shape = NA) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
labs(x = "States") +
geom_point(data = by_state,
aes(x = State, y = discuss_happen_difference),
col = "red",
size = 3,
show.legend = TRUE)
If you want a legend you have to map on aesthetics. In your case map something on the color aes, i.e. move col="red" into aes() and use scale_color_manual to set the value and the legend label to be assigned to the color label "red".
As you have only one "category" of points you can simply do scale_color_manual(values = "red", labels = "We are red points") to set the color and label. In case you have multiple points with different colors it's best to make use of a named vector to assign the colors and legend labels to the right "color labels", i.e. use scale_color_manual(values = c(red = "red"), labels = c(red = "We are red points")).
Using some random example data try this:
library(ggplot2)
library(dplyr)
set.seed(42)
discuss_impact_by_county <- data.frame(
State = sample(LETTERS[1:4], 100, replace = TRUE),
discuss = runif(100, 1, 5)
)
by_state <- discuss_impact_by_county %>%
group_by(State) %>%
summarise(discuss_happen_difference = mean(discuss))
#> `summarise()` ungrouping output (override with `.groups` argument)
ggplot(data = NULL) +
geom_boxplot(data = discuss_impact_by_county,
aes(x=reorder(State,discuss, FUN = median),y=discuss),
outlier.shape = NA) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
labs(x = "States") +
geom_point(data = by_state,
aes(x = State, y = discuss_happen_difference, col = "red_points"),
size = 3,
show.legend = TRUE) +
scale_color_manual(values = "red", labels = "We are red points")
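To illustrate the named-vector variant with more than one kind of point, here is a hedged sketch that overlays both the per-state mean and median. The data is random, and the keys mean_pt and median_pt as well as the legend texts are invented for this example.

```r
library(ggplot2)

set.seed(42)
obs <- data.frame(
  State   = sample(LETTERS[1:4], 100, replace = TRUE),
  discuss = runif(100, 1, 5)
)

# Two hypothetical summaries to overlay on the boxplot
means   <- aggregate(discuss ~ State, data = obs, FUN = mean)
medians <- aggregate(discuss ~ State, data = obs, FUN = median)

p <- ggplot() +
  geom_boxplot(data = obs, aes(x = State, y = discuss), outlier.shape = NA) +
  # The strings inside aes() are just keys; the scale maps them to colors
  geom_point(data = means,
             aes(x = State, y = discuss, colour = "mean_pt"), size = 3) +
  geom_point(data = medians,
             aes(x = State, y = discuss, colour = "median_pt"), size = 3) +
  scale_colour_manual(
    name   = NULL,
    values = c(mean_pt = "red", median_pt = "blue"),
    labels = c(mean_pt = "State mean", median_pt = "State median")
  )
p
```

Because both vectors are named by the same keys, the colors and legend labels stay matched even if the layers are reordered.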
I would like the geom_text inside the geom_point to follow the re-positioning when applying position_dodge. That is, I would like to go from the code below:
Q <- as_tibble(data.frame(series = rep(c("diax","diay"),3),
value = c(3.25,3.30,3.31,3.36,3.38,3.42),
year = c(2018,2018,2019,2019,2020,2020))) %>%
select(year, series, value)
ggplot(data = Q, mapping = aes(x = year, y = value, color = series, label = sprintf("%.2f",value))) +
geom_point(size = 13) +
geom_text(vjust = 0.4,color = "white", size = 4, fontface = "bold", show.legend = FALSE)
which produces the following chart:
to the following change:
ggplot(data = Q, mapping = aes(x = year, y = value, color = series, label = sprintf("%.2f",value))) +
geom_point(size = 13, position = position_dodge(width = 1)) +
geom_text(position = position_dodge(width = 1), vjust = 0.4,
color = "white", size = 4, fontface = "bold",
show.legend = FALSE)
which produces the following chart:
The curious thing about this is the fact that exactly the same change works just fine if I change from geom_point to geom_bar:
ggplot(Q, aes(year, value, fill = factor(series), label = sprintf("%.2f",value))) +
geom_bar(stat = "identity", position = position_dodge(width = 1)) +
geom_text(color = "black", size = 4,fontface= "bold",
position = position_dodge(width = 1), vjust = 0.4, show.legend = FALSE)
This happens because the dodging is based on the group aesthetic, automatically set in this case to series because of the mapping to color. The issue is that the text layer has its own color ("white") and so the grouping is dropped. Manually set the grouping, and all is good:
ggplot(Q, aes(x = year, y = value, color = series, label = sprintf("%.2f",value), group = series)) +
geom_point(size = 13, position = position_dodge(width = 1)) +
geom_text(position = position_dodge(width = 1), vjust = 0.4, color = "white", size = 4,
fontface = "bold", show.legend = FALSE)
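An equivalent option, sketched under the assumption that only the text layer loses its grouping, is to set group = series in the aes() of geom_text alone and leave the global mapping untouched. The comparison via layer_data() at the end is just my way of checking that both layers end up at the same dodged x positions.

```r
library(ggplot2)

Q <- data.frame(year   = c(2018, 2018, 2019, 2019, 2020, 2020),
                series = rep(c("diax", "diay"), 3),
                value  = c(3.25, 3.30, 3.31, 3.36, 3.38, 3.42))

p <- ggplot(Q, aes(x = year, y = value, color = series,
                   label = sprintf("%.2f", value))) +
  geom_point(size = 13, position = position_dodge(width = 1)) +
  # group is restored for this layer only; the constant color stays white
  geom_text(aes(group = series), position = position_dodge(width = 1),
            vjust = 0.4, color = "white", size = 4, fontface = "bold",
            show.legend = FALSE)
p

# Both layers should now share the same dodged x positions
pts <- layer_data(p, 1)
txt <- layer_data(p, 2)
all.equal(as.numeric(pts$x), as.numeric(txt$x))
```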
One workaround would be the following. Since you cannot add labels on top of the data points using geom_text() right away, you may want to take a small detour. I first created a temporary graphic with geom_point(). Then I accessed the data frame that is used for drawing the graphic, where you can find the x and y values after dodging. Using them, I created a new data frame called temp which includes the axis information and the label information. Once I had this data frame, I could draw the expected outcome using temp. Make sure that you use inherit.aes = FALSE in geom_text() since you are using another data frame.
library(dplyr)
library(ggplot2)
g <- ggplot(data = Q, aes(x = year, y = value, color = series)) +
geom_point(size = 13, position = position_dodge(width = 1))
temp <- as.data.frame(ggplot_build(g)$data) %>%
select(x, y) %>%
arrange(x) %>%
mutate(label = sprintf("%.2f",Q$value))
ggplot(data = Q, aes(x = year, y = value, color = series)) +
geom_point(size = 13, position = position_dodge(width = 1)) +
geom_text(data = temp, aes(x = x, y = y, label = label),
color = "white", inherit.aes = FALSE)
I have a problem with a plot. I want to show the points only for group A, not for every name. Here is an example:
name <- c("a","b","c","d")
df <- data.frame(id = rep(1:5,3),
value = c(seq(50,58,2),seq(60,68,2),seq(70,78,2)),
name = c(rep("A",5),rep("B",5),rep("C",5)),
type = rep(c("a","b","c","d","r"),3))
df$name <- factor(df$name, levels = c("C","B","A"),ordered = TRUE)
ggplot(df, aes(id, value, fill = name,color = type))+
geom_area( position = 'identity', linetype = 1, size = 1 ,colour="black") +
geom_point(size = 8)+
guides(fill = guide_legend(override.aes = list(colour = NULL, shape = NA)))
If I am reading the question correctly, it seems that you want dots for the blue area only. In that case, you could subset the data and use it for geom_point.
ggplot(df, aes(id, value, fill = name,color = type))+
geom_area( position = 'identity', linetype = 1, size = 1 ,colour="black") +
geom_point(data = subset(df, name == "A"), size = 8) +
guides(fill = guide_legend(override.aes = list(colour = NULL, shape = NA)))