I've searched and tried a bunch of suggestions to be able to display a custom legend instead of the default one in a grouped scatter ggplot. I've tried this and this and following this among others.
For instance, let's say I have a df like this one:
df = data.frame(id = c("A", "A", "B", "C", "C", "C"),
value = c(1,2,1,2,3,4),
ref = c(1.5, 1.5, 1, 2,2,2),
min = c(0.5, 0.5, 1,2,2,2))
and I want to display the values of each id as round dots, but also put the reference values and minimum values for each id as a differently shaped dot, as follows:
p = ggplot(data = df) +
geom_point(aes(x = id, y = value, color = factor(id)), shape = 19, size = 6) +
geom_point(aes(x = id, y = ref, color = factor(id)), shape = 0, size = 8) +
geom_point(aes(x = id, y = min, color = factor(id)), shape = 2, size = 8) +
xlab("") +
ylab("Value")
#print(p)
Now all is fine, but my legend doesn't add anything to the interpretation of the plot, as the X axis and colors are enough to understand it. I know I can remove the legend via theme(legend.position = "none").
Instead, I would like to have a legend of what the actual shapes of each dot represent (e.g., filled round dot = value, triangle = min, square = ref).
Besides trying to set the scale values manually via scale_fill_manual, I tried something along these lines:
override.shape = shapes$shape
override.linetype = shapes$pch
guides(colour = guide_legend(override.aes = list(shape = override.shape, linetype = override.linetype)))...
....
I've also tried making a secondary plot, but not display it, using something suggested in one of the links pasted above:
shapes = data.frame(shape = c("value", "reference", "minimum"), pch = c(19,0,2), col = c("gray", "gray", "gray"))
p2 = ggplot(shapes, aes(shape, pch)) + geom_point()
#print(p2)
g_legend <- function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)
}
legend <- g_legend(p2)
library(gridExtra)
pp <- arrangeGrob(p1 ,legend,
widths=c(5/4, 1/4),
ncol = 2)
but then I get the error:
> legend <- g_legend(p2)
Error in tmp$grobs[[leg]] :
attempt to select less than one element in get1index
for which I did not find a working solution. Any suggestion on how I could show only a legend for the different dot shapes would be welcome.
Thank you
You can manually build a shape legend using scale_shape_manual:
library(ggplot2)
ggplot(data = df) +
geom_point(aes(x = id, y = value, color = factor(id), shape = 'value'), size = 6) +
geom_point(aes(x = id, y = ref, color = factor(id), shape = 'ref'), size = 8) +
geom_point(aes(x = id, y = min, color = factor(id), shape = 'min'), size = 8) +
scale_shape_manual(values = c('value' = 19, 'ref' = 0, 'min' = 2)) +
xlab("") +
ylab("Value")
Created on 2020-04-15 by the reprex package (v0.3.0)
But a better way to do this would be to reshape the df to a long format, and map each aes to a variable:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(-id) %>%
ggplot() +
geom_point(aes(x = id, y = value, color = factor(id), shape = name, size = name)) +
scale_shape_manual(values = c('value' = 19, 'ref' = 0, 'min' = 2)) +
scale_size_manual(values = c('value' = 6, 'ref' = 8, 'min' = 8)) +
xlab("") +
ylab("Value")
To remove the legend for the color use guide_none():
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(-id) %>%
ggplot() +
geom_point(aes(x = id, y = value, color = factor(id), shape = name, size = name)) +
scale_shape_manual(values = c('value' = 19, 'ref' = 0, 'min' = 2)) +
scale_size_manual(values = c('value' = 6, 'ref' = 8, 'min' = 8)) +
guides(color = guide_none()) +
xlab("") +
ylab("Value")
Created on 2020-04-16 by the reprex package (v0.3.0)
Data:
df = data.frame(id = c("A", "A", "B", "C", "C", "C"),
value = c(1,2,1,2,3,4),
ref = c(1.5, 1.5, 1, 2,2,2),
min = c(0.5, 0.5, 1,2,2,2))
You can tidy your data first using tidyr, and then map the aes shape to the new variable
library(tidyr)
df2 <- pivot_longer(df, -id)
ggplot(data = df2) +
geom_point(aes(x = id, y = value, shape = name), size = 6) +
xlab("") +
ylab("Value")
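As a rough sketch of how the long-format approach can be combined with a relabelled shape legend: the legend title "Point type" and the display labels below are illustrative choices, not something taken from the question. The shape codes and column names are the ones used above.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(id    = c("A", "A", "B", "C", "C", "C"),
                 value = c(1, 2, 1, 2, 3, 4),
                 ref   = c(1.5, 1.5, 1, 2, 2, 2),
                 min   = c(0.5, 0.5, 1, 2, 2, 2))

# Every non-id column becomes a (name, value) pair: 6 rows x 3 columns = 18 rows.
df_long <- pivot_longer(df, -id)

p <- ggplot(df_long, aes(x = id, y = value, color = id, shape = name)) +
  geom_point(size = 6) +
  scale_shape_manual(
    name   = "Point type",                        # legend title (illustrative)
    breaks = c("value", "ref", "min"),            # order of the legend entries
    labels = c("Value", "Reference", "Minimum"),  # legend text (illustrative)
    values = c(value = 19, ref = 0, min = 2)      # same shape codes as above
  ) +
  guides(color = "none") +  # the x axis already identifies each id
  labs(x = NULL, y = "Value")
```

Because shape is mapped inside aes(), ggplot2 builds the legend itself, so no legend has to be extracted from a second plot.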
library(ggplot2)
library(directlabels)
mydat <- structure(list(Name = c("Ana", "Josh", "Bart", "Ana", "Josh",
"Bart"), color_line = c("purple", "purple", "orange", "purple",
"purple", "orange"), x = c(0.864864864864865, 0.810810810810811,
0.472972972972973, 0.851351351351351, 0.702702702702703, 0.648648648648649
), y = c(0.702702702702703, 0.675675675675676, 0.797297297297297,
0.797297297297297, 0.72972972972973, 0.635135135135135), Class = c("A",
"A", "A", "B", "B", "B")), class = c("data.table", "data.frame"
), row.names = c(NA, -6L))
mydat
Name color_line x y Class
1: Ana purple 0.8648649 0.7027027 A
2: Josh purple 0.8108108 0.6756757 A
3: Bart orange 0.4729730 0.7972973 A
4: Ana purple 0.8513514 0.7972973 B
5: Josh purple 0.7027027 0.7297297 B
6: Bart orange 0.6486486 0.6351351 B
I have the above data set, and I plotted the results as follows:
g <- ggplot(mydat, aes(x = x, y = y, color = Class)) +
theme_classic() +
geom_line(mapping = aes(group = Name), color = mydat$color_line) +
geom_point() +
scale_color_manual(values=c("springgreen4", "royalblue3"))
g
Now, I would like to add the Name of each individual to each line. Something that might look like this:
The closest thing that I found is the angled.boxes in library(directlabels). You can see how it looks here.
However, when I tried the following, I got a different plot.
direct.label(g, "angled.boxes")
One option to achieve your desired result would be to use the geomtextpath package which adds lots of options to add direct labels to lines and even allows for curved text. For your use case you could simply replace geom_line by geomtextpath::geom_textline to add your labels.
Note: Additionally I slightly adjusted your code to make use of the color aes for the lines and the fill aes to color the points.
library(ggplot2)
library(geomtextpath)
pal_color <- c("purple", "purple", "orange")
names(pal_color) <- c("Ana", "Josh", "Bart")
pal_fill <- c("springgreen4", "royalblue3")
names(pal_fill) <- c("A", "B")
base <- ggplot(mydat, aes(x = x, y = y)) +
scale_color_manual(values = pal_color) +
scale_fill_manual(values= pal_fill) +
theme_classic() +
guides(color = "none")
base +
geomtextpath::geom_textline(
aes(group = Name, color = Name, label = Name), textcolour = "black") +
geom_point(aes(fill = Class), shape = 21, stroke = 0, size = 2)
Or using the offset and gap arguments you could add the labels on top of the lines:
base +
geomtextpath::geom_textline(
aes(group = Name, color = Name, label = Name),
offset = unit(5, "pt"), gap = FALSE, textcolour = "black") +
geom_point(aes(fill = Class), shape = 21, stroke = 0, size = 2)
It's not ideal, but I did this a long time ago using some math and manual adjustments
library(dplyr)
library(ggplot2)
mydat %>%
group_by(Name) %>%
mutate(
posx = mean(x)*1.01,
posy = mean(y)*1.01,
angle = -60*diff(range(y))/diff(range(x))
) %>%
ggplot(aes(x = x, y = y, color = Class)) + theme_classic() +
geom_line(mapping = aes(group = Name), color = mydat$color_line) +
geom_point() + scale_color_manual(values=c("springgreen4", "royalblue3"))+
geom_text(aes(x = posx, y = posy, group = Name, label = Name, angle = angle),
size = 6, show.legend = FALSE, color = "black")
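A variation on the manual approach, offered only as a sketch: if the plot uses coord_fixed(), one data unit has the same length on both axes, so the label angle can be computed from the slope instead of a hand-tuned factor. The data frame below re-creates mydat from the printed values, and label_pos is a hypothetical helper name.

```r
library(dplyr)
library(ggplot2)

mydat <- data.frame(
  Name  = rep(c("Ana", "Josh", "Bart"), 2),
  x     = c(0.8648649, 0.8108108, 0.4729730, 0.8513514, 0.7027027, 0.6486486),
  y     = c(0.7027027, 0.6756757, 0.7972973, 0.7972973, 0.7297297, 0.6351351),
  Class = rep(c("A", "B"), each = 3)
)

# One row per line: slope angle in degrees plus the midpoint of the segment.
# angle is computed first, while x and y still refer to the original columns.
label_pos <- mydat %>%
  group_by(Name) %>%
  summarise(
    angle = atan(diff(y) / diff(x)) * 180 / pi,
    x = mean(x),
    y = mean(y)
  )

p <- ggplot(mydat, aes(x, y)) +
  geom_line(aes(group = Name), color = "gray40") +
  geom_point(aes(color = Class)) +
  geom_text(data = label_pos, aes(label = Name, angle = angle), vjust = -0.5) +
  scale_color_manual(values = c(A = "springgreen4", B = "royalblue3")) +
  coord_fixed() +  # the computed angle only matches the line with a 1:1 aspect ratio
  theme_classic()
```

Using atan() rather than atan2() keeps every angle between -90 and 90 degrees, so the labels never end up upside down. Without coord_fixed() the angle would depend on the size of the plotting window.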
I would like to make heatmaps using the following data:
dt <- data.frame(
h = rep(LETTERS[1:7], 7),
j = c(rep("A", 7), rep("B", 7), rep("C", 7), rep("D", 7), rep("E", 7), rep("F", 7), rep("G", 7)),
Red = runif(7, 0, 1),
Yellow = runif(7, 0, 1),
Green = runif(7, 0, 1),
Blue = runif(7, 0, 1),
Black = runif(7, 0, 1)
)
For each of the heatmaps, the x and y axes stay as the first 2 columns of dt. The values that fill in each heatmap will be each of the remaining columns, e.g., Red, Yellow, ...
I borrowed this example to produce the following code:
loop = function(df, x_var, y_var, f_var) {
ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]], fill = .data[[f_var]])) +
geom_tile(color = "black") +
scale_fill_gradient(low = "white", high = "blue") +
geom_text(aes(label = .data[[f_var]]), color = "black", size = 4) +
coord_fixed() +
theme_minimal() +
labs(x = "",
y = "",
fill = "R", # Want the legend title to be each of the column names that are looped
title = .data[[f_var]])
ggsave(a, file = paste0("heatmap_", f_var,".png"), device = png, width = 15, height = 15, units = "cm")
}
plot_list <- colnames(dt)[-1] %>%
map( ~ loop(df = dt,
x_var = colnames(dt)[1],
y_var = colnames(dt)[2],
f_var = .x))
# view all plots individually (not shown)
plot_list
Problems I encountered when ran this chunk of code:
Error: Discrete value supplied to continuous scale
Step ggsave didn't work. I would like to save each plot by the names of the changing columns.
There are a few minor issues with your code. You get the first error because the loop includes the second column of your dataset, which is a categorical (i.e. discrete) variable. Second, title = .data[[f_var]] will not work; simply use title = f_var to add the variable name as the title. Finally, you are trying to save an object called a that is not defined in your code, so you have to assign the plot to a variable a. To return the plot, I also added a return(a):
set.seed(123)
library(ggplot2)
library(purrr)
loop = function(df, x_var, y_var, f_var) {
a <- ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]], fill = .data[[f_var]])) +
geom_tile(color = "black") +
scale_fill_gradient(low = "white", high = "blue") +
geom_text(aes(label = .data[[f_var]]), color = "black", size = 4) +
coord_fixed() +
theme_minimal() +
labs(x = "",
y = "",
fill = f_var, # legend title is the name of the column being plotted
title = f_var)
ggsave(filename = paste0("heatmap_", f_var, ".png"), plot = a, device = "png", width = 15, height = 15, units = "cm")
return(a)
}
plot_list <- colnames(dt)[-c(1, 2)] %>%
map( ~ loop(df = dt,
x_var = colnames(dt)[1],
y_var = colnames(dt)[2],
f_var = .x))
# view all plots individually (not shown)
plot_list[c(1, 5)]
#> [[1]]
#>
#> [[2]]
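A possible variation, sketched under a few assumptions: instead of dropping columns by position, the fill variables can be chosen by type, which avoids the "Discrete value supplied to continuous scale" error even if the column order changes. The helper name make_heatmap, the reduced two-colour dataset, and saving into tempdir() are illustrative choices.

```r
library(ggplot2)
library(purrr)

set.seed(123)
dt <- data.frame(
  h    = rep(LETTERS[1:7], 7),
  j    = rep(LETTERS[1:7], each = 7),
  Red  = runif(49),
  Blue = runif(49)
)

# Only numeric columns can be mapped to a continuous fill scale.
fill_vars <- names(dt)[vapply(dt, is.numeric, logical(1))]

# Hypothetical helper: builds one heatmap and returns it without saving.
make_heatmap <- function(df, x_var, y_var, f_var) {
  ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]], fill = .data[[f_var]])) +
    geom_tile(color = "black") +
    geom_text(aes(label = round(.data[[f_var]], 2)), size = 4) +
    scale_fill_gradient(low = "white", high = "blue") +
    coord_fixed() +
    theme_minimal() +
    labs(x = NULL, y = NULL, fill = f_var, title = f_var)
}

# A named list, so each plot can be looked up by its column name.
plots <- map(set_names(fill_vars), ~ make_heatmap(dt, "h", "j", .x))

# Save each plot under the name of its column (tempdir() is just for the demo).
files <- file.path(tempdir(), paste0("heatmap_", names(plots), ".png"))
walk2(files, plots, ~ ggsave(.x, plot = .y, width = 15, height = 15, units = "cm"))
```

Keeping plot construction and saving in separate steps means the same list can be printed, arranged, or saved without rebuilding the plots.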
I am making a line plot of several groups and want a visualization in which one group's lines are highlighted:
ggplot(df) + geom_line(aes(x=timepoint ,y=var, group = participant_id, color=color)) +
scale_color_identity(labels = c(red = "g1",gray90 = "Other"),guide = "legend")
However, the highlighted group's lines are partially obscured by the other groups' lines.
How can I make these lines always appear on top of the other groups' lines?
The simplest way to do this is to plot the gray and red groups on different layers.
First, let's try to replicate your problem with a dummy data set:
set.seed(1)
df <- data.frame(
participant_id = rep(1:50, each = 25),
timepoint = factor(rep(0:24, 50)),
var = c(replicate(50, runif(1, 50, 200) + runif(25, 0.3, 1.5) *
sin(0:24/(0.6*pi))^2/seq(0.002, 0.005, length = 25))),
color = rep(sample(c("red", "gray90"), 50, TRUE, prob = c(1, 9)), each = 25)
)
Now we apply your plotting code:
library(ggplot2)
ggplot(df) +
geom_line(aes(x=timepoint ,y=var, group = participant_id, color = color)) +
scale_color_identity(labels = c(red = "g1", gray90 = "Other"),
guide = "legend") +
theme_classic()
This looks broadly similar to your plot. If instead we plot in different layers, we get:
ggplot(df, aes(timepoint, var, group = participant_id)) +
geom_line(data = df[df$color == "gray90",], aes(color = "Other")) +
geom_line(data = df[df$color == "red",], aes(color = "g1")) +
scale_color_manual(values = c(g1 = "red", Other = "gray90")) +
theme_classic()
Created on 2022-06-20 by the reprex package (v2.0.1)
You can use factor releveling to bring the line(s) of interest to the front.
First, let's plot the data as is, with the red line partly hidden by others.
library(ggplot2)
library(dplyr)
set.seed(13)
df <-
data.frame(timepoint = rep(c(1:100), 20),
participant_id = paste0("p_", sort(rep(c(1:20), 100))),
var = abs(rnorm(2000, 200, 50) - 200),
color = c(rep("red", 100), rep("gray90", 1900)))
ggplot(df) +
geom_line(aes(x = timepoint ,
y = var,
group = participant_id, color = color)) +
scale_color_identity(labels = c(red = "g1", gray90 = "Other"),
guide = "legend")
Now let's bring p_1 to front by making it the last factor level.
df %>%
mutate(participant_id = factor(participant_id)) %>%
mutate(participant_id = relevel(participant_id, ref = "p_1")) %>%
mutate(participant_id = factor(participant_id, levels = rev(levels(participant_id)))) %>%
ggplot() +
geom_line(aes(x=timepoint,
y=var,
group = participant_id,
color = color)) +
scale_color_identity(labels = c(red = "g1", gray90 = "Other"),
guide = "legend")
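The same idea can be sketched for several highlighted participants at once by building the factor levels directly, with the highlighted ids placed last. The choice of "p_1" and "p_7" as highlighted ids is hypothetical, and the data mimic the dummy data above.

```r
library(ggplot2)

set.seed(13)
df <- data.frame(
  timepoint      = rep(1:100, 20),
  participant_id = rep(paste0("p_", 1:20), each = 100),
  var            = abs(rnorm(2000, 200, 50) - 200)
)

highlight <- c("p_1", "p_7")  # hypothetical ids to draw on top
df$color <- ifelse(df$participant_id %in% highlight, "red", "gray90")

# Groups are drawn in level order, so the highlighted ids go last.
ids <- unique(df$participant_id)
df$participant_id <- factor(df$participant_id,
                            levels = c(setdiff(ids, highlight), highlight))

p <- ggplot(df, aes(timepoint, var, group = participant_id, color = color)) +
  geom_line() +
  scale_color_identity(labels = c(red = "g1", gray90 = "Other"),
                       guide = "legend")
```

This replaces the relevel() and rev() steps with a single factor() call, which may be easier to read when more than one line has to come to the front.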
This is my df :
df <- data.frame(annee = rep(c(2003,2004), times = 1, each = 3), sps = c("a", "b", "c"), nb = 1:3)
I create a column containing my labels :
df$labels <- paste("nb", df$sps, "=", df$nb)
Then I do my plot :
ggplot(df, aes(nb)) +
geom_density(aes(fill = sps, colour = sps), alpha = 0.1) +
facet_wrap(~ annee) +
geom_text(data=df, aes(x=8, y=2.5, label= labels), colour="black", inherit.aes=FALSE, parse=FALSE)
But I have a problem with my text in each facet : I would like to have 3 lines (one for each sps).
I tried with the symbol "\n" but I failed in trying to obtain :
"nb a = 1 \n nb b = 2 \n nb c = 3" for each year
Thanks for help
You will have to concatenate what you want broken into several lines into one single string.
newdf <- aggregate(labels ~ annee, data = df, FUN = paste, collapse = "\n")
ggplot(df, aes(nb)) +
geom_density(aes(fill = sps, colour = sps), alpha = 0.1) +
facet_wrap(~ annee) +
geom_text(data = newdf, aes(x = 8, y = 2, label = labels), color = "black") +
scale_x_continuous(limits = c(0, 11)) +
scale_y_continuous(limits = c(0, 2.25))
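To see why this works, here is a small sketch of what the aggregation produces. geom_text draws one label per row of its data, so the three labels of a year have to be collapsed into a single row whose string contains the line breaks.

```r
df <- data.frame(annee = rep(c(2003, 2004), times = 1, each = 3),
                 sps   = c("a", "b", "c"),
                 nb    = 1:3)
df$labels <- paste("nb", df$sps, "=", df$nb)

# One row per year; the labels of that year are joined with newlines.
newdf <- aggregate(labels ~ annee, data = df, FUN = paste, collapse = "\n")

cat(newdf$labels[1])
# nb a = 1
# nb b = 2
# nb c = 3
```

With the original df passed to geom_text, each facet received three separate labels drawn at the same position, which is why they overlapped instead of stacking.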
You can achieve what you want by creating a separate data.frame for your labels:
library(tidyverse)
df <- data.frame(annee = rep(c(2003,2004),
times = 1, each = 3),
sps = c("a", "b", "c"),
nb = 1:3)
# create labels in separate data.frame
label_df <- df %>%
mutate(labels = paste("nb", sps, "=", nb)) %>%
group_by(annee) %>%
summarise(labels = paste(labels, collapse = "\n")) %>%
mutate(x = 6.5,
y = 2.2)
ggplot(df, aes(nb)) +
geom_density(aes(fill = sps, colour = sps), alpha = 0.1) +
facet_wrap(~annee) +
geom_text(data = label_df, aes(x = x, y = y, label = labels)) +
coord_cartesian(ylim = c(0, 2.4), xlim = c(1, 8))
I recognize that this has been an issue that's been asked in many other instances, but none of the solutions provided worked for my particular problem.
Here, I have the following data:
library(tidyverse)
library(scales)
mydata <- tibble(Category = c("A", "B", "C", "D"),
Result = c(0.442, 0.537, 0.426, 0.387),
A = c(NA, "A", NA, NA),
B = rep(NA, 4),
C = c(NA, "C", NA, NA),
D = c("D", "D", NA, NA))
mydata$Category <- factor(mydata$Category)
And I have the following vector for the colors:
colors_vct <- c(A = "#0079c0", B = "#cc9900", C = "#252525", D = "#c5120e")
With this information, I can create the following plot:
p <- ggplot(data = mydata , aes(x = Category, y = Result, fill = Category)) +
geom_bar(stat = "identity") + geom_text(aes(label = percent(Result), color = Category), hjust = -.25) +
coord_flip() + scale_y_continuous(limits = c(0,1), labels = percent) +
scale_colour_manual(values = colors_vct) + scale_fill_manual(values = colors_vct)
p
And I'd like to have little triangles appear after the labels based on whether a certain category is mentioned in the last 4 columns of mydata, colored by that category's color, as so:
p <- p + geom_text(data = filter(mydata, mydata[,3] == "A"), aes(label = sprintf("\u25b2")), colour = colors_vct["A"], hjust = -4)
#p <- p + geom_text(data = filter(mydata, mydata[,4] == "B"), aes(label = sprintf("\u25b2")), colour = colors_vct["B"], hjust = -5) #This is commented out because there are no instances where the layer ends up being applied.
p <- p + geom_text(data = filter(mydata, mydata[,5] == "C"), aes(label = sprintf("\u25b2")), colour = colors_vct["C"], hjust = -6)
p <- p + geom_text(data = filter(mydata, mydata[,6] == "D"), aes(label = sprintf("\u25b2")), colour = colors_vct["D"], hjust = -7)
p
This is what I want the final chart to look like (more or less, see bonus question below). Now, I'd like to iterate the last bit of code using a for loop. And this is where I'm running into trouble. It just ends up adding one layer only. How do I make this work? Here is my attempt:
#Set the colors into another table for matching:
colors_tbl <- tibble(Category = levels(mydata$Category),
colors = c("#0079c0", "#cc9900", "#252525", "#c5120e"))
for (i in seq_along(mydata$Category)) {
if (is_character(mydata[[i]])) { #This makes the loop skip if there is nothing to be applied, as with category B.
#Filters to just the specific categories I need to have the triangles shown.
triangles <- filter(mydata, mydata[,(i+2)] == levels(mydata$Category)[i])
#Matches up with the colors_tbl to determine which color to use for that triangle.
triangles <- mutate(triangles, colors = colors_tbl$colors[match(levels(triangles$Category)[i], colors_tbl$Category)])
#Sets a particular position for that triangle for the hjust argument below.
pos <- -(i+3)
#Adding the layer to the plot object
p <- p + geom_text(data = triangles, aes(label = sprintf("\u25b2")), color = triangles$colors, hjust = pos)
}
}
p
:(
Bonus question: Is there a way I can avoid gaps in between the triangles, as per the 2nd chart?
EDIT: As per @baptiste's suggestion, I re-processed the data as such:
mydata2 <- mydata %>% gather(key = comp, value = Present, -Result, -Category)
mydata2 <- mydata2 %>% mutate(colors = colors_tbl$colors[match(mydata2$Present, colors_tbl$Category)]) %>%
filter(!is.na(mydata2$Present)) %>% select(-comp)
mydata2 <- mydata2 %>% mutate(pos = if_else(Present == "A", -4, if_else(Present == "B", -5, if_else(Present == "C", -6, -7))))
p <- p + geom_text(data = mydata2, aes(x = Category, label = sprintf("\u25b2")), colour = mydata2$colors, hjust = mydata2$pos)
p
Ok, I got it to work. My bonus question still stands.
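For the bonus question, one possible sketch is to number the triangles within each Category, so that their positions are consecutive instead of being fixed per letter. The offset of 3 and the name triangles are assumptions carried over from the loop attempt above, and pivot_longer is swapped in for gather.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)

mydata <- tibble(Category = factor(c("A", "B", "C", "D")),
                 Result = c(0.442, 0.537, 0.426, 0.387),
                 A = c(NA, "A", NA, NA),
                 B = rep(NA, 4),
                 C = c(NA, "C", NA, NA),
                 D = c("D", "D", NA, NA))
colors_vct <- c(A = "#0079c0", B = "#cc9900", C = "#252525", D = "#c5120e")

triangles <- mydata %>%
  mutate(across(A:D, as.character)) %>%  # column B is all NA, hence logical
  pivot_longer(A:D, names_to = "comp", values_to = "Present") %>%
  filter(!is.na(Present)) %>%
  group_by(Category) %>%
  mutate(pos = -(3 + row_number())) %>%  # -4, -5, ... with no gaps per bar
  ungroup() %>%
  mutate(colors = unname(colors_vct[Present]))

p <- ggplot(mydata, aes(x = Category, y = Result, fill = Category)) +
  geom_col() +
  geom_text(aes(label = percent(Result), color = Category), hjust = -0.25) +
  coord_flip() +
  scale_y_continuous(limits = c(0, 1), labels = percent) +
  scale_colour_manual(values = colors_vct) +
  scale_fill_manual(values = colors_vct) +
  geom_text(data = triangles,
            aes(x = Category, y = Result, label = "\u25b2"),
            colour = triangles$colors, hjust = triangles$pos,
            inherit.aes = FALSE)
```

With this data, Category A gets a single triangle at position -4 and Category B gets three at -4, -5 and -6. The exact hjust spacing may still need tuning for a given font and label width.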