Changing legend labels when using multiple variables - r

In R using ggplot:
I'm trying to plot a line plot with multiple columns as individual variables. I'm not using a fill = parameter, so I know that's why scale_fill_discrete doesn't work. From what I've seen in other similar questions, it seems like all the other options (scale_colour_discrete, scale_shape_discrete, etc.) require you to use those parameters in the first step of building the plot. That may be my main issue, but I don't know how to fix it with three different variables. Right now the legend shows three different colors, but they are not associated with the right variables.
ggplot(summary_5yr) +
geom_line(aes(x = Year, y = NY_Med_Inc, group = 1, color ="blue")) +
geom_line(aes(x = Year, y = FL_Med_Inc, group = 1, color = "red")) +
geom_line(aes(x = Year, y = WA_Med_Inc, group = 1, color = "green")) +
labs(title = "Median Income Trends", x = "Year", y = "Median Income (USD)")

Try this. To get the colors and the legend right, you have to make use of scale_color_manual. Using color = "blue" inside aes() will not set the color to "blue". Instead, "blue" is simply a label to which you have to assign a color inside scale_color_manual. Also, to get the correct legend labels you have to set the labels argument.
A second approach to achieve the desired plot would be to reshape your data frame into long format, e.g. via tidyr::pivot_longer. This way only one geom_line layer is needed and you get the correct labels automatically.
library(ggplot2)
library(tidyr)
library(dplyr)
set.seed(123)
summary_5yr <- data.frame(
Year = 2010:2020,
NY_Med_Inc = runif(11, 10000, 50000),
FL_Med_Inc = runif(11, 10000, 50000),
WA_Med_Inc = runif(11, 10000, 50000)
)
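# Approach 1: one layer per column; the constants inside aes() are keys that get colors and legend labels below
ggplot(summary_5yr) +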
geom_line(aes(x = Year, y = NY_Med_Inc, group = 1, color ="blue")) +
geom_line(aes(x = Year, y = FL_Med_Inc, group = 1, color = "red")) +
geom_line(aes(x = Year, y = WA_Med_Inc, group = 1, color = "green")) +
scale_color_manual(values = c(blue = "blue", red = "red", green = "green"),
labels = c(blue = "NY_Med_Inc", red = "FL_Med_Inc", green = "WA_Med_Inc")) +
labs(title = "Median Income Trends", x = "Year", y = "Median Income (USD)")
# Approach 2: reshape to long format so a single geom_line layer maps color to the variable name
summary_5yr %>%
tidyr::pivot_longer(-Year, names_to = "var", values_to = "value") %>%
ggplot() +
geom_line(aes(x = Year, y = value, group = var, color = var)) +
scale_color_manual(values = c(NY_Med_Inc = "blue", FL_Med_Inc = "red", WA_Med_Inc = "green")) +
labs(title = "Median Income Trends", x = "Year", y = "Median Income (USD)")
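A variant of the first approach, sketched here as an assumption rather than taken from the answer: map `color` directly to the text you want in the legend and give `scale_color_manual` a named `values` vector, so no separate `labels` argument is needed. The keys `"NY"`, `"FL"`, `"WA"` and the legend title `"State"` are illustrative.

```r
library(ggplot2)

set.seed(123)
summary_5yr <- data.frame(
  Year = 2010:2020,
  NY_Med_Inc = runif(11, 10000, 50000),
  FL_Med_Inc = runif(11, 10000, 50000),
  WA_Med_Inc = runif(11, 10000, 50000)
)

# The constant inside aes() becomes the legend key;
# the named vector maps each key to the color actually drawn
p <- ggplot(summary_5yr, aes(x = Year)) +
  geom_line(aes(y = NY_Med_Inc, color = "NY")) +
  geom_line(aes(y = FL_Med_Inc, color = "FL")) +
  geom_line(aes(y = WA_Med_Inc, color = "WA")) +
  scale_color_manual(name = "State",
                     values = c(NY = "blue", FL = "red", WA = "green")) +
  labs(title = "Median Income Trends", x = "Year", y = "Median Income (USD)")
p
```

By default the legend lists the keys alphabetically (FL, NY, WA); adding `breaks = c("NY", "FL", "WA")` to the scale should change that order.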

Related

Matching Histogram ID label with fill colour

I created a geom_histogram using the dataset and code below. I wanted to label each bar in the histogram with the subject ID and color the bar according to the metabolizer group, but for some reason the ID labels and the colors don't match: each ID sits at the correct x-axis value, but it is not colored according to its group.
For example, ID 72 in the resulting graph has a value of -2.85. It is at the correct x-axis location, but it should be colored dark green as a PM; likewise, ID 33 should be light blue as a UM, and so on.
Any suggestions? Thanks!
The dataset:
set.seed(4)
df <- data.frame(ID = factor(1:72), gengroup = c("UM","NM" ,"IM", "PM"), value = 2 - rgamma(72, 3, 2))
Histogram code:
library(ggplot2)
p1 <- ggplot(df, aes(x = value, fill = gengroup)) +
scale_fill_brewer(aes(name= "Metabolizer group"), palette = "Paired", labels= c("UM","NM" ,"IM", "PM"))+
geom_histogram(bins = 30) +
stat_bin(geom = "text", bins = 30,size =2, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
labs(x = NULL)
show(p1)
You could extract the colors of the Paired palette using brewer.pal from RColorBrewer and manually assign them with scale_fill_manual like this:
set.seed(4)
df <- data.frame(ID = factor(1:72), gengroup = c("UM","NM" ,"IM", "PM"), value = 2 - rgamma(72, 3, 2))
library(ggplot2)
library(RColorBrewer)
colors <- brewer.pal(4, "Paired")
p1 <- ggplot(df, aes(x = value, fill = gengroup)) +
geom_histogram(bins = 30) +
stat_bin(geom = "text", bins = 30,size =2, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
scale_fill_manual("Metabolizer group", values = c("UM" = colors[1],
"NM" = colors[2],
"IM" = colors[3],
"PM" = colors[4])) +
labs(x = NULL)
show(p1)
Created on 2022-09-12 with reprex v2.0.2
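The named `values` vector is what fixes the mismatch: an unnamed palette is handed out in the order of the factor levels, which for a character column are sorted alphabetically. The base-R sketch below shows that default order and builds the same named vector in one step with `setNames()`; the object names `groups` and `group_cols` are illustrative.

```r
library(RColorBrewer)

groups <- rep(c("UM", "NM", "IM", "PM"), length.out = 72)

# Default level order for a character column is alphabetical (IM, NM, PM, UM),
# so an unnamed 4-color palette would give its 4th color (dark green) to UM
default_levels <- levels(factor(groups))
print(default_levels)

# Naming the colors pins each one to a group regardless of level order
group_cols <- setNames(brewer.pal(4, "Paired"), c("UM", "NM", "IM", "PM"))
print(group_cols)
# Usage: scale_fill_manual("Metabolizer group", values = group_cols)
```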
There are actually two issues in your code:
Using labels = c("UM", "NM", "IM", "PM") only changes the text shown for your groups in the legend. Under the hood, the colors are still assigned in the order of the factor levels, which for a character column is alphabetical by default, i.e. c("IM", "NM", "PM", "UM"). For example, the dark green that is labelled PM is actually assigned to gengroup UM. To fix that, set limits = c("UM", "NM", "IM", "PM") instead of using labels:
set.seed(4)
df <- data.frame(ID = factor(1:72), gengroup = c("UM", "NM", "IM", "PM"), value = 2 - rgamma(72, 3, 2))
library(dplyr)
library(ggplot2)
ggplot(df, aes(x = value, fill = gengroup)) +
scale_fill_brewer(name = "Metabolizer group", palette = "Paired", limits = c("UM", "NM", "IM", "PM")) +
geom_histogram(bins = 30) +
stat_bin(geom = "text", bins = 30,size =2, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
labs(x = NULL)
As you can see, ID 72 now gets the correct dark green and ID 33 the light blue.
However, there are still issues wherever a bar contains more than one ID; e.g. ID 8 should also be colored dark green but is colored light green.
The reason is that you apply different groupings for geom_histogram and for the labels added via stat_bin: for the first, the grouping is defined by gengroup, while for the second you group by ID. This can be seen clearly by grouping the geom_histogram by ID too:
ggplot(df, aes(x = value, fill = gengroup)) +
scale_fill_brewer(name = "Metabolizer group", palette = "Paired", limits = c("UM", "NM", "IM", "PM")) +
geom_histogram(aes(group = ID), bins = 30) +
stat_bin(geom = "text", bins = 30,size =2, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
labs(x = NULL)
As can be seen, we now get the right colors, but the bars are no longer stacked in the order of gengroup.
To fix that and to stack the labels by gengroup, you could convert ID to a factor whose levels are ordered according to gengroup. To this end, I first arrange the data and then use forcats::fct_inorder. However, to get the right labels we also have to use a lookup table inside after_stat, because after_stat(group) returns the position of each ID level rather than the ID itself:
df <- df |>
arrange(gengroup) |>
mutate(ID = forcats::fct_inorder(ID))
labels <- setNames(levels(df$ID), seq_along(levels(df$ID)))
ggplot(df, aes(x = value, fill = gengroup)) +
scale_fill_brewer(name = "Metabolizer group", palette = "Paired", limits = c("UM", "NM", "IM", "PM")) +
geom_histogram(bins = 30) +
stat_bin(
geom = "text", bins = 30, size = 2, na.rm = TRUE,
aes(
label = ifelse(after_stat(count) == 0, NA, after_stat(labels[group])),
group = ID, y = after_stat(count)
),
position = position_stack(vjust = 0.5)
) +
labs(x = NULL)
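The lookup can be checked on a toy example. With `group = ID`, the answer relies on `after_stat(group)` being the integer position of each ID level, so once the levels are reordered, group 1 is no longer ID 1. The sketch uses hypothetical four-row data and base R's `factor(x, levels = unique(x))` as a stand-in for `forcats::fct_inorder()`, since both order levels by first appearance.

```r
# Hypothetical toy data standing in for df
toy <- data.frame(ID = factor(c("1", "2", "3", "4")),
                  gengroup = c("UM", "NM", "IM", "PM"))

# Sort rows by gengroup, then order the ID levels by first appearance
toy <- toy[order(toy$gengroup), ]
toy$ID <- factor(as.character(toy$ID), levels = unique(as.character(toy$ID)))
print(levels(toy$ID))

# Lookup table: level position (what after_stat(group) gives) -> ID label
labels <- setNames(levels(toy$ID), seq_along(levels(toy$ID)))

# Group 1 now corresponds to ID "3" and group 4 to ID "1"
print(unname(labels[c(1, 4)]))
```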

How to supress/combine the legend for two geoms in R

I'm making a scatterplot that shows a value plotted against the days since symptom onset. The patients are categorised by disease severity, and I want to show how the values change over time in each severity category. I have coloured the dots by severity score, but I prefer to use shape = 21 so the points have a border. I also draw a trend line that I want coloured the same way; however, this adds a second legend and the result looks complicated. The issue doesn't happen if I use a shape that isn't filled, because scale_colour_manual can then be used for both the lines and the dots, but I don't think that looks as nice. Any idea how I can fix this?
IC50SymObySS <- ggplot(data = isaric) +
geom_point(mapping = aes(x = Days_since_onset, y = log2IC50, fill = Severity_score), size = 2, colour = "black", shape = 21)+
geom_smooth(mapping = aes(x = Days_since_onset, y = log2IC50, colour = Severity_score), se = FALSE)+
scale_fill_manual(breaks=c("1","2","3","4","5"),
values=c("1" = "lightblue1","2" = "lightblue3","3" = "lightblue4","4" = "lightcoral","5" = "firebrick2"),
labels=c("1","2","3","4","5"),
name = "Severity Score")+
scale_colour_manual(values=c("1" = "lightblue1","2" = "lightblue3","3" = "lightblue4","4" = "lightcoral","5" = "firebrick2"))+
theme_minimal()+
JTheme+
ylab("Serum Log2 IC50")+
xlab("Days Since Symptom Onset")+
guides(colour = guide_legend(title.position = "top", title.hjust = 0.5))
IC50SymObySS
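ggplot2 only merges the colour and fill legends into one when both scales have the same title and the same labels, so you need to use identical name and labels values for both the fill and the colour scale.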
library(ggplot2)
library(dplyr)
isaric <- transmute(iris,
Days_since_onset = (Sepal.Length - 4)^3,
log2IC50 = Sepal.Width * 3,
Severity_score = cut(Petal.Length, breaks = quantile(Petal.Length, probs = 0:5 / 5), labels = 1:5, include.lowest = TRUE))
ggplot(data = isaric) +
geom_smooth(mapping = aes(x = Days_since_onset, y = log2IC50, colour = Severity_score), se = FALSE)+
geom_point(mapping = aes(x = Days_since_onset, y = log2IC50, fill = Severity_score), size = 2, colour = "black", shape = 21)+
scale_colour_manual(
name = "Severity Score",
values=c("1" = "lightblue1","2" = "lightblue3","3" = "lightblue4","4" = "lightcoral","5" = "firebrick2"),
labels=c("1","2","3","4","5"))+
scale_fill_manual(
name = "Severity Score",
breaks=c("1","2","3","4","5"),
values=c("1" = "lightblue1","2" = "lightblue3","3" = "lightblue4","4" = "lightcoral","5" = "firebrick2"),
labels=c("1","2","3","4","5"))+
theme_minimal()+
ylab("Serum Log2 IC50")+
xlab("Days Since Symptom Onset")+
guides(colour = guide_legend(title.position = "top", title.hjust = 0.5))
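Another option, not used in the answer above, is a single manual scale that serves both aesthetics through the `aesthetics` argument available in recent ggplot2 versions; with only one scale there is only one legend. This is a sketch on the same iris-based stand-in for `isaric` (built here with base R), and `severity_cols` is an illustrative name.

```r
library(ggplot2)

# Stand-in data built from iris, as in the answer above
isaric <- data.frame(
  Days_since_onset = (iris$Sepal.Length - 4)^3,
  log2IC50 = iris$Sepal.Width * 3,
  Severity_score = cut(iris$Petal.Length,
                       breaks = quantile(iris$Petal.Length, probs = 0:5 / 5),
                       labels = 1:5, include.lowest = TRUE)
)

severity_cols <- c("1" = "lightblue1", "2" = "lightblue3", "3" = "lightblue4",
                   "4" = "lightcoral", "5" = "firebrick2")

p <- ggplot(isaric, aes(x = Days_since_onset, y = log2IC50)) +
  geom_smooth(aes(colour = Severity_score), se = FALSE) +
  # colour = "black" is a fixed setting, so the scale does not touch it
  geom_point(aes(fill = Severity_score), size = 2, colour = "black", shape = 21) +
  # One scale drives both colour (lines) and fill (points), giving one legend
  scale_colour_manual(name = "Severity Score", values = severity_cols,
                      aesthetics = c("colour", "fill")) +
  theme_minimal() +
  labs(x = "Days Since Symptom Onset", y = "Serum Log2 IC50")
p
```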

How to change the text and title of legend in ggplot with several variables

I'm trying to fix my legend so that its text correctly describes the symbols and colours. However, I have a lot of variables that I need to include in the legend, and they are all in different columns. Does anyone know a quick way to indicate what the colours and symbols mean in the ggplot legend?
Here is some sample code
#sample data
library(dplyr)   # for if_else()
library(ggplot2)
temps = data.frame(Temperature= c(15,25,35),
Growth.Phase = c("exponential", "stationary", "death"),
Carbohydrates = sample(c(3:10), 9, replace = T),
Lipids = sample(c(10:25), 9, replace = T),
Chlorophyll = sample(c(2:15), 9),
DNA.RNA = sample(c(3:15), 9),
Protein = sample(c(5:20), 9))
temps$Shape = if_else(temps$Growth.Phase == "exponential", 21,
if_else(temps$Growth.Phase == "stationary", 22, 23))
#Graph code
ggplot(data = temps, aes(x = Temperature, y = "Proportions", shape = factor(Shape))) +
geom_point(aes(y = Carbohydrates),colour = "darkred",
fill = "darkred", size = 3) +
geom_line(aes(y = Carbohydrates), size = 1, col = "darkred") +
geom_point(aes(y = Lipids), colour = "darkblue",
fill = "darkblue", size = 3, col ="darkblue") +
geom_line(aes(y = Lipids), size = 1) +
geom_point(aes(y = Protein), colour = "violet",
fill = "violet", size = 3) +
geom_line(aes(y = Protein), size = 1, col ="violet") +
geom_point(aes(y = DNA.RNA), colour = "darkorange",
fill = "darkorange", size = 3) +
geom_line(aes(y = DNA.RNA), size = 1, col = "darkorange") +
geom_point(aes(y = Chlorophyll), size = 3, colour = "darkgreen",
fill = "darkgreen") +
geom_line(aes(y = Chlorophyll), size = 1, col = "darkgreen") +
labs(x = "Temperature (°C)", y = "Proportion")
This is the image I am getting
But as you can see, it's not giving me the correct text in the legend. I would like the symbols to indicate the Growth.Phase and the colours to indicate which column I have plotted (i.e. Carbohydrates, Protein, etc.). Does anyone know a quick fix?
When I use my own data, this is what the graph looks like. Please note that the lines go through the same symbols and are the same colours.
I'm not sure whether I got the legend right, but the idea is the same as in @dc37's answer. Your plot can be simplified considerably using pivot_longer:
#sample data
temps = data.frame(Temperature= c(15,25,35),
Growth.Phase = c("exponential", "stationary", "death"),
Carbohydrates = sample(c(3:10), 9, replace = T),
Lipids = sample(c(10:25), 9, replace = T),
Chlorophyll = sample(c(2:15), 9),
DNA.RNA = sample(c(3:15), 9),
Protein = sample(c(5:20), 9))
library(ggplot2)
library(dplyr)
library(tidyr)
library(tibble)
temps_long <- temps %>%
pivot_longer(-c(Temperature, Growth.Phase)) %>%
mutate(
shape = case_when(
Growth.Phase == "exponential" ~ 21,
Growth.Phase == "stationary" ~ 22,
TRUE ~ 23
),
color = case_when(
name == "Carbohydrates" ~ "darkred",
name == "Lipids" ~ "darkblue",
name == "Protein" ~ "violet",
name == "DNA.RNA" ~ "darkorange",
name == "Chlorophyll" ~ "darkgreen",
TRUE ~ NA_character_
)
)
# named color vector
colors <- select(temps_long, name, color) %>%
distinct() %>%
deframe()
# named shape vector
shapes <- select(temps_long, Growth.Phase, shape) %>%
distinct() %>%
deframe()
ggplot(data = temps_long, aes(x = Temperature, y = value, shape = Growth.Phase, color = name, fill = name, group = Temperature)) +
geom_point(size = 3) +
geom_line(size = 1) +
scale_shape_manual(values = shapes) +
scale_fill_manual(values = colors) +
scale_color_manual(values = colors) +
labs(x = "Temperature (C)", y = "Proportion", color = "XXXX") +
guides(fill = "none", shape = guide_legend(override.aes = list(fill = "black")))
Created on 2020-04-04 by the reprex package (v0.3.0)
To make your code simpler and avoid repeating the same line several times, you can transform your data into a longer format and then map those new variables to the color, fill and shape aesthetics in your aes().
Then, using scale_color_manual and scale_shape_manual, you can set the appropriate colors and shapes.
To draw lines between the appropriate points, I add a "Rep" column to mimic the presence of replicates in your experiments. Otherwise, geom_line can't tell which points belong together.
library(tidyr)
library(dplyr)
library(ggplot2)
temps %>% mutate(Rep = rep(1:3,each = 3)) %>%
pivot_longer(cols = Carbohydrates:Protein, names_to = "Type", values_to = "proportions") %>%
ggplot(aes(x = Temperature, y = proportions))+
geom_point(aes(fill = Type, shape = Growth.Phase, color = Type), size = 3)+
geom_line(aes( color = Type, group =interaction(Rep, Type)))+
scale_color_manual(values = c("darkred","darkgreen","darkorange","darkblue","violet"))+
scale_fill_manual(values = c("darkred","darkgreen","darkorange","darkblue","violet"))+
scale_shape_manual(values = c(23,21,22))+
labs(x = "Temperature (°C)", y = "Proportion")
Does this answer your question?
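How `group = interaction(Rep, Type)` splits the data can be checked without plotting: it creates one group per replicate and type combination, and `geom_line()` draws one line per group. The data here is a hypothetical stand-in with three temperatures, three replicates and two types.

```r
# Hypothetical long-format data: 3 temperatures x 3 replicates x 2 types
long <- expand.grid(Temperature = c(15, 25, 35), Rep = 1:3,
                    Type = c("Lipids", "Protein"))

# One level per Rep x Type combination, i.e. one line per combination
grp <- interaction(long$Rep, long$Type)
print(nlevels(grp))

# Each line connects the three temperatures of one replicate
print(table(grp))
```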

ggplot2 fill not converting ggplotly color mapping correctly

I'm trying to convert a ggplot2 image using ggplotly(), but it's not converting correctly. My ggplot always contains two lines, with a color mapping for the geom_line layer and a fill mapping for the geom_point layer.
Here is the code for just the basic ggplot part:
library(ggplot2)
library(scales)  # for squish()
test_data1 = data.frame(
filter = "Filter 1",
time = seq(as.Date("2017-01-01"), as.Date("2017-03-01"), "days"),
ovr_perc = rnorm(n = 60, mean = 8, sd = 2),
neg_perc = rnorm(n = 60, mean = 6, sd = 2),
count = sample(50:250,60,replace=T)
)
test_data2 = data.frame(
filter = "Filter 2",
time = seq(as.Date("2017-01-01"), as.Date("2017-03-01"), "days"),
ovr_perc = rnorm(n = 60, mean = 20, sd = 6),
neg_perc = rnorm(n = 60, mean = 6, sd = 2),
count = sample(50:250,60,replace=T)
)
test_data = rbind(test_data1, test_data2)
p = ggplot(test_data, aes(x=time, y=ovr_perc, group = factor(filter))) +
geom_line(aes(color=factor(filter))) +
geom_point(aes(fill=neg_perc, size = count), shape=21) +
scale_fill_gradient2(low = "darkgreen", mid = "yellow", high = "red1",
midpoint = 5, limits = c(0, 10), oob = squish) +
scale_colour_manual(values = c("pink", "green")) +
theme_classic()
p
The image should look something like this, which is almost exactly how I want it to look:
However, when I try adding the ggplotly code to get the tooltip feature, it turns into this:
ggplotly(p)
The entire color scheme on each of the points is gone. How can I change this so it keeps the same color scale from the ggplot image to the ggplotly image?
According to an open bug report, ggplotly() does not convert a fill mapping in geom_point properly, while a color mapping works fine. According to the bug report, the problem goes away in the package's development version, but if you don't have easy access to that (neither do I), the following works, at least on my end:
library(ggplot2)
library(scales)  # for squish()
library(dplyr)
library(plotly)
p2 <- ggplot(test_data, aes(x=time, y=ovr_perc, linetype = filter)) +
geom_line(data = . %>% filter(filter == "Filter 1"),
colour = "pink") +
geom_line(data = . %>% filter(filter == "Filter 2"),
colour = "green") +
geom_point(aes(color = neg_perc, size = count)) +
scale_color_gradient2(low = "darkgreen", mid = "yellow", high = "red1",
midpoint = 5, limits = c(0, 10), oob = squish) +
scale_linetype_manual(values = c("solid", "solid"),
guide = guide_legend(override.aes = list(
color = c("pink", "green")
))) +
theme_classic(); p2
ggplotly(p2)
Explanations:
This uses color rather than fill for geom_point, which works fine.
The lines are plotted in separate geom_line layers, with their colours specified directly, outside aes().
The lines' linetype is mapped inside aes() to force the creation of a legend entry for each filter value, while scale_linetype_manual sets both lines to solid and its override.aes gives the legend keys the pink and green line colours.

scale_color_gradient2 having text be rounded

I am using scale_fill_gradient2(), and the colourbar it creates shows decimal places. I tried to reproduce the decimals in the example below but could not; there the labels appear in scientific notation instead.
How can you round the numbers that are displayed in the colourbar using scale_fill_gradient2()? For example, I am seeing "25.00" and I'd like to show just "25".
Also, how can you set the labels manually? Let's say I want to look at the data and set labels like c(15, 25, 40)?
library(ggplot2)
dat <- data.frame(group = c(rep("A", 10), rep("B", 10)),
value = c(rnorm(10, 5,300), rnorm(10, 5000, 80000)))
ggplot(dat, aes(x = group, y = value, fill= value)) +
geom_bar(stat = "identity") +
scale_fill_gradient2(low = "red", mid = "yellow", high = "blue", midpoint = 0, name = "")
You can manually specify breaks and labels as needed.
ggplot(dat, aes(x = group, y = value, fill= value)) +
geom_bar(stat = "identity")+
scale_fill_gradient2(low = "red", mid = "yellow", high = "blue",
midpoint = 0, name = "",
breaks = c(0, 1e5, 2e5),
labels = c("0", "100,000", "200,000"))
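For the rounding part, `labels` also accepts a function that receives the break values and returns strings, so rounding and thousands separators can be applied without listing every label by hand. The helper `round_comma()` below is a hypothetical base-R formatter, not part of ggplot2 or the original answer; the scales package also offers ready-made formatters for this.

```r
library(ggplot2)

# Hypothetical formatter: round to whole numbers, add thousands separators,
# and never switch to scientific notation
round_comma <- function(x) {
  format(round(x), big.mark = ",", scientific = FALSE, trim = TRUE)
}

set.seed(1)
dat <- data.frame(group = c(rep("A", 10), rep("B", 10)),
                  value = c(rnorm(10, 5, 300), rnorm(10, 5000, 80000)))

p <- ggplot(dat, aes(x = group, y = value, fill = value)) +
  geom_col() +  # same as geom_bar(stat = "identity")
  scale_fill_gradient2(low = "red", mid = "yellow", high = "blue",
                       midpoint = 0, name = "", labels = round_comma)
p
```

Passing `breaks = c(15, 25, 40)` as well would combine the manual break positions with this formatting.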
