How to add custom legend in ggboxplot - r

I'm trying to create some boxplots in R. I've been using both ggboxplot and ggplot. This is my code and output so far:
ggboxplot:
ggboxplot(shp_PA@data, x = "hei_1998to2007_cat", y = "adjrate.2008to2017",
xlab = "Hazardous Exposure Index Jenks",
ylab = "Lung Cancer Incidence Rate",
color = "red",
add = c("jitter", "mean"),
add.params = list(color = "black", shape=20))
ggplot:
shp_PA@data %>%
ggplot(aes(x=hei_1998to2007_cat, y=adjrate.2008to2017)) +
geom_boxplot(colour = "red") +
geom_jitter(color="black", size=0.75) +
stat_summary(fun=mean, geom="point", shape=4, size=3, color="black") +
xlab("Hazardous Exposure Index Jenks") +
ylab("Lung Cancer Incidence Rate")
My main interest right now is in putting a legend on each boxplot that shows the symbol used to depict the mean, with the word "Mean" next to it. In base R, it's as simple as putting something like
legend("topright", legend=c("Mean"),pch=5, col="red")
but I can't figure it out in ggboxplot or ggplot. Most of the things I've seen online discuss modifying a legend that is already present.
One other thing I'm wondering how to do is specific to ggboxplot. I want to be able to make the color and shape of the jitter points different from the symbol for the mean. I've tried changing the add.params code to
add.params = list(color = c("black", "blue"), shape=c(20,4))
but I get the error
Error: Aesthetics must be either length 1 or the same as the data (213): shape and colour
Any help is greatly appreciated!
Edit: Add reproducible example using iris dataset in R
ggboxplot:
ggboxplot(iris, x = "Species", y = "Sepal.Length",
color = "red",
add = c("jitter", "mean"),
add.params = list(color = "black", shape=20))
ggplot:
ggplot(data=iris, aes(x=Species, y=Sepal.Length)) +
geom_boxplot(colour = "red") +
geom_jitter(color="black", size=0.75) +
stat_summary(fun=mean, geom="point", shape=4, size=3, color="black")
Again, I'd like to add a legend with the symbol used to depict the mean and the word "Mean", and be able to use ggboxplot to have the color and shape of the jitter and mean to be different.

It's a bit of a non-standard way to use ggplot, but you can do something like this.
To "add a legend with the symbol used to depict the mean and the word 'Mean'":
Map different shapes to geom_jitter and stat_summary inside aes(), then control those shapes with scale_shape_manual.
To "have the color and shape of the jitter and mean to be different":
Set color separately on the jitter layer and the mean layer, and use override.aes in guide_legend so the legend keys show the same colors.
ggplot(data=iris, aes(x=Species, y=Sepal.Length)) +
geom_boxplot(colour = "red") +
geom_jitter(size=1, color = 'green', aes(shape = 'all data')) +
stat_summary(fun=mean, geom="point", size=3, color = 'black', aes(shape = 'mean')) +
scale_shape_manual(values = c(20, 4)) +
guides(shape = guide_legend(override.aes = list(color = c('green', 'black'))))
Another similar answer here: https://stackoverflow.com/a/5179731/12400385
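If only the mean should appear in the legend, as the question asks, one possible variation is to map shape inside aes() for the stat_summary layer alone and leave the jitter layer unmapped. This is a minimal sketch on iris; the blue colour, the jitter width, and the object name p are illustrative choices, not part of the answer above.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(colour = "red") +
  # No shape mapping here, so the jitter points get no legend entry
  geom_jitter(colour = "black", size = 0.75, width = 0.2) +
  # Mapping shape to the constant "Mean" creates a one-entry legend
  stat_summary(fun = mean, geom = "point", size = 3, colour = "blue",
               aes(shape = "Mean")) +
  # name = NULL drops the legend title; 4 is the cross symbol
  scale_shape_manual(name = NULL, values = c(Mean = 4))

p
```

Because colour is set as a constant on the stat_summary layer, the legend key should pick it up without needing override.aes.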

Welcome to SO!
Adding custom legends in ggplot2 is notoriously difficult, and I believe this is by design. All legends are controlled by the arguments placed in aes and scale_*_[continuous|discrete|manual]. If we don't want to start learning how to work with grobs (likely spending several hours), we can still achieve the desired output by:
Adding our statistic to the data itself
Creating a column indicating which rows are the statistic and which are data points
Exploiting the fact that we can subset the data directly in our geom_* function to create separate layers for jittered and non-jittered points, and setting the shape in the aesthetics of these layers
Customizing the marks using scale_shape_manual (or scale_shape_discrete).
Using the mtcars dataset as an example (and dplyr for piping) we can obtain something very similar to ggboxplot
library(ggplot2)
library(dplyr)
data(mtcars)
# Setup data with mean instead of using stat_summary
mtcars %>%
select(cyl, hp) %>%
group_by(cyl) %>%
summarize(hp = mean(hp)) %>%
bind_cols(stat = factor(rep('mean', 3))) %>%
bind_rows(mtcars %>%
select(cyl, hp) %>%
bind_cols(stat = rep('data', nrow(mtcars)))) %>%
# Create ggplot
ggplot(aes(x = factor(cyl), y = hp)) +
geom_boxplot(colour = 'red') +
# Jitter based on subset of data. Do the same for geom_point (means)
## Note that to only plot a subset I pass a function to data that "filters" the data.
geom_jitter(data = function(.data)filter(.data, stat == 'data'),
aes(shape = stat), color = 'black') +
# Add mean to the point and change shape into something we like.
geom_point(data = function(.data)filter(.data, stat == 'mean'),
aes(shape = stat), size = 2.5) +
## Use scale_shape_manual to change shape into something i like.
scale_shape_manual(values = c('mean' = 8, 'data' = 16)) +
# Fix the plot theme to be similar to ggboxplot
theme(panel.grid = element_line(colour = NA),
panel.background = element_rect(fill = "#00000000"),
axis.line.x = element_line(colour = 'black'),
axis.line.y = element_line(colour = 'black'),
axis.text = element_text(size = 11),
legend.position = 'bottom'
) +
# Remove label from the legend if wanted
labs(shape = NULL)
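To make the data preparation step easier to follow, here is a rough sketch of just the stacking part, written with mutate instead of bind_cols. The names means, obs, and combined are made up for illustration; the idea is the same as in the pipeline above.

```r
library(dplyr)

# Per-group means, tagged as the statistic
means <- mtcars %>%
  group_by(cyl) %>%
  summarize(hp = mean(hp)) %>%
  mutate(stat = "mean")

# Raw observations, tagged as data
obs <- mtcars %>%
  select(cyl, hp) %>%
  mutate(stat = "data")

# One data frame with a column that the shape aesthetic can map to
combined <- bind_rows(means, obs)

# Check how many rows belong to each tag
count(combined, stat)
```

The stat column is what each geom_* layer filters on, and what aes(shape = stat) turns into legend entries.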

Related

R: Legend for geom_polygon() with single value

I'm using ggplot2 for map plots in R. How do I add a legend entry for a layer without a scale, just for a uniform color:
geom_polygon(data = watercourses, fill = "#0055aa", alpha = .5)
I just want to have the item title "Watercourses" and a color block representing the correct fill color. How does this work? So far, I only figured out how I can include scales to the legend.
Thank you!
EDIT: Here's an example with the NC dataset.
Map without centroids in legend
library(sf)
library(ggplot2)
demo(nc)
nc_centroids <- st_centroid(nc)
ggplot(nc) +
geom_sf(aes(fill = BIR74)) +
scale_fill_gradient(low = "white", high = "red") +
geom_sf(data = nc_centroids, color = "blue") +
coord_sf()
Wrong usage of aes() for legend
ggplot(nc) +
geom_sf(aes(fill = BIR74)) +
scale_fill_gradient(low = "white", high = "red") +
geom_sf(data = nc_centroids, aes(color = "blue")) +
coord_sf()
Trying to add the centroids to the legend (based on the answer of r2evans, https://stackoverflow.com/a/75346358/4921339)
ggplot(nc) +
geom_sf(aes(fill = BIR74)) +
scale_fill_gradient(low = "white", high = "red") +
geom_sf(data = nc_centroids, aes(color = "County centroids")) +
scale_fill_manual(name = "Centroids", values = c("County centroids" = "blue"))
coord_sf()
Throws the following messages and an error:
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
Error: Continuous value supplied to discrete scale
In my original case I use sp package instead of sf, but the messages and error thrown in the end are the same.
I think I did not yet understand how things work here, unfortunately. Any helping hints are highly appreciated.
If you place your fill in an aes(.), it will create a legend. Since you want a specific color, I suggest also adding scale_fill_manual:
ggplot(mtcars[-(1:3),], aes(mpg, disp)) +
geom_point() +
# placeholder for your `geom_polygon`:
geom_point(data = mtcars[1:3,], aes(fill = "something"), alpha = 0.5) +
scale_fill_manual(name = "something else", values = c("something" = "#0055aa"))
Perhaps try this to add your blue points:
ggplot(nc) +
geom_sf(aes(fill = BIR74)) +
scale_fill_gradient(low = "white", high = "red") +
geom_sf(data = transform(nc_centroids, col = "County centroids"), aes(colour = col)) +
coord_sf() +
scale_colour_manual(name = NULL, values = c("County centroids" = "blue"))
EDIT by @winnewoerp: Explanation of how it works, for those who have problems understanding it all (like me until now...):
Add an extra column to the data frame with a unique value to be used within the legend, this can be done e.g. using the transform() function like in the given example or like df$col <- "Column unique value" prior to ggplot().
The (sole?) advantage to using transform is that there is no need to alter the original data, which might affect other processes on it (outside of this image). One disadvantage to doing it this way is that one must hard-code the "Column unique value" in both the transform and the scale_colour_manual, below.
Use scale_colour_manual() to add the legend item (blue colour example):
scale_colour_manual(
name = "Legend title",
values = c("Column unique value" = "blue")
)
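For the original geom_polygon case, the same idea might look like the sketch below. The watercourses data frame here is a made-up triangle standing in for the real layer, and the column names long and lat are assumptions.

```r
library(ggplot2)

# Hypothetical stand-in for the watercourses layer: a single triangle
watercourses <- data.frame(long = c(0, 1, 0.5), lat = c(0, 0, 1))

p <- ggplot() +
  # A constant string inside aes() becomes the legend label
  geom_polygon(data = watercourses,
               aes(x = long, y = lat, fill = "Watercourses"),
               alpha = 0.5) +
  # Tie that label to the wanted colour and drop the legend title
  scale_fill_manual(name = NULL, values = c("Watercourses" = "#0055aa"))

p
```

If fill is already taken by a continuous scale, as with BIR74 in the nc example, this would conflict with it, which is why that example maps colour instead.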

making changes on boxplot objects

these are my codes:
ggplot(summer.months, aes(x = month, y = Temp_mean, linetype = position, color = canopy, fill = position)) +
geom_boxplot() +
theme_bw() +
ggtitle(" Temperature changes in elevated and lying deadwood in summer under different canopies") +
labs(y = "temperature values(C°)", x = "months") +
scale_fill_manual(values = c("white", "white", "green", "black"))
my professor said:
I have to put the number of objects in the legend, move the legend to the upper right-hand corner of the graph, and make the legend bigger.
I have to put the months in chronological order, like 11, 12, 1, 2, 3, 4, ... (and show the names of the months in the graph instead of numbers).
I created a basic ggplot, but the problem is that I can't make the changes they want from me, because the names and order of the objects are fixed that way in my Excel data.
Sharing the output of dput(head(summer.months)) would be enough to make this reproducible. Anyway, here's an example using the built-in dataset mpg to illustrate a few adjustments:
library(tidyverse)
## changing variable for x-Axis into ordered factor - this is a bit of a workaround. If using dates,
## it is better to use datatype date and adjust axis labels accordingly
my_mpg <- mpg %>%
mutate(class = factor(class, levels = c("compact", "midsize", "suv", "2seater", "minivan", "pickup", "subcompact"), ordered = TRUE))
ggplot(my_mpg, aes(x = class, y = hwy, linetype = class, colour = fl, fill = drv)) +
geom_boxplot() +
scale_fill_manual(values = c("white", "white", "green", "black")) +
## using subtitle to add information about the dataset
labs(title = "title", subtitle = paste("#lines: ", nrow(mpg))) +
theme_bw() +
theme(legend.justification = "top", ## move legend to top
legend.text = element_text(size = 10), ## adjust text sizes in legend
legend.title = element_text(size =10),
legend.key.size = unit(20, "pt"), ## if required: adjust size of legend keys
plot.subtitle = element_text(hjust = 1.0)) ## shift subtitle to the right
You might find further hints in ggplot2 reference and the ggplot2 book.
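For the month ordering specifically, a factor with explicit levels and labels is one way to get both the order and the names. This is a small sketch with made-up month numbers; month_num, month_order, and the other names are hypothetical, and month.abb is the built-in vector of English month abbreviations.

```r
# Hypothetical month numbers as they might come from the spreadsheet
month_num <- c(1, 11, 3, 12, 2, 4, 11, 12)

# Desired chronological order for a season that spans the new year
month_order <- c(11, 12, 1, 2, 3, 4)

# levels fixes the order, labels swaps the numbers for month names
month_f <- factor(month_num, levels = month_order,
                  labels = month.abb[month_order])

levels(month_f)
# "Nov" "Dec" "Jan" "Feb" "Mar" "Apr"

# Counts per level, e.g. for legend or axis labels such as "Nov (n = 2)"
n_per_month <- table(month_f)
legend_labels <- paste0(names(n_per_month), " (n = ", n_per_month, ")")
legend_labels
```

Using month_f as the x variable should then give the axis in that order, and the same counting trick can be applied to whichever variable drives the legend.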

ggplot2 custom legend with multiple geom overlays: guide_legend() confusion

I want to create a customized legend that distinguishes two plotted geoms using appropriate shape and color. I see that guide_legend() should be involved, but my legend shows both shapes overlaid on one another for both components of the legend. What is the right way to build these individual legend components using distinct shapes and colors? Thank you.
library(dplyr)
df <- tibble(year=seq(2010,2020,1),
annualNitrogen=seq(100,200,10),
annualPotassium=seq(500,600,10))
ggplot() +
geom_point(data = df, aes(x = year, y = annualNitrogen, fill="green"), shape=24, color="green", size = 4) +
geom_point(data = df, aes(x = year, y = annualPotassium, fill="blue"), color="blue", shape=21, size = 4) +
guides(fill = guide_legend(override.aes = list(color=c("green", "blue"))),
shape = guide_legend(override.aes = list(shape=c(21, 24)))
) +
scale_fill_manual(name = 'cumulative\nmaterial',
values = c("blue"="blue" , "green"="green" ),
labels = c("potassium" , "nitrogen") ) +
theme_bw() +
theme(legend.position="bottom")
Here it helps to transform to "long" format which is more in line with how ggplot is designed to be used when separating factor levels within a single time series.
This allows us to map shape and color directly, rather than having to manually assign different values to multiple plotted series, like you do in your question.
library(tidyverse)
df %>%
pivot_longer(-year, names_to = "element") %>%
ggplot(aes(x=year, y = value, fill = element, shape = element, color = element)) +
geom_point(size = 4)+
scale_color_manual(values = c("green", "blue"))
Put your df into a long format that ggplot likes with tidyr::gather. You should only use one geom_point for this, you don't need separate geoms for separate variables. You can then specify the shape and variable in one call to geom_point.
df <- tibble(year=seq(2010,2020,1),
annualNitrogen=seq(100,200,10),
annualPotassium=seq(500,600,10))
df <- tidyr::gather(df, key = 'variable', value='value', annualNitrogen, annualPotassium)
ggplot(df) +
geom_point(aes(x = year, y = value, shape = variable, color = variable)) +
scale_color_manual(
name = 'cumulative\nmaterial',
values = c(
"annualPotassium" = "blue",
"annualNitrogen" = "green"),
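labels = c("nitrogen", "potassium")) + # labels follow the alphabetical order of the breaks: annualNitrogen, annualPotassium
guides(shape = "none") # guides(shape = FALSE) is deprecated in recent ggplot2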
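If the distinct shapes from the question (24 and 21) should show up in a single legend together with the colors, one option is to give the colour, fill, and shape scales the same name and labels so that ggplot2 merges them. A sketch under that assumption; the names long, legend_title, and legend_labels are illustrative.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(year = 2010:2020,
                 annualNitrogen = seq(100, 200, 10),
                 annualPotassium = seq(500, 600, 10))

long <- pivot_longer(df, -year, names_to = "element", values_to = "value")

legend_title <- "cumulative\nmaterial"
# Unnamed labels follow the alphabetical order of the breaks
legend_labels <- c("nitrogen", "potassium")

p <- ggplot(long, aes(x = year, y = value,
                      colour = element, fill = element, shape = element)) +
  geom_point(size = 4) +
  scale_colour_manual(name = legend_title, labels = legend_labels,
                      values = c(annualNitrogen = "green", annualPotassium = "blue")) +
  scale_fill_manual(name = legend_title, labels = legend_labels,
                    values = c(annualNitrogen = "green", annualPotassium = "blue")) +
  scale_shape_manual(name = legend_title, labels = legend_labels,
                     values = c(annualNitrogen = 24, annualPotassium = 21)) +
  theme_bw() +
  theme(legend.position = "bottom")

p
```

With matching names and labels on all three scales, no override.aes is needed.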

Adding legend for combo bar and line graph -- ggplot ignoring commands

I am trying to make a bar chart with line plots as well. The graph is created fine, but the line plots do not show up in the legend.
I have tried so many different ways of adding these to the legend including:
ggplot Legend Bar and Line in Same Graph
None of them have worked. show.legend = TRUE in geom_line also seems to be ignored.
My code to create the graph is as follows:
ggplot(first_q, aes(fill = Segments)) +
geom_bar(aes(x= Segments, y= number_of_new_customers), stat =
"identity") + theme(axis.text.x = element_blank()) +
scale_y_continuous(expand = c(0, 0), limits = c(0,3000)) +
ylab('Number of Customers') + xlab('Segments') +
ggtitle('Number Customers in Q1 by Segments') +theme(plot.title =
element_text(hjust = 0.5)) +
geom_line(aes(x= Segments, y=count) ,stat="identity",
group = 1, size = 1.5, colour = "darkred", alpha = 0.9, show.legend =
TRUE) +
geom_line(aes(x= Segments, y=bond_count)
,stat="identity", group = 1, size = 1.5, colour = "blue", alpha =
0.9) +
geom_line(aes(x= Segments, y=variable_count)
,stat="identity", group = 1, size = 1.5, colour = "darkgreen",
alpha = 0.9) +
geom_line(aes(x= Segments, y=children_count)
,stat="identity", group = 1, size = 1.5, colour = "orange", alpha
= 0.9) +
guides(fill=guide_legend(title="Segments")) +
scale_color_discrete(name = "Prod", labels = c("count", "bond_count", "variable_count", "children_count"))
I am fairly new to R so if any further information is required or if this question could be better represented then please let me know.
Any help is greatly appreciated.
Alright, you need to remove a little bit of your stuff. I used the mtcars dataset, since you did not provide yours. I tried to keep your variable names and reduced the plot to necessary parts. The code is as follows:
first_q <- mtcars
first_q$Segments <- mtcars$mpg
first_q$val <- seq(1,nrow(mtcars))
first_q$number_of_new_customers <- mtcars$hp
first_q$type <- "Line"
ggplot(first_q) +
geom_bar(aes(x= Segments, y= number_of_new_customers, fill = "Bar"), stat =
"identity") + theme(axis.text.x = element_blank()) +
scale_y_continuous(expand = c(0, 0), limits = c(0,3000)) +
geom_line(aes(x=Segments,y=val, linetype="Line"))+
geom_line(aes(x=Segments,y=disp, linetype="next line"))
The answer you linked already covers this, but I will try to explain. A legend is built from the aesthetics you map in aes. So if you want different lines in the legend, declare that in your aes; whatever you map there is what gets shown in the legend. I used two different geom_lines here. Since both map the linetype aesthetic, both are shown in the linetype legend.
You can adapt this easily to your use. Make sure you are using known keywords for the aesthetic if you want to solve it this way. You can also change the legend titles afterwards by using:
labs(fill = "custom name")
If you want to add colours and use the same line type for all lines, you can customize the legend with scale_linetype_manual as follows (I did not use fill for the bars this time):
library(ggplot2)
first_q <- mtcars
first_q$Segments <- mtcars$mpg
first_q$val <- seq(1,nrow(mtcars))
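first_q$number_of_new_customers <- mtcars$hp
first_q$type <- "Line"
# Legend keys are sorted alphabetically ("second", then "solid"), so list the colours in that order
cols = c("green", "red")
ggplot(first_q) +
geom_bar(aes(x= Segments, y= number_of_new_customers), stat =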
"identity") + theme(axis.text.x = element_blank()) +
scale_y_continuous(expand = c(0, 0), limits = c(0,3000)) +
geom_line(aes(x=Segments,y=val, linetype="solid"), color = "red", alpha = 0.4)+
geom_line(aes(x=Segments,y=disp, linetype="second"), color ="green", alpha = 0.5)+
scale_linetype_manual(values = c("solid","solid"),
guide = guide_legend(override.aes = list(colour = cols)))
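Another route, closer to the question's goal of one colour per product line, is to reshape the line columns into long format and map colour inside aes. The sketch below uses a small made-up first_q; the values and the name lines_long are hypothetical, and only two of the four line columns are shown.

```r
library(ggplot2)
library(tidyr)

# Hypothetical data with the same column names as in the question
first_q <- data.frame(
  Segments = c("A", "B", "C"),
  number_of_new_customers = c(1200, 800, 1500),
  count = c(300, 450, 500),
  bond_count = c(100, 200, 150)
)

# One row per segment and product, so colour can be mapped to Prod
lines_long <- pivot_longer(first_q, c(count, bond_count),
                           names_to = "Prod", values_to = "n")

p <- ggplot(first_q, aes(x = Segments)) +
  geom_col(aes(y = number_of_new_customers, fill = Segments)) +
  # group = Prod draws one line per product across the discrete x axis
  geom_line(data = lines_long, aes(y = n, colour = Prod, group = Prod)) +
  scale_colour_manual(name = "Prod",
                      values = c(count = "darkred", bond_count = "blue"))

p
```

Here the fill legend describes the bars and the colour legend describes the lines, with no override.aes involved.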

Add geom_hline to legend

After searching the web both yesterday and today, the only way I got a legend working was to follow the solution by 'Brian Diggs' in this post:
Add legend to ggplot2 line plot
Which gives me the following code:
library(ggplot2)
ggplot()+
geom_line(data=myDf, aes(x=count, y=mean, color="TrueMean"))+
geom_hline(yintercept = myTrueMean, color="SampleMean")+
scale_colour_manual("",breaks=c("SampleMean", "TrueMean"),values=c("red","blue"))+
labs(title = "Plot showing convergence of Mean", x="Index", y="Mean")+
theme_minimal()
Everything works just fine if I remove the color of the hline, but if I add a value in the color of hline that is not an actual color (like "SampleMean") I get an error that it's not a color (only for the hline).
How can adding such a common thing as a legend be such a big problem? There must be an easier way?
To create the original data:
#Initial variables
myAlpha=2
myBeta=2
successes=14
n=20
fails=n-successes
#Posterior values
postAlpha=myAlpha+successes
postBeta=myBeta+fails
#Calculating the mean and SD
myTrueMean=(myAlpha+successes)/(myAlpha+successes+myBeta+fails)
myTrueSD=sqrt(((myAlpha+successes)*(myBeta+fails))/((myAlpha+successes+myBeta+fails)^2*(myAlpha+successes+myBeta+fails+1)))
#Simulate the data
simulateBeta=function(n,tmpAlpha,tmpBeta){
tmpValues=rbeta(n, tmpAlpha, tmpBeta)
tmpMean=mean(tmpValues)
tmpSD=sd(tmpValues)
returnVector=c(count=n, mean=tmpMean, sd=tmpSD)
return(returnVector)
}
#Make a df for the data
myDf=data.frame(t(sapply(2:10000, simulateBeta, postAlpha, postBeta)))
The given solution works in most cases, but not for geom_hline (or geom_vline). For those you usually don't need aes, but when you want them to generate a legend entry you have to wrap yintercept and color within aes:
library(ggplot2)
ggplot() +
geom_line(aes(count, mean, color = "TrueMean"), myDf) +
geom_hline(aes(yintercept = myTrueMean, color = "SampleMean")) +
scale_colour_manual(values = c("red", "blue")) +
labs(title = "Plot showing convergence of Mean",
x = "Index",
y = "Mean",
color = NULL) +
theme_minimal()
Given the original data, geom_point may give a better visualisation (I also added some theme changes):
ggplot() +
geom_point(aes(count, mean, color = "Observed"), myDf,
alpha = 0.3, size = 0.7) +
geom_hline(aes(yintercept = myTrueMean, color = "Expected"),
linetype = 2, size = 0.5) +
scale_colour_manual(values = c("blue", "red")) +
labs(title = "Plot showing convergence of Mean",
x = "Index",
y = "Mean",
color = "Mean type") +
theme_minimal() +
guides(color = guide_legend(override.aes = list(
linetype = 0, size = 4, shape = 15, alpha = 1))
)
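A variation that some may find tidier is to keep the reference line in its own small data frame, so the legend label is stored with the value. This is only a sketch: sim is a short simulated stand-in for myDf, ref is a made-up name, and 16 and 8 are the posterior parameters from the question (2 + 14 and 2 + 6).

```r
library(ggplot2)

set.seed(1)
# Hypothetical running means standing in for myDf
sim <- data.frame(count = 2:200)
sim$mean <- sapply(sim$count, function(n) mean(rbeta(n, 16, 8)))

# Reference line kept in a data frame so its label lives with the data
ref <- data.frame(value = 16 / 24, label = "Expected")

p <- ggplot() +
  geom_point(data = sim, aes(count, mean, colour = "Observed"), alpha = 0.3) +
  # yintercept and colour are both mapped inside aes, so a legend entry is created
  geom_hline(data = ref, aes(yintercept = value, colour = label), linetype = 2) +
  scale_colour_manual(name = "Mean type",
                      values = c(Expected = "red", Observed = "blue"))

p
```

Naming the values in scale_colour_manual avoids relying on the alphabetical order of the legend entries.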

Resources