Adding a legend to ggplot - r

I have the dataset below:
Player               Goals  Shots
Regan Charles-Cook   10     32
Tony Watt            9      36
Bruce Anderson       8      26
Liam Boyce           8      44
Kyogo Furuhashi      8      31
Alfredo Morelos      8      80
Christian Ramirez    8      41
Liel Abada           7      57
Martin Boyle         7      43
Kevin van Veen       7      45
I am attempting a dumbbell chart and so far have the following code:
library(tidyverse)
library(readxl) # read_excel() is not attached by library(tidyverse)
library(ggplot2)
library(ggalt)
theme_set(theme_bw())
data <- read_excel("SPL_Goals.xlsx")
data %>%
  #the below code sets out the initial plot template without the data
  ggplot(aes(x = Goals, xend = Shots, y = Player)) +
  #below code inputs the data viz on to the plot
  geom_dumbbell(
    size = 1.5, color = "black", size_x = 10, #size=1.5 dictates black line size
    size_xend = 3, colour_x = "green",
    colour_xend = "red") +
  labs(
    title = "Scotland; Goals v Shots", #add title
    subtitle = "Top 10; Matchday 22", #add subtitle
    x = "Total", y = "Player"
  ) +
  geom_text(aes(label = Goals))
This produces the chart below:
How do I order the chart so that Goals (the green circle) is ascending, and how do I add a legend showing Goals (green) and Shots (red)? I have tried mutate, reorder and fct_reorder, but none of them worked, so I must be doing something wrong.

To change the ordering of the y-axis, you need to reorder your y-axis variable based on the value that you want to order by. In your case, this means reordering Player based on Goals:
data %>% mutate(Player = reorder(Player, Goals)) %>% …
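To see what reorder() does on its own, here is a minimal sketch on three rows taken from the question's table (the data frame name players is made up for illustration). It returns a factor whose levels are sorted by the second argument, and ggplot2 draws a discrete axis in level order:

```r
# Three rows from the question's table; `players` is a hypothetical name
players <- data.frame(
  Player = c("Tony Watt", "Regan Charles-Cook", "Liel Abada"),
  Goals  = c(9, 10, 7)
)

# reorder() returns a factor whose levels are sorted by Goals (ascending)
players$Player <- reorder(players$Player, players$Goals)

# Inspect the resulting level order, lowest Goals first
levels(players$Player)
```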
Next, creating a manual legend for geom_dumbbell doesn’t seem possible, or at least it isn’t obvious to me how. However, if you draw the chart manually, you can add a manual legend. This requires several things:
Create the dumbbell shape manually by plotting a geom_segment and two geom_points.
Create an aes mapping for the colour, which will get mapped into the legend.
Create a manual colour scale, which translates the mapping into actual colours and draws the legend.
Putting all that together:
data %>%
  mutate(Player = reorder(Player, Goals)) %>%
  ggplot(aes(x = Goals, y = Player)) +
  geom_segment(aes(xend = Shots, yend = Player), color = "black", size = 1.5) +
  geom_point(aes(color = "Goals"), size = 10) +
  geom_point(aes(x = Shots, color = "Shots"), size = 3) +
  geom_text(aes(label = Goals), color = "black") +
  scale_color_manual(
    values = c(Goals = "green", Shots = "red"),
    guide = guide_legend(title = "", override.aes = list(size = 3))
  ) +
  labs(
    title = "Scotland; Goals v Shots",
    subtitle = "Top 10; Matchday 22",
    x = "Total", y = "Player"
  )
We use override.aes to specify a fixed size for the points inside the legend. Without this, ‘ggplot2’ would overlay two points of different sizes. We also set the title of the legend to "" because the default title would be “colour”, and a title doesn’t seem necessary here.
I’ve also used theme_set(theme_bw() + theme(legend.position = "bottom")) to generate the image above.
The above is still ordered slightly weirdly, because the ordering among players with the same number of goals is still arbitrary. I would probably order those players by descending number of shots, so that within each group of tied players the one who needed the fewest shots ends up on top. That is, it’s better for a player to have scored the same number of goals with fewer attempts.
Unfortunately reorder doesn’t support such an ordering directly. Instead, we need to resort to arranging the entire table and then reordering players by their row number:
data %>%
  arrange(Goals, -Shots) %>%
  mutate(Player = reorder(Player, row_number())) %>%
  ggplot(aes(x = Goals, y = Player)) +
  geom_segment(aes(xend = Shots, yend = Player), color = "black", size = 1.5) +
  geom_point(aes(color = "Goals"), size = 10) +
  geom_point(aes(x = Shots, color = "Shots"), size = 3) +
  geom_text(aes(label = Goals), color = "black") +
  scale_color_manual(
    values = c(Goals = "green", Shots = "red"),
    guide = guide_legend(title = "", override.aes = list(size = 3))
  ) +
  labs(
    title = "Scotland; Goals v Shots",
    subtitle = "Top 10; Matchday 22",
    x = "Total", y = "Player"
  )
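The tie-breaking order can be checked without plotting. The following is a base-R sketch of the same idea (sort by Goals, then by descending Shots, then fix the factor levels in that order); it uses five rows from the question's table, and the name players is made up for illustration:

```r
# Five rows from the question's table; `players` is a hypothetical name
players <- data.frame(
  Player = c("Regan Charles-Cook", "Tony Watt", "Bruce Anderson",
             "Liam Boyce", "Alfredo Morelos"),
  Goals  = c(10, 9, 8, 8, 8),
  Shots  = c(32, 36, 26, 44, 80)
)

# Row order: ascending Goals, ties broken by descending Shots
ord <- order(players$Goals, -players$Shots)

# Fix the factor levels in that row order; the first level is drawn at the
# bottom of a discrete y axis, the last level at the top
players$Player <- factor(players$Player, levels = players$Player[ord])

levels(players$Player)
```

Among the three players on 8 goals, the one with the most shots comes first (bottom of the axis) and the one with the fewest shots comes last (closest to the top).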

Related

Any way to flip coordinates without reversing everything else?

I'm working on a plotting function that has the option to flip coordinates (using coord_flip). The thing is, this is a plot by group (using the fill argument), which means that, for some reason, coord_flip also reverses the colors, the legend, my value column and my fill column. In practice, this means I have the following piece of code in my function:
if (flip_coord) {
  colors = c("#CC0000", "#002D73") %>% rev
  rev_legend = T
  table[[col_plot]] = fct_rev(table[[col_plot]]) # value column
  table[['origin_table']] = fct_rev(table[['origin_table']]) # fill column
} else {
  colors = c("#CC0000", "#002D73")
  rev_legend = F
}
There's also this line in my plot:
{if(flip_coord) coord_flip()} +
This brings back everything else that gets scrambled with coord_flip, but isn't too elegant. Is there a better way to only flip coordinates without reversing everything else?
PS: I know there's no reproducible example here, I'll try to add one, but if someone has already stumbled upon the answer to this problem that might be common, I'll post as is for the moment.
Edit: made some reprex. Let's say my data is this:
df = tibble(origin = c('2000s', '1990s') %>% rep(2),
            region = c('South', 'North') %>% rep(2) %>% sort,
            value = 1:4) %>%
  mutate(origin = factor(origin, levels = c('1990s', '2000s')),
         region = factor(region, levels = c('North', 'South')))
colors = c('red', 'blue')
#   origin region value
#   <fct>  <fct>  <int>
# 1 2000s  North      1
# 2 1990s  North      2
# 3 2000s  South      3
# 4 1990s  South      4
If I plot regularly, everything comes ordered (90s first, 00s second, North first, South second):
df %>%
  ggplot(aes(x = region, fill = origin, y = value)) +
  geom_bar(stat = "identity", position = 'dodge', color = "white", alpha = 0.8) +
  scale_fill_manual(values = colors)
But, if I flip coordinates (just adding + coord_flip() to the code above) I get the following:
South is above North, 00s is above 90s, and the legend isn't in the same order as the bars. This is exactly the same if I input x = value and y = origin. So, to fix this I have to do the following:
df2 = df
df2[['region']] = fct_rev(df2[['region']]) # Change 1
df2[['origin']] = fct_rev(df2[['origin']]) # Change 2
df2 %>%
  ggplot(aes(x = value, fill = origin, y = region)) +
  geom_bar(stat = "identity", position = 'dodge', color = "white", alpha = 0.8) +
  guides(fill = guide_legend(reverse = T)) + # Change 3
  scale_fill_manual(values = rev(colors)) # Change 4
Bringing the correct orders:
Is there any less cumbersome way to achieve this?
The issue is that coord_flip() changes the ordering of bars within groups in a grouped bar plot.
One hacky workaround, taken from another answer, is to make the width negative, as in the geom_col() call below.
Adding scale_x_discrete(limits = rev) then puts North in the correct position:
library(tidyverse)
df %>%
  ggplot(aes(x = region, y = value, fill = origin)) +
  geom_col(position = position_dodge(), width = -0.4) +
  scale_fill_manual(values = c("red", "blue")) +
  coord_flip() +
  scale_x_discrete(limits = rev) +
  theme_minimal(base_size = 16) +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank())
coord_flip() does not flip everything around. Factors are plotted starting from the bottom, so 1990s will be below 2000s, and North will be below South.
The simplest fix I can see is to reverse your factor levels when creating the factors.
library(tidyverse)
df <- tibble(
  origin = c("2000s", "1990s") %>% rep(2),
  region = c("South", "North") %>% rep(2) %>% sort(),
  value = 1:4
) %>%
  mutate(
    ## just reverse the factor levels
    origin = factor(origin, levels = rev(c("1990s", "2000s"))),
    region = factor(region, levels = rev(c("North", "South")))
  )
colors <- c("red", "blue")
df %>%
  # switched x and y
  ggplot(aes(y = region, x = value, fill = origin)) +
  geom_bar(stat = "identity", position = "dodge", color = "white", alpha = 0.8) +
  ## this is to set the correct legend order and mapping to your colors
  scale_fill_manual(values = colors, breaks = rev(unique(df$origin)))
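If the factors already exist (for example, they arrive from another function), the levels can be reversed after the fact instead of at creation. A minimal base-R sketch, which should do the same job as the fct_rev() calls in the question; only the level order changes, not the values (the names region and region_rev are illustrative):

```r
# A factor with the "natural" level order from the question
region <- factor(c("North", "North", "South", "South"),
                 levels = c("North", "South"))

# Rebuild the factor with its levels reversed; the data values are untouched
region_rev <- factor(region, levels = rev(levels(region)))

levels(region_rev)        # level order is now South, North
as.character(region_rev)  # values are still North, North, South, South
```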

Is it possible to make a gantt chart in ggplot with multiple time intervals on the same chart?

I am trying to create a Gantt chart in R where I can have the timeline of Actual Start and Actual Finish vs. Proposed Start and Proposed Finish on the same chart as a comparison. So the y-axis would say "Warehouse" twice and the x-axis would have the segment for each time interval.
Below is the code I used to attempt this, but it clearly doesn't work.
As always, I appreciate any and all assistance. Thank you.
library(tidyverse)
project_data <- tibble(
  Project_Name = c("Warehouse"),
  Proposed_Start = c("05-01-2022"),
  Proposed_Finish = c("12-01-2022"),
  Actual_Start = c("07-01-2022"),
  Actual_Finish = c("12-31-2022")
)
project_data %>%
  ggplot() +
  aes(x = Proposed_Start, xend = Proposed_Finish,
      y = Project_Name, yend = Project_Name,
      color = "green") +
  geom_segment(size = 8) +
  aes(x = Actual_Start, xend = Actual_Finish,
      y = Project_Name, yend = Project_Name,
      color = "red") +
  geom_segment(size = 8) +
  labs(title = "Comparing Project Proposed and Actual Dates",
       x = "Start and Finish Dates",
       y = "Project Name")
You can use pivot_longer, which will put your start dates in one column, your finish dates in a different column, and create a column to label the rows as "Proposed" and "Actual". You can dodge the lines so they are both visible at the same point on the y axis. This is easier if you switch to geom_linerange. After pivoting, you only need a single geom layer, since you can map the Proposed/Actual column to colour.
This should work for multiple projects in the same plot.
project_data %>%
  pivot_longer(-1, names_sep = "_", names_to = c("Type", ".value")) %>%
  ggplot(aes(xmin = Start, xmax = Finish, color = Type,
             y = Project_Name)) +
  geom_linerange(size = 8, position = position_dodge(width = 0.5)) +
  labs(title = "Comparing Project Proposed and Actual Dates",
       x = "Start and Finish Dates",
       y = "Project Name") +
  scale_colour_manual(values = c("orange", "deepskyblue4")) +
  theme_minimal(base_size = 16)
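One caveat: in the question's data the dates are character strings, so ggplot2 would treat the x axis as discrete rather than as a time axis. Below is a hedged sketch of converting them to Date after pivoting, assuming the strings are month-day-year (which "12-31-2022" suggests). The name long_data is made up for illustration:

```r
library(dplyr)
library(tidyr)

# Same data as in the question
project_data <- tibble(
  Project_Name    = "Warehouse",
  Proposed_Start  = "05-01-2022",
  Proposed_Finish = "12-01-2022",
  Actual_Start    = "07-01-2022",
  Actual_Finish   = "12-31-2022"
)

long_data <- project_data %>%
  # One row per Type (Proposed / Actual), with Start and Finish columns
  pivot_longer(-Project_Name, names_sep = "_", names_to = c("Type", ".value")) %>%
  # Assumption: the strings are month-day-year
  mutate(across(c(Start, Finish), ~ as.Date(.x, format = "%m-%d-%Y")))

long_data
```

long_data can then be passed to the ggplot() call in place of the pivot_longer() step, so that the intervals are positioned on a continuous date axis.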

Highlighting lines and gray out rest in multiple line chart with ggplot2?

I have generated a multiple line chart in ggplot with different countries and want to colour the top 10 and grey out the rest.
When I assign the colours black and red, it colours the first two countries in the legend. However, I want to colour countries further down the list, such as the US, India and Brazil. Help much appreciated, thanks.
This is what I have:
and the code here:
ggplot(data = y, aes(x = Date, y = Deaths, color = Country)) +
  geom_line() +
  scale_color_manual(values = c("black",
                                "red",
                                rep("gray", 196)))
You first need to order your countries according to the number of deaths, then in scale_color_manual you need the first 10 colours to be of your choice, not just the first two:
library(ggplot2)
y$Country <- reorder(y$Country, -y$Deaths)
ggplot(data = y, aes(x = Date, y = Deaths, color = Country)) +
  geom_line() +
  # 10 highlight colours, then gray for every remaining country
  scale_color_manual(values = c(rep(c("black", "red"), each = 5),
                                rep("gray", nlevels(y$Country) - 10))) +
  guides(color = guide_none())
Note that since you didn't share your data, I made some up with a similar structure and the same names as yours so that the above code should also work on your own data set.
Made-up data
set.seed(1)
y <- data.frame(
  Deaths = c(replicate(198, 1000 * cumprod(runif(100, 1, 1 + sample(10, 1)/100)))),
  Date = rep(seq(as.POSIXct('2020-01-01'), as.POSIXct('2022-01-01'), len = 100),
             198),
  Country = factor(rep(1:198, each = 100)))
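A variation on the same idea, sketched under assumptions: rather than relying on the order of the factor levels, build a named colour vector so that each highlighted country is matched by name. The toy data and the names toy, peak, top and cols are made up for illustration, and "top" here means the highest peak value, which may not be the ranking you want:

```r
# Toy data: four countries, three observations each (made up)
toy <- data.frame(
  Country = rep(c("A", "B", "C", "D"), each = 3),
  Deaths  = c(1, 2, 3,  5, 9, 4,  2, 2, 2,  7, 8, 6)
)

# Peak value per country, then the names of the two highest
peak <- tapply(toy$Deaths, toy$Country, max)
top  <- names(sort(peak, decreasing = TRUE))[1:2]

# Start with gray for everyone, then overwrite the highlighted countries
cols <- setNames(rep("gray", length(peak)), names(peak))
cols[top] <- c("black", "red")

cols
```

Passing this vector as scale_color_manual(values = cols) colours each line by country name, so the result does not depend on how the factor levels happen to be ordered.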

Legend for bar chart with horizontal bars

I am using ggplot2 to produce a bar chart and I would like to include my main result as well as a "gold standard" on the same chart. I have tried a couple of methods but I am not able to produce an appropriate legend for the chart.
Method 1
Here I use geom_col() for my main result and geom_errorbar() for my "gold standard". I don't know how to show a simple legend (red = gold standard, blue = score) to match this chart. Additionally, I don't like that the error bar overlaps the axis grid line at 1.00 (instead of meeting it exactly).
library(tidyverse)
# tibble() replaces the deprecated data_frame()
chart_A_data <- tibble(students = c("Alice", "Bob", "Charlie"),
                       score = c(0.5, 0.7, 0.8),
                       max_score = c(1, 1, 1))
chart_A <- ggplot(chart_A_data, aes(x = students, y = score)) +
  geom_col(fill = "blue") +
  geom_errorbar(aes(ymin = max_score, ymax = max_score),
                size = 2, colour = "red") +
  ggtitle("Chart A", subtitle = "Use errorbars to show \"gold standard\"")
chart_A
Method 2
Here I create dummy variables and produce a stacked bar chart using geom_bar() and then make the unused dummy variable transparent. I am happy with how precise this method is, but I don't know how to remove the unused dummy variable from my legend. Additionally, in this case I need to treat any score of 1.00 as a special case (i.e. set it to 0.99 to make space for the "gold standard").
chart_B_data <- chart_A_data %>%
  select(-max_score) %>%
  # create dummy variables for stacked bars, note: error if score>0.99
  mutate(max_score_line = 0.01) %>%
  mutate(blank_fill = 0.99 - score) %>%
  gather(stat_level, pct, -students) %>%
  # set as factor to control order of stacked bars
  mutate(stat_level = factor(stat_level,
                             levels = c("max_score_line", "blank_fill", "score"),
                             labels = c("max", "", "score")))
chart_B <- ggplot(data = chart_B_data,
                  aes(x = students, y = pct, fill = stat_level, alpha = stat_level)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_manual(values = c("red", "pink", "blue")) +
  scale_alpha_manual(values = c(1, 0, 1)) +
  ggtitle("Chart B", subtitle = "Create dummy variables and use stacked bar chart")
chart_B
I don't mind if there is a completely different way I should be approaching this, but I really would like to be able to show a gold standard on my bar chart with a simple concise legend. I will be writing a script to do 50-60 of these charts so I don't want to have too many "special cases" to think about.
In case there's only one max score: This may seem a little hacky (and probably not that beautiful), but it does the job:
ggplot(chart_A_data, aes(x = students, y = score)) +
  geom_col() +
  geom_hline(yintercept = chart_A_data$max_score)
Another one:
ggplot(chart_A_data, aes(x = students,
                         y = score,
                         fill = students)) +
  geom_col() +
  # factor() first: as.numeric() on a character column would give NA
  geom_segment(aes(x = as.numeric(factor(students)) - .15,
                   xend = as.numeric(factor(students)) + .15,
                   y = max_score,
                   yend = max_score,
                   color = students))
The version above handles the case where each student has a different maximum score (you may need to play with the hard-coded 0.15 until you find something suitable).
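The offsets work because a discrete axis places its categories at the integers 1, 2, 3, ... in level order, so the position plus or minus a half-width gives the end points of each segment. A small sketch of that arithmetic (0.15 is the hard-coded value from above; the variable names pos, half_width and segments are illustrative):

```r
students <- c("Alice", "Bob", "Charlie")

# Going through factor() yields the integer position of each category;
# as.numeric() on the raw character vector would return NA instead
pos <- as.numeric(factor(students))

half_width <- 0.15
segments <- data.frame(
  students = students,
  x        = pos - half_width,  # left end of the segment
  xend     = pos + half_width   # right end of the segment
)

segments
```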
Edit after the OP clarified request:
ggplot(chart_A_data, aes(x = students,
                         y = score)) +
  geom_col(aes(fill = "blue")) +
  # factor() first: as.numeric() on a character column would give NA
  geom_segment(aes(x = as.numeric(factor(students)) - .25,
                   xend = as.numeric(factor(students)) + .25,
                   y = max_score,
                   yend = max_score, color = "red"),
               size = 1.7) +
  scale_fill_identity(name = "",
                      guide = "legend",
                      labels = "Score") +
  scale_color_manual(name = "",
                     values = c("red" = "red"),
                     labels = "Max Score")
Which produces:

Unable to add legend to economic time series chart

I'm attempting to add a legend to a time series chart and I've so far been unable to get any traction. I've provided the working code below, which pulls three economic data series into one chart and applies several changes to get in a format/overall aesthetic that I'd like. I should also add that the chart is graphing the y/y change of quarterly data sets.
I've only been able to find examples of individuals using scale_colour_manual to add a legend - I've provided code that I put together below.
Ideally, the legend just needs to appear to the right of the graph with the color and line chart.
Any help would be greatly appreciated!
library(quantmod)
library(TTR)
library(ggthemes)
library(tidyverse)
library(scales) # provides percent, used for the y-axis labels
Nondurable <- getSymbols("PCND", src = "FRED", auto.assign = F)
Nondurable$chng <- ROC(Nondurable$PCND, 4)
Durable <- getSymbols("PCDG", src = "FRED", auto.assign = F)
Durable$chng <- ROC(Durable$PCDG, 4)
Services <- getSymbols("PCESV", src = "FRED", auto.assign = F)
Services$chng <- ROC(Services$PCESV, 4)
ggplot() +
  geom_line(data = Nondurable, aes(x = Index, y = chng), color = "#5b9bd5", size = 1, linetype = "solid") +
  geom_line(data = Durable, aes(x = Index, y = chng), color = "#00b050", size = 1, linetype = "longdash") +
  geom_line(data = Services, aes(x = Index, y = chng), color = "#ed7d31", size = 1, linetype = "twodash") +
  theme_tufte() +
  scale_y_continuous(labels = percent, limits = c(-0.01, .09)) +
  xlim(as.Date(c('1/1/2010', '6/30/2019'), format = "%d/%m/%Y")) +
  labs(y = "Percent Change", x = "", caption = "Seasonally Adjusted Annual Rate. Retrieved from FRED & U.S. Bureau of Economic Analysis") +
  ggtitle("Year-over-Year Spending Trend Changes of the US Consumer") +
  scale_colour_manual(name = 'Legend',
                      guide = 'legend',
                      values = c('Nondurable' = '#5b9bd5',
                                 'Durable' = '#00b050',
                                 'Services' = '#ed7d31'),
                      labels = c('Nondurable',
                                 'Durable',
                                 'Services'))
I receive the following warning messages when I run the program (the chart still plots though).
Warning messages:
1: Removed 252 rows containing missing values (geom_path).
2: Removed 252 rows containing missing values (geom_path).
3: Removed 252 rows containing missing values (geom_path).
There are two reasons you are receiving this warning:
The bulk of the rows are removed because of your limits. When you use xlim() or scale_y_continuous(..., limits = ...), ggplot removes the values beyond these limits from your data before plotting and displays that warning as an FYI. After commenting out both of those lines, you will still see a message about removed values, but with a much smaller number.
That remaining message appears because you have NA values in the first 4 rows of column chng. This is true in all 3 datasets.
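The leading NA values follow from how a 4-period change is computed: the first four observations have nothing to compare against. A base-R sketch of the same idea (an illustration with made-up numbers, not TTR's implementation of ROC()):

```r
# Eight made-up quarterly values
x <- c(100, 102, 104, 106, 108, 111, 113, 116)

# Log change versus the value four periods earlier; the first four
# positions have no earlier value, so they are padded with NA
chng <- c(rep(NA, 4), diff(log(x), lag = 4))

# Positions of the missing values
which(is.na(chng))
```

If the goal is only to zoom in, coord_cartesian(xlim = ..., ylim = ...) restricts the visible range without dropping rows, which avoids the limit-related part of the warning.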
For the legend to show, you need to map something that differentiates the lines inside aes(), as in aes(..., color = "Nondurable"). See if this solution works for you:
ggplot() +
  geom_line(data = Nondurable, aes(x = Index, y = chng, color = "Nondurable"), size = 1, linetype = "solid") +
  geom_line(data = Durable, aes(x = Index, y = chng, color = "Durable"), size = 1, linetype = "longdash") +
  geom_line(data = Services, aes(x = Index, y = chng, color = "Services"), size = 1, linetype = "twodash") +
  theme_tufte() +
  labs(
    y = "Percent Change",
    x = "",
    caption = "Seasonally Adjusted Annual Rate. Retrieved from FRED & U.S. Bureau of Economic Analysis"
  ) +
  ggtitle("Year-over-Year Spending Trend Changes of the US Consumer") +
  scale_colour_manual(
    name = "Legend",
    # named values tie each colour to its series; unnamed values would be
    # assigned in alphabetical order of the series names
    values = c(Nondurable = "#5b9bd5", Durable = "#00b050", Services = "#ed7d31"),
    breaks = c("Nondurable", "Durable", "Services")
  ) +
  scale_x_date(limits = as.Date(c("2010-01-01", "2019-02-01")))
