I am using ggplot2 to produce a bar chart and I would like to include my main result as well as a "gold standard" on the same chart. I have tried a couple of methods but I am not able to produce an appropriate legend for the chart.
Method 1
Here I use geom_col() for my main result and geom_errorbar() for my "gold standard". I don't know how to show a simple legend (red = gold standard, blue = score) to match this chart. Additionally, I don't like that the error bar overlaps the axis grid line at 1.00 (instead of meeting it exactly).
library(tidyverse)
chart_A_data <- tibble(students = c("Alice", "Bob", "Charlie"),
score = c(0.5, 0.7, 0.8),
max_score = c(1, 1, 1))
chart_A <- ggplot(chart_A_data, aes(x = students, y = score)) +
geom_col(fill = "blue") +
geom_errorbar(aes(ymin = max_score, ymax = max_score),
size = 2, colour = "red") +
ggtitle("Chart A", subtitle = "Use errorbars to show \"gold standard\"")
chart_A
Method 2
Here I create dummy variables and produce a stacked bar chart using geom_bar() and then make the unused dummy variable transparent. I am happy with how precise this method is but I don't know how to remove the unused dummy variable from my legend. Additionally, in this case I need to treat any score of 1.00 as a special case (i.e. set it to 0.99 to make space for the "gold standard").
chart_B_data <- chart_A_data %>%
select(-max_score) %>%
# create dummy variables for stacked bars, note: error if score>0.99
mutate(max_score_line = 0.01) %>%
mutate(blank_fill = 0.99 - score) %>%
gather(stat_level, pct, -students) %>%
# set as factor to control order of stacked bars
mutate(stat_level = factor(stat_level,
levels = c("max_score_line", "blank_fill", "score"),
labels = c("max", "", "score")))
chart_B <- ggplot(data = chart_B_data,
aes(x = students, y = pct, fill = stat_level, alpha = stat_level)) +
geom_bar(stat = "identity", position = "stack") +
scale_fill_manual(values = c("red", "pink", "blue")) +
scale_alpha_manual(values = c(1,0,1)) +
ggtitle("Chart B", subtitle = "Create dummy variables and use stacked bar chart")
chart_B
I don't mind if there is a completely different way I should be approaching this, but I really would like to be able to show a gold standard on my bar chart with a simple concise legend. I will be writing a script to do 50-60 of these charts so I don't want to have too many "special cases" to think about.
In case there's only one max score: This may seem a little hacky (and probably not that beautiful), but it does the job:
ggplot(chart_A_data, aes(x = students, y = score))+
geom_col()+
geom_hline(yintercept = chart_A_data$max_score)
Another one:
ggplot(chart_A_data, aes(x = students,
y = score,
fill = students))+
geom_col()+
geom_segment(aes(x = as.numeric(factor(students))-.15,
xend = as.numeric(factor(students))+.15,
y = max_score,
yend = max_score,
color = students))
The geom_segment() version above also covers the case where the maximum score varies by student (you may need to play with the hard-coded 0.15 until you find something suitable).
Edit after the OP clarified request:
ggplot(chart_A_data, aes(x = students,
y = score))+
geom_col(aes(fill = "blue"))+
geom_segment(aes(x = as.numeric(factor(students))-.25,
xend = as.numeric(factor(students))+.25,
y = max_score,
yend = max_score, color = "red"),
size = 1.7)+
scale_fill_identity(name = "",
guide = "legend",
labels = "Score")+
scale_color_manual(name = "",
values = c("red" = "red"),
labels = "Max Score")
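Since the plan is to script 50-60 of these charts, the same idea can be wrapped in a function. This is only a sketch: the name plot_gold_standard, the half_width argument and the legend labels are made up for illustration, and linewidth assumes ggplot2 3.4.0 or later (use size on older versions). It maps constant strings inside aes() so that both the fill and the colour scale produce a legend entry.

```r
library(ggplot2)

# Hypothetical helper (not part of the original answer): bars for the score,
# a short horizontal segment for each student's gold standard, and one
# legend entry for each.
plot_gold_standard <- function(data, half_width = 0.25) {
  # Convert to a factor first so as.numeric() yields the bar positions 1, 2, 3, ...
  data$students <- factor(data$students)
  ggplot(data, aes(x = students, y = score)) +
    geom_col(aes(fill = "Score")) +
    geom_segment(aes(x = as.numeric(students) - half_width,
                     xend = as.numeric(students) + half_width,
                     y = max_score, yend = max_score,
                     colour = "Gold standard"),
                 linewidth = 1.7) +
    # Named values tie each legend label to its colour; name = NULL drops the title.
    scale_fill_manual(name = NULL, values = c("Score" = "blue")) +
    scale_colour_manual(name = NULL, values = c("Gold standard" = "red"))
}

chart_A_data <- data.frame(students = c("Alice", "Bob", "Charlie"),
                           score = c(0.5, 0.7, 0.8),
                           max_score = c(1, 1, 1))
p <- plot_gold_standard(chart_A_data)
```

Because the segment is drawn at max_score itself, a score of exactly 1.00 needs no special handling.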
I'm creating a bar chart with a pattern for a subset of the bars, and I want to add error bars.
However, I'm having trouble lining up the error bars with the bars: I want them to appear centered on each bar. How do I do this? Moreover, the legend currently does not clearly distinguish the striped and non-striped bars as corresponding to the treated and untreated groups.
Finally, I'd like to create a version of this plot which stacks adjacent bars (i.e. bars within each facet_grid). Any tips on how to do that would be much appreciated.
The code I'm using is:
library(ggplot2)
library(tidyverse)
library(ggpattern)
models = c("a", "b")
task = c("1","2")
ratios = c(0.3, 0.4)
standard_errors = c(0.02, 0.02)
ymax = ratios + standard_errors
ymin = ratios - standard_errors
colors = c("#F39B7FFF", "#8491B4FF")
df <- data.frame(task = task, ratios = ratios)
df <- df %>% mutate(filler = 1-ratios)
df <- df %>% gather(key = "obs", value = "ratios", -1)
df$upper <- df$ratios + c(standard_errors,standard_errors)
df$models <- c(models,models)
df$lower <- df$ratios - c(standard_errors,standard_errors)
df$col <- c(colors,colors)
df$group <- paste(df$task, df$models, sep="-")
df$treated <- "yes"
df[df$ratios<0.5,]$treated = "no"
p <- ggplot(df, aes(x = group, y = ratios, fill = col, ymin = lower, ymax = upper)) +
stat_summary(aes(pattern=treated),
fun = "mean", position=position_dodge(),
geom = "bar_pattern", pattern_fill="black", colour="black") +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, position=position_dodge(0.9)) +
scale_pattern_manual(values=c("none", "stripe"))+ #edited part
facet_grid(.~task,
scales = "free_x", # Let the x axis vary across facets.
space = "free_x", # Let the width of facets vary and force all bars to have the same width.
switch = "x") + guides(colour = guide_legend(nrow = 1)) +
guides(fill = "none")
p
Here is an option
df %>%
ggplot(aes(x = models, y = ratios)) +
geom_col_pattern(
aes(fill = col, pattern = treated),
pattern_fill = "black",
colour = "black",
pattern_key_scale_factor = 0.2,
position = position_dodge()) +
geom_errorbar(
aes(ymin = lower, ymax = upper, group = interaction(task, treated)),
width = 0.2,
position = position_dodge(0.9)) +
facet_grid(~ task, scales = "free_x") +
scale_pattern_manual(values = c("none", "stripe")) +
scale_fill_identity()
A few comments:
I don't understand the point of creating group. IMO this is unnecessary. TBH, I also don't understand the point of models and task: if task = "1" then models = "a"; if task = "2" then models = "b"; so task and models are redundant as they encode the same thing (whether you call it "1"/"2" or "a"/"b").
The reason why you (originally) didn't see a pattern in the legend is because of the scale factor in the legend key. As per ?scale_col_pattern, you can adjust this with the pattern_key_scale_factor parameter. Here, I've chosen pattern_key_scale_factor = 0.2 but you may want to play with different values.
The reason why the error bars didn't align with the dodged bars was because geom_errorbar didn't know that there are different task-treated combinations. We can fix this by explicitly defining a group aesthetic given by the combination of task & treated values. The reason why you don't need this in geom_col_pattern is because you already allow for different treated values through the pattern aesthetic.
You want to use scale_fill_identity() if you already have actual colour values defined in the data.frame.
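The point about the group aesthetic can be checked without ggpattern. The following is a minimal sketch on made-up data (the toy data frame and its column names are assumptions, not the data from the question): with an explicit group in geom_errorbar() and the same dodge width in both layers, the error bars end up at the same x positions as the bars.

```r
library(ggplot2)

# Made-up data: two bars per x position, distinguished by `treated`.
toy <- data.frame(
  task    = rep(c("1", "2"), each = 2),
  treated = rep(c("no", "yes"), times = 2),
  ratio   = c(0.30, 0.70, 0.40, 0.60),
  se      = 0.02
)

p_dodge <- ggplot(toy, aes(x = task, y = ratio)) +
  geom_col(aes(fill = treated), position = position_dodge(width = 0.9)) +
  geom_errorbar(
    # `group` tells the error-bar layer which bars exist within each x position
    aes(ymin = ratio - se, ymax = ratio + se, group = treated),
    width = 0.2,
    position = position_dodge(width = 0.9)
  )

# Dodged x positions of the bars (layer 1) and the error bars (layer 2)
bar_x <- layer_data(p_dodge, 1)$x
err_x <- layer_data(p_dodge, 2)$x
```

Dropping group = treated from the error-bar layer leaves only task as a grouping variable there, so the error bars are no longer dodged.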
I have the below dataset;
Player              Goals  Shots
Regan Charles-Cook  10     32
Tony Watt           9      36
Bruce Anderson      8      26
Liam Boyce          8      44
Kyogo Furuhashi     8      31
Alfredo Morelos     8      80
Christian Ramirez   8      41
Liel Abada          7      57
Martin Boyle        7      43
Kevin van Veen      7      45
I am attempting a dumbbell chart and so far have the following code;
library(tidyverse)
library(readxl) # provides read_excel()
library(ggalt) # provides geom_dumbbell()
theme_set(theme_bw())
data <- read_excel("SPL_Goals.xlsx")
data %>%
#the below code sets out the initial plot template without the data
ggplot(aes(x= Goals, xend = Shots, y= Player)) +
#below code inputs the data viz on to the plot
geom_dumbbell(
size = 1.5, color = "black", size_x = 10, #size=1.5 dictates black line size
size_xend = 3, colour_x = "green",
colour_xend = "red") +
labs(
title = "Scotland; Goals v Shots", #add title
subtitle = "Top 10; Matchday 22", #add subtitle
x = "Total", y = "Player"
)+
geom_text(aes(label = Goals))
This produces the below chart;
My query is how do I order the chart so Goals (the green circle) is ascending and also add a legend to show Goals (green) and Shots (red)? I have tried mutate, reorder and fct_reorder but I am doing something wrong as none of these are working.
To change the ordering of the y-axis, you need to reorder your y-axis variable based on the value that you want to order by. In your case, this means reordering Player based on Goals:
data %>% mutate(Player = reorder(Player, Goals)) %>% …
Next, creating a manual legend for geom_dumbbell doesn’t seem possible, or at least it isn’t obvious to me how. However, if you draw the chart manually, you can add a manual legend. This requires several things:
Create the dumbbell shape manually by plotting a geom_segment and two geom_points.
Create an aes for the colour, which will get mapped into the legend.
Create a manual color scale which we use to translate the mapping into actual colours, and to draw the legend.
Putting all that together:
data %>%
mutate(Player = reorder(Player, Goals)) %>%
ggplot(aes(x = Goals, y = Player)) +
geom_segment(aes(xend = Shots, yend = Player), color = "black", size = 1.5) +
geom_point(aes(color = "Goals"), size = 10) +
geom_point(aes(x = Shots, color = "Shots"), size = 3) +
geom_text(aes(label = Goals), color = "black") +
scale_color_manual(
values = c(Goals = "green", Shots = "red"),
guide = guide_legend(title = "", override.aes = list(size = 3))
) +
labs(
title = "Scotland; Goals v Shots",
subtitle = "Top 10; Matchday 22",
x = "Total", y = "Player"
)
We use override.aes to specify a fixed size for the points inside the legend. Without this, ‘ggplot2’ would overlay two points of different sizes. We also set the title of the legend to "" because the default title would be “colour”, and a title doesn’t seem necessary here.
I’ve also used theme_set(theme_bw() + theme(legend.position = "bottom")) to generate the image above.
The above is still ordered slightly weirdly, because for players with the same number of goals the ordering is still arbitrary. I would probably order those players by (descending) attempted goal shots. That is, it’s better for a player to have scored the same number of goals with fewer attempts.
Unfortunately reorder doesn’t support such an ordering directly. Instead, we need to resort to arranging the entire table and then reorder players by their row number:
data %>%
arrange(Goals, -Shots) %>%
mutate(Player = reorder(Player, row_number())) %>%
ggplot(aes(x = Goals, y = Player)) +
geom_segment(aes(xend = Shots, yend = Player), color = "black", size = 1.5) +
geom_point(aes(color = "Goals"), size = 10) +
geom_point(aes(x = Shots, color = "Shots"), size = 3) +
geom_text(aes(label = Goals), color = "black") +
scale_color_manual(
values = c(Goals = "green", Shots = "red"),
guide = guide_legend(title = "", override.aes = list(size = 3))
) +
labs(
title = "Scotland; Goals v Shots",
subtitle = "Top 10; Matchday 22",
x = "Total", y = "Player"
)
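To see which order the arrange() step produces independently of the plot, the factor levels can be inspected directly. This is a sketch using the data transcribed from the table in the question; the names goals and ordered are made up, and factor(levels = ...) is used in place of reorder(Player, row_number()) as a substitution that should give the same level order.

```r
library(dplyr)

# Data transcribed from the table in the question.
goals <- data.frame(
  Player = c("Regan Charles-Cook", "Tony Watt", "Bruce Anderson", "Liam Boyce",
             "Kyogo Furuhashi", "Alfredo Morelos", "Christian Ramirez",
             "Liel Abada", "Martin Boyle", "Kevin van Veen"),
  Goals = c(10, 9, 8, 8, 8, 8, 8, 7, 7, 7),
  Shots = c(32, 36, 26, 44, 31, 80, 41, 57, 43, 45)
)

ordered <- goals %>%
  # ascending goals; within equal goals, most shots first
  arrange(Goals, desc(Shots)) %>%
  # freeze the current row order as the factor level order
  mutate(Player = factor(Player, levels = Player))

player_levels <- levels(ordered$Player)
```

ggplot2 draws the first level of a discrete y axis at the bottom, so the player with the most goals ends up at the top of the chart.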
I am struggling to set the correct legend for a geom_point plot with a loess regression when two data sets are used.
I have a data set summarizing activity over a day. On the same graph I plot all the activity recorded per hour and per day, a regression curve smoothed with a loess function, and the mean for each hour across all days.
To be more precise, here is an example of the first code and the graph it returns, without a legend, which is exactly what I expected:
# first graph: gives what I expected, but with no legend
p <- ggplot(dat1, aes(x = Hour, y = value)) +
geom_point(color = "darkgray", size = 1) +
geom_point(data = dat2, mapping = aes(x = Hour, y = mean),
color = 20, size = 3) +
geom_smooth(method = "loess", span = 0.2, color = "red", fill = "blue")
and the graph (the grey dots are all the data, per hour and per day; the red curve is the loess regression; the blue dots are the means for each hour):
When I tried to set the legend, I failed to produce one that explains both kinds of dots (data in grey, mean in blue) and the loess curve (in red). See below some examples of what I tried.
# second graph: gives what I expected plus the legend I wanted for the loess,
# but without the legend for the dots
p <- ggplot(dat1, aes(x = Hour, y = value)) +
geom_point(color = "darkgray", size = 1) +
geom_point(data = dat2, mapping = aes(x = Hour, y = mean),
color = "blue", size = 3) +
geom_smooth(method = "loess", span = 0.2, aes(color = "red"), fill = "blue") +
scale_color_identity(name = "legend model", guide = "legend",
labels = "loess regression \n with confidence interval")
I obtained the correct legend for the curve only.
And another attempt:
# I tried to combine both data sets into a single one as follows, but it did not
# work at all and I really do not understand how legends work in ggplot2
# compared to the normal plots
A <- rbind(dat1, dat2)
p <- ggplot(A, aes(x = Heure, y = value, color = variable)) +
geom_point(data = subset(A, variable == "data"), size = 1) +
geom_point(data = subset(A, variable == "Moy"), size = 3) +
geom_smooth(method = "loess", span = 0.2, aes(color = "red"), fill = "blue") +
scale_color_manual(name = "légende",
labels = c("Data", "Moy", "loess regression \n with confidence interval"),
values = c("darkgray", "royalblue", "red"))
It appears that all the legend settings are mixed together in a "weird" way: there is a grey dot covered by a grey line, and then the same in blue and in red (for the 3 labels). All of them have a background filled in blue:
If you need to label the mean, you might need to be a bit creative, because it's not so easy to add a legend manually in ggplot.
I simulate something that looks like your data below.
library(tidyverse) # for %>%, group_by(), summarise() and ggplot()
dat1 = data.frame(
Hour = rep(1:24,each=10),
value = c(rnorm(60,0,1),rnorm(60,2,1),rnorm(60,1,1),rnorm(60,-1,1))
)
# classify this as raw data
dat1$Data = "Raw"
# calculate mean like you did
dat2 <- dat1 %>% group_by(Hour) %>% summarise(value=mean(value))
# classify this as mean
dat2$Data = "Mean"
# combine the data frames
plotdat <- rbind(dat1,dat2)
# add a dummy variable, we'll use it later
plotdat$line = "Loess-Smooth"
We make the basic dot plot first:
ggplot(plotdat, aes(x = Hour, y = value,col=Data,size=Data)) +
geom_point() +
scale_color_manual(values=c("blue","darkgray"))+
scale_size_manual(values=c(3,1),guide="none")
Note that for the size scale we set guide = "none" (guide = FALSE is deprecated in recent ggplot2 versions) so that it does not get its own legend. Now we add the loess smooth. One way to give it a legend entry is to map a linetype; since there is only one group, the legend will have just one entry:
ggplot(plotdat, aes(x = Hour, y = value,col=Data,size=Data)) +
geom_point() +
scale_color_manual(values=c("blue","darkgray"))+
scale_size_manual(values=c(3,1),guide="none")+
geom_smooth(data=subset(plotdat,Data=="Raw"),
aes(linetype=line),size=1,alpha=0.3,
method = "loess", span = 0.2, color = "red", fill = "blue")
I'm attempting to add a legend to a time series chart and I've so far been unable to get any traction. I've provided the working code below, which pulls three economic data series into one chart and applies several changes to get in a format/overall aesthetic that I'd like. I should also add that the chart is graphing the y/y change of quarterly data sets.
I've only been able to find examples of individuals using scale_colour_manual to add a legend - I've provided code that I put together below.
Ideally, the legend just needs to appear to the right of the graph with the color and line chart.
Any help would be greatly appreciated!
library(quantmod)
library(TTR)
library(ggthemes)
library(tidyverse)
Nondurable <- getSymbols("PCND", src = "FRED", auto.assign = F)
Nondurable$chng <- ROC(Nondurable$PCND,4)
Durable <- getSymbols("PCDG", src = "FRED", auto.assign = F)
Durable$chng <- ROC(Durable$PCDG,4)
Services <- getSymbols("PCESV", src = "FRED", auto.assign = F)
Services$chng <- ROC(Services$PCESV, 4)
ggplot() +
geom_line(data = Nondurable, aes(x = Index, y = chng), color = "#5b9bd5", size = 1, linetype = "solid") +
geom_line(data = Durable, aes(x = Index, y = chng), color = "#00b050", size = 1, linetype = "longdash") +
geom_line(data = Services, aes(x = Index, y = chng), color = "#ed7d31", size = 1, linetype = "twodash") +
theme_tufte() +
scale_y_continuous(labels = percent, limits = c(-0.01,.09)) +
xlim(as.Date(c('1/1/2010', '6/30/2019'), format="%d/%m/%Y")) +
labs(y = "Percent Change", x = "", caption = "Seasonally Adjusted Annual Rate. Retrieved from FRED & U.S. Bureau of Economic Analysis") +
ggtitle("Year-over-Year Spending Trend Changes of the US Consumer") +
scale_colour_manual(name = 'Legend',
guide = 'legend',
values = c('Nondurable' = '#5b9bd5',
'Durable' = '#00b050',
'Services' = '#ed7d31'),
labels = c('Nondurable',
'Durable',
'Services'))
I receive the following warning messages when I run the program (the chart still plots though).
Warning messages:
1: Removed 252 rows containing missing values (geom_path).
2: Removed 252 rows containing missing values (geom_path).
3: Removed 252 rows containing missing values (geom_path).
There are two reasons you are receiving this warning:
The bulk are being removed because of your limits. When you use xlim() or scale_y_continuous(..., limits = ...), ggplot removes the values beyond these limits from your data before plotting and displays that warning as an FYI. After commenting out both of those lines, you will still see a message about removed values, but with a much smaller number.
That smaller number is because you have NA values in the first 4 rows of column chng. This is true in all 3 datasets.
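The first point can be reproduced on a small made-up series. This is a sketch, not taken from the answer: the toy data frame is invented, and coord_cartesian() is shown only as an assumption about what you might want if the goal is to zoom without discarding data.

```r
library(ggplot2)

# Made-up series; some y values fall outside the limits c(-0.01, 0.09).
toy <- data.frame(x = 1:6, y = c(-0.05, 0.02, 0.04, 0.12, 0.06, 0.10))

# Scale limits: out-of-range values are turned into NA before drawing,
# which is what triggers the "Removed ... rows" warning when the plot is printed.
p_limits <- ggplot(toy, aes(x, y)) +
  geom_line() +
  scale_y_continuous(limits = c(-0.01, 0.09))

# Coordinate limits: the data are left untouched and the view is only zoomed.
p_zoom <- ggplot(toy, aes(x, y)) +
  geom_line() +
  coord_cartesian(ylim = c(-0.01, 0.09))

n_censored <- sum(is.na(layer_data(p_limits)$y))
n_na_zoom  <- sum(is.na(layer_data(p_zoom)$y))
```

Comparing n_censored with n_na_zoom shows how many rows the scale limits discard.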
For the legend to show, you need to put something differentiating the lines inside aes(), as in aes(..., color = "Nondurable"). See if this solution works for you:
ggplot() +
geom_line(data = Nondurable, aes(x = Index, y = chng, color = "Nondurable"), size = 1, linetype = "solid") +
geom_line(data = Durable, aes(x = Index, y = chng, color = "Durable"), size = 1, linetype = "longdash") +
geom_line(data = Services, aes(x = Index, y = chng, color = "Services"), size = 1, linetype = "twodash") +
theme_tufte() +
labs(
y = "Percent Change",
x = "",
caption = "Seasonally Adjusted Annual Rate. Retrieved from FRED & U.S. Bureau of Economic Analysis"
) +
ggtitle("Year-over-Year Spending Trend Changes of the US Consumer") +
scale_colour_manual(
name = "Legend",
# name each value so the colour is matched to its series rather than assigned alphabetically
values = c("Nondurable" = "#5b9bd5", "Durable" = "#00b050", "Services" = "#ed7d31")
) +
scale_x_date(limits = as.Date(c("2010-01-01", "2019-02-01")))
There are similar posts to this, namely here and here, but they address instances where both point color and size are continuous. Is it possible to:
Combine discrete colors and continuous point size within a single legend?
Within that same legend, add a description to each point in place of the numerical break label?
Toy data
xval = as.numeric(c("2.2", "3.7","1.3"))
yval = as.numeric(c("0.3", "0.3", "0.2"))
color.group = c("blue", "red", "blue")
point.size = as.numeric(c("200", "11", "100"))
description = c("descript1", "descript2", "descript3")
df = data.frame(xval, yval, color.group, point.size, description)
ggplot(df, aes(x=xval, y=yval, size=point.size)) +
geom_point(color = df$color.group) +
scale_size_continuous(limits=c(0, 200), breaks=seq(0, 200, by=50))
Doing what you originally asked - continuous + discrete in a single legend - in general doesn't seem to be possible even conceptually. The only sensible thing would be to have two legends for size, with a different color for each legend.
Now let's consider having a single legend. Given your "In my case, each unique combination of point size + color is associated with a description.", it sounds like there are very few possible point sizes. In that case, you could use both scales as discrete. But I believe even that is not enough as you use different variables for size and color scales. A solution then would be to create a single factor variable with all possible combinations of color.group and point.size. In particular,
df <- data.frame(xval, yval, f = interaction(color.group, point.size), description)
ggplot(df, aes(x = xval, y = yval, size = f, color = f)) +
geom_point() + scale_color_discrete(labels = 1:3) +
scale_size_discrete(labels = 1:3)
Here 1:3 stand in for the descriptions that you want, and you may also set the colors the way you like. For instance,
ggplot(df, aes(x = xval, y = yval, size = f, color = f)) +
geom_point() + scale_size_discrete(labels = 1:3) +
scale_color_manual(labels = 1:3, values = c("red", "blue", "green"))
However, we may also exploit color.group by using
ggplot(df, aes(x = xval, y = yval, size = f, color = f)) +
geom_point() + scale_size_discrete(labels = 1:3) +
scale_color_manual(labels = 1:3, values = gsub("(.*)\\..*", "\\1", sort(df$f)))
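To make the last step less opaque, here is what the gsub() call evaluates to on the toy data, using base R only. The names f, legend_colors and legend_labels are made up for this sketch; it also shows one way to get the descriptions in the same order as the legend entries, which is an assumption about how you would want to replace 1:3.

```r
color.group <- c("blue", "red", "blue")
point.size  <- c(200, 11, 100)
description <- c("descript1", "descript2", "descript3")

# One factor level per colour/size combination, e.g. "blue.200"
f <- interaction(color.group, point.size)

# sort() orders by factor level, which is also the order of the legend entries
used <- sort(f)

# Strip everything from the last dot onwards to recover the colour name
legend_colors <- gsub("(.*)\\..*", "\\1", as.character(used))

# Descriptions in the same order as the legend entries
legend_labels <- description[order(f)]
```

legend_labels could then be passed as labels to both scales in place of 1:3. Note that the regular expression splits on the last dot, so it would need adjusting if the sizes contained decimal points.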