I'm attempting to add a legend to a time series chart and so far I've been unable to get any traction. I've provided the working code below, which pulls three economic data series into one chart and applies several changes to get it into the format and overall aesthetic that I'd like. I should also add that the chart graphs the y/y change of quarterly data sets.
I've only been able to find examples of individuals using scale_colour_manual to add a legend - I've provided code that I put together below.
Ideally, the legend just needs to appear to the right of the graph, showing the color and line type of each series.
Any help would be greatly appreciated!
library(quantmod)
library(TTR)
library(ggthemes)
library(tidyverse)
library(scales) # provides percent(), used for the y-axis labels below
Nondurable <- getSymbols("PCND", src = "FRED", auto.assign = F)
Nondurable$chng <- ROC(Nondurable$PCND,4)
Durable <- getSymbols("PCDG", src = "FRED", auto.assign = F)
Durable$chng <- ROC(Durable$PCDG,4)
Services <- getSymbols("PCESV", src = "FRED", auto.assign = F)
Services$chng <- ROC(Services$PCESV, 4)
ggplot() +
geom_line(data = Nondurable, aes(x = Index, y = chng), color = "#5b9bd5", size = 1, linetype = "solid") +
geom_line(data = Durable, aes(x = Index, y = chng), color = "#00b050", size = 1, linetype = "longdash") +
geom_line(data = Services, aes(x = Index, y = chng), color = "#ed7d31", size = 1, linetype = "twodash") +
theme_tufte() +
scale_y_continuous(labels = percent, limits = c(-0.01,.09)) +
xlim(as.Date(c('1/1/2010', '6/30/2019'), format="%d/%m/%Y")) +
labs(y = "Percent Change", x = "", caption = "Seasonally Adjusted Annual Rate. Retrieved from FRED & U.S. Bureau of Economic Analysis") +
ggtitle("Year-over-Year Spending Trend Changes of the US Consumer") +
scale_colour_manual(name = 'Legend',
guide = 'legend',
values = c('Nondurable' = '#5b9bd5',
'Durable' = '#00b050',
'Services' = '#ed7d31'),
labels = c('Nondurable',
'Durable',
'Services'))
I receive the following warning messages when I run the program (the chart still plots though).
Warning messages:
1: Removed 252 rows containing missing values (geom_path).
2: Removed 252 rows containing missing values (geom_path).
3: Removed 252 rows containing missing values (geom_path).
There are two reasons you are receiving this warning:
The bulk of the rows are being removed because of your limits. When you use xlim() or scale_y_continuous(..., limits = ...), ggplot removes the values beyond these limits from your data before plotting and displays that warning as an FYI. After commenting out both of those lines, you will still see a message about removed values, but with a much smaller number.
That smaller number is there because you have NA values in the first 4 rows of column chng: ROC(..., 4) has nothing four quarters back to compare those rows against. This is true in all 3 datasets.
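To see where the removed rows come from, here is a minimal base-R sketch. The series is made up (it is not the FRED data), and the 4-quarter log change is only a stand-in for the ROC(x, 4) step, not a call to TTR itself.

```r
# Made-up quarterly series standing in for one of the FRED series
dates <- seq(as.Date("2008-01-01"), by = "quarter", length.out = 48)
value <- 100 * 1.01^seq_along(dates)

# 4-quarter log change: the first 4 rows have nothing to compare against
chng <- c(rep(NA_real_, 4), diff(log(value), lag = 4))
df <- data.frame(Index = dates, chng = chng)

# Rows that are NA before any limits are applied
n_na <- sum(is.na(df$chng))

# Non-NA rows that fall outside the x limits 2010-01-01 to 2019-02-01
in_window <- df$Index >= as.Date("2010-01-01") &
  df$Index <= as.Date("2019-02-01")
n_outside <- sum(!in_window & !is.na(df$chng))

cat("NA rows:", n_na, "\n")
cat("non-NA rows outside the x limits:", n_outside, "\n")
```

If the goal is only to zoom in, coord_cartesian(xlim = ..., ylim = ...) changes the visible window without dropping rows, so only the NA rows would still be reported.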
For the legend to show, you need to map something that differentiates the lines inside aes(), as in aes(..., color = "Nondurable"). See if this solution works for you:
ggplot() +
geom_line(data = Nondurable, aes(x = Index, y = chng, color = "Nondurable"), size = 1, linetype = "solid") +
geom_line(data = Durable, aes(x = Index, y = chng, color = "Durable"), size = 1, linetype = "longdash") +
geom_line(data = Services, aes(x = Index, y = chng, color = "Services"), size = 1, linetype = "twodash") +
theme_tufte() +
labs(
y = "Percent Change",
x = "",
caption = "Seasonally Adjusted Annual Rate. Retrieved from FRED & U.S. Bureau of Economic Analysis"
) +
ggtitle("Year-over-Year Spending Trend Changes of the US Consumer") +
scale_colour_manual(
name = "Legend",
# named values, so each colour follows its series instead of alphabetical order
values = c("Nondurable" = "#5b9bd5", "Durable" = "#00b050", "Services" = "#ed7d31")
) +
scale_x_date(limits = as.Date(c("2010-01-01", "2019-02-01")))
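An alternative sketch, assuming the three series are first combined into one long data frame: mapping a single column to both colour and linetype lets ggplot2 build the legend from the data. The values below are made up, and the names long, series, cols and types are illustrative, not from the question.

```r
library(ggplot2)

# Made-up values standing in for the three chng columns from the question
dates <- seq(as.Date("2010-01-01"), by = "quarter", length.out = 8)
long <- data.frame(
  Index  = rep(dates, times = 3),
  chng   = c(seq(0.01, 0.04, length.out = 8),
             seq(0.05, 0.02, length.out = 8),
             rep(0.03, 8)),
  series = rep(c("Nondurable", "Durable", "Services"), each = 8)
)

# Named vectors tie each colour and line type to its series by name
cols  <- c(Nondurable = "#5b9bd5", Durable = "#00b050", Services = "#ed7d31")
types <- c(Nondurable = "solid", Durable = "longdash", Services = "twodash")

p <- ggplot(long, aes(x = Index, y = chng, colour = series, linetype = series)) +
  geom_line() +
  # Same legend title on both scales, so ggplot2 merges them into one legend
  scale_colour_manual(name = "Legend", values = cols) +
  scale_linetype_manual(name = "Legend", values = types)
```

Calling print(p) draws the chart. Because the vectors are named, the legend entries stay correct even though ggplot2 orders the series alphabetically.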
I am trying to create a single chart from two created bar charts to show the differences in their distribution. I have both charts merging together, and the axis labels are correct. However, I cannot figure out how to get the bars in each section to be next to each other for comparison instead of overlaying. Data for this chart are two variables within the same DF. I am relatively new to r and new to ggplot so even plotting what I have was a challenge. Please be kind and I apologize if this is a question that has been answered before.
Here is the code I am using:
Labeled <- ggplot(NULL, aes(lab),position_dodge(.5)) + ggtitle("Figure 1. Comparison of Distribution of Age of Diagnosis and Age of Feeding Challenges")+
geom_bar(aes(x=AgeFactor,fill = "Age of Autism Diagnosis"), data = Graph, alpha = 0.5,width = 0.6) +
geom_bar(aes(x=FdgFactor,fill = "Feeding Challenge Onset"), data = Graph, alpha = 0.5,width=.6)+
scale_x_discrete(limits=c("0-6months","7-12months","1-1.99","2-2.99","3-3.99","4-4.99","5-5.99","6-6.99","7-7.99","8-8.99","9-9.99","10-10.99"))+
xlab("Age")+
ylab("")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
scale_fill_discrete(name = "")
and this is the graph it is creating for me:
I really appreciate any insight. This is my first time asking a question on stack too - so I am happy to edit/adjust info as needed.
The issue is that you plot from different columns of your dataset. To dodge your bars position_dodge requires a column to group the data by. To this end you could reshape your data to long format using e.g. tidyr::pivot_longer so that your two columns are stacked on top of each other and you get a new column containing the column or group names as categories.
Using some fake random example data, I first replicate your issue with your code:
set.seed(123)
levels <- c("0-6months", "7-12months", "1-1.99", "2-2.99", "3-3.99", "4-4.99", "5-5.99", "6-6.99", "7-7.99", "8-8.99", "9-9.99", "10-10.99")
Graph <- data.frame(
AgeFactor = sample(levels, 100, replace = TRUE),
FdgFactor = sample(levels, 100, replace = TRUE),
lab = 1:100
)
library(ggplot2)
ggplot(NULL, aes(lab), position_dodge(.5)) +
ggtitle("Figure 1. Comparison of Distribution of Age of Diagnosis and Age of Feeding Challenges") +
geom_bar(aes(x = AgeFactor, fill = "Age of Autism Diagnosis"), data = Graph, alpha = 0.5, width = 0.6) +
geom_bar(aes(x = FdgFactor, fill = "Feeding Challenge Onset"), data = Graph, alpha = 0.5, width = .6) +
scale_x_discrete(limits = c("0-6months", "7-12months", "1-1.99", "2-2.99", "3-3.99", "4-4.99", "5-5.99", "6-6.99", "7-7.99", "8-8.99", "9-9.99", "10-10.99")) +
xlab("Age") +
ylab("") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
scale_fill_discrete(name = "")
And now the fix using reshaping. Additionally I simplified your code a bit:
library(tidyr)
library(dplyr)
Graph_long <- Graph %>%
select(AgeFactor, FdgFactor) %>%
pivot_longer(c(AgeFactor, FdgFactor))
ggplot(Graph_long, aes(x = value, fill = name)) +
geom_bar(alpha = 0.5, width = 0.6, position = position_dodge()) +
scale_fill_discrete(labels = c("Age of Autism Diagnosis", "Feeding Challenge Onset")) +
scale_x_discrete(limits = levels) +
labs(x = "Age", y = NULL, fill = NULL, title = "Figure 1. Comparison of Distribution of Age of Diagnosis and Age of Feeding Challenges") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
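To make the reshaping step concrete, here is a small sketch on a three-row made-up data frame (the name wide is illustrative). It shows that pivot_longer() stacks the two columns into a name/value pair, which gives position_dodge() a grouping column to work with.

```r
library(tidyr)

# Tiny made-up data set with the two columns from the question
wide <- data.frame(
  AgeFactor = c("1-1.99", "2-2.99", "2-2.99"),
  FdgFactor = c("0-6months", "1-1.99", "1-1.99")
)

# Each original row becomes two rows: one per source column
long <- pivot_longer(wide, c(AgeFactor, FdgFactor))

# Counts per age category and source column: the bars that get dodged
counts <- table(long$value, long$name)
print(counts)
```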
I am trying to display data that includes non-detects. For the ND I want to have a circular outline at different sizes so that the lines do not overlap each other. I pretty much have what I want, but for the parameter cis-DCE the circular outline just makes the point look bigger instead of being a distinct outline. How do I attribute size to the parameter and also make the starting size larger?
I will include all of the code I am using for the graphing, but I am specifically working on this bit right now.
geom_point(aes(x= date, y = lrl, group = parm_nmShort, size = parm_nmShort), shape = 1) + #marking lower limit
I also know that I could use facet_wrap, and I've done that previously. Historically, though, this data has been shown in one graph (without identifying the NDs), and I do not want to drastically alter the display of the data and confuse anyone.
{
#graphing
# folder where you want the graphs to be saved:
results <- 'C:/Users/cbuckley/OneDrive - DOI/Documents/Projects/New Haven/Data/Graphs/'
{
VOC.graph <- function(df, na.rm = TRUE, ...){
df$parm_nmShort <- factor(df$parm_nm, levels = c("cis.1.2.Dichloroethene_77093",
"Trichloroethene_34485",
"Tetrachloroethene_34475"),
labels = c("cis-DCE", "TCE", "PCE"))
# create list of sites in data to loop over
site_list <- unique(df$site_nm)
# create for loop to produce ggplot2 graphs
for (i in seq_along(site_list)) {
# create plot for each county in df
plot <-
ggplot(subset(df, df$site_nm==site_list[i]),
aes(x = date, y = result,
group = parm_nmShort,
color = parm_nmShort)) +
geom_point() + #add data point plot
geom_line() + #add line plot
#geom_point(aes(y = lrl, group = parm_nmShort, shape = parm_nmShort)) +
geom_point(aes(x= date, y = lrl, group = parm_nmShort, size = parm_nmShort), shape = 1) + #marking lower limit
#scale_shape_manual(values = c("23","24","25")) + #create outlier shapes
#facet_wrap(~parm_nmShort) +
ggtitle(site_list[i]) + #name graphs well names
# theme(legend.position="none") + #removed legend
labs(x = "Year", y = expression(paste("Value, ug/L"))) + #add x and y label titles
theme_article() + #remove grey boxes, outline graph in black
theme(legend.title = element_blank()) + #removes legend title
scale_x_date(labels = date_format("%y"),
limits = as.Date(c("2000-01-01","2021-01-01"))) #+ # set x axis for all graphs
# geom_hline(yintercept = 5) #+ #add 5ug/L contaminant limit horizontal line
# theme(axis.text.x = element_text(angle = 45, size = 12, vjust = 1)) + #angles x axis titles 45 deg
# theme(aspect.ratio = 1) +
# scale_color_hue(labels = c("cic-DCE", "PCE", "TCE")) + #change label names
# scale_fill_discrete(breaks = c("PCE", "TCE", "cic-DCE"))
# Code below will let you block out below the resolution limit
# geom_ribbon(aes(ymin = 0, ymax = ###LRL###), fill ="white", color ="grey3") +
# geom_line(color ="black", lwd = 1)
#ggsave(plot,
# file=paste(results, "", site_list[i], ".png", sep=''),
# scale=1)
# print plots to screen
print(plot)
}
}
#run graphing function with long data set
VOC.graph(data)
}}
Well after a lot of playing around, I figured out the answer to my own question. I figured I'd leave the question up because none of the solutions I found online worked for me but this code did.
geom_point(aes(x= date, y = lrl, group = parm_nmShort, shape = parm_nmShort, size = parm_nmShort)) + #identify non detects
scale_shape_manual(values = c(1,1,1)) +
scale_size_manual(values = c(3,5,7)) +
I'm not very good at R, but for some reason when I didn't include the group and shape in the aes as parm_nmShort, I couldn't manually change the values. I don't know if it's because I have more than one geom_point in my whole script and so maybe it didn't know which one to change.
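For comparison, a minimal sketch on made-up data where the hollow shape is set as a constant and only size is mapped to the parameter. The data frame nd and the vector sizes are hypothetical; whether this variant behaves the same inside the full function from the question is untested.

```r
library(ggplot2)

# Made-up non-detect rows; column names follow the question
nd <- data.frame(
  date = as.Date(c("2005-06-01", "2005-06-01", "2005-06-01")),
  lrl  = c(0.5, 0.5, 0.5),
  parm_nmShort = factor(c("cis-DCE", "TCE", "PCE"),
                        levels = c("cis-DCE", "TCE", "PCE"))
)

# One size per parameter, named so the mapping does not depend on level order
sizes <- c("cis-DCE" = 3, "TCE" = 5, "PCE" = 7)

p <- ggplot(nd, aes(x = date, y = lrl, colour = parm_nmShort)) +
  geom_point(aes(size = parm_nmShort), shape = 1) +  # shape 1 = hollow circle
  scale_size_manual(values = sizes)
```

Starting the sizes at 3 rather than at the default minimum is what keeps the smallest ring from looking like an enlarged solid point.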
I am using ggplot2 to produce a bar chart and I would like to include my main result as well as a "gold standard" on the same chart. I have tried a couple of methods but I am not able to produce an appropriate legend for the chart.
Method 1
Here I use geom_col() for my main result and geom_errorbar() for my "gold standard". I don't know how to show a simple legend (red = gold standard, blue = score) to match this chart. Additionally, I don't like that the error bar overlaps the axis grid line at 1.00 (instead of meeting it exactly).
chart_A_data <- data_frame(students= c("Alice", "Bob", "Charlie"),
score = c(0.5, 0.7, 0.8),
max_score = c(1, 1 , 1))
chart_A <- ggplot(chart_A_data, aes(x = students, y = score)) +
geom_col(fill = "blue") +
geom_errorbar(aes(ymin = max_score, ymax = max_score),
size = 2, colour = "red") +
ggtitle("Chart A", subtitle = "Use errorbars to show \"gold standard\"")
chart_A
Method 2
Here I create dummy variables and produce a stacked bar chart using geom_bar() and then make the unused dummy variable transparent. I am happy with how precise this method is but I don't know how to remove the unused dummy variable from my legend. Additionally, In this case I need to treat any score of 1.00 as a special case (i.e. set it to 0.99 to make space for the "gold standard").
chart_B_data <- chart_A_data %>%
select(-max_score) %>%
# create dummy variables for stacked bars, note: error if score>0.99
mutate(max_score_line = 0.01) %>%
mutate(blank_fill = 0.99 - score) %>%
gather(stat_level, pct, -students) %>%
# set as factor to control order of stacked bars
mutate(stat_level = factor(stat_level,
levels = c("max_score_line", "blank_fill", "score"),
labels = c("max", "", "score")))
chart_B <- ggplot(data = chart_B_data,
aes(x = students, y = pct, fill = stat_level, alpha = stat_level)) +
geom_bar(stat = "identity", position = "stack") +
scale_fill_manual(values = c("red", "pink", "blue")) +
scale_alpha_manual(values = c(1,0,1)) +
ggtitle("Chart B", subtitle = "Create dummy variables and use stacked bar chart")
chart_B
I don't mind if there is a completely different way I should be approaching this, but I really would like to be able to show a gold standard on my bar chart with a simple concise legend. I will be writing a script to do 50-60 of these charts so I don't want to have too many "special cases" to think about.
In case there's only one max score: This may seem a little hacky (and probably not that beautiful), but it does the job:
ggplot(chart_A_data, aes(x = students, y = score))+
geom_col()+
geom_hline(yintercept = chart_A_data$max_score)
Another one:
ggplot(chart_A_data, aes(x = students,
y = score,
fill = students))+
geom_col()+
geom_segment(aes(x = as.numeric(factor(students))-.15, # factor() first: students is a character column
xend = as.numeric(factor(students))+.15,
y = max_score,
yend = max_score,
color = students))
The version above handles the case where the maximum score varies by student (you may need to play with the hard-coded 0.15 until you find something suitable).
Edit after the OP clarified request:
ggplot(chart_A_data, aes(x = students,
y = score))+
geom_col(aes(fill = "blue"))+
geom_segment(aes(x = as.numeric(factor(students))-.25, # factor() first: students is a character column
xend = as.numeric(factor(students))+.25,
y = max_score,
yend = max_score, color = "red"),
size = 1.7)+
scale_fill_identity(name = "",
guide = "legend",
labels = "Score")+
scale_color_manual(name = "",
values = c("red" = "red"),
labels = "Max Score")
I cannot figure out how to make the letter "R" in the annotate() function below italicised on my plot. I've tried adding in expression() before paste(), and using italic(), but that then pastes the section starting "round(cor..." as text, rather than the result of the calculation.
ggplot(subset(crossnat, !is.na(homicide) & !is.na(gdppercapita)),
aes(x = gdppercapita, y = homicide)) +
geom_point(alpha = 0.4) +
ggtitle("Figure 3: Relationship between GDP per capita ($) and homicide rate") +
labs(subtitle = "n = 177 (17 countries removed as either GDP per capita or homicide data unavailable",
x = "GDP per capita ($)",
y = "Number of homicides in 2013 (per 100k of population)") +
scale_y_continuous(breaks = c(0,15,30,45,60,75,90)) +
geom_smooth(method = "loess",
formula = y ~ x,
colour = "red",
size = 0.5) +
annotate(x = 50000, y = 75,
label = paste("R = ", round(cor(crossnat$gdppercapita, crossnat$homicide, use = "complete.obs"),3)),
geom = "text", size = 4)
Thanks
EDIT - the suggested possible duplicate does not seem to work for me. I think this might be due to the calculation of the correlation being embedded inside the annotate()?
This type of formatting is tricky. You need to pay attention to white space when parse = TRUE is used. To format the text, you need to paste in two steps. Let's create a simple reproducible example:
ggData <- data.frame(x=rnorm(100), y=rnorm(100) )
I recommend storing the text AND the correlation value R outside of the ggplot call to keep the code readable:
textPart1 <- "paste(italic(R), \" =\")" # check the ?annotate example for \" =\"
corVal <- round(cor(ggData$x, ggData$y, use = "complete.obs"), 3)
The trick is to paste the two variables with the sep="~" instead of the white space.
ggplot(ggData, aes(x = x, y = y) ) +
geom_point(alpha = 0.4) +
annotate("text", x = 2, y = 1.5,
label = paste(textPart1, corVal, sep="~"), size = 4 , parse=TRUE)
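A quick base-R check of why the separator matters, using a made-up correlation value. It calls parse(text = ...) directly as a stand-in for what has to happen to the label when parse = TRUE is set; the names label_ok and label_bad are illustrative.

```r
textPart1 <- "paste(italic(R), \" =\")"
corVal <- 0.123  # made-up correlation value

# Joined with "~", the label is a single plotmath expression
label_ok <- paste(textPart1, corVal, sep = "~")
parsed <- parse(text = label_ok)

# Joined with a space, the same two pieces no longer parse as one expression
label_bad <- paste(textPart1, corVal, sep = " ")
bad_result <- tryCatch(parse(text = label_bad),
                       error = function(e) "parse error")

print(label_ok)
print(bad_result)
```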
While creating a shot chart in R, I've been using some open source stuff from Todd W. Schneider's BallR court design (https://github.com/toddwschneider/ballr/blob/master/plot_court.R)
along with another Stack Overflow post on how to create percentages within hexbins (How to replicate a scatterplot with a hexbin plot in R?).
Both sources have been really helpful for me.
When I run the following lines of code, I get a solid hexbin plot of percent made for shots for the different locations on the court:
ggplot(shots_df, aes(x = location_y-25, y = location_x, z = made_flag)) +
stat_summary_hex(fun = mean, alpha = 0.8, bins = 30) +
scale_fill_gradientn(colors = my_colors(7), labels = percent_format(),
name = "Percent Made")
However, when I include the BallR court design code snippet, which is shown below:
ggplot(shots_df, aes(x=location_y-25,y=location_x,z=made_flag)) +
stat_summary_hex(fun = mean, alpha = 0.8, bins = 30) +
scale_fill_gradientn(colors = my_colors(7), labels=percent_format(),
name="Percent Made") +
geom_path(data = court_points,
aes(x = x, y = y, group = desc, linetype = dash),
color = "#000004") +
scale_linetype_manual(values = c("solid", "longdash"), guide = FALSE) +
coord_fixed(ylim = c(0, 35), xlim = c(-25, 25)) +
theme_court(base_size = 22)
I get the error: Error in eval(expr, envir, enclos) : object 'made_flag' not found, even though made_flag is definitely in the data frame shots_df and worked in the original version. I am lost on how to fix this problem.
I believe your problem lies in the geom_path() layer. Try this tweak:
geom_path(data = court_points, aes(x = x, y = y, z = NULL, group = desc, linetype = dash))
Because you set the z aesthetic in the top-level ggplot() call, geom_path() still inherits it even though that layer uses a different data source, and court_points has no made_flag column. You have to override this manually with z = NULL.
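Another way to get the same effect is inherit.aes = FALSE, which tells a layer to ignore the top-level mapping entirely. The sketch below uses made-up data frames (shots, court) and swaps in stat_summary_2d() for stat_summary_hex() so that it does not need the hexbin package; it is an illustration, not the original shot chart.

```r
library(ggplot2)

# Made-up shot data and court outline; names are illustrative only
shots <- data.frame(
  loc_x = c(1, 2, 3, 4),
  loc_y = c(1, 2, 3, 4),
  made_flag = c(1, 0, 1, 1)
)
court <- data.frame(x = c(0, 5, 5, 0), y = c(0, 0, 5, 5))

p <- ggplot(shots, aes(x = loc_x, y = loc_y, z = made_flag)) +
  stat_summary_2d(fun = mean, bins = 2) +
  # inherit.aes = FALSE: this layer uses only the aesthetics given here,
  # so it never looks for made_flag in the court data
  geom_path(data = court, aes(x = x, y = y), inherit.aes = FALSE)

# Building the plot evaluates every layer's aesthetics
built <- try(ggplot_build(p), silent = TRUE)
```

With inherit.aes = FALSE the layer must spell out every aesthetic it needs, so x and y are repeated in its own aes().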