I am trying to add a legend for the mean and median to my histogram. I am also trying to change the scale on the y-axis that is labeled count. It is currently showing the density scale. I want the density plot but the count scale. Alternatively, I would be fine with a second scale or the counts at the end of the histogram. I am just not sure how to go about it. Below is some data and the current code. Thank you in advance.
library(ggplot2)
studyData <- data.frame(X = rchisq(100000, df = 3))
colnames(studyData) <- "hoursstudying"
mu <- data.frame(mean(studyData$hoursstudying))
colnames(mu) <- "Mean"
med <- data.frame(median(studyData$hoursstudying))
colnames(med) <- "Median"
p <- ggplot(studyData, aes(x = hoursstudying)) +
geom_histogram(aes(y=(..density..)), binwidth = 1, colour = "black", fill = "lightblue") +
geom_density(alpha=.2, fill="#FF6666") +
geom_vline(data = mu, aes(xintercept = Mean),
color = "red", linetype = "dashed", size = 1) +
geom_vline(data = med, aes(xintercept = median(Median)),
color = "purple", size = 1) +
labs(title = "Hours Spent Completing Course Work") +
ylab("Count") +
xlab("Hours Studying") +
theme(plot.title = element_text(hjust = 0.5))
p
You can put the count instead of the density on the y-axis in much the same way you reference the internally computed density: with the "..XXXX.." notation. In this case, use ..count.. (in ggplot2 3.3.0 and later, the equivalent and preferred spelling is after_stat(count)).
You will need to change both y aesthetics for geom_histogram() and geom_density():
ggplot(studyData, aes(x = hoursstudying)) +
geom_histogram(aes(y=(..count..)), binwidth = 1, colour = "black", fill = "lightblue") +
geom_density(aes(y=..count..), alpha=.2, fill="#FF6666") +
# ... everything else is the same
Note: I also echo the comment from u/Limey. The median and mean values in the plot you originally shared are clearly wrong, yet when I run the code the values look correct. Not sure what that's about, OP, but perhaps that's a different question.
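One detail worth spelling out: the density curve only lines up with the count bars automatically when binwidth = 1. Here is a minimal sketch of the general case, assuming ggplot2 3.3.0 or later for after_stat(). The sample size, seed and bin width `bw` are made up for illustration and are not from the question.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
studyData <- data.frame(hoursstudying = rchisq(1000, df = 3))

bw <- 0.5  # hypothetical bin width, chosen to differ from 1

p_count <- ggplot(studyData, aes(x = hoursstudying)) +
  # bars on the count scale
  geom_histogram(aes(y = after_stat(count)), binwidth = bw,
                 colour = "black", fill = "lightblue") +
  # geom_density's `count` is density * n, so multiply by the bin width
  # to put the curve on the same scale as the bars
  geom_density(aes(y = after_stat(count * bw)), alpha = 0.2, fill = "#FF6666") +
  labs(x = "Hours Studying", y = "Count")

p_count
```

With binwidth = 1, as in the answer above, the extra factor is 1 and can be dropped.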
Since @chemdork123 answered the question about the y-axis scale, I won't say anything about it. To add the median and mean values to the legend, you need to map them as aesthetics, i.e. move color inside aes().
p <- ggplot(studyData, aes(x = hoursstudying)) +
geom_histogram(aes(y=(..density..)), binwidth = 1, colour = "black", fill = "lightblue") +
geom_density(alpha=.2, fill="#FF6666") +
geom_vline(data = mu, aes(xintercept = Mean,
color = "red"),
linetype = "dashed", size = 1) +
geom_vline(data = med, aes(xintercept = Median,
color = "purple"),
size = 1) +
scale_color_manual(values = c("purple", "red"),
labels = c("Median", "Mean")) +
labs(title = "Hours Spent Completing Course Work") +
ylab("Count") +
xlab("Hours Studying") +
theme(plot.title = element_text(hjust = 0.5))
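The code above relies on the alphabetical order of the strings "purple" and "red" to pair colours with labels. A possible variation, sketched here under my own assumptions, puts both statistics in one small data frame and uses a named values vector so the pairing is explicit. The names `stats`, `stat` and `value` are hypothetical, and the smaller sample is only there to keep it quick.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only for reproducibility
studyData <- data.frame(hoursstudying = rchisq(1000, df = 3))

# one row per reference line; the `stat` column drives the legend
stats <- data.frame(
  stat  = c("Mean", "Median"),
  value = c(mean(studyData$hoursstudying), median(studyData$hoursstudying))
)

p_legend <- ggplot(studyData, aes(x = hoursstudying)) +
  geom_histogram(binwidth = 1, colour = "black", fill = "lightblue") +
  geom_vline(data = stats,
             aes(xintercept = value, colour = stat, linetype = stat)) +
  # named vectors tie each legend entry to its colour and line type
  scale_colour_manual(name = NULL, values = c(Mean = "red", Median = "purple")) +
  scale_linetype_manual(name = NULL, values = c(Mean = "dashed", Median = "solid")) +
  labs(title = "Hours Spent Completing Course Work",
       x = "Hours Studying", y = "Count")

p_legend
```

Because colour and linetype share the same variable and the same (empty) title, ggplot2 merges them into a single legend.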
Related
Apologies, this was posted earlier but deleted as it is a duplicate question. The duplicate talks about using scale_colour_manual to add a legend to the plot, but I could not get this to work; I have added my code below, including this suggestion. I appreciate that re-posting is probably frowned upon, so feel free to delete once resolved.
I have the following plot and want to add a legend to it, but cannot seem to manage it. I have included my code and plot below in case anyone knows how to do this. I'm looking for red to be 'east', blue to be 'west' and black to be 'overall'. I have also included a small bit of data so the plot can be replicated. I've had a look at other posts on here suggesting things like scale_colour_manual, but was unable to get it to work.
code used;
ggplot(climate_df_year, aes(y = overall_sst, x = year)) +
geom_line() +
geom_line(aes(y = eastern_sst, x = year), col = 'red', linetype = 2, size = 0.6) +
geom_line(aes(y = western_sst, x = year), col = 'blue', linetype = 2, size = 0.6) +
scale_colour_manual("",
breaks = c("Eastern", "Overall", "Western"),
values = c("red", "black", "blue")) +
ylab("SST (°C)") +
xlab("Year") +
theme(plot.title = element_text(hjust = 0.5, size = 12, face = 'bold'))
data
year overall_sst eastern_sst western_sst
1998 20.3 21.3 19.0
1999 20.6 21.6 19.2
2000 20.4 21.3 19.1
plot
I suggest pivoting the data and using ggplot's native aesthetics for controlling color and linetype.
tidyr::pivot_longer(climate_df_year, -year) |>
ggplot(aes(year, value)) +
geom_line(aes(color = name, linetype = !grepl("overall", name))) +
scale_x_continuous(breaks = do.call(seq, as.list(range(climate_df_year$year)))) +
scale_colour_manual(name = "Something", values = c(overall_sst="black", eastern_sst="red", western_sst="blue")) +
labs(x = "Year", y = "SST (°C)") +
scale_linetype_discrete(guide = NULL)
One can also use reshape2::melt(climate_df_year, "year", variable.name = "name") for pivoting, same code otherwise.
If you really prefer to not pivot/melt your data, however, the dupe link had several working changes, namely to bring the color= portions into the aes(..) call. Had you done just that from your original code (and removed or updated the scale_colour_manual call, since the breaks would be different), you would have seen a marked change:
ggplot(climate_df_year, aes(y = overall_sst, x = year)) +
geom_line() +
geom_line(aes(y = eastern_sst, x = year, col = 'red'), linetype = 2, size = 0.6) +
geom_line(aes(y = western_sst, x = year, col = 'blue'), linetype = 2, size = 0.6) +
scale_colour_manual("",
breaks = c("red", "Overall", "blue"),
values = c("red", "black", "blue")) +
ylab("SST (°C)") +
xlab("Year") +
theme(plot.title = element_text(hjust = 0.5, size = 12, face = 'bold'))
The two changes were to bring col= (sic) into aes(..) and to rename two of the breaks= values. From that, a little play suggests adding color= to the original geom_line (for "Overall") and renaming all of the color= values (reverting back to the original scale_colour_manual).
ggplot(climate_df_year, aes(y = overall_sst, x = year)) +
geom_line(aes(color = 'Overall')) +
geom_line(aes(y = eastern_sst, x = year, color = 'Eastern'), linetype = 2, size = 0.6) +
geom_line(aes(y = western_sst, x = year, color = 'Western'), linetype = 2, size = 0.6) +
scale_colour_manual("",
breaks = c("Eastern", "Overall", "Western"),
values = c("red", "black", "blue")) +
ylab("SST (°C)") +
xlab("Year") +
theme(plot.title = element_text(hjust = 0.5, size = 12, face = 'bold'))
Having said all that, I highly recommend sticking with the melted (first) option, as it needs only one call to geom_line, handles all aesthetics natively, and scales better too (e.g., additional variables).
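To see what the pivoted data looks like, here is a small sketch that rebuilds the three sample rows from the question and reshapes them. The `name` and `value` columns are tidyr's defaults; the object names `long` and `p_sst` are mine.

```r
library(ggplot2)
library(tidyr)

# the three sample rows posted in the question
climate_df_year <- data.frame(
  year        = 1998:2000,
  overall_sst = c(20.3, 20.6, 20.4),
  eastern_sst = c(21.3, 21.6, 21.3),
  western_sst = c(19.0, 19.2, 19.1)
)

# wide -> long: one row per (year, series) pair
long <- pivot_longer(climate_df_year, -year)

p_sst <- ggplot(long, aes(year, value, colour = name)) +
  geom_line() +
  scale_colour_manual(
    name   = NULL,
    values = c(overall_sst = "black", eastern_sst = "red", western_sst = "blue"),
    labels = c(overall_sst = "Overall", eastern_sst = "Eastern", western_sst = "Western")
  ) +
  labs(x = "Year", y = "SST (°C)")

p_sst
```

Each of the three series becomes three rows in `long`, and the single colour mapping is what generates the legend.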
I have a Lorenz Curve graph that I filled by factor variables (male and female). This was done simply enough and overlapping was not an issue because there were only two factors.
Wage %>%
ggplot(aes(x = salary, fill = gender)) +
stat_lorenz(geom = "polygon", alpha = 0.65) +
geom_abline(linetype = "dashed") +
coord_fixed() +
scale_fill_hue() +
theme(legend.title = element_blank()) +
labs(x = "Cumulative Percentage of Observations",
y = "Cumulative Percentage of Wages",
title = "Lorenz curve by sex")
This provides the following graph:
However, when I have more than two factors (in this case four), the overlapping becomes a serious problem even if I use contrasting colors. Changing alpha does not do much at this stage. Have a look:
Wage %>%
ggplot(aes(x = salary, fill = Diploma)) +
stat_lorenz(geom = "polygon", alpha = 0.8) +
geom_abline(linetype = "dashed") +
coord_fixed() +
scale_fill_manual(values = c("green", "blue", "black", "white")) +
theme(legend.title = element_blank()) +
labs(x = "Cumulative Percentage of Observations",
y = "Cumulative Percentage of Wages",
title = "Lorenz curve by diploma")
At this point I've tried all sorts of color palettes, hues, brewers, manual scales, etc. I've also tried reordering the factors but, as you can imagine, that did not work either.
What I need is probably a single argument or function to stack all these areas on top of each other so they all have their distinct colors. Funny enough, I've failed to find what I'm looking for and decided to ask for help.
Thanks a lot.
The problem was solved by a dear friend. This was done by adding the categorical variables layer by layer, without defining the Lorenz Curve as a whole.
ggplot() + scale_fill_manual(values = wes_palette("GrandBudapest2", n = 4)) +
stat_lorenz(aes(x=Wage[Wage$Diploma==levels(Wage$Diploma)[3],]$salary, fill=Wage[Wage$Diploma==levels(Wage$Diploma)[3],]$Diploma), geom = "polygon") +
stat_lorenz(aes(x=Wage[Wage$Diploma==levels(Wage$Diploma)[4],]$salary, fill=Wage[Wage$Diploma==levels(Wage$Diploma)[4],]$Diploma), geom = "polygon") +
stat_lorenz(aes(x=Wage[Wage$Diploma==levels(Wage$Diploma)[2],]$salary, fill=Wage[Wage$Diploma==levels(Wage$Diploma)[2],]$Diploma), geom = "polygon") +
stat_lorenz(aes(x=Wage[Wage$Diploma==levels(Wage$Diploma)[1],]$salary, fill=Wage[Wage$Diploma==levels(Wage$Diploma)[1],]$Diploma), geom = "polygon") +
geom_abline(linetype = "dashed") +
coord_fixed() +
theme(legend.title = element_blank()) +
labs(x = "Cumulative Percentage of Observations",
y = "Cumulative Percentage of Wages",
title = "Lorenz curve by diploma")
Which yields:
After searching the web both yesterday and today, the only way I got a legend working was to follow the solution by 'Brian Diggs' in this post:
Add legend to ggplot2 line plot
Which gives me the following code:
library(ggplot2)
ggplot()+
geom_line(data=myDf, aes(x=count, y=mean, color="TrueMean"))+
geom_hline(yintercept = myTrueMean, color="SampleMean")+
scale_colour_manual("",breaks=c("SampleMean", "TrueMean"),values=c("red","blue"))+
labs(title = "Plot showing convergence of Mean", x="Index", y="Mean")+
theme_minimal()
Everything works just fine if I remove the color of the hline, but if I add a value in the color of hline that is not an actual color (like "SampleMean") I get an error that it's not a color (only for the hline).
How can adding such a common thing as a legend be such a big problem? There must be an easier way.
To create the original data:
#Initial variables
myAlpha=2
myBeta=2
successes=14
n=20
fails=n-successes
#Posterior values
postAlpha=myAlpha+successes
postBeta=myBeta+fails
#Calculating the mean and SD
myTrueMean=(myAlpha+successes)/(myAlpha+successes+myBeta+fails)
myTrueSD=sqrt(((myAlpha+successes)*(myBeta+fails))/((myAlpha+successes+myBeta+fails)^2*(myAlpha+successes+myBeta+fails+1)))
#Simulate the data
simulateBeta=function(n,tmpAlpha,tmpBeta){
tmpValues=rbeta(n, tmpAlpha, tmpBeta)
tmpMean=mean(tmpValues)
tmpSD=sd(tmpValues)
returnVector=c(count=n, mean=tmpMean, sd=tmpSD)
return(returnVector)
}
#Make a df for the data
myDf=data.frame(t(sapply(2:10000, simulateBeta, postAlpha, postBeta)))
The linked solution works in most cases, but not for geom_hline (or geom_vline) as you wrote it. For these geoms you usually don't need aes, but when you want them to generate a legend entry you have to put both the intercept and the color label inside aes:
library(ggplot2)
ggplot() +
geom_line(aes(count, mean, color = "TrueMean"), myDf) +
geom_hline(aes(yintercept = myTrueMean, color = "SampleMean")) +
scale_colour_manual(values = c("red", "blue")) +
labs(title = "Plot showing convergence of Mean",
x = "Index",
y = "Mean",
color = NULL) +
theme_minimal()
Looking at the original data, you could use geom_point for a better visualisation (I also added some theme changes):
ggplot() +
geom_point(aes(count, mean, color = "Observed"), myDf,
alpha = 0.3, size = 0.7) +
geom_hline(aes(yintercept = myTrueMean, color = "Expected"),
linetype = 2, size = 0.5) +
scale_colour_manual(values = c("blue", "red")) +
labs(title = "Plot showing convergence of Mean",
x = "Index",
y = "Mean",
color = "Mean type") +
theme_minimal() +
guides(color = guide_legend(override.aes = list(
linetype = 0, size = 4, shape = 15, alpha = 1))
)
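Both versions above pair colours with labels through alphabetical order ("Expected" before "Observed", "SampleMean" before "TrueMean"). A hedged sketch of the same idea with a named values vector follows. The data frame `demo` is a smaller stand-in for myDf, and the legend labels are my own wording, not the asker's.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only for reproducibility

# posterior parameters as computed in the question: 2 + 14 and 2 + 6
postAlpha <- 16
postBeta  <- 8
trueMean  <- postAlpha / (postAlpha + postBeta)

# smaller stand-in for myDf: mean of n beta draws for n = 2..500
demo <- data.frame(count = 2:500)
demo$mean <- sapply(demo$count, function(n) mean(rbeta(n, postAlpha, postBeta)))

p_conv <- ggplot() +
  geom_line(aes(count, mean, colour = "Simulated mean"), demo) +
  # the constant string inside aes() is what creates the legend entry
  geom_hline(aes(yintercept = trueMean, colour = "Analytic mean")) +
  # names make the label-to-colour pairing independent of sort order
  scale_colour_manual(name = NULL,
                      values = c("Simulated mean" = "blue",
                                 "Analytic mean"  = "red")) +
  labs(x = "Index", y = "Mean") +
  theme_minimal()

p_conv
```

The string inside aes() is treated as data (a one-level category), which is why it may be any label, while a color set outside aes() must be a real color name.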
I'm struggling to learn the ins and outs of R, ggplot2, etc - being more used to being taught in an A to Z manner an entire (fixed) coding language (not used to open source - I learned to code when dinosaurs roamed the earth). So I have kluged together the following code to create one graph. Only ... I don't have the dupe legends problem -- I have no legend a'tall!
erc <- ggplot(usedcarval, aes(x = usedcarval$age)) +
geom_line(aes(y = usedcarval$dealer), colour = "orange", size = .5) +
geom_point(aes(y = usedcarval$dealer),
show.legend = TRUE, colour = "orange", size = 1) +
geom_line(aes(y = usedcarval$pvtsell), colour = "green", size = .5) +
geom_point(aes(y = usedcarval$pvtsell), colour = "green", size = 1) +
geom_line(aes(y = usedcarval$tradein), colour = "blue", size = .5) +
geom_point(aes(y = usedcarval$tradein), colour = "blue", size = 1) +
geom_line(aes(y = as.integer(predvalt)), colour = "gray", size = 1) +
geom_line(aes(y = as.integer(predvalp)), colour = "gray", size = 1) +
geom_line(aes(y = as.integer(predvald)), colour = "gray", size = 1) +
labs(x = "Value of a Used Car as it Ages (Years)", y = "Dollars") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 60, vjust = .6))
erc
I can't figure out how to put an image in this text since I have no link except to my dropbox...
I would appreciate any help. Sincerely, Stephanie
Ok, I felt like doing some ggplot, and it was an interesting task to contrast the way ggplot-beginners (I was one not so long ago) approach it compared to the way you need to do it to get things like legends.
Here is the code:
library(ggplot2)
library(gridExtra)
library(tidyr)
library(dplyr)  # provides filter() used below; without it stats::filter() is called
# fake up some data
n <- 100
dealer <- 12000 + rnorm(n,0,100)
age <- 10 + rnorm(n,3)
pvtsell <- 10000 + rnorm(n,0,300)
tradein <- 5000 + rnorm(n,0,100)
predvalt <- 6000 + rnorm(n,0,120)
predvalp <- 7000 + rnorm(n,0,100)
predvald <- 8000 + rnorm(n,0,100)
usedcarval <- data.frame(dealer=dealer,age=age,pvtsell=pvtsell,tradein=tradein,
predvalt=predvalt,predvalp=predvalp,predvald=predvald)
# The ggplot-naive way
erc <- ggplot(usedcarval, aes(x = usedcarval$age)) +
geom_line(aes(y = usedcarval$dealer), colour = "orange", size = .5) +
geom_point(aes(y = usedcarval$dealer),
show.legend = TRUE, colour = "orange", size = 1) +
geom_line(aes(y = usedcarval$pvtsell), colour = "green", size = .5) +
geom_point(aes(y = usedcarval$pvtsell), colour = "green", size = 1) +
geom_line(aes(y = usedcarval$tradein), colour = "blue", size = .5) +
geom_point(aes(y = usedcarval$tradein), colour = "blue", size = 1) +
geom_line(aes(y = as.integer(predvalt)), colour = "gray", size = 1) +
geom_line(aes(y = as.integer(predvalp)), colour = "gray", size = 1) +
geom_line(aes(y = as.integer(predvald)), colour = "gray", size = 1) +
labs(x = "ggplot naive way - Value of a Used Car as it Ages (Years)", y = "Dollars") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 60, vjust = .6))
# The tidyverse way
# ggplot needs long data, not wide data.
# Also we have two different sets of data for points and lines
gdf <- usedcarval %>% gather(series,value,-age)
pdf <- gdf %>% filter( series %in% c("dealer","pvtsell","tradein"))
# our color and size lookup tables
clrs = c("dealer"="orange","pvtsell"="green","tradein"="blue","predvalt"="gray","predvalp"="gray","predvald"="gray")
szes = c("dealer"=0.5,"pvtsell"=0.0,"tradein"=0.5,"predvalt"=1,"predvalp"=1,"predvald"=1)
trc <- ggplot(gdf,aes(x=age)) + geom_line(aes(y=value,color=series,size=series)) +
scale_color_manual(values=clrs) +
scale_size_manual(values=szes) +
geom_point(data=pdf,aes(x=age,y=value,color=series),size=1) +
labs(x = "tidyverse way - Value of a Used Car as it Ages (Years)", y = "Dollars") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 60, vjust = .6))
grid.arrange(erc, trc, ncol=1)
Study it, especially gdf, pdf and gather. Getting legends is far easier once you use "long data", because ggplot builds a legend from each variable mapped inside aes.
If you want more information on the "tidyverse", start here: Hadley Wickham's tidyverse
If you are looking for a short example of how to take some series data that comes in wide format, convert it to long format (using gather), and then plot it with a ggplot (with a legend), here is a nice short example I cooked up for someone recently:
library(ggplot2)
library(tidyr)
# womp up some fake news (uhh... data)
x <- seq(-pi,pi,by=0.25)
y <- sin(x)
yhat <- sin(x) + 0.4*rnorm(length(x))
# This is the data in wide form
# you will never get ggplot to make a legend for it
# it simply hates wide data
df1 <- data.frame(x=x,y=y,yhat=yhat)
# So we use gather from tidyr to make it into long data
# creates two new columns, throws y and yhat in them, and replicates x as needed
# you have to look at the data frame to understand gather,
# and read the docs a few times
df2 <- gather(df1,series,value,-x)
# it is now in long form and we can plot it
ggplot(df2) + geom_line(aes(x,value,color=series))
So here is the plot:
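In current tidyr, gather() is marked as superseded in favour of pivot_longer(). A minimal sketch of the same example with pivot_longer() follows; the seed and the object name `p_long` are my own additions.

```r
library(ggplot2)
library(tidyr)

set.seed(42)  # assumption: seed added only for reproducibility
x   <- seq(-pi, pi, by = 0.25)
df1 <- data.frame(x = x, y = sin(x), yhat = sin(x) + 0.4 * rnorm(length(x)))

# wide -> long: the column names y and yhat become values of `series`
df2 <- pivot_longer(df1, cols = c(y, yhat),
                    names_to = "series", values_to = "value")

# mapping colour to `series` is what produces the legend
p_long <- ggplot(df2, aes(x, value, colour = series)) +
  geom_line()

p_long
```

The long data frame has twice as many rows as the wide one, one per (x, series) pair.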
I know that this question has been asked before but the solutions don't seem to work for me.
What I want to do is represent my median, mean, upper and lower quantiles on a histogram in different colours and then add a legend to the plot. This is what I have so far and I have tried to use scale_color_manual and scale_color_identity to give me a legend. Nothing seems to be working.
quantile_1 <- quantile(sf$Unit.Sales, prob = 0.25)
quantile_2 <- quantile(sf$Unit.Sales, prob = 0.75)
ggplot(aes(x = Unit.Sales), data = sf) +
geom_histogram(color = 'black', fill = NA) +
geom_vline(aes(xintercept=median(Unit.Sales)),
color="blue", linetype="dashed", size=1) +
geom_vline(aes(xintercept=mean(Unit.Sales)),
color="red", linetype="dashed", size=1) +
geom_vline(aes(xintercept=quantile_1), color="yellow", linetype="dashed", size=1)
You need to map the color inside the aes:
ggplot(aes(x = Sepal.Length), data = iris) +
geom_histogram(color = 'black', fill = NA) +
geom_vline(aes(xintercept=median(iris$Sepal.Length),
color="median"), linetype="dashed",
size=1) +
geom_vline(aes(xintercept=mean(iris$Sepal.Length),
color="mean"), linetype="dashed",
size=1) +
scale_color_manual(name = "statistics", values = c(median = "blue", mean = "red"))
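To cover the quartiles from the question as well, one option is to collect all the statistics in a small data frame and draw them with a single geom_vline. This is only a sketch on iris; the names `stat_lines`, `statistic` and `value`, and the quartile colours, are assumptions of mine.

```r
library(ggplot2)

x <- iris$Sepal.Length

# one row per reference line; `statistic` becomes the legend key
stat_lines <- data.frame(
  statistic = c("mean", "median", "lower quartile", "upper quartile"),
  value = c(mean(x), median(x),
            quantile(x, 0.25, names = FALSE),
            quantile(x, 0.75, names = FALSE))
)

p_stats <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 30, colour = "black", fill = NA) +
  geom_vline(data = stat_lines,
             aes(xintercept = value, colour = statistic),
             linetype = "dashed") +
  scale_colour_manual(name = "statistics",
                      values = c(mean = "red", median = "blue",
                                 "lower quartile" = "gold",
                                 "upper quartile" = "darkgreen"))

p_stats
```

Adding another statistic then only needs one more row in `stat_lines` and one more entry in `values`.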