After searching the web both yesterday and today, the only way I got a legend working was to follow the solution by 'Brian Diggs' in this post:
Add legend to ggplot2 line plot
Which gives me the following code:
library(ggplot2)
ggplot()+
geom_line(data=myDf, aes(x=count, y=mean, color="TrueMean"))+
geom_hline(yintercept = myTrueMean, color="SampleMean")+
scale_colour_manual("",breaks=c("SampleMean", "TrueMean"),values=c("red","blue"))+
labs(title = "Plot showing convergence of Mean", x="Index", y="Mean")+
theme_minimal()
Everything works just fine if I remove the color of the hline, but if I add a value in the color of hline that is not an actual color (like "SampleMean") I get an error that it's not a color (only for the hline).
How can adding such a common thing as a legend be such a big problem? There must be an easier way?
To create the original data:
#Initial variables
myAlpha=2
myBeta=2
successes=14
n=20
fails=n-successes
#Posterior values
postAlpha=myAlpha+successes
postBeta=myBeta+fails
#Calculating the mean and SD
myTrueMean=(myAlpha+successes)/(myAlpha+successes+myBeta+fails)
myTrueSD=sqrt(((myAlpha+successes)*(myBeta+fails))/((myAlpha+successes+myBeta+fails)^2*(myAlpha+successes+myBeta+fails+1)))
#Simulate the data
simulateBeta=function(n,tmpAlpha,tmpBeta){
tmpValues=rbeta(n, tmpAlpha, tmpBeta)
tmpMean=mean(tmpValues)
tmpSD=sd(tmpValues)
returnVector=c(count=n, mean=tmpMean, sd=tmpSD)
return(returnVector)
}
#Make a df for the data
myDf=data.frame(t(sapply(2:10000, simulateBeta, postAlpha, postBeta)))
The given solution works in most cases, but not for geom_hline (or geom_vline). For those you usually don't have to use aes, but when you need a legend you have to put both yintercept and color inside aes:
library(ggplot2)
ggplot() +
geom_line(aes(count, mean, color = "TrueMean"), myDf) +
geom_hline(aes(yintercept = myTrueMean, color = "SampleMean")) +
scale_colour_manual(values = c("red", "blue")) +
labs(title = "Plot showing convergence of Mean",
x = "Index",
y = "Mean",
color = NULL) +
theme_minimal()
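One thing worth knowing about the snippet above: when values is an unnamed vector, ggplot2 assigns the colours to the labels in sorted order, so "red" goes to "SampleMean" and "blue" to "TrueMean" only because of how the names sort. A minimal sketch of a more explicit variant, using a named vector, is below. demoDf, demoTrueMean and meanColours are hypothetical stand-ins made up so the block runs by itself; they are not from the original post.

```r
library(ggplot2)

# Hypothetical stand-in data (the original myDf comes from the simulation above)
set.seed(1)
demoDf <- data.frame(count = 2:200)
demoDf$mean <- sapply(demoDf$count, function(n) mean(rbeta(n, 16, 8)))
demoTrueMean <- 16 / (16 + 8)

# Named values tie each colour to its legend label explicitly,
# independent of the order in which the labels sort
meanColours <- c(SampleMean = "red", TrueMean = "blue")

p <- ggplot() +
  geom_line(aes(count, mean, color = "TrueMean"), demoDf) +
  geom_hline(aes(yintercept = demoTrueMean, color = "SampleMean")) +
  scale_colour_manual(values = meanColours) +
  labs(x = "Index", y = "Mean", color = NULL) +
  theme_minimal()

p
```

With named values, renaming a label or adding a third line should not silently shuffle the colours.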
Given the original data, you could also use geom_point for a better visualisation (I also added some theme changes):
ggplot() +
geom_point(aes(count, mean, color = "Observed"), myDf,
alpha = 0.3, size = 0.7) +
geom_hline(aes(yintercept = myTrueMean, color = "Expected"),
linetype = 2, size = 0.5) +
scale_colour_manual(values = c("blue", "red")) +
labs(title = "Plot showing convergence of Mean",
x = "Index",
y = "Mean",
color = "Mean type") +
theme_minimal() +
guides(color = guide_legend(override.aes = list(
linetype = 0, size = 4, shape = 15, alpha = 1))
)
Related
I am trying to add a legend for the mean and median to my histogram. I am also trying to change the scale on the y-axis that is labeled count. It is currently showing the density scale. I want the density plot but the count scale. Alternatively, I would be fine with a second scale or the counts at the end of the histogram. I am just not sure how to go about it. Below is some data and the current code. Thank you in advance.
studyData=data.frame(X=rchisq(1:100000, df=3))
colnames(studyData) <- "hoursstudying"
mu <- data.frame(mean(studyData$hoursstudying))
colnames(mu) <- "Mean"
med <- data.frame(median(studyData$hoursstudying))
colnames(med) <- "Median"
p <- ggplot(studyData, aes(x = hoursstudying)) +
geom_histogram(aes(y=(..density..)), binwidth = 1, colour = "black", fill = "lightblue") +
geom_density(alpha=.2, fill="#FF6666") +
geom_vline(data = mu, aes(xintercept = Mean),
color = "red", linetype = "dashed", size = 1) +
geom_vline(data = med, aes(xintercept = median(Median)),
color = "purple", size = 1) +
labs(title = "Hours Spent Completing Course Work") +
ylab("Count") +
xlab("Hours Studying") +
theme(plot.title = element_text(hjust = 0.5))
p
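You can access the count instead of the density on the y axis in much the same way you reference the internal calculation of density, using the "..XXXX.." notation. In this case, use ..count.. (since ggplot2 3.4.0 this notation is deprecated in favour of after_stat(count), but it still works).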
You will need to change both y aesthetics for geom_histogram() and geom_density():
ggplot(studyData, aes(x = hoursstudying)) +
geom_histogram(aes(y=(..count..)), binwidth = 1, colour = "black", fill = "lightblue") +
geom_density(aes(y=..count..), alpha=.2, fill="#FF6666") +
# ... everything else is the same
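For newer ggplot2 versions, here is a sketch of the same idea written with after_stat(), which replaced the ..XXXX.. notation. It assumes ggplot2 3.3.0 or later, and demoStudy is a made-up, smaller stand-in for studyData so the block runs by itself.

```r
library(ggplot2)

# Hypothetical stand-in for studyData, kept small for speed
set.seed(42)
demoStudy <- data.frame(hoursstudying = rchisq(1000, df = 3))

p <- ggplot(demoStudy, aes(x = hoursstudying)) +
  # Bar heights are the number of observations per bin
  geom_histogram(aes(y = after_stat(count)), binwidth = 1,
                 colour = "black", fill = "lightblue") +
  # The density stat also computes a 'count' variable (density * n)
  geom_density(aes(y = after_stat(count)), alpha = 0.2, fill = "#FF6666") +
  labs(x = "Hours Studying", y = "Count")

p
```

With binwidth = 1 the bars and the curve end up on the same scale; for another bin width the curve would need to be multiplied by that width to line up with the bars.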
Note: also, I echo the comment from u/Limey. The median and mean values in your original plot shared are clearly wrong... yet when I run the code I am getting the values looking correct. Not sure what that's about, OP, but perhaps that's a different question.
Since @chemdork123 answered the question about the y-axis scale I won't say anything about it. To add the median/mean values to the legend you need to map their color inside aes().
p <- ggplot(studyData, aes(x = hoursstudying)) +
geom_histogram(aes(y=(..density..)), binwidth = 1, colour = "black", fill = "lightblue") +
geom_density(alpha=.2, fill="#FF6666") +
geom_vline(data = mu, aes(xintercept = Mean,
color = "red"),
linetype = "dashed", size = 1) +
geom_vline(data = med, aes(xintercept = Median,
color = "purple"),
size = 1) +
scale_color_manual(values = c("purple", "red"),
labels = c("Median", "Mean")) +
labs(title = "Hours Spent Completing Course Work") +
ylab("Count") +
xlab("Hours Studying") +
theme(plot.title = element_text(hjust = 0.5))
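A possible variation on the snippet above, as a sketch only: map the label text itself ("Mean", "Median") instead of a colour name, and use named values, so labels and colours cannot get out of step. refLines and its columns stat and value are hypothetical names, and demoStudy is a made-up stand-in for studyData.

```r
library(ggplot2)

# Hypothetical stand-in for studyData
set.seed(42)
demoStudy <- data.frame(hoursstudying = rchisq(1000, df = 3))

# One row per reference line
refLines <- data.frame(
  stat  = c("Mean", "Median"),
  value = c(mean(demoStudy$hoursstudying), median(demoStudy$hoursstudying))
)

p <- ggplot(demoStudy, aes(x = hoursstudying)) +
  geom_histogram(binwidth = 1, colour = "black", fill = "lightblue") +
  # Both colour and linetype are mapped to the same column
  geom_vline(data = refLines,
             aes(xintercept = value, color = stat, linetype = stat)) +
  scale_color_manual(name = NULL,
                     values = c(Mean = "red", Median = "purple")) +
  scale_linetype_manual(name = NULL,
                        values = c(Mean = "dashed", Median = "solid")) +
  labs(title = "Hours Spent Completing Course Work",
       x = "Hours Studying", y = "Count")

p
```

Because both scales share the same (empty) title and the same labels, ggplot2 should merge them into a single legend showing a dashed red and a solid purple key.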
I'm having trouble setting a custom legend for confidence bands and dashed lines. This is my graph so far.
library(dplyr)   # provides %>% and mutate()
library(ggplot2)
di<-matrix(ncol = 3,nrow = 5) %>% as.data.frame()
colnames(di)<-c('group','estimate','SE')
di<-di %>% mutate(group=1:5,
estimate=c(0.5,9.6,13,15,23.1),
SE=14)
ggplot(di, aes(x=group, y=estimate)) +
geom_point() +
geom_errorbar(width=.5, aes(ymin=estimate-(1.647*SE), ymax=estimate+(1.647*SE)), colour="black") +
xlab('Group') +
ylab('Treatment Effect') +
labs(title="GATE with confidence bands",
subtitle="Point estimates and confidence bands are derived using median of all splits") +
geom_hline(yintercept=c(7.83,22.55),
linetype="longdash",
col='darkred') +
geom_hline(yintercept=15.19,
linetype="longdash",
col='blue')
It looks like this:
However what I want it to look like is something like this, with the exact same legend:
Any advice on this?
This could be achieved like so:
As a general rule: if you want a legend, you have to map something to an aesthetic, i.e. move color=... into aes() for all four geoms.
The desired color values can then be set via scale_color_manual.
For geom_hline, yintercept also has to be passed inside aes(). To do so, each geom_hline gets a small helper data frame holding the desired values.
To fix the lines and shapes in the legend I make use of guide_legend's override.aes, which removes the undesired points from the line keys and removes the line from the point key. Additionally, I set the number of rows for the legend to 2.
The labels and the order of the legend entries can be set via the labels and breaks arguments of scale_color_manual.
Finally, move the legend to the top left and get rid of the background fill for the legend and the keys via theme options.
library(ggplot2)
di <- data.frame(
group = 1:5,
estimate = c(0.5, 9.6, 13, 15, 23.1),
SE = 14
)
labels <- c(point = "Point", error = "Error", blue = "Blue", darkred = "Red")
breaks <- c("blue", "darkred", "point", "error")
ggplot(di, aes(x = group, y = estimate)) +
geom_point(aes(color = "point"), size = 3) +
geom_errorbar(width = .5, aes(
ymin = estimate - (1.647 * SE),
ymax = estimate + (1.647 * SE),
color = "error"
)) +
scale_color_manual(values = c(
point = "black",
error = "black",
blue = "blue",
darkred = "darkred"
), labels = labels, breaks = breaks) +
labs(
title = "GATE with confidence bands",
subtitle = "Point estimates and confidence bands are derived using median of all splits",
x = "Group",
y = "Treatment Effect",
color = NULL, linetype = NULL, shape = NULL
) +
geom_hline(
data = data.frame(yintercept = c(7.83, 22.55)),
aes(yintercept = yintercept, color = "darkred"), linetype = "longdash"
) +
geom_hline(
data = data.frame(yintercept = 15.19),
aes(yintercept = yintercept, color = "blue"), linetype = "longdash"
) +
guides(color = guide_legend(override.aes = list(
shape = c(NA, NA, 16, NA),
linetype = c("longdash", "longdash", "blank", "solid")
), nrow = 2, byrow = TRUE)) +
theme(legend.position = c(0, 1),
legend.justification = c(0, 1),
legend.background = element_rect(fill = NA),
legend.key = element_rect(fill = NA))
I have a Lorenz Curve graph that I filled by factor variables (male and female). This was done simply enough and overlapping was not an issue because there were only two factors.
Wage %>%
ggplot(aes(x = salary, fill = gender)) +
stat_lorenz(geom = "polygon", alpha = 0.65) +
geom_abline(linetype = "dashed") +
coord_fixed() +
scale_fill_hue() +
theme(legend.title = element_blank()) +
labs(x = "Cumulative Percentage of Observations",
y = "Cumulative Percentage of Wages",
title = "Lorenz curve by sex")
This provides the following graph:
However, when I have more than two factors (in this case four), the overlapping becomes a serious problem even if I use contrasting colors. Changing alpha does not do much at this stage. Have a look:
Wage %>%
ggplot(aes(x = salary, fill = Diploma)) +
stat_lorenz(geom = "polygon", alpha = 0.8) +
geom_abline(linetype = "dashed") +
coord_fixed() +
scale_fill_manual(values = c("green", "blue", "black", "white")) +
theme(legend.title = element_blank()) +
labs(x = "Cumulative Percentage of Observations",
y = "Cumulative Percentage of Wages",
title = "Lorenz curve by diploma")
At this point I've tried all sorts of color palettes, hues, brewers, manual scales etc. I've also tried reordering the factors but, as you can imagine, that did not work either.
What I need is probably a single argument or function to stack all these areas on top of each other so they all have their distinct colors. Funny enough, I've failed to find what I'm looking for and decided to ask for help.
Thanks a lot.
The problem was solved by a dear friend. This was done by adding the categorical variables layer by layer, without defining the Lorenz Curve as a whole.
ggplot() + scale_fill_manual(values = wes_palette("GrandBudapest2", n = 4)) +
stat_lorenz(aes(x=Wage[Wage$Diploma==levels(Wage$Diploma)[3],]$salary, fill=Wage[Wage$Diploma==levels(Wage$Diploma)[3],]$Diploma), geom = "polygon") +
stat_lorenz(aes(x=Wage[Wage$Diploma==levels(Wage$Diploma)[4],]$salary, fill=Wage[Wage$Diploma==levels(Wage$Diploma)[4],]$Diploma), geom = "polygon") +
stat_lorenz(aes(x=Wage[Wage$Diploma==levels(Wage$Diploma)[2],]$salary, fill=Wage[Wage$Diploma==levels(Wage$Diploma)[2],]$Diploma), geom = "polygon") +
stat_lorenz(aes(x=Wage[Wage$Diploma==levels(Wage$Diploma)[1],]$salary, fill=Wage[Wage$Diploma==levels(Wage$Diploma)[1],]$Diploma), geom = "polygon") +
geom_abline(linetype = "dashed") +
coord_fixed() +
theme(legend.title = element_blank()) +
labs(x = "Cumulative Percentage of Observations",
y = "Cumulative Percentage of Wages",
title = "Lorenz curve by diploma")
Which yields:
I am trying to make a bar chart with line plots on top. The graph is created fine, but the line plots do not show up in the legend.
I have tried so many different ways of adding these to the legend including:
ggplot Legend Bar and Line in Same Graph
None of these have worked. Setting show.legend = TRUE in geom_line also seems to be ignored.
My code to create the graph is as follows:
ggplot(first_q, aes(fill = Segments)) +
geom_bar(aes(x= Segments, y= number_of_new_customers), stat =
"identity") + theme(axis.text.x = element_blank()) +
scale_y_continuous(expand = c(0, 0), limits = c(0,3000)) +
ylab('Number of Customers') + xlab('Segments') +
ggtitle('Number Customers in Q1 by Segments') +theme(plot.title =
element_text(hjust = 0.5)) +
geom_line(aes(x= Segments, y=count) ,stat="identity",
group = 1, size = 1.5, colour = "darkred", alpha = 0.9, show.legend =
TRUE) +
geom_line(aes(x= Segments, y=bond_count)
,stat="identity", group = 1, size = 1.5, colour = "blue", alpha =
0.9) +
geom_line(aes(x= Segments, y=variable_count)
,stat="identity", group = 1, size = 1.5, colour = "darkgreen",
alpha = 0.9) +
geom_line(aes(x= Segments, y=children_count)
,stat="identity", group = 1, size = 1.5, colour = "orange", alpha
= 0.9) +
guides(fill=guide_legend(title="Segments")) +
scale_color_discrete(name = "Prod", labels = c("count", "bond_count", "variable_count", "children_count"))
I am fairly new to R so if any further information is required or if this question could be better represented then please let me know.
Any help is greatly appreciated.
Alright, you need to remove a little bit of your stuff. I used the mtcars dataset, since you did not provide yours. I tried to keep your variable names and reduced the plot to necessary parts. The code is as follows:
first_q <- mtcars
first_q$Segments <- mtcars$mpg
first_q$val <- seq(1,nrow(mtcars))
first_q$number_of_new_costumers <- mtcars$hp
first_q$type <- "Line"
ggplot(first_q) +
geom_bar(aes(x= Segments, y= number_of_new_costumers, fill = "Bar"), stat =
"identity") + theme(axis.text.x = element_blank()) +
scale_y_continuous(expand = c(0, 0), limits = c(0,3000)) +
geom_line(aes(x=Segments,y=val, linetype="Line"))+
geom_line(aes(x=Segments,y=disp, linetype="next line"))
The answer you linked already covers this, but I will try to explain. A legend is built from the aesthetics you map inside aes(). So if you want a legend entry for each line, map a distinguishing aesthetic in aes(); that is what gets shown in the legend. I used two geom_line layers here. Since both map linetype, both appear in the linetype legend.
the plot:
You can adapt this easily to your use case. Make sure you use known aesthetic names (such as linetype, colour or fill) if you want to solve it this way. You can also change the legend titles afterwards by using:
labs(fill = "custom name")
If you want different colours with the same line type, you can customise this with scale_linetype_manual as follows (I did not use fill for the bars this time):
library(ggplot2)
first_q <- mtcars
first_q$Segments <- mtcars$mpg
first_q$val <- seq(1,nrow(mtcars))
first_q$number_of_new_costumers <- mtcars$hp
first_q$type <- "Line"
cols = c("red", "green")
ggplot(first_q) +
geom_bar(aes(x= Segments, y= number_of_new_costumers), stat =
"identity") + theme(axis.text.x = element_blank()) +
scale_y_continuous(expand = c(0, 0), limits = c(0,3000)) +
geom_line(aes(x=Segments,y=val, linetype="solid"), color = "red", alpha = 0.4)+
geom_line(aes(x=Segments,y=disp, linetype="second"), color ="green", alpha = 0.5)+
scale_linetype_manual(values = c("solid","solid"),
guide = guide_legend(override.aes = list(colour = cols)))
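As an alternative sketch for the original question, the line colours can be mapped inside aes() and then set with scale_colour_manual, which gives the lines their own colour legend next to the fill legend of the bars. The data frame demo below is invented for illustration (the question's first_q was not provided), and only two of the four lines are shown.

```r
library(ggplot2)

# Hypothetical data with the column names used in the question
demo <- data.frame(
  Segments = c("A", "B", "C"),
  number_of_new_customers = c(1200, 800, 1500),
  count = c(300, 500, 400),
  bond_count = c(200, 250, 150)
)

p <- ggplot(demo, aes(x = Segments)) +
  geom_col(aes(y = number_of_new_customers, fill = Segments)) +
  # colour is a label inside aes(); the real colour is set in the scale
  geom_line(aes(y = count, colour = "count", group = 1)) +
  geom_line(aes(y = bond_count, colour = "bond_count", group = 1)) +
  scale_colour_manual(name = "Prod",
                      values = c(count = "darkred", bond_count = "blue")) +
  labs(x = "Segments", y = "Number of Customers")

p
```

The remaining lines (variable_count, children_count) would follow the same pattern, each with its own entry in values.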
I am trying to add a legend to ggplot2 based on some horizontal comparison lines that were added in. My code currently looks like this
aplot <- ggplot(aData, aes(x = DEP, y = DECAY, color = GENDER))
aplot +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_hline(yintercept = meanCD, color = "purple")+
geom_hline(yintercept = medianCD, color = "forestgreen") +
geom_hline(yintercept = medianSAD, color = "goldenrod3", linetype = "dashed")+
geom_hline(yintercept = meanSAD, color = "deeppink", linetype = "dashed")
meanCD, medianCD, meanSAD and medianSAD are all stored as separate values and I need to add them to the graph for comparative purposes. aData is just a bunch of points. I cannot get a legend to show up that gives the color of each line with an appropriate label, and I am unsure how to accomplish this in ggplot2.