Custom legend in ggplot2: how to fill without factors?

I am making a personality survey that will generate score reports for participants. I want to make these as easy to read and understand as possible, so I am generating a normal curve for the surveyed trait and a line showing the person where they fall on the curve.
First, let's generate some data:
sample <- as.data.frame(rnorm(1000, 0, 1))
names(sample) <- "trait"
score <- mean(sample$trait)
My problem is with the legend: I cannot figure out how to customize it to display 1) a filled "Population" color when I'm not graphing multiple factors, and 2) the line showing the participant's score.
I can get close:
library(ggplot2)
ggplot(sample, aes(x=trait)) +
  geom_density(fill="blue") +
  geom_density(aes(fill="Population")) +
  geom_vline(aes(xintercept=score, color="You")) +
  geom_vline(xintercept=score, color='red',
             linetype="solid", size=1.5) +
  scale_colour_manual(values=c('Population'='blue',
                               'You'='red'))
[Graph image 1]
But this does not use the colors specified, and has extraneous "colour" and "fill" text in the legend.
If I change the geom_density aesthetic to color instead of fill and leave everything else the same...
geom_density(aes(color="Population")) +
... this correctly applies the colors, but then does not fill the "Population" box in the legend with blue.
[Graph image 2]
Optimally, I'd like to fill the "Population" box blue and remove the red box around the "You" line in the legend. How can I achieve this?

This should do what you want:
library(ggplot2)
ggplot(sample, aes(trait)) +
  geom_density(aes(fill = "Population")) +
  geom_vline(aes(xintercept = mean(trait), color = "You")) +
  theme(legend.title = element_blank()) +
  scale_color_manual(values = "red", breaks = "You") +
  scale_fill_manual(values = "blue", breaks = "Population")
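A variant of the same idea, sketched under the assumption that ggplot2 3.4.0 or later is installed (it uses linewidth, which replaced size for lines in that release). Naming the values (Population = "blue", You = "red") ties each label to its colour regardless of break order, and drawing the line from a one-row data frame avoids repeating it once per row of sample. The set.seed() call and the ggplot_build() lines are only there to make the illustration reproducible and checkable.

```r
library(ggplot2)

set.seed(1)  # only to make the illustration reproducible
sample <- data.frame(trait = rnorm(1000, 0, 1))
score <- mean(sample$trait)

p <- ggplot(sample, aes(x = trait)) +
  # A constant label inside aes() creates a fill legend entry
  geom_density(aes(fill = "Population")) +
  # One-row data frame, so the line is drawn once rather than once per row
  geom_vline(data = data.frame(score = score),
             aes(xintercept = score, colour = "You"), linewidth = 1.5) +
  # Named values tie each label to its colour; name = NULL drops the titles
  scale_fill_manual(name = NULL, values = c(Population = "blue")) +
  scale_colour_manual(name = NULL, values = c(You = "red"))

# Inspect the colours each layer actually received
built <- ggplot_build(p)
density_fill <- unique(built$data[[1]]$fill)
line_colour <- unique(built$data[[2]]$colour)
```

Printing p draws the plot; ggplot_build() only exposes the fill and colour each layer ended up with.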

Related

How to plot two histograms of different variables in one GGPlot, with legend and colours

This is my first post on Stack Overflow, my first reproducible example, and I'm new to R, so please be gentle!
I am trying to display two histograms on one plot. Each histogram is a different variable (column) in my dataframe. I can't figure out how to both colour in the bars and have the legend displayed. If I use scale_fill_manual the colours are ignored, but if I use scale_colour_manual the colours are just the outlines of the bars. If I map the colours to each histogram separately (and don't use scale_xxx_manual at all) the colours work great but I then don't get the legend.
Here is my code:
library(ggplot2)
TwoHistos <- ggplot(cars) +
  labs(color="Variable name", x="XX", y="Count") +
  geom_histogram(aes(x=speed, color="Speed"), alpha = 0.2) +
  geom_histogram(aes(x=dist, color="Dist"), alpha = 0.2) +
  scale_colour_manual(values = c("yellow","green"))
TwoHistos
Here is my result in an image (I pasted it but I don't know why it isn't showing up. I'm sorry!):
[Image: Two histograms with outlines for colours]
I think (if I understand you correctly) what you might want is to give a fill argument within the geom_histogram() call.
(I've used the mtcars built-in R data here as you did not give any data to work with)
library(ggplot2)
TwoHistos <- ggplot(mtcars) +
  labs(fill="Variable name", x="XX", y="Count") +
  geom_histogram(aes(x=hp, fill="Speed", color="yellow"), alpha = 0.2) +
  geom_histogram(aes(x=disp, fill="Dist", color="green"), alpha = 0.2) +
  scale_fill_manual(values = c("yellow","green")) +
  scale_colour_manual(values = c("yellow","green"), guide = "none")
TwoHistos
Edit: to be clear, I've changed the x variables in geom_histogram() so that it works with mtcars.
Use fill instead of color, together with scale_fill_manual (and set the legend title with labs(fill = ...)):
library(ggplot2)
TwoHistos <- ggplot(cars) +
  labs(fill="Variable name", x="XX", y="Count") +
  geom_histogram(aes(x=speed, fill="Speed"), alpha = 0.2) +
  geom_histogram(aes(x=dist, fill="Dist"), alpha = 0.2) +
  scale_fill_manual(values = c("yellow","green"))
TwoHistos
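Both answers keep two separate layers. An alternative, sketched here as an assumption rather than what either answerer did, is to reshape the two columns of the built-in cars data into one long data frame so a single fill mapping drives both the bars and the legend. long_cars, value and variable are hypothetical names chosen for the illustration.

```r
library(ggplot2)

# Hypothetical long-format version of the built-in cars data:
# one 'value' column plus a 'variable' column naming its source
long_cars <- data.frame(
  value = c(cars$speed, cars$dist),
  variable = rep(c("Speed", "Dist"), each = nrow(cars))
)

two_histos <- ggplot(long_cars, aes(x = value, fill = variable)) +
  # position = "identity" overlays the histograms instead of stacking them
  geom_histogram(position = "identity", alpha = 0.4, bins = 30) +
  # Named values make the colour pairing explicit
  scale_fill_manual(values = c(Speed = "yellow", Dist = "green")) +
  labs(fill = "Variable name", x = "XX", y = "Count")

# Fill colours actually assigned to the bars
bar_fills <- unique(ggplot_build(two_histos)$data[[1]]$fill)
```

Print two_histos to draw it. Without position = "identity", geom_histogram() stacks the two groups by default, which is usually not what you want when comparing distributions.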

Adding reference points and lines to a geom_line() plot

I have a data frame like dat1:
dat1 <- data.frame(idx = 1:200,
                   fit = rnorm(200,10,0.5))
cis <- data.frame(uci = dat1$fit+0.5,
                  lci = dat1$fit-0.5)
dat1 <- cbind(dat1,cis)
I also have 3 other objects where I have stored "points of interest" (poi1 to poi3) for dat1:
poi1 <- c(30,59,120,150)
poi2 <- c(10,42,110,165,190)
poi3 <- c(50, 100)
I made a line plot with confidence bands for dat1 using this code:
library(ggplot2)
p <- ggplot(dat1, aes(x=idx, y=fit)) +
  geom_line() +
  geom_ribbon(aes(ymin = lci, ymax = uci), alpha = 0.3) +
  labs(x="Distance", y="Var1")
p
I want to highlight the "points of interest" along this line: the points for poi1 in red, the points for poi2 in blue, and the points for poi3 in green. I can use geom_vline() to make them all vertical lines:
p+
geom_vline(xintercept =poi1, color="red")+
geom_vline(xintercept = poi2, color = "blue")+
geom_vline(xintercept = poi3, color = "green")
But I would actually like the points for poi1 and poi2 to be either red and blue points (instead of lines, leaving poi3 as a vertical green line), or much "shorter" versions of what geom_vline draws (both above and below the black line). I cannot get geom_point() to behave correctly and do this. Do I need to format it differently, or how can I accomplish this?
Also, how can I then add a legend in the top corner that shows which line/point and color denotes which group of poi? For instance, if they are points (plus the green line), it would have a red point next to the word "poi1", a blue point next to the word "poi2", and a small green line next to the word "poi3".
Probably the most straightforward way to add the colored points is to make separate calls to geom_point and use poi1, poi2, and poi3 to subset the data for each call.
So, the code would look something like this:
p +
geom_point(data=dat1[which(dat1$idx %in% poi1),], color='red', size=3) +
geom_point(data=dat1[which(dat1$idx %in% poi2),], color='blue', size=3) +
geom_point(data=dat1[which(dat1$idx %in% poi3),], color='green', size=3)
The one problem here is that there's no reference to what those colors are supposed to be (as in... a legend). If you want to add that, you would put color= within aes(...) so that a legend key is created, and assign the name of the key item within (e.g. aes(color="name of item 1")). Then you would need a scale_color_manual(...) call to set the colors specifically... or you can do without the scale_color_manual object and just leave it up to ggplot to figure out the color scheme.
Assigning the colors manually in legend:
p +
geom_point(data=dat1[which(dat1$idx %in% poi1),], aes(color='poi1'), size=3) +
geom_point(data=dat1[which(dat1$idx %in% poi2),], aes(color='poi2'), size=3) +
geom_point(data=dat1[which(dat1$idx %in% poi3),], aes(color='poi3'), size=3) +
scale_color_manual(values=c('poi1'='red','poi2'='blue','poi3'='green'))
Without manual color assignment, ggplot just uses its default discrete color palette:
p +
geom_point(data=dat1[which(dat1$idx %in% poi1),], aes(color='poi1'), size=3) +
geom_point(data=dat1[which(dat1$idx %in% poi2),], aes(color='poi2'), size=3) +
geom_point(data=dat1[which(dat1$idx %in% poi3),], aes(color='poi3'), size=3)
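Following the usual ggplot2 advice of working with long-format data, the separate geom_point() calls can also be collapsed into one. A minimal sketch, assuming ggplot2 is installed; poi_long is a hypothetical helper table, and the seeded data only mimics the question's dat1.

```r
library(ggplot2)

# Illustration data shaped like the question's dat1
set.seed(42)
dat1 <- data.frame(idx = 1:200, fit = rnorm(200, 10, 0.5))
dat1$uci <- dat1$fit + 0.5
dat1$lci <- dat1$fit - 0.5
poi1 <- c(30, 59, 120, 150)
poi2 <- c(10, 42, 110, 165, 190)

# Hypothetical long-format table: one row per point of interest and its group
poi_long <- data.frame(
  idx = c(poi1, poi2),
  group = rep(c("poi1", "poi2"), times = c(length(poi1), length(poi2)))
)
# Attach the fitted value at each index so the points sit on the line
poi_long <- merge(poi_long, dat1[, c("idx", "fit")], by = "idx")

p <- ggplot(dat1, aes(x = idx, y = fit)) +
  geom_line() +
  geom_ribbon(aes(ymin = lci, ymax = uci), alpha = 0.3) +
  # A single point layer; the colour mapping builds the legend
  geom_point(data = poi_long, aes(colour = group), size = 3) +
  scale_colour_manual(values = c(poi1 = "red", poi2 = "blue")) +
  labs(x = "Distance", y = "Var1")

# Data of the third layer (the points) after scales are applied
pts <- ggplot_build(p)$data[[3]]
```

Adding a poi3 group is then just one more entry in poi_long and one more named value in the scale.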
Adding a vertical line and showing that in the legend
Related to the follow up question in the comment to this solution: How would you add a vertical line for xintercept=poi3 and have that shown in the legend?
You could just add individual calls to geom_vline with separate values for xintercept=, but if you have many values this is bad practice. A better practice is to have one geom_vline call where you set data= to poi3. ggplot wants to receive a data frame as data=, so in the code below I force this conversion via as.data.frame(...). This creates a data frame of 2 observations with one column called "poi3", so that's what we assign as the xintercept= aesthetic. This works... but you'll see it's not quite right:
p +
geom_point(data=dat1[which(dat1$idx %in% poi1),], aes(color='poi1'), size=3) +
geom_point(data=dat1[which(dat1$idx %in% poi2),], aes(color='poi2'), size=3) +
geom_vline(data=as.data.frame(poi3), aes(xintercept=poi3, color='poi3'),
linetype=2, size=1) +
scale_color_manual(values=c('poi1'='red','poi2'='blue','poi3'='green2'))
The green line is added to the legend, but it's drawn "behind" all the points as a line + point key. What we want is a separate legend; ggplot2 tries to combine legends where possible. What we'll do is a bit of a cheat: force ggplot2 to create another legend for linetype, and set color='green2' outside of aes() in the geom_vline call. This removes the line from the legend created for "colour" and adds another legend for "linetype" showing only our line. The final fix is to set the value for linetype via scale_linetype_manual. Since there's only one key here, you only need one value.
p +
geom_point(data=dat1[which(dat1$idx %in% poi1),], aes(color='poi1'), size=3) +
geom_point(data=dat1[which(dat1$idx %in% poi2),], aes(color='poi2'), size=3) +
geom_vline(
data=as.data.frame(poi3), aes(xintercept=poi3, linetype='poi3'),
color='green2', size=1) +
scale_color_manual(values=c('poi1'='red','poi2'='blue')) +
scale_linetype_manual(values=2)
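The question also asked about much "shorter" versions of geom_vline, centred on the black line. One way to sketch that, assumed rather than taken from the answer above, is geom_linerange() with ymin and ymax a fixed distance either side of fit. tick_half and ticks are hypothetical names, and linewidth assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Illustration data shaped like the question's dat1
set.seed(42)
dat1 <- data.frame(idx = 1:200, fit = rnorm(200, 10, 0.5))
poi1 <- c(30, 59, 120, 150)
poi2 <- c(10, 42, 110, 165, 190)

tick_half <- 0.4  # hypothetical half-length of each tick, in y units

# One row per point of interest, carrying the fitted value at that index
ticks <- rbind(
  data.frame(dat1[dat1$idx %in% poi1, c("idx", "fit")], group = "poi1"),
  data.frame(dat1[dat1$idx %in% poi2, c("idx", "fit")], group = "poi2")
)

p <- ggplot(dat1, aes(x = idx, y = fit)) +
  geom_line() +
  # Short vertical segments centred on the line; inherit.aes = FALSE
  # keeps the global y mapping out of this layer
  geom_linerange(
    data = ticks,
    aes(x = idx, ymin = fit - tick_half, ymax = fit + tick_half, colour = group),
    inherit.aes = FALSE, linewidth = 1
  ) +
  scale_colour_manual(values = c(poi1 = "red", poi2 = "blue"), name = NULL)

# Data of the segment layer after scales are applied
seg <- ggplot_build(p)$data[[2]]
```

The full-height green poi3 line from the answer above can be added to p unchanged; it keeps its own linetype legend.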

R: ggplot2 density plot shows wrong fill colors

I would like to plot densities of two variables ("red_variable", "green_variable") from two independent dataframes on one density plot, using red and green color for the two variables.
This is my attempt at coding:
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = red_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = green_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
Result: The legend shows correct colors, but the colors on the plot are wrong: The "red" variable is plotted with green color, the "green" variable with red color. The "green" density (mean=8) should appear left and the "red" density (mean=12) on the right on the x-axis. This behavior of the plot doesn't make any sense to me.
I can in fact get the desired result by switching red and green in the code:
### load ggplot2
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = green_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = red_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
... While the plot makes sense now, the code doesn't. I cannot really trust code doing the opposite of what I would expect it to do. What's the problem here? Am I color blind?
In your code, to get the colors in the right places you need to specify fill = red_color or fill = green_color (as well as alpha, since it is a constant, as pointed out by @Gregor) outside of aes(), like this:
...+
geom_density(aes(x=red_variable), alpha=0.5, fill = red_color, data=red_dataframe) +
geom_density(aes(x=green_variable), alpha=0.5, fill = green_color, data=green_dataframe) + ...
Alternatively, you can bind your data frames together, reshape them into a longer format (much more appropriate for ggplot), and then add a color column that you can use with the scale_fill_identity function (https://ggplot2.tidyverse.org/reference/scale_identity.html):
df <- cbind(red_dataframe,green_dataframe)
library(tidyr)
library(ggplot2)
library(dplyr)
df <- df %>% pivot_longer(.,cols = c(red_variable,green_variable), names_to = "var",values_to = "val") %>%
mutate(Color = ifelse(grepl("red",var),red_color,green_color))
ggplot(df, aes(val, fill = Color))+
geom_density(alpha = 0.5)+
scale_fill_identity(guide = "legend", name = "Legend", labels = levels(as.factor(df$var)))+
xlab("X value") +
ylab("Density")
Does this answer your question?
You're trying to use ggplot as if it's base graphics... the mindset shift can take a little while to get used to. dc37's answer shows how you should do it. I'll try to explain what goes wrong in your attempt:
When you put fill = green_color inside aes(), ggplot essentially creates a new column of data filled with the green_color value for every row of your green_dataframe, i.e., "#008000", "#008000", "#008000", .... Ditto for the red color values in the red data frame. We can see this if we modify your plot by simply deleting your scale:
ggplot() +
geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data =
red_dataframe) +
geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data =
green_dataframe) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = "none")
We can actually get what you want by adding the identity scale, which is designed for the (common in base, rare in ggplot2) case where you actually put color values in the data. Because the identity scale uses each value as-is, each layer gets the color you give it:
ggplot() +
  geom_density(aes(x = red_variable, fill = red_color, alpha = 0.5), data =
    red_dataframe) +
  geom_density(aes(x = green_variable, fill = green_color, alpha = 0.5), data =
    green_dataframe) +
  scale_fill_identity() +
  xlab("X value") +
  ylab("Density") +
  labs(fill = "Legend") +
  guides(alpha = "none")
When you added your scale_fill_manual, ggplot was like "okay, cool, you want to specify colors and labels". But you were thinking in the order that you added the layers to the plot (much like base graphics), whereas ggplot was thinking of these newly created variables "#FF0000" and "#008000", which it ordered alphabetically by default (just as if they were factor or character columns in a data frame). And since you happened to add the layers in reverse alphabetical order, it was switched.
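The ordering step can be checked in base R without drawing anything. This is a simplified model, not ggplot2's internal code: it sorts the two colour strings the way a discrete scale orders character values, then pairs them by position with the unnamed values vector from the question.

```r
red_color <- "#FF0000"
green_color <- "#008000"

# The two constant strings mapped inside aes(), in layer order (red layer first)
mapped <- c(red_color, green_color)

# A discrete scale orders character values like factor levels, i.e. sorted
breaks <- levels(factor(mapped))
print(breaks)  # "#008000" "#FF0000"

# The unnamed values vector from the question is matched to the breaks by position
pairing <- setNames(c(red_color, green_color), breaks)
print(pairing)
```

Reading pairing: the layer whose constant was "#008000" (the green layer in the first attempt) is drawn in "#FF0000", which is the swap described in the question. A named values vector removes the dependence on this order.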
dc37's answer shows a couple of better methods. With ggplot you should (a) work with a single, long-format data frame whenever possible, (b) avoid putting constants (constant color, constant alpha, etc.) inside aes(), and (c) set colors in a scale_fill_* or scale_color_* function when they're not constant.
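Putting (a), (b) and (c) together for this example gives something like the sketch below. It assumes ggplot2 is installed; df, val and var are illustrative names (var mirrors the column created in dc37's answer), and breaks and labels are given in the same order so each label lines up with its colour.

```r
library(ggplot2)

red_dataframe <- data.frame(red_variable = c(10, 11, 12, 13, 14))
green_dataframe <- data.frame(green_variable = c(6, 7, 8, 9, 10))

# (a) one long-format data frame with a column naming the source variable
df <- data.frame(
  val = c(red_dataframe$red_variable, green_dataframe$green_variable),
  var = rep(c("red_variable", "green_variable"),
            times = c(nrow(red_dataframe), nrow(green_dataframe)))
)

p <- ggplot(df, aes(x = val, fill = var)) +
  # (b) alpha is a constant, so it is set outside aes()
  geom_density(alpha = 0.5) +
  # (c) named values pair each variable with its colour explicitly;
  # labels are matched to breaks by position
  scale_fill_manual(
    name = "Legend",
    values = c(red_variable = "#FF0000", green_variable = "#008000"),
    breaks = c("red_variable", "green_variable"),
    labels = c("Density of red_variable", "Density of green_variable")
  ) +
  labs(x = "X value", y = "Density")

# Density-weighted mean of x for the curve drawn in a given fill colour
d <- ggplot_build(p)$data[[1]]
wmean <- function(f) with(d[d$fill == f, ], sum(x * density) / sum(density))
```

The red curve should now sit to the right (its data are centred on 12) and the green one to the left (centred on 8), whatever order the layers or levels happen to be in.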

remove specific legend borders

I'm trying to modify the legend to a map by removing the borders around particular cells within the legend.
A simplified version of my code looks like this:
ggplot() +
  stat_density2d(data = carto@data,
                 aes(x=field_2, y=field_1),
                 geom="polygon",
                 alpha = .37,
                 fill = "#e29206") +
  geom_point(data = Schools@data[Schools@data$sch_type == "Charter" | Schools@data$sch_type == "District", ],
             aes(x = x, y = y, color = sch_type, shape = grade_cat),
             size = 1) +
  scale_colour_manual(values=c("#e0100d", "#4753ff")) +
  guides(color=guide_legend(override.aes=list(fill = "white"))) +
  theme(legend.key = element_blank()) +
  coord_map()
which produces the following image:
I would like to remove the blue and red borders on the top two legend cells. If I add color to the override.aes() arguments within guides() it changes the borders, but also makes the charter and district colors the same. Is there a different argument I could use in the place of color?
I looked at these two questions (among various sources) before posting:
Different legend-keys inside same legend in ggplot2
ggplot2 avoid boxes around legend symbols

R {ggplot2} define the bars columns by color within applied facet_grid()?

I have a basic plot of two variables on multiple levels, which I display with facet_grid(). As a result, I have a set of bars of the same color, arranged by the levels of the two variables.
However, what if I want to indicate that my data come from different sources?
i.e. Friday to Sunday come from the RED data, and Thursday from the BLACK data.
Is there a way to indicate on my final plot, by color, that my data come from the RED and BLACK datasets?
Something like:
(example taken from : http://www.cookbook-r.com/Graphs/Facets_(ggplot2)/)
Example code:
require(ggplot2)
library(reshape2)
head(tips)
ggplot(tips, aes(x=total_bill)) + geom_histogram(binwidth=2,colour="white")+
facet_grid(sex ~ day)
I know that I can change those colors easily in Inkscape, but maybe there is a simple R-based solution?
I will specify the data sources (RED and BLACK) in the figure caption, so there is no need to label them in the plot.
By assigning day to the fill aesthetic and manually defining the colour for the different values of day you can get the desired effect.
The following code:
ggplot(tips, aes(x=total_bill, fill = day)) + geom_histogram(binwidth=2, colour = "white")+
facet_grid(sex ~ day) +
scale_fill_manual(values = c("Fri" = "Red", "Sat" = "Red", "Sun" = "Red", "Thur" = "Black"))
gives the following plot:
The legend can be hidden by adding + guides(fill = "none").
This works:
ggplot(tips, aes(x=total_bill, fill=day=="Thur")) + geom_histogram(binwidth=2)+
facet_grid(sex ~ day) +
scale_fill_manual(values=c("red", "black"))
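Both answers map fill to something derived from day. A third option, sketched here as an assumption, is to record the data source explicitly in a hypothetical source column, so the colour rule lives in the data rather than in the scale. It assumes the tips data from reshape2, as in the question; the ggplot_build() lines only check which fill each panel received.

```r
library(ggplot2)
library(reshape2)  # provides the tips data used in the question

# Hypothetical helper column recording which dataset each row came from
tips$source <- ifelse(tips$day == "Thur", "BLACK", "RED")

p <- ggplot(tips, aes(x = total_bill, fill = source)) +
  geom_histogram(binwidth = 2, colour = "white") +
  facet_grid(sex ~ day) +
  scale_fill_manual(values = c(RED = "red", BLACK = "black"), guide = "none")

built <- ggplot_build(p)
layer_df <- built$data[[1]]
panels <- built$layout$layout
# Look up the day shown in each bar's panel
layer_df$day <- panels$day[match(layer_df$PANEL, panels$PANEL)]
```

If a day ever mixes rows from both sources, this version colours those bars by source within the panel, which a day-based mapping cannot do.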
Edit: added colors "black" and "red"
