Legend for overlaid plots in ggplot - r

I'm trying to make a plot that overlays a bunch of simulated density plots that are one color with low alpha and one empirical density plot with high alpha in a new color. This produces a plot that looks about how I want it.
library(ggplot2)
model <- c(1:100)
values <- rnbinom(10000, 1, .4)
df = data.frame(model, values)
empirical_data <- rnbinom(1000, 1, .3)
ggplot() +
geom_density(aes(x=empirical_data), color='orange') +
geom_line(stat='density',
data = df,
aes(x=values,
group = model),
color='blue',
alpha = .05) +
xlab("Value")
However, it doesn't have a legend and I can't figure out how to add a legend to differentiate plots from df and plots from empirical_data.
The other road I started to go down was to put them all in one dataframe but I couldn't figure out how to change the color and alpha for just one of the density plots.

Moving the color = ... into the aes allows you to call the scale_color_manual and move them into the aes and make the values you pass to color a binding. You can then change it to whatever you want as the actual colors are determined in the scale_color_manual.
ggplot() +
geom_density(aes(x=empirical_data, color='a')) +
geom_line(stat='density',
data = df,
aes(x=values,
group = model,
color='b'),
alpha = .05) +
scale_color_manual(name = 'data source',
values =c('b'='blue','a'='orange'),
labels = c('df','empirical_data')) +
xlab("Value")

Related

Plotting colors by value of a variable not being plotted in {ggplot2}

I am trying to code a plot using the data frame 'swiss' from {datasets} using {ggplot2}. I am plotting Infant.Mortality on the x-axis and Fertility on the y-axis, and I want the points to be colored such that they are a transparent blue or orange depending on if they are above or below the median value for Education. However, when I plot, I only get transparent blue points and the legend titles are off.
This is the code I have to far:
swiss$color[swiss$Education >= median(swiss$Education)] <- tBlue
swiss$color[swiss$Education < median(swiss$Education)] <- tOrange
ggplot(data = swiss) +
geom_point(mapping = aes(x = Infant.Mortality, y = Fertility, color = color)) +
scale_color_manual(values = swiss$color,
labels = ">= median", "<median")
I've also tried what was explained in this question (ggplot geom_point() with colors based on specific, discrete values) but I couldn't get it to work.
I am very new to ggplot, so any advice is appreciated!!
With ggplot we don't normally create column of color names (this is common in base graphics). Instead, the usual way is to create a column in your data with meaningful labels, like this:
swiss$edu_med = ifelse(swiss$Education >= median(swiss$Education), ">= Median", "< Median")
ggplot(data = swiss) +
geom_point(mapping = aes(x = Infant.Mortality, y = Fertility, color = edu_med)) +
scale_color_manual(values = c(tblue, torange))
The legend labels will be automatically generated from the data values.
It is possible to do it the way you have in the question, in this case use scale_color_identity(labels = ">= median", "< median") instead of scale_color_manual().

How to change the legend from geom_area to color in geom_line [duplicate]

I have a graph of wind speeds against direction which has a huge numeber of points, and so am using alpha=I(1/20) in addition to color=month
Here is a sample of code:
library(RMySQL)
library(ggplot2)
con <- dbConnect(...)
wind <- dbGetQuery(con, "SELECT speed_w/speed_e AS ratio, dir_58 as dir, MONTHNAME(timestamp) AS month, ROUND((speed_w+speed_e)/2) AS speed FROM tablename;");
png("ratio-by-speed.png",height=400,width=1200)
qplot(wind$dir,wind$ratio,ylim=c(0.5,1.5),xlim=c(0,360),color=wind$month,alpha=I(1/30),main="West/East against direction")
dev.off()
This produces a decent graph, however my issue is that the alpha of the legend is 1/30th also, which makes it unreadable. Is there a way I can force the legend to be 1 alpha instead?
Here is an example:
Update With the release of version 0.9.0, one can now override aesthetic values in the legend using override.aes in the guides function. So if you add something like this to your plot:
+ guides(colour = guide_legend(override.aes = list(alpha = 1)))
that should do it.
I've gotten around this by doing a duplicate call to the geom using an empty subset of the data and using the legend from that call. Unfortunately, it doesn't work if the data frame is actually empty (e.g. as you'd get from subset(diamonds,FALSE)) since ggplot2 seems to treat this case the same as it treats NULL in place of a data frame. But we can get the same effect by taking a subset with only one row and setting it to NaN on one of the plot dimensions, which will prevent it from getting plotted.
Based off Chase's example:
# Alpha parameter washes out legend:
gp <- ggplot() + geom_point(data=diamonds, aes(depth, price, colour=clarity), alpha=0.1)
print(gp)
# Full color legend:
dummyData <- diamonds[1, ]
dummyData$price <- NaN
#dummyData <- subset(diamonds, FALSE) # this would be nicer but it doesn't work!
gp <- ggplot() +
geom_point(data=diamonds, aes(depth, price, colour=clarity), alpha=0.1, legend=FALSE) +
geom_point(data=dummyData, aes(depth, price, colour=clarity), alpha=1.0, na.rm=TRUE)
print(gp)
A bit of googling turned up this post which doesn't seem to indicate that ggplot currently supports this option. Others have addressed related problems by using gridExtra and using viewPorts as discussed here.
I'm not that sophisticated, but here's one approach that should give you the desired results. The approach is to plot the geom twice, once without an alpha parameter and outside of the real plotting area. The second geom will include the alpha parameter and suppress the legend. We will then specify the plotting region with xlim and ylim. Given that you are a lot of points, this will roughly double the plotting time, but should give you the effect you are after.
Using the diamonds dataset:
#Alpha parameter washes out legend
ggplot(data = diamonds, aes(depth, price, colour = clarity)) +
geom_point(alpha = 1/10)
#Fully colored legend
ggplot() +
geom_point(data = diamonds, aes(depth, price, colour =clarity), alpha = 1/10, legend = FALSE) +
geom_point(data = diamonds, aes(x = depth - 999999, y = price - 999999, colour = clarity)) +
xlim(40, 80) + ylim(0, 20000)

R code of scatter plot for three variables

Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what is your desidered output. Maybe, if I well understood your question I Think that you want somthing like this.
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
#Maninder: there are a few things you need to look at.
First of all: The grammar of graphics of ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
The reason why your code is not working is that you mix the layer call and or do not really specify (and even mix) what is the scatter and line visualisation you want.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting antisocal data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antiscoial data set using the scatter, i.e. geom_point() layer.
Note that I put Race as a factor to have a categorical colour scheme otherwise you might end up with a continous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I save showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot-layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend to keep the data= and aes() call in the geom_xxxx. This avoids confustion.
You may want to explore with geom_jitter() instead of geom_point() to get a bit of a better presentation of your dataset. The "few" points plotted are the result of many datapoints in the same position (and overplotted).
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and linetype. You can play around with the colours, etc. to give you the aesthetics you aim for.
Please note, that for the grouped boxplot, you also have to treat your integer variable YOI, I coerced into a categorical factor. Boxplot works with fill for the body (colour sets only the outer line). In this setup, you also need to supply a group value to geom_line() (I just assigned it to 1, but that is arbitrary - in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1)
, size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
Hope this gets you going!

R: ggplot2 density plot shows wrong fill colors

I would like to plot densities of two variables ("red_variable", "green_variable") from two independent dataframes on one density plot, using red and green color for the two variables.
This is my attempt at coding:
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = red_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = green_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
Result: The legend shows correct colors, but the colors on the plot are wrong: The "red" variable is plotted with green color, the "green" variable with red color. The "green" density (mean=8) should appear left and the "red" density (mean=12) on the right on the x-axis. This behavior of the plot doesn't make any sense to me.
I can in fact get the desired result by switching red and green in the code:
### load ggplot2
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = green_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = red_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
... While the plot makes sense now, the code doesn't. I cannot really trust code doing the opposite of what I would expect it to do. What's the problem here? Am I color blind?
On your code, in order to have color at the right position, you need to specify fill = red_color or fill = green_color (as well as alpha as it is a constant - as pointed out by #Gregor) outside of the aes such as:
...+
geom_density(aes(x=red_variable), alpha=0.5, fill = red_color, data=red_dataframe) +
geom_density(aes(x=green_variable), alpha=0.5, fill = green_color, data=green_dataframe) + ...
Alternatively, you can bind your dataframes together, reshape them into a longer format (much more appropriate to ggplot) and then add color column that you can use with scale_fill_identity function (https://ggplot2.tidyverse.org/reference/scale_identity.html):
df <- cbind(red_dataframe,green_dataframe)
library(tidyr)
library(ggplot2)
library(dplyr)
df <- df %>% pivot_longer(.,cols = c(red_variable,green_variable), names_to = "var",values_to = "val") %>%
mutate(Color = ifelse(grepl("red",var),red_color,green_color))
ggplot(df, aes(val, fill = Color))+
geom_density(alpha = 0.5)+
scale_fill_identity(guide = "legend", name = "Legend", labels = levels(as.factor(df$var)))+
xlab("X value") +
ylab("Density")
Does it answer your question ?
You're trying to use ggplot as if it's base graphics... the mindset shift can take a little while to get used to. dc37's answer shows how you should do it. I'll try to explain what goes wrong in your attempt:
When you put fill = green_color inside aes(), because it's inside aes() ggplot essentially creates a new column of data filled with the green_color values in your green_data_frame, i.e., "#008000", "#008000", "#008000", .... Ditto for the red color values in the red data frame. We can see this if we modify your plot by simply deleting your scale:
ggplot() +
geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data =
red_dataframe) +
geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data =
green_dataframe) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = FALSE)
We can actually get what you want by putting the identity scale, which is designed for the (common in base, rare in ggplot2) case where you actually put color values in the data.
ggplot() +
geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data =
red_dataframe) +
geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data =
green_dataframe) +
scale_fill_identity() +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = FALSE)
When you added your scale_fill_manual, ggplot was like "okay, cool, you want to specify colors and labels". But you were thinking in the order that you added the layers to the plot (much like base graphics), whereas ggplot was thinking of these newly created variables "#FF0000" and "#008000", which it ordered alphabetically by default (just as if they were factor or character columns in a data frame). And since you happened to add the layers in reverse alphabetical order, it was switched.
dc37's answer shows a couple better methods. With ggplot you should (a) work with a single, long-format data frame whenever possible (b) don't put constants inside aes() (constant color, constant alpha, etc.), (c) set colors in a scale_fill_* or scale_color_* function when they're not constant.

How to set legend alpha with ggplot2

I have a graph of wind speeds against direction which has a huge numeber of points, and so am using alpha=I(1/20) in addition to color=month
Here is a sample of code:
library(RMySQL)
library(ggplot2)
con <- dbConnect(...)
wind <- dbGetQuery(con, "SELECT speed_w/speed_e AS ratio, dir_58 as dir, MONTHNAME(timestamp) AS month, ROUND((speed_w+speed_e)/2) AS speed FROM tablename;");
png("ratio-by-speed.png",height=400,width=1200)
qplot(wind$dir,wind$ratio,ylim=c(0.5,1.5),xlim=c(0,360),color=wind$month,alpha=I(1/30),main="West/East against direction")
dev.off()
This produces a decent graph, however my issue is that the alpha of the legend is 1/30th also, which makes it unreadable. Is there a way I can force the legend to be 1 alpha instead?
Here is an example:
Update With the release of version 0.9.0, one can now override aesthetic values in the legend using override.aes in the guides function. So if you add something like this to your plot:
+ guides(colour = guide_legend(override.aes = list(alpha = 1)))
that should do it.
I've gotten around this by doing a duplicate call to the geom using an empty subset of the data and using the legend from that call. Unfortunately, it doesn't work if the data frame is actually empty (e.g. as you'd get from subset(diamonds,FALSE)) since ggplot2 seems to treat this case the same as it treats NULL in place of a data frame. But we can get the same effect by taking a subset with only one row and setting it to NaN on one of the plot dimensions, which will prevent it from getting plotted.
Based off Chase's example:
# Alpha parameter washes out legend:
gp <- ggplot() + geom_point(data=diamonds, aes(depth, price, colour=clarity), alpha=0.1)
print(gp)
# Full color legend:
dummyData <- diamonds[1, ]
dummyData$price <- NaN
#dummyData <- subset(diamonds, FALSE) # this would be nicer but it doesn't work!
gp <- ggplot() +
geom_point(data=diamonds, aes(depth, price, colour=clarity), alpha=0.1, legend=FALSE) +
geom_point(data=dummyData, aes(depth, price, colour=clarity), alpha=1.0, na.rm=TRUE)
print(gp)
A bit of googling turned up this post which doesn't seem to indicate that ggplot currently supports this option. Others have addressed related problems by using gridExtra and using viewPorts as discussed here.
I'm not that sophisticated, but here's one approach that should give you the desired results. The approach is to plot the geom twice, once without an alpha parameter and outside of the real plotting area. The second geom will include the alpha parameter and suppress the legend. We will then specify the plotting region with xlim and ylim. Given that you are a lot of points, this will roughly double the plotting time, but should give you the effect you are after.
Using the diamonds dataset:
#Alpha parameter washes out legend
ggplot(data = diamonds, aes(depth, price, colour = clarity)) +
geom_point(alpha = 1/10)
#Fully colored legend
ggplot() +
geom_point(data = diamonds, aes(depth, price, colour =clarity), alpha = 1/10, legend = FALSE) +
geom_point(data = diamonds, aes(x = depth - 999999, y = price - 999999, colour = clarity)) +
xlim(40, 80) + ylim(0, 20000)

Resources