How can I make a legend representing all the curves plotted in my graph? At present, an automatic legend is generated for the first layer (based on the "colour" aesthetic), but the other layer (the black curve representing the density of the "price" variable across all observations) is not included in this legend.
I realize my question probably stems from an incomplete understanding of the concepts behind the ggplot package.
ggplot(diamonds) +
geom_density(aes(x = price, y = ..density.., colour = cut)) +
geom_density(aes(x = price,y = ..density..))
The principle in ggplot2 is that each aesthetic gets mapped to a scale. So, if you want to include a layer in the colour scale, you need to map that layer to colour.
Like this:
ggplot(diamonds, aes(x=price)) +
geom_density(aes(colour = cut)) +
geom_density(aes(colour="Overall"), size=1.5)
Note: You can take additional control over the colours by specifying a manual colour scale:
ggplot(diamonds, aes(x=price)) +
geom_density(aes(colour = cut)) +
geom_density(aes(colour="Overall"), size=1.5) +
scale_colour_manual(
limits=c("Overall", levels(diamonds$cut)),
values=c("black", 2:6)
)
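A small variation on the manual scale above, as a sketch only: naming the elements of `values` ties each colour to its legend entry explicitly, so the mapping does not depend on the order of `limits`. The palette (`scales::hue_pal()`) and the names `cut_levels` and `line_colours` are my own choices for illustration, not part of the original answer, and `linewidth` assumes ggplot2 3.4.0 or later (older versions use `size`).

```r
library(ggplot2)

# Named colour vector: "Overall" is black, each cut level gets its own hue.
# (The hue palette is an assumption made for this illustration.)
cut_levels <- levels(diamonds$cut)
line_colours <- c(
  Overall = "black",
  setNames(scales::hue_pal()(length(cut_levels)), cut_levels)
)

p <- ggplot(diamonds, aes(x = price)) +
  geom_density(aes(colour = cut)) +
  geom_density(aes(colour = "Overall"), linewidth = 1.5) +
  scale_colour_manual(
    limits = names(line_colours),  # controls the legend order
    values = line_colours          # matched to the levels by name
  )
```

Printing `p` should draw one legend containing "Overall" plus the five cut levels.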
I am trying to use qplot to make a scatter plot matrix.
qplot(X, Y, data=Customers, shape = Z,facets=ColA~ColB, size=I(3), xlab="X",ylab="Y")
Where Z is a categorical variable with more than 6 levels.
I get this error message:
"The shape palette can deal with a maximum of 6 discrete values because more than 6 becomes difficult to discriminate. Consider specifying shapes manually if you must have them"
My question is, how do I specify shape manually?
You would be better off calling the plot through ggplot directly and setting the shape scale manually instead of using qplot:
ggplot(data=Customers, aes(x=X, y=Y, shape=Z)) +
geom_point(size=1) +
labs(x="X",y="Y")+
# one distinct, valid shape code per level of Z (codes 26-31 are unused)
scale_shape_manual(values=c(4,8,15,16,17,18,21,22,3)) +
facet_grid(ColA~ColB)
This page has a legend of all the available shapes for plotting in ggplot: https://www.datanovia.com/en/blog/ggplot-point-shapes-best-tips/
qplot is a "quick and dirty" method for making plots; calling the plot commands through ggplot gives you more control over the output.
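As a rough sketch of the same idea on built-in data (the `Customers` data frame is not available here, so `mpg` and its `class` column stand in as an assumption): `class` has 7 levels, one more than the default shape palette accepts, and the shape codes are generated from the number of levels rather than typed by hand.

```r
library(ggplot2)

# mpg$class has 7 levels, one more than the default shape palette handles.
n_levels <- length(unique(mpg$class))

# Codes 0-25 are the standard point symbols; take the first n_levels of them.
shape_codes <- seq_len(n_levels) - 1  # 0, 1, ..., n_levels - 1

p <- ggplot(mpg, aes(x = displ, y = hwy, shape = class)) +
  geom_point(size = 3) +
  scale_shape_manual(values = shape_codes)
```

This only works up to 26 levels; beyond that, shapes stop being a readable encoding anyway.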
My first tip would be to avoid qplot. The short syntax is doing nobody any favors. Try
ggplot(Customers, aes(x = X, y = Y, shape = Z)) +
theme_bw() +
geom_point(size = 3) +
xlab("X") + ylab("Y") +
facet_grid(ColA ~ ColB)
and you can now easily read it and add extra layers or scales, e.g. manual colors (these take effect once a variable is mapped to colour inside aes()). See the documentation on how to specify colors in various ways.
ggplot(Customers, aes(x = X, y = Y, shape = Z)) +
theme_bw() +
geom_point(size = 3) +
xlab("X") + ylab("Y") +
scale_colour_manual(values = c("red", "blue", "green", _more_colors_)) +
facet_grid(ColA ~ ColB)
My favorite is
scale_color_brewer(palette = "Set1")
I would like to plot densities of two variables ("red_variable", "green_variable") from two independent dataframes on one density plot, using red and green color for the two variables.
This is my attempt at coding:
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = red_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = green_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
Result: The legend shows correct colors, but the colors on the plot are wrong: The "red" variable is plotted with green color, the "green" variable with red color. The "green" density (mean=8) should appear left and the "red" density (mean=12) on the right on the x-axis. This behavior of the plot doesn't make any sense to me.
I can in fact get the desired result by switching red and green in the code:
### load ggplot2
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = green_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = red_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
... While the plot makes sense now, the code doesn't. I cannot really trust code doing the opposite of what I would expect it to do. What's the problem here? Am I color blind?
In your code, to get each color in the right place you need to specify fill = red_color or fill = green_color (and likewise alpha, since it is a constant, as pointed out by @Gregor) outside of aes(), like this:
...+
geom_density(aes(x=red_variable), alpha=0.5, fill = red_color, data=red_dataframe) +
geom_density(aes(x=green_variable), alpha=0.5, fill = green_color, data=green_dataframe) + ...
Alternatively, you can bind your dataframes together, reshape them into a longer format (much better suited to ggplot), and then add a color column that you can use with the scale_fill_identity function (https://ggplot2.tidyverse.org/reference/scale_identity.html):
df <- cbind(red_dataframe,green_dataframe)
library(tidyr)
library(ggplot2)
library(dplyr)
df <- df %>% pivot_longer(.,cols = c(red_variable,green_variable), names_to = "var",values_to = "val") %>%
mutate(Color = ifelse(grepl("red",var),red_color,green_color))
ggplot(df, aes(val, fill = Color))+
geom_density(alpha = 0.5)+
scale_fill_identity(guide = "legend", name = "Legend", labels = levels(as.factor(df$var)))+
xlab("X value") +
ylab("Density")
Does that answer your question?
You're trying to use ggplot as if it's base graphics... the mindset shift can take a little while to get used to. dc37's answer shows how you should do it. I'll try to explain what goes wrong in your attempt:
When you put fill = green_color inside aes(), ggplot treats it as data: it essentially creates a new column for that layer filled with the value of green_color, i.e., "#008000", "#008000", "#008000", .... Ditto for the red color value in the other layer. We can see this if we modify your plot by simply deleting your scale:
ggplot() +
  geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data = red_dataframe) +
  geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data = green_dataframe) +
  xlab("X value") +
  ylab("Density") +
  labs(fill = "Legend") +
  guides(alpha = FALSE)
We can actually get what you want by adding the identity scale, which is designed for the (common in base, rare in ggplot2) case where you actually put color values in the data. With the identity scale each value is used directly as its own color, so here fill = red_color goes with red_variable:
ggplot() +
  geom_density(aes(x = red_variable, fill = red_color, alpha = 0.5), data = red_dataframe) +
  geom_density(aes(x = green_variable, fill = green_color, alpha = 0.5), data = green_dataframe) +
  scale_fill_identity() +
  xlab("X value") +
  ylab("Density") +
  labs(fill = "Legend") +
  guides(alpha = FALSE)
When you added your scale_fill_manual, ggplot was like "okay, cool, you want to specify colors and labels". But you were thinking in the order that you added the layers to the plot (much like base graphics), whereas ggplot was thinking of these newly created variables "#FF0000" and "#008000", which it ordered alphabetically by default (just as if they were factor or character columns in a data frame). And since you happened to add the layers in reverse alphabetical order, it was switched.
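The ordering described above can be checked without drawing anything. This is just an illustrative sketch in base R; `fill_levels` and `assigned` are names I made up for it.

```r
red_color   <- "#FF0000"
green_color <- "#008000"

# Inside aes(), each constant becomes a one-level discrete variable.
# A discrete scale orders such levels by sorting them, like factor levels.
fill_levels <- sort(c(red_color, green_color))
fill_levels  # "#008000" sorts before "#FF0000"

# scale_fill_manual(values = c(red_color, green_color)) assigns by position,
# so the level "#008000" receives red_color and "#FF0000" receives green_color.
assigned <- setNames(c(red_color, green_color), fill_levels)
assigned
```

So the layer whose fill level is "#008000" ends up drawn in red, which is exactly the swap seen in the question.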
dc37's answer shows a couple of better methods. With ggplot you should (a) work with a single, long-format data frame whenever possible, (b) keep constants (constant color, constant alpha, etc.) outside aes(), and (c) set colors in a scale_fill_* or scale_color_* function when they're not constant.
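Putting those three points together might look like the following. It is a sketch rather than anyone's posted answer: the data frame `long_df` and its column names `var` and `val` are assumptions, and the colours are matched to the variable names through a named `values` vector so the layer order no longer matters.

```r
library(ggplot2)

red_color   <- "#FF0000"
green_color <- "#008000"

# (a) One long data frame: a value column plus a column naming the source.
long_df <- rbind(
  data.frame(var = "red_variable",   val = c(10, 11, 12, 13, 14)),
  data.frame(var = "green_variable", val = c(6, 7, 8, 9, 10))
)

p <- ggplot(long_df, aes(x = val, fill = var)) +
  geom_density(alpha = 0.5) +  # (b) the constant alpha stays outside aes()
  scale_fill_manual(           # (c) colours are set in the scale, by name
    name   = "Legend",
    values = c(red_variable = red_color, green_variable = green_color),
    labels = function(x) paste("Density of", x)
  ) +
  xlab("X value") +
  ylab("Density")
```

Because `values` is named, swapping the order of the rows or of the vector elements does not change which density gets which colour.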
I'm creating a plot with ggplot that uses colored points, vertical lines, and horizontal lines to display the data. Ideally, I'd like to use two different color or linetype scales for the geom_vline and geom_hline layers, but ggplot discourages/disallows multiple variables mapped to the same aesthetic.
# Create example data
library(tidyverse)
library(lubridate)
set.seed(1234)
# tibble() replaces the deprecated data_frame()
example.df <- tibble(dt = seq(ymd("2016-01-01"), ymd("2016-12-31"), by = "1 day"),
                     value = rnorm(366),
                     grp = sample(LETTERS[1:3], 366, replace = TRUE))
date.lines <- tibble(dt = ymd(c("2016-04-01", "2016-10-31")),
                     dt.label = c("April Fools'", "Halloween"))
value.lines <- tibble(value = c(-1, 1),
                      value.label = c("Threshold 1", "Threshold 2"))
If I set linetype aesthetics for both geom_*lines, they get put in the linetype legend together, which doesn't necessarily make logical sense:
ggplot(example.df, aes(x=dt, y=value, colour=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, linetype=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(size=1) +
scale_x_date() +
theme_minimal()
Alternatively, I could set one of the lines to use a colour aesthetic, but then that again puts the legend lines in an illogical legend grouping:
ggplot(example.df, aes(x=dt, y=value, colour=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, colour=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(size=1) +
scale_x_date() +
theme_minimal()
The only partial solution I've found is to use a fill aesthetic instead of colour in geom_point and to set shape=21 to use a fillable shape, but that forces a black border around the points. I can get rid of the border by manually setting color="white", but then the white border covers up points. If I set colour=NA, no points are plotted.
ggplot(example.df, aes(x=dt, y=value, fill=grp)) +
geom_hline(data=value.lines, aes(yintercept=value, colour=value.label)) +
geom_vline(data=date.lines, aes(xintercept=as.numeric(dt), linetype=dt.label)) +
geom_point(shape=21, size=2, colour="white") +
scale_x_date() +
theme_minimal()
This might be a case where ggplot's "you can't have two variables mapped to the same aesthetic" rule can/should be broken, but I can't figure out a clean way around it. Using fill with geom_point shows the most promise, but there's no way to remove the point borders.
Any ideas for plotting two different color or linetype aesthetics here?
I am trying to change the style settings of this kind of chart and hope you can help me.
R code:
theme_set(theme_bw())
cglac$pred2 <- as.factor(cglac$pred)
ggplot(cglac, aes(x = depth, colour = pred2)) +
  geom_bar(aes(y = ..density..), binwidth = 3, alpha = .5, position = "stack") +
  geom_density(alpha = .2) +
  xlab("Depth (m)") +
  ylab("Counts & Density") +
  coord_flip() +
  scale_x_reverse() +
  theme_bw()
which produces this graph:
Here are my questions:
What I want is to have the density line as black and white lines separated by symbols rather than colour (dashed line, dotted line etc).
The other thing is the histogram itself. How do I get rid of the grey background in the bars?
Can I change the bars also to black and white symbol lines (shaded etc)? So that they would match the density lines?
Last but not least, I want to add a second x axis, or in this case y axis because of coord_flip(). The one I see right now is for the density. The other one I need would be for the count data from the pred2 variable.
Thanks for helping.
Best,
Moritz
Have different line types: inside aes(), put linetype = pred2. To make the line color black, inside geom_density, add an argument color = "black".
The "background" of the bars is called "fill". Inside geom_bar, you can set fill = NA for no fill. A more common approach is to fill in the bars with the colors, inside aes() specify fill = pred2. You might consider faceting by your variable, + facet_wrap(~ pred2, nrow = 1) might look very nice.
Shaded bars in ggplot? No, you can't do that easily. See the answers to this question for other options and hacks.
Second y-axis: as with shaded bars, the ggplot creator thinks a second y-axis is a terrible design choice, so you can't do it at all easily. Here's a related question, including Hadley's point of view:
I believe plots with separate y scales (not y-scales that are transformations of each other) are fundamentally flawed.
It's definitely worth considering his point of view, and asking yourself if those design choices are really what you want.
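The quote leaves room for axes that are transformations of each other, and a count axis next to a density axis is such a case for a single group with equal-width bins: count = density * n * binwidth. The following sketch assumes a ggplot2 version that provides `sec_axis()` and `after_stat()` (the newer spelling of `..density..`); the data set, the bin width and the names `n` and `bw` are my own choices for illustration.

```r
library(ggplot2)

n  <- nrow(mtcars)  # number of observations (32)
bw <- 20            # bin width in hp units, an arbitrary choice

p <- ggplot(mtcars, aes(x = hp)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = bw,
                 fill = NA, colour = "black") +
  geom_density() +
  # The secondary axis is a pure rescaling of the primary one:
  # count = density * n * binwidth
  scale_y_continuous(
    name = "Density",
    sec.axis = sec_axis(~ . * n * bw, name = "Count")
  ) +
  theme_bw()
```

With several stacked groups of different sizes the conversion factor differs per group, so a single count axis like this would no longer be exact.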
Different linetypes for densities
Here's my built-in data version of what you're trying to do:
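# cyl is numeric in mtcars; linetype needs a discrete variable, hence factor()
ggplot(mtcars, aes(x = hp,
                   linetype = factor(cyl),
                   group = cyl,
                   color = factor(cyl))) +
  geom_histogram(aes(y = ..density.., fill = factor(cyl)),
                 alpha = .5, position = "stack") +
  geom_density(color = "black") +
  coord_flip() +
  theme_bw()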
And what I think you should do instead. This version uses facets instead of stacking/colors/linetypes. You seem to be aiming for black and white, which isn't a problem at all in this version.
ggplot(mtcars, aes(x = hp,
group = cyl)) +
geom_histogram(aes(y=..density..),
alpha=.5) +
geom_density() +
facet_wrap(~ cyl, nrow = 1) +
coord_flip() +
theme_bw()
I am using ggplot's geom_tile to do 2-D density plots faceted by a factor. Every facet's scale goes from the minimum of all the data to the maximum of all the data, but the geom_tile in each facet only extends to the range of the data plotted in that facet.
Example code that demonstrates the problem:
library(ggplot2)
data.unlimited <- data.frame(x=rnorm(500), y=rnorm(500))
data.limited <- subset(data.frame(x=rnorm(500), y=rnorm(500)), x<1 & y<1 & x>-1 & y>-1)
mydata <- rbind(data.frame(groupvar="unlimited", data.unlimited),
data.frame(groupvar="limited", data.limited))
ggplot(mydata) +
aes(x=x,y=y) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
facet_wrap(~ groupvar)
Run the code, and you will see two facets. One facet shows a density plot of an "unlimited" random normal distribution. The second facet shows a random normal truncated to lie within a 2x2 square about the origin. The geom_tile in the "limited" facet will be confined inside this small box instead of filling the facet.
last_plot() +
scale_x_continuous(limits=c(-5,5)) +
scale_y_continuous(limits=c(-5,5))
These last three lines plot the same data with specified x and y limits, and we see that neither facet extends the tile sections to the edge in this case.
Is there any way to force the geom_tile in each facet to extend to the full range of the facet?
I think you're looking for a combination of scales = "free" and expand = c(0,0):
ggplot(mydata) +
aes(x=x,y=y) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
facet_wrap(~ groupvar,scales = "free") +
scale_x_continuous(expand = c(0,0)) +
scale_y_continuous(expand = c(0,0))
EDIT
Given the OP's clarification, here's one option via simply setting the panel background manually:
ggplot(mydata) +
aes(x=x,y=y) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
facet_wrap(~ groupvar) +
scale_fill_gradient(low = "blue", high = "red") +
  theme(panel.background = element_rect(fill = "blue"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())