R ggplot2 legend with linetypes

I am relatively new to R and I have some difficulties with ggplot2. I have a data frame consisting of three variables (alpha, beta, gamma) and I want to plot them together. I get the plot but I have two problems:
legend is outside the plot and I want it to be inside
linetypes are changed to "solid", "dashed" and "dotted"!
Any ideas/suggestions would be more than welcome!
p <- ggplot() +
geom_line(data=my.data,aes(x = time, y = alpha,linetype = 'dashed')) +
geom_line(data=my.data,aes(x = time, y = beta, linetype = 'dotdash')) +
geom_line(data=my.data,aes(x = time, y = gamma,linetype = 'twodash')) +
scale_linetype_discrete(name = "", labels = c("alpha", "beta", "gamma"))+
theme_bw()+
xlab('time (years)')+
ylab('Mean optimal paths')
print(p)

What you are after is easier to achieve if you first rearrange your data to long format, with one observation per row.
You can do this with tidyr's gather function. Then you can simply map the linetype to the variable in your data.
In your original approach, you tried to assign a literal linetype inside aes(), but ggplot2 treats anything inside aes() as data: it reads this as 'the variable mapped to linetype has the value dashed/dotdash/twodash here'. When drawing the plot, it then looks up actual linetypes for those three values in the default scale_linetype_discrete, whose first three entries are a solid line and two dash patterns, which is why you're seeing the confusing replacement. You can specify the linetypes yourself with scale_linetype_manual.
The position of the legend is adjustable in theme().
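legend.position = c(0,1) places the legend at the top-left corner of the plotting area; the coordinates run from 0 to 1 in each direction. The code below uses c(0.1, 0.9) to inset the legend slightly.
legend.justification = c(0,1) sets the anchor point that legend.position refers to, here the top-left corner of the legend box.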
library(tidyr)
library(ggplot2)
# Create some example data
my.data <- data.frame(
time=1:100,
alpha = rnorm(100),
beta = rnorm(100),
gamma = rnorm(100)
)
my.data <- my.data %>%
gather(key="variable", value="value", alpha, beta, gamma)
p <- ggplot(data=my.data, aes(x=time, y=value, linetype=variable)) +
geom_line() +
scale_linetype_manual(
values=c("solid", "dotdash", "twodash"),
name = "",
labels = c("alpha", "beta", "gamma")) +
xlab('time (years)')+
ylab('Mean optimal paths') +
theme_bw() +
theme(legend.position=c(0.1, 0.9), legend.justification=c(0,1))
print(p)
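If reshaping is not convenient, the same idea can be sketched on the wide data: the string inside aes() becomes the legend label, and a named vector in scale_linetype_manual() decides how each label is drawn. This is only an illustration; the names wide, linetypes and p_wide are made up for the example, and the theme() arguments assume ggplot2 3.5.0 or later, where a numeric legend.position is deprecated in favour of legend.position = "inside" plus legend.position.inside.

```r
library(ggplot2)

# Example data in the original wide layout (one column per series)
set.seed(1)
wide <- data.frame(
  time  = 1:100,
  alpha = rnorm(100),
  beta  = rnorm(100),
  gamma = rnorm(100)
)

# Named vector: legend label -> linetype actually drawn
linetypes <- c(alpha = "dashed", beta = "dotdash", gamma = "twodash")

p_wide <- ggplot(wide, aes(x = time)) +
  # The strings inside aes() are data values (legend labels), not linetypes
  geom_line(aes(y = alpha, linetype = "alpha")) +
  geom_line(aes(y = beta,  linetype = "beta")) +
  geom_line(aes(y = gamma, linetype = "gamma")) +
  scale_linetype_manual(name = NULL, values = linetypes) +
  labs(x = "time (years)", y = "Mean optimal paths") +
  theme_bw() +
  # Assumes ggplot2 >= 3.5.0; older versions use legend.position = c(0.02, 0.98)
  theme(legend.position = "inside",
        legend.position.inside = c(0.02, 0.98),
        legend.justification = c(0, 1))
```

Call print(p_wide) to draw it. Because the vector is named, the mapping does not depend on the order in which the layers are added.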

Related

Is there a way to change the breaks of a ggplot legend without changing other properties of the aesthetic?

I wish to change the breaks of a ggplot legend without affecting the other properties of the aesthetic (e.g., palette, name, etc.). For example, a MWE where the aesthetic is colour:
## Original plot:
df <- data.frame(x = 1:10, y = 1:10, z = 1:10)
gg <- ggplot(df, aes(x, y, colour = z)) +
geom_point() +
scale_colour_distiller(palette = "Spectral", name = "Original title")
gg
## Plot with adjusted breaks:
gg + scale_colour_distiller(breaks = c(2.5, 7.5))
(Images: the original plot, and the plot with adjusted breaks.)
In the second plot, the colour palette and the legend name are reset to their default values: I want to change the legend breaks only.
I understand why the above approach does not work; the first colour scale is completely replaced by the second scale. However, I don't know how to tackle this problem. Any advice is greatly appreciated!
I wrote a function which solves my question. It takes a ggplot object, the name of an aesthetic (as a string), and the breaks for the corresponding legend.
change_legend_breaks <- function(gg, aesthetic, breaks) {
## Find the scales associated with the specified aesthetic
sc <- as.list(gg$scales)$scales
all_aesthetics <- sapply(sc, function(x) x[["aesthetics"]][1])
idx <- which(aesthetic == all_aesthetics)
## Overwrite the breaks of the specified aesthetic
gg$scales$scales[[idx]][["breaks"]] <- breaks
return(gg)
}
This is my first time dealing with ggplot objects at a low level, so perhaps there is a better, more robust approach: This works for me, though.
Interestingly, it seems to be a mutating function, that is, it alters the plot object itself, rather than a copy of the object. I didn't know this was possible in R.
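The in-place behaviour is most likely explained by how ggplot2 stores scales: gg$scales is a ggproto object, and ggproto objects are built on R environments, which have reference semantics, unlike lists or data frames. A minimal base-R sketch of the difference (the names bump, as_list and as_env are hypothetical and not part of ggplot2):

```r
# The same function body behaves differently for a list and an environment.
bump <- function(obj) {
  obj$breaks <- c(2.5, 7.5)
  invisible(obj)
}

# A list is copied on modification, so the caller's object is untouched.
as_list <- list(breaks = "default")
bump(as_list)

# An environment is modified in place; no copy is made.
as_env <- new.env()
as_env$breaks <- "default"
bump(as_env)

as_list$breaks  # still "default"
as_env$breaks   # now c(2.5, 7.5)
```

This is why calling change_legend_breaks() without assigning the result still changes gg.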
As a check that the function works as intended, here is a variant on the original MWE, this time with two aesthetics:
df <- data.frame(x = 1:10, y = 1:10, z1 = 1:10, z2 = 1:10)
gg <- ggplot(df, aes(x, y, colour = z1, size = z2)) +
geom_point() +
scale_size(name = "Original size title") +
scale_colour_distiller(palette = "Spectral", name = "Original colour title")
change_legend_breaks(gg, "colour", breaks = c(2.5, 7.5))
change_legend_breaks(gg, "size", breaks = c(1, 9))

Create ggplot2 legend for multiple datasets

I am trying to display background data in grey in a ggplot, with the legend generated automatically. My aim is either to include the grey data points in the legend, or to make a second legend with a manual title. However, I have failed at both. My data is in long format.
require(ggplot2)
xx<-data.frame(observation="all cats",x=1:2,y=1:2)
yy<-data.frame(observation=c("red cats","blue cats"),x=3:4,y=3:4)
g<-ggplot() +
geom_point(aes(x,y, colour=factor(observation)), colour="grey60", size=5, data=xx) +
geom_point(aes(x,y, colour=factor(observation)), size=5, data=yy) +
scale_color_discrete(name = "ltitle")
g
I tried to merge the data.frames with rbind.data.frame, which produces a nice legend, but then I am not able to colour the background data in grey and keep ggplot colours at the same time.
I also realized that this solves the problem:
g<-ggplot(aes(x,y, colour=factor(observation)), colour="grey60", data=xx) +
geom_point(size=5) +
geom_point(aes(x,y, colour=factor(observation)), size=5, data=yy) +
scale_color_discrete(name = "ltitle")
g
however I can't do this, because I'm using a function which creates a complicated empty plot before, in which I then add the geom_points.
Assuming your plot doesn't have other geoms that require a fill parameter, the following is a workaround that fixes the colour of your background data geom_point layer without affecting the other geom_point layers:
g <- ggplot() +
geom_point(aes(x, y,
fill = "label"), # key change 1
shape = 21, # key change 2
color = "grey50", size = 5,
data = xx) +
geom_point(aes(x, y, colour = factor(observation)), size = 5, data = yy) +
scale_color_discrete(name = "ltitle") +
scale_fill_manual(name = "", values = c("label" = "grey50")) # key change 3
g
shape = 21 gives you a shape that looks like the default round dot, but accepts a fill parameter in addition to the colour parameter. You can then set xx's geom_point layer's fill to grey in scale_fill_manual() (this creates a fill legend), while leaving color = "grey50" outside aes() (this does not add to the colour legend).
The colour scale for yy's geom_point layer is not affected by any of this.
p.s. Just realized I used "grey50" instead of "grey60"... But everything else still applies. :)
One solution is to create a color vector and pass it to scale_color_manual.
xx <- data.frame(observation = "all cats",x = 1:2,y = 1:2)
yy <- data.frame(observation = c("red cats", "blue cats"),x = 3:4,y = 3:4)
# rbind both datasets
# OP tried to use rbind.data.frame here
plotData <- rbind(xx, yy)
# Create color vector
library(RColorBrewer)
# Extract 3 colors from brewer Set1 palette
colorData <- brewer.pal(length(unique(plotData$observation)), "Set1")
# Replace the first color with the wanted grey
colorData[1] <- "grey60"
# Plot data
library(ggplot2)
ggplot(plotData, aes(x, y, colour = observation)) +
geom_point(size = 5)+
scale_color_manual(values = colorData, name = "ltitle")
I came up with pretty much the same solution as Z.Lin, but using the combined data frame from rbind.data.frame. Similarly, it uses scale_colour_manual with a named vector, colors, specifying the color mapping:
require(ggplot2)
xx<-data.frame(observation="all cats",x=1:2,y=1:2)
yy<-data.frame(observation=c("red cats","blue cats"),x=3:4,y=3:4)
zz <- rbind.data.frame(xx,yy)
colors <- c(
"all cats" = "grey60",
"red cats" = "red",
"blue cats" = "blue"
)
g<-ggplot() +
geom_point(aes(x,y, colour=factor(observation)), size=5, data=zz) +
scale_color_manual(values= colors, name = "ltitle")
g
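For the original constraint (layers added to an existing plot, grey background data, default ggplot colours for the rest), one possible variation is to build the named colour vector from the default hue palette instead of hard-coding "red" and "blue". This is a sketch only: it assumes the scales package is installed (ggplot2 depends on it), and the names fg_levels, pal and g2 are made up for the example.

```r
library(ggplot2)
library(scales)

xx <- data.frame(observation = "all cats", x = 1:2, y = 1:2)
yy <- data.frame(observation = c("red cats", "blue cats"), x = 3:4, y = 3:4)

# Foreground groups, sorted the way ggplot2 would order a discrete scale
fg_levels <- sort(unique(as.character(yy$observation)))

# Grey for the background group, default ggplot2 hues for the others
pal <- c(
  setNames("grey60", "all cats"),
  setNames(hue_pal()(length(fg_levels)), fg_levels)
)

# Both layers map colour, so both groups appear in a single legend
g2 <- ggplot() +
  geom_point(aes(x, y, colour = observation), size = 5, data = xx) +
  geom_point(aes(x, y, colour = observation), size = 5, data = yy) +
  scale_colour_manual(values = pal, breaks = names(pal), name = "ltitle")
```

Print g2 to draw it. Since the two data frames stay separate, this should also work when the layers are added to a plot object created elsewhere.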

Manipulating the legend of scale_fill_gradient2

I have data which comes from a statistical test (gene set enrichment analysis, but that's not important), so I obtain p-values for statistics that are normally distributed, i.e., both positive and negative values:
The test is run on several categories:
set.seed(1)
df <- data.frame(col = rep(1,7),
category = LETTERS[1:7],
stat.sign = sign(rnorm(7)),
p.value = runif(7, 0, 1),
stringsAsFactors = TRUE)
I want to present these data in a geom_tile ggplot such that I color code the df$category by their df$p.value multiplied by their df$stat.sign (i.e, the sign of the statistic)
For that I first take the log10 of df$p.value:
df$sig <- df$stat.sign*(-1*log10(df$p.value))
Then I order the df by df$sig for each sign of df$sig:
library(dplyr)
df <- rbind(dplyr::filter(df, sig < 0)[order(dplyr::filter(df, sig < 0)$sig), ],
dplyr::filter(df, sig > 0)[order(dplyr::filter(df, sig > 0)$sig), ])
And then I ggplot it:
library(ggplot2)
df$category <- factor(df$category, levels=df$category)
ggplot(data = df,
aes(x = col, y = category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue', mid='white', high='darkred') +
theme_minimal() +
xlab("") + ylab("") + labs(fill="-log10(P-Value)") +
theme(axis.text.y = element_text(size=12, face="bold"),
axis.text.x = element_blank())
which gives me:
Is there a way to manipulate the legend such that the values of df$sig are represented by their absolute value but everything else remains unchanged? That way I still get both red and blue shades and maintain the order I want.
If you check ggplot's documentation, scale_fill_gradient2, like other continuous scales, accepts one of the following for its labels argument:
NULL for no labels
waiver() for the default labels computed by the transformation object
a character vector giving labels (must be same length as breaks)
a function that takes the breaks as input and returns labels as output
Since you only want the legend values to be absolute, I assume you're satisfied with the default breaks in the legend colour bar (-0.1 to 0.4 in increments of 0.1), so all you really need is to add a function that manipulates the labels.
I.e. instead of this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred') +
Use this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred',
labels = abs) +
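To make the "function that takes the breaks as input" option concrete, here is a small self-contained sketch. The helper name abs_labels is hypothetical; passing abs directly, as above, has the same effect on the legend.

```r
library(ggplot2)

# Rebuild data shaped like the question's: signed -log10 p-values
set.seed(1)
df <- data.frame(
  col      = 1,
  category = LETTERS[1:7],
  sig      = sign(rnorm(7)) * -log10(runif(7))
)

# Label function: receives the numeric breaks, returns the text to display
abs_labels <- function(breaks) as.character(abs(breaks))

p_abs <- ggplot(df, aes(x = col, y = category, fill = sig)) +
  geom_tile() +
  scale_fill_gradient2(low = "darkblue", mid = "white", high = "darkred",
                       labels = abs_labels) +
  labs(fill = "-log10(P-Value)")
```

Only the legend text changes; the fill values, and therefore the blue and red shading, still use the signed sig column.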
I'm not sure I understood what you're looking for. Do you mean that you want to change the labels within the legend? If so, setting the breaks and labels arguments of scale_fill_gradient2() should do it.
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
# sort() returns the values themselves; order() would return row indices
breaks = sort(unique(df$sig)),
labels = round(abs(sort(unique(df$sig))), 2)) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
For what you're looking for maybe you could display texts inside the figure to show the values, try stacking stat_bin_2d() like this:
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
# sort() returns the values themselves; order() would return row indices
breaks = sort(unique(df$sig)),
labels = round(abs(sort(unique(df$sig))), 2)) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
stat_bin_2d(geom = 'text', aes(label = sig), colour = 'black', size = 16) +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
You might want to give the size and colour arguments some tries.

Can I fix overlapping dashed lines in a histogram in ggplot2?

I am trying to plot a histogram of two overlapping distributions in ggplot2. Unfortunately, the graphic needs to be in black and white. I tried representing the two categories with different shades of grey, with transparency, but the result is not as clear as I would like. I tried adding outlines to the bars with different linetypes, but this produced some strange results.
require(ggplot2)
set.seed(65)
a = rnorm(100, mean = 1, sd = 1)
b = rnorm(100, mean = 3, sd = 1)
dat <- data.frame(category = rep(c('A', 'B'), each = 100),
values = c(a, b))
ggplot(data = dat, aes(x = values, linetype = category, fill = category)) +
geom_histogram(colour = 'black', position = 'identity', alpha = 0.4, binwidth = 1) +
scale_fill_grey()
Notice that one of the lines that should appear dotted is in fact solid (at a value of x = 4). I think this must be a result of it actually being two lines - one from the 3-4 bar and one from the 4-5 bar. The dots are out of phase so they produce a solid line. The effect is rather ugly and inconsistent.
Is there any way of fixing this overlap?
Can anyone suggest a more effective way of clarifying the difference between the two categories, without resorting to colour?
Many thanks.
One possibility would be to use a 'hollow histogram', as described here:
# assign your original plot object to a variable
p1 <- ggplot(data = dat, aes(x = values, linetype = category, fill = category)) +
geom_histogram(colour = 'black', position = 'identity', alpha = 0.4, binwidth = 0.4) +
scale_fill_grey()
# p1
# extract relevant variables from the plot object to a new data frame
# your grouping variable 'category' is named 'group' in the plot object
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y", "group")]
# plot using geom_step
ggplot(data = df, aes(x = xmin, y = y, linetype = factor(group))) +
geom_step()
If you want to vary both linetype and fill, you need to plot a histogram first (which can be filled). Set the outline colour of the histogram to transparent. Then add the geom_step. Use theme_bw to avoid 'grey elements on grey background'
p1 <- ggplot() +
geom_histogram(data = dat, aes(x = values, fill = category),
colour = "transparent", position = 'identity', alpha = 0.4, binwidth = 0.4) +
scale_fill_grey()
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y", "group")]
df$category <- factor(df$group, labels = c("A", "B"))
p1 +
geom_step(data = df, aes(x = xmin, y = y, linetype = category)) +
theme_bw()
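If an outline-only display is acceptable, a shorter route that avoids rebuilding the plot data might be geom_freqpoly(), which draws the bin counts as lines rather than bars. This is a related sketch, not the hollow histogram above: it connects bin midpoints instead of drawing steps, and p_freq is a made-up name.

```r
library(ggplot2)

set.seed(65)
a <- rnorm(100, mean = 1, sd = 1)
b <- rnorm(100, mean = 3, sd = 1)
dat <- data.frame(category = rep(c("A", "B"), each = 100),
                  values = c(a, b))

# One line per category, distinguished by linetype only (no colour needed)
p_freq <- ggplot(dat, aes(x = values, linetype = category)) +
  geom_freqpoly(binwidth = 0.4) +
  theme_bw()
```

Because each category is a single continuous line, there are no adjacent bar edges whose dashes can overlap.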
First, I would recommend theme_set(theme_bw()) or theme_set(theme_classic()) (this sets the background to white, which makes it (much) easier to see shades of gray).
Second, you could try something like scale_linetype_manual(values=c(1,3)) -- this won't completely eliminate the artifacts you're unhappy about, but it might make them a little less prominent since linetype 3 is sparser than linetype 2.
Short of drawing density plots instead (which won't work very well for small samples and may not be familiar to your audience), dodging the positions of the histograms (which is ugly), or otherwise departing from histogram conventions, I can't think of a better solution.

How to remove the background and labels when we have two geoms in ggplot2

I am developing a graph in R with ggplot2 that has two geoms (one geom_line and one geom_text). It draws a line graph and then places text labels on start and end points of each line segment.
(myplot <- ggplot(data=datatable, aes(x, y, group = group,colour = group, label=mylabels)) + geom_line(size = 1.5))
myplot + geom_text(color = "black")
Now my question is how I can do the following tasks in ggplot2. They all work when I have only one geom, but not with both (it seems that they override each other).
1 - making the background white.
The following code works with geom_line but as soon as I add geom_text it becomes gray again. Even if I add this line after geom_text it gets rid of the point labels that are on the chart.
myplot + opts(panel.background = theme_rect(fill = "white", colour = NA))
2- x labels and x label format. Again the following code works with only one geom but breaks when I have the second geom
myplot + scale_x_date(format="%m", 'my x label')
3- While we are on it how can I put the legend at the bottom and spread it horizontally (p + opts(legend.position="bottom")) spreads that vertically that looks very stupid.
For 1), you haven't saved the object myplot after the second and third calls involving it. This works for me:
set.seed(3)
dat <- data.frame(x = runif(10), y = rnorm(10), group = gl(2,5),
mylabel = paste(1:10, "foo"))
require(ggplot2)
myplot <- ggplot(data=dat, aes(x, y, group = group, colour = group,
label = mylabel)) + geom_line(size = 1.5)
myplot + geom_text(color = "black") +
opts(panel.background = theme_rect(fill = "white", colour = NA))
Note that I only ever save myplot once. The second call involving myplot modifies it on the fly but doesn't save it.
For the rest, you'll need to provide a reproducible example.
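Note that opts() and theme_rect() come from early ggplot2 releases and are no longer available in current versions. A rough modern equivalent of all three requests might look like the sketch below. It is an illustration under assumptions: the dates are invented so that scale_x_date() has a Date column to work with, and linewidth assumes ggplot2 3.4.0 or later (older versions use size for lines).

```r
library(ggplot2)

set.seed(3)
dat <- data.frame(
  x = as.Date("2020-01-01") + seq(0, 270, by = 30),  # made-up dates
  y = rnorm(10),
  group = gl(2, 5),
  mylabel = paste(1:10, "foo")
)

myplot <- ggplot(dat, aes(x, y, group = group, colour = group,
                          label = mylabel)) +
  geom_line(linewidth = 1.5) +
  geom_text(colour = "black") +
  # 2) axis title and month-number labels
  scale_x_date(name = "my x label", date_labels = "%m") +
  theme(
    # 1) white panel background
    panel.background = element_rect(fill = "white", colour = NA),
    # 3) legend below the plot, keys laid out horizontally
    legend.position = "bottom",
    legend.direction = "horizontal"
  )
```

Everything is added in one chain and saved once as myplot, so no modification is lost between calls.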
