Adding a legend to a double plot using ggplot - r

I'm trying to add a legend to my plot using ggplot in R. My case is unusual because I have three variables, and rather than drawing a 3D plot I want a 2D plot showing v1 vs. v2 and v1 vs. v3.
The plot itself renders correctly, but I don't get a legend.
This is my code:
colfuncWarmest <- colorRampPalette(c("orange","red"))
colfuncColdest <- colorRampPalette(c("green","blue"))
plot <- ggplot(data=temperatures_Celsius, aes(x=temperatures_Celsius$Year))
params <- labs(title=paste("Year vs. (Warmest minimum temperature\n",
"and Coldest minimum temperature)"),
x="Year",
y="Coldest min temp / Warmest min temp")
theme <- theme(plot.title = element_text(hjust = 0.5)) #Centering title
wmtl<-geom_line(data=temperatures_Celsius,
aes(y=temperatures_Celsius$Warmest.Minimum.Temperature..C.,
color="red"
),
colour=colfuncWarmest(length(temperatures_Celsius$Year))
)
wmtt<-stat_smooth(data=temperatures_Celsius,
aes(y=temperatures_Celsius$Warmest.Minimum.Temperature..C.),
color="green",
method = "loess")
cmtl<- geom_line(data=temperatures_Celsius,
aes(y=temperatures_Celsius$Coldest.Minimum.Temperature..C.,
color="blue"
),
colour=colfuncColdest(length(temperatures_Celsius$Year))
)
cmtt<-stat_smooth(data=temperatures_Celsius,
aes(y=temperatures_Celsius$Coldest.Minimum.Temperature..C.),
color="orange",
method = "loess")
plot + theme + params + wmtl + wmtt + cmtl + cmtt
(Not all code was added because I did a lot of changes. It is only to get an idea) I get this:
If I add
+ scale_color_manual(values=c("red","blue"))
(for example) in order to add the legend, I get no error, but nothing different happens. I get the same plot.
What I want is only two lines. A red one that says "Warmest minimum" and another blue line that says "Coldest minimum". What could I do to get my legend in this way?
Thanks in advance.

Generally, the correct way to add a legend to a ggplot is to map a variable to an aesthetic (such as fill, color, size, or alpha). Usually this means transforming the data to long format (key ~ value pairs) and mapping the key variable to color or another aesthetic.
In the current case this is not desirable since there is next to no chance the color gradient (colorRampPalette) on the line could be achieved. So I suggest a hacky way where a dummy layer (layer which will not be seen on the plot) is used to create the legend.
Here is some data
# 1900:2000 is 101 years, so each column needs 101 values
temperatures_Celsius = data.frame(year = 1900:2000,
Warmest = rnorm(101, mean = 20, sd = 5),
Coldest = rnorm(101, mean = 10, sd = 5))
Your plot:
colfuncWarmest <- colorRampPalette(c("orange","red"))
colfuncColdest <- colorRampPalette(c("green","blue"))
plot <- ggplot(data=temperatures_Celsius, aes(x=year))
params <- labs(title=paste("Year vs. (Warmest minimum temperature\n",
"and Coldest minimum temperature)"),
x="Year",
y="Coldest min temp / Warmest min temp")
theme <- theme(plot.title = element_text(hjust = 0.5)) #Centering title
wmtl<-geom_line(data=temperatures_Celsius,
aes(y=Warmest),
colour=colfuncWarmest(length(temperatures_Celsius$year)))
wmtt<-stat_smooth(data=temperatures_Celsius,
aes(y=Warmest),
color="green",
method = "loess")
cmtl<- geom_line(data=temperatures_Celsius,
aes(y=Coldest),
colour=colfuncColdest(length(temperatures_Celsius$year)))
cmtt<-stat_smooth(data=temperatures_Celsius,
aes(y=Coldest),
color="orange",
method = "loess")
plot1 <- plot + theme + params + wmtl + wmtt + cmtl + cmtt
Now add a dummy layer:
plot1+
geom_line(data = data.frame(year = c(1900, 1900),
group = factor(c("Coldest", "Warmest"), levels = c("Warmest", "Coldest")),
value = c(10, 20)), aes(x=year, y = value, color = group), size = 2)+
scale_color_manual(values=c("red","blue"))
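For comparison, the long-format approach mentioned at the start of this answer can be sketched as follows. This is a minimal sketch that gives up the colorRampPalette gradient: each series gets a single colour, and the legend comes for free from the colour mapping. The names temps, long and series are made up for the illustration, and the data are random.

```r
library(ggplot2)

set.seed(1)
# Illustrative data with the same shape as the example above
temps <- data.frame(year = 1900:2000,
                    Warmest = rnorm(101, mean = 20, sd = 5),
                    Coldest = rnorm(101, mean = 10, sd = 5))

# Reshape to long format with base R: one row per (year, series) pair
long <- data.frame(
  year   = rep(temps$year, times = 2),
  series = factor(rep(c("Warmest minimum", "Coldest minimum"), each = nrow(temps)),
                  levels = c("Warmest minimum", "Coldest minimum")),
  value  = c(temps$Warmest, temps$Coldest)
)

# Mapping the key column to colour is what creates the legend
p_long <- ggplot(long, aes(x = year, y = value, colour = series)) +
  geom_line() +
  scale_colour_manual(values = c("Warmest minimum" = "red",
                                 "Coldest minimum" = "blue")) +
  labs(x = "Year", y = "Minimum temperature (C)", colour = NULL)
p_long
```

The dummy-layer trick above is only needed because the gradient colours are set outside aes(), where ggplot2 has nothing to build a legend from.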

Related

How to log transform only one axis in dual axis plot in r?

I am currently plotting (long format) data which consists of fluorescence (RFU) on the first y-axis and growth (OD600) on the second y-axis. I have managed to create the plots, but I find it very difficult to log-transform the second y-axis (for OD600) without messing up the entire plot. (The data is all derived from the same data frame.)
My question is this: Is there any way to log10-transform only the second y-axis (from 0.01 to 1), with perhaps 5 breaks, something like ("0.01","0.1","0.5","0.1")?
My code looks like this (I apologize for the ugly code):
for (i in 1:length(unique(lf_combined$media)[grepl("^.+(gfp)$",unique(lf_combined$media))])){
print(i)
coeff <- 1/max(lf_combined_test$normalized_gfp)
p1<-lf_combined_test[lf_combined_test$media %in% unique(lf_combined$media)[grepl("^.+(gfp)$",unique(lf_combined$media))][i], ] %>%
# filter(normalized_gfp>0) %>%
filter(row_number() %% 3 == 1) %>%
ggplot( aes(x=time)) +
geom_bar( aes(y=normalized_gfp), stat="identity", size=.1, fill="green", color="green", alpha=.4)+
geom_line( aes(y=od / coeff), size=2, color="tomato") +
scale_x_continuous(breaks = round(seq(0,92, by = 5),1))+
geom_vline(xintercept = 12, linetype="dotted",
color = "blue", size=1)+
scale_y_continuous(limits = c(0,80000),
name = "Relative Flourescence [RFU]/[OD] ",
sec.axis = sec_axis(~.*coeff, name="[OD600]")
) +
scale_y_log10(limits=c(0.01,1))+
theme_grey() +
theme(
axis.title.y = element_text(color = "green", size=13),
axis.title.y.right = element_text(color = "tomato", size=13)
) +
ggtitle(paste("Relative fluorescence & OD600 time series for",unique(lf_combined$media)[grepl("^.+(gfp)$",unique(lf_combined$media))][i],sep=" "))
print(p1)
}
Which gives a plot that looks like this for now:
Thank you very much in advance! :))
Yes, this is certainly possible. Without your data set it is difficult to give you specific code, but here is an example using the built-in mtcars data set. We plot a best-fitting line for mpg against an x axis of wt.
p <- ggplot(mtcars, aes(wt, mpg)) + geom_smooth(aes(color = 'mpg'))
p
Suppose we want to draw the value of disp on a log scale, which we will show on a secondary y axis. We need to carry out the log transform of our data to do this, but also multiply it by 10 to get it on a similar visual scale to the mpg line:
p <- p + geom_smooth(aes(y = 10 * log10(disp), color = 'disp'))
p
To draw the secondary axis in, we need to supply it with the reverse transformation of 10 * log10(x), which is 10^(x/10), and we will supply appropriately logarithmic breaks at 10, 100 and 1000
p + scale_y_continuous(
sec.axis = sec_axis(~ 10^(.x/10), breaks = c(10, 100, 1000), name = 'disp'))
It seems that you are generating the values of your line by using od / coeff, and reversing that transform with .*coeff, which seems appropriate, but to get a log10 axis, you will need to do something like log10(od) * constant and reverse it with 10^(od/constant). Without your data, it's impossible to know what this constant should be, but you can play around with different values until it looks right visually.
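Applied to an OD-like variable, the idea might look like the sketch below. The data are simulated, and the names d, to_primary and to_od are hypothetical. One assumption goes beyond the plain log10(od) * constant form: because log10(od) is negative for od < 1, the sketch also adds an offset so that 0.01 maps to the bottom of the primary axis and 1 maps to the top. Rather than tuning the constant by eye, it derives it from the chosen OD range.

```r
library(ggplot2)

# Simulated stand-in data: gfp on a 0-80000 scale, od between 0.01 and 1
d <- data.frame(time = 0:90)
d$gfp <- 80000 * plogis((d$time - 45) / 8)
d$od  <- 0.01 * 100^plogis((d$time - 30) / 8)

od_min <- 0.01   # assumed lower end of the secondary axis
od_max <- 1      # assumed upper end of the secondary axis
top    <- 80000  # upper limit of the primary axis

# Forward transform: log10(od) rescaled linearly onto [0, top]
to_primary <- function(od) {
  (log10(od) - log10(od_min)) / (log10(od_max) - log10(od_min)) * top
}
# Inverse transform, used only to label the secondary axis
to_od <- function(y) {
  10^(y / top * (log10(od_max) - log10(od_min)) + log10(od_min))
}

p_od <- ggplot(d, aes(x = time)) +
  geom_col(aes(y = gfp), fill = "green", alpha = 0.4) +
  geom_line(aes(y = to_primary(od)), colour = "tomato") +
  scale_y_continuous(
    limits = c(0, top),
    name = "Relative fluorescence [RFU]",
    sec.axis = sec_axis(~ to_od(.), breaks = c(0.01, 0.1, 1), name = "OD600")
  )
p_od
```

Note that only one y scale is added: a second scale_y_log10() call would replace the scale_y_continuous() that carries the secondary axis.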

Scale geom_point() size to increase size based on distance from zero

I'd like to plot some measures that have been standardized to z-scores. I want the size of the point in geom_point() to increase from 0 to 3, and also to increase from 0 to -3. I also want the colour to change from red, to blue. The trick is to get both to work together.
Here is an example that's as close as I can get to what I'd like, note that the size of the point increases from -2, whereas I want the size of the point to increase as the z_score moves away from zero.
library(tidyverse)
year <- rep(c(2015:2018), each = 3)
parameters <- rep(c("length", "weight", "condition"), 4)
z_score <- runif(12, min = -2, max = 2)
df <- tibble(year, parameters, z_score)
cols <- c("#d73027",
"darkgrey",
"#4575b4")
ggplot(df, aes(year, parameters, colour = z_score, size = z_score)) +
geom_point() +
scale_colour_gradientn(colours = cols) +
theme(legend.position="bottom") +
scale_size(range = c(1,15)) +
guides(color= guide_legend(), size=guide_legend())
bubble plot output
One trick I tried was to use the absolute value of z_score which scaled the points correctly but messed up the legend.
Here's what I'd like the legend and points size to be scaled to, though I'd like the colour to be a gradient as in my example. Any insight would be greatly appreciated!
Link to plot legend
You were very close. In order to adjust the size of the points in the legend, use the override.aes option in the guides function.
library(ggplot2)
year <- rep(c(2015:2018), each = 3)
parameters <- rep(c("length", "weight", "condition"), 4)
z_score <- runif(12, min = -2, max = 2)
df <- data.frame(year, parameters, z_score) # tibble() is not available with only ggplot2 loaded
cols <- c("#d73027", "darkgrey", "#4575b4")
ggplot(df, aes(year, parameters, colour = z_score)) +
geom_point( size=abs(5*df$z_score)) + # times 5 to increase size
scale_colour_gradientn(colours = cols) +
theme(legend.position="bottom") +
scale_size(range = c(1,15)) +
guides(color=guide_legend(override.aes = list(size = c( 5, 1, 5))) )
To suppress the separate legend for the size attribute, I moved size outside of aes(). This works for this example, but you will have to adjust size=c(...) so that it has one entry per division shown in the legend.
This should get you most of the way there.
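One way to avoid hand-tuning the override sizes is to fix the legend breaks and compute both the point sizes and the legend key sizes from the same function. This is a sketch under assumptions, not part of the original answer: brks and size_for are names invented here, the breaks and limits are chosen by hand, and the minimum size of 1 is arbitrary (it only keeps the point at zero visible).

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  year = rep(2015:2018, each = 3),
  parameters = rep(c("length", "weight", "condition"), 4),
  z_score = runif(12, min = -2, max = 2)
)
cols <- c("#d73027", "darkgrey", "#4575b4")

# Assumed legend breaks; override.aes needs exactly one size per break
brks <- c(-2, -1, 0, 1, 2)

# Same rule for the points and for the legend keys
size_for <- function(z) pmax(5 * abs(z), 1)

p_z <- ggplot(df, aes(year, parameters, colour = z_score)) +
  geom_point(size = size_for(df$z_score)) +
  scale_colour_gradientn(colours = cols, limits = c(-2, 2), breaks = brks) +
  guides(colour = guide_legend(override.aes = list(size = size_for(brks)))) +
  theme(legend.position = "bottom")
p_z
```

Because limits and breaks are set explicitly, the number of legend keys always matches the length of the size vector.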

Manipulating the legend of scale_fill_gradient2

I have data which comes from a statistical test (gene set enrichment analysis, but that's not important), so I obtain p-values for statistics that are normally distributed, i.e., both positive and negative values:
The test is run on several categories:
set.seed(1)
df <- data.frame(col = rep(1,7),
category = LETTERS[1:7],
stat.sign = sign(rnorm(7)),
p.value = runif(7, 0, 1),
stringsAsFactors = TRUE)
I want to present these data in a geom_tile ggplot such that I color code the df$category by their df$p.value multiplied by their df$stat.sign (i.e, the sign of the statistic)
For that I first take the log10 of df$p.value:
df$sig <- df$stat.sign*(-1*log10(df$p.value))
Then I order the df by df$sig for each sign of df$sig:
library(dplyr)
df <- rbind(dplyr::filter(df, sig < 0)[order(dplyr::filter(df, sig < 0)$sig), ],
dplyr::filter(df, sig > 0)[order(dplyr::filter(df, sig > 0)$sig), ])
And then I ggplot it:
library(ggplot2)
df$category <- factor(df$category, levels=df$category)
ggplot(data = df,
aes(x = col, y = category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue', mid='white', high='darkred') +
theme_minimal() +
xlab("") + ylab("") + labs(fill="-log10(P-Value)") +
theme(axis.text.y = element_text(size=12, face="bold"),
axis.text.x = element_blank())
which gives me:
Is there a way to manipulate the legend such that the values of df$sig are represented by their absolute value but everything else remains unchanged? That way I still get both red and blue shades and maintain the order I want.
If you check ggplot's documentation, scale_fill_gradient2, like other continuous scales, accepts one of the following for its labels argument:
NULL for no labels
waiver() for the default labels computed by the transformation object
a character vector giving labels (must be same length as breaks)
a function that takes the breaks as input and returns labels as output
Since you only want the legend values to be absolute, I assume you're satisfied with the default breaks in the legend colour bar (-0.1 to 0.4 with increments in 0.1), so all you really need is to add a function that manipulates the labels.
I.e. instead of this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred') +
Use this:
scale_fill_gradient2(low = 'darkblue', mid = 'white', high = 'darkred',
labels = abs) +
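Any function that takes the break values and returns one label per break can be used in the same place. As a small sketch, the hypothetical helper abs_label below behaves like abs but also formats the labels with one decimal; the data frame demo is made up for the illustration.

```r
library(ggplot2)

# Hypothetical label function: receives the numeric breaks, returns character labels
abs_label <- function(breaks) format(abs(breaks), nsmall = 1)

# Made-up data with both negative and positive values
demo <- data.frame(col = 1,
                   category = LETTERS[1:7],
                   sig = seq(-1.5, 1.5, by = 0.5))

p_abs <- ggplot(demo, aes(x = col, y = category)) +
  geom_tile(aes(fill = sig)) +
  scale_fill_gradient2(low = "darkblue", mid = "white", high = "darkred",
                       labels = abs_label) +
  labs(fill = "-log10(P-Value)")
p_abs
```

Only the text of the legend labels changes; the fill mapping and the break positions stay as they were.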
I'm not sure I understood what you're looking for. Do you mean that you want to change the labels within the legend? If so, setting the breaks and labels arguments of scale_fill_gradient2() should do it.
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
breaks = sort(unique(df$sig)), # sort() returns the values; order() would return row indices
labels = round(abs(sort(unique(df$sig))), 2)) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
For what you're looking for, you could also display the values as text inside the figure, for example by stacking stat_bin_2d() like this:
ggplot(data=df,aes(x=col,y=category)) +
geom_tile(aes(fill=sig)) +
scale_fill_gradient2(low='darkblue',mid='white',high='darkred',
breaks = sort(unique(df$sig)), # sort() returns the values; order() would return row indices
labels = round(abs(sort(unique(df$sig))), 2)) +
theme_minimal()+xlab("")+ylab("")+labs(fill="-log10(P-Value)") +
stat_bin_2d(geom = 'text', aes(label = sig), colour = 'black', size = 16) +
theme(axis.text.y=element_text(size=12,face="bold"),axis.text.x=element_blank())
You may need to experiment with the size and colour arguments.

Creating a density histogram in ggplot2?

I want to create the following density histogram with ggplot2. With base graphics it is really easy:
set.seed(46)
vector <- rnorm(500)
breaks <- quantile(vector,seq(0,1,by=0.1))
labels = 1:(length(breaks)-1)
den = density(vector)
hist(vector,
breaks=breaks,
col=rainbow(length(breaks)),
probability=TRUE)
lines(den)
With ggplot I have reached this so far:
seg <- cut(vector,breaks,
labels=labels,
include.lowest = TRUE, right = TRUE)
df = data.frame(vector=vector,seg=seg)
ggplot(df) +
geom_histogram(breaks=breaks,
aes(x=vector,
y=..density..,
fill=seg)) +
geom_density(aes(x=vector,
y=..density..))
But the "y" scale has the wrong dimension. I have noted that the next run gets the "y" scale right.
ggplot(df) +
geom_histogram(breaks=breaks,
aes(x=vector,
y=..density..)) +
geom_density(aes(x=vector,
y=..density..))
I just do not understand it. y=..density.. is there, and that should be the height. So why on earth does my scale change when I try to fill the bars?
I do need the colours. I just want a histogram where the breaks and the colours of each block are directionally set according to the default ggplot fill colours.
Manually, I added colors to your percentile bars. See if this works for you.
library(ggplot2)
ggplot(df, aes(x=vector)) +
geom_histogram(breaks=breaks,aes(y=..density..),colour="black",fill=c("red","orange","yellow","lightgreen","green","darkgreen","blue","darkblue","purple","pink")) +
geom_density(aes(y=..density..)) +
scale_x_continuous(breaks=c(-3,-2,-1,0,1,2,3)) +
ylab("Density") + xlab("df$vector") + ggtitle("Histogram of df$vector") +
theme_bw() + theme(plot.title=element_text(size=20),
axis.title.y=element_text(size = 16, vjust=+0.2),
axis.title.x=element_text(size = 16, vjust=-0.2),
axis.text.y=element_text(size = 14),
axis.text.x=element_text(size = 14),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
fill=seg results in grouping. You are actually getting a different histogram for each value of seg. If you don't need the colours, you could use this:
ggplot(df) +
geom_histogram(breaks=breaks,aes(x=vector,y=..density..), position="identity") +
geom_density(aes(x=vector,y=..density..))
If you need the colours, it might be easiest to calculate the density values outside of ggplot2.
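Calculating the densities outside of ggplot2 could be sketched as follows. Base hist() is called with plot = FALSE to get one density per bin, and the bins are then drawn as rectangles with one fill level each. The names h and bins are made up for this sketch, and geom_rect() is one possible choice for drawing precomputed bars of unequal width.

```r
library(ggplot2)

set.seed(46)
vector <- rnorm(500)
breaks <- unname(quantile(vector, seq(0, 1, by = 0.1)))

# Let base R compute the per-bin densities without plotting
h <- hist(vector, breaks = breaks, plot = FALSE)

# One row per bin: left edge, right edge, density, and a fill key
bins <- data.frame(
  xmin = head(h$breaks, -1),
  xmax = tail(h$breaks, -1),
  density = h$density,
  seg = factor(seq_along(h$density))
)

p_hist <- ggplot() +
  geom_rect(data = bins,
            aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = density, fill = seg),
            colour = "black") +
  geom_density(data = data.frame(vector = vector), aes(x = vector)) +
  labs(x = "vector", y = "Density")
p_hist
```

Since the densities are computed once over the whole sample, the fill no longer splits the data into groups, and the bar areas add up to 1.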
Or an option with ggpubr
library(ggpubr)
gghistogram(df, x = "vector", add = "mean", rug = TRUE, fill = "seg",
palette = c("#00AFBB", "#E7B800", "#E5A800", "#00BFAB", "#01ADFA",
"#00FABA", "#00BEAF", "#01AEBF", "#00EABA", "#00EABB"), add_density = TRUE)
The confusion about the y-axis may arise because density is plotted rather than count. The values on the y-axis are densities (proportion of the sample per unit of x), so it is the areas of the bars, not their heights, that sum to 1.

Showing separate legend for a geom_text layer?

I have the following plot:
library(ggplot2)
ib<- data.frame(
category = factor(c("Cat1","Cat2","Cat1", "Cat1", "Cat2","Cat1","Cat1", "Cat2","Cat2")),
city = c("CITY1","CITY1","CITY2","CITY3", "CITY3","CITY4","CITY5", "CITY6","CITY7"),
median = c(1.3560, 2.4830, 0.7230, 0.8100, 3.1480, 1.9640, 0.6185, 1.2205, 2.4000),
samplesize = c(851, 1794, 47, 189, 185, 9, 94, 16, 65)
)
p<-ggplot(data=ib, aes(x=city, y=category, size=median, colour=category, label=samplesize)) +
geom_point(alpha=.6) +
scale_area(range=c(1,15)) +
scale_colour_hue(guide="none") +
geom_text(aes(size = 1), colour="black")
p
(I'm plotting the circles proportional to a median value and overlaying with a text label representing the sample size. image at http://imgur.com/T82cF)
Is there any way to SEPARATE the two legends? I would like one legend (labeled "median") to give the scale of circles, and the other legend with a single letter "a" (or even better a number) which I could label "sample size". Since the two properties are not related in any way, it doesn't make sense to bundle them in the same legend.
I've tried all sorts of combinations but the best I can come up with is losing the text legend altogether :)
thanks for the answer!
Update: scale_area has been deprecated, so scale_size is used instead. The gtable function gtable_filter() is used to extract the legends, and modified code replaces the default legend key in one of the legends.
If you are still looking for an answer to your question, here's one that seems to do most of what you want, although it's a bit of a hack in places. The symbol in the legend can be changed using kohske's comment here
The difficulty was trying to apply the two different size mappings. So, I've left the dot size mapping inside the aesthetic statement but removed the label size mapping from the aesthetic statement. This means that label size has to be set according to discrete values of a factor version of samplesize (fsamplesize). The resulting chart is nearly right, except the legend for label size (i.e., samplesize) is not drawn. To get round that problem, I drew a chart that contained a label size mapping according to the factor version of samplesize (but ignoring the dot size mapping) in order to extract its legend which can then be inserted back into the first chart.
## Your data
ib<- data.frame(
category = factor(c("Cat1","Cat2","Cat1", "Cat1", "Cat2","Cat1","Cat1", "Cat2","Cat2")),
city = c("CITY1","CITY1","CITY2","CITY3", "CITY3","CITY4","CITY5", "CITY6","CITY7"),
median = c(1.3560, 2.4830, 0.7230, 0.8100, 3.1480, 1.9640, 0.6185, 1.2205, 2.4000),
samplesize = c(851, 1794, 47, 189, 185, 9, 94, 16, 65)
)
## Load packages
library(ggplot2)
library(gridExtra)
library(gtable)
library(grid)
## Obtain the factor version of samplesize.
ib$fsamplesize = cut(ib$samplesize, breaks = c(0, 100, 1000, Inf))
## Obtain plot with dot size mapped to median, the label inside the dot set
## to samplesize, and the size of the label set to the discrete levels of the factor
## version of samplesize. Here, I've selected three sizes for the labels (3, 6 and 10)
## corresponding to samplesizes of 0-100, 100-1000, >1000. The sizes of the labels are
## set using three calls to geom_text - one for each size.
p <- ggplot(data=ib, aes(x=city, y=category)) +
geom_point(aes(size = median, colour = category), alpha = .6) +
scale_size("Median", range=c(0, 15)) +
scale_colour_hue(guide = "none") + theme_bw()
p1 <- p +
geom_text(aes(label = ifelse(samplesize > 1000, samplesize, "")),
size = 10, color = "black", alpha = 0.6) +
geom_text(aes(label = ifelse(samplesize < 100, samplesize, "")),
size = 3, color = "black", alpha = 0.6) +
geom_text(aes(label = ifelse(samplesize > 100 & samplesize < 1000, samplesize, "")),
size = 6, color = "black", alpha = 0.6)
## Extract the legend from p1 using gtable_filter() from the gtable package
g1 = ggplotGrob(p1)
leg1 = gtable_filter(g1, "guide-box")
## Keep p1 but dump its legend
p1 = p1 + theme(legend.position = "none")
## Get second legend - size of the label.
## Draw a dummy plot, using fsamplesize as a size aesthetic. Note that the label sizes are
## set to 3, 6, and 10, matching the sizes of the labels in p1.
dummy.plot = ggplot(data = ib, aes(x = city, y = category, label = samplesize)) +
geom_point(aes(size = fsamplesize), colour = NA) +
geom_text(show.legend = FALSE) + theme_bw() +
guides(size = guide_legend(override.aes = list(colour = "black", shape = utf8ToInt("N")))) +
scale_size_manual("Sample Size", values = c(3, 6, 10),
breaks = levels(ib$fsamplesize), labels = c("< 100", "100 - 1000", "> 1000"))
## Get the legend from dummy.plot using gtable_filter() from the gtable package
g2 = ggplotGrob(dummy.plot)
leg2 = gtable_filter(g2, "guide-box")
## Arrange the three components (p1, leg1, leg2) using functions from the gridExtra package
## The two legends are arranged using the inner arrangeGrob function. The resulting
## chart is then arranged with p1 in the outer arrangeGrob function.
ib.plot = arrangeGrob(p1, arrangeGrob(leg1, leg2, nrow = 2), ncol = 2,
widths = unit(c(9, 2), c("null", "null")))
## Draw the graph
grid.newpage()
grid.draw(ib.plot)
This actually doesn't directly address your question, but it is how I might go about creating a graph with the general characteristics you describe:
ib$ss <- paste("n = ",ib$samplesize,sep = "")
ggplot(data=ib, aes(x=city, y=category, size=median, colour=category, label=ss)) +
geom_point(alpha=.6) +
geom_text(size = 2, vjust = -1.2,colour="black") +
scale_colour_hue(legend = FALSE)
I removed the scale_area piece, as I'm not sure what purpose it served and it was causing errors for me.
So the rationale here is that the sample size information feels more like an annotation to me than something that deserves its own scale and legend. Opinions may differ on that, of course, but I thought I'd put it out there in case you find it useful.
This too doesn't answer your question. I've left samplesize inside the circle. Also, samplesize to me is more like an annotation than a legend.
But I think you are using an old version of ggplot2. There have been some changes in ggplot2 version 0.9.0. I've made the changes below.
p<-ggplot(data=ib, aes(x=city, y=category, size=median, colour=category, label=samplesize)) +
geom_point(alpha=.6) +
scale_area(range = c(1,15)) + # range instead of to
scale_colour_hue(guide = "none") + # guide instead of legend
geom_text(size = 2.5, colour="black")
p
