Label minimum and maximum of scale fill gradient legend with text: ggplot2 - r

I have a plot created in ggplot2 that uses scale_fill_gradientn. I'd like to add text at the minimum and maximum of the scale legend. For example, at the legend minimum display "Minimum" and at the legend maximum display "Maximum". There are posts using discrete fills and adding labels with numbers instead of text (e.g. here), but I am unsure how to use the labels feature with scale_fill_gradientn to only insert text at the min and max. At present I keep getting this error:
Error in scale_labels.continuous(scale, breaks) :
Breaks and labels are different lengths
Is this text label possible within ggplot2 for this type of scale / fill?
# The example code here produces a plot for illustrative purposes only.
library(ggplot2)
# create data frame, from ggplot2 documentation
df <- expand.grid(x = 0:5, y = 0:5)
df$z <- runif(nrow(df))
#plot
ggplot(df, aes(x, y, fill = z)) + geom_raster() +
scale_fill_gradientn(colours=topo.colors(7),na.value = "transparent")

For scale_fill_gradientn() you should provide both arguments, breaks= and labels=, and they must have the same length. With the limits= argument you can extend the colour bar to the minimum and maximum values you need.
ggplot(df, aes(x, y, fill = z)) + geom_raster() +
scale_fill_gradientn(colours=topo.colors(7),na.value = "transparent",
breaks=c(0,0.5,1),labels=c("Minimum",0.5,"Maximum"),
limits=c(0,1))
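
If text should appear only at the two ends of the legend, one possible variation is to take the breaks from the observed range of the data and supply exactly two labels. This is a minimal sketch, not part of the original answer; the names brks and lbls are made up for illustration, and the seed is only there to make the example reproducible.

```r
library(ggplot2)

set.seed(42)
df <- expand.grid(x = 0:5, y = 0:5)
df$z <- runif(nrow(df))

# Breaks at the observed extremes only, with exactly one label per break
brks <- range(df$z)
lbls <- c("Minimum", "Maximum")

p <- ggplot(df, aes(x, y, fill = z)) +
  geom_raster() +
  scale_fill_gradientn(colours = topo.colors(7), na.value = "transparent",
                       breaks = brks, labels = lbls, limits = brks)
p
```

Because breaks and labels both have length two, the "Breaks and labels are different lengths" error should not occur.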

User Didzis Elfert's answer slightly lacks "automatism" in my opinion (but it does of course point to the core of the problem, +1 :).
Here is an option to programmatically define the minimum and maximum of your data.
Advantages:
You will not need to hard-code the break values any more (which is error-prone)
You will not need to hard-code the limits (which is also error-prone)
Passing a named vector: you don't need the labels argument (manually mapping labels to values is also error-prone).
As a side effect you will avoid the "non-matching labels/breaks" problem
library(ggplot2)
foo <- expand.grid(x = 0:5, y = 0:5)
foo$z <- runif(nrow(foo))
myfuns <- list(Minimum = min, Mean = mean, Maximum = max)
ls_val <- unlist(lapply(myfuns, function(f) f(foo$z)))
# you only need to set the breaks argument!
ggplot(foo, aes(x, y, fill = z)) +
geom_raster() +
scale_fill_gradientn(
colours = topo.colors(7),
breaks = ls_val
)
# You can obviously also replace the middle value with something else
ls_val[2] <- 0.5
names(ls_val)[2] <- 0.5
ggplot(foo, aes(x, y, fill = z)) +
geom_raster() +
scale_fill_gradientn(
colours = topo.colors(7),
breaks = ls_val
)

Related

Is there a way to change the breaks of a ggplot legend without changing other properties of the aesthetic?

I wish to change the breaks of a ggplot legend without affecting the other properties of the aesthetic (e.g., palette, name, etc.). For example, a MWE where the aesthetic is colour:
## Original plot:
df <- data.frame(x = 1:10, y = 1:10, z = 1:10)
gg <- ggplot(df, aes(x, y, colour = z)) +
geom_point() +
scale_colour_distiller(palette = "Spectral", name = "Original title")
gg
## Plot with adjusted breaks:
gg + scale_colour_distiller(breaks = c(2.5, 7.5))
[Figure: original plot]
[Figure: plot with adjusted breaks]
In the second plot, the colour palette and the legend name are reset to their default values: I want to change the legend breaks only.
I understand why the above approach does not work; the first colour scale is completely replaced by the second scale. However, I don't know how to tackle this problem. Any advice is greatly appreciated!
I wrote a function which solves my question. It takes a ggplot object, the name of an aesthetic (as a string), and the breaks for the corresponding legend.
change_legend_breaks <- function(gg, aesthetic, breaks) {
## Find the scales associated with the specified aesthetic
sc <- as.list(gg$scales)$scales
all_aesthetics <- sapply(sc, function(x) x[["aesthetics"]][1])
idx <- which(aesthetic == all_aesthetics)
## Overwrite the breaks of the specified aesthetic
gg$scales$scales[[idx]][["breaks"]] <- breaks
return(gg)
}
This is my first time dealing with ggplot objects at a low level, so perhaps there is a better, more robust approach: This works for me, though.
Interestingly, it seems to be a mutating function, that is, it alters the plot object itself, rather than a copy of the object. I didn't know this was possible in R.
As a check that the function works as intended, here is a variant on the original MWE, this time with two aesthetics:
df <- data.frame(x = 1:10, y = 1:10, z1 = 1:10, z2 = 1:10)
gg <- ggplot(df, aes(x, y, colour = z1, size = z2)) +
geom_point() +
scale_size(name = "Original size title") +
scale_colour_distiller(palette = "Spectral", name = "Original colour title")
change_legend_breaks(gg, "colour", breaks = c(2.5, 7.5))
change_legend_breaks(gg, "size", breaks = c(1, 9))
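
The mutating behaviour noted above is probably explained by reference semantics: as far as I can tell, gg$scales is a ggproto object, and ggproto objects are built on environments, which R modifies in place instead of copying. The following base R sketch illustrates the difference between a list and an environment; bump_list and bump_env are hypothetical helper names, and no ggplot2 internals are involved.

```r
# A plain list is copied on modification: the caller's object is untouched.
bump_list <- function(l) {
  l$n <- l$n + 1
  invisible(l)
}

# An environment is modified in place: the caller sees the change.
bump_env <- function(e) {
  e$n <- e$n + 1
  invisible(e)
}

l <- list(n = 1)
bump_list(l)

e <- new.env()
e$n <- 1
bump_env(e)

l$n  # unchanged by bump_list()
e$n  # incremented by bump_env()
```

This would also explain why the function works even if its return value is discarded.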

Boxplot ggplot2: Show mean value and number of observations in grouped boxplot

I wish to add the number of observations to this boxplot, not by group but separated by factor. Also, I wish to display the number of observations in addition to the x-axis label that it looks something like this: ("PF (N=12)").
Furthermore, I would like to display the mean value of each box inside of the box, displayed in millions in order not to have a giant number for each box.
Here is what I have got:
give.n <- function(x){
return(c(y = median(x)*1.05, label = length(x)))
}
mean.n <- function(x){x <- x/1000000
return(c(y = median(x)*0.97, label = round(mean(x),2)))
}
ggplot(Soils_noctrl) +
geom_boxplot(aes(x=Slope,y=Events.g_Bacteria, fill = Detergent),
varwidth = TRUE) +
stat_summary(aes(x = Slope, y = Events.g_Bacteria), fun.data = give.n, geom = "text",
fun = median,
position = position_dodge(width = 0.75))+
ggtitle("Cell Abundance")+
stat_summary(aes(x = Slope, y = Events.g_Bacteria),
fun.data = mean.n, geom = "text", fun = mean, colour = "red")+
facet_wrap(~ Location, scale = "free_x")+
scale_y_continuous(name = "Cell Counts per Gram (Millions)",
breaks = round (seq(min(0),
max(100000000), by = 5000000),1),
labels = function(y) y / 1000000)+
xlab("Sample")
And so far it looks like this:
As you can see, the mean value is at the bottom of the plot, and the number of observations is in the boxes but not separated.
Thank you for your help! Cheers
TL;DR - you need to supply a group= aesthetic, since ggplot2 does not know which column it is supposed to use to dodge the text geom.
Unfortunately, we don't have your data, but here's an example set that can showcase the rationale here and the function/need for group=.
set.seed(1234)
df1 <- data.frame(detergent=c(rep('EDTA',15),rep('Tween',15)), cells=c(rnorm(15,10,1),rnorm(15,10,3)))
df2 <- data.frame(detergent=c(rep('EDTA',20),rep('Tween',20)), cells=c(rnorm(20,1.3,1),rnorm(20,4,2)))
df3 <- data.frame(detergent=c(rep('EDTA',30),rep('Tween',30)), cells=c(rnorm(30,5,0.8),rnorm(30,3.3,1)))
df1$smp='Sample1'
df2$smp='Sample2'
df3$smp='Sample3'
df <- rbind(df1,df2,df3)
Instead of using stat_summary(), I'm just going to create a separate data frame to hold the mean values I want to include as text on my plot:
library(dplyr) # provides %>%, group_by() and summarize()
summary_df <- df %>% group_by(smp, detergent) %>% summarize(m=mean(cells))
Now, here's the plot and use of geom_text() with dodging:
p <- ggplot(df, aes(x=smp, y=cells)) +
geom_boxplot(aes(fill=detergent))
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2)),
color='blue', position=position_dodge(0.8)
)
You'll notice the numbers are all separated along y= just fine, but the "dodging" is not working. This is because we have not supplied any information on how to do the dodging. In this case, the group= aesthetic can be supplied to let ggplot2 know which column to use for the dodging:
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2), group=detergent),
color='blue', position=position_dodge(0.8)
)
You don't have to supply the group= aesthetic if you supply another aesthetic such as color= or fill=. In cases where you give both a color= and group= aesthetic, the group= aesthetic will override any of the others for dodging purposes. Here's an example of the same, but where you don't need a group= aesthetic because I've moved color= up into the aes() (changing fill to greyscale so that you can see the text):
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2), color=detergent),
position=position_dodge(0.8)
) + scale_fill_grey()
FUN FACT: Dodging still works even if you supply geom_text() with a nonsensical aesthetic that would normally work for dodging, such as fill=. You get a warning message Ignoring unknown aesthetics: fill, but the dodging still works:
p + geom_text(data=summary_df,
aes(y=m, label=round(m,2), fill=detergent),
position=position_dodge(0.8)
)
# gives you the same plot as if you just supplied group=detergent, but with black text
In your case, changing your stat_summary() line to this should work:
stat_summary(aes(x = Slope, y = Events.g_Bacteria, group = Detergent),...
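
The question also asks for x-axis labels of the form "PF (N=12)", which the answer above does not cover. One possible approach, sketched here on synthetic data, is to count the observations per x category and pass a named label vector to scale_x_discrete(). The data, the helper mean_label and the vector x_labels are all made up for illustration; column names would need adapting to the real data.

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(
  smp       = rep(c("Sample1", "Sample2"), each = 30),
  detergent = rep(rep(c("EDTA", "Tween"), each = 15), times = 2),
  cells     = rnorm(60, mean = 10, sd = 2)
)

# Count observations per x category and build labels such as "Sample1 (N=30)"
n_per_smp <- table(df$smp)
x_labels <- setNames(
  paste0(names(n_per_smp), " (N=", n_per_smp, ")"),
  names(n_per_smp)
)

# Summary function for stat_summary(): y position and text label per box
mean_label <- function(x) {
  data.frame(y = mean(x), label = round(mean(x), 2))
}

p <- ggplot(df, aes(x = smp, y = cells)) +
  geom_boxplot(aes(fill = detergent)) +
  stat_summary(aes(group = detergent), fun.data = mean_label, geom = "text",
               position = position_dodge(width = 0.75)) +
  scale_x_discrete(labels = x_labels)
p
```

The group = detergent mapping is what lets position_dodge() place each mean over its own box, as explained above.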

changing ggplot legend unit scale

This question is motivated by a previous post illustrating various ways to change how axis scales are plotted in a ggplot figure, from the default exponential notation to the full integer value (when one's axis values are very large). While I am able to convert the axis scales from exponential notation to full values, I am unclear how one would achieve the same goal for the values appearing in the legend.
While I understand that one can manually change the length of the legend scale with "scale_color..." or "scale_fill..." followed by the "limits" argument, this does not appear to be a solution to getting my legend values to show "6000000000" rather than "6e+09" (or "0" rather than "0e+00" for that matter).
The following example should suffice. My hope is someone can point out how to implement the 'scales' package to apply for legend scales rather than axes scales.
Thanks very much.
library(ggplot2)
library(scales)
Data <- data.frame(
pi = c(2,71,828,1828,45904,523536,2874713,52662497,757247093,6999595749),
e = c(3,14,159,2653,58979,311599,7963468,54418516,1590576171, 99),
face = 1:10)
p <- ggplot(data = Data, aes(x=face, y=e, colour = pi))
myplot <- p + geom_point() +
scale_y_continuous(labels = comma) +
scale_color_gradientn(colours = rainbow(2), limits=c(0,7000000000))
myplot
Use the Comma formatter in scale_color_gradientn by setting labels = comma e.g.:
p <- ggplot(data = Data, aes(x=face, y=e, colour = pi))
myplot <- p + geom_point() +
scale_y_continuous(labels = comma) +
scale_color_gradientn(colours = rainbow(2), limits=c(0,7000000000), labels = comma)
myplot
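
The labels argument also accepts a function, so if a dependency on the scales package is not wanted, a small base R formatter might serve the same purpose. This is only a sketch; plain_number is a hypothetical helper name, not something provided by ggplot2 or scales.

```r
# Format numbers in full (no exponential notation), with thousands separators
plain_number <- function(x) {
  format(x, scientific = FALSE, big.mark = ",", trim = TRUE)
}

plain_number(c(0, 6e9))

# Possible use in the plot above (assumes ggplot2 is loaded):
# scale_color_gradientn(colours = rainbow(2), limits = c(0, 7000000000),
#                       labels = plain_number)
```

Dropping big.mark from the format() call should give labels such as "6000000000" without separators.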

How to fill histogram with color gradient?

I have a simple problem. How do I plot a histogram with ggplot2 with a fixed binwidth, filled with rainbow colors (or any other palette)?
Let's say I have data like this:
myData <- abs(rnorm(1000))
I want to plot a histogram using e.g. binwidth=.1. That, however, will give a different number of bins depending on the data:
ggplot() + geom_histogram(aes(x = myData), binwidth=.1)
If I knew number of bins (e.g. n=15) I'd use something like:
ggplot() + geom_histogram(aes(x = myData), binwidth=.1, fill=rainbow(n))
But with changing number of bins I'm kind of stuck on this simple problem.
If you really want the number of bins flexible, here is my little workaround:
library(ggplot2)
gg_b <- ggplot_build(
ggplot() + geom_histogram(aes(x = myData), binwidth=.1)
)
nu_bins <- dim(gg_b$data[[1]])[1]
ggplot() + geom_histogram(aes(x = myData), binwidth=.1, fill = rainbow(nu_bins))
In case the binwidth is fixed, here is an alternative solution which uses the internal function ggplot2:::bin_breaks_width() to get the number of bins before creating the graph. It's still a workaround, but it avoids calling geom_histogram() twice as in the other solution:
# create sample data
set.seed(1L)
myData <- abs(rnorm(1000))
binwidth <- 0.1
# create plot
library(ggplot2) # CRAN version 2.2.1 used
n_bins <- length(ggplot2:::bin_breaks_width(range(myData), width = binwidth)$breaks) - 1L
ggplot() + geom_histogram(aes(x = myData), binwidth = binwidth, fill = rainbow(n_bins))
As a third alternative, the aggregation can be done outside of ggplot2. Then, geom_col() can be used instead of geom_histogram():
# start binning on multiple of binwidth
start_bin <- binwidth * floor(min(myData) / binwidth)
# compute breaks and bin the data
breaks <- seq(start_bin, max(myData) + binwidth, by = binwidth)
myData2 <- cut(sort(myData), breaks = breaks)
ggplot() + geom_col(aes(x = head(breaks, -1L),
y = as.integer(table(myData2)),
fill = levels(myData2))) +
ylab("count") + xlab("myData")
Note that breaks is plotted on the x-axis instead of levels(myData2) to keep the x-axis continuous. Otherwise each factor label would be plotted which would clutter the x-axis. Also note that the built-in ggplot2 color palette is used instead of rainbow().
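
Another option, not taken from the answers above, is to avoid counting bins altogether and map the fill to the bin position computed by the stat. The sketch below assumes a ggplot2 version that provides after_stat() (3.3.0 or later, as far as I know); the choice of rainbow(7) as the gradient and the hidden legend are arbitrary.

```r
library(ggplot2)

set.seed(1L)
myData <- abs(rnorm(1000))

# Map fill to the computed bin centre, so every bin gets its own colour
# regardless of how many bins the chosen binwidth produces
p <- ggplot(data.frame(x = myData), aes(x = x)) +
  geom_histogram(aes(fill = after_stat(x)), binwidth = 0.1) +
  scale_fill_gradientn(colours = rainbow(7), guide = "none")
p
```

Since the fill is a continuous mapping here, the colours are interpolated along the x-axis and are not one discrete rainbow colour per bin.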

Boxplot, how to match outliers' color to fill aesthetics?

I am trying to match boxplot's outliers color to the fill color which is set by aesthetic (scale_colour_discrete).
Here is an example.
m <- ggplot(movies, aes(y = votes, x = factor(round(rating)),
fill=factor(Animation)))
m + geom_boxplot() + scale_y_log10()
This generates the plot below. How do I change those black dots to the reddish/greenish colors used in the body? The outlier.colour option of the boxplot seems to pick one colour across all boxes, and not as an aesthetic, if I understand correctly. I don't mind using the colour aesthetic if that helps.
Edit:
Adapted this solution (Changing whisker definition in geom_boxplot). The horizontal dodging is reset by stat_summary and I couldn't figure out how to get it back. I'd probably drop outliers and stretch whiskers as needed, since I know how now.
# define the summary function
f <- function(x) {
r <- quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
r
}
# define outlier function, beyond the 5% and 95% percentiles
o <- function(x) {
subset(x, x < quantile(x,probs=c(0.05))[1] | quantile(x,probs=c(0.95))[1] < x)
}
m <- ggplot(movies, aes(y = votes, x = factor(round(rating)),
colour=factor(Animation)))
m <- m + stat_summary(fun.data=f, geom='boxplot')
m <- m + stat_summary(fun.y=o, geom='point', aes(colour=factor(Animation)))
m + scale_y_log10()
As @koshke said, having the outliers colored like the lines of the box (not the fill color) is now easily possible by setting outlier.colour = NULL:
m <- ggplot(movies, aes(y = votes, x = factor(round(rating)),
colour = factor(Animation)))
m + geom_boxplot(outlier.colour = NULL) + scale_y_log10()
outlier.colour must be written with "ou"
outlier.colour must be outside aes()
I'm posting this as a late answer because I find myself looking this up again and again, and I also posted it for the related question Coloring boxplot outlier points in ggplot2?.
I found a solution to the fact that setting geom_boxplot(outlier.colour = NULL) doesn't work anymore in newer versions of ggplot2 (@jonsnow speaks about version 1.0.0 of ggplot2).
In order to replicate the behaviour that @cbeleites proposed, you simply need to use the following code:
update_geom_defaults("point", list(colour = NULL))
m <- ggplot(movies, aes(y = votes, x = factor(round(rating)),
colour = factor(Animation)))
m + geom_boxplot() + scale_y_log10()
As expected, this produces a plot with points that match the line color.
Of course, one should remember to restore the default if one needs to draw multiple plots:
update_geom_defaults("point", list(colour = "black"))
The solution was found by reading the ggplot2 changelog on github:
The outliers of geom_boxplot() use the default colour, size and shape from
geom_point(). Changing the defaults of geom_point() with
update_geom_defaults() will apply the same changes to the outliers of
geom_boxplot(). Changing the defaults for the outliers was previously not
possible. (@ThierryO, #757)
Posted here as well: Coloring boxplot outlier points in ggplot2?
I found a way to do this, editing raw grid object.
library(ggplot2)
library(grid) # getGrob(), gpar() and grid.draw() come from grid
match.ol.col <- function(plt,aes.cp='fill') {
# matches outliers' color to either fill or colour aesthetics
# plt: ggplot layer object having boxplot
# aes.cp: aesthetic from which to copy the color. must be either 'fill' or 'col'
# returns grid objects, so print it with grid.draw(), not print()
if (aes.cp %in% c('color', 'colour')) aes.cp <- 'col'
grob <- ggplotGrob(plt)
bps <- getGrob(grob, 'boxplots', grep=T)
for (bp in bps$children) {
p <- getGrob(bp, 'point', grep=T)
if (is.null(p)) next
r <- getGrob(bp, 'rect', grep=T)
grob <- geditGrob(grob, p$name, gp=gpar(col=r$gp[[aes.cp]]))
}
return(grob)
}
m <- ggplot(movies, aes(y = votes, x = factor(round(rating)),
colour=factor(Animation)))
p <- m + geom_boxplot() + scale_y_log10()
grob <- match.ol.col(p, aes.cp='colour')
grid.draw(grob)
results:
I had a very similar issue. I wanted to match style with a previous plot, so wanted black borders with coloured fill, and matching outliers.
My solution was to over-print: once with colour= and the default solid circle point, and once with fill= and an open circle point shape:
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot(aes(colour=factor(cyl))) +
geom_boxplot(aes(fill=factor(cyl)), outlier.shape=21)
[Figure: boxplot with coloured fill, and black borders and median line]
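
A further possibility, sketched here as an assumption and not taken from any of the answers above, is to hide the built-in outliers and draw them as an ordinary point layer, which can then use any aesthetic mapping. The helper is_outlier is hypothetical; it applies the usual 1.5 * IQR rule per group, which should correspond to the default whisker definition of geom_boxplot().

```r
library(ggplot2)

# TRUE for values more than 1.5 * IQR outside the quartiles
is_outlier <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75)))
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

mt <- mtcars
mt$cyl <- factor(mt$cyl)
# ave() applies is_outlier() within each cyl group; the logical result is
# stored as 0/1, hence the comparison with 1
mt$out <- ave(mt$mpg, mt$cyl, FUN = is_outlier) == 1

p <- ggplot(mt, aes(cyl, mpg, fill = cyl)) +
  geom_boxplot(outlier.shape = NA) +                     # hide default outliers
  geom_point(data = subset(mt, out), aes(colour = cyl))  # redraw them by group
p
```

With a second grouping variable mapped to fill, the point layer would presumably also need position = position_dodge(width = 0.75) to line up with the dodged boxes.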
