How do I limit the range of the viridis colour scale? - r

I have two sets of data, which I want to present using a heat map with the viridis color scale. For the first data set, my values range from 0 to 1.2 and I can easily see the differences I want to see. However, my second data set has some outliers, resulting in a range from 0 to 2. Now it's harder to see the differences in the interesting range between 0 and 1, and it's more difficult to compare the two images directly. Is it possible to show the data from 0 to 1.2 using the viridis colour scale while showing the higher values in yellow (the "highest" colour of the viridis scale)?
Here is an example:
library(ggplot2)
library(viridis)
#Create Data
DataSet1 <- expand.grid(x = 0:5, y = 0:5)
DataSet1$z <- runif(36, 0, 1.2)
DataSet2 <- expand.grid(x = 0:5, y = 0:5)
DataSet2$z <- runif(36, 0, 2)
#Plot Data
ggplot(DataSet1, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_viridis() +
geom_text(aes(label = round(z, 2)), size = 2)
DataSet1: Differences between 0.5 and 0.7 are easy to see
ggplot(DataSet2, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_viridis() +
geom_text(aes(label = round(z, 2)), size = 2)
DataSet2: Differences between 0.5 and 0.7 are difficult to see

EDIT 2022-05-03: The scale function is called scale_fill_viridis_c() these days.
@ClausWilke's solution is better because it shows in the legend, but sometimes one just needs a quick solution without having to write too much specific code. This one also relies on the scales package.
ggplot(DataSet2, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_viridis_c(limits = c(0.2, 1), oob = scales::squish) +
geom_text(aes(label = round(z, 2)), size = 2)
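To see what oob = scales::squish does to out-of-range values, here is a minimal sketch on a plain vector. The values and the 0 to 1.2 range are illustrative, borrowed from the question rather than from the limits used in the plot above.

```r
library(scales)

# Example values; the last two lie above the upper limit of 1.2
z <- c(0, 0.5, 1.5, 2)

# squish() clamps values outside `range` to the nearest limit,
# so they receive the end colour of the scale instead of becoming NA
squished <- squish(z, range = c(0, 1.2))
squished
```

With limits = c(0, 1.2) and oob = scales::squish, every tile above 1.2 would therefore get the same yellow as a tile at exactly 1.2.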

You can define an arbitrary rescaling function. Not sure this looks that great, would likely need some work with the legend, but in principle this mechanism allows you to map data values onto the scale in any way you want.
ggplot(DataSet2, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_viridis(rescaler = function(x, to = c(0, 1), from = NULL) {
ifelse(x<1.2,
scales::rescale(x,
to = to,
from = c(min(x, na.rm = TRUE), 1.2)),
1)}) +
geom_text(aes(label = round(z, 2)), size = 2)
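The rescaler above can be tried on its own to check the mapping. This is only a sketch: cap_rescaler is a hypothetical name for the same anonymous function, and the input values are made up.

```r
library(scales)

# Same logic as the anonymous rescaler: values below 1.2 are mapped
# linearly onto [0, 1]; everything else is pinned to 1 (the top colour)
cap_rescaler <- function(x, to = c(0, 1), from = NULL) {
  ifelse(x < 1.2,
         rescale(x, to = to, from = c(min(x, na.rm = TRUE), 1.2)),
         1)
}

rescaled <- cap_rescaler(c(0, 0.6, 1.2, 2))
rescaled
```

Here 0.6 lands in the middle of the colour scale, while 1.2 and 2 both land at the top.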

Are you looking for something like this?
ggplot(DataSet2, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_gradient(low="green", high="red", limits=c(0, 1.2),
na.value = "yellow") +
geom_text(aes(label = round(z, 2)), size = 2)
Using the viridis colors, as per jazzurro's recommendation:
ggplot(DataSet2, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_gradientn(colors = viridis_pal()(9), limits=c(0, 1.2),
na.value = "#FDE725FF") +
geom_text(aes(label = round(z, 2)), size = 2)

It's not necessarily an improvement, but you could do something like this to show the higher values in yellow:
library(dplyr)  # provides %>% and filter()
DataSet2A <- DataSet2 %>% filter(z <= 1.2)
DataSet2B <- DataSet2 %>% filter(z > 1.2)
ggplot(DataSet2A, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_viridis(begin = 0, end = .75) +
geom_text(aes(label = round(z, 2)), size = 2) +
geom_tile(data = DataSet2B, aes(x, y), fill = "yellow")
Maybe if you play around with the cutoff as well as the begin= and end= parameters in the scale, which control the portion of the viridis scale that you're employing, you can achieve the result you want. (Note that you can only have one fill scale per plot, but you can set additional constant fills as I've done here with yellow.)

Related

ggplot line legend disappears with alpha < 1

When trying to plot some data in ggplot2 using geom_line(), I noticed that the legend items become empty if I use alpha < 1. How can I fix this and why is this happening?
# dummy data
data <- data.frame(
x = rep(1:10, 10),
y = 1:100 + c(runif(50,0,5), runif(50,0,10)),
grp = c(rep("A", 50), rep("B", 50)))
# using the default alpha = 1
ggplot(data, aes(x = x, y = y, col = grp)) +
geom_line()
When I plot the same graph, but with alpha < 1, the lines in the legend completely disappear:
# using alpha < 1
ggplot(data, aes(x = x, y = y, col = grp)) +
geom_line(alpha = 0.9)
(versions: R 4.1.3, ggplot2 3.3.5)
Edit: Updating R and restarting RStudio did not help. This also occurs when using R directly without RStudio.
I ran into the same problem. When saving the plots to PDF/PNG the lines do appear in the legend.
Another workaround I found is adding geom_point() so that way at least you have the colors in the legend:
ggplot(data, aes(x = x, y = y, col = grp)) +
geom_line(alpha = 0.4) +
geom_point(alpha = 0.4, size = 0.1) +
guides(colour = guide_legend(override.aes = list(size=4)))
The legend uses the same aesthetics as the plot; you can override them with override.aes.
This should work
ggplot(data, aes(x = x, y = y, col = grp)) +
geom_line(alpha = 0.2) + # using alpha = 0.2 to have it more evident
guides(col = guide_legend(override.aes = list(alpha = 1)))
The same mechanism can be used, for example, to change the shape or colour of legend elements independently of the aes() mapping in the plot.
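As a sketch of that last point, the example below overrides alpha, size, and shape in the legend only. The data frame pts and its values are made up for illustration.

```r
library(ggplot2)

set.seed(1)
# Illustrative data: two groups of 100 random points
pts <- data.frame(
  x = runif(200),
  y = runif(200),
  grp = rep(c("A", "B"), each = 100)
)

p <- ggplot(pts, aes(x = x, y = y, colour = grp)) +
  geom_point(alpha = 0.2) +
  # Legend keys are drawn opaque, larger, and as filled squares,
  # while the points in the panel keep alpha = 0.2 and the default shape
  guides(colour = guide_legend(
    override.aes = list(alpha = 1, size = 4, shape = 15)
  ))
p
```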

Overlaying histogram with different y-scales

I'm struggling with the following issue:
I want to plot two histograms, but since one of the two classes has far fewer observations than the other, I need to add a second y-axis to allow a direct comparison of the values.
Below is the code I am currently using, along with the result.
Thank you in advance!
ggplot(data,aes(x= x ,group=class,fill=class)) + geom_histogram(position="identity",
alpha=0.5, bins = 20)+ theme_bw()
Consider the following situation where you have 800 versus 200 observations:
library(ggplot2)
df <- data.frame(
x = rnorm(1000, rep(c(1, 2), c(800, 200))),
class = rep(c("A", "B"), c(800, 200))
)
ggplot(df, aes(x, fill = class)) +
geom_histogram(bins = 20, position = "identity", alpha = 0.5,
# Note that y = stat(count) is the default behaviour
mapping = aes(y = stat(count)))
You could scale the counts for each group to a maximum of 1 by using y = stat(ncount):
ggplot(df, aes(x, fill = class)) +
geom_histogram(bins = 20, position = "identity", alpha = 0.5,
mapping = aes(y = stat(ncount)))
Alternatively, you can set y = stat(density) to have the total area integrate to 1.
ggplot(df, aes(x, fill = class)) +
geom_histogram(bins = 20, position = "identity", alpha = 0.5,
mapping = aes(y = stat(density)))
Note that as of ggplot2 3.3.0, after_stat() replaces stat(); stat() was formally deprecated in ggplot2 3.4.0.
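For reference, here is the ncount example rewritten with after_stat(). This is a sketch that assumes ggplot2 3.3.0 or later; the seed is added only to make it reproducible.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(
  x = rnorm(1000, mean = rep(c(1, 2), c(800, 200))),
  class = rep(c("A", "B"), c(800, 200))
)

p <- ggplot(df, aes(x, fill = class)) +
  # after_stat(ncount) scales each group's counts to a maximum of 1
  geom_histogram(aes(y = after_stat(ncount)),
                 bins = 20, position = "identity", alpha = 0.5)
p
```

The same substitution works for after_stat(count) and after_stat(density).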
How about comparing them side by side with facets?
ggplot(data,aes(x= x ,group=class,fill=class)) +
geom_histogram(position="identity",
alpha=0.5,
bins = 20) +
theme_bw() +
facet_wrap(~class, scales = "free_y")

R bubble plot using ggplot manually selecting the colour and axis names

I am using ggplot to create a bubble plot with this code:
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
theme_bw() +
theme() +
scale_size(range = c(1, 50)) +
ylim(0,100)
It is working perfectly apart from 2 things:
For each name (fill) I would like to manually specify the colour used (via a dataframe that maps name to colour) - this is to provide consistency across multiple figures.
I would like to substitute the numbers on the y for text labels (for several reasons I cannot use the text labels from the outset due to ordering issues)
I have tried several methods using scale_color_manual() and scale_y_continuous respectively and I am getting nowhere! Any help would be very gratefully received!
Thanks
Since you have not specified an example df, I created one of my own.
To manually specify the color, you have to use scale_fill_manual with a named vector as the argument of values.
Edit 2
This appears to do what you want. We use scale_y_continuous. The breaks argument specifies the vector of positions, while the labels argument specifies the labels which should appear at those positions. Since we already created the vectors when creating the data frame, we simply pass those vectors as arguments.
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_y_continuous(breaks = mean, labels = order_label)
Edit 1
From your comment, it appears that you want to label the circles. One option would be to use geom_text. Code below. You may need to experiment with values of nudge_y to get the position correct.
order <- c(1, 2)
mean <- c(0.75, 0.3)
n <- c(180, 200)
name <- c("a", "b")
order_label <- c("New York", "London")
df <- data.frame(order, mean, n, name, order_label, stringsAsFactors = FALSE)
color <- c("blue", "red")
name_color <- data.frame(name, color, stringsAsFactors = FALSE)
gcolors <- name_color[, 2]
names(gcolors) <- name_color[, 1]
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
geom_text(aes(label = order_label), size = 3, hjust = "inward",
nudge_y = 0.03) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
ylab(NULL)
Original Answer
It is not clear what you mean by "substitute the numbers on the y for text labels". In the example below, I have formatted the y-axis as a percentage using the scales::percent_format() function. Is this similar to what you want?
order <- c(1, 2)
mean <- c(0.75, 0.3)
n <- c(180, 200)
name <- c("a", "b")
df <- data.frame(order, mean, n, name, stringsAsFactors = FALSE)
color <- c("blue", "red")
name_color <- data.frame(name, color, stringsAsFactors = FALSE)
gcolors <- name_color[, 2]
names(gcolors) <- name_color[, 1]
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_y_continuous(labels = scales::percent_format())
Thanks for all your help, this worked perfectly:
ggplot(df, aes(x = order, y = mean, size = n, fill = name)) +
geom_point(shape = 21) +
scale_fill_manual(values = gcolors) +
scale_size(limits = c(min(df$n), max(df$n))) +
scale_x_continuous(breaks = order, labels = order_label)
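The named colour vector that scale_fill_manual() expects can also be built in one step with setNames(). This is a small sketch that reuses the name_color mapping from the answer above.

```r
# Mapping of group name to colour, as in the answer above
name_color <- data.frame(
  name = c("a", "b"),
  color = c("blue", "red"),
  stringsAsFactors = FALSE
)

# setNames() attaches the group names to the colour values,
# giving the named vector that scale_fill_manual(values = ...) matches on
gcolors <- setNames(name_color$color, name_color$name)
gcolors
```

Because the colours are matched by name rather than by position, the same gcolors vector gives consistent colours across figures even when some groups are missing from a plot.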

ggplot outline jitter datapoints

I'm trying to create a scatterplot where the points are jittered (geom_jitter), but I also want to create a black outline around each point. Currently I'm doing it by adding 2 geom_jitters, one for the fill and one for the outline:
beta <- paste("beta == ", "0.15")
ggplot(aes(x=xVar, y = yVar), data = data) +
geom_jitter(size=3, alpha=0.6, colour=my.cols[2]) +
theme_bw() +
geom_abline(intercept = 0.0, slope = 0.145950, size=1) +
geom_vline(xintercept = 0, linetype = "dashed") +
annotate("text", x = 2.5, y = 0.2, label=beta, parse=TRUE, size=5)+
xlim(-1.5,4) +
ylim(-2,2)+
geom_jitter(shape = 1,size = 3,colour = "black")
However, that results in something like this:
Because jitter randomly offsets the data, the 2 geom_jitters are not in line with each other. How do I ensure the outlines are in the same place as the fill points?
I've seen threads about this (e.g. Is it possible to jitter two ggplot geoms in the same way?), but they're pretty old and I'm not sure if anything new has been added to ggplot that would solve this issue.
The code above works if, instead of using geom_jitter, I use the regular geom_point, but I have too many overlapping points for that to be useful
EDIT:
The solution in the posted answer works. However, it doesn't quite cooperate for some of my other graphs where I'm binning by some other variable and using that to plot different colours:
ggplot(aes(x=xVar, y = yVar, color=group), data = data) +
geom_jitter(size=3, alpha=0.6, shape=21, fill="skyblue") +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_colour_brewer(name = "Title", direction = -1, palette = "Set1") +
xlim(-1.5,4) +
ylim(-2,2)
My group variable has 3 levels, and I want to colour each group level by a different colour in the brewer Set1 palette. The current solution just colours everything skyblue. What should I fill by to ensure I'm using the correct colour palette?
You don't actually have to use two layers; you can just use the fill aesthetic of a plotting character with a hole in it:
# some random data
set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))
ggplot(aes(x = x, y = y), data = df) + geom_jitter(shape = 21, fill = 'skyblue')
The colour, size, and stroke aesthetics let you customize the exact look.
Edit:
For grouped data, set the fill aesthetic to the grouping variable, and use scale_fill_* functions to set color scales:
# more random data
set.seed(47)
df <- data.frame(x = runif(100), y = rnorm(100), group = sample(letters[1:3], 100, replace = TRUE))
ggplot(aes(x=x, y = y, fill=group), data = df) +
geom_jitter(size=3, alpha=0.6, shape=21) +
theme_bw() +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_fill_brewer(name = "Title", direction = -1, palette = "Set1")
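If two separate layers are still needed, another option is to give both layers the same position_jitter() object with a fixed seed, so that they receive identical offsets. This is a sketch rather than part of the answer above; the width, height, and seed values are arbitrary.

```r
library(ggplot2)

set.seed(47)
df <- data.frame(x = rnorm(100), y = runif(100))

# One jitter specification with a fixed seed, shared by both layers
jit <- position_jitter(width = 0.2, height = 0.2, seed = 123)

p <- ggplot(df, aes(x = x, y = y)) +
  # Filled points
  geom_point(position = jit, size = 3, alpha = 0.6, colour = "skyblue") +
  # Black outlines (shape 1 is an open circle), jittered identically
  geom_point(position = jit, size = 3, shape = 1, colour = "black")
p
```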

Constructing an area plot with outlines for discrete variable (i.e. with steps)

Similar to geom_area plot with areas and outlines ggplot, I'm trying to construct a stacked area plot with outlines. Since my variables are discrete I'm using geom_bar() for stacking them. The code is as follows:
require(ggplot2)
require(reshape)
x = 0:4
y1 = c(3,2,2,1,0)
y2 = c(1,1,0,0,0)
data = data.frame(x,y1,y2)
data.plot <-melt(data, id.vars = "x")
cols = c(y1="darkgrey",y2="lightgrey")
p = ggplot(data.plot,aes(x=x,y=value,fill=variable))
p + geom_bar(aes(width=1),stat = "identity") + theme_bw() + scale_fill_manual(values=cols)
Which gives
My problem is now adding the outlines as in the example I referred to. I can use colour="black" in geom_bar() but this adds vertical lines between the bars which look quite ugly.
Does anyone have a suggestion to get these outlines? The solution doesn't have to be based on geom_bar.
If possible, I am also interested in a solution where only the dark grey part has an outline, since this outline has an important interpretation. Perhaps this could be based on some shifted version of geom_line()?
Here is another approach, using annotate("path"). This suggestion has hard-coded values for some of the path components, but I suspect there is a way to fill in those values algorithmically (perhaps with ggplot_build()).
p <- ggplot(data.plot,aes(x=x, y=value, fill=variable))
p <- p + geom_bar(aes(width=1), stat = "identity") + theme_bw() + scale_fill_manual(values=cols)
p <- p + annotate(x=c(-.5, 0.5, 0.5, 2.5, 2.5, 3.5, 3.5),
y=c(3, 3, 2, 2, 1, 1, 0 ), group = 1, "path", color = "black", size = 2)
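# value is a column of data.plot, not a global variable, so it is referenced explicitly
p <- p + annotate(x=c(min(x)-.5, min(x)+0.5, min(x)+0.5, min(x)+2.5, min(x)+2.5, min(x)+3.5, min(x)+3.5),
y=c(max(data.plot$value), max(data.plot$value), max(data.plot$value)- 1, max(data.plot$value)- 1, max(data.plot$value)- 2, max(data.plot$value)- 2, min(data.plot$value)), group = 1, "path", color = "black", size = 2)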
p
Your plotting code (I don't want to use c since that's a function):
p <- ggplot(data.plot, aes(x = x, y = value, fill = variable))
p <- p + geom_bar(aes(width = 1), stat = "identity") + theme_bw() + scale_fill_manual(values = cols)
Now add a stepping line along the bars:
p <- p + geom_step(aes(x = x - 0.5), position = "stack")
It's a bit more work to fix a line along the axes:
library(dplyr)
y.max <- data.plot %>% group_by(x) %>% summarize(s = sum(value))
y.max <- max(y.max$s)
p + geom_step(aes(x = x - 0.5, ymax = value), position = "stack") +
annotate('segment',
x = min(data.plot$x) - 0.5,
xend = min(data.plot$x) - 0.5,
y = 0,
yend = y.max) +
annotate('segment',
x = min(data.plot$x) - 0.5,
xend = max(data.plot$x) - 0.5,
y = 0,
yend = 0)
I'd be interested in simpler solutions!
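One possibly simpler variant is to compute the outline from the data and draw it with a single geom_step(). The sketch below assumes that outlining the total stack height is enough (it does not outline only the dark grey part). It rebuilds data.plot in base R instead of using melt(), and the names totals and outline are introduced here for illustration.

```r
library(ggplot2)

x  <- 0:4
y1 <- c(3, 2, 2, 1, 0)
y2 <- c(1, 1, 0, 0, 0)

# Long-format data, equivalent to melt(data, id.vars = "x")
data.plot <- data.frame(
  x = rep(x, 2),
  variable = rep(c("y1", "y2"), each = length(x)),
  value = c(y1, y2)
)

# Total bar height at each x
totals <- aggregate(value ~ x, data = data.plot, FUN = sum)

# Step coordinates: start each step at the left edge of its bar and
# repeat the last height once to reach the right edge of the last bar
outline <- data.frame(
  x = c(totals$x - 0.5, max(totals$x) + 0.5),
  y = c(totals$value, totals$value[nrow(totals)])
)

p <- ggplot(data.plot, aes(x = x, y = value, fill = variable)) +
  geom_col(width = 1) +
  geom_step(data = outline, aes(x = x, y = y), inherit.aes = FALSE) +
  scale_fill_manual(values = c(y1 = "darkgrey", y2 = "lightgrey")) +
  theme_bw()
p
```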
