ggplot2 colorbar with discontinuous jump for skewed data - r

Here is some fake data, x and y, with color information z. z is highly skewed, and as such renders the colorbar uninformative:
set.seed(1)
N <- 100
x <- rnorm(N)
y <- x + rnorm(N)
z <- x+y+rnorm(N)
z[z>2] <- z[z>2]+exp(z[z>2]-2)
d <- data.frame(x,y,z)
ggplot(d, aes(x=x, y=y, color = z)) + geom_point()
I'd like to have most of the colorbar reflect the main range of the data, but have a box for overflows, say above 5. Something like this:
Is there a way to do this in ggplot2? Note that I would like the colorbar to remain continuous, rather than discrete, for most of its range. I'll probably either discretize or topcode if what I want isn't feasible.

You can get that general plot, although the legends would need more work:
p <- ggplot(d, aes(x=x, y=y, color = z)) + geom_point(size = 5)
p + scale_color_gradient2(
low = 'green', high = 'red', mid = 'grey80', na.value = 'blue', limits= c(-10, 10)
)
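The blue points appear because ggplot2's continuous scales, by default, turn values outside `limits` into NA, which are then drawn with `na.value`. A minimal sketch of that behaviour, using the `censor` and `squish` helpers from the scales package (the vector `z_demo` is made up for illustration):

```r
library(scales)

# Hypothetical values, some outside the limits c(-10, 10) used above
z_demo <- c(-12, -3, 0, 4, 15)

# Default out-of-bounds handling: values outside the limits become NA,
# so the scale paints them with na.value
censored <- censor(z_demo, range = c(-10, 10))

# Alternative: clamp to the nearest limit (topcoding)
squished <- squish(z_demo, range = c(-10, 10))

print(censored)
print(squished)
```

Passing `oob = scales::squish` to the scale would topcode the outliers instead of flagging them with a separate color.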
You can cheat in some extra legend fluff, e.g.:
ggplot(d, aes(x=x, y=y, color = z, alpha = '>10')) +
  geom_point(size = 5) +
  scale_color_gradient2(
    low = 'green', high = 'red', mid = 'grey80', na.value = 'blue', limits= c(-10, 10),
    guide = guide_colorbar(title.position = 'left')
  ) +
  scale_alpha_manual(
    values = 1, name = 'z',
    guide = guide_legend(
      override.aes = list(color = 'blue'), title.position = 'left',
      title.theme = element_text(color = 'white', angle = 0)
    )
  ) +
  theme(legend.margin = margin(-5, 10, -5, 10))
Note that red/green palettes are hard to read for people with color vision deficiency.

Extending upon Axeman's answer I came up with the following slight hack to get blues into your color scale:
First, define a color map with 20 colors for the values within and 5 for the values outside your range.
cmap <- colorRampPalette(c("green","grey80","red"))(20)
cmap <- append(cmap,rep("blue",5))
Then cut the z values into 20 chunks between -10 and 10 and convert to numeric (resulting in NA's for values above 10). By specifying the cmap in scale_color_gradientn and limits of [1,25] we map values of -10 to 1 (green) and 10 to 20 (red). Finally by specifying breaks we manually add the correct labels (i.e. the 5th category corresponds to values between -6 and -5).
ggplot(d, aes(x=x, y=y, color=as.numeric(cut(z, breaks=seq(-10,10))))) +
geom_point(size=3) +
scale_color_gradientn(colors=cmap, limits=c(1,25), breaks=c(5,11,17,23),
labels=c(-6,0,6,">10"), name="z", na.value = "blue")
Lovely result :)
The only issue is that you must make sure no values ever fall below -10, because this method would show them in blue as well.
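To make the bin numbering and the caveat about low values concrete, here is a small sketch on made-up values (`z_demo`, `bin` and `bin_fixed` are illustrative names, and the clamping step is my own assumption, not part of the answer above). Note that `cut` returns NA both above 10 and at or below -10:

```r
# Hypothetical z values, including one below -10 and one above 10
z_demo <- c(-10.5, -9.5, -5.5, 0.5, 9.5, 12)

# Same binning as above: 20 unit-wide bins between -10 and 10
bin <- as.numeric(cut(z_demo, breaks = seq(-10, 10)))
print(bin)  # NA for both the low and the high outlier

# Assumed fix: send high outliers to the overflow block explicitly
# and clamp low outliers into the first (green) bin
bin_fixed <- bin
bin_fixed[z_demo > 10]   <- 23
bin_fixed[z_demo <= -10] <- 1
print(bin_fixed)
```

With `bin_fixed` as the color aesthetic, `na.value` would only be needed for genuinely missing z.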

Related

ggplot2 plots more points than asked

I am trying to fill a square region with non-overlapping squares with different colors and ggplot2 is plotting more points than those in the dataframe at the higher x and y limits. Here is the code
l = 1000
a=seq(0,1, 1/(l-1))
x=rep(a, each=length(a))
y=rep(a, length(a))
k = length(x)
c=sample(1:10, k, replace = TRUE)
data <- data.frame(x, y, c)
ggplot(data, aes(x=x, y=y)) + geom_point(shape=15, color=c)
ggsave('k.jpg', width=10, height=10)
The result I am getting with RStudio is this. Notice the extra points on the right and top of the image.
How can I get ggplot to plot exactly one square exclusively for those points in the dataframe and not more?
As a second related question, this is what happens if l is changed from 1000 to l=100
My problem is now that the squares are not perfectly stacked, leaving empty space between them. I would like to know how can I compute from the number of points in each dimension of the array (l), the correct value for size inside geom_point so that the squares are perfectly stacked.
Many thanks
You might be better off with geom_tile, rather than geom_point, as this will allow more control over the size of the rectangles and the border width. See ?geom_tile for details.
Providing a couple of alternatives using OP's example, reducing the data frame dimension to increase the size of the tile:
Data
library(ggplot2)
l = 100
a = seq(0, 1, 1 / (l - 1))
x = rep(a, each = length(a))
y = rep(a, length(a))
k = length(x)
c = sample(1:10, k, replace = TRUE)
data <- data.frame(x, y, c)
Example 1
Very simple, just passing "white" as colour to make the tiles more distinctive.
ggplot(data, aes(x = x, y = y, fill = c)) + geom_tile(colour = "white")
Example 2
Creating manually a palette, and coord_equal to force a specified ratio (default 1) so tiles are squares:
colors<-c("peachpuff", "yellow", "orange", "orangered", "red",
"darkred","firebrick", "royalblue", "darkslategrey", "black")
ggplot(data, aes(x = x, y = y)) +
geom_tile(aes(fill = factor(c)), colour = "white") +
scale_fill_manual(values = colors, name = "Colours") +
coord_equal()
Comparing geom_point and geom_tile
Creating small data frame (10 x 10, l = 10) to observe closer what happens when using geom_point instead of geom_tile.
Original OP code
ggplot(data, aes(x = x, y = y)) + geom_point(shape = 15, color = c)
Example 1
ggplot(data, aes(x = x, y = y, fill = c)) + geom_tile(colour = "white")
Example 2
colors<-c("peachpuff", "yellow", "orange", "orangered", "red",
"darkred","firebrick", "royalblue", "darkslategrey", "black")
ggplot(data, aes(x = x, y = y)) +
geom_tile(aes(fill = factor(c)), colour = "white") +
scale_fill_manual(values = colors, name = "Colours") +
coord_equal()
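On the second part of the question (stacking the squares with no gaps): geom_tile sizes its rectangles in data units rather than in point sizes, so the tile width that fills the grid exactly is the grid spacing, 1 / (l - 1). A rough sketch with l = 10, where `grid` and `step` are names I made up; as far as I know geom_tile already defaults to the data resolution, so setting `width` and `height` here mainly makes the relationship explicit:

```r
library(ggplot2)

l <- 10
a <- seq(0, 1, length.out = l)
step <- 1 / (l - 1)              # spacing between neighbouring grid points

grid <- expand.grid(x = a, y = a)
set.seed(1)
grid$c <- sample(1:10, nrow(grid), replace = TRUE)

# Tiles exactly one grid step wide and high touch their neighbours
p <- ggplot(grid, aes(x = x, y = y, fill = factor(c))) +
  geom_tile(width = step, height = step) +
  coord_equal()
# print(p) renders the figure
```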

ggplot2: dealing with extremes values by setting a continuous color scale

I am trying to plot some global maps (raster files) and I have some problems in setting up a good color scale for my data. What I would like to do is to plot my data using a divergent palette (e.g. cm.colors), and I would like to center the color "white" of such scale with the value zero, but without having to set symmetric values in the scale (i.e. the same value both negative and positive, i.e. limits=c(-1,1)). Additionally, I would like to plot all values above and/or below a certain value all with the same color.
In other words, if we suppose that my map has a range of -100 to 150, I would like to plot my map with a diverging palette with a "white" color corresponding to the value 0, and having all values e.g. below -20 and above 50 plotted with the same color, i.e. respectively with the negative and positive extremes of the color palette.
Here is an example of the code that I am using for the moment:
ggplot(df, aes(y=Latitude, x=Longitude)) +
geom_raster(aes(fill=MAP)) +
coord_equal()+
theme_gray() +
theme(panel.background = element_rect(fill = 'skyblue2', colour = 'black'),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "right",
legend.key = element_blank()) +
scale_fill_gradientn("MAP", limits=c(-0.5,1), colours=cm.colors(20))
There are simple ways to accomplish this, such as truncating your data beforehand, or using cut to create discrete bins for appropriate labels.
require(dplyr)
df %>% mutate(z2 = ifelse(z > 50, 50, ifelse(z < -20, -20, z))) %>%
ggplot(aes(x, y, fill = z2)) + geom_tile() +
scale_fill_gradient2(low = cm.colors(20)[1], high = cm.colors(20)[20])
df %>% mutate(z2 = cut(z, c(-Inf, seq(-20, 50, by = 10), Inf)),
z3 = as.numeric(z2)-3) %>%
{ggplot(., aes(x, y, fill = z3)) + geom_tile() +
scale_fill_gradient2(low = cm.colors(20)[1], high = cm.colors(20)[20],
breaks = unique(.$z3), labels = unique(.$z2))}
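To see what these two approaches do to the values themselves, here is a small base R sketch on a made-up vector (`z_demo` is hypothetical). The nested `ifelse` is equivalent to clamping with `pmin`/`pmax`, and subtracting 3 from the bin index is what puts the bin ending at zero, (-10, 0], at 0:

```r
# Hypothetical values spanning the -100 to 150 range from the question
z_demo <- c(-100, -20, -5, 0, 5, 50, 150)

# Truncation: same result as the nested ifelse
z_clamped <- pmin(pmax(z_demo, -20), 50)
print(z_clamped)

# Binning: 9 bins, (-Inf,-20], (-20,-10], (-10,0], ..., (40,50], (50,Inf]
z_bin <- cut(z_demo, c(-Inf, seq(-20, 50, by = 10), Inf))

# Recentre so that the (-10, 0] bin, which is bin 3, maps to 0
z_centered <- as.numeric(z_bin) - 3
print(z_centered)
```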
But I'd thought about this task before, and felt unsatisfied with that. The pre-truncating doesn't leave nice labels, and the cut option is always fiddly (particularly having to adjust the parameters of seq inside cut and figure out how to recenter the bins). So I tried to define a reusable transformation that would do the truncating and relabeling for you.
I haven't fully debugged this and I'm going out of town, so hopefully you or another answerer can take a crack at it. The main problem seems to be collisions in the edge cases, so occasionally the limits overlap the intended breaks visually, as well as some unexpected behavior with the formatting. I just used some dummy data to create your desired range of -100 to 150 to test it.
require(scales)
trim_tails <- function(range = c(-Inf, Inf)) trans_new("trim_tails",
  transform = function(x) {
    force(range)
    desired_breaks <- extended_breaks(n = 7)(x[x >= range[1] & x <= range[2]])
    break_increment <- diff(desired_breaks)[1]
    x[x < range[1]] <- range[1] - break_increment
    x[x > range[2]] <- range[2] + break_increment
    x
  },
  inverse = function(x) x,
  breaks = function(x) {
    force(range)
    extended_breaks(n = 7)(x)
  },
  format = function(x) {
    force(range)
    x[1] <- paste("<", range[1])
    x[length(x)] <- paste(">", range[2])
    x
  })
ggplot(df, aes(x, y, fill = z)) + geom_tile() +
guides(fill = guide_colorbar(label.hjust = 1)) +
scale_fill_gradient2(low = cm.colors(20)[1], high = cm.colors(20)[20],
trans = trim_tails(range = c(-20,50)))
Also works with a boxed legend instead of a colorbar, just use ... + guides(fill = guide_legend(label.hjust = 1, reverse = T)) + ...
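A different route, which is not the method above but a swapped-in alternative, is to let the scale do the clamping with `oob = scales::squish` and to pin white at zero through the `values` argument of scale_fill_gradientn. The sketch below assumes dummy data in the -100 to 150 range; `lims`, `stops` and the three colors are my own choices for illustration:

```r
library(ggplot2)
library(scales)

# Dummy data standing in for the raster (assumption, not the OP's map)
set.seed(42)
df <- expand.grid(x = 1:20, y = 1:20)
df$z <- runif(nrow(df), -100, 150)

lims <- c(-20, 50)

# Positions of the three colours on the 0-1 scale: zero sits at 20/70
stops <- rescale(c(lims[1], 0, lims[2]), from = lims)

p <- ggplot(df, aes(x, y, fill = z)) +
  geom_tile() +
  scale_fill_gradientn(
    colours = c("#80FFFF", "white", "#FF80FF"),  # any diverging trio works
    values  = stops,
    limits  = lims,
    oob     = squish   # values beyond the limits take the end colours
  )
# print(p) renders the figure
```

This keeps the colorbar continuous, but the end labels still read -20 and 50 rather than "< -20" and "> 50", so custom `labels` would be needed for that.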

Numbered point labels plus a legend in a scatterplot

I am trying to label points in a scatterplot in R (ggplot2) using numbers (1, 2, 3, ...) and then match the numbers to names in a legend (1 - Alpha, 2 - Bravo, 3 - Charlie... ), as a way of dealing with too many, too long labels on the plot.
Let's assume this is a.df:
Name X Attribute Y Attribute Size Attribute Color Attribute
Alpha 1 2.5 10 A
Bravo 3 3.5 5 B
Charlie 2 1.5 10 C
Delta 5 1 15 D
And this is a standard scatterplot:
ggplot(a.df, aes(x=X.Attribute, y=Y.Attribute, size=Size.Attribute, fill=Color.Attribute, label=Name)) +
geom_point(shape=21) +
geom_text(size=5, hjust=-0.2,vjust=0.2)
Is there a way to change it as follows?
have scatterplot points labeled with numbers (1,2,3...)
have a legend next to the plot assigning the plot labels (1,2,3...) to a.df$Name
In the next step I would like to assign other attributes to the point size and color, which may rule out some 'hacks'.
Here's an alternative solution, which draws the labels as geom_text. I've borrowed from the question "ggplot2 - annotate outside of plot".
library(MASS) # for Cars93 data
library(grid)
library(ggplot2)
d <- Cars93[1:30,]
d$row_num <- 1:nrow(d)
d$legend_entry <- paste(" ", d$row_num, d$Manufacturer, d$Model)
ymin <- min(d$Price)
ymax <- max(d$Price)
y_values <- ymax-(ymax-ymin)*(1:nrow(d))/nrow(d)
p <- ggplot(d, aes(x=Min.Price, y=Price)) +
geom_text(aes(label=row_num)) +
geom_text(aes(label=legend_entry, x=Inf, y=y_values, hjust=0)) +
theme(plot.margin = unit(c(1,15,1,1), "lines"))
gt <- ggplot_gtable(ggplot_build(p))
gt$layout$clip[gt$layout$name == "panel"] <- "off"
grid.draw(gt)
This is pretty hacky, but might help. The plot labels are simply added by geom_text, and to produce a legend, I've mapped colour to a label in the data. Then, to stop the points being coloured, I override it with scale_colour_manual, where you can set the colour of the points as well as the labels on the legend. Finally, I made the points in the legend invisible by setting alpha = 0, and removed the squares that usually sit behind the dots with legend.key = element_blank() in theme().
dat <- data.frame(id = 1:10, x = rnorm(10), y = rnorm(10), label = letters[1:10])
ggplot(dat, aes(x, y)) + geom_point(aes(colour = label)) +
geom_text(aes(x = x + 0.1, label = id)) +
scale_colour_manual(values = rep("black", nrow(dat)),
labels = paste(dat$id, "=", dat$label)) +
guides(colour = guide_legend(override.aes = list(alpha = 0))) +
theme(legend.key = element_blank())
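Applied to the table from the question, the same trick might look like the sketch below. The columns `id` and `legend_label` are names I introduced; mapping colour to `factor(id)` rather than to the name is my own choice, so that the legend order follows the numbers instead of the alphabet:

```r
library(ggplot2)

# The example table from the question (size and colour columns omitted)
a.df <- data.frame(
  Name = c("Alpha", "Bravo", "Charlie", "Delta"),
  X.Attribute = c(1, 3, 2, 5),
  Y.Attribute = c(2.5, 3.5, 1.5, 1)
)
a.df$id <- seq_len(nrow(a.df))
a.df$legend_label <- paste(a.df$id, "-", a.df$Name)

p <- ggplot(a.df, aes(X.Attribute, Y.Attribute)) +
  geom_point(aes(colour = factor(id))) +           # colour only feeds the legend
  geom_text(aes(label = id), nudge_x = 0.15) +     # numbers next to the points
  scale_colour_manual(name = NULL,
                      values = rep("black", nrow(a.df)),
                      labels = a.df$legend_label) +
  guides(colour = guide_legend(override.aes = list(alpha = 0))) +
  theme(legend.key = element_blank())
# print(p) renders the figure
```

Because this uses up the colour scale, it would conflict with mapping another variable to colour; `fill` with shape 21 would still be free.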

row column heatmap plot with overlayed circle (fill and size) in r

Here is a graph I am trying to develop:
I have row and column coordinate variables, and three quantitative variables (rectheat = fill of the rectangle heatmap, circlesize = size of the circles, circlefill = fill color of the circles). NA values should be shown in a different color (for example gray).
The following is data:
set.seed (1234)
rectheat = sample(c(rnorm (10, 5,1), NA, NA), 7*14, replace = T)
dataf <- data.frame (rowv = rep (1:7, 14), columnv = rep(1:14, each = 7),
rectheat, circlesize = rectheat*1.5,
circlefill = rectheat*10 )
dataf
Here is code that I worked on:
require(ggplot2)
ggplot(dataf, aes(y = factor(rowv),x = factor(columnv))) +
geom_rect(aes(colour = rectheat)) +
geom_point(aes(colour = circlefill, size =circlesize)) + theme_bw()
I am not sure whether geom_rect is appropriate, or whether the rest is fine, as I could not get any results except errors.
Here it is better to use geom_tile (heatmap).
require(ggplot2)
ggplot(dataf, aes(y = factor(rowv),
x = factor(columnv))) + ## global aes
geom_tile(aes(fill = rectheat)) + ## to get the rect filled
geom_point(aes(colour = circlefill,
size =circlesize)) + ## geom_point for circle illusion
scale_color_gradient(low = "yellow",
high = "red")+ ## color of the corresponding aes
scale_size(range = c(1, 20))+ ## to tune the size of circles
theme_bw()
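The question also asked for NA cells in a separate color. One way to sketch that, assuming the same `dataf` as in the question, is to set `na.value` on the fill scale and to drop the NA rows from the point layer so no circles are attempted there; the blue fill colors are arbitrary picks of mine:

```r
library(ggplot2)

# Data as in the question
set.seed(1234)
rectheat <- sample(c(rnorm(10, 5, 1), NA, NA), 7 * 14, replace = TRUE)
dataf <- data.frame(rowv = rep(1:7, 14), columnv = rep(1:14, each = 7),
                    rectheat, circlesize = rectheat * 1.5,
                    circlefill = rectheat * 10)

# Rows that actually have a circle to draw
circles <- dataf[!is.na(dataf$circlesize), ]

p <- ggplot(dataf, aes(x = factor(columnv), y = factor(rowv))) +
  geom_tile(aes(fill = rectheat), colour = "white") +
  geom_point(data = circles, aes(colour = circlefill, size = circlesize)) +
  scale_fill_gradient(low = "lightblue", high = "darkblue",
                      na.value = "grey70") +   # NA tiles drawn in gray
  scale_colour_gradient(low = "yellow", high = "red") +
  theme_bw()
# print(p) renders the figure
```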

Using multiple scale_colour_gradient scales for different ranges of the data in one plot

I am very new to R so please bear with me if something is not clear in my question.
I have a data.frame "protein" with 6 columns, namely:
1. protein_name, 2. protein_FC, 3. protein_pval, 4. mRNA_FC, 5. mRNA_pval and 6. freq.
I am trying to plot a volcano plot with x=log2(protein_FC), y=-log10(protein_pval). Then map the size of the dots to freq and colour to mRNA_FC. This all works fine and here is the code that I have used:
ggplot( protein [ which ( protein$freq <= 0.05 ),] , aes( x = log2( protein_FC ) ,
y = -log10 ( protein_pval ) , size = freq , colour = mRNA_FC ,
label = paste(protein_name,",",mRNA_pval), alpha=1/1000)) +
geom_point() + geom_text( hjust = 0 , vjust = 0 , colour = "black" , size = 2.5 ) +
geom_abline( intercept = 1.3 , slope = 0) +
scale_colour_gradient(limits=c(-3,3))
All is fine up to here. But because of the nature of the experiment, the data is quite dense around mRNA_FC = 0. There, the default colour scheme that ggplot applies doesn't work very well in distinguishing different points.
I have tried various colour scales by using low="colour1" and high="colour2". However, I think it will be best to use multiple colour scales over the ranges of mRNA_FC, i.e. something like blue to white for -3<mRNA_FC<-0.2, red to white for -0.2<mRNA_FC<0, green to white for 0<mRNA_FC<0.2 and black to white for 0.2<mRNA_FC<3.
But I haven't found any way of doing it yet.
Any help would be appreciated.
Cheers!
For this type of thing you want to use scale_fill_gradientn (or scale_colour_gradientn for the colour aesthetic). For example:
library(ggplot2)
x = seq(-0.1, 0.1, len=100)
y = 0:10
dat = expand.grid(x=x, y=y)
ggplot(data=dat, aes(x=x, y=y, fill=x)) +
geom_raster() +
scale_fill_gradientn(colours=c('red', 'yellow', 'cyan', 'blue'),
values = c(-0.05,-1e-32,1e-32,0.05),
breaks = c(-0.05,-0.005,0.005,0.05),
rescaler = function(x,...) x,
oob = identity)
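The example above places the colors in data units by swapping in an identity rescaler. A variation that keeps the default rescaler is to convert the data-unit stops into 0-1 positions with `scales::rescale` and pass those as `values`. The sketch below uses the -3 to 3 limits and the ±0.2 cut points from the question; the five colors and the names `lims`, `stops` and `pos` are my own assumptions:

```r
library(ggplot2)
library(scales)

lims  <- c(-3, 3)
stops <- c(-3, -0.2, 0, 0.2, 3)      # where each colour should sit, in data units
pos   <- rescale(stops, from = lims) # same stops on the 0-1 scale
print(pos)

dat <- expand.grid(x = seq(-3, 3, length.out = 200), y = 0:10)

# Most of the colour change is squeezed into the dense region around zero
p <- ggplot(dat, aes(x = x, y = y, fill = x)) +
  geom_raster() +
  scale_fill_gradientn(
    colours = c("blue", "cyan", "white", "yellow", "red"),
    values  = pos,
    limits  = lims
  )
# print(p) renders the figure
```

For points rather than a raster, the same arguments should carry over to scale_colour_gradientn.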
