How to map ggplot histogram x-axis intervals to fixed colour palette? - r

I am trying to stratify my ggplot2 histogram into fixed intervals and colour them based on a specific colour palette: 'x<4':black; '4<x<6':blue; '6<x<8':yellow; and so on...
I tried two approaches, neither of which worked.
Referring to my code below, alternative 1 fails when NoOfElement falls to a small figure, say, 500, and there is no element in the first interval 'x<4'. ggplot2 then assigns 'black' to whichever is the first interval (this would be '4<x<6' when size=500). But this is not what I want (see picture).
In Alternative 2, I created another variable in my data frame and assigned the colours for each element. I did this based on a modification of the solution given in: Set specific fill colors in ggplot2 by sign. Unfortunately, the resulting histogram has colours randomly assigned by ggplot2.
I'm quite stuck and would really appreciate some help. Thanks in advance!
Sample code:
library(ggplot2)
NoOfElement <- 5000; MyBreaks <- c(-Inf, seq(4, 16, by=2), Inf)
MyColours <- c("black", "blue", "yellow", "green", "gray", "brown", "purple", "red")
set.seed(2)
c <- data.frame(a=rnorm(NoOfElement, 10, 2), b=rep(NA, NoOfElement))
c$b <- cut(c$a, MyBreaks)
try <- 1 # Allows toggling of alternatives below
if (try==1)
{
p <- ggplot( c, aes(x=c$a, fill=c$b) ) + geom_histogram( binwidth=0.2 ) +
scale_fill_manual(breaks = levels(c$b), values = MyColours,
name = "X Intervals") +
scale_x_continuous( limits=c(2, 20))
}else
{
c$BarCol <- factor(c$b, levels = levels(c$b), labels = MyColours)
p <- ggplot( c, aes(x=c$a, fill=c$b) ) + geom_histogram( binwidth=0.2 ) +
scale_fill_manual(values = c$BarCol, name = "X Intervals") +
scale_x_continuous( limits=c(2, 20))
}
plot (p)

The scale_ family of functions has a drop argument that controls whether empty levels are dropped; set drop = FALSE to keep them:
NoOfElement <- 500; MyBreaks <- c(-Inf, seq(4, 16, by=2), Inf)
MyColours <- c("black", "blue", "yellow", "green", "gray", "brown", "purple", "red")
set.seed(2)
c <- data.frame(a=rnorm(NoOfElement, 10, 2), b=rep(NA, NoOfElement))
c$b <- cut(c$a, MyBreaks)
# drop=FALSE keeps the empty levels, so each interval keeps its own colour
p <- ggplot( c, aes(x=a, fill=b) ) + geom_histogram( binwidth=0.2 ) +
scale_fill_manual(breaks = levels(c$b), values = MyColours,
name = "X Intervals", drop=FALSE)
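As a complement to drop = FALSE, the colours can also be tied to the interval labels by name, so the mapping does not depend on which levels happen to be present. This is only a minimal sketch; the names my_breaks, my_colours, d and named_colours are illustrative and not from the original post.

```r
library(ggplot2)

my_breaks  <- c(-Inf, seq(4, 16, by = 2), Inf)
my_colours <- c("black", "blue", "yellow", "green", "gray", "brown", "purple", "red")

set.seed(2)
d <- data.frame(a = rnorm(500, 10, 2))
d$b <- cut(d$a, my_breaks)  # factor with 8 levels, some possibly empty

# Name each colour after the interval it belongs to
named_colours <- setNames(my_colours, levels(d$b))

p <- ggplot(d, aes(x = a, fill = b)) +
  geom_histogram(binwidth = 0.2) +
  scale_fill_manual(values = named_colours, name = "X Intervals", drop = FALSE)
```

With named values, ggplot2 matches colours to levels by name; drop = FALSE is kept here so that empty intervals still show up in the legend.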

Related

Need a solution on color problem with ggplot [duplicate]

This question already has answers here:
Plot with conditional colors based on values in R [duplicate]
How to conditionally highlight points in ggplot2 facet plots - mapping color to column
I have a problem with visualizing a ggplot graph. I would love to have different intervals in different colors.
Here is my code:
ggplot(totalarum, aes(y = Pris, x = Boyta)) +
geom_point() +
xlim(100, 200) +
ylim(1, 10000000)
What I'm after is a way to colour the points by interval, for example red for x <= 100 and blue for 100 < x <= 200.
Thanks in advance!
If you have two intervals or only one condition you could map that condition on the color aesthetic and set your desired colors via scale_color_manual:
totalarum <- data.frame(
Boyta = seq(0, 200, length.out = 20),
Pris = seq(0, 10000000, length.out = 20)
)
library(ggplot2)
ggplot(totalarum, aes(y = Pris, x = Boyta, color = Boyta <= 100)) +
geom_point() +
scale_color_manual(values = c("TRUE" = "red", "FALSE" = "blue"))
EDIT: In the more general case where you have multiple intervals, I would suggest adding a new column to your dataset using e.g. cut, which can then be mapped to the color aesthetic:
library(ggplot2)
totalarum <- data.frame(
Boyta = seq(0, 500, length.out = 20),
Pris = seq(0, 10000000, length.out = 20)
)
totalarum$Boyta_cut <- cut(totalarum$Boyta, breaks = seq(0, 500, 100), include.lowest = TRUE, right = TRUE)
colors <- c("red", "blue", "green", "purple", "steelblue")
ggplot(totalarum, aes(y = Pris, x = Boyta, color = factor(Boyta_cut))) +
geom_point() +
scale_color_manual(values = colors)
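To see which interval a boundary value ends up in, it can help to run cut on a few hand-picked values first. The sketch below uses the same breaks as above; the vector boyta and the named colour vector are made up for illustration.

```r
# Values chosen to sit on and around the bin edges
boyta <- c(0, 100, 100.5, 200, 500)
bins <- cut(boyta, breaks = seq(0, 500, 100), include.lowest = TRUE, right = TRUE)

levels(bins)        # the five interval labels
as.character(bins)  # the interval assigned to each value

# Naming the colours by level makes the interval-to-colour mapping explicit
colors <- setNames(c("red", "blue", "green", "purple", "steelblue"), levels(bins))
colors[as.character(bins)]
```

With right = TRUE the intervals are closed on the right, so 100 falls in the first bin and 200 in the second; include.lowest = TRUE is what lets 0 into the first bin instead of becoming NA.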

Fill or colour in continuous scale through gradient of more than 2 defined colours

Is it possible to define a colour gradient with more than just the low and high colours?
Let's say, on this data:
df <- data.frame(a = 1:100,
b = rnorm(100, mean = 1000, sd = 500))
ggplot(df) +
geom_point(aes(a, b)) +
scale_fill_continuous(low = "white", high = "black")
## "low" and "high" are all that scale_fill_continuous() offers
Setting only the low and high parameters gives a single two-colour gradient, but I want a gradient that passes through several colours, let's say from white to blue and then to black.
Thank you for your attention and answers.
You can try:
set.seed(123)
df <- data.frame(a=1:100, b=rnorm(100, mean = 1000, sd = 500))
ggplot(df,aes(x=a, y=b, col=b)) + geom_point() +
scale_colour_gradientn(colours = c("green", "blue","blue", "red","red", "yellow"),breaks = c(0,1000,2000,3000), limits=c(0,3000))
Alternatively, give the colour stops directly in data units by passing an identity rescaler:
ggplot(data.frame(x=1:100, y=rnorm(100, mean = 1000, sd = 500)), aes(x=x, y=y, colour=y)) +
geom_point() +
scale_colour_gradientn(values=c(0, 1000, 2000, 3000),
colours=c("green", "blue", "red", "yellow"),
rescaler=function(x, ...) x, oob=identity)
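For the white to blue to black gradient asked about in the question, one possible sketch is to compute the position of the middle colour with scales::rescale, since the values argument of scale_colour_gradientn expects positions between 0 and 1. The choice of 1000 as the point where the gradient is pure blue is an assumption made for illustration, as are the names lims, mid and stops.

```r
library(ggplot2)

set.seed(123)
df <- data.frame(a = 1:100, b = rnorm(100, mean = 1000, sd = 500))

lims <- range(df$b)  # data range used as the scale limits
mid  <- 1000         # assumed value at which the gradient should be blue

# Convert the three anchor values to positions in [0, 1]
stops <- scales::rescale(c(lims[1], mid, lims[2]))

p <- ggplot(df, aes(x = a, y = b, colour = b)) +
  geom_point() +
  scale_colour_gradientn(colours = c("white", "blue", "black"),
                         values = stops, limits = lims)
```

Leaving values out spaces the colours evenly, which puts blue at the midpoint of the data range rather than at a chosen value.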
The colorpanel function in gplots is ideal for this. It produces a spectrum of n colours (e.g., 50) between 2 or 3 colours (middle colour is optional):
library("gplots")
scale_fill_continuous <- colorpanel(50, "blue", "red") #2 colours
scale_fill_continuous <- colorpanel(50, "blue", "white", "red") #3 colours
There are also various preset functions, e.g., rainbow(50), greenred(50) (mid=black), or bluered(50) (mid=white).
For the example given:
#Generate a colour spectrum
library("gplots")
scale_fill_continuous <- colorpanel(300, "green", "purple", "yellow")
#Generate data
set.seed(5)
data <- rnorm(100, mean = 1000, sd = 500)
mapping <- pmax(ceiling(data/100), 1) #round up to an index in the colour scale; values <= 0 are clamped to index 1
coloured_data <- scale_fill_continuous[mapping] #map to colour scale
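If gplots is not installed, base R's colorRampPalette (from grDevices) can build a comparable spectrum. The sketch below is an alternative to the approach above, not a reproduction of it: it lets cut compute the bin index, which always stays within 1..n even for negative values. The names ramp, idx and point_colours are illustrative.

```r
set.seed(5)
values <- rnorm(100, mean = 1000, sd = 500)

n_colours <- 300
ramp <- colorRampPalette(c("green", "purple", "yellow"))(n_colours)

# cut() with a single number as breaks splits the data range into that many
# equal-width bins; labels = FALSE returns the integer bin index
idx <- cut(values, breaks = n_colours, labels = FALSE)

point_colours <- ramp[idx]  # one hex colour per data value
```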

Plot multiple layers with ggplot2

I am trying to plot two data.frame as two layers using ggplot2 "geom_raster" function. The top layer contains NA values that are set to "transparent" in order to make the underneath layer visible. As the scale_fill_xxx function can't be used twice, I've tried the following code (based on this post : ggplot2 - using two different color scales for overlayed plots) :
library(ggplot2)
df1 <- data.frame(x=rep(c(1,2,3),times=3), y=c(1,1,1,2,2,2,3,3,3), data= c(NA,4,9,NA,2,7,NA,NA,3))
df2 <- data.frame(x=rep(c(1,2,3),times=3), y=c(1,1,1,2,2,2,3,3,3), data= c(1,NA,NA,2,NA,NA,1,2,NA))
ggplot() +
geom_raster(data=df1, aes(y= y, x= x, fill= data)) +
scale_fill_gradientn(name="df1", colours=c("red", "blue"), na.value = "transparent") +
geom_raster(data= df2, aes(y= y, x= x, colour= as.factor(data))) +
scale_colour_manual(values = c("green", "black"), name= "df2", labels= c("Class 1", "Class 2"), na.value="transparent")
The thing is that the "colour" / "scale_colour_manual" solution does not return what I expect (it returns a dark grey plot instead). I would like the df1 "data" column to be represented on a red to blue scale (NA's should be transparent) and the df2 "data" column to be represented according to class number ("1"=green and "2"=black).
Could anyone help me to understand what's wrong with my procedure?
Here is a solution:
library(ggplot2)
library(scales) # provides rescale()
df3 = merge(df1, df2, by = c("x","y"))
names(df3)[names(df3) == "data.x"] <- "data.1"
names(df3)[names(df3) == "data.y"] <- "data.2"
df3$data = df3$data.1
df3$data[is.na(df3$data)] = df3$data.2[is.na(df3$data)]
myGrad <- colorRampPalette(c('blue','red')) # color gradient
min_value = min(df3$data[df3$data >2]) # minimum value except 1 and 2
max_value = max(df3$data) # maximum value
param = max_value - min_value + 1 # number of colors in the gradient
ggplot(df3, aes(x, y, fill = data)) + geom_raster() +
scale_fill_gradientn(colours=c("green","black", myGrad(param)),
values = rescale(c(1, 2, seq(min_value, max_value, 1))), na.value = "transparent")
I guess you will use this plot with larger values and ranges, so I also tried it with a 5x5 matrix:
set.seed(123)
df4 = data.frame(x=rep(c(1,2,3,4,5),5), y=c(rep(1,5), rep(2,5), rep(3,5), rep(4,5), rep(5,5)),
data = sample(c(1:20), 25, prob = c(0.2,0.2,rep(0.6/18,18)), replace = T))
min_value = min(df4$data[df4$data >2])
max_value = max(df4$data)
param = max_value - min_value + 1
ggplot(df4, aes(x, y, fill = data)) + geom_raster() +
scale_fill_gradientn(colours=c("green","black", myGrad(param)),
values = rescale(c(1, 2, seq(min_value, max_value, 1))), na.value = "transparent")
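To make the values argument less opaque, here is the arithmetic for the 3x3 example worked through by hand. The vector combined is typed in to match the merged data column from df1 and df2 (row order is irrelevant here); it is an illustration, not part of the original answer.

```r
library(scales)

# Merged values: classes 1 and 2 from df2, continuous values from df1
combined <- c(1, 4, 9, 2, 2, 7, 1, 2, 3)

min_value <- min(combined[combined > 2])  # 3
max_value <- max(combined)                # 9
param <- max_value - min_value + 1        # 7 gradient colours

# One position per colour: "green", "black", then the 7 gradient colours
stops <- rescale(c(1, 2, seq(min_value, max_value, 1)))
stops
```

The positions come out evenly spaced between 0 and 1 in steps of 1/8, so value 1 maps to green, value 2 to black, and 3 to 9 run along the blue-to-red gradient. The number of positions must equal the number of colours (2 + param).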

Change alpha value for certain break values in ggplot geom_point

I have made a scatter plot from more than 100k points, and I would like the coloured points (break values 1 and 2, which are "green", and break value 20, which is "red") to stand out more than the "cornsilk1" points (break values 3 to 19). I have tried the code below, but with no luck.
Any help would be appreciated.
Thanks so much!
P.S. Please excuse my juvenile code. I am sure there is a much more efficient way to do this...
plotIA<-ggplot(plotintaobs,aes(x=SD13009PB,y=SD13009PB2,colour=quartile))+geom_point()+labs(x="Phillips Observation 1", y="Phillips Observation 2") + ggtitle("Intra-observer Variation") + mytheme
plotIA+ scale_color_manual(breaks = c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20"),
values=c("green","green", "cornsilk1", "cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","cornsilk1","red"))
plotIA+scale_alpha_manual(values=c(1,1,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,1))
One strategy is to use cut to split the quartiles into your three groups. Then you can use scale_colour_manual and scale_alpha_manual:
# some fake data
plotintaobs <- data.frame(SD13009PB = rnorm(20), SD13009PB2 = rnorm(20), quartile = 1:20)
#cut quartile
plotintaobs$q2 <- cut(plotintaobs$quartile, breaks = c(0, 2, 19, 20), labels = c("low", "mid", "high"))
#plot
plotIA <- ggplot(plotintaobs, aes(x = SD13009PB, y = SD13009PB2, colour = q2, alpha = q2)) +
geom_point() +
scale_colour_manual(values = c("green", "cornsilk1","red")) +
scale_alpha_manual(values = c(1, 0.8, 1))
plotIA
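A quick check of the grouping, plus one extra idea that is not in the answer above: with many overlapping points, sorting the rows so that the highlighted groups are drawn last can help them stay visible. The names group_colours, group_alpha and draw_order are illustrative, and the 0.4 alpha is taken from the question rather than the answer.

```r
quartile <- 1:20
q2 <- cut(quartile, breaks = c(0, 2, 19, 20), labels = c("low", "mid", "high"))
table(q2)  # how many break values land in each group

# Named vectors make the group-to-value mapping explicit
group_colours <- c(low = "green", mid = "cornsilk1", high = "red")
group_alpha   <- c(low = 1, mid = 0.4, high = 1)

# FALSE sorts before TRUE, so "mid" rows come first and the
# highlighted rows come last (points are drawn in row order)
draw_order <- order(q2 != "mid")
```

These vectors can be passed as scale_colour_manual(values = group_colours) and scale_alpha_manual(values = group_alpha), and the data frame reordered with plotintaobs[draw_order, ] before plotting.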

How to make a color scale with sharp transition in ggplot2

I am trying to create a color scale with a sharp color transition at one point. What I am currently doing is:
test <- data.frame(x = c(1:20), y = seq(0.01, 0.2, by = 0.01))
cutoff <- 0.10
ggplot(data = test,
aes(x = as.factor(x), y = y, fill = log(y), width = 1, binwidth = 0)) +
geom_bar(stat = "identity") +
scale_fill_gradientn(colours = c("red", "red", "yellow", "green"),
values = rescale(log(c(0.01, cutoff - 0.0000000000000001, cutoff, 0.2))),
breaks = c(log(cutoff)), label = c(cutoff))
It is producing the plots I want. But the position of the break in colorbar somehow varies depending on the cutoff. Sometimes below the value, sometimes above, sometimes on the line. Here are some plots with different cutoffs (0.05, 0.06, 0.1):
What am I doing wrong? Or alternatively, is there a better way to create such a color scale?
Have you looked into scale_colour_steps or scale_colour_stepsn?
Using the n.breaks argument of scale_colour_stepsn, you should be able to specify the number of breaks you want and get sharper transitions.
Be sure to use ggplot2 > 3.3.2
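A rough sketch of how a binned scale could be applied to the data from the question, assuming a single explicit break at the cutoff rather than n.breaks. Note that a binned scale picks one colour per bin from the supplied gradient, so the two fills will be distinct solid colours but not necessarily pure "red" and "green".

```r
library(ggplot2)

test <- data.frame(x = 1:20, y = seq(0.01, 0.2, by = 0.01))
cutoff <- 0.10

p <- ggplot(test, aes(x = factor(x), y = y, fill = log(y))) +
  geom_col(width = 1) +
  # one break at the cutoff gives two bins: below and above it
  scale_fill_stepsn(colours = c("red", "green"),
                    breaks = log(cutoff),
                    labels = cutoff)
```

This trades the smooth yellow-to-green gradient above the cutoff for a hard two-colour split, so it only fits if a discrete legend is acceptable.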
In case you are still interested in a solution for this, you can add guide = guide_colourbar(nbin = <some arbitrarily large number>) to scale_fill_gradientn(). This increases the number of bins used by the colourbar legend, which makes the transition look sharper.
# illustration using nbin = 1000, & weighted colours below the cutoff
plot.cutoff <- function(cutoff){
p <- ggplot(data = test,
aes(x = as.factor(x), y = y, fill = log(y))) +
geom_col(width = 1) +
scale_fill_gradientn(colours = c("red4", "red", "yellow", "green"),
values = scales::rescale(log(c(0.01, cutoff - 0.0000000000000001,
cutoff, 0.2))),
breaks = c(log(cutoff)),
labels = c(cutoff),
guide = guide_colourbar(nbin = 1000))
return(p)
}
cowplot::plot_grid(plot.cutoff(0.05),
plot.cutoff(0.06),
plot.cutoff(0.08),
plot.cutoff(0.1),
ncol = 2)
(If you find the above insufficiently sharp at very high resolutions, you can also set raster = FALSE in guide_colourbar(), which turns off interpolation & draws rectangles instead.)
I think it is slightly tricky to achieve an exact, discrete cutoff point in the continuous color scale using scale_fill_gradientn. A quick alternative would be to use scale_fill_gradient, set the cutoff with limits, and set the color of 'out-of-bounds' values with na.value.
Here's a slightly simpler example than in your question:
# some data
df <- data.frame(x = factor(1:10), y = 1, z = 1:10)
# a cutoff point
lo <- 4
ggplot(df, aes(x = x, y = y, fill = z)) +
geom_bar(stat = "identity") +
scale_fill_gradient(low = "yellow", high = "green",
limits = c(lo, max(df$z)), na.value = "red")
As you see, the values below your cutpoint will not appear in the legend, but one may consider including a large chunk of red a waste of "legend band width" anyway. You might just add a verbal description of the red bars in the figure caption instead.
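The reason na.value takes effect here is that continuous scales by default censor out-of-bounds values, that is, replace them with NA. A small sketch using scales::censor, the default out-of-bounds handler, on the z values from the example:

```r
z  <- as.numeric(1:10)
lo <- 4

# Values outside the limits become NA and are then drawn with na.value
censored <- scales::censor(z, range = c(lo, max(z)))
censored
```

Values 1 to 3 come back as NA, which is why those bars are filled with the na.value colour instead of a gradient colour.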
You may also wish to differentiate between values below a lower cutpoint and above an upper cutpoint. For example, set 'too low' values to blue and 'too high' values to red. Here I use findInterval to differentiate between low, mid and high values.
# some data
set.seed(2)
df <- data.frame(x = factor(1:10), y = 1, z = sample(1:10))
# lower and upper limits
lo <- 3
hi <- 8
# create a grouping variable based on the break points
df$grp <- findInterval(df$z, c(lo, hi), rightmost.closed = TRUE)
ggplot(df, aes(x = x, y = y, fill = z)) +
geom_bar(stat = "identity") +
scale_fill_gradient(low = "yellow", high = "green", limits = c(lo, hi), na.value = "red") +
geom_bar(data = df[df$grp == 0, ], fill = "blue", stat = "identity")
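To check what findInterval returns at the cutpoints themselves, here is a small sketch with hand-picked values (the vector z is made up for illustration):

```r
lo <- 3
hi <- 8
z <- c(1, 3, 5, 8, 9)

# 0 = below lo, 1 = within [lo, hi], 2 = above hi
grp <- findInterval(z, c(lo, hi), rightmost.closed = TRUE)
grp
```

With rightmost.closed = TRUE a value exactly equal to hi is still counted as "mid" (group 1); without it, 8 would fall into group 2.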

Resources