ggplot2::scale_fill_gradient in R - custom colours

Help needed please with scale_fill_gradient() in ggplot2. Consider this simple example:
library(tidyverse)
set.seed(3)
dat <- as.data.frame(matrix(runif(100, 0, 1), ncol = 10))
# turn from wide to long
dat2 <- dat %>%
  rownames_to_column("model_1") %>%
  gather(model_2, value, -model_1) %>%
  mutate(model_1 = model_1 %>% factor(levels = 1:10),
         model_2 = factor(gsub("V", "", model_2), levels = 1:10))
## plot data
ggplot(dat2, aes(model_1, model_2)) +
  geom_tile(aes(fill = value)) +
  geom_text(aes(label = round(value, 2))) +
  scale_fill_gradient(low = "yellow", high = "white")
What I'd like to do is to have custom shading rules. For example, something like:
- value < 0.05: bright yellow
- value >= 0.05 & value < 0.15: pale yellow
- value >= 0.15: white
- is.na(value): white
Is there any easy way to do this please? I'm afraid I don't really follow the documentation for scale_fill_gradient().
Thank you.

You can try adding the colours manually with case_when() and then mapping them with scale_fill_identity():
dat2 <- dat2 %>%
  mutate(
    my_color = case_when(
      value < 0.05 ~ "#FFFF00",                  # bright yellow
      value >= 0.05 & value < 0.15 ~ "#FFFFCC",  # pale yellow
      TRUE ~ "#FFFFFF"                           # white, which also catches NA values
    )
  )
ggplot(dat2, aes(model_1, model_2)) +
  geom_tile(aes(fill = my_color)) +
  geom_text(aes(label = round(value, 2))) +
  scale_fill_identity()
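If you would rather keep the thresholds in one place, the same three rules (plus the NA case) can also be expressed as breaks for base R's cut(). The helper name bin_fill() is hypothetical; this is a minimal sketch, not part of ggplot2:

```r
# Hypothetical helper: turn the shading rules into hex colours with cut()
bin_fill <- function(value) {
  cols <- cut(value,
              breaks = c(-Inf, 0.05, 0.15, Inf),
              right = FALSE,  # intervals [a, b): 0.05 is pale yellow, 0.15 is white
              labels = c("#FFFF00", "#FFFFCC", "#FFFFFF"))
  cols <- as.character(cols)
  cols[is.na(cols)] <- "#FFFFFF"  # missing values are drawn white
  cols
}

bin_fill(c(0.01, 0.05, 0.1, 0.15, 0.9, NA))
```

With dat2 from the question, geom_tile(aes(fill = bin_fill(value))) together with scale_fill_identity() should give the same picture as the case_when() version.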

Related

How to draw a multi-colored dashed line (alternating colors for visual effect) [duplicate]

This question already has answers here:
Alternating color of individual dashes in a geom_line
(4 answers)
I was wondering if it is possible to create a multicolored dashed line in ggplot.
Basically I have a plot displaying savings based on two packages.
- An orange line with savings based on package A
- A green line with savings based on package B
I also have a third line, and I would like that one to be dashed, alternating between orange and green. Is that something anybody has been able to do?
Here is an example:
library(tidyverse)
S <- seq(0, 5, by = 0.05)
a <- S ^ 2
b <- S
a_b = a + b #This data should have the dashed multicolor line, since it is the sum of the other two lines.
S <- data.frame(S)
temp <- cbind(S, a, b, a_b)
temp <- gather(temp, variable, value, -S)
desiredOrder <- c("a", "b", "a_b")
temp$variable <- factor(temp$variable, levels = desiredOrder)
temp <- temp[order(temp$variable),]
p <- ggplot(temp, aes(x = S, y = value, colour = variable)) +
  theme_minimal() +
  geom_line(size = 1) +
  scale_color_manual(name = "Legend", values = c("orange", "green", "#0085bd"),
                     breaks = c("a", "b", "a_b"))
p
I basically want to have a multicolored (dashed or dotted) line for "a_b".
To the best of my knowledge, this is currently only possible by creating new segments for each alternating color. This is fiddly.
Below I've tried a largely programmatic approach in which you can define the length of the repeating segment (in units of x). The positioning of the y values is slightly convoluted, and it will also result in slightly irregular segment lengths for different slopes. I also haven't tested it on much data. But I guess it's a good start :)
For the legend, I'm taking the same approach, creating a fake legend and stitching it onto the other plot. The challenges here include:
- positioning the legend elements relative to the plot
- the relative distance between the legend elements
Update: for a much neater way to create those segments, and a Stat implementation, see this thread.
library(tidyverse)
library(patchwork)
S <- seq(0, 5, by = 0.05)
a <- S^2
b <- S
a_b <- a + b
df <- data.frame(x = S, a, b, a_b) %>%
  pivot_longer(-x, names_to = "variable", values_to = "value")
## a function to create modifiable cuts in order to get segments.
## this looks convoluted - and it is! there are a few if/else statements.
## Why? The assignment of new y to x values depends on how many original values
## you have.
## There might be more direct ways to get there
alt_colors <- function(df, x, y, seg_length, my_cols) {
  x <- df[[x]]
  y <- df[[y]]
  ## create new x for each tiny segment
  length_seg <- seg_length / length(my_cols)
  new_x <- seq(min(x, na.rm = TRUE), x[length(x)], length_seg)
  ## now we need to interpolate y values for each new x
  ## This is different depending on how many x and new x you have
  if (length(new_x) < length(x)) {
    ind_int <- findInterval(new_x, x)
    new_y <- sapply(seq_along(ind_int), function(i) {
      if (y[ind_int[i]] == y[ind_int[length(ind_int)]]) {
        y[ind_int[i]]
      } else {
        seq_y <- seq(y[ind_int[i]], y[ind_int[i] + 1], length.out = length(my_cols))
        head(seq_y, -1)
      }
    })
  } else {
    ind_int <- findInterval(new_x, x)
    rle_int <- rle(ind_int)
    new_y <- sapply(rle_int$values, function(i) {
      if (y[i] == y[max(rle_int$values)]) {
        y[i]
      } else {
        seq_y <- seq(y[i], y[i + 1], length.out = rle_int$lengths[i] + 1)
        head(seq_y, -1)
      }
    })
  }
  ## This is also a bit painful and might cause other bugs that I haven't
  ## discovered yet.
  if (length(unlist(new_y)) < length(new_x)) {
    newdat <- data.frame(
      x = new_x,
      y = rep_len(unlist(new_y), length.out = length(new_x))
    )
  } else {
    newdat <- data.frame(x = new_x, y = unlist(new_y))
  }
  newdat <- newdat %>%
    mutate(xend = lead(x), yend = lead(y)) %>%
    drop_na(xend)
  ## recycle the colours explicitly: a plain data frame assignment fails when
  ## the number of segments is not a multiple of length(my_cols)
  newdat$color <- rep_len(my_cols, nrow(newdat))
  newdat
}
## the below is just a demonstration of how the function would work
## using different segment widths
df_alt1 <-
  df %>%
  filter(variable == "a_b") %>%
  alt_colors("x", "value", 1, c("orange", "green"))
df_alt.5 <-
  df %>%
  filter(variable == "a_b") %>%
  alt_colors("x", "value", .5, c("orange", "green"))
df_ab <-
  df %>%
  filter(variable != "a_b") %>%
  # for the identity mapping
  mutate(color = ifelse(variable == "a", "green", "orange"))
## create data frame for the legend, also using the alt_colors function as per above
## the amount of x is a bit of trial and error, this is just a quick hack
## this is a trick to center the legend more or less relative to the main plot
y_leg <- ceiling(mean(range(df$value, na.rm = TRUE)))
dist_y <- 2
df_legend <-
  data.frame(
    variable = rep(unique(df$variable), each = 2),
    x = 1:2,
    y = rep(seq(y_leg - dist_y, y_leg + dist_y, by = dist_y), each = 2)
  )
df_leg_onecol <-
  df_legend %>%
  filter(variable != "a_b") %>%
  mutate(color = ifelse(variable == "a", "green", "orange"))
df_leg_alt <-
  df_legend %>%
  filter(variable == "a_b") %>%
  alt_colors("x", "y", .5, c("orange", "green"))
## I am mapping the colors globally using identity mapping (see scale_identity).
p1 <-
  ggplot(mapping = aes(x, value, colour = color)) +
  theme_minimal() +
  geom_line(data = df_ab, size = 1) +
  geom_segment(data = df_alt1, aes(y = y, xend = xend, yend = yend), size = 1) +
  scale_color_identity() +
  ggtitle("alternating every 1 unit")
p.5 <-
  ggplot(mapping = aes(x, value, colour = color)) +
  theme_minimal() +
  geom_line(data = df_ab, size = 1) +
  geom_segment(data = df_alt.5, aes(y = y, xend = xend, yend = yend), size = 1) +
  scale_color_identity() +
  ggtitle("alternating every .5 unit")
p_leg <-
  ggplot(mapping = aes(x, y, colour = color)) +
  theme_void() +
  geom_line(data = df_leg_onecol, size = 1) +
  geom_segment(data = df_leg_alt, aes(xend = xend, yend = yend), size = 1) +
  scale_color_identity() +
  annotate(
    geom = "text", y = unique(df_legend$y), label = unique(df_legend$variable),
    x = max(df_legend$x + 1), hjust = 0
  )
## set y limits to the range of the main plot
## in order to make the labels visible you need to adjust the plot margin and
## turn clipping off
p1 + p.5 +
  (p_leg + coord_cartesian(ylim = range(df$value), clip = "off") +
     theme(plot.margin = margin(r = 20, unit = "pt"))) +
  plot_layout(widths = c(1, 1, .2))
Created on 2022-01-18 by the reprex package (v2.0.1)
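The core idea, evaluating the line on a regular grid of x and joining consecutive grid points with segments whose colours cycle, can also be sketched with base R's approx() doing the linear interpolation. This is a swapped-in technique, not the answer's alt_colors(), and alt_segments() is a hypothetical name. Like the original, it uses a fixed step along x, so segments on steep parts of the line come out longer:

```r
# Hypothetical helper (not the answer's alt_colors()): split a polyline into
# segments spanning `step` units of x and cycle the colours over them.
alt_segments <- function(x, y, step, cols) {
  grid <- approx(x, y, xout = seq(min(x), max(x), by = step))  # linear interpolation
  n <- length(grid$x)
  data.frame(
    x = grid$x[-n], y = grid$y[-n],        # segment start points
    xend = grid$x[-1], yend = grid$y[-1],  # segment end points
    color = rep_len(cols, n - 1),          # colours alternate along the line
    stringsAsFactors = FALSE
  )
}

S <- seq(0, 5, by = 0.05)
segs <- alt_segments(S, S^2 + S, step = 0.5, cols = c("orange", "green"))
head(segs)
```

The result has the same x, y, xend, yend and color columns as alt_colors(), so it should drop into the geom_segment() + scale_color_identity() calls above.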
(Copied this over from Alternating color of individual dashes in a geom_line)
Here's a ggplot hack that is simple, but works for two colors only. It overlays two lines on top of each other: one solid, the other dashed.
library(dplyr)
library(ggplot2)
library(reshape2)
# Create df
x_value <- 1:10
group1 <- c(0,1,2,3,4,5,6,7,8,9)
group2 <- c(0,2,4,6,8,10,12,14,16,18)
dat <- data.frame(x_value, group1, group2) %>%
  mutate(group2_2 = group2) %>% # Duplicate the column that you want to be alternating colors
  melt(id.vars = "x_value", variable.name = "group", value.name = "y_value") # Long format
# Put in your selected order
dat$group <- factor(dat$group, levels = c("group1", "group2", "group2_2"))
# Plot
ggplot(dat, aes(x = x_value, y = y_value)) +
  geom_line(aes(color = group, linetype = group), size = 1) +
  scale_color_manual(values = c("black", "red", "black")) +
  scale_linetype_manual(values = c("solid", "solid", "dashed"))
Unfortunately the legend still needs to be edited by hand. Here's the example plot.

Plotting multiple plots with two discrete variables - how to include all discrete variables in both axes

I have a dataset that looks like this:
test <- data.frame("M" = c("a","b","c","a","b","b","c","a","b","c"),
                   "N" = c(1,3,4,6,6,7,7,8,8,8),
                   "X" = c(0,1,0,1,1,0,1,0,1,1),
                   "Y" = c(1,1,0,0,1,0,1,1,1,0))
I'm making a simple plot with X and Y on the y axis and M on the x axis, where each cell is colored if the value of X or Y is 1 and left empty if it is 0. I'm repeating this for each category of N (the categories of N are 1 to 5, 6, 7 and 8), then stacking all the plots together. Right now, I'm doing this with the following code.
library(tidyverse)
library(gridExtra) # for grid.arrange()
test <- test[order(test$N),]
test1 <- test[c(1:3),]
test2 <- test[c(4:5),]
test3 <- test[c(6:7),]
test4 <- test[c(8:10),] # I'm doing this to "separate" categories of `N` manually
p1 <- test1[,c(1,3:4)] %>%
  gather(col_name, value, -M) %>%
  ggplot(aes(factor(M), col_name, fill = value == 1)) +
  geom_tile(colour = 'black') +
  scale_fill_manual(values = c('FALSE' = 'white', 'TRUE' = 'red'))
p2 <- test2[,c(1,3:4)] %>%
  gather(col_name, value, -M) %>%
  ggplot(aes(factor(M), col_name, fill = value == 1)) +
  geom_tile(colour = 'black') +
  scale_fill_manual(values = c('FALSE' = 'white', 'TRUE' = 'yellow'))
p3 <- test3[,c(1,3:4)] %>%
  gather(col_name, value, -M) %>%
  ggplot(aes(factor(M), col_name, fill = value == 1)) +
  geom_tile(colour = 'black') +
  scale_fill_manual(values = c('FALSE' = 'white', 'TRUE' = 'green'))
p4 <- test4[,c(1,3:4)] %>%
  gather(col_name, value, -M) %>%
  ggplot(aes(factor(M), col_name, fill = value == 1)) +
  geom_tile(colour = 'black') +
  scale_fill_manual(values = c('FALSE' = 'white', 'TRUE' = 'blue'))
grid.arrange(p1, p2, p3, p4, ncol = 1)
I'm attaching an image of what I have right now. I want to fix these plots so that all four have the same levels of M: right now, only p1 and p4 have all three levels (a, b and c) on the x axis, and I want to add level c to p2 and level a to p3 so that all x axes are identical. Can anyone give me suggestions on how to do this?
(Also, I suspect that the way I'm plotting things is probably not the quickest/easiest way to go; if anyone has suggestions on how to improve things, it'd be really helpful!)
To continue using grid.arrange(), instead of facet_wrap(), do the following:
Make M a factor:
test$M <- factor(test$M)
Add the following to each of your plots:
scale_x_discrete(limits = levels(test$M))
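Why the limits are needed: your plots map factor(M), and calling factor() on a subset drops the levels that do not occur in it, even when test$M is already a factor. A small base R illustration with made-up data:

```r
M <- factor(c("a", "b", "c", "a", "b"))
sub <- M[4:5]            # like test2: only "a" and "b" present

levels(sub)              # subsetting keeps all levels: "a" "b" "c"
levels(factor(sub))      # factor() drops unused ones: "a" "b"
levels(droplevels(sub))  # same effect as factor() here
```

So either keep scale_x_discrete(limits = levels(test$M)), or map M itself (not factor(M)) after converting it to a factor and add scale_x_discrete(drop = FALSE), which should keep unused levels on the axis.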
One approach I can suggest is to use facets, after applying a trick to group your values so that you avoid splitting them into separate data frames. Here is the code as an option for you (the colors will be the same across the facets, based on the TRUE/FALSE values):
library(tidyverse)
#Code
# Label a row with its N when the next row has the same N (the first row is
# always labelled), then fill() carries the labels down to the unlabelled rows
test %>% mutate(Var=lead(N)) %>%
  mutate(Diff=Var-N,Diff=ifelse(row_number()==1,0,Diff)) %>%
  mutate(Group=ifelse(Diff==0,N,NA)) %>%
  fill(Group) %>% select(-c(N,Var,Diff)) %>%
  group_by(Group) %>% mutate(NG=paste0('p',cur_group_id())) %>% ungroup() %>%
  select(-Group) %>%
  pivot_longer(cols = -c(NG,M)) %>%
  ggplot(aes(factor(M), name, fill = value == 1,group=value)) +
  geom_tile(colour = 'black') +
  facet_wrap(.~NG,ncol = 1) +
  scale_fill_manual('value',values=c('tomato','cyan3')) +
  xlab('M')
The other option would be to use patchwork with a custom function:
library(tidyverse)
library(patchwork)
#Code
data <- test %>% mutate(Var=lead(N)) %>%
  mutate(Diff=Var-N,Diff=ifelse(row_number()==1,0,Diff)) %>%
  mutate(Group=ifelse(Diff==0,N,NA)) %>%
  fill(Group) %>% select(-c(N,Var,Diff)) %>%
  group_by(Group) %>% mutate(NG=paste0('p',cur_group_id())) %>% ungroup() %>%
  select(-Group) %>%
  mutate(M=factor(M,levels = unique(M),ordered = TRUE)) %>%
  pivot_longer(cols = -c(NG,M))
#List
List <- split(data,data$NG)
#Function
myfun <- function(x)
{
  #Test for color
  val <- unique(x$NG)
  #Conditioning for color
  if(val=='p1') {vcolor=c('FALSE' = 'white', 'TRUE' = 'red')} else
    if(val=='p2') {vcolor=c('FALSE' = 'white', 'TRUE' = 'yellow')} else
      if(val=='p3') {vcolor=c('FALSE' = 'white', 'TRUE' = 'green')} else
      {vcolor=c('FALSE' = 'white', 'TRUE' = 'blue')}
  #Update data: add rows for any level of M missing from this panel
  x <- x %>% mutate(M=factor(M,levels = c('a','b','c'),ordered = TRUE)) %>% complete(M=M)
  #Plot
  G <- ggplot(x,aes(factor(M), name, fill = (value == 1 & !is.na(value)))) +
    geom_tile(colour = 'black') +
    scale_fill_manual('value',values=vcolor) +
    xlab('M') +
    scale_y_discrete(limits=c('X','Y')) +
    theme_bw() +
    ggtitle(val)
  return(G)
}
#Apply
Lplot <- lapply(List,myfun)
#Wrap
GF <- wrap_plots(Lplot,ncol = 1)
GF
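The if/else chain in myfun() could also be replaced by a named lookup vector. A minimal sketch; true_cols and fill_values() are hypothetical names, with blue kept as the fallback for any other panel, as in the original:

```r
# Hypothetical lookup replacing the if/else chain in myfun()
true_cols <- c(p1 = "red", p2 = "yellow", p3 = "green")

fill_values <- function(panel) {
  col <- unname(true_cols[panel])
  if (is.na(col)) col <- "blue"  # any other panel falls back to blue
  c("FALSE" = "white", "TRUE" = col)
}

fill_values("p2")
fill_values("p4")
```

Inside myfun(), vcolor <- fill_values(val) would then replace the four branches.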
Something like this?
test <- data.frame("M" = c("a","b","c","a","b","b","c","a","b","c"),
                   "N" = c(1,3,4,6,6,7,7,8,8,8),
                   "X" = c(0,1,0,1,1,0,1,0,1,1),
                   "Y" = c(1,1,0,0,1,0,1,1,1,0))
library(tidyverse)
test = mutate(test, N2 = cut(N, breaks = c(0,5:100)))
m = pivot_longer(test, c(X, Y))
ggplot(m, aes(M, name, fill = factor(value))) +
  geom_tile(colour = 'black') +
  # fixed scales, so every panel keeps a, b and c on the x axis
  facet_wrap(~N2) +
  scale_fill_manual(values = c(`0` = 'white', `1` = 'red'))
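The cut() call is what groups N into the categories 1 to 5, 6, 7 and 8. With explicit breaks and labels, the facet strips can read as those categories instead of intervals like (0,5]. A base R check of the grouping:

```r
N <- c(1, 3, 4, 6, 6, 7, 7, 8, 8, 8)

# Right-closed intervals (0,5], (5,6], (6,7], (7,8], named after the categories
N2 <- cut(N, breaks = c(0, 5, 6, 7, 8), labels = c("1-5", "6", "7", "8"))
table(N2)
```

Values above 8 would become NA with these breaks, so extend them if N can be larger.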

Color aes disconnects geom_line()

I'm trying to make a plot, and show different colors when p > 0.5, but when I use the color aes, the line appears to be disconnected.
library(tidyverse)
data <- tibble(n = 1:365)
prob <- function (x) {
  pr <- 1
  # seq_len(x)[-1] is empty for x = 1, whereas 2:1 would loop over c(2, 1)
  for (t in seq_len(x)[-1]) {
    pr <- pr * ((365 - t + 1) / 365)
  }
  return(1 - pr)
}
data %>%
  mutate(prob = map_dbl(n, prob)) %>%
  filter(n < 100) %>%
  ggplot(aes(x = n, y = prob, color = prob > 0.5)) + geom_line() +
  scale_x_continuous(breaks = seq(0,100,10))
Does anyone know why? Removing the color aesthetic gives a single, unbroken line.
This is because prob is only evaluated at whole values of n, and the condition prob > 0.5 splits your data into two groups with a gap between them: the first group has max(prob) = .476 and the second has min(prob) = .507. Because color is mapped to a discrete variable, ggplot draws a separate line for each group, so the (vertical) gap on the line plot is the gap between these numbers.
You can see this if you filter the modified data for values close to .5:
data %>%
  mutate(prob = map_dbl(n, prob)) %>%
  filter(n < 100) %>%
  filter(between(prob, .4, .6))
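The same jump can be checked numerically without plotting. Below, prob() is computed in vectorised form with cumprod(), which should match the loop for n >= 2, since P(n) = 1 - prod((365 - k) / 365) over k = 0, ..., n - 1. rle() then shows that prob > 0.5 splits the 99 points into two consecutive runs, which ggplot draws as two separate lines:

```r
n_max <- 99
# P(at least two of n people share a birthday), for n = 1, ..., n_max
p <- 1 - cumprod((365 - 0:(n_max - 1)) / 365)

round(p[22:23], 3)    # 0.476 0.507: last point below 0.5, first point above
rle(p > 0.5)$lengths  # two runs: n = 1..22 and n = 23..99
```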
If we modify your example:
data2 <- data %>%
  mutate(prob = map_dbl(n, prob)) %>%
  filter(n < 100)
#bringing extremes closer together
data2$prob[22] <- .49999999999999
data2$prob[23] <- .50000000000001
data2 %>%
  ggplot(aes(x = n, y = prob, color = prob >= 0.5)) + geom_line() +
  scale_x_continuous(breaks = seq(0,100,10))
The gap becomes significantly smaller:
However, it is still present (mostly horizontally), because x is also discrete: one group ends at n = 22 and the other starts at n = 23.
A simple way of fixing this is to add the dummy aesthetic group = 1 inside aes(), which overrides the default grouping by the discrete color variable, so all points are joined into a single line.
data %>%
  mutate(prob = map_dbl(n, prob)) %>%
  filter(n < 100) %>%
  #add 'group = 1' below
  ggplot(aes(x = n, y = prob, color = prob >= 0.5, group = 1)) + geom_line() +
  scale_x_continuous(breaks = seq(0,100,10))

dplyr find minimum in x,y-plot

To find and plot a minimum in an X,Y plot, I wrote the working code below.
However, I find it ugly and would expect something more elegant. In reality, the tibble cf contains more than two variables; I reduced it to two.
Can anyone provide me with more elegant code?
thx!!
=========
library(tidyverse)
cf <- tibble(x = seq(0,pi,0.1), y = abs(cos(x)))
xlim <- cf %>% filter(y < 0.5) %>%
  arrange(desc(y)) %>%
  top_n(1) %>%
  select(x)
ggplot(cf, aes(x, y)) +
  geom_point(size = 1, colour = "blue") +
  geom_hline(colour = "red", size = 1.2, yintercept = 0.5) +
  geom_vline(colour = "#99CCFF", size = 1, xintercept = as.numeric(xlim[1])) +
  labs(title = paste0("Y < 0.5 for X > ", as.numeric(xlim[1])))
Sorry for not being explicit enough. Previously, I used the base R code below:
if (length(cf[cf$y > 0.5, c("x")]) > 0) {
  xlim <- max(round(cf[cf$y > 0.5, c("x")],1))
} else {
  xlim <- min(round(cf$x,1))
}
which I re-implemented using dplyr. I wonder whether I chose the best dplyr implementation, as it seems so lengthy to me:
xlim <- cf %>% filter(y < 0.5) %>%
  arrange(desc(y)) %>%
  top_n(1) %>%
  select(x)
I think the most straightforward way of doing that with dplyr is as follows:
xl <- cf %>% filter(y < 0.5) %>%
  filter(y == max(y)) %>%
  select(x) %>%
  as.numeric()
Note that because of the last step, as.numeric(), you can use the value directly as a scalar:
ggplot(cf, aes(x, y)) +
  geom_point(size = 1, colour = "blue") +
  geom_hline(colour = "red", size = 1.2, yintercept = 0.5) +
  geom_vline(colour = "#99CCFF", size = 1, xintercept = xl) +
  labs(title = paste0("Y < 0.5 for X > ", xl))
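With dplyr 1.0.0 or later, slice_max() and pull() can shorten this a little further; pull() returns the column as a plain vector, so the as.numeric() step is not needed. A minimal sketch of that variant:

```r
library(dplyr)

cf <- tibble(x = seq(0, pi, 0.1), y = abs(cos(x)))

# Row with the largest y below the threshold, returned as a single number
xl <- cf %>%
  filter(y < 0.5) %>%
  slice_max(y, n = 1, with_ties = FALSE) %>%
  pull(x)

xl
```

xl can then go straight into geom_vline(xintercept = xl) and the title, as above.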

How to highlight bin of observation in ggplot?

How can I use ggplot to highlight the entire bar into which the observations obs.A and obs.B, respectively, fall? The exact same thing has been done for the regular hist() function, but what is the ggplot way?
Below is some code to illustrate:
library(ggplot2)
data <- data.frame(Var=c(rep("A",50),rep("B",50)), Value=round(rnorm(100,10,2),0))
obs.A<-8
obs.B<-10
ggplot(data) +
  geom_histogram(aes(x=Value)) +
  facet_grid(Var ~ .)
Edit: It needs to work for both large and small sample sizes, and it should highlight exactly one bar, and the whole of that bar.
One ggplot way is to build it into the data frame used for plotting:
library(ggplot2)
data <- data.frame(Var=c(rep("A",50),rep("B",50)), Value=round(rnorm(100,10,2),0))
obs.A<-8
obs.B<-10
data$color <- ifelse(data$Var == "A" & data$Value == obs.A, T, F)
data$color <- ifelse(data$Var == "B" & data$Value == obs.B, T, data$color)
ggplot(data) +
  geom_histogram(aes(x=Value, fill = color)) +
  facet_grid(Var ~ .)
Note that this works easily for your test case because the range of data$Value is 5-16 and the default for geom_histogram() is bins = 30. If you want to make it more transferable, you would set geom_histogram(binwidth = 1) or set data$color based on bins, something like this:
data <- data.frame(Var=c(rep("A",50),rep("B",50)), Value=round(rnorm(100,10,10),0)) # bigger sd
obs.A<-8
obs.B<-10
data$cuts <- cut(data$Value, 30, labels = F)
A_colored_cuts <- unique(data$cuts[data$Value == obs.A])
data$color <- ifelse(data$Var == "A" & data$cuts == A_colored_cuts, T, F)
B_colored_cuts <- unique(data$cuts[data$Value == obs.B])
data$color <- ifelse(data$Var == "B" & data$cuts == B_colored_cuts, T, data$color)
ggplot(data) +
  geom_histogram(aes(x=Value, fill = color)) +
  facet_grid(Var ~ .)
EDIT: For larger sample sizes, we would want to use the second option outlined above and specify geom_histogram(boundary = .5), which places a bin boundary at 0.5 rather than on an integer value.
set.seed(1)
data <- data.frame(Var=c(rep("A",50),rep("B",50)), Value=round(rnorm(10000,10,10),0))
#use code chunk 2 above
ggplot(data) +
  geom_histogram(aes(x=Value, fill = color), boundary = .5) +
  facet_grid(Var ~ .)
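The reason half-integer boundaries help: with integer-valued data and bins one unit wide, putting the bin edges at half-integers places each integer alone in the middle of its own bin, so a highlighted value fills exactly one bar. Base R's hist() with explicit half-integer breaks illustrates that binning on a few made-up values:

```r
vals <- c(7, 8, 8, 9, 10, 10, 10)

# Edges at 6.5, 7.5, ..., 10.5: one bin per integer
breaks <- seq(min(vals) - 0.5, max(vals) + 0.5, by = 1)
h <- hist(vals, breaks = breaks, plot = FALSE)

h$mids    # 7 8 9 10
h$counts  # 1 2 1 3
```

In geom_histogram() this layout should correspond to combining binwidth = 1 with boundary = .5.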
Another option is to put the condition directly into the fill aesthetic inside geom_histogram(). We remove the oversized legend with theme(legend.position = "none"):
# Example 1
set.seed(12345)
data <- data.frame(Var = c(rep("A", 50), rep("B", 50)), Value = round(rnorm(100, 10, 2), 0))
ggplot(data) +
  geom_histogram(aes(x = Value,
                     fill = Value == 8 & Var == "A" | Value == 10 & Var == "B"), binwidth = 0.5) +
  facet_grid(Var ~ .) +
  theme(legend.position = "none")
# Example 2
set.seed(12345)
data <- data.frame(Var = c(rep("A", 50), rep("B", 50)), Value = round(rnorm(10000, 10, 10), 0))
ggplot(data) +
  geom_histogram(aes(x = Value,
                     fill = Value == 8 & Var == "A" | Value == 10 & Var == "B"), binwidth = 0.5) +
  facet_grid(Var ~ .) +
  theme(legend.position = "none")
If we want to assign different colours, we can use scale_fill_manual():
# Example3
set.seed(12345)
data <- data.frame(Var = c(rep("A", 50), rep("B", 50)), Value = round(rnorm(100, 10, 2), 0))
ggplot(data) +
  geom_histogram(aes(x = Value,
                     fill = Value == 8 & Var == "A" | Value == 10 & Var == "B"), binwidth = 0.5) +
  facet_grid(Var ~ .) +
  scale_fill_manual(values = c("grey45", "red")) +
  theme(legend.position = "none")
We can try this to get the desired highlighting for any size of data (we may need to adjust the binwidth to the data size for the bar chart to look prettier and more informative):
library(ggplot2)
set.seed(12345)
data <- data.frame(Var=c(rep("A",50),rep("B",50)), Value=round(rnorm(100,10,2),0))
obs.A<-8
obs.B<-10
cond <- (data$Var=='A' & data$Value == obs.A)|(data$Var=='B' & data$Value == obs.B)
binwidth <- 0.25
ggplot(data) +
  geom_histogram(data=data[!cond,], aes(x=Value), binwidth=binwidth) +
  geom_histogram(data=data[cond,], aes(x=Value), fill='red', binwidth=binwidth) +
  facet_grid(Var ~ .)
# The same approach with a larger sample
set.seed(12345)
data <- data.frame(Var=c(rep("A",50),rep("B",50)), Value=round(rnorm(10000,10,10),0))
obs.A<-8
obs.B<-10
cond <- (data$Var=='A' & data$Value == obs.A)|(data$Var=='B' & data$Value == obs.B)
binwidth <- 0.5
ggplot(data) +
  geom_histogram(data=data[!cond,], aes(x=Value), binwidth=binwidth) +
  geom_histogram(data=data[cond,], aes(x=Value), fill='red', binwidth=binwidth) +
  facet_grid(Var ~ .)
