Color of levelplot in R

I have the following code. It produces a levelplot in which squares with values less than 0 should be colored in a red hue and squares with values greater than 0 in a blue hue. Squares with a value of exactly 0 should be colored white. However, nothing ends up being white; all three squares in the first column (where z = 0) should be white. How can I fix this?
library(lattice)
cc = colorRampPalette( c("red", "white","blue"))
trellis.par.set(regions=list(col=cc(20)))
x = c(1,2,3,1,2,3,1,2,3)
y = c(1,1,1,2,2,2,3,3,3)
z = c(0,-2,-3,0,2,3,0,1,-1)
df = data.frame(x,y,z)
p <- levelplot(z ~ x * y, df,
               panel = function(...) {
                 arg <- list(...)
                 panel.levelplot(...)
               })
print(p)
Update:
Here is a reproducible example that attempts to fix it, but still isn't quite right:
Here is a dataframe df:
x y z
1 1 1 -0.17457167
2 2 1 0.93407856
3 3 1 0.55129545
4 4 1 0.97388216
5 5 1 -1.00000000
6 6 1 0.52883410
7 7 1 -1.00000000
8 8 1 0.85112829
9 9 1 -1.00000000
10 10 1 1.00000000
11 11 1 -0.87714166
12 12 1 1.00000000
13 13 1 -0.95403260
14 14 1 1.00000000
15 15 1 -0.91600501
16 16 1 1.00000000
17 17 1 -1.00000000
18 18 1 -0.38800669
19 19 1 -0.52110322
20 20 1 0.00000000
21 21 1 -0.08211450
22 22 1 0.55390723
23 23 1 1.00000000
24 24 1 -0.04147514
25 25 1 -1.00000000
26 26 1 -0.39751358
27 27 1 -0.99550773
28 28 1 0.00000000
29 29 1 0.20737568
30 30 1 0.00000000
31 31 1 0.00000000
32 32 1 0.00000000
33 33 1 -0.26702883
And then here is the code:
cc = colorRampPalette( c("red", "white","blue"))
trellis.par.set(regions=list(col=cc(21)))
zrng <- range(z) # what's the range of z
tol <- 1e-2 # what tolerance is necessary?
colorBreaks <- c(
  seq(zrng[1] - 0.01, 0 - tol, length.out = 11),
  seq(0 + tol, zrng[2] + 0.01, length.out = 10))
p <- levelplot(z ~ x * y, df,
               at = colorBreaks,
               panel = function(...) {
                 arg <- list(...)
                 panel.levelplot(...)
               })
print(p)
It produces this plot, which does not have a slot for the color white in the spectrum:

As thelatemail pointed out, cc(20) will never produce white ("#FFFFFF"). You need an odd number of colors for the middle value of the color ramp to be represented exactly (compare cc(3) with cc(4)).
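For example, a quick check (the hex codes shown are what colorRampPalette() returns on a typical setup):
cc(3)
# [1] "#FF0000" "#FFFFFF" "#0000FF"   # the middle color is exactly white
cc(4)
# [1] "#FF0000" "#FFAAAA" "#AAAAFF" "#0000FF"   # white is skipped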
Next, you need to set the at argument of levelplot, which defines the breakpoints between colors. The default is at = pretty(z):
#[1] -3 -2 -1 0 1 2 3
But you don't want 0 to be a breakpoint. You want it to have its own color and to align with the middle of the color ramp.
You can achieve that by setting breakpoints as close to 0 as necessary (within some tolerance tol) so that no other value maps to white. The rough idea is to leave a narrow slot for 0, e.g. at = c(seq(-3.01, -0.00001, length.out = 11), seq(0.00001, 3.01, length.out = 11)), or the equivalent construction shown below. Because the color ramp has an odd number of colors, the at vector needs an even number of values: n colors always require n + 1 breakpoints, so 21 colors need 22 breaks, and the middle interval then straddles 0.
trellis.par.set(regions = list(col = cc(21)))
# Define a sequence of breaks for the at argument to levelplot.
zrng <- range(z)  # what's the range of z
tol <- 1e-5       # what tolerance is necessary?
colorBreaks <- c(
  seq(zrng[1] - 0.01,  # adding a small buffer on each end
      0 - tol,
      length.out = 11),
  seq(0 + tol,
      zrng[2] + 0.01,
      length.out = 11))
# Note: I chose length.out = 11 on each side.
# Don't use more than roughly ceiling((# of colors) / 2).
p <- levelplot(z ~ x * y, df,
               at = colorBreaks,
               panel = function(...) {
                 arg <- list(...)
                 panel.levelplot(...)
               })
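As a quick sanity check (my own addition, not part of the original answer): the at vector should contain one more value than the number of colors, and printing the plot should now show a white band at 0.
length(colorBreaks)  # 22 breaks -> 21 intervals, one per color in cc(21)
print(p)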

Related

Find hexadecimal color for a specific value, given a color scale

I'm struggling with color coding and was hoping you could help.
Here is my issue. I have a dummy dataset:
df <- data.frame(x = 1:10, y = sample(1:100, 10, replace = FALSE, set.seed(2021)))
I first want to plot these data, using a specific color scale:
library(ggplot2)
library(colorspace)  # provides scale_fill_continuous_divergingx()
ggplot(data = df, aes(x = x, y = y, fill = y)) +
  geom_point(shape = 21) +
  scale_fill_continuous_divergingx(palette = "RdBu",
                                   mid = 50,
                                   rev = TRUE)
Now I would like to use the color that corresponds to, let's say, x = 6 (i.e. y = 70) for another plot. Given certain constraints, I cannot make just another simple ggplot with the same scale to do this, but would instead need to 'hardcode' the hexadecimal value of that specific color, i.e. #e99c8f.
Is there a way to do this, so that I can just then use fill = "#e99c8f" in my other plot?
Hardcoding hex values is easy for certain color scales, e.g. viridis, but I haven't found a way to do it with this one, which I need... :/
Thanks for your help!
Try the ggplot_build() function to get a data.frame of your data for each layer, and see which color each point maps to (the fill column):
R> p <- ggplot(data = df, aes(x = x, y = y, fill = y)) +
+ geom_point(shape = 21) +
+ scale_fill_continuous_divergingx(palette = "RdBu",
+ mid = 50,
+ rev = TRUE)
R> x <- ggplot_build(p)
R> x$data
[[1]]
fill x y PANEL group shape colour size alpha stroke
1 #00578C 1 7 1 -1 21 black 1.5 NA 0.5
2 #CBDEEB 2 38 1 -1 21 black 1.5 NA 0.5
3 #ECF1F5 3 46 1 -1 21 black 1.5 NA 0.5
4 #F3D9D6 4 58 1 -1 21 black 1.5 NA 0.5
5 #0772AC 5 12 1 -1 21 black 1.5 NA 0.5
6 #E69C90 6 70 1 -1 21 black 1.5 NA 0.5
7 #EEBCB4 7 64 1 -1 21 black 1.5 NA 0.5
8 #611300 8 99 1 -1 21 black 1.5 NA 0.5
9 #E8A197 9 69 1 -1 21 black 1.5 NA 0.5
10 #4EA4CB 10 23 1 -1 21 black 1.5 NA 0.5
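For example, to pull out just the hex code mapped to x = 6 (a minimal sketch reusing the object built above):
b <- ggplot_build(p)$data[[1]]
b$fill[b$x == 6]
# [1] "#E69C90"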
In addition, scale_fill_continuous_divergingx() internally uses RColorBrewer's "RdBu" palette, whose hex colors you can query with:
R> RColorBrewer::brewer.pal(11, "RdBu")
R> scales::show_col(RColorBrewer::brewer.pal(11, "RdBu"))
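Since scale_fill_continuous_divergingx() comes from the colorspace package, you can also query colorspace's flexible diverging version of the same palette directly (an alternative I'm adding here, assuming colorspace is installed):
colorspace::divergingx_hcl(11, palette = "RdBu")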

Adding Proportionate Na's in a column [duplicate]

I have a complete data frame. I want 20% of the values in the data frame to be replaced by NAs to simulate random missing data.
A <- c(1:10)
B <- c(11:20)
C <- c(21:30)
df<- data.frame(A,B,C)
Can anyone suggest a quick way of doing that?
df <- data.frame(A = 1:10, B = 11:20, c = 21:30)
head(df)
## A B c
## 1 1 11 21
## 2 2 12 22
## 3 3 13 23
## 4 4 14 24
## 5 5 15 25
## 6 6 16 26
as.data.frame(lapply(df, function(cc) cc[ sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE) ]))
## A B c
## 1 1 11 21
## 2 2 12 22
## 3 3 13 23
## 4 4 14 24
## 5 5 NA 25
## 6 6 16 26
## 7 NA 17 27
## 8 8 18 28
## 9 9 19 29
## 10 10 20 30
It's a random process, so it might not give 15% every time.
You can unlist the data.frame, take a random sample of positions to set to NA, and then put it back into a data.frame.
df <- unlist(df)
n <- round(length(df) * 0.15)
df[sample(seq_along(df), n)] <- NA  # sample positions, not values
as.data.frame(matrix(df, ncol = 3))
It can be done a bunch of different ways using sample().
If you are in the mood to use purrr instead of lapply, you can also do it like this:
> library(purrr)
> df <- data.frame(A = 1:10, B = 11:20, C = 21:30)
> df
A B C
1 1 11 21
2 2 12 22
3 3 13 23
4 4 14 24
5 5 15 25
6 6 16 26
7 7 17 27
8 8 18 28
9 9 19 29
10 10 20 30
> map_df(df, function(x) {x[sample(c(TRUE, NA), prob = c(0.8, 0.2), size = length(x), replace = TRUE)]})
# A tibble: 10 x 3
A B C
<int> <int> <int>
1 1 11 21
2 2 12 22
3 NA 13 NA
4 4 14 NA
5 5 15 25
6 6 16 26
7 7 17 27
8 8 NA 28
9 9 19 29
10 10 20 30
Same result, using binomial distribution:
dd  <- dim(df)
nna <- 20/100  # overall proportion of NAs
df1 <- df
df1[matrix(rbinom(prod(dd), size = 1, prob = nna) == 1, nrow = dd[1])] <- NA
df1
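To check the realized proportion of NAs (a quick check, not part of the original answer; with a binomial draw it will fluctuate around the 20% target):
mean(is.na(df1))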
May I suggest a first function (ggNAadd) designed to do this, complemented by a second function (ggNA) that plots where the NAs were created.
What is neat is the possibility to input either a proportion or a fixed number of NAs.
ggNAadd <- function(data, amount, plot = FALSE) {
  temp <- data
  amount2 <- ifelse(amount < 1, round(prod(dim(data)) * amount), amount)
  if (amount2 >= prod(dim(data))) stop("exceeded data size")
  for (i in 1:amount2) temp[sample.int(nrow(temp), 1), sample.int(ncol(temp), 1)] <- NA
  if (plot) print(ggNA(temp))
  return(temp)
}
And the plotting function:
ggNA <- function(data, alpha = 0.5) {
  require(ggplot2)
  DF <- data
  if (!is.matrix(data)) DF <- as.matrix(DF)
  to.plot <- cbind.data.frame('y' = rep(1:nrow(DF), each = ncol(DF)),
                              'x' = as.logical(t(is.na(DF))) * rep(1:ncol(DF), nrow(DF)))
  size <- 20 / log(prod(dim(DF)))  # point size depends on the size of the table
  g <- ggplot(data = to.plot) + aes(x, y) +
    geom_point(size = size, color = "red", alpha = alpha) +
    scale_y_reverse() + xlim(1, ncol(DF)) +
    ggtitle("location of NAs in the data frame") +
    xlab("columns") + ylab("lines")
  pc <- round(sum(is.na(DF)) / prod(dim(DF)) * 100, 2)  # % NA
  print(paste("percentage of NA data: ", pc))
  return(g)
}
Which gives (using ggplot2 as graphical output):
ggNAadd(df, amount=0.20, plot=TRUE)
## [1] "percentage of NA data: 20"
## A B c
## 1 1 11 21
## 2 2 12 22
## 3 3 13 23
## 4 4 NA 24
## ..
Of course, as mentioned earlier, if you ask for too many NAs the actual percentage may fall short of the target, because the same cell can be drawn more than once.
A mutate_all approach:
library(dplyr)  # for the %>% pipe
df %>%
  dplyr::mutate_all(~ ifelse(sample(c(TRUE, FALSE), size = length(.), replace = TRUE, prob = c(0.8, 0.2)),
                             as.character(.), NA))
# note: the columns come back as character because of as.character(.)
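On dplyr 1.0 and later, the superseded mutate_all() can be replaced with across(); a sketch under that assumption, which also keeps the columns numeric:
library(dplyr)
df %>%
  mutate(across(everything(),
                ~ ifelse(sample(c(TRUE, FALSE), size = length(.x), replace = TRUE, prob = c(0.8, 0.2)),
                         .x, NA)))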

Emulate ggplot2 default color palette

What function can I use to emulate ggplot2's default color palette for a desired number of colors? For example, an input of 3 would produce a character vector of three HEX colors matching ggplot2's defaults.
It is just equally spaced hues around the color wheel, starting from a hue of 15:
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}
For example:
n = 4
cols = gg_color_hue(n)
dev.new(width = 4, height = 4)
plot(1:n, pch = 16, cex = 2, col = cols)
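For reference, the three-color case should reproduce ggplot2's familiar defaults (the hex codes below are what hcl() returns on a typical setup; worth verifying on yours):
gg_color_hue(3)
# [1] "#F8766D" "#00BA38" "#619CFF"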
You can get the same result with the scales package:
library(scales)
show_col(hue_pal()(4))
show_col(hue_pal()(3))
These answers are all very good, but I wanted to share another thing I discovered on Stack Overflow that is really quite useful.
Basically, @DidzisElferts shows how you can get all the colours, coordinates, etc. that ggplot uses to build a plot you created. Very nice!
library(ggplot2)
p <- ggplot(mpg, aes(x = class, fill = class)) + geom_bar()
ggplot_build(p)$data
[[1]]
fill y count x ndensity ncount density PANEL group ymin ymax xmin xmax
1 #F8766D 5 5 1 1 1 1.111111 1 1 0 5 0.55 1.45
2 #C49A00 47 47 2 1 1 1.111111 1 2 0 47 1.55 2.45
3 #53B400 41 41 3 1 1 1.111111 1 3 0 41 2.55 3.45
4 #00C094 11 11 4 1 1 1.111111 1 4 0 11 3.55 4.45
5 #00B6EB 33 33 5 1 1 1.111111 1 5 0 33 4.55 5.45
6 #A58AFF 35 35 6 1 1 1.111111 1 6 0 35 5.55 6.45
7 #FB61D7 62 62 7 1 1 1.111111 1 7 0 62 6.55 7.45
From page 106 of the ggplot2 book by Hadley Wickham:
The default colour scheme, scale_colour_hue picks evenly spaced hues
around the hcl colour wheel.
With a bit of reverse engineering you can construct this function:
ggplotColours <- function(n = 6, h = c(0, 360) + 15) {
  if ((diff(h) %% 360) < 1) h[2] <- h[2] - 360/n
  hcl(h = seq(h[1], h[2], length = n), c = 100, l = 65)
}
Demonstrating this in barplot:
y <- 1:3
barplot(y, col = ggplotColours(n = 3))
To get the hex values themselves rather than a plot of them, use:
hue_pal()(3)
instead of:
show_col(hue_pal()(3))
