How do you simultaneously use `group_by()` and `ggplot_build()` with facets?

# Create the Data Frame
library(tidyverse)
library(ggQC)
set.seed(5555)
Golden_Egg_df <- data.frame(month = 1:12,
egg_diameter = rnorm(n=12, mean=1.5, sd=0.2)) %>%
mutate(grp = c(rep("A", 3), rep("B", 9)))
Golden_Egg_df$egg_diameter[3] <- 5
# Determine the control limit values (red lines)
p <- ggplot(Golden_Egg_df, aes(x = month, y = egg_diameter)) +
geom_point() +
geom_line() +
stat_QC(method = "XmR")
pb <- ggplot_build(p)
thres <- range(pb$data[[3]]$yintercept)
# Circle anything outside the control limits (red lines)
p + geom_point(
data = subset(Golden_Egg_df,
egg_diameter > max(thres) | egg_diameter < min(thres)),
shape = 21,
size = 4,
col = "red"
)
The code chunk above determines the y-values of the control limits (red lines) from the ggplot_build() function. It then draws red circles around outliers. This works great until I facet the plot, because thres <- range(pb$data[[3]]$yintercept) pools the control limits from every facet into a single range instead of computing one range per facet.
# ONLY ONE 'Y-INTERCEPT' RANGE HERE TO WORRY ABOUT WITHOUT FACETING
#> $`data`[[3]]
#> yintercept y x label
#> 1 -0.2688471 -0.2688471 -Inf LCL
#> 2 3.7995203 3.7995203 -Inf UCL
#> 3 -0.2688471 -0.2688471 Inf -0.3
#> 4 3.7995203 3.7995203 Inf 3.8
# MULTIPLE 'Y-INTERCEPT' RANGES HERE TO WORRY ABOUT WITH FACETING
#> $`data`[[3]]
#> yintercept y x label
#> 1 -0.8759612 -0.8759612 -Inf LCL
#> 2 4.5303358 4.5303358 -Inf UCL
#> 3 -0.8759612 -0.8759612 Inf -0.9
#> 4 4.5303358 4.5303358 Inf 4.5
#> 5 1.2074161 1.2074161 -Inf LCL
#> 6 1.9521532 1.9521532 -Inf UCL
#> 7 1.2074161 1.2074161 Inf 1.2
#> 8 1.9521532 1.9521532 Inf 2
How do I get my code block below to work properly and circle the outliers? I obviously need a more sophisticated thres2, that can recognize there are different groupings of control limits (red lines) between the different facets.
# Determine the control limit values (red lines)
Golden_Egg_df$egg_diameter[11] <- 5
p2 <- ggplot(Golden_Egg_df, aes(x = month, y = egg_diameter)) +
geom_point() +
geom_line() +
stat_QC(method = "XmR") +
facet_grid(~ grp, scales = "free_x", space = "free_x") +
scale_x_continuous(breaks = 1:12, labels = month.abb)
pb2 <- ggplot_build(p2)
thres2 <- range(pb2$data[[3]]$yintercept)
thres2
#> [1] -2.274056 7.445141
# Circle anything outside the control limits (red lines)
p2 + geom_point(
data = subset(Golden_Egg_df,
egg_diameter > max(thres2) | egg_diameter < min(thres2)),
shape = 21,
size = 4,
col = "red"
)

I think the best way is to get the ranges into the same data.frame as your data. I'm not sure if this is the most elegant solution, but it works with your example:
library(tidyverse)
library(ggQC)
set.seed(5555)
Golden_Egg_df <- data.frame(month = 1:12,
egg_diameter = rnorm(n=12, mean=1.5, sd=0.2)) %>%
mutate(grp = c(rep("A", 3), rep("B", 9)))
Golden_Egg_df$egg_diameter[3] <- 5
Golden_Egg_df$egg_diameter[11] <- 5
# create the plot
p2 <- ggplot(Golden_Egg_df, aes(x = month,
y = egg_diameter)) +
geom_point() +
geom_line() +
stat_QC(method = "XmR") +
facet_grid(~ grp,
scales = "free_x",
space = "free_x") +
scale_x_continuous(breaks = 1:12,
labels = month.abb)
# get all the info about the plot
pb2 <- ggplot_build(p2)
# extract the UCL and LCL for each plot (facet)
Golden_Egg_df <- Golden_Egg_df %>%
mutate(min = ifelse(grp == "A",
min(pb2$data[[3]]$yintercept[1:4]), # LCL of 1st facet (grp A)
min(pb2$data[[3]]$yintercept[5:8])), # LCL of 2nd facet (grp B)
max = ifelse(grp == "A",
max(pb2$data[[3]]$yintercept[1:4]), # UCL of 1st facet (grp A)
max(pb2$data[[3]]$yintercept[5:8]))) # UCL of 2nd facet (grp B)
# add the circled outlier
p2 + geom_point(data = subset(Golden_Egg_df,
egg_diameter > max |
egg_diameter < min),
shape = 21,
size = 4,
col = "red")
Cheers, Rico
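The hard-coded row indices (1:4 and 5:8) only fit a plot with exactly two facets. A more general sketch groups the built layer data by its PANEL column and joins the per-panel limits back onto the data. To keep the example runnable without ggQC, limits_raw and panel_key below are hypothetical stand-ins for pb2$data[[3]] and pb2$layout$layout, filled with the limits printed in the question; eggs is toy data, and the column names lcl and ucl are made up for illustration.

```r
library(dplyr)

# Stand-in for pb2$data[[3]]: the limits printed in the question, plus the
# PANEL column that ggplot_build() attaches to each layer's data.
limits_raw <- data.frame(
  yintercept = c(-0.8759612, 4.5303358, -0.8759612, 4.5303358,
                  1.2074161, 1.9521532,  1.2074161, 1.9521532),
  PANEL = factor(rep(1:2, each = 4))
)

# Stand-in for pb2$layout$layout: maps each PANEL to its facet value.
panel_key <- data.frame(PANEL = factor(1:2), grp = c("A", "B"))

# One lower and one upper control limit per facet.
limits <- limits_raw %>%
  group_by(PANEL) %>%
  summarise(lcl = min(yintercept), ucl = max(yintercept)) %>%
  inner_join(panel_key, by = "PANEL")

# Toy measurements (not the data from the question).
eggs <- data.frame(
  month = 1:6,
  egg_diameter = c(1.4, 1.6, 5, 1.5, 5, 1.3),
  grp = c("A", "A", "A", "B", "B", "B")
)

# Keep only the points outside their own facet's limits.
outliers <- eggs %>%
  inner_join(limits, by = "grp") %>%
  filter(egg_diameter > ucl | egg_diameter < lcl)

outliers$month
```

With a real plot, one would presumably replace limits_raw with pb2$data[[3]] and panel_key with pb2$layout$layout[, c("PANEL", "grp")], then pass outliers to the data argument of geom_point().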

Related

R question: How to mimic this graph to show the percent change of cells in health and disease?

I am interested in showing how the proportions of cells change from health to disease. I wanted to show a 'flow' from health to disease rather than just have two separate stacked bar charts, but I'm unsure if this type of visualization has a name and I have not been able to find many examples online. I would like to do this in R. It's almost a mix between a sankey diagram and a chord diagram.
I was hoping some of you would have some ideas on which packages I could use to achieve this in R.
As Ian Campbell points out in the comments, this is called an alluvial plot, and you can probably get quite close with the ggalluvial package. It is also possible to get a near-identical recreation of your plot using just geom_ribbon and geom_text from ggplot2, although it's a bit tricky to do.
First we need a way of producing those nice smooth curves that go from one side to the other. The following function takes the starting and ending levels (as numbers between 0 and 1) and returns the co-ordinates of the curve. An optional width argument widens or narrows the columns on either side:
ribbon_line <- function(p1, p2, width = 10, len = 100)
{
if (width > 50) width <- 50
if (width < 0) width <- 0
if (p1 < 0) p1 <- 0
if (p1 > 1) p1 <- 1
if (p2 < 0) p2 <- 0
if (p2 > 1) p2 <- 1
yvals <- c(p1, p1, pnorm(seq(-2.5, 2.5, length.out = len)) * (p2 - p1) + p1, p2, p2)
xvals <- c(0, seq(width, 100 - width, length.out = len + 2), 100)
list(x = xvals, y = yvals)
}
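As a quick check of the curve itself, the sketch below restates only the pnorm() part of ribbon_line() in a hypothetical helper called s_curve (not part of the answer) and confirms that it rises smoothly from close to p1 to close to p2. Because pnorm(-2.5) is not exactly 0, the curve alone stops slightly short of both levels, which is presumably why ribbon_line() pads it with flat p1 and p2 segments.

```r
# Hypothetical helper: just the S-shaped middle section of ribbon_line().
s_curve <- function(p1, p2, len = 100) {
  pnorm(seq(-2.5, 2.5, length.out = len)) * (p2 - p1) + p1
}

y <- s_curve(0.2, 0.7)

# The curve is strictly increasing and ends close to, but not exactly at,
# the requested levels.
all(diff(y) > 0)
range(y)
```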
Now we need a way of combining two lines into a data frame with co-ordinates we can plot:
ribbon_df <- function(uppers, lowers, group, width = 10)
{
data.frame(x = ribbon_line(uppers[1], uppers[2], width)$x,
ymax = ribbon_line(uppers[1], uppers[2], width)$y,
ymin = ribbon_line(lowers[1], lowers[2], width)$y,
group = group, stringsAsFactors = FALSE)
}
Next, we need a method of taking a simple input and turning it into a group of these ribbons, plus left and right columns, plus text labels:
multi_ribbons <- function(left_bottom, right_bottom, left_top, right_top,
groups, width = 10)
{
if (length(left_bottom) != length(right_bottom) |
length(left_bottom) != length(left_top) |
length(left_top) != length(right_top))
stop("Left and right columns different length")
if (length(groups) != length(left_bottom))
stop("Group length has to be same length as columns")
d <- lapply(seq_along(groups), function(i) {
ribbon_df(c(left_top[i], right_top[i]),
c(left_bottom[i], right_bottom[i]),
groups[i], width)})
left_cols <- lapply(d, function(x) x[1:2,])
right_cols <- lapply(d, function(x) x[nrow(x) - 1:0,])
res <- list( left = do.call(rbind, left_cols),
right = do.call(rbind, right_cols),
bands = do.call(rbind, d))
text_y <- c((res$left$ymax + res$left$ymin)/2,
(res$right$ymax + res$right$ymin)/2)
text_x <- c(rep(width / 2, length(res$left$x)),
rep(100 - width/2, length(res$left$x)))
text_labels <- paste0(round(c(res$left$ymax - res$left$ymin,
res$right$ymax - res$right$ymin), 3) * 100, "%")
res$text <- data.frame(x = text_x, y = text_y, labels = text_labels)
res
}
Finally, we want a way of taking our data as a simple pair of factor vectors and using the above functions to plot them:
alluvial <- function(yvar, xvar, width = 20)
{
tab <- table(yvar, xvar)
x_labs <- rownames(tab)
y_labs <- colnames(tab)
left <- tab[1,]/sum(tab[1,])
left <- cumsum(sort(left))
right <- tab[2,]/sum(tab[2,])
right <- cumsum(sort(right))
left_lower <- c(0, left[-length(left)])
names(left_lower) <- names(left)
right_lower <- c(0, right[-length(right)])
names(right_lower) <- names(right)
right <- right[match(names(left), names(right))]
right_lower <- right_lower[match(names(left), names(right_lower))]
df_list <- multi_ribbons(left_lower, right_lower, left, right,
names(left), width = width)
ggplot(df_list$bands, aes(x = x, ymin = ymin, ymax = ymax, fill = group)) +
geom_ribbon(alpha = 0.5) +
geom_ribbon(alpha = 1, data = df_list$left) +
geom_ribbon(alpha = 1, data = df_list$right) +
geom_text(data = df_list$text, inherit.aes = FALSE, colour = "white",
aes(x = x, y = y, label = labels), size = 8) +
geom_text(data = data.frame(x = c(width / 2, 100 - width /2), y = c(1.05, 1.05),
labels = factor(x_labs, levels = x_labs)),
inherit.aes = FALSE,
mapping = aes(x = x, y = y, label = labels), size = 12) +
geom_text(data = data.frame(x = rep(-5, length(y_labs)),
y = unique(df_list$text$y[1:(nrow(df_list$text)/2)]),
labs = unique(df_list$bands$group)),
mapping = aes(x = x, y = y, colour = labs, label = labs),
inherit.aes = FALSE, size = 8, hjust = 1) +
scale_fill_manual(values = c("#e64b35", "#806249", "#00a087", "#3c5488")) +
scale_colour_manual(values = c("#e64b35", "#806249", "#00a087", "#3c5488")) +
coord_cartesian(xlim = c(-15, 101)) +
theme_void() + theme(legend.position = "none")
}
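The first half of alluvial() turns a contingency table into stacked band edges. A small worked example of that step, using made-up toy vectors (condition and cell are not the data from the question), may make the cumsum() logic easier to follow:

```r
# Toy input: four cells per condition.
condition <- c(rep("Health", 4), rep("Disease", 4))
cell <- c("Immune", "Immune", "Fibroblast", "Endothelial",
          "Immune", "Immune", "Immune", "Fibroblast")

tab <- table(condition, cell)

# Proportions within the first row of the table. table() sorts the row
# labels, so the first row here is "Disease".
left <- tab[1, ] / sum(tab[1, ])

# Sorting and accumulating gives the top edge of each stacked band;
# shifting by one position gives the matching bottom edge.
upper <- cumsum(sort(left))
lower <- c(0, upper[-length(upper)])

# Band heights are the original proportions, smallest first.
upper - lower
```

Note that table() orders its rows alphabetically (or by factor level), so which condition ends up on the left depends on the labels unless the input is a factor with explicitly ordered levels.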
So, if your data frame is in a format like this:
head(df, 20)
#> condition variable
#> 110 Disease Immune
#> 149 Disease Fibroblast
#> 133 Disease Immune
#> 184 Disease Endothelial
#> 137 Disease Immune
#> 200 Disease Endothelial
#> 30 Health Immune
#> 11 Health Immune
#> 63 Health Fibroblast
#> 88 Health Endothelial
#> 42 Health Fibroblast
#> 38 Health Fibroblast
#> 106 Disease Immune
#> 139 Disease Immune
#> 6 Health Epithelial
#> 21 Health Immune
#> 27 Health Immune
#> 181 Disease Endothelial
#> 95 Health Endothelial
#> 108 Disease Immune
You can just do:
alluvial(df$condition, df$variable)
To get the above plot, or, for something more random:
set.seed(69)
alluvial(sample(c(TRUE, FALSE), 200, replace = TRUE),
sample(LETTERS[1:4], 200, replace = TRUE))
If you want more than four colour or fill levels, you can remove or adjust the scale_colour_manual and scale_fill_manual calls, to get, for example:
set.seed(69)
alluvial(sample(c(TRUE, FALSE), 200, replace = TRUE),
sample(LETTERS[1:20], 200, replace = TRUE))

Small ggplots on a ggmap - a purrr map version

Based on Small ggplot2 plots placed on coordinates on a ggmap
I would like to have the same solution, but with ggplot function outside the pipeline, applied with purrr::map().
The data for small bar subplots indicating 2 values, may contain
lon, lat, id, valueA, valueB,
After tidyr::gather operation it may look like:
Town, Potential_Sum, lon, lat, component , sales
Aaa, 9.00, 20.80, 54.25, A, 5.000
Aaa, 9.00, 20.80, 54.25, B, 4.000
Bbb, 5.00, 19.60, 50.50, A, 3.000
Bbb, 5.00, 19.60, 50.50, B, 2.000
The current working solution uses do() to generate the subplots and then ggplotGrob() to build a column of grobs to be placed at the lon, lat locations on a ggmap.
maxSales <- max(df$sales)
df.grobs <- df %>%
do(subplots = ggplot(., aes(1, sales, fill = component)) +
geom_col(position = "dodge", alpha = 0.50, colour = "white") +
coord_cartesian(ylim = c(0, maxSales)) +
scale_fill_manual(values = c("green", "red"))+
geom_text(aes(label=if_else(sales>0,round(sales), NULL)), vjust=0.35,hjust=1.1, colour="black",
position=position_dodge(.9), size=2.5, angle=90)+
theme_void()+ guides(fill = F)) %>%
mutate(subgrobs = list(annotation_custom(ggplotGrob(subplots),
x = lon-0.14, y = lat-0.20,
xmax = lon+0.14, ymax = lat+1.2)))
df.grobs %>%
{p + geom_label(aes(x = 15, y = 49.8, label = "A"), colour = c("black"),fill = "green", size=3)+
geom_label(aes(x = 15, y = 5.01, label = "B"), colour = c("black"),fill = "red", size=3)+
.$subgrobs +
geom_text(data=df, aes(label = Miasto), vjust = 3.5,nudge_x = 0.05, size=2.5) +
geom_col(data = df,
aes(0,0, fill = component),
colour = "white")}
p is a ggmap object, map of Poland, on which I would like to place small plots:
# p <-
# get_googlemap(
# "Poland",
# maptype = "roadmap",
# zoom = 6,
# color = "bw",
# crop = T,
# style = "feature:all|element:labels|visibility:off" # 'feature:administrative.country|element:labels|visibility:off'
# ) %>% # or 'feature:all|element:labels|visibility:off'
# ggmap() + coord_cartesian() +
# scale_x_continuous(limits = c(14, 24.3), expand = c(0, 0)) +
# scale_y_continuous(limits = c(48.8, 55.5), expand = c(0, 0))
#
How can I translate this solution to the nest - apply - unnest syntax, so that the ggplot part lives outside the piped expression as a function?
In other words, how do I replace do() with map(parameters, GGPlot_function) and then plot the grobs on a ggmap?
What I have done so far is try to write a ggplot function:
#----barplots----
maxSales <- max(df$sales)
fn_ggplot <- function (df, x, component, maxX) {
x <- enquo(x)
component <-enquo(component)
maxX <-enquo(maxX)
p <- ggplot(df, aes(1, !!x, fill = !!component)) +
geom_col(position = "dodge", alpha = 0.50, colour = "white") +
coord_cartesian(ylim = c(0, !!maxX)) +
scale_fill_manual(values = c("green", "red"))+
geom_text(aes(label=if_else(x>0,round(!!x), NULL)), vjust=0.35,hjust=1.1, colour="black",
position=position_dodge(.9), size=2.5, angle=90)+
theme_void()+ guides(fill = F)
return(p)
}
I then got totally confused trying to apply it like this (I am a constant beginner, unfortunately). This does not work:
df.grobs <- df %>%
mutate(subplots = pmap(list(.,sales,component,Potential_Sum),fn_ggplot)) %>%
mutate(subgrobs = list(annotation_custom(ggplotGrob(subplots),
x = lon-0.14, y = lat-0.20,
xmax = lon+0.14, ymax = lat+1.2)))
I get errors indicating I do not know what I am doing, i.e. the lengths of the arguments are incorrect and something else is expected:
message: Element 2 of `.l` must have length 1 or 7, not 2
class: `purrr_error_bad_element_length`
backtrace:
1. dplyr::mutate(...)
12. purrr:::stop_bad_length(...)
13. dplyr::mutate(...)
Call `rlang::last_trace()` to see the full backtrace
> rlang::last_trace()
x
1. +-`%>%`(...)
2. | +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. | \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
4. | \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
5. | \-global::`_fseq`(`_lhs`)
6. | \-magrittr::freduce(value, `_function_list`)
7. | \-function_list[[i]](value)
8. | +-dplyr::mutate(...)
9. | \-dplyr:::mutate.tbl_df(...)
10. | \-dplyr:::mutate_impl(.data, dots, caller_env())
11. +-purrr::pmap(list(., sales, component, Potential_Sum), fn_ggplot)
12. \-purrr:::stop_bad_element_length(...)
13. \-purrr:::stop_bad_length(...)
data
First let's build some sample data close to yours but reproducible without the need for an api key.
As a starting point we have a plot of a country map stored in p, and some data in long form to build the charts stored in plot_data.
library(maps)
library(tidyverse)
p <- ggplot(map_data("france"), aes(long,lat,group=group)) +
geom_polygon(fill = "lightgrey") +
theme_void()
set.seed(1)
plot_data <- tibble(lon = c(0,2,5), lat = c(44,48,46)) %>%
group_by(lon, lat) %>%
do(tibble(component = LETTERS[1:3], value = runif(3,min=1,max=5))) %>%
mutate(total = sum(value)) %>%
ungroup()
plot_data
# # A tibble: 9 x 5
# lon lat component value total
# <dbl> <dbl> <chr> <dbl> <dbl>
# 1 0 44 A 2.06 7.84
# 2 0 44 B 2.49 7.84
# 3 0 44 C 3.29 7.84
# 4 2 48 A 4.63 11.0
# 5 2 48 B 1.81 11.0
# 6 2 48 C 4.59 11.0
# 7 5 46 A 4.78 11.9
# 8 5 46 B 3.64 11.9
# 9 5 46 C 3.52 11.9
define a plotting function
We isolate the plotting code in a separate function:
my_plot_fun <- function(data){
ggplot(data, aes(1, value, fill = component)) +
geom_col(position = position_dodge(width = 1),
alpha = 0.75, colour = "white") +
geom_text(aes(label = round(value, 1), group = component),
position = position_dodge(width = 1),
size = 3) +
theme_void() + guides(fill = "none") # "none" replaces the deprecated fill = F
}
build a wrapper
This function takes a data set, some coordinates and the plotting function as parameters, to annotate at the right spot.
annotation_fun <- function(data, lat, lon, plot_fun) {
subplot <- plot_fun(data)
# return the annotation layer explicitly so it can be added to the map
annotation_custom(ggplotGrob(subplot),
x = lon-0.5, y = lat-0.5,
xmax = lon+0.5, ymax = lat+0.5)
}
The final code
The code becomes simple, using nest and pmap:
subgrobs <- plot_data %>%
nest(data = c(component, value, total)) %>% # tidyr >= 1.0 syntax; older versions used nest(-lon,-lat)
pmap(annotation_fun,plot_fun = my_plot_fun)
p + subgrobs
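The reason this works is that pmap() treats the nested tibble as a list of columns and matches them to the function's arguments by name, so the lon, lat and data columns line up with the lon, lat and data parameters of annotation_fun, while plot_fun is passed through as a constant extra argument. A stripped-down sketch of the same mechanism, with a hypothetical describe() function standing in for the plotting wrapper and made-up values:

```r
library(purrr)
library(tibble)

# Toy nested tibble: one row per location, with a list-column of data.
nested <- tibble(
  lon  = c(0, 2),
  lat  = c(44, 48),
  data = list(data.frame(value = 1:3), data.frame(value = 4:6))
)

# Hypothetical stand-in for annotation_fun(): the argument names must match
# the column names, but their order does not matter.
describe <- function(data, lat, lon, label) {
  paste0(label, " (", lon, ", ", lat, "): total = ", sum(data$value))
}

# `label` plays the role of `plot_fun`: a constant passed to every call.
out <- pmap_chr(nested, describe, label = "site")
out
```

This is also why the attempt in the question fails: pmap() needs a list whose elements all have the same length (or length 1), one element per function argument, rather than the whole data frame plus individual columns.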

Free colour scales in facet_grid

Say I have the following data frame:
# Set seed for RNG
set.seed(33550336)
# Create toy data frame
loc_x <- c(a = 1, b = 2, c = 3)
loc_y <- c(a = 3, b = 2, c = 1)
scaling <- c(temp = 100, sal = 10, chl = 1)
df <- expand.grid(loc_name = letters[1:3],
variables = c("temp", "sal", "chl"),
season = c("spring", "autumn")) %>%
mutate(loc_x = loc_x[loc_name],
loc_y = loc_y[loc_name],
value = runif(nrow(.)),
value = value * scaling[variables])
which looks like,
# > head(df)
# loc_name variables season loc_x loc_y value
# 1 a temp spring 1 3 86.364697
# 2 b temp spring 2 2 35.222573
# 3 c temp spring 3 1 52.574082
# 4 a sal spring 1 3 0.667227
# 5 b sal spring 2 2 3.751383
# 6 c sal spring 3 1 9.197086
I want to plot these data in a facet grid using variables and season to define panels, like this:
g <- ggplot(df) + geom_point(aes(x = loc_name, y = value), size = 5)
g <- g + facet_grid(variables ~ season)
g
As you can see, different variables have very different scales. So, I use scales = "free" to account for this.
g <- ggplot(df) + geom_point(aes(x = loc_name, y = value), size = 5)
g <- g + facet_grid(variables ~ season, scales = "free")
g
Mucho convenient. Now, say I want to do this, but plot the points by loc_x and loc_y and have value represented by colour instead of y position:
g <- ggplot(df) + geom_point(aes(x = loc_x, y = loc_y, colour = value),
size = 5)
g <- g + facet_grid(variables ~ season, scales = "free")
g <- g + scale_colour_gradient2(low = "#3366CC",
mid = "white",
high = "#FF3300",
midpoint = 50)
g
Notice that the colour scales are not free and, like the first figure, values for sal and chl cannot be read easily.
My question: is it possible to do an equivalent of scales = "free" but for colour, so that each row (in this case) has a separate colour bar? Or, do I have to plot each variable (i.e., row in the figure) and patch them together using something like cowplot?
Using group_split() from dplyr (version 0.8.0 or later; it was only in the development version when this was written):
library(dplyr)
library(purrr)
library(ggplot2)
library(cowplot)
df %>%
group_split(variables, season) %>%
map(
~ggplot(., aes(loc_x, loc_y, color = value)) +
geom_point(size = 5) +
scale_colour_gradient2(
low = "#3366CC",
mid = "white",
high = "#FF3300",
midpoint = median(.$value)
) +
facet_grid(~ variables + season, labeller = function(x) label_value(x, multi_line = FALSE))
) %>%
plot_grid(plotlist = ., align = 'hv', ncol = 2)
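The layout relies on the order in which group_split() returns its pieces: they are sorted by the grouping keys, first key first, so with two seasons and ncol = 2 each variable should land on its own row of the grid. A small sketch with a made-up toy data frame (toy, pieces and keys are illustrative names) shows that ordering:

```r
library(dplyr)

# Toy data: expand.grid() turns the character vectors into factors whose
# levels follow the order given here.
toy <- expand.grid(variables = c("temp", "sal"),
                   season = c("spring", "autumn"),
                   loc = 1:2)

pieces <- toy %>% group_split(variables, season)

# One label per piece, in the order group_split() returned them.
keys <- vapply(pieces,
               function(d) paste(d$variables[1], d$season[1]),
               character(1))
keys
```

If the grouping columns were plain character vectors instead of factors, the pieces would be sorted alphabetically, which may not be the row order one wants in the final grid.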

Enforce same color palette for `color` and `fill` of a subset of data

Having the following sample dataset:
set.seed(20)
N <- 20
df1 <- data.frame(x = rnorm(N),
y = rnorm(N),
grp = paste0('grp_', sample(1:500, N, T)),
lab = sample(letters, N, T))
# x y grp lab
# 1 1.163 0.237 grp_104 w
# 2 -0.586 -0.144 grp_448 y
# 3 1.785 0.722 grp_31 m
# 4 -1.333 0.370 grp_471 z
# 5 -0.447 -0.242 grp_356 o
I want to plot all points but label only a subset of them (say, those with df1$x > 1). It works fine when I use the same color=grp aesthetic for both geom_point and geom_text:
ggplot(df1, aes(x=x,y=y,color=grp))+
geom_point(size=4) +
geom_text(aes(label=lab),data=df1[df1$x>1,],size=5,hjust=1,vjust=1)+
theme(legend.position="none")
But if I want to change points design to fill=grp, colors of labels do not match anymore:
ggplot(df1, aes(x=x,y=y))+
geom_point(aes(fill=grp),size=4,shape=21) +
geom_text(aes(label=lab,color=grp),data=df1[df1$x>1,],size=5,hjust=1,vjust=1)+
theme(legend.position="none")
I understand palette is different because levels of the subset are not the same as levels of the whole dataset. But what would be the simplest solution to enforce using the same palette?
The issue arises from different factor levels for the text and fill colours. We can avoid dropping unused factor levels by using drop = FALSE inside scale_*_discrete:
ggplot(df1, aes(x=x,y=y))+
geom_point(aes(fill=grp),size=4,shape=21) +
geom_text(aes(label=lab,color=grp),data=df1[df1$x>1,],size=5,hjust=1,vjust=1)+
theme(legend.position="none") +
scale_fill_discrete(drop = F) +
scale_colour_discrete(drop = F)
Update
With your real data we need to make sure that grp is in fact a factor.
# Load sample data
load("df1.Rdat")
# Make sure `grp` is a factor
library(tidyverse)
df1 <- df1 %>% mutate(grp = factor(grp))
# Or in base R
# df1$grp = factor(df1$grp)
# Same as before
ggplot(df1, aes(x=x,y=y))+
geom_point(aes(fill=grp),size=4,shape=21) +
geom_text(aes(label=lab,color=grp),data=df1[df1$x>1,],size=5,hjust=1,vjust=1)+
theme(legend.position="none") +
scale_fill_discrete(drop = F) +
scale_colour_discrete(drop = F)
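The factor requirement can be seen without any plotting. In the base R sketch below (toy values, not the real data), subsetting a character vector loses all knowledge of the groups that were filtered out, whereas subsetting a factor keeps the full set of levels, which is what drop = FALSE relies on. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so the explicit factor() call matters.

```r
grp_chr <- c("grp_1", "grp_2", "grp_3")
grp_fct <- factor(grp_chr)

# Pretend only the first and third rows satisfy x > 1.
keep <- c(TRUE, FALSE, TRUE)

# Character vector: the subset only knows about two groups.
chr_levels <- levels(factor(grp_chr[keep]))

# Factor: the subset still carries all three levels.
fct_levels <- levels(grp_fct[keep])

chr_levels
fct_levels
```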
Another option is to leave the colour / fill palettes alone and set all unwanted labels to be transparent instead:
ggplot(df1, aes(x = x, y = y)) +
geom_point(aes(fill = grp), size = 4, shape = 21) +
geom_text(aes(label = lab, color = grp,
alpha = x > 1),
size = 5, hjust = 1, vjust = 1) +
scale_alpha_manual(values = c("TRUE" = 1, "FALSE" = 0)) +
theme(legend.position = "none")

How to make stacked circle plot without coord_polar

I've got a dataset similar to this:
x <- 100 - abs(rnorm(1e6, 0, 5))
y <- 50 + rnorm(1e6, 0, 3)
dist <- sqrt((x - 100)^2 + (y - 50)^2)
z <- exp(-(dist / 8)^2)
which can be visualised as follows:
data.frame(x, y, z) %>%
ggplot() + geom_point(aes(x, y, color = z))
What I would like to do is a stacked half-circle plot with averaged value of z in subsequent layers. I think it can be done with the combination of geom_col and coord_polar(), although the farthest I can get is
data.frame(x, y, z, dist) %>%
mutate(dist_fct = cut(dist, seq(0, max(dist), by = 5))) %>%
ggplot() + geom_bar(aes(x = 1, y = 1, fill = dist_fct), stat = 'identity', position = 'fill') +
coord_polar()
which is obviously far from the expectation (layers should be of equal size, plot should be clipped on the right half).
The problem is that I can't really use coord_polar() due to further use of annotation_custom(). So my questions are:
Can a plot like this be done without coord_polar()?
If not, how can it be done with coord_polar()?
The result should be similar to the graphic below, except that instead of plotting layers made of points, I would like to plot each layer as a whole, with its colour defined by the average value of z inside the layer.
If you want simple radius bands, perhaps something like this would work as you pictured it in your question:
# your original sample data
x <- 100 - abs(rnorm(1e6, 0, 5))
y <- 50 + rnorm(1e6, 0, 3)
dist <- sqrt((x - 100)^2 + (y - 50)^2)
nbr_bands <- 6 # set nbr of bands to plot
# calculate width of bands
band_width <- max(dist)/(nbr_bands-1)
# dist div band_width yields an integer 0 to nbr bands
# as.factor makes it categorical, which is what you want for the plot
band = as.factor(dist %/% (band_width))
library(dplyr)
library(ggplot2)
data.frame(x, y, band) %>%
ggplot() + geom_point(aes(x, y, color = band)) + coord_fixed() +
theme_dark() # dark theme
Edit to elaborate:
As you first attempted, it would be nice to use the very handy cut() function to calculate the radius color categories.
One way to get categorical (discrete) colors, rather than continuous shading, for your plot color groups is to set your aes color= to a factor column.
cut() already returns a factor; to get an ordered factor directly, you may use the option ordered_result=TRUE:
band <- cut(dist, nbr_bands, ordered_result=TRUE, labels=1:nbr_bands) # also use `labels=` to specify your own labels
data.frame(x, y, band) %>%
ggplot() + geom_point(aes(x, y, color = band)) + coord_fixed()
Or you may call cut() with labels=FALSE, which returns integer codes, and convert them to a factor using as.factor():
band <- as.factor( cut(dist, nbr_bands, labels=FALSE) )
data.frame(x, y, band) %>%
ggplot() + geom_point(aes(x, y, color = band)) + coord_fixed()
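The three flavours of cut() output discussed above can be compared side by side on a handful of made-up distances (dist and breaks here are toy values chosen for illustration):

```r
dist <- c(0.5, 3, 7.2, 11, 14.9)
breaks <- c(0, 5, 10, 15)

band_default <- cut(dist, breaks)                         # unordered factor
band_ordered <- cut(dist, breaks, ordered_result = TRUE)  # ordered factor
band_codes   <- cut(dist, breaks, labels = FALSE)         # integer codes

levels(band_default)
is.ordered(band_default)
is.ordered(band_ordered)
band_codes
```

Either factor gives ggplot2 a discrete colour scale, while the bare integer codes would be treated as continuous unless wrapped in as.factor().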
Sounds like you may find the circle & arc plotting functions from the ggforce package useful:
library(dplyr)
library(ggplot2)
library(ggforce) # provides geom_arc_bar()
# data
set.seed(1234)
df <- data.frame(x = 100 - abs(rnorm(1e6, 0, 5)),
y = 50 + rnorm(1e6, 0, 3)) %>%
mutate(dist = sqrt((x - 100)^2 + (y - 50)^2)) %>%
mutate(z = exp(-(dist / 8)^2))
# define cut-off values
cutoff.values <- seq(0, ceiling(max(df$dist)), by = 5)
df %>%
# calculate the mean z for each distance band
mutate(dist_fct = cut(dist, cutoff.values)) %>%
group_by(dist_fct) %>%
summarise(z = mean(z)) %>%
ungroup() %>%
# add the cutoff values to the dataframe for inner & outer radius
arrange(dist_fct) %>%
mutate(r0 = cutoff.values[-length(cutoff.values)],
r = cutoff.values[-1]) %>%
# add coordinates for circle centre
mutate(x = 100, y = 50) %>%
# plot
ggplot(aes(x0 = x, y0 = y,
r0 = r0, r = r,
fill = z)) +
geom_arc_bar(aes(start = 0, end = 2 * pi),
color = NA) + # hide outline
# force equal aspect ratio in order to get true circle
coord_equal(xlim = c(70, 100), expand = FALSE)
Plot generation took <1s on my machine. Yours may differ.
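The step that pairs each band with its inner and outer radius can be sketched in base R on a few made-up distances. The names cutoff_values and bands are illustrative; the idea is that dropping the last cut-off gives the inner radii and dropping the first gives the outer radii.

```r
cutoff_values <- seq(0, 15, by = 5)

# Toy distances and the same z formula as in the question.
dist <- c(1, 2, 6, 7, 8, 12)
z <- exp(-(dist / 8)^2)

band <- cut(dist, cutoff_values)

bands <- data.frame(
  band = levels(band),
  r0   = cutoff_values[-length(cutoff_values)],  # inner radius of each band
  r    = cutoff_values[-1],                      # outer radius of each band
  z    = as.numeric(tapply(z, band, mean))       # mean z, in level order
)
bands
```

One assumption worth checking on real data: the dplyr version only lines up with the cut-off vector if every band contains at least one point, because group_by() and summarise() silently drop empty bands. tapply() keeps them as NA instead.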
I'm not sure this satisfies everything, but it should be a start. To cut down on the time for plotting, I'm summarizing the data into a grid, which lets you use geom_raster. I don't entirely understand the breaks and everything you're using, so you might want to tweak some of how I divided the data for making the distinct bands. I tried out a couple ways with cut_interval and cut_width--this would be a good place to plug in different options, such as the number or width of bands.
Since you mentioned getting the average z for each band, I'm grouping by the gridded x and y and the cut dist, then using mean of z for setting bands. I threw in a step to make labels like in the example--you probably want to reverse them or adjust their positioning--but that comes from getting the number of each band's factor level.
library(tidyverse)
set.seed(555)
n <- 1e6
df <- tibble( # data_frame() is deprecated in favour of tibble()
x = 100 - abs(rnorm(n, 0, 5)),
y = 50 + rnorm(n, 0, 3),
dist = sqrt((x - 100)^2 + (y - 50)^2),
z = exp(-(dist / 8)^2)
) %>%
mutate(brk = cut(dist, seq(0, max(dist), by = 5), include.lowest = T))
summarized <- df %>%
filter(!is.na(brk)) %>%
mutate(x_grid = floor(x), y_grid = floor(y)) %>%
group_by(x_grid, y_grid, brk) %>%
summarise(avg_z = mean(z)) %>%
ungroup() %>%
# mutate(z_brk = cut_width(avg_z, width = 0.15)) %>%
mutate(z_brk = cut_interval(avg_z, n = 9)) %>%
mutate(brk_num = as.numeric(z_brk))
head(summarized)
#> # A tibble: 6 x 6
#> x_grid y_grid brk avg_z z_brk brk_num
#> <dbl> <dbl> <fct> <dbl> <fct> <dbl>
#> 1 75 46 (20,25] 0.0000697 [6.97e-05,0.11] 1
#> 2 75 47 (20,25] 0.000101 [6.97e-05,0.11] 1
#> 3 75 49 (20,25] 0.0000926 [6.97e-05,0.11] 1
#> 4 75 50 (20,25] 0.0000858 [6.97e-05,0.11] 1
#> 5 75 52 (20,25] 0.0000800 [6.97e-05,0.11] 1
#> 6 76 51 (20,25] 0.000209 [6.97e-05,0.11] 1
To make the labels, summarize that data to have a single row per band--I did this by taking the minimum of the gridded x, then using the average of y so they'll show up in the middle of the plot.
labels <- summarized %>%
group_by(brk_num) %>%
summarise(min_x = min(x_grid)) %>%
ungroup() %>%
mutate(y_grid = mean(summarized$y_grid))
head(labels)
#> # A tibble: 6 x 3
#> brk_num min_x y_grid
#> <dbl> <dbl> <dbl>
#> 1 1 75 49.7
#> 2 2 88 49.7
#> 3 3 90 49.7
#> 4 4 92 49.7
#> 5 5 93 49.7
#> 6 6 94 49.7
geom_raster is great for these situations where you have data in an evenly spaced grid that just needs uniform tiles at each position. At this point, the summarized data has 595 rows, instead of the original 1 million, so the time to plot shouldn't be an issue.
ggplot(summarized) +
geom_raster(aes(x = x_grid, y = y_grid, fill = z_brk)) +
geom_label(aes(x = min_x, y = y_grid, label = brk_num), data = labels, size = 3, hjust = 0.5) +
theme_void() +
theme(legend.position = "none", panel.background = element_rect(fill = "gray40")) +
coord_fixed() +
scale_fill_brewer(palette = "PuBu")
Created on 2018-11-04 by the reprex package (v0.2.1)
