Optimal/efficient plotting of survival/regression analysis results - r

I perform regression analyses on a daily basis. In my case this typically means estimation of the effect of continuous and categorical predictors on various outcomes. Survival analysis is probably the most common analysis that I perform. Such analyses are often presented in a very convenient way in journals. Here is an example:
I wonder if anyone has come across any publicly available function or package that can:
directly use a regression object (coxph, lm, lmer, glm or whatever object you have)
plot the effect of each predictor on a forest plot, or perhaps even allow for plotting of a selection of the predictors.
for categorical predictors also display the reference category
Display the number of events in each category for factor variables (see image above). Display p values.
preferably use ggplot
offer some sort of customization
I am aware that sjPlot package allows for plotting of lme4, glm and lm results. But no package allows the abovementioned for coxph results and coxph is one of the most used regression methods. I have tried to create such a function myself but without any success. I have read this great post: Reproduce table and plot from journal but could not figure out how to "generalize" the code.
Any suggestions are much welcome.

Edit: I've now put this together into a package on GitHub. I've tested it using output from coxph, lm and glm.
Example:
devtools::install_github("NikNakk/forestmodel")
library("forestmodel")
example(forest_model)
Original code posted on SO (superseded by github package):
I've worked on this specifically for coxph models, though the same technique could be extended to other regression models, especially since it uses the broom package to extract the coefficients. The supplied forest_cox function takes as its arguments the output of coxph. (Data is pulled using model.frame to calculate the number of individuals in each group and to find the reference levels for factors.) It also takes a number of formatting arguments. The return value is a ggplot which can be printed, saved, etc.
The output is modelled on the NEJM figure shown in the question.
library("survival")
library("broom")
library("ggplot2")
library("dplyr")
forest_cox <- function(cox, widths = c(0.10, 0.07, 0.05, 0.04, 0.54, 0.03, 0.17),
colour = "black", shape = 15, banded = TRUE) {
data <- model.frame(cox)
forest_terms <- data.frame(variable = names(attr(cox$terms, "dataClasses"))[-1],
term_label = attr(cox$terms, "term.labels"),
class = attr(cox$terms, "dataClasses")[-1], stringsAsFactors = FALSE,
row.names = NULL) %>%
group_by(term_no = row_number()) %>% do({
if (.$class == "factor") {
tab <- table(eval(parse(text = .$term_label), data, parent.frame()))
data.frame(.,
level = names(tab),
level_no = 1:length(tab),
n = as.integer(tab),
stringsAsFactors = FALSE, row.names = NULL)
} else {
data.frame(., n = sum(!is.na(eval(parse(text = .$term_label), data, parent.frame()))),
stringsAsFactors = FALSE)
}
}) %>%
ungroup %>%
mutate(term = paste0(term_label, replace(level, is.na(level), "")),
y = n():1) %>%
left_join(tidy(cox, conf.int = TRUE), by = "term") # recent broom only returns conf.low/conf.high on request
rel_x <- cumsum(c(0, widths / sum(widths)))
panes_x <- numeric(length(rel_x))
forest_panes <- 5:6
before_after_forest <- c(forest_panes[1] - 1, length(panes_x) - forest_panes[2])
panes_x[forest_panes] <- with(forest_terms, c(min(conf.low, na.rm = TRUE), max(conf.high, na.rm = TRUE)))
panes_x[-forest_panes] <-
panes_x[rep(forest_panes, before_after_forest)] +
diff(panes_x[forest_panes]) / diff(rel_x[forest_panes]) *
(rel_x[-(forest_panes)] - rel_x[rep(forest_panes, before_after_forest)])
forest_terms <- forest_terms %>%
mutate(variable_x = panes_x[1],
level_x = panes_x[2],
n_x = panes_x[3],
conf_int = ifelse(is.na(level_no) | level_no > 1,
sprintf("%0.2f (%0.2f-%0.2f)", exp(estimate), exp(conf.low), exp(conf.high)),
"Reference"),
p = ifelse(is.na(level_no) | level_no > 1,
sprintf("%0.3f", p.value),
""),
estimate = ifelse(is.na(level_no) | level_no > 1, estimate, 0),
conf_int_x = panes_x[forest_panes[2] + 1],
p_x = panes_x[forest_panes[2] + 2]
)
forest_lines <- data.frame(x = c(rep(c(0, mean(panes_x[forest_panes + 1]), mean(panes_x[forest_panes - 1])), each = 2),
panes_x[1], panes_x[length(panes_x)]),
y = c(rep(c(0.5, max(forest_terms$y) + 1.5), 3),
rep(max(forest_terms$y) + 0.5, 2)),
linetype = rep(c("dashed", "solid"), c(2, 6)),
group = rep(1:4, each = 2))
forest_headings <- data.frame(term = factor("Variable", levels = levels(forest_terms$term)),
x = c(panes_x[1],
panes_x[3],
mean(panes_x[forest_panes]),
panes_x[forest_panes[2] + 1],
panes_x[forest_panes[2] + 2]),
y = nrow(forest_terms) + 1,
label = c("Variable", "N", "Hazard Ratio", "", "p"),
hjust = c(0, 0, 0.5, 0, 1)
)
forest_rectangles <- data.frame(xmin = panes_x[1],
xmax = panes_x[forest_panes[2] + 2],
y = seq(max(forest_terms$y), 1, -2)) %>%
mutate(ymin = y - 0.5, ymax = y + 0.5)
forest_theme <- function() {
theme_minimal() +
theme(axis.ticks.x = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
axis.text.y = element_blank(),
strip.text = element_blank(),
panel.spacing = unit(rep(2, 4), "mm") # panel.margin was renamed panel.spacing in ggplot2
)
}
forest_range <- exp(panes_x[forest_panes])
forest_breaks <- c(
if (forest_range[1] < 0.1) seq(max(0.02, ceiling(forest_range[1] / 0.02) * 0.02), 0.1, 0.02),
if (forest_range[1] < 0.8) seq(max(0.2, ceiling(forest_range[1] / 0.2) * 0.2), 0.8, 0.2),
1,
if (forest_range[2] > 2) seq(2, min(10, floor(forest_range[2] / 2) * 2), 2),
if (forest_range[2] > 20) seq(20, min(100, floor(forest_range[2] / 20) * 20), 20)
)
main_plot <- ggplot(forest_terms, aes(y = y))
if (banded) {
main_plot <- main_plot +
geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
forest_rectangles, fill = "#EFEFEF")
}
main_plot <- main_plot +
geom_point(aes(estimate, y), size = 5, shape = shape, colour = colour) +
geom_errorbarh(aes(estimate,
xmin = conf.low,
xmax = conf.high,
y = y),
height = 0.15, colour = colour) +
geom_line(aes(x = x, y = y, linetype = linetype, group = group),
forest_lines) +
scale_linetype_identity() +
scale_alpha_identity() +
scale_x_continuous(breaks = log(forest_breaks),
labels = sprintf("%g", forest_breaks),
expand = c(0, 0)) +
geom_text(aes(x = x, label = label, hjust = hjust),
forest_headings,
fontface = "bold") +
geom_text(aes(x = variable_x, label = variable),
subset(forest_terms, is.na(level_no) | level_no == 1),
fontface = "bold",
hjust = 0) +
geom_text(aes(x = level_x, label = level), hjust = 0, na.rm = TRUE) +
geom_text(aes(x = n_x, label = n), hjust = 0) +
geom_text(aes(x = conf_int_x, label = conf_int), hjust = 0) +
geom_text(aes(x = p_x, label = p), hjust = 1) +
forest_theme()
main_plot
}
Sample data and plot
pretty_lung <- lung %>%
transmute(time,
status,
Age = age,
Sex = factor(sex, labels = c("Male", "Female")),
ECOG = factor(ph.ecog),
`Meal Cal` = meal.cal)
lung_cox <- coxph(Surv(time, status) ~ ., pretty_lung)
print(forest_cox(lung_cox))
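The data-gathering step behind the function can be tried on its own. This is only a minimal sketch, assuming a recent broom (where confidence limits must be requested explicitly); the names lung2, fit, hr, n_per_level and ref_level are illustrative and not part of forest_cox.

```r
library(survival)
library(broom)

# Small Cox model on the lung data shipped with survival
lung2 <- transform(lung, sex = factor(sex, labels = c("Male", "Female")))
fit <- coxph(Surv(time, status) ~ age + sex, data = lung2)

# Coefficients on the hazard-ratio scale with 95% confidence limits
hr <- tidy(fit, exponentiate = TRUE, conf.int = TRUE)

# Rows actually used in the fit; tabulating a factor gives N per level
mf <- model.frame(fit)
n_per_level <- table(mf$sex)

# The first factor level is the reference category (it has no coefficient row)
ref_level <- levels(mf$sex)[1]

hr[, c("term", "estimate", "conf.low", "conf.high", "p.value")]
n_per_level
```

Joining hr to a table of levels by term name (variable name pasted to level name) is what lets the reference row appear with no estimate.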

For a "write this code for me" question showing no effort, you certainly have a lot of specific demands. This doesn't fit your criteria, but maybe someone will find it useful in base graphics
The plot in the center panel can be just about anything so long as there is one plot per line and kindasorta fits within each. (Actually that's not true, any kind of plot can go in that panel if you want since it's just a normal plotting window). There are three examples in this code: points, box plots, lines.
This is the input data. Just a generic list and indices for "headers" so much better IMO than "directly using a regression object."
## indices of headers
idx <- c(1,5,7,22)
l <- list('Make/model' = rownames(mtcars),
'No. of\ncycles' = mtcars$cyl,
MPG = mtcars$mpg)
l[] <- lapply(seq_along(l), function(x)
ifelse(seq_along(l[[x]]) %in% idx, l[[x]], paste0(' ', l[[x]])))
# List of 3
# $ Make/model : chr [1:32] "Mazda RX4" " Mazda RX4 Wag" " Datsun 710" " Hornet 4 Drive" ...
# $ No. of
# cycles: chr [1:32] "6" " 6" " 4" " 6" ...
# $ MPG : chr [1:32] "21" " 21" " 22.8" " 21.4" ...
I realize this code generates a pdf. I didn't feel like changing it to an image to upload, so I converted it with imagemagick
## choose the type of plot you want
pl <- c('point','box','line')[1]
## extra (or less) c(bottom, left, top, right) spacing for additions in margins
pad <- c(0,0,0,0)
## default padding
oma <- c(1,1,2,1)
## proportional size of c(left, middle, right) panels
xfig = c(.25,.45,.3)
## proportional size of c(caption, main plot)
yfig = c(.15, .85)
cairo_pdf('~/desktop/pl.pdf', height = 9, width = 8)
x <- l[-3]
lx <- seq_along(x[[1]])
nx <- length(lx)
xcf <- cumsum(xfig)[-length(xfig)]
ycf <- cumsum(yfig)[-length(yfig)]
plot.new()
par(oma = oma, mar = c(0,0,0,0), family = 'serif')
plot.window(range(seq_along(x)), range(lx))
## bars -- see helper fn below
par(fig = c(0,1,ycf,1), oma = par('oma') + pad)
bars(lx)
## caption
par(fig = c(0,1,0,ycf), mar = c(0,0,3,0), oma = oma + pad)
p <- par('usr')
box('plot')
rect(p[1], p[3], p[2], p[4], col = adjustcolor('cornsilk', .5))
mtext('\tFigure I: Some fancy statistical model results.',
adj = 0, font = 2, line = -1)
mtext(paste('\tHere we discuss the fancy graphic that you are currently reading',
'about. We worked really hard on it, and you\n\tshould appreciate',
'our hard work by citing this paper in your next manuscript.'),
adj = 0, line = -3)
## left panel -- select two columns
lp <- l[1:2]
par(fig = c(0,xcf[1],ycf,1), oma = oma + vec(pad, 0, 4))
plot_text(lp, c(1,2),
adj = rep(0:1, c(nx, nx)),
font = vec(1, 3, idx, nx),
col = c(rep(1, nx), vec(1, 'transparent', idx, nx))
) -> at
vtext(unique(at$x), max(at$y) + c(1,1.5), names(lp),
font = 2, xpd = NA, adj = c(0,1))
## right panel -- select three columns
rp <- l[c(2:3,3)]
par(fig = c(tail(xcf, -1),1,ycf,1), oma = oma + vec(pad, 0, 2))
plot_text(rp, c(1,2),
col = c(rep(vec(1, 'transparent', idx, nx), 2),
vec('transparent', 2, idx, nx)),
font = vec(1, 3, idx, nx),
adj = rep(c(NA,NA,1), each = nx)
) -> at
vtext(unique(at$x), max(at$y) + c(1.5,1,1), names(rp),
font = 2, xpd = NA, adj = c(NA, NA, 1), col = c(1,1,2))
## middle panel -- some generic plot
par(new = TRUE, fig = c(xcf[1], xcf[2], ycf, 1),
mar = c(0,2,0,2), oma = oma + vec(pad, 0, c(2,4)))
set.seed(1)
xx <- rev(rnorm(length(lx)))
yy <- rev(lx)
plot(xx, yy, ann = FALSE, axes = FALSE, type = 'n',
panel.first = {
segments(0, 0, 0, nx, lty = 'dashed')
},
panel.last = {
## option 1: points, confidence intervals
if (pl == 'point') {
points(xx, yy, pch = 15, col = vec(1, 2, idx, nx))
segments(xx * .5, yy, xx * 1.5, yy, col = vec(1, 2, idx, nx))
}
## option 2: boxplot, distributions
if (pl == 'box')
boxplot(rnorm(200) ~ rep_len(1:nx, 200), at = nx:1,
col = vec(par('bg'), 2, idx, nx),
horizontal = TRUE, axes = FALSE, add = TRUE)
## option 3: trend lines
if (pl == 'line') {
for (ii in 1:nx) {
n <- sample(40, 1)
wh <- which(nx:1 %in% ii)
lines(cumsum(rep(.1, n)) - 2, wh + cumsum(runif(n, -.2, .2)), xpd = NA,
col = (ii %in% idx) + 1L, lwd = c(1,3)[(ii %in% idx) + 1L])
}
}
## final touches
mtext('HR (95% confidence interval)', font = 2, line = -.5)
axis(1, at = -3:2, tcl = 0.2, mgp = c(0,0,0))
mtext(c('Worse','Better'), side = 1, line = 1, at = c(-4, 3))
try(silent = TRUE, {
## can just replace this with graphics::arrows with minor changes
## i just like the filled ones
rawr::arrows2(-.1, -1.5, -3, size = .5, width = .5)
rawr::arrows2(0.1, -1.5, 2, size = .5, width = .5)
})
}
)
box('outer')
dev.off()
Using these four helper functions (see example use in the body)
vec <- function(default, replacement, idx, n) {
# vec(1, 0, 2:3, 5); vec(1:5, 0, 2:3)
out <- if (missing(n))
default else rep(default, n)
out[idx] <- replacement
out
}
bars <- function(x, cols = c(NA, grey(.9)), horiz = TRUE) {
# plot(1:10, type = 'n'); bars(1:10)
p <- par('usr')
cols <- vec(cols[1], cols[2], which(!x %% 2), length(x))
x <- rev(x) + 0.5
if (horiz)
rect(p[1], x - 1L, p[2], x, border = NA, col = rev(cols), xpd = NA) else
rect(x - 1L, p[3], x, p[4], border = NA, col = rev(cols), xpd = NA)
invisible()
}
vtext <- function(...) {Vectorize(text.default)(...); invisible()}
plot_text <- function(x, width = range(seq_along(x)), ...) {
# plot(col(mtcars), row(mtcars), type = 'n'); plot_text(mtcars)
lx <- lengths(x)[1]
rn <- range(seq_along(x))
sx <- (seq_along(x) - 1) / diff(rn) * diff(width) + width[1]
xx <- rep(sx, each = lx)
yy <- rep(rev(seq.int(lx)), length(x))
vtext(xx, yy, unlist(x), ..., xpd = NA)
invisible(list(x = sx, y = rev(seq.int(lx))))
}
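Since vec drives most of the font and colour arguments above, here is a quick standalone check of what it returns. The definition is repeated so the snippet runs by itself; the inputs are made up for illustration.

```r
# Same helper as above: start from a default vector, overwrite selected positions
vec <- function(default, replacement, idx, n) {
  out <- if (missing(n))
    default else rep(default, n)
  out[idx] <- replacement
  out
}

# Five ones, with positions 2 and 3 replaced by zero
plain <- vec(1, 0, 2:3, 5)

# A full-length default is used as-is when n is omitted
keep_rest <- vec(1:5, 0, 2:3)

# Mixed types are coerced as usual: here everything becomes character
cols <- vec(1, "transparent", c(1, 5), 6)

print(plain)
print(keep_rest)
print(cols)
```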

Related

Combined persp and ggplot graph on the same window, persp too small

I am combining a persp graph and a ggplot graph in the same window using plot_grid. However, the persp graph is too small, how can I make it bigger?
library(pacman)
p_load(tidyverse)
p_load(mvtnorm)
p_load(cowplot)
p_load(gridGraphics)
p_load(GA)
my_mean<-c(25,65)
mycors<-seq(-1,1,by=.25)
sd_vec<-c(5,7)
i<-3
temp_cor<-matrix(c(1,mycors[i],
mycors[i],1),
byrow = T,ncol=2)
V<-sd_vec %*% t(sd_vec) *temp_cor
my_x<-seq(my_mean[1]-3*sd_vec[1], my_mean[1]+3*sd_vec[1], length.out=20)
my_y<-seq(my_mean[2]-3*sd_vec[2], my_mean[2]+3*sd_vec[2], length.out=20)
temp_f<-function(a,b){dmvnorm(cbind(a,b), my_mean,V)}
my_z<-outer(my_x, my_y,temp_f)
nlevels<-20
my_zlim <- range(my_z, finite = TRUE)
my_levels <- pretty(my_zlim, nlevels)
zz <- (my_z[-1, -1] + my_z[-1, -ncol(my_z)] + my_z[-nrow(my_z), -1] + my_z[-nrow(my_z),
-ncol(my_z)])/4
cols <- jet.colors(length(my_levels) - 1)
zzz <- cut(zz, breaks = my_levels, labels = cols)
persp(my_x, my_y, my_z, theta = -25, phi = 45, expand = 0.5,xlab="x",ylab="y",zlab="f(x,y)",col = as.character(zzz))
p1 <- recordPlot()
data.grid <- expand.grid(x = seq(my_mean[1]-3*sd_vec[1], my_mean[1]+3*sd_vec[1], length.out=200),
y = seq(my_mean[2]-3*sd_vec[2], my_mean[2]+3*sd_vec[2], length.out=200))
q.samp <- cbind(data.grid, prob = dmvnorm(data.grid, mean = my_mean, sigma = V))
p2<-ggplot(q.samp, aes(x, y, z = prob)) +
geom_contour(aes(color = ..level..), bins = 11, size = 1) +
scale_color_gradientn(colours = jet.colors(11)) +
theme_bw()
plot_grid(p1, p2)
Created on 2020-10-31 by the reprex package (v0.3.0)
I think there are two things you need to do:
Set par(mar = c(0, 0, 0, 0)) before calling persp. Ensure you save your default parameters before and reset them afterwards.
Resize your plotting window to give it a wider aspect ratio
So basically you can change your persp call to:
par_store <- par(no.readonly = TRUE) # skip read-only parameters so restoring does not warn
par(mar = c(0, 0, 0, 0))
persp(my_x, my_y, my_z, theta = -25, phi = 45, expand = 0.5,
xlab = "x", ylab = "y", zlab = "f(x,y)", col = as.character(zzz))
p1 <- recordPlot()
par(par_store)
And after resizing the plotting window you get:
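If resizing the window by hand is inconvenient, one option is to fix the aspect ratio when saving. This is only a sketch under assumptions: it uses the built-in volcano and faithfuld data instead of the question's objects, passes the base plot to plot_grid as a function (which needs gridGraphics installed), and the names draw_persp, combined and out_file are made up here.

```r
library(ggplot2)
library(cowplot)

# Base-graphics panel wrapped in a function so cowplot can draw it on demand
draw_persp <- function() {
  old_par <- par(mar = c(0, 0, 0, 0))  # par() returns the previous values of what it sets
  on.exit(par(old_par))                # restore margins when the function exits
  persp(volcano, theta = -25, phi = 45, expand = 0.5)
}

p2 <- ggplot(faithfuld, aes(waiting, eruptions, z = density)) +
  geom_contour() +
  theme_bw()

# rel_widths gives the persp panel a bit more horizontal room
combined <- plot_grid(draw_persp, p2, rel_widths = c(1.3, 1))

# A wide output device plays the role of the resized plotting window
out_file <- file.path(tempdir(), "persp_and_contour.png")
ggsave(out_file, combined, width = 12, height = 5)
```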

Trying to recreate plot with thresholds in ggplot

So basically I'm trying to recreate this plot in ggplot, to match my theme:
and I've come pretty close:
but I can't recreate the threshold axis in my plot. How can I add this to my ggplot? Here is the source code of the original plotting function:
function (data, option = c("alpha", "xi", "quantile"), start = 15,end = NA,
reverse = FALSE, p = NA, ci = 0.95, auto.scale = TRUE, labels = TRUE, ...)
{
if (is.timeSeries(data))
data <- as.vector(series(data))
data <- as.numeric(data)
ordered <- rev(sort(data))
ordered <- ordered[ordered > 0]
n <- length(ordered)
option <- match.arg(option)
if ((option == "quantile") && (is.na(p)))
stop("\nInput a value for the probability p.\n")
if ((option == "quantile") && (p < 1 - start/n)) {
cat("Graph may look strange !! \n\n")
cat(paste("Suggestion 1: Increase `p' above", format(signif(1 -
start/n, 5)), "\n"))
cat(paste("Suggestion 2: Increase `start' above ", ceiling(length(data) *
(1 - p)), "\n"))
}
k <- 1:n
loggs <- logb(ordered)
avesumlog <- cumsum(loggs)/(1:n)
xihat <- c(NA, (avesumlog - loggs)[2:n])
alphahat <- 1/xihat
y <- switch(option, alpha = alphahat, xi = xihat, quantile = ordered *
((n * (1 - p))/k)^(-1/alphahat))
ses <- y/sqrt(k)
if (is.na(end))
end <- n
x <- trunc(seq(from = min(end, length(data)), to = start))
y <- y[x]
ylabel <- option
yrange <- range(y)
if (ci && (option != "quantile")) {
qq <- qnorm(1 - (1 - ci)/2)
u <- y + ses[x] * qq
l <- y - ses[x] * qq
ylabel <- paste(ylabel, " (CI, p =", ci, ")", sep = "")
yrange <- range(u, l)
}
if (option == "quantile")
ylabel <- paste("Quantile, p =", p)
index <- x
if (reverse)
index <- -x
if (auto.scale) {
plot(index, y, ylim = yrange, type = "l", xlab = "",
ylab = "", axes = FALSE, ...)
}
else {
plot(index, y, type = "l", xlab = "", ylab = "", axes = FALSE,
...)
}
axis(1, at = index, labels = paste(x), tick = FALSE)
axis(2)
threshold <- findthreshold(data, x)
axis(3, at = index, labels = paste(format(signif(threshold,
3))), tick = FALSE)
box()
if (ci && (option != "quantile")) {
lines(index, u, lty = 2, col = 2)
lines(index, l, lty = 2, col = 2)
}
if (labels) {
title(xlab = "Order Statistics", ylab = ylabel)
mtext("Threshold", side = 3, line = 3)
}
return(invisible(list(x = index, y = y)))
}
Thanks for your help!
I think you are looking for the sec.axis argument for scale_x_continuous(). To make sure everything lined up, I had to create a function to find every nth value. Hope this helps
library(tibble)
library(ggplot2)
set.seed(1234)
df <- tibble(
x = 1:50,
threshold = round(1+rnorm(1:50), 2),
base_y = c(0.1, runif(49, -1, 2)/5),
mid = cumsum(base_y),
upper = mid + 5,
lower = mid - 5
)
every_nth <- function(x, n) {
x[seq(0, length(x), n)]
}
ggplot(df, aes(x = x)) +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
geom_line(aes(y = mid)) +
scale_x_continuous(
breaks = every_nth(df$x, 10),
sec.axis = dup_axis(
labels = every_nth(df$threshold, 10),
name = "Threshold"
)
) +
theme_minimal()
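To see why the helper keeps the two axes lined up, here is a small check in base R. It is a sketch with made-up threshold values; the point is only that breaks and labels are picked at the same positions and so have the same length.

```r
every_nth <- function(x, n) {
  # Index 0 selects nothing, so this keeps elements n, 2n, 3n, ...
  x[seq(0, length(x), n)]
}

x <- 1:50
threshold <- seq(2, 0.04, length.out = 50)  # illustrative values only

breaks <- every_nth(x, 10)          # positions on the bottom axis
labels <- every_nth(threshold, 10)  # text shown at the same positions on top

# One label per break, otherwise dup_axis() cannot pair them up
stopifnot(length(breaks) == length(labels))
data.frame(position = breaks, top_label = labels)
```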

Equal spacing with multiple atop

I'm trying to create a legend in a ggplot2 graph with multiple lines and a parameter and value on each line. Since I have symbols as variables, this needs to be done with expression. To create new lines, I have used multiple atop commands, but this leads to uneven spacing in the final line. Please see my following example:
library(ggplot2)
N = 25
a = -5
b = 2
sigma = 1
x = runif(N, 0, 10)
y = a + x * b + rnorm(N, sd = sigma)
df = data.frame(x, y)
ggplot(df, aes(x, y)) +
geom_point() +
geom_label(aes(x = 1, y = max(y) - 2),
label = paste0("atop(atop(",
"textstyle(a == ", a, "),",
"textstyle(b == ", b, ")),",
"textstyle(sigma == ", sigma, "))"
), parse = TRUE
)
ggsave("plotmath_atop.png", width = 6, height = 4, scale = 1)
This gives the following plot:
As you can see, the spacing between the lines b=2 and \sigma=1 is noticeably larger than the spacing between the lines a=-5 and b=2.
Is there a way of using expression with multiple line breaks while still having even spacing between each line?
You could use gridExtra::tableGrob:
library(gridExtra)
library(grid)
table_label <- function(label, params=list()) {
params <- modifyList(list(hjust=0, x=0), params)
mytheme <- ttheme_minimal(padding=unit(c(1, 1), "mm"),
core = list(fg_params = params), parse=TRUE)
disect <- strsplit(label, "\\n")[[1]]
m <- as.matrix(disect)
tg <- tableGrob(m, theme=mytheme)
bg <- roundrectGrob(width = sum(tg$widths) + unit(3, "mm"), height = sum(tg$heights) + unit(3, "mm"))
grobTree(bg, tg)
}
txt <- 'a == -5\n
b == 2\n
sigma == 1'
library(ggplot2)
qplot(1:10,1:10) +
annotation_custom(table_label(txt), xmin=0, xmax=5, ymin=7.5)
A simple solution is to avoid the use of expressions, print the sigma letter using the unicode character \u03c3, and use \n for line breaking.
library(ggplot2)
N = 25
a = -5
b = 2
sigma = 1
x = runif(N, 0, 10)
y = a + x * b + rnorm(N, sd = sigma)
df = data.frame(x, y)
lab <- paste0("a = ", a, "\n",
"b = ", b, "\n",
"\u03c3 = ", sigma)
ggplot(df, aes(x, y)) +
geom_point() +
geom_label(aes(x = 1, y = max(y) - 2), label = lab, parse = FALSE)
ggsave("plot_multiline_label.png", width = 6, height = 4, scale = 1)
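With more than a handful of parameters, the label string can be assembled from a named vector instead of being typed out. A minimal sketch in base R; params and lab are just illustrative names.

```r
# Parameter values, named by the symbol to print (\u03c3 is the Greek letter sigma)
params <- c(-5, 2, 1)
names(params) <- c("a", "b", "\u03c3")

# One "name = value" entry per line, joined with newlines
lab <- paste(names(params), "=", params, collapse = "\n")

cat(lab, "\n")
```

The resulting string can be passed as label to geom_label with parse = FALSE, as in the answer above.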

Fix interpolated polar contour plot function to work with current R and (possibly) use ggplot

The question R interpolated polar contour plot shows an excellent way to produce interpolated polar plots in R. I include the very slightly modified version I'm using:
PolarImageInterpolate <- function(
### Plotting data (in cartesian) - will be converted to polar space.
x, y, z,
### Plot component flags
contours=TRUE, # Add contours to the plotted surface
legend=TRUE, # Plot a surface data legend?
axes=TRUE, # Plot axes?
points=TRUE, # Plot individual data points
extrapolate=FALSE, # Should we extrapolate outside data points?
### Data splitting params for color scale and contours
col_breaks_source = 1, # Where to calculate the color breaks from (1=data,2=surface)
# If you know the levels, input directly (i.e. c(0,1))
col_levels = 10, # Number of color levels to use - must match length(col) if
#col specified separately
col = rev(heat.colors(col_levels)), # Colors to plot
# col = rev(heat.colors(col_levels)), # Colors to plot
contour_breaks_source = 1, # 1=z data, 2=calculated surface data
# If you know the levels, input directly (i.e. c(0,1))
contour_levels = col_levels+1, # One more contour break than col_levels (must be
# specified correctly if done manually
### Plotting params
outer.radius = ceiling(max(sqrt(x^2+y^2))),
circle.rads = pretty(c(0,outer.radius)), #Radius lines
spatial_res=1000, #Resolution of fitted surface
single_point_overlay=0, #Overlay "key" data point with square
#(0 = No, Other = number of pt)
### Fitting parameters
interp.type = 1, #1 = linear, 2 = Thin plate spline
lambda=0){ #Used only when interp.type = 2
minitics <- seq(-outer.radius, outer.radius, length.out = spatial_res)
# interpolate the data
if (interp.type ==1 ){
Interp <- akima:::interp(x = x, y = y, z = z,
extrap = extrapolate,
xo = minitics,
yo = minitics,
linear = FALSE)
Mat <- Interp[[3]]
}
else if (interp.type == 2){
library(fields)
grid.list = list(x=minitics,y=minitics)
t = Tps(cbind(x,y),z,lambda=lambda)
tmp = predict.surface(t,grid.list,extrap=extrapolate)
Mat = tmp$z
}
else {stop("interp.type value not valid")}
# mark cells outside circle as NA
markNA <- matrix(minitics, ncol = spatial_res, nrow = spatial_res)
Mat[!sqrt(markNA ^ 2 + t(markNA) ^ 2) < outer.radius] <- NA
### Set contour_breaks based on requested source
if ((length(contour_breaks_source) == 1) & (contour_breaks_source[1] == 1)){
contour_breaks = seq(min(z,na.rm=TRUE),max(z,na.rm=TRUE),
by=(max(z,na.rm=TRUE)-min(z,na.rm=TRUE))/(contour_levels-1))
}
else if ((length(contour_breaks_source) == 1) & (contour_breaks_source[1] == 2)){
contour_breaks = seq(min(Mat,na.rm=TRUE),max(Mat,na.rm=TRUE),
by=(max(Mat,na.rm=TRUE)-min(Mat,na.rm=TRUE))/(contour_levels-1))
}
else if ((length(contour_breaks_source) == 2) & (is.numeric(contour_breaks_source))){
contour_breaks = pretty(contour_breaks_source,n=contour_levels)
contour_breaks = seq(contour_breaks_source[1],contour_breaks_source[2],
by=(contour_breaks_source[2]-contour_breaks_source[1])/(contour_levels-1))
}
else {stop("Invalid selection for \"contour_breaks_source\"")}
### Set color breaks based on requested source
if ((length(col_breaks_source) == 1) & (col_breaks_source[1] == 1))
{zlim=c(min(z,na.rm=TRUE),max(z,na.rm=TRUE))}
else if ((length(col_breaks_source) == 1) & (col_breaks_source[1] == 2))
{zlim=c(min(Mat,na.rm=TRUE),max(Mat,na.rm=TRUE))}
else if ((length(col_breaks_source) == 2) & (is.numeric(col_breaks_source)))
{zlim=col_breaks_source}
else {stop("Invalid selection for \"col_breaks_source\"")}
# begin plot
Mat_plot = Mat
Mat_plot[which(Mat_plot<zlim[1])]=zlim[1]
Mat_plot[which(Mat_plot>zlim[2])]=zlim[2]
image(x = minitics, y = minitics, Mat_plot , useRaster = TRUE, asp = 1, axes = FALSE, xlab = "", ylab = "", zlim = zlim, col = col)
# add contours if desired
if (contours){
CL <- contourLines(x = minitics, y = minitics, Mat, levels = contour_breaks)
A <- lapply(CL, function(xy){
lines(xy$x, xy$y, col = gray(.2), lwd = .5)
})
}
# add interpolated point if desired
if (points){
points(x, y, pch = 21, bg ="blue")
}
# add overlay point (used for trained image marking) if desired
if (single_point_overlay!=0){
points(x[single_point_overlay],y[single_point_overlay],pch=0)
}
# add radial axes if desired
if (axes){
# internals for axis markup
RMat <- function(radians){
matrix(c(cos(radians), sin(radians), -sin(radians), cos(radians)), ncol = 2)
}
circle <- function(x, y, rad = 1, nvert = 500){
rads <- seq(0,2*pi,length.out = nvert)
xcoords <- cos(rads) * rad + x
ycoords <- sin(rads) * rad + y
cbind(xcoords, ycoords)
}
# draw circles
if (missing(circle.rads)){
circle.rads <- pretty(c(0,outer.radius))
}
for (i in circle.rads){
lines(circle(0, 0, i), col = "#66666650")
}
# put on radial spoke axes:
axis.rads <- c(0, pi / 6, pi / 3, pi / 2, 2 * pi / 3, 5 * pi / 6)
r.labs <- c(90, 60, 30, 0, 330, 300)
l.labs <- c(270, 240, 210, 180, 150, 120)
for (i in 1:length(axis.rads)){
endpoints <- zapsmall(c(RMat(axis.rads[i]) %*% matrix(c(1, 0, -1, 0) * outer.radius,ncol = 2)))
segments(endpoints[1], endpoints[2], endpoints[3], endpoints[4], col = "#66666650")
endpoints <- c(RMat(axis.rads[i]) %*% matrix(c(1.1, 0, -1.1, 0) * outer.radius, ncol = 2))
lab1 <- bquote(.(r.labs[i]) * degree)
lab2 <- bquote(.(l.labs[i]) * degree)
text(endpoints[1], endpoints[2], lab1, xpd = TRUE)
text(endpoints[3], endpoints[4], lab2, xpd = TRUE)
}
axis(2, pos = -1.25 * outer.radius, at = sort(union(circle.rads,-circle.rads)), labels = NA)
text( -1.26 * outer.radius, sort(union(circle.rads, -circle.rads)),sort(union(circle.rads, -circle.rads)), xpd = TRUE, pos = 2)
}
# add legend if desired
# this could be sloppy if there are lots of breaks, and that's why it's optional.
# another option would be to use fields:::image.plot(), using only the legend.
# There's an example for how to do so in its documentation
if (legend){
library(fields)
image.plot(legend.only=TRUE, smallplot=c(.78,.82,.1,.8), col=col, zlim=zlim)
# ylevs <- seq(-outer.radius, outer.radius, length = contour_levels+ 1)
# #ylevs <- seq(-outer.radius, outer.radius, length = length(contour_breaks))
# rect(1.2 * outer.radius, ylevs[1:(length(ylevs) - 1)], 1.3 * outer.radius, ylevs[2:length(ylevs)], col = col, border = NA, xpd = TRUE)
# rect(1.2 * outer.radius, min(ylevs), 1.3 * outer.radius, max(ylevs), border = "#66666650", xpd = TRUE)
# text(1.3 * outer.radius, ylevs[seq(1,length(ylevs),length.out=length(contour_breaks))],round(contour_breaks, 1), pos = 4, xpd = TRUE)
}
}
Unfortunately, this function has a few bugs:
a) Even with a purely radial pattern, the produced plot has a distortion whose origin I don't understand:
#example
r <- rep(seq(0.1, 0.9, len = 8), each = 8)
theta <- rep(seq(0, 7/4*pi, by = pi/4), times = 8)
x <- r*sin(theta)
y <- r*cos(theta)
z <- rep(seq(0, 1, len = 8), each = 8)
PolarImageInterpolate(x, y, z)
Why the wiggles between 300° and 360°? The z function is constant in theta, so there's no reason why there should be wiggles.
b) After 4 years, some of the packages loaded have been modified and at least one feature of the function is broken. Setting interp.type = 2 should use thin plate splines for interpolation instead of basic linear interpolation, but it doesn't work:
> PolarImageInterpolate(x, y, z, interp.type = 2)
Warning:
Grid searches over lambda (nugget and sill variances) with minima at the endpoints:
(GCV) Generalized Cross-Validation
minimum at right endpoint lambda = 9.493563e-06 (eff. df= 60.80002 )
predict.surface is now the function predictSurface
Error in image.default(x = minitics, y = minitics, Mat_plot, useRaster = TRUE, :
'z' must be a matrix
The first message is a warning and doesn't worry me, but the second one is actually an error and prevents me from using thin plate splines. Can you help me solve these two problems?
Also, I'd like to "upgrade" to using ggplot2, so if you can give an answer which does that, it would be great. Otherwise, after the bugs are fixed, I'll try asking a specific question which only asks to modify the function so that it uses ggplot2.
For the ggplot2 solution, here is a start. geom_raster allows interpolation, but does not work for polar coordinates. Instead, you can use geom_tile, though then you may need to do the interpolation yourself before passing the values to ggplot.
Of important note: the example data you gave gives an error when working with geom_raster that I believe is caused by the spacing of the values. Here is an example set that works (note, used this blog as a guide, though it is now outdated):
dat_grid <-
expand.grid(x = seq(0,350,10), y = 0:10)
dat_grid$density <- runif(nrow(dat_grid))
ggplot(dat_grid
, aes(x = x, y = y, fill = density)) +
geom_tile() +
coord_polar() +
scale_x_continuous(breaks = seq(0,360,90)) +
scale_fill_gradient2(low = "white"
, mid = "yellow"
, high = "red3"
, midpoint = 0.5)
If you are working from raw data, you might be able to get ggplot to do the work for you. Here is an example working from raw data. There are a lot of manual tinkering things to do, but it is at least an optional starting point:
polarData <-
data.frame(
theta = runif(10000, 0, 2*pi)
, r = log(abs(rnorm(10000, 0, 10)))
)
toCart <-
data.frame(
x = polarData$r * cos(polarData$theta)
, y = polarData$r * sin(polarData$theta)
)
axisLines <-
data.frame(
x = 0
, y = 0
, xend = max(polarData$r)*cos(seq(0, 2*pi, pi/4))
, yend = max(polarData$r)*sin(seq(0, 2*pi, pi/4))
, angle = paste(seq(0, 2, 1/4), "pi") )
ticks <-
data.frame(
label = pretty(c(0, max(polarData$r)) )[-1]
)
ggplot(toCart) +
# geom_point(aes(x = x, y = y)) +
stat_density_2d(aes(x = x, y = y
, fill = ..level..)
, geom = "polygon") +
scale_fill_gradient(low = "white"
, high = "red3") +
theme(axis.text = element_blank()
, axis.title = element_blank()
, axis.line = element_blank()
, axis.ticks = element_blank()) +
geom_segment(data = axisLines
, aes(x = x, y = y
, xend = xend
, yend = yend)) +
geom_label(data = axisLines
, aes(x = xend, y = yend, label = angle)) +
geom_label(data = ticks
, aes(x = 0, y = label, label = label))
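For the "do the interpolation yourself" route, here is a rough sketch on the question's example data. It assumes the akima package is installed, switches to linear interpolation, and draws in Cartesian coordinates with coord_fixed rather than coord_polar; grid_pts, fit, surface and p are illustrative names.

```r
library(akima)
library(ggplot2)

# Example data from the question: 8 radii x 8 angles, z depends on radius only
r <- rep(seq(0.1, 0.9, len = 8), each = 8)
theta <- rep(seq(0, 7/4 * pi, by = pi/4), times = 8)
x <- r * sin(theta)
y <- r * cos(theta)
z <- rep(seq(0, 1, len = 8), each = 8)

# Interpolate onto a regular grid (linear, no extrapolation outside the data)
grid_pts <- seq(-1, 1, length.out = 200)
fit <- akima::interp(x = x, y = y, z = z, xo = grid_pts, yo = grid_pts,
                     linear = TRUE, extrap = FALSE)

# Long format: expand.grid varies x fastest, matching the column-major z matrix
surface <- expand.grid(x = fit$x, y = fit$y)
surface$z <- as.vector(fit$z)

# Drop cells with no interpolated value or outside the unit circle
surface <- surface[!is.na(surface$z) & sqrt(surface$x^2 + surface$y^2) < 1, ]

p <- ggplot(surface, aes(x, y, fill = z)) +
  geom_raster() +
  geom_point(data = data.frame(x, y, z), shape = 21, fill = "blue") +
  coord_fixed() +
  scale_fill_gradientn(colours = rev(heat.colors(10))) +
  theme_void()
print(p)
```

Circles and spokes would still have to be added by hand, for example with geom_path and geom_segment.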
From another post, I learned that the function predict.surface from the fields package, which PolarImageInterpolate uses when interp.type = 2, is deprecated. It has been replaced by a new function, predictSurface, which can be used here instead.
Example:
r <- rep(seq(0.1, 0.9, len = 8), each = 8)
theta <- rep(seq(0, 7/4*pi, by = pi/4), times = 8)
x <- r*sin(theta)
y <- r*cos(theta)
z <- rep(seq(0, 1, len = 8), each = 8)
PolarImageInterpolate(x, y, z, interp.type = 2)
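The changed step can also be tried in isolation. This is a sketch only: it assumes the fields package is installed, lets Tps choose lambda itself instead of passing lambda = 0, uses a coarse 50-point grid, and passes the grid list positionally; fit, grid_list and surf are illustrative names.

```r
library(fields)

# Example data from the question
r <- rep(seq(0.1, 0.9, len = 8), each = 8)
theta <- rep(seq(0, 7/4 * pi, by = pi/4), times = 8)
x <- r * sin(theta)
y <- r * cos(theta)
z <- rep(seq(0, 1, len = 8), each = 8)

# Regular grid on which to evaluate the fitted surface
minitics <- seq(-1, 1, length.out = 50)
grid_list <- list(x = minitics, y = minitics)

# Thin plate spline fit, then evaluation with predictSurface
# (the replacement for the old predict.surface call)
fit <- Tps(cbind(x, y), z)
surf <- predictSurface(fit, grid_list, extrap = FALSE)

# surf$z is the matrix that image() needs inside PolarImageInterpolate
Mat <- surf$z
dim(Mat)
```

Inside the function, the same swap is the line tmp = predict.surface(t,grid.list,extrap=extrapolate), with predictSurface in place of predict.surface.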

ggplot2 grey-scale-schemes: suggestions for clarifying changepoint plot

I can plot multiple simultaneous time series that undergo changepoints and regimes using ggplot2, and I can use colour to make the regimes clear (plotting different sections in different colors using geom_rect). I need to produce a plot that makes it clear where the regimes are without the use of color. With three regimes it is possible to distinguish between the regimes using white, black and gray for shading, but it is difficult to tell them apart if more than three regimes are present.
Below is an example of a plot that I can make using color. I'd be very grateful if someone could suggest a plot that conveys the same information without the use of color.
library(ggplot2)
library(scales)
# generate 3 time series and store them in a data frame
generate_cp_ts <- function(tau, params) {
ts(c(arima.sim(model = list(ar = 0.2), n = tau[1], rand.gen = function(n) params[1] * rnorm(n)), arima.sim(model = list(ar = 0.2), n = tau[2] - tau[1], rand.gen = function(n) params[2] * rnorm(n)), arima.sim(model = list(ar = 0.2), n = tau[3] - tau[2], rand.gen = function(n) params[3] * rnorm(n)), arima.sim(model = list(ar = 0.2), n = tau[4] - tau[3], rand.gen = function(n) params[4] * rnorm(n))))
}
tau <- 100 * (1:4)
ts1 <- generate_cp_ts(tau, c(1.7, 0.3, 1.7, 1.7))
ts2 <- generate_cp_ts(tau, c(0.3, 2, 0.3, 0.9))
ts3 <- generate_cp_ts(tau, c(2, 2, 0.1, 0.7))
tsframe <- data.frame(ts = c(ts1, ts2, ts3), ts_level = factor(paste("Time Series", rep(1:3, each = 400))), time = rep(1:400, 3))
# Work out which colors are needed to color the plot and store in a data frame
CPs <- c(0, tau)
colour.frame <- data.frame(regime.from = rep(CPs[-length(CPs)], each = 3), regime.to = rep(CPs[-1], each = 3), ts_level = factor(paste("Time Series", rep(c(1:3), length(CPs) - 1))), regime = factor(c(0,0,0, 1,1,0, 0,0,1, 0,2,2) + 1))
# Plotting
qplot(x = time, y = ts, data = tsframe, facets = ts_level ~ ., alpha = I(1), geom = "line", ylab = "Time Series", xlab = "Time") +
geom_rect(aes(NULL, NULL, xmin = regime.from, xmax = regime.to, fill = regime), ymin = -6, ymax = 6, data = colour.frame) +
scale_fill_manual(values = alpha(c("blue", "red", "green"), 0.2))
Plot generated by the above code
After you created colour.frame you can insert this code:
tdf <- colour.frame
tdf$xval <- (tdf$regime.from + tdf$regime.to)/2
tdf$yval <- max(tsframe$ts) * 0.8 # if 0.8 is higher (0.9) then the text is set higher
ggplot(tsframe, aes(x = time, y = ts)) +
geom_line() +
facet_grid(ts_level~.) +
geom_vline(xintercept = CPs) + # maybe play around with linetype
geom_text(aes(x = xval, y = yval, label = regime), data = tdf)
which gives this plot:

Resources