Remove whiskers and outliers in R plotly

I have continuous data that I'd like to plot using R's plotly with a box or violin plot without the outliers and whiskers:
set.seed(1)
df <- data.frame(group=c(rep("g1",500),rep("g2",700),rep("g3",600)),
value=c(c(rep(0,490),runif(10,10,15)),abs(rnorm(700,1,10)),c(rep(0,590),runif(10,10,15))),
stringsAsFactors = F)
df$group <- factor(df$group, levels = c("g1","g2","g3"))
I know how to remove outliers in plotly:
plotly::plot_ly(x = df$group, y =df$value, type = 'box', color = df$group, boxpoints = F, showlegend = F)
But I'm still left with the whiskers.
I tried using ggplot2 for that (also limiting the height of the y-axis to that of the 75th percentile):
library(ggplot2)
gp <- ggplot(df, aes(group, value, color = group, fill = group)) + geom_boxplot(outlier.shape = NA, coef = 0) +
scale_y_continuous(limits = c(0, ceiling(max(dplyr::summarise(dplyr::group_by(df, group), tile = quantile(value, probs = 0.75))$tile)))) +
theme_minimal() + theme(legend.position = "none",axis.title = element_blank())
But then trying to convert that to a plotly object doesn't maintain that:
plotly::ggplotly(gp)
Any idea?

This is a workaround.
I changed your plot a bit, first.
library(plotly)
library(dplyr) # for group_by() and summarise() below
# box without outliers
p <- plot_ly(df, x = ~group, y = ~value, type = 'box',
color = ~group, boxpoints = F, showlegend = F,
whiskerwidth = 0, line = list(width = 0)) # no whisker, max or min line
Then I add the medians back to the graph. This requires calculating the medians, matching the colors, and creating the shape lists for Plotly.
Oddly, the first three default colors are used, but in reverse order: g3, g2, g1.
# the medians
res = df %>% group_by(group) %>%
summarise(med = median(value))
# default color list: https://community.plotly.com/t/plotly-colours-list/11730/2
col = rev(c('#1f77b4', '#ff7f0e', '#2ca02c')) # the plot is colored 3, 2, 1
# discrete x-axis; domain default [0, 1]
# default box margin = .08, three groups, each get 1/3 of space
details <- function(col){ # settings shared by every median segment
list(type = 'line',
line = list(color = col, width = 4),
xref = "paper", yref = "y")
}
# horizontal segments/ median
segs = lapply(1:nrow(res),
function(k){
x1 <- k/3 - .08 # if the domain is [0, 1]
x0 <- (k - 1)/3 + .08
y0 <- y1 <- res[k, ]$med
line = list("x0" = x0, "x1" = x1,
"y0" = y0, "y1" = y1)
deets = details(col[k])
c(deets, line)
})
Finally, I added them back onto the plot.
p %>% layout(shapes = segs)
I made the lines obnoxiously wide, but you get the idea.
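The 1/3 and .08 in the segment positions are specific to three groups. As a rough sketch of how the same arithmetic might generalize, the hypothetical helper below (slot_bounds is not part of plotly or of the code above) returns the paper-coordinate bounds for group k of n. It still assumes that the x-axis domain is [0, 1] and that a fixed .08 margin remains a reasonable approximation for other group counts:

```r
# Hypothetical helper: paper-coordinate x bounds for the k-th of n boxes.
# Assumes the x-axis domain is [0, 1] and a fixed margin on each side.
slot_bounds <- function(k, n, margin = 0.08) {
  c(x0 = (k - 1) / n + margin,
    x1 = k / n - margin)
}

# One row per group, columns x0 and x1
bounds <- t(sapply(1:3, slot_bounds, n = 3))
print(bounds)
```

With n = 3 this reproduces the x0 and x1 values used in the lapply calls, so the 3 could be replaced by nrow(res).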
If you want the IQR outline back, you can add that as well. I used functions here too; I figured that the data you've provided is not the actual data, so the functions will serve a purpose.
# include IQR outline
res2 = df %>% group_by(group) %>%
summarise(q1 = setNames(quantile(value, type = 7, 1/4), NULL),
q3 = setNames(quantile(value, type = 7, 3/4), NULL),
med = median(value))
# IQR segments
rects = lapply(1:nrow(res2), # if the domain is [0, 1]
function(k){
x1 <- k/3 - .08
x0 <- (k - 1)/3 + .08
y0 <- res2[k, ]$q1
y1 <- res2[k, ]$q3
line = list(color = col[k], width = 4)
rect = list("x0" = x0, "x1" = x1,
"y0" = y0, "y1" = y1,
type = "rect", xref = "paper",
yref = "y", "line" = line)
rect
})
rects = append(segs, rects)
p %>% layout(shapes = rects)
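Since the shapes depend entirely on the per-group quartiles, it may help to inspect them first. The sketch below uses only base R and the question's example data. For g1 and g3 all three quartiles are 0, because 490 of 500 and 590 of 600 values are 0, so their median segment and IQR rectangle collapse onto the zero line:

```r
set.seed(1)
df <- data.frame(
  group = c(rep("g1", 500), rep("g2", 700), rep("g3", 600)),
  value = c(c(rep(0, 490), runif(10, 10, 15)),
            abs(rnorm(700, 1, 10)),
            c(rep(0, 590), runif(10, 10, 15)))
)

# Quartiles per group (type 7 is quantile()'s default)
quartiles <- t(sapply(split(df$value, df$group), quantile,
                      probs = c(0.25, 0.5, 0.75), names = FALSE))
colnames(quartiles) <- c("q1", "med", "q3")
print(quartiles)
```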

Related

Shaded violin plot by group

I'm trying to produce a variation of a grouped violin plot in R (preferably using ggplot2), similar to the one below:
which was produced by the following reproducible example code:
# Load libraries #
library(tidyverse)
# Create dummy data #
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100),
Y = rgamma(n = 200, shape = 2, rate = 2),
Z = rep(c("Za", "Zb"), rep = 100),
stringsAsFactors = FALSE)
# Grouped violin plot #
df %>%
ggplot(., aes(x = X, y = Y, fill = Z)) +
geom_violin(draw_quantiles = 0.5) +
scale_fill_manual(values = c("Za" = "red", "Zb" = "blue"))
The variation I'd like to have is that the density above the median should have a different shade compared to the density below the median, as in the following plot:
I produced the above (single) violin plot for the combination X = X1 and Z = Za in the data, using the following code:
## Shaded violin plot ##
# Calculate limits and median #
df.lim <- df %>%
filter(X == "X1", Z == "Za") %>%
summarise(Y_min = min(Y),
Y_qnt = quantile(Y, 0.5),
Y_max = max(Y))
# Calculate density, truncate at limits and assign shade category #
df.dens <- df %>%
filter(X == "X1", Z == "Za") %>%
do(data.frame(LOC = density(.$Y)$x,
DENS = density(.$Y)$y)) %>%
filter(LOC >= df.lim$Y_min, LOC <= df.lim$Y_max) %>%
mutate(COL = ifelse(LOC > df.lim$Y_qnt, "Empty", "Filled"))
# Find density values at limits #
df.lim.2 <- df.dens %>%
filter(LOC == min(LOC) | LOC == max(LOC))
# Produce shaded single violin plot #
df.dens %>%
ggplot(aes(x = LOC)) +
geom_area(aes(y = DENS, alpha = COL), fill = "red") +
geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
geom_path(aes(y = DENS)) +
geom_path(aes(y = -DENS)) +
geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
coord_flip() +
scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))
As you will notice in the code, I'm building the violin plot from scratch using the density function horizontally and then flipping the axes. The problem arises when I try to produce a grouped violin plot mainly because the axis in which the groups X and Z will appear, is already used for the "height" of the density. I did try to reach the same result by repeating all the calculations by groups but I'm stuck in the final step:
## Shaded grouped violin plot ##
# Calculate limits and median by group #
df.lim <- df %>%
group_by(X, Z) %>%
summarise(Y_min = min(Y),
Y_qnt = quantile(Y, 0.5),
Y_max = max(Y))
# Calculate density, truncate at limits and assign shade category by group #
df.dens <- df %>%
group_by(X, Z) %>%
do(data.frame(LOC = density(.$Y)$x,
DENS = density(.$Y)$y)) %>%
left_join(., df.lim, by = c("X", "Z")) %>%
filter(LOC >= Y_min, LOC <= Y_max) %>%
mutate(COL = ifelse(LOC > Y_qnt, "Empty", "Filled"))
# Find density values at limits by group #
df.lim.2 <- df.dens %>%
group_by(X, Z) %>%
filter(LOC == min(LOC) | LOC == max(LOC))
# Produce shaded grouped violin plot #
df.dens %>%
ggplot(aes(x = LOC, group = interaction(X, Z))) +
# The following two lines don't work when included #
#geom_area(aes(y = DENS, alpha = COL), fill = "red") +
#geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
geom_path(aes(y = DENS)) +
geom_path(aes(y = -DENS)) +
geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
coord_flip() +
scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))
Running the code above will produce the outline of the violin plots for each group, each one on top of the other. But once I try to include the geom_area lines, the code fails.
My gut feeling tells me that I would need to somehow produce the "shaded" violin plot as a new geom which can then be used under the general structure of ggplot2 graphics but I have no idea how to do that, as my coding skills don't extend that far. Any help or pointers, either along my line of thought or in a different direction would be much appreciated. Thank you for your time.
Idea
For the fun of it, I hacked together a quick half-violin geom. It is basically a lot of copy and paste from GeomViolin, and in order to make it run I had to access some internal ggplot2 functions that are not exported, using :::. This means that this solution may stop working in the future if the ggplot2 team decides to change their internal functions.
However, this solution works and you can specify the alpha level of both the upper and the lower part. The geom assumes that you are providing just one quantile. The code is only superficially tested but it gives you an idea of how this can be done. As said it is in large part a simple copy & paste from GeomViolin where I added some code which finds out which values are below and above the quantile and splits the underlying GeomPolygon in 2 parts, as this function uses just a single alpha value. It works with groups and coord_flip likewise.
Code
library(ggplot2) # ggproto, GeomViolin, GeomPolygon, GeomPath, layer
library(grid) # grobTree
GeomHalfViolin <- ggproto("GeomHalfViolin", GeomViolin,
draw_group = function (self, data, ..., draw_quantiles = NULL,
alpha_upper = .5, alpha_lower = 1) {
data <- transform(data, xminv = x - violinwidth * (x - xmin),
xmaxv = x + violinwidth * (xmax - x))
newdata <- rbind(transform(data, x = xminv)[order(data$y),
], transform(data, x = xmaxv)[order(data$y, decreasing = TRUE),
])
newdata <- rbind(newdata, newdata[1, ])
if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <=
1))
stopifnot(length(draw_quantiles) <= 1)
## need to add ggplot::: to access ggplot2 internal functions here and there
quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles)
###------------------------------------------------
## find out where the quantile is supposed to be
quantile_line <- unique(quantiles$y)
## which y values are below this quantile?
ind <- newdata$y <= quantile_line
## set the alpha values accordingly
newdata$alpha[!ind] <- alpha_upper
newdata$alpha[ind] <- alpha_lower
###------------------------------------------------
aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data),
c("x", "y", "group")), drop = FALSE]
aesthetics$alpha <- rep(1, nrow(quantiles))
both <- cbind(quantiles, aesthetics)
both <- both[!is.na(both$group), , drop = FALSE]
quantile_grob <- if (nrow(both) == 0) {
zeroGrob()
}
else {
GeomPath$draw_panel(both, ...)
}
###------------------------------------------------
## GeomPolygon uses a single alpha value by default
## Hence, split the violin in two parts
ggplot2:::ggname("geom_half_violin",
grobTree(GeomPolygon$draw_panel(newdata[ind, ], ...),
GeomPolygon$draw_panel(newdata[!ind, ], ...),
quantile_grob))
###------------------------------------------------
}
else {
ggplot2:::ggname("geom_half_violin", GeomPolygon$draw_panel(newdata,
...))
}
}
)
geom_half_violin <- function(mapping = NULL, data = NULL, stat = "ydensity",
position = "dodge", ..., draw_quantiles = NULL,
alpha_upper = .5, alpha_lower = 1,
trim = TRUE, scale = "area",
na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {
layer(data = data, mapping = mapping, stat = stat, geom = GeomHalfViolin,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles,
alpha_lower = alpha_lower, alpha_upper = alpha_upper,
na.rm = na.rm, ...))
}
library(tidyverse)
# Create dummy data #
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100),
Y = rgamma(n = 200, shape = 2, rate = 2),
Z = rep(c("Za", "Zb"), rep = 100),
stringsAsFactors = FALSE)
# Grouped violin plot #
df %>%
ggplot(., aes(x = X, y = Y, fill = Z)) +
geom_half_violin(draw_quantiles = 0.5, alpha_upper = .1) +
scale_fill_manual(values = c("Za" = "red", "Zb" = "blue"))
# no groups
df %>% filter(Z == "Za") %>%
ggplot(., aes(x = X, y = Y)) +
geom_half_violin(draw_quantiles = 0.5, alpha_upper = .1, fill = "red") +
scale_fill_manual(values = c("Za" = "red", "Zb" = "blue")) +
coord_flip()
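To see the splitting step in isolation, here is a minimal base-R sketch (no ggplot2 involved; the variable names are illustrative) that computes a density for one sample and divides it at a cut point, mirroring the ind <- newdata$y <= quantile_line logic in draw_group. It uses the sample median as the cut point, which is an assumption and may not coincide exactly with the quantile line the geom computes:

```r
set.seed(321)
y <- rgamma(100, shape = 2, rate = 2)

# Density restricted to the data range, similar in spirit to trim = TRUE
d <- density(y, from = min(y), to = max(y))
cut_point <- median(y)

# Logical index: TRUE for the part at or below the cut point
below <- d$x <= cut_point
lower_part <- data.frame(y = d$x[below],  dens = d$y[below],  alpha = 1)
upper_part <- data.frame(y = d$x[!below], dens = d$y[!below], alpha = 0.1)

# Every density point ends up in exactly one of the two parts
nrow(lower_part) + nrow(upper_part) == length(d$x)
```

Each part can then be drawn as its own polygon with its own alpha, which is what the two GeomPolygon$draw_panel calls do.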
Graphs

R Plotly jittered boxplot with NAs width

I am plotting the grouped boxplot with jittering with the following function:
plot_boxplot <- function(dat) {
# taking one row of each joined_group to be able to plot it
allx <- dat %>%
mutate(y = median(y, na.rm = TRUE)) %>%
group_by(joined_group) %>%
sample_n(1) %>%
ungroup()
p <- dat %>%
plotly::plot_ly() %>%
# plotting all the groups 1:20
plotly::add_trace(data = allx,
x = ~as.numeric(joined_group),
y = ~y,
type = "box",
hoverinfo = "none",
boxpoints = FALSE,
color = NULL,
opacity = 0,
showlegend = FALSE) %>%
# plotting the boxes
plotly::add_trace(data = dat,
x = ~as.numeric(joined_group),
y = ~y,
color = ~group1,
type = "box",
hoverinfo = "none",
boxpoints = FALSE,
showlegend = FALSE) %>%
# adding ticktext
layout(xaxis = list(tickvals = 1:20,
ticktext = rep(levels(dat$group1), each = 4)))
p <- p %>%
# adding jittering
add_markers(data = dat,
x = ~jitter(as.numeric(joined_group), amount = 0.2),
y = ~y,
color = ~group1,
showlegend = FALSE)
p
}
The problem is that when some of the levels have NA as y variable the width of the jittered boxes changes. Here is an example:
library(plotly)
library(dplyr)
set.seed(123)
dat <- data.frame(group1 = factor(sample(letters[1:5], 100, replace = TRUE)),
group2 = factor(sample(LETTERS[21:24], 100, replace = TRUE)),
y = runif(100)) %>%
dplyr::mutate(joined_group = factor(
paste0(group1, "-", group2)
))
# do the plot with all the levels
p1 <- plot_boxplot(dat)
# now the group1 e is having NAs as y values
dat$y[dat$group1 == "e"] <- NA
# create the plot with missing data
p2 <- plot_boxplot(dat)
# creating the subplot to see that the width has changed:
subplot(p1, p2, nrows = 2)
The problem is that the width of boxes in both plots is different:
I've realised that the boxes have the same size without jittering so I know that the jittering is "messing" with the width but I don't know how to fix that.
Does anyone know how to make the width in both jittered plots exactly the same?
I see two separate plot shifts: one due to jittering and one due to the NAs.
The first can be solved by declaring a new jitter function with a fixed seed:
fixed_jitter <- function (x, factor = 1, amount = NULL) {
set.seed(42)
jitter(x, factor, amount)
}
and using it instead of jitter in the add_markers call.
The second can be solved by assigning -1 instead of NA and setting
yaxis = list(range = c(0, ~max(1.1 * y)))
as a second argument to layout, so that the -1 placeholders fall below the visible range.
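A small check of the fixed-seed idea, using only base R: because the seed is reset before every call, the same input always receives the same offsets, so two plots built from data of the same length are jittered identically.

```r
fixed_jitter <- function(x, factor = 1, amount = NULL) {
  set.seed(42)  # note: this resets the global RNG state on every call
  jitter(x, factor, amount)
}

x <- rep(1:3, each = 4)
a <- fixed_jitter(x, amount = 0.2)
b <- fixed_jitter(x, amount = 0.2)

identical(a, b)         # same offsets on both calls
all(abs(a - x) <= 0.2)  # offsets stay within +/- amount
```

One caveat of this design is the side effect on the global random number stream: any random draws made after the call are also determined by the fixed seed.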

Is output from ggplotly not a full-featured plotly object?

The purpose of the code is to produce an interactive plotly chart with shaded vertical areas on specified subsets on X-axis.
The first step is to construct a ggplot2 object, with shaded vertical areas constructed using geom_rect, then use ggplotly to produce a plotly object.
Since ggplotly does not produce an output which contains the shaded vertical areas anymore, I am adding them to the ggplotly output (which is a plotly object) by using the plotly function add_lines.
However, this approach does not work. The approach that works is to start from a natively-built plotly object and then using plotly function add_lines.
Does this mean that output from ggplotly is not a full-featured plotly object?
The reproducible example is below. One can change values of logical variables useOnlyPlotly (line 67) and useGeomRect (line 66) to see the behaviors described above
require(tidyverse)
require(plotly)
require(lubridate)
plotShadedAreaUsingGeomBarsFunc <- function(colorArea, dataY){
ggplot2::geom_bar(data = trimmedRecessionsDates, inherit.aes = FALSE,
aes_(x = quote(MidPoint), y = base::max(dataY)), # y = Inf doesn't work
stat = "identity",width = 0.1,
# position = "stack",
fill = colorArea, alpha = 0.2)
}
plotShadedAreaUsingGeomRectFunc <- function(colorArea, dataY){
ggplot2::geom_rect(data = trimmedRecessionsDates, inherit.aes = FALSE,
aes(xmin = as.Date(Peak), xmax = as.Date(Trough), ymin = -Inf, ymax = +Inf),
fill = colorArea,
alpha = 0.2)
}
# dates
dateOne <- lubridate::ymd("2000-1-1")
dateTwo <- lubridate::ymd("2004-1-1")
dateThree <- lubridate::ymd("2009-1-1")
dateFour <- lubridate::ymd("2013-1-1")
dateFive <- lubridate::ymd("2017-12-31")
PeakDates <- c(lubridate::ymd("2001-03-01"), lubridate::ymd("2007-12-01"))
TroughDates <- c(lubridate::ymd("2001-11-01"), lubridate::ymd("2008-08-31"))
sequenceDates <- seq(dateOne, dateFive, by="month")
sequenceInRecession <- c(rep(0,length(sequenceDates)))
sequenceInRecession <- base::replace(sequenceInRecession, list = c(15,16,17,18,19,20,21,22,23,96,97,98,99,100), values = c(rep(1,14)))
sequenceInRecession <- base::replace(sequenceInRecession, list = c(101,102,103,104,105,106,107,108,109,110,111,112,113,114), values = c(rep(1,14)))
dataFrameRecessionDates <- data.frame(Dates = sequenceDates, InRecession = sequenceInRecession)
dataFrameRecessionDates$Dates <- lubridate::as_date(dataFrameRecessionDates$Dates)
#data
theDataFrame <- data.frame(Dates = c(dateOne, dateTwo, dateThree, dateFour, dateFive), SomeValues = c(0.2, 2.8, 4.5, 9.8, -0.3),
season = c("SeasOne","SeasTwo","SeasOne","SeasOne","SeasTwo"))
trimmedRecessionsDates <- data.frame(Peak = PeakDates, Trough = TroughDates)
# define midPoint as middle point between Peak and Trough
trimmedRecessionsDates$MidPoint = trimmedRecessionsDates$Peak + floor((trimmedRecessionsDates$Trough - trimmedRecessionsDates$Peak)/2)
trimmedRecessionsDates$MidPoint <- base::as.Date(trimmedRecessionsDates$MidPoint)
colNamesDataFrame <- colnames(theDataFrame)[2:2]
valMax <- base::max(sapply(theDataFrame[colNamesDataFrame], max, na.rm = TRUE))
valMin <- base::min(sapply(theDataFrame[colNamesDataFrame], min, na.rm = TRUE))
dataFrameRecessionDates$InRecession[dataFrameRecessionDates$InRecession %in% 1] <- valMax + 0.2*base::abs(valMax)
dataFrameRecessionDates$InRecession[dataFrameRecessionDates$InRecession %in% 0] <- valMin - 0.2*base::abs(valMin)
ggplotObjUsingGeomBar <- ggplot2::ggplot(data = theDataFrame, aes(x = Dates, y = SomeValues, color = season)) +
ggplot2::geom_line() +
plotShadedAreaUsingGeomBarsFunc('turquoise3', theDataFrame$SomeValues)
ggplotObjUsingGeomRect <- ggplot2::ggplot(data = theDataFrame, aes(x = Dates, y = SomeValues)) +
ggplot2::geom_line() +
plotShadedAreaUsingGeomRectFunc('turquoise3', theDataFrame$SomeValues)+
ggplot2::theme_bw()
useGeomRect = TRUE
useOnlyPlotly = TRUE
thePlotlyObjToAnalyze <- plot_ly()
if (useOnlyPlotly)
{
thePlotlyObjToAnalyze <- plot_ly(data = theDataFrame, x = ~Dates, y = ~SomeValues) %>%
add_lines(data = theDataFrame, x = ~Dates, y = ~SomeValues,
line = list(width = 3), hoverinfo = "x + y")
} else {
if (useGeomRect)
{
thePlotlyObjToAnalyze <- hide_legend(ggplotly(ggplotObjUsingGeomRect))
} else {
thePlotlyObjToAnalyze <- hide_legend(ggplotly(ggplotObjUsingGeomBar))
}
}
(thePlotlyObjToAnalyze %>%
plotly::add_lines(data = dataFrameRecessionDates,
x = ~Dates, y = ~InRecession,
line = list(width = 0),
fill = "tozerox",
fillcolor = "rgba(64, 64, 64, 0.3)",
showlegend = F,
hoverinfo = "none"))
Update: Below is code based on an answer to another question, but unfortunately it did not work for me:
library(plotly)
library(ggplot2)
useOnlyPlotly <- FALSE
thePlot <- plot_ly()
if (useOnlyPlotly)
{
thePlot <- plot_ly() %>%
add_trace(data = economics, x = ~date, y = ~unemploy, type="scatter", mode = "lines")
}else{
theGgplot2Obj <- ggplot(data = economics, aes(x = date, y = unemploy)) + geom_line()
thePlot <- ggplotly(theGgplot2Obj)
thePlot[['x']][['layout']][['shapes']] <- c()
}
( thePlot <- layout(thePlot,
shapes = list(
list(type = "rect",
fillcolor = "blue", line = list(color = "blue"), opacity = 0.5,
x0 = "1980-01-01", x1 = "1990-01-01",
y0 = 6000, y1 = 8000
)
)
)
)
Your idea of using add_lines combined with fill = "tozerox" is good, but the gaps between your shaded areas will be problematic; you would probably need to add NaN values in between to get it right.
The real problem is that your input dates are strings and Plotly stores the dates as integers (milliseconds since the epoch). So we would need to convert the dates first and then plot them.
x0 = as.integer(as.POSIXct(trimmedRecessionsDates$Peak[[i]])) * 1000
thePlotlyObjToAnalyze$x$layout$shapes <- NULL # clear existing shapes (the element is named shapes)
shapes = list()
for (i in seq_len(nrow(trimmedRecessionsDates))) {
shapes[[i]] = list(type = "rect",
fillcolor = "blue", line = list(color = "blue"), opacity = 0.5,
x0 = as.integer(as.POSIXct(trimmedRecessionsDates$Peak[[i]])) * 1000,
x1 = as.integer(as.POSIXct(trimmedRecessionsDates$Trough[[i]])) * 1000,
y0 = 0,
y1 = 1,
yref = 'paper'
)
}
thePlotlyObjToAnalyze <- layout(thePlotlyObjToAnalyze,
shapes = shapes
)
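The date conversion can be checked on its own. The sketch below wraps it in a hypothetical helper (to_plotly_ms is not a plotly function) and converts back to Date to confirm the round trip. It uses as.numeric rather than as.integer, which is a deliberate deviation from the code above to stay clear of the 32-bit integer range for dates far in the future:

```r
# Hypothetical helper: Date -> milliseconds since 1970-01-01 (UTC midnight)
to_plotly_ms <- function(d) as.numeric(as.POSIXct(d)) * 1000

peaks <- as.Date(c("2001-03-01", "2007-12-01"))
ms <- to_plotly_ms(peaks)

# Round trip back to Date to check the conversion
back <- as.Date(as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC"))
all(back == peaks)
```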

plotly linetype property setting in R

I am trying to set the line type ("solid", "dash", "dot", etc.) for multiple lines in plotly. I have a column (a factor variable) in my data frame which specifies the type of line.
Below is the sample code I am working with.
mydf <- data.frame(x = c(1:10), y1 = c(11:20), y2 = c(21:30))
mydf1 <- gather(mydf,'var', 'val', -x)
mydf1$lt <- factor(c(rep("solid",10),rep("dot",10)))
pal <- RColorBrewer::brewer.pal(nlevels(mydf1$lt),"Set1")
p <-plot_ly(mydf1, x = x, y = val, type = 'line', color = var,colors = pal,line = list(width = 3, dash = lt))
p<- layout(p,title = "Hello", annotations = list(x = mydf1$x, y = mydf1$var))
p
mydf1$lt specifies the required line type.
For the above example, y1 must be solid line and y2 must be dotted line.
I can solve the issue by individually adding lines using add_trace(). I am looking for a more concise, elegant way of doing the same. Below is one possible solution.
p1 <- plot_ly(mydf)
p1 <- add_trace(p1, x = x, y = y1, line = list(dash = "dash"))
p1 <- add_trace(p1, x = x, y = y2, line = list(dash = "solid"))
p1
"For the above example, y1 must be solid line and y2 must be dotted line."
You could do
library(ggplot2)
library(tidyr)
library(plotly)
mydf <- data.frame(x = c(1:10), y1 = c(11:20), y2 = c(21:30))
mydf1 <- gather(mydf,'var', 'val', -x)
mydf1$lt <- factor(c(rep("solid",10),rep("dot",10)))
pal <- RColorBrewer::brewer.pal(nlevels(mydf1$lt),"Set1")
p <-plot_ly(
transform(mydf1, lt=c(solid="solid", dot="3")[lt]),
x = x,
y = val,
type = 'line',
color = var,
colors = pal,
line = list(dash = lt)
)
p <- layout(
p,
title = "Hello",
annotations = list(x = mydf1$x, y = mydf1$var)
)
p
Or c(solid="3", dot="solid")[lt] if you want it the other way around.
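The code above relies on the older plotly interface, where bare column names were evaluated inside the data. As a hedged sketch for plotly 4.x, where columns are referenced with formulas, the linetype and linetypes arguments of plot_ly can map a column to dash styles. The long data frame is built by hand here to stay independent of tidyr, and the plotly call is skipped if the package is not installed:

```r
mydf <- data.frame(x = 1:10, y1 = 11:20, y2 = 21:30)

# Long format built with base R (same shape as the gather() result)
mydf1 <- data.frame(
  x   = rep(mydf$x, times = 2),
  var = rep(c("y1", "y2"), each = nrow(mydf)),
  val = c(mydf$y1, mydf$y2)
)

if (requireNamespace("plotly", quietly = TRUE)) {
  # linetypes are assigned to the levels of var in order: y1 solid, y2 dot
  p <- plotly::plot_ly(
    mydf1, x = ~x, y = ~val,
    type = "scatter", mode = "lines",
    color = ~var, linetype = ~var,
    linetypes = c("solid", "dot")
  )
  # print p (or just type p at the console) to display the plot
}
```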

Adding a manual legend in ggplot2?

I'm doing a comparison chart of two different estimates of the same time series data. I'm filling the area between the two series in green if the original estimate is more than the latest estimate, and red otherwise.
I've got that part working, but I'd like to add a legend for the fill colors. I tried scale_fill_manual towards the bottom of the code, but it doesn't seem to be doing anything?
Here's the code:
library(ggplot2)
library(scales)
library(colorspace)
# Return a polygon that only plots between yLower and yUpper when yLower is
# less than yUpper.
getLowerPolygon = function(x, yLower, yUpper) {
# Create the table of coordinates
poly = data.frame(
x = numeric(),
y = numeric())
lastReversed = (yUpper[1] < yLower[1])
for (r in 1:length(x)) {
reversed = (yUpper[r] < yLower[r])
if (reversed != lastReversed) {
# Between points r-1 and r, the series intersected, so we need to
# change the polygon from visible to invisible or v.v. In either
# case, just add the intersection between those two segments to the
# polygon. Algorithm from:
# https://en.wikipedia.org/wiki/Line-line_intersection
# First line: x1,y1 - x2,y2
x1 = x[r-1]
y1 = yLower[r-1]
x2 = x[r]
y2 = yLower[r]
# Second line: x3,y3 - x4,y4
x3 = x[r-1]
y3 = yUpper[r-1]
x4 = x[r]
y4 = yUpper[r]
# Calculate determinants
xy12 = det(matrix(c(x1, y1, x2, y2), ncol = 2))
xy34 = det(matrix(c(x3, y3, x4, y4), ncol = 2))
x12 = det(matrix(c(x1, 1, x2, 1), ncol = 2))
x34 = det(matrix(c(x3, 1, x4, 1), ncol = 2))
y12 = det(matrix(c(y1, 1, y2, 1), ncol = 2))
y34 = det(matrix(c(y3, 1, y4, 1), ncol = 2))
# Calculate fraction pieces
xn = det(matrix(c(xy12, x12, xy34, x34), ncol = 2))
yn = det(matrix(c(xy12, y12, xy34, y34), ncol = 2))
d = det(matrix(c(x12 , y12, x34, y34), ncol = 2))
# Calculate intersection
xi = xn / d
yi = yn / d
# Add the point
poly[nrow(poly)+1,] = c(xi, yi)
}
lastReversed = reversed
# http://stackoverflow.com/questions/2563824
poly[nrow(poly)+1,] = c(x[r], min(yLower[r], yUpper[r]))
}
poly = rbind(poly, data.frame(
x = rev(x),
y = rev(yUpper)))
return(poly)
}
getComparisonPlot = function(data, title, lower_name, upper_name,
x_label, y_label, legend_title = '') {
lightGreen = '#b0dd8d'
lightRed = '#fdba9a'
darkGray = RGB(.8, .8, .8)
midGray = RGB(.5, .5, .5)
plot = ggplot(data, aes(x = x))
plot = plot + geom_polygon(
aes(x = x, y = y),
data = data.frame(
x = c(data$x, rev(data$x)),
y = c(data$yLower, rev(data$yUpper))
),
fill = lightRed)
coords = getLowerPolygon(data$x, data$yLower, data$yUpper)
plot = plot + geom_polygon(
aes(x = x, y = y),
data = coords,
fill = lightGreen)
plot = plot + geom_line(
aes(y = yUpper, color = 'upper'),
size = 0.5)
plot = plot + geom_line(
aes(y = yLower, color = 'lower'),
size = 0.5)
plot = plot +
ggtitle(paste(title, '\n', sep='')) +
xlab(x_label) +
ylab(y_label) +
scale_y_continuous(labels = comma)
# http://stackoverflow.com/a/10355844/106302
plot = plot + scale_color_manual(
name = legend_title,
breaks = c('upper' , 'lower'),
values = c('gray20', 'gray50'),
labels = c(upper_name, lower_name))
plot = plot + scale_fill_manual(
name = 'Margin',
breaks = c('upper', 'lower'),
values = c(lightGreen, lightRed),
labels = c('Over', 'Under'))
return(plot)
}
print(getComparisonPlot(
data = data.frame(
x = 1:20,
yLower = 1:20 %% 5 + 2,
yUpper = 1:20 %% 7
),
title = 'Comparison Chart',
lower_name = 'Latest',
upper_name = 'Original',
x_label = 'X axis',
y_label = 'Y axis',
legend_title = 'Thing'
))
Here's an image of the chart, I think it is a cool technique:
I'm also open to any other suggestions for improving my ggplot code.
ggplot2 needs you to map the polygons' fill aesthetic to some variable or, in this case, simply to "label" the types of polygons (i.e. 'upper' and 'lower'). You do this by passing a string with the respective label to the fill aesthetic inside aes() of geom_polygon(). What you are doing is passing a fixed colour for each polygon outside aes(), which is not mapped to anything that ggplot2 can build a legend from. It's kind of a "hard coded colour" =P.
Well, here are the changes inside getComparisonPlot:
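# full area between the two series, drawn first as in the question
plot = plot + geom_polygon(
aes(x = x, y = y, fill = "lower"),
data = data.frame(
x = c(data$x, rev(data$x)),
y = c(data$yLower, rev(data$yUpper))
))
# polygon from getLowerPolygon(), drawn on top (coords is computed as in the question)
plot = plot + geom_polygon(
aes(x = x, y = y, fill = "upper"),
data = coords)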
One more thing: note that the strings passed to the fill aesthetic coincide with the breaks passed to scale_fill_manual. This is necessary for the legend to map things correctly.
plot = plot + scale_fill_manual(
name = 'Margin',
breaks = c('upper', 'lower'), # <<< corresponds to fill aesthetic labels
values = c(lightGreen, lightRed),
labels = c('Over', 'Under'))
Result:
hope it helps.
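The same idea in a stripped-down form, as a sketch that is independent of the comparison-chart code: two triangles (made-up coordinates) each get a constant label inside aes(), and scale_fill_manual turns those labels into colours and legend entries. The values vector is named here, which is a small deviation from the code above that makes the label-to-colour pairing explicit. The plot is only built if ggplot2 is installed:

```r
# Colours keyed by the labels used inside aes(fill = ...)
fill_values <- c(upper = "#b0dd8d", lower = "#fdba9a")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  tri <- data.frame(x = c(0, 1, 0.5), y = c(0, 0, 1))
  p <- ggplot() +
    geom_polygon(aes(x = x, y = y, fill = "upper"), data = tri) +
    geom_polygon(aes(x = x + 1, y = y, fill = "lower"), data = tri) +
    scale_fill_manual(
      name   = "Margin",
      breaks = c("upper", "lower"),
      values = fill_values,
      labels = c("Over", "Under")
    )
  # print(p) shows two triangles and a "Margin" legend with Over and Under
}
```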
