R: ggplot2: plot initiating at the axes

I'm trying to respond to a reviewer who wants some changes to a figure. I am using ggplot2 to generate Kaplan-Meier curves, and the reviewer wants the X-axis to start at 0. The default in ggkmTable adds some space between 0 and the Y-axis. I can't figure out what to change to make it look right.
Here is my code:
ggkmTable <- function(sfit, table=TRUE,returns = FALSE,
xlabs = "Time in Years", ylabs = "Survival Probability",
ystratalabs = NULL, ystrataname = NULL,
timeby = 100, main = "Kaplan-Meier Plot",
pval = TRUE, ...) {
require(plyr)
require(ggplot2)
require(survival)
require(gridExtra)
if(is.null(ystratalabs)) {
ystratalabs <- as.character(levels(summary(sfit)$strata))
}
m <- max(nchar(ystratalabs))
if(is.null(ystrataname)) ystrataname <- "Strata"
times <- seq(0, max(sfit$time), by = timeby)
.df <- data.frame(time = sfit$time, n.risk = sfit$n.risk,
n.event = sfit$n.event, surv = sfit$surv, strata = summary(sfit, censored = T)$strata,
upper = sfit$upper, lower = sfit$lower)
levels(.df$strata) <- ystratalabs
zeros <- data.frame(time = 0, surv = 1, strata = factor(ystratalabs, levels=levels(.df$strata)),
upper = 1, lower = 1)
.df <- rbind.fill(zeros, .df)
d <- length(levels(.df$strata))
p <- ggplot(.df, aes(time, surv, group = strata)) +
geom_step(aes(linetype = strata), size = 0.7) +
theme_bw() +
theme(axis.title.x = element_text(vjust = 0.5)) +
scale_x_continuous(xlabs, breaks = times, limits = c(0, max(sfit$time))) +
scale_y_continuous(ylabs, limits = c(0, 1)) +
theme(panel.grid.minor = element_blank()) +
theme(legend.position = "bottom") +
theme(legend.key = element_rect(colour = NA)) +
labs(linetype = ystrataname) +
theme(plot.margin = unit(c(0, 1, .5, ifelse(m < 10, 1.5, 2.5)), "lines")) +
ggtitle(main)
if(pval) {
sdiff <- survdiff(eval(sfit$call$formula), data = eval(sfit$call$data))
pval <- pchisq(sdiff$chisq, length(sdiff$n)-1, lower.tail = FALSE)
pvaltxt <- paste("p =", signif(pval, 3))
p <- p + annotate("text", x = 0.6 * max(sfit$time), y = 0.1, label = pvaltxt)
}
## Create a blank plot for place-holding
## .df <- data.frame()
blank.pic <- ggplot(.df, aes(time, surv)) +
geom_blank() +
theme_bw() +
theme(axis.text.x = element_blank(), axis.text.y = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.ticks = element_blank(), panel.grid.major = element_blank(),
panel.border = element_blank())
if(table) {
## Create table graphic to include at-risk numbers
risk.data <- data.frame(strata = summary(sfit, times = times, extend = TRUE)$strata,
time = summary(sfit, times = times, extend = TRUE)$time,
n.risk = summary(sfit, times = times, extend = TRUE)$n.risk)
data.table <- ggplot(risk.data, aes(x = time, y = strata, label = format(n.risk, nsmall = 0))) +
#, color = strata)) +
geom_text(size = 3.5) +
theme_bw() +
scale_y_discrete(breaks = as.character(levels(risk.data$strata)), labels = ystratalabs) +
# scale_y_discrete(#format1ter = abbreviate,
# breaks = 1:3,
# labels = ystratalabs) +
scale_x_continuous("Numbers at risk", limits = c(0, max(sfit$time))) +
theme(axis.title.x = element_text(size = 10, vjust = 1), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.border = element_blank(),
axis.text.x = element_blank(), axis.ticks = element_blank(),
axis.text.y = element_text(face = "bold", hjust = 1))
data.table <- data.table + theme(legend.position = "none") +
xlab(NULL) + ylab(NULL)
data.table <- data.table +
theme(plot.margin = unit(c(-1.5, 1, 0.1, ifelse(m < 10, 2.5, 3.5)-0.28 * m), "lines"))
## Plotting the graphs
## p <- ggplotGrob(p)
## p <- addGrob(p, textGrob(x = unit(.8, "npc"), y = unit(.25, "npc"), label = pvaltxt,
## gp = gpar(fontsize = 12)))
grid.arrange(p, blank.pic, data.table,
clip = FALSE, nrow = 3, ncol = 1,
heights = unit(c(2, .1, .25),c("null", "null", "null")))
if(returns) {
a <- arrangeGrob(p, blank.pic, data.table, clip = FALSE,
nrow = 3, ncol = 1, heights = unit(c(2, .1, .25),c("null", "null", "null")))
return(a)
}
}
else {
## p <- ggplotGrob(p)
## p <- addGrob(p, textGrob(x = unit(0.5, "npc"), y = unit(0.23, "npc"),
## label = pvaltxt, gp = gpar(fontsize = 12)))
print(p)
if(returns) return(p)
}
}

The answer first, then an explanation.
Add the line:
coord_cartesian(xlim=c(0,max(sfit$time)))
to your ggplot object.
A simple example:
df <- data.frame(c(runif(10,0,1)),runif(10,0,1))
names(df) <- c("x","y")
p <- ggplot(df, aes(x,y)) +
geom_point() +
scale_x_continuous(breaks=c(0,0.25,0.5,0.75,1.0),
labels=c("0","0.25","0.5","0.75","1.0"))
p
gives you
whereas, if you add to the above code
p <- p + coord_cartesian(xlim=c(0,1))
p
you get
coord_cartesian() is a friend of yours (and anyone who uses your code after you) IFF you are absolutely certain any [visually important] data will never fall beyond the bounds you set within that very function. This is well documented; see Hadley's doc on this useful creature
To apply it to your code above and visualize it myself... I need to know what "sfit" is (as per user Pascal's insightful inquiry), likely among other idiosyncratic things. But the gist is:
scale_x_continuous (and its siblings, scale_y_continuous et al.) does not hard-cut the axis to the EXACT limits you specify. As the name implies, it "scales" the axis according to a function (e.g. log10), and by default it pads the limits with a small buffer (controlled by the scale's expand argument), on the assumption that this looks better. Data falling outside a scale's limits are also treated as missing.
coord_cartesian, on the other hand, sets the visible axis limits as you specify, cropping the view to those bounds without altering the analyses pertaining to the entire data set. Note that in current ggplot2 releases coord_cartesian() applies the same default padding unless you also pass expand = FALSE. But don't take my word for it: read Hadley's apt description:
"The Cartesian coordinate system is the most familiar, and common, type of coordinate system. Setting limits on the coordinate system will zoom the plot (like you're looking at it with a magnifying glass), and will not change the underlying data like setting limits on a scale will."
To be clear, you can use both scale_x_continuous (and y) and coord_cartesian in the same ggplot object, because they do different things. The former sets breaks and labels for those breaks, the latter the frame (i.e. visual bounds) of the plot.
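The difference between limits on a scale and limits on the coordinate system can be sketched with a toy data set. Everything here (toy, base, censored, zoomed, n_kept) is a hypothetical name introduced for illustration, and the sketch assumes a reasonably recent ggplot2 in which coord_cartesian() accepts expand = FALSE.

```r
library(ggplot2)

# Hypothetical toy data: ten points, half of them beyond x = 5
toy <- data.frame(x = 1:10, y = (1:10)^2)
base <- ggplot(toy, aes(x, y)) + geom_point()

# Limits on the scale: x values outside [0, 5] are treated as missing
censored <- base + scale_x_continuous(limits = c(0, 5))

# Limits on the coordinate system: every row is kept, only the view is cropped.
# expand = FALSE switches off the default padding on both axes.
zoomed <- base + coord_cartesian(xlim = c(0, 5), expand = FALSE)

# Helper (assumption, not part of ggplot2): count the non-missing x values
# that reach the point layer
n_kept <- function(p) sum(!is.na(suppressWarnings(layer_data(p))$x))

n_kept(censored)
n_kept(zoomed)
```

For a plain scatter plot the two look similar, but for anything that computes a statistic (a smoother, a summary, a survival curve) the censored version is calculated from fewer points, which is why zooming is usually the safer choice.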
The other (dirty, dirty) solution is... photoshop. :(

Use:
scale_x_continuous("Numbers at risk", limits = c(0, max(sfit$time)), expand = c(0, 0))
The expand can also be used on your y-axis if needed.
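A minimal sketch of the expand approach on both axes, using a made-up step curve in place of a real survfit object (km, p and x_range are hypothetical names, and the numbers are arbitrary):

```r
library(ggplot2)

# Hypothetical step-curve data standing in for a Kaplan-Meier fit
km <- data.frame(time = c(0, 1, 2, 4, 7),
                 surv = c(1, 0.9, 0.75, 0.6, 0.4))

p <- ggplot(km, aes(time, surv)) +
  geom_step() +
  # expand = c(0, 0) removes the padding, so the axis starts exactly at 0
  scale_x_continuous("Time in Years", limits = c(0, max(km$time)),
                     expand = c(0, 0)) +
  scale_y_continuous("Survival Probability", limits = c(0, 1),
                     expand = c(0, 0))

# Inspect the x range the panel actually uses
x_range <- ggplot_build(p)$layout$panel_params[[1]]$x.range
x_range
```

In ggkmTable the same expand argument would need to go on the scale_x_continuous() call of the main plot as well as on the one in the at-risk table, so that the two x-axes stay aligned.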

Related

Modify the size of each legend icon in ggplot2

I am using the ggplot2/usmap libraries to plot highly skewed data onto a map.
Because the data is so skewed, I created uneven interval brackets. See below;
My Code:
library(dplyr)
library(tidyverse)
library(usmap)
library(ggplot2)
library(readxl)
library(rgdal)
plot_usmap(regions = "states",
# fill = 'orange',
labels = TRUE) +
geom_point(data = grant_sh,
size = 5,
aes(x = x,
y = y,
color = funding_cat)) +
theme(
legend.title = element_text(size = 16),
#change legend title font size
legend.text = element_text(size = 14),
#change legend text font size
legend.position = 'left',
plot.title = element_text(size = 22),
plot.subtitle = element_text(size = 16)
) + #+
scale_color_manual(
values = c('#D4148C', # pink, museum
'#049CFC', #library,blue
'#1C8474',
'#7703fC'),
name = "Map Key",
labels = c(
'$1,500 - $4,000 (n = 7)',
'$4,001 - $6,000 (n = 12)',
'$6,001 - $20,000 (n = 6)',
'$20,001 - $40,000 (n = 25)'
)
) +
guides(colour = guide_legend(override.aes = list(size = 3)))
Current Output
Desired Output
I would like to adjust the legend key to reflect the size of each interval. So, for example, $1,500 - $4,000 would be the smallest icon, and $20,001 - $40,000 would be the largest.
I want to do this so that the viewer immediately knows that the intervals are not even. Any solution to achieve this outcome is greatly appreciated!
See how the sign/oval next to each interval represents the range of the interval in my example below.
One option to create this kind of legend would be to make it as a second plot and glue it to the main plot using e.g. patchwork.
Note: Especially with a map as the main plot, and depending on the export size (if any), this approach requires some fiddling to position the legend; e.g. in my code below I added a helper row to the patchwork design to shift the legend upwards.
UPDATE: Updated the code to include the counts in the labels. Added a second approach that makes the legend using geom_col and a separate dataframe.
library(dplyr, warn = FALSE)
library(usmap)
library(ggplot2)
library(patchwork)
# Make example data
set.seed(123)
cat1 <- c(1500, 4001, 6001, 20001)
cat2 <- c(4000, 6000, 20000, 40000)
n = c(7, 12, 6, 25)
funding_cat <- paste0("$", cat1, " - $", cat2, " (n=", n, ")")
funding_cat <- factor(funding_cat, levels = rev(funding_cat))
grant_sh <- utils::read.csv(system.file("extdata", "us_states_centroids.csv", package = "usmapdata"))
grant_sh$funding_cat = sample(funding_cat, 51, replace = TRUE, prob = n / sum(n))
# Make legend plot
grant_sh_legend <- data.frame(
funding_cat = funding_cat,
n = c(7, 12, 6, 25)
)
legend <- ggplot(grant_sh, aes(y = funding_cat, fill = funding_cat)) +
geom_bar(width = .6) +
scale_y_discrete(position = "right") +
scale_fill_manual(
values = c('#D4148C',
'#049CFC',
'#1C8474',
'#7703fC')
) +
theme_void() +
theme(axis.text.y = element_text(hjust = 0),
plot.title = element_text(size = rel(1))) +
guides(fill = "none") +
labs(title = "Map Key")
map <- plot_usmap(regions = "states",
labels = TRUE) +
geom_point(data = grant_sh,
size = 5,
aes(x = x,
y = y,
color = funding_cat)) +
theme(
legend.position = 'none',
plot.title = element_text(size = 22),
plot.subtitle = element_text(size = 16)
) + #+
scale_color_manual(
values = c('#D4148C', # pink, museum
'#049CFC', #library,blue
'#1C8474',
'#7703fC'),
name = "Map Key",
labels = c(
'$1,500 - $4,000 (n = 7)',
'$4,001 - $6,000 (n = 12)',
'$6,001 - $20,000 (n = 6)',
'$20,001 - $40,000 (n = 25)'
)
) +
guides(colour = guide_legend(override.aes = list(size = 3)))
# Glue together
design <- "
#B
AB
#B
"
legend + map + plot_layout(design = design, heights = c(5, 1, 1), widths = c(1, 10))
Using geom_bar the counts are computed from your dataset grant_sh. A second option would be to compute the counts manually or use a manually created dataframe and then use geom_col for the legend plot:
grant_sh_legend <- data.frame(
funding_cat = funding_cat,
n = c(7, 12, 6, 25)
)
legend <- ggplot(grant_sh_legend, aes(y = funding_cat, x = n, fill = funding_cat)) +
geom_col(width = .6) +
scale_y_discrete(position = "right") +
scale_fill_manual(
values = c('#D4148C',
'#049CFC',
'#1C8474',
'#7703fC')
) +
theme_void() +
theme(axis.text.y = element_text(hjust = 0),
plot.title = element_text(size = rel(1))) +
guides(fill = "none") +
labs(title = "Map Key")

SHAP Summary Plot for XGBoost model in R without displaying Mean Absolute SHAP value on the plot

I don't want to display the Mean Absolute Values on my SHAP Summary Plot in R. I want an output similar to the one produced in python. What line of code will help remove the mean absolute values from the summary plot in R?
I'm currently using this line of code:
shap.plot.summary.wrap1(xgb_model, X = x, top_n = 10)
You can do this by slightly modifying the source code of shap.plot.summary() as below:
shap.plot.summary.edited <- function(data_long,
x_bound = NULL,
dilute = FALSE,
scientific = FALSE,
my_format = NULL){
if (scientific){label_format = "%.1e"} else {label_format = "%.3f"}
if (!is.null(my_format)) label_format <- my_format
# check number of observations
N_features <- setDT(data_long)[,uniqueN(variable)]
if (is.null(dilute)) dilute = FALSE
nrow_X <- nrow(data_long)/N_features # n per feature
if (dilute!=0){
# if nrow_X <= 10, no dilute happens
dilute <- ceiling(min(nrow_X/10, abs(as.numeric(dilute)))) # not allowed to dilute to fewer than 10 obs/feature
set.seed(1234)
data_long <- data_long[sample(nrow(data_long),
min(nrow(data_long)/dilute, nrow(data_long)/2))] # dilute
}
x_bound <- if (is.null(x_bound)) max(abs(data_long$value))*1.1 else as.numeric(abs(x_bound))
plot1 <- ggplot(data = data_long) +
coord_flip(ylim = c(-x_bound, x_bound)) +
geom_hline(yintercept = 0) + # the y-axis beneath
# sina plot:
ggforce::geom_sina(aes(x = variable, y = value, color = stdfvalue),
method = "counts", maxwidth = 0.7, alpha = 0.7) +
# print the mean absolute value:
#geom_text(data = unique(data_long[, c("variable", "mean_value")]),
# aes(x = variable, y=-Inf, label = sprintf(label_format, mean_value)),
# size = 3, alpha = 0.7,
# hjust = -0.2,
# fontface = "bold") + # bold
# # add a "SHAP" bar notation
# annotate("text", x = -Inf, y = -Inf, vjust = -0.2, hjust = 0, size = 3,
# label = expression(group("|", bar(SHAP), "|"))) +
scale_color_gradient(low="#FFCC33", high="#6600CC",
breaks=c(0,1), labels=c(" Low","High "),
guide = guide_colorbar(barwidth = 12, barheight = 0.3)) +
theme_bw() +
theme(axis.line.y = element_blank(),
axis.ticks.y = element_blank(), # remove axis line
legend.position="bottom",
legend.title=element_text(size=10),
legend.text=element_text(size=8),
axis.title.x= element_text(size = 10)) +
# reverse the order of features, from high to low
# also relabel the feature using `label.feature`
scale_x_discrete(limits = rev(levels(data_long$variable))#,
#labels = label.feature(rev(levels(data_long$variable)))
)+
labs(y = "SHAP value (impact on model output)", x = "", color = "Feature value ")
return(plot1)
}

Error in `combine_vars()`: ! At least one layer must contain all faceting variables: in function with facet_wrap() & facet_grid()

I am trying to create a function in R that creates a dotplot with facets and saves it to my working directory. Code for the function below:
get_dotplot <- function(df, xvalue, avgvalue, sdvalue, svalue, gvalue, main, xaxis, yaxis, glegend, figure_title)
{
dp <- ggplot(df, aes(x = xvalue, y = avgvalue, color = gvalue)) +
geom_point(stat = 'identity', aes(shape=svalue, color=gvalue))+
geom_errorbar(aes(ymin=avgvalue-sdvalue, ymax=avgvalue+sdvalue))+
facet_grid(cols = vars(svalue), scales = "fixed")+
labs(x = xaxis, y = yaxis, title = main, color=glegend)+
theme(axis.title.x.bottom = element_text(hjust = 0.5, vjust = 1),
axis.title.y = element_text(hjust = 0.5, vjust = 1),
axis.ticks.x = element_line(),
axis.text.x = element_text(angle = 0, hjust = 1, vjust = 0.5, size = 7),
axis.ticks.x.bottom = element_line(colour = "grey", size = (0.5)),
axis.ticks.y.left = element_line(colour = "black", size = (0.4)),
panel.background=element_rect(colour = "black", size = 0.5, fill=NA),
panel.grid = element_blank())
print(dp)
ggsave(paste(figure_title, "png", sep = "."), plot = dp, scale = 1, dpi = 600)
}
get_dotplot(df, xvalue, avgvalue, sdvalue, svalue, gvalue, main, xaxis, yaxis, glegend, figure_title)
However, I always get this error message:
Error in `combine_vars()`:
! At least one layer must contain all faceting variables: `svalue`.
* Plot is missing `svalue`
* Layer 1 is missing `svalue`
* Layer 2 is missing `svalue`
Backtrace:
1. global get_dotplot_errorbar_yaxis(...)
3. ggplot2:::print.ggplot(dp)
5. ggplot2:::ggplot_build.ggplot(x)
6. layout$setup(data, plot$data, plot$plot_env)
7. ggplot2 f(..., self = self)
8. self$facet$compute_layout(data, self$facet_params)
9. ggplot2 f(...)
10. ggplot2::combine_vars(data, params$plot_env, cols, drop = params$drop)
I suspect it's because of the faceting, so I tried both facet_wrap() and facet_grid(), with no result. Could someone please help me with that?
I checked that the svalue variable is in my dataframe and that it is spelled correctly. I also consulted previous questions about the topic, but they were not helpful.
The dataset looks something like this, but with a larger number of individuals and days:
set.seed(108)
n <- 1:12
treatment <- factor(paste("trt", 1:2))
individuals <- sample(LETTERS, 2)
days <- c("12", "20", "25")
avg_var1 <- sample(1:100, 12)
sd_var1 <- sample(1:50, 12)
avg_var2 <- sample(1:100, 12)
sd_var2 <- sample(1:50, 12)
avg_var3 <- sample(1:100, 12)
sd_var3 <- sample(1:50, 12)
test <- data.frame(n, treatment, individuals, days,avg_var1, sd_var1, avg_var2, sd_var2, avg_var3, sd_var3)
I define the variables for the function as follows in R:
df=test
xvalue=test$days
avgvalue=test$avg_var1
sdvalue = test$sd_var1
svalue=test$treatment
gvalue=test$individuals
main= "var1 in function of days"
xaxis="days"
yaxis="var1"
glegend="individuals"
figure_title ="var1_days"
As written, your code passes columns into the function repeating the data in the dataframe. This doesn't seem to "play nicely" with the non-standard evaluation used in ggplot. Essentially ggplot is looking for a column in df called "svalue" to use for faceting (it doesn't find it). Once this has been fixed, the same sort of problem occurs with the error bars.
One way round this is to just pass in the column names, and use aes_string for the variables. This doesn't work for the faceting or the calculated values, so those are calculated at the start of the function. This would give:
get_dotplot <- function(df, xvalue, avgvalue, sdvalue, svalue, gvalue, main, xaxis, yaxis, glegend, figure_title)
{
df$ymin <- df[[avgvalue]] - df[[sdvalue]]
df$ymax <- df[[avgvalue]] + df[[sdvalue]]
df$facets <- df[[svalue]]
dp <- ggplot(df, aes_string(x = xvalue, y = avgvalue, color = gvalue)) +
geom_point(stat = 'identity', aes_string(shape=svalue, color=gvalue)) +
geom_errorbar(aes(ymin=ymin, ymax=ymax))+
facet_grid(cols = vars(facets), scales = "fixed")+
labs(x = xaxis, y = yaxis, title = main, color=glegend)+
theme(axis.title.x.bottom = element_text(hjust = 0.5, vjust = 1),
axis.title.y = element_text(hjust = 0.5, vjust = 1),
axis.ticks.x = element_line(),
axis.text.x = element_text(angle = 0, hjust = 1, vjust = 0.5, size = 7),
axis.ticks.x.bottom = element_line(colour = "grey", size = (0.5)),
axis.ticks.y.left = element_line(colour = "black", size = (0.4)),
panel.background=element_rect(colour = "black", size = 0.5, fill=NA),
panel.grid = element_blank())
print(dp)
ggsave(paste(figure_title, "png", sep = "."), plot = dp, scale = 1, dpi = 600)
}
get_dotplot(df=test,
xvalue="days",
avgvalue="avg_var1",
sdvalue = "sd_var1",
svalue="treatment",
gvalue="individuals",
main= "var1 in function of days",
xaxis="days",
yaxis="var1",
glegend="individuals",
figure_title ="var1_days")
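Since aes_string() has been soft-deprecated in ggplot2 3.x, the same idea can also be sketched with the .data pronoun, which works inside both aes() and vars() and so removes the need for helper columns. This is a stripped-down variant that assumes ggplot2 3.0 or later; dotplot_sketch and the toy data frame are hypothetical, and the theme and ggsave() calls from the original function are left out for brevity.

```r
library(ggplot2)

# Hypothetical toy data with the same kind of columns as in the question
set.seed(108)
test <- data.frame(
  treatment   = factor(rep(c("trt 1", "trt 2"), times = 6)),
  individuals = rep(c("A", "B"), each = 6),
  days        = rep(c("12", "20", "25"), times = 4),
  avg_var1    = sample(1:100, 12),
  sd_var1     = sample(1:50, 12)
)

# Column names are passed as strings and looked up with .data[[...]]
dotplot_sketch <- function(df, xvalue, avgvalue, sdvalue, svalue, gvalue) {
  ggplot(df, aes(x = .data[[xvalue]], y = .data[[avgvalue]],
                 colour = .data[[gvalue]])) +
    geom_point(aes(shape = .data[[svalue]])) +
    geom_errorbar(aes(ymin = .data[[avgvalue]] - .data[[sdvalue]],
                      ymax = .data[[avgvalue]] + .data[[sdvalue]]),
                  width = 0.2) +
    # vars() also understands the .data pronoun, so faceting needs no copy
    facet_grid(cols = vars(.data[[svalue]]))
}

dp <- dotplot_sketch(test, "days", "avg_var1", "sd_var1",
                     "treatment", "individuals")
```

Printing dp should give one panel per treatment; the labs(), theme() and ggsave() parts of the original function can be added back unchanged.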

R heatmap with circles

I would like to generate, in R, a heatmap visualization of a matrix using circles, in order to have both the color and diameter of the circles be informative. Something looking like this:
This sort of plotting is called "bubblegum plot" in certain computational biology labs, but I could not find an R function/package to do it.
Any ideas? Thanks!
Not sure whether there is a package which offers this out-of-the-box but using just ggplot2 this could be achieved like so:
library(ggplot2)
set.seed(42)
d <- data.frame(
x = rep(paste("Team", LETTERS[1:8]), 4),
y = rep(paste("Task", 1:4), each = 8),
value = runif(32)
)
ggplot(d, aes(x, forcats::fct_rev(y), fill = value, size = value)) +
geom_point(shape = 21, stroke = 0) +
geom_hline(yintercept = seq(.5, 4.5, 1), size = .2) +
scale_x_discrete(position = "top") +
scale_radius(range = c(1, 15)) +
scale_fill_gradient(low = "orange", high = "blue", breaks = c(0, .5, 1), labels = c("Great", "OK", "Bad"), limits = c(0, 1)) +
theme_minimal() +
theme(legend.position = "bottom",
panel.grid.major = element_blank(),
legend.text = element_text(size = 8),
legend.title = element_text(size = 8)) +
guides(size = guide_legend(override.aes = list(fill = NA, color = "black", stroke = .25),
label.position = "bottom",
title.position = "right",
order = 1),
fill = guide_colorbar(ticks.colour = NA, title.position = "top", order = 2)) +
labs(size = "Area = Time Spent", fill = "Score:", x = NULL, y = NULL)
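Because the question starts from a matrix rather than a long data frame, a reshaping step is needed first. One way, sketched here with base R only, is as.table() followed by as.data.frame(); the matrix m, its dimnames and the object names are made up for illustration.

```r
library(ggplot2)

# Hypothetical 4 x 3 matrix with row and column names
set.seed(1)
m <- matrix(runif(12), nrow = 4,
            dimnames = list(paste("Task", 1:4), paste("Team", LETTERS[1:3])))

# One row per matrix cell: first column = row name, second = column name,
# third = cell value
long <- as.data.frame(as.table(m))
names(long) <- c("y", "x", "value")

# The long data frame can then be fed to the same kind of plot as above
bubbles <- ggplot(long, aes(x, y, fill = value, size = value)) +
  geom_point(shape = 21) +
  scale_radius(range = c(1, 15))
```

If colour and size should encode two different matrices (as in the base-graphics function further down), the second matrix can be reshaped the same way and added as another column before plotting.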
I wrote an alternative function to perform the plotting, without ggplot and tidyverse. I will soon upload it to the CRAN corto package. Enjoy!
Usage
inputp<-matrix(runif(1000),nrow=50)
inputn<-matrix(rnorm(1000),nrow=50)
colnames(inputp)<-colnames(inputn)<-paste0("Score",1:ncol(inputp))
rownames(inputp)<-rownames(inputn)<-paste0("Car",1:nrow(inputp))
par(las=2,mar=c(0,6,6,10))
bubblegum(inputp,inputn)
BUBBLEGUM function
require(gplots)
require(plotrix)
bubblegum<-function(
inputp,
inputn,
pcr=0.1,
grid=FALSE,
reorder=FALSE,
legend=TRUE,
matrix2col=TRUE
) {
if(nrow(inputp)!=nrow(inputn)||ncol(inputp)!=ncol(inputn)){
warning("inputp and inputn have different sizes!")
}
### Initialize
rownumber<-nrow(inputp)
colnumber<-ncol(inputp)
### Transform the NESs into colors
if(matrix2col){
colconversion<-matrix2col(inputn,nbreaks=20)
nescolors<-colconversion$colormatrix
} else {
nescolors<-inputn
}
#pradii<-0.3*(-log(inputp)/max(-log(inputp)))
pradii<-inputp
pradii[inputp>0.1]<-pcr*0
pradii[inputp<=0.1]<-pcr*1
pradii[inputp<0.05]<-pcr*2
pradii[inputp<1E-5]<-pcr*3
pradii[inputp<1E-10]<-pcr*4
pradii[inputp<1E-20]<-pcr*5
### Order by sum NES
sumnes<-apply(inputn,1,function(x){sum(abs(x))})
if(reorder){
neworder<-order(sumnes)
pradii<-pradii[neworder,]
nescolors<-nescolors[neworder,]
} else {
pradii<-pradii[nrow(pradii):1,]
nescolors<-nescolors[nrow(nescolors):1,]
}
### Plot
#par(las=2,mar=c(0,20,6,0))
plot(0,ylim=c(0,rownumber+1),xlim=c(0,colnumber+1),xaxt="n",yaxt="n",type="n",frame.plot=FALSE,xlab="",ylab="")#,xaxs="i",yaxs="i")
if(grid){
abline(h=1:rownumber,lty=2)
abline(v=1:colnumber,lty=2)
}
for (i in (1:rownumber)) {
for(j in 1:colnumber) {
radius<-pradii[i,j]
color<-nescolors[i,j]
draw.circle(j,i,radius=radius,col=color,lwd=0.2)
}
}
axis(3,at=1:colnumber,labels=colnames(pradii))
axis(2,at=1:rownumber,labels=rownames(pradii),cex.axis=0.7)
### Enable things to be drawn outside the plot region
par(xpd=TRUE)
### Title
### Legend
if(legend){
#legend(-colnumber,rownumber,c("group A", "group B"), pch = c(1,2), lty = c(1,2))
legend("topright",inset=c(-0.1,0),legend=c(
"<0.1","<0.05","<1e-5","<1e-10","<1e-20"
), pch=c(21), title="FDR",pt.bg="white",horiz=FALSE,pt.cex=c(1,1.5,2,2.5,3))
}
if(matrix2col){
extreme<-round(max(abs(inputn)),1)
legend("bottomright", inset=c(-0.1,0),legend=c(
-extreme,-extreme/2,0,extreme/2,extreme
), pch=c(21), title="Score",
pt.bg=colconversion$col[c(1,5,10,15,19)],
horiz=FALSE,pt.cex=3)
}
}
###########################
matrix2col<-function(z,col1="navy",col2="white",col3="red3",nbreaks=100,center=TRUE){
if(center){
extreme=max(abs(z))+0.001
breaks <- seq(-extreme, extreme, length = nbreaks)
}else {
breaks <- seq(min(z), max(z), length = nbreaks)
}
ncol <- length(breaks) - 1
col <- colorpanel(ncol,col1,col2,col3)
CUT <- cut(z, breaks=breaks)
colorlevels <- col[match(CUT, levels(CUT))] # assign colors to heights for each point
names(colorlevels)<-rownames(z)
colormatrix<-matrix(colorlevels,ncol=ncol(z),nrow=nrow(z))
dimnames(colormatrix)<-dimnames(z)
return(list(colormatrix=colormatrix,col=col))
}

How to combine and modify ggplot2 legends with ribbons and lines?

I have several issues with the legend on the following plot:
Ribbons and lines are separated into 2 different legends, but I want them to be merged: lines in legend should have background fill that corresponds to the ribbon with the same name (and transparent for the "official tax" entry).
Lines in the legend differs by colour but they should differ by colour AND linetype.
Seems that I need to use scale_manual and guide_legend options, but all my attempts failed miserably.
Here is the code to create the plot. plotDumping is the function that draws the plot, updateData generates the data frame for the plot and updateLabels generates the footnote for the plot.
library(ggplot2)
library(grid)
library(gridExtra)
library(scales)
max_waste_volume <- 2000
Illegal_dumping_fine_P <- 300000
Illigal_landfilling_fine_P1 <- 500000
Fine_probability_k <- 0.5
Official_tax_Ta <- 600
# mwv = max_waste_volume
# P = Illegal_dumping_fine_P
# P1 = Illigal_landfilling_fine_P1
# k = Fine_probability_k
# Ta = Official_tax_Ta
updateData <- function(mwv, k, P1, P, Ta){
# creates and(or) updates global data frame to provide data for the plot
new_data <<- NULL
new_data <<- as.data.frame(0:mwv)
names(new_data) <<- 'V'
new_data$IlD <<- k*P1/new_data$V
new_data$IlD_fill <<- new_data$IlD
new_data$IlD_fill[new_data$IlD_fill > Ta] <<- NA # we don't want ribbon to
new_data$IlL <<- Ta-k*P/new_data$V
}
updateLabels <- function(k, P1, P, Ta){
### creates footnote caption for the plot
prob <- paste('Fining probability = ', k, sep = '')
landfilling_fine <- paste('Illegal landfilling fine = ', P1, sep = '')
dumping_fine <- paste('Illegal dumping fine = ', P, sep = '')
tax <- paste('Official tax = ', Ta, sep = '')
note <<- paste(prob, landfilling_fine, sep = '; ')
note <<- paste(note, dumping_fine, sep = '; ')
note <<- paste(note, tax, sep = '; ')
note
}
plotDumping <- function(mwv,
P,
P1,
k,
Ta){
### this function draws the plot
# initialise plot data
updateData(mwv, k, P1, P, Ta)
updateLabels(k, P1, P, Ta)
# draw the plot
profit <- ggplot(data=new_data, aes(x = new_data$V)) +
geom_ribbon(show_guide = T, alpha = 0.25, ymax = Ta,
aes(ymin = new_data$IlD_fill,
fill = "Illegal landfill owner's\nprofitable ratio\n")) +
geom_ribbon(show_guide = F, alpha = 0.25, ymin = 0,
aes(ymax = new_data$IlL,
fill = "Waste owner's\nprofitable ratio")) +
geom_line(data=new_data,
aes(y = new_data$IlD, col = "Illegal landfill owner's\nprofitable ratio\n"),
size = 1,
linetype = 4) +
geom_line(data=new_data,
aes(y = new_data$IlL, col = "Waste owner's\nprofitable ratio"),
size = 1,
linetype = 5) +
geom_line(y = Ta,
aes(col = "Official tax"),
size = 1.5,
linetype = 1)+
xlim(c(0, max(new_data$V)))+
ylim(c(0, Ta*1.5))+
theme(axis.text.x = element_text(angle=0, hjust = 0),
axis.title = element_text(face = 'bold', size = 14),
title = element_text(face = 'bold', size = 16),
legend.position = 'right',
legend.title = element_blank(),
legend.text = element_text(size = 12),
legend.key.width = unit(1, 'cm'))+
labs(title="Profitable ratio between the volume \nof illegally disposed waste \nand costs of illegal waste disposure",
x="Waste volume, cubic meters",
y="Cost per cubic meter, RUB")
# add a footnote about parameters used for the current plot
profit <- arrangeGrob(profit, sub = textGrob(note,
x = 0,
hjust = -0.1,
vjust=0.1,
gp = gpar(fontface = "italic", fontsize = 12)))
# show plot
print(profit)
}
# draw the plot
plotDumping(max_waste_volume,
Illegal_dumping_fine_P,
Illigal_landfilling_fine_P1,
Fine_probability_k,
Official_tax_Ta)
One workaround in this situation is to also add a geom_ribbon() for the official tax (using the value Ta as both ymax and ymin). Both legends then have the same levels and are merged into one. With scale_fill_manual() you can then set the fill value for the official tax to NA, so that its legend key shows only the background.
+ geom_ribbon(show_guide = F, alpha = 0.25, ymin = Ta,ymax=Ta,
aes(fill = "Official tax"))
+ scale_fill_manual(values=c("#F8766D",NA,"#00BFC4"))
P.S. Do not use the $ inside the aes() of ggplot() functions (use just column names). As you already wrote data=new_data all variables inside the aes() are looked for in this data frame.
You can also get the correct linetype with your version if you put linetype inside the aes() call. However, your code becomes even more cumbersome that way. Consider reshaping your data before calling ggplot; then you don't have to worry about the legend at all.
# reshape data ...
new_data$Ta <- Ta
new_data$zero <- 0
require(reshape2)
dta <- melt(new_data, id.vars="V", measure.vars=c("IlD", "IlL", "Ta"))
dta.lower <- melt(new_data, id.vars="V", measure.vars=c("IlD_fill", "zero", "Ta"))
dta.upper <- melt(new_data, id.vars="V", measure.vars=c("Ta", "IlL", "Ta"))
dta <- cbind(dta, lower=dta.lower$value, upper=dta.upper$value)
dta$name <- factor(NA, levels=c("Illegal landfill owner's\nprofitable ratio\n",
"Waste owner's\nprofitable ratio",
"Official tax"))
dta$name[dta$variable=="IlD"] <- "Illegal landfill owner's\nprofitable ratio\n"
dta$name[dta$variable=="IlL"] <- "Waste owner's\nprofitable ratio"
dta$name[dta$variable=="Ta"] <- "Official tax"
Now the plotting command becomes much easier and more transparent:
ggplot(dta, aes(x=V, y=value, ymin=lower, ymax=upper,
color=name, fill=name, linetype=name)) +
geom_line(size=1.2) + ylim(c(0, Ta*1.5)) +
geom_ribbon(alpha=.25, linetype=0) +
theme(axis.text.x = element_text(angle=0, hjust = 0),
axis.title = element_text(face = 'bold', size = 14),
title = element_text(face = 'bold', size = 16),
legend.position = 'right',
legend.title = element_blank(),
legend.text = element_text(size = 12),
legend.key.width = unit(2, 'cm'))+
scale_linetype_manual(values=c(4, 5, 1)) +
labs(title="Profitable ratio between the volume \nof illegally disposed waste \nand costs of illegal waste disposure",
x="Waste volume, cubic meters",
y="Cost per cubic meter, RUB")
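The legend-merging idea from both answers can be reduced to a small self-contained sketch: when colour, fill and linetype are all mapped to the same variable and their scales share the same title and labels, ggplot2 draws a single legend. The data frame toy and its columns are hypothetical. The sketch uses "transparent" rather than NA for the band that should not be filled, on the assumption that some ggplot2 versions substitute the scale's na.value for NA entries in a manual palette.

```r
library(ggplot2)

# Hypothetical toy data: two series, each with a line and a band around it
toy <- data.frame(
  x    = rep(1:10, times = 2),
  y    = c(1:10, 10:1),
  name = rep(c("Series A", "Series B"), each = 10)
)
toy$lower <- toy$y - 1
toy$upper <- toy$y + 1

merged <- ggplot(toy, aes(x, y, colour = name, fill = name, linetype = name)) +
  # colour = NA drops the ribbon outline so that only the fill shows
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.25, colour = NA) +
  geom_line() +
  # "Series B" gets no visible band, so its legend key shows only the line
  scale_fill_manual(values = c("Series A" = "#F8766D",
                               "Series B" = "transparent")) +
  scale_linetype_manual(values = c("Series A" = 4, "Series B" = 1)) +
  theme(legend.title = element_blank())
```

Giving any one of the three scales a different name or a different set of labels would split the legend again, which is what happened in the original plot.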
