Remove whiskers in boxplot made by ggboxplot (of ggpubr package)? - r

Sometimes it is appropriate to show a boxplot without the whiskers.
In geom_boxplot (ggplot2) we can achieve this with coef=0.
Is there a way to achieve this in ggboxplot (ggpubr v0.5.0, current version at the time of writing)?
I note that ggboxplot has much in common with geom_boxplot,
such as the ability to use outlier.shape=NA in each case to suppress
outliers. It seems that there should be an easy way to also suppress the whiskers.

I cannot find a way implemented in ggboxplot directly to do this, which is a bit strange because it passes the ellipsis to a geom_boxplot call, so I am not sure why the coef=0 does not reach there and supresses the whiskers.
As a stopgap, you can modify the ggplot object created by ggboxplot and remove whiskers that way.
The following function shows this:
ggboxplot_whisker_opt <- function(...)
{
opts <- list(...)
# check if user specified a whiskers argument and set options accordingly
if("whisker" %in% names(opts))
{
whisk <- opts$whisker
opts$whisker <- NULL
} else {
whisk <- TRUE
}
pl <- do.call(ggboxplot,opts) # create plot by calling ggboxplot with all user options
if(!whisk)
{
pl_list <- ggplot_build(pl) # get listed version of ggplot object to modify
pl_list$data[[1]]$ymin <- NA # remove the ymin/max that specify the whiskers
pl_list$data[[1]]$ymax <- NA
pl <- ggplot_gtable(pl_list) # convert back to ggplot object
}
# plot the ggplot and return
plot(pl)
}
We can now call that function with whisker=TRUE/FALSE or without it and it produced plots accordingly:
set.seed(123)
x <- rnorm(100)
labels <- round(runif(100,1,2))
df <- data.frame(labels=labels,
value=x)
ggboxplot_whisker_opt(df,"labels","value")
# is the same as
ggboxplot_whisker_opt(df,"labels","value",whisker=TRUE)
ggboxplot_whisker_opt(df,"labels","value",whisker=FALSE)

Related

Save plots as R objects and displaying in grid

In the following reproducible example I try to create a function for a ggplot distribution plot and saving it as an R object, with the intention of displaying two plots in a grid.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
output<-list(distribution,var1,dat)
return(output)
}
Call to function:
set.seed(100)
df <- data.frame(x = rnorm(100, mean=10),y =rep(1,100))
output1 <- ggplothist(dat=df,var1='x')
output1[1]
All fine untill now.
Then i want to make a second plot, (of note mean=100 instead of previous 10)
df2 <- data.frame(x = rep(1,1000),y = rnorm(1000, mean=100))
output2 <- ggplothist(dat=df2,var1='y')
output2[1]
Then i try to replot first distribution with mean 10.
output1[1]
I get the same distibution as before?
If however i use the information contained inside the function, return it back and reset it as a global variable it works.
var1=as.numeric(output1[2]);dat=as.data.frame(output1[3]);p1 <- output1[1]
p1
If anyone can explain why this happens I would like to know. It seems that in order to to draw the intended distribution I have to reset the data.frame and variable to what was used to draw the plot. Is there a way to save the plot as an object without having to this. luckly I can replot the first distribution.
but i can't plot them both at the same time
var1=as.numeric(output2[2]);dat=as.data.frame(output2[3]);p2 <- output2[1]
grid.arrange(p1,p2)
ERROR: Error in gList(list(list(data = list(x = c(9.66707664902549, 11.3631137069225, :
only 'grobs' allowed in "gList"
In this" Grid of multiple ggplot2 plots which have been made in a for loop " answer is suggested to use a list for containing the plots
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
pltlist <- list()
pltlist[["plot"]] <- distribution
output<-list(pltlist,var1,dat)
return(output)
}
output1 <- ggplothist(dat=df,var1='x')
p1<-output1[1]
output2 <- ggplothist(dat=df2,var1='y')
p2<-output2[1]
output1[1]
Will produce the distribution with mean=100 again instead of mean=10
and:
grid.arrange(p1,p2)
will produce the same Error
Error in gList(list(list(plot = list(data = list(x = c(9.66707664902549, :
only 'grobs' allowed in "gList"
As a last attempt i try to use recordPlot() to record everything about the plot into an object. The following is now inside the function.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
distribution<-recordPlot()
output<-list(distribution,var1,dat)
return(output)
}
This function will produce the same errors as before, dependent on resetting the dat, and var1 variables to what is needed for drawing the distribution. and similarly can't be put inside a grid.
I've tried similar things like arrangeGrob() in this question "R saving multiple ggplot2 plots as R-object in list and re-displaying in grid " but with no luck.
I would really like a solution that creates an R object containing the plot, that can be redrawn by itself and can be used inside a grid without having to reset the variables used to draw the plot each time it is done. I would also like to understand wht this is happening as I don't consider it intuitive at all.
The only solution I can think of is to draw the plot as a png file, saved somewhere and then have the function return the path such that i can be reused - is that what other people are doing?.
Thanks for reading, and sorry for the long question.
Found a solution
How can I reference the local environment within a function, in R?
by inserting
localenv <- environment()
And referencing that in the ggplot
distribution <- ggplot(data=dat, aes(dat[,var1]),environment = localenv)
made it all work! even with grid arrange!

Inverse of ggplotGrob?

I have a function which manipulates a ggplot object, by converting it to a grob and then modifying the layers. I would like the function to return a ggplot object not a grob. Is there a simple way to convert a grob back to gg?
The documentation on ggplotGrob is awfully sparse.
Simple example:
P <- ggplot(iris) + geom_bar(aes(x=Species, y=Petal.Width), stat="identity")
G <- ggplotGrob(P)
... some manipulation to G ...
## DESIRED:
P2 <- inverse_of_ggplotGrob(G)
such that, we can continue to use basic ggplot syntax, ie
`P2 + ylab ("The Width of the Petal")`
UPDATE:
To answer the question in the comment, the motivation here is to modify the colors of facet labels programmatically, based on the value of label name in each facet. The functions below work nicely (based on input from baptise in a previous question).
I would like for the return value from colorByGroup to be a ggplot object, not simply a grob.
Here is the code, for those interested
get_grob_strips <- function(G, strips=grep(pattern="strip.*", G$layout$name)) {
if (inherits(G, "gg"))
G <- ggplotGrob(G)
if (!inherits(G, "gtable"))
stop ("G must be a gtable object or a gg object")
strip.type <- G$layout[strips, "name"]
## I know this works for a simple
strip.nms <- sapply(strips, function(i) {
attributes(G$grobs[[i]]$width$arg1)$data[[1]][["label"]]
})
data.table(grob_index=strips, type=strip.type, group=strip.nms)
}
refill <- function(strip, colour){
strip[["children"]][[1]][["gp"]][["fill"]] <- colour
return(strip)
}
colorByGroup <- function(P, colors, showWarnings=TRUE) {
## The names of colors should match to the groups in facet
G <- ggplotGrob(P)
DT.strips <- get_grob_strips(G)
groups <- names(colors)
if (is.null(groups) || !is.character(groups)) {
groups <- unique(DT.strips$group)
if (length(colors) < length(groups))
stop ("not enough colors specified")
colors <- colors[seq(groups)]
names(colors) <- groups
}
## 'groups' should match the 'group' in DT.strips, which came from the facet_name
matched_groups <- intersect(groups, DT.strips$group)
if (!length(matched_groups))
stop ("no groups match")
if (showWarnings) {
if (length(wh <- setdiff(groups, DT.strips$group)))
warning ("values in 'groups' but not a facet label: \n", paste(wh, colapse=", "))
if (length(wh <- setdiff(DT.strips$group, groups)))
warning ("values in facet label but not in 'groups': \n", paste(wh, colapse=", "))
}
## identify the indecies to the grob and the appropriate color
DT.strips[, color := colors[group]]
inds <- DT.strips[!is.na(color), grob_index]
cols <- DT.strips[!is.na(color), color]
## Fill in the appropriate colors, using refill()
G$grobs[inds] <- mapply(refill, strip = G$grobs[inds], colour = cols, SIMPLIFY = FALSE)
G
}
I would say no. ggplotGrob is a one-way street. grob objects are drawing primitives defined by grid. You can create arbitrary grobs from scratch. There's no general way to turn a random collection of grobs back into a function that would generate them (it's not invertible because it's not 1:1). Once you go grob, you never go back.
You could wrap a ggplot object in a custom class and overload the plot/print commands to do some custom grob manipulation, but that's probably even more hack-ish.
You can try the following:
p = ggplotify::as.ggplot(g)
For more info, see https://cran.r-project.org/web/packages/ggplotify/vignettes/ggplotify.html
It involves a little bit of a cheat annotation_custom(as.grob(plot),...), so it may not work for all circumstances: https://github.com/GuangchuangYu/ggplotify/blob/master/R/as-ggplot.R
Have a look at the ggpubr package: it has a function as_ggplot(). If your grob is not too complex it might be a solution!
I would also advise to have a look at the patchwork package which combine nicely ggplots... it is likely to not be what you are looking for but... have a look.

Multiple graphs within plot with loop

How to get graph for each column of data.frame within one plot with loop? Must be easy just can't figure it out.
Sample data:
rdata <- data.frame(y=rnorm(1000,2,2),v1=rnorm(1000,1,1),v2=rnorm(1000,3,3),
v3=rnorm(1000,4,4),v4=rnorm(1000,5,5))
What I have tried?
library(lattice)
p <- par(mfrow=c(2,2))
for(i in 2:5){
w <- xyplot(y~rdata[,i],rdata)
print(w)
}
par(p)
If you don't have to use lattice you can just use base plot instead and it should work as you want.
p <- par(mfrow=c(2,2))
for(i in 2:5){
plot(y~rdata[,i],rdata)
}
par(p)
If you want to use lattice look this answer. Lattice ignores par, so you have to do some more work to achieve what you want.
Inorder to easily arrange a bunch of lattice plots, I like to use the helper function print.plotlist. It has a layout= parameter that acts like the layout() function for base graphics. For example, you could call
rdata <- data.frame(y=rnorm(1000,2,2),v1=rnorm(1000,1,1),v2=rnorm(1000,3,3),
v3=rnorm(1000,4,4),v4=rnorm(1000,5,5))
library(lattice)
plots<-lapply(2:5, function(i) {xyplot(y~rdata[,i],rdata)})
print.plotlist(plots, layout=matrix(1:4, ncol=2))
to get
Otherwise you normally use a split= parameter to the print statement to place a plot in a subsection of the device. For example, you could also do
print(plots[[1]], split=c(1,1,2,2), more=T)
print(plots[[2]], split=c(1,2,2,2), more=T)
print(plots[[3]], split=c(2,1,2,2), more=T)
print(plots[[4]], split=c(2,2,2,2))

How can I suppress the creation of a plot while calling a function in R?

I am using a function in R (specifically limma::plotMDS) that produces a plot and also returns a useful value. I want to get the returned value without producing the plot. Is there an easy way to call the function but suppress the plot that it creates?
You can wrap the function call like this :
plotMDS.invisible <- function(...){
ff <- tempfile()
png(filename=ff)
res <- plotMDS(...)
dev.off()
unlink(ff)
res
}
An example of call :
x <- matrix(rnorm(1000*6,sd=0.5),1000,6)
rownames(x) <- paste("Gene",1:1000)
x[1:50,4:6] <- x[1:50,4:6] + 2
# without labels, indexes of samples are plotted.
mds <- plotMDS.invisible(x, col=c(rep("black",3), rep("red",3)) )

Can I tell ggpairs to use log scales?

Can I provide a parameter to the ggpairs function in the GGally package to use log scales for some, not all, variables?
You can't provide the parameter as such (a reason is that the function creating the scatter plots is predefined without scale, see ggally_points), but you can change the scale afterward using getPlot and putPlot. For instance:
custom_scale <- ggpairs(data.frame(x=exp(rnorm(1000)), y=rnorm(1000)),
upper=list(continuous='points'), lower=list(continuous='points'))
subplot <- getPlot(custom_scale, 1, 2) # retrieve the top left chart
subplotNew <- subplot + scale_y_log10() # change the scale to log
subplotNew$type <- 'logcontinuous' # otherwise ggpairs comes back to a fixed scale
subplotNew$subType <- 'logpoints'
custom_scale <- putPlot(custom_fill, subplotNew, 1, 2)
This is essentially the same answer as Jean-Robert but looks much more simple (approachable). I don't know if it is a new feature but it doesn't look like you need to use getPlot or putPlot anymore.
custom_scale[1,2]<-custom_scale[1,2] + scale_y_log10() + scale_x_log10()
Here is a function to apply it across a big matrix. Supply the number of rows in the plot and the name of the plot.
scalelog2<-function(x=2,g){ #for below diagonal
for (i in 2:x){
for (j in 1:(i-1)) {
g[i,(j)]<-g[i,(j)] + scale_x_continuous(trans='log2') +
scale_y_continuous(trans='log2')
} }
for (i in 1:x){ #for the bottom row
g[(x+1),i]<-g[(x+1),i] + scale_y_continuous(trans='log2')
}
for (i in 1:x){ #for the diagonal
g[i,i]<-g[i,i]+ scale_x_continuous(trans='log2') }
return(g) }
It's probably better use a linear scale and log transform variables as appropriate before supplying them to ggpairs because this avoids ambiguity in how the correlation coefficients have been computed (before or after log-transform).
This can be easily achieved e.g. like this:
library(tidyverse)
log10_vars <- vars(ends_with(".Length")) # define variables to be transformed
iris %>% # use standard R example dataframe
mutate_at(log10_vars, log10) %>% # log10 transform selected columns
rename_at(log10_vars, sprintf, fmt="log10 %s") %>% # rename variables accordingly
GGally::ggpairs(aes(color=Species))

Resources