Case dependent scaling of plot size in ggplot loop - r

I am running several ggplot barplots in a loop, including added text on top of each bar. I have defined the plot scale via coord_fixed and expand_limits. Unfortunately, the y-axis range differs from plot to plot, so one set of scale settings does not fit all cases, i.e. the text gets cut off and/or the axes get compressed. Let me illustrate:
period <- c(rep("A",4),rep("B",4))
group <- rep(c("C","C","D","D"),2)
size <- rep(c("E","F"),4)
value <- c(23,29,77,62,18,30,54,81)
df <- data.frame(period,group,size,value)
library(ggplot2)
for (i in unique(df$group)) # unique() also works when group is a character column (the default in R >= 4.0)
{
p <- ggplot(subset(df, group==i), aes(x=size, y=value, fill = period)) +
geom_bar(position="dodge", stat="identity", show.legend=F) +
geom_text(data=subset(df, group==i), aes(x=size, y=value,label=value),
size=10, fontface="bold", position = position_dodge(width=1),vjust = -0.5) +
expand_limits(y = max(df$value)*0.6) +
coord_fixed(ratio = 0.01)
ggsave(paste0("yourfilepath",i,".png"), width=7.72, height=4.5, units="in", p)
}
I would like the settings of coord_fixed and expand_limits to adapt to each case, depending on value. I have experimented with using e.g. expand_limits(y = max(df$value * ifelse(df$value <= 50, 0.6, 1))), but that doesn't work in the way I had hoped. Any suggestions will be greatly appreciated!

Based on @Z.Lin's comment, I have added the df$value[df$group==i] argument to my ifelse function, so that only the values of the current group are considered: expand_limits(y = max(df$value[df$group==i] * ifelse(df$value[df$group==i] <= 50, 5, 8))).
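A minimal sketch of the same per-group idea, for illustration only: instead of switching multipliers with ifelse, the upper limit is derived from the tallest bar of the current subset. The helper name plot_group and the headroom value of 0.25 are assumptions, not part of the original code, and coord_fixed is left out here on the assumption that the fixed width and height passed to ggsave already control the output size.

```r
library(ggplot2)

df <- data.frame(
  period = c(rep("A", 4), rep("B", 4)),
  group  = rep(c("C", "C", "D", "D"), 2),
  size   = rep(c("E", "F"), 4),
  value  = c(23, 29, 77, 62, 18, 30, 54, 81)
)

# Hypothetical helper: `headroom` is the fraction of the subset's tallest bar
# that is reserved above it for the text label (assumed value, tune as needed).
plot_group <- function(sub, headroom = 0.25) {
  ggplot(sub, aes(x = size, y = value, fill = period)) +
    geom_col(position = position_dodge(width = 0.9), show.legend = FALSE) +
    geom_text(aes(label = value), size = 10, fontface = "bold",
              position = position_dodge(width = 0.9), vjust = -0.5) +
    expand_limits(y = max(sub$value) * (1 + headroom))
}

# One plot per group, each scaled to its own data
plots <- lapply(split(df, df$group), plot_group)
```

Each element of plots can then be passed to ggsave as in the loop above, e.g. ggsave(paste0("yourfilepath", "C", ".png"), plots$C, width=7.72, height=4.5, units="in").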

Related

Add new geom as new row in ggplot2, preventing layering of plots

I am pretty sure that this is easy to do, but I can't seem to find a proper way to phrase the question for Google or Stack Overflow, so here we are:
I have a plot made in ggplot2 which makes use of geom_jitter(), efficiently creating one row for each element in a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but just adding the extra geom_ function to the plot code returns two layers: the jitter and the violin, one on top of the other (as usually expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and the violin geoms interleaved? (i.e. element A jitter row followed by element A violin row, and then element B jitter row followed by element B violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
theme(... , aspect.ratio=0.3) +
geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
scale_x_discrete() +
scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
labels=c(seq(0,1,0.1))) +
scale_color_gradient2(breaks=seq(0,100,20),
limits=c(0,100),
low="green3",
high="darkorchid4",
midpoint=50,
name="") +
coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data
How about manipulating your factor as a continuous variable and nudging the entries across the aes() calls like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
coord_flip() +
labs(x = "x") +
scale_x_continuous(breaks = c(1, 3),
labels = paste("Level", 1:2),
trans = scales::reverse_trans())
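The same nudging idea can be applied to an existing factor. The sketch below uses made-up data and hypothetical column names (jitter_row, violin_row); the columns E and C only mirror the names used in the question. Each factor level k gets two adjacent numeric rows, 2k - 1 for the jitter and 2k for the violin, which gives the interleaved layout asked for in the side quest.

```r
library(ggplot2)

set.seed(42)
# Made-up stand-in for the question's data: E is the factor, C the value
plot_data <- data.frame(
  E = factor(rep(c("A", "B", "C"), each = 20)),
  C = runif(60)
)

# Turn the factor into numeric row positions: jitter at 2k - 1, violin at 2k
k <- as.integer(plot_data$E)
plot_data$jitter_row <- 2 * k - 1
plot_data$violin_row <- 2 * k
lev <- levels(plot_data$E)

p <- ggplot(plot_data) +
  geom_jitter(aes(x = jitter_row, y = C), width = 0.3, height = 0) +
  geom_violin(aes(x = violin_row, y = C, group = E), width = 0.8) +
  # one axis label per level, placed between its two rows
  scale_x_reverse(breaks = 2 * seq_along(lev) - 0.5, labels = lev) +
  coord_flip()
```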

How to set automatic label position based on box height

In a previous question, I asked about moving the label position of a barplot outside of the bar if the bar was too small. I was provided with the following example:
library(ggplot2)
options(scipen=2)
dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
Anno = 1:10)
ggplot(data = dataset,
aes(x = Anno,
y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity",
width=0.8,
position="dodge") +
geom_text(aes( y = Riserva_Riv_Fine_Periodo,
label = round(Riserva_Riv_Fine_Periodo, 0),
angle=90,
hjust= ifelse(Riserva_Riv_Fine_Periodo < 3000000, -0.1, 1.2)),
col="red",
size=4,
position = position_dodge(0.9))
And I obtain this graph:
The problem with the example is that the value at which the label is moved must be hard-coded into the plot, and an ifelse statement is used to reposition the label. Is there a way to automatically extract the value to cut?
A slightly better option might be to base the test and the positioning of the labels on the height of the bar relative to the height of the highest bar. That way, the cutoff value and label-shift are scaled to the actual vertical range of the plot. For example:
ydiff = max(dataset$Riserva_Riv_Fine_Periodo)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity", width=0.8) +
geom_text(aes(label = round(Riserva_Riv_Fine_Periodo, 0), angle=90,
y = ifelse(Riserva_Riv_Fine_Periodo < 0.3*ydiff,
Riserva_Riv_Fine_Periodo + 0.1*ydiff,
Riserva_Riv_Fine_Periodo - 0.1*ydiff)),
col="red", size=4)
You would still need to tweak the fractional cutoff in the test condition (I've used 0.3 in this case), depending on the physical size at which you render the plot. But you could package the code into a function to make any manual adjustments a bit easier.
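One way such a function could look is sketched here. The names label_height and labelled_bars are hypothetical, and cutoff and shift are fractions of the tallest bar, as in the code above; column names are passed as strings and looked up with the .data pronoun.

```r
library(ggplot2)

# Hypothetical helper: label height for each bar, given the bar heights `v`.
# Bars shorter than cutoff * max(v) get the label above the bar, others inside.
label_height <- function(v, cutoff = 0.3, shift = 0.1) {
  top <- max(v)
  ifelse(v < cutoff * top, v + shift * top, v - shift * top)
}

# Hypothetical wrapper around the plot; `x` and `y` are column names (strings)
labelled_bars <- function(data, x, y, cutoff = 0.3, shift = 0.1) {
  data$label_y <- label_height(data[[y]], cutoff, shift)
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_col(width = 0.8) +
    geom_text(aes(y = label_y, label = round(.data[[y]], 0)),
              angle = 90, colour = "red", size = 4)
}

dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
                      Anno = 1:10)
p <- labelled_bars(dataset, "Anno", "Riserva_Riv_Fine_Periodo", cutoff = 0.3)
```

Changing the cutoff then only means changing one argument, e.g. labelled_bars(dataset, "Anno", "Riserva_Riv_Fine_Periodo", cutoff = 0.2).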
It's probably possible to automate this by determining the actual sizes of the various grobs that make up the plot and setting the condition and the positioning based on those sizes, but I'm not sure how to do that.
Just as an editorial comment, a plot with labels inside some bars and above others risks confusing the visual mapping of magnitudes to bar heights. I think it would be better to find a way to shrink, abbreviate, recode, or otherwise tweak the labels so that they contain the information you want to convey while being able to have all the labels inside the bars. Maybe something like this:
library(scales)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo/1000)) +
geom_col(width=0.8, fill="grey30") +
geom_text(aes(label = format(Riserva_Riv_Fine_Periodo/1000, big.mark=",", digits=0),
y = 0.5*Riserva_Riv_Fine_Periodo/1000),
col="white", size=3) +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
theme_classic() +
labs(y="Riserva (thousands)")
Or maybe go with a line plot instead of bars:
ggplot(dataset, aes(Anno, Riserva_Riv_Fine_Periodo/1e3)) +
geom_line(linetype="11", size=0.3, colour="grey50") +
geom_text(aes(label=format(Riserva_Riv_Fine_Periodo/1e3, big.mark=",", digits=0)),
size=3) +
theme_classic() +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
expand_limits(y=0) +
labs(y="Riserva (thousands)")

Hide legend elements in ggplot2

I am trying to plot the parameter estimates and levels of hierarchy from a stan model output. For the legend, I am hoping to remove all labels except for the "Overall Effects" label but I can't figure out how to remove all of the species successfully.
Here is the code:
ggplot(dfwide, aes(x=Estimate, y=var, color=factor(sp), size=factor(rndm),
alpha=factor(rndm))) +
geom_point(position =pd) +
geom_errorbarh(aes(xmin=(`2.5%`), xmax=(`95%`)), position=pd,
size=.5, height = 0, width=0) +
geom_vline(xintercept=0) +
scale_colour_manual(values=c("blue", "red", "orangered1","orangered3", "sienna4",
"sienna2", "green4", "green3", "purple2", "magenta2"),
labels=c("Overall Effects", expression(italic("A. pensylvanicum"),
italic("A. rubrum"), italic("A. saccharum"),
italic("B. alleghaniensis"), italic("B. papyrifera"),
italic("F. grandifolia"), italic("I. mucronata"),
italic("P. grandidentata"), italic("Q. rubra")))) +
scale_size_manual(values=c(3, 1, 1, 1, 1, 1, 1, 1, 1, 1)) +
scale_shape_manual(labels="", values=c("1"=16,"2"=16)) +
scale_alpha_manual(values=c(1, 0.4)) + guides(size=FALSE, alpha=FALSE) +
ggtitle(label = "A.") +
scale_y_discrete(limits = rev(unique(sort(dfwide$var))), labels=estimates) +
ylab("") +
labs(col="Effects") + theme(legend.title=element_blank())
One way to remove only some of the legend labels is to work directly with grid, the lower-level graphics system that both lattice and ggplot2 are built on.
Three grid functions are needed: grid.force(), grid.ls() and grid.remove(). After drawing the plot with ggplot2, call grid.force() and grid.ls() to list every element in the picture (points, lines, text, and so on). Finding the elements you are interested in is an interactive process, because the element names that ggplot2 generates are combinations of numbers and text and are not always meaningful. Once you have identified the names, use grid.remove() to remove those elements. Below is the sample code I made.
library(grid)
library(ggplot2)
set.seed(1)
data <- data.frame(x = rep(1:10, 2), y = sample(1:100, 20),
type = sample(c("A", "B"), 20, replace = TRUE))
ggplot(data, aes(x = x, y =y,color = type))+
geom_point()+
geom_line()+
scale_color_manual(values = c("blue", "darkred"))+
theme_bw()
At this point the whole plot has been drawn; the next step is to remove some of its elements.
grid.force()
grid.ls()
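grid.ls() lists all the element names; the names passed to grid.remove() below were taken from that listing and will differ for other plots.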
grid.remove("key-4-1-1.5-2-5-2")
grid.remove("key-4-1-2.5-2-5-2")
grid.remove("label-4-3.5-4-5-4")
It's not perfect, but my solution would be to actually make two plots and combine them together. See this post where I lifted the extraction code from.
I don't have your data, but I think you will get the idea below:
library(ggplot2)
library(gridExtra)
library(grid)
# g_legend credit goes to https://stackoverflow.com/a/11886071/2060081
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)}
p_legend = ggplot(dfwide[dfwide$sp == 'Overall Effects', ], aes(x=Estimate, y=var, color=factor(sp),
size=factor(rndm),
alpha=factor(rndm))) +
geom_point(position =pd) +
geom_errorbarh(aes(xmin=(`2.5%`), xmax=(`95%`)), position=pd,
size=.5, height = 0, width=0) +
geom_vline(xintercept=0) +
scale_colour_manual(values=c("blue"),
labels=c("Overall Effects")) +
scale_size_manual(values=c(3)) +
scale_shape_manual(labels="", values=c("1"=16,"2"=16)) +
scale_alpha_manual(values=c(1, 0.4)) + guides(size=FALSE, alpha=FALSE) +
ggtitle(label = "A.") +
scale_y_discrete(limits = rev(unique(sort(dfwide$var))), labels=estimates) +
ylab("") +
labs(col="Effects") + theme(legend.title=element_blank())
p_legend = g_legend(p_legend)
One of your plots will just be the legend. Subset your data based on the Overall Effects and then plot the two plots together as a grid.
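A further option, not taken from the answers above, is the breaks argument of the manual colour scale, which restricts the legend to the listed keys while all groups are still drawn. This is only a sketch on made-up data: the level names "overall", "sp1" and "sp2" are assumptions standing in for the values of sp, and the colours are named so that they stay matched to their levels.

```r
library(ggplot2)

set.seed(1)
# Made-up stand-in for dfwide: one "overall" level plus two species levels
d <- data.frame(
  Estimate = rnorm(30),
  var      = rep(c("a", "b", "c"), 10),
  sp       = rep(c("overall", "sp1", "sp2"), each = 10)
)

p <- ggplot(d, aes(x = Estimate, y = var, colour = sp)) +
  geom_point() +
  geom_vline(xintercept = 0) +
  scale_colour_manual(
    # named values keep each colour tied to its level
    values = c(overall = "blue", sp1 = "red", sp2 = "green4"),
    breaks = "overall",            # only this key appears in the legend
    labels = "Overall Effects"
  ) +
  theme(legend.title = element_blank())
```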

ggplot2 add offset to jitter positions

I have data that looks like this
df = data.frame(x=sample(1:5,100,replace=TRUE),y=rnorm(100),assay=sample(c('a','b'),100,replace=TRUE),project=rep(c('primary','secondary'),50))
and am producing a plot using this code
ggplot(df,aes(project,x)) + geom_violin(aes(fill=assay)) + geom_jitter(aes(shape=assay,colour=y),height=.5) + coord_flip()
which gives me this
This is 90% of the way to being what I want. But I would like each point to be plotted only on top of the violin plot for the matching assay type. That is, the jittered positions of the points should be set such that the triangles are only ever on the upper teal violin plot and the circles on the bottom red violin plot for each project type.
Any ideas how to do this?
In order to get the desired result, it is probably best to use position_jitterdodge as this gives you the best control over the way the points are 'jittered':
ggplot(df, aes(x = project, y = x, fill = assay, shape = assay, color = y)) +
geom_violin() +
geom_jitter(position = position_jitterdodge(dodge.width = 0.9,
jitter.width = 0.5,
jitter.height = 0.2),
size = 2) +
coord_flip()
which gives:
You can use interaction between assay & project:
p <- ggplot(df,aes(x = interaction(assay, project), y=x)) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
p + coord_flip()
The labeling can be adjusted by numeric scaled x axis:
# cbind the interaction as a numeric
df$group <- as.numeric(interaction(df$assay, df$project))
# plot
p <- ggplot(df,aes(x=group, y=x, group=cut_interval(group, n = 4))) +
geom_violin(aes(fill=assay)) +
geom_jitter(aes(shape=assay, colour=y), height=.5, cex=4)
p + coord_flip() + scale_x_continuous(breaks = c(1.5, 3.5), labels = levels(factor(df$project)))

Possible to combine position_jitter with position_dodge?

I've become quite fond of boxplots in which jittered points are overlain over the boxplots to represent the actual data, as below:
set.seed(7)
l1 <- gl(3, 1, length=102, labels=letters[1:3])
l2 <- gl(2, 51, length=102, labels=LETTERS[1:2]) # Will use this later
y <- runif(102)
d <- data.frame(l1, l2, y)
ggplot(d, aes(x=l1, y=y)) +
geom_point(position=position_jitter(width=0.2), alpha=0.5) +
geom_boxplot(fill=NA)
(These are particularly helpful when there are very different numbers of data points in each box.)
I'd like to use this technique when I am also (implicitly) using position_dodge to separate boxplots by a second variable, e.g.
ggplot(d, aes(x=l1, y=y, colour=l2)) +
geom_point(position=position_jitter(width=0.2), alpha=0.5) +
geom_boxplot(fill=NA)
However, I can't figure out how to dodge the points by the colour variable (here, l2) and also jitter them.
Here is an approach that manually performs the jittering and dodging.
# a plot with no dodging or jittering of the points
dp <- ggplot(d, aes(x=l1, y=y, colour=l2)) +
geom_point(alpha=0.5) +
geom_boxplot(fill=NA)
# build the plot for rendering
foo <- ggplot_build(dp)
# now replace the 'x' values in the data for layer 1 (unjittered and un-dodged points)
# with the appropriately dodged and jittered points
foo$data[[1]][['x']] <- jitter(foo$data[[2]][['x']][foo$data[[1]][['group']]],amount = 0.2)
# now draw the plot (need to explicitly load grid package)
library(grid)
grid.draw(ggplot_gtable(foo))
# note the following works without explicitly loading grid
plot(ggplot_gtable(foo))
I don't think you'll like it, but I've never found a way around this except to produce your own x values for the points. In this case:
d$l1.num <- as.numeric(d$l1)
d$l2.num <- (as.numeric(d$l2)/3)-(1/3 + 1/6)
d$x <- d$l1.num + d$l2.num
ggplot(d, aes(l1, y, colour = l2)) + geom_boxplot(fill = NA) +
geom_point(aes(x = x), position = position_jitter(width = 0.15), alpha = 0.5) + theme_bw()
It's certainly a long way from ideal, but becomes routine pretty quickly. If anyone has an alternative solution, I'd be very happy!
The new position_jitterdodge() works for this. However, it requires the fill aesthetic to tell it how to group points, so you have to specify a manual fill to get uncolored boxes:
ggplot(d, aes(x=l1, y=y, colour=l2, fill=l2)) +
geom_point(position=position_jitterdodge(width=0.2), alpha=0.5) +
geom_boxplot() + scale_fill_manual(values=rep('white', length(unique(l2))))
I'm using a newer version of ggplot2 (ggplot2_2.2.1.9000) and I was struggling to find an answer that worked for a similar plot of my own. @John Didon's answer produced an error for me: Error in position_jitterdodge(width = 0.2) : unused argument (width = 0.2). I had previous code that worked with geom_jitter that stopped working after downloading the newer version of ggplot2. This is how I solved it, with minimal-fuss code:
ggplot(d, aes(x=l1, y=y, colour=l2, fill=l2)) +
geom_point(position = position_jitterdodge(dodge.width = 1,
jitter.width = 0.5), alpha=0.5) +
geom_boxplot(position = position_dodge(width = 1), fill = NA)
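A variant of the same idea, sketched under the assumption of a ggplot2 version that uses the jitter.width and dodge.width argument names: the dodge width is stored once (the variable name dodge and the value 0.75 are my choices) and passed to both layers, so the points always sit over their own box. In the versions I am aware of, grouping by colour alone is enough for position_jitterdodge; if yours complains, add fill = l2 as in the code above.

```r
library(ggplot2)

set.seed(7)
d <- data.frame(
  l1 = gl(3, 1, length = 102, labels = letters[1:3]),
  l2 = gl(2, 51, length = 102, labels = LETTERS[1:2]),
  y  = runif(102)
)

dodge <- 0.75  # assumed value; it must be identical in both layers

p <- ggplot(d, aes(x = l1, y = y, colour = l2)) +
  # outlier.shape = NA avoids drawing outliers twice (the points show them)
  geom_boxplot(fill = NA, outlier.shape = NA,
               position = position_dodge(width = dodge)) +
  geom_point(position = position_jitterdodge(jitter.width = 0.2,
                                             dodge.width = dodge),
             alpha = 0.5)
```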
Another option would be to use facets:
set.seed(7)
l1 <- gl(3, 1, length=102, labels=letters[1:3])
l2 <- gl(2, 51, length=102, labels=LETTERS[1:2]) # Will use this later
y <- runif(102)
d <- data.frame(l1, l2, y)
ggplot(d, aes(x=l1, y=y, colour=l2)) +
geom_point(position=position_jitter(width=0.2), alpha=0.5) +
geom_boxplot(fill=NA) +
facet_grid(.~l2) +
theme_bw()
Sorry, I don't have enough points to post the resulting graph.
