R - ggplot2 - Add arrow if geom_errorbar outside limits

I am creating a figure using ggplot and would like to use arrows to indicate where my error bars go beyond the defined axis. For example, I would like to end up with a figure that looks like:
I want R to determine which lower bounds fall outside the defined chart range and to add a nice-looking arrow there (instead of the ugly arrows I added by hand in Paint).
I know there has to be a way to do this. Any ideas? Here is my code to make the above graph, without the hand-drawn arrows:
#generate data
myData<-data.frame(ALPHA=round(runif(60,.5,.8),2),
error=round(runif(60,.05,.15),2),
formN=rep(1:5,12),
Cat=c(rep("ELL",30),rep("SWD",30)),
grade=rep(c(rep(3,5),rep(4,5),rep(5,5),rep(6,5),rep(7,5),rep(8,5)),2)
)
myData$LCL<-myData$ALPHA-myData$error
myData$UCL<-myData$ALPHA+myData$error
#set error outside of range for example
myData[myData$Cat=="ELL" & myData$formN==1,"LCL"]<-0
library(ggplot2)
ggplot(myData, aes(x=formN, y=ALPHA, colour=Cat)) +
geom_errorbar(aes(ymin=LCL, ymax=UCL), width=.4, position=position_dodge(.5)) +
geom_point(position=position_dodge(.5), size=2) +
labs(x="Form", y="Alpha", title="TITLE") +
geom_line(position=position_dodge(.5), size=.3) +
coord_cartesian(ylim=c(.3, 1)) +
facet_wrap(~grade, ncol=3)

What about this: first create a column to check if the values go beyond your range and if this is the case determine the length from the y-point to the border of the plot.
library(dplyr)
myData_m <- myData %>% mutate(LCL_l = ifelse(LCL < .3, ALPHA - .3, NA), UCL_l = ifelse(UCL > 1, 1 - ALPHA, NA))
In the second step, use this variable to add arrows with geom_segment. If some values also exceed the upper limit, you can use the other variable, UCL_l, to add further arrows in the same way.
ggplot(myData_m, aes(x=formN, y=ALPHA, colour=Cat)) +
geom_errorbar(aes(ymin=LCL, ymax=UCL), width=.4, position=position_dodge(.5)) +
geom_point(position=position_dodge(.5), size=2) +
labs(x="Form", y="Alpha", title="TITLE") +
geom_line(position=position_dodge(.5), size=.3) +
coord_cartesian(ylim=c(.3, 1)) +
facet_wrap(~grade, ncol=3) +
geom_segment(aes(x = formN - .12, xend = formN - .12, y = ALPHA, yend = ALPHA - LCL_l), arrow = arrow(length = unit(myData_m$LCL_l, "cm")))
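P.S.: the -.12 moves the arrows onto the dodged error bars rather than leaving them centred on formN. With position_dodge(.5) and two groups, the left-hand (ELL) bars sit about 0.125 to the left of each formN value.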
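A variation on the same idea, as a minimal sketch: keep the axis limits in one vector, subset the rows whose lower bound falls below it, and draw the arrows from that subset with a fixed head length. The names `y_lim`, `demo`, `clipped` and `x_dodged` are made up for this illustration, the toy data is smaller than `myData`, and the 0.125 offset assumes two groups dodged with position_dodge(.5).

```r
library(ggplot2)

y_lim <- c(0.3, 1)  # assumed limits, reused in coord_cartesian() below

set.seed(1)
demo <- data.frame(
  formN = rep(1:5, 2),
  Cat   = rep(c("ELL", "SWD"), each = 5),
  ALPHA = round(runif(10, 0.5, 0.8), 2)
)
demo$LCL <- demo$ALPHA - 0.1
demo$UCL <- demo$ALPHA + 0.1
demo$LCL[demo$Cat == "ELL" & demo$formN == 1] <- 0  # push one bar off the axis

# Rows whose lower bound is cut off by the axis
clipped <- demo[demo$LCL < y_lim[1], ]
# Manual dodge: two groups at width 0.5 sit 0.125 left/right of formN
clipped$x_dodged <- clipped$formN + ifelse(clipped$Cat == "ELL", -0.125, 0.125)

p <- ggplot(demo, aes(x = formN, y = ALPHA, colour = Cat)) +
  geom_errorbar(aes(ymin = LCL, ymax = UCL), width = 0.4,
                position = position_dodge(0.5)) +
  geom_point(position = position_dodge(0.5), size = 2) +
  # Arrow from the point down to the lower axis limit, only for clipped rows
  geom_segment(data = clipped,
               aes(x = x_dodged, xend = x_dodged, yend = y_lim[1]),
               arrow = arrow(length = unit(0.2, "cm")),
               show.legend = FALSE) +
  coord_cartesian(ylim = y_lim)
```

Printing `p` draws the plot. Arrows for the upper limit could be added the same way, from a second subset with UCL > y_lim[2].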

Related

Case dependent scaling of plot size in ggplot loop

I am running several ggplot barplots in a loop, including added text on top of each bar. I have defined the plot scale via coord_fixed and expand_limits. Unfortunately, the y-axis differs from plot to plot, so the scale settings do not fit in all cases, i.e. the text gets cut off and/or the axes get compressed. Let me illustrate:
period <- c(rep("A",4),rep("B",4))
group <- rep(c("C","C","D","D"),2)
size <- rep(c("E","F"),4)
value <- c(23,29,77,62,18,30,54,81)
df <- data.frame(period,group,size,value)
library(ggplot2)
for (i in unique(df$group))  # levels() is NULL when group is a character column (the data.frame() default since R 4.0)
{
p <- ggplot(subset(df, group==i), aes(x=size, y=value, fill = period)) +
geom_bar(position="dodge", stat="identity", show.legend=F) +
geom_text(data=subset(df, group==i), aes(x=size, y=value,label=value),
size=10, fontface="bold", position = position_dodge(width=1),vjust = -0.5) +
expand_limits(y = max(df$value)*0.6) +
coord_fixed(ratio = 0.01)
ggsave(paste0("yourfilepath",i,".png"), width=7.72, height=4.5, units="in", p)
}
I would like the settings of coord_fixed and expand_limits to be case dependent, depending on value. I have experimented with using e.g. expand_limits(y = max(df$value * ifelse(df$value <= 50, 0.6, 1))), but that doesn't work in the way I had hoped. Any suggestions will be greatly appreciated!
Based on @Z.Lin's comment, I have added the df$value[df$group==i] argument to my ifelse function: expand_limits(y = max(df$value[df$group==i] * ifelse(df$value[df$group==i] <= 50, 5, 8))).
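One way to read the fix above, sketched under assumptions: compute the upper limit from the values of the group being plotted rather than from the whole data frame. The multiplier `headroom` and the containers `plots` and `y_top` are hypothetical, and the factor 1.25 is an arbitrary choice to leave space for the labels, not a value from the post.

```r
library(ggplot2)

df <- data.frame(
  period = c(rep("A", 4), rep("B", 4)),
  group  = rep(c("C", "C", "D", "D"), 2),
  size   = rep(c("E", "F"), 4),
  value  = c(23, 29, 77, 62, 18, 30, 54, 81)
)

headroom <- 1.25  # assumed: extra space above the tallest bar of each plot
plots <- list()
y_top <- numeric(0)

for (g in unique(df$group)) {
  sub <- df[df$group == g, ]
  top <- max(sub$value) * headroom  # limit depends on this group only
  y_top[g] <- top
  plots[[g]] <- ggplot(sub, aes(x = size, y = value, fill = period)) +
    geom_col(position = "dodge", show.legend = FALSE) +
    geom_text(aes(label = value), position = position_dodge(width = 0.9),
              vjust = -0.5) +
    expand_limits(y = top)
}
```

Each element of `plots` can then be passed to ggsave() as in the original loop.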

How to have y-axis label of two different colors in ggplot2

As my y-axis stands for two things in the plot, I would like to have, on the same side (left), two labels of the kind 'text1 / text2', with text2 in a particular colour (green). I tried to do so using latex2exp, but I think it does not support the \color LaTeX command.
This is what I tried till now:
ggplot() + geom_point(aes(x=1, y=1)) + ylab(TeX('\\text{text1 / } {\\color{DarkGreen} \\text{text2}}'))
ggplot() + geom_point(aes(x=1, y=1)) + ylab('') +
annotate('text', x = 0.25, y = 1, label = TeX('\\text{text1 / } {\\color{DarkGreen} \\text{text2}}'), angle = 90)
The second one fails miserably, also because it is plotting IN the plot and not outside.

How to set automatic label position based on box height

In a previous question, I asked about moving the label position of a barplot outside of the bar if the bar was too small. I was provided this following example:
library(ggplot2)
options(scipen=2)
dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
Anno = 1:10)
ggplot(data = dataset,
aes(x = Anno,
y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity",
width=0.8,
position="dodge") +
geom_text(aes( y = Riserva_Riv_Fine_Periodo,
label = round(Riserva_Riv_Fine_Periodo, 0),
angle=90,
hjust= ifelse(Riserva_Riv_Fine_Periodo < 3000000, -0.1, 1.2)),
col="red",
size=4,
position = position_dodge(0.9))
And I obtain this graph:
The problem with the example is that the value at which the label is moved must be hard-coded into the plot, and an ifelse statement is used to reposition the label. Is there a way to automatically extract the value to cut?
A slightly better option might be to base the test and the positioning of the labels on the height of the bar relative to the height of the highest bar. That way, the cutoff value and label-shift are scaled to the actual vertical range of the plot. For example:
ydiff = max(dataset$Riserva_Riv_Fine_Periodo)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity", width=0.8) +
geom_text(aes(label = round(Riserva_Riv_Fine_Periodo, 0), angle=90,
y = ifelse(Riserva_Riv_Fine_Periodo < 0.3*ydiff,
Riserva_Riv_Fine_Periodo + 0.1*ydiff,
Riserva_Riv_Fine_Periodo - 0.1*ydiff)),
col="red", size=4)
You would still need to tweak the fractional cutoff in the test condition (I've used 0.3 in this case), depending on the physical size at which you render the plot. But you could package the code into a function to make any manual adjustments a bit easier.
It's probably possible to automate this by determining the actual sizes of the various grobs that make up the plot and setting the condition and the positioning based on those sizes, but I'm not sure how to do that.
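A minimal sketch of the "package it into a function" suggestion. `label_y`, `cutoff` and `shift` are hypothetical names; the defaults 0.3 and 0.1 are the fractions used in the code above and would still need tuning for a given output size.

```r
library(ggplot2)

# Hypothetical helper: y position for each label, relative to the tallest bar.
# Short bars get the label above the bar top, tall bars get it inside.
label_y <- function(value, cutoff = 0.3, shift = 0.1) {
  top <- max(value)
  ifelse(value < cutoff * top, value + shift * top, value - shift * top)
}

dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
                      Anno = 1:10)
dataset$label_pos <- label_y(dataset$Riserva_Riv_Fine_Periodo)

p <- ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
  geom_col(width = 0.8) +
  geom_text(aes(y = label_pos, label = round(Riserva_Riv_Fine_Periodo, 0)),
            angle = 90, colour = "red", size = 4)
```

Changing the cutoff then only means calling label_y() with a different `cutoff` value, without touching the plotting code.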
Just as an editorial comment, a plot with labels inside some bars and above others risks confusing the visual mapping of magnitudes to bar heights. I think it would be better to find a way to shrink, abbreviate, recode, or otherwise tweak the labels so that they contain the information you want to convey while being able to have all the labels inside the bars. Maybe something like this:
library(scales)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo/1000)) +
geom_col(width=0.8, fill="grey30") +
geom_text(aes(label = format(Riserva_Riv_Fine_Periodo/1000, big.mark=",", digits=0),
y = 0.5*Riserva_Riv_Fine_Periodo/1000),
col="white", size=3) +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
theme_classic() +
labs(y="Riserva (thousands)")
Or maybe go with a line plot instead of bars:
ggplot(dataset, aes(Anno, Riserva_Riv_Fine_Periodo/1e3)) +
geom_line(linetype="11", size=0.3, colour="grey50") +
geom_text(aes(label=format(Riserva_Riv_Fine_Periodo/1e3, big.mark=",", digits=0)),
size=3) +
theme_classic() +
scale_y_continuous(label=dollar, expand=c(0,1e2)) +
expand_limits(y=0) +
labs(y="Riserva (thousands)")

Midpoint of discrete diverging scale in ggplot2

I'm working with some grid data and I'm having problems with discrete diverging scales. Specifically, I don't know how to set the midpoint so that it's not at the center of the range. This is a reproducible example to show what I mean:
library(ggplot2)
grid <- expand.grid(lon = seq(0, 360, by = 2), lat = seq(-90, 0, by = 2))
grid$z <- with(grid, cos(lat*pi/180) - .7)
ggplot(grid, aes(lon, lat)) +
geom_raster(aes(fill = cut_width(z, .1))) +
scale_fill_brewer(palette = "RdBu")
Here, the center of the scale is not at the divide between positive and negative values. I know I could use a continuous scale, but I find that having fewer colours helps with what I'm trying to show.
Is there a way to shift the midpoint in a discrete scale? Other alternatives that achieve the same result are welcome too.
The issue is that your cut points are not falling symmetrically around 0, and are mapping directly to your colors. One approach is to manually set your cut points so that they center around 0. Then, just make sure to not drop unused levels in the legend:
zCuts <-
seq(-.7, 0.7, length.out = 10)
ggplot(grid, aes(lon, lat)) +
geom_raster(aes(fill = cut(z, zCuts))) +
scale_fill_brewer(palette = "RdBu"
, drop = FALSE)
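If the cut points should follow the data instead of being typed in, a sketch along these lines derives them from the largest absolute value, so that they are symmetric around 0. `symmetric_breaks`, `n_bins` and `z_bin` are hypothetical names, and include.lowest = TRUE is added here so that a value sitting exactly on the lowest break is not turned into NA.

```r
library(ggplot2)

# Hypothetical helper: n_bins equal-width bins spanning -max|z| to +max|z|
symmetric_breaks <- function(z, n_bins = 9) {
  lim <- max(abs(z))
  seq(-lim, lim, length.out = n_bins + 1)
}

grid <- expand.grid(lon = seq(0, 360, by = 2), lat = seq(-90, 0, by = 2))
grid$z <- with(grid, cos(lat * pi / 180) - 0.7)

zCuts <- symmetric_breaks(grid$z)
grid$z_bin <- cut(grid$z, zCuts, include.lowest = TRUE)

p <- ggplot(grid, aes(lon, lat)) +
  geom_raster(aes(fill = z_bin)) +
  scale_fill_brewer(palette = "RdBu", drop = FALSE)
```

With an odd `n_bins`, as here, the middle bin straddles 0 and takes the neutral middle colour of the palette.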
If you are willing to go with a gradient instead of such discrete colors, you can use scale_fill_gradient2 which by default centers at 0 and ranges between two colors:
ggplot(grid, aes(lon, lat)) +
geom_raster(aes(fill = z)) +
scale_fill_gradient2()
Or, if you really want the interpolation from Color Brewer, you can set the limits argument in scale_fill_distiller and get a gradient that way instead. Here, I set the limits to plus and minus the largest deviation from 0 (max(abs(grid$z)) picks it up whether it comes from the minimum or the maximum), which ensures that the range is symmetrical. If you are using more than the 11 available values, that is probably the best way to go:
ggplot(grid, aes(lon, lat)) +
geom_raster(aes(fill = z)) +
scale_fill_distiller(palette = "RdBu"
, limits = c(-1,1)*max(abs(grid$z))
)
If you want more colors without using a gradient, you will probably need to construct your own palette manually. The more colors you add, the harder they become to tell apart. Here is one example that stitches together two palettes to ensure that you are working from colors that are distinct.
library(RColorBrewer)  # provides brewer.pal()
zCuts <-
seq(-.7, 0.7, length.out = 20)
myPallette <-
c(rev(brewer.pal(9, "YlOrRd"))
, "white"
, brewer.pal(9, "Blues"))
ggplot(grid, aes(lon, lat)) +
geom_raster(aes(fill = cut(z, zCuts))) +
scale_fill_manual(values = myPallette
, drop = FALSE)
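Because cut() with k break points produces k - 1 bins, the stitched palette needs exactly one colour per bin. A small sketch that derives the breaks from the palette length, so the two cannot drift apart. `n_side`, `myPalette` and `bins` are hypothetical names, and 9 is the largest size brewer.pal() offers for these sequential palettes.

```r
library(RColorBrewer)

n_side <- 9  # colours taken from each sequential palette
myPalette <- c(rev(brewer.pal(n_side, "YlOrRd")),
               "white",
               brewer.pal(n_side, "Blues"))

# One more break than colours, symmetric around 0
zCuts <- seq(-0.7, 0.7, length.out = length(myPalette) + 1)

# Three probe values: lowest bin, middle (white) bin, highest bin
bins <- cut(c(-0.65, 0, 0.65), zCuts)
```

The resulting vector can be passed to scale_fill_manual(values = myPalette, drop = FALSE) in the same way as above.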

Whisker plots to compare mean and variance between clusters [duplicate]

I am trying to recreate a figure from a ggplot2 seminar: http://dl.dropbox.com/u/42707925/ggplot2/ggplot2slides.pdf.
In this case, I am trying to generate Example 5, with jittered data points subject to a dodge. When I run the code, the points are centered around the correct line, but have no jitter.
Here is the code directly from the presentation.
set.seed(12345)
hillest<-c(rep(1.1,100*4*3)+rnorm(100*4*3,sd=0.2),
rep(1.9,100*4*3)+rnorm(100*4*3,sd=0.2))
rep<-rep(1:100,4*3*2)
process<-rep(rep(c("Process 1","Process 2","Process 3","Process 4"),each=100),3*2)
memorypar<-rep(rep(c("0.1","0.2","0.3"),each=4*100),2)
tailindex<-rep(c("1.1","1.9"),each=3*4*100)
ex5<-data.frame(hillest=hillest,rep=rep,process=process,memorypar=memorypar, tailindex=tailindex)
stat_sum_df <- function(fun, geom="crossbar", ...) {stat_summary(fun.data=fun, geom=geom, ...) }
dodge <- position_dodge(width=0.9)
p<- ggplot(ex5,aes(x=tailindex ,y=hillest,color=memorypar))
p<- p + facet_wrap(~process,nrow=2) + geom_jitter(position=dodge) +geom_boxplot(position=dodge)
p
In ggplot2 version 1.0.0 there is a new position named position_jitterdodge() that is made for this situation. This position should be used inside geom_point(), and fill= should be set inside aes() to show which variable to dodge by. To control the width of the dodging, use the argument dodge.width=.
ggplot(ex5, aes(x=tailindex, y=hillest, color=memorypar, fill=memorypar)) +
facet_wrap(~process, nrow=2) +
geom_point(position=position_jitterdodge(dodge.width=0.9)) +
geom_boxplot(fill="white", outlier.colour=NA, position=position_dodge(width=0.9))
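For reference, a smaller self-contained sketch of the same layering. The toy data frame `toy` and the value jitter.width = 0.2 are assumptions made for illustration; dodge.width is set to the same 0.9 on both layers so that the points line up with their boxes.

```r
library(ggplot2)

set.seed(42)
toy <- data.frame(
  tailindex = rep(c("1.1", "1.9"), each = 60),
  memorypar = rep(rep(c("0.1", "0.2", "0.3"), each = 20), 2),
  hillest   = rnorm(120, mean = 1.5, sd = 0.2)
)

p <- ggplot(toy, aes(x = tailindex, y = hillest,
                     colour = memorypar, fill = memorypar)) +
  geom_boxplot(fill = "white", outlier.colour = NA,
               position = position_dodge(width = 0.9)) +
  # Points are dodged by memorypar first, then jittered within each box
  geom_point(position = position_jitterdodge(jitter.width = 0.2,
                                             dodge.width = 0.9),
             alpha = 0.5)

pts <- layer_data(p, 2)  # computed positions of the point layer
```

Inspecting `pts$x` shows the horizontal positions after dodging and jittering, which is a quick way to check that the jitter was applied.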
EDIT: There is a better solution with ggplot2 version 1.0.0 using position_jitterdodge. See @Didzis Elferts' answer. Note that dodge.width controls the width of the dodging and jitter.width controls the width of the jittering.
I'm not sure how the code produced the graph in the pdf.
But does something like this get you close to what you're after?
I convert tailindex and memorypar to numeric; add them together; and the result is the x coordinate for the geom_jitter layer. There's probably a more effective way to do it. Also, I'd like to see how dodging geom_boxplot and geom_jitter, and with no jittering, will produce the graph in the pdf.
library(ggplot2)
dodge <- position_dodge(width = 0.9)
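# factor() makes the two tailindex levels map to x positions 1 and 2,
# whether the column is stored as character or as factor
ex5$memorypar2 <- as.numeric(factor(ex5$tailindex)) +
3 * (as.numeric(as.character(ex5$memorypar)) - 0.2)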
p <- ggplot(ex5,aes(x=tailindex , y=hillest)) +
scale_x_discrete() +
geom_jitter(aes(colour = memorypar, x = memorypar2),
position = position_jitter(width = .05), alpha = 0.5) +
geom_boxplot(aes(colour = memorypar), outlier.colour = NA, position = dodge) +
facet_wrap(~ process, nrow = 2)
p
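To see what the manual offset does, here is the arithmetic on its own, as a sketch with hypothetical names `combos` and `x_pos`. It assumes tailindex is converted to a factor first, so its two levels become the positions 1 and 2 on the discrete axis. The three memorypar values then land 0.3 to the left, at the centre, and 0.3 to the right, which matches a dodge width of 0.9 split across three groups.

```r
# All combinations of the two grouping variables, as character columns
combos <- expand.grid(memorypar = c("0.1", "0.2", "0.3"),
                      tailindex = c("1.1", "1.9"),
                      stringsAsFactors = FALSE)

# Axis position of the tailindex level plus an offset for memorypar
combos$x_pos <- as.numeric(factor(combos$tailindex)) +
  3 * (as.numeric(combos$memorypar) - 0.2)
```

The points for each memorypar group are then jittered around these six centres by position_jitter(width = .05).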
