ggplot in R - stop error bars going below zero

I'm plotting the modelled population density of several bird species +/- standard error. Because the y variable is density, values below zero make no sense, so I want to truncate the error bars at zero. However, I'm having trouble doing this.
This code works fine, but as you can see for Black Kite the error bars go below zero:
library(ggplot2)
bird.plot.data <- data.frame(species = rep(c("Black kite", "Cormorant","Goosander"),2),
Restored = c(rep("YES",3), rep("NO",3)),
est.count = c(1.48, 3.12, 20.0, 0, 5.18, 2.11),
std.err = c(1.78, 1.78, 1.39, 0, 0.66, 1.02))
bird.plot <- ggplot(data = bird.plot.data, aes(x = Restored))+
facet_wrap(~ species, scales = "free_y")+
geom_col(aes(y = est.count, fill = Restored), position = position_dodge())+
geom_errorbar(aes(ymax = est.count + std.err, ymin = est.count - std.err ))+
scale_fill_manual(values = c("darkgreen", "olivedrab1"))+
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.x = element_blank())+
ylab("Estimated density (birds/km\U00B2)")
last_plot()
I've tried a couple of options. The most obvious one would be to modify the ymin of the error bars themselves to be no lower than zero. However, this messes up the error bars completely and I'm not sure why:
b.p.mod <- ggplot(data = bird.plot.data, aes(x = Restored))+
facet_wrap(~ species, scales = "free_y")+
geom_col(aes(y = est.count, fill = Restored), position = position_dodge())+
geom_errorbar(aes(ymax = est.count + std.err, ymin = max(est.count - std.err, 0)))+
scale_fill_manual(values = c("darkgreen", "olivedrab1"))+
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.x = element_blank())+
ylab("Estimated density (birds/km\U00B2)")
last_plot()
Another option would be to limit the y axis to 0 so the error bar is not shown below zero. However, the cropping method
b.p.mod2 <- bird.plot + ylim(0,NA)
last_plot()
removes the error bar completely, which I don't want. The zooming method
b.p.mod3 <- bird.plot + coord_cartesian(ylim = c(0,NA))
last_plot() # Produces error
doesn't let me leave the upper end unspecified, which is important because different species have very different densities.
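A scale-based middle ground, sketched here as an assumption rather than something tested in this thread: ylim(0, NA) drops the Black kite error bar because out-of-bounds values are censored (set to NA and removed) by default. scale_y_continuous() also accepts an oob argument, and scales::squish clamps out-of-range values onto the limit instead of removing them. Leaving the upper limit as NA lets each free_y panel keep its own data-driven maximum. This assumes the scales package (installed alongside ggplot2) is available.

```r
library(ggplot2)

# Same data as in the question
bird.plot.data <- data.frame(
  species   = rep(c("Black kite", "Cormorant", "Goosander"), 2),
  Restored  = c(rep("YES", 3), rep("NO", 3)),
  est.count = c(1.48, 3.12, 20.0, 0, 5.18, 2.11),
  std.err   = c(1.78, 1.78, 1.39, 0, 0.66, 1.02)
)

p <- ggplot(bird.plot.data, aes(x = Restored)) +
  facet_wrap(~ species, scales = "free_y") +
  geom_col(aes(y = est.count, fill = Restored)) +
  geom_errorbar(aes(ymin = est.count - std.err, ymax = est.count + std.err)) +
  # lower limit fixed at 0, upper limit (NA) taken from the data in each panel;
  # squish moves values below 0 onto 0 instead of dropping them
  scale_y_continuous(limits = c(0, NA), oob = scales::squish) +
  scale_fill_manual(values = c("darkgreen", "olivedrab1"))

# Error-bar data after scale mapping: no lower end should be below 0
eb <- layer_data(p, 2)
print(eb[, c("PANEL", "ymin", "ymax")])
```

Unlike the ifelse() approach, this leaves the aesthetics untouched and handles the truncation in the scale, so it applies to every layer that uses the y axis.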
Thoughts? My preferred solution would be to work out why the first option is creating such odd results.

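I know this is an old post, but maybe somebody will stumble upon the same problem.
For me, this:
geom_errorbar(aes(ymax = est.count + std.err, ymin = ifelse(est.count - std.err < 0, 0, est.count - std.err)))
works perfectly fine. Unlike max(), which collapses the whole column to a single number that is then recycled for every bar, ifelse() works element by element, so each error bar gets its own lower limit.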
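The reason the max() version goes wrong can be checked directly: max() is a summary function, so max(est.count - std.err, 0) returns one number for the whole column (the largest lower bound), and that single value becomes ymin for every error bar. pmax() is the element-wise counterpart and gives the same result as the ifelse() call above. A minimal sketch using the question's data:

```r
library(ggplot2)

bird.plot.data <- data.frame(
  species   = rep(c("Black kite", "Cormorant", "Goosander"), 2),
  Restored  = c(rep("YES", 3), rep("NO", 3)),
  est.count = c(1.48, 3.12, 20.0, 0, 5.18, 2.11),
  std.err   = c(1.78, 1.78, 1.39, 0, 0.66, 1.02)
)

lower <- with(bird.plot.data, est.count - std.err)

# max() collapses everything to a single number (the largest lower bound)
one_value <- max(lower, 0)
# pmax() compares element by element, so each row keeps its own bound
per_row <- pmax(lower, 0)

p <- ggplot(bird.plot.data, aes(x = Restored)) +
  facet_wrap(~ species, scales = "free_y") +
  geom_col(aes(y = est.count, fill = Restored)) +
  geom_errorbar(aes(ymin = pmax(est.count - std.err, 0),
                    ymax = est.count + std.err)) +
  scale_fill_manual(values = c("darkgreen", "olivedrab1"))
```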

Related

ggplot how to control fonts on boxplot with stat

How do I control the font family and size of text elements added to my boxplot?
Following the approach in this question, I have implemented the following code to show the number of observations:
library("ggplot2")
v_min <- -1
v_max <- 3.5
increm <- 0.5
y_limits <- c(v_min, v_max)
increms <- seq(v_min, v_max, increm)
counts <- function(x){
# you can experiment with 'adjust' and 'max-value' to find the perfect position
adjust <- 0.95
return(c(y = adjust * v_max, label = length(x)))
}
ggplot(d1, aes(x = ONRC_Hierarchy, y=lwpMeanRut_inc)) +
geom_boxplot(outlier.alpha = 0.2, outlier.size = 0.5) +
geom_hline(aes(yintercept=0), color="blue", linetype="dotted", size=1)+
stat_summary(fun.data = counts, geom = "text") +
scale_y_continuous(limits = y_limits, breaks = increms) +
xlab("") +
ylab("Rut Increment (mm/year)\n") +
theme_minimal() +
theme(
text = element_text(size = 10, family = "mono"),
axis.text.x=element_text(angle = -35, hjust = 0),
panel.grid.major.y = element_line(color = "lightgray",
size = 0.15,linetype = 2),
panel.grid.minor.y = element_blank(),
panel.grid.major.x = element_blank())
This solution works, as shown in the plot below, except that the font differs from the other graph elements. As you can see, I have tried to control this with the theme() statement, but it does not seem to work. Note that I deliberately used a small mono font to highlight the difference between the number-of-observations labels and the other graph elements.
Geom/stat fonts are set in the geom/stat layer, not in theme(). In this case, you can add family = "mono" as an argument to your stat_summary().
The size of fonts in geom_text(), geom_label() etc. is specified in millimetres, so it is not on the same scale as element_text() theme options, which are in points. You can read more about that here and here.
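To make the count labels match the theme text, a minimal sketch: pass family to the stat_summary() layer and convert the theme's point size to millimetres with ggplot2's exported .pt constant (72.27 / 25.4, points per millimetre). The built-in mtcars data and the simplified counts() helper, which returns a data frame with a fixed y position, are stand-ins for the question's d1 and y limits, which were not shared.

```r
library(ggplot2)

text_pt <- 10  # font size used in theme(), in points

# Stand-in for the question's counts(): label drawn at a fixed y = 34
counts <- function(x) {
  data.frame(y = 34, label = length(x))
}

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  # font family and size are set on the layer, not in theme();
  # geom text size is in mm, so divide the point size by .pt
  stat_summary(fun.data = counts, geom = "text",
               family = "mono", size = text_pt / .pt) +
  theme_minimal() +
  theme(text = element_text(size = text_pt, family = "mono"))

labels <- layer_data(p, 2)
```

Keeping the point size in one variable (text_pt here) means the theme and the layer stay in sync if you change it later.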

colour geom_rect under certain condition

I've got the following code:
ggplot(dummy$Crustacean) +
geom_rect(
aes(
xmin = char2num(sites_fct) - 0.4,
xmax = char2num(sites_fct) + 0.4,
ymin = ifelse(trophic == "Crustacean", 0.01, 1),
ymax = summed_tu),
colour = 'black', alpha =0.7) +
labs(y= expression("Summed TU"[EC10-QSAR]), x= "Sampling sites")+
scale_y_log10(limits = c(0.0001, 1)) +
# Fake discrete axis
scale_x_continuous(labels = sort(unique(dummy$Crustacean$sites_fct)), breaks = 1:9) +
# before the dot means vertical plotting
facet_grid(dummy$Crustacean$metrics_fct ~ dummy$Crustacean$trophic) +
theme_bw()+
# facet_grid box colour
theme(strip.background.x = element_rect(colour = "black", fill = "white"),
strip.background.y = element_blank(), strip.text.y = element_blank())+
theme(axis.text.x = element_text(size=10, margin =margin(0,0,0,0), angle =45, vjust = 1, hjust=1),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.line.x = element_line(color = 'black', size=0.5),
axis.line.y = element_blank())
which gives the following figure as output:
I need to change the colour of the boxes with y > 0.01 to get this desired output:
I found several posts about backgrounds (quite useful for the future), but I could not find anything like my example.
Thanks!
OP, this should help you. You're trying to draw what appears to be a column or bar chart, so it's probably best to use geom_col instead of geom_rect. With geom_col you only have to supply an x aesthetic (a discrete value) and a y aesthetic for the height of the bar. You have not shared your data, but it seems the x axis is already categorical in your dataset, right?
Here's a reprex:
library(ggplot2)
set.seed(1234)
df <- data.frame(x=LETTERS, y=rnorm(26))
ggplot(df, aes(x,y)) +
geom_col(
aes(fill=ifelse(y>0, 'positive', 'negative')),
color='black', alpha=0.8
) +
scale_fill_manual(name='Value', values=c('positive'='orange', 'negative'='gray'))
What's going on here is that we only have to supply x and y to place the bars and set their heights. For the fill of each bar, you can set the label to "positive" or "negative" (or whatever label you want) on the fly via an ifelse statement. This alone creates a legend, with fill colours chosen automatically. To fix a particular set of colours, I set them manually via scale_fill_manual(), supplying a named vector to the values argument.
In your case, you can probably do something similar with geom_rect: try specifying fill= inside aes() in the same way as here. However, I'd recommend switching to geom_col, as it is the most appropriate geom for what you're doing.
EDIT
As OP indicated in the comments, in the original question on which this is based, geom_rect is required because the bars' minimum is not always the same value. The ymin aesthetic changes, so it makes sense to use geom_rect here.
The brute-force way is to still use ifelse statements inside aes() for fill. It gets a bit dodgy, but it gets the job done:
ggplot(df) +
geom_rect(
aes(
xmin = char2num(sites) - 0.4,
xmax = char2num(sites) + 0.4,
ymin = ifelse(trop == "pt", 0.1, 1),
ymax = conc,
fill = ifelse(trop == "pt",
ifelse(conc > 0.1, 'positive', 'negative'),
ifelse(conc > 1, 'positive', 'negative'))
),
colour = 'black', alpha = 0.8
) +
scale_y_log10() +
# Fake discrete axis
scale_x_continuous(labels = sort(unique(df$sites)),
breaks = 1:3) +
scale_fill_manual(name='Conc', values=c('positive'='orange', 'negative'='gray')) +
facet_grid(. ~ trop) +
theme_bw()
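The geom_rect() calls in this answer rely on a df with sites, trop and conc columns and on a char2num() helper, neither of which was shown. To try the code, a hypothetical stand-in could look like the sketch below. The data values and the char2num() definition are assumptions, not the original objects; mapping the sorted site labels to 1, 2, 3 is chosen so it agrees with the breaks = 1:3 and sort(unique(df$sites)) labels used for the fake discrete axis.

```r
library(ggplot2)

# Hypothetical helper: position of each site label among the sorted labels
char2num <- function(x) as.numeric(factor(x))

# Hypothetical data: three sites, two trophic levels ("pt" and "other")
df <- data.frame(
  sites = rep(c("S1", "S2", "S3"), times = 2),
  trop  = rep(c("pt", "other"), each = 3),
  conc  = c(0.05, 0.5, 2, 0.5, 3, 10)
)

# Same rule as the nested ifelse(): threshold 0.1 for "pt", 1 otherwise
thresholds <- ifelse(df$trop == "pt", 0.1, 1)
status <- ifelse(df$conc > thresholds, "positive", "negative")
print(data.frame(df, status))
```

With these objects defined, the ggplot() calls in this answer should run as written.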
To finish, you may want to adjust the order of the items in the legend and avoid the somewhat icky nested ifelse calls. In that case, you can do the checking outside the ggplot call. If df$trop has more than two values, consider creating the df$conc_min column via a merge with another dataset; with only two values, ifelse works fine.
df$conc_adjust <- char2num(df$sites)
df$conc_min <- ifelse(df$trop=='pt', 0.1, 1)
df$status <- ifelse(df$conc > df$conc_min, 'positive', 'negative')
# levels of the factor = the order appearing in the legend
df$status <- factor(df$status, levels=c('positive', 'negative'))
ggplot(df) +
geom_rect(
aes(
xmin = conc_adjust - 0.4,
xmax = conc_adjust + 0.4,
ymin = conc_min,
ymax = conc,
fill = status
),
colour = 'black', alpha = 0.8
) +
scale_y_log10() +
# Fake discrete axis
scale_x_continuous(labels = sort(unique(df$sites)),
breaks = 1:3) +
scale_fill_manual(name='Conc', values=c('positive'='orange', 'negative'='gray')) +
facet_grid(. ~ trop) +
theme_bw()

Putting horizontal lines on grouped boxplots

I am trying to make a boxplot with this basic code:
library(ggplot2)
library(latex2exp) # provides TeX(), used in xlab() below
design=c("Red","Green","Blue")
actions=c("1","2","3","4","5","6","7","8")
proportion=(seq(1:240)+sample(1:500, 240, replace=T))/2000
df=data.frame(design, actions , proportion)
ggplot(df, aes(x=actions, y=proportion, fill=design)) +
geom_boxplot()+
xlab(TeX("group"))+
ylab("Y value")+
ggtitle("Y values for each group stratified by color")
Producing something like this:
I want to add horizontal lines for "true" Y values that are different for each group.
Does anyone have any tips for doing this? I don't know how to extract the width of each group of boxes, otherwise I could use geom_segment.
Here is a MWE with a non-grouped boxplot:
dBox <- data.frame(y = rnorm(10),group="1")
dBox=rbind(dBox,data.frame(y=rnorm(10),group="2"))
dLines <- data.frame(X =c(-0.36, 0.015),
Y = c(0.4, -0.2),
Xend = c(-0.015, 0.36),
Yend=c(0.4, -0.2),
group = c("True", "True"),
color = c("black", "red"))
ggplot(dBox, aes(x=0, y=y,fill=group)) +
geom_boxplot(outlier.shape = 1)+
geom_segment(data = dLines, aes(x = X, xend = Xend, y = Y, yend = Yend),color="red",size=1.5,linetype=1) +
theme(legend.background = element_rect(fill = "white", size = 0.1, linetype = "solid", colour = "black"))
This produces something like this:
However, it's difficult to make the geom_segments line up with the boxes exactly, and to then extend this to the grouped boxplot setting.
Thanks!
This can be done using a workaround with facets:
design=c("Red","Green","Blue")
actions=c("1","2","3","4","5","6","7","8")
proportion=(seq(1:240)+sample(1:500, 240, replace=T))/2000
df=data.frame(design, actions , proportion)
lines = data.frame(actions = as.character(1:8), proportion=abs(rnorm(8))) # character, to match df$actions
p = ggplot(df, aes(x=actions, y=proportion, fill=design)) +
geom_boxplot()+
xlab("group")+
ylab("Y value")+
ggtitle("Y values for each group stratified by color") +
facet_grid(~actions, scales='free_x') +
theme(
panel.spacing.x = unit(0, "lines"),
strip.background = element_blank(),
strip.text.x = element_blank())
p + geom_hline(aes(yintercept = proportion), lines)
Thanks to @eugene100hickey for pointing out how to remove the spacing between facets: theme(panel.spacing.x = unit(0, "lines")), already included in the theme() call above, removes those pesky gaps so the groups sit side by side as in an ordinary grouped boxplot.
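A facet-free alternative, sketched as an assumption rather than something tested in the thread: draw each true value with geom_errorbar() using ymin = ymax, so it becomes a flat line whose width is set in x-axis units. On a discrete axis each category is 1 unit wide and geom_boxplot() fills 0.9 of it by default, so width = 0.9 should span the whole cluster of dodged boxes. The lines data frame and its truth column are hypothetical stand-ins for the true values.

```r
library(ggplot2)
set.seed(1)

design  <- c("Red", "Green", "Blue")
actions <- as.character(1:8)
df <- data.frame(design, actions,
                 proportion = (1:240 + sample(1:500, 240, replace = TRUE)) / 2000)

# Hypothetical "true" values, one per action group
lines <- data.frame(actions = as.character(1:8), truth = runif(8, 0.1, 0.3))

p <- ggplot(df, aes(x = actions, y = proportion, fill = design)) +
  geom_boxplot() +
  # ymin = ymax turns each error bar into a flat line;
  # width = 0.9 matches the default total width of a group of boxes
  geom_errorbar(data = lines,
                aes(x = actions, ymin = truth, ymax = truth),
                width = 0.9, colour = "red",
                inherit.aes = FALSE)

seg <- layer_data(p, 2)
```

If the lines should be per box rather than per group, the same idea would need position_dodge() with a matching width, which is harder to line up exactly.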

ggplot dot covering error bar

I have a huge file and don't know which small test dataset would reproduce the same problem, so instead of a test dataset I'm attaching an image of the plot to show the problem.
My code:
ggplot(tgc, aes(x=Week, y=MuFreq)) +
theme_gray(base_size=18) +
theme(plot.title=element_text(hjust=.5),
axis.title.x = element_text(face="bold"),
axis.title.y = element_text(face="bold")) +
geom_errorbar(aes(ymin=MuFreq-(1.96*se), ymax=MuFreq+(1.96*se)), width=3) +
geom_line() +
geom_point(aes(size= N), color="blue")+
scale_x_continuous(breaks=c(68,98,188), labels=c("Wk68", "Wk98", "Wk188")) +
scale_y_continuous(limits=c(0,0.15)) +
scale_size( breaks = unique(tgc$N))
The problem is that I'm sizing the dots by the sample size for each week: the middle dot has error bars associated with it, but the dot covers them. I tried using a horizontal error bar, but it didn't work because my x-axis is customised to be non-numerical.
What can I do to show the error bar that's being covered?
Also is there any way to make the background vertical grid lines spaced evenly?
The question asks for two improvements to the ggplot2 chart:
Show error bars that are being covered
Make the background vertical grid lines spaced evenly
Data
As the OP didn't supply any data, we need a dummy data set. This is easily done by reading values from the plot:
tgc <- data.frame(Week = c(68, 98, 188),
MuFreq = c(0.08, 0.09, 0.091),
se = c(0.003, 0.001, 0.019)/1.96,
N = c(91, 835, 7))
This reproduces the original plot quite nicely:
Variant 1
This one picks up Nick Criswell's comments:
Change order in which layers are plotted, so that error bars are plotted on top
Change colour and alpha
plus
Remove all vertical grid lines except those explicitly specified as breaks. The major grid lines are still unevenly spaced, but the spacing reflects the difference in time
With this code
library(ggplot2)
ggplot(tgc, aes(x = Week, y = MuFreq)) +
theme_gray(base_size = 18) +
theme(plot.title = element_text(hjust = .5),
axis.title = element_text(face = "bold")) +
geom_line() +
geom_point(aes(size = N), color = "dodgerblue1", alpha = 0.5) +
geom_errorbar(aes(ymin = MuFreq - (1.96 * se),
ymax = MuFreq + (1.96 * se)), width = 3) +
scale_x_continuous(
breaks = c(68, 98, 188),
labels = c("Wk68", "Wk98", "Wk188"),
minor_breaks = NULL
) +
scale_y_continuous(limits = c(0, 0.15)) +
scale_size(breaks = unique(tgc$N))
we do get:
Variant 2
To get evenly spaced data points on the x-axis we can turn the weeks into a factor. This requires telling ggplot2 that the data belong to one group (group = 1) so that the line is drawn, and adding a custom x-axis label.
In addition, theme_bw is used instead of theme_gray:
library(ggplot2)
ggplot(tgc, aes(x = factor(Week, labels = c("Wk68", "Wk98", "Wk188")),
y = MuFreq, group = 1)) +
theme_bw(base_size = 18) +
theme(plot.title = element_text(hjust = .5),
axis.title = element_text(face = "bold")) +
geom_line() +
geom_point(aes(size = N), color = "dodgerblue1", alpha = 0.5) +
geom_errorbar(aes(ymin = MuFreq - (1.96 * se),
ymax = MuFreq + (1.96 * se)), width = 0.05 ) +
scale_y_continuous(limits = c(0, 0.15)) +
scale_size(breaks = unique(tgc$N)) +
xlab("Week")
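Variant 1 relies on ggplot2 drawing layers in the order they are added, so moving geom_errorbar() after geom_point() puts the bars on top of the semi-transparent dots. A small check of that order on the dummy tgc data (the indices are simply positions in the p$layers list):

```r
library(ggplot2)

tgc <- data.frame(Week = c(68, 98, 188),
                  MuFreq = c(0.08, 0.09, 0.091),
                  se = c(0.003, 0.001, 0.019) / 1.96,
                  N = c(91, 835, 7))

p <- ggplot(tgc, aes(x = Week, y = MuFreq)) +
  geom_line() +
  geom_point(aes(size = N), color = "dodgerblue1", alpha = 0.5) +
  geom_errorbar(aes(ymin = MuFreq - 1.96 * se, ymax = MuFreq + 1.96 * se),
                width = 3)

# Layers are stored, and drawn, in the order they were added
layer_geoms <- vapply(p$layers, function(l) class(l$geom)[1], character(1))
print(layer_geoms)
```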

Position dodge does not work with geom_point and geom_errorbar

I have an overplotting issue. Even after reading a lot of posts on dodge, jitter and jitter-dodge in all kinds of implementations, I can't figure it out.
Here you can get my data: http://pastebin.com/embed_js.php?i=uPXN7nPt
library(dplyr)
library(gdata)
library(ggplot2)
library(directlabels)
all<-read.xls('all_auto_bio_adjusted_c.xls')
all$size.new<-sqrt(all$size.new)
all$station<-as.factor(all$station)
all$group.new<-factor(all$group, levels=c('C. hyperboreus','C. glacialis','Special Calanus','M. longa','Pseudocalanus sp.','Copepoda'))
pd <- position_dodge(width = 50)
allp <- ggplot(data = all, aes(y = averagebiol, x = automatic, colour = group.new, group=group.new)) +
geom_abline(intercept = 0, slope = 1) +
geom_point(aes(size = size.new), show.legend=TRUE, position=pd) +
scale_size_identity()+
geom_errorbar(aes(ymin = averagebiol - stdevbiol, ymax = averagebiol + stdevbiol),colour = "grey", width = 0.1, position=pd) +
facet_grid(group.new~station, scales="free") +
xlab("Automatic identification") + ylab("Manual identification") +
ggtitle("Comparison of automatic vs manual identification") +
theme_bw() +
theme(plot.title = element_text(lineheight=.8, face="bold", size=20,vjust=1), axis.text.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=.5,face="bold"), axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="bold"), axis.title.x = element_text(colour="grey20",size=20,angle=0,hjust=.5,vjust=0,face="bold"), axis.title.y = element_text(colour="grey20",size=20,angle=90,hjust=.5,vjust=1,face="bold"), legend.position="none", strip.text.x = element_text(size = 12, face="bold", colour = "black", angle = 0), strip.text.y = element_text(size = 12, face="bold", colour = "black"))
allp
Which produces this nice plot
But as you can see a lot of the points and error bars are cramped together. Shouldn't my implementation of position dodge work?
If I understand correctly, position dodge uses the scale of the axes, so with a dodge width of 50 I should see some effect. I also tried putting the dodge argument directly into the geoms, but that had no effect either.
Any ideas?
If you leave out position = pd in both geom_errorbar() and geom_point() you get the same plot. The reason the data look 'cramped' is because of the spread of the x-values. As far as I know, dodging will only happen if two points 'overlap', which I interpret as having the same x-value, e.g. data on a categorical x-axis like in the case of a bar plot. Your x-axis is continuous so the points will not be dodged.
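For contrast, a minimal sketch of the situation where dodging does take effect: several groups sharing the same discrete x value. The small data frame d is hypothetical. The same position_dodge() object is passed to both layers so that each error bar moves together with its point.

```r
library(ggplot2)

# Hypothetical data: two groups measured at the same discrete x values
d <- data.frame(
  x     = rep(c("A", "B"), each = 2),
  group = rep(c("g1", "g2"), times = 2),
  y     = c(1, 2, 3, 4),
  se    = c(0.2, 0.3, 0.2, 0.3)
)

# One position object shared by both layers so points and bars move together
pd <- position_dodge(width = 0.5)

p <- ggplot(d, aes(x = x, y = y, colour = group)) +
  geom_errorbar(aes(ymin = y - se, ymax = y + se), width = 0.2, position = pd) +
  geom_point(size = 3, position = pd)

bars <- layer_data(p, 1)
pts  <- layer_data(p, 2)
print(pts[, c("x", "y", "group")])
```

Each pair of points at "A" and "B" is pulled apart by half the dodge width; with a continuous x where no two points share a value, there is nothing to dodge.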
To deal with the overplotting you could try logarithmic scales:
library(ggplot2)
tmp <- tempfile()
download.file("http://pastebin.com/raw.php?i=uPXN7nPt", tmp)
all <- read.csv(tmp)
all$size.new <- sqrt(all$size.new)
all$station <- as.factor(all$station)
all$group.new <- factor(all$group, levels = c("C. hyperboreus", "C. glacialis",
"Special Calanus", "M. longa",
"Pseudocalanus sp.", "Copepoda"))
# explicitly remove missing data
all <- all[complete.cases(all), ]
allp <- ggplot(data = all, aes(y = averagebiol, x = automatic, colour = group.new,
group = group.new, ymin = averagebiol - stdevbiol,
ymax = averagebiol + stdevbiol)) +
theme_bw() +
geom_abline(intercept = 0, slope = 1) +
geom_errorbar(colour = "grey", width = 0.1) +
geom_point(aes(size = size.new)) +
scale_size_area() + # Just so I could see all the points on my monitor :)
xlab("Automatic identification") +
ylab("Manual identification") +
ggtitle("Comparison of automatic vs manual identification")
allp + scale_x_log10() +
scale_y_log10() +
facet_grid(group.new ~ station, scales = "fixed")
