Error when adding errorbars to ggplot - r
Dear Stackoverflow users,
I would like to draw a grouped barplot with three independent variables and error bars. I based my graph on an example on Stack Overflow (stacked bars within grouped bars), using ggplot with geom_bar. When I add geom_errorbar following the examples on the help pages, I get the following error:
Error in if (empty(data)) { : missing value where TRUE/FALSE needed
This is the script I use:
treatment<-rep(c(rep(c(1),8),rep(c(2),8)),2)
origin<-rep(c("A","B"),16)
time<-c(rep(c(5),16),rep(c(10),16))
sulfide<-c(0,10,5,8,9,6,16,18,20,25,50,46,17,58,39,43,20,25,50,46,17,58,39,43,100,120,103,104,150,160,200,180)
Reed<-data.frame(treatment,origin,time,sulfide)
# specify factor types
Reed$treatment<-as.factor(Reed$treatment)
Reed$origin<-as.character(Reed$origin)
Reed$time<-as.factor(Reed$time)
library(ggplot2)
library(scales)
#draw plot
ggplot() +
  geom_bar(data=Reed, aes(y = sulfide, x = treatment, fill=origin), stat="identity", position="dodge") +
  theme_bw() + facet_grid( ~ time) + xlab("treatment") + ylab("Sulfide") + ggtitle("Time")
This is how I added error bars:
ErrorBars <- function(x, y, upper, lower=upper, length=0.03, ...) {
  if (length(x) != length(y) | length(y) != length(lower) | length(lower) != length(upper))
    stop("vectors must be same length")
  arrows(x, y+upper, x, y-lower, angle=90, code=3, length=length, ...)
} # function for errorbars
SE<- function(x) sqrt(var(x,na.rm=TRUE)/length(na.omit(x))) #function for SE
Reed$trt<- paste(Reed$treatment,Reed$origin,sep="")#combine treatment and origin to a column
mean_Reed<-data.frame(tapply(Reed$sulfide,list(Reed$trt,Reed$time),mean,na.rm=TRUE)) #mean
SE_Reed<-data.frame(tapply(Reed$sulfide,list(Reed$trt, Reed$time),SE)) # SE
limits <- aes(ymax = mean_Reed + SE_Reed, ymin=mean_Reed - SE_Reed)# Define the top and bottom of the errorbars
#plot with error bars:
ggplot() +
  geom_bar(data=Reed, aes(y = sulfide, x = treatment, fill=origin), stat="identity", position="dodge") +
  theme_bw() + facet_grid( ~ time) + xlab("treatment") + ylab("Sulfide") + ggtitle("Time") +
  geom_errorbar(limits, width=.2, position="dodge")
I really can't find what I'm doing wrong.
I hope you can help me:)
Leaving aside the issue of error bars for the moment, there's a much more serious problem with your plot. You have 2 values each of treatment, time, and origin, for a total of 8 combinations, but 32 values of sulfide - so there are 4 values of sulfide for each combination. When you plot this using, e.g.,
ggplot(data=Reed) +
geom_bar(aes(y = sulfide, x = treatment, fill=origin), stat="identity",position="dodge") +
facet_grid( ~ time)+xlab("treatment") +ylab("Sulfide")
you are plotting bars for all four sulfide values on top of each other all in the same color. This has the effect of displaying only the maximum value. It's a little hard to believe this is what you intended, and even if you did there's a better way to do that. For instance, if you want to plot the mean value of sulfide for each combination of factors, you can do it this way.
ggp <- ggplot(data=Reed, aes(y = sulfide, x = as.factor(treatment), group=origin)) +
geom_bar(aes(fill=origin), stat="summary", fun.y=mean, position="dodge") +
theme_bw() +
facet_grid( ~ time)+xlab("treatment") +ylab("Sulfide")+ggtitle("Time")
ggp
This uses stat="summary" to automatically summarize the result using the aggregating function mean (fun.y=mean).
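The point about four sulfide values per combination can be checked directly before plotting. The following is only a minimal base-R sketch that rebuilds the question's data; the names n_per_cell, cell_max and cell_mean are introduced here for illustration and are not part of the original code.

```r
# Rebuild the data frame from the question
treatment <- rep(c(rep(1, 8), rep(2, 8)), 2)
origin    <- rep(c("A", "B"), 16)
time      <- c(rep(5, 16), rep(10, 16))
sulfide   <- c(0, 10, 5, 8, 9, 6, 16, 18, 20, 25, 50, 46, 17, 58, 39, 43,
               20, 25, 50, 46, 17, 58, 39, 43, 100, 120, 103, 104, 150, 160, 200, 180)
Reed <- data.frame(treatment, origin, time, sulfide)

# How many observations fall in each treatment x origin x time cell?
n_per_cell <- aggregate(sulfide ~ treatment + origin + time, data = Reed, FUN = length)

# Largest value per cell (the tallest of the overplotted bars) versus the cell mean
cell_max  <- aggregate(sulfide ~ treatment + origin + time, data = Reed, FUN = max)
cell_mean <- aggregate(sulfide ~ treatment + origin + time, data = Reed, FUN = mean)

print(n_per_cell)
print(merge(cell_max, cell_mean, by = c("treatment", "origin", "time"),
            suffixes = c(".max", ".mean")))
```

If every cell holds more than one row, stat="identity" has several bars to draw per position, which is the situation described above.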
A similar approach can be used to very simply add the error bars:
se <- function(y) sd(y)/sqrt(length(y)) # to calculate standard error in the mean
ggp+stat_summary(geom="errorbar",position=position_dodge(width=0.85),
fun.data=function(y)c(ymin=mean(y)-se(y),ymax=mean(y)+se(y)), width=0.1)
Notice that there is no need to aggregate the data externally - ggplot does it for you.
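For reference, the standard error of the mean is conventionally the standard deviation divided by the square root of the sample size. The sketch below assumes that definition and compares it with the SE() helper from the question on one cell of the data (treatment 1, origin A, time 5); the helper se_limits is a hypothetical name used only here.

```r
# Standard error of the mean: sd / sqrt(n)
se <- function(y) sd(y) / sqrt(length(y))

# The SE() helper from the question, which also drops missing values
SE <- function(x) sqrt(var(x, na.rm = TRUE) / length(na.omit(x)))

# Sulfide values for treatment 1, origin A, time 5
y <- c(0, 5, 9, 16)

# Limits in the shape that stat_summary(fun.data = ...) expects
se_limits <- function(y) c(ymin = mean(y) - se(y), ymax = mean(y) + se(y))

print(se(y))
print(SE(y))
print(se_limits(y))
```

On data without missing values the two helpers should agree, so either can be passed to the plotting code.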
Finally, this approach lends itself to the use of many built-in functions for generating confidence limits with more statistical rigor.
ggp+stat_summary(fun.data=mean_cl_normal, conf.int=0.95,
geom="errorbar",position=position_dodge(width=0.85), width=0.1)
So here we use mean_cl_normal, which ggplot2 provides as a wrapper around Hmisc::smean.cl.normal (so the Hmisc package must be installed), to calculate 95% confidence limits on the mean assuming the data follow a normal distribution (and that, hence, the means will follow a t-distribution). We use the argument conf.int=... to specify the desired confidence level, but the default is 0.95 so it really wasn't necessary in this example.
There are several other functions of this type: see the documentation and links therein for an explanation.
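To make the t-based limits concrete, here is a rough base-R sketch of the same kind of calculation: mean plus or minus a t quantile times the standard error. The function name ci_normal is made up for this illustration; it is not the ggplot2 or Hmisc implementation, just the textbook formula.

```r
# t-based confidence limits on the mean (illustrative helper, not the Hmisc code)
ci_normal <- function(y, conf.int = 0.95) {
  n    <- length(y)
  m    <- mean(y)
  se   <- sd(y) / sqrt(n)
  mult <- qt((1 + conf.int) / 2, df = n - 1)  # two-sided t quantile
  c(y = m, ymin = m - mult * se, ymax = m + mult * se)
}

# Sulfide values for treatment 1, origin A, time 5
y <- c(0, 5, 9, 16)
print(ci_normal(y))

# Cross-check against base R's one-sample t interval
print(t.test(y)$conf.int)
```

A function returning y, ymin and ymax like this can also be handed to stat_summary via fun.data if you would rather not depend on Hmisc.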
If you want to build your error bars by making a summary dataset, you just need to get that dataset in the correct format. There are lots of options for this; I will use dplyr. Notice I keep all the grouping variables from the plot in this dataset in a "tidy" format, with each variable in a separate column.
library(dplyr)
meandat = Reed %>%
group_by(treatment, time, origin) %>%
summarise(mean = mean(sulfide, na.rm = TRUE), se = SE(sulfide))
Source: local data frame [8 x 5]
Groups: treatment, time [?]

  treatment   time origin   mean        se
     (fctr) (fctr)  (chr)  (dbl)     (dbl)
1         1      5      A   7.50  3.378856
2         1      5      B  10.50  2.629956
3         1     10      A  31.50  7.858117
4         1     10      B  43.00  6.819091
5         2      5      A  31.50  7.858117
6         2      5      B  43.00  6.819091
7         2     10      A 138.25 23.552689
8         2     10      B 141.00 17.540429
Now error bars can be added via geom_errorbar. You'll see I set the aesthetics globally within ggplot to save myself having to re-type some of these, but you can change this as you want. I use position_dodge to get the error bars placed correctly over each bar.
ggplot(data = Reed, aes(y = sulfide, x = treatment, fill=origin)) +
geom_bar(stat="identity", position="dodge") +
theme_bw() +
facet_grid( ~ time)+
xlab("treatment") +
ylab("Sulfide")+
ggtitle("Time")+
geom_errorbar(data = meandat, aes(ymin = mean - se, ymax = mean + se, y = mean),
position = position_dodge(width = .9))
You can actually do all of this via stat_summary, rather than calculating the summary statistics "by hand". An example is here. The code would look like so, and gives the same plot as above.
ggplot(data = Reed, aes(y = sulfide, x = treatment, fill=origin)) +
geom_bar(stat="identity",position="dodge") +
theme_bw() +
facet_grid( ~ time) +
xlab("treatment") +
ylab("Sulfide") +
ggtitle("Time") +
stat_summary(geom = "errorbar", fun.data = mean_cl_normal, mult = 1,
position = position_dodge(width = .9))
I've been using the development version of ggplot2, ggplot2_1.0.1.9003, and found that I needed to add stat_summary function arguments via fun.args. This would look like fun.args = list(mult = 1) to get error bars of 1 standard error.
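As a sketch of the fun.args form, assuming a ggplot2 version of 3.3.0 or later (where the fun argument replaced fun.y): the example below uses ggplot2's own mean_se, which takes a mult argument and does not need Hmisc. It also draws the bars at the group means rather than with stat="identity", which is a deliberate change from the code above.

```r
library(ggplot2)

# Rebuild the data frame from the question
treatment <- rep(c(rep(1, 8), rep(2, 8)), 2)
origin    <- rep(c("A", "B"), 16)
time      <- c(rep(5, 16), rep(10, 16))
sulfide   <- c(0, 10, 5, 8, 9, 6, 16, 18, 20, 25, 50, 46, 17, 58, 39, 43,
               20, 25, 50, 46, 17, 58, 39, 43, 100, 120, 103, 104, 150, 160, 200, 180)
Reed <- data.frame(treatment = factor(treatment), origin,
                   time = factor(time), sulfide)

dodge <- position_dodge(width = 0.9)

p <- ggplot(Reed, aes(x = treatment, y = sulfide, fill = origin)) +
  # bars at the mean of each treatment x origin x time cell
  stat_summary(geom = "bar", fun = mean, position = dodge) +
  # error bars of +/- 1 standard error; extra arguments go through fun.args
  stat_summary(geom = "errorbar", fun.data = mean_se, fun.args = list(mult = 1),
               position = dodge, width = 0.2) +
  facet_grid(~ time) +
  theme_bw() +
  xlab("treatment") + ylab("Sulfide") + ggtitle("Time")

print(p)
```

Changing mult to 2 would give bars of roughly two standard errors; the rest of the call stays the same.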
Related
Additional x axis on ggplot
I'm aware there are similar posts but I could not get those answers to work in my case, e.g. here and here. Example:
diamonds %>%
  ggplot(aes(scale(price) %>% as.vector)) +
  geom_density() +
  xlim(-3, 3) +
  facet_wrap(vars(cut))
This returns a plot. Since I used scale, those numbers are the z-scores, or standard deviations away from the mean, of each break. I would like to add, as a row underneath, the equivalent non-scaled raw number that corresponds to each. Tried:
diamonds %>%
  ggplot(aes(scale(price) %>% as.vector)) +
  geom_density() +
  xlim(-3, 3) +
  facet_wrap(vars(cut)) +
  geom_text(aes(label = price))
Gives:
Error: geom_text requires the following missing aesthetics: y
My primary question is how can I add the raw values underneath -3:3 of each break? I don't want to change those breaks, I still want 6 breaks between -3:3. Secondary question, how can I get -3 and 3 to actually show up in the chart? They have been trimmed.
[edit] I've been trying to make it work with geom_text but keep hitting errors:
diamonds %>%
  ggplot(aes(x = scale(price) %>% as.vector)) +
  geom_density() +
  xlim(-3, 3) +
  facet_wrap(vars(cut)) +
  geom_text(label = price)
Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomText, : object 'price' not found
I then tried changing my call to geom_text():
geom_text(data = diamonds, aes(price), label = price)
This results in the same error message.
You can make a custom labeling function for your axis. This takes each label on the axis and performs a custom transform for you. In your case you could paste the z-score, a line break, and the z-score times the standard deviation plus the mean. Because of the distribution of prices in the diamonds data set, this means that z-scores below about -1 represent negative prices. This may not be a problem in your own data. For clarity I have drawn in a vertical line representing $0.
labeller <- function(x) {
  paste0(x, "\n", scales::dollar(sd(diamonds$price) * x + mean(diamonds$price)))
}
diamonds %>%
  ggplot(aes(scale(price) %>% as.vector)) +
  geom_density() +
  geom_vline(aes(xintercept = -0.98580251364833), linetype = 2) +
  facet_wrap(vars(cut)) +
  scale_x_continuous(label = labeller, limits = c(-3, 3)) +
  xlab("price")
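The back-transformation inside such a labeller is just raw = mean + sd * z. Here is a small base-R sketch of that idea on a made-up price vector; make_z_labeller and prices are hypothetical names, and plain rounding stands in for scales::dollar.

```r
# Build a labelling function for a given set of raw values (illustrative helper)
make_z_labeller <- function(values) {
  m <- mean(values)
  s <- sd(values)
  # Each axis label: the z-score, a line break, then the raw value it maps to
  function(z) paste0(z, "\n", round(m + s * z))
}

prices <- c(100, 200, 300, 400, 500)  # made-up data, not the diamonds prices
lab <- make_z_labeller(prices)

cat(lab(c(-1, 0, 1)), sep = "\n---\n")

# The z-score at which the raw value would be zero
z_zero <- -mean(prices) / sd(prices)
print(z_zero)
```

The same closure could be passed to scale_x_continuous in place of the hard-coded labeller, and z_zero plays the role of the hard-coded xintercept.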
We can use the sec_axis functionality in scale_x_continuous. To use this functionality we need to manually scale your data. This will add a secondary axis at the top of the plot, not underneath. So it's not quite exactly what you're looking for.
library(tidyverse)
# manually scale the data
mean_price <- mean(diamonds$price)
sd_price <- sd(diamonds$price)
diamonds$price_scaled <- (diamonds$price - mean_price) / sd_price
# make the plot
ggplot(diamonds, aes(price_scaled)) +
  geom_density() +
  facet_wrap(~cut) +
  scale_x_continuous(sec.axis = sec_axis(~ mean_price + (sd_price * .)),
                     limits = c(-3, 4), breaks = -3:3)
You could cheat a bit by passing some dummy data to geom_text:
geom_text(data = tibble(label = round(((-3:3) * sd_price) + mean_price), y = -0.25, x = -3:3),
          aes(x, y, label = label))
Plot using mean and standard error values using ggplot2
Here's my data:
    year  means   stder
1 A_1996 4.1291 0.19625
2 B_1997 3.4490 0.18598
3 C_1998 4.1166 0.15977
4 D_1999 3.6500 0.15093
5 E_2000 3.9528 0.14950
6 F_2001 2.7318 0.13212
This is all the data I have. I'd like to plot these using the ggplot2 package, if possible. The X axis will be year, and the Y axis will be means. Each year will have one point, its corresponding mean value, with the respective standard error values as the "whiskers" around that point. How would I do this using the ggplot() function? I think I'm mainly confused about how to put the standard error data into the ymin and ymax inputs. I started looking here, but the beginning data is different, so I'm a little confused. Plotting means and error bars (ggplot2)
Simple plot using general ggplot2 commands (note that the column in the data is called means):
library(ggplot2)
df$year <- as.numeric(gsub(".*_", "", df$year))
ggplot(df, aes(year, means)) +
  geom_point() +
  geom_errorbar(aes(ymin = means - stder, ymax = means + stder))
Same plot with fancier visuals:
ggplot(df, aes(year, means)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = means - stder, ymax = means + stder), width = 0.5, size = 0.5) +
  theme_bw() +
  labs(x = "Year", y = "Mean", title = "Change in mean over the period")
dodge columns in ggplot2
I am trying to create a picture that summarises my data. The data is about prevalence of drug use obtained from different practices from different countries. Each practice has contributed a different amount of data and I want to show all of this in my picture. Here is a subset of the data to work on:
gr<-data.frame(matrix(0,36))
gr$drug<-c("a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b")
gr$practice<-c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r")
gr$country<-c("c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3","c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3")
gr$prevalence<-c(9.14,5.53,16.74,1.93,8.51,14.96,18.90,11.18,15.00,20.10,24.56,22.29,19.41,20.25,25.01,25.87,29.33,20.76,18.94,24.60,26.51,13.37,23.84,21.82,23.69,20.56,30.53,16.66,28.71,23.83,21.16,24.66,26.42,27.38,32.46,25.34)
gr$prop<-c(0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406,0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406)
gr$low.CI<-c(8.27,4.80,12.35,1.83,7.22,14.53,18.25,10.56,14.28,18.76,24.25,21.72,18.62,19.83,24.36,25.22,28.80,20.20,17.73,23.15,21.06,13.12,21.79,21.32,22.99,19.76,29.60,15.41,28.39,23.25,20.34,24.20,25.76,26.72,31.92,24.73)
gr$high.CI<-c(10.10,6.37,22.31,2.04,10.00,15.40,19.56,11.83,15.74,21.52,24.87,22.86,20.23,20.68,25.67,26.53,29.86,21.34,20.21,26.10,32.79,13.63,26.02,22.33,24.41,21.39,31.48,17.98,29.04,24.43,22.01,25.12,27.09,28.05,33.01,25.95)
The code I wrote is this:
p<-ggplot(data=gr, aes(x=factor(drug), y=as.numeric(gr$prevalence), ymax=max(high.CI),position="dodge",fill=practice,width=prop))
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
p + theme_bw()+
  geom_bar(stat="identity",position = position_dodge(0.9)) +
  labs(x="Drug",y="Prevalence") +
  geom_errorbar(ymax=gr$high.CI,ymin=gr$low.CI,position=position_dodge(0.9),width=0.25,size=0.25,colour="black",aes(x=factor(drug), y=as.numeric(gr$prevalence), fill=practice)) +
  ggtitle("Drug usage by country and practice") +
  scale_fill_manual(values = colour)+
  guides(fill=F)
In the figure I obtain, the bars are all on top of each other, while I want them dodged. I also obtain the following warning:
ymax not defined: adjusting position using y instead
Warning message: position_dodge requires non-overlapping x intervals
Ideally I would get the bars next to one another, each with its error bar in the middle of the bar, all organised by country. Also, should I be concerned about the warning (which I clearly do not fully understand)? I hope this makes sense. I hope I am close enough, but I don't seem to be getting anywhere; some help would be greatly appreciated. Thank you
ggplot's geom_bar() accepts the width parameter, but doesn't line them up neatly against one another in dodged position by default. The following workaround references the solution here:
library(dplyr)
# calculate x-axis position for bars of varying width
gr <- gr %>%
  group_by(drug) %>%
  arrange(practice) %>%
  mutate(pos = 0.5 * (cumsum(prop) + cumsum(c(0, prop[-length(prop)])))) %>%
  ungroup()
x.labels <- gr$practice[gr$drug == "a"]
x.pos <- gr$pos[gr$drug == "a"]
ggplot(gr, aes(x = pos, y = prevalence, fill = country, width = prop,
               ymin = low.CI, ymax = high.CI)) +
  geom_col(col = "black") +
  geom_errorbar(size = 0.25, colour = "black") +
  facet_wrap(~drug) +
  scale_fill_manual(values = c("c1" = "gray79", "c2" = "gray60", "c3" = "gray39"), guide = F) +
  scale_x_continuous(name = "Drug", labels = x.labels, breaks = x.pos) +
  labs(title = "Drug usage by country and practice", y = "Prevalence") +
  theme_classic()
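The pos calculation places each bar's centre halfway between the running total of widths before it and the running total including it. A minimal base-R sketch of just that step, with a made-up prop vector and a hypothetical helper name bar_centres:

```r
# Centre of each bar when bars of width `prop` are laid side by side from 0
bar_centres <- function(prop) {
  right <- cumsum(prop)                  # right edge of each bar
  left  <- c(0, right[-length(right)])   # left edge = previous bar's right edge
  0.5 * (left + right)
}

prop <- c(0.2, 0.3, 0.5)  # made-up widths that sum to 1
print(bar_centres(prop))

# Same values as the expression used in the mutate() call
print(0.5 * (cumsum(prop) + cumsum(c(0, prop[-length(prop)]))))
```

Because the widths within each drug sum to 1 in the question's data, the bars would fill the unit interval with no gaps or overlaps.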
There is a lot of information you are trying to convey here. To contrast drug A and drug B across countries using the barplots while accounting for proportions, you might use the facet_grid function. Try this:
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
gr$drug <- paste("Drug", gr$drug)
p<-ggplot(data=gr, aes(x=factor(practice), y=as.numeric(prevalence), ymax=high.CI,ymin = low.CI, position="dodge",fill=practice, width=prop))
p + theme_bw()+
  facet_grid(drug~country, scales="free") +
  geom_bar(stat="identity") +
  labs(x="Practice",y="Prevalence") +
  geom_errorbar(position=position_dodge(0.9), width=0.25,size=0.25,colour="black") +
  ggtitle("Drug usage by country and practice") +
  scale_fill_manual(values = colour)+
  guides(fill=F)
The width is too small in the C1 country and, as you indicated, the one clinic is quite influential. Also, you can specify your aesthetics once in ggplot(aes(...)) without having to reset them in each layer, and there is no need to include the data frame's name inside aes() within the ggplot call.
How to plot multiple group means and the confidence intervals in ggplot2 (R)?
I have data that looks like this:
A B C
8 5 2
9 3 1
1 2 3
3 1 2
4 3 1
I need to plot the means of each of these along with the confidence intervals using ggplot2. I also want to derive the confidence intervals from the data itself (e.g. using stat_summary(fun.data = mean_cl)); however, I am not sure how I can plot the means for the data from this format. I tried the following code, but it does not run. I am not sure what needs to go into the y in line 2.
pd <- position_dodge(0.78)
ggplot(dat, y = c(dat$A,dat$B,dat$C) + ylim(0,10) + theme_bw()) +
  stat_summary(geom="bar", fun.y=mean, position = "dodge") +
  stat_summary(geom="errorbar", fun.data=mean_cl_normal, position = pd)
I get the following error:
Warning messages:
1: Computation failed in `stat_summary()`: object 'x' not found
2: Computation failed in `stat_summary()`: object 'x' not found
Your data isn't in long format, meaning that it should look like this (note each = 5, so that the first five Y values belong to A, the next five to B, and the last five to C):
thing <- data.frame(Group = factor(rep(c("A","B","C"), each = 5)),
                    Y = c(8,9,1,3,4,
                          5,3,2,1,3,
                          2,1,3,2,1))
You can use a function like melt() from the reshape2 package to help get the data into this format. Once you have that, you also have to calculate the means and SEs for your data (by hand prior to ggplot or by the correct expressions within stat_summary in ggplot). You may have copied/pasted from an example because the functions that you're using (e.g. mean_cl_normal) are possibly undefined. Let's do it by hand then.
library(plyr)
cdata <- ddply(thing, "Group", summarise,
               N = length(Y),
               mean = mean(Y),
               sd = sd(Y),
               se = sd / sqrt(N))
cdata
#  Group N mean       sd       se
#1     A 5  5.0 3.391165 1.516575
#2     B 5  2.8 1.483240 0.663325
#3     C 5  1.8 0.836660 0.374166
Now you can use ggplot.
pd <- position_dodge(0.78)
ggplot(cdata, aes(x=Group, y = mean, group = Group)) +
  # draws the means
  geom_point(position=pd) +
  # draws the CI error bars
  geom_errorbar(data=cdata, aes(ymin=mean-2*se, ymax=mean+2*se, color=Group), width=.1, position=pd)
This gives the attached plot.
Like David said, you need long format first, but you should be able to use fun.data = "mean_cl_normal" or plug in various others just fine like this:
library(tidyr); library(ggplot2)
dat <- gather(dat) # gather to long form
ggplot(data = dat, aes(x = key, y = value)) +
  geom_point(size = 4, alpha = .5) + # always plot the raw data
  stat_summary(fun.data = "mean_cl_normal", geom = "crossbar") +
  labs(title = "95% Mean Confidence Intervals")
If you want to build the same intervals manually, all you need are lm and confint to get the information you are after:
mod <- lm(value ~ 0 + key, data = dat)
ci <- confint(mod)
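If you would rather not load tidyr or reshape2 just for the reshaping step, base R's stack() does a similar wide-to-long conversion. This is only a sketch: the column names key and value are chosen here to match the gather() output above, and group_means is an illustrative name.

```r
# The wide data from the question: one column per group
dat <- data.frame(A = c(8, 9, 1, 3, 4),
                  B = c(5, 3, 2, 1, 3),
                  C = c(2, 1, 3, 2, 1))

# stack() returns two columns: `values` and `ind` (the source column as a factor)
long <- stack(dat)
names(long) <- c("value", "key")

# One row per observation, so per-group summaries are straightforward
group_means <- tapply(long$value, long$key, mean)

print(head(long))
print(group_means)
```

The resulting long data frame can be passed to the ggplot call with aes(x = key, y = value) in the same way.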
ggplot2 shading envelope of time series
I am plotting the results of 50 - 100 experiments. Each experiment results in a time series. I can plot a spaghetti plot of all time series, but what I'd like to have is a sort of density map for the time series plume (something similar to the gray shading in the lower panel in this figure: http://www.ipcc.ch/graphics/ar4-wg1/jpg/fig-6-14.jpg). I can 'sort of' do this with 2d binning or binhex but the result could be prettier (see example below). Here is code that reproduces a plume plot for mock data (uses ggplot2 and reshape2).
library(ggplot2)
library(reshape2)
# mock data: random walk plus a sinus curve.
# two envelopes for added contrast.
tt=10*sin(c(1:100)/(3*pi))
rr=apply(matrix(rnorm(5000),100,50),2,cumsum) +tt
rr2=apply(matrix(rnorm(5000),100,50),2,cumsum)/1.5 +tt
# stuff data into a dataframe and melt it.
df=data.frame(c(1:100),cbind(rr,rr2) )
names(df)=c("step",paste("ser",c(1:100),sep=""))
dfm=melt(df,id.vars = 1)
# ensemble average
ensemble_av=data.frame(step=df[,1],ensav=apply(df[,-1],1,mean))
ensemble_av$variable=as.factor("Mean")
ggplot(dfm,aes(step,value,group=variable))+
  stat_binhex(alpha=0.2) +
  geom_line(alpha=0.2) +
  geom_line(data=ensemble_av,aes(step,ensav,size=2))+
  theme(legend.position="none")
Does anyone know of a nice way to get a shaded envelope with gradients? I have also tried geom_ribbon but that did not give any indication of density changes along the plume. binhex does that, but not with aesthetically pleasing results.
Compute quantiles:
qs = data.frame(
  do.call(
    rbind,
    tapply(dfm$value, dfm$step, function(i){quantile(i)})),
  t=1:100)
head(qs)
         X0.      X25.      X50.     X75.     X100. t
1 -0.8514179 0.4197579 0.7681517 1.396382  2.883903 1
2 -0.6506662 1.2019163 1.6889073 2.480807  5.614209 2
3 -0.3182652 2.0480082 2.6206045 4.205954  6.485394 3
4 -0.1357976 2.8956990 4.2082762 5.138747  8.860838 4
5  0.8988975 3.5289219 5.0621513 6.075937 10.253379 5
6  2.0027973 4.5398120 5.9713921 7.015491 11.494183 6
Plot ribbons:
ggplot() +
  geom_ribbon(data=qs, aes(x=t, ymin=X0., ymax=X100.),fill="gray30", alpha=0.2) +
  geom_ribbon(data=qs, aes(x=t, ymin=X25., ymax=X75.),fill="gray30", alpha=0.2)
This is for two quantile intervals, (0-100) and (25-75). You'll need more args to quantile and more ribbon layers for more quantiles, and need to adjust the colours too.
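As a sketch of the "more args to quantile" step, the per-step quantiles for any set of probabilities can be computed in base R as below. The data here are simulated stand-ins for dfm$value and dfm$step, and the probs vector is an arbitrary choice for illustration.

```r
set.seed(1)

# Simulated stand-in for the melted data: 30 series observed at 20 steps
steps  <- rep(1:20, times = 30)
values <- rnorm(length(steps))

# Quantile levels to compute at every step
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)

# One row per step, one column per quantile level
q <- t(sapply(split(values, steps), quantile, probs = probs))

# data.frame() converts names such as "5%" to "X5.", as in the output above
qs <- data.frame(step = as.integer(rownames(q)), q)

print(head(qs))
```

Each symmetric pair of columns (for example X5. and X95., or X25. and X75.) would then feed one geom_ribbon layer.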
Based on the idea of Spacedman, I found a way to add more intervals in an automatic way: I first compute the quantiles for each step, group them by pairs of symmetric values and then use geom_ribbon in the right order...
library(tidyr)
library(dplyr)
condquant <- dfm %>%
  group_by(step) %>%
  do(quant = quantile(.$value, probs = seq(0,1,.05)), probs = seq(0,1,.05)) %>%
  unnest() %>%
  mutate(delta = 2*round(abs(.5-probs)*100)) %>%
  group_by(step, delta) %>%
  summarize(quantmin = min(quant), quantmax= max(quant))
ggplot() +
  geom_ribbon(data = condquant,
              aes(x = step, ymin = quantmin, ymax = quantmax,
                  group = reorder(delta, -delta), fill = as.numeric(delta)),
              alpha = .5) +
  scale_fill_gradient(low = "grey10", high = "grey95") +
  geom_line(data = dfm, aes(x = step, y = value, group=variable), alpha=0.2) +
  geom_line(data=ensemble_av,aes(step,ensav),size=2)+
  theme(legend.position="none")
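The pairing of symmetric quantiles relies on delta = 2 * |0.5 - p| * 100, which gives the same value to p and 1 - p. A small base-R sketch of that step on one simulated sample (x, band_min and band_max are illustrative names, not from the answer):

```r
set.seed(42)
x <- rnorm(200)  # simulated values for a single step

probs <- seq(0, 1, 0.05)
quant <- quantile(x, probs = probs)

# Width of the central interval, in percent; p and 1 - p share one delta
delta <- 2 * round(abs(0.5 - probs) * 100)

# Lower and upper edge of each band: min and max of the paired quantiles
band_min <- tapply(quant, delta, min)
band_max <- tapply(quant, delta, max)

print(data.frame(delta = as.numeric(names(band_min)),
                 quantmin = as.numeric(band_min),
                 quantmax = as.numeric(band_max)))
```

The band with delta 100 spans the full range of the sample and the band with delta 0 collapses to the median, which is why the ribbons need to be drawn from widest to narrowest.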
Thanks Erwan and Spacedman. Avoiding 'tidyr' ('dplyr' and 'magrittr'), my version of Erwan's answer becomes:
library(reshape2) # for melt
library(plyr)     # for ddply
probs=c(0:10)/10 # use fewer quantiles than Erwan
arr=t(apply(df[,-1],1,quantile,prob=probs))
dfq=data.frame(step=df[,1],arr)
names(dfq)=c("step",colnames(arr))
dfqm=melt(dfq,id.vars=c(1))
# add inter-quantile (per) range as delta
dfqm$delta=dfqm$variable
levels(dfqm$delta)=abs(probs-rev(probs))*100
dfplot=ddply(dfqm,.(step,delta),summarize,
             quantmin=min(value),
             quantmax=max(value) )
ggplot() +
  geom_ribbon(data = dfplot,
              aes(x = step, ymin = quantmin, ymax =quantmax,group=rev(delta),
                  fill = as.numeric(delta)),
              alpha = .5) +
  scale_fill_gradient(low = "grey25", high = "grey75") +
  geom_line(data=ensemble_av,aes(step,ensav),size=2) +
  theme(legend.position="none")