ggplot: error bars - r

I want to plot error bars for two sets of values, Y1 and Y2, against the same x values. I managed to plot them together after reshaping the data frame, and now I want to add error bars to each Y1 and Y2 point on the same graph. I understand geom_errorbar() is what I'm looking for, but my current approach is long-winded and I'm sure there is a shorter way: I calculate "se" for each set, pass aes(ymin=y1-se, ymax=y1+se), and then repeat the same for Y2. Because I want to add these error bars to several different plots, I'd rather do it in a shorter way.
Here is my data frame after reshaping:
M Req Rec load Un L1
1 30.11 9.000000 3.000000 30.02000 A
2 50.31 10.030000 6.045000 39.44000 A
3 60.01 11.290000 7.366667 54.93000 A
4 66.10 12.630000 8.827500 68.44500 A
5 80.18 13.106000 9.462000 71.07600 A
6 87.10 14.421667 15.961667 82.70500 A
7 90.08 15.880000 20.644286 94.20714 A
1 4.000 1.500000 1.000000 1 B
2 8.240 6.240000 4.760000 3.00000 B
3 10.28 12.230000 9.420000 4.05000 B
4 18.570 25.570000 17.930000 6.00000 B
5 22.250 35.250000 27.850000 7.00000 B
6 35.070 55.010000 36.810000 8.06000 B
7 48.480 0.420000 47.020000 9.06000 B
I have used the following command to graph it:
ggplot(df_reshaped,aes(x = M, y = Req, colour = L1, shape=L1)) +
geom_point(size = 5)+
geom_line() +
scale_x_discrete(name="M") +
scale_y_continuous(name="Y1 Y2")+
ggtitle("A vs B")
In this case I'm graphing Y1=Req1, Y2=Req2, with respect to x=M
Is there a short way to calculate the error bars?
Is there a quick way to calculate the "se"?

In general, there are two ways to prepare your data for ggplot:
You could aggregate the raw data and plot the results. If you go this way, you have to calculate the standard errors yourself, since they cannot be recovered from the aggregated data. These standard errors can then be plotted with geom_errorbar.
A second option is to use the raw data and let ggplot do the calculations for you. This can be done with stat_summary (mean_cl_normal requires the Hmisc package, and in current ggplot2 versions extra arguments such as mult are passed via fun.args). For example:
stat_summary(fun.data = "mean_cl_normal", fun.args = list(mult = 1), geom = "errorbar")
Evidently, you have chosen the first approach, so you just need to calculate the standard errors for the points of both variables.
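As a rough sketch of the first approach, the standard errors can be computed once per group and reused for every plot. The raw data frame `raw`, its replicate structure, and the helper `std_err` below are assumptions made up for illustration; the question does not show the unaggregated data.

```r
library(ggplot2)

# Hypothetical raw data: 5 replicate measurements of Req per M and group L1.
set.seed(1)
raw <- expand.grid(rep = 1:5, M = 1:7, L1 = c("A", "B"))
raw$Req <- 10 * raw$M + ifelse(raw$L1 == "A", 20, 0) + rnorm(nrow(raw), sd = 3)

# Standard error of the mean (hypothetical helper name).
std_err <- function(x) sd(x) / sqrt(length(x))

# One row per (M, L1) combination, holding the mean and its standard error.
means <- aggregate(Req ~ M + L1, data = raw, FUN = mean)
ses <- aggregate(Req ~ M + L1, data = raw, FUN = std_err)
names(ses)[names(ses) == "Req"] <- "se"
summ <- merge(means, ses, by = c("M", "L1"))

# A single geom_errorbar() layer covers both groups.
p <- ggplot(summ, aes(x = M, y = Req, colour = L1, shape = L1)) +
  geom_point(size = 3) +
  geom_line() +
  geom_errorbar(aes(ymin = Req - se, ymax = Req + se), width = 0.2)
print(p)
```

Because the grouping lives in the L1 column, there is no need to repeat the ymin/ymax calculation separately for Y1 and Y2.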

Related

density functions of multiple columns in a dataframe - ggplot

I need some help producing a graph similar to the one in the post "Density plot for numerous variables using ggplot in R".
I tried the code from that post, but the result does not look good.
My data looks like this:
head(df)
a b c d e f g
1 0.9999994 0.9999994 0.7924445 0.9998647 0.7300587 0.9249790 0.9816021
2 0.9999885 0.9999885 0.6782044 0.9983770 0.6119326 0.9434158 0.9583668
3 1.0000000 1.0000000 0.8709003 0.9999908 0.8181097 0.8939165 0.9942465
4 1.0000000 1.0000000 0.8587627 0.9999847 0.8035536 0.9034016 0.9998198
5 0.9999996 0.9999996 0.8059187 0.9999075 0.7480368 0.9043720 0.9290576
6 0.9999999 0.9999999 0.8532174 0.9999810 0.7971970 0.9059244 0.9983568
dat <- stack(df)
ggplot(dat, aes(x=values, fill=ind)) + geom_density(alpha=0.5)
The values range from 0.6 to 1.
I've also tried the approach with pivot_longer, but the result doesn't look great either.
Could anyone help with suggestions or alternatives?
Thanks
If you look at your y axis, you will notice it has very high values. The reason is that the density for column d is extremely high, since its values are all concentrated into a tiny spot. A grouped density plot will calculate the density for each group separately, and the smoothing kernel is scaled according to the range of the data. Since the density of column d has to fit in a range of about 0.001 of the x axis but have an area under its curve of 1, that curve is going to be a very tall sharp spike. Its density therefore "drowns out" the density of all the other groups. If you use coord_cartesian to set the y range, we can see all the other densities much more clearly. Of course, this cuts off the top of the d density since it is three orders of magnitude higher, but this seems like a reasonable compromise.
ggplot(dat, aes(x = values, fill=ind)) +
geom_density(alpha = 0.5, position = "identity") +
coord_cartesian(ylim = c(0, 30))
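To see why one column dominates the y axis, it can help to compare the peak of each column's kernel density estimate directly. The sketch below uses made-up stand-in data (`df_demo`, with one wide and one very narrow column), not the data from the question.

```r
set.seed(42)

# Hypothetical stand-in data: one spread-out column and one very narrow one.
df_demo <- data.frame(
  c = runif(200, min = 0.6, max = 0.9),    # spread over about 0.3
  d = runif(200, min = 0.998, max = 1.000) # spread over about 0.002
)

# Peak height of the kernel density estimate for each column.
peaks <- sapply(df_demo, function(x) max(density(x)$y))
print(peaks)

# Each curve still integrates to roughly 1 (simple Riemann sum on the grid).
areas <- sapply(df_demo, function(x) {
  dens <- density(x)
  sum(dens$y) * diff(dens$x[1:2])
})
print(areas)
```

Both curves cover about the same area, so the narrow column has to be far taller, which is what squashes the other densities on a shared y axis.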

Barplot with continuous x axis using base r graphics

I am looking to scale the x axis on my barplot to time, so as to accurately represent when measurements were taken.
I have these data frames:
> Botcv
Date Average SE
1 2014-09-01 4.0 1.711307
2 2014-10-02 5.5 1.500000
> Botc1
Date Average SE
1 2014-10-15 2.125 0.7180703
2 2014-11-12 1.000 0.4629100
3 2014-12-11 0.500 0.2672612
> Botc2
Date Average SE
1 2014-10-15 3.375 1.3354708
2 2014-11-12 1.750 0.4531635
3 2014-12-11 0.625 0.1829813
I use this code to produce a grouped barplot:
covaverage <- c(Botcv$Average,NA,NA,NA)
c1average <- c(NA,NA, Botc1$Average)
c2average <- c(NA,NA, Botc2$Average)
date <- c(Botcv$Date, Botc1$Date)
averagematrix <- matrix(c(covaverage,c1average, c2average), nrow=3, ncol=5, byrow=TRUE)
barplot(averagematrix,date, xlab="Date", ylab="Average", axis.lty=1, space=NULL,width=3,beside=T, ylim=c(0.00,6.00))
R plots the bars equal distances apart by default and I have been trying to find a workaround for this. I have seen several other solutions that utilise ggplot2 but I am producing plots for my masters thesis and would like to keep the appearance of my barplots in line with other graphs that I have created using base R graphics. I also want to add error bars to the plot. If anyone could provide a solution then I would be very grateful!! Thanks!
Perhaps you can use this as a start. It is probably easier to use boxplots, as they can be placed at a given x position with the at argument. Base barplot() cannot do this, but you can use rect() instead to replicate the barplot look. Error bars can be added using arrows() or segments().
bar_w = 1 # half-width of each bar, in days
offset = c(-1,1) # horizontal shift per type, in days, so the bars sit side by side
cols = grey.colors(2) # colors for different types
# combine into a single data frame
d = data.frame(rbind(Botc1, Botc2), 'type' = c(1,1,1,2,2,2))
# set up empty plot with x and y limits that leave room for the error bars
plot(as.Date(d$Date), d$Average, type='n', xlab='Date', ylab='Average', ylim=c(0, max(d$Average + d$SE)))
# draw data of data frame 1 and 2
for (i in unique(d$type)){
  dd = d[d$type==i, ]
  x = as.Date(dd$Date)
  y = dd$Average
  # rectangles, positioned at the actual dates
  rect(xleft=x-bar_w+offset[i], ybottom=0, xright=x+bar_w+offset[i], ytop=y, col=cols[i])
  # error bars: mean +/- 1 SE
  arrows(x0=x+offset[i], y0=y-dd$SE, x1=x+offset[i], y1=y+dd$SE, col=1, angle=90, code=3, length = 0.1)
}
If all you want is a ggplot2 theme that resembles base graphics, adding + theme_bw() will get you close:
data(mtcars)
library(ggplot2)
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
theme_bw()
The base graphics equivalent, for comparison:
boxplot(mpg~cyl,data=mtcars)
If, as you said, the only thing you want is a similar look, and you already have a working plot in ggplot2, theme_bw() should produce plots that look very close to base graphics output. If you feel so inclined, you can tweak details such as font sizes, the thickness of plot borders, or the display of outliers.

plotCI: how to overlay plots of two variables

I am trying to plot populations of predators and of prey over time, with confidence intervals. I can plot the two separately, but how do I plot them on the same graph?
#take mean, number, and create se of prey(d)
d.means=tapply(mydata$prey,mydata$week, mean)
d.n=tapply(mydata$prey,mydata$week, length)
d.se=tapply(mydata$prey,mydata$week, sd)/sqrt(d.n)
#plot with se using plotrix
plotCI(as.numeric(row.names(d.means)),d.means,d.se,ylim=c(0,400),pch=19,gap=0,xlab="Week",ylab="d, w population")
#take mean, number, and create se of predator(w)
w.means=tapply(mydata$pred,mydata$week, mean)
w.n=tapply(mydata$pred,mydata$week, length)
w.se=tapply(mydata$pred,mydata$week, sd)/sqrt(w.n)
#plot with se using plotrix
plotCI(as.numeric(row.names(w.means)),w.means,w.se,ylim=c(0,400),pch=19,gap=0,xlab="Week",ylab="d, w population")
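After the first plot, call the following before drawing the second plot:
par(new=TRUE)
Make sure that you set the same xlim and ylim in both calls so that they accommodate both plots. For the second plot you will also need the options axes=FALSE and ann=FALSE so that the axes and labels are not drawn twice.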
These graphical features are discussed in detail in the ebook "R Fundamentals & Graphics". You might want to use it as a desk reference.
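An alternative that avoids par(new=TRUE) altogether is to draw the first series with plot() and then add the second with points() and arrows() on the same axes. This is a base-graphics sketch rather than a plotrix one, and the `mydata` frame and the helper `summarise_by_week` are invented for illustration.

```r
set.seed(7)

# Hypothetical data: 6 observations per week for 8 weeks.
mydata <- data.frame(week = rep(1:8, each = 6))
mydata$prey <- rnorm(nrow(mydata), mean = 300 - 20 * mydata$week, sd = 25)
mydata$pred <- rnorm(nrow(mydata), mean = 50 + 15 * mydata$week, sd = 15)

# Mean and standard error per week for one column (hypothetical helper).
summarise_by_week <- function(values, week) {
  m <- tapply(values, week, mean)
  n <- tapply(values, week, length)
  s <- tapply(values, week, sd) / sqrt(n)
  data.frame(week = as.numeric(names(m)), mean = as.vector(m), se = as.vector(s))
}
d <- summarise_by_week(mydata$prey, mydata$week)
w <- summarise_by_week(mydata$pred, mydata$week)

# Shared y range so that both series fit on one set of axes.
ylim <- range(c(d$mean - d$se, d$mean + d$se, w$mean - w$se, w$mean + w$se))

# First series sets up the axes; the second is added on top.
plot(d$week, d$mean, pch = 19, ylim = ylim, xlab = "Week", ylab = "d, w population")
arrows(d$week, d$mean - d$se, d$week, d$mean + d$se, angle = 90, code = 3, length = 0.05)
points(w$week, w$mean, pch = 1)
arrows(w$week, w$mean - w$se, w$week, w$mean + w$se, angle = 90, code = 3, length = 0.05)
legend("top", legend = c("prey (d)", "predator (w)"), pch = c(19, 1))
```

Since everything is drawn in one coordinate system, the two series cannot end up on mismatched scales.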
#take mean, number, and create se of prey(d)
d.means=tapply(mydata$prey,mydata$week, mean)
d.n=tapply(mydata$prey,mydata$week, length)
d.se=tapply(mydata$prey,mydata$week, sd)/sqrt(d.n)
#take mean, number, and create se of predator(w)
w.means=tapply(mydata$pred,mydata$week, mean)
w.n=tapply(mydata$pred,mydata$week, length)
w.se=tapply(mydata$pred,mydata$week, sd)/sqrt(w.n)
Here you have created all the variables you need, but to plot them using ggplot they need to be in a tall dataset with a variable indicating whether each row is predator or prey. I also added a time variable; I think yours would be week.
x=data.frame(means=c(w.means,d.means),
n=c(w.n,d.n),
se=c(w.se,d.se),
role=c(rep("pred",length(w.n)),rep("prey",length(d.n))),
time=c(1:length(w.n),1:length(d.n))
)
I don't know exactly what your data look like so here is a fake one I cooked up just to illustrate the format.
means n se role time
1 0.9874234 10 0.16200575 pred 1
2 1.4120207 12 0.08895026 pred 2
3 2.7352516 8 0.07991036 pred 3
4 1.1301248 11 0.05481813 prey 1
5 2.4810040 13 0.28682585 prey 2
6 3.1546947 9 0.22126054 prey 3
Once the data are in this nice format using ggplot is really pretty easy.
ggplot(x, aes(x=time, y=means, colour=role)) +
geom_errorbar(aes(ymin=means-se, ymax=means+se), width=.1) +
geom_line()

How do you plot two vectors on x-axis and another on y-axis in ggplot2

I am trying to plot two vectors with different values, but equal length on the same graph as follows:
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(a,b,days)
a b days
1 23.33 33.33 1
2 24.33 34.33 2
3 25.33 35.33 3
4 26.33 36.33 4
5 27.33 37.33 5
etc..
I am trying to use ggplot2 to plot a and b on the x-axis and the days on the y-axis. However, I can't figure out how to do it. I am able to plot them individually and combine the graphs, but I want just one graph with both the a and b vectors (in different colors) on the x-axis and the number of days on the y-axis.
What I have so far:
X<-ggplot(df, aes(x=a,y=days)) + geom_line(color="red")
Y<-ggplot(df, aes(x=b,y=days)) + geom_line(color="blue")
Is there any way to define the x-axis for both a and b vectors? I have also tried using the melt long function, but got stuck afterwards.
Any help is much appreciated. Thank you
I think the best way to do it is to melt the data (as you have mentioned), especially if you are going to add more vectors. This is the code:
library(reshape2)
library(ggplot2)
a<-23:52
b<-33:62
days<-1:30
df<-data.frame(x=a,y=b,days)
df_molten=melt(df,id.vars="days")
ggplot(df_molten) + geom_line(aes(x=value,y=days,color=variable))
You can also change the colors manually via scale_color_manual.
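A minimal sketch of setting the colors manually might look like this. The long data frame is built by hand here, with the series named "a" and "b" (an assumption; the melted data would carry whatever column names the wide data frame had), and the legend title "Vector" is also just illustrative.

```r
library(ggplot2)

a <- 23:52
b <- 33:62
days <- 1:30

# Long format built by hand: one row per (day, series) pair.
df_long <- data.frame(
  days = rep(days, times = 2),
  variable = rep(c("a", "b"), each = length(days)),
  value = c(a, b)
)

# Named values map each level of `variable` to a specific color.
p <- ggplot(df_long, aes(x = value, y = days, colour = variable)) +
  geom_line() +
  scale_colour_manual(values = c(a = "red", b = "blue"), name = "Vector")
print(p)
```

Naming the elements of values ties each color to a level explicitly, so the mapping does not depend on factor order.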
A simpler solution uses only ggplot2, without reshaping. The following code will work in your case:
library(ggplot2)
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(a,b,days)
ggplot(data = df)+
geom_line(aes(x = a,y = days), color = "blue")+
geom_line(aes(x = b,y = days), color = "red")+
xlab("value")
I added the colors, you might want to use them to differentiate between your variables.

plotting aggregate data with ggplot

I have data like this:
subject<-1:208
ev<-runif(208, min=1, max=2)
seeds<-gl(6,40,labels=c('seed1', 'seed2','seed3','seed4','seed5','seed6'),length=208)
ngambles<-gl(2,1,length=208,labels=c('4','32'))
trial<-rep(1:20, each= 2, length.out=208)
data<-data.frame(subject,ev,seeds,ngambles,trial)
The data looks like this:
subject ev seeds ngambles trial
1 1.996717 seed1 4 1
2 1.280977 seed1 32 1
3 1.571648 seed1 4 2
4 1.153311 seed1 32 2
5 1.502559 seed1 4 3
6 1.644001 seed1 32 3
I plot a graph with trial on the x axis and ev (expected value) on the y axis for each combination of seeds and ngambles with this command:
qplot(trial,ev,data=data,
facets=ngambles~seeds,xlab="Trial", ylab="Expected Value", geom="line")+
ggtitle("Expected Value for Each Seed")
Now I want to draw a new graph that aggregates ev over trials 1-5, 6-10, 11-15, and 16-20. I also want to draw error bars.
I have no clue how to do this in R.
Maybe somebody can help me.
Thanks in advance
Assume that your data frame is called df. First, add a new column ag that shows which interval each original trial value belongs to, using the function cut().
df$ag<-cut(df$trial,c(1,6,11,16,21),right=FALSE)
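To check what cut() produces with these breaks, here is a small stand-alone sketch on the trial numbers 1 to 20 (a toy vector, not the full data frame):

```r
trial <- 1:20

# right = FALSE makes the intervals closed on the left: [1,6), [6,11), ...
ag <- cut(trial, breaks = c(1, 6, 11, 16, 21), right = FALSE)

print(levels(ag))
print(table(ag))

# The same grouping with readable labels assigned directly.
ag_labelled <- cut(trial, breaks = c(1, 6, 11, 16, 21), right = FALSE,
                   labels = c("1-5", "6-10", "11-15", "16-20"))
print(table(ag_labelled))
```

Each of the four intervals picks up exactly five trials, and supplying labels in cut() is an alternative to relabelling the axis later with scale_x_discrete().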
Now there are two possibilities. The first is to aggregate your data using the stat_*() functions of ggplot2. The stat_summary() function is already defined; you should also define a stat_sum_df() function (taken from the stat_summary() help file) to calculate more than one summary value.
stat_sum_df <- function(fun, geom="crossbar", ...) {
stat_summary(fun.data=fun, colour="red", geom=geom, width=0.2, ...)
}
With stat_sum_df() and the argument "mean_cl_normal" (which requires the Hmisc package), calculate confidence intervals for geom="errorbar", and with stat_summary() calculate the mean for geom="line". Use the new column ag as the x value. With scale_x_discrete() you can get the right labels for the x axis.
ggplot(df, aes(ag,ev,group=seeds))+stat_sum_df("mean_cl_normal",geom="errorbar")+
stat_summary(fun="mean",geom="line",color="red")+
facet_grid(ngambles~seeds)+
scale_x_discrete(labels=c("1-5","6-10","11-15","16-20"))
The second approach is to summarize the data before plotting, for example with the function ddply() from the plyr package. You still need the column ag created in the first example. Then use the new data frame for plotting.
library(plyr)
df.new<-ddply(df,.(ag,seeds,ngambles),summarise,ev.m=mean(ev),
ev.lim=qt(0.975,length(ev)-1)*sd(ev)/sqrt(length(ev)))
ggplot(df.new,aes(ag,group=seeds))+
geom_errorbar(aes(y=ev.m,ymin=ev.m-ev.lim,ymax=ev.m+ev.lim))+
geom_line(aes(y=ev.m))+
facet_grid(ngambles~seeds)+
scale_x_discrete(labels=c("1-5","6-10","11-15","16-20"))
