I am looking to scale the x axis on my barplot to time, so as to accurately represent when measurements were taken.
I have these data frames:
> Botcv
Date Average SE
1 2014-09-01 4.0 1.711307
2 2014-10-02 5.5 1.500000
> Botc1
Date Average SE
1 2014-10-15 2.125 0.7180703
2 2014-11-12 1.000 0.4629100
3 2014-12-11 0.500 0.2672612
> Botc2
Date Average SE
1 2014-10-15 3.375 1.3354708
2 2014-11-12 1.750 0.4531635
3 2014-12-11 0.625 0.1829813
I use this code to produce a grouped barplot:
covaverage <- c(Botcv$Average,NA,NA,NA)
c1average <- c(NA,NA, Botc1$Average)
c2average <- c(NA,NA, Botc2$Average)
date <- c(Botcv$Date, Botc1$Date)
averagematrix <- matrix(c(covaverage,c1average, c2average), nrow=3, ncol=5, byrow=TRUE)
barplot(averagematrix,date, xlab="Date", ylab="Average", axis.lty=1, space=NULL,width=3,beside=T, ylim=c(0.00,6.00))
R plots the bars equal distances apart by default and I have been trying to find a workaround for this. I have seen several other solutions that utilise ggplot2 but I am producing plots for my masters thesis and would like to keep the appearance of my barplots in line with other graphs that I have created using base R graphics. I also want to add error bars to the plot. If anyone could provide a solution then I would be very grateful!! Thanks!
Perhaps you can use this as a start. It is probably easier to use boxplots, as they can be put at a given x position by using the at argument. For base barplots this cannot be done, but you can use rectangle instead to replicate the barplot look. Error bars can be added using arrows or segments.
bar_w = 1 # width of bars
offset = c(-1,1) # offset to avoid overlapping
cols = grey.colors(2) # colors for different types
# combine into a single data frame
d = data.frame(rbind(Botc1, Botc2), 'type' = c(1,1,1,2,2,2))
# set up empty plot with sensible x and y lims
plot(as.Date(d$Date), d$Average, type='n', ylim=c(0,4))
# draw data of data frame 1 and 2
for (i in unique(d$type)){
dd = d[d$type==i, ]
x = as.Date(dd$Date)
y = dd$Average
# rectangles
rect(xleft=x-bar_w+offset[i], ybottom=0, xright=x+bar_w+offset[i], ytop=y, col=cols[i])
# errors bars
arrows(x0=x+offset[i], y0=y-0.5*dd$SE, x1=x+offset[i], y1=y+0.5*dd$SE, col=1, angle=90, code=3, length = 0.1)
}
If what you want to get is simply the theme that will match the base theme the + theme_bw() in ggplot2 will achieve this:
data(mtcars)
require(ggplot2)
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
theme_bw()
Result
Alternative
boxplot(mpg~cyl,data=mtcars)
If, as you said, the only thing you want to achieve is similar look, and you have working plot in the ggplot2 using the theme_bw() should produce plots that are indistinguishable from what would be derived via the standard plotting mechanism. If you feel so inclined you may tweak some minutiae details like font sizes, thickness of graph borders or visualisation of outliers.
Related
I need some help to produce a graph similar to the one posted here Density plot for numerous variables using ggplot in R
I tried the code mentioned in the post however the result is not good looking
My database looks like this:
head(df)
a b c d e f g
1 0.9999994 0.9999994 0.7924445 0.9998647 0.7300587 0.9249790 0.9816021
2 0.9999885 0.9999885 0.6782044 0.9983770 0.6119326 0.9434158 0.9583668
3 1.0000000 1.0000000 0.8709003 0.9999908 0.8181097 0.8939165 0.9942465
4 1.0000000 1.0000000 0.8587627 0.9999847 0.8035536 0.9034016 0.9998198
5 0.9999996 0.9999996 0.8059187 0.9999075 0.7480368 0.9043720 0.9290576
6 0.9999999 0.9999999 0.8532174 0.9999810 0.7971970 0.9059244 0.9983568
dat <- stack(df)
ggplot(dat, aes(x=values, fill=ind)) + geom_density(alpha=0.5)
The values range from 0.6 to 1
I've also tried the approach with pivot_longer but it doesn't have a great look as well ..
could anyone help?provide me with suggestions or alternatives?
Thanks
If you look at your y axis, you will notice it has very high values. The reason is that the density for column d is extremely high, since its values are all concentrated into a tiny spot. A grouped density plot will calculate the density for each group separately, and the smoothing kernel is scaled according to the range of the data. Since the density of column d has to fit in a range of about 0.001 of the x axis but have an area under its curve of 1, that curve is going to be a very tall sharp spike. Its density therefore "drowns out" the density of all the other groups. If you use coord_cartesian to set the y range, we can see all the other densities much more clearly. Of course, this cuts off the top of the d density since it is three orders of magnitude higher, but this seems like a reasonable compromise.
ggplot(dat, aes(x = values, fill=ind)) +
geom_density(alpha = 0.5, position = "identity") +
coord_cartesian(ylim = c(0, 30))
I have a data frame (100 x 4). The first column is a set of "bins" 0-100, the remaining columns are the counts for each variable of events within each bin (0 to the maximum number of events).
What I'm trying to do is to plot each of the three columns of data (2:4), alongside each other. Because the counts in each of the bins for each of the data sets is close to identical, the data are overlapped in the histogram/barplots I've created, despite my use of beside=true, and position = dodge.
I've set the first column as both numeric and character, but the results are identical- the bars are overlayed on top of each other. (semi-transparent density plots don't work because I want counts not the distribution densities).
The attached code, based on both R and other documentation produced the attached chart.
barplot(BinCntDF$preT,main=NewMain_Trigger, plot=TRUE,
xlab="sample frequency interval counts (0-100 msec bins)",
names.arg=BinCntDF$dT, las=0,
ylab="bin counts", axes=TRUE, xlim=c(0,100),
ylim=c(0,1000), col="red")
geom_bar(position="dodge")
barplot(BinCntDF$postT, beside=TRUE, add=TRUE)
geom_bar()
The goal is to be able to compare the two (or more) data sets side by side on the same axes, without either overlapping the other(s).
I think you have confused barplot with ggplot2. ggplot2 is a library where the function geom_bar comes from and isn't compatible with barplot which comes with Base R.
Simply compare ?barplot and ?geom_bar, and you will see that geom_bar is from the ggplot2 library. To achieve what you're after I have used the ggplot2 library and reshape2.
Step 1
Based on your description, I have assumed that your data looks roughly like this:
df <- data.frame(x = 1:10,
c1 = sample(0:100, replace=TRUE, size=10),
c2 = sample(0:50, replace=TRUE, size=10),
c3 = sample(0:70, replace=TRUE, size=10))
To plot it using ggplot2 you first have to transform the data to a long format instead of a wide format. You can do this using melt function from reshape2.
library(reshape2)
a <- melt(df, id=c("x"))
The output would look something like this
> head(a)
x variable value
1 1 c1 62
2 2 c1 47
3 3 c1 20
4 4 c1 64
5 5 c1 4
6 6 c1 52
Step 2
There are plenty of tutorials online to what ggplot2 does and the arguments. I would recommend you Google, or search through the many threads in SO to understand.
ggplot(a, aes(x=x, y=value, group=variable, fill=variable)) +
geom_bar(stat='identity', position='dodge')
Which gives you the output:
In a nutshell:
group groups the variables of interest
stat=identity ensures that no additional aggregations are made on your data
With that many bins (100) and groups (3) the plot will look messy, but try this:
set.seed(123)
myDF <- data.frame(bins=1:100, x=sample(1:100, replace=T), y=sample(1:100, replace=T), z=sample(1:100, replace=T))
myDF.m <- melt(myDF, id.vars='bins')
ggplot(myDF.m, aes(x=bins, y=value, fill=variable)) + geom_bar(stat='identity', position='dodge')
You could also try plotting w/ facets:
ggplot(myDF.m, aes(x=bins, y=value, fill=variable)) + geom_bar(stat='identity') + facet_wrap(~ variable)
I am trying to plot populations of predators and of prey over time, with confidence intervals. I can plot these two separately, how to plot on same graph?
#take mean, number, and create se of prey(d)
d.means=tapply(mydata$prey,mydata$week, mean)
d.n=tapply(mydata$prey,mydata$week, length)
d.se=tapply(mydata$prey,mydata$week, sd)/sqrt(d.n)
#plot with se using plotrix
plotCI(as.numeric(row.names(d.means)),d.means,d.se,ylim=c(0,400),pch=19,gap=0,xlab="Week",ylab="d, w population")
#take mean, number, and create se of predator(w)
w.means=tapply(mydata$pred,mydata$week, mean)
w.n=tapply(mydata$pred,mydata$week, length)
w.se=tapply(mydata$pred,mydata$week, sd)/sqrt(w.n)
#plot with se using plotrix
plotCI(as.numeric(row.names(w.means)),w.means,w.se,ylim=c(0,400),pch=19,gap=0,xlab="Week",ylab="d, w population")
After the first plot, use the code below before plotting the next plot:
par(new=T)
Make sure that you set the xlim and ylim to accommodate both plots. And you will need to use the options axes=F and ann=F.
These graphical features are discussed in detail in the ebook "R Fundamentals & Graphics". You might want to use it as a desk reference.
#take mean, number, and create se of prey(d)
d.means=tapply(mydata$prey,mydata$week, mean)
d.n=tapply(mydata$prey,mydata$week, length)
d.se=tapply(mydata$prey,mydata$week, sd)/sqrt(d.n)
#take mean, number, and create se of predator(w)
w.means=tapply(mydata$pred,mydata$week, mean)
w.n=tapply(mydata$pred,mydata$week, length)
w.se=tapply(mydata$pred,mydata$week, sd)/sqrt(w.n)
Here you have created all the variables you need but to plot them using ggplot you need them to be in a tall dataset with an variable indicating if they are predator or prey. I also added a time variable, I think yours would be week.
x=data.frame(means=c(w.means,d.means),
n=c(w.n,d.n),
se=c(w.se,d.se),
role=c(rep("pred",length(w.n)),rep("prey",length(d.n))),
time=c(1:length(w.n),1:length(d.n))
)
I don't know exactly what your data look like so here is a fake one I cooked up just to illustrate the format.
means n se role time
1 0.9874234 10 0.16200575 pred 1
2 1.4120207 12 0.08895026 pred 2
3 2.7352516 8 0.07991036 pred 3
4 1.1301248 11 0.05481813 prey 1
5 2.4810040 13 0.28682585 prey 2
6 3.1546947 9 0.22126054 prey 3
Once the data are in this nice format using ggplot is really pretty easy.
ggplot(x, aes(x=time, y=means, colour=role)) +
geom_errorbar(aes(ymin=means-se, ymax=means+se), width=.1) +
geom_line()
That gives this:
I have a dataframe mdata which looks like:
>head(mdata)
ID variable value
SJ5444_MAXGT coding 4.241920
SJ5426_MAXGT coding 4.254331
HR1383_MAXGT coding 4.244994
HR5522_MAXGT missense 4.250347
CH30041_MAXGT missense 4.303174
SJ5438_MAXGT utr.3 4.242218
and I am trying to plot a violin plot like this:
x1<- mdata$value[mdata$variable=='coding']
x2<- mdata$value[mdata$variable=='missense']
x3<- mdata$value[mdata$variable=='utr.3']
vioplot(x1, x2, x3, names=as.character(unique(mdata$variable)), col="red")
title("Violin Plot: Log10 values")
But I have another dataframe ndata which looks like:
>head(ndata)
ID variable value
SJ5444_MAXGT coding 17455
SJ5426_MAXGT coding 17961
HR1383_MAXGT coding 17579
HR5522_MAXGT missense 17797
CH30041_MAXGT missense 20099
SJ5438_MAXGT utr.3 17467
Basically mdata$value is:
mdata$value = log10(ndata$value)
So I can make the Violin plot alright. But I need to change the Y-axis labels to match ndata$value and not mdata$value. I am plotting mdata$value but want the Y-axis labels taken from ndata$value. Just FYI, this is a subset of the actual data & min and max values in actual data are 12 & 36937 and I know how to plot it on a boxplot using:
axis(side=2,labels=round(10^(seq(log10(min(ndata$value)),log10(max(ndata$value)),len=5))),at=seq(log10(min(ndata$value)),log10(max(ndata$value)),len=5))
But I cannot plot the Y-axis labels to match ndata$value in the Violin plot. Any suggestions?
P.S. I could not find a tag vioplot or violinplot so I couldn't tag it.
vioplot isn't very flexible -- it doesn't allow you to turn off the axis labels or modify them -- but you can create your own empty plot first, then add the violin plot to it with vioplot(...,add=TRUE), then add the labels manually, as follows:
## make up data
set.seed(101)
x1 <- rlnorm(1000,meanlog=3,sdlog=1)
x2 <- rlnorm(1000,meanlog=3,sdlog=2)
x3 <- rlnorm(1000,meanlog=2,sdlog=2)
Now create the plot:
library(vioplot)
par(las=1,bty="l") ## my preferred setting
## set up empty plot
plot(0:1,0:1,type="n",xlim=c(0.5,3.5),ylim=range(log10(c(x1,x2,x3))),
axes=FALSE,ann=FALSE)
vioplot(log10(x1),log10(x2),log10(x3),add=TRUE)
axis(side=1,at=1:3,labels=c("first","second","third"))
axis(side=2,at=-2:4,labels=10^(-2:4))
Alternately, you could use ggplot2::geom_violin() along with scale_y_log10() (I think).
Based on Ben Bolker's suggestion, I used ggplot2::geom_violin() and achieved what I wanted, plotting log10(value) but labeling 'value' as such on the Y-axis using:
ggplot(mdata, aes(variable, log10(value))) + geom_violin(colour="black",fill="red")
+ scale_y_continuous(
breaks = seq(log10(min(mdata$value)),log10(max(mdata$value)),len=5),
labels = round(10^(seq(log10(min(mdata$value)),log10(max(mdata$value)),len=5)))
)
I am trying to plot two vectors with different values, but equal length on the same graph as follows:
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(x,y,days)
a b days
1 23.33 33.33 1
2 24.33 34.33 2
3 25.33 35.33 3
4 26.33 36.33 4
5 27.33 37.33 5
etc..
I am trying to use ggplot2 to plot x and y on the x-axis and the days on the y-axis. However, I can't figure out how to do it. I am able to plot them individually and combine the graphs, but I want just one graph with both a and b vectors (different colors) on x-axis and number of days on y-axis.
What I have so far:
X<-ggplot(df, aes(x=a,y=days)) + geom_line(color="red")
Y<-ggplot(df, aes(x=b,y=days)) + geom_line(color="blue")
Is there any way to define the x-axis for both a and b vectors? I have also tried using the melt long function, but got stuck afterwards.
Any help is much appreciated. Thank you
I think the best way to do it is via a the approach of melting the data (as you have mentioned). Especially if you are going to add more vectors. This is the code
library(reshape2)
library(ggplot2)
a<-23:52
b<-33:62
days<-1:30
df<-data.frame(x=a,y=b,days)
df_molten=melt(df,id.vars="days")
ggplot(df_molten) + geom_line(aes(x=value,y=days,color=variable))
You can also change the colors manually via scale_color_manual.
A simpler solution is to use only ggplot. The following code will work in your case
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(a,b,days)
ggplot(data = df)+
geom_line(aes(x = df$days,y = df$a), color = "blue")+
geom_line(aes(x = df$days,y = df$b), color = "red")
I added the colors, you might want to use them to differentiate between your variables.