Adding bidirectional error bars to points on scatter plot in ggplot - r

I am trying to add x and y axis error bars to each individual point in a scatter plot.
Each point represents a standardized mean value for fitness for males and females (n=33).
I have found the geom_errorbar and geom_errorbarh functions and this example
ggplot2 : Adding two errorbars to each point in scatterplot
However, I want to take the standard error for each point (which I have already calculated) from another column in my dataset, which looks like this:
line MaleBL1 FemaleBL1 BL1MaleSE BL1FemaleSE
3 0.05343516 0.05615977 0.28666600 0.3142001
4 -0.53321642 -0.27279609 0.23929438 0.1350793
5 -0.25853484 -0.08283566 0.25904025 0.2984323
6 -1.11250479 0.03299387 0.23553281 0.2786233
7 -0.14784506 0.28781883 0.27872358 0.2657080
10 0.38168220 0.89476555 0.25620796 0.3108585
11 0.24466921 0.14419021 0.27386482 0.3322349
12 -0.06119015 1.42294820 0.32903199 0.3632367
14 0.38957538 1.66850680 0.30362671 0.4437925
15 0.05784842 -0.12453429 0.32319116 0.3372879
18 0.71964923 -0.28669563 0.16336556 0.1911489
23 0.03191843 0.13955703 0.34522310 0.1872229
28 -0.04598340 -0.35156017 0.27001451 0.1822967
'line' is the population (n=10 individuals in each) that each value comes from. My x and y variables are 'MaleBL1' and 'FemaleBL1', and the standard errors for each population, for males and females respectively, are 'BL1MaleSE' and 'BL1FemaleSE'.
So far, my code is:
p<-ggplot(BL1ggplot, aes(x=MaleBL1, y=FemaleBL1)) +
geom_point(shape=1) +
geom_smooth(method=lm)+ # add regression line
xmin<-(MaleBL1-BL1MaleSE)
xmax<-(MaleBL1+BL1MaleSE)
ymin<-(FemaleBL1-BL1FemaleSE)
ymax<-(FemaleBL1+BL1FemaleSE)
geom_errorbarh(aes(xmin=xmin,xmax=xmax))+
geom_errorbar(aes(ymin=ymin,ymax=ymax))
I think the last two lines specify the limits of the error bars incorrectly. I just don't know how to tell R to take the SE values for each point from the columns BL1MaleSE and BL1FemaleSE.
Any tips greatly appreciated

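You really should study some tutorials. You haven't understood ggplot2 syntax: assignments such as xmin<-(MaleBL1-BL1MaleSE) cannot appear inside a chain of layers joined by +. Compute the limits directly inside aes(), where column names are looked up in the data frame.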
BL1ggplot <- read.table(text=" line MaleBL1 FemaleBL1 BL1MaleSE BL1FemaleSE
3 0.05343516 0.05615977 0.28666600 0.3142001
4 -0.53321642 -0.27279609 0.23929438 0.1350793
5 -0.25853484 -0.08283566 0.25904025 0.2984323
6 -1.11250479 0.03299387 0.23553281 0.2786233
7 -0.14784506 0.28781883 0.27872358 0.2657080
10 0.38168220 0.89476555 0.25620796 0.3108585
11 0.24466921 0.14419021 0.27386482 0.3322349
12 -0.06119015 1.42294820 0.32903199 0.3632367
14 0.38957538 1.66850680 0.30362671 0.4437925
15 0.05784842 -0.12453429 0.32319116 0.3372879
18 0.71964923 -0.28669563 0.16336556 0.1911489
23 0.03191843 0.13955703 0.34522310 0.1872229
28 -0.04598340 -0.35156017 0.27001451 0.1822967", header=TRUE)
library(ggplot2)
p<-ggplot(BL1ggplot, aes(x=MaleBL1, y=FemaleBL1)) +
geom_point(shape=1) +
geom_smooth(method=lm)+
geom_errorbarh(aes(xmin=MaleBL1-BL1MaleSE,
xmax=MaleBL1+BL1MaleSE),
height=0.2)+
geom_errorbar(aes(ymin=FemaleBL1-BL1FemaleSE,
ymax=FemaleBL1+BL1FemaleSE),
width=0.2)
print(p)
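If you prefer to name the limits first, as the question attempted, one option is to store them as columns of the data frame so that aes() can find them. The following is a minimal sketch under that assumption. It uses only the first five rows of the data, and the column names xmin, xmax, ymin and ymax are hypothetical helpers, not part of the original dataset.

```r
library(ggplot2)

# First five rows of the data from the question
BL1 <- data.frame(
  MaleBL1     = c(0.05343516, -0.53321642, -0.25853484, -1.11250479, -0.14784506),
  FemaleBL1   = c(0.05615977, -0.27279609, -0.08283566, 0.03299387, 0.28781883),
  BL1MaleSE   = c(0.28666600, 0.23929438, 0.25904025, 0.23553281, 0.27872358),
  BL1FemaleSE = c(0.3142001, 0.1350793, 0.2984323, 0.2786233, 0.2657080)
)

# Hypothetical helper columns: mean +/- one standard error on each axis
BL1 <- transform(BL1,
  xmin = MaleBL1 - BL1MaleSE,
  xmax = MaleBL1 + BL1MaleSE,
  ymin = FemaleBL1 - BL1FemaleSE,
  ymax = FemaleBL1 + BL1FemaleSE
)

# The limits are now ordinary columns, so aes() can refer to them by name
p2 <- ggplot(BL1, aes(x = MaleBL1, y = FemaleBL1)) +
  geom_point(shape = 1) +
  geom_errorbarh(aes(xmin = xmin, xmax = xmax), height = 0.2) +
  geom_errorbar(aes(ymin = ymin, ymax = ymax), width = 0.2)
print(p2)
```

Both forms should draw the same bars. The inline arithmetic in aes() avoids the extra columns, while the precomputed columns are easier to inspect.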
By the way, given the size of the error bars in both directions, you should probably use Deming regression or total least squares instead of OLS regression.
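As a rough illustration of that remark, total least squares for a straight line can be sketched in base R with prcomp(): the fitted line follows the first principal component of the (x, y) cloud. This sketch assumes equal error variance on both axes (the special case of Deming regression with a variance ratio of 1), which may not hold for these data. The names tls_slope and tls_intercept are made up for the example.

```r
# The two plotted columns, copied from the data above
male <- c(0.05343516, -0.53321642, -0.25853484, -1.11250479, -0.14784506,
          0.38168220, 0.24466921, -0.06119015, 0.38957538, 0.05784842,
          0.71964923, 0.03191843, -0.04598340)
female <- c(0.05615977, -0.27279609, -0.08283566, 0.03299387, 0.28781883,
            0.89476555, 0.14419021, 1.42294820, 1.66850680, -0.12453429,
            -0.28669563, 0.13955703, -0.35156017)

# prcomp() centres the data by default; the first principal component is the
# direction that minimises the summed squared perpendicular distances
pc <- prcomp(cbind(male, female))
tls_slope <- unname(pc$rotation["female", "PC1"] / pc$rotation["male", "PC1"])
tls_intercept <- mean(female) - tls_slope * mean(male)

# Ordinary least squares fit for comparison (errors assumed in y only)
ols <- coef(lm(female ~ male))

print(c(tls_intercept = tls_intercept, tls_slope = tls_slope))
print(ols)
```

The resulting line could be drawn with geom_abline(intercept = tls_intercept, slope = tls_slope) in place of geom_smooth(method=lm). A full Deming fit would also use the ratio of the two error variances, which this sketch does not attempt.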

Related

ggplot multiple lines in same graph

I am trying to plot multiple gene expressions over time in the same graph to demonstrate a similar profile and then add a line to illustrate the mean of total for each timepoint (like the figure 4b in recent Nature comm article https://www.nature.com/articles/s41467-017-02546-5/figures/4). My data has been normalised to be around 0 so they are all on the same scale.
df2 sample:
variable value gene
1 5 -0.610384193 1
2 5 -6.25967087 2
3 5 -3.773389731 3
50 6 -0.358879035 1
51 6 -6.066341017 2
52 6 -4.202998579 3
99 7 -0.103885903 1
100 7 -6.648844687 2
101 7 -5.041554127 3
I plot the expression levels with ggplot2:
plotC <- ggplot(df2, aes(x=variable, y=value, group=factor(gene), colour=gene)) + geom_line(size=0.5, aes(color=gene), alpha=0.4)
But adding the mean line in red to this plot is proving difficult. I calculated the means and put them in another dataframe:
means
value variable gene
1 -1.5037354 5 50
2 -0.8783492 6 50
3 -0.7769085 7 50
Then tried adding them as another layer:
plotC + geom_line(data=means, aes(x=variable, y=value, color="red", group=factor(gene)), size=0.75)
But I get the error: Error: Discrete value supplied to continuous scale
Do you have any suggestions as to how I can plot this mean on the same graph in another color?
Thank you,
Anna
edit: the answer by RG20 is helpful, thanks for pointing out I had the color in the wrong place. However it plots the line outside the rest of the graph... I really don't understand what's wrong with my graph...
plotC + geom_line(data=means, aes(x=variable, y=value, group=factor(gene)), color='red',size=0.75)
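One possible approach, offered as a sketch rather than a confirmed fix: let ggplot2 compute the per-timepoint mean itself with stat_summary(), so that the mean line shares the x scale of the gene lines. The example assumes that variable is numeric (if it is a factor in df2 but numeric in means, that mismatch might explain the misplaced line) and uses only the nine sample rows shown above. Note also that color="red" inside aes() is treated as a discrete data value, which is what clashes with the continuous colour=gene scale in the first attempt.

```r
library(ggplot2)

# The nine sample rows from the question; 'variable' is kept numeric (assumption)
df2 <- data.frame(
  variable = rep(c(5, 6, 7), each = 3),
  value = c(-0.610384193, -6.25967087, -3.773389731,
            -0.358879035, -6.066341017, -4.202998579,
            -0.103885903, -6.648844687, -5.041554127),
  gene = rep(1:3, times = 3)
)

plotC <- ggplot(df2, aes(x = variable, y = value)) +
  # one faint line per gene
  geom_line(aes(group = gene, colour = factor(gene)), alpha = 0.4) +
  # one red line through the mean of all genes at each timepoint;
  # the fixed colour is set outside aes() so it does not touch the colour scale
  stat_summary(aes(group = 1), fun = mean, geom = "line", colour = "red")
print(plotC)
```

The fun argument of stat_summary() is assumed to be available, which requires a reasonably recent ggplot2 release.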

Creating a Bar Plot with Proportions on ggplot

I'm trying to create a bar graph on ggplot that has proportions rather than counts, and I have c+geom_bar(aes(y=(..count..)/sum(..count..)*100)) but I'm not sure what either of the counts refer to. I tried putting in the data but it didn't seem to work. What should I input here?
This is the data I'm using
> describe(topprob1)
topprob1
n missing unique Info Mean
500 0 9 0.93 3.908
1 2 3 4 5 6 7 8 9
Frequency 128 105 9 15 13 172 39 12 7
% 26 21 2 3 3 34 8 2 1
You haven't provided a reproducible example, so here's an illustration with the built-in mtcars data frame. Compare the following two plots. The first gives counts. The second gives proportions, which are displayed in this case as percentages. ..count.. is an internal variable that ggplot creates to store the count values.
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar()
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..count../sum(..count..))) +
scale_y_continuous(labels=percent_format())
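You can also use the ..prop.. computed variable with a group aesthetic. Setting group = 1 makes the proportions be computed over all bars together rather than within each bar: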
library(ggplot2)
library(scales)
ggplot(mtcars, aes(am)) +
geom_bar(aes(y=..prop.., group = 1)) +
scale_y_continuous(labels=percent_format())
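To see what the two counts refer to, the same arithmetic can be reproduced by hand in base R. This is only an illustrative sketch using the mtcars column from the answers above: the numerator is the count in each bar and the denominator is the total over all bars.

```r
# Number of rows in each level of am (one value per bar)
counts <- table(mtcars$am)

# Each bar's count divided by the grand total, as in ..count../sum(..count..)
props <- counts / sum(counts)

print(counts)
print(round(100 * props, 1))  # percentages, as in the question's formula
```

In newer ggplot2 releases the same computed variable can also be written as after_stat(count).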

Plot barplot as density plot in ggplot

Could anyone help me to plot the data below as a density plot where colour=variable?
> head(combined_length.m)
length seq mir variable value
1 22 TGAGGTATTAGGTTGTATGGTT mmu-let-7c-5p Ago1 8.622468
2 23 TGAGGGAGTAGGTTGTATGGTTT mmu-let-7c-5p Ago1 22.212471
3 21 TGAGGTAGTAGGTTGCATGGT mmu-let-7c-5p Ago1 9.745199
4 22 TGAGGTAGTATGTTGTATGGTT mmu-let-7c-5p Ago1 11.635982
5 22 TGAGTTAGTAGGTTGTATGGTT mmu-let-7c-5p Ago1 13.203627
6 20 TGAGGTAGTAGGCTGTATGG mmu-let-7c-5p Ago1 7.752571
ggplot(combined_length.m, aes(factor(length),value)) + geom_bar(stat="identity") + facet_grid(~variable) +
theme_bw(base_size=16)
I tried this without success:
ggplot(combined_length.m, aes(factor(length),value)) + geom_density(aes(fill=variable), size=2)
Error in data.frame(counts = c(167, 9324, 177, 150451, 62640, 74557, 4, :
arguments imply differing number of rows: 212, 6, 1, 4
I want something like this:
http://i.stack.imgur.com/qitOs.jpg
Using factor(length) for x seems to create problems. Just use length.
Also, density plots display the distribution of whatever you define as x. So by definition the y axis is the density at a given value of x. In your code you seem to be trying to specify both x and y, which makes no sense. You can specify a y in geom_density(...) but this controls the scaling, as shown below. [Note: Your example has only one type of variable (Ago1) so I created an artificial dataset].
set.seed(1) # for reproducible example
df <- data.frame(variable=rep(LETTERS[1:3],c(5,10,15)),
length =rpois(30,25),
value =rnorm(30,mean=20,sd=5))
library(ggplot2)
ggplot(df,aes(x=length))+geom_density(aes(color=variable))
In this representation, the area under each curve is 1. This is the same as setting y=..density..
ggplot(df,aes(x=length))+geom_density(aes(color=variable,y=..density..))
You can also set y=..count.., which scales each curve by its number of observations. In this example, since there are 15 observations for C and only 5 for A, the blue curve (C) has three times the area of the red curve (A).
ggplot(df,aes(x=length))+geom_density(aes(color=variable,y=..count..))
You can also set y=..scaled.. which adjusts the curves so the maximum value in each corresponds to 1.
ggplot(df,aes(x=length))+geom_density(aes(color=variable,y=..scaled..))
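The three scalings can be checked by hand with base R's density(), which, to my understanding, is also what ggplot2 relies on internally. This is a sketch on made-up data for a single group, not the exact computation ggplot2 performs.

```r
set.seed(1)            # for a reproducible example
x <- rpois(15, 25)     # one artificial group of 15 observations

d <- density(x)        # kernel density estimate on a grid of x values

# Trapezoid rule: area under the density curve (should be close to 1)
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

count_curve  <- d$y * length(x)   # analogue of ..count..: area is about n
scaled_curve <- d$y / max(d$y)    # analogue of ..scaled..: maximum is 1

print(area)
print(max(scaled_curve))
```

The area is slightly below 1 because the grid stops a few bandwidths beyond the data.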
Finally, if you want to get rid of all those annoying extra lines, use stat_density(...) instead:
ggplot(df,aes(x=length))+
stat_density(aes(color=variable),geom="line",position="identity")

Simplest way to do grouped barplot

I have the following dataframe:
Catergory Reason Species
1 Decline Genuine 24
2 Improved Genuine 16
3 Improved Misclassified 85
4 Decline Misclassified 41
5 Decline Taxonomic 2
6 Improved Taxonomic 7
7 Decline Unclear 41
8 Improved Unclear 117
I'm trying to make a grouped bar chart, with Species as the bar height and two colours for the category.
here is my code:
Reasonstats<-read.csv("bothstats.csv")
Reasonstats2<-as.matrix(Reasonstats[,3])
barplot((Reasonstats2),beside=T,col=c("darkblue","red"),ylab="number of
species",names.arg=Reasonstats$Reason, cex.names=0.8,las=2,space=c(0,100)
,ylim=c(0,120))
box(bty="l")
What I want is to label each pair of bars only once and to separate the groups from one another. I've tried changing the space value to all sorts of things, but it doesn't seem to move the bars apart. Can anyone tell me what I'm doing wrong?
With ggplot2:
library(ggplot2)
Animals <- read.table(
header=TRUE, text='Category Reason Species
1 Decline Genuine 24
2 Improved Genuine 16
3 Improved Misclassified 85
4 Decline Misclassified 41
5 Decline Taxonomic 2
6 Improved Taxonomic 7
7 Decline Unclear 41
8 Improved Unclear 117')
ggplot(Animals, aes(factor(Reason), Species, fill = Category)) +
geom_bar(stat="identity", position = "dodge") +
scale_fill_brewer(palette = "Set1")
Not a barplot solution but using lattice and barchart:
library(lattice)
barchart(Species~Reason,data=Reasonstats,groups=Catergory,
scales=list(x=list(rot=90,cex=0.8)))
There are several ways to do plots in R; lattice is one of them, and always a reasonable solution, +1 to @agstudy. If you want to do this in base graphics, you could try the following:
Reasonstats <- read.table(text="Category Reason Species
Decline Genuine 24
Improved Genuine 16
Improved Misclassified 85
Decline Misclassified 41
Decline Taxonomic 2
Improved Taxonomic 7
Decline Unclear 41
Improved Unclear 117", header=T)
ReasonstatsDec <- Reasonstats[which(Reasonstats$Category=="Decline"),]
ReasonstatsImp <- Reasonstats[which(Reasonstats$Category=="Improved"),]
Reasonstats3 <- cbind(ReasonstatsImp[,3], ReasonstatsDec[,3])
colnames(Reasonstats3) <- c("Improved", "Decline")
rownames(Reasonstats3) <- ReasonstatsImp$Reason
dev.new() # open a new graphics device; windows() only exists on Windows
barplot(t(Reasonstats3), beside=TRUE, ylab="number of species",
cex.names=0.8, las=2, ylim=c(0,120), col=c("darkblue","red"))
box(bty="l")
Here's what I did: I created a matrix with two columns (because your data were in columns) where the columns were the species counts for Decline and for Improved. Then I made those categories the column names. I also made the Reasons the row names. The barplot() function can operate over this matrix, but wants the data in rows rather than columns, so I fed it a transposed version of the matrix. Lastly, I deleted some of your arguments to your barplot() function call that were no longer needed. In other words, the problem was that your data weren't set up the way barplot() wants for your intended output.
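The same matrix layout can also be built in one step with xtabs(), which cross-tabulates the counts directly into the rows-by-groups shape that barplot() expects. This is an alternative sketch, not the code used in the answer above. The object name counts is made up for the example.

```r
Reasonstats <- read.table(text = "Category Reason Species
Decline Genuine 24
Improved Genuine 16
Improved Misclassified 85
Decline Misclassified 41
Decline Taxonomic 2
Improved Taxonomic 7
Decline Unclear 41
Improved Unclear 117", header = TRUE)

# Rows are the categories (one bar colour each), columns are the reasons
# (one cluster of bars each), so no transpose is needed
counts <- xtabs(Species ~ Category + Reason, data = Reasonstats)
print(counts)

barplot(counts, beside = TRUE, ylab = "number of species",
        cex.names = 0.8, las = 2, ylim = c(0, 120),
        col = c("darkblue", "red"), legend.text = TRUE)
box(bty = "l")
```

Here the first colour goes to Decline and the second to Improved, because xtabs() orders the rows alphabetically; the legend makes the mapping explicit.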
I wrote a wrapper function called bar() around barplot() to do what you are trying to do here, since I need to do similar things frequently. The GitHub link to the function is here. After copying and pasting it into R, you do
bar(dv = Species,
factors = c(Category, Reason),
dataframe = Reasonstats,
errbar = FALSE,
ylim=c(0, 140)) #I increased the upper y-limit to accommodate the legend.
The one convenience is that it will put a legend on the plot using the names of the levels in your categorical variable (e.g., "Decline" and "Improved"). If each of your levels has multiple observations, it can also plot the error bars (which does not apply here, hence errbar=FALSE).

Multiple Plots in R

I want to plot 2 graphs in 1 frame. Basically I want to compare the results.
Anyways, the code I tried is:
plot(male,pch=16,col="red")
lines(male,pch=16,col="red")
par(new=TRUE)
plot(female,pch=16,col="green")
lines(female,pch=16,col="green")
When I run it, I DO get 2 plots in a frame BUT it changes my y-axis. Added my plot below. Anyways, y-axis values are -4,-4,-3,-3,...
It's like both of the plots display their own axis.
Please help.
Thanks
You don't need the second plot. Just use
> plot(male,pch=16,col="red")
> lines(male, pch=16, col = "red")
> lines(female, pch=16, col = "green")
> points(female, pch=16, col = "green")
Note: that will set the frame boundaries based on the first data set, so some data from the second plot could be outside the boundaries of the plot. You can fix it by e.g. setting the limits of the first plot yourself.
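For instance, the limits can be taken from both data sets at once with range(). The vectors below are made-up stand-ins for the asker's male and female data, which were not posted.

```r
set.seed(42)                     # made-up data, for illustration only
male   <- rnorm(10, mean = 1)
female <- rnorm(10, mean = 1)

# y-limits wide enough for both series
ylim <- range(male, female)

# type = "b" draws points and connecting lines in a single call
plot(male, type = "b", pch = 16, col = "red", ylim = ylim, ylab = "value")
lines(female, type = "b", pch = 16, col = "green")
```

Because both series are drawn against the same axes, neither is clipped and no second axis appears.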
For this kind of plot I usually like the plotting with ggplot2 much better. The main reason: It generalizes nicely to more than two lines without a lot of code.
The drawback for your sample data is that it is not available as a data.frame, which is required for ggplot2. Furthermore, you always need an x-variable to plot against. Thus, let us first create a data.frame out of your data.
dat <- data.frame(index=rep(1:10, 2), vals=c(male, female), group=rep(c('male', 'female'), each=10))
Which leaves us with
> dat
index vals group
1 1 -0.4334269341 male
2 2 0.8829902521 male
3 3 -0.6052638138 male
4 4 0.2270191965 male
5 5 3.5123679143 male
6 6 0.0615821014 male
7 7 3.6280155376 male
8 8 2.3508890457 male
9 9 2.9824432680 male
10 10 1.1938052833 male
11 1 1.3151289227 female
12 2 1.9956491556 female
13 3 0.8229389822 female
14 4 1.2062726250 female
15 5 0.6633392820 female
16 6 1.1331669670 female
17 7 -0.9002109636 female
18 8 3.2137052284 female
19 9 0.3113656610 female
20 10 1.4664434215 female
Note that my command assumes you have 10 data values each. That command would have to be adjusted according to your actual data.
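One way to avoid hard-coding the length is to derive the index and group columns from the vectors themselves. The helper below, make_long(), is a hypothetical name invented for this sketch, and the input values are made up.

```r
# Hypothetical helper: stack any number of named numeric vectors into
# the long format used above, whatever their lengths
make_long <- function(...) {
  groups <- list(...)
  data.frame(
    index = unlist(lapply(groups, seq_along), use.names = FALSE),
    vals  = unlist(groups, use.names = FALSE),
    group = rep(names(groups), times = lengths(groups))
  )
}

# Made-up vectors of different lengths
dat <- make_long(male = c(1.2, 0.4, 2.1), female = c(0.7, 1.9, 1.1, 0.3))
print(dat)
```

The resulting data frame has the same three columns as dat above, so the ggplot2 call that follows should work with it unchanged.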
Now we may use the mighty power of ggplot2:
library(ggplot2)
ggplot(dat, aes(x=index, y=vals, color=group)) + geom_point() + geom_line()
The call above has three elements. ggplot initializes the plot, tells R to use dat as the data source and defines the plot aesthetics, that is, which aesthetic properties of the plot (such as color, position, size, etc.) are driven by your data. We use the x and y values as expected and furthermore map the color aesthetic to the grouping variable, which makes ggplot automatically plot the two groups in different colors. Finally, we add two geometries that do what their names say: draw points and draw lines.
The result:
If you have your data saved in the standard way in R (in a data.frame), you end up with one line of code. And if after some thousands of years of evolution you want to add another gender, it is still one line of code.
