Putting error bars on jittered points in R

I'm trying to plot a scatter graph with error bars in R using categorical data on the x axis, using the following code:
Nesk <- read.table("E:\\R stuff\\Chapter 2\\Boxplots of nb\\NEnbNOINF.txt", header=TRUE, fill=TRUE)
pd <- position_dodge(0.2)
ggplot(Nesk, aes(x = TYPE, y = NB, color = TYPE)) +
geom_jitter() +
geom_point(position = pd) +
geom_errorbar(aes(ymin = LC, ymax = UC), position = pd) +
theme_bw() +
theme(axis.title = element_text(face = "bold")) +
ylab("Nb")
However, I can't get the error bars onto the jittered points. I end up with this: https://imgur.com/qBcvOat (sorry, I don't have the reputation to insert images directly).
I've tried using position_dodge, but I'm aware that it only separates the points by category (COL, LIN, NOM) rather than within each category. Is there any way to jitter the points and attach error bars to them? I've seen some posts with fixes for this, but I think an update somewhere along the line invalidated them.
Thanks in advance!
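One possible approach, sketched under assumptions: compute the jitter once in the data, so the points and the error bars share the same x positions. The `toy` data frame below is a made-up stand-in for `Nesk` (same column names TYPE, NB, LC, UC), and the `x_jit` column and the jitter range of 0.2 are illustrative choices, not anything from the original data.

```r
library(ggplot2)

set.seed(42)  # make the jitter reproducible

# Hypothetical stand-in for Nesk; the values are invented for illustration
toy <- data.frame(
  TYPE = factor(rep(c("COL", "LIN", "NOM"), each = 4)),
  NB   = c(20, 25, 31, 28, 40, 37, 45, 42, 15, 18, 22, 19)
)
toy$LC <- toy$NB - 4
toy$UC <- toy$NB + 6

# Jitter once, in the data: category index plus a small random offset
toy$x_jit <- as.numeric(toy$TYPE) + runif(nrow(toy), min = -0.2, max = 0.2)

p <- ggplot(toy, aes(x = x_jit, y = NB, colour = TYPE)) +
  geom_errorbar(aes(ymin = LC, ymax = UC), width = 0.05) +
  geom_point() +
  # Put the category names back on the now-numeric x axis
  scale_x_continuous(breaks = seq_along(levels(toy$TYPE)),
                     labels = levels(toy$TYPE)) +
  theme_bw() +
  theme(axis.title = element_text(face = "bold")) +
  labs(x = "TYPE", y = "Nb")
```

Typing `p` at the console draws the plot. Because both layers read the same `x_jit` column, each error bar stays on its own point regardless of how the position adjustments behave in a given ggplot2 version.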

Related

Grouping 2 categorical variables with geom_boxplot

I have tried some examples I found here but I always get an error or a different graph from what I need (e.g. lines instead of the boxplot, or only 2 boxes instead of 4).
I want to plot the following
Condition  Time  mean       sem
A          I     0.5578552  0.05294356
A          II    0.6957565  0.09149457
P          I     0.7078374  0.08142464
P          II    0.7762761  0.10945771
I need "Condition" in the x axis and I need to group "Time".
The idea is to get a visual representation similar to this: [image]
My attempt was:
ggplot(data = means.sem, aes(x = Condition, y = mean, fill=Time, ymin = mean-sem, ymax = mean + sem))
+ geom_boxplot() +
stat_boxplot(geom ='errorbar', width = 0.5)+
scale_y_continuous(expand = c(0, 0), limits = c(0, 0.85))+ scale_fill_manual(values=c("black", "grey"))+
labs(y= "Mean", x="")+ theme_classic()
Thank you!
What do you want your y-axis to be? On the assumption it is, for example, the sem variable, I use the following code:
boxplot <- ggplot(data=means.sem, aes(x=Condition, y=sem, fill=Time)) + geom_boxplot(position="dodge2")
Obviously you can alter the colours, etc as you need to.
EDIT: changed the position to dodge2 as this creates a pleasing small gap between each boxplot within a group.

How to create a legend title in a ggplot2 line graph

I am creating a line graph using the ggplot2 package in R.
I cannot upload the data as it is for a study I am conducting for my final year project. So, I can only share the code with you.
This is the code for the APA formatted graph.
ggplot(accuracy_data,
aes(x = eccentricity, y = accuracy, group= speech_task)) +
geom_line(aes(linetype=speech_task)) +
scale_linetype_manual(values=c("twodash", "dotted", "solid")) +
geom_point(aes(shape = speech_task)) +
facet_grid(. ~ duration, labeller=labeller(duration = labels)) +
labs(x='Eccentricity (degrees of visual angle)', y='Accuracy of responses') +
theme_apa() +
theme(text=element_text(family='Times')) +
scale_x_continuous(breaks =c(5, 10, 15)) +
geom_errorbar(aes(ymin = accuracy - acc_sum$se , ymax = accuracy + acc_sum$se ), width=.1)
This produces a graph with a legend without a title, hence I am asking for help in creating a title for the legend.
I have tried a lot of different options, but none of them work, and I don't even get an error message.
This is the code I have tried so far:
legend_title <- "Speech Task"
scale_fill_manual(legend_title,values=c("Conversation", "N-Back", "Silence"))
guides(fill=guide_legend(title="Speech Task"))
scale_fill_discrete(name = "Speech Task",
labels = c("Conversation", "N-Back", "Silence"))
labs(fill="Speech Task")
The following, final attempt was the only one that changed the graph. However, because I have manually changed the point shape as well as the line type, it produced two legends and only titled the line type legend.
labs(linetype= "Speech Task")
Please can I have some help :)
Without the data or the resulting plot, I'm going on a hunch here.
I suspect you need to give the linetype and shape legends the same name, so that ggplot2 merges them into one. Something along the lines of
scale_linetype_manual(name = legend_title, values = c("twodash", "dotted", "solid")) +
scale_shape_discrete(name = legend_title)
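As a rough illustration of the legend merging: ggplot2 combines the linetype and shape legends when both aesthetics map the same variable and carry the same title. The `toy` data frame below is invented, since the original `accuracy_data` was not shared; only the `labs()` call is the point of the sketch.

```r
library(ggplot2)

# Invented data standing in for accuracy_data (which was not shared)
toy <- data.frame(
  eccentricity = rep(c(5, 10, 15), times = 3),
  accuracy     = c(0.95, 0.90, 0.82, 0.93, 0.85, 0.75, 0.97, 0.94, 0.88),
  speech_task  = rep(c("Conversation", "N-Back", "Silence"), each = 3)
)

p <- ggplot(toy, aes(x = eccentricity, y = accuracy, group = speech_task)) +
  geom_line(aes(linetype = speech_task)) +
  geom_point(aes(shape = speech_task)) +
  scale_linetype_manual(values = c("twodash", "dotted", "solid")) +
  scale_x_continuous(breaks = c(5, 10, 15)) +
  # Same title for both aesthetics, so a single combined legend is drawn
  labs(linetype = "Speech Task", shape = "Speech Task")
```

The fill-based attempts have no visible effect because nothing in the plot is mapped to `fill`.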

ggplot2 add manual bar to existing plot

I'm trying to add a single, manual bar to the existing area (ribbon) plot. Ideally I just wanted to specify the x (position) and y (value) for the bar.
ExampleData <- data.frame(myID=c(1,2,3,4,5,6,7,8,9,10),PU=c(10,20,30,40,50,60,70,80,90,100))
MyPlot <- ggplot(ExampleData,aes(x=myID))
MyPlot <- MyPlot + geom_ribbon(aes(ymin=0, ymax=PU), fill="lightgray", color="darkgray", size=1)
MyPlot <- MyPlot + geom_col(aes(x=4,y=40), color="red", linetype="solid", size=1)
MyPlot
It is almost working, but for some reason the value of 40 is becoming 400, and ideally I should be able to specify the width of the bar (should be half of what we see below).
Thank you for any help!
Maybe something more like this?
ExampleData <- data.frame(myID=c(1,2,3,4,5,6,7,8,9,10),
PU=c(10,20,30,40,50,60,70,80,90,100))
bar <- data.frame(xmin = 4,xmax= 4.5,ymin = 0,ymax = 40)
ggplot() +
geom_ribbon(data = ExampleData,
aes(x = myID,ymin=0, ymax=PU),
fill="lightgray",
color="darkgray", size=1) +
geom_rect(data = bar,
aes(xmin = xmin,xmax = xmax,ymin = ymin,ymax = ymax),
color = "red")
The 40 vs 400 issue you mention happens when you specify a data frame at the top ggplot() level and then try to add layers where all the aesthetics are intended to be "set" rather than "mapped". The most common case when this happens is when people are adding text labels and you end up with many many copies of each text label plotted on top of each other.
In this case, ggplot is trying to interpret the x and y values you give geom_col in the context of ExampleData, and so ends up repeating those single values 10 times and stacking the resulting bars.
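The difference between the two cases can be checked with `layer_data()`. The sketch below reuses the ExampleData from the question; the object names `base`, `stacked` and `single`, and the width of 0.45 (half of the default 0.9), are choices made for this illustration.

```r
library(ggplot2)

ExampleData <- data.frame(myID = 1:10, PU = seq(10, 100, by = 10))

base <- ggplot(ExampleData, aes(x = myID)) +
  geom_ribbon(aes(ymin = 0, ymax = PU), fill = "lightgray", colour = "darkgray")

# Constants inside aes() are recycled to all 10 rows of ExampleData,
# and the 10 identical bars are then stacked on top of each other
stacked <- base + geom_col(aes(x = 4, y = 40), colour = "red")

# Giving the layer its own one-row data frame draws a single bar;
# `width` is expressed in x-axis units
single <- base +
  geom_col(data = data.frame(myID = 4, PU = 40), aes(y = PU),
           colour = "red", width = 0.45)

max(layer_data(stacked, 2)$ymax)  # 400: ten bars of height 40, stacked
layer_data(single, 2)$ymax        # 40
```

`annotate("rect", ...)` is another way to add a one-off shape without touching the plot's data.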

Creating Error Bars in R (ggplot2)

I've been working on creating a bar graph with error bars to depict group differences for a dataset that I have. But the error bars are coming out funky, in that they are appearing further above the bar and in the middle of a bar.
My code:
ggplot(MRS_Hippo_NAA_Cre_Data_copy, aes(Type, Hippo_6_9NAACre, fill=Type)) +
geom_bar(stat="summary", fun.y="mean", colour="black", size=.3) +
geom_errorbar(aes(ymin=meanNAA-NAAse, ymax=meanNAA+NAAse), width=.2,
position=position_dodge(.9)) + labs(x="Group", y="Right Posterior NAA/Cre") +
scale_fill_manual(values=c("#0072B2", "#D55E00"), name="Group") + theme(text =
element_text(size=18))
This produced this graph:
I calculated the standard error by using the following function:
std <- function(x) sd(x)/sqrt(length(x))
x=Hippo_6_9NAACre
Not sure why the graph is producing funky error bars. Can anyone help or provide insight?
I had a similar problem very recently.
To solve it, first remove the layer
geom_errorbar(aes(ymin=meanNAA-NAAse,
ymax=meanNAA+NAAse), width=.2, position=position_dodge(.9))
and use a second stat_summary layer instead. That will generate the error bars separately for each group.
As you want the bars to show the standard error, you must write a function that returns the needed values in a form that stat_summary can use.
Below is a working example with the iris dataset.
library(ggplot2)
## create a function for standard error that can be used with stat_summary
# I created the function by inspecting the results returned by 'mean_cl_normal', which is the
# function used in some examples of stat_summary (see ?stat_summary).
# Note: this definition masks ggplot2's own mean_se() for the rest of the session.
mean_se = function(x){
  se = function(x){sd(x)/sqrt(length(x))}
  # ymin is the lower end of the error bar, ymax the upper end
  data.frame(y=mean(x), ymin=mean(x)-se(x), ymax=mean(x)+se(x))
}
## create the plot
# 'fun' replaces the deprecated 'fun.y' (ggplot2 >= 3.3.0)
p = ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(fun = mean, geom = "col", fill = "White", colour = "Black", width=0.5) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width=0.2, size=1)
# print the plot
print(p)
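ggplot2 also exports its own `mean_se()` helper, which returns the mean plus or minus a multiple of the standard error (argument `mult`, default 1), so a hand-written function may not be needed. The sketch below compares it against a manual calculation on one iris species. The comparison is only a sanity check, and the `fun` argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

x  <- iris$Sepal.Length[iris$Species == "setosa"]
se <- sd(x) / sqrt(length(x))

# Built-in helper: one-row data frame with columns y, ymin, ymax
built_in <- ggplot2::mean_se(x)

# The helper should agree with the manual calculation
all.equal(built_in$ymin, mean(x) - se)
all.equal(built_in$ymax, mean(x) + se)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(fun = mean, geom = "col",
               fill = "white", colour = "black", width = 0.5) +
  stat_summary(fun.data = ggplot2::mean_se, geom = "errorbar", width = 0.2)
```

Calling the helper as `ggplot2::mean_se` avoids picking up a user-defined function of the same name.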

Overlay raw data onto geom_bar

I have a data-frame arranged as follows:
condition,treatment,value
A , one , 2
A , one , 1
A , two , 4
A , two , 2
...
D , two , 3
I have used ggplot2 to make a grouped bar plot that looks like this:
The bars are grouped by "condition" and the colours indicate "treatment." The bar heights are the mean of the values for each condition/treatment pair. I achieved this by creating a new data frame containing the mean and standard error (for the error bars) for all the points that will make up each group.
What I would like to do is superimpose the raw jittered data to produce a bar-chart version of this box plot: http://docs.ggplot2.org/0.9.3.1/geom_boxplot-6.png [I realise that a box plot would probably be better, but my hands are tied because the client is pathologically attached to bar charts]
I have tried adding a geom_point object to my plot and feeding it the raw data (rather than the aggregated means which were used to make the bars). This sort of works, but it plots the raw values at the wrong x axis locations. They appear at the points at which the red and grey bars join, rather than at the centres of the appropriate bar. So my plot looks like this:
I can not figure out how to shift the points by a fixed amount and then jitter them in order to get them centered over the correct bar. Anyone know? Is there, perhaps, a better way of achieving what I'm trying to do?
What follows is a minimal example that shows the problem I have:
#Make some fake data
ex=data.frame(cond=rep(c('a','b','c','d'),each=8),
treat=rep(rep(c('one','two'),4),each=4),
value=rnorm(32) + rep(c(3,1,4,2),each=4) )
#Calculate the mean and SD of each condition/treatment pair
agg=aggregate(value~cond*treat, data=ex, FUN="mean") #mean
agg$sd=aggregate(value~cond*treat, data=ex, FUN="sd")$value #add the SD
dodge <- position_dodge(width=0.9)
limits <- aes(ymax=value+sd, ymin=value-sd) #Set up the error bars
p <- ggplot(agg, aes(fill=treat, y=value, x=cond))
#Plot, attempting to overlay the raw data
print(
p + geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25) +
geom_point(data= ex[ex$treat=='one',], colour="green", size=3) +
geom_point(data= ex[ex$treat=='two',], colour="pink", size=3)
)
I found it is unnecessary to create separate dataframes. The plot can be created by providing ggplot with the raw data.
ex <- data.frame(cond=rep(c('a','b','c','d'),each=8),
treat=rep(rep(c('one','two'),4),each=4),
value=rnorm(32) + rep(c(3,1,4,2),each=4) )
p <- ggplot(ex, aes(cond,value,fill = treat))
p + geom_bar(position = 'dodge', stat = 'summary', fun = 'mean') +
geom_errorbar(stat = 'summary', position = 'dodge', width = 0.9) +
geom_point(aes(x = cond), shape = 21, position = position_dodge(width = 1))
You need just one call to geom_point(), in which you use data frame ex and set the x values to cond, the y values to value and color=treat (inside aes()). Then add position=dodge to ensure that the points are dodged. With scale_color_manual() and its values= argument you can set the colors you need.
p+geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25)+
geom_point(data=ex,aes(cond,value,color=treat),position=dodge)+
scale_color_manual(values=c("green","pink"))
UPDATE - jittering of points
You can't directly use positions dodge and jitter together, but there are some workarounds. If you save the whole plot as an object, ggplot_build() shows the x positions of the bars; in this case they are 0.775, 1.225, 1.775... Those positions correspond to the combinations of the factors cond and treat. Since data frame ex contains 4 values for each combination, add a new column that holds those x positions, each repeated 4 times.
ex$xcord<-rep(c(0.775,1.225,1.775,2.225,2.775,3.225,3.775,4.225),each=4)
Now in geom_point() use this new column as x values and set position to jitter.
p+geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25)+
geom_point(data=ex,aes(xcord,value,color=treat),position=position_jitter(width =.15))+
scale_color_manual(values=c("green","pink"))
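The hard-coded positions depend on the row order of ex. As an alternative sketch, the same x coordinates can be derived from the factors themselves, assuming the dodge width of 0.9 used above and bars of equal width; `dodge_width`, `n_treat`, `treat_idx` and `offset` are names introduced here for illustration.

```r
set.seed(1)
ex <- data.frame(cond  = rep(c("a", "b", "c", "d"), each = 8),
                 treat = rep(rep(c("one", "two"), 4), each = 4),
                 value = rnorm(32) + rep(c(3, 1, 4, 2), each = 4))

dodge_width <- 0.9
n_treat <- length(unique(ex$treat))   # 2 bars per condition

# Each bar is dodge_width / n_treat wide, and the bar centres sit
# symmetrically around the integer position of the condition
treat_idx <- as.numeric(factor(ex$treat))   # 1 = "one", 2 = "two"
offset <- (treat_idx - (n_treat + 1) / 2) * dodge_width / n_treat
ex$xcord <- as.numeric(factor(ex$cond)) + offset

unique(ex$xcord)  # 0.775 1.225 1.775 2.225 2.775 3.225 3.775 4.225
```

The resulting `xcord` column can be used in geom_point() with position_jitter() in the same way as the hand-typed one.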
As illustrated by holmrenser above, referencing a single dataframe and updating the stat instruction to "summary" in the geom_bar function is more efficient than creating additional dataframes and retaining the stat instruction as "identity" in the code.
To both jitter and dodge the data points with the bar charts per the OP's original question, this can also be accomplished by updating the position instruction in the code with position_jitterdodge. This positioning scheme allows widths for jitter and dodge terms to be customized independently, as follows:
p <- ggplot(ex, aes(cond,value,fill = treat))
p + geom_bar(position = 'dodge', stat = 'summary', fun = 'mean') +
geom_errorbar(stat = 'summary', position = 'dodge', width = 0.9) +
geom_point(aes(x = cond), shape = 21, position =
position_jitterdodge(jitter.width = 0.5, jitter.height=0.4,
dodge.width=0.9))
