Overlay raw data onto geom_bar - r

I have a data frame arranged as follows:
condition,treatment,value
A , one , 2
A , one , 1
A , two , 4
A , two , 2
...
D , two , 3
I have used ggplot2 to make a grouped bar plot that looks like this:
The bars are grouped by "condition" and the colours indicate "treatment." The bar heights are the mean of the values for each condition/treatment pair. I achieved this by creating a new data frame containing the mean and standard error (for the error bars) for all the points that will make up each group.
What I would like to do is superimpose the raw jittered data to produce a bar-chart version of this box plot: http://docs.ggplot2.org/0.9.3.1/geom_boxplot-6.png [I realise that a box plot would probably be better, but my hands are tied because the client is pathologically attached to bar charts]
I have tried adding a geom_point object to my plot and feeding it the raw data (rather than the aggregated means which were used to make the bars). This sort of works, but it plots the raw values at the wrong x axis locations. They appear at the points at which the red and grey bars join, rather than at the centres of the appropriate bar. So my plot looks like this:
I cannot figure out how to shift the points by a fixed amount and then jitter them so that they are centred over the correct bar. Does anyone know how? Is there, perhaps, a better way of achieving what I'm trying to do?
What follows is a minimal example that shows the problem I have:
library(ggplot2)
#Make some fake data
ex=data.frame(cond=rep(c('a','b','c','d'),each=8),
treat=rep(rep(c('one','two'),4),each=4),
value=rnorm(32) + rep(c(3,1,4,2),each=4) )
#Calculate the mean and SD of each condition/treatment pair
agg=aggregate(value~cond*treat, data=ex, FUN="mean") #mean
agg$sd=aggregate(value~cond*treat, data=ex, FUN="sd")$value #add the SD
dodge <- position_dodge(width=0.9)
limits <- aes(ymax=value+sd, ymin=value-sd) #Set up the error bars
p <- ggplot(agg, aes(fill=treat, y=value, x=cond))
#Plot, attempting to overlay the raw data
print(
p + geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25) +
geom_point(data= ex[ex$treat=='one',], colour="green", size=3) +
geom_point(data= ex[ex$treat=='two',], colour="pink", size=3)
)

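I found it unnecessary to create separate data frames: the plot can be created by giving ggplot the raw data and letting stat = 'summary' compute the bar heights and error bars. Note that when no summary function is supplied, stat = 'summary' defaults to mean_se(), so these error bars show mean ± standard error rather than the SD used in the question.
library(ggplot2)
ex <- data.frame(cond=rep(c('a','b','c','d'),each=8),
treat=rep(rep(c('one','two'),4),each=4),
value=rnorm(32) + rep(c(3,1,4,2),each=4) )
p <- ggplot(ex, aes(cond,value,fill = treat))
# fun = 'mean' replaces fun.y, which has been deprecated since ggplot2 3.3.0
p + geom_bar(position = 'dodge', stat = 'summary', fun = 'mean') +
geom_errorbar(stat = 'summary', position = 'dodge', width = 0.9) +
# a dodge width of 0.9 matches the default bar width, so the points sit over the bar centres
geom_point(aes(x = cond), shape = 21, position = position_dodge(width = 0.9))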
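Whether the points really sit over the bar centres can be checked by comparing the x positions ggplot2 computes for each layer with layer_data(). This is a minimal sketch, assuming ggplot2 >= 3.3.0 (for the fun = argument) and an arbitrary seed added only for reproducibility; the helper centres() is hypothetical, written just for this check.

```r
library(ggplot2)

# Fake data from the question; the seed is an assumption, added for reproducibility
set.seed(42)
ex <- data.frame(cond  = rep(c('a', 'b', 'c', 'd'), each = 8),
                 treat = rep(rep(c('one', 'two'), 4), each = 4),
                 value = rnorm(32) + rep(c(3, 1, 4, 2), each = 4))

dodge <- position_dodge(width = 0.9)  # one dodge width shared by every layer

p <- ggplot(ex, aes(cond, value, fill = treat)) +
  geom_bar(position = dodge, stat = 'summary', fun = mean) +
  geom_point(shape = 21, position = dodge)

# Hypothetical helper: distinct x positions of layer i after position adjustment
centres <- function(plot, i) {
  x <- as.vector(unclass(layer_data(plot, i)$x))  # drop any discrete-mapping class
  sort(unique(round(x, 6)))
}
bar_x   <- centres(p, 1)
point_x <- centres(p, 2)
print(bar_x)
```

If the two layers used different dodge widths, bar_x and point_x would no longer match, which is the misalignment to watch for.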

You need just one call to geom_point(), using data frame ex and mapping x to cond, y to value and color = treat inside aes(). Then add position = dodge so that the points are dodged like the bars. With scale_color_manual() and its values= argument you can set the colors you need.
p+geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25)+
geom_point(data=ex,aes(cond,value,color=treat),position=dodge)+
scale_color_manual(values=c("green","pink"))
UPDATE - jittering of points
You can't directly combine the dodge and jitter positions, but there are some workarounds. If you save the whole plot as an object, ggplot_build() shows the x positions of the bars: in this case 0.775, 1.225, 1.775, and so on. Each position corresponds to one combination of the factors cond and treat. Since data frame ex has 4 values for each combination, add a new column that contains those x positions, each repeated 4 times.
ex$xcord<-rep(c(0.775,1.225,1.775,2.225,2.775,3.225,3.775,4.225),each=4)
Now in geom_point() use this new column as x values and set position to jitter.
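Typing the positions by hand is error-prone if the number of conditions or treatments changes. A sketch, assuming every condition has every treatment and all bars have equal width: each bar centre is the condition's integer code plus width * ((i - 0.5) / n - 0.5), where i is the treatment's index and n the number of treatments. This reproduces the values read from ggplot_build() above; for other layouts, treat the formula as an assumption and check it against ggplot_build().

```r
# Fake data as in the question (no plotting needed for this step)
ex <- data.frame(cond  = rep(c('a', 'b', 'c', 'd'), each = 8),
                 treat = rep(rep(c('one', 'two'), 4), each = 4),
                 value = rnorm(32) + rep(c(3, 1, 4, 2), each = 4))

dodge_width <- 0.9                          # must match position_dodge(width = 0.9)
n_groups    <- nlevels(factor(ex$treat))    # treatments per condition (2 here)
grp         <- as.integer(factor(ex$treat)) # treatment index: one = 1, two = 2

# Bar centre = condition code + offset of this treatment within the dodged slot
ex$xcord <- as.integer(factor(ex$cond)) +
  dodge_width * ((grp - 0.5) / n_groups - 0.5)

print(unique(ex$xcord))
```

The xcord column can then be used in aes(xcord, value, color = treat) with position_jitter() exactly as in the answer.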
p+geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25)+
geom_point(data=ex,aes(xcord,value,color=treat),position=position_jitter(width =.15))+
scale_color_manual(values=c("green","pink"))

As holmrenser showed above, referencing a single data frame and setting stat = "summary" in geom_bar() is more concise than creating additional data frames and keeping stat = "identity".
To both jitter and dodge the data points, as the OP originally asked, set the position of geom_point() to position_jitterdodge(). It lets you customize the jitter and dodge widths independently, as follows:
p <- ggplot(ex, aes(cond,value,fill = treat))
p + geom_bar(position = 'dodge', stat = 'summary', fun = 'mean') +
geom_errorbar(stat = 'summary', position = 'dodge', width = 0.9) +
geom_point(aes(x = cond), shape = 21, position =
position_jitterdodge(jitter.width = 0.5, jitter.height=0.4,
dodge.width=0.9))
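Any non-zero jitter.height moves points vertically, so the plotted y values no longer equal the raw data, and without a seed the jitter changes on every redraw. A sketch that keeps the raw values exact and fixes the jitter, assuming ggplot2 >= 3.0.0 (where position_jitterdodge() has a seed argument); the seed values are arbitrary.

```r
library(ggplot2)

# Fake data from the question; the seed is an assumption, for reproducibility
set.seed(1)
ex <- data.frame(cond  = rep(c('a', 'b', 'c', 'd'), each = 8),
                 treat = rep(rep(c('one', 'two'), 4), each = 4),
                 value = rnorm(32) + rep(c(3, 1, 4, 2), each = 4))

p <- ggplot(ex, aes(cond, value, fill = treat)) +
  geom_bar(position = position_dodge(width = 0.9), stat = 'summary', fun = mean) +
  geom_point(shape = 21,
             position = position_jitterdodge(
               jitter.width  = 0.5,
               jitter.height = 0,   # no vertical noise: y stays equal to the raw value
               dodge.width   = 0.9, # matches the bars' dodge width
               seed          = 123  # arbitrary seed: same jitter on every redraw
             ))

pts <- layer_data(p, 2)  # point positions after dodging and jittering
head(pts[, c("x", "y", "group")])
```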

Related

Grouping 2 categorical variables with geom_boxplot

I have tried some examples I found here but I always get an error or a different graph from what I need (e.g. lines instead of the boxplot, or only 2 boxes instead of 4).
I want to plot the following
Condition Time mean sem
A I 0.5578552 0.05294356
A II 0.6957565 0.09149457
P I 0.7078374 0.08142464
P II 0.7762761 0.10945771
I need "Condition" on the x axis, with the data grouped by "Time" within each condition.
The idea is to get a similar visual representation to this:
My attempt was:
ggplot(data = means.sem, aes(x = Condition, y = mean, fill=Time, ymin = mean-sem, ymax = mean + sem)) +
geom_boxplot() +
stat_boxplot(geom ='errorbar', width = 0.5)+
scale_y_continuous(expand = c(0, 0), limits = c(0, 0.85))+ scale_fill_manual(values=c("black", "grey"))+
labs(y= "Mean", x="")+ theme_classic()
Thank you!
What do you want your y-axis to be? On the assumption it is, for example, the sem variable, I use the following code:
boxplot <- ggplot(data=means.sem, aes(x=Condition, y=sem, fill=Time)) + geom_boxplot(position="dodge2")
Obviously you can alter the colours, etc as you need to.
EDIT: changed the position to dodge2 as this creates a pleasing small gap between each boxplot within a group.
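If the target figure is a grouped bar chart of the means with ± sem error bars (an assumption, since the image in the question is not available), a box plot is the wrong geom: geom_boxplot() computes quartiles from raw observations, and with one row per group in means.sem each "box" collapses to a line. A sketch with geom_col() and geom_errorbar() sharing one dodge, using the summary table posted in the question. The question's limits = c(0, 0.85) is left out because the tallest error bar reaches 0.776 + 0.109 ≈ 0.886, and values outside the limits are dropped.

```r
library(ggplot2)

# Summary table from the question (column names as posted)
means.sem <- data.frame(
  Condition = c("A", "A", "P", "P"),
  Time      = c("I", "II", "I", "II"),
  mean      = c(0.5578552, 0.6957565, 0.7078374, 0.7762761),
  sem       = c(0.05294356, 0.09149457, 0.08142464, 0.10945771)
)

dodge <- position_dodge(width = 0.9)  # shared so error bars land on bar centres

p <- ggplot(means.sem, aes(x = Condition, y = mean, fill = Time)) +
  geom_col(position = dodge, colour = "black") +
  geom_errorbar(aes(ymin = mean - sem, ymax = mean + sem),
                position = dodge, width = 0.25) +
  scale_fill_manual(values = c("black", "grey")) +
  labs(y = "Mean", x = "") +
  theme_classic()

print(p)
```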

Show mean values in boxplots in R

time_pic <- ggplot(data_box, aes(x=Kind, y=TimeTotal, fill=Sitting_Position)) +
geom_boxplot()
print(time_pic)
time_pic+labs(title="", x="", y = "Time (Sec)")
I ran the code above to get the following image, but I don't know how to add the mean value for each boxplot to it.
Update: I tried this:
means <- aggregate(TimeTotal ~ Sitting_Position*Kind, data_box, mean)
ggplot(data=data_box, aes(x=Kind, y=TimeTotal, fill=Sitting_Position)) +
geom_boxplot() +
stat_summary(fun=mean, colour="darkred", geom="point", shape=18, size=3, show.legend = FALSE) +
geom_text(data = means, aes(label = TimeTotal, y = TimeTotal + 0.08))
This is what it looks like now: the two dots are on the same vertical line, and the two values overlap each other.
As others said, you can share your dataset for more specific help, but in this case I think the point can be made using a dummy dataset. I'm creating one that looks pretty similar to your own in terms of naming, so theoretically you can just plug in this code and it could work.
The key thing you need here is to control how ggplot2 separates the boxplots for the different data_box$Sitting_Position values that share the same data_box$Kind. Separating and spreading the boxes around a given x= axis value is called "dodging". When you supply a fill= or color= (or other) aesthetic in aes() for a geom, ggplot2 assumes you also want to group the data by that variable. Your initial ggplot() call has fill=Sitting_Position in aes(), which is why geom_boxplot() "works": it creates separate, differently colored boxes that are dodged properly.
When you create the points and the text, ggplot2 has no idea that you want to "dodge" this data (the default position for these layers is "identity"), nor what to base the dodge on, since the fill= aesthetic doesn't make sense for a text geom or the default point shape. How to fix this? The answer is to:
1. Supply a group= aesthetic, which can override the grouping of a fill= or color= aesthetic, but which can also serve as the basis for dodging in geoms that have no similar aesthetic.
2. Specify more clearly how you want to dodge, using the same position_dodge() width for every layer. This is important for accurately positioning everything you want to dodge; otherwise things will be dodged, but maybe not by the same distance.
Here's how I combined all that:
library(ggplot2)
# the datasets
set.seed(1234)
data_box <- data.frame(
Kind=c(rep('Model-free AR',100),rep('Real-world',100)),
TimeTotal=c(rnorm(50,5.5,1),rnorm(50,5.43,1.1),rnorm(50,4.9,1),rnorm(50,4.7,0.2)),
Sitting_Position=rep(c(rep('face to face',50),rep('side by side',50)),2)
)
means <- aggregate(TimeTotal ~ Sitting_Position*Kind, data_box, mean)
# the plot
ggplot(data_box, aes(x=Kind, y=TimeTotal)) + theme_bw() +
# specifying dodge here and width to avoid overlapping boxes
geom_boxplot(
aes(fill=Sitting_Position),
position=position_dodge(0.6), width=0.5
) +
# note group aesthetic and same dodge call for next two objects
stat_summary(
aes(group=Sitting_Position),
position=position_dodge(0.6),
fun=mean,
geom='point', color='darkred', shape=18, size=3,
show.legend = FALSE
) +
geom_text(
data=means,
aes(label=round(TimeTotal,2), y=TimeTotal + 0.18, group=Sitting_Position),
position=position_dodge(0.6)
)
Giving you this:

ggplot2 add manual bar to existing plot

I'm trying to add a single, manual bar to the existing area (ribbon) plot. Ideally I just wanted to specify the x (position) and y (value) for the bar.
ExampleData <- data.frame(myID=c(1,2,3,4,5,6,7,8,9,10),PU=c(10,20,30,40,50,60,70,80,90,100))
MyPlot <- ggplot(ExampleData,aes(x=myID))
MyPlot <- MyPlot + geom_ribbon(aes(ymin=0, ymax=PU), fill="lightgray", color="darkgray", size=1)
MyPlot <- MyPlot + geom_col(aes(x=4,y=40), color="red", linetype="solid", size=1)
MyPlot
It is almost working, but for some reason the value of 40 is becoming 400, and ideally I should be able to specify the width of the bar (should be half of what we see below).
Thank you for any help!
Maybe something more like this?
library(ggplot2)
ExampleData <- data.frame(myID=c(1,2,3,4,5,6,7,8,9,10),
PU=c(10,20,30,40,50,60,70,80,90,100))
bar <- data.frame(xmin = 4,xmax= 4.5,ymin = 0,ymax = 40)
ggplot() +
geom_ribbon(data = ExampleData,
aes(x = myID,ymin=0, ymax=PU),
fill="lightgray",
color="darkgray", size=1) +
geom_rect(data = bar,
aes(xmin = xmin,xmax = xmax,ymin = ymin,ymax = ymax),
color = "red")
The 40 vs 400 issue you mention happens when you specify a data frame at the top ggplot() level and then add layers where all the aesthetics are intended to be "set" rather than "mapped". This most commonly happens when people add text labels and end up with many copies of each label plotted on top of each other.
In this case, ggplot is interpreting the x and y values you give geom_col in the context of ExampleData, so it repeats those single values once for each of its 10 rows and stacks the resulting bars, which is why the top lands at 10 × 40 = 400.
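The stacking explanation can be checked directly, and geom_col() also works once its layer gets its own data. A sketch, assuming a hypothetical one-row data frame bar_df for the manual bar; width = 0.5 gives the half-width bar the question asked for. annotate("rect", ...) would be another option for a single fixed shape.

```r
library(ggplot2)

ExampleData <- data.frame(myID = 1:10, PU = seq(10, 100, by = 10))

# What goes wrong: aes(x = 4, y = 40) is recycled once per row of ExampleData,
# so 10 identical bars are stacked and the top ends up at 10 * 40 = 400
stacked <- ggplot(ExampleData, aes(x = myID)) + geom_col(aes(x = 4, y = 40))

# Sketch of a fix: give the bar its own one-row data frame (bar_df is hypothetical)
bar_df <- data.frame(x = 4, y = 40)
p <- ggplot(ExampleData, aes(x = myID)) +
  geom_ribbon(aes(ymin = 0, ymax = PU), fill = "lightgray", colour = "darkgray") +
  geom_col(data = bar_df, aes(x = x, y = y), width = 0.5, colour = "red")

max(layer_data(stacked, 1)$ymax)
layer_data(p, 2)[, c("xmin", "xmax", "ymax")]
```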

Grouped bar plot in ggplot with y values based on combination of 2 categorical variables?

I am trying to create a grouped bar plot in ggplot, in which there should be 4 bars per x value. Here is a subset of my data (the actual data is about 4x longer):
Verb_Type,Frame,proportion_type,speaker
mental,V CP,0.209513024,Child
mental,V NP,0.138731597,Child
perception,V CP,0.017167382,Child
perception,V NP,0.387528402,Child
mental,V CP,0.437998087,Parent
mental,V NP,0.144086707,Parent
perception,V CP,0.042695836,Parent
perception,V NP,0.398376853,Parent
What I want is to plot Frame as the x values and proportion_type as the y values, but with the bars based on both Verb_Type and speaker. So for each x value (Frame), there would be 4 bars grouped together - a bar each for the proportion_type value corresponding to mental~child, mental~parent, perception~child, perception~parent. I need for the fill color to be based on Verb_Type, and the fill "texture" (saturation or something) based on speaker. I do not want stacked bars, as it would not accurately represent the data.
I don't want to use facet grids because I find it visually difficult to compare all 4 bars when they're separated into 2 groups. I want to group all the bars together so that the visualization is easier. But I can't figure out how to make the appropriate groupings. Is this something I can do in ggplot, or do I need to manipulate the data before plotting? I tried using melt to reshape the data, but either I was doing it wrong, or that's not what I actually should be doing.
I think you are looking for interaction() (which gives all unique pairings) of df$Verb_Type and df$speaker to get the column groupings you are after. You can pass this directly to ggplot or make a new variable ahead of time:
ggplot(df, aes(x = Frame, y = proportion_type,
group = interaction(Verb_Type, speaker), fill = Verb_Type, alpha = speaker)) +
geom_bar(stat = "identity", position = "dodge") +
scale_alpha_manual(values = c(.5, 1))
Or:
df$grouper <- interaction(df$Verb_Type, df$speaker)
ggplot(df, aes(x = Frame, y = proportion_type,
group = grouper, fill = Verb_Type, alpha = speaker)) +
geom_bar(stat = "identity", position = "dodge") +
scale_alpha_manual(values = c(.5, 1))
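A self-contained version, built from the subset of data posted in the question, shows the four interaction levels and confirms that each Frame gets four side-by-side bars. This is a sketch; it uses geom_col(), which is shorthand for geom_bar(stat = "identity"), and an explicit position_dodge(width = 0.9).

```r
library(ggplot2)

# The subset of data posted in the question
df <- data.frame(
  Verb_Type = rep(c("mental", "mental", "perception", "perception"), 2),
  Frame     = rep(c("V CP", "V NP"), 4),
  proportion_type = c(0.209513024, 0.138731597, 0.017167382, 0.387528402,
                      0.437998087, 0.144086707, 0.042695836, 0.398376853),
  speaker   = rep(c("Child", "Parent"), each = 4)
)

# One level per Verb_Type/speaker pairing: these define the four bars per Frame
print(levels(interaction(df$Verb_Type, df$speaker)))

p <- ggplot(df, aes(x = Frame, y = proportion_type,
                    group = interaction(Verb_Type, speaker),
                    fill = Verb_Type, alpha = speaker)) +
  geom_col(position = position_dodge(width = 0.9)) +
  scale_alpha_manual(values = c(.5, 1))

print(p)
```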

In ggplot2, can borders of bars be changed on only one side? (color, thickness)

I know, 3D bar charts are a sin. But I'm asked to make them, and as a trade-off I suggested only adding a border, in a slightly darker color than the bar, on the top and right side of each bar. That way the bars would have a kind of "shadow" (urgh), but at least you would still be able to compare them.
Is there any way to do this?
ggplot(diamonds, aes(clarity)) + geom_bar()
Another possibility is to use two sets of geom_bar(). The first set (the green bars) is made slightly taller and offset to the right. I borrow the data (df2) from @Didzis Elferts's answer.
ggplot(data = df2) +
geom_bar(aes(x = as.numeric(clarity) + 0.1, y = V1 + 100),
width = 0.8, fill = "green", stat = "identity") +
geom_bar(aes(x = as.numeric(clarity), y = V1),
width = 0.8, stat = "identity") +
scale_x_continuous(name = "clarity",
breaks = as.numeric(df2$clarity),
labels = levels(df2$clarity))+
ylab("count")
As you already said, 3D bar charts are "bad". You can't do this directly in ggplot2, but here is a possible workaround.
First, make a new data frame that contains the levels of clarity and the corresponding count for each level.
library(ggplot2)
library(plyr)
df2<-ddply(diamonds,.(clarity),nrow)
Then, in the ggplot() call, use the new data frame with clarity as the x values and V1 (the counts) as the y values, and add geom_blank(): this sets up the x axis with the levels we need. Next, add geom_rect() to produce the shading for the bars. Here xmin and xmax are as.numeric(clarity) plus a constant: the xmin offset should be smaller than half the bar width (so the left edge of the shadow stays hidden behind the bar), and the xmax offset larger than half the bar width (so the shadow sticks out on the right). ymin is 0 and ymax is V1 (the counts) plus some constant. Finally, add geom_bar(stat="identity") on top of this shadow to draw the actual bar plot.
ggplot(df2,aes(clarity,V1)) + geom_blank()+
geom_rect(aes(xmin=as.numeric(clarity)-0.38,
xmax=as.numeric(clarity)+.5,
ymin=0,
ymax=V1+250),fill="green")+
geom_bar(width=0.8,stat="identity")
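plyr is retired, so the df2 built with ddply() above can also be made in base R. A sketch that produces the same two column names, clarity and V1, so the plotting code can be reused unchanged; the level order follows levels(diamonds$clarity).

```r
library(ggplot2)  # provides the diamonds data set

# Count rows per clarity level without plyr
df2 <- as.data.frame(table(clarity = diamonds$clarity))
names(df2)[names(df2) == "Freq"] <- "V1"  # match the column name ddply() produced

head(df2)
```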
