Position dodge with error bars in ggplot2 - r

I am trying to use ggplot to show a series of confidence intervals across different time points. I have two sets of confidence intervals, one parametric and one bootstrap, and I would like to display them using geom_errorbar(). I tried using position_dodge() so that the two CIs don't directly overlay one another, but it is not working. How do I jitter the two CIs at the same time point?
pd <- position_dodge(.6)
ggplot(results, aes(x = intervals, y = change)) +
  geom_errorbar(aes(ymin = ci.par.low, ymax = ci.par.hi), position = pd, width = .1, colour = "green") +
  geom_errorbar(aes(ymin = ci.boot.low, ymax = ci.boot.hi), width = .1, colour = "blue") +
  geom_abline(intercept = slope.est, slope = 0, colour = "red") +
  labs(title = paste("Protein ID:", prot.name))

I accomplished my goal with position_jitter(), though it's clunky.
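
For what it's worth, a common way to get position_dodge() to separate the two intervals is to reshape the data so both sets of limits live in the same columns, with a column naming the CI type mapped to colour. Below is a minimal sketch, assuming made-up values in a data frame shaped like results. The long data frame and the method, low and hi columns are names invented for this example, not anything from the original post.

```r
library(ggplot2)

# Hypothetical data in the same wide layout as the question
results <- data.frame(
  intervals   = factor(1:3),
  change      = c(0.20, 0.50, 0.40),
  ci.par.low  = c(0.00, 0.30, 0.10),
  ci.par.hi   = c(0.40, 0.70, 0.70),
  ci.boot.low = c(0.05, 0.25, 0.20),
  ci.boot.hi  = c(0.45, 0.80, 0.60)
)

# Stack the two CI types so each row holds one interval
long <- rbind(
  data.frame(intervals = results$intervals, change = results$change,
             method = "parametric",
             low = results$ci.par.low, hi = results$ci.par.hi),
  data.frame(intervals = results$intervals, change = results$change,
             method = "bootstrap",
             low = results$ci.boot.low, hi = results$ci.boot.hi)
)

# Mapping colour to method gives position_dodge() a grouping to dodge by
pd <- position_dodge(width = 0.6)
p <- ggplot(long, aes(x = intervals, y = change, colour = method)) +
  geom_errorbar(aes(ymin = low, ymax = hi), position = pd, width = 0.1) +
  geom_point(position = pd) +
  scale_colour_manual(values = c(parametric = "green", bootstrap = "blue"))
p
```

The dodge works here because both intervals at a time point belong to a single layer with two groups. In the original code each geom_errorbar() is its own layer with a single group, so there is nothing for position_dodge() to separate.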

Related

Grouping 2 categorical variables with geom_boxplot

I have tried some examples I found here but I always get an error or a different graph from what I need (e.g. lines instead of the boxplot, or only 2 boxes instead of 4).
I want to plot the following
Condition  Time  mean       sem
A          I     0.5578552  0.05294356
A          II    0.6957565  0.09149457
P          I     0.7078374  0.08142464
P          II    0.7762761  0.10945771
I need "Condition" in the x axis and I need to group "Time".
The idea is to get a visual representation similar to this (image not included).
My attempt was:
ggplot(data = means.sem, aes(x = Condition, y = mean, fill = Time, ymin = mean - sem, ymax = mean + sem)) +
  geom_boxplot() +
  stat_boxplot(geom = 'errorbar', width = 0.5) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 0.85)) +
  scale_fill_manual(values = c("black", "grey")) +
  labs(y = "Mean", x = "") +
  theme_classic()
Thank you!
What do you want your y-axis to be? On the assumption it is, for example, the sem variable, I use the following code:
boxplot <- ggplot(data = dataset, aes(x = Condition, y = sem, fill = Time)) + geom_boxplot(position = "dodge2")
Obviously you can alter the colours, etc as you need to.
EDIT: changed the position to dodge2 as this creates a pleasing small gap between each boxplot within a group.
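
Since the table in the question already holds one mean and one SEM per Condition/Time pair, there is no distribution left for geom_boxplot() to summarise. One alternative, which is not what the answer above does, is to draw the means as dodged bars with error bars of mean ± sem. Here is a rough sketch, assuming the data frame is called means.sem as in the question:

```r
library(ggplot2)

# The summary table from the question
means.sem <- data.frame(
  Condition = c("A", "A", "P", "P"),
  Time      = c("I", "II", "I", "II"),
  mean      = c(0.5578552, 0.6957565, 0.7078374, 0.7762761),
  sem       = c(0.05294356, 0.09149457, 0.08142464, 0.10945771)
)

# Use the same dodge width for bars and error bars so they line up
pd <- position_dodge(width = 0.9)

p <- ggplot(means.sem, aes(x = Condition, y = mean, fill = Time)) +
  geom_col(position = pd) +
  geom_errorbar(aes(ymin = mean - sem, ymax = mean + sem),
                position = pd, width = 0.2) +
  scale_fill_manual(values = c("black", "grey")) +
  labs(y = "Mean", x = "") +
  theme_classic()
p
```

With a black fill, the lower half of each error bar is hidden by the bar itself, so a lighter fill may be preferable.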

Is there a way to overlay a line plot (with a finer resolution) onto a bar plot (with a lower resolution)?

So I have a dataset of performance scores with an associated difficulty value, and I want to display the average performance score per difficulty value. The difficulty values range from 0 to 10, but have up to 10 decimal points and as a result are hyper specific. To make this more legible, I've been grouping the difficulty scores into bins. I've done this at two different resolutions, bins of width 0.1, and bins of width 1.
What I would like to do, is display a line plot (using the finer data points), on top of a bar plot (using the wider resolution), but I want the bar plot to maintain its structure. Right now, when I try to overlay the line plot, the x-axis seems to scale to the line plot, and the bars end up extremely narrow.
Here's the bar plot code:
g1.4 = ggplot() +
  geom_bar(data = grouped_diff_wide, aes(y = mean_diff_perf, x = gr, fill = subject), stat = "identity") +
  facet_wrap(~subject) +
  ggtitle("Average Performance By Difficulty") +
  labs(fill = "Subject") +
  ylab("Performance") +
  xlab("Difficulty") +
  scale_x_discrete(breaks = diff_breaks_wide, labels = seq(0, 9, 1))
g1.4
And the resulting graph (image not included): just the bar plot
Here's the line plot code:
g1.5 = ggplot() +
  geom_line(data = grouped_diff_fine, aes(y = mean_diff_perf, x = gr, group = 1)) +
  facet_wrap(~subject) +
  ggtitle("Average Performance By Difficulty") +
  labs(fill = "Subject") +
  ylab("Performance") +
  xlab("Difficulty") +
  scale_x_discrete(breaks = diff_breaks_fine, labels = seq(0, 9, 1))
g1.5
And the resulting graph (image not included): just the line plot
And here's my attempt to combine them:
g1.6 = ggplot() +
  geom_bar(data = grouped_diff_wide, aes(y = mean_diff_perf, x = gr, fill = subject), stat = "identity") +
  geom_line(data = grouped_diff_fine, aes(y = mean_diff_perf, x = gr, group = 1)) +
  facet_wrap(~subject) +
  ggtitle("Average Performance By Difficulty") +
  labs(fill = "Subject") +
  ylab("Performance") +
  xlab("Difficulty") +
  scale_x_discrete(breaks = diff_breaks_fine, labels = seq(0, 9, 1))
g1.6
And how it turns out (image not included): combined plot with skinny bars
Is there a way to maintain the proportions of the standalone bar plot but with the line plot overlaid?
You can use the width parameter of geom_bar. As a very simple example using the built-in mtcars data:
ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_bar(stat = "identity", width = 1.1) +
  # in ggplot2 3.4.0 and later, linewidth = 2 is preferred over size = 2 for lines
  geom_line(colour = "blue", size = 2)
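
Another option, different from the width approach above, is to put both layers on a continuous x axis, placing each bar and each line point at the numeric centre of its bin. The sketch below assumes simulated data. The raw, wide and fine data frames and their columns are invented for illustration and only stand in for grouped_diff_wide and grouped_diff_fine.

```r
library(ggplot2)

set.seed(1)
# Hypothetical raw data: difficulty in (0, 10) with a noisy performance score
raw <- data.frame(difficulty = runif(2000, 0, 10))
raw$perf <- 1 - raw$difficulty / 12 + rnorm(2000, sd = 0.1)

# Numeric bin centres at the two resolutions (width 1 and width 0.1)
raw$wide_mid <- floor(raw$difficulty) + 0.5
raw$fine_mid <- floor(raw$difficulty * 10) / 10 + 0.05

# Mean performance per bin
wide <- aggregate(perf ~ wide_mid, data = raw, FUN = mean)
fine <- aggregate(perf ~ fine_mid, data = raw, FUN = mean)

# Bars of width 1 sit on the same continuous axis as the finer line
p <- ggplot() +
  geom_col(data = wide, aes(x = wide_mid, y = perf), width = 1, fill = "grey70") +
  geom_line(data = fine, aes(x = fine_mid, y = perf)) +
  scale_x_continuous(breaks = 0:10) +
  labs(x = "Difficulty", y = "Performance")
p
```

Because x is numeric, a bar of width 1 always spans exactly one coarse bin, no matter how many fine points the line has.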

How to make consistent color scales between two plots with geom_hex

I'm trying to make a before/after comparison between two plots, so I need them both to have the same color scale for a true comparison. I have been trying for a while to change the color scale of geom_hex, but I have only found ways of providing a min/max cutoff. Is there any way of manually setting the scale to a defined range, e.g. 1-100? My plot code and examples are below.
ggplot() +
  geom_hex(aes(x = VolumeBefore$Flow, y = SpeedBefore$Speed)) +
  xlab("Flow") + ylab("Speed (MPH)") +
  theme(legend.justification = c(1, 0), legend.position = c(1, 0), text = element_text(size = 20)) +
  ggtitle('Speed-Flow Before Density Plot')
ggplot() +
  geom_hex(aes(x = VolumeAfter$Flow, y = SpeedAfter$Speed)) +
  xlab("Flow") + ylab("Speed (MPH)") +
  theme(legend.justification = c(1, 0), legend.position = c(1, 0), text = element_text(size = 20)) +
  ggtitle('Speed-Flow After Density Plot')
(Images not included: Before Plot, After Plot.)
In these two images you can see the scales are different; I just want to make them the same :)
Thanks!
Here's a way using scale_fill_gradient2 and oob = scales::squish, which lets you specify the lower and upper range for the fill, and constrains any values beyond that range.
Note that without specifications, the fill will use the full range of densities in the data:
ggplot(diamonds) +
  geom_hex(aes(x = carat, y = price)) +
  scale_fill_gradient2()
Alternatively, we could specify the range directly, clamping anything beyond it into that range. That will let you match multiple plots' legend ranges:
ggplot(diamonds) +
  geom_hex(aes(x = carat, y = price)) +
  scale_fill_gradient2(limits = c(0, 3000), oob = scales::squish)
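
To apply this to a before/after pair, one approach is to define the limits once and add the same scale to both plots. The sketch below uses two subsets of diamonds as stand-ins for the before and after data. The 1-100 range comes from the question, while fill_limits and shared_fill() are names made up for this example. It uses scale_fill_gradient() rather than scale_fill_gradient2(), since counts have no natural midpoint, and geom_hex() needs the hexbin package installed.

```r
library(ggplot2)  # geom_hex() also requires the hexbin package

# Stand-ins for the "before" and "after" data sets
before <- subset(diamonds, cut == "Fair")
after  <- subset(diamonds, cut == "Ideal")

# One place to define the common fill range (1-100, as asked in the question)
fill_limits <- c(1, 100)

# Hypothetical helper: returns a fresh fill scale with the shared limits;
# counts outside the limits are squished to the nearest limit
shared_fill <- function() {
  scale_fill_gradient(limits = fill_limits, oob = scales::squish)
}

p_before <- ggplot(before, aes(x = carat, y = price)) +
  geom_hex() +
  shared_fill() +
  ggtitle("Before")

p_after <- ggplot(after, aes(x = carat, y = price)) +
  geom_hex() +
  shared_fill() +
  ggtitle("After")
```

Both legends then run from 1 to 100, so the same colour means the same count in each plot.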

geom_boxplot behaving oddly?

I'm currently plotting some data (response times in ms) in geom_boxplot.
I have a question:
When you adjust the limits on the y-axis does it disregard any values above that in the plotting & error bar calculations?
The data itself comprises over 20k entries, and I'm not sure providing a sample will be of much use as this is more of a functionality question.
Here is the code I use:
f <- function(x) {
  ans <- boxplot.stats(x)
  data.frame(ymin = ans$conf[1], ymax = ans$conf[2], y = ans$stats[3])
}
RTs.box = ggplot(mean.vis.aud.long, aes(x = Report, y = RTs, fill = Report)) +
  theme_bw() +
  facet_grid(Audio ~ Visual)
RTs.box +
  geom_boxplot(alpha = .8) +
  geom_hline(yintercept = .333, linetype = 3, alpha = .8) +
  theme(legend.position = "none") +
  ylab("Response Times ms") +
  scale_fill_grey(start = .4) +
  labs(title = expression("Visual Condition")) +
  theme(plot.title = element_text(size = rel(1))) +
  theme(panel.background = element_rect()) +
  # line below for shaded confidence intervals
  stat_summary(fun.data = f, geom = "crossbar",
               colour = NA, fill = "skyblue", width = 0.75, alpha = .9) +
  ylim(0, 1000)  # this is the value that I change that results in different plots and shaded confidence intervals
Here is the plot with ylim(0,1000) (image not included).
Using the same data but changing the limit to ylim(0,3000) results in a different plot (image not included).
As you can see the values in the boxplots appear to be adjusted according to the limit used. Instead of plotting to the edge of the limit the percentiles are reduced. This is apparent when you compare the middle boxplot in the top-left panel of both grids.
There are differences in the confidence intervals also as can be seen.
Does this mean geom_boxplot is discarding the data above the limit or is there something I'm missing?
I want to include all the data when plotting the boxplot & confidence intervals but limit the scale so it can be seen clearly. It means not seeing some major outliers in the data but for my purposes that is fine.
Has anyone got any suggestions as to what is going on here & how to get around it without potentially dropping the values from the data outside the visual range chosen for my calculation?
Thanks as always.
From ?ylim "Observations not in this range will be dropped completely and not passed to any other layers. If a NA value is substituted for one of the limits that limit is automatically calculated."
If you want to adjust the limits without affecting the data, use coord_cartesian instead.
The function ylim clearly influences which data points are used for plotting.
To avoid this, you may want to use coord_cartesian, which will not change the underlying data.
Try to replace ylim(0,1000) with:
coord_cartesian(ylim = c(0,1000))
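
To see why the boxes shift, compare the statistics computed from all the data with those computed from only the values that survive the limits. This is a small sketch on simulated response times. The rt data frame and its columns are made up for this example.

```r
library(ggplot2)

set.seed(42)
# Hypothetical right-skewed response times (ms)
rt <- data.frame(report = "a", RTs = rexp(500, rate = 1 / 400))

# ylim(0, 1000) drops values outside the limits before the stats run,
# so the boxplot is effectively computed from this subset
kept <- rt$RTs[rt$RTs >= 0 & rt$RTs <= 1000]

median(rt$RTs)   # median of all observations
median(kept)     # median after dropping values above 1000

# Drops out-of-range data, then computes the boxplot
p_drop <- ggplot(rt, aes(x = report, y = RTs)) +
  geom_boxplot() +
  ylim(0, 1000)

# Computes the boxplot from all data, then zooms the view
p_zoom <- ggplot(rt, aes(x = report, y = RTs)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 1000))
```

p_drop should also emit a warning about removed rows when drawn, which is a useful hint that data are being discarded.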

Modify axes in ggplot

I used the following code based on a previous post How to create odds ratio and 95 % CI plot in R to produce the figure posted below. I would like to:
1) Make x and y axes as well as the legends bold
2) Increase the thickness of the lines
How can I do that in ggplot?
ggplot(alln, aes(x = apoll2, y = increase, ymin = l95, ymax = u95)) +
  geom_pointrange(aes(col = factor(marker)), position = position_dodge(width = 0.50)) +
  ylab("Percent increase & 95% CI") +
  geom_hline(aes(yintercept = 0)) +
  scale_color_discrete(name = "Marker") +
  xlab("")
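To change the axis and legend appearance, add theme() to your plot. Note that axis.text and legend.text style the tick labels and legend entries; the axis and legend titles are controlled by axis.title and legend.title.
+ theme(axis.text = element_text(face = "bold"),
        legend.text = element_text(face = "bold"))
To make the lines thicker, add size = 1.5 inside the geom_pointrange() call.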
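
Putting the pieces together, a complete version might look like the sketch below. The values in alln are invented for illustration, and the axis.title and legend.title lines are an addition to the answer for making the titles bold as well.

```r
library(ggplot2)

# Hypothetical data with the same column names as the question
alln <- data.frame(
  apoll2   = rep(c("NO2", "PM2.5"), each = 2),
  marker   = rep(c("CRP", "IL-6"), times = 2),
  increase = c(2.1, -0.5, 3.4, 1.2),
  l95      = c(0.3, -2.0, 1.0, -0.4),
  u95      = c(3.9, 1.0, 5.8, 2.8)
)

p <- ggplot(alln, aes(x = apoll2, y = increase, ymin = l95, ymax = u95)) +
  # size = 1.5 as suggested above; in ggplot2 3.4.0 and later the line
  # thickness of a pointrange is controlled by linewidth instead
  geom_pointrange(aes(colour = factor(marker)),
                  position = position_dodge(width = 0.5), size = 1.5) +
  geom_hline(yintercept = 0) +
  scale_colour_discrete(name = "Marker") +
  labs(x = "", y = "Percent increase & 95% CI") +
  theme(axis.text    = element_text(face = "bold"),
        axis.title   = element_text(face = "bold"),
        legend.text  = element_text(face = "bold"),
        legend.title = element_text(face = "bold"))
p
```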
