ggplot error bars positioning - R

I am trying to place an error bar above each bar in a bar plot. I have six bars in three groups, and ggplot is positioning the error bars relative to each group, but I want each error bar positioned above its own bar. Here's what my data looks like:
NewData
Group Session HeartRate StdError n sd
1 Novices one 71.89276 1.821146 29 9.807170
2 Experts one 66.40705 1.923901 26 9.810008
3 Novices two 71.38609 1.571261 29 8.011889
4 Experts two 67.79910 1.788151 26 9.117818
5 Novices three 71.79759 1.941730 29 10.456534
6 Experts three 67.04564 1.938620 26 9.885061
And here is my code:
plot_2 = ggplot(NewData, aes(x=Session, y=HeartRate, fill=Group)) +
theme_bw() +
geom_bar(position="dodge",stat="identity")+
scale_x_discrete(limits=c("one","two","three")) +
coord_cartesian(ylim = c(50, 80)) +
geom_errorbar(aes(ymin=HeartRate-StdError,ymax=HeartRate+StdError),position="dodge",width=.25)
Here's the output: http://i.imgur.com/BrLB6Px.png
Any help would be appreciated. Thanks!
OK-- I found a solution, not really sure how or why it works, but here's my new code:
dodge <- position_dodge(width=0.9)
plot_2 = ggplot(NewData, aes(x=Session, y=HeartRate, fill=Group)) +
geom_bar(position=dodge,stat="identity")+
scale_x_discrete(limits=c("one","two","three")) +
coord_cartesian(ylim = c(50, 80)) +
geom_errorbar(aes(ymin=HeartRate-StdError,ymax=HeartRate+StdError),position=dodge,width=.25)
And here's the desired result: http://i.imgur.com/PodCeh5.png
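As far as I understand it, the reason this works is that position="dodge" falls back on each layer's own width when deciding how far to shift things. The bars are 0.9 wide by default and the error bars were set to width=.25, so the two layers were dodged by different amounts. Passing the same position_dodge(width=0.9) object to both layers makes them share one dodge width. Here is a minimal sketch of that, using the data from the question (n and sd left out) and geom_col() as a shorthand for geom_bar(stat="identity"); layer_data() is only used to compare the computed x positions:

```r
library(ggplot2)

# Data copied from the question (n and sd columns omitted)
NewData <- data.frame(
  Group     = rep(c("Novices", "Experts"), times = 3),
  Session   = rep(c("one", "two", "three"), each = 2),
  HeartRate = c(71.89276, 66.40705, 71.38609, 67.79910, 71.79759, 67.04564),
  StdError  = c(1.821146, 1.923901, 1.571261, 1.788151, 1.941730, 1.938620)
)

# One shared dodge object so both layers are shifted by the same amount
dodge <- position_dodge(width = 0.9)

plot_2 <- ggplot(NewData, aes(x = Session, y = HeartRate, fill = Group)) +
  theme_bw() +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = HeartRate - StdError, ymax = HeartRate + StdError),
                position = dodge, width = 0.25) +
  scale_x_discrete(limits = c("one", "two", "three")) +
  coord_cartesian(ylim = c(50, 80))

# Horizontal centres of the bars (layer 1) and the error bars (layer 2)
bar_x <- sort(as.numeric(layer_data(plot_2, 1)$x))
err_x <- sort(as.numeric(layer_data(plot_2, 2)$x))
```

If the two vectors agree, every error bar sits over the centre of its own bar.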

It's kind of hard to tell what it is you want to see as output, but from what I gather, perhaps this will do what you want?
geom_errorbar(aes(ymin=HeartRate,ymax=HeartRate+StdError*2),position="dodge",width=.25)
If you want to move the error bars along the x-axis, you need to change the position argument. It expects a position adjustment such as position_dodge(), not an arithmetic expression, so you might try something like...
geom_errorbar(aes(ymin=HeartRate-StdError,ymax=HeartRate+StdError),position=position_dodge(width=0.9),width=.25)
and pass the same position_dodge(width=0.9) to geom_bar, so that both layers are shifted by the same amount.

Error: Don't know how to add e2 to a plot

Hello I am working on a data set which looks like as below
raw_data =
week v1 v3 v4 v5 v6
1 17 20.983819 7.799831 16.0600278 113.018687
2 34 22.651678 8.090671 16.4898951 120.824817
3 15 24.197048 6.892516 16.9805836 128.105372
4 14 26.016688 5.272781 17.471264 140.15794
5 26 27.572317 10.767018 17.8686156 154.886518
6 37 29.018684 21.280104 19.8096452 165.244061
7 27 30.395094 32.140543 22.937902 176.453934
8 24 31.832068 44.008145 28.714597 184.7598
9 16 33.383742 45.704626 39.2958153 193.461108
10 28 34.877819 39.355206 45.9069661 201.305558
What I am trying to achieve is to plot variables from v3 to v6 as a stacked area plot while variable v1 as a line plot in the same graph plot across the week.
I have tried the following code which does plot the stack area plot but not the line plot.
mdf <- melt(raw_data, id="Week") # convert to long format
p <- ggplot(mdf, aes(x=Week, y=value)) + geom_area(aes(fill= mdf$variable), position = 'stack') + theme_classic()
p + ggplot(raw_data, aes(x=Week, y=v1)) +geom_line()
and I get the following error
Error: Don't know how to add e2 to a plot
I tried the method suggested by this article How to overlay geom_bar and geom_line plots with different number of elements using ggplot2? and used the below code
mdf <- melt(raw_data, id="Week") # convert to long format
p <- ggplot(mdf, aes(x=Week, y=value)) + geom_area(aes(colour =
mdf$variable, fill= mdf$variable), position = 'stack') + theme_classic()
p + geom_line(aes(x=Week, y=mdf$variable=="v1"))
but then I got the below error
Error: Discrete value supplied to continuous scale
I tried to convert the v1 variable as per below code referencing the following article, however it did not help to resolve.
How do I get discrete factor levels to be treated as continuous?
raw_data$v1 <- as.numeric(as.character(raw_data$v1))
Please help how to resolve the issue. Also, how do I create a black border line for each graph in my stacked graph such that it is easy to differentiate among the graphs.
Thanks a lot for the help in advance!!
Using your melt command does not work for me, so I'm using gather instead.
All you need to do is add geom_line and specify the data and mapping:
mdf <- tidyr::gather(raw_data, variable, value, -week, -v1)
ggplot(mdf, aes(week, value)) +
geom_area(aes(fill = variable), position = 'stack', color = 'black') +
geom_line(aes(y = v1), raw_data, lty = 2)
Note: don't use $ inside aes, ever!
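For context, the "Don't know how to add e2 to a plot" error comes from putting a second ggplot() call on the right-hand side of +. Only layers, scales, themes and the like can be added to a plot, but each layer can carry its own data. Below is a self-contained sketch of the same idea as the answer above, with the data typed in from the question and tidyr::pivot_longer() used in place of gather() (my substitution, not what the answer used):

```r
library(ggplot2)
library(tidyr)

# Data copied from the question
raw_data <- data.frame(
  week = 1:10,
  v1 = c(17, 34, 15, 14, 26, 37, 27, 24, 16, 28),
  v3 = c(20.983819, 22.651678, 24.197048, 26.016688, 27.572317,
         29.018684, 30.395094, 31.832068, 33.383742, 34.877819),
  v4 = c(7.799831, 8.090671, 6.892516, 5.272781, 10.767018,
         21.280104, 32.140543, 44.008145, 45.704626, 39.355206),
  v5 = c(16.0600278, 16.4898951, 16.9805836, 17.471264, 17.8686156,
         19.8096452, 22.937902, 28.714597, 39.2958153, 45.9069661),
  v6 = c(113.018687, 120.824817, 128.105372, 140.15794, 154.886518,
         165.244061, 176.453934, 184.7598, 193.461108, 201.305558)
)

# Long format for the stacked areas only; v1 stays in the wide data
mdf <- pivot_longer(raw_data, cols = v3:v6,
                    names_to = "variable", values_to = "value")

p <- ggplot(mdf, aes(x = week, y = value)) +
  # colour = "black" draws the border around each area
  geom_area(aes(fill = variable), position = "stack", colour = "black") +
  # this layer gets its own data frame and its own y mapping
  geom_line(data = raw_data, aes(y = v1), linetype = 2) +
  theme_classic()

# Data actually used by the line layer
line_layer <- layer_data(p, 2)
```

The line layer inherits x = week from the main ggplot() call and only overrides y, which is why no $ is needed anywhere.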

R: Visualization of grade point average as (sort of) violin plot

I would like to visualize a data frame much like the following in a plot:
grade number
A 2
B 6
C 1
D 0
E 1
The idea is to have the grades on the x-axis as categories and the number of pupils who received the respective grade on the y-axis.
My task is to display them not as points like in a line chart, but as thickness above the category like in a violin plot. This is really about the pure visuals of it.
I tried ggplot2's violin, but it always takes the values of the number column for the y-axis. The y-axis is supposed to have just one single dimension: the level around which the density plot is rotated.
I'd be very happy if someone had a hint on how I should restructure my data, or could tell me whether I am completely mistaken with my approach.
Ah, yes: on top I'd like to display the grade-point-average as a small bar.
Thank you very much in advance for taking your time. I'm sure the solution is very obvious, but I just don't see it.
As @Gregor mentioned, a smoothed density estimate (which is what a violin plot is) with just five ordinal values isn't really appropriate here. Even if you had plus/minus grades, you'd still probably be better off with bars or lines. See below for a few options:
library(ggplot2)
# Fake data
dat = data.frame(grades=LETTERS[c(1:4,6)],
count=c(5,12,11,5,3), stringsAsFactors=FALSE)
# Reusable plot elements
thm = list(theme_bw(),
scale_y_continuous(limits=c(0,max(dat$count)), breaks=seq(0,20,2)),
labs(x="Grade", y="Count"))
ggplot(dat, aes(grades, count)) +
geom_bar(stat="identity", fill=hcl(240,100,50)) +
geom_text(aes(y=0.5*count, label=paste0(count, " (", sprintf("%1.1f", count/sum(count)*100),"%)")),
colour="white", size=3) +
thm
ggplot(dat, aes(grades, count)) +
geom_line(aes(group=1),alpha=0.4) +
geom_point() +
thm
ggplot(dat, aes(x=as.numeric(factor(grades)))) +
geom_ribbon(aes(ymin=0, ymax=count), fill="grey80") +
geom_text(aes(y=count, label=paste0(sprintf("%1.1f", count/sum(count)*100),"%")), size=3) +
scale_x_continuous(breaks=1:5, labels=LETTERS[c(1:4,6)]) +
thm
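The question also asked for the grade point average to be marked on the plot, which the options above don't cover. Here is a rough sketch for that, using the data from the question. It assumes the grades map to points A = 1 through E = 5, which is my assumption and not something stated in the question, so adjust the mapping to your grading scheme:

```r
library(ggplot2)

# Data from the question
grades <- data.frame(grade  = c("A", "B", "C", "D", "E"),
                     number = c(2, 6, 1, 0, 1))

# Assumption: A = 1, B = 2, ..., E = 5 (not given in the question)
grades$points <- seq_along(grades$grade)

# Average grade, weighted by how many pupils received each grade
gpa <- weighted.mean(grades$points, grades$number)

p <- ggplot(grades, aes(x = grade, y = number)) +
  geom_col(fill = "grey70") +
  # on a discrete x-axis the categories sit at 1, 2, 3, ...,
  # so a numeric xintercept places the marker between them
  geom_vline(xintercept = gpa, colour = "red") +
  theme_bw()
```

With this mapping the marker lands between B and C, a little closer to B.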

ggplot plotting problems and error bars

So I have some data that I imported into R using read.csv.
d = read.csv("Flux_test_results_for_R.csv", header=TRUE)
rows_to_plot = c(1,2,3,4,5,6,13,14)
d[rows_to_plot,]
It looks like it worked fine:
> d[rows_to_plot,]
strain selective rate ci.low ci.high
1 4051 rif 1.97539e-09 6.93021e-10 5.63066e-09
2 4052 rif 2.33927e-09 9.92957e-10 5.51099e-09
3 4081 (mutS) rif 1.32915e-07 1.05363e-07 1.67671e-07
4 4086 (mutS) rif 1.80342e-07 1.49870e-07 2.17011e-07
5 4124 (mutL) rif 5.53369e-08 4.03940e-08 7.58077e-08
6 4125 (mutL) rif 1.42575e-07 1.14957e-07 1.76828e-07
13 4760-all rif 6.74928e-08 5.41247e-08 8.41627e-08
14 4761-all rif 2.49119e-08 1.91979e-08 3.23265e-08
So now I'm trying to plot the column "rate", with "strain" as labels, and ci.low and ci.high as boundaries for confidence intervals.
Using ggplot, I can't even get the plot to work. This gives a plot where all the dots are at 1 on the y-axis:
g <- ggplot(data=d[rows_to_plot,], aes(x=strain, y=rate))
g + geom_dotplot()
Attempt at error bars:
error_limits = aes(ymax = d$ci.high, ymin = d$ci.low)
g + geom_errorbar(error_limits)
As you can tell I'm a complete noob to plotting things in R, any help appreciated.
Answer update
There were two things going on. As per boshek's answer, which I selected, it seems that geom_point(), not geom_dotplot(), was the way to go.
The other issue was that originally, I filtered the data to only plot some rows, but I didn't also filter the error limits by row. So I switched to:
d2 = d[c(1,2,3,4,5,6,13,14),]
error_limits = aes(ymax = d2$ci.high, ymin = d2$ci.low)
g = ggplot(data=d2, ...etc...
A couple general comments. Get away from using attach. Though it has its uses, for beginners it can be very confusing. Get used to things like d$strain and d$selective. That said, once you call the dataframe with ggplot() you can refer to variables in that dataframe subsequently just by their names. Also you really need to ask questions with a reproducible example. This is a very important step in figuring out how to ask questions in R.
Now for the plot. I think this should work:
error_limits = aes(ymax = ci.high, ymin = ci.low)
ggplot(data=d[rows_to_plot,], aes(x=strain, y=rate)) +
geom_point() +
geom_errorbar(error_limits)
But of course this is untested because you haven't provided a reproducible example.
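For what it's worth, here is a reproducible sketch built from the rows printed in the question (the selective column is left out). Since ci.low and ci.high are already the interval boundaries, they are mapped directly to ymin and ymax by column name, with no $ and no separate filtering step. The width = 0.2 is just a cosmetic choice of mine:

```r
library(ggplot2)

# Rows 1-6, 13 and 14 as printed in the question
d2 <- data.frame(
  strain  = c("4051", "4052", "4081 (mutS)", "4086 (mutS)",
              "4124 (mutL)", "4125 (mutL)", "4760-all", "4761-all"),
  rate    = c(1.97539e-09, 2.33927e-09, 1.32915e-07, 1.80342e-07,
              5.53369e-08, 1.42575e-07, 6.74928e-08, 2.49119e-08),
  ci.low  = c(6.93021e-10, 9.92957e-10, 1.05363e-07, 1.49870e-07,
              4.03940e-08, 1.14957e-07, 5.41247e-08, 1.91979e-08),
  ci.high = c(5.63066e-09, 5.51099e-09, 1.67671e-07, 2.17011e-07,
              7.58077e-08, 1.76828e-07, 8.41627e-08, 3.23265e-08)
)

p <- ggplot(d2, aes(x = strain, y = rate)) +
  geom_point() +
  # columns are looked up in d2, so the limits always match the plotted rows
  geom_errorbar(aes(ymin = ci.low, ymax = ci.high), width = 0.2)

# Limits actually used by the error bar layer
bars <- layer_data(p, 2)
```

Because the aesthetics refer to columns of the data frame given to ggplot(), subsetting the data once is enough and the points and error bars cannot get out of sync.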

vertical line chart - change line plotting direction to top-down in R

I am looking for a way to connect data points from top to bottom in order to visualize a ranking. The y-axis represents the rank and the x-axis the attributes. With the normal settings the line connects the points from left to right, so the points end up connected in the wrong order.
With the data below the line should be connected from (6,1) to (4,2) and then (5,3) etc. Ideally the ranking scale should also be inverted so that rank one is at the top.
data <- read.table(header=TRUE, text='
attribute rank
1 6
2 5
3 4
4 2
5 3
6 1
7 7
8 11
9 10
10 8
11 9
')
plot(data$attribute,data$rank,type="l")
Is there a way to change the line drawing direction? My second idea would be to rotate the graph or maybe you have better ideas.
The graph I am trying to achieve is somewhat similar to this one:
example vertical line chart
You can do this with ggplot:
library(ggplot2)
ggplot(data, aes(y = attribute, x = rank)) +
geom_line() +
coord_flip() +
scale_x_reverse()
It solves the problem exactly the way you suggested. The first part of the command (ggplot(...) + geom_line()) creates an "ordinary" line plot. Note that I have already switched x- and y-coordinates. The next command (coord_flip()) flips x- and y-axis, and the last one (scale_x_reverse) changes the ordering of the x-axis (which is plotted as the y-axis) such that 1 is in the top left corner.
Just to show you that something like the example you linked in your question can be done with ggplot2, I add the following example:
library(tidyr)
data$attribute2 <- sample(data$attribute)
data$attribute3 <- sample(data$attribute)
plot_data <- pivot_longer(data, cols = -"rank")
ggplot(plot_data, aes(y = value, x = rank, colour = name)) +
geom_line() +
geom_point() +
coord_flip() +
scale_x_reverse()
If you intend to do your plots with R, learning ggplot2 is really worthwhile. You can find many examples on Cookbook for R.
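As an alternative to the coord_flip() approach above, geom_path() connects the points in the order in which they appear in the data, whereas geom_line() orders them along the x-axis. So another option is to sort the rows by rank first and reverse the y-axis. A small sketch with the data from the question (rank_data and ranked are names I made up for this example):

```r
library(ggplot2)

# Data from the question
rank_data <- data.frame(
  attribute = 1:11,
  rank      = c(6, 5, 4, 2, 3, 1, 7, 11, 10, 8, 9)
)

# Sort rows by rank so geom_path() walks from rank 1 down to rank 11
ranked <- rank_data[order(rank_data$rank), ]

p <- ggplot(ranked, aes(x = attribute, y = rank)) +
  geom_path() +                        # connects points in row order
  geom_point() +
  scale_x_continuous(breaks = 1:11) +
  scale_y_reverse(breaks = 1:11)       # rank 1 at the top

path_layer <- layer_data(p, 1)
```

Here the axes keep their natural meaning (attribute on x, rank on y), which may be easier to read than the flipped version when more layers are added.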

ggplot2 interaction plot error

I am trying to create an interaction plot, and R is throwing the error geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?, and I don't understand why. Below is my data frame:
topPagesCount DIRTY_INDUSTRY IND_DIRTY_HETEROGENEITY
1 10 1.4444444 1.1727001
2 831 1.4444444 1.1727001
3 1 0.8218391 0.4599108
4 0 0.8218391 0.4599108
5 0 0.8821549 0.4870270
6 30 0.8190476 0.6582197
7 26 0.8218391 0.4599108
8 0 1.4444444 1.1727001
9 7 0.8821549 0.4870270
10 398 0.8218391 0.4599108
and below is my code:
greatDF$DIRTY_INDUSTRY_fac <- as.factor (greatDF$DIRTY_INDUSTRY)
ggplot(data = greatDF, aes(x = IND_DIRTY_HETEROGENEITY, y=topPagesCount,
colour=DIRTY_INDUSTRY_fac, group=DIRTY_INDUSTRY_fac))+
stat_summary(fun=mean, geom="point")+
stat_summary(fun=mean, geom="line")
I don't see any reason for the error because clearly, there is more than one value of my response variable topPagesCount for the interaction term DIRTY_INDUSTRY:IND_DIRTY_HETEROGENEITY... am I right? Maybe I am misunderstanding something...
thanks,
The reason this happens, as @Troy points out, is that the grouping itself is meaningless for geom_line() or geom_path(): each group ends up with a single summarized point, so there are no points to be connected with lines at all!
That's why everything is OK when you remove the last line. Note that this "error" is not an actual error. The legend is drawn as intended, but there is not a single line to be plotted according to your aesthetics and stats.
How to fix this? Well, that depends on what you are trying to achieve, as usual. Note the difference between your code and mine:
ggplot(data = greatDF, aes(x = IND_DIRTY_HETEROGENEITY, y=topPagesCount,
colour=DIRTY_INDUSTRY_fac, group=DIRTY_INDUSTRY_fac)) +
geom_line(size=1.4) +
geom_point(size=5, shape=10) +
stat_summary(fun=mean, geom="point", size=5)
Is my guess correct? You may see this question for more insights on the topic.
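The reason given above can be checked directly on the sample data, without plotting anything: count how many distinct x values (IND_DIRTY_HETEROGENEITY) each group (DIRTY_INDUSTRY_fac) contains. A quick sketch, with the data typed in from the question and x_per_group as a name of my own:

```r
# Data copied from the question
greatDF <- data.frame(
  topPagesCount = c(10, 831, 1, 0, 0, 30, 26, 0, 7, 398),
  DIRTY_INDUSTRY = c(1.4444444, 1.4444444, 0.8218391, 0.8218391, 0.8821549,
                     0.8190476, 0.8218391, 1.4444444, 0.8821549, 0.8218391),
  IND_DIRTY_HETEROGENEITY = c(1.1727001, 1.1727001, 0.4599108, 0.4599108,
                              0.4870270, 0.6582197, 0.4599108, 1.1727001,
                              0.4870270, 0.4599108)
)
greatDF$DIRTY_INDUSTRY_fac <- as.factor(greatDF$DIRTY_INDUSTRY)

# Number of distinct x values within each group
x_per_group <- tapply(greatDF$IND_DIRTY_HETEROGENEITY,
                      greatDF$DIRTY_INDUSTRY_fac,
                      function(x) length(unique(x)))
print(x_per_group)
```

In this sample every group has exactly one distinct x value, so taking the mean per x leaves one point per group and stat_summary(geom="line") has nothing to connect. A line only appears once a group spans at least two different x values.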
