Merge two stacked bar graphs into one plot in R (ggplot2) [duplicate]

I'm having quite the time understanding geom_bar() and position="dodge". I was trying to make some bar graphs illustrating two groups. Originally the data was from two separate data frames. Per this question, I put my data in long format. My example:
test <- data.frame(names=rep(c("A","B","C"), 5), values=1:15)
test2 <- data.frame(names=c("A","B","C"), values=5:7)
df <- data.frame(names=c(paste(test$names), paste(test2$names)),
  num=c(rep(1, nrow(test)), rep(2, nrow(test2))),
  values=c(test$values, test2$values))
I use that example as it's similar to the spend vs. budget example. Spending has many rows per names factor level whereas the budget only has one (one budget amount per category).
For a stacked bar plot, this works great:
ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) +
geom_bar(stat="identity")
In particular, note the y value maxes. They are the sums of the data from test, with the values of test2 shown in blue on top.
Based on other questions I've read, I simply need to add position="dodge" to make it a side-by-side plot vs. a stacked one:
ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) +
geom_bar(stat="identity", position="dodge")
It looks great, but note the new max y values. It seems like it's just taking the max y value from each names factor level from test for the y value. It's no longer summing them.
Per some other questions (like this one and this one), I also tried adding the group= option without success (it produces the same dodged plot as above):
ggplot(df, aes(x=factor(names), y=values, fill=factor(num), group=factor(num))) +
geom_bar(stat="identity", position="dodge")
I don't understand why the stacked works great and the dodged doesn't just put them side by side instead of on top.
ETA: I found a recent question about this on the ggplot google group with the suggestion to add alpha=0.5 to see what's going on. It isn't that ggplot is taking the max value from each grouping; it's actually over-plotting bars on top of one another for each value.
It seems that when using position="dodge", ggplot expects only one y per x. I contacted Winston Chang, a ggplot developer, to confirm this and to ask whether it can be changed, as I don't see an advantage.
It seems that stat="identity" should tell ggplot to tally the y=val passed inside aes() instead of individual counts which happens without stat="identity" and when passing no y value.
For now, the workaround seems to be (for the original df above) to aggregate so there's only one y per x:
df2 <- aggregate(df$values, by=list(df$names, df$num), FUN=sum)
p <- ggplot(df2, aes(x=Group.1, y=x, fill=factor(Group.2)))
p <- p + geom_bar(stat="identity", position="dodge")
p
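To make the over-plotting concrete, here is a small base-R sketch (not part of the original post) that rebuilds the df from the question. The names stacked_height, visible_dodged and summed are made up for illustration. It compares the heights a stacked plot shows with what stays visible once position="dodge" draws every row of a names/num pair from zero in the same slot.

```r
# Rebuild the example data from the question
test  <- data.frame(names = rep(c("A", "B", "C"), 5), values = 1:15)
test2 <- data.frame(names = c("A", "B", "C"), values = 5:7)
df <- data.frame(
  names  = c(as.character(test$names), as.character(test2$names)),
  num    = c(rep(1, nrow(test)), rep(2, nrow(test2))),
  values = c(test$values, test2$values)
)

# Stacked: every row is piled on top of the previous one,
# so the bar for each name ends at the sum of all its rows
stacked_height <- tapply(df$values, df$names, sum)

# Dodged: rows sharing names and num all start at zero in the same slot,
# so only the tallest one remains visible
visible_dodged <- aggregate(values ~ names + num, data = df, FUN = max)

# What the question expected to see: one summed bar per names/num pair
summed <- aggregate(values ~ names + num, data = df, FUN = sum)

print(stacked_height)
print(visible_dodged)
print(summed)
```

The visible_dodged values match the "max" heights noticed in the dodged plot, while summed holds the heights that the aggregate() workaround feeds to ggplot.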

I think the problem is that you want to stack within values of the num group, and dodge between values of num.
It might help to look at what happens when you add an outline to the bars.
library(ggplot2)
set.seed(123)
df <- data.frame(
id = 1:18,
names = rep(LETTERS[1:3], 6),
num = c(rep(1, 15), rep(2, 3)),
values = sample(1:10, 18, replace=TRUE)
)
By default, there are a lot of bars stacked - you just don't see that they're separate unless you have an outline:
# Stacked bars
ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) +
geom_bar(stat="identity", colour="black")
If you dodge, you get bars that are dodged between values of num, but there may be multiple bars within each value of num:
# Dodged on 'num', but some overplotted bars
ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) +
geom_bar(stat="identity", colour="black", position="dodge", alpha=0.1)
If you also add id as a grouping var, it'll dodge all of them:
# Dodging with unique 'id' as the grouping var
ggplot(df, aes(x=factor(names), y=values, fill=factor(num), group=factor(id))) +
geom_bar(stat="identity", colour="black", position="dodge", alpha=0.1)
I think what you want is to both dodge and stack, but you can't do both.
So the best thing is to summarize the data yourself.
library(plyr)
df2 <- ddply(df, c("names", "num"), summarise, values = sum(values))
ggplot(df2, aes(x=factor(names), y=values, fill=factor(num))) +
geom_bar(stat="identity", colour="black", position="dodge")
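A possible alternative to summarising by hand, sketched here under the assumption that ggplot2 3.3.0 or later is installed (the version in which stat_summary() gained the fun argument): let the summary stat do the summing, so each dodged bar is drawn from one value. This is not what the answer above uses. The aggregate() call is a base-R stand-in for the ddply() step, and the plot is only built when ggplot2 is available.

```r
set.seed(123)
df <- data.frame(
  id     = 1:18,
  names  = rep(LETTERS[1:3], 6),
  num    = c(rep(1, 15), rep(2, 3)),
  values = sample(1:10, 18, replace = TRUE)
)

# Base-R equivalent of the ddply() call: one summed row per names/num pair
df2 <- aggregate(values ~ names + num, data = df, FUN = sum)
print(df2)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # stat = "summary" collapses the rows of each group at each x with fun,
  # so the dodged bars are drawn from the sums rather than from single rows
  p <- ggplot(df, aes(x = factor(names), y = values, fill = factor(num))) +
    geom_bar(stat = "summary", fun = sum, colour = "black", position = "dodge")
  # print(p) draws the plot
}
```

Either route ends with one y per bar, which is what position="dodge" needs.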

Related

Error when ordering grouped bars in ggplot2

My data is in the long format (as required to do the grouped barplot), so that the values for different categories are in one single column. The data is here.
Now, a standard barplot with ggplot2 orders the bars alphabetically (in my case, by country name, from Argentina to Uganda). I want to keep the order of countries as it is in the data frame. Using the suggestion here (i.e. using the limits= option inside the scale_x_discrete function), I get the following graph:
My code is this:
mydata <- read_excel("WDR2016Fig215.xls", col_names = TRUE)
y <- mydata$value
x <- mydata$country
z <- mydata$Skill
ggplot(data=mydata, aes(x=x, y=y, fill=z)) +
geom_bar(stat="identity", position=position_dodge(), colour="black") +
scale_x_discrete(limits=x)
The graph is nicely sorted as I want but the x axis is for some reason expanded. Any idea what is the problem?
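Is this what you want? It sets the factor levels to the order in which the countries first appear in the data frame: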
mydata$country <- factor(mydata$country, levels=unique(mydata$country)[1:30])
ggplot(data=mydata, aes(x=country, y=value, fill=Skill)) +
geom_bar(stat="identity", position=position_dodge(), colour="black")
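A minimal base-R sketch of why the factor approach keeps the data-frame order. The data here is hypothetical (three made-up rows per the long format described in the question), not the WDR spreadsheet.

```r
# Hypothetical long-format data: each country appears once per skill level
mydata <- data.frame(
  country = rep(c("Uganda", "Bolivia", "Argentina"), each = 2),
  Skill   = rep(c("High", "Low"), times = 3),
  value   = c(10, 20, 30, 40, 50, 60)
)

# Default: factor() sorts the levels alphabetically
alphabetical <- levels(factor(mydata$country))
print(alphabetical)

# unique() keeps first-appearance order and drops the repeats
# that a long-format column necessarily contains
mydata$country <- factor(mydata$country, levels = unique(mydata$country))
print(levels(mydata$country))
```

ggplot2 places discrete values along the axis in level order, so the second set of levels is the order the bars would follow.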

Displaying separate means within fill groups in ggplot boxplot

I have a grouped boxplot using data with 3 categories. One category is set as the x-axis of the boxplots, the other is set as the fill, and the last one, as a faceting category. I want to display the means for each fill group, but using stat_summary only gives me the mean for the x-axis category, without separating the means for the fill:
Here is the current code:
demoplot<-ggplot(demo,aes(x=variable,y=value))
demoplot+geom_boxplot(aes(fill=category2),position=position_dodge(.9))+
stat_summary(fun.y=mean, colour="black", geom="point", shape=18, size=4) +
facet_wrap(~category1)
Is there any way to display the mean for each category2 without having to manually compute and plot the points? Adjusting the position dodge doesn't really help, as it's just one computed mean. Would creating conditions within the mean() function be advisable?
For anyone interested, here's the data:
Thanks in advance for any enlightenment on this.
ggplot needs explicit information on grouping here. You can provide it either by using aes(group=...) in the desired layer, or by moving fill=... to the main call to ggplot. Without explicit grouping for a layer, ggplot will group by the factor on the x-axis. Here's some sample code with fake data:
library(ggplot2)
set.seed(123)
nobs <- 1000
dat <- data.frame(var1=sample(LETTERS[1:3],nobs, T),
var2=sample(LETTERS[1:2],nobs,T),
var3=sample(LETTERS[1:3],nobs,T),
y=rnorm(nobs))
p1 <- ggplot(dat, aes(x=var1, y=y)) +
geom_boxplot(aes(fill=var2), position=position_dodge(.9)) +
facet_wrap(~var3) +
stat_summary(fun.y=mean, geom="point", aes(group=var2), position=position_dodge(.9),
color="black", size=4)
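The other option mentioned above, moving fill= into the main ggplot() call, could look like the sketch below. It assumes ggplot2 3.3.0 or later, where the summary function is passed as fun (the fun.y spelling used above is the older name for the same argument). The names cell_means and p2 are made up here. The aggregate() call is only a cross-check of where the points should land, and the plot is only built when ggplot2 is available.

```r
set.seed(123)
nobs <- 1000
dat <- data.frame(
  var1 = sample(LETTERS[1:3], nobs, TRUE),
  var2 = sample(LETTERS[1:2], nobs, TRUE),
  var3 = sample(LETTERS[1:3], nobs, TRUE),
  y    = rnorm(nobs)
)

# Cross-check: one mean per var1/var2/var3 cell
cell_means <- aggregate(y ~ var1 + var2 + var3, data = dat, FUN = mean)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # fill is mapped in ggplot(), so every layer inherits it and is grouped by var2
  p2 <- ggplot(dat, aes(x = var1, y = y, fill = var2)) +
    geom_boxplot(position = position_dodge(0.9)) +
    stat_summary(fun = mean, geom = "point", position = position_dodge(0.9),
                 colour = "black", size = 4, show.legend = FALSE) +
    facet_wrap(~var3)
  # print(p2) draws the plot
}
```

With the mapping inherited, stat_summary computes one mean per box instead of one per x position, and the matching dodge width puts each point over its own box.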

R - aggregating data into a dataframe

I was recently working with some output and I can't seem to plot it informatively. The output looks like the following:
180,A,71
180,C,61
180,G,68
180,U,78
182,A,70
182,C,34
182,G,123
182,U,51
I would like to plot this data so that the first column is on the x axis and the y axis shows bars filled according to the four types (column 2) and their frequencies (column 3). Each bar would show the total frequency of all types for one value of the first column, divided into segments sized by type.
I hope the question was clear and thanks for any help.
How's this?
df <- data.frame(X=rep(c(180,182), each=4), Group=rep(c("A","C","G","U"),2),
Y=c(71,61,68,78,70,34,123,51))
# Calculating percentages (just using base)
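# Sum Y (not X) within each group, so each label is that row's share of its group's total
groupSum <- tapply(df$Y, df$Group, sum)
df$Label <- paste0(round(100 * df$Y / groupSum[as.character(df$Group)], 1), "%")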
# Go for the plot
library(ggplot2)
ggplot(data=df, aes(x=X, y=Y,fill=Group)) +
geom_bar(position="dodge", stat="identity") +
scale_x_continuous(breaks=unique(df$X))
The last part only labels the x values actually used.
And this is what @Haroka's plot would look like (with percentages now added as per request -- also see here):
ggplot(data=df, aes(x=X, y=Y,fill=Group)) +
geom_bar(position="stack", stat="identity") +
scale_x_continuous(breaks=unique(df$X)) +
geom_text(aes(label = Label), size=12, hjust=0.5, vjust=3, position="stack")
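If the labels are meant to show each segment's share of its own stacked bar (an assumption about the intent, since the thread does not spell it out), the percentages can be computed per X value with ave(). The column name Share and the variable bar_total are made up for this sketch.

```r
df <- data.frame(
  X     = rep(c(180, 182), each = 4),
  Group = rep(c("A", "C", "G", "U"), 2),
  Y     = c(71, 61, 68, 78, 70, 34, 123, 51)
)

# ave() returns, for every row, the total of Y over the rows sharing its X
bar_total <- ave(df$Y, df$X, FUN = sum)

# Share of each segment within its own bar, as a percentage
df$Share <- 100 * df$Y / bar_total
df$Label <- paste0(round(df$Share, 1), "%")

print(df)
```

Because ave() returns a vector aligned with the rows of df, no lookup by name is needed, and the shares within each bar add up to 100.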

How to make dodge in geom_bar agree with dodge in geom_errorbar, geom_point

I have a dataset where measurements are made for different groups at different days.
I want side-by-side bars representing the measurements on the different days for the different groups, with the groups of bars spaced according to the day of measurement and error bars overlaid on them.
I'm having trouble with making the dodging in geom_bar agree with the dodge on geom_errorbar.
Here is a simple piece of code:
days = data.frame(day=c(0,1,8,15));
groups = data.frame(group=c("A","B","C","D", "E"), means=seq(0,1,length=5));
my_data = merge(days, groups);
my_data$mid = exp(my_data$means+rnorm(nrow(my_data), sd=0.25));
my_data$sigma = 0.1;
png(file="bar_and_errors_example.png", height=900, width=1200);
plot(ggplot(my_data, aes(x=day, weight=mid, ymin=mid-sigma, ymax=mid+sigma, fill=group)) +
geom_bar (position=position_dodge(width=0.5)) +
geom_errorbar (position=position_dodge(width=0.5), colour="black") +
geom_point (position=position_dodge(width=0.5), aes(y=mid, colour=group)));
dev.off();
In the plot, the error segments appear with a fixed offset from their bars (sorry, no plots allowed for newbies even if ggplot2 is the subject).
When binwidth is adjusted in geom_bar, the offset is not fixed and changes from day to day.
Notice, that geom_errorbar and geom_point dodge in tandem.
How do I get geom_bar to agree with the other two?
Any help appreciated.
The alignment problems are due, in part, to your bars not representing the data you intend. The following lines up correctly:
ggplot(my_data, aes(x=day, weight=mid, ymin=mid-sigma, ymax=mid+sigma, fill=group)) +
geom_bar (position=position_dodge(), aes(y=mid), stat="identity") +
geom_errorbar (position=position_dodge(width=0.9), colour="black") +
geom_point (position=position_dodge(width=0.9), aes(y=mid, colour=group))
This is an old question, but since I ran into the problem today, I want to add the following:
In
geom_bar(position = position_dodge(width=0.9), stat = "identity") +
geom_errorbar( position = position_dodge(width=0.9), colour="black")
the width-argument within position_dodge controls the dodging width of the things to dodge from each other. However, this produces whiskers as wide as the bars, which is possibly undesired.
An additional width-argument outside the position_dodge controls the width of the whiskers (and bars):
geom_bar(position = position_dodge(width=0.9), stat = "identity", width=0.7) +
geom_errorbar( position = position_dodge(width=0.9), colour="black", width=0.3)
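Putting the two width arguments together, a complete sketch might look like this. It assumes a ggplot2 version that provides geom_col() (2.2.0 or later), uses an arbitrary seed, and treats day as a factor; the name pd for the shared dodge object is made up here. The plot is only built when ggplot2 is available.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
days   <- data.frame(day = c(0, 1, 8, 15))
groups <- data.frame(group = c("A", "B", "C", "D", "E"),
                     means = seq(0, 1, length.out = 5))
my_data <- merge(days, groups)  # no shared columns, so a cross join: 4 x 5 rows
my_data$mid   <- exp(my_data$means + rnorm(nrow(my_data), sd = 0.25))
my_data$sigma <- 0.1

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One dodge object shared by every layer keeps the centres identical
  pd <- position_dodge(width = 0.9)
  p <- ggplot(my_data, aes(x = factor(day), y = mid, fill = group)) +
    geom_col(position = pd, width = 0.7) +                 # bar width
    geom_errorbar(aes(ymin = mid - sigma, ymax = mid + sigma),
                  position = pd, width = 0.3, colour = "black") +  # whisker width
    geom_point(position = pd, colour = "black")
  # print(p) draws the plot
}
```

The width inside position_dodge() decides where each group's slot sits, while the width given to each geom only changes how wide the bar or whisker is drawn within that slot.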
As a first change, I reformatted the code according to the Advanced R style guide.
days <- data.frame(day=c(0,1,8,15))
groups <- data.frame(
group=c("A","B","C","D", "E"),
means=seq(0,1,length=5)
)
my_data <- merge(days, groups)
my_data$mid <- exp(my_data$means+rnorm(nrow(my_data), sd=0.25))
my_data$sigma <- 0.1
Now when we look at the data we see that day is numeric rather than a factor, and everything else is the same.
str(my_data)
To remove blank space from the plot I converted the day column to factors. CHECK that the levels are in the proper order before proceeding.
my_data$day <- as.factor(my_data$day)
levels(my_data$day)
The next change I made was defining y in your aes arguments, which tells ggplot where to look for the y values. Then I changed the position argument to "dodge" and added stat="identity", which tells ggplot to plot the y values as given instead of counting rows. Layers do not inherit the position from one another, so geom_errorbar and geom_point each need the dodge specified as well. Bars are 0.9 wide by default, so position="dodge" on geom_bar is equivalent to position_dodge(.9), and using that width in the other layers keeps everything aligned.
ggplot(data = my_data,
aes(x=day,
y= mid,
ymin=mid-sigma,
ymax=mid+sigma,
fill=group)) +
geom_bar(position="dodge", stat = "identity") +
geom_errorbar( position = position_dodge(.9), colour="black") +
geom_point(position=position_dodge(.9), aes(y=mid, colour=group))
Sometimes aes(x=tasks, y=val, fill=group) is put in geom_bar() rather than in ggplot(). This causes the problem, because ggplot looks for x and the grouping in each layer, and the other layers then do not receive the mapping that determines the location of each group.

