Error when ordering grouped bars in ggplot2 - r

My data is in the long format (as required to do the grouped barplot), so that the values for different categories are in one single column. The data is here.
Now, a standard barplot with ggplot2 orders the bars alphabetically (in my case of country names, from Argentina to Uganda). I want to keep the order of countries as it is in the dataframe. Using the suggestion here (i.e. using the limits= option inside the scale_x_discrete function) I get the following graph:
My code is this:
library(readxl)
library(ggplot2)
mydata <- read_excel("WDR2016Fig215.xls", col_names = TRUE)
y <- mydata$value
x <- mydata$country
z <- mydata$Skill
ggplot(data=mydata, aes(x=x, y=y, fill=z)) +
geom_bar(stat="identity", position=position_dodge(), colour="black") +
scale_x_discrete(limits=x)
The graph is nicely sorted as I want but the x axis is for some reason expanded. Any idea what is the problem?

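Is this what you are after? Turn country into a factor whose levels follow the order in the data frame, then map the columns directly: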
mydata$country <- factor(mydata$country, levels=unique(mydata$country)[1:30])
ggplot(data=mydata, aes(x=country, y=value, fill=Skill)) +
geom_bar(stat="identity", position=position_dodge(), colour="black")
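To see why this works, here is a minimal sketch on made-up data (the countries and values below are invented for illustration and are not taken from WDR2016Fig215.xls). ggplot2 draws a discrete axis in the order of the factor levels, and a plain character column ends up sorted alphabetically, so setting the levels in order of first appearance keeps the data frame order. Passing the full, repeated country column to limits= is probably what stretched the axis in the question.

```r
library(ggplot2)

# Toy long-format data (invented values); rows are deliberately not alphabetical
toy <- data.frame(
  country = rep(c("Uganda", "Argentina", "Kenya"), each = 2),
  Skill   = rep(c("High", "Low"), times = 3),
  value   = c(10, 20, 15, 25, 12, 18)
)

# Levels follow the order of first appearance instead of alphabetical order
toy$country <- factor(toy$country, levels = unique(toy$country))

# geom_col() is shorthand for geom_bar(stat = "identity")
p <- ggplot(toy, aes(x = country, y = value, fill = Skill)) +
  geom_col(position = position_dodge(), colour = "black")
print(p)
```

The bars should then appear as Uganda, Argentina, Kenya, with no extra empty positions on the axis.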

Related

Three level group to geom_col plot using facet_wrap [duplicate]

I have the following dataset:
subj <- c(rep(11,3),rep(12,3),rep(14,3),rep(15,3),rep(17,3),rep(18,3),rep(20,3))
group <- c(rep("u",3),rep("t",6),rep("u",6),rep("t",6))
time <- rep(1:3,7)
mean <- c(0.7352941, 0.8059701, 0.8823529, 0.9264706, 0.9852941, 0.9558824, 0.7941176, 0.8676471, 0.7910448, 0.7058824, 0.8382353, 0.7941176, 0.9411765, 0.9558824, 0.9852941, 0.7647059, 0.8088235, 0.7968750, 0.8088235, 0.8500000, 0.8412698)
df <- data.frame(subj,group,time,mean)
df$subj <- as.factor(df$subj)
df$time <- as.factor(df$time)
And now I create a barplot with ggplot2:
library(ggplot2)
qplot(x=subj, y=mean*100, fill=time, data=df, geom="bar",stat="identity",position="dodge") +
facet_wrap(~ group)
How do I make it so that the x-axis labels that are not present in each facet are not shown? How do I get equal distances between each subj (i.e. get rid of the bigger gaps)?
You can use scales="free" (the shorter scale= spelling only works through R's partial argument matching):
ggplot(df, aes(x=subj, y=mean*100, fill=time)) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~ group, scales="free")
Another option with slightly different aesthetics uses facet_grid. In contrast to the plots above, the panels aren't the same width here, but because of space="free_x" the bars all have the same width.
ggplot(df, aes(x=subj, y=mean*100, fill=time)) +
geom_bar(stat="identity", position="dodge") +
facet_grid(~ group, scales="free", space="free_x")
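As a quick check of where the gaps come from, the sketch below (base R only, rebuilding just the subj and group columns from the question) lists which subjects occur in each group. Every subject belongs to exactly one group, so with a shared x scale each panel still reserves a slot for all seven subjects, and the slots of the other group's subjects stay empty.

```r
# Same subj / group layout as in the question
subj  <- factor(rep(c(11, 12, 14, 15, 17, 18, 20), each = 3))
group <- c(rep("u", 3), rep("t", 6), rep("u", 6), rep("t", 6))

# Distinct subjects per group
subj_by_group <- lapply(split(as.character(subj), group), unique)
print(subj_by_group)
```

With a free x scale, each panel keeps only the subjects listed for its own group.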

Merge two stacked bar graphs into one plot R (ggplot2) [duplicate]

I'm having quite the time understanding geom_bar() and position="dodge". I was trying to make some bar graphs illustrating two groups. Originally the data was from two separate data frames. Per this question, I put my data in long format. My example:
test <- data.frame(names=rep(c("A","B","C"), 5), values=1:15)
test2 <- data.frame(names=c("A","B","C"), values=5:7)
df <- data.frame(names=c(paste(test$names), paste(test2$names)), num=c(rep(1,
nrow(test)), rep(2, nrow(test2))), values=c(test$values, test2$values))
I use that example as it's similar to the spend vs. budget example. Spending has many rows per names factor level whereas the budget only has one (one budget amount per category).
For a stacked bar plot, this works great:
ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) +
geom_bar(stat="identity")
In particular, note the y value maxes. They are the sums of the data from test, with the values of test2 shown in blue on top.
Based on other questions I've read, I simply need to add position="dodge" to make it a side-by-side plot vs. a stacked one:
ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) +
geom_bar(stat="identity", position="dodge")
It looks great, but note the new max y values. It seems like it's just taking the max y value from each names factor level from test for the y value. It's no longer summing them.
Per some other questions (like this one and this one), I also tried adding the group= option without success (it produces the same dodged plot as above):
ggplot(df, aes(x=factor(names), y=values, fill=factor(num), group=factor(num))) +
geom_bar(stat="identity", position="dodge")
I don't understand why the stacked works great and the dodged doesn't just put them side by side instead of on top.
ETA: I found a recent question about this on the ggplot google group with the suggestion to add alpha=0.5 to see what's going on. It isn't that ggplot is taking the max value from each grouping; it's actually over-plotting bars on top of one another for each value.
It seems that when using position="dodge", ggplot expects only one y per x. I contacted Winston Chang, a ggplot developer about this to confirm as well as to inquire if this can be changed as I don't see an advantage.
It seems that stat="identity" should tell ggplot to tally the y=val passed inside aes() instead of individual counts which happens without stat="identity" and when passing no y value.
For now, the workaround seems to be (for the original df above) to aggregate so there's only one y per x:
df2 <- aggregate(df$values, by=list(df$names, df$num), FUN=sum)
p <- ggplot(df2, aes(x=Group.1, y=x, fill=factor(Group.2)))
p <- p + geom_bar(stat="identity", position="dodge")
p
I think the problem is that you want to stack within values of the num group, and dodge between values of num.
It might help to look at what happens when you add an outline to the bars.
library(ggplot2)
set.seed(123)
df <- data.frame(
id = 1:18,
names = rep(LETTERS[1:3], 6),
num = c(rep(1, 15), rep(2, 3)),
values = sample(1:10, 18, replace=TRUE)
)
By default, there are a lot of bars stacked - you just don't see that they're separate unless you have an outline:
# Stacked bars
ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) +
geom_bar(stat="identity", colour="black")
If you dodge, you get bars that are dodged between values of num, but there may be multiple bars within each value of num:
# Dodged on 'num', but some overplotted bars
ggplot(df, aes(x=factor(names), y=values, fill=factor(num))) +
geom_bar(stat="identity", colour="black", position="dodge", alpha=0.1)
If you also add id as a grouping var, it'll dodge all of them:
# Dodging with unique 'id' as the grouping var
ggplot(df, aes(x=factor(names), y=values, fill=factor(num), group=factor(id))) +
geom_bar(stat="identity", colour="black", position="dodge", alpha=0.1)
I think what you want is to both dodge and stack, but you can't do both.
So the best thing is to summarize the data yourself.
library(plyr)
df2 <- ddply(df, c("names", "num"), summarise, values = sum(values))
ggplot(df2, aes(x=factor(names), y=values, fill=factor(num))) +
geom_bar(stat="identity", colour="black", position="dodge")
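If you would rather not load plyr, the same summary can be sketched in base R. This version uses the test and test2 data from the question (not the randomly sampled df above), and the formula interface of aggregate() so that the result keeps readable column names. geom_col() is used here as shorthand for geom_bar(stat="identity").

```r
library(ggplot2)

# Data from the question: many rows per name in group 1, one row per name in group 2
test  <- data.frame(names = rep(c("A", "B", "C"), 5), values = 1:15)
test2 <- data.frame(names = c("A", "B", "C"), values = 5:7)
df <- rbind(cbind(test, num = 1), cbind(test2, num = 2))

# One summed value per names/num combination
df2 <- aggregate(values ~ names + num, data = df, FUN = sum)
print(df2)

# With a single y per bar, dodging puts the two groups side by side
p <- ggplot(df2, aes(x = names, y = values, fill = factor(num))) +
  geom_col(position = "dodge", colour = "black")
print(p)
```

The group 1 bars then reach the summed heights that the stacked plot showed, instead of hiding overplotted bars behind each other.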

plotting the whole data within each facet using facet_wrap and ggplot2

I am trying to plot line graphs and use facet_wrap to get one panel per dataset. What I would love to have in each panel is all the other datasets in the background, in light grey or semi-transparent.
df <- data.frame(id=rep(letters[1:5], each=10),
x=seq(10),
y=runif(50))
ggplot(df, aes(x,y, group=id)) +
geom_line() +
facet_wrap(~ id)
This graph is how far I get, but I would love to have all the other missing 4 lines in each graph as well... In any way I try to use facet_wrap, I get only the data of a single line.
What I would expect is something like this for each facet.
ggplot(df, aes(x,y, group=id)) +
geom_line() +
geom_line(data=df[1:10,], aes(x,y, group=id), size=5)
Here's another approach:
First add a new column identical to id:
df$id2 <- df$id
Then add another geom_line based on the df without the original id column:
ggplot(df, aes(x,y, group=id)) +
geom_line(data=df[,2:4], aes(x=x, y=y, group=id2), colour="grey") +
geom_line() +
facet_wrap(~ id)
Here is an approach. It might not be suitable for larger datasets, as we replicate the data number_of_facets-times.
First, we do some data-wrangling to create this desired dataframe.
df$obs_id <- 1:nrow(df) #unique ID for each observation
#new data with unique ID's and 'true' facets
df2 <- expand.grid(true_facet=unique(df$id), obs_id=1:nrow(df))
#merge them
dat <- merge(df,df2,by="obs_id",all=T)
Then, we create a flag defining the 'true' faceted variable, and to discern background from foreground.
dat$col_flag <- dat$true_facet == dat$id
Now, plotting is easy. I've used geom_line twice instead of scales, as that was easier than to try to fix the ordering (would lead to black being plotted below grey).
p1 <- ggplot(dat, aes(x=x,y=y, group=id))+
geom_line(color="grey")+
geom_line(data=dat[dat$col_flag,],size=2,color="black")+
facet_wrap(~true_facet)
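Both answers rely on the same ggplot2 behaviour: a layer whose data does not contain the faceting variable is drawn in every panel. A minimal sketch of that idea, without replicating the data by hand, might look like this (the seed and the column name line_id are arbitrary choices made for this example):

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the random lines are reproducible
df <- data.frame(id = rep(letters[1:5], each = 10),
                 x  = seq(10),
                 y  = runif(50))

# Background copy: same rows, but the facet column 'id' is stored as 'line_id',
# so this data frame has no 'id' column for facet_wrap() to split on
bg <- data.frame(line_id = df$id, x = df$x, y = df$y)

p <- ggplot(df, aes(x, y)) +
  geom_line(data = bg, aes(group = line_id), colour = "grey80") +  # all lines, every panel
  geom_line(aes(group = id)) +                                     # the panel's own line
  facet_wrap(~ id)
print(p)
```

The grey layer is listed first so that the black line of each panel is drawn on top of it.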

Plotting continuous and discrete series in ggplot with facet

I have data that plots over time with four different variables. I would like to combine them in one plot using facet_grid, where each variable gets its own sub-plot. The following code resembles my data and the way I'm presenting it:
require(ggplot2)
require(reshape2)
subm <- melt(economics, id='date', c('psavert','uempmed','unemploy'))
mcsm <- melt(data.frame(date=economics$date, q=quarters(economics$date)), id='date')
mcsm$value <- factor(mcsm$value)
ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line() +
facet_grid(variable~., scale='free_y') +
geom_step(data=mcsm, aes(date, value)) +
scale_y_discrete(breaks=levels(mcsm$value))
If I leave out scale_y_discrete, R complains that I'm trying to combine a discrete value with a continuous scale. If I include scale_y_discrete, my continuous series lose their scale.
Is there any neat way of solving this issue, i.e. getting all scales correct? I also see that the legend is sorted alphabetically; can I change that so the legend is in the same order as the sub-plots?
The problem with your data is that value is numeric (continuous) in the data frame subm but a factor (discrete) in mcsm. You can't use the same scale for continuous and discrete values, so you get y values only for the last facet (the discrete one). It is also not possible to use two scale_y...() functions in one plot.
My approach would be to convert the mcsm values to numeric (saved as value2) and plot those; the quarters are then drawn as 1, 2, 3 and 4. To fix the legend order, use scale_color_discrete() and give breaks= in the order you need.
mcsm$value2<-as.numeric(mcsm$value)
ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y') + geom_step(data=mcsm, aes(date, value2)) +
scale_color_discrete(breaks=c('psavert','uempmed','unemploy','q'))
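One detail worth keeping in mind: as.numeric() on a factor returns the integer level codes, not the labels. That is fine here because the levels sort as Q1 to Q4, so the codes coincide with the quarter numbers. A small base R sketch with made-up dates:

```r
# Four arbitrary dates, deliberately out of calendar order
dates <- as.Date(c("2000-08-15", "2000-02-15", "2000-11-15", "2000-05-15"))

q <- factor(quarters(dates))   # levels are sorted: Q1, Q2, Q3, Q4
q_num <- as.numeric(q)         # integer level codes, not the labels

print(data.frame(date = dates, quarter = q, code = q_num))
```

For a factor whose levels do not sort in the intended order, set the levels explicitly before converting.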
UPDATE - solution using grobs
Another approach is to use grobs and the gridExtra package to draw your data as separate plots.
First, save the plot with all legends and data (code as above) as object p. Then use ggplot_build() and ggplot_gtable() to save the plot as grob object gp. From gp, extract only the part that draws the legend (saved as gp.leg); in this case it is list element number 17.
library(gridExtra)
p<-ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y') + geom_step(data=mcsm, aes(date, value2)) +
scale_color_discrete(breaks=c('psavert','uempmed','unemploy','q'))
gp<-ggplot_gtable(ggplot_build(p))
gp.leg<-gp$grobs[[17]]
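The position of the legend in gp$grobs can differ between plots and ggplot2 versions, so a hard-coded index such as 17 may not carry over. A hedged alternative is to look the legend up by its layout name. The sketch below uses a throwaway mtcars plot (p_demo, not the plot from this answer) and assumes that legend slots are named with the prefix "guide-box" and that empty slots are placeholders of class "zeroGrob"; check gt$layout$name on your own installation.

```r
library(ggplot2)

# Throwaway demo plot with one colour legend
p_demo <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point()
gt <- ggplot_gtable(ggplot_build(p_demo))

# Layout entries whose name starts with "guide-box" hold the legend(s)
legend_idx <- grep("^guide-box", gt$layout$name)

# Some versions reserve several guide-box slots, most of them empty;
# keep the first slot that is not an empty placeholder
candidates <- gt$grobs[legend_idx]
is_empty <- vapply(candidates, inherits, logical(1), what = "zeroGrob")
leg <- candidates[!is_empty][[1]]
```

leg can then be used in place of gp.leg in the grid.arrange() call.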
Make two new plots, p1 and p2: the first plots the data of subm and the second only the data of mcsm. Use scale_color_manual() to set the same colors as used for plot p. For the first plot, remove the x axis title, texts and ticks, and with plot.margin= set the lower margin to a negative number. For the second plot, change the upper margin to a negative number. facet_grid() should be used for both plots to get the faceted look.
p1 <- ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y')+
theme(plot.margin = unit(c(0.5,0.5,-0.25,0.5), "lines"),
axis.text.x=element_blank(),
axis.title.x=element_blank(),
axis.ticks.x=element_blank())+
scale_color_manual(values=c("#F8766D","#00BFC4","#C77CFF"),guide="none")
p2 <- ggplot(data=mcsm, aes(date, value,group=1,col=variable)) + geom_step() +
facet_grid(variable~., scale='free_y')+
theme(plot.margin = unit(c(-0.25,0.5,0.5,0.5), "lines"))+ylab("")+
scale_color_manual(values="#7CAE00",guide="none")
Save both plots p1 and p2 as grob objects and then set for both plots the same widths.
gp1 <- ggplot_gtable(ggplot_build(p1))
gp2 <- ggplot_gtable(ggplot_build(p2))
maxWidth = grid::unit.pmax(gp1$widths[2:3],gp2$widths[2:3])
gp1$widths[2:3] <- as.list(maxWidth)
gp2$widths[2:3] <- as.list(maxWidth)
With functions grid.arrange() and arrangeGrob() arrange both plots and legend in one plot.
grid.arrange(arrangeGrob(arrangeGrob(gp1,gp2,heights=c(3/4,1/4),ncol=1),
gp.leg,widths=c(7/8,1/8),ncol=2))
