R ggplot specific order of bars - r

So, I am making several barplots in R using ggplot, each with its bars in descending order. Each of these plots contains one bar named "others", which should always be the last bar. How can I achieve this? More generally: is there an easy way to pick one bar of a bar plot and move it to the last position without manually changing all levels?
Many thanks in advance,
chris

The trick is to use factor, as follows
library(ggplot2) # for plots
# dummy data
dat <- data.frame(
letters = c("A","B","Other","X","Y","Z"),
probs = sample(runif(6)*10,6)
)
# not what we want
ggplot(dat, aes(letters, probs)) + geom_bar(stat = "identity")
# magic happens here
# factor, and push Other to end of levels using c(others, other)
# levels(factor(...)) works whether the column is character (R >= 4.0) or factor
dat$letters <- factor(
dat$letters,
levels = c(
setdiff(levels(factor(dat$letters)), "Other"),
"Other")
)
ggplot(dat, aes(letters, probs)) + geom_bar(stat = "identity")
If you're using + coord_flip(), use levels = rev(c(...)) for intuitive ordering
dat$letters <- factor(
dat$letters,
levels = rev(c(
setdiff(levels(factor(dat$letters)), "Other"),
"Other"))
)
ggplot(dat, aes(letters, probs)) + geom_bar(stat = "identity") + coord_flip()
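The same idea can be wrapped in a small helper and combined with reorder() to get the descending order asked for in the question. This is only a sketch: move_level_last is a hypothetical name (not a base R or ggplot2 function), and the probs values are made up so the result is reproducible.

```r
# Hypothetical helper: move one level of a factor to the last position.
move_level_last <- function(x, level = "Other") {
  x <- factor(x)
  factor(x, levels = c(setdiff(levels(x), level), level))
}

# Made-up data with fixed values (assumption, for reproducibility).
dat <- data.frame(
  letters = c("A", "B", "Other", "X", "Y", "Z"),
  probs = c(3, 9, 8, 1, 5, 7)
)

# Order the levels by descending value first ...
dat$letters <- reorder(dat$letters, -dat$probs)
# ... then push "Other" to the end, whatever its value is.
dat$letters <- move_level_last(dat$letters, "Other")

levels(dat$letters)
# "B" "Z" "Y" "A" "X" "Other"
```

Passing this dat to the ggplot() calls above should draw the bars in that order. If forcats is available, fct_relevel(x, "Other", after = Inf) is intended for the same job.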

Related

Reverse stacked bar order

I'm creating a stacked bar chart using ggplot like this:
plot_df <- df[!is.na(df$levels), ]
ggplot(plot_df, aes(group)) + geom_bar(aes(fill = levels), position = "fill")
Which gives me something like this:
How do I reverse the order of the stacked bars themselves, so that level 1 is at the bottom and level 5 is at the top of each bar?
I've seen a number of questions on this (e.g. How to control ordering of stacked bar chart using identity on ggplot2), and the common solution seems to be to reorder the data frame by that level, as that is what ggplot uses to determine the order.
So I've tried reordering using dplyr:
plot_df <- df[!is.na(df$levels), ] %>% arrange(desc(levels))
However, the plot comes out the same. It also doesn't seem to make a difference whether I arrange by ascending or descending order
Here is a reproducible example:
group <- c(1,2,3,4, 1,2,3,4, 1,2,3,4, 1,2,3,4, 1,2,3,4, 1,2,3,4)
levels <- c("1","1","1","1","2","2","2","2","3","3","3","3","4","4","4","4","5","5","5","5","1","1","1","1")
plot_df <- data.frame(group, levels)
ggplot(plot_df, aes(group)) + geom_bar(aes(fill = levels), position = "fill")
The release notes of ggplot2 version 2.2.0 on Stacking bars suggest:
If you want to stack in the opposite order, try forcats::fct_rev()
library(ggplot2) # version 2.2.1 used
plot_df <- data.frame(group = rep(1:4, 6),
levels = factor(c(rep(1:5, each = 4), rep(1, 4))))
ggplot(plot_df, aes(group, fill = forcats::fct_rev(levels))) +
geom_bar(position = "fill")
This is the original plot:
ggplot(plot_df, aes(group, fill = levels)) +
geom_bar(position = "fill")
Or, using position_fill(reverse = TRUE) as suggested by alistaire in his comment:
ggplot(plot_df, aes(group, fill = levels)) +
geom_bar(position = position_fill(reverse = TRUE))
Note that the levels (colors) in the legend are not in the same order as in the stacked bars.
An alternative is to reorder the factor as such, assuming the factor is called "levels":
levels = ordered(levels, levels=c(5,4,3,2,1))
For more info: http://www.cookbook-r.com/Manipulating_data/Changing_the_order_of_levels_of_a_factor/
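As a rough illustration of why arrange() made no difference while the factor-based answers do: sorting changes the row order, but not the level order of the factor, and (as far as the answers above indicate) it is the level order that decides how the bars are stacked. The sketch below uses base R only. The names lev and lev_rev are made up for this example.

```r
# Same values as the "levels" column in the reproducible example.
lev <- factor(c(rep(1:5, each = 4), rep(1, 4)))

# Reverse the level order; the data values themselves are unchanged.
# This is the base R counterpart of forcats::fct_rev().
lev_rev <- factor(lev, levels = rev(levels(lev)))

levels(lev)      # "1" "2" "3" "4" "5"
levels(lev_rev)  # "5" "4" "3" "2" "1"

# Sorting the values (which is what arrange() does to the rows)
# leaves the level order untouched.
levels(sort(lev, decreasing = TRUE))  # "1" "2" "3" "4" "5"
```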

ggplot2: create ordered group bar plot - (use reorder)

I want to create a grouped bar plot while keeping the bars ordered. If it were a single column and not a grouped bar plot, using the reorder function would be obvious. But I am not sure how to use it on a melted data.frame.
Here is a detailed explanation with a code example:
Let's say we have the following data.frame:
d.nfl <- data.frame(Team1=c("Vikings", "Chicago", "GreenBay", "Detroit"), Win=c(20, 13, 9, 12))
Plotting a simple bar plot and flipping it:
ggplot(d.nfl, aes(x = Team1, y=Win)) + geom_bar(aes(fill=Team1), stat="identity") + coord_flip()
The plot above is not ordered. If I want to order it by Win, I can do the following:
d.nfl$orderedTeam <- reorder(d.nfl$Team1, d.nfl$Win)
ggplot(d.nfl, aes(x = orderedTeam, y=Win)) + geom_bar(aes(fill=orderedTeam), stat="identity") + coord_flip()
Now let's say we add another column (to the original data frame):
d.nfl$points <- c(12, 3, 45, 5)
Team1 Win points
1 Vikings 20 12
2 Chicago 13 3
3 GreenBay 9 45
4 Detroit 12 5
To generate a grouped bar plot, we first need to melt it:
library(reshape2)
d.nfl.melt <- melt(d.nfl[,c('Team1','Win','points')],id.vars = 1)
ggplot(d.nfl.melt,aes(x = Team1,y = value)) + geom_bar(aes(fill = variable),position = "dodge", stat="identity") + coord_flip()
The plot above is unordered.
How do I make an ordered grouped bar plot (in ascending order)?
This is a non-issue.
The easiest way is to not discard your ordered team in the melt:
d.nfl.melt <- melt(d.nfl,id.vars = c("Team1", "orderedTeam"))
Alternatively, we can use reorder after melting and use only the Win elements to compute the ordering:
d.nfl.melt$ordered_after_melting = reorder(
d.nfl.melt$Team1,
X = d.nfl.melt$value * (d.nfl.melt$variable == "Win")
)
Yet another idea is to take the levels from the original ordered column and apply them to a melted factor:
d.nfl.melt$copied_levels = factor(
d.nfl.melt$Team1,
levels = levels(d.nfl$orderedTeam)
)
All three methods give the same result. (I left out the coord_flips because they don't add anything to the question, but you can of course add them back in.)
gridExtra::grid.arrange(
ggplot(d.nfl.melt,aes(x = orderedTeam, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity"),
ggplot(d.nfl.melt,aes(x = ordered_after_melting, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity"),
ggplot(d.nfl.melt,aes(x = copied_levels, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity")
)
As for the easiest, I would recommend just keeping the orderedTeam variable around while melting. Your code seems to work hard to leave it out; it's quite easy to keep it in.
The challenge your question presents is how to reorder a factor Team1 based on a subset of the values in a melted column.
The comments on your question from @alistaire and @joran link to great answers.
The tl;dr answer is to just apply the ordering from your original, unmelted data.frame to the new one using levels().
library(reshape2)
#Picking up from your example code:
d.nfl.melt <- melt(d.nfl[,c('Team1','Win','points')],id.vars = 1)
levels(factor(d.nfl.melt$Team1)) # factor() so this also works when Team1 is character (R >= 4.0)
#Current order is alphabetical
#[1] "Chicago" "Detroit" "GreenBay" "Vikings"
#Reorder based on Wins (using the same order from your earlier, unmelted data.frame)
d.nfl.melt$Team1 <- factor(d.nfl.melt$Team1, levels = levels(d.nfl$orderedTeam)) #SOLUTION
levels(d.nfl.melt$Team1)
#New order is ascending by wins
#[1] "GreenBay" "Detroit" "Chicago" "Vikings"
ggplot(d.nfl.melt,aes(x = Team1,y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity") + coord_flip()
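To see the level order that this approach produces without running the plots, here is a minimal sketch in base R. It builds the long table by hand as a stand-in for reshape2::melt, so the names long and win_order are assumptions made for this example only.

```r
d.nfl <- data.frame(
  Team1 = c("Vikings", "Chicago", "GreenBay", "Detroit"),
  Win = c(20, 13, 9, 12),
  points = c(12, 3, 45, 5)
)

# Long format built by hand (stand-in for melt(d.nfl, id.vars = "Team1")).
long <- data.frame(
  Team1 = rep(as.character(d.nfl$Team1), times = 2),
  variable = rep(c("Win", "points"), each = nrow(d.nfl)),
  value = c(d.nfl$Win, d.nfl$points)
)

# Derive the team order from the wins in the unmelted data only ...
win_order <- as.character(d.nfl$Team1[order(d.nfl$Win)])
# ... and apply it as the level order of the melted column.
long$Team1 <- factor(long$Team1, levels = win_order)

levels(long$Team1)
# "GreenBay" "Detroit" "Chicago" "Vikings"
```

The points values play no part in the ordering here, which is the property the question was after.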

normalize ggplot histogram so that first height is 1 (to show growth) in R

I was wondering if there is a way to normalize the heights of the histograms with multiple groups so that their first heights are all = 1. For instance:
results <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7),3)
category <- rep(c("a","b","c"),15)
data <- data.frame(results,category)
p <- ggplot(data, aes(x=results, fill = category, y = ..count..))
p + geom_histogram(position = "dodge")
gives a regular histogram with 3 groups.
Also
results <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7),3)
category <- rep(c("a","b","c"),15)
data <- data.frame(results,category)
p <- ggplot(data, aes(x=results, fill = category, y = ..ncount..))
p + geom_histogram(position = "dodge")
gives a histogram where each group is normalized to have a maximum height of 1.
I want to get a histogram where each group is normalized to have a first height of 1 (so I can show growth), but I don't know whether there is an appropriate alternative to ..ncount.. or ..count.. for this. If anyone can help me understand the structure of ..count.., I could maybe figure it out from there.
Thanks!
I bet there is a nice way to do everything within ggplot. However, I tend to prefer preparing the desired data set before I plug it into ggplot. If I understood you correctly, you may try something like this:
# 'data' is the data frame from the question; work on a copy named 'df'
df <- data
# convert 'results' to factor and set levels to get an equi-spaced 'results' x-axis
df$results <- factor(df$results, levels = 1:7)
# for each category, count frequency of 'results'
df <- as.data.frame(with(df, table(results, category)))
# normalize: for each category, divide all 'Freq' (heights) with the first 'Freq'
df$freq2 <- with(df, ave(Freq, category, FUN = function(x) x/x[1]))
ggplot(data = df, aes(x = results, y = freq2, fill = category)) +
geom_bar(stat = "identity", position = "dodge")
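The normalization step can be checked on the question's data without plotting. The sketch below repeats the counting in base R and wraps the division in a hypothetical helper, normalize_to_first, which is not part of ggplot2. The NA guard for a zero first count is an added assumption, since growth relative to zero is undefined.

```r
# Hypothetical helper: rescale counts so that the first element is 1.
# Returns NA throughout when the first count is 0.
normalize_to_first <- function(x) {
  if (x[1] == 0) return(rep(NA_real_, length(x)))
  x / x[1]
}

results <- rep(c(1, 1, 2, 2, 2, 3, 1, 1, 1, 3, 4, 4, 2, 5, 7), 3)
category <- rep(c("a", "b", "c"), 15)

# Count every results/category combination, including empty ones.
counts <- as.data.frame(
  table(results = factor(results, levels = 1:7), category = category)
)

# Divide each category's counts by that category's first count.
counts$freq2 <- ave(as.numeric(counts$Freq), counts$category,
                    FUN = normalize_to_first)

counts$freq2[counts$category == "a"]
# 1.0 1.0 0.5 0.0 0.0 0.0 0.0
```

Category "a" has counts 6, 6, 3 for results 1 to 3 and none above, so dividing by the first count gives the values shown.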
It looks like ..density.. does what you want, but I can't for the life of me find documentation on it. On both your examples it does what you are looking for, though!
results <- rep(c(1,1,2,2,2,3,1,1,1,3,4,4,2,5,7),3)
category <- rep(c("a","b","c"),15)
data <- data.frame(results,category)
p <- ggplot(data, aes(x=results, fill = category, y = ..density..))
p + geom_histogram(position = "dodge")

Don't drop zero count: dodged barplot

I am making a dodged barplot in ggplot2 and one grouping has a zero count that I want to display. I remember seeing this here a while back and figured that scale_x_discrete(drop=F) would work. It does not appear to work with dodged bars. How can I make the zero counts show?
For instance, (code below) in the plot below, type8~group4 has no examples. I would still like the plot to display the empty space for the zero count instead of eliminating the bar. How can I do this?
mtcars2 <- data.frame(type=factor(mtcars$cyl),
group=factor(mtcars$gear))
m2 <- ggplot(mtcars2, aes(x=type , fill=group))
p2 <- m2 + geom_bar(colour="black", position="dodge") +
scale_x_discrete(drop=F)
p2
Here's how you can do it without making summary tables first.
It did not work in my CRAN version (2.2.1), but in the latest development version of ggplot (2.2.1.900) I had no issues.
ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
geom_bar(position = position_dodge(preserve = "single"))
http://ggplot2.tidyverse.org/reference/position_dodge.html
Updated geom_bar() needs stat = "identity"
For what it's worth: the table of counts, dat, in Joran's answer contains NA. Sometimes it is useful to have an explicit 0 instead, for instance if the next step is to put counts above the bars. The following code does just that, although it's probably no simpler than Joran's. It involves two steps: get a cross-tabulation of counts using dcast, then melt the table using melt, followed by ggplot() as usual.
library(ggplot2)
library(reshape2)
mtcars2 = data.frame(type=factor(mtcars$cyl), group=factor(mtcars$gear))
dat = dcast(mtcars2, type ~ group, fun.aggregate = length)
dat.melt = melt(dat, id.vars = "type", measure.vars = c("3", "4", "5"))
dat.melt
ggplot(dat.melt, aes(x = type,y = value, fill = variable)) +
geom_bar(stat = "identity", colour = "black", position = position_dodge(width = .8), width = 0.7) +
ylim(0, 14) +
geom_text(aes(label = value), position = position_dodge(width = .8), vjust = -0.5)
The only way I know of is to pre-compute the counts and add a dummy row:
library(plyr) # provides ddply
dat <- rbind(ddply(mtcars2,.(type,group),summarise,count = length(group)),c(8,4,NA))
ggplot(dat,aes(x = type,y = count,fill = group)) +
geom_bar(colour = "black",position = "dodge",stat = "identity")
I thought that using stat_bin(drop = FALSE,geom = "bar",...) instead would work, but apparently it does not.
I asked this same question, but I only wanted to use data.table, as it's a faster solution for much larger data sets. I included notes on the data so that those that are less experienced and want to understand why I did what I did can do so easily. Here is how I manipulated the mtcars data set:
library(data.table)
library(scales)
library(ggplot2)
mtcars <- data.table(mtcars)
mtcars$Cylinders <- as.factor(mtcars$cyl) # Creates new column with data from cyl called Cylinders as a factor. This allows ggplot2 to automatically use the name "Cylinders" and recognize that it's a factor
mtcars$Gears <- as.factor(mtcars$gear) # Just like above, but with gears to Gears
setkey(mtcars, Cylinders, Gears) # Set key for 2 different columns
mtcars <- mtcars[CJ(unique(Cylinders), unique(Gears)), .N, by = .EACHI, allow.cartesian = TRUE] # Uses CJ to create a completed list of all unique combinations of Cylinders and Gears. Then counts how many of each combination there are and reports it in a column called "N". Current data.table versions need by = .EACHI to get one count per combination rather than a single total
And here is the call that produced the graph
ggplot(mtcars, aes(x=Cylinders, y = N, fill = Gears)) +
geom_bar(position="dodge", stat="identity") +
ylab("Count") + theme(legend.position="top") +
scale_x_discrete(drop = FALSE)
And it produces this graph:
Furthermore, if there is continuous data, like that in the diamonds data set (thanks to mnel):
library(data.table)
library(scales)
library(ggplot2)
diamonds <- data.table(diamonds) # I modified the diamonds data set in order to create gaps for illustrative purposes
setkey(diamonds, color, cut)
diamonds[J("E",c("Fair","Good")), carat := 0]
diamonds[J("G",c("Premium","Good","Fair")), carat := 0]
diamonds[J("J",c("Very Good","Fair")), carat := 0]
diamonds <- diamonds[carat != 0]
Then using CJ would work as well.
data <- data.table(diamonds)[,list(mean_carat = mean(carat)), keyby = c('cut', 'color')] # This step defines our data set as the combinations of cut and color that exist and their means. However, the problem with this is that it doesn't have all combinations possible
data <- data[CJ(unique(cut),unique(color))] # This functions exactly the same way as it did in the discrete example. It creates a complete list of all possible unique combinations of cut and color
ggplot(data, aes(color, mean_carat, fill=cut)) +
geom_bar(stat = "identity", position = "dodge") +
ylab("Mean Carat") + xlab("Color")
Giving us this graph:
Use count from dplyr and complete from tidyr to do this (both are loaded by tidyverse).
library(tidyverse)
mtcars %>%
mutate(
type = as.factor(cyl),
group = as.factor(gear)
) %>%
count(type, group) %>%
complete(type, group, fill = list(n = 0)) %>%
ggplot(aes(x = type, y = n, fill = group)) +
geom_bar(colour = "black", position = "dodge", stat = "identity")
You can exploit the feature of the table() function, which computes the number of occurrences of a factor for all its levels
# load plyr package to use ddply
library(plyr)
# compute the counts using ddply, including zero occurrences for some factor levels
df <- ddply(mtcars2, .(group), summarise,
types = as.numeric(names(table(type))),
counts = as.numeric(table(type)))
# plot the results
ggplot(df, aes(x = types, y = counts, fill = group)) +
geom_bar(stat='identity',colour="black", position="dodge")
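The property this answer relies on can be checked directly in base R: a two-way table() over two factors has a cell for every combination of levels, including the empty ones. The sketch below is an illustration under a few assumptions. It reads datasets::mtcars explicitly because an earlier answer overwrites mtcars, and the names cars and counts are made up for this example.

```r
# Rebuild the question's data from the original built-in dataset.
cars <- data.frame(
  type = factor(datasets::mtcars$cyl),
  group = factor(datasets::mtcars$gear)
)

# One row per type/group combination, with Freq = 0 for empty cells.
counts <- as.data.frame(table(type = cars$type, group = cars$group))

nrow(counts)  # 9 (3 types x 3 groups)
counts[counts$type == "8" & counts$group == "4", "Freq"]  # 0
```

The resulting columns type, group and Freq can be mapped to x, fill and y in a geom_bar(stat = "identity", position = "dodge") call like the ones above.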
