ggplot2: create an ordered grouped bar plot (using reorder)

I want to create a grouped bar plot while keeping the bars in order. For a single column (not a grouped bar plot), using the reorder function is straightforward, but I am not sure how to use it on a melted data.frame.
Here is a detailed explanation with a code example:
Let's say we have the following data.frame:
d.nfl <- data.frame(Team1=c("Vikings", "Chicago", "GreenBay", "Detroit"), Win=c(20, 13, 9, 12))
Plotting a simple bar plot while flipping it:
ggplot(d.nfl, aes(x = Team1, y=Win)) + geom_bar(aes(fill=Team1), stat="identity") + coord_flip()
The plot above is not ordered. If I want to order it by Win, I can do the following:
d.nfl$orderedTeam <- reorder(d.nfl$Team1, d.nfl$Win)
ggplot(d.nfl, aes(x = orderedTeam, y=Win)) + geom_bar(aes(fill=orderedTeam), stat="identity") + coord_flip()
Now let's say we add another column to the original data frame:
d.nfl$points <- c(12, 3, 45, 5)
     Team1 Win points
1  Vikings  20     12
2  Chicago  13      3
3 GreenBay   9     45
4  Detroit  12      5
To generate a grouped bar plot, we first need to melt it:
library(reshape2)
d.nfl.melt <- melt(d.nfl[,c('Team1','Win','points')],id.vars = 1)
ggplot(d.nfl.melt,aes(x = Team1,y = value)) + geom_bar(aes(fill = variable),position = "dodge", stat="identity") + coord_flip()
The ggplot above is unordered.
How do I create an ordered grouped bar plot (in ascending order)?

This is a non-issue.
The easiest way is to not discard your ordered team in the melt:
d.nfl.melt <- melt(d.nfl,id.vars = c("Team1", "orderedTeam"))
Alternatively, we can use reorder after melting and use only the Win elements when computing the ordering:
d.nfl.melt$ordered_after_melting = reorder(
d.nfl.melt$Team1,
X = d.nfl.melt$value * (d.nfl.melt$variable == "Win")
)
Yet another idea is to take the levels from the original ordered column and apply them to a melted factor:
d.nfl.melt$copied_levels = factor(
d.nfl.melt$Team1,
levels = levels(d.nfl$orderedTeam)
)
All three methods give the same result. (I left out the coord_flips because they don't add anything to the question, but you can of course add them back in.)
gridExtra::grid.arrange(
ggplot(d.nfl.melt,aes(x = orderedTeam, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity"),
ggplot(d.nfl.melt,aes(x = ordered_after_melting, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity"),
ggplot(d.nfl.melt,aes(x = copied_levels, y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity")
)
As to the easiest, I would recommend just keeping the orderedTeam variable around while melting. Your code seems to work hard to leave it out, it's quite easy to keep it in.
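The idea behind all three methods (fix the factor levels from the Win values only) can be sketched in base R without any plotting. This is a minimal sketch, not the answer's own code: the long data frame d.long is built by hand as a stand-in for the melt() result, and the names d.long, win_rows and team_order are assumptions made for illustration.

```r
# Wide data from the question
d.nfl <- data.frame(
  Team1  = c("Vikings", "Chicago", "GreenBay", "Detroit"),
  Win    = c(20, 13, 9, 12),
  points = c(12, 3, 45, 5)
)

# Long format built by hand (assumed stand-in for reshape2::melt)
d.long <- data.frame(
  Team1    = rep(d.nfl$Team1, times = 2),
  variable = rep(c("Win", "points"), each = nrow(d.nfl)),
  value    = c(d.nfl$Win, d.nfl$points)
)

# Derive the team order from the Win rows only, ascending by value
win_rows   <- d.long$variable == "Win"
team_order <- d.long$Team1[win_rows][order(d.long$value[win_rows])]

# Apply that order as factor levels; ggplot2 draws discrete axes in level order
d.long$Team1 <- factor(d.long$Team1, levels = team_order)
levels(d.long$Team1)
```

Passing d.long to the same ggplot() call as in the question should then draw the teams in ascending order of Win.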

The challenge your question presents is how to reorder a factor Team1 based on a subset of values in a melted column.
The comments to your question from @alistaire and @joran link to great answers.
The tl;dr answer is to just apply the ordering from your original, unmelted data.frame to the new one using levels().
library(reshape2)
#Picking up from your example code:
d.nfl.melt <- melt(d.nfl[,c('Team1','Win','points')],id.vars = 1)
levels(d.nfl.melt$Team1)
#Current order is alphabetical
#[1] "Chicago" "Detroit" "GreenBay" "Vikings"
#Reorder based on Wins (using the same order from your earlier, unmelted data.frame)
d.nfl.melt$Team1 <- factor(d.nfl.melt$Team1, levels = levels(d.nfl$orderedTeam)) #SOLUTION
levels(d.nfl.melt$Team1)
#New order is ascending by wins
#[1] "GreenBay" "Detroit" "Chicago" "Vikings"
ggplot(d.nfl.melt,aes(x = Team1,y = value)) +
geom_bar(aes(fill = variable),position = "dodge", stat="identity") + coord_flip()

Related

Is there a way to order a stacked bar plot in Rstudio ggplot such that the bars are ordered based on the increasing or decreasing of a subset? [duplicate]

I'm trying to plot my data as a stacked bar with 3 levels ("catg"), but I want the categories on the x-axis to appear in increasing order of their value in the "low" sub-category.
Here is a reproducible example:
#creating df:
library(ggplot2)
set.seed(33)
df<-data.frame(value=runif(12),
catg=factor(rep(c("high","medium","low")),
levels = c("high","medium","low")),
var_name=(c(rep("question1",3),rep("question2",3),rep("question3",3),rep("question4",3))))
#plotting
bar_dist<-ggplot(df,aes(x=var_name,
y=value,
fill=catg,
label=round(value,2)))
bar_dist+ geom_bar(stat = "identity",
position = "dodge",
width = 0.7)+
coord_flip() +
xlab("questions")+
ylab("y")+
geom_text(size = 4,position=position_dodge(width = 0.7))
In this case the order should be question3, then 4, 1, and finally 2.
Any help would be appreciated.
One way could be :
df$var_name=factor(df$var_name,levels=rev(levels(reorder(df[df$catg=="low",]$var_name,df[df$catg=="low",]$value))))
It uses reorder() as suggested by Richard Telford to reorder the levels according to df$value after filtering df to keep only the "low".
levels() is used to extract the levels from the previous function.
rev() is used to reverse the levels.
factor() reassigns the levels to df$var_name
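The one-liner above can be unpacked into its four steps. This is only a sketch of the same approach in base R; the intermediate names low and ord are assumptions introduced for readability, and the data frame is rebuilt here so the block runs on its own.

```r
set.seed(33)
df <- data.frame(
  value    = runif(12),
  catg     = factor(rep(c("high", "medium", "low"), 4),
                    levels = c("high", "medium", "low")),
  var_name = rep(paste0("question", 1:4), each = 3)
)

# Step 1: keep only the "low" rows
low <- df[df$catg == "low", ]

# Step 2: reorder() sorts the levels ascending by the "low" value
ord <- reorder(low$var_name, low$value)

# Steps 3 and 4: reverse the levels and apply them to the full column
df$var_name <- factor(df$var_name, levels = rev(levels(ord)))
levels(df$var_name)
```

Dropping rev() gives the opposite direction, which may be what you want depending on whether coord_flip() is used.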
or :
df$var_name=factor(df$var_name,levels = df[with(df,order(value,decreasing = T)) ,][df[with(df,order(value,decreasing = T)) ,]$catg=="low",]$var_name)
It sorts df on df$value (by decreasing value), filters on df$catg for "low" and retrieves df$var_name which is used as levels in factor().
The same plotting function is then used:
A solution that doesn't modify the data frame, using fct_reorder2() from the forcats library:
library(forcats)
bar_dist <- ggplot(df,
aes(
x = fct_reorder2(var_name, catg, value),
y = value, fill = catg,
label = round(value, 2)))
bar_dist + geom_bar(stat = "identity",
position = "dodge",
width = 0.7) +
coord_flip() +
xlab("questions") +
ylab("y") +
geom_text(size = 4, position = position_dodge(width = 0.7))

Problem with naming x-axis with ggplot2 in Rstudio

I'm trying to create a variation of a Pareto chart.
Working through the code, I hit a problem I have not been able to solve on my own for several hours. It concerns the data order in ggplot2 (1) and relabeling the axis accordingly (2).
(1) Since I want to create an ordered bar plot with a saturation curve, I created a dummy variable from X to X-1, so my bars are sorted from high to low, as you can see in output (1).
By working around this problem I created a second problem I can't fix.
(2) I have a column in my df containing all the species I want to see on the x-axis. However, ggplot won't print them accordingly. In fact, since I added the command I don't get any labels on the x-axis at all, and I don't get an error either.
So my question is:
Is there a way to use my species list as the x-axis? (Remember that my data has to be sorted from high to low.)
Or does someone spot an easy way to solve the labeling problem?
cheers
dfb
     Beech id     proc     kommu    Order
1   Va fla  1 8.749851  8.749851  Psocopt
2       Er  2 7.793812 16.543663    Acari
3 Faga dou  3 7.659406 24.203069     Dipt
4      Tro  4 6.675941 30.879010    Acari
5  Hal ann  5 6.289307 37.168317     Dipt
6    Stigm  6 3.724406 40.892723    Acari
7   Di fag  7 3.642574 44.535297 Lepidopt
8    Phyfa  8 3.390545 47.925842 Neoptera
9   Phylma  9 2.766040 50.691881 Lepidopt
data example:
structure(list(Beech = c("Va fla", "Er", "Faga dou", "Tro", "Hal ann",
"Stigm", "Di fag", "Phyfa", "Phylma"), id = c(1, 2, 3, 4, 5,
6, 7, 8, 9), proc = c(8.749851, 7.793812, 7.659406, 6.675941,
6.289307, 3.724406, 3.642574, 3.390545, 2.76604), kommu = c(8.749851,
16.543663, 24.203069, 30.87901, 37.168317, 40.892723, 44.535297,
47.925842, 50.691881), Order = c("Psocopt", "Acari", "Dipt",
"Acari", "Dipt", "Acari", "Lepidopt", "Neoptera", "Lepidopt")), row.names = c(NA,
-9L), class = c("tbl_df", "tbl", "data.frame"))
library(openxlsx)
library(ggplot2)
dfb <- data.xlsx ###(df containing different % values per species)
labelb <- dfb$Beech ###(list of 22 items; same number as x-values)
p <-ggplot(dfb, aes(x=id))
p <- p + geom_bar(aes(y = proc), stat = "identity", fill = "lightgreen")
p <- p + geom_line(aes(y = kommu/10), color = "orange", size = 2) + geom_point(aes(y = kommu/10),size = 2)
p <- p + scale_y_continuous(sec.axis = sec_axis(~.*10, name ="Total biocoenosis[%]"))
p <- p + labs(y = "Species [%]",
x = "Species")
p <- p + scale_x_discrete(labels = labelb)
p <- p + theme(legend.position = c(0.8, 0.9))
--> Answer to other comments:
So basically my problem is the bars are not labeled with a species name.
I know that this is a result due to my dummyvar, which is basically 1 to 22.
So I try to force ggplot to name the x-axis with my wanted values.
But this input doesn't work
p <- p + scale_x_discrete(labels = labelb)
But back to your suggestions:
Yeah, I tried tidyverse just after creating this post and couldn't handle it well enough. But your idea doesn't change anything for me; it gives the same result as the plain ggplot command.
dfb %>%
arrange(Beech) %>%
mutate(Beech = factor(Beech, levels = unique(.$Beech))) %>%
ggplot(aes(Beech, proc)) +
geom_col()
I can't quite tell from the picture what's going wrong, but one way to make sure your bar plots are in ascending/descending order is to arrange the column and then convert it to a factor using the existing order of the categories:
So, without ordering:
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarize(price = mean(price)) %>%
ggplot(aes(cut, price)) +
geom_bar(stat = "identity")
And with ordering:
diamonds %>%
group_by(cut) %>%
summarize(price = mean(price)) %>%
arrange(price) %>%
mutate(cut = factor(cut, levels = unique(.$cut))) %>%
ggplot(aes(cut, price)) +
geom_bar(stat = "identity")
I edited your code using the data sample you provided, and I think I was able to do what you wanted.
Basically, I sorted Beech by descending proc and then converted it to a factor. Here is the modified code and the result:
p <-
dfb %>%
arrange(desc(proc)) %>%
mutate(Beech = factor(Beech, levels = unique(.$Beech))) %>%
ggplot(aes(Beech)) +
geom_bar(aes(y = proc), stat = "identity", fill = "lightgreen") +
geom_line(aes(y = kommu/10, x=as.integer(Beech)), color = "orange", size = 2) +
geom_point(aes(y = kommu/10),size = 2) +
labs(y = "Species [%]", x = "Species") +
scale_x_discrete("Species") +
scale_y_continuous(sec.axis = sec_axis(~.*10, name ="Total biocoenosis[%]")) +
theme(legend.position = c(0.8, 0.9))
p
Note: I had to tweak a bit the geom_line by adding x=as.integer(Beech) because it works with numbers and not factors.
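The data preparation behind this answer can be sketched in base R, without the pipe. The sketch below uses four rows of the sample data in shuffled order; treating kommu as the running total of proc is an assumption based on the sample values, not something the question states.

```r
# Four rows of the sample data, deliberately out of order
dfb <- data.frame(
  Beech = c("Faga dou", "Va fla", "Tro", "Er"),
  proc  = c(7.659406, 8.749851, 6.675941, 7.793812)
)

# Sort by descending proc, then freeze that order in the factor levels
dfb <- dfb[order(dfb$proc, decreasing = TRUE), ]
dfb$Beech <- factor(dfb$Beech, levels = unique(dfb$Beech))

# Assumed: the cumulative curve is the running total of proc
dfb$kommu <- cumsum(dfb$proc)

# Integer codes of the factor give numeric x positions for geom_line
dfb$pos <- as.integer(dfb$Beech)
dfb
```

Because the levels follow the sorted rows, the bars keep their species labels on the x-axis and the line can be placed with the integer positions.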

R ggplot specific order of bars

So, I am making several descending-ordered bar plots in R using ggplot. Each of these plots contains one bar named "others", which should always be the last bar. What is the best way to achieve this? More generally: is there an easy way to pick one bar of a bar plot and move it to the last position without manually changing all levels?
Many thanks in advance,
chris
The trick is to use factor, as follows
library(ggplot2) # for plots
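# dummy data
# stringsAsFactors = TRUE makes letters a factor (no longer the default since R 4.0.0),
# so that levels(dat$letters) below is not NULL
dat <- data.frame(
letters = c("A","B","Other","X","Y","Z"),
probs = sample(runif(6)*10,6),
stringsAsFactors = TRUE
)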
# not what we want
ggplot(dat, aes(letters, probs)) + geom_bar(stat = "identity")
# magic happens here
# factor, and push Other to end of levels using c(others, other)
dat$letters <- factor(
dat$letters,
levels = c(
levels(dat$letters)[!levels(dat$letters) == "Other"],
"Other")
)
ggplot(dat, aes(letters, probs)) + geom_bar(stat = "identity")
If you're using + coord_flip(), use levels = rev(c(...)) for intuitive ordering
dat$letters <- factor(
dat$letters,
levels = rev(c(
levels(dat$letters)[!levels(dat$letters) == "Other"],
"Other"))
)
ggplot(dat, aes(letters, probs)) + geom_bar(stat = "identity") + coord_flip()
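For the descending-ordered case from the question, the same trick can be wrapped in a small helper and combined with reorder(). This is a sketch only: move_level_last is a hypothetical helper name, not a function from ggplot2 or any other package, and the data values are made up for illustration.

```r
# Hypothetical helper: move one level to the end, keeping the others in order
move_level_last <- function(x, level = "Other") {
  x <- as.factor(x)
  if (!level %in% levels(x)) return(x)  # nothing to move
  factor(x, levels = c(setdiff(levels(x), level), level))
}

dat <- data.frame(
  letters = c("A", "B", "Other", "X"),
  probs   = c(2, 9, 7, 4)
)

# Descending by probs: B, Other, X, A
dat$letters <- reorder(dat$letters, -dat$probs)

# Then push "Other" to the end: B, X, A, Other
dat$letters <- move_level_last(dat$letters)
levels(dat$letters)
```

With coord_flip(), wrapping the levels in rev() as shown above gives the same top-to-bottom reading order.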

Don't drop zero count: dodged barplot

I am making a dodged barplot in ggplot2 and one grouping has a zero count that I want to display. I remembered seeing this on HERE a while back and figured the scale_x_discrete(drop=F) would work. It does not appear to work with dodged bars. How can I make the zero counts show?
For instance, (code below) in the plot below, type8~group4 has no examples. I would still like the plot to display the empty space for the zero count instead of eliminating the bar. How can I do this?
mtcars2 <- data.frame(type=factor(mtcars$cyl),
group=factor(mtcars$gear))
m2 <- ggplot(mtcars2, aes(x=type , fill=group))
p2 <- m2 + geom_bar(colour="black", position="dodge") +
scale_x_discrete(drop=F)
p2
Here's how you can do it without making summary tables first.
It did not work in my CRAN version (2.2.1), but in the latest development version of ggplot (2.2.1.900) I had no issues.
ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
geom_bar(position = position_dodge(preserve = "single"))
http://ggplot2.tidyverse.org/reference/position_dodge.html
Updated geom_bar() needs stat = "identity"
For what it's worth: The table of counts, dat, above contains NA. Sometimes, it is useful to have an explicit 0 instead; for instance, if the next step is to put counts above the bars. The following code does just that, although it's probably no simpler than Joran's. It involves two steps: get a crosstabulation of counts using dcast, then melt the table using melt, followed by ggplot() as usual.
library(ggplot2)
library(reshape2)
mtcars2 = data.frame(type=factor(mtcars$cyl), group=factor(mtcars$gear))
dat = dcast(mtcars2, type ~ group, fun.aggregate = length)
dat.melt = melt(dat, id.vars = "type", measure.vars = c("3", "4", "5"))
dat.melt
ggplot(dat.melt, aes(x = type,y = value, fill = variable)) +
geom_bar(stat = "identity", colour = "black", position = position_dodge(width = .8), width = 0.7) +
ylim(0, 14) +
geom_text(aes(label = value), position = position_dodge(width = .8), vjust = -0.5)
The only way I know of is to pre-compute the counts and add a dummy row:
library(plyr) # provides ddply
dat <- rbind(ddply(mtcars2,.(type,group),summarise,count = length(group)),c(8,4,NA))
ggplot(dat,aes(x = type,y = count,fill = group)) +
geom_bar(colour = "black",position = "dodge",stat = "identity")
I thought that using stat_bin(drop = FALSE,geom = "bar",...) instead would work, but apparently it does not.
I asked this same question, but I only wanted to use data.table, as it's a faster solution for much larger data sets. I included notes on the data so that those that are less experienced and want to understand why I did what I did can do so easily. Here is how I manipulated the mtcars data set:
library(data.table)
library(scales)
library(ggplot2)
mtcars <- data.table(mtcars)
mtcars$Cylinders <- as.factor(mtcars$cyl) # Creates new column with data from cyl called Cylinders as a factor. This allows ggplot2 to automatically use the name "Cylinders" and recognize that it's a factor
mtcars$Gears <- as.factor(mtcars$gear) # Just like above, but with gears to Gears
setkey(mtcars, Cylinders, Gears) # Set key for 2 different columns
mtcars <- mtcars[CJ(unique(Cylinders), unique(Gears)), .N, by = .EACHI, allow.cartesian = TRUE] # Uses CJ to create a complete list of all unique combinations of Cylinders and Gears, then counts how many rows match each combination and reports it in a column called "N". by = .EACHI is needed in current data.table versions to get one count per combination (0 where none exist)
And here is the call that produced the graph
ggplot(mtcars, aes(x=Cylinders, y = N, fill = Gears)) +
geom_bar(position="dodge", stat="identity") +
ylab("Count") + theme(legend.position="top") +
scale_x_discrete(drop = FALSE)
And it produces this graph:
Furthermore, if there is continuous data, like that in the diamonds data set (thanks to mnel):
library(data.table)
library(scales)
library(ggplot2)
diamonds <- data.table(diamonds) # I modified the diamonds data set in order to create gaps for illustrative purposes
setkey(diamonds, color, cut)
diamonds[J("E",c("Fair","Good")), carat := 0]
diamonds[J("G",c("Premium","Good","Fair")), carat := 0]
diamonds[J("J",c("Very Good","Fair")), carat := 0]
diamonds <- diamonds[carat != 0]
Then using CJ would work as well.
data <- data.table(diamonds)[,list(mean_carat = mean(carat)), keyby = c('cut', 'color')] # This step defines our data set as the combinations of cut and color that exist and their means. However, the problem with this is that it doesn't have all combinations possible
data <- data[CJ(unique(cut),unique(color))] # This functions exactly the same way as it did in the discrete example. It creates a complete list of all possible unique combinations of cut and color
ggplot(data, aes(color, mean_carat, fill=cut)) +
geom_bar(stat = "identity", position = "dodge") +
ylab("Mean Carat") + xlab("Color")
Giving us this graph:
Use count from dplyr and complete from tidyr to do this.
library(tidyverse)
mtcars %>%
mutate(
type = as.factor(cyl),
group = as.factor(gear)
) %>%
count(type, group) %>%
complete(type, group, fill = list(n = 0)) %>%
ggplot(aes(x = type, y = n, fill = group)) +
geom_bar(colour = "black", position = "dodge", stat = "identity")
You can exploit the feature of the table() function, which computes the number of occurrences of a factor for all its levels
# load plyr package to use ddply
library(plyr)
# compute the counts using ddply, including zero occurrences for some factor levels
df <- ddply(mtcars2, .(group), summarise,
types = as.numeric(names(table(type))),
counts = as.numeric(table(type)))
# plot the results
ggplot(df, aes(x = types, y = counts, fill = group)) +
geom_bar(stat='identity',colour="black", position="dodge")
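The same table() behaviour can be used without plyr. The following is a minimal base-R sketch, assuming a two-way table is acceptable: table() counts every combination of factor levels, including those with zero rows, and as.data.frame() turns the result into long format with a Freq column. The name counts is an assumption made for illustration.

```r
mtcars2 <- data.frame(type  = factor(mtcars$cyl),
                      group = factor(mtcars$gear))

# Cross-tabulate both factors; empty combinations are kept with a count of 0
counts <- as.data.frame(table(type = mtcars2$type, group = mtcars2$group))
counts

# The 8-cylinder, 4-gear combination is present with an explicit zero
counts[counts$type == "8" & counts$group == "4", ]
```

The resulting data frame can be passed to ggplot() with y = Freq and stat = "identity", in the same way as the other pre-computed count tables above.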
