Using the following dataframe and ggplot...
sample ="BC04"
df<- data.frame(Name=c("Pseudomonas veronii", "Pseudomonas stutzeri", "Janthinobacterium lividum", "Pseudomonas viridiflava"),
Abundance=c(7.17, 4.72, 3.44, 3.33))
ggplot(data=df, aes(x=sample, y=Abundance, fill=Name)) +
geom_bar(stat="identity")
... creates the following graph
barplot
Although geom_bar() is called with stat="identity", it still ignores the row order of the dataframe. I would like the stack order to follow the Abundance percentage (highest percentage at the top, in ascending order).
Previously, strings passed to ggplot were evaluated with aes_string() (which is now deprecated). Now we convert the string to a symbol with rlang::sym() and unquote it with !!
library(ggplot2)
ggplot(data=df, aes(x= !! rlang::sym(sample), y=Abundance, fill=Name)) +
geom_bar(stat="identity")
Or another option is the .data pronoun
ggplot(data=df, aes(x= .data[[sample]], y=Abundance, fill=Name)) +
geom_bar(stat="identity")
Update
Judging from the plot, the OP may have created a column named 'sample'. In that case, we reorder 'Name' by the descending order of 'Abundance'
library(dplyr) # for desc()
df$sample <- "BC04"
ggplot(data = df, aes(x = sample, y = Abundance,
fill = reorder(Name, desc(Abundance)))) +
geom_bar(stat = 'identity')+
guides(fill = guide_legend(title = "Name"))
-output
Or another option is to convert the 'Name' to factor with levels mentioned as the unique elements of 'Name' (as the data is already arranged in descending order of 'Abundance')
library(dplyr)
df %>%
mutate(Name = factor(Name, levels = unique(Name))) %>%
ggplot(aes(x = sample, y = Abundance, fill = Name)) +
geom_bar(stat = 'identity')
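Both variants rely on the same idea: the stack order follows the factor levels of the fill variable, so it can help to inspect the levels before plotting. A minimal sketch, assuming the df from the question; df_ordered is a made-up name used only for illustration:

```r
library(ggplot2)

df <- data.frame(
  Name = c("Pseudomonas veronii", "Pseudomonas stutzeri",
           "Janthinobacterium lividum", "Pseudomonas viridiflava"),
  Abundance = c(7.17, 4.72, 3.44, 3.33)
)
df$sample <- "BC04"

# df_ordered is a hypothetical name; levels are sorted by decreasing Abundance
df_ordered <- df
df_ordered$Name <- reorder(df_ordered$Name, -df_ordered$Abundance)

# Inspect the level order; the first level ends up at the top of the stack
print(levels(df_ordered$Name))

p <- ggplot(df_ordered, aes(x = sample, y = Abundance, fill = Name)) +
  geom_col()
```

Printing p draws the bar. Checking levels() first is a cheap way to confirm the order without reading it off the plot.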
Related
Code to reproduce the issue I have:
library("data.table")
library("ggplot2")
DT<-data.table(team=c("Q1","Q2","Q3"), mon=c(3,5,2), tues=c(4,2,1), weds=c(4,2,5))
DT<-melt(DT,id.vars = "team", measure.vars = c("mon","tues","weds"))
chartdata<-DT[,.(team, day=variable, score=value)]
ggplot(chartdata, aes(fill=day, y=score, x=team)) +
geom_bar(position="dodge", stat="identity")
This produces a clustered barplot. I need to set the order by Monday's score (descending), but can't see a way of doing this. I have tried:
ggplot(chartdata, aes(fill=day, y=score, x=reorder(team, {-score}))) +
geom_bar(position="dodge", stat="identity")
but this appears to sort the data by the totals of Monday to Wednesday, not by Monday alone as I want.
Is this possible? Many thanks!
You can sort your dataframe before passing it to ggplot2 and fix the factor levels of the variable used for the x axis:
library(dplyr)
library(ggplot2)
chartdata %>%
arrange(day, -score) %>%
mutate(team = factor(team, unique(team))) %>%
ggplot(aes(x = team, y = score, fill = day))+
geom_col(position = position_dodge())
Is this what you are looking for?
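The same ordering can be sketched in base R by taking the level order from Monday's rows only. This is an illustration, not the answerer's code: chartdata is rebuilt here as a plain data.frame with the values from the question, and monday and team_order are made-up names:

```r
library(ggplot2)

# Same values as the melted table in the question
chartdata <- data.frame(
  team  = rep(c("Q1", "Q2", "Q3"), times = 3),
  day   = rep(c("mon", "tues", "weds"), each = 3),
  score = c(3, 5, 2, 4, 2, 1, 4, 2, 5)
)

# Keep only Monday's rows and sort the teams by that day's score, descending
monday <- chartdata[chartdata$day == "mon", ]
team_order <- as.character(monday$team[order(-monday$score)])

# Fix the x-axis order through the factor levels
chartdata$team <- factor(chartdata$team, levels = team_order)

p <- ggplot(chartdata, aes(x = team, y = score, fill = day)) +
  geom_col(position = "dodge")
```

Because the levels are computed from one day only, the scores of the other days cannot influence the order.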
In ggplot, I want to compute the means (per group) and plot them as points. I would like to do that with geom_point(), and not stat_summary().
Here are my data.
group = rep(c('a', 'b'), each = 3)
grade = 1:6
df = data.frame(group, grade)
# this does the job
ggplot(df, aes(group, grade)) +
stat_summary(fun.y = 'mean', geom = 'point')
# but this does not
ggplot(df, aes(group, grade)) +
geom_point(stat = 'mean')
What values can the stat argument above take?
Is it possible to compute the means, using geom_point(), without computing a new data frame?
You could do
ggplot(df, aes(group, grade)) +
geom_point(stat = 'summary', fun = "mean")
(In ggplot2 versions before 3.3.0 this argument was called fun.y; it has since been deprecated in favour of fun.)
But in general it's really not a great idea to rely on ggplot to do your data manipulation for you. Just let ggplot take care of the plotting. You can use packages like dplyr to help with the summarizing
library(dplyr)
df %>% group_by(group) %>%
summarize(grade=mean(grade)) %>%
ggplot(aes(group, grade)) +
geom_point()
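If you would rather not load dplyr, the same summary can be sketched with base R's aggregate(). The name means is made up for this illustration:

```r
library(ggplot2)

df <- data.frame(group = rep(c("a", "b"), each = 3), grade = 1:6)

# aggregate() returns one row per group holding the mean grade
means <- aggregate(grade ~ group, data = df, FUN = mean)

p <- ggplot(means, aes(group, grade)) +
  geom_point()
```

The plot then uses the default identity stat, so what is drawn is exactly what is in means.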
How do I order a set of variable names along the x-axis that contain letters and numbers? These come from a survey where the variables are formatted like var1, below. But when plotted, they appear as out_1, out_10, out_11...
But what I would like is for it to be plotted out_1, out_2...
library(tidyverse)
var1<-rep(paste0('out','_', seq(1,12,1)), 100)
var2<-rnorm(n=length(var1) ,mean=2)
df<-data.frame(var1, var2)
ggplot(df, aes(x=var1, y=var2))+geom_boxplot()
I tried this:
df %>%
separate(var1, into=c('A', 'B'), sep='_') %>%
arrange(B) %>%
ggplot(., aes(x=B, y=var2))+geom_boxplot()
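You can order the levels of var1 before plotting. Convert the column to a factor whose levels follow the order of first appearance (assigning to levels() directly would only rename the levels, not reorder them):
df$var1 <- factor(df$var1, levels = unique(df$var1))
ggplot(df, aes(var1,var2)) + geom_boxplot()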
Or you can specify the order in the ggplot scale options with limits (labels alone would only rename the ticks without moving the boxes):
ggplot(df, aes(var1,var2)) +
geom_boxplot() +
scale_x_discrete(limits = unique(as.character(df$var1)))
Both cases will give the same result:
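You can also use the scale to give personalized labels; there's no need to create a new variable. Passing a function to labels keeps each label tied to its own value:
ggplot(df, aes(var1, var2)) +
geom_boxplot() +
scale_x_discrete('output', limits = unique(as.character(df$var1)),
labels = function(x) gsub('out_', '', x))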
Check ?discrete_scale for details. You can use breaks and labels in different combinations, including the use of labels that came from outside your data.frame:
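pers.labels <- paste('Output', 1:12)
# limits fixes the order so that the i-th label lines up with out_i
ggplot(df, aes(var1, var2)) +
geom_boxplot() +
scale_x_discrete(NULL, limits = unique(as.character(df$var1)), labels = pers.labels)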
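Ordering by first appearance only works when the data already arrive as out_1, out_2, and so on. A minimal sketch for the general case, assuming every name has the form out_<number>; the names labs and num are made up for illustration, and the data are deliberately built in reverse order:

```r
library(ggplot2)

set.seed(1)
var1 <- rep(paste0("out_", 12:1), 10)  # deliberately not in ascending order
df <- data.frame(var1 = var1, var2 = rnorm(length(var1), mean = 2))

# Extract the numeric suffix and sort the distinct names by it
labs <- unique(as.character(df$var1))
num <- as.numeric(sub("^out_", "", labs))
df$var1 <- factor(df$var1, levels = labs[order(num)])

p <- ggplot(df, aes(var1, var2)) +
  geom_boxplot()
```

Sorting on the extracted number rather than on the string is what avoids the out_1, out_10, out_11 sequence.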
I am making a dodged barplot in ggplot2 and one grouping has a zero count that I want to display. I remembered seeing this on HERE a while back and figured the scale_x_discrete(drop=F) would work. It does not appear to work with dodged bars. How can I make the zero counts show?
For instance, (code below) in the plot below, type8~group4 has no examples. I would still like the plot to display the empty space for the zero count instead of eliminating the bar. How can I do this?
mtcars2 <- data.frame(type=factor(mtcars$cyl),
group=factor(mtcars$gear))
m2 <- ggplot(mtcars2, aes(x=type , fill=group))
p2 <- m2 + geom_bar(colour="black", position="dodge") +
scale_x_discrete(drop=F)
p2
Here's how you can do it without making summary tables first.
It did not work in my CRAN version (2.2.1), but in the latest development version of ggplot2 (2.2.1.900) I had no issues.
ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
geom_bar(position = position_dodge(preserve = "single"))
http://ggplot2.tidyverse.org/reference/position_dodge.html
Updated: geom_bar() needs stat = "identity".
For what it's worth: The table of counts, dat, above contains NA. Sometimes, it is useful to have an explicit 0 instead; for instance, if the next step is to put counts above the bars. The following code does just that, although it's probably no simpler than Joran's. It involves two steps: get a crosstabulation of counts using dcast, then melt the table using melt, followed by ggplot() as usual.
library(ggplot2)
library(reshape2)
mtcars2 = data.frame(type=factor(mtcars$cyl), group=factor(mtcars$gear))
dat = dcast(mtcars2, type ~ group, fun.aggregate = length)
dat.melt = melt(dat, id.vars = "type", measure.vars = c("3", "4", "5"))
dat.melt
ggplot(dat.melt, aes(x = type,y = value, fill = variable)) +
geom_bar(stat = "identity", colour = "black", position = position_dodge(width = .8), width = 0.7) +
ylim(0, 14) +
geom_text(aes(label = value), position = position_dodge(width = .8), vjust = -0.5)
The only way I know of is to pre-compute the counts and add a dummy row:
library(plyr) # provides ddply() and summarise()
dat <- rbind(ddply(mtcars2,.(type,group),summarise,count = length(group)),c(8,4,NA))
ggplot(dat,aes(x = type,y = count,fill = group)) +
geom_bar(colour = "black",position = "dodge",stat = "identity")
I thought that using stat_bin(drop = FALSE,geom = "bar",...) instead would work, but apparently it does not.
I asked this same question, but I only wanted to use data.table, as it's a faster solution for much larger data sets. I included notes on the data so that those that are less experienced and want to understand why I did what I did can do so easily. Here is how I manipulated the mtcars data set:
library(data.table)
library(scales)
library(ggplot2)
mtcars <- data.table(mtcars)
mtcars$Cylinders <- as.factor(mtcars$cyl) # Creates new column with data from cyl called Cylinders as a factor. This allows ggplot2 to automatically use the name "Cylinders" and recognize that it's a factor
mtcars$Gears <- as.factor(mtcars$gear) # Just like above, but with gears to Gears
setkey(mtcars, Cylinders, Gears) # Set key for 2 different columns
mtcars <- mtcars[CJ(unique(Cylinders), unique(Gears)), .N, by = .EACHI, allow.cartesian = TRUE] # Uses CJ to create a completed list of all unique combinations of Cylinders and Gears. With by = .EACHI (needed in current data.table versions) it then counts how many rows match each combination, giving 0 where there are none, and reports it in a column called "N"
And here is the call that produced the graph
ggplot(mtcars, aes(x=Cylinders, y = N, fill = Gears)) +
geom_bar(position="dodge", stat="identity") +
ylab("Count") + theme(legend.position="top") +
scale_x_discrete(drop = FALSE)
And it produces this graph:
Furthermore, if there is continuous data, like that in the diamonds data set (thanks to mnel):
library(data.table)
library(scales)
library(ggplot2)
diamonds <- data.table(diamonds) # I modified the diamonds data set in order to create gaps for illustrative purposes
setkey(diamonds, color, cut)
diamonds[J("E",c("Fair","Good")), carat := 0]
diamonds[J("G",c("Premium","Good","Fair")), carat := 0]
diamonds[J("J",c("Very Good","Fair")), carat := 0]
diamonds <- diamonds[carat != 0]
Then using CJ would work as well.
data <- data.table(diamonds)[,list(mean_carat = mean(carat)), keyby = c('cut', 'color')] # This step defines our data set as the combinations of cut and color that exist and their means. However, the problem with this is that it doesn't have all combinations possible
data <- data[CJ(unique(cut),unique(color))] # This functions exactly the same way as it did in the discrete example. It creates a complete list of all possible unique combinations of cut and color
ggplot(data, aes(color, mean_carat, fill=cut)) +
geom_bar(stat = "identity", position = "dodge") +
ylab("Mean Carat") + xlab("Color")
Giving us this graph:
Use count from dplyr and complete from tidyr to do this (both are loaded by the tidyverse).
library(tidyverse)
mtcars %>%
mutate(
type = as.factor(cyl),
group = as.factor(gear)
) %>%
count(type, group) %>%
complete(type, group, fill = list(n = 0)) %>%
ggplot(aes(x = type, y = n, fill = group)) +
geom_bar(colour = "black", position = "dodge", stat = "identity")
You can exploit the fact that the table() function counts the occurrences of a factor for all of its levels, including levels that do not occur:
# load plyr package to use ddply
library(plyr)
# compute the counts using ddply, including zero occurrences for some factor levels
df <- ddply(mtcars2, .(group), summarise,
types = as.numeric(names(table(type))),
counts = as.numeric(table(type)))
# plot the results
ggplot(df, aes(x = types, y = counts, fill = group)) +
geom_bar(stat='identity',colour="black", position="dodge")
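A two-way table() gives the same zero counts without plyr. This is a sketch under the assumption that a plain cross-tabulation is all that is needed; counts is a made-up name, and Freq is the column name that as.data.frame() assigns by default:

```r
library(ggplot2)

mtcars2 <- data.frame(type = factor(mtcars$cyl),
                      group = factor(mtcars$gear))

# Cross-tabulate both factors; combinations that never occur get a count of 0
counts <- as.data.frame(table(type = mtcars2$type, group = mtcars2$group))

p <- ggplot(counts, aes(x = type, y = Freq, fill = group)) +
  geom_col(colour = "black", position = "dodge")
```

Since every type and group combination has a row, the dodged bars keep their slot even when the count is zero.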