I am trying to make a weighted dodged bar plot with ggplot2. With stacked bars the behavior is as expected:
df <- data.frame(group = rep(letters[1:3], each = 4),
sex = rep(c("m", "f"), times = 6),
weight = 1:12)
ggplot(df, aes(x = group, fill = sex, y = weight)) +
geom_bar(stat = "identity")
The bars have length equal to the total weight.
If I add position = "dodge", the female bar in group a has length 4 rather than the expected 6. Similarly, every other bar is only as long as the highest weight in its group and sex combination rather than representing the total weight.
ggplot(df, aes(x = group, fill = sex, y = weight)) +
geom_bar(stat = "identity", position = "dodge")
How do I make the bar lengths match the total weight?
You can first summarise the data in your desired way and then plot it:
library(dplyr)
library(ggplot2)
df %>%
group_by(group, sex) %>%
summarise(total_weight = sum(weight)) %>%
ggplot(aes(x = group, fill = sex, y = total_weight)) +
geom_bar(stat = "identity", position = "dodge")
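The problem with your original approach is that you have several values of weight for each group and sex combination. With stat = "identity" each value gets its own bar, and with position = "dodge" the bars belonging to one combination are drawn on top of each other, so only the tallest one is visible. This can be visualized: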
ggplot(df, aes(x = group, fill = sex, y = weight)) +
geom_bar(stat = "identity", position = "dodge", color = "black", alpha = 0.5)
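To see the overplotting in numbers, the following sketch (base R only; the names `totals` and `tallest` are illustrative and not part of the original answer) compares the sum per combination with the maximum per combination. The maximum is what stays visible when the rectangles are drawn over each other, which is why the female bar in group a reaches 4 instead of 6.

```r
df <- data.frame(group = rep(letters[1:3], each = 4),
                 sex = rep(c("m", "f"), times = 6),
                 weight = 1:12)

# Total weight per group/sex combination: the bar length the question expects
totals <- aggregate(weight ~ group + sex, data = df, FUN = sum)

# Largest single weight per combination: the bar length that stays visible
# when the dodged rectangles are drawn over each other
tallest <- aggregate(weight ~ group + sex, data = df, FUN = max)

subset(totals, group == "a" & sex == "f")$weight   # 6
subset(tallest, group == "a" & sex == "f")$weight  # 4
```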
@kath's explanation is correct.
Another alternative, if you don't want to summarise the data frame before passing it to ggplot(): use stat_summary() instead of geom_bar(). In ggplot2 3.3.0 and later the summary function is passed as fun (older versions used fun.y):
ggplot(df, aes(x = group, fill = sex, y = weight)) +
stat_summary(geom = "bar", position = "dodge", fun = sum)
I have this data.frame:
df <- data.frame(id = c("A","A","B","B","C","C"),
age = rep(c("young", "old"), 3),
value = c(20,15,7,5,2,6))
I'd like to plot it using ggplot2's geom_bar such that the bars are first separated (dodged) by age (but with no gaps between them) and then separated by id (along the x axis, with gaps), and are colored by id.
I'm only familiar with setting either the aes(x) argument to id and the fill argument to age:
ggplot(df, aes(x = id, y = value)) +
geom_bar(aes(fill = age), position = "dodge", stat = "identity") +
theme_minimal()
Or the opposite - the aes(x) argument to age and the fill argument to id:
ggplot(df, aes(x = age, y = value)) +
geom_bar(aes(fill = id), position = "dodge", stat = "identity") +
theme_minimal()
But what I want is a plot that looks like the first one above, only filled by id rather than by age.
There's probably a combination of position and/or stat values that achieves that. Any ideas?
You can map the group aesthetic to age inside aes() while filling by id:
ggplot(df, aes(x = id, y = value)) +
geom_bar(aes(group = age, fill = id), position = "dodge", stat = "identity") +
theme_minimal()
I would like to draw a line (or points) on top of my stacked bar plot. Since I have no single data points to refer to (only the separate values, not their sum), I don't know how to add such a line. The code below produces the stacked bars.
I want to add a black line along the top of the bars (my real data are not linear):
library(tidyverse)
##Create some fake data
data3 <- tibble(
year = 1991:2020,
One = c(31:60),
Two = c(21:50),
Three = c(11:40)
)
##Gather the variables to create a long dataset
new_data3 <- data3 %>%
gather(model, value, -year)
##plot the data
ggplot(new_data3, aes(x = year, y = value, fill=model)) +
geom_bar(stat = "identity",position = "stack")
You can use stat_summary and sum for the summary function:
ggplot(new_data3, aes(year, value)) +
geom_col(aes(fill = model)) +
stat_summary(geom = "line", fun = sum, group = 1, size = 2)
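To check what the summary layer actually draws, the computed data can be inspected with layer_data(). This is a minimal sketch rather than part of the original answer: it assumes ggplot2 3.3.0 or later (for the `fun` argument), rebuilds the question's fake data directly in long form, and uses `p` and `line_data` as illustrative names.

```r
library(ggplot2)

# Same fake data as in the question, built directly in long form
new_data3 <- data.frame(
  year  = rep(1991:2020, times = 3),
  model = rep(c("One", "Two", "Three"), each = 30),
  value = c(31:60, 21:50, 11:40)
)

p <- ggplot(new_data3, aes(year, value)) +
  stat_summary(geom = "line", fun = sum, group = 1)

# layer_data() returns the data frame computed for a layer;
# here it holds one summed y value per year
line_data <- layer_data(p, 1)
head(line_data[, c("x", "y")])
```

For 1991 the three models contribute 31, 21 and 11, so the first y value is 63, which is the height of the stacked bar for that year.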
You could compute the sum by year and plot it with an additional geom_line layer:
library(dplyr)
library(ggplot2)
newdata4 <- new_data3 %>%
group_by(year) %>%
summarise(total = sum(value))
ggplot(new_data3, aes(x = year, y = value, fill=model)) +
geom_bar(stat = "identity",position = "stack") +
geom_line(aes(year, total), data = newdata4, inherit.aes = FALSE, size = 2)
I'm trying to plot a basic bar chart per group.
As values are pretty big, I want to show for each bar (i.e. group) the % of each group within the bar.
I managed to show the percentage of the total, but this is not what I want: within each bar, the percentages should sum to 100%.
Is there an easy way to do it without changing the data frame?
(DF <- data.frame( year = rep(2015:2017, each = 4),
Grp = c("Grp1", "Grp2", "Grp3", "Grp4"),
Value = trunc(rnorm(12, 2000000, 100000))) )
ggplot(DF) +
geom_bar(aes(x = year, y = Value, fill = Grp),
stat = "identity",
position = position_stack()) +
geom_text(aes(x = year, y = Value, group = Grp,
label = percent(Value/sum(Value))) ,
position = position_stack(vjust = .5))
You can create a new variable holding the percentage within each year:
library(dplyr)
library(ggplot2)
library(scales)
DF <- DF %>% group_by(year) %>% mutate(ValuePer=(Value/sum(Value))) %>% ungroup()
ggplot(DF, aes(year, ValuePer, fill = Grp)) +
geom_bar(stat = "identity", position = "fill") +
geom_text(aes(label = percent(ValuePer)),
position = position_fill())+
scale_y_continuous(labels = percent_format())
Use position = "fill" to convert the stacked values into proportions, and scale_y_continuous(labels = percent_format()) to display that scale as percentages.
DF <- data.frame( year = rep(2015:2017, each = 4),
Grp = c("Grp1", "Grp2", "Grp3", "Grp4"),
Value = trunc(rnorm(12, 2000000, 100000)))
library(ggplot2)
library(scales)
ggplot(DF, aes(year, Value, fill = Grp)) +
geom_bar(stat = "identity", position = "fill") +
geom_text(aes(label = percent(Value / ave(Value, year, FUN = sum))),  # share within each year, not of the overall total
position = position_fill()) +
scale_y_continuous(labels = percent_format())
OK, gathering all your tricks, I finally got this:
I needed to adjust my DF, which I wanted to avoid, but it remains simple, so it works.
library(dplyr)
library(ggplot2)
library(scales)
DF <- DF %>% group_by(year) %>% mutate(ValuePer=(Value/sum(Value))) %>% ungroup()
ggplot(DF, aes(year, Value, fill = Grp)) +
geom_bar(stat = "identity", position = "stack") +
geom_text(aes(label = percent(ValuePer)),
position = position_stack()) +
scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6))
I would use a separate geom_text for each bar, filtering the data by year (bar) using dplyr. Check whether this is what you need:
(DF <- data.frame( year = rep(2015:2017, each = 4),
Grp = c("Grp1", "Grp2", "Grp3", "Grp4"),
Value = trunc(rnorm(12, 2000000, 100000))) )
library(dplyr)
ggplot(DF) +
geom_bar(aes(x = year, y = Value, fill = Grp),
stat = "identity",
position = position_stack()) +
geom_text(data = DF %>% filter(year == 2015),
aes(x = year, y = Value,
label = scales::percent(Value/sum(Value))) ,
position = position_stack(vjust = .5)) +
geom_text(data = DF %>% filter(year == 2016),
aes(x = year, y = Value,
label = scales::percent(Value/sum(Value))) ,
position = position_stack(vjust = .5)) +
geom_text(data = DF %>% filter(year == 2017),
aes(x = year, y = Value,
label = scales::percent(Value/sum(Value))) ,
position = position_stack(vjust = .5))
The group argument is not necessary here. There may be more elegant solutions, but this is the one I could think of. Let me know whether this is the output you were looking for.
Maybe create a new column that does the right computation. I could not figure out how to do the computation correctly inside aes(); the way you wrote it computes the overall percentage, whereas Value should be grouped by year.
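One possible way to keep the grouping by year in a single expression, without adding a column, is base R's ave(), which returns a group-wise statistic aligned with the original rows. The sketch below is an illustration under assumptions rather than the approach of any answer here: `share` is a hypothetical name, and set.seed() is added only to make the random data reproducible.

```r
set.seed(1)  # only for reproducibility; not in the original example
DF <- data.frame(year = rep(2015:2017, each = 4),
                 Grp = c("Grp1", "Grp2", "Grp3", "Grp4"),
                 Value = trunc(rnorm(12, 2000000, 100000)))

# ave() repeats each year's total on every row of that year,
# so the division yields the share within the year
share <- with(DF, Value / ave(Value, year, FUN = sum))

# Check: the shares of each year add up to 1
tapply(share, DF$year, sum)
```

Because the expression only refers to columns of DF, it can also be placed inside the label mapping of geom_text.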
At least you now get the actual value on the y axis and the per-year percentage inside the bars. I would advise changing the axis labels with something like this:
scale_y_continuous(breaks = seq(0,8*10^6,10^6),
labels = c(0, paste(seq(1,8,1),'M')))
You can adapt this to your context.
I have this bar chart:
group = c("A","A","B","B")
value = c(25,-75,-40,-76)
day = c(1,2,1,2)
dat = data.frame(group = group , value = value, day = day)
ggplot(data = dat, aes(x = group, y = value, fill = factor(day))) +
geom_bar(stat = "identity", position = "identity")+
geom_text(aes(label = round(value,0)), color = "black", position = "stack")
and I'd like the bars stacked and the values to show up. When I run the code above, the -76 is not in the correct location (and neither is the -75, it seems).
Any idea how to get the numbers to appear in the correct location?
ggplot(data=dat, aes(x=group, y=value, fill=factor(day))) +
geom_bar(stat="identity", position="identity")+
geom_text(label =round(value,0),color = "black")+
scale_y_continuous(breaks=c(-80,-40,0))
Stacking a mix of negative and positive values is difficult for ggplot2. The easiest thing to do is to split the dataset into two, one for positives and one for negatives, and then add bar layers separately. A classic example is here.
You can do the same thing with the text, adding one text layer for the positive y values and one for the negatives.
dat1 = subset(dat, value >= 0)
dat2 = subset(dat, value < 0)
ggplot(mapping = aes(x = group, y = value, fill = factor(day))) +
geom_bar(data = dat1, stat = "identity", position = "stack")+
geom_bar(data = dat2, stat = "identity", position = "stack") +
geom_text(data = dat1, aes(label = round(value,0)), color = "black", position = "stack") +
geom_text(data = dat2, aes(label = round(value,0)), color = "black", position = "stack")
If you are using the current development version of ggplot2 (2.1.0.9000), stacking doesn't seem to work correctly in geom_text for negative values. You can always calculate the text positions "by hand" if you need to.
library(dplyr)
dat2 = dat2 %>%
group_by(group) %>%
mutate(pos = cumsum(value))
ggplot(mapping = aes(x = group, y = value, fill = factor(day))) +
geom_bar(data = dat1, stat = "identity", position = "stack")+
geom_bar(data = dat2, stat = "identity", position = "stack") +
geom_text(data = dat1, aes(label = round(value,0)), color = "black") +
geom_text(data = dat2, aes(label = round(value,0), y = pos), color = "black")
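The cumulative sum above places each label at the outer end of its segment. If centred labels are preferred, a midpoint can be computed by stepping back half a segment. This is a sketch in base R, assuming the segments stack in row order within each group (the actual stacking order depends on the ggplot2 version); `pos_mid` is a hypothetical column name.

```r
dat <- data.frame(group = c("A", "A", "B", "B"),
                  value = c(25, -75, -40, -76),
                  day = c(1, 2, 1, 2))
dat2 <- subset(dat, value < 0)

# Running total per group: the outer end of each stacked segment
dat2$pos <- ave(dat2$value, dat2$group, FUN = cumsum)

# Step back half a segment to land in the middle of it
dat2$pos_mid <- dat2$pos - dat2$value / 2

dat2[, c("group", "value", "pos", "pos_mid")]
```

Mapping y = pos_mid instead of y = pos in the geom_text layer for dat2 would then centre each label in its segment.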
How do I draw the sum value of each class (in my case: a=450, b=150, c=290, d=90) above the stacked bar in ggplot2? Here is my code:
#Data
hp=read.csv(textConnection(
"class,year,amount
a,99,100
a,100,200
a,101,150
b,100,50
b,101,100
c,102,70
c,102,80
c,103,90
c,104,50
d,102,90"))
hp$year=as.factor(hp$year)
#Plotting
p=ggplot(data=hp)
p+geom_bar(stat="identity")+
aes(x=reorder(class,-amount,sum),y=amount,label=amount,fill=year)+
theme()
You can do this by creating a dataset of per-class totals (this can be done multiple ways but I prefer dplyr):
library(dplyr)
totals <- hp %>%
group_by(class) %>%
summarize(total = sum(amount))
Then adding a geom_text layer to your plot, using totals as the dataset:
p + geom_bar(stat = "identity") +
aes(x = reorder(class, -amount, sum), y = amount, label = amount, fill = year) +
theme() +
geom_text(aes(class, total, label = total, fill = NULL), data = totals)
You can make the text higher or lower than the top of the bars using the vjust argument, or just by adding some value to total:
p + geom_bar(stat = "identity") +
aes(x = reorder(class, -amount, sum), y = amount, label = amount, fill = year) +
theme() +
geom_text(aes(class, total + 20, label = total, fill = NULL), data = totals)
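Since the per-class totals can be computed in multiple ways, here is a base R sketch of the same step using aggregate(), for readers who would rather not load dplyr. It assumes the column is named amount, as in the question's CSV, and renames the result to `total` so it fits the geom_text layer above.

```r
hp <- read.csv(text = "class,year,amount
a,99,100
a,100,200
a,101,150
b,100,50
b,101,100
c,102,70
c,102,80
c,103,90
c,104,50
d,102,90")
hp$year <- as.factor(hp$year)

# Sum of amount per class, one row per class
totals <- aggregate(amount ~ class, data = hp, FUN = sum)
names(totals)[names(totals) == "amount"] <- "total"

totals
```

The result matches the sums stated in the question (a = 450, b = 150, c = 290, d = 90).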
You can use the built-in summary functionality of ggplot2 directly:
ggplot(hp, aes(reorder(class, -amount, sum), amount, fill = year)) +
geom_col() +
geom_text(
aes(label = after_stat(y), group = class),
stat = 'summary', fun = sum, vjust = -1
)