Is there a way to use ggplot2 to create divergent stacked bar charts like the one on the right-hand side of the image below?
Data for reproducible example
library(ggplot2)
library(scales)
library(reshape)
dat <- read.table(text = " ONE TWO THREE
1 23 234 324
2 34 534 12
3 56 324 124
4 34 234 124
5 123 534 654",sep = "",header = TRUE)
# reshape data
datm <- melt(cbind(dat, ind = rownames(dat)), id.vars = c('ind'))
# plot
ggplot(datm,aes(x = variable, y = value,fill = ind)) +
geom_bar(position = "fill",stat = "identity") +
coord_flip()
Sure: positive values stack in the positive direction and negative values stack in the negative direction. Don't use position = "fill". Decide which categories should count as negative and actually make their values negative; your example only has positive scores. E.g.
ggplot(datm, aes(x = variable, y = ifelse(ind %in% 1:2, -value, value), fill = ind)) +
geom_col() +
coord_flip()
If you want to also scale to 1, you need some preprocessing:
library(dplyr)
datm %>%
group_by(variable) %>%
mutate(value = value / sum(value)) %>%
ggplot(aes(x = variable, y = ifelse(ind %in% 1:2, -value, value), fill = ind)) +
geom_col() +
coord_flip()
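As a rough cross-check of what that preprocessing does, here is a minimal base R sketch (no packages). It rebuilds the long-format data by hand, so the data frame layout and the helper names share, signed, bar_length and left_extent are my own assumptions, not part of the answer above. It shows that each bar still has total length 1 and that only the zero point moves.

```r
# Same toy values as in the question, typed in long form by hand
datm <- data.frame(
  ind      = rep(as.character(1:5), times = 3),
  variable = rep(c("ONE", "TWO", "THREE"), each = 5),
  value    = c(23, 34, 56, 34, 123,
               234, 534, 324, 234, 534,
               324, 12, 124, 124, 654)
)

# Share of each value within its variable (the group_by() + mutate() step)
datm$share <- datm$value / ave(datm$value, datm$variable, FUN = sum)

# Flip the sign for the categories that should point to the negative side
datm$signed <- ifelse(datm$ind %in% c("1", "2"), -datm$share, datm$share)

# Total bar length per variable, and how far each bar reaches below zero
bar_length  <- tapply(abs(datm$signed), datm$variable, sum)
left_extent <- tapply(pmin(datm$signed, 0), datm$variable, sum)
```

For ONE, the negative part is (23 + 34) / 270 of the bar, so the bar runs from about -0.21 to about 0.79.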
A more extreme approach is to calculate the boxes yourself. Here's one method:
dd <- datm %>% group_by(variable) %>%
arrange(desc(ind)) %>%
mutate(pct = value/sum(value), right = cumsum(pct), left=lag(right, default=0))
then you can plot with
ggplot(dd) +
geom_rect(aes(xmin=right, xmax=left, ymin=as.numeric(variable)-.4, ymax=as.numeric(variable)+.4, fill=ind)) +
scale_y_continuous(labels=levels(dd$variable), breaks=1:nlevels(dd$variable))
to get the left plot. To get the right plot, you just shift the boxes a bit. This lines up the right edges of all the ind 3 boxes:
ggplot(dd %>% group_by(variable) %>% mutate(left=left-right[ind==3], right=right-right[ind==3])) +
geom_rect(aes(xmin=right, xmax=left, ymin=as.numeric(variable)-.4, ymax=as.numeric(variable)+.4, fill=ind)) +
scale_y_continuous(labels=levels(dd$variable), breaks=1:nlevels(dd$variable))
So maybe overkill here, but you have a lot of control this way.
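To make the shifting step concrete, here is a small base R sketch of the edge arithmetic for a single bar (column ONE of the example data). The names v, shift, left_c and right_c are made up for illustration; the idea is the same cumsum-then-subtract logic as in the dplyr code above.

```r
# Values of column ONE, named by ind
value <- c(`1` = 23, `2` = 34, `3` = 56, `4` = 34, `5` = 123)

# The answer arranges by desc(ind), so the boxes run from ind 5 down to ind 1
v <- rev(value)

pct   <- v / sum(v)
right <- cumsum(pct)   # right edge of each box
left  <- right - pct   # left edge, equivalent to lag(right, default = 0)

# Shift everything so that the right edge of the ind 3 box sits at zero
shift   <- right[["3"]]
left_c  <- left - shift
right_c <- right - shift
```

After the shift, the bar for ONE spans from -213/270 to 57/270, with the ind 3 box ending exactly at zero.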
Let's say we have the table:
x y
1 43
1 54
2 54
3 22
2 22
1 43
I want a bar plot whose x-axis shows only the unique values 1, 2, 3, and within each bar it should show, as a percentage, how often each y value occurs (the frequency of 43 within 1, then of 54, and so on). Should both columns be converted to factors?
Here's my solution:
library("ggplot2")
library("dplyr")
library("magrittr")
library("tidyr")
df <- data.frame(x = c(1,1,2,3,2,1), y = c(43,54,54,22,22,43))
#Creating a counter that will keep track
#Of how many of each number in y exist for each x category
df$n <- 1
df %<>% #This is a bidirectional pipe here that overwrites 'df' with the result!
group_by(x, y) %>% #Unidirectional pipe
tally(n) %>%
mutate(n = round(n/sum(n), 2)) #Calculating as percentage
#Plotting
df %>%
ggplot(aes(fill = as.factor(y), y = n, x = x)) +
geom_bar(position = "fill", stat = "identity") +
scale_y_continuous(labels = scales::percent) +
labs(y = "Percentage contribution from each y category") +
#Adding the percentage values as labels
geom_text(aes(label = paste0(n*100,"%")), position = position_stack(vjust = 0.5), size = 2)
Note: position = "fill" in geom_bar() rescales each bar to a total of 1, and scale_y_continuous(labels = scales::percent) then formats the y-axis values as percentages.
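If you only want to check the numbers behind the labels, a base R cross-tabulation gives the same shares. This is just a sketch for verification, not a replacement for the pipeline above; the object names counts and shares are my own. It also shows that neither column has to be converted to a factor by hand for the counting step.

```r
df <- data.frame(x = c(1, 1, 2, 3, 2, 1), y = c(43, 54, 54, 22, 22, 43))

# Count every (x, y) combination, then divide each row by its row total
counts <- table(x = df$x, y = df$y)
shares <- prop.table(counts, margin = 1)

round(shares, 2)
```

Each row of shares sums to 1; for x = 1 the value 43 accounts for two thirds and 54 for one third.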
bgraph <- ggplot(data = data, aes(x = location)) +
geom_bar(aes(fill = success))
success is a factor with 4 categories, one for each of the 4 possible outcomes in the data set, and I want to show each category's percentage within a bar. I could easily calculate the percentages separately, but as the ggplot is currently set up, the bar segments are generated by geom_bar(aes(fill = success)).
data <- as.data.frame(c(1,1,1,1,1,1,2,2,3,3,3,3,4,4,4,4,4,4,
4,4,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7))
data[["success"]] <- c("a","b","c","c","d","d","a","b","b","b","c","d",
"a","b","b","b","c","c","c","d","a","b","c","d",
"a","b","c","c","d","d","a","b","b","c","d")
names(data) <- c("location","success")
bgraph <- ggplot(data = data, aes(x = location)) +
geom_bar(aes(fill = success))
bgraph
How do I get labels on the individual percentages? More specifically, I want 4 percentages for each bar: one each for yellow, light orange, orange, and red. The proportions within each bar add up to 1.
Maybe there is a way to do this in ggplot directly but with some pre-processing in dplyr, you'll be able to achieve your desired output.
library(dplyr)
library(ggplot2)
data %>%
count(location, success) %>%
group_by(location) %>%
mutate(n = n/sum(n) * 100) %>%
ggplot() + aes(x = location, n, fill = success,label = paste0(round(n, 2), "%")) +
geom_bar(stat = "identity") +
geom_text(position=position_stack(vjust=0.5))
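For intuition about where position_stack(vjust = 0.5) puts the labels, here is a hedged base R sketch for location 1 only. It computes the label text and the midpoint of each stacked segment by hand. The stacking order here is simply alphabetical from the bottom, which is an assumption for illustration; ggplot2 may stack the groups in a different order, but the midpoint rule is the same.

```r
# Outcomes recorded for location 1 in the example data
success <- c("a", "b", "c", "c", "d", "d")

# Percentage of each outcome within the bar
pct <- sapply(split(success, success), length) / length(success) * 100

top <- cumsum(pct)     # upper edge of each stacked segment
mid <- top - pct / 2   # vertical centre, where a centred label would sit

labels <- paste0(round(pct, 2), "%")
```

Here labels comes out as "16.67%", "16.67%", "33.33%", "33.33%", and the centre of the c segment is at 50.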
How about creating a summary frame with the relative frequencies within location and then using that with geom_col() and geom_text()?
# Load packages (fct_rev() comes from forcats)
library(dplyr)
library(ggplot2)
library(forcats)
# Create summary stats
tots <-
data %>%
group_by(location,success) %>%
summarise(
n = n()
) %>%
mutate(
rel = round(100*n/sum(n)),
)
# Plot
ggplot(data = tots, aes(x = location, y = n)) +
geom_col(aes(fill = fct_rev(success))) + # could only get it with this reversed
geom_text(aes(label = rel), position = position_stack(vjust = 0.5))
I need help setting individual x-axis limits on different facets, as described below.
A programmatic approach is preferred, since I will apply the same template to different data sets.
The first two facets should have the same x-axis limits (to have comparable bars).
The last facet's (performance) limits should be between 0 and 1, since it is calculated as a proportion.
I have seen this and some other related questions but couldn't apply them to my data.
Thanks in advance.
library(dplyr)
library(tidyr)
library(ggplot2)
df <-
data.frame(
call_reason = c("a","b","c","d"),
all_records = c(100,200,300,400),
problematic_records = c(80,60,100,80))
df <- df %>% mutate(performance = round(problematic_records/all_records, 2))
df
call_reason all_records problematic_records performance
a 100 80 0.80
b 200 60 0.30
c 300 100 0.33
d 400 80 0.20
df %>%
gather(key = facet_group, value = value, -call_reason) %>%
mutate(facet_group = factor(facet_group,
levels=c('all_records','problematic_records','performance'))) %>%
ggplot(aes(x=call_reason, y=value)) +
geom_bar(stat="identity") +
coord_flip() +
facet_grid(. ~ facet_group)
So here is one way to go about it with facet_grid(scales = "free_x"), in combination with a geom_blank(). Consider df to be your df at the moment before piping it into ggplot.
ggplot(df, aes(x=call_reason, y=value)) +
# geom_col is equivalent to geom_bar(stat = "identity")
geom_col() +
# geom_blank includes data for position scale training, but is not rendered
geom_blank(data = data.frame(
# value for first two facets is max, last facet is 1
value = c(rep(max(df$value), 2), 1),
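# dummy category; df$call_reason[1] works for character and factor columns
# (levels() returns NULL for character columns, the data.frame() default since R 4.0)
call_reason = df$call_reason[1],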
# distribute over facets
facet_group = levels(df$facet_group)
)) +
coord_flip() +
# scales are set to "free_x" to have them vary independently
# it doesn't really, since we've set a geom_blank
facet_grid(. ~ facet_group, scales = "free_x")
As long as your column names remain the same, this should work.
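In case it helps to see what the geom_blank() layer is actually fed, here is a base R sketch that builds the long data and the dummy frame for the example numbers. The names facets, long and dummy are my own, and the reshaping is done by hand rather than with gather(), so treat it as an illustration of the idea only.

```r
df <- data.frame(
  call_reason         = c("a", "b", "c", "d"),
  all_records         = c(100, 200, 300, 400),
  problematic_records = c(80, 60, 100, 80)
)
df$performance <- round(df$problematic_records / df$all_records, 2)

facets <- c("all_records", "problematic_records", "performance")

# Long format: one row per call_reason and facet
long <- data.frame(
  call_reason = rep(df$call_reason, times = length(facets)),
  facet_group = factor(rep(facets, each = nrow(df)), levels = facets),
  value       = unlist(df[facets], use.names = FALSE)
)

# One invisible point per facet: the overall maximum for the two count
# facets, and 1 for the performance facet
dummy <- data.frame(
  value       = c(rep(max(long$value), 2), 1),
  call_reason = long$call_reason[1],
  facet_group = factor(facets, levels = facets)
)
```

With scales = "free_x", each facet's axis is trained on its own rows plus its dummy row, so the first two facets both reach 400 and the last one reaches 1.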
EDIT:
To reorder the call_reason variable, you could add the following in your pipe that goes into ggplot:
df %>%
gather(key = facet_group, value = value, -call_reason) %>%
mutate(facet_group = factor(facet_group,
levels=c('all_records','problematic_records','performance')),
# In particular the following bit:
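# as.character() keeps this working whether call_reason is a character or a factor column
call_reason = factor(call_reason, levels = as.character(call_reason)[facet_group == "performance"][order(value[facet_group == "performance"])]))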
I wish to create a back-to-back bar chart. In my data, I have a number of species observations (n) from 2017 and 2018. Some species occurred only in 2017, others occurred in both years, and some occurred only in 2018. I wish to depict this in a graph centered on the number of species occurring in both years, across multiple sites (a, b, c).
First, I create a data set:
n <- sample(1:50, 9)
reg <- c(rep("2017", 3), rep("Both",3), rep("2018", 3))
plot <- c(rep(c("a", "b", "c"), 3))
d4 <- data.frame(n, reg, plot)
I use ggplot to try to plot my graph - I have tried two ways:
library(ggplot2)
ggplot(d4, aes(plot, n, fill = reg)) +
geom_col() +
coord_flip()
ggplot(d4, aes(x = plot, y = n, fill = reg))+
coord_flip()+
geom_bar(stat = "identity", width = 0.75)
I get a plot similar to what I want. However, I would like the blue 'Both' bar to sit between the 2017 and 2018 bars. Further, and this is my main problem, I would like to center the 'Both' bar in the middle of the plot. The 2017 column should extend to the left and the 2018 column to the right. My question is somewhat similar to the one in the link below; however, as I have only three levels in my graph rather than four, I cannot use the same approach.
Creating a stacked bar chart centered on zero using ggplot
I'm not sure this is the best way, but here is one way to do it:
library(dplyr)
d4pos <- d4 %>%
filter(reg != 2018) %>%
group_by(reg, plot) %>%
summarise(total = sum(n)) %>%
ungroup() %>%
mutate(total = total * ifelse(reg == "Both", .5, 1))
d4neg <- d4 %>%
filter(reg != 2017) %>%
group_by(reg, plot) %>%
summarise(total = - sum(n)) %>%
ungroup() %>%
mutate(total = total * ifelse(reg == "Both", .5, 1))
ggplot(data = d4pos, aes(x = plot, y = total, fill = reg)) +
geom_bar(stat = "identity") +
geom_bar(data = d4neg, stat = "identity", aes(x = plot, y = total, fill = reg)) +
coord_flip()
I generate two data frames holding the total of each group. One contains 2017 and half of Both; the other contains 2018 and the other half of Both. The values in the 2018 data frame are negated so that they plot on the negative side.
The output looks like this:
EDIT
If you want to have positive values in both directions for the horizontal axis, you can do something like this:
ggplot(data = d4pos, aes(x = plot, y = total, fill = reg)) +
geom_bar(stat = "identity") +
geom_bar(data = d4neg, stat = "identity", aes(x = plot, y = total, fill = reg)) +
scale_y_continuous(breaks = seq(-50, 50, by = 25),
labels = abs(seq(-50, 50, by = 25))) +
coord_flip()
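The arithmetic behind splitting Both in half can be checked with a few lines of base R. The counts below are made up (the question draws them with sample()), and the wide layout with columns n2017, both and n2018 is my own assumption for illustration.

```r
# Made-up counts per site
d <- data.frame(
  plot  = c("a", "b", "c"),
  n2017 = c(10, 20, 5),
  both  = c(30, 12, 8),
  n2018 = c(15, 6, 25)
)

# Half of Both goes on each side of zero, then the year-specific count is added
pos_end <- d$both / 2 + d$n2017      # where the 2017 side ends
neg_end <- -(d$both / 2 + d$n2018)   # where the 2018 side ends

# Axis labels for the second plot: the same breaks, shown without a sign
breaks      <- seq(-50, 50, by = 25)
axis_labels <- abs(breaks)
```

The distance between neg_end and pos_end equals the total count for each site, so no observations are lost by the split, and axis_labels reads 50, 25, 0, 25, 50.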
I have the following R code, where I transform the data and then order it by a specific column:
df2 <- df %>%
group_by(V2, news) %>%
tally() %>%
complete(news, fill = list(n = 0)) %>%
mutate(percentage = n / sum(n) * 100)
df22 <- df2[order(df2$news, -df2$percentage),]
I want to apply the ordered data "df22" in ggplot:
ggplot(df22, aes(x = V2, y = percentage, fill = factor(news, labels = c("Read","Otherwise")))) +
geom_bar(stat = "identity", position = "fill", width = .7) +
coord_flip() + guides(fill = guide_legend(title = "Online News")) +
scale_fill_grey(start = .1, end = .6) + xlab("Country") + ylab("Share")
Unfortunately, ggplot still returns a plot without that order:
Does anyone know what is wrong with my code? This is not the same as ordering a bar chart with a single value per bar, as in Reorder bars in geom_bar ggplot2. I am trying to order the chart by a specific category of a factor. In particular, I want to see the countries with the largest share of Read news first.
Here is the data:
  V2            news      n    percentage
1 United States News Read 1583 1.845139
2 Netherlands   News Read 1536 1.790356
3 Germany       News Read 1417 1.651650
4 Singapore     News Read 1335 1.556071
5 United States Otherwise  581 0.6772114
6 Netherlands   Otherwise  350 0.4079587
7 Germany       Otherwise  623 0.7261665
8 Singapore     Otherwise  635 0.7401536
I used the following R code:
df2 <- df %>%
group_by(V2, news) %>%
tally() %>%
complete(news, fill = list(n = 114)) %>%
mutate(percentage = n / sum(n) * 100)
df2 <- df2[order(df2$news, -df2$percentage),]
df2 <- df2 %>% group_by(news, percentage) %>% arrange(desc(percentage))
df2$V2 <- factor(df2$V2, levels = unique(df2$V2))
ggplot(df2, aes(x = V2, y = percentage, fill = news))+
geom_bar(stat = "identity", position = "stack") +
guides(fill = guide_legend(title = "Online News")) +
coord_flip() +
scale_x_discrete(limits = rev(levels(df2$V2)))
Everything was fine except that some countries break the order for some reason, and I do not understand why. Here is the picture:
Following the hints in the comments, I used the arrange() command instead of indexing with order():
df4 <- arrange(df2, news, desc(percentage))
Here is the result:
Here's what I have; I hope this is useful. As @Axeman mentioned, the trick is to turn the labels into a factor whose levels are in the order you want. Further, coord_flip() draws the levels from bottom to top, so scale_x_discrete() with reversed limits is needed to get the first level at the top.
I am using the small sample you provided.
library(ggplot2)
library(dplyr)
df <- read.csv("data.csv")
df <- arrange(df, news, desc(Percentage))
df$V2 <- factor(df$V2, levels = unique(df$V2))
ggplot(df, aes(x = V2, y = Percentage, fill = news))+
geom_bar(stat = "identity", position = "stack") +
guides(fill = guide_legend(title = "Online News")) +
coord_flip() +
scale_x_discrete(limits = rev(levels(df$V2)))
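Here is a base R sketch of the level-ordering step on its own, using the eight rows posted in the question (typed in by hand, with a lowercase percentage column as in that listing). The helper names read and country_order are made up. It orders the countries by their News Read share only, so the row order of the data frame does not matter.

```r
df <- data.frame(
  V2   = rep(c("United States", "Netherlands", "Germany", "Singapore"), times = 2),
  news = rep(c("News Read", "Otherwise"), each = 4),
  percentage = c(1.845139, 1.790356, 1.651650, 1.556071,
                 0.6772114, 0.4079587, 0.7261665, 0.7401536),
  stringsAsFactors = FALSE
)

# Shuffle the rows to show that the input order is irrelevant
df <- df[c(7, 2, 5, 4, 1, 8, 3, 6), ]

# Rank countries by the share of the category of interest only
read          <- df[df$news == "News Read", ]
country_order <- read$V2[order(read$percentage, decreasing = TRUE)]

# Fix that order in the factor levels; ggplot2 follows the levels, not the rows
df$V2 <- factor(df$V2, levels = country_order)

levels(df$V2)
rev(levels(df$V2))   # the limits to pass to scale_x_discrete() after coord_flip()
```

Ordering by one category like this avoids the problem of unique() picking up countries in whatever order the rows of both categories happen to appear.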