I have the following R code, where I transform the data and then order it by a specific column:
df2 <- df %>%
group_by(V2, news) %>%
tally() %>%
complete(news, fill = list(n = 0)) %>%
mutate(percentage = n / sum(n) * 100)
df22 <- df2[order(df2$news, -df2$percentage),]
I want to apply the ordered data "df22" in ggplot:
ggplot(df22, aes(x = V2, y = percentage, fill = factor(news, labels = c("Read","Otherwise")))) +
geom_bar(stat = "identity", position = "fill", width = .7) +
coord_flip() + guides(fill = guide_legend(title = "Online News")) +
scale_fill_grey(start = .1, end = .6) + xlab("Country") + ylab("Share")
Unfortunately, ggplot still returns a plot without that order:
Does anyone know what is wrong with my code? This is not the same as ordering a bar chart with a single value per bar, as in "Reorder bars in geom_bar ggplot2". I am trying to order the chart by a specific category of a factor. In particular, I want to see the countries with the largest share of Read news first.
Here is the data:
  V2             news       n     percentage
1 United States  News Read  1583  1.845139
2 Netherlands    News Read  1536  1.790356
3 Germany        News Read  1417  1.651650
4 Singapore      News Read  1335  1.556071
5 United States  Otherwise   581  0.6772114
6 Netherlands    Otherwise   350  0.4079587
7 Germany        Otherwise   623  0.7261665
8 Singapore      Otherwise   635  0.7401536
I used the following R code:
df2 <- df %>%
group_by(V2, news) %>%
tally() %>%
complete(news, fill = list(n = 114)) %>%
mutate(percentage = n / sum(n) * 100)
df2 <- df2[order(df2$news, -df2$percentage),]
df2 <- df2 %>% group_by(news, percentage) %>% arrange(desc(percentage))
df2$V2 <- factor(df2$V2, levels = unique(df2$V2))
ggplot(df2, aes(x = V2, y = percentage, fill = news))+
geom_bar(stat = "identity", position = "stack") +
guides(fill = guide_legend(title = "Online News")) +
coord_flip() +
scale_x_discrete(limits = rev(levels(df2$V2)))
Everything was fine except some countries break the order for some reason and I do not understand why. Here is the picture:
Following the hints from the comments, I used dplyr's arrange() instead of order():
df4 <- arrange(df2, news, desc(percentage))
Here is the result:
Here's what I have; I hope this is useful. As @Axeman mentioned, the trick is to turn the labels into a factor whose levels are in the desired order. Further, coord_flip() draws the labels in the opposite direction, so scale_x_discrete() is needed to reverse them back.
I am using the small sample you provided.
library(ggplot2)
library(dplyr)
df <- read.csv("data.csv")
df <- arrange(df, news, desc(Percentage))
df$V2 <- factor(df$V2, levels = unique(df$V2))
ggplot(df, aes(x = V2, y = Percentage, fill = news))+
geom_bar(stat = "identity", position = "stack") +
guides(fill = guide_legend(title = "Online News")) +
coord_flip() +
scale_x_discrete(limits = rev(levels(df$V2)))
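If the goal is to rank the countries by the share of one category only (here "News Read"), one possible variation is to compute that share per country and use it to set the factor levels. This is only a sketch on the four sample countries from the question; the names read_n, total_n and read_share are made up for illustration, and position = "fill" is borrowed from the question's first plot.

```r
library(ggplot2)

# Sample rows copied from the question (counts only)
df <- data.frame(
  V2   = rep(c("United States", "Netherlands", "Germany", "Singapore"), times = 2),
  news = rep(c("News Read", "Otherwise"), each = 4),
  n    = c(1583, 1536, 1417, 1335, 581, 350, 623, 635)
)

# Share of "News Read" within each country (hypothetical helper names)
read_n     <- tapply(ifelse(df$news == "News Read", df$n, 0), df$V2, sum)
total_n    <- tapply(df$n, df$V2, sum)
read_share <- read_n / total_n

# Ascending levels: coord_flip() draws the last level at the top,
# so the country with the largest share ends up first
df$V2 <- factor(df$V2, levels = names(sort(read_share)))

p <- ggplot(df, aes(x = V2, y = n, fill = news)) +
  geom_col(position = "fill", width = 0.7) +
  coord_flip() +
  labs(x = "Country", y = "Share", fill = "Online News")
p
```

Because the levels are already in ascending order, no extra scale_x_discrete(limits = rev(...)) call is needed in this variant.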
I am currently working with survey data with 250 columns. A sample of my data looks like this:
q1 <- factor(c("yes",NA,"no","yes",NA,"yes","no","yes"))
q2 <- factor(c("Albania","USA","Albania","Albania","UK",NA,"UK","Albania"))
q3 <- factor(c(0,1,NA,0,1,1,NA,0))
q4 <- factor(c(0,NA,NA,NA,1,NA,0,0))
q5 <- factor(c("Dont know","Prefer not to answer","Agree","Disagree",NA,"Agree","Agree",NA))
q6 <- factor(c(1,NA,3,5,800,NA,900,2))
sector <- factor(c("Energy","Water","Energy","Other","Other","Water","Transportation","Energy"))
data <- data.frame(q1,q2,q3,q4,q5,q6,sector)
I have created a function to loop through all 250 columns and create grouped bar charts where x axis shows sectors, y axis shows percentage distribution of answers and fill is the underlying column from data. Below you can see the code for the function:
by_sector <- lapply(names(data), function(variable) {
ggplot(
data = data,
mapping = aes(x=sector,fill = data[[variable]])
) +
geom_bar(aes( y=..count../tapply(..count.., ..x.. ,sum)[..x..]), position="dodge") +
labs(x = variable, y = "% of total", fill = "Response", caption = paste("Total =", sum(!is.na(data[[variable]])))) +
geom_text(aes( y=..count../tapply(..count.., ..x.. ,sum)[..x..], label=scales::percent(..count../tapply(..count.., ..x.. ,sum)[..x..],accuracy = 0.1) ),
stat="count", position=position_dodge(1), vjust=0.5)+
#scale_fill_brewer(palette = "Accent")+
scale_fill_discrete(na.translate = FALSE) +
theme_bw() +
theme(panel.grid.major.y = element_blank()) +
coord_flip()
})
As you can see from the image below, since I use the data columns as fill, a transparent NA category shows up. I want to remove that category from the grouped bars.
I tried a couple of things:
scale_fill_discrete(na.translate = FALSE): this only removed NA from the legend, not from the grouped bars.
fill = subset(data,!is.na(data[[variable]])): this didn't work.
ggplot(data=na.omit(data[[variable]])): this didn't work either.
Is there a way to modify my code for barplots so that NA category doesn't show up as a bar in the graph? Thank you very much beforehand!
One option would be to aggregate your data outside of ggplot() which makes it easier to debug, removes the duplicated computations inside the code and makes it easy to drop the NA categories if desired.
Additionally, I moved the plotting code to a separate function which also allows for easier debugging by e.g. running the code for just one example.
Finally, note that I switched to the .data pronoun, which is the recommended way to use column names passed as strings.
Showing only the plots for two of the problematic columns:
EDIT Fixed a small bug by removing the NA values before aggregating instead of doing that afterwards.
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
plot_fun <- function(variable) {
total <- sum(!is.na(data[[variable]]))
data <- data |>
filter(!is.na(.data[[variable]])) |>
group_by(across(all_of(c("sector", variable)))) |>
summarise(n = n(), .groups = "drop_last") |>
mutate(pct = n / sum(n)) |>
ungroup()
ggplot(
data = data,
mapping = aes(x = sector, y = pct, fill = .data[[variable]])
) +
geom_col(position = "dodge") +
labs(
x = variable, y = "% of total", fill = "Response",
caption = paste("Total =", total)
) +
geom_text(
aes(
label = scales::percent(pct, accuracy = 0.1)
),
position = position_dodge(.9), vjust = 0.5
) +
scale_fill_brewer(palette = "Accent") +
theme_bw() +
theme(panel.grid.major.y = element_blank()) +
coord_flip()
}
by_sector <- lapply(names(data), plot_fun)
by_sector[c(3, 6)]
#> [[1]]
#>
#> [[2]]
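To see why the order of filtering and aggregating matters, here is a small base R check on the q5 column from the question. It is only a sketch: the names keep, pct_before and pct_after are invented for this illustration, and it uses table() rather than the dplyr pipeline above.

```r
# Two columns copied from the question's sample data
sector <- c("Energy", "Water", "Energy", "Other", "Other", "Water",
            "Transportation", "Energy")
q5 <- c("Dont know", "Prefer not to answer", "Agree", "Disagree", NA,
        "Agree", "Agree", NA)

# Drop NA first: shares within a sector are taken over real answers only
keep <- !is.na(q5)
pct_before <- prop.table(table(sector[keep], q5[keep]), margin = 1)

# Drop NA afterwards: the NA rows still count in each sector's denominator
tab_all <- table(sector, q5, useNA = "ifany")
pct_after <- prop.table(tab_all, margin = 1)[, !is.na(colnames(tab_all)), drop = FALSE]

# Compare the share of "Agree" in the Energy sector under both approaches
pct_before["Energy", "Agree"]
pct_after["Energy", "Agree"]
```

Energy has three rows, one of which is NA, so "Agree" is one of two answers when NA is removed first but one of three rows when it is removed afterwards.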
I am trying to create a barplot using ggplot2 where I am stacking by one variable and dodging by another.
Here is an example data set:
df=data.frame(
year=rep(c("2010","2011"),each=4),
treatment=rep(c("Impact","Control")),
type=rep(c("Phylum1","Phylum2"),each=2),
total=sample(1:100,8))
I would like to create a barplot where x=treatment, y=total, the stacked variable is type and the dodged variable is year. Of course I can do one or the other:
ggplot(df,aes(y=total,x=treatment,fill=type))+geom_bar(position="dodge",stat="identity")
ggplot(df,aes(y=total,x=treatment,fill=year))+geom_bar(position="dodge",stat="identity")
But not both! Thanks to anyone who can provide advice.
Here's an alternative take using faceting instead of dodging:
ggplot(df, aes(x = year, y = total, fill = type)) +
geom_bar(position = "stack", stat = "identity") +
facet_wrap( ~ treatment)
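With Tyler's suggested change: + theme(panel.spacing = grid::unit(-1.25, "lines")) (the original suggestion used panel.margin, which ggplot2 2.2.0 deprecated in favour of panel.spacing)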
The closest you can get is by drawing a border around the dodged bars to highlight the stacked type values.
ggplot(df, aes(treatment, total, fill = year)) +
geom_bar(stat="identity", position="dodge", color="black")
You can use interaction(year, treatment) as the x-axis variable as an alternative to dodge.
library(dplyr)
library(ggplot2)
library(stringr) # needed for str_replace()
df=data.frame(
year=rep(c("2010","2011"),each=4),
treatment=rep(c("Impact","Control")),
type=rep(c("Phylum1","Phylum2"),each=2),
total=sample(1:100,8)) %>%
mutate(x_label = factor(str_replace(interaction(year, treatment), '\\.', ' / '),
ordered=TRUE))
ggplot(df, aes(x=x_label, y=total, fill=type)) +
geom_bar(stat='identity') +
labs(x='Year / Treatment')
Created on 2018-04-26 by the reprex package (v0.2.0).
You can play with some alpha:
df %>%
group_by(year, treatment) %>%
mutate(cum_tot = cumsum(total)) %>%
ggplot(aes(treatment, cum_tot, fill =year)) +
geom_col(data = . %>% filter( type=="Phylum1"), position = position_dodge(width = 0.9), alpha = 1) +
geom_col(data = . %>% filter( type=="Phylum2"), position = position_dodge(width = 0.9), alpha = 0.4) +
geom_tile(aes(y=NA_integer_, alpha = factor(type))) +
scale_alpha_manual(values = c(1,0.4))
Now you can add some background fill, e.g. theme(panel.background = element_rect(fill = "yellow")), to mix the colors:
Finally, you have to fix the legend using Inkscape.
It can be done, but it's tricky and fiddly: you basically have to layer the bar chart.
Here is my code:
library(tidyverse)
df=data.frame(
year=rep(c(2010,2011),each=4),
treatment=rep(c("Impact","Control")),
type=rep(c("Phylum1","Phylum2"),each=2),
total=sample(1:100,8))
# separate the data by the variable we are dodging by, so
# we have two data frames: impact and control
impact <- df %>% filter(treatment == "Impact") %>%
mutate(pos = sum(total, na.rm=T))
control <- df %>% filter(treatment == "Control") %>%
mutate(pos = sum(total, na.rm=T))
# calculate the position for the annotation element
impact_an <- impact %>% group_by(year) %>%
summarise(
pos = sum(total) + 12
, treatment = first(treatment)
)
control_an <- control %>% group_by(year) %>%
summarise(
pos = sum(total) + 12
, treatment = first(treatment)
)
# define the width of the bars, we need this set so that
# we can use it to position the second layer geom_bar
barwidth = 0.30
ggplot() +
geom_bar(
data = impact
, aes(x = year, y = total, fill = type)
, position = "stack"
, stat = "identity"
, width = barwidth
) +
annotate(
"text"
, x = impact_an$year
,y = impact_an$pos
, angle = 90
, label = impact_an$treatment
) +
geom_bar(
data = control
# here we are offsetting the position of the second layer bar
# by adding the barwidth plus 0.1 to push it to the right
, aes(x = year + barwidth + 0.1, y = total, fill = type)
, position = "stack"
, stat = "identity"
, width = barwidth
) +
annotate(
"text"
, x = control_an$year + (barwidth * 1) + 0.1
,y = control_an$pos
, angle = 90
, label = control_an$treatment
) +
scale_x_discrete(limits = c(2010, 2011))
This doesn't really scale well, but there are ways you could code it up to suit your situation. Credit where it's due: I originally learnt this method from the following post: https://community.rstudio.com/t/ggplot-position-dodge-with-position-stack/16425
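One way the layering might be generalised is to compute a numeric x position per bar and let a single stacked layer do the rest. This is a rough sketch, not the method from the linked post: x_pos, gap and step are names made up here, and the seed is arbitrary.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the example data is reproducible
df <- data.frame(
  year      = rep(c(2010, 2011), each = 4),
  treatment = rep(c("Impact", "Control")),
  type      = rep(c("Phylum1", "Phylum2"), each = 2),
  total     = sample(1:100, 8)
)

barwidth <- 0.30
gap      <- 0.10            # hypothetical space between neighbouring bars
step     <- barwidth + gap

# Shift each treatment left or right of its year, centred on the year
trt <- factor(df$treatment, levels = c("Impact", "Control"))
df$x_pos <- df$year + (as.integer(trt) - (nlevels(trt) + 1) / 2) * step

# Rows sharing an x_pos are stacked by type; the offsets do the "dodging"
p <- ggplot(df, aes(x = x_pos, y = total, fill = type)) +
  geom_col(width = barwidth) +
  scale_x_continuous(breaks = sort(unique(df$year))) +
  labs(x = "year")
p
```

With more than two treatments, the same offset formula spreads the bars evenly around each year, although barwidth and gap would need to shrink so neighbouring years do not overlap.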
I have a dataset with two variables: 1) ID, 2) Infection Status (Binary:1/0).
I would like to use ggplot2 to do the following:
1. Create a stacked percentage bar graph with the IDs on the vertical axis (arranged alphabetically, with A at the top) and the percentage on the horizontal axis. I can't find code that sorts the IDs alphabetically automatically; my original dataset has quite a number of categories, so arranging them manually would be difficult.
2. Show the infected category (1) in red, to the left of the blue non-infected category (0). Is it also possible to change the legend entry from "Non_infected" to "Non-infected"?
3. Include in each displayed ID the number of times that ID appears in the dataset, e.g. "A (n=6)", "B (n=3)".
My sample code is as follows:
ID <- c("A","A","A","A","A","A",
"B","B","B",
"C","C","C","C","C","C","C",
"D","D","D","D","D","D","D","D","D")
Infection <- sample(c(1, 0), size = length(ID), replace = T)
df <- data.frame(ID, Infection)
library(ggplot2)
library(dplyr)
library(reshape2)
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected)
df.plot %>%
melt() %>%
ggplot(aes(x = ID, y = value, fill = variable)) + geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_discrete(guide = guide_legend(title = "Infection Status")) +
coord_flip()
Right now I managed to get this output:
I hope to get this:
Thank you so much!
First, we need to add a count to your original data.frame.
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected,
count = n())
Then, we augment our ID column, turn the Infection Status into a factor variable, use forcats::fct_rev to reverse the ID ordering, and use scale_fill_manual to control your legend.
df.plot %>%
mutate(ID = paste0(ID, " (n=", count, ")")) %>%
select(-count) %>%
melt() %>%
mutate(variable = factor(variable, levels = c("Non_Infected", "Infected"))) %>%
ggplot(aes(x = forcats::fct_rev(ID), y = value, fill = variable)) +
geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_manual("Infection Status",
values = c("Infected" = "#F8766D", "Non_Infected" = "#00BFC4"),
labels = c("Non-Infected", "Infected"))+
coord_flip()
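As a possible alternative that skips the summarize/melt step, the counts can be folded into the labels on the raw data and position = "fill" can compute the percentages. This is a sketch under a few assumptions: ID_label, Status and id_labels are names invented here, the seed is arbitrary, and plain "red"/"blue" stand in for the colors requested in the question.

```r
library(ggplot2)

ID <- c(rep("A", 6), rep("B", 3), rep("C", 7), rep("D", 9))
set.seed(42)  # arbitrary seed so the sketch is reproducible
Infection <- sample(c(1, 0), size = length(ID), replace = TRUE)
df <- data.frame(ID, Infection)

# Build "A (n=6)"-style labels from the per-ID counts
counts    <- table(df$ID)
id_labels <- paste0(names(counts), " (n=", as.vector(counts), ")")

# Reversed levels so that A is at the top after coord_flip()
df$ID_label <- factor(id_labels[match(df$ID, names(counts))],
                      levels = rev(id_labels))

# Same level order as in the answer above: non-infected first, infected second
df$Status <- factor(df$Infection, levels = c(0, 1),
                    labels = c("Non-infected", "Infected"))

p <- ggplot(df, aes(x = ID_label, fill = Status)) +
  geom_bar(position = "fill") +
  scale_fill_manual("Infection Status",
                    values = c("Infected" = "red", "Non-infected" = "blue")) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "ID", y = "Percent Infection") +
  coord_flip()
p
```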
I wish to create a back-to-back bar chart. In my data, I have a number of species observations (n) from 2017 and 2018. Some species occurred only in 2017, others occurred in both years, and some occurred only in 2018. I wish to depict this in a graph centered around the number of species occurring in both years, across multiple sites (a, b, c).
First, I create a data set:
n <- sample(1:50, 9)
reg <- c(rep("2017", 3), rep("Both",3), rep("2018", 3))
plot <- c(rep(c("a", "b", "c"), 3))
d4 <- data.frame(n, reg, plot)
I use ggplot to try to plot my graph - I have tried two ways:
library(ggplot2)
ggplot(d4, aes(plot, n, fill = reg)) +
geom_col() +
coord_flip()
ggplot(d4, aes(x = plot, y = n, fill = reg))+
coord_flip()+
geom_bar(stat = "identity", width = 0.75)
I get a plot similar to what I want. However, I would like the blue 'Both' bar to sit between the 2017 and 2018 bars. Further, and this is my main problem, I would like to center the 'Both' bar in the middle of the plot. The 2017 column should extend to the left and the 2018 column to the right. My question is somewhat similar to the one linked below; however, as I have only three and not four levels in my graph, I cannot use the same approach.
Creating a stacked bar chart centered on zero using ggplot
I'm not sure this is the best approach, but here is one way to do it:
library(dplyr)
d4pos <- d4 %>%
filter(reg != 2018) %>%
group_by(reg, plot) %>%
summarise(total = sum(n)) %>%
ungroup() %>%
mutate(total = total * ifelse(reg == "Both", .5, 1))
d4neg <- d4 %>%
filter(reg != 2017) %>%
group_by(reg, plot) %>%
summarise(total = - sum(n)) %>%
ungroup() %>%
mutate(total = total * ifelse(reg == "Both", .5, 1))
ggplot(data = d4pos, aes(x = plot, y = total, fill = reg)) +
geom_bar(stat = "identity") +
geom_bar(data = d4neg, stat = "identity", aes(x = plot, y = total, fill = reg)) +
coord_flip()
I generate two data frames for the total of each group. One contains the 2017 and (half of) Both, and the other contains the rest. The value for the 2018 data frame is flipped to plot on the negative side.
The output looks like this:
EDIT
If you want to have positive values in both directions for the horizontal axis, you can do something like this:
ggplot(data = d4pos, aes(x = plot, y = total, fill = reg)) +
geom_bar(stat = "identity") +
geom_bar(data = d4neg, stat = "identity", aes(x = plot, y = total, fill = reg)) +
scale_y_continuous(breaks = seq(-50, 50, by = 25),
labels = abs(seq(-50, 50, by = 25))) +
coord_flip()
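The same idea can probably be expressed with a single data frame by giving every row a signed value and splitting "Both" into two half-height rows. This is a sketch with fixed example counts (not the random ones from the question) so the arithmetic can be checked; signed and value are names made up here, and it relies on ggplot2 stacking positive and negative values separately.

```r
library(ggplot2)

# Fixed example counts instead of sample(), so the arithmetic can be checked
d4 <- data.frame(
  n    = c(10, 20, 30, 4, 6, 8, 15, 25, 35),
  reg  = rep(c("2017", "Both", "2018"), each = 3),
  plot = rep(c("a", "b", "c"), times = 3)
)

both <- d4[d4$reg == "Both", ]
signed <- rbind(
  transform(d4[d4$reg == "2017", ], value =  n),      # 2017 on the positive side
  transform(both,                   value =  n / 2),  # half of Both on each side
  transform(both,                   value = -n / 2),
  transform(d4[d4$reg == "2018", ], value = -n)       # 2018 on the negative side
)

# Keep "Both" as the last level, matching the alphabetical order
# that the two-data-frame version above ends up with
signed$reg <- factor(signed$reg, levels = c("2017", "2018", "Both"))

p <- ggplot(signed, aes(x = plot, y = value, fill = reg)) +
  geom_col() +
  scale_y_continuous(labels = abs) +  # show magnitudes on both sides
  coord_flip()
p
```

Swapping the signs of the 2017 and 2018 rows puts 2017 on the left and 2018 on the right, as asked in the question.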
I have a stacked barplot with the following data
df <- expand.grid(name = c("oak","birch","cedar"),
sample = c("one","two"),
type = c("sapling","adult","dead"))
df$count <- sample(5:200, size = nrow(df), replace = T)
I generate a barplot and try to add the group labels to it:
ggplot(df, aes(x = name, y = count, fill = type)) +
geom_bar(stat = "identity") +
coord_flip() +
theme(legend.position="none") +
geom_text(aes(label = type, position = "stack"))
It produces:
Two to three questions arise:
How can I make the labels appear in the top bar only?
How can I make the labels appear in the center of the bar section?
Optionally: How can I make the labels appear on top of the top bar being connected to their sections by arrows?
There is a link suggested above. That will help you. Here, I have another suggestion.
set.seed(123)
df <- expand.grid(name = c("oak","birch","cedar"),
sample = c("one","two"),
type = c("sapling","adult","dead"))
df$count <- sample(5:200, size = nrow(df), replace = T)
### Arrange a data frame (summing up sample one and two)
library(dplyr)
library(ggplot2)
ana <- df %>%
group_by(name, type) %>%
summarise(total = sum(count))
# Draw a figure once
bob <- ggplot(ana, aes(x = name, y = total, fill = type)) +
geom_bar(stat = "identity", position = "stack")
# Get a data frame for ggplot
cathy <- ggplot_build(bob)$data[[1]]
# calculate text position & add text labels
cathy$y_pos <- (cathy$ymin + cathy$ymax) / 2
cathy$label <- rep(c("sapling", "adult", "dead"), times = 3)
# Subset the data for labeling for the top bar
dan <- cathy[c(7:9), ]
# Draw a figure again
bob +
annotate(x = dan$x, y = dan$y_pos, label = dan$label, geom="text", size=3) +
coord_flip()
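A shorter route that may also work is to let ggplot2 compute the label positions with position_stack(vjust = 0.5), which places each label in the middle of its section, and to pass only the top bar's rows to geom_text(). This is a sketch rather than a replacement for the approach above; top_name is a made-up helper, and aggregate() stands in for the dplyr summary.

```r
library(ggplot2)

set.seed(123)
df <- expand.grid(name   = c("oak", "birch", "cedar"),
                  sample = c("one", "two"),
                  type   = c("sapling", "adult", "dead"))
df$count <- sample(5:200, size = nrow(df), replace = TRUE)

# Sum samples one and two for each name/type combination
ana <- aggregate(count ~ name + type, data = df, FUN = sum)
names(ana)[names(ana) == "count"] <- "total"

# After coord_flip() the last level of name is drawn at the top
top_name <- tail(levels(ana$name), 1)

p <- ggplot(ana, aes(x = name, y = total, fill = type)) +
  geom_col() +
  # label only the top bar; vjust = 0.5 centres each label in its section
  geom_text(data = ana[ana$name == top_name, ],
            aes(label = type),
            position = position_stack(vjust = 0.5), size = 3) +
  coord_flip() +
  theme(legend.position = "none")
p
```

Note that position belongs outside aes(); in the question's code it was passed inside aes(), where it is treated as an aesthetic mapping rather than a position adjustment.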