I have the following code, which produces a grouped bar chart.
The issue here is twofold:
The group bar chart automatically places the highest value on the top (i.e. for avenue 4 CTP is on top), whereas I would always want FTP to be shown first then CTP to be shown after (so always blue bar then red bar)
I need all of the values to be scaled to 100 (i.e. 100%) within their respective group (so for CTP, Avenue 4 would have a huge bar while the other avenues would have extremely tiny ones)
I am new to R and Stack Overflow, so sorry if anything is wrong or you need more information. Any help is greatly appreciated.
library(ggplot2)
library(tidyverse)
library(magrittr)
# function to specify decimals
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall=k))
# sample data
avenues <- c("Avenue1", "Avenue2", "Avenue3", "Avenue4")
flytip_amount <- c(1000, 2000, 1500, 250)
collection_amount <- c(5, 15, 10, 2000)
# create data frame from the sample data
df <- data.frame(avenues, flytip_amount, collection_amount)
# got it working - now to test
df3 <- df
SumFA <- sum(df3$flytip_amount)
df3$FTP <- (df3$flytip_amount/SumFA)*100
df3$FTP <- specify_decimal(df3$FTP, 1)
SumCA <- sum(df3$collection_amount)
df3$CTP <- (df3$collection_amount/SumCA)*100
df3$CTP <- specify_decimal(df3$CTP, 1)
# Now we have percentages remove whole values
df2 <- df3[,c(1,4,5)]
df2 <- df2 %>% pivot_longer(-avenues)
FTGraphPos <- df2$name
ggplot(df2, aes(x = avenues, fill = as.factor(name), y = value)) +
geom_col(position = "dodge", width = 0.75) + coord_flip() +
labs(title = "Flytipping & Collection %", x = "ward_name", y = "Percentageperward") +
geom_text(aes(x= avenues, label = value), vjust = -0.1, position = "identity", size = 5)
I have tried the above and looked at lots of tutorials, but none covers exactly what I need: keeping the bars within each group in the same order regardless of their values, and scaling each series to 100%.
As Camille notes, to handle ordering of the categories in a plot, you need to set them as factors, and then use functions from the forcats package to handle the order. Here I am using fct_relevel() (note that it will automatically convert character variables to factors).
Your numeric values are in fact set to character, so they need to be set to numeric for the chart to make sense.
To cover point #2, I'm using group_by() to calculate percentages within each name.
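I have also fixed the labels so that they are properly dodged along with the bars. Also, note that you don't need to load ggplot2 separately if you are loading tidyverse, since it is attached automatically. The %>% pipe is re-exported by the core tidyverse packages too, so magrittr is only needed for its other operators (such as %<>%).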
df_plot <- df2 |>
mutate(name = fct_relevel(name, "CTP"),
value = as.numeric(value)) |>
group_by(name) |>
mutate(perc = value / sum(value)) |>
ungroup()
ggplot(df_plot, aes(x = value, y = avenues, fill = name)) +
geom_col(position = "dodge", width = 0.75) +
geom_text(aes(label = value), position = position_dodge(width = 0.75), size = 5) +
labs(title = "Flytipping & Collection %", x = "Percentageperward", y = "ward_name") +
guides(fill = guide_legend(reverse = TRUE))
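As a rough check of the two data steps described above (fixing the level order and computing percentages within each name), here is a minimal sketch. It rebuilds the question's sample amounts directly in long form; the names df_long, df_pct and amount are made up for this illustration and do not appear in the original code.

```r
library(dplyr)
library(forcats)

# Question's sample amounts, typed directly in long form (hypothetical names)
df_long <- data.frame(
  avenues = rep(c("Avenue1", "Avenue2", "Avenue3", "Avenue4"), times = 2),
  name    = rep(c("FTP", "CTP"), each = 4),
  amount  = c(1000, 2000, 1500, 250, 5, 15, 10, 2000)
)

df_pct <- df_long |>
  mutate(name = fct_relevel(name, "FTP")) |>    # move FTP to the front of the levels
  group_by(name) |>
  mutate(perc = 100 * amount / sum(amount)) |>  # share within each name
  ungroup()

levels(df_pct$name)                    # level order used for dodging and the legend
tapply(df_pct$perc, df_pct$name, sum)  # each series should total 100
```

Swap in whichever level you want first in fct_relevel(); to plot the scaled values, map perc rather than the raw amount in aes().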
Given the following data, I compose a data frame with a factor and a numeric column.
X2 <- c(4,4,3,5,4,4,2,3,4,3,5,5,4,3,3,4,2,3,3,4,3,5,3,3,4,4,3,3,5,4,5,4,4,3,5,5,3,5,4,5,5,4,4,2,3,3,3,4,4,4,2,4,4,4,4,4,2,4,4,3,3,3,5,3,4,3,3,4,4,4,4,1,3,3,4,3,3,2,4,1)
X3 <- rep("I",40)
X4 <- rep("C",40)
Group <- c(X3,X4)
dat2 <- data.frame(X2,Group)
dat2$Group <- factor(dat2$Group)
levels(dat2$Group) = c("I","C")
Group <- c("C","I")
grp.mean <- c(3.8,3.375)
mu2 <- data.frame(Group,grp.mean)
I want to compose the following bar plot with vertical lines at the group means. Here is my code:
p2 <-ggplot(dat2, aes(x=X2))+
geom_bar(aes(color=Group,fill=Group),alpha=0.4, position= position_dodge(preserve = "single"))+
geom_vline(data=mu2, aes(xintercept=grp.mean, color=Group), linetype="dashed")+
xlab("Density in Responses") +
ylab("Levels")+
theme_gray() +
theme_grey(base_size = 30)+
theme(axis.text=element_text(size=22),
axis.title=element_text(size=20,face="bold"),
legend.title=element_text(size=16),
legend.text=element_text(size=14))+
theme(plot.title = element_text(hjust = 0.5,size=19,face="bold"))
p2
And I get a plot that meets all my expectations except one. When one of the conditions (C or I) has no observations for a value, the remaining bar changes position, and I don't know why. By my logic, it should stay in its usual position. I attach an image so you can see what is going on.
As you can see, the blue bar has taken the place of the red bar where the red bar is absent (because its count is 0). Does anyone know why this is happening, and is there any way I can fix it?
Thanks!
One work-around is to count the number of observations before ggplot, and plot the count information.
Note I have swapped X3 and X4 in your first Group vector so that red is on the left and blue is on the right.
library(tidyverse)
X2 <- c(4,4,3,5,4,4,2,3,4,3,5,5,4,3,3,4,2,3,3,4,3,5,3,3,4,4,3,3,5,4,5,4,4,3,5,5,3,5,4,5,5,4,4,2,3,3,3,4,4,4,2,4,4,4,4,4,2,4,4,3,3,3,5,3,4,3,3,4,4,4,4,1,3,3,4,3,3,2,4,1)
X3 <- rep("I",40)
X4 <- rep("C",40)
Group <- c(X4,X3)
dat2 <- data.frame(X2,Group)
dat2$Group <- factor(dat2$Group)
levels(dat2$Group) = c("I","C")
Group <- c("C","I")
grp.mean <- c(3.8,3.375)
mu2 <- data.frame(Group,grp.mean)
dat2 %>% group_by(X2, Group) %>% summarize(n = n()) %>% complete(Group, fill = list(n = 0)) %>%
ggplot(aes(x=X2, n))+
geom_bar(aes(color=Group,fill=Group),alpha=0.4, position= position_dodge(), stat = "identity")+
geom_vline(data=mu2, aes(xintercept=grp.mean, color=Group), linetype="dashed")+
xlab("Density in Responses") +
ylab("Levels")+
theme_gray() +
theme_grey(base_size = 30)+
theme(axis.text=element_text(size=22),
axis.title=element_text(size=20,face="bold"),
legend.title=element_text(size=16),
legend.text=element_text(size=14))+
theme(plot.title = element_text(hjust = 0.5,size=19,face="bold"))
#> `summarise()` has grouped output by 'X2'. You can override using the `.groups`
#> argument.
Created on 2022-05-13 by the reprex package (v2.0.1)
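To see what the count-then-complete step does on its own, here is a small sketch with made-up data (toy and counts are hypothetical names). It uses count() as shorthand for group_by() plus summarize(n = n()), and completes over both columns, so it is a variation on the code above rather than a copy of it.

```r
library(dplyr)
library(tidyr)

# Toy data: value 1 occurs only in group C (hypothetical example)
toy <- data.frame(
  X2    = c(1, 2, 2, 3, 3, 3),
  Group = factor(c("C", "C", "I", "C", "I", "I"), levels = c("C", "I"))
)

counts <- toy |>
  count(X2, Group) |>                       # one row per observed combination
  complete(X2, Group, fill = list(n = 0))   # add the missing ones with n = 0

counts
```

Because every X2 value now has a row for both groups, a dodged bar chart of n has a (possibly zero-height) slot for each group at every position.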
Like the other commenters, I also got the opposite bar colors, so I swapped the I/C values and the result matched yours.
I am not sure my result will satisfy you, since in my version the blue bar fills the whole space at X = 1.
Anyway, I also used cleaner code to generate the table (it assumes library(tidyverse) is loaded):
X2 <- c(4,4,3,5,4,4,2,3,4,3,5,5,4,3,3,4,2,3,3,4,3,5,3,3,4,4,3,3,5,4,5,4,4,3,5,5,3,5,4,5, #40
5,4,4,2,3,3,3,4,4,4,2,4,4,4,4,4,2,4,4,3,3,3,5,3,4,3,3,4,4,4,4,1,3,3,4,3,3,2,4,1)
# First col is the raw X2 values (80 responses).
# Second col is Group, which is a factor. It is a repetition of C/I, each 40 times (40*C and then 40*I)
# grp.mean is the grouped mean of X2 (by each Group [C/I]), repeated on every row.
dat2 <- data.frame(
X2,
Group=factor(rep(c("C","I"),each=40))
) %>% group_by(Group) %>% mutate(grp.mean=mean(X2)) %>% ungroup()
dat2 %>% ggplot(aes(X2,fill=Group))+
geom_bar(position="dodge")+
geom_vline(xintercept = dat2$grp.mean)
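A possible variation, sketched under the assumption that one coloured line per group is wanted: summarise the means into a two-row data frame and pass that to geom_vline(), instead of repeating the mean on all 80 rows. The object names mu2 and p are reused or chosen here for illustration.

```r
library(dplyr)
library(ggplot2)

X2 <- c(4,4,3,5,4,4,2,3,4,3,5,5,4,3,3,4,2,3,3,4,3,5,3,3,4,4,3,3,5,4,5,4,4,3,5,5,3,5,4,5,
        5,4,4,2,3,3,3,4,4,4,2,4,4,4,4,4,2,4,4,3,3,3,5,3,4,3,3,4,4,4,4,1,3,3,4,3,3,2,4,1)

dat2 <- data.frame(X2, Group = factor(rep(c("C", "I"), each = 40)))

# One row per group holding that group's mean of X2
mu2 <- dat2 |>
  group_by(Group) |>
  summarise(grp.mean = mean(X2))

p <- ggplot(dat2, aes(X2, fill = Group)) +
  geom_bar(position = "dodge") +
  geom_vline(data = mu2, aes(xintercept = grp.mean, colour = Group),
             linetype = "dashed")
p
```

The computed means agree with the values hard-coded in the question (3.8 for C and 3.375 for I).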
I have a dataset that I want to summarize by calculating the ratio of 2 columns. However, I also need to calculate this ratio by different ‘cuts’ of my data set. i.e, ratio of the overall data, ratio by year, ratio by type, etc.
I will also need to put each ratio calculation in a bar chart.
What I want to know is whether I can plot all these bar charts without having to create a separate summary grouping dataset first.
For example, right now, before I send it to ggplot, I use group_by/summarize to my data first to calculate the ratio. Then I send it to ggplot.
Chart1 <- data %>% group_by(cut1) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart1, aes(x=cut1, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
For chart 2 and chart 3, I do the same thing again:
Chart2 <- data %>% group_by(cut2) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart2, aes(x=cut2, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
Chart3 <- data %>% group_by(cut3) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart3, aes(x=cut3, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
Is there another way to do this? Initially, I was thinking there would be a way that I can just create the ratio once and then I can use it over many times (similar to a calculated field in excel pivot tables). Is there something better than the above method?
Also, if summarizing each ratio separately is the best way, how do I do a facet chart? For example, I may want to do a facet of ratio to cut 1 and cut 2
edit: more info with example using created data:
c1 <- c('a','a','a', 'b','b', 'b', 'c','c','c')
c2 <- c('aa','aa','aa', 'bb','bb', 'bb', 'cc','cc','cc')
v1 <-c(1,2,3,4,5,6,7,8,9)
v2<-c(9,8,7,6,5,4,3,2,1)
mydata <-data.frame(c1,c2,v1,v2)
Chart1 <- mydata %>% group_by(c1) %>% summarise(ratio=sum(v1)/sum(v2))
ggplot(Chart1, aes(x=c1, y=ratio)) + geom_bar(stat='identity', fill = "tomato2") + theme(axis.text.x=element_text(angle=90))
The outcome I want is to understand how best to summarize data before plotting it. Do I need to summarize each calculation by each grouping separately, or is there an easier way?
For the example above, if I wanted to calculate the ratio grouped by c1, then create another ratio chart grouped by c2, and then another by c3, do I need to do 3 different aggregations?
Does this accomplish what you want?
library(tidyverse)
c1 <- c('a','a','a', 'b','b', 'b', 'c','c','c')
c2 <- c('aa','aa','aa', 'bb','bb', 'bb', 'cc','cc','cc')
v1 <-c(1,2,3,4,5,6,7,8,9)
v2<-c(9,8,7,6,5,4,3,2,1)
mydata <-data.frame(c1,c2,v1,v2)
Chart1 <- mydata %>%
gather(key = 'cuts', value = 'categories', -(v1:v2)) %>%
group_by(cuts, categories) %>%
summarise(ratio=sum(v1)/sum(v2))
# This lets you facet them onto the same chart,
# but that doesn't really make sense,
# since the cuts will have different x axes
ggplot(Chart1, aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
facet_grid(cuts~.) +
theme(axis.text.x=element_text(angle=90))
# This lets you make each plot separately
Chart1 %>%
filter(cuts == 'c1') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
# Use a for loop to save all of the plots to files
for(i in 1:(length(mydata)-2)){
p <-
Chart1 %>%
filter(cuts == names(mydata)[[i]]) %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
ggsave(paste0("myPlot",i,".png"), plot = p)
}
The only thing I wasn't sure about is how to facet the different cuts if they don't have the same values on the x-axis. If you just want to stack them on top of each other, you could use the gridExtra package:
library(gridExtra)
plot1 <- Chart1 %>%
filter(cuts == 'c1') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
plot2 <- Chart1 %>%
filter(cuts == 'c2') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
grid.arrange(plot1, plot2, ncol=1, nrow = 2)
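On the faceting question, one option that may help is facet_wrap() with scales = "free_x", which gives each panel its own x-axis categories. The sketch below is a suggestion rather than part of the original answer; it also swaps gather() for pivot_longer(), and the name ratios is made up here.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

mydata <- data.frame(
  c1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  c2 = c("aa", "aa", "aa", "bb", "bb", "bb", "cc", "cc", "cc"),
  v1 = 1:9,
  v2 = 9:1
)

# One aggregation covering every cut: reshape the cut columns to long form first
ratios <- mydata |>
  pivot_longer(c(c1, c2), names_to = "cuts", values_to = "categories") |>
  group_by(cuts, categories) |>
  summarise(ratio = sum(v1) / sum(v2), .groups = "drop")

# Each panel keeps only the categories that belong to its own cut
p <- ggplot(ratios, aes(x = categories, y = ratio)) +
  geom_col(fill = "tomato2") +
  facet_wrap(~ cuts, scales = "free_x")
p
```

Adding another cut column then only means listing it inside pivot_longer(); the summary and the plot code stay the same.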
I have this dataframe with timepoints (a, b and c), labels (l1, l2, l3) and frequencies that are distributed over the timepoints and labels.
I want to create a sankey diagram with the ggalluvial package in R.
Here's some code:
library(tidyverse)
library(forcats)
library(ggalluvial)
library(magrittr)
plotAlluvial <- function(.df,name=freq) {
y_name <- enquo(name)
ggplot(.df,
aes(
x = tp,
stratum = lbl,
alluvium = id,
label=lbl,
fill = lbl,
y=!!y_name
)
) +
geom_stratum() +
geom_flow(stat = "flow", color = "darkgray") +
geom_text(stat = "stratum") +
scale_fill_brewer(type = "qual", palette = "Set2")
}
x1=c(6,0,0,5,5,4,2,0,3)
x2=c(5,5,3,0,0,5,0,7,0)
df=data_frame(tp1=rep(c('a','b'),each=9),
lbl1=c(rep(c('l1','l2','l3'),2,each=3)),
tp2=rep(c('b','c'),each=9),
lbl2=c(rep(c('l1','l2','l3'),6)),
freq=c(x1,x2)
)
df2=df %>%
mutate(id=row_number()) %>%
unite(un1,c(tp1,lbl1)) %>%
unite(un2,c(tp2,lbl2)) %>%
tidyr::gather(key,value,-c(freq,id)) %>%
separate('value',c('tp','lbl'))
df2.left= df2 %>%
dplyr::filter(!(key=='un1' & tp=='b'))
df2.right= df2 %>%
dplyr::filter(!(key=='un2' & tp=='b'))
I can plot the left side and plot the right side of the diagram I want:
plotAlluvial(df2.left)
plotAlluvial(df2.right)
But if I try to plot the left and right side at the same time I get this plot:
plotAlluvial(df2)
When I use the code above, the diagram shows too much frequency at timepoint b. Its stratum should be as high as the other two strata, i.e. have a height of 25.
What am I doing wrong? How can I create a diagram that combines the first two plots?
EDIT:
After a comment I added a proportion of the frequencies variable. Now the stratum b is of the correct height but the incoming and outgoing flows still only occupy 50% of each condition in timepoint b.
df2 %<>% group_by(tp) %>% mutate(prop = freq / sum(freq)) %>%
ungroup()
plotAlluvial(df2,prop)
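A quick diagnostic sketch, not a fix: summing freq per timepoint suggests why b is twice as tall. The data frame below is a hand-built stand-in for the tp and freq columns of df2 (the name long is made up), assuming the reshaping keeps both the left-side rows ending in b and the right-side rows starting in b.

```r
x1 <- c(6, 0, 0, 5, 5, 4, 2, 0, 3)
x2 <- c(5, 5, 3, 0, 0, 5, 0, 7, 0)

# Stand-in for df2: every id appears once per side of its a-b or b-c pair
long <- data.frame(
  id   = rep(1:18, times = 2),
  tp   = c(rep(c("a", "b"), each = 9),   # tp1 of each row
           rep(c("b", "c"), each = 9)),  # tp2 of each row
  freq = rep(c(x1, x2), times = 2)
)

# Total frequency per timepoint: b receives the a-b rows and the b-c rows
tapply(long$freq, long$tp, sum)
```

Timepoints a and c each total 25, while b totals 50 because it is counted once as the end of the a-b pairs and once as the start of the b-c pairs, under different ids.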
Sample data:
set.seed(145)
df <- data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
df.plot <- ggplot(df,aes(x=Age,y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
Within the ggplot, how can I reorder x=Age, by the sum of Ranks "Extremely" and "Very" only?
I tried using the below, without success.
df.plot <- ggplot(df,aes(x=reorder(Age,Rank=="Extremely",sum),y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
Couple of notes:
The way you are simulating your data does not rule out that some ages are missing some categories (which is fine), nor that some categories are duplicated within an age. I am assuming the latter does not happen in your real data, so I have left it as is. Note also that your simulation does not produce percentages that add up, although the category names suggest that they should.
The way I would do this is to create the ordering of age based on your desired logic, and then pass that order to the factor call. This decouples the ordering logic and allows arbitrary ordering logic.
Here is then what I think you are looking for:
library(ggplot2)
library(dplyr)
library(scales)
set.seed(145)
# simulate the data
df_foo = data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
# get the ordering that you are interested in
age_order = df_foo %>%
filter(Rank %in% c("Extremely", "Very")) %>%
group_by(Age) %>%
summarize(SumRank = sum(Percent)) %>%
arrange(desc(SumRank)) %>%
`[[`("Age")
# in some cases ages do not appear in the order because the
# ordering logic does not span all categories
age_order = c(age_order, setdiff(unique(df_foo$Age), age_order))
# make age a factor sorted by the ordering above
ggplot(df_foo, aes(x = factor(Age, levels = age_order), y = Percent, fill = Rank))+
geom_bar(stat = "identity") +
coord_flip() +
theme_bw() +
scale_y_continuous(labels = percent)
This code produces a stacked bar chart with Age ordered by the combined "Extremely" and "Very" percentages.
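Since the simulated data are random, the ordering logic is easier to follow on a small deterministic example. This is a sketch with made-up values (toy is a hypothetical name), using pull() in place of the `[[` call:

```r
library(dplyr)

# Toy data: Age 3 has no "Extremely" or "Very" rows at all
toy <- data.frame(
  Age     = c(1, 1, 2, 2, 3),
  Rank    = c("Extremely", "Slightly", "Very", "Extremely", "Slightly"),
  Percent = c(0.002, 0.009, 0.004, 0.003, 0.008)
)

# Order ages by the summed Percent of the two ranks of interest
age_order <- toy |>
  filter(Rank %in% c("Extremely", "Very")) |>
  group_by(Age) |>
  summarise(SumRank = sum(Percent)) |>
  arrange(desc(SumRank)) |>
  pull(Age)

# Append ages that were filtered out entirely so no level is lost
age_order <- c(age_order, setdiff(unique(toy$Age), age_order))

toy$Age <- factor(toy$Age, levels = age_order)
levels(toy$Age)
```

Age 2 comes first (0.007), then Age 1 (0.002), and Age 3 is appended last because it has no qualifying rows.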