ggplot doesn't order levels since the last update - r

I'm trying to produce this plot here:
But the order on the right side gets mixed up.
Each horizontal line holds two stacked bar plots, one in the positive and one in the negative direction. Each has its own data frame: df1 for the left side, df2 for the right side. The middle category is split in half, with one half on the left and the other half on the right.
I tried to reorder the factor df2$level, which holds the order for the right-side bar plot, but it didn't change a thing (of course I took the order = as.numeric(level) out of the ggplot2 call).
df2$level <- factor(df2$level, levels=rev(levels(df2$level)))
df2$level
Here is the example data:
library("plyr")
library("dplyr")
library("stringr")
library("ggplot2")
# example data
Variable<-c("1","1","1","1","1","2","2","2","2","2","3","3","3","3","3","4","4","4","4","4")
level<-c(5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1)
perc_w<-c(3.70,11.80,10.10,25.80,38.60,2.00,16.90,13.25,28.80,25.80,1.80,6.50,9.35,33.60,39.40,3.50,12.40,14.10,34.80,21.10)
df<-data.frame(Variable,level,perc_w)
df$perc_w<-as.numeric(df$perc_w)
df$level<-as.factor(df$level)
# item text
items<-c("~ It´s not known, if climate change is real",
"~ In my opinion, the risks of climate change are exaggerated by activists",
"~ Climate change is not as dangerous as it is claimed",
"~ I´m convinced that we can handle climate change")
df$Variable<-as.character(df$Variable)
df$Variable[df$Variable==1]<-items[1]
df$Variable[df$Variable==2]<-items[2]
df$Variable[df$Variable==3]<-items[3]
df$Variable[df$Variable==4]<-items[4]
df$Variable<-as.ordered(df$Variable)
# calculate halves of the neutral category
df.split <-df %>% filter(level==3) %>% mutate(perc_w=as.numeric(perc_w/2))
# replace old neutral-category
df<-df %>% filter(!level==3)
df<-full_join(df,df.split) %>% arrange(level) %>% arrange(desc(Variable))
#split dataframe
df1<-df %>% filter(level == 3 | level== 2 | level==1)
df2<-df %>% filter(level == 5 | level== 4 | level==3) %>% mutate(perc_w = perc_w *-1)
# automatic line break
df1$Variable <-str_wrap(df1$Variable, width = 41)
df2$Variable <-str_wrap(df2$Variable, width = 41)
# reorder factor "Variable"
df1$Variable <- factor(df1$Variable, levels=rev(unique(df1$Variable)))
df2$Variable <- factor(df2$Variable, levels=rev(unique(df2$Variable)))
#Plot
p<-ggplot() +
geom_bar(data=df1, aes(x = Variable, y=perc_w, fill = level, order = -as.numeric(level)),position="stack", stat="identity") +
geom_bar(data=df2, aes(x = Variable, y=perc_w, fill = level, order = as.numeric(level)),position="stack", stat="identity") +
geom_hline(yintercept = 0, color =c("black"))+
theme_bw() +
coord_flip() +
guides(fill=guide_legend(title="",reverse=TRUE)) +
scale_fill_brewer(palette="Blues", name="",labels=c("--","-","0","+","++")) +
labs(title=expression(atop(bold("Attitudes towards climate change"),
atop(italic("Some roughly translated items"),""))),
y="percentages",x="") +
theme(legend.position="top",
axis.ticks = element_blank(),
plot.title = element_text(size=25),
axis.title.y=element_text(size=16),
axis.text.y=element_text(size=13),
axis.title.x=element_text(size=16),
axis.text.x=element_text(size=13),
legend.title=element_text(size=14),
legend.text=element_text(size=12)
)
p

Shamelessly taking a hint from this SO question:
ggplot will plot the stacked bars in the order it encounters them when using stat = "identity".
So, adding
df1 <- df1 %>% group_by(Variable) %>% arrange(desc(level))
df2 <- df2 %>% group_by(Variable) %>% arrange(level)
just before your plot code should give you the desired results.
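For readers on a newer ggplot2: as far as I know, the order aesthetic was deprecated in ggplot2 2.0.0, and from 2.2.0 on the stacking order follows the factor levels of the grouping variable rather than the row order, so sorting the rows may no longer have any effect. Here is a minimal sketch under that assumption. It uses position_stack(reverse = TRUE) on the negative side so that the neutral category sits next to zero on both sides. The data frames pos_side and neg_side are a made-up miniature of df1 and df2, not the original data.

```r
library(ggplot2)

# Hypothetical miniature of df1 (positive side) and df2 (negative side).
# Both use the same factor levels so that the fill scale is shared.
pos_side <- data.frame(
  Variable = rep(c("Item A", "Item B"), each = 3),
  level    = factor(rep(1:3, times = 2), levels = 1:5),
  perc_w   = c(38.6, 25.8, 5.05, 25.8, 28.8, 6.6)
)
neg_side <- data.frame(
  Variable = rep(c("Item A", "Item B"), each = 3),
  level    = factor(rep(3:5, times = 2), levels = 1:5),
  perc_w   = -c(5.05, 11.8, 3.7, 6.6, 16.9, 2.0)
)

p <- ggplot() +
  # Default stacking puts the highest level (3, the neutral half) next to zero
  geom_col(data = pos_side, aes(x = Variable, y = perc_w, fill = level)) +
  # Reversed stacking puts the lowest level (again 3) next to zero
  geom_col(data = neg_side, aes(x = Variable, y = perc_w, fill = level),
           position = position_stack(reverse = TRUE)) +
  geom_hline(yintercept = 0, color = "black") +
  coord_flip() +
  scale_fill_brewer(palette = "Blues", drop = FALSE)
```

If the bars come out in the opposite order on your installation, toggling reverse on the affected layer is the first thing to try.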

Related

Horizontal Group Bar Chart - How to scale to 100% and how to specify the order of the layers

So I have the following code which produces:
The issue here is twofold:
1. The grouped bar chart automatically places the highest value on top (i.e. for Avenue 4, CTP is on top), whereas I always want FTP to be shown first and CTP after it (so always the blue bar, then the red bar).
2. I need all of the values to scale to 100 or 100% within their respective group (so for CTP, Avenue 4 would have a huge bar but the other avenues would be extremely tiny).
I am new to R and Stack Overflow, so sorry if anything is wrong or you need more detail; any help is greatly appreciated.
library(ggplot2)
library(tidyverse)
library(magrittr)
# function to specify decimals
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall=k))
# sample data
avenues <- c("Avenue1", "Avenue2", "Avenue3", "Avenue4")
flytip_amount <- c(1000, 2000, 1500, 250)
collection_amount <- c(5, 15, 10, 2000)
# create data frame from the sample data
df <- data.frame(avenues, flytip_amount, collection_amount)
# got it working - now to test
df3 <- df
SumFA <- sum(df3$flytip_amount)
df3$FTP <- (df3$flytip_amount/SumFA)*100
df3$FTP <- specify_decimal(df3$FTP, 1)
SumCA <- sum(df3$collection_amount)
df3$CTP <- (df3$collection_amount/SumCA)*100
df3$CTP <- specify_decimal(df3$CTP, 1)
# Now we have percentages remove whole values
df2 <- df3[,c(1,4,5)]
df2 <- df2 %>% pivot_longer(-avenues)
FTGraphPos <- df2$name
ggplot(df2, aes(x = avenues, fill = as.factor(name), y = value)) +
geom_col(position = "dodge", width = 0.75) + coord_flip() +
labs(title = "Flytipping & Collection %", x = "ward_name", y = "Percentageperward") +
geom_text(aes(x= avenues, label = value), vjust = -0.1, position = "identity", size = 5)
I have tried the above and looked at lots of tutorials, but none covers exactly what I need: keeping the bars in the same order within each group regardless of their values, and scaling each measure to 100%.
As Camille notes, to control the order of the categories in a plot, you need to set them as factors and then use functions from the forcats package to handle the order. Here I am using fct_relevel() (note that it will automatically convert character variables to factors).
Your numeric values are in fact stored as character, because specify_decimal() returns the strings produced by format(), so they need to be converted back to numeric for the chart to make sense.
To cover point #2, I'm using group_by() to calculate percentages within each name.
I have also fixed the labels so that they are properly dodged along with the bars. Also, note that you don't need to load ggplot2 or magrittr separately if you load tidyverse: ggplot2 is attached with it and the %>% pipe is made available.
df_plot <- df2 |>
mutate(name = fct_relevel(name, "CTP"),
value = as.numeric(value)) |>
group_by(name) |>
mutate(perc = value / sum(value)) |>
ungroup()
ggplot(df_plot, aes(x = value, y = avenues, fill = name)) +
geom_col(position = "dodge", width = 0.75) +
geom_text(aes(label = value), position = position_dodge(width = 0.75), size = 5) +
labs(title = "Flytipping & Collection %", x = "Percentageperward", y = "ward_name") +
guides(fill = guide_legend(reverse = TRUE))
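The two fixes described in this answer (keep the values numeric, compute the share within each measure) can also be sketched without the string-formatting helper. This is a minimal sketch, not the answerer's code: the column names amount, perc and label are my own, and the label is formatted in a separate column so that the plotted value stays numeric.

```r
library(ggplot2)

df <- data.frame(
  avenues           = c("Avenue1", "Avenue2", "Avenue3", "Avenue4"),
  flytip_amount     = c(1000, 2000, 1500, 250),
  collection_amount = c(5, 15, 10, 2000)
)

# Long format: one row per avenue and measure.
# The factor levels fix a consistent FTP/CTP order for every avenue.
long <- data.frame(
  avenues = rep(df$avenues, times = 2),
  name    = factor(rep(c("FTP", "CTP"), each = nrow(df)), levels = c("FTP", "CTP")),
  amount  = c(df$flytip_amount, df$collection_amount)
)

# Share of each avenue within its measure, in percent (numeric)
long$perc <- 100 * long$amount / ave(long$amount, long$name, FUN = sum)

# Text label with one decimal, kept apart from the numeric column
long$label <- sprintf("%.1f", long$perc)

p <- ggplot(long, aes(x = perc, y = avenues, fill = name)) +
  geom_col(position = "dodge", width = 0.75) +
  geom_text(aes(label = label),
            position = position_dodge(width = 0.75), hjust = -0.1) +
  labs(title = "Flytipping & Collection %", x = "Percentage per ward", y = "ward_name")
```

Because perc is numeric, the axis is continuous and the bar lengths are comparable; swapping the order of the levels in factor() swaps which measure is drawn first.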

Ggplot2 rearranges wrongly the bars in a plot bar when value is null

Given the following data, I compose a data frame with a factor and a numeric column.
X2 <- c(4,4,3,5,4,4,2,3,4,3,5,5,4,3,3,4,2,3,3,4,3,5,3,3,4,4,3,3,5,4,5,4,4,3,5,5,3,5,4,5,5,4,4,2,3,3,3,4,4,4,2,4,4,4,4,4,2,4,4,3,3,3,5,3,4,3,3,4,4,4,4,1,3,3,4,3,3,2,4,1)
X3 <- rep("I",40)
X4 <- rep("C",40)
Group <- c(X3,X4)
dat2 <- data.frame(X2,Group)
dat2$Group <- factor(dat2$Group)
levels(dat2$Group) = c("I","C")
Group <- c("C","I")
grp.mean <- c(3.8,3.375)
mu2 <- data.frame(Group,grp.mean)
I want to compose the following bar plot with vertical lines at the means. Here is my code:
p2 <-ggplot(dat2, aes(x=X2))+
geom_bar(aes(color=Group,fill=Group),alpha=0.4, position= position_dodge(preserve = "single"))+
geom_vline(data=mu2, aes(xintercept=grp.mean, color=Group), linetype="dashed")+
xlab("Density in Responses") +
ylab("Levels")+
theme_gray() +
theme_grey(base_size = 30)+
theme(axis.text=element_text(size=22),
axis.title=element_text(size=20,face="bold"),
legend.title=element_text(size=16),
legend.text=element_text(size=14))+
theme(plot.title = element_text(hjust = 0.5,size=19,face="bold"))
p2
And I get a plot that meets all my expectations except one. When one of the conditions (C or I) has no observations for a value, the remaining bar changes position, and I don't know why. I expected it to stay in its own slot. I attach an image so you can see what is going on.
As you can see, the blue bar has taken the place of the red bar where the red bar is absent (because its count is 0). Does anyone know why this is happening, and is there any way I can fix it?
Thanks!
One work-around is to count the number of observations before ggplot, and plot the count information.
Note I have swapped X3 and X4 in your first Group vector so that red is on the left and blue is on the right.
library(tidyverse)
X2 <- c(4,4,3,5,4,4,2,3,4,3,5,5,4,3,3,4,2,3,3,4,3,5,3,3,4,4,3,3,5,4,5,4,4,3,5,5,3,5,4,5,5,4,4,2,3,3,3,4,4,4,2,4,4,4,4,4,2,4,4,3,3,3,5,3,4,3,3,4,4,4,4,1,3,3,4,3,3,2,4,1)
X3 <- rep("I",40)
X4 <- rep("C",40)
Group <- c(X4,X3)
dat2 <- data.frame(X2,Group)
dat2$Group <- factor(dat2$Group)
levels(dat2$Group) = c("I","C")
Group <- c("C","I")
grp.mean <- c(3.8,3.375)
mu2 <- data.frame(Group,grp.mean)
dat2 %>% group_by(X2, Group) %>% summarize(n = n()) %>% complete(Group, fill = list(n = 0)) %>%
ggplot(aes(x=X2, n))+
geom_bar(aes(color=Group,fill=Group),alpha=0.4, position= position_dodge(), stat = "identity")+
geom_vline(data=mu2, aes(xintercept=grp.mean, color=Group), linetype="dashed")+
xlab("Density in Responses") +
ylab("Levels")+
theme_gray() +
theme_grey(base_size = 30)+
theme(axis.text=element_text(size=22),
axis.title=element_text(size=20,face="bold"),
legend.title=element_text(size=16),
legend.text=element_text(size=14))+
theme(plot.title = element_text(hjust = 0.5,size=19,face="bold"))
#> `summarise()` has grouped output by 'X2'. You can override using the `.groups`
#> argument.
Created on 2022-05-13 by the reprex package (v2.0.1)
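The zero-filling step is the part that keeps every bar in its own slot, and it can also be sketched in base R: table() counts every combination of X2 and Group, including the ones that never occur. The data frame responses below is a small made-up example, not the data from the question.

```r
library(ggplot2)

# Hypothetical responses: group C never answers 1, group I never answers 5
responses <- data.frame(
  X2    = c(2, 3, 3, 4, 4, 5, 1, 2, 3, 3, 4, 4),
  Group = rep(c("C", "I"), each = 6)
)

# Cross-tabulate; combinations that do not occur get an explicit count of 0
counts <- as.data.frame(
  table(X2 = responses$X2, Group = responses$Group),
  responseName = "n"
)

# table() turns X2 into a factor; convert it back to numbers for the x axis
counts$X2 <- as.numeric(as.character(counts$X2))

p <- ggplot(counts, aes(x = X2, y = n, fill = Group)) +
  geom_col(position = "dodge", alpha = 0.4)
```

With the zero rows present, each x value has one slot per group, so a group with no observations leaves an empty slot instead of handing its position to the other group.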
Like the other commenters, I also got the opposite bar colors, so I swapped the I/C values and it matched yours.
I am not sure whether my result will satisfy you, as I only managed to make the blue bar fill the whole space at X = 1.
Anyway, I also used cleaner code to generate the table:
library(tidyverse)
X2 <- c(4,4,3,5,4,4,2,3,4,3,5,5,4,3,3,4,2,3,3,4,3,5,3,3,4,4,3,3,5,4,5,4,4,3,5,5,3,5,4,5, #40
5,4,4,2,3,3,3,4,4,4,2,4,4,4,4,4,2,4,4,3,3,3,5,3,4,3,3,4,4,4,4,1,3,3,4,3,3,2,4,1)
# First column: the X2 responses.
# Second column: Group, a factor with 40 "C" followed by 40 "I".
# grp.mean: the mean of X2 within each Group (C or I).
dat2 <- data.frame(
X2,
Group=factor(rep(c("C","I"),each=40))
) %>% group_by(Group) %>% mutate(grp.mean=mean(X2)) %>% ungroup()
dat2 %>% ggplot(aes(X2,fill=Group))+
geom_bar(position="dodge")+
geom_vline(xintercept = dat2$grp.mean)

how to plot summarized data in ggplot

I have a dataset that I want to summarize by calculating the ratio of 2 columns. However, I also need to calculate this ratio by different ‘cuts’ of my data set. i.e, ratio of the overall data, ratio by year, ratio by type, etc.
I will also need to put each ratio calculation in a bar chart.
What I want to know is whether I can plot all these bar charts without having to create a separate summary grouping dataset first.
For example, right now, before I send it to ggplot, I use group_by/summarise on my data to calculate the ratio. Then I send the result to ggplot.
Chart1 <- data %>% group_by(cut1) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart1, aes(x=cut1, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
For chart 2 and chart 3, I do the same thing again:
Chart2 <- data %>% group_by(cut2) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart2, aes(x=cut2, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
Chart3 <- data %>% group_by(cut3) %>% summarise(ratio=sum(column1)/sum(column2))
ggplot(Chart3, aes(x=cut3, y=ratio)) + geom_bar(stat='identity', fill = "tomato2")
Is there another way to do this? Initially, I was thinking there would be a way that I can just create the ratio once and then I can use it over many times (similar to a calculated field in excel pivot tables). Is there something better than the above method?
Also, if summarizing each ratio separately is the best way, how do I do a facet chart? For example, I may want to do a facet of ratio to cut 1 and cut 2
edit: more info with example using created data:
c1 <- c('a','a','a', 'b','b', 'b', 'c','c','c')
c2 <- c('aa','aa','aa', 'bb','bb', 'bb', 'cc','cc','cc')
v1 <-c(1,2,3,4,5,6,7,8,9)
v2<-c(9,8,7,6,5,4,3,2,1)
mydata <-data.frame(c1,c2,v1,v2)
Chart1 <- mydata %>% group_by(c1) %>% summarise(ratio=sum(v1)/sum(v2))
ggplot(Chart1, aes(x=c1, y=ratio)) + geom_bar(stat='identity', fill = "tomato2") + theme(axis.text.x=element_text(angle=90))
The outcome I want is to understand how best to summarize data before plotting it. Do I need to summarize each calculation for each grouping separately, or is there an easier way?
For the example above, if I wanted to calculate the ratio grouped by c1, then create another ratio chart grouped by c2, and then another by c3, do I need to do 3 different aggregations?
Does this accomplish what you want?
library(tidyverse)
c1 <- c('a','a','a', 'b','b', 'b', 'c','c','c')
c2 <- c('aa','aa','aa', 'bb','bb', 'bb', 'cc','cc','cc')
v1 <-c(1,2,3,4,5,6,7,8,9)
v2<-c(9,8,7,6,5,4,3,2,1)
mydata <-data.frame(c1,c2,v1,v2)
Chart1 <- mydata %>%
gather(key = 'cuts', value = 'categories', -(v1:v2)) %>%
group_by(cuts, categories) %>%
summarise(ratio=sum(v1)/sum(v2))
# This lets you facet them onto the same chart,
# but that doesn't really make sense,
# since the cuts will have different x axes
ggplot(Chart1, aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
facet_grid(cuts~.) +
theme(axis.text.x=element_text(angle=90))
# This lets you make each plot separately
Chart1 %>%
filter(cuts == 'c1') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
# Use a for loop to save all of the plots to files
for(i in 1:(length(mydata)-2)){
p <-
Chart1 %>%
filter(cuts == names(mydata)[[i]]) %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
ggsave(paste0("myPlot",i,".png"), plot = p)
}
The only thing I wasn't sure about is how to facet the different cuts if they don't have the same values on the x-axis. If you just want to stack them on top of each other, you could use the gridExtra package:
library(gridExtra)
plot1 <- Chart1 %>%
filter(cuts == 'c1') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
plot2 <- Chart1 %>%
filter(cuts == 'c2') %>%
ggplot(aes(x=categories, y=ratio)) +
geom_bar(stat='identity', fill = "tomato2") +
theme(axis.text.x=element_text(angle=90))
grid.arrange(plot1, plot2, ncol=1, nrow = 2)
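On the open point about facets with different x values: one option that may help is facet_wrap() with scales = "free_x", which gives each panel its own x axis. Here is a minimal sketch of the same reshape-then-summarise idea, using pivot_longer() in place of gather(). The object name ratios is my own, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

mydata <- data.frame(
  c1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  c2 = c("aa", "aa", "aa", "bb", "bb", "bb", "cc", "cc", "cc"),
  v1 = 1:9,
  v2 = 9:1
)

# Stack the grouping columns, then compute the ratio once for every cut
ratios <- mydata %>%
  pivot_longer(cols = c(c1, c2), names_to = "cuts", values_to = "categories") %>%
  group_by(cuts, categories) %>%
  summarise(ratio = sum(v1) / sum(v2), .groups = "drop")

# One panel per cut; free_x lets each panel show only its own categories
p <- ggplot(ratios, aes(x = categories, y = ratio)) +
  geom_col(fill = "tomato2") +
  facet_wrap(~ cuts, scales = "free_x") +
  theme(axis.text.x = element_text(angle = 90))
```

Adding a further cut then only means adding its column to cols; the summarise step and the plot stay the same.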

ggalluvial: How do I plot an alluvial diagram when I have a dataframe with links and nodes?

I have this dataframe with timepoints (a, b and c), labels (l1, l2, l3) and frequencies that are distributed over the timepoints and labels.
I want to create a sankey diagram with the ggalluvial package in R.
Here's some code:
library(tidyverse)
library(forcats)
library(ggalluvial)
library(magrittr)
plotAlluvial <- function(.df,name=freq) {
y_name <- enquo(name)
ggplot(.df,
aes(
x = tp,
stratum = lbl,
alluvium = id,
label=lbl,
fill = lbl,
y=!!y_name
)
) +
geom_stratum() +
geom_flow(stat = "flow", color = "darkgray") +
geom_text(stat = "stratum") +
scale_fill_brewer(type = "qual", palette = "Set2")
}
x1=c(6,0,0,5,5,4,2,0,3)
x2=c(5,5,3,0,0,5,0,7,0)
df=data_frame(tp1=rep(c('a','b'),each=9),
lbl1=c(rep(c('l1','l2','l3'),2,each=3)),
tp2=rep(c('b','c'),each=9),
lbl2=c(rep(c('l1','l2','l3'),6)),
freq=c(x1,x2)
)
df2=df %>%
mutate(id=row_number()) %>%
unite(un1,c(tp1,lbl1)) %>%
unite(un2,c(tp2,lbl2)) %>%
tidyr::gather(key,value,-c(freq,id)) %>%
separate('value',c('tp','lbl'))
df2.left= df2 %>%
dplyr::filter(!(key=='un1' & tp=='b'))
df2.right= df2 %>%
dplyr::filter(!(key=='un2' & tp=='b'))
I can plot the left side and plot the right side of the diagram I want:
plotAlluvial(df2.left)
plotAlluvial(df2.right)
But if I try to plot the left and right side at the same time I get this plot:
plotAlluvial(df2)
When I use the code above, the plot of the diagram has too many frequencies at timepoint b. The stratum should be as high as the other two strata, i.e. have a height of 25.
What am I doing wrong? How can I create a diagram that combines the first two plots?
EDIT:
After a comment I added a proportion of the frequencies variable. Now the stratum b is of the correct height but the incoming and outgoing flows still only occupy 50% of each condition in timepoint b.
df2 %<>% group_by(tp) %>% mutate(prop = freq / sum(freq)) %>%
ungroup()
plotAlluvial(df2,prop)

Rank Stacked Bar Chart by Sum of Subset of Fill Variable

Sample data:
set.seed(145)
df <- data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
df.plot <- ggplot(df,aes(x=Age,y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
Within the ggplot, how can I reorder x=Age, by the sum of Ranks "Extremely" and "Very" only?
I tried using the below, without success.
df.plot <- ggplot(df,aes(x=reorder(Age,Rank=="Extremely",sum),y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
A couple of notes:
The way you simulate your data does not rule out that, for some ages, not all categories are represented (which is fine), but also that, for some ages, some categories are duplicated. I am assuming that the latter is not true for your real data, so I have left it as is. Note also that your simulation logic does not produce percentages that add up, although the category names suggest that they should.
The way I would do this is to create the ordering of age based on your desired logic, and then pass that order to the factor call. This decouples the ordering logic and allows arbitrary ordering logic.
Here is then what I think you are looking for:
library(ggplot2)
library(dplyr)
library(scales)
set.seed(145)
# simulate the data
df_foo = data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
# get the ordering that you are interested in
age_order = df_foo %>%
filter(Rank %in% c("Extremely", "Very")) %>%
group_by(Age) %>%
summarize(SumRank = sum(Percent)) %>%
arrange(desc(SumRank)) %>%
`[[`("Age")
# in some cases ages do not appear in the order because the
# ordering logic does not span all categories
age_order = c(age_order, setdiff(unique(df_foo$Age), age_order))
# make age a factor sorted by the ordering above
ggplot(df_foo, aes(x = factor(Age, levels = age_order), y = Percent, fill = Rank))+
geom_bar(stat = "identity") +
coord_flip() +
theme_bw() +
scale_y_continuous(labels = percent)
Which code produces:
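As an aside on why the reorder() attempt in the question did not do what was intended: reorder(Age, Rank == "Extremely", sum) sums a logical vector, so it orders the ages by how many "Extremely" rows they have, not by their Percent. A minimal sketch that stays with reorder() is to sum a weight that equals Percent for the ranks of interest and 0 otherwise. The helper columns top_weight and Age_ordered are my own names, and I draw 20 uniform values here instead of recycling 10 as the question does.

```r
library(ggplot2)

set.seed(145)
df <- data.frame(
  Age     = sample(1:10, 20, replace = TRUE),
  Rank    = sample(c("Extremely", "Very", "Slightly", "Not At All"),
                   20, replace = TRUE),
  Percent = runif(20, 0, 0.01)
)

# Percent for the two ranks of interest, 0 for every other rank
df$top_weight <- df$Percent * (df$Rank %in% c("Extremely", "Very"))

# Order the Age levels by the per-age sum of that weight (ascending)
df$Age_ordered <- reorder(factor(df$Age), df$top_weight, FUN = sum)

p <- ggplot(df, aes(x = Age_ordered, y = Percent, fill = Rank)) +
  geom_bar(stat = "identity") +
  coord_flip()
```

With coord_flip() the ascending level order puts the age with the largest "Extremely" plus "Very" total at the top; negating top_weight flips the direction. Ages without either rank get a sum of 0, so they need no separate handling.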

Resources