Visualize rank-change using alluvial in R ggalluvial

I have a pretty basic df in which I have calculated the rank-change of values between two timestamps:
value rank_A rank_B group
1 A 1 1 A
2 B 2 3 A
3 C 3 2 B
4 D 4 4 B
5 E 5 8 A
6 F 6 5 C
7 G 7 6 C
8 H 8 7 A
What makes it a bit tricky (for me) is plotting the values on the Y-axis.
ggplot(df_alluvial, aes(y = value, axis1 = rank_A, axis2 = rank_B))+
geom_alluvium(aes(fill = group), width = 1/12)+
...
As of now, I can plot the rank-change and the groups successfully, but they are not linked to my value-names - there are no axis names and I don't know how to add them.
In the end it should look similar to this:
https://www.reddit.com/r/GraphicalExcellence/comments/4imh5f/alluvial_diagram_population_size_and_rank_of_uk/
Thanks for your advice!

Your update made the question clearer to me.
The y parameter should be a numerical value, and the data should be in 'long' format. I'm not sure how to change your data to fulfill these requirements, so I created some new data for this example. I have tried to make it similar to the data in the plot that you linked to.
The label and stratum aesthetics both map to the city names. You can use geom_text to label the strata.
# Load libraries
library(tidyverse)
library(ggalluvial)
# Create some data
df_alluvial <- tibble(
city = rep(c("London", "Birmingham", "Manchester"), 4),
year = rep(c(1901, 1911, 1921, 1931), each = 3),
size = c(0, 10, 100, 10, 15, 100, 15, 20, 100, 30, 25, 100))
# Notice the data is in long-format
df_alluvial
#> # A tibble: 12 x 3
#> city year size
#> <chr> <dbl> <dbl>
#> 1 London 1901 0
#> 2 Birmingham 1901 10
#> 3 Manchester 1901 100
#> 4 London 1911 10
#> 5 Birmingham 1911 15
#> 6 Manchester 1911 100
#> 7 London 1921 15
#> 8 Birmingham 1921 20
#> 9 Manchester 1921 100
#> 10 London 1931 30
#> 11 Birmingham 1931 25
#> 12 Manchester 1931 100
ggplot(df_alluvial,
aes(x = as.factor(year), stratum = city, alluvium = city,
y = size,
fill = city, label = city))+
geom_stratum(alpha = .5)+
geom_alluvium()+
geom_text(stat = "stratum", size = 3)
If you want to sort the cities based on their size, you can add decreasing = TRUE to all layers in the plot.
ggplot(df_alluvial,
aes(x = as.factor(year), stratum = city, alluvium = city,
y = size,
fill = city, label = city))+
geom_stratum(alpha = .5, decreasing = TRUE)+
geom_alluvium(decreasing = TRUE)+
geom_text(stat = "stratum", size = 3, decreasing = TRUE)
Created on 2019-11-08 by the reprex package (v0.3.0)
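For the rank table in the question, the reshaping to long format could look roughly like the sketch below. It only covers the reshaping step; the names df_ranks, df_long, timestamp and rank are made up for illustration, and how the result is best mapped onto ggalluvial aesthetics is left open.

```r
library(tidyr)

# The rank table from the question, typed in by hand
df_ranks <- data.frame(
  value  = c("A", "B", "C", "D", "E", "F", "G", "H"),
  rank_A = c(1, 2, 3, 4, 5, 6, 7, 8),
  rank_B = c(1, 3, 2, 4, 8, 5, 6, 7),
  group  = c("A", "A", "B", "B", "A", "C", "C", "A")
)

# One row per value and timestamp instead of one column per timestamp
df_long <- pivot_longer(
  df_ranks,
  cols         = c(rank_A, rank_B),
  names_to     = "timestamp",   # becomes "A" / "B" once the prefix is stripped
  names_prefix = "rank_",
  values_to    = "rank"
)

df_long
```

Each of the 8 values now appears twice, once per timestamp, which is the layout the x / stratum / alluvium aesthetics in the plots above expect.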

Related

Get the proportions in ggplot2 (R) bar charts

Can someone provide me some hints as to what I am doing wrong in my code? Or what I need to correct to get the correct percentages? I am trying to get the proportions by manipulating my ggplot2 code. I would prefer not mutating a column. However, if I can't get ggplot2 to give me the correct proportions, I will then be open to adding columns.
Here is the reproducible data:
cat_type<-c("1", "1","2","3","1","3", "3","2","1","1","1","3","3","2","3","2","3","1","3","3","3","1","3","1","3","1","1","3","1")
country<-c("India","India","India","India","India","India","India","India","India","India","Indonesia","Russia","Indonesia","Russia","Russia","Indonesia","Indonesia","Indonesia","Indonesia","Russia","Indonesia","Russia","Indonesia","Indonesia","Russia", "Russia", "India","India","India")
bigcats<-data.frame(cat_type=cat_type,country=country)
My data gives me the following proportions (these are correct):
> table(bigcats$cat_type, bigcats$country) ## raw numbers
India Indonesia Russia
1 7 3 2
2 2 1 1
3 4 5 4
>
> 100*round(prop.table(table(bigcats$cat_type, bigcats$country),2),3) ## proportions by column total
India Indonesia Russia
1 53.8 33.3 28.6
2 15.4 11.1 14.3
3 30.8 55.6 57.1
However, my ggplot2 is giving me the incorrect proportions:
bigcats %>% ggplot(aes(x=country, y = prop.table(stat(count)), fill=cat_type, label = scales::percent(prop.table(stat(count)))))+
geom_bar(position = position_fill())+
geom_text(stat = "count", position = position_fill(vjust=0.5),colour = "white", size = 5)+
labs(y="Percent",title="Top Big Cat Populations",x="Country")+
scale_fill_discrete(name=NULL,labels=c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar"))+
scale_y_continuous(labels = scales::percent)
The issue is that prop.table(stat(count)) divides each count by the total over the whole data set rather than by the total within each country, i.e. it is equivalent to:
library(dplyr)
bigcats %>%
count(cat_type, country) %>%
mutate(pct = scales::percent(prop.table(n)))
#> cat_type country n pct
#> 1 1 India 7 24.1%
#> 2 1 Indonesia 3 10.3%
#> 3 1 Russia 2 6.9%
#> 4 2 India 2 6.9%
#> 5 2 Indonesia 1 3.4%
#> 6 2 Russia 1 3.4%
#> 7 3 India 4 13.8%
#> 8 3 Indonesia 5 17.2%
#> 9 3 Russia 4 13.8%
Making use of a helper function to reduce code duplication you could compute your desired proportions like so:
library(ggplot2)
prop <- function(count, group) {
count / tapply(count, group, sum)[group]
}
ggplot(bigcats, aes(
x = country, y = prop(after_stat(count), after_stat(x)),
fill = cat_type, label = scales::percent(prop(after_stat(count), after_stat(x)))
)) +
geom_bar(position = position_fill()) +
geom_text(stat = "count", position = position_fill(vjust = 0.5), colour = "white", size = 5) +
labs(y = "Percent", title = "Top Big Cat Populations", x = "Country") +
scale_fill_discrete(name = NULL, labels = c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar")) +
scale_y_continuous(labels = scales::percent)
Created on 2021-07-28 by the reprex package (v2.0.0)
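To see what such a helper computes outside of ggplot2, here is a small base R sketch. It uses the raw counts from the table() output in the question, typed in by hand; the vectors count and country are illustrative, and as.numeric() is added only to drop the names that tapply() attaches.

```r
# Counts per cat_type (1, 2, 3) within each country, from the table() output
count   <- c(7, 2, 4,   3, 1, 5,   2, 1, 4)
country <- rep(c("India", "Indonesia", "Russia"), each = 3)

# Divide each count by the total of its own group
prop <- function(count, group) {
  as.numeric(count / tapply(count, group, sum)[group])
}

round(100 * prop(count, country), 1)
#> [1] 53.8 15.4 30.8 33.3 11.1 55.6 28.6 14.3 57.1
```

These are the column-wise percentages that prop.table(..., 2) reports in the question.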

How to change behaviour of `check_overlap = TRUE`?

My data is structured as follows:
> Comparison
# A tibble: 12 x 3
round TotalShots Year
<int> <dbl> <dbl>
1 1 70 2021
2 2 68 2021
3 3 76 2021
4 4 73 2021
5 5 66 2021
6 6 70 2021
7 1 115 2020
8 2 106 2020
9 3 75 2020
10 4 73 2020
11 5 82 2020
12 6 84 2020
I can plot this in ggplot2 via:
ggplot(Comparison, aes(x = round, y = TotalShots,
colour = factor(Year), label = TotalShots)) +
geom_line() +
geom_point(size = 14) +
geom_text(colour = "black", size = 5, check_overlap = TRUE)
However, in the plot, the label at round 3 is printed as 76 and not 75. I assume this is because of check_overlap = TRUE, but the plot is wrong: for round = 3, Year = 2020 should have the label 75, not 76.
Is there any way to please fix this?
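check_overlap = TRUE does not change any label; it skips text that would overlap text already drawn in the same layer. At round 3 the two points (76 and 75) almost coincide, so the 76 label is kept and the overlapping 75 label is dropped. To show both, you can try the ggrepel library, which moves labels apart instead of hiding them.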
library(ggrepel)
library(ggplot2)
ggplot(Comparison, aes(x = round, y = TotalShots,
colour = factor(Year), label = TotalShots)) +
geom_line() +
geom_point(size = 14) +
geom_label_repel(colour = "black", size = 5, nudge_y = 0.8)

pie chart for selected combobox item

I want a chart like this
I can plot a pie chart in a dashboard, but I want to plot the pie chart for the item selected in a combobox, using plotly.
my Data
State=c ('USA', 'Belgium', 'France','Russia')
totalcases= c(553, 226, 742,370)
totalrecovered=c(12,22,78,21)
totaldeath=c(48,24,12,22)
DTF = data.frame(State,totalcases,totalrecovered,totaldeath)
My code to plot one pie-chart:
labels=c("unrecovered","death","recovered")
USA=filter(DTF,DTF$State=="USA" )
USA=c(USA$Totalcases,USA$Totaldeath,USA$Totalrecovred)
p1= plot_ly(labels = ~labels,
values = ~USA, type = 'pie',
marker = list(colors = brewer.pal(7,"Spectral")))
p1
Thanks.
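The main problem is the shape of your data: it is in wide format, and it is much easier to plot once reshaped to long format. Note also that the column names are lowercase (totalcases, not Totalcases), so USA$Totalcases returns NULL. Try this: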
library(plotly)
library(RColorBrewer)
library(dplyr)
library(tidyr)
State=c ('USA', 'Belgium', 'France','Russia')
totalcases= c(553, 226, 742,370)
totalrecovered=c(12,22,78,21)
totaldeath=c(48,24,12,22)
DTF = data.frame(State,totalcases,totalrecovered,totaldeath)
dtf_long <- DTF %>%
pivot_longer(-State, names_to = "labels") %>%
mutate(labels = gsub("total", "", labels),
labels = ifelse(labels == "cases", "unrecovered", labels))
dtf_long
#> # A tibble: 12 x 3
#> State labels value
#> <fct> <chr> <dbl>
#> 1 USA unrecovered 553
#> 2 USA recovered 12
#> 3 USA death 48
#> 4 Belgium unrecovered 226
#> 5 Belgium recovered 22
#> 6 Belgium death 24
#> 7 France unrecovered 742
#> 8 France recovered 78
#> 9 France death 12
#> 10 Russia unrecovered 370
#> 11 Russia recovered 21
#> 12 Russia death 22
usa <- filter(dtf_long, State == "USA")
p1 <- usa %>%
plot_ly(labels = ~labels,
values = ~value, type = 'pie',
marker = list(colors = brewer.pal(7, "Spectral")))
p1
Created on 2020-04-04 by the reprex package (v0.3.0)
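The question also asks for the chart to follow a combobox selection. Assuming the dashboard is a Shiny app (the question does not say so), a minimal sketch could use selectInput() as the combobox and renderPlotly() for the chart. The input/output ids "state" and "pie" and the helper state_rows() are hypothetical names chosen for this example.

```r
library(shiny)
library(plotly)
library(tidyr)

DTF <- data.frame(
  State          = c("USA", "Belgium", "France", "Russia"),
  totalcases     = c(553, 226, 742, 370),
  totalrecovered = c(12, 22, 78, 21),
  totaldeath     = c(48, 24, 12, 22)
)

# Long format: one row per State and measure (measure name in "labels")
dtf_long <- pivot_longer(DTF, -State, names_to = "labels")

# Hypothetical helper: the rows belonging to one state
state_rows <- function(state) dtf_long[dtf_long$State == state, ]

ui <- fluidPage(
  selectInput("state", "State", choices = DTF$State),  # the combobox
  plotlyOutput("pie")
)

server <- function(input, output, session) {
  # Redrawn automatically whenever the selection changes
  output$pie <- renderPlotly({
    plot_ly(state_rows(input$state),
            labels = ~labels, values = ~value, type = "pie")
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # start the app interactively
```

The labels here are the raw column names; they can be renamed with the same mutate() step used in the answer above.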

How to order a geom_col columns based on a summarised value in a pipe

I have tried to understand the other results, but I could not.
This is my dataset:
> HIST
# A tibble: 1,071 x 16
Ano Leilao Fonte UF Vend Projeto
<dbl> <chr> <chr> <chr> <chr> <chr>
1 2008 2008 Leilao 1 Bio SP Abengoa UTE São Luiz (Abengoa São Luiz)
2 2013 2013 A-5 1 Bio MS AMANDINA Amandina
3 2017 2017 A-6 Bio MG BEVAP BIOENERGETICA AROEIRA 2
4 2015 2015 A-5 1 Bio BA Bolt BOLTBAH
5 2013 2013 A-5 1 Bio BA Bolt CAMPO GRANDE
6 2013 2013 A-5 1 Bio PI Bolt CANTO DO BURITI
7 2010 2010 LER Bio TO Bunge PEDRO AFONSO
8 2015 2015 LFA Bio SP Clealco CLEALCO QUEIROZ
9 2015 2015 A-3 Bio SP Clealco CLEALCO QUEIROZ
10 2008 2008 Leilao 1 Bio MG CMAA UTE Vale do Tijuco
# ... with 1,061 more rows, and 10 more variables: CODPPA <dttm>, CAPEX <dbl>,
# MW <dbl>, GF <dbl>, FC <dbl>, PPA <dbl>, RMW <dbl>, WACC <dbl>, TIR <dbl>,
# VPL <dbl>
I want to make a graph sorted by the sum(MW), like this:
HIST %>%
group_by(Fonte, UF)%>%
summarise(SUMMW = sum(MW))%>%
arrange(desc(SUMMW))%>%
ggplot(aes(x = UF, y = SUMMW, fill = Fonte))+
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_col()
But the problem is that I get the following chart, which is not ordered by the sum of MW. I would like this graph's columns to be ordered by the height of the columns:
thank you, Paulo
I think the easiest way is to reorder your variable UF inside the aesthetics function aes with reorder(). Since the height of a stacked column is the sum of its segments, pass FUN = sum so that UF is ordered by total column height:
HIST %>%
  group_by(Fonte, UF) %>%
  summarise(SUMMW = sum(MW)) %>%
  arrange(desc(SUMMW)) %>%
  ggplot(aes(x = reorder(UF, desc(SUMMW), FUN = sum), y = SUMMW, fill = Fonte)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  geom_col()
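One thing worth checking with reorder() is its summary function, which defaults to the mean. The toy sketch below uses made-up numbers (not the HIST data) to show how the resulting level order can differ between the default and FUN = sum when a UF has several rows.

```r
# Made-up summary: SP has three Fonte rows, MG two, BA one
toy <- data.frame(
  UF    = factor(c("SP", "SP", "SP", "BA", "MG", "MG")),
  SUMMW = c(10, 10, 10, 25, 12, 6)
)

# Default FUN = mean: ordered by the average segment size per UF
levels(reorder(toy$UF, -toy$SUMMW))
#> [1] "BA" "SP" "MG"

# FUN = sum: ordered by the total (stacked) height per UF
levels(reorder(toy$UF, -toy$SUMMW, FUN = sum))
#> [1] "SP" "BA" "MG"
```

For stacked columns the second ordering is the one that matches the visible column heights.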
You can calculate the height of each bar first and assign UF the appropriate order as a factor. Otherwise ggplot will plot UF's values in alphabetical order on the x-axis.
# create summary data frame from HIST
df <- HIST %>%
group_by(Fonte, UF) %>%
summarise(SUMMW = sum(MW))
# calculate total bar height for each UF value, & sort accordingly.
df2 <- df %>%
group_by(UF) %>%
summarise(bar.heights = sum(SUMMW)) %>%
ungroup() %>%
arrange(desc(bar.heights))
# convert UF in the summary data frame to factor, with levels in the sorted order
df$UF <- factor(df$UF, levels = df2$UF)
rm(df2) # you can remove df2 after this; it's not needed anymore
# plot
ggplot(df,
aes(x = UF, y = SUMMW, fill = Fonte))+
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_col()

Ordering a 2 bar plot in R

I have a data set as below and I have created a graph with below code as suggested in a previous question. What I want to do is order the bars by rankings rather than team names. Is that possible to do in ggplot?
Team Names PLRankingsReverse Grreserve
Liverpool 20 20
Chelsea 19 19
Manchester City 15 18
Arsenal 16 17
Tottenham 18 16
Manchester United 8 15
Everton 10 14
Watford 13 13
Burnley 17 12
Southampton 9 11
WBA 11 10
Stoke 4 9
Bournemouth 12 8
Leicester 7 7
Middlesbrough 14 6
C. Palace 6 5
West Ham 1 4
Hull 3 3
Swansea 5 2
Sunderland 2 1
And here is the code:
alldata <- read.csv("premierleague.csv")
library(ggplot2)
library(reshape2)
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y= value, fill = variable), xlab="Team Names") +
geom_bar(stat="identity", width=.5, position = "dodge")
Thanks for the help!
In this case you need to capture the desired order from your data frame before melting it. You can then either convert the team name to a factor with those levels inside aes(), or pass the order as limits to scale_x_discrete. Note that read.csv converts the header "Team Names" to Team.Names. Both versions start from alldata as read by read.csv, before melt().
Using factor:
# team names sorted by ranking, highest first
ordr <- as.character(alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)])
alldata <- melt(alldata, id.vars = "Team.Names")
ggplot(alldata, aes(x = factor(Team.Names, levels = ordr), y = value, fill = variable)) +
  labs(x = "Team Name") +
  geom_bar(stat = "identity", width = .5, position = "dodge")
Using scale_x_discrete:
# team names sorted by ranking, highest first
ordr <- as.character(alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)])
alldata <- melt(alldata, id.vars = "Team.Names")
ggplot(alldata, aes(x = Team.Names, y = value, fill = variable)) +
  labs(x = "Team Name") +
  geom_bar(stat = "identity", width = .5, position = "dodge") +
  scale_x_discrete(limits = ordr)
