Alluvial Charts - Flow is not showing - r

I have a table similar to this one:
Organization Timeframe Code id
1 Agencia1 Fortnight 1 International Affairs 1
2 Agencia2 Fortnight 1 Environment 2
3 Agencia2 Fortnight 1 Health 4
4 Agencia2 Fortnight 1 Public Policy 5
5 Agencia1 Fortnight 2 Politics 6
6 Agencia2 Fortnight 2 Disaster 7
7 Agencia1 Fortnight 2 Public Policy 8
8 Agencia1 Fortnight 2 Federal Government 9
9 Agencia1 Fortnight 2 Business 10
10 Agencia1 Fortnight 3 Federal Government 11
11 Agencia2 Fortnight 3 Dissemination - COVID19 12
12 Agencia1 Fortnight 3 Transparency - COVID19 13
13 Agencia2 Fortnight 3 Economy - COVID19 14
14 Agencia1 Fortnight 3 Prevention - COVID19 15
15 Agencia1 Fortnight 4 Economy 16
16 Agencia1 Fortnight 4 Media 17
17 Agencia1 Fortnight 4 Leisure 18
18 Agencia1 Fortnight 4 Politics 19
19 Agencia1 Fortnight 4 Prevention - COVID19 20
20 Agencia1 Fortnight 5 Prevention - COVID19 21
I would like to build an alluvial chart that could highlight the different topics covered by each organization during the fortnight. I managed to create a chart like this one but the flow isn't work.
So far, what I have done was it:
alluvial_data <- as.data.frame(FC_Outlets %>%select(Organization, Timeframe, Code))
alluvial_data <- alluvial_data %>% mutate(id = row_number())
#Remove duplicates
alluvial_data <- alluvial_data %>%
distinct(Organization, Timeframe, Code, .keep_all = TRUE)
# Convert Timeframe to Factor - Categorical Variable
alluvial_data$Timeframe <-as.factor(alluvial_data$Timeframe)
# Convert Code to String
alluvial_data$Code <-as.character(alluvial_data$Code)
library(RColorBrewer)
# Define the number of colors you want
nb.cols <- 10
mycolors <- colorRampPalette(brewer.pal(8, "Set2"))(nb.cols)
# Chart
ggplot(alluvial_data,
aes(x = Timeframe, stratum = Code, alluvium = id,
fill = Code, label = Code)) +
#scale_fill_brewer(type = "qual", palette = "Set2") +
scale_fill_manual(values = mycolors) +
geom_flow(stat = "alluvium", lode.guidance = "frontback",
color = "darkgray") +
geom_stratum() +
theme(legend.position = "bottom") +
ggtitle("Organizations")
Could you help me to identify why the alluvial chart is not working properly?

It is due to incorrect usage of aes in ggplot. The following code
c <- c(LETTERS[1:4], LETTERS[2:6], LETTERS[3:7], LETTERS[3:8])
t <- c(rep("Fortnight 1",4), rep("Fortnight 2",5), rep("Fortnight 3",5), rep("Fortnight 4",6))
s <- c(rep(c("Female","Male"),10))
ag <- c(2,3,4,6,11,13)
f <- rnorm(20,20,99)
df <- data.frame(Timeframe=t,Code=c,Sex=s,Freq=round(abs(f))) %>% mutate(Organization=ifelse((row_number() %in% ag), "Agencia2","Agencia1" ))
alluvial_data <- as.data.frame(df %>%select(Organization, Timeframe, Code, Freq, Sex))
alluvial_data <- alluvial_data %>% mutate(id = row_number())
#Remove duplicates
alluvial_data <- alluvial_data %>%
distinct(Organization, Timeframe, Code, Sex, .keep_all = TRUE)
#levels(alluvial_data$Timeframe)
# Convert Timeframe to Factor - Categorical Variable
alluvial_data$Timeframe <-as.factor(alluvial_data$Timeframe)
# Convert Code to String
alluvial_data$Code <-as.character(alluvial_data$Code)
library(RColorBrewer)
# Define the number of colors you want
nb.cols <- 10
mycolors <- colorRampPalette(brewer.pal(8, "Set2"))(nb.cols)
mycolor2 <- colorRampPalette(brewer.pal(2, "Set2"))(nb.cols)
# Chart
ggplot(alluvial_data,
aes(y = Freq, axis1 = Organization, axis2 = Timeframe, axis3 = Code,fill=Sex)) +
#scale_fill_brewer(type = "qual", palette = "Set2") +
scale_x_discrete(limits=c("Organization","Timeframe","Code"), expand=c(0.05,0.05)) +
scale_fill_manual(values = mycolors) +
geom_flow(stat = "alluvium", lode.guidance = "frontback" #, color="grey"
) +
geom_stratum(width = 1/4, fill = "cyan", color = "grey") +
geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
theme(legend.position = "bottom") +
ggtitle("Organizations") +
guides(fill=guide_legend(override.aes = list(color=mycolors[1:2])))+
labs(fill=NULL)
gives this output:

Related

Get the proportions in ggplot2 (R) bar charts

Can someone provide me some hints as to what I am doing wrong in my code? Or what I need to correct to get the correct percentages? I am trying to get the proportions by manipulating my ggplot2 code. I would prefer not mutating a column. However, if I can't get ggplot2 to give me the correct proportions, I will then be open to adding columns.
Here is the reproduceable data:
cat_type<-c("1", "1","2","3","1","3", "3","2","1","1","1","3","3","2","3","2","3","1","3","3","3","1","3","1","3","1","1","3","1")
country<-c("India","India","India","India","India","India","India","India","India","India","Indonesia","Russia","Indonesia","Russia","Russia","Indonesia","Indonesia","Indonesia","Indonesia","Russia","Indonesia","Russia","Indonesia","Indonesia","Russia", "Russia", "India","India","India")
bigcats<-data.frame(cat_type=cat_type,country=country)
My data gives me the following proportions (these are correct):
> table(bigcats$cat_type, bigcats$country) ## raw numbers
India Indonesia Russia
1 7 3 2
2 2 1 1
3 4 5 4
>
> 100*round(prop.table(table(bigcats$cat_type, bigcats$country),2),3) ## proportions by column total
India Indonesia Russia
1 53.8 33.3 28.6
2 15.4 11.1 14.3
3 30.8 55.6 57.1
However, my ggplot2 is giving me the incorrect proportions:
bigcats %>% ggplot(aes(x=country, y = prop.table(stat(count)), fill=cat_type, label = scales::percent(prop.table(stat(count)))))+
geom_bar(position = position_fill())+
geom_text(stat = "count", position = position_fill(vjust=0.5),colour = "white", size = 5)+
labs(y="Percent",title="Top Big Cat Populations",x="Country")+
scale_fill_discrete(name=NULL,labels=c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar"))+
scale_y_continuous(labels = scales::percent)
The issue is that using prop.table(stat(count)) will not compute the proportions by categories or your countries, i.e. you do:
library(dplyr)
bigcats %>%
count(cat_type, country) %>%
mutate(pct = scales::percent(prop.table(n)))
#> cat_type country n pct
#> 1 1 India 7 24.1%
#> 2 1 Indonesia 3 10.3%
#> 3 1 Russia 2 6.9%
#> 4 2 India 2 6.9%
#> 5 2 Indonesia 1 3.4%
#> 6 2 Russia 1 3.4%
#> 7 3 India 4 13.8%
#> 8 3 Indonesia 5 17.2%
#> 9 3 Russia 4 13.8%
Making use of a helper function to reduce code duplication you could compute your desired proportions like so:
library(ggplot2)
prop <- function(count, group) {
count / tapply(count, group, sum)[group]
}
ggplot(bigcats, aes(
x = country, y = prop(after_stat(count), after_stat(x)),
fill = cat_type, label = scales::percent(prop(after_stat(count), after_stat(x)))
)) +
geom_bar(position = position_fill()) +
geom_text(stat = "count", position = position_fill(vjust = 0.5), colour = "white", size = 5) +
labs(y = "Percent", title = "Top Big Cat Populations", x = "Country") +
scale_fill_discrete(name = NULL, labels = c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar")) +
scale_y_continuous(labels = scales::percent)
Created on 2021-07-28 by the reprex package (v2.0.0)

How to name different geom_smooth lines in ggplot?

I plotted several smooth lines by using different columns in my dataframe. I put those lines into one figure. However, I don't know how to name those lines. Since I do not have "groups" in my dataframe, the legend does not fix the problem.
Sorry that I did not figure out how to upload my data. But I did upload the figure I plotted.
# dfplot is my dataframe
head(dfplot)
fall_t falltsq winter_t wintertsq spring_t springtsq fall_p fallpsq winter_p winterpsq spring_p springpsq ffall_t fwinter_t
1 15.08704 227.6187 1.9648148 3.8604973 14.15000 200.2225 6.12 37.4544 2.83 8.0089 10.27 105.4729 3.303902 3.365150
2 14.67407 215.3284 -0.9666667 0.9344444 13.15000 172.9225 13.89 192.9321 3.21 10.3041 16.02 256.6404 3.043521 3.331537
3 14.13519 199.8035 2.2333333 4.9877778 10.95926 120.1054 7.39 54.6121 6.42 41.2164 17.20 295.8400 3.208130 3.164450
4 15.32963 234.9975 -1.5629630 2.4428532 11.02593 121.5710 11.21 125.6641 4.46 19.8916 13.98 195.4404 2.972689 3.342540
5 14.12222 199.4372 -1.4611111 2.1348457 14.49630 210.1426 10.58 111.9364 11.71 137.1241 12.89 166.1521 3.382247 3.654554
6 13.25926 175.8080 1.0388889 1.0792901 14.82963 219.9179 14.56 211.9936 4.14 17.1396 8.84 78.1456 3.327567 3.323556
fspring_t
1 3.253946
2 3.087533
3 3.485115
4 3.331752
5 3.213873
6 3.033545
p <- ggplot(data = dfplot) +
geom_smooth(mapping = aes(x = fall_t, y = ffall_t), color = "red", se = F) +
geom_smooth(mapping = aes(x = winter_t, y = fwinter_t), color = "blue", se = F) +
geom_smooth(mapping = aes(x = spring_t, y = fspring_t), color = "green", se = F)
p + xlab("Temperature") + ylab("log yields") + theme(legend.position="right")
As noted in the comments, ggplot2 works best when data are in "long" format, rather than "wide".
We can use tidyr::gather to transform your example data:
library(tidyverse) # for dplyr, tidyr, ggplot2
dfplot_long <- dfplot %>%
select(spring_t, fall_t, winter_t, fspring_t, ffall_t, fwinter_t) %>%
gather(Var, Val, 1:3) %>%
gather(Var2, Val2, 1:3) %>%
filter(Var == gsub("^f", "", Var2)) %>%
mutate(season = gsub("_t", "", Var))
dfplot
Var Val Var2 Val2 season
1 spring_t 14.1500000 fspring_t 3.253946 spring
2 spring_t 13.1500000 fspring_t 3.087533 spring
3 spring_t 10.9592600 fspring_t 3.485115 spring
4 spring_t 11.0259300 fspring_t 3.331752 spring
5 spring_t 14.4963000 fspring_t 3.213873 spring
6 spring_t 14.8296300 fspring_t 3.033545 spring
7 fall_t 15.0870400 ffall_t 3.303902 fall
8 fall_t 14.6740700 ffall_t 3.043521 fall
9 fall_t 14.1351900 ffall_t 3.208130 fall
10 fall_t 15.3296300 ffall_t 2.972689 fall
11 fall_t 14.1222200 ffall_t 3.382247 fall
12 fall_t 13.2592600 ffall_t 3.327567 fall
13 winter_t 1.9648148 fwinter_t 3.365150 winter
14 winter_t -0.9666667 fwinter_t 3.331537 winter
15 winter_t 2.2333333 fwinter_t 3.164450 winter
16 winter_t -1.5629630 fwinter_t 3.342540 winter
17 winter_t -1.4611111 fwinter_t 3.654554 winter
18 winter_t 1.0388889 fwinter_t 3.323556 winter
And then:
dfplot_long %>%
ggplot(aes(Val, Val2)) +
geom_smooth(aes(color = season), se = FALSE)
Result:

ggplot color bar for less than and great than Average

What I am trying to write is, if the Team on the X axis is less than Average on the X axis or in this case 17.62, color it different and same for Greater than. So Average on the X axis is black and then the ones above and below are different colors.
dput()Road.Team Goalie.Pts.at.Home.Vs.Road.TEAM
1 ANA 16.67692
2 ARI 23.20000
3 BOS 18.86667
4 BUF 23.14667
5 CAR 23.44615
6 CBJ 16.62857
7 CGY 17.33333
8 CHI 21.77143
9 COL 18.70769
10 DAL 18.70000
11 DET 20.32000
12 EDM 16.94118
13 FLA 23.68000
14 LAK 12.14118
15 MIN 17.49333
16 MTL 21.71429
17 NJD 11.64444
18 NSH 17.12000
19 NYI 15.55556
20 NYR 18.04444
21 OTT 13.31429
22 PHI 21.25714
23 PIT 20.28235
24 SJS 14.45714
25 STL 15.80000
26 TBL 12.21333
27 TOR 10.44444
28 VAN 14.40000
29 VGK 16.35000
30 WPG 19.41333
31 WSH 15.26154
32 Average 17.62340
rm(stats)
stats <- read.csv("HomeGoaliesPtsforVisitingTeam.csv")
stats
library(tidyverse)
p <- stats %>%
gather(key, value, -Road.Team) %>%
ggplot(aes(x=Road.Team, y=value)) +
geom_col(position = "dodge")
p + labs(y = "FanDuel Goalie Pts",x = "Road Team",title = "Points for Home Goalies Vs Road Team")
Create a logical variable in your stats data frame and pass that to the fill argument in aes(). Something like:
stats$compare <- stats$y <= mean(stats$y)
p <- stats %>%
gather(key, value, -Road.Team) %>%
ggplot(aes(x=Road.Team, y=value, fill=compare)) +
geom_col(position = "dodge")

Ordering a 2 bar plot in R

I have a data set as below and I have created a graph with below code as suggested in a previous question. What I want to do is order the bars by rankings rather than team names. Is that possible to do in ggplot?
Team Names PLRankingsReverse Grreserve
Liverpool 20 20
Chelsea 19 19
Manchester City 15 18
Arsenal 16 17
Tottenham 18 16
Manchester United 8 15
Everton 10 14
Watford 13 13
Burnley 17 12
Southampton 9 11
WBA 11 10
Stoke 4 9
Bournemouth 12 8
Leicester 7 7
Middlesbrough 14 6
C. Palace 6 5
West Ham 1 4
Hull 3 3
Swansea 5 2
Sunderland 2 1
And here is the code:
alldata <- read.csv("premierleague.csv")
library(ggplot2)
library(reshape2)
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y= value, fill = variable), xlab="Team Names") +
geom_bar(stat="identity", width=.5, position = "dodge")
Thanks for the help!
In this case you need to sort your data frame prior to melting and capture the order. You can then use this to set the limit order on scale_x_discrete, or you can factor Team Name in your aes string.
Using factor:
ordr <- order(alldata$`Team Names`, alldata$PLRankingsReverse, decreasing = TRUE)
alldata <- melt(alldata)
ggplot(alldata, aes(x = factor(`Team Name`, ordr), y = value, fill = variable) +
labs(x = "Team Name") +
geom_bar(stat = "identity", width = .5, position = "dodge")
Using scale_x_discrete:
ordr <- alldata$`Team Name`[order(alldata$PLRankingsReverse, decreasing = TRUE)]
alldata <- melt(alldata)
ggplot(alldata, aes(x = `Team Name`, y = value, fill = variable) +
labs(x = "Team Name") +
geom_bar(stat = "identity", width =. 5, position = "dodge") +
scale_x_discrete(limits = ordr)

Drawing colored US State map with cut_number() in R

I have a dataframe called "drawdata":
GeoName Ranking
1 Alabama 15
2 Alaska 2
3 Arizona 28
4 Arkansas 12
5 California 19
6 Colorado 7
7 Connecticut 42
8 Delaware 37
9 District of Columbia 9
10 Florida 38
11 Georgia 11
12 Hawaii 48
13 Idaho 10
14 Illinois 16
15 Indiana 26
16 Iowa 34
17 Kansas 27
18 Kentucky 20
19 Louisiana 4
20 Maine 51
21 Maryland 30
22 Massachusetts 39
23 Michigan 14
24 Minnesota 23
25 Mississippi 41
26 Missouri 32
27 Montana 25
28 Nebraska 21
29 Nevada 45
30 New Hampshire 47
31 New Jersey 33
32 New Mexico 5
33 New York 44
34 North Carolina 13
35 North Dakota 31
36 Ohio 35
37 Oklahoma 6
38 Oregon 18
39 Pennsylvania 40
40 Rhode Island 49
41 South Carolina 29
42 South Dakota 46
43 Tennessee 43
44 Texas 3
45 Utah 17
46 Vermont 50
47 Virginia 8
48 Washington 24
49 West Virginia 22
50 Wisconsin 36
51 Wyoming 1
And I want to draw a US State map with different colors for each ranking. The code I have is:
names(drawdata) = c('region','value')
drawdata[,1] = tolower(drawdata[,1])
states = data.frame(state.center, state.abb)
states_map = map_data("state")
df = merge(drawdata, states_map, by = "region")
df$num = 49
p1 = ggplot(data = df, aes(x = long, y = lat, group = group))
p1 = p1 + geom_polygon(aes(fill = cut_number(value, num[1])))
p1 = p1 + geom_path(colour = 'gray', linestyle = 2)
p1 = p1 + scale_fill_brewer('', palette = 'PuRd')
p1 = p1 + coord_map()
p1 = p1 + scale_x_continuous(breaks=NULL) + scale_y_continuous(breaks=NULL)
p1 = p1 + theme(legend.position="none")
p1 = p1 + geom_text(data = states, aes(x = x, y = y, label = state.abb, group = NULL), size = 2)
p1
This perfectly works if 'num', or the number of colors to fill, is small. However, when I set 'num=49', then it produces an error:
Error in cut.default(x, breaks(x, "n", n), include.lowest = TRUE, ...) :
'breaks' are not unique
When I alter the code from
p1 = p1 + geom_polygon(aes(fill = cut_number(value, num[1])))
to
p1 = p1 + geom_polygon(aes(fill = cut_number(unique(value), num[1])))
then it gives me a different error:
Error: Aesthetics must either be length one, or the same length as the dataProblems:cut_number(unique(value), num[1])
I want a map where every 49 States in the map have different colors, each reflecting their 'Ranking'. Any help is very appreciated!
Brewer palettes deliberately have small maximums (generally < 12) since it's pretty much impossible for humans to map the subtle differences to the discrete values you have. You can achieve what you're looking for by "faking" it with scale_fill_gradient2 (NOTE: I deliberately left the legend in as you should too):
library(ggplot2)
names(drawdata) <- c('region','value')
drawdata[,1] <- tolower(drawdata[,1])
states <- data.frame(state.center, state.abb)
states <- states[!(states$state.abb %in% c("AK", "HI")),] # they aren't part of states_map
states_map <- map_data("state")
p1 <- ggplot()
# borders
p1 <- p1 + geom_map(data=states_map, map=states_map,
aes(x=long, y=lat, map_id=region),
color="white", size=0.15)
# fills
p1 <- p1 + geom_map(data=drawdata, map=states_map,
aes(fill=value, map_id=region),
color="white", size=0.15)
# labels
p1 <- p1 + geom_text(data=states,
aes(x=x, y=y, label=state.abb, group=NULL), size=2)
# decent projection
p1 <- p1 + coord_map("albers", lat0=39, lat1=45)
p1 <- p1 + scale_fill_gradient2(low="#f7f4f9", mid="#df65b0", high="#67001f")
# better theme
p1 <- p1 + labs(x=NULL, y=NULL)
p1 <- p1 + theme_bw()
p1 <- p1 + theme(panel.grid=element_blank())
p1 <- p1 + theme(panel.border=element_blank())
p1 <- p1 + theme(axis.ticks=element_blank())
p1 <- p1 + theme(axis.text=element_blank())
p1
You can get an even better result with scale_fill_distiller which does alot under the scenes to let you use a Color Brewer palette with continuous data (I'd argue you do not have continuous data tho):
p1 <- p1 + scale_fill_distiller(palette="PuRd")
I'd strongly suggest continuing to use cut like you had originally and having a max of 9 breaks to fit into the Color Brewer palette you're trying to work with. In reality, folks are still going to need a table to really grok the rankings (never assume Americans know either state shapes, locations or even the two-letter abbreviations for them), so I'd also pretty much just suggest using an actual table with full names at least with this choropleth if not in place of it.
Note also that the way you're trying to build the map deliberately excluded Alaska, Hawaii and the District of Columbia. You'll need to use a real shapefile and something like I cover here to get them to show up nicely.
If you want different colors for each state, using a gradient, you can work with scale_fill_gradient. Here is one version, using green and red at the ends of the gradient, so that each state is on that scale.
ggplot(data = df, aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = value)) +
geom_path(colour = 'gray', linestyle = 2) +
scale_fill_gradient(low = "green", high = "red") +
coord_map() +
scale_x_continuous(breaks=NULL) + scale_y_continuous(breaks=NULL) +
theme(legend.position="none") +
geom_text(data = states, aes(x = x, y = y, label = state.abb, group = NULL), size = 2)

Resources