I have the data set below and have created a graph with the code below, as suggested in a previous question. I want to order the bars by ranking rather than by team name. Is that possible in ggplot?
Team Names PLRankingsReverse Grreserve
Liverpool 20 20
Chelsea 19 19
Manchester City 15 18
Arsenal 16 17
Tottenham 18 16
Manchester United 8 15
Everton 10 14
Watford 13 13
Burnley 17 12
Southampton 9 11
WBA 11 10
Stoke 4 9
Bournemouth 12 8
Leicester 7 7
Middlesbrough 14 6
C. Palace 6 5
West Ham 1 4
Hull 3 3
Swansea 5 2
Sunderland 2 1
And here is the code:
alldata <- read.csv("premierleague.csv")
library(ggplot2)
library(reshape2)
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y= value, fill = variable), xlab="Team Names") +
geom_bar(stat="identity", width=.5, position = "dodge")
Thanks for the help!
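In this case you need to sort your data frame prior to melting and capture the order. You can then use this order to set the limits on scale_x_discrete, or you can convert Team.Names to a factor with those levels inside aes(). (read.csv turns the "Team Names" header into Team.Names, which is the name used below.)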
Using factor:
# Team names sorted by ranking (computed before melting)
ordr <- as.character(alldata$Team.Names)[order(alldata$PLRankingsReverse, decreasing = TRUE)]
alldata <- melt(alldata)
ggplot(alldata, aes(x = factor(Team.Names, levels = ordr), y = value, fill = variable)) +
labs(x = "Team Name") +
geom_bar(stat = "identity", width = .5, position = "dodge")
Using scale_x_discrete:
# Team names sorted by ranking (computed before melting)
ordr <- as.character(alldata$Team.Names)[order(alldata$PLRankingsReverse, decreasing = TRUE)]
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y = value, fill = variable)) +
labs(x = "Team Name") +
geom_bar(stat = "identity", width = .5, position = "dodge") +
scale_x_discrete(limits = ordr)
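Both variants rely on the same idea: the order of the bars follows the order of the factor levels (or scale limits), not the order of the rows. A minimal base R sketch of that step, using only the first four teams from the question as a small illustrative data frame (the name teams is an assumption made for this example):

```r
# Small illustrative subset of the question's data (first four rows only)
teams <- data.frame(
  Team.Names = c("Liverpool", "Chelsea", "Manchester City", "Arsenal"),
  PLRankingsReverse = c(20, 19, 15, 16),
  stringsAsFactors = FALSE
)

# Team names sorted from highest to lowest ranking value
ordr <- teams$Team.Names[order(teams$PLRankingsReverse, decreasing = TRUE)]

# Store that order in the factor levels; ggplot draws discrete axes in level order
teams$Team.Names <- factor(teams$Team.Names, levels = ordr)

print(levels(teams$Team.Names))
```

Because the levels are set from ordr, Arsenal (16) is placed before Manchester City (15) even though it comes later in the data.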
I would like to use ggarrange to combine two plots into a single one, and use the same scaling for the y-axis to make it easier to compare the values, in my case, temperature calculations.
This is my sample data:
Sample Group Temperature_A Temperature_B
1 Sample01 A 20 34
2 Sample02 B 21 31
3 Sample03 A 25 35
4 Sample04 A 18 27
5 Sample05 B 19 29
6 Sample06 A 22 32
7 Sample07 B 23 33
8 Sample08 B 17 28
9 Sample09 A 15 26
10 Sample10 B 28 39
Using the following code, I get the following plot:
p1 <- ggplot(df, aes(Group, Temperature_A)) +
geom_boxplot()
p2 <- ggplot(df, aes(Group, Temperature_B)) +
geom_boxplot()
plate <- ggarrange(p1, p2, align = "hv")
plate
What I would like to have is for them to share their y-axis so that I can easily compare the two temperature calculation methods. I can do this manually with + scale_y_continuous(limits = c(a,b)) where I set the appropriate values for a and b, however, I have a lot of different dataframes with different temperature ranges. Ideally, I would use the lowest value (+ some space) from both plots, and the highest value (+ some space) from both plots for the limits of both plots. Is there a way to achieve this?
My manual approach and the desired output:
p1 <- ggplot(df, aes(Group, Temperature_A)) +
geom_boxplot() +
scale_y_continuous(limits = c(10, 40))
p2 <- ggplot(df, aes(Group, Temperature_B)) +
geom_boxplot() +
scale_y_continuous(limits = c(10, 40))
plate <- ggarrange(p1, p2, align = "hv")
plate
This is just to illustrate how you could use facets. Because you have two columns for temperature, this is called the 'wide format'. If you reshape the data to the long format, you can easily use facets.
library(ggplot2)
df <- read.table(text = " Sample Group Temperature_A Temperature_B
1 Sample01 A 20 34
2 Sample02 B 21 31
3 Sample03 A 25 35
4 Sample04 A 18 27
5 Sample05 B 19 29
6 Sample06 A 22 32
7 Sample07 B 23 33
8 Sample08 B 17 28
9 Sample09 A 15 26
10 Sample10 B 28 39", header = TRUE)
# Reshape
long <- tidyr::pivot_longer(
df, c("Temperature_A", "Temperature_B"),
values_to = "Temperature", names_to = "Temp_name"
)
# Example of facets
ggplot(long) +
geom_boxplot(aes(Group, Temperature)) +
facet_wrap(~ Temp_name)
Created on 2021-07-27 by the reprex package (v1.0.0)
I agree with teunbrand. Here is a slightly modified approach:
bring the data into long format
ignore outliers with outlier.shape = NA
adapt the y limits with coord_cartesian(ylim = ...) and
use facet_wrap
library(tidyverse)
df1 <- df %>%
pivot_longer(
cols = starts_with("Temperature"),
names_to = "Temperature",
values_to = "values"
)
ggplot(df1, aes(Group, values)) +
geom_boxplot(outlier.shape = NA) +
coord_cartesian(ylim = c(10, 40)) +
facet_wrap(~Temperature) +
theme_bw()
Specifically for J.Doe who appeared unsatisfied with the current accepted answer, I'll post another answer which again ignores the 'ggarrange' part, but generalises better. Instead, the {patchwork} package is used.
Notably, you can use & scale_y_continuous(...) to apply a scale to all previous plots. You can use the plot + plot syntax to combine plots.
library(ggplot2)
library(patchwork)
df <- read.table(text = " Sample Group Temperature_A Temperature_B
1 Sample01 A 20 34
2 Sample02 B 21 31
3 Sample03 A 25 35
4 Sample04 A 18 27
5 Sample05 B 19 29
6 Sample06 A 22 32
7 Sample07 B 23 33
8 Sample08 B 17 28
9 Sample09 A 15 26
10 Sample10 B 28 39", header = TRUE)
p1 <- ggplot(df, aes(Group, Temperature_A)) +
geom_boxplot()
p2 <- ggplot(df, aes(Group, Temperature_B)) +
geom_boxplot()
p1 + p2 & scale_y_continuous(limits = c(10, 40))
The same works if you have a list of plots and use wrap_plots().
plot_list <- list(p1, p2)
wrap_plots(plot_list) & scale_y_continuous(limits = c(10, 40))
Created on 2022-12-08 by the reprex package (v2.0.0)
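The answers above hard-code c(10, 40), while the question asked for limits derived from the data. One possible way to compute them, sketched in base R: take the range over both temperature columns and widen it by a fraction of its span. The helper name shared_limits and the 5% default padding are assumptions made for this example, not part of ggplot2 or patchwork.

```r
# Temperatures copied from the question's sample data
temperature_a <- c(20, 21, 25, 18, 19, 22, 23, 17, 15, 28)
temperature_b <- c(34, 31, 35, 27, 29, 32, 33, 28, 26, 39)

# Hypothetical helper: common range of all supplied vectors,
# widened on both sides by `pad` times the width of that range
shared_limits <- function(..., pad = 0.05) {
  rng <- range(..., na.rm = TRUE)
  rng + c(-1, 1) * pad * diff(rng)
}

lims <- shared_limits(temperature_a, temperature_b)
print(lims)
```

The result could then replace the fixed values, for example p1 + p2 & scale_y_continuous(limits = lims), with df$Temperature_A and df$Temperature_B passed to the helper for each data frame.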
Can someone provide me some hints as to what I am doing wrong in my code? Or what I need to correct to get the correct percentages? I am trying to get the proportions by manipulating my ggplot2 code. I would prefer not mutating a column. However, if I can't get ggplot2 to give me the correct proportions, I will then be open to adding columns.
Here is the reproducible data:
cat_type<-c("1", "1","2","3","1","3", "3","2","1","1","1","3","3","2","3","2","3","1","3","3","3","1","3","1","3","1","1","3","1")
country<-c("India","India","India","India","India","India","India","India","India","India","Indonesia","Russia","Indonesia","Russia","Russia","Indonesia","Indonesia","Indonesia","Indonesia","Russia","Indonesia","Russia","Indonesia","Indonesia","Russia", "Russia", "India","India","India")
bigcats<-data.frame(cat_type=cat_type,country=country)
My data gives me the following proportions (these are correct):
> table(bigcats$cat_type, bigcats$country) ## raw numbers
India Indonesia Russia
1 7 3 2
2 2 1 1
3 4 5 4
>
> 100*round(prop.table(table(bigcats$cat_type, bigcats$country),2),3) ## proportions by column total
India Indonesia Russia
1 53.8 33.3 28.6
2 15.4 11.1 14.3
3 30.8 55.6 57.1
However, my ggplot2 is giving me the incorrect proportions:
bigcats %>% ggplot(aes(x=country, y = prop.table(stat(count)), fill=cat_type, label = scales::percent(prop.table(stat(count)))))+
geom_bar(position = position_fill())+
geom_text(stat = "count", position = position_fill(vjust=0.5),colour = "white", size = 5)+
labs(y="Percent",title="Top Big Cat Populations",x="Country")+
scale_fill_discrete(name=NULL,labels=c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar"))+
scale_y_continuous(labels = scales::percent)
The issue is that prop.table(stat(count)) does not compute the proportions within each country. It divides every count by the grand total, i.e. it is equivalent to:
library(dplyr)
bigcats %>%
count(cat_type, country) %>%
mutate(pct = scales::percent(prop.table(n)))
#> cat_type country n pct
#> 1 1 India 7 24.1%
#> 2 1 Indonesia 3 10.3%
#> 3 1 Russia 2 6.9%
#> 4 2 India 2 6.9%
#> 5 2 Indonesia 1 3.4%
#> 6 2 Russia 1 3.4%
#> 7 3 India 4 13.8%
#> 8 3 Indonesia 5 17.2%
#> 9 3 Russia 4 13.8%
Making use of a helper function to reduce code duplication you could compute your desired proportions like so:
library(ggplot2)
prop <- function(count, group) {
count / tapply(count, group, sum)[group]
}
ggplot(bigcats, aes(
x = country, y = prop(after_stat(count), after_stat(x)),
fill = cat_type, label = scales::percent(prop(after_stat(count), after_stat(x)))
)) +
geom_bar(position = position_fill()) +
geom_text(stat = "count", position = position_fill(vjust = 0.5), colour = "white", size = 5) +
labs(y = "Percent", title = "Top Big Cat Populations", x = "Country") +
scale_fill_discrete(name = NULL, labels = c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar")) +
scale_y_continuous(labels = scales::percent)
Created on 2021-07-28 by the reprex package (v2.0.0)
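To see what the prop() helper does outside of ggplot, it can be run on the raw counts from the question's table. This is only an illustrative check: the vectors n and country_id below are typed in by hand from that table, with country_id standing in for the integer positions that after_stat(x) supplies inside the plot.

```r
# Same helper as in the answer: divide each count by the total of its group
prop <- function(count, group) {
  count / tapply(count, group, sum)[group]
}

# Counts per cat_type within each country, taken from the question's table
# (India: 7, 2, 4; Indonesia: 3, 1, 5; Russia: 2, 1, 4)
n <- c(7, 2, 4, 3, 1, 5, 2, 1, 4)
country_id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)  # 1 = India, 2 = Indonesia, 3 = Russia

# Percentages within each country, rounded to one decimal place
pct <- as.vector(round(100 * prop(n, country_id), 1))
print(pct)
```

These values agree with the column-wise prop.table() output shown in the question, whereas dividing by sum(n) reproduces the percentages the original plot displayed.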
I have a table similar to this one:
Organization Timeframe Code id
1 Agencia1 Fortnight 1 International Affairs 1
2 Agencia2 Fortnight 1 Environment 2
3 Agencia2 Fortnight 1 Health 4
4 Agencia2 Fortnight 1 Public Policy 5
5 Agencia1 Fortnight 2 Politics 6
6 Agencia2 Fortnight 2 Disaster 7
7 Agencia1 Fortnight 2 Public Policy 8
8 Agencia1 Fortnight 2 Federal Government 9
9 Agencia1 Fortnight 2 Business 10
10 Agencia1 Fortnight 3 Federal Government 11
11 Agencia2 Fortnight 3 Dissemination - COVID19 12
12 Agencia1 Fortnight 3 Transparency - COVID19 13
13 Agencia2 Fortnight 3 Economy - COVID19 14
14 Agencia1 Fortnight 3 Prevention - COVID19 15
15 Agencia1 Fortnight 4 Economy 16
16 Agencia1 Fortnight 4 Media 17
17 Agencia1 Fortnight 4 Leisure 18
18 Agencia1 Fortnight 4 Politics 19
19 Agencia1 Fortnight 4 Prevention - COVID19 20
20 Agencia1 Fortnight 5 Prevention - COVID19 21
I would like to build an alluvial chart that highlights the different topics covered by each organization during each fortnight. I managed to create a chart like this one, but the flows aren't working.
This is what I have done so far:
alluvial_data <- as.data.frame(FC_Outlets %>%select(Organization, Timeframe, Code))
alluvial_data <- alluvial_data %>% mutate(id = row_number())
#Remove duplicates
alluvial_data <- alluvial_data %>%
distinct(Organization, Timeframe, Code, .keep_all = TRUE)
# Convert Timeframe to Factor - Categorical Variable
alluvial_data$Timeframe <-as.factor(alluvial_data$Timeframe)
# Convert Code to String
alluvial_data$Code <-as.character(alluvial_data$Code)
library(RColorBrewer)
# Define the number of colors you want
nb.cols <- 10
mycolors <- colorRampPalette(brewer.pal(8, "Set2"))(nb.cols)
# Chart
ggplot(alluvial_data,
aes(x = Timeframe, stratum = Code, alluvium = id,
fill = Code, label = Code)) +
#scale_fill_brewer(type = "qual", palette = "Set2") +
scale_fill_manual(values = mycolors) +
geom_flow(stat = "alluvium", lode.guidance = "frontback",
color = "darkgray") +
geom_stratum() +
theme(legend.position = "bottom") +
ggtitle("Organizations")
Could you help me to identify why the alluvial chart is not working properly?
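It is due to incorrect usage of aes in ggplot. In the original code, alluvium = id gives every row its own alluvium, so no alluvium appears in more than one Timeframe and there is nothing for the flows to connect. The following code
library(dplyr)
library(ggplot2)
library(ggalluvial)
c <- c(LETTERS[1:4], LETTERS[2:6], LETTERS[3:7], LETTERS[3:8])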
t <- c(rep("Fortnight 1",4), rep("Fortnight 2",5), rep("Fortnight 3",5), rep("Fortnight 4",6))
s <- c(rep(c("Female","Male"),10))
ag <- c(2,3,4,6,11,13)
f <- rnorm(20,20,99)
df <- data.frame(Timeframe=t,Code=c,Sex=s,Freq=round(abs(f))) %>% mutate(Organization=ifelse((row_number() %in% ag), "Agencia2","Agencia1" ))
alluvial_data <- as.data.frame(df %>%select(Organization, Timeframe, Code, Freq, Sex))
alluvial_data <- alluvial_data %>% mutate(id = row_number())
#Remove duplicates
alluvial_data <- alluvial_data %>%
distinct(Organization, Timeframe, Code, Sex, .keep_all = TRUE)
#levels(alluvial_data$Timeframe)
# Convert Timeframe to Factor - Categorical Variable
alluvial_data$Timeframe <-as.factor(alluvial_data$Timeframe)
# Convert Code to String
alluvial_data$Code <-as.character(alluvial_data$Code)
library(RColorBrewer)
# Define the number of colors you want
nb.cols <- 10
mycolors <- colorRampPalette(brewer.pal(8, "Set2"))(nb.cols)
mycolor2 <- colorRampPalette(brewer.pal(2, "Set2"))(nb.cols)
# Chart
ggplot(alluvial_data,
aes(y = Freq, axis1 = Organization, axis2 = Timeframe, axis3 = Code,fill=Sex)) +
#scale_fill_brewer(type = "qual", palette = "Set2") +
scale_x_discrete(limits=c("Organization","Timeframe","Code"), expand=c(0.05,0.05)) +
scale_fill_manual(values = mycolors) +
geom_flow(stat = "alluvium", lode.guidance = "frontback" #, color="grey"
) +
geom_stratum(width = 1/4, fill = "cyan", color = "grey") +
geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
theme(legend.position = "bottom") +
ggtitle("Organizations") +
guides(fill=guide_legend(override.aes = list(color=mycolors[1:2])))+
labs(fill=NULL)
gives this output:
I have a pretty basic df in which I have calculated the rank-change of values between two timestamps:
value rank_A rank_B group
1 A 1 1 A
2 B 2 3 A
3 C 3 2 B
4 D 4 4 B
5 E 5 8 A
6 F 6 5 C
7 G 7 6 C
8 H 8 7 A
What makes it a bit tricky (for me) is plotting the values on the Y-axis.
ggplot(df_alluvial, aes(y = value, axis1 = rank_A, axis2 = rank_B))+
geom_alluvium(aes(fill = group), width = 1/12)+
...
As of now, I can plot the rank-change and the groups successfully, but they are not linked to my value-names - there are no axis names and I don't know how to add them.
In the end it should look similar to this:
https://www.reddit.com/r/GraphicalExcellence/comments/4imh5f/alluvial_diagram_population_size_and_rank_of_uk/
Thanks for your advice!
Your update made the question more clear to me.
The y parameter should be a numerical value, and the data should be in 'long' format. I'm not sure how to change your data to fulfill these requirements. Therefore, I create some new data in this example. I have tried to make the data similar to the data in the plot that you have linked to.
Labels and stratum refer to the city-names. You can use geom_text to label the strata.
# Load libraries
library(tidyverse)
library(ggalluvial)
# Create some data
df_alluvial <- tibble(
city = rep(c("London", "Birmingham", "Manchester"), 4),
year = rep(c(1901, 1911, 1921, 1931), each = 3),
size = c(0, 10, 100, 10, 15, 100, 15, 20, 100, 30, 25, 100))
# Notice the data is in long-format
df_alluvial
#> # A tibble: 12 x 3
#> city year size
#> <chr> <dbl> <dbl>
#> 1 London 1901 0
#> 2 Birmingham 1901 10
#> 3 Manchester 1901 100
#> 4 London 1911 10
#> 5 Birmingham 1911 15
#> 6 Manchester 1911 100
#> 7 London 1921 15
#> 8 Birmingham 1921 20
#> 9 Manchester 1921 100
#> 10 London 1931 30
#> 11 Birmingham 1931 25
#> 12 Manchester 1931 100
ggplot(df_alluvial,
aes(x = as.factor(year), stratum = city, alluvium = city,
y = size,
fill = city, label = city))+
geom_stratum(alpha = .5)+
geom_alluvium()+
geom_text(stat = "stratum", size = 3)
If you want to sort the cities based on their size, you can add decreasing = TRUE to all layers in the plot.
ggplot(df_alluvial,
aes(x = as.factor(year), stratum = city, alluvium = city,
y = size,
fill = city, label = city))+
geom_stratum(alpha = .5, decreasing = TRUE)+
geom_alluvium(decreasing = TRUE)+
geom_text(stat = "stratum", size = 3, decreasing = TRUE)
Created on 2019-11-08 by the reprex package (v0.3.0)
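The answer above builds new data rather than reshaping the asker's table. As a rough sketch of the "long format" part only, the wide table from the question could be stacked in base R as shown below. The names df_wide, df_long, timestamp and rank are assumptions made for this example, and which numeric column should serve as y in the plot is left open, as in the answer.

```r
# The asker's table, typed in from the question
df_wide <- data.frame(
  value  = c("A", "B", "C", "D", "E", "F", "G", "H"),
  rank_A = c(1, 2, 3, 4, 5, 6, 7, 8),
  rank_B = c(1, 3, 2, 4, 8, 5, 6, 7),
  group  = c("A", "A", "B", "B", "A", "C", "C", "A"),
  stringsAsFactors = FALSE
)

# Stack the two rank columns: one row per value and timestamp
df_long <- data.frame(
  value     = rep(df_wide$value, times = 2),
  group     = rep(df_wide$group, times = 2),
  timestamp = rep(c("rank_A", "rank_B"), each = nrow(df_wide)),
  rank      = c(df_wide$rank_A, df_wide$rank_B),
  stringsAsFactors = FALSE
)

print(head(df_long))
```

In this layout, timestamp would play the role of year and value the role of city in the plotting code above; tidyr::pivot_longer() would be the tidyverse equivalent of the manual stacking.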
What I am trying to do is this: if a team on the x axis is below the Average bar (17.62 in this case), color it one color, and if it is above, color it another. So the Average bar is black, and the bars above and below it get different colors.
   Road.Team Goalie.Pts.at.Home.Vs.Road.TEAM
1 ANA 16.67692
2 ARI 23.20000
3 BOS 18.86667
4 BUF 23.14667
5 CAR 23.44615
6 CBJ 16.62857
7 CGY 17.33333
8 CHI 21.77143
9 COL 18.70769
10 DAL 18.70000
11 DET 20.32000
12 EDM 16.94118
13 FLA 23.68000
14 LAK 12.14118
15 MIN 17.49333
16 MTL 21.71429
17 NJD 11.64444
18 NSH 17.12000
19 NYI 15.55556
20 NYR 18.04444
21 OTT 13.31429
22 PHI 21.25714
23 PIT 20.28235
24 SJS 14.45714
25 STL 15.80000
26 TBL 12.21333
27 TOR 10.44444
28 VAN 14.40000
29 VGK 16.35000
30 WPG 19.41333
31 WSH 15.26154
32 Average 17.62340
rm(stats)
stats <- read.csv("HomeGoaliesPtsforVisitingTeam.csv")
stats
library(tidyverse)
p <- stats %>%
gather(key, value, -Road.Team) %>%
ggplot(aes(x=Road.Team, y=value)) +
geom_col(position = "dodge")
p + labs(y = "FanDuel Goalie Pts",x = "Road Team",title = "Points for Home Goalies Vs Road Team")
Create a logical variable in your stats data frame and pass that to the fill argument in aes(). Something like:
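# TRUE for teams at or below the mean of the points column
stats$compare <- stats$Goalie.Pts.at.Home.Vs.Road.TEAM <= mean(stats$Goalie.Pts.at.Home.Vs.Road.TEAM)
p <- stats %>%
# keep compare out of the gathered columns so it stays available for fill
gather(key, value, -Road.Team, -compare) %>%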
ggplot(aes(x=Road.Team, y=value, fill=compare)) +
geom_col(position = "dodge")
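A logical variable gives two colors, while the question asks for three (black for the Average bar, and one color each for teams above and below it). One possible extension, sketched in base R on four rows taken from the question's data, is a three-level label. The short column name pts and the label strings are assumptions made for this example.

```r
# Four rows from the question's data, with a shortened column name
stats <- data.frame(
  Road.Team = c("ANA", "ARI", "TOR", "Average"),
  pts = c(16.67692, 23.20000, 10.44444, 17.62340),
  stringsAsFactors = FALSE
)

# Value of the "Average" row, used as the reference
avg <- stats$pts[stats$Road.Team == "Average"]

# Three-way label: the Average row itself, then below or above the reference
stats$compare <- ifelse(
  stats$Road.Team == "Average", "Average",
  ifelse(stats$pts < avg, "Below average", "Above average")
)

print(stats)
```

With fill = compare in aes(), the colors could then be assigned by name, for example scale_fill_manual(values = c("Average" = "black", "Below average" = "steelblue", "Above average" = "firebrick")); the two non-black colors here are arbitrary choices.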