How to name different geom_smooth lines in ggplot? - r

I plotted several smooth lines by using different columns in my dataframe. I put those lines into one figure. However, I don't know how to name those lines. Since I do not have "groups" in my dataframe, the legend does not fix the problem.
Sorry that I did not figure out how to upload my data. But I did upload the figure I plotted.
# dfplot is my dataframe
head(dfplot)
fall_t falltsq winter_t wintertsq spring_t springtsq fall_p fallpsq winter_p winterpsq spring_p springpsq ffall_t fwinter_t
1 15.08704 227.6187 1.9648148 3.8604973 14.15000 200.2225 6.12 37.4544 2.83 8.0089 10.27 105.4729 3.303902 3.365150
2 14.67407 215.3284 -0.9666667 0.9344444 13.15000 172.9225 13.89 192.9321 3.21 10.3041 16.02 256.6404 3.043521 3.331537
3 14.13519 199.8035 2.2333333 4.9877778 10.95926 120.1054 7.39 54.6121 6.42 41.2164 17.20 295.8400 3.208130 3.164450
4 15.32963 234.9975 -1.5629630 2.4428532 11.02593 121.5710 11.21 125.6641 4.46 19.8916 13.98 195.4404 2.972689 3.342540
5 14.12222 199.4372 -1.4611111 2.1348457 14.49630 210.1426 10.58 111.9364 11.71 137.1241 12.89 166.1521 3.382247 3.654554
6 13.25926 175.8080 1.0388889 1.0792901 14.82963 219.9179 14.56 211.9936 4.14 17.1396 8.84 78.1456 3.327567 3.323556
fspring_t
1 3.253946
2 3.087533
3 3.485115
4 3.331752
5 3.213873
6 3.033545
p <- ggplot(data = dfplot) +
geom_smooth(mapping = aes(x = fall_t, y = ffall_t), color = "red", se = F) +
geom_smooth(mapping = aes(x = winter_t, y = fwinter_t), color = "blue", se = F) +
geom_smooth(mapping = aes(x = spring_t, y = fspring_t), color = "green", se = F)
p + xlab("Temperature") + ylab("log yields") + theme(legend.position="right")

As noted in the comments, ggplot2 works best when data are in "long" format, rather than "wide".
We can use tidyr::gather to transform your example data:
library(tidyverse) # for dplyr, tidyr, ggplot2
dfplot_long <- dfplot %>%
select(spring_t, fall_t, winter_t, fspring_t, ffall_t, fwinter_t) %>%
gather(Var, Val, 1:3) %>%
gather(Var2, Val2, 1:3) %>%
filter(Var == gsub("^f", "", Var2)) %>%
mutate(season = gsub("_t", "", Var))
dfplot
Var Val Var2 Val2 season
1 spring_t 14.1500000 fspring_t 3.253946 spring
2 spring_t 13.1500000 fspring_t 3.087533 spring
3 spring_t 10.9592600 fspring_t 3.485115 spring
4 spring_t 11.0259300 fspring_t 3.331752 spring
5 spring_t 14.4963000 fspring_t 3.213873 spring
6 spring_t 14.8296300 fspring_t 3.033545 spring
7 fall_t 15.0870400 ffall_t 3.303902 fall
8 fall_t 14.6740700 ffall_t 3.043521 fall
9 fall_t 14.1351900 ffall_t 3.208130 fall
10 fall_t 15.3296300 ffall_t 2.972689 fall
11 fall_t 14.1222200 ffall_t 3.382247 fall
12 fall_t 13.2592600 ffall_t 3.327567 fall
13 winter_t 1.9648148 fwinter_t 3.365150 winter
14 winter_t -0.9666667 fwinter_t 3.331537 winter
15 winter_t 2.2333333 fwinter_t 3.164450 winter
16 winter_t -1.5629630 fwinter_t 3.342540 winter
17 winter_t -1.4611111 fwinter_t 3.654554 winter
18 winter_t 1.0388889 fwinter_t 3.323556 winter
And then:
dfplot_long %>%
ggplot(aes(Val, Val2)) +
geom_smooth(aes(color = season), se = FALSE)
Result:

Related

Get the proportions in ggplot2 (R) bar charts

Can someone provide me some hints as to what I am doing wrong in my code? Or what I need to correct to get the correct percentages? I am trying to get the proportions by manipulating my ggplot2 code. I would prefer not mutating a column. However, if I can't get ggplot2 to give me the correct proportions, I will then be open to adding columns.
Here is the reproduceable data:
cat_type<-c("1", "1","2","3","1","3", "3","2","1","1","1","3","3","2","3","2","3","1","3","3","3","1","3","1","3","1","1","3","1")
country<-c("India","India","India","India","India","India","India","India","India","India","Indonesia","Russia","Indonesia","Russia","Russia","Indonesia","Indonesia","Indonesia","Indonesia","Russia","Indonesia","Russia","Indonesia","Indonesia","Russia", "Russia", "India","India","India")
bigcats<-data.frame(cat_type=cat_type,country=country)
My data gives me the following proportions (these are correct):
> table(bigcats$cat_type, bigcats$country) ## raw numbers
India Indonesia Russia
1 7 3 2
2 2 1 1
3 4 5 4
>
> 100*round(prop.table(table(bigcats$cat_type, bigcats$country),2),3) ## proportions by column total
India Indonesia Russia
1 53.8 33.3 28.6
2 15.4 11.1 14.3
3 30.8 55.6 57.1
However, my ggplot2 is giving me the incorrect proportions:
bigcats %>% ggplot(aes(x=country, y = prop.table(stat(count)), fill=cat_type, label = scales::percent(prop.table(stat(count)))))+
geom_bar(position = position_fill())+
geom_text(stat = "count", position = position_fill(vjust=0.5),colour = "white", size = 5)+
labs(y="Percent",title="Top Big Cat Populations",x="Country")+
scale_fill_discrete(name=NULL,labels=c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar"))+
scale_y_continuous(labels = scales::percent)
The issue is that using prop.table(stat(count)) will not compute the proportions by categories or your countries, i.e. you do:
library(dplyr)
bigcats %>%
count(cat_type, country) %>%
mutate(pct = scales::percent(prop.table(n)))
#> cat_type country n pct
#> 1 1 India 7 24.1%
#> 2 1 Indonesia 3 10.3%
#> 3 1 Russia 2 6.9%
#> 4 2 India 2 6.9%
#> 5 2 Indonesia 1 3.4%
#> 6 2 Russia 1 3.4%
#> 7 3 India 4 13.8%
#> 8 3 Indonesia 5 17.2%
#> 9 3 Russia 4 13.8%
Making use of a helper function to reduce code duplication you could compute your desired proportions like so:
library(ggplot2)
prop <- function(count, group) {
count / tapply(count, group, sum)[group]
}
ggplot(bigcats, aes(
x = country, y = prop(after_stat(count), after_stat(x)),
fill = cat_type, label = scales::percent(prop(after_stat(count), after_stat(x)))
)) +
geom_bar(position = position_fill()) +
geom_text(stat = "count", position = position_fill(vjust = 0.5), colour = "white", size = 5) +
labs(y = "Percent", title = "Top Big Cat Populations", x = "Country") +
scale_fill_discrete(name = NULL, labels = c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar")) +
scale_y_continuous(labels = scales::percent)
Created on 2021-07-28 by the reprex package (v2.0.0)

Alluvial Charts - Flow is not showing

I have a table similar to this one:
Organization Timeframe Code id
1 Agencia1 Fortnight 1 International Affairs 1
2 Agencia2 Fortnight 1 Environment 2
3 Agencia2 Fortnight 1 Health 4
4 Agencia2 Fortnight 1 Public Policy 5
5 Agencia1 Fortnight 2 Politics 6
6 Agencia2 Fortnight 2 Disaster 7
7 Agencia1 Fortnight 2 Public Policy 8
8 Agencia1 Fortnight 2 Federal Government 9
9 Agencia1 Fortnight 2 Business 10
10 Agencia1 Fortnight 3 Federal Government 11
11 Agencia2 Fortnight 3 Dissemination - COVID19 12
12 Agencia1 Fortnight 3 Transparency - COVID19 13
13 Agencia2 Fortnight 3 Economy - COVID19 14
14 Agencia1 Fortnight 3 Prevention - COVID19 15
15 Agencia1 Fortnight 4 Economy 16
16 Agencia1 Fortnight 4 Media 17
17 Agencia1 Fortnight 4 Leisure 18
18 Agencia1 Fortnight 4 Politics 19
19 Agencia1 Fortnight 4 Prevention - COVID19 20
20 Agencia1 Fortnight 5 Prevention - COVID19 21
I would like to build an alluvial chart that could highlight the different topics covered by each organization during the fortnight. I managed to create a chart like this one but the flow isn't work.
So far, what I have done was it:
alluvial_data <- as.data.frame(FC_Outlets %>%select(Organization, Timeframe, Code))
alluvial_data <- alluvial_data %>% mutate(id = row_number())
#Remove duplicates
alluvial_data <- alluvial_data %>%
distinct(Organization, Timeframe, Code, .keep_all = TRUE)
# Convert Timeframe to Factor - Categorical Variable
alluvial_data$Timeframe <-as.factor(alluvial_data$Timeframe)
# Convert Code to String
alluvial_data$Code <-as.character(alluvial_data$Code)
library(RColorBrewer)
# Define the number of colors you want
nb.cols <- 10
mycolors <- colorRampPalette(brewer.pal(8, "Set2"))(nb.cols)
# Chart
ggplot(alluvial_data,
aes(x = Timeframe, stratum = Code, alluvium = id,
fill = Code, label = Code)) +
#scale_fill_brewer(type = "qual", palette = "Set2") +
scale_fill_manual(values = mycolors) +
geom_flow(stat = "alluvium", lode.guidance = "frontback",
color = "darkgray") +
geom_stratum() +
theme(legend.position = "bottom") +
ggtitle("Organizations")
Could you help me to identify why the alluvial chart is not working properly?
It is due to incorrect usage of aes in ggplot. The following code
c <- c(LETTERS[1:4], LETTERS[2:6], LETTERS[3:7], LETTERS[3:8])
t <- c(rep("Fortnight 1",4), rep("Fortnight 2",5), rep("Fortnight 3",5), rep("Fortnight 4",6))
s <- c(rep(c("Female","Male"),10))
ag <- c(2,3,4,6,11,13)
f <- rnorm(20,20,99)
df <- data.frame(Timeframe=t,Code=c,Sex=s,Freq=round(abs(f))) %>% mutate(Organization=ifelse((row_number() %in% ag), "Agencia2","Agencia1" ))
alluvial_data <- as.data.frame(df %>%select(Organization, Timeframe, Code, Freq, Sex))
alluvial_data <- alluvial_data %>% mutate(id = row_number())
#Remove duplicates
alluvial_data <- alluvial_data %>%
distinct(Organization, Timeframe, Code, Sex, .keep_all = TRUE)
#levels(alluvial_data$Timeframe)
# Convert Timeframe to Factor - Categorical Variable
alluvial_data$Timeframe <-as.factor(alluvial_data$Timeframe)
# Convert Code to String
alluvial_data$Code <-as.character(alluvial_data$Code)
library(RColorBrewer)
# Define the number of colors you want
nb.cols <- 10
mycolors <- colorRampPalette(brewer.pal(8, "Set2"))(nb.cols)
mycolor2 <- colorRampPalette(brewer.pal(2, "Set2"))(nb.cols)
# Chart
ggplot(alluvial_data,
aes(y = Freq, axis1 = Organization, axis2 = Timeframe, axis3 = Code,fill=Sex)) +
#scale_fill_brewer(type = "qual", palette = "Set2") +
scale_x_discrete(limits=c("Organization","Timeframe","Code"), expand=c(0.05,0.05)) +
scale_fill_manual(values = mycolors) +
geom_flow(stat = "alluvium", lode.guidance = "frontback" #, color="grey"
) +
geom_stratum(width = 1/4, fill = "cyan", color = "grey") +
geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
theme(legend.position = "bottom") +
ggtitle("Organizations") +
guides(fill=guide_legend(override.aes = list(color=mycolors[1:2])))+
labs(fill=NULL)
gives this output:

Organize scale of x axis of time series graph

Here I have data that looks like this:
# Data
df <- data.frame("Hospital" = c("Buge Hospital", "Buge Hospital", "Greta Hospital", "Greta Hospital",
"Makor Hospital", "Makor Hospital"),
"Period" = c("Jul-18","Aug-18", "Jul-19","Aug-19", "Jul-20","Aug-20"),
"Medical admissions" = c(12,56,0,40,5,56),
"Surgical admissions" = c(10,2,0,50,20,56),
"Inpatient admissions" = c(9,5,6,0,60,96))
Now this data has a column called period which is monthy data for different years, 2018,2019 and 2020
if I plot this data, here is how it looks
library(ggplot2
# Melt data into long format
df2 <- melt(data = df,
id.vars = c("Hospital","Period"),
measure.vars = names(df[3:5]))
# Stacked barplot
ggplot( df2, aes(x = Period, y = value, fill = variable, group = variable)) +
geom_bar(stat = "identity") +
theme(legend.position = "none") +
ggtitle(unique(df2$Hospital))+
scale_x_date(date_labels = %Y)+
labs(x = "Month", y = "Number of People", fill = "Type")
It plots well but the x axis is not organized in ascending order, I have tried to use scale_x_date function but still the plot is the same. What I want is months for the year 2018 to start, then followed with months for 2019 and 2020. I mean x axis to be organized in ascending order based on years like this
Aug-18, Jul-18, Aug-19,Jul-19, Aug-20,Jul-20.
To solve your issue, you need to convert your Period in a date format.
For example, you can use parse_date function from lubridate package:
library(lubridate)
library(tidyr)
library(dplyr)
df %>% mutate(Date = parse_date(as.character(Period), format = "%b-%y")) %>%
pivot_longer(cols = Medical.admissions:Inpatient.admissions, names_to = "Var", values_to = "Val")
# A tibble: 18 x 5
Hospital Period Date Var Val
<fct> <fct> <date> <chr> <dbl>
1 Buge Hospital Jul-18 2018-07-01 Medical.admissions 12
2 Buge Hospital Jul-18 2018-07-01 Surgical.admissions 10
3 Buge Hospital Jul-18 2018-07-01 Inpatient.admissions 9
4 Buge Hospital Aug-18 2018-08-01 Medical.admissions 56
5 Buge Hospital Aug-18 2018-08-01 Surgical.admissions 2
6 Buge Hospital Aug-18 2018-08-01 Inpatient.admissions 5
7 Greta Hospital Jul-19 2019-07-01 Medical.admissions 0
8 Greta Hospital Jul-19 2019-07-01 Surgical.admissions 0
9 Greta Hospital Jul-19 2019-07-01 Inpatient.admissions 6
10 Greta Hospital Aug-19 2019-08-01 Medical.admissions 40
11 Greta Hospital Aug-19 2019-08-01 Surgical.admissions 50
12 Greta Hospital Aug-19 2019-08-01 Inpatient.admissions 0
13 Makor Hospital Jul-20 2020-07-01 Medical.admissions 5
14 Makor Hospital Jul-20 2020-07-01 Surgical.admissions 20
15 Makor Hospital Jul-20 2020-07-01 Inpatient.admissions 60
16 Makor Hospital Aug-20 2020-08-01 Medical.admissions 56
17 Makor Hospital Aug-20 2020-08-01 Surgical.admissions 56
18 Makor Hospital Aug-20 2020-08-01 Inpatient.admissions 96
So, then, you can use scale_x_date to set appropriate labeling option on your x axis:
library(lubridate)
library(tidyr)
library(dplyr)
library(ggplot2)
df %>% mutate(Date = parse_date(as.character(Period), format = "%b-%y")) %>%
pivot_longer(cols = Medical.admissions:Inpatient.admissions, names_to = "Var", values_to = "Val") %>%
ggplot(aes(x = Date, y = Val, fill= Var, group = Var))+
geom_col()+
scale_x_date(date_breaks = "month", date_labels = "%b %Y")+
labs(x = "Month", y = "Number of People", fill = "Type")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Does it answer your question ?
EDIT: Using `lubridate v1.7.8
On lubridate version 1.7.8, parse_date does not exist anymore. You will have to replace it by parse_date_time as follow:
library(lubridate)
library(dplyr)
df %>% mutate(Date = ymd(parse_date_time2(as.character(Period), orders = "%b-%y"))) %>% ....

ggplot color bar for less than and great than Average

What I am trying to write is, if the Team on the X axis is less than Average on the X axis or in this case 17.62, color it different and same for Greater than. So Average on the X axis is black and then the ones above and below are different colors.
dput()Road.Team Goalie.Pts.at.Home.Vs.Road.TEAM
1 ANA 16.67692
2 ARI 23.20000
3 BOS 18.86667
4 BUF 23.14667
5 CAR 23.44615
6 CBJ 16.62857
7 CGY 17.33333
8 CHI 21.77143
9 COL 18.70769
10 DAL 18.70000
11 DET 20.32000
12 EDM 16.94118
13 FLA 23.68000
14 LAK 12.14118
15 MIN 17.49333
16 MTL 21.71429
17 NJD 11.64444
18 NSH 17.12000
19 NYI 15.55556
20 NYR 18.04444
21 OTT 13.31429
22 PHI 21.25714
23 PIT 20.28235
24 SJS 14.45714
25 STL 15.80000
26 TBL 12.21333
27 TOR 10.44444
28 VAN 14.40000
29 VGK 16.35000
30 WPG 19.41333
31 WSH 15.26154
32 Average 17.62340
rm(stats)
stats <- read.csv("HomeGoaliesPtsforVisitingTeam.csv")
stats
library(tidyverse)
p <- stats %>%
gather(key, value, -Road.Team) %>%
ggplot(aes(x=Road.Team, y=value)) +
geom_col(position = "dodge")
p + labs(y = "FanDuel Goalie Pts",x = "Road Team",title = "Points for Home Goalies Vs Road Team")
Create a logical variable in your stats data frame and pass that to the fill argument in aes(). Something like:
stats$compare <- stats$y <= mean(stats$y)
p <- stats %>%
gather(key, value, -Road.Team) %>%
ggplot(aes(x=Road.Team, y=value, fill=compare)) +
geom_col(position = "dodge")

Ordering a 2 bar plot in R

I have a data set as below and I have created a graph with below code as suggested in a previous question. What I want to do is order the bars by rankings rather than team names. Is that possible to do in ggplot?
Team Names PLRankingsReverse Grreserve
Liverpool 20 20
Chelsea 19 19
Manchester City 15 18
Arsenal 16 17
Tottenham 18 16
Manchester United 8 15
Everton 10 14
Watford 13 13
Burnley 17 12
Southampton 9 11
WBA 11 10
Stoke 4 9
Bournemouth 12 8
Leicester 7 7
Middlesbrough 14 6
C. Palace 6 5
West Ham 1 4
Hull 3 3
Swansea 5 2
Sunderland 2 1
And here is the code:
alldata <- read.csv("premierleague.csv")
library(ggplot2)
library(reshape2)
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y= value, fill = variable), xlab="Team Names") +
geom_bar(stat="identity", width=.5, position = "dodge")
Thanks for the help!
In this case you need to sort your data frame prior to melting and capture the order. You can then use this to set the limit order on scale_x_discrete, or you can factor Team Name in your aes string.
Using factor:
ordr <- order(alldata$`Team Names`, alldata$PLRankingsReverse, decreasing = TRUE)
alldata <- melt(alldata)
ggplot(alldata, aes(x = factor(`Team Name`, ordr), y = value, fill = variable) +
labs(x = "Team Name") +
geom_bar(stat = "identity", width = .5, position = "dodge")
Using scale_x_discrete:
ordr <- alldata$`Team Name`[order(alldata$PLRankingsReverse, decreasing = TRUE)]
alldata <- melt(alldata)
ggplot(alldata, aes(x = `Team Name`, y = value, fill = variable) +
labs(x = "Team Name") +
geom_bar(stat = "identity", width =. 5, position = "dodge") +
scale_x_discrete(limits = ordr)

Resources