ggplot, scale_colour_manual issues - r

Can anyone spot what is wrong in this code?
a <- c("Afghanistan"="darkgreen","Iraq"="red" ,"Mali"="green", "Nigeria"="purple","Senegal"="orange")
ggplot(data = full) + scale_colour_manual(values=a) +
geom_point(aes(x=Afghanistan_GDPpC, y=Afghanistan_AS), colour = "Afghanistan") +
geom_smooth(aes(x=Afghanistan_GDPpC, y=Afghanistan_AS), colour = "Afghanistan", method = "lm") +
geom_point(aes(x=Iraq_GDPpC, y=Iraq_AS), colour = "Iraq") +
geom_smooth(aes(x=Iraq_GDPpC, y=Iraq_AS), colour = "Iraq", method = "lm") +
geom_point(aes(x=Mali_GDPpC, y=Mali_AS), colour = "Mali") +
geom_smooth(aes(x=Mali_GDPpC, y=Mali_AS), colour = "Mali", method = "lm") +
geom_point(aes(x=Nigeria_GDPpC, y=Nigeria_AS), colour = "Nigeria") +
geom_smooth(aes(x=Nigeria_GDPpC, y=Nigeria_AS), colour = "Nigeria", method = "lm") +
geom_point(aes(x=Senegal_GDPpC, y=Senegal_AS), colour = "Senegal") +
geom_smooth(aes(x=Senegal_GDPpC, y=Senegal_AS), colour = "Senegal", method = "lm") +
labs (x = "Log - GDP per Capita", y = "Log - Asylum Applications - First Time", colour = "Legend") +
theme_classic()
This is the message I keep getting:
Error: Unknown colour name: Afghanistan
Here is the dataset: https://drive.google.com/file/d/1j5I6odeWxaAiJlc7dHtD-Qj42xuP-gMs/view?usp=sharing

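The error occurs because colour = "Afghanistan" is set outside aes(), so ggplot treats it as a literal colour name rather than a value to look up in scale_colour_manual(). More generally, I advise you to look at how ggplot and the "grammar of graphics" work (here for example: https://ramnathv.github.io/pycon2014-r/visualize/ggplot2.html).
So first you need to reshape your data into long format to meet the requirements of ggplot: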
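library(dplyr) # %>% and rename()
library(tidyr) # pivot_longer()
# Split names such as "Iraq_AS" into a country column and one value column per suffix
full <- full %>% pivot_longer(cols = ends_with(c("AS","GDPpC")),
names_to = c("country", ".value"),
names_sep="_") %>%
rename("year" = "X1")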
The resulting tibble:
# A tibble: 50 x 4
year country AS GDPpC
<dbl> <chr> <dbl> <dbl>
1 2011 Mali 8.29 6.73
2 2011 Nigeria 9.32 7.82
3 2011 Senegal 7.54 7.22
4 2011 Afghanistan 9.94 6.38
5 2011 Iraq 9.43 8.71
6 2012 Mali 7.75 6.66
7 2012 Nigeria 8.56 7.91
8 2012 Senegal 7.70 7.18
9 2012 Afghanistan 9.90 6.46
10 2012 Iraq 9.30 8.83
# ... with 40 more rows
Then you can map colour to the country column inside aes(), so that scale_color_manual() can match the names:
ggplot(data = full, mapping = aes(x = GDPpC, y = AS, col = country))+
geom_point()+
scale_color_manual(values = c("Afghanistan"="darkgreen","Iraq"="red" ,"Mali"="green", "Nigeria"="purple","Senegal"="orange"))+
geom_smooth(method = "lm")+
labs (x = "Log - GDP per Capita", y = "Log - Asylum Applications - First Time", colour = "Legend") +
theme_classic()
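If reshaping is not an option, a minimal sketch of the alternative is to keep one layer per country but move the colour label inside aes(), so that it becomes a data value that scale_colour_manual() can look up. The toy data frame below is hypothetical (made-up numbers, only two countries) and merely stands in for the wide-format `full`:

```r
library(ggplot2)

# Hypothetical wide-format stand-in for `full`; the numbers are made up.
toy <- data.frame(
  Afghanistan_GDPpC = c(6.3, 6.4, 6.5),
  Afghanistan_AS    = c(9.9, 9.8, 10.1),
  Iraq_GDPpC        = c(8.7, 8.8, 8.9),
  Iraq_AS           = c(9.4, 9.3, 9.6)
)
pal <- c("Afghanistan" = "darkgreen", "Iraq" = "red")

# colour sits inside aes(), so each string is a value the manual scale
# looks up in `pal` instead of a literal colour name.
p <- ggplot(toy) +
  geom_point(aes(x = Afghanistan_GDPpC, y = Afghanistan_AS, colour = "Afghanistan")) +
  geom_point(aes(x = Iraq_GDPpC, y = Iraq_AS, colour = "Iraq")) +
  scale_colour_manual(values = pal) +
  labs(x = "Log - GDP per Capita",
       y = "Log - Asylum Applications - First Time",
       colour = "Legend")

# Inspect the colours ggplot resolved for each layer.
resolved <- lapply(ggplot_build(p)$data, function(layer) unique(layer$colour))
```

This keeps the legend working, but it still needs one pair of layers per country, which is why the long-format approach above is usually easier to maintain.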

Related

Get the proportions in ggplot2 (R) bar charts

Can someone provide me some hints as to what I am doing wrong in my code? Or what I need to correct to get the correct percentages? I am trying to get the proportions by manipulating my ggplot2 code. I would prefer not mutating a column. However, if I can't get ggplot2 to give me the correct proportions, I will then be open to adding columns.
Here is the reproducible data:
cat_type<-c("1", "1","2","3","1","3", "3","2","1","1","1","3","3","2","3","2","3","1","3","3","3","1","3","1","3","1","1","3","1")
country<-c("India","India","India","India","India","India","India","India","India","India","Indonesia","Russia","Indonesia","Russia","Russia","Indonesia","Indonesia","Indonesia","Indonesia","Russia","Indonesia","Russia","Indonesia","Indonesia","Russia", "Russia", "India","India","India")
bigcats<-data.frame(cat_type=cat_type,country=country)
My data gives me the following proportions (these are correct):
> table(bigcats$cat_type, bigcats$country) ## raw numbers
India Indonesia Russia
1 7 3 2
2 2 1 1
3 4 5 4
>
> 100*round(prop.table(table(bigcats$cat_type, bigcats$country),2),3) ## proportions by column total
India Indonesia Russia
1 53.8 33.3 28.6
2 15.4 11.1 14.3
3 30.8 55.6 57.1
However, my ggplot2 is giving me the incorrect proportions:
bigcats %>% ggplot(aes(x=country, y = prop.table(stat(count)), fill=cat_type, label = scales::percent(prop.table(stat(count)))))+
geom_bar(position = position_fill())+
geom_text(stat = "count", position = position_fill(vjust=0.5),colour = "white", size = 5)+
labs(y="Percent",title="Top Big Cat Populations",x="Country")+
scale_fill_discrete(name=NULL,labels=c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar"))+
scale_y_continuous(labels = scales::percent)
The issue is that prop.table(stat(count)) does not compute the proportions within each country; it divides every count by the grand total, i.e. it is equivalent to:
library(dplyr)
bigcats %>%
count(cat_type, country) %>%
mutate(pct = scales::percent(prop.table(n)))
#> cat_type country n pct
#> 1 1 India 7 24.1%
#> 2 1 Indonesia 3 10.3%
#> 3 1 Russia 2 6.9%
#> 4 2 India 2 6.9%
#> 5 2 Indonesia 1 3.4%
#> 6 2 Russia 1 3.4%
#> 7 3 India 4 13.8%
#> 8 3 Indonesia 5 17.2%
#> 9 3 Russia 4 13.8%
Using a helper function to reduce code duplication, you could compute your desired within-country proportions like so:
library(ggplot2)
prop <- function(count, group) {
count / tapply(count, group, sum)[group]
}
ggplot(bigcats, aes(
x = country, y = prop(after_stat(count), after_stat(x)),
fill = cat_type, label = scales::percent(prop(after_stat(count), after_stat(x)))
)) +
geom_bar(position = position_fill()) +
geom_text(stat = "count", position = position_fill(vjust = 0.5), colour = "white", size = 5) +
labs(y = "Percent", title = "Top Big Cat Populations", x = "Country") +
scale_fill_discrete(name = NULL, labels = c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar")) +
scale_y_continuous(labels = scales::percent)
Created on 2021-07-28 by the reprex package (v2.0.0)
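To see what the prop() helper does outside of ggplot, here is a small base R sketch. It assumes the counts from the table() output in the question and a character grouping column; the as.vector() wrapper is an addition for tidier output, not part of the answer's helper:

```r
# Counts copied from the table() output in the question.
counts <- data.frame(
  cat_type = rep(c("1", "2", "3"), times = 3),
  country  = rep(c("India", "Indonesia", "Russia"), each = 3),
  n        = c(7, 2, 4, 3, 1, 5, 2, 1, 4)
)

prop <- function(count, group) {
  # tapply() gives one total per group; indexing by `group` repeats the
  # matching total for every row, so each count is divided by its own group total.
  as.vector(count / tapply(count, group, sum)[group])
}

counts$pct <- round(100 * prop(counts$n, counts$country), 1)
counts
```

The resulting pct column should agree with the column-wise prop.table() output shown in the question.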

Graph to visualize mean group wise and pareto chart in R language

I have a dataset with the regions of a country, the states in each region, and the sales in each state. I want to visualize the mean sales per region, and also draw a Pareto chart to see which states contribute most to the overall regional sales. How can I do this in R? Please help, as I'm new to R.
#dput for dataset
Region <- c('South','South','South','South','South','Central','Central','Central','North','North','North','North','East','East','East','East','West','West','West','West')
State <- c('TAMIL NADU', 'TELANGANA,'ANDHRA PRADESH','KARNATAKA,'KERALA','MADHYA PRADESH','ORISSA','CHATTISGARH','DELHI','UTTARAKHAND','HARYANA','PUNJAB','ASSAM','MIZORAM','WB','BIHAR','GUJARAT','RAJASTHAN','MAHARASHTRA','GOA')
sales <- C(89,109,92,56,43,103,26,41,126,56,64,98,26,16,61,40,61,101,191,38)
The dataset looks somewhat like this:
Region   State           Gdp
South    Tamil Nadu      89
South    Telangana       109
South    Karnataka       92
South    Andhra Pradesh  56
South    Kerala          43
Central  Madhya Pradesh  103
Central  Chattisgarh     26
Central  Orissa          41
North    Delhi           126
North    Punjab          56
North    Haryana         64
North    Uttarakhand     98
East     Assam           26
East     Mizoram         16
East     West Bengal     61
East     Bihar           40
West     Gujarat         61
West     Rajasthan       101
West     Maharashtra     191
West     Goa             38
You did not provide a desired output, so here is my guess at it.
library(data.table)
library(ggplot2)
# setDT(DT) #not needed if your data is already in data.table format
# Order decreasing Gdp
setorder(DT, -Gdp)
# Data wrangling
DT[, `:=`(meanGdp_region = mean(Gdp),
cumGdp = cumsum(Gdp)), by = Region]
DT[, State_f := factor(State, levels = State)]
# Plot
ggplot(data = DT, aes(x = State_f)) +
geom_col(aes(y = Gdp)) +
geom_line(aes(y = cumGdp, group = 1), color = "red") +
geom_hline(aes(yintercept = meanGdp_region), color = "blue") +
facet_wrap(~Region, nrow = 1, scales = "free_x") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
labs(x = "")
Sample data used (comma-separated here so that state names containing spaces parse correctly):
# Sample data
DT <- fread("Region,State,Gdp
South,Tamil Nadu,89
South,Telangana,109
South,Karnataka,92
South,Andhra Pradesh,56
South,Kerala,43
Central,Madhya Pradesh,103
Central,Chattisgarh,26
Central,Orissa,41
North,Delhi,126
North,Punjab,56
North,Haryana,64
North,Uttarakhand,98
East,Assam,26
East,Mizoram,16
East,West Bengal,61
East,Bihar,40
West,Gujarat,61
West,Rajasthan,101
West,Maharashtra,191
West,Goa,38")
Another output guess:
Region <- c('South','South','South','South','South','Central','Central','Central','North','North','North','North','East','East','East','East','West','West','West','West')
State <- c('TAMIL NADU', 'TELANGANA','ANDHRA PRADESH','KARNATAKA','KERALA','MADHYA PRADESH','ORISSA','CHATTISGARH','DELHI','UTTARAKHAND','HARYANA','PUNJAB','ASSAM','MIZORAM','WB','BIHAR','GUJARAT','RAJASTHAN','MAHARASHTRA','GOA')
sales <- c(89,109,92,56,43,103,26,41,126,56,64,98,26,16,61,40,61,101,191,38)
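library(dplyr)   # %>%, arrange(), mutate()
library(forcats) # fct_inorder()
library(ggplot2)
df <- data.frame(Region, State, sales)
# Sort by sales, then fix the factor levels in that sorted order
df2 <- df %>%
arrange(desc(sales)) %>%
mutate(cumulative = cumsum(sales)) %>%
mutate(State = fct_inorder(State))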
ggplot(df2, aes(x=State)) +
geom_bar(aes(y=sales), fill='blue', stat="identity") +
geom_point(aes(y=cumulative), color = rgb(0, 1, 0), pch=16, size=1) +
geom_path(aes(y=cumulative, group=1), colour="slateblue1", lty=3, size=0.9) +
theme(axis.text.x = element_text(angle=90, vjust=0.6)) +
labs(title = "Pareto Plot", x = 'State', y = 'Count')
It's great that you want to explore R. I found a few mistakes: the vectors as posted will not work because some closing quotes (') are missing, and you should use c() instead of C(). In the code below I colour the bars by Region, which differs from the previous answer; I hope you can choose what works for you.
library(ggplot2)
Region <- c('South','South','South','South','South','Central','Central','Central','North','North','North','North','East','East','East','East','West','West','West','West')
State <- c('TAMIL NADU', 'TELANGANA','ANDHRA PRADESH','KARNATAKA','KERALA','MADHYA PRADESH','ORISSA','CHATTISGARH','DELHI','UTTARAKHAND','HARYANA','PUNJAB','ASSAM','MIZORAM','WB','BIHAR','GUJARAT','RAJASTHAN','MAHARASHTRA','GOA')
sales <- c(89,109,92,56,43,103,26,41,126,56,64,98,26,16,61,40,61,101,191,38)
myDf <- data.frame(Region, State, sales, stringsAsFactors = FALSE)
str(myDf)
myDf <- myDf[order(myDf$sales, decreasing=TRUE), ]
myDf$State <- factor(myDf$State , levels=myDf$State)
myDf$cumulative <- cumsum(myDf$sales)
ggplot(myDf, aes(x = State)) +
geom_bar(aes(y = sales, fill = Region), stat = "identity") +
geom_point(aes(y = cumulative), color = rgb(0, 1, 0), pch = 16, size = 1) +
geom_path(aes(y = cumulative, group = 1), colour = "slateblue1", lty = 3, size = 0.9) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.6)) +
labs(title = "Pareto Plot", x = 'States', y = 'Sales')
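The answers above plot raw cumulative sales. If what is wanted is each state's contribution to its own region, a base R sketch along the following lines may help. It uses only the East and West rows from the question's vectors to stay short, and the column names region_mean, share and cum_share are made up for this illustration:

```r
# Two regions copied from the question's vectors, to keep the sketch short.
sales_df <- data.frame(
  Region = rep(c("East", "West"), each = 4),
  State  = c("ASSAM", "MIZORAM", "WB", "BIHAR",
             "GUJARAT", "RAJASTHAN", "MAHARASHTRA", "GOA"),
  sales  = c(26, 16, 61, 40, 61, 101, 191, 38)
)

# Mean sales per region, repeated on every row of that region.
sales_df$region_mean <- ave(sales_df$sales, sales_df$Region, FUN = mean)

# Each state's share of its region's total sales.
sales_df$share <- sales_df$sales / ave(sales_df$sales, sales_df$Region, FUN = sum)

# Pareto ordering: largest contributor first within each region,
# then a running (cumulative) share per region.
sales_df <- sales_df[order(sales_df$Region, -sales_df$share), ]
sales_df$cum_share <- ave(sales_df$share, sales_df$Region, FUN = cumsum)
sales_df
```

The cum_share column could then be drawn with geom_line() on top of the bars, in the same way as the cumulative columns in the answers above.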

Geom_Text removes geom_bar in ggplot2 R

I have the following R code for a stacked bar chart:
p <- ggplot(df, aes(x = Datum, y = anzahl_tn, fill = ZUSAMMENFASSUNG))+
geom_bar(stat="identity", color='black') +
#geom_text(aes(label = paste0(round(100*percolumn),"%"), y = pos),size = 3)+
scale_fill_manual(values = c("black","#DD1E0D","#003087","#6fa554","#7DABFF","#d6d6d6"))
ggplotly(p)
The result is the expected stacked bar chart.
I would like to add labels via geom_text. However, as soon as I do so, the bars vanish:
p <- ggplot(df, aes(x = Datum, y = anzahl_tn, fill = ZUSAMMENFASSUNG))+
geom_bar(stat="identity", color='black') +
geom_text(aes(label = paste0(round(100*percolumn),"%"), y = pos),size = 3)+
scale_fill_manual(values = c("black","#DD1E0D","#003087","#6fa554","#7DABFF","#d6d6d6"))
ggplotly(p)
Data:
Datum ZUSAMMENFASSUNG anzahl_tn percolumn pos
<date> <fct> <dbl> <dbl> <dbl>
1 2020-10-01 A 9548 0.258 2745326
2 2020-10-01 B 8213 0.222 2040286.
3 2020-10-01 C 5887 0.159 1404390.
4 2020-10-01 D 4192 0.113 932105
5 2020-10-01 E 5043 0.136 525418.
6 2020-10-01 F 4106 0.111 194945
7 2020-11-01 A 10603 0.267 3082634.
8 2020-11-01 B 9235 0.233 2099054.
9 2020-11-01 C 6108 0.154 1452656
10 2020-11-01 D 4380 0.110 1009419
Any idea what causes this?
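One possible explanation, judging only from the data shown: the pos values are in the millions while anzahl_tn is in the thousands, so mapping y = pos for the labels stretches the y axis until the bars are too small to see. A hedged sketch of one way around this is to drop pos and let position_stack() place the labels. It uses only the six rows for 2020-10-01 and has not been checked against ggplotly():

```r
library(ggplot2)

# The six rows for 2020-10-01 from the data shown in the question.
df <- data.frame(
  Datum = as.Date("2020-10-01"),
  ZUSAMMENFASSUNG = factor(c("A", "B", "C", "D", "E", "F")),
  anzahl_tn = c(9548, 8213, 5887, 4192, 5043, 4106),
  percolumn = c(0.258, 0.222, 0.159, 0.113, 0.136, 0.111)
)

p <- ggplot(df, aes(x = Datum, y = anzahl_tn, fill = ZUSAMMENFASSUNG)) +
  geom_col(color = "black") +
  # The labels inherit y = anzahl_tn and are stacked like the bars,
  # so each one lands in the middle of its own segment.
  geom_text(aes(label = paste0(round(100 * percolumn), "%")),
            position = position_stack(vjust = 0.5), size = 3)

# y positions ggplot computed for the labels (second layer).
label_y <- ggplot_build(p)$data[[2]]$y
```

If the precomputed pos column is needed for other reasons, it would be worth checking that its cumulative sum is taken within each Datum rather than over the whole column.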

in R, ggplot geom_point() with colors based on specific, discrete values - part 2

My question is similar to this one, except that my data are different. In my case, I was not able to use the solution given. I would expect points to show up on my map coloured according to the cut() values. Could someone point me in the right direction?
> test
# A tibble: 10 × 5
TC1 TC2 Lat Long Country
<dbl> <dbl> <dbl> <dbl> <fctr>
1 2.9 2678.0 50.62980 -95.60953 Canada
2 1775.7 5639.9 -31.81889 123.19389 Australia
3 4.4 5685.6 -10.10449 38.54364 Tanzania
4 7.9 NA 54.81822 -99.91685 Canada
5 11.2 2443.0 7.71667 -7.91667 Cote d'Ivoire
6 112.1 4233.4 -17.35093 128.02609 Australia
7 4.4 114.6 45.21361 -67.31583 Canada
8 8303.5 4499.9 46.63626 -81.39866 Canada
9 100334.8 2404.5 46.67291 -93.11937 USA
10 NA 1422.9 -17.32921 31.28224 Zimbabwe
ggplot(data = test, aes(x= Long, y= Lat)) +
borders("world", fill="gray75", colour="gray75", ylim = c(-60, 60)) +
geom_point(aes(size=TC2, col=cut(TC1, c(-Inf, 1000, 5000, 50000, Inf)))) +
# scale_colour_gradient(limits=c(100, 1000000), low="yellow", high="red") +
scale_color_manual(name = "TC1",
values = c("(-Inf,1000]" = "green",
"(1000,5000]" = "yellow",
"(5000,50000]" = "orange",
"(50000, Inf]" = "red"),
labels = c("up to 1", "1 to 5", "5 to 50", "greater than 50")) +
theme(legend.position = "right") +
coord_quickmap()
Warning message:
Removed 10 rows containing missing values (geom_point).
You were almost there! It's just the names of the 'cut' factors that are incorrect. If you try:
cut(test$TC1, c(-Inf, 1000, 5000, 50000, Inf))
# [1] (-Inf,1e+03] (1e+03,5e+03] (-Inf,1e+03] (-Inf,1e+03] (-Inf,1e+03]
# [6] (-Inf,1e+03] (-Inf,1e+03] (5e+03,5e+04] (5e+04, Inf] <NA>
# Levels: (-Inf,1e+03] (1e+03,5e+03] (5e+03,5e+04] (5e+04, Inf]
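As you can see, the names of the levels differ from what you typed: by default cut() formats the break values with three significant digits (dig.lab = 3), so 1000 becomes 1e+03. Use the actual level names in values: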
library(ggplot2)
ggplot(data = test, aes(x = Long, y = Lat)) +
borders("world", fill="gray75", colour="gray75", ylim = c(-60, 60)) +
geom_point(aes(size=TC2, color = cut(TC1, c(-Inf, 1000, 5000, 50000, Inf)))) +
scale_color_manual(name = "TC1",
values = c("(-Inf,1e+03]" = "green",
"(1e+03,5e+03]" = "yellow",
"(5e+03,5e+04]" = "orange",
"(5e+04, Inf]" = "red"),
labels = c("up to 1", "1 to 5", "5 to 50", "greater than 50")) +
theme(legend.position = "right") +
coord_quickmap()
#> Warning: Removed 2 rows containing missing values (geom_point).
Data:
test <- read.table(text = 'TC1 TC2 Lat Long Country
1 2.9 2678.0 50.62980 -95.60953 Canada
2 1775.7 5639.9 -31.81889 123.19389 Australia
3 4.4 5685.6 -10.10449 38.54364 Tanzania
4 7.9 NA 54.81822 -99.91685 Canada
5 11.2 2443.0 7.71667 -7.91667 "Cote d\'Ivoire"
6 112.1 4233.4 -17.35093 128.02609 Australia
7 4.4 114.6 45.21361 -67.31583 Canada
8 8303.5 4499.9 46.63626 -81.39866 Canada
9 100334.8 2404.5 46.67291 -93.11937 USA
10 NA 1422.9 -17.32921 31.28224 Zimbabwe', header = T)
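An alternative that avoids depending on how cut() formats its breaks is to pass your own labels to cut() and name the palette with the same strings. A minimal base R sketch, using the TC1 values from the question; the names bin_labels and pal are made up for this illustration:

```r
# TC1 values from the question's data.
tc1 <- c(2.9, 1775.7, 4.4, 7.9, 11.2, 112.1, 4.4, 8303.5, 100334.8, NA)

breaks     <- c(-Inf, 1000, 5000, 50000, Inf)
bin_labels <- c("up to 1", "1 to 5", "5 to 50", "greater than 50")

# With explicit labels, the factor levels are exactly the strings supplied here.
binned <- cut(tc1, breaks = breaks, labels = bin_labels)
levels(binned)

# A palette named with the same strings, suitable for
# scale_color_manual(values = pal).
pal <- setNames(c("green", "yellow", "orange", "red"), bin_labels)
```

Because the level names and the palette names come from the same vector, they cannot drift apart, and the separate labels argument in scale_color_manual() is no longer needed.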

Ordering a 2 bar plot in R

I have a data set as below and I have created a graph with below code as suggested in a previous question. What I want to do is order the bars by rankings rather than team names. Is that possible to do in ggplot?
Team Names PLRankingsReverse Grreserve
Liverpool 20 20
Chelsea 19 19
Manchester City 15 18
Arsenal 16 17
Tottenham 18 16
Manchester United 8 15
Everton 10 14
Watford 13 13
Burnley 17 12
Southampton 9 11
WBA 11 10
Stoke 4 9
Bournemouth 12 8
Leicester 7 7
Middlesbrough 14 6
C. Palace 6 5
West Ham 1 4
Hull 3 3
Swansea 5 2
Sunderland 2 1
And here is the code:
alldata <- read.csv("premierleague.csv")
library(ggplot2)
library(reshape2)
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y= value, fill = variable), xlab="Team Names") +
geom_bar(stat="identity", width=.5, position = "dodge")
Thanks for the help!
In this case you need to capture the desired team order from the data frame before melting it. You can then either convert the team name to a factor with that level order inside aes(), or pass the order to scale_x_discrete() as limits. Note that read.csv() turns the "Team Names" header into Team.Names.
Using factor:
# Team names sorted by ranking, highest first
ordr <- alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)]
alldata <- melt(alldata)
ggplot(alldata, aes(x = factor(Team.Names, levels = ordr), y = value, fill = variable)) +
labs(x = "Team Name") +
geom_bar(stat = "identity", width = .5, position = "dodge")
Using scale_x_discrete:
# Team names sorted by ranking, highest first
ordr <- alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)]
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y = value, fill = variable)) +
labs(x = "Team Name") +
geom_bar(stat = "identity", width = .5, position = "dodge") +
scale_x_discrete(limits = ordr)
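The ordering step can be checked on its own before plotting. A small base R sketch, using the first five teams from the question's table and assuming the Team.Names column name that read.csv() would produce:

```r
# First five rows of the question's table.
teams <- data.frame(
  Team.Names = c("Liverpool", "Chelsea", "Manchester City", "Arsenal", "Tottenham"),
  PLRankingsReverse = c(20, 19, 15, 16, 18)
)

# Team names sorted by ranking, highest first.
ordr <- teams$Team.Names[order(teams$PLRankingsReverse, decreasing = TRUE)]

# A factor keeps this level order, and ggplot draws discrete axes in level order.
teams$Team.Names <- factor(teams$Team.Names, levels = ordr)
levels(teams$Team.Names)
```

Without the explicit levels, factor() would sort the names alphabetically, which is the ordering seen in the original plot.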
