How to use case_when for factor names - r

SA = c("Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador", "Paraguay", "Peru", "Uruguay", "Venezuela")
AF1 = gapminder %>%
mutate(
country,
continent == case_when(
country == SA ~ "South America",
TRUE ~ as.character(continent)
)
)
I am trying to relabel the continent as "South America" for the countries listed in SA, but it does not work.

I think I understand what you're looking for. I'm not sure why 'country' is in the mutate, because you're not actually changing it. For the continent, you want to assign with = rather than compare with ==, and to test whether each country is in SA (%in%), not whether it is equal to SA (==). Does this work?
SA = c("Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador", "Paraguay", "Peru", "Uruguay", "Venezuela")
AF1 = gapminder %>%
mutate(
continent = case_when(
country %in% SA ~ "South America",
TRUE ~ as.character(continent) # continent is a factor; convert it so both branches return character
)
)
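To see why == fails here, a minimal sketch with made-up, shortened vectors (the values are illustrative and not taken from gapminder): == compares position by position and recycles SA, whereas %in% tests membership.

```r
# Hypothetical, shortened vectors for illustration only
SA <- c("Argentina", "Bolivia", "Brazil")
country <- c("Brazil", "Argentina", "Bolivia", "France", "Brazil", "Argentina")

# `==` recycles SA along country and compares position by position
eq <- country == SA

# `%in%` asks, for each country, whether it occurs anywhere in SA
inn <- country %in% SA

print(eq)
print(inn)
```

In this toy case every positional comparison misses, so == flags nothing, while %in% flags all five South American entries.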

Related

How to aggregate data by time series and create a square matrix?

I am trying to plot a dendrogram based on time series. It should be built from a 27x27 matrix (country names in columns), and I am only allowed to use base R, dplyr and ggplot2. I first aggregated the data by mean, but it turned out that the plot below was not produced that way. I'm struggling to understand how else I am supposed to obtain one value for each country.
I should obtain a plot like this:
[image: goal dendrogram]
library("eurostat")
library("ggplot2")
library("dplyr")
d <- get_eurostat("prc_hicp_manr") %>%
filter(coicop == "CP00")
d2 <- label_eurostat(d) %>%
filter(time %in% as.Date("2000-02-01"):as.Date("2022-09-30"),
geo %in% c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czechia", "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary",
"Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta", "Netherlands", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia",
"Spain", "Sweden"))
d3 <- d2[,c("geo","values")]
d3
#m <- matrix(?)
dist <- dist(m, method = "minkowski", p = 1.5)
d3_hclust <- hclust(dist, method = "complete")
d3_dend <- as.dendrogram(d3_hclust)
plot(d3_dend)
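One possible reading of the 27x27 requirement, offered as a sketch and not as the intended solution: keep the whole time series for each country (one row per country, one column per month) and let dist() produce the country-by-country distance matrix. The toy data frame below is made up and only mimics the geo/time/values columns.

```r
# Made-up stand-in for the Eurostat data: 3 countries, 4 months
set.seed(1)
toy <- data.frame(
  geo    = rep(c("Austria", "Belgium", "Bulgaria"), each = 4),
  time   = rep(as.Date(c("2000-02-01", "2000-03-01",
                         "2000-04-01", "2000-05-01")), times = 3),
  values = round(runif(12, 0, 5), 1)
)

# Long -> wide with base R: rows are countries, columns are time points
m <- tapply(toy$values, list(toy$geo, as.character(toy$time)), mean)

# Pairwise distances between the country rows (3x3 here, 27x27 with all countries)
d  <- dist(m, method = "minkowski", p = 1.5)
hc <- hclust(d, method = "complete")
plot(as.dendrogram(hc))
```

With this layout no averaging over time is needed, because each country is represented by its full series.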

Removing redundant areas from a map in R (shapefile)

I plotted a few European countries on a map, but there are some outliers which I don't need. I tried to remove them from my spatial df using different ways suggested in similar questions but they didn't work for this case. Could you please give me your ideas on removing them? I appreciate it. The shape file is available here
EDIT: I need to remove these areas not only from the map, but also from the spatial data frame.
library(rgdal)
library(raster)
myCountries <- c("Austria", "Belgium", "Czech Republic", "Denmark", "Estonia", "Finland",
"France", "Germany", "Latvia", "Hungary", "Iceland", "Ireland", "Italy",
"Netherlands", "Norway", "Portugal", "Poland", "Spain", "Sweden", "Switzerland",
"Turkey", "United Kingdom")
countries <- readOGR('ne_110m_admin_0_countries.shp')
eurcountries <- countries[countries$NAME_EN %in% myCountries ,]
eurcountries2<-spTransform(eurcountries, CRS("+proj=longlat +datum=NAD83"))
plot(eurcountries2)
Here is how you can do that with terra (the replacement for raster):
myCountries <- c("Austria", "Belgium", "Czech Republic", "Denmark", "Estonia", "Finland",
"France", "Germany", "Latvia", "Hungary", "Iceland", "Ireland", "Italy",
"Netherlands", "Norway", "Portugal", "Poland", "Spain", "Sweden", "Switzerland",
"Turkey", "United Kingdom")
library(terra)
countries <- vect('ne_110m_admin_0_countries.shp')
eur <- countries[countries$NAME_EN %in% myCountries ,]
e <- ext(c(-28, 48, 35, 76))
x <- crop(eur, e)
plot(x, "NAME_EN")
You can interactively find the extent you need for cropping by doing
plot(eur)
e <- draw()
# now click on the map twice
Or subset interactively, like this:
d <- disagg(eur)
plot(d)
s <- sel(d) # now draw a bounding box on the plot
a <- aggregate(s, "NAME_EN")
plot(a, "NAME_EN")
And you can coerce the SpatVector objects to sp or sf types like this:
sf <- sf::st_as_sf(x)
library(raster)
sp <- as(x, "Spatial")
Or vice versa with:
y <- vect(sf)
Instead of the sp package, I find the sf package works better, as it plays well with ggplot2. Limiting the canvas is then straightforward, and you can also colour the countries.
library(rgdal)
library(ggplot2)
myCountries <- c("Austria", "Belgium", "Czech Republic", "Denmark", "Estonia", "Finland",
"France", "Germany", "Latvia", "Hungary", "Iceland", "Ireland", "Italy",
"Netherlands", "Norway", "Portugal", "Poland", "Spain", "Sweden", "Switzerland",
"Turkey", "United Kingdom")
countries <- readOGR("C:/R/projects/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp")
eurcountries <- countries[countries$NAME_EN %in% myCountries, ]
eurcountries3 <- sf::st_as_sf(eurcountries)
ggplot(eurcountries3) +
geom_sf(aes(fill = ADMIN)) +
lims(x = c(-40, 50), y = c(30, 74)) +
guides(fill = "none") +
theme_void()

Why does the Shiny-app not react to user input

I am trying to build a web app in shiny that would allow for different user input and then plot graphs/output data tables accordingly. I am using WHO's data about suicide rates and there are two possible types of graphs: bar plot and line graph.
The user is given a choice between plotting the graph in which the x axis is either the age group (barplot) or year (line graph). They are also given the choice of plotting the graph separately for males and females and different countries as well.
The code below works fine for everything except when the user chooses x axis = year with gender = 'gender neutral'. The error says that the object rate is not found. However, the block of code which includes the object rate works perfectly fine in other places.
library(shiny)
library(dplyr)
library(ggplot2)
setwd("C:\\Users\\Lenovoi7\\Shrewsbury School\\IT\\Coursework")
who<-data.frame(read.csv("who.csv", stringsAsFactors = TRUE))
dput(head(who))
countries<-sort(unique(who$country))
countries<-union(countries, c("World"))
ui<-fluidPage(
titlePanel("Suicide statistics"),
sidebarLayout(
sidebarPanel(
selectInput(
inputId="x",
label="Please choose the x variable",
choices=c("",
"Age group"="age",
"Year"="year")),
conditionalPanel(
condition = "input.x == 'age' || input.x == 'year'",
selectInput(
inputId = "gender",
label = "Please specify the gender characteristics",
choices = c("", "Gender neutral" = "gender_neutral",
"Gender specific" = "gender_specific"),
selected = NULL),
#nested conditional panel
#only show this panel if the input is gender_specific
conditionalPanel(
condition = "input.gender == 'gender_specific'",
selectInput(
inputId = "country",
label = "Select a country:",
choices = countries,
selected = "Bosnia and Herzegovina")),
conditionalPanel(
condition = "input.gender == 'gender_neutral'",
selectInput(
inputId = "country",
label = "Select a country:",
choices = countries,
selected = "Bosnia and Herzegovina")))),
mainPanel(
plotOutput("graph")
)))
server <- function(input, output) {
x<-reactive({input$x})
gender<-reactive({input$gender})
country<-reactive({input$country})
output$graph <- renderPlot(
#x axis = age group
if (x()=="age"){
if (gender()=="gender_neutral"){
if (country()=="World"){
ggplot(data=who, aes(x=age)) + geom_bar(aes(weights=suicides_no), position="dodge")}
else {
#create a new subset of data that will be used??
who_subset<-subset(who, country == input$country)
ggplot(data=who_subset, aes(x=age)) + geom_bar(aes(weights=suicides_no))}}
else if (gender()=="gender_specific"){
if (country()=="World"){
ggplot(data=who, aes(x=age)) + geom_bar(aes(weights=suicides_no, fill=sex), position="dodge")}
else {
#create a new subset of data that will be used??
who_subset<-subset(who, country==input$country)
ggplot(data=who_subset, aes(x=age)) + geom_bar(aes(weights=suicides_no, fill=sex), position="dodge")}}}
else if (x()=="year"){
if (gender()=="gender_neutral"){
if (country()=="World"){
who_all <- who %>%
group_by(year) %>%
summarize(suicides_no = sum(suicides_no),
population = sum(population)) %>%
mutate(rate = 100000 * suicides_no/population)
ggplot() +
geom_line(data = who_all, aes(year, rate))
}
else {
who_subset<-subset(who, country==input$country)
who_sub_sex <- who_subset %>%
group_by(year) %>%
summarize(suicides_no = sum(suicides_no),
population = sum(population)) %>%
mutate(rate = 100000 * suicides_no/population)
ggplot() +
geom_line(data = who_subset, aes(year, rate))
}}
else if (gender()=="gender_specific"){
if (country()=="World"){
who_all <- who %>%
group_by(year) %>%
summarize(suicides_no = sum(suicides_no),
population = sum(population)) %>%
mutate(rate = 100000 * suicides_no/population)
ggplot() +
geom_line(data = who_all, aes(year, rate))
}
else {
#create a new subset of data that will be used??
who_subset<-subset(who, country==input$country)
who_sub_sex <- who_subset %>%
group_by(year, sex) %>%
summarize(suicides_no = sum(suicides_no),
population = sum(population)) %>%
mutate(rate = 100000 * suicides_no / population)
ggplot() +
geom_line(data = who_sub_sex, aes(year, rate, color = sex))}
}
}
)}
# Create a Shiny app object
shinyApp(ui = ui, server = server)
dput(head(who))
structure(list(country = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
.Label = c("Albania",
"Anguilla", "Antigua and Barbuda", "Argentina", "Armenia", "Aruba",
"Australia", "Austria", "Azerbaijan", "Bahamas", "Bahrain", "Barbados",
"Belarus", "Belgium", "Belize", "Bermuda", "Bolivia",
"Bosnia and Herzegovina",
"Brazil", "British Virgin Islands", "Brunei Darussalam", "Bulgaria",
"Cabo Verde", "Canada", "Cayman Islands", "Chile", "Colombia",
"Costa Rica", "Croatia", "Cuba", "Cyprus", "Czech Republic",
"Denmark", "Dominica", "Dominican Republic", "Ecuador", "Egypt",
"El Salvador", "Estonia", "Falkland Islands (Malvinas)", "Fiji",
"Finland", "France", "French Guiana", "Georgia", "Germany", "Greece",
"Grenada", "Guadeloupe", "Guatemala", "Guyana", "Haiti", "Honduras",
"Hong Kong SAR", "Hungary", "Iceland", "Iran (Islamic Rep of)",
"Iraq", "Ireland", "Israel", "Italy", "Jamaica", "Japan", "Jordan",
"Kazakhstan", "Kiribati", "Kuwait", "Kyrgyzstan", "Latvia", "Lithuania",
"Luxembourg", "Macau", "Malaysia", "Maldives", "Malta", "Martinique",
"Mauritius", "Mayotte", "Mexico", "Monaco", "Mongolia", "Montenegro",
"Montserrat", "Morocco", "Netherlands", "Netherlands Antilles",
"New Zealand", "Nicaragua", "Norway", "Occupied Palestinian Territory",
"Oman", "Panama", "Paraguay", "Peru", "Philippines", "Poland",
"Portugal", "Puerto Rico", "Qatar", "Republic of Korea",
"Republic of Moldova",
"Reunion", "Rodrigues", "Romania", "Russian Federation",
"Saint Kitts and Nevis",
"Saint Lucia", "Saint Pierre and Miquelon",
"Saint Vincent and Grenadines",
"San Marino", "Sao Tome and Principe", "Saudi Arabia", "Serbia",
"Seychelles", "Singapore", "Slovakia", "Slovenia", "South Africa",
"Spain", "Sri Lanka", "Suriname", "Sweden", "Switzerland",
"Syrian Arab Republic",
"Tajikistan", "TFYR Macedonia", "Thailand", "Trinidad and Tobago",
"Tunisia", "Turkey", "Turkmenistan", "Turks and Caicos Islands",
"Ukraine", "United Arab Emirates", "United Kingdom",
"United States of America",
"Uruguay", "Uzbekistan", "Venezuela (Bolivarian Republic of)",
"Virgin Islands (USA)", "Zimbabwe"), class = "factor"),
year = c(1985L, 1985L, 1985L, 1985L, 1985L, 1985L),
sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
.Label = c("female", "male"), class = "factor"),
age = structure(1:6, .Label = c("15-24 years", "25-34 years",
"35-54 years", "5-14 years", "55-74 years", "75+ years"),
class = "factor"),
suicides_no = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_), population = c(277900L, 246800L,
267500L, 298300L, 138700L, 34200L)),
row.names = c(NA, 6L), class = "data.frame")
Is there any chance somebody knows a way out of this problem? Again I want the web app to output line graph when the user chooses x axis = year and gender = gender_neutral.
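Try this server code.
The main fix is in the x = year, gender-neutral, single-country branch: geom_line() must be given the summarised data frame who_sub_sex, which contains the rate column, rather than who_subset, which does not. The code also wraps the renderPlot() expression in braces and reads the inputs directly via input$. Since I don't have the who data frame, I could not test it.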
server <- function(input, output) {
output$graph <- renderPlot({
if (input$x == "age") {
if (input$gender=="gender_neutral"){
if (input$country=="World"){
ggplot(data = who, aes(x = age)) + geom_bar(aes(weight = suicides_no), position="dodge")}
else {
#create a new subset of data that will be used??
who_subset <- subset(who, country == input$country)
ggplot(data=who_subset, aes(x=age)) + geom_bar(aes(weight=suicides_no))
}
} else if (input$gender=="gender_specific") {
if (input$country=="World"){
ggplot(data=who, aes(x=age)) + geom_bar(aes(weight=suicides_no, fill=sex), position="dodge")}
else {
#create a new subset of data that will be used??
who_subset <- subset(who, country==input$country)
ggplot(data = who_subset, aes(x=age)) + geom_bar(aes(weight=suicides_no, fill=sex), position="dodge")
}
}
} else if (input$x=="year"){
if (input$gender=="gender_neutral"){
if (input$country=="World"){
who_all <- who %>%
group_by(year) %>%
summarize(suicides_no = sum(suicides_no),
population = sum(population)) %>%
mutate(rate = 100000 * suicides_no/population)
ggplot() +
geom_line(data = who_all, aes(year, rate))
} else {
who_subset <- subset(who, country==input$country)
who_sub_sex <- who_subset %>%
group_by(year) %>%
summarize(suicides_no = sum(suicides_no),
population = sum(population)) %>%
mutate(rate = 100000 * suicides_no/population)
ggplot() +
geom_line(data = who_sub_sex, aes(year, rate))
}
} else if (input$gender=="gender_specific"){
if (input$country=="World"){
who_all <- who %>%
group_by(year) %>%
summarize(suicides_no = sum(suicides_no),
population = sum(population)) %>%
mutate(rate = 100000 * suicides_no/population)
ggplot() +
geom_line(data = who_all, aes(year, rate))
} else {
#create a new subset of data that will be used??
who_subset <- subset(who, country==input$country)
who_sub_sex <- who_subset %>%
group_by(year, sex) %>%
summarize(suicides_no = sum(suicides_no),
population = sum(population)) %>%
mutate(rate = 100000 * suicides_no / population)
ggplot() +
geom_line(data = who_sub_sex, aes(year, rate, color = sex))}
}
}
})
}
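The "object 'rate' not found" error can be reproduced outside Shiny. The sketch below uses a made-up four-row stand-in for the who data (the numbers are invented) to show that rate exists only in the summarised data frame, so that is the one geom_line() has to receive.

```r
library(dplyr)

# Invented stand-in for the `who` data, for illustration only
who_toy <- data.frame(
  country     = "A",
  year        = c(2000L, 2000L, 2001L, 2001L),
  sex         = c("female", "male", "female", "male"),
  suicides_no = c(10L, 30L, 15L, 35L),
  population  = c(100000L, 100000L, 100000L, 100000L)
)

# Same aggregation as in the answer: one row per year with a rate column
who_sub <- who_toy %>%
  group_by(year) %>%
  summarize(suicides_no = sum(suicides_no),
            population  = sum(population)) %>%
  mutate(rate = 100000 * suicides_no / population)

"rate" %in% names(who_toy)  # the raw subset has no rate column
"rate" %in% names(who_sub)  # the summarised data frame does
```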

Plotting Countries in R Maps Library : Gray boundaries with FILL=TRUE

I am trying to plot countries in R Maps. However, when I use FILL=TRUE all my countries boundaries are drawn in black. I want them in Gray. This is my code:
library(maps) # Provides functions that let us plot the maps
library(mapdata) # Contains the hi-resolution points that mark out the Cnt
countries=c("Argentina","Armenia","Australia","Bahrain","Belgium","Botswana","Bulgaria","Canada", "Chile","Taiwan", "Croatia","Cyprus", "Czech Republic", "Denmark","Egypt","UK:Great Britain","Finland", "France", "Georgia", "Germany", "China:Hong Kong", "Hungary", "Indonesia", "Iran", "Ireland", "Israel", "Italy", "Japan", "Jordan", "Kazakhstan", "Korea", "Kuwait", "Lebanon", "Lithuania", "Malaysia", "Malta", "Morocco", "Netherlands", "New Zealand", "UK:Northern Ireland", "Norway", "Oman", "Palestine", "Poland", "Portugal", "Qatar", "Russia", "Saudi Arabia", "Serbia", "Singapore", "Slovak Republic", "Slovenia", "South Africa", "Spain", "Sweden", "Thailand", "Turkey", "United Arab Emirates", "USA")
map('world', resolution=1, col="darkgray")
map('world', countries, resolution=1, fill = T, col = "royalblue", add = T)
map('world', resolution=1,col="darkgray", add=TRUE)
Any ideas will be appreciated.
To change the colour of the boundaries, add the argument border='darkgrey'. This argument is not listed explicitly in the man page for 'map' because it is in fact an argument for the call to 'polygon'. It is mentioned in the '...' segment of the man page, though.
map('world', countries, resolution=1, fill = T,
col = "royalblue", border="darkgrey", add = T)

ggplot: how to limit output in bar plot so only most frequent occurrences are shown?

I have been searching for this simple thing for hours now, but to no avail. I have a data frame in which one of the columns is the variable "country". I want the following two things:
Plot the most frequent countries, most frequent on top (EDIT: full solution found, so the question now focuses on limiting the output in the bar plot based on frequency);
Only show the top x "most frequent" countries, moving the rest into an 'Other' category.
I tried to ggplot table() or summary() but that does not work. Is it even possible within ggplot, or should I use barchart (I managed to do this using barchart, just using summary(df$something) and adding max = x). I also wanted to stack the output (different questions about country).
Most frequent countries on top:
ggplot(aDDs,aes(x=
factor(answer,
levels=names(sort(table(answer), decreasing=FALSE))
),fill=question
)
) + geom_bar() + coord_flip()
Suggestions are very very welcome.
====== EDIT3:
I continued working on the code based on the suggestion by @CMichael, but have now encountered another, quite strange, thing. Because this 'ifelse' problem concerns a slightly different question from my original one, I have posted a separate question for it. Please check it here: R: ifelse function returns vector position instead of value (string)
====== EDIT:
The aDDs example is reproduced below - aDDs dataset can be downloaded here:
temp <- structure(list(student = c(2270285L, 2321254L, 75338L, 2071594L,1682771L, 1770356L, 2155693L, 3154864L, 3136979L, 2082311L),answer = structure(c(181L, 87L, 183L, 89L, 115L, 183L, 172L,180L, 175L, 125L), .Label = c("Congo", "Guinea-Bissau", "Solomon Islands","Central African Rep", "Comoros", "Equatorial Guinea", "Liechtenstein","Nauru", "Brunei", "Djibouti", "Kiribati", "Papua New Guinea","Samoa", "South Sudan", "Tajikistan", "Tonga", "Bhutan","Gabon", "Laos", "Lesotho", "Maldives", "Micronesia", "St Kitts and Nevis","Mozambique", "Niger", "Andorra", "Cape Verde", "Mauritania","Antigua and Deps", "Chad", "Guinea", "Malta", "Burundi","Eritrea", "Iceland", "Kyrgyzstan", "Turkmenistan", "Azerbaijan","Dominica", "Belize", "Malawi", "Mali", "Moldova", "Benin","Cuba", "Gambia", "Luxembourg", "St Lucia", "Angola", "Cambodia","Georgia", "Madagascar", "Oman", "Kosovo", "Kuwait", "Namibia","Bahrain", "Congo - Democratic Rep", "Montenegro", "Senegal","Sierra Leone", "Togo", "Botswana", "Fiji", "Libya", "Uzbekistan","Guyana", "Mongolia", "Somalia", "Zambia", "Estonia", "Ivory Coast","Myanmar", "Grenada", "Qatar", "Saint Vincent and the Grenadines","Tanzania", "Armenia", "Bahamas", "Belarus", "Burkina", "Liberia","Afghanistan", "Latvia", "Yemen", "Mauritius", "Albania","Barbados", "Iraq", "Macedonia", "Nicaragua", "Panama", "Slovenia","Lebanon", "Slovakia", "Kazakhstan", "Paraguay", "Korea South","Suriname", "Czech Republic", "Rwanda", "Haiti", "Lithuania","Israel", "Zimbabwe", "Cyprus", "Honduras", "Uruguay", "Syria","Finland", "Tunisia", "Taiwan", "Uganda", "Denmark", "Austria","Sri Lanka", "Vietnam", "Bosnia Herzegovina", "Thailand","Norway", "Trinidad and Tobago", "Switzerland", "Nepal","Sudan", "Jamaica", "Japan", "United Arab Emirates", "Bolivia","New Zealand", "Ethiopia", "Jordan", "Cameroon", "Croatia","Sweden", "Kenya", "Singapore", "Guatemala", "Ireland Republic","Saudi Arabia", "Bulgaria", "Malaysia", "Belgium", "Dominican Republic","Algeria", "El Salvador", 
"Bangladesh", "Serbia", "Ghana","Costa Rica", "Indonesia", "Hungary", "Venezuela", "Ecuador","Ukraine", "Romania", "Turkey", "China", "Morocco", "Russian Federation","Peru", "South Africa", "Argentina", "Portugal", "Iran","Poland", "Italy", "Chile", "France", "Germany", "Australia","Philippines", "Egypt", "Greece", "Nigeria", "Canada", "Pakistan","United Kingdom", "Mexico", "Colombia", "Brazil", "Netherlands","Spain", "India", "United States"), class = "factor"), question = c("C1-pres","C1-pres", "C1-pres", "C1-pres", "C1-pres", "C1-pres", "C1-pres","B1-pres", "B1-pres", "B1-pres")), .Names = c("student","answer", "question"), row.names = c("156", "203", "280", "347","412", "478", "534", "1649651", "1649691", "1649763"), class = "data.frame")
For the filtering question you should introduce a new column:
data$filteredCountry = ifelse(data$value > threshold, as.character(data$country), "other")
Now you can use filteredCountry as your x in the aesthetics.
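One caveat with ifelse() on a country column: if the column is a factor, ifelse() returns the underlying integer codes, not the labels. A small sketch with made-up values (the vectors country and freq are hypothetical) shows the difference that as.character() makes.

```r
# Made-up example data: a factor of country names and their frequencies
country <- factor(c("Spain", "India", "Spain", "Malta"))
freq    <- c(10, 8, 10, 1)

# On a factor, ifelse() picks up the integer codes of the levels
bad  <- ifelse(freq > 5, country, "other")

# Converting to character first keeps the country names
good <- ifelse(freq > 5, as.character(country), "other")

print(bad)
print(good)
```

The levels are sorted alphabetically (India, Malta, Spain), which is why the first result contains "3" and "1" where the country names were expected.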
The data ordering question pops up every now and then (e.g., ggplot2: sorting a plot). You need to order your country factor levels by the underlying values. Your reorder command seems to sort by country name again; I would expect something like reorder(country,frequency), but sample data would help.
UPDATE:
With the data now provided, it becomes obvious that you need to create a summary dataset:
data <- read.table("aDDs.csv",sep=",",header=T)
require(plyr)
summary <- ddply(data,.(answer),summarise,freq=length(answer))
This yields the data frame summary with one entry for each country (181 in total). Now you can do the filtering and the reordering:
threshold = quantile(summary$freq,0.9)
summary$filteredCountry = ifelse(summary$freq > threshold, as.character(summary$answer), "other")
summary$filteredCountry = reorder(summary$filteredCountry,-summary$freq)
Now you can plot:
require(ggplot2)
p=ggplot(data=summary,aes(x=filteredCountry,y=freq))
p = p+geom_bar(aes(fill=filteredCountry),stat="identity")
p
Thanks to suggestions from @CMichael and answers to another, related, post here on SO, I managed to create a stacked and ordered bar plot using ggplot:
# create a list with most frequent country names
temp <- row.names(as.data.frame(summary(aDDs$answer, max=12))) # create a df or something else with the summary output.
aDDs$answer <- as.character(aDDs$answer) # IMPORTANT! Here was the problem: turn into character values
# create new column that filters top results
aDDs$top <- ifelse(
aDDs$answer %in% temp, ## condition: match aDDs$answer with row.names in summary df
aDDs$answer, ## then it should be named as aDDs$answer
"Other" ## else it should be named "Other"
)
aDDs$top <- as.factor(aDDs$top) # factorize the output again
# plot
ggplot(aDDs,aes(x=
factor(top,
levels=names(sort(table(top), decreasing=FALSE))
),fill=question
)
) + geom_bar() + coord_flip()
And here the output (still needs some tweaking, but it is what I wanted):
