How to aggregate data by time series and create a square matrix? - r

I am trying to plot a dendrogram based on time series data. It should be built from a 27x27 matrix (country names in columns), and I am only allowed to use base R plus the dplyr and ggplot2 libraries. I first aggregated the data by mean, but it turned out the plot below wasn't produced that way. I'm struggling to understand how else I am supposed to obtain one value for each country.
I should obtain a plot like this:
[image: goal dendrogram]
library("eurostat")
library("ggplot2")
library("dplyr")
d <- get_eurostat("prc_hicp_manr") %>%
filter(coicop == "CP00")
d2 <- label_eurostat(d) %>%
filter(time >= as.Date("2000-02-01"), time <= as.Date("2022-09-30"),
geo %in% c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czechia", "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary",
"Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta", "Netherlands", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia",
"Spain", "Sweden"))
d3 <- d2[,c("geo","values")]
d3
#m <- matrix(?)
dist <- dist(m, method = "minkowski", p = 1.5)
d3_hclust <- hclust(dist, method = "complete")
d3_dend <- as.dendrogram(d3_hclust)
plot(d3_dend)
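One possible reading of the task, offered as an assumption rather than the confirmed method behind the goal plot: instead of collapsing each country to a single number, keep the whole series, with one row per country and one column per date. dist() then compares countries pairwise and returns the square country-by-country matrix (27x27 for the full data). A minimal sketch on synthetic data, where the toy data frame and its values are made up for illustration:

```r
# Synthetic stand-in for the Eurostat data: 3 countries, 4 monthly values each
set.seed(1)
toy <- data.frame(
  geo    = rep(c("Austria", "Belgium", "Bulgaria"), each = 4),
  time   = rep(seq(as.Date("2000-02-01"), by = "month", length.out = 4), times = 3),
  values = round(rnorm(12, mean = 2), 1)
)

# One row per country, one column per date (base R only)
m <- tapply(toy$values, list(toy$geo, as.character(toy$time)), mean)

# Pairwise distances between countries: 3 x 3 here, 27 x 27 for the full data
dd <- dist(m, method = "minkowski", p = 1.5)
square <- as.matrix(dd)

# Cluster and draw the dendrogram
hc <- hclust(dd, method = "complete")
plot(as.dendrogram(hc))
```

With the real data, the same tapply() call on d2 (columns geo, time, values) would give the matrix m that the commented-out line is asking for, provided every country has a value for every date.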

Related

Removing redundant areas from a map in R (shapefile)

I plotted a few European countries on a map, but there are some outliers which I don't need. I tried to remove them from my spatial df using different ways suggested in similar questions but they didn't work for this case. Could you please give me your ideas on removing them? I appreciate it. The shape file is available here
EDIT: I need to remove these areas not only from the map, but also from the spatial data frame.
library(rgdal)
library(raster)
myCountries <- c("Austria", "Belgium", "Czech Republic", "Denmark", "Estonia", "Finland",
"France", "Germany", "Latvia", "Hungary", "Iceland", "Ireland", "Italy",
"Netherlands", "Norway", "Portugal", "Poland", "Spain", "Sweden", "Switzerland",
"Turkey", "United Kingdom")
countries <- readOGR('ne_110m_admin_0_countries.shp')
eurcountries <- countries[countries$NAME_EN %in% myCountries ,]
eurcountries2<-spTransform(eurcountries, CRS("+proj=longlat +datum=NAD83"))
plot(eurcountries2)
Here is how you can do that with terra (the replacement for raster):
myCountries <- c("Austria", "Belgium", "Czech Republic", "Denmark", "Estonia", "Finland",
"France", "Germany", "Latvia", "Hungary", "Iceland", "Ireland", "Italy",
"Netherlands", "Norway", "Portugal", "Poland", "Spain", "Sweden", "Switzerland",
"Turkey", "United Kingdom")
library(terra)
countries <- vect('ne_110m_admin_0_countries.shp')
eur <- countries[countries$NAME_EN %in% myCountries ,]
e <- ext(c(-28, 48, 35, 76))
x <- crop(eur, e)
plot(x, "NAME_EN")
You can interactively find the extent you need for cropping by doing
plot(eur)
e <- draw()
# now click on the map twice
Or subset interactively, like this:
d <- disagg(eur)
plot(d)
s <- sel(d) # now draw a bounding box on the plot
a <- aggregate(s, "NAME_EN")
plot(a, "NAME_EN")
And you can coerce the SpatVector objects to sp or sf types like this:
sf <- sf::st_as_sf(x)
library(raster)
sp <- as(x, "Spatial")
Or vice versa with:
y <- vect(sf)
Instead of using the sp package, I find the sf package works better, as it plays well with ggplot2. Limiting the canvas is then straightforward, and you gain the ability to colour the countries.
library(rgdal)
library(ggplot2)
myCountries <- c("Austria", "Belgium", "Czech Republic", "Denmark", "Estonia", "Finland",
"France", "Germany", "Latvia", "Hungary", "Iceland", "Ireland", "Italy",
"Netherlands", "Norway", "Portugal", "Poland", "Spain", "Sweden", "Switzerland",
"Turkey", "United Kingdom")
countries <- readOGR("C:/R/projects/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp")
eurcountries <- countries[countries$NAME_EN %in% myCountries, ]
eurcountries3 <- sf::st_as_sf(eurcountries)
ggplot(eurcountries3) +
geom_sf(aes(fill = ADMIN)) +
lims(x = c(-40, 50), y = c(30, 74)) +
guides(fill = "none") +
theme_void()

Plotting Countries in R Maps Library : Gray boundaries with FILL=TRUE

I am trying to plot countries with the R maps package. However, when I use fill=TRUE, all country boundaries are drawn in black. I want them in gray. This is my code:
library(maps) # Provides functions that let us plot the maps
library(mapdata) # Contains the hi-resolution points that mark out the Cnt
countries=c("Argentina","Armenia","Australia","Bahrain","Belgium","Botswana","Bulgaria","Canada", "Chile","Taiwan", "Croatia","Cyprus", "Czech Republic", "Denmark","Egypt","UK:Great Britain","Finland", "France", "Georgia", "Germany", "China:Hong Kong", "Hungary", "Indonesia", "Iran", "Ireland", "Israel", "Italy", "Japan", "Jordan", "Kazakhstan", "Korea", "Kuwait", "Lebanon", "Lithuania", "Malaysia", "Malta", "Morocco", "Netherlands", "New Zealand", "UK:Northern Ireland", "Norway", "Oman", "Palestine", "Poland", "Portugal", "Qatar", "Russia", "Saudi Arabia", "Serbia", "Singapore", "Slovak Republic", "Slovenia", "South Africa", "Spain", "Sweden", "Thailand", "Turkey", "United Arab Emirates", "USA")
map('world', resolution=1, col="darkgray")
map('world', countries, resolution=1, fill = T, col = "royalblue", add = T)
map('world', resolution=1,col="darkgray", add=TRUE)
Any ideas will be appreciated.
Thanks
Jº
To change the colour of the boundaries, add the argument border='darkgrey'. This argument is not listed explicitly in the man page for 'map' because it is in fact an argument for the call to 'polygon'. It is mentioned in the '...' segment of the man page, though.
map('world', countries, resolution=1, fill = T,
col = "royalblue", border="darkgrey", add = T)

ifelse statement - add a variable based on a condition

I am trying to add a factor variable "Economy" with levels "Developed" and "Developing" to my dataset that has a list of countries.
What am I doing wrong?
Developed <- data.frame(c("Andorra", "Faroe Islands", "Ireland", "Monaco", "Spain", "Australia", "Finland",
"Israel", "Netherlands", "Sweden", "Austria", "France", "Italy", "New Zealand", "Switzerland",
"Belgium", "Germany", "Japan", "Norway", "Turkey", "Bermuda", "Greece", "Liechtenstein",
"Portugal", "United Kingdom", "Canada", "Holy See", "Luxembourg", "San Marino", "United States",
"Denmark", "Iceland", "Malta", "South Africa", "Hong Kong", "South Korea", "Singapore", "Taiwan"))
names(Developed) <- "Country"
total$Economy <- ifelse(d$Country==Developed$Country, "Developed", "Developing")
It produces the following warnings:
Warning messages:
1: In is.na(e1) | is.na(e2) :
longer object length is not a multiple of shorter object length
2: In `==.default`(data$Country, Developed$Country) :
longer object length is not a multiple of shorter object length
@DJJ,
ifelse(d$Country%in%Developed$Country, "Developed", "Developing")
worked perfectly! Problem solved.
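Why the change helps: == compares the two vectors element by element and recycles the shorter one, which is what triggers the "longer object length" warnings, whereas %in% tests whether each country appears anywhere in the list. A small sketch with made-up vectors (the names countries and developed are illustrative, not from the original data):

```r
countries <- c("Spain", "Kenya", "Japan", "Peru", "Italy")
developed <- c("Japan", "Spain", "Italy")

# %in% asks, for each element of countries, "is it anywhere in developed?"
economy <- ifelse(countries %in% developed, "Developed", "Developing")

# Store the result as a factor with the two requested levels
economy_f <- factor(economy, levels = c("Developed", "Developing"))
```

By contrast, countries == developed would line up "Spain" with "Japan", "Kenya" with "Spain", and so on, so the result depends on position rather than membership.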

How to generate maps with different section highlighted each time with a for loop?

I would like to generate a set of maps in R with all of them having the same background (a focus on Europe) BUT each of them having one EU country highlighted in another color. And I can't seem to figure out how to write the for loop to get that...
Here is my code:
require(rgdal)
setwd(...) #where I have my GIS shapefile
world <- readOGR(dsn = ".", layer = "TM_WORLD_BORDERS-0.2")
#Subset European countries
#List of "european" countries + shapefile
europe <- c("Russia", "Isle of Man", "Channel Islands", "Faroe Islands",
"France", "Denmark", "Iceland", "Germany", "Romania", "Poland", "Portugal",
"United Kingdom", "Spain", "Sweden", "Lithuania", "Ireland", "Italy",
"Netherlands", "Norway", "Ukraine", "Latvia", "Estonia", "Finland",
"Bulgaria", "Belgium", "Montenegro", "Serbia and Montenegro", "Slovenia",
"Albania", "Greece", "Croatia", "Malta")
europe <- subset(world, NAME %in% europe)
#List of countries in the EU + shapefile
EU <- c("Isle of Man", "Channel Islands", "Faroe Islands", "France",
"Denmark", "Germany", "Romania", "Poland", "Portugal", "Spain", "Sweden",
"Lithuania", "Ireland", "Italy", "Netherlands", "Ukraine", "Latvia", "Estonia",
"Finland", "Bulgaria", "Belgium", "Montenegro", "Serbia and Montenegro",
"Slovenia", "Albania", "Greece", "Croatia", "Malta")
EU <- subset(europe, NAME %in% EU)
#Generate one map per highlighted country
eucountries <- unique(europe$NAME)
for(i:length(eucountries))
{
print(i)
png(paste(i,".png",sep=""), 200, 200)
map("world", ylim=c(35,70), xlim=c(-20,45), col="#BFBFBF", fill=TRUE)
plot(eucountries, add=TRUE, col="#769EB2", namesonly=TRUE)
dev.off()
}
I want to produce one png per country. Each png will have one specific country highlighted with a different color. The full map will be plotted each time.
Thanks to vpipkt's comment pointing out that map()$names returns the names of the things being plotted (polygons, I suspect), I could come up with a much more elegant solution:
build an index of the polygons whose names match a country
use that index to build a color vector that colors the countries
Note: the borders provided by the maps package seem a little outdated, e.g. Yugoslavia.
# library
library(maps)
# options
old <- par()$mar
par("mar"=c(0,0,0,0))
YLIM <- c(35,70)
XLIM <- c(-20,45)
# plotting
for(country in c("Germany", "Ireland", "Spain", "Greece", "Denmark", "Yugoslavia") )
{
polygon_names <- map("world", ylim=YLIM, xlim=XLIM)$names
index <- grep(country, polygon_names)
colvec <- rep("white", length(polygon_names))
colvec[index] <- "red"
png(paste0(country,".png"))
map("world", ylim=YLIM, xlim=XLIM, col=colvec, fill=TRUE)
dev.off()
}
# resetting options
par("mar"=old)
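The index-then-color step can be tried without drawing any map. The sketch below uses a hypothetical polygon_names vector in place of map()$names, and highlight_colors is an illustrative helper, not part of the maps package:

```r
# Hypothetical polygon names, standing in for map("world", ...)$names
polygon_names <- c("Germany", "Ireland", "Spain:Ibiza", "Spain", "France")

# Build one color per polygon, highlighting every polygon whose name contains `country`
highlight_colors <- function(country, polygon_names, base = "white", hi = "red") {
  colvec <- rep(base, length(polygon_names))
  colvec[grep(country, polygon_names, fixed = TRUE)] <- hi
  colvec
}

spain_cols <- highlight_colors("Spain", polygon_names)
```

Because grep() matches substrings, islands and other sub-polygons such as "Spain:Ibiza" are highlighted together with the mainland. The same behavior means a short name can also match unrelated polygons that happen to contain it.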
Inside your loop, try
plot(eucountries[i], add=TRUE, col="#769EB2", namesonly=TRUE)
in place of your current plot call. Note the subset of eucountries.

ggplot: how to limit output in bar plot so only most frequent occurrences are shown?

I have been searching for this simple thing for hours now, but to no avail. I have a data frame in which one of the columns is the variable "country". I want the following two things:
Plot the most frequent countries, most frequent on top (partial solution found EDIT full solution found >> focus question on limiting output in bar plot based on frequency);
Only show the top x "most frequent" countries, moving the rest into 'Other' variable.
I tried to ggplot table() or summary() but that does not work. Is it even possible within ggplot, or should I use barchart (I managed to do this using barchart, just using summary(df$something) and adding max = x). I also wanted to stack the output (different questions about country).
Most frequent countries on top:
ggplot(aDDs,aes(x=
factor(answer,
levels=names(sort(table(answer),increasing=TRUE))
),fill=question
)
) + geom_bar() + coord_flip()
Suggestions are very very welcome.
====== EDIT3:
I continued working on the code based on the suggestion by @CMichael, but then ran into another, quite strange, thing. Because this 'ifelse' problem concerns a slightly different question than my original one, I have posted a separate question for it. Please check it here: R: ifelse function returns vector position instead of value (string)
====== EDIT:
The aDDs example is reproduced below - aDDs dataset can be downloaded here:
temp <- structure(list(student = c(2270285L, 2321254L, 75338L, 2071594L,1682771L, 1770356L, 2155693L, 3154864L, 3136979L, 2082311L),answer = structure(c(181L, 87L, 183L, 89L, 115L, 183L, 172L,180L, 175L, 125L), .Label = c("Congo", "Guinea-Bissau", "Solomon Islands","Central African Rep", "Comoros", "Equatorial Guinea", "Liechtenstein","Nauru", "Brunei", "Djibouti", "Kiribati", "Papua New Guinea","Samoa", "South Sudan", "Tajikistan", "Tonga", "Bhutan","Gabon", "Laos", "Lesotho", "Maldives", "Micronesia", "St Kitts and Nevis","Mozambique", "Niger", "Andorra", "Cape Verde", "Mauritania","Antigua and Deps", "Chad", "Guinea", "Malta", "Burundi","Eritrea", "Iceland", "Kyrgyzstan", "Turkmenistan", "Azerbaijan","Dominica", "Belize", "Malawi", "Mali", "Moldova", "Benin","Cuba", "Gambia", "Luxembourg", "St Lucia", "Angola", "Cambodia","Georgia", "Madagascar", "Oman", "Kosovo", "Kuwait", "Namibia","Bahrain", "Congo - Democratic Rep", "Montenegro", "Senegal","Sierra Leone", "Togo", "Botswana", "Fiji", "Libya", "Uzbekistan","Guyana", "Mongolia", "Somalia", "Zambia", "Estonia", "Ivory Coast","Myanmar", "Grenada", "Qatar", "Saint Vincent and the Grenadines","Tanzania", "Armenia", "Bahamas", "Belarus", "Burkina", "Liberia","Afghanistan", "Latvia", "Yemen", "Mauritius", "Albania","Barbados", "Iraq", "Macedonia", "Nicaragua", "Panama", "Slovenia","Lebanon", "Slovakia", "Kazakhstan", "Paraguay", "Korea South","Suriname", "Czech Republic", "Rwanda", "Haiti", "Lithuania","Israel", "Zimbabwe", "Cyprus", "Honduras", "Uruguay", "Syria","Finland", "Tunisia", "Taiwan", "Uganda", "Denmark", "Austria","Sri Lanka", "Vietnam", "Bosnia Herzegovina", "Thailand","Norway", "Trinidad and Tobago", "Switzerland", "Nepal","Sudan", "Jamaica", "Japan", "United Arab Emirates", "Bolivia","New Zealand", "Ethiopia", "Jordan", "Cameroon", "Croatia","Sweden", "Kenya", "Singapore", "Guatemala", "Ireland Republic","Saudi Arabia", "Bulgaria", "Malaysia", "Belgium", "Dominican Republic","Algeria", "El Salvador", 
"Bangladesh", "Serbia", "Ghana","Costa Rica", "Indonesia", "Hungary", "Venezuela", "Ecuador","Ukraine", "Romania", "Turkey", "China", "Morocco", "Russian Federation","Peru", "South Africa", "Argentina", "Portugal", "Iran","Poland", "Italy", "Chile", "France", "Germany", "Australia","Philippines", "Egypt", "Greece", "Nigeria", "Canada", "Pakistan","United Kingdom", "Mexico", "Colombia", "Brazil", "Netherlands","Spain", "India", "United States"), class = "factor"), question = c("C1-pres","C1-pres", "C1-pres", "C1-pres", "C1-pres", "C1-pres", "C1-pres","B1-pres", "B1-pres", "B1-pres")), .Names = c("student","answer", "question"), row.names = c("156", "203", "280", "347","412", "478", "534", "1649651", "1649691", "1649763"), class = "data.frame")
For the filtering question you should introduce a new column:
data$filteredCountry = ifelse(data$value > threshold, data$country, "other")
Now you can use filteredCountry as your x in the aesthetics.
The data ordering question pops up every now and then (e.g., ggplot2: sorting a plot). You need to order your country factor levels by the underlying values. Your reorder command seems to sort by country name again, I would expect something like reorder(country,frequency) but sample data would help.
UPDATE:
With the data now provided, it becomes obvious that you need to create a summary dataset:
data <- read.table("aDDs.csv",sep=",",header=T)
require(plyr)
summary <- ddply(data,.(answer),summarise,freq=length(answer))
This yields the data frame summary with one entry for each country (181 in total). Now you can do the filtering and the reordering:
threshold = quantile(summary$freq,0.9)
summary$filteredCountry = ifelse(summary$freq > threshold, as.character(summary$answer), "other")
summary$filteredCountry = reorder(summary$filteredCountry,-summary$freq)
Now you can plot:
require(ggplot2)
p=ggplot(data=summary,aes(x=filteredCountry,y=freq))
p = p+geom_bar(aes(fill=filteredCountry),stat="identity")
p
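The same "keep the top countries, lump the rest" idea can be sketched in base R on raw answers, without plyr. The answers vector and top_n below are made up for illustration:

```r
# Made-up survey answers
answers <- c(rep("Spain", 5), rep("India", 4), rep("Brazil", 3), "Malta", "Fiji", "Oman")

# Frequency per country, most frequent first
freq <- sort(table(answers), decreasing = TRUE)

# Keep the top_n most frequent names and lump everything else into "Other"
top_n <- 3
top <- names(freq)[seq_len(top_n)]
lumped <- ifelse(answers %in% top, answers, "Other")

# Order the levels by frequency, with "Other" last, so the bars plot in that order
lumped <- factor(lumped, levels = c(top, "Other"))
counts <- table(lumped)
```

Passing data.frame(lumped) to ggplot with aes(x = lumped) and geom_bar() should then draw the bars in level order. Note that answers is a character vector here; with a factor, convert it with as.character() before calling ifelse().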
Thanks to suggestions from @CMichael and answers to another, related post here on SO, I managed to create a stacked and ordered bar plot using ggplot.
First, create a list with the most frequent country names:
temp <- row.names(as.data.frame(summary(aDDs$answer, max=12))) # create a df or something else with the summary output.
aDDs$answer <- as.character(aDDs$answer) # IMPORTANT! Here was the problem: turn into character values
Then create a new column that keeps only the top results:
aDDs$top <- ifelse(
aDDs$answer %in% temp, ## condition: match aDDs$answer with row.names in summary df
aDDs$answer, ## then it should be named as aDDs$answer
"Other" ## else it should be named "Other"
)
aDDs$top <- as.factor(aDDs$top) # factorize the output again
Finally, plot:
ggplot(aDDs,aes(x=
factor(top,
levels=names(sort(table(top),decreasing=FALSE))
),fill=question
)
) + geom_bar() + coord_flip()
And here is the output (it still needs some tweaking, but it is what I wanted):
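The as.character() step marked IMPORTANT above matters because ifelse() drops the factor class and returns the underlying integer codes. A tiny illustration with a made-up factor:

```r
f <- factor(c("Spain", "Malta", "Spain"))  # levels: "Malta" = 1, "Spain" = 2

# Factor passed straight to ifelse(): the integer codes leak through
bad <- ifelse(f == "Spain", f, "Other")

# Converting to character first keeps the labels
good <- ifelse(f == "Spain", as.character(f), "Other")
```

Here bad contains "2" where "Spain" was expected, while good contains the country names.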
