I am trying to plot a dendrogram based on time series data. It should be based on a 27x27 matrix (country names in columns), and I am only allowed to use base R plus the dplyr and ggplot2 libraries. I first aggregated the data by mean, but it turned out the plot below wasn't produced that way. I'm struggling to understand how else I am supposed to obtain one value for each country.
The plot I should obtain looks like this:
goal dendrogram
library("eurostat")
library("ggplot2")
library("dplyr")
d <- get_eurostat("prc_hicp_manr") %>%
  filter(coicop == "CP00")
d2 <- label_eurostat(d) %>%
  # compare the dates directly instead of matching against a numeric sequence
  filter(time >= as.Date("2000-02-01"), time <= as.Date("2022-09-30"),
         geo %in% c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czechia", "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary",
                    "Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta", "Netherlands", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia",
                    "Spain", "Sweden"))
d3 <- d2[,c("geo","values")]
d3
#m <- matrix(?)
dist <- dist(m, method = "minkowski", p = 1.5)
d3_hclust <- hclust(dist, method = "complete")
d3_dend <- as.dendrogram(d3_hclust)
plot(d3_dend)
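One possible reading of the "27x27 matrix" is that it is the matrix of pairwise distances between the countries' whole time series, so no aggregation to a single value per country is needed. The sketch below illustrates that idea with base R only. The `toy` data frame and its values are made up and merely stand in for `d2`; the column names `geo`, `time` and `values` are assumed to match the real data.

```r
# Toy long-format data standing in for d2 (hypothetical values).
set.seed(1)
toy <- expand.grid(
  geo  = c("Austria", "Belgium", "Bulgaria", "Croatia"),
  time = seq(as.Date("2000-02-01"), by = "month", length.out = 6),
  stringsAsFactors = FALSE
)
toy$values <- round(rnorm(nrow(toy), mean = 2, sd = 1), 1)

# Wide matrix: one row per month, one column per country.
# Each cell holds a single observation, so mean() just returns it.
wide <- tapply(toy$values, list(as.character(toy$time), toy$geo), mean)

# dist() works on rows, so transpose to compare countries with each other.
country_dist <- dist(t(wide), method = "minkowski", p = 1.5)

# Square country-by-country distance matrix (27x27 with the real data).
dist_matrix <- as.matrix(country_dist)

# Cluster and draw the dendrogram.
hc <- hclust(country_dist, method = "complete")
plot(as.dendrogram(hc))
```

With the real data, `t(wide)` would have 27 rows, and months with missing values for some country would need handling before calling `dist()`.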
I am trying to plot countries with the R maps package. However, when I use fill=TRUE, all my countries' boundaries are drawn in black. I want them in gray. This is my code:
library(maps) # Provides functions that let us plot the maps
library(mapdata) # Contains the hi-resolution points that mark out the Cnt
countries=c("Argentina","Armenia","Australia","Bahrain","Belgium","Botswana","Bulgaria","Canada", "Chile","Taiwan", "Croatia","Cyprus", "Czech Republic", "Denmark","Egypt","UK:Great Britain","Finland", "France", "Georgia", "Germany", "China:Hong Kong", "Hungary", "Indonesia", "Iran", "Ireland", "Israel", "Italy", "Japan", "Jordan", "Kazakhstan", "Korea", "Kuwait", "Lebanon", "Lithuania", "Malaysia", "Malta", "Morocco", "Netherlands", "New Zealand", "UK:Northern Ireland", "Norway", "Oman", "Palestine", "Poland", "Portugal", "Qatar", "Russia", "Saudi Arabia", "Serbia", "Singapore", "Slovak Republic", "Slovenia", "South Africa", "Spain", "Sweden", "Thailand", "Turkey", "United Arab Emirates", "USA")
map('world', resolution=1, col="darkgray")
map('world', countries, resolution=1, fill = T, col = "royalblue", add = T)
map('world', resolution=1,col="darkgray", add=TRUE)
Any ideas will be appreciated.
Thanks
Jº
To change the colour of the boundaries, add the argument border='darkgrey'. This argument is not listed explicitly in the man page for 'map' because it is in fact an argument passed on to the call to 'polygon'. It is mentioned in the '...' section of the man page, though.
map('world', countries, resolution=1, fill = T,
col = "royalblue", border="darkgrey", add = T)
I am trying to add a factor variable "Economy" with levels "Developed" and "Developing" to my dataset that has a list of countries.
What am I doing wrong?
Developed <- data.frame(c("Andorra", "Faroe Islands", "Ireland", "Monaco", "Spain", "Australia", "Finland",
"Israel", "Netherlands", "Sweden", "Austria", "France", "Italy", "New Zealand", "Switzerland",
"Belgium", "Germany", "Japan", "Norway", "Turkey", "Bermuda", "Greece", "Liechtenstein",
"Portugal", "United Kingdom", "Canada", "Holy See", "Luxembourg", "San Marino", "United States",
"Denmark", "Iceland", "Malta", "South Africa", "Hong Kong", "South Korea", "Singapore", "Taiwan"))
names(Developed) <- "Country"
total$Economy <- ifelse(d$Country==Developed$Country, "Developed", "Developing")
It produces the following warnings:
Warning messages:
1: In is.na(e1) | is.na(e2) :
longer object length is not a multiple of shorter object length
2: In `==.default`(data$Country, Developed$Country) :
longer object length is not a multiple of shorter object length
@DJJ,
ifelse(d$Country%in%Developed$Country, "Developed", "Developing")
worked perfectly! Problem solved.
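For anyone wondering why `==` failed here: it compares the two vectors position by position and recycles the shorter one (hence the "longer object length" warning), whereas `%in%` tests membership. A minimal sketch with made-up vectors (`countries` and `developed` are illustrative, not the data from the question):

```r
# Made-up example vectors of different lengths.
countries <- c("Spain", "Brazil", "Japan", "Kenya", "France")
developed <- c("Japan", "Spain", "France")

# `==` compares element 1 with element 1, element 2 with element 2, ...
# and recycles `developed`; the length-mismatch warning is silenced here.
by_position <- suppressWarnings(countries == developed)

# `%in%` asks, for each country, whether it occurs anywhere in `developed`.
by_membership <- countries %in% developed

# Build the factor with both levels declared explicitly.
economy <- factor(ifelse(by_membership, "Developed", "Developing"),
                  levels = c("Developed", "Developing"))
```

Here `by_position` misses every match because the names never line up at the same position, while `by_membership` finds Spain, Japan and France.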
I would like to generate a set of maps in R, all with the same background (a focus on Europe) but each with one EU country highlighted in another color. I can't figure out how to write the for loop to do that.
Here is my code:
require(rgdal)
setwd(...) #where I have my GIS shapefile
world <- readOGR(dsn = ".", layer = "TM_WORLD_BORDERS-0.2")
#Subset European countries
#List of "european" countries + shapefile
europe <- c("Russia", "Isle of Man", "Channel Islands", "Faroe Islands",
"France", "Denmark", "Iceland", "Germany", "Romania", "Poland", "Portugal",
"United Kingdom", "Spain", "Sweden", "Lithuania", "Ireland", "Italy",
"Netherlands", "Norway", "Ukraine", "Latvia", "Estonia", "Finland",
"Bulgaria", "Belgium", "Montenegro", "Serbia and Montenegro", "Slovenia",
"Albania", "Greece", "Croatia", "Malta")
europe <- subset(world, NAME %in% europe)
#List of countries in the EU + shapefile
EU <- c("Isle of Man", "Channel Islands", "Faroe Islands", "France",
"Denmark", "Germany", "Romania", "Poland", "Portugal", "Spain", "Sweden",
"Lithuania", "Ireland", "Italy", "Netherlands", "Ukraine", "Latvia", "Estonia",
"Finland", "Bulgaria", "Belgium", "Montenegro", "Serbia and Montenegro",
"Slovenia", "Albania", "Greece", "Croatia", "Malta")
EU <- subset(europe, NAME %in% EU)
#Generate one map per highlighted country
eucountries <- unique(europe$NAME)
for(i:length(eucountries))
{
print(i)
png(paste(i,".png",sep=""), 200, 200)
map("world", ylim=c(35,70), xlim=c(-20,45), col="#BFBFBF", fill=TRUE)
plot(eucountries, add=TRUE, col="#769EB2", namesonly=TRUE)
dev.off()
}
I want to produce one png per country. Each png will have one specific country highlighted with a different color. The full map will be plotted each time.
Thanks to vpipkt's comment, which pointed out that map()$names provides a list of the names of the things being plotted (polygons, I suspect), I came up with a much more elegant solution:
building an index of those polygons that are named like the country
using that index to build a color vector to color the countries
Note: the borders provided by the maps package seem a little outdated, e.g. Yugoslavia
# library
library(maps)
# options
old <- par()$mar
par("mar"=c(0,0,0,0))
YLIM <- c(35,70)
XLIM <- c(-20,45)
# plotting
for(country in c("Germany", "Ireland", "Spain", "Greece", "Denmark", "Yugoslavia") )
{
polygon_names <- map("world", ylim=YLIM, xlim=XLIM)$names
index <- grep(country, polygon_names)
colvec <- rep("white", length(polygon_names))
colvec[index] <- "red"
png(paste0(country,".png"))
map("world", ylim=YLIM, xlim=XLIM, col=colvec, fill=TRUE)
dev.off()
}
# resetting options
par("mar"=old)
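One caveat with `grep(country, polygon_names)`: it matches substrings, so a pattern like "Ireland" would also hit a polygon named "UK:Northern Ireland". A possible refinement is sketched below. The `polygon_names` vector is hypothetical, written in the "Country:Part" style that `map()$names` appears to use, and `highlight_colors()` is a helper invented for this illustration.

```r
# Hypothetical polygon names in the style returned by map()$names.
polygon_names <- c("Ireland", "UK:Northern Ireland", "UK:Great Britain",
                   "Denmark", "Denmark:Bornholm", "Germany")

# Colour vector with one country highlighted.
# The pattern matches "Country" itself and sub-polygons named "Country:...".
# Country names containing regex metacharacters would need escaping first.
highlight_colors <- function(polygon_names, country,
                             base = "white", highlight = "red") {
  pattern <- paste0("^", country, "(:|$)")
  ifelse(grepl(pattern, polygon_names), highlight, base)
}

# Substring matching versus anchored matching for the same country.
loose  <- grep("Ireland", polygon_names, value = TRUE)
strict <- polygon_names[highlight_colors(polygon_names, "Ireland") == "red"]
```

The vector returned by `highlight_colors()` could be passed as `col` in place of `colvec` in the loop above.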
Inside your loop, try
plot(eucountries[i], add=TRUE, col="#769EB2", namesonly=TRUE)
in place of your current plot call. Note that eucountries is now subset with [i].
I have been searching for this simple thing for hours now, to no avail. I have a data frame in which one of the columns is the variable "country". I want the following two things:
Plot the most frequent countries, most frequent on top (EDIT: full solution found, so the question now focuses on limiting the bar plot output based on frequency);
Only show the top x most frequent countries, moving the rest into an 'Other' category.
I tried to ggplot the output of table() or summary(), but that does not work. Is this even possible within ggplot, or should I use barchart? (I managed to do this with barchart, just using summary(df$something) and adding max = x.) I also want to stack the output (different questions about country).
Most frequent countries on top:
ggplot(aDDs, aes(x = factor(answer,
                            # sort() has no 'increasing' argument; ascending order is decreasing=FALSE
                            levels = names(sort(table(answer), decreasing = FALSE))),
                 fill = question)) +
  geom_bar() + coord_flip()
Suggestions are very very welcome.
====== EDIT3:
I continued working on the code based on the suggestion by @CMichael, but then ran into another, quite strange, thing. Because this 'ifelse' problem concerns a slightly different question than my original one, I have posted a separate question for it. Please check it here: R: ifelse function returns vector position instead of value (string)
====== EDIT:
The aDDs example is reproduced below - aDDs dataset can be downloaded here:
temp <- structure(list(student = c(2270285L, 2321254L, 75338L, 2071594L,1682771L, 1770356L, 2155693L, 3154864L, 3136979L, 2082311L),answer = structure(c(181L, 87L, 183L, 89L, 115L, 183L, 172L,180L, 175L, 125L), .Label = c("Congo", "Guinea-Bissau", "Solomon Islands","Central African Rep", "Comoros", "Equatorial Guinea", "Liechtenstein","Nauru", "Brunei", "Djibouti", "Kiribati", "Papua New Guinea","Samoa", "South Sudan", "Tajikistan", "Tonga", "Bhutan","Gabon", "Laos", "Lesotho", "Maldives", "Micronesia", "St Kitts and Nevis","Mozambique", "Niger", "Andorra", "Cape Verde", "Mauritania","Antigua and Deps", "Chad", "Guinea", "Malta", "Burundi","Eritrea", "Iceland", "Kyrgyzstan", "Turkmenistan", "Azerbaijan","Dominica", "Belize", "Malawi", "Mali", "Moldova", "Benin","Cuba", "Gambia", "Luxembourg", "St Lucia", "Angola", "Cambodia","Georgia", "Madagascar", "Oman", "Kosovo", "Kuwait", "Namibia","Bahrain", "Congo - Democratic Rep", "Montenegro", "Senegal","Sierra Leone", "Togo", "Botswana", "Fiji", "Libya", "Uzbekistan","Guyana", "Mongolia", "Somalia", "Zambia", "Estonia", "Ivory Coast","Myanmar", "Grenada", "Qatar", "Saint Vincent and the Grenadines","Tanzania", "Armenia", "Bahamas", "Belarus", "Burkina", "Liberia","Afghanistan", "Latvia", "Yemen", "Mauritius", "Albania","Barbados", "Iraq", "Macedonia", "Nicaragua", "Panama", "Slovenia","Lebanon", "Slovakia", "Kazakhstan", "Paraguay", "Korea South","Suriname", "Czech Republic", "Rwanda", "Haiti", "Lithuania","Israel", "Zimbabwe", "Cyprus", "Honduras", "Uruguay", "Syria","Finland", "Tunisia", "Taiwan", "Uganda", "Denmark", "Austria","Sri Lanka", "Vietnam", "Bosnia Herzegovina", "Thailand","Norway", "Trinidad and Tobago", "Switzerland", "Nepal","Sudan", "Jamaica", "Japan", "United Arab Emirates", "Bolivia","New Zealand", "Ethiopia", "Jordan", "Cameroon", "Croatia","Sweden", "Kenya", "Singapore", "Guatemala", "Ireland Republic","Saudi Arabia", "Bulgaria", "Malaysia", "Belgium", "Dominican Republic","Algeria", "El Salvador", 
"Bangladesh", "Serbia", "Ghana","Costa Rica", "Indonesia", "Hungary", "Venezuela", "Ecuador","Ukraine", "Romania", "Turkey", "China", "Morocco", "Russian Federation","Peru", "South Africa", "Argentina", "Portugal", "Iran","Poland", "Italy", "Chile", "France", "Germany", "Australia","Philippines", "Egypt", "Greece", "Nigeria", "Canada", "Pakistan","United Kingdom", "Mexico", "Colombia", "Brazil", "Netherlands","Spain", "India", "United States"), class = "factor"), question = c("C1-pres","C1-pres", "C1-pres", "C1-pres", "C1-pres", "C1-pres", "C1-pres","B1-pres", "B1-pres", "B1-pres")), .Names = c("student","answer", "question"), row.names = c("156", "203", "280", "347","412", "478", "534", "1649651", "1649691", "1649763"), class = "data.frame")
For the filtering question you should introduce a new column:
data$filteredCountry = ifelse(data$value > threshold, data$country, "other")
Now you can use filteredCountry as your x in the aesthetics.
The data ordering question pops up every now and then (e.g., "ggplot2: sorting a plot"). You need to order your country factor levels by the underlying values. Your reorder command seems to sort by country name again; I would expect something like reorder(country, frequency), but sample data would help.
UPDATE:
With the data now provided, it becomes obvious that you need to create a summary dataset:
data <- read.table("aDDs.csv",sep=",",header=T)
require(plyr)
summary <- ddply(data,.(answer),summarise,freq=length(answer))
This yields the data frame summary with one entry for each country (181 in total). Now you can do the filtering and the reordering:
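threshold = quantile(summary$freq, 0.9)
# as.character() keeps the country names; passing the factor itself would yield integer codes
summary$filteredCountry = ifelse(summary$freq > threshold, as.character(summary$answer), "other")
summary$filteredCountry = reorder(summary$filteredCountry, -summary$freq)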
Now you can plot:
require(ggplot2)
p=ggplot(data=summary,aes(x=filteredCountry,y=freq))
p = p+geom_bar(aes(fill=filteredCountry),stat="identity")
p
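A note on using ifelse() with factors: ifelse() drops the factor attributes of its 'yes' value, so a factor column comes back as integer level codes unless it is converted with as.character() first. The sketch below shows the pitfall and a way to lump everything outside the top countries into "Other" with base R. The `answer` vector is made up, and keeping the top two is an arbitrary choice for illustration.

```r
# Toy data (hypothetical): a factor of country answers.
answer <- factor(c("Spain", "India", "Spain", "Chile", "India", "Spain", "Peru"))

# Frequency table, most frequent first; keep the two most frequent countries.
freq <- sort(table(answer), decreasing = TRUE)
top  <- names(freq)[1:2]

# Pitfall: with a factor as the 'yes' value, ifelse() returns level codes.
bad <- ifelse(answer %in% top, answer, "Other")

# Fix: convert to character before calling ifelse().
good <- ifelse(answer %in% top, as.character(answer), "Other")

# Order the levels by frequency so a bar plot shows the bars sorted.
good <- factor(good, levels = names(sort(table(good), decreasing = TRUE)))
```

The resulting `good` factor could then be used as the x aesthetic in ggplot in place of the raw country column.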
Thanks to suggestions from @CMichael and answers to another, related post here on SO, I managed to create a stacked and ordered bar plot using ggplot:
Create a list with the most frequent country names:
temp <- row.names(as.data.frame(summary(aDDs$answer, max=12))) # create a df or something else with the summary output.
aDDs$answer <- as.character(aDDs$answer) # IMPORTANT! Here was the problem: turn into character values
Create a new column that keeps only the top results:
aDDs$top <- ifelse(
aDDs$answer %in% temp, ## condition: match aDDs$answer with row.names in summary df
aDDs$answer, ## then it should be named as aDDs$answer
"Other" ## else it should be named "Other"
)
aDDs$top <- as.factor(aDDs$top) # factorize the output again
Plot:
ggplot(aDDs, aes(x = factor(top,
                            # sort() has no 'increasing' argument; ascending order is decreasing=FALSE
                            levels = names(sort(table(top), decreasing = FALSE))),
                 fill = question)) +
  geom_bar() + coord_flip()
And here the output (still needs some tweaking, but it is what I wanted):