How to convert ISO 4217 code to currency name in R

I have a large dataset with various currencies. For visualisation purposes I would like to display the full names of the currencies rather than the ISO 4217 codes (currency codes).
Here is an excerpt of the data:
# dataframe
id = rep(1:8)
country = c("USA", "Canada", "UK", "Switzerland", "USA", "Sweden", "Switzerland", "Canada")
currency_code = c("USD", "CAD", "GBP", "CHF", "USD", "SEK", "CHF", "CAD")
df1 = data.frame(id, country, currency_code)
I could assign the currency names to the ISO 4217 codes manually with a left join, which works. But I am looking for a more elegant way to do it than the one shown below:
country = c("USA", "Canada", "UK", "Switzerland", "Sweden")
currency_name = c("US Dollar", "Canadian Dollar", "Great Britain Pound", "Swiss franc", "Swedish krona")
df2 = data.frame(country, currency_name)
#left join
merge(df1, df2, by = "country", all.x = TRUE)
So my desired data would look like this:
country id currency_code currency_name
Canada 2 CAD Canadian Dollar
Canada 8 CAD Canadian Dollar
Sweden 6 SEK Swedish krona
Switzerland 4 CHF Swiss franc
Switzerland 7 CHF Swiss franc
UK 3 GBP Great Britain Pound
USA 1 USD US Dollar
USA 5 USD US Dollar
With the countrycode package it is possible to convert country codes to country names (and vice versa). It is also possible to map a country to its currency code, but sadly that is not my goal.
I am aware that a left join would also give me a solution, but I would appreciate it if someone knows a better approach.

We can use the currency_list dataset from the currencycode package and join it with the input dataset 'df1' on the 'currency_code' column:
library(dplyr)
# remotes::install_github("KKulma/currencycode")
library(currencycode)
data(currency_list)
currency_list %>%
select(currency_code, currency_name) %>%
filter(complete.cases(currency_code)) %>%
right_join(df1, by = "currency_code")
Output:
# currency_code currency_name id country
#1 CAD Canadian Dollar 2 Canada
#2 CAD Canadian Dollar 8 Canada
#3 CHF Swiss Franc 4 Switzerland
#4 CHF Swiss Franc 7 Switzerland
#5 GBP Pound Sterling 3 UK
#6 SEK Swedish Krona 6 Sweden
#7 USD US Dollar 1 USA
#8 USD US Dollar 5 USA

The countrycode package also includes currency names, so you can convert the country names directly:
library(dplyr)
library(countrycode)
df1 %>%
mutate(currency_name = countrycode(country, "country.name", "currency"))
#> id country currency_code currency_name
#> 1 1 USA USD US Dollar
#> 2 2 Canada CAD Canadian Dollar
#> 3 3 UK GBP Pound Sterling
#> 4 4 Switzerland CHF Swiss Franc
#> 5 5 USA USD US Dollar
#> 6 6 Sweden SEK Swedish Krona
#> 7 7 Switzerland CHF Swiss Franc
#> 8 8 Canada CAD Canadian Dollar
All of countrycode's currency options:
library(countrycode)
country = c("USA", "Canada", "UK", "Switzerland", "USA", "Sweden", "Switzerland", "Canada")
countrycode(country, "country.name", "iso4217c")
# [1] "USD" "CAD" "GBP" "CHF" "USD" "SEK" "CHF" "CAD"
countrycode(country, "country.name", "iso4217n")
# [1] 840 124 826 756 840 752 756 124
countrycode(country, "country.name", "currency")
# [1] "US Dollar" "Canadian Dollar" "Pound Sterling" "Swiss Franc"
# [5] "US Dollar" "Swedish Krona" "Swiss Franc" "Canadian Dollar"
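If you only need a handful of currencies and would rather avoid both a join and an extra package, a plain named character vector can act as the lookup table. A minimal base-R sketch, assuming you write the code-to-name mapping yourself (the names below are hand-written for the codes in df1, not taken from any package):

```r
# Data from the question
id <- 1:8
country <- c("USA", "Canada", "UK", "Switzerland", "USA", "Sweden", "Switzerland", "Canada")
currency_code <- c("USD", "CAD", "GBP", "CHF", "USD", "SEK", "CHF", "CAD")
df1 <- data.frame(id, country, currency_code, stringsAsFactors = FALSE)

# Hand-written lookup table: names are ISO 4217 codes, values are display names
currency_names <- c(USD = "US Dollar", CAD = "Canadian Dollar",
                    GBP = "Pound Sterling", CHF = "Swiss Franc",
                    SEK = "Swedish Krona")

# Index the named vector by code; unname() drops the codes from the result
df1$currency_name <- unname(currency_names[df1$currency_code])
df1
```

Index with a character vector: a factor would be used through its integer codes, which is why stringsAsFactors = FALSE is set (otherwise wrap the column in as.character()). Codes missing from the lookup come back as NA.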

Related

How do I rename values in my column that I have misspelt and can't rename in R or Colab?

I have a data frame that was given to me. In the column titled State, the same value appears with different capitalisation, i.e. one is "London" and the other is "LONDON". How can I rename "LONDON" to "London" so that they are totalled together rather than separately? Note that I am trying to change the values in the column, not the name of the column.
You can use the following code, where df is your current data frame in which you want to replace "LONDON" with "London":
df <- data.frame(Country = c("US", "UK", "Germany", "Brazil","US", "Brazil", "UK", "Germany"),
State = c("NY", "London", "Bavaria", "SP", "CA", "RJ", "LONDON", "Berlin"),
Candidate = c(1:8))
print(df)
Output:
Country State Candidate
1 US NY 1
2 UK London 2
3 Germany Bavaria 3
4 Brazil SP 4
5 US CA 5
6 Brazil RJ 6
7 UK LONDON 7
8 Germany Berlin 8
Then run the following code to replace "LONDON" with "London" in every row where State equals "LONDON":
df[df$State == "LONDON", "State"] <- "London"
Now the output is:
Country State Candidate
1 US NY 1
2 UK London 2
3 Germany Bavaria 3
4 Brazil SP 4
5 US CA 5
6 Brazil RJ 6
7 UK London 7
8 Germany Berlin 8
Maybe you could try using the case_when function. I would do something like this:
library(dplyr)
data <- mutate(data, State_def = case_when(State == "LONDON" ~ "London",
                                           TRUE ~ as.character(State)))  # keep all other values unchanged
I might misunderstand, but I think it should be as simple as this:
x$state <- sub( "LONDON", "London", x$state, fixed=TRUE )
This changes "LONDON" to "London". Note that sub() replaces only the first match within each string; gsub() replaces all of them.
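If other values in the column may also differ only in capitalisation, a more general option is to map every spelling to the first spelling seen for the same lower-cased value and then count. A base-R sketch on a small made-up data frame (not the asker's data):

```r
df <- data.frame(State = c("NY", "London", "LONDON", "london", "ny", "Bavaria"),
                 Candidate = 1:6, stringsAsFactors = FALSE)

# Case-insensitive key for each row
key <- tolower(df$State)

# match(key, key) gives, for each row, the position of the first row with the same key,
# so every spelling is replaced by the first spelling that appears in the data
df$State <- df$State[match(key, key)]

# Totals now combine all capitalisations
table(df$State)
```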

Finding values in an Excel sheet based on given pairs in an R data frame

I am using this Excel sheet, which I have read into R: https://www.knomad.org/sites/default/files/2018-04/bilateralmigrationmatrix20170_Apr2018.xlsx
dput(head(remittance, 5))
The output is:
structure(list(`Remittance-receiving country (across) - Remittance-sending country (down)` = c("Australia",
"Brazil", "Canada"), Brazil = c("27.868809286999106", "0", "31.284184411144214"
), Canada = c("46.827693406219382", "1.5806325278762619", "0"
), `Czech Republic` = c("104.79905129342241", "3.0488843262423089",
"176.79676736179096"), Finland = c("26.823089572300752", "1.3451674211686246",
"37.781150857376964"), France = c("424.37048861305249", "123.9763417712491",
"1296.7352242506483"), Germany = c("556.4140279523856", "66.518143815367239",
"809.9621650533453"), Hungary = c("200.08597014449356", "11.953328254521287",
"436.0811601171776"), Indonesia = c("172.0021287331823", "1.3701340430259537",
"33.545925908780198"), Italy = c("733.51652291459231", "116.74264895322995",
"1072.1119887588022"), `Korea, Rep.` = c("259.97044386689589",
"20.467939414361016", "326.94157937864327"), Netherlands = c("133.48932759488602",
"4.7378343766684532", "181.28828076733771"), Philippines = c("1002.3593555086774",
"1.5863355979877207", "2369.5223195675494"), Poland = c("109.73486651698796",
"5.8313637459523129", "341.10408952685464"), `Russian Federation` = c("19.082541158574934",
"1.0136604494838692", "58.760989426089431"), `Saudi Arabia` = c("13.578431465294949",
"0.32506772760873404", "15.511213677040857"), Sweden = c("91.887827513176489",
"5.1132733094740352", "65.860232580192786"), Thailand = c("383.08245004577498",
"2.7410805494977684", "79.370683058792849"), `United Kingdom` = c("1084.0742194994727",
"4.2050614573174592", "568.62605950140266"), `United States` = c("188.06242727403128",
"49.814372612310521", "661.98049661387927"), WORLD = c("5578.0296723604206",
"422.37127035334271", "8563.264510816849")), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
I currently have a two-column object with "Source" and "Destination", where each row is a pair of countries, which I created with:
countries = c("Australia","Brazil", "Canada", "Czech Republic", "Germany", "Finland", "United Kingdom", "Italy", "Poland", "Russian Federation", "Sweden", "United States", "Philippines", "France", "Netherlands", "Hungary", "Saudi Arabia", "Thailand", "Korea, Rep.", "Indonesia")
pairs = t(combn(countries, 2))
I would like to use each pair to extract its corresponding value from the Excel sheet above. (In the Excel sheet, "Source" is the first column of countries (down) and "Destination" is the first row of countries (across).)
For example, a sample of my pairs looks as follows (the full set contains 190 pairs):
pairs = data.frame(Source = c("Australia", "Australia", "Australia"), Destination = c("Brazil", "Canada", "Czech Republic"))
The first pair, (Australia, Brazil), corresponds to the value 27.868809286999106 in the Excel sheet reproduced above. Is there a built-in R function that matches the pairs in my data frame and extracts the corresponding values? Thanks
Perhaps what you need is tidyr::pivot_longer?
library(tidyr)
colnames(remittance)[1] <- 'source'
remittance %>% pivot_longer(-source, names_to = 'destination')
#----
# A tibble: 60 x 3
source destination value
<chr> <chr> <chr>
1 Australia Brazil 27.868809286999106
2 Australia Canada 46.827693406219382
3 Australia Czech Republic 104.79905129342241
4 Australia Finland 26.823089572300752
Note that remittance is the data frame from the dput in the question.
Probably you are interested in keeping the flexibility of your nice combn approach.
To loop over your pairs data frame (it is actually a matrix), you can use apply() with MARGIN=1 to work row-wise. In the FUN= argument we create a one-row data frame per pair, with source taken from column 1 of pairs and destination from column 2. The value (a distance, or whatever it measures) is obtained by subsetting remittance (shortened to rem for brevity) at the corresponding row and column.
Since apply() returns a list of one-row data frames, we combine them with rbind(), and because there are many objects to pass, we need do.call().
res <- do.call(rbind,
apply(pairs, MARGIN=1, FUN=function(x)
data.frame(source=x[1], destination=x[2],
dist=as.integer(rem[rem[, 1] == x[1], rem[1, ] == x[2]])))
)
Since the .xlsx has zeros where there should actually be NAs, we recode them as NA in the result.
res[res == 0] <- NA
Result
head(res, 25)
# source destination dist
# 1 Australia Brazil 721
# 2 Australia Canada 24721
# 3 Australia Czech Republic 1074
# 4 Australia Germany 13938
# 5 Australia Finland 1121
# 6 Australia United Kingdom 135000
# 7 Australia Italy 19350
# 8 Australia Poland 974
# 9 Australia Russian Federation 543
# 10 Australia Sweden 3988
# 11 Australia United States 93179
# 12 Australia Philippines 4118
# 13 Australia France 8475
# 14 Australia Netherlands 10697
# 15 Australia Hungary 997
# 16 Australia Saudi Arabia NA
# 17 Australia Thailand 11298
# 18 Australia Korea, Rep. 5381
# 19 Australia Indonesia 11094
# 20 Brazil Canada 26647
# 21 Brazil Czech Republic 742
# 22 Brazil Germany 44000
# 23 Brazil Finland 1378
# 24 Brazil United Kingdom 55772
# 25 Brazil Italy 104779
Data:
u <- "https://www.knomad.org/sites/default/files/2018-04/bilateralmigrationmatrix20170_Apr2018.xlsx"
rem <- openxlsx::read.xlsx(u)
countries <- c("Australia", "Brazil", "Canada", "Czech Republic", "Germany",
"Finland", "United Kingdom", "Italy", "Poland", "Russian Federation",
"Sweden", "United States", "Philippines", "France", "Netherlands",
"Hungary", "Saudi Arabia", "Thailand", "Korea, Rep.", "Indonesia")
pairs <- t(combn(countries, 2))
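Another base-R option is to turn the sheet into a numeric matrix with country names as row and column names, then index it with a two-column character matrix; each row of the index selects one (row name, column name) cell. A sketch using only the three rows and first three columns of the dput from the question:

```r
# A small excerpt of the remittance sheet (values copied from the dput in the question)
remittance <- data.frame(
  source = c("Australia", "Brazil", "Canada"),
  Brazil = c("27.868809286999106", "0", "31.284184411144214"),
  Canada = c("46.827693406219382", "1.5806325278762619", "0"),
  `Czech Republic` = c("104.79905129342241", "3.0488843262423089", "176.79676736179096"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Numeric matrix: rows are sending countries, columns are receiving countries
vals <- as.matrix(remittance[, -1])
m <- matrix(as.numeric(vals), nrow = nrow(vals),
            dimnames = list(remittance$source, colnames(vals)))

pairs <- data.frame(Source = c("Australia", "Australia", "Australia"),
                    Destination = c("Brazil", "Canada", "Czech Republic"),
                    stringsAsFactors = FALSE)

# A two-column character matrix indexes m by (row name, column name)
pairs$value <- m[cbind(pairs$Source, pairs$Destination)]
pairs
```

Since t(combn(countries, 2)) is already a two-column character matrix, m[pairs] should work on it directly, provided every country appears as both a row name and a column name of m; a name that is missing makes the lookup fail with an error.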

Changing spelling for multiple words at a time in R/replacing many words at once

I have a dataset (survey) with a column birth_country, in which people have written their country of birth. An example:
1 america
2 usa
3 american
4 us of a
5 united states
6 england
7 english
8 great britain
9 uk
10 united kingdom
how I would like it to look:
1 america
2 america
3 america
4 america
5 america
6 uk
7 uk
8 uk
9 uk
10 uk
I have tried using str_replace, listing the different spellings manually, to replace them with 'america', but when I look at my dataset, nothing has changed.
e.g.
survey <- structure(list(birth_country = c("america", "usa", "american", "us of a", "united states", "england", "english", "great britain", "uk", "united kingdom")), row.names = c(NA, -10L), class = "data.frame")
survey$birth_country <- str_replace(survey$birth_country, ' "united state"|"united statea"|"united states of america"', "america")
thank you in advance
Come up with patterns that each match only one country, then loop over them, doing what you are already doing (you can swap the replacement below for your favourite function):
survey <- structure(list(birth_country = c("america", "usa", "american", "us of a", "united states", "england", "english", "great britain", "uk", "united kingdom")), row.names = c(NA, -10L), class = "data.frame")
## use a _named_ list of regular expressions
## the name will be the replacement string
l <- list(
america = 'amer|us|states',
uk = 'eng|brit|king|uk',
'another country' = 'ano|an co',
chaz = 'chaz|chop'
)
f <- function(x, list) {
for (ii in seq_along(list)) {
x[grepl(list[[ii]], x, ignore.case = TRUE)] <- names(list)[ii]
}
x
}
## test it
f(survey$birth_country, l)
# [1] "america" "america" "america" "america" "america" "uk" "uk" "uk" "uk" "uk"
within(survey, {
clean <- f(birth_country, l)
})
# birth_country clean
# 1 america america
# 2 usa america
# 3 american america
# 4 us of a america
# 5 united states america
# 6 england uk
# 7 english uk
# 8 great britain uk
# 9 uk uk
# 10 united kingdom uk
Note that 1) if no pattern matches a value, it is left unchanged, and 2) if a pattern matches both countries (e.g., "united"), the first one in the list wins (unless the replacement string itself is matched by a later pattern).
It looks like the problem is in how you specified your regular expression: the double quotes and the leading space inside the pattern string are matched literally, so nothing in birth_country matches. Try this (updated based on @Gabriella's comment; another tidyverse approach, similar to @MarBIo's):
library(tidyverse)
survey <- survey %>%
mutate(birth_country = if_else(
str_detect(birth_country,
"(united state)|(united statea)|(united states of america)"), #If your regular expression matches any in birth_country
"america", #Change it to "america"
birth_country #Otherwise, keep as is.
) #end of if_else
) #end of mutate
Other people are suggesting a more complex regular expression, which you can certainly use as well. Consecutive "or" (i.e. "|") alternatives in your regular expression work too.
If you can use tidyverse's mutate, you can do:
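To see why the pattern in the question matched nothing, compare it with a corrected one using base grepl(). The corrected pattern here is just one possible way to write it:

```r
x <- c("united states", "united state", "united states of america", "usa")

# Pattern from the question: the double quotes and the leading space are literal characters
bad <- ' "united state"|"united statea"|"united states of america"'

# Same spellings without quotes, anchored so the whole string must match
good <- "^united state[sa]?$|^united states of america$"

grepl(bad, x)   # nothing matches
grepl(good, x)  # the first three match, "usa" does not
```

Anchoring with ^ and $ matters when you replace rather than detect: with an unanchored "united state", str_replace() or sub() would rewrite only part of "united states" and leave "americas".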
library(tidyverse)
survey <- structure(list(birth_country = c("america", "usa", "american", "us of a", "united states", "england", "english", "great britain", "uk", "united kingdom")), row.names = c(NA, -10L), class = "data.frame")
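americas <- c("america", "usa", "american", "us of a", "united states")
englands <- c("england", "english", "great britain", "uk", "united kingdom")
survey %>%
mutate(birth_country = case_when(birth_country %in% americas ~ 'america',
                                 birth_country %in% englands ~ 'UK',
                                 TRUE ~ birth_country))  # leave other countries unchanged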
#> birth_country
#> 1 america
#> 2 america
#> 3 america
#> 4 america
#> 5 america
#> 6 UK
#> 7 UK
#> 8 UK
#> 9 UK
#> 10 UK
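Without regular expressions or extra packages, a named character vector can also serve as an explicit spelling-to-country lookup. A base-R sketch; the lookup entries are written by hand for the spellings in the example, so any new spelling has to be added to it:

```r
survey <- data.frame(birth_country = c("america", "usa", "american", "us of a",
                                       "united states", "england", "english",
                                       "great britain", "uk", "united kingdom"),
                     stringsAsFactors = FALSE)

# Hand-written lookup: names are observed spellings, values are the cleaned labels
lookup <- c(america = "america", usa = "america", american = "america",
            "us of a" = "america", "united states" = "america",
            england = "uk", english = "uk", "great britain" = "uk",
            uk = "uk", "united kingdom" = "uk")

clean <- unname(lookup[survey$birth_country])

# Keep any spelling not in the lookup unchanged rather than turning it into NA
survey$birth_country <- ifelse(is.na(clean), survey$birth_country, clean)
survey
```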

Extracting country name from city name in R

This question may look like a duplicate, but I am facing an issue while extracting country names from strings. I have gone through Extracting Country Name from Author Affiliations, but I was not able to solve my problem. I have tried grepl and a for loop for text matching and replacement, but my data column has more than 300k rows, so pattern matching with grepl in a for loop is very slow.
I have a column like this.
org_loc
Zug
Zug Canton of Zug
Zimbabwe
Zigong
Zhuhai
Zaragoza
York United Kingdom
Delhi
Yalleroi Queensland
Waterloo Ontario
Waterloo ON
Washington D.C.
Washington D.C. Metro
New York
df <- data.frame(org_loc = c("zug", "zug canton of zug", "zimbabwe",
"zigong", "zhuhai", "zaragoza","York United Kingdom", "Delhi","Yalleroi Queensland","Waterloo Ontario","Waterloo ON","Washington D.C.","Washington D.C. Metro","New York"))
The string may contain the name of a state, city or country. I just want the country as output, like this:
org_loc
Switzerland
Switzerland
Zimbabwe
China
China
Spain
United Kingdom
India
Australia
Canada
Canada
United States
United States
United States
I am trying to convert the state (if a match is found) to its country using the countrycode package, but I am not able to do so. Any help would be appreciated.
You can use your City_and_province_list.csv as a custom dictionary for countrycode. The custom dictionary cannot have duplicates in the origin vector (the City column in your City_and_province_list.csv), so you'll have to remove them or deal with them somehow first (as in my example below). Your lookup CSV does not currently contain every string in your example, so not all of them are converted; if you add all of the possible strings to the CSV, they will all be converted.
library(countrycode)
org_loc <- c("Zug", "Zug Canton of Zug", "Zimbabwe", "Zigong", "Zhuhai",
"Zaragoza", "York United Kingdom", "Delhi",
"Yalleroi Queensland", "Waterloo Ontario", "Waterloo ON",
"Washington D.C.", "Washington D.C. Metro", "New York")
df <- data.frame(org_loc)
city_country <- read.csv("https://raw.githubusercontent.com/girijesh18/dataset/master/City_and_province_list.csv")
# custom_dict for countrycode cannot have duplicate origin codes
city_country <- city_country[!duplicated(city_country$City), ]
df$country <- countrycode(df$org_loc, "City", "Country",
custom_dict = city_country)
df
# org_loc country
# 1 Zug Switzerland
# 2 Zug Canton of Zug <NA>
# 3 Zimbabwe <NA>
# 4 Zigong China
# 5 Zhuhai China
# 6 Zaragoza Spain
# 7 York United Kingdom <NA>
# 8 Delhi India
# 9 Yalleroi Queensland <NA>
# 10 Waterloo Ontario <NA>
# 11 Waterloo ON <NA>
# 12 Washington D.C. <NA>
# 13 Washington D.C. Metro <NA>
# 14 New York United States of America
library(countrycode)
df <- c("zug switzerland", "zug canton of zug switzerland", "zimbabwe",
"zigong chengdu pr china", "zhuhai guangdong china", "zaragoza","York United Kingdom", "Yamunanagar","Yalleroi Queensland Australia","Waterloo Ontario","Waterloo ON","Washington D.C.","Washington D.C. Metro","USA")
df1 <- countrycode(df, 'country.name', 'country.name')
It doesn't match many of them, but it should do what you're looking for, based on the countrycode reference manual.
With the geocode function from the ggmap package you can accomplish your task with good, but not perfect, accuracy. You must also apply your own judgement, e.g. to decide that "Zaragoza" is the city in Spain (which is what geocode returns) and not somewhere in Argentina; geocode tends to return the biggest city when there are several places with the same name.
(remove the $country to see all of the output)
library(ggmap)
org_loc <- c("zug", "zug canton of zug", "zimbabwe",
"zigong", "zhuhai", "zaragoza","York United Kingdom",
"Delhi","Yalleroi Queensland","Waterloo Ontario","Waterloo ON","Washington D.C.","Washington D.C. Metro","New York")
geocode(org_loc, output = "more")$country
As geocode is provided by Google, it has a query limit of 2,500 per day per IP address; if it returns NAs, that may be due to an inconsistent limit check, so just try again.
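On the speed concern in the question: if some slow per-value step is unavoidable (a loop of grepl() calls, or a geocoding request), it only needs to run once per distinct string, with the results expanded back to all rows via match(). A base-R sketch; the country patterns are hypothetical and far from complete:

```r
org_loc <- c("Zug", "Zug Canton of Zug", "Zigong", "Delhi", "Zug", "Delhi", "Waterloo ON")

# Hypothetical, hand-written patterns; a real list would need far more entries
patterns <- c(Switzerland = "\\bZug\\b",
              China = "\\bZigong\\b|\\bZhuhai\\b",
              India = "\\bDelhi\\b",
              Canada = "\\bOntario\\b|\\bON$")

# Slow step: test every pattern, but only against the distinct strings
u <- unique(org_loc)
u_country <- rep(NA_character_, length(u))
for (country in names(patterns)) {
  # only fill strings that no earlier pattern has matched
  hit <- is.na(u_country) & grepl(patterns[[country]], u, perl = TRUE)
  u_country[hit] <- country
}

# Fast step: expand the per-unique results back to every row
result <- u_country[match(org_loc, u)]
result
```

How much this helps depends on how many duplicates the 300k rows contain; with mostly repeated locations the loop runs over a much shorter vector.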

Aggregate factor levels in a variable in R

I have a data.frame with a variable V21 in which many countries are recorded. I want to reduce it by specifying only the continent rather than all those countries. For example, rather than 'Cuba', 'Peru' and 'Argentina' being separate levels of V21, I want them to become the level 'South America'. Here's the code I tried:
recode(WaveOne.test$V21, "levels("Cuba","Colombia","Costa Rica","Argentina","Chile","Ecuador","Peru","Venezuela")= 'South America'")
levels(V21)
Can you suggest what is wrong with my code or maybe a different method?
I am a complete newbie in R and its syntax.
Thank you!
========UPDATE=========
SA_countries <- c("Cuba", "Mexico", "Argentina","Jamaica", "Haiti","West Indies", "Chile", "Ecuador", "Venezuela", "Other South America", "El Salvador", "Guatemala", "Nicaragua", "Dominican Republic", "Panama", "Costa Rica", "Peru")
Asia_countries <- c("Philippines", "Vietnam", "Laos", "Cambodia", "Hmong", "Other Asia", "China", "Hong Kong", "Taiwan", "Japan", "Korea", "India", "Pakistan")
Europe_Canada <- c("Europe/Canada")
MiddleEast_Africa <- c("Middle East/Africa")
continents <- list(`South America`= SA_countries, `Asia` = Asia_countries, `Europe_Canada` = Europe_Canada, `Middle East & Africa` = MiddleEast_Africa)
levels(WaveOne.test$V21) <- c(levels(WaveOne.test$V21), names(continents))
for(i in seq_along(continents)) WaveOne.test$V21[WaveOne.test$V21 %in% continents[[i]]] <- names(continents)[i]
levels(WaveOne.test$V21)
My output however is:
levels(WaveOne.test$V21)
[1] "Cuba" "Mexico" "Nicaragua" "Colombia" "Dominican Republic" "El Salvador" "Guatemala"
[8] "Honduras" "Costa Rica" "Panama" "Argentina" "Chile" "Ecuador" "Peru"
[15] "Venezuela" "Other South America" "Haiti" "Jamaica" "West Indies" "Philippines" "Vietnam"
[22] "Laos" "Cambodia" "Hmong" "Other Asia" "China" "Hong Kong" "Taiwan"
[29] "Japan" "Korea" "India" "Pakistan" "Middle East/Africa" "Europe/Canada" "South America"
[36] "Asia" "Europe_Canada" "Middle East & Africa"
You can create a list with all of your countries and continents then reassign the values accordingly:
continents <- list(`South America`=SA_countries,
`North America` = NA_countries,
Europe=Euro_countries)
levels(df$V21) <- c(levels(df$V21), names(continents)) #necessary to add new levels
for(i in seq_along(continents)) {
df$V21[df$V21 %in% continents[[i]]] <- names(continents)[i]}
Reproducible Example
set.seed(123)
SA_countries <- c("Cuba","Colombia","Costa Rica","Argentina","Chile","Ecuador","Peru","Venezuela")
NA_countries <- c("Mexico", "USA", "Canada")
Euro_countries <- c("Germany", "France")
df <- data.frame(V21=sample(c(NA_countries,SA_countries, Euro_countries),20,T))
df
# V21
# 1 Cuba
# 2 Venezuela
# 3 Costa Rica
# 4 Germany
# 5 France
# 6 Mexico
# 7 Argentina
# 8 Germany
# 9 Chile
# 10 Costa Rica
# 11 France
# 12 Costa Rica
# 13 Ecuador
# 14 Chile
# 15 USA
# 16 Germany
# 17 Cuba
# 18 Mexico
# 19 Colombia
# 20 France
continents <- list(`South America`=SA_countries, `North America` = NA_countries, Europe=Euro_countries)
levels(df$V21) <- c(levels(df$V21), names(continents))
for(i in seq_along(continents)) df$V21[df$V21 %in% continents[[i]]] <- names(continents)[i]
df
# V21
# 1 South America
# 2 South America
# 3 South America
# 4 Europe
# 5 Europe
# 6 North America
# 7 South America
# 8 Europe
# 9 South America
# 10 South America
# 11 Europe
# 12 South America
# 13 South America
# 14 South America
# 15 North America
# 16 Europe
# 17 South America
# 18 North America
# 19 South America
# 20 Europe
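The leftover country names in the question's UPDATE output are unused levels: the loop relabels the values but never removes the old levels, so droplevels(WaveOne.test$V21) after the loop should clear them. Alternatively, base R's levels<- also accepts a named list that merges old levels into new ones in one step. A sketch on a small made-up factor, reusing the country groups from the reproducible example:

```r
SA_countries <- c("Cuba", "Colombia", "Costa Rica", "Argentina", "Chile", "Ecuador", "Peru", "Venezuela")
NA_countries <- c("Mexico", "USA", "Canada")
Euro_countries <- c("Germany", "France")
continents <- list(`South America` = SA_countries,
                   `North America` = NA_countries,
                   Europe = Euro_countries)

V21 <- factor(c("Cuba", "Germany", "Mexico", "Peru", "France", "USA"))

# Each list name becomes a new level and the countries listed under it are merged into it.
# Countries listed but absent from the data are ignored; levels not listed in any
# group are not kept and their values become NA, so every country needs a group.
levels(V21) <- continents
V21
levels(V21)
```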
