R: Is it possible to create multiple tables based on unique values by looping?

Say we have a dataframe such as the one below:
region         country city
North America  USA     Washington
North America  USA     Boston
Western Europe UK      Sheffield
Western Europe Germany Düsseldorf
Eastern Europe Ukraine Kiev
North America  Canada  Vancouver
Western Europe France  Reims
Western Europe Belgium Antwerp
North America  USA     Chicago
Eastern Europe Belarus Minsk
Eastern Europe Russia  Omsk
Eastern Europe Russia  Moscow
Western Europe UK      Southampton
Western Europe Germany Hamburg
North America  Canada  Ottawa
I would like to know how to loop through this dataframe in order to check whether countries are assigned to the right region, and likewise for cities. Usually I do this with the table() function; however, that is very time-consuming, since it requires several ad-hoc statements such as table(df$country[df$region == 'North America']) and so on for all the regions and countries involved.
Thus, I'm eager to know how to write a loop that gets me this output while saving as much time and as many lines of code as possible.
Thanks in advance!

df %>% group_by(region) %>% group_split()
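If the goal is simply to check the assignments by eye, here is a minimal looping sketch in base R, assuming the data frame is called df as above; it builds one table per region (and per country) without any ad-hoc subsetting:

# One table of countries for every region, no manual filtering needed
region_tables <- lapply(split(df, df$region), function(x) table(x$country))
region_tables

# The same idea for cities within each country
city_tables <- lapply(split(df, df$country), function(x) table(x$city))

# Or a single two-way table, which makes misassigned countries easy to spot
table(df$region, df$country)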

Related

How to replace variables for certain observations in R?

So I have a list of countries in my survey with their specific spellings. I have another dataframe which has a score for each country, but the name of some of the countries are spelled differently than those in my survey (e.g. it is "Republic of Korea" in my survey and "South Korea" in that dataframe). When I merge these two dataframes, the score for those countries turn NA since they don't match.
I decided to build a dataframe with the two different spellings as the two variables:
   correct_names                         incorrect
1  BOSNIA AND HERZEGOVINA                BOSNIA HERZEGOVINA
2  CONGO- THE DEMOCRATIC REPUBLIC OF THE DEMOCRATIC REPUBLIC OF CONGO
3  IRAN- ISLAMIC REPUBLIC OF             IRAN
4  KOREA- REPUBLIC OF                    SOUTH KOREA
5  LIBYAN ARAB JAMAHIRIYA                LIBYA
6  MALDIVES                              NONE
7  MARTINIQUE                            NONE
8  MAYOTTE                               NONE
9  MOLDOVA- REPUBLIC OF                  MOLDOVA
10 RUSSIAN FEDERATION                    RUSSIA
11 SYRIAN ARAB REPUBLIC                  SYRIA
12 TAIWAN- PROVINCE OF CHINA             TAIWAN
13 TANZANIA- UNITED REPUBLIC OF          TANZANIA
14 UNITED ARAB EMIRATES                  NONE
15 VIET NAM                              VIETNAM
Now I want R to replace the "incorrect" names in the dataframe with my "correct" names. Does anyone have an idea how I should go about it?
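One hedged way to do the replacement in base R, assuming the survey data frame is called survey with a country column and the lookup table above is called lookup (both names are placeholders):

# Position of each survey country in the "incorrect" column, NA if not listed
idx <- match(survey$country, lookup$incorrect)

# Substitute the corresponding correct spelling where a match exists and
# leave every other country untouched (assumes character, not factor, columns)
survey$country <- ifelse(is.na(idx),
                         survey$country,
                         as.character(lookup$correct_names[idx]))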

R Find unique value and find record like %_%

I have a data table of 10,000 records with multiple columns. Below are the code and part of the data set:
library(stringr)
states <- str_trim(unlist(strsplit(as.vector(search_data_set$location_name), ";")))
Part of Dataset:
Maine Virginia;
Oklahoma;
Kansas Minnesota South Dakota;
Delaware;
West Virginia;
Utah South Carolina;
Utah South Dakota Utah;
Indiana; Michigan Alaska Washington;
Washington Connecticut Maine;
Maine Oregon South Carolina Oregon;
Alabama Alaska;
Iowa Alabama New Mexico;
Virgin Islands South Dakota;
Maine Louisiana; Colorado;
District of Columbia Virgin Islands;
Pennsylvania Alabama;
I need to fulfil the requirements below and need help here:
Each record should take a unique value of location (in "Utah South Dakota Utah;", Utah should be counted only once).
When the user searches the dataset, it should return the record if the location appears anywhere (like %Oregon%). The current code does not return the record "Maine Oregon South Carolina Oregon;" when the user searches for "Oregon".
Need help in achieving this. Thanks in advance!
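A rough sketch of both requirements, assuming the column is search_data_set$location_name and that records contain US state names (state.name is a built-in R constant); the extra entries and the search term below are assumptions taken from the sample shown above:

library(stringr)

# Known location names to look for; extend as needed for the real data
all_locations <- c(state.name, "Virgin Islands", "District of Columbia")

# Unique locations per record, so "Utah South Dakota Utah;" counts Utah once
states_per_record <- lapply(search_data_set$location_name, function(rec) {
  all_locations[str_detect(rec, fixed(all_locations))]
})

# "%Oregon%"-style search: keep every record that mentions the term anywhere
hits <- search_data_set[str_detect(search_data_set$location_name, fixed("Oregon")), ]

# Note: substring collisions (e.g. "Virginia" inside "West Virginia") would
# need word-boundary patterns or post-filtering on real data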

Find string, if does not exist, find another string

I have many files from OECD that have data available for different regional granularities. An example would be:
File A
REG_ID Region
AUS    Australia
AU1GS  Sydney
AU1    New South Wales
AU2    Victoria
AU2GM  Melbourne
File B
REG_ID Region
AUS    Australia
AU1GS  Sydney
AU2GM  Melbourne
File C
REG_ID Region
AUS    Australia
AU1    New South Wales
AU1GS  Sydney
AU2    Victoria
I want to extract the most granular region, in this case Sydney only, and not New South Wales. However, if Sydney is unavailable, I want to extract New South Wales.
How do I write code that is generalisable to all these files?
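A hedged sketch of one generalisable approach, assuming each file has been read into a data frame with REG_ID and Region columns and that you can state the order of preference from most to least granular (the file_a and file_c names below are placeholders):

# Return the row for the first preferred region that actually occurs in the file
pick_region <- function(df, preferences = c("Sydney", "New South Wales")) {
  found <- preferences[preferences %in% df$Region]
  if (length(found) == 0) return(df[0, ])   # none of the preferred regions present
  df[df$Region == found[1], ]
}

pick_region(file_a)   # returns the Sydney row
pick_region(file_c)   # Sydney is present here too; falls back to New South Wales only if it were missing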

Summarize data using doBy package at region level

I have a dataset Data as below:
Region  Country Market  Price
EUROPE  France  France  30.4502
EUROPE  Israel  Israel  5.14110965
EUROPE  France  France  8.99665
APAC    CHINA   CHINA   2.6877232
APAC    INDIA   INDIA   60.9004
AFME    SL      SL      54.1729685
LA      BRAZIL  BRAZIL  56.8606917
EUROPE  RUSSIA  RUSSIA  11.6843732
APAC    BURMA   BURMA   63.5881232
AFME    SA      SA      115.0733685
I would like to summarize the data at the Region level and get the SUM of Price for every Region.
I want the output to be like below.
Data Output
Region  Country  Price
EUROPE  France   30.4502
EUROPE  Israel   5.14110965
EUROPE  France   8.99665
EUROPE  RUSSIA   11.6843732
Europe           56.27233285
APAC    BURMA    63.5881232
APAC    CHINA    2.6877232
APAC    INDIA    60.9004
Apac             127.1762464
AFME    BAHARAIN 54.1729685
AFME    SA       115.0733685
AFME             169.246337
LA      BRAZIL   56.8606917
LA               56.8606917
I have used the summaryBy function of the doBy package; I have tried the code below.
library(doBy)
myfun1 <- function(x){ c(s = sum(x)) }
DB <- summaryBy(Price ~ Region + Country, data = Data, FUN = myfun1)
Any help in this regard is very much appreciated.
You can do this by using dplyr to generate a summary table:
library(dplyr)
totals <- data %>% group_by(Region) %>% summarise(Country = "", Price = sum(Price))
And then merging the summary with the rest of the data:
summary <- rbind(data[-3], totals)
Then you can sort by Region to put the summary with the region:
summary <- summary %>% arrange(Region)
Output:
   Region  Country  Price
1  AFME    SL       54.1730
2  AFME    SA       115.0734
3  AFME             169.2463
4  APAC    CHINA    2.6877
5  APAC    INDIA    60.9004
6  APAC    BURMA    63.5881
7  APAC             127.1762
8  EUROPE  France   30.4502
9  EUROPE  Israel   5.1411
10 EUROPE  France   8.9967
11 EUROPE  RUSSIA   11.6844
12 EUROPE           56.2723
13 LA      BRAZIL   56.8607
14 LA               56.8607
You have to split the data by the Region factor and sum Price within each level:
lapply(split(data, data$Region), function(x) sum(x$Price))
Or, if you need to present the result as you have shown:
totals <- lapply(split(data, data$Region), function(x) {
  rbind(x, data.frame(Region = unique(x$Region), Country = "", Market = "", Price = sum(x$Price)))
})
do.call(rbind, totals)
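Since the question asks about doBy specifically, here is a minimal sketch of the same region totals using summaryBy itself, assuming the data frame is called Data as in the question (summaryBy names the summed column Price.sum by default):

library(doBy)

# Region-level totals
region_totals <- summaryBy(Price ~ Region, data = Data, FUN = sum)
region_totals <- data.frame(Region  = region_totals$Region,
                            Country = "",
                            Price   = region_totals$Price.sum)

# Append the totals to the per-country rows and order by Region
out <- rbind(Data[, c("Region", "Country", "Price")], region_totals)
out[order(out$Region), ]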

How to merge two single column files together?

I have two files and they have the same number of lines.
File A:
USA
UK
MEXICO
CHINA
RUSSIA
File B:
Washington DC
London
MEXICO CITY
BEIJING
MOSCOW
How can I merge these two files together using unix commands to make a file like this:
Result File:
USA Washington DC
UK London
MEXICO MEXICO CITY
CHINA BEIJING
RUSSIA MOSCOW
The two columns could be separated by a tab, a comma, or anything else.
Thank you for any suggestions!
You can try paste:
$ paste file1 file2
USA Washington DC
UK London
MEXICO MEXICO CITY
CHINA BEIJING
RUSSIA MOSCOW
This is a job for paste, but this awk will do too:
awk 'FNR==NR{a[NR]=$0;next} {print a[FNR],$0}' fileA fileB
USA Washington DC
UK London
MEXICO MEXICO CITY
CHINA BEIJING
RUSSIA MOSCOW
