How to merge two single column files together? - unix

I have two files and they have the same number of lines.
File A:
USA
UK
MEXICO
CHINA
RUSSIA
File B:
Washington DC
London
MEXICO CITY
BEIJING
MOSCOW
How can I merge these two files together using unix commands to make a file like this:
Result File:
USA Washington DC
UK London
MEXICO MEXICO CITY
CHINA BEIJING
RUSSIA MOSCOW
These two columns could be separated by tab or comma or any other thing?
Thank you for any suggestions?

You can try paste
$ paste file1 file2
USA Washington DC
UK London
MEXICO MEXICO CITY
CHINA BEIJING
RUSSIA MOSCOW

This is a job for paste, but this awk will do to:
awk 'FNR==NR{a[NR]=$0;next} {print a[FNR],$0}' fileA fileB
USA Washington DC
UK London
MEXICO MEXICO CITY
CHINA BEIJING
RUSSIA MOSCOW

Related

Count the number of repetitive values separated by commas

I have the set that looks something like :
colA
Nepal , India , USA
USA
India
USA
Nepal , India
USA
USA, Nepal
Nepal
Japan
so I want the count as :
COlB
Count
Nepal
4
India
3
USA
5
Japan
4
Is there a way to do it, without going into the Tableau Prep and directly from Tableau Reader with the use of calculative fields or something similar within it.

R: Is it possible to create multiple tables based on unique values by looping?

Say if we have a dataframe such the one below:
region country city
North America USA Washington
North America USA Boston
Western Europe UK Sheffield
Western Europe Germany Düsseldorf
Eastern Europe Ukraine Kiev
North America Canada Vancouver
Western Europe France Reims
Western Europe Belgium Antwerp
North America USA Chicago
Eastern Europe Belarus Minsk
Eastern Europe Russia Omsk
Eastern Europe Russia Moscow
Western Europe UK Southampton
Western Europe Germany Hamburg
North America Canada Ottawa
I would like to know how to loop through this dataframe in order to check if countries are assigned to the right region, same with cities. Usually I do it helping myself with table() function: however this is very time-consuming as this requires several ad-hoc statements such table(df$country[df$region == 'North America') and so on with all the regions involved and countries as well.
Thus, I'm eager to know how to create a loop so I could be able to get this output economizing as much as possible time and lines of code.
Thanks in advance!
df%>%group_by(region)%>%group_split()

substract two strings in dplyr row wise for R dataframe

Have two columns and need a third substracting the two using dplyr.
Very simple example for the sake of clarity. Split/separate approach not valid in my case.
x <- c("FRANCE","GERMANY","RUSSIA")
y <- c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA")
cities <- data.frame(x,y)
cities
x y
1 FRANCE Paris FRANCE
2 GERMANY Berlin GERMANY
3 RUSSIA Moscow RUSSIA
Expected results:
x y new
1 FRANCE Paris FRANCE Paris
2 GERMANY Berlin GERMANY Berlin
3 RUSSIA Moscow RUSSIA Moscow
What I've tried so far (to no avail):
this gets the very same df but removing the city (contrary as desired)
cities %>% mutate(new = setdiff(x,y))
x y new
1 FRANCE Paris FRANCE FRANCE
2 GERMANY Berlin GERMANY GERMANY
3 RUSSIA Moscow RUSSIA RUSSIA
On the contrary, setdiff in reverse order gets same initial data
cities %>% mutate(new = setdiff(y,x))
x y new
1 FRANCE Paris FRANCE Paris FRANCE
2 GERMANY Berlin GERMANY Berlin GERMANY
3 RUSSIA Moscow RUSSIA Moscow RUSSIA
Using gsub to remove worked just for first row issuing a warning
cities %>% mutate(new = gsub(x,"",y))
Warning message:
In gsub(x, "", y) :
argument 'pattern' has length > 1 and only the first element will be used
x y new
1 FRANCE Paris FRANCE Paris
2 GERMANY Berlin GERMANY Berlin GERMANY
3 RUSSIA Moscow RUSSIA Moscow RUSSIA
We can use stringr::str_replace:
library(tidyverse)
cities %>%
mutate_if(is.factor, as.character) %>%
mutate(new = trimws(str_replace(y, x, "")))
# x y new
#1 FRANCE Paris FRANCE Paris
#2 GERMANY Berlin GERMANY Berlin
#3 RUSSIA Moscow RUSSIA Moscow
Here is a solution with base R:
x <- c("FRANCE","GERMANY","RUSSIA")
y <- c("Paris FRANCE", "Berlin GERMANY", "Moscow RUSSIA")
cities <- data.frame(x,y,stringsAsFactors = F)
cities$new = mapply(function(a,b)
{setdiff(strsplit(a,' ')[[1]],strsplit(b,' ')[[1]])}, cities$y, cities$x)
Output:
x y new
1 FRANCE Paris FRANCE Paris
2 GERMANY Berlin GERMANY Berlin
3 RUSSIA Moscow RUSSIA Moscow
Hope this helps!

Find string, if does not exist, find another string

I have many files from OECD that have data available for different regional granularities. An example would be:
File A
REG_ID Region
AUS Australia
AU1GS Sydney
AU1 New South Wales
AU2 Victoria
AU2GM Melbourne
File B
REG_ID Region
AUS Australia
AU1GS Sydney
AU2GM Melbourne
File C
REG_ID Region
AUS Australia
AU1 New South Wales
AU1GS Sydney
AU2 Victoria
I want to extract the most granular region, in this case Sydney only, and not New South Wales. However, if Sydney is unavailable, I want to extract New South Wales.
How do I write code that is generalisable to all these files?

Summarize data using doBy package at region level

I have a dataset Data as below,
Region Country Market Price
EUROPE France France 30.4502
EUROPE Israel Israel 5.14110965
EUROPE France France 8.99665
APAC CHINA CHINA 2.6877232
APAC INDIA INDIA 60.9004
AFME SL SL 54.1729685
LA BRAZIL BRAZIL 56.8606917
EUROPE RUSSIA RUSSIA 11.6843732
APAC BURMA BURMA 63.5881232
AFME SA SA 115.0733685
I would like to summarize the data at Region level and get the SUM of Price at every Region Level.
I want the ouput to be Like below.
Data Output
Region Country Price
EUROPE France 30.4502
EUROPE Israel 5.14110965
EUROPE France 8.99665
EUROPE RUSSIA 11.6843732
Europe 56.27233285
APAC BURMA 63.5881232
APAC CHINA 2.6877232
APAC INDIA 60.9004
Apac 127.1762464
AFME BAHARAIN 54.1729685
AFME SA 115.0733685
AFME 169.246337
LA BRAZIL 56.8606917
LA 56.8606917
I have used summaryBy function of doBy package, i have tried the code below.
summaryBy
myfun1 <- function(x){c(s=Sum(x)}
DB= summaryBy(Data$Price ~Region + Country , data=Data, FUN=myfun1)
Anyhelp on this regard is very much appreciated.
You can do this by using dplyr to generate a summary table:
library(dplyr)
totals <- data %>% group_by(Region) %>% summarise(Country="",Price=sum(Price))
And then merging the summary with the rest of the data:
summary <- rbind(data[-3], totals)
Then you can sort by Region to put the summary with the region:
summary <- summary %>% arrange(Region)
Output:
Region Country Price
1 AFME SL 54.1730
2 AFME SA 115.0734
3 AFME 169.2463
4 APAC CHINA 2.6877
5 APAC INDIA 60.9004
6 APAC BURMA 63.5881
7 APAC 127.1762
8 EUROPE France 30.4502
9 EUROPE Israel 5.1411
10 EUROPE France 8.9967
11 EUROPE RUSSIA 11.6844
12 EUROPE 56.2723
13 LA BRAZIL 56.8607
14 LA 56.8607
You have to split data by Region factor and sum Price for each factor
lapply(split(data, data$Region), function(x) sum(x$Price))
Or, if you need to present result as you have shown:
totals = lapply(split(data, data$Region), function(x) rbind(x,data.frame(Region=unique(x$Region), Country="", Market="", Price=sum(x$Price))))
do.call(rbind, totals)

Resources