Merging two files horizontally and formatting - unix

I have two files as follows:
File_1
Austin
Los Angeles
York
San Ramon
File_2
Texas
California
New York
California
I want to merge them horizontally as follows:
Austin Texas
Los Angeles California
York New York
San Ramon California
I am able to merge horizontally by using paste command, but the formatting is going haywire.
Austin Texas
Los Angeles California
York New York
San Ramon California
I realize that paste is working as it is supposed to, but can someone point me in the right direction to get the formatting right?
Thanks.

paste uses a tab when merging the files, so you may have to post-process the output and replace the tab with spaces:
paste File_1 File_2 | awk 'BEGIN { FS = "\t" } ; {printf("%-20s%s\n",$1,$2) }'
result:
Austin              Texas
Los Angeles         California
York                New York
San Ramon           California

First, check the number of characters in the longest line. Then pad each line of the first file to at least that length; fmt reflows paragraphs rather than padding them, so a printf-style tool such as awk is a better fit for this step. Finish with paste.
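The steps above can be sketched roughly as follows. This is only a minimal sketch: it uses awk for the padding step, pad_merge is a made-up helper name, and the scratch directory simply recreates the two sample files from the question.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'Austin' 'Los Angeles' 'York' 'San Ramon' > "$tmp/File_1"
printf '%s\n' 'Texas' 'California' 'New York' 'California' > "$tmp/File_2"

# pad_merge (hypothetical helper): pad every line of the first file to the
# length of its longest line, then paste it next to the second file.
pad_merge() {
    # The first file is read twice.
    # Pass 1 (NR == FNR): remember the longest line length in w.
    # Pass 2: print each line left-justified in a field of width w.
    awk 'NR == FNR { if (length($0) > w) w = length($0); next }
         { printf "%-" w "s\n", $0 }' "$1" "$1" |
        paste -d ' ' - "$2"
}

pad_merge "$tmp/File_1" "$tmp/File_2"
```

Because the width is measured rather than hard-coded, the second column starts one space after the longest entry of the first file.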

If you have an idea about the field width, you could do something like this:
IFS_BAK="$IFS"
IFS=$'\t'
paste file_1 file_2 \
| while read -r city state; do
    printf "%-15s %-15s\n" "$city" "$state"
done
IFS="$IFS_BAK"
Or this shorter version:
paste file_1 file_2 | while IFS=$'\t' read -r city state; do
    printf "%-15s %-15s\n" "$city" "$state"
done
Or use the column tool from bsdmainutils:
paste file_1 file_2 | column -s $'\t' -t

Related

Comparison of two CSV files

I'm trying to use Awk to compare the content of two large CSV files (more than 5000 rows each) but I can't get what I want.
Here is my problem:
I have a first file (File1) with a list of names ($1) and cities ($2) whose structure looks like this:
john;london
marc;paris
karen;new york
ben;london
vic;dublin
I have a second file (File2) with other information where we find some names ($3) of File1:
45456;windows;john;454646
47764;mac;zack;470093
41225;mac;ben;622101
12634;windows;ben;218996
7856;windows;karen;637294
12;mac;finn;878317
2315;windows;beverly;221167
445;windows;lilly;12316
3232;mac;john;601316
4546;mac;fish;305035
487;windows;vic;447421
46464;mac;karen;468154
I would like to extract from File2 all the lines whose names appear in File1 while adding the cities associated to each name in File1. Here is an example of the result I am looking for:
45456;windows;john;454646;london
3232;mac;john;601316;london
7856;windows;karen;637294;new york
46464;mac;karen;468154;new york
41225;mac;ben;622101;london
12634;windows;ben;218996;london
487;windows;vic;447421;dublin
Could you help me?
Build an associative array from the first file, making the name the index and the city the value. For the second file, check whether the name is in the array; if it is, print the line and append the city.
awk -F';' 'NR==FNR{a[$1]=$2}$3 in a{print $0";"a[$3]}' File1 File2
45456;windows;john;454646;london
41225;mac;ben;622101;london
12634;windows;ben;218996;london
7856;windows;karen;637294;new york
3232;mac;john;601316;london
487;windows;vic;447421;dublin
46464;mac;karen;468154;new york
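The one-liner above can also be written out with comments. This is only a sketch of the same associative-array idea: the function name join_cities and the scratch directory are made up for the example, and File2 is a shortened copy of the data from the question.

```shell
# Sample data: File1 from the question, File2 shortened to five rows.
tmp=$(mktemp -d)
printf '%s\n' 'john;london' 'marc;paris' 'karen;new york' 'ben;london' 'vic;dublin' > "$tmp/File1"
printf '%s\n' '45456;windows;john;454646' '47764;mac;zack;470093' '41225;mac;ben;622101' \
    '7856;windows;karen;637294' '487;windows;vic;447421' > "$tmp/File2"

# join_cities (hypothetical helper): $1 is the name;city file,
# $2 is the file whose third field holds the name.
join_cities() {
    awk -F';' '
        # First file only: map each name to its city, then skip to the next line.
        NR == FNR { city[$1] = $2; next }
        # Second file: keep lines whose name is known and append the city.
        $3 in city { print $0 ";" city[$3] }
    ' "$1" "$2"
}

join_cities "$tmp/File1" "$tmp/File2"
```

The output keeps the row order of the second file, which is why it differs from the order shown in the question.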
With bash, GNU sort and GNU join:
join -t ';' -1 1 -2 3 <(sort File1) <(sort -t ';' -k 3 File2) -o 2.1,2.2,1.1,2.4,1.2
Output:
12634;windows;ben;218996;london
41225;mac;ben;622101;london
45456;windows;john;454646;london
3232;mac;john;601316;london
46464;mac;karen;468154;new york
7856;windows;karen;637294;new york
487;windows;vic;447421;dublin
Using csvkit
csvjoin -d ';' -H -c 3,1 File2 File1 | csvformat -D ';' | tail -n +2

Selecting format for addresses returned from ggmap's revgeocode

When you grab the address for a geolocation in R it defaults to the first entry. How can I return one of the others instead?
revgeocode(c(-122.39150, 37.77374), output = "address")
Multiple addresses found, the first will be returned:
1145 4th St, San Francisco, CA 94158, USA
...
San Francisco County, CA, USA
San Francisco, CA, USA
California, USA
United States
You can use output="all" and then access the $results array to get the specific entry you want.
E.g.:
revgeocode(c(-122.39150, 37.77374), output = "all")$results[[6]]$formatted_address
This returns the 6th address, "San Francisco, CA 94158, USA".
Hope this helps!

Finding a Column in File1 present in File2 in awk

I have 2 files as below:
File1
USA,China,India,Canada
File2
Iran|XXXXXX|Iranian
Iraq|YYYYYY|Iraquian
Saudi|ZZZZZ|Saudi is a Rich Country
USA|AAAAAA|USA is United States of America.
India|IIII|India got freedom from British.
Scot|SSSSS|Canada Mexio.
How can I read the values in File1 and check whether each one matches the first delimited string in File2 using awk? I've tried this but I could not achieve it. Please help.
For the Above Input the Output should be
USA Matches
China Not Matched
India Matches
Canada Not Matches
Could you please try the following.
awk 'FNR==NR{a[$1];next} {for(i=1;i<=NF;i++){if($i in a){print $i,"Matches"} else {print $i,"Not Matches."}}}' FS="|" Input_file2 FS="," Input_file1
You can try Perl also
$ cat vinoth1
USA,China,India,Canada
$ cat vinoth2
Iran|XXXXXX|Iranian
Iraq|YYYYYY|Iraquian
Saudi|ZZZZZ|Saudi is a Rich Country
USA|AAAAAA|USA is United States of America.
India|IIII|India got freedom from British.
Scot|SSSSS|Canada Mexio.
$ perl -F, -lane ' BEGIN { $x=qx(cat vinoth2) } print $_,$x=~/^$_/m ? " matches" : " not matches" for(@F) ' vinoth1
USA matches
China not matches
India matches
Canada not matches

Columns to Rows Conversion in Unix shell

I have an input:
8765843279 "dma_code":"501","dma_region";"NEW YORK, NY","check_fpc_cookie":"-1","check_tpc_cookie":"1"
I would like the output as:
8765843279 dma_code 501
8765843279 dma_region NEW YORK, NY
8765843279 check_fpc_cookie -1
8765843279 check_tpc_cookie 1
Please could someone assist?
This can make it:
$ awk -v RS="," -F"\"" 'NR==1{v=$1}{print v, $2, $4}' file
8765843279 dma_code 501
8765843279 dma_region NEW YORK NY
8765843279 check_fpc_cookie -1
8765843279 check_tpc_cookie 1
Explanation
It is based on this solution by Ed Morton.
RS="," => records are separated by commas.
-F"\"" => fields within a record are separated by quotes ".
NR==1 {v=$1} stores the first field of the first record (the leading number).
'{print v, $2, $4}' prints the value + 2nd field + 4th field. The 2nd and 4th positions come from splitting each record on the field separator, which was set with -F"\"".
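Because RS="," splits on every comma, a comma inside a quoted value such as "NEW YORK, NY" is also treated as a record boundary. One alternative is to keep the line whole and split only on the quotes. This is a minimal sketch, assuming every key and value is double-quoted and the line starts with the id; the helper name split_pairs and the file name sample.txt are made up.

```shell
# Sample input copied from the question.
tmp=$(mktemp -d)
cat > "$tmp/sample.txt" <<'EOF'
8765843279 "dma_code":"501","dma_region";"NEW YORK, NY","check_fpc_cookie":"-1","check_tpc_cookie":"1"
EOF

# split_pairs (hypothetical helper): print "id key value" for each quoted pair.
split_pairs() {
    awk -F'"' '{
        # Field 1 is the id plus the blank before the first quote.
        id = $1
        sub(/[ \t]+$/, "", id)
        # With the quote as separator, keys sit in fields 2, 6, 10, ...
        # and each value sits two fields after its key.
        for (i = 2; i + 2 <= NF; i += 4)
            print id, $i, $(i + 2)
    }' "$1"
}

split_pairs "$tmp/sample.txt"
```

Since the text between quotes is never split further, the comma in NEW YORK, NY stays in place.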

want to change csv file date column format

I'm a newbie to batch/shell scripting. I have a CSV file like this:
Id depId Name city Date prod
12345 52845 ken LA 08.08.2013 16:06:53 KLS22
25685 28725 Larry MA 09.03.2013 16:06:58 KLt35
58345 28545 ken LA 06.08.2013 16:06:53 KLS22
75885 98725 Gow CA 05.04.2013 16:06:58 KLt35
There are about 2000 records. Columns are delimited by tabs. I would like to change the date column to the format:
DD_MM_YYYY_hh_mm_ss
I have tried something like this with awk:
awk -F '' '{ ("date -d \""$5"\" \"+%Y:%m/%d %T\"") | getline $5; print }' myfile.csv
but I get the wrong output.
I expect output like this:
Id depId Name city Date prod
58345 28545 ken LA 03_06_2013_23_00_00 KLS22
75885 98725 Gow CA 05_06_2013_23_00_00 KLt35
Please help out! Thanks!!
One way with awk:
$ awk 'NR>1{gsub(/\./,"_",$5);gsub(/:/,"_",$6);$5=$5"_"$6;$6=$NF;NF--}{$1=$1}1' OFS="\t" myfile.csv
Test:
$ cat temp
Id depId Name city Date prod
12345 52845 ken LA 8.8.2013 16:06:53 KLS22
25685 28725 Larry MA 9.3.2013 16:06:58 KLt35
58345 28545 ken LA 6.8.2013 16:06:53 KLS22
75885 98725 Gow CA 5.4.2013 16:06:58 KLt35
$ awk 'NR>1{gsub(/\./,"_",$5);gsub(/:/,"_",$6);$5=$5"_"$6;$6=$NF;NF--}{$1=$1}1' OFS="\t" temp
Id depId Name city Date prod
12345 52845 ken LA 8_8_2013_16_06_53 KLS22
25685 28725 Larry MA 9_3_2013_16_06_58 KLt35
58345 28545 ken LA 6_8_2013_16_06_53 KLS22
75885 98725 Gow CA 5_4_2013_16_06_58 KLt35
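If the zero-padded DD_MM_YYYY_hh_mm_ss form is needed, the date can be rebuilt with sprintf instead of gsub. This is a minimal sketch, assuming the file really is tab-delimited with the date and time together in the fifth column; reformat_dates is a made-up name and the two data rows are taken from the test file above.

```shell
# Tab-delimited sample: header plus two rows with unpadded day and month.
tmp=$(mktemp -d)
printf 'Id\tdepId\tName\tcity\tDate\tprod\n' > "$tmp/myfile.csv"
printf '12345\t52845\tken\tLA\t8.8.2013 16:06:53\tKLS22\n' >> "$tmp/myfile.csv"
printf '25685\t28725\tLarry\tMA\t9.3.2013 16:06:58\tKLt35\n' >> "$tmp/myfile.csv"

# reformat_dates (hypothetical helper): rewrite column 5 of every data row.
reformat_dates() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR > 1 {
             # Break "D.M.YYYY hh:mm:ss" into its six numeric parts.
             split($5, p, /[. :]/)
             # Re-join them with underscores, zero-padding each part.
             $5 = sprintf("%02d_%02d_%04d_%02d_%02d_%02d",
                          p[1], p[2], p[3], p[4], p[5], p[6])
         }
         { print }' "$1"
}

reformat_dates "$tmp/myfile.csv"
```

The header line is passed through unchanged because the rewrite only applies when NR > 1.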
A simpler approach, which does not check whether the string that looks like the date is really in the right column:
$ perl -pe 's/\t(\d)\.(\d)\.(\d\d\d\d) /sprintf("\t%04d-%02d-%02d ", $3, $2, $1) /e' t.csv
12345 52845 ken LA 2013-08-08 16:06:53 KLS22
25685 28725 Larry MA 2013-03-09 16:06:58 KLt35
