I have two files, shown below:
File1
USA,China,India,Canada
File2
Iran|XXXXXX|Iranian
Iraq|YYYYYY|Iraquian
Saudi|ZZZZZ|Saudi is a Rich Country
USA|AAAAAA|USA is United States of America.
India|IIII|India got freedom from British.
Scot|SSSSS|Canada Mexio.
How can I use awk to read each value in File1 and check whether it matches the first |-delimited field in File2? I've tried, but I couldn't get it to work. Please help.
For the above input, the output should be:
USA Matches
China Not Matched
India Matches
Canada Not Matched
Could you please try the following?
awk 'FNR==NR{a[$1];next} {for(i=1;i<=NF;i++){if($i in a){print $i,"Matches"} else {print $i,"Not Matched"}}}' FS="|" Input_file2 FS="," Input_file1
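To try this end to end, here is a minimal self-contained sketch that recreates the sample files in a temporary directory (the directory and the array name seen are illustrative, not from the question). It uses the same two-pass idea: the first pass reads File2 with FS="|" and stores each first field as a key, the second pass reads File1 with FS="," and looks up every value.

```shell
#!/bin/sh
# Recreate the sample input in a throwaway directory (illustrative paths).
dir=$(mktemp -d)
printf '%s\n' 'USA,China,India,Canada' > "$dir/File1"
cat > "$dir/File2" <<'EOF'
Iran|XXXXXX|Iranian
Iraq|YYYYYY|Iraquian
Saudi|ZZZZZ|Saudi is a Rich Country
USA|AAAAAA|USA is United States of America.
India|IIII|India got freedom from British.
Scot|SSSSS|Canada Mexio.
EOF

# Pass 1 (File2, FS="|"): store each first field as an array key.
# Pass 2 (File1, FS=","): look up every comma-separated value.
out=$(awk 'FNR == NR { seen[$1]; next }
           { for (i = 1; i <= NF; i++)
                 print $i, (($i in seen) ? "Matches" : "Not Matched") }' \
      FS='|' "$dir/File2" FS=',' "$dir/File1")
printf '%s\n' "$out"
```

Because the FS assignments sit between the file operands, awk applies each one just before it starts reading the following file.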
You can also try Perl:
$ cat vinoth1
USA,China,India,Canada
$ cat vinoth2
Iran|XXXXXX|Iranian
Iraq|YYYYYY|Iraquian
Saudi|ZZZZZ|Saudi is a Rich Country
USA|AAAAAA|USA is United States of America.
India|IIII|India got freedom from British.
Scot|SSSSS|Canada Mexio.
$ perl -F, -lane ' BEGIN { $x=qx(cat vinoth2) } print $_,$x=~/^\Q$_\E\|/m ? " matches" : " not matches" for(@F) ' vinoth1
USA matches
China not matches
India matches
Canada not matches
I'm trying to use awk to compare the contents of two large CSV files (more than 5,000 rows each), but I can't get the result I want.
Here is my problem:
I have a first file (File1) with a list of names ($1) and cities ($2) whose structure looks like this:
john;london
marc;paris
karen;new york
ben;london
vic;dublin
I have a second file (File2) with other information, where some of the names from File1 appear in column $3:
45456;windows;john;454646
47764;mac;zack;470093
41225;mac;ben;622101
12634;windows;ben;218996
7856;windows;karen;637294
12;mac;finn;878317
2315;windows;beverly;221167
445;windows;lilly;12316
3232;mac;john;601316
4546;mac;fish;305035
487;windows;vic;447421
46464;mac;karen;468154
I would like to extract from File2 all the lines whose names appear in File1, appending the city associated with each name in File1. Here is an example of the result I am looking for:
45456;windows;john;454646;london
3232;mac;john;601316;london
7856;windows;karen;637294;new york
46464;mac;karen;468154;new york
41225;mac;ben;622101;london
12634;windows;ben;218996;london
487;windows;vic;447421;dublin
Could you help me?
Build an associative array from the first file, using the name as the index and the city as the value. For the second file, check whether the name appears in the array; if it does, print the line and append the city.
awk -F';' 'NR==FNR{a[$1]=$2; next} $3 in a{print $0";"a[$3]}' File1 File2
45456;windows;john;454646;london
41225;mac;ben;622101;london
12634;windows;ben;218996;london
7856;windows;karen;637294;new york
3232;mac;john;601316;london
487;windows;vic;447421;dublin
46464;mac;karen;468154;new york
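The awk answer prints matches in File2 order. If you want them grouped by name in File1 order, as in the expected output in the question, one possible sketch buffers the matching lines per name and prints them at the end. The array names city, order and rows are illustrative; it assumes each name appears only once in File1 and that the buffered lines fit in memory, which should be fine for a few thousand rows.

```shell
#!/bin/sh
dir=$(mktemp -d)
cat > "$dir/File1" <<'EOF'
john;london
marc;paris
karen;new york
ben;london
vic;dublin
EOF
cat > "$dir/File2" <<'EOF'
45456;windows;john;454646
47764;mac;zack;470093
41225;mac;ben;622101
12634;windows;ben;218996
7856;windows;karen;637294
12;mac;finn;878317
2315;windows;beverly;221167
445;windows;lilly;12316
3232;mac;john;601316
4546;mac;fish;305035
487;windows;vic;447421
46464;mac;karen;468154
EOF

out=$(awk -F';' '
  # File1: remember name -> city and the order in which names appear.
  NR == FNR { city[$1] = $2; order[++n] = $1; next }
  # File2: collect every matching line (plus its city) under its name.
  $3 in city { rows[$3] = rows[$3] $0 ";" city[$3] "\n" }
  # Print the collected lines grouped by name, in File1 order.
  END { for (i = 1; i <= n; i++) printf "%s", rows[order[i]] }
' "$dir/File1" "$dir/File2")
printf '%s\n' "$out"
```

Names from File1 with no match in File2 (marc here) simply produce no output.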
With bash, GNU sort and GNU join:
join -t ';' -1 1 -2 3 <(sort File1) <(sort -t ';' -k 3 File2) -o 2.1,2.2,1.1,2.4,1.2
Output:
12634;windows;ben;218996;london
41225;mac;ben;622101;london
45456;windows;john;454646;london
3232;mac;john;601316;london
46464;mac;karen;468154;new york
7856;windows;karen;637294;new york
487;windows;vic;447421;dublin
Using csvkit
csvjoin -d ';' -H -c 3,1 File2 File1 | csvformat -D ';' | tail -n +2
I have a text file with some miscellaneous data and some money expenses. I want to find all dollar amounts between specific lines and sum them; for example, all dollar amounts between lines 6 and 8.
Here's an example of my text file:
Mary had a little $5.00 lamb
Bing bang bow
Blah blah blah
STARBUCKS Jan 8th, 2019 $7.00
MCDONALD'S Jan 10th, 2019 $6.00
UBER Jan 11th, 2019 $20.01
The expected answer is $33.01
I found that in vi I can search for dollar amounts like this:
/$\d\{2}\|\$\d\{1}
I also found that awk can match numbers and sum them, but I couldn't figure out how to adapt those suggestions to my problem.
Use $ as the field separator. If a line has a second field (NF==2), add that field to the sum.
awk -F '$' 'NF==2{sum+=$2} END{print sum}' file
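The command above adds up every line that contains a $, so on the sample file it also counts the $5.00 on line 1 and prints 38.01. A minimal sketch that only sums the lines in a given range follows; first and last are illustrative variable names, and on this six-line sample the expense lines are 4 to 6 (on your real file you would pass 6 and 8).

```shell
#!/bin/sh
dir=$(mktemp -d)
cat > "$dir/file" <<'EOF'
Mary had a little $5.00 lamb
Bing bang bow
Blah blah blah
STARBUCKS Jan 8th, 2019 $7.00
MCDONALD'S Jan 10th, 2019 $6.00
UBER Jan 11th, 2019 $20.01
EOF

# Split each line on "$"; every field after the first starts with an amount.
# awk converts a string such as "5.00 lamb" to a number using its leading
# numeric prefix, so any text after the amount is ignored.
out=$(awk -F'$' -v first=4 -v last=6 '
  NR >= first && NR <= last { for (i = 2; i <= NF; i++) sum += $i }
  END { printf "$%.2f\n", sum }
' "$dir/file")
printf '%s\n' "$out"
```

Printing with %.2f keeps the cents and hides any floating-point noise in the sum.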
You could use awk with some pattern matching:
awk '$NF ~/^\$.*$/{amt+=substr($NF,2)}END{print "$" amt}' file
$33.01
A very generic solution uses a regular expression with positive lookbehind:
grep -oP --regexp='(?<=\$)[0-9\.]*' inputFile | paste -s -d+ | bc
The regex (?<=\$)[0-9\.]* matches only sequences of digits and '.' that are preceded by '$'.
A modified solution using awk looks like this:
grep -oP --regexp='(?<=\$)[0-9\.]*' inputFile | awk '{s+=$1} END {print s}'
On the sample file, both commands return 38.01, because the $5.00 on line 1 is matched as well.
To limit the summation to specified lines, you can prepend awk 'NR>5 && NR<9{print $0}':
awk 'NR>5 && NR<9{print $0}' inputFile | grep -oP --regexp='(?<=\$)[0-9\.]*' | awk '{s+=$1} END {print s}'
You can also try Perl:
$ perl -ne ' /\$(\S+)/ and $sum+=$1 ; END { print $sum } ' quantile.txt
38.01
The input file is:
$ cat quantile.txt
Mary had a little $5.00 lamb
Bing bang bow
Blah blah blah
STARBUCKS Jan 8th, 2019 $7.00
MCDONALD'S Jan 10th, 2019 $6.00
UBER Jan 11th, 2019 $20.01
If your data is in a file named d:
perl -ne 'BEGIN{$s=0} if ($. >= 6 && /\$([\d.]+)/) {$s+=$1} END{print "total=$s\n"}' d
File 1: emp.txt
7839|KING|PRESIDENT||17-Nov-81|5000||10
7698|BLAKE|MANAGER|7839|01-May-81|2850||30
7782|CLARK|MANAGER|7839|09-Jun-81|2450||10
7566|JONES|MANAGER|7839|02-Apr-81|2975||20
7788|SCOTT|ANALYST|7566|19-Apr-87|3000||20
7902|FORD|ANALYST|7566|03-Dec-81|3000||20
7369|SMITH|CLERK|7902|17-Dec-80|800||20
7499|ALLEN|SALESMAN|7698|20-Feb-81|1600|300|30
7521|WARD|SALESMAN|7698|22-Feb-81|1250|500|30
7654|MARTIN|SALESMAN|7698|28-Sep-81|1250|1400|30
File 2: dept.txt
10|ACCOUNTING|NEW YORK
20|RESEARCH|DALLAS
30|SALES|CHICAGO
40|OPERATIONS|BOSTON
I want to print the following output:
7839|KING|PRESIDENT||17-Nov-81|5000||10|NEW YORK
7698|BLAKE|MANAGER|7839|01-May-81|2850||30|CHICAGO
7782|CLARK|MANAGER|7839|09-Jun-81|2450||10|NEW YORK
7566|JONES|MANAGER|7839|02-Apr-81|2975||20|DALLAS
7788|SCOTT|ANALYST|7566|19-Apr-87|3000||20|DALLAS
7902|FORD|ANALYST|7566|03-Dec-81|3000||20|DALLAS
7369|SMITH|CLERK|7902|17-Dec-80|800||20|DALLAS
7499|ALLEN|SALESMAN|7698|20-Feb-81|1600|300|30|CHICAGO
7521|WARD|SALESMAN|7698|22-Feb-81|1250|500|30|CHICAGO
7654|MARTIN|SALESMAN|7698|28-Sep-81|1250|1400|30|CHICAGO
I tried the awk statement below, but it doesn't print anything:
awk -F'|' 'NR==FNR {val[$1]=$3; next} $8 in val {print $1,$2,$3,$4,$5,$6,$7,$8,val[$1]}' OFS="|" dept.txt emp.txt
Any suggestions?
Use $NF, which is the value of the last field:
➜ awk '
BEGIN { FS = OFS = "|" }
NR==FNR { location[$1] = $NF; next }
{ print (location[$NF] ? $0 OFS location[$NF] : $0) }
' dept.txt emp.txt
7839|KING|PRESIDENT||17-Nov-81|5000||10|NEW YORK
7698|BLAKE|MANAGER|7839|01-May-81|2850||30|CHICAGO
7782|CLARK|MANAGER|7839|09-Jun-81|2450||10|NEW YORK
7566|JONES|MANAGER|7839|02-Apr-81|2975||20|DALLAS
7788|SCOTT|ANALYST|7566|19-Apr-87|3000||20|DALLAS
7902|FORD|ANALYST|7566|03-Dec-81|3000||20|DALLAS
7369|SMITH|CLERK|7902|17-Dec-80|800||20|DALLAS
7499|ALLEN|SALESMAN|7698|20-Feb-81|1600|300|30|CHICAGO
7521|WARD|SALESMAN|7698|22-Feb-81|1250|500|30|CHICAGO
7654|MARTIN|SALESMAN|7698|28-Sep-81|1250|1400|30|CHICAGO
This assumes you want to print the entire line regardless of whether a city exists for the department. If not, please update your question to reflect common use cases and the expected output.
The problem is that there is leading whitespace in front of the matching column in dept.txt. Since you are using '|' as your field separator, each row of dept.txt is split as follows (using the first row as an example):
10|ACCOUNTING|NEW YORK
$1=" 10"
$2="ACCOUNTING"
$3="NEW YORK"
So you are mapping " 10" (rather than "10") to NEW YORK. That's why none of the lines in emp.txt match. (This assumes you meant to use val[$8] rather than val[$1] in the print statement.)
The following should fix the problem:
awk -F'|' 'NR==FNR {sub(/^ +/,"",$1); val[$1]=$3; next} $8 in val {print $1,$2,$3,$4,$5,$6,$7,$8,val[$8]}' OFS="|" dept.txt emp.txt
Output:
7839|KING|PRESIDENT||17-Nov-81|5000||10|NEW YORK
7698|BLAKE|MANAGER|7839|01-May-81|2850||30|CHICAGO
7782|CLARK|MANAGER|7839|09-Jun-81|2450||10|NEW YORK
7566|JONES|MANAGER|7839|02-Apr-81|2975||20|DALLAS
7788|SCOTT|ANALYST|7566|19-Apr-87|3000||20|DALLAS
7902|FORD|ANALYST|7566|03-Dec-81|3000||20|DALLAS
7369|SMITH|CLERK|7902|17-Dec-80|800||20|DALLAS
7499|ALLEN|SALESMAN|7698|20-Feb-81|1600|300|30|CHICAGO
7521|WARD|SALESMAN|7698|22-Feb-81|1250|500|30|CHICAGO
7654|MARTIN|SALESMAN|7698|28-Sep-81|1250|1400|30|CHICAGO
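If you cannot be sure how much whitespace surrounds the key columns, or whether the files have Windows-style CRLF line endings, a more defensive sketch strips both before building and looking up the array. The sample below deliberately adds leading spaces to dept.txt to mimic the problem described above; the names k, city and line are illustrative, and only a few emp.txt rows are used.

```shell
#!/bin/sh
dir=$(mktemp -d)
# dept.txt with leading spaces before the id, mimicking the diagnosis above.
printf '%s\n' '  10|ACCOUNTING|NEW YORK' '  20|RESEARCH|DALLAS' \
              '  30|SALES|CHICAGO' '  40|OPERATIONS|BOSTON' > "$dir/dept.txt"
printf '%s\n' '7839|KING|PRESIDENT||17-Nov-81|5000||10' \
              '7698|BLAKE|MANAGER|7839|01-May-81|2850||30' \
              '7566|JONES|MANAGER|7839|02-Apr-81|2975||20' > "$dir/emp.txt"

out=$(awk '
  BEGIN { FS = OFS = "|" }
  { sub(/\r$/, "") }                                   # drop a trailing CR, if any
  NR == FNR { k = $1; gsub(/[ \t]/, "", k); city[k] = $3; next }
  {
    k = $8; gsub(/[ \t]/, "", k)                       # normalized dept id
    line = $0
    if (k in city) line = line OFS city[k]             # append the city when known
    print line
  }
' "$dir/dept.txt" "$dir/emp.txt")
printf '%s\n' "$out"
```

Lines whose department id is not in dept.txt are printed unchanged, as in the $NF answer.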
In your code, you should look up the array with the column that holds the key you stored. In your case, column 8 of emp.txt holds the department id that matches column 1 of dept.txt.
awk -F\| 'NR==FNR {val[$1]=$3; next} {print $1, $2, $3, $4, $5, $6, $7, $8, val[$8]};' OFS="|" dept.txt emp.txt
I am calling a web service that returns a string like this:
First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"
Now I have to write a shell script that creates a CSV file named CSV_Output.csv, formatted from the content of this string.
The format should be something like this:
Field Name (in yellow) Value (in yellow)
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
I can easily generate a CSV file using redirection (>>), but how can I create and format a CSV file in the format shown above?
Sorry to be blunt, but I have no code to show, as I don't understand what to use here.
Please provide some suggestions (sample code). Any help is greatly appreciated.
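An awk one-liner can convert the format (a multi-character RS such as "\\n|;" is treated as a regular expression by GNU awk, but its behavior is unspecified in POSIX awk):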
awk -v RS="\\n|;" -v OFS="\t" -F= '{gsub(/"/,"");$1=$1}7' file
If you want nicer-looking output, you can pipe it to column and change the OFS:
awk -v RS="\\n|;" -v OFS="#" -F= '{gsub(/"/,"");$1=$1}7' file|column -s"#" -t
The output is:
kent$ awk -v RS="\\n|;" -v OFS="#" -F= '{gsub(/"/,"");$1=$1}7' f|column -s"#" -t
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
Short explanation:
awk #awk command
-v RS="\\n|;" #set the record separator to \n (newline) or ; (semicolon)
-v OFS="\t" #set the output field separator to a tab
-F= #set "=" as the field separator
'{gsub(/"/,""); #remove all double quotes
$1=$1} #assigning $1=$1 makes awk rebuild the line with the given OFS
7' #7 is a non-zero (true) pattern with no action, so awk prints the whole line
This can be achieved using tr and column:
$ cat input
First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"
$ cat input | tr ";" "\n" | column -s= -t | tr -d \"
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
Split the input on ;, pipe the output to column with = as the delimiter, and get rid of the quotes.
EDIT: I didn't realize that you want a CSV. In that case, use:
$ cat input | tr ";" "\n" | tr "=" "\t" | tr -d \"
which results in tab-delimited output.
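Since the question asks for an actual CSV file, here is one possible sketch that writes CSV_Output.csv with a header row and comma-separated, double-quoted fields. It splits on ; with tr, so it does not depend on a regular-expression RS. It assumes values never contain ;, = or embedded double quotes, and the variable names s, dir and csv are illustrative. CSV is plain text, so it cannot carry the yellow header color; that would require a spreadsheet format instead.

```shell
#!/bin/sh
# Sample response string from the question (normally captured from the web service).
s='First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"'
dir=$(mktemp -d)
csv="$dir/CSV_Output.csv"

printf '%s\n' "$s" | tr ';' '\n' | awk -F= '
  BEGIN { print "\"Field Name\",\"Value\"" }   # header row
  {
    value = $2
    gsub(/"/, "", value)                       # remove the original quotes
    printf "\"%s\",\"%s\"\n", $1, value        # write one quoted CSV row
  }' > "$csv"
cat "$csv"
```

Quoting every value keeps empty fields (Middle Name) and leading zeros (Postal Code) intact as text in the file itself.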
I have this input:
8765843279 "dma_code":"501","dma_region";"NEW YORK, NY","check_fpc_cookie":"-1","check_tpc_cookie":"1"
I would like the output to be:
8765843279 dma_code 501
8765843279 dma_region NEW YORK, NY
8765843279 check_fpc_cookie -1
8765843279 check_tpc_cookie 1
Please could someone assist?
This does it:
$ awk -v RS="," -F"\"" 'NR==1{v=$1}{print v, $2, $4}' file
8765843279 dma_code 501
8765843279 dma_region NEW YORK NY
8765843279 check_fpc_cookie -1
8765843279 check_tpc_cookie 1
Explanation
It is based on this solution by Ed Morton.
RS="," => records are separated by commas.
-F"\"" => fields within a record are separated by quotes ".
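NR==1 {v=$1} stores the first field of the first record (the ID 8765843279).
'{print v, $2, $4}' prints the ID, the 2nd field and the 4th field; those positions come from splitting each record on the field separator set with -F"\"". Note that because RS is a comma, the comma inside "NEW YORK, NY" also ends a record, so that value gets split.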
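A sketch that avoids the comma problem splits the line only on double quotes: with " as the field separator, the leading ID lands in field 1, each key in fields 2, 6, 10 and so on, and each value two fields after its key. This assumes every key and value is quoted and that no value contains an escaped quote; the names line, id and out are illustrative.

```shell
#!/bin/sh
line='8765843279 "dma_code":"501","dma_region";"NEW YORK, NY","check_fpc_cookie":"-1","check_tpc_cookie":"1"'

out=$(printf '%s\n' "$line" | awk -F'"' '{
  id = $1
  sub(/ +$/, "", id)                       # strip the space after the ID
  # keys sit in fields 2, 6, 10, ...; each value is two fields later
  for (i = 2; i + 2 <= NF; i += 4) print id, $i, $(i + 2)
}')
printf '%s\n' "$out"
```

Because the separator between a key and its value (":" or ";") falls into its own field, this also tolerates the stray ; after dma_region in the input.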