How to run multiple "awk" commands - unix

I would like to run multiple "awk" commands in a single script.
For example, Master.csv.gz is located at /cygdrive/e/Test/Master.csv.gz, and
the input files are located in different subdirectories, like /cygdrive/f/Jan/Input_Jan.csv.gz, /cygdrive/f/Feb/Input_Feb.csv.gz, and so on.
All input files have the *.gz extension.
The commands below work fine when executed one by one:
Command#1
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) >>Output.txt
Output#1:
Name,Age,Location
abc,20,xxx
Command#2
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) >>Output.txt
Output#2:
Name,Age,Location
def,40,yyy
cat Output.txt
Name,Age,Location
abc,20,xxx
def,40,yyy
I have tried running the commands below via a single script, but got errors:
Attempt#1: awk -f Test.awk
cat Test.awk
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) >>Output.txt
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) >>Output.txt
Error : Attempt#1: awk -f Test.awk
awk: Test.awk:1: ^ invalid char ''' in expression
awk: Test.awk:1: ^ syntax error
Attempt#2: sh Test.sh
cat Test.sh
#!/bin/sh
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) >>Output.txt
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) >>Output.txt
Error : Attempt#2: sh Test.sh
Test.sh: line 2: syntax error near unexpected token `('
Desired Output:
Name,Age,Location
abc,20,xxx
def,40,yyy
Looking for your suggestions ..
Update #2 - Month Name
Ed Morton, thanks for the inputs; however, the output order is not proper: "Jan2014" is printed on the next line. Please suggest.
cat Output.txt:
Name,Age,Location
abc,20,xxx
Jan2014
def,40,yyy
Feb2014
Expected Output
Name,Age,Location
abc,20,xxx,Jan2014
def,40,yyy,Feb2014

All you need is:
#!/bin/bash
awk -F, 'FNR==NR{a[$2]; next} $2 in a' \
<(gzip -dc /cygdrive/e/Test/Master.csv.gz) \
<(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) \
<(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) \
>> Output.txt
If you want to print the month name too then the simplest thing would be:
#!/bin/bash
awk 'BEGIN{FS=OFS=","} FNR==NR{a[$2]; next} $2 in a{print $0, mth}' \
<(gzip -dc /cygdrive/e/Test/Master.csv.gz) \
mth="Jan" <(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) \
mth="Feb" <(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) \
>> Output.txt
but you could avoid spelling out each month name on every line by putting the months in an array:
#!/bin/bash
mths=(Jan Feb)
awk 'BEGIN{FS=OFS=","} FNR==NR{a[$2]; next} $2 in a{print $0, mth}' \
<(gzip -dc /cygdrive/e/Test/Master.csv.gz) \
mth="${mths[0]}" <(gzip -dc "/cygdrive/f/${mths[0]}/Input_${mths[0]}.csv.gz") \
mth="${mths[1]}" <(gzip -dc "/cygdrive/f/${mths[1]}/Input_${mths[1]}.csv.gz") \
>> Output.txt
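Alternatively, the per-month work can be a plain bash loop, which re-reads the master once per month but avoids any index juggling. This is a sketch: the sample rows and the mktemp directory layout below are invented so the script runs as-is; in practice substitute the real /cygdrive/... paths for "$base".

```shell
#!/bin/bash
# Sketch: loop over the months instead of listing every process
# substitution by hand. Sample data and tmp layout are made up;
# replace "$base" with the real /cygdrive/... paths in practice.
set -e
base=$(mktemp -d)
mkdir -p "$base/Jan" "$base/Feb"
printf 'Name,abc,Loc\n' | gzip > "$base/Master.csv.gz"
printf 'n1,abc,xxx\nn2,zzz,yyy\n' | gzip > "$base/Jan/Input_Jan.csv.gz"
printf 'n3,abc,qqq\n' | gzip > "$base/Feb/Input_Feb.csv.gz"

for mth in Jan Feb; do
  awk -v mth="$mth" 'BEGIN{FS=OFS=","} FNR==NR{a[$2]; next} $2 in a{print $0, mth}' \
    <(gzip -dc "$base/Master.csv.gz") \
    <(gzip -dc "$base/$mth/Input_${mth}.csv.gz")
done > "$base/Output.txt"
cat "$base/Output.txt"
```

The loop costs a little extra I/O on the master file, but each iteration is exactly the two-file awk idiom from the question.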

Your first attempt failed because you were trying to call awk in an awk script, and your second attempt failed because the bash process substitution, <(...), is not defined by POSIX, and is not guaranteed to work with /bin/sh. Here is an awk script that should work.
#!/usr/bin/awk -f
BEGIN {
    if (ARGC < 3) exit 1;
    ct = "cat ";
    gz = "gzip -dc ";
    # test the raw path; f below includes the added quotes, so /\.gz$/
    # would never match against it
    c = (ARGV[1] ~ /\.gz$/) ? gz : ct;
    f = "\"" ARGV[1] "\"";
    while ((c f | getline t) > 0) {
        split(t, a, ",");
        A[a[2]] = t;
    }
    close(c f);
    for (n = 2; n < ARGC; n++) {
        c = (ARGV[n] ~ /\.gz$/) ? gz : ct;
        f = "\"" ARGV[n] "\"";
        while ((c f | getline t) > 0) {
            split(t, a, ",");
            if (a[2] in A) print t;
        }
        close(c f);
    }
    exit;
}
Usage:
script.awk /cygdrive/e/Test/Master.csv.gz /cygdrive/f/Jan/Input_Jan.csv.gz
script.awk /cygdrive/e/Test/Master.csv.gz /cygdrive/f/Feb/Input_Feb.csv.gz
or
script.awk /cygdrive/e/Test/Master.csv.gz /cygdrive/f/Jan/Input_Jan.csv.gz \
    /cygdrive/f/Feb/Input_Feb.csv.gz

Related

UNIX shell script reading csv

I have a csv file. I would like to put the fields into different variables. Suppose there are three fields in each line of the csv file. I have this code:
csvfile=test.csv
while read inline; do
var1=`echo $inline | awk -F',' '{print $1}'`
var2=`echo $inline | awk -F',' '{print $2}'`
var3=`echo $inline | awk -F',' '{print $3}'`
.
.
.
done < $csvfile
This code is good. However, if a field contains an embedded comma, it does not work. Any suggestion? For example:
how,are,you
I,"am, very",good
this,is,"a, line"
This may not be the perfect solution, but it will work in your case.
[cloudera@quickstart Documents]$ cat cd.csv
a,b,c
d,"e,f",g
File content
csvfile=cd.csv
while read inline; do
    var1=`echo $inline | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "*", $i) }1' | awk -F',' '{print $1}' | sed 's/*/,/g'`
    var2=`echo $inline | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "*", $i) }1' | awk -F',' '{print $2}' | sed 's/*/,/g'`
    var3=`echo $inline | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "*", $i) }1' | awk -F',' '{print $3}' | sed 's/*/,/g'`
    echo $var1 " " $var2 " " $var3
done < $csvfile
Output :
[cloudera@quickstart Documents]$ sh a.sh
a b c
d e,f g
So basically, we first handle the "," inside quoted fields by replacing it with "*", then extract the desired column with awk, and finally revert "*" back to "," to get the actual field value.
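For comparison, here is a sketch that leans on Python's csv module (assuming python3 is available): it already understands quoted fields, so the row can be re-emitted tab-separated and then split safely in the shell, with no replace-and-restore dance. The sample line is from the question.

```shell
#!/bin/bash
# Sketch, assuming python3 is installed: csv.reader handles quoted
# fields, so we re-emit the row tab-separated and split on tabs.
set -e
line='I,"am, very",good'
IFS=$'\t' read -r var1 var2 var3 < <(
  printf '%s\n' "$line" |
    python3 -c 'import csv,sys; print("\t".join(next(csv.reader(sys.stdin))))'
)
echo "$var1 / $var2 / $var3"
```

This also stays correct if a field happens to contain the "*" character, which would confuse the replace-and-restore approach.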

generate header and trailer after splitting files

This is the code I already use for splitting:
awk -v DATE="$(date +"%d%m%Y")" -F\, '
BEGIN{OFS=","}
NR==1 {h=$0; next}
{
    gsub(/"/, "", $1);
    file="Assgmt_"$1"_"DATE".csv";
    print (a[file]++?"":h ORS) $0 > file
}
' Test_01012020.CSV
but then, how can I add a header and trailer with the above command?
I hope this helps you,
awk -v DATE="$(date +"%d%m%Y")" -F\, '
BEGIN{OFS=","}
NR==1 {h=$0; next}
{
    gsub(/"/, "", $1);
    file="Assgmt_"$1"_"DATE".csv";
    print (a[file]++?"":DATE ORS h ORS) $0 > file
}
END{for(file in a) print "EOF" > file}
' Test_01012020.CSV
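To see what this idiom produces, here is a self-contained demo on invented sample data ("id,val", the rows, and the fixed DATE are made up; only the awk logic comes from the answer above). It runs in a throwaway directory.

```shell
#!/bin/bash
# Demo of the header/trailer idiom on made-up sample data.
set -e
cd "$(mktemp -d)"
printf 'id,val\n"A",1\n"B",2\n"A",3\n' > Test.CSV
awk -v DATE="01012020" -F, '
BEGIN{OFS=","}
NR==1 {h=$0; next}
{
    gsub(/"/, "", $1)
    file = "Assgmt_" $1 "_" DATE ".csv"
    print (a[file]++ ? "" : DATE ORS h ORS) $0 > file
}
END{for (file in a) print "EOF" > file}
' Test.CSV
cat Assgmt_A_01012020.csv
```

Each split file starts with the DATE line and the original header (printed only on the first record for that file, via a[file]++), and the END block appends an "EOF" trailer to every file opened.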

AWK Include Whitespaces in Command

I have the string "./Delivery Note.doc 1", where:
$1 = ./Delivery
$2 = Note.doc
$3 = 1
I need to execute the sum command concatenating $1 and $2 but keeping the white space (./Delivery Note.doc). I tried this, but it trims the whitespace:
| '{ command="sum -r "$1 $2"
Result: ./DeliveryNote.doc
To execute the sum command
echo "./Delivery Note.doc 1" | awk '{ command="sum -r \""$1" "$2"\""; print command}' | bash
$ echo "./Delivery Note.doc 1" | awk '{ command="sum -r "$1" "$2; print command}'
sum -r ./Delivery Note.doc
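The essential point is that the command string handed to system() (or piped to bash) must itself contain quotes around the filename, so the inner shell sees one argument. A small sketch that only prints the command it would run:

```shell
#!/bin/bash
# Build the command string with embedded \" quotes so the shell invoked
# later sees "./Delivery Note.doc" as a single argument. We print the
# string instead of executing sum, to show exactly what would run.
set -e
out=$(echo "./Delivery Note.doc 1" |
  awk '{ printf "sum -r \"%s %s\"\n", $1, $2 }')
echo "$out"
```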

How to find the distinct values in unix

I need distinct values from the below columns:
AA|BB|CC
a@gmail.com,c@yahoo.co.in|a@gmail.com|a@gmail.com
y@gmail.com|x@yahoo.in,z@redhat.com|z@redhat.com
c@gmail.com|b@yahoo.co.in|c@uix.xo.in
Here records are '|' separated, and in the 1st column we can have two email ids which are ',' separated, so I want to consider that also. I want the distinct email ids in the AA, BB, CC columns, whether they are '|' separated or ',' separated.
Expected output:
c@yahoo.co.in|a@gmail.com|
y@gmail.com|x@yahoo.in|z@redhat.com
c@gmail.com|b@yahoo.co.in|c@uix.xo.in
is awk unix enough for you?
{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /@/) {
            mail[$i]++
        }
    }
}
END {
    for (x in mail) {
        print mail[x], x
    }
}
output:
$ awk -F'[|,]' -f v.awk f1
2 z@redhat.com
3 a@gmail.com
1 x@yahoo.in
1 c@yahoo.co.in
1 c@gmail.com
1 y@gmail.com
1 b@yahoo.co.in
Using awk:
cat file | tr ',' '|' | awk -F '|' '{ line=""; for (i=1; i<=NF; i++) {if ($i != "" && list[NR"#"$i] != 1){line=line $i "|"}; list[NR"#"$i]=1 }; print line}'
Prints:
a@gmail.com|c@yahoo.co.in|
y@gmail.com|x@yahoo.in|z@redhat.com|
c@gmail.com|b@yahoo.co.in|c@uix.xo.in|
Edit:
Now works properly with inputs such as:
a@gmail.com|c@yahoo.co.in|
y@gmail.com|x@yahoo.in|a@gmail.com|
c@gmail.com|c@yahoo.co.in|c@uix.xo.in|
Prints:
a@gmail.com|c@yahoo.co.in|
y@gmail.com|x@yahoo.in|a@gmail.com|
c@gmail.com|c@yahoo.co.in|c@uix.xo.in|
The following python code will solve your problem:
#!/usr/bin/env python
# Python 2 (raw_input and the print statement)
while True:
    try:
        addrs = raw_input()
    except EOFError:
        break
    print '|'.join(set(addrs.replace(',', '|').split('|')))
In Bash only:
while read s; do
    IFS='|,'
    for e in $s; do
        echo "$e"
    done | sort | uniq
    unset IFS
done
This seems to work, although I'm not sure what to do if there are more than three unique mails. Run with awk -f filename.awk dataname.dat
BEGIN {FS="[,|]"}
NF {
    delete uniqmails
    for (i = 1; i <= NF; i++)
        uniqmails[$i] = 1
    sep = ""
    n = 0
    for (m in uniqmails) {
        printf "%s%s", sep, m
        sep = "|"
        n++
    }
    for (; n < 3; n++) printf "|"
    print ""  # EOL
}
There's also this "one-liner" that doesn't need awk:
while read line; do
    echo $line | tr ",|" "\n" | sort -u |\
        paste <( seq 3) - | cut -f 2 |\
        tr "\n" "|" |\
        rev | cut -c 2- | rev;
done
With perl:
perl -lane '$s{$_}++ for split /[|,]/; END { print for keys %s;}' input
I have edited this post; hope it will work:
while read line
do
    val1=`echo $line|awk -F"|" '{print $1}'`
    val2=`echo $line|awk -F"|" '{print $2}'`
    val3=`echo $line|awk -F"|" '{print $3}'`
    a=`echo $line|awk -F"|" '{print $2,"|",$3}'|sed 's/'$val1'//g'`
    aa=`echo "$val1|$a"`
    b=`echo $aa|awk -F"|" '{print $1,"|",$3}'|sed 's/'$val2'//g'`
    b1=`echo $b|awk -F"|" '{print $1}'`
    b2=`echo $b|awk -F"|" '{print $2}'`
    bb=`echo "$b1|$val2|$b2"`
    c=`echo $bb|awk -F"|" '{print $1,"|",$2}'|sed 's/'$val3'//g'`
    cc=`echo "$c|$val3"|sed 's/,,/,/;s/,|/|/;s/|,/|/;s/^,//;s/ //g'`
    echo "$cc">>abcd
done<ab.dat
cat abcd
c@yahoo.co.in||a@gmail.com
y@gmail.com|x@yahoo.in|z@redhat.com
c@gmail.com|b@yahoo.co.in|c@uix.xo.in
You can treat the ',' separated values the same way, if all of your values are ',' separated.
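For completeness, a compact per-line awk variant (a sketch with invented sample addresses): it splits on both "|" and ",", keeps the first occurrence of each address on a line, and re-joins with "|". It relies on the widely supported `delete array` extension.

```shell
#!/bin/bash
# Sketch: per-line de-duplication of addresses, preserving first-seen
# order. The sample addresses are made up.
set -e
out=$(printf 'a@x|b@y,a@x|b@y\nc@z|c@z|d@w\n' |
  awk -F'[|,]' '{
    line = ""; delete seen
    for (i = 1; i <= NF; i++)
      if (!($i in seen)) { seen[$i]; line = line (line ? "|" : "") $i }
    print line
  }')
echo "$out"
```

Unlike the `for (m in uniqmails)` approach above, iterating fields in order keeps the output order deterministic.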

sub and gsub function?

I have this command:
$ find $PWD -name "*.jpg" | awk '{system( "echo " $(sub(/\//, "_")) ) }'
_home/mol/Pulpit/test/1.jpg
Now the same thing, but using gsub:
$ find $PWD -name "*.jpg" | awk '{system( "echo " $(gsub(/\//, "_")) ) }'
mol@mol:~
I want to get the result:
_home_mol_Pulpit_test_1.jpg
Thank you for your help.
EDIT:
I put 'echo' to test the command:
$ find $PWD -name "*.jpg" | awk '{gsub("/", "_")} {system( "echo " mv $0 " " $0) }'
_home_mol_Pulpit_test_1.jpg _home_pic_Pulpit_test_1.jpg
mol@mol:~
I want to get the result:
$ find $PWD -name "*.jpg" | awk '{gsub("/", "_")} {system( "echo " mv $0 " " $0) }'
/home/pic/Pulpit/test/1.jpg _home_pic_Pulpit_test_1.jpg
That won't work if the string contains more than one match... try this:
echo "/x/y/z/x" | awk '{ gsub("/", "_") ; system( "echo " $0) }'
or better (if the echo isn't a placeholder for something else):
echo "/x/y/z/x" | awk '{ gsub("/", "_") ; print $0 }'
In your case you want to make a copy of the value before changing it:
echo "/x/y/z/x" | awk '{ c=$0; gsub("/", "_", c) ; system( "echo " $0 " " c )}'
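The root cause here is that sub() and gsub() return the number of substitutions made (modifying the target string in place), so $(gsub(...)) indexes a field by that count rather than yielding the new string. A quick demonstration:

```shell
#!/bin/bash
# gsub() returns how many replacements it made; $0 is changed in place.
set -e
out=$(echo "/home/mol/1.jpg" | awk '{ n = gsub("/", "_"); print n, $0 }')
echo "$out"
```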
