How to run multiple "awk" commands - unix

I would like to run multiple "awk" commands in a single script.
For example, Master.csv.gz is located at /cygdrive/e/Test/Master.csv.gz, and
the input files are located in different subdirectories, like /cygdrive/f/Jan/Input_Jan.csv.gz, /cygdrive/f/Feb/Input_Feb.csv.gz, and so on.
All input files have the *.gz extension.
The commands below work fine when executed one by one:
Command#1
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) >>Output.txt
Output#1:
Name,Age,Location
abc,20,xxx
Command#2
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) >>Output.txt
Output#2:
Name,Age,Location
def,40,yyy
cat Output.txt
Name,Age,Location
abc,20,xxx
def,40,yyy
I have tried running the commands below via a single script, but got errors:
Attempt#1: awk -f Test.awk
cat Test.awk
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) >>Output.txt
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) >>Output.txt
Error : Attempt#1: awk -f Test.awk
awk: Test.awk:1: ^ invalid char ''' in expression
awk: Test.awk:1: ^ syntax error
Attempt#2: sh Test.sh
cat Test.sh
#!/bin/sh
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) >>Output.txt
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$2] = $0; next} ($2 in a) {print $0}' <(gzip -dc /cygdrive/e/Test/Master.csv.gz) <(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) >>Output.txt
Error : Attempt#2: sh Test.sh
Test.sh: line 2: syntax error near unexpected token `('
Desired Output:
Name,Age,Location
abc,20,xxx
def,40,yyy
Looking for your suggestions ..
Update #2 - Month Name
Ed Morton, thanks for the inputs; however, the output order is not proper: "Jan2014" is printed on the next line. Please suggest.
cat Output.txt:
Name,Age,Location
abc,20,xxx
Jan2014
def,40,yyy
Feb2014
Expected Output
Name,Age,Location
abc,20,xxx,Jan2014
def,40,yyy,Feb2014

All you need is:
#!/bin/bash
awk -F, 'FNR==NR{a[$2]; next} $2 in a' \
<(gzip -dc /cygdrive/e/Test/Master.csv.gz) \
<(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) \
<(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) \
>> Output.txt
If you want to print the month name too then the simplest thing would be:
#!/bin/bash
awk 'BEGIN{FS=OFS=","} FNR==NR{a[$2]; next} $2 in a{print $0, mth}' \
<(gzip -dc /cygdrive/e/Test/Master.csv.gz) \
mth="Jan" <(gzip -dc /cygdrive/f/Jan/Input_Jan.csv.gz) \
mth="Feb" <(gzip -dc /cygdrive/f/Feb/Input_Feb.csv.gz) \
>> Output.txt
but you could avoid spelling out each month name on every line by putting the months in an array:
#!/bin/bash
mths=(Jan Feb)
awk 'BEGIN{FS=OFS=","} FNR==NR{a[$2]; next} $2 in a{print $0, mth}' \
<(gzip -dc /cygdrive/e/Test/Master.csv.gz) \
mth="${mths[0]}" <(gzip -dc "/cygdrive/f/${mths[0]}/Input_${mths[0]}.csv.gz") \
mth="${mths[1]}" <(gzip -dc "/cygdrive/f/${mths[1]}/Input_${mths[1]}.csv.gz") \
>> Output.txt
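Alternatively, the per-month work can be a plain bash loop, which re-reads the master once per month but avoids any index juggling. This is a sketch: the sample rows and the mktemp directory layout below are invented so the script runs as-is; in practice substitute the real /cygdrive/... paths for "$base".

```shell
#!/bin/bash
# Sketch: loop over the months instead of listing every process
# substitution by hand. Sample data and tmp layout are made up;
# replace "$base" with the real /cygdrive/... paths in practice.
set -e
base=$(mktemp -d)
mkdir -p "$base/Jan" "$base/Feb"
printf 'Name,abc,Loc\n' | gzip > "$base/Master.csv.gz"
printf 'n1,abc,xxx\nn2,zzz,yyy\n' | gzip > "$base/Jan/Input_Jan.csv.gz"
printf 'n3,abc,qqq\n' | gzip > "$base/Feb/Input_Feb.csv.gz"

for mth in Jan Feb; do
  awk -v mth="$mth" 'BEGIN{FS=OFS=","} FNR==NR{a[$2]; next} $2 in a{print $0, mth}' \
    <(gzip -dc "$base/Master.csv.gz") \
    <(gzip -dc "$base/$mth/Input_${mth}.csv.gz")
done > "$base/Output.txt"
cat "$base/Output.txt"
```

The loop costs a little extra I/O on the master file, but each iteration is exactly the two-file awk idiom from the question.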

Your first attempt failed because you were trying to call awk in an awk script, and your second attempt failed because the bash process substitution, <(...), is not defined by POSIX, and is not guaranteed to work with /bin/sh. Here is an awk script that should work.
#!/usr/bin/awk -f
BEGIN {
    if (ARGC < 3) exit 1;
    ct = "cat ";
    gz = "gzip -dc ";
    # test the raw path; f below includes the added quotes, so /\.gz$/
    # would never match against it
    c = (ARGV[1] ~ /\.gz$/) ? gz : ct;
    f = "\"" ARGV[1] "\"";
    while ((c f | getline t) > 0) {
        split(t, a, ",");
        A[a[2]] = t;
    }
    close(c f);
    for (n = 2; n < ARGC; n++) {
        c = (ARGV[n] ~ /\.gz$/) ? gz : ct;
        f = "\"" ARGV[n] "\"";
        while ((c f | getline t) > 0) {
            split(t, a, ",");
            if (a[2] in A) print t;
        }
        close(c f);
    }
    exit;
}
Usage:
script.awk /cygdrive/e/Test/Master.csv.gz /cygdrive/f/Jan/Input_Jan.csv.gz
script.awk /cygdrive/e/Test/Master.csv.gz /cygdrive/f/Feb/Input_Feb.csv.gz
or
script.awk /cygdrive/e/Test/Master.csv.gz /cygdrive/f/Jan/Input_Jan.csv.gz \
    /cygdrive/f/Feb/Input_Feb.csv.gz

Related

UNIX shell script reading csv

I have a csv file. I would like to put the fields into different variables. Suppose there are three fields in each line of the csv file. I have this code:
csvfile=test.csv
while read inline; do
var1=`echo $inline | awk -F',' '{print $1}'`
var2=`echo $inline | awk -F',' '{print $2}'`
var3=`echo $inline | awk -F',' '{print $3}'`
.
.
.
done < $csvfile
This code is good. However, if a field contains an embedded comma, it does not work. Any suggestion? For example:
how,are,you
I,"am, very",good
this,is,"a, line"
This may not be the perfect solution, but it will work in your case.
[cloudera@quickstart Documents]$ cat cd.csv
a,b,c
d,"e,f",g
File content
csvfile=cd.csv
while read inline; do
    var1=`echo $inline | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "*", $i) }1' | awk -F',' '{print $1}' | sed 's/*/,/g'`
    var2=`echo $inline | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "*", $i) }1' | awk -F',' '{print $2}' | sed 's/*/,/g'`
    var3=`echo $inline | awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "*", $i) }1' | awk -F',' '{print $3}' | sed 's/*/,/g'`
    echo $var1 " " $var2 " " $var3
done < $csvfile
Output :
[cloudera@quickstart Documents]$ sh a.sh
a b c
d e,f g
So basically, we first handle the "," inside quoted fields by replacing it with "*", then extract the desired column with awk, and finally revert "*" back to "," to get the actual field value.
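For comparison, here is a sketch that leans on Python's csv module (assuming python3 is available): it already understands quoted fields, so the row can be re-emitted tab-separated and then split safely in the shell, with no replace-and-restore dance. The sample line is from the question.

```shell
#!/bin/bash
# Sketch, assuming python3 is installed: csv.reader handles quoted
# fields, so we re-emit the row tab-separated and split on tabs.
set -e
line='I,"am, very",good'
IFS=$'\t' read -r var1 var2 var3 < <(
  printf '%s\n' "$line" |
    python3 -c 'import csv,sys; print("\t".join(next(csv.reader(sys.stdin))))'
)
echo "$var1 / $var2 / $var3"
```

This also stays correct if a field happens to contain the "*" character, which would confuse the replace-and-restore approach.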

generate header and trailer after splitting files

This is the code I already use for splitting:
awk -v DATE="$(date +"%d%m%Y")" -F\, '
BEGIN{OFS=","}
NR==1 {h=$0; next}
{
    gsub(/"/, "", $1);
    file="Assgmt_"$1"_"DATE".csv";
    print (a[file]++?"":h ORS) $0 > file
}
' Test_01012020.CSV
but then, how can I add a header and trailer with the above command?
I hope this helps you,
awk -v DATE="$(date +"%d%m%Y")" -F\, '
BEGIN{OFS=","}
NR==1 {h=$0; next}
{
    gsub(/"/, "", $1);
    file="Assgmt_"$1"_"DATE".csv";
    print (a[file]++?"":DATE ORS h ORS) $0 > file
}
END{for(file in a) print "EOF" > file}
' Test_01012020.CSV
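To see what this idiom produces, here is a self-contained demo on invented sample data ("id,val", the rows, and the fixed DATE are made up; only the awk logic comes from the answer above). It runs in a throwaway directory.

```shell
#!/bin/bash
# Demo of the header/trailer idiom on made-up sample data.
set -e
cd "$(mktemp -d)"
printf 'id,val\n"A",1\n"B",2\n"A",3\n' > Test.CSV
awk -v DATE="01012020" -F, '
BEGIN{OFS=","}
NR==1 {h=$0; next}
{
    gsub(/"/, "", $1)
    file = "Assgmt_" $1 "_" DATE ".csv"
    print (a[file]++ ? "" : DATE ORS h ORS) $0 > file
}
END{for (file in a) print "EOF" > file}
' Test.CSV
cat Assgmt_A_01012020.csv
```

Each split file starts with the DATE line and the original header (printed only on the first record for that file, via a[file]++), and the END block appends an "EOF" trailer to every file opened.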

AWK Include Whitespaces in Command

I have the string "./Delivery Note.doc 1", where:
$1 = ./Delivery
$2 = Note.doc
$3 = 1
I need to execute the sum command concatenating $1 and $2 but keeping the white space (./Delivery Note.doc). I tried this, but it trims the whitespace:
| '{ command="sum -r "$1 $2"
Result: ./DeliveryNote.doc
To execute the sum command
echo "./Delivery Note.doc 1" | awk '{ command="sum -r \""$1" "$2"\""; print command}' | bash
$ echo "./Delivery Note.doc 1" | awk '{ command="sum -r "$1" "$2; print command}'
sum -r ./Delivery Note.doc
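The essential point is that the command string handed to system() (or piped to bash) must itself contain quotes around the filename, so the inner shell sees one argument. A small sketch that only prints the command it would run:

```shell
#!/bin/bash
# Build the command string with embedded \" quotes so the shell invoked
# later sees "./Delivery Note.doc" as a single argument. We print the
# string instead of executing sum, to show exactly what would run.
set -e
out=$(echo "./Delivery Note.doc 1" |
  awk '{ printf "sum -r \"%s %s\"\n", $1, $2 }')
echo "$out"
```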

How to find the distinct values in unix

I need distinct values from the below columns:
AA|BB|CC
a@gmail.com,c@yahoo.co.in|a@gmail.com|a@gmail.com
y@gmail.com|x@yahoo.in,z@redhat.com|z@redhat.com
c@gmail.com|b@yahoo.co.in|c@uix.xo.in
Here records are '|' separated, and in the 1st column we can have two email ids which are ',' separated, so I want to consider that also. I want the distinct email ids in the AA, BB, CC columns, whether they are '|' separated or ',' separated.
Expected output:
c@yahoo.co.in|a@gmail.com|
y@gmail.com|x@yahoo.in|z@redhat.com
c@gmail.com|b@yahoo.co.in|c@uix.xo.in
is awk unix enough for you?
{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /@/) {
            mail[$i]++
        }
    }
}
END {
    for (x in mail) {
        print mail[x], x
    }
}
output:
$ awk -F'[|,]' -f v.awk f1
2 z@redhat.com
3 a@gmail.com
1 x@yahoo.in
1 c@yahoo.co.in
1 c@gmail.com
1 y@gmail.com
1 b@yahoo.co.in
Using awk:
cat file | tr ',' '|' | awk -F '|' '{ line=""; for (i=1; i<=NF; i++) {if ($i != "" && list[NR"#"$i] != 1){line=line $i "|"}; list[NR"#"$i]=1 }; print line}'
Prints:
a@gmail.com|c@yahoo.co.in|
y@gmail.com|x@yahoo.in|z@redhat.com|
c@gmail.com|b@yahoo.co.in|c@uix.xo.in|
Edit:
Now works properly with inputs such as:
a@gmail.com|c@yahoo.co.in|
y@gmail.com|x@yahoo.in|a@gmail.com|
c@gmail.com|c@yahoo.co.in|c@uix.xo.in|
Prints:
a@gmail.com|c@yahoo.co.in|
y@gmail.com|x@yahoo.in|a@gmail.com|
c@gmail.com|c@yahoo.co.in|c@uix.xo.in|
The following python code will solve your problem:
#!/usr/bin/env python
# Python 2 (raw_input and the print statement)
while True:
    try:
        addrs = raw_input()
    except EOFError:
        break
    print '|'.join(set(addrs.replace(',', '|').split('|')))
In Bash only:
while read s; do
    IFS='|,'
    for e in $s; do
        echo "$e"
    done | sort | uniq
    unset IFS
done
This seems to work, although I'm not sure what to do if there are more than three unique mails. Run with awk -f filename.awk dataname.dat
BEGIN {FS="[,|]"}
NF {
    delete uniqmails
    for (i = 1; i <= NF; i++)
        uniqmails[$i] = 1
    sep = ""
    n = 0
    for (m in uniqmails) {
        printf "%s%s", sep, m
        sep = "|"
        n++
    }
    for (; n < 3; n++) printf "|"
    print ""  # EOL
}
There's also this "one-liner" that doesn't need awk:
while read line; do
    echo $line | tr ",|" "\n" | sort -u |\
        paste <( seq 3) - | cut -f 2 |\
        tr "\n" "|" |\
        rev | cut -c 2- | rev;
done
With perl:
perl -lane '$s{$_}++ for split /[|,]/; END { print for keys %s;}' input
I have edited this post; hope it will work:
while read line
do
    val1=`echo $line|awk -F"|" '{print $1}'`
    val2=`echo $line|awk -F"|" '{print $2}'`
    val3=`echo $line|awk -F"|" '{print $3}'`
    a=`echo $line|awk -F"|" '{print $2,"|",$3}'|sed 's/'$val1'//g'`
    aa=`echo "$val1|$a"`
    b=`echo $aa|awk -F"|" '{print $1,"|",$3}'|sed 's/'$val2'//g'`
    b1=`echo $b|awk -F"|" '{print $1}'`
    b2=`echo $b|awk -F"|" '{print $2}'`
    bb=`echo "$b1|$val2|$b2"`
    c=`echo $bb|awk -F"|" '{print $1,"|",$2}'|sed 's/'$val3'//g'`
    cc=`echo "$c|$val3"|sed 's/,,/,/;s/,|/|/;s/|,/|/;s/^,//;s/ //g'`
    echo "$cc">>abcd
done<ab.dat
cat abcd
c@yahoo.co.in||a@gmail.com
y@gmail.com|x@yahoo.in|z@redhat.com
c@gmail.com|b@yahoo.co.in|c@uix.xo.in
You can treat the ',' separated values the same way, if all of your values are ',' separated.
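For completeness, a compact per-line awk variant (a sketch with invented sample addresses): it splits on both "|" and ",", keeps the first occurrence of each address on a line, and re-joins with "|". It relies on the widely supported `delete array` extension.

```shell
#!/bin/bash
# Sketch: per-line de-duplication of addresses, preserving first-seen
# order. The sample addresses are made up.
set -e
out=$(printf 'a@x|b@y,a@x|b@y\nc@z|c@z|d@w\n' |
  awk -F'[|,]' '{
    line = ""; delete seen
    for (i = 1; i <= NF; i++)
      if (!($i in seen)) { seen[$i]; line = line (line ? "|" : "") $i }
    print line
  }')
echo "$out"
```

Unlike the `for (m in uniqmails)` approach above, iterating fields in order keeps the output order deterministic.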

sub and gsub function?

I have this command:
$ find $PWD -name "*.jpg" | awk '{system( "echo " $(sub(/\//, "_")) ) }'
_home/mol/Pulpit/test/1.jpg
Now the same thing, but using gsub:
$ find $PWD -name "*.jpg" | awk '{system( "echo " $(gsub(/\//, "_")) ) }'
mol@mol:~
I want to get the result:
_home_mol_Pulpit_test_1.jpg
Thank you for your help.
EDIT:
I put 'echo' to test the command:
$ find $PWD -name "*.jpg" | awk '{gsub("/", "_")} {system( "echo " mv $0 " " $0) }'
_home_mol_Pulpit_test_1.jpg _home_pic_Pulpit_test_1.jpg
mol@mol:~
I want to get the result:
$ find $PWD -name "*.jpg" | awk '{gsub("/", "_")} {system( "echo " mv $0 " " $0) }'
/home/pic/Pulpit/test/1.jpg _home_pic_Pulpit_test_1.jpg
That won't work if the string contains more than one match... try this:
echo "/x/y/z/x" | awk '{ gsub("/", "_") ; system( "echo " $0) }'
or better (if the echo isn't a placeholder for something else):
echo "/x/y/z/x" | awk '{ gsub("/", "_") ; print $0 }'
In your case you want to make a copy of the value before changing it:
echo "/x/y/z/x" | awk '{ c=$0; gsub("/", "_", c) ; system( "echo " $0 " " c )}'
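The root cause here is that sub() and gsub() return the number of substitutions made (modifying the target string in place), so $(gsub(...)) indexes a field by that count rather than yielding the new string. A quick demonstration:

```shell
#!/bin/bash
# gsub() returns how many replacements it made; $0 is changed in place.
set -e
out=$(echo "/home/mol/1.jpg" | awk '{ n = gsub("/", "_"); print n, $0 }')
echo "$out"
```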
