Need some help with an awk command - Unix

I need help with awk. I'm reading a CSV file and doing some substitution on some of the columns. The 9th column (string type) should be replaced by the value of (the 9th column itself + the value of the 4th column (integer)), then the 15th column by $15+$12, and the 26th column by $26+$23. The same has to be done line by line for all the records. Suggestions please.
Below is the sample I/O; the first line, which is the description (header), must be left as is.
sample Input
EmpID|Empname|Empadd|roleId|roleDesc|Dept
100|mst|Del|20|SD|DA
101|ms|Del|21|XS|DA
Sample output
EmpID|Empname|Empadd|roleId|roleDesc|Dept
100|mst100|Del|20|SD20|DA
101|ms101|Del|21|XS21|DA
It's like Empname has been concatenated with EmpID, and roleDesc with roleId. Hope that's helpful :)

This will perform the needed transformation:
$ awk 'NR>1{$2=$2$1;$5=$5$4}1' FS='|' OFS='|' file
EmpID|Empname|Empadd|roleId|roleDesc|Dept
100|mst100|Del|20|SD20|DA
101|ms101|Del|21|XS21|DA
If you have to do this for many columns you can use a for loop like so (provided the column numbers follow an arithmetic or geometric stepsize):
$ awk 'NR>1{for(i=2;i<=5;i+=3)$i=$i$(i-1)}1' FS='|' OFS='|' file
EmpID|Empname|Empadd|roleId|roleDesc|Dept
100|mst100|Del|20|SD20|DA
101|ms101|Del|21|XS21|DA
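For the question's actual columns the offsets are not uniform (9 gets column 4, 15 gets column 12, 26 gets column 23), so a fixed stepsize won't work. One way, sketched here with a hypothetical pairs variable listing target:source columns, is to parse the pairs once in BEGIN:
$ awk -v pairs='9:4,15:12,26:23' 'BEGIN{n=split(pairs,p,",")} NR>1{for(j=1;j<=n;j++){split(p[j],c,":");$(c[1])=$(c[1])$(c[2])}} 1' FS='|' OFS='|' file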

When you say +, I'm assuming you mean string concatenation. In awk there is no specific concatenation operator; you just put two strings side by side.
awk -F, -v OFS=, '{$9 = $9 $4; $15=$15$12; $26=$26$23; print}' file.csv
I'm also assuming that by "csv" you actually mean comma-separated.
If you want to edit the file in-place, you need to do this:
awk ... file.csv > newfile && mv file.csv file.csv.bak && mv newfile file.csv
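If you have GNU awk 4.1 or later, its inplace extension can handle the temp-file shuffle for you; a sketch (note: no backup copy is kept unless you arrange one yourself):
gawk -i inplace -F, -v OFS=, '{$9 = $9 $4; $15=$15$12; $26=$26$23; print}' file.csv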
Edit: to leave the first line untouched:
awk -F, -v OFS=, 'NR>1 {$9 = $9 $4; $15=$15$12; $26=$26$23} {print}' file.csv
Now the columns are modified for the 2nd and subsequent lines, but every line is printed.
You'll sometimes see that written this way:
awk -F, -v OFS=, 'NR>1 {$9 = $9 $4; $15=$15$12; $26=$26$23} 1' file.csv
The trailing 1 is a pattern that is always true; with no action supplied, awk applies its default action, print $0.

Related

How can I identify lines from a delimited file, based on a lookup file in unix

Assume that there are two files
File1 - lookup.txt
CAN
USD
INR
EUR
Another file Input.txt
1~Canada~CAN
2~United States of America~USD
3~Brazil~BRL
Both files may be very huge, hypothetically several thousand records. Now I'm trying to filter the records in Input.txt based on the values in the lookup file.
The expected output should be
1~Canada~CAN
2~United States of America~USD
I tried to do something like below
#!/bin/sh
lookupFile=$1 #lookup.txt
inputFile=$2 #input.txt
outputFile=$3 #output.txt
while IFS= read -r line
do
awk -F'~' '{if ($3==$line) print >> $outputFile}' $inputFile
done < "$lookupFile"
But I'm getting an error like:
awk: cmd. line:1: (FILENAME=input.txt FNR=2) fatal: can't redirect to
How can I fix this issue? Also, if the files are really huge, with several thousand records to search, is this an efficient way?
With your shown samples, please try the following awk code. We can do this in a single awk; we just need to take care of setting the field separator to ~ before input.txt.
awk 'FNR==NR{arr[$0];next} ($3 in arr)' lookup.txt FS="~" input.txt
Explanation:
awk ' ##starting awk program from here.
FNR==NR{ ##Checking condition which will be TRUE when lookup.txt is being read.
arr[$0] ##Creating array arr with $0 as index.
next ##next to skip all further statements from here.
}
($3 in arr) ##If $3 is present in arr then print that line.
' lookup.txt FS="~" input.txt ##Mentioning Input_files and setting FS to ~ before input.txt
A non-awk solution that you could compare against from a performance point of view:
$ grep -wFf lookup.txt input.txt
1~Canada~CAN
2~United States of America~USD
Warning: this does not match only on the last field. So if some values in lookup.txt can also be found elsewhere in input.txt, prefer another solution. Or, if lookup.txt contains nothing that could be interpreted as a regular expression operator, preprocess it before grep. Example with bash, sed and grep:
$ grep -f <( sed 's/.*/~&$/' lookup.txt ) input.txt
1~Canada~CAN
2~United States of America~USD

awk: write to file with column separator

I am reading a file and writing the first 2 columns into an output file.
I want to write with "|" as the column separator.
I tried with
awk -F"," -OFS"|" '{print $1 , $2}' filename
The output file doesn't have | separator
Thanks
Pratik
It will not print the | because you didn't write it properly; -OFS"|" is not a valid way to set the output field separator. Following are two ways to set OFS in any awk program.
1st way: By using -v OFS="|" mention it as a variable.
awk -F"," -v OFS="|" '{print $1,$2}' filename
2nd way: use the BEGIN section of awk to set it (which is the recommended approach, too).
awk 'BEGIN{FS=",";OFS="|"}{print $1,$2}' filename
3rd way: as per ghoti's comment, one more way of assigning a value to OFS: assign it in the argument list, before the input file names. By doing this we can set different OFS values for different input files (awk can read multiple input files, so this helps in those situations). E.g.:
awk '{print $1,$2}' FS="," OFS="|" Input_file1 FS=":" OFS=";" Input_file2
In the above command, FS is , and OFS is | for Input_file1, and FS is : and OFS is ; for Input_file2. Thanks to ghoti for mentioning this in the comments :)
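One gotcha worth knowing: OFS is only inserted where you write a comma in print (or when a field assignment forces awk to rebuild the record). Putting two fields side by side concatenates them with nothing in between:
$ echo 'a,b,c' | awk -F, -v OFS='|' '{print $1,$2}'
a|b
$ echo 'a,b,c' | awk -F, -v OFS='|' '{print $1 $2}'
ab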

Combining two awk commands into a single command

I want to combine these two commands and invoke them as a single command.
In the first command I am storing the 4th column of x.CSV (separator: ,) in the z.csv file.
awk -F, '{print $4}' x.CSV > z.csv
In the second command, I want to find the unique first-column values of z.csv (separator: space).
awk -F\ '{print $1}' z.csv|sort|uniq
How can I combine these two commands into a single one?
Pipe the output of the first awk to the second awk:
awk -F, '{print $4}' x.CSV | awk -F\ '{print $1}' |sort|uniq
or, as Avinash Raj suggested,
awk -F, '{print $4}' x.CSV | awk -F\ '{print $1}' | sort -u
If the content of z.csv is actually wanted, rather than just being an artefact of the way you're currently implementing your program, you can use:
awk -F, '{ print $4 > "z.csv"
split($4, f, " ")
f4[f[1]] = 1
}
END { for (i in f4) print i }' x.CSV
The split function breaks field 4 on spaces, and (associative) array f4 records the key value. The loop at the end prints out the distinct values, unsorted. If you need them sorted, you can either use GNU awk's built-in sort functions or (if you don't have an awk with built-in sort functions) write your own in awk, or pipe the output to sort.
With GNU awk, you can replace the END block with:
END { asorti(f4); for (i in f4) print f4[i] }
If you don't want the z.csv file, then (a) you could have used a pipe in the first place, and (b) you can simply remove the print $4 > "z.csv" line.
awk '{split($4,b," "); a[b[1]]=1} END { for( i in a) print i }' FS=, x.CSV
This does not sort the data, but it's not clear if you actually want it sorted or merely needed that to get unique entries. If you do want it sorted, pipe it to sort.
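If you do actually want the output sorted, just append a sort, e.g.:
awk '{split($4,b," "); a[b[1]]=1} END { for( i in a) print i }' FS=, x.CSV | sort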

AWK: Unzipping and printing file name and first line

I am trying to unzip the files in a folder and print the first line and the LASTMODIFIEDDATE line.
But the below prints the first line with '-' as the filename (awk is reading from a pipe, so FILENAME is just -).
for file in /export/home/xxxxxx/New_folder/*.gz;
do
gzip -dc "$file" | awk 'NR=1 {print $0, FILENAME}' | awk '/LASTMODIFIEDDATE/'
done
1. How can I modify the above code to print the name of the file being unzipped?
2. I am a beginner, and suggestions to improve the above code are welcome.
A few issues:
Your first awk should have double equals signs if you mean to address the first line:
awk 'NR==1{...}'
(As written, NR=1 assigns 1 to NR; the assignment is always true, so the action runs for every line.)
Your second awk will only ever see the output of the first awk, which only shows the first line, so you will not see any lines with LASTMODIFIED in them unless they are the first. So this will show you the first line and any lines containing LASTMODIFIED.
for ...
do
  echo "$file"
  gzip -dc "$file" | awk 'NR==1 || /LASTMODIFIED/'
done
Or you may mean this:
for ...
do
  gzip -dc "$file" | awk -v file="$file" 'NR==1{print $0 " " file} /LASTMODIFIED/'
done
which will print the first line followed by the filename and also any lines containing LASTMODIFIED.
Print the filename with an echo. Also, you might want to use grep instead of awk in this case.
for file in /export/home/xxxxxx/New_folder/*.gz
do
  echo "$file"
  gzip -dc "$file" | grep LASTMODIFIEDDATE
done
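If your system provides zgrep (a wrapper that runs grep on gzip'd files; common on Linux, though option pass-through varies by platform, so treat this as a sketch), the loop can shrink to a single line, with -H printing each matching file's name:
zgrep -H LASTMODIFIEDDATE /export/home/xxxxxx/New_folder/*.gz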

Count number of blank lines in a file

In "count (non-blank) lines-of-code in bash" they explain how to count the number of non-empty lines.
But is there a way to count the number of blank lines in a file? By blank line I also mean lines that contain only whitespace.
Another way is:
grep -cvP '\S' file
-P '\S' (Perl regex) will match any line that contains a non-space character
-v select non-matching lines
-c print a count of matching lines
If your grep doesn't support the -P option, use -E '[^[:space:]]' instead.
One way using grep:
grep -c "^$" file
Or with whitespace:
grep -c "^\s*$" file
You can also use awk for this:
awk '!NF {sum += 1} END {print sum}' file
From the manual: "The variable NF is set to the total number of fields in the input record". Since the default field separator is whitespace, any line consisting of nothing or only spaces/tabs will have NF=0.
Then, it is a matter of counting how many times this happens.
Test
$ cat a
aa dd

ddd

	
he	llo
$ cat -vet a # -vet shows tabs as ^I and line ends as $
aa dd$
$
ddd$
$
^I$
he^Illo$
Now let's count the number of blank lines:
$ awk '!NF {s+=1} END {print s}' a
3
grep -v '\S' file | wc -l
(On OS X the Perl expressions are not available; there is no -P option.)
grep -cx '\s*' file
or
grep -cx '[[:space:]]*' file
With -x the pattern must match the entire line, and -c counts the matching lines. That is faster than the code in Steve's answer.
Using Perl one-liner:
perl -lne '$count++ if /^\s*$/; END { print int $count }' input.file
To count how many useless blank lines your colleague has inserted in a project you can launch a one-liner like this:
blankLinesTotal=0; for file in $( find . -name "*.cpp" ); do blankLines=$(grep -cvE '\S' "$file"); blankLinesTotal=$((blankLines + blankLinesTotal)); echo "$file has $blankLines empty lines."; done; echo "Total: $blankLinesTotal"
This prints:
<filename0>.cpp #blankLines
....
....
<filenameN>.cpp #blankLines
Total #blankLinesTotal
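A variant that is safe for filenames containing spaces (though it assumes no colons in the paths, since it splits grep's file:count output on :):
find . -name '*.cpp' -exec grep -cH '^[[:space:]]*$' {} \; | awk -F: '{print $1, "has", $2, "empty lines."; total+=$2} END {print "Total:", total}'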
