Unix- using awk to print in table format - unix

I want to create a table that looks something like this, where i display the # at the front of every $2 and the Title header but I do not want to store it in my file:
Name ID Gender
John Smith #4567734D Male
Kyle Toh #2437882L Male
Julianne .T #0324153T Female
I got a result like this instead:
Name ID Gender
#
Name ID Gender
John Smith #4567734D Male
Name ID Gender
Kyle Toh #2437882L Male
Name ID Gender
Julianne .T #0324153T Female
By using this command:
awk -F: ' {print "Name:ID:Gender\n"}
{print $1":#"$2":"$3}' data.txt |column -s ":" -t

Print the headers only in the first line:
awk -F: 'NR==1 {print "Name:ID:Gender\n"} {print $1":#"$2":"$3}'
or:
awk -F: 'BEGIN {print "Name:ID:Gender\n"} {print $1":#"$2":"$3}'
Explanation:
An awk program consists of expressions in the form of:
CONDITION { ACTIONS } [ CONDITION { ACTIONS }, ... ]
... where CONDITION and ACTION optional. If CONDITION is missing, like:
{ ACTIONS }
... then ACTIONS would be applied to any line of input. In the case above we have two expressions:
# Only executed if NR==1
NR==1 {print "Name:ID:Gender\n"}
# CONDITION is missing. ACTIONS will be applied to any line of input
{print $1":#"$2":"$3}

Related

Transposing multiple columns in multiple rows keeping one column fixed in Unix

I have one file that looks like below
1234|A|B|C|10|11|12
2345|F|G|H|13|14|15
3456|K|L|M|16|17|18
I want the output as
1234|A
1234|B
1234|C
2345|F
2345|G
2345|H
3456|K
3456|L
3456|M
I have tried with the below script.
awk -F"|" '{print $1","$2","$3","$4"}' file.dat | awk -F"," '{OFS=RS;$1=$1}1'
But the output is generated as below.
1234
A
B
C
2345
F
G
H
3456
K
L
M
Any help is appreciated.
What about a single simple awk process such as this:
$ awk -F\| '{print $1 "|" $2 "\n" $1 "|" $3 "\n" $1 "|" $4}' file.dat
1234|A
1234|B
1234|C
2345|F
2345|G
2345|H
3456|K
3456|L
3456|M
No messing with RS and OFS.
If you want to do this dynamically, then you could pass in the number of fields that you want, and then use a loop starting from the second field.
In the script, you might first check if the number of fields is equal or greater than the number you pass into the script (in this case n=4)
awk -F\| -v n=4 '
NF >= n {
for(i=2; i<=n; i++) print $1 "|" $i
}
' file
Output
1234|A
1234|B
1234|C
2345|F
2345|G
2345|H
3456|K
3456|L
3456|M
# perl -lne'($a,#b)=((split/\|/)[0..3]);foreach (#b){print join"|",$a,$_}' file.dat
1234|A
1234|B
1234|C
2345|F
2345|G
2345|H
3456|K
3456|L
3456|M

awk $4 column if column = value with characters thereafter

I have a file with the following data within for example:
20 V 70000003d120f88 1 2
20 V 70000003d120f88 2 2
20x00 V 70000003d120f88 2 2
10020 V 70000003d120f88 1 5
I want to get the sum of the 4th column data.
Using the the below command, I can acheive this, however the row 20x00 is excluded. I want to everything to start with 20 must be sumed and nothing before that, so 20* for example:
cat testdata.out | awk '{if ($1 == '20') print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
The output value must be:
5
How can I achieve this using awk. The below I attempted also does not work:
cat testdata.out | awk '$1 ~ /'20'/ {print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
There is no need to use 3 processes, anything can be done by one AWK process. Check it out:
awk '$1 ~ /^20/ { a+=$4 } END { print a }' testdata.out
explanation:
$1 ~ /^20/ checks to see if $1 starts with 20
if yes, we add $4 in the variable a
finally, we print the variable a
result 5
EDIT:
Ed Morton rightly points out that the result should always be of the same type, which can be solved by adding 0 to the result.
You can set the exit status if it is necessary to distinguish whether the result 0 is due to no matches
(output status 0) or matching only zero values ​​(output status 1).
The exit code for different input data can be checked e.g. echo $?
The code would look like this:
awk '$1 ~ /^20/ { a+=$4 } END { print a+0; exit(a!="") }' testdata.out
Figured it out:
cat testdata.out | awk '$1 ~ /'^20'/ {print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
The above might not work for all cases, but below will suffice:
i=20
cat testdata.out | awk '{if ($1 == "'"$i"'" || $1 == ""'"${i}"'"x00") print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'

Is there way to extract all the duplicate records based on a particular column?

I'm trying to extract all (only) the duplicate values from a pipe delimited file.
My data file has 800 thousands rows with multiple columns and I'm particularly interested about column 3. So I need to get the duplicate values of column 3 and extract all the duplicate rows from that file.
I'm, however able to achieve this as shown below..
cat Report.txt | awk -F'|' '{print $3}' | sort | uniq -d >dup.txt
and I take the above in loop as shown below..
while read dup
do
grep "$dup" Report.txt >>only_dup.txt
done <dup.txt
I've also tried the awk method
while read dup
do
awk -v a=$dup '$3 == a { print $0 }' Report.txt>>only_dup.txt
done <dup.txt
But, as I have large number of records in the file, it's taking ages to complete. So I'm looking for an easy and quick alternative.
For example, I have data like this:
1|learning|Unix|Business|Requirements
2|learning|Unix|Business|Team
3|learning|Linux|Business|Requirements
4|learning|Unix|Business|Team
5|learning|Linux|Business|Requirements
6|learning|Unix|Business|Team
7|learning|Windows|Business|Requirements
8|learning|Mac|Business|Requirements
And my expected output which doesn't include unique records:
1|learning|Unix|Business|Requirements
2|learning|Unix|Business|Team
4|learning|Unix|Business|Team
6|learning|Unix|Business|Team
3|learning|Linux|Business|Requirements
5|learning|Linux|Business|Requirements
This may be what you want:
$ awk -F'|' 'NR==FNR{cnt[$3]++; next} cnt[$3]>1' file file
1|learning|Unix|Business|Requirements
2|learning|Unix|Business|Team
3|learning|Linux|Business|Requirements
4|learning|Unix|Business|Team
5|learning|Linux|Business|Requirements
6|learning|Unix|Business|Team
or if the file's too large for all the keys ($3 values) to fit in memory (which shouldn't be a problem with just the unique $3 values from 800,000 lines):
$ cat tst.awk
BEGIN { FS="|" }
{ currKey = $3 }
currKey == prevKey {
if ( !prevPrinted++ ) {
print prevRec
}
print
next
}
{
prevKey = currKey
prevRec = $0
prevPrinted = 0
}
$ sort -t'|' -k3,3 file | awk -f tst.awk
3|learning|Linux|Business|Requirements
5|learning|Linux|Business|Requirements
1|learning|Unix|Business|Requirements
2|learning|Unix|Business|Team
4|learning|Unix|Business|Team
6|learning|Unix|Business|Team
EDIT2: As per Ed sir's suggestion fine tuned my suggestion with more meaningful names(IMO) of arrays.
awk '
match($0,/[^\|]*\|/){
val=substr($0,RSTART+RLENGTH)
if(!unique_check_count[val]++){
numbered_indexed_array[++count]=val
}
actual_valued_array[val]=(actual_valued_array[val]?actual_valued_array[val] ORS:"")$0
line_count_array[val]++
}
END{
for(i=1;i<=count;i++){
if(line_count_array[numbered_indexed_array[i]]>1){
print actual_valued_array[numbered_indexed_array[i]]
}
}
}
' Input_file
Edit by Ed Morton: FWIW here's how I'd have named the variables in the above code:
awk '
match($0,/[^\|]*\|/) {
key = substr($0,RSTART+RLENGTH)
if ( !numRecs[key]++ ) {
keys[++numKeys] = key
}
key2recs[key] = (key in key2recs ? key2recs[key] ORS : "") $0
}
END {
for ( keyNr=1; keyNr<=numKeys; keyNr++ ) {
key = keys[keyNr]
if ( numRecs[key]>1 ) {
print key2recs[key]
}
}
}
' Input_file
EDIT: Since OP changed Input_file with |delimited so changing code a bit to as follows, which deals with new Input_file(Thanks to Ed Morton sir for pointing it out).
awk '
match($0,/[^\|]*\|/){
val=substr($0,RSTART+RLENGTH)
if(!a[val]++){
b[++count]=val
}
c[val]=(c[val]?c[val] ORS:"")$0
d[val]++
}
END{
for(i=1;i<=count;i++){
if(d[b[i]]>1){
print c[b[i]]
}
}
}
' Input_file
Could you please try following, following will give output in same sequence of in which lines are occurring in Input_file.
awk '
match($0,/[^ ]* /){
val=substr($0,RSTART+RLENGTH)
if(!a[val]++){
b[++count]=val
}
c[val]=(c[val]?c[val] ORS:"")$0
d[val]++
}
END{
for(i=1;i<=count;i++){
if(d[b[i]]>1){
print c[b[i]]
}
}
}
' Input_file
Output will be as follows.
2 learning Unix Business Team
4 learning Unix Business Team
6 learning Unix Business Team
3 learning Linux Business Requirements
5 learning Linux Business Requirements
Explanation for above code:
awk ' ##Starting awk program here.
match($0,/[^ ]* /){ ##Using match function of awk which matches regex till first space is coming.
val=substr($0,RSTART+RLENGTH) ##Creating variable val whose value is sub-string is from starting point of RSTART+RLENGTH value to till end of line.
if(!a[val]++){ ##Checking condition if value of array a with index val is NULL then go further and increase its index too.
b[++count]=val ##Creating array b whose index is increment value of variable count and value is val variable.
} ##Closing BLOCK for if condition of array a here.
c[val]=(c[val]?c[val] ORS:"")$0 ##Creating array named c whose index is variable val and value is $0 along with keep concatenating its own value each time it comes here.
d[val]++ ##Creating array named d whose index is variable val and its value is keep increasing with 1 each time cursor comes here.
} ##Closing BLOCK for match here.
END{ ##Starting END BLOCK section for this awk program here.
for(i=1;i<=count;i++){ ##Starting for loop from i=1 to till value of count here.
if(d[b[i]]>1){ ##Checking if value of array d with index b[i] is greater than 1 then go inside block.
print c[b[i]] ##Printing value of array c whose index is b[i].
}
}
}
' Input_file ##Mentioning Input_file name here.
Another in awk:
$ awk -F\| '{ # set delimiter
n=$1 # store number
sub(/^[^|]*/,"",$0) # remove number from string
if($0 in a) { # if $0 in a
if(a[$0]==1) # if $0 seen the second time
print b[$0] $0 # print first instance
print n $0 # also print current
}
a[$0]++ # increase match count for $0
b[$0]=n # number stored to b and only needed once
}' file
Output for the sample data:
2|learning|Unix|Business|Team
4|learning|Unix|Business|Team
3|learning|Linux|Business|Requirements
5|learning|Linux|Business|Requirements
6|learning|Unix|Business|Team
Also, would this work:
$ sort -k 2 file | uniq -D -f 1
or -k2,5 or smth. Nope, as the delimiter changed from space to pipe.
Two steps of improvement.
First step:
After
awk -F'|' '{print $3}' Report.txt | sort | uniq -d >dup.txt
# or
cut -d "|" -f3 < Report.txt | sort | uniq -d >dup.txt
you can use
grep -f <(sed 's/.*/^.*|.*|&|.*|/' dup.txt) Report.txt
# or without process substitution
sed 's/.*/^.*|.*|&|.*|/' dup.txt > dup.sed
grep -f dup.sed Report.txt
Second step:
Use awk as given in other, better, answers.

compare two fields from two different files using awk

I have two files where I want to compare certain fields and produce the output
I have a variable as well
echo ${CURR_SNAP}
123
File1
DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|RSCNAME1
DOMAIN2|USER2|LE2|ORG2|ACCES2|RSCTYPE2|RSCNAME2
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|RSCNAME3
DOMAIN4|USER4|LE4|ORG4|ACCES4|RSCTYPE4|RSCNAME4
File2
ORG1|PRGPATH1
ORG3|PRGPATH3
ORG5|PRGPATH5
ORG6|PRGPATH6
ORG7|PRGPATH7
The output I am expecting as below where the last column is CURR_SNAP value and the matching will be 4th column of File1 should be matched with 1st column of File2
DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|123
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|123
I tried with the below code piece but looks like I am not doing it correctly
awk -v CURRSNAP="${CURR_SNAP}" '{FS="|"} NR==FNR {x[$0];next} {if(x[$1]==$4) print $1"|"$2"|"$3"|"$4"|"$5"|"$6"|"CURRSNAP}' File2 File1
With awk:
#! /bin/bash
CURR_SNAP="123"
awk -F'|' -v OFS='|' -v curr_snap="$CURR_SNAP" '{
if (FNR == NR)
{
# this stores the ORG* as an index
# here you can store other values if needed
orgs_arr[$1]=1
}
else if (orgs_arr[$4] == 1)
{
# overwrite $7 to contain CURR_SNAP value
$7=curr_snap
print
}
}' file2 file1
As in your expected output, you didn't output RSCNAME*, so I have overwritten $7(which is column for RSCNAME*) with $CURR_SNAP. If you want to display RSCNAME* column aswell, remove $7=curr_snap and change print statement to print $0, curr_snap.
I wouldn't use awk at all. This is what join(1) is meant for (Plus sed to append the extra column:
$ join -14 -21 -t'|' -o 1.1,1.2,1.3,1.4,1.5,1.6 File1 File2 | sed "s/$/|${CURR_SNAP}/"
DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|123
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|123
It does require that the files be sorted based on the common field, like your examples are.
You can do this with awk with two-rules. For the first file (where NR==FNR), simply use string concatenation to append the fields 1 - (NF-1) assigning the concatenated result to an array indexed by $4. Then for the second file (where NR>FNR) in rule two test if array[$1] has content and if so, output the array and append "|"CURR_SNAP (with CURR_SNAP shortened to c in the example below and array being a), e.g.
CURR_SNAP=123
awk -F'|' -v c="$CURR_SNAP" '
NR==FNR {
for (i=1;i<NF;i++)
a[$4]=i>1?a[$4]"|"$i:a[$4]$1
}
NR>FNR {
if(a[$1])
print a[$1]"|"c
}
' file1 file2
Example Use/Output
After setting the filenames to match yours, you can simply copy/middle-mouse-paste in your console to test, e.g.
$ awk -F'|' -v c="$CURR_SNAP" '
> NR==FNR {
> for (i=1;i<NF;i++)
> a[$4]=i>1?a[$4]"|"$i:a[$4]$1
> }
> NR>FNR {
> if(a[$1])
> print a[$1]"|"c
> }
> ' file1 file2
DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|123
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|123
Look things over and let me know if you have further questions.

Header to be separated and convert to a column with seq number and also partial file number in unix

I have a tab-delimited data file with a header. I want to split off that header and store it in another file, with corresponding sequence and file numbers.
This is the original filename:
AllResponses_11003_6_7_20132_17_33AM1.txt
This is the information it contains (the first line is the header):
"ID" "NAME" "LOCAL PLACE" "CONTACT NUM"
a1 bala pal kak
ba1 kri kap ute
This is the output I would like to obtain, also tab-delimited:
seq_num file_num header_nm
1 11003 ID
2 11003 NAME
3 11003 LOCAL PLACE
4 11003 CONTACT NUM
Any help would be appreciated.
I tried with following
#!/bin/ksh
export INFAHOME=/informat/PowerCenter/9.1.0/server/infa_shared
export SRCDIR=${INFAHOME}/SrcFiles/CSI/INCOMING
export filename=${SRCDIR}/AllResponses_11003_6_7_20132_17_33AM1.txt
export filenum=$(echo $filename | tr -dc 0-9 |cut -c 1-5)
echo seq_num file_num hname
cnt=1
for h in $(head -1 "$filename" )
do
echo $cnt $filenum $h cnt=$((cnt+1))
done
It is giving for word to word not for delimitor to delimitor
This the code i built with your help using awk but not working. Pl help.
#!/bin/ksh
export INFAHOME=/informat/PowerCenter/9.1.0/server/infa_shared
export SRCDIR=${INFAHOME}/SrcFiles/CSI/INCOMING
export file=${SRCDIR}/AllResponses_11003_6_7_20132_17_33AM1.txt
export file1=AllResponses_11003_6_7_20132_17_33AM1.txt
export name=$(echo $file1 | cut -d_ -f2) #gets 11003
$ awk -v file=$name -F"\t" 'BEGIN{print "seq_num\tfile_num\theader_nm"} NR==1 {for (i=1`enter code here`;i<=NF;i++) {print i"\t"file,"\t"$i}}' $file
getting below error. Pl help
file=11003 '-F\t' 'BEGIN{print "seq_num\tfile_num\theader_nm"} NR==1 {for (i=1;i<=NF;i++) {print i"\t"file,"\t"$i}}'
/informat/PowerCenter/9.1.0/server/infa_shared/SrcFiles/CSI/INCOMING/AllResponses_11003_6_7_20132_17_33AM1.txt
CSI_SURVEY_FILE_CREA.ksh: line 7: v not found
Hi I need one more favour...
I need to pass file names dynamically and for each file need to create separate output file. Pl help.
Let's try to do it with a mixure of awk and bash:
$ file="AllResponses_11003_6_7_20132_17_33AM1.txt"
$ name=$(echo $file | cut -d_ -f2) #gets 11003
$ awk -v file=$name -F"\t"
'OFS="\t"; print "seq_num","file_num","header_nm"}
NR==1 {for (i=1;i<=NF;i++) {print i,file,$i}}' $file
seq_num file_num header_nm
1 11003 "ID"
2 11003 "NAME"
3 11003 "LOCAL PLACE"
4 11003 "CONTACT NUM"
Given
file="AllResponses_11003_6_7_20132_17_33AM1.txt"
The line
name=$(echo $file | cut -d_ -f2) #gets 11003
gets 1111 from the string XXXX_1111_YYY_ZZZ_.... Then this value is saved in $name so that awk can use it.
awk -v file=$name -F"\t" 'BEGIN{OFS="\t"; print "seq_num","file_num","header_nm"} NR==1 {for (i=1;i<=NF;i++) {print i,file,$i}}' $file
-v file=$name. Makes file a variable to be used by awk with the value of $name.
-F"\t". Sets tab as delimiter.
'BEGIN{print "seq_num","file_num","header_nm"}. Prints the header before processing the file.
NR==1. Just works with first line.
{for (i=1;i<=NF;i++) {print i,file,$i}}' $file. Prints each field number + $name + value.

Resources