No results when making a precise match using awk - unix
I have rows like this in my source file:
"Sumit|My Application|PROJECT|1|6|Y|20161103084527"
I want an exact match on column 3, i.e. I do not want to use the '~' operator in my awk command. However the command:
awk -F '|' '($3 ~ /'"$Var_ApPJ"'/) {print $3}' ${Var_RDR}/${Var_RFL};
fetches the correct result, but the command:
awk -F '|' '($3 == "${Var_ApPJ}") {print $3}' ${Var_RDR}/${Var_RFL};
fails to do so. Can anyone explain why this happens? I want to use '==' because I do not want a match when the value in the source file is "PROJECT1".
Parameter Var_ApPJ="PROJECT"
${Var_RDR}/${Var_RFL} -> Refers to source file.
Refer to this part of the documentation to see how to pass a shell variable to awk.
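For illustration, the exact-match command above should work once the expansion is spliced in between the single quotes, so that awk receives the expanded value rather than the literal text ${Var_ApPJ}:
awk -F '|' '($3 == "'"${Var_ApPJ}"'") {print $3}' ${Var_RDR}/${Var_RFL};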
I found an alternative to '==' using '~':
awk -F '|' '($3 ~ "^'"${Var_ApPJ}"'$") {print $3}' ${Var_RDR}/${Var_RFL};
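The same anchored match can also be written by passing the value in with -v (proj is just an illustrative awk variable name; the value is still treated as a regular expression here, so this sketch assumes it contains no regex metacharacters):
awk -F '|' -v proj="$Var_ApPJ" '($3 ~ ("^" proj "$")) {print $3}' ${Var_RDR}/${Var_RFL};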
here is the problem -
${Var_ApPJ} sits inside single quotes, so the shell never expands it and awk compares $3 against the literal string "${Var_ApPJ}". Try the below command, passing the value in as an awk variable -
awk -F '|' -v Var_ApPJ="$Var_ApPJ" '$3 == Var_ApPJ {print $3}' ${Var_RDR}/${Var_RFL};
Remove the double quotes, curly braces and dollar sign around the variable inside the awk program.
vipin@kali:~$ cat kk.txt
a 5 b cd ef gh
vipin@kali:~$ awk -v var1="5" '$2 == var1 {print $3}' kk.txt
b
vipin@kali:~$
OR
#cat kk.txt
a 5 b cd ef gh
#var1="5"
#echo $var1
5
#awk '$2 == "'"$var1"'" {print $3}' kk.txt ### without "{}"
b
#
#awk '$2 == "'"${var1}"'" {print $3}' kk.txt ### with "{}"
b
#
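One caveat worth noting (a general observation, not part of the answers above): splicing the shell value into the awk program text breaks if the value itself contains characters that are special inside an awk string, such as a double quote, whereas -v hands the value to awk as plain data:
var1='say "hi"'
awk '$2 == "'"$var1"'" {print $3}' kk.txt ### the embedded quote breaks the awk program
awk -v var1="$var1" '$2 == var1 {print $3}' kk.txt ### safe: the value never becomes program text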
Related
Compare 2nd columns from 2 files - unix
Compare the 2nd columns from 2 files and write the first file's unmatched records into an output file. Example (# delimiter):
Filename_clientid.txt
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
Client_id.txt
ramesh#RIA000025
suresh#RIA000024
vamshi#RIA000027
Expected output:
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
I used the awk command below but it is not working; can you help me?
awk -F '#' 'NR==FNR{a[$2]; next} FNR==1 || !($1 in a)' Client_id.txt Filename_clientid.txt
An alternative:
$ join -t# -j2 <(sort -t# -k2 file1) <(sort -t# -k2 file2)
RIA000026#RIA000026_MA_plan_BTR_09282022_6.xml#ramesh
The number of zeroes is not the same in both files. If they were the same, you could check that the field 2 value of Filename_clientid.txt does not occur in Client_id.txt.
Filename_clientid.txt
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
Client_id.txt
ramesh#RIA00025
suresh#RIA00024
vamshi#RIA00027
Example
awk -F'#' 'NR==FNR{a[$2]; next} !($2 in a)' Client_id.txt Filename_clientid.txt
Output
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
With corrected inputs (the number of zeroes was wrong):
file1
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA000026_MA_plan_BTR_09282022_6.xml#RIA000026
file2
ramesh#RIA000025
suresh#RIA000024
vamshi#RIA000027
ramesh#RIA000026
code
awk -F'#' 'NR==FNR{a[$2]=$1;next} $2 in a{print a[$2]}' file1 file2
Output
RIA000026_MA_plan_BTR_09282022_6.xml
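If the differing zero-padding in the original files is actually expected, one workaround (my own sketch, not from the answers above) is to normalise the IDs before comparing, for example by stripping the RIA prefix and any leading zeroes from field 2 of both files:
awk -F'#' '{id=$2; sub(/^RIA0*/, "", id)} NR==FNR{seen[id]; next} !(id in seen)' Client_id.txt Filename_clientid.txt
With the original inputs this prints RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026, the expected output.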
Transposing multiple columns into multiple rows keeping one column fixed in Unix
I have one file that looks like below:
1234|A|B|C|10|11|12
2345|F|G|H|13|14|15
3456|K|L|M|16|17|18
I want the output as:
1234|A
1234|B
1234|C
2345|F
2345|G
2345|H
3456|K
3456|L
3456|M
I have tried the below script:
awk -F"|" '{print $1","$2","$3","$4}' file.dat | awk -F"," '{OFS=RS;$1=$1}1'
But the output is generated as below:
1234
A
B
C
2345
F
G
H
3456
K
L
M
Any help is appreciated.
What about a single simple awk process such as this:
$ awk -F\| '{print $1 "|" $2 "\n" $1 "|" $3 "\n" $1 "|" $4}' file.dat
1234|A
1234|B
1234|C
2345|F
2345|G
2345|H
3456|K
3456|L
3456|M
No messing with RS and OFS.
If you want to do this dynamically, then you could pass in the number of fields that you want, and then use a loop starting from the second field. In the script, you might first check that the number of fields is equal to or greater than the number you pass into the script (in this case n=4):
awk -F\| -v n=4 '
NF >= n {
    for(i=2; i<=n; i++) print $1 "|" $i
}
' file
Output
1234|A
1234|B
1234|C
2345|F
2345|G
2345|H
3456|K
3456|L
3456|M
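If every column after the first is wanted, a further variation (my own sketch) is simply to loop up to NF; note that with the sample data this also emits the numeric columns (1234|10 and so on), so it only fits when all trailing fields should be printed:
awk -F\| '{for(i=2; i<=NF; i++) print $1 "|" $i}' file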
# perl -lne'($a,@b)=((split/\|/)[0..3]);foreach (@b){print join"|",$a,$_}' file.dat
1234|A
1234|B
1234|C
2345|F
2345|G
2345|H
3456|K
3456|L
3456|M
awk: sum the $4 column if column 1 equals a value with characters thereafter
I have a file with, for example, the following data in it:
20 V 70000003d120f88 1 2
20 V 70000003d120f88 2 2
20x00 V 70000003d120f88 2 2
10020 V 70000003d120f88 1 5
I want to get the sum of the 4th column. Using the command below I can achieve this, however the row starting with 20x00 is excluded. I want everything that starts with 20 to be summed, and nothing with characters before the 20, so 20* for example:
cat testdata.out | awk '{if ($1 == '20') print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
The output value must be:
5
How can I achieve this using awk? The below, which I also attempted, does not work:
cat testdata.out | awk '$1 ~ /'20'/ {print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
There is no need to use 3 processes, everything can be done in one awk process. Check it out:
awk '$1 ~ /^20/ { a+=$4 } END { print a }' testdata.out
Explanation:
$1 ~ /^20/ checks whether $1 starts with 20
if yes, we add $4 to the variable a
finally, we print the variable a
Result
5
EDIT: Ed Morton rightly points out that the result should always be of the same type, which can be solved by adding 0 to the result. You can also set the exit status if it is necessary to distinguish whether a result of 0 is due to no matches (exit status 0) or to matching only zero values (exit status 1). The exit code for different input data can be checked with e.g. echo $?. The code would look like this:
awk '$1 ~ /^20/ { a+=$4 } END { print a+0; exit(a!="") }' testdata.out
Figured it out:
cat testdata.out | awk '$1 ~ /'^20'/ {print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
The above might not work for all cases, but the below will suffice:
i=20
cat testdata.out | awk '{if ($1 == "'"$i"'" || $1 == "'"${i}"'x00") print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
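If the prefix comes from a shell variable that might contain regex metacharacters, a literal prefix test (my own sketch, not part of the answer above) avoids building a regular expression at all:
i=20
awk -v p="$i" 'index($1, p) == 1 { s += $4 } END { printf("%.0f\n", s) }' testdata.out
index($1, p) == 1 is true only when $1 starts with the value of p taken literally, so the 20 and 20x00 rows are summed but 10020 is not.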
awk to sort two fields:
Would like to sort the Input.csv file based on fields $1 and $5 and generate country-wise A-Z order. While sorting, the country name needs to be taken from $1, or from $5 if $1 is blank.
Input.csv
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
,,,,mno,50,DL,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
def,20,02-Jul-13,Aug,,,,,
def,20,02-Aug-13,Aug,,,,,
Desired Output.csv
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
def,20,02-Jul-13,Aug,,,,,
def,20,02-Aug-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,mno,50,DL,ABC~XYZ,Sep
I have tried the command below but am not getting the desired output. Please suggest.
head -1 Input.csv > Output.csv; sort -t, -k1,1 -k5,5 <(tail -n +2 Input.csv) >> Output.csv
awk to the rescue!
$ awk -F, '{print ($1==""?$5:$1) "\t" $0}' file | sort | cut -f2-
Country,Amt,Des,Details,Country,Amt,Des,Network,Details
abc,10,03-Apr-14,Aug,abc,10,DL,ABC~XYZ,Sep
abc,10,03-Apr-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,19-Feb-14,Aug,abc,10,MN,ABC~XYZ,Sep
abc,10,22-Jan-07,Aug,abc,10,DL,ABC~XYZ,Sep
def,20,02-Aug-13,Aug,,,,,
def,20,02-Jul-13,Aug,,,,,
,,,,ghi,30,AL,DEF~PQZ,Sep
jkl,40,11-Sep-13,Aug,,,,,
,,,,mno,50,DL,ABC~XYZ,Sep
Here the header starts with uppercase and the data is lowercase. If that is not a valid assumption, special handling of the header is required, either as you did above or, better, within awk:
$ awk -F, 'NR==1{print; next} {print ($1==""?$5:$1) "\t" $0 | "sort | cut -f2-"}' file
Is this what you want? (It omits the first line.)
cat file_containing_your_lines | awk 'NR != 1' | sed "s/,/\t/g" | sort -k 1 -k 5 | sed "s/\t/,/g"
Duplicates in a unix text file based on multiple fields
I have a requirement to find duplicates based on three columns in a .txt file in unix which is delimited by ','.
Input:
a,b,c,d,e,f,gf,h
a,bd,cg,dd,ey,f,g,h
a,b,df,d,e,fd,g,h
a,b,ck,d,eg,f,g,h
Let's say we are finding duplicates based on fields 1, 2 and 5.
Expected output:
a,b,c,d,e,f,gf,h
a,b,df,d,e,fd,g,h
Can anyone help write a script for this, or is there a command already available?
I tried like this, but it did not work:
awk -F, '!x[$1,$2,$3]++' file.txt
One way using awk:
awk -F, 'FNR==NR { x[$1,$2,$5]++; next } x[$1,$2,$5] > 1' a.txt a.txt
This is simple, but reads the file two times. On the first pass (FNR==NR), it maintains counts based on the key fields. During the second pass, it prints the line if its key was found more than once.
Another way using awk:
awk -F, '{if (x[$1$2$5]) { y[$1$2$5]++; print $0; if (y[$1$2$5] == 1) { print x[$1$2$5] } } x[$1$2$5] = $0}' a.txt
Explanation:
1 awk -F,
2 '{if (x[$1$2$5])
3 { y[$1$2$5]++; print $0;
4 if (y[$1$2$5] == 1)
5 { print x[$1$2$5] }
6 } x[$1$2$5] = $0
7 }'
Line 2: If x has $1$2$5, this key was seen before, so do steps 3-5.
Line 3: Increment the count and print the line because it is a dup.
Line 4: This means we are seeing this key for the 2nd time, so we need to print the first line with this key. Last time we saw this key we did not know whether it was a dup or not, so we print that first line in step 5.
Line 6: Store the current line against the key so we can use it in step 2.
Another way using sort, uniq and awk:
Note: the uniq command has an option '-f' to skip the specified number of fields before it starts comparison.
sort -t, -k1,1 -k2,2 -k5,5 a.txt | awk -F, 'BEGIN { OFS = " "} {print $0, $1, $2, $5}' | sed 's/,/ /g' | uniq -f7 -D | sed 's/ /,/g' | cut -d',' -f 1-7
This sorts based on fields 1, 2 and 5. awk prints the original line and appends fields 1, 2 and 5. sed changes the delimiter because uniq does not have an option to specify a delimiter. uniq skips the first 7 fields, works on the rest of the line and prints the duplicate lines.
I had a similar issue. I needed to eliminate duplicate detail records while preserving the flat file record formatting and sequence of the records. The duplication was caused by a time expansion of the date field in column 2 of the detail records only. The receiving system was reporting duplication on columns 4 and 5. I cobbled together this quick hack to resolve it.
First read the file data into an array, then read and manipulate the individual records (crudely, with a counter), as demonstrated in this snippet integrating a case statement to treat the various record types logically.
Cheers!
readarray inrecs < [input file name]
filebase=$(echo "[input file name]" | cut -d '.' -f1)
i=1
for inrec in "${inrecs[@]}";do
    field1=$(echo ${inrecs[$i-1]} | cut -d',' -f1)
    field2=$(echo ${inrecs[$i-1]} | cut -d',' -f2)
    field3=$(echo ${inrecs[$i-1]} | cut -d',' -f3)
    field4=$(echo ${inrecs[$i-1]} | cut -d',' -f4)
    field5=$(echo ${inrecs[$i-1]} | cut -d',' -f5)
    field6=$(echo ${inrecs[$i-1]} | cut -d',' -f6)
    field7=$(echo ${inrecs[$i-1]} | cut -d',' -f7)
    field8=$(echo ${inrecs[$i-1]} | cut -d',' -f8)
    case $field1 in
    'H')
        echo "$field1,$field2,$field3" > ${filebase}.new
        ;;
    'D')
        dupecount=0
        dupecount=$(zegrep -c -e "${field4},${field5}" ${infile})
        if [[ "$dupecount" -gt 1 ]];then
            writtencount=0
            writtencount=$(zegrep -c -e "${field4},${field5}" ${filebase}.new)
            if [[ "${writtencount}" -eq 0 ]];then
                echo "$field1,$field2,$field3,$field4,$field5,$field6,$field7,$field8,">>${filebase}.new
            fi
        else
            echo "$field1,$field2,$field3,$field4,$field5,$field6,$field7,$field8,">>${filebase}.new
        fi
        ;;
    'T')
        dcount=$(zegrep -c '^D' ${filebase}.new)
        echo "$field1,$field2,$dcount,$field4">>${filebase}.new
        ;;
    esac
    ((i++))
done
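For what it's worth, the same header/detail/trailer logic can be sketched as a single awk pass (my own sketch, not the original poster's script; it assumes comma-delimited records with the record type in column 1, the duplicate key in columns 4 and 5, and the detail count in field 3 of the trailer; input.csv and output.new are illustrative names):
awk -F',' -v OFS=',' '
$1 == "H" { print; next }                                    # pass the header record through
$1 == "D" { if (!seen[$4 FS $5]++) { print; d++ }; next }    # keep only the first detail record per column-4/5 key
$1 == "T" { $3 = d + 0; print }                              # rewrite the detail count in the trailer record
' input.csv > output.new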