Finding records using the awk command by math score - unix

Write the unix command to display all the fields of students who have a score of more than 80 in math, where the math score is also the top score among all subjects; additionally, the output should be in ascending order of the students' std (standard).
INPUT:
roll,name,std,science_marks,math_marks,college
1,A,9,60,86,SM
2,B,10,85,80,DAV
3,C,10,95,92,DAV
4,D,9,75,92,DAV
OUTPUT:
1|A|9|60|86|SM
4|D|9|75|92|DAV
myCode:
awk 'BEGIN{FS=',' ; OFS="|"} {if($4<$5 && $5>80){print $1,$2,$3,$4,$5,$6}}'
but I'm getting an unexpected token error. Please help me.
Error Message on my Mac System Terminal:
awk: syntax error at source line 1
context is
BEGIN >>> {FS=, <<<
awk: illegal statement at source line 1

Could you please try the following, written and tested with the shown samples in GNU awk. This answer doesn't hard-code the math field number; it finds the column whose header is math_marks and then checks the rest of the lines against it.
awk '
BEGIN{
  FS=","
  OFS="|"
}
FNR==1{
  for(i=1;i<=NF;i++){
    if($i=="math_marks"){ field=i }
  }
  next
}
{
  for(i=4;i<=(NF-1);i++){
    max=(max>$i?(max?max:$i):$i)
  }
  if(max==$field && $field>80){ $1=$1; print }
  max=""
}
' Input_file
Explanation: a detailed, line-by-line explanation of the above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of code here.
  FS="," ##Setting field separator as comma here.
  OFS="|" ##Setting output field separator as | here for all lines.
}
FNR==1{ ##If this is the first line (the header), do the following.
  for(i=1;i<=NF;i++){ ##Going through all fields here.
    if($i=="math_marks"){ field=i } ##If a field value is math_marks, remember that field number here.
  }
  next ##next will skip all further statements from here.
}
{
  for(i=4;i<=(NF-1);i++){ ##Going through the marks columns, from the 4th field to the 2nd-last field here.
    max=(max>$i?(max?max:$i):$i) ##Updating max: compare it against the current field and keep the larger value here.
  }
  if(max==$field && $field>80){ $1=$1; print } ##After processing all fields: if the maximum equals the math field AND the math field is greater than 80, rebuild the record and print the line.
  max="" ##Nullifying max var here.
}
' Input_file ##Mentioning Input_file name here.
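With the sample input saved as Input_file and the program above saved as solution.awk (a placeholder name), a run would look like this:
$ awk -f solution.awk Input_file
1|A|9|60|86|SM
4|D|9|75|92|DAV
The $1=$1 assignment is what forces awk to rebuild each record with the new output separator |.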

Your code has double quotes with the wrong encoding:
here
| |
v v
$ busybox awk 'BEGIN{FS=”,” ; OFS="|"} {if($4<$5 && $5>80){print $1,$2,$3,$4,$5,$6}}'
awk: cmd. line:1: Unexpected token
Replace those and your code works fine.
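For reference, once the quotes are plain ASCII the original one-liner runs as intended. Here is a sketch that also appends the ascending-std ordering the question asks for (safe here because the header line fails the $4<$5 test and is never printed):
awk 'BEGIN{FS=","; OFS="|"} $4<$5 && $5>80{print $1,$2,$3,$4,$5,$6}' Input_file | sort -t'|' -k3,3n
Note that comparing $4<$5 only works because there are exactly two marks columns; with more subjects you would need a loop like the one in the first answer.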

Related

unix ksh how to print $1 and first n characters of $2

I have a file as follows:
$ cat /etc/oratab
hostname01:DBNAME11:/oracle_home/A_19.0.0.0:N
hostname01:DBNAME1_DC:/oracle_home/A_19.0.0.0:N
hostname02:DBNAME21:/oracle_home/B_19.0.0.0:N
hostname02:DBNAME2_DC:/oracle_home/B_19.0.0.0:N
I want to print the unique combination of the first column, the first 7 characters of the second column, and the third column, when the third column matches the string "19.0.0".
The output I want to see is:
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
I put together this piece of code, but it looks like it's not the correct way to do it.
cat /etc/oratab|grep "19.0.0"|awk '{print $1}' || awk -F":" '{print subsrt($2,1,8)}
Sorry, I am very new to shell scripting.
1st solution: With your shown sample, please try the following, written and tested with GNU awk.
awk 'BEGIN{FS=OFS=":"} {$2=substr($2,1,7)} !arr[$1,$2]++ && $3~/19\.0\.0/{NF--;print}' Input_file
2nd solution: Or, in case your awk doesn't support NF--, try the following.
awk '
BEGIN{
  FS=OFS=":"
}
{
  $2=substr($2,1,7)
}
!arr[$1,$2]++ && $3~/19\.0\.0/{
  $4=""
  sub(/:$/,"")
  print
}
' Input_file
Explanation: Simply put: set the field separator and output field separator to :. Then, in the main program, set the 2nd field to the first 7 characters of its value. Then check whether the combination is unique (didn't occur before) and the 3rd field matches 19.0.0; if so, drop the last field and print the line.
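With the sample saved as Input_file (a placeholder name), a run of the 1st solution would print:
$ awk 'BEGIN{FS=OFS=":"} {$2=substr($2,1,7)} !arr[$1,$2]++ && $3~/19\.0\.0/{NF--;print}' Input_file
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0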
You may try this awk:
awk 'BEGIN{FS=OFS=":"} $3 ~ /19\.0\.0/ && !seen[$1]++ {
print $1, substr($2,1,7), $3}' /etc/oratab
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
We check and populate associative array seen only if we find 19.0.0 in $3.
If the lines can be like this, both matching 19.0.0,
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname01:DBNAME1:/oracle_home/A_19.0.0.1
and only hostname01 is used as the uniqueness key, you might miss a line.
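Here is a sketch of an awk variant that keys uniqueness on the whole output line instead, which keeps both such lines and, unlike uniq, does not require the duplicates to be adjacent:
awk 'BEGIN{FS=OFS=":"} $3 ~ /19\.0\.0/ { out = $1 OFS substr($2,1,7) OFS $3; if (!seen[out]++) print out }' /etc/oratab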
You could match the pattern using sed, with 2 capture groups for the parts you want to keep and a match for what you don't want in between.
Then pipe the output to uniq to deduplicate whole lines instead of keying on the first column only.
sed -nE 's/^([^:]+:.{7})[^:]*(:[^:]*19\.0\.0[^:]*).*/\1\2/p' file | uniq
Output
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
$ awk 'BEGIN{FS=OFS=":"} index($3,"19.0.0"){print $1, substr($2,1,7), $3}' file | sort -u
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0

How to filter a file based on two separate field data by using awk?

Input file consists of 3 fields separated by "|" as follows:
TeamId|TeamName|TotalPlayers
TeamId consists of unique numbers.
TeamName consists of several Premier league football teams and corresponding no of players in TotalPlayers field.
2 of the records are as follows (these records belong to one of the visible test cases):
103|Manchester United|12
105|Manchester City|13
Code requirement:
I had to output the TeamName which starts with Manchester and has the most players. If no team starts with Manchester, then there should not be any output. I.e., in the above test case the output should be Manchester City.
My solution:
awk 'BEGIN{FS = "|";OFS = ",";}{if($2 ~ /^Manchester/){print $2, $3}}' | sort -n -k2 | awk -F , '(NR==1){print $1}'
This provided the expected output for the normal test cases, but the hidden test cases failed.
What changes can I make to this, or is there an easier way to achieve the same?
Also, please recommend any websites where I can practice solving these kinds of unix coding problems.
I had to output the TeamName which starts with Manchester and has the most number of players. If no Team starts with Manchester, then it should not be any output. i.e. In the above test case the output should be Manchester City.
$ cat file
TeamId|TeamName|TotalPlayers
103|Manchester United|12
105|Manchester City|13
$ awk -F'|' '$2~/^Manchester/ && $3 >max{max=$3; team=$2}END{if(team)print team}' file
Manchester City
Could you please try the following, written and tested with the shown samples in GNU awk.
awk '
BEGIN{
  FS="|"
}
FNR>1 && $2~/^Manchester/{
  arr[$NF]=(arr[$NF]?arr[$NF] ORS:"")$2
  max=(max>$NF?max:$NF)
}
END{
  if(max!=""){
    num=split(arr[max],val,ORS)
    if(num>1){
      for(i=1;i<=num;i++){
        print val[i],max
      }
    }
    else{ print arr[max],max }
  }
}
' Input_file
Explanation: a detailed, line-by-line explanation of the above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of program from here.
  FS="|" ##Setting FS as | here.
}
FNR>1 && $2~/^Manchester/{ ##If the line number is more than 1 AND the 2nd field starts with Manchester, do the following.
  arr[$NF]=(arr[$NF]?arr[$NF] ORS:"")$2 ##Creating array arr indexed by the last field (player count), appending each team name separated by ORS so ties on the maximum can all be printed.
  max=(max>$NF?max:$NF) ##Creating max: if its current value is greater than the last field keep it, else set it to the last field.
}
END{ ##Starting END block of this program from here.
  if(max!=""){ ##If max is NOT NULL, do the following.
    num=split(arr[max],val,ORS) ##Splitting arr[max] value into val array with delimiter of ORS here.
    if(num>1){ ##If num (the number of teams tied on the maximum) is greater than 1, do the following.
      for(i=1;i<=num;i++){ ##Start a loop till value of num here.
        print val[i],max ##Printing value of val with index i and max here.
      }
    }
    else{ print arr[max],max } ##Else printing value of arr[max] and max only one time.
  }
}
' Input_file ##Mentioning Input_file name here.
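With the shown sample saved as Input_file and the program saved as solution.awk (placeholder names), a run would print the top team along with its player count; if several teams tied on the maximum, each would be printed on its own line:
$ awk -f solution.awk Input_file
Manchester City 13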

Awk: find duplicates in 3rd field INCLUDING original

I figured out the following code to find duplicate UIDs in a passwd file, but it doesn't include the first instance (the one that was later duplicated). I ultimately want a dictionary like UID = [ USER1, USER2 ], but I am not sure how to get it done in awk.
What I have so far:
awk -F':' '$1 !~ /^#/ && _[$3]++ {print}' /etc/passwd
Explanation (as I understand it): if the line does not begin with the comment character '#', increment an array entry keyed on the line's UID; from the second occurrence on, that value is non-zero/true, so the line is printed.
This may help you do it. First we save the data in arrays, and in the END{} block we print all the repeated lines collected in the array (there is also a print at processing time). Hope it helps you.
awk -F":" '
$1 !~ /^#/ && (counter[$3]>0) {a++;print "REPEATED|UID:"$3"|"$0"|"LastReaded[$3]; repeateds["a"a]=$0; repeateds["b"a]=LastReaded[$3]}
$1 !~ /^#/ { counter[$3]++; LastReaded[$3]=$0}
END {for (i in repeateds)
{
print i"|"repeateds[i]
}
}
' /etc/passwd
REPEATED|UID:229|pepito:*:229:229:pepito:/var/empty:/usr/bin/false|_avbdeviced:*:229:-2:Ethernet AVB Device Daemon:/var/empty:/usr/bin/false
a1|pepito:*:229:229:pepito:/var/empty:/usr/bin/false
b1|_avbdeviced:*:229:-2:Ethernet AVB Device Daemon:/var/empty:/usr/bin/false
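If the goal is really the UID = [ USER1, USER2 ] dictionary shape from the question, here is a minimal sketch that collects the user names per UID and reports only the duplicated ones:
awk -F':' '
$1 !~ /^#/ { users[$3] = (users[$3] ? users[$3] ", " : "") $1; count[$3]++ }
END { for (uid in count) if (count[uid] > 1) print uid " = [ " users[uid] " ]" }
' /etc/passwd
For the sample above this would print: 229 = [ pepito, _avbdeviced ]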

Why does awk function only return last line from file?

I am using awk to reformat some fields in a file and an awk function to fix one field value if it is negative. Here is my awk command:
awk 'function fix_neg(value) {\
if(value < 0)\
return '$new_value'\
else\
return value\
} END { print $2,$1,fix_neg($3) }' input_file.txt
where $new_value was set before this call. I do not understand why this only returns the reformatted last line of input_file.txt (which contains multiple lines of data).
Thanks for your help.
Try this:
awk -v newV="$new_value" '{print $2,$1,($3<0?newV:$3)}' inputfile
In your program, you only got the last line's data because you put your print statement in the END{..} block, which is triggered once after the whole file has been processed, not for each line. Drop the END and it will work as you intended.
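For comparison, the original function-based shape also works once the END wrapper is gone and the shell variable is passed in with -v instead of being spliced into the quotes (a sketch):
awk -v new_value="$new_value" '
function fix_neg(value) { return (value < 0 ? new_value : value) }
{ print $2, $1, fix_neg($3) }
' input_file.txt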

Maximum number of characters in a field of a csv file using unix shell commands?

I have a csv file. In one of the fields, say the second field, I need to know maximum number of characters in that field. For example, given the file below:
adf,jlkjl,lkjlk
jf,j,lkjljk
jlkj,lkejflkj,adfafef,
jfje,jj,lkjlkj
jjee,eeee,ereq
the answer would be 8 because row 3 has 8 characters in the second field. I would like to integrate this into a bash script, so common unix command line programs are preferred. Imaginary bonus points for explaining what the command is doing.
EDIT: Here is what I have so far
cut --delimiter=, -f 2 test.csv | wc -m
This gives me the character count for all of the fields, not just one, so I still have progress to make.
I would use awk for the task. It uses a comma to split each line into fields and, for each line, checks whether the length of the second field is bigger than the value already saved.
awk '
BEGIN {
FS = ","
}
{ c = length( $2 ) > c ? length( $2 ) : c }
END {
print c
}
' infile
Use it as a one-liner and assign the return value to a variable, like:
num=$(awk 'BEGIN { FS = "," } { c = length( $2 ) > c ? length( $2 ) : c } END { print c }' infile)
Well @oob, you basically provided the answer with your last edit, and it's the simplest of all the answers given. However, I also like @Birei's answer just because I enjoy AWK. :-)
I too had to find the longest possible value for a given field inside a text file today. Tested with your sample and got the expected 8.
cut -d, -f2 test.csv | wc -L
As you see, just a matter of using the correct option for wc (which I hope you have already figured by now).
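For reference, wc -L is a GNU coreutils extension that reports the length of the longest input line (it is not available in BSD/macOS wc), so with the sample:
$ cut -d, -f2 test.csv | wc -L
8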
My solution is to loop over the lines. Then I exchange the commas with newlines to loop over the words, then I check which is the longest word and save the data.
#!/bin/bash
lineno=1
matchline=0
matchlen=0
for line in $(cat input.txt); do
words=`echo $line | sed -e 's/,/\n/g'`
for word in $words; do
# echo "line: $lineno; length: ${#word}; input: $word"
if [ $matchlen -lt ${#word} ]; then
matchlen=${#word}
matchline=$lineno
fi
done;
lineno=$(($lineno + 1))
done;
echo max length is $matchlen in line $matchline
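Saved as maxlen.sh (a placeholder name) with the sample as input.txt, it reports:
$ bash maxlen.sh
max length is 8 in line 3
Note that this scans every field, not just the second one; it happens to give the same answer here because the longest field in the sample is also in column two.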
Bash and Coreutils Solution
There are a number of ways to solve this, but I vote for simplicity. Here's a solution that uses Bash parameter expansion and a few standard shell utilities to measure each line:
cut -d, -f2 /tmp/foo |
while read; do
echo ${#REPLY}
done | sort -n | tail -n1
The idea here is to split out the CSV column, and then use the parameter length expansion of the implicit REPLY variable to measure the characters on each line. When we sort the measurements numerically, the last line of the sorted output will hold the length of the longest field found.
cut out the desired column
print each line length
sort the line lengths
grab the max line length
cut -d, -f2 test.csv | awk '{print length($0);}' | sort -n | tail -n 1
