I have a file with duplicate values in it. Based on a few fields (field 2, field 3) I need to remove the duplicates and renumber a field (ID) which is the unique key of the file. How can I achieve this?
For example, my file (test.txt) contains:
1,Eng,ECE
2,Eng,ECE
3,Eng,CS
4,Eng,CS
I want the output to be below
1,Eng,ECE
2,Eng,CS
I have removed the duplicates using the command
awk -F ',' '!a[$2$3]++' test.txt > test1.txt
How can I renumber the ID field now?
You can use
awk -F ',' -v "OFS=," '!a[$2$3]++ { $1=++i; print}'
This will renumber the first field starting with 1.
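For instance, running the one-liner against the sample test.txt from the question (recreated here for the demo) gives:

```shell
# Recreate the sample file from the question (for demonstration)
printf '1,Eng,ECE\n2,Eng,ECE\n3,Eng,CS\n4,Eng,CS\n' > test.txt

# Deduplicate on fields 2+3 and renumber field 1 in one pass
awk -F ',' -v "OFS=," '!a[$2$3]++ { $1=++i; print }' test.txt
# 1,Eng,ECE
# 2,Eng,CS
```

Assigning to $1 forces awk to rebuild the record with OFS, so the commas are preserved.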
Another approach:
awk 'BEGIN { FS=OFS="," }
($2,$3) in seen { next }
{ seen[$2,$3] = 1; print ++seqno, $2, $3 }' test.txt
1,Eng,ECE
2,Eng,CS
I have a file as follows:
$ cat /etc/oratab
hostname01:DBNAME11:/oracle_home/A_19.0.0.0:N
hostname01:DBNAME1_DC:/oracle_home/A_19.0.0.0:N
hostname02:DBNAME21:/oracle_home/B_19.0.0.0:N
hostname02:DBNAME2_DC:/oracle_home/B_19.0.0.0:N
I want to print the unique combination of the first column, the first 7 characters of the second column, and the third column, when the third column matches the string "19.0.0".
The output I want to see is:
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
I put together this piece of code, but it looks like it's not the correct way to do it:
cat /etc/oratab|grep "19.0.0"|awk '{print $1}' || awk -F":" '{print subsrt($2,1,8)}
Sorry, I am very new to shell scripting.
1st solution: With your shown samples, please try the following, written and tested with GNU awk.
awk 'BEGIN{FS=OFS=":"} {$2=substr($2,1,7)} !arr[$1,$2]++ && $3~/19\.0\.0/{NF--;print}' Input_file
2nd solution: In case your awk doesn't support NF--, try the following.
awk '
BEGIN{
FS=OFS=":"
}
{
$2=substr($2,1,7)
}
!arr[$1,$2]++ && $3~/19\.0\.0/{
$4=""
sub(/:$/,"")
print
}
' Input_file
Explanation: Set the field separator and output field separator to :. In the main program, set the 2nd field to the first 7 characters of its value. Then check whether the ($1,$2) combination is unique (hasn't occurred before) and the 3rd field matches 19.0.0; if both hold, drop the last field and print the line.
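For example, against the sample oratab lines from the question (recreated as a hypothetical oratab.sample, using the NF---free variant so it works in any POSIX awk):

```shell
# Recreate the sample input for the demo
cat > oratab.sample <<'EOF'
hostname01:DBNAME11:/oracle_home/A_19.0.0.0:N
hostname01:DBNAME1_DC:/oracle_home/A_19.0.0.0:N
hostname02:DBNAME21:/oracle_home/B_19.0.0.0:N
hostname02:DBNAME2_DC:/oracle_home/B_19.0.0.0:N
EOF

# Truncate $2 to 7 chars, keep the first occurrence of each ($1,$2) pair,
# then blank the last field and strip the separator it leaves behind
awk 'BEGIN{FS=OFS=":"} {$2=substr($2,1,7)}
     !arr[$1,$2]++ && $3~/19\.0\.0/{ $4=""; sub(/:$/,""); print }' oratab.sample
# hostname01:DBNAME1:/oracle_home/A_19.0.0.0
# hostname02:DBNAME2:/oracle_home/B_19.0.0.0
```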
You may try this awk:
awk 'BEGIN{FS=OFS=":"} $3 ~ /19\.0\.0/ && !seen[$1]++ {
print $1, substr($2,1,7), $3}' /etc/oratab
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
We check and populate associative array seen only if we find 19.0.0 in $3.
Note: if the file can contain lines like these, both matching 19.0.0,
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname01:DBNAME1:/oracle_home/A_19.0.0.1
and only hostname01 is used as the uniqueness key, you might miss a line.
You could match the pattern using sed with 2 capture groups for the parts you want to keep, while matching what you don't want.
Then pipe the output to uniq to get all unique lines, instead of keying on just the first column.
sed -nE 's/^([^:]+:.{7})[^:]*(:[^:]*19\.0\.0[^:]*).*/\1\2/p' file | uniq
Output
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
$ awk 'BEGIN{FS=OFS=":"} index($3,"19.0.0"){print $1, substr($2,1,7), $3}' file | sort -u
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
File e.lst:
H|data|ID|20200929|mni
1|20200929|mni|20200929|pqr|20200929|20200929
2|20200929|mni|20200929|abc|20200929|20200929
3|20200929|mni|20200929|lmn|20200929|20200929
4|20200929|mni|20200929|stu|20200929|20200929
T|count|123456
Command:
awk 'BEGIN {FS=OFS="|"} { $2=$4=$6=$7=20201007 } 1' e.lst > ne.lst
The problem: the above command also adds extra fields to the Header/Trailer records.
1st solution: Please try the following, written and tested with the shown samples.
awk -v lines="$(wc -l < e.lst)" 'BEGIN{FS=OFS="|"} FNR>1 && FNR<lines{$2=$4=$6=$7="20201007"} 1' e.lst > ne.lst
OR
awk -v lines="$(wc -l < e.lst)" '
BEGIN{ FS=OFS="|" }
FNR>1 && FNR<lines{
$2=$4=$6=$7="20201007"
}
1
' e.lst > ne.lst
Explanation: First create an awk variable lines holding the total number of lines in the file. In the BEGIN block, set the field separator and output field separator. In the main block, check whether the current line number is greater than 1 and less than lines; if so, set the fields to the new value as needed. The trailing 1 is an awk idiom for printing the current line.
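A quick run against the sample e.lst (recreated here) shows the header and trailer left untouched:

```shell
# Recreate the sample file from the question
cat > e.lst <<'EOF'
H|data|ID|20200929|mni
1|20200929|mni|20200929|pqr|20200929|20200929
2|20200929|mni|20200929|abc|20200929|20200929
3|20200929|mni|20200929|lmn|20200929|20200929
4|20200929|mni|20200929|stu|20200929|20200929
T|count|123456
EOF

# Only lines strictly between the first and last are modified
awk -v lines="$(wc -l < e.lst)" 'BEGIN{FS=OFS="|"}
  FNR>1 && FNR<lines{$2=$4=$6=$7="20201007"} 1' e.lst
# H|data|ID|20200929|mni
# 1|20201007|mni|20201007|pqr|20201007|20201007
# ...
# T|count|123456
```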
2nd solution (a single awk reading the input file twice): We can also avoid the wc call by reading the file twice:
awk '
BEGIN{ FS=OFS="|" }
FNR==NR{
lines++
next
}
FNR>1 && FNR<lines{
$2=$4=$6=$7="20201007"
}
1
' e.lst e.lst > ne.lst
3rd solution: As per Sundeep's nice comment: if your header and footer lines have fewer than 7 fields and every other line has exactly 7 fields, you can do the following, with no need to count the total number of lines at all (but make sure the condition holds: every line apart from the first and last must have exactly 7 fields for this specific solution).
awk '
BEGIN{ FS=OFS="|" }
NF==7{
$2=$4=$6=$7="20201007"
}
1
' e.lst > ne.lst
I would simply add a condition:
awk 'BEGIN {FS=OFS="|"} $1!="H"&&$1!="T"{ $2=$4=$6=$7=20201007 } 1'
Alternatively, the same with a regular expression:
awk 'BEGIN {FS=OFS="|"} $1!~/^[HT]$/ { $2=$4=$6=$7=20201007 } 1'
I have the following input file and need to find which fields are null, then display the key column and the name of each null column.
Note: in the future, new fields might be added too.
Input.txt
Keyfeild1|Over|Loan|cc|backup
200|12||0|
100||15|1|200
100|100|100|100|100
50||50||11
ExpectedOutput.txt :
200|Loan
200|backup
100|Over
50|Over
50|cc
Command Used :
cat Input.txt | awk -F"|" '{for(i=1;i<=NF;i++) if($i=="") { print $1"|"i} }'
Achieved Output:
200|3
200|5
100|2
50|2
50|4
The following awk may help you with the same:
awk -F"|" 'FNR>1{for(i=2;i<=NF;i++){if($i==""){print $1,"field"i}}}' OFS="|" Input_file
Output will be as follows:
200|field3
200|field5
100|field2
50|field2
50|field4
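Note that the question's ExpectedOutput.txt uses the actual header names (Loan, backup, ...) rather than field numbers. A small variation (my own sketch, not part of the original answer) reads the names from the header row first:

```shell
# Recreate the sample input from the question
cat > Input.txt <<'EOF'
Keyfeild1|Over|Loan|cc|backup
200|12||0|
100||15|1|200
100|100|100|100|100
50||50||11
EOF

# Remember each column's header name, then report it for every empty field
awk -F"|" 'FNR==1{for(i=1;i<=NF;i++) h[i]=$i; next}
           {for(i=2;i<=NF;i++) if($i=="") print $1 "|" h[i]}' Input.txt
# 200|Loan
# 200|backup
# 100|Over
# 50|Over
# 50|cc
```

Because the names are read from the header at runtime, this keeps working if new fields are added later.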
Can awk process this?
Input
Neil,23,01-Jan-1990
25,Reena,19900203
Output
'Neil',23,'01-Jan-1990'
25,'Reena',19900203
awk approach:
awk -F, '{for(i=1;i<=NF;i++) if($i~/[[:alpha:]]/) $i="\047"$i"\047"}1' OFS="," file
The output:
'Neil',23,'01-Jan-1990'
25,'Reena',19900203
if($i~/[[:alpha:]]/) - if field contains alphabetic character
\047 - octal code of single quote ' character
My first attempt was incorrect:
sed -r 's/([^,]*[a-zA-Z]+[^,]*)(,{0,1})/"\1"\2/g' inputfile
@Sundeep gave an excellent comment: single quotes are needed, and it can be shorter.
I had tried to match up to the , or end-of-line, which complicated the matching. You can just match between the separators, making sure there is an alphabetic character somewhere:
sed 's/[^,]*[a-zA-Z][^,]*/\x27&\x27/g' inputfile
You might use this script:
script.awk
BEGIN { OFS=FS="," }
{ for(i= 1; i<=NF; i++) {
if( !match( $i, /^[0-9]+$/ ) ) $i = "'" $i "'"
}
print
}
and run it like this: awk -f script.awk yourfile.
Explanation
the first line sets the input and output field separators to ,.
the loop tests each field for whether it consists only of digits (/^[0-9]+$/);
if not, the field is wrapped in single quotes.
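Putting it together (recreating script.awk and a sample input file for the demo):

```shell
# Write the script shown above to script.awk
cat > script.awk <<'EOF'
BEGIN { OFS=FS="," }
{ for(i= 1; i<=NF; i++) {
    if( !match( $i, /^[0-9]+$/ ) ) $i = "'" $i "'"
  }
  print
}
EOF

# Sample input from the question
printf 'Neil,23,01-Jan-1990\n25,Reena,19900203\n' > yourfile

awk -f script.awk yourfile
# 'Neil',23,'01-Jan-1990'
# 25,'Reena',19900203
```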
I have a Unix file which has data like this.
1379545632,
1051908588,
229102020,
1202084378,
1102083491,
1882950083,
152212030,
1764071734,
1371766009,
I want to transpose it and print it as a single line, like this:
1379545632,1051908588,229102020,1202084378,1102083491,1882950083,152212030,1764071734,1371766009
Also remove the last comma.
Can someone help? I need a shell/awk solution.
tr -d '\n' < file.txt
To remove the trailing comma, pipe through sed 's/,$//'.
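Combined, with a small sample (assuming the numbers are one per line as in the question):

```shell
# Sample input: one comma-terminated number per line
printf '1379545632,\n1051908588,\n229102020,\n' > file.txt

# Join all lines, then drop the trailing comma
tr -d '\n' < file.txt | sed 's/,$//'
# 1379545632,1051908588,229102020
```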
With GNU awk for multi-char RS:
$ printf 'x,\ny,\nz,\n' | awk -v RS='^$' '{gsub(/\n|(,\n$)/,"")} 1'
x,y,z
awk 'BEGIN { ORS="" } { print }' file
ORS is the Output Record Separator: each record is terminated with this delimiter, so setting it to the empty string joins all the lines.
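Note that with ORS="" the trailing comma from the last input line is kept, so you may still want the sed cleanup, e.g.:

```shell
# awk joins the lines; sed removes the comma left at the very end
printf '1,\n2,\n3,\n' | awk 'BEGIN{ORS=""} {print}' | sed 's/,$//'
# 1,2,3
```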