Using awk, "if", then sed - unix

awk '{ if (NR -ge $num && NR -le $num_max && $1 == $UM);
then sed 's/'$UM'/'$UM_lower'/g'dipping_CR_UM_change.txt > new.txt}'
I get the following error
awk: syntax error at source line 1
context is
{ if (NR -ge $num && NR -le $num_max && $1 == $UM); then sed >>> s/$UM/$UM_lower/gdipping_CR_UM_change. <<< txt > new.txt}
awk: illegal statement at source line 1

The power of Awk lets you pass variables and make the sed part with the gsub() function:
awk -v UM="$UM" -v UM_lower="$UM_lower" -v num="$num" '{ if (NR > num && NR < num_max && $1 == UM) { gsub(UM, UM_lower) } print $0 }'

Related

Unix shell - syntax error near unexpected token `done'

Below is my code.
#!/bin/ksh
curdate=$(date '+%d%h,%Y')
while read line;
do
echo "$line" > new10.txt
str0=$(cut -f 2 new10.txt)
str01=$(cut -f 1 -d ',' new10.txt)
str1=$(cut -f 2 -d ',' new10.txt)
str2=$(cut -c 3 $str1)
if [ $str2=':' ];
then
str2=',2016'
finalstr=$str01$str2
if [ '01jan2017' -le $finalstr -le $curdate ];
then
finalstr1=$str01',2017'
else
finalstr1=$str01',2016'
echo $finalstr1 > datefinal.txt
fi
done < /export/home/islams/PISAS/userwiseutil/date.txt
I am getting following errors:
date1.sh: line 22: syntax error near unexpected token done'
date1.sh: line 22:done < /export/home/islams/PISAS/userwiseutil/date.txt'
#!/bin/ksh
curdate=$(date '+%d%h,%Y')
while read line;
do
echo "$line" > new10.txt
str0=$(cut -f 2 new10.txt)
str01=$(cut -f 1 -d ',' new10.txt)
str1=$(cut -f 2 -d ',' new10.txt)
str2=$(cut -c 3 $str1)
if [ $str2=':' ];
then
str2=',2016'
finalstr=$str01$str2
if [ '01jan2017' -le $finalstr -le $curdate ];
then
finalstr1=$str01',2017'
else
finalstr1=$str01',2016'
echo $finalstr1 > datefinal.txt
fi
fi #You missed this
done < /export/home/islams/PISAS/userwiseutil/date.txt
You missed one fi

How can I extract all repeated pattern in a line to comma separated format

I am extracting an interested pattern in a file. In each line I have repeated pattern and I want to order all repeated pattern for each line in a comma separated format. For example: In each line I have a string like this:
Line1: InterPro:IPR000504 InterPro:IPR003954 InterPro:IPR012677 Pfam:PF00076 PROSITE:PS50102 SMART:SM00360 SMART:SM00361 EMBL:CP002684 Proteomes:UP000006548 GO:GO:0009507 GO:GO:0003723 GO:GO:0000166 Gene3D:3.30.70.330 SUPFAM:SSF54928 eggNOG:KOG0118 eggNOG:COG0724 InterPro:IPR003954
Line2: InterPro:IPR000306 InterPro:IPR002423 InterPro:IPR002498 Pfam:PF00118 Pfam:PF01363 Pfam:PF01504 PROSITE:PS51455 SMART:SM00064 SMART:SM00330 InterPro:IPR013083 Proteomes:UP000006548 GO:GO:0005739 GO:GO:0005524 EMBL:CP002686 GO:GO:0009555 GO:GO:0046872 GO:GO:0005768 GO:GO:0010008 Gene3D:3.30.40.10 InterPro:IPR017455
I want to extract all InterPro IDs for each line as like as this :
IPR000504,IPR003954,IPR012677,IPR003954
IPR000306,IPR002423,IPR002498,IPR013083,IPR017455
I have used this script:
while read line; do
NUM=$(echo $line | grep -oP 'InterPro:\K[^ ]+' | wc -l)
if [ $NUM -eq 0 ];then
echo "NA" >> InterPro.txt;
fi;
if [ ! $NUM -eq 0 ];then
echo $line | grep -oP 'InterPro:\K[^ ]+' | tr '\n' ',' >> InterPro.txt;
fi;
done <./File.txt
The problem is once I run this script, all the pattern's values in the File.txt print in one line. I want all interested pattern's values of each line print in separated line.
Thank you in advance
With awk:
awk '{for (i=1; i<=NF; ++i) {if ($i~/^InterPro:/) {gsub(/InterPro:/, "", $i); x=x","$i}} gsub (/^,/, "", x); print x; x=""}' file
Output:
IPR000504,IPR003954,IPR012677,IPR003954
IPR000306,IPR002423,IPR002498,IPR013083,IPR017455
With indent and more meaningful variable names:
awk '
{
for (column=1; column<=NF; ++column)
{
if ($column~/^InterPro:/)
{
gsub(/InterPro:/, "", $column)
line=line","$column
}
}
gsub (/^,/, "",line)
print line
line=""
}' file
With bash builtin commands:
while IFS= read -r line; do
for column in $line; do
[[ $column =~ ^InterPro:(.*) ]] && new+=",${BASH_REMATCH[1]}"
done
echo "${new#,*}"
unset new
done < file
Finally, I changed the script and could get the interested results:
while read line; do
NUM=$(echo $line | grep -oP 'InterPro:\K[^ ]+' | wc -l)
if [ $NUM -eq 0 ];then
echo "NA" >> InterPro.txt;
fi;
if [ ! $NUM -eq 0 ];then
echo $line | grep -oP 'InterPro:\K[^ ]+' | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}' | sed 's/ /,/g' >> InterPro.txt;
fi;
done <./File.txt

Date validation in Unix shell script (ksh)

I am validating the date in Unix shell script as follow:
CheckDate="2010-04-09"
regex="[1-9][0-9][0-9][0-9]-[0-9][0-9]-[0-3][0-9]"
if [[ $CheckDate = *#$regex ]]
then
echo "ok"
else
echo "not ok"
fi
But ksh it is giving output as not okay.. pls help.. i want output as ok
Here is my little script (written in Solaris 10, nawk is mandatory... sorry...). I know if you try to trick it by sending an alphanumeric you get an error on the let statements. Not perfect, but it gets you there...
#!/usr/bin/ksh
# checks for "-" or "/" separated 3 field parameter...
if [[ `echo $1 | nawk -F"/|-" '{print NF}'` -ne 3 ]]
then
echo "invalid date!!!"
exit 1
fi
# typeset trickery...
typeset -Z4 YEAR
typeset -Z2 MONTH
typeset -Z2 DATE
let YEAR=`echo $1 | nawk -F"/|-" '{print $3}'`
let MONTH=`echo $1 | nawk -F"/|-" '{print $1}'`
let DATE=`echo $1 | nawk -F"/|-" '{print $2}'`
let DATE2=`echo $1 | nawk -F"/|-" '{print $2}'`
# validating the year
# if the year passed contains letters or is "0" the year is invalid...
if [[ $YEAR -eq 0 ]]
then
echo "Invalid year!!!"
exit 2
fi
# validating the month
if [[ $MONTH -eq 0 || $MONTH -gt 12 ]]
then
echo "Invalid month!"
exit 3
fi
# Validating the date
if [[ $DATE -eq 0 ]]
then
echo "Invalid date!"
exit 4
else
CAL_CHECK=`cal $MONTH $YEAR | grep $DATE2 > /dev/null 2>&1 ; echo $?`
if [[ $CAL_CHECK -ne 0 ]]
then
echo "invalid date!!!"
exit 5
else
echo "VALID DATE!!!"
fi
fi
You can try this and manipulate
echo "04/09/2010" | awk -F '/' '{ print ($1 <= 04 && $2 <= 09 && match($3, /^[0-9][0-9][0-9][0-9]$/)) ? "good" : "bad" }'
echo "2010/04/09" | awk -F '/' '{ print ( match($1, /^[0-9][0-9][0-9][0-9]$/) && $2 <= 04 && $3 <= 09 ) ? "good" : "bad" }'
Please find the below code works as your exception.
export checkdate="2010-04-09"
echo ${checkdate} | grep '^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$'
if [ $? -eq 0 ]; then
echo "Date is valid"
else
echo "Date is not valid"
fi

How to find the distinct values in unix

I need distinct values from the below columns:
AA|BB|CC
a#gmail.com,c#yahoo.co.in|a#gmail.com|a#gmail.com
y#gmail.com|x#yahoo.in,z#redhat.com|z#redhat.com
c#gmail.com|b#yahoo.co.in|c#uix.xo.in
Here records are '|' seperated and in the 1st column, we can two email id's which are ',' seperated. so, I want to consider that also. I want distinct email id's in the AA,BB,CC column, whether it is '|' seperated or ',' seperated.
Expected output:
c#yahoo.co.in|a#gmail.com|
y#gmail.com|x#yahoo.in|z#redhat.com
c#gmail.com|b#yahoo.co.in|c#uix.xo.in
is awk unix enough for you?
{
for(i=1; i < NF; i++) {
if ($i ~ /#/) {
mail[$i]++
}
}
}
END {
for (x in mail) {
print mail[x], x
}
}
output:
$ awk -F'[|,]' -f v.awk f1
2 z#redhat.com
3 a#gmail.com
1 x#yahoo.in
1 c#yahoo.co.in
1 c#gmail.com
1 y#gmail.com
1 b#yahoo.co.in
Using awk :
cat file | tr ',' '|' | awk -F '|' '{ line=""; for (i=1; i<=NF; i++) {if ($i != "" && list[NR"#"$i] != 1){line=line $i "|"}; list[NR"#"$i]=1 }; print line}'
Prints :
a#gmail.com|c#yahoo.co.in|
y#gmail.com|x#yahoo.in|z#redhat.com|
c#gmail.com|b#yahoo.co.in|c#uix.xo.in|
Edit :
Now works properly with inputs such as :
a#gmail.com|c#yahoo.co.in|
y#gmail.com|x#yahoo.in|a#gmail.com|
c#gmail.com|c#yahoo.co.in|c#uix.xo.in|
Prints :
a#gmail.com|c#yahoo.co.in|
y#gmail.com|x#yahoo.in|a#gmail.com|
c#gmail.com|c#yahoo.co.in|c#uix.xo.in|
The following python code will solve your problem:
#!/usr/bin/env python
while True:
try:
addrs = raw_input()
except EOFError:
break
print '|'.join(set(addrs.replace(',', '|').split('|')))
In Bash only:
while read s; do
IFS='|,'
for e in $s; do
echo "$e"
done | sort | uniq
unset IFS
done
This seems to work, although I'm not sure what to do if there are more than three unique mails. Run with awk -f filename.awk dataname.dat
BEGIN {IFS=/[,|]/}
NF {
delete uniqmails;
for (i=1; i<=NF; i++)
uniqmails[$i] = 1;
sep="";
n=0;
for (m in uniqmails) {
printf "%s%s", sep, m;
sep="|";
n++;
}
for (;n<3;n++) printf "|";
print ""; // EOL
}
There's also this "one-liner" that doesn't need awk:
while read line; do
echo $line | tr ",|" "\n" | sort -u |\
paste <( seq 3) - | cut -f 2 |\
tr "\n" "|" |\
rev | cut -c 2- | rev;
done
With perl:
perl -lane '$s{$_}++ for split /[|,]/; END { print for keys %s;}' input
I have edited this post, Hope it will work
while read line
do
val1=`echo $line|awk -F"|" '{print $1}'`
val2=`echo $line|awk -F"|" '{print $2}'`
val3=`echo $line|awk -F"|" '{print $3}'`
a=`echo $line|awk -F"|" '{print $2,"|",$3}'|sed 's/'$val1'//g'`
aa=`echo "$val1|$a"`
b=`echo $aa|awk -F"|" '{print $1,"|",$3}'|sed 's/'$val2'//g'`
b1=`echo $b|awk -F"|" '{print $1}'`
b2=`echo $b|awk -F"|" '{print $2}'`
bb=`echo "$b1|$val2|$b2"`
c=`echo $bb|awk -F"|" '{print $1,"|",$2}'|sed 's/'$val3'//g'`
cc=`echo "$c|$val3"|sed 's/,,/,/;s/,|/|/;s/|,/|/;s/^,//;s/ //g'`
echo "$cc">>abcd
done<ab.dat
cat abcd
c#yahoo.co.in||a#gmail.com
y#gmail.com|x#yahoo.in|z#redhat.com
c#gmail.com|b#yahoo.co.in|c#uix.xo.in
You can subtract all "," separated values and parse in the same way...if your all values are having "," separated.

how do you skip the last line w/ awk?

I am processing a file with awk and need to skip some lines. The internet dosen't have a good answer.
So far the only info I have is that you can skip a range by doing:
awk 'NR==6,NR==13 {print}' input.file
OR
awk 'NR <= 5 { next } NR > 13 {exit} { print}' input.file
You can skip the first line by inputting:
awk 'NR < 1 { exit } { print}' db_berths.txt
How do you skip the last line?
One way using awk:
awk 'NR > 1 { print prev } { prev = $0 }' file.txt
Or better with sed:
sed '$d' file.txt
You can try:
awk 'END{print NR}' file

Resources