NOT IN, like in a database, in AWK - unix

How can I simplify these != comparisons? I have plenty of values like this:
if (charNr%2 == 0 && newChar != " " && newChar !="0" && newChar !="1" && newChar !="2" && newChar !="3" && newChar !="4" && newChar !="5" && newChar !="6" && newChar !="7" && newChar !="8" && newChar !="9" ) {newStr = newStr newChar }
I want to use it in AWK functions on AIX 7.1.2. Please help me.
I am expecting something like:
if (charNr%2 == 0 && newChar NOT IN (1 2 3 4 5 6 7 8 9 0) ) {newStr = newStr newChar }

The in operator in Awk works with array keys, so you can do:
keys["foo"];
"foo" in keys # true
For your example, you would have to create an array containing all the keys first:
keys[1]; keys[2]; keys[3]; # etc.
In this specific case you could use a loop to help you:
for (i = 0; i < 10; ++i) keys[i] # set keys from 0 to 9
newChar in keys # true if newChar is 0-9
In the general case, you can use:
input = "first,second,third,fourth"
n = split(input, temp, /,/)
for (i = 1; i <= n; ++i) keys[temp[i]]
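Putting this together for the loop in the question, a minimal sketch might look like the following. It assumes characters are counted from position 1 and that spaces should be skipped too; the function name keep_even_non_digits and the sample input are hypothetical. The parentheses in !(newChar in keys) matter: ! binds tighter than in, so !newChar in keys would test (!newChar) instead.

```shell
#!/bin/sh
# Hypothetical helper: keep the characters at even positions (counting from 1)
# that are neither a digit nor a space, mirroring the loop in the question.
keep_even_non_digits() {
    awk '
    BEGIN {
        # Build the "NOT IN" set once: digits 0-9 plus a space
        for (i = 0; i <= 9; i++) keys[i]
        keys[" "]
    }
    {
        newStr = ""
        for (charNr = 1; charNr <= length($0); charNr++) {
            newChar = substr($0, charNr, 1)
            # !(x in arr) is the awk spelling of NOT IN
            if (charNr % 2 == 0 && !(newChar in keys))
                newStr = newStr newChar
        }
        print newStr
    }'
}

printf 'a1b2 c3d4e5f\n' | keep_even_non_digits    # prints cdef
```

The set is built once in BEGIN, so each membership test is a single array lookup instead of a chain of != comparisons.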

If your blacklist consists of single digits there are easier ways, but assuming that you have a list of arbitrary tokens, you can use this trick:
awk -v n='t1 t2 t3 t4' 'FS n FS !~ FS $1 FS'
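It adds FS to the beginning and end of the list and checks whether the FS-padded keyword (here $1; replace it with your variable) matches it. This assumes the default field separator is used; otherwise, separate the tokens in the list with the same delimiter (a delimiter that is a regex metacharacter, such as |, would need escaping, because the right-hand side of !~ is a regular expression).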
For example,
$ awk -v n='11 13 17 19' 'FS n FS !~ FS $1 FS' < <(seq 10 20)
10
12
14
15
16
18
20
If your list consists of arbitrary single characters (or digits), you can simplify it to:
$ awk 'FS $1 FS !~ / [2357] /' < <(seq 10)
1
4
6
8
9
10
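Because the right-hand side of !~ is treated as a regular expression, a value containing metacharacters such as . can match a token it should not: with n='11 1a5', the value 1.5 would be dropped because the regex " 1.5 " matches " 1a5 ". A minimal sketch of the same padding trick using the POSIX index() function, which does a plain substring search instead; the function name not_in_list is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print lines whose first field is NOT in the
# space-separated list given as $1. index() compares literally, so
# regex metacharacters in the data have no special meaning.
not_in_list() {
    awk -v n="$1" 'index(FS n FS, FS $1 FS) == 0'
}

printf '10\n11\n1.5\n12\n' | not_in_list '11 1a5'
# prints 10, 1.5 and 12; only 11 is in the list
```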

To simplify the != tests, put the unwanted values in an array and test ($1 in a) == 0, i.e. the value is not in the array, like this:
$ awk '
BEGIN {
for(i=0;i<=9;i++) a[i] # add values 0...9 to array a
}
(($1 in a) == 0) { # if value is not found in array
print # print it
}' <(seq 1 12) # test values
10
11
12

Related

Explain the AWK syntax in detail (NR, FNR, NF)

I am learning file comparison using awk.
I found syntax like the following:
awk '
(NR == FNR) {
s[$0]
next
}
{
for (i=1; i<=NF; i++)
if ($i in s)
delete s[$i]
}
END {
for (i in s)
print i
}' tests tests2
I couldn't understand the syntax. Can you please explain it in detail?
What exactly does it do?
awk ' # use awk
(NR == FNR) { # process first file
s[$0] # hash the whole record to array s
next # process the next record of the first file
}
{ # process the second file
for (i=1; i<=NF; i++) # for each field in record
if ($i in s) # if field value found in the hash
delete s[$i] # delete the value from the hash
}
END { # after processing both files
for (i in s) # all leftover values in s
print i # are output
}' tests tests2
For example, for files:
tests:
1
2
3
tests2:
1 2
4 5
the program would output:
3
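To try the script, here is a small sketch that recreates the two example files in a temporary directory (the directory name and the function name leftover_keys are hypothetical). It drops the if ($i in s) test, since deleting an element that does not exist is harmless in awk. One caveat of the NR == FNR idiom: if the first file is empty, NR == FNR is also true for the second file, so its lines are stored instead of compared.

```shell
#!/bin/sh
# Hypothetical helper: print lines of file $1 that never occur as a field in file $2.
leftover_keys() {
    awk '
    NR == FNR { s[$0]; next }                    # first file: remember each line
    { for (i = 1; i <= NF; i++) delete s[$i] }   # second file: forget every field seen
    END { for (k in s) print k }                 # whatever is left was never seen
    ' "$1" "$2"
}

dir=${TMPDIR:-/tmp}/nrfnr_demo.$$
mkdir -p "$dir"
printf '1\n2\n3\n' > "$dir/tests"
printf '1 2\n4 5\n' > "$dir/tests2"
leftover_keys "$dir/tests" "$dir/tests2"    # prints 3
rm -rf "$dir"
```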

accumulate multiple values against one record in awk

I have a file like this:
1|dev|Smith|78|minus
1|ana|jhon|23|plus
1|ana|peter|22|plus
2|dev|dash|45|minus
2|dev||44|plus
I want the following output: for each unique combination of columns 1 and 2, print the accumulated values of columns 3 and 5:
1|dev|Smith|minus
1|ana|jhon;peter|plus;plus
2|dev|dash;|minus;plus
I can accumulate multiple records into one for a single column, but I want to do it for two columns in one command:
awk -F"|" '{if(a[$1"|"$2])a[$1"|"$2]=a[$1"|"$2]";"$5; else
a[$1"|"$2]=$5;}END{for (i in a)print i, a[i];}' OFS="|" input.txt > output.txt
It gives this output:
2|dev|minus;plus
1|ana|plus;plus
1|dev|minus
If GNU datamash is okay:
$ # -g 1,2 tells to group by 1st and 2nd column
$ # collapse 3 collapse 5 tells to combine those column values
$ datamash -t'|' -g 1,2 collapse 3 collapse 5 < ip.txt
1|dev|Smith|minus
1|ana|jhon,peter|plus,plus
2|dev|dash,|minus,plus
$ # easy to change , to ; if input file doesn't contain ,
$ datamash -t'|' -g 1,2 collapse 3 collapse 5 < ip.txt | tr ',' ';'
1|dev|Smith|minus
1|ana|jhon;peter|plus;plus
2|dev|dash;|minus;plus
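In awk, not the usual way: the first record for a key sets $3|$5, and each later record is added outwards around the existing value, turning it into $3;$3|$5;$5. That is why the output shows ;dash instead of dash; (and peter;jhon, since column 3 is built in reverse order):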
$ awk '
BEGIN { FS=OFS="|" }
{
a[$1 OFS $2]=$3(a[$1 OFS $2]?";"a[$1 OFS $2]";":"|")$5
}
END {
for(i in a)
print i,a[i]
}' file
2|dev|;dash|minus;plus
1|ana|peter;jhon|plus;plus
1|dev|Smith|minus
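The proper awk way would probably be closer to the following. Note that it skips empty values, so 2|dev gives dash instead of dash;, and that for (i in a) returns the keys in no particular order: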
$ awk '
BEGIN { FS=OFS="|" }
{
i=$1 OFS $2
a[i] = a[i] ( a[i]=="" || $3=="" ? "" : ";" ) $3
b[i] = b[i] ( b[i]=="" || $5=="" ? "" : ";" ) $5
}
END {
for(i in a)
print i,a[i],b[i]
}' file
2|dev|dash|minus;plus
1|ana|jhon;peter|plus;plus
1|dev|Smith|minus
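If the output must match the requested format exactly, keeping the empty value (so 2|dev gives dash;) and listing keys in input order rather than the unspecified for (i in a) order, a sketch along these lines may work. The function name accumulate and the order/c/f array names are illustrative assumptions. Using k in c rather than testing c[k] for truthiness means a key whose first value is empty is still recognized as seen.

```shell
#!/bin/sh
# Hypothetical helper: group pipe-separated records by fields 1 and 2,
# joining fields 3 and 5 with ";" (empty values are kept) in input order.
accumulate() {
    awk '
    BEGIN { FS = OFS = "|" }
    {
        k = $1 OFS $2
        if (!(k in c)) {            # first time this key is seen
            order[++n] = k
            c[k] = $3
            f[k] = $5
        } else {                    # append, even when the value is empty
            c[k] = c[k] ";" $3
            f[k] = f[k] ";" $5
        }
    }
    END {
        for (j = 1; j <= n; j++)
            print order[j], c[order[j]], f[order[j]]
    }'
}

printf '%s\n' '1|dev|Smith|78|minus' '1|ana|jhon|23|plus' \
    '1|ana|peter|22|plus' '2|dev|dash|45|minus' '2|dev||44|plus' | accumulate
# prints:
# 1|dev|Smith|minus
# 1|ana|jhon;peter|plus;plus
# 2|dev|dash;|minus;plus
```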

How to replace the second occurrence of a pattern in a unix file

I want to replace the second occurrence of the pattern in unix.
Input file:
12345|45345|TaskID|dksj|kdjfdsjf|TaskID|12
1245|425345|TaskID|dksj|kdjfdsjf|TaskID|12
1234|25345|TaskID|dksj|TaskID|kdjfdsjf|12|TaskID
123425|65345|TaskID|dksj|kdjfdsjf|12|TaskID
123425|15325|TaskID|dksj|kdjfdsjf|12
Sample output file:
12345|45345|TaskID|dksj|kdjfdsjf|TaskID1|12
1245|425345|TaskID2|dksj|kdjfdsjf|TaskID3|12
1234|25345|TaskID|dksj|TaskID1|kdjfdsjf|12|TaskID2
123425|65345|TaskID3|dksj|kdjfdsjf|12|TaskID4
123425|15325|TaskID|dksj|kdjfdsjf|12
Your example does not match your question,
so I'll only show how to replace every second match of the given pattern.
Use awk; it's a very powerful tool for command-line text processing.
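Create replace.sh as follows:
#!/bin/sh
# Usage: ./replace.sh PATTERN REPLACEMENT < input.txt
# Replaces every second match of PATTERN, counted across the whole input.
awk -v search="$1" -v repl="$2" '
BEGIN {
flag = 0
}
{
# split() returns the number of pieces (length() on an array is not portable)
len = split($0, a, search)
for (f = 1; f < len; f += 1) {
printf "%s%s", a[f], (flag % 2 == 0 ? search : repl)
flag += 1
}
printf "%s%s", a[len], ORS
}
'
Make it executable and run it:
chmod +x replace.sh
./replace.sh TaskID TaskID1 < input.txt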
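If the intent is instead to replace the second occurrence on each line (rather than every second occurrence across the file), the numeric flag of sed's s command does this directly. A minimal sketch; replace_second is a hypothetical wrapper, and it assumes the pattern is a basic regular expression, neither argument contains /, and the replacement contains no & or \:

```shell
#!/bin/sh
# Hypothetical wrapper: replace only the 2nd match of $1 on every line with $2.
# $1 is a basic regular expression; neither argument may contain "/",
# and "&" or "\" in $2 would be interpreted by sed.
replace_second() {
    sed "s/$1/$2/2"
}

printf '%s\n' '1234|25345|TaskID|dksj|TaskID|kdjfdsjf|12|TaskID' | replace_second TaskID TaskID1
# prints 1234|25345|TaskID|dksj|TaskID1|kdjfdsjf|12|TaskID
```

Lines with fewer than two matches are printed unchanged.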

How to print a line with a pattern which is nearest to another line with a specific pattern?

I want to find the line matching a pattern that is nearest to a line with a specific pattern. For example, I want to print the "bbb=" line under "yyyy:" (the bbb= line closest to yyyy), which is line 8. The line numbers and the order might change, so it is better not to rely on line numbers.
root# vi a
"a" 15 lines
1 ## xxxx:
2 aaa=3
3 bbb=4
4 ccc=2
5 ddd=1
6 ## yyyy:
7 aaa=1
8 bbb=0
9 ccc=3
10 ddd=3
11 ## zzzz:
12 aaa=1
13 bbb=1
14 ccc=1
15 ddd=1
Do you have an idea using awk or grep for this purpose?
Something like this?
awk '/^## yyyy:/ { i = 1 }; i && /^bbb=/ { print; exit }'
Or can a line above also match? In that case, perhaps:
awk '/^bbb=/ && !i { p=NR; s=$0 }; /^bbb=/ && i { print ((NR-i < i-p) ? $0 : s); exit }; /^## yyyy:/ { i=NR }'
Taking into account that there might not be a previous or next entry:
/^bbb=/ && !i { p1 = NR; s1 = $0 }
/^bbb=/ && i { p2 = NR; s2 = $0; exit }
/^## yyyy:/ { i = NR }
END {
if (p1 == 0)
print s2
else if (p2 == 0)
print s1
else
print (i - p1 < p2 - i ? s1 : s2)
}
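The same logic can be wrapped so that both patterns are passed in rather than hard-coded. A sketch under these assumptions: the function name nearest_match is hypothetical, both arguments are awk regular expressions, the data is read from standard input, and the exit status is non-zero when the header or the wanted line is missing. On a tie, the line after the header wins, as in the script above.

```shell
#!/bin/sh
# Hypothetical helper: print the line matching regex $2 that is closest to
# the first line matching regex $1, looking both above and below it.
nearest_match() {
    awk -v hdr="$1" -v key="$2" '
    $0 ~ key && !i { p1 = NR; s1 = $0 }         # latest match above the header
    $0 ~ key && i  { p2 = NR; s2 = $0; exit }   # first match below the header
    !i && $0 ~ hdr { i = NR }                   # remember only the first header
    END {
        if (!i || (p1 == 0 && p2 == 0)) exit 1  # header or match missing
        if (p1 == 0)      print s2
        else if (p2 == 0) print s1
        else              print (i - p1 < p2 - i ? s1 : s2)
    }'
}

printf '%s\n' '## xxxx:' 'aaa=3' 'bbb=4' 'ccc=2' 'ddd=1' \
    '## yyyy:' 'aaa=1' 'bbb=0' 'ccc=3' 'ddd=3' \
    '## zzzz:' 'aaa=1' 'bbb=1' 'ccc=1' 'ddd=1' | nearest_match '^## yyyy:' '^bbb='
# prints bbb=0
```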
Quick and dirty using grep (-A is not POSIX, but GNU grep supports it):
grep -A 100 '## yyyy:' filename | grep 'bbb=' | head -n 1

How to compare versions of some products in unix ksh shell?

The version format is X.X.X.X, where each X is a number.
What is the best way to compare two versions?
I use the following code:
compareVersions()
{
VER_1=$1
VER_2=$2
print -R "$VER_1"| IFS=. read v1_1 v1_2 v1_3 v1_4
print -R "$VER_2"| IFS=. read v2_1 v2_2 v2_3 v2_4
RESULT="0"
if [[ "${v1_1}" -lt "${v2_1}" ]]
then
RESULT="-1"
elif [[ "${v1_1}" -gt "${v2_1}" ]]
then
RESULT="1"
elif [[ "${v1_2}" -lt "${v2_2}" ]]
then
RESULT="-1"
elif [[ "${v1_2}" -gt "${v2_2}" ]]
then
RESULT="1"
elif [[ "${v1_3}" -lt "${v2_3}" ]]
then
RESULT="-1"
elif [[ "${v1_3}" -gt "${v2_3}" ]]
then
RESULT="1"
elif [[ "${v1_4}" -lt "${v2_4}" ]]
then
RESULT="-1"
elif [[ "${v1_4}" -gt "${v2_4}" ]]
then
RESULT="1"
fi
echo "$RESULT"
}
But I do not like it; it is very verbose and repetitive.
Is there a more correct way to compare versions?
Pure Bash / Ksh:
compareVersions ()
{
typeset IFS='.'
typeset -a v1=( $1 )
typeset -a v2=( $2 )
typeset n diff
for (( n=0; n<4; n+=1 )); do
diff=$((v1[n]-v2[n]))
if [ $diff -ne 0 ] ; then
[ $diff -le 0 ] && echo '-1' || echo '1'
return
fi
done
echo '0'
} # ---------- end of function compareVersions ----------
Maybe you could use awk?
echo "$VER_1" "$VER_2" | \
awk '{ split($1, a, ".");
split($2, b, ".");
x = 0;   # 0 means equal unless a difference is found
for (i = 1; i <= 4; i++)
if (a[i] + 0 < b[i] + 0) {
x = -1;
break;
} else if (a[i] + 0 > b[i] + 0) {
x = 1;
break;
}
print x;
}'
There isn't a perfect way to do this. As shown, you could use an array and a loop for the numbers, in bash as well.
You can use sort -V to sort the versions and check whether your version comes first. Note that -V is an extension (GNU sort has it), so not every sort supports it:
% cat sorttest
#!/bin/sh
# true if $1 sorts before $2 (or is equal to it)
version_lt() {
[ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
display_versioncmp() {
version_lt "$1" "$2" && echo "$1 < $2" || echo "$1 > $2"
}
X="1.2.3"
Y="11.2.3"
Z="1.22.3"
display_versioncmp "$X" "$Y"
display_versioncmp "$Y" "$X"
display_versioncmp "$X" "$Z"
display_versioncmp "$Z" "$X"
display_versioncmp "$Z" "$Y"
display_versioncmp "$Y" "$Z"
% ./sorttest
1.2.3 < 11.2.3
11.2.3 > 1.2.3
1.2.3 < 1.22.3
1.22.3 > 1.2.3
1.22.3 < 11.2.3
11.2.3 > 1.22.3
If you can cheat by using Perl in your shell script, try its string comparison operator cmp, which works as long as corresponding parts of the versions have the same number of digits:
V1=1.1.3; V2=1.1
echo $(perl -e '($x,$y)=@ARGV; print $x cmp $y' $V1 $V2)
You could also do away with the Perl variables and just use shift:
result=$(perl -e 'print shift cmp shift' $V1 $V2)
But plain string comparison fails as soon as a part reaches 10 or more (for example, 10.0 compares as less than 9.0). So you could try this instead:
perl -e '($a,$b)=@ARGV; for ($a,$b) {s/(\d+)/sprintf "%5d", $1/ge}; print $a cmp $b;' 12.1.3 9.0.2
The "%5d" in the sprintf pads every number to five characters, so this works even for Firefox, up to version 99999... :-)
Obviously, you could also use the other Perl string operators like gt, lt, ge and le.
I had this problem, and after solving it I looked to see whether a better answer was already available. My version allows comparing version strings of different lengths. It is the version_ge() function below, which should be used as a "greater than or equal to" operator, as in:
if version_ge "$version" "1.2.3.4"; then ...
#!/bin/sh
# Usage: split "<word list>" <variable1> <variable2>...
# Split a string of $IFS separated words into individual words, and
# assign them to a list of variables. If there are more words than
# variables then all the remaining words are put in the last variable;
# use a dummy last variable to collect any unwanted words.
# Any variables for which there are no words are cleared.
# eg. split 'hello Fred this is Bill' greeting who extra
# sets greeting=hello who=Fred extra="this is Bill"
# and split "$list" word list # "pops" the first word from a list
split()
{
# Prefix local names with the function name to try to avoid conflicts
# local split_wordlist
split_wordlist="$1"
shift
read "$@" <<EOF-split-end-of-arguments
${split_wordlist}
EOF-split-end-of-arguments
}
# Usage: version_ge v1 v2
# Where v1 and v2 are multi-part version numbers such as 12.5.67
# Missing .<number>s on the end of a version are treated as .0, & leading
# zeros are not significant, so 1.2 == 1.2.0 == 1.2.0.0 == 01.2 == 1.02
# Returns true if v1 >= v2, false if v1 < v2
version_ge()
{
# Prefix local names with the function name to try to avoid conflicts
# local version_ge_1 version_ge_2 version_ge_a version_ge_b
# local version_ge_save_ifs
version_ge_v1="$1"
version_ge_v2="$2"
version_ge_save_ifs="$IFS"
while test -n "${version_ge_v1}${version_ge_v2}"; do
IFS="."
split "$version_ge_v1" version_ge_a version_ge_v1
split "$version_ge_v2" version_ge_b version_ge_v2
IFS="$version_ge_save_ifs"
#echo " compare $version_ge_a $version_ge_b"
test "0$version_ge_a" -gt "0$version_ge_b" && return 0 # v1>v2: true
test "0$version_ge_a" -lt "0$version_ge_b" && return 1 # v1<v2:false
done
# version strings are both empty & no differences found - must be equal.
return 0 # v1==v2: true
}
Here is a slightly improved version of the method previously posted by schot. It can save your life when bash, sort -V, etc. are not available (for instance, in some Docker images). It compares as many parts as the longer version has, treating missing parts as 0:
compareVersions() {
echo "$1" "$2" | \
awk '{ n = split($1, a, ".");
m = split($2, b, ".");
if (m > n) n = m;   # compare as many parts as the longer version has
res = 0;
for (i = 1; i <= n; i++) {
# missing parts count as 0; +0 forces a numeric comparison
if (a[i] + 0 < b[i] + 0) {
res = -1;
break;
} else if (a[i] + 0 > b[i] + 0) {
res = 1;
break;
}
}
print res;
}'
}
