I have this text file:
name, age
joe,42
jim,20
bob,15
mike,24
mike,15
mike,54
bob,21
Trying to get this (count):
joe 1
jim 1
bob 2
mike 3
Thanks,
$ awk -F, 'NR>1{arr[$1]++}END{for (a in arr) print a, arr[a]}' file.txt
joe 1
jim 1
mike 3
bob 2
EXPLANATIONS
-F, splits fields on ,
NR>1 processes only the lines after line 1 (skips the header)
arr[$1]++ increments the count in array arr, using the first column as the key
END{} block is executed after the whole file has been processed
for (a in arr) iterates over the keys of arr
print a, arr[a] prints each key followed by its count
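As a self-contained check of the one-liner above (inlining the sample data so no file is needed; sort is appended only because for (a in arr) visits keys in unspecified order):

```shell
printf 'name, age\njoe,42\njim,20\nbob,15\nmike,24\nmike,15\nmike,54\nbob,21\n' \
  | awk -F, 'NR>1{arr[$1]++} END{for (a in arr) print a, arr[a]}' \
  | sort
# bob 2
# jim 1
# joe 1
# mike 3
```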
Strip the header row, drop the age field, group the same names together (sort), count identical runs, output in desired format.
tail -n +2 file.txt | cut -d',' -f 1 | sort | uniq -c | awk '{ print $2, $1 }'
output
bob 2
jim 1
joe 1
mike 3
It looks like you want sorted output. You could simply pipe or print into sort -nk 2:
awk -F, 'NR>1 { a[$1]++ } END { for (i in a) print i, a[i] | "sort -nk 2" }' file
Results:
jim 1
joe 1
bob 2
mike 3
However, if you have GNU awk installed, you can perform the sorting without coreutils. Here's a single-process solution that sorts the array by its values. It should still be quite quick. Run it like:
awk -f script.awk file
Contents of script.awk:
BEGIN {
FS=","
}
NR>1 {
a[$1]++
}
END {
for (i in a) {
b[a[i],i] = i
}
n = asorti(b)
for (i=1;i<=n;i++) {
split (b[i], c, SUBSEP)
d[++x] = c[2]
}
for (j=1;j<=n;j++) {
print d[j], a[d[j]]
}
}
Results:
jim 1
joe 1
bob 2
mike 3
Alternatively, here's the one-liner:
awk -F, 'NR>1 { a[$1]++ } END { for (i in a) b[a[i],i] = i; n = asorti(b); for (i=1;i<=n;i++) { split (b[i], c, SUBSEP); d[++x] = c[2] } for (j=1;j<=n;j++) print d[j], a[d[j]] }' file
A strictly awk solution...
BEGIN { FS = "," }
{ ++x[$1] }
END { for(i in x) print i, x[i] }
If name, age is really in the file, you could adjust the awk program to ignore it...
BEGIN { FS = "," }
/[0-9]/ { ++x[$1] }
END { for(i in x) print i, x[i] }
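The header-skipping variant above can be sanity-checked with inline data; note that /[0-9]/ simply assumes every data row contains a digit and the header does not:

```shell
printf 'name, age\njoe,42\njoe,10\n' \
  | awk 'BEGIN { FS = "," } /[0-9]/ { ++x[$1] } END { for (i in x) print i, x[i] }'
# joe 2
```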
I came up with two functions based on the answers here:
topcpu() {
top -b -n1 \
| tail -n +8 \
| awk '{ print $12, $9, $10 }' \
| awk '{ CPU[$1] += $2; MEM[$1] += $3 } END { for (k in CPU) print k, CPU[k], MEM[k] }' \
| sort -k3 -n \
| tail -n 10 \
| column -t \
| tac
}
topmem() {
top -b -n1 \
| tail -n +8 \
| awk '{ print $12, $9, $10 }' \
| awk '{ CPU[$1] += $2; MEM[$1] += $3 } END { for (k in CPU) print k, CPU[k], MEM[k] }' \
| sort -k2 -n \
| tail -n 10 \
| column -t \
| tac
}
$ topcpu
chrome 0 75.6
gnome-shell 6.2 7
mysqld 0 4.2
zsh 0 2.2
deluge-gtk 0 2.1
Xorg 0 1.6
scrcpy 0 1.6
gnome-session-b 0 0.8
systemd-journal 0 0.7
ibus-x11 6.2 0.7
$ topmem
top 12.5 0
Xorg 6.2 1.6
ibus-x11 6.2 0.7
gnome-shell 6.2 7
chrome 6.2 74.6
adb 6.2 0.1
zsh 0 2.2
xdg-permission- 0 0.2
xdg-document-po 0 0.1
xdg-desktop-por 0 0.4
enjoy!
cut -d',' -f 1 file.txt |
sort | uniq -c
2 bob
1 jim
1 joe
3 mike
I have a file with the following data in it, for example:
20 V 70000003d120f88 1 2
20 V 70000003d120f88 2 2
20x00 V 70000003d120f88 2 2
10020 V 70000003d120f88 1 5
I want to get the sum of the 4th column data.
Using the command below, I can achieve this; however, the row starting with 20x00 is excluded. I want everything that starts with 20 to be summed and nothing else, so 20* for example:
cat testdata.out | awk '{if ($1 == '20') print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
The output value must be:
5
How can I achieve this using awk? The below attempt also does not work:
cat testdata.out | awk '$1 ~ /'20'/ {print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
There is no need to use 3 processes; everything can be done by one awk process. Check it out:
awk '$1 ~ /^20/ { a+=$4 } END { print a }' testdata.out
explanation:
$1 ~ /^20/ checks to see if $1 starts with 20
if yes, we add $4 to the variable a
finally, we print the variable a
result 5
EDIT:
Ed Morton rightly points out that the result should always be of the same type, which can be solved by adding 0 to the result.
You can set the exit status if it is necessary to distinguish whether a result of 0 is due to no matches
(exit status 0) or to matching only zero values (exit status 1).
The exit status for different input data can be checked with e.g. echo $?
The code would look like this:
awk '$1 ~ /^20/ { a+=$4 } END { print a+0; exit(a!="") }' testdata.out
Figured it out:
cat testdata.out | awk '$1 ~ /'^20'/ {print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
The above might not work for all cases, but below will suffice:
i=20
cat testdata.out | awk '{if ($1 == "'"$i"'" || $1 == "'"${i}"'x00") print $4;}' | awk '{s+=$1}END{printf("%.0f\n", s)}'
Hello, I have this file.txt:
a=a
b=b
c=c
d=d
e=e
f=f
...
etc. (about 150 rows)
I need the output to be:
a b c d e f ....
a b c d e f ....
I already tried using paste -d " " - - < file.txt, but I need something that works for a huge number of rows converted to columns.
Thank you in advance.
Try this:
awk -F= '{
arr1[NR]=$1
arr2[NR]=$2
}
END{
for (i=1; i<=NR; i++) {
printf("%s ", arr1[i])
}
print ""
for (i=1; i<=NR; i++) {
printf("%s ", arr2[i])
}
print ""
}' file
Output:
a b c d e f
a b c d e f
You can separate the file using the internal field separator:
while IFS== read -r left right; do echo $left; done < "test.txt" | xargs
This gives you the left side. For the right side, you could do
while IFS== read -r left right; do echo $right; done < "test.txt" | xargs
If you are talking about only 150 rows, scanning the file twice should be fine.
A mash of echo, cut and tr:
$ cat ip.txt
a=1
b=2
c=3
d=4
$ echo $(cut -d= -f1 ip.txt | tr '\n' ' ') ; echo $(cut -d= -f2 ip.txt | tr '\n' ' ')
a b c d
1 2 3 4
I have a record like this
1664|41.0000|0.683333|0.6560|
Command
$ awk -F"|" '/AL_ALL_CALLS_1.6P/ { if($22>0 && $182!="" && !$183)
  print $3,$7,$10,$12,$15,$22,$24,$36,$39,$40,$96,$103,$182,$184,$186}' CDR_File_1.txt \
  | awk -F"|" '{ for (i=1;i<=NF;i++) { if ($i=="") { $i="0" } } OFS=" "; print }' \
  | awk -F" " '{print $1,$2,$3,$4,$5,$6,$6/60,$7,$8,$9,$10,$11,$12,$13,$14,$15}' \
  | sed "s/ /|/g" \
  | awk -F"[|.]" '{for (i=1;i<=NF;i++) {if ($i==$i+0) {n=split($i,a,"."); $i=sprintf("%d %d", a[1], a[2])}}}1' \
  | head -1
Output
1664 0 41 0 0 0 0 0 683333 0 0 0 6560
Expected
1664 41 0000 0 683333 0 6560
Just check if a given field is a number and, in such case, split it:
awk '/anu/ { # lines containing "anu"
for (i=1;i<=NF;i++) { # loop through the fields
if ($i==$i+0) { # if it is a number
n=split($i,a,".") # slice the number
$i=sprintf("%d %d", a[1], a[2]) # put it back together with a space
}
}
}1' file # print the line
See it in action:
$ awk '/anu/ {for (i=1;i<=NF;i++) {if ($i==$i+0) {n=split($i,a,"."); $i=sprintf("%d %d", a[1], a[2])}}}1' file
45 0 0 25 abc anurag.jain
25.12 1.25 xyz stack
The key point here is the usage of the format-control letter %d in printf to remove the now superfluous leading zeroes:
$ awk 'BEGIN {printf "%d %d", 0000001, 01}'
1 1
Also, the usage of $var == $var +0 to check if a field is a number or not:
$ awk 'BEGIN {print "a" == "a" + 0}'
0
$ awk 'BEGIN {print 23.0 == 23.0 + 0}'
1
From your updated question I see you don't need to remove extra zeros: with $i=sprintf("%s %s", a[1], a[2]) we have more than enough. Also, since you have integers that do not need extra processing, it is best to check for these fields differently, for example with $i~/^[0-9]+\.[0-9]+$/.
$ awk -F"|" '{for (i=1;i<=NF;i++) {if ($i~/^[0-9]+\.[0-9]+$/) {n=split($i,a,"."); $i=sprintf("%s %s", a[1], a[2])}}}1' file
1664 41 0000 0 683333 0 6560
I have file like this
01 10 a
11 20 b
21 30 c
31 40 d
41 50 e
I want to input a number, compare it with the 1st and 2nd columns, and print the corresponding 3rd column.
For example, if I enter 23 it should display c; if I enter 45 it should display e.
egrep "^${DIGIT_1}[0-9] ${DIGIT_2}[0-9]" file | awk '{print $3}'
DIGIT_1 is 2 and DIGIT_2 is 3 in your example
Use this simple script:
#!/bin/sh
echo "Enter the number"
read num
while read line
do
set -- $line
if [ $num -ge $1 ] && [ $num -le $2 ] ;then
echo $3
exit 0
fi
done < filename
echo "not found"
Another awk approach:
awk -v d=<yournumber> '{dt=int(d/10);du=d-dt*10;c1=int($1/10);c2=int($2/10);if(dt==c1&&du==c2)print $3}' <yourfile>
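For example, with the sample data saved as ranges.txt (a hypothetical file name) and the number 23. Note that this approach compares the tens digit of the input with the tens digit of column 1 and the units digit with the tens digit of column 2, so it relies on this data's ten-wide ranges:

```shell
# build the sample file from the question
printf '01 10 a\n11 20 b\n21 30 c\n31 40 d\n41 50 e\n' > ranges.txt
# d=23: dt=2 matches int(21/10), du=3 matches int(30/10), so row "21 30 c" wins
awk -v d=23 '{dt=int(d/10);du=d-dt*10;c1=int($1/10);c2=int($2/10);if(dt==c1&&du==c2)print $3}' ranges.txt
# c
```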
In a simple awk script:
% awk -vfirst=2 -vsecond=3 '
$1 ~ first && $2 ~ second { print $3 }
' file-like-this
c
% awk -vfirst=4 -vsecond=5 '
$1 ~ first && $2 ~ second { print $3 }
' file-like-this
e
You can get awk to determine the first and second digits of a number like so:
% awk -vnumber=45 '
BEGIN { first = int(number / 10); second = number % 10 }
$1 ~ first && $2 ~ second { print $3 }
' file-like-this
e
A simple awk approach:
cat file
01 10 a
11 20 b
21 30 c
31 40 d
41 50 e
i=40
awk 'inp<=$2 {f=$3; exit} END {print f}' inp=$i file
d
I need distinct values from the below columns:
AA|BB|CC
a#gmail.com,c#yahoo.co.in|a#gmail.com|a#gmail.com
y#gmail.com|x#yahoo.in,z#redhat.com|z#redhat.com
c#gmail.com|b#yahoo.co.in|c#uix.xo.in
Here records are '|' separated, and in the 1st column we can have two email IDs which are ',' separated, so I want to consider that also. I want distinct email IDs across the AA, BB, CC columns, whether they are '|' separated or ',' separated.
Expected output:
c#yahoo.co.in|a#gmail.com|
y#gmail.com|x#yahoo.in|z#redhat.com
c#gmail.com|b#yahoo.co.in|c#uix.xo.in
is awk unix enough for you?
{
for(i=1; i <= NF; i++) {
if ($i ~ /#/) {
mail[$i]++
}
}
}
END {
for (x in mail) {
print mail[x], x
}
}
output:
$ awk -F'[|,]' -f v.awk f1
2 z#redhat.com
3 a#gmail.com
1 x#yahoo.in
1 c#yahoo.co.in
1 c#gmail.com
1 y#gmail.com
1 b#yahoo.co.in
Using awk:
cat file | tr ',' '|' | awk -F '|' '{ line=""; for (i=1; i<=NF; i++) {if ($i != "" && list[NR"#"$i] != 1){line=line $i "|"}; list[NR"#"$i]=1 }; print line}'
Prints :
a#gmail.com|c#yahoo.co.in|
y#gmail.com|x#yahoo.in|z#redhat.com|
c#gmail.com|b#yahoo.co.in|c#uix.xo.in|
Edit:
Now works properly with inputs such as:
a#gmail.com|c#yahoo.co.in|
y#gmail.com|x#yahoo.in|a#gmail.com|
c#gmail.com|c#yahoo.co.in|c#uix.xo.in|
Prints :
a#gmail.com|c#yahoo.co.in|
y#gmail.com|x#yahoo.in|a#gmail.com|
c#gmail.com|c#yahoo.co.in|c#uix.xo.in|
The following python code will solve your problem:
#!/usr/bin/env python
while True:
try:
addrs = raw_input()
except EOFError:
break
print '|'.join(set(addrs.replace(',', '|').split('|')))
In Bash only:
while read s; do
IFS='|,'
for e in $s; do
echo "$e"
done | sort | uniq
unset IFS
done
This seems to work, although I'm not sure what to do if there are more than three unique mails. Run with awk -f filename.awk dataname.dat
BEGIN {FS = "[,|]"}
NF {
delete uniqmails;
for (i=1; i<=NF; i++)
uniqmails[$i] = 1;
sep="";
n=0;
for (m in uniqmails) {
printf "%s%s", sep, m;
sep="|";
n++;
}
for (;n<3;n++) printf "|";
print ""  # EOL
}
There's also this "one-liner" that doesn't need awk:
while read line; do
echo $line | tr ",|" "\n" | sort -u |\
paste <( seq 3) - | cut -f 2 |\
tr "\n" "|" |\
rev | cut -c 2- | rev;
done
With perl:
perl -lane '$s{$_}++ for split /[|,]/; END { print for keys %s;}' input
I have edited this post; hope it will work:
while read line
do
val1=`echo $line|awk -F"|" '{print $1}'`
val2=`echo $line|awk -F"|" '{print $2}'`
val3=`echo $line|awk -F"|" '{print $3}'`
a=`echo $line|awk -F"|" '{print $2,"|",$3}'|sed 's/'$val1'//g'`
aa=`echo "$val1|$a"`
b=`echo $aa|awk -F"|" '{print $1,"|",$3}'|sed 's/'$val2'//g'`
b1=`echo $b|awk -F"|" '{print $1}'`
b2=`echo $b|awk -F"|" '{print $2}'`
bb=`echo "$b1|$val2|$b2"`
c=`echo $bb|awk -F"|" '{print $1,"|",$2}'|sed 's/'$val3'//g'`
cc=`echo "$c|$val3"|sed 's/,,/,/;s/,|/|/;s/|,/|/;s/^,//;s/ //g'`
echo "$cc">>abcd
done<ab.dat
cat abcd
c#yahoo.co.in||a#gmail.com
y#gmail.com|x#yahoo.in|z#redhat.com
c#gmail.com|b#yahoo.co.in|c#uix.xo.in
You can handle the ','-separated values and parse them in the same way, if all your values are ','-separated.
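As a sketch of that idea: awk can treat both separators uniformly by splitting on the character class [|,], so no tr pre-substitution is needed. This keeps the first occurrence of each address per line (the trailing | matches the earlier answer's output style; split("", seen) is the portable way to clear the per-line array):

```shell
printf 'a#gmail.com,c#yahoo.co.in|a#gmail.com|a#gmail.com\n' \
  | awk -F'[|,]' '{ split("", seen); out=""; for (i=1;i<=NF;i++) if ($i!="" && !seen[$i]++) out=out $i "|"; print out }'
# a#gmail.com|c#yahoo.co.in|
```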