Sort history on number of occurrences - unix

Basically I want to print the 10
most used commands that are stored in the
bash history but they still have to be proceeded
by the number that indicates when it was used;
I got this far:
history | cut -f 2 | cut -d ' ' -f 3,5 | sort -k 2 -n
Which should sort the second column of the number of occurrences from the command in that row... But it doesn't do that. I know I can head -10 the pipe at the end to take the highest ten of them, but I'm kinda stuck with the sorting part.

The 10 most used commands stored in your history:
history | sed -e 's/ *[0-9][0-9]* *//' | sort | uniq -c | sort -rn | head -10
This gives you the most used command line entries by removing the history number (sed), counting (sort | uniq -c), sorting by frequency (sort -rn) and showing only the top ten entries.
If you just want the commands alone:
history | awk '{print $2;}' | sort | uniq -c | sort -rn | head -10
Both of these strip the history number. Currently, I have no idea, how to achieve that in one line.

If you want to find the top used commands in your history file, you will have to count the instances in your history. awk can be used to do this. In the following code, the awk segment will create a hashtable with commands as the key and the number of times they appear as the value. This is printed out with the last history number for that command and sorted:
history | cut -f 2 | cut -d ' ' -f 3,5 | awk '{a[$2]++;b[$2]=$1} END{for (i in a) {print b[i], i, a[i]}}' | sort -k3 -rn | head -n 10
Output looks like:
975 cd 142
972 vim 122
990 ls 118
686 hg 90
974 mvn 51
939 bash 39
978 tac 32
958 cat 28
765 echo 27
981 exit 17
If you don't want the last column you could pipe the output through cut -d' ' -f1,2.

Related

Count lines that contains a repeated pattern

I have a file (File.txt) that contains 10 lines which may contains a repeated patterns at a certain position (14-17)
fsdf sfkljkl4565
fjjf lmlkfdm1235
fkljfgdfgdfg6583
eretjioijolj6933
ioj ijijsfoi4565
dgodiiopkpok6933
fsj opkjfiej4565
ihfzejijjijf4565
dfsdkfjlfeff1235
dijdijojijdz4565
The Desired Output is counting the lines that contains a pattern :
#occurences pattern
5 4565
2 1235
1 6583
2 6933
I have tried to filter the file
cat File.txt | cut -c14-17 | sort -n -K1,1-1,3 >> File_Filtered.txt
I need your help to add the first column (#occurences)
To get a count of repeats, use uniq -c. Thus, try:
$ cut -c13-17 File.txt | sort -n | uniq -c | sort -nr
5 4565
2 6933
2 1235
1 6583
The above was tested using Linux with GNU utilities. (Judging by your sample code, you may be using different tools.)
Including a header
The following includes the header and uses column -t to assure that everything lines up nicely:
$ { echo '#occurences pattern'; cut -c13-17 File.txt | sort -n | uniq -c | sort -nr; } | column -t
#occurences pattern
5 4565
2 6933
2 1235
1 6583
$ awk '{cnt[substr($0,13)]++} END{for (i in cnt) print cnt[i], i}' file
2 6933
1 6583
5 4565
2 1235

what is the alternate way to count the occurrence of each word without using 'uniq -c' command?

Is it possible to count the occurrence of each word like using uniq -c but with the count after the word rather than before?
Example scenario
Input file named as text1.txt which contain the following data
Renault:cilo:84563
Renault:cilo:84565
M&M:Thar:84566
Tata:nano:84567
M&M:quanto:84568
M&M:quanto:84569
The fields used in the above data are car_company:car_model:customerID
Desired result
cilo 2
Thar 1
nano 1
quanto 2
(car_model and number of cars sold grouped by car_model)
My code
cat test1.txt | cut -d: -f2 | uniq -c
Actual Result
2 cilo
1 Thar
1 nano
2 quanto
Is it possible to do the above process without using uniq -c ,so that I can swap the order of the fields (columns)?
You can use uniq, and simply post-process its output to swap the columns:
cut -d: -f2 test1.txt | uniq -c | awk '{print $2 "\t" $1 "\n" }'
EDIT: Added \n, as noted in a comment.
Save your commands output into a file "badresult";
cat test1.txt | cut -d: -f2 | uniq -c > badresult
Then cut the seventh field and save it into a file named "counts"(you should use space(" ") as a seperator);
cut -d" " -f7 badresult > counts
Then cut the eighth field and save it into a file named "models"(you should use space(" ") as a seperator);
cut -d" " -f8 badresult > models
Now you have your counts and models in seperate files. All you have to do is to show these two files seperately with "pr" command(-m: one file per column, -T:no pre-information)
pr -m -T models counts
Using awk:
cat test1.txt | cut -d: -f2 | uniq -c | awk '{ t = $1; $1 = $2; $2 = t; print }'
The little awk code exchanges fields 1 and 2 using a temporary.
You just need awk for this:
$ awk -F: '{a[$2]++} END {for (i in a) print i, a[i]}' file
cilo 2
quanto 2
nano 1
Thar 1
This goes through every line keeping track of how many times the second field has appeared. Since everything is stored in the array a, then it is just a matter of looping through it and printing its content.

Compare 2 files in unix file1(2M numbers/rows/lines) , file2(2,000,480 numbers/rows/lines)

How can I compare this 2 big files in unix.
I've already tried using 'grep -Fxvf file1.txt file2.txt | wc -l' but the output is 2,000,480 and when switching file1 and file2 the output is 1,999,999.
How can I get the output of '480' because that's what i am expecting.
I've also tried using diff/cmp commands but the output is too complicated.
I think you want an absolute value of a difference in line numbers in 2 files. You can achieve it easily with awk and get a decent result. You'd read numbers of lines in an array and later subtract the array values in the END block. For pure shell it'd have to get more complex. Imagine you get some test data generated (10 and 14 line files):
$ seq 1 10 > ten
$ seq 1 14 > fourteen
And then you do:
$ ( wc -l ten ; wc -l fourteen ) | awk '{ print $1}' | sort -rn | xargs -J % echo % - p | dc
The result:
4
But much better way would be do just do it in 3 lines (get word count for file1, then file2 and then subtract)

UNIX (AIX) Command Help - Sed & Awk

I'm running this on an AIX 6.1.
The intended purpose of this command is to display the following information in the following format:
GetUsedRAM:GetUsedSwap:CPU_0_System:CPU_0_User:…CPU_N_System:CPU_N_User
The command is composed of several sub commands:
echo `vmstat 1 2 | tr -s ' ' ':' | cut -d':' -f4,5,14-15 | tail -1 | sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/'``mpstat -a 1 1 | tr -s ' ' '|' | head -8 | tail -4 | cut -d'|' -f 25,27 | awk -F "|" '{printf "%.0f:%.0f:",$2,$1}' | sed '$s/.$//'| sed -e "s/ \{1,\}$//"| awk '{int a[10];split($1, a,":");printf("%d:%d:%d:%d:%d:%d:%d:%d",a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7])}'`
Which I'll re format for clarity:
echo \
`vmstat 1 2 |
tr -s ' ' ':' |
cut -d':' -f4,5,14-15 |
tail -1 |
sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/' \
` \
`mpstat -a 1 1 |
tr -s ' ' '|' |
head -8 |
tail -4 |
cut -d'|' -f 25,27 |
awk -F "|" '{printf "%.0f:%.0f:",$2,$1}' |
sed '$s/.$//' |
sed -e "s/ \{1,\}$//" |
awk '{int a[10];split($1, a,":");printf("%d:%d:%d:%d:%d:%d:%d:%d",a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7])}' \
`
I understand all of the tr, cut, head tail, and (roughly) vmstat/mpstat commands. The first sed is where I get lost, I've tried running the command in smaller segments and not quite sure why it seems to work as a whole but not when I truncate the command before the next tr.
I'm also not so sure on the awk command although I understand the premise vaguely, as a function allowing formatted output.
Similarly, I have a vague understanding of sed being a command allowing certain strings/characters being replaced in some file.
I'm not able to make out what this specific implementation in the above case is.
Could anyone provide some clarity or direction as to exactly what is happening at each sed and awk step within the context of the entire command?
Thanks for your help.
Simplification
This two simpler commands will get the exact same output:
# GetUsedRAM:GetUsedSwap:CPU_0_System:CPU_0_User:…CPU_N_System:CPU_N_User
# Select fields 4,5 of last line, and format with :
comm1=`vmstat 1 2 |
awk '$4~/[0-9]/{avm=$4;fre=$5} END{printf "%s:%s",avm,fre}'
`
# Select fields 27 (sy) and 25 (us) for four cpu, print as decimal.
comm2=`mpstat -A 1 1 |
awk -v firstline=6 -v cpus=4 '
BEGIN{start=firstline-1; end=firstline+cpus;}
NR>start && NR<end {printf( ":%d:%d", $27,$25)}'
`
echo "${comm1}${comm2}"
Description.
Description of original commands
The whole command is the concatenation of two commands.
The first command:
The output of the vmstat is shown in this link.
The columns 4 and 5 are 'avm' and 'fre'. The output in columns 14 and 15,
seem to be 'us' (user) and 'sy' (system). And I say seem as no output
from the user is available to confirm.
The first command
`vmstat 1 2 | # Execute the command vmstat.
tr -s ' ' ':' | # convert all spaces to colon (:).
cut -d':' -f4,5,14-15 | # select fields 4,5,14,and 15
tail -1 | # select last line.
sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/' \ # See below.
`
The sed command selects inside braces all digits [0-9]* before a colon
repeated twice. And then again (without the last colon). That's the whole
string in two parts: « (dd:dd:)(dd:dd) » (d means digit).
And finally, it replaces such whole string by what was selected inside
the first braces /\1/.
All this complexity just removes fields 14 and 15 as selected by cut.
A simpler command with exactly the same output is:
Select fields 4,5 of last line, and format with (:).
`vmstat 1 2 | awk '
$4~/[0-9]/{avm=$4;fre=$5} END{printf "%s:%s:",avm,fre}'
`
The second command:
The output of mpstat -A is similar to this one from Linux.
And also similar to this AIX mpstat -d output.
However, the exact output of AIX 6.1 for mpstat -a (ALL) on the computer
used could have several variations. Anyway, guided by the intended final
output desired: CPU_0_System:CPU_0_User:…CPU_N_System:CPU_N_User.
It seems that the columns to be selected should be us (user) and sy
(sys) percent of time that used the cpu for all cpu in use,
which seem to be four on the computer measured.
The manual for AIX 6.1 mpstat is here.
It has a list of all the 40 columns that are presented when the option
-a ALL is used:
CPU min maj mpcs mpcr dev soft dec ph cs ics bound rq push
S3pull S3grd S0rd S1rd S2rd S3rd S4rd S5rd S3hrd S4hrd S5hrd
sysc us sy wa id pc %ec ilcs vlcs lcs %idon %bdon %istol %bstol %nsp
us and sy are listed as the fields 27 and 28, however the command presented
by the user selects fields number 25 and 27. Close but not the same. The
only way to confirm would be to receive the output of the command from the user.
For testing I will be using the output of mpstat 5 1 from here.
# mpstat 5 1
System configuration: lcpu=4 ent=1.0 mode=Uncapped
cpu min maj mpc int cs ics rq mig lpa sysc us sy wt id pc %ec lcs
0 4940 0 1 632 685 268 0 320 100 263924 42 55 0 4 0.57 35.1 277
1 990 0 3 1387 2234 805 0 684 100 130290 28 47 0 25 0.27 16.6 649
2 3943 0 2 531 663 223 0 389 100 276520 44 54 0 3 0.57 34.9 270
3 1298 0 2 1856 2742 846 0 752 100 82141 31 40 0 29 0.22 13.4 650
ALL 11171 0 8 4406 6324 2142 0 2145 100 752875 39 51 0 10 1.63 163.1 1846
The second command
`mpstat -A 1 1 | # execute command
tr -s ' ' '|' | # replace all spaces with (|).
head -8 | # select 8 first lines.
tail -4 | # select last four lines.
cut -d'|' -f 25,27 | # select fields 25 and 27
awk -F "|" '{printf "%.0f:%.0f:",$2,$1}' | # print the fields as integers.
sed '$s/.$//' | # on the last line ($), substitute the last character (.$) by nothing.
sed -e "s/ \{1,\}$//" | # remove trailing space(s).
awk '{
int a[10];
split($1, a,":");
printf("%d:%d:%d:%d:%d:%d:%d:%d",a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7])
}' \
`
About the int: For older versions of awk, calling a function without the parentheses is equivalent to call the function on $0. int is equivalent to int($0), which is not printed, nor used. The same happens to the value of a[10].
The split sets each value of the command in a[i]. Then, all values of a[i] are printed as decimals.
The equivalent, and way simpler is:
Command #2
`mpstat -A 1 1 |
awk -v firstline=6 -v cpus=4 '
BEGIN{start=firstline-1; end=firstline+cpus;}
NR>start && NR<end {printf( ":%d:%d", $27,$25)}'
`

How to find count of a particular word in Different Files in Unix

How Do i Find Count of a particular word in Different Files in Unix:
I have: 50 file in a Directory (abc.txt, abc.txt.1,abc.txt.2, etc)
What I want: To Find number of instances of word 'Hello' in each file.
What I have used is grep -c Hello abc* | grep -v :0
It gave me result in Form of,
<<File name>> : <<count>>
I want Output to be in a form
<<Date>> <<File_Name>> <<Number of Instances of word Hello in the file>>
1-1-2001 abc.txt 23
1-1-2014 abc.txt.19 57
2-5-2015 abc.txt.49 16
You can use gnu awk >=4.0 (due to ENDFILE) to get the number.
If we know where the data comes from, I will add it.
awk '{for (i=1;i<=NF;i++) if ($i~/Hello/) a++} ENDFILE {print FILENAME,a;a=0}' abc.txt*
### Sample code for you to tweak for your needs:
touch test.txt
echo "ravi chandran marappan 30" > test.txt
echo "ramesh kumar marappan 24" >> test.txt
echo "ram lakshman marappan 22" >> test.txt
sed -e 's/ /\n/g' test.txt | sort | uniq | awk '{print "echo """,$1,
"""`grep -wc ",$1," test.txt`"}' | sh
Results:
22 -1
24 -1
30 -1
chandran -1
kumar -1
lakshman -1
marappan -3
ram -1
ramesh -1
ravi -1`

Resources