Format output of concatenating 2 variables in unix - unix

I am writing a simple shell script that checks the space used on a target path and the space utilization per directory under that path (for example, it checks the space of /path1/home and also how each folder under /path1/home contributes to the total). My question is about the output it produces: it is not pleasing to the eye because of the uneven spacing. See the sample output lines below.
SIZE USER_FOLDER DATE_LAST_MODIFIED
83G FOLDER 1 Apr 15 03:45
34G FOLDER 10 Mar 9 05:02
26G FOLDER 11 Mar 29 13:01
8.2G FOLDER 100 Apr 1 09:42
1.8G FOLDER 101 Apr 11 13:50
1.3G FOLDER 110 Feb 16 09:30
I just want the output format to be in line with the header so it will look neat because I will use it as a report. Here is the code I am using for this part.
ls -1 | grep -v "lost+found" |grep -v "email_body.tmp" > $v_path/Users.tmp
for user in `cat $v_path/Users.tmp | grep -v "Users.tmp"`
do
folder_size=`du -sh $user 2>/dev/null` # should be run using a more privileged user so that other folders can be read (2>/dev/null was used to discard error messages i.e. "du: cannot read directory `./marcnad/.gnupg': Permission denied")
folder_date=`ls -ltr | tr -s " " | cut -f6,7,8,9, -d" " | grep -w $user | cut -f1,2,3, -d" "`
folder_size="$folder_size $folder_date"
echo $folder_size >> $v_path/Users_Usage.tmp
done
echo "Summary of $v_path Disk Space Utilization per folder." >> email_body.tmp
echo "" >> email_body.tmp
echo "SIZE USER_FOLDER DATE_LAST_MODIFIED" >> email_body.tmp
for i in T G M K
do
cat $v_path/Users_Usage.tmp | grep [0-9]$i | sort -nr -k 1 >> $v_path/email_body.tmp
done
Thanks!
EDIT: Formatting

When you print the data, use printf instead of echo:
cat $v_path/Users_Usage.tmp | while read a b c d e f
do
printf '%-5s%-7s%-4s%-4s%-3s%-6s\n' "$a" "$b" "$c" "$d" "$e" "$f"
done
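Alternatively, if the column utility is available on your system (an assumption; it is not part of the original answer), you can let it compute the widths from the data instead of hard-coding them:
# column -t pads every field to the widest entry in that column
column -t $v_path/Users_Usage.tmp
The header line would need the same treatment (or the same printf format) so that it stays aligned with the data.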

Related

How to list the quantity of files per subdir on Unix

I have 60 subdirs in a directory, example name of the directory: test/queues.
The subdirs:
test/queues/subdir1
test/queues/subdir2
test/queues/subdir3
(...)
test/queues/subdir60
I want a command that gives me the output of the number of files in each subdirectory, listed separately, example:
test/queues/subdir1 - 45 files
test/queues/subdir2 - 76 files
test/queues/subdir3 - 950 files
(...)
test/queues/subdir60 - 213 files
Through my research, I only got the command ls -lat test/queues/* | wc -l, but this command gives the total number of files across all of these subdirs. For example, it returns only 4587, which is the total number of files in all 60 subdirs. I want the output to list the quantity of files in each folder separately.
How can I do that ?
Use a loop to count the lines for every subdirectory individually:
for d in test/queues/*/
do
echo "$d" - $(ls -lat "$d" | wc -l)
done
Note that the output of ls -lat some_directory will contain a few additional lines like
total 123
drwxr-xr-x 1 user group 0 Feb 26 09:51 ../
drwxr-xr-x 1 user group 0 Jan 25 12:35 ./
If your ls command supports these options you can use
for d in test/queues/*/
do
echo "$d" - $(ls -A1 "$d" | wc -l)
done
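If you want the exact "N files" wording from the question, a small variation of the same loop (a sketch under the same assumption that your ls supports -A1) would be:
for d in test/queues/*/
do
# ${d%/} strips the trailing slash for display
echo "${d%/} - $(ls -A1 "$d" | wc -l) files"
done
On systems where wc -l pads its output with spaces, you can add | tr -d ' ' after it.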
You can apply ls | wc -l in a loop to all subdirs
for x in *; do echo "$x => $(ls "$x" | wc -l)"; done
If you want to restrict the output to directories that are one level deep and you only want a count of regular files, you could do:
find . -type d -maxdepth 1 -exec sh -c '
printf "%s\t" "$0"; find "$0" -type f -maxdepth 1 | wc -l' {} \; \
| column -t
You can get the "name - %d files" format with:
find . -type d -maxdepth 1 -exec sh -c '
printf "%s - %d files\n" "$0" \
"$(find "$0" -type f -maxdepth 1 | wc -l)"' {} \;
Using find and awk:
find test/queues -maxdepth 2 -mindepth 2 -printf "%h\n" | awk '{ map[$0]++ } END { for (i in map) { print i" - "map[i]} }'
Use -maxdepth and -mindepth to ensure that we only search the directory structure one level down. Print only the leading directories through printf "%h". Pipe the output into awk and build a map array, indexed by directory, that is incremented for every entry. At the end, loop through the map array, printing the directories and the counts.
On a Unix where find has no -printf option, use -exec dirname instead:
find test/queues -maxdepth 2 -mindepth 2 -exec dirname {} \; | awk '{ map[$0]++ } END { for (i in map) { print i" - "map[i]} }'
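A similar sketch that avoids awk, relying only on sort and uniq -c, if the "count name" column order is acceptable (my variation, not from the original answers):
# same idea: count every entry one level below each queue directory
find test/queues -maxdepth 2 -mindepth 2 -exec dirname {} \; | sort | uniq -c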

UNIX (AIX) Command Help - Sed & Awk

I'm running this on an AIX 6.1.
The intended purpose of this command is to display the following information in the following format:
GetUsedRAM:GetUsedSwap:CPU_0_System:CPU_0_User:…CPU_N_System:CPU_N_User
The command is composed of several sub commands:
echo `vmstat 1 2 | tr -s ' ' ':' | cut -d':' -f4,5,14-15 | tail -1 | sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/'``mpstat -a 1 1 | tr -s ' ' '|' | head -8 | tail -4 | cut -d'|' -f 25,27 | awk -F "|" '{printf "%.0f:%.0f:",$2,$1}' | sed '$s/.$//'| sed -e "s/ \{1,\}$//"| awk '{int a[10];split($1, a,":");printf("%d:%d:%d:%d:%d:%d:%d:%d",a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7])}'`
Which I'll re format for clarity:
echo \
`vmstat 1 2 |
tr -s ' ' ':' |
cut -d':' -f4,5,14-15 |
tail -1 |
sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/' \
` \
`mpstat -a 1 1 |
tr -s ' ' '|' |
head -8 |
tail -4 |
cut -d'|' -f 25,27 |
awk -F "|" '{printf "%.0f:%.0f:",$2,$1}' |
sed '$s/.$//' |
sed -e "s/ \{1,\}$//" |
awk '{int a[10];split($1, a,":");printf("%d:%d:%d:%d:%d:%d:%d:%d",a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7])}' \
`
I understand all of the tr, cut, head, tail, and (roughly) the vmstat/mpstat commands. The first sed is where I get lost; I've tried running the command in smaller segments and I'm not quite sure why it seems to work as a whole but not when I truncate it before the next tr.
I'm also not so sure about the awk command, although I vaguely understand the premise: a tool that allows formatted output.
Similarly, I have a vague understanding of sed as a command that allows certain strings/characters to be replaced in some input.
I'm not able to make out what these specific invocations do in the above case.
Could anyone provide some clarity or direction as to exactly what is happening at each sed and awk step within the context of the entire command?
Thanks for your help.
Simplification
These two simpler commands will produce the exact same output:
# GetUsedRAM:GetUsedSwap:CPU_0_System:CPU_0_User:…CPU_N_System:CPU_N_User
# Select fields 4,5 of last line, and format with :
comm1=`vmstat 1 2 |
awk '$4~/[0-9]/{avm=$4;fre=$5} END{printf "%s:%s",avm,fre}'
`
# Select fields 27 (sy) and 25 (us) for four cpu, print as decimal.
comm2=`mpstat -A 1 1 |
awk -v firstline=6 -v cpus=4 '
BEGIN{start=firstline-1; end=firstline+cpus;}
NR>start && NR<end {printf( ":%d:%d", $27,$25)}'
`
echo "${comm1}${comm2}"
Description of the original commands
The whole command is the concatenation of two commands.
The first command:
The output of vmstat is shown in this link.
Columns 4 and 5 are 'avm' and 'fre'. Columns 14 and 15
seem to be 'us' (user) and 'sy' (system); I say 'seem' because no output
from the user is available to confirm it.
The first command
`vmstat 1 2 | # Execute the command vmstat.
tr -s ' ' ':' | # convert all spaces to colon (:).
cut -d':' -f4,5,14-15 | # select fields 4,5,14,and 15
tail -1 | # select last line.
sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/' \ # See below.
`
The sed command captures, in the first group, a run of digits followed by a colon,
twice: \([0-9]*:[0-9]*:\). It then captures the same pattern again, without the
trailing colon, in the second group. Together that is the whole
string in two parts: « (dd:dd:)(dd:dd) » (d means digit).
Finally, it replaces the whole match with what was captured by the first group, \1.
All this complexity just removes fields 14 and 15 as selected by cut.
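To see the effect of that sed expression in isolation, here is a quick test with made-up numbers (not taken from the question):
echo "42:55:39:51" | sed 's/\([0-9]*:[0-9]*:\)\([0-9]*:[0-9]*\)/\1/'
# prints: 42:55: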
A simpler command with exactly the same output is:
Select fields 4,5 of last line, and format with (:).
`vmstat 1 2 | awk '
$4~/[0-9]/{avm=$4;fre=$5} END{printf "%s:%s:",avm,fre}'
`
The second command:
The output of mpstat -A is similar to this one from Linux,
and also to this AIX mpstat -d output.
However, the exact output of AIX 6.1 for mpstat -a (ALL) on the computer
used could have several variations. In any case, guided by the intended final
output, CPU_0_System:CPU_0_User:…CPU_N_System:CPU_N_User,
it seems that the columns to select should be us (user) and sy
(sys), the percentage of time each CPU spent in user and system mode,
for every CPU in use, which seems to be four on the computer measured.
The manual for AIX 6.1 mpstat is here.
It lists all 40 columns that are presented when the
-a (ALL) option is used:
CPU min maj mpcs mpcr dev soft dec ph cs ics bound rq push
S3pull S3grd S0rd S1rd S2rd S3rd S4rd S5rd S3hrd S4hrd S5hrd
sysc us sy wa id pc %ec ilcs vlcs lcs %idon %bdon %istol %bstol %nsp
us and sy are listed as fields 27 and 28; however, the command presented
by the user selects fields 25 and 27. Close, but not the same. The
only way to confirm would be to see the output of the command on the user's system.
For testing I will be using the output of mpstat 5 1 from here.
# mpstat 5 1
System configuration: lcpu=4 ent=1.0 mode=Uncapped
cpu min maj mpc int cs ics rq mig lpa sysc us sy wt id pc %ec lcs
0 4940 0 1 632 685 268 0 320 100 263924 42 55 0 4 0.57 35.1 277
1 990 0 3 1387 2234 805 0 684 100 130290 28 47 0 25 0.27 16.6 649
2 3943 0 2 531 663 223 0 389 100 276520 44 54 0 3 0.57 34.9 270
3 1298 0 2 1856 2742 846 0 752 100 82141 31 40 0 29 0.22 13.4 650
ALL 11171 0 8 4406 6324 2142 0 2145 100 752875 39 51 0 10 1.63 163.1 1846
The second command
`mpstat -A 1 1 | # execute command
tr -s ' ' '|' | # replace all spaces with (|).
head -8 | # select 8 first lines.
tail -4 | # select last four lines.
cut -d'|' -f 25,27 | # select fields 25 and 27
awk -F "|" '{printf "%.0f:%.0f:",$2,$1}' | # print the fields as integers.
sed '$s/.$//' | # on the last line ($), substitute the last character (.$) by nothing.
sed -e "s/ \{1,\}$//" | # remove trailing space(s).
awk '{
int a[10];
split($1, a,":");
printf("%d:%d:%d:%d:%d:%d:%d:%d",a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7])
}' \
`
About the int: for older versions of awk, calling a function without parentheses is equivalent to calling it on $0, so int is equivalent to int($0), which is neither printed nor used. The same goes for the value a[10].
The split puts each colon-separated value of the input into a[i]; then all the values of a[i] are printed as decimals.
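A minimal illustration of split with made-up input (note that awk fills a[1] through a[n], so a[0] stays empty):
echo "42:55:28:47" | awk '{split($0, a, ":"); printf "%d:%d:%d:%d\n", a[1], a[2], a[3], a[4]}'
# prints: 42:55:28:47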
The equivalent, and much simpler, version is:
Command #2
`mpstat -A 1 1 |
awk -v firstline=6 -v cpus=4 '
BEGIN{start=firstline-1; end=firstline+cpus;}
NR>start && NR<end {printf( ":%d:%d", $27,$25)}'
`

How to find count of a particular word in Different Files in Unix

How do I find the count of a particular word in different files in Unix?
I have: 50 files in a directory (abc.txt, abc.txt.1, abc.txt.2, etc.)
What I want: to find the number of instances of the word 'Hello' in each file.
What I have used is grep -c Hello abc* | grep -v :0
It gave me results in the form
<<File name>> : <<count>>
I want the output to be in the form
<<Date>> <<File_Name>> <<Number of instances of the word Hello in the file>>
1-1-2001 abc.txt 23
1-1-2014 abc.txt.19 57
2-5-2015 abc.txt.49 16
You can use GNU awk >= 4.0 (because of ENDFILE) to get the number.
If we know where the date is supposed to come from, I will add it.
awk '{for (i=1;i<=NF;i++) if ($i~/Hello/) a++} ENDFILE {print FILENAME,a;a=0}' abc.txt*
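If the date column is meant to be the file's last-modification date (an assumption; the question does not say where it comes from), a sketch using GNU date and GNU grep would be:
for f in abc.txt*; do
n=$(grep -o 'Hello' "$f" | wc -l)    # counts occurrences, not just matching lines
[ "$n" -gt 0 ] && printf '%s %s %s\n' "$(date -r "$f" +%d-%m-%Y)" "$f" "$n"
done
Like the grep -v :0 in the question, the test on $n skips files with no matches.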
Sample code for you to tweak for your needs:
touch test.txt
echo "ravi chandran marappan 30" > test.txt
echo "ramesh kumar marappan 24" >> test.txt
echo "ram lakshman marappan 22" >> test.txt
sed -e 's/ /\n/g' test.txt | sort | uniq | awk '{print "echo """,$1,
"""`grep -wc ",$1," test.txt`"}' | sh
Results:
22 -1
24 -1
30 -1
chandran -1
kumar -1
lakshman -1
marappan -3
ram -1
ramesh -1
ravi -1

Sort history on number of occurrences

Basically I want to print the 10 most used commands stored in the bash history, but they still have to be preceded by the number that indicates when they were used; I got this far:
history | cut -f 2 | cut -d ' ' -f 3,5 | sort -k 2 -n
That should sort on the second column, the number of occurrences of the command in that row... but it doesn't do that. I know I can add head -10 at the end of the pipe to take the top ten, but I'm kinda stuck on the sorting part.
The 10 most used commands stored in your history:
history | sed -e 's/ *[0-9][0-9]* *//' | sort | uniq -c | sort -rn | head -10
This gives you the most used command line entries by removing the history number (sed), counting (sort | uniq -c), sorting by frequency (sort -rn) and showing only the top ten entries.
If you just want the commands alone:
history | awk '{print $2;}' | sort | uniq -c | sort -rn | head -10
Both of these strip the history number; currently I have no idea how to preserve it in a one-liner.
If you want to find the top used commands in your history file, you will have to count the instances in your history. awk can be used to do this. In the following code, the awk segment will create a hashtable with commands as the key and the number of times they appear as the value. This is printed out with the last history number for that command and sorted:
history | cut -f 2 | cut -d ' ' -f 3,5 | awk '{a[$2]++;b[$2]=$1} END{for (i in a) {print b[i], i, a[i]}}' | sort -k3 -rn | head -n 10
Output looks like:
975 cd 142
972 vim 122
990 ls 118
686 hg 90
974 mvn 51
939 bash 39
978 tac 32
958 cat 28
765 echo 27
981 exit 17
If you don't want the last column you could pipe the output through cut -d' ' -f1,2.
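For example, a sketch that simply appends that cut to the pipeline above:
history | cut -f 2 | cut -d ' ' -f 3,5 | awk '{a[$2]++;b[$2]=$1} END{for (i in a) {print b[i], i, a[i]}}' | sort -k3 -rn | head -n 10 | cut -d' ' -f1,2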

How to properly grep filenames only from ls -al

How do I tell grep to only print out lines where the "filename" matches when I'm piping through ls? I want it to ignore everything on each line until after the timestamp. There must be some easy way to do this with a single command.
As you can see, without it, if I searched for the file "rwx", it would return not only the line with rwx.c, but also the first three lines because of permissions. I was going to use AWK but I want it to display the whole last line if I search for "rwx".
Any ideas?
EDIT: Thanks for the hacks below. However, it would be great to have a more bug-free method. For example, if I had a file named "rob rob", I wouldn't be able to use the stated solutions.
drwxrwxr-x 2 rob rob 4096 2012-03-04 18:03 .
drwxrwxr-x 4 rob rob 4096 2012-03-04 12:38 ..
-rwxrwxr-x 1 rob rob 13783 2012-03-04 18:03 a.out
-rw-rw-r-- 1 rob rob 4294 2012-03-04 18:02 function1.c
-rw-rw-r-- 1 rob rob 273 2012-03-04 12:54 function1.c~
-rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rwx.c
-rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rob rob
The following will list only file names, one file per row.
$ ls -1
To include . files
$ ls -1a
Please note that the argument is number "1", not letter "l".
Why don't you use grep and match the file name following the timestamp?
grep -P "[0-9]{2}:[0-9]{2} $FILENAME(\.[a-zA-Z0-9]+)?$"
The [0-9]{2}:[0-9]{2} is for the time, the $FILENAME is where you'd put rob rob or rwx, and the trailing (\.[a-zA-Z0-9]+)? is to allow for an optional extension.
Edit: @JonathanLeffler points out below that when files are older than about 6 months the time column gets replaced by a year; this is what happens on my computer anyhow. You could do ([0-9]{2}:[0-9]{2}|(19|20)[0-9]{2}) to allow time OR year, but you may be best off using awk (?).
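For instance, the combined pattern would look something like this (a sketch only; not verified against every ls date format):
ls -al | grep -P "([0-9]{2}:[0-9]{2}|(19|20)[0-9]{2}) $FILENAME(\.[a-zA-Z0-9]+)?$"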
[foo#bar ~/tmp]$ls -al
total 8
drwxrwxr-x 2 foo foo 4096 Mar 5 09:30 .
drwxr-xr-- 83 foo foo 4096 Mar 5 09:30 ..
-rw-rw-r-- 1 foo foo 0 Mar 5 09:30 foo foo
-rw-rw-r-- 1 foo foo 0 Mar 5 09:29 rwx.c
-rw-rw-r-- 1 foo foo 0 Mar 5 09:29 tmp
[foo#bar ~/tmp]$export filename='foo foo'
[foo#bar ~/tmp]$echo $filename
foo foo
[foo#bar ~/tmp]$ls -al | grep -P "[0-9]{2}:[0-9]{2} $filename(\.[a-zA-Z0-9]+)?$"
-rw-rw-r-- 1 cha66i cha66i 0 Mar 5 09:30 foo foo
(You could additionally extend to matching the whole line if you wanted:
^ # start of line
[d-]([r-][w-][x-]){3} + # permissions & space (note: is there a 't' or 's'
# sometimes where the 'd' can be??)
[0-9]+ # whatever that number is
[\w-]+ [\w-]+ + # user/group (are spaces allowed in these?)
[0-9]+ + # file size (modify for -h switch??)
(19|20)[0-9]{2}- # yyyy (modify if you want to allow <1900)
(1[012]|0[1-9])- # mm
(0[1-9]|[12][0-9]|3[012]) + # dd
([01][0-9]|2[0-3]):[0-6][0-9] +# HH:MM (24hr)
$filename(\.[a-zA-Z0-9]+)? # filename & optional extension
$ # end of line
You get the point; tailor to your needs.)
Assuming that you aren't prepared to do:
ls -ld $(ls -a | grep rwx)
then you need to exploit the fact that there are 8 columns with space separation before the file name starts. Using egrep (or grep -E), you could do:
ls -al | egrep "^([^ ]+ +){8}.*rwx"
This looks for 'rwx' after the 8th column. If you want the name to start with rwx, omit the .*. If you want the name to end with rwx, add a $ at the end. Note that I used double quotes so you could interpolate a variable in place of the literal rwx.
This was tested on Mac OS X 10.7.3; the ls -l command consistently gives three columns for the date field:
-r--r--r-- 1 jleffler staff 6510 Mar 17 2003 README,v
-r--r--r-- 1 jleffler staff 26676 Mar 3 21:44 ccs.nmd
Your ls -l seems to be giving just two columns, so you'd need to change the {8} to {7} for your machine - and beware migrating between systems.
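Concretely, the two-date-column variant mentioned above would be the same command with only the repeat count changed:
ls -al | egrep "^([^ ]+ +){7}.*rwx"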
Well, if you're working with filenames that don't have spaces in them, you could do something like this:
grep 'rwx\S*$'
Aside from the fact that you can use pattern matching with ls (for example in ksh and bash),
which is probably what you should do, you can use the fact that filenames occur at a
fixed position. awk (gawk, nawk or whatever you have) is a better choice for this.
If you have to use grep it smells like homework to me. Please tag it that way.
Assume, based on this output from ls -l on Linux, that the filename starting position is 56:
-rwxr-xr-x 1 Administrators None 2052 Feb 28 20:29 vote2012.txt
ls -l | awk ' substr($0,56) ~/your pattern even with spaces goes here/'
e.g.,
ls -l | awk ' substr($0,56) ~/^val/'
will find files starting with "val"
As a simple hack, just add a space before your filename so you don't match the beginning of the output:
ls -al | grep '\srwx'
Edit: OK, this is not as robust as it should be. Here's awk:
ls -l | awk ' $9 ~ /rwx/ { print $0 }'
This works for me, unlike ls -l and the others, as some folks pointed out. I like this because it's really generic and gives me the base file name, which removes the path names before the file.
ls -1 /path_name |awk -F/ '{print $NF}'
You only need one command for this:
ls -al | gawk '{print $9}'
You can use this:
ls -p | grep -v /
This is super old, but I needed the answer and had a hard time finding it. I didn't really care about the one-liner part; I just needed it done. This is down and dirty and requires that you count the columns. I'm not looking for an upvote here, just leaving some options for future searchers.
The helpful awk trick is here: Using awk to print all columns from the nth to the last.
if
YOUR_FILENAME="rob rob"
and
WHERE_FILENAMES_START=8
ls -al | while read x; do
# pass the shell variable into awk with -v; it would not expand inside the single-quoted awk program
y=$(echo "$x" | awk -v start="$WHERE_FILENAMES_START" '{for(i=start; i<=NF; ++i) printf $i""FS; print ""}')
[[ "$YOUR_FILENAME " = "$y" ]] && echo "$x"
done
If you save it as a bash script, swap out the vars with $2 and $1, and throw the script in your usr bin... then you'll have your clean, simple one-liner ;)
Output will be:
> -rw-rw-r-- 1 rob rob 16 2012-03-04 18:02 rob rob
The question was for a one-liner, so...
ls -al | while read x; do [[ "$YOUR_FILENAME " = "$(echo "$x" | awk -v start="$WHERE_FILENAMES_START" '{for(i=start; i<=NF; ++i) printf $i""FS; print ""}')" ]] && echo "$x" ; done
(lol ;P)
On another note: mathematical.coffee, your answer was rad. It didn't solve my version of this problem, so I didn't upvote, but I liked your regex breakdown :D

Resources