Unix find average file size - unix

I have a directory with a ton of files, and I want to find the average size of those files. Something like ls somethinghere: what is the average file size of everything that matches?

I found something here:
http://vivekjain10.blogspot.com/2008/02/average-file-size-within-directory.html
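To calculate the average file size within a directory on a Linux system, the following command can be used (NR > 1 skips the "total" line that ls -l prints first, which would otherwise be counted as a file):
ls -l | gawk 'NR > 1 {sum += $5; n++} END {print sum/n}'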

A short, general and recursion-friendly variation of Ernstsson's answer:
find ./ -ls | awk '{sum += $7; n++;} END {print sum/n;}'
Or, for example, if you want to prevent files above 100 KB from skewing the average:
find ./ -size -100000c -ls | awk '{sum += $7; n++;} END {print sum/n;}'

Use wc -c * to get the size of all the files and ls | wc -l to get the number of files. Then just divide one by the other.
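A rough sketch of that divide-the-total-by-the-count idea, assuming only the regular files directly inside the directory should count (a plain wc -c * would complain about subdirectories, and * skips dotfiles). The function name avg_size_wc is made up for illustration.

```shell
# Hypothetical helper: average size in bytes of the regular files in one directory.
avg_size_wc() {
    dir=${1:-.}
    total=0
    count=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue        # skip directories and other non-regular entries
        size=$(wc -c < "$f")           # byte count of this file
        total=$((total + size))
        count=$((count + 1))
    done
    if [ "$count" -eq 0 ]; then
        echo "no regular files in $dir" >&2
        return 1
    fi
    echo $((total / count))            # integer division, remainder dropped
}
```

Calling avg_size_wc /some/dir prints a single integer; with no argument it looks at the current directory.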

This works portably, even on AIX.
Outputs average number of bytes for plain files in the specified directory (${directory} in the example below):
find "${directory}" '!' -path "${directory}" -prune -type f -ls | awk '{s+=$7} END {printf "%.0f\n", s/NR}'
No need to count the number of files yourself: NR is an awk builtin holding the number of records (input lines) read so far.
The '!' -path ${directory} -prune part is a portable way to achieve the equivalent of GNU find -maxdepth 1 by pruning any path that is not the same as the one we start at, thereby ignoring any subdirectories.
Adjust with restrictions on what files to count. For instance, to average all files except *.sh in the current directory, you could add '!' -name '*.sh':
find . '!' -path . -prune -type f '!' -name '*.sh' -ls | awk '{s+=$7} END {printf "%.0f\n", s/NR}'
or to count only *.mp3 and include all subdirectories (remove '!' -path . -prune):
find . -type f -name '*.mp3' -ls | awk '{s+=$7} END {printf "%.0f\n", s/NR}'
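One thing these pipelines do not handle is the case where nothing matches: NR is then 0 and awk fails on the division. A minimal sketch of a guarded version, assuming a find that supports -ls with the size in column 7 (as the commands above do) and that recursing into subdirectories is wanted. The name avg_size_find is hypothetical.

```shell
# Hypothetical wrapper around the find | awk pipeline, with an empty-input guard.
avg_size_find() {
    find "${1:-.}" -type f -ls |
        awk '{ s += $7 }    # column 7 of find -ls is the size in bytes
             END {
                 if (NR == 0) exit 1       # nothing matched: avoid dividing by zero
                 printf "%.0f\n", s / NR
             }'
}
```

The function returns a non-zero status when no files are found, so a caller can tell "no files" apart from "average is 0".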

du -sh . # gives the total space used by the directory
find . -type f | wc -l # count the number of files
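Divide the first by the second (for the arithmetic, use a plain number such as the output of du -sk rather than the human-readable du -sh).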
If you want a one liner, here it is:
echo $(( `du -sb | tr '.' ' '` / `find . -type f | wc -l` ))

Two common tasks are finding the size of a directory and finding the amount of free disk space on your machine. The command for the directory size is 'du', and for the free disk space you can use 'df'.
All the information present in this article is available in the man pages for du and df. In case you get bored reading the man pages and you want to get your work done quickly, then this article is for you.
'du' - Finding the size of a directory
$ du
Typing the above at the prompt gives you a list of directories that exist in the current directory along with their sizes. The last line of the output gives you the total size of the current directory including its subdirectories. The size given includes the sizes of the files and the directories that exist in the current directory as well as all of its subdirectories. Note that by default the sizes given are in kilobytes.
$ du /home/david
The above command would give you the directory size of the directory /home/david
$ du -h
This command gives you more readable output than the default. The option '-h' stands for human-readable format, so the sizes of the files and directories are suffixed with 'K' for kilobytes, 'M' for megabytes and 'G' for gigabytes.
$ du -ah
This command would display in its output, not only the directories but also all the files that are present in the current directory. Note that 'du' always counts all files and directories while giving the final size in the last line. But the '-a' displays the filenames along with the directory names in the output. '-h' is once again human readable format.
$ du -c
This gives you a grand total as the last line of the output. So if your directory occupies 30MB the last 2 lines of the output would be
30M .
30M total
The first line would be the default last line of the 'du' output indicating the total size of the directory, and the second displays the same size followed by the string 'total'. This is helpful if you use this command along with the grep command to display only the final total size of a directory, as shown below.
$ du -ch | grep total
This would have only one line in its output that displays the total size of the current directory including all the subdirectories.
Note : In case you are not familiar with pipes (which makes the above command possible) refer to Article No. 24 . Also grep is one of the most important commands in Unix. Refer to Article No. 25 to know more about grep.
$ du -s
This displays a summary of the directory size. It is the simplest way to know the total size of the current directory.
$ du -S
This would display the size of the current directory excluding the size of the subdirectories that exist within that directory. So it basically shows you the total size of all the files that exist in the current directory.
$ du --exclude='*.mp3'
The above command would display the size of the current directory along with all its subdirectories, but it would exclude all files whose names match the given pattern. Thus in the above case, if there happen to be any mp3 files within the current directory or any of its subdirectories, their size would not be included while calculating the total directory size.
'df' - finding the disk free space / disk usage
$ df
Typing the above, outputs a table consisting of 6 columns. All the columns are very easy to understand. Remember that the 'Size', 'Used' and 'Avail' columns use kilobytes as the unit. The 'Use%' column shows the usage as a percentage which is also very useful.
$ df -h
Displays the same output as the previous command but the '-h' indicates human readable format. Hence instead of kilobytes as the unit the output would have 'M' for Megabytes and 'G' for Gigabytes.
Most of the users don't use the other parameters that can be passed to 'df'. So I shall not be discussing them.
I shall in turn show you an example that I use on my machine. I have actually stored this as a script named 'usage' since I use it often.
Example :
I have my Linux installed on /dev/hda1 and I have mounted my Windows partitions as well (by default every time Linux boots). So 'df' by default shows me the disk usage of my Linux as well as Windows partitions. And I am only interested in the disk usage of the Linux partitions. This is what I use :
$ df -h | grep /dev/hda1 | cut -c 41-43
This command displays the following on my machine
45%
Basically this command makes 'df' display the disk usages of all the partitions and then extracts the lines with /dev/hda1 since I am only interested in that. Then it cuts the characters from the 41st to the 43rd column since they are the columns that display the usage in % , which is what I want.
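Cutting by character position depends on how wide the device names happen to be, so the columns can shift from one machine to the next. A minimal sketch of a field-based alternative, assuming a df that supports the POSIX -P option and a filesystem name without spaces. The name usage_pct is hypothetical.

```shell
# Hypothetical helper: print the Use% figure for the filesystem holding a path.
usage_pct() {
    # -P forces the one-line-per-filesystem POSIX layout; line 2 is the data row
    # and field 5 is the capacity column.
    df -P "${1:-/}" | awk 'NR == 2 { print $5 }'
}
```

Passing a path (for example usage_pct /home) reports on whichever filesystem holds that path, so no grep on the device name is needed.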
There are a few more options that can be used with 'du' and 'df' . You could find them in the man pages.

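In addition to @cnst's answer,
if you need to exclude folders from the calculation, use
find ./ -size +4096c -ls | awk '{sum += $7; n++;} END {print sum/n;}'
Note that this is a size filter, so it also drops regular files of 4096 bytes or less; find ./ -type f -ls is the more precise way to skip directories.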

Use du to estimate file space usage for a given directory.
du -sh /Your/Path # Total size in human-readable format (a total, not an average)
-s (--summarize) display only a total for each argument.
-h (--human-readable) print sizes in human readable format (e.g. 1K, 234M, 2G).
Note that without -h, sizes are reported in the default block size (512-byte blocks under POSIX; GNU du defaults to 1024-byte blocks).
If you wish to specify the block size you can use -k (kilobytes) or -m (megabytes); some implementations, such as BSD du, also accept -g (gigabytes).
du -sk /Your/Path # Total size in kilobytes; divide by the number of files to get an average.
Footnote: Passing a file path gives that file's size.

Related

How to get the folder size in BYTES or the smallest unit possible in SOLARIS

Is there any script or command to get the FOLDER size in BYTES or BITS so that every small change in the files in the folder is reflected by checking the folder size in SOLARIS?
The directory size doesn't change when you add a few bytes to files. Files are allocated in fragments / blocks.
Should you want the cumulative size of all files in a directory, you have to compute it yourself. See https://superuser.com/a/603302/19279
Note that this size doesn't represent what the files are using, which is usually larger but can also be smaller depending on various factors.
Edit:
Here is a simplified solution giving the size in bytes:
#!/bin/sh
find "${1:-.}" -type f -exec ls -lnq {} + | awk '{sum+=$5} END{print sum}'
du -sk foldername is a popular choice. Just multiply the result by 1024 to get bytes (this reflects the disk blocks used, not the exact sum of the file lengths).

Efficient way of getting listing of files in large filesystem

What is the most efficient way to get a "ls"-like output of the most recently created files in a very large unix file system (100 thousand files +)?
Have tried ls -a and some other variants.
You can also use less to search and scroll it easily.
ls -la | less
If I'm understanding your question correctly try
ls -a | tail
If the files are in a single directory, then you can use:
ls -lt | less
the -t option to ls will sort the files by modification time and less will let you scroll through them
If you want recent files across an entire file system (i.e., in different directories), then you can use the find command:
find dir -mtime -1 -print | xargs ls -ld
Substitute the directory where you want to start the search for "dir". The find command will print the names of all of the files that have been modified in the last day (-mtime -1 means modified less than one day ago; a bare -mtime 1 would instead match files modified between one and two days ago), and the xargs command will take that list of files and feed it to ls, giving you the ls-like output you want.
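A minimal sketch of the same idea that also copes with spaces in file names, assuming "recent" means modified within the last 24 hours (-mtime -1) and that only regular files are of interest. The function name recent_files is hypothetical.

```shell
# Hypothetical helper: long listing of regular files modified within the last 24 hours.
recent_files() {
    # -mtime -1 selects files whose age in whole days is less than 1;
    # -exec ... {} + batches names like xargs does, but is safe with spaces in names.
    find "${1:-.}" -type f -mtime -1 -exec ls -ld {} +
}
```

On a very large tree the output can still be long, so piping it through less, as suggested above, remains useful: recent_files /some/dir | less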

How to count and display the number of all files in my account's space with the names starting with g, t and w? UNIX

Hey guys. I would like to set up an alias in my environment file which counts and displays the number of all files in my account space with names starting with g, t and w. So far I came up with something like this:
alias countGTW="find . \( -name 'g*' -o -name 't*' -o -name 'w*' \) | wc -l"
However it only counts those within the subdirectories of a current working directory. What I want is that it counts them in my WHOLE account's space. I'm using Korn shell. Hope I explained my problem well enough. Any ideas?
If by your whole account you mean everything in or below your home directory, replace the . with $HOME.
Also, you can simplify the find considerably since the -name predicate understands globbable wildcards:
find $HOME -name '[gtw]*' | wc -l
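A shell function is often easier to get right than an alias here, because nothing needs an extra layer of quoting. A minimal sketch, assuming only regular files should be counted (the -type f test is an addition, not part of the answer above) and that no file name contains a newline. The name count_gtw is hypothetical.

```shell
# Hypothetical function in place of the alias: counts regular files whose
# names start with g, t or w under a directory (default: $HOME).
count_gtw() {
    find "${1:-$HOME}" -type f -name '[gtw]*' | wc -l
}
```

Put the definition in the same environment file where the alias would have gone; count_gtw with no argument then searches the whole home directory from anywhere.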

How to count number of lines in the files which created today

Well, I'm trying to list the files created today and count the number of lines in those files. I have to do it in Unix. Please suggest how to write a script for this.
To find the number of lines:
find / -type f -ctime 0 -mtime 0 -print0 | xargs -0 wc -l
This is almost what you want. There is no file creation time in Unix; this is an approximation using both the status-change time and the modification time.
If you would like to search only in certain directory, replace / with /path/to/your/dir.
To find the number of files:
find / -type f -ctime 0 -mtime 0 | wc -l
This will find files (-type f) in /path modified in the last 24 hours (-mtime -1 means modified in the last 1 day) and run wc -l to count the number of lines. {} is a placeholder for the file names and + means pass all the file names to a single invocation of wc.
find /path -mtime -1 -type f -exec wc -l {} +
Note that -ctime as suggested in other answers is change time, not creation time. It is the last time a file's owner, group, link count, mode, etc., was changed. Unix does not track the creation time of a file.
find . -maxdepth 1 -daystart -ctime 0 -type f | xargs wc -l
You'll need to change the maxdepth argument value if you need to look deeper.
To count the number of files changed today:
find . -daystart -type f -ctime -1 | wc -l
find finds all the files (-type f) in the current directory (.) created* (-ctime) more recently (-) than one (1) day since the start of this day (-daystart). wc counts the number of lines (-l) in find's output.
To count the lines in those files:
find -daystart -type f -ctime -1 -print0 | wc -l --files0-from=-
The first part is the same, except that find separates the filenames using nulls (-print0). wc counts the lines (-l) in the null-separated files (--files0-from=) on its standard input (-).
* ctime is not actually the creation time, but the time when the file's status was last changed. I don't think the filesystem holds on to the actual creation time.
Determining reliably when a file was created is hard. The mtime is when it was modified, the ctime is when the inode data was changed (a change of permissions, for example), and the atime is when the file data was last accessed. Usually, the mtime is a surrogate for the creation time: when a file is created, the mtime records the creation time (as do the ctime and atime), but if the file is subsequently modified, the mtime records the time when the contents of the file were last modified.
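To see the three timestamps side by side for one file, ls can be asked for each in turn. A small sketch using the standard -c and -u options of ls; the function name show_times is made up for illustration.

```shell
# Hypothetical helper: show the three timestamps ls can report for a file.
show_times() {
    printf 'mtime: '; ls -ld  "$1"    # default: last modification of the contents
    printf 'ctime: '; ls -ldc "$1"    # -c: last inode (status) change
    printf 'atime: '; ls -ldu "$1"    # -u: last access
}
```

Running chmod on a file and calling show_times again should move only the ctime line, which is one way to convince yourself that ctime is not a creation time.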
find . -mtime -1 -print0 | xargs -0 wc -l
Find all the files under the current directory with a modification time less than 24 hours old and send the names to 'wc -l' - allowing for spaces and other odd characters in the file names.
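When the list of files is long, xargs may split it across several wc invocations, each printing its own "total" line. If a single grand total is all that is needed, one option is to concatenate the files and count once. A minimal sketch, assuming only regular files are wanted; the name lines_recent is hypothetical.

```shell
# Hypothetical helper: total line count across regular files modified in the last 24 hours.
lines_recent() {
    # cat concatenates every matching file; one wc -l then counts all lines together.
    find "${1:-.}" -type f -mtime -1 -exec cat {} + | wc -l
}
```

The trade-off is that the per-file counts are lost; keep the wc -l form above when those matter.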

Why did my use of the read command not do what I expected?

I wreaked some havoc on my computer when I played with the commands suggested by vezult [1]. I expected the one-liner to ask for the names of files to be removed. However, it immediately removed the files in a folder:
> find ./ -type f | while read x; do rm "$x"; done
I expected it to wait for me to type on stdin [2]. I cannot understand its action. How does the read command work, and where do you use it?
What happened there is that read reads from stdin. When you put it at the end of a pipe, it read from that pipe.
So your find becomes
file1
file2
and so on; read reads that and replaces x successively with file1 then file2, and so your loop becomes
rm "file1"
rm "file2"
and sure enough, that rm's every file starting at the current directory ".".
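A harmless way to watch this happen is to swap rm for echo and feed the loop a fixed list instead of find's output. This is only an illustrative sketch; the function name and the two file names are placeholders.

```shell
# Harmless stand-in for the loop above: echo replaces rm, printf replaces find.
demo_read_from_pipe() {
    printf '%s\n' file1 file2 |
        while read -r x; do          # read takes its input from the pipe, not the keyboard
            echo "would remove: $x"
        done
}
```

The loop finishes at once without waiting for any typing, because read sees the end of the pipe after the second line.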
A couple hints.
You didn't need the "/".
It's better and safer to say
find . -type f
because should you happen to type ". /" (ie, dot SPACE slash) find will start at the current directory and then go look starting at the root directory. That trick, given the right privileges, would delete every file in the computer. "." is already the name of a directory; you don't need to add the slash.
The find or rm commands will do this
It sounds like what you wanted to do was go through all the files in all the directories starting at the current directory ".", and have it ASK if you want to delete it. You could do that with
find . -type f -exec rm -i {} \;
or
find . -type f -ok rm {} \;
and not need a loop at all. You can also do
rm -r -i *
and get nearly the same effect, except that it will try to delete directories too. If the directory is empty, that'll even work.
Another thought
Come to think of it, unless you have a LOT of files, you could also do
rm -i `find . -type f`
Now the find in backquotes will become a bunch of file names on the command line, and the '-i' interactive flag on rm will ask the yes or no question.
Charlie Martin gives you a good dissection and explanation of what went wrong with your specific example, but doesn't address the general question of:
When should you use the read command?
The answer to that is - when you want to read successive lines from some file (quite possibly the standard output of some previous sequence of commands in a pipeline), possibly splitting the lines into several separate variables. The splitting is done using the current value of '$IFS', which normally means on blanks and tabs (newlines don't count in this context; they separate lines). If there are multiple variables in the read command, then the first word goes into the first variable, the second into the second, ..., and the residue of the line into the last variable. If there's only one variable, the whole line goes into that variable.
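The splitting rule can be tried out in isolation with a few made-up input lines. A minimal sketch, assuming the default IFS; the function and variable names are placeholders.

```shell
# Hypothetical demo of read splitting a line across several variables.
split_demo() {
    while read -r first second rest; do
        # The last variable receives whatever is left of the line.
        echo "first=$first second=$second rest=$rest"
    done
}
```

For example, printf 'a b c d\n' | split_demo prints first=a second=b rest=c d, and a one-word line leaves second and rest empty.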
There are many uses. This is one of the simpler scripts I have that uses the split option:
#!/bin/ksh
#
# @(#)$Id: mkdbs.sh,v 1.4 2008/10/12 02:41:42 jleffler Exp $
#
# Create basic set of databases
MKDUAL=$HOME/bin/mkdual.sql
ELEMENTS=$HOME/src/sqltools/SQL/elements.sql
cat <<! |
mode_ansi with log mode ansi
logged with buffered log
unlogged
stores with buffered log
!
while read dbs logging
do
    if [ "$dbs" = "unlogged" ]
    then bw=""; cw=""
    else bw="-ebegin"; cw="-ecommit"
    fi
    sqlcmd -xe "create database $dbs $logging" \
        $bw -e "grant resource to public" -f $MKDUAL -f $ELEMENTS $cw
done
The cat command with a here-document has its output sent to a pipe, so the output goes into the while read dbs logging loop. The first word goes into $dbs and is the name of the (Informix) database I want to create. The remainder of the line is placed into $logging. The body of the loop deals with unlogged databases (where begin and commit do not work), then run a program sqlcmd (completely separate from the Microsoft new-comer of the same name; it's been around since about 1990) to create a database and populate it with some standard tables and data - a simulation of the Oracle 'dual' table, and a set of tables related to the 'table of elements'.
Other scripts that use the read command are bigger (by far), but generally read lines containing one or more file names and some other attributes of relevance, and then apply an appropriate transform to the files using the attributes.
Osiris JL: file * | grep 'sh.*script' | sed 's/:.*//' | xargs wgrep read
esqlcver:read version letter
jlss: while read directory
jlss: read x || exit
jlss: read x || exit
jlss: while read file type link owner group perms
jlss: read x || exit
jlss: while read file type link owner group perms
kb: while read size name
mkbod: while read directory
mkbod:while read dist comp
mkdbs:while read dbs logging
mkmsd:while read msdfile master
mknmd:while read gfile sfile version notes
publictimestamp:while read name type title
publictimestamp:while read name type title
Osiris JL:
'Osiris JL: ' is my command line prompt; I ran this in my 'bin' directory. 'wgrep' is a variant of grep that only matches entire words (to avoid words like 'already'). This gives some indication of how I've used it.
The 'read x || exit' lines are for an interactive script that reads a response from standard input, but exits if the command gets EOF (for example, if standard input comes from /dev/null).
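The same pattern can be wrapped in a small prompt helper. This is a sketch rather than the code from those scripts: the name ask and the prompt text are hypothetical, and it returns instead of exiting so the caller decides what to do on EOF.

```shell
# Hypothetical prompt helper: returns non-zero when standard input hits EOF.
ask() {
    printf '%s ' "$1" >&2            # prompt goes to stderr so stdout carries only the answer
    read -r answer || return 1       # EOF (for example input from /dev/null) ends the dialogue
    printf '%s\n' "$answer"
}
```

A script would use it as reply=$(ask 'Continue?') || exit, which mirrors the read x || exit lines in the listing.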
