Converting dates in AWK - datetime

I have a file containing many columns of text, including a timestamp along the lines of Fri Jan 02 18:23 and I need to convert that date into MM/DD/YYYY HH:MM format.
I have been trying to use the standard `date' tool with awk's getline to do the conversion, but I can't quite figure out how to pass the fields into the 'date' command in the format it expects (quoted with " or '), since getline needs the command string enclosed in quotes too.
Something like "date -d '$1 $2 $3 $4' +'%D %H:%M'" | getline var
Now that I think about it, I guess what I'm really asking is how to embed awk variables into a string.

If you're using gawk, you don't need the external date command, which can be expensive to call repeatedly:
awk '
BEGIN{
    # build a month-name -> number lookup: Jan -> 01, ..., Dec -> 12
    m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
    for(o=1;o<=m;o++){
        months[d[o]]=sprintf("%02d",o)
    }
    format = "%m/%d/%Y %H:%M"
}
{
    # fields of "Fri Jan 02 18:23": $2=month name, $3=day, $4=HH:MM
    split($4,time,":")
    # the year comes from the current date via strftime("%Y")
    date = (strftime("%Y") " " months[$2] " " $3 " " time[1] " " time[2] " 0")
    print strftime(format, mktime(date))
}'
Thanks to ghostdog74 for the months array from this answer.
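For example (assuming the program between the quotes above has been saved to a file called convert.awk, a hypothetical name), a sample run might look like this; the year comes from strftime("%Y") and so reflects whenever the script is run (2010 is used here purely for illustration):
$ echo "Fri Jan 02 18:23" | gawk -f convert.awk
01/02/2010 18:23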

You can try this, assuming the line in the file contains just the date you specified:
awk '
{
    cmd ="date \"+%m/%d/%Y %H:%M\" -d \""$1" "$2" "$3" "$4"\""
    cmd | getline var
    print var
    close(cmd)
}' file
output
$ ./shell.sh
01/02/2010 18:23
And if you are not using GNU tools, for example on Solaris, use nawk:
nawk 'BEGIN{
    m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
    for(o=1;o<=m;o++){
        months[d[o]]=sprintf("%02d",o)
    }
    cmd="date +%Y"
    cmd|getline yr
    close(cmd)
}
{
    day=$3
    mth=months[$2]
    print mth"/"day"/"yr" "$4
}' file

I had a similar issue converting a date from RRDTool databases using rrdfetch, but I prefer the kind of one-liners I've been using since Apollo computer days.
Data looked like this:
localTemp rs1Temp rs2Temp thermostatMode
1547123400: 5.2788174937e+00 4.7788174937e+00 -8.7777777778e+00 2.0000000000e+00
1547123460: 5.1687014581e+00 4.7777777778e+00 -8.7777777778e+00 2.0000000000e+00
One liner:
rrdtool fetch -s -14400 thermostatDaily.rrd MAX | sed s/://g | awk '{print "echo ""\`date -r" $1,"\`" " " $2 }' | sh
Result:
Thu Jan 10 07:25:00 EST 2019 5.3373432378e+00
Thu Jan 10 07:26:00 EST 2019 5.2788174937e+00
On the face of it this doesn't look very efficient to me, but this kind of methodology has always proven to be fairly low overhead under most circumstances, even for very large files on very low-power computers (like 25 MHz NeXT machines). Yes, MHz.
sed deletes the colon, awk prints the various commands of interest (including just echoing the awk variables), and sh or bash executes the resulting string.
For large files or streams I just head the first few lines and gradually build up the one-liner. Throwaway code.
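If spawning date and a subshell per line ever does become a bottleneck, a gawk-only variant is possible, since strftime can format the epoch timestamp directly (a sketch, assuming GNU awk; the pattern skips the rrdtool fetch header line because it does not start with a digit):
rrdtool fetch -s -14400 thermostatDaily.rrd MAX |
  awk '/^ *[0-9]+:/ { ts=$1; sub(/:$/,"",ts); print strftime("%a %b %d %H:%M:%S %Z %Y", ts), $2 }'
The output format mirrors what date -r prints in the original one-liner.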

Related

To improve Calculate Number of Days Command

I would like to generate a report that calculates the number of days the material has been in the warehouse.
The number of days is the difference between the date the material came in ($3 field) and
a manually supplied date (01 OCT 2014).
Input.csv
Des11,Material,DateIN,Des22,Des33,MRP,Des44,Des55,Des66,Location,Des77,Des88
aa,xxx,19-AUG-14.08:08:01,cc,dd,x20,ee,ff,gg,XX128,hh,jj
aa,xxx,19-AUG-14.08:08:01,cc,dd,x20,ee,ff,gg,XX128,hh,jj
aa,yyy,13-JUN-14.09:06:08,cc,dd,x20,ee,ff,gg,XX128,hh,jj
aa,yyy,13-JUN-14.09:06:08,cc,dd,x20,ee,ff,gg,XX128,hh,jj
aa,yyy,05-FEB-14.09:02:09,cc,dd,x20,ee,ff,gg,YY250,hh,jj
aa,yyy,05-FEB-14.09:02:09,cc,dd,y35,ee,ff,gg,YY250,hh,jj
aa,zzz,05-FEB-14.09:02:09,cc,dd,y35,ee,ff,gg,YY250,hh,jj
aa,zzz,11-JUN-13.05:06:17,cc,dd,y35,ee,ff,gg,YY250,hh,jj
aa,zzz,11-JUN-13.05:06:17,cc,dd,y35,ee,ff,gg,YY250,hh,jj
aa,zzz,11-JUN-13.05:06:17,cc,dd,y35,ee,ff,gg,YY250,hh,jj
Currently I am using the command below to populate Ageing - No of days in the $13 field (thanks to gboffi):
awk -F, 'NR>0 {date=$3;
gsub("[-.]"," ",date);
printf $0 ",";system("date --date=\"" date "\" +%s")}
' Input.csv | awk -F, -v OFS=, -v now=`date --date="01 OCT 2014 " +%s` '
NR>0 {$13=now-$13; $13=$13/24/3600;print $0}' >Op_Step11.csv
When run in Cygwin (Windows), the above command takes 50 minutes for 1 lakh (100,000) rows of sample input.
Since my actual input file contains 25 million rows, it seems the script would take a couple of days.
Looking for your suggestions and advice on improving the command!
Expected Output:
Des11,Material,DateIN,Des22,Des33,MRP,Des44,Des55,Des66,Location,Des77,Des88,Ageing-NoOfDays
aa,xxx,19-AUG-14.08:08:01,cc,dd,x20,ee,ff,gg,XX128,hh,jj,42.6611
aa,xxx,19-AUG-14.08:08:01,cc,dd,x20,ee,ff,gg,XX128,hh,jj,42.6611
aa,yyy,13-JUN-14.09:06:08,cc,dd,x20,ee,ff,gg,XX128,hh,jj,109.621
aa,yyy,13-JUN-14.09:06:08,cc,dd,x20,ee,ff,gg,XX128,hh,jj,109.621
aa,yyy,05-FEB-14.09:02:09,cc,dd,x20,ee,ff,gg,YY250,hh,jj,237.624
aa,yyy,05-FEB-14.09:02:09,cc,dd,y35,ee,ff,gg,YY250,hh,jj,237.624
aa,zzz,05-FEB-14.09:02:09,cc,dd,y35,ee,ff,gg,YY250,hh,jj,237.624
aa,zzz,11-JUN-13.05:06:17,cc,dd,y35,ee,ff,gg,YY250,hh,jj,476.787
aa,zzz,11-JUN-13.05:06:17,cc,dd,y35,ee,ff,gg,YY250,hh,jj,476.787
aa,zzz,11-JUN-13.05:06:17,cc,dd,y35,ee,ff,gg,YY250,hh,jj,476.787
I don't have access to change the input format, and I don't have Perl or Python available.
Update#3:
BEGIN{ FS=OFS=","}
{
t1=$3
t2="01-OCT-14.00:00:00"
print $0,(cvttime(t2) - cvttime(t1))/24/3600
}
function cvttime(t, a) {
split(t,a,"[-.:]")
match("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",a[2])
a[2] = sprintf("%02d",(RSTART+2)/3)
return( mktime("20"a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6]) )
}
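To apply it to the whole file, save the program above to a file (ageing.awk is a hypothetical name) and run it in a single gawk pass instead of the original two-stage pipeline:
awk -f ageing.awk Input.csv > Op_Step11.csv
You may want to handle the header line separately (e.g. guard the calculation with NR>1), since the first line of Input.csv carries column names rather than a date.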
Since you are on Cygwin you are using GNU awk, which has its own built-in time functions, so you do not need to shell out to the date command. Just tweak this old command I had lying around to suit your input and output format:
function cvttime(t, a) {
    split(t,a,"[/:]")
    match("JanFebMarAprMayJunJulAugSepOctNovDec",a[2])
    a[2] = sprintf("%02d",(RSTART+2)/3)
    return( mktime(a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6]) )
}
BEGIN{
    t1="01/Dec/2005:00:04:42"
    t2="01/Dec/2005:17:14:12"
    print cvttime(t2) - cvttime(t1)
}
It uses GNU awk for time functions, see http://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions
Here is an example in Perl:
use feature qw(say);
use strict;
use warnings;
use Text::CSV;
use Time::Piece;
my $csv = Text::CSV->new;
my $te = Time::Piece->strptime('01-OCT-14', '%d-%b-%y');
my $fn = 'Input.csv';
open (my $fh, '<', $fn) or die "Could not open file '$fn': $!\n";
chomp(my $head = <$fh>);
say "$head,Ageing-NoOfDays";
while (my $line = <$fh>) {
    chomp $line;
    if ($csv->parse($line)) {
        my $t = ($csv->fields())[2];
        my $tp = Time::Piece->strptime($t, '%d-%b-%y.%T');
        my $s = $te - $tp;
        say "$line," . $s->days;
    } else {
        warn "Line could not be parsed: $line\n";
    }
}
close($fh);

Removing trailing / starting newlines with sed, awk, tr, and friends

I would like to remove all of the empty lines from a file, but only when they are at the end/start of a file (that is, if there are no non-empty lines before them, at the start; and if there are no non-empty lines after them, at the end.)
Is this possible outside of a fully-featured scripting language like Perl or Ruby? I’d prefer to do this with sed or awk if possible. Basically, any light-weight and widely available UNIX-y tool would be fine, especially one I can learn more about quickly (Perl, thus, not included.)
From Useful one-line scripts for sed:
# Delete all leading blank lines at top of file (only).
sed '/./,$!d' file
# Delete all trailing blank lines at end of file (only).
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file
Therefore, to remove both leading and trailing blank lines from a file, you can combine the above commands into:
sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' file
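A quick way to verify the combined command, using printf to build input with blank lines on both sides (the interior blank line is preserved):
$ printf '\n\nfoo\n\nbar\n\n\n' | sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba'
foo

bar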
So I'm going to borrow part of @dogbane's answer for this, since that sed line for removing the leading blank lines is so short...
tac is part of coreutils, and reverses a file. So do it twice:
tac file | sed -e '/./,$!d' | tac | sed -e '/./,$!d'
It's certainly not the most efficient, but unless you need efficiency, I find it more readable than everything else so far.
Here's a one-pass solution in awk: it does not start printing until it sees a non-empty line, and when it sees an empty line it remembers it until the next non-empty line.
awk '
/[[:graph:]]/ {
# a non-empty line
# set the flag to begin printing lines
p=1
# print the accumulated "interior" empty lines
for (i=1; i<=n; i++) print ""
n=0
# then print this line
print
}
p && /^[[:space:]]*$/ {
# a potentially "interior" empty line. remember it.
n++
}
' filename
Note, due to the mechanism I'm using to consider empty/non-empty lines (with [[:graph:]] and /^[[:space:]]*$/), interior lines with only whitespace will be truncated to become truly empty.
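For instance, the whitespace-only interior line in the following input comes out truly empty (assuming the program above is saved as strip.awk, a hypothetical name):
$ printf 'foo\n \t \nbar\n' | awk -f strip.awk
foo

bar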
As mentioned in another answer, tac is part of coreutils, and reverses a file. Combining the idea of doing it twice with the fact that command substitution will strip trailing new lines, we get
echo "$(echo "$(tac "$filename")" | tac)"
which doesn't depend on sed. You can use echo -n to strip the remaining trailing newline off.
Here's an adapted sed version, which also considers "empty" those lines with just spaces and tabs on it.
sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
It's basically the accepted answer's version (considering BryanH's comment), but the dot . in the first command was changed to [^[:blank:]] (anything not blank) and the \n inside the second command's address was changed to [[:space:]] to allow newlines, spaces and tabs.
An alternative version, without using the POSIX classes, but your sed must support inserting \t and \n inside […]. GNU sed does, BSD sed doesn't.
sed -e :a -e '/[^\t ]/,$!d; /^[\n\t ]*$/{ $d; N; ba' -e '}'
Testing:
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n'
foo
foo
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -n l
$
\t $
$
foo$
$
foo$
$
\t $
$
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
foo
foo
prompt$
Using awk:
awk '{ a[NR]=$0; if($0 && !s) s=NR; }
     END{ e=NR;
          for(i=NR;i>1;i--)
              if(a[i]){ e=i; break; }
          for(i=s;i<=e;i++)
              print a[i];
     }' yourFile
This can be solved easily with sed's -z option:
sed -rz 's/^\n+//; s/\n+$/\n/g' file
Hello
Welcome to
Unix and Linux
For an efficient, non-recursive version of the trailing-newlines strip (including whitespace-only lines) I've developed this sed script.
sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'
It uses the hold buffer to store all blank lines and prints them only after it finds a non-blank line. Should someone want only the newlines, it's enough to get rid of the two [[:space:]]* parts:
sed -n '/^$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^$/H'
I've tried a simple performance comparison with the well-known recursive script
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba'
on a 3MB file with 1MB of random blank lines around a random base64 text.
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M > bigfile
base64 </dev/urandom | dd bs=1 count=1M >> bigfile
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M >> bigfile
The streaming script took roughly 0.5 seconds to complete; the recursive one hadn't finished after 15 minutes. Win :)
For completeness, the leading-lines-stripping sed script already streams fine. Use whichever is most suitable for you.
sed '/[^[:blank:]]/,$!d'
sed '/./,$!d'
Using bash
$ filecontent=$(<file)
$ echo "${filecontent/$'\n'}"
In bash, using cat, wc, grep, sed, tail and head:
# number of first line that contains non-empty character
i=`grep -n "^[^\B*]" <your_file> | sed -e 's/:.*//' | head -1`
# number of the last one
j=`grep -n "^[^\B*]" <your_file> | sed -e 's/:.*//' | tail -1`
# overall number of lines:
k=`cat <your_file> | wc -l`
# how many empty lines do we have at the end of the file?
m=$(($k-$j))
# let's strip the last m lines!
cat <your_file> | head -n-$m
# now we have to strip first i lines and we are done 8-)
cat <your_file> | tail -n+$i
Man, it's definitely worth learning a "real" programming language to avoid that ugliness!
@dogbane has a nice simple answer for removing leading empty lines. Here's a simple awk command which removes just the trailing lines. Use this with @dogbane's sed command to remove both leading and trailing blanks.
awk '{ LINES=LINES $0 "\n"; } /./ { printf "%s", LINES; LINES=""; }'
This is pretty simple in operation.
Add every line to a buffer as we read it.
For every line which contains a character, print the contents of the buffer and then clear it.
So the only things that get buffered and never displayed are any trailing blanks.
I used printf instead of print to avoid the automatic addition of a newline, since I'm using newlines to separate the lines in the buffer already.
This AWK script will do the trick:
BEGIN {
ne=0;
}
/^[[:space:]]*$/ {
ne++;
}
/[^[:space:]]+/ {
for(i=0; i < ne; i++)
print "";
ne=0;
print
}
The idea is simple: empty lines do not get echoed immediately. Instead, we wait till we get a non-empty line, and only then we first echo out as much empty lines as seen before it, and only then echo out the new non-empty line.
perl -0pe 's/^\n+|\n+(\n)$/\1/gs'
Here's an awk version that removes trailing blank lines (both empty lines and lines consisting of nothing but white space).
It is memory efficient; it does not read the entire file into memory.
awk '/^[[:space:]]*$/ {b=b $0 "\n"; next;} {printf "%s",b; b=""; print;}'
The b variable buffers up the blank lines; they get printed when a non-blank line is encountered. When EOF is encountered, they don't get printed. That's how it works.
If using gnu awk, [[:space:]] can be replaced with \s. (See full list of gawk-specific Regexp Operators.)
If you want to remove only those trailing lines that are empty, see @AndyMortimer's answer.
A bash solution.
Note: Only useful if the file is small enough to be read into memory at once.
[[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"
$(<file) reads the entire file and trims trailing newlines, because command substitution ($(....)) implicitly does that.
=~ is bash's regular-expression matching operator, and =~ ^$'\n'*(.*)$ optionally matches any leading newlines (greedily), and captures whatever comes after. Note the potentially confusing $'\n', which inserts a literal newline using ANSI C quoting, because escape sequence \n is not supported.
Note that this particular regex always matches, so the command after && is always executed.
The special array variable BASH_REMATCH contains the results of the most recent regex match, and array element [1] contains what the (first and only) parenthesized subexpression (capture group) captured, which is the input string with any leading newlines stripped. The net effect is that ${BASH_REMATCH[1]} contains the input file content with both leading and trailing newlines stripped.
Note that printing with echo adds a single trailing newline. If you want to avoid that, use echo -n instead (or use the more portable printf '%s').
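A quick demonstration with a hypothetical file built by printf:
$ printf '\n\nfoo\nbar\n\n\n' > file
$ [[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"
foo
bar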
I'd like to introduce another variant for gawk v4.1+
result=($(gawk '
BEGIN {
lines_count = 0;
empty_lines_in_head = 0;
empty_lines_in_tail = 0;
}
/[^[:space:]]/ {
found_not_empty_line = 1;
empty_lines_in_tail = 0;
}
/^[[:space:]]*?$/ {
if ( found_not_empty_line ) {
empty_lines_in_tail ++;
} else {
empty_lines_in_head ++;
}
}
{
lines_count ++;
}
END {
print (empty_lines_in_head " " empty_lines_in_tail " " lines_count);
}
' "$file"))
empty_lines_in_head=${result[0]}
empty_lines_in_tail=${result[1]}
lines_count=${result[2]}
if [ $empty_lines_in_head -gt 0 ] || [ $empty_lines_in_tail -gt 0 ]; then
echo "Removing whitespace from \"$file\""
eval "gawk -i inplace '
{
if ( NR > $empty_lines_in_head && NR <= $(($lines_count - $empty_lines_in_tail)) ) {
print
}
}
' \"$file\""
fi
Because I was writing a bash script anyway containing some functions, I found it convenient to write those:
function strip_leading_empty_lines()
{
    while read line; do
        if [ -n "$line" ]; then
            echo "$line"
            break
        fi
    done
    cat
}
function strip_trailing_empty_lines()
{
    acc=""
    while read line; do
        acc+="$line"$'\n'
        if [ -n "$line" ]; then
            echo -n "$acc"
            acc=""
        fi
    done
}
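With both functions defined, stripping a file (or any stream) is then just a matter of chaining them; for example, with a hypothetical file named file:
strip_leading_empty_lines < file | strip_trailing_empty_lines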

How to sort characters in a string?

I would like to sort the characters in a string.
E.g.
echo cba | sort-command
abc
Is there a command that will allow me to do this or will I have to write an awk script to iterate over the string and sort it?
echo cba | grep -o . | sort | tr -d "\n"
Please find the following useful methods:
Shell
Sort string based on its characters:
echo cba | grep -o . | sort | tr -d "\n"
String separated by spaces:
echo 'dd aa cc bb' | tr " " "\n" | sort | tr "\n" " "
Perl
print (join "", sort split //,$_)
Ruby
ruby -e 'puts "dd aa cc bb".split(/\s+/).sort'
Bash
With bash you have to enumerate each character from a string, in general something like:
str="dd aa cc bb";
for (( i = 0; i < ${#str}; i++ )); do echo "${str:$i:1}"; done
For sorting array, please check: How to sort an array in bash?
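Putting the pieces together for the original question, a pure-bash loop can feed the characters to sort (a sketch; sort and tr still do the ordering and joining):
str="cba"
for (( i = 0; i < ${#str}; i++ )); do echo "${str:$i:1}"; done | sort | tr -d "\n"
which prints abc.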
This is cheating (because it uses Perl), but works. :-P
echo cba | perl -pe 'chomp; $_ = join "", sort split //'
Another perl one-liner
$ echo cba | perl -F -lane 'print sort @F'
abc
$ # for reverse order
$ echo xyz | perl -F -lane 'print reverse sort @F'
zyx
$ # or
$ echo xyz | perl -F -lane 'print sort {$b cmp $a} @F'
zyx
This will add a newline to the output as well, courtesy of the -l option.
See Command switches for documentation on all the options.
The input is basically split character-wise and saved in the @F array.
Then the sorted @F is printed.
This will also work line-wise for a given input file:
$ cat ip.txt
idea
cold
spare
umbrella
$ perl -F -lane 'print sort @F' ip.txt
adei
cdlo
aeprs
abellmru
This would have been more appropriate as a comment to one of the grep -o . solutions (my reputation's not quite up to that low bar alas, damn my lurking), but I thought it worth mentioning that separating letters can be done more efficiently within the shell. It's always worth avoiding code, but this letsep function is pretty small:
letsep ()
{
    INWORD="$1"
    while [ "$INWORD" ]
    do
        echo ${INWORD:0:1}
        INWORD=${INWORD#?}
    done
}
. . . and outputs one letter per line for an input string of arbitrary length. For example, once letsep is defined, populating an array FLETRS with the letters of a string contained in variable FRED could be done (assuming contemporary bash) as:
readarray -t FLETRS < <(letsep $FRED)
. . . which for word-size strings runs about twice as fast as the equivalent:
readarray -t FLETRS < <(echo $FRED | grep -o .)
Whether this is worth setting up depends on the application. I've only measured this crudely, but the slower procedural code seems to maintain an advantage over the context switch up to ~60 chars (grep is obviously more efficient, but loading it is relatively expensive). If the above operation is taking place in one or more steps of a loop over an indeterminate number of executions, the difference in efficiency can add up (at which point some might argue for switching tools and rewriting regardless, but that's another set of tradeoffs).

Counting records in a Unix file

This was an interview question, nevertheless still a programming question.
I have a Unix file with two columns, name and score. I need to display a count of each score.
like
jhon 100
dan 200
rob 100
mike 100
the output should be
100 3
200 1
You only need to use built-in Unix utilities to solve it, so I am assuming shell scripts, regexes, or Unix commands.
I understand looping would be one way to do it: store all the values you have already seen, then grep every record for unseen values. Is there a more efficient way of doing it?
Try this:
cut -d ' ' -f 2 < /tmp/foo | sort -n | uniq -c \
| (while read n v ; do printf "%s %s\n" "$v" "$n" ; done)
The initial cut could be replaced with another while read loop, which would be more resilient to input file format variations (extra whitespace); a sketch follows below. If some of the names consist of several words, simple field extraction will not work as easily, but sed can do it.
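A sketch of that while read variant (same assumptions about the /tmp/foo layout as above; read's default word splitting soaks up extra whitespace between the name and the score):
while read name score ; do echo "$score" ; done < /tmp/foo \
  | sort -n | uniq -c \
  | (while read n v ; do printf "%s %s\n" "$v" "$n" ; done)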
Otherwise, use your favorite programming language. Perl would probably shine. It is not difficult either in Java or even in C or Forth.
$ cat foo.txt
jhon 100
dan 200
rob 100
mike 100
$ awk '{print $2}' foo.txt | sort | uniq -c
3 100
1 200
It's a pity you can't do a count with sort or uniq alone.
Edit: I just noticed I have the count in front ... to get it exactly the same you can do:
$ awk '{print $2}' foo.txt | sort | uniq -c | awk '{ print $2 " " $1 }'
Not very complicated in perl:
#!/usr/bin/perl -w
use strict;
use warnings;
my %count = ();
while (<>) {
chomp;
my ($name, $score) = split(/ /);
$count{$score}++;
}
foreach my $key (sort keys %count) {
print "$key ", $count{$key}, "\n";
}
You could go with awk:
awk '/.*/ { a[$2] = a[$2] + 1; } END { for (x in a) { print x, " ", a[x] } }' record_file.txt
Alternatively with shell commands:
for i in `awk '{print $2}' inputfile | sort -u`
do
echo -n "$i "
grep $i inputfile | wc -l
done
The first awk command gives a list of all the different scores (e.g. 100 and 200), which the for loop then iterates over, counting each one separately. Not very efficient, but simple. If the file is not too big, it should not be a problem.

How can I show file sizes with commas when getting a directory listing with 'ls -l'?

You can do 'ls -l' to get a detailed directory listing like this:
-rw-rw-rw- 1 alice themonkeys 1159995999 2008-08-20 07:01 foo.log
-rw-rw-rw- 1 bob bob 244251992 2008-08-20 05:30 bar.txt
But notice how you have to slide your finger along the screen to figure out the order of magnitude of those file sizes.
What's a good way to add commas to the file sizes in the directory listing, like this:
-rw-rw-rw- 1 alice themonkeys 1,159,995,999 2008-08-20 07:01 foo.log
-rw-rw-rw- 1 bob bob 244,251,992 2008-08-20 05:30 bar.txt
I just discovered that it's built-in to GNU Core Utils and works for ls and du!
ls -l --block-size="'1"
du --block-size="'1"
It works on Ubuntu but sadly doesn't on OSX. More on variants of block size here
If the order of magnitude is all you're interested in, ls -lh does something like this:
-rw-r----- 1 alice themonkeys 626M 2007-02-05 01:15 foo.log
-rw-rw-r-- 1 bob bob 699M 2007-03-12 23:14 bar.txt
I don't think 'ls' has exactly that capability. If you are looking for readability, 'ls -lh' will give you file sizes that are easier for humans to parse.
-rw-rw-rw- 1 alice themonkeys 1.2G 2008-08-20 07:01 foo.log
-rw-rw-rw- 1 bob bob 244M 2008-08-20 05:30 bar.txt
Here's an improvement to commafy.pl; it allows you to use ls with or without listing the file sizes. Alias ls to commafy.pl to use it.
#!/usr/bin/perl
# Does ls and adds commas to numbers if ls -l is used.
$largest_number_of_commas = 0;
$result = `ls -C @ARGV`;
# First line adds five spaces before file size
$result =~ s/(^[-lrwxds]{10,}\s*[^\s]+\s*[^\s]+\s*[^\s]+)/$1 /gm;
$result =~ s/(.{5} )(\d{4,}) /truncatePre($1,$2).commafy($2).' '/eg;
$remove_extra_spaces = 5 - $largest_number_of_commas;
$result =~ s/(^[-lrwxds]{10,}\s*[^\s]+\s*[^\s]+\s*[^\s]+) {$remove_extra_spaces}/$1/gm;
print $result;
# adds commas to an integer as appropriate
sub commafy
{
my($num) = @_;
my $len = length($num);
if ($len <= 3) { return $num; }
return commafy(substr($num, 0, $len - 3)) . ',' . substr($num, -3);
}
# removes as many chars from the end of str as there are commas to be added
# to num
sub truncatePre
{
my($str, $num) = @_;
$numCommas = int((length($num)-1) / 3);
if ($numCommas > $largest_number_of_commas) {$largest_number_of_commas = $numCommas}
return substr($str, 0, length($str) - $numCommas);
}
This common sed script should work:
ls -l | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
However, I agree with the earlier comment suggesting ls -lh is probably the better general solution for the desired effect.
This is on OS X, so you might have to tweak it a bit for your Unix flavor. I created such a function for this purpose in my ~/.bashrc dot file. The trick is using ' in the awk printf format string for the file size. A caveat: the awk mangles the "total" first line somewhat, and loses terminal coloration as well. Otherwise, one of its merits is that it tries to keep columns aligned as much as possible. To me this instantly gives a visual estimate of how big a file is. The -h switch solution is okay, but your brain still needs to convert those Ks, Bs, Gs. The biggest advantage of the solution below is that you can pipe it to sort, and sort will understand the numbers, as in "lc | sort -k5,5nr" for example.
lc() {
/bin/ls -l -GPT | /usr/bin/awk "{
printf \"%-11s \", \$1;
printf \"%3s \", \$2;
printf \"%-6s \", \$3;
printf \"%-6s \", \$4;
printf \"%'12d \", \$5;
printf \"%3s \", \$6;
printf \"%2s \", \$7;
for (i=8; i<=NF; i++) {
printf \"%s \", \$i
};
printf \"\n\";
}"
}
Here's a perl script that will filter the output of 'ls -l' to add the commas.
If you call the script commafy.pl then you can alias 'ls' to 'ls -l | commafy.pl'.
#!/usr/bin/perl -p
# pipe the output of ls -l through this to add commas to numbers.
s/(.{5} )(\d{4,}) /truncatePre($1,$2).commafy($2).' '/e;
# adds commas to an integer as appropriate
sub commafy
{
my($num) = @_;
my $len = length($num);
if ($len <= 3) { return $num; }
return commafy(substr($num, 0, $len - 3)) . ',' . substr($num, -3);
}
# removes as many chars from the end of str as there are commas to be added
# to num
sub truncatePre
{
my($str, $num) = @_;
$numCommas = int((length($num)-1) / 3);
return substr($str, 0, length($str) - $numCommas);
}
Actually, I was looking for a test for a young trainee and this seemed ideal. Here's what he came up with:
for i in $(ls -1)
do
sz=$(expr $(ls -ld $i | awk '{print $5}' | wc -c) - 1)
printf "%10d %s\n" $sz $i
done
It gives the order of magnitude for the size in a horribly inefficient way. I'll make this community wiki since we're both interested in how you rate his code, but I don't want my rep to suffer.
Feel free to leave comments (be gentle, he's a newbie, though you wouldn't guess it by his shell scripting :-).
I wrote this several years ago; it works on stdin.
It reads stdin, inserts commas into numbers for readability, and writes the result to stdout.
Example:
$ ls -l testdatafile.1000M
-rw-rw-r--+ 1 mkm wheel 1048576000 Apr 24 12:45 testdatafile.1000M
$ ls -l testdatafile.1000M | commas
-rw-rw-r--+ 1 mkm wheel 1,048,576,000 Apr 24 12:45 testdatafile.1000M
https://github.com/mikemakuch/commas
export LS_BLOCK_SIZE="'1"
will do this for recent versions of GNU ls.
