I would like to generate a report that calculates the number of days the material has been in the warehouse.
The number of days is the difference between the date the material came in (field $3) and
a manually supplied date (01 OCT 2014).
Currently I am using the command below to populate Ageing - No of days in field $13 (thanks to gboffi):
awk -F, 'NR>0 {date=$3;
gsub("[-.]"," ",date);
printf $0 ",";system("date --date=\"" date "\" +%s")}
' Input.csv | awk -F, -v OFS=, -v now=`date --date="01 OCT 2014 " +%s` '
NR>0 {$13=now-$13; $13=$13/24/3600;print $0}' >Op_Step11.csv
Using the above command in Cygwin (Windows) takes 50 minutes for 1 lakh (100,000) rows of sample input.
Since my actual input file contains 25 million rows, it seems that the script will take a couple of days.
I am looking for suggestions to improve the command.
Expected Output:
I don't have access to change the input format, and I don't have access to Perl or Python.
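awk -F, -v OFS=, '
{ t1 = $3; t2 = "01-OCT-14.00:00:00"   # assumes dates shaped like DD-MON-YY.hh:mm:ss
  print $0,(cvttime(t2) - cvttime(t1))/24/3600 }
function cvttime(t, a) {
    split(t, a, /[-.:]/); match("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", toupper(a[2]))
    a[2] = sprintf("%02d",(RSTART+2)/3)
    return( mktime("20"a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6]) )
}' Input.csv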
Since you are on Cygwin you are using GNU awk, which has its own built-in time functions, so you do not need the shell date command at all. Just tweak this old command I had lying around to suit your input and output format:
function cvttime(t, a) {
a[2] = sprintf("%02d",(RSTART+2)/3)
return( mktime(a[3]" "a[2]" "a[1]" "a[4]" "a[5]" "a[6]) )
print cvttime(t2) - cvttime(t1)
It uses GNU awk for time functions, see
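If GNU awk's mktime is not available, the same idea (no external date process per row) can be sketched with plain integer arithmetic. This is only a sketch under assumptions: the function name age_days is made up for illustration, the date in field 3 is assumed to look like 01-OCT-14 or 01-OCT-14.10:30:00 with a two-digit year in the 2000s, and the result is in whole days because the time of day is ignored.

```shell
# age_days CUTOFF [FILE...]  (hypothetical helper)
# Appends the number of whole days between the date in field 3 and CUTOFF.
# Assumes dates shaped like DD-MON-YY[.hh:mm:ss], years 2000-2099.
age_days() {
  cutoff=$1; shift
  awk -F, -v OFS=, -v cutoff="$cutoff" '
    # Day number of a Gregorian date; only differences are meaningful.
    function daynum(y, m, d) {
        if (m <= 2) { y--; m += 12 }   # count Jan/Feb as months 13/14 of the previous year
        return 365*y + int(y/4) - int(y/100) + int(y/400) + int((153*(m-3)+2)/5) + d
    }
    # Convert DD-MON-YY[.hh:mm:ss] to a day number; the time part is ignored.
    function parse(s,   a, m) {
        split(s, a, /[-. ]/)
        m = (index("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", toupper(a[2])) + 2) / 3
        return daynum(2000 + a[3], m, a[1])
    }
    BEGIN { cutoff_day = parse(cutoff) }
    { print $0, cutoff_day - parse($3) }
  ' "$@"
}
```

Called as `age_days 01-OCT-14 Input.csv >Op_Step11.csv`, it runs a single awk process for the whole file. A header row, if present, would need to be skipped separately (for example with an NR>1 guard).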
Here is an example in Perl:
use feature qw(say);
use strict;
use warnings;
use Text::CSV;
use Time::Piece;
my $csv = Text::CSV->new;
my $te = Time::Piece->strptime('01-OCT-14', '%d-%b-%y');
my $fn = 'Input.csv';
open (my $fh, '<', $fn) or die "Could not open file '$fn': $!\n";
chomp(my $head = <$fh>);
say "$head,Ageing-NoOfDays";
while (my $line = <$fh>) {
    chomp $line;
    if ($csv->parse($line)) {
        # the date is the third CSV field
        my $t = ($csv->fields())[2];
        my $tp = Time::Piece->strptime($t, '%d-%b-%y.%T');
        my $s = $te - $tp;
        say "$line," . $s->days;
    } else {
        warn "Line could not be parsed: $line\n";
    }
}
close $fh;
I would like to remove all of the empty lines from a file, but only when they are at the end/start of a file (that is, if there are no non-empty lines before them, at the start; and if there are no non-empty lines after them, at the end.)
Is this possible outside of a fully-featured scripting language like Perl or Ruby? I’d prefer to do this with sed or awk if possible. Basically, any light-weight and widely available UNIX-y tool would be fine, especially one I can learn more about quickly (Perl, thus, not included.)
From Useful one-line scripts for sed:
# Delete all leading blank lines at top of file (only).
sed '/./,$!d' file
# Delete all trailing blank lines at end of file (only).
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file
Therefore, to remove both leading and trailing blank lines from a file, you can combine the above commands into:
sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' file
So I'm going to borrow part of @dogbane's answer for this, since that sed line for removing the leading blank lines is so short...
tac is part of coreutils, and reverses a file. So do it twice:
tac file | sed -e '/./,$!d' | tac | sed -e '/./,$!d'
It's certainly not the most efficient, but unless you need efficiency, I find it more readable than everything else so far.
here's a one-pass solution in awk: it does not start printing until it sees a non-empty line and when it sees an empty line, it remembers it until the next non-empty line
awk '
    /[[:graph:]]/ {
        # a non-empty line
        # set the flag to begin printing lines
        p = 1
        # print the accumulated "interior" empty lines
        for (i=1; i<=n; i++) print ""
        n = 0
        # then print this line
        print
    }
    p && /^[[:space:]]*$/ {
        # a potentially "interior" empty line. remember it.
        n++
    }
' filename
Note, due to the mechanism I'm using to consider empty/non-empty lines (with [[:graph:]] and /^[[:space:]]*$/), interior lines with only whitespace will be truncated to become truly empty.
As mentioned in another answer, tac is part of coreutils, and reverses a file. Combining the idea of doing it twice with the fact that command substitution will strip trailing new lines, we get
echo "$(echo "$(tac "$filename")" | tac)"
which doesn't depend on sed. You can use echo -n to strip the remaining trailing newline off.
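To see the command-substitution behaviour in isolation, here is a minimal sketch; the function name strip_trailing_newlines is made up for illustration. It reads the whole input into memory, so it only suits small files, and it prints a single newline for empty input.

```shell
# strip_trailing_newlines (hypothetical helper):
# copy stdin to stdout, dropping the empty lines at the end.
strip_trailing_newlines() {
  content=$(cat)              # $(...) removes every trailing newline
  printf '%s\n' "$content"    # put back exactly one
}
```

For example, `printf 'a\n\nb\n\n\n' | strip_trailing_newlines` keeps the interior empty line and ends right after `b`.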
Here's an adapted sed version, which also treats lines containing only spaces and tabs as "empty".
sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
It's basically the accepted answer's version (taking BryanH's comment into account), but the dot . in the first command was changed to [^[:blank:]] (anything not blank) and the \n inside the second command's address was changed to [[:space:]] to allow newlines, spaces and tabs.
An alternative version, without using the POSIX classes, but your sed must support inserting \t and \n inside […]. GNU sed does, BSD sed doesn't.
sed -e :a -e '/[^\t ]/,$!d; /^[\n\t ]*$/{ $d; N; ba' -e '}'
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n'
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -n l
\t $
\t $
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
using awk:
awk '{a[NR]=$0;if($0 != "" && !s)s=NR;}
END{e=NR;
for(i=NR;i>=1;i--)
if(a[i] != ""){ e=i; break; }
if(s) for(i=s;i<=e;i++)
print a[i];}' yourFile
This can be solved easily with the -z option of GNU sed:
sed -rz 's/^\n+//; s/\n+$/\n/g' file
For an efficient non-recursive version of the trailing newlines strip (including "white" characters) I've developed this sed script.
sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'
It uses the hold buffer to store all blank lines and prints them only after it finds a non-blank line. Should someone want only the newlines, it's enough to get rid of the two [[:space:]]* parts:
sed -n '/^$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^$/H'
I've tried a simple performance comparison with the well-known recursive script
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba'
on a 3MB file with 1MB of random blank lines around a random base64 text.
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M > bigfile
base64 </dev/urandom | dd bs=1 count=1M >> bigfile
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M >> bigfile
The streaming script took roughly 0.5 second to complete, the recursive didn't end after 15 minutes. Win :)
For completeness: the sed script that strips leading lines already streams fine. Use whichever suits you best.
sed '/[^[:blank:]]/,$!d'
sed '/./,$!d'
Using bash
$ filecontent=$(<file)
$ echo "${filecontent/$'\n'}"
In bash, using cat, wc, grep, sed, tail and head:
# number of the first line that contains a non-empty character
i=`grep -n "^[^\B*]" <your_file> | sed -e 's/:.*//' | head -1`
# number of the last one
j=`grep -n "^[^\B*]" <your_file> | sed -e 's/:.*//' | tail -1`
# overall number of lines:
k=`cat <your_file> | wc -l`
# how many empty lines do we have at the end of the file?
m=$((k - j))
# let's strip the last m lines!
cat <your_file> | head -n-$m
# now we have to strip the lines before line i and we are done 8-)
cat <your_file> | tail -n+$i
Man, it's definitely worth to learn "real" programming language to avoid that ugliness!
@dogbane has a nice simple answer for removing leading empty lines. Here's a simple awk command which removes just the trailing lines. Use this with @dogbane's sed command to remove both leading and trailing blanks.
awk '{ LINES=LINES $0 "\n"; } /./ { printf "%s", LINES; LINES=""; }'
This is pretty simple in operation.
Add every line to a buffer as we read it.
For every line which contains a character, print the contents of the buffer and then clear it.
So the only things that get buffered and never displayed are any trailing blanks.
I used printf instead of print to avoid the automatic addition of a newline, since I'm using newlines to separate the lines in the buffer already.
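If you would rather not chain sed and awk, the same buffering idea can be extended with a "seen content yet" flag so that one awk pass handles both ends. A minimal sketch; the function name strip_edge_blanks is made up, and "empty" here means truly empty lines (no whitespace handling).

```shell
# strip_edge_blanks [FILE...]  (hypothetical helper)
# Drops empty lines before the first and after the last non-empty line.
strip_edge_blanks() {
  awk '
    # Non-empty line: flush the buffered interior empties, then print it.
    /./  { seen = 1; printf "%s", buf; buf = ""; print; next }
    # Empty line after some content: remember it, it may be interior.
    seen { buf = buf "\n" }
    # Empty lines before any content match neither rule and are dropped.
  ' "$@"
}
```

Whatever is still in the buffer at end of input is never printed, which is what removes the trailing empty lines.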
This AWK script will do the trick:
/^[[:space:]]*$/ { ne++ }
/[^[:space:]]+/ {
    for(i=0; i < ne; i++)
        print "";
    ne=0;
    print
}
The idea is simple: empty lines do not get echoed immediately. Instead, we wait until we get a non-empty line, and only then echo out as many empty lines as were seen before it, followed by the new non-empty line.
perl -0pe 's/^\n+|\n+(\n)$/\1/gs'
Here's an awk version that removes trailing blank lines (both empty lines and lines consisting of nothing but white space).
It is memory efficient; it does not read the entire file into memory.
awk '/^[[:space:]]*$/ {b=b $0 "\n"; next;} {printf "%s",b; b=""; print;}'
The b variable buffers up the blank lines; they get printed when a non-blank line is encountered. When EOF is encountered, they don't get printed. That's how it works.
If using GNU awk, [[:space:]] can be replaced with \s. (See full list of gawk-specific Regexp Operators.)
If you want to remove only those trailing lines that are empty, see @AndyMortimer's answer.
A bash solution.
Note: Only useful if the file is small enough to be read into memory at once.
[[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"
$(<file) reads the entire file and trims trailing newlines, because command substitution ($(....)) implicitly does that.
=~ is bash's regular-expression matching operator, and =~ ^$'\n'*(.*)$ optionally matches any leading newlines (greedily), and captures whatever comes after. Note the potentially confusing $'\n', which inserts a literal newline using ANSI C quoting, because escape sequence \n is not supported.
Note that this particular regex always matches, so the command after && is always executed.
Special array variable BASH_REMATCH contains the results of the most recent regex match, and array element [1] contains what the (first and only) parenthesized subexpression (capture group) captured, which is the input string with any leading newlines stripped. The net effect is that ${BASH_REMATCH[1]} contains the input file content with both leading and trailing newlines stripped.
Note that printing with echo adds a single trailing newline. If you want to avoid that, use echo -n instead (or use the more portable printf '%s').
I'd like to introduce another variant for gawk v4.1+
result=($(gawk '
    BEGIN {
        lines_count = 0;
        empty_lines_in_head = 0;
        empty_lines_in_tail = 0;
    }
    /[^[:space:]]/ {
        found_not_empty_line = 1;
        empty_lines_in_tail = 0;
    }
    /^[[:space:]]*?$/ {
        if ( found_not_empty_line ) {
            empty_lines_in_tail ++;
        } else {
            empty_lines_in_head ++;
        }
    }
    {
        lines_count ++;
    }
    END {
        print (empty_lines_in_head " " empty_lines_in_tail " " lines_count);
    }
' "$file"))
empty_lines_in_head=${result[0]}
empty_lines_in_tail=${result[1]}
lines_count=${result[2]}
if [ $empty_lines_in_head -gt 0 ] || [ $empty_lines_in_tail -gt 0 ]; then
    echo "Removing whitespace from \"$file\""
    eval "gawk -i inplace '
        {
            if ( NR > $empty_lines_in_head && NR <= $(($lines_count - $empty_lines_in_tail)) ) {
                print
            }
        }
    ' \"$file\""
fi
Because I was writing a bash script anyway containing some functions, I found it convenient to write those:
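function strip_leading_empty_lines() {
    while IFS= read -r line; do
        if [ -n "$line" ]; then
            echo "$line"; break
        fi
    done
    cat    # pass the rest of the input through unchanged
}
function strip_trailing_empty_lines() {
    local acc=""
    while IFS= read -r line; do
        acc+="$line"$'\n'
        if [ -n "$line" ]; then
            echo -n "$acc"; acc=""
        fi
    done
}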
I would like to sort the characters in a string.
echo cba | sort-command
Is there a command that will allow me to do this or will I have to write an awk script to iterate over the string and sort it?
echo cba | grep -o . | sort |tr -d "\n"
Please find the following useful methods:
Sort string based on its characters:
echo cba | grep -o . | sort | tr -d "\n"
String separated by spaces:
echo 'dd aa cc bb' | tr " " "\n" | sort | tr "\n" " "
print (join "", sort split //,$_)
ruby -e 'puts "dd aa cc bb".split(/\s+/).sort'
With bash you have to enumerate each character from a string, in general something like:
str="dd aa cc bb";
for (( i = 0; i < ${#str}; i++ )); do echo "${str:$i:1}"; done
For sorting array, please check: How to sort an array in bash?
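The two pipelines above can be wrapped in small functions for reuse. This is just a sketch; the names sort_chars and sort_words are made up, and it relies on grep -o, which GNU and BSD grep provide but POSIX does not require. The resulting order follows the current locale's collation.

```shell
# sort_chars STRING (hypothetical helper): print the characters of STRING in sorted order.
sort_chars() {
  printf '%s\n' "$1" | grep -o . | sort | tr -d '\n'
  echo    # finish the output with a newline
}

# sort_words STRING (hypothetical helper): print the space-separated words of STRING, sorted.
sort_words() {
  printf '%s\n' "$1" | tr ' ' '\n' | sort | paste -s -d ' ' -
}
```

For example, `sort_chars cba` prints `abc`, and `sort_words 'dd aa cc bb'` prints `aa bb cc dd` without the trailing space that the tr-only version leaves behind.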
This is cheating (because it uses Perl), but works. :-P
echo cba | perl -pe 'chomp; $_ = join "", sort split //'
Another perl one-liner
$ echo cba | perl -F -lane 'print sort @F'
$ # for reverse order
$ echo xyz | perl -F -lane 'print reverse sort @F'
$ # or
$ echo xyz | perl -F -lane 'print sort {$b cmp $a} @F'
This will add newline to output as well, courtesy -l option
See Command switches for doc on all the options
The input is basically split character-wise and saved in the @F array
Then the sorted @F is printed
This will also work line wise for given input file
$ cat ip.txt
$ perl -F -lane 'print sort @F' ip.txt
This would have been more appropriate as a comment to one of the grep -o . solutions (my reputation's not quite up to that low bar alas, damn my lurking), but I thought it worth mentioning that separating letters can be done more efficiently within the shell. It's always worth avoiding code, but this letsep function is pretty small:
letsep () {
    INWORD="$1"
    while [ "$INWORD" ]; do
        echo "${INWORD:0:1}"; INWORD="${INWORD:1}"
    done
}
. . . and outputs one letter per line for an input string of arbitrary length. For example, once letsep is defined, populating an array FLETRS with the letters of a string contained in variable FRED could be done (assuming contemporary bash) as:
readarray -t FLETRS < <(letsep $FRED)
. . . which for word-size strings runs about twice as fast as the equivalent :
readarray -t FLETRS < <(echo $FRED | grep -o .)
Whether this is worth setting up depends on the application. I've only measured this crudely, but the slower procedural code seems to maintain an advantage over the context switch up to ~60 chars (grep is obviously more efficient, but loading it is relatively expensive). If the above operation is taking place in one or more steps of a loop over an indeterminate number of executions, the difference in efficiency can add up (at which point some might argue for switching tools and rewriting regardless, but that's another set of tradeoffs).
This was an interview question, nevertheless still a programming question.
I have a unix file with two columns name and score. I need to display count of all the scores.
jhon 100
dan 200
rob 100
mike 100
the output should be
100 3
200 1
You only need to use built-in Unix utilities to solve it, so I am assuming shell scripts, regular expressions, or Unix commands.
I understand looping would be one way to do it: store all the values you have already seen and then grep every record for unseen values. Is there a more efficient way of doing it?
Try this:
cut -d ' ' -f 2 < /tmp/foo | sort -n | uniq -c \
| (while read n v ; do printf "%s %s\n" "$v" "$n" ; done)
The initial cut could be replaced with another while read loop, which would be more resilient to input file format variations (extra whitespace). If some of the names consist of several words, simple field extraction will not work as easily, but sed can do it.
Otherwise, use your favorite programming language. Perl would probably shine. It is not difficult either in Java or even in C or Forth.
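The while read variant mentioned above could look roughly like this. It is a sketch only: the function name count_scores is made up, and it still assumes single-word names (with multi-word names, everything after the first word ends up in the score variable).

```shell
# count_scores FILE (hypothetical helper): print "score count" for each distinct score.
count_scores() {
  # read splits on any run of blanks, so extra whitespace is harmless
  while read -r name score; do
    [ -n "$score" ] && printf '%s\n' "$score"
  done < "$1" |
    sort -n | uniq -c |
    while read -r n v; do printf '%s %s\n' "$v" "$n"; done
}
```

With the sample file from the question, `count_scores /tmp/foo` prints `100 3` and `200 1`.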
$ cat foo.txt
jhon 100
dan 200
rob 100
mike 100
$ awk '{print $2}' foo.txt | sort | uniq -c
3 100
1 200
It's a pity you can't do a count with sort or uniq alone.
Edit: I just noticed I have the count in front ... to get it exactly the same you can do:
$ awk '{print $2}' foo.txt | sort | uniq -c | awk '{ print $2 " " $1 }'
Not very complicated in perl:
#!/usr/bin/perl -w
use strict;
use warnings;
my %count = ();
while (<>) {
    chomp;
    my ($name, $score) = split(/ /);
    # tally one occurrence of this score
    $count{$score}++;
}
foreach my $key (sort keys %count) {
    print "$key ", $count{$key}, "\n";
}
You could go with awk:
awk '{ a[$2]++ } END { for (x in a) print x, a[x] }' record_file.txt
Alternatively with shell commands:
for i in `awk '{print $2}' inputfile | sort -u`
do
    echo -n "$i "
    grep -w "$i" inputfile | wc -l
done
The first awk command gives a list of all the different scores (e.g. 100 and 200), which the for loop then
iterates over, counting each one separately. Not very efficient, but simple. If the file is not too big, it should not be much of a problem.
You can do 'ls -l' to get a detailed directory listing like this:
-rw-rw-rw- 1 alice themonkeys 1159995999 2008-08-20 07:01 foo.log
-rw-rw-rw- 1 bob bob 244251992 2008-08-20 05:30 bar.txt
But notice how you have to slide your finger along the screen to figure out the order of magnitude of those file sizes.
What's a good way to add commas to the file sizes in the directory listing, like this:
-rw-rw-rw- 1 alice themonkeys 1,159,995,999 2008-08-20 07:01 foo.log
-rw-rw-rw- 1 bob bob 244,251,992 2008-08-20 05:30 bar.txt
I just discovered that it's built-in to GNU Core Utils and works for ls and du!
ls -l --block-size="'1"
du --block-size="'1"
It works on Ubuntu but sadly doesn't on OSX. More on variants of block size here
If the order of magnitude is all you're interested in, ls -lh does something like this:
-rw-r----- 1 alice themonkeys 626M 2007-02-05 01:15 foo.log
-rw-rw-r-- 1 bob bob 699M 2007-03-12 23:14 bar.txt
I don't think 'ls' has exactly that capability. If you are looking for readability, 'ls -lh' will give you file sizes that are easier for humans to parse.
-rw-rw-rw- 1 alice themonkeys 1.2G 2008-08-20 07:01 foo.log
-rw-rw-rw- 1 bob bob 244M 2008-08-20 05:30 bar.txt
Here's an improvement to, it allows you to use ls with or without listing the file sizes. Alias ls to to use it.
# Does ls and adds commas to numbers if ls -l is used.
$largest_number_of_commas = 0;
$result = `ls -C @ARGV`;
# First line adds five spaces before file size
$result =~ s/(^[-lrwxds]{10,}\s*[^\s]+\s*[^\s]+\s*[^\s]+)/$1     /gm;
$result =~ s/(.{5} )(\d{4,}) /truncatePre($1,$2).commafy($2).' '/eg;
$remove_extra_spaces = 5 - $largest_number_of_commas;
$result =~ s/(^[-lrwxds]{10,}\s*[^\s]+\s*[^\s]+\s*[^\s]+) {$remove_extra_spaces}/$1/gm;
print $result;
# adds commas to an integer as appropriate
sub commafy
{
    my($num) = @_;
    my $len = length($num);
    if ($len <= 3) { return $num; }
    return commafy(substr($num, 0, $len - 3)) . ',' . substr($num, -3);
}
# removes as many chars from the end of str as there are commas to be added
# to num
sub truncatePre
{
    my($str, $num) = @_;
    $numCommas = int((length($num)-1) / 3);
    if ($numCommas > $largest_number_of_commas) {$largest_number_of_commas = $numCommas}
    return substr($str, 0, length($str) - $numCommas);
}
This common sed script should work:
ls -l | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
However, I agree with the earlier comment suggesting ls -lh is probably the better general solution for the desired effect.
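One thing to watch with the sed script: it adds commas to every run of four or more digits on the line, so a year such as 2008 in an ISO-style date also becomes 2,008. A sketch that restricts the change to the size column (assumed here to be field 5, as in the listings above) could look like this; the function name commafy_field5 is made up, and rewriting the field makes awk rejoin the line with single spaces, so column alignment is lost.

```shell
# commafy_field5 (hypothetical helper): add thousands separators to field 5 only.
commafy_field5() {
  awk '
    # Insert a comma before every group of three digits, working from the right.
    function commafy(n,   out) {
        out = ""
        while (length(n) > 3) {
            out = "," substr(n, length(n) - 2) out
            n = substr(n, 1, length(n) - 3)
        }
        return n out
    }
    $5 ~ /^[0-9]+$/ { $5 = commafy($5) }   # touch the size column and nothing else
    { print }
  '
}
```

Usage would be `ls -l | commafy_field5`; lines without a numeric fifth field, such as the "total" line, pass through unchanged.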
This is on OS X, so you might have to tweak it a bit for your Unix flavor. I created such a function for this purpose in my ~/.bashrc dot file. The trick is using ' in the awk printf format string for the file size. A caveat: the awk mangles the "total" first line somewhat, and loses terminal coloration as well. Otherwise, one of its merits is that it tries to keep columns aligned as much as possible. To me this instantly gives a visual estimation of how big a file is. The -h switch solution is okay, but your brain needs to convert those Ks, Bs, Gs. The biggest advantage to the solution below is that you can pipe it to sort and sort would understand it. As in "lc | sort -k5,5nr" for example.
lc() {
    /bin/ls -l -GPT | /usr/bin/awk "{
        printf \"%-11s \", \$1;
        printf \"%3s \", \$2;
        printf \"%-6s \", \$3;
        printf \"%-6s \", \$4;
        printf \"%'12d \", \$5;
        printf \"%3s \", \$6;
        printf \"%2s \", \$7;
        for (i=8; i<=NF; i++) {
            printf \"%s \", \$i
        }
        printf \"\n\";
    }"
}
Here's a perl script that will filter the output of 'ls -l' to add the commas.
If you call the script then you can alias 'ls' to 'ls -l |'.
#!/usr/bin/perl -p
# pipe the output of ls -l through this to add commas to numbers.
s/(.{5} )(\d{4,}) /truncatePre($1,$2).commafy($2).' '/e;
# adds commas to an integer as appropriate
sub commafy
{
    my($num) = @_;
    my $len = length($num);
    if ($len <= 3) { return $num; }
    return commafy(substr($num, 0, $len - 3)) . ',' . substr($num, -3);
}
# removes as many chars from the end of str as there are commas to be added
# to num
sub truncatePre
{
    my($str, $num) = @_;
    $numCommas = int((length($num)-1) / 3);
    return substr($str, 0, length($str) - $numCommas);
}
Actually, I was looking for a test for a young trainee and this seemed ideal. Here's what he came up with:
for i in $(ls -1)
do
    sz=$(expr $(ls -ld $i | awk '{print $5}' | wc -c) - 1)
    printf "%10d %s\n" $sz $i
done
It gives the order of magnitude for the size in a horribly inefficient way. I'll make this community wiki since we're both interested how you rate his code, but I don't want my rep suffering.
Feel free to leave comments (be gentle, he's a newbie, though you wouldn't guess it by his shell scripting :-).
I wrote this several years ago, works on stdin;
Read stdin & insert commas in numbers for readability emit to stdout.
$ ls -l testdatafile.1000M
-rw-rw-r--+ 1 mkm wheel 1048576000 Apr 24 12:45 testdatafile.1000M
$ ls -l testdatafile.1000M | commas
-rw-rw-r--+ 1 mkm wheel 1,048,576,000 Apr 24 12:45 testdatafile.1000M
export LS_BLOCK_SIZE="'1"
will do this for recent versions of GNU ls.