Find most occurring words in text file - unix

I have a log file which logs main-category and sub-category names that failed with an error message. My goal is to find the most frequently occurring categories.
E.g. log:
Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'
Mon, 26 Nov 2018 07:51:08 +0100 | 278: [ERROR ***] Category ID not found for 'mcat-name2' 'subcat-name2' ref: '020'
Now I want to identify the top 10 categories that failed.
Using sed:
sed -e 's/\s/\n/g' < file.log | grep ERROR | sort | uniq -c | sort -nr | head -10
I am getting 1636 [ERROR
while I was looking for a list of categories sorted by number of occurrences, e.g.
139 category1
23 category2
...

You say you want to do the counting with sed, but in fact you have an entire pipeline of sed, grep, sort, uniq and head. Generally, when this happens, your problem is screaming for awk:
awk 'BEGIN{FS="\047"; PROCINFO["sorted_in"]="@val_num_desc"}
/\[ERROR /{c[$2]++}
END{for(i in c) { print c[i],i; if(++j == 10) exit } }' file
The above is a GNU awk solution, as it makes use of non-POSIX features such as sorted array traversal (PROCINFO["sorted_in"]). The field separator is set to the single quote ('), written with its octal value \047, since the category name is assumed to be between single quotes.
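To see why c[$2] picks up the main category, here is how the \047 separator splits one of the sample lines (a quick sketch, using echo for illustration):

$ echo "Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'" | awk -F'\047' '{ print $2 " / " $4 }'
mcat-name1 / subcat-name1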
If you are not using GNU awk, you could use sort and head or do the sorting yourself. One way is:
awk 'BEGIN{FS="\047"; n=10 }
/\[ERROR /{ c[$2]++ }
END {
    # keep the top n entries in s[] (names) and v[] (counts), insertion-sort style
    for (l in c) {
        for (i=1; i<=n; ++i) {
            if (c[l] > v[i]) {
                for (j=n; j>i; --j) { s[j]=s[j-1]; v[j]=v[j-1] }
                s[i]=l; v[i]=c[l]
                break
            }
        }
    }
    for (i=1; i<=n; ++i) {
        if (s[i]=="") break
        print v[i], s[i]
    }
}' file
or just do:
awk 'BEGIN{FS="\047"}
/\[ERROR /{c[$2]++}
END{for(i in c) print c[i],i}' file \
| sort -nr | head -10

You got 1636 [ERROR because you turn every whitespace character into a newline, then grep for the word ERROR, then count.
This:
sed -e 's/\s/\n/g' < file.log | grep ERROR
gives you this:
[ERROR
[ERROR
[ERROR
[ERROR
[ERROR
[ERROR
... (1630 more)
You need to grep first, then sed (pretty sure you can do better with sed, but I'm just talking about the logic behind the commands):
grep ERROR file.log | sed -e 's/\s/\n/g' | sort | uniq -c | sort -nr | head -10
This may not be the best solution, as it also counts the word ERROR and other useless words, but you didn't give us a lot of information about the input file.

Assuming 'mcat-name1' is an example of a category you want to extract, try
sed -n "s/.*ERROR.*not found for '\([^']*\)'.*/\1/p" file.log |
sort | uniq -c | sort -rn | head -n 10
The sed command finds lines which match a fairly complex regular expression and captures part of the line, then replaces the match with the captured substring, and prints it (the -n option disables the default print action, so we only print the extracted lines). The rest is basically identical to what you already had.
In the regex, we look for (the beginning of the line followed by) anything (except a newline) followed by ERROR, and later on followed by not found for ' and then a string which doesn't contain a single quote, then the closing single quote followed by anything. The runs of "anything (except newline)" are required in order to replace the entire line with just the captured string from inside the single quotes. The backslashed parentheses are what capture an expression; google for "backreference" for the full scoop.
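On the two sample lines above, the extraction step alone (before counting) would print:

$ sed -n "s/.*ERROR.*not found for '\([^']*\)'.*/\1/p" file.log
mcat-name1
mcat-name2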
Your original attempt would only extract the actual ERROR strings, because you replaced all the surrounding spaces with newlines (assuming vaguely that your sed accepts the Perl \s shorthand, which isn't standard in sed, and that \n gets interpreted as a literal newline in the replacement, which also isn't entirely standard or portable).

The way to go is to select the lines with failed categories and replace the whole line with only the category name, using sed.
Give a try to this:
sed -e "s/^.* [[]ERROR .*[]] Category '\([^']*\)' .*$/\1/g" file.log | sort | uniq -c | sort -nr | head -16
^ is the start of the line
\( ... \) : the char sequence enclosed in this escaped parenthesis can be referred with \1 for the first pair appearing in the regex, \2 for the second pair etc.
$ is the end of the line.
The sed selects lines which contain [ERROR and some characters up to a ], followed later by the words not found for; after the (space) char, the sequence of characters between the pair of single quotes is captured with the escaped parentheses, followed by any sequence of characters up to the end of the line. If such a line is found, it is replaced with the captured category name; the -n option together with the p flag prints only the lines where the substitution succeeded.

Using Perl
> cat merlin.txt
Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'
Mon, 26 Nov 2018 07:51:08 +0100 | 278: [ERROR ***] Category ID not found for 'mcat-name2' 'subcat-name2' ref: '020'
Mon, 26 Nov 2018 07:51:21 +0100 | 1232: [ERROR ***] Category ID not found for 'make' 'model' ref: '228239'
> perl -ne ' { s/(.*)Category.*for(.+)ref.*/$2/g and s/(\047\S+\047)/$kv{$1}++/ge if /ERROR/} END { foreach (sort keys %kv) { print "$_ $kv{$_}\n" } } ' merlin.txt | sort -nr
'subcat-name2' 1
'subcat-name1' 1
'model' 1
'mcat-name2' 1
'mcat-name1' 1
'make' 1
>

Related

warning: here-document at line 4 delimited by end-of-file (wanted `limit')

I did try, but I was not able to rectify it.
opal#opal-Inspiron-15-3567:~/PRABHAT/unix$ bash valcode.sh
valcode.sh: line 5: unexpected EOF while looking for matching ``'
valcode.sh: line 19: syntax error: unexpected end of file
IFS="|"
while echo "Enter deparment code:" ; do
read dcode
set -- `grep "^$dcode" <<-limit
01|accounts|6123
02 | admin | 5423
03 | marketing |6521
04 | personnel |2365
05 | production | 9876
06 | sales | 1006
limit'
case $# in
3) echo "deparment name : $2\nEmp-id of head of dept :$3\n"
shift 3 ;;
*) echo "Invalid code" ; continue
esac
done
The output is not coming out as desired.
On line 4 you write `grep but the backtick ` is unmatched. Backticks always come in pairs, so the interpreter keeps looking for the match. Eventually it reaches the end of the file without finding it and gives up.
Adding the matching backtick (in place of the stray single quote after the closing limit delimiter) will solve this problem.
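For reference, a corrected sketch; it swaps the backticks for $(...), whose delimiters pair up unambiguously, and uses printf because a bare echo does not interpret \n (the department table is the one from the question, with the spacing normalized):

IFS="|"
while echo "Enter department code:" ; do
    read dcode
    # $(...) nests and pairs more visibly than backticks
    set -- $(grep "^$dcode" <<limit
01|accounts|6123
02|admin|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
limit
)
    case $# in
    3) # printf interprets \n, unlike a bare echo
       printf "department name : %s\nEmp-id of head of dept : %s\n" "$2" "$3"
       shift 3 ;;
    *) echo "Invalid code" ; continue ;;
    esac
done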

How to append suffix to all string matched regular expression in unix

I need to replace all occurrences of strings in a specific format (in my case a colon followed by some number) with the same string plus a suffix, in a file, like this:
:123456 -> :123456_suffix
Is there a way to do it with sed or other unix command-line tool?
Sed should do that:
sed -i~ -e 's/:\([0-9]\{1,\}\)/:\1_suffix/g' file
Reading the expression piece by piece:
\( starts the capture group
[0-9]\{1,\} matches any digit, one or more times (\{1,\} is the portable spelling of "one or more")
\) ends the capture group
\1 in the replacement stands for the first capture group's contents
g makes the substitution global, i.e. not just the first occurrence on a line
If -i is not supported, just create a new file and replace the old one:
sed ... > newfile
mv oldfile oldfile~ # a backup
mv newfile oldfile
Or use sed with the GNU \+ shorthand:
sed 's/\(:[0-9]\+\)/\1_suffix/g' file
Add the -i option if you want to do an in-place edit.
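A quick before/after check with GNU sed (toy input):

$ printf 'a:123 b:456\nno digits here\n' | sed 's/\(:[0-9]\+\)/\1_suffix/g'
a:123_suffix b:456_suffix
no digits here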

Counting the number of 7 character words in a file that start with tree and do not end in u or v

I'm trying to count the number of 7-character words in a file that start with tree and do not end in u or v. I know how to specify the starts-with-tree and doesn't-end-in-u-or-v conditions with grep, but I'm not sure how to require exactly 7 characters or how to enter the conditions using wc. My pathname is /users/file1.txt.
This is the valid grep command (missing the 7-character condition):
cat /users/file1.txt | grep ^tree.*[!uv]
Below is the invalid wc command (missing the 7-character condition):
wc - w /users/file1.txt | grep ^tree.*[!uv]
Do you like perl? Here's a one-liner:
cat /users/file1.txt | perl -lne 'if (/^(tree)(.{4}$)(?<![uv])/) { print $_ }'
sed -e 's/%//g' -e 's/\btree..[^uv]\b/%/g' -e 's/[^%]//g' -e 's/%/word /g' /users/file1.txt | wc -w
Don't let anyone steal our token.
Give us a token for what we want to count; match word boundaries to count to 7, negate match character in (u,v).
Get rid of everything else.
Turn our token into a friendly word plus a space.
Count 'em.
Reut's answer is very close, but this will get you where you need:
cat /users/file1.txt | grep -wo 'tree..[^uv]' | wc -l
-w will get exact word matches
Note that I ditched the .* and specified .. instead, so the total number of characters matched is exactly 7.
I also dropped the ^ anchor so that words that aren't at the beginning of the line also match.
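To see what the grep stage matches before wc -l counts it, a toy run (hypothetical words, just to show what survives the filter):

$ printf 'treeABC treeXYu\nsubtree treedom\n' | grep -wo 'tree..[^uv]'
treeABC
treedom

treeXYu is rejected because it ends in u, and subtree contains tree but has no three characters after it.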
Using grep and wc:
cat /users/file1.txt | grep ^tree.*[^u^v] | grep -o '[^\ ]\{7\}' | wc -w
Pipe walkthrough:
Echo content of the source file:
cat /users/file1.txt
Pass only lines starting with "tree" and not ending with either "u" or "v":
grep ^tree.*[^u^v]
Forward any word that is composed of 7 non-spaces (if you want only letters use [a-zA-Z] instead of [^\ ]):
grep -o '[^\ ]\{7\}'
Count the words that made it here:
wc -w
Here is one other way using pretty basic bash:
count=0
for word in $(cat f.py)
do
    if [ 7 -eq ${#word} ]
    then
        count=$((count+1))
    fi
done
echo $count
Or in a single line:
count=0; for word in $(cat f.py); do if [ 7 -eq ${#word} ]; then count=$((count+1)); fi; done; echo $count
You may want to remove dots and commas from word.
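A minimal sketch of that clean-up, assuming dots and commas are the only punctuation to worry about:

count=0
for word in $(cat f.py)
do
    word=${word//[.,]/}        # drop dots and commas before measuring length
    if [ 7 -eq ${#word} ]
    then
        count=$((count+1))
    fi
done
echo $count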

Is there a way to ignore header lines in a UNIX sort?

I have a fixed-width-field file which I'm trying to sort using the UNIX (Cygwin, in my case) sort utility.
The problem is there is a two-line header at the top of the file which is being sorted to the bottom of the file (as each header line begins with a colon).
Is there a way to tell sort either "pass the first two lines across unsorted" or to specify an ordering which sorts the colon lines to the top? The remaining lines always start with a 6-digit numeric (which is actually the key I'm sorting on), if that helps.
Example:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
500123TSTMY_RADAR00
222334NOTALINEOUT01
477821USASHUTTLES21
325611LVEANOTHERS00
should sort to:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00
(head -n 2 <file> && tail -n +3 <file> | sort) > newfile
The parentheses create a subshell, wrapping up the stdout so you can pipe it or redirect it as if it had come from a single command.
If you don't mind using awk, you can take advantage of awk's built-in pipe abilities, e.g.:
extract_data | awk 'NR<3{print $0;next}{print $0| "sort -r"}'
This prints the first two lines verbatim and pipes the rest through sort.
Note that this has the very specific advantage of being able to selectively sort parts of a piped input; all the other methods suggested will only sort plain files, which can be read multiple times. This works on anything.
In simple cases, sed can do the job elegantly:
your_script | (sed -u 1q; sort)
or equivalently,
cat your_data | (sed -u 1q; sort)
The key is in the 1q -- print first line (header) and quit (leaving the rest of the input to sort).
For the example given, 2q will do the trick.
The -u switch (unbuffered) is required for those seds (notably, GNU's) that would otherwise read the input in chunks, thereby consuming data that you want to go through sort instead.
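Applied to the example file above (using 2q for the two-line header):

$ (sed -u 2q; sort) < file
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00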
Here is a version that works on piped data:
(read -r; printf "%s\n" "$REPLY"; sort)
If your header has multiple lines:
(for i in $(seq $HEADER_ROWS); do read -r; printf "%s\n" "$REPLY"; done; sort)
You can use tail -n +3 <file> | sort ... (tail will output the file contents from the 3rd line).
head -2 <your_file> && nawk 'NR>2' <your_file> | sort
example:
> cat temp
10
8
1
2
3
4
5
> head -2 temp && nawk 'NR>2' temp | sort -r
10
8
5
4
3
2
1
It only takes 2 lines of code...
head -1 test.txt > a.tmp;
tail -n+2 test.txt | sort -n >> a.tmp;
For numeric data, -n is required; for an alphabetic sort, it is not.
Example file:
$ cat test.txt
header
8
5
100
1
-1
Result:
$ cat a.tmp
header
-1
1
5
8
100
So here's a bash function whose arguments are exactly like sort's, supporting both files and pipes.
function skip_header_sort() {
    if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
        local file=${@: -1}
        set -- "${@:1:$(($#-1))}"
    fi
    awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
}
How it works: this line checks if there is at least one argument and if the last argument is a file.
if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
This saves the file in a separate variable, since we're about to erase the last argument.
local file=${@: -1}
Here we remove the last argument, since we don't want to pass it as a sort argument.
set -- "${@:1:$(($#-1))}"
Finally, we do the awk part, passing the arguments (minus the last argument if it was the file) to sort inside awk. This was originally suggested by Dave, and modified to take sort arguments. We rely on the fact that $file will be empty if we're piping, and thus ignored.
awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
Example usage with a comma separated file.
$ cat /tmp/test
A,B,C
0,1,2
1,2,0
2,0,1
# SORT NUMERICALLY SECOND COLUMN
$ skip_header_sort -t, -nk2 /tmp/test
A,B,C
2,0,1
0,1,2
1,2,0
# SORT REVERSE NUMERICALLY THIRD COLUMN
$ cat /tmp/test | skip_header_sort -t, -nrk3
A,B,C
0,1,2
2,0,1
1,2,0
Here's a bash shell function derived from the other answers. It handles both files and pipes. The first argument is the file name or '-' for stdin. Remaining arguments are passed to sort. A couple of examples:
$ hsort myfile.txt
$ head -n 100 myfile.txt | hsort -
$ hsort myfile.txt -k 2,2 | head -n 20 | hsort - -r
The shell function:
hsort ()
{
    if [ "$1" == "-h" ]; then
        echo "Sort a file or standard input, treating the first line as a header."
        echo "The first argument is the file or '-' for standard input. Additional"
        echo "arguments to sort follow the first argument, including other files."
        echo "File syntax : $ hsort file [sort-options] [file...]"
        echo "STDIN syntax: $ hsort - [sort-options] [file...]"
        return 0
    elif [ -f "$1" ]; then
        local file=$1
        shift
        (head -n 1 $file && tail -n +2 $file | sort $*)
    elif [ "$1" == "-" ]; then
        shift
        (read -r; printf "%s\n" "$REPLY"; sort $*)
    else
        >&2 echo "Error. File not found: $1"
        >&2 echo "Use either 'hsort <file> [sort-options]' or 'hsort - [sort-options]'"
        return 1
    fi
}
This is the same as Ian Sherbin's answer, but my implementation is:
cut -d'|' -f3,4,7 $arg1 | uniq > filetmp.tc
head -1 filetmp.tc > file.tc;
tail -n+2 filetmp.tc | sort -t"|" -k2,2 >> file.tc;
Another simple variation on all the others, reading the file only once (this relies on head leaving the input offset just past the header, which works when the input is a seekable regular file):
HEADER_LINES=2
(head -n $HEADER_LINES; sort) < data-file.dat
With Python:
import sys

HEADER_ROWS = 2
for _ in range(HEADER_ROWS):
    sys.stdout.write(next(sys.stdin))
for row in sorted(sys.stdin):
    sys.stdout.write(row)
cat file_name.txt | sed 1d | sort
This deletes the first (header) line and sorts the rest; note that the header is discarded rather than kept at the top.

Removing trailing / starting newlines with sed, awk, tr, and friends

I would like to remove all of the empty lines from a file, but only when they are at the end/start of a file (that is, if there are no non-empty lines before them, at the start; and if there are no non-empty lines after them, at the end.)
Is this possible outside of a fully-featured scripting language like Perl or Ruby? I’d prefer to do this with sed or awk if possible. Basically, any light-weight and widely available UNIX-y tool would be fine, especially one I can learn more about quickly (Perl, thus, not included.)
From Useful one-line scripts for sed:
# Delete all leading blank lines at top of file (only).
sed '/./,$!d' file
# Delete all trailing blank lines at end of file (only).
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file
Therefore, to remove both leading and trailing blank lines from a file, you can combine the above commands into:
sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' file
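For example:

$ printf '\n\nfoo\n\nbar\n\n\n' | sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba'
foo

bar

Note that the interior blank line between foo and bar is preserved; only the leading and trailing ones are deleted.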
So I'm going to borrow part of @dogbane's answer for this, since that sed line for removing the leading blank lines is so short...
tac is part of coreutils, and reverses a file. So do it twice:
tac file | sed -e '/./,$!d' | tac | sed -e '/./,$!d'
It's certainly not the most efficient, but unless you need efficiency, I find it more readable than everything else so far.
Here's a one-pass solution in awk: it does not start printing until it sees a non-empty line, and when it sees an empty line, it remembers it until the next non-empty line.
awk '
    /[[:graph:]]/ {
        # a non-empty line
        # set the flag to begin printing lines
        p=1
        # print the accumulated "interior" empty lines
        for (i=1; i<=n; i++) print ""
        n=0
        # then print this line
        print
    }
    p && /^[[:space:]]*$/ {
        # a potentially "interior" empty line. remember it.
        n++
    }
' filename
Note, due to the mechanism I'm using to consider empty/non-empty lines (with [[:graph:]] and /^[[:space:]]*$/), interior lines with only whitespace will be truncated to become truly empty.
As mentioned in another answer, tac is part of coreutils, and reverses a file. Combining the idea of doing it twice with the fact that command substitution will strip trailing new lines, we get
echo "$(echo "$(tac "$filename")" | tac)"
which doesn't depend on sed. You can use echo -n to strip the remaining trailing newline off.
Here's an adapted sed version, which also considers "empty" those lines with just spaces and tabs on it.
sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
It's basically the accepted answer version (considering BryanH's comment), but the dot . in the first command was changed to [^[:blank:]] (anything not blank) and the \n inside the second command's address was changed to [[:space:]] to allow newlines, spaces and tabs.
An alternative version, without using the POSIX classes, but your sed must support inserting \t and \n inside […]. GNU sed does, BSD sed doesn't.
sed -e :a -e '/[^\t ]/,$!d; /^[\n\t ]*$/{ $d; N; ba' -e '}'
Testing:
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n'



foo

foo


prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -n l
$
\t $
$
foo$
$
foo$
$
\t $
$
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
foo

foo
prompt$
Using awk:
awk '{ a[NR]=$0; if ($0 && !s) s=NR }
END {
    e=NR
    for (i=NR; i>1; i--)
        if (a[i]) { e=i; break }
    for (i=s; i<=e; i++)
        print a[i]
}' yourFile
This can be solved easily with sed's -z option (GNU sed), which splits the input on NUL bytes instead of newlines, so the whole file is processed as a single string:
sed -rz 's/^\n+//; s/\n+$/\n/g' file
Hello
Welcome to
Unix and Linux
For an efficient non-recursive version of the trailing-newline strip (also treating whitespace-only lines as blank), I've developed this sed script.
sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'
It uses the hold buffer to store all blank lines and prints them only after it finds a non-blank line. Should someone want only the newlines, it's enough to get rid of the two [[:space:]]* parts:
sed -n '/^$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^$/H'
I've tried a simple performance comparison with the well-known recursive script
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba'
on a 3MB file with 1MB of random blank lines around a random base64 text.
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M > bigfile
base64 </dev/urandom | dd bs=1 count=1M >> bigfile
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M >> bigfile
The streaming script took roughly 0.5 seconds to complete; the recursive one hadn't finished after 15 minutes. Win :)
For completeness' sake, the leading-lines-stripping sed script already streams fine. Use whichever is most suitable for you.
sed '/[^[:blank:]]/,$!d'
sed '/./,$!d'
Using bash
$ filecontent=$(<file)
$ echo "${filecontent/$'\n'}"
In bash, using cat, wc, grep, sed, tail and head:
# number of the first line that contains a non-empty character
i=`grep -n "^[^\B*]" <your_file> | sed -e 's/:.*//' | head -1`
# number of the last one
j=`grep -n "^[^\B*]" <your_file> | sed -e 's/:.*//' | tail -1`
# overall number of lines:
k=`cat <your_file> | wc -l`
# how many empty lines at the end of the file do we have?
m=$(($k-$j))
# let's strip the last m lines!
cat <your_file> | head -n-$m
# now we have to strip the first i lines and we are done 8-)
cat <your_file> | tail -n+$i
Man, it's definitely worth learning a "real" programming language to avoid that ugliness!
@dogbane has a nice simple answer for removing leading empty lines. Here's a simple awk command which removes just the trailing lines. Use this with @dogbane's sed command to remove both leading and trailing blanks.
awk '{ LINES=LINES $0 "\n"; } /./ { printf "%s", LINES; LINES=""; }'
This is pretty simple in operation.
Add every line to a buffer as we read it.
For every line which contains a character, print the contents of the buffer and then clear it.
So the only things that get buffered and never displayed are any trailing blanks.
I used printf instead of print to avoid the automatic addition of a newline, since I'm using newlines to separate the lines in the buffer already.
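A quick check of the buffering behaviour (toy input; the trailing blanks disappear, the interior one survives):

$ printf 'foo\n\nbar\n\n\n' | awk '{ LINES=LINES $0 "\n"; } /./ { printf "%s", LINES; LINES=""; }'
foo

bar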
This AWK script will do the trick:
BEGIN {
    ne=0
}
/^[[:space:]]*$/ {
    ne++
}
/[^[:space:]]+/ {
    # replay withheld (interior) blank lines, but drop the leading ones
    if (seen)
        for (i=0; i < ne; i++)
            print ""
    seen=1
    ne=0
    print
}
The idea is simple: empty lines are not echoed immediately. Instead, we wait until we get a non-empty line, first echo out as many empty lines as were seen before it, and only then echo the new non-empty line. The seen flag ensures that the blank lines before the very first non-empty line are dropped instead of replayed.
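Saved as strip.awk (a name chosen here just for the example), a quick test:

$ printf '\n\nfoo\n\nbar\n\n\n' | awk -f strip.awk
foo

bar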
perl -0pe 's/^\n+|\n+(\n)$/$1/gs'
Here's an awk version that removes trailing blank lines (both empty lines and lines consisting of nothing but white space).
It is memory efficient; it does not read the entire file into memory.
awk '/^[[:space:]]*$/ {b=b $0 "\n"; next;} {printf "%s",b; b=""; print;}'
The b variable buffers up the blank lines; they get printed when a non-blank line is encountered. When EOF is encountered, they don't get printed. That's how it works.
If using GNU awk, [[:space:]] can be replaced with \s. (See the full list of gawk-specific regexp operators.)
If you want to remove only those trailing lines that are empty, see @AndyMortimer's answer.
A bash solution.
Note: Only useful if the file is small enough to be read into memory at once.
[[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"
$(<file) reads the entire file and trims trailing newlines, because command substitution ($(....)) implicitly does that.
=~ is bash's regular-expression matching operator, and =~ ^$'\n'*(.*)$ optionally matches any leading newlines (greedily), and captures whatever comes after. Note the potentially confusing $'\n', which inserts a literal newline using ANSI C quoting, because escape sequence \n is not supported.
Note that this particular regex always matches, so the command after && is always executed.
The special array variable BASH_REMATCH contains the results of the most recent regex match, and array element [1] contains what the (first and only) parenthesized subexpression (capture group) captured, which is the input string with any leading newlines stripped. The net effect is that ${BASH_REMATCH[1]} contains the input file content with both leading and trailing newlines stripped.
Note that printing with echo adds a single trailing newline. If you want to avoid that, use echo -n instead (or use the more portable printf '%s').
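A quick demonstration (toy file):

$ printf '\n\nfoo\nbar\n\n\n' > file
$ [[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"
foo
bar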
I'd like to introduce another variant for gawk v4.1+
result=($(gawk '
    BEGIN {
        lines_count = 0;
        empty_lines_in_head = 0;
        empty_lines_in_tail = 0;
    }
    /[^[:space:]]/ {
        found_not_empty_line = 1;
        empty_lines_in_tail = 0;
    }
    /^[[:space:]]*$/ {
        if ( found_not_empty_line ) {
            empty_lines_in_tail ++;
        } else {
            empty_lines_in_head ++;
        }
    }
    {
        lines_count ++;
    }
    END {
        print (empty_lines_in_head " " empty_lines_in_tail " " lines_count);
    }
' "$file"))
empty_lines_in_head=${result[0]}
empty_lines_in_tail=${result[1]}
lines_count=${result[2]}

if [ $empty_lines_in_head -gt 0 ] || [ $empty_lines_in_tail -gt 0 ]; then
    echo "Removing whitespace from \"$file\""
    eval "gawk -i inplace '
        {
            if ( NR > $empty_lines_in_head && NR <= $(($lines_count - $empty_lines_in_tail)) ) {
                print
            }
        }
    ' \"$file\""
fi
Because I was writing a bash script anyway containing some functions, I found it convenient to write those:
function strip_leading_empty_lines()
{
    while read line; do
        if [ -n "$line" ]; then
            echo "$line"
            break
        fi
    done
    cat
}

function strip_trailing_empty_lines()
{
    acc=""
    while read line; do
        acc+="$line"$'\n'
        if [ -n "$line" ]; then
            echo -n "$acc"
            acc=""
        fi
    done
}
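They can then be chained in a pipeline; some_command here stands for any producer of text:

some_command | strip_leading_empty_lines | strip_trailing_empty_lines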
