Count and display size of incoming stdin (count lines) - unix

Is there a program that produces output like "wc -l" but that updates the counter as more data arrives?
Here is what I want it for:
tail -f log/production.log | grep POST | wc -l
But wc -l would have to be replaced with something that keeps counting as new lines come in.

tail -f log/production.log | grep --line-buffered POST | awk '{printf "\r%d", ++i} END {print ""}'
This prints the line count after every line of input. The carriage return \r makes each line number overwrite the last, so you only see the most recent one.
Use grep --line-buffered to make grep flush its output after each line rather than every 4KB. Or you can combine the grep and awk into one:
tail -f log/production.log | awk '/POST/ {printf "\r%d", ++i} END {print ""}'
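If the counter does not update promptly, the likely culprit is output buffering in awk itself, since the count is printed without a trailing newline. A sketch of the same pipeline with an explicit fflush() call (provided by gawk, mawk and BWK awk) to force the count out after every matching line:
tail -f log/production.log | grep --line-buffered POST | awk '{printf "\r%d", ++i; fflush()} END {print ""}'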

Related

Get the latest modified file and count lines from modified file

Trying to write a simple script to find the latest modified file from a directory and then count the lines of that modified file. Below is part of my script.
Note: the $stg variable is created for another directory
echo "LATEST LOG = $(ls -ltr $stg/abc/foo***.txt | awk '{print $5, $6, $7, $8, $9}' | tail -n1)"
echo "COUNT = $(wc -l $stg/abc/foo***.txt | tail -n1)"
The problem is that the COUNT does not match the LATEST LOG: wc -l appears to be counting a different log file.
Any suggestions? Thank you!
Suggestion: store the result of the latest log in a variable, and reuse it in the count. Like this:
#!/bin/bash
latestlogline=$(ls -ltr foo*.txt | awk '{print $5, $6, $7, $8, $9}' | tail -n1)
latestlogfilename=$(echo "$latestlogline" | awk 'NF>1{print $NF}')
echo "LATEST LOG = $latestlogline"
echo "COUNT = $(wc -l "$latestlogfilename")"
Details:
latestlogline: your code exactly, to extract the complete line of information
latestlogfilename: just the filename. wc -l expects a filename, so extract it from your first command.
Then just echo the variable values.
As commented before, *** is exactly the same thing as *.
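If all you need is "newest foo*.txt file and its line count", parsing the full ls -l output can be avoided. A minimal sketch, reusing the $stg path from the question and assuming the file names contain no newlines:
#!/bin/bash
latest=$(ls -tr "$stg"/abc/foo*.txt | tail -n1)   # -t sorts by mtime, -r puts the newest last
echo "LATEST LOG = $latest"
echo "COUNT = $(wc -l < "$latest")"
Reading the file via redirection (wc -l < file) prints just the number, without repeating the file name.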

How do I use a filter that takes its input from standard input, instead of from the default log file /var/log/messages?

How would I start off using getopts?
The output should be similar to this, but I need the filter to take its input from standard input:
awk < /var/log/messages '{ print $2, $1, $5}' | uniq -c | awk '{ print $2, $3, $1, $4 }' | cut -d':' -f1
If you write a shell script, then the normal answer is to use "$@", which is somewhat magical:
#!/bin/bash
awk '{ print $2, $1, $5}' "$@" |
uniq -c |
awk '{ print $2, $3, $1, $4 }' |
cut -d':' -f1
The "$#" represents 'all the command line arguments', or nothing if there are no command line arguments. Given that awk is a command that reads the files named on its command line, or standard input if no files are specified, this will work correctly — it is an important part of the design of good, general purpose, Unix tools. If you want to process /var/log/messages in the absence of a command line argument, then you need to use the shell parameter expansion notation:
"${#:-/var/log/messages}"
If there are no arguments in the argument list, then it substitutes /var/log/messages as the file name.
You can find out more about "$#" in the Bash manual under Special Parameters. See also How to iterate over the arguments in a Bash script.
Note that uniq does not sort the data; it looks for lines that are adjacent and the same. You would usually insert a sort before uniq -c:
#!/bin/bash
awk '{ print $2, $1, $5}' "$@" |
sort |
uniq -c |
awk '{ print $2, $3, $1, $4 }' |
cut -d':' -f1
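As a usage sketch (summarize.sh is a made-up name for the script above): with the plain "$@" version the script reads standard input when no file is named, while the "${@:-/var/log/messages}" variant falls back to that file instead.
./summarize.sh /var/log/syslog                  # read the named file
./summarize.sh syslog.1 syslog.2                # read both files in order
grep sshd /var/log/auth.log | ./summarize.sh    # no arguments: filter standard input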

AWK : Unzipping and printing File Name and first line

I am trying to unzip the files in a folder and print the line containing LASTMODIFIEDDATE along with the file name.
But the code below prints '-' instead of the file name on the first line.
for file in /export/home/xxxxxx/New_folder/*.gz;
do
gzip -dc "$file" | awk 'NR=1 {print $0, FILENAME}' | awk '/LASTMODIFIEDDATE/'
done
1. How can I modify the above code to print the name of the file being unzipped?
2. I am a beginner, and suggestions to improve the above code are welcome.
A few issues:
Your first awk should have double equals signs if you mean to address the first line:
awk 'NR==1{...}'
Your second awk only ever sees the output of the first awk, which is just the first line, so it will never find lines containing LASTMODIFIED unless they happen to be first. The version below shows the first line and any lines containing LASTMODIFIED:
for ...
do
  echo "$file"
  gzip -dc "$file" | awk 'NR==1 || /LASTMODIFIED/'
done
Or you may mean this:
for ...
do
  gzip -dc "$file" | awk -v file="$file" 'NR==1{print $0 " " file} /LASTMODIFIED/'
done
which will print the first line followed by the filename and also any lines containing LASTMODIFIED.
Print the file name with an echo. Also, you might want to use grep instead of awk in this case.
for file in /export/home/xxxxxx/New_folder/*.gz
do
  echo "$file"
  gzip -dc "$file" | grep LASTMODIFIEDDATE
done
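If you would rather have the file name on every matching line instead of once above the block, a small variant (just a sketch, reusing the directory from the question) passes the name into awk and prefixes it to the output:
for file in /export/home/xxxxxx/New_folder/*.gz
do
  gzip -dc "$file" | awk -v file="$file" 'NR==1 || /LASTMODIFIEDDATE/ {print file ": " $0}'
done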

Create newline with awk Command?

I'm trying to edit a script in GeekTool (a Mac app for desktop widgets). Could someone help me make this statement print only 10 or so words per line?
curl -s www.brainyquote.com/quotes_of_the_day.html | egrep '(div class="bqQuoteLink")| (ahref)' | sed -n '19p; 20p;' | sed -e 's/<[^>]*>//g'\
Right now everything is printed out on one big line.
I think I could use the awk command, although I am unsure how to apply it to this output.
Any help would be appreciated!
Try piping the output through fold, which breaks long lines at spaces (add -w to set the width, e.g. fold -s -w 60):
| fold -s
Add this to the end of your command:
| awk '{ print $1, $2, $3, $4, $5, $6, $7, $8, $9 }'
That prints only the first nine space-delimited words of each line (the commas keep a space between them); anything past the ninth word is dropped.
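If you want to keep all the words and simply break after every 10, a loop over awk's fields does it; this is a sketch to append to the end of the pipeline instead:
| awk '{ for (i = 1; i <= NF; i++) printf "%s%s", $i, ((i % 10 == 0 || i == NF) ? "\n" : " ") }'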

How to keep a file's format if you use the uniq command (in shell)?

In order to use the uniq command, you have to sort your file first.
But in the file I have, the order of the information is important, so how can I keep the original order of the file but still get rid of the duplicate content?
Another awk version:
awk '!_[$0]++' infile
This keeps the first occurrence of each line. It is the same algorithm the other answers use, written out more explicitly:
awk '!($0 in lines) { print $0; lines[$0]; }'
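As a quick illustration with a made-up infile, the one-liner keeps the first copy of each line and preserves the original order:
$ cat infile
alpha
beta
alpha
gamma
beta
$ awk '!_[$0]++' infile
alpha
beta
gamma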
Here's one that only needs to store duplicated lines (as opposed to all lines) using awk:
sort file | uniq -d | awk '
    FNR == NR { dups[$0] }                          # first input (the pipe): collect the duplicated lines
    FNR != NR && (!($0 in dups) || !lines[$0]++)    # second input (file): print non-dups and the first copy of each dup
' - file
There's also the "line-number, double-sort" method.
nl -n ln | sort -u -k 2 | sort -k 1n | cut -f 2-
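Spelled out step by step (a sketch reading from a hypothetical file named infile):
nl -n ln infile |     # 1. prefix every line with its line number
  sort -u -k 2 |      # 2. sort by content (field 2 onward), dropping duplicates
  sort -k 1n |        # 3. put the survivors back into original order
  cut -f 2-           # 4. strip the line numbers again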
You can run uniq -d on the sorted version of the file to find the duplicate lines, then run some script that says:
if this_line is in duplicate_lines {
    if not i_have_seen[this_line] {
        output this_line
        i_have_seen[this_line] = true
    }
} else {
    output this_line
}
Using only uniq and grep:
Create d.sh:
#!/bin/sh
sort "$1" | uniq > "$1_uniq"
while IFS= read -r line; do
    grep -m1 -xF -- "$line" "$1_uniq" >> "$1_out"
    grep -vxF -- "$line" "$1_uniq" > "$1_uniq2"
    mv "$1_uniq2" "$1_uniq"
done < "$1"
rm "$1_uniq"
Example:
./d.sh infile
You could use some horrible O(n^2) thing, like this (Pseudo-code):
file2 = EMPTY_FILE
for each line in file1:
    if not line in file2:
        file2.append(line)
This is potentially rather slow, especially if implemented at the Bash level. But if your files are reasonably short, it will probably work just fine, and would be quick to implement (not line in file2 is then just grep -v, and so on).
Otherwise you could of course code up a dedicated program, using some more advanced data structure in memory to speed it up.
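A rough bash rendering of that pseudo-code (only a sketch; file1 is the input and the temporary output name is made up here):
#!/bin/bash
file2=$(mktemp)                          # accumulates the deduplicated output
while IFS= read -r line; do
    # "not line in file2": a fixed-string, whole-line grep
    if ! grep -qxF -- "$line" "$file2"; then
        printf '%s\n' "$line" >> "$file2"
    fi
done < file1
cat "$file2"
rm -f "$file2"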
sort file1 | uniq | while IFS= read -r line; do
    grep -n -m1 -xF -- "$line" file1 >> out
done
sort -n out
First do the sort; then, for each unique value, grep for the first match (-m1), preserving the line numbers. Sort the output numerically (-n) by line number. You can then remove the line numbers with sed or awk, as sketched below.
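For that final step, either of these (assuming out holds number:line pairs as written by grep -n) strips the line numbers after the numeric sort:
sort -n out | sed 's/^[0-9]*://'
sort -n out | cut -d: -f2-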
