Breaking out of "tail -f" that's being read by a "while read" loop in HP-UX - unix

I'm trying to write a script (sh, the Bourne shell) that processes lines as they are written to a file. I'm attempting to do this by feeding the output of tail -f into a while read loop. This tactic seems to be sound based on my research on Google as well as a question dealing with a similar issue, but using bash.
From what I've read, it seems that I should be able to break out of the loop when the file being followed ceases to exist. It doesn't. In fact, it seems the only way I can break out of this is to kill the process in another session. tail does seem to be working fine otherwise as testing with this:
touch file
tail -f file | while read line
do
echo $line
done
Data I append to file in another session appears just fine through the loop written above.
This is on HP-UX version B.11.23.
Thanks for any help/insight you can provide!

If you want to break out when your file no longer exists, test for it:
test -f file || break
Placing this in your loop should break out of it.
The remaining problem is how to interrupt read line, since it blocks.
You could do this by applying a timeout, such as read -t 5 line (provided your shell's read supports -t). The read then returns every 5 seconds, and if the file no longer exists, the loop breaks. Note: write the loop so that it can handle the case where read times out but the file is still present.
EDIT: It seems that read returns false on timeout, so you could combine the test with the timeout. The result would be:
tail -f test.file | while read -t 3 line || test -f test.file; do
some stuff with $line
done

I don't know about HP-UX tail but GNU tail has the --follow=name option which will follow the file by name (by re-opening the file every few seconds instead of reading from the same file descriptor which will not detect if the file is unlinked) and will exit when the filename used to open the file is unlinked:
tail --follow=name test.txt

Unless you're using GNU tail, there is no way it'll terminate of its own accord when following a file. The -f option is really only meant for interactive monitoring--indeed, I have a book that says that -f "is unlikely to be of use in shell scripts".
As for a solution, I'm not wholly sure this isn't over-engineered, but I figured you could send the output of tail to a FIFO, then have a function or script check the file for existence and kill off the tail once the file has been unlinked.
#!/bin/sh
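sentinel ()
{
# Poll until the followed file ($1) disappears, then kill tail ($2).
while true
do
if [ ! -e "$1" ]
then
kill "$2"
rm "/tmp/$1"
break
fi
# Sleep between checks so the loop does not spin at full CPU.
sleep 1
done
}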
touch $1
mkfifo /tmp/$1
tail -f $1 >/tmp/$1 &
sentinel $1 $! &
cat /tmp/$1 | while read line
do
echo $line
done
Did some naïve testing, and it seems to work okay, and not leave any garbage lying around.
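A variation on the FIFO idea above can be sketched as a self-contained script. This is only a sketch under stated assumptions: the file names, the one-second polling interval, and the background "writer" that stands in for another session are all made up for illustration, and it has not been checked on HP-UX.

```shell
#!/bin/sh
# Assumed names: $log is a throwaway file to follow, $fifo a throwaway named pipe.
dir=$(mktemp -d)
log="$dir/app.log"
fifo="$dir/fifo"
: > "$log"
mkfifo "$fifo"

# tail writes into the FIFO; $! is its process ID.
tail -f "$log" > "$fifo" &
tailpid=$!

# Watcher: poll once per second and kill tail once the log is gone.
(
    while [ -f "$log" ]; do sleep 1; done
    kill "$tailpid"
) &

# Writer standing in for "another session": append a line, then remove the file.
( sleep 1; echo "first line" >> "$log"; sleep 2; rm -f "$log" ) &

# Reading from the FIFO by redirection keeps the loop in the main shell,
# so $count is still set after the loop ends.
count=0
while read -r line; do
    count=$((count + 1))
    echo "got: $line"
done < "$fifo"

wait
rm -rf "$dir"
```

The loop ends because killing tail closes the only writer on the FIFO, so read sees end of file. Redirecting from the FIFO instead of piping from cat is a design choice that avoids running the loop in a subshell.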

I've never been happy with this answer but I have not found an alternative either:
kill $(ps -o pid,cmd --no-headers --ppid $$ | grep tail | awk '{print $1}')
Get all processes that are children of the current process, look for the tail, print out the first column (tail's pid), and kill it. Sin-freaking-ugly indeed, such is life.

The following approach backgrounds the tail -f file command and echoes its process ID, prefixed with a custom string (here tailpid: ), to the while loop. The line carrying that prefix triggers another (backgrounded) while loop that checks every 5 seconds whether file still exists. If it does not, tail -f file gets killed and the subshell containing the backgrounded while loop exits.
# cf. "The Heirloom Bourne Shell",
# http://heirloom.sourceforge.net/sh.html,
# http://sourceforge.net/projects/heirloom/files/heirloom-sh/ and
# http://freecode.com/projects/bournesh
/usr/local/bin/bournesh -c '
touch file
(tail -f file & echo "tailpid: ${!}" ) | while IFS="" read -r line
do
case "$line" in
tailpid:*) while sleep 5; do
#echo hello;
if [ ! -f file ]; then
IFS=" "; set -- ${line}
kill -HUP "$2"
exit
fi
done &
continue ;;
esac
echo "$line"
done
echo exiting ...
'

Related

Find when tar find n matches using wildcards

I'm trying to extract some files from a huge tar file, using a list of names that contain wildcards. I'm using a loop to read the list, but moving from one element in the list to the next takes too long, I'm guessing because tar tries to match the element against the whole archive. I want the loop to continue with the next element after 2 matches for any element.
while read line;do
tar --wildcards -xzvf file.tar.gz "$line"
done <$file
And one line looks like this
dataset/0113947.*
I went aggressive and killed the tar process as soon as it found two files. Here is my solution:
file=list.txt
while read line;do
tar --wildcards --checkpoint=10000 --checkpoint-action=exec='sh stop.sh dummy.txt 1' -xzvf ny_file.tar.gz "$line" > dummy.txt
done <$file
where stop.sh checks whether dummy.txt has at least two lines and, if so, kills the tar process:
n=$(wc -l < $1)
if [ $n -gt 1 ];then
kill $(ps aux|grep "[t]ar --wildcards*" | cut -d " " -f 4)
fi
I had to use cut to recover the process ID because the single quotes needed for awk were causing trouble.

awk getline not accepting external variable from a file

I have a file test.sh from which I am executing the following awk command.
awk -f x.awk < result/output.txt >>difference.txt
x.awk
while (getline < result/$bld/$DeviceType)
The variables DeviceType and bld are available in test.sh.
I have exported them:
export DeviceType=$line
Even then while executing test.sh file, the script stops at following line
awk -f x.awk < result/output.txt >>difference.txt
and I am getting
awk: x.awk:4: (FILENAME=- FNR=116) fatal: division by zero attempted
error.
The awk script is read by awk, not touched by the shell. Inside an awk script, $bld means 'the field designated by the number in the variable bld' (that's the awk variable bld).
You can set awk variables on the command line (officially with the -v option):
awk -v bld="$bld" -v dev="$DeviceType" -f x.awk < result/output.txt >> difference.txt
Whether that does what you want is still debatable. Most likely you need x.awk to contain something like:
BEGIN { file = sprintf("result/%s/%s", bld, dev); }
{ while ((getline < file) > 0) print }
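To see the -v mechanism in isolation, here is a minimal sketch. The values of bld and DeviceType, the temporary directory, and the two-line sample file are all assumptions made up for the demonstration; only the -v and getline usage mirrors the answer above.

```shell
#!/bin/sh
# Hypothetical values standing in for the variables exported by test.sh.
bld="build42"
DeviceType="router"

# Build a throwaway result/<bld>/<DeviceType> file to read from.
workdir=$(mktemp -d)
mkdir -p "$workdir/result/$bld"
printf 'alpha\nbeta\n' > "$workdir/result/$bld/$DeviceType"

# -v copies each shell value into an awk variable before the program runs.
# getline returns 1 for each line read, 0 at end of file, and -1 on error,
# so the "> 0" test also stops the loop if the file cannot be opened.
lines=$(awk -v dir="$workdir" -v bld="$bld" -v dev="$DeviceType" '
    BEGIN {
        file = sprintf("%s/result/%s/%s", dir, bld, dev)
        while ((getline line < file) > 0) n++
        close(file)
        print n
    }')
echo "lines read: $lines"
rm -rf "$workdir"
```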
awk is not shell just like C is not shell. You should not expect to be able to access shell variables within an awk program any more than you can access shell variables within a C program.
To pass the VALUE of shell variables to an awk script, see http://cfajohnson.com/shell/cus-faq-2.html#Q24 for details but essentially:
awk -v awkvar="$shellvar" '{ ... use awkvar ...}'
is usually the right approach.
Having said that, whatever you're trying to do it looks like the wrong approach. If you are considering using getline, make sure to read http://awk.freeshell.org/AllAboutGetline first and understand all of the caveats but if you tell us what it is you're trying to do with sample input and expected output we can almost certainly help you come up with a better approach that has nothing to do with getline.

unix shell, getting exit code with piped child

Let's say I do this in a unix shell
$ some-script.sh | grep mytext
$ echo $?
This will give me the exit code of grep,
but how can I get the exit code of some-script.sh?
EDIT
Assume that the pipe operation is immutable, i.e., I cannot break it apart and run the two commands separately.
There are multiple solutions, it depends on what you want to do exactly.
The easiest and most understandable way would be to send the output to a file, then grep through it after saving the exit code:
tmpfile=$(mktemp)
./some-script.sh > "$tmpfile"
retval=$?
grep mytext "$tmpfile"
rm "$tmpfile"
A trick from the comp.unix.shell FAQ (#13) explains how using the pipeline in the Bourne shell should help accomplish what you want:
You need to use a trick to pass the exit codes to the main
shell. You can do it using a pipe(2). Instead of running
"cmd1", you run "cmd1; echo $?" and make sure $? makes it way
to the shell.
exec 3>&1
eval `
# now, inside the backquotes, fd4 goes to the pipe
# whose other end is read and passed to eval;
# fd1 is the normal standard output preserved
# the line before with exec 3>&1
exec 4>&1 >&3 3>&-
{
cmd1 4>&-; echo "ec1=$?;" >&4
} | {
cmd2 4>&-; echo "ec2=$?;" >&4
} | cmd3
echo "ec3=$?;" >&4
`
If you're using bash:
PIPESTATUS
An array variable (see Arrays) containing a list of exit status values from the processes in the most-recently-executed foreground pipeline (which may contain only a single command).
There is a utility named mispipe which is part of the moreutils package.
It does exactly that: mispipe some-script.sh 'grep mytext'
First approach: temporarily save the exit status in a file. This requires a subshell, created with parentheses:
(your_script.sh.pl.others; echo $? >/tmp/myerr) | # subshell that saves the exit code
grep sh # next piped command
exitcode=$(cat /tmp/myerr) # restore the saved exit code
echo $exitcode # and print it
Another approach, presented by Randy above, is simpler to implement:
some-script.sh | grep mytext
echo ${PIPESTATUS[0]} # print the exit code of the first command; arrays are indexed from 0
That's all. Both work under bash (I know, PIPESTATUS is a bashism). Good luck :)
Neither approach saves the piped data to a physical file; the first stores only the exit code.
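The status-file approach can be tried with a small self-contained sketch. The producer function is a made-up stand-in for some-script.sh, and mktemp is used here in place of a fixed /tmp path; both are assumptions for illustration only.

```shell
#!/bin/sh
statusfile=$(mktemp)

# producer is a hypothetical stand-in for some-script.sh:
# it prints two lines and exits with status 7.
producer() { printf 'mytext here\nother\n'; return 7; }

# The left side of the pipe records its own exit status before closing
# its output, so the file is complete by the time grep sees end of file.
matches=$( { producer; echo $? > "$statusfile"; } | grep -c mytext )
producer_status=$(cat "$statusfile")
rm -f "$statusfile"

echo "grep matched $matches line(s); producer exited with $producer_status"
```

Using a unique temporary file rather than a fixed name keeps concurrent runs of the script from overwriting each other's status.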

'tee' and exit status

Is there an alternative to tee which captures standard output and standard error of the command being executed and exits with the same exit status as the processed command?
Something like the following:
eet -a some.log -- mycommand --foo --bar
Where "eet" is an imaginary alternative to "tee" :) (-a means append and -- separates the captured command). It shouldn't be hard to hack such a command, but maybe it already exists and I'm not aware of it?
This works with Bash:
(
set -o pipefail
mycommand --foo --bar | tee some.log
)
The parentheses are there to limit the effect of pipefail to just the one command.
From the bash(1) man page:
The return status of a pipeline is the exit status of the last command, unless the pipefail option is enabled. If pipefail is enabled, the pipeline's return status is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands exit successfully.
I stumbled upon a couple of interesting solutions at Capture Exit Code Using Pipe & Tee.
There is the $PIPESTATUS variable available in Bash:
false | tee /dev/null
rc=${PIPESTATUS[0]} # save it at once; the next command overwrites PIPESTATUS
[ "$rc" -eq 0 ] || exit "$rc"
And the simplest prototype of "eet" in Perl may look as follows:
open MAKE, "command 2>&1 |" or die;
open (LOGFILE, ">>some.log") or die;
while (<MAKE>) {
print LOGFILE $_;
print
}
close MAKE; # To get $?
my $exit = $? >> 8;
close LOGFILE;
Here's an eet. Works with every Bash I can get my hands on, from 2.05b to 4.0.
#!/bin/bash
tee_args=()
while [[ $# > 0 && $1 != -- ]]; do
tee_args=("${tee_args[@]}" "$1")
shift
done
shift
# now ${tee_args[*]} has the arguments before --,
# and $* has the arguments after --
# redirect standard out through a pipe to tee
exec > >(tee "${tee_args[@]}")
# do the *real* exec of the desired program
exec "$@"
(pipefail and $PIPESTATUS are nice, but I recall them being introduced in 3.1 or thereabouts.)
This is what I consider to be the best pure-Bourne-shell solution to use as the base upon which you could build your "eet":
# You want to pipe command1 through command2:
exec 4>&1
exitstatus=`{ { command1; echo $? 1>&3; } | command2 1>&4; } 3>&1`
# $exitstatus now has command1's exit status.
I think this is best explained from the inside out – command1 will execute and print its regular output on stdout (file descriptor 1), then once it's done, echo will execute and print command1's exit code on its stdout, but that stdout is redirected to file descriptor three.
While command1 is running, its stdout is being piped to command2 (echo's output never makes it to command2 because we send it to file descriptor 3 instead of 1, which is what the pipe reads). Then we redirect command2's output to file descriptor 4, so that it also stays out of file descriptor one – because we want file descriptor one clear for when we bring the echo output on file descriptor three back down into file descriptor one so that the command substitution (the backticks) can capture it.
The final bit of magic is that first exec 4>&1 we did as a separate command – it opens file descriptor four as a copy of the external shell's stdout. Command substitution will capture whatever is written on standard out from the perspective of the commands inside it – but, since command2's output is going to file descriptor four as far as the command substitution is concerned, the command substitution doesn't capture it – however, once it gets "out" of the command substitution, it is effectively still going to the script's overall file descriptor one.
(The exec 4>&1 has to be a separate command to work with many common shells. In some shells it works if you just put it on the same line as the variable assignment, after the closing backtick of the substitution.)
(I use compound commands ({ ... }) in my example, but subshells (( ... )) would also work. The subshell will just cause a redundant forking and awaiting of a child process, since each side of a pipe and the inside of a command substitution already normally implies a fork and await of a child process, and I don't know of any shell being coded to recognize that it can skip one of those forks because it's already done or is about to do the other.)
You can look at it in a less technical and more playful way, as if the outputs of the commands are leapfrogging each other: command1 pipes to command2, then the echo's output jumps over command2 so that command2 doesn't catch it, and then command2's output jumps over and out of the command substitution just as echo lands just in time to get captured by the substitution so that it ends up in the variable, and command2's output goes on its way to the standard output, just as in a normal pipe.
Also, as I understand it, at the end of this command, $? will still contain the return code of the second command in the pipe, because variable assignments, command substitutions, and compound commands are all effectively transparent to the return code of the command inside them, so the return status of command2 should get propagated out.
A caveat is that it is possible that command1 will at some point end up using file descriptors three or four, or that command2 or any of the later commands will use file descriptor four, so to be more hygienic, we would do:
exec 4>&1
exitstatus=`{ { command1 3>&-; echo $? 1>&3; } 4>&- | command2 1>&4; } 3>&1`
exec 4>&-
Commands inherit file descriptors from the process that launches them, so the entire second line will inherit file descriptor four, and the compound command followed by 3>&1 will inherit the file descriptor three. So the 4>&- makes sure that the inner compound command will not inherit file descriptor four, and the 3>&- makes sure that command1 will not inherit file descriptor three, so command1 gets a 'cleaner', more standard environment. You could also move the inner 4>&- next to the 3>&-, but I figure why not just limit its scope as much as possible.
Almost no programs use pre-opened file descriptors three and four directly, so you almost never have to worry about it, but the latter form is probably best to keep in mind and use for general-purpose cases.
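The hygienic form can be exercised with concrete commands. In this sketch, command1 is a made-up shell function that prints one line and returns 3, and tr plays the role of command2; both are assumptions chosen only to make the data flow visible.

```shell
#!/bin/sh
# Hypothetical stand-in for command1: one line of output, exit status 3.
command1() { echo "hello"; return 3; }

# fd 4 becomes a copy of the script's real standard output.
exec 4>&1
# Inside the backticks: fd 1 and fd 3 lead to the command substitution,
# command1's output goes through the pipe to tr, tr writes to fd 4,
# and only the echoed exit status is captured.
exitstatus=`{ { command1 3>&-; echo $? 1>&3; } 4>&- | tr a-z A-Z 1>&4; } 3>&1`
exec 4>&-

echo "command1 exited with $exitstatus"
```

When run, the upper-cased output of command1 still reaches the terminal through fd 4, while exitstatus holds only the status written to fd 3.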
{ mycommand --foo --bar 2>&1; ret=$?; } | tee -a some.log; (exit $ret)
KornShell, all in one line:
foo; RET_VAL=$?; if test ${RET_VAL} != 0;then echo $RET_VAL; echo Error occurred!>/tmp/out.err;exit 2;fi |tee >>/tmp/out.err ; if test ${RET_VAL} != 0;then exit $RET_VAL;fi
#!/bin/sh
logfile="$1"
shift
exec 2>&1
exec "$@" | tee "$logfile"
Hopefully this works for you.
Assuming Bash or Z shell (zsh),
my_command >>my_log 2>&1
N.B. The sequence of redirection and duplication of standard error onto standard output is significant!
I didn't realise you wanted to see the output on screen as well. This will of course direct all output to the file my_log.

ksh: how to probe stdin?

I want my ksh script to have different behaviors depending on whether there is something incoming through stdin or not:
(1) cat file.txt | ./script.ksh (then do "cat <&0 >./tmp.dat" and process tmp.dat)
vs. (2) ./script.ksh (then process $1 which must be a readable regular file)
Checking whether stdin is a terminal with [ -t 0 ] is not helpful, because my script is called from another script.
Doing "cat <&0 >./tmp.dat" to check tmp.dat's size hangs up waiting for an EOF from stdin if stdin is "empty" (2nd case).
How to just check if stdin is "empty" or not?!
EDIT: You are running on HP-UX
Tested [ -t 0 ] on HP-UX and it appears to be working for me. I have used the following setup:
/tmp/x.ksh:
#!/bin/ksh
/tmp/y.ksh
/tmp/y.ksh:
#!/bin/ksh
test -t 0 && echo "terminal!"
Running /tmp/x.ksh prints: terminal!
Could you confirm the above on your platform, and/or provide an alternate test setup more closely reflecting your situation? Is your script ultimately spawned by cron?
EDIT 2
If desperate, and if Perl is available, define:
stdin_ready() {
TIMEOUT=$1; shift
perl -e '
my $rin = "";
vec($rin,fileno(STDIN),1) = 1;
select($rout=$rin, undef, undef, '$TIMEOUT') < 1 && exit 1;
'
}
stdin_ready 1 || echo 'stdin not ready in 1 second, assuming terminal'
EDIT 3
Please note that the timeout may need to be significant if your input comes from sort, ssh etc. (all these programs can spawn and establish the pipe with your script seconds or minutes before producing any data over it.) Also, using a hefty timeout may dramatically penalize your script when there is nothing on the input to begin with (e.g. terminal.)
If potentially large timeouts are a problem, and if you can influence the way in which your script is called, then you may want to force the callers to explicitly instruct your program whether stdin should be used, via a custom option or in the standard GNU or tar manner (e.g. script [options [--]] FILE ..., where FILE can be a file name, a - to denote standard input, or a combination thereof, and your script would only read from standard input if - were passed in as a parameter.)
This strategy works for bash, and would likely work for ksh. Poll 'tty':
#!/bin/bash
set -a
if [ "$( tty )" == 'not a tty' ]
then
STDIN_DATA_PRESENT=1
else
STDIN_DATA_PRESENT=0
fi
if [ ${STDIN_DATA_PRESENT} -eq 1 ]
then
echo "Input was found."
else
echo "Input was not found."
fi
Why not solve this in a more traditional way, and use the command line argument to indicate that the data will be coming from stdin?
For an example, consider the difference between:
echo foo | cat -
and
echo foo > /tmp/test.txt
cat /tmp/test.txt
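The "-" convention suggested above could be sketched as follows. The count_lines function is a hypothetical stand-in for the script's real processing, and the sample data is made up; the point is only that the caller states explicitly where the input comes from, so nothing has to be probed.

```shell
#!/bin/sh
# count_lines is a hypothetical placeholder for the real work of the script.
# A "-" argument means "read standard input"; anything else must be a
# readable regular file.
count_lines() {
    case "$1" in
        -)
            awk 'END { print NR }'
            ;;
        *)
            if [ -f "$1" ] && [ -r "$1" ]; then
                awk 'END { print NR }' "$1"
            else
                echo "not a readable regular file: $1" >&2
                return 1
            fi
            ;;
    esac
}

tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"

from_file=$(count_lines "$tmp")
from_stdin=$(printf 'x\ny\n' | count_lines -)
rm -f "$tmp"

echo "file: $from_file, stdin: $from_stdin"
```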
