Accepting command line parameters in awk - unix

I've looked around for a while and found only questions that touch on the subject or answers that do not work. Here's the question:
I'm working on an assignment for school that requires me to read in command line arguments for an awk script (which seems odd to begin with, but eh). We're using an older version of Unix and I'm running Bash. This awk only has the -f and -Fc options. Basically, I keep trying to do "awk -f awk_script arg1 arg2 arg3 arg4 arg5 arg6" but each time awk attempts to open arg1 as a file, which it isn't. An example I saw elsewhere addressing this was:
awk 'BEGIN { print "ARGV[1] = ", ARGV[1] }' foo bar
It was supposed to print "foo", but on this system I only get the output "ARGV[1] = awk: can't open foo". So, in summary, is there any way around this? Can an awk this old read command line arguments and use them for anything other than input files? The instructor's notes file hinted at the above usage (of printing foo), but his program doesn't even run, so...
Any help would be greatly appreciated.
After Edit: Using SunOS 5.10 and this awk does not support the -v option, ONLY the -f and -Fc

You can decrement ARGC after reading the arguments, so that awk treats only the first argument(s) as input file(s):
#!/bin/awk -f
BEGIN {
for (i=ARGC; i>2; i--) {
print ARGV[ARGC-1];
ARGC--;
}
}
…
Alternatively, you can reset ARGC after reading all the arguments:
#!/bin/awk -f
BEGIN {
for (i=0; i<ARGC; i++) {
print ARGV[i];
}
ARGC=2;
}
…
Both methods will correctly process myawkscript.awk foobar foo bar … as if foobar were the only file to process (of course, you can set ARGC to 3 if you want the first two arguments treated as files, and so on). In your particular case, it seems you don't want to process any file, so you would set ARGC to 1.
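As a minimal sketch of the "no input file at all" case: the wrapper below (show_args is a hypothetical name, not something from the question) prints its arguments in BEGIN and then sets ARGC to 1, so awk falls back to standard input. It assumes an awk that actually provides ARGC and ARGV, such as nawk, /usr/xpg4/bin/awk, gawk or mawk; the output quoted in the question suggests the old SunOS awk may not.

```shell
# show_args is a hypothetical helper name used only for this illustration.
show_args() {
    awk 'BEGIN {
        # ARGV[0] is the awk command name; the arguments start at ARGV[1].
        for (i = 1; i < ARGC; i++)
            print "arg " i ": " ARGV[i]
        ARGC = 1    # no operands left, so awk reads standard input instead
    }
    { n++ }                          # count input lines to show stdin is still read
    END { print "lines: " (n + 0) }' "$@"
}

printf 'x\ny\n' | show_args foo bar
```

With the two-line input above, this should print "arg 1: foo", "arg 2: bar" and "lines: 2", without awk trying to open foo or bar as files.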

Use nawk or /usr/xpg4/bin/awk. These are newer versions of awk that support more features.
Alternatively, you can install another version of awk like mawk or GNU awk.

A possible work around - maybe not acceptable - would be to use the -v option of awk.
awk -v arg1=foo 'BEGIN { print arg1; }'

Related

Check if a string in one file exists in another in unix

I have a file that contains the version name and version number. The contents of the first file look like this:
File1-
<Line contains the name of product1>
package_name0_9_8 >= 1.2.3x-4.5.6
package_name0_9_8-32bit >= 3.6.1g-3.5.1
package_name0_9_8-xx >= 6.3.2v-3.0.4
<Line contains the name of product2>
anotherpackage_name0_9_8 >= 3.5.6u-3.6.5
And,
File2.xml-
<package name="package_name0_9_8" version="1.2.3x-4.4.4"/>
<package name="package_name0_9_8-32bit" version="3.6.1g-3.4.0"/>
.
.
Is there a way to check whether each package_name in File1 also exists in File2, and to compare the version given for that package in File1 with the version given for it in File2?
Frankly, I am pretty weak at combining 'grep' and 'awk' commands and their options for this. Please help out.
for a in $(sed -n '/>=/p' File1.txt | grep -o '^[^ ]*'); do for b in $(sed -n "/^$a /{s/.*>=\(.*\)$/\1/p}" File1.txt); do ((! $(grep -c "$a.*$b" File2.txt))) && (echo "$a $b" >> missing_pkgs.txt); done; done;
This is a quick one-liner; you could format it more readably.
It works as a nested for loop that grabs the two pieces into separate variables (you could do that with read in a single loop if you want) and then counts the occurrences in the second file with grep. Whenever the count is zero, the negation makes the (( )) test true, and the missing package is echoed to the file missing_pkgs.txt.
Here is another quick one-liner that does the same thing, but more efficiently, with one loop and the variables loaded via read:
while read each; do read a b < <(echo $each) && ((! $(grep -c "$a.*$b" File2.txt))) && (echo "$a $b" >> missing_pkgs.txt); done < <(awk '/>=/{ print $1" "$3 }' File1.txt)
Simplified further:
while read a b; do ((! $(grep -c "$a.*$b" File2.txt))) && (echo "$a $b" >> missing_pkgs.txt); done < <(awk '/>=/{ print $1" "$3 }' File1.txt)
sed -n 's².*²s#<package name="\\(&"/>#\\1 Present#p²;s/ *>= */\\)" *version="/p' File1 > /tmp/File1.sed
sed -n -f /tmp/File1.sed File2
rm /tmp/File1.sed
This isn't a single instruction, as awk could do, but it does the job (POSIX version, so use --posix with GNU sed).
You could change the output message, which is the \\1 Present text, where \\1 will be the package name (with a few modifications, the version could also be used).
It looks like you already got a much shorter solution in a format closer to what you desired. However, since I asked if a Python solution would work, and you said yes, check out the code here:
http://pastebin.com/F5LYrmea
(I haven't debugged it more than a little, but it seems to work on at least a little more than your example files. I released the code to the public domain. CC-BY-SA isn't a software license, according to the makers of CC; so, that's why I didn't post it here, as posting it here would give it that license. Plus, you get syntax highlighting specific to Python at the link provided.)
Basically, it's a lot of complicated text parsing. Not much of an algorithm to explain. It gets the contents of both files, strips out the packages, their versions and the operands (puts all those in a dictionary for use later), and loops through lines of the other file and compares versions; then it tells you which ones match and which ones don't.
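For comparison, here is a rough awk-only sketch of the same check. It assumes File2.xml holds one <package .../> element per line with name and version attributes, as in the sample, and it only tests whether the two version strings are identical; it does not try to order versions. The function name check_packages and the output wording are made up for illustration.

```shell
# check_packages is a hypothetical helper; usage: check_packages File1 File2.xml
check_packages() {
    awk '
    NR == FNR {                                   # first operand: File2.xml
        if (match($0, /name="[^"]*"/)) {
            name = substr($0, RSTART + 6, RLENGTH - 7)
            if (match($0, /version="[^"]*"/))
                have[name] = substr($0, RSTART + 9, RLENGTH - 10)
        }
        next
    }
    $2 == ">=" {                                  # second operand: File1 requirement lines
        if (!($1 in have))
            print $1, "missing"
        else if (have[$1] == $3)
            print $1, "same", $3
        else
            print $1, "differs:", "File1=" $3, "File2=" have[$1]
    }' "$2" "$1"
}
```

Reading File2.xml first lets awk build a name-to-version table once, so each File1 line needs only a lookup instead of a separate grep run.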

Unix awk what does f=1 do

So right now I am reading a Unix script and I am not quite sure whether it does what I think it does.
foo=/some/directory/rules
awk '/^test=/ { print "test="foo; f=1 }
f==0 { print }
{ f=0 }' \
/some/other/directory/file
My guess is that this should not run (because of "test="foo), but it does. So I think the intent is that if a line from '/some/other/directory/file' matches, it gets written to '/some/directory/rules' with the prefix "test=", and otherwise it just gets printed on the console?
I am unable to find anything in the man pages, and the examples don't use the f=? syntax either.
Get the book Effective Awk Programming, Third Edition by Arnold Robbins as you currently are very confused about awk syntax. The intent of the script you posted is to print the contents of /some/other/directory/file to stdout, except when a line starts with test= and in that case replace that line with one that says test=/some/directory/rules.
The more awk-ish way to write that would simply be:
foo=/some/directory/rules
awk -v foo="$foo" '{print (/^test=/ ? "test="foo : $0)}' /some/other/directory/file
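To make the role of f concrete, here is a small sketch of the original three-rule form with the shell value passed in via -v. The wrapper name replace_test and the sample input are assumptions for illustration. f is just an ordinary awk variable used as a flag. Note that in the script as posted in the question, foo is never passed to awk, so awk sees an unset variable and prints only test=.

```shell
# replace_test is a hypothetical wrapper; its first argument becomes the awk variable foo.
replace_test() {
    awk -v foo="$1" '
    /^test=/ { print "test=" foo; f = 1 }   # matching line: print the replacement, raise the flag
    f == 0   { print }                      # flag not raised: print the line unchanged
             { f = 0 }                      # lower the flag before the next line
    '
}

printf 'a=1\ntest=old\nb=2\n' | replace_test /some/directory/rules
```

This should print a=1, then test=/some/directory/rules, then b=2.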

awk getline not accepting external variable from a file

I have a file test.sh from which I am executing the following awk command.
awk -f x.awk < result/output.txt >>difference.txt
x.awk
while (getline < result/$bld/$DeviceType)
The variables DeviceType and bld are available in test.sh.
I have exported them:
export DeviceType=$line
Even so, when I execute test.sh, the script stops at the following line
awk -f x.awk < result/output.txt >>difference.txt
and I am getting
awk: x.awk:4: (FILENAME=- FNR=116) fatal: division by zero attempted
error.
The awk script is read by awk, not touched by the shell. Inside an awk script, $bld means 'the field designated by the number in the variable bld' (that's the awk variable bld).
You can set awk variables on the command line (officially with the -v option):
awk -v bld="$bld" -v dev="$DeviceType" -f x.awk < result/output.txt >> difference.txt
Whether that does what you want is still debatable. Most likely you need x.awk to contain something like:
BEGIN { file = sprintf("result/%s/%s", bld, dev); }
{ while ((getline < file) > 0) print }
awk is not shell just like C is not shell. You should not expect to be able to access shell variables within an awk program any more than you can access shell variables within a C program.
To pass the VALUE of shell variables to an awk script, see http://cfajohnson.com/shell/cus-faq-2.html#Q24 for details but essentially:
awk -v awkvar="$shellvar" '{ ... use awkvar ...}'
is usually the right approach.
Having said that, whatever you're trying to do it looks like the wrong approach. If you are considering using getline, make sure to read http://awk.freeshell.org/AllAboutGetline first and understand all of the caveats but if you tell us what it is you're trying to do with sample input and expected output we can almost certainly help you come up with a better approach that has nothing to do with getline.
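If getline really is needed, a minimal sketch of passing the shell values in and building the path as a string could look like the following. The helper name list_reference, the "ref: " prefix and the result/<bld>/<DeviceType> layout relative to the current directory are assumptions. As a side note, an unquoted result/$bld/$DeviceType inside awk is parsed as arithmetic (a variable divided by two fields), which is probably where the "division by zero" message comes from.

```shell
# list_reference is a hypothetical helper; usage: list_reference <bld> <DeviceType>
# It assumes the file lives at result/<bld>/<DeviceType> under the current directory.
list_reference() {
    awk -v bld="$1" -v dev="$2" 'BEGIN {
        file = "result/" bld "/" dev          # build the path as a string
        while ((getline line < file) > 0)     # > 0 stops on EOF (0) and on error (-1)
            print "ref: " line
        close(file)
    }'
}
```

Testing the getline result with > 0 matters: a bare while (getline < file) would loop forever on a missing file, because getline returns -1 on error and -1 counts as true.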

Breaking out of "tail -f" that's being read by a "while read" loop in HP-UX

I'm trying to write a (sh, Bourne shell) script that processes lines as they are written to a file. I'm attempting to do this by feeding the output of tail -f into a while read loop. This tactic seems to be proper based on my research in Google as well as this question dealing with a similar issue, but using bash.
From what I've read, it seems that I should be able to break out of the loop when the file being followed ceases to exist. It doesn't. In fact, it seems the only way I can break out of this is to kill the process in another session. tail does seem to be working fine otherwise as testing with this:
touch file
tail -f file | while read line
do
echo $line
done
Data I append to file in another session appears just fine from the loop processing written above.
This is on HP-UX version B.11.23.
Thanks for any help/insight you can provide!
If you want to break out when your file no longer exists, just test for it:
test -f file || break
Placing this in your loop should break out.
The remaining problem is how to interrupt the read, since it blocks.
You could do this by applying a timeout, such as read -t 5 line. Then read returns every 5 seconds, and if the file no longer exists, the loop breaks. Note: write your loop so that it handles the case where the read times out but the file is still present.
EDIT: It seems that read returns false on timeout, so you could combine the test with the timeout. The result would be:
tail -f test.file | while read -t 3 line || test -f test.file; do
some stuff with $line
done
I don't know about HP-UX tail but GNU tail has the --follow=name option which will follow the file by name (by re-opening the file every few seconds instead of reading from the same file descriptor which will not detect if the file is unlinked) and will exit when the filename used to open the file is unlinked:
tail --follow=name test.txt
Unless you're using GNU tail, there is no way it'll terminate of its own accord when following a file. The -f option is really only meant for interactive monitoring--indeed, I have a book that says that -f "is unlikely to be of use in shell scripts".
As for a solution, I'm not wholly sure this isn't over-engineered, but I figured you could send the tail output to a FIFO, then have a function or script that checks whether the file still exists and kills the tail once it has been unlinked.
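#!/bin/sh
# sentinel FILE PID: once FILE disappears, kill PID and remove the FIFO.
sentinel ()
{
    while true
    do
        if [ ! -e "$1" ]
        then
            kill "$2"
            rm "/tmp/$1"
            break
        fi
        sleep 1    # poll once a second instead of busy-looping
    done
}
touch "$1"
mkfifo "/tmp/$1"
tail -f "$1" >"/tmp/$1" &
sentinel "$1" $! &
cat "/tmp/$1" | while read line
do
    echo "$line"
done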
Did some naïve testing, and it seems to work okay, and not leave any garbage lying around.
I've never been happy with this answer but I have not found an alternative either:
kill $(ps -o pid,cmd --no-headers --ppid $$ | grep tail | awk '{print $1}')
Get all processes that are children of the current process, look for the tail, print out the first column (tail's pid), and kill it. Sin-freaking-ugly indeed, such is life.
The following approach backgrounds the tail -f file command and echoes its process ID, with a custom string prefix (here tailpid: ), into the while loop. The line carrying that prefix triggers another (backgrounded) while loop that checks every 5 seconds whether file still exists. If it does not, tail -f file gets killed and the subshell containing the backgrounded while loop exits.
# cf. "The Heirloom Bourne Shell",
# http://heirloom.sourceforge.net/sh.html,
# http://sourceforge.net/projects/heirloom/files/heirloom-sh/ and
# http://freecode.com/projects/bournesh
/usr/local/bin/bournesh -c '
touch file
(tail -f file & echo "tailpid: ${!}" ) | while IFS="" read -r line
do
case "$line" in
tailpid:*) while sleep 5; do
#echo hello;
if [ ! -f file ]; then
IFS=" "; set -- ${line}
kill -HUP "$2"
exit
fi
done &
continue ;;
esac
echo "$line"
done
echo exiting ...
'

Is there a Unix utility to prepend timestamps to stdin?

I ended up writing a quick little script for this in Python, but I was wondering if there was a utility you could feed text into which would prepend each line with some text -- in my specific case, a timestamp. Ideally, the use would be something like:
cat somefile.txt | prepend-timestamp
(Before you answer sed, I tried this:
cat somefile.txt | sed "s/^/`date`/"
But that only evaluates the date command once when sed is executed, so the same timestamp is incorrectly prepended to each line.)
ts from moreutils will prepend a timestamp to every line of input you give it. You can format it using strftime too.
$ echo 'foo bar baz' | ts
Mar 21 18:07:28 foo bar baz
$ echo 'blah blah blah' | ts '%F %T'
2012-03-21 18:07:30 blah blah blah
$
To install it:
sudo apt-get install moreutils
Could try using awk:
<command> | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush(); }'
You may need to make sure that <command> produces line buffered output, i.e. it flushes its output stream after each line; the timestamp awk adds will be the time that the end of the line appeared on its input pipe.
If awk shows errors, then try gawk instead.
annotate, available via that link or as annotate-output in the Debian devscripts package.
$ echo -e "a\nb\nc" > lines
$ annotate-output cat lines
17:00:47 I: Started cat lines
17:00:47 O: a
17:00:47 O: b
17:00:47 O: c
17:00:47 I: Finished with exitcode 0
Distilling the given answers to the simplest one possible:
unbuffer $COMMAND | ts
On Ubuntu, they come from the expect-dev and moreutils packages.
sudo apt-get install expect-dev moreutils
How about this?
cat somefile.txt | perl -pne 'print scalar(localtime()), " ";'
Judging from your desire to get live timestamps, maybe you want to do live updating on a log file or something? Maybe
tail -f /path/to/log | perl -pne 'print scalar(localtime()), " ";' > /path/to/log-with-timestamps
Kieron's answer is the best one so far. If you have problems because the first program is buffering its output, you can use the unbuffer program:
unbuffer <command> | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; }'
It's packaged for most Linux distributions. If you need to build it yourself, it is part of the expect package:
http://expect.nist.gov
Just gonna throw this out there: there are a pair of utilities in daemontools called tai64n and tai64nlocal that are made for prepending timestamps to log messages.
Example:
cat file | tai64n | tai64nlocal
Use the read(1) command to read one line at a time from standard input, then output the line prepended with the date in the format of your choosing using date(1).
$ cat timestamp
#!/bin/sh
while read line
do
echo `date` $line
done
$ cat somefile.txt | ./timestamp
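A slightly more defensive variant of the same loop, as a sketch: plain read strips leading whitespace and treats backslashes specially, and an unquoted $line is re-split by the shell. The version below uses IFS= read -r and printf to pass each line through unchanged. The date format is only an example.

```shell
# timestamp reads stdin line by line and prefixes each line with the current time.
timestamp() {
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

printf '%s\n' '  indented line' 'back\slash' | timestamp
```

It still starts one date process per line, so it is best suited to low-volume input.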
I'm not a Unix guy, but I think you can use
gawk '{ print strftime("%d/%m/%y", systime()), $0 }' < somefile.txt
#! /bin/sh
unbuffer "$@" | perl -e '
use Time::HiRes qw(gettimeofday);
while(<>) {
($s,$us) = gettimeofday();
printf "%d.%06d %s", $s, $us, $_;
}'
$ cat somefile.txt | sed "s/^/`date`/"
You can do this (with GNU sed):
$ some-command | sed "x;s/.*/date +%T/e;G;s/\n/ /g"
example:
$ { echo 'line1'; sleep 2; echo 'line2'; } | sed "x;s/.*/date +%T/e;G;s/\n/ /g"
20:24:22 line1
20:24:24 line2
Of course, you can use other options of the date program. Just replace date +%T with what you need.
Here's my awk solution (from a Windows/XP system with MKS Tools installed in the C:\bin directory). It is designed to add the current date and time in the form mm/dd hh:mm to the beginning of each line having fetched that timestamp from the system as each line is read. You could, of course, use the BEGIN pattern to fetch the timestamp once and add that timestamp to each record (all the same). I did this to tag a log file that was being generated to stdout with the timestamp at the time the log message was generated.
/"pattern"/ "C\:\\\\bin\\\\date '+%m/%d %R'" | getline timestamp;
print timestamp, $0;
where "pattern" is a string or regex (without the quotes) to be matched in the input line, and is optional if you wish to match all input lines.
This should work on Linux/UNIX systems as well, just get rid of the C\:\\bin\\ leaving the line
"date '+%m/%d %R'" | getline timestamp;
This, of course, assumes that the command "date" gets you to the standard Linux/UNIX date display/set command without specific path information (that is, your environment PATH variable is correctly configured).
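One caveat with the getline approach, shown here as a hedged sketch: awk keeps a command pipe open after the first read, so unless the command is closed, later getline calls on the same command string hit end-of-file and leave the variable unchanged, repeating the first timestamp. Closing the command after each read makes awk run date again for every line. The function name stamp_lines and the +%H:%M:%S format are assumptions for illustration.

```shell
# stamp_lines is a hypothetical name; it prefixes each input line with the time it was read.
stamp_lines() {
    awk '{
        cmd = "date +%H:%M:%S"
        cmd | getline timestamp   # run date and read its single line of output
        close(cmd)                # close the pipe so the next line runs date again
        print timestamp, $0
    }'
}

{ echo line1; sleep 1; echo line2; } | stamp_lines
```

Keeping the command in a variable (cmd) ensures that close() receives exactly the same string that was used to open the pipe.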
Mixing some answers above from natevw and Frank Ch. Eigler.
It has milliseconds, performs better than calling an external date command each time, and perl can be found on most servers.
tail -f log | perl -pne '
use Time::HiRes qw(gettimeofday);
use POSIX qw(strftime);
($s,$us) = gettimeofday();
print strftime("%Y-%m-%dT%H:%M:%S", gmtime($s)), sprintf(".%03d ", $us/1000);
'
Alternative version with flush and read in a loop:
tail -f log | perl -e '
use Time::HiRes qw(gettimeofday); use POSIX qw(strftime);
$|=1;
while(<>) {
($s,$us) = gettimeofday();
print strftime("%Y-%m-%dT%H:%M:%S", gmtime($s)), sprintf(".%03d ", $us/1000), $_;
}'
caerwyn's answer can be run as a shell function, which avoids the need for a separate script file (note that date is still started once per line):
timestamp(){
while read line
do
echo `date` $line
done
}
echo testing 123 |timestamp
Disclaimer: the solution I am proposing is not a Unix built-in utility.
I faced a similar problem a few days ago. I did not like the syntax and limitations of the solutions above, so I quickly put together a program in Go to do the job for me.
You can check the tool here: preftime
There are prebuilt executables for Linux, MacOS, and Windows in the Releases section of the GitHub project.
The tool handles incomplete output lines and has (from my point of view) a more compact syntax.
<command> | preftime
It's not ideal, but I thought I'd share it in case it helps someone.
The other answers mostly work, but have some drawbacks. In particular:
1. Many require installing a command not commonly found on linux systems, which may not be possible or convenient.
2. Since they use pipes, they don't put timestamps on stderr, and lose the exit status.
3. If you use multiple pipes for stderr and stdout, then some do not have atomic printing, leading to intermingled lines of output like [timestamp] [timestamp] stdout line \nstderr line
4. Buffering can cause problems, and unbuffer requires an extra dependency.
To solve (4), we can use stdbuf -i0 -o0 -e0 which is generally available on most linux systems (see How to make output of any shell command unbuffered?).
To solve (3), you just need to be careful to print the entire line at a time.
Bad: ruby -pe 'print Time.now.strftime("[%Y-%m-%d %H:%M:%S] ")' (Prints the timestamp, then prints the contents of $_.)
Good: ruby -pe '$_ = Time.now.strftime("[%Y-%m-%d %H:%M:%S] ") + $_' (Alters $_, then prints it.)
To solve (2), we need to use multiple pipes and save the exit status:
alias tslines-pipe="stdbuf -i0 -o0 ruby -pe '\$_ = Time.now.strftime(\"[%Y-%m-%d %H:%M:%S] \") + \$_'"
function tslines() (
stdbuf -o0 -e0 "$@" 2> >(tslines-pipe) > >(tslines-pipe)
status="$?"
exit $status
)
Then you can run a command with tslines some command --options.
This almost works, except sometimes one of the pipes takes slightly longer to exit and the tslines function has exited, so the next prompt has printed. For example, this command seems to print all the output after the prompt for the next line has appeared, which can be a bit confusing:
tslines bash -c '(for (( i=1; i<=20; i++ )); do echo stderr 1>&2; echo stdout; done)'
There needs to be some coordination method between the two pipe processes and the tslines function. There are presumably many ways to do this. One way I found is to have the pipes send some lines to a pipe that the main function can listen to, and only exit after it's received data from both pipe handlers. Putting that together:
alias tslines-pipe="stdbuf -i0 -o0 ruby -pe '\$_ = Time.now.strftime(\"[%Y-%m-%d %H:%M:%S] \") + \$_'"
function tslines() (
# Pick a random name for the pipe to prevent collisions.
pipe="/tmp/pipe-$RANDOM"
# Ensure the pipe gets deleted when the method exits.
trap "rm -f $pipe" EXIT
# Create the pipe. See https://www.linuxjournal.com/content/using-named-pipes-fifos-bash
mkfifo "$pipe"
# echo will block until the pipe is read.
stdbuf -o0 -e0 "$@" 2> >(tslines-pipe; echo "done" >> $pipe) > >(tslines-pipe; echo "done" >> $pipe)
status="$?"
# Wait until we've received data from both pipe commands before exiting.
linecount=0
while [[ $linecount -lt 2 ]]; do
read line
if [[ "$line" == "done" ]]; then
((linecount++))
fi
done < "$pipe"
exit $status
)
That synchronization mechanism feels a bit convoluted; hopefully there's a simpler way to do it.
doing it with date and tr and xargs on OSX:
alias predate="xargs -I{} sh -c 'date +\"%Y-%m-%d %H:%M:%S\" | tr \"\n\" \" \"; echo \"{}\"'"
<command> | predate
if you want milliseconds:
alias predate="xargs -I{} sh -c 'date +\"%Y-%m-%d %H:%M:%S.%3N\" | tr \"\n\" \" \"; echo \"{}\"'"
but note that on OSX, date doesn't give you the %N option, so you'll need to install gdate (brew install coreutils) and so finally arrive at this:
alias predate="xargs -I{} sh -c 'gdate +\"%Y-%m-%d %H:%M:%S.%3N\" | tr \"\n\" \" \"; echo \"{}\"'"
No need to specify all the parameters in strftime() unless you really want to customize the outputting format :
echo "abc 123 xyz\njan 765 feb" \
\
| gawk -Sbe 'BEGIN {_=strftime()" "} sub("^",_)'
Sat Apr 9 13:14:53 EDT 2022 abc 123 xyz
Sat Apr 9 13:14:53 EDT 2022 jan 765 feb
works the same if you have mawk 1.3.4. Even on awk-variants without the time features, a quick getline could emulate it :
echo "abc 123 xyz\njan 765 feb" \
\
| mawk2 'BEGIN { (__="date")|getline _;
close(__)
_=_" " } sub("^",_)'
Sat Apr 9 13:19:38 EDT 2022 abc 123 xyz
Sat Apr 9 13:19:38 EDT 2022 jan 765 feb
If you wanna skip all that getline and BEGIN { }, then something like this :
mawk2 'sub("^",_" ")' \_="$(date)"
If the value you are prepending is the same on every line, fire up emacs with the file, then:
Ctrl + <space>
at the beginning of the file (to mark that spot), then scroll down to the beginning of the last line (Alt + > will go to the end of file... which probably will involve the Shift key too, then Ctrl + a to go to the beginning of that line) and:
Ctrl + x r t
Which is the command to insert at the rectangle you just specified (a rectangle of 0 width).
2008-8-21 6:45PM <enter>
Or whatever you want to prepend... then you will see that text prepended to every line within the 0 width rectangle.
UPDATE: I just realized you don't want the SAME date, so this won't work... though you may be able to do this in emacs with a slightly more complicated custom macro, but still, this kind of rectangle editing is pretty nice to know about...

Resources