Unable to use -C of grep in Unix Shell Script - unix

I am able to use grep in normal command line.
grep "ABC" Filename -C4
This is giving me the desired output which is 4 lines above and below the matched pattern line.
But if I use the same command in a Unix shell script, I am unable to grep the lines above and below the pattern. The output contains only the lines where the pattern matched, and it ends with an error saying that grep cannot open -C4.
The results are similar if I use -A4 and -B4

I'll assume you need a portable POSIX solution without the GNU extensions (-C NUM, -A NUM, and -B NUM are all GNU, as are arguments following the pattern and/or file name).
POSIX grep can't do this, but POSIX awk can. This can be invoked as e.g. grepC -C4 "ABC" Filename (assuming it is named "grepC", is executable, and is in your $PATH):
#!/bin/sh
die() { printf '%s\nUsage: %s [-C NUMBER] PATTERN [FILE]...\n' "$*" "$0" >&2; exit 2; }
CONTEXT=0 # default value
case $1 in
  -C ) CONTEXT="$2"; shift 2 ;; # extract "4" from "-C 4"
  -C* ) CONTEXT="${1#-C}"; shift ;; # extract "4" from "-C4"
  --|-) shift ;; # no args or use std input (implicit)
  -* ) [ -f "$1" ] || die "Illegal option '$1'" ;; # non-option non-file
esac
[ "$CONTEXT" -ge 0 ] 2>/dev/null || die "Invalid context '$CONTEXT'"
[ "$#" = 0 ] && die "Missing PATTERN"
PATTERN="$1"
shift
awk -v context="$CONTEXT" '
  /'"$PATTERN"'/ {
    after = context                     # lines still to print after this match
    for (i = context; i >= 1; i--)      # saved lines, oldest first
      if (i in last) print last[i]
    split("", last)                     # forget them so they are not printed twice
    print
    next
  }
  after > 0 { print; after--; next }    # trailing context, already printed
  { for (i = context; i > 1; i--) if ((i-1) in last) last[i] = last[i-1]; last[1] = $0 }
' "$@"
This sets up die as a fatal error function, then finds the desired lines of context from your arguments (either -C NUMBER or -CNUMBER), with an error for unsupported options (unless they're files).
If the context is not a number or there is no pattern, we again fatally error out.
Otherwise, we save the pattern, shift it away, and reserve the rest of the arguments for handing to awk as files ("$@"). The context count is passed to awk as the variable context.
There are three stanzas in this awk call:
Match the pattern itself. This requires ending the single-quoted portion of the string in order to splice in the $PATTERN variable (which may not behave correctly if imported via awk -v, since -v processes backslash escapes); as a consequence, any / in the pattern must be written as \/. Upon a match, we store the number of lines of context in the after variable, print the previous lines saved in the last array (oldest first, and only those that exist), and clear that array so that a nearby second match does not repeat them. We then print the matching line and skip to the next line without evaluating the other two stanzas.
If there was a recent match, we need the next few lines for context. As this stanza prints them, it decrements the counter. A new match (previous stanza) will reset that count.
We need to save previous lines for recalling upon a match. This shifts the saved lines up by one slot in the last array, dropping the oldest, and stores the current line ($0) in last[1].
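As a possible alternative to splicing $PATTERN into the awk program text, the pattern can travel through the environment and be read from ENVIRON, which leaves slashes and backslashes untouched. A minimal sketch, assuming a hypothetical helper named match_env and a hypothetical variable name GREPC_PATTERN (neither is part of the script above):

```shell
#!/bin/sh
# match_env PATTERN: print the lines of standard input that match PATTERN (an ERE).
# match_env and GREPC_PATTERN are illustrative names, not part of grepC.
match_env() {
    # The assignment prefix exports the variable to this one awk process only.
    GREPC_PATTERN="$1" awk '$0 ~ ENVIRON["GREPC_PATTERN"]'
}

# A pattern containing "/" needs no escaping here.
printf '%s\n' 'usr/bin' 'usr-bin' | match_env 'usr/bin'
```

The same idea could replace the /'"$PATTERN"'/ stanza header with $0 ~ ENVIRON["GREPC_PATTERN"], at the cost of one extra environment variable.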

Related

Loop over environment variables in POSIX sh

I need to loop over environment variables and get their names and values in POSIX sh (not bash). This is what I have so far.
#!/usr/bin/env sh
# Loop over each line from the env command
while read -r line; do
# Get the string before = (the var name)
name="${line%=*}"
eval value="\$$name"
echo "name: ${name}, value: ${value}"
done <<EOF
$(env)
EOF
It works most of the time, except when an environment variable contains a newline. I need it to work in that case.
I am aware of the -0 flag for env that separates variables with NUL instead of newlines, but if I use that flag, how do I loop over each variable? Edit: @chepner pointed out that POSIX env doesn't support -0, so that's out.
Any solution that uses portable linux utilities is good as long as it works in POSIX sh.
There is no way to parse the output of env with complete confidence; consider this output:
bar=3
baz=9
I can produce that with two different environments:
$ env -i "bar=3" "baz=9"
bar=3
baz=9
$ env -i "bar=3
> baz=9"
bar=3
baz=9
Is that two environment variables, bar and baz, with simple numeric values, or is it one variable bar with the value $'3\nbaz=9' (to use bash's ANSI quoting style)?
You can safely access the environment with POSIX awk, however, using the ENVIRON array. For example:
awk 'END { for (name in ENVIRON) {
print "Name is "name;
print "Value is "ENVIRON[name];
}
}' < /dev/null
With this command, you can distinguish between the two environments mentioned above.
$ env -i "bar=3" "baz=9" awk 'END { for (name in ENVIRON) { print "Name is "name; print "Value is "ENVIRON[name]; }}' < /dev/null
Name is baz
Value is 9
Name is bar
Value is 3
$ env -i "bar=3
> baz=9" awk 'END { for (name in ENVIRON) { print "Name is "name; print "Value is "ENVIRON[name]; }}' < /dev/null
Name is bar
Value is 3
baz=9
Maybe this would work?
#!/usr/bin/env sh
env | while IFS= read -r line
do
name="${line%%=*}"
indirect_presence="$(eval echo "\${$name+x}")"
[ -z "$name" ] || [ -z "$indirect_presence" ] || echo "name:$name, value:$(eval echo "\$$name")"
done
It is not bullet-proof: if the value of a variable contains a newline and one of its lines happens to look like an assignment, the loop can be confused.
The expansion uses %% to remove the longest match, so if a line contains several = signs, they should all be removed to leave only the variable name from the beginning of the line.
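The difference between the two suffix-removal expansions can be seen on a line that contains several = signs. This is only an illustration; the variable name and sample value are made up:

```shell
#!/bin/sh
# A made-up env-style line whose value itself contains "=" characters.
line='OPTS=a=b=c'

# % removes the shortest suffix matching "=*", so only the last "=c" goes.
printf '%s\n' "${line%=*}"

# %% removes the longest suffix matching "=*", leaving just the name.
printf '%s\n' "${line%%=*}"
```

The first printf prints OPTS=a=b and the second prints OPTS, which is why %% is the right choice for extracting the variable name.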
Here is an example based on the awk approach:
#!/bin/sh
for NAME in $(awk "END { for (name in ENVIRON) { print name; }}" < /dev/null)
do
# "%s" keeps a % inside the value from being treated as a format directive
VAL="$(awk "END { printf \"%s\", ENVIRON[\"$NAME\"]; }" < /dev/null)"
echo "$NAME=$VAL"
done

How to use awk for multiple file search in two directories, print records only from files with matching string in second directory

Remade a previous question so that it is more clear. I'm trying to search files in two directories and print matching character strings (+ line immediately following) into a new file from the second directory only if they match a record in the first directory. I have found similar examples but nothing quite the same. I don't know how to use awk for multiple files from different directories and I've tortured myself trying to figure it out.
Directory 1, 28,000 files, formatted viz.:
>ABC
KLSDFIOUWERMSDFLKSJDFKLSJDSFKGHGJSNDKMVMFHKSDJFS
>GHI
OOILKJSDFKJSDFLMOPIWERIOUEWIRWIOEHKJTSDGHLKSJDHGUIYIUSDVNSDG
Directory 2, 15 files, formatted viz.:
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>DEF
12341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
Desired output:
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
Directories 1 and 2 are located in my home directory: (./Test1 & ./Test2)
If anyone could advise on how to specify the different directories in the command, I'd be immensely grateful! Currently, when I include the file path (e.g., /Test1/*.fa), I get the following error:
awk: can't open file /Test1/*.fa
You'll want something like this (untested):
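awk '
FNR==1 {
    dirname = FILENAME
    sub("/.*","",dirname)
    if (NR==1) {
        dirname1 = dirname
    }
}
dirname == dirname1 {
    if (FNR % 2) {
        want[$0]    # remember each header line (">ABC") seen in the first directory
    }
    next
}
FNR % 2 {
    # header line in the second directory: print it once if the first directory has it
    hit = ($0 in want) && !seen[$0]++
}
hit                 # prints the header and the sequence line that follows it
' Test1/* Test2/*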
Given that you're getting the error message /usr/bin/awk: Argument list too long, which means you're exceeding the maximum argument length for a command, and that 28,000 of your files are in the Test1 directory, try this:
find Test1 -type f -exec cat {} + |
awk '
NR == FNR {
    if (FNR % 2) {
        want[$0]    # header lines from Test1, read from standard input
    }
    next
}
FNR % 2 {
    # header line in a Test2 file: print it once if Test1 has it
    hit = ($0 in want) && !seen[$0]++
}
hit                 # prints the header and the sequence line that follows it
' - Test2/*
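To check this kind of filter against the sample data from the question, the pipeline can be wrapped in a small function. This is a sketch under assumptions: filter_records is a hypothetical name, every file is assumed to alternate strictly between a header line and a sequence line, and the first directory is assumed to contain at least one non-empty file (otherwise NR == FNR would also be true for the first file of the second directory).

```shell
#!/bin/sh
# filter_records DIR1 DIR2 (hypothetical helper): print each header/sequence
# pair from the files in DIR2 whose header line also occurs in DIR1.
filter_records() {
    find "$1" -type f -exec cat {} + |
    awk '
        NR == FNR { if (FNR % 2) want[$0]; next }   # stdin: collect DIR1 headers
        FNR % 2   { hit = ($0 in want) }            # DIR2 header: wanted or not
        hit                                         # print header and its sequence
    ' - "$2"/*
}
```

Usage would look like filter_records Test1 Test2 > matches.fa, where matches.fa is an arbitrary output file name.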
Solution in TXR:
Data:
$ ls dir*
dir1:
file1 file2
dir2:
file1 file2
$ cat dir1/file1
>ABC
KLSDFIOUWERMSDFLKSJDFKLSJDSFKGHGJSNDKMVMFHKSDJFS
>GHI
OOILKJSDFKJSDFLMOPIWERIOUEWIRWIOEHKJTSDGHLKSJDHGUIYIUSDVNSDG
$ cat dir1/file2
>XYZ
SDOIWEUROIUOIWUEROIWUEROIWUEROIWUEROUIEIDIDIIDFIFI
>MNO
OOIWEPOIUWERHJSDHSDFJSHDF
$ cat dir2/file1
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>DEF
12341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
$ cat dir2/file2
>STP
12341234123412341234123412341234123412341234123412341234123412341234123412341234
>MNO
123412341234123412341234123412341234123412341234123412341234123412341234
$
Run:
$ txr filter.txr dir1/* dir2/*
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
>MNO
123412341234123412341234123412341234123412341234123412341234123412341234
Code in filter.txr:
@(bind want @(hash :equal-based))
@(next :args)
@(all)
@dir/@(skip)
@(and)
@ (repeat :gap 0)
@dir/@file
@ (next `@dir/@file`)
@ (repeat)
>@key
@ (do (set [want key] t))
@ (end)
@ (end)
@(end)
@(repeat)
@path
@ (next path)
@ (repeat)
>@key
@datum
@ (require [want key])
@ (output)
>@key
@datum
@ (end)
@ (end)
@(end)
To separate the dir1 paths from the rest, we use an @(all) match (try multiple pattern branches, which must all match) with two branches. The first branch matches one @dir/@(skip) pattern, binding the variable dir to the text that is followed by a slash and ignoring the rest. The second branch matches a whole consecutive sequence of @dir/@file patterns via @(repeat :gap 0). Because the dir variable already has a binding from the first branch of the all, this constrains the matches to the same directory name. Inside this repeat we recurse into each file via next and gather the >-delimited keys into the want hash. After that, we process the remaining arguments as path names of files to process; they don't all have to be in the same directory. We scan through each one for the >@key pattern followed by a line of @datum. The @(require ...) directive fails the match if key is not in the want hash; otherwise we fall through to the @(output).

How to log data of a call

I want to log data of asterisk command line. But the criteria is I want log data for calls separately, i.e. I want to log data for each call in separate file.
Is there a way to do that?
In case there is no built-in feature in Asterisk to do this, here is a bash solution:
#!/bin/bash
pathToLogFile=/path/to/log/file
logDirectory=/path/to/log/directory
callIdRegex='\[(C[0-9]{8})\]'   # ERE form of /\[(C\d{8})\]/
processed=0                     # number of lines handled so far
while true
do
    lineCount=$(( $(wc -l < "$pathToLogFile") ))
    if [ "$lineCount" -gt "$processed" ]; then
        # feed only the lines added since the previous pass
        sed -n "$((processed + 1)),${lineCount}p" "$pathToLogFile" |
        while IFS= read -r line; do
            if [[ $line =~ $callIdRegex ]]; then
                printf '%s\n' "$line" >> "$logDirectory/${BASH_REMATCH[1]}"
            fi
        done
        processed=$lineCount
    fi
    sleep 5
done
This is untested; treat it as an idea for solving the problem rather than a finished script.
The regular expression would normally be written /\[(C\d{8})\]/. Bash's [[ =~ ]] operator uses POSIX extended regular expressions, which have no \d, so the script writes it as \[(C[0-9]{8})\] and reads the captured call ID from BASH_REMATCH[1].
The idea is: remember how many lines of the log file the script has already processed, then check the line count of the log file. If there are more lines than the remembered count, walk through the new lines and extract the call ID from each one (format: C********, where each * is a digit; in words, a C followed by an 8-digit number). Then append the whole line to a log file whose name is the extracted call ID.
EDIT Information about the call id (don't mistake it with the caller id): https://wiki.asterisk.org/wiki/display/AST/Unique+Call-ID+Logging
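For a log that has already been written, the same splitting step can be done in a single awk pass instead of a polling loop. A minimal sketch, assuming call IDs of the form [C########] as described above and an output directory path without backslashes; split_by_call_id is a hypothetical name, and the pattern should be adjusted to whatever your log lines actually contain:

```shell
#!/bin/sh
# split_by_call_id DIR (hypothetical helper): read log lines on standard input
# and append each line that carries a call ID to the file DIR/<call ID>.
split_by_call_id() {
    awk -v dir="$1" '
        # Eight explicit [0-9] classes avoid relying on {8} interval support.
        match($0, /\[C[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\]/) {
            id = substr($0, RSTART + 1, RLENGTH - 2)   # drop the brackets
            file = dir "/" id
            print >> file
            close(file)    # keep the number of open files small
        }
    '
}
```

Lines without a call ID are skipped. Usage would look like split_by_call_id /path/to/log/directory < /path/to/log/file.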

How to get the search count for a particular string from each and every line in a file using Unix?

I am trying to search for a particular string in every line of a Unix file and write the offending records to an error file. Can someone let me know how I can improve my code, which is shown below? Also, please share your thoughts if you have a better solution.
v_filename=$1;
v_new_file="new_file";
v_error_file="error_file";
echo "The input file name is $var1"
while read line
do
echo "Testing $line"
v_cnt_check=`grep ',' $line | wc -l`
echo "Testing $v_cnt_check"
# if [ $v_cnt_check > 2 ]; then
# echo $line >> $v_error_file
# else
# echo $line >> $v_new_file
# fi
done < $v_filename
Input:
1,2,3
1,2,3,4
1,2,3
Output:
(New file)
1,2,3
1,2,3
(Error file)
1,2,3,4
awk -F ',' -v new_file="$v_new_file" -v err_file="$v_error_file" \
'BEGIN { OFS="," }
NF == 3 { print >new_file }
NF != 3 { print >err_file }' "$v_filename"
The first line sets the file name variables and sets the field separator to comma. The second line sets the output field separator to comma too. The third line prints lines with 3 fields to the new file; the fourth line prints lines with other than 3 fields to the error file.
Note that your code would be excruciatingly slow on big files because it executes two processes per line. This code has only one process operating on the whole file, which will be really important if the input grows to thousands or millions of lines or more.
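If the per-line count itself is wanted, as the question title suggests, awk can report it directly: gsub returns the number of replacements it made. A small sketch, with count_commas as a made-up helper name:

```shell
#!/bin/sh
# count_commas (hypothetical helper): prefix every input line with the number
# of commas it contains.
count_commas() {
    awk '{
        copy = $0
        n = gsub(/,/, "", copy)   # gsub returns how many commas were replaced
        print n ": " $0
    }'
}

printf '%s\n' '1,2,3' '1,2,3,4' | count_commas
```

For the sample input this prints 2: 1,2,3 and 3: 1,2,3,4.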
From the grep manpage:
General Output Control
-c, --count
Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option (see below), count non-matching lines. (-c is specified by POSIX.)
You could do something like:
grep --count "your pattern" "$v_filename"
to get the number of lines that match your pattern. Note that this counts lines, not occurrences: a line containing the pattern several times is counted once. The following pipeline prints the same number:
grep "your pattern" "$v_filename" | wc -l

Quoting command-line arguments in shell scripts

The following shell script takes a list of arguments, turns Unix paths into WINE/Windows paths and invokes the given executable under WINE.
#! /bin/sh
if [ "${1+set}" != "set" ]
then
echo "Usage; winewrap EXEC [ARGS...]"
exit 1
fi
EXEC="$1"
shift
ARGS=""
for p in "$@";
do
if [ -e "$p" ]
then
p=$(winepath -w $p)
fi
ARGS="$ARGS '$p'"
done
CMD="wine '$EXEC' $ARGS"
echo $CMD
$CMD
However, there's something wrong with the quotation of command-line arguments.
$ winewrap '/home/chris/.wine/drive_c/Program Files/Microsoft Research/Z3-1.3.6/bin/z3.exe' -smt /tmp/smtlib3cee8b.smt
Executing: wine '/home/chris/.wine/drive_c/Program Files/Microsoft Research/Z3-1.3.6/bin/z3.exe' '-smt' 'Z: mp\smtlib3cee8b.smt'
wine: cannot find ''/home/chris/.wine/drive_c/Program'
Note that:
The path to the executable is being chopped off at the first space, even though it is single-quoted.
The literal "\t" in the last path is being transformed into a tab character.
Obviously, the quotations aren't being parsed the way I intended by the shell. How can I avoid these errors?
EDIT: The "\t" is being expanded through two levels of indirection: first, "$p" (and/or "$ARGS") is being expanded into Z:\tmp\smtlib3cee8b.smt; then, \t is being expanded into the tab character. This is (seemingly) equivalent to
Y='y\ty'
Z="z${Y}z"
echo $Z
which yields
zy\tyz
and not
zy yz
UPDATE: eval "$CMD" does the trick. The "\t" problem seems to be echo's fault: "If the first operand is -n, or if any of the operands contain a backslash ( '\' ) character, the results are implementation-defined." (POSIX specification of echo)
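Since the behavior of echo with backslashes is implementation-defined, printf with a %s format is the portable way to print such a string unchanged. Using the same two variables as in the example above:

```shell
#!/bin/sh
Y='y\ty'
Z="z${Y}z"
# %s prints the argument as-is; only the format string is scanned for escapes.
printf '%s\n' "$Z"
```

This prints zy\tyz with a literal backslash and t, regardless of which shell or echo implementation is in use.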
bash’s arrays are unportable but the only sane way to handle argument lists in shell
The number of arguments is in ${#}
Bad stuff will happen with your script if there are filenames starting with a dash in the current directory
If the last line of your script just runs a program, and there are no traps on exit, you should exec it
With that in mind
#! /bin/bash
# push ARRAY arg1 arg2 ...
# adds arg1, arg2, ... to the end of ARRAY
function push() {
    local ARRAY_NAME="${1}"
    shift
    for ARG in "${@}"; do
        eval "${ARRAY_NAME}[\${#${ARRAY_NAME}[@]}]=\${ARG}"
    done
}
PROG="$(basename -- "${0}")"
if (( ${#} < 1 )); then
    # Error messages should state the program name and go to stderr
    echo "${PROG}: Usage: winewrap EXEC [ARGS...]" 1>&2
    exit 1
fi
# run the executable under wine, as the script in the question does
EXEC=(wine "${1}")
shift
for p in "${@}"; do
    if [ -e "${p}" ]; then
        p="$(winepath -w -- "${p}")"
    fi
    push EXEC "${p}"
done
exec "${EXEC[@]}"
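If bash is not available, a similar effect can be had in plain POSIX sh by rebuilding the positional parameters in place with set --, since "$@" is the one list the shell keeps word-safe. A sketch under assumptions: to_win is a hypothetical stand-in for winepath -w so that the example runs anywhere, run_converted is a made-up name, and the helper variables are global because POSIX sh has no local.

```shell
#!/bin/sh
# to_win PATH: hypothetical stand-in for `winepath -w`; swap in the real command.
to_win() { printf 'Z:%s\n' "$1" | tr / '\\'; }

# run_converted CMD ARG...: convert every ARG that names an existing path,
# then run CMD with the converted arguments, each still a single word.
run_converted() {
    cmd=$1; shift
    n=$#
    while [ "$n" -gt 0 ]; do
        p=$1; shift                       # take the first remaining argument
        if [ -e "$p" ]; then p=$(to_win "$p"); fi
        set -- "$@" "$p"                  # append it at the end
        n=$((n - 1))                      # after $# rounds the order is restored
    done
    "$cmd" "$@"
}
```

With the real tools, the last line of a wrapper could then read run_converted wine "$@", leaving quoting entirely to the shell.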
If you do want to keep the assignment to CMD, you should use
eval "$CMD"
instead of just $CMD in the last line of your script. This should solve your problem with spaces in the paths; I don't know what to do about the "\t" problem.
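Replace the last line, $CMD, with just
wine "$EXEC" $ARGS
Double quotes are needed here because $EXEC is not expanded inside single quotes. The arguments in $ARGS still go through word splitting with their literal quote characters, so this only fixes the path to the executable.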
You'll note that the error is ''/home/chris/.wine/drive_c/Program' and not '/home/chris/.wine/drive_c/Program'
The single quotes are not being interpolated properly, and the string is being split by spaces.
You can try preceding the spaces with \ like so:
/home/chris/.wine/drive_c/Program\ Files/Microsoft\ Research/Z3-1.3.6/bin/z3.exe
You can also do the same with your \t problem - replace it with \\t.
