AWK print unexpected newline or end of string inside shell

I have a shell script which is trying to trim characters from the end of each line in a file, but I always get an error.
Shell Script:
AWK_EXPRESSION='{if(length>'"$RANGE1"'){ print substr('"$0 "',0, length-'"$RANGE2"'}) } else { print '"$0 "'} }'
for report in ${ACTUAL_TARGET_FOLDER}/* ; do
awk $AWK_EXPRESSION $report > $target_file
done
If I trigger the AWK command, I get "unexpected newline or end of string" near print.
What am I missing?

Why are you trying to store the awk body in a shell variable? Just use awk and the -v option to pass a shell value into an awk variable:
awk -v range1="$RANGE1" -v range2="$RANGE2" '{
    if (length > range1) {
        print substr($0, 0, length-range2)
    } else {
        print
    }
}' "$ACTUAL_TARGET_FOLDER"/* > "$target_file"
Add a few newlines to help readability.
Get out of the habit of using ALLCAPS variable names, leave those as reserved by the shell. One day you'll write PATH=something and then wonder why your script is broken.
Unquoted variables are subject to word splitting and glob expansion. Use double quotes for all your variables unless you know what specific side-effect you want to use.
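A quick illustration of why the quoting advice matters (the matched filenames here are made up):
$ var='*.txt'
$ echo $var
a.txt b.txt
$ echo "$var"
*.txt
Unquoted, the shell expands the glob and splits the value on whitespace; quoted, the value passes through untouched. That is also exactly what goes wrong with awk $AWK_EXPRESSION above: the program text is split into several arguments before awk ever sees it.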

I would recommend writing the AWK program using AWK variables instead of interpolating shell variables into it. You can pass values into awk variables with the -v command-line option.
Also, awk permits using white space to make the program readable, just like other programming languages. Like this:
AWK_EXPRESSION='{
    if (length > RANGE1) {
        print substr($0, 1, length-RANGE2)
    } else {
        print
    }
}'
for report in "${ACTUAL_TARGET_FOLDER}"/* ; do
awk -v RANGE1="$RANGE1" -v RANGE2="$RANGE2" "$AWK_EXPRESSION" "$report" > "$target_file"
done
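For example, with the made-up values RANGE1=5 and RANGE2=3, any line longer than 5 characters loses its last 3 characters:
$ printf 'abcdefgh\nabc\n' | awk -v RANGE1=5 -v RANGE2=3 "$AWK_EXPRESSION"
abcde
abc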


Loop over environment variables in POSIX sh

I need to loop over environment variables and get their names and values in POSIX sh (not bash). This is what I have so far.
#!/usr/bin/env sh
# Loop over each line from the env command
while read -r line; do
    # Get the string before = (the var name)
    name="${line%=*}"
    eval value="\$$name"
    echo "name: ${name}, value: ${value}"
done <<EOF
$(env)
EOF
It works most of the time, except when an environment variable contains a newline. I need it to work in that case.
I am aware of the -0 flag for env that separates variables with NUL instead of newlines, but if I use that flag, how do I loop over each variable? Edit: @chepner pointed out that POSIX env doesn't support -0, so that's out.
Any solution that uses portable Linux utilities is good as long as it works in POSIX sh.
There is no way to parse the output of env with complete confidence; consider this output:
bar=3
baz=9
I can produce that with two different environments:
$ env -i "bar=3" "baz=9"
bar=3
baz=9
$ env -i "bar=3
> baz=9"
bar=3
baz=9
Is that two environment variables, bar and baz, with simple numeric values, or is it one variable bar with the value $'3\nbaz=9' (to use bash's ANSI quoting style)?
You can safely access the environment with POSIX awk, however, using the ENVIRON array. For example:
awk 'END {
    for (name in ENVIRON) {
        print "Name is " name;
        print "Value is " ENVIRON[name];
    }
}' < /dev/null
With this command, you can distinguish between the two environments mentioned above.
$ env -i "bar=3" "baz=9" awk 'END { for (name in ENVIRON) { print "Name is "name; print "Value is "ENVIRON[name]; }}' < /dev/null
Name is baz
Value is 9
Name is bar
Value is 3
$ env -i "bar=3
> baz=9" awk 'END { for (name in ENVIRON) { print "Name is "name; print "Value is "ENVIRON[name]; }}' < /dev/null
Name is bar
Value is 3
baz=9
Maybe this would work?
#!/usr/bin/env sh
env | while IFS= read -r line
do
    name="${line%%=*}"
    indirect_presence="$(eval echo "\${$name+x}")"
    [ -z "$name" ] || [ -z "$indirect_presence" ] || echo "name:$name, value:$(eval echo "\$$name")"
done
It is not bullet-proof: if the value of some variable contains a newline followed by text that looks like an assignment, it can be confused.
The expansion uses %% to remove the longest suffix match, so if a line contains several = signs, everything from the first = onward is removed, leaving only the variable name from the beginning of the line.
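For example:
line='name=value=extra'
echo "${line%%=*}"   # name        (longest match: everything from the first = is removed)
echo "${line%=*}"    # name=value  (shortest match: only the last = and what follows is removed)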
Here is an example based on the awk approach:
#!/bin/sh
for NAME in $(awk "END { for (name in ENVIRON) { print name; }}" < /dev/null)
do
    # printf "%s" avoids treating the value itself as a format string;
    # note that the command substitution still strips trailing newlines
    VAL="$(awk "END { printf \"%s\", ENVIRON[\"$NAME\"]; }" < /dev/null)"
    echo "$NAME=$VAL"
done

Replace a string which is present on first line in UNIX file

I would like to replace a string on the first line only, even though it appears on the rest of the lines in the file as well. How can I do that through a shell script? My code is below. I am extracting the first line from the file, but after that I am not sure how to do the replacement. Any help would be appreciated. Thanks.
To clarify: I would like to replace a string present in $line and write the new line back into the same file at the same place.
Code:
while read line
do
    if [[ $v_counter == 0 ]]; then
        echo "$line"
        v_counter=$(($v_counter + 1));
    fi
done < "$v_Full_File_Nm"
Sample data:
Input
BUXT_CMPID|MEDICAL_RECORD_NUM|FACILITY_ID|PATIENT_LAST_NAME|PATIENT_FIRST_NAME|HOME_ADDRESS_LINE_1|HOME_ADDRESS_LINE_2|HOME_CITY|HOME_STATE|HOME_ZIP|MOSAIC_CODE|MOSAIC_DESC|DRIVE_TIME| buxt_pt_apnd_20140624_head_5records.txt
100106086|5000120878|7141|HARRIS|NEDRA|6246 PARALLEL PKWY||KANSAS CITY|KS|66102|S71|Tough Times|2|buxt_pt_apnd_20140624_head_5records.txt
Output
BUXT_CMPID|MEDICAL_RECORD_NUM|FACILITY_ID|PATIENT_LAST_NAME|PATIENT_FIRST_NAME|HOME_ADDRESS_LINE_1|HOME_ADDRESS_LINE_2|HOME_CITY|HOME_STATE|HOME_ZIP|MOSAIC_CODE|MOSAIC_DESC|DRIVE_TIME| SRC_FILE_NM
100106086|5000120878|7141|HARRIS|NEDRA|6246 PARALLEL PKWY||KANSAS CITY|KS|66102|S71|Tough Times|2|buxt_pt_apnd_20140624_head_5records.txt
From the above sample data I need to replace buxt_pt_apnd_20140624_head_5records.txt with the string SRC_FILE_NM, on the first line only.
Why not use sed?
sed -e '1s/fred/frog/' yourfile
will replace fred with frog on line 1.
If your 'string' is a variable, you can do this to get the variable expanded:
sed -e "1s/$varA/$varB/" yourfile
If you want to do it in place and change your file, add -i before -e.
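One caveat worth adding when variables are involved: if $varA or $varB may contain a /, pick a different delimiter, since sed accepts almost any character in that role:
sed -e "1s|$varA|$varB|" yourfile
Also note that $varA is still interpreted as a regular expression, so characters like ., *, [ and \ need escaping if they are meant literally (which is the point the index-based awk answer below makes).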
awk -v old="string1" -v new="string2" '
    NR==1 && (idx=index($0,old)) {
        $0 = substr($0,1,idx-1) new substr($0,idx+length(old))
    }
    1' file > /usr/tmp/tmp$$ && mv /usr/tmp/tmp$$ file
The above will replace string1 with string2 only when it appears in the first line of file.
Any solution posted that uses awk but does not use index will not work in general, and the same goes for any solution posted that uses sed. The reason is that those operate on regular expressions, not strings, and so behave undesirably for string replacement depending on which characters are present in string1.
It looks like the OP's going with a sed RE-replacement solution, so this is just for anyone else looking to replace a string. Here's what a string replacement function would look like if you'd rather not have it inline:
awk -v old="string1" -v new="string2" '
    function strsub(old, new, tgt,    idx) {
        if ( idx = index(tgt,old) ) {
            tgt = substr(tgt,1,idx-1) new substr(tgt,idx+length(old))
        }
        return tgt
    }
    NR==1 { $0 = strsub(old,new,$0) }
    1' file
A bash solution:
file="afile.txt"
str="hello"
repl="goodbye"
IFS= read -r line < "$file"
line=${line/$str/$repl}
tmpfile="/usr/tmp/$file.$$.tmp"
{
    echo "$line"
    tail -n+2 "$file"
} > "$tmpfile" && mv "$tmpfile" "$file"
Note that $str above will be interpreted as a "pattern" (a simple kind of regex) where * matches any number of any characters, ? matches any single character, [abc] matches any one of the characters in the brackets, and [^abc] (or [!abc]) matches any one character not in the brackets. See Pattern-Matching
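If you want $str to be taken literally rather than as a pattern, bash lets you quote the expansion inside the substitution:
line=${line/"$str"/$repl}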

awk syntax to invoke function with argument read from a file

I have a function
xyz()
{
    x=$(($1 * 2))
    echo "$x"
}
Then I want to use it from awk to replace a particular column in a CSV file.
File input.csv:
abc,2,something
def,3,something1
I want output like:
abc,4,something
def,6,something1
Command used:
cat input.csv|awk -F, -v v="'"`xyz "$2""'" 'BEGIN {FS=","; OFS=","} {$2=v1; print $0}'
This opens input.csv, calls the function xyz with the file's 2nd field as the argument, and stores the result back into position 2 of the file, but it is not working!
If I put a constant in place of $2 when calling the function, it works:
cat input.csv|awk -F, -v v="'"`xyz "14""'" 'BEGIN {FS=","; OFS=","} {$2=v1; print $0}'
This line of code works, calling the xyz function and putting the result back into the 2nd column of input.csv, but only for 14*2, since 14 is a constant.
Please help me to do this.
There's a back-quote missing from your command line, and a UUOC (Useless Use of Cat), and a mismatch between variable v on the command line and v1 in the awk program:
cat input.csv|awk -F, -v v="'"`xyz "$2""'" 'BEGIN {FS=","; OFS=","} {$2=v1; print $0}'
^ Here ^ Here ^ Here
That should be written using $(…) instead:
awk -F, -v v="'$(xyz "$2")'" 'BEGIN {FS=","; OFS=","} {$2=v; print $0}' input.csv
This leaves you with a problem, though; the function xyz is invoked once by the shell before you start your awk script running, and is never invoked by awk. You simply can't do it that way. However, you can define your function in awk (and on the fly):
awk -F, 'BEGIN { FS = ","; OFS = "," }
function xyz(a) { return a * 2 }
{ $2 = xyz($2); print $0 }' \
input.csv
For your two-line input file, it produces your desired output.
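If you really do need to call an external command per line from awk, you can pipe from it with getline. A rough sketch, assuming xyz has been turned into a standalone script (here called xyz.sh, a hypothetical name) that prints its result:
awk -F, -v OFS=, '{
    cmd = "./xyz.sh " $2   # xyz.sh is hypothetical; build the command line
    cmd | getline result   # run the command and read one line of its output
    close(cmd)             # close it so a fresh command runs for the next line
    $2 = result
    print
}' input.csv
This spawns a shell per line, so the self-contained awk function above is much faster.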

grep for a string in a line if the previous line doesn't contain a specific string

I have the following lines in a file:
abcdef ghi jkl
uvw xyz
I want to grep for the string "xyz", but only if the previous line does not contain the string "jkl".
I know how to grep for a string if the line doesn't contain a specific string, using the -v option. But I don't know how to do this across different lines.
grep is really a line-oriented tool. It might be possible to achieve what you want with it, but it's easier to use Awk:
awk '
    /xyz/ && !skip { print }
    { skip = /jkl/ }
' file
Read as: for every line, do
if the current line matches xyz and we haven't just seen jkl, print it;
set the variable skip to indicate whether we've just seen jkl.
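With the question's sample input, the jkl line immediately precedes the xyz line, so nothing is printed:
$ printf 'abcdef ghi jkl\nuvw xyz\n' | awk '/xyz/ && !skip { print }; { skip = /jkl/ }'
$
Remove the jkl from the first line, and uvw xyz would be printed.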
sed '/jkl/{N;d}; /xyz/!d'
If jkl is found, delete that line and the next one;
then print only the remaining lines that contain xyz.
I think you're better off using an actual programming language, even a simple one like Bash or AWK or sed. For example, using Bash:
(
    previous_line_matched=
    while IFS= read -r line ; do
        if [[ ! "$previous_line_matched" && "$line" == *xyz* ]] ; then
            echo "$line"
        fi
        if [[ "$line" == *jkl* ]] ; then
            previous_line_matched=1
        else
            previous_line_matched=
        fi
    done < input_file
)
Or, more tersely, using Perl:
perl -ne 'print if m/xyz/ && ! $skip; $skip = m/jkl/' < input_file

Removing trailing / starting newlines with sed, awk, tr, and friends

I would like to remove all of the empty lines from a file, but only when they are at the end/start of a file (that is, if there are no non-empty lines before them, at the start; and if there are no non-empty lines after them, at the end.)
Is this possible outside of a fully-featured scripting language like Perl or Ruby? I’d prefer to do this with sed or awk if possible. Basically, any light-weight and widely available UNIX-y tool would be fine, especially one I can learn more about quickly (Perl, thus, not included.)
From Useful one-line scripts for sed:
# Delete all leading blank lines at top of file (only).
sed '/./,$!d' file
# Delete all trailing blank lines at end of file (only).
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file
Therefore, to remove both leading and trailing blank lines from a file, you can combine the above commands into:
sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' file
So I'm going to borrow part of @dogbane's answer for this, since that sed line for removing the leading blank lines is so short...
tac is part of coreutils, and reverses a file. So do it twice:
tac file | sed -e '/./,$!d' | tac | sed -e '/./,$!d'
It's certainly not the most efficient, but unless you need efficiency, I find it more readable than everything else so far.
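For instance:
$ printf '\n\nfoo\n\nbar\n\n\n' | tac | sed -e '/./,$!d' | tac | sed -e '/./,$!d'
foo

bar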
Here's a one-pass solution in awk: it does not start printing until it sees a non-empty line, and when it sees an empty line, it remembers it until the next non-empty line.
awk '
    /[[:graph:]]/ {
        # a non-empty line:
        # set the flag to begin printing lines
        p=1
        # print the accumulated "interior" empty lines
        for (i=1; i<=n; i++) print ""
        n=0
        # then print this line
        print
    }
    p && /^[[:space:]]*$/ {
        # a potentially "interior" empty line. remember it.
        n++
    }
' filename
Note that, due to the mechanism I'm using to distinguish empty/non-empty lines (with [[:graph:]] and /^[[:space:]]*$/), interior lines containing only whitespace will be truncated to become truly empty.
As mentioned in another answer, tac is part of coreutils, and reverses a file. Combining the idea of doing it twice with the fact that command substitution will strip trailing new lines, we get
echo "$(echo "$(tac "$filename")" | tac)"
which doesn't depend on sed. You can use echo -n to strip the remaining trailing newline off.
Here's an adapted sed version, which also considers "empty" those lines with just spaces and tabs on it.
sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
It's basically the accepted answer version (considering BryanH's comment), but the dot . in the first command was changed to [^[:blank:]] (anything not blank) and the \n inside the second command's address was changed to [[:space:]] to allow newlines, spaces and tabs.
An alternative version, without using the POSIX classes, but your sed must support inserting \t and \n inside […]. GNU sed does, BSD sed doesn't.
sed -e :a -e '/[^\t ]/,$!d; /^[\n\t ]*$/{ $d; N; ba' -e '}'
Testing:
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n'
foo
foo
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -n l
$
\t $
$
foo$
$
foo$
$
\t $
$
prompt$ printf '\n \t \n\nfoo\n\nfoo\n\n \t \n\n' | sed -e :a -e '/[^[:blank:]]/,$!d; /^[[:space:]]*$/{ $d; N; ba' -e '}'
foo
foo
prompt$
Using awk:
awk '{ a[NR]=$0; if ($0 && !s) s=NR; }
    END {
        e=NR;
        for (i=NR; i>1; i--)
            if (a[i]) { e=i; break; }
        for (i=s; i<=e; i++)
            print a[i];
    }' yourFile
This can be solved easily with GNU sed's -z option, which reads the whole input as a single record:
sed -rz 's/^\n+//; s/\n+$/\n/g' file
For an input file with blank lines before and after the text, the output is just the text:
Hello
Welcome to
Unix and Linux
For an efficient non-recursive version of the trailing-newline strip (including whitespace-only lines) I've developed this sed script.
sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'
It uses the hold buffer to store all blank lines and prints them only after it finds a non-blank line. Should someone want only the newlines, it's enough to get rid of the two [[:space:]]* parts:
sed -n '/^$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^$/H'
I've tried a simple performance comparison with the well-known recursive script
sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba'
on a 3MB file with 1MB of random blank lines around a random base64 text.
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M > bigfile
base64 </dev/urandom | dd bs=1 count=1M >> bigfile
shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M >> bigfile
The streaming script took roughly 0.5 seconds to complete; the recursive one hadn't finished after 15 minutes. Win :)
For completeness' sake, the sed script that strips the leading blank lines is already streaming fine. Use whichever is most suitable for you.
sed '/[^[:blank:]]/,$!d'
sed '/./,$!d'
Using bash:
$ filecontent=$(<file)
$ echo "${filecontent/$'\n'}"
Note that $(<file) already strips all trailing newlines, and the substitution removes only the first newline in the content, wherever it occurs, so this really only handles a file with a single leading blank line.
In bash, using cat, wc, grep, sed, tail and head:
# number of first line that contains non-empty character
i=`grep -n "^[^\B*]" <your_file> | sed -e 's/:.*//' | head -1`
# number of the last one
j=`grep -n "^[^\B*]" <your_file> | sed -e 's/:.*//' | tail -1`
# overall number of lines:
k=`cat <your_file> | wc -l`
# how much empty lines at the end of file we have?
m=$(($k-$j))
# let strip last m lines!
cat <your_file> | head -n-$m
# now we have to strip first i lines and we are done 8-)
cat <your_file> | tail -n+$i
Man, it's definitely worth learning a "real" programming language to avoid this ugliness!
@dogbane has a nice simple answer for removing leading empty lines. Here's a simple awk command which removes just the trailing lines. Use this with @dogbane's sed command to remove both leading and trailing blanks.
awk '{ LINES=LINES $0 "\n"; } /./ { printf "%s", LINES; LINES=""; }'
This is pretty simple in operation.
Add every line to a buffer as we read it.
For every line which contains a character, print the contents of the buffer and then clear it.
So the only things that get buffered and never displayed are any trailing blanks.
I used printf instead of print to avoid the automatic addition of a newline, since I'm using newlines to separate the lines in the buffer already.
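For example, trailing blank lines are dropped while interior ones survive:
$ printf 'a\n\nb\n\n\n' | awk '{ LINES=LINES $0 "\n"; } /./ { printf "%s", LINES; LINES=""; }'
a

b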
This AWK script will do the trick:
BEGIN {
    ne=0;
}
/^[[:space:]]*$/ {
    ne++;
}
/[^[:space:]]+/ {
    for (i=0; i < ne; i++)
        print "";
    ne=0;
    print
}
The idea is simple: empty lines are not echoed immediately. Instead, we wait until we get a non-empty line, and only then do we first echo out as many empty lines as were seen before it, and then echo the new non-empty line.
perl -0pe 's/^\n+|\n+(\n)$/\1/gs'
Here's an awk version that removes trailing blank lines (both empty lines and lines consisting of nothing but white space).
It is memory efficient; it does not read the entire file into memory.
awk '/^[[:space:]]*$/ {b=b $0 "\n"; next;} {printf "%s",b; b=""; print;}'
The b variable buffers up the blank lines; they get printed when a non-blank line is encountered. When EOF is encountered, they don't get printed. That's how it works.
If using GNU awk, [[:space:]] can be replaced with \s. (See the full list of gawk-specific regexp operators.)
If you want to remove only those trailing lines that are empty, see @AndyMortimer's answer.
A bash solution.
Note: Only useful if the file is small enough to be read into memory at once.
[[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"
$(<file) reads the entire file and trims trailing newlines, because command substitution ($(....)) implicitly does that.
=~ is bash's regular-expression matching operator, and =~ ^$'\n'*(.*)$ optionally matches any leading newlines (greedily), and captures whatever comes after. Note the potentially confusing $'\n', which inserts a literal newline using ANSI C quoting, because escape sequence \n is not supported.
Note that this particular regex always matches, so the command after && is always executed.
The special array variable BASH_REMATCH contains the results of the most recent regex match, and array element [1] contains what the (first and only) parenthesized subexpression (capture group) captured, which is the input string with any leading newlines stripped. The net effect is that ${BASH_REMATCH[1]} contains the input file content with both leading and trailing newlines stripped.
Note that printing with echo adds a single trailing newline. If you want to avoid that, use echo -n instead (or use the more portable printf '%s').
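For example:
$ printf '\n\nhello\nworld\n\n\n' > file
$ [[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"
hello
world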
I'd like to introduce another variant for gawk v4.1+
result=($(gawk '
    BEGIN {
        lines_count = 0;
        empty_lines_in_head = 0;
        empty_lines_in_tail = 0;
    }
    /[^[:space:]]/ {
        found_not_empty_line = 1;
        empty_lines_in_tail = 0;
    }
    /^[[:space:]]*$/ {
        if ( found_not_empty_line ) {
            empty_lines_in_tail++;
        } else {
            empty_lines_in_head++;
        }
    }
    {
        lines_count++;
    }
    END {
        print (empty_lines_in_head " " empty_lines_in_tail " " lines_count);
    }
' "$file"))
empty_lines_in_head=${result[0]}
empty_lines_in_tail=${result[1]}
lines_count=${result[2]}

if [ $empty_lines_in_head -gt 0 ] || [ $empty_lines_in_tail -gt 0 ]; then
    echo "Removing whitespace from \"$file\""
    eval "gawk -i inplace '
        {
            if ( NR > $empty_lines_in_head && NR <= $(($lines_count - $empty_lines_in_tail)) ) {
                print
            }
        }
    ' \"$file\""
fi
Because I was writing a bash script anyway containing some functions, I found it convenient to write these:
function strip_leading_empty_lines()
{
    # read and discard lines until the first non-empty one, then pass the rest through
    while IFS= read -r line; do
        if [ -n "$line" ]; then
            echo "$line"
            break
        fi
    done
    cat
}

function strip_trailing_empty_lines()
{
    # buffer empty lines; flush the buffer whenever a non-empty line arrives,
    # so trailing empty lines are never printed
    acc=""
    while IFS= read -r line; do
        acc+="$line"$'\n'
        if [ -n "$line" ]; then
            echo -n "$acc"
            acc=""
        fi
    done
}
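They are meant to be used as filters, for example:
strip_leading_empty_lines < file | strip_trailing_empty_lines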
