How do I manipulate $PATH elements in shell scripts? - unix

Is there an idiomatic way of removing elements from PATH-like shell variables?
That is I want to take
PATH=/home/joe/bin:/usr/local/bin:/usr/bin:/bin:/path/to/app/bin:.
and remove or replace the /path/to/app/bin without clobbering the rest of the variable. Extra points for allowing me put new elements in arbitrary positions. The target will be recognizable by a well defined string, and may occur at any point in the list.
I know I've seen this done, and can probably cobble something together on my own, but I'm looking for a nice approach. Portability and standardization a plus.
I use bash, but examples are welcome in your favorite shell as well.
The context here is one of needing to switch conveniently between multiple versions (one for doing analysis, another for working on the framework) of a large scientific analysis package which produces a couple dozen executables, has data stashed around the filesystem, and uses environment variables to help find all this stuff. I would like to write a script that selects a version, and need to be able to remove the $PATH elements relating to the currently active version and replace them with the same elements relating to the new version.
This is related to the problem of preventing repeated $PATH elements when re-running login scripts and the like.
Previous similar question: How to keep from duplicating path variable in csh
Subsequent similar question: What is the most elegant way to remove a path from the $PATH variable in Bash?

Addressing the proposed solution from dmckee:
While some versions of Bash may allow hyphens in function names, others (MacOS X) do not.
I don't see a need to use return immediately before the end of the function.
I don't see the need for all the semi-colons.
I don't see why you have path-element-by-pattern export a value. Think of export as equivalent to setting (or even creating) a global variable - something to be avoided whenever possible.
I'm not sure what you expect 'replace-path PATH $PATH /usr' to do, but it does not do what I would expect.
Consider a PATH value that starts off containing:
.
/Users/jleffler/bin
/usr/local/postgresql/bin
/usr/local/mysql/bin
/Users/jleffler/perl/v5.10.0/bin
/usr/local/bin
/usr/bin
/bin
/sw/bin
/usr/sbin
/sbin
The result I got (from 'replace-path PATH $PATH /usr') is:
.
/Users/jleffler/bin
/local/postgresql/bin
/local/mysql/bin
/Users/jleffler/perl/v5.10.0/bin
/local/bin
/bin
/bin
/sw/bin
/sbin
/sbin
I would have expected to get my original path back since /usr does not appear as a (complete) path element, only as part of a path element.
This can be fixed in replace-path by modifying one of the sed commands:
export $path=$(echo -n $list | tr ":" "\n" | sed "s:^$removestr\$:$replacestr:" |
tr "\n" ":" | sed "s|::|:|g")
I used ':' instead of '|' to separate parts of the substitute since '|' could (in theory) appear in a path component, whereas by definition of PATH, a colon cannot. I observe that the second sed could eliminate the current directory from the middle of a PATH. That is, a legitimate (though perverse) value of PATH could be:
PATH=/bin::/usr/local/bin
After processing, the current directory would no longer be on the PATH.
A similar change to anchor the match is appropriate in path-element-by-pattern:
export $target=$(echo -n $list | tr ":" "\n" | grep -m 1 "^$pat\$")
I note in passing that grep -m 1 is not standard (it is a GNU extension, also available on MacOS X). And, indeed, the -n option for echo is also non-standard; you would be better off simply deleting the trailing colon that is added by virtue of converting the newline from echo into a colon. Since path-element-by-pattern is used just once and has undesirable side-effects (it clobbers any pre-existing exported variable called $removestr), it can be replaced sensibly by its body. This, along with more liberal use of quotes to avoid problems with spaces or unwanted file name expansion, leads to:
# path_tools.bash
#
# A set of tools for manipulating ":" separated lists like the
# canonical $PATH variable.
#
# /bin/sh compatibility can probably be regained by replacing $( )
# style command expansion with ` ` style
###############################################################################
# Usage:
#
# To remove a path:
# replace_path PATH $PATH /exact/path/to/remove
# replace_path_pattern PATH $PATH <grep pattern for target path>
#
# To replace a path:
# replace_path PATH $PATH /exact/path/to/remove /replacement/path
# replace_path_pattern PATH $PATH <target pattern> /replacement/path
#
###############################################################################
# Remove or replace an element of $1
#
# $1 name of the shell variable to set (e.g. PATH)
# $2 a ":" delimited list to work from (e.g. $PATH)
# $3 the precise string to be removed/replaced
# $4 the replacement string (use "" for removal)
function replace_path () {
path=$1
list=$2
remove=$3
replace=$4 # Allowed to be empty or unset
export $path=$(echo "$list" | tr ":" "\n" | sed "s:^$remove\$:$replace:" |
tr "\n" ":" | sed 's|:$||')
}
# Remove or replace an element of $1
#
# $1 name of the shell variable to set (e.g. PATH)
# $2 a ":" delimited list to work from (e.g. $PATH)
# $3 a grep pattern identifying the element to be removed/replaced
# $4 the replacement string (use "" for removal)
function replace_path_pattern () {
path=$1
list=$2
removepat=$3
replacestr=$4 # Allowed to be empty or unset
removestr=$(echo "$list" | tr ":" "\n" | grep -m 1 "^$removepat\$")
replace_path "$path" "$list" "$removestr" "$replacestr"
}
I have a Perl script called echopath which I find useful when debugging problems with PATH-like variables:
#!/usr/bin/perl -w
#
# "@(#)$Id: echopath.pl,v 1.7 1998/09/15 03:16:36 jleffler Exp $"
#
# Print the components of a PATH variable one per line.
# If there are no colons in the arguments, assume that they are
# the names of environment variables.
@ARGV = $ENV{PATH} unless @ARGV;
foreach $arg (@ARGV)
{
$var = $arg;
$var = $ENV{$arg} if $arg =~ /^[A-Za-z_][A-Za-z_0-9]*$/;
$var = $arg unless $var;
@lst = split /:/, $var;
foreach $val (@lst)
{
print "$val\n";
}
}
When I run the modified solution on the test code below:
echo
xpath=$PATH
replace_path xpath $xpath /usr
echopath $xpath
echo
xpath=$PATH
replace_path_pattern xpath $xpath /usr/bin /work/bin
echopath xpath
echo
xpath=$PATH
replace_path_pattern xpath $xpath "/usr/.*/bin" /work/bin
echopath xpath
The output is:
.
/Users/jleffler/bin
/usr/local/postgresql/bin
/usr/local/mysql/bin
/Users/jleffler/perl/v5.10.0/bin
/usr/local/bin
/usr/bin
/bin
/sw/bin
/usr/sbin
/sbin
.
/Users/jleffler/bin
/usr/local/postgresql/bin
/usr/local/mysql/bin
/Users/jleffler/perl/v5.10.0/bin
/usr/local/bin
/work/bin
/bin
/sw/bin
/usr/sbin
/sbin
.
/Users/jleffler/bin
/work/bin
/usr/local/mysql/bin
/Users/jleffler/perl/v5.10.0/bin
/usr/local/bin
/usr/bin
/bin
/sw/bin
/usr/sbin
/sbin
This looks correct to me - at least, for my definition of what the problem is.
I note that echopath LD_LIBRARY_PATH evaluates $LD_LIBRARY_PATH. It would be nice if your functions were able to do that, so the user could type:
replace_path PATH /usr/bin /work/bin
That can be done by using:
list=$(eval echo '$'$path)
This leads to this revision of the code:
# path_tools.bash
#
# A set of tools for manipulating ":" separated lists like the
# canonical $PATH variable.
#
# /bin/sh compatibility can probably be regained by replacing $( )
# style command expansion with ` ` style
###############################################################################
# Usage:
#
# To remove a path:
# replace_path PATH /exact/path/to/remove
# replace_path_pattern PATH <grep pattern for target path>
#
# To replace a path:
# replace_path PATH /exact/path/to/remove /replacement/path
# replace_path_pattern PATH <target pattern> /replacement/path
#
###############################################################################
# Remove or replace an element of $1
#
# $1 name of the shell variable to set (e.g. PATH)
# $2 the precise string to be removed/replaced
# $3 the replacement string (use "" for removal)
function replace_path () {
path=$1
list=$(eval echo '$'$path)
remove=$2
replace=$3 # Allowed to be empty or unset
export $path=$(echo "$list" | tr ":" "\n" | sed "s:^$remove\$:$replace:" |
tr "\n" ":" | sed 's|:$||')
}
# Remove or replace an element of $1
#
# $1 name of the shell variable to set (e.g. PATH)
# $2 a grep pattern identifying the element to be removed/replaced
# $3 the replacement string (use "" for removal)
function replace_path_pattern () {
path=$1
list=$(eval echo '$'$path)
removepat=$2
replacestr=$3 # Allowed to be empty or unset
removestr=$(echo "$list" | tr ":" "\n" | grep -m 1 "^$removepat\$")
replace_path "$path" "$removestr" "$replacestr"
}
The following revised test now works too:
echo
xpath=$PATH
replace_path xpath /usr
echopath xpath
echo
xpath=$PATH
replace_path_pattern xpath /usr/bin /work/bin
echopath xpath
echo
xpath=$PATH
replace_path_pattern xpath "/usr/.*/bin" /work/bin
echopath xpath
It produces the same output as before.
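One remaining wrinkle is that sed treats the string to remove as a regular expression, so a directory name containing characters such as . or * could match more than the exact element. A minimal sketch of the same exact-match replacement using only parameter expansion follows; path_swap is a hypothetical name, and unlike replace_path it prints the new list instead of exporting a variable.

```bash
# path_swap LIST OLD [NEW]
# Hypothetical helper: print LIST with every element that is exactly OLD
# replaced by NEW, or removed when NEW is empty or omitted.
path_swap() {
    local rest="$1:" old="$2" new="${3-}" out='' elem
    while [ -n "$rest" ]; do
        elem=${rest%%:*}     # first remaining element
        rest=${rest#*:}      # everything after the first colon
        if [ "$elem" = "$old" ]; then
            [ -n "$new" ] || continue   # no replacement given: drop the element
            elem=$new
        fi
        out="$out:$elem"
    done
    printf '%s\n' "${out#:}"  # strip the leading colon added by the loop
}
```

It could be used as PATH=$(path_swap "$PATH" /path/to/app/bin /path/to/newapp/bin), where the replacement directory is only an example. Because the comparison is a plain string test, /usr leaves /usr/bin and /usr/local/bin alone.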

Reposting my answer to What is the most elegant way to remove a path from the $PATH variable in Bash? :
#!/bin/bash
IFS=:
# convert it to an array
t=($PATH)
unset IFS
# perform any array operations to remove elements from the array
t=(${t[@]%%*usr*})
IFS=:
# output the new array
echo "${t[*]}"
or the one-liner:
PATH=$(IFS=':';t=($PATH);unset IFS;t=(${t[@]%%*usr*});IFS=':';echo "${t[*]}");
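The same array idea can also cover the "arbitrary position" part of the question. The sketch below is only an illustration: path_insert_at is a made-up name, the index is assumed to be a non-negative integer counted from 0, and an index past the end simply appends.

```bash
# path_insert_at LIST NEW INDEX
# Hypothetical helper: print LIST with NEW inserted before element INDEX (0-based).
path_insert_at() {
    local list=$1 new=$2 idx=$3
    local -a parts
    IFS=: read -r -a parts <<< "$list"     # split the list on colons
    # Rebuild the array: elements before idx, the new element, then the rest.
    parts=("${parts[@]:0:idx}" "$new" "${parts[@]:idx}")
    local IFS=:
    printf '%s\n' "${parts[*]}"            # "${parts[*]}" joins with the first character of IFS
}
```

For example, PATH=$(path_insert_at "$PATH" /work/bin 0) would put /work/bin at the front.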

For deleting an element you can use sed:
#!/bin/bash
NEW_PATH=$(echo -n $PATH | tr ":" "\n" | sed "/foo/d" | tr "\n" ":")
export PATH=$NEW_PATH
will delete the paths that contain "foo" from the path.
You could also use sed to insert a new line before or after a given line.
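Edit: you can remove duplicates by piping through sort and uniq. Be aware that this sorts the entries, so the original search order is lost, and that the sed filter keeps only the entries that occurred exactly once: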
echo -n $PATH | tr ":" "\n" | sort | uniq -c | sed -n "/ 1 / s/.*1 \(.*\)/\1/p" | sed "/foo/d" | tr "\n" ":"

There are a couple of relevant programs in the answers to "How to keep from duplicating path variable in csh". They concentrate more on ensuring that there are no repeated elements, but the script I provide can be used as:
export PATH=$(clnpath $head_dirs:$PATH:$tail_dirs $remove_dirs)
Assuming you have one or more directories in $head_dirs and one or more directories in $tail_dirs and one or more directories in $remove_dirs, then it uses the shell to concatenate the head, current and tail parts into a massive value, and then removes each of the directories listed in $remove_dirs from the result (not an error if they don't exist), as well as eliminating second and subsequent occurrences of any directory in the path.
This does not address putting path components into a specific position (other than at the beginning or end, and those only indirectly). Notationally, specifying where you want to add the new element, or which element you want to replace, is messy.

Just a note that bash itself can do search and replace. It can do all the normal "once or all" and case-[in]sensitive options you would expect.
From the man page:
${parameter/pattern/string}
The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with /, all matches of pattern are replaced with string. Normally only the first match is replaced. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is @ or *, the substitution operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.
You can also do field splitting by setting $IFS (input field separator) to the desired delimiter.
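A quick sketch of those forms on a PATH-like value; the sample value and the variable names are made up for illustration.

```bash
p=/usr/bin:/bin:/usr/bin
old=/usr/bin
new=/opt/bin

first=${p/$old/$new}     # first match only  -> /opt/bin:/bin:/usr/bin
all=${p//$old/$new}      # every match       -> /opt/bin:/bin:/opt/bin
at_start=${p/#$old/$new} # only at the start -> /opt/bin:/bin:/usr/bin
at_end=${p/%$old/$new}   # only at the end   -> /usr/bin:/bin:/opt/bin

printf '%s\n' "$first" "$all" "$at_start" "$at_end"
```

Note that these are substring matches, so on their own they do not respect element boundaries; wrapping the value in colons first is one way around that.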

OK, thanks to all responders. I've prepared an encapsulated version of florin's answer. The first pass looks like this:
# path_tools.bash
#
# A set of tools for manipulating ":" separated lists like the
# canonical $PATH variable.
#
# /bin/sh compatibility can probably be regained by replacing $( )
# style command expansion with ` ` style
###############################################################################
# Usage:
#
# To remove a path:
# replace-path PATH $PATH /exact/path/to/remove
# replace-path-pattern PATH $PATH <grep pattern for target path>
#
# To replace a path:
# replace-path PATH $PATH /exact/path/to/remove /replacement/path
# replace-path-pattern PATH $PATH <target pattern> /replacement/path
#
###############################################################################
# Finds the _first_ list element matching $3
#
# $1 name of a shell variable to be set
# $2 a ":" delimited list to search (i.e. $PATH)
# $3 a grep pattern to match the desired element of $2
function path-element-by-pattern (){
target=$1;
list=$2;
pat=$3;
export $target=$(echo -n $list | tr ":" "\n" | grep -m 1 $pat);
return
}
# Removes or replaces an element of $1
#
# $1 name of the shell variable to set (i.e. PATH)
# $2 a ":" delimited list to work from (i.e. $PATH)
# $3 the precise string to be removed/replaced
# $4 the replacement string (use "" for removal)
function replace-path () {
path=$1;
list=$2;
removestr=$3;
replacestr=$4; # Allowed to be ""
export $path=$(echo -n $list | tr ":" "\n" | sed "s|$removestr|$replacestr|" | tr "\n" ":" | sed "s|::|:|g");
unset removestr
return
}
# Removes or replaces an element of $1
#
# $1 name of the shell variable to set (i.e. PATH)
# $2 a ":" delimited list to work from (i.e. $PATH)
# $3 a grep pattern identifying the element to be removed/replaced
# $4 the replacement string (use "" for removal)
function replace-path-pattern () {
path=$1;
list=$2;
removepat=$3;
replacestr=$4; # Allowed to be ""
path-element-by-pattern removestr $list $removepat;
replace-path $path $list $removestr $replacestr;
}
Still needs error trapping in all the functions, and I should probably stick in a repeated path solution while I'm at it.
You use it by doing a . /include/path/path_tools.bash in the working script and calling one of the replace-path* functions.
I am still open to new and/or better answers.

This is easy using awk.
Replace
{
for(i=1;i<=NF;i++)
if($i == REM)
if(REP)
print REP;
else
continue;
else
print $i;
}
Start it using
function path_repl {
echo $PATH | awk -F: -f rem.awk REM="$1" REP="$2" | paste -sd:
}
$ echo $PATH
/bin:/usr/bin:/home/js/usr/bin
$ path_repl /bin /baz
/baz:/usr/bin:/home/js/usr/bin
$ path_repl /bin
/usr/bin:/home/js/usr/bin
Append
Inserts at the given position. By default, it appends at the end.
{
if(IDX < 1) IDX = NF + IDX + 1
for(i = 1; i <= NF; i++) {
if(IDX == i)
print REP
print $i
}
if(IDX == NF + 1)
print REP
}
Start it using
function path_app {
echo $PATH | awk -F: -f app.awk REP="$1" IDX="$2" | paste -sd:
}
$ echo $PATH
/bin:/usr/bin:/home/js/usr/bin
$ path_app /baz 0
/bin:/usr/bin:/home/js/usr/bin:/baz
$ path_app /baz -1
/bin:/usr/bin:/baz:/home/js/usr/bin
$ path_app /baz 1
/baz:/bin:/usr/bin:/home/js/usr/bin
Remove duplicates
This one keeps the first occurrences.
{
for(i = 1; i <= NF; i++) {
if(!used[$i]) {
print $i
used[$i] = 1
}
}
}
Start it like this:
echo $PATH | awk -F: -f rem_dup.awk | paste -sd:
Validate whether all elements exist
The following will print an error message for every entry that does not exist in the filesystem, and return a nonzero value.
echo -n $PATH | xargs -d: stat -c %n
To simply check whether all elements are paths and get a return code, you can also use test:
echo -n $PATH | xargs -d: -n1 test -d
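The -d option of xargs and stat -c are GNU extensions, so here is a hedged sketch of the same directory check as a plain shell loop. path_check is a hypothetical name; it treats an empty element as the current directory, which is how PATH lookups interpret it.

```bash
# path_check [LIST]
# Hypothetical helper: report every element of LIST (default: $PATH) that is
# not an existing directory; return nonzero if any such element was found.
path_check() {
    local rest="${1-$PATH}:" elem status=0
    while [ -n "$rest" ]; do
        elem=${rest%%:*}
        rest=${rest#*:}
        [ -n "$elem" ] || elem=.          # empty element means current directory
        if [ ! -d "$elem" ]; then
            printf 'not a directory: %s\n' "$elem" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling path_check with no argument checks the current $PATH; path_check "$LD_LIBRARY_PATH" checks another list.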

Suppose
echo $PATH
/usr/lib/jvm/java-1.6.0/bin:lib/jvm/java-1.6.0/bin/:/lib/jvm/java-1.6.0/bin/:/usr/lib/qt-3.3/bin:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/tvnadeesh/bin
If you want to remove /lib/jvm/java-1.6.0/bin/, do it as below:
export PATH=$(echo $PATH | sed 's/\/lib\/jvm\/java-1.6.0\/bin\/://g')
sed takes its input from echo $PATH and replaces /lib/jvm/java-1.6.0/bin/: with an empty string. This is how you can remove an element.

Order of PATH is not disturbed
Handles corner cases like empty path, space in path gracefully
Partial match of dir does not give false positives
Treats path at head and tail of PATH in proper ways. No : garbage and such.
Say you have
/foo:/some/path:/some/path/dir1:/some/path/dir2:/bar
and you want to replace
/some/path
Then it correctly replaces "/some/path" but leaves "/some/path/dir1" and "/some/path/dir2" alone, as you would expect.
function __path_add(){
if [ -d "$1" ] ; then
local D=":${PATH}:";
[ "${D/:$1:/:}" == "$D" ] && PATH="$PATH:$1";
PATH="${PATH/#:/}";
export PATH="${PATH/%:/}";
fi
}
function __path_remove(){
local D=":${PATH}:";
[ "${D/:$1:/:}" != "$D" ] && PATH="${D/:$1:/:}";
PATH="${PATH/#:/}";
export PATH="${PATH/%:/}";
}
# Just for the sake of completeness
function __path_replace(){
if [ -d "$2" ] ; then
local D=":${PATH}:";
if [ "${D/:$1:/:}" != "$D" ] ; then
PATH="${D/:$1:/:$2:}";
PATH="${PATH/#:/}";
export PATH="${PATH/%:/}";
fi
fi
}
Related post
What is the most elegant way to remove a path from the $PATH variable in Bash?

I prefer using ruby to the likes of awk/sed/foo these days, so here's my approach to deal with dupes,
# add it to the path
PATH=~/bin/:$PATH:~/bin
export PATH=$(ruby -e 'puts ENV["PATH"].split(/:/).uniq.join(":")')
create a function for reuse,
mungepath() {
export PATH=$(ruby -e 'puts ENV["PATH"].split(/:/).uniq.join(":")')
}
Hash, arrays and strings in a ruby one liner :)
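If Ruby is not available, the same order-preserving de-duplication can be sketched in plain shell. This is only an illustration; path_dedupe is a made-up name, and it prints the result instead of exporting it.

```bash
# path_dedupe LIST
# Hypothetical helper: print LIST with second and later occurrences of each
# element removed, keeping the original order.
path_dedupe() {
    local rest="$1:" out='' elem
    while [ -n "$rest" ]; do
        elem=${rest%%:*}
        rest=${rest#*:}
        case "$out:" in
            *":$elem:"*) ;;               # already present: skip it
            *) out="$out:$elem" ;;        # first occurrence: keep it
        esac
    done
    printf '%s\n' "${out#:}"              # strip the leading colon added by the loop
}
```

Usage would look like PATH=$(path_dedupe "$PATH").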

The first thing to pop into my head to change just part of a string is a sed substitution.
example:
if echo $PATH => "/usr/pkg/bin:/usr/bin:/bin:/usr/pkg/games:/usr/pkg/X11R6/bin"
then to change "/usr/bin" to "/usr/local/bin" could be done like this:
## writes the result to standard output; PATH itself is not changed
## the "=" character is used as the sed delimiter instead of slash ("/") since that would be messy,
# the alternative delimiter should be a character unlikely to appear in PATH
## the path separator character ":" is both removed and re-added here,
# might want an extra colon after the last path
echo $PATH | sed 's=/usr/bin:=/usr/local/bin:='
This solution replaces an entire path-element so might be redundant if new-element is similar.
If the new PATHs aren't dynamic but always within some constant set, you could save those in variables and assign them as needed:
PATH=$TEMP_PATH_1;
# commands ... ; \n
PATH=$TEMP_PATH_2;
# commands etc... ;
Might not be what you were thinking. Some of the relevant commands on bash/unix would be:
pushd
popd
cd
ls # maybe ls -1A for single column;
find
grep
which # could confirm that file is where you think it came from;
env
type
..and all that and more have some bearing on PATH or directories in general. The text altering part could be done any number of ways!
Whatever solution chosen would have 4 parts:
1) fetch the path as it is
2) decode the path to find the part needing changes
3) determining what changes are needed/integrating those changes
4) validation/final integration/setting the variable

In line with dj_segfault's answer, I do this in scripts that append/prepend environment variables that might be executed multiple times:
ld_library_path=${ORACLE_HOME}/lib
LD_LIBRARY_PATH=${LD_LIBRARY_PATH//${ld_library_path}?(:)/}
export LD_LIBRARY_PATH=${ld_library_path}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Using this same technique to remove, replace or manipulate entries in PATH is trivial given the filename-expansion-like pattern matching and pattern-list support of shell parameter expansion.
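The ?(:) pattern-list is an extended glob, so the snippet above only works with the extglob shell option enabled. A minimal sketch of the idempotent behaviour, using a made-up directory in place of ${ORACLE_HOME}/lib and a scratch variable instead of LD_LIBRARY_PATH:

```bash
shopt -s extglob                 # required for the ?(:) pattern-list

lib_dir=/opt/oracle/lib          # hypothetical stand-in for ${ORACLE_HOME}/lib
search=/usr/lib:/opt/oracle/lib:/lib

# Running the remove-then-prepend step several times gives the same result.
for i in 1 2 3; do
    search=${search//${lib_dir}?(:)/}          # drop existing copies, plus a trailing colon
    search=${lib_dir}${search:+:${search}}     # prepend, adding a colon only if something is left
done

printf '%s\n' "$search"
```

One caveat with this technique: the match is a substring match, so /opt/oracle/lib would also match the front of an entry such as /opt/oracle/lib64.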

Related

command in shell to get second numeric value after "-"

Example
prod2-03_dl-httpd-prod-8080_access_referer_log.20181111-050000
I need value 8080. So basically we need digit value after second occurrence of '-'.
We tried following options:
echo "prod2-03_dl-httpd-prod-8080_access_referer_log.20181111-050000" | sed -r 's/([^-][:digit:]+[^-][:digit:]).*/\1/'
There is no need to resort to sed, BASH supports regular expressions:
$ A=prod2-03_dl-httpd-prod-8080_access_referer_log.20181111-050000
$ [[ $A =~ ([^-]*-){2}[^[:digit:]]+([[:digit:]]+) ]] && echo "${BASH_REMATCH[2]}"
8080
Try this Perl solution
$ data="prod2-03_dl-httpd-prod-8080_access_referer_log.20181111-050000"
$ perl -ne ' /.+?\-(\d+).+?\-(\d+).*/g and print $2 ' <<< "$data"
8080
or
$ echo "$data" | perl -ne ' /.+?\-(\d+).+?\-(\d+).*/g and print $2 '
8080
You could do this in a POSIX shell using IFS to identify the parts, and a loop to step to the pattern you're looking for:
s="prod2-03_dl-httpd-prod-8080_access_referer_log.20181111-050000"
# Set a field separator
IFS=-
# Expand your variable into positional parameters
set - $s
# Drop the first two fields
shift 2
# Drop additional fields until one that starts with a digit
while ! expr "$1" : '[0-9]' >/dev/null; do shift; done
# Capture the part of the string that is not digits
y="$1"; while expr "$y" : '[0-9]' >/dev/null; do y="${y##[[:digit:]]}"; done
# Strip off the non-digit part from the original field
x="${1%$y}"
Note that this may fail for a string that looks like aa-bb-123cc45-foo. If you might have additional strings of digits in the "interesting" field, you'll need more code.
If you have a bash shell available, you could do this with a series of bash parameter expansions...
# Strip off the first two "fields"
x="${s#*-}"; x="${x#*-}"
shopt -s extglob
x="${x##+([^[:digit:]])}"
# Identify the part on the right that needs to be stripped
y="${x##+([[:digit:]])}"
# And strip it...
x="${x%$y}"
This is not POSIX compatible because of the requirement for extglob.
Of course, bash offers you many options. Consider this function:
whatdigits() {
local IFS=- x i
local -a a
a=( $1 )
for ((i=3; i<${#a[@]}; i++)) {
[[ ${a[$i]} =~ ^([0-9]+) ]] && echo "${BASH_REMATCH[1]}" && return 0
}
return 1
}
You can then run commands like:
$ whatdigits "12-ab-cd-45ef-gh"
45
$ whatdigits "$s"
8080

Unable to use -C of grep in Unix Shell Script

I am able to use grep in normal command line.
grep "ABC" Filename -C4
This is giving me the desired output which is 4 lines above and below the matched pattern line.
But if I use the same command in a Unix shell script, I am unable to grep the lines above and below the pattern. The output contains only the lines where the pattern is matched, followed by an error at the end saying that grep cannot open -C4.
The results are similar if I use -A4 and -B4
I'll assume you need a portable POSIX solution without the GNU extensions (-C NUM, -A NUM, and -B NUM are all GNU, as are arguments following the pattern and/or file name).
POSIX grep can't do this, but POSIX awk can. This can be invoked as e.g. grepC -C4 "ABC" Filename (assuming it is named "grepC", is executable, and is in your $PATH):
#!/bin/sh
die() { printf '%s\nUsage: %s [-C NUMBER] PATTERN [FILE]...\n' "$*" "$0" >&2; exit 2; }
CONTEXT=0 # default value
case $1 in
-C ) CONTEXT="$2"; shift 2 ;; # extract "4" from "-C 4"
-C* ) CONTEXT="${1#-C}"; shift ;; # extract "4" from "-C4"
--|-) shift ;; # no args or use std input (implicit)
-* ) [ -f "$1" ] || die "Illegal option '$1'" ;; # non-option non-file
esac
[ "$CONTEXT" -ge 0 ] 2>/dev/null || die "Invalid context '$CONTEXT'"
[ "$#" = 0 ] && die "Missing PATTERN"
PATTERN="$1"
shift
awk '
/'"$PATTERN"'/ {
after='$CONTEXT'
# print the saved lines of leading context, oldest first
for(i='$CONTEXT'; i>=1; i--) if(NR>i) print last[i];
print
next
}
after { print; after-- }
{ for(i='$CONTEXT'; i>1; i--) last[i] = last[i-1]; last[1] = $0 }
' "$@"
This sets up die as a fatal error function, then finds the desired lines of context from your arguments (either -C NUMBER or -CNUMBER), with an error for unsupported options (unless they're files).
If the context is not a number or there is no pattern, we again fatally error out.
Otherwise, we save the pattern, shift it away, and reserve the rest of the options for handing to awk as files ("$@").
There are three stanzas in this awk call:
Match the pattern itself. This requires ending the single-quote portion of the string in order to incorporate the $PATTERN variable (which may not behave correctly if imported via awk -v). Upon that match, we store the number of lines of context in the after variable (match cannot be used as a variable name because it is a built-in awk function), loop through the previous lines saved in the last array, oldest first (if we've gone far enough to have had them), and print them. We then skip to the next line without evaluating the other two stanzas.
If there was a match, we need the next few lines for context. As this stanza prints them, it decrements the counter. A new match (previous stanza) will reset that count.
We need to save previous lines for recalling upon a match. This loops through the number of lines of context we care about and stores them in the last array. The current line ($0) is stored in last[1]. Note that when two matches fall within each other's context, some lines can be printed twice.
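If the grep that the script picks up does support -C (an assumption; as noted above, it is an extension), the failure described in the question may simply come from placing the option after the file name, which only some implementations accept. A small sketch with the option placed before the pattern, using a throwaway file created with mktemp:

```bash
tmp=$(mktemp)
printf '%s\n' one two ABC four five > "$tmp"

# Options first, then the pattern, then the file name.
result=$(grep -C1 "ABC" "$tmp")

rm -f "$tmp"
printf '%s\n' "$result"
```

With one line of context this prints the matching line together with the line before and the line after it.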

How to use awk for multiple file search in two directories, print records only from files with matching string in second directory

Remade a previous question so that it is more clear. I'm trying to search files in two directories and print matching character strings (+ line immediately following) into a new file from the second directory only if they match a record in the first directory. I have found similar examples but nothing quite the same. I don't know how to use awk for multiple files from different directories and I've tortured myself trying to figure it out.
Directory 1, 28,000 files, formatted viz.:
>ABC
KLSDFIOUWERMSDFLKSJDFKLSJDSFKGHGJSNDKMVMFHKSDJFS
>GHI
OOILKJSDFKJSDFLMOPIWERIOUEWIRWIOEHKJTSDGHLKSJDHGUIYIUSDVNSDG
Directory 2, 15 files, formatted viz.:
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>DEF
12341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
Desired output:
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
Directories 1 and 2 are located in my home directory: (./Test1 & ./Test2)
If anyone could advise on a command to specify the different directories, I'd be immensely grateful! Currently when I include the file path (e.g., /Test1/*.fa) I get the following error:
awk: can't open file /Test1/*.fa
You'll want something like this (untested):
awk '
FNR==1 {
dirname = FILENAME
sub("/.*","",dirname)
if (NR==1) {
dirname1 = dirname
}
}
dirname == dirname1 {
if (FNR % 2) {
key = $0
}
else {
map[key] = $0
}
next
}
(FNR % 2) && ($0 in map) && !seen[$0,map[$0]]++ {
print $0 ORS map[$0]
}
' Test1/* Test2/*
Given that you're getting the error message /usr/bin/awk: Argument list too long, which means you're exceeding your shell's maximum argument length for a command, and that 28,000 of your files are in the Test1 directory, try this:
find Test1 -type f -exec cat {} \; |
awk '
NR == FNR {
if (FNR % 2) {
key = $0
}
else {
map[key] = $0
}
next
}
(FNR % 2) && ($0 in map) && !seen[$0,map[$0]]++ {
print $0 ORS map[$0]
}
' - Test2/*
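To try the two-pass idea on throwaway data, something like the following sketch may help. The file names a.fa and b.fa and the short sequences are made up, and this variant keys on the > header lines rather than on line parity, so it is a related technique and not the exact script above.

```bash
work=$(mktemp -d)
mkdir "$work/Test1" "$work/Test2"
printf '%s\n' '>ABC' 'KLSD' '>GHI' 'OOIL' > "$work/Test1/a.fa"
printf '%s\n' '>ABC' '1234' '>DEF' '5678' '>GHI' '9012' > "$work/Test2/b.fa"

result=$(
    cd "$work" &&
    # Test1 is streamed on stdin, so its file count never reaches the command line.
    find Test1 -type f -exec cat {} + |
    awk '
        NR == FNR { if (/^>/) want[$0] = 1; next }   # first input (stdin): remember headers
        /^>/      { show = ($0 in want) }            # later files: is this header wanted?
        show                                         # print the header and its data lines
    ' - Test2/*
)

rm -rf "$work"
printf '%s\n' "$result"
```

As with the original, NR == FNR only identifies the first input correctly when that input is not empty.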
Solution in TXR:
Data:
$ ls dir*
dir1:
file1 file2
dir2:
file1 file2
$ cat dir1/file1
>ABC
KLSDFIOUWERMSDFLKSJDFKLSJDSFKGHGJSNDKMVMFHKSDJFS
>GHI
OOILKJSDFKJSDFLMOPIWERIOUEWIRWIOEHKJTSDGHLKSJDHGUIYIUSDVNSDG
$ cat dir1/file2
>XYZ
SDOIWEUROIUOIWUEROIWUEROIWUEROIWUEROUIEIDIDIIDFIFI
>MNO
OOIWEPOIUWERHJSDHSDFJSHDF
$ cat dir2/file1
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>DEF
12341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
$ cat dir2/file2
>STP
12341234123412341234123412341234123412341234123412341234123412341234123412341234
>MNO
123412341234123412341234123412341234123412341234123412341234123412341234
$
Run:
$ txr filter.txr dir1/* dir2/*
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
>MNO
123412341234123412341234123412341234123412341234123412341234123412341234
Code in filter.txr:
@(bind want @(hash :equal-based))
@(next :args)
@(all)
@dir/@(skip)
@(and)
@  (repeat :gap 0)
@dir/@file
@    (next `@dir/@file`)
@    (repeat)
>@key
@      (do (set [want key] t))
@    (end)
@  (end)
@(end)
@(repeat)
@path
@  (next path)
@  (repeat)
>@key
@datum
@    (require [want key])
@    (output)
>@key
@datum
@    (end)
@  (end)
@(end)
To separate the dir1 paths from the rest, we use an @(all) match (try multiple pattern branches, which must all match) with two branches. The first branch matches one @dir/@(skip) pattern, binding the variable dir to the text that precedes the slash and ignoring the rest. The second branch matches a whole consecutive sequence of @dir/@file patterns via @(repeat :gap 0). Because the same dir variable appears that already has a binding from the first branch of the all, this constrains the matches to the same directory name. Inside this repeat we recurse into each file via next and gather the >-delimited keys into the want hash. After that, we process the remaining arguments as path names of files to process; they don't all have to be in the same directory. We scan through each one for the >@key pattern followed by a line of @datum. The @(require ...) directive will fail the match if key is not in the want hash; otherwise we fall through to the @(output).

Replace a string which is present on first line in UNIX file

I would like to replace a string which is present on the first line, even though it also appears on the rest of the lines in the file. How can I do that through a shell script? Can someone help me with this? My code is below. I am extracting the first line from the file, and after that I am not sure how to do the replacement. Any help would be appreciated. Thanks.
Guys -I would like to replace a string present in $line and write the new line into the same file at same place.
Code:
while read line
do
if [[ $v_counter == 0 ]]; then
echo "$line"
v_counter=$(($v_counter + 1));
fi
done < "$v_Full_File_Nm"
Sample data:
Input
BUXT_CMPID|MEDICAL_RECORD_NUM|FACILITY_ID|PATIENT_LAST_NAME|PATIENT_FIRST_NAME|HOME_ADDRESS_LINE_1|HOME_ADDRESS_LINE_2|HOME_CITY|HOME_STATE|HOME_ZIP|MOSAIC_CODE|MOSAIC_DESC|DRIVE_TIME| buxt_pt_apnd_20140624_head_5records.txt
100106086|5000120878|7141|HARRIS|NEDRA|6246 PARALLEL PKWY||KANSAS CITY|KS|66102|S71|Tough Times|2|buxt_pt_apnd_20140624_head_5records.txt
Output
BUXT_CMPID|MEDICAL_RECORD_NUM|FACILITY_ID|PATIENT_LAST_NAME|PATIENT_FIRST_NAME|HOME_ADDRESS_LINE_1|HOME_ADDRESS_LINE_2|HOME_CITY|HOME_STATE|HOME_ZIP|MOSAIC_CODE|MOSAIC_DESC|DRIVE_TIME| SRC_FILE_NM
100106086|5000120878|7141|HARRIS|NEDRA|6246 PARALLEL PKWY||KANSAS CITY|KS|66102|S71|Tough Times|2|buxt_pt_apnd_20140624_head_5records.txt
From the above sample data I need to replace buxt_pt_apnd_20140624_head_5records.txt with SRC_FILE_NAME string.
Why not use sed?
sed -e '1s/fred/frog/' yourfile
will replace fred with frog on line 1.
If your 'string' is a variable, you can do this to get the variable expanded:
sed -e "1s/$varA/$varB/" yourfile
If you want to do it in place and change your file, add -i before -e.
awk -v old="string1" -v new="string2" '
NR==1 && (idx=index($0,old)) {
$0 = substr($0,1,idx-1) new substr($0,idx+length(old))
}
1' file > /usr/tmp/tmp$$ && mv /usr/tmp/tmp$$ file
The above will replace string1 with string2 only when it appears in the first line of file.
Any solution posted that uses awk but does not use index will not work in general. Same for any solution posted that uses sed. The reason is that those would work on REs, not strings and so behave undesirably for string replacement depending what characters are present in string1.
Looks like the OP's going with a sed RE-replacement solution, so this is just for anyone else looking to replace a string. Here's what a string replacement function would look like if you'd rather not have it inline:
awk -v old="string1" -v new="string2" '
function strsub(old,new,tgt, idx) {
if ( idx = index(tgt,old) ) {
tgt = substr(tgt,1,idx-1) new substr(tgt,idx+length(old))
}
return tgt
}
NR==1 { $0 = strsub(old,new,$0) }
1' file
A bash solution:
file="afile.txt"
str="hello"
repl="goodbye"
IFS= read -r line < "$file"
line=${line/$str/$repl}
tmpfile="/usr/tmp/$file.$$.tmp"
{
echo "$line"
tail -n+2 "$file"
} > "$tmpfile" && mv "$tmpfile" "$file"
Note that $str above will be interpreted as a "pattern" (a simple kind of regex) where * matches any number of any characters, ? matches any single character, [abc] matches any one of the characters in the brackets, and [^abc] (or [!abc]) matches any one character not in the brackets. See Pattern-Matching
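If a literal match is wanted instead, quoting the pattern part of the expansion turns off that special treatment. A small sketch with made-up sample strings:

```bash
line='a*b|a*b'
str='a*b'
repl='X'

as_pattern=${line/$str/$repl}     # * acts as a wildcard: the whole line matches
as_literal=${line/"$str"/$repl}   # quoted: only the literal text a*b is replaced

printf '%s\n' "$as_pattern" "$as_literal"
```

Here as_pattern becomes X, because a*b matches the entire line, while as_literal becomes X|a*b.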

How to quote strings in file names in zsh (passing back to other scripts)

I have a script that has a string in a file name like so:
filename_with_spaces="a file with spaces"
echo test > "$filename_with_spaces"
test_expect_success "test1: filename with spaces" "
run cat \"$filename_with_spaces\"
run grep test \"$filename_with_spaces\"
"
test_expect_success is defined as:
test_expect_success () {
echo "expecting success: $1"
eval "$2"
}
and run is defined as:
#!/bin/zsh
# make nice filename removing special characters, replace space with _
filename=`echo $@ | tr ' ' _ | tr -cd 'a-zA-Z0-9_.'`.run
echo "#!/bin/zsh" > $filename
print "$@" >> $filename
chmod +x $filename
./$filename
But when I run the toplevel script test_expect_success... I get cat_a_file_with_spaces.run with:
#!/bin/zsh
cat a file with spaces
The problem is the quotes around a file with spaces in cat_a_file_with_spaces.run is missing. How do you get Z shell to keep the correct quoting?
Thanks
Try
run cat ${(q)filename_with_spaces}
That is what the (q) modifier was written for. The same applies inside the run script:
echo -E ${(q)@} >> $filename
Also, this is zsh, not bash, so you don't need to put quotes around variables: unless you set some option (I don't remember exactly which),
command $var
always passes exactly one argument to command no matter what is in $var. To ensure that no zsh option alters this behavior, put
emulate -L zsh
at the top of every script.
Note that the initial variant (run cat \"$filename_with_spaces\") is not correct quoting: a filename may contain any character except NUL and /, which is used for separating directories. ${(q)} takes care of that.
Update: I would have written test_expect_success function in the following fashion:
function test_expect_success()
{
emulate -L zsh
echo "Expecting success: $1" ; shift
$@
}
Usage:
test_expect_success "Message" run cat $filename_with_spaces
