Substring in unix on a character

I have file names inside a directory in unix as:
code1_abc.txt
code2_xyz.txt
code1_pqr.txt
I am looping over all files in this directory to do some stuff on each file:
for myFile in $(ls $INPUT_DIR/* | xargs -n 1 basename)
do
echo $myFile
done
However, now I want to split the file name and get the part before the underscore, i.e. code1 or code2:
for myFile in $(ls $INPUT_DIR/* | xargs -n 1 basename)
do
echo $myFile
codeForCurrentFile=   # want code1 here, derived from the myFile value
echo $codeForCurrentFile   # should echo code1 or code2, depending on the file
done
How to do this? I am using korn shell.
Thanks for reading!

Use ksh pattern substitution to replace the underscore and anything after it with nothing (effectively delete):
echo ${myFile//_*/}
For your example:
codeForCurrentFile=${myFile//_*/}
More info here (see section 4.5.4): http://docstore.mik.ua/orelly/unix3/korn/ch04_05.htm
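As a hedged sketch of the whole loop: the POSIX form ${myFile%%_*} gives the same result as the substitution above and should also work in shells that lack the ${var//pattern/} syntax. The function name list_codes and the variable path are made up for illustration, and the glob replaces the ls | xargs basename pipeline on the assumption that only the file names are needed.

```shell
# Print the part before the first underscore for every entry in a directory.
# list_codes and path are illustrative names, not part of the original question.
list_codes() {
    for path in "$1"/*
    do
        [ -e "$path" ] || continue          # skip the literal pattern if the directory is empty
        myFile=${path##*/}                  # strip the directory part (what basename did)
        codeForCurrentFile=${myFile%%_*}    # drop the first underscore and everything after it
        echo "$codeForCurrentFile"
    done
}
```

It would be called as list_codes "$INPUT_DIR". Using the glob directly also avoids problems with file names that contain spaces.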

You can do this by calling out to an external program, regardless of the shell in use (provided it supports output capture of external programs, of course), such as with the following transcript:
pax$ fspec=code1_abc
pax$ echo $fspec
code1_abc
pax$ pre=`echo $fspec | cut -d_ -f1` ; echo $pre
code1
pax$ post=`echo $fspec | cut -d_ -f2` ; echo $post
abc
There is a wide variety of tools you can use to achieve this: cut (as above, probably the simplest), awk, sed and so on.
This has the disadvantage of kicking up external processes, something that should be okay provided you're not doing it many times per second. If it's something that needs to be fast, you're better off using shell-specific internal methods, such as:
ksh:
fspec=code1_abc
pre=${fspec//_*/}
post=${fspec//*_/}
bash:
fspec=code1_abc
pre=${fspec%%_*}
post=${fspec#*_}
csh:
set fspec = code1_abc
set arr = ( $fspec:as/_/ / )
set pre = $arr[1]
set post = $arr[2]
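The % and # operators shown for bash are POSIX parameter expansions, so they should behave the same in ksh. A small sketch of how the single and doubled forms differ when the name has more than one underscore (the value code1_abc_def and the variable names are hypothetical):

```shell
fspec=code1_abc_def          # hypothetical name with two underscores

pre_long=${fspec%%_*}        # longest  "_*" suffix removed -> code1
pre_short=${fspec%_*}        # shortest "_*" suffix removed -> code1_abc
post_short=${fspec#*_}       # shortest "*_" prefix removed -> abc_def
post_long=${fspec##*_}       # longest  "*_" prefix removed -> def

printf '%s\n' "$pre_long" "$pre_short" "$post_short" "$post_long"
```

With a single underscore, as in code1_abc, the short and long forms give the same result.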

Related

tcsh passing a variable inside a shell script

I've defined a variable inside a shell script and I want to use it. For some reason, I cannot pass it into the command line where I need it.
Here's my script, which fails at the last line:
#! /usr//bin/tcsh -f
if ( $# != 2 ) then
echo "Usage: jump_sorter.sh <jump> <field to sort on>"
exit;
endif
set a = `cat $1 | tail -1` #prepares last row for check with loop
set b = $2 #this is the value last row will be checked for
set counter = 0
foreach i ($a)
if ($i == "$b") then
set bingo = $counter
echo "$bingo is the field to print from $a"
endif
set counter = `expr $counter + 1`
end
echo $bingo #this prints the correct value for using in the command below
cat $1 | awk '{print($bingo)}' | sort | uniq -c | sort -nr #but this doesn't work.
#when I use $9 instead of $bingo, it does work.
How can I pass $bingo into the final line correctly, please?
Update: following the accepted answer from Martin Tournoij, the correct way to handle the "$" sign in the command is:
cat $1 | awk "{print("\$"$bingo)}" | sort | uniq -c | sort -nr
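It doesn't work because variables are only substituted inside double quotes ("), not single quotes ('), and you're using single quotes:
cat $1 | awk '{print($bingo)}' | sort | uniq -c | sort -nr
Switching to double quotes lets the shell substitute $bingo, but awk still needs a literal $ in front of the field number; with a bare number, print(9) prints 9 rather than the ninth field. The following should work:
cat $1 | awk "{print("\$"$bingo)}" | sort | uniq -c | sort -nr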
You also have an error here:
#! /usr//bin/tcsh -f
That should be:
#!/usr/bin/tcsh -f
Note that csh isn't usually recommended for scripting; it has many quirks and lacks some features like functions. Unless you really need to use csh, it's recommended to use a Bourne shell (/bin/sh, bash, zsh) or a scripting language (Python, Ruby, etc.) instead.
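If the script is moved to a Bourne-style shell as suggested, one way to sidestep the quoting problem entirely is to hand the field number to awk with -v. This is only a sketch: the helper name print_field and the awk variable col are assumptions for illustration.

```shell
# Print one whitespace-separated field of each input line.
# The field number reaches awk as the variable col, so the awk program
# can stay in single quotes and the shell never touches its $ sign.
print_field() {
    awk -v col="$1" '{ print $col }'
}
```

The last line of the script could then read: print_field "$bingo" < "$1" | sort | uniq -c | sort -nr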

substring before and substring after in shell script

I have a string:
//host:/dir1/dir2/dir3/file_name
I want to extract the host and the directory path into separate variables in a unix script.
Example :
host_name = host
dir_path = /dir1/dir2/dir3
Note: the string length and the number of directories are not fixed.
Could you please help me fetch these values from the string in a unix shell script?
Using bash string operations:
str='//host:/dir1/dir2/dir3/file_name'
host_name=${str%%:*}
host_name=${host_name##*/}
dir_path=${str#*:}
dir_path=${dir_path%/*}
I would do it using regular expressions:
if [[ $path =~ ^//(.*):(.*)/(.*)$ ]]; then
host="${BASH_REMATCH[1]}"
dir_path="${BASH_REMATCH[2]}"
filename="${BASH_REMATCH[3]}"
else
echo "Invalid format" >&2
exit 1
fi
If you are sure that the format will match, you can do simply
[[ $path =~ ^//(.*):(.*)/(.*)$ ]]
host="${BASH_REMATCH[1]}"
dir_path="${BASH_REMATCH[2]}"
filename="${BASH_REMATCH[3]}"
Edit: Since you seem to be using ksh rather than bash (though bash was indicated in the question), the syntax is a bit different:
match=(${path/~(E)^\/\/(.*):(.*)\/(.*)$/\1 \2 \3})
host="${match[0]}"
dir_path="${match[1]}"
filename="${match[2]}"
This will break if there are spaces in the file name, though. In that case, you can use the more cumbersome
host="${path/~(E)^\/\/(.*):(.*)\/(.*)$/\1}"
dir_path="${path/~(E)^\/\/(.*):(.*)\/(.*)$/\2}"
filename="${path/~(E)^\/\/(.*):(.*)\/(.*)$/\3}"
Perhaps there are more elegant ways of doing it in ksh, but I'm not familiar with it.
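For ksh specifically, plain prefix and suffix stripping may be the simpler route, since those expansions are POSIX and not bash-only. A minimal sketch, assuming the input always looks like //host:/dirs/file; the function name split_host_path and the variables rest and file_name are illustrative:

```shell
# Split //host:/dir1/.../file into host_name, dir_path and file_name.
# Returns non-zero if the string does not have the expected shape.
split_host_path() {
    case $1 in
        //*:*/*) ;;                                   # //host:/something/file
        *) echo "Invalid format" >&2; return 1 ;;
    esac
    rest=${1#//}              # host:/dir1/dir2/dir3/file_name
    host_name=${rest%%:*}     # everything before the first colon
    rest=${rest#*:}           # /dir1/dir2/dir3/file_name
    dir_path=${rest%/*}       # drop the last path component
    file_name=${rest##*/}     # keep only the last path component
}
```

Because nothing is word-split, this also copes with spaces in directory or file names.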
The shortest way I can think of is to assign two variables in one statement. The last sed expression drops the trailing file name so that dir_path holds only the directories:
$ read host_name dir_path <<< $(echo $string | sed -e 's,^//,,;s,:, ,;s,/[^/]*$,,')
Complete script:
string="//host:/dir1/dir2/dir3/file_name"
read host_name dir_path <<< $(echo $string | sed -e 's,^//,,;s,:, ,;s,/[^/]*$,,')
echo "host_name = $host_name"
echo "dir_path = $dir_path"
Output:
host_name = host
dir_path = /dir1/dir2/dir3

How to quote strings in file names in zsh (passing back to other scripts)

I have a script that has a string in a file name like so:
filename_with_spaces="a file with spaces"
echo test > "$filename_with_spaces"
test_expect_success "test1: filename with spaces" "
run cat \"$filename_with_spaces\"
run grep test \"$filename_with_spaces\"
"
test_expect_success is defined as:
test_expect_success () {
echo "expecting success: $1"
eval "$2"
}
and run is defined as:
#!/bin/zsh
# make nice filename removing special characters, replace space with _
filename=`echo $@ | tr ' ' _ | tr -cd 'a-zA-Z0-9_.'`.run
echo "#!/bin/zsh" > $filename
print "$@" >> $filename
chmod +x $filename
./$filename
But when I run the toplevel script test_expect_success... I get cat_a_file_with_spaces.run with:
#!/bin/zsh
cat a file with spaces
The problem is that the quotes around a file with spaces are missing in cat_a_file_with_spaces.run. How do you get Z shell to keep the correct quoting?
Thanks
Try
run cat ${(q)filename_with_spaces}
This is what the (q) modifier was written for. The same goes for the run script:
echo -E ${(q)@} >> $filename
Also, this is zsh, not bash: you don't need to put quotes around variables. Unless the SH_WORD_SPLIT option is set,
command $var
passes the value of $var to command as a single argument, whatever it contains (an unquoted empty value is dropped altogether). To ensure that no zsh option alters this behavior, put
emulate -L zsh
at the top of every script.
Note that the initial variant (run cat \"$filename_with_spaces\") is not correct quoting: a filename may contain any character except NUL and the / used for separating directories. ${(q)} takes care of that.
Update: I would have written test_expect_success function in the following fashion:
function test_expect_success()
{
emulate -L zsh
echo "Expecting success: $1" ; shift
"$@"
}
Usage:
test_expect_success "Message" run cat $filename_with_spaces

Unix and tee — chain of commands

In a Unix environment, I want to use tee on a chain of commands like so:
$ echo 1; echo 2 | tee file
1
2
$ cat file
2
Why does file only end up as having the output from the final command?
For the purposes of this discussion, let's assume I can't break them apart and run the commands separately.
It has only the output of the second command, as the semicolon indicates a new statement to the shell.
Just put them into parentheses:
(echo 1; echo 2) | tee file
Try:
( echo 1; echo 2 ) | tee file
Without the parentheses, it's getting parsed as:
echo 1 ; ( echo 2 | tee file )
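A brace group gives the same grouping without starting a subshell. A small sketch, with a scratch file from mktemp standing in for file (the variable name out is illustrative):

```shell
out=$(mktemp)                       # scratch file standing in for "file"

# The braces make both echo commands one compound command, so the pipe
# applies to the output of both. Note the spaces inside the braces and
# the semicolon before the closing brace; both are required.
{ echo 1; echo 2; } | tee "$out"
```

After this runs, the scratch file should hold both lines, just as with the parenthesized form.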

How to sort characters in a string?

I would like to sort the characters in a string.
E.g.
echo cba | sort-command
abc
Is there a command that will allow me to do this or will I have to write an awk script to iterate over the string and sort it?
echo cba | grep -o . | sort |tr -d "\n"
Please find the following useful methods:
Shell
Sort string based on its characters:
echo cba | grep -o . | sort | tr -d "\n"
String separated by spaces:
echo 'dd aa cc bb' | tr " " "\n" | sort | tr "\n" " "
Perl
print (join "", sort split //,$_)
Ruby
ruby -e 'puts "dd aa cc bb".split(/\s+/).sort'
Bash
With bash you have to enumerate each character of the string, in general something like:
str="dd aa cc bb";
for (( i = 0; i < ${#str}; i++ )); do echo "${str:i:1}"; done
For sorting array, please check: How to sort an array in bash?
This is cheating (because it uses Perl), but works. :-P
echo cba | perl -pe 'chomp; $_ = join "", sort split //'
Another perl one-liner
$ echo cba | perl -F -lane 'print sort @F'
abc
$ # for reverse order
$ echo xyz | perl -F -lane 'print reverse sort @F'
zyx
$ # or
$ echo xyz | perl -F -lane 'print sort {$b cmp $a} @F'
zyx
This will add a newline to the output as well, courtesy of the -l option
See Command switches for doc on all the options
The input is basically split character-wise and saved in the @F array
Then the sorted @F is printed
This will also work line-wise for a given input file
$ cat ip.txt
idea
cold
spare
umbrella
$ perl -F -lane 'print sort @F' ip.txt
adei
cdlo
aeprs
abellmru
This would have been more appropriate as a comment to one of the grep -o . solutions (my reputation's not quite up to that low bar alas, damn my lurking), but I thought it worth mentioning that separating letters can be done more efficiently within the shell. It's always worth avoiding code, but this letsep function is pretty small:
letsep ()
{
INWORD="$1"
while [ "$INWORD" ]
do
echo ${INWORD:0:1}
INWORD=${INWORD#?}
done
}
. . . and outputs one letter per line for an input string of arbitrary length. For example, once letsep is defined, populating an array FLETRS with the letters of a string contained in variable FRED could be done (assuming contemporary bash) as:
readarray -t FLETRS < <(letsep $FRED)
. . . which for word-size strings runs about twice as fast as the equivalent:
readarray -t FLETRS < <(echo $FRED | grep -o .)
Whether this is worth setting up depends on the application. I've only measured this crudely, but the slower procedural code seems to maintain an advantage over the context switch up to ~60 chars (grep is obviously more efficient, but loading it is relatively expensive). If the above operation is taking place in one or more steps of a loop over an indeterminate number of executions, the difference in efficiency can add up (at which point some might argue for switching tools and rewriting regardless, but that's another set of tradeoffs).
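For systems where grep -o is not available (it is not part of POSIX), the same pipeline can be sketched with fold -w 1, which breaks the input into one-character lines. The function name sort_chars is an assumption for illustration, and the sketch assumes single-byte characters.

```shell
# Sort the characters of a string: split into one character per line,
# sort the lines, then join them back together.
sort_chars() {
    printf '%s\n' "$1" | fold -w 1 | sort | tr -d '\n'
    echo                              # restore the trailing newline removed by tr
}
```

Calling sort_chars cba should print abc. Like the grep variant, this still starts external processes, so the trade-off described above applies here as well.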
