zsh for loop over command output

I store the output of a command like this:
all=(`some command`)
for i in $all; do
echo $i
done
This returns output like this:
string1a string1b string1c
string2a string2b string2c
string3a string3b string3c
However, I want to access just part of the value of i, using a space as the separator. For example, I want output like this:
string1a string1c
string2a string2c
string3a string3c
How do I go about this?

% print -l 'string1a string1b string1c' 'string2a string2b string2c' \
'string3a string3b string3c'
string1a string1b string1c
string2a string2b string2c
string3a string3b string3c
% all=( "$( print -l 'string1a string1b string1c' 'string2a string2b string2c' \
'string3a string3b string3c' )" )
% print -l ${all//string<->b }
string1a string1c
string2a string2c
string3a string3c
%
$(...) does the same as backticks, but is preferred because it nests cleanly, whereas nested backticks need awkward escaping.
"$(...)" prevents word splitting, so the whole output, line breaks included, ends up in a single word.
${foo//bar} removes from parameter $foo all substrings that match the pattern bar.
<-> matches any string of digits, i.e. any non-negative integer.
print -l prints each argument on a separate line.
Alternatively, you can do this without creating the parameter $all:
% print -l ${"$(
print -l 'string1a string1b string1c' 'string2a string2b string2c' \
'string3a string3b string3c'
)"//string<->b }
string1a string1c
string2a string2c
string3a string3c
%
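If you would rather not rely on zsh-specific pattern operators, a more portable sketch is to let read split each line into fields and print only the ones you want. This is a swapped-in technique, not part of the answer above; it assumes every line has at least three whitespace-separated fields, and print_first_and_third is a hypothetical helper name. It should run unchanged in both zsh and bash.

```shell
# Hypothetical helper: print the 1st and 3rd whitespace-separated fields of
# each input line. `read` splits on IFS (space, tab, newline by default);
# anything after the third field lands in `rest` and is ignored.
print_first_and_third() {
  while read -r first second third rest; do
    printf '%s %s\n' "$first" "$third"
  done
}

# Usage: pipe the command output straight in instead of storing it first.
printf '%s\n' 'string1a string1b string1c' \
              'string2a string2b string2c' \
              'string3a string3b string3c' | print_first_and_third
```

With the command from the question, the call would simply be some command | print_first_and_third.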

Related

Pass variable from bash to R with commandArgs

I'm having a terrible go trying to pass some variables from the shell to R. I am hesitant to post this because I can't figure out a reasonable way to make this reproducible, since it involves a tool that has to be downloaded, and really it's more of a general methodology issue that I don't think needs to be reproducible, if you can just suspend your disbelief and bear with me for a quick minute.
I have arguments that are defined in a bash script: $P, $G, and $O.
I have some if/then statements and everything is fine until I get to the $O options.
This is the first part of the $O section and it works fine. It grabs data from $P and passes it to the twoBitToFa utility from UCSC's genome project and outputs the data correctly in a .fa file. Beautiful. (Although I think using 'stdout' and '>' is perhaps redundant?)
if [ "$O" = "fasta" ]
then
awk '{print $0" "$1":"$2"-"$3}' "$P" |
twoBitToFa -bed=stdin -udcDir=. "$twobit" stdout > "${P%.bed}".fa
fi
The next section is where I am stuck. If the $O option is "bed", then I want to invoke the Rscript command and pass my stuff over to R. I am able to pass my $P, $G, and $O variables without issue, but now I also need to pass the output from the twoBitToFa function. I could add a step and make the .fa file and then pick that up in R, but I am trying to skip the .fa file creation step and output a different file type instead (.bed). Here are some things I have tried:
# try saving twoBitToFa output to variable and including it in the variables passed to R:
if [ "$O" = "bed" ]
then
awk '{print $0" "$1":"$2"-"$3}' "$P" |
myvar=$(twoBitToFa -bed=stdin -udcDir=. "$twobit" stdout) \
Rscript \
GetSeq_R.r \
$P \
$G \
$O \
$myvar
fi
To check what variables come through, my GetSeq_R.r script starts with:
args = commandArgs(trailingOnly=TRUE)
print(args)
With the above code, the output only includes my $P, $G, and $O variables; $myvar doesn't make it. $P is the TAD-1 file, $G is "hg38", and $O is "bed".
[1] "TAD-1_template.bed" "hg38" "bed"
I am not sure if the way I am trying to pass the data in the variable is wrong. From everything I've read, it seems like it should work. I've also tried using tee to see what is in my stdout at that step like so:
if [ "$O" = "bed" ]
then
awk '{print $0" "$1":"$2"-"$3}' "$P" |
twoBitToFa -bed=stdin -udcDir=. "$twobit" stdout | tee \
Rscript \
GetSeq_R.r \
$P \
$G \
$O
fi
And the data I want to pass to R is correctly shown in my console by using tee. I've tried saving stdout and tee to a variable and passing that variable to R, thinking maybe it's something about twoBitToFa that refuses to be put inside a variable, but was unsuccessful. I've spent hours looking up info about tee, stdout, and passing variables from bash to R. I feel like I'm missing something fundamental, or trying to do something impossible, and would really appreciate some other eyes on this.
Here's the whole bash script, in case that's illuminating. Do I need to define a variable in "$@" for what I am trying to pass to R, even though it's not something I want the user to be aware of? Am I capturing the variable with $myvar incorrectly? Can I get the contents of stdout or tee to show up in R?
Thanks in advance.
for arg in "$@"; do
shift
case "$arg" in
"--path") set -- "$@" "-P" ;;
"--genome") set -- "$@" "-G" ;;
"--output") set -- "$@" "-O" ;;
"--help") set -- "$@" "-h" ;;
*) set -- "$@" "$arg" ;;
esac
done
while getopts ":P:G:O:h" OPT
do
case $OPT in
P) P=$OPTARG;;
G) G=$OPTARG;;
O) O=$OPTARG;;
h) help ;;
\?)
echo "Invalid option: -$OPTARG" >&2
usage
exit 1
;;
:)
echo "Option -$OPTARG requires an argument." >&2
usage
exit 1
;;
esac
done
num_col=$(cat "$P" | awk "{print NF; exit}")
if [ "$num_col" = 3 ]
then
echo -e "\n\n3 column bed file detected; no directional considerations for sequences \n\n"
if [ "$G" = "hg38" ]
then
twobit="https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.2bit"
fi
if [ "$G" = "hg19" ]
then
twobit="https://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/hg19.2bit"
fi
if [ "$O" = "fasta" ]
then
awk '{print $0" "$1":"$2"-"$3}' "$P" |
twoBitToFa -bed=stdin -udcDir=. "$twobit" stdout > "${P%.bed}".fa
fi
if [ "$O" = "bed" ]
then
awk '{print $0" "$1":"$2"-"$3}' "$P" |
#myvar=$(twoBitToFa -bed=stdin -udcDir=. "$twobit" stdout) \
Rscript \
GetSeq_R.r \
$P \
$G \
$O \
$myvar
fi
fi
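For what it's worth, a likely cause of the first attempt failing: in myvar=$(...) Rscript ... $myvar, the assignment is only a prefix for that one command, and the shell expands the command's arguments before it performs the assignment, so $myvar is still empty (and, unquoted, vanishes from the argument list). The tee attempt does not run R at all: tee treats every word after it as an output file name, so it writes copies of the data to files named Rscript, GetSeq_R.r, and so on, including one named after $P, which overwrites the input BED file. Below is a minimal sketch of the capture-then-pass pattern; produce_fasta and show_args are hypothetical stand-ins for the awk | twoBitToFa pipeline and for Rscript GetSeq_R.r, so the sketch runs without those tools.

```shell
# Hypothetical stand-in for: awk '...' "$P" | twoBitToFa ... stdout
produce_fasta() {
  printf '>chr1:1-10\nACGTACGTAC\n'
}

# Hypothetical stand-in for: Rscript GetSeq_R.r ...
# Reports how many arguments arrived, then prints each one in brackets.
show_args() {
  printf 'argc=%s\n' "$#"
  for a in "$@"; do
    printf '[%s]\n' "$a"
  done
}

# Shape of the original attempt: "$v" is expanded before v is assigned,
# so this prints argc=1 and an empty [] (quoted here to keep the empty
# value visible; unquoted, as in the question, it disappears entirely).
unset v
v=$(produce_fasta) show_args "$v"

# Working shape: capture the output first, as a statement of its own ...
myvar=$(produce_fasta)
# ... then pass it quoted, so the multi-line text arrives as one argument.
show_args "TAD-1_template.bed" "hg38" "bed" "$myvar"
```

One caveat: the total size of command-line arguments is limited (see getconf ARG_MAX), so a large FASTA may not fit. In that case it may be simpler to pipe the twoBitToFa output into Rscript GetSeq_R.r "$P" "$G" "$O" and read it on the R side with readLines(file("stdin")).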

Display matched string to end of line

How do I find a particular string in a file and display the matched string and the rest of the line?
For example, I have this line in a.txt:
This code gives ORA-12345 in my code.
I am searching for the string 'ORA-'.
The output should be:
ORA-12345 in my code
I tried using grep:
grep 'ORA-*' a.txt
but it prints the whole line.
# Create test data:
echo "junk ORA-12345 more stuff" > a.tst
echo "junk ORB-12345 another stuff" >> a.tst
# Actual command:
# the -o (--only-matching) flag will print only the matched result, and not the full line
cat a.tst | grep -o 'ORA-.*$' # ORA-12345 more stuff
As fedorqui pointed out, you don't need cat:
grep -o 'ORA-.*$' a.tst
An additional answer in awk:
awk '$0 ~ "ORA" {print substr($0, match($0, "ORA"))}' a.tst
From the inside out, here's what's going on:
match($0, "ORA") finds where in the line ORA appears. For the example line from the question (This code gives ORA-12345 in my code.), that is position 17.
substr($0, match($0, "ORA")) then returns everything from that position to the end of the line.
$0 ~ "ORA" makes sure that the above is applied only to lines that contain ORA.
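With sed:
echo "This code gives ORA-12345 in my code." | sed 's/.*ORA-/ORA-/'
Because .* is greedy, this keeps the text from the last ORA- on the line. On a file, use sed -n 's/.*ORA-/ORA-/p' a.txt so that lines without ORA- are not printed unchanged.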
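For completeness, here is a pure-shell sketch that needs no external tools: ${line#*ORA-} removes the shortest prefix ending in ORA-, i.e. everything up to and including the first occurrence, and the case statement skips lines without a match. The helper name from_first_match is hypothetical, and reading line by line in the shell will be slower than grep on large files.

```shell
# Hypothetical helper: for each input line containing $1, print the text
# from the first occurrence of $1 to the end of the line.
from_first_match() {
  needle=$1
  while IFS= read -r line; do
    case $line in
      *"$needle"*)
        # ${line#*"$needle"} drops the shortest prefix ending in the needle,
        # i.e. everything up to and including its first occurrence.
        printf '%s%s\n' "$needle" "${line#*"$needle"}"
        ;;
    esac
  done
}

# Usage: prints "ORA-12345 in my code." and skips the non-matching line.
printf '%s\n' 'This code gives ORA-12345 in my code.' 'no match here' |
  from_first_match 'ORA-'
```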

awk passing a variable

I am struggling with an awk problem in my bash shell script. In the snippet below, I am passing a variable var_awk to awk to use as a regular expression. The idea is to get lines above a regular expression, but the echo below is not displaying any data:
echo `ls -ltr $date*$f* | /usr/xpg4/bin/awk -v reg=$var_awk '/reg/ {print $0}'`
I am unable to use reg as a regex: when I print reg, it shows the expected value, but the regex match does not work as expected.
if [ $GE == "HBCA" ] || [ $GE == "HBUS" ] || [ $GE == "HBEU" ]; then
for f in `ls -ltr $date*GEN*REVAL*log|grep -v LPD | awk '{split($9,a,"_")}{print a[3]}'`; do
echo $f
var_awk="$date"_RESET_CALC_"$f"
echo $var_awk
echo `ls -ltr $date*$f* | /usr/xpg4/bin/awk -v reg=$var_awk '/reg/ {print $0}'`
You cannot use a variable inside /.../ that way. You need to do:
/usr/xpg4/bin/awk -v reg="$var_awk" '$0~reg{ print $0 }'
or simply
/usr/xpg4/bin/awk -v reg="$var_awk" '$0~reg'
Inside /.../, reg is taken as the literal text "reg", not as your variable.
Quote your shell variables.
Try this:
...whatever you had already..|awk -v reg="$var_awk" '$0~reg'
It is better to wrap the shell variable in quotes, e.g. in case it contains spaces.
/pattern/ in awk is called a regex constant: its text is fixed in the program, so it cannot come from a variable (hence "constant"). Here we need a dynamic regex instead, i.e. a string expression such as reg on the right-hand side of ~.
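The difference is easy to see side by side. In this sketch the sample lines and the value of var_awk are made up, and plain awk is used instead of /usr/xpg4/bin/awk, which is assumed to behave the same way here.

```shell
# Made-up pattern value and sample input, for illustration only.
var_awk='20240101_RESET_CALC_X'

sample() {
  printf '%s\n' \
    'file 20240101_RESET_CALC_X.log' \
    'file 20240101_OTHER_Y.log' \
    'line mentioning reg literally'
}

# /reg/ is a regex constant: it looks for the three letters "reg",
# so this prints only: line mentioning reg literally
sample | awk -v reg="$var_awk" '/reg/'

# $0 ~ reg is a dynamic regex: it uses the value of the awk variable reg,
# so this prints only: file 20240101_RESET_CALC_X.log
sample | awk -v reg="$var_awk" '$0 ~ reg'
```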

In a UNIX terminal, how do I get part of a filename in a folder?

I have n files in a folder that follow a common naming format.
E.g.: ABCD.EXXXX.ZZZZ.ZZZZZ.txt
In the name above, ABCD.E is common to all the files and ZZZZ.ZZZZZ is a string the user chooses. I need to extract XXXX from all the files and display the distinct XXXX values to the user. Is there any way to do this? Thanks in advance.
Use ls -1 to list the relevant files. Pipe the list into sed to strip the leading 'ABCD.E', then into sed again to remove everything from the first '.' onwards. Finally, sort -u keeps each distinct value once.
ls -1 ABCD\.E*\.txt | sed 's/^ABCD\.E//' | sed 's/\..*//' | sort -u
Alternatively, if you want a little more control over the output, you can do the second step with awk:
ls -1 ABCD\.E*\.txt | sed 's/^ABCD\.E//' | awk 'BEGIN{FS="."}{print "value =", $1, "user=", $2"."$3}'
awk -F'.' '{print $2}' filename
You can try printing $1, $2, $3, ... to see how the name is split into fields; for ABCD.EXXXX.ZZZZ.ZZZZZ.txt, $2 is EXXXX, so the leading E still has to be removed.
You can use the bash/ksh parameter substitutions # and % for this from inside the shell.
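function get_filename_section {
typeset f=${1:?}            # the file name; error out if it is missing
typeset r=${f#ABCD.E}       # strip the common ABCD.E prefix
printf '%s\n' "${r%%.*}"    # drop everything from the first '.' onwards
}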
Testing:
[[ $( get_filename_section ABCD.EXXXX.ZZZZ.ZZZZZ.txt ) == XXXX ]] &&
echo ok || echo no
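Combining the parameter-substitution idea with the "distinct values" part of the question, a sketch might loop over the matching files and de-duplicate with sort -u. The function name distinct_parts and its directory argument are assumptions for illustration, and it assumes XXXX itself never contains a '.'.

```shell
# Hypothetical helper: list the distinct XXXX parts of files named
# ABCD.E<XXXX>.<anything>.txt in directory $1 (default: current directory).
distinct_parts() {
  dir=${1:-.}
  for path in "$dir"/ABCD.E*.txt; do
    [ -e "$path" ] || continue      # the glob matched nothing
    name=${path##*/}                # drop the directory part
    rest=${name#ABCD.E}             # drop the common prefix
    printf '%s\n' "${rest%%.*}"     # keep the text up to the first '.'
  done | sort -u
}
```

Usage: distinct_parts /path/to/folder prints each XXXX value once, sorted.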

grep for a string in a line if the previous line doesn't contain a specific string

I have the following lines in a file:
abcdef ghi jkl
uvw xyz
I want to grep for the string "xyz" only if the previous line does not contain the string "jkl".
I know how to use grep's -v option to select lines that don't contain a specific string, but I don't know how to do this across two different lines.
grep is really a line-oriented tool. It might be possible to achieve what you want with it, but it's easier to use Awk:
awk '
/xyz/ && !skip { print }
{ skip = /jkl/ }
' file
Read as: for every line, do
if the current line matches xyz and we haven't just seen jkl, print it;
set the variable skip to indicate whether we've just seen jkl.
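If the two strings come from shell variables, the same awk logic can receive them via -v and test them as dynamic regexes with ~. A minimal sketch; match_unless_prev is a hypothetical name, and both arguments are treated as extended regular expressions rather than fixed strings.

```shell
# Hypothetical helper: print lines matching $1 unless the previous line
# matched $2. Remaining arguments are input files (stdin if there are none).
match_unless_prev() {
  want=$1
  avoid=$2
  shift 2
  awk -v want="$want" -v avoid="$avoid" '
    $0 ~ want && !skip { print }   # previous line did not contain avoid
    { skip = ($0 ~ avoid) }        # remember whether this line contains avoid
  ' "$@"
}

# Usage with the sample lines from the question: prints nothing, because
# the xyz line follows a line containing jkl.
printf '%s\n' 'abcdef ghi jkl' 'uvw xyz' | match_unless_prev xyz jkl
```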
sed '/jkl/{N;d}; /xyz/!d'
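If a line contains jkl, delete it together with the next line;
then print only the remaining lines that contain xyz.
Caveat: if two consecutive lines contain jkl, N consumes the second one, so an xyz line right after it is not suppressed.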
I think you're better off using an actual programming language, even a simple one like Bash or AWK or sed. For example, using Bash:
(
previous_line_matched=
while IFS= read -r line ; do
if [[ ! "$previous_line_matched" && "$line" == *xyz* ]] ; then
echo "$line"
fi
if [[ "$line" == *jkl* ]] ; then
previous_line_matched=1
else
previous_line_matched=
fi
done < input_file
)
Or, more tersely, using Perl:
perl -ne 'print if m/xyz/ && ! $skip; $skip = m/jkl/' < input_file
