loop through different arguments in Rscript within Korn shell

I have an R script which I'm running in the terminal by firstly generating a .ksh file called myscript.ksh with the following information:
#!/bin/ksh
Rscript myscript.R 'Input1'
and then run the function with
./myscript.ksh
which sends the script to a node on the cluster in our department (the processes that we send to the cluster must be as a .ksh file).
'Input1' is an input argument that is used by the R script to perform some analysis.
The issue that I now have is that I need to run this script a number of times with different input arguments to the function. One solution is to generate a few .ksh files, such as:
#!/bin/ksh
Rscript myscript.R 'Input2'
and
#!/bin/ksh
Rscript myscript.R 'Input3'
and then execute them separately, but I was hoping to find a better solution.
Note that I have to do this for 100 different input arguments, so it is not realistic to write 100 of these files. Is there a way of generating another file with the information that needs to be supplied to the function, e.g. 'Input1' 'Input2' 'Input3', and then running myscript.ksh for each of these individually?
For example, I could have a variable defining the name of the input arguments and then have a loop which would pass it to myscript.ksh. Is that possible?
The reason for running these in this manner is that each iteration will hopefully be sent to a different node on the cluster, thus analysing the data at a much faster rate.

You need to do two things:
Create an array of all your input variables
Loop through the array and initiate all your calls
The following illustrates the concept:
#!/bin/ksh
#Create array of inputs - space separator
inputs=(Input1 Input2 Input3 Input4)
# Loop through all the array items {0 ... n-1}
for i in {0..3}
do
echo ${inputs[i]}
done
This will output all the values in the inputs array.
You just need to replace the contents of the do-loop with:
Rscript myscript.R ${inputs[i]}
Also, you may need to add an `&` at the end of the Rscript command line to spawn off each Rscript command as a separate background process -- otherwise, the shell will wait for each Rscript command to return before going on to the next.
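For example, a minimal sketch of the backgrounded variant (same illustrative inputs array as above; wait blocks until all background jobs have finished):
#!/bin/ksh
inputs=(Input1 Input2 Input3 Input4)
for input in "${inputs[@]}"
do
    # & sends each Rscript to the background so the loop does not block
    Rscript myscript.R "${input}" &
done
# wait for all backgrounded Rscript jobs to finish before exiting
wait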
EDIT:
Based on your comments, you need to actually generate .ksh scripts to submit to qsub. For this you just need to expand the do loop.
For example:
#!/bin/ksh
#Create array of inputs - space separator
inputs=(Input1 Input2 Input3 Input4)
# Loop through all the array items {0 ... n-1}
for i in {0..3}
do
cat > submission.ksh << EOF
#!/bin/ksh
Rscript myscript.R ${inputs[i]}
EOF
chmod u+x submission.ksh
qsub submission.ksh
done
The here-document between << EOF and the closing EOF is taken as input (STDIN) by cat, and the output (STDOUT) is written to submission.ksh.
Then submission.ksh is made executable with the chmod command.
And then the script is submitted via qsub. I'll let you fill in any other arguments you need for qsub.
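For instance, a minimal variation of that loop (the per-input file names submission_${i}.ksh and the qsub -N job-name option are only illustrative assumptions) that gives each job its own script instead of reusing one file name:
#!/bin/ksh
inputs=(Input1 Input2 Input3 Input4)
i=0
for input in "${inputs[@]}"
do
    # one script per input, so an already-queued job is never overwritten
    cat > submission_${i}.ksh << EOF
#!/bin/ksh
Rscript myscript.R ${input}
EOF
    chmod u+x submission_${i}.ksh
    # -N gives each job a recognisable name in the queue
    qsub -N "job_${input}" submission_${i}.ksh
    i=$((i + 1))
done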

When your script doesn't know all the parameters up front, you can make a .ksh file called mycode.ksh with the following contents:
#!/bin/ksh
if [ $# -ne 1 ]; then
echo "Usage: $0 input"
exit 1
fi
# Or start in the background with nohup .... & (a separate question)
Rscript myscript.R "$1"
and then run the function with
./mycode.ksh inputX
When your application knows all arguments, you can use a loop:
#!/bin/ksh
if [ $# -eq 0 ]; then
echo "Usage: $0 input(s)"
exit 1
fi
for input in "$@"; do
Rscript myscript.R "${input}"
done
and then run the function with
./mycode.ksh input1 input2 "input with space in double quotes" input4
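If you would rather keep the 100 inputs in a file, one per line (a hypothetical inputs.txt), instead of typing them on the command line, a minimal sketch is:
#!/bin/ksh
# read one input argument per line from inputs.txt (file name assumed)
while read -r input
do
    Rscript myscript.R "${input}"
done < inputs.txt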

Related

Unit testing Zsh completion script

I'm trying to write a completion script for Zsh. I'd like to unit test the completion script. For example, I'd like to test that completions for my-command --h include --help.
For Fish, I can use complete -C 'my-command --h', which would then output --help and any other valid completions.
I can't seem to find an equivalent command for Zsh. Does one exist? I've tried things like _main_complete, _complete and _normal, but either they don't support this or I'm not invoking them in the correct way (I get a lot of can only be called from completion function errors).
I get a lot of can only be called from completion function errors
This is because Zsh's completion commands can run only from inside a completion widget, which in turn can only be called while the Zsh Line Editor is active. We can work around this by activating a completion widget on an active command line inside a so-called pseudo terminal:
# Set up your completions as you would normally.
compdef _my-command my-command
_my-command () {
_arguments '--help[display help text]' # Just an example.
}
# Define our test function.
comptest () {
# Gather all matching completions in this array.
# -U discards duplicates.
typeset -aU completions=()
# Override the builtin compadd command.
compadd () {
# Gather all matching completions for this call in $reply.
# Note that this call overwrites the specified array.
# Therefore we cannot use $completions directly.
builtin compadd -O reply "$@"
completions+=("$reply[@]") # Collect them.
builtin compadd "$@" # Run the actual command.
}
# Bind a custom widget to TAB.
bindkey "^I" complete-word
zle -C {,,}complete-word
complete-word () {
# Make the completion system believe we're on a normal
# command line, not in vared.
unset 'compstate[vared]'
_main_complete "$@" # Generate completions.
# Print out our completions.
# Use of ^B and ^C as delimiters here is arbitrary.
# Just use something that won't normally be printed.
print -n $'\C-B'
print -nlr -- "$completions[@]" # Print one per line.
print -n $'\C-C'
exit
}
vared -c tmp
}
zmodload zsh/zpty # Load the pseudo terminal module.
zpty {,}comptest # Create a new pty and run our function in it.
# Simulate a command being typed, ending with TAB to get completions.
zpty -w comptest $'my-command --h\t'
# Read up to the first delimiter. Discard all of this.
zpty -r comptest REPLY $'*\C-B'
zpty -r comptest REPLY $'*\C-C' # Read up to the second delimiter.
# Print out the results.
print -r -- "${REPLY%$'\C-C'}" # Trim off the ^C, just in case.
zpty -d comptest # Delete the pty.
Running the example above will print out:
--help
If you want to test the entire completion output and not just the strings that would be inserted on the command line, then see https://unix.stackexchange.com/questions/668618/how-to-write-automated-tests-for-zsh-completion/668827#668827

How to give output location of file in shell script?

I have a shell script that calls a Perl script and an R script.
My shell script R.sh:
#!/bin/bash
./R.pl #calling Perl script
`perl -lane 'print $F[0]' /media/data/abc.cnv > /media/data/abc1.txt`;
#Shell script
Rscript R.r #calling R script
This is my R.pl (head):
`export path=$PATH:/media/exe_folder/bin`;
print "Enter the path to your input file:";
$base_dir ="/media/exe_folder";
chomp($CEL_dir = <STDIN>);
opendir (DIR, "$CEL_dir") or die "Couldn't open directory $CEL_dir";
$cel_files = "$CEL_dir"."/cel_files.txt";
open(CEL,">$cel_files")|| die "cannot open $file to write";
print CEL "cel_files\n";
for ( grep { /^[\w\d]/ } readdir DIR ){
print CEL "$CEL_dir"."/$_\n";
}close (CEL);
The output of the Perl script is the input for the shell script, and the shell script's output is the input for the R script.
I want to run the shell script by providing the input file name and output file name, like:
./R.sh home/folder/inputfile.txt home/folder2/output.txt
If the folder contains many files, it should take only the user-defined file and process it.
Is there a way to do this?
I guess this is what you want:
#!/bin/bash
# command line parameters
_input_file=$1
_output_file=$2
# #TODO: not sure if this path is the one you intended...
_script_path=$(dirname $0)
# sanity checks
if [[ -z "${_input_file}" ]] ||
[[ -z "${_output_file}" ]]; then
echo 1>&2 "usage: $0 <input file> <output file>"
exit 1
fi
if [[ ! -r "${_input_file}" ]]; then
echo 1>&2 "ERROR: can't find input file '${input_file}'!"
exit 1
fi
# process input file
# 1. with Perl script (writes to STDOUT)
# 2. post-process with Perl filter
# 3. run R script (reads from STDIN, writes to STDOUT)
perl ${_script_path}/R.pl <"${_input_file}" | \
perl -lane 'print $F[0]' | \
Rscript ${_script_path}/R.r >"${_output_file}"
exit 0
Please see the notes in the comments on how the called scripts should behave.
NOTE: I don't quite understand why you need to post-process the output of the Perl script with a Perl filter. Why not integrate it directly into the Perl script itself?
BONUS CODE: this is how you would write the main loop in R.pl to act as a proper filter, i.e. reading lines from STDIN and writing the result to STDOUT. You can use the same approach in other languages too, e.g. R.
#!/usr/bin/perl
use strict;
use warnings;
# read lines from STDIN
while (<STDIN>) {
chomp;
# add your processing code here that does something with $_, i.e. the line
# EXAMPLE: upper case first letter in all words on the line
s/\b([[:lower:]])/\u$1/g;
# write result to STDOUT
print "$_\n";
}

How To Run Different, Multiple Rscripts on SGE Cluster

I am trying to run different Rscripts on a SGE cluster, each Rscript only changes by one variable (e.g. cancer <- "UVM" or "ACC", etc.).
I have attempted two ways: either run a Single Rscript that gets command line arguments for the 30 different cancer names
OR
run each Rscript (i.e. UVM.r, ACC.r, etc.)
Either way, I am having a lot of difficulty figuring out how to submit these jobs so that I can either run one Rscript 30 times with a different argument each time, OR run multiple Rscripts with no command line arguments.
You can use a while loop in bash for this.
Set up an input file of arguments, e.g. args.txt:
UVM
ACC
Run qsub in a while loop to submit the script for each argument:
while read arg
do
echo "Rscript script.R ${arg}" | qsub <options>
done <args.txt
The above uses echo to pass the code to run to qsub.
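A minimal variation of that loop (the -N job-name and -cwd options are standard SGE qsub flags; the naming scheme is just an assumption) that makes the 30 jobs easier to tell apart in the queue and runs them from the current directory:
while read arg
do
    # name each job after its cancer code; -cwd runs it in the current directory
    echo "Rscript script.R ${arg}" | qsub -N "job_${arg}" -cwd
done < args.txt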
Another option is an array job. A job script like this:
#!/bin/bash
#$ -t 1-30
shift ${SGE_TASK_ID}
exec Rscript script.R $1
Submit it like this: qsub job_script dummy UVM ACC ...
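An equivalent sketch that keeps the arguments in args.txt rather than on the qsub command line (the file name and the use of sed to pull out the line matching the task ID are assumptions):
#!/bin/bash
#$ -t 1-30
# pick the line of args.txt corresponding to this task's ID
arg=$(sed -n "${SGE_TASK_ID}p" args.txt)
exec Rscript script.R "${arg}"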

unix shell, getting exit code with piped child

Let's say I do this in a unix shell
$ some-script.sh | grep mytext
$ echo $?
This will give me the exit code of grep, but how can I get the exit code of some-script.sh?
EDIT
Assume that the pipe operation is immutable, i.e. I cannot break it apart and run the two commands separately.
There are multiple solutions, it depends on what you want to do exactly.
The easiest and most understandable way would be to send the output to a file, then grep it after saving the exit code:
tmpfile=$(mktemp)
./some-script.sh > "$tmpfile"
retval=$?
grep mytext "$tmpfile"
rm "$tmpfile"
A trick from the comp.unix.shell FAQ (#13) explains how using the pipeline in the Bourne shell should help accomplish what you want:
You need to use a trick to pass the exit codes to the main shell. You can do it using a pipe(2). Instead of running "cmd1", you run "cmd1; echo $?" and make sure $? makes its way to the shell.
exec 3>&1
eval `
# now, inside the `...`, fd4 goes to the pipe
# whose other end is read and passed to eval;
# fd1 is the normal standard output preserved
# the line before with exec 3>&1
exec 4>&1 >&3 3>&-
{
cmd1 4>&-; echo "ec1=$?;" >&4
} | {
cmd2 4>&-; echo "ec2=$?;" >&4
} | cmd3
echo "ec3=$?;" >&4
If you're using bash:
PIPESTATUS
An array variable (see Arrays) containing a list of exit status values from the processes in the most-recently-executed foreground pipeline (which may contain only a single command).
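For example (copy PIPESTATUS right away, since the very next command overwrites it):
some-script.sh | grep mytext
status=("${PIPESTATUS[@]}")   # save before it is overwritten
echo "${status[0]}"           # exit code of some-script.sh
echo "${status[1]}"           # exit code of grep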
There is a utility named mispipe which is part of the moreutils package.
It does exactly that: mispipe some-script.sh 'grep mytext'
First approach: temporarily save the exit status in a file. This requires wrapping the first command in a subshell with parentheses:
# subshell saves the exit code to a file, then its output is piped onward
(your_script.sh.pl.others; echo $? >/tmp/myerr) | grep sh
exitcode=$(cat /tmp/myerr)   # restore saved exit code
echo $exitcode               # and print it
Another approach, presented by Randy above, with a simpler implementation:
some-script.sh | grep mytext
echo ${PIPESTATUS[0]}   # print the exit code of the first command; the array is indexed from 0
That's all. Both work under bash (a bashism, I know). Good luck :)
Neither approach saves the piped data itself to a physical file, only the exit code.

'tee' and exit status

Is there an alternative to tee which captures standard output and standard error of the command being executed and exits with the same exit status as the processed command?
Something like the following:
eet -a some.log -- mycommand --foo --bar
Where "eet" is an imaginary alternative to "tee" :) (-a means append and -- separates the captured command). It shouldn't be hard to hack such a command, but maybe it already exists and I'm not aware of it?
This works with Bash:
(
set -o pipefail
mycommand --foo --bar | tee some.log
)
The parentheses are there to limit the effect of pipefail to just the one command.
From the bash(1) man page:
The return status of a pipeline is the exit status of the last command, unless the pipefail option is enabled. If pipefail is enabled, the pipeline's return status is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands exit successfully.
I stumbled upon a couple of interesting solutions at Capture Exit Code Using Pipe & Tee.
There is the $PIPESTATUS variable available in Bash:
false | tee /dev/null
status=${PIPESTATUS[0]}           # save it before the next command resets PIPESTATUS
[ $status -eq 0 ] || exit $status
And the simplest prototype of "eet" in Perl may look as follows:
open MAKE, "command 2>&1 |" or die;
open (LOGFILE, ">>some.log") or die;
while (<MAKE>) {
print LOGFILE $_;
print
}
close MAKE; # To get $?
my $exit = $? >> 8;
close LOGFILE;
exit $exit; # Propagate the command's exit status
Here's an eet. Works with every Bash I can get my hands on, from 2.05b to 4.0.
#!/bin/bash
tee_args=()
while [[ $# -gt 0 && $1 != -- ]]; do
tee_args=("${tee_args[@]}" "$1")
shift
done
shift
# now ${tee_args[*]} has the arguments before --,
# and $* has the arguments after --
# redirect standard out through a pipe to tee
exec > >(tee "${tee_args[@]}")
# do the *real* exec of the desired program
exec "$@"
(pipefail and $PIPESTATUS are nice, but I recall them being introduced in 3.1 or thereabouts.)
This is what I consider to be the best pure-Bourne-shell solution to use as the base upon which you could build your "eet":
# You want to pipe command1 through command2:
exec 4>&1
exitstatus=`{ { command1; echo $? 1>&3; } | command2 1>&4; } 3>&1`
# $exitstatus now has command1's exit status.
I think this is best explained from the inside out – command1 will execute and print its regular output on stdout (file descriptor 1), then once it's done, echo will execute and print command1's exit code on its stdout, but that stdout is redirected to file descriptor three.
While command1 is running, its stdout is being piped to command2 (echo's output never makes it to command2 because we send it to file descriptor 3 instead of 1, which is what the pipe reads). Then we redirect command2's output to file descriptor 4, so that it also stays out of file descriptor one – because we want file descriptor one clear for when we bring the echo output on file descriptor three back down into file descriptor one so that the command substitution (the backticks) can capture it.
The final bit of magic is that first exec 4>&1 we did as a separate command – it opens file descriptor four as a copy of the external shell's stdout. Command substitution will capture whatever is written on standard out from the perspective of the commands inside it – but, since command2's output is going to file descriptor four as far as the command substitution is concerned, the command substitution doesn't capture it – however, once it gets "out" of the command substitution, it is effectively still going to the script's overall file descriptor one.
(The exec 4>&1 has to be a separate command to work with many common shells. In some shells it works if you just put it on the same line as the variable assignment, after the closing backtick of the substitution.)
(I use compound commands ({ ... }) in my example, but subshells (( ... )) would also work. The subshell will just cause a redundant forking and awaiting of a child process, since each side of a pipe and the inside of a command substitution already normally implies a fork and await of a child process, and I don't know of any shell being coded to recognize that it can skip one of those forks because it's already done or is about to do the other.)
You can look at it in a less technical and more playful way, as if the outputs of the commands are leapfrogging each other: command1 pipes to command2, then the echo's output jumps over command2 so that command2 doesn't catch it, and then command2's output jumps over and out of the command substitution just as echo lands just in time to get captured by the substitution so that it ends up in the variable, and command2's output goes on its way to the standard output, just as in a normal pipe.
Also, as I understand it, at the end of this command, $? will still contain the return code of the second command in the pipe, because variable assignments, command substitutions, and compound commands are all effectively transparent to the return code of the command inside them, so the return status of command2 should get propagated out.
A caveat is that it is possible that command1 will at some point end up using file descriptors three or four, or that command2 or any of the later commands will use file descriptor four, so to be more hygienic, we would do:
exec 4>&1
exitstatus=`{ { command1 3>&-; echo $? 1>&3; } 4>&- | command2 1>&4; } 3>&1`
exec 4>&-
Commands inherit file descriptors from the process that launches them, so the entire second line will inherit file descriptor four, and the compound command followed by 3>&1 will inherit the file descriptor three. So the 4>&- makes sure that the inner compound command will not inherit file descriptor four, and the 3>&- makes sure that command1 will not inherit file descriptor three, so command1 gets a 'cleaner', more standard environment. You could also move the inner 4>&- next to the 3>&-, but I figure why not just limit its scope as much as possible.
Almost no programs uses pre-opened file descriptor three and four directly, so you almost never have to worry about it, but the latter is probably best to keep in mind and use for general-purpose cases.
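As a quick worked example of the technique, using false and cat purely as stand-ins for command1 and command2:
exec 4>&1
exitstatus=`{ { false 3>&-; echo $? 1>&3; } 4>&- | cat 1>&4; } 3>&1`
exec 4>&-
echo "$exitstatus"   # prints 1, the exit status of false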
mycommand --foo --bar 2>&1 | tee -a some.log; (exit ${PIPESTATUS[0]})
KornShell, all in one line:
foo; RET_VAL=$?; if test ${RET_VAL} != 0;then echo $RET_VAL; echo Error occurred!>/tmp/out.err;exit 2;fi |tee >>/tmp/out.err ; if test ${RET_VAL} != 0;then exit $RET_VAL;fi
#!/bin/sh
logfile="$1"
shift
exec 2>&1
exec "$#" | tee "$logfile"
Hopefully this works for you.
Assuming Bash or Z shell (zsh),
my_command >>my_log 2>&1
N.B. The sequence of redirection and duplication of standard error onto standard output is significant!
I didn't realise you wanted to see the output on screen as well. This will of course direct all output to the file my_log.
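To see why the order matters, compare:
my_command >>my_log 2>&1   # stdout and stderr both end up in my_log
my_command 2>&1 >>my_log   # stderr still goes to the terminal, because 2>&1
                           # duplicates stdout before it is redirected to the file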
