How to give output location of file in shell script?

I have a shell script that calls a Perl script and an R script.
My shell script R.sh:
#!/bin/bash
./R.pl    # calling Perl script
perl -lane 'print $F[0]' /media/data/abc.cnv > /media/data/abc1.txt    # shell post-processing step
Rscript R.r    # calling R script
This is the head of my R.pl:
$ENV{PATH} .= ":/media/exe_folder/bin";    # extend PATH for this process (shell backticks would only affect a throwaway subshell)
print "Enter the path to your input file:";
$base_dir = "/media/exe_folder";
chomp($CEL_dir = <STDIN>);
opendir(DIR, $CEL_dir) or die "Couldn't open directory $CEL_dir";
$cel_files = $CEL_dir . "/cel_files.txt";
open(CEL, ">$cel_files") || die "cannot open $cel_files to write";
print CEL "cel_files\n";
for ( grep { /^\w/ } readdir DIR ) {
    print CEL "$CEL_dir/$_\n";
}
close(CEL);
The output of the Perl script is the input for the shell step, and the shell step's output is the input for the R script.
I want to run the shell script by providing the input file name and the output file name, like:
./R.sh home/folder/inputfile.txt home/folder2/output.txt
If the folder contains many files, the script should take only the user-defined file and process it.
Is there a way to do this?

I guess this is what you want:
#!/bin/bash
# command line parameters
_input_file=$1
_output_file=$2
# #TODO: not sure if this path is the one you intended...
_script_path=$(dirname "$0")
# sanity checks
if [[ -z "${_input_file}" ]] ||
   [[ -z "${_output_file}" ]]; then
    echo 1>&2 "usage: $0 <input file> <output file>"
    exit 1
fi
if [[ ! -r "${_input_file}" ]]; then
    echo 1>&2 "ERROR: can't find input file '${_input_file}'!"
    exit 1
fi
# process input file
# 1. with Perl script (writes to STDOUT)
# 2. post-process with Perl filter
# 3. run R script (reads from STDIN, writes to STDOUT)
perl "${_script_path}/R.pl" <"${_input_file}" | \
    perl -lane 'print $F[0]' | \
    Rscript "${_script_path}/R.r" >"${_output_file}"
exit 0
Please see the notes below on how the called scripts should behave.
NOTE: I don't quite understand why you need to post-process the output of the Perl script with a Perl filter. Why not integrate it directly into the Perl script itself?
BONUS CODE: this is how you would write the main loop in R.pl so that it acts as a proper filter, i.e. reading lines from STDIN and writing the result to STDOUT. You can use the same approach in other languages, e.g. R.
#!/usr/bin/perl
use strict;
use warnings;
# read lines from STDIN
# read lines from STDIN
while (<STDIN>) {
    chomp;
    # add your processing code here that does something with $_, i.e. the line
    # EXAMPLE: upper case the first letter of every word on the line
    s/\b([[:lower:]])/\u$1/g;
    # write result to STDOUT
    print "$_\n";
}
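For a quick standalone test of this filter (assuming the rewritten R.pl is in the current directory), you could pipe a couple of lines through it:
printf 'hello world\nfoo bar\n' | perl R.pl
With the substitution above it should print Hello World and Foo Bar.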

Related

What is a neat command line equivalent to RStudio's Knit HTML?

Given an .Rmd file, you can use RStudio to knit .html, .docx and .pdf files using knitr. It would be great to shift this process completely to the command line. My approach so far:
Rscript -e "library(knitr); knit('test.Rmd')" # This creates test.md
pandoc test.md >> test.html
This works fine, but the resulting test.html does not come with the same pretty makeover as in RStudio. Any suggestions on how best to knit .Rmd files to .html via the command line and end up with a pretty .html?
Extra question: What would be the best command line solution for .pdf or .docx?
rmarkdown::render("test.Rmd", "html_document")
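To run this from the shell in a single step (assuming test.Rmd is in the current directory), the equivalent one-liner is:
Rscript -e "rmarkdown::render('test.Rmd', 'html_document')"
Swap 'html_document' for 'pdf_document' or 'word_document' to get the other formats.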
Following up on the accepted answer, I've drafted a bash script called "knitter" that will do everything needed; all the user needs to do is input ./knitter file.Rmd file.html or ./knitter file.Rmd file.pdf.
The script is below:
#!/bin/bash
### Test usage; if incorrect, output correct usage and exit
if [ "$#" -ne 2 ]; then
echo "********************************************************************"
echo "* Knitter version 1.0 *"
echo "********************************************************************"
echo -e "The 'knitter' script converts Rmd files into HTML or PDFs. \n"
echo -e "usage: knitter file.Rmd file.{pdf,html} \n"
echo -e "Spaces in the filename or directory name may cause failure. \n"
exit
fi
# Stem and extension of file
extension1=`echo $1 | cut -f2 -d.`
extension2=`echo $2 | cut -f2 -d.`
### Test if the file exists
if [[ ! -r $1 ]]; then
echo -e "\n File does not exist, or option misspecified \n"
exit
fi
### Test file extension
if [[ $extension1 != Rmd ]]; then
echo -e "\n Invalid input file, must be an Rmd file \n"
exit
fi
# Create temporary script
# Use user-defined 'TMPDIR' if possible; else, use /tmp
if [[ -n $TMPDIR ]]; then
pathy=$TMPDIR
else
pathy=/tmp
fi
# Tempfile for the script
tempscript=`mktemp $pathy/tempscript.XXXXXX` || exit 1
if [[ $extension2 == "pdf" ]]; then
echo "library(rmarkdown); rmarkdown::render('"${1}"', 'pdf_document')" >> $tempscript
Rscript $tempscript
fi
if [[ $extension2 == "html" ]]; then
echo "library(rmarkdown); rmarkdown::render('"${1}"', 'html_document')" >> $tempscript
Rscript $tempscript
fi
My simpler command-line script, similar to Tyler R.'s:
In your .profile or similar, add:
function knit() {
R -e "rmarkdown::render('$1')"
}
Then, on the command line, type knit file.Rmd
I set up output format in the Rmd header: output: github_document or similar
getwd()
setwd("C:/Location of the RMD file")
# Render with the default output format from the Rmd header
rmarkdown::render("Filename.Rmd")
# To make it into a PDF you can use the below code
rmarkdown::render("Filename.Rmd", "pdf_document")
I typed the above into an R script and triggered it from the Python command prompt, and it solved my requirement. :)
Kindly note: if it doesn't work, try installing LaTeX and try again. All the best. :)
From the Mac/Linux terminal, you could run:
R -e "rmarkdown::render('README.Rmd')"
(replacing README.Rmd with whatever file you wish to knit)

loop through different arguments in Rscript within Korn shell

I have an R script which I'm running in the terminal by first generating a .ksh file called mycode.ksh with the following information:
#!/bin/ksh
Rscript myscript.R 'Input1'
and then running the script with
./mycode.ksh
which sends the script to a node on the cluster in our department (the processes that we send to the cluster must be submitted as .ksh files).
'Input1' is an input argument that is used by the R script for some analysis.
The issue that I now have is that I need to run this script a number of times with different input arguments to the function. One solution is to generate a few .ksh files, such as:
#!/bin/ksh
Rscript myscript.R 'Input2'
and
#!/bin/ksh
Rscript myscript.R 'Input3'
and then execute them separately, but I was hoping to find a better solution.
Note that I have to do this for 100 different input arguments, so it is not realistic to write 100 of these files. Is there a way of generating another file with the inputs that need to be supplied to the function, e.g. 'Input1' 'Input2' 'Input3', and then running mycode.ksh for each of these individually?
For example, I could have a variable defining the names of the input arguments and then have a loop which would pass them to mycode.ksh. Is that possible?
The reason for running these in this manner is that each iteration will hopefully be sent to a different node on the cluster, thus analysing the data at a much faster rate.
You need to do two things:
Create an array of all your input variables
Loop through the array and initiate all your calls
The following illustrates the concept:
#!/bin/ksh
#Create array of inputs - space separator
inputs=(Input1 Input2 Input3 Input4)
# Loop through all the array items {0 ... n-1}
for i in {0..3}
do
echo ${inputs[i]}
done
This will output all the values in the inputs array.
You just need to replace the contents of the do-loop with:
Rscript myscript.R ${inputs[i]}
Also, you may need to add a `&` at the end of the Rscript command line to spawn off each Rscript command as a separate background process -- otherwise, the shell would wait for each Rscript command to return before going on to the next.
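For example (a sketch; the wait keeps the wrapper alive until all background jobs have finished):

#!/bin/ksh
inputs=(Input1 Input2 Input3 Input4)
for i in {0..3}
do
    Rscript myscript.R ${inputs[i]} &    # run in the background
done
wait    # block until all background Rscript jobs are done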
EDIT:
Based on your comments, you need to actually generate .ksh scripts to submit to qsub. For this you just need to expand the do loop.
For example:
#!/bin/ksh
#Create array of inputs - space separator
inputs=(Input1 Input2 Input3 Input4)
# Loop through all the array items {0 ... n-1}
for i in {0..3}
do
cat > submission.ksh << EOF
#!/bin/ksh
Rscript myscript.R ${inputs[i]}
EOF
chmod u+x submission.ksh
qsub submission.ksh
done
The << EOF ... EOF pair delimits what will be taken as input (STDIN), and the output (STDOUT) will be written to submission.ksh.
Then submission.ksh is made executable with the chmod command.
And then the script is submitted via qsub. I'll let you fill in any other arguments you need for qsub.
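One optional refinement (a sketch): write one script per input, so each generated file is kept for later inspection and nothing is overwritten between submissions:

#!/bin/ksh
inputs=(Input1 Input2 Input3 Input4)
for i in {0..3}
do
    # one submission script per input: submission.0.ksh, submission.1.ksh, ...
    cat > submission.${i}.ksh << EOF
#!/bin/ksh
Rscript myscript.R ${inputs[i]}
EOF
    chmod u+x submission.${i}.ksh
    qsub submission.${i}.ksh
done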
When your script doesn't know all the parameters in advance, you can make a .ksh file called mycode.ksh with the following information:
#!/bin/ksh
if [ $# -ne 1 ]; then
echo "Usage: $0 input"
exit 1
fi
# Or start it in the background with nohup ... & (that is a separate question)
Rscript myscript.R "$1"
and then run the script with
./mycode.ksh inputX
When your application knows all arguments, you can use a loop:
#!/bin/ksh
if [ $# -eq 0 ]; then
echo "Usage: $0 input(s)"
exit 1
fi
for input in "$@"; do    # "$@" preserves quoted arguments that contain spaces
    Rscript myscript.R "${input}"
done
and then run the script with
./mycode.ksh input1 input2 "input with space in double quotes" input4
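If the 100 inputs live in a file instead, one per line, a small variation of the loop reads them from there; this sketch assumes a file named inputs.txt:

#!/bin/ksh
# run the R script once per line of inputs.txt
while read -r input; do
    Rscript myscript.R "${input}"
done < inputs.txt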

How to write an output of multiple simultaneous R script into a single file?

For my simulations, I wrote a little loop to run multiple (n=20) instances of "my_script.R" simultaneously:
#! /bin/bash
declare -i i=1 n=20
while [ $i -le $n ]; do
    echo "#!/bin/bash --login" > my.qsub.${i}
    echo "#PBS -l nodes=1:ppn=1" >> my.qsub.${i}
    echo "#PBS -l mem=2GB" >> my.qsub.${i}
    echo "cd /~path to my wd/" >> my.qsub.${i}
    echo "module load R/3.0.1" >> my.qsub.${i}
    echo -n 'R CMD BATCH --vanilla --slave --no-timing my_script.R ' >> my.qsub.${i}
    echo "" >> my.qsub.${i}
    qsub -l walltime=03:59:00 my.qsub.${i}
    sleep 2
    let i+=1
done
Within "my_script.R", I have a for loop (n=1,...,B) which writes its results into a "fnam" .txt file at the end of each iteration:
cat(file=fnam, append=TRUE, Results[n,], "\n")
Everything works fine, BUT it looks like there may be a problem if two different instances of the same R script try to append to the fnam file simultaneously.
Did anyone try to synchronize/order the way in which multiple instances of the same R script append to the same output file?
The jobs will need to write to different files, which you can combine afterwards. You can submit all the jobs at once as an 'array job' using qsub -t 1-20 -sync y. That creates 20 identical tasks, each of which can get a unique identifier for itself via the environment variable SGE_TASK_ID and craft a unique filename from it. The -sync y option makes qsub wait until all tasks complete before returning, after which you can concatenate the files together.
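A minimal sketch of such a job script (hypothetical names; this assumes an SGE-style scheduler and that my_script.R takes its output filename as a command-line argument, e.g. read via commandArgs() in R):

#!/bin/bash --login
#$ -t 1-20
# each array task gets its own SGE_TASK_ID, so each one writes to its own file
cd /~path to my wd/
module load R/3.0.1
Rscript my_script.R "results.${SGE_TASK_ID}.txt"

Afterwards, cat results.*.txt > all_results.txt stitches the pieces together.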

Unix command to capitalize first letter of file name

I have a folder "activity" which contains files like getEmployee.java, getTable.java, etc. I was wondering if there was a Unix command that could rename the files to give me GetEmployee.java, GetTable.java, and so on.
I've tried mv getEmployee.java GetEmployee.java
However, this is pretty cumbersome, as you can imagine, since I have almost 70 files. Is there a way in Unix I can do this? I usually use sed to replace stuff, but I don't think that works for filenames. Can someone suggest an easier way, please?
This is a shell script that will find *.java files in the local directory and alter them:
#!/bin/sh
# -maxdepth 1: the mv below uses the bare filename, so only look in this directory
find . -maxdepth 1 -name "*.java" -print | gawk -F "/" '
{
new = sprintf( "%s%s", toupper( substr( $NF, 1, 1 ) ), substr( $NF, 2 ) )
cmd = sprintf( "mv %s %s", $NF, new )
# comment out the next two lines and uncomment the printf() to see the commands
cmd | getline ret_val
close( cmd )
#printf( "%s => ret_val = %s\n", cmd, ret_val"" )
} '
I saved it to a script named "alterjava", ran "chmod +x alterjava", then ran it on a directory of zero-sized files I made up for testing. You can check the commands before running them by commenting out the two cmd lines and uncommenting the printf() line.
Bash can perform case-replacement in parameter substitution, using ${var^} to capitalise the first character:
#!/bin/bash
for i in *.java; do mv -v "$i" "${i^}"; done
Note that this is not standard POSIX; other shells need not have this feature.
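For shells without this feature, a rough POSIX-sh equivalent (a sketch built from cut and tr) would be:

#!/bin/sh
for i in *.java; do
    # split the name into first character and remainder, upper-case the first
    first=$(printf '%s' "$i" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$i" | cut -c2-)
    mv -v "$i" "${first}${rest}"
done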
rename -vn 's/([a-z])(\w+.java)/\u$1$2/' *.java
Remove the -n to actually execute the command; with -n, the (Perl-based) rename only prints what it would do.

unable to run awk command as a shell script

I am trying to create a shell script to search for a specific index in a multiline CSV file.
The code I am trying is:
#!/bin/sh
echo "please enter the line no. to search: "
read line
echo "please enter the index to search at: "
read index
awk -F, 'NR=="$line"{print "$index"}' "$1"
The awk command works absolutely fine when I use it on the shell directly. But when I try to create a shell script out of this command, it fails and gives no output: it reads the line no. and index, and then prints nothing at all.
Is there something I am doing wrong?
I run the file at the shell by typing:
./fetchvalue.sh newfile.csv
Your quoting is not going to work. Try this:
awk -F, 'NR=="'$line'"{print $'$index'}' "$1"
Rather than going through quoting hell, try this:
awk -F, -v line="$line" -v myindex="$index" 'NR==line {print $myindex}' "$1"
(index is a built-in function name in awk, so I gave the variable a slightly different name)
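For illustration, here is a hypothetical session with a small two-line CSV, using the corrected awk invocation inside fetchvalue.sh:

$ cat newfile.csv
a,b,c
d,e,f
$ ./fetchvalue.sh newfile.csv
please enter the line no. to search:
2
please enter the index to search at:
3
f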
