Executing R Script in Unix with parameters - r

Is there a way to run an R file from a Unix shell script and pass arguments to it?
I know that an R file can be run by typing R -f "file", but what code is needed in the R file so that it can be invoked like this instead:
R -f "file" arg1 arg2

Here is an example. Save this code in test.R:
#!/usr/bin/env Rscript
# make this script executable by doing 'chmod +x test.R'
# Store the help text; cat() here would print it at once and assign NULL
help <- paste(
"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Help text here
Arguments in this order:
1) firstarg
2) secondarg
3) thirdarg
4) fourtharg
./test.R firstarg secondarg thirdarg fourtharg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
\n\n")
# Read options from command line
args = commandArgs(trailingOnly = TRUE)
if (any(c("--help", "-h", "-help", "--h") %in% args)) {
cat(help,sep="\n")
stop("\nHelp requested.")
}
print(args)
Do chmod +x test.R
Then invoke it using ./test.R a b c d. It should print: [1] "a" "b" "c" "d".
You can access each of the args by doing args[1] to get to a and args[4] to get to d.
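Because commandArgs() always returns character strings, it can help to name the positional arguments and convert them explicitly. The following is a minimal sketch, assuming a hypothetical helper parse_positional() that is not part of the answer above; a fixed vector stands in for commandArgs(trailingOnly = TRUE) so the example runs anywhere:

```r
# Hypothetical helper: name the positional arguments and fail early
# if too few were supplied.
parse_positional <- function(args, arg_names) {
  if (length(args) < length(arg_names)) {
    stop("expected ", length(arg_names), " arguments, got ", length(args))
  }
  out <- as.list(args[seq_along(arg_names)])  # values stay character strings
  names(out) <- arg_names
  out
}

# In a real script this would be commandArgs(trailingOnly = TRUE)
args <- c("a", "b", "3", "d")
parsed <- parse_positional(args, c("firstarg", "secondarg", "thirdarg", "fourtharg"))

first <- parsed$firstarg              # the string "a"
third <- as.numeric(parsed$thirdarg)  # the number 3, converted from "3"
print(first)
print(third)
```

Failing early on a short argument list tends to give a clearer message than an NA showing up later in the script.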

The suggestion to use Rscript does seem useful but possibly not what is being asked. One can also start R from the command line with an input file that gets sourced. The R interpreter can access the commandArgs in that mode as well. This is a minimal "ptest.R" file in my user directory that is also my default working directory:
ca <- commandArgs()
print(ca)
From a Unix command line I can do:
$ r -f ~/ptest.r --args "test of args"
And R opens, displays the usual startup messages and announces packages loaded by .Rprofile and then:
> ca <- commandArgs()
> print(ca)
[1] "/Library/Frameworks/R.framework/Resources/bin/exec/R"
[2] "-f"
[3] "/Users/davidwinsemius/ptest.r"
[4] "--args"
[5] "test of args"
>
>
And then quits.
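In the full vector shown above, the user-supplied values are whatever follows "--args"; commandArgs(trailingOnly = TRUE) returns just that part. As an illustration of what that selection amounts to, here is a sketch with a hypothetical helper after_args_flag() applied to the vector from the session above, typed in by hand:

```r
# Hypothetical helper: return only what follows "--args" in a full
# commandArgs() vector.
after_args_flag <- function(full) {
  pos <- match("--args", full)          # index of the first "--args", or NA
  if (is.na(pos)) return(character(0))  # no user arguments supplied
  full[-seq_len(pos)]                   # drop everything up to and including it
}

# The vector printed in the session above
full <- c("/Library/Frameworks/R.framework/Resources/bin/exec/R",
          "-f", "/Users/davidwinsemius/ptest.r", "--args", "test of args")

user_args <- after_args_flag(full)
print(user_args)
```

In an actual script, calling commandArgs(trailingOnly = TRUE) directly is simpler; the helper only makes the mechanism visible.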

Related

No stdout from this Rscript command in Powershell

I'm trying to write the equivalent of this Bash command in Windows:
(tee <<"EOF"
<some-markdown-and-html>
EOF
) | Rscript -e '
input <- file("stdin", "r")
content <- readLines(input)
print(content)
'
I am quite unfamiliar with Windows and thought PowerShell seemed like a good choice, but I have a strange issue writing text to stdout. This command works:
(@"
<some-markdown-and-html>
"@) | Rscript -e @"
print(readLines(file('stdin', 'r')))
"@
has the desired output:
[1] "<some-markdown-and-html>"
However, when I break the R script into multiple lines, it outputs nothing:
(@"
<some-markdown-and-html>
"@) | Rscript -e @"
input <- file('stdin', 'r')
content <- readLines(input)
print(content)
"@
How can I print to stdout with a multiline R script? Doesn't need to be Powershell, just Windows compatible (ideally without installing WSL). Thanks

Parallelizing an Rscript using a job array in Slurm

I want to run an Rscript.R using an array job in Slurm with tasks 1-10, where the task id of each job is passed to the R script, which writes a file named "'task id'.out" containing 'task id' in its body. However, this has proven more challenging than I anticipated. I am trying the following:
~/bash_test.sh looks like:
#!/bin/bash -l
#SBATCH --time=00:01:00
#SBATCH --array=1-10
conda activate R
cd ~/test
R CMD BATCH --no-save --no-restore ~/Rscript_test.R $SLURM_ARRAY_TASK_ID
~/Rscript_test.R looks like:
#!/usr/bin/env Rscript
taskid = commandArgs(trailingOnly=TRUE)
# taskid <- Sys.getenv('SLURM_ARRAY_TASK_ID')
taskid <- as.data.frame(taskid)
# print task number
print(paste0("the number processed was... ", taskid))
write.table(taskid, paste0("~/test/",taskid,".out"),quote=FALSE, row.names=FALSE, col.names=FALSE)
After I submit my job (sbatch bash_test.sh), it looks like R is not really seeing SLURM_ARRAY_TASK_ID. The script is generating 10 files (1, 2, ..., 10 - just numbers - probably corresponding to the task ids), but it's not writing the files with the extension ".out": the script wrote an empty "integer(0).out" file.
What I wanted, was to populate the folder ~/test/ with 10 files, 1.out, 2.out, ..., 10.out, and each file has to contain the task id inside (simply the number 1, 2, ..., or 10, respectively).
P.S.: Note that I tried playing with Sys.getenv() too, but I don't think I was able to set that up properly. That option generates 10 files, and one 1.out file, containing number 10.
P.S.2: This is Slurm 19.05.5. I am running R within a conda environment.
You should avoid using "R CMD BATCH". It doesn't handle arguments the way most command-line tools do. "Rscript" has been the recommended option for a while now. By calling "R CMD BATCH" you are basically ignoring the "#!/usr/bin/env Rscript" part of your script.
So change your script file to
#!/bin/bash -l
#SBATCH --time=00:01:00
#SBATCH --array=1-10
conda activate R
cd ~/test
Rscript ~/Rscript_test.R $SLURM_ARRAY_TASK_ID
And then be careful in your script that you aren't using the same variable as both a string and a data.frame. You can't easily paste a data.frame into a file path, for example. So
taskid <- commandArgs(trailingOnly=TRUE)
# taskid <- Sys.getenv('SLURM_ARRAY_TASK_ID') # This should also work
print(paste0("the number processed was... ", taskid))
outdata <- as.data.frame(taskid)
outfile <- paste0("~/test/", taskid, ".out")
write.table(outdata, outfile, quote=FALSE, row.names=FALSE, col.names=FALSE)
The extra files with just the array number were created because the usage of R CMD BATCH is
R CMD BATCH [options] infile [outfile]
So the $SLURM_ARRAY_TASK_ID value you were passing at the command line was treated as the outfile name. Instead that value needed to be passed as options. But again, it's better to use Rscript which has more standard argument conventions.
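Both ways of obtaining the task id mentioned above (the command-line argument and Sys.getenv) can be combined so the script works either way. This is a sketch only: get_task_id() is a hypothetical helper, and tempdir() stands in for ~/test so the example can be run outside a Slurm job:

```r
# Hypothetical helper: take the task id from the first command-line
# argument, falling back to the SLURM_ARRAY_TASK_ID environment variable.
get_task_id <- function(args = commandArgs(trailingOnly = TRUE)) {
  id <- if (length(args) >= 1) args[1] else Sys.getenv("SLURM_ARRAY_TASK_ID")
  if (!nzchar(id)) stop("no task id given and SLURM_ARRAY_TASK_ID is not set")
  id  # kept as a character string so it pastes cleanly into a file name
}

# Simulate what `Rscript ~/Rscript_test.R 7` would supply
taskid  <- get_task_id("7")
outfile <- file.path(tempdir(), paste0(taskid, ".out"))  # tempdir() stands in for ~/test
writeLines(taskid, outfile)

contents <- readLines(outfile)
print(basename(outfile))
print(contents)
```

Keeping taskid as a plain string until the file name is built avoids the data.frame-in-a-path problem described above.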

R, STDIN to Rscript

I am trying to make an R script - test.R - that can take either a file or a text string directly from a pipe in unix as in either:
file | test.R
or:
cat Sometext | test.R
Tried to follow answers here and here but I am clearly missing something. Is it the piping above or my script below that gives me an error like:
me@lnx: cat AAAA | test.R
bash: test.R: command not found
cat: AAAA: No such file or directory
My test script:
#!/usr/bin/env Rscript
input <- file("stdin", "r")
x <- readLines(input)
write(x, "")
UPDATE.
The script:
#!/usr/bin/env Rscript
con <- file("stdin")
open(con, blocking=TRUE)
x <- readLines(con)
x <- somefunction(x) #Do something or nothing with x
write(x,"")
close(con)
Then both cat file | ./test.R and echo AAAA | ./test.R yield the expected.
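The read-transform-write pattern in the updated script can be exercised without a pipe by swapping the connection. In the sketch below, process_lines() is a hypothetical helper and toupper is an arbitrary stand-in for somefunction; a textConnection plays the role of file("stdin"):

```r
# Hypothetical helper: read every line from a connection and apply a
# transformation to the resulting character vector.
process_lines <- function(con, transform = toupper) {
  x <- readLines(con)
  transform(x)
}

# In the real script: con <- file("stdin"); open(con, blocking = TRUE)
con <- textConnection(c("aaaa", "bbbb"))
out <- process_lines(con)
close(con)

writeLines(out)
```

Note that in a script, file("stdin") refers to the process's standard input, whereas stdin() refers to the R console, which is why the script above uses the former.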
I still like r over Rscript here (but then I am not unbiased in this ...)
edd@rob:~$ (echo "Hello,World";echo "Bye,Bye") | r -e 'X <- readLines(stdin());print(X)' -
Hello,World
Bye,Bye
[1] "Hello,World" "Bye,Bye"
edd@rob:~$
r can also read.csv() directly:
edd@rob:~$ (echo "X,Y"; echo "Hello,World"; echo "Bye,Bye") | r -d -e 'print(X)' -
X Y
1 Hello World
2 Bye Bye
edd@rob:~$
The -d is essentially a predefined 'read stdin into X via read.csv' which I think I borrowed as an idea from rio or another package.
Edit: Your example works with small changes:
Make it executable: chmod 0755 ex.R
Pipe output in correctly, ie use echo not cat
Use the ./ex.R notation for a file in the current dir
I changed it to use print(x)
Then:
edd@rob:~$ echo AAA | ./ex.R
[1] "AAA"
edd@rob:~$
I generally use R from a terminal application (Bash shell). I have only done a few experiments with Rscript, but the #! line is treated as a comment when the script is run inside R, while letting the file be run directly as an executable through Rscript. I have to use chmod to set the executable flag on my test file. Your call to write() should print the same output to the console in R or Rscript, but if I want to save my output to a file I call sink("fileName") to open the connection and sink() to close it. This generally gives me control of the output and how it is rendered. If I called my script "myScript.rs" and made it executable (chmod u+x myScript.rs), I can type something like ./myScript.rs to run it and get the output on OS X or Linux. Instead of a pipe | you might try redirection > or >> to create or append.

R or bash command line length limit

I'm developing a bash program that executes an R one-liner to convert an RMarkdown template into an HTML document.
This R one-liner looks like:
R -e 'library(rmarkdown) ; rmarkdown::render( "template.Rmd", "html_document", output_file = "report.html", output_dir = "'${OUTDIR}'", params = list( param1 = "'${PARAM1}'", param2 = "'${PARAM2}'", ... ) )
I have a long list of parameters, let's say 10 to explain the problem, and it seems that R or bash has a command line length limit.
When I execute the R one-liner with 10 parameters I obtain an error message like this:
WARNING: '-e library(rmarkdown)~+~;~+~rmarkdown::render(~+~"template.Rmd",~+~"html_document",~+~output_file~+~=~+~"report.html",~+~output_dir~+~=~+~"output/",~+~params~+~=~+~list(~+~param1~+~=~+~"param2", ...
Fatal error: you must specify '--save', '--no-save' or '--vanilla'
When I execute the R one-liner with 9 parameters it's OK (I tried different combinations to verify that the problem was not the last parameter).
When I execute the R one-liner with 10 parameters but with all spaces removed, it's OK too, so I guess that R or bash imposes a command line length limit.
R -e 'library(rmarkdown);rmarkdown::render("template.Rmd","html_document",output_file="report.html",output_dir="'${OUTDIR}'",params=list(param1="'${PARAM1}'",param2="'${PARAM2}'",...))
Is it possible to increase this limit?
This will break in a number of ways, including if your arguments have spaces or quotes in them.
Instead, try passing the values as arguments. Something like this should give you an idea how it works:
# create a script file
tee arguments.r << 'EOF'
argv <- commandArgs(trailingOnly=TRUE)
arg1 <- argv[1]
print(paste("Argument 1 was", arg1))
EOF
# set some values
param1="foo bar"
param2="baz"
# run the script with arguments
Rscript arguments.r "$param1" "$param2"
Expected output:
[1] "Argument 1 was foo bar"
Always quote your variables and always use lowercase variable names to avoid conflicts with system or application variables.
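With many parameters, positional arguments get hard to keep straight. One option is to pass them as name=value pairs and rebuild a named list on the R side, which could then be handed to the params argument of rmarkdown::render. This is a sketch under that assumption; parse_key_value() is a hypothetical helper, and the name=value convention is not something the original command used:

```r
# Hypothetical helper: turn arguments of the form name=value into a named list.
parse_key_value <- function(args) {
  # invert = TRUE returns the pieces around the first "=" only, so values
  # may themselves contain "=" or spaces
  parts  <- regmatches(args, regexpr("=", args), invert = TRUE)
  values <- lapply(parts, function(p) p[2])
  names(values) <- vapply(parts, function(p) p[1], character(1))
  values
}

# Stand-in for commandArgs(trailingOnly = TRUE) after running e.g.
#   Rscript arguments.r "param1=$param1" "param2=$param2"
args   <- c("param1=foo bar", "param2=baz")
params <- parse_key_value(args)

str(params)
```

Because each pair arrives as one quoted shell word, spaces in values survive intact, which is the failure mode the -e one-liner ran into.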

How can I suppress the line numbers output using R CMD BATCH?

If I have an R script:
print("hi")
commandArgs()
And I run it using:
r CMD BATCH --slave --no-timing test.r output.txt
The output will contain:
[1] "hi"
[1] "/Library/Frameworks/R.framework/Resources/bin/exec/x86_64/R"
[2] "-f"
[3] "test.r"
[4] "--restore"
[5] "--save"
[6] "--no-readline"
[7] "--slave"
How can I suppress the line numbers [1]..[7] in the output so only the output of the script appears?
Use cat instead of print if you want to suppress the line numbers ([1], [2], ...) in the output.
I think you are also going to want to pass command line arguments. I think the easiest way to do that is to create a file with the Rscript shebang:
For example, create a file called args.r:
#!/usr/bin/env Rscript
args <- commandArgs(TRUE)
cat(args, sep = "\n")
Make it executable with chmod +x args.r and then you can run it with ./args.r ARG1 ARG2
FWIW, passing command line parameters with the R CMD BATCH ... syntax is a pain. Here is how you do it: R CMD BATCH "--args ARG1 ARG2" args.r Note the quotes. More discussion here
UPDATE: changed shebang line above from #!/usr/bin/Rscript to #!/usr/bin/env Rscript in response to @mbq's comment (thanks!)
Yes, mbq is right -- use Rscript, or, if it floats your boat, littler:
$ cat /tmp/tommy.r
#!/usr/bin/r
cat("hello world\n")
print(argv[])
$ /tmp/tommy.r a b c
hello world
[1] "a" "b" "c"
$
You probably want to look at CRAN packages getopt and optparse for argument-parsing as you'd do in other scripting languages.
Use commandArgs(TRUE) and run your script with Rscript.
EDIT: Ok, I've misread your question. David has it right.
Stop Rscript from numbering the output from print
By default, print(...) prefixes each line of output with the index of the first element shown on that line, like this:
print("we get signal")
Produces:
[1] "we get signal"
R lets the user change the definition of functions like print, so one option is to replace it with cat:
print = cat
print("we get signal")
Produces:
we get signal
Notice that the [1] prefix and the double quotes are gone.
Get more control of print by using R first class functions:
my_print <- function(x, ...){
#extra shenanigans for when the wind blows from the east on tuesdays, go here.
cat(x)
}
print = my_print
print("we get signal")
Prints:
we get signal
If you're using print as a poor man's debugger... We're not laughing at you, we're laughing with you.
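If the goal is only to drop the [1] prefix and the quotes, cat() or writeLines() can also be called directly, which leaves print itself untouched. A small sketch comparing the three; capture.output() is used here only so the results can be inspected as strings:

```r
x <- "we get signal"

printed <- capture.output(print(x))      # index prefix and quotes
catted  <- capture.output(cat(x, "\n"))  # raw text; cat adds no newline by itself
written <- capture.output(writeLines(x)) # raw text, one element per line

print(printed)
print(written)
```

writeLines() is often the more convenient choice for character vectors, since it puts each element on its own line without needing sep = "\n".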
