Using Rscript for an expression with a dash

I am using Rscript to run some expressions, but I'm having an issue with some cases that start with a dash. A simple example would be:
$ Rscript -e '-1'
ERROR: option '-e' requires a non-empty argument
Adding parentheses works (Rscript -e '(-1)'), but I can't always be sure that the expressions will be properly parenthesized.
In the documentation it says
When using -e options be aware of the quoting rules in the shell used
So I tried different quoting approaches in bash, escaping the dash or using single quotes, but it still doesn't work.
$ Rscript -e "\-1"
Error: unexpected input in "\"
Execution halted
Is there something I'm missing?

You misunderstand one part here. An "expression" is something R can parse, i.e.:
$ R --slave -e '1+1'
[1] 2
$
What you hit with -1 is a corner case. You can do
$ R --slave -e 'a <- -1; a'
[1] -1
$
or
$ R --slave -e 'print(-1)'
[1] -1
$
For actual argument parsing you want a package like docopt (which I like and use a lot), getopt (which I used before), or optparse. All are on CRAN.
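Since the question notes that a parenthesized expression is accepted, one way to avoid relying on callers to add the parentheses is to add them in the shell before calling Rscript. This is only a minimal sketch: wrap_expr is a hypothetical helper name (not part of R), and it assumes the argument is a single complete R expression.

```shell
# wrap_expr EXPR: print EXPR wrapped in parentheses so the string handed
# to -e never starts with a dash. (Hypothetical helper, not part of R.)
wrap_expr() {
  printf '(%s)' "$1"
}

# Intended use (assumes Rscript is on PATH):
#   Rscript -e "$(wrap_expr '-1')"
```

The helper only changes the first character of the argument; it does not validate the expression.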

Related

Check syntax of R script without running it

After making changes to a R script, is there a way to check its syntax by running a command, before running the R script itself?
Base R has parse which will parse a script without running it.
parse("myscript.R")
The codetools package, which comes with R, has checkUsage and checkUsagePackage to check single functions and packages respectively.
There is lint() from the lintr package:
lintr::lint("tets.r")
#> tets.r:1:6: style: Place a space before left parenthesis, except in a function call.
#> is.na(((NA))
#> ^
#> tets.r:1:12: error: unexpected end of input
#> is.na(((NA))
#> ^
The tested file contains only this wrong code
is.na(((NA))
You can customise what lint() checks. By default, it is quite noisy about code style (which is the main reason I use it).
From within an R console/REPL session, you can use base R's built-in parse function:
$ R
> parse("hello.r")
expression(cat("Hello, World!\n"))
> parse("broken.r")
Error in parse("broken.r") : broken.r:2:0: unexpected end of input
1: cat("Hello, World!\n"
^
The parse function throws an error in case it fails to parse. A limitation here is that it will only detect parse errors, but not for example references to undefined functions.
Directly from the command line
You can also check parsing directly from the command line by using Rscript's -e option to call parse:
$ Rscript -e 'parse("hello.r")'
expression(cat("Hello, World!\n"))
$ Rscript -e 'parse("broken.r")'
Error in parse("broken.r") : broken.r:2:0: unexpected end of input
1: cat("Hello, World!\n"
^
Execution halted
A nice result of this is that Rscript returns successfully (exit code 0) when the parse works and returns a non-zero exit code when the parse fails. So you can use your shell's || and && operators as usual, or other forms of error detection (set -e).
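As a rough illustration of branching on that exit status, here is a small sketch. run_check is a hypothetical helper name; it simply runs whatever command it is given and reports based on the command's exit status.

```shell
# run_check CMD [ARGS...]: run a command quietly and report on its exit
# status. (Hypothetical helper for illustration.)
run_check() {
  if "$@" >/dev/null 2>&1; then
    echo "OK"
  else
    echo "FAILED"
  fi
}

# Intended use (assumes Rscript is on PATH and the files exist):
#   run_check Rscript -e 'parse("hello.r")'
#   Rscript -e 'parse("hello.r")' >/dev/null && echo "safe to run"
```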
Custom parsing script
You can also create a custom script that wraps everything nicely:
#!/bin/bash
#
# Rparse: checks whether R scripts parse successfully
#
# usage: Rparse script.r
# usage: Rparse file1.r file2.r file3.r ...
set -e
# "$@" expands to every file name given on the command line
for file in "$@"
do
    Rscript -e "parse(\"$file\")"
done
Place the script in a directory listed in your $PATH variable, make it executable, and use it to check your R files as follows:
$ Rparse hello.r broken.r
expression(cat("Hello, World!\n"))
Error in parse("broken.r") : broken.r:2:0: unexpected end of input
1: cat("Hello, World!\n"
^
Execution halted
Here I am using bash as a reference language, but nothing prevents building alternatives for other shells, including Windows' ".bat" files or even an R script!
(Other answers already address the question quite nicely, but I wanted to document some additional possibilities in a more complete answer.)
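Because the wrapper above uses set -e, it stops at the first file that fails to parse. If you would rather check every file and fail only at the end, something along these lines might work. It is a sketch under assumptions: check_files and r_parse are hypothetical names, and the checker is passed in as a command so the loop itself does not depend on R.

```shell
# check_files CHECKER FILE...: run CHECKER on every file, keep going after
# a failure, and return non-zero if any file failed. (Hypothetical helper.)
check_files() {
  checker=$1
  shift
  failed=0
  for file in "$@"; do
    if ! "$checker" "$file"; then
      echo "parse failed: $file" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

# A checker built on the command used above (assumes Rscript is on PATH).
r_parse() {
  Rscript -e "parse(\"$1\")" >/dev/null
}

# Intended use:
#   check_files r_parse hello.r broken.r
```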

Can I force Rscript to pass wildcard in command line arguments?

As best as I can sort out (since I haven't had much luck finding documentation of it) when one runs Rscript with a command argument that includes a wildcard *, the argument is expanded to a character vector of the filepaths that match, or passed if there are no matches. Is there a way to pass the wildcard all the time, so I can handle it myself within the script (using things like Sys.glob, for example)?
Here's a minimal example, run from the terminal:
ls
## foo.csv bar.csv baz.txt
Rscript -e "print(commandArgs(T))" *.csv
## [1] "foo.csv" "bar.csv"
Rscript -e "print(commandArgs(T))" *.txt
## [1] "baz.txt"
Rscript -e "print(commandArgs(T))" *.rds
## [1] "*.rds"
EDIT: I have learned that this behavior is from bash, not Rscript. Is there some way to work around this behavior from within R, or to suppress wildcard expansion for a particular R script but not the Rscript command? In my particular case, I want to run a function with two arguments, Rscript collapse.R *.rds out.rds that concatenates the contents of many individual RDS files into a list and saves the result in out.rds. But since the wildcard gets expanded before being passed to R, I have no way of checking whether the second argument has been supplied.
If I understand correctly, you don't want bash to glob the wildcard for you, you want to pass the expression itself, e.g. *.csv. Some options include:
Pass the expression in quoted text and process it within R, either by evaluating that in another command or otherwise
Rscript -e "list.files(pattern = commandArgs(T))" "*\.csv$"
Pass just the extension and process the * within R by context
Rscript -e "list.files(pattern = paste0('*\\\\.', commandArgs(T)))" "csv$"
Through complicated and unnecessary means, disable globbing for that command: Stop shell wildcard character expansion?
Note: I've changed the argument to a regex to prevent it matching too greedily.
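The difference between a quoted and an unquoted pattern can be seen without R at all. The sketch below uses a hypothetical helper, show_args, and a throwaway directory to show what a program receives in each case.

```shell
# show_args ARGS...: print how many arguments were received, then the
# arguments themselves. (Hypothetical helper for illustration.)
show_args() {
  printf '%s\n' "$#: $*"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/foo.csv" "$demo_dir/bar.csv" "$demo_dir/baz.txt"

# Unquoted: the shell expands the pattern before the program starts.
unquoted=$(cd "$demo_dir" && show_args *.csv)
# Quoted: the program receives the literal pattern as one argument.
quoted=$(cd "$demo_dir" && show_args '*.csv')

echo "$unquoted"   # 2: bar.csv foo.csv
echo "$quoted"     # 1: *.csv

rm -r "$demo_dir"
```

Applied to the case in the question, calling Rscript collapse.R '*.rds' out.rds would hand the script exactly two arguments, and the pattern could then be expanded inside R (for example with Sys.glob, as the question suggests).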

Have an Rscript read or take input from stdin

I see how to have an Rscript perform the operations I want when given a filename as an argument, e.g. if my Rscript is called script and contains:
#!/usr/bin/Rscript
# first user-supplied argument (the file name)
path <- commandArgs(trailingOnly = TRUE)[1]
writeLines(readLines(path))
Then I can run from the bash command line:
Rscript script filename.Rmd --args dev='svg'
and successfully get the contents of filename.Rmd echoed back out to me. If instead of passing the above argument a filename like filename.Rmd I want to pass it text from stdin, I try modifying my script to read from stdin:
#!/usr/bin/Rscript
writeLines(file("stdin"))
but I do not know how to construct the commandline call for this case. I tried piping in the contents:
cat filename.Rmd | Rscript script --args dev='svg'
and also tried redirect:
Rscript script --args dev='svg' < filename.Rmd
and either way I get the error:
Error in writeLines(file("stdin")) : invalid 'text' argument
(I've also tried open(file("stdin"))). I'm not sure if I'm constructing the Rscript incorrectly, or the commandline argument incorrectly, or both.
You need to read text from the connection created by file("stdin") in order to pass anything useful to the text argument of writeLines(). This should work
#!/usr/bin/Rscript
writeLines(readLines(file("stdin")))
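Both invocations tried in the question do deliver the file on standard input, so the command line was not the problem. The sketch below illustrates this with a hypothetical shell function, echo_stdin, standing in for the corrected R script (it is not R code, only a stand-in that copies stdin to stdout and ignores its arguments).

```shell
# echo_stdin: stand-in for the corrected script; copies stdin to stdout.
# (Hypothetical; the real script would be run with Rscript.)
echo_stdin() {
  cat
}

sample=$(mktemp)
printf 'line one\nline two\n' > "$sample"

# Pipe form and redirect form both feed the same bytes to stdin.
piped=$(cat "$sample" | echo_stdin --args dev='svg')
redirected=$(echo_stdin --args dev='svg' < "$sample")

echo "$piped"
rm -f "$sample"
```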

Makefile with SHELL=/usr/bin/R : handling multilines

I'm playing with R and GNU Make (4.0; the code below won't work with <=3.81) and I'd like to use R instead of a classical shell. I wrote the following code:
.PHONY: all clean
SHELL = /usr/bin/R
.SHELLFLAGS= --vanilla --no-readline --quiet -e
.ONESHELL:
UCSC=http://hgdownload.cse.ucsc.edu/goldenpath/hg17/database/
all: chr1_gold.txt.gz
	gold <- read.delim(gzfile("$<"))
	head(gold)
chr1_gold.txt.gz:
	download.file("${UCSC}/$@","$@")
clean:
	$(foreach F,chr1_gold.txt.gz,file.remove("$F");)
the target chr1_gold.txt.gz works fine but not the target "all" because there is more than one line:
$ /make-4.0/make
download.file("http://hgdownload.cse.ucsc.edu/goldenpath/hg17/database//chr1_gold.txt.gz","chr1_gold.txt.gz")
> download.file("http://hgdownload.cse.ucsc.edu/goldenpath/hg17/database//chr1_gold.txt.gz","chr1_gold.txt.gz")
trying URL 'http://hgdownload.cse.ucsc.edu/goldenpath/hg17/database//chr1_gold.txt.gz'
Content type 'application/x-gzip' length 45866 bytes (44 Kb)
opened URL
==================================================
downloaded 44 Kb
>
>
gold <- read.delim(gzfile("chr1_gold.txt.gz"))
head(gold)
ARGUMENT 'head(gold)' __ignored__
> gold <- read.delim(gzfile("chr1_gold.txt.gz"));\
Error: unexpected input in "\"
Execution halted
Makefile:9: recipe for target 'all' failed
make: *** [all] Error 1
I tried adding a backslash and a semicolon, but that doesn't work. How can I fix this? Can I tell make to pipe a file to the SHELL instead of using an argument (-e string)?
EDIT:
with
all: chr1_gold.txt.gz
gold <- read.delim(gzfile("$<")) \
head(gold)
.
read.delim(gzfile("chr1_gold.txt.gz")) \
head(gold)
ARGUMENT 'head(gold)' __ignored__
> gold <- read.delim(gzfile("chr1_gold.txt.gz")) \
Error: unexpected input in "gold <- read.delim(gzfile("chr1_gold.txt.gz")) \"
Execution halted
with ';'
all: chr1_gold.txt.gz
gold <- read.delim(gzfile("$<")) ;
head(gold)
.
gold <- read.delim(gzfile("chr1_gold.txt.gz")) ;
head(gold)
ARGUMENT 'head(gold)' __ignored__
> gold <- read.delim(gzfile("chr1_gold.txt.gz")) ;
>
>
with ';\'
all: chr1_gold.txt.gz
gold <- read.delim(gzfile("$<")) ;\
head(gold)
.
ARGUMENT 'head(gold)' __ignored__
> gold <- read.delim(gzfile("chr1_gold.txt.gz")) ;\
Error: unexpected input in "\"
Execution halted
Makefile:9: recipe for target 'all' failed
It looks to me like this is a problem with R's -e option: it appears that unlike the shell's -c option, R's -e will accept only a single command and ignores embedded newlines (as you suspected). Unfortunately there's no option in GNU make to have it automatically write a temporary file and send that to the SHELL. The logistics here are somewhat daunting: how would you specify the name of the file in the shell command? Or what if you wanted to pipe via stdin? Etc. It could be done for sure, but requires some careful consideration of the design.
Currently GNU make requires that the interpreter used for SHELL be able to accept a multi-line script provided on the command line; that's just the way it is.
The most straightforward way to work with R that I can think of is to put the recipe into a variable using define/enddef to preserve newlines, then use the new $(file ...) function to write it to a file and call R with the name of that file. You can make this somewhat cleaner with a user-defined variable, but you'll probably have to go back to using /bin/sh as the SHELL.
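With /bin/sh back as the SHELL, the "write the code to a file, then hand R the file name" step could also be done in the recipe's shell code itself. The following is a sketch under assumptions, not a tested Makefile recipe: run_via_file is a hypothetical helper that saves its stdin to a temporary file and appends that file's path to the command it is given.

```shell
# run_via_file CMD [ARGS...]: save stdin to a temporary file, run
# CMD ARGS... with the file's path appended, then remove the file.
# (Hypothetical helper for illustration.)
run_via_file() {
  script=$(mktemp) || return 1
  cat > "$script"
  status=0
  "$@" "$script" || status=$?
  rm -f "$script"
  return "$status"
}

# Intended use (assumes R is on PATH; -f FILE makes R read its input
# from FILE):
#   run_via_file R --vanilla --quiet -f <<'EOF'
#   gold <- read.delim(gzfile("chr1_gold.txt.gz"))
#   head(gold)
#   EOF
```

Because the R code travels through a file rather than a single -e argument, embedded newlines are no longer an issue.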
I think an alternative is to use "littler"
For example:
.PHONY: all
SHELL = /usr/bin/r
.SHELLFLAGS= -e
.ONESHELL:
.SILENT: all
all:
	x <- rnorm(10)
	cat(sd(x), "\n")

Unable to get a system variable to work for manuals

I have the following system variable in .zshrc
manuals='/usr/share/man/man<1-9>'
I run unsuccessfully
zgrep -c compinit $manuals/zsh*
I get
zsh: no matches found: /usr/share/man/man<1-9>/zsh*
The command should be the same as the following command which works
zgrep -c compinit /usr/share/man/man<1-9>/zsh*
How can you run the above command with a system variable in Zsh?
Try:
$> manuals=/usr/share/man/man<0-9>
$> zgrep -c compinit ${~manuals}/zsh*
The '~' tells zsh to perform expansion of the <0-9> when using the variable. The zsh reference card tells you how to do this and more.
From my investigations, it looks like zsh performs <> substitution before $ substitution. That means when you use the $ variant, it first tries <> substitution (nothing there) then $ substitution (which works), and you're left with the string containing the <> characters.
When you don't use $manuals, it first tries <> substitution and it works. It's a matter of order. The transcript below shows all three cases; the final command shows how to defer expansion so that both happen at the same time:
> manuals='/usr/share/man/man<1-9>'
> echo $manuals
/usr/share/man/man<1-9>
> echo /usr/share/man/man<1-9>
/usr/share/man/man1 /usr/share/man/man2 /usr/share/man/man3
/usr/share/man/man4 /usr/share/man/man5 /usr/share/man/man6
/usr/share/man/man7 /usr/share/man/man8
> echo $~manuals
/usr/share/man/man1 /usr/share/man/man2 /usr/share/man/man3
/usr/share/man/man4 /usr/share/man/man5 /usr/share/man/man6
/usr/share/man/man7 /usr/share/man/man8

Resources