Check syntax of R script without running it

After making changes to an R script, is there a way to check its syntax by running a command, before running the R script itself?

Base R has parse(), which will parse a script without running it:
parse("myscript.R")
The codetools package, which comes with R, has checkUsage and checkUsagePackage to check single functions and packages respectively.
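For instance, a minimal checkUsage() sketch (the function f and the undefined variable z are purely illustrative):
library(codetools)
f <- function(x) {
  y <- x + 1   # assigned but never used
  z * 2        # z is never defined anywhere
}
checkUsage(f, name = "f")
# reports, approximately:
#   f: local variable 'y' assigned but may not be used
#   f: no visible binding for global variable 'z'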

There is lint() from the lintr package:
lintr::lint("tets.r")
#> tets.r:1:6: style: Place a space before left parenthesis, except in a function call.
#> is.na(((NA))
#> ^
#> tets.r:1:12: error: unexpected end of input
#> is.na(((NA))
#> ^
The tested file contains only this invalid code:
is.na(((NA))
You can customise what lint() checks. By default, it is quite noisy about code style (which is the main reason I use it).
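For example, individual linters can be dropped via the linters argument; a sketch (linters_with_defaults() is the helper in recent lintr releases, older versions call it with_defaults(), and line_length_linter is just an arbitrary example of a check to disable):
lintr::lint("tets.r",
            linters = lintr::linters_with_defaults(line_length_linter = NULL))
Parse errors such as the one above are still reported; only the disabled style check is skipped.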

From within an R console/REPL session, you can use base R's built-in parse function:
$ R
> parse("hello.r")
expression(cat("Hello, World!\n"))
> parse("broken.r")
Error in parse("broken.r") : broken.r:2:0: unexpected end of input
1: cat("Hello, World!\n"
^
The parse function throws an error if it fails to parse. A limitation here is that it only detects parse errors, not, for example, references to undefined functions.
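For instance, if undefined.r contains only the single line undefined_function(42) (a call to a function that is deliberately never defined), parsing still succeeds:
> parse("undefined.r")
expression(undefined_function(42))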
Directly from the command line
You can also check parsing directly from the command line by using Rscript's -e option to call parse:
$ Rscript -e 'parse("hello.r")'
expression(cat("Hello, World!\n"))
$ Rscript -e 'parse("broken.r")'
Error in parse("broken.r") : broken.r:2:0: unexpected end of input
1: cat("Hello, World!\n"
^
Execution halted
A nice result of this is that Rscript exits successfully (exit code 0) when the parse works and with a non-zero exit code when the parse fails. So you can use your shell's || and && operators as usual, or other forms of error detection (set -e).
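For example, a possible pre-flight check (a sketch; myscript.R is a placeholder, and invisible() merely suppresses printing of the parsed expressions):
$ Rscript -e 'invisible(parse("myscript.R"))' && Rscript myscript.R
The second Rscript call only runs if the parse succeeds.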
Custom parsing script
You can also create a custom script that wraps everything nicely:
#!/bin/bash
#
# Rparse: checks whether R scripts parse successfully
#
# usage: Rparse script.r
# usage: Rparse file1.r file2.r file3.r ...
set -e
for file in "$@"
do
    Rscript -e "parse(\"$file\")"
done
Place the script in a directory on your $PATH and use it to check your R files as follows:
$ Rparse hello.r broken.r
expression(cat("Hello, World!\n"))
Error in parse("broken.r") : broken.r:2:0: unexpected end of input
1: cat("Hello, World!\n"
^
Execution halted
Here I am using bash as a reference language, but nothing prevents alternatives from being written for other shells, including Windows .bat files, or even as an R script!
(Other answers already address the question quite nicely, but I wanted to document some additional possibilities in a more complete answer.)

Related

Can I force Rscript to pass wildcard in command line arguments?

As best I can sort out (since I haven't had much luck finding documentation of it), when one runs Rscript with a command argument that includes a wildcard *, the argument is expanded to a character vector of the file paths that match, or passed through unchanged if there are no matches. Is there a way to pass the wildcard all the time, so I can handle it myself within the script (using things like Sys.glob, for example)?
Here's a minimal example, run from the terminal:
ls
## foo.csv bar.csv baz.txt
Rscript -e "print(commandArgs(T))" *.csv
## [1] "foo.csv" "bar.csv"
Rscript -e "print(commandArgs(T))" *.txt
## [1] "baz.txt"
Rscript -e "print(commandArgs(T))" *.rds
## [1] "*.rds"
EDIT: I have learned that this behavior is from bash, not Rscript. Is there some way to work around this behavior from within R, or to suppress wildcard expansion for a particular R script but not the Rscript command? In my particular case, I want to run a function with two arguments, Rscript collapse.R *.rds out.rds that concatenates the contents of many individual RDS files into a list and saves the result in out.rds. But since the wildcard gets expanded before being passed to R, I have no way of checking whether the second argument has been supplied.
If I understand correctly, you don't want bash to glob the wildcard for you, you want to pass the expression itself, e.g. *.csv. Some options include:
Pass the expression in quoted text and process it within R, either by evaluating that in another command or otherwise
Rscript -e "list.files(pattern = commandArgs(T))" "*\.csv$"
Pass just the extension and process the * within R by context
Rscript -e "list.files(pattern = paste0('*\\\\.', commandArgs(T)))" "csv$"
Through complicated and unnecessary means, disable globbing for that command: Stop shell wildcard character expansion?
Note: I've changed the argument to a regex to prevent it matching too greedily.
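For the collapse.R use case in the question, here is a minimal sketch assuming the wildcard is passed quoted so the shell leaves it alone (invoked as, say, Rscript collapse.R "*.rds" out.rds):
args <- commandArgs(trailingOnly = TRUE)
if (length(args) < 2) stop("usage: Rscript collapse.R <pattern> <outfile>")
files <- Sys.glob(args[1])                # expand the wildcard inside R
saveRDS(lapply(files, readRDS), args[2])  # concatenate into a list and save
Because the pattern arrives as a single literal argument, the script can reliably check whether the second argument was supplied.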

Piping Rscript gives error after output

I wrote a small R script to read JSON, which works fine but upon piping with
Rscript myscript.R | head
the (full, expected) output comes back with an error
Error: ignoring SIGPIPE signal
Execution halted
Oddly, I can't remove this by redirecting STDERR to /dev/null using:
Rscript myscript.R | head 2>/dev/null
The same error is given... presumably because the error arises within the Rscript command? This suggests to me that the output of the head command is all STDOUT.
Piping STDOUT to /dev/null returns only the error message
Piping STDERR to /dev/null returns only the error message...!
Piping the output to cat seems to be 'invisible' - this doesn't cause an error.
Rscript myscript.R | cat | head
Further pipe chaining is possible after the cat command but it feels like I may be ignoring something important by not addressing the error.
Is there a setting I need to use within the script to permit piping without the error? I'd like to have R scripts at the ready for small tasks as is done with the likes of Python and Perl, and it'd get annoying to always have to add a useless cat.
There is discussion of handling this error in C here, but it's not immediately clear to me how this would relate to an R script.
Edit: In response to @lll's answer, the full script in use (called 'myscript.R' above) is:
library(RJSONIO)
note.list <- c('abcdefg.json','hijklmn.json')
# unique IDs for markdown notes stored in JSON by Laverna, http://laverna.cc
for (laverna.note in note.list) {
# note.file <- path.expand(file.path('~/Dropbox/Apps/Laverna/notes',
# laverna.note))
# For the purpose of this example run the script in the same
# directory as the JSON files
note.file <- path.expand(file.path(getwd(),laverna.note))
file.conn <- file(note.file)
suppressWarnings( # warnings re: no terminating newline
cat(paste0(substr(readLines(file.conn), 2, 15)),'\n') # add said newline
)
close(file.conn)
}
Rscript myscript.R outputs
"id":"abcdefg"
"id":"hijklmn"
Rscript myscript.R | head -1 outputs
"id":"abcdefg"
Error: ignoring SIGPIPE signal
Execution halted
It's not clear to me what would be terminating 'early' here.
Edit 2: It's replicable with readLines, so I've removed the JSON-library-specific details in the example above. Script and dummy JSON gisted here.
Edit 3: It seems it may be possible to take command-line arguments including pipes and pass them to pipe() - I'll try this when I can and resolve the question.
The error is simply caused by an attempt to write to the pipe when no process is connected to the other end. In other words, head has already read what it needs and exited by the time your script writes the rest of its output to the pipe.
The command itself might not be the issue; it could be something within the script causing an early termination or race condition before reaching the pipe. Since you're getting the full output it may not be that much of a concern; however, masking the error with other CLI commands as mentioned probably isn't the best approach.
The command line solution:
R does have a couple of useful commands for dealing with instances in which you might want the interpreter to wait, or perhaps suppress any errors that would normally be output to stderr.
For command-line R, error messages written to ‘stderr’ will be sent to the terminal unless ignore.stderr = TRUE. They can be captured (in the most likely shells) by:
system("some command 2>&1", intern = TRUE)
There is also the wait argument which could help with keeping the process alive.
wait — logical (not NA) indicating whether the R interpreter should wait for the command to finish, or run it asynchronously. This will be ignored (and the interpreter will always wait) if intern = TRUE.
system("Rscript myscript.R | head 2>&1", intern = TRUE)
The above would wait, and output errors, if any are thrown.
system("Rscript myscript.R | head", intern = FALSE, ignore.stderr = TRUE)
The above won't wait, but would suppress errors, if any.
I encountered the same annoying error. It appears to be generated from within R by a function writing to STDOUT, if the R function is still running (outputting data to the pipe) when the pipe stops 'listening'.
So, errors can be suppressed by simply wrapping the R output function into try(...,silent=TRUE), or specifically this error can be handled by wrapping the R output function into a more-involved tryCatch(...,error=...) function.
Example:
Here's a script that generates an error when piped:
#! /Library/Frameworks/R.framework/Resources/bin/rscript
random_matrix=matrix(rnorm(2000),1000)
write.table(x=random_matrix,file="",sep=",",row.names=FALSE,col.names=FALSE)
Output when called from bash and piped to head:
./myScript.r | head -n 1
-1.69669866833626,-0.463199773124574
Error in write.table(x = random_matrix, file = "", sep = ",", row.names = FALSE, :
ignoring SIGPIPE signal
Execution halted
So: wrap the write.table output function into try to suppress all errors that occur during output:
try(write.table(x=random_matrix,file="",sep=",",row.names=FALSE,col.names=FALSE),silent=TRUE)
Or, more specifically, just suppress the "ignoring SIGPIPE signal" error:
tryCatch(write.table(x=random_matrix,file="",sep=",",row.names=FALSE,col.names=FALSE),
error=function(e) if(!grepl("ignoring SIGPIPE signal",e$message))stop(e) )
I could overcome this problem by using littler instead of Rscript:
r myscript.R | head

Have an Rscript read or take input from stdin

I see how to have an Rscript perform the operations I want when given a filename as an argument, e.g. if my Rscript is called script and contains:
#!/usr/bin/Rscript
path <- commandArgs()[1]
writeLines(readLines(path))
Then I can run from the bash command line:
Rscript script filename.Rmd --args dev='svg'
and successfully get the contents of filename.Rmd echoed back out to me. If, instead of passing a filename like filename.Rmd as the argument, I want to pass it text from stdin, I try modifying my script to read from stdin:
#!/usr/bin/Rscript
writeLines(file("stdin"))
but I do not know how to construct the commandline call for this case. I tried piping in the contents:
cat filename.Rmd | Rscript script --args dev='svg'
and also tried redirect:
Rscript script --args dev='svg' < filename.Rmd
and either way I get the error:
Error in writeLines(file("stdin")) : invalid 'text' argument
(I've also tried open(file("stdin"))). I'm not sure if I'm constructing the Rscript incorrectly, or the commandline argument incorrectly, or both.
You need to read text from the connection created by file("stdin") in order to pass anything useful to the text argument of writeLines(). This should work
#!/usr/bin/Rscript
writeLines(readLines(file("stdin")))
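Note that stdin() and file("stdin") are different connections: stdin() refers to R's console input, while file("stdin") is the process's standard input, which is what a pipe or redirect feeds. A slightly more explicit sketch of the same idea:
#!/usr/bin/Rscript
# Echo everything piped in, e.g.:  cat filename.Rmd | Rscript script
con <- file("stdin")
lines <- readLines(con)
close(con)
writeLines(lines)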

Executing expressions in Rscript.exe

I'd like to put some expressions that write stuff to a file directly into a call to Rscript.exe (without specifying file in Rscript [options] [-e expression] file [args] and thus without an explicit R script that is run).
Everything seems to work except the fact that the desired file is not created. What am I doing wrong?
# Works:
shell("Rscript -e print(Sys.time())")
# Works:
write(Sys.time(), file='c:/temp/systime.txt')
# No error, but no file created:
shell("Rscript -e write(Sys.time(), file='c:/temp/systime.txt')")
Rscript parses its command line using spaces as separators. If your R string contains embedded spaces, you need to wrap it within quotes to make sure it gets sent as a complete unit to the R parser.
You also don't need to use shell unless you specifically need features of cmd.exe.
system("Rscript.exe -e \"write(Sys.time(), file='c:/temp/systime.txt')\"")

making commandargs comma delimited or parsing spaces

I'm trying to run R from the command line using command line arguments. This includes passing in some file paths as arguments for use inside the script. It all works most of the time, but sometimes the paths have spaces in them and R doesn't understand.
I'm running something of the form:
R CMD BATCH --slave "--args inputfile='C:/Work/FolderWith SpaceInName/myinputfile.csv' outputfile='C:/Work/myoutputfile.csv'" RScript.r ROut.txt
And R throws out a file saying
Fatal error: cannot open file 'C:\Work\FolderWith': No such file or directory
So evidently my single quotes aren't enough to tell R to take everything inside the quotes as the argument value. I'm thinking this means I should find a way to delimit my --args using a comma, but I can't find a way to do this. I'm sure it's simple but I've not found anything in the documentation.
The current script is very basic:
ca = commandArgs(trailingOnly=TRUE)
eval(parse(text=ca))
tempdata = read.csv(inputfile)
tempdata$total = apply(tempdata[,4:18], 1, sum)
write.csv(tempdata, outputfile, row.names = FALSE)
In case it's relevant I'm using windows for this, but it seems like it's not a cmd prompt problem.
Using eval(parse()) is probably not the best and most efficient way to parse command-line arguments. I recommend using a package like optparse to do the parsing for you. Parsing command-line arguments is a solved problem; there is no need to reimplement it. I can imagine that this would solve your problems. That said, spaces in path names are a bad idea to begin with.
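For example, a minimal optparse sketch (the option names --inputfile and --outputfile and the script name myscript.r are illustrative; quoting each path on the command line keeps its spaces intact):
library(optparse)
option_list <- list(
  make_option("--inputfile",  type = "character", help = "path to the input CSV"),
  make_option("--outputfile", type = "character", help = "path to the output CSV")
)
opt <- parse_args(OptionParser(option_list = option_list))
tempdata <- read.csv(opt$inputfile)
tempdata$total <- apply(tempdata[, 4:18], 1, sum)
write.csv(tempdata, opt$outputfile, row.names = FALSE)
It would be invoked as:
Rscript myscript.r --inputfile "C:/Work/FolderWith SpaceInName/myinputfile.csv" --outputfile "C:/Work/myoutputfile.csv"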
Alternatively, you could take a very simple approach and pass the arguments like this:
R CMD BATCH --slave "--args arg1 arg2" RScript.r ROut.txt
Where you can retrieve them like:
ca = commandArgs(TRUE)
arg1 = ca[1]
arg2 = ca[2]
This avoids the eval(parse()) call, which I think is causing the issues. Finally, you could try to escape the space like this:
R CMD BATCH --slave "C:/spam\ bla"
You could also give Rscript a try, R CMD BATCH seems to be less favored than Rscript.
As an enhancement of @PaulHimestra's answer, here is how you can use Rscript:
create a launcher.bat:
echo off
C:
PATH R_PATH;%path%
cd DEMO_PATH
Rscript yourscript.R arg1 arg2
exit
with R_PATH something like C:/Program Files/R/R-version
There are many similarities with this post:
R command line passing a filename to script in arguments (Windows)
Also, this question is very OS-related; my answer applies only to Windows.
Probably what you are looking for is Rscript.exe instead of R.exe. The former has no problem with spaces: path\to\Rscript "My script.r".
One tedious thing is finding or setting the path for Rscript, and having to do it again every time one updates R.
Among the convenience scripts I have in my search path, I wrote a little facility to run RScript without bothering with paths. Just in case it may be of interest for someone:
#echo off
setlocal
::Get change to file dir par (-CD must be 1st par)
::================================================
Set CHANGEDIR="F"
If /I %1 EQU -cd (
Set CHANGEDIR="T"
SHIFT
)
::No args given
::=============
If [%1] EQU [] GoTo :USAGE
::Get R path from registry
::========================
:: may check http://code.google.com/p/batchfiles for updates on R reg keys
Call :CHECKSET hklm\software\R-core\R InstallPath
Call :CHECKSET hklm\software\wow6432Node\r-core\r InstallPath
if not defined RINSTALLPATH echo "Error: R not found" & goto:EOF
::Detect filepath when arg not starting with "-"
::==============================================
::Note the space after ARGS down here!!!
Set ARGS=
:LOOP
if [%1]==[] (GoTo :ELOOP)
Set ARGS=%ARGS% %1
::Echo [%ARGS%]
Set THIS=%~1
if [%THIS:~0,1%] NEQ [-] (Set FPATH=%~dp1)
SHIFT
GoTo :LOOP
:ELOOP
::echo %FPATH%
::Run Rscript script, changing to its path if asked
::=================================================
If %CHANGEDIR%=="T" (CD %FPATH%)
Echo "%RINSTALLPATH%\bin\Rscript.exe" %ARGS%
"%RINSTALLPATH%\bin\Rscript.exe" %ARGS%
endlocal
:: ==== Subroutines ====
GoTo :EOF
:USAGE
Echo USAGE:
Echo R [-cd] [RScriptOptions] Script [ScriptArgs]
Echo.
Echo -cd changes to script dir. Must be first par.
Echo To get RScript help on options etc.:
Echo R --help
GoTo :EOF
:CHECKSET
if not defined RINSTALLPATH for /f "tokens=2*" %%a in ('reg query %1 /v %2 2^>NUL') do set RINSTALLPATH=%%~b
GoTo :EOF
The script prints the actual Rscript invocation line before running it.
Note that there is an added argument, -cd, to change automatically to the script's directory. It is in fact not easy to guess the script path from inside R (and set it with setwd()) in order to call other scripts or read/write data files placed in the same path (or a relative one).
This (-cd) may make some of your other command args superfluous, as you may find it convenient to refer to such files directly from inside the script.
