Piping Rscript gives error after output

I wrote a small R script to read JSON, which works fine but upon piping with
Rscript myscript.R | head
the (full, expected) output comes back with an error
Error: ignoring SIGPIPE signal
Execution halted
Oddly, I can't remove this by redirecting STDERR to /dev/null:
Rscript myscript.R | head 2>/dev/null
The same error is given, presumably because the error arises within the Rscript command rather than head? The suggestion to me is that the output of the head command is all STDOUT.
Piping STDOUT to /dev/null returns only the error message
Piping STDERR to /dev/null returns only the error message...!
Piping the output to cat seems to be 'invisible' - this doesn't cause an error.
Rscript myscript.R | cat | head
Further pipe chaining is possible after the cat command but it feels like I may be ignoring something important by not addressing the error.
Is there a setting I need to use within the script to permit piping without the error? I'd like to have R scripts at the ready for small tasks as is done with the likes of Python and Perl, and it'd get annoying to always have to add a useless cat.
There is discussion of handling this error in C here, but it's not immediately clear to me how this would relate to an R script.
Edit: In response to @lll's answer, the full script in use (called 'myscript.R' above) is:
library(RJSONIO)
note.list <- c('abcdefg.json', 'hijklmn.json')
# unique IDs for markdown notes stored in JSON by Laverna, http://laverna.cc
for (laverna.note in note.list) {
  # note.file <- path.expand(file.path('~/Dropbox/Apps/Laverna/notes',
  #                                    laverna.note))
  # For the purpose of this example run the script in the same
  # directory as the JSON files
  note.file <- path.expand(file.path(getwd(), laverna.note))
  file.conn <- file(note.file)
  suppressWarnings(  # warnings re: no terminating newline
    cat(paste0(substr(readLines(file.conn), 2, 15)), '\n')  # add said newline
  )
  close(file.conn)
}
Rscript myscript.R outputs
"id":"abcdefg"
"id":"hijklmn"
Rscript myscript.R | head -1 outputs
"id":"abcdefg"
Error: ignoring SIGPIPE signal
Execution halted
It's not clear to me what would be terminating 'early' here.
Edit 2: It's replicable with readLines, so I've removed the JSON-library-specific details in the example above. Script and dummy JSON gisted here.
Edit 3: It seems it may be possible to take command-line arguments, including pipes, and pass them to pipe(); I'll try this when I can and resolve the question.

The error is caused by an attempt to write to the pipe when no process is connected to the other end: head has already taken the lines it needs and exited by the time your script writes the rest of its output.
The command itself might not be the issue; it is the reader on the other end of the pipe terminating early. Since you're getting the full output it may not be much of a concern; however, masking the error with other CLI commands as mentioned probably isn't the best approach.
The command line solution:
R's system() does have a couple of useful arguments for cases in which you might want the interpreter to wait, or to suppress errors that would normally be written to stderr.
For command-line R, error messages written to ‘stderr’ will be sent to
the terminal unless ignore.stderr = TRUE. They can be captured (in the
most likely shells) by:
system("some command 2>&1", intern = TRUE)
There is also the wait argument which could help with keeping the process alive.
wait — logical (not NA) indicating whether the R interpreter should
wait for the command to finish, or run it asynchronously. This will be
ignored (and the interpreter will always wait)
if intern = TRUE.
system("Rscript myscript.R | head 2>&1", intern = TRUE)
The above would wait, and output errors, if any are thrown.
system("Rscript myscript.R | head", intern = FALSE, ignore.stderr = TRUE)
The above won't wait, but would suppress errors, if any.

I encountered the same annoying error. It appears to be generated from within R by a function writing to STDOUT when the pipe has stopped 'listening', i.e. when the R function is still outputting data to the pipe after the process on the other end has exited.
So, errors can be suppressed by simply wrapping the R output function in try(..., silent = TRUE), or this specific error can be handled by wrapping the R output function in a more involved tryCatch(..., error = ...) call.
Example:
Here's a script that generates an error when piped:
#! /Library/Frameworks/R.framework/Resources/bin/rscript
random_matrix <- matrix(rnorm(2000), 1000)
write.table(x = random_matrix, file = "", sep = ",", row.names = FALSE, col.names = FALSE)
Output when called from bash and piped to head:
./myScript.r | head -n 1
-1.69669866833626,-0.463199773124574
Error in write.table(x = random_matrix, file = "", sep = ",", row.names = FALSE, :
ignoring SIGPIPE signal
Execution halted
So: wrap the write.table output call in try to suppress all errors that occur during output:
try(write.table(x = random_matrix, file = "", sep = ",", row.names = FALSE, col.names = FALSE), silent = TRUE)
Or, more specifically, handle just the "ignoring SIGPIPE signal" error:
tryCatch(
    write.table(x = random_matrix, file = "", sep = ",", row.names = FALSE, col.names = FALSE),
    error = function(e) if (!grepl("ignoring SIGPIPE signal", e$message)) stop(e)
)
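Applied to the script from the original question, the same pattern would guard the cat() call inside the loop (a sketch; the error-message check is the same one used above):
tryCatch(
    cat(paste0(substr(readLines(file.conn), 2, 15)), '\n'),  # the write that can hit a closed pipe
    error = function(e) if (!grepl("ignoring SIGPIPE signal", e$message)) stop(e)
)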

I could overcome this problem by using littler instead of Rscript:
r myscript.R | head

Related

Check syntax of R script without running it

After making changes to an R script, is there a way to check its syntax by running a command, before running the script itself?
Base R has parse which will parse a script without running it.
parse("myscript.R")
The codetools package, which comes with R, has checkUsage and checkUsagePackage to check single functions and packages respectively.
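For instance, checkUsage can flag problems that parse() alone misses, such as references to undefined variables (a minimal sketch; the function f is just an illustration):
library(codetools)
f <- function(x) {
    y <- x + 1  # assigned but never used
    z           # never defined
}
checkUsage(f)
# reports, roughly:
# <anonymous>: local variable 'y' assigned but may not be used
# <anonymous>: no visible binding for global variable 'z'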
There is lint() from the lintr package:
lintr::lint("tets.r")
#> tets.r:1:6: style: Place a space before left parenthesis, except in a function call.
#> is.na(((NA))
#> ^
#> tets.r:1:12: error: unexpected end of input
#> is.na(((NA))
#> ^
The tested file, tets.r, contains only this broken code:
is.na(((NA))
You can customise what lint() checks. By default, it is quite noisy about code style (which is the main reason I use it).
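For example, to drop one of the default style checks (a sketch; linters_with_defaults() is the helper in recent lintr versions, older releases used with_defaults() instead):
lintr::lint("tets.r",
            linters = lintr::linters_with_defaults(line_length_linter = NULL))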
From within an R console/REPL session, you can use base R's built-in parse() function:
$ R
> parse("hello.r")
expression(cat("Hello, World!\n"))
> parse("broken.r")
Error in parse("broken.r") : broken.r:2:0: unexpected end of input
1: cat("Hello, World!\n"
^
The parse function throws an error if it fails to parse. A limitation is that it only detects parse errors, not, for example, references to undefined functions.
Directly from the command line
You can also check parsing directly from the command line by using Rscript's -e option to call parse:
$ Rscript -e 'parse("hello.r")'
expression(cat("Hello, World!\n"))
$ Rscript -e 'parse("broken.r")'
Error in parse("broken.r") : broken.r:2:0: unexpected end of input
1: cat("Hello, World!\n"
^
Execution halted
A nice result of this is that Rscript exits with status 0 when the parse works and with a non-zero status when the parse fails, so you can use your shell's && and || operators as usual, or other forms of error detection (set -e).
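For example, the following only runs the script when it parses cleanly:
Rscript -e 'parse("myscript.R")' && Rscript myscript.R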
Custom parsing script
You can also create a custom script that wraps everything nicely:
#!/bin/bash
#
# Rparse: checks whether R scripts parse successfully
#
# usage: Rparse script.r
# usage: Rparse file1.r file2.r file3.r ...
set -e
for file in "$@"
do
    Rscript -e "parse(\"$file\")"
done
Place the script in a folder on your $PATH and use it to check your R files as follows:
$ Rparse hello.r broken.r
expression(cat("Hello, World!\n"))
Error in parse("broken.r") : broken.r:2:0: unexpected end of input
1: cat("Hello, World!\n"
^
Execution halted
Here I am using bash as a reference language, but there's nothing preventing alternatives from being built for other shells, including Windows .bat files, or even an R script!
(Other answers already address the question quite nicely, but I wanted to document some additional possibilities in a more complete answer.)

How to parallelize the nested for loops in bash calling the R script

Is it possible to parallelize the following code?
for word in $(cat FileNames.txt)
do
for i in {1..22}
do
Rscript assoc_test.R...........
done >> log.txt
done
I have been trying to parallelize it but have not had any luck so far. I have tried putting () around the Rscript assoc_test.R........... call followed by &, but it does not give the results and the log file turns out empty. Any suggestions/help would be appreciated. TIA.
You can change your script to output the commands to run, and feed the results into GNU parallel:
for word in $(cat FileNames.txt)
do
for i in {1..22}
do
echo Rscript assoc_test.R........... \> log.$word.$i
done
done | parallel -j 4
Some details:
parallel -j 4 will keep 4 jobs running at a time - replace 4 by the number of CPUs you want to use.
Notice I redirect the output to log.$word.$i, escaping the redirection operator as \>. You'll need to test and make sure it works, but the point is that since you're going parallel, you don't want to jumble all your outputs together.
Make sure you escape anything else the echo might interpret. The output should be valid command lines that parallel can run.
As an alternative to parallel, you can also use xargs -i. See this question for more information.
GNU Parallel is made for replacing loops, so the double loop can be replaced by:
parallel Rscript assoc_test.R... \> log.{1}.{2} :::: FileNames.txt ::: {1..22} > log.txt
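If in doubt about the quoting or the generated log file names, GNU parallel's --dry-run option prints the command lines it would run without executing them:
parallel --dry-run Rscript assoc_test.R... \> log.{1}.{2} :::: FileNames.txt ::: {1..22}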

R - How to execute PowerShell cmds with system() or system2()

I'm working in R (on Windows), attempting to count the number of words in a text file without loading the file into memory. The idea is to get some stats on the file: size, line count, word count, etc. A call to R's system() function that uses find for the line count is not hard to come by:
How do I do a "word count" command in Windows Command Prompt
lineCount <- system(paste0('find /c /v "" ', path), intern = T)
The command that I'm trying to work with for the word count is a PowerShell command: Measure-Object. I can get the following code to run without throwing an error but it returns an incorrect count.
print(system2("Measure-Object", args = c('count_words.txt', '-Word')))
[1] 127
The file, count_words.txt, has on the order of millions of words. I also tested it on a .txt file with far fewer words:
"There are seven words in this file."
But the count again is returned as 127.
print(system2("Measure-Object", args = c('seven_words.txt', '-Word')))
[1] 127
Does system2() recognize PowerShell commands? What is the correct syntax for a call to the function when using Measure-Object? Why is it returning the same value regardless of actual word count?
The issues -- overview
So, you have two issues going on here:
You aren't telling system2() to use PowerShell
You aren't using the right PowerShell syntax
The solution
command <- "Get-Content C:/Users/User/Documents/test1.txt | Measure-Object -Word"
system2("powershell", args = command)
where you replace C:/Users/User/Documents/test1.txt with whatever the path to your file is. I created two .txt files, one with the text "There are seven words in this file." and the other with the text "But there are eight words in this file." I then ran the following in R:
command <- "Get-Content C:/Users/User/Documents/test1.txt | Measure-Object -Word"
system2("powershell", args = command)
Lines Words Characters Property
----- ----- ---------- --------
          7
command <- "Get-Content C:/Users/User/Documents/test2.txt | Measure-Object -Word"
system2("powershell", args = command)
Lines Words Characters Property
----- ----- ---------- --------
          8
More explanation
From help("system2"):
system2 invokes the OS command specified by command.
One main issue is that Measure-Object isn't a system command -- it's a PowerShell command. The system command for PowerShell is powershell, which is what you need to invoke.
Then, further, you didn't quite have the right PowerShell syntax. If you take a look at the docs, you'll see the PowerShell command you really want is
Get-Content C:/Users/User/Documents/count_words.txt | Measure-Object -Word
(check out example three on the linked documentation).
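If you want the count back in R as a number rather than a printed table, one option (a sketch, reusing the example file above) is to ask PowerShell for just the Words property and capture stdout:
command <- "(Get-Content C:/Users/User/Documents/test1.txt | Measure-Object -Word).Words"
wordCount <- as.integer(system2("powershell", args = command, stdout = TRUE))
wordCount
# [1] 7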

R or bash command line length limit

I'm developing a bash program that executes an R one-liner to convert an RMarkdown template into an HTML document.
The R one-liner looks like:
R -e 'library(rmarkdown) ; rmarkdown::render( "template.Rmd", "html_document", output_file = "report.html", output_dir = "'${OUTDIR}'", params = list( param1 = "'${PARAM1}'", param2 = "'${PARAM2}'", ... ) )'
I have a long list of parameters, say 10 to illustrate the problem, and it seems that R or bash has a command-line length limit.
When I execute the R oneliner with 10 parameters I obtain a error message like this:
WARNING: '-e library(rmarkdown)~+~;~+~rmarkdown::render(~+~"template.Rmd",~+~"html_document",~+~output_file~+~=~+~"report.html",~+~output_dir~+~=~+~"output/",~+~params~+~=~+~list(~+~param1~+~=~+~"param2", ...
Fatal error: you must specify '--save', '--no-save' or '--vanilla'
When I execute the R one-liner with 9 parameters it's OK (I tried different combinations to verify that the problem was not the last parameter).
When I execute the R one-liner with 10 parameters but with all the spaces removed, it's OK too, so I guess R or bash applies a command-line length limit:
R -e 'library(rmarkdown);rmarkdown::render("template.Rmd","html_document",output_file="report.html",output_dir="'${OUTDIR}'",params=list(param1="'${PARAM1}'",param2="'${PARAM2}'",...))'
Is it possible to increase this limit?
This will break in a number of ways, including when your arguments have spaces or quotes in them.
Instead, try passing the values as arguments. Something like this should give you an idea how it works:
# create a script file
tee arguments.r << 'EOF'
argv <- commandArgs(trailingOnly=TRUE)
arg1 <- argv[1]
print(paste("Argument 1 was", arg1))
EOF
# set some values
param1="foo bar"
param2="baz"
# run the script with arguments
Rscript arguments.r "$param1" "$param2"
Expected output:
[1] "Argument 1 was foo bar"
Always quote your variables and always use lowercase variable names to avoid conflicts with system or application variables.
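Applied to the original rmarkdown call, the same idea might look like this (a sketch; the script name render.r and the argument order are assumptions):
# render.r -- take output_dir and the params from the command line
library(rmarkdown)
argv <- commandArgs(trailingOnly = TRUE)
rmarkdown::render("template.Rmd", "html_document",
                  output_file = "report.html",
                  output_dir  = argv[1],
                  params      = list(param1 = argv[2], param2 = argv[3]))
Called as:
Rscript render.r "$OUTDIR" "$PARAM1" "$PARAM2"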

Have an Rscript read or take input from stdin

I see how to have an Rscript perform the operations I want when given a filename as an argument, e.g. if my Rscript is called script and contains:
#!/usr/bin/Rscript
path <- commandArgs()[1]
writeLines(readLines(path))
Then I can run from the bash command line:
Rscript script filename.Rmd --args dev='svg'
and successfully get the contents of filename.Rmd echoed back out to me. If, instead of a filename like filename.Rmd, I want to pass it text from stdin, I try modifying my script to read from stdin:
#!/usr/bin/Rscript
writeLines(file("stdin"))
but I do not know how to construct the command-line call for this case. I tried piping in the contents:
cat filename.Rmd | Rscript script --args dev='svg'
and also tried redirect:
Rscript script --args dev='svg' < filename.Rmd
and either way I get the error:
Error in writeLines(file("stdin")) : invalid 'text' argument
(I've also tried open(file("stdin"))). I'm not sure if I'm constructing the Rscript incorrectly, or the commandline argument incorrectly, or both.
You need to read the text from the connection created by file("stdin") in order to pass anything useful to the text argument of writeLines(). This should work:
#!/usr/bin/Rscript
writeLines(readLines(file("stdin")))
