How do I check if a command used for pipe() failed?

Say, I have written a compressor function of the form:
compress <- function(text, file){
  cmd <- paste0("gzip -c > ", shQuote(file))
  p <- pipe(cmd, open = "wb")
  on.exit(close(p))
  writeLines(text, p)
}
Now I can compress strings like this:
compress("Hello World", "test.gz")
system("zcat test.gz")
## Hello World
However, how can I check whether the gzip program called by pipe() succeeded?
E.g.
compress("Hello World", "nonexistent-dir/test.gz")
## sh: nonexistent-dir/test.gz: No such file or directory
results in an error message printed to STDERR, but I have no way of turning this into an R error. The R session just continues without saving my text.
I know that I can just check for the existence of the target file in this example. But there are many possible errors, like running out of disk space, the program not being found, a missing shared library, the program being found but its ELF interpreter missing, etc. I just can't think of any way to test for all possible errors.
I searched the help page ?pipe, but could find no hint.
As a UNIX-specific approach, I tried to trap the SIGPIPE signal, but could not find a way to do this. Stack Overflow questions regarding this topic remain unanswered as of now [1] [2].
How do I check the exit code or premature termination of a program called with pipe()?

I have not found a solution using pipe(), but the package processx has a solution. I first create the external process as a background process, obtain a connection to it and write to that connection.
Before I start to write, I check if the process is running. After all is written, I can check the exit code of the program.
library(processx)
p <- process$new("pigz", "-c", stdin = "|", stdout = "/tmp/test.gz")
if(!p$is_alive()) stop("Subprocess ended prematurely")
con <- p$get_input_connection()
# Write output
conn_write(con, "Hello World")
close(con)
p$wait()
if(p$get_exit_status() != 0) stop("Subprocess failed")
system("zcat /tmp/test.gz")

Related

Error with on.exit(close(con)) with writeLines()

I’m writing a function for writing some information to file. To offer more options for fine control I’m first opening a connection before calling writeLines().
As has been stated in many places (e.g. How and when should I use on.exit?), the function on.exit() can be very helpful for resetting settings in case unforeseen events make a function stop prematurely.
I hope you agree that writing files can certainly be considered a somewhat risky activity (e.g. running out of disk space, …).
So I was thinking of using on.exit() to make sure the connection gets closed if anything during writeLines() caused my function to stop with an error. However, when I do so I get an error right after writeLines() runs!
Does anybody have an explanation?
The workarounds I found are a) don't use on.exit() or b) add try(). But I wonder whether the second option would really help if an unforeseen event made writeLines() stall. Even checking whether the connection is still open, to avoid closing it twice, didn't help.
Surprisingly, so far I couldn't find other examples of people using on.exit() to make sure the current connection gets closed when/after writing text.
Here is an example:
txt1 <- c("ABCDEF", "abcdefgh")
fxWrite <- function(txt, fileName, opt = 0, eol = "\n") {
  con <- file(fileName)
  open(con, open = "wb")
  if(identical(opt, 1)) on.exit(try(close(con), silent = TRUE), add = TRUE)
  if(identical(opt, 2)) on.exit(close(con), add = TRUE)  # regular on.exit() call
  writeLines(as.character(txt), con = con, sep = eol)
  if(isOpen(con = con)) close(con)
}
fxWrite(txt1, "test1.txt", 0)  # no on.exit()
fxWrite(txt1, "test2.txt", 1)
fxWrite(txt1, "test3.txt", 2)  # regular on.exit()
# with opt=2 the file gets written, BUT: Error in close.connection(con) : invalid connection
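For comparison, here is a minimal sketch of the idiom that avoids the double close: register on.exit() immediately after opening and drop the manual close(), so the connection is closed exactly once no matter how the function exits.
fxWriteSafe <- function(txt, fileName, eol = "\n") {
  con <- file(fileName, open = "wb")
  on.exit(close(con), add = TRUE)  # runs on normal return and on error
  writeLines(as.character(txt), con = con, sep = eol)
}
fxWriteSafe(txt1, "test4.txt")  # no "invalid connection" error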

Suppress start-up messages when loading a library to snowfall cluster with sfLibrary

An example of the code I am running is below.
library(snowfall)
library(snow)
sfInit(parallel = TRUE, cpus = 3)
sfLibrary(raster)
Library raster loaded.
Library raster loaded in cluster.
I want to stop sfLibrary from printing the messages. I can't figure out how. Help please...
Thanks.
EDIT 1: This does not work:
suppressMessages(sfLibrary(raster))
Library raster loaded.
EDIT 2: This does not work:
suppressPackageStartupMessages(sfLibrary(raster))
Library raster loaded.
Library raster loaded in cluster.
Use the Source.
If you look at the source code for sfLibrary, specifically where it prints those messages, you'll see that it uses sfCat. Tracing that down (same file), it uses cat.
I know of two ways to prevent cat from dumping onto the console: capture.output and sink.
capture.output: "evaluates its arguments with the output being returned as a character string or sent to a file".
cat("quux4\n")
# quux4
invisible(capture.output(cat("quux5\n")))
cat("quux6\n")
# quux6
Since capture.output returns the captured output visibly as a character vector, wrapping it in invisible or storing the return value into a variable (that is ignored and/or removed) will prevent its output on the console.
sink: "send R output to a file".
cat("quux1\n")
# quux1
sink("ignore_me.txt")
cat("quux2\n")
sink(NULL) # remove the sink
cat("quux3\n")
# quux3
I personally find the use of sink (in general) to carry some risk, especially in automation. One good example is that knitr uses sink when capturing the output of code chunks; nested calls to sink have issues. An astute reader will notice that capture.output uses sink, so neither is better in that regard.
Looking again at the source (first link above),
else {
  ## Load message in slave logs.
  sfCat( paste( "Library", .sfPars$package, "loaded.\n" ) )
  ## Message in masterlog.
  message( paste( "Library", .sfPars$package, "loaded in cluster.\n" ) )
}
you'll see that it also calls message, which is not caught by capture.output by default. You can always use capture.output(..., type="message"), but then you aren't capturing the cat output as well. So you are left with having to capture both types, either with nested capture.output or with suppressMessages.
I suggest you can either use suppressMessages(invisible(capture.output(sfLibrary(raster)))) or write some helper function that does that for you.
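As a sketch of such a helper (quiet_run is a hypothetical name, not part of snowfall), combining both captures:
quiet_run <- function(expr) {
  # capture.output catches cat() output; suppressMessages catches message()
  suppressMessages(invisible(capture.output(expr)))
}
quiet_run(sfLibrary(raster))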

Is an rJava object exportable in future (a package for asynchronous computing in R)?

I'm trying to speed up my R code with the future package, using the multicore plan on Linux. Inside the future expression I'm creating a Java object and trying to pass it to .jcall(), but I'm getting a null value for the Java object in the future. Could anyone please help me resolve this? Below is sample code:
library("future")
plan(multicore)
library(rJava)
.jinit()
# preprocess is a user defined function
my_value <- preprocess(a = value){
# some preprocessing task here
# time consuming statistical analysis here
return(lreturn) # return a list of 3 components
}
obj=.jnew("java.custom.class")
f <- future({
.jcall(obj, "V", "CustomJavaMethod", my_value)
})
Basically I'm dealing with large streaming data. In the above code I'm sending the string of streaming data to a user-defined function for statistical analysis, which returns a list of 3 components. Then I want to send this list to the custom Java class [java.custom.class] for further processing, using the custom Java method [CustomJavaMethod].
Without future my code runs fine. But I'm receiving 12 streaming records per minute, and my code cannot keep up; I observed delays in processing.
Currently I'm using Unix with 16 cores. After using the future package my processing is fast, but tracing through my code, something goes wrong in .jcall.
Hope this clarifies my pain.
(Author of the future package here:)
Unfortunately, there are certain types of objects in R that cannot be sent to another R process for further processing. To clarify, this is a limitation of those types of objects - not of the parallel framework used (here the future framework). The simplest example of such an object may be a file connection, e.g. con <- file("my-local-file.txt", open = "wb"). I've documented some examples in Section 'Non-exportable objects' of the 'Common Issues with Solutions' vignette (https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html).
As mentioned in the vignette, you can set an option (*) such that the future framework looks for these types of objects and gives an informative error before attempting to launch the future ("early stopping"). Here is your example with this check activated:
library("future")
plan(multisession)
## Assert that global objects can be sent back and forth between
## the main R process and background R processes ("workers")
options(future.globals.onReference = "error")
library("rJava")
.jinit()
end <- .jnew("java/lang/String", " World!")
f <- future({
start <- .jnew("java/lang/String", "Hello")
.jcall(start, "Ljava/lang/String;", "concat", end)
})
# Error in FALSE :
# Detected a non-exportable reference ('externalptr') in one of the
# globals ('end' of class 'jobjRef') used in the future expression
So, yes, your example actually works when using plan(multicore). The reason is that 'multicore' uses forked processes (available on Unix and macOS, but not on Windows). However, I would try my best not to limit your software to parallelizing only on "forkable" systems; if you can find an alternative approach I would aim for that. That way your code will also work on, say, a huge cloud cluster.
(*) The reason these checks are not enabled by default is that (a) they are still in beta testing, and (b) they come with overhead, because we basically need to scan for non-supported objects among all the globals. Whether these checks will be enabled by default in the future or not will be discussed over at https://github.com/HenrikBengtsson/future.
The code in the question calls an unknown Java method, my_value depends on an undefined value, ... it is hard to know what you are really trying to achieve.
Take a look at the following example, maybe you can get inspiration from it:
library(future)
plan(multicore)
library(rJava)
.jinit()

end <- .jnew("java/lang/String", " World!")

f <- future({
  start <- .jnew("java/lang/String", "Hello")
  .jcall(start, "Ljava/lang/String;", "concat", end)
})
value(f)
[1] "Hello World!"

Sink does not release file

I know that the sink() function can be used to divert R output into a file, e.g.
sink('sink-closing.txt')
cat('Hello world!')
sink()
Is there a simple command to close all outstanding sinks?
Below, I elaborate on my question.
Suppose that my R script opens a sink(), but an error occurs in the script before it closes the sink(). I may run the script multiple times while trying to fix the error. Finally, I want to close all the sinks and print to the console. How do I do so?
Finally, in the interest of concreteness, I provide an MWE to illustrate the problem I face.
First, I write an R-script sink-closing.R which has an error in it.
sink('sink-closing.txt')
foo <- function() {
  cat(sprintf('Hello world! My name is %s\n',
              a.variable.that.does.not.exist))
}
foo()
sink()
Next, I source the R-script multiple times, say 3 times by mistake as I try to find and fix the bug.
> source('~/Dropbox/cookbook/r-cookbook/sink-closing.R')
Error in sprintf("Hello world! My name is %s\n", a.variable.that.does.not.exist) :
object 'a.variable.that.does.not.exist' not found
Now, suppose that I am debugging the R-script and want to print to the console. I can call sink() multiple times to close the earlier sinks. If I call it 3 times, then I can finally print to the console as before. But how do I know how many sinks I need to close?
closeAllConnections()
I'm getting upvotes for this as time goes along, but Simon.S.A's answer and the others here are better.
You can use sink.number() to tell you how many diversions are already set, and then call sink that many times. Putting it into a function, you could have this:
sink.reset <- function(){
  for(i in seq_len(sink.number())){
    sink(NULL)
  }
}
Based on #mnel's comment:
sinkall <- function() {
  i <- sink.number()
  while (i > 0) {
    sink()
    i <- i - 1
  }
}
Should close all open sinks.
You may also encounter this problem when dealing with devices and plots, where the number of open devices isn't reported anywhere. For a more general case you could use this:
stopWhenError <- function(FUN) {
  tryCatch({
    while(TRUE) {
      FUN()
    }
  }, warning = function(w) {
    print("All finished!")
  }, error = function(e) {
    print("All finished!")
  })
}
stopWhenError(sink) # for sink.
stopWhenError(dev.off) # close all open plotting devices.
EDIT:
sink throws a warning, not an error, so I've modified the code so that it won't run forever. Whoops!
The most common time I experience this is when an error occurs preventing a sink from closing. For example, the following will leave an open sink after execution.
sink("output.txt")
my_function_that_will_error()
sink()
This can be avoided using on.exit(sink()). This will close the sink "when the current function exits (either naturally or as the result of an error)" (documentation here).
But you do have to change the order:
sink("output.txt")
on.exit(sink())
my_function_that_might_error()
So we create the sink, tell R to close it when it exits, and then execute the code that might error. This will close the sink regardless of whether the code errors or not.
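Note that on.exit() is tied to the exit of the enclosing function, so this pattern is normally wrapped in a function; a minimal sketch using the hypothetical function from the example above:
run_with_sink <- function() {
  sink("output.txt")
  on.exit(sink(), add = TRUE)  # closes the sink on normal exit and on error
  my_function_that_might_error()
}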

Getting path of an R script

Is there a way to programmatically find the path of an R script inside the script itself?
I am asking this because I have several scripts that use RGtk2 and load a GUI from a .glade file.
In these scripts I am obliged to put a setwd("path/to/the/script") instruction at the beginning, otherwise the .glade file (which is in the same directory) will not be found.
This is fine, but if I move the script in a different directory or to another computer I have to change the path. I know, it's not a big deal, but it would be nice to have something like:
setwd(getScriptPath())
So, does a similar function exist?
This works for me:
getSrcDirectory(function(x) {x})
This defines an anonymous function (that does nothing) inside the script, and then determines the source directory of that function, which is the directory where the script is.
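For the OP's setwd() use case this can be used directly; a small usage sketch (note that getSrcDirectory returns an empty result when the code is run interactively rather than via source()):
this_dir <- getSrcDirectory(function(x) {x})
if(length(this_dir) > 0 && nzchar(this_dir)) setwd(this_dir)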
For RStudio only:
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
This works when Running or Sourcing your file.
Use source("yourfile.R", chdir = T)
Exploit the implicit "--file" argument of Rscript
When calling the script using "Rscript" (Rscript doc) the full path of the script is given as a system parameter. The following function exploits this to extract the script directory:
getScriptPath <- function(){
  cmd.args <- commandArgs()
  m <- regexpr("(?<=^--file=).+", cmd.args, perl = TRUE)
  script.dir <- dirname(regmatches(cmd.args, m))
  if(length(script.dir) == 0) stop("can't determine script dir: please call the script with Rscript")
  if(length(script.dir) > 1) stop("can't determine script dir: more than one '--file' argument detected")
  return(script.dir)
}
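A usage sketch matching the OP's wish, assuming the script is launched as Rscript path/to/script.R (it stops with an error otherwise):
# Only works when the script is run via Rscript, not when source()d
setwd(getScriptPath())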
If you wrap your code in a package, you can always query parts of the package directory.
Here is an example from the RGtk2 package:
> system.file("ui", "demo.ui", package="RGtk2")
[1] "C:/opt/R/library/RGtk2/ui/demo.ui"
>
You can do the same with a directory inst/glade/ in your sources which will become a directory glade/ in the installed package -- and system.file() will compute the path for you when installed, irrespective of the OS.
This answer works fine for me:
script.dir <- dirname(sys.frame(1)$ofile)
Note: the script must be sourced in order to return the correct path.
I found it in: https://support.rstudio.com/hc/communities/public/questions/200895567-can-user-obtain-the-path-of-current-Project-s-directory-
But I still don't understand what sys.frame(1)$ofile is. I didn't find anything about it in the R documentation. Can someone explain it?
#' current script dir
#' @param
#' @return
#' @examples
#' works with source() or in RStudio Run selection
#' @export
z.csd <- function() {
  # http://stackoverflow.com/questions/1815606/rscript-determine-path-of-the-executing-script
  # must work with source()
  if (!is.null(res <- .thisfile_source())) res
  else if (!is.null(res <- .thisfile_rscript())) dirname(res)
  # http://stackoverflow.com/a/35842176/2292993
  # RStudio only, can work without source()
  else dirname(rstudioapi::getActiveDocumentContext()$path)
}

# Helper functions
.thisfile_source <- function() {
  for (i in -(1:sys.nframe())) {
    if (identical(sys.function(i), base::source))
      return(normalizePath(sys.frame(i)$ofile))
  }
  NULL
}

.thisfile_rscript <- function() {
  cmdArgs <- commandArgs(trailingOnly = FALSE)
  cmdArgsTrailing <- commandArgs(trailingOnly = TRUE)
  cmdArgs <- cmdArgs[seq.int(from = 1, length.out = length(cmdArgs) - length(cmdArgsTrailing))]
  res <- gsub("^(?:--file=(.*)|.*)$", "\\1", cmdArgs)
  # If multiple --file arguments are given, R uses the last one
  res <- tail(res[res != ""], 1)
  if (length(res) > 0)
    return(res)
  NULL
}
A lot of these solutions are several years old. While some may still work, there are good reasons against utilizing each of them (see linked source below). I have the best solution (also from source): use the here library.
Original example code:
library(ggplot2)
setwd("/Users/jenny/cuddly_broccoli/verbose_funicular/foofy/data")
df <- read.delim("raw_foofy_data.csv")
Revised code
library(ggplot2)
library(here)
df <- read.delim(here("data", "raw_foofy_data.csv"))
This solution is the most dynamic and robust because it works regardless of whether you are using the command line, RStudio, calling from an R script, etc. It is also extremely simple to use and is succinct.
Source: https://www.tidyverse.org/articles/2017/12/workflow-vs-script/
How about using system and shell commands? With the Windows one, I think that when you open the script in RStudio it sets the current shell directory to the directory of the script. You might have to add cd C:\ or whatever drive you want to search (e.g. shell('dir C:\\*file_name /s', intern = TRUE) - the \\ is to escape the escape character). This will only work for uniquely named files, unless you further specify subdirectories (for Linux I started searching from /). In any case, if you know how to find something in the shell, this provides a layout to find it within R and return the directory. It should work whether you are sourcing or running the script, but I haven't fully explored the potential bugs.
#Get operating system
OS <- Sys.info()
win <- length(grep("Windows", OS))
lin <- length(grep("Linux", OS))

#Find path of data directory
#Linux Bash commands
if(lin == 1){
  file_path <- system("find / -name 'file_name'", intern = TRUE)
  data_directory <- gsub('/file_name', "", file_path)
}
#Windows Command Prompt commands
if(win == 1){
  file_path <- shell('dir file_name /s', intern = TRUE)
  file_path <- file_path[4]
  file_path <- gsub(" Directory of ", "", file_path)
  file_path <- gsub("\\\\", "/", file_path)
  data_directory <- file_path
}

#Change working directory to location of data and sources
setwd(data_directory)
Thank you for the function, though I had to adjust it a little, as follows (Windows 10, German locale):
#Windows Command Prompt commands
if(win == 1){
  file_path <- shell('dir file_name', intern = TRUE)
  file_path <- file_path[4]
  file_path <- gsub(" Verzeichnis von ", "", file_path)  # localized "Directory of"
  file_path <- chartr("\\", "/", file_path)
  data_directory <- file_path
}
In my case, I needed a way to copy the executing file, to back up the original script together with its outputs. This is relatively important in research. What worked for me while running my script on the command line was a mixture of other solutions presented here, which looks like this:
library(scriptName)
file_dir <- gsub("\\", "/", fileSnapshot()$path, fixed = TRUE)
file.copy(from = file.path(file_dir, scriptName::current_filename()),
          to = file.path(new_dir, scriptName::current_filename()))
Alternatively, one can add the date and hour to the file name, to help distinguish that file from the source, like this:
file.copy(from = file.path(current_dir, current_filename()),
          to = file.path(new_dir, subDir, paste0(current_filename(), "_", Sys.time(), ".R")))
None of the solutions given so far works in all circumstances. Worse, many solutions use setwd, and thus break code that expects the working directory to be, well, the working directory, i.e. the directory that the user of the code chose (I realise that the question asks about setwd(), but this doesn't change the fact that it is generally a bad idea).
R simply has no built-in way to determine the path of the currently running piece of code.
A clean solution requires a systematic way of managing non-package code. That’s what ‘box’ does. With ‘box’, the directory relative to the currently executing code can be found trivially:
box::file()
However, that isn’t the purpose of ‘box’; it’s just a side-effect of what it actually does: it implements a proper, modern module system for R. This includes organising code in (nested) modules, and hence the ability to load code from modules relative to the currently running code.
To load code with ‘box’ you wouldn’t use e.g. source(file.path(box::file(), 'foo.r')). Instead, you’d use
box::use(./foo)
However, box::file() is still useful for locating data (i.e. the OP's use case). So, for instance, to locate a file mygui.glade from the current module's path, you would write:
glade_path = box::file('mygui.glade')
And (as long as you’re using ‘box’ modules) this always works, doesn’t require any hacks, and doesn’t use setwd.
