Suppress start-up messages when loading a library to snowfall cluster with sfLibrary - r

An example of the code I am running is below.
library(snowfall)
library(snow)
sfInit(parallel = TRUE, cpus = 3)
sfLibrary(raster)
# Library raster loaded.
# Library raster loaded in cluster.
I want to stop sfLibrary from printing these messages, but I can't figure out how. Help please...
Thanks.
EDIT 1: This does not work:
suppressMessages(sfLibrary(raster))
# Library raster loaded.
EDIT 2: This does not work:
suppressPackageStartupMessages(sfLibrary(raster))
# Library raster loaded.
# Library raster loaded in cluster.

Use the Source.
If you look at the source code for sfLibrary, specifically where it prints those messages, you'll see that it uses sfCat. Tracing that down (same file), sfCat ultimately calls cat.
I know of two ways to prevent cat from dumping onto the console: capture.output and sink.
capture.output: "evaluates its arguments with the output being returned as a character string or sent to a file".
cat("quux4\n")
# quux4
invisible(capture.output(cat("quux5\n")))
cat("quux6\n")
# quux6
Since capture.output returns the captured output visibly as a character vector, wrapping it in invisible or storing the return value in a variable (that is ignored and/or removed) will prevent it from printing to the console.
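For completeness, the store-into-a-variable variant (the name ignored is mine):
ignored <- capture.output(cat("quux7\n"))  # nothing printed; text kept in 'ignored'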
sink: "send R output to a file".
cat("quux1\n")
# quux1
sink("ignore_me.txt")
cat("quux2\n")
sink(NULL) # remove the sink
cat("quux3\n")
# quux3
I personally find the use of sink (in general) to carry some risks, especially in automation. One good example is that knitr uses sink when capturing output for code chunks, and nested calls to sink have issues. An astute reader will notice that capture.output uses sink, so neither is better in that regard.
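A small sketch of how sinks nest (file names are arbitrary):
sink("outer.txt")
sink("inner.txt")  # sinks stack; output now goes to inner.txt
sink.number()      # 2 -- two diversions active (this output lands in inner.txt)
sink()             # pops one level; output goes to outer.txt again
sink()             # pops the last level; output returns to the console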
Looking again at the source (first link above),
else {
    ## Load message in slave logs.
    sfCat( paste( "Library", .sfPars$package, "loaded.\n" ) )
    ## Message in masterlog.
    message( paste( "Library", .sfPars$package, "loaded in cluster.\n" ) )
}
you'll see that it also calls message, which is not caught by capture.output by default. You can always use capture.output(..., type="message"), but then you aren't capturing the cat output as well. So you are left with having to capture both types, either with nested capture.output or with suppressMessages.
I suggest you either use suppressMessages(invisible(capture.output(sfLibrary(raster)))) or write a small helper function that does it for you, as sketched below.
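A minimal sketch of such a helper (the name quietly is my own, not part of snowfall):
quietly <- function(expr) {
  # capture.output() swallows the cat()/sfCat() output, suppressMessages()
  # swallows the message() output, and invisible() hides the captured text
  suppressMessages(invisible(capture.output(expr)))
}
quietly(sfLibrary(raster))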

Related

Is an rJava object exportable in future (package for asynchronous computing in R)?

I'm trying to speed up my R code with the future package, using the multicore plan on Linux. Inside the future I'm creating a Java object and trying to pass it to .jcall(), but I'm getting a null value for the Java object in the future. Could anyone please help me resolve this? Sample code is below:
library("future")
plan(multicore)
library(rJava)
.jinit()
# preprocess is a user-defined function
preprocess <- function(a) {
  # some preprocessing task here
  # time-consuming statistical analysis here
  return(lreturn)  # return a list of 3 components
}
my_value <- preprocess(a = value)
obj <- .jnew("java.custom.class")
f <- future({
  .jcall(obj, "V", "CustomJavaMethod", my_value)
})
Basically I'm dealing with large streaming data. In the above code I send a string of streaming data to a user-defined function for statistical analysis, which returns a list of 3 components. I then want to send this list to the custom Java class [java.custom.class] for further processing, using the custom Java method [CustomJavaMethod].
Without future my code runs fine, but I receive 12 streaming records per minute and the processing falls behind.
Currently I'm using Unix with 16 cores. With the future package my processing is fast, but tracing through my code I see that something goes wrong in .jcall.
Hope this clarifies my pain.
(Author of the future package here:)
Unfortunately, there are certain types of objects in R that cannot be sent to another R process for further processing. To be clear, this is a limitation of those types of objects, not of the parallel framework used (here the future framework). The simplest example of such an object may be a file connection, e.g. con <- file("my-local-file.txt", open = "wb"). I've documented some examples in the 'Non-exportable objects' section of the 'Common Issues with Solutions' vignette (https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html).
As mentioned in the vignette, you can set an option (*) such that the future framework looks for these type of objects and gives an informative error before attempting to launch the future ("early stopping"). Here is your example with this check activated:
library("future")
plan(multisession)
## Assert that global objects can be sent back and forth between
## the main R process and background R processes ("workers")
options(future.globals.onReference = "error")
library("rJava")
.jinit()
end <- .jnew("java/lang/String", " World!")
f <- future({
  start <- .jnew("java/lang/String", "Hello")
  .jcall(start, "Ljava/lang/String;", "concat", end)
})
# Error in FALSE :
# Detected a non-exportable reference ('externalptr') in one of the
# globals ('end' of class 'jobjRef') used in the future expression
So, yes, your example actually works when using plan(multicore). The reason is that 'multicore' uses forked processes (available on Unix and macOS but not Windows). However, I would try my best not to limit your software to parallelizing only on "forkable" systems; if you can find an alternative approach I would aim for that. That way your code will also work on, say, a huge cloud cluster.
(*) The reason for these checks not being enabled by default is (a) it's still in beta testing, and (b) it comes with overhead because we basically need to scan for non-supported objects among all the globals. Whether these checks will be enabled by default in the future or not, will be discussed over at https://github.com/HenrikBengtsson/future.
The code in the question calls an unknown CustomJavaMethod method on an unknown custom class, my_value comes from an undefined preprocess function, ... it is hard to know what you are really trying to achieve.
Take a look at the following example, maybe you can get inspiration from it:
library(future)
plan(multicore)
library(rJava)
.jinit()
end <- .jnew("java/lang/String", " World!")
f <- future({
  start <- .jnew("java/lang/String", "Hello")
  .jcall(start, "Ljava/lang/String;", "concat", end)
})
value(f)
# [1] "Hello World!"

In R, how do I detect file access?

I would like to write a callback function that detects and logs any file-access during an R session.
There are tons of different built-in ways to open a connection in R, so it's unreliable to search for 'open', 'file', 'url', 'read', 'save', etc. in the expression argument of my callback function. There must be some generic event that all these different connection-manipulating functions converge on, right?
So, how do I detect such an event programmatically in a platform-agnostic way? Thanks.
The following does not work. I guess the connections are already closed by the time the callback is triggered...
cb <- taskCallbackManager()
cb$add(function(xpr, val, ok, visible) {
  cons <- showConnections()
  if (length(cons) > 0) print(cons) else print("0")
  TRUE
}, "mycb")
# [1] "mycb"
# [1] "0"
sample <- read.table("http://www.ats.ucla.edu/stat/examples/ara/angell.txt")
# [1] "0"
I still have not found an answer, but I have found a workaround. Most file-access functions follow a similar syntax, with the location of the file as the first argument, so I wrote a wrapper that works with most of them following this pattern:
tread <- function(file, ..., readfun) {
  filename <- deparse(match.call()$file)
  loaded <- readfun(file, ...)
  # put your logging/MD5-check/whatever here, using 'filename'
  return(loaded)
}
It can even be used transparently in a multi-author setting: locally reassign, for example, read_csv to the wrapper at the beginning of the script (or in some global config file), with readfun given the fully qualified name of, in this case, readr::read_csv.
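A sketch of that reassignment (assuming readr is installed; the logging body inside tread is up to you):
# shadow readr::read_csv with the logging wrapper for this script
read_csv <- function(file, ...) tread(file, ..., readfun = readr::read_csv)
dat <- read_csv("data.csv")  # logged by tread(), then loaded as usual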

call to sapply() works in interactive mode, not in batch mode

I need to execute some commands in batch mode (e.g., via Rscript). They work in interactive mode, but not in batch mode. Here is a minimal example: sapply(1:3, is, "numeric"). Why does this work in interactive mode but return an error in batch mode? Is there a way to make a command like this work in batch mode?
More specifically, I need to write scripts and to run them in batch mode. They need to call a function (which I didn't write and can't edit) that looks like this:
testfun <- function (...)
{
    args <- list(...)
    if (any(!sapply(args, is, "numeric")))
        stop("All arguments must be numeric.")
    else
        writeLines("All arguments look OK.")
}
I need to pass a list to this function. A command like testfun(list(1, 2, 3)) works in interactive mode, but in batch mode it produces an error: Error in match.fun(FUN) : object 'is' not found. I tried debugger() to get a handle on the problem, but it didn't give me any insight. I also looked through r-help, the R FAQ, and the R Inferno, but couldn't find anything that spoke to this problem.
Rscript doesn't load the methods package by default because it takes a lot of time. From the Details section of ?Rscript:
‘--default-packages=list’ where ‘list’ is a comma-separated list
of package names or ‘NULL’. Sets the environment variable
‘R_DEFAULT_PACKAGES’ which determines the packages loaded on
startup. The default for ‘Rscript’ omits ‘methods’ as it
takes about 60% of the startup time.
You can make it load methods by using the --default-packages argument.
$ Rscript --default-packages=methods -e 'sapply(1:3, is, "numeric")'
[1] TRUE TRUE TRUE
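Alternatively, a sketch that sidesteps the flag entirely: load methods explicitly at the top of the script, so it works however the script is invoked:
#!/usr/bin/env Rscript
library(methods)  # Rscript omits this package from its default startup set
print(sapply(1:3, is, "numeric"))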

How to preserve changes to function with fix() between R sessions?

If I edit a function with R v2.14.0 using fix(), those fixes are applied during the session.
For example, I might make the following edit to get a white background in a hive plot:
> library(HiveR)
> fix(plotHive)
... :%s/black/white/g
... :w
... :q
> plotHive(myHiveData)
I then get a white background in the hive plot, as expected.
But if I quit and reopen R, I have lost those changes, and the plot has a black background again.
How do I preserve the edits I make with fix() between R sessions?
EDIT
If I source() the modified plotHive() function, I get the following error:
> modifiedPlotHive <- source("modifiedPlotHive.R")
Error in source("modifiedPlotHive.R") :
  modifiedPlotHive.R:1160:1: unexpected '<'
1159: }
1160: <
      ^
In addition: Warning message:
In readLines(file) : incomplete final line found on 'modifiedPlotHive.R'
The final line in the modified plotHive() function is:
<environment: namespace:HiveR>
If I remove this line before source()-ing, then the function no longer works.
Sorry I missed this when it came out, but the latest version of HiveR (0.2-1, available on CRAN) has an option to control the background color. Bryan
Here's the safer way of doing what you want, referenced by #joran.
The sink/source pair is fine for dealing with R code files. But saving other types of objects to text files and then reading them back in can strip them of important attributes, especially those relating to environments. That's what you just experienced.
The save/load pair stores objects in R's own binary format, so is much less liable to lose important information/environments attached to functions.
In this example, I define a personal version of ls, which differs from the base function in that, by default, it lists objects whose names start with a dot/period:
my_ls <- ls
fix(my_ls)
# 1) On the first line, change 'all.names=FALSE' to 'all.names=TRUE'
# 2) Say "Yes", I want to save the changes
save("my_ls", file="my_ls.Rdata")
# Then, in a later session, test that it works
load("my_ls.Rdata")
.TrysToHide <- 99
my_ls()
# [1] ".TrysToHide" "my_ls"
One more note: it's much cleaner to give your modified function a name of its own. To really edit a packaged function and have the changes persist, you'd need to edit the sources and recompile the package. But if you do that, beware: you may well break things for other packaged functions that depend on it.
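As a middle ground, a hedged sketch: save the fixed function once (as above), then re-apply it to the package namespace at the start of each session. assignInNamespace() is a real utils function, but this use is at your own risk (and disallowed in CRAN packages); the file and object names here are hypothetical:
load("modifiedPlotHive.Rdata")                         # restores 'modifiedPlotHive'
environment(modifiedPlotHive) <- asNamespace("HiveR")  # reattach the package environment
assignInNamespace("plotHive", modifiedPlotHive, ns = "HiveR")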
There are a couple of options:
Save your workspace before quitting, and load it again when you reopen R.
Save the modified function to a script file and source it:
sink("modified_plotHive.r")
plotHive
sink()
In the next session:
plotHive <- source("modified_plotHive.r")
HTH

Suppressing "null device" output with R in batch mode

I have a number of bash scripts which invoke R scripts for plotting things. Something like:
#!/bin/bash
R --vanilla --slave <<RSCRIPT
cat("Plotting $1 to $2\n")
input <- read.table("$1")
png("$2")
plot(as.numeric(input[1,]))
dev.off()
RSCRIPT
The problem is that despite --slave, the call to dev.off() prints the message null device 1. Once there are a lot of plots being done, or for more complex scripts which plot to a number of files, this gets to be a real hassle.
Is there some way to suppress this message?
For no good reason I'm aware of, dev.off(), unlike device-creating functions like png(), returns a value: "the number and name of the new active device". That value is what's being echoed to stdout.
Suppressing it can thus be achieved by just assigning the result to something, i.e.,
garbage <- dev.off()
One of the nice things about R is that you can view the source of many functions:
> dev.off
function (which = dev.cur())
{
    if (which == 1)
        stop("cannot shut down device 1 (the null device)")
    .Internal(dev.off(as.integer(which)))
    dev.cur()
}
<environment: namespace:grDevices>
So it calls .Internal(dev.off(...)) and then returns dev.cur(), which I suppose is useful if you have several devices open, so you know which one became active. You could use .Internal(dev.off(as.integer(dev.cur()))) in your script, or even patch dev.off to only return the value of dev.cur() when it is something other than the null device, and send the patch to the maintainers of R.
Also, graphics.off() calls dev.off() for all devices and doesn't return anything.
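So for scripts that open several devices, one quiet alternative is (a sketch; file names are arbitrary):
png("a.png"); plot(1)
png("b.png"); plot(2)
graphics.off()  # closes every open device without printing anything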
Ran into the same issue recently and noticed one more possibility that isn't mentioned in the answers here:
invisible(dev.off())
This hides the output from dev.off() without creating an extra variable, unlike assigning the output to a garbage variable (garbage <- dev.off()) would.
Another option is to use sink() and send everything to a log file, so you can check later whether the plots worked if you need to.
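A minimal sketch of that approach (file names are arbitrary):
sink("plots.log")  # divert console output to a log file
png("out.png")
plot(1:10)
dev.off()          # the "null device 1" line now lands in plots.log
sink()             # restore normal console output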
You can use littler instead, which is (a) an easier way to write R 'scripts', and (b) does not auto-print visible results, so you get the side effect of dev.off() being silent:
$ foo.r /tmp/foo.txt /tmp/foo.png
Plotting /tmp/foo.txt to /tmp/foo.png
$ cat /tmp/foo.r
#!/usr/bin/r
cat("Plotting", argv[1], "to", argv[2], "\n")
input <- read.table(argv[1])
png(argv[2])
plot(as.numeric(input[1,]))
dev.off()
$
Rscript will probably work too; I tend to prefer littler.
