call to sapply() works in interactive mode, not in batch mode - r

I need to execute some commands in batch mode (e.g., via Rscript). They work in interactive mode, but not in batch mode. Here is a minimal example: sapply(1:3, is, "numeric"). Why does this work in interactive mode but return an error in batch mode? Is there a way to make a command like this work in batch mode?
More specifically, I need to write scripts and to run them in batch mode. They need to call a function (which I didn't write and can't edit) that looks like this:
testfun <- function (...)
{
args <- list(...)
if (any(!sapply(args, is, "numeric")))
stop("All arguments must be numeric.")
else
writeLines("All arguments look OK.")
}
I need to pass a list to this function. A command like testfun(list(1, 2, 3)) works in interactive mode. But in batch mode, it produces an error: Error in match.fun(FUN) : object 'is' not found. I tried debugger() to get a handle on the problem, but it didn't give me any insight. I also looked through r-help, the R FAQ, R Inferno, but I couldn't find anything that spoke to this problem.

Rscript doesn't load the methods package by default because it takes a lot of time. From the Details section of ?Rscript:
‘--default-packages=list’ where ‘list’ is a comma-separated list
of package names or ‘NULL’. Sets the environment variable
‘R_DEFAULT_PACKAGES’ which determines the packages loaded on
startup. The default for ‘Rscript’ omits ‘methods’ as it
takes about 60% of the startup time.
You can make it load methods by using the --default-packages argument.
> Rscript -e 'sapply(1:3, is, "numeric")' --default-packages='methods'
[1] TRUE TRUE TRUE

Related

Stop R executing System or Shell commands

i'd like to disable commands that can execute other non R related stuff like System(), Shell() e.g.
for (year in 2010:2915){
system("calc")
}
from running within R.
any suggestions other than locking down the user executing?
thanks
edit: to add more context, we allow the users to create R scripts in our solution which are passed to the R Engine to execute, we then process those results.
Short of editing the R source code to remove the undesirable functions, which would be tedious and probably a bit dangerous, I would override these functions:
# override system()
env <- as.environment("package:base")
unlockBinding("system", env) # bindings in the base R are write-protected
assign(
"system",
function(...){stop("This is a forbidden command!")},
envir=env
)
lockBinding("system", env)
This would give the following when a user runs system():
> system()
Error in system() : this is a forbidden command
So that the changes take effect each time R is started, you could override as many functions as you want this way, adding them to .First() in your (write-protected) "Rprofile.site" file:
.First <- function(){
# code to override system() here
# code to override shell() here
# ...
}
Note that this will not prevent an ill-intentioned determined user from re-implementing the forbidden functionality though.

R: How make dump.frames() include all variables for later post-mortem debugging with debugger()

I have the following code which provokes an error and writes a dump of all frames using dump.frames() as proposed e. g. by Hadley Wickham:
a <- -1
b <- "Hello world!"
bad.function <- function(value)
{
log(value) # the log function may cause an error or warning depending on the value
}
tryCatch( {
a.local.value <- 42
bad.function(a)
bad.function(b)
},
error = function(e)
{
dump.frames(to.file = TRUE)
})
When I restart the R session and load the dump to debug the problem via
load(file = "last.dump.rda")
debugger(last.dump)
I cannot find my variables (a, b, a.local.value) nor my function "bad.function" anywhere in the frames.
This makes the dump nearly worthless to me.
What do I have to do to see all my variables and functions for a decent post-mortem analysis?
The output of debugger is:
> load(file = "last.dump.rda")
> debugger(last.dump)
Message: non-numeric argument to mathematical functionAvailable environments had calls:
1: tryCatch({
a.local.value <- 42
bad.function(a)
bad.function(b)
2: tryCatchList(expr, classes, parentenv, handlers)
3: tryCatchOne(expr, names, parentenv, handlers[[1]])
4: value[[3]](cond)
Enter an environment number, or 0 to exit
Selection:
PS: I am using R3.3.2 with RStudio for debugging.
Update Nov. 20, 2016: Note that it is not an R bug (see answer of Martin Maechler). I did not change my answer for reproducibility. The described work around still applies.
Summary
I think dump.frames(to.file = TRUE) is currently an anti pattern (or probably a bug) in R if you want to debug errors of batch jobs in a new R session.
You should better replace it with
dump.frames()
save.image(file = "last.dump.rda")
or
options(error = quote({dump.frames(); save.image(file = "last.dump.rda")}))
instead of
options(error = dump.frames)
because the global environment (.GlobalEnv = the user workspace you normally create your objects) is included then in the dump while it is missing when you save the dump directly via dump.frames(to.file = TRUE).
Impact analysis
Without the .GlobalEnv you loose important top level objects (and their current values ;-) to understand the behaviour of your code that led to an error!
Especially in case of errors in "non-interactive" R batch jobs you are lost without .GlobalEnv since you can debug only in a newly started (empty) interactive workspace where you then can only access the objects in the call stack frames.
Using the code snippet above you can examine the object values that led to the error in a new R workspace as usual via:
load(file = "last.dump.rda")
debugger(last.dump)
Background
The implementation of dump.frames creates a variable last.dump in the workspace and fills it with the environments of the call stack (sys.frames(). Each environment contains the "local variables" of the called function). Then it saves this variable into a file using save().
The frame stack (call stack) grows with each call of a function, see ?sys.frames:
.GlobalEnv is given number 0 in the list of frames. Each subsequent
function evaluation increases the frame stack by 1 and the [...] environment for evaluation of that function are returned by [...] sys.frame with the appropriate index.
Observe that the .GlobalEnv has the index number 0.
If I now start debugging the dump produced by the code in the question and select the frame 1 (not 0!) I can see a variable parentenv which points (references) the .GlobalEnv:
Browse[1]> environmentName(parentenv)
[1] "R_GlobalEnv"
Hence I believe that sys.frames does not contain the .GlobalEnv and therefore dump.frames(to.file = TRUE) neither since it only stores the sys.frames without all other objects of the .GlobalEnv.
Maybe I am wrong, but this looks like an unwanted effect or even a bug.
Discussions welcome!
References
https://cran.r-project.org/doc/manuals/R-exts.pdf
Excerpt from section 4.2 Debugging R code (page 96):
Because last.dump can be looked at later or even in another R session,
post-mortem debug- ging is possible even for batch usage of R. We do
need to arrange for the dump to be saved: this can be done either
using the command-line flag
--save to save the workspace at the end of the run, or via a setting such as
options(error = quote({dump.frames(to.file=TRUE); q()}))
Note that it is often more productive to work with the R Core team rather than just telling that R has a bug. It clearly has no bug, here, as it behaves exactly as documented.
Also there is no problem if you work interactively, as you have full access to your workspace (which may be LARGE) there, so the problem applies only to batch jobs (as you've mentioned).
What we rather have here is a missing feature and feature requests (and bug reports!) should happen on the R bug site (aka _'R bugzilla'), https://bugs.r-project.org/ ... typically however after having read the corresponding page on the R website: https://www.r-project.org/bugs.html.
Note that R bugzilla is searchable, and in the present case, you'd pretty quickly find that Andreas Kersting made a nice proposal (namely as a wish, rather than claiming a bug),
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17116
and consequently I had added the missing feature to R, on Aug.16, already.
Yes, of course, the development version of R, aka R-devel.
See also today's thread on the R-devel mailing list,
https://stat.ethz.ch/pipermail/r-devel/2016-November/073378.html

How to hide warning messages from self-written function in R?

I am running an ordinary R script in which I have a self-written function. The function makes use of rm() which often produces warnings I do not want to appear in console output. Any of these solutions:
hiding warnings from rm usage from this particular self-written function,
hiding warnings from all usages of rm (globally for an R session)
would satisfy me.
foo.function <- function(){
rm(foo.object)
print("foo")
}
foo.function()
# [1] "foo"
# Warning message:
# In rm(foo.object) : object 'foo.object' not found
For these particular case of object not found, you may use something like this:
if(exists("foo.object")) rm(foo.object)
If you want to hide other warnings as well, just use suppressWarnings().

Debugging a function in a different source file in R

I'm using RStudio and I want to be able to stop the code execution at a specific line.
The functions are defined in the first script file and called from a second.
I source the first file into the second one using source("C:/R/script1.R")
I used run from beginning to line: where I start running from the second script which has the function calls and have highlighted a line in the first script where the function definitions are.
I then use browser() to view the variables. However this is not ideal as there are some large matrices involved. Is there a way to make these variables appear in RStudio's workspace?
Also when I restart using run from line to end it only runs to the end of the called first script file it does not return to the calling function and complete the running of the second file.
How can I achieve these goals in RStudio?
OK here is a trivial example the function adder below is defined in one script
adder<-function(a,b) {
browser()
return(a+b)
}
I than call is from a second script
x=adder(3,4)
When adder is called in the second script is starts browser() in the first one. From here I can use get("a") to get the value of a, but the values of a and b do not appear in the workspace in RStudio?
In the example here it does not really matter but when you have several large matrices it does.
If you assign the data, into the .GlobalEnv it will be shown in RStudio's "Workspace" tab.
> adder(3, 4)
Called from: adder(3, 4)
Browse[1]> a
[1] 3
Browse[1]> b
[1] 4
Browse[1]> assign('a', a, pos=.GlobalEnv)
Browse[1]> assign('b', b, pos=.GlobalEnv)
Browse[1]> c
[1] 7
> a
[1] 3
> b
[1] 4
What you refer to as RStudio's workspace is the global environment in an R session. Each function lives in its own small environment, not exposing its local variables to the global environment. Therefore a is not present in the object inspector of RStudio.
This is good programming practice as it shields sections of a larger script from each other, reducing the amount of unwanted interaction. For example, if you use i as a counter in one function, this does not influence the value of a counter i in another function.
You can inspect a when you are in the browser session by using any of the usual functions. For example,
head(a)
str(a)
summary(a)
View(a)
attributes(a)
One common tactic after calling browser is to get a summary of all variables in the current (parent) environment. Make it a habit that every time you stop code with browser, you immediately type ls.str() at the command line.

How to preserve changes to function with fix() between R sessions?

If I edit a function with R v2.14.0 using fix(), those fixes are applied during the session.
For example, I might make the following edit to get a white background in a hive plot:
> library(HiveR)
> fix(plotHive)
... :%s/black/white/g
... :w
... :q
> plotHive(myHiveData)
I then get a white background in the hive plot, as expected.
But if I quit and reopen R, I have lost those changes, and the plot has a black background again.
How do I preserve the edits I make with fix() between R sessions?
EDIT
If I source() the modified plotHive() function, I get the following error:
> modifiedPlotHive <- source("modifiedPlotHive.R")
Error in source("modifiedPlotHive.R") :
modifiedPlotHive.R:1160:1: unexpected '<'
1159: }
1160: <
^
In addition: Warning message:
In readLines(file) : incomplete final line found on 'modifiedPlotHive.R'
The final line in the modified plotHive() function is:
<environment: namespace:HiveR>
If I remove this line before source()-ing, then the function no longer works.
Sorry I missed this when it came out, but the latest version of HiveR has the option to control the background color (available on CRAN 0.2-1) Bryan
Here's the safer way of doing what you want, referenced by #joran.
The sink/source pair is fine for dealing with R code files. But saving to text files and then reading back in other types of objects can strip them of important attributes, especially those relating to environments. That's what you just experienced.
The save/load pair stores objects in R's own binary format, so is much less liable to lose important information/environments attached to functions.
In this example, I define a personal version of ls, which differs from the base function in that it by default lists objects that start with a dot/period:
my_ls <- ls
fix(my_ls)
# 1) On the first line, change 'all.names=FALSE' to 'all.names=TRUE'
# 2) Say "Yes", I want to save the changes
save("my_ls", file="my_ls.Rdata")
# Then, in a later session, test that it works
load("my_ls.Rdata")
.TrysToHide <- 99
my_ls()
# [1] ".TrysToHide" "my_ls"
One more note: it's much cleaner to give your modified function a name of its own. To really edit a packaged function, and have the changes persist, you'd need to edit the sources and recompile the package. But if you do that, beware, as you may well break the function for other packaged functions that depend on it.
There are a couple of options:
Save your workspace before quiting and load it again when you reopen R.
Save the modified function to script file and source it:
sink("modified_plotHive.r")
plotHive
sink()
In the next session:
plotHive <- source("modified_plotHive.r")
HTH

Resources