R: Creating a custom error message giving argument values for a function in a package

Suppose there is a set of functions, drawn from a package not written by me, to which I want to attach special behavior on error. My current concern is with the _impl family of functions in dplyr. Take mutate_impl, for example. When I get an error from mutate, traceback almost always leads me to mutate_impl, but it is usually some way up the call stack -- I have seen it as many as 15 calls from the call to mutate. So what I want to know at that point is typically how the arguments to mutate_impl relate to the arguments I originally supplied to mutate (or think I did).
So, this code is probably wrong in too many ways to count -- certainly it does not work -- but I hope it at least expresses my intent. The idea is that I could wrap it around mutate_impl, and if that produces an error, it saves the error message and a description of the arguments and returns them as a list:
str_impl <- function(f) {
  tryCatch(f, error = function(c) {
    msg <- conditionMessage(c)
    args <- capture.output(str(as.list(match.call(call(f)))))
    list(message = msg, arguments = args)
  })
}
# "str_impl_result" is just a placeholder name for where the result lands
assign("str_impl_result", str_impl(mutate_impl), envir = .GlobalEnv)
This still falls short of what I really want, though, because even without the constraint of working code, I could not figure out how to produce a draft. What I really want is to be able to identify a function or list of functions that should have this behavior on error, and then have it occur whenever and wherever those functions are called. I could not think of any way to even start on that without rewriting functions in the dplyr package environment, which struck me as a really bad idea.
The final assignment to the global environment is supposed to get the error object back to somewhere I can find it, even if the call to mutate_impl happens somewhere inaccessible, like in an environment that ceases to exist after the error.
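For what it's worth, the closest I can get to a runnable version of that idea is something like this (capture_on_error and last_error_info are names I made up; the error still propagates, but the condition is stashed first):
library(dplyr)

# Run an expression; on error, stash the condition's message and call in
# the global environment before the error propagates as usual.
capture_on_error <- function(expr) {
  withCallingHandlers(expr, error = function(c) {
    assign("last_error_info",
           list(message = conditionMessage(c), call = conditionCall(c)),
           envir = .GlobalEnv)
  })
}

# capture_on_error(mtcars %>% mutate(gear = hi * 2))
# str(last_error_info)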

Probably the best way of achieving what you want is via the trace functionality. It is worth reading the help page for trace, but here is a working example:
library(dplyr)
trace("mutate_impl", exit = quote({
if (class(returnValue())[1]=="NULL") {
cat("df\n")
print(head(df))
cat("\n\ndots\n")
print(dots)
} else {
# no problem, nothing to do
}
}), where = mutate, print = FALSE)
# ok
xx <- mtcars %>% mutate(gear = gear * 2)
# not ok, extra output
xx <- mtcars %>% mutate(gear = hi * 2)
It should be fairly simple to adjust this to your specific needs, e.g. if you want to log to a file instead:
trace("mutate_impl", exit = quote({
if (class(returnValue())[1]=="NULL") {
sink("error.log")
cat("df\n")
print(head(df))
cat("\n\ndots\n")
print(dots)
sink()
} else {
# no problem, nothing to do
}
}), where = mutate, print = FALSE)
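If you want this behavior for a whole list of functions, as the question asks, you can loop over the names. A sketch (which _impl functions exist depends on your dplyr version, so treat these names as illustrative):
impl_funs <- c("mutate_impl", "filter_impl", "summarise_impl")
for (fn in impl_funs) {
  # bquote() substitutes the current value of .(fn) into each trace body,
  # so every traced function reports its own name on error.
  trace(fn, exit = bquote({
    if (is.null(returnValue())) {
      message("error inside ", .(fn))
    }
  }), where = asNamespace("dplyr"), print = FALSE)
}

# remove the traces again:
# for (fn in impl_funs) untrace(fn, where = asNamespace("dplyr"))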

Related

Global Variables as Function Parameters

In an R project, we have a global dataframe df that is to be used inside a function my_func(). The dataframe will not be changed, but it will be used as a "read-only" table.
Can you please assist me, on, what is the best practice:
Include the dataframe in the parameters of the function, as in
my_func <- function(df) {
  a <- df[1, 2]
}
OR
Not include it in the parameters, just use it (read it) in the function body, as in
my_func <- function() {
  a <- df[1, 2]
}
In an ideal world, data enters a function as an argument and leaves it as a return value. That is a good principle, and it is also preferable for code reuse. Right now you may be convinced that you will only ever call this code on df (a bad name, by the way, as there is already a function called df in R, which can lead to terrible error messages).
The only exception to this rule, and the reason why <<- exists (*), may rarely be performance.
However, in the read-only case there are no performance gains, because R behaves cleverly: arguments are only copied when they are modified (copy-on-modify).
You will need to install the microbenchmark package for the following code to run:
expl <- data.frame(a = rep("Hello world.", 1e8),
                   b = rep(1, 1e8))

fun1 <- function(dataframe) return(sum(dataframe$b))
fun2 <- function() return(sum(expl$b))

microbenchmark::microbenchmark(fun1(expl), fun2())
Try it and you will see that there is no performance gain of fun2 over fun1, even though the data frame has considerable size.
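You can watch this copy-on-modify behavior directly with base R's tracemem (a quick sketch; tracemem needs R built with memory profiling, which is the default for CRAN binaries):
expl_small <- data.frame(a = 1:10)
tracemem(expl_small)                    # report whenever this object is copied

f_read  <- function(d) sum(d$a)         # read-only: no copy is reported
f_write <- function(d) { d$a <- 0; d }  # modifies: tracemem reports a copy

f_read(expl_small)    # silent
f_write(expl_small)   # prints a tracemem[...] line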
Edit:
(*) as I have learned from Konrad Rudolph's comment below, <<- can be useful when giving data to the parent environment, not necessarily the global one. A very interesting read, even if not strictly on topic here: http://adv-r.had.co.nz/Functional-programming.html#mutable-state

No visible binding for '<<-' Assignment

I have solved variants of the "no visible binding" notes that one gets when checking their package. However, I am unable to solve the situation when applied to the '<<-' assignment.
Specifically, I had defined and used a local variable in several functions, such as:
fName = function(df) {
  eval({vName <<- 0}, envir = environment(fName))
}
However when I run check() from devtools, I get the error:
fName: no visible binding for '<<-' assignment to 'vName'
So, I tried using a different syntax, as follows:
fName = function(df) {
  assign("vName", 0, envir = environment(fName))
}
But got the check() error:
cannot add bindings to a locked environment
When I tried:
fName = function(df) {
  assign("vName", 0, envir = environment(-1))
}
I got the error:
use of NULL environment is defunct
So, my question is how I can accomplish the assignment operator <<- without getting a note in the check() from devtools.
Thank you.
The easy answer is: don't use <<- in your package. One alternative way to make an assignment to an environment (and keep the value read-only afterwards) is to create a locked binding.
e <- new.env()
e$vName <- 0L
lockBinding("vName", e)
vName
# Error: object 'vName' not found
with(e, vName)
# [1] 0
e$vName <- 5
# Error in e$vName <- 5 : cannot change value of locked binding for 'vName'
You can also lock the environment itself, so that bindings can no longer be added or removed.
lockEnvironment(e)
rm(vName, envir = e)
# Error in rm(vName, envir = e) :
# cannot remove bindings from a locked environment
Have a look at help(bindenv), it's a good read.
Updated: Since you mentioned you might be wanting to make the assignment at run time rather than at load time, have a read of help(globalVariables). From that help page:
For globalVariables, the names supplied are of functions or other objects that should be regarded as defined globally when the check tool is applied to this package. The call to globalVariables will be included in the package's source. Repeated calls in the same package accumulate the names of the global variables.
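If what you actually need is mutable state inside a package without <<-, a common idiom is a package-local environment. A minimal sketch (all names here are illustrative):
# Keep package state in an internal environment instead of using <<-.
# R CMD check is happy, because .pkg_state is an ordinary top-level binding.
.pkg_state <- new.env(parent = emptyenv())

set_vName <- function(value) {
  assign("vName", value, envir = .pkg_state)
}

get_vName <- function() {
  get("vName", envir = .pkg_state)
}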
I don't know if it helps after all these years, but I had the same problem, and what I did to solve it was:
utils::globalVariables(c("global_var"))
Put this R code somewhere inside the package's R directory (save it as an .R file). Whenever you assign a global variable, also assign it locally, like so:
global_var<<-1
global_var<-1
That worked for me.

How to figure out which statement in lapply fails in R

I often have the situation like this:
result <- lapply(1:length(mylist), function(x) {
  doSomething(mylist[[x]])
})
However, if it fails, I have no idea which element in the list failed on doSomething().
So then I end up recoding it as a for loop:
for (i in 1:length(mylist)) {
  doSomething(mylist[[i]])
}
I can then see the last value of i and what happened. There must be a better way to do this right?? Thanks!
Notice how the error message includes 5L:
> lapply(1:10, function(i) if (i == 5) stop("oops"))
Error in FUN(1:10[[5L]], ...) : oops
indicating that the 5th iteration failed.
One simple option is to run the code:
options( error=recover )
before running lapply (see ?recover for details).
Then when/if an error occurs you will instantly be put into the recover mode that will let you examine which function you are in, what arguments were passed to that function, etc. so you can see which step you are on and what the possible reason for the error is.
You can also use try or tryCatch as mentioned in the comments to either skip elements that produce an error or print out information on where they occur.
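For example, a small sketch of the tryCatch approach that re-raises the error with the failing index attached (assuming mylist and doSomething from the question):
result <- lapply(seq_along(mylist), function(i) {
  tryCatch(
    doSomething(mylist[[i]]),
    error = function(e) {
      # re-signal the error, naming the element that caused it
      stop(sprintf("element %d failed: %s", i, conditionMessage(e)),
           call. = FALSE)
    }
  )
})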

R 'object XX not found' error thrown inside function, but not in script

I am fairly new to R, so my apologies if this question is a bit silly.
I am calling a function in an external package ('mmlcr', although I don't think that is directly relevant to my problem), and one of the required inputs (data) is a data.frame. I compose the data.frame from various data using the following approach (simplified for illustration):
# id, Time, and value are vectors created elsewhere in the code.
myData = data.frame(a = id, b = Time, c = value)
out <- mmlcr(input1, input2, data = myData, input4)
Which throws the error:
Error in is.data.frame(data) : object 'myData' not found
The debugger indicates that this error is thrown during the mmlcr() call.
I then added a print(ls()) immediately prior to the mmlcr() call, and the output confirmed that "myData" was in my function workspace; further is.data.frame(myData) returned TRUE. So it seems that "myData" is successfully being created, but for some reason it is not passing into the mmlcr() function properly. (Commenting this line causes no error to be thrown, so I'm pretty sure this is the problematic line).
However, when I put the exact same code in a script (i.e., not within a function block), no such error is thrown and the output is as expected. Thus, I assume there is some scoping issue that arises.
I have tried both assignment approaches:
myData = data.frame(a=id, b=Time, c=value)
myData <- data.frame(a=id, b=Time, c=value)
and both give me the same error. I admit that I don't fully understand the scope model in R (I've read about the differences between = and <- and I think I get it, but I'm not sure).
Any advice you can offer would be appreciated.
MMLCR is now deprecated and you should search for some alternatives. Without looking too much into it, I sleuthed through an old repo and found the culprit:
m <- eval(m, data)
in the function mmlcr.default. There are a lot of reasons why this is bad, but scoping is the big one. R has the same issue with the subset.data.frame function (see my old SO question). Rather than modify the source code, I would find a way to replace the call with your own loop, using for, repeat, or while.
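To see why eval(m, data) causes this, here is a toy function (purely illustrative, not mmlcr's actual code) that evaluates a captured argument in the wrong environment and reproduces the symptom:
bad_fn <- function(data) {
  cl <- match.call()
  # evaluating the captured symbol in the global environment loses
  # objects that only exist in the caller's frame
  eval(cl$data, envir = globalenv())
}

wrapper <- function() {
  myData <- data.frame(a = 1:3)   # local to wrapper(), not global
  bad_fn(data = myData)
}

wrapper()
# Error: object 'myData' not found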

Why can't I pass a dataset to a function?

I'm using the package glmulti to fit models to several datasets. Everything works if I fit one dataset at a time.
So for example:
output <- glmulti(y~x1+x2,data=dat,fitfunction=lm)
works just fine.
However, if I create a wrapper function like so:
analyze <- function(dat) {
  out <- glmulti(y ~ x1 + x2, data = dat, fitfunction = lm)
  return(out)
}
it simply doesn't work. The error I get is:
error in evaluating the argument 'data' in selecting a method for function 'glmulti'
Unless there is a data frame named dat in the global environment, it doesn't work. Using results <- lapply(list_of_datasets, analyze) doesn't work either.
So what gives? Without the wrapper, I can't lapply a list of datasets through this function. If anyone has thoughts on why this is happening, or how I can get around it, that would be great.
example 2:
dat=list_of_data[[1]]
analyze(dat)
works fine. So in a sense it is ignoring the argument and just literally looking for a data frame named dat. It behaves the same no matter what I call it.
I guess this is -yet another- problem due to the definition of environments in the parse tree of S4 methods (one of the reasons why I am not a big fan of S4...).
It can be shown by adding quotes around the dat:
> analyze <- function(dat)
+ {
+ out<- glmulti(y~x1+x2,data="dat",fitfunction=lm)
+ return (out)
+ }
> analyze(test)
Initialization...
Error in eval(predvars, data, env) : invalid 'envir' argument
You should in the first place send this information to the maintainers of the package, as they know how they deal with the environments internally; they'll have to adapt the functions.
A -very dirty- workaround for yourself is to put dat in the global environment and delete it afterwards.
analyze <- function(dat) {
  assign("dat", dat, envir = .GlobalEnv)  # put dat in the global env
  out <- glmulti(y ~ x1 + x2, data = dat, fitfunction = lm)
  remove(dat, envir = .GlobalEnv)         # delete dat again from the global env
  return(out)
}
EDIT:
Just for clarity, this is really about the worst solution possible, but I couldn't manage to find anything better. If somebody else gives you a solution where you don't have to touch your global environment, by all means use that one.
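If you do use that workaround, a slightly safer variant cleans up with on.exit, so the global dat disappears even when glmulti throws an error (a sketch of the same idea):
analyze <- function(dat) {
  assign("dat", dat, envir = .GlobalEnv)
  # on.exit() runs even if glmulti() fails, so the global dat never leaks
  on.exit(rm(list = "dat", envir = .GlobalEnv))
  glmulti(y ~ x1 + x2, data = dat, fitfunction = lm)
}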
