R 'object XX not found' error thrown inside function, but not in script

I am fairly new to R, so my apologies if this question is a bit silly.
I am calling a function in an external package ('mmlcr', although I don't think that is directly relevant to my problem), and one of the required inputs (data) is a data.frame. I compose the data.frame from various data using the following approach (simplified for illustration):
#id, Time, and value are vectors created elsewhere in the code.
myData = data.frame(a=id, b=Time, c=value)
out <- mmlcr( input1, input2, data=myData, input4)
Which throws the error:
Error in is.data.frame(data) : object 'myData' not found
The debugger indicates that this error is thrown during the mmlcr() call.
I then added a print(ls()) immediately prior to the mmlcr() call, and the output confirmed that "myData" was in my function workspace; furthermore, is.data.frame(myData) returned TRUE. So it seems that "myData" is successfully being created, but for some reason it is not being passed into the mmlcr() function properly. (Commenting out this line causes no error to be thrown, so I'm pretty sure it is the problematic line.)
However, when I put the exact same code in a script (i.e., not within a function block), no such error is thrown and the output is as expected. Thus, I assume there is some scoping issue that arises.
I have tried both assignment approaches:
myData = data.frame(a=id, b=Time, c=value)
myData <- data.frame(a=id, b=Time, c=value)
and both give me the same error. I admit that I don't fully understand R's scoping model (I've read about the differences between = and <-, and I think I get it, but I'm not sure).
Any advice you can offer would be appreciated.

mmlcr is now deprecated, so you should look for alternatives. Without looking too much into it, I sleuthed through an old repo and found the culprit:
m <- eval(m, data)
in the function mmlcr.default. There are a lot of reasons why this is bad, but scoping is the big one. R has the same issue with the subset.data.frame function; see my old SO question. Rather than modify the source code, I would find a way to do what your function does with a subroutine using a for, repeat, or while loop.
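To see why this pattern breaks, here is a minimal sketch of the same failure mode; badEval and wrapper are hypothetical stand-ins, not mmlcr code. A captured call is re-evaluated in an environment where the caller's variables don't exist:
badEval <- function(formula, data) {
  m <- match.call()                # arguments are still unevaluated symbols
  m[[1L]] <- quote(stats::model.frame)
  eval(m, envir = globalenv())     # 'data = myData' is now looked up in globalenv
}
wrapper <- function() {
  myData <- data.frame(y = rnorm(5), x = rnorm(5))
  badEval(y ~ x, data = myData)    # myData exists here, not in globalenv
}
wrapper()
## Error in stats::model.frame(formula = y ~ x, data = myData) :
##   object 'myData' not found
Run at the top level instead, the same call succeeds, because myData then happens to live in the global environment; that is exactly why the OP's script version works while the function version fails.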

Related

R Parallelisation Error unserialize(socklist[[n]])

In a nutshell, I am trying to parallelise my whole script over dates using snow and adply, but I continually get the error below.
Error in unserialize(socklist[[n]]) : error reading from connection
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
I have set up the parallelisation process in the following way:
Cores = detectCores(all.tests = FALSE, logical = TRUE)
cl = makeCluster(Cores, type="SOCK")
registerDoSNOW(cl)
clusterExport(cl, c("Var1","Var2","Var3","Var4"), envir = environment())
exposureDaily <- adply(.data = dateSeries, .margins = 1, .fun = MainCalcFunction,
                       .expand = TRUE, Var1, Var2, Var3, Var4,
                       .parallel = TRUE)
stopCluster(cl)
Where dateSeries might look something like
> dateSeries
marketDate
1 2016-04-22
2 2016-04-26
MainCalcFunction is a very long script with many of my own functions contained within it. As the script is so long, reproducing it wouldn't be practical, and a hypothetical small function would defeat the purpose, as I have already got this methodology to work with other, smaller functions. I can say that within MainCalcFunction I call all my libraries, necessary functions, and a file containing all other variables aside from those exported above, so that I don't have to export a long list of libraries and other objects.
MainCalcFunction runs successfully in its entirety over two dates using adply without parallelisation, which tells me that it is not a bug in the code itself that is causing the parallelisation to fail.
Initially I thought (from experience) that the parallelisation over dates was failing because there was another function within the code that utilised parallelisation; however, I have subsequently rebuilt the whole code to make sure there was no such function.
I have pored over the script with a fine-toothed comb to see if there was any place where I accidentally didn't export something that I needed, and I can't find anything.
Some ideas as to what could be causing the code to fail are:
The use of various option-valuation functions in fOptions and RQuantLib
The use of type sock
I am aware of this question already asked and also this question, and while the first question has helped me, it hasn't yet helped solve the problem. (Note: that may be because I haven't used it correctly, having mainly used loginfo("text") to track where the code is. Potentially, there is a way to change that so that I log warning and/or error messages instead?)
Please let me know if there is any other information I can provide to help in solving this. I would be very appreciative if someone could provide some guidance, as the code takes close to 40 minutes to run for a single day and I need to run it for close to a year, so parallelisation is essential!
EDIT
I have tried to implement the suggestion in the first question linked above by using the outfile option. Given I am using Windows, I have done this by including the following lines before exporting the key objects and running MainCalcFunction:
reportLogName <- paste("logout_parallel.txt", sep="")
addHandler(writeToFile,
file = paste(Save_directory,reportLogName, sep="" ),
level='DEBUG')
with(getLogger(), names(handlers))
loginfo(paste("Starting log file", getwd()))
mc<-detectCores()
cl<-makeCluster(mc, outfile="")
registerDoParallel(cl)
Similarly, at the beginning of MainCalcFunction, after having sourced my libraries and functions I have included the following to print to file:
reportLogName <- paste(testDate, "_logout.txt", sep="")
addHandler(writeToFile,
           file = paste(Save_directory, reportLogName, sep=""),
           level = 'DEBUG')
with(getLogger(), names(handlers))
loginfo(paste("Starting test function ", getwd(), sep=""))
In the MainCalcFunction function I have then put loginfo("text") statements at key junctures to inform me of where the code is at.
This has resulted in some text files being available after the code fails with the aforementioned error. However, these text files provide no more information on the cause of the error, aside from showing at what point the code failed. This is despite a tryCatch statement embedded in MainCalcFunction whose error handler ends with the line logerror(e).
I am posting this answer in case it helps anyone else with a similar problem in the future.
Essentially, the error unserialize(socklist[[n]]) doesn't tell you a lot, so solving it is a matter of narrowing down the issue.
Firstly, be absolutely sure the code runs over several dates in non-parallel mode with no errors.
Secondly, ensure the parallelisation is set up correctly. There are some obvious initial mistakes that many other questions address, e.g., hidden parallelisation inside the code, which means parallelisation occurs twice.
Once you are sure there is no problem with the code and the parallelisation is set up correctly, start narrowing down. The issue is likely (unless something has been missed above) something in the code which isn't a problem when run in serial but becomes one when run in parallel. The easiest way to narrow down is by setting outfile = "Log.txt" in whichever makeCluster call you use, e.g., cl <- makeCluster(cores-1, outfile="Log.txt"), and then adding as many print("Point in code") statements to your function as needed to pinpoint where the issue occurs; a minimal sketch of this setup follows below.
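For illustration, here is a minimal sketch of that setup, assuming a foreach/doParallel workflow; MainCalcFunction below is a trivial stand-in for your real function:
library(doParallel)   # attaches foreach and parallel as well
cl <- makeCluster(detectCores() - 1, outfile = "Log.txt")  # workers append here
registerDoParallel(cl)
MainCalcFunction <- function(piece) {
  print(paste("worker started piece", piece))   # lands in Log.txt
  # ... real work here ...
  print(paste("worker finished piece", piece))
  piece
}
res <- foreach(d = 1:4) %dopar% MainCalcFunction(d)
stopCluster(cl)
The last print() that makes it into Log.txt before the crash tells you which block of the function to inspect next.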
In my case, the problem was the line jj = closeAllConnections(). This line works fine in non-parallel mode but breaks the code when run in parallel. I suspect it has something to do with the function closing all connections, including the socket connections required for the parallelisation.
Finally, try running in plain R instead of RStudio.

Retrieving expected data.frame for testthat expectation

I'd like to test that a function returns the expected data.frame. The data.frame is too large to define in the R file (e.g., using something like structure()). I'm doing something wrong with the environments when I try a simple retrieval from disk, like:
test_that("SO example for data.frame retreival", {
path_expected <- "./inst/test_data/project_longitudinal/expected/default.rds"
actual <- data.frame(a=1:5, b=6:10) #saveRDS(actual, file=path_expected)
expected <- readRDS(path_expected)
expect_equal(actual, expected, label="The returned data.frame should be correct")
})
The lines execute correctly when run in the console. But when I run devtools::test(), the following error occurs when the rds/data.frame is read from a file.
1. Error: All Records -Default ----------------------------------------------------------------
cannot open the connection
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"),
warning = function(c) invokeRestart("muffleWarning"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: readRDS(path_expected) at test-read_batch_longitudinal.R:59
5: gzfile(file, "rb")
To make this work, what adjustments are necessary to the environment? If there's not an easy way, what's a good way to test large data.frames?
I suggest you check out the excellent ensurer package. You can include these functions inside the function itself (rather than as part of the testthat test set).
It will throw an error if the dataframe (or whatever object you'd like to check) doesn't fulfill your requirements, and will just return the object if it passes your tests.
The difference with testthat is that ensurer is built to check your objects at runtime, which probably circumvents the entire environment problem you are facing, since the object is tested inside the function as it runs.
See the end of this vignette for how to test the dataframe against a template that you can make as detailed as you like; a minimal sketch follows below. You'll also find many other tests you can run inside the function. It looks like this approach may be preferable to testthat in this case.
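A minimal sketch of the idea (the contract below is illustrative, not taken from the question):
library(ensurer)
# Build a reusable contract: it returns its input unchanged if every
# condition holds, and throws an informative error otherwise.
ensure_my_df <- ensures_that(is.data.frame(.),
                             identical(names(.), c("a", "b")),
                             nrow(.) == 5)
make_df <- function() {
  out <- data.frame(a = 1:5, b = 6:10)
  ensure_my_df(out)   # checked at runtime, inside the function itself
}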
Based on the comment by @Gavin Simpson, the problem didn't involve environments, but instead the file path. Changing the snippet's second line worked.
path_qualified <- base::file.path(
  devtools::inst(name="REDCapR"),
  "test_data/project_longitudinal/expected/dummy.rds"
)
The file's location is found whether I'm debugging interactively, or testthat is running (and thus whether inst is in the path or not).
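As a hedged aside (not part of the original answer), base R's system.file() offers a similar pattern: it resolves a path inside an installed package, and devtools shims it during development so files kept under inst/ are found either way:
path_qualified <- system.file(
  "test_data/project_longitudinal/expected/dummy.rds",
  package = "REDCapR"
)
expected <- readRDS(path_qualified)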

Problems with reassignInPackage

I am trying to understand the way the YourCast R package works and make it work with my data.
For example, if a function produces errors, I
1. get the source code of that function using YourCast:::bad.fn,
2. add outputs of critical values at critical stages, and
3. use reassignInPackage(name="original.fn", package="YourCast", value="my.fn").
Once I find the cause of the error, I fix it in the function and reassign it in the package.
However, for some strange reason this does not work for non-hidden functions.
For example:
install.packages("YourCast")
library(YourCast)
YourCast:::check.depvar
This will print the hidden function check.depvar. One line, if (all(ix == 1:3)), will produce an error message if any of the x is missing.
Thus, I change the whole function to the following and replace the original:
mzuba.check.depvar <- function(formula)
{
return (grepl("log[(]",as.character(formula)[2]))
}
reassignInPackage("check.depvar",
pkgName="YourCast",
mzuba.check.depvar)
rm(mzuba.check.depvar)
Now YourCast:::check.depvar will print my version of that function, and everything is fine.
However
YourCast::yourcast or YourCast:::yourcast or simply yourcast will print the non-hidden function yourcast. Suppose I want to change that function as well.
reassignInPackage(name="yourcast",
pkgName="YourCast",
value=test)
Now, YourCast::yourcast and YourCast:::yourcast will print the new, modified version but yourcast still gives the old version!
That might not be a problem if I could simply call YourCast::yourcast instead of yourcast, but that produces some kind of error that I can't trace back, because suddenly RStudio does not print error messages at all anymore, although it still does what it can:
> Uagh! do something!
> 1 + 1
[1] 2
> Why no error msg?
>
Restarting the R session solves the error-message problem, though.
So my question is: How do I reassign non-hidden functions in packages?
Furthermore (this would facilitate testing a lot): is there a way to make all hidden functions available without using the ::: operator? I.e., how do I export all functions from a package?
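For comparison, a sketch using base R's utils::assignInNamespace(); whether it plays well with YourCast specifically is an assumption, but it replaces the binding inside the namespace and, for an attached package, the copy on the search path too, so the plain name should pick up the replacement:
# Hypothetical replacement body, for illustration only
myYourcast <- function(...) {
  stop("replacement body goes here")
}
utils::assignInNamespace("yourcast", myYourcast, ns = "YourCast")
# Hidden functions can be listed and fetched without ::: as follows:
ls(getNamespace("YourCast"))
get("check.depvar", envir = getNamespace("YourCast"))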

Object not found when called within a function with a formula

I'm trying to call the checkm function from within another function that accepts a formula as a parameter. I'm getting an 'object not found' error. Here is the minimal implementation and error.
library(lrmest)
data(pcd)
form<-formula(Y~X1+X2+X3+X4)
checkm(form,data=pcd)
Wrap <- function(f){
  checkm(f, data=pcd)
}
Wrap(form)
The error is:
Error in model.frame(formula = f, data = pcd, NULL) :
object 'f' not found
Called from: eval(expr, envir, enclos)
My guess from reading around is that this has to do with my not understanding environments or promises, but given that I don't understand them, I'm probably wrong.
Any quick fixes?
One quick fix is to change the name of your formula argument. It happens to conflict with the eval(cal) call within checkm. I suspect @joran is right that this isn't your fault. This works:
library(lrmest)
data(pcd)
form<-Y~X1+X2+X3+X4
checkm(form,data=pcd)
Wrap <- function(formula){
  checkm(formula, data=pcd)
}
Wrap(form)
As @joran pointed out, there is a bug in the function, caused by not using the correct frame to evaluate the command. If you swap out checkm for lm, you'll see it runs just fine. You can create your own function that changes just that one line of code with
checkm2<-checkm
body(checkm2)[[6]]<-quote(cal <- eval(cal, parent.frame()))
And then run
library(lrmest)
data(pcd)
form<-formula(Y~X1+X2+X3+X4)
checkm2(form,data=pcd)
Wrap <- function(f){
  checkm2(f, data=pcd)
}
Wrap(form)
and everything seems to run properly. So this appears to be the fault of the people who wrote the package. You might consider contacting them to file a bug report.

tryCatch does not catch an error if called through RScript

I'm facing a strange issue in R.
Consider the following code (a really simplified version of the real code, but still exhibiting the problem):
library(timeSeries)
tryCatch({
    specificWeekDay <- 2
    currTs <- timeSeries(c(1,2), c('2012-01-01','2012-01-02'),
                         format='%Y-%m-%d', units='A')
    # just 2 dates out of range
    start <- time(currTs)[2] + 100*24*3600
    end <- time(currTs)[2] + 110*24*3600
    # this line returns an empty timeSeries
    currTs <- window(currTs, start=start, end=end)
    message("Up to now, everything is OK")
    # this is the line with the uncatchable error
    currTs[!(as.POSIXlt(time(currTs))$wday %in% specificWeekDay),] <- NA
    message("I'm after the bugged line!")
}, error=function(e){message(e)})
message("End")
When I run that code in RGui, I correctly get the following output:
Up to now, everything is OK
error in evaluating the argument 'i' in
selecting a method for function '[<-': Error in
as.POSIXlt.numeric(time(currTs)) : 'origin' must be supplied
End
Instead, when I run it through RScript (on Windows) using the following line:
RScript.exe --vanilla "myscript.R"
I get this output:
Up to now, everything is OK
Execution interrupted
It seems like RScript crashes...
Any idea about the reason?
Is this a timeSeries package bug, or am I doing something wrong?
If the latter, what's the right way to be sure to catch all the errors ?
Thanks in advance.
EDIT:
Here's a smaller example reproducing the issue that doesn't use the timeSeries package. To test it, just run it as described above:
library(methods)
# define a generic function
setGeneric("foo",
function(x, ...){standardGeneric("foo")})
# set a method for the generic function
setMethod("foo", signature("character"),
function(x) {x})
tryCatch(
{
foo("abc")
foo(notExisting)
},error=function(e)print(e))
It seems to be something related to generic method dispatch: when an argument of a method causes an error, the dispatcher cannot find the signature of the method and consequently raises an exception that tryCatch seems unable to handle when run through RScript.
Strangely, it doesn't happen with, for example, print(notExisting); in that case the exception is correctly handled.
Any idea about the reason and how to catch this kind of errors ?
Note:
I'm using R-2.14.2 on Windows 7
The issue is in the way the internal C code implementing S4 method dispatch tries to catch and handle some errors and how the non-interactive case is treated in this approach. A work-around should be in place in R-devel and R-patched soon.
Work-around now committed to R-devel and R-patched.
Information about tryCatch() [that the OP already knew and used but I didn't notice]
I think you are missing that your tryCatch() is not doing anything special with the error, hence you are raising an error in the normal fashion. In interactive use the error is thrown and handled in the usual fashion, but an error inside a script run in a non-interactive session (a la Rscript) will abort the running script.
tryCatch() is a complex function that allows you to trap and handle all sorts of events in R, not just errors. However, by default it is set up to mimic the standard R error-handling procedure; basically, allow the error to be thrown and reported by R. If you want R to do anything other than the basic behaviour, then you need to add a specific handler for the error:
> e <- simpleError("test error")
> tryCatch(foo, error = function(e) e,
+ finally = writeLines("There was a problem!"))
There was a problem!
<simpleError in doTryCatch(return(expr), name, parentenv, handler): object 'foo'
not found>
I suggest you read ?tryCatch in more detail to understand better what it does.
An alternative is to use try(). To modify your script I would just do:
# this is the line with the uncatchable error
tried <- try(currTs[!(as.POSIXlt(time(currTs))$wday %in% specificWeekDay),] <- NA,
             silent = TRUE)
if(inherits(tried, "try-error")) {
writeLines("There was an error!")
} else {
writeLines("Everything worked fine!")
}
The key bit is to save the object returned from try() so you can test the class, and to have try() operate silently. Consider the difference:
> bar <- try(foo)
Error in try(foo) : object 'foo' not found
> bar <- try(foo, silent = TRUE)
> class(bar)
[1] "try-error"
Note that in the first call above, the error is caught and reported as a message. In the second, it is not reported. In both cases an object of class "try-error" is returned.
Internally, try() is written as a single call to tryCatch(), which sets up a custom function for the error handler that reports the error as a message and builds the returned object. You might wish to study the R code for try() as another example of using tryCatch(); a rough sketch of the idea follows below.
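As a rough sketch of that idea (this is not the actual utils::try source, which also handles calls and tracebacks), a try()-like wrapper can be written as a single tryCatch() with a custom error handler:
myTry <- function(expr, silent = FALSE) {
  tryCatch(expr, error = function(e) {
    if (!silent) message("Error: ", conditionMessage(e))
    invisible(structure(conditionMessage(e),
                        class = "try-error", condition = e))
  })
}
bar <- myTry(foo, silent = TRUE)   # no object 'foo': returns a "try-error"
inherits(bar, "try-error")         # TRUE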
