Tracing back rare error - r

I'm running a code which takes very long to compute. I made my code parallel using foreach()%dopar% and run in on the cluster.
It runs generally fine but sometimes crashes and I get the following error :
Error in { : task 4 failed - "missing value where TRUE/FALSE needed"
Calls: %dopar% -> <Anonymous>
Execution halted
Now it says Execution halted but only for this particular core so the others keep running and at the end it fails to output but doesn't tell me before hand.
I guess it's a problem with an if statement. I tried simulating the code on my computer but it is so rare that I can't simulate it.
The code runs easily 100 hours doing as many as 100 000 loops and only one of them will fail.
My questions are : Can I traceback where the error was? (I run the code on a cluster so I don't have all the nice Rstudio stuff)
Also, is it possible to still output from a foreach() loop even if one of the tasks crashed?
Or perhaps any method people use so that I can make the crash happen on my computer?
I can write the code if needed, please ask if it helps.

The foreach ".errorhandling" argument is intended to help in this situation. If you want foreach to pass errors through, then use .errorhandling="pass". If you want it to filter out errors (which reduces the length of the result), then use .errorhandling="remove". The default value is "stop" which throws an error indicating which task failed.
Unfortunately, most parallel backends don't support tracebacks, but doMPI does. You simply call "startMPIcluster" with verbose=TRUE, and the traceback will be written to the log file of the worker that had the error. Here's an example that generates an error on task 42:
suppressMessages(library(doMPI))
cl <- startMPIcluster(4, verbose=TRUE)
registerDoMPI(cl)
g <- function(i) {
if (i == 42) {
if (NULL) cat('hello, world\n')
}
7
}
f <- function(i) g(i)
r <- foreach(i=1:50, .errorhandling='pass') %dopar% f(i)
print(r)
closeCluster(cl)
mpi.quit()
Since it uses .errorhandling="pass", the script runs to completion, with an error object returned in element 42 of the result list. In addition, one of the log files contains a traceback of the error (along with many other messages):
waiting for a taskchunk...
executing taskchunk 42 containing 1 tasks
error executing task: argument is of length zero
traceback (most recent call first):
> g(i)
> f(i)
> eval(expr, envir, enclos)
> eval(expr, envir)
> executeTask(taskchunk$argslist[[1]])
> executeTaskChunk(cl$workerid, taskchunk, envir, err, cores)
returning error results for taskchunk 42
Unfortunately, doMPI is mostly used on Linux systems, so this isn't helpful to most Mac and Windows users.

Related

Is it possible to not printing error messages when knitting an RMarkdown messages? [duplicate]

I am running a simulation study in R. Occassionally, my simulation study produces an error message. As I implemented my simulation study in a function, the simulation stops when this error message occurs. I know that it is bad practice to suppress errors, but at this moment to me there is no other option than to suppress the error and then go on with the next simulation until the total number of simulations I like to run. To do this, I have to suppress the error message R produces.
To do this, I tried different things:
library(base64)
suppressWarnings
suppressMessages
options(error = expression(NULL))
In the first two options, only warnings and message are suprressed, so that's no help. If I understand it correctly, in the last case, all error messages should be avoided. However, that does not help, the function still stops with an error message.
Has someone any idea why this does not work the way I expect it to work? I searched the internet for solutions, but could only find the above mentioned ways.
In the function I am running my simulation, a part of the code is analysed by the external program JAGS (Gibbs sampler) and the error message is produced by this analysis. Might this be where it goes wrong?
Note that I do not have to supress a certain/specific error message, as there are no other error messages produced, it is 'good enough' to have an option that supresses just all error messages.
Thanks for your time and help!
As suggested by the previous solution, you can use try or tryCatch functions, which will encapsulate the error (more info in Advanced R). However, they will not suppress the error reporting message to stderr by default.
This can be achieved by setting their parameters. For try, set silent=TRUE. For tryCatch set error=function(e){}.
Examples:
o <- try(1 + "a")
> Error in 1 + "a" : non-numeric argument to binary operator
o <- try(1 + "a", silent=TRUE) # no error printed
o <- tryCatch(1 + "a")
> Error in 1 + "a" : non-numeric argument to binary operator
o <- tryCatch(1 + "a", error=function(e){})
There is a big difference between suppressing a message and suppressing the response to an error. If a function cannot complete its task, it will of necessity return an error (although some functions have a command-line argument to take some other action in case of error). What you need, as Zoonekynd suggested, is to use try or trycatch to "encapsulate" the error so that your main program flow can continue even when the function fails.

R: Catch errors and continue execution after logging the stacktrace (no traceback available with tryCatch)

I have many unattended batch jobs in R running on a server and I have to analyse job failures after they have run.
I am trying to catch errors to log them and recover from the error gracefully but I am not able to get a stack trace (traceback) to log the code file name and line number of the R command that caused the error. A (stupid) reproducible example:
f <- function() {
1 + variable.not.found # stupid error
}
tryCatch( f(), error=function(e) {
# Here I would log the error message and stack trace (traceback)
print(e) # error message is no problem
traceback() # stack trace does NOT work
# Here I would handle the error and recover...
})
Running the code above produces this output:
simpleError in f(): object 'variable.not.found' not found
No traceback available
The traceback is not available and the reason is documented in the R help (?traceback):
Errors which are caught via try or tryCatch do not generate a
traceback, so what is printed is the call sequence for the last
uncaught error, and not necessarily for the last error.
In other words: Catching an error with tryCatch does kill the stack trace!
How can I
handle errors and
log the stack trace (traceback) for further examination
[optionally] without using undocumented or hidden R internal functions that are not guaranteed to work in the future?
THX a lot!
Sorry for the long answer but I wanted to summarize all knowledge and references in one answer!
Main issues to be solved
tryCatch "unrolls" the call stack to the tryCatch call so that traceback and sys.calls do no longer contain the full stack trace to identify the source code line that causes an error or warning.
tryCatch aborts the execution if you catch a warning by passing a handler function for the warning condition. If you just want to log a warning you cannot continue the execution as normal.
dump.frames writes the evaluation environments (frames) of the stack trace to allow post-mortem debugging (= examining the variable values visible within each function call) but dump.frames "forgets" to save the workspace too if you set the parameter to.file = TRUE. Therefore important objects may be missing.
Find a simple logging framework since R does not support decent logging out of the box
Enrich the stack trace with the source code lines.
Solution concept
Use withCallingHandlers instead of tryCatch to get a full stack trace pointing to the source code line that throwed an error or warning.
Catch warnings only within withCallingHandlers (not in tryCatch) since it just calls the handler functions but does not change the program flow.
Surround withCallingHandlers with tryCatch to catch and handle errors as wanted.
Use dump.frames with the parameter to.file = FALSE to write the dump into global variable named last.dump and save it into a file together with the global environment by calling save.image.
Use a logging framework, e. g. the package futile.logger.
R does track source code references when you set options(keep.source = TRUE). You can add this option to your .Rprofile file or use a startup R script that sets this option and source your actual R script then.
To enrich the stack trace with the tracked source code lines you can use the undocumented (but widely used) function limitedLabels.
To filter out R internal function calls from stack trace you can remove all calls that have no source code line reference.
Implementation
Code template
Instead of using tryCatch you should use this code snippet:
library(futile.logger)
tryCatch(
withCallingHandlers(<expression>,
error = function(e) {
call.stack <- sys.calls() # is like a traceback within "withCallingHandlers"
dump.frames()
save.image(file = "last.dump.rda")
flog.error(paste(e$message, limitedLabels(call.stack), sep = "\n"))
}
warning = <similar to error above>
}
error = <catch errors and recover as you would do it normally>
# warning = <...> # never do this here since it stops the normal execution like an error!
finally = <your clean-up code goes here>
}
Reusable implementation via a package (tryCatchLog)
I have implemented a simple package with all the concepts mentioned above.
It provides a function tryCatchLog using the futile.logger package.
Usage:
library(tryCatchLog) # or source("R/tryCatchLog.R")
tryCatchLog(<expression>,
error = function(e) {
<your error handler>
})
You can find the free source code at github:
https://github.com/aryoda/tryCatchLog
You could also source the tryCatchLog function instead of using a full blown package.
Example (demo)
See the demo file that provides a lot of comments to explain how it works.
References
Other tryCatch replacements
Logging of warnings and errors with with a feature to perform multiple attempts (retries) at try catch, e. g. for accessing an unreliable network drive:
Handling errors before warnings in tryCatch
withJavaLogging function without any dependencies to other packages which also enriches the source code references to the call stack using limitedLabels:
Printing stack trace and continuing after error occurs in R
Other helpful links
http://adv-r.had.co.nz/Exceptions-Debugging.html
A Warning About warning() - avoid R's warning feature
In R, why does withCallingHandlers still stops execution?
How to continue function when error is thrown in withCallingHandlers in R
Can you make R print more detailed error messages?
How can I access the name of the function generating an error or warning?
How do I save warnings and errors as output from a function?
options(error=dump.frames) vs. options(error=utils::recover)
General suggestions for debugging in R
Suppress warnings using tryCatch in R
R Logging display name of the script
Background information about the "srcrefs" attribute (Duncan Murdoch)
get stack trace on tryCatch'ed error in R
The traceback function can be used to print/save the current stack trace, but you have to specify an integer argument, which is the number of stack frames to omit from the top (can be 0). This can be done inside a tryCatch block or anywhere else. Say this is the content of file t.r:
f <- function() {
x <- 1
g()
}
g <- function() {
traceback(0)
}
When you source this file into R and run f, you get the stack trace:
3: traceback(0) at t.r#7
2: g() at t.r#3
1: f()
which has file name and line number information for each entry. You will get several stack frames originating from the implementation of tryCatch and you can't skip them by specifying a non-zero argument to traceback, yet indeed this will break in case the implementation of tryCatch changes.
The file name and line number information (source references) will only be available for code that has been parsed to keep source references (by default the source'd code, but not packages). The stack trace will always have call expressions.
The stack trace is printed by traceback (no need to call print on it).
For logging general errors, it is sometimes useful to use options(error=), one then does not need to modify the code that causes the errors.

parallel r, multiple errors

I am running a foreach loop and am not sure how to interpret some strange behaviour. The functions are tested individually, the loop roughly looks like this:
cl<-makeCluster(2, outfile = "")
registerDoParallel(cl)
outputs <- foreach(k = 1:x, .packages = "various packages") %dopar% {
do things #contains a for-loop
return(stuff for output)
}
stopCluster(c1)
I get these errors
Error in unserialize(socklist[[n]]) : error reading from connection
In addition: Warning message:
closing unused connection 6 (<-computer:11531)
and
Error in serialize(data, node$con) : error writing to connection
The red dot in rstudio disappears and the console cursor is active again, suggesting it has quit the program.
Now, I know these are pretty non-specific errors, but the really weird thing is that the cluster still runs after the errors. The processes spawn, they take up the expected amount of CPU and memory for as long as I would expect the job to take, but I get no return data.
I thought maybe the weird behaviour might be useful to pin down the problem.

Results of workers not returned properly - snow - debug

I'm using the snow package in R to execute a function on a SOCK cluster with multiple machines(3) running on Linux OS. I tried to run the code with both parLapply and clusterApply.
In case of any error at the worker level, the results of the worker nodes are not returned properly to master making it very hard to debug. I'm currently logging every heartbeat of the worker nodes independently using futile.logger. It seems as if the results are properly computed. But when I tried to print the result at the master node (After receiving the output from workers) I get an error which says, Error in checkForRemoteErrors(val): 8 nodes produced errors; first error: missing value where TRUE/FALSE needed.
Is there any way to debug the results of the workers more deeply?
The checkForRemoteErrors function is called by parLapply and clusterApply to check for task errors, and it will throw an error if any of the tasks failed. Unfortunately, although it displays the error message, it doesn't provide any information about what worker code caused the error. But if you modify your worker/task function to catch errors, you can retain some extra information that may be helpful in determining where the error occurred.
For example, here's a simple snow program that fails. Note that it uses outfile='' when creating the cluster so that output from the program is displayed, which by itself is a very useful debugging technique:
library(snow)
cl <- makeSOCKcluster(2, outfile='')
problem <- function(i) {
if (NA)
j <- 999
else
j <- i
2 * j
}
r <- parLapply(cl, 1:2, problem)
When you execute this, you see the error message from checkForRemoteErrors and some other messages, but nothing that tells you that the if statement caused the error. To catch errors when calling problem, we define workerfun:
workerfun <- function(i) {
tryCatch({
problem(i)
},
error=function(e) {
print(e)
stop(e)
})
}
Now we execute workerfun with parLapply instead of problem, first exporting problem to the workers:
clusterExport(cl, c('problem'))
r <- parLapply(cl, 1:2, workerfun)
Among the other messages, we now see
<simpleError in if (NA) j <- 999 else j <- i: missing value where TRUE/FALSE needed>
which includes the actual if statement that generated the error. Of course, it doesn't tell you the file name and line number of the expression, but it's often enough to let you solve the problem.
check the range of your observations. how the observation varies. I have noticed that when there are lots of decimal places 4, 5,6 , it throws glm.nb off. To solve this i just round the observations to 2 decimal places.

tryCatch does not catch an error if called though RScript

I'm facing a strange issue in R.
Consider the following code (a really simplified version of the real code but still having the problem) :
library(timeSeries)
tryCatch(
{
specificWeekDay <- 2
currTs <- timeSeries(c(1,2),c('2012-01-01','2012-01-02'),
format='%Y-%m-%d',units='A')
# just 2 dates out of range
start <- time(currTs)[2]+100*24*3600
end <- time(currTs)[2]+110*24*3600
# this line returns an empty timeSeries
currTs <- window(currTs,start=start,end=end)
message("Up to now, everything is OK")
# this is the line with the uncatchable error
currTs[!(as.POSIXlt(time(currTs))$wday %in% specificWeekDay),] <- NA
message("I'm after the bugged line !")
},error=function(e){message(e)})
message("End")
When I run that code in RGui, I correctly get the following output:
Up to now, everything is OK
error in evaluating the argument 'i' in
selecting a method for function '[<-': Error in
as.POSIXlt.numeric(time(currTs)) : 'origin' must be supplied
End
Instead, when I run it through RScript (in windows) using the following line:
RScript.exe --vanilla "myscript.R"
I get this output:
Up to now, everything is OK
Execution interrupted
It seems like RScript crashes...
Any idea about the reason?
Is this a timeSeries package bug, or I'm doing something wrong ?
If the latter, what's the right way to be sure to catch all the errors ?
Thanks in advance.
EDIT :
Here's a smaller example reproducing the issue that doesn't use timeSeries package. To test it, just run it as described above:
library(methods)
# define a generic function
setGeneric("foo",
function(x, ...){standardGeneric("foo")})
# set a method for the generic function
setMethod("foo", signature("character"),
function(x) {x})
tryCatch(
{
foo("abc")
foo(notExisting)
},error=function(e)print(e))
It seems something related to generic method dispatching; when an argument of a method causes an error, the dispatcher cannot find the signature of the method and conseguently raises an exception that tryCatch function seems unable to handle when run through RScript.
Strangely, it doesn't happen for example with print(notExisting); in that case the exception is correctly handled.
Any idea about the reason and how to catch this kind of errors ?
Note:
I'm using R-2.14.2 on Windows 7
The issue is in the way the internal C code implementing S4 method dispatch tries to catch and handle some errors and how the non-interactive case is treated in this approach. A work-around should be in place in R-devel and R-patched soon.
Work-around now committed to R-devel and R-patched.
Information about tryCatch() [that the OP already knew and used but I didn't notice]
I think you are missing that your tryCatch() is not doing anything special with the error, hence you are raising an error in the normal fashion. In interactive use the error is thrown and handled in the usual fashion, but an error inside a script run in a non-interactive session (a la Rscript) will abort the running script.
tryCatch() is a complex function that allows the potential to trap and handle all sorts of events in R, not just errors. However by default it is set up to mimic the standard R error handling procedure; basically allow the error to be thrown and reported by R. If you want R to do anything other than the basic behaviour then you need to add a specific handler for the error:
> e <- simpleError("test error")
> tryCatch(foo, error = function(e) e,
+ finally = writeLines("There was a problem!"))
There was a problem!
<simpleError in doTryCatch(return(expr), name, parentenv, handler): object 'foo'
not found>
I suggest you read ?tryCatch in more detail to understand better what it does.
An alternative is to use try(). To modify your script I would just do:
# this is the line with the uncatchable error
tried <- try(currTs[!(as.POSIXlt(time(currTs))$wday %in% specificWeekDay),] <- NA,
silent = TRUE)
if(inherits(tried, "try-error")) {
writeLines("There was an error!")
} else {
writeLines("Everything worked fine!")
}
The key bit is to save the object returned from try() so you can test the class, and to have try() operate silently. Consider the difference:
> bar <- try(foo)
Error in try(foo) : object 'foo' not found
> bar <- try(foo, silent = TRUE)
> class(bar)
[1] "try-error"
Note that in the first call above, the error is caught and reported as a message. In the second, it is not reported. In both cases an object of class "try-error" is returned.
Internally, try() is written as a single call to tryCatch() which sets up a custom function for the error handler which reports the error as a message and sets up the returned object. You might wish to study the R code for try() as another example of using tryCatch().

Resources