R, tryCatch error - r

I am parsing a lot of websites, and I wrote a script that loops through thousands of links read from a separate file. Sometimes R cannot load a link and the script stops in the middle of the loop, leaving many of the remaining URLs unparsed. So I tried to use tryCatch so that the script ignores such a case and moves on to the next URL. However, I recently found that tryCatch generates the error below.
gethelp.url = 'http://forums.autodesk.com/t5/Vault-General/bd-p/101'
gethelp.df =tryCatch(htmlTreeParse(gethelp.url, useInternalNodes = T), error = function() next)
Error in value[[3L]](cond) : unused argument (cond)
Calls: withRestarts ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted
The confusing thing is that sometimes it works well and sometimes it throws this error message, even though the same script parses the same URLs.
Can anyone give me guidance on how to interpret this error message? I read the documentation but couldn't find much insight.

I think your function has to have cond as an argument – at least that's how I've used tryCatch() in the past, and your error message seems to indicate it as the problem.
Try the following:
gethelp.df =tryCatch(htmlTreeParse(gethelp.url, useInternalNodes = T), error = function(cond) next)
Note that the above line will still throw an error, b/c the example code is not in a loop. So I just replaced next with NA, and it worked fine.
Edit: In response to OP's comment, I suggest trying the following:
gethelp.df =tryCatch(htmlTreeParse(gethelp.url, useInternalNodes = T), error = function(cond)"skip")
if (identical(gethelp.df, "skip")) next  # identical() is also safe when gethelp.df is a parsed document
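The "unused argument (cond)" message can be reproduced without any network access. The following is a minimal sketch in base R; the names bad and good are only for illustration and do not come from the question.

```r
# tryCatch always calls an error handler with one argument: the condition object.

# A handler declared without parameters cannot accept it, so calling the
# handler itself fails. The outer tryCatch is here only to capture that failure.
bad <- tryCatch(
  tryCatch(stop("boom"), error = function() "never returned"),
  error = function(cond) conditionMessage(cond)
)

# A handler with one parameter works, whether or not it uses the argument.
good <- tryCatch(stop("boom"), error = function(cond) "handled")

print(bad)   # the message complains about an unused argument
print(good)
```

This would also account for the script failing only some of the time: the handler is called only when the parse actually fails, so the faulty handler goes unnoticed as long as every URL loads.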

Related

R: Catch errors and continue execution after logging the stacktrace (no traceback available with tryCatch)

I have many unattended batch jobs in R running on a server and I have to analyse job failures after they have run.
I am trying to catch errors to log them and recover from the error gracefully but I am not able to get a stack trace (traceback) to log the code file name and line number of the R command that caused the error. A (stupid) reproducible example:
f <- function() {
1 + variable.not.found # stupid error
}
tryCatch( f(), error=function(e) {
# Here I would log the error message and stack trace (traceback)
print(e) # error message is no problem
traceback() # stack trace does NOT work
# Here I would handle the error and recover...
})
Running the code above produces this output:
simpleError in f(): object 'variable.not.found' not found
No traceback available
The traceback is not available and the reason is documented in the R help (?traceback):
Errors which are caught via try or tryCatch do not generate a
traceback, so what is printed is the call sequence for the last
uncaught error, and not necessarily for the last error.
In other words: Catching an error with tryCatch does kill the stack trace!
How can I
handle errors and
log the stack trace (traceback) for further examination
[optionally] without using undocumented or hidden R internal functions that are not guaranteed to work in the future?
THX a lot!
Sorry for the long answer but I wanted to summarize all knowledge and references in one answer!
Main issues to be solved
tryCatch "unrolls" the call stack to the tryCatch call so that traceback and sys.calls no longer contain the full stack trace needed to identify the source code line that causes an error or warning.
tryCatch aborts the execution if you catch a warning by passing a handler function for the warning condition. If you just want to log a warning you cannot continue the execution as normal.
dump.frames writes the evaluation environments (frames) of the stack trace to allow post-mortem debugging (= examining the variable values visible within each function call) but dump.frames "forgets" to save the workspace too if you set the parameter to.file = TRUE. Therefore important objects may be missing.
Find a simple logging framework since R does not support decent logging out of the box
Enrich the stack trace with the source code lines.
Solution concept
Use withCallingHandlers instead of tryCatch to get a full stack trace pointing to the source code line that threw an error or warning.
Catch warnings only within withCallingHandlers (not in tryCatch) since it just calls the handler functions but does not change the program flow.
Surround withCallingHandlers with tryCatch to catch and handle errors as wanted.
Use dump.frames with the parameter to.file = FALSE to write the dump into a global variable named last.dump, and save it to a file together with the global environment by calling save.image.
Use a logging framework, e. g. the package futile.logger.
R does track source code references when you set options(keep.source = TRUE). You can add this option to your .Rprofile file, or use a startup R script that sets this option and then sources your actual R script.
To enrich the stack trace with the tracked source code lines you can use the undocumented (but widely used) function limitedLabels.
To filter out R internal function calls from stack trace you can remove all calls that have no source code line reference.
Implementation
Code template
Instead of using tryCatch you should use this code snippet:
library(futile.logger)
tryCatch(
  withCallingHandlers(<expression>,
    error = function(e) {
      call.stack <- sys.calls() # is like a traceback within "withCallingHandlers"
      dump.frames()
      save.image(file = "last.dump.rda")
      flog.error(paste(e$message, limitedLabels(call.stack), sep = "\n"))
    },
    warning = <similar to error above>
  ),
  error = <catch errors and recover as you would do it normally>,
  # warning = <...> # never do this here since it stops the normal execution like an error!
  finally = <your clean-up code goes here>
)
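To make the idea concrete, here is a small runnable sketch of the same pattern using base R only. It assumes the failing function f from the question; the logged vector is a hypothetical stand-in for a real logger such as futile.logger, and dump.frames and save.image are left out to keep it short.

```r
f <- function() {
  1 + variable.not.found   # same deliberate error as in the question
}

logged <- character(0)     # hypothetical in-memory log, stands in for a logger

result <- tryCatch(
  withCallingHandlers(
    f(),
    error = function(e) {
      # The calling handler runs before the stack is unwound, so
      # sys.calls() still contains the call to f().
      stack <- utils::limitedLabels(sys.calls())
      logged <<- c(conditionMessage(e), stack)
    }
  ),
  # The exiting handler runs afterwards and decides how to recover.
  error = function(e) "recovered"
)

cat(logged, sep = "\n")
print(result)
```

The logged stack also contains the frames of tryCatch and withCallingHandlers themselves; filtering them out is a separate step that this sketch does not attempt.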
Reusable implementation via a package (tryCatchLog)
I have implemented a simple package with all the concepts mentioned above.
It provides a function tryCatchLog using the futile.logger package.
Usage:
library(tryCatchLog) # or source("R/tryCatchLog.R")
tryCatchLog(<expression>,
error = function(e) {
<your error handler>
})
You can find the free source code at github:
https://github.com/aryoda/tryCatchLog
You could also source the tryCatchLog function instead of using a full blown package.
Example (demo)
See the demo file that provides a lot of comments to explain how it works.
References
Other tryCatch replacements
Logging of warnings and errors with a feature to perform multiple attempts (retries) at try catch, e. g. for accessing an unreliable network drive:
Handling errors before warnings in tryCatch
withJavaLogging function without any dependencies to other packages which also enriches the source code references to the call stack using limitedLabels:
Printing stack trace and continuing after error occurs in R
Other helpful links
http://adv-r.had.co.nz/Exceptions-Debugging.html
A Warning About warning() - avoid R's warning feature
In R, why does withCallingHandlers still stops execution?
How to continue function when error is thrown in withCallingHandlers in R
Can you make R print more detailed error messages?
How can I access the name of the function generating an error or warning?
How do I save warnings and errors as output from a function?
options(error=dump.frames) vs. options(error=utils::recover)
General suggestions for debugging in R
Suppress warnings using tryCatch in R
R Logging display name of the script
Background information about the "srcrefs" attribute (Duncan Murdoch)
get stack trace on tryCatch'ed error in R
The traceback function can be used to print/save the current stack trace, but you have to specify an integer argument, which is the number of stack frames to omit from the top (can be 0). This can be done inside a tryCatch block or anywhere else. Say this is the content of file t.r:
f <- function() {
x <- 1
g()
}
g <- function() {
traceback(0)
}
When you source this file into R and run f, you get the stack trace:
3: traceback(0) at t.r#7
2: g() at t.r#3
1: f()
which has file name and line number information for each entry. If you call traceback from inside tryCatch, you will also get several stack frames originating from the implementation of tryCatch. You can skip them by specifying a non-zero argument to traceback, but this will break if the implementation of tryCatch changes.
The file name and line number information (source references) will only be available for code that has been parsed to keep source references (by default the source'd code, but not packages). The stack trace will always have call expressions.
The stack trace is printed by traceback (no need to call print on it).
For logging general errors, it is sometimes useful to use options(error=), one then does not need to modify the code that causes the errors.

R Log warnings and continue execution

I have a block of R code that is wrapped in a tryCatch statement. Any of the lines in that block can potentially throw a warning or an error. When caught, I have handlers for both warnings and errors, which perform logging in both cases, and exit handling in the error case.
But in the warning case, I just want the warning to be logged and execution to continue as normal. At the moment, when a warning is caught, it is logged, but execution also stops. Is there an easy way to allow for this functionality?
Not sure if it's the most idiomatic solution, but using a combination of tryCatch and withCallingHandlers works for me in an almost identical situation.
I wrap the call to my function with withCallingHandlers, providing a function to handle warnings; execution of the function will continue afterwards. I wrap all of that in tryCatch, providing a function to handle errors.
tryCatch(
withCallingHandlers(doSomething(), warning = function(w) logWarning(w)),
error = function(e) logError(e)
)
Thanks to nicola in the comments for the withCallingHandlers tip.
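A self-contained version of this pattern might look like the sketch below. The doSomething, logWarning and logError functions are hypothetical stand-ins that append to a character vector, and the call to invokeRestart("muffleWarning") is an optional addition that keeps the warning from also being reported by R.

```r
log_lines <- character(0)   # hypothetical log target

logWarning <- function(w) log_lines <<- c(log_lines, paste("WARN:", conditionMessage(w)))
logError   <- function(e) log_lines <<- c(log_lines, paste("ERROR:", conditionMessage(e)))

doSomething <- function() {
  warning("first problem")
  "finished"   # still reached, because the calling handler does not unwind the stack
}

result <- tryCatch(
  withCallingHandlers(
    doSomething(),
    warning = function(w) {
      logWarning(w)
      invokeRestart("muffleWarning")   # resume right after the warning() call
    }
  ),
  error = function(e) {
    logError(e)
    NULL
  }
)

print(result)
print(log_lines)
```

If doSomething called stop() instead, the error handler of tryCatch would log it and result would be NULL.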

Avoiding "Could not resolve host" error to stop the program running in R

I use the getURL function from the RCurl package in R to read content from a list of links.
When trying to fetch a broken link in the list I get the error "Error in function (type, msg, asError = TRUE) : Could not resolve host:" and the program stops running.
I used try to keep the program from stopping, but it doesn't work.
try(getURL(URL, ssl.verifypeer = FALSE, useragent = "R"))
Any hint on how I can keep the program from stopping when it tries to fetch a broken link?
You need to be doing some type of error handling. I would argue tryCatch is actually better for your situation.
I'm assuming you are inside a loop over the links, then you can check the response from your try/tryCatch to see if an error was thrown, and if so just move to the following iteration in your loop using next.
status <- tryCatch(
getURL(URL, ssl.verifypeer=FALSE, useragent="R"),
error = function(e) e
)
if(inherits(status, "error")) next
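Put into a complete loop, the check could look like this sketch. To keep it runnable offline, fetch is a hypothetical stand-in for getURL that fails for one made-up host; the links, pages and failed names are also only illustrative.

```r
# Hypothetical stand-in for RCurl::getURL(): errors for an unresolvable host.
fetch <- function(url) {
  if (grepl("nohost", url, fixed = TRUE)) stop("Could not resolve host: ", url)
  paste0("<html>", url, "</html>")
}

links  <- c("http://example.com", "http://nohost.invalid", "http://example.org")
pages  <- character(0)
failed <- character(0)

for (URL in links) {
  # The handler returns the condition object, so the loop can inspect it.
  status <- tryCatch(fetch(URL), error = function(e) e)
  if (inherits(status, "error")) {
    failed <- c(failed, URL)   # remember the broken link, then move on
    next
  }
  pages[URL] <- status
}

print(names(pages))
print(failed)
```

Keeping the list of failed links makes it possible to retry or report them after the loop has finished.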

Knitting returns parse error

In attempting to knit a PDF. I'm calling a script that should return two ggplots by calling the chunk:
```{r, echo=FALSE}
read_chunk('Script.R')
```r
But receive the error
processing file: Preview-24a46368403c.Rmd
Quitting from lines 9-12 (Preview-24a46368403c.Rmd) Error in
parse(text = x, srcfile = src) : attempt to use zero-length
variable name Calls: <Anonymous> ... <Anonymous> -> parse_all ->
parse_all.character -> parse Execution halted
The script on its own runs and returns the two plots, but won't return them when knitted.
Similarly attempted to use source()
But got a similar error
Quitting from lines 7-10 (Preview-24a459ca4c1.Rmd) Error in
file(filename, "r", encoding = encoding) : cannot open the
connection Calls: <Anonymous> ... withCallingHandlers -> withVisible
-> eval -> eval -> source -> file Execution halted
While this does not appear to be a solution for you, this exact same error message appears if the chunk is not ended properly.
I experienced this error and traced it to ending chunk with `` instead of ```. Correcting the syntax of the chunk solved the problem I experienced with the same error message as you.
Are you sure that knitr is running from the directory you think it is? It appears that it is failing to find the file.
use an absolute path, if that fixes it, you've found your problem
once you've done that, you can use opts_knit$set(root.dir = "...") -- don't use setwd(.) if you want it (the cwd) to be maintained.
Knitr's default is the directory of the .Rmd file itself.
It may have to do with the "r" at the end of the triple backquotes demarcating your code chunk. There should be nothing after the triple backquotes, but I think the problem is specifically that the letter is "r".
The issue stems from the fact that R markdown processes backquoted statements starting with r as inline code, meaning it actually runs whatever is between the backquotes.
I had similar issues writing a problem set in an Rmd with this statement, which had backquoted text intended to be monospace but not run as inline code:
Use sapply or map to calculate the probability of a failure rate over `r <- seq(.05, .5, .025)`.
When I knit the document, I got opaque error messages saying I had an improper assignment using <-. It was because instead of just displaying the backquoted statement in monospace, `r <- seq(.05, .5, .025)` was actually processed as R inline code of <- seq(.05, .5, .025), thus the improper assignment error. I fixed my error by changing the variable name from r to rate.
The actual text of the error message in your question might refer to whatever follows your code chunk, as the knitting process is probably trying to run that as code. In this case, just removing that stray r at the end of the code chunk should fix the error.
You should use the following similar syntax, I had the same exact issue but got it fixed:
```{r views}
bank.df <- read.csv("C:/Users/User/Desktop/Banks.csv", header = TRUE) #load data
dim(bank.df) # to find dimension of data frame
head(bank.df) # show first six rows
```
the ``` has to be in the end of the line.
In my case, the problem was that I ended the chunk with four backticks, not three. Check this, and if you also ended with four, delete one of them.

tryCatch does not catch an error if called through RScript

I'm facing a strange issue in R.
Consider the following code (a really simplified version of the real code but still having the problem) :
library(timeSeries)
tryCatch(
{
specificWeekDay <- 2
currTs <- timeSeries(c(1,2),c('2012-01-01','2012-01-02'),
format='%Y-%m-%d',units='A')
# just 2 dates out of range
start <- time(currTs)[2]+100*24*3600
end <- time(currTs)[2]+110*24*3600
# this line returns an empty timeSeries
currTs <- window(currTs,start=start,end=end)
message("Up to now, everything is OK")
# this is the line with the uncatchable error
currTs[!(as.POSIXlt(time(currTs))$wday %in% specificWeekDay),] <- NA
message("I'm after the bugged line !")
},error=function(e){message(e)})
message("End")
When I run that code in RGui, I correctly get the following output:
Up to now, everything is OK
error in evaluating the argument 'i' in
selecting a method for function '[<-': Error in
as.POSIXlt.numeric(time(currTs)) : 'origin' must be supplied
End
Instead, when I run it through RScript (in windows) using the following line:
RScript.exe --vanilla "myscript.R"
I get this output:
Up to now, everything is OK
Execution interrupted
It seems like RScript crashes...
Any idea about the reason?
Is this a timeSeries package bug, or I'm doing something wrong ?
If the latter, what's the right way to be sure to catch all the errors ?
Thanks in advance.
EDIT :
Here's a smaller example reproducing the issue that doesn't use timeSeries package. To test it, just run it as described above:
library(methods)
# define a generic function
setGeneric("foo",
function(x, ...){standardGeneric("foo")})
# set a method for the generic function
setMethod("foo", signature("character"),
function(x) {x})
tryCatch(
{
foo("abc")
foo(notExisting)
},error=function(e)print(e))
It seems to be something related to generic method dispatching; when an argument of a method causes an error, the dispatcher cannot find the signature of the method and consequently raises an exception that the tryCatch function seems unable to handle when run through RScript.
Strangely, it doesn't happen for example with print(notExisting); in that case the exception is correctly handled.
Any idea about the reason and how to catch this kind of errors ?
Note:
I'm using R-2.14.2 on Windows 7
The issue is in the way the internal C code implementing S4 method dispatch tries to catch and handle some errors and how the non-interactive case is treated in this approach. A work-around should be in place in R-devel and R-patched soon.
Work-around now committed to R-devel and R-patched.
Information about tryCatch() [that the OP already knew and used but I didn't notice]
I think you are missing that your tryCatch() is not doing anything special with the error, hence you are raising an error in the normal fashion. In interactive use the error is thrown and handled in the usual fashion, but an error inside a script run in a non-interactive session (a la Rscript) will abort the running script.
tryCatch() is a complex function that allows the potential to trap and handle all sorts of events in R, not just errors. However by default it is set up to mimic the standard R error handling procedure; basically allow the error to be thrown and reported by R. If you want R to do anything other than the basic behaviour then you need to add a specific handler for the error:
> e <- simpleError("test error")
> tryCatch(foo, error = function(e) e,
+ finally = writeLines("There was a problem!"))
There was a problem!
<simpleError in doTryCatch(return(expr), name, parentenv, handler): object 'foo'
not found>
I suggest you read ?tryCatch in more detail to understand better what it does.
An alternative is to use try(). To modify your script I would just do:
# this is the line with the uncatchable error
tried <- try(currTs[!(as.POSIXlt(time(currTs))$wday %in% specificWeekDay),] <- NA,
silent = TRUE)
if(inherits(tried, "try-error")) {
writeLines("There was an error!")
} else {
writeLines("Everything worked fine!")
}
The key bit is to save the object returned from try() so you can test the class, and to have try() operate silently. Consider the difference:
> bar <- try(foo)
Error in try(foo) : object 'foo' not found
> bar <- try(foo, silent = TRUE)
> class(bar)
[1] "try-error"
Note that in the first call above, the error is caught and reported as a message. In the second, it is not reported. In both cases an object of class "try-error" is returned.
Internally, try() is written as a single call to tryCatch() which sets up a custom function for the error handler which reports the error as a message and sets up the returned object. You might wish to study the R code for try() as another example of using tryCatch().
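As a rough illustration of that relationship, a stripped-down try-like wrapper can be written in a few lines. This is only a sketch: my_try is a hypothetical name, and the real try() handles more details (for example the call in the message and the maximum message length).

```r
my_try <- function(expr, silent = FALSE) {
  # expr is a promise; it is evaluated here, inside tryCatch.
  tryCatch(expr, error = function(e) {
    msg <- paste0("Error : ", conditionMessage(e), "\n")
    if (!silent) cat(msg, file = stderr())   # report unless asked to stay quiet
    # Return a marker object that the caller can test with inherits().
    invisible(structure(msg, class = "try-error", condition = e))
  })
}

bar <- my_try(stop("oops"), silent = TRUE)
print(class(bar))
print(inherits(bar, "try-error"))

ok <- my_try(1 + 1)   # no error: the value of the expression is returned
print(ok)
```

As with try(), the important part is to keep the returned object and test its class before continuing.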

Resources