Handling of condition objects in R

Handling of condition objects in R - r

What exactly happens in
tryCatch(warning("W"),warning = function(c) cat(c$message))
As far as I understand this the call warning("W") creates a condition object c of type warning and then someone (who?) looks for a handler (where? and recognizes the handler how?) and calls the handler on c.
The task of tryCatch is only to "establish" the handlers. What does this mean? In short the question is: how exactly does the condition object get to its handler?

In my experience, tryCatch is used to catch and either ignore or specifically handle errors and warnings. While you can do warnings with this function, I see it more often used with withCallingHandlers (and invokeRestart("muffleWarning")).
The Advanced R book is a good reference for many topics; for this, I'd recommend two chapters:
Exceptions and debugging, where Hadley highlights one key difference:
withCallingHandlers() is a variant of tryCatch() that establishes local handlers, whereas tryCatch() registers exiting handlers. Local handlers are called in the same context as where the condition is signalled, without interrupting the execution of the function. When a exiting handler from tryCatch() is called, the execution of the function is interrupted and the handler is called. withCallingHandlers() is rarely needed, but is useful to be aware of.
Bold emphasis is mine, to highlight that tryCatch interrupts, withCallingHandlers does not. This means that when a warning is raised, nothing else is executed:
tryCatch({warning("foo");message("bar");}, warning=function(w) conditionMessage(w))
# [1] "foo"
tryCatch({warning("foo");message("bar");})
# Warning in tryCatchList(expr, classes, parentenv, handlers) : foo
# bar
Further details/examples are in the Beyond exception handling chapter.
tryCatch executes the expression in a context where anything "raised" is optionally caught and potentially altered, ignored, logged, etc. A common method I see is to accept any error and return a known entity (now without an error), such as
ret <- tryCatch(
something_that_fails(),
error = function(e) FALSE)
Unlike other operating systems that allow precise filters on what to handle (e.g., python and its try: ... except ValueError: syntax, ref: https://docs.python.org/3/tutorial/errors.html#handling-exceptions), R is a bit coarser in that it catches all errors you can get to figure out what kind of error it is.
If you look at the source of tryCatch and trace around its use of its self-defined functions, you'll see that ultimately it calls an R function .addCondHands that includes the list of handlers, matching on the names of the handlers (typically error and warning, though perhaps others might be used?). (The source for that function can be found here, though I don't find it very helpful in this context).
I don't know exactly how to answer "how exactly ... get to its handler", other than an exception is thrown, it is caught, and the handler is executed with the error object as an argument.

Related

Change in error handling of PyEval_EvalCode between CPython 3.6 and CPython 3.7?

Does anyone know if there is a change in how errors with PyEval_EvalCode work in CPython 3.7, compared to CPython 3.6?
Specifically, I am asking because the way errors are handled seems different now.
We have some python code as a PyObject (let's call it Super_py_code), which, in turn, after leaving Python context, might call different Python code (in a new Python context) in the form of a PyObject (let's call it Sub_py_code), and there's an error (syntax error or whatever) at a sub-function, like this:
PyEval_EvalCode(Super_py_code, ...)
-> calling, leaving the context now ->
PyEval_EvalCode(Sub_py_code, ...) * <- Error here
Up until Python 3.6, I used to get an error indicator (return value NULL, and PyErr_Occurred() / PyErr_Fetch() return useful error information) at both calls to PyEval_EvalCode, which allowed us to string the stack traces together.
With Python 3.7 (and also 3.8), however, I ONLY get an error indicator at the lower PyEval_EvalCode(Sub_py_code, ...), but I get no longer any error information at the calling function PyEval_EvalCode(Super_py_code, ...), so, the stack trace is effectively cut above Sub_py_code.
I couldn't find any information in the Python-documentation for PyEval_EvalCode that the error handling has changed, however, so...
Is this an error in Python, or am I missing something?

How do I directly make an val binding to an option value in SML?

val SOME i = Int.fromString e
I have a line like this on my code and smlnj shows me this warning
vm.sml:84.7-84.32 Warning: binding not exhaustive
SOME i = ...
Is this bad practice? Should I use a function to handle the option or am I missing something?

If you're just working on a small script you'll run once, it's not necessarily bad practice: If Int.fromString e fails (and returns NONE instead of SOME _), then the value binding will fail and an exception will be raised to the appropriate handler (or the program will exit, if ther is no handler). To disable this warning, you can run the top-level statement (for SML-NJ 110.96): Control.MC.bindNonExhaustiveWarn := false;.
As an alternative approach, you could throw a custom exception:
val i =
case Int.fromString e
of SOME i => i
| NONE => raise Fail ("Expected string value to be parseable as an int; got: " ^ e)
The exception message should be written in a way that's appropriate to the provenance of the e value. (If e comes from command-line input, the program should tell the user that a number was expected there; if e comes from a file, the program should tell the user which file is formatted incorrectly and where the formatting error was found.)
As yet another alternative: If your program is meant to be long-running and builds up a lot of state, it wouldn't be very user-friendly if the program crashed as soon as the user entered an ill-formed string on the command line. (The user would be quite sad in this case, as all the state they built up in the program would have been lost.) In this case, you could repeatedly read from stdin until the user types in input that can be parsed as an int. This is incidentally more-or-less what the SML/NJ REPL does: instead of something like val SOME parsedProgram = SMLofNJ.parse (getUserInput ()), it would want to do something like:
fun getNextParsedProgram () =
case SMLofNJ.parse (getUserInput ())
of NONE => (print "ERROR: failed to parse\n"; getNextParsedProgram ())
| SOME parsedProgram => parsedProgram
In summary,
For a short-lived script or a script you don't intend on running often, turning off the warning is a fine option.
For a program where it's unexpected that e would be an unparseable string, you could raise a custom exception that explains what went wrong and how the user can fix it.
For longer-lived programs where better error handling is desired, you should respect the NONE case by pattern-matching on the result of fromString, which forces you to come up with some sort of error-handling behavior.

How to systematically populate a whitelist for a sandboxing program?

On pp. 260-263 of Programming in Lua (4th ed.), the author discusses how to implement "sandboxing" (i.e. the running of untrusted code) in Lua.
When it comes to imposing limiting the functions that untrusted code can run, he recommends a "whitelist approach":
We should never think in terms of what functions to remove, but what functions to add.
This question is about tools and techniques for putting this suggestion into practice. (I expect there will be confusion on this point I want to emphasize it upfront.)
The author gives the following code as an illustration of a sandbox program based on a whitelist of allowed functions. (I have added or moved around some comments, and removed some blank lines, but I've copied the executable content verbatim from the book).
-- From p. 263 of *Programming in Lua* (4th ed.)
-- Listing 25.6. Using hooks to bar calls to unauthorized functions
local debug = require "debug"
local steplimit = 1000 -- maximum "steps" that can be performed
local count = 0 -- counter for steps
local validfunc = { -- set of authorized functions
[string.upper] = true,
[string.lower] = true,
... -- other authorized functions
}
local function hook (event)
if event == "call" then
local info = debug.getinfo(2, "fn")
if not validfunc[info.func] then
error("calling bad function: " .. (info.name or "?"))
end
end
count = count + 1
if count > steplimit then
error("script uses too much CPU")
end
end
local f = assert(loadfile(arg[1], "t", {})) -- load chunk
debug.sethook(hook, "", 100) -- set hook
f() -- run chunk
Right off the bat I am puzzled by this code, since the hook tests for event type (if event == "call" then...), and yet, when the hook is set, only count events are requested (debug.sethook(hook, "", 100)). Therefore, the whole song-and-dance with validfunc is for naught.
Maybe it is a typo. So I tried experimenting with this code, but I found it very difficult to put the whitelist technique in practice. The example below is a very simplified illustration of the type of problems I ran into.
First, here is a slightly modified version of the author's code.
#!/usr/bin/env lua5.3
-- Filename: sandbox
-- ----------------------------------------------------------------------------
local debug = require "debug"
local steplimit = 1000 -- maximum "steps" that can be performed
local count = 0 -- counter for steps
local validfunc = { -- set of authorized functions
[string.upper] = true,
[string.lower] = true,
[io.stdout.write] = true,
-- ... -- other authorized functions
}
local function hook (event)
if event == "call" then
local info = debug.getinfo(2, "fnS")
if not validfunc[info.func] then
error(string.format("calling bad function (%s:%d): %s",
info.short_src, info.linedefined, (info.name or "?")))
end
end
count = count + 1
if count > steplimit then
error("script uses too much CPU")
end
end
local f = assert(loadfile(arg[1], "t", {})) -- load chunk
validfunc[f] = true
debug.sethook(hook, "c", 100) -- set hook
f() -- run chunk
The most significant differences in the second snippet relative to the first one are:
the call to debug.sethook has "c" as mask;
the f function for the loaded chunk gets added to the validfunc whitelist;
io.stdout.write is added to the validfunc whitelist;
When I use this sandbox program to run the one-line script shown below:
# Filename: helloworld.lua
io.stdout:write("Hello, World!\n")
...I get the following error:
% ./sandbox helloworld.lua
lua5.3: ./sandbox:20: calling bad function ([C]:-1): __index
stack traceback:
[C]: in function 'error'
./sandbox:20: in function <./sandbox:16>
[C]: in metamethod '__index'
helloworld.lua:3: in local 'f'
./sandbox:34: in main chunk
[C]: in ?
I tried to fix this by adding the following to validfunc:
[getmetatable(io.stdout).__index] = true,
...but I still get pretty much the same error. I could go on guessing and trying more things to add, but this is what I would like to avoid.
I have two related questions:
What can I add to validfunc so that sandbox will run helloworld (as is) to completion?
More importantly, what is a systematic way to find determine what to add to a whitelist table?
Part (2) is the heart of this post. I am looking for tools/techniques that remove the guesswork from the problem of populating a whitelist table.
(I know that I can get helloworld to work if I replace io.stdout:write with print, register print in sandbox's validfunc, and pass {print = print} as the last argument to loadfile, but doing this does not answer the general question of how to systematically determine what needs to be added to the whitelist to allow some specific code to work in the sandbox.)
EDIT: Ask #DarkWiiPlayer pointed out, the calling bad function error is being triggered by the calling of an unregistered function (__index?), which happened as part of the response to an earlier attempt to index a nil value error. So, this post's questions are all about systematically determining what to add to validfunc to allow Lua to emit the attempt to index a nil value error normally.
I should add that the question of which function's call triggered the hook's execution responsible for the calling bad function error message is at the moment completely unclear. This error message blames the error on __index, but I suspect that this may be a red herring, possibly due to a bug in Lua.
Why suspect a bug in Lua? If I change the error call in sandbox slightly to
error(string.format("calling bad function (%s:%d): %s (%s)",
info.short_src, info.linedefined, (info.name or "?"),
info.func))
...then the error message looks like this:
lua5.3: ./sandbox:20: calling bad function ([C]:-1): __index (function: 0x55b391b79ef0)
stack traceback:
[C]: in function 'error'
./sandbox:20: in function <./sandbox:16>
[C]: in metamethod '__index'
helloworld.lua:3: in local 'f'
./sandbox:34: in main chunk
[C]: in ?
Nothing surprising there, but if now I change helloworld.lua to
# Filename: helloworld.lua
nonexistent()
io.stdout:write("Hello, World!\n")
...and run it under sandbox, the error message becomes
lua5.3: ./sandbox:20: calling bad function ([C]:-1): nonexistent (function: 0x556a161cdef0)
stack traceback:
[C]: in function 'error'
./sandbox:20: in function <./sandbox:16>
[C]: in global 'nonexistent'
helloworld.lua:3: in local 'f'
./sandbox:34: in main chunk
[C]: in ?
From this error message, one may conclude that nonexistent is a real function; after all, it's sitting right there at 0x556a161cdef0! But we know that nonexistent lives up to its name: it doesn't exist!
The whiff of a bug is definitely in the air. It could be that the function that is triggering the hook should really be excluded from those that trigger such "c"-masked hooks? Be that as it may, it appears that, in this particular situation, the call to debug.info is returning inconsistent information (since the name of the function [e.g. nonexistent] clearly does not correspond at all to the actual function object [e.g. function: 0x556a161cdef0] that is supposedly triggering the hook).

(Final answer at the bottom, feel free to skip until the <hr> line)
I'll explain my debugging step by step.
This is a really weird phenomenon. After some testing, I've managed to narrow it down a bit:
Since you pass {} to load, the function runs with an empty environment, so io is, in fact, nil (and io.stdout would error anyway)
The error happens directly when attempting to index io (which is a nil value)
The functio __index is a C function (see error message)
My first intuition was that __index was called somewhere internally. Thus, to find out what it does, I decided to look at its locals in hopes of guessing what it does.
A quick helper function I threw together:
local function locals(f)
return function(f, n)
local name, value = debug.getlocal(f+1, n)
if name then
return n+1, name, value
end
end, f, 1
end
Insert that right before the line where the error is raised:
for idx, name, value in locals(2) do
print(name, value)
end
error(string.format("calling bad function (%s:%d): %s", info.short_src, info.linedefined, (info.name or "?")))
This led to an interesting result:
(*temporary) stdin:43: attempt to index a nil value (global 'io')
(*temporary) table: 0x563cef2fd170
lua: stdin:29: calling bad function ([C]:-1): __index
stack traceback:
[C]: in function 'error'
stdin:29: in function <stdin:21>
[C]: in metamethod '__index'
stdin:43: in function 'f'
stdin:49: in main chunk
[C]: in ?
shell returned 1
Why is there a temporary string value with a completely different error message?
By the way, this error makes total sense; io does not exist because of the empty environment, so indexing it should obviously raise just that error.
It's honestly a very interesting error, but I'll leave it at this, as you're learning the language and this hint might be enough for you to figure it out on your own. It's also a very nice chance to actually use (and get to know) the debug module in a more practical context.
Actual Solution
After some time has now passed, I came back to add a proper solution to this problem, but I really already did just that. The weird error reporting is just Lua being weird. The real error is the empty environment that's set when loading the chunk, as I mentioned a few paragraphs above.
From the manual:
load (chunk [, chunkname [, mode [, env]]])
Loads a chunk.
[...]
If the resulting function has upvalues, the first upvalue is set to the value of env, if that parameter is given, or to the value of the global environment. Other upvalues are initialized with nil. (When you load a main chunk, the resulting function will always have exactly one upvalue, the _ENV variable (see §2.2). However, when you load a binary chunk created from a function (see string.dump), the resulting function can have an arbitrary number of upvalues.) All upvalues are fresh, that is, they are not shared with any other function.
[...]
Now, in a "main chunk", i.e. one loaded from a text Lua file, the first (and only) upvalue is always the environment of the chunk, so where it will look for "globals" (this is slightly different in Lua 5.1). Since an empty table is passed in, the chunk has no access to any of the global variables like string or io.
Therefore, when the function f() tries to index io, Lua throws an error "attempt to index a nil value", because io is nil. For whatever reason Lua then makes some internal function calls that end up triggering the blacklist, causing a new error that shadows the previous one; this makes debugging this error extremely inconvenient and almost impossible without using the debug library to get additional information about the call stack.
I ultimately only realized this myself after I noticed the original error message while looking at the locals of the function that made the blocked call.
I hope this solves the problem :)

Capture and handle multiple errors in sequence

I have an expression that spits out multiple errors:
stop("first")
stop("second")
stop("third")
print("success")
Is there a way to make this expression run all the way to the end (and, if possible, store all of the errors)?
The problem with tryCatch is that it stops execution on the first handled condition (so this will print "There was an error" exactly once).
tryCatch({
stop("first")
stop("second")
stop("third")
print("success")
}, error = function(e) print("There was an error"))
I read that withCallingHandlers will capture conditions and continue execution, so I tried something like this...
withCallingHandlers(
{...},
error = function(e) print("There was an error")
)
...but, although this does print the message, it also fails after the first error, which, as far as I can tell, is because the default restart just runs the current line again.
I think what I need to do is write a custom restart that simply skips the current expression and jumps to the next one, but I'm at a bit of a loss about how to go about that.
(Background: Basically, I'm trying to replicate what happens inside testthat::test_that, which is that even though each failing expectation throws an error, all expectations are run before quitting).

How to target exception-handling to specific exceptions?

Does R provide any support for targetting exception handling to only specific exceptions?
In Python, for example, one can narrow down exception-handling to specific exception types; e.g.:
try:
return frobozz[i]
except IndexError:
return DEFAULT
In this example, the exception handling will kick in only if i is an integer such that i >= len(frobozz) or i < -len(frobozz), but will not catch the exception resulting from, e.g., the case where i is the string "0" (which would be a TypeError, rather than an IndexError).

Wellllll...yes and no, and mostly no.
Every Python exception is wrapped in a particular error class which derives from Error, and Python modules are supposed to raise the "right" kinds of errors. For instance, an index out of range error should throw IndexError. The base language knows about these errors and so you can catch the appropriate error type in your except... clause.
R doesn't do that. Errors are untyped; there's no essential difference between an index out of bounds error and any other error.
That said, you can cheat under certain, very limited, circumstances.
> y <- tryCatch(x[[2]], error = function(e) e)
> y
<simpleError in x[[2]]: subscript out of bounds>
> y$message
[1] "subscript out of bounds"
The key here is the use of the tryCatch function and the error clause. The error clause in a tryCatch is a function of one variable which can perform arbitrary operations on e, which is an object of type 'simpleError' and contains an item named "message". You can parse message and handle interesting cases separately
> y <- tryCatch(x[[2]],
error = function(e) {
if ('subscript out of bounds' == e$message) return(NA) else stop(e))
})
> y
[1] NA
This only works if you can actually detect the error string you want to look for, and that isn't guaranteed. (Then again, it also isn't guaranteed in Python, so things aren't really much different from one another.)
Final question, though: why in Heaven's name are you doing this? What are you really trying to do?

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Handling of condition objects in R - r

Related

Change in error handling of PyEval_EvalCode between CPython 3.6 and CPython 3.7?

How do I directly make an val binding to an option value in SML?

How to systematically populate a whitelist for a sandboxing program?

Capture and handle multiple errors in sequence

How to target exception-handling to specific exceptions?

Categories

Resources