Examples of the perils of globals in R and Stata - r

In recent conversations with fellow students, I have been advocating for avoiding globals except to store constants. This is a sort of typical applied statistics-type program where everyone writes their own code and project sizes are on the small side, so it can be hard for people to see the trouble caused by sloppy habits.
In talking about avoidance of globals, I'm focusing on the following reasons why globals might cause trouble, but I'd like to have some examples in R and/or Stata to go with the principles (and any other principles you might find important), and I'm having a hard time coming up with believable ones.
Non-locality: Globals make debugging harder because they make understanding the flow of code harder
Implicit coupling: Globals break the simplicity of functional programming by allowing complex interactions between distant segments of code
Namespace collisions: Common names (x, i, and so forth) get re-used, causing namespace collisions
A useful answer to this question would be a reproducible and self-contained code snippet in which globals cause a specific type of trouble, ideally with another code snippet in which the problem is corrected. I can generate the corrected solutions if necessary, so the example of the problem is more important.
Relevant links:
Global Variables are Bad
Are global variables bad?

I also have the pleasure of teaching R to undergraduate students who have no experience with programming. The problem I found was that most examples of when globals are bad, are rather simplistic and don't really get the point across.
Instead, I try to illustrate the principle of least astonishment. I use examples where it is tricky to figure out what was going on. Here are some examples:
I ask the class to write down what they think the final value of i will be:
i = 10
for(i in 1:5)
i = i + 1
i
Some of the class guess correctly. Then I ask should you ever write code like this?
In some sense i is a global variable that is being changed.
What does the following piece of code return:
x = 5:10
x[x=1]
The problem is what exactly do we mean by x
Does the following function return a global or local variable:
z = 0
f = function() {
if(runif(1) < 0.5)
z = 1
return(z)
}
Answer: both. Again discuss why this is bad.

Oh, the wonderful smell of globals...
All of the answers in this post gave R examples, and the OP wanted some Stata examples, as well. So let me chime in with these.
Unlike R, Stata does take care of locality of its local macros (the ones that you create with local command), so the issue of "Is this this a global z or a local z that is being returned?" never comes up. (Gosh... how can you R guys write any code at all if locality is not enforced???) Stata has a different quirk, though, namely that a non-existent local or global macro is evaluated as an empty string, which may or may not be desirable.
I have seen globals used for several main reasons:
Globals are often used as shortcuts for variable lists, as in
sysuse auto, clear
regress price $myvars
I suspect that the main usage of such construct is for someone who switches between interactive typing and storing the code in a do-file as they try multiple specifications. Say they try regression with homoskedastic standard errors, heteroskedastic standard errors, and median regression:
regress price mpg foreign
regress price mpg foreign, robust
qreg price mpg foreign
And then they run these regressions with another set of variables, then with yet another one, and finally they give up and set this up as a do-file myreg.do with
regress price $myvars
regress price $myvars, robust
qreg price $myvars
exit
to be accompanied with an appropriate setting of the global macro. So far so good; the snippet
global myvars mpg foreign
do myreg
produces the desirable results. Now let's say they email their famous do-file that claims to produce very good regression results to collaborators, and instruct them to type
do myreg
What will their collaborators see? In the best case, the mean and the median of mpg if they started a new instance of Stata (failed coupling: myreg.do did not really know you meant to run this with a non-empty variable list). But if the collaborators had something in the works, and too had a global myvars defined (name collision)... man, would that be a disaster.
Globals are used for directory or file names, as in:
use $mydir\data1, clear
God only knows what will be loaded. In large projects, though, it does come handy. You would want to define global mydir somewhere in your master do-file, may be even as
global mydir `c(pwd)'
Globals can be used to store an unpredictable crap, like a whole command:
capture $RunThis
God only knows what will be executed; let's just hope it is not ! format c:\. This is the worst case of implicit strong coupling, but since I am not even sure that RunThis will contain anything meaningful, I put a capture in front of it, and will be prepared to treat the non-zero return code _rc. (See, however, my example below.)
Stata's own use of globals is for God settings, like the type I error probability/confidence level: the global $S_level is always defined (and you must be a total idiot to redefine this global, although of course it is technically doable). This is, however, mostly a legacy issue with code of version 5 and below (roughly), as the same information can be obtained from less fragile system constant:
set level 90
display $S_level
display c(level)
Thankfully, globals are quite explicit in Stata, and hence are easy to debug and remove. In some of the above situations, and certainly in the first one, you'd want to pass parameters to do-files which are seen as the local `0' inside the do-file. Instead of using globals in the myreg.do file, I would probably code it as
unab varlist : `0'
regress price `varlist'
regress price `varlist', robust
qreg price `varlist'
exit
The unab thing will serve as an element of protection: if the input is not a legal varlist, the program will stop with an error message.
In the worst cases I've seen, the global was used only once after having been defined.
There are occasions when you do want to use globals, because otherwise you'd have to pass the bloody thing to every other do-file or a program. One example where I found the globals pretty much unavoidable was coding a maximum likelihood estimator where I did not know in advance how many equations and parameters I would have. Stata insists that the (user-supplied) likelihood evaluator will have specific equations. So I had to accumulate my equations in the globals, and then call my evaluator with the globals in the descriptions of the syntax that Stata would need to parse:
args lf $parameters
where lf was the objective function (the log-likelihood). I encountered this at least twice, in the normal mixture package (denormix) and confirmatory factor analysis package (confa); you can findit both of them, of course.

One R example of a global variable that divides opinion is the stringsAsFactors issue on reading data into R or creating a data frame.
set.seed(1)
str(data.frame(A = sample(LETTERS, 100, replace = TRUE),
DATES = as.character(seq(Sys.Date(), length = 100, by = "days"))))
options("stringsAsFactors" = FALSE)
set.seed(1)
str(data.frame(A = sample(LETTERS, 100, replace = TRUE),
DATES = as.character(seq(Sys.Date(), length = 100, by = "days"))))
options("stringsAsFactors" = TRUE) ## reset
This can't really be corrected because of the way options are implemented in R - anything could change them without you knowing it and thus the same chunk of code is not guaranteed to return exactly the same object. John Chambers bemoans this feature in his recent book.

A pathological example in R is the use of one of the globals available in R, pi, to compute the area of a circle.
> r <- 3
> pi * r^2
[1] 28.27433
>
> pi <- 2
> pi * r^2
[1] 18
>
> foo <- function(r) {
+ pi * r^2
+ }
> foo(r)
[1] 18
>
> rm(pi)
> foo(r)
[1] 28.27433
> pi * r^2
[1] 28.27433
Of course, one can write the function foo() defensively by forcing the use of base::pi but such recourse may not be available in normal user code unless packaged up and using a NAMESPACE:
> foo <- function(r) {
+ base::pi * r^2
+ }
> foo(r = 3)
[1] 28.27433
> pi <- 2
> foo(r = 3)
[1] 28.27433
> rm(pi)
This highlights the mess you can get into by relying on anything that is not solely in the scope of your function or passed in explicitly as an argument.

Here's an interesting pathological example involving replacement functions, the global assign, and x defined both globally and locally...
x <- c(1,NA,NA,NA,1,NA,1,NA)
local({
#some other code involving some other x begin
x <- c(NA,2,3,4)
#some other code involving some other x end
#now you want to replace NAs in the the global/parent frame x with 0s
x[is.na(x)] <<- 0
})
x
[1] 0 NA NA NA 0 NA 1 NA
Instead of returning [1] 1 0 0 0 1 0 1 0, the replacement function uses the index returned by the local value of is.na(x), even though you're assigning to the global value of x. This behavior is documented in the R Language Definition.

One quick but convincing example in R is to run the line like:
.Random.seed <- 'normal'
I chose 'normal' as something someone might choose, but you could use anything there.
Now run any code that uses generated random numbers, for example:
rnorm(10)
Then you can point out that the same thing could happen for any global variable.
I also use the example of:
x <- 27
z <- somefunctionthatusesglobals(5)
Then ask the students what the value of x is; the answer is that we don't know.

Through trial and error I've learned that I need to be very explicit in naming my function arguments (and ensure enough checks at the start and along the function) to make everything as robust as possible. This is especially true if you have variables stored in global environment, but then you try to debug a function with a custom valuables - and something doesn't add up! This is a simple example that combines bad checks and calling a global variable.
glob.arg <- "snake"
customFunction <- function(arg1) {
if (is.numeric(arg1)) {
glob.arg <- "elephant"
}
return(strsplit(glob.arg, "n"))
}
customFunction(arg1 = 1) #argument correct, expected results
customFunction(arg1 = "rubble") #works, but may have unexpected results

An example sketch that came up while trying to teach this today. Specifically, this focuses on trying to give intuition as to why globals can cause problems, so it abstracts away as much as possible in an attempt to state what can and cannot be concluded just from the code (leaving the function as a black box).
The set up
Here is some code. Decide whether it will return an error or not based on only the criteria given.
The code
stopifnot( all( x!=0 ) )
y <- f(x)
5/x
The criteria
Case 1: f() is a properly-behaved function, which uses only local variables.
Case 2: f() is not necessarily a properly-behaved function, which could potentially use global assignment.
The answer
Case 1: The code will not return an error, since line one checks that there are no x's equal to zero and line three divides by x.
Case 2: The code could potentially return an error, since f() could e.g. subtract 1 from x and assign it back to the x in the parent environment, where any x element equal to 1 could then be set to zero and the third line would return a division by zero error.

Here's one attempt at an answer that would make sense to statisticsy types.
Namespace collisions: Common names (x, i, and so forth) get re-used, causing namespace collisions
First we define a log likelihood function,
logLik <- function(x) {
y <<- x^2+2
return(sum(sqrt(y+7)))
}
Now we write an unrelated function to return the sum of squares of an input. Because we're lazy we'll do this passing it y as a global variable,
sumSq <- function() {
return(sum(y^2))
}
y <<- seq(5)
sumSq()
[1] 55
Our log likelihood function seems to behave exactly as we'd expect, taking an argument and returning a value,
> logLik(seq(12))
[1] 88.40761
But what's up with our other function?
> sumSq()
[1] 633538
Of course, this is a trivial example, as will be any example that doesn't exist in a complex program. But hopefully it'll spark a discussion about how much harder it is to keep track of globals than locals.

In R you may also try to show them that there is often no need to use globals as you may access the variables defined in the function scope from within the function itself by only changing the enviroment. For example the code below
zz="aaa"
x = function(y) {
zz="bbb"
cat("value of zz from within the function: \n")
cat(zz , "\n")
cat("value of zz from the function scope: \n")
with(environment(x),cat(zz,"\n"))
}

Related

Is this what rnorm(x) does if x is a vector, and how could I have found out faster?

I’m looking for R resources, and I started looking at “An Introduction to R” here at r-project.org. I did and got stumped immediately.
I think I've figured out what’s going on, and my question is basically
Are there resources to help me figure out something like this more
easily?
The preface of the Introduction to R suggests starting with the introductory session in Appendix A, and right at the start is this code and remark.
x <- rnorm(50)
y <- rnorm(x)
Generate two pseudo-random normal vectors of x- and y-coordinates.
The documentation says the (first and only non-optional) parameter to rnorm is the length of the result vector. So x <- rnorm(50) produces a vector of 50 random values from a normal distribution with mean 0 and standard deviation 1.
So far so good. But why does rnorm(x) seem to do what y <- rnorm(50) or y <- rnorm(length(x)) would have done? Either of these alternatives seem clearer to me.
My guess as to what happens is this:
The wrapper for rnorm didn’t care what kind of thing x is and just passed to the underlying C function a pointer to the C struct for x as an R object.
R objects represented in C are structs followed by “data”; the data of the C representation of an R vector of reals starts with two integers, the first of which is the vector's length. (The vector elements follow those integers.) I found this out by reading up on R internals here.
If a C function were written to find the value of an R integer from a passed pointer-to-R-object, and it were called with a pointer to an R vector of reals, it would find the vector’s length in the place it would look for the single integer.
In addition to my main question of “How can I figure out something like this more easily?”, I wouldn’t mind knowing whether what I think is going on is correct and whether the fact that rnorm(x) is idiomatic R in this context or more of a sloppy choice. Given that it does something useful, can it be relied upon or is it just lucky behavior for an expression that isn’t well-defined in R?
I’m used to strongly-typed languages like C or SQL, which have easier-to-follow (for me) semantics and which also have more comprehensive references available, so any references for R that have a programming-language-theory focus or are aimed at people used to strong typing would be good, too.
It is documented behavior. From ?rnorm:
Usage: [...]
rnorm(n, mean = 0, sd = 1)
Arguments:
[...]
n: number of observations. If ‘length(n) > 1’, the length is
taken to be the number required.

Function doesn't change value (R)

I have written a function that takes two arguments, a number between 0:16 and a vector which contains four parameter values.
The output of the function does change if I change the parameters in the vector, but it does not change if I change the number between 0:16.
I can add, that the function I'm having troubles with, includes another function (called 'pi') which takes the same arguments.
I have checked that the 'pi' function does actually change values if I change the value from 0:16 (and it does also change if I change the values of the parameters).
Firstly, here is my code;
pterm_ny <- function(x, theta){
(1-sum(theta[1:2]))*(theta[4]^(x))*exp((-1)*theta[4])/pi(x, theta)
}
pi <- function(x, theta){
theta[1]*1*(x==0)+theta[2]*(theta[3]^(x))*exp((-1)*(theta[3]))+(1-
sum(theta[1:2]))*(theta[4]^(x))*exp((-1)*(theta[4]))
}
Which returns 0.75 for pterm_ny(i,c(0.2,0.2,2,2)), were i = 1,...,16 and 0.2634 for i = 0, which tells me that the indicator function part in 'pi' does work.
With respect to raising a number to a certain power, I have been told that one should wrap the wished number in a 'I', as an example it would be like;
x^I(2)
I have tried to do that in my code, but that didn't help either.
I can't remember the argument for doing it, but I expect that it's to ensure that the number in parentheses is interpreted as an integer.
My end goal is to get 17 different values of the 'pterm' and to accomplish that, I was thinking of using the sapply function like this;
sapply(c(0:16),pterm_ny,theta = c(0.2,0.2,2,2))
I really hope that someone can point out what I'm missing here.
In advance, thank you!
You have a theta[4]^x term both in your main expression and in your pi() function; these are cancelling out, leaving the result invariant to changes in x ...
Also:
you might want to avoid using pi as your function name, as it's also a built-in variable (3.14159...) - this can sometimes cause confusion
the advice about using the "as is" function I() to protect powers is only relevant within formulas, e.g. as used in lm() (linear regression). (It would be used as I(x^2), not x^I(2)

R copies for no apparent reason

Some R function will make R copy the object AFTER the function call, like nrow, while some others don't, like sum.
For example the following code:
x = as.double(1:1e8)
system.time(x[1] <- 100)
y = sum(x)
system.time(x[1] <- 200) ## Fast (takes 0s), after calling sum
foo = function(x) {
return(sum(x))
}
y = foo(x)
system.time(x[1] <- 300) ## Slow (takes 0.35s), after calling foo
Calling foo is NOT slow, because x isn't copied. However, changing x again is very slow, as x is copied. My guess is that calling foo will leave a reference to x, so when changing it after, R makes another copy.
Any one knows why R does this? Even when the function doesn't change x at all? Thanks.
I definitely recommend Hadley's Advanced R book, as it digs into some of the internals that you will likely find interesting and relevant. Most relevant to your question (and as mentioned by #joran and #lmo), the reason for the slow-down was an additional reference that forced copy-on-modify.
An excerpt that might be beneficial from Memory#Modification:
There are two possibilities:
R modifies x in place.
R makes a copy of x to a new location, modifies the copy, and then
uses the name x to point to the new location.
It turns out that R can do either depending on the circumstances. In
the example above, it will modify in place. But if another variable
also points to x, then R will copy it to a new location. To explore
what’s going on in greater detail, we use two tools from the pryr
package. Given the name of a variable, address() will tell us the
variable’s location in memory and refs() will tell us how many names
point to that location.
Also of interest are the sections on R's C interface and Performance. The pryr package also has tools for working with these sorts of internals in an easier fashion.
One last note from Hadley's book (same Memory section) that might be helpful:
While determining that copies are being made is not hard, preventing
such behaviour is. If you find yourself resorting to exotic tricks to
avoid copies, it may be time to rewrite your function in C++, as
described in Rcpp.

How to not fall into R's 'lazy evaluation trap'

"R passes promises, not values. The promise is forced when it is first evaluated, not when it is passed.", see this answer by G. Grothendieck. Also see this question referring to Hadley's book.
In simple examples such as
> funs <- lapply(1:10, function(i) function() print(i))
> funs[[1]]()
[1] 10
> funs[[2]]()
[1] 10
it is possible to take such unintuitive behaviour into account.
However, I find myself frequently falling into this trap during daily development. I follow a rather functional programming style, which means that I often have a function A returning a function B, where B is in some way depending on the parameters with which A was called. The dependency is not as easy to see as in the above example, since calculations are complex and there are multiple parameters.
Overlooking such an issue leads to difficult to debug problems, since all calculations run smoothly - except that the result is incorrect. Only an explicit validation of the results reveals the problem.
What comes on top is that even if I have noticed such a problem, I am never really sure which variables I need to force and which I don't.
How can I make sure not to fall into this trap? Are there any programming patterns that prevent this or that at least make sure that I notice that there is a problem?
You are creating functions with implicit parameters, which isn't necessarily best practice. In your example, the implicit parameter is i. Another way to rework it would be:
library(functional)
myprint <- function(x) print(x)
funs <- lapply(1:10, function(i) Curry(myprint, i))
funs[[1]]()
# [1] 1
funs[[2]]()
# [1] 2
Here, we explicitly specify the parameters to the function by using Curry. Note we could have curried print directly but didn't here for illustrative purposes.
Curry creates a new version of the function with parameters pre-specified. This makes the parameter specification explicit and avoids the potential issues you are running into because Curry forces evaluations (there is a version that doesn't, but it wouldn't help here).
Another option is to capture the entire environment of the parent function, copy it, and make it the parent env of your new function:
funs2 <- lapply(
1:10, function(i) {
fun.res <- function() print(i)
environment(fun.res) <- list2env(as.list(environment())) # force parent env copy
fun.res
}
)
funs2[[1]]()
# [1] 1
funs2[[2]]()
# [1] 2
but I don't recommend this since you will be potentially copying a whole bunch of variables you may not even need. Worse, this gets a lot more complicated if you have nested layers of functions that create functions. The only benefit of this approach is that you can continue your implicit parameter specification, but again, that seems like bad practice to me.
As others pointed out, this might not be the best style of programming in R. But, one simple option is to just get into the habit of forcing everything. If you do this, realize you don't need to actually call force, just evaluating the symbol will do it. To make it less ugly, you could make it a practice to start functions like this:
myfun<-function(x,y,z){
x;y;z;
## code
}
There is some work in progress to improve R's higher order functions like the apply functions, Reduce, and such in handling situations like these. Whether this makes into R 3.2.0 to be released in a few weeks depend on how disruptive the changes turn out to be. Should become clear in a week or so.
R has a function that helps safeguard against lazy evaluation, in situations like closure creation: forceAndCall().
From the online R help documentation:
forceAndCall is intended to help defining higher order functions like apply to behave more reasonably when the result returned by the function applied is a closure that captured its arguments.

How to force an error if non-finite values (NA, NaN, or Inf) are encountered

There's a conditional debugging flag I miss from Matlab: dbstop if infnan described here. If set, this condition will stop code execution when an Inf or NaN is encountered (IIRC, Matlab doesn't have NAs).
How might I achieve this in R in a more efficient manner than testing all objects after every assignment operation?
At the moment, the only ways I see to do this are via hacks like the following:
Manually insert a test after all places where these values might be encountered (e.g. a division, where division by 0 may occur). The testing would be to use is.finite(), described in this Q & A, on every element.
Use body() to modify the code to call a separate function, after each operation or possibly just each assignment, which tests all of the objects (and possibly all objects in all environments).
Modify R's source code (?!?)
Attempt to use tracemem to identify those variables that have changed, and check only these for bad values.
(New - see note 2) Use some kind of call handlers / callbacks to invoke a test function.
The 1st option is what I am doing at present. This is tedious, because I can't guarantee I've checked everything. The 2nd option will test everything, even if an object hasn't been updated. That is a massive waste of time. The 3rd option would involve modifying assignments of NA, NaN, and infinite values (+/- Inf), so that an error is produced. That seems like it's better left to R Core. The 4th option is like the 2nd - I'd need a call to a separate function listing all of the memory locations, just to ID those that have changed, and then check the values; I'm not even sure this will work for all objects, as a program may do an in-place modification, which seems like it would not invoke the duplicate function.
Is there a better approach that I'm missing? Maybe some clever tool by Mark Bravington, Luke Tierney, or something relatively basic - something akin to an options() parameter or a flag when compiling R?
Example code Here is some very simple example code to test with, incorporating the addTaskCallback function proposed by Josh O'Brien. The code isn't interrupted, but an error does occur in the first scenario, while no error occurs in the second case (i.e. badDiv(0,0,FALSE) doesn't abort). I'm still investigating callbacks, as this looks promising.
badDiv <- function(x, y, flag){
z = x / y
if(flag == TRUE){
return(z)
} else {
return(FALSE)
}
}
addTaskCallback(stopOnNaNs)
badDiv(0, 0, TRUE)
addTaskCallback(stopOnNaNs)
badDiv(0, 0, FALSE)
Note 1. I'd be satisfied with a solution for standard R operations, though a lot of my calculations involve objects used via data.table or bigmemory (i.e. disk-based memory mapped matrices). These appear to have somewhat different memory behaviors than standard matrix and data.frame operations.
Note 2. The callbacks idea seems a bit more promising, as this doesn't require me to write functions that mutate R code, e.g. via the body() idea.
Note 3. I don't know whether or not there is some simple way to test the presence of non-finite values, e.g. meta information about objects that indexes where NAs, Infs, etc. are stored in the object, or if these are stored in place. So far, I've tried Simon Urbanek's inspect package, and have not found a way to divine the presence of non-numeric values.
Follow-up: Simon Urbanek has pointed out in a comment that such information is not available as meta information for objects.
Note 4. I'm still testing the ideas presented. Also, as suggested by Simon, testing for the presence of non-finite values should be fastest in C/C++; that should surpass even compiled R code, but I'm open to anything. For large datasets, e.g. on the order of 10-50GB, this should be a substantial savings over copying the data. One may get further improvements via use of multiple cores, but that's a bit more advanced.
The idea sketched below (and its implementation) is very imperfect. I'm hesitant to even suggest it, but: (a) I think it's kind of interesting, even in all of its ugliness; and (b) I can think of situations where it would be useful. Given that it sounds like you are right now manually inserting a check after each computation, I'm hopeful that your situation is one of those.
Mine is a two-step hack. First, I define a function nanDetector() which is designed to detect NaNs in several of the object types that might be returned by your calculations. Then, it using addTaskCallback() to call the function nanDetector() on .Last.value after each top-level task/calculation is completed. When it finds an NaN in one of those returned values, it throws an error, which you can use to avoid any further computations.
Among its shortcomings:
If you do something like setting stop(error = recover), it's hard to tell where the error was triggered, since the error is always thrown from inside of stopOnNaNs().
When it throws an error, stopOnNaNs() is terminated before it can return TRUE. As a consequence, it is removed from the task list, and you'll need to reset with addTaskCallback(stopOnNaNs) it you want to use it again. (See the 'Arguments' section of ?addTaskCallback for more details).
Without further ado, here it is:
# Sketch of a function that tests for NaNs in several types of objects
nanDetector <- function(X) {
# To examine data frames
if(is.data.frame(X)) {
return(any(unlist(sapply(X, is.nan))))
}
# To examine vectors, matrices, or arrays
if(is.numeric(X)) {
return(any(is.nan(X)))
}
# To examine lists, including nested lists
if(is.list(X)) {
return(any(rapply(X, is.nan)))
}
return(FALSE)
}
# Set up the taskCallback
stopOnNaNs <- function(...) {
if(nanDetector(.Last.value)) {stop("NaNs detected!\n")}
return(TRUE)
}
addTaskCallback(stopOnNaNs)
# Try it out
j <- 1:00
y <- rnorm(99)
l <- list(a=1:4, b=list(j=1:4, k=NaN))
# Error in function (...) : NaNs detected!
# Subsequent time consuming code that could be avoided if the
# error thrown above is used to stop its evaluation.
I fear there is no such shortcut. In theory on unix there is SIGFPE that you could trap on, but in practice
there is no standard way to enable FP operations to trap it (even C99 doesn't include a provision for that) - it is highly system-specifc (e.g. feenableexcept on Linux, fp_enable_all on AIX etc.) or requires the use of assembler for your target CPU
FP operations are nowadays often done in vector units like SSE so you can't be even sure that FPU is involved and
R intercepts some operations on things like NaNs, NAs and handles them separately so they won't make it to the FP code
That said, you could hack yourself an R that will catch some exceptions for your platform and CPU if you tried hard enough (disable SSE etc.). It is not something we would consider building into R, but for a special purpose it may be doable.
However, it would still not catch NaN/NA operations unless you change R internal code. In addition, you would have to check every single package you are using since they may be using FP operations in their C code and may also handle NA/NaN separately.
If you are only worried about things like division by zero or over/underflows, the above will work and is probably the closest to something like a solution.
Just checking your results may not be very reliable, because you don't know whether a result is based on some intermediate NaN calculation that changed an aggregated value which may not need to be NaN as well. If you are willing to discard such case, then you could simply walk recursively through your result objects or the workspace. That should not be extremely inefficient, because you only need to worry about REALSXP and not anything else (unless you don't like NAs either - then you'd have more work).
This is an example code that could be used to traverse R object recursively:
static int do_isFinite(SEXP x) {
/* recurse into generic vectors (lists) */
if (TYPEOF(x) == VECSXP) {
int n = LENGTH(x);
for (int i = 0; i < n; i++)
if (!do_isFinite(VECTOR_ELT(x, i))) return 0;
}
/* recurse into pairlists */
if (TYPEOF(x) == LISTSXP) {
while (x != R_NilValue) {
if (!do_isFinite(CAR(x))) return 0;
x = CDR(x);
}
return 1;
}
/* I wouldn't bother with attributes except for S4
where attributes are slots */
if (IS_S4_OBJECT(x) && !do_isFinite(ATTRIB(x))) return 0;
/* check reals */
if (TYPEOF(x) == REALSXP) {
int n = LENGTH(x);
double *d = REAL(x);
for (int i = 0; i < n; i++) if (!R_finite(d[i])) return 0;
}
return 1;
}
SEXP isFinite(SEXP x) { return ScalarLogical(do_isFinite(x)); }
# in R: .Call("isFinite", x)

Resources