Is stricter error reporting available in R?

In PHP we can do error_reporting(E_ALL) or error_reporting(E_ALL|E_STRICT) to have warnings about suspicious code. In g++ you can supply -Wall (and other flags) to get more checking of your code. Is there something similar in R?
As a specific example, I was refactoring a block of code into some functions. In one of those functions I had this line:
if(nm %in% fields$non_numeric)...
Much later I realized that I had overlooked adding fields to the parameter list, but R did not complain about an undefined variable.

(Posting as an answer rather than a comment)
How about ?codetools::checkUsage (codetools is a built-in package) ... ?

This is not really an answer, I just can't resist showing how you could declare globals explicitly. @Ben Bolker should post his comment as the answer.
To avoid seeing globals, you can move a function "up" one environment: it will still see all the standard functions (mean, etc.), but nothing you put in the global environment:
explicit.globals = function(f) {
name = deparse(substitute(f))
env = parent.frame()
enclos = parent.env(.GlobalEnv)
environment(f) = enclos
env[[name]] = f
}
Then getting a global is just retrieving it from .GlobalEnv:
global = function(n) {
name = deparse(substitute(n))
env = parent.frame()
env[[name]] = get(name, .GlobalEnv)
}
assign('global', global, envir=baseenv())
And it would be used like
a = 2
b = 3
f = function() {
global(a)
a
b
}
explicit.globals(f)
And called like
> f()
Error in f() : object 'b' not found
I personally wouldn't go for this but if you're used to PHP it might make sense.

Summing up, there is really no correct answer: as Owen and gsk3 point out, R functions will use globals if a variable is not in the local scope. This may be desirable in some situations, so how could the "error" be pointed out?
checkUsage() does nothing that R's built-in error-checking does not (in this case). checkUsageEnv(.GlobalEnv) is a useful way to check a file of helper functions (and might be great as a pre-hook for svn or git; or as part of an automated build process).
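Since checkUsage() stays quiet when the global happens to exist, another option is to list a function's free variables and review them by eye. The following is only a sketch using codetools::findGlobals(); the helper name is_non_numeric is made up for illustration, and the body mirrors the line from the question.

```r
library(codetools)

# Hypothetical helper modelled on the question: `fields` was never added
# to the parameter list.
is_non_numeric <- function(nm) {
  nm %in% fields$non_numeric
}

# merge = FALSE returns functions and variables separately.
globals <- findGlobals(is_non_numeric, merge = FALSE)

# `fields` is listed here because it is neither a parameter nor a local.
free_vars <- globals$variables
print(free_vars)
```

This works whether or not an object called fields exists in the workspace, which is what makes it usable as a review step.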
I feel the best approach when refactoring is to start by moving all global code into a function (e.g. call it main()), so that the only remaining global code is the call to that function. Do this first, then start extracting functions, etc.
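A rough sketch of that main() pattern, assuming a made-up helper is_non_numeric and made-up data: because fields is now local to main(), the extracted helper can no longer pick it up silently, and the forgotten parameter surfaces as an error.

```r
# Helper extracted during refactoring; `fields` was forgotten in the
# parameter list (hypothetical name, modelled on the question).
is_non_numeric <- function(nm) {
  nm %in% fields$non_numeric
}

# All former top-level code lives in main(), so `fields` is local to it.
main <- function() {
  fields <- list(non_numeric = c("id", "name"))  # made-up data
  is_non_numeric("name")
}

# The helper cannot see main()'s locals, so the call fails instead of
# quietly using a global. The error message is captured as a string.
outcome <- tryCatch(main(), error = function(e) conditionMessage(e))
print(outcome)
```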

Related

How an R function can be used before being defined

I came across this snippet of code where the function rval_top_ingredients() was used to render a D3 word cloud before it was defined. I think that would throw an error in Python, as the script is executed from top to bottom. Why did it work in R then? Thank you.
output$wc_ingredients <- d3wordcloud::renderD3wordcloud({
ingredients_df <- rval_top_ingredients()
d3wordcloud(ingredients_df$ingredient, ingredients_df$nb_recipes, tooltip = TRUE)
})
rval_top_ingredients <- reactive({
recipes_enriched %>%
filter(cuisine == input$cuisine) %>%
arrange(desc(tf_idf)) %>%
head(input$nb_ingredients) %>%
mutate(ingredient = forcats::fct_reorder(ingredient, tf_idf))
})
R doesn’t differ from Python here: you can’t use a function before it’s defined. But, despite appearances to the contrary, this also isn’t happening here.
d3wordcloud::renderD3wordcloud is a special function call which doesn’t evaluate its arguments immediately. In fact, the argument is stored internally as an unevaluated expression and is only evaluated later after a certain trigger. By that time, rval_top_ingredients has been defined.
This is a pervasive pattern in Shiny, but you can harness this behaviour yourself. Consider the following:
f = function (expr) {}
f(g())
g = function () { stop('oh no!') }
This code works, since f never uses its argument, and since R uses lazy evaluation for function arguments: unlike most other languages, a function argument only gets evaluated once it is used. Arguments that are never used are never evaluated.
So, despite the fact that f(g()) appears to use g before it’s defined, the actual call to f never evaluates its arguments so there’s no issue. The only constraint is that the argument needs to be syntactically valid.
Here’s a slightly more meaningful example which does something useful (it creates a function that emits a log message before evaluating an expression):
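make_verbose = function (expr) {
  # Capture the expression text here: substitute() only sees arguments of the
  # function it is called in, and it does not force the lazy argument.
  label = deparse(substitute(expr))
  function () {
    message(sprintf('Evaluating %s', label))
    expr
  }
}
verbose_g = make_verbose(g())
g = function () {
  message('g was called!')
}
verbose_g()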
Python doesn’t quite support this, since Python doesn’t have lazy and non-standard evaluation. But a similar situation still exists in Python:
def f():
    g()

def g():
    print('g()')

f()
Here, g() is seemingly used before it was defined; but this is only true if we’re reading the code textually from top to bottom without paying attention to scope. In reality, g() is only ever called after it was defined. The same is true in the R code you’ve posted.

Does each assignment mean that a copy is being made?

Recently I learned that in R there are no references; rather, all objects are immutable and each assignment makes a copy.
Uh-oh.
Copying large matrices over and over seems pretty horrible...
Now I'm in a paranoia, copypasting code all the time because I'm afraid of making helper functions (passing parameters = assignment? returning values = assignment?), I'm afraid of making helper variables if I'm not 100% sure an object would be copied anyway...
Example:
What I would love to make:
foo = function(someGivenLargeObject) {
returnedMatrix = someGivenLargeObject$someLargeMatrix # <- BAD?!?!?!?!
if(someCondition)
returnedMatrix = operateOn(returnedMatrix)
if(otherCondition)
returnedMatrix = operateOn(returnedMatrix)
returnedMatrix
}
What I'm making instead:
foo = function(someGivenLargeObject) { # <- still BAD?!?!?!
returnedMatrix = NULL # <- No copy of someLargeMatrix is made!
if(someCondition)
returnedMatrix = operateOn(someGivenLargeObject$someLargeMatrix)
if(otherCondition)
returnedMatrix = operateOn(
if(is.null(returnedMatrix))
someGivenLargeObject$someLargeMatrix
else
returnedMatrix
) # <- ^ Incredible clutter! Unreadable!
if(is.null(returnedMatrix))
return(someGivenLargeObject$someLargeMatrix)
else
return(returnedMatrix) # <- does return copy stuff?!?!?!?!
}
The readability loss in the second version of the function is pretty amazing IMO; yet is this the price of avoiding the unnecessary copying of someLargeMatrix in case neither someCondition nor otherCondition holds? Because the line returnedMatrix = someGivenLargeObject$someLargeMatrix would necessitate this copying?
Or am I being paranoid, and may I safely go with the more readable version of the function because making a reference to someLargeMatrix doesn't necessitate copying? (BUT THERE ARE NO REFERENCES IN R!!!)
Also I hope that a function call / function return doesn't copy stuff either?
Side note: Just to be clear, I haven't yet run into a case where I knew an object was being copied unnecessarily in a situation like the one described above. I'm just perplexed by having read that "there are no references in R", so this question is based on my worries about what this lack of references might imply, rather than on any empirical observation.
Donald Knuth famously said "premature optimization is the root of all evil":
http://wiki.c2.com/?PrematureOptimization
It is good to be aware of this, but code clarity is in most cases more important.
R is usually smart enough to figure out when a copy is needed: it copies on modification. An assignment only binds another name to the same object, and a copy is made only once one of those names is later modified.
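The behaviour described above can be checked with a small experiment. This is only a sketch: the variable names are made up, and tracemem() is only available in R builds with memory profiling enabled, so the address check is guarded and yields NA when it cannot run.

```r
# Bind a second name to a large matrix; no copy happens at this point.
m <- matrix(0, nrow = 1000, ncol = 1000)
n <- m

# tracemem() returns the address of the object a name points to.
can_trace <- capabilities("profmem")[["profmem"]]
shared_before <- if (can_trace) identical(tracemem(m), tracemem(n)) else NA

# Modifying n is what triggers the copy; m is left untouched.
n[1, 1] <- 1
shared_after <- if (can_trace) identical(tracemem(m), tracemem(n)) else NA

if (can_trace) {
  untracemem(m)
  untracemem(n)
}

print(c(before = shared_before, after = shared_after))
```

If the check runs, m and n should share one address before the modification and have different addresses afterwards, which suggests the more readable version of foo in the question does not pay for a copy unless operateOn actually changes the matrix.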

Why does rm inside a function not delete objects?

rel.mem <- function(nm) {
rm(nm)
}
I defined the above function rel.mem -- takes a single argument and passes it to rm
> ls()
[1] "rel.mem"
> x<-1:10
> ls()
[1] "rel.mem" "x"
> rel.mem(x)
> ls()
[1] "rel.mem" "x"
Now you can see that when I call rel.mem, x is not deleted. I know this is because rm is being attempted on the wrong environment.
What is a good fix for this?
Criteria for a good fix:
The caller should not have to pass the environment
The callee (rel.mem) should be able to determine the environment by using an R language facility (call stack inspection, aspects, etc.)
The interface of the function rel.mem should be kept simple -- idiot proof: call rel.mem -- then rel.mem takes it from there -- no need to pass environments.
NOTES:
1. As many commenters have pointed out, one easy fix is to pass the environment.
2. What I meant by a good fix (and I should have clarified this) is that the callee (in this case rel.mem) is able to work out which environment the caller was referring to and then remove the object from that environment.
3. The type of reasoning in "2" can be done in other languages by inspecting the call stack. For example, in Java I would throw a dummy exception, catch it, and then parse the call stack. In other languages I could use aspect-oriented techniques. The question is: can something like that be done in R?
4. One commenter has suggested that there may be multiple objects with the same name, so the "right" environment is meaningless. As I've stated above, in other languages it is possible (sometimes with some creative trickery) to interpret the call stack; this may not be possible in R.
5. One commenter has suggested that rm(list=nm, envir = parent.frame()) will remove the object from the parent environment. This is correct; however, I'm looking for something that will work for an arbitrary call depth.
The quick answer is that you're in a different environment - essentially picture the variables in a box: you have a box for the function and one for the Global Environment. You just need to tell rm where to find that box.
So
rel_mem <- function(nm) {
# State the environment
rm(list=nm, envir = .GlobalEnv )
}
x = 10
rel_mem("x")
Alternatively, you can use the pos argument, e.g.
rel_mem <- function(nm) {
rm(list=nm, pos=1 )
}
If you type search() you will see a vector of environments, the global is number 1.
Two other options are:
envir = parent.frame() if you want to go one level up the call stack
inherits = TRUE to keep searching the enclosing environments until the object is found
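Here is a small sketch of those two options. The helper names (rel_mem_parent, rel_mem_inherits) and the surrounding functions are invented for illustration. Note that inherits = TRUE follows the chain of enclosing environments starting at envir, which is not necessarily the same as the chain of callers.

```r
# Hypothetical helper: removes `nm` (a string) from the caller's frame.
rel_mem_parent <- function(nm) {
  rm(list = nm, envir = parent.frame())
}

# Hypothetical helper: starts at the caller's frame and, because of
# inherits = TRUE, continues through the enclosing environments.
rel_mem_inherits <- function(nm) {
  rm(list = nm, envir = parent.frame(), inherits = TRUE)
}

caller <- function() {
  tmp <- 1:10
  rel_mem_parent("tmp")
  exists("tmp", inherits = FALSE)  # is tmp still in this frame?
}

outer_fn <- function() {
  big <- 1:10
  # inner_fn has no `big` of its own; it lives one enclosure up.
  inner_fn <- function() rel_mem_inherits("big")
  inner_fn()
  exists("big", inherits = FALSE)  # is big still in this frame?
}

still_there_parent <- caller()
still_there_inherits <- outer_fn()
print(c(still_there_parent, still_there_inherits))
```

Both checks should come back FALSE, meaning the object was removed from the frame that owned it.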
In the above code, notice that I'm passing the object as a character - I'm passing the "x" not x. We can be clever and avoid this using the substitute function
rel_mem <- function(nm) {
rm(list = as.character(substitute(nm)), envir = .GlobalEnv )
}
To finish I'll just add that deleting things in the .GlobalEnv from a function is generally a bad idea.
Further resources:
Environments: http://adv-r.had.co.nz/Environments.html
Substitute function: http://adv-r.had.co.nz/Computing-on-the-language.html#capturing-expressions
If you use another function, such as ls(), to find the global objects from within your function, you must state the environment explicitly there too:
library(magrittr) # provides the %>% pipe used below
rel_mem <- function(nm) {
  # State the environment in both functions
  rm(list = ls(envir = .GlobalEnv) %>% .[startsWith(., "plot_")], envir = .GlobalEnv)
}

Get variables that have been created inside a function

I have created a function (which is quite long) that I have saved in a .txt file.
It works well (I use source(< >) to access it).
My problem is that I have created a few variables in that function
ie:
myfun<-function(a,b) {
Var1=....
Var2=Var1 + ..
}
Now I want to get those variables.
When I include return() inside the function, it's fine: the value comes up on the screen, but when I type Var1 outside the function, I get an error message saying "the object cannot be found".
I am new to R, but I was thinking it might be because "myfun" operates in a different environment than the global one, but when I did
environment()
<environment: R_GlobalEnv>
environment(myfun1)
<environment: R_GlobalEnv>
It seems to me the problem is elsewhere...
Any idea?
Thanks
I realize this answer is more than 3 years old but I believe the option you are looking for is as follows:
myfun <- function(a,b) {
Var1 = (a + b) / 2 # do whatever logic you have to do here...
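Var2 <<- Var1 + a # then write the result to the global environment with the "<<-" operator.
}
The "<<-" assignment operator will write "Var2" to the global environment (strictly, it assigns in the nearest enclosing environment that already has a Var2, and falls back to the global environment when none does), and you can then use or reference it however you like without having to use "return()" inside your function.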
If you want to do it in a nice way, write a class and then provide a print method. With such a class it is possible to return variables invisibly. A nice book which covers such topics is "The Art of R Programming".
An easy fix would be to save each variable you need later in a list and then return that list
(as Peter pointed out):
return(list(VAR1=VAR1, .....))
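As a sketch of the list approach: the arithmetic below is a placeholder borrowed from the other answer, since the question's real logic is elided, and res is just an illustrative name.

```r
myfun <- function(a, b) {
  Var1 <- (a + b) / 2   # placeholder logic, not the asker's real code
  Var2 <- Var1 + a
  # Return every intermediate value you need later, by name.
  list(Var1 = Var1, Var2 = Var2)
}

res <- myfun(2, 4)
print(res$Var1)
print(res$Var2)
```

The caller then picks out what it needs with res$Var1 or res$Var2, and nothing is written to the global environment behind its back.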

how to isolate a function

How can I ensure that when a function is called it is not allowed to grab variables from the global environment?
I would like the following code to give me an error, because I might have mistyped z (I wanted to type y).
z <- 10
temp <- function(x,y) {
y <- y + 2
return(x+z)
}
> temp(2,1)
[1] 12
I'm guessing the answer has to do with environments, but I haven't understood those yet.
Is there a way to make my desired behavior default (e.g. by setting an option)?
> library(codetools)
> checkUsage(temp)
<anonymous>: no visible binding for global variable 'z'
The function doesn't change, so no need to check it each time it's used. findGlobals is more general, and a little more cryptic. Something like
Filter(Negate(is.null), eapply(.GlobalEnv, function(elt) {
if (is.function(elt))
findGlobals(elt)
}))
could visit all functions in an environment, but if there are several functions then maybe it's time to think about writing a package (it's not that hard).
environment(temp) = baseenv()
See also http://cran.r-project.org/doc/manuals/R-lang.html#Scope-of-variables and ?environment.
environment(fun) = parent.env(environment(fun))
(I'm using 'fun' in place of your function name 'temp' for clarity)
This will remove the "workspace" environment (.GlobalEnv) from the search path and leave everything else (eg all packages).
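Putting the environment trick together with the example from the question gives roughly the following. It is a sketch only: it uses parent.env(globalenv()) directly, on the assumption that temp was defined at top level, and it wraps the call in tryCatch() purely so the error can be inspected.

```r
z <- 10
temp <- function(x, y) {
  y <- y + 2
  x + z  # typo from the question: z should have been y
}

# Skip the global environment: lookups now start at the first attached
# package, so base functions such as `+` still resolve.
environment(temp) <- parent.env(globalenv())

# The stray global is no longer visible, so the call fails.
result <- tryCatch(temp(2, 1), error = function(e) conditionMessage(e))
print(result)
```

The trade-off is that any helper functions you defined in the workspace become invisible to temp as well, so this suits small self-contained functions best.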

Resources