R unary operator overload: risks? - r

In my continuing quest to avoid using parentheses for some simple commands, I wrote up the following operator to create a new graphics window. My question is: am I at risk of "breaking" anything in R, other than the obvious inability to execute the "not" function on my variable "newdev"?
# function to overload "!" for one purpose only
#this is adapted from the sos package code for "???", credited to Duncan Murdoch.
# Example of how to create a specialized unary operator that doesn't require
# parentheses for its argument. So far as I can tell,
#the only way to do this is to overload an existing function or
# operator which doesn't require parentheses. "?" and "!" meet this requirement.
`!` <- function (e1, e2) {
call <- match.call()
# match.call breaks out each callable function in argument list (which was "??foo" for the sos package "???",
# which allows topicExpr1 to become a list variable w/ callable function "!" (or "?" in sos)
original <- function() {
call[[1]]<-quote(base::`!`)
return(eval(call, parent.frame(2)))
}
# this does preclude my ever having an actual
# variable called "newdev" (or at least trying to create the actual NOT of it)
if(call[[2]] =='newdev') {
windows(4.5,4.5,restoreConsole=T)
}else{
return(original()) # do what "!" is supposed to do
}
}

I executed "!" = function(a){stop("'NOT' is used")} and executed the replications function, which uses the ! operator, and this worked fine. So it looks like it is safe to override "!".
Still you probably want to use classes, which you can do as follows:
# Create your object and set the class
A = 42
class(A) = c("my_class")
# override ! for my_class
"!.my_class" = function(v){
cat("Do wathever you want here. Argument =",v,"\n")
}
# Test ! on A
!A

with
makeActiveBinding
you can replace ls() by e.g LS w/o need of unary operators

Related

How do I know which method will be called?

If I type methods(print) I get a long list of methods.
For an object of class data.frame, print.data.frame will be called.
It's not always so simple though:
hw <- "hello world"
class(hw) # [1] "character"
There is no print.character method. How do I know which method is called when executing print(hw)?
Turn debugging on for print and then run your example:
> debug(print)
> print("hello")
debugging in: print("hello")
debug: UseMethod("print")
Browse[2]> <---------------------------- press Enter to step forward
debugging in: print.default("hello") <-- this is the method that gets called
debug: {
noOpt <- missing(digits) && missing(quote) && missing(na.print) &&
missing(print.gap) && missing(right) && missing(max) &&
missing(useSource) && missing(...)
.Internal(print.default(x, digits, quote, na.print, print.gap,
right, max, useSource, noOpt))
}
Have you read Hadley's Advanced R and the chapter on objects? It might not give you the whole answer, but fundamentally what you're experiencing is the difference between method dispatch in C and regular S3 behaviour.
[ isn't really an R function, it's a C function and the decision of what method to use is done in C. That doesn't mean you can't create an S3 method for [ (or sum, +, [<- and other .Primitive functions), but when you do it's more like you're making a wrapper/preprocess for the C function, which R will dispatch, before the ultimate decision is made by the C function based on classes defined separate from your regular (and extensible) R classes.
Or at least that's how I've understood it.

How can one store a binary operator in a variable?

I am pretty confused about the meaning of the %Something% operators.
How can one store a binary operator in a variable?
Something like...
binary_operator = store.binary(%in%)
c(3,9,4,1,7) binary_operator c(1:5) # would behave alike "c(3,9,4,1,7) %in% c(1:5)"
or something like
library(foreach)
binary_operator = expression(%do%) # or expression(%dopar%)
...
...
foreach (i=1:6) binary_operator { # would behave alike "foreach (i=1:6) %do% ..."
...
...
}
If you want to define your own infix operator, it must begin and end with %. This is so the parser knows how to properly pass the parameters to the function since it not the usual way code is parsed. Also, use the backtick to escape the percent signs in the variable name.
`%binary_operator%` <- `%in%`
c(3,9,4,1,7) %binary_operator% c(1:5)

How to list all the functions signatures in an R file?

Is there an R function that lists all the functions in an R script file along with their arguments?
i.e. an output of the form:
func1(var1, var2)
func2(var4, var10)
.
.
.
func10(varA, varB)
Using [sys.]source has the very undesirable side-effect of executing the source inside the file. At the worst this has security problems, but even “benign” code may simply have unintended side-effects when executed. At best it just takes unnecessary time (and potentially a lot).
It’s actually unnecessary to execute the code, though: it is enough to parse it, and then do some syntactical analysis.
The actual code is trivial:
file_parsed = parse(filename)
functions = Filter(is_function, file_parsed)
function_names = unlist(Map(function_name, functions))
And there you go, function_names contains a vector of function names. Extending this to also list the function arguments is left as an exercise to the reader. Hint: there are two approaches. One is to eval the function definition (now that we know it’s a function definition, this is safe); the other is to “cheat” and just get the list of arguments to the function call.
The implementation of the functions used above is also not particularly hard. There’s probably even something already in R core packages (‘utils’ has a lot of stuff) but since I’m not very familiar with this, I’ve just written them myself:
is_function = function (expr) {
if (! is_assign(expr)) return(FALSE)
value = expr[[3L]]
is.call(value) && as.character(value[[1L]]) == 'function'
}
function_name = function (expr) {
as.character(expr[[2L]])
}
is_assign = function (expr) {
is.call(expr) && as.character(expr[[1L]]) %in% c('=', '<-', 'assign')
}
This correctly recognises function declarations of the forms
f = function (…) …
f <- function (…) …
assign('f', function (…) …)
It won’t work for more complex code, since assignments can be arbitrarily complex and in general are only resolvable by actually executing the code. However, the three forms above probably account for ≫ 99% of all named function definitions in practice.
UPDATE: Please refer to the answer by #Konrad Rudolph instead
You can create a new environment, source your file in that environment and then list the functions in it using lsf.str() e.g.
test.env <- new.env()
sys.source("myfile.R", envir = test.env)
lsf.str(envir=test.env)
rm(test.env)
or if you want to wrap it as a function:
listFunctions <- function(filename) {
temp.env <- new.env()
sys.source(filename, envir = temp.env)
functions <- lsf.str(envir=temp.env)
rm(temp.env)
return(functions)
}

using callCC with higher-order functions in R

I'm trying to figure out how to get R's callCC function for short-circuiting evalutation of a function to work with functions like lapply and Reduce.
Motivation
This would make Reduce and and lapply have asymptotic efficiency > O(n), by allowing you to
exit a computation early.
For example, if I'm searching for a value in a list I could map a 'finder' function across the list, and the second it is found lapply stops running and that value is returned (much like breaking a loop, or using a return statement to break out early).
The problem is I am having trouble writing the functions that lapply and Reduce should take using a style that callCC requires.
Example
Say I'm trying to write a function to find the value '100' in a list: something equivalent to
imperativeVersion <- function (xs) {
for (val in xs) if (val == 100) return (val)
}
The function to pass to lapply would look like:
find100 <- function (val) { if (val == 100) SHORT_CIRCUIT(val) }
functionalVersion <- function (xs) lapply(xs, find100)
This (obviously) crashes, since the short circuiting function hasn't been defined yet.
callCC( function (SHORT_CIRCUIT) lapply(1:1000, find100) )
The problem is that this also crashes, because the short circuiting function wasn't around when find100 was defined. I would like for something similar to this to work.
the following works because SHORT_CIRCUIT IS defined at the time that the function passed to lapply is created.
callCC(
function (SHORT_CIRCUIT) {
lapply(1:1000, function (val) {
if (val == 100) SHORT_CIRCUIT(val)
})
)
How can I make SHORT_CIRCUIT be defined in the function passed to lapply without defining it inline like above?
I'm aware this example can be achieved using loops, reduce or any other number of ways. I am looking for a solution to the problem of using callCC with lapply and Reduce in specific.
If I was vague or any clarification is needed please leave a comment below. I hope someone can help with this :)
Edit One:
The approach should be 'production-quality'; no deparsing functions or similar black magic.
I found a soluton to this problem:
find100 <- function (val) {
if (val == 100) SHORT_CIRCUIT(val)
}
short_map <- function (fn, coll) {
callCC(function (SHORT_CIRCUIT) {
clone_env <- new.env(parent = environment(fn))
clone_env$SHORT_CIRCUIT <- SHORT_CIRCUIT
environment(fn) <- clone_env
lapply(coll, fn)
})
}
short_map(find100, c(1,2,100,3))
The trick to making higher-order functions work with callCC is to assign the short-circuiting function into the input functions environment before carrying on with the rest of the program. I made a clone of the environment to avoid unintended side-effects.
You can achieve this using metaprogramming in R.
#alexis_laz's approach was in fact already metaprogramming.
However, he used strings which are a dirty hack and error prone. So you did well to reject it.
The correct way to approach #alexis_laz's approach would be by wrangling on code level. In base R this is done using substitute(). There are however better packages e.g. rlang by Hadley Wickham. But I give you a base R solution (less dependency).
lapply_ <- function(lst, FUN) {
eval.parent(
substitute(
callCC(function(return_) {
lapply(lst_, FUN_)
}),
list(lst_ = lst, FUN_=substitute(FUN))))
}
Your SHORT_CIRCUIT function is actually a more general, control flow return function (or a break function which takes an argument to return it). Thus, I call it return_.
We want to have a lapply_ function, in which we can in the FUN= part use a return_ to break out of the usual lapply().
As you showed, this is the aim:
callCC(
function (return_) {
lapply(1:1000, function (x) if (x == 100) return_(x))
}
)
Just with the problem, that we want to be able to generalize this expression.
We want
callCC(
function(return_) lapply(lst, FUN_)
)
Where we can use inside the function definition we give for FUN_ the return_.
We can let, however, the function defintion see return_ only if we insert the function definition code into this expression.
This exactly #alexis_laz tried using string and eval.
Or you did this by manipulating environment variables.
We can safely achieve the insertion of literal code using substitute(expr, replacer_list) where expr is the code to be manipulated and replacer_list is the lookup table for the replacement of code.
By substitute(FUN) we take the literal code given for FUN= for lapply_ without evaluating it. This expression returns literal quoted code (better than the string in #alexis_laz's approach).
The big substitute command says: "Take the expression callCC(function(return_) lapply(lst_, FUN_)) and replace lst_ in this expression by the list given for coll and FUN_ by the literal quoted expression given for FUN.
This replaced expression is then evaluated in the parent environment (eval.parent()) meaning: the resulting expression replaces the lapply_() call and is executed exactly where it was placed.
Such use of eval.parent() (or eval( ... , envir=parent.frame())) is fool proof. (otherwise, tidyverse packages wouldn't be production level ...).
So in this way, you can generalize callCC() calls.
lapply_(1:1000, FUN=function(x) if (x==100) return_(x))
## [1] 100
I don't know if it can be of use, but:
find100 <- "function (val) { if (val == 100) SHORT_CIRCUIT(val) }"
callCC( function (SHORT_CIRCUIT) lapply(1:1000, eval(parse(text = find100))) )
#[1] 100

R warning() wrapper - raise to parent function

I have a wrapper around the in-built warning() function in R that basically calls warning(sprintf(...)):
warningf <- function(...)
warning(sprintf(...))
This is because I use warning(sprintf(...)) so often that I decided to make a function out of it (it's in a package I have of functions I use often).
I then use warningf when I write functions. i.e., instead of writing:
f <- function() {
# ... do stuff
warning(sprintf('I have %i bananas!',2))
# ... do stuff
}
I write:
f <- function() {
# ... do stuff
warningf('I have %i bananas!',2)
# ... do stuff
}
If I call the first f(), I get:
Warning message:
In f() : I have 2 bananas!
This is good - it tells me where the warning came from f() and what went wrong.
If I call the second f(), I get:
Warning message:
In warningf("I have %i bananas!",2) : I have 2 bananas!
This is not ideal - it tells me the warning was in the warningf function (of course, because it's the warningf function that calls warning, not f), masking the fact that it actually came from the f() function.
So my question is : Can I somehow "raise" the warning call so it displays the warning in f() message instead of the warning in warningf ?
One way of dealing with this is to get a list of the environments in your calling stack, and then pasting the name of the parent frame in your warning.
You do this with the function sys.call() which returns an item in the call stack. You want to extract the second from last element in this list, i.e. the parent to warningf:
warningf <- function(...){
parent.call <- sys.call(sys.nframe() - 1L)
warning(paste("In", deparse(parent.call), ":", sprintf(...)), call.=FALSE)
}
Now, if I run your function:
> f()
Warning message:
In f() : I have 2 bananas!
Later edit : deparse(parent.call) converts the call to a string in the case that the f() function had arguments, and shows the call as it was specified (ie including arguments etc).
I know it's old but, sys.call(sys.nframe() - 1L), or sys.call(-1),
returns a vector, with the function name and the argument.
If you use it inside paste() it will raise two warnings, one from the function and one from the argument.
The answer doesn't show because f() has no arguments.
sys.call(sys.nframe() - 1L)[1] does the trick.

Resources