Identifying if/else statements in R and extracting expressions from them - r

I would like to extract expr_a, expr_b, expr_c from the if/else block in the following function:
test <- function() {
result <- if (expr_a)
expr_b
else
expr_c
return(result)
}
for the purpose of dead-code elimination (I've seen the rco library but it currently doesn't support all the cases that I'm working with).
I would like to be able to then replace the entire loop with expr_b if eval(expr_a) == TRUE or expr_c otherwise.
The issue with regex-based approaches to this problem is that there are so many possible variations on this, and I'm not sure how to capture them all. For example, the code above is valid R code for one line expressions. Multi-line expressions surrounded by brackets can also be written e.g.:
else
{
# ...
}
Is there any way to identify and capture every possible if/else statement programmatically in R?
An example of what my real starting point looks like:
test <- function() {
my_var <- list(my_bool = FALSE)
my_result <- if(my_var$my_bool) 1
else my_var$my_bool + 2
return(my_result)
}
The ideal output would be: list(expression(my_var$my_bool), expression(1), expression(my_var$my_bool + 2))

Figured it out! as.list can be used to break up calls into the syntax tree.
For example:
example <- quote(
if (object) {
print(result_true)
} else {
print(result_false)
}
)
as.list(example)
# [[1]]
# `if`
#
# [[2]]
# object
#
# [[3]]
# {
# print(result_true)
# }
#
# [[4]]
# {
# print(result_false)
# }

Related

Create an R function programmatically with non-fixed body

In a for loop I make a "string-formula" and allocate it to e.g. body1. And when I try to make a function with that body1 it fails... And I have no clue what I should try else...
This question How to create an R function programmatically? helped me a lot but sadly only quote is used to set the body...
I hope you have an idea how to work around with this issue.
And now my code:
A.m=matrix(c(3,4,2,2,1,1,1,3,2),ncol=3,byrow=TRUE)
for(i in 1:dim(A.m)[1]) {
body=character()
# here the string-formula emerges
for(l in 1:dim(A.m)[2]) {
body=paste0(body,"A.m[",i,",",l,"]","*x[",l,"]+")
}
# only the last plus-sign is cutted off
assign(paste0("body",i),substr(body,1,nchar(body)-1))
}
args=alist(x = )
# just for your convenience the console output
body1
## [1] "A.m[1,1]*x[1]+A.m[1,2]*x[2]+A.m[1,3]*x[3]"
# in this code-line I don't know how to pass body1 in feasible way
assign("Function_1", as.function(c(args, ???body1???), env = parent.frame())
And this is my aim:
Function_1(x=c(1,1,1))
## 9 # 3*1 + 4*1 + 2*1
Since you have a string, you need to parse that string. You can do
assign("Function_1",
as.function(c(args, parse(text=body1)[[1]])),
env = parent.frame())
Though I would strongly discourage the use of assign for filling your global environment with a bunch of variables with indexes in their name. In general that makes things much tougher to program with. It would be much easier to collect all your functions in a list. For example
funs <- lapply(1:dim(A.m)[1], function(i) {
body <- ""
for(l in 1:dim(A.m)[2]) {
body <- paste0(body,"A.m[",i,",",l,"]","*x[",l,"]+")
}
body <- substr(body,1,nchar(body)-1)
body <- parse(text=body)[[1]]
as.function(c(alist(x=), body), env=parent.frame())
})
And then you can call the different functions by extracting them with [[]]
funs[[1]](x=c(1,1,1))
# [1] 9
funs[[2]](x=c(1,1,1))
# [1] 4
Or you can ever call all the functions with an lapply
lapply(funs, function(f, ...) f(...), x=c(1,1,1))
# [[1]]
# [1] 9
# [[2]]
# [1] 4
# [[3]]
# [1] 6
Although if this is actually what your function is doing, there are easier ways to do this in R using matrix multiplication %*%. Your Function_1 is the same as A.m[1,] %*% c(1,1,1). You could make a generator funciton like
colmult <- function(mat, row) {
function(x) {
as.numeric(mat[row,] %*% x)
}
}
And then create the same list of functions with
funs <- lapply(1:3, function(i) colmult(A.m, i))
Then you don't need any string building or parsing which tends to be error prone.

In R, how do I define a function which is equivalent to `deparse(substitute(x))`?

I want to write a function in R which grabs the name of a variable from the context of its caller's caller. I think the problem I have is best understood by asking how to compose deparse and substitute. You can see that a naive composition does not work:
# a compose operator
> `%c%` = function(x,y)function(...)x(y(...))
# a naive attempt to combine deparse and substitute
> desub = deparse %c% substitute
> f=function(foo) { message(desub(foo)) }
> f(log)
foo
# this is how it is supposed to work
> g=function(foo) { message(deparse(substitute(foo))) }
> g(log)
log
I also tried a couple of variations involving eval.parent but with no luck. Any help is appreciated.
Clarification: I'm not looking for a synonym for deparse(substitute(...)), e.g. match.call()[[2]] - what I'm looking for is a way to define a function
desub = function(foo) {
...
# What goes here?
}
such that the definition of f above produces the same answer as g. It should look like this:
> f=function(foo) { message(desub(foo)) }
> f(log)
log
Perhaps match.call could be of use in the body of desub above, but I'd like to know how. Thanks!
As you surmised, this is an issue with environments. The reason why the function f does not give log when you call f(log), is that the environment in which substitute is called, namely the evaluation environment of desub, does not contain a binding to log.
The remedy is to evaluate the call to substitute in the proper environment, and modify desub accordingly:
desub <- function(x, env = parent.frame()) {
deparse(eval(substitute(substitute(x)), envir = env))
}
Now f does what it was intended to do:
f(log)
#> log
Thanks to #egnha and #akrun for the brave attempts. After playing around a bit I found a solution that works.
This fragment:
desub <- function(y) {
e1=substitute(y)
e2=do.call(substitute,list(e1), env=parent.frame())
deparse(e2)
}
gives:
> f <- function(x) message(desub(x))
> f(log)
log
Update:
With help from Mark Bravington on the R-devel list, I was able to generalize this to multiple frames. I thought I should post it here, because it's a bit more useful than the above, and because there was a tricky workaround involving (possibly buggy?) behavior in parent.frame().
# desub(v,0)=="v"
# desub(v,1)==deparse(substitute(v))
# desub(v,2)==name of v in grandparent's frame
# etc.
desub = function(y,n=1) {
env=environment();
for(i in 0:n) {
y = do.call(substitute, list(substitute(y)), env=env)
env = do.call(my_mvb_parent, list(), env=env)
}
deparse(y)
}
# helper:
#
# - using mvb.parent.frame fixes problems with capture.output and
# weird cycling behavior in the built-in parent.frame
#
# - this wrapper makes mvb.parent.frame not throw an error when we get
# to globalenv()
my_mvb_parent=function() {
library(mvbutils)
tryCatch(
mvb.parent.frame(2),
error=function(e) { globalenv()})
}
if(1) {
# example code
g2=function(t) {
for(i in 0:5) {
res=desub(t,i);
print(res);
res1=capture.output(desub(t,i))
stopifnot(capture.output(res)==res1)
}
}
g1=function(z) g2(z)
g=function(y) g1(y)
g(log)
# prints:
## [1] "t"
## [1] "z"
## [1] "y"
## [1] "log"
## [1] "log"
## [1] "log"
}

Changing a list inside a function

I would like to do perform analysis of structure
a <- list()
replicate_letter <- function(letter) {
return(data.frame(first_letter = rep(letter, 10),
second_letter = rep(letter, 10)))
}
get_letter <- function(letter) {
if (is.null(a[[letter]])) {
a[[letter]] <- replicate_letter(letter)
}
# Do further analysis, plotting,....
}
and perform it often just by calling get_letter. However, this does not work - function runs, but a is not altered.
I have figured out that this is due to attempt to change list inside the function as this
a <- list()
replicate_letter <- function(letter) {
return(data.frame(first_letter = rep(letter, 10),
second_letter = rep(letter, 10)))
}
for (letter in letters[1:3]) {
if (is.null(a[[letter]])) {
a[[letter]] <- replicate_letter(letter)
}
}
a
runs alright. How do I need to change function get_letter to make it work? Is it possible?
The problem is that a copy of the list is modified in the function's environment, and that copy is destroyed when the function exits. R differs in this way from many other languages, in that the global environment is not (by default) modified within functions.
You should have your function return the new list:
get_letter <- function(a, letter) {
if (is.null(a[[letter]])) {
a[[letter]] <- replicate_letter(letter)
}
# Do further analysis, plotting,....
return(a)
}
Calling:
a <- get_letter(a, 'c')
I'm not quite sure if I understand your situation, so perhaps take this with a grain of salt. But be aware that you are assigning the output of a[[letter]] <- replicate_letter(letter) only within the function (which then disappears afterwards). You could try ?<<- instead. Consider:
replicate_letter <- function(letter) {
return(data.frame(first_letter = rep(letter, 10),
second_letter = rep(letter, 10)))
}
get_letter <- function(letter) {
if (is.null(a[[letter]])) {
a[[letter]] <<- replicate_letter(letter)
}
# Do further analysis, plotting,....
}
get_letter("a")
a
# $a
# first_letter second_letter
# 1 a a
# 2 a a
# 3 a a
# 4 a a
# 5 a a
# 6 a a
# 7 a a
# 8 a a
# 9 a a
# 10 a a
The help for <<- reads (in part):
The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.
The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment. Note that their semantics differ from that in the S language, but are useful in conjunction with the scoping rules of R. See ‘The R Language Definition’ manual for further details and examples.
(The boldface is mine.)

R: on.exit - use returned value without knowing its name

I have below function. I cannot alter the function in any way except the first block of code in the function.
In this simple example I want to display apply some function on returning object.
The point is the name of variable returned by function may vary and I'm not able to guess it.
Obviously I also cannot wrap the f function into { x <- f(); myfun(x); x }.
The below .Last.value in my on.exit call represents the value to be returned by f function.
f <- function(param){
# the only code I know - start
on.exit(if("character" %in% class(.Last.value)) message(print(.Last.value)) else message(class(.Last.value)))
# the only code I know - end
# real processing of f()
a <- "aaa"
"somethiiiing"
if(param==1L) return(a)
b <- 5L
"somethiiiing"
if(param==2L) return(b)
"somethiiiing"
return(32)
}
f(1L)
# function
# [1] "aaa"
f(2L)
# aaa
# [1] 5
f(3L)
# integer
# [1] 32
Above code with .Last.value seems to be working with lag (so in fact not working) and also the .Last.value is probably not the way to go as I want to use the value few times like if(fun0(x)) fun1(x) else fun2(x), and because returned value might be a big object, copy it on the side is also bad approach.
Any way to use on.exit or any other function which can help me to run my function on the f function results without knowing result variable name?
In a similar way to how you are modifying the function, you could easily wrap it as well. Here's a reproducible example.
library(data.table)
append.log<-function(x) {
cat(paste("value:",x,"\n"))
}
idx.dt <- data.table:::`[.data.table`
environment(idx.dt)<-asNamespace("data.table")
idx.wrap <- function(...) {
x<-do.call(idx.dt, as.list(substitute(...())), envir=parent.frame())
append.log(if(is(x, "data.table")) {
nrow(x)
} else { NA })
x
}
environment(idx.wrap)<-asNamespace("data.table")
(unlockBinding)("[.data.table",asNamespace("data.table"))
assign("[.data.table",idx.wrap,envir=asNamespace("data.table"),inherits=FALSE)
dt<-data.table(a=1:10, b=seq(2, 20, by=2), c=letters[1:10])
dt[a%%2==0]
Since R 3.2.0 it is fully possible, thanks to new function returnValue.
Working example below.
f <- function(x, err = FALSE){
pt <- proc.time()[[3L]]
on.exit(message(paste("proc.time:",round(proc.time()[[3L]]-pt,4),"\nnrow:",as.integer(nrow(returnValue()))[1L])))
Sys.sleep(0.001)
if(err) stop("some error")
return(x)
}
dt <- data.frame(a = 1:5, b = letters[1:5])
f(dt)
f(dt, err=T)
f(dt)
f(dt[dt$a %in% 2:3 & dt$b %in% c("c","d"),])

do-while loop in R

I was wondering about how to write do-while-style loop?
I found this post:
you can use repeat{} and check conditions whereever using if() and
exit the loop with the "break" control word.
I am not sure what it exactly means. Can someone please elaborate if you understand it and/or if you have a different solution?
Pretty self explanatory.
repeat{
statements...
if(condition){
break
}
}
Or something like that I would think. To get the effect of the do while loop, simply check for your condition at the end of the group of statements.
See ?Control or the R Language Definition:
> y=0
> while(y <5){ print( y<-y+1) }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
So do_while does not exist as a separate construct in R, but you can fake it with:
repeat( { expressions}; if (! end_cond_expr ) {break} )
If you want to see the help page you cannot type ?while or ?repeat at the console but rather need to use ?'repeat' or ?'while'. All the "control-constructs" including if are on the same page and all need character quoting after the "?" so the interpreter doesn't see them as incomplete code and give you a continuation "+".
Building on the other answers, I wanted to share an example of using the while loop construct to achieve a do-while behaviour. By using a simple boolean variable in the while condition (initialized to TRUE), and then checking our actual condition later in the if statement. One could also use a break keyword instead of the continue <- FALSE inside the if statement (probably more efficient).
df <- data.frame(X=c(), R=c())
x <- x0
continue <- TRUE
while(continue)
{
xi <- (11 * x) %% 16
df <- rbind(df, data.frame(X=x, R=xi))
x <- xi
if(xi == x0)
{
continue <- FALSE
}
}
Noticing that user 42-'s perfect approach {
* "do while" = "repeat until not"
* The code equivalence:
do while (condition) # in other language
..statements..
endo
repeat{ # in R
..statements..
if(! condition){ break } # Negation is crucial here!
}
} did not receive enough attention from the others, I'll emphasize and bring forward his approach via a concrete example. If one does not negate the condition in do-while (via ! or by taking negation), then distorted situations (1. value persistence 2. infinite loop) exist depending on the course of the code.
In Gauss:
proc(0)=printvalues(y);
DO WHILE y < 5;
y+1;
y=y+1;
ENDO;
ENDP;
printvalues(0); # run selected code via F4 to get the following #
1.0000000
2.0000000
3.0000000
4.0000000
5.0000000
In R:
printvalues <- function(y) {
repeat {
y=y+1;
print(y)
if (! (y < 5) ) {break} # Negation is crucial here!
}
}
printvalues(0)
# [1] 1
# [1] 2
# [1] 3
# [1] 4
# [1] 5
I still insist that without the negation of the condition in do-while, Salcedo's answer is wrong. One can check this via removing negation symbol in the above code.

Resources