Objects between functions within larger function in R (reproducible code) - r

This has to simple but it has me confused for the life of me.
I'm trying to get objects (numeric variables, data frames, etc.) created in a larger function into a smaller function within that larger function.
The output of the example functions below should be a data.frame with 20 rows, but I only get "test". Would someone please kindly explain this object function behavior in R and how to get created objects between the functions? Thanks in advance!
testfunload <- function(test){
###All my different tries to get test dataframe into testfunload function
txt <- get("test")
try1 <- get(test)
try2 <- get(paste0(txt))
try3 <- get(paste0(test))
try4 <- get(test)
try5 <- test
try6 <- get("test")
print(try1)
print(try2)
print(try3)
print(try4)
print(try5)
print(try6)
}
###Testing with test data.frame in local environment
testfunload2 <- function(){
test <- data.frame(i1=rep("i",20), i2=rep("I",20))
testfunload(test="test")
}
###Testing with test data.frame in local environment
testfunload2global <- function(){
test <<- data.frame(i1=rep("i",20), i2=rep("I",20))
testfunload(test="test")
}
###PROBLEM:Doesn't have data.frame only "test" as char
testfunload2()
[1] "test"
[1] "test"
[1] "test"
[1] "test"
[1] "test"
[1] "test"
###PROBLEM:Doesn't have data.frame only "test" as char
testfunload2global()
[1] "test"
[1] "test"
[1] "test"
[1] "test"
[1] "test"
[1] "test"
Desired output:
i1 i2
1 i I
2 i I
3 i I
4 i I
5 i I
6 i I
###on for 20 rows etc.

To me, it looks a bit like a complicated way to pass variables to a function. Maybe you can get what you want using the ... argument?
If you want to access the calling scope, though, you can look at parent.frame(). It gives you the environment in the scope where your function is called.
Consider this function:
f <- function() {
e <- parent.frame()
with(e, x + y)
}
It evaluates x + y in the calling scope -- you are getting dynamic rather than lexical scoping, in a way.
So with these functions, you are passing x and y to f without doing so explicitly.
g <- function(x, y) f()
h <- function(x) {
y <- 2
f()
}
They will work like this:
> g(1, 1)
[1] 2
> h(3)
[1] 5
It would be much simpler just to make x and y parameters of f, of course.

Related

Create an R function programmatically with non-fixed body

In a for loop I make a "string-formula" and allocate it to e.g. body1. And when I try to make a function with that body1 it fails... And I have no clue what I should try else...
This question How to create an R function programmatically? helped me a lot but sadly only quote is used to set the body...
I hope you have an idea how to work around with this issue.
And now my code:
A.m=matrix(c(3,4,2,2,1,1,1,3,2),ncol=3,byrow=TRUE)
for(i in 1:dim(A.m)[1]) {
body=character()
# here the string-formula emerges
for(l in 1:dim(A.m)[2]) {
body=paste0(body,"A.m[",i,",",l,"]","*x[",l,"]+")
}
# only the last plus-sign is cutted off
assign(paste0("body",i),substr(body,1,nchar(body)-1))
}
args=alist(x = )
# just for your convenience the console output
body1
## [1] "A.m[1,1]*x[1]+A.m[1,2]*x[2]+A.m[1,3]*x[3]"
# in this code-line I don't know how to pass body1 in feasible way
assign("Function_1", as.function(c(args, ???body1???), env = parent.frame())
And this is my aim:
Function_1(x=c(1,1,1))
## 9 # 3*1 + 4*1 + 2*1
Since you have a string, you need to parse that string. You can do
assign("Function_1",
as.function(c(args, parse(text=body1)[[1]])),
env = parent.frame())
Though I would strongly discourage the use of assign for filling your global environment with a bunch of variables with indexes in their name. In general that makes things much tougher to program with. It would be much easier to collect all your functions in a list. For example
funs <- lapply(1:dim(A.m)[1], function(i) {
body <- ""
for(l in 1:dim(A.m)[2]) {
body <- paste0(body,"A.m[",i,",",l,"]","*x[",l,"]+")
}
body <- substr(body,1,nchar(body)-1)
body <- parse(text=body)[[1]]
as.function(c(alist(x=), body), env=parent.frame())
})
And then you can call the different functions by extracting them with [[]]
funs[[1]](x=c(1,1,1))
# [1] 9
funs[[2]](x=c(1,1,1))
# [1] 4
Or you can ever call all the functions with an lapply
lapply(funs, function(f, ...) f(...), x=c(1,1,1))
# [[1]]
# [1] 9
# [[2]]
# [1] 4
# [[3]]
# [1] 6
Although if this is actually what your function is doing, there are easier ways to do this in R using matrix multiplication %*%. Your Function_1 is the same as A.m[1,] %*% c(1,1,1). You could make a generator funciton like
colmult <- function(mat, row) {
function(x) {
as.numeric(mat[row,] %*% x)
}
}
And then create the same list of functions with
funs <- lapply(1:3, function(i) colmult(A.m, i))
Then you don't need any string building or parsing which tends to be error prone.

R data.table: evaluating address of an object in parent frame

I'm trying to evaluate an expression containing an address of an object at a parent.frame scope, and am getting weird results:
test2 <- function(d) {
address.current <- address(d) # "0x5595b73aedf8"
address.at.caller <- eval(parse(text="address(df)")) # "0x5595b73aedf8"
address.at.caller2 <- do.call(address, args=list("df"), envir=parent.frame()) # problem: "0x5595b6d89de8"
}
test1 <- function(df) {
test2(df)
}
df <- data.frame(a=1:2)
test1(df)
Moreover, if you stop at a breakpoint inside test2 and re-evaluate the expression for address.at.caller2 you'd get non-repeating results:
Browse[2]> do.call(address, args=list("df"), envir=parent.frame())
[1] "0x5595b8c37d78"
Browse[2]> do.call(address, args=list("df"), envir=parent.frame())
[1] "0x5595b8cc74a8"
Browse[2]> do.call(address, args=list("df"), envir=parent.frame())
[1] "0x5595b8cd1348"
This seems to indicate that the result is an address of some temporary object. (Evaluate repeatedly address(2) for a different example).
Is something wrong with the expression do.call(address, args=list("df"), envir=parent.frame())?
Is there a different explanation for this behaviour?
Its not really clear what you are trying to do by using do.call. When you use it like you did, you gave it a variable ( a string) and you asked it for the address. the thing is that R automatically creates copies when you enter variables into functions. So when you gave args = list("df) what R did was create a copy of the string "df" within the do.call frame, and then it gave you the local address before closing the call. You should pass the variable you want to evaluate into the function, or alternatively have it sit on the global scope.
Interesting question.
You don't have to pass input variable really, or operate on the global scope. You can use a more robust alternative to do.call, the eval(as.call(.)).
test2 <- function(d) {
address.current <- address(d)
print(address.current)
address.at.caller <- eval(parse(text="address(df)"))
print(address.at.caller)
address.at.caller2 <- do.call(address, args=list("df"), envir=parent.frame())
print(address.at.caller2)
address.at.caller3 = eval.parent(as.call(list(quote(address), as.name("df"))))
print(address.at.caller3)
}
test1 <- function(df) {
test2(df)
}
df <- data.frame(a=1:2)
test1(df)
[1] "0x560d46e33cc0"
[1] "0x560d46e33cc0"
[1] "0x560d46e4a5f8"
[1] "0x560d46e33cc0"

Bring the objects produced by a R function to the main working environment

I am trying to inspect the internal objects produced by a R function such as the example below:
myfunction <- function(arg1, arg2){
sum.both <- arg1+arg2
diff.both <- arg1-arg2
return(diff.both)
}
I am aware that I can bring it to the working environment by modifying the function itself:
myfunction.mod <- function(arg1, arg2){
sum.both <- arg1+arg2
sum.both <<- sum.both
diff.both <- arg1-arg2
return(diff.both)
}
myfunction.mod(1,2)
By doing that I can see the sum.both object by typing ls() in the console. However, I am looking for a way to get such internal objects from any existing function. Therefore, I tried debug() and environment() without success. Any ideas or directions on how to obtain internal objects from a function would be appreciated.
I guess one easy way to modify an existing function is to use the trace() debugging tool. We can use that to insert code that will run at exit of a function to "leak" all the values from the function scope into the global scope. Here's such a function
make_leaky <- function(f) {
fn <- substitute(f)
invisible(trace(fn, print=FALSE, exit=quote(list2env(mget(ls()), globalenv()))))
}
Then we can test it with the following function
foo <- function(x, y) {
a <- x+7
b <- x*y
b/a
}
We will use ls() to see all the variables at each step
ls()
# [1] "foo" "make_leaky"
foo(5,2)
# [1] 0.8333333
ls() # NO NEW VARIABLES CREATED HERE
# [1] "foo" "make_leaky"
make_leaky(foo)
foo(5,2)
# [1] 0.8333333
ls() # ALL VARIABLES FROM FOO ARE NOW IN GLOBAL ENV
# [1] "a" "b" "foo" "make_leaky"
# [5] "x" "y"

Creating functions in a for loop with lists

I am scratching my head at the following problem:
I am creating two functions inside a for loop with parameters that depend on some dataframe. Each function is then put inside a list.
Printing the parameters inside the for loop shows that eachh function is well defined. Yet, when I use those outside of the loop, only the last parameters are used for both functions. The following example should make that clearer.
dt <- data.frame(color = c("red", "blue"),
a = c(3,9),
b = c(1.3, 1.8))
function_list <- list()
for (col in dt$color) {
a <- dt$a[dt$color == col]
b <- dt$b[dt$color == col]
foo <- function(x) {
a*x^b
}
print(paste(col, foo(1)))
function_list[[col]] <- foo
}
[1] "red 3"
[1] "blue 9"
function_list[["red"]](1)
[1] 9
function_list[["blue"]](1)
[1] 9
To note, this is inspired from the following question: R nested for loop to write multiple functions and plot them
The equivalent solution with assign and get works (my answer to the previous question).
The relevant values of a and b are those when you call the function and not when you define it. The way you create the list, they are taken from the global environment. The solution is to create closures. I'd use Map for this, but you can do the same with a for loop:
funs <- Map(function(a, b) function(x) a*x^b, a = dt$a, b = dt$b)
print(funs)
#[[1]]
#function (x)
#a * x^b
#<environment: 0x000000000a9a4298>
#
#[[2]]
#function (x)
#a * x^b
#<environment: 0x000000000a9a3728>
Notice the different environments.
environment(funs[[1]])$a
#[1] 3
environment(funs[[2]])$a
#[1] 9
funs[[1]](1)
#[1] 3
funs[[2]](1)
#[1] 9
Your confusion will be solved by going a bit deeper with Environments
Let's check why your code doesn't work. When I try to print(function_list), you can see that both of the functions stored will return a*x^b.
# Part 1 : Why it doesn't work
# --------------------------
print(function_list)
# $red
# function (x)
# {
# a * x^b
# }
#
# $blue
# function (x)
# {
# a * x^b
# }
If you try to remove a and re-run the function, an error will be returned .
rm(a)
function_list[['red']](1)
# Error in function_list[["red"]](1) : object 'a' not found
.
And now to how to make your code work:
There is more than one way to make it work, most of which will require either playing around with your environments or changing the data structure.
One way to manage your environments - in such way that it will keep your values and not search for the variable in the global environment - is returning a function from the function.
# Part 2 : How to make it work
# ----------------------------
function_list <- list()
for (col in dt$color) {
a <- dt$a[dt$color == col]
b <- dt$b[dt$color == col]
foo1 <- function(inner.a, inner.b) {
return(function(x) {inner.a*x^inner.b})
}
foo2 <- foo1(a,b)
print(paste(col, foo2(1)))
function_list[[col]] <- foo2
}
Now, if we check what's in the function_list, you can see that the functions are in two environments
print(function_list)
# $red
# function (x)
# {
# inner.a * x^inner.b
# }
# <environment: 0x186fb40>
#
# $blue
# function (x)
# {
# inner.a * x^inner.b
# }
# <environment: 0x2536438>
The output is also as expected. And even when we remove a, it will still work as expected.
function_list[['red']](1) # 3
function_list[['blue']](1) # 9
rm(a)
function_list[['red']](1) #[1] 3
I think that the for loop does not create new environments (you can check this by print(environment) within the loop), so the values of a and b are taken by foo in the global environment where they are 9 and 1.8, i.e. their last assigned values.

Compose functions with function operators does not work as expected

In the following example I created the add_timing function operator. The input is a function (say mean) and it returns a function that does the same as mean, but reports on how long it took for the function to complete. See the following example:
library(pryr)
add_timing = function(input_function, specific_info) {
if (missing(specific_info)) specific_info = function(l) 'That'
function(...) {
relevant_value = specific_info(list(...))
start_time = Sys.time()
res = input_function(...)
cat(sprintf('%s took', relevant_value), difftime(Sys.time(), start_time, units = 'secs'), 'sec', '\n')
res
}
}
timed_mean = add_timing(mean)
# > timed_mean(runif(10000000))
# That took 0.4284899 sec
# [1] 0.4999762
Next I tried to use pryr::compose to create the same timed_mean function (I like the syntax):
timed_mean_composed = pryr::compose(add_timing, mean)
But this does get me the required output:
# > timed_mean_composed(runif(100))
# function(...) {
# relevant_value = specific_info(list(...))
# start_time = Sys.time()
# res = input_function(...)
# cat(sprintf('%s took', relevant_value), difftime(Sys.time(), start_time, units = 'secs'), 'sec', '\n')
# res
# }
It seems that the compose operation does not lead to the add_timing function actually being executed. Only after calling the function, the new timed_mean_compose actually shows the correct function output.
Based on the following example from Advanced R by #HadleyWickham I expected this to work as I used it (see below for an excerpt):
dot_every <- function(n, f) {
i <- 1
function(...) {
if (i %% n == 0) cat(".")
i <<- i + 1
f(...)
}
}
download <- pryr::compose(
partial(dot_every, 10),
memoise,
partial(delay_by, 1),
download_file
)
Where the dot_every function operator is used in the same way I use add_timing above.
What am I missing?
The difference is that in your first attempt, you are calling
(add_timing(mean))(runif(1e7)
and with the compose syntax you are calling something more similar to
add_timing(mean(runif(1e7))
These are not exactly equivalent. Actually, the pryr compose function is really expanding the syntax to something more like
x <- runif(1e7)
x <- mean(x)
x <- add_timing(x)
Maybe looking at this will help
a <- function(x) {print(paste("a:", x));x}
b <- function(x) {print(paste("b:", x));x}
x <- pryr::compose(a,b)(print("c"))
# [1] "c"
# [1] "b: c"
# [1] "a: c"
Notice how a isn't called until after b. This means that a would have no way to time b. compose would not be an appropriate way to create a timer wrapper.
The issue is that pryr::compose is aimed at doing something completely different from what you're trying to do in your initial example. You want to create a function factory (called add_timing), which will take a function as input and return a new function as output that does the same thing as the input function but with an additional time printing. I would write that as follows:
add_timing <- function(FUN) { function(...) { print(system.time(r <- FUN(...))); r }}
mean(1:5)
# [1] 3
add_timing(mean)(1:5)
# user system elapsed
# 0 0 0
# [1] 3
The compose function, by contrast, returns a function that represents a series of functions to be evaluated in sequence. The examples in ? compose are helpful here. Here's an example that builds on that:
add1 <- function(x) x + 1
times2 <- function(x) x * 2
# the following two are identical:
add1(1)
# [1] 2
compose(add1)(1)
# [1] 2
# the following two are identical:
times2(1)
# [1] 2
compose(times2)(1)
# [1] 2
compose becomes useful for nesting, when the order of nesting is important:
add1(times2(2))
# [1] 5
compose(add1, times2)(2)
# [1] 5
times2(add1(2))
# [1] 6
compose(times2, add1)(2)
# [1] 6
This means that the reason your example does not work is because your functions are not actually nested in the way that compose is intended to work. In your example, you're asking system.time to, for example, calculate the time to evaluate 3 (the output of mean) rather than the time to evaluate mean(1:5).

Resources