Attempting pass by reference in R

I want to emulate call by reference in R and in my search came across this link https://www.r-bloggers.com/call-by-reference-in-r/.
Using the strategy given in the link above I tried to create a function that would modify the integer vector passed to it as well as return the modified vector. Here's its implementation
library(purrr)
fun = function(top){
stopifnot(is_integer(top))
top1 <- top
top1 <- c(top1,4L)
eval.parent(substitute(top<-top1))
top1
}
When I create a variable and pass to this function, it works perfectly as shown
> k <- c(9L,5L)
> fun(k)
[1] 9 5 4
> k
[1] 9 5 4
But when I pass the integer vector directly, it throws an error:
> fun(c(3L,4L))
Error in c(3L, 4L) <- c(3L, 4L, 4L) :
target of assignment expands to non-language object
Is there a workaround for this situation, so that when a vector is passed directly the function only returns the modified vector as the result?
Any help would be appreciated...

There is no workaround for this. You've essentially created a function that takes a variable name as input and modifies that variable as a side effect of running. Because c(3L,4L) is not a variable name, the function cannot work as intended.
To be clear, what you have right now is not really pass-by-reference. Your function resembles it superficially, but it in fact uses a workaround that evaluates an expression in the function's parent environment instead of its own. This type of "operation by side effect" is generally considered bad practice (such changes are hard to track and debug, and prone to error), and R is built to avoid it.
Pass-by-reference in R is generally not possible, nor have I found it necessary in over a decade of daily R use.
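To see why the direct call fails, it can help to look at the expression that substitute() builds before eval.parent() evaluates it. The following is a minimal sketch; show_expr is a hypothetical helper that mirrors fun from the question but returns the unevaluated assignment instead of running it.

```r
# Hypothetical helper: same construction as fun(), but the assignment
# expression is returned instead of being evaluated in the caller.
show_expr <- function(top) {
  top1 <- c(top, 4L)
  substitute(top <- top1)
}

k <- c(9L, 5L)
e1 <- show_expr(k)           # k <- c(9L, 5L, 4L)
e2 <- show_expr(c(3L, 4L))   # c(3L, 4L) <- c(3L, 4L, 4L)

is.name(e1[[2]])  # TRUE: the target is a variable name, so assignment can work
is.call(e2[[2]])  # TRUE: the target is a call to c(), which cannot be assigned to
```

The left-hand side of the generated assignment is whatever the caller typed as the argument, which is why only a variable name makes sense there.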


use of $ and () in same syntax?

I'm certain there is a really basic answer to this, which is possibly why I'm finding it hard to actually search for and find an answer. But... can somebody please explain exactly what it means to combine $ and () in the same syntax in R?
For example from this vignette:
https://cran.r-project.org/web/packages/pivottabler/vignettes/v00-vignettes.html
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$renderPivot()
I never encountered this while learning R until now, years later. I'm seeing it more and more lately, but it is not intuitive to me.
$ is usually used to access sub-structures of objects in R, like columns of a data frame, e.g. dataframe$column1, while () is usually used to enclose the arguments of a named function, e.g. rnorm(10,0,1).
What does it mean when they are used together? e.g. x$y(z)
The dollar sign is a generic operator used to extract or replace parts of recursive objects, such as lists and data frames.
A list is an object consisting of an ordered collection of objects (including other lists), possibly of different types, called components.
Consider the following list:
L <- list(a = 1, f = function() message("hello"))
This is a list with two components: a and f.
The first is a number and the second is a function. By applying the $-operator, you extract the value of the component, which can also be reassigned:
L$a
# 1
L$a <- 2
L$a
# 2
In the case of the f component, because it is a function, extracting it gives you the function itself, which prints as its definition:
L$f
# function() message("hello")
This is in line with any function identifier: its value is the function object. Not surprisingly, applying parentheses to that value calls the function, that is:
L$f()
# hello
This opens the doors to very powerful structures, where you can store both data and the functions to manipulate them.
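As a small illustration of the x$y(z) form, here is a sketch of a list that holds both a value and a function operating on such values. The names account, balance and describe are made up for this example.

```r
# A plain list storing data (balance) and a function (describe).
account <- list(
  balance = 100,
  describe = function(amount) paste("Balance:", amount)
)

# account$describe extracts the function; the parentheses then call it
# with account$balance as its argument.
account$describe(account$balance)
# "Balance: 100"
```

Nothing special happens when $ and () are combined: the extraction runs first and yields a function, and the call is applied to that result.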
This logic resembles the classes used in the OOP world. Of course, you need many more features, such as instantiation and inheritance. Such mechanisms are provided, for example, by the R6 package, which you mention in your tag.
library(R6)
A <- R6Class("A", list(f=function() message("hello") ))
a <- A$new()
a$f()
# hello
A is an R6 class, so A$new() creates a new instance of the class, a, by means of the class function new. As you can see, this function is called using a syntax (and a logic) similar to L$f() above. The instance a inherits the class function f, called a method here, and a$f() executes it.

Can someone please explain this code to me, especially the role of "function(x)" and "[[x]]"?

This is code in R, and I'm having trouble understanding the role of function(x) and qdata[[x]] in this line of code. Can someone explain it to me piece by piece? I didn't write this code. Thank you
outs=lapply(names(qdata[,12:35]), function(x)
hist(qdata[[x]],data=qdata,main="Histogram of Quality Trait",
xlab=as.character(x),las=1.5)$out)
This code generates a series of histograms, one for each of columns 12 to 35 of the dataframe qdata. The lapply function iterates over the column names. At each iteration, the name of the current column is passed as the argument "x" to the anonymous function defined by "function(x)". The body of the function is a call to the hist() function, which creates the histogram. qdata[[x]] (where x is the name of a column) extracts the data from that column. I am actually confused by "data=qdata".
We don't have the data object named qdata, so we cannot really be sure what will happen with this code. It appears that the author of this code is trying to collect the values of components named out from the results of the calls to hist. If qdata is an ordinary dataframe, then I suspect that this code will fail in that goal, because the value returned by hist does not have an out component (look at the output of ?hist). When I run this with a simple dataframe, I do get histogram plots that appear in my interactive plotting device, but I get NULL values for the out components. Furthermore, the 12 warnings are caused by the fact that the hist function has no data parameter.
qdata <- data.frame(a=rnorm(10), b=rnorm(10))
outs=lapply(names(qdata), function(x)
hist(qdata[[x]],data=qdata,main="Histogram of Quality Trait",
xlab=as.character(x),las=1.5)$out)
#There were 12 warnings (use warnings() to see them)
> str(outs)
List of 2
$ : NULL
$ : NULL
So I think we need to be concerned about the level of R knowledge of the author of this code. It's possible I'm wrong about this presumption. The hist function is generic, and it is possible that some unreferenced package has a method designed to handle a data argument and return an out value when given a vector of a particular class. In a typical starting situation with only the base packages loaded, however, there are only three hist.* methods:
methods(hist)
#[1] hist.Date* hist.default hist.POSIXt*
#see '?methods' for accessing help and source code
As for the questions about the role of function and [[x]]: the keyword function creates a function object (a closure) that can receive parameter values, perform operations, and finally return a result. In this case the names get passed to the anonymous function and become, each in turn, the value of the local name x, and that value is used by the '[[' function to look up the column in what I am presuming is the 'qdata' dataframe.
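The same pattern can be tried without any plotting. The sketch below assumes a small made-up dataframe df and computes a column mean instead of a histogram, so the roles of function(x) and df[[x]] are easier to see.

```r
# Made-up stand-in for qdata.
df <- data.frame(a = c(1, 2, 3), b = c(2.5, 3.5, 4.5))

# names(df) is c("a", "b"); lapply calls the anonymous function once per
# name, binding that name to x. df[[x]] then extracts the column as a vector.
col_means <- lapply(names(df), function(x) mean(df[[x]]))

str(col_means)
# List of 2
#  $ : num 2
#  $ : num 3.5
```

Iterating over names rather than over the columns themselves is what makes the column name available inside the function, which the original code uses for the xlab label.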

R super assignment vector

I have a function in which I use the superassignment operator to update a variable in the global environment. This works fine as long as it is a single value, e.g.
a <<- 3
However I get errors with subsets of data frames and data tables e.g.
a <- c(1,2,3)
a[3] <<- 4
Error in a[3] <<- 4 : object 'a' not found
Any idea why this is and how to solve it?
Thanks!
The superassignment operator and other scope-breaking techniques should be avoided if at all possible, in particular because they make for unclear code and confusing situations like these. But if you really, truly had to assign values to a variable that is out of scope, you could use standard assignment inside eval:
a <- c(1,2,3)
eval(quote(a[3] <- 4), envir = globalenv())  # quote() delays evaluation so that envir takes effect
a
[1] 1 2 4
To generalize this further (if performing the assignment inside a function), you may need to use <<- inside eval anyway.
While changing variables out of scope is still a bad idea, using eval at least makes the operation more explicit, since you have to specify the environment in which the expression is to be evaluated.
All that said, scope-breaking assignments are never necessary, per se, and you should perhaps find a way to write your script such that this is not relied on.
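For reference, here is a minimal sketch of how the subset form of <<- behaves. It assumes that the error in the question came from running a[3] <<- 4 at the top level, where <<- starts its search for a in the parent of the global environment and so does not see the global a. The function name update_third is hypothetical.

```r
a <- c(1, 2, 3)

# Inside a function, <<- starts looking for `a` in the function's
# enclosing environment, finds it there, and updates it in place.
update_third <- function() {
  a[3] <<- 4
  invisible(NULL)
}

update_third()
a
# [1] 1 2 4
```

A version that takes the vector as an argument and returns the modified copy (a <- update(a)) avoids the scoping question entirely and is usually easier to follow.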

returning functions in R - when does the binding occur?

As in other functional languages, returning a function is a common case in R. For example, after training a model you'd like to return a "predictor" object, which is essentially a function that, given new data, returns predictions. There are other cases when this is useful, of course.
My question is when the binding (i.e. evaluation) of values within the returned function occurs.
As a simple example, suppose I want to have a list of three functions, each is slightly different based on a parameter whose value I set at the time of the creation of the function. Here is a simple code for this:
function.list = list()
for (i in 1:3) function.list[[i]] = function(x) x+i
So now I have three functions. Ideally, the first one returns x+1, the second computes x+2 and the third computes x+3
so I would expect:
function.list[[1]] (3) = 4
function.list[[2]] (3) = 5
etc.
Unfortunately, this doesn't happen and all the functions in the list above compute the same x+3. My question is why? Why does the binding of the value of i happen so late, and hence end up the same for all the functions in the list? How can I work around this?
EDIT:
rawr's link to a similar question was insightful, and I thought it solved the problem. Here is the link:
Explain a lazy evaluation quirk
However, I checked the code I gave above with the fix suggested there, and it still doesn't work. Certainly, I am missing something very basic here. Can anyone tell me what it is? Here is the "fixed" code (that still doesn't work):
function.list = list()
for (i in 1:3) { force(i); function.list[[i]] = function(x) x+i}
Still function.list[[1]] (3) gives 6 and not 4 as expected.
I also tried the following (i.e. putting the force() inside the function)
function.list = list()
for (i in 1:3) function.list[[i]] = function(x) {force(i);x+i}
what's going on?
Here's a solution with a for loop, using R 3.1:
> makeadd=function(i){force(i);function(x){x+i}}
> for (i in 1:3) { function.list[[i]] = makeadd(i)}
> rm(i) # not necessary but causes errors if we're actually using that `i`
> function.list[[1]](1)
[1] 2
> function.list[[2]](1)
[1] 3
The makeadd function creates the adding function in a context with a local i, which is why this works. It would be interesting to know if this works without the force in R 3.2. I always use the force, Luke....
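The same idea can be sketched without a for loop. In the version below, lapply calls the outer anonymous function once per value, so each returned closure gets its own environment with its own i; the name adders is made up for this example.

```r
# Each call of the outer function creates a fresh environment holding i.
# force(i) evaluates the argument immediately, so the closure captures
# the value it was created with.
adders <- lapply(1:3, function(i) {
  force(i)
  function(x) x + i
})

adders[[1]](3)
# [1] 4
adders[[3]](3)
# [1] 6
```

In the original loop there is only one i, living in the global environment, and all three functions look it up there when they are called, which is why they all see its final value.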

R apply with custom functions

I'm having trouble applying a custom function in R. The basic setup is I have a bunch of points, I want to drop a grid over top and get the max z value from each cell.
The code I'm trying is below. The result I'm looking for would be myGrid$z=c(5,10,NA). The NA could be a different value as well, as long as I could filter it out later. I'm getting an error at the apply stage.
I believe there is an error in how I'm using apply, but I just haven't been able to get my head wrapped around apply.
thanks,
Gordon
myPoints<-data.frame(x=c(0.7,0.9,2),y=c(0.5,0.7,3), z=c(5,3,10))
myGrid<-data.frame(x=c(0.5,2,4),y=c(0.5,3,10))
grid_spacing = 1
get_max_z<-function(x,y) {
z<-max(myPoints$z[myPoints$x > (x-grid_spacing/2)
& myPoints$x <= (x+grid_spacing/2)
& myPoints$y > (y-grid_spacing/2)
& myPoints$y <= (y+grid_spacing/2)])
return(z)
}
myGrid$z<-apply(myGrid,1,get_max_z(x,y),x=myGrid$x,y=myGrid$y)
Edited to include the return(z) line I left out. added $y to line of custom function above return.
First of all, I would recommend always boiling a question down to its core instead of just posting code. But I think I know what your problem is:
> df <- data.frame(x = c(1,2,3), y = c(2,1,5))
> f <- function(x,y) {x+y}
> apply(df,1,function(d)f(d["x"],d["y"]))
[1] 3 3 8
apply(df,1,.) will traverse df row-wise and hand the current row as an argument to the provided function. This row is a vector and is passed into the anonymous function via its only argument, d. Now you can access the elements of the vector and hand them further down to your custom function f, which takes two parameters.
I think that if you understand this small piece of code, then you will know how to adjust it for your case.
UPDATE:
Essentially you make two mistakes:
1. You hand a function call instead of a function to apply.
   function call: get_max_z(x,y)
   function: function(x,y)get_max_z(x,y)
2. You misinterpreted the meaning of "..." in the manual for apply as the way to hand over the arguments. But actually this is just the way to pass additional arguments independent of the traversed data object.
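Applied to the data from the question, the two points above might look like the following sketch. The check for empty cells that returns NA is an addition not present in the original function; it is there because max() of an empty vector gives -Inf with a warning.

```r
myPoints <- data.frame(x = c(0.7, 0.9, 2), y = c(0.5, 0.7, 3), z = c(5, 3, 10))
myGrid <- data.frame(x = c(0.5, 2, 4), y = c(0.5, 3, 10))
grid_spacing <- 1

get_max_z <- function(x, y) {
  half <- grid_spacing / 2
  # TRUE for every point that falls inside the cell centred on (x, y).
  in_cell <- myPoints$x > x - half & myPoints$x <= x + half &
    myPoints$y > y - half & myPoints$y <= y + half
  # Added for this sketch: report NA for cells that contain no points.
  if (!any(in_cell)) return(NA_real_)
  max(myPoints$z[in_cell])
}

# Pass a function (not a call) to apply; each row arrives as a named vector.
myGrid$z <- apply(myGrid, 1, function(row) get_max_z(row["x"], row["y"]))
myGrid$z
# [1]  5 10 NA
```

Since the function takes the two columns as separate arguments, mapply(get_max_z, myGrid$x, myGrid$y) would be another way to write the last step.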
