What factors should I consider when deciding whether or not to remove a variable that will not be used again in a function?
Here's a noddy example:
DivideByLower <- function (a, b) {
if (a > b) {
tmp <- a
a <- b
b <- tmp
remove(tmp) # When should I include this line?
}
# Return:
a / b
}
I understand that tmp will be removed when the function finishes executing, but should I ever be concerned about removing it earlier?
From Hadley Wickham's advanced R :
In some languages, you have to explicitly delete unused objects for
their memory to be returned. R uses an alternative approach: garbage
collection (or GC for short). GC automatically releases memory when an
object is no longer used. It does this by tracking how many names
point to each object, and when there are no names pointing to an
object, it deletes that object.
In the case you're describing garbage collection will release the memory.
In case the output of your function is another function, in which case Hadley names these functions respectively the function factory and the manufactured function, the variables created in the body of the function factory will be available in the enclosing environment of the manufactured function, and memory won't be freed.
More info, still in Hadley's book, can be found in the chapter about function factories.
function_factory <- function(x){
force(x)
y <- "bar"
fun <- function(z){
sprintf("x, y, and z are all accessible and their values are '%s', '%s', and '%s'",
x, y, z)
}
fun
}
manufactured_function <- function_factory("foo")
manufactured_function("baz")
#> [1] "x, y, and z are all accessible and their values are 'foo', 'bar', and 'baz'"
Created on 2019-07-08 by the reprex package (v0.3.0)
In this case, if you want to control which variables are available in the enclosing environment, or be sure you don't clutter your memory, you might want to remove unnecessary objects, either by using rm / remove as you did, or as I tend to prefer, wrapped in an on.exit statement.
Another case in which I might use rm is if I want to access variables from a parent environment without risk of them being overriden inside of the function, but in that case it's often possible and cleaner to use eval.parent.
y <- 2
z <- 3
test0 <- function(x, var){
y <- 1
x + eval(substitute(var))
}
# opps, the value of y is the one defined in the body
test0(0, y)
#> [1] 1
test0(0, z)
#> [1] 3
# but it will work using eval.parent :
test1 <- function(x, var){
y <- 1
x + eval.parent(substitute(var))
}
test1(0, y)
#> [1] 2
test1(0, z)
#> [1] 3
# in some cases (better avoided), it can be easier/quick and dirty to do something like :
test2 <- function(x, var){
y <- 1
# whatever code using y
rm(y)
x + eval(substitute(var))
}
test2(0, y)
#> [1] 2
test2(0, z)
#> [1] 3
Created on 2019-07-08 by the reprex package (v0.3.0)
Related
We can use purrr::partial to create partial functions:
f <- function(x, y) {
print(x)
print(y)
return(invisible())
}
ff <- purrr::partial(f, y = 1)
ff(2)
#> [1] 2
#> [1] 1
Created on 2020-02-19 by the reprex package (v0.3.0)
This can often be quite useful, but has the unfortunate side-effect that the partialized function loses it's signature, which is replaced with an elipsis:
ff
#> <partialised>
#> function (...)
#> f(y = 1, ...)
While programatically irrelevant, this leads to worse code legibility during development, where RStudio's "intellisense" can no longer aid us in remembering the names and/or order of arguments. So is there some other means of partializing which keeps the original signature (minus the partialized-away arguments), as below?
ff
#> <partialised>
#> function (x)
#> f(y = 1, x)
Now, obviously this can be done manually, by defining a new function ff which is simply a wrapper around f with the desired arguments.
ff <- function(x) f(x, y = 1)
But this means any modifications to the signature of f need to be replicated to ff. So is there a "cleaner" way of partializing while keeping the signature?
One option is to use rlang::fn_fmls() (or base::formals() equivalent) to explicitly give default values to the function arguments:
# If desired, create a copy of the function first: ff <- f
rlang::fn_fmls(f) <- purrr::list_modify( rlang::fn_fmls(f), y=1 )
args(f)
# function (x, y = 1)
f(2)
# [1] 2
# [1] 1
How R interpret line : arg.list <- list(x, y) in below definition of function, does it copy x and y into arg.list object when execution happen or they are passed by reference ?
fplot <- function(x, y, add=FALSE){
arg.list <- list(x, y)
if(!add){
plot(arg.list))
}else{
lines(arg.list)
}
}
The variables are embedded into the list by reference (at least if you use vectors).
Proof:
library(pryr)
x <- 1:100
y <- 201:200
arg.list <- list(x,y)
al.x <- arg.list[[1]]
al.y <- arg.list[[2]]
Now look at the memory addresses (they are the same):
> address(x)
[1] "0x37598c0"
> address(y)
[1] "0x40fd6f8"
> address(al.x)
[1] "0x37598c0"
> address(al.y)
[1] "0x40fd6f8"
If you change one item a copy will be created ("copy on modification"):
> x[1]=42
> address(x)
[1] "0x417a470"
> al.x <- arg.list[[1]]
> address(al.x)
[1] "0x37598c0"
Edit:
As #HongOoi said: R semantically never uses references (except for objects in the environment class) but copies for variables. It "is clever enough to avoid copies until they are really required" ("copy on [first] modification"). Function parameters are passed "by value" semantically (even though references are used until a modification occurs).
The semantics of R is that function arguments are always passed by value. The underlying implementation may not necessarily make new copies of the arguments, so as to save memory. But your function will behave as if it has brand-new copies to work with.
This means you don't have to worry about changing a variable outside a function because you changed it inside:
x <- 1
f <- function(z) {
z <- z + 1
z
}
y <- f(x)
print(y) # y now contains 2
print(x) # but x still contains 1
If R was pass-by-reference, then modifying the argument of f would also modify the variable that was passed in. This doesn't happen.
I searched for a reference to learn about replacement functions in R, but I haven't found any yet. I'm trying to understand the concept of the replacement functions in R. I have the code below but I don't understand it:
"cutoff<-" <- function(x, value){
x[x > value] <- Inf
x
}
and then we call cutoff with:
cutoff(x) <- 65
Could anyone explain what a replacement function is in R?
When you call
cutoff(x) <- 65
you are in effect calling
x <- "cutoff<-"(x = x, value = 65)
The name of the function has to be quoted as it is a syntactically valid but non-standard name and the parser would interpret <- as the operator not as part of the function name if it weren't quoted.
"cutoff<-"() is just like any other function (albeit with a weird name); it makes a change to its input argument on the basis of value (in this case it is setting any value in x greater than 65 to Inf (infinite)).
The magic is really being done when you call the function like this
cutoff(x) <- 65
because R is parsing that and pulling out the various bits to make the real call shown above.
More generically we have
FUN(obj) <- value
R finds function "FUN<-"() and sets up the call by passing obj and value into "FUN<-"() and arranges for the result of "FUN<-"() to be assigned back to obj, hence it calls:
obj <- "FUN<-"(obj, value)
A useful reference for this information is the R Language Definition Section 3.4.4: Subset assignment ; the discussion is a bit oblique, but seems to be the most official reference there is (replacement functions are mentioned in passing in the R FAQ (differences between R and S-PLUS), and in the R language reference (various technical issues), but I haven't found any further discussion in official documentation).
Gavin provides an excellent discussion of the interpretation of the replacement function. I wanted to provide a reference since you also asked for that: R Language Definition Section 3.4.4: Subset assignment.
As a complement to the accepted answer I would like to note that replacement functions can be defined also for non standard functions, namely operators (see ?Syntax) and control flow constructs. (see ?Control).
Note also that it is perfectly acceptable to design a generic and associated methods for replacement functions.
operators
When defining a new class it is common to define S3 methods for $<-, [[<- and [<-, some examples are data.table:::`$<-.data.table`, data.table:::`[<-.data.table`, or tibble:::`$.tbl_df`.
However for any other operator we can write a replacement function, some examples :
`!<-` <- function(x, value) !value
x <- NULL # x needs to exist before replacement functions are used!
!x <- TRUE
x
#> [1] FALSE
`==<-` <- function(e1, e2, value) replace(e1, e1 == e2, value)
x <- 1:3
x == 2 <- 200
x
#> [1] 1 200 3
`(<-` <- function(x, value) sapply(x, value, USE.NAMES = FALSE)
x <- c("foo", "bar")
(x) <- toupper
x
#> [1] "FOO" "BAR"
`%chrtr%<-` <- function(e1, e2, value) {
chartr(e2, value, e1)
}
x <- "woot"
x %chrtr% "o" <- "a"
x
#> [1] "waat"
we can even define <-<-, but the parser will prevent its usage if we call x <- y <- z, so we need to use the left to right assignment symbol
`<-<-` <- function(e1, e2, value){
paste(e2, e1, value)
}
x <- "b"
"a" -> x <- "c"
x
#> [1] "a b c"
Fun fact, <<- can have a double role
x <- 1:3
x < 2 <- NA # this fails but `<<-` was called!
#> Error in x < 2 <- NA: incorrect number of arguments to "<<-"
# ok let's define it then!
`<<-` <- function(x, y, value){
if (missing(value)) {
eval.parent(substitute(.Primitive("<<-")(x, y)))
} else {
replace(x, x < y, value)
}
}
x < 2 <- NA
x
#> [1] NA 2 3
x <<- "still works"
x
#> [1] "still works"
control flow constructs
These are in practice seldom encountered (in fact I'm responsible for the only practical use I know, in defining for<- for my package pbfor), but R is flexible enough, or crazy enough, to allow us to define them. However to actually use them, due to the way control flow constructs are parsed, we need to use the left to right assignment ->.
`repeat<-` <- function(x, value) replicate(value, x)
x <- "foo"
3 -> repeat x
x
#> [1] "foo" "foo" "foo"
function<-
function<- can be defined in principle but to the extent of my knowledge we can't do anything with it.
`function<-` <- function(x,value){NULL}
3 -> function(arg) {}
#> Error in function(arg) {: target of assignment expands to non-language object
Remember, in R everything operation is a function call (therefore also the assignment operations) and everything that exists is an object.
Replacement functions act as if they modify their arguments in place such as in
colnames(d) <- c("Input", "Output")
They have the identifier <- at the end of their name and return a modified copy of the argument object (non-primitive replacement functions) or the same object (primitive replacement functions)
At the R prompt, the following will not work:
> `second` <- function(x, value) {
+ x[2] <- value
+ x
+ }
> x <- 1:10
> x
[1] 1 2 3 4 5 6 7 8 9 10
> second(x) <- 9
Error in second(x) <- 9: couldn't find function "second<-"
As you can see, R is searching the environment not for second but for second<-.
So lets do the same thing but using such a function identifier instead:
> `second<-` <- function(x, value) {
+ x[2] <- value
+ x
+ }
Now, the assignment at the second position of the vector works:
> second(x) <- 9
> x
[1] 1 9 3 4 5 6 7 8 9 10
I also wrote a simple script to list all replacement functions in R base package, find it here.
I have defined a function called once as follows:
once <- function(x, value) {
xname <- deparse(substitute(x))
if(!exists(xname)) {
assign(xname, value, env=parent.frame())
}
invisible()
}
The idea is that value is time-consuming to evaluate, and I only want to assign it to x the first time I run a script.
> z
Error: object 'z' not found
> once(z, 3)
> z
[1] 3
I'd really like the usage to be once(x) <- value rather than once(x, value), but if I write a function once<- it gets upset that the variable doesn't exist:
> once(z) <- 3
Error in once(z) <- 3 : object 'z' not found
Does anyone have a way around this?
ps: is there a name to describe functions like once<- or in general f<-?
If you are willing to modify your requirements slightly to use square brackets rather than parentheses then you could do this:
once <- structure(NA, class = "once")
"[<-.once" <- function(once, x, value) {
xname <- deparse(substitute(x))
pf <- parent.frame()
if (!exists(xname, pf)) assign(xname, value, pf)
once
}
# assigns 3 to x (assuming x does not currently exist)
once[x] <- 3
x # 3
# skips assignment (since x now exists)
once[x] <- 4
x # 3
As per item 3.4.4 in the R Language Reference, something like a names replacement is evaluated like this:
`*tmp*` <- x
x <- "names<-"(`*tmp*`, value=c("a","b"))
rm(`*tmp*`)
This is bad news for your requirement, because the assignment will fail on the first line (as x is not found), and even if it would work, your deparse(substitute) call will never evaluate to what you want it to.
Sorry to disappoint you
I can create a compose operator in R:
`%c%` = function(x,y)function(...)x(y(...))
To be used like this:
> numericNull = is.null %c% numeric
> numericNull(myVec)
[2] TRUE FALSE
but I would like to know if there is an official set of functions to do this kind of thing and other operations such as currying in R. Largely this is to reduce the number of brackets, function keywords etc in my code.
My curry function:
> curry=function(...){
z1=z0=substitute(...);z1[1]=call("list");
function(...){do.call(as.character(z0[[1]]),
as.list(c(eval(z1),list(...))))}}
> p = curry(paste(collapse=""))
> p(letters[1:10])
[1] "abcdefghij"
This is especially nice for e.g. aggregate:
> df = data.frame(l=sample(1:3,10,rep=TRUE), t=letters[1:10])
> aggregate(df$t,df["l"],curry(paste(collapse="")) %c% toupper)
l x
1 1 ADG
2 2 BCH
3 3 EFIJ
Which I find much more elegant and editable than:
> aggregate(df$t, df["l"], function(x)paste(collapse="",toupper(x)))
l x
1 1 ADG
2 2 BCH
3 3 EFIJ
Basically I want to know - has this already been done for R?
Both of these functions actually exist in the roxygen package (see the source code here) from Peter Danenberg (was originally based on Byron Ellis's solution on R-Help):
Curry <- function(FUN,...) {
.orig = list(...);
function(...) do.call(FUN,c(.orig,list(...)))
}
Compose <- function(...) {
fs <- list(...)
function(...) Reduce(function(x, f) f(x),
fs,
...)
}
Note the usage of the Reduce function, which can be very helpful when trying to do functional programming in R. See ?Reduce for more details (which also covers other functions such as Map and Filter).
And your example of Curry (slightly different in this usage):
> library(roxygen)
> p <- Curry(paste, collapse="")
> p(letters[1:10])
[1] "abcdefghij"
Here's an example to show the utility of Compose (applying three different functions to letters):
> Compose(function(x) x[length(x):1], Curry(paste, collapse=""), toupper)(letters)
[1] "ZYXWVUTSRQPONMLKJIHGFEDCBA"
And your final example would work like this:
> aggregate(df[,"t"], df["l"], Compose(Curry(paste, collapse=""), toupper))
l x
1 1 ABG
2 2 DEFH
3 3 CIJ
Lastly, here's a way to do the same thing with plyr (could also easily be done with by or aggregate as already shown):
> library(plyr)
> ddply(df, .(l), function(df) paste(toupper(df[,"t"]), collapse=""))
l V1
1 1 ABG
2 2 DEFH
3 3 CIJ
The standard place for functional programming in R is now the functional library.
From the library:
functional: Curry, Compose, and other higher-order functions
Example:
library(functional)
newfunc <- Curry(oldfunc,x=5)
CRAN:
https://cran.r-project.org/web/packages/functional/index.html
PS: This library substitutes the ROxigen library.
There is a function called Curry in the roxygen package.
Found via this conversation on the R Mail Archive.
A more complex approach is required if you want the 'names' of the variables to pass through accurately.
For example, if you do plot(rnorm(1000),rnorm(1000)) then you will get nice labels on your x- and y- axes. Another example of this is data.frame
> data.frame( rnorm(5), rnorm(5), first=rpois(5,1), second=rbinom(5,1,0.5) )
rnorm.5. rnorm.5..1 first second
1 0.1964190 -0.2949770 0 0
2 0.4750665 0.8849750 1 0
3 -0.7829424 0.4174636 2 0
4 1.6551403 1.3547863 0 1
5 1.4044107 -0.4216046 0 0
Not that the data.frame has assigned useful names to the columns.
Some implementations of Curry may not do this properly, leading to unreadable column names and plot labels. Instead, I now use something like this:
Curry <- function(FUN, ...) {
.orig = match.call()
.orig[[1]] <- NULL # Remove first item, which matches Curry
.orig[[1]] <- NULL # Remove another item, which matches FUN
function(...) {
.inner = match.call()
.inner[[1]] <- NULL # Remove first item, which matches Curry
do.call(FUN, c(.orig, .inner), envir=parent.frame())
}
}
This is quite complex, but I think it's correct. match.call will catch all args, fully remembering what expressions defined the args (this is necessary for nice labels). The problem is that it catches too many args -- not just the ... but also the FUN. It also remembers the name of the function that's being called (Curry).
Therefore, we want to delete these first two entries in .orig so that .orig really just corresponds to the ... arguments. That's why we do .orig[[1]]<-NULL twice - each time deletes an entry and shifts everything else to the left.
This completes the definition and we can now do the following to get exactly the same as above
Curry(data.frame, rnorm(5), rnorm(5) )( first=rpois(5,1) , second=rbinom(5,1,0.5) )
A final note on envir=parent.frame(). I used this to ensure that there won't be a problem if you have external variables called '.inner' or '.orig'. Now, all variables are evaluated in the place where the curry is called.
in package purrr ,now there is a function partial
If you are already using the purrr package from tidyverse, then purrr::partial is a natural choice to curry functions. From the description of purrr::partial:
# Partial is designed to replace the use of anonymous functions for
# filling in function arguments. Instead of:
compact1 <- function(x) discard(x, is.null)
# we can write:
compact2 <- partial(discard, .p = is.null)