Static Variables in R

I have a function in R that I call multiple times. I want to keep track of the number of times that I've called it and use that to make decisions on what to do inside of the function. Here's what I have right now:
f = function( x ) {
count <<- count + 1
return( mean(x) )
}
count = 1
numbers = rnorm( n = 100, mean = 0, sd = 1 )
for ( x in seq(1,100) ) {
mean = f( numbers )
print( count )
}
I don't like that I have to declare the variable count outside the scope of the function. In C or C++ I could just make a static variable. Can I do a similar thing in the R programming language?

Here's one way by using a closure (in the programming language sense), i.e. store the count variable in an enclosing environment accessible only by your function:
make.f <- function() {
count <- 0
f <- function(x) {
count <<- count + 1
return( list(mean=mean(x), count=count) )
}
return( f )
}
f1 <- make.f()
result <- f1(1:10)
print(c(result$count, result$mean)) # combine both values into one vector before printing
result <- f1(1:10)
print(c(result$count, result$mean))
f2 <- make.f() # f2 gets its own, independent counter
result <- f2(1:10)
print(c(result$count, result$mean))
result <- f2(1:10)
print(c(result$count, result$mean))

Here is another approach. This one requires less typing and is (in my opinion) more readable:
f <- function(x) {
y <- attr(f, "sum")
if (is.null(y)) {
y <- 0
}
y <- x + y
attr(f, "sum") <<- y
return(y)
}
This snippet, as well as a more complex example of the concept, can be found in this R-Bloggers article

It seems the right answer was given by G. Grothendieck in "Emulating static variable within R functions". Since this post ranks higher in Google search, I am copying that answer here:
Define f within a local like this:
f <- local({
static <- 0
function() { static <<- static + 1; static }
})
f()
## [1] 1
f()
## [1] 2
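Applied to the original question, the same pattern lets the function use its own call count to decide what to do. The following is only a minimal sketch, not code from any of the answers: the every-third-call rule is made up for illustration, and environment(f)$count is used purely to inspect the hidden counter.

```r
# The counter lives in the environment created by local(), not in the global workspace.
f <- local({
  count <- 0
  function(x) {
    count <<- count + 1          # updates 'count' in the enclosing environment
    if (count %% 3 == 0) {       # hypothetical rule: react on every third call
      message("call number ", count)
    }
    mean(x)
  }
})

numbers <- rnorm(n = 100, mean = 0, sd = 1)
for (i in 1:5) m <- f(numbers)

environment(f)$count             # look at the counter without making it global
```

Because the counter is reachable only through the function's environment, other code cannot overwrite it by accident the way it could with a global variable.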

Related

Alternative to writing to global variable from within function

I've got a bit of code that works, but which I understand relies on bad practice to do so. To use a simple representation of the problem, take the following code:
operation <- function(index){
a <- 0
if(data[index] == FALSE){
data[index] <<- TRUE
a <- a + 1}
a <- a + 1
return(a)
}
data <- c(FALSE, FALSE, FALSE)
x <- 0
x <- x + operation(sample(c(1,2,3),1))
x <- x + operation(sample(c(1,2,3),1))
x <- x + operation(sample(c(1,2,3),1))
x
The "operation" function has two purposes - firstly, to output 2 if the value specified by the inputs is FALSE or 1 if TRUE, and importantly to change the input to TRUE so that future calls of the same input return 1.
The problems with this are that the operation function references a global variable which I know for my use case will always exist, but hypothetically may not, and that the function writes to the global variable with the <<- command, which I understand is incredibly bad practice.
Is there a better-practice way to achieve the same functionality without the function writing to the global variable?
By design, an R function returns only one object. To return several values, store them in a list; here the function takes data as an argument and returns both a and the updated data, which is then passed back in on the next call.
operation <- function(index, data){
a <- 0
if(data[index] == FALSE) {
data[index] <- TRUE
a <- a + 1}
a <- a + 1
return(list(a = a, data = data))
}
data <- c(FALSE, FALSE, FALSE)
x <- 0
set.seed(999)
res <- operation(sample(1:3, 1), data)
x <- x + res$a
res <- operation(sample(1:3, 1), res$data)
x <- x + res$a
res <- operation(sample(1:3, 1), res$data)
x <- x + res$a
x
#> [1] 5
res$data
#> [1] TRUE FALSE TRUE
Another option would be to create an R6 object that has two bindings, x and data, and modify them through self-references.
We can use object oriented programming (OOP). Compare this to using lists in another answer to see the increased clarity of using OOP once the object has been defined -- the actual code which runs the op method hardly changes from the question. 1a, 2 and 3 do not require any addon packages.
1) proto First we use the proto package for OOP. proto objects are environments with certain added methods. Here p is a proto object that contains data and also a method op. Note that with proto we can avoid the use of <<- and unlike class-based object oriented systems proto allows definitions of objects, here p is an object, without needing classes.
library(proto)
p <- proto(op = function(., index) {
a <- 0
if( ! .$data[index] ) {
.$data[index] <- TRUE
a <- a + 1
}
a <- a + 1
return(a)
})
p$data <- c(FALSE, FALSE, FALSE)
x <- 0
x <- x + p$op(sample(c(1,2,3),1))
x <- x + p$op(sample(c(1,2,3),1))
x
p$data
1a) A variation of this is to just use plain environments.
e <- local({
op <- function(index) {
a <- 0
if( ! data[index] ) {
data[index] <<- TRUE
a <- a + 1
}
a <- a + 1
return(a)
}
environment()
})
e$data <- c(FALSE, FALSE, FALSE)
x <- 0
x <- x + e$op(sample(c(1,2,3),1))
x <- x + e$op(sample(c(1,2,3),1))
x
e$data
2) Reference Classes Reference classes for OOP come with R and do not require any packages. This may be overkill since it requires creating a class which only ever instantiates one object whereas with proto we can directly generate an object without this extra step.
MyClass <- setRefClass("MyClass", fields = "data",
methods = list(
op = function(index) {
a <- 0
if( ! data[index] ) {
data[index] <<- TRUE
a <- a + 1
}
a <- a + 1
return(a)
}
)
)
obj <- MyClass$new(data = c(FALSE, FALSE, FALSE))
x <- 0
x <- x + obj$op(sample(c(1,2,3),1))
x <- x + obj$op(sample(c(1,2,3),1))
x
obj$data
3) scoping It is possible to devise a poor man's OOP system that works with R by making use of function scoping. Try demo(scoping) for another example. This also does not require any packages. It does have the disadvantage of (2) that it requires the definition of a class which is only used once.
cls <- function(data = NULL) {
list(
put_data = function(x) data <<- x,
get_data = function() data,
op = function(index) {
a <- 0
if( ! data[index] ) {
data[index] <<- TRUE
a <- a + 1
}
a <- a + 1
return(a)
}
)
}
obj <- cls(data = c(FALSE, FALSE, FALSE))
x <- 0
x <- x + obj$op(sample(c(1,2,3),1))
x <- x + obj$op(sample(c(1,2,3),1))
x
obj$get_data()
4) You can also explore R6, R.oo and oops which are other CRAN packages that implement OOP in R.

How to create a list of functions from a list of parameters?

I want to create several functions using the parameters and function names stored in a data frame.
The for loop did not return what I was expecting, i.e. each function containing the intercept and slope parameters from its row in the data frame.
data <- data.frame(name = c("A","B","C"), intercepts = c(1,0.5,4), slopes = c(0.1, -2,4))
> data
name intercepts slopes
1 A 1.0 0.1
2 B 0.5 -2.0
3 C 4.0 4.0
for(i in data$name){
assign(i, function(x){force(i);
data[data$name==i,]$intercepts + data[data$name==i,]$slopes*x}
)
}
I know the problem has something to do with the scope, but I could not fix it using "force" as recommended by some users.
> A(1)
[1] 8
> B(1)
[1] 8
> C(1)
[1] 8
I messed around with it a little bit and I do not think you can get what you want because of R's weird scope rules. Maybe try a function factory instead?
data <- data.frame(name = c("A","B","C"), intercepts = c(1,0.5,4), slopes = c(0.1, -2,4))
factory <- function(data, i) {
function(x) {
data[i,]$intercepts + data[i,]$slopes*x
}
}
factory(data, 1)(1) #> 1.1
A <- factory(data, 1)
A(1) #> 1.1
Or you could write code that just takes in data, i, and x and calculates the value outright. To be honest, what you're asking for seems weirdly unidiomatic.
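The loop in the question fails because every function it creates shares the global environment, and force(i) sits inside the function body, so it only runs when the function is called; by then i is already "C". A factory avoids this when each call gets its own environment and its arguments are forced at creation time. A minimal sketch, assuming the data frame from the question; make_line and line_fns are illustrative names, not from the thread:

```r
data <- data.frame(name = c("A", "B", "C"),
                   intercepts = c(1, 0.5, 4),
                   slopes = c(0.1, -2, 4))

# Factory: each call creates a new environment holding its own intercept and slope.
make_line <- function(intercept, slope) {
  force(intercept)   # evaluate the arguments now, not when the line function is called
  force(slope)
  function(x) intercept + slope * x
}

# One function per row, stored in a list named after the 'name' column.
line_fns <- Map(make_line, data$intercepts, data$slopes)
names(line_fns) <- as.character(data$name)

line_fns$A(1)  # 1.1
line_fns$B(1)  # -1.5
line_fns$C(1)  # 8
```

Keeping the functions in a named list avoids assign(); if top-level names A, B and C are really needed, list2env(line_fns, envir = globalenv()) would create them from the list.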
One way to do it with substitute:
for(i in data$name){
local({
fn <- function(x) 1
body(fn) <- substitute(data[data$name==X,]$intercepts + data[data$name==X,]$slopes*x, list(X=i))
assign(i, fn, .GlobalEnv)
}
)
}
This does not rely on lexical scoping (which kkeey called "weird scoping rules"); "A" becomes hard-coded within A() etc:
print(A)
# function (x)
# data[data$name == "A", ]$intercepts + data[data$name == "A",
# ]$slopes * x
Using local above is not really necessary but I did it to avoid polluting the global namespace with unnecessary objects (fn in this case).
A simpler version without local:
for(i in data$name){
fn <- function(x) 1
body(fn) <- substitute(data[data$name==X,]$intercepts + data[data$name==X,]$slopes*x, list(X=i))
assign(i, fn)
}
Finally, a version for creating functions that become independent of your data:
for(i in data$name){
fn <- function(x) 1
int <- data[data$name==i,]$intercepts
slp <- data[data$name==i,]$slopes
body(fn) <- substitute(a + b*x, list(a=int, b=slp))
assign(i, fn)
}
> A
function (x)
1 + 0.1 * x
> B
function (x)
0.5 + -2 * x
> C
function (x)
4 + 4 * x

Vectorizing this function in R

Hi so I have the following function:
kde.cv = function(X,s) {
l = length(X)
log.fhat.vector = c()
for (i in 1:l) {
current.log.fhat = log ( kde(X[i],X[-i],s) )
log.fhat.vector[i] = current.log.fhat
}
CV.score = sum(log.fhat.vector)
return(CV.score)
}
I'd like to vectorize this without using any for loops or apply statements, but I can't find a way to do so. Help would be appreciated. Thanks.
EDIT: Given the responses, here are my answers to the questions posed.
Given requests for clarification, I will elaborate on the function inputs and on the user defined function inside the function given. So X here is a dataset in the form of a vector, specifically, a vector of length 7 in the dataset I used as an input to this function. The X I used this function for is c(-1.1653, -0.7538, -1.3218, -2.3394, -1.9766, -1.8718, -1.5041). s is a single scalar point set at 0.2 for the use of this function. kde is a user - defined function that I wrote. Here is the implementation:
kde = function(x,X,s){
l = length(x)
b = matrix(X,l,length(X),byrow = TRUE)
c = x - b
phi.matrix = dnorm(c,0,s)
d = rowMeans(phi.matrix)
return(d)
}
in this function, X is the same vector of data points used in kde.cv. s is also the same scalar value of 0.2 used in kde.cv. x is a vector of evaluation points for the function, I used seq(-2.5, -0.5, by = 0.1).
Here is an option using sapply
kde.cv = function(X,s)
sum(sapply(1:length(X), function(i) log(kde(X[i], X[-i], s))))
For convenience, please provide a more complete example. For example, the kde() function. Is that a customized function?
As an alternative to sapply, you can try Vectorize(). You can find some examples on Stack Overflow.
Vectorize() vs apply()
Here is an example
f1 <- function(x,y) return(x+y)
f2 <- Vectorize(f1)
f1(1:3, 2:4)
[1] 3 5 7
f2(1:3, 2:4)
[1] 3 5 7
and the second example
f1 <- function(x)
{
new.vector<-c()
for (i in 1:length(x))
{
new.vector[i]<-sum(x[i] + x[-i])
}
return(sum(new.vector))
}
f2<-function(x)
{
f3<-function(y, i)
{
u<-sum(y[i]+y[-i])
return(u)
}
f3.v<-Vectorize(function(i) f3(y = x, i=i))
new.value<-f3.v(1:length(x))
return(sum(new.value))
}
f1(1:3)
[1] 24
f2(1:3)
[1] 24
Note: Vectorize is a wrapper for mapply
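To make the note about mapply concrete: Vectorize is mainly useful for functions that only work on scalars. A small sketch (the function sign_label is invented for this illustration):

```r
# A scalar-only function: 'if' needs a single TRUE/FALSE condition.
sign_label <- function(x) if (x >= 0) "non-negative" else "negative"

# Vectorize() wraps the function in mapply(), so it is called once per element.
sign_label_v <- Vectorize(sign_label)

sign_label_v(c(-1, 2))
mapply(sign_label, c(-1, 2))   # the equivalent explicit call
```

This is still a loop under the hood, so it is a convenience rather than a speed-up.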
EDIT 1
According to the response, I edited your kde.cv function.
kde.cv = function(X,s) {
l = length(X)
log.fhat.vector = c()
for (i in 1:l) {
current.log.fhat = log ( kde(X[i],X[-i],s) )
log.fhat.vector[i] = current.log.fhat
}
CV.score = sum(log.fhat.vector)
return(CV.score)
}
kde = function(x,X,s){
l = length(x)
b = matrix(X,l,length(X),byrow = TRUE)
c = x - b
phi.matrix = dnorm(c,0,s)
d = rowMeans(phi.matrix)
return(d)
}
##### Vectorize kde.cv ######
kde.cv.v = function(X,s)
{
kde.v<-Vectorize(function(i) kde(X[i], X[-i], s))
CV.score <- sum(log(kde.v(1:length(X))))
return(CV.score)
}
X<-c(-1.1653, -0.7538, -1.3218, -2.3394, -1.9766, -1.8718, -1.5041)
s<-0.2
x<-seq(-2.5, -0.5, by = 0.1)
kde.cv(X, s)
[1] -10.18278
kde.cv.v(X, s)
[1] -10.18278
EDIT 2
Well, I think the following function may match your requirement. BTW, since the lowercase x is not used in your kde.cv, I edited both functions.
kde.cv.2 <- function(X,s)
{
log.fhat.vector<-log(kde.2(X, s))
CV.score = sum(log.fhat.vector)
return(CV.score)
}
kde.2<-function(X, s)
{
l <- length(X)
b <- matrix(rep(X, l), l, l, byrow = TRUE)
c <- X - b # c[i, j] = X[i] - X[j]
diag(c) <- NA # drop the i == j terms (leave-one-out)
phi.matrix <- dnorm(c, 0, s)
d <- rowMeans(phi.matrix, na.rm = TRUE)
return(d)
}
X<-c(-1.1653, -0.7538, -1.3218, -2.3394, -1.9766, -1.8718, -1.5041)
s<-0.2
kde.cv(X,s)
[1] -10.18278
kde.cv.2(X, s)
[1] -10.18278
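The same leave-one-out computation can also be written with outer(), which builds the matrix of pairwise differences directly. This is a sketch under the assumption that kde is always the Gaussian kernel average shown in the question; kde.cv.outer is a name made up here.

```r
kde.cv.outer <- function(X, s) {
  n <- length(X)
  diffs <- outer(X, X, "-")              # diffs[i, j] = X[i] - X[j]
  phi <- dnorm(diffs, mean = 0, sd = s)  # Gaussian kernel value for every pair
  diag(phi) <- 0                         # leave out the i == j terms
  fhat <- rowSums(phi) / (n - 1)         # leave-one-out density estimate at each X[i]
  sum(log(fhat))
}

X <- c(-1.1653, -0.7538, -1.3218, -2.3394, -1.9766, -1.8718, -1.5041)
s <- 0.2
kde.cv.outer(X, s)   # should agree with kde.cv(X, s) from the question
```

Setting the diagonal to zero and dividing by n - 1 avoids the NA handling, at the cost of hard-coding the leave-one-out structure.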

Print Specific Output from multiple functions in R

I have multiple user-defined functions written in R. I usually source the code and then print the output in the R console. My problem is that I have three functions in one file, and all three produce a similar output (here z, which is common to all three functions). Is there a way in R to avoid typing print(z) at the end of every function, so that after sourcing my code I can print the z belonging to a specific function?
harry<-function(i){
for(i in 3:5) {
z <- i + 1
print(z)
}
}
harry1<-function(i){
for(i in 1:5) {
z <- i + 1
print(z)
}
}
harry2<-function(i){
for(i in 1:5) {
z <- i + 5
print(z)
}
}
harry <- function(i){
z <- c() # initialize an empty vector
for(i in 3:5) {
z <- c(z, i + 1) # append to the vector (avoids NA gaps from indexing by i)
}
return(z) # returns the object z
}
Now you can access the same information in any of these ways:
harry(100)
z <- harry(100)
print(z)
z
Might I suggest a more general way of doing things?
harry<-function(i,sq){
sapply(sq, function(s,i) {
s + i
}, i=i )
}
harry(i=1,sq=3:5)
harry(i=1,sq=1:5)
harry(i=5,sq=1:5)
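If the goal is to look at the result of one particular function after sourcing the file, one possibility is to have each function return its values and collect them in a named list. A sketch, with simplified stand-ins for the three functions from the question:

```r
# Simplified versions that return their values instead of printing them.
harry  <- function() 3:5 + 1
harry1 <- function() 1:5 + 1
harry2 <- function() 1:5 + 5

funs <- list(harry = harry, harry1 = harry1, harry2 = harry2)
z <- lapply(funs, function(f) f())   # run each function once, keep results by name

z$harry1   # print only the output of harry1
```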

In R, how to make the variables inside a function available to the lower level function inside this function?(with, attach, environment)

Update 2
@G. Grothendieck posted two approaches. The second one changes the function's environment inside another function. This solves my problem of too much duplicated code. I am not sure whether this method will pass the CRAN check when I turn my scripts into a package. I will update again when I have some conclusions.
Update
I am trying to pass a lot of input argument variables to f2 and do not want to index every variable inside the function as env$c, env$d, env$calls, that is why I tried to use with in f5 and f6(a modified f2). However, assign does not work with with inside the {}, moving assign outside with will do the job but in my real case I have a few assigns inside the with expressions which I do not know how to move them out of the with function easily.
Here is an example:
## In the <environment: R_GlobalEnv>
a <- 1
b <- 2
f1 <- function(){
c <- 3
d <- 4
f2 <- function(P){
assign("calls", calls+1, inherits=TRUE)
print(calls)
return(P+c+d)
}
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f2(P=0)
c <- c+1
d <- d+1
}
return(v)
}
f1()
Function f2 is defined inside f1, so when f2 is called it looks for the variables calls, c and d in the execution environment of f1, which is f2's enclosing environment. This is what I wanted.
However, when I want to use f2 also in the other functions, I will define this function in the Global environment instead, call it f4.
f4 <- function(P){
assign("calls", calls+1, inherits=TRUE)
print(calls)
return(P+c+d)
}
This won't work, because it will look for calls,c,d in the Global environment instead of inside a function where the function is called. For example:
f3 <- function(){
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f4(P=0) ## or replace here with f5(P=0)
c <- c+1
d <- d+1
}
return(v)
}
f3()
The safe way would be to declare calls, c and d as arguments of f4 and pass them in. However, in my case there are too many variables to pass to f4, and it would be better if I could pass them as an environment and tell f4 not to look in the global environment (environment(f4)), but only in the environment of the f3 call.
The way I solve it now is to use the environment as a list and use the with function.
f5 <- function(P,liste){
with(liste,{
assign("calls", calls+1, inherits=TRUE)
print(calls)
return(P+c+d)
}
)
}
f3 <- function(){
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f5(P=0,as.list(environment()))
c <- c+1
d <- d+1
}
return(v)
}
f3()
However, now assign("calls", calls+1, inherits=TRUE) does not work as it should be since assign does not modify the original object. The variable calls is connected to an optimization function where the objective function is f5. That is the reason I use assign instead of passing calls as an input arguments. Using attach is also not clear to me. Here is my way to correct the assign issue:
f7 <- function(P,calls,liste){
##calls <<- calls+1
##browser()
assign("calls", calls+1, inherits=TRUE,envir = sys.frame(-1))
print(calls)
with(liste,{
print(paste('with the listed envrionment, calls=',calls))
return(P+c+d)
}
)
}
########
##################
f8 <- function(){
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
##browser()
##v[i] <- f4(P=0) ## or replace here with f5(P=0)
v[i] <- f7(P=0,calls,liste=as.list(environment()))
c <- c+1
d <- d+1
}
f7(P=0,calls,liste=as.list(environment()))
print(paste('final call number',calls))
return(v)
}
f8()
I am not sure how this should be done in R. Am I heading in the right direction, especially with regard to passing the CRAN check? Does anyone have any hints on this?
(1) Pass caller's environment. You can explicitly pass the parent environment and index into it. Try this:
f2a <- function(P, env = parent.frame()) {
env$calls <- env$calls + 1
print(env$calls)
return(P + env$c + env$d)
}
a <- 1
b <- 2
# same as f1 except f2 removed and call to f2 replaced with call to f2a
f1a <- function(){
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f2a(P=0)
c <- c+1
d <- d+1
}
return(v)
}
f1a()
(2) Reset called function's environment We can reset the environment of f2b in f1b as shown here:
f2b <- function(P) {
calls <<- calls + 1
print(calls)
return(P + c + d)
}
a <- 1
b <- 2
# same as f1 except f2 removed, call to f2 replaced with call to f2b
# and line marked ## at the beginning is new
f1b <- function(){
environment(f2b) <- environment() ##
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f2b(P=0)
c <- c+1
d <- d+1
}
return(v)
}
f1b()
(3) Macro using eval.parent(substitute(...)) Yet another approach is to define a macro-like construct which effectively injects the body of f2c inline into f1c. Here f2c is the same as f2b except for the calls <- calls + 1 line (no <<- needed) and the wrapping of the entire body in eval.parent(substitute({...})). f1c is the same as f1a except the call to f2a is replaced with a call to f2c.
f2c <- function(P) eval.parent(substitute({
calls <- calls + 1
print(calls)
return(P + c + d)
}))
a <- 1
b <- 2
f1c <- function(){
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f2c(P=0)
c <- c+1
d <- d+1
}
return(v)
}
f1c()
(4) defmacro This is almost the same as the last solution except it uses defmacro in the gtools package to define the macro rather than doing it ourselves. (Also see the Rcmdr package for another defmacro version.) Because of the way defmacro works we must also pass calls, but since it's a macro and not a function this just tells it to substitute calls in and is not the same as passing calls to a function.
library(gtools)
f2d <- defmacro(P, calls, expr = {
calls <- calls + 1
print(calls)
return(P + c + d)
})
a <- 1
b <- 2
f1d <- function(){
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f2d(P=0, calls)
c <- c+1
d <- d+1
}
return(v)
}
f1d()
In general, I would say that any variable needed inside a function should be passed in through its arguments, and if its value is needed later, it should be returned from the function. Not doing this can quickly lead to strange results: for example, if several functions define a variable x, which one should be used? If the number of variables is large, create a custom data structure for them, e.g. a named list.
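For the example from the question, this advice could look roughly as follows. This is only a sketch: the names f2_pure, f1_pure and state are invented here, and the list bundles c, d and calls so that nothing is read from or written to an outer environment.

```r
# All inputs arrive through arguments; the updated state is handed back explicitly.
f2_pure <- function(P, state) {
  state$calls <- state$calls + 1
  list(value = P + state$c + state$d, state = state)
}

f1_pure <- function() {
  state <- list(c = 3, d = 4, calls = 0)   # named list instead of loose variables
  v <- numeric(10)
  for (i in 1:10) {
    res <- f2_pure(P = 0, state)
    v[i] <- res$value
    state <- res$state                      # keep the updated call counter
    state$c <- state$c + 1
    state$d <- state$d + 1
  }
  list(v = v, calls = state$calls)
}

out <- f1_pure()
out$v       # 7 9 11 13 15 17 19 21 23 25
out$calls   # 10
```

The extra bookkeeping is the price for functions whose behaviour depends only on their arguments.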
One could also use a function that redefines other functions in the specified environment.
test_var <- "global"
get_test_var <- function(){
return(test_var)
}
some_function <- function(){
test_var <- "local"
return(get_test_var())
}
some_function() # Returns "global". Not what we want here...
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
some_function2 <- function(){
test_var <- "local"
# define function locally
get_test_var2 <- function(){
return(test_var)
}
return(get_test_var2())
}
some_function2() # Returns "local", but 'get_test_var2' can't be used in other places.
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
add_function_to_envir <- function(my_function_name, to_envir) {
script_text <- capture.output(eval(parse(text = my_function_name)))
script_text[1] <- paste0(my_function_name, " <- ", script_text[1])
eval(parse(text = script_text), envir = to_envir)
}
some_function3 <- function(){
test_var <- "local"
add_function_to_envir("get_test_var", environment())
return(get_test_var())
}
some_function3() # Returns "local" and we can use 'get_test_var' from anywhere.
Here add_function_to_envir(my_function_name, to_envir) captures the script of the function, parses and reevaluates it in the new environment.
Note: the name of the function for my_function_name needs to be in quotes.
Whenever I use nested functions and don't pass the variables on as arguments, but instead pass them on with ..., I use the following function in all nested functions to get variables from the parent environment.
LoadVars <- function(variables, ...){
for (var in 1:length(variables)) {
v <- get(variables[var], envir = parent.frame(n=2))
assign(variables[var], v, envir = parent.frame(n=1))
}
}
Inside a nested function, I then LoadVars(c("foo", "bar")).
This approach is useful in that you only pass on the variables you need, much as when you pass variables through arguments.
Approach 2
However, it is simple to rewrite this function to load in all variables from the parent function—or higher up if needed, just increase the n value in parent.frame from its original value of 2.
LoadVars <- function(){
variables <- ls(envir = parent.frame(n=2))
for (var in seq_along(variables)) { # seq_along() is safe when there are no variables
v <- get(variables[var], envir = parent.frame(n=2))
assign(variables[var], v, envir = parent.frame(n=1))
}
}
Example
a <- 1
A <- function(...){
b <- 2
print(sprintf("A, a = %s", a)) # base R has no printf(); sprintf() formats the string
print(sprintf("A, b = %s", b))
B()
}
B <- function(...){
LoadVars()
print(sprintf("B, a = %s", a))
print(sprintf("B, b = %s", b))
}
A()
If you don't call LoadVars() in B, then B can still see a, because it is a global variable, but not b, which is local to A().
Output:
[1] "A, a = 1"
[1] "A, b = 2"
[1] "B, a = 1"
[1] "B, b = 2"

Resources