can lapply not modify variables in a higher scope - r

I often want to do essentially the following:
mat <- matrix(0,nrow=10,ncol=1)
lapply(1:10, function(i) { mat[i,] <- rnorm(1,mean=i)})
I would expect mat to end up with 10 random numbers in it, but instead it still contains only zeros. (I am not worried about the rnorm part; clearly there is a right way to do that. I am worried about affecting mat from within an anonymous function passed to lapply.) Can I not affect the matrix mat from inside lapply? Why not? Is there a scoping rule in R that blocks this?

I discussed this issue in this related question: "Is R’s apply family more than syntactic sugar". You will notice that if you look at the function signature for for and apply, they have one critical difference: a for loop evaluates an expression, while an apply loop evaluates a function.
If you want to alter things outside the scope of an apply function, then you need to use <<- or assign. Or more to the point, use something like a for loop instead. But you really need to be careful when working with things outside of a function because it can result in unexpected behavior.
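As a minimal sketch of both escape hatches (the object names here are invented for illustration):

```r
total <- 0
invisible(lapply(1:4, function(i) {
  total <<- total + i          # <<- searches enclosing environments for `total`
}))
total                          # 10

# assign() lets you name the target environment explicitly
assign("flag", TRUE, envir = globalenv())
```

Both forms reach outside the function, which is exactly why they deserve the caution above.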
In my opinion, one of the primary reasons to use an apply function is explicitly because it doesn't alter things outside of it. This is a core concept in functional programming, wherein functions avoid having side effects. This is also a reason why the apply family of functions can be used in parallel processing (and similar functions exist in the various parallel packages such as snow).
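To illustrate that point, a side-effect-free worker function can be handed to a parallel backend unchanged (the cluster size here is arbitrary):

```r
library(parallel)

square <- function(i) i^2          # pure function: no side effects

serial <- lapply(1:4, square)

cl <- makeCluster(2)               # socket cluster: works on Windows and Unix alike
par_res <- parLapply(cl, 1:4, square)
stopCluster(cl)

identical(serial, par_res)         # TRUE
```

Because square touches nothing outside its own arguments, the serial and parallel results are interchangeable.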
Lastly, the right way to run your code example is to pass mat in as a parameter to your function and assign the output back, like so:
mat <- matrix(0,nrow=10,ncol=1)
mat <- matrix(unlist(lapply(1:10, function(i, mat) { mat[i,] <- rnorm(1,mean=i) }, mat=mat)), ncol=1)
It is always best to be explicit about a parameter when possible (hence the mat=mat) rather than inferring it.

One of the main advantages of higher-order functions like lapply() or sapply() is that you don't have to initialize your "container" (matrix in this case).
As Fojtasek suggests:
as.matrix(unlist(lapply(1:10,function(i) rnorm(1,mean=i))))
Alternatively:
do.call(rbind,lapply(1:10,function(i) rnorm(1,mean=i)))
Or, simply as a numeric vector:
sapply(1:10,function(i) rnorm(1,mean=i))
If you really want to modify a variable outside the scope of your anonymous function (mat in this instance), use <<-:
> mat <- matrix(0,nrow=10,ncol=1)
> invisible(lapply(1:10, function(i) { mat[i,] <<- rnorm(1,mean=i)}))
> mat
[,1]
[1,] 1.6780866
[2,] 0.8591515
[3,] 2.2693493
[4,] 2.6093988
[5,] 6.6216346
[6,] 5.3469690
[7,] 7.3558518
[8,] 8.3354715
[9,] 9.5993111
[10,] 7.7545249
See this post about <<-. But in this particular example, a for-loop would just make more sense:
mat <- matrix(0,nrow=10,ncol=1)
for( i in 1:10 ) mat[i,] <- rnorm(1,mean=i)
with the minor cost of creating an indexing variable, i, in the global workspace.
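If even that bothers you, the loop can be wrapped in local() so the index never reaches the global workspace (a small sketch; the <<- is needed because mat now lives one environment up):

```r
mat <- matrix(0, nrow = 10, ncol = 1)
local({
  for (i in 1:10) mat[i, ] <<- rnorm(1, mean = i)
})
exists("i")   # FALSE: i stayed inside the local() environment
```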

Instead of actually altering mat, lapply just returns the altered version of mat (as a list). You just need to assign the result back to mat and turn it into a matrix using as.matrix().
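In code, that assign-the-result-back pattern looks like this (unlist() is added here so the matrix ends up numeric rather than a list-matrix):

```r
vals <- lapply(1:10, function(i) rnorm(1, mean = i))  # list of 10 draws
mat  <- as.matrix(unlist(vals))                       # 10 x 1 numeric matrix
dim(mat)   # 10 1
```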


Load for-loop results in container

All of the below is conducted in R.
I am trying to store the results of a for-loop in different containers, but somehow I keep ending up with NA warnings and the results are not stored in my container. I even tried different containers for different for-loops within the function, and then finally a matrix for the containers, but it seems it's not working.
I have already been trying different solutions for two full days, and it seems there should be an easy solution. Maybe I just can't see it myself anymore...
data.ols<-data.frame(cbind(rep(1),holiday,weathersit,atemp,hum,windspeed))
y<-as.vector(cnt)
z=c(holiday, weathersit, atemp, hum, windspeed)
z.names=c("holiday","weathersit","atemp","hum","windspeed")
result.container<-data.frame(matrix(nrow=6,ncol=4))
colnames(result.container)<-c("beta","SE","t-statistic","p-value")
ols <- function(y, X2, x=0){
  X <- matrix(z, ncol=5)
  X2 <- cbind(rep(1, nrow(X)), X)
  XXinv <- solve(t(X2) %*% X2, diag(ncol(X2)))  # Compute (X'X)^-1
  beta <- XXinv %*% t(X2) %*% y
  print(beta)
  result.container[,1] <- beta
  result.testdebug <- vector()
  for (i in c("V1","holiday","weathersit","atemp","hum","windspeed")){
    SE <- sd(i)
    result.testdebug[i] <- sd(data.ols[,i])
    return(result.testdebug)
    result.container[,2] <- result.testdebug
  }
  result.testtvalue <- vector()
  for (i in c("V1","holiday","weathersit","atemp","hum","windspeed")){
    nominator <- (mean(i)-x)
    t.value <- nominator/sd(i)
    return(t.value)
    result.testtvalue <- t.value
    result.container[,3] <- result.testtvalue
  }
  df <- length(X)-1
  p.value <- 2*pt(t.value, df, lower.tail=FALSE)
  return(p.value)
  result.container[,4] <- p.value
  list(rbind(beta,result.testdebug,t.value,p.value))
}
It seems you are having some trouble with functions in R. In R, functions have their own environment (i.e. their own set of objects). Even though they can read from their parent environment (the set of all objects), they cannot write to it. Let me demonstrate with some simpler code.
teste2=matrix(,2,2)
teste=function(a,b) {teste2[,1]=c(a,b)}
teste(3,2)
teste2
[,1] [,2]
[1,] NA NA
[2,] NA NA
As you can see teste (the function) cannot change teste2 (the matrix).
In R, the best way to write a function is to give it all the objects it needs as parameters and, at the end of the function body, to have a single return() that gives back the final object.
You did something close to that, but used multiple return() calls. R stops at the first return() it reaches and ignores the rest. See below:
teste=function(a,b) {c=a;return(c);d=b;return(d)}
teste(3,2)
[1] 3
For your particular code, I recommend removing all the result.container<- assignments and putting a single return() at the end, around that last list().
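A stripped-down sketch of that pattern, with made-up data rather than your variables: everything is computed locally from the parameters, and one list is returned as the last expression.

```r
set.seed(42)
X2 <- cbind(1, matrix(rnorm(20), ncol = 2))   # 10 obs: intercept + 2 regressors
y  <- X2 %*% c(1, 2, 3) + rnorm(10, sd = 0.1)

ols <- function(y, X2) {
  beta  <- solve(t(X2) %*% X2, t(X2) %*% y)   # (X'X)^-1 X'y
  resid <- y - X2 %*% beta
  list(beta = beta, resid = resid)            # single result, at the very end
}

fit <- ols(y, X2)
fit$beta   # close to c(1, 2, 3)
```

The caller then does `fit <- ols(y, X2)` once, instead of the function trying to write into result.container from inside its own environment.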

r- call function inside for loop alternative

my_function <- function(n){}
result = list()
for(i in 0:59){
result[i] = my_function(i)
}
write.csv(result, "result.csv")
I'm new to R and have read that for-loops are bad in R, so is there an alternative to what I'm doing? I'm basically trying to call my_function with an increasing parameter and then write the results to a file.
Edit
Sorry, I didn't specify that I wanted to use some function of i as a parameter for my_function, 12 + (22*i) for example. Should I create a list of values and then call lapply with that list of values?
for loops are fine in R, but they're syntactically inefficient in a lot of use cases, especially simple ones. The apply family of functions usually makes a good substitute.
result <- lapply(0:59, my_function)
write.csv(result, "result.csv")
Depending on what your function's output is, you might want sapply rather than lapply.
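For example (the worker here is a stand-in for my_function):

```r
f <- function(i) i^2          # stand-in for my_function

res_list <- lapply(0:3, f)    # list of length 4: 0, 1, 4, 9
res_vec  <- sapply(0:3, f)    # simplified to a numeric vector: 0 1 4 9
```

A flat vector from sapply usually writes to CSV as a single tidy column, whereas a list may need flattening first.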
Edit:
Per your update, you could do it as you say, creating the vector of values first, or you could just do something like:
lapply(12+(22*0:59), my_function)

Evaluate a list of functions in R all with the same input and return list of arrays or matrix

Say you have a list of functions
funList=list()
for (i in 1:5){
funList[[i]]=approxfun(0:5,(0:5)^i,method="linear", rule=2)
}
and later you want a matrix of values, with each row (or column, whichever makes the code simpler; even a list of arrays instead of a matrix would be fine) being of the form of, let's say,
funList[[i]](1:3)
I've tried using lapply, but I haven't been able to get that to work.
I would do:
eval.with.args <- function(FUN, ...) FUN(...)
Then one of:
lapply(funList, eval.with.args, 1:3)
sapply(funList, eval.with.args, 1:3)
mapply(eval.with.args, funList, list(1:3))
Map(eval.with.args, funList, list(1:3))
I think I remember asking on the forums if there was a function that already implemented function(FUN, ...)FUN(...) but the answer was "no" at the time. It could make a nice addition to the base or functional packages IMHO.
You're looking for do.call:
lapply(funList, do.call, list(1:3))
You can replace eval.with.args in all of #flodel's examples with do.call if you wrap the second argument in an additional call to list.
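A quick illustration with a made-up list of power functions:

```r
funList <- lapply(1:3, function(i) function(x) x^i)  # x, x^2, x^3

sapply(funList, do.call, list(2))   # 2 4 8
```

Each element of funList is passed to do.call as its `what` argument, and list(2) becomes the `args` argument, so every function is called with the same input.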

parallel computations on Reference Classes

I have a list of fairly large objects that I want to apply a complicated function to in parallel, but my current method uses too much memory. I thought Reference Classes might help, but using mclapply to modify them doesn't seem to work.
The function modifies the object itself, so I overwrite the original object with the new one. Since the object is a list and I'm only modifying a small part of it, I was hoping that R's copy-on-modify semantics would avoid having multiple copies made; however, in running it, it doesn't seem to be the case for what I'm doing. Here's a small example of the base R methods I have been using. It correctly resets the balance to zero.
## make a list of accounts, each with a balance
## and a function to reset the balance
foo <- lapply(1:5, function(x) list(balance=x))
reset1 <- function(x) {x$balance <- 0; x}
foo[[4]]$balance
## 4 ## BEFORE reset
foo <- mclapply(foo, reset1)
foo[[4]]$balance
## 0 ## AFTER reset
It seems that using Reference Classes might help as they are mutable, and when using lapply it does do as I expect; the balance is reset to zero.
Account <- setRefClass("Account", fields=list(balance="numeric"),
methods=list(reset=function() {balance <<- 0}))
foo <- lapply(1:5, function(x) Account$new(balance=x))
foo[[4]]$balance
## 4
invisible(lapply(foo, function(x) x$reset()))
foo[[4]]$balance
## 0
But when I use mclapply, it doesn't properly reset. Note that if you're on Windows or have mc.cores=1, lapply will be called instead.
foo <- lapply(1:5, function(x) Account$new(balance=x))
foo[[4]]$balance
## 4
invisible(mclapply(foo, function(x) x$reset()))
foo[[4]]$balance
## 4
What's going on? How can I work with Reference Classes in parallel? Is there a better way altogether to avoid unnecessary copying of objects?
I think the forked processes, while they have access to all the variables in the workspace, must be unable to change them. This works, but I don't know yet whether it improves the memory issues or not.
foo <- mclapply(foo, function(x) {x$reset(); x})
foo[[4]]$balance
## 0

apply different functions to different elements of a vector in R

apply is easy, but this one is a tough nut for me to crack:
In multi-parametric regression, optimisers are used to find the best fit of a parametric function to, say, x1,x2 data. Often, depending on the function, optimisers can be faster if they try to optimise transformed parameters (e.g. with R optimisers such as DEoptim or nls.lm).
From experience I know that using different transformations for different parameters of one parametric function can be even better.
I wish to apply different functions in x.trans (cf. below) to the corresponding elements in x.val:
A mock example to work with.
#initialise
x.val <- rep(100,5); EDIT: ignore this part ==> names(x.val) <- x.names
x.select <- c(1,0,0,1,1)
x.trans <- c(log10(x),exp(x),log10(x),x^2,1/x)
#select required elements, and corresponding names
x.val = subset(x.val, x.select == 1)
x.trans = subset(x.trans, x.select == 1)
# How I tried: apply function in x.trans[i] to x.val[i]
...
Any ideas? (I have tried with apply and sapply, but I can't get at the functions stored in x.trans.)
You must use this instead:
x.trans <- c(log10,exp,log10,function(x)x^2,function(x)1/x)
Then this:
mapply(function(f, x) f(x), x.trans, x.val)
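With concrete numbers (assuming, for illustration, three parameters that all start at 100):

```r
x.val   <- rep(100, 3)
x.trans <- list(log10, function(x) x^2, function(x) 1/x)

mapply(function(f, x) f(x), x.trans, x.val)   # 2 10000 0.01
```

mapply walks the two lists in parallel, applying the i-th function to the i-th value, which is exactly the position-wise pairing the question asks for.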