I'm trying to wrap my head around R's scoping rules, and I'm having trouble understanding the example for the force function in the R documentation:
f <- function(y) function() y
lf <- vector("list", 5)
for (i in seq_along(lf)) lf[[i]] <- f(i)
lf[[1]]() # returns 5
g <- function(y) { force(y); function() y }
lg <- vector("list", 5)
for (i in seq_along(lg)) lg[[i]] <- g(i)
lg[[1]]() # returns 1
Why does
lf[[1]]()
return 5 instead of 1?
I have a basic understanding of scoping rules from Hadley's Advanced R, but I can't figure out how they apply here.
Related
I am trying to write a function in R that generates n random variates of X using the sample() function, where X ~ Ge(p) (that is, X has a geometric distribution). I would like to use a while loop in my function.
I think my function needs two inputs, size and p, and it also needs a for loop. I think something like the framework below will work:
rGE <- function(size,p){
for
i<-1
while()
...
return(i)
}
I would like to develop the function above so that it generates n random variates of X (where X ~ Ge(p)).
For a home-grown, inefficient (but comprehensible) version of rgeom, something like this should work:
my_rgeom <- function(n, p) {
  x <- numeric(n) ## allocate space for the results (all zeros)
  for (i in seq_len(n)) { ## seq_len() also copes with n = 0
    done <- FALSE
    while (!done) {
      x[i] <- x[i] + 1 ## count one more trial
      done <- runif(1) < p ## success with probability p
    }
  }
  return(x)
}
I'm sure you could use sample() instead of runif() for the innermost loop, but it's not obvious to me how. One piece of advice: if you're unfamiliar with programming, try writing your proposed algorithm out as pseudocode rather than jumping in to R-bashing right away. It can be easier if you deal with the logic and the coding nuts-and-bolts separately ...
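A minimal sketch of how sample() could replace runif() in the innermost loop, assuming each trial is drawn as a single TRUE/FALSE value with probabilities p and 1 - p. The name my_rgeom_sample is made up for illustration.

```r
# Hypothetical variant of my_rgeom that draws each trial with sample()
my_rgeom_sample <- function(n, p) {
  x <- numeric(n)                      # results, all zeros to start
  for (i in seq_len(n)) {
    success <- FALSE
    while (!success) {
      x[i] <- x[i] + 1                 # count one more trial
      # size = 1 draws exactly one trial: TRUE with probability p
      success <- sample(c(TRUE, FALSE), size = 1, prob = c(p, 1 - p))
    }
  }
  x
}

set.seed(123)
draws <- my_rgeom_sample(2000, 0.5)
```

Like my_rgeom, this counts the number of trials up to and including the first success, so the smallest possible value is 1.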
You could use rgeom:
set.seed(1)
rgeom(n = 10, p = .1)
#> [1] 6 3 23 3 24 13 15 2 20 3
I have finally written the below function:
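rge <- function(n, p) {
  x <- numeric(n)
  for (i in seq_len(n)) {
    j <- 0
    while (j == 0) {
      x[i] <- x[i] + 1
      ## size = 1 draws a single Bernoulli(p) trial; without it, sample() draws length(0:1) = 2 values
      j <- sample(0:1, size = 1, prob = c(1 - p, p))
    }
  }
  return(x)
}
rge(10, .2)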
I hope it really generates n random numbers from the geometric distribution.
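One way to check a home-made sampler like this is to compare its sample mean with the theoretical mean. A sketch, assuming the sampler counts trials up to and including the first success (mean 1/p), whereas base R's rgeom() counts failures before the first success (mean (1 - p)/p). The helper name check_trial_mean and the 10% tolerance are arbitrary choices for illustration.

```r
# Hypothetical helper: is the sample mean within a relative tolerance of 1/p?
check_trial_mean <- function(draws, p, tol = 0.1) {
  abs(mean(draws) - 1 / p) <= tol / p
}

set.seed(42)
p <- 0.2
# rgeom() starts at 0, so add 1 to put it on the "number of trials" scale
reference <- rgeom(5000, p) + 1
check_trial_mean(reference, p)
```

Passing the output of your own function as draws, in place of reference, runs the same check on it.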
I just started programming in R. I'm working on code that performs operations on arrays. It works when I run it directly on a variable, but when I wrap it in a function something goes wrong: when I try to access list_matrices[i], I get NULL.
F <- function(x){
list_matrices=c()
for(i in 1:dim(x)[1]){
list_matrices[[i]] <- t(rbind(x[i,1:dim(x)[2],1:dim(x)[3]]))}
}
It has already been pointed out in the comments that the problem is that the function does not return list_matrices, so here we will instead ask whether you really need to do this in the first place. If the reason for creating the list is to iterate over it later with some function g, you can do that directly over x using apply. These two are the same:
# test inputs
x <- array(1:24, 2:4)
g <- function(x) sum(x*x)
# 1
f <- function(x) {
list_matrices <- c()
for(i in 1:dim(x)[1]) {
list_matrices[[i]] <- t(rbind(x[i,1:dim(x)[2],1:dim(x)[3]]))
}
list_matrices
}
L <- f(x)
sapply(L, g)
## [1] 2300 2600
# 2
apply(x, 1, g)
## [1] 2300 2600
Also note that F is used for FALSE in R, so using it as the name of an object could result in subtle errors; above we have renamed the function f.
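If the list of slices itself is needed, a possible sketch builds it with lapply(), so there is no list to grow and nothing to forget to return. The name slices is illustrative, and this assumes every dimension of x has length greater than 1 (otherwise x[i, , ] drops to a plain vector).

```r
x <- array(1:24, 2:4)
g <- function(x) sum(x * x)

# one transposed 4 x 3 matrix per index of the first dimension
slices <- lapply(seq_len(dim(x)[1]), function(i) t(x[i, , ]))

sapply(slices, g)
## [1] 2300 2600
```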
Suppose I have a function that includes a for loop, which runs for, say, 10 iterations. How can I tell from the output which iteration the function is currently on, say, number 5?
That is, I would like my function to let me know the current iteration number.
For example,
I would like the output to look like this:
Iteration 1 starts
some result
iteration 1 ends
iteration 2 starts
some result
iteration 2 ends
...
...
Please note this is not my original function. In my original function I use the optim function over a list of models, and I really need to know which model is currently being fitted.
Here is a general example:
Myfun <- function(x,y){
v <- list()
for(i in 1:100){
v[[i]] <- sum(x[[i]], y[[i]])
cat(v, "\n")
}
v
}
x <- rnorm(100)
y <- rnorm(100)
Myfun(x=x, y=y)
Method 1
Output the current iteration step inside the for loop.
Myfun <- function(x,y) {
v <- list()
for (i in 1:100) {
v[[i]] <- sum(x[[i]], y[[i]])
cat(sprintf("Step %i / 100 done\n", i))
}
v
}
Method 2
Use a progress bar (see ?txtProgressBar for details).
Myfun <- function(x,y) {
v <- list()
pb <- txtProgressBar(min = 0, max = 100, style = 3)
for (i in 1:100) {
v[[i]] <- sum(x[[i]], y[[i]])
setTxtProgressBar(pb, i)
}
close(pb)
v
}
Note that the line cat(v, "\n") from your original Myfun will give an error.
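Since the original use case is running optim over a list of models, here is a rough sketch of reporting which model is being fitted. The two objective functions, the names models and fit_all, and the choice of method = "BFGS" are all assumptions made for illustration. message() writes to stderr, so the progress lines stay separate from the returned results.

```r
# Hypothetical objective functions standing in for the "models" in the question
models <- list(
  quadratic = function(par) (par - 2)^2,
  shifted   = function(par) (par + 1)^2 + 3
)

fit_all <- function(models, start = 0) {
  fits <- vector("list", length(models))   # preallocate one slot per model
  names(fits) <- names(models)
  for (i in seq_along(models)) {
    message(sprintf("Iteration %d (%s) starts", i, names(models)[i]))
    fits[[i]] <- optim(start, models[[i]], method = "BFGS")
    message(sprintf("Iteration %d (%s) ends", i, names(models)[i]))
  }
  fits
}

fits <- fit_all(models)
```

Each element of fits is the usual optim() result list, so fits$quadratic$par holds the fitted parameter for the first model.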
I would like to create a list of functions in R where values from a for loop are stored in the function definition. Here is an example:
init <- function(){
mod <- list()
for(i in 1:3){
mod[[length(mod) + 1]] <- function(x) sum(i + x)
}
return(mod)
}
mod <- init()
mod[[1]](2) # 5 - but I want 3
mod[[2]](2) # 5 - but I want 4
In the above example, regardless of which function I call, i is always the last value in the for loop sequence. I understand this is the correct behavior.
I'm looking for something that achieves this:
mod[[1]] <- function(x) sum(1 + x)
mod[[2]] <- function(x) sum(2 + x)
mod[[3]] <- function(x) sum(3 + x)
You can explicitly ensure that i is evaluated at its current value in the for loop by using force.
init <- function(){
mod <- list()
f_gen = function(i) {
force(i)
return(function(x) sum(i + x))
}
for(i in 1:3){
mod[[i]] <- f_gen(i)
}
return(mod)
}
mod <- init()
mod[[1]](2)
# [1] 3
mod[[2]](2)
# [1] 4
More details are in the Functions/Lazy Evaluation subsection of Advanced R. Also see ?force, of course. Your example is fairly similar to the examples given in ?force.
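To see the lazy evaluation at work, here is a small sketch with made-up names (make_adder, make_adder_forced, k). Without force(), the argument is only looked up the first time the returned function is called, and the value found at that moment is then kept.

```r
# No force(): i stays an unevaluated promise until the closure is first called
make_adder <- function(i) function(x) i + x
# With force(): i is evaluated when make_adder_forced() itself is called
make_adder_forced <- function(i) { force(i); function(x) i + x }

k <- 1
lazy  <- make_adder(k)
eager <- make_adder_forced(k)

k <- 10          # changed before lazy() has ever been called
lazy(2)          # 12: the promise is evaluated now, and k is 10
eager(2)         # 3: i was fixed at 1 when eager was created

k <- 100
lazy(2)          # still 12: a promise is evaluated only once
```

This is the same mechanism that makes every function created in a for loop see the final value of the loop variable when none of them is called until the loop has finished.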
Using a single-function generator function (f_gen in my code above) seems to make more sense than a list-of-functions generator function. Using my f_gen, your code could be simplified:
f_gen = function(i) {
force(i)
return(function(x) sum(i + x))
}
mod2 <- lapply(1:3, f_gen)
mod2[[1]](2)
# [1] 3
mod2[[2]](2)
# [1] 4
## or alternately
mod3 = list()
for (i in 1:3) mod3[[i]] <- f_gen(i)
mod3[[1]](2)
mod3[[2]](2)
I would like to implement a simulation program that requires the following structure:
It has a for loop, and the program generates a vector in each iteration. I need each generated vector to be appended to the existing vector.
I do not know how to do this in R. Thanks for the help.
These answers work, but they all require a call to a non-deterministic function like sample() in the loop. This is not loop-invariant code (it is random each time), but it can still be moved out of the for loop. The trick is to use the n argument and generate all the random numbers you need beforehand (if your problem allows this; some may not, but many do). Now you make one call rather than n calls, which matters if your n is large. Here is a quick example random walk (but many problems can be phrased this way). Also, full disclosure: I haven't had any coffee today, so please point out if you see an error :-)
steps <- 30
n <- 100
directions <- c(-1, 1)
results <- vector('list', n)
for (i in seq_len(n)) {
walk <- numeric(steps)
for (s in seq_len(steps)) {
walk[s] <- sample(directions, 1)
}
results[[i]] <- sum(walk)
}
We can rewrite this with one call to sample():
all.steps <- sample(directions, n*steps, replace=TRUE)
dim(all.steps) <- c(n, steps)
walks <- apply(all.steps, 1, sum)
Proof of speed increase (n=10000):
> system.time({
+ for (i in seq_len(n)) {
+ walk <- numeric(steps)
+ for (s in seq_len(steps)) {
+ walk[s] <- sample(directions, 1)
+ }
+ results[[i]] <- sum(walk)
+ }})
user system elapsed
4.231 0.332 4.758
> system.time({
+ all.steps <- sample(directions, n*steps, replace=TRUE)
+ dim(all.steps) <- c(n, steps)
+ walks <- apply(all.steps, 1, sum)
+ })
user system elapsed
0.010 0.001 0.012
If your simulation needs just one random variable per simulation function call, use sapply(), or better yet the multicore package's mclapply(). Revolution Analytics's foreach package may be of use here too. Also, JD Long has a great presentation and post about simulating stuff in R on Hadoop via Amazon's EMR here (I can't find the video, but I'm sure someone will know).
Take home points:
Preallocate with numeric(n) or vector('list', n)
Push invariant code out of for loops. Cleverly push stochastic functions out of code with their n argument.
Try hard for sapply() or lapply(), or better yet mclapply.
Don't use x <- c(x, rnorm(100)). Every time you do this, a member of R-core kills a puppy.
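A small sketch of the first, second and last points, using rnorm() as a stand-in for the simulation step; the variable names are illustrative. With the same seed, all three approaches produce the same numbers, so the difference is only in how much copying R has to do.

```r
n   <- 5   # number of iterations
len <- 3   # length of the vector generated per iteration

# Growing the result: the whole vector is copied on every pass
set.seed(1)
grown <- c()
for (i in seq_len(n)) grown <- c(grown, rnorm(len))

# Preallocating: each pass fills its own block of the result
set.seed(1)
prealloc <- numeric(n * len)
for (i in seq_len(n)) {
  prealloc[((i - 1) * len + 1):(i * len)] <- rnorm(len)
}

# Using the n argument: a single call, no loop at all
set.seed(1)
one_call <- rnorm(n * len)

isTRUE(all.equal(grown, prealloc)) && isTRUE(all.equal(prealloc, one_call))
```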
Probably the best thing you can do is preallocate a list of length n (n is number of iterations) and flatten out the list after you're done.
n <- 10
start <- vector("list", n)
for (i in 1:n) {
  start[[i]] <- sample(10)
}
start <- unlist(start)
You could do it the old nasty way. This may be slow for larger vectors.
start <- c()
for (i in 1:n) {
add <- sample(10)
start <- c(start, add)
}
x <- rnorm(100)
for (i in 1:100) {
  x <- c(x, rnorm(100))
}
This link should be useful: http://www.milbo.users.sonic.net/ra/
Assuming your simulation function -- call it func -- returns a vector with the same length each time, you can store the results in the columns of a pre-allocated matrix:
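sim1 <- function(reps, func) {
  first <- func()
  ## pre-allocate; every column starts as a copy of the first result
  result <- matrix(first, nrow = length(first), ncol = reps)
  ## fill columns 2..reps (seq_len(reps)[-1] is empty when reps is 1)
  for (i in seq_len(reps)[-1]) {
    result[, i] <- func()
  }
  return(as.vector(result))
}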
Or you could express it as follows using replicate:
sim2 <- function(reps, func) {
return(as.vector(replicate(reps, func(), simplify=TRUE)))
}
> sim2(3, function() 1:3)
[1] 1 2 3 1 2 3 1 2 3
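If func may return vectors of different lengths, a matrix no longer fits. A hedged variant (sim3 is a made-up name) keeps each result as a list element and flattens once at the end:

```r
# Hypothetical variant: collect results in a list, then flatten once
sim3 <- function(reps, func) {
  unlist(replicate(reps, func(), simplify = FALSE))
}

sim3(3, function() 1:3)
## [1] 1 2 3 1 2 3 1 2 3

# Also works when the length varies between calls
set.seed(1)
varying <- sim3(4, function() seq_len(sample(3, 1)))
```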