R: set.seed produces the same result after seed removal

Background. I want to generate random sequences within a for loop in R v3.5.0. To do this I use code like the one below:
rm(.Random.seed, envir=globalenv())
some_list = list()
for (iter in 1:3) {
  set.seed(iter)
  some_list[[iter]] = sample(1:10)
}
some_list
This code returns me a list like this:
> some_list
[[1]]
[1] 3 4 5 7 2 8 9 6 10 1
[[2]]
[1] 2 7 5 10 6 8 1 3 4 9
[[3]]
[1] 2 8 4 3 9 6 1 5 10 7
After that I rerun the same script. I expect the seed to be reset after rm(.Random.seed, envir=globalenv()) runs within the session, and hence to get a different result.
But the reality is different: I receive exactly the same list, even after removing .Random.seed from globalenv().
I'm really confused by this behaviour of set.seed.
My questions are:
1) Is this behaviour of set.seed normal?
2) How can I reset the seed if rm(.Random.seed, envir=globalenv()) does not work?
Thanks in advance.

set.seed(iter) re-seeds the generator deterministically on every iteration, so the loop will always produce the same list no matter what you do to .Random.seed beforehand. Since you seem to be aiming for random behaviour with the call to rm(.Random.seed, envir=globalenv()), why not just remove set.seed from your code altogether?
rm(.Random.seed, envir=globalenv())
some_list = list()
for (iter in 1:3) {
  some_list[[iter]] = sample(1:10)
}
some_list
The above produces different results each time you run it. There is no need for set.seed in your code.
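To see why the original loop is reproducible in the first place: set.seed(iter) fully determines the generator state at the top of every iteration, so whatever was (or was not) in .Random.seed beforehand is irrelevant. A minimal check:

```r
# Calling set.seed with the same value restores the exact same RNG state,
# so the subsequent draw is identical every time.
set.seed(1)
first <- sample(1:10)

set.seed(1)
second <- sample(1:10)

identical(first, second)  # TRUE
```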

I created a workaround based on using Sys.time() as a seed. Here is the code:
some_list = list()
for (iter in 1:3) {
  set.seed(as.numeric(Sys.time()))
  some_list[[iter]] = sample(1:10)
  Sys.sleep(1)
}
some_list
Nevertheless, I needed to add Sys.sleep(1), because this solution does not work if an operation in the loop takes less than one second.
I believe this is just a workaround, and the main question is still open.
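As an alternative to seeding from the clock by hand, base R's set.seed accepts NULL, which (per ?set.seed) re-initializes the generator as if no seed had yet been set in the session. A sketch of the same loop using that instead:

```r
# set.seed(NULL) discards the current state and re-initializes the RNG,
# as if no seed had been set in this session.
some_list <- list()
for (iter in 1:3) {
  set.seed(NULL)
  some_list[[iter]] <- sample(1:10)
}
some_list  # a fresh, unpredictable ordering on each run of the script
```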

Return value in R function using for loop

I've made this simple code to test something that isn't working.
funcion = function(x, p) {
  for (i in 1:p) {
    return(x + i)
  }
}
funcion(5,5)
This returns the value 6, and not 6 7 8 9 10, which is what I would have expected and what I'm looking for.
Can someone explain why it works this way, and how I can make it return what I want?
Thank you
We need to collect the output in an object and then return it:
f1 <- function(x, p) {
  # create an object to store the output from each iteration
  out <- numeric(p)
  for (i in seq_len(p)) {
    out[i] <- x + i # assign the output based on the index
  }
  return(out)
}
f1(5, 5)
#[1] 6 7 8 9 10
In R, this can be done without a for loop, i.e.
5 + seq_len(5)
#[1] 6 7 8 9 10
The issue is that return inside a function returns only once: it gets executed on the first iteration with x + 1, and that value is returned instead of the full output.

How to interrupt a function within a loop after a certain time in R?

I'm running an algorithm several times through a for loop in R. My loop is very basic and looks like this.
iter <- 5 #number of iterations
result <- list()
for (i in 1:iter) {
  fit <- algorithm() # this is an example function that starts the algorithm
  result[[i]] <- print(fit)
}
The problem is that the running times vary greatly with each run. There are runs that take only 10 minutes, others take over an hour. However, I know that the longer running times are due to the fact that the algorithm has problems because of the initial values and that the results of these runs will be wrong anyway.
So, I am now looking for a solution that (1) interrupts the function (i.e. algorithm() in the example above) after e.g. 1000 seconds, (2) proceeds with the for loop and (3) adds an additional iteration for each interruption. So, in the end, I want results from five runs with a running time less than 1000 seconds.
Does anyone have an idea? Is this even technically possible? Thanks in advance!
I think you can use setTimeLimit for this.
Quick demo:
setTimeLimit(elapsed = 2)
Sys.sleep(999)
# Error in Sys.sleep(999) : reached elapsed time limit
setTimeLimit(elapsed = Inf)
(It's important to note that you should reset the time limit once you no longer desire its interruption.)
My "complex algorithm" will sleep a random length. Those random lengths are
set.seed(42)
sleeps <- sample(10, size=5)
sleeps
# [1] 1 5 10 8 2
I'm going to set an arbitrary limit of 6 seconds, beyond which the sleep will be interrupted and we'll get no return value. This should interrupt the third and fourth elements.
iter <- 5
result <- list()
for (i in seq_len(iter)) {
  result[[i]] <- tryCatch({
    setTimeLimit(elapsed = 6)
    Sys.sleep(sleeps[[i]])
    setTimeLimit(elapsed = Inf)
    c(iter = i, slp = sleeps[[i]])
  }, error = function(e) NULL)
}
result
# [[1]]
# iter slp
# 1 1
# [[2]]
# iter slp
# 2 5
# [[3]]
# NULL
# [[4]]
# NULL
# [[5]]
# iter slp
# 5 2
If you have different "sleeps" and you end up with a shorter object than you need, just append it:
result <- c(result, vector("list", 5 - length(result)))
I'll enhance this slightly, for a couple of reasons:
1) I prefer lapply to for loops when filling result in this way; and
2) since complex algorithms can fail for other reasons, if my sleep failed early the time limit would not be reset, so I'll use on.exit, which ensures that a function is called when its enclosing function exits, whether due to an error or not.
result <- lapply(seq_len(iter), function(i) {
  setTimeLimit(elapsed = 6)
  on.exit(setTimeLimit(elapsed = Inf), add = TRUE)
  tryCatch({
    Sys.sleep(sleeps[i])
    c(iter = i, slp = sleeps[i])
  }, error = function(e) NULL)
})
result
# [[1]]
# iter slp
# 1 1
# [[2]]
# iter slp
# 2 5
# [[3]]
# NULL
# [[4]]
# NULL
# [[5]]
# iter slp
# 5 2
In this case, result is length 5, since lapply will always return something for each iteration. (The use of lapply is idiomatic for R, where its efficiencies are often in apply and map-like methods, unlike other languages where real speed is realized with literal for loops.)
(BTW: instead of the on.exit logic, I could have used tryCatch(..., finally=setTimeLimit(elapsed=Inf)) as well.)
An alternative to the on.exit logic is to use setTimeLimit(..., transient = TRUE) from within the execution block to be limited. That would make the code:
result <- lapply(seq_len(iter), function(i) {
  tryCatch({
    setTimeLimit(elapsed = 6, transient = TRUE)
    Sys.sleep(sleeps[i])
    c(iter = i, slp = sleeps[i])
  }, error = function(e) NULL)
})
One benefit of this is that regardless of the success/interruption of the limited code block, once that is done then the limit is immediately lifted, so there is less risk of inadvertently leaving it in place.
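The two ideas above (transient limit plus on.exit reset) can be packaged into a tiny reusable helper. This is a sketch, and with_time_limit is a made-up name, not a base R function:

```r
# Run an expression under an elapsed-time limit; return NULL if interrupted.
# Relies on lazy evaluation: `expr` is only forced inside tryCatch,
# after the limit has been set.
with_time_limit <- function(expr, seconds) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  on.exit(setTimeLimit(elapsed = Inf), add = TRUE)
  tryCatch(expr, error = function(e) NULL)
}

with_time_limit(Sys.sleep(10), 1)  # interrupted after ~1s, returns NULL
with_time_limit(1 + 1, 10)         # completes normally, returns 2
```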

Race condition with the parallel package in R

I am trying to execute a function with a side effect over a vector, in parallel. For example, in the following snippet, add.entry has the side effect of modifying master.
library(parallel)
master <- data.frame()
add.entry <- function(x) {
  row <- data.frame(a = x, b = sin(x))
  master <- rbind(master, row)
}
mclapply(1:42, add.entry)
The output I get is
[[1]]
  a        b
1 1 0.841471

[[2]]
  a         b
1 2 0.9092974

[[3]]
  a       b
1 3 0.14112

[[4]]
  a          b
1 4 -0.7568025
However, master contains nothing afterwards. My best guess is that there is some race condition involved. How can I fix it, like maybe declaring a critical section?
There are a few problems here:
1) It is very slow to grow an object inside a loop (cf. https://privefl.github.io/blog/why-loops-are-slow-in-r/).
2) When you use parallelism, you don't rbind() to the master in your global environment, but to copies of it in your different forks (cf. https://privefl.github.io/blog/a-guide-to-parallelism-in-r/).
3) mclapply already returns something (like lapply).
You can simply do
library(parallel)
add.entry <- function(x) {
  data.frame(a = x, b = sin(x))
}
res_list <- mclapply(1:42, add.entry)
master <- do.call("rbind", res_list)
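One portability note: mclapply relies on forking, which is unavailable on Windows (there it requires mc.cores = 1 and runs sequentially). A cluster-based variant of the same pattern, using parLapply, works on every platform:

```r
library(parallel)

add.entry <- function(x) {
  data.frame(a = x, b = sin(x))
}

cl <- makeCluster(2)  # PSOCK cluster: separate R processes, no forking needed
res_list <- parLapply(cl, 1:42, add.entry)
stopCluster(cl)

master <- do.call("rbind", res_list)
nrow(master)  # 42
```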

How to return multiple objects and only show part of the return when we call the function

For user-defined functions, I use a list if there are multiple objects to return.
However, not all the information is equally important.
For example, I am writing a function which estimates 3 parameters by iteration. The final converged result is the most important part, so I would like to see 3 numbers after I call my function. The history of the iteration (all steps of the estimation) is sometimes needed, but printing out all the steps every time occupies the whole screen.
Currently, I use a list to return 3 matrices, which contain all the steps.
Is there a way to make the function return the same thing, but only show the last 3 converged estimates when it is called? If I need the estimation steps, I can then use $ to get them. So it would look like:
MyEstimate(arg1, arg2, ...) # only show 3 final estimates
model <- MyEstimate(arg1, arg2, ...)
model$theta1 # show all steps of estimates of theta1
Basically, I want it to work like the lm function:
1) Show something important, like the estimates of the parameters;
2) Don't show, but still give access to, things like the design matrix X.
I reckon there is no simple answer to this.
What should I learn to achieve it?
You can use print to always print the values you are interested in when you call the function, while keeping all the information in the returned object.
myFun <- function() {
  a <- sample(1:10, 10, TRUE)
  b <- sample(1:10, 10, TRUE)
  c <- sample(1:10, 10, TRUE)
  # Print last value
  print(tail(a, 1))
  print(tail(b, 1))
  print(tail(c, 1))
  return(list(a = a, b = b, c = c))
}
> Obj <- myFun()
[1] 10
[1] 5
[1] 2
> Obj
$a
[1] 2 9 4 7 3 2 2 5 1 10
$b
[1] 6 4 9 8 8 9 2 8 6 5
$c
[1] 2 6 9 2 6 7 8 2 6 2
You can use the S3 classes mechanism to have your function MyEstimate return an object of a special class, made up by you, and write a print method for that class. You would subclass class "list".
If this special class is named "tautology" you would write method print.tautology. Something along the lines of the following:
print.tautology <- function(x) {
  h <- tail(x[[1]], 3)
  print(h)
  invisible(h)
}
MyEstimate <- function(x, y, ...) {
  res <- list(theta1 = 1:10, beta2 = 11:15)
  class(res) <- c("tautology", class(res))
  res
}
arg1 <- 1
arg2 <- 2
MyEstimate(arg1, arg2) # only show 3 final estimates
#[1] 8 9 10
model <- MyEstimate(arg1, arg2)
model$theta1 # show all steps of estimates of theta1
#[1] 1 2 3 4 5 6 7 8 9 10

R: first N of all permutations

I'm looking for a function that
can list all n! permutations of a given input vector (typically just the sequence 1:n)
can also list just the first N of all n! permutations
The first requirement is met, e.g., by permn() from package combinat, permutations() from package e1071, or permutations() from package gtools. However, I'm positive that there is yet another function from some package that also provides the second feature. I used it once, but have since forgotten its name.
Edit:
The definition of "first N" is arbitrary: the function just needs an internal enumeration scheme which is always followed, and should break after N permutations are computed.
As Spacedman correctly pointed out, it's crucial that the function does not compute more permutations than actually needed (to save time).
Edit - solution: I remembered what I was using: it was numperm() from package sna. numperm(4, 7) gives the 7th permutation of the elements 1:4; for the first N, one has to loop.
It seems like the best way to approach this would be to construct an iterator that could produce the list of permutations rather than using a function like permn which generates the entire list up front (an expensive operation).
An excellent place to look for guidance on constructing such objects is the itertools module in the Python standard library. Itertools has been partially re-implemented for R as a package of the same name.
The following is an example that uses R's itertools to implement a port of the Python generator that creates iterators for permutations:
require(itertools)

permutations <- function(iterable) {
  # Returns permutations of iterable. Based on code given in the documentation
  # of the `permutations` function in the Python itertools module:
  # http://docs.python.org/library/itertools.html#itertools.permutations
  n <- length(iterable)
  indices <- seq(n)
  cycles <- rev(indices)
  stop_iteration <- FALSE
  nextEl <- function() {
    if (stop_iteration) { stop('StopIteration', call. = FALSE) }
    if (cycles[1] == 1) { stop_iteration <<- TRUE } # Triggered on last iteration
    for (i in rev(seq(n))) {
      cycles[i] <<- cycles[i] - 1
      if (cycles[i] == 0) {
        if (i < n) {
          indices[i:n] <<- c(indices[(i+1):n], indices[i])
        }
        cycles[i] <<- n - i + 1
      } else {
        j <- cycles[i]
        indices[c(i, n-j+1)] <<- c(indices[n-j+1], indices[i])
        return( iterable[indices] )
      }
    }
  }
  # chain is used to return a copy of the original sequence
  # before returning permutations.
  return( chain(list(iterable), new_iterator(nextElem = nextEl)) )
}
To misquote Knuth: "Beware of bugs in the above code; I have only tried it, not proved it correct."
For the first 3 permutations of the sequence 1:10, permn pays a heavy price for computing unnecessary permutations:
> system.time( first_three <- permn(1:10)[1:3] )
user system elapsed
134.809 0.439 135.251
> first_three
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] 1 2 3 4 5 6 7 8 10 9
[[3]]
[1] 1 2 3 4 5 6 7 10 8 9
However, the iterator returned by permutations can be queried for only the first three elements which spares a lot of computations:
> system.time( first_three <- as.list(ilimit(permutations(1:10), 3)) )
user system elapsed
0.002 0.000 0.002
> first_three
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] 1 2 3 4 5 6 7 8 10 9
[[3]]
[1] 1 2 3 4 5 6 7 9 8 10
The Python algorithm does generate permutations in a different order than permn.
Computing all the permutations is still possible:
> system.time( all_perms <- as.list(permutations(1:10)) )
user system elapsed
498.601 0.672 499.284
It is much more expensive, though, since the algorithm makes heavy use of interpreted loops, unlike permn. Python actually implements this algorithm in C, which compensates for the inefficiency of interpreted loops.
The code is available in a gist on GitHub. If anyone has a better idea, fork away!
In my version of R/combinat, the function permn() is just over thirty lines long. One way would be to make a copy of permn and change it to stop early.
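For completeness, here is a dependency-free sketch of that early-stopping idea: a recursive generator that collects permutations in a fixed depth-first order and bails out as soon as N have been produced (first_n_perms is a made-up name, not part of any package):

```r
# Generate at most N permutations of v, stopping early instead of
# computing all factorial(length(v)) of them.
first_n_perms <- function(v, N) {
  out <- list()
  recurse <- function(prefix, rest) {
    if (length(out) >= N) return(invisible(NULL))
    if (length(rest) == 0) {
      # A complete permutation: record it in the enclosing environment.
      out[[length(out) + 1]] <<- prefix
      return(invisible(NULL))
    }
    for (i in seq_along(rest)) {
      if (length(out) >= N) break  # stop descending once N are collected
      recurse(c(prefix, rest[i]), rest[-i])
    }
  }
  recurse(v[0], v)
  out
}

first_n_perms(1:4, 3)  # the first three permutations, in depth-first order
```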
