R: first N of all permutations

I'm looking for a function that (1) can list all n! permutations of a given input vector (typically just the sequence 1:n), and (2) can also list just the first N of those n! permutations.
The first requirement is met, e.g., by permn() from package combinat, permutations() from package e1071, or permutations() from package gtools. However, I'm positive that there is yet another function from some package that also provides the second feature. I used it once, but have since forgotten its name.
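For reference, a quick sketch of the first requirement using permn() from combinat (the other packages behave similarly, though the enumeration order differs):

library(combinat)
all_perms <- permn(3)   # list of all 3! = 6 permutations of 1:3
length(all_perms)
# [1] 6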
Edit:
The definition of "first N" is arbitrary: the function just needs an internal enumeration scheme which is always followed, and should break after N permutations are computed.
As Spacedman correctly pointed out, it's crucial that the function does not compute more permutations than actually needed (to save time).
Edit - solution: I remembered what I was using: it was numperm() from package sna. numperm(4, 7) gives the 7th permutation of the elements 1:4; for the first N, one has to loop.
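A minimal sketch of that loop (assuming sna is installed; numperm(olength, permnum) returns the permnum-th permutation of 1:olength):

library(sna)
N <- 7
first_N <- lapply(seq_len(N), function(k) numperm(4, k))  # first N permutations of 1:4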

It seems like the best way to approach this would be to construct an iterator that can produce permutations one at a time, rather than using a function like permn, which generates the entire list up front (an expensive operation).
An excellent place to look for guidance on constructing such objects is the itertools module in the Python standard library. Itertools has been partially re-implemented for R as a package of the same name.
The following is an example that uses R's itertools to implement a port of the Python generator that creates iterators for permutations:
require(itertools)

permutations <- function(iterable) {
  # Returns permutations of iterable. Based on code given in the documentation
  # of the `permutations` function in the Python itertools module:
  # http://docs.python.org/library/itertools.html#itertools.permutations
  n <- length(iterable)
  indices <- seq(n)
  cycles <- rev(indices)
  stop_iteration <- FALSE

  nextEl <- function() {
    if (stop_iteration) { stop('StopIteration', call. = FALSE) }
    if (cycles[1] == 1) { stop_iteration <<- TRUE }  # Triggered on last iteration

    for (i in rev(seq(n))) {
      cycles[i] <<- cycles[i] - 1
      if (cycles[i] == 0) {
        if (i < n) {
          indices[i:n] <<- c(indices[(i+1):n], indices[i])
        }
        cycles[i] <<- n - i + 1
      } else {
        j <- cycles[i]
        indices[c(i, n-j+1)] <<- c(indices[n-j+1], indices[i])
        return( iterable[indices] )
      }
    }
  }

  # chain is used to return a copy of the original sequence
  # before returning permutations.
  return( chain(list(iterable), new_iterator(nextElem = nextEl)) )
}
To misquote Knuth: "Beware of bugs in the above code; I have only tried it, not proved it correct."
For the first 3 permutations of the sequence 1:10, permn pays a heavy price for computing unnecessary permutations:
> system.time( first_three <- permn(1:10)[1:3] )
user system elapsed
134.809 0.439 135.251
> first_three
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] 1 2 3 4 5 6 7 8 10 9
[[3]]
[1] 1 2 3 4 5 6 7 10 8 9
However, the iterator returned by permutations can be queried for only the first three elements, which spares a lot of computation:
> system.time( first_three <- as.list(ilimit(permutations(1:10), 3)) )
user system elapsed
0.002 0.000 0.002
> first_three
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] 1 2 3 4 5 6 7 8 10 9
[[3]]
[1] 1 2 3 4 5 6 7 9 8 10
The Python algorithm does generate permutations in a different order than permn.
Computing all the permutations is still possible:
> system.time( all_perms <- as.list(permutations(1:10)) )
user system elapsed
498.601 0.672 499.284
Though this is much more expensive, as the R implementation makes heavy use of interpreted loops compared to permn. Python implements this algorithm in C, which compensates for the inefficiency of interpreted loops.
The code is available in a gist on GitHub. If anyone has a better idea, fork away!

In my version of R/combinat, the function permn() is just over thirty lines long. One way would be to make a copy of permn and change it to stop early, along the lines of the sketch below.
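To illustrate the stop-early idea without reproducing permn's actual internals, here is a hypothetical helper that generates permutations recursively and aborts as soon as N have been collected, so no unnecessary permutations are computed:

first_n_perms <- function(v, N) {
  out <- vector("list", N)
  count <- 0
  recurse <- function(prefix, rest) {
    if (count >= N) return(invisible(NULL))
    if (length(rest) == 0) {
      # a complete permutation has been built; store it
      count <<- count + 1
      out[[count]] <<- prefix
      return(invisible(NULL))
    }
    for (i in seq_along(rest)) {
      if (count >= N) break   # stop early: nothing beyond N is computed
      recurse(c(prefix, rest[i]), rest[-i])
    }
  }
  recurse(v[0], v)
  out[seq_len(count)]
}

first_n_perms(1:4, 3)
# returns 1 2 3 4, then 1 2 4 3, then 1 3 2 4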


How to return multiple objects and only show part of the return when we call the function

For user-defined functions, I use a list if there are multiple objects to return.
However, not all the information is equally important.
For example, I am writing a function which estimates 3 parameters by iteration. The final converged result is the most important part, so I would like to see those 3 numbers when I call my function. The history of the iteration (all steps of the estimation) is sometimes needed, but printing all steps every time fills the whole screen.
Currently, I use a list to return 3 matrices, which contain all the steps.
Is there a way to make the function return the same thing, but show only the last 3 converged estimates when I call it, while the estimation steps remain accessible via $? So it looks like:
MyEstimate(arg1, arg2, ...) # only show 3 final estimates
model <- MyEstimate(arg1, arg2, ...)
model$theta1 # show all steps of estimates of theta1
Basically, I want it to work like the lm function:
show something important, like the estimates of the parameters;
don't show the rest, but keep it accessible if we want it, like the design matrix X.
I reckon there is no simple answer for that.
What should I learn to achieve this?
You can use print to always print the values you are interested in when you call the function, while maintaining all information in the returned object.
> myFun <- function(){
  a <- sample(1:10, 10, TRUE)
  b <- sample(1:10, 10, TRUE)
  c <- sample(1:10, 10, TRUE)
  # Print the last value of each vector
  print(tail(a, 1))
  print(tail(b, 1))
  print(tail(c, 1))
  return(list(a = a, b = b, c = c))
}
> Obj <- myFun()
[1] 10
[1] 5
[1] 2
> Obj
$a
[1] 2 9 4 7 3 2 2 5 1 10
$b
[1] 6 4 9 8 8 9 2 8 6 5
$c
[1] 2 6 9 2 6 7 8 2 6 2
You can use the S3 class mechanism to have your function MyEstimate return an object of a special class, made up by you, and write a print method for that class. You would subclass class "list".
If this special class is named "tautology" you would write method print.tautology. Something along the lines of the following:
print.tautology <- function(x){
  h <- tail(x[[1]], 3)
  print(h)
  invisible(h)
}

MyEstimate <- function(x, y, ...){
  res <- list(theta1 = 1:10, beta2 = 11:15)
  class(res) <- c("tautology", class(res))
  res
}
arg1 <- 1
arg2 <- 2
MyEstimate(arg1, arg2) # only show 3 final estimates
#[1] 8 9 10
model <- MyEstimate(arg1, arg2)
model$theta1 # show all steps of estimates of theta1
#[1] 1 2 3 4 5 6 7 8 9 10

R: set.seed produces the same result after seed removal

Background. I want to generate random sequences within a for loop in R v3.5.0. To do this I use code like the one below:
rm(.Random.seed, envir = globalenv())
some_list = list()
for (iter in 1:3) {
  set.seed(iter)
  some_list[[iter]] = sample(1:10)
}
some_list
This code returns a list like this:
> some_list
[[1]]
[1] 3 4 5 7 2 8 9 6 10 1
[[2]]
[1] 2 7 5 10 6 8 1 3 4 9
[[3]]
[1] 2 8 4 3 9 6 1 5 10 7
After that I rerun the same script, expecting the seed to have been reset by rm(.Random.seed, envir=globalenv()) within the session, and hence a different result.
But the reality is different: I receive exactly the same list even after removal of .Random.seed from the global environment.
I'm really confused by this behaviour of set.seed.
My questions are:
1) Is such behaviour of set.seed normal?
2) How can the seed be reset if rm(.Random.seed, envir=globalenv()) does not work?
Thanks in advance.
It seems like you are aiming for random behaviour with the call to rm(.Random.seed, envir=globalenv()), so why not just remove the set.seed from your code altogether?
rm(.Random.seed, envir = globalenv())
some_list = list()
for (iter in 1:3) {
  some_list[[iter]] = sample(1:10)
}
some_list
The above produces different results each time you run it. There is no need to have set.seed in your code.
I created a workaround based on using Sys.time() as the seed. Here is the code:
some_list = list()
for (iter in 1:3) {
  set.seed(as.numeric(Sys.time()))
  some_list[[iter]] = sample(1:10)
  Sys.sleep(1)
}
some_list
Nevertheless, I needed to add Sys.sleep(1), because this solution does not work if an operation in the loop takes less than one second.
I believe this is just a workaround and that the main question is still open.
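For what it's worth, set.seed also accepts NULL, which its documentation describes as re-initializing the generator as if no seed had been set; that may be a cleaner reset than deleting .Random.seed by hand:

set.seed(NULL)   # re-initialize the RNG (see ?set.seed)
some_list <- list()
for (iter in 1:3) {
  some_list[[iter]] <- sample(1:10)
}
some_list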

R keep randomly generating numbers until all numbers within specified range are present

My aim is to randomly generate a vector of integers using R, populated by numbers between 1 and 8. However, I want to keep growing the vector until all the numbers from 1:8 are represented at least once, e.g. 1,4,6,2,2,3,5,1,4,7,6,8.
I am able to generate single numbers or a sequence of numbers using sample:
x = sample(1:8, 1, replace = T)
> x
[1] 6
I have played around with repeat to see how it might work with sample, and I can at least get the generation to stop when one specific number occurs, e.g.
repeat {
  print(x)
  x = sample(1:8, 1, replace = T)
  if (x == 3) {
    break
  }
}
Which gives:
[1] 3
[1] 6
[1] 6
[1] 6
[1] 6
[1] 6
[1] 2
I am struggling now to work out how to stop the number generation once all numbers between 1 and 8 are present. Additionally, I know that the above code only prints the sequence as it is generated and does not store it as a vector. Any advice pointing me in the right direction would be really appreciated!
This is fine for 1:8 but might not always be a good idea.
foo = integer(0)
set.seed(42)
while (TRUE) {
  foo = c(foo, sample(1:8, 1))
  if (all(1:8 %in% foo)) break
}
foo
# [1] 8 8 3 7 6 5 6 2 6 6 4 6 8 3 4 8 8 1
If you have more than 1:8, it may be better to estimate the number of draws (N) needed on average to see every number at least once (the coupon-collector expectation, N = n*(1/1 + 1/2 + ... + 1/n)), and then sample N numbers in a way that guarantees every number appears at least once.
set.seed(42)
vec = 1:8
N = ceiling(sum(length(vec)/(1:length(vec))))
foo = sample(c(vec, sample(vec, N - length(vec), TRUE)))
foo
# [1] 3 6 8 3 8 8 6 4 5 6 1 6 4 6 6 3 5 7 2 2 7 8
Taking a cue from d.b, here's a slightly more verbose method that is more memory-efficient (and a little faster too, though I doubt speed is your issue):
Differences:
- pre-allocate memory in chunks (size 100 here), which mitigates the cost of extend-by-one vector growth; allocating and extending 100 (or even 1000) elements at a time is much cheaper
- compare only the newest number instead of all numbers each time (the first n-1 numbers have already been accounted for, so there is no need to check them again)
Code:
library(microbenchmark)
n <- 8   # size of the sample space, as in the question

microbenchmark(
  r2evans = {
    emptyvec100 <- integer(100)
    counter <- 0
    out <- integer(0)
    unseen <- seq_len(n)
    set.seed(42)
    repeat {
      # grow the result in chunks of 100 instead of one element at a time
      if (counter %% 100 == 0) out <- c(out, emptyvec100)
      counter <- counter + 1
      num <- sample(n, size = 1)
      # only the newest draw needs checking; drop it from the unseen set
      unseen <- unseen[unseen != num]
      out[counter] <- num
      if (!length(unseen)) break
    }
    out <- out[1:counter]
  },
  d.b = {
    foo = integer(0)
    set.seed(42)
    while (TRUE) {
      foo = c(foo, sample(1:n, 1))
      if (all(1:n %in% foo)) break
    }
  }, times = 1000, unit = 'us')
# Unit: microseconds
# expr min lq mean median uq max neval
# r2evans 1090.007 1184.639 1411.531 1228.947 1320.845 11344.24 1000
# d.b 1242.440 1372.264 1835.974 1441.916 1597.267 14592.74 1000
(This is intended neither as code-golf nor speed-optimization. My primary goal is to argue against extend-by-one vector work, and suggest a more efficient comparison technique.)
As d.b further suggested, this works fine for 1:8 but may run into trouble with larger numbers. If we extend n upward, the difference becomes more pronounced (a benchmark plot accompanied the original answer here).
(Edit: with d.b's code changes, the execution times are much closer, and not nearly as exponential-looking. Apparently the removal of unique had significant benefits to his code.)

R vector staying the same length after indexing within recursive function

I wrote a recursive binary search function in R which finds the smallest element in a vector that is greater than a given value:
binary_next_biggest <- function(x, vec){
  if (length(vec) == 1){
    if (x < vec[1]){
      return(vec[1])
    } else {
      return(NA)
    }
  } else {
    mid = ceiling(length(vec)/2)
    if (x < vec[mid]){
      return(binary_next_biggest(x, vec[1:mid]))
    } else {
      return(binary_next_biggest(x, vec[mid+1:length(vec)]))
    }
  }
}
I've written this exact same function in Python with no issues (code below), but in R it does not work.
import numpy as np

def binary_next_biggest(x, arr):
    if len(arr) == 1:
        if x < arr[0]:
            return arr[0]
        else:
            return None
    else:
        mid = int(np.ceil(len(arr)/2) - 1)
        if x < arr[mid]:
            return binary_next_biggest(x, arr[:mid+1])
        else:
            return binary_next_biggest(x, arr[mid+1:])
Through debugging in RStudio I discovered the mechanics of why it's not working: indexing the vector in my above function is returning a vector of the same length, so that if
vec <- 1:10
and vec is indexed within the function,
vec[6:10]
the resulting vector passed to the new call of binary_next_biggest() is
6 7 8 9 10 NA NA NA NA NA
where I would expect
6 7 8 9 10
What's going on here? I know I can just rewrite this as a while loop that changes the indexes iteratively, but I don't understand why vector indexing behaves this way in the code I've written. Within the interactive R console, indexing behaves as expected and changes the vector length, so why would it behave differently within a function, and what would be the appropriate way to index for what I'm trying to do?
The cause of the strange behavior is an indexing error: the part mid+1:length(vec) should be (mid+1):length(vec), because the : operator has higher precedence than addition. As written, mid+1:length(vec) is parsed as mid + (1:length(vec)), which produces length(vec) indices starting at mid+1; the indices past the end of the vector yield NA, which is why the result keeps its original length, padded with NAs.
Here is an illustration of the difference.
5 + 1:10
# [1] 6 7 8 9 10 11 12 13 14 15
(5+1):10
# [1] 6 7 8 9 10
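With that one change in place, the recursive call in the else branch reads:
return(binary_next_biggest(x, vec[(mid+1):length(vec)]))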
There might be a reason why you're doing a binary search (a simplified example of a more complicated problem?), but there are easier ways to do this in R.
vec <- 1:1000
x <- 49
min(vec[which(vec > x)])
# [1] 50
Which works even if vec isn't ordered.
vec <- sample.int(1000)
min(vec[which(vec > x)])
# [1] 50

Translate mathematical function into R

I have this mathematical function: f(x) = |x - v_1| + |x - v_2| + ... + |x - v_n|, i.e. the sum of the absolute differences between a value x and the elements of a vector v.
I have written this R code to compute it:
f <- function(x, v) {
  result <- 0
  for (i in seq_along(v)) {
    # accumulate |x - v[i]| over all elements of v
    result <- result + abs(x - v[i])
  }
  return(result)
}
However, this does not seem efficient to me. How can I implement this sum with the R sum() function?
I appreciate your answers!
sum(abs(x - v)) should be enough; there is no need for the for loop, since arithmetic operations in R are vectorized.
# Example
> x <- 5
> v <- 1:10
> abs(x-v)
[1] 4 3 2 1 0 1 2 3 4 5
> sum(abs(x-v))
[1] 25
