I want to find all possible permutations for large n using R. At the moment I am using permutations(n,n) from gtools package, but for n>10 it is almost impossible; I get memory crashes due to the large number of permutations (n!). I do not want to sample as I need to find the exact distribution for a particular statistic. Is there any way I can do this faster or that I can break it down into small blocks?
Your goal is very likely to be impractical. How large is "large n"? Even if you can generate an enormous number of permutations, how long will it take you to summarize over them? And how much accuracy do you gain from an exhaustive computation over a billion elements compared with a random sample of ten million of them? However:
The iterpc package can enumerate permutations in blocks. For example:
library("iterpc")
Set up an object ("iterator") to generate permutations of 10 objects:
I <- iterpc(10,labels=1:10,ordered=TRUE)
Return the first 5 permutations:
getnext(I,5)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 2 3 4 5 6 7 8 9 10
## [2,] 1 2 3 4 5 6 7 8 10 9
## [3,] 1 2 3 4 5 6 7 9 8 10
## [4,] 1 2 3 4 5 6 7 9 10 8
## [5,] 1 2 3 4 5 6 7 10 8 9
Return the next 5 permutations:
getnext(I,5)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 2 3 4 5 6 7 10 9 8
## [2,] 1 2 3 4 5 6 8 7 9 10
## [3,] 1 2 3 4 5 6 8 7 10 9
## [4,] 1 2 3 4 5 6 8 9 7 10
## [5,] 1 2 3 4 5 6 8 9 10 7
Assuming you can compute your statistic one block at a time and then combine the results, this should be feasible. It doesn't look like you can parallelize very easily, though: there's no way to jump to a particular element of an iterator ... The numperm function from the sna package provides "random" (i.e. non-sequential) access to permutations, although in a different ordering from those given by iterpc - but I'm guessing that iterpc is much more efficient, so you may be better off crunching through blocks sequentially on a single node/core/machine rather than distributing the process.
Here are the first 5 permutations as given by sna::numperm:
library("sna")
t(sapply(1:5,numperm,olength=10))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 2 1 3 4 5 6 7 8 9 10
## [2,] 2 3 1 4 5 6 7 8 9 10
## [3,] 2 3 4 1 5 6 7 8 9 10
## [4,] 2 3 4 5 1 6 7 8 9 10
## [5,] 2 3 4 5 6 1 7 8 9 10
The guts of iterpc are written in C++, so it should be very efficient, but no matter what, things are going to get hard for larger values of n. To my surprise, iterpc can handle the full set of 10! = 3628800 permutations without much trouble:
system.time(g <- getall(I))
## user system elapsed
## 0.416 0.304 0.719
dim(g)
## [1] 3628800 10
However, I can't do any computations with n>10 in a single block on my machine (n=11: "cannot allocate vector of size 1.6 Gb" ... n>11 "The length of the iterator is too large, try using getnext(I,d)")
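The block-at-a-time idea can be sketched in base R without any package. This is only an illustration under stated assumptions: next_perm() and exact_dist() are hypothetical helpers (not part of iterpc or gtools), the statistic is assumed to return a single number per permutation, and a pure-R loop like this will be far slower than iterpc's C++ code. The point is that only block_size permutations are ever held in memory, while the frequency table is accumulated across blocks.

```r
# Lexicographic successor of permutation p, or NULL if p is the last one.
# Hypothetical helper for illustration only.
next_perm <- function(p) {
  n <- length(p)
  i <- n - 1L
  while (i >= 1L && p[i] >= p[i + 1L]) i <- i - 1L
  if (i < 1L) return(NULL)                 # p is in descending order: done
  j <- n
  while (p[j] <= p[i]) j <- j - 1L
  tmp <- p[i]; p[i] <- p[j]; p[j] <- tmp   # swap pivot with its successor
  p[(i + 1L):n] <- rev(p[(i + 1L):n])      # reverse the tail
  p
}

# Exact distribution of stat() over all n! permutations, keeping at most
# block_size permutations in memory at once.
exact_dist <- function(n, stat, block_size = 1000L) {
  counts <- numeric(0)                     # per-block frequencies, merged at the end
  p <- seq_len(n)
  while (!is.null(p)) {
    block <- matrix(NA_integer_, block_size, n)
    k <- 0L
    while (!is.null(p) && k < block_size) {
      k <- k + 1L
      block[k, ] <- p
      p <- next_perm(p)
    }
    tab <- table(apply(block[seq_len(k), , drop = FALSE], 1, stat))
    counts <- c(counts, setNames(as.numeric(tab), names(tab)))
  }
  # add up the frequencies that share the same statistic value
  vapply(split(counts, names(counts)), sum, numeric(1))
}

# Example statistic: number of fixed points of the permutation
fixed_points <- function(p) sum(p == seq_along(p))
exact_dist(4, fixed_points, block_size = 5L)
```

The names of the result are the observed statistic values and the entries are their exact frequencies, so dividing by factorial(n) gives the exact distribution.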
I find people use
which(matrix==max(matrix, na.rm=FALSE))
to show both row and column index.
But my question is: how do I extract the row index and the column index individually and then store these two values in two separate variables?
For example, with matrix=
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 6 7 7 2 4 3 7 1 4
[2,] 1 9 8 7 2 6 10 9 5 2
[3,] 7 10 8 4 10 5 4 8 4 4
[4,] 4 3 1 1 3 3 9 7 4 2
[5,] 1 8 1 9 9 8 1 3 7 7
[6,] 2 6 7 5 6 10 4 6 15 1
The max value is matrix[6,9]=15. How could I find row = 6 and column = 9 separately, and store 6 in a variable A and 9 in a variable B?
Thank you guys very much.
For a large matrix which.max should be more efficient than which. So, for a matrix m, we can use
A = row(m)[d <- which.max(m)]
B = col(m)[d]
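Maybe a roundabout way, but if the matrix is called "mat" (subtracting 1 from the linear index first keeps the result correct when the maximum sits in the last row, as it does in your example):
idx <- which(mat == max(mat)) - 1   # zero-based linear index (column-major)
colmax <- idx %/% nrow(mat) + 1
rowmax <- idx %% nrow(mat) + 1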
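Base R can also do the index arithmetic for you. The sketch below uses a small made-up matrix m (an assumption for illustration, not the matrix from the question): arrayInd() converts the linear index from which.max() into a row/column pair, and which(..., arr.ind = TRUE) returns one row per maximum if there are ties.

```r
m <- matrix(1:12, nrow = 3)
m[3, 2] <- 99L                        # plant a known maximum at row 3, column 2

pos <- arrayInd(which.max(m), dim(m)) # 1 x 2 matrix: (row, column) of the first maximum
A <- pos[1, 1]                        # row index
B <- pos[1, 2]                        # column index

# All positions of the maximum (one row per tie), with named columns "row" and "col"
ties <- which(m == max(m), arr.ind = TRUE)
```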
I am trying to learn the piping function (%>%).
When trying to convert from this line of code to another line it does not work.
---- R code -- original version -----
set.seed(1014)
replicate(6,sample(1:8))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 3 7 4 5 1
[2,] 2 8 4 2 4 2
[3,] 5 4 8 5 8 5
[4,] 3 1 2 1 1 7
[5,] 4 6 3 7 7 3
[6,] 6 5 1 3 3 8
[7,] 8 7 5 8 6 6
[8,] 7 2 6 6 2 4
---- R code - recoded with the pipe ----
> sample(1:8) %>% replicate(6,.)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 7 7 7 7 7 7
[2,] 3 3 3 3 3 3
[3,] 2 2 2 2 2 2
[4,] 1 1 1 1 1 1
[5,] 5 5 5 5 5 5
[6,] 4 4 4 4 4 4
[7,] 8 8 8 8 8 8
[8,] 6 6 6 6 6 6
Notice that when using the pipe the sampling does not work: I get the same vector in every column.
That's to be expected. replicate expects an expression, but when using the pipe operator as is you just paste the result of the call to sample() to replicate. So you get 6 times the same result.
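The difference can be reproduced without the pipe at all. In this small sketch (the variable names x, same and fresh are just for illustration), passing an already computed value to replicate() copies it, while passing the call lets replicate() re-run it each time:

```r
set.seed(1014)

x <- sample(1:8)                      # sample() runs once, right here
same  <- replicate(6, x)              # x is just a value: six identical columns
fresh <- replicate(6, sample(1:8))    # the call itself is re-evaluated six times

all(same == same[, 1])                # every column of 'same' equals the first
```

Piping the result of sample(1:8) into replicate(6, .) corresponds to the first form, which is why all six columns come out the same.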
You have to use quote() to pass the expression to replicate instead of the result, but you shouldn't forget to evaluate each of the repetitions of that expression.
quote(sample(c(1:10,-99),6,rep=TRUE)) %>%
replicate(6, .) %>%
sapply(eval)
Gives:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 5 2 10 10 9 2
[2,] 4 3 1 3 -99 1
[3,] 10 2 3 8 2 4
[4,] -99 1 6 2 10 3
[5,] 8 -99 1 9 4 6
[6,] 4 10 8 1 -99 8
What happens here:
the piping sends an expression to replicate without evaluating it.
replicate replicates that expression and returns a list with 6 times that expression but without evaluating it.
sapply(eval) goes through the list and executes each expression in that list.
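The three steps can be unpacked in base R to inspect the intermediate objects. This is a sketch with assumptions: the names e, calls and res are illustrative, and simplify = FALSE is added here only to make the list result explicit.

```r
set.seed(1)

e <- quote(sample(1:8))                      # step 1: an unevaluated call
calls <- replicate(6, e, simplify = FALSE)   # step 2: a list holding six copies of the call
res <- sapply(calls, eval)                   # step 3: evaluate each copy separately

is.call(calls[[1]])                          # the list elements are still calls, not numbers
dim(res)                                     # one column per evaluated call
```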
In your previous question (i.e. when using data.frame), you could have done eg:
quote(sample(c(1:10,-99),6,rep=TRUE)) %>%
replicate(6, .) %>%
data.frame
Now the function data.frame would force the expressions to be executed, but you also end up with terrible variable names, i.e. the expression itself.
If you want to learn more about the issues here, you'll have to dive into what is called "lazy evaluation" and how that is dealt with exactly by the pipe operator. But in all honesty, I really don't see any advantage of using the pipe operator in this case. It's not even more readable.
As per Frank's comment: You can use a mixture of piping and nesting of functions to avoid the sapply. But for that, you have to contain the nested functions inside a code block or the pipe operator won't process it correctly:
quote(sample(c(1:10,-99),6,rep=TRUE)) %>% {
replicate(6, eval(.)) }
Very interesting, but imho not really useful...
I found this while searching for a similar approach.
Selecting rows with same result in different columns in R
Is there a way to search within a range of columns? Playing off the example in the link: instead of catch[catch$tspp.name == catch$elasmo.name,], is it possible to do something like this?
catch[catch$tspp.name == c[23:56],] where R would search for values within columns 23 to 56 that match the tspp value?
Thanks in advance and please let me know whether it's better to post an independent question on a topic related to a previous post or to insert a follow up question within the aforementioned post.
Here's one way to do it. This finds rows of X where the first column appears in columns 2 through 9.
> set.seed(1)
> X<-matrix(sample(10,100,T),10)
> X
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 3 10 5 9 5 10 4 5 3
[2,] 4 2 3 6 7 9 3 9 8 1
[3,] 6 7 7 5 8 5 5 4 4 7
[4,] 10 4 2 2 6 3 4 4 4 9
[5,] 3 8 3 9 6 1 7 5 8 8
[6,] 9 5 4 7 8 1 3 9 3 8
[7,] 10 8 1 8 1 4 5 9 8 5
[8,] 7 10 4 2 5 6 8 4 2 5
[9,] 7 4 9 8 8 7 1 8 3 9
[10,] 1 8 4 5 7 5 9 10 2 7
> X[rowSums(X[,1]==X[,2:9])>0,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 3 10 5 9 5 10 4 5 3
[2,] 3 8 3 9 6 1 7 5 8 8
[3,] 9 5 4 7 8 1 3 9 3 8
[4,] 7 4 9 8 8 7 1 8 3 9
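The same idea carries over to a data frame like the one in the question. The sketch below uses a made-up catch data frame with hypothetical columns sp1 and sp2 standing in for columns 23 to 56; the comparison works because the vector is recycled down the columns of the matrix, so each row is compared with its own tspp.name value.

```r
# Hypothetical stand-in for the 'catch' data frame in the question
catch <- data.frame(
  tspp.name = c("cod", "hake", "sole"),
  sp1       = c("cod", "ray",  "eel"),
  sp2       = c("eel", "cod",  "ray"),
  stringsAsFactors = FALSE
)

cols <- 2:3                                    # would be 23:56 in the real data
# Compare tspp.name with every selected column, row by row
hit <- rowSums(catch$tspp.name == as.matrix(catch[, cols])) > 0
catch[hit, ]
```

If the columns may contain NA, rowSums(..., na.rm = TRUE) avoids NA in the row filter.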
I want to calculate all permutations of a blocked design suitable for a Friedman test. Consider the following example:
thedata <- data.frame(
score = c(replicate(4,sample(1:3))),
judge = rep(1:4,each=3),
wine = rep.int(1:3,4)
)
Four judges ranked 3 wines and now I want to calculate every possible permutation within the data for every judge. I expect to see 1,296 permutations, as also given by:
require(permute)
CTRL <- how(within=Within("free"),
plots=Plots(strata=factor(thedata$judge)),
complete=TRUE,maxperm=1e9)
numPerms(12,CTRL)
However, allPerms(12,control=CTRL) produces the following error:
Error in (function (..., deparse.level = 1) :
number of rows of matrices must match (see arg 2)
I tried using the blocks argument, but it simply returns a matrix that repeats 4 times a matrix with the 6 possible permutations of 3 values:
CTRL <- how(within=Within("free"),
blocks=factor(thedata$judge),
complete=TRUE,maxperm=1e9)
allPerms(12,control=CTRL)
IMPORTANT NOTE:
I do have a custom function to obtain the result, using an adaptation of expand.grid() with permn() from the combinat package. I'm interested in where I misunderstand the permute package, not how I can calculate all these permutations myself.
The examples provided by @Joris identify two bugs in allPerms() that were not picked up by the current set of examples or unit tests (that'll be fixed soon too!).
The first issue is an obscure bug that took me a bit of time to think through. I have now implemented a fix for it as well: version 0.8-3 of permute happily handles the Plots version of @Joris' question:
R> p <- allPerms(12,control=CTRL)
R> dim(p)
[1] 1295 12
R> head(p)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 2 3 4 5 6 7 8 9 10 12 11
[2,] 1 2 3 4 5 6 7 8 9 11 10 12
[3,] 1 2 3 4 5 6 7 8 9 11 12 10
[4,] 1 2 3 4 5 6 7 8 9 12 10 11
[5,] 1 2 3 4 5 6 7 8 9 12 11 10
[6,] 1 2 3 4 5 6 7 9 8 10 11 12
R> packageVersion("permute")
[1] ‘0.8.3’
The second is an oversight. allPerms() generates permutation indices, but internally it works block by block. In the case @Joris reported, each block has 3 observations and hence 6 permutations of the indices 1:3. Once these permutation indices have been created, the code should have used them to index the row indices of the original data for each block. allPerms() was doing this for every conceivable combination of permutation types except the simple random permutation within blocks case. r2838 fixes this issue.
allPerms() was also not replicating each within-block permutation matrix to match each combination of rows in the other within-block permutation matrices. This requires an operation like expand.grid() but on the within-block permutation matrices. r2839 fixes this particular issue.
allPerms() works this way because it does not expect the within-block samples to be located contiguously within the original data series.
This second bug was fixed via r2838 and r2839 in the SVN sources on R-Forge.
R> require(permute)
Loading required package: permute
R> CTRL <- how(within=Within("free"),
+ blocks=factor(thedata$judge),
+ complete=TRUE,maxperm=1e9,
+ observed = TRUE)
R> numPerms(12,CTRL)
[1] 1296
R> tmp <- allPerms(12,control=CTRL)
R> dim(tmp)
[1] 1296 12
R> head(tmp)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 2 3 4 5 6 7 8 9 10 11 12
[2,] 1 2 3 4 5 6 7 8 9 10 12 11
[3,] 1 2 3 4 5 6 7 8 9 11 10 12
[4,] 1 2 3 4 5 6 7 8 9 11 12 10
[5,] 1 2 3 4 5 6 7 8 9 12 10 11
[6,] 1 2 3 4 5 6 7 8 9 12 11 10
R> tail(tmp)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1291,] 3 2 1 6 5 4 9 8 7 10 11 12
[1292,] 3 2 1 6 5 4 9 8 7 10 12 11
[1293,] 3 2 1 6 5 4 9 8 7 11 10 12
[1294,] 3 2 1 6 5 4 9 8 7 11 12 10
[1295,] 3 2 1 6 5 4 9 8 7 12 10 11
[1296,] 3 2 1 6 5 4 9 8 7 12 11 10
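To make the expand.grid()-like step concrete, here is a base R sketch of the idea. It is not the permute package's implementation: perms3, blocks and all_perms are illustrative names, and the six orderings of three items are simply written out by hand. Each row of the grid picks one within-block permutation per judge, and the chosen permutation is used to index that block's original row indices, so the blocks would not need to be contiguous.

```r
judge <- rep(1:4, each = 3)
blocks <- split(seq_along(judge), judge)       # original row indices for each judge

# All 6 orderings of three items, one per row
perms3 <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
                c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))

# One column per block; each row chooses a permutation (1..6) for every block
grid <- expand.grid(rep(list(seq_len(nrow(perms3))), length(blocks)))

all_perms <- t(apply(grid, 1, function(choice) {
  out <- integer(length(judge))
  for (b in seq_along(blocks)) {
    idx <- blocks[[b]]
    out[idx] <- idx[perms3[choice[b], ]]       # permute the block's own row indices
  }
  out
}))

dim(all_perms)                                 # 6^4 = 1296 rows, 12 columns
```

The row order differs from what allPerms() returns, but the set of permutations is the same size, including the observed ordering.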
Is it possible in R to say - I want all indices from position i to the end of vector/matrix?
Say I want a submatrix from 3rd column onwards. I currently only know this way:
A = matrix(rep(1:8, each = 5), nrow = 5) # just generate some example matrix...
A[,3:ncol(A)] # get submatrix from 3rd column onwards
But do I really need to write ncol(A)? Isn't there an elegant way to say "from the 3rd column onwards"? Something like A[,3:]? (or A[,3:...])?
Sometimes it's easier to tell R what you don't want. In other words, exclude columns from the matrix using negative indexing:
Here are two alternative ways that both produce the same results:
A[, -(1:2)]
A[, -seq_len(2)]
Results:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
But to answer your question as asked: Use ncol to find the number of columns. (Similarly there is nrow to find the number of rows.)
A[, 3:ncol(A)]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
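One caveat with 3:ncol(A) is that the colon operator counts backwards when the start exceeds the number of columns. A logical index avoids that. This is a sketch, with from and cols as illustrative names, and drop = FALSE added so that the result stays a matrix even when only one column (or none) remains:

```r
A <- matrix(rep(1:8, each = 5), nrow = 5)

from <- 3
cols <- seq_len(ncol(A))
sub <- A[, cols >= from, drop = FALSE]         # columns 3 to 8

# Start beyond the last column: an empty selection instead of an error
none <- A[, cols >= 9, drop = FALSE]
dim(none)                                      # 5 rows, 0 columns

9:ncol(A)                                      # counts down (9, 8), which is why A[, 9:ncol(A)] fails
```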
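For rows (rather than columns, as in your example), head() and tail() can be used. A negative n tells tail() to drop the first |n| rows:
A <- matrix(rep(1:8, each = 5), nrow = 5)
tail(A, -2)
is almost the same as
A[3:dim(A)[1],]
(only the printed row names/indices differ). A positive n, as in tail(A, 3), returns the last 3 rows instead, which matches here only because A has 5 rows.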
Those work for vectors and data frames too:
> tail(1:10, 4)
[1] 7 8 9 10
> tail(data.frame(A = 1:5, B = 1:5), 3)
A B
3 3 3
4 4 4
5 5 5
For the column versions, you could adapt tail(), but it is a bit trickier. I wonder if NROW() and NCOL() might be useful here, rather than dim()?:
> A[, 3:NCOL(A)]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
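One way to adapt tail() to columns, sketched here as a suggestion rather than an established idiom, is to transpose, drop the leading rows with a negative n, and transpose back. unname() strips the row labels that tail() may attach to a matrix:

```r
A <- matrix(rep(1:8, each = 5), nrow = 5)

# Drop the first two columns by working on the transpose
from3 <- unname(t(tail(t(A), -2)))

identical(from3, A[, 3:NCOL(A)])   # same submatrix as the direct index
```

The double transpose copies the matrix twice, so for large matrices the direct index is the cheaper choice.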
Or flip this on its head and instead of asking R for things, ask it to drop things instead. Here is a function that encapsulates this:
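give <- function(x, i, dimen = 1L) {
  ## indices to drop; for i = 1 there is nothing to drop, and x[-integer(0), ]
  ## would return zero rows, so return x unchanged
  ind <- seq_len(i - 1)
  if (length(ind) == 0L) return(x)
  if (dimen == 1L) {          ## rows
    out <- x[-ind, , drop = FALSE]
  } else if (dimen == 2L) {   ## cols
    out <- x[, -ind, drop = FALSE]
  } else {
    stop("Only for 2d objects")
  }
  out
}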
> give(A, 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 2 3 4 5 6 7 8
[2,] 1 2 3 4 5 6 7 8
[3,] 1 2 3 4 5 6 7 8
> give(A, 3, dimen = 2)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
You can use the following instruction:
A[, 3:length(A[1, ])]   # length(A[1, ]) is the number of columns; length(A[, 1]) would be the number of rows
A more readable dplyr approach to the same thing:
library(dplyr)
# assumes as_tibble() names the columns V1, V2, ... for a matrix without column names
A %>% as_tibble() %>%
  select(-c(V1,V2))
A %>% as_tibble() %>%
  select(V3:ncol(A))