I want to calculate all permutations of a blocked design suitable for a Friedman test. Consider the following example:
thedata <- data.frame(
score = c(replicate(4,sample(1:3))),
judge = rep(1:4,each=3),
wine = rep.int(1:3,4)
)
Four judges ranked 3 wines and now I want to calculate every possible permutation within the data for every judge. I expect to see 1,296 permutations, as also given by:
require(permute)
CTRL <- how(within=Within("free"),
plots=Plots(strata=factor(thedata$judge)),
complete=TRUE,maxperm=1e9)
numPerms(12,CTRL)
However, allPerms(12,control=CTRL) produces the following error:
Error in (function (..., deparse.level = 1) :
number of rows of matrices must match (see arg 2)
I tried using the blocks argument instead, but it simply returns a matrix that repeats, 4 times, the 6 possible permutations of 3 values:
CTRL <- how(within=Within("free"),
blocks=factor(thedata$judge),
complete=TRUE,maxperm=1e9)
allPerms(12,control=CTRL)
IMPORTANT NOTE:
I do have a custom function to obtain the result, using an adaptation of expand.grid() with permn() from the combinat package. I'm interested in where I misunderstand the permute package, not how I can calculate all these permutations myself.
The examples provided by @Joris identify two bugs in allPerms() that were not picked up by the current set of examples or unit tests (that'll be fixed soon too!).
The first issue is an obscure bug that I initially needed a bit of time to think through. I have now implemented a fix for this bug too: version 0.8-3 of permute happily handles the Plots version of @Joris' question:
R> p <- allPerms(12,control=CTRL)
R> dim(p)
[1] 1295 12
R> head(p)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 2 3 4 5 6 7 8 9 10 12 11
[2,] 1 2 3 4 5 6 7 8 9 11 10 12
[3,] 1 2 3 4 5 6 7 8 9 11 12 10
[4,] 1 2 3 4 5 6 7 8 9 12 10 11
[5,] 1 2 3 4 5 6 7 8 9 12 11 10
[6,] 1 2 3 4 5 6 7 9 8 10 11 12
R> packageVersion("permute")
[1] ‘0.8.3’
The second is an oversight. allPerms() generates permutation indices, but internally it works block by block. In the case @Joris reported, each block has 3 observations and hence 6 permutations of the indices 1:3. Once these permutation indices have been created, the code should have used them to index the row indices of the original data for each block. allPerms() was doing this for every conceivable combination of permutation types except the simple case of random permutation within blocks. r2838 fixes this issue.
allPerms() was also not replicating each within-block permutation matrix to match each combination of rows in the other within-block permutation matrices. This requires an operation like expand.grid() but on the within-block permutation matrices. r2839 fixes this particular issue.
allPerms() works this way because it does not expect the within-block samples to be located contiguously within the original data series.
This second bug was fixed via r2838 and r2839 in the SVN sources on R-Forge.
R> require(permute)
Loading required package: permute
R> CTRL <- how(within=Within("free"),
+ blocks=factor(thedata$judge),
+ complete=TRUE,maxperm=1e9,
+ observed = TRUE)
R> numPerms(12,CTRL)
[1] 1296
R> tmp <- allPerms(12,control=CTRL)
R> dim(tmp)
[1] 1296 12
R> head(tmp)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 2 3 4 5 6 7 8 9 10 11 12
[2,] 1 2 3 4 5 6 7 8 9 10 12 11
[3,] 1 2 3 4 5 6 7 8 9 11 10 12
[4,] 1 2 3 4 5 6 7 8 9 11 12 10
[5,] 1 2 3 4 5 6 7 8 9 12 10 11
[6,] 1 2 3 4 5 6 7 8 9 12 11 10
R> tail(tmp)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1291,] 3 2 1 6 5 4 9 8 7 10 11 12
[1292,] 3 2 1 6 5 4 9 8 7 10 12 11
[1293,] 3 2 1 6 5 4 9 8 7 11 10 12
[1294,] 3 2 1 6 5 4 9 8 7 11 12 10
[1295,] 3 2 1 6 5 4 9 8 7 12 10 11
[1296,] 3 2 1 6 5 4 9 8 7 12 11 10
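The two steps described above (mapping each block's permutations of 1:3 onto that block's original row indices, then an expand.grid()-style combination across blocks) can be illustrated with a rough base R sketch. This is not the code permute uses internally; the helper perms() and the object names are made up for the illustration, and the row ordering differs from allPerms() because expand.grid() varies the first block fastest.

```r
# Hypothetical helper: all permutations of a vector, built recursively (base R only).
perms <- function(v) {
  if (length(v) <= 1L) return(matrix(v, nrow = 1L))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], perms(v[-i]))))
}

judge  <- rep(1:4, each = 3)              # block membership, as in the question
rows   <- split(seq_along(judge), judge)  # original row indices of each block
within <- lapply(rows, perms)             # per block: 6 x 3 matrix of row indices

# expand.grid() over the row numbers of the within-block matrices:
# every combination of "which permutation is used in which block"
grid <- expand.grid(lapply(within, function(m) seq_len(nrow(m))))

out <- matrix(NA_integer_, nrow(grid), length(judge))
for (b in seq_along(rows)) {
  # write block b's permuted row indices into the columns that block occupies
  out[, rows[[b]]] <- within[[b]][grid[[b]], ]
}
dim(out)   # 6^4 = 1296 rows, 12 columns
```

Because the blocks are located with split(), the sketch does not rely on the samples of a block being contiguous in the data.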
I see that people use
which(matrix==max(matrix, na.rm=FALSE))
to show both the row and the column index.
But my question is: how do I extract the row index and the column index individually, and then store these two values in two separate variables?
For example, with matrix =
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 6 7 7 2 4 3 7 1 4
[2,] 1 9 8 7 2 6 10 9 5 2
[3,] 7 10 8 4 10 5 4 8 4 4
[4,] 4 3 1 1 3 3 9 7 4 2
[5,] 1 8 1 9 9 8 1 3 7 7
[6,] 2 6 7 5 6 10 4 6 15 1
The max value is matrix[6,9] = 15. How can I find row = 6 and column = 9 separately, and assign 6 to a variable A and 9 to a variable B?
Thank you very much.
For a large matrix which.max should be more efficient than which. So, for a matrix m, we can use
A = row(m)[d <- which.max(m)]
B = col(m)[d]
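The same idea can also be written with arrayInd(), which converts the linear (column-major) index returned by which.max() into a row/column pair. A small sketch on a made-up 2 x 3 matrix (the values are hypothetical, not the ones from the question):

```r
m <- matrix(c(3, 1, 7, 15, 2, 4), nrow = 2)  # small hypothetical matrix
d <- which.max(m)             # linear index of the first maximum
pos <- arrayInd(d, dim(m))    # 1 x 2 matrix: (row, column)
A <- pos[1, 1]                # row index of the maximum
B <- pos[1, 2]                # column index of the maximum
m[A, B] == max(m)             # sanity check
```

Like which.max(), this reports only the first maximum if there are ties.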
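Maybe a roundabout way, but if the matrix is called "mat" (subtracting 1 before dividing keeps a maximum in the last row from being assigned row 0 and the wrong column):
idx <- which(mat == max(mat))
colmax <- (idx - 1) %/% nrow(mat) + 1
rowmax <- (idx - 1) %% nrow(mat) + 1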
The command
matrix(sample.int(12, 9*12, TRUE), 9, 12)
generates a random integer matrix (9 rows and 12 columns) with values from 1 to 12. I wonder if there is a version of this code that generates a matrix in which each row is a random ordering of the values 1 to 12 (without repetition). I was able to find a "trivial" answer to this question; with
matrix(sample.int(12, 1*12), 9, 12, byrow=TRUE)
I obtain a matrix of this kind, but the rows are all equal to each other (the same row is repeated 9 times).
The replicate function (which repeats an operation like sample(12) a specified number of times) returns a matrix whose column major orientation can be flipped to your desired row orientation with t:
t( replicate(9, {sample(12)} ) )
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 9 11 5 3 4 6 2 8 10 12 7 1
[2,] 4 5 12 6 8 2 9 1 11 10 7 3
[3,] 9 8 10 12 2 6 3 7 4 1 11 5
[4,] 4 9 1 2 6 11 8 5 7 3 12 10
[5,] 1 2 4 5 11 6 3 8 10 9 12 7
[6,] 4 8 10 12 5 9 2 7 11 1 3 6
[7,] 5 7 8 4 1 6 10 11 2 3 12 9
[8,] 2 4 10 1 12 5 7 6 11 3 8 9
[9,] 2 7 9 11 8 1 12 10 6 5 3 4
The replicate function is used in a lot of simulation code.
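A quick way to convince yourself that each row really is a permutation of 1:12 is to sort every row and compare it with 1:12. A minimal sketch (the seed is arbitrary and only there for reproducibility):

```r
set.seed(42)                       # arbitrary seed
m <- t(replicate(9, sample(12)))   # 9 rows, each an independent shuffle of 1:12
dim(m)
# TRUE if every row contains each of 1..12 exactly once
all(apply(m, 1, function(r) identical(sort(r), 1:12)))
```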
I have a 6 x 10 matrix where I have to find the row index and column index of the maximum value in each row.
set.seed(75)
amat <- matrix( sample(10, size=60, replace=T), nrow=6)
which gives me the matrix:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 6 7 7 2 4 3 7 1 4
[2,] 1 9 8 7 2 6 10 9 5 2
[3,] 7 10 8 4 10 5 4 8 4 4
[4,] 4 3 1 1 3 3 9 7 4 2
[5,] 1 8 1 9 9 8 1 3 7 7
[6,] 2 6 7 5 6 10 4 6 10 1
Now, I want to navigate row by row, and get the row index and column index of the maximum value in each row.
To get the maximum value in each row, I did:
apply(amat,1,max)
[1] 7 10 10 9 9 10
How do I get the row and column indices of the first occurrence of the maximum value?
Thanks
We can use max.col
cbind(1:nrow(amat), max.col(amat, 'first'))
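To see what this returns, here is a small sketch on a made-up 3 x 3 matrix with ties (the values are hypothetical, chosen so the result does not depend on the random number generator). The two-column index matrix can also be used directly to pull out the maxima:

```r
amat <- matrix(c(3, 9, 9,
                 7, 2, 7,
                 1, 5, 4), nrow = 3, byrow = TRUE)

# one (row, col) pair per row; ties are resolved in favour of the first column
idx <- cbind(row = seq_len(nrow(amat)),
             col = max.col(amat, ties.method = "first"))
idx
amat[idx]   # matrix indexing: same values as apply(amat, 1, max)
```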
I want to find all possible permutations for large n using R. At the moment I am using permutations(n,n) from gtools package, but for n>10 it is almost impossible; I get memory crashes due to the large number of permutations (n!). I do not want to sample as I need to find the exact distribution for a particular statistic. Is there any way I can do this faster or that I can break it down into small blocks?
Your goal is very likely to be impractical (how large is "large n"??? even if you can generate an enormous number of permutations, how long is it going to take you to summarize over them? How much difference in accuracy is there going to be between an exhaustive computation on a billion elements and a random sample of ten million of them?). However:
The iterpc package can enumerate permutations in blocks. For example:
library("iterpc")
Set up an object ("iterator") to generate permutations of 10 objects:
I <- iterpc(10,labels=1:10,ordered=TRUE)
Return the first 5 permutations:
getnext(I,5)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 2 3 4 5 6 7 8 9 10
## [2,] 1 2 3 4 5 6 7 8 10 9
## [3,] 1 2 3 4 5 6 7 9 8 10
## [4,] 1 2 3 4 5 6 7 9 10 8
## [5,] 1 2 3 4 5 6 7 10 8 9
Return the next 5 permutations:
getnext(I,5)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 2 3 4 5 6 7 10 9 8
## [2,] 1 2 3 4 5 6 8 7 9 10
## [3,] 1 2 3 4 5 6 8 7 10 9
## [4,] 1 2 3 4 5 6 8 9 7 10
## [5,] 1 2 3 4 5 6 8 9 10 7
Assuming you can compute your statistic one block at a time and then combine the results, this should be feasible. It doesn't look like you can parallelize very easily, though: there's no way to jump to a particular element of an iterator ... The numperm function from the sna package provides "random" (i.e. non-sequential) access to permutations, although in a different ordering from those given by iterpc - but I'm guessing that iterpc is much more efficient, so you may be better off crunching through blocks sequentially on a single node/core/machine rather than distributing the process.
Here are the first 5 permutations as given by sna::numperm:
library("sna")
t(sapply(1:5,numperm,olength=10))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 2 1 3 4 5 6 7 8 9 10
## [2,] 2 3 1 4 5 6 7 8 9 10
## [3,] 2 3 4 1 5 6 7 8 9 10
## [4,] 2 3 4 5 1 6 7 8 9 10
## [5,] 2 3 4 5 6 1 7 8 9 10
The guts of iterpc are written in C++, so it should be very efficient, but no matter what, things are going to get hard for larger values of n. To my surprise, iterpc can handle the full set of 10! = 3628800 permutations without much trouble:
system.time(g <- getall(I))
## user system elapsed
## 0.416 0.304 0.719
dim(g)
## [1] 3628800 10
However, I can't do any computations with n>10 in a single block on my machine (n=11: "cannot allocate vector of size 1.6 Gb" ... n>11 "The length of the iterator is too large, try using getnext(I,d)")
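The "compute one block at a time, then combine" idea can be sketched without any package. The following is only an illustration in base R, and certainly much slower than iterpc: next_perm() is a hypothetical helper that steps through permutations in lexicographic order, and the statistic (the number of fixed points) is a stand-in for whatever statistic you actually need. Only one block of permutations is held in memory at a time; the exact distribution is accumulated in a tally.

```r
# Hypothetical helper: lexicographic successor of permutation p,
# or NULL if p is already the last permutation.
next_perm <- function(p) {
  n <- length(p)
  i <- n - 1L
  while (i >= 1L && p[i] >= p[i + 1L]) i <- i - 1L
  if (i < 1L) return(NULL)
  j <- n
  while (p[j] <= p[i]) j <- j - 1L
  p[c(i, j)] <- p[c(j, i)]                 # swap pivot with its successor
  p[(i + 1L):n] <- rev(p[(i + 1L):n])      # reverse the tail
  p
}

n <- 6
block_size <- 100L
tally <- integer(n + 1L)                   # counts for 0..n fixed points
p <- seq_len(n)
while (!is.null(p)) {
  block <- matrix(NA_integer_, block_size, n)
  k <- 0L
  while (k < block_size && !is.null(p)) {  # fill one block
    k <- k + 1L
    block[k, ] <- p
    p <- next_perm(p)
  }
  block <- block[seq_len(k), , drop = FALSE]
  s <- rowSums(block == col(block))        # statistic for each permutation in the block
  tally <- tally + tabulate(s + 1L, nbins = n + 1L)  # combine with earlier blocks
}
sum(tally) == factorial(n)                 # every permutation counted exactly once
tally / factorial(n)                       # exact distribution of the statistic
```

This works because a tally is additive across blocks; a statistic whose summary cannot be combined this way would need a different accumulation step.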
I found this while searching for a similar approach.
Selecting rows with same result in different columns in R
Is there a way to search within a range of columns? Playing off the example in the link: instead of catch[catch$tspp.name == catch$elasmo.name,], is it possible to do something like this?
catch[catch$tspp.name == c[23:56],], where R would search columns 23 to 56 for values that match the tspp value?
Thanks in advance and please let me know whether it's better to post an independent question on a topic related to a previous post or to insert a follow up question within the aforementioned post.
Here's one way to do it. This finds rows of X where the first column appears in columns 2 through 9.
> set.seed(1)
> X<-matrix(sample(10,100,T),10)
> X
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 3 10 5 9 5 10 4 5 3
[2,] 4 2 3 6 7 9 3 9 8 1
[3,] 6 7 7 5 8 5 5 4 4 7
[4,] 10 4 2 2 6 3 4 4 4 9
[5,] 3 8 3 9 6 1 7 5 8 8
[6,] 9 5 4 7 8 1 3 9 3 8
[7,] 10 8 1 8 1 4 5 9 8 5
[8,] 7 10 4 2 5 6 8 4 2 5
[9,] 7 4 9 8 8 7 1 8 3 9
[10,] 1 8 4 5 7 5 9 10 2 7
> X[rowSums(X[,1]==X[,2:9])>0,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 3 10 5 9 5 10 4 5 3
[2,] 3 8 3 9 6 1 7 5 8 8
[3,] 9 5 4 7 8 1 3 9 3 8
[4,] 7 4 9 8 8 7 1 8 3 9
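The comparison works because the length of X[,1] equals the number of rows, so it is recycled down every column of X[,2:9], and rowSums() then counts the matches in each row. The same pattern should carry over to a data frame with a key column and a range of columns to search. A minimal sketch with made-up column names and values (not the catch data from the question):

```r
# Hypothetical data frame: one key column, two columns to search
df <- data.frame(key = c("a", "b", "c"),
                 c1  = c("x", "b", "z"),
                 c2  = c("a", "y", "q"),
                 stringsAsFactors = FALSE)

cols <- 2:3   # the column range to search (23:56 in the question)
# the key vector is recycled down each column of the character matrix
hit <- rowSums(df$key == as.matrix(df[, cols])) > 0
df[hit, ]     # rows whose key appears somewhere in the searched columns
```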