I will explain my problem in two different ways; please pick the one you understand better. If something is unclear, please tell me in the comments.
Version 1
To try to make it clearer, I will explain the same problem from another point of view:
I have a binary matrix with 5 columns and 3 rows:
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    0    0
[2,]    0    0    1    1    0
[3,]    0    0    0    1    1
I am interested in (almost) all the versions of this matrix obtained by permuting only the rows. There is one constraint: I only want the permutations where each column has at least one 1. I have already calculated that there are 370 ways of permuting the rows while maintaining at least one 1 in each column. My question is: is there a way to obtain these matrices directly, without having to run through all 1000 possible permutations?
Version 2
I would like to calculate some probabilities, for which I have 3 tables of elements, each with 10 columns, which need to be combined.
Table 1: combn(5,3) -> 10 combinations
> combn(5,3)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    1    1    2    2    2     3
[2,]    2    2    2    3    3    4    3    3    4     4
[3,]    3    4    5    4    5    5    4    5    5     5
Table 2 & 3: combn(5,2) -> 10 combinations
> combn(5,2)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    2    2    2    3    3    4
[2,]    2    3    4    5    3    4    5    4    5    5
Now I need all the combinations of the created groups, but only those that contain every possible element (1, 2, 3, 4, 5) at least once. There are 1000 possible combinations, of which I am only interested in 370.
The only solution I can think of is to run through all the possible combinations (using expand.grid()) and check whether all elements are present. I would hope there is an easier solution. While this example doesn't pose much trouble to calculate, as soon as the numbers grow I could save a lot of computation time if I could directly obtain only the combinations I am interested in.
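For reference, here is a minimal sketch of the brute-force approach described above (the names t1, t2, t3, idx, and keep are mine; the check uses the three tables from Version 2):

# Brute force: enumerate all 10 x 10 x 10 column choices, then keep
# only those whose union covers all five elements.
t1 <- combn(5, 3)  # Table 1
t2 <- combn(5, 2)  # Table 2
t3 <- combn(5, 2)  # Table 3
idx <- expand.grid(i = seq_len(ncol(t1)),
                   j = seq_len(ncol(t2)),
                   k = seq_len(ncol(t3)))
keep <- apply(idx, 1, function(r)
  length(unique(c(t1[, r[1]], t2[, r[2]], t3[, r[3]]))) == 5)
sum(keep)  # should give the 370 valid combinations mentioned above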
Related
I have this vector
b <- c(5, 8, 9)
I want to perform a combination on b, selecting 2 items at a time, such that I have the original elements of b as my first row, to get:
     [,1] [,2] [,3]
[1,]    5    8    9
[2,]    8    9    5
I tried combn(b, 2) and it gives me this
     [,1] [,2] [,3]
[1,]    5    5    8
[2,]    8    9    9
Can I get help to achieve my desired result?
Since the second row of your desired result is not uniquely defined, there is no need for any sophisticated tools:
b <- 1:10
rbind(b, c(b[-1], b[1]))
#   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# b    1    2    3    4    5    6    7    8    9   10
#      2    3    4    5    6    7    8    9   10    1
In this case I only "shift" b by one position in the second row, which indeed results in a permutation. I'm assuming that the elements of b don't repeat.
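Applied to the original vector from the question, the same one-position shift reproduces the desired result:

b <- c(5, 8, 9)
rbind(b, c(b[-1], b[1]))
#   [,1] [,2] [,3]
# b    5    8    9
#      8    9    5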
I have a numeric matrix with dimensions 10000 x 50. I want to find the indices of the top 5 elements in every row, in the order of their values. For example, a sample might look like:
set.seed(2)
v1 <- matrix(runif(20, 0, 20), 2, 10)
v1
#           [,1]      [,2]     [,3]     [,4]     [,5]      [,6]      [,7]      [,8]      [,9]    [,10]
# [1,]  3.697645 11.466527 18.87679  2.58318  9.36037 11.053481 15.210266  8.105644 19.527970 8.896185
# [2,] 14.047481  3.361038 18.86950 16.66898 10.99967  4.777895  3.616402 17.070969  4.516509 1.499588
Then I want the output to look like :
# [1,]    9    3    7    2    6
# [2,]    3    8    4    1    5
I could only find this question, which explains how to find the top n elements, but not in the order of their values.
apply() is perfect for row-wise operations on matrices. You could do
t(apply(v1, 1, function(x) order(-x)[1:5]))
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    9    3    7    2    6
# [2,]    3    8    4    1    5
This runs order() on each row of the matrix v1 and takes the first five indices for each row; the result is transposed because apply() returns each row's result as a column.
This can also be done with data.table: after melting into 'long' format and grouping by 'Var1', we take the order of 'value'.
library(reshape2)
library(data.table)
setDT(melt(v1))[, head(order(-value),5), Var1]
#     Var1 V1
#  1:    1  9
#  2:    1  3
#  3:    1  7
#  4:    1  2
#  5:    1  6
#  6:    2  3
#  7:    2  8
#  8:    2  4
#  9:    2  1
# 10:    2  5
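If the wide layout from the question is needed, the long result can be rearranged into a matrix afterwards (a small addition, not part of the original answer):

res <- setDT(melt(v1))[, head(order(-value), 5), Var1]
matrix(res$V1, nrow = nrow(v1), byrow = TRUE)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    9    3    7    2    6
# [2,]    3    8    4    1    5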
Or using base R
ave(-v1, row(v1), FUN = order)[,1:5]
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    9    3    7    2    6
# [2,]    3    8    4    1    5
I want to find all possible permutations for large n using R. At the moment I am using permutations(n,n) from the gtools package, but for n>10 it is almost impossible: I get memory crashes due to the large number of permutations (n!). I do not want to sample, as I need the exact distribution of a particular statistic. Is there any way I can do this faster, or break the problem into smaller blocks?
Your goal is very likely to be impractical (how large is "large n"? Even if you can generate an enormous number of permutations, how long will it take to summarize over them? And how much difference in accuracy will there be between an exhaustive computation on a billion elements and a random sample of ten million of them?). However:
The iterpc package can enumerate permutations in blocks. For example:
library("iterpc")
Set up an object ("iterator") to generate permutations of 10 objects:
I <- iterpc(10, labels = 1:10, ordered = TRUE)
Return the first 5 permutations:
getnext(I,5)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7    8    9    10
## [2,]    1    2    3    4    5    6    7    8   10     9
## [3,]    1    2    3    4    5    6    7    9    8    10
## [4,]    1    2    3    4    5    6    7    9   10     8
## [5,]    1    2    3    4    5    6    7   10    8     9
Return the next 5 permutations:
getnext(I,5)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7   10    9     8
## [2,]    1    2    3    4    5    6    8    7    9    10
## [3,]    1    2    3    4    5    6    8    7   10     9
## [4,]    1    2    3    4    5    6    8    9    7    10
## [5,]    1    2    3    4    5    6    8    9   10     7
Assuming you can compute your statistic one block at a time and then combine the results, this should be feasible. It doesn't look like you can parallelize very easily, though: there's no way to jump to a particular element of an iterator. The numperm function from the sna package provides "random" (i.e. non-sequential) access to permutations, although in a different ordering from that given by iterpc. I'm guessing that iterpc is much more efficient, so you may be better off crunching through blocks sequentially on a single node/core/machine rather than distributing the process.
Here are the first 5 permutations as given by sna::numperm:
library("sna")
t(sapply(1:5,numperm,olength=10))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    2    1    3    4    5    6    7    8    9    10
## [2,]    2    3    1    4    5    6    7    8    9    10
## [3,]    2    3    4    1    5    6    7    8    9    10
## [4,]    2    3    4    5    1    6    7    8    9    10
## [5,]    2    3    4    5    6    1    7    8    9    10
The guts of iterpc are written in C++, so it should be very efficient, but no matter what, things are going to get hard for larger values of n. To my surprise, iterpc can handle the full set of 10! = 3628800 permutations without much trouble:
system.time(g <- getall(I))
##    user  system elapsed
##   0.416   0.304   0.719
dim(g)
## [1] 3628800 10
However, I can't do any computations with n>10 in a single block on my machine (for n=11 I get "cannot allocate vector of size 1.6 Gb"; for n>11, "The length of the iterator is too large, try using getnext(I,d)").
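As a rough sketch of the block-wise idea (assuming getnext() returns NULL once the iterator is exhausted; the running sum of row means is only a placeholder for your real statistic):

library("iterpc")
I <- iterpc(10, labels = 1:10, ordered = TRUE)
block_size <- 100000
total <- 0
repeat {
  g <- getnext(I, block_size)        # next block of permutations
  if (is.null(g)) break              # iterator exhausted
  total <- total + sum(rowMeans(g))  # replace with your own statistic
}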
I am trying to optimize parts of a simulation script and have some of it run in parallel, which is how I discovered the foreach function.
As an example, I am trying to run the following in parallel:
Wiederholungen <- 4
A <- array(0, c(Wiederholungen, 10, 3))
B <- array(0, c(Wiederholungen, 10, 3))
C <- array(0, c(Wiederholungen, 10, 3))
Old Code:
for (m in 1:Wiederholungen) {
  A[m, , ] <- m
  B[m, , ] <- m
  C[m, , ] <- m
}
Resulting in (for A):
A
, , 1

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    1    1    1    1    1     1
[2,]    2    2    2    2    2    2    2    2    2     2
[3,]    3    3    3    3    3    3    3    3    3     3
[4,]    4    4    4    4    4    4    4    4    4     4

, , 2

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    1    1    1    1    1     1
[2,]    2    2    2    2    2    2    2    2    2     2
[3,]    3    3    3    3    3    3    3    3    3     3
[4,]    4    4    4    4    4    4    4    4    4     4

, , 3

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    1    1    1    1    1     1
[2,]    2    2    2    2    2    2    2    2    2     2
[3,]    3    3    3    3    3    3    3    3    3     3
[4,]    4    4    4    4    4    4    4    4    4     4
This puts the value of m in the m-th row of the arrays A, B, and C. In my script, m indexes the replicates (Wiederholungen) the script should run.
Foreach Code:
foreach(m = 1:Wiederholungen) %dopar% {
  A[m, , ] <- m
  B[m, , ] <- m
  C[m, , ] <- m
}
Resulting in:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
But this does not produce the above result for the arrays A, B, and C. I know that foreach has a .combine option, but it seems to work only if you return a single matrix, not several as in my example. How do I get the same result as with my old for loop, using a parallel processing approach? I am using Ubuntu and R 3.2.2.
I know this is a very old post, but thought an answer might help someone in the future.
foreach() returns a list of only the last object created in each iteration, so if you want A, B, and C returned, you need to list them as the final step of the loop.
This code works for me.
W <- foreach(m = 1:Wiederholungen) %dopar% {
  A[m, , ] <- m
  B[m, , ] <- m
  C[m, , ] <- m
  list(A, B, C)
}
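A small follow-up (my addition, not part of the original answer): since each worker only fills row m of its own copies, the full arrays can be reassembled from W afterwards, e.g. for A:

A <- array(0, c(Wiederholungen, 10, 3))
for (m in 1:Wiederholungen) {
  A[m, , ] <- W[[m]][[1]][m, , ]  # W[[m]][[1]] is worker m's copy of A
}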
For future reference, the foreach vignette can be very helpful with things like this.
I have two matrices:
x
A B C
2 3 4
3 4 5
and y
D E
1 2
3 2
How can I subtract each column of y from each column of x, over all column combinations? This would give the following result:
AD AE BD BE CD CE
 1  0  2  1  3  2
 0  1  1  2  2  3
I have tried applying outer(), but can't make it work with matrices. Would vectorizing a function be a solution? I tried the code below, but it doesn't seem to work.
fun <- function(a, b) a - b
vecfun <- Vectorize(fun)
outer(x, y, vecfun)
Thanks in advance for any advice.
This doesn't use outer, but gets your intended result:
> do.call(cbind,lapply(1:ncol(x),function(i) x[,i]-y))
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0    2    1    3    2
[2,]    0    1    1    2    2    3
Here's another way without loops/*apply family (assuming your matrices are x and y):
x[ , rep(seq_len(ncol(x)), each=ncol(y))] - y[, rep(seq_len(ncol(y)), ncol(x))]
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0    2    1    3    2
[2,]    0    1    1    2    2    3
I'm not sure whether it will be faster, but I thought it was an interesting approach. Note that this takes twice the memory of the resulting matrix during computation.
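As a small follow-up (my addition, assuming x and y carry the column names A, B, C and D, E shown in the question), outer() can at least be used to build the combined column labels of the result:

res <- x[, rep(seq_len(ncol(x)), each = ncol(y))] -
       y[, rep(seq_len(ncol(y)), ncol(x))]
colnames(res) <- as.vector(t(outer(colnames(x), colnames(y), paste0)))
res
#      AD AE BD BE CD CE
# [1,]  1  0  2  1  3  2
# [2,]  0  1  1  2  2  3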