Random Vectors Regardless of Order in R

What I am looking to do is generate n vectors of length k such that each element is a random integer and no vector is repeated, regardless of the order of the integers within each vector. For example, the result should not contain both 1,2,3 and 2,1,3.
So far what I have is the following to generate one vector, and I was planning on looping through this code to generate the n vectors.
vector <- sample(1:20000, k)
One idea that I had would be to sort all vectors in ascending order and then remove duplicate vectors. Is there an easier/more efficient way?
Thanks!

One approach would be to generate all the possible combinations, then choose from that set. This would guarantee no duplicates:
> tmp <- combn(100, 3)
> dim(tmp)
[1] 3 161700
> tmp[ , sample( ncol(tmp), 10 ) ]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 34 35 28 3 3 29 8 24 50 53
[2,] 54 54 63 57 17 86 31 30 52 81
[3,] 97 79 87 92 53 94 90 83 87 97
Of course, this will only work if the total number of combinations fits in memory, though there are ways to iterate through the combinations and keep only some, or to generate a sample of integers and then convert them to the corresponding combinations.
If the number of samples you want is small relative to the number of possible combinations, then the rejection method will probably be more efficient. But the approach above works better when the number of samples is large relative to the number of combinations, since rejection would then produce a large number of duplicates.
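If you do go the rejection route instead, a minimal sketch of the sort-and-deduplicate idea from the question might look like this (assuming n is small relative to the number of possible combinations, so duplicates are rare; n and k below are just placeholder values):
n <- 10
k <- 3
vecs <- matrix(NA_integer_, nrow = 0, ncol = k)
while (nrow(vecs) < n) {
  cand <- sort(sample(1:20000, k))    # sort so that order does not matter
  vecs <- unique(rbind(vecs, cand))   # keep the candidate only if it is new
}
vecs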

Related

Single for loop [duplicate]

I would like to create a single for loop instead of the nested for loops for this question.
Required: fill the matrix foo with the products of the elements of loopvec1 and loopvec2, using a single for loop.
So far I am only able to solve it with nested for loops. Here is my code:
loopvec1 <- 5:7
loopvec2 <- 9:6
foo <- matrix(NA, length(loopvec1), length(loopvec2))
for (i in 1:length(loopvec1)) {
  for (j in 1:length(loopvec2)) {
    foo[i, j] <- loopvec1[i] * loopvec2[j]
  }
}
foo
Output (foo):
[,1] [,2] [,3] [,4]
[1,] 45 40 35 30
[2,] 54 48 42 36
[3,] 63 56 49 42
How can I achieve the same result using just a single for loop?
Edit: I am aware that the function outer produces the same result (which is why this question was marked as a duplicate); however, the exercise I am stuck on requires a single for loop rather than any other function.
try this:
loopvec1 <- 5:7
loopvec2 <- 9:6
foo <- matrix(NA, length(loopvec1), length(loopvec2))
for (i in 1:length(loopvec1)) {
  foo[i, ] <- loopvec1[i] * loopvec2
}
foo
results:
> foo
[,1] [,2] [,3] [,4]
[1,] 45 40 35 30
[2,] 54 48 42 36
[3,] 63 56 49 42
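For completeness, another way to stay within a single loop is to run one index over all cells of the matrix and recover the row and column from it. This is only a sketch of that idea, not part of the answer above:
loopvec1 <- 5:7
loopvec2 <- 9:6
foo <- matrix(NA, length(loopvec1), length(loopvec2))
for (n in seq_along(foo)) {
  i <- (n - 1) %% nrow(foo) + 1    # row index (matrices are stored column-major)
  j <- (n - 1) %/% nrow(foo) + 1   # column index
  foo[n] <- loopvec1[i] * loopvec2[j]
}
foo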

Is there a faster way to run sapply that is nested in two for loops?

I have a big data frame with more than 1 million rows representing time series data for several individuals (with each individual's data in different columns).
In addition, I have a 3D array that contains encounter frame numbers, indicating from which frames of my time series I want to extract data.
For a given individual and encounter type, I want to extract one time series of e.g. 100 frames. However, as I have many replicates for each encounter type and each individual, I want to directly calculate the average time series per individual and encounter type.
I managed to do so using sapply embedded in two for loops. However, running these for loops is very slow, and I now wonder whether there is a faster way of implementing this calculation in R, or whether I should rather do it in C++. Below are my code and a small excerpt of my data:
nb_ind = 3;
response_duration = 100;
nb_meeting_types = 2;
nb_variables = 2;
speed_offset = 2;
MEETING_START_OFFSET = 50;
replicate = 20;
# behavior_data is a data frame with columns: frame,speed1,head1,speed2,head2,speed3,head3
# there are about 1 million rows
dim(behavior_data)
[1] 1080000 7
head(behavior_data)
frame speed1 head1 speed2 head2 speed3 head3
1 0 0 25 2.4 179 1.1 16
2 1 1.5 20 2.0 -175 1.6 27
3 2 1.6 28 2.0 -178 1.0 37
4 3 0.8 56 1.6 170 0.8 37
5 4 0.3 56 1.8 162 0 40
# encounters is an array with frame numbers of dimension [nb_ind,replicate,nb_meeting_types]
# these frame numbers correspond to the starting points of meetings, for which I want to calculate the speed
dim(encounters)
[1] 3 20 2
head(encounters[,,1])
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] 12049 17693 23350 29018 34666 40327 68608 57293 74264 45980 113864 79922 119522 102552 51636 153462 91235 142151 159121 62948
[2,] 12036 17694 23352 29014 34674 40322 68606 57296 74268 45982 113865 79929 119521 102558 51639 153463 91242 142161 159168 62952
[3,] 12037 17694 23351 29011 34669 40329 68606 57298 74263 45985 NA 79921 NA 102550 51641 NA 91234 NA NA 62950
all_average_speeds = array(NaN, c(nb_ind, response_duration, nb_meeting_types))
for (j in 1:nb_ind) {
  # calculate the average speed response for each meeting type for a given individual
  average_speed = numeric(0)
  for (i in 1:nb_meeting_types) {
    # calculate the average speed response across all replicates of a given meeting type for a given individual
    average_speed_type = sapply(1:response_duration, function(k) {
      mean(behavior_data[, (j - 1) * nb_variables + speed_offset][
        which(behavior_data$frame %in% ((encounters[j, , i] + k - 1) - MEETING_START_OFFSET))],
        na.rm = TRUE)
    })
    average_speed = rbind(average_speed, t(average_speed_type))
  }
  # rows of average_speed are meeting types, so transpose to match the
  # [response_duration, nb_meeting_types] slice
  all_average_speeds[j, , ] = t(average_speed)
}
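No answer is reproduced here, so the following is only a hedged sketch of one common way to speed this up, not the original poster's solution. It assumes behavior_data$frame is sorted and contains each frame exactly once (so row positions can be looked up once with match()), and that every encounter window of response_duration frames lies inside the data:
for (j in 1:nb_ind) {
  speed_col <- behavior_data[[(j - 1) * nb_variables + speed_offset]]
  for (i in 1:nb_meeting_types) {
    # look up the row of each replicate's starting frame once
    start_rows <- match(encounters[j, , i] - MEETING_START_OFFSET, behavior_data$frame)
    start_rows <- start_rows[!is.na(start_rows)]
    # one row index per (replicate, offset) pair, for all 100 offsets at once
    idx <- outer(start_rows, 0:(response_duration - 1), "+")
    all_average_speeds[j, , i] <- colMeans(matrix(speed_col[idx], nrow = length(start_rows)),
                                           na.rm = TRUE)
  }
}
This avoids scanning the whole data frame 100 times per individual and meeting type, which is usually where the time goes.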

R: Split one matrix into many matrices

How can I split a matrix into many submatrices? Every submatrix should contain a selection of rows of the initial matrix.
The initial matrix is imported from a .csv file:
seri <- read.table("/PATH/seriation_testdaten.csv", sep="\t", head=TRUE, row.names=1)
This matrix seri contains numeric values, like for example the matrix seritest:
seritest <- matrix(1:100,10)
Now I would like to divide the rows of this matrix into groups. For example, I would like groups of three rows each: one group should contain rows 1, 2 and 3, the next one rows 4, 5 and 6, and so on until nrow(seri) is reached. It is no problem if the last group contains fewer than three rows.
Matrix 1:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 11 21 31 41 51 61 71 81 91
[2,] 2 12 22 32 42 52 62 72 82 92
[3,] 3 13 23 33 43 53 63 73 83 93
Matrix 2:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[4,] 4 14 24 34 44 54 64 74 84 94
[5,] 5 15 25 35 45 55 65 75 85 95
[6,] 6 16 26 36 46 56 66 76 86 96
Matrix 3:
...
That's the first step. But I would like to go further. In this example the resulting submatrices have groups of three rows, but I also want the submatrices for group sizes of 1 row, 2 rows, 4 rows, 5 rows and so on, up to nrow(seri) rows. So basically hundreds of submatrices, each belonging to a unit defined by the chosen group size.
Thanks to the help of @farnsy I was able to produce working code, which does exactly what I want:
seritest <- matrix(1:10000,100)
a = nrow(seritest)
e = 1:nrow(seritest)
seri_sub <- list()
U <- 1
while (U <= a) {
  Q <- 0
  AQ <- 0
  EQ <- 0
  Uk <- U * 1000
  repeat {
    EQ <- (e[U] * Q) + 1
    Q <- Q + 1
    AQ <- e[U] * Q
    if (AQ > a) break
    seri_sub[[Uk + Q]] <- seritest[EQ:AQ, ]
  }
  U <- U + 1
}
I can access the matrices by calling, for example, seri_sub[[3002]]. The leading 3 indicates the unit built from groups of three rows, and the trailing 2 indicates the second group of that unit (rows 4, 5 and 6).
As already mentioned by @Dason, this code is inefficient: many list elements are empty (NULL). Maybe somebody has an idea how to improve it.
This code runs without any further packages. R version 3.0.2 (2013-09-25). OS: Ubuntu 14.04 (64bit).
(I'm not a native speaker - please excuse the lack of eloquence)
seri doesn't seem like a list here. Neither do A or B, actually. Are you sure you are talking about lists at all? It looks more like you want to subset a matrix a bunch of times, creating submatrices. I can't imagine you actually want "random" names, either. That's crazy talk.
If you want to break up a matrix, why not store all the resulting matrices in an actual list?
myList <- list()
myList[[1]] <- seri[a:b, ]   # a:b and c:d stand for whatever row ranges define each group
myList[[2]] <- seri[c:d, ]
You can see how it would be pretty easy to put this in a loop. Now myList is a list of matrices and, for example, myList[[i]] would be the i-th matrix. If you want the entry in the second row and third column, it would be myList[[i]][2, 3].
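A hedged sketch of what that loop could look like for a single group size (assuming the goal is simply a list with one submatrix per consecutive block of rows):
seritest <- matrix(1:10000, 100)
group_size <- 3
starts <- seq(1, nrow(seritest), by = group_size)
seri_sub <- lapply(starts, function(s) {
  seritest[s:min(s + group_size - 1, nrow(seritest)), , drop = FALSE]
})
seri_sub[[2]]   # rows 4, 5 and 6
Repeating this for each group_size from 1 to nrow(seritest) gives one such list per group size, without the empty NULL slots of the original indexing scheme.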

Filtering permutations to avoid running out of memory

The context of this problem is asset allocation. If I have N assets and can allocate them in 5% chunks, what are all the permutations whose allocations sum to exactly 100%?
For example, if I had 2 assets there would be 21 (created using my function fMakeAllocationsWeb(2); code at the bottom of this post):
[,1] [,2]
[1,] 0 100
[2,] 5 95
[3,] 10 90
[4,] 15 85
[5,] 20 80
[6,] 25 75
[7,] 30 70
[8,] 35 65
[9,] 40 60
[10,] 45 55
[11,] 50 50
[12,] 55 45
[13,] 60 40
[14,] 65 35
[15,] 70 30
[16,] 75 25
[17,] 80 20
[18,] 85 15
[19,] 90 10
[20,] 95 5
[21,] 100 0
The problem of course comes when the number of assets increases, even modestly. This is understandable: with repetition, the number of permutations is 21^(number of assets), i.e. the 21 possible allocation values raised to the number of assets. With 20 assets that is 21^20, roughly 2.8E+26, and I am not able to fit the intermediate step of creating all permutations in memory.
I would like to be able to filter these on the fly (sum == 100) so as not to run into the memory allocation issue. Digging into the code beneath gtools::permutations, it seems to be vectorised, and intervening there to filter seems impossible.
Would gratefully welcome any thoughts - ideally would prefer to stick with R code and packages.
Many thanks
Russ
installifMissing <- function(sPackageName) {
  if (!sPackageName %in% installed.packages()) install.packages(sPackageName)
}
fMakeAllocationsWeb <- function(iNumAssets = 10, iIncrement = 5) {
  installifMissing("gtools")
  require(gtools)
  iAlloc <- seq(0, 100, by = iIncrement)  # the allocation increments, e.g. 0, 5, 10, ..., 95, 100
  # generate permutations with repetition
  permut <- permutations(n = length(iAlloc), r = iNumAssets, v = iAlloc, repeats.allowed = TRUE)
  # filter permutations for those which sum to exactly 100
  permutSum <- apply(permut, MARGIN = 1, FUN = sum)
  permut100 <- permut[which(permutSum == 100), ]
  return(permut100)
}
If you install the partitions package, you have the restrictedparts function, which enumerates all the ways you can add n numbers together to get a sum S. In your case you want the summands to be multiples of 5, with the restriction that they add up to S = 100. So divide the summands by 5 and have the total add up to 20. If you want 2 assets, the code restrictedparts(100/5, 2) * 5 will give you the 11 unordered pairs.
You can then loop through the columns and enumerate, for each, the set of all permutations of asset allocations. You'll have to deal carefully with the case where there are repeated elements - for example, we generate {100,0}, which represents <100,0> and <0,100>, whereas {50,50} only represents the single allocation <50,50>. You can deal with this by using the set argument of permutations.
restrictedparts(100/5,20) * 5 gives 627 partitions that add up to 100% - and you'll need to permute each of these to get your full list of allocations.
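A minimal sketch of that first step, assuming the partitions package is installed (shown for the 2-asset case):
library(partitions)
parts <- restrictedparts(100 / 5, 2) * 5   # each column is one unordered allocation summing to 100
parts
ncol(parts)   # 11 unordered pairs: {100,0}, {95,5}, ..., {50,50}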
In your problem, you will still have a large number of combinations to deal with even after filtering.
Your problem essentially boils down to an n multichoose k problem (combinations with repetition).
You want to choose k = 20 slots of 5% weight each to allocate among the n assets.
So in your example case of 20 assets, your number of combinations would still be
choose(39, 20)
## [1] 68923264410
I suggest you have a look at the DEoptim package, which has specific examples directly related to your problem at hand. It uses differential evolution.

Perform 'cross product' of two vectors, but with addition

I am trying to use R to perform an operation (ideally with similarly displayed output) such as
> x<-1:6
> y<-1:6
> x%o%y
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 2 4 6 8 10 12
[3,] 3 6 9 12 15 18
[4,] 4 8 12 16 20 24
[5,] 5 10 15 20 25 30
[6,] 6 12 18 24 30 36
where each entry is found through addition rather than multiplication.
I would also be interested in creating the 36 ordered pairs (1,1), (1,2), etc.
Furthermore, I want to use another vector like
z <- 1:4
to create all the ordered triplets possible between x, y, and z.
I am using R to look into the likelihoods of possible totals when rolling dice of varying sizes.
Thank you for all your help! This site has been a big help to me. I appreciate anyone that takes the time to answer a stranger's question.
UPDATE: I found that outer(x, y, '+') does what I wanted for the first part. But I still don't know how to create the ordered pairs or ordered triplets.
Your first question is easily handled by outer:
outer(1:6,1:6,"+")
For the others, I suggest you try expand.grid, although there are specialized combination and permutation functions out there as well if you do a little searching.
expand.grid can answer your second question:
expand.grid(1:6,1:6)
expand.grid(1:6,1:6,1:4)
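Since the stated goal is dice totals, a hedged sketch of how this fits that purpose (assuming, for illustration, two six-sided dice and one four-sided die):
rolls <- expand.grid(die1 = 1:6, die2 = 1:6, die3 = 1:4)   # all ordered triplets
totals <- rowSums(rolls)
table(totals) / nrow(rolls)   # probability of each possible total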

Resources