Find all combinations of numbers that sum to a target - R

I wish to find the speediest way to find up to 1000 possible combinations of 'n' integers that sum to a target integer.
For example, say I wanted to sum to the number '20'. I want to find up to 1000 combinations of four integers that sum to this number. The integers can repeat themselves. I also have the condition that each integer must not be smaller than a particular number, in this case 4.
target <- 20  # the number I wish to sum to
lowest <- 4   # the smallest integer I allow
size <- 4     # the number of integers I wish to use to sum
maxposs <- target - ((size - 1) * lowest)  # given the lowest, this is the max possible integer. In my example it is 8.
This is how I have started to work this out. Using combn to find all combinations of the four chosen integers and then filtering by those that sum to my target.
m <- combn(rep(lowest:maxposs, size), size)
m1 <- m[, colSums(m) == target]
Here, 'm1' has 245 columns. There are only this many solutions. The last few columns:
#     [,238] [,239] [,240] [,241] [,242] [,243] [,244] [,245]
#[1,]      4      4      4      4      4      4      5      5
#[2,]      5      5      5      6      7      4      6      4
#[3,]      7      4      5      4      4      5      4      5
#[4,]      4      7      6      6      5      7      5      6
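As a point of comparison, here is a minimal sketch of a different enumeration using expand.grid (my addition, not part of the question): it lists ordered tuples rather than combinations, so its count is not directly comparable to m1, but it avoids feeding duplicated values into combn.
vals <- lowest:maxposs
g <- as.matrix(expand.grid(rep(list(vals), size)))  # all ordered size-tuples of allowed values
m_alt <- t(g[rowSums(g) == target, ])               # keep tuples that hit the target, one per column
ncol(m_alt)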
However, in my real application, I can be dealing with very high integers (summing up to 1000) and want to limit myself to a random sample of 1000 possible combinations. As this is for a randomization statistical test, speed is of the essence. I wonder if anyone knows of a faster way of doing this. My way doesn't feel intuitively quick.

my_matrix <- matrix(nrow = 1000, ncol = 4)
i <- 1
nn <- 1000
while (i <= 1000) {
  x <- sample(x = 4:nn, size = 3)  # draw three integers >= 4 (without replacement)
  y <- nn - sum(x)                 # the fourth integer is whatever remains
  if (y >= 4) {                    # keep the draw only if the remainder is also >= 4
    my_matrix[i, ] <- c(x, y)
    i <- i + 1
  }
}
Per Gavin's suggestion, this was redone with a preallocated matrix. It now runs in 0.158 seconds, twice as fast, and probably scales better.
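A further sketch (my own, assuming that sampling the three free integers with replacement is acceptable, which matches the statement that the integers can repeat): draw many candidate triples in a single vectorized call and filter, instead of testing one draw per iteration of the while loop.
nn <- 1000
n_draw <- 10000                                        # oversample, then keep the first 1000 valid rows
x <- matrix(sample(4:nn, 3 * n_draw, replace = TRUE), ncol = 3)
y <- nn - rowSums(x)                                   # the fourth integer is the remainder
keep <- which(y >= 4)
stopifnot(length(keep) >= 1000)                        # if this fails, redraw with a larger n_draw
my_matrix <- cbind(x, y)[keep[1:1000], ]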

Related

Looping through items on a list in R

This may be a simple question, but I'm fairly new to R.
What I want to do is perform some kind of addition on the indexes of a list, but once I get past the maximum index it should wrap back to the first value in that list and start over from there.
For example:
x <-2
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
data[x]
1
data[x+12]
1
data[x+13]
3
or something functionally equivalent. In the end I want to be able to do something like:
v=6
x=8
y=9
z=12
values <- c(v,x,y,z)
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
set <- c(data[values[1]],data[values[2]], data[values[3]],data[values[4]])
set
5 7 8 11
values <- values + 8
set
1 3 4 7
I've tried some things with addition and subtraction using the length of my list, but it does not work well for the lower numbers.
I hope this was a clear enough explanation.
Thanks in advance!
We don't need a loop here, as a vector can be indexed by another vector of length >= 1:
data[values]
#[1] 5 7 8 11
NOTE: Both of these objects are vectors, not lists.
If we need to reset the index
values <- values + 8
ifelse(values > length(data), values - length(data) - 1, values)
#[1] 1 3 4 7
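For a fully general wrap-around, a small sketch using modular arithmetic (the helper wrap_index is a name I made up, not from the answer above): shift to 0-based indices, take the remainder modulo the vector length, then shift back to 1-based.
wrap_index <- function(i, n) ((i - 1) %% n) + 1   # hypothetical helper: fold any index back into 1..n
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
values <- c(6, 8, 9, 12) + 8
data[wrap_index(values, length(data))]
#[1] 1 3 4 7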

R - detect and summarize changes in matrices

I have two sets of matrices. Each matrix is 100x100 in dimension and I have 240 of them (imagine each matrix was collected in a month and I have a dataset composed of 240 months of 100x100 matrices).
The values in the matrices range from 1 to 15, representing vegetation types (grass, tropical forest, tundra etc).
My first set of matrices, m1, is my control experiment. My second set of matrices, m2, is a climate change experiment where changes in climate induce changes in the values of the matrices.
Therefore, the data is represented like this:
m1: set of 240 100x100 matrices, each matrix corresponding to a month (therefore 240 months of data). This is my control data
m2: same as m1, but the values are different because of some changes in climate. This is my experimental data.
Here is some data:
# generate dataset 1
set.seed(4)
someData1 <- round(runif(100 * 100 * 240, min=1, max=15),digits=0)
# generate dataset2
set.seed(5)
someData2 <- round(runif(100 * 100 * 240, min=1, max=15),digits=0)
# create matrices
k = 240; n=100; m = 100
m1 <- array(someData1, c(n,m,k))
m2 <- array(someData2, c(n,m,k))
What I would like to do is compare each cell of m2 relative to m1 in this way:
is the value different? yes/no
if yes, what was the change? for example 1 to 10, or 2 to 7 and so on.
and do the same for all 240 matrices in m2 relative to all 240 matrices in m1.
By the end, I would like to be able to:
have a binary matrix showing whether or not there has been changes in the values;
have a table with the frequency of changes in each class (i.e. 1 to 10, 2 to 7 etc).
Conceptually, what I need to achieve is shown in a sketch I made (not reproduced here), where for simplicity's sake I drew 5x5 matrices instead of 100x100 matrices.
How to achieve this in R?
To compare two matrices, use == or !=.
what.changed <- m1 != m2  # TRUE if changed, FALSE if not
changes <- ifelse(what.changed, paste(m1, 'to', m2), NA)
changes  # for your little matrices, not the 100x100
     [,1]     [,2]      [,3]
[1,] NA       "7 to 10" "6 to 7"
[2,] NA       NA        NA
[3,] "3 to 4" "6 to 8"  NA
Your matrices seem rather large, so I'm not sure whether some sort of sparse matrix approach might be better. Regarding storing the changes as strings ("3 to 4"), perhaps you could store only the cells where there is in fact a change, rather than creating such a large matrix where most of the elements are NA. Or perhaps you could create a CSV/data frame summarising your changes, e.g. (using your 100x100x240 matrices to demonstrate the three coordinates):
# find coordinates of changes
change.coords <- which(m1 != m2, arr.ind=T)
colnames(change.coords) <- c('x', 'y', 'time') # whatever makes sense to your application
changes <- data.frame(change.coords, old=m1[change.coords], new=m2[change.coords])
head(changes)
  x y time old new
1 1 1    1   9   4
2 2 1    1   1  11
3 3 1    1   5  14
4 5 1    1  12   2
5 6 1    1   5  11
6 7 1    1  11   8
Then you can print it out as you wish without having to store heaps of strings ("X to Y") and NAs, e.g. (don't do this with your big example matrices; there are waaay too many changes and it will print them /all/):
with(changes, message(sprintf("Coords (%i, %i, %i): %i to %i\n",
                              x, y, time, old, new)))
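For the second deliverable, the frequency of each change class, one possible sketch (my addition, building on the changes data frame above) is to cross-tabulate the old and new values:
change.freq <- as.data.frame(table(old = changes$old, new = changes$new))  # counts per old -> new pair
change.freq <- change.freq[change.freq$Freq > 0, ]                         # drop unobserved pairs
head(change.freq[order(-change.freq$Freq), ])                              # most frequent changes first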

Create vector by given distribution of values

Let's say I have a vector a = (1,3,4).
I want to create a new vector with the integers in the range [1, length(a)], where the i-th number appears a[i] times.
For the vector a I want to get:
(1,2,2,2,3,3,3,3)
Could you explain how to implement this operation without several messy concatenations?
You can try rep
rep(seq_along(a), a)
#[1] 1 2 2 2 3 3 3 3
data
a <- c(1,3,4)
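As a side note (my addition, not part of the answer), the same result can be read as inverting a run-length encoding, since a holds the run lengths and seq_along(a) the run values:
a <- c(1, 3, 4)
inverse.rle(list(lengths = a, values = seq_along(a)))
#[1] 1 2 2 2 3 3 3 3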

finding set of multinomial combinations

Let's say I have a vector of integers 1:6
w=1:6
I am attempting to obtain a matrix of 90 rows and 6 columns that contains the multinomial combinations from these 6 integers taken as 3 groups of size 2.
6!/(2!*2!*2!)=90
So, columns 1 and 2 of the matrix would represent group 1, columns 3 and 4 would represent group 2 and columns 5 and 6 would represent group 3. Something like:
1 2 3 4 5 6
1 2 3 5 4 6
1 2 3 6 4 5
1 2 4 5 3 6
1 2 4 6 3 5
...
Ultimately, I would want to expand this to other multinomial combinations of limited size (because the numbers get large rather quickly) but I am having trouble getting things to work. I've found several functions that do binomial combinations (only 2 groups) but I could not locate any functions that do this when the number of groups is greater than 2.
I've tried two approaches to this:
Building up the matrix from nothing using for loops, and attempting things with the reshape package (thinking there might be something there for this with melt())
Working backwards from the permutation matrix (720 rows) by attempting to retain unique rows within groups and/or removing duplicated rows within groups
Neither worked for me.
The permutation matrix can be obtained with
library(gtools)
dat=permutations(6, 6, set=TRUE, repeats.allowed=FALSE)
I think working backwards from the full permutation matrix is a bit excessive, but I'm trying anything at this point.
Is there a package with a prebuilt function for this? Does anyone have any ideas on how I should proceed?
Here is how you can implement your "working backwards" approach:
gps <- list(1:2, 3:4, 5:6)                          # column indices of the three groups
get.col <- function(x, j) x[, j]
is.ordered <- function(x) !colSums(diff(t(x)) < 0)  # TRUE for rows whose group columns are non-decreasing
is.valid <- Reduce(`&`, Map(is.ordered, Map(get.col, list(dat), gps)))
dat <- dat[is.valid, ]
nrow(dat)
# [1] 90
Keeping only the rows whose columns are non-decreasing within every group collapses the 2! orderings inside each group, taking the 720 permutations down to 720 / 2^3 = 90.
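Alternatively, a small sketch of building the 90 rows directly with nested combn calls, without enumerating all 720 permutations first (my own construction; the names g1s and res are made up, and w is taken from the question):
g1s <- combn(w, 2)                                   # every possible first group
res <- do.call(rbind, lapply(seq_len(ncol(g1s)), function(i) {
  g1   <- g1s[, i]
  rest <- setdiff(w, g1)
  # for each choice of group 2 from the remaining four values, group 3 is whatever is left
  t(apply(combn(rest, 2), 2, function(g2) c(g1, g2, setdiff(rest, g2))))
}))
nrow(res)
# [1] 90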

Append rle result from loop

I am running a coin-toss simulation with a loop which runs about 1 million times.
Each time through the loop I wish to retain the table output from the rle command. Unfortunately a simple append does not seem to be appropriate: each time I run the loop I get a slightly different amount of data, which seems to be one of the sticking points.
This code gives an idea of what I am doing:
N <- 5  # Number of times to run
rlex <- NULL
# begin loop #############################
for (i in 1:N) {  # tells R to repeat N times
  x <- sample(0:1, 100000, 1/2)
  rlex <- append(rlex, rle(x))
}
table(rlex)    # doesn't work
table(rle(x))  # only 1 (the last iteration)
So instead of having five separate rle results (in this simulation, 1 million in the full version), I want one merged rle table. Hope this is clear. Obviously my actual code is a bit more complex, hence any solution should be as close to what I have specified as possible.
UPDATE: The loop is an absolute requirement. No ifs or buts. Perhaps I can pull out the table(rle(x)) data and put it into a matrix. However again the stumbling block is the fact that some of the less frequent run lengths do not always turn up in each loop. Thus I guess I am looking to conditionally fill a matrix based on the run length number?
Last update before I give up: Retaining the rle$values will mean that too much data is being retained. My simulation is large-scale and I really only wish to retain the table output of the rle. Either I retain each table(rle(x)) for each loop and combine by hand (there will be thousands), or I find a programmatic way to keep the data (yes for zeroes and ones) and have one table that is formed from merging each of the individual loops as I go along.
Either this is easyish to do, as specified, or I will not be doing it. It may seem a silly idea/request, but that should be incidental to whether it can be done.
Seriously, last time. Here is an animated GIF (not reproduced here) showing what I expect to happen.
After each iteration of the loop, data is added to the table. This is as clear as I am going to be able to communicate it.
OK, attempt number 4:
N <- 5
set.seed(1)
x <- NULL
for (i in 1:N) {
  x <- rbind(x, table(rle(sample(0:1, 100000, replace=TRUE))))
}
x <- as.data.frame(x)
x$length <- as.numeric(rownames(x))
aggregate(x[, 1:2], list(x[[3]]), sum)
Produces:
   Group.1     0     1
1        1 62634 62531
2        2 31410 31577
3        3 15748 15488
4        4  7604  7876
5        5  3912  3845
6        6  1968  1951
7        7   979   971
8        8   498   477
9        9   227   246
10      10   109   128
11      11    65    59
12      12    24    30
13      13    21    11
14      14     7    10
15      15     0     4
16      16     4     2
17      17     0     1
18      18     0     1
If you want the aggregation inside the loop, do:
N <- 5
set.seed(1)
x <- NULL
for (i in 1:N) {
  x <- rbind(x, table(rle(sample(0:1, 100000, replace=TRUE))))
  y <- aggregate(x, list(as.numeric(rownames(x))), sum)
  print(y)
}
Following up @CarlWitthoft's answer, you probably want:
N <- 5
rlex <- NULL
for (i in 1:N) {
  x <- sample(0:1, 100000, 1/2)
  rlex <- append(rlex, rle(x)$lengths)
}
since I think you don't care about the $values component (i.e. whether each run is a run of zeros or ones).
Result: one long vector of run lengths.
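With only the lengths retained, the merged table asked for in the question is then simply (my follow-up, not part of the original answer):
table(rlex)   # one combined frequency table of run lengths across all iterations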
But this would probably be a lot more efficient:
maxlen <- 30
rlemat <- matrix(nrow = N, ncol = maxlen)
for (i in 1:N) {
  x <- sample(0:1, 100000, 1/2)
  rlemat[i, ] <- table(factor(rle(x)$lengths, levels = 1:maxlen))
}
Result: an N by maxlen table of run lengths from each iteration.
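If the per-iteration rows are not needed afterwards, the merged totals can presumably be recovered by summing the columns:
colSums(rlemat)   # total count of each run length across all N iterations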
If you only want to save the total number of runs of each length you could try:
rlecumsum <- rep(0, maxlen)
for (i in 1:N) {
  x <- sample(0:1, 100000, 1/2)
  rlecumsum <- rlecumsum + table(factor(rle(x)$lengths, levels = 1:maxlen))
}
Result: a vector of length maxlen with the total number of runs of each length across all iterations.
And here's my final answer:
rlecumtab <- matrix(0, ncol = 2, nrow = maxlen)
for (i in 1:N) {
  x <- sample(0:1, 100000, 1/2)
  r1 <- rle(x)
  rtab <- table(factor(r1$lengths, levels = 1:maxlen), r1$values)
  rlecumtab <- rlecumtab + rtab
}
Result: a maxlen by 2 table of the total numbers of run lengths across all iterations, divided by type (0-run vs 1-run).
You need to read the help page for rle. Consider:
names(rlex)  # "lengths" "values" "lengths" "values" ... and so on
In the meantime, I strongly suggest you spend some time reading up on statistical methods. There is zero (+/- epsilon) chance that running a binomial simulation a million times will tell you anything you won't learn after a few hundred tries, unless your coin has p=1e-5 :-).
