How many distinct sums of repeatable combinations of length N?

How many distinct sums can repeatable combinations of length N produce?
For example, take [1, 3, 5, 10] with N = 4. The combinations are:
[1,1,1,1] -> sum is 4
[1,1,1,3] -> sum is 6
...
[10,10,10,10] -> sum is 40
I wrote a backtracking algorithm in Python 3:
res = set()
n = 4
def backtrack(k, path):
    if len(path) == n:
        res.add(sum(path))
        return
    backtrack(k+1, path+[1])
    backtrack(k+1, path+[3])
    backtrack(k+1, path+[5])
    backtrack(k+1, path+[10])
    return
backtrack(0, list())
Is there a more efficient solution?

If the order of the n elements is not important, then your code does redundant work:
for example, [1,1,2,2] and [1,2,1,2] are the same combination, yet your backtracking visits both orderings.
You can instead create a new list that repeats each element of the original n times; the question then becomes how many ways we can select n items from that new list (a multiset selection), which can be calculated directly.
Furthermore, if you want the actual set of all the sums, I think there is no better way than iterating over all the combinations.
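For the count, the number of multisets of size n drawn from k distinct values is C(n+k-1, n). A minimal sketch (assuming the question's list and n) of both the closed-form count and a brute-force pass over the multisets via itertools:
from itertools import combinations_with_replacement
from math import comb

a = [1, 3, 5, 10]
n = 4

# number of multisets of size n from len(a) distinct values: C(n+k-1, n)
print(comb(n + len(a) - 1, n))  # 35

# the set of distinct sums, iterating over each multiset exactly once
sums = {sum(c) for c in combinations_with_replacement(a, n)}
print(len(sums))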

An O(n * len(a) * number_of_distinct_sums) approach:
We keep two sets containing the possible sums for the odd and even steps.
At each step we calculate the new sums formed with i+1 summands from the previous step's sums (with i summands).
For example, after two rounds we have all possible sums of two summands. Taking the sum 8 from the previous set, we put the three-summand sums 8+1, 8+3, 8+5, 8+10 into the new set, and so on.
a = [1, 3, 5, 10]
n = 4
sums = [set(a), set()]   # sums with an odd / even number of summands
for i in range(1, n):
    sums[i % 2].clear()  # clear() must be called; the bare attribute does nothing
    for j in sums[(i + 1) % 2]:
        for x in a:
            sums[i % 2].add(j + x)
print(len(sums[(n - 1) % 2]))
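The same idea can be written with a single set that is rebuilt once per round; a minimal equivalent sketch:
a = [1, 3, 5, 10]
n = 4
sums = {0}
for _ in range(n):
    # extend every partial sum by one more summand
    sums = {s + x for s in sums for x in a}
print(len(sums))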

Related

In R, need to find best combination of 8 columns, only being able to select one value from each row

In R, I'm attempting to find the best combination of 8 different columns of values but with the caveat of only being able to select one value from each row. It sounds relatively simple, but I'm trying to avoid a nasty looping scenario to evaluate all possible options, so I'm hopeful there is a function available that could make this a possibility. There are scenarios where I will need to run this on datasets with over 2000 rows, so efficiency is really important.
I've been racking my brain and searching forever, but every scenario and solution I'm able to find can maximize a series of columns but can't handle the condition of only allowing a single value per row. Are there any functions where this is possible?
I will take a risk here and assume that I interpreted you right: that you seek the group of 8 numbers in that table that has the maximum sum, given, of course, that they do not share a column or a row.
There is no easy answer to this question. I am not a computer scientist, but I believe this is what is called an NP-hard problem, so efficiency will always be a concern. Fortunately, in practical terms, I think you can get an answer for a 2000+ row table in a matter of seconds, as long as the number of columns remains small.
The algorithm I used to attack this problem is essentially a depth-first search that takes advantage of existing functions in R to make it faster. You can think of your problem as jumping from column to column, each time selecting the highest value, with a twist: every time you select a value, all cells in that row are turned to zero. So, in essence, when you get to the last column, there will only be one value to choose.
However, because of this exclusion of rows, your results will differ depending on the order in which you visit the columns (let's call that a path). Thus, you have to test all paths.
So our code must be something of the sort:
1- Enumerate all paths (all permutations of the column numbers);
2- For each path, "walk" it, taking the maximum value of each column and turning the values in its row to 0. Store the values;
3- For each set of values, calculate its sum and select the best path based on that.
Below is the code I have used to do it:
library(combinat) # loads the permn function, which enumerates all permutations
# Create fake data
data = sample(1:25)
data = matrix(data, 5, 5)
# Walking function
walker = function(path, data) {
  bestn = numeric(length(path))    # placeholder for the max value of each column
  usedrows = numeric(length(path)) # placeholder for the row of each max value
  data.reduced = data              # copy data to a new object
  for (a in 1:length(path)) {      # iterate through columns in path order
    bestn[a] = max(data.reduced[, path[a]])          # find the maximum value
    usedrows[a] = which.max(data.reduced[, path[a]]) # find the maximum value's row
    data.reduced[usedrows[a], ] = 0 # set all values in that row to 0
    data.reduced[, path[a]] = 0     # set the current column to 0
  }
  return(bestn)
}
# Create all permutations, walk each one, get the sums, and choose the best
paths = permn(1:5)
values = lapply(paths, walker, data)
values.sum = sapply(values, sum)
values[[which.max(values.sum)]]
The code can handle a 2000 x 5 matrix in less than a second on a laptop. I just did not use that size here because, the more rows there are, the less the result depends on the path taken, and it is less easy to see the algorithm's progress with large numbers. Keep in mind that the number of paths grows factorially with the number of columns: 5! = 120, but 8! = 40320.
This problem can also be solved as a binary integer optimization problem, here using the ROI and ompr optimization packages. ompr is a formulation manager that calls ROI functions for the optimization and processing. Here is an example:
require(ROI)
require(ROI.plugin.glpk)
require(ompr)
require(ompr.roi)
set.seed(7)
n <- runif(77, 80, 120)  # 77 nonzero values
n <- c(n, rep(0, 179))   # padded with zeros to 256 entries (about a 30% fill)
n <- sample(n)
m <- matrix(n, ncol = 8) # a 32 x 8 matrix
nrows <- nrow(m)
ncols <- ncol(m)
model <- MIPModel() %>%
  add_variable(x[i, j], i=1:nrows, j=1:ncols, type='binary', lb=0) %>%
  set_objective(sum_expr(colwise(m[i, j]) * x[i, j], i=1:nrows, j=1:ncols), 'max') %>%
  add_constraint(sum_expr(x[i, j], i=1:nrows) <= 1, j=1:ncols) %>% # at most one pick per column
  add_constraint(sum_expr(x[i, j], j=1:ncols) <= 1, i=1:nrows)     # at most one pick per row
result <- solve_model(model, with_ROI(solver = "glpk", verbose = TRUE))
<SOLVER MSG> ----
GLPK Simplex Optimizer, v4.47
40 rows, 256 columns, 512 non-zeros
* 0: obj = 0.000000000e+000 infeas = 0.000e+000 (0)
* 20: obj = 9.321807877e+002 infeas = 0.000e+000 (0)
OPTIMAL SOLUTION FOUND
GLPK Integer Optimizer, v4.47
40 rows, 256 columns, 512 non-zeros
256 integer variables, all of which are binary
Integer optimization begins...
+ 20: mip = not found yet <= +inf (1; 0)
+ 20: >>>>> 9.321807877e+002 <= 9.321807877e+002 0.0% (1; 0)
+ 20: mip = 9.321807877e+002 <= tree is empty 0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
<!SOLVER MSG> ----
solution <- get_solution(result, x[i, j])
solution <- subset(solution, value != 0)
solution
variable i j value
27 x 27 1 1
43 x 11 2 1
88 x 24 3 1
99 x 3 4 1
146 x 18 5 1
173 x 13 6 1
209 x 17 7 1
246 x 22 8 1
The first code chunk generates a 32 x 8 random matrix; the sample gives roughly a 30% fill (77 nonzero values out of 256). The constraints restrict each column and each row to at most one active variable. You can use this code directly for a matrix of any dimension.
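Because the constraints pick at most one cell per row and per column, this is an instance of the rectangular assignment problem, which also has polynomial-time solvers. As a minimal sketch, not part of the answer above, assuming SciPy is available and using made-up data:
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(7)
m = rng.uniform(80, 120, size=(2000, 8))  # hypothetical 2000 x 8 data

# pick one cell per column, no two in the same row, maximizing the total
rows, cols = linear_sum_assignment(m, maximize=True)
print(m[rows, cols].sum())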

Finding if binary matrix exists given the row and column sums

How do you find out whether it is possible to construct a binary matrix with given row and column sums?
Input :
The first row of the input contains two numbers 1 ≤ m, n ≤ 1000, the number of rows and columns of the matrix. The next row contains m numbers 0 ≤ r_i ≤ n, the sum of each row of the matrix. The third row contains n numbers 0 ≤ c_j ≤ m, the sum of each column of the matrix.
Output:
Output "YES" if there exists an m-by-n matrix A with these row and column sums, with each element either 0 or 1. Else output "NO".
I tried reading about tomography algorithms but could not figure out an answer, as all the papers on tomography algorithms are very complicated. Can someone please help me?
I've successfully implemented randomly generating such matrices for R using a modeling based on network flow. I intend to write up those ideas one day, but haven't found the time yet. Researching for that, I read in Randomization of Presence–absence Matrices: Comments and New Algorithms by Miklós and Podani:
The Havel-Hakimi theorem (Havel 1955, Hakimi 1962) states that there exists a matrix X_{n,m} of 0's and 1's with row totals a_0 = (a_1, a_2, …, a_n) and column totals b_0 = (b_1, b_2, …, b_m) such that b_i ≥ b_{i+1} for every 0 < i < m if and only if another matrix X_{n−1,m} of 0's and 1's with row totals a_1 = (a_2, a_3, …, a_n) and column totals b_1 = (b_1−1, b_2−1, …, b_{a_1}−1, b_{a_1+1}, …, b_m) also exists.
I guess that should be the best method to decide your question recursively.
Phrased in my own words: choose any row and remove it from the list of row totals; call that removed number k. Also subtract one from each of the k columns with the largest sums. You obtain a description of a smaller matrix, and recurse. If at any point you don't have k columns with non-zero sums, then no such matrix can exist. Otherwise you can recursively build a matching matrix using the reverse process: take the matrix returned by the recursive call, then add one more row with k ones, placed in the columns from whose counts you originally subtracted one.
Implementation
#include <algorithm>
#include <vector>

bool satisfiable(std::vector<int> a, std::vector<int> b) {
    while (!a.empty()) {
        // always take ones from the columns with the largest remaining sums
        std::sort(b.begin(), b.end(), std::greater<int>());
        int k = a.back(); // remove one row total
        a.pop_back();
        if (k > (int)b.size()) return false; // more ones than there are columns
        if (k == 0) continue;
        if (b[k - 1] == 0) return false; // fewer than k columns with non-zero sums
        for (int i = 0; i < k; i++)
            b[i]--;
    }
    // every column sum must be used up exactly
    for (std::vector<int>::iterator i = b.begin(); i != b.end(); i++)
        if (*i != 0)
            return false;
    return true;
}
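For reference, the same yes/no question also has a non-recursive test, the Gale-Ryser inequalities; a minimal Python sketch of that check, my own illustration rather than part of the answer above:
def gale_ryser(row_sums, col_sums):
    # True iff a 0/1 matrix with these row and column sums exists
    a = sorted(row_sums, reverse=True)
    if sum(a) != sum(col_sums):
        return False
    # no prefix of rows may demand more ones than the columns can supply
    return all(
        sum(a[:k]) <= sum(min(b, k) for b in col_sums)
        for k in range(1, len(a) + 1)
    )

print(gale_ryser([2, 1], [1, 1, 1]))  # True
print(gale_ryser([2, 0], [2, 0]))     # False: one row's two ones must go to different columns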

How many distinct combinations can be made from a complex group?

By a complex group I mean a group where not all values are distinct. That is, if an ordinary group would be 1,2,3,4,5,6,7 (for which the number of different combinations is 7C0+7C1+7C2+… = 2^7), then an example of a complex group is 1,1,1,3,3,5,7. How do you calculate how many different combinations (where order does not matter) can be generated from such groups?
EDIT: to clarify this. If, for example, we take 7C1 = 7, then we find that it cannot be applied to complex groups. That's because we get 7 different groups, but some of them are equal (1=1=1 and 3=3), so actually there are only 4 different groups (1, 3, 5, 7).
In other words, in the simple case of 1,1,2, simple 2^3 would consider these groups:
{},{1},{1},{2},{1,1},{1,2},{1,2},{1,1,2} = 8
What I need is a way to calculate the number of different groups (I consider {1,2}={2,1}). That would consider these:
{},{1},{2},{1,1},{1,2},{1,1,2} = 6
It is the product of the (count+1) values over the unique elements of the set.
Explanation: each unique number can occur from zero to k times, where k is the number of repetitions of that number. So there are [0..k], i.e. in total (k+1), options for each unique number, and these choices are independent, so the answer is the product of (count+1) over the unique elements of the set.
For {1,1,2}: count+1 for 1 = 2+1 = 3 and count+1 for 2 = 1+1 = 2.
So the answer is 3*2 = 6.
For {1,1,1,3,3,5,7} it is (3+1)*(2+1)*(1+1)*(1+1) = 4*3*2*2 = 48.
Python 3 code:
>>> import collections
>>> A = [1,1,1,3,3,5,7]
>>> def countComplexGroups(A):
...     count = collections.Counter(A)  # multiplicity of each unique value
...     rt = 1
...     for i in count:
...         rt *= count[i] + 1          # (count+1) choices per unique value
...     return rt
...
>>> print(countComplexGroups(A))
48

probabilities combinations

I faced a problem while writing Matlab code that calculates the sum of the products of all possible combinations of n numbers taken from a vector of length m. It is similar to drawing exactly n different balls out of a bag of m balls (order doesn't matter).
Example:
For m = 5 and n = 3 I need to calculate a sum of 10 summands, one product per combination, since 5C3 = 10.
Thanks for your time.
You should use nchoosek to enumerate the combinations, prod to multiply out each one, and sum to add the products up:
v = 1:5;                          % example vector, m = 5
n = 3;
s = sum(prod(nchoosek(v, n), 2)); % product across each row (combination), then the sum
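A quick brute-force cross-check of the same quantity; a minimal Python sketch, my own illustration:
from itertools import combinations
from math import prod

v = [1, 2, 3, 4, 5]  # example vector, m = 5
n = 3
# one product per combination; 5C3 = 10 summands in total
print(sum(prod(c) for c in combinations(v, n)))  # 225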

A more generalized expand.grid function?

expand.grid(a,b,c) produces all the combinations of the values in a, b, and c in a matrix, essentially filling the volume of a three-dimensional cube. What I want is a way of getting slices or lines out of that cube (or higher-dimensional structure), centred on the cube.
So, given that a, b, c are all odd-length vectors (so they have a centre), and in this case let's say they are of length 5, my hypothetical slice.grid function:
slice.grid(a,b,c,dimension=1)
returns a matrix of the coordinates of points along the three central lines. Almost equivalent to:
rbind(expand.grid(a[3],b,c[3]),
expand.grid(a,b[3],c[3]),
expand.grid(a[3],b[3],c))
almost, because it has the centre point repeated three times. Furthermore:
slice.grid(a,b,c,dimension=2)
should return a matrix equivalent to:
rbind(expand.grid(a,b,c[3]), expand.grid(a,b[3],c), expand.grid(a[3],b,c))
which is the three intersecting axis-aligned planes (with repeated points in the matrix at the intersections).
And then:
slice.grid(a,b,c,dimension=3)
is the same as expand.grid(a,b,c).
This isn't so bad with three parameters, but ideally I'd like to do this with N parameters passed to the function: expand.grid(a,b,c,d,e,f,dimension=4). It's unlikely I'd ever want dimension greater than 3, though.
It could be done by doing expand.grid and then extracting those points that are required, but I'm not sure how to build that criterion. And I always have the feeling that this function exists, tucked away in some package somewhere...
[Edit] Right, I think I have the criterion figured out now: it's to do with how many times the central value appears in each row. If it's less than or equal to your dimension+1...
But generating the full matrix gets big quickly. It'll do for now.
Assuming a, b and c each have length 3 (and if there are 4 variables then they each have length 4, and so on), try this. It works by using 1:3 in place of each of a, b and c and then counting how many 3's are in each row. If there are four variables then it uses 1:4 and counts how many 4's are in each row, etc. It uses this as the index to select out the appropriate rows from expand.grid(a, b, c):
slice.expand <- function(..., dimension = 1) {
  L <- lapply(list(...), seq_along)  # replace each vector by its index positions
  n <- length(L)                     # number of variables
  # keep rows in which at least (n - dimension) coordinates equal n
  ix <- rowSums(do.call(expand.grid, L) == n) >= (n - dimension)
  expand.grid(...)[ix, ]
}
# test
a <- b <- c <- LETTERS[1:3]
slice.expand(a, b, c, dimension = 1)
slice.expand(a, b, c, dimension = 2)
slice.expand(a, b, c, dimension = 3)
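For the general centred case the question describes (odd-length vectors of any length, slices through the true centre), here is a minimal sketch of the filtering criterion in Python, my own illustration rather than part of the answer above: keep a grid point when at most dimension of its coordinates are off-centre.
from itertools import product

def slice_grid(*vectors, dimension=1):
    centres = [len(v) // 2 for v in vectors]  # centre index of each odd-length vector
    pts = []
    for idx in product(*(range(len(v)) for v in vectors)):
        off_centre = sum(i != c for i, c in zip(idx, centres))
        if off_centre <= dimension:  # at most `dimension` coordinates off-centre
            pts.append(tuple(v[i] for v, i in zip(vectors, idx)))
    return pts

a = b = c = [1, 2, 3, 4, 5]
print(len(slice_grid(a, b, c, dimension=1)))  # 13: three central lines, centre counted once
print(len(slice_grid(a, b, c, dimension=2)))  # 61: union of the three central planes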
