Find combination of numbers close to X - recursion

Imagine you have a list of N numbers. You also have the "target" number. You want to find the combination of Z numbers that summed together are close to the target.
Example:
Target = 3.085
List = [0.87, 1.24, 2.17, 1.89]
Output:
[0.87, 2.17] = 3.04 (0.045 offset)
In the example above you would get the group [0.87, 2.17] because it has the smallest offset from the target of 0.045. It's a list of 2 numbers but it could be more or less.
My question is what is the best way/algorithm (fastest) to solve this problem? I'm thinking a recursive approach but not yet exactly sure how. What is your opinion on this problem?

This is a knapsack problem. To solve it you would do the following:
def knap(numbers,target):
values = Set()
values.add(0)
for v in values:
for n in numbers:
if v+n<(2*target): # this is optional..
values.add(v+n);
for v in values:
# find the closest item to your target
Essentially, you are building up all of the possible sums of the numbers. If you have integral values, you can make this even faster by using an array instead of a set.

Intuitively, I would start by sorting the list. (Use your favorite algorithm.) Then find the index of the largest element that is smaller than the target. From there, pick the largest element that is less than the target, and combine it with the smallest element. That would probably be your baseline offset. If it is a negative offset, you can keep looking for combinations using bigger numbers; if it is a positive offset you can keep looking for combinations using smaller numbers. At that point recursion might be appropriate.
This doesn't yet address the need for 'Z' numbers, of course, but it's a step in the right direction and can be generalized.
Of course, depending on the size of the problem the "fastest" way might well be to divide up the possible combinations, assign them to a group of machines, and have each one do a brute-force calculation on its subset. Depends on how the question is phrased. :)

Related

How would you optimize dividing bi variate data in R?

I'm not looking for a specific line a code - just built in functions or common packages that may help me do the following. Basically, something like, write up some code and use this function. I'm stuck on how to actually optimize - should I use SGD?
I have two variables, X, Y. I want to separate Y into 4 groups so that the L2, that is $(Xji | Yi - mean(Xji) | Yi)^2$ is minimized subject to the constraint that there are at least n observations in each group.
How would one go about solving this? I'd imagine you can't do this with the optim function? Basically the algo needs to move 3 values around (there are 3 cutoff points for Y) until L2 is minimized subject to n being a certain size.
Thanks
You could try optim and simply add a penalty if the constraints are not satisfied: since you minimise, add zero if all constraints are okay; otherwise a positive number.
If that does not work, since you only look for three cutoff points, I'd probably try a grid search, i.e. compute the objective function for different levels of the cutoff point; throw away those that violate the constraints, and then keep the best solution.

Determine if there exists a number in the array occurring k times

I want to create a divide and conquer algorithm (O(nlgn) runtime) to determine if there exists a number in an array that occurs k times. A constraint on this problem is that only a equality/inequality comparison method is defined on the objects of the array (i.e can't use <, >).
So I have tried a number of approaches including splitting the array into k pieces of equal size (approximately). The approach is similar to finding the majority item in an array, however in the majority case when you split the array, you know that one half must have a majority item if such an item exists. Any pointers or tips that one could provide to put me in the right direction ?
EDIT: To clear up a little, I am wondering whether the problem of finding the majority item by splitting the array in half and using a recursive solution can be extended to other situations where k may be n/4 or n/5 etc.
Maybe I should of phrased the question using n/k instead.
This is impossible. As a simple example of why this is impossible, consider an input with a length-n array, all elements distinct, and k=2. The only way to be sure no element appears twice is to compare every element against every other element, which takes O(n^2) time. Until you perform all possible comparisons, you cannot be sure that some pair you didn't compare isn't actually equal.

Is it possible to represent 'average value' in programming?

Had a tough time thinking of an appropriate title, but I'm just trying to code something that can auto compute the following simple math problem:
The average value of a,b,c is 25. The average value of b,c is 23. What is the value of 'a'?
For us humans we can easily compute that the value of 'a' is 29, without the need to know b and c. But I'm not sure if this is possible in programming, where we code a function that takes in the average values of 'a,b,c' and 'b,c' and outputs 'a' automatically.
Yes, it is possible to do this. The reason for this is that you can model the sort of problem being described here as a system of linear equations. For example, when you say that the average of a, b, and c is 25, then you're saying that
a / 3 + b / 3 + c / 3 = 25.
Adding in the constraint that the average of b and c is 23 gives the equation
b / 2 + c / 2 = 23.
More generally, any constraint of the form "the average of the variables x1, x2, ..., xn is M" can be written as
x1 / n + x2 / n + ... + xn / n = M.
Once you have all of these constraints written out, solving for the value of a particular variable - or determining that many solutions exists - reduces to solving a system of linear equations. There are a number of techniques to do this, with Gaussian elimination with backpropagation being a particularly common way to do this (though often you'd just hand this to MATLAB or a linear algebra package and have it do the work for you.)
There's no guarantee in general that given a collection of equations the computer can determine whether or not they have a solution or to deduce a value of a variable, but this happens to be one of the nice cases where the shape of the contraints make the problem amenable to exact solutions.
Alright I have figured some things out. To answer the question as per title directly, it's possible to represent average value in programming. 1 possible way is to create a list of map data structures which store the set collection as key (eg. "a,b,c"), while the average value of the set will be the value (eg. 25).
Extract the key and split its string by comma, store into list, then multiply the average value by the size of list to get the total (eg. 25x3 and 23x2). With this, no semantic information will be lost.
As for the context to which I asked this question, the more proper description to the problem is "Given a set of average values of different combinations of variables, is it possible to find the value of each variable?" The answer to this is open. I can't figure it out, but below is an attempt in describing the logic flow if one were to code it out:
Match the lists (from Paragraph 2) against one another in all possible combinations to check if a list contains all elements in another list. If so, substract the lists (eg. abc-bc) as well as the value (eg. 75-46). If upon substracting we only have 1 variable in the collection, then we have found the value for this variable.
If there's still more than 1 variables left such as abcd - bc = ad, then store the values as a map data structure and repeat the process, till the point where the substraction count in the full iteration is 0 for all possible combinations (eg. ac can't substract bc). This is unfortunately not where it ends.
Further solutions may be found by combining the lists (eg. ac + bd = abcd) to get more possible ways to subtract and derive at the answer. When this is the case, you just don't know when to stop trying, and the list of combinations will get exponential. Maybe someone with strong related mathematical theories may be able to prove that upon a certain number of iteration, further additions are useless and hence should stop. Heck, it may even be possible that negative values are also helpful, and hence contradict what I said earlier about 'ac' can't subtract 'bd' (to get a,c,-b,-d). This will give even more combinations to compute.
People with stronger computing science foundations may try what templatetypedef has suggested.

I have vectors of ones and zeroes. How can I efficiently determine if one vector is covered by another vector?

I was having trouble fitting my question into the 150 character limit of the title; I apologize if it's unclear.
Let's say I have two vectors of equal length (A and B) composed entirely of ones and zeroes. I'd like to know if there exists any pair A_i and B_i such that A_i = 0 when B_i = 1.
My problem is efficiency. It's trivially easy to write a for-loop, do an element by element comparison, set a flag and break if the condition is met. The problem comes up though, that I'm working on matrices of up to 20000 rows and numbers of columns of a similar magnitude. I'd like to have a way of quickly doing this check and removing any of the rows which are redundant for my purposes... at this scale the element by element comparison takes an impractical amount of time.
Is there any elegant linear algebra trick to address this in an efficient way?
Edit 1: The columns aren't exactly arranged at random. I can't say that they are strictly sorted, but I can say that as I go from left to right the columns on the left are more likely to cover columns the left than vice-versa (by cover, I mean that A covers B iff for all i, if A_i = 1 then B_i = 1 too). I can try and apply some additional kind of sorting to them if it would make addressing the problem easier (but I would prefer to avoid that if the sorting process would be similarly impractical).
Within individual columns, I'm not aware of any pattern to how the ones are distributed by index.

derangements and permutations in cryptography

i have a problem that i am having a bit of trouble with;
we are given a partial key (missing 11 letters) for a mono-alphabetic substitution cipher and asked to calculate the number of possible keys given that no plaintext letter can be mapped to itself.
ordinarily, the number of possible keys would be the number of derangements of the missing letters (!11), however 5 of the plaintext letters that are missing mappings already exist as mappings in the partial key, so logically it shouldnt matter what the mapping of those plaintext letters is, because they can never map to themselves.
so shouldnt the number of possible keys be 5! * !6, ie. (the number of permutations of the 5 already mapped free letters) * (the number of derangements of the remaining 6)?
the problem is that 5! * !6 = 31800 which is much less than !11 = 14684570
intuitively the set of derangements should be a smaller subset of !11, shouldnt it?
am i just getting something wrong in my arithmetic? or am i completely missing the concepts? any help would be greatly appreciated
thanks gus
ps. i know this isn't strictly a programming question, but it is a computing question and related to a programming project, so i thought it might be pertinent. also, i posted it on math.stackexchange.com yesterday but havent had any responses yet..
EDIT: corrected the value of !11
I think your problem can be rephrased as the following:
How many permutation has a list with elements a_0, a_1, ... a_n-1, b_0, b_1, ..., b_m-1, in which no a_k element is at position k? (Let us denote this number with p_{n,m} - your specific question is the value of p_{6,5}.)
Please note that your suggested formula 5!*!6 is not correct because of the following:
it only counts the cases, where the a_ks are in the first 6 positions (without any of them being in the position of its own index), and the b_ks on the last 5.
You do not count any other configurations like: a_3, b_4, b_1, a_0, a_5, b_0, a_2, b_2, b_3, a_1, a_4, where the order is totally mixed.
Your other idea about the result being a subset of the !11-element derangement on all the elements is also not correct, as any of the b_ks can be at any position.
However, we can easily add a recursive formula for p_{n,m} by separating it into two cases based on the position of a_0.
If a_0 gets in one of the positions 1, 2, ..., n-1. (n-1 different possibilities.)
This means that neither a_0 is at position 0, and it also prevents another a_k from being at position k by occupying that position. Thus this a_k becomes 'free', it can go to any other positions. If a_0 gets fixed this way, the other elements can be permutated in p_{n-2,m+1} different ways.
If a_0 gets in one of the positions n, n+1, ..., n+m-1. (m different possibilities.)
This way no other a_k gets prevented to be at the position corresponding to it's index. The other elements can be permutated in p_{n-1,m} different ways.
Adding this together gives the recursion: p_{n,m} = (n-1)*p_{n-2,m+1} + m*p_{n-1,m}. The halting conditions are p_{0,m}=m! for every m, as it means, that each element can be at any location.
I also coded it in python:
import math
def derange(n,m):
if n<0:
return 0
elif n==0:
return math.factorial(m)
else:
return (n-1)*derange(n-2, m+1) + m*derange(n-1, m)
print derange(6,5)
gives 22852200.
If you are interested in the general case, you can find some related sequences on OEIS.
The search term 'differences of factorial numbers' can be interesting, e.g. in triangular form: http://oeis.org/A047920.
There is also an article mentioned there: http://www.pmfbl.org/janjic/enumfun.pdf, maybe it can help if you are interested in a generic closed formula for n and m.
Suddenly I didn't have any good idea to come up with, but I think this can be a good starting point.

Resources