Calculating a list of valid combinations - recursion

So I came up with a problem that I can’t logically solve. I’ve fleshed out an example:
You have a list of numbers that you need to output (e.g. 3x number 1 & 2x number 2) - we’ll call them ‘target numbers’.
You’re also given a range of source numbers (e.g. 2, 3, 4 and 5).
The task is to return all valid combinations of the source numbers that would allow you to produce the target numbers. You can use any combination and quantity of source numbers.
The constraints are that you can break a source number down to get to target numbers (e.g. you could break down a 5 into a 2 and a 3) but you cannot add source numbers together to get to a target number (for example you can’t add a 1 to a 1 to get to a 2).
Remainders are perfectly acceptable (e.g. using a source 3 to get to a target 2 and the remaining 1 is part of the combination but not ‘used’ in getting to the target).
In the interests of limiting results you’d also want a constraint that an acceptable combination does not contain any ‘totally unused’ source numbers (i.e. numbers that are neither split to reach a target nor used directly as a target number in the result).
So in the example target & source numbers given, the following results would be valid:
[1,1,1,2,2],[3,4],[3,3,1],[4,4]
But a [1,1,1,1,1,2] would not be valid, because you cannot join two source 1’s together to make a target 2.
I’ve thought about this logic and am largely thinking the solution involves some level of recursion, but the only examples I’ve seen are where the constraints are reversed (i.e. you can add source numbers together to reach a target number, but cannot split them to reach one)
What kind of logic would you use to generate all valid permutations in code?
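One possible approach, as a minimal sketch rather than a definitive answer: treat each source number as a length that target numbers are cut from, check a candidate combination with backtracking, and enumerate candidates by brute force. The names `can_cut` and `valid_combinations` are made up for illustration. The sketch assumes that 1 is also an allowed source value (the valid results listed above contain 1s) and that "no totally unused source number" means every source must yield at least one target, so a combination never has more sources than there are targets.

```python
from itertools import combinations_with_replacement


def can_cut(sources, targets):
    """True if every target can be cut from some source (never by joining
    sources) while every source yields at least one target."""
    targets = sorted(targets, reverse=True)  # place large targets first
    remaining = list(sources)                # leftover size of each source
    used = [False] * len(remaining)

    def place(i):
        if i == len(targets):
            return all(used)                 # reject totally unused sources
        tried = set()
        for j, cap in enumerate(remaining):
            state = (cap, used[j])
            if cap < targets[i] or state in tried:
                continue                     # too small, or an equivalent source was tried
            tried.add(state)
            was_used = used[j]
            remaining[j] -= targets[i]
            used[j] = True
            if place(i + 1):
                return True
            remaining[j] += targets[i]       # undo and try the next source
            used[j] = was_used
        return False

    return place(0)


def valid_combinations(source_values, targets):
    """All multisets of source values that pass can_cut (assumed reading of the rules)."""
    results = []
    for size in range(1, len(targets) + 1):
        for combo in combinations_with_replacement(sorted(source_values), size):
            if can_cut(combo, targets):
                results.append(list(combo))
    return results
```

Calling `valid_combinations([1, 2, 3, 4, 5], [1, 1, 1, 2, 2])` includes `[3, 4]`, `[1, 3, 3]`, `[4, 4]` and `[1, 1, 1, 2, 2]` among its results, while `can_cut([1, 1, 1, 1, 1, 2], [1, 1, 1, 2, 2])` is false because the second target 2 has no source large enough.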

Related

Can SPARK be used to prove that Quicksort actually sorts?

I'm not a user of SPARK. I'm just trying to understand the capabilities of the language.
Can SPARK be used to prove, for example, that Quicksort actually sorts the array given to it?
(Would love to see an example, assuming this is simple)
Yes, it can, though I'm not particularly good at SPARK-proving (yet). Here's how quick-sort works:
We note that the idea behind quicksort is partitioning.
A 'pivot' is selected and this is used to partition the collection into three groups: equal-to, less-than, and greater-than. (This ordering impacts the procedure below; I'm using this because it's different than the in-order version to illustrate that it is primarily about grouping, not ordering.)
If the collection is 0 or 1 in length, then you are sorted; if 2 then check and possibly-correct the ordering and they are sorted; otherwise continue on.
Move the pivot to the first position.
Scan from the second position to the last position, acting on the value under consideration:
Less – Swap with the first item in the Greater partition.
Greater – Null-op.
Equal – Swap with the first item of Less, then swap with the first item of Greater.
Recursively call on the Less & Greater partitions.
If it is a function, return Less & Equal & Greater; if it is a procedure, rearrange the in out input into that ordering.
Here's how you would go about doing things:
Prove/assert the 0 and 1 cases as true,
Prove your handling of 2 items,
Prove that given an input-collection and pivot there are a set of three values (L,E,G) which are the count of the elements less-than/equal-to/greater-than the pivot [this is probably a ghost-subprogram],
Prove that L+E+G equals the length of your collection,
Prove [in the post-condition] that given the pivot and (L,E,G) tuple, the output conforms to L items less-than the pivot followed by E items which are equal, and then G items that are greater.
And that should do it. [IIUC]
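To make the proof obligations above concrete, here is a rough Python sketch (not SPARK, and not a proof) in which runtime assertions stand in for the contracts. The helper `counts` plays the role of the ghost subprogram returning (L,E,G); both it and the function form of `quicksort` are illustrative assumptions, and the sketch returns Less & Equal & Greater rather than rearranging in place.

```python
def counts(items, pivot):
    """Ghost-style helper: how many items are less than, equal to,
    and greater than the pivot."""
    less = sum(1 for x in items if x < pivot)
    equal = sum(1 for x in items if x == pivot)
    greater = sum(1 for x in items if x > pivot)
    return less, equal, greater


def quicksort(items):
    # Lengths 0 and 1 are sorted by definition.
    if len(items) <= 1:
        return list(items)
    # Length 2: check and possibly correct the ordering.
    if len(items) == 2:
        a, b = items
        return [a, b] if a <= b else [b, a]

    pivot = items[0]
    l, e, g = counts(items, pivot)
    assert l + e + g == len(items)           # L+E+G equals the length

    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    result = quicksort(less) + equal + quicksort(greater)

    # Post-condition: L items below the pivot, then E equal, then G above.
    assert all(x < pivot for x in result[:l])
    assert all(x == pivot for x in result[l:l + e])
    assert all(x > pivot for x in result[l + e:])
    return result
```

In SPARK these assertions would be stated as pre- and post-conditions and discharged statically by the prover; here they only check the inputs that are actually run.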

How to choose the lengths of my sub sequences for a shell sort?

Let's assume we have a sequence a_i of length n and we want to sort it using shell sort. To do so, we would choose subsequences of our a_i's of length k_i.
I'm now wondering how to choose those k_i's. You usually see that if n=16 we would choose k_1=8, k_2=4, k_3=2, k_4=1. So we would pair-wise compare the numbers for each k_i and at the end use insertionSort to finish our sorting.
The idea of first sorting sub sequences of length k_i is to "pre-sort" the sequence for the insertionSort. Right?
Questions:
Now, depending on how we choose our k_i, we get a better performance. Is there a rule I can use here to choose the k_i's?
Could I also choose e.g. n=15, k_1=5, k_2=3, k_3=2?
If we have n=10 and k_1=5, would we now go with {k_2=2, k_3=1} or {k_2=3, k_3=2, k_4=1} or {k_2=3, k_3=1}?
The fascinating thing about shellsort is that for a sequence of n (unique) entries a unique set of gaps will be required to sort it efficiently, essentially f(n) => {gap/gaps}
For example, to most efficiently - on average - sort a sequence containing
2-5 entries - use insertion sort
6 entries - use shellsort with gaps {4,1}
7 or 8 entries - use a {5,1} shellsort
9 entries - use a {6,1} shellsort
10 entries - use a {9,6,1} shellsort
11 entries - use a {10,6,1} shellsort
12 entries - use a {5,1} shellsort
As you can see, 6-9 require 2 gaps, 10 and 11 three and 12 two. This is typical of shellsort's gaps: from one n to the next (i.e. n+1) you can be fairly sure that the number and makeup of gaps will differ.
A nasty side-effect of shellsort is that when using a set of random combinations of n entries (to save processing/evaluation time) to test gaps you may end up with either the best gaps for n entries or the best gaps for your set of combinations - most likely the latter.
I speculate that it is probably possible to create algorithms where you can plug in an arbitrary n and get the best gap sequence computed for you. Many high-profile computer scientists have explored the relationship between n and gaps without a lot to show for it. In the end they produce gaps (more or less by trial and error) that they claim perform better than those of others who have explored shellsort.
Concerning your foreword given n=16 ...
a {8,4,2,1} shellsort may or may not be an efficient way to sort 16 entries.
or should it be three gaps and, if so, what might they be?
or even two?
Then, to (try to) answer your questions ...
Q1: a rule can probably be formulated
Q2: you could ... but you should test it (for a given n there are n! possible sequences to test)
Q3: you can compare it with the correct answer (above). Or you can test it against all 10! possible sequences when n=10 (comes out to 3628800 of them) - doable
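The exhaustive test suggested for Q2 and Q3 can be sketched roughly as follows. This is only an illustration: `shellsort` and `average_comparisons` are made-up names, "efficiency" is assumed to mean the number of key comparisons, and the gap list is assumed to end in 1 so that the final pass is a plain insertion sort.

```python
from itertools import permutations


def shellsort(seq, gaps):
    """Sort a copy of seq using the given gaps (largest first, ending in 1).
    Returns the sorted list and the number of key comparisons made."""
    a = list(seq)
    comparisons = 0
    for gap in gaps:
        # Gapped insertion sort: each item is inserted among the items
        # that sit a multiple of `gap` positions before it.
        for i in range(gap, len(a)):
            item = a[i]
            j = i
            while j >= gap:
                comparisons += 1
                if a[j - gap] <= item:
                    break
                a[j] = a[j - gap]
                j -= gap
            a[j] = item
    return a, comparisons


def average_comparisons(n, gaps):
    """Average comparison count over all n! orderings of n unique entries."""
    total = 0
    count = 0
    for perm in permutations(range(n)):
        result, comparisons = shellsort(perm, gaps)
        assert result == list(range(n))      # every ordering must end up sorted
        total += comparisons
        count += 1
    return total / count
```

Comparing, say, `average_comparisons(10, [5, 2, 1])` with `average_comparisons(10, [5, 3, 1])` runs through all 3628800 orderings per gap set, which is slow in pure Python but still doable; for much larger n the factorial growth makes this approach impractical.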

Finding total number of ways for some conditions

There is a particular type of problem for which I need some help to understand it properly.
Let's look at an example.
Suppose we are given an integer n.
We have to find the number of possible pairs (say a and b) such that the given conditions are fulfilled-
1<=a<=b<=n
f(a) < f(b)
where f(a)=sum of digits of a
Now I understand that instead of counting the possible solutions, we will try to find the number of ways to form two numbers such that the above 3 conditions are fulfilled. We will start from the one's place and go on from there.
But how to proceed after that?
How to determine that we have to stop now??
How to check that the above conditions are fulfilled at each step?
E.g. which digit we choose for the thousands place will depend on the digits chosen for the hundreds and the ones places.
This is a pretty common type of a problem in competitive programming and I want to learn the proper method to solve it.
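Problems of this shape are usually handled with a "digit DP": build both numbers digit by digit and carry along only the small amount of state needed to check the conditions at the end. The sketch below is one possible formulation, not the only one. It works from the most significant digit (not the ones place), pads a and b with leading zeros to the length of n, and uses the hypothetical names `count_pairs` and `go`. The state is the position, whether b still matches the prefix of n, whether a is already smaller than b, and the running difference f(b) - f(a). Recursion stops when all digit positions are filled.

```python
from functools import lru_cache


def count_pairs(n):
    """Count pairs (a, b) with 1 <= a <= b <= n and digit_sum(a) < digit_sum(b)."""
    digits = [int(c) for c in str(n)]

    @lru_cache(maxsize=None)
    def go(pos, tight, a_less, diff):
        # pos    : index of the digit being chosen (0 = most significant)
        # tight  : b's prefix equals n's prefix, so b's next digit is capped
        # a_less : a's prefix is already strictly smaller than b's prefix
        # diff   : digit sum of b's prefix minus digit sum of a's prefix
        if pos == len(digits):
            return 1 if a_less and diff > 0 else 0
        total = 0
        max_b = digits[pos] if tight else 9
        for db in range(max_b + 1):
            max_a = 9 if a_less else db      # keep a <= b while prefixes are equal
            for da in range(max_a + 1):
                total += go(pos + 1,
                            tight and db == max_b,
                            a_less or da < db,
                            diff + db - da)
        return total

    # The DP also counts a = 0, which pairs with every b in 1..n; remove those.
    return go(0, True, False, 0) - n
```

For n = 10 this gives 36: every pair a < b among 1..9 qualifies, and b = 10 has digit sum 1, which no a >= 1 can undercut. The number of states is roughly (number of digits) x 2 x 2 x (range of diff), so the method stays fast even when n has many digits.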

What is the difference between permutations and derangements?

I have been given a program to write that produces the different combinations of a set of numbers entered by the user. When I researched this, I found examples using the terms permutations and derangements.
I am unable to find a clear distinction between them. A further related term is combinations. Could anyone please provide a simple one-liner clarifying each?
Thanks in advance.
http://en.wikipedia.org/wiki/Permutation
The notion of permutation relates to the act of rearranging, or permuting, all the members of a set into some sequence or order (unlike combinations, which are selections of some members of the set where order is disregarded). For example, written as tuples, there are six permutations of the set {1,2,3}, namely: (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), and (3,2,1). As another example, an anagram of a word, all of whose letters are different, is a permutation of its letters.
http://en.wikipedia.org/wiki/Derangement
In combinatorial mathematics, a derangement is a permutation of the elements of a set such that none of the elements appear in their original position.
The number of derangements of a set of size n, usually written D_n, d_n, or !n, is called the "derangement number" or "de Montmort number". (These numbers are generalized to rencontres numbers.) The subfactorial function (not to be confused with the factorial n!) maps n to !n. No standard notation for subfactorials is agreed upon; n¡ is sometimes used instead of !n.
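As a small illustration of the three terms, here is a sketch using Python's standard `itertools` module on the set {1,2,3} from the quote above. The variable names are arbitrary.

```python
from itertools import combinations, permutations

items = (1, 2, 3)

# Permutations: every ordering of all the members.
all_permutations = list(permutations(items))

# Derangements: permutations in which no element stays in its original position.
derangements = [p for p in all_permutations
                if all(moved != original for moved, original in zip(p, items))]

# Combinations: selections of some members where order is disregarded (here, 2 of 3).
pairs = list(combinations(items, 2))

print(len(all_permutations))  # 6
print(derangements)           # [(2, 3, 1), (3, 1, 2)]
print(pairs)                  # [(1, 2), (1, 3), (2, 3)]
```

So every derangement is a permutation, but not the other way round, and a combination ignores order altogether.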

Quantifying the non-randomness of a specialized random generator?

I just read this interesting question about a random number generator that never generates the same value three consecutive times. This clearly makes the random number generator different from a standard uniform random number generator, but I'm not sure how to quantitatively describe how this generator differs from a generator that didn't have this property.
Suppose that you handed me two random number generators, R and S, where R is a true random number generator and S is a true random number generator that has been modified to never produce the same value three consecutive times. If you didn't tell me which one was R or S, the only way I can think of to detect this would be to run the generators until one of them produced the same value three consecutive times.
My question is - is there a better algorithm for telling the two generators apart? Does the restriction of not producing the same number three times somehow affect the observable behavior of the generator in a way other than preventing three of the same value from coming up in a row?
As a consequence of Rice's Theorem, there is no way to tell which is which.
Proof: Let L be the output of the normal RNG. Let L' be L, but with all runs of three or more identical consecutive values removed. Some TMs recognize L', but some do not. Therefore, by Rice's theorem, determining if a TM accepts L' is not decidable.
As others have noted, you may be able to make an assertion like "It has run for N steps without repeating three times", but you can never make the leap to "it will never repeat a digit three times." More appropriately, there exists at least one machine for which you can't determine whether or not it meets this criterion.
Caveat: if you had a truly random generator (e.g. nuclear decay), it is possible that Rice's theorem would not apply. My intuition is that the theorem still holds for these machines, but I've never heard it discussed.
EDIT: a secondary proof. Suppose P(X) determines with high probability whether or not X accepts L'. We can construct an (infinite number of) programs F like:
F(x): if x(F), then don't accept L'
else, accept L'
P cannot determine the behavior of F(P). Moreover, say P correctly predicts the behavior of G. We can construct:
F'(x): if x(F'), then don't accept L'
else, run G(x)
So for every good case, there must exist at least one bad case.
If S is defined by rejecting from R, then a sequence produced by S will be a subsequence of the sequence produced by R. For example, taking a simple random variable X with equal probability of being 1 or 0, you would have:
R = 0 1 1 0 0 0 1 0 1
S = 0 1 1 0 0 1 0 1
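The rejection step that turns R into S can be sketched as below. This assumes S simply drops any value that would be a third consecutive repeat, which matches the example sequences above; the name `drop_third_repeats` is made up for illustration.

```python
def drop_third_repeats(values):
    """Filter a sequence so that no value appears three times in a row."""
    out = []
    for v in values:
        # Reject v if the last two values kept are both equal to it.
        if len(out) >= 2 and out[-1] == v and out[-2] == v:
            continue
        out.append(v)
    return out


r = [0, 1, 1, 0, 0, 0, 1, 0, 1]
print(drop_third_repeats(r))  # [0, 1, 1, 0, 0, 1, 0, 1]
```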
The only real way to differentiate these two is to look for streaks. If you are generating binary numbers, then streaks are incredibly common (so much so that one can almost always differentiate between a random 100 digit sequence and one that a student writes down trying to be random). If the numbers are taken from [0,1] uniformly, then streaks are far less common.
It's an easy exercise in probability to calculate the chance of three consecutive numbers being equal once you know the distribution, or even better, the expected number of numbers needed until the probability of three consecutive equal numbers is greater than p for your favourite choice of p.
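For the discrete case, that exercise can be sketched as a small dynamic program. The assumptions here are that values are drawn independently and uniformly from k possibilities; the function name is hypothetical. It tracks the probability that no triple has occurred yet, split by whether the current streak has length 1 or 2.

```python
def prob_three_in_a_row(k, n):
    """Probability that n independent uniform draws from k values contain
    at least one run of three equal consecutive values."""
    if n < 3:
        return 0.0
    same = 1.0 / k           # chance the next draw repeats the previous one
    run1, run2 = 1.0, 0.0    # no triple so far, current streak of length 1 / 2
    for _ in range(n - 1):
        # A streak of 2 followed by the same value becomes a triple and
        # leaves both states; any different value resets the streak to 1.
        run1, run2 = (run1 + run2) * (1.0 - same), run1 * same
    return 1.0 - (run1 + run2)


print(prob_three_in_a_row(2, 3))  # 0.25
```

Evaluating the function for growing n shows how many samples are needed before a triple becomes likely for a true random generator; for binary values this happens quickly, while for a large k it takes far longer, in line with the remark about streaks above.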
Since you defined that they only differ with respect to that specific property there is no better algorithm to distinguish those two.
If you look at triples of random values, the generator S will of course produce all other triples slightly more often than R, to compensate for the missing triples (X,X,X). But to get a significant result you would need much more data than it takes to see any value three consecutive times for the first time.
Probably use ENT ( http://fourmilab.ch/random/ )
