Is it possible without using exponentiation to have a set of numbers that when added together, always give unique sum?
I know it can be done with exponentiation (see first answer): The right way to manage user privileges (user hierarchy)
But I'm wondering if it's possible without exponentiation.
No, you can only use exponentiation, because the sum of lower values have to be less than the new number to be unique: 1+2=3 < 4, 1+2+4=7 < 8.
[EDIT:]
This is a laymen's explanation, of course there are other possibilities, but none as efficient as using exponentials of 2.
There's a chance it can be done without exponentation (I'm no math expert), but not in any way that's more efficient than exponentation. This is because it only takes one bit of storage space per possible value, and as an added plus you can use boolean operators to do useful stuff with the values.
If you restrict yourself to integers, the numbers have to grow at least as fast as an exponential function. If you find a function that grows faster (like, oh, maybe the Ackermann function) then the numbers produced by that will probably work too.
With floating-point numbers, you can keep adding unique irreducible roots of primes (sqrt(2), sqrt(3), sqrt(5), ...) and you will always get something unique, up until you hit the limits of floating-point precision. Not sure how many unique numbers you could squeeze out of it - maybe you should try it.
No. To see this directly, think about building up the set of basis values by considering at each step the smallest possible positive integer that could be included as the next value. The next number to add must be different from all possible sums of the numbers already in the set (including the empty sum, which is 0), and can't combine with any combination of numbers already present to produce a duplicate. So...
{} : all possible sums = {0}, smallest possible next = 1
{1} : all possible sums = {0, 1}, smallest possible next = 2
{1, 2} : all possible sums = {0, 1, 2, 3}, smallest possible next = 4
{1, 2, 4} : a.p.s. = {0, 1, 2, 3, 4, 5, 6, 7}, s.p.n. = 8
{1, 2, 4, 8} ...
And, of course, we're building up the binary powers. You could start with something other than {1, 2}, but look what happens, using the "smallest possible next" rule:
{1, 3} : a.p.s. = {0, 1, 3, 4}, s.p.n. = 6 (because 2 could be added to 1 giving 3, which is already there)
{1, 3, 6} : a.p.s. = {0, 1, 3, 4, 6, 7, 9, 10}, s.p.n = 11
{1, 3, 6, 11} ...
This sequence is growing faster than the binary powers, term by term.
If you want a nice Project-Euler-style programming challenge, you could write a routine that takes a set of positive integers and determines the "smallest possible next" positive integer, under the "sums must be unique" constraint.
Related
This is a bit of a math question, but I post it here too because there's a direct practical purpose and it's related to creating a faster algorithm. I want to identify users that use my app on a weekly basis. For each user I can generate a sequence of times of their interactions, and from that I can generate a sequence of the length of time between each interaction.
So given this sequence of lengths of time, how can I find sections of consecutive numbers that have an average of 7 days or less?
As an example, if I had the following sequence: [1, 11, 1, 8, 12]
[1, 11, 1, 8, 12] would be a valid stretch of numbers with an average of 7 or less, but [11, 1, 8, 12] would not be valid. [1, 2, 12] would again be valid.
Ideally, my output for every valid section would be the starting position of the first item and the length of the section. So [1, 11, 1, 8, 12] would be described as [1, 5] and [1, 2, 12] would be described as [3, 3].
There is a brute force, computational approach where I take every item in the sequence as a start point, and calculate the averages of every possible length of following numbers up until the end of the sequence. The number of calculations grows quickly though at a rate of n(n+1)/2 (Imagine for each given sequence of length N finding consecutive sequences of length N, N-1, N-2 etc.)
I ask broadly if there's a more elegant approach that doesn't require a quadratically growing number of individual calculations for means.
Take a UUID in its hex representation: '123e4567-e89b-12d3-a456-426655440000'
I have a lot of such UUIDs, and I want to separate them into N buckets, where N is of my choosing, and I want to generate the bounds of these buckets.
I can trivially create 16 buckets with these bounds:
00000000-0000-0000-0000-000000000000
10000000-0000-0000-0000-000000000000
20000000-0000-0000-0000-000000000000
30000000-0000-0000-0000-000000000000
...
e0000000-0000-0000-0000-000000000000
f0000000-0000-0000-0000-000000000000
ffffffff-ffff-ffff-ffff-ffffffffffff
just by iterating over the options for the first hex digit.
Suppose I want 50 equal size buckets(equal in terms of number of UUID possibilities contained within each bucket), or 2000 buckets, or N buckets.
How do I generate such bounds as a function of N?
Your UUIDs above are 32 hex digits in length. So that means you have 16^32 ≈ 3.4e38 possible UUIDs. A simple solution would be to use a big int library (or a method of your own) to store these very large values as actual numbers. Then, you can just divide the number of possible UUIDs by N (call that value k), giving you bucket bounds of 0, k, 2*k, ... (N-1)*k, UMAX.
This runs into a problem if N doesn't divide the number of possible UUIDs. Obviously, not every bucket will have the same number of UUIDs, but in this case, they won't even be evenly distributed. For example, if the number of possible UUIDs is 32, and you want 7 buckets, then k would be 4, so you would have buckets of size 4, 4, 4, 4, 4, 4, and 8. This probably isn't ideal. To fix this, you could instead make the bucket bounds at 0, (1*UMAX)/N, (2*UMAX)/N, ... ((N-1)*UMAX)/N, UMAX. Then, in the inconvenient case above, you would end up with bounds at 0, 4, 9, 13, 18, 22, 27, 32 -- giving bucket sizes of 4, 5, 4, 5, 4, 5, 5.
You will probably need a big int library or some other method to store large integers in order to use this method. For comparison, a long long in C++ (in some implementations) can only store up to 2^64 ≈ 1.8e19.
If N is a power of 2, then the solution is obvious: you can split on bit boundaries as for 16 buckets in your question.
If N is not a power of 2, the buckets mathematically cannot be of exactly equal size, so the question becomes how unequal are you willing to tolerate in the name of efficiency.
As long as N<2^24 or so, the simplest thing to do is just allocate UUIDs based on the first 32 bits into N buckets each of size 2^32/N. That should be fast enough and equal enough for most applications, and if N needs to be larger than that allows, you could easily double the bits with a smallish penalty.
Just as a side note, this isn't asking someone to help with homework it's just a example question in my lecture notes that i'm not grasping and would greatly appreciate if someone could explain to me how exactly to work it out.
It says simply this:
Let G = {3,5,7}. Write down some examples of 4-tuples.
Thank you to anyone who tries to help, this is mathematics to understand a systems unit :)
Your collection G is a set, which is unordered and non-repeating, containing three elements. You want some 4-tuples, which is an ordered collection of possibly-repeating elements, and there must be 4 elements.
We show tuples by using parentheses around the collection, while a set like G is written using braces (curly brackets). Some examples of 4-tuples using the elements of G are
(3, 3, 3, 3)
(3, 3, 3, 5)
(3, 3, 3, 7)
(3, 3, 5, 3)
...
(7, 7, 7, 5)
(7, 7, 7, 7)
That list of mine was in a particular order, called the lexigraphical order. Since there are 4 elements and each element has 3 choices regardless of the other choices, the total number of 4-tuples is
3x3x3x3 = 81
As another answer implied, your question is somewhat ambiguous. I assumed that each 4-tuple was to have elements taken from your set G, but your question did not actually say that. It does seem to be implied, however.
A 4-tuple, in math, is any vector populated with 4 objects. An example of a 4-tuple is {1, 5, 3, 7}.
Without more context, I can't use the fact that G = {3, 5, 7}.
so below I have some mathematica code and I would like to know how to do this in R. For those not familiar with mathematica I will explain what this code does.
We are looking at the permutations of the list (1,2,3,4).
We generate a list of these permutations: {(1,2,3,4),(1,2,4,3)...etc}.
Then for each permutation, we check to see how many of the elements are in the right spot. E.g. if the current permutation is (4,2,3,1) then 2 and 3 are both in the correct spot and I want to recode this as the value 2, (1,4,2,3) would be 1, etc
then I want to randomly sample 1000000 times from these recoded list.
I know in R this last part is in the form of
sample(vectorToSample,10^6,replace=True)
Here it the mathematica code
RandomChoice[
Table[
Total[
Table[Permutations[{1, 2, 3, 4}][[i]][[j]] == j, {j, 1,4}]
]
, {i, 1,24}]/. {True -> 1,False -> 0}
, 10^6]
Or in a much cleaner way
RandomChoice[{9/24, 8/24, 6/24, 1/24} -> {0, 1, 2, 4}, 10^6]
EDIT: Wow, many great responses. Yes, I am using this as a fitness function for judging the quality of a sort performed by a genetic algorithm. So cost-of-evaluation is important (i.e., it has to be fast, preferably O(n).)
As part of an AI application I am toying with, I'd like to be able to rate a candidate array of integers based on its monotonicity, aka its "sortedness". At the moment, I'm using a heuristic that calculates the longest sorted run, and then divides that by the length of the array:
public double monotonicity(int[] array) {
if (array.length == 0) return 1d;
int longestRun = longestSortedRun(array);
return (double) longestRun / (double) array.length;
}
public int longestSortedRun(int[] array) {
if (array.length == 0) return 0;
int longestRun = 1;
int currentRun = 1;
for (int i = 1; i < array.length; i++) {
if (array[i] >= array[i - 1]) {
currentRun++;
} else {
currentRun = 1;
}
if (currentRun > longestRun) longestRun = currentRun;
}
return longestRun;
}
This is a good start, but it fails to take into account the possibility that there may be "clumps" of sorted sub-sequences. E.g.:
{ 4, 5, 6, 0, 1, 2, 3, 7, 8, 9}
This array is partitioned into three sorted sub-sequences. My algorithm will rate it as only 40% sorted, but intuitively, it should get a higher score than that. Is there a standard algorithm for this sort of thing?
This seems like a good candidate for Levenshtein Damerau–Levenshtein distance - the number of swaps needed to sort the array. This should be proportional to how far each item is from where it should be in a sorted array.
Here's a simple ruby algorithm that sums the squares of the distances. It seems a good measure of sortedness - the result gets smaller every time two out-of-order elements are swapped.
ap = a.sort
sum = 0
a.each_index{|i| j = ap.index(a[i])-i
sum += (j*j)
}
dist = sum/(a.size*a.size)
I expect that the choice of function to use depends very strongly on what you intend to use it for. Based on your question, I would guess that you are using a genetic system to create a sorting program, and this is to be the ranking function. If that is the case, then speed of execution is crucial. Based on that, I bet your longest-sorted-subsequence algorithm would work pretty well. That sounds like it should define fitness pretty well.
Something like these? http://en.wikipedia.org/wiki/Rank_correlation
Here's one I just made up.
For each pair of adjacent values, calculate the numeric difference between them. If the second is greater than or equal to the first, add that to the sorted total, otherwise add to the unsorted total. When done, take the ratio of the two.
Compute the lenghts of all sorted sub-sequences, then square them and add them.
If you want to calibrate how much enphasis you put on largest, use a power different than 2.
I'm not sure what's the best way to normalize this by length, maybe divide it per length squared?
What you're probably looking for is Kendall Tau. It's a one-to-one function of the bubble sort distance between two arrays. To test whether an array is "almost sorted", compute its Kendall Tau against a sorted array.
I would suggest looking at the Pancake Problem and the reversal distance of the permutations. These algorithms are often used to find the distance between two permutations (the Identity and the permuted string). This distance measure should take into account more clumps of in order values, as well as reversals (monotonically decreasing instead of increasing subsequences). There are also approximations that are polynomial time[PDF].
It really all depends on what the number means and if this distance function makes sense in your context though.
I have the same problem (monotonicity scoring), and I suggest you to try Longest Increasing Subsequence. The most efficient algorithm run in O(n log n), not so bad.
Taking example from the question, the longest increasing sequence of {4, 5, 6, 0, 1, 2, 3, 7, 8, 9} is {0, 1, 2, 3, 7, 8, 9} (length of 7). Maybe it rate better (70%) than your longest-sorted-run algorithm.
It highly depends on what you're intending to use the measure for, but one easy way to do this is to feed the array into a standard sorting algorithm and measure how many operations (swaps and/or comparisons) need to be done to sort the array.
Some experiments with a modifier Ratcliff & Obershelp
>>> from difflib import SequenceMatcher as sm
>>> a = [ 4, 5, 6, 0, 1, 2, 3, 7, 8, 9 ]
>>> c = [ 0, 1, 9, 2, 8, 3, 6, 4, 7, 5 ]
>>> b = [ 4, 5, 6, 0, 1, 2, 3, 7, 8, 9 ]
>>> b.sort()
>>> s = sm(None, a, b)
>>> s.ratio()
0.69999999999999996
>>> s2 = sm(None, c, b)
>>> s2.ratio()
0.29999999999999999
So kind of does what it needs to. Not too sure how to prove it though.
How about counting the number of steps with increasing value vs. the number of total steps. That's O(n).