Determine if there exists a number in the array occurring k times - recursion

I want to create a divide and conquer algorithm (O(n log n) runtime) to determine whether there exists a number in an array that occurs k times. A constraint on this problem is that only an equality/inequality comparison is defined on the objects of the array (i.e. you can't use <, >).
I have tried a number of approaches, including splitting the array into k pieces of (approximately) equal size. The approach is similar to finding the majority item in an array; however, in the majority case, when you split the array you know that one half must have a majority item if such an item exists. Can anyone provide pointers or tips to point me in the right direction?
EDIT: To clear things up a little, I am wondering whether the approach of finding the majority item by splitting the array in half and recursing can be extended to other situations where k may be n/4 or n/5 etc.
Maybe I should have phrased the question using n/k instead.

This is impossible. As a simple example of why, consider a length-n array whose elements are all distinct, with k=2. The only way to be sure no element appears twice is to compare every element against every other element, which takes n(n-1)/2 comparisons, i.e. Θ(n^2) time. Until you have performed all possible comparisons, you cannot rule out that some pair you have not compared is actually equal.
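For illustration, here is a minimal C++ sketch of the quadratic check that the argument above says cannot be avoided in the worst case. The function name is hypothetical, and it assumes "occurs k times" means exactly k times; it uses nothing but operator== on the elements.

```cpp
#include <cstddef>
#include <vector>

// Returns true if some value occurs exactly k times in `items`.
// Only operator== is used on the elements: no ordering and no hashing.
// Worst case: n * n equality comparisons.
template <typename T>
bool has_value_occurring_k_times(const std::vector<T>& items, std::size_t k) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        std::size_t count = 0;
        for (std::size_t j = 0; j < items.size(); ++j) {
            if (items[i] == items[j]) {
                ++count;  // counts items[i] itself as well
            }
        }
        if (count == k) {
            return true;
        }
    }
    return false;
}
```

With all elements distinct and k=2, the loops run to completion before returning false, which is the case described above.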


Can SPARK be used to prove that Quicksort actually sorts?

I'm not a user of SPARK. I'm just trying to understand the capabilities of the language.
Can SPARK be used to prove, for example, that Quicksort actually sorts the array given to it?
(Would love to see an example, assuming this is simple)
Yes, it can, though I'm not particularly good at SPARK-proving (yet). Here's how quick-sort works:
We note that the idea behind quicksort is partitioning.
A 'pivot' is selected and this is used to partition the collection into three groups: equal-to, less-than, and greater-than. (This ordering impacts the procedure below; I'm using this because it's different than the in-order version to illustrate that it is primarily about grouping, not ordering.)
If the collection has 0 or 1 elements, it is already sorted; if it has 2, check and possibly correct the ordering, and it is sorted; otherwise continue.
Move the pivot to the first position.
Scan from the second position to the last position and, depending on the value under consideration:
Less – Swap with the first item in the Greater partition.
Greater – Null-op.
Equal – Swap with the first item of Less, then swap with the first item of Greater.
Recursively call on the Less & Greater partitions.
If this is a function, return Less & Equal & Greater; if it is a procedure, rearrange the in out input into that ordering.
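The steps above can be sketched in C++ (not SPARK, and without any proof annotations). This is only an illustration under some assumptions: the elements are ints, the pivot is whatever already sits in the first position, the name quicksort_elg is made up, and std::rotate is my choice for turning the Equal | Less | Greater layout into the final Less, Equal, Greater order. The 2-element case is handled by the general path rather than separately.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts v[lo, hi) in place. During the scan the layout is
//   [lo, less_begin)            == pivot   (Equal)
//   [less_begin, greater_begin) <  pivot   (Less)
//   [greater_begin, i)          >  pivot   (Greater)
void quicksort_elg(std::vector<int>& v, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;  // 0 or 1 elements: already sorted

    const int pivot = v[lo];  // pivot is taken from the first position
    std::size_t less_begin = lo + 1;
    std::size_t greater_begin = lo + 1;

    for (std::size_t i = lo + 1; i < hi; ++i) {
        if (v[i] < pivot) {
            // Less: swap with the first item of Greater.
            std::swap(v[i], v[greater_begin]);
            ++greater_begin;
        } else if (v[i] == pivot) {
            // Equal: swap with the first item of Less...
            std::swap(v[i], v[less_begin]);
            // ...then move the displaced Less item to the first Greater slot.
            // If Less is empty there is nothing to move, so skip the swap.
            if (less_begin != greater_begin) {
                std::swap(v[i], v[greater_begin]);
            }
            ++less_begin;
            ++greater_begin;
        }
        // Greater: null-op, the item simply extends the Greater group.
    }

    // Rearrange Equal | Less | Greater into Less | Equal | Greater.
    const std::size_t less_count = greater_begin - less_begin;
    std::rotate(v.begin() + static_cast<std::ptrdiff_t>(lo),
                v.begin() + static_cast<std::ptrdiff_t>(less_begin),
                v.begin() + static_cast<std::ptrdiff_t>(greater_begin));

    quicksort_elg(v, lo, lo + less_count);  // Less partition
    quicksort_elg(v, greater_begin, hi);    // Greater partition
}
```

Because the Equal group always contains at least the pivot, both recursive calls work on strictly smaller ranges.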
Here's how you would go about doing things:
Prove/assert the 0 and 1 cases as true,
Prove your handling of 2 items,
Prove that, given an input collection and a pivot, there is a set of three values (L,E,G) which are the counts of the elements less-than/equal-to/greater-than the pivot [this is probably a ghost-subprogram],
Prove that L+E+G equals the length of your collection,
Prove [in the post-condition] that given the pivot and (L,E,G) tuple, the output conforms to L items less-than the pivot followed by E items which are equal, and then G items that are greater.
And that should do it. [IIUC]
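SPARK would discharge these obligations statically. As a rough runtime analogue only, the (L,E,G) counting and the postcondition can be written as ordinary C++ checks. The struct and function names below are hypothetical, and int elements are assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// The (L, E, G) tuple: how many elements are less than, equal to,
// and greater than the pivot.
struct PartitionCounts {
    std::size_t less;
    std::size_t equal;
    std::size_t greater;
};

// Plays the role of the ghost subprogram: counts elements against the pivot.
// By construction, less + equal + greater == v.size().
PartitionCounts count_against_pivot(const std::vector<int>& v, int pivot) {
    PartitionCounts c{0, 0, 0};
    for (int x : v) {
        if (x < pivot) ++c.less;
        else if (x == pivot) ++c.equal;
        else ++c.greater;
    }
    return c;
}

// Postcondition: `out` is L items below the pivot, then E equal items,
// then G items above it.
bool conforms_to_partition(const std::vector<int>& out, int pivot,
                           const PartitionCounts& c) {
    if (c.less + c.equal + c.greater != out.size()) return false;
    const auto less_end = out.begin() + static_cast<std::ptrdiff_t>(c.less);
    const auto equal_end = less_end + static_cast<std::ptrdiff_t>(c.equal);
    return std::all_of(out.begin(), less_end,
                       [pivot](int x) { return x < pivot; }) &&
           std::all_of(less_end, equal_end,
                       [pivot](int x) { return x == pivot; }) &&
           std::all_of(equal_end, out.end(),
                       [pivot](int x) { return x > pivot; });
}
```

Checking the counts of the input against the shape of the output mirrors the last proof step; it does not replace a proof, since it only tests the inputs it is run on.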

Find the first root and local maximum/minimum of a function

Problem
I want to find
The first root
The first local minimum/maximum
of a black-box function in a given range.
The function has the following properties:
It's continuous and differentiable.
It's a combination of constant and periodic functions. All periods are known.
(It's better if it can be done with weaker assumptions)
What is the fastest way to get the root and the extremum?
Do I need more assumptions or bounds of the function?
What I've tried
I know I can use a root-finding algorithm. What I don't know is how to find the first root efficiently.
It needs to be fast enough to run within a few milliseconds with a precision of 1.0 over a range of 1.0e+8, which is the problem.
Since the range could be quite large and the result should be precise enough, I can't brute-force it by checking all the possible subranges.
I considered the bisection method, but it's too slow for finding the first root if the function has only one big root in the range, as every subrange would have to be checked.
A solution in Java is preferable, but any similar language is fine.
Background
I want to calculate when an arbitrary celestial object reaches a certain height.
It's a configuration-defined virtual object, so I can't assume anything about the object.
It's not easy to get either an analytical solution or a simple approximation because various coordinates are involved.
I decided to find a numerical solution for this.
For a general black box function, this can't really be done. Any root finding algorithm on a black box function can't guarantee that it has found all the roots or any particular root, even if the function is continuous and differentiable.
The property of being periodic gives a bit more hope, but you can still have periodic functions with infinitely many roots in a bounded domain. Given that your function relates to celestial objects, this isn't likely to happen. Assuming your periodic functions are sinusoidal, I believe you can get away with checking subranges on the order of one-quarter of the shortest period (out of all the periodic components).
Maybe try Brent's Method on the shortest quarter period subranges?
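A minimal sketch of that scan, written in C++ rather than Java and using plain bisection in place of Brent's method to keep it short. The function name, the out-parameter interface and the step and tol parameters are all assumptions for illustration; step would be set to a quarter of the shortest period. It only detects roots where the function changes sign between two sample points.

```cpp
#include <algorithm>

// Walks [a, b] from left to right in steps of `step` and bisects the first
// subrange whose endpoints have opposite signs. On success stores the root
// in `root` and returns true; returns false if no sign change is seen.
template <typename F>
bool find_first_root(F f, double a, double b, double step, double tol,
                     double& root) {
    double left = a;
    double f_left = f(left);
    while (left < b) {
        if (f_left == 0.0) {  // a sample point is itself a root
            root = left;
            return true;
        }
        const double right = std::min(left + step, b);
        const double f_right = f(right);
        if ((f_left < 0.0) != (f_right < 0.0)) {
            // Sign change in [left, right]: narrow it down by bisection.
            double lo = left;
            double hi = right;
            double f_lo = f_left;
            while (hi - lo > tol) {
                const double mid = lo + (hi - lo) / 2.0;
                const double f_mid = f(mid);
                if ((f_lo < 0.0) != (f_mid < 0.0)) {
                    hi = mid;
                } else {
                    lo = mid;
                    f_lo = f_mid;
                }
            }
            root = lo + (hi - lo) / 2.0;
            return true;
        }
        left = right;
        f_left = f_right;
    }
    if (f_left == 0.0) {  // root exactly at the right end of the range
        root = left;
        return true;
    }
    return false;
}
```

The cost is one function evaluation per subrange until the first sign change, plus a logarithmic number for the bisection, so the running time is dominated by the range divided by the step.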
Another approach would be to apply your root-finding algorithm iteratively. If your range is (a, b), apply your algorithm to that range to find a root at, say, c < b. Then apply your algorithm to the range (a, c) to find a root in that range. Continue until no more roots are found. The last root you found is a good candidate for the first (smallest) root.
A black-box function over an arbitrary range? You cannot even be sure its domain is continuous over that range. What kind of solutions are you looking for: natural numbers, integers, real numbers, complex numbers? These questions all greatly affect the answer.
So the first thing is to determine what kind of number you accept as a result.
The second is to have some protection against limits of the function that would blow up your calculations as it goes to plus or minus infinity.
Since we are on the topic of limits: the function may approach zero and look like a solution without ever reaching 0. Whether that counts depends on your margin of error, i.e. how close to zero is good enough.
I think the SIMPLEST TO IMPLEMENT option for real-number solutions (which I assume you want) is to take an interval and apply this divide and conquer algorithm:
Take the lower bound, the upper bound and the middle value (or an approximate middle value for bounds with infinitely many decimals).
Evaluate the function at all 3, with some kind of protection against infinities.
Store all 3 values in an array together with their results (3 value-result pairs).
Keep the current best value (the one whose result is closest to a solution) in a separate variable (a pair of the value and its result).
STEP FORWARD: repeat the above for the range between the 1st and 2nd values and for the range between the 2nd and 3rd values.
This gives a new value-result pair that is closest to a solution.
Clear the old value-result pairs and replace them with the ones from this iteration, while keeping the overall best value-result pair.
Repeat the above until you reach the precision you want, and watch memory explode with each iteration: the number of stored values grows exponentially. This can be improved if you, say, take one interval and go as deep as you want, remember the best value-result pair, then discard everything else and move on to the next interval and dig deep there.

Bit Array with Find Max

So bit arrays and hash tables don't seem to inherently allow for a find-max type operation, but there are ways around it. I'm wondering if there's a way using the bit array alone without extra variables, pointers, or manipulating the start/end of the array, in some scenarios. For example...
I have integers {1,...,n} and a n-bit bit array. To keep a subset of the integers, I use the integer itself as the key in the bit array and set the bit to 1 if it is in the subset, or 0 if it is not.
For example, for integers {1,2,3,4} and subset {1,3}, the bit array would look like {1,0,1,0}.
It seems like there's no way to do this without somehow moving the bits around which leads me to believe the O(1) dream is dead and perhaps the bit array won't work. Is something like this possible in O(log n)?
Thanks
Finding the highest set bit on a bit array of length n is O(n). If you need better, then you'll need to choose another data structure, or keep a high-water mark along with your bitmap.
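A minimal sketch of the high-water-mark option, assuming keys in {1,...,n} and using 0 to mean "empty". The class name and interface are made up for illustration, and there is no bounds checking.

```cpp
#include <cstddef>
#include <vector>

// Bit array over the keys 1..n that also tracks the largest key present.
class BitSetWithMax {
public:
    explicit BitSetWithMax(std::size_t n) : bits_(n + 1, false), max_(0) {}

    void insert(std::size_t x) {
        bits_[x] = true;
        if (x > max_) max_ = x;  // O(1) update of the high-water mark
    }

    void erase(std::size_t x) {
        bits_[x] = false;
        // If the maximum was removed, scan down to the next set bit.
        // This is the O(n) worst case that the mark cannot avoid.
        while (max_ > 0 && !bits_[max_]) --max_;
    }

    bool contains(std::size_t x) const { return bits_[x]; }

    // Largest key present, or 0 if the set is empty.
    std::size_t max() const { return max_; }

private:
    std::vector<bool> bits_;
    std::size_t max_;
};
```

Find-max and insert are O(1) here; only erasing the current maximum falls back to a linear scan. If that is too slow, a different data structure is needed, as noted above.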

Comparing a c++ std::vector's elements with each other

I have a std::vector of double values. Now I need to know if two succeeding elements are within a certain distance in order to process them. I have sorted the vector previously with std::sort.
I have thought about a solution to determine these elements with std::find_if and a lambda expression (c++11), like this:
std::vector<std::vector<double>::iterator> foundElements;
std::vector<double>::iterator foundElement;
while (foundElement != gradients.end())
{
    foundElement = std::find_if(gradients.begin(), gradients.end(),
        [] (double grad)->bool{
            return ...
        });
    foundElements.push_back(foundElement);
}
But what should the predicate actually return? The plan is that I use the vector of iterators to later modify the vector.
Am I on the right track with this approach, or is it too complicated/impossible? What are other, probably more practical solutions?
EDIT: I think I will go after the std::adjacent_find function, as proposed by one answerer.
Read about std::adjacent_find.
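A short sketch of how std::adjacent_find with a binary predicate could collect every pair of neighbours that lie within a given distance. The function name and the max_distance parameter are illustrative, and the vector is assumed to be sorted already, as in the question.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Returns an iterator to the first element of every adjacent pair whose
// values differ by at most max_distance.
std::vector<std::vector<double>::iterator>
find_close_neighbours(std::vector<double>& values, double max_distance) {
    std::vector<std::vector<double>::iterator> found;
    auto close = [max_distance](double a, double b) {
        return std::fabs(b - a) <= max_distance;
    };
    auto it = std::adjacent_find(values.begin(), values.end(), close);
    while (it != values.end()) {
        found.push_back(it);
        // Resume one element later so overlapping pairs are reported too.
        it = std::adjacent_find(it + 1, values.end(), close);
    }
    return found;
}
```

The returned iterators stay valid only as long as the vector is not resized, which matters if the later modification inserts or erases elements.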
Can you enhance the grammar of your question?
"I have a std::vector with different double values. Now I need to know if two succeeding ones (I have sorted the vector previously with std::sort) are within a certain distance to process them)."
Do you mean that each element of the vector of doubles is a unique value? If so, can it reasonably be inferred that your goal is to find the distance between each pair of these elements?

pattern matching

Suppose I have a set of tuples like this (each tuple will have 1,2 or 3 items):
Master Set:
{(A) (A,C) (B,C,E)}
and suppose I have another set of tuples like this:
Real Set: {(BOB) (TOM) (ERIC,SALLY,CHARLIE) (TOM,SALLY) (DANNY) (DANNY,TOM) (SALLY) (SALLY,TOM,ERIC) (BOB,SALLY) }
What I want to do is to extract all subsets of Tuples from the Real Set where the tuple members can be substituted to become the same as the Master Set.
In the example above, two sets would be returned:
{(BOB) (BOB,SALLY) (ERIC,SALLY,CHARLIE)}
(let BOB=A,ERIC=B,SALLY=C,CHARLIE=E)
and
{(DANNY) (DANNY,TOM) (SALLY,TOM,ERIC)}
(let DANNY=A,SALLY=B,TOM=C,ERIC=E)
It's sort of pattern matching, sort of combinatorics, I guess. I really don't know how to classify this problem or what common plans of attack there are for it. What would the stackoverflow experts suggest?
Separate your tuples into sets by size. Within each set, create a data structure that allows you to efficiently query for tuples containing a given element. The first part of this structure is your tuples as an array (so that each tuple has a canonical index). The second part is a map from each element to the set of indices of the tuples containing it: Map String (Set Int). This is somewhat space intensive but hopefully not prohibitive.
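The index described above might look like this in C++, with std::map and std::set standing in for Map String (Set Int). The type aliases and the function name are assumptions for illustration; the tuples vector itself serves as the array that gives each tuple its index.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Tuple = std::vector<std::string>;
// Element -> indices (into the tuple array) of the tuples containing it.
using ElementIndex = std::map<std::string, std::set<std::size_t>>;

// Builds one ElementIndex per tuple size, keyed by that size.
std::map<std::size_t, ElementIndex>
build_index(const std::vector<Tuple>& tuples) {
    std::map<std::size_t, ElementIndex> index;
    for (std::size_t i = 0; i < tuples.size(); ++i) {
        for (const std::string& element : tuples[i]) {
            index[tuples[i].size()][element].insert(i);
        }
    }
    return index;
}
```

Looking up index[2]["SALLY"], for example, yields the indices of all 2-tuples that contain SALLY, which is the query needed when restricting candidate assignments.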
Then, you, essentially, brute force it. For all assignments to the first master set, restrict all assignments to other master sets. For all remaining assignments to the second, restrict all assignments to the third and beyond, etc. The algorithm is basically inductive.
I should add that I don't think the problem is NP-complete so much as just flat worst-case exponential. It's not a decision problem, but an enumeration problem. And it's fairly easy to imagine scenarios of inputs that blow up exponentially.
It will be difficult to do efficiently since your problem is probably NP-complete (it includes subgraph isomorphism as a special case). That assumes the patterns and database both vary in size, though. How much data are you searching? How complicated will your patterns be? I would recommend the brute force solution first, then test if that is too slow and you need something fancier.
