Generating a unique random number in constant time - collections

Is there any way to generate a unique random number in constant time? I'm currently using an ArrayList that contains all my possible numbers, and I'm calling Collections.shuffle on it. I then remove the 0th element from the ArrayList to get the unique random number, but I believe this cannot be constant time, for two reasons:
the remove method is O(n)
Collections.shuffle is O(n)
Suggestions? Thanks!

Collections.shuffle is indeed O(n), but it only needs to be called once. If you're drawing n random numbers in total, that one-time cost is spread across the remaining operations. (This is the idea behind amortized analysis, which seems unavoidable for your problem: either you want "amortized deterministic constant time", or "expected constant time", as one might get with a dictionary.)
The remove method is O(n) in general, but if you repeatedly remove the last element instead of the first, the array never has to shift its remaining elements down, so each removal is O(1). The O(n) cost only applies when you remove from the front or from an arbitrary position.
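A minimal sketch of this shuffle-once / pop-from-the-end approach (the class name and number range are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Shuffle once (O(n)), then draw unique numbers by removing from the
// END of the list, which is O(1) per draw for an ArrayList.
public class UniqueRandomDraw {
    public static void main(String[] args) {
        List<Integer> pool = new ArrayList<>();
        for (int i = 0; i < 10; i++) pool.add(i);

        Collections.shuffle(pool); // one-time O(n) cost, amortized over all draws

        while (!pool.isEmpty()) {
            // remove(size - 1) avoids shifting elements, unlike remove(0)
            int next = pool.remove(pool.size() - 1);
            System.out.println(next);
        }
    }
}
```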

Related

Why isn't the time complexity of a linked list (singly linked list) n-1?

My point is that when a pointer traverses a linked list up to position n-1, it gets the nth value easily, because the address of the nth node is stored at the (n-1)th node. Hence, shouldn't the time complexity be n-1 instead of n?
Any Big O notation describes an upper bound. This means that although n-1 would be a more precise count, it is still included in O(n).
Secondly, any constant term (the -1) is dropped when naming a Big O complexity class, because for any decently large n a constant term has no noticeable influence on the result.
Constant factors on n are removed for the same reason: O(3n+5) is the same complexity class as O(n).
These two conventions combined are why the complexity is written as O(n) and not O(n-1), even though the latter might technically be correct and more precise.
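To make the upper-bound point concrete, here is the formal definition applied to both examples (a standard textbook exercise, not part of the original answer):

```latex
% Definition of Big O:
f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 \in \mathbb{N} :\ f(n) \le c \cdot g(n) \ \text{for all}\ n \ge n_0
% n - 1 \in O(n):  choose c = 1, n_0 = 1, since n - 1 \le n for all n \ge 1.
% 3n + 5 \in O(n): choose c = 4, n_0 = 5, since 3n + 5 \le 4n for all n \ge 5.
```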

Determine if there exists a number in the array occurring k times

I want to create a divide-and-conquer algorithm (O(n log n) runtime) to determine whether there exists a number in an array that occurs k times. A constraint on this problem is that only an equality/inequality comparison method is defined on the objects of the array (i.e. you can't use <, >).
So I have tried a number of approaches, including splitting the array into k pieces of approximately equal size. The approach is similar to finding the majority item in an array; however, in the majority case, when you split the array you know that one half must contain a majority item if such an item exists. Any pointers or tips to put me in the right direction?
EDIT: To clear things up a little: I am wondering whether the approach of finding the majority item by splitting the array in half and recursing can be extended to other situations where k may be n/4 or n/5, etc.
Maybe I should have phrased the question using n/k instead.
This is impossible. As a simple example of why, consider an input array of length n with all elements distinct, and k=2. The only way to be sure no element appears twice is to compare every element against every other element, which requires Ω(n^2) comparisons. Until you have performed all possible comparisons, you cannot be sure that some pair you didn't compare isn't actually equal.
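For contrast, the brute force that the equality-only constraint forces is easy to write down; a sketch (class and method names are illustrative):

```java
// Equality-only check for an element occurring exactly k times.
// With no ordering (<, >) available, every pair may have to be compared,
// which is why a sub-quadratic comparison count is not achievable in general.
public class KOccurrences {
    public static <T> boolean existsKTimes(T[] a, int k) {
        for (int i = 0; i < a.length; i++) {
            int count = 0;
            for (int j = 0; j < a.length; j++) {
                if (a[i].equals(a[j])) count++; // equality is the only comparison allowed
            }
            if (count == k) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Integer[] a = {3, 1, 3, 2, 3};
        System.out.println(existsKTimes(a, 3)); // true: 3 occurs three times
    }
}
```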

Number of movements in a dynamic array

A dynamic array is an array that doubles its size when an element is added to an already-full array, copying the existing elements to a new location. It is clear that there will be ceil(log(n)) bulk copy operations.
In a textbook I have seen the number of movements M as being computed this way:
M = \sum_{i=1}^{\lceil \log n \rceil} i \cdot \frac{n}{2^i}, with the argument that "half the elements move once, a quarter of the elements twice"...
But I thought that for each bulk copy operation the number of copied/moved elements is actually n/2^i, since every bulk operation is triggered by reaching and exceeding the 2^i-th element, so that the number of movements is
M = \sum_{i=1}^{\lceil \log n \rceil} \frac{n}{2^i} (for n=8 it seems to be the correct formula).
Who is right, and what is wrong with the other argument?
Both versions are O(n), so there is no big difference.
The textbook version counts the initial write of each element as a move operation, but doesn't account for the very first element, which moves ceil(log(n)) times. Other than that the two are equivalent, i.e.
\left( \sum_{i=1}^{\lceil \log n \rceil} i \cdot \frac{n}{2^i} \right) - (n - 1) + \lceil \log n \rceil \;=\; \sum_{i=1}^{\lceil \log n \rceil} \frac{n}{2^i}
when n is a power of 2. Both are off by different amounts when n is not a power of 2.
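One way to check the two formulas against reality is to simulate the doubling and count moves directly; a hypothetical harness (note it counts only copies during growth, not the initial write of each element, which is the convention behind the question's formula):

```java
// Simulates appending n elements to a doubling array and counts how many
// times existing elements are copied during grow operations.
public class DoublingMoves {
    public static long countMoves(int n) {
        int capacity = 1;
        int size = 0;
        long moves = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                moves += size;  // every existing element is copied once
                capacity *= 2;
            }
            size++;             // the append itself is not counted as a move
        }
        return moves;
    }

    public static void main(String[] args) {
        // for n = 8 this prints 7, matching sum n/2^i = 4 + 2 + 1
        for (int n : new int[]{8, 16, 100}) {
            System.out.println("n=" + n + " moves=" + countMoves(n));
        }
    }
}
```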

What is the cost of deleting a value from a hashtable?

I have been asked about the cost of deleting a value from a hash table that uses linear probing for insertion.
What I could figure out from reading various material on the internet is that it has something to do with the load factor. I am not sure, but I read that there is a relation between the load factor and the number of probes required: number of probes = 1 / (1 - LF).
So I believe the cost has to be dependent on the probe sequence. But then another thought ruins everything.
What if the element was inserted in p probes and now I am trying to delete it, but before this I had already deleted a few elements with the same hash code that were inserted in fewer than p probes?
In this case I reach an empty slot in the hash table, but I cannot tell whether the element I am trying to delete has already been deleted or sits at some other location as a result of probing.
I also found that once I delete an element I must mark its slot with some special indicator to show that it is available, but this doesn't resolve my uncertainty about the element I am trying to delete.
Could anyone please suggest how to find the cost in such cases?
Will the approach vary if the probing is non-linear?
The standard approach is "look up the element, mark it as deleted". Marking obviously has O(1) cost, so the total operation cost is the same as a plain lookup: O(1) expected. It can be as high as O(n) in degenerate cases (e.g. all elements have the same hash), but O(1) expected is all we can say theoretically.
About the load factor: the higher the load factor (the ratio of occupied buckets to total buckets), the larger the expected constant factor (but this doesn't change the theoretical big-O cost). Note that in this case the load factor includes both the elements present in the table and the buckets that were previously marked as deleted.
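For reference, the classic approximations (due to Knuth) for the expected number of probes under linear probing at load factor α — standard results, not part of the original answer; the 1/(1-α) quoted in the question is the simpler uniform-hashing estimate for an unsuccessful search:

```latex
E[\text{probes, successful search}] \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right),
\qquad
E[\text{probes, unsuccessful search}] \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)
```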
Other kinds of probing (e.g. quadratic) don't change the theoretical cost, but may alter the expected constant factor or its variance. If you look at the "fallback" sequences, with linear probing the sequences of different buckets overlap: if the sequence for some bucket is long, it will also be long for adjacent buckets. E.g. if buckets 4 to 10 are occupied, the probe sequence for bucket #4 is 7 buckets long (4, 5, 6, ..., 10), for #5 it's 6 buckets, and so on. With quadratic probing this is not the case.
However, linear probing has the benefit of better memory-cache behavior, since you check memory cells close to each other. In practice, though, fallback sequences under quadratic probing are rarely long enough for this to matter.
Finally, in the linear probing case it is possible to work without deleted marks, but then the deletion procedure becomes considerably more complicated (still O(1) expected, though with a much higher constant factor). Whether that is worth it has to be decided by actual profiling; for example, it simplifies insertion somewhat and lookup a bit. For a C++ implementation it would have the downside that erase() would invalidate iterators.
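A minimal sketch of the look-up-then-mark (tombstone) scheme for linear probing, assuming a fixed-size table for brevity (the class name, TOMBSTONE sentinel, and capacity are illustrative):

```java
// Linear-probing table using tombstones: delete() is a lookup followed
// by an O(1) mark. Tombstones keep probe chains intact, so a later lookup
// does not stop early at a slot whose element was merely deleted.
public class LinearProbingSet {
    private static final Object TOMBSTONE = new Object();
    private final Object[] slots = new Object[16]; // illustrative fixed capacity

    private int indexOf(Object key) {
        int home = Math.floorMod(key.hashCode(), slots.length);
        for (int p = 0; p < slots.length; p++) {
            int j = (home + p) % slots.length;
            Object s = slots[j];
            if (s == null) return -1;                 // truly empty: key is absent
            if (s != TOMBSTONE && s.equals(key)) return j;
            // on TOMBSTONE: keep probing, the key may live further along the chain
        }
        return -1;
    }

    public boolean delete(Object key) {
        int j = indexOf(key);
        if (j < 0) return false;
        slots[j] = TOMBSTONE;                         // O(1) mark; slot reusable later
        return true;
    }

    public boolean insert(Object key) {
        int home = Math.floorMod(key.hashCode(), slots.length);
        int firstFree = -1;
        for (int p = 0; p < slots.length; p++) {
            int j = (home + p) % slots.length;
            if (slots[j] == null) {                   // key not present anywhere
                slots[firstFree >= 0 ? firstFree : j] = key;
                return true;
            }
            if (slots[j] == TOMBSTONE) {
                if (firstFree < 0) firstFree = j;     // remember, but keep probing
            } else if (slots[j].equals(key)) {
                return false;                         // already present
            }
        }
        if (firstFree >= 0) { slots[firstFree] = key; return true; }
        return false;                                 // table full
    }
}
```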

Run Time for Linear Probing on Hash table

What is the running time (big O) of insertion, deletion, and searching in a hash table with linear probing?
Thanks
The theoretical worst case is O(n), since if you happen to insert all the elements such that they collide consecutively, the last element inserted will have to be placed n steps away from its original hash position.
You could, however, calculate the average expected number of steps if you know the distribution of your input elements (possibly assuming random, but of course, assumption is the mother...), the distribution of your hash function (hopefully uniform, but that depends on the quality of your algorithm), the length of your hash table, and the number of elements inserted.
If your hash function is sufficiently uniform, you can calculate the probability of collisions using the birthday problem, and from those probabilities calculate expected probe lengths.
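A small illustration of the birthday-problem calculation mentioned above: the probability that m uniformly hashed keys cause no collision at all in a table of t buckets (the class name and the example numbers are illustrative):

```java
// Birthday-style estimate: P(no collision) when m uniformly distributed
// keys are hashed into t buckets, computed as the product (t-i)/t.
public class CollisionEstimate {
    static double probNoCollision(int m, int t) {
        double p = 1.0;
        for (int i = 0; i < m; i++) p *= (double) (t - i) / t;
        return p;
    }

    public static void main(String[] args) {
        // classic birthday numbers: 23 keys into 365 buckets
        System.out.printf("P(no collision) = %.4f%n", probNoCollision(23, 365));
        // ~0.4927, so P(at least one collision) is about 0.5073
    }
}
```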
