How to answer queries of type (l, r, k) that count the elements of an array occurring at least k times in the range l to r? - counting

How can I answer queries of type (l, r, k), each asking for the number of elements of an array that occur at least k times in the range l to r? How would Mo's algorithm be used here?

If you have offline queries, you can use Mo's algorithm.
The following link is quite helpful: Anudeep's blog - Mo's Algorithm
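As a rough illustration of how Mo's algorithm could apply here, the sketch below sorts the queries by block of l and then by r, and keeps two structures while the window moves: a per-value counter and an array at_least[c] holding how many distinct values currently occur at least c times. The function name, the 0-based inclusive (l, r) convention and the at_least bookkeeping are assumptions made for this example, not something taken from the linked blog.

```python
from math import isqrt

def count_at_least_k(arr, queries):
    """Answer (l, r, k) queries offline; l and r are 0-based and inclusive."""
    n = len(arr)
    block = max(1, isqrt(n))
    # Mo's ordering: sort by the block containing l, then by r.
    order = sorted(range(len(queries)),
                   key=lambda i: (queries[i][0] // block, queries[i][1]))
    cnt = {}                   # value -> occurrences inside the current window
    at_least = [0] * (n + 2)   # at_least[c] = distinct values occurring >= c times
    answers = [0] * len(queries)
    lo, hi = 0, -1             # current window is arr[lo..hi], initially empty

    def add(x):
        cnt[x] = cnt.get(x, 0) + 1
        at_least[cnt[x]] += 1   # x now reaches one more threshold

    def remove(x):
        at_least[cnt[x]] -= 1   # x no longer reaches its old count
        cnt[x] -= 1

    for qi in order:
        l, r, k = queries[qi]
        # Grow the window first, then shrink it, so it never becomes invalid.
        while hi < r:
            hi += 1
            add(arr[hi])
        while lo > l:
            lo -= 1
            add(arr[lo])
        while hi > r:
            remove(arr[hi])
            hi -= 1
        while lo < l:
            remove(arr[lo])
            lo += 1
        answers[qi] = at_least[max(k, 1)] if k <= n else 0
    return answers
```

For example, count_at_least_k([1, 2, 1, 3, 2, 1], [(0, 5, 2), (1, 3, 1)]) returns [2, 3]: over the whole array the values 1 and 2 occur at least twice, and positions 1..3 hold three distinct values. Each pointer move costs O(1), so the total is the usual O((n + q) * sqrt(n)) of Mo's algorithm.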


Simple function to generate random number sequence without knowing previous number but know current index (no variable assignment)?

Is there any (simple) random generation function that can work without variable assignment? Most functions I read look like this current = next(current). However currently I have a restriction (from SQLite) that I cannot use any variable at all.
Is there a way to generate a number sequence (for example, from 1 to max) with only n (current number index in the sequence) and seed?
Currently I am using this:
cast(((1103515245 * Seed * ROWID + 12345) % 2147483648) / 2147483648.0 * Max as int) + 1
with max being 47, ROWID being n. However for some seed, the repeat rate is too high (3 unique out of 47).
In my requirements, repetition is ok as long as it's not too much (<50%). Is there any better function that meets my need?
The question has sqlite tag but any language/pseudo-code is ok.
P.s: I have tried using Linear congruential generators with some a/c/m triplets and Seed * ROWID as Seed, but it does not work well, it's even worse.
EDIT: I currently use this one, but I do not know where it's from. The rate looks better than mine:
((((Seed * ROWID) % 79) * 53) % "Max") + 1
I am not sure if you still have the same problem but I might have a solution for you.
What you could do is use pseudo-random M-sequence generators based on shift registers. You just have to pick a primitive polynomial of high enough order, and you don't really need to store any variables.
For more info you can check the wiki page
What you would need to code is just the shift equation for the primitive polynomial; I checked it in an online editor and it should be very easy to do. I think the easiest way for you would be to work in binary and use PRBS sequences, choosing the sequence length according to how many elements you will have. For example, this is the implementation for a 15-bit register with period 2^15 - 1 = 32767 (PRBS15); I took the primitive polynomial from the wiki page (there you can find primitive polynomials all the way up to PRBS31, with period 2^31 - 1, about 2.1475e+09).
Basically what you need to do is:
SELECT (((ROWID << 1) | (((ROWID >> 14) & 1) <> ((ROWID >> 13) & 1))) & 0x7fff)
(SQLite has no XOR operator, so <> applied to the two single bits plays that role.)
The beauty of this approach is that if you take a PRBS sequence with a period longer than your largest ROWID, you get a unique random index for every row. Very simple. :)
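To make the uniqueness claim concrete, here is a small Python sketch of the same PRBS15 step (feedback from bits 14 and 13), applied to the index directly. The function name prbs15_step is made up for this example. Since the step can be inverted, every index from 1 to 32767 maps to a different nonzero 15-bit value.

```python
def prbs15_step(state):
    """One PRBS15 shift-register step on a 15-bit state (taps at bits 14 and 13)."""
    new_bit = ((state >> 14) ^ (state >> 13)) & 1   # XOR of the two tap bits
    return ((state << 1) | new_bit) & 0x7FFF        # shift left, keep 15 bits

# Apply the step to every row index in 1..32767 and check for collisions.
outputs = [prbs15_step(n) for n in range(1, 32768)]
all_unique = len(set(outputs)) == len(outputs)
```

Note that a single step is essentially a shift, so consecutive indices still give related values; applying the step several times, or mixing in the seed first, would be a further assumption beyond what the answer describes.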
If you need help with searching for primitive polynomials you can see my github repo which deals exactly with finding primitive polynomials and unique m-sequences. It is currently written in Matlab, but I plan to write it in python in next few days.
What about using a good hash function and mapping the result into the [1...max] range?
Something along these lines (in pseudocode); sha1 was added in SQLite 3.17.
sha1(ROWID) % Max + 1
Or use any external C code for hash (murmur, chacha, ...) as shown here
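Outside SQLite, the same idea can be sketched in Python with hashlib. How the seed and index are combined (the "seed:n" string) and the use of only the first 8 digest bytes are choices made for this example, not part of the answer above.

```python
import hashlib

def hashed_value(seed, n, max_value):
    """Map (seed, index n) to a number in [1, max_value] via SHA-1."""
    digest = hashlib.sha1(f"{seed}:{n}".encode("ascii")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce to range.
    return int.from_bytes(digest[:8], "big") % max_value + 1

# Hypothetical usage: 47 values for seed 12345, one per row index.
values = [hashed_value(12345, n, 47) for n in range(1, 48)]
```

Because each value depends only on the seed and n, no state has to be carried from one row to the next. Repeats are still possible, as with any hash reduced to a small range.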
A linear congruential generator with appropriately-chosen parameters (a, c, and modulus m) will be a full-period generator, such that it cycles pseudorandomly through every integer in its period before repeating. Although you may have tried this idea before, have you considered that m is equivalent to max in your case? For a list of parameter choices for such generators, see L'Ecuyer, P., "Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure", Mathematics of Computation 68(225), January 1999.
Note that there are some practical issues to implementing this in SQLite, especially if your SQLite version supports only 32-bit integers and 64-bit floating-point numbers (with 52 bits of precision). Namely, there may be a risk of:
overflow if an intermediate multiplication exceeds 32 bits for integers, and
precision loss if an intermediate multiplication results in a greater-than-52-bit number.
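One way to read the m = max suggestion without iterating the generator is to apply a single affine step (a * n + c) mod m to the index itself. The sketch below rests on that assumption. The way a and c are derived from the seed is invented for illustration, and the only property relied on is that the map is a bijection on [0, m) whenever gcd(a, m) = 1. Reducing the operands modulo m first keeps every product below m^2, which addresses the overflow concern above.

```python
from math import gcd

def affine_value(seed, n, m):
    """Map index n to a value in [1, m]; for a fixed seed this is a permutation."""
    a = seed % m
    # Move a forward until it is coprime to m (a = 1 always qualifies).
    while gcd(a, m) != 1:
        a = (a + 1) % m
    c = (seed // m) % m
    # Operands are already reduced mod m, so the product stays below m * m.
    return (a * (n % m) + c) % m + 1

# Hypothetical usage: the 47 outputs for one seed cover 1..47 exactly once.
values = sorted(affine_value(2024, n, 47) for n in range(47))
```

The result has no repeats at all, but it is a very regular permutation (constant stride a), so it only fits when uniqueness matters more than looking random.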
Also, consider why you are creating the random number sequence:
Is the sequence intended to be unpredictable? In that case, a linear congruential generator alone is not enough, and you should generate unique identifiers by other means, such as by combining unique numbers with cryptographically random numbers.
Will the numbers generated this way be exposed in any way to end users? If not, there is no need to obfuscate them by "shuffling" them.
Also, depending on the SQLite API you're using (for your programming language), there may be a way to write a custom function to convert the seed and ROWID to a random unique number. The details, however, depend heavily on the specific SQLite API. Another answer shows an example for Perl.

Finding all permutations for a given number of football games in ocaml

I have to write the function series : int -> int -> result list list, where the first int is the number of games and the second int is the number of points to earn.
I already thought about a brute-force solution that creates all permutations and filters the list, but I think this would be a very dirty solution in OCaml with many lines of code, and I can't find another way to solve this problem.
The following types are given
type result = Win (* 3 points *)
| Draw (* 1 point *)
| Loss (* 0 points *)
So if I call
series 3 4
the solution should be:
[[Win ;Draw ;Loss]; [Win ;Loss ;Draw]; [Draw ;Win ;Loss];
[Draw ;Loss ;Win]; [Loss ;Win ;Draw]; [Loss ;Draw ;Win]]
Maybe someone can give me a hint or a code example how to start.
Consider calls of the form series n (n / 2), and consider cases where all the games were Draw or Loss. Under these restrictions the number of answers is proportional to 2^n/sqrt(n). (Guys online get this from Stirling's approximation.)
This doesn't include any series where anybody wins a game. So the actual result lists will be longer than this in general.
I conclude that the number of possible answers is gigantic, and hence that your actual cases are going to be small.
If your actual cases are small, there might be no problem with using a brute-force approach.
Contrary to your claim, brute-force code is usually quite short and easy to understand.
You can easily write a function to list all possible sequences of length n taken from Win, Draw, Loss. You can then filter them for the correct sum. Asymptotically this is probably only a little worse than the fastest algorithm, due to the near-exponential behavior described above.
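The brute-force idea is short enough to sketch. The version below is in Python rather than OCaml, purely as an illustration of the enumerate-then-filter structure; the names POINTS and series_brute are assumptions for this example.

```python
from itertools import product

# Points per result, mirroring the comments in the OCaml type definition.
POINTS = {"Win": 3, "Draw": 1, "Loss": 0}

def series_brute(games, points):
    """All result sequences of the given length whose points sum to `points`."""
    return [list(seq)
            for seq in product(POINTS, repeat=games)      # all 3^games sequences
            if sum(POINTS[r] for r in seq) == points]     # keep the matching ones
```

Calling series_brute(3, 4) yields the six orderings of one Win, one Draw and one Loss. The same shape carries over to OCaml as a function that builds all lists of length n followed by List.filter.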
A simple recursive solution would go along this way:
if there are 0 games to play and 0 points to earn, then there is exactly one (empty) solution
if there are 0 games to play and 1 or more points to earn, there is no solution.
otherwise, p points must be earned in g games: any solution for p points in g-1 games can be extended to a solution by adding a Loss in front of it. If p>=1, you can similarly add a Draw to any solution for p-1 points in g-1 games, and if p>=3, you can add a Win to any solution for p-3 points in g-1 games.
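The three cases above can be sketched as follows. This is a Python illustration with results written as strings; an OCaml version would have the same shape, using pattern matching on the arguments and List.map to prepend each result.

```python
def series(games, points):
    """All sequences of `games` results that earn exactly `points` points."""
    if games == 0:
        # One empty solution if nothing is left to earn, otherwise none.
        return [[]] if points == 0 else []
    # A Loss costs no points, so it can always be prepended.
    solutions = [["Loss"] + rest for rest in series(games - 1, points)]
    if points >= 1:
        solutions += [["Draw"] + rest for rest in series(games - 1, points - 1)]
    if points >= 3:
        solutions += [["Win"] + rest for rest in series(games - 1, points - 3)]
    return solutions
```

Here series(3, 4) produces the same six sequences as in the question, though in a different order (those starting with Loss come first).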

How to choose the lengths of my sub sequences for a shell sort?

Let's assume we have a sequence a_i of length n and we want to sort it using shell sort. To do so, we would choose subsequences out of the a_i's of length k_i.
I'm now wondering how to choose those k_i's. You usually see that if n=16 we would choose k_1=8, k_2=4, k_3=2, k_4=1. So we would compare the numbers pairwise for each k_i and at the end use insertionSort to finish our sorting.
The idea of first sorting sub sequences of length k_i is to "pre-sort" the sequence for the insertionSort. Right?
Now, depending on how we choose our k_i, we get a better performance. Is there a rule I can use here to choose the k_i's?
Could I also choose e.g. n=15, k_1=5, k_2=3, k_3=2?
If we have n=10 and k_1=5, would we now go with {k_2=2, k_3=1} or {k_2=3, k_3=2, k_4=1} or {k_2=3, k_3=1}?
The fascinating thing about shellsort is that for a sequence of n (unique) entries a unique set of gaps will be required to sort it efficiently, essentially f(n) => {gap/gaps}
For example, to most efficiently - on average - sort a sequence containing
2-5 entries - use insertion sort
6 entries - use shellsort with gaps {4,1}
7 or 8 entries - use a {5,1} shellsort
9 entries - use a {6,1} shellsort
10 entries - use a {9,6,1} shellsort
11 entries - use a {10,6,1} shellsort
12 entries - use a {5,1} shellsort
As you can see, 6-9 require two gaps, 10 and 11 three, and 12 two. This is typical of shellsort's gaps: from one n to the next (i.e. n+1) you can be fairly sure that the number and makeup of gaps will differ.
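To experiment with gap sets like the ones listed, a shellsort that takes the gaps as a parameter is convenient. The following is a minimal Python sketch; the function name and signature are assumptions for this example. The gaps must be given in decreasing order and end with 1 so that the last pass is a plain insertion sort.

```python
def shellsort(items, gaps):
    """Return a sorted copy of items using the given decreasing gap sequence."""
    a = list(items)
    for gap in gaps:
        # Gapped insertion sort: each a[i] is inserted among a[i-gap], a[i-2*gap], ...
        for i in range(gap, len(a)):
            current = a[i]
            j = i
            while j >= gap and a[j - gap] > current:
                a[j] = a[j - gap]
                j -= gap
            a[j] = current
    return a

# Hypothetical usage with the gap set listed above for 10 entries.
result = shellsort([7, 3, 9, 0, 4, 8, 1, 6, 2, 5], [9, 6, 1])
```

Any gap sequence ending in 1 sorts correctly; the choice of the earlier gaps only changes how much work is left for the final pass.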
A nasty side-effect of shellsort is that when using a set of random combinations of n entries (to save processing/evaluation time) to test gaps you may end up with either the best gaps for n entries or the best gaps for your set of combinations - most likely the latter.
I speculate that it is probably possible to create algorithms where you can plug in an arbitrary n and get the best gap sequence computed for you. Many high-profile computer scientists have explored the relationship between n and gaps without a lot to show for it. In the end they produce gaps (more or less by trial and error) that they claim perform better than those of others who have explored shellsort.
Concerning your foreword given n=16 ...
a {8,4,2,1} shellsort may or may not be an efficient way to sort 16 entries.
or should it be three gaps and, if so, what might they be?
or even two?
Then, to (try to) answer your questions ...
Q1: a rule can probably be formulated
Q2: you could ... but you should test it (for a given n there are n! possible sequences to test)
Q3: you can compare it with the correct answer (above). Or you can test it against all 10! possible sequences when n=10 (comes out to 3628800 of them) - doable
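The exhaustive test mentioned in Q2 and Q3 could look like the sketch below, which counts key comparisons for a given gap sequence and averages them over all n! permutations. Using comparisons as the cost measure, and the helper names, are assumptions for this example; counting moves or wall-clock time might rank gap sets differently.

```python
from itertools import permutations

def count_comparisons(items, gaps):
    """Shellsort a copy of items and return the number of key comparisons made."""
    a = list(items)
    comparisons = 0
    for gap in gaps:
        for i in range(gap, len(a)):
            current = a[i]
            j = i
            while j >= gap:
                comparisons += 1
                if a[j - gap] <= current:
                    break                  # insertion point found
                a[j] = a[j - gap]
                j -= gap
            a[j] = current
    return comparisons

def average_comparisons(n, gaps):
    """Average comparison count over all n! orderings of n distinct keys."""
    total = 0
    count = 0
    for perm in permutations(range(n)):
        total += count_comparisons(perm, gaps)
        count += 1
    return total / count
```

Comparing average_comparisons(10, [5, 3, 1]) against average_comparisons(10, [5, 2, 1]) and similar calls is one way to settle Q3 by measurement. For n = 10 each call walks through 3628800 permutations, so it takes a while in pure Python but remains doable.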

Determine if there exists a number in the array occurring k times

I want to create a divide and conquer algorithm (O(n lg n) runtime) to determine if there exists a number in an array that occurs k times. A constraint on this problem is that only an equality/inequality comparison method is defined on the objects of the array (i.e. < and > can't be used).
So I have tried a number of approaches, including splitting the array into k pieces of (approximately) equal size. The approach is similar to finding the majority item in an array; however, in the majority case, when you split the array you know that one half must have a majority item if such an item exists. Can anyone provide pointers or tips to put me in the right direction?
EDIT: To clear up a little, I am wondering whether the problem of finding the majority item by splitting the array in half and using a recursive solution can be extended to other situations where k may be n/4 or n/5 etc.
Maybe I should have phrased the question using n/k instead.
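For reference, the majority-item recursion mentioned in the question can be sketched as follows, using only == on the elements. The function name and the convention of returning a one-element list (or an empty list when there is no majority) are choices made for this example.

```python
def majority(items):
    """Return [x] if x occurs more than len(items)/2 times, else []. Uses only ==."""
    def count(x, lo, hi):
        return sum(1 for i in range(lo, hi) if items[i] == x)

    def solve(lo, hi):
        if hi - lo == 0:
            return []
        if hi - lo == 1:
            return [items[lo]]
        mid = (lo + hi) // 2
        # A majority of the whole range must be a majority of at least one half,
        # so only the two half-candidates need to be counted over the full range.
        for candidate in solve(lo, mid) + solve(mid, hi):
            if 2 * count(candidate, lo, hi) > hi - lo:
                return [candidate]
        return []

    return solve(0, len(items))
```

Each level of the recursion does O(n) equality tests, giving O(n lg n) overall. The step that makes this work is the guarantee about the halves, and that guarantee is specific to thresholds above one half.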
This is impossible. As a simple example of why, consider a length-n array with all elements distinct and k=2. The only way to be sure no element appears twice is to compare every element against every other element, which takes n(n-1)/2 comparisons, i.e. Θ(n^2) time. Until you have performed all possible comparisons, you cannot be sure that some pair you didn't compare isn't actually equal.
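With equality as the only operation, the straightforward fallback is the quadratic check below. It is a sketch that reads "occurs k times" as "occurs at least k times"; that reading and the function name are assumptions.

```python
def occurs_k_times(items, k):
    """True if some element occurs at least k times; uses only == comparisons."""
    n = len(items)
    for i in range(n):
        # Count items[i] together with its later duplicates. For the first
        # occurrence of a value this is its total number of occurrences.
        count = 1
        for j in range(i + 1, n):
            if items[i] == items[j]:
                count += 1
        if count >= k:
            return True
    return False
```

On an all-distinct input with k = 2 this performs every one of the n(n-1)/2 comparisons before answering False, matching the argument above.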

HashTable problems Complexity implementation

I coded a Java implementation of a hash table, and I want to test its complexity. The hash table is structured as an array of doubly linked lists (also implemented by me). The dimension of the array is m. I implemented a division hashing function, a multiplication one and a universal one. For now I'm testing the first hashing function.
I've developed a testing suite made this way:
U (maximum value for a key) = 10000;
m (number of positions in the hash table) = 709;
n (number of elements to be inserted) = variable.
So I ran multiple insert tests, inserting arrays of gradually increasing size n. I measured the execution time with System.nanoTime().
The resulting graph is the following:
Assuming that insert is O(1), n inserts are O(n). So should this graph be O(n)?
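The setup described above can be pictured with the following Python sketch of a chained table using the division method h(k) = k mod m, plus a timing helper. It is a stand-in, not the Java code from the question: Python lists replace the doubly linked lists, and the duplicate-key check inside insert is an assumption about how unique keys are enforced.

```python
import time

class ChainedHashTable:
    """Hash table with separate chaining and division hashing (key mod m)."""

    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]
        self.size = 0

    def _hash(self, key):
        return key % self.m

    def insert(self, key, value):
        bucket = self.buckets[self._hash(key)]
        # Scanning the chain for an existing key costs O(chain length).
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1

    def get(self, key):
        for k, v in self.buckets[self._hash(key)]:
            if k == key:
                return v
        return None

def time_inserts(keys, m):
    """Insert all keys into a fresh table and return the elapsed nanoseconds."""
    table = ChainedHashTable(m)
    start = time.perf_counter_ns()
    for key in keys:
        table.insert(key, key)
    return time.perf_counter_ns() - start
```

With m fixed, the average chain length is n/m. If insert scans the chain as in this sketch, a single insert costs O(1 + n/m) rather than O(1), and n inserts cost O(n + n^2/m), which is worth keeping in mind when reading the timing graphs.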
If I change my values like this:
U = 1000000
m = 1009
n = variable (I inserted one array at a time, with sizes growing in steps of 25000 elements, from 25000 up to 800000 elements).
The graph I got looks a little strange:
The unique keys of the elements to be inserted are chosen pseudo-randomly from the universe of keys U.
But across different executions, even if I store the same keys in a file and reuse them, the shape of the graph always changes, with some peaks.
I hope you can help me. If someone needs the code, please comment and I will be pleased to show it.
