How to find probe sequence of keys? - hashtable

How would I solve this question? I'm sort of confused about how to start.
The keys 34, 25, 79, 56, 6 are to be inserted into a hash table of length 11, where collisions will be resolved
by open addressing. The hash function is
h(k,i) = (k mod 11 + i(1 + k mod 10)) mod 11
a. Calculate the probe sequence of each of the above keys.

The final slots will be: 1, 3, 2, 8, 6 (one slot per key, in the order given).
To find that, you first insert the keys into the table using the hash function. Every time there is a collision (i.e. the slot you try is already occupied), you increment i (i starts at 0) and probe again.
For example, the first key, 34, is placed by h(34,0) = (34 mod 11 + 0(1 + 34 mod 10)) mod 11, which equals 1. Continue doing this for all the keys.
Hash Table:
0:
1: 34
2: 79
3: 25
4:
5:
6: 6
7:
8: 56
9:
10:
So for the probe sequence, you simply record, in the order of the keys, which slot each one ends up in in the hash table. Let me know if this helps or if I need to make any changes.
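If it helps, here is a small Python sketch of the whole insertion process (function and variable names are mine):

```python
def insert_keys(keys, m=11):
    # Open addressing with h(k, i) = (k mod m + i*(1 + k mod 10)) mod m;
    # i is incremented on every collision. Returns the table and each
    # key's final slot.
    table = [None] * m
    slots = {}
    for k in keys:
        i = 0
        while True:
            slot = (k % m + i * (1 + k % 10)) % m
            if table[slot] is None:
                table[slot] = k
                slots[k] = slot
                break
            i += 1
    return table, slots

table, slots = insert_keys([34, 25, 79, 56, 6])
print(slots)  # {34: 1, 25: 3, 79: 2, 56: 8, 6: 6}
```

Only 56 ever collides (slot 1 is taken by 34), so it probes again with i = 1 and lands in slot 8.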

Counting Unique Paths in an Array with Special Rules

Description of the Problem:
I have an array of two-digit numbers going from 00 to 99. I must choose a number at random anywhere in the array; let's call this result r. I may now take up to n steps within the array to "travel" to another number in the array, according to the following rules:
Adding or subtracting 1 from r takes one (1) step; I cannot add 1 if there is a 9 in the ones place (ex: 09, 19, 29, ...) and I cannot subtract 1 if there is a 0 in the ones place (ex: 00, 10, 20, ...)
Adding or subtracting 10 from r takes one (1) step; I cannot bring the result to lower than 00 or higher than 99.
By taking two (2) steps, I can swap the digits in the ones and tens place (ex: 13 -> 31, 72 -> 27); however, I can't perform the swap if the digits are the same (ex: can't swap 00, 11, 22, ...)
For a given number x (00 <= x <= 99) I want to count the set of unique values of r from which I can travel to x, given that I can take between 0 and n steps. I call this count how "accessible" x is. I'd like to express this as a formula, A(x, n), rather than just brute-forcing the results for each combination of x and n.
What I Have Tried:
A(x, 0) is easy enough to calculate: A(x, 0) = 1 for all values in the array, because the only way to reach x from r is for r = x; I take zero (0) steps to reach it.
A(x, 1) is trickier, but still simple: you just take into account the new paths available if I spend my one step on either Rule #1 or Rule #2, and add them to A(x, 0). A(x, 2) is where I have to start including Rule #3, but it also introduces the problem of backtracking. For instance, if I want to reach x and x = r, and I have two (2) steps available, I could perform the following operation: Step 1, r -> r' = r+1 (Rule #1); Step 2, r' -> r'' = r'-1 (Rule #1); r'' = x AND r'' = r. This does not add to my count of unique values from which I can travel to x.
Where I am Stuck:
I cannot figure out how to count the number of backtracking paths in order to remove them from the otherwise simple calculations of A(x, n), so my values of accessibility are coming out too high.
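Not a closed-form formula, but a brute-force check can at least validate candidate formulas. Here is a sketch that computes A(x, n) directly (names are mine). Since every move is reversible at the same cost, r reaches x in at most n steps exactly when x reaches r; and because the swap costs 2 while the other moves cost 1, I use Dijkstra rather than plain BFS:

```python
import heapq

def neighbors(r):
    # Cost-1 moves (Rules 1 and 2): +/-1 without leaving the tens row, +/-10 within 00..99
    out = []
    if r % 10 != 9: out.append(r + 1)
    if r % 10 != 0: out.append(r - 1)
    if r + 10 <= 99: out.append(r + 10)
    if r - 10 >= 0: out.append(r - 10)
    return out

def A(x, n):
    # Dijkstra from x; count values within distance n.
    INF = float("inf")
    dist = {x: 0}
    pq = [(0, x)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, INF):
            continue
        moves = [(v, 1) for v in neighbors(u)]
        tens, ones = divmod(u, 10)
        if tens != ones:                      # Rule 3: digit swap, 2 steps
            moves.append((ones * 10 + tens, 2))
        for v, w in moves:
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return sum(1 for d in dist.values() if d <= n)

print(A(0, 0), A(0, 1), A(0, 2))  # 1 3 6
```

Backtracking never inflates the count here, because each r is counted once, by its shortest distance to x.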

Optimizing (minimizing) the number of lines in file: an optimization problem in line with permutations and agenda scheduling

I have a calendar, typically a csv file containing a number of lines. Each line corresponds to an individual and is a sequence of consecutive values '0' and '1' where '0' refers to an empty time slot and '1' to an occupied slot. There cannot be two separated sequences in a line (e.g. two sequences of '1' separated by a '0' such as '1,1,1,0,1,1,1,1').
The problem is to minimize the number of lines by combining the individuals and resolving the collisions between time slots. Note the time slots cannot overlap. For example, for 4 individuals, we have the following sequences:
id1:1,1,1,0,0,0,0,0,0,0
id2:0,0,0,0,0,0,1,1,1,1
id3:0,0,0,0,1,0,0,0,0,0
id4:1,1,1,1,0,0,0,0,0,0
One can arrange them to end up with two lines, while keeping track of permuted individuals (for the record). In our example it yields:
1,1,1,0,1,0,1,1,1,1 (id1 + id2 + id3)
1,1,1,1,0,0,0,0,0,0 (id4)
The constraints are the following:
The number of individuals ranges from 500 to 1000,
The length of a sequence will never exceed 30,
Each sequence in the file has the exact same length,
The algorithm needs to be reasonable in execution time because this task may be repeated up to 200 times,
We don't necessarily search for the optimal solution; a near-optimal solution would suffice,
We need to keep track of the combined individuals (as in the example above).
Genetic algorithms seem a good option, but how do they scale (in terms of execution time) with the size of this problem?
A suggestion in Matlab or R would be (greatly) appreciated.
Here is a sample test:
id1:1,1,1,0,0,0,0,0,0,0
id2:0,0,0,0,0,0,1,1,1,1
id3:0,0,0,0,1,0,0,0,0,0
id4:1,1,1,1,1,0,0,0,0,0
id5:0,1,1,1,0,0,0,0,0,0
id6:0,0,0,0,0,0,0,1,1,1
id7:0,0,0,0,1,1,1,0,0,0
id8:1,1,1,1,0,0,0,0,0,0
id9:1,1,0,0,0,0,0,0,0,0
id10:0,0,0,0,0,0,1,1,0,0
id11:0,0,0,0,1,0,0,0,0,0
id12:0,1,1,1,0,0,0,0,0,0
id13:0,0,0,1,1,1,0,0,0,0
id14:0,0,0,0,0,0,0,0,0,1
id15:0,0,0,0,1,1,1,1,1,1
id16:1,1,1,1,1,1,1,1,0,0
Solution(s)
#Nuclearman provided a working solution in O(NT) (where N is the number of individuals (ids) and T is the number of time slots (columns)) based on a greedy algorithm.
I study algorithms as a hobby and I agree with Caduchon on this one, that greedy is the way to go. Though I believe this is actually the clique cover problem, to be more accurate: https://en.wikipedia.org/wiki/Clique_cover
Some ideas on how to approach building cliques can be found here: https://en.wikipedia.org/wiki/Clique_problem
Clique problems are related to independence set problems.
Considering the constraints, and that I'm not familiar with matlab or R, I'd suggest this:
Step 1. Build the independence set time slot data. For each time slot (column) that contains a 1, create a mapping (for fast lookup) of all records that also have a 1 in that slot. None of these can be merged into the same row (they all need to end up in different rows). E.g., for the first column (slot), the subset of the data looks like this:
id1 :1,1,1,0,0,0,0,0,0,0
id4 :1,1,1,1,1,0,0,0,0,0
id8 :1,1,1,1,0,0,0,0,0,0
id9 :1,1,0,0,0,0,0,0,0,0
id16:1,1,1,1,1,1,1,1,0,0
The data would be stored as something like 0: Set(id1, id4, id8, id9, id16) (zero-indexed; we start at row 0 instead of row 1, though it probably doesn't matter here). The idea is to have O(1) lookup - you may need to quickly tell that, say, id2 is not in that set. You could also use nested hash tables for that, e.g. 0: { id1: true, id4: true, ... }. Sets also allow for set operations, which may help quite a bit when determining unassigned columns/ids.
In any case, none of these 5 can be grouped together. That means, at best (given that column), you must have at least 5 rows (if all the other ids can be merged into those 5 without conflict).
Performance of this step is O(NT), where N is the number of individuals and T is the number of time slots.
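As a quick illustration of that lower bound (function name and list-of-lists input are my assumptions), the busiest column can be computed directly:

```python
def min_rows_lower_bound(rows):
    # rows: list of 0/1 lists. The ids with a 1 in the same time slot all
    # conflict pairwise, so the busiest column's count is a lower bound
    # on the number of merged rows.
    return max(sum(col) for col in zip(*rows))

rows = [
    [1,1,1,0,0,0,0,0,0,0],  # id1
    [0,0,0,0,0,0,1,1,1,1],  # id2
    [0,0,0,0,1,0,0,0,0,0],  # id3
    [1,1,1,1,0,0,0,0,0,0],  # id4
]
print(min_rows_lower_bound(rows))  # 2
```

For the 16-id sample, the 2nd and 5th columns both sum to 7, so at least 7 rows are needed.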
Step 2. Now we have options for how to attack things. To start, we pick the time slot with the most individuals and use that as our starting point; that gives us the minimum possible number of rows. In this case we actually have a tie: the 2nd and 5th columns both have 7. I'm going with the 2nd, which looks like:
id1 :1,1,1,0,0,0,0,0,0,0
id4 :1,1,1,1,1,0,0,0,0,0
id5 :0,1,1,1,0,0,0,0,0,0
id8 :1,1,1,1,0,0,0,0,0,0
id9 :1,1,0,0,0,0,0,0,0,0
id12:0,1,1,1,0,0,0,0,0,0
id16:1,1,1,1,1,1,1,1,0,0
Step 3. Now that we have our starting groups, we need to add to them while trying to avoid conflicts between new members and old group members (which would require an additional row). This is where we get into NP-complete territory, as there are a ton of ways (roughly 2^N, to be more accurate) to assign things.
I think the best approach might be a randomized one, as you can theoretically run it as many times as you have time for to get better results. So here is the randomized algorithm:
1. Given the starting column and ids (1,4,5,8,9,12,16 above), mark this column and those ids as assigned.
2. Randomly pick an unassigned column (time slot). For a perhaps "better" result, pick the one with the least (or most) unassigned ids; for a faster implementation, just loop over the columns.
3. Randomly pick an unassigned id. For a better result, perhaps pick the one with the most/least groups it could be assigned to; for a faster implementation, just pick the first unassigned id.
4. Find all groups that the unassigned id could be assigned to without creating a conflict.
5. Randomly assign it to one of them. For a faster implementation, just pick the first one that doesn't cause a conflict. If there are no conflict-free groups, create a new group and assign the id to it as the first id. (The optimal result is that no new groups have to be created.)
6. Update the data for that row (turn 0s into 1s as needed).
7. Repeat steps 3-6 until no unassigned ids for that column remain.
8. Repeat steps 2-7 until no unassigned columns remain.
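For what it's worth, a much-simplified, non-randomized first-fit sketch of this merging idea in Python (the names and the dict input format are my assumptions; it ignores the column-ordering heuristic and randomization described above):

```python
def greedy_merge(rows):
    # rows: dict id -> list of 0/1. First-fit: put each id into the first
    # existing group whose merged row has no 1 in any slot this id occupies;
    # otherwise open a new group. Returns (merged row, member ids) pairs.
    groups = []
    for rid, row in rows.items():
        for merged, members in groups:
            if all(not (a and b) for a, b in zip(merged, row)):
                for i, bit in enumerate(row):
                    if bit:
                        merged[i] = 1
                members.append(rid)
                break
        else:
            groups.append((list(row), [rid]))
    return groups

rows = {
    "id1": [1,1,1,0,0,0,0,0,0,0],
    "id2": [0,0,0,0,0,0,1,1,1,1],
    "id3": [0,0,0,0,1,0,0,0,0,0],
    "id4": [1,1,1,1,0,0,0,0,0,0],
}
for merged, members in greedy_merge(rows):
    print(merged, members)  # two groups: id1+id2+id3, and id4 alone
```

On the 4-id example from the question this reproduces the two-line result, and it keeps the member lists for the record.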
Example with the faster implementation approach, which happens to give an optimal result (there cannot be fewer than 7 rows, and there are only 7 rows in the result).
First 3 columns: No unassigned ids (all have 0). Skipped.
4th Column: Assigned id13 to the id9 group (13=>9). The id9 group now looks like this, showing that the group that started with id9 also includes id13:
id9 :1,1,0,1,1,1,0,0,0,0 (+id13)
5th Column: 3=>1, 7=>5, 11=>8, 15=>12
Now it looks like:
id1 :1,1,1,0,1,0,0,0,0,0 (+id3)
id4 :1,1,1,1,1,0,0,0,0,0
id5 :0,1,1,1,1,1,1,0,0,0 (+id7)
id8 :1,1,1,1,1,0,0,0,0,0 (+id11)
id9 :1,1,0,1,1,1,0,0,0,0 (+id13)
id12:0,1,1,1,1,1,1,1,1,1 (+id15)
id16:1,1,1,1,1,1,1,1,0,0
We'll just quickly look at the next columns and see the final result:
7th Column: 2=>1, 10=>4
8th column: 6=>5
Last column: 14=>4
So the final result is:
id1 :1,1,1,0,1,0,1,1,1,1 (+id3,id2)
id4 :1,1,1,1,1,0,1,1,0,1 (+id10,id14)
id5 :0,1,1,1,1,1,1,1,1,1 (+id7,id6)
id8 :1,1,1,1,1,0,0,0,0,0 (+id11)
id9 :1,1,0,1,1,1,0,0,0,0 (+id13)
id12:0,1,1,1,1,1,1,1,1,1 (+id15)
id16:1,1,1,1,1,1,1,1,0,0
Conveniently, even this "simple" approach allowed us to assign the remaining ids to the original 7 groups without conflict. This is unlikely to happen in practice with, as you say, 500-1000 ids and up to 30 columns, but it is far from impossible. Roughly speaking, 500/30 is roughly 17 and 1000/30 is roughly 34, so I would expect you to be able to get down to roughly 10-60 rows, with about 15-45 being likely, but it's highly dependent on the data and a bit of luck.
In theory, the performance of this method is O(NT), where N is the number of individuals (ids) and T is the number of time slots (columns). It takes O(NT) to build the data structure (basically converting the table into a graph). After that, each column requires checking and assigning at most O(N) individual ids, and they might be checked multiple times. In practice, since O(T) is roughly O(sqrt(N)) and performance improves as the algorithm progresses (similar to quicksort), it is likely O(N log N) or O(N sqrt(N)) on average. Really, though, it's probably more accurate to use O(E), where E is the number of 1s (edges) in the table: each edge likely gets checked and iterated over a fixed number of times, so that is probably a better indicator.
The NP-hard part comes into play in working out which ids to assign to which groups such that no new groups (rows) are created, or the lowest possible number of new groups is created. I would run the "fast implementation" and the "random" approaches a few times and see how many extra rows (beyond the known minimum) you end up with; if it's a small amount, you're close enough to optimal.
This problem, contrary to some comments, is not NP-complete due to the restriction that "There cannot be two separated sequences in a line". This restriction implies that each line can be considered to be representing a single interval. In this case, the problem reduces to a minimum coloring of an interval graph, which is known to be optimally solved via a greedy approach. Namely, sort the intervals in descending order according to their ending times, then process the intervals one at a time in that order always assigning each interval to the first color (i.e.: consolidated line) that it doesn't conflict with or assigning it to a new color if it conflicts with all previously assigned colors.
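A sketch of that greedy interval coloring in Python, assuming each line has already been reduced to the inclusive (start, end) slot indices of its single run of 1s (the representation and names are mine):

```python
def color_intervals(intervals):
    # intervals: dict id -> (start, end), inclusive slot indices of the 1-run.
    # Greedy minimum coloring of an interval graph: process intervals in
    # descending order of end time, assigning each to the first color
    # (consolidated line) it doesn't overlap; open a new color only if forced.
    colors = []  # each color: list of (id, start, end)
    for iid, (s, e) in sorted(intervals.items(), key=lambda kv: -kv[1][1]):
        for group in colors:
            if all(e < gs or ge < s for _, gs, ge in group):  # no overlap
                group.append((iid, s, e))
                break
        else:
            colors.append([(iid, s, e)])
    return colors

# The 4-id example from the question: id1 occupies slots 0-2, etc.
intervals = {"id1": (0, 2), "id2": (6, 9), "id3": (4, 4), "id4": (0, 3)}
print(len(color_intervals(intervals)))  # 2
```

Because interval graphs are perfect, this greedy uses exactly as many lines as the busiest time slot, so it is optimal here.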
Consider a constraint programming approach. Here is a question very similar to yours: Constraint Programming: Scheduling with multiple workers.
A very simple MiniZinc-model could also look like (sorry no Matlab or R):
include "globals.mzn";
%int: jobs = 4;
int: jobs = 16;
set of int: JOB = 1..jobs;
%array[JOB] of var int: start = [0, 6, 4, 0];
%array[JOB] of var int: duration = [3, 4, 1, 4];
array[JOB] of var int: start = [0, 6, 4, 0, 1, 7, 4, 0, 0, 6, 4, 1, 3, 9, 4, 0];
array[JOB] of var int: duration = [3, 4, 1, 5, 3, 3, 3, 4, 2, 2, 1, 3, 3, 1, 6, 8];
var int: machines;
constraint cumulative(start, duration, [1 | j in JOB], machines);
solve minimize machines;
This model does not, however, tell which jobs are scheduled on which machines.
Edit:
Another option would be to transform the problem into a graph coloring problem. Let each line be a vertex in a graph. Create edges for all overlapping lines (the 1-segments overlap). Find the chromatic number of the graph. The vertices of each color then represent a combined line in the original problem.
Graph coloring is a well-studied problem, for larger instances consider a local search approach, using tabu search or simulated annealing.

All numbers in a given range but random order

Let's say I want to generate all integers from 1-1000 in a random order. But...
No numbers are generated more then once
Without storing an Array, List... of all possible numbers
Without storing the already generated numbers.
Without missing any numbers in the end.
I think that should be impossible but maybe I'm just not thinking about the right solution.
I would like to use it in C# but I'm more interested in the approche then the actual implementation.
Encryption. An encryption is a one-to-one mapping between two sets. If the two sets are the same, then it is a permutation, specified by the encryption key. Write/find an encryption that maps {0, ..., 1000} onto itself. Read up on Format Preserving Encryption (FPE) to help you here.
To generate the random order just encrypt the numbers 0, 1, 2, ... in order. You don't need to store them, just keep track of how far you have got through the list.
From a practical point of view, numbers in {0, ..., 1023} would be easier to deal with, as that would be a block cipher with a 10-bit block size, and you could write a simple Feistel cipher to generate your numbers. You might want to do that anyway and just re-encrypt numbers above 1000 -- the cycle walking method of FPE.
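To illustrate (a toy sketch, not a vetted cipher - the round function and keys here are arbitrary assumptions): a 4-round Feistel network on a 10-bit block is a permutation of 0..1023 no matter what round function you pick, and cycle walking restricts it to {0, ..., 1000}:

```python
def feistel10(x, keys=(0x17, 0x2B, 0x3D, 0x09)):
    # Toy 4-round Feistel permutation on a 10-bit block (two 5-bit halves).
    # Any round function yields a bijection on 0..1023; this one is arbitrary.
    left, right = x >> 5, x & 0x1F
    for k in keys:
        left, right = right, left ^ ((right * 41 + k) % 32)
    return (left << 5) | right

def encrypt_0_1000(x):
    # Cycle walking: re-encrypt until the result falls back into 0..1000,
    # which turns the permutation of 0..1023 into a permutation of 0..1000.
    y = feistel10(x)
    while y > 1000:
        y = feistel10(y)
    return y

# "Generate" the shuffled order by encrypting 0, 1, 2, ... in turn; only the
# current index needs to be stored, never the numbers already produced.
sequence = [encrypt_0_1000(i) for i in range(1001)]
```

Changing the key tuple gives a different permutation, which is exactly the "seed" of this generator.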
If randomness isn't a major concern, you could use a linear congruential generator (LCG). Since an LCG won't produce a maximal-length sequence when the modulus is a prime number, you would need to choose a larger modulus (the next highest power of 2 is an obvious choice) and skip any values outside the required range.
I'm afraid C# isn't really my thing, but hopefully the following Python is self-explanatory. It will need a bit of tweaking if you want to generate sequences over very small ranges:
# randint(a, b) returns a random integer in the range (a..b) (inclusive)
from random import randint

def lcg_params(u, v):
    # Generate parameters for an LCG that produces a maximal length sequence
    # of numbers in the range (u..v)
    diff = v - u
    if diff < 4:
        raise ValueError("Sorry, range must be at least 4.")
    m = 2 ** diff.bit_length()               # Modulus
    a = (randint(1, (m >> 2) - 1) * 4) + 1   # Random odd integer, (a-1) divisible by 4
    c = randint(3, m) | 1                    # Any odd integer will do
    return (m, a, c, u, diff + 1)

def generate_pseudorandom_sequence(rmin, rmax):
    (m, a, c, offset, seqlength) = lcg_params(rmin, rmax)
    x = 1           # Start with a seed value of 1
    result = []     # Create empty list for output values
    for i in range(seqlength):
        # To generate numbers on the fly without storing them in an array,
        # just run the following while loop to fetch a new number
        while True:
            x = (x * a + c) % m       # Iterate LCG until we get a value in the
            if x < seqlength: break   # required range
        result.append(x + offset)     # Add this value to the list
    return result
Example:
>>> generate_pseudorandom_sequence(1, 20)
[4, 6, 8, 1, 10, 3, 12, 5, 14, 7, 16, 9, 18, 11, 20, 13, 15, 17, 19, 2]

Oracle query to count rows based on value from next record

Input values to the query : 1-20
Values in the database : 4,5, 15,16
I would like a query that gives me results as following
Value - Count
===== - =====
1 - 3
6 - 9
17 - 3
So basically: first generate the continuous numbers from 1 to 20, then count the available numbers in each gap.
I wrote a query but I can not get it to fully work:
with avail_ip as (
    SELECT (0) + LEVEL AS val
    FROM DUAL
    CONNECT BY LEVEL < 20),
grouped_tab as (
    select val, lead(val,1,0) over (order by val) next_val
    from avail_ip u
    where not exists (
          select 'x' from (select 4 val from dual) b
          where b.val = u.val) )
select
    val, next_val - val difference,
    count(*) over (partition by next_val - val) avail_count
from grouped_tab
order by 1
It gives me a count, but I am not sure how to compress the rows to just three rows.
I was not able to add the complete query; I kept getting 'error occurred while submission'. For some reason it does not like the union clause, so I am attaching the query as an image :(
More details of exact requirement:
I am writing an IP management module and I need to find out the available (free) IP addresses within an IP block. The block could be /16 or /24 or even /12. To make it even more challenging, I also support IPv6, so there will be a lot more numbers to manage. All issued IP addresses are stored in decimal format. So my thought is to first generate all IP decimals within the block range, from network address to broadcast address. For example, a /24 contains 256 addresses and a /16 contains 64K.
Now, secondly, find all used addresses within a block, and find out the number of available addresses after each starting IP. So in the above example, starting at IP 1, 3 addresses are available; starting at 6, 9 are available.
My last concern is that the query should be fast enough to run through millions of numbers.
And sorry again, if my original question was not clear enough.
Similar sort of idea to what you tried:
with all_values as (
    select :start_val + level - 1 as val
    from dual
    connect by level <= (:end_val - :start_val) + 1
),
missing_values as (
    select val
    from all_values
    where not exists (select null from t42 where id = val)
),
chains as (
    select val,
           val - (row_number() over (order by val) + :start_val - 1) as chain
    from missing_values
)
select min(val), count(*) as gap_count
from chains
group by chain
order by min(val);
With start_val as 1 and end_val as 20, and your data in table t42, that gets:
MIN(VAL) GAP_COUNT
---------- ----------
1 3
6 9
17 4
I've made end_val inclusive, though; not sure if you want it to be inclusive or exclusive. And I've perhaps made it more flexible than you need - your version also assumes you're always starting from 1.
The all_values CTE is basically the same as yours, generating all the numbers between the start and end values - 1 to 20 (inclusive!) in this case.
The missing_values CTE removes the values that are in the table, so you're left with 1,2,3,6,7,8,9,10,11,12,13,14,17,18,19,20.
The chains CTE does the magic part. This gets the difference between each value and where you would expect it to be in a contiguous list. The difference - what I've called 'chain' - is the same for all contiguous missing values; 1,2,3 all get 0, 6 to 14 all get 2, and 17 to 20 all get 4. That chain value can then be used to group by, and you can use the aggregate count and min to get the answer you need.
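The same trick is easy to see outside SQL; here is the chains computation replayed in Python on the missing values from the example:

```python
vals = [1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20]  # missing values
chains = {}
for rn, v in enumerate(vals, start=1):        # rn plays the role of row_number()
    chains.setdefault(v - rn, []).append(v)   # v - rn is constant within each contiguous run
for chain in sorted(chains):
    run = chains[chain]
    print(min(run), len(run))  # prints: 1 3 / 6 9 / 17 4
```

Each contiguous run of missing values maps to one chain key (0, 2, and 4 here), so min and count per key give the start and size of each gap.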
SQL Fiddle of a simplified version that is specifically for 1-20, showing the data from each intermediate step. This would work for any upper limit, just by changing the 20, but assumes you'll always start from 1.

Exam question about hash tables (interpretation of wording)

I was confused about the wording of a particular exam question about hash tables. The way I understand it there could be two different answers depending on the interpretation. So I was wondering if someone could help determine which understanding is correct. The question is below:
We have a hash table of size 7 to store integer keys, with hash function h(x) = x mod 7. If we use linear probing and insert elements in the order 1, 15, 14, 3, 9, 5, 27, how many times will an element try to move to an occupied spot?
I'll break down my two different understandings of this question. First of all the initial indexes of each element would be:
1: 1
15: 1
14: 0
3: 3
9: 2
5: 5
27: 6
First interpretation:
1: is inserted into index 1
15: tries to go to index 1, but due to a collision moves left to index 0. Collision count = 1
14: tries to go to index 0, but due to collision moves left to index 6. Collision count = 2
3: is inserted into index 3
9: is inserted into index 2
5: is inserted into index 5
27: tries to go to index 6, but due to collisions moves to index 5 (also occupied) and then to index 4, which is empty. Collision count = 4
Answer: 4?
Second interpretation:
Only count the time when 27 tries to move to the occupied index 5 because of a collision with the element in index 6.
Answer: 1?
Which answer would be correct?
Thanks.
The wording is silly.
The teacher arguably wants #1, but I would argue that #2 is pedantically correct because, as pointed out, an element will only ever try to move to an occupied spot once; in the other cases it moves from an occupied spot to a free spot.
Tests in school are sort of silly -- the teacher (or TA) already knows what he/she wants. There is a line to draw between "being pedantically correct" and "giving the teacher what they want". (Just never, ever give in to the provably wrong!)
One thing that has never (at least that I recall ;-) failed me in a test or homework is providing an answer with a solid -- and correct -- justification for the answer; this may include also explaining the "other" answer.
Teacher/environment, repertoire, hubris and grade (to name a few) need to be balanced.
Happy schooling.
Interpretation 1 is correct. Collision with 6 means that slot 6 is occupied, so why don't you count it?
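For what it's worth, simulating interpretation #1 (with the leftward probing used in the question - standard linear probing usually steps to index + 1 instead) reproduces the count of 4:

```python
def count_collisions(keys, m=7):
    # Insert with h(x) = x mod m; on a collision, step downward (index - 1,
    # wrapping around), counting every occupied slot visited along the way.
    table = [None] * m
    collisions = 0
    for k in keys:
        idx = k % m
        while table[idx] is not None:
            collisions += 1
            idx = (idx - 1) % m
        table[idx] = k
    return collisions

print(count_collisions([1, 15, 14, 3, 9, 5, 27]))  # 4
```

The four counted events are 15 hitting slot 1, 14 hitting slot 0, and 27 hitting slots 6 and 5.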
