Reusable Barrier solution has a deadlock? - deadlock

I have been reading "The Little Book of Semaphores", and on page 41 there is a solution to the Reusable Barrier problem. What I don't understand is why it doesn't produce a deadlock.
1  # rendezvous
2
3  mutex.wait()
4      count += 1
5      if count == n:
6          turnstile2.wait()     # lock the second
7          turnstile.signal()    # unlock the first
8  mutex.signal()
9
10 turnstile.wait()              # first turnstile
11 turnstile.signal()
12
13 # critical point
14
15 mutex.wait()
16     count -= 1
17     if count == 0:
18         turnstile.wait()      # lock the first
19         turnstile2.signal()   # unlock the second
20 mutex.signal()
21
22 turnstile2.wait()             # second turnstile
23 turnstile2.signal()
In this solution, between lines 15 and 20, isn't it bad practice to call wait() on a semaphore (line 18) while holding a mutex, since that can cause a deadlock? Please explain. Thank you.

mutex protects the count variable. In the first mutex-protected block each thread increments the counter, and the last thread to arrive (count == n) locks the second turnstile in preparation for leaving (see below) and unlocks the first, releasing the other n-1 threads, which are waiting on line 10. Each of those threads then signals the next.
The second mutex-protected block works the same way but decrements count (the same mutex protects it). The last thread to enter it locks turnstile in preparation for the next batch entering (see above) and releases the n-1 threads waiting on line 22. Again, each thread signals the next.
Thus turnstile coordinates entry to the critical point, while turnstile2 coordinates the exit from it.
There can be no deadlock: by the time the last thread gets to line 18, turnstile is guaranteed not to be held by any other thread, because every other thread has already passed through it and is waiting on line 22 (or about to). The wait() on line 18 therefore returns immediately even though the mutex is held. The same holds for turnstile2 on line 6.
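To try the argument above, here is a minimal Python sketch of the same two-turnstile barrier built on threading.Semaphore. The class name ReusableBarrier, the phase1/phase2 split and the run_demo helper are illustrative choices, not taken from the book; wait() and signal() map to acquire() and release().

```python
import threading


class ReusableBarrier:
    """Two-turnstile barrier mirroring the numbered listing (illustrative names)."""

    def __init__(self, n):
        self.n = n
        self.count = 0
        self.mutex = threading.Semaphore(1)
        self.turnstile = threading.Semaphore(0)   # first turnstile starts locked
        self.turnstile2 = threading.Semaphore(1)  # second turnstile starts open

    def phase1(self):
        with self.mutex:                    # lines 3-8
            self.count += 1
            if self.count == self.n:
                self.turnstile2.acquire()   # lock the second
                self.turnstile.release()    # unlock the first
        self.turnstile.acquire()            # lines 10-11
        self.turnstile.release()

    def phase2(self):
        with self.mutex:                    # lines 15-20
            self.count -= 1
            if self.count == 0:
                self.turnstile.acquire()    # lock the first
                self.turnstile2.release()   # unlock the second
        self.turnstile2.acquire()           # lines 22-23
        self.turnstile2.release()


def run_demo(n=4, rounds=3):
    """Run n threads through the barrier several times and return an event log."""
    barrier = ReusableBarrier(n)
    log = []
    log_lock = threading.Lock()

    def worker(tid):
        for r in range(rounds):
            with log_lock:
                log.append(("before", r, tid))
            barrier.phase1()
            with log_lock:
                log.append(("critical", r, tid))
            barrier.phase2()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling run_demo() should return without hanging, and within each round every "before" entry should appear in the log ahead of the first "critical" entry.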

Related

OpenCL: multiple work items saving results to the same global memory address

I'm trying to do a reduce-like cumulative calculation where 4 different values need to be stored depending on certain conditions. My kernel receives long arrays as input and needs to store only 4 values, which are "global sums" obtained from each data point of the input. For example, I need to store the sum of all the data values satisfying a certain condition, and the number of data points that satisfy that condition. The kernel is below to make it clearer:
__kernel void photometry(__global float* stamp,
                         __constant float* dark,
                         __global float* output)
{
    int x = get_global_id(0);
    int s = n * n;
    if(x < s){
        float2 curr_px = (float2)((x / n), (x % n));
        float2 center = (float2)(centerX, centerY);
        int dist = (int)fast_distance(center, curr_px);
        if(dist < aperture){
            output[0] += stamp[x]-dark[x];
            output[1]++;
        }else if (dist > sky_inner && dist < sky_outer){
            output[2] += stamp[x]-dark[x];
            output[3]++;
        }
    }
}
All the values not declared in the kernel are previously defined by macros. s is the length of the input arrays stamp and dark, which are nxn matrices flattened down to 1D.
I get results but they are different from my CPU version of this. Of course I am wondering: is this the right way to do what I'm trying to do? Can I be sure that each pixel data is only being added once? I can't think of any other way to save the cumulative result values.
You need atomic operations here; otherwise data races make the results unpredictable.
The problem is here:
output[0] += stamp[x]-dark[x];
output[1]++;
Work items in the same wave execute in lockstep, so within one wave they all read the same output[0] value with a single global load (a broadcast). When they finish the computation and store to the same address (output[0]), the writes are serialized and the last one wins, so even inside a single wave contributions can be lost.
It gets worse because your program almost certainly launches more than one wave, and different waves execute in an unknown order. For example, wave0 and wave1 may both read output[0] before either has written it, so they fetch the same value. After their computation each stores its own accumulated result into output[0], and one result overwrites the other, as if only the later writer had run. A real application has many more waves, so a wrong result is not surprising.
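The lost update described above can be imitated on the CPU with a small Python sketch. It is only a deterministic model of one unlucky interleaving, not OpenCL code; racy_sum and atomic_sum are hypothetical names chosen for the illustration.

```python
def racy_sum(contributions, start=0.0):
    """Model the race: every wave loads output[0] before any wave stores to it."""
    output = start
    loaded = [output for _ in contributions]       # all loads happen first
    for old, value in zip(loaded, contributions):
        output = old + value                       # each store overwrites the last
    return output


def atomic_sum(contributions, start=0.0):
    """Model atomic updates: each read-modify-write finishes before the next begins."""
    output = start
    for value in contributions:
        output += value
    return output
```

With contributions [1.0, 2.0, 3.0], racy_sum keeps only the last writer's value, while atomic_sum accumulates all three.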
You can compute the sum concurrently in O(log2(n)) passes. The idea:
Say you have 16 inputs in slots 1, 2, 3, ..., 16 and you want their sum.
First, concurrently add slot 1 into 2, 3 into 4, 5 into 6, 7 into 8, 9 into 10, 11 into 12, 13 into 14, and 15 into 16.
Then concurrently add 2 into 4, 6 into 8, 10 into 12, and 14 into 16.
Then concurrently add 4 into 8 and 12 into 16.
Finally add 8 into 16.
Everything is done in log2(16) = 4 passes in this case.
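The passes above can be sketched sequentially in Python. This is only a model of the access pattern, assuming the input length is a power of two; tree_reduce is an illustrative name, and on a GPU the inner loop would be the part executed concurrently.

```python
def tree_reduce(values):
    """Pairwise in-place reduction; returns (total, number_of_passes).

    Assumes len(values) is a power of two. All additions inside one pass
    touch disjoint slots, so they could run concurrently.
    """
    data = list(values)
    n = len(data)
    stride = 1
    passes = 0
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] += data[i - stride]   # e.g. slot 1 into slot 2 (1-based)
        stride *= 2
        passes += 1
    return data[n - 1], passes
```

For the 16 inputs 1..16 this yields the total 136 after 4 passes, with the result ending up in the last slot.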

Puzzle / Riddle to put unique combination of 1 to 8 numbers in 7 by 7 matrix

I'm badly stuck on the puzzle below and need your help solving it:
8 guys booked 7 rooms in a hotel and decided to use only 4 rooms out of those 7, pairing up 2 people per room in such a way that no pair repeats within the 7 days and no person stays in the same room twice.
E.g. if 1&2, 3&4, 5&6 and 7&8 stay in room1, room2, room3 and room4, then these pairs will never stay together again. They have to form different pairs and change rooms as well.
Another example: 1 cannot stay in room1 again with anybody, and the same applies to the others.
Can someone please solve this 7x7 matrix for me? I appreciate your efforts on this!
1&2 6&7 3&8 4&5
5&6 1&3 4&8 2&7
5&7 1&4 2&8 3&6
7&8 2&3 1&5 4&6
2&4 5&8 3&7 1&6
3&4 6&8 2&5 1&7
2&6 4&7 3&5 1&8
A program :)
One solution would be:
18 36 27 45
35 17 46 28
47 25 16 38
26 37 15 48
58 23 14 67
68 24 57 13
34 78 56 12
Actually, that's two solutions, since you could read
the rows as days and columns as rooms, or vice versa.
Each permutation of the numbers assigned to people, and each permutation of the rooms (columns), and each permutation of the days (rows) leads to another solution as well.
Here is Python code, based on Egor Skriptunoff's solution. The main idea is the same: generate a list of all the pairs of people, and start placing them one at a time on the board. If there is no valid placement, backtrack and try something else. In the code below a list of tasks records board states, so when a dead end is reached, the next candidate board state is popped off the tasks list. The trick is to enumerate all the possibilities in an organized, efficient way.
There are some minor differences:
This code is iterative rather than recursive.
The solve function tries to generate all solutions instead of stopping after finding one.
For simplicity, the initialization condition has been removed.
import itertools as IT
import copy

people = range(1, 9)
days = range(7)
rooms = range(7)
pairs = list(IT.combinations(people, 2))


def solve():
    """Yield every completed board found by an iterative depth-first search."""
    board = [[None for room in rooms] for day in days]
    pairidx = 0
    tasks = [(pairidx, board)]
    while tasks:
        pairidx, board = tasks.pop()
        if pairidx == len(pairs):
            yield board
            continue
        for day, room in IT.product(days, rooms):
            if not (used_day(board, pairidx, day)
                    or used_room(board, pairidx, room)
                    or used_day_room(board, day, room)):
                tasks.append(
                    (pairidx + 1, move(board, pairidx, day, room)))


def used_day(board, pairidx, day):
    """
    Return True if either person in pairs[pairidx] already has a room on this day
    """
    return any([person in pairs[idx] for idx in board[day] if idx is not None
                for person in pairs[pairidx]])


def used_room(board, pairidx, room):
    """
    Return True if either person in pairs[pairidx] has already stayed in this room
    """
    return any([person in pairs[row[room]] for row in board if row[room] is not None
                for person in pairs[pairidx]])


def used_day_room(board, day, room):
    """
    Return True if the room has already been assigned a pair for the day
    """
    return board[day][room] is not None


def move(board, pairidx, day, room):
    """
    Assign a pair to a room on a given day. Return a new copy of the board.
    """
    board = copy.deepcopy(board)
    board[day][room] = pairidx
    return board


def report(board):
    # Print each pair as two digits; empty cells are padded to the same width.
    print('\n'.join(
        [' '.join([''.join(map(str, pairs[col])) if col is not None else
                   '  ' for col in row])
         for row in board]))
    print('-' * 20)


for solution in solve():
    report(solution)
By the way, this problem is very similar to the Zebra problem and other constraint puzzles. You might look there for more ideas on how to solve your problem.

How does Clojure's laziness interact with calls to Java/impure code?

We stumbled upon an issue in our code today, and couldn't answer this Clojure question:
Does Clojure evaluate impure code (or calls to Java code) strictly or lazily?
It seems that side-effects + lazy sequences can lead to strange behavior.
Here's what we know that led to the question:
Clojure has lazy sequences:
user=> (take 5 (range)) ; (range) returns an infinite list
(0 1 2 3 4)
And Clojure has side-effects and impure functions:
user=> (def value (println 5))
5 ; 5 is printed out to screen
user=> value
nil ; 'value' is assigned nil
Also, Clojure can make calls to Java objects, which may include side-effects.
However, side-effects may interact poorly with lazy evaluation:
user=> (def my-seq (map #(do (println %) %) (range)))
#'user/my-seq
user=> (take 5 my-seq)
(0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
0 1 2 3 4)
So it returned the first 5 elements, but printed the first 32 (0 through 31)!
I assume the same kinds of problems could occur if calling side-effecting methods on Java objects. This could make it really hard to reason about code and figure out what's going to happen.
Ancillary questions:
Is it up to the programmer to watch out for and prevent such situations? (Yes?)
Besides sequences, does Clojure perform strict evaluation? (Yes?)
Clojure's lazy seqs are chunked, typically 32 elements at a time, which further reduces the small per-element overhead. It's not the purist's choice but a practical one. Consult "The Joy of Clojure" for a way to realize one element at a time.
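As a rough analogy only (this is Python, not Clojure's implementation), the effect of chunking on side effects can be reproduced with a generator that realizes its input one block at a time. chunked_map, count_calls and the default chunk size of 32 are assumptions made for the illustration.

```python
from itertools import count, islice


def chunked_map(fn, iterable, chunk_size=32):
    """Lazily map fn over iterable, but realize chunk_size results at a time."""
    it = iter(iterable)
    while True:
        chunk = [fn(x) for x in islice(it, chunk_size)]  # side effects fire here
        if not chunk:
            return
        yield from chunk


def count_calls(mapper, take=5):
    """Take a few items from mapper(noisy, 0, 1, 2, ...) and count calls to noisy."""
    calls = []

    def noisy(x):
        calls.append(x)   # stands in for println
        return x

    result = list(islice(mapper(noisy, count()), take))
    return result, len(calls)
```

Here count_calls(chunked_map) takes 5 items but invokes the function 32 times, while count_calls(map), using Python's built-in one-at-a-time map, invokes it only 5 times.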
Lazy seqs aren't a perfect match for impure functions for the reason you encountered.
Clojure otherwise evaluates strictly, but with macros things are a bit different. Built-in special forms such as if naturally hold off evaluating the branch that is not taken.
Lazy constructs are evaluated more or less whenever is convenient for the implementation no matter what's referenced in them. So, yes, it's up to the programmer to be careful and force realization of lazy seqs when needed.
I have no idea what you mean by strict evaluation.

Point handling for closed loop searching

I have a set of line segments, each defined by exactly 2 nodes. I want to find the closed cycles that are produced by joining line segments. More precisely, I am looking for the smallest loop if more than one exists. If you can, please suggest a good solution for this.
For example, below is a list of lines together with their point indices to give an idea of my case. (The first value is the line number; the next two values are the point indices.)
0 - 9 11
1 - 9 18
2 - 9 16
3 - 11 26
4 - 11 45
5 - 16 25
6 - 16 49
7 - 18 26
8 - 18 25
9 - 18 21
10 - 25 49
11 - 26 45
So, assume I have started from the line 1. That is I have started to find connected loops from point 9, 18. Then, could you please explain (step by step) how I can get the "closed loops" from that line.
Well, I don't see any C++ code, but I'll try to suggest a C++ solution (although I'm not going to write it for you).
If your graph is undirected (if it's directed, s/adjacent/in-edges' vertices/), and you want to find all the shortest cycles passing through some vertex N, then I think you could follow this procedure:
G <= a graph
N <= some vertex in G
P <= a path (set of vertexes/edges connecting them)
P_heap <= a priority queue, ascending by distance(P) where P is a path
for each vertex in adjacent(N):
    G' = G - edge(vertex, N)
    P = dijkstraShortestPath(vertex, N, G')
    push(P, P_heap)
You could also just throw out all but the shortest loop, but that's less succinct. As long as you don't allow negative edge weights (which, since you'll be using line segment length for weights, you don't), I think this should work. Also, fortunately Boost.Graph provides all of the necessary functionality to do this in C++ (you don't even have to implement Dijkstra's algorithm)! You can find documentation about it here:
http://www.boost.org/doc/libs/1_47_0/libs/graph/doc/table_of_contents.html
EDIT: you will have to build the graph from the data you listed before you can do this. Define your graph's property_map accordingly, and before inserting a vertex make sure its distance to every vertex already in the graph is greater than zero; otherwise the vertex is already in the graph and you don't want to insert it again.
Happy graphing!
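For readers who want to experiment before reaching for Boost.Graph, the procedure can be sketched in Python on the segment list from the question. This is an illustration under assumptions: the point coordinates were not given, so every segment gets a unit weight (the real weight would be the segment length), and build_graph, shortest_path and smallest_loop_through are made-up helper names.

```python
import heapq
from collections import defaultdict

# Point-index pairs for lines 0..11 from the question.
SEGMENTS = [(9, 11), (9, 18), (9, 16), (11, 26), (11, 45), (16, 25),
            (16, 49), (18, 26), (18, 25), (18, 21), (25, 49), (26, 45)]


def build_graph(segments):
    """Undirected adjacency map; unit weights are an assumption."""
    graph = defaultdict(dict)
    for a, b in segments:
        graph[a][b] = 1.0
        graph[b][a] = 1.0
    return graph


def shortest_path(graph, source, target, skip_edge):
    """Dijkstra from source to target that ignores the edge skip_edge."""
    skip = frozenset(skip_edge)
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph[u].items():
            if frozenset((u, v)) == skip:
                continue                  # this is G' = G - edge
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def smallest_loop_through(graph, a, b):
    """Smallest closed loop that uses segment (a, b), as a vertex list a..a."""
    path = shortest_path(graph, b, a, skip_edge=(a, b))
    return None if path is None else [a] + path
```

For line 1 of the question, smallest_loop_through(build_graph(SEGMENTS), 9, 18) returns a loop that starts and ends at point 9 and passes through 18. Repeating the call for each neighbour of a vertex and keeping the shortest result corresponds to the priority-queue step in the pseudocode.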

Exam question about hash tables (interpretation of wording)

I was confused about the wording of a particular exam question about hash tables. The way I understand it there could be two different answers depending on the interpretation. So I was wondering if someone could help determine which understanding is correct. The question is below:
We have a hash table of size 7 to store integer keys, with hash function h(x) = x mod 7. If we use linear probing and insert elements in the order 1, 15, 14, 3, 9, 5, 27, how many times will an element try to move to an occupied spot?
I'll break down my two different understandings of this question. First of all the initial indexes of each element would be:
1: 1
15: 1
14: 0
3: 3
9: 2
5: 5
27: 6
First interpretation:
1: is inserted into index 1
15: tries to go to index 1, but due to a collision moves left to index 0. Collision count = 1
14: tries to go to index 0, but due to collision moves left to index 6. Collision count = 2
3: is inserted into index 3
9: is inserted into index 2
5: is inserted into index 5
27: tries to go to index 6, but due to collisions moves to index 5 and then to 4 which is empty. Collision count = 4
Answer: 4?
Second interpretation:
Only count the time when 27 tries to move to the occupied index 5 because of a collision with the element in index 6.
Answer: 1?
Which answer would be correct?
Thanks.
The wording is silly.
The teacher arguably wants #1 but I would argue that #2 is pedantically correct because an element will only ever try to move to an occupied spot once, as pointed out. In the other cases it does not move to an occupied spot but rather from an occupied spot to a free spot.
Tests in school are sort of silly -- the teacher (or TA) already knows what he/she wants. There is a line to draw between "being pedantically correct" and "giving the teacher what they want". (Just never, ever give in to the provably wrong!)
One thing that has never (at least that I recall ;-) failed me in a test or homework is providing an answer with a solid -- and correct -- justification for the answer; this may include also explaining the "other" answer.
Teacher/environment, repertoire, hubris and grade (to name a few) need to be balanced.
Happy schooling.
Interpretation 1 is correct. Collision with 6 means that slot 6 is occupied, so why don't you count it?
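The two readings can be checked mechanically. The Python sketch below counts separately the collisions at a key's home slot and the collisions met while probing, assuming (as in the question's walkthrough) that probing moves one slot to the left and wraps around; count_collisions and its step parameter are illustrative, not part of the exam question.

```python
def count_collisions(keys, size=7, step=-1):
    """Insert keys with linear probing; return (home_hits, probe_hits).

    home_hits:  times a key found its own hash slot h(x) = x mod size occupied.
    probe_hits: times a key, while probing, tried to move to an occupied slot.
    step=-1 probes to the left as in the question; step=+1 probes to the right.
    """
    table = [None] * size
    home_hits = 0
    probe_hits = 0
    for key in keys:
        slot = key % size
        if table[slot] is not None:
            home_hits += 1
            slot = (slot + step) % size
            while table[slot] is not None:
                probe_hits += 1
                slot = (slot + step) % size
        table[slot] = key
    return home_hits, probe_hits
```

For the insertion order 1, 15, 14, 3, 9, 5, 27 this gives 3 home-slot collisions and 1 probe collision. Interpretation 1 counts both kinds (3 + 1 = 4), while interpretation 2 counts only the probe collision (1).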

Resources