Suppose we have n processes forming a general network. We don't know which are connected together, but we know the number of the processes (n).If at each round, a process sends a message to all processes it is connected to, receives 1 message from each of them, and the program executes for r rounds, is there a way to find how many messages have been sent during the program execution?
As you have pointed out, without the exact network structure it is impossible to put a specific value on the number of messages sent. Instead, we can look at its Big-O value.
Now just to be clear what we mean by Big-O:
Big-O refers to the upper bound (ie. the worst possible case) complexity
It is possible (and quite likely in real systems) that the actual value will be less
Without some function that describes the average case (eg. each process is connected to, on average N / 2 other processes) we must assume the worst case
By "worst case" for this problem we mean the maximum number of messages are sent
So let us assume the worst case, in which each process is connected to N - 1 other processes.
Let us also define some variables:
S := the set of processes
N := the number of processes in S
We can represent the set S as a complete (every node connects to every other node), undirected graph in which each node in the graph corresponds to a process and each edge in the graph corresponds to 2 messages sent (1 outgoing transmission and one reply).
From here, we see that the number of edges in a complete graph is (N(N-1))/2
So in the worst case, the number of messages sent is N(N-1), or N^2 - N.
Now because we are dealing with Big-O notation, we are interested in how this value grows as a function of N.
By the triangle inequality, we can see that O(N^2 - N) is an element of O(N^2).
So the number of messages sent grows as N^2 in the worst case.
It is also possible to arrive at this result using an adjacency matrix, that is an N x N matrix where a 1 in the (i, j)th element refers to an edge from node i to node j.
Because in the original problem each process sends a single message to all connected processes, who respond with a single message, we can see that for every pair (i, j) and (j, i) there will be an edge (one representing an outgoing message, one a reply). The exception to this will be pairs where i = j, ie. we a process doesn't send itself a message.
So the matrix will be completely filled with 1s with the exception of the diagonal.
0 1 1 1
1 0 1 1
1 1 0 1
1 1 1 0
Above: the adjacency matrix for N = 4.
So we first want to determine a formula for the total number of messages sent as a function of the number of nodes.
By the area of a rectangle we can see that the number of 1s in the matrix (ignoring the diagonal) would be N x N = N^2.
Now we must consider the diagonal. The number of pairs (x, x) that can exist is given by the function f(i) where Z(N) -> Z(N) x Z(N) : f(i) = (i, i) -- this means that there will be exactly N distinct solutions to this function.
So the overall result is that we have N^2 - N messages when the diagonal is considered.
Now, we use the same Big-O reasoning from above to arrive at the same conclusion, the number of messages grows in the worst case as O(N^2).
So, now you need only take into consideration the number of rounds that have occured, leaving you with O(RN^2).
Of course now you must consider whether you really do have the worst case ...
Related
I' am doing my homework in programming, and I don't know how to solve this problem:
We have a set of n weights, we are putting them on a scale one by one until all weights is used. We also have string of n letters "R" or "L" which means which pen is heavier in that moment, they can't be in balance. There are no weights with same mass. Compute in what order we have to put weights on scale and on which pan.
The goal is to find order of putting weights on scale, so the input string is respected.
Input: number 0 < n < 51, number of weights. Then weights and the string.
Output: in n lines, weight and "R" or "L", side where you put weight. If there are many, output any of them.
Example 1:
Input:
3
10 20 30
LRL
Output:
10 L
20 R
30 L
Example 2:
Input:
3
10 20 30
LLR
Output:
20 L
10 R
30 R
Example 3:
Input:
5
10 20 30 40 50
LLLLR
Output:
50 L
10 L
20 R
30 R
40 R
I already tried to compute it with recursion but unsuccessful. Can someone please help me with this problem or just gave me hints how to solve it.
Since you do not show any code of your own, I'll give you some ideas without code. If you need more help, show more of your work then I can show you Python code that solves your problem.
Your problem is suitable for backtracking. Wikipedia's definition of this algorithm is
Backtracking is a general algorithm for finding all (or some) solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as it determines that the candidate cannot possibly be completed to a valid solution.
and
Backtracking can be applied only for problems which admit the concept of a "partial candidate solution" and a relatively quick test of whether it can possibly be completed to a valid solution.
Your problem satisfies those requirements. At each stage you need to choose one of the remaining weights and one of the two pans of the scale. When you place the chosen weight on the chosen pan, you determine if the corresponding letter from the input string is satisfied. If not, you reject the choice of weight and pan. If so, you continue by choosing another weight and pan.
Your overall routine first inputs and prepares the data. It then calls a recursive routine that chooses one weight and one pan at each level. Some of the information needed by each level could be put into mutable global variables, but it would be more clear if you pass all needed information as parameters. Each call to the recursive routine needs to pass:
the weights not yet used
the input L/R string not yet used
the current state of the weights on the pans, in a format that can easily be printed when finalized (perhaps an array of ordered pairs of a weight and a pan)
the current weight imbalance of the pans. This could be calculated from the previous parameter, but time would be saved by passing this separately. This would be total of the weights on the right pan minus the total of the weights on the left pan (or vice versa).
Your base case for the recursion is when the unused-weights and unused-letters are empty. You then have finished the search and can print the solution and quit the program. Otherwise you loop over all combinations of one of the unused weights and one of the pans. For each combination, calculate what the new imbalance would be if you placed that weight on that pan. If that new imbalance agrees with the corresponding letter, call the routine recursively with appropriately-modified parameters. If not, do nothing for this weight and pan.
You still have a few choices to make before coding, such as the data structure for the unused weights. Show me some of your own coding efforts then I'll give you my Python code.
Be aware that this could be slow for a large number of weights. For n weights and two pans, the total number of ways to place the weights on the pans is n! * 2**n (that is a factorial and an exponentiation). For n = 50 that is over 3e79, much too large to do. The backtracking avoids most groups of choices, since choices are rejected as soon as possible, but my algorithm could still be slow. There may be a better algorithm than backtracking, but I do not see it. Your problem seems to be designed to be handled by backtracking.
Now that you have shown more effort of your own, here is my un-optimized Python 3 code. This works for all the examples you gave, though I got a different valid solution for your third example.
def weights_on_pans():
def solve(unused_weights, unused_tilts, placement, imbalance):
"""Place the weights on the scales using recursive
backtracking. Return True if successful, False otherwise."""
if not unused_weights:
# Done: print the placement and note that we succeeded
for weight, pan in placement:
print(weight, 'L' if pan < 0 else 'R')
return True # success right now
tilt, *later_tilts = unused_tilts
for weight in unused_weights:
for pan in (-1, 1): # -1 means left, 1 means right
new_imbalance = imbalance + pan * weight
if new_imbalance * tilt > 0: # both negative or both positive
# Continue searching since imbalance in proper direction
if solve(unused_weights - {weight},
later_tilts,
placement + [(weight, pan)],
new_imbalance):
return True # success at a lower level
return False # not yet successful
# Get the inputs from standard input. (This version has no validity checks)
cnt_weights = int(input())
weights = {int(item) for item in input().split()}
letters = input()
# Call the recursive routine with appropriate starting parameters.
tilts = [(-1 if letter == 'L' else 1) for letter in letters]
solve(weights, tilts, [], 0)
weights_on_pans()
The main way I can see to speed up that code is to avoid the O(n) operations in the call to solve in the inner loop. That means perhaps changing the data structure of unused_weights and changing how it, placement, and perhaps unused_tilts/later_tilts are modified to use O(1) operations. Those changes would complicate the code, which is why I did not do them.
From what I have learned in my supercomputing class I know that MPI is a communicating (and data passing) interface.
I'm confused on when you run a function in a C++ program and want each processor to perform a specific task.
For example, a prime number search (very popular for supercomputers). Say I have a range of values (531-564, some arbitrary range) and say I have 50 processes I could run a series of evaluations on for each number. If root (process 0) wants to examine 531 and knowing prime numbers I can use 8 processes (1-8) to evaluate the prime status. If the number is divisible by any number 2-9 with a remainder of 0, then it is not prime.
Is it possible that for MPI which passes data to each process to have these processes perform these actions?
The hardest part for me is understanding that if I perform an action in the original C++ program the processes taking place could be allocated on several different processes, then in MPI how can I structure this? Or is my understanding completely wrong? If so how am I supposed to truly go about this path of thinking in a correct manner?
The big idea is passing data to a process versus having a function sent to a process. I'm fairly certain I'm wrong but I'm trying to back track to fix my thinking.
Each MPI process is running the same program, but that doesn't mean that they are doing the same thing. Different processes can be running different branches of the code, depending on the id (or "rank") of the process, and in effect be completely independent. Like any distributed computation, the actors do need to agree on how they will communicate.
The most basic strategy in MPI is scatter-gather, where the "master" process (usually the one with rank 0) will split an array of work equally amongst the peers (including the master process itself) by having them all call scatter, the peers will do the work, then all peers will call gather to send the results back to master.
In your prime algorithm example, build an array of integers, "scatter" it to all the peers, each peer will run through its array saving 1 if it is prime, 0 if it is not then "gather" the results to master. [In this particular example, since the input data is completely predictable based on process rank, the scatter step is unnecessary but we will do it anyway.]
As pseudo-code:
main():
int x[n], n = 100
MPI_init()
// prepare data on master
if rank == 0:
for i in 1 ... n, x[i] = i
// send data from x on root to local on each process in world
MPI_scatter(x, n, int, local, n/k, int, root, world)
for i in 1 ... n/k
result[i] = 1 // assume prime
if 2 divides local[i], result[i] = 0
if 3 divides local[i], result[i] = 0
if 5 divides local[i], result[i] = 0
if 7 divides local[i], result[i] = 0
// gather reults from local on each process in world to x on root
MPI_gather(result, n/k, int, x, n, int, root, world)
// print results
if rank == 0:
for i in 1 ... n, print i if x[i] == 1
MPI_finalize()
There are lots of details to fill in such as proper declarations, and dealing with the fact that some ranks will have fewer elements than others, using
proper C syntax, etc., but getting them right doesn't help explain the overall picture.
More fine-grained synchronization and communication is possible using direct send/recv between processes. Such programs are harder to write since the different processes may be in different states. In particular, it is important that if process a is calling MPI_send to process b, then process b had better be calling MPI_recv from a.
Sorry for the long post. I did read some other MPI broadcast related errors but I couldn't
find out why my program is failing.
I am new to MPI and I am facing this problem. First I will explain what I am trying to do:
My declarations :
ROWTAG 400
COLUMNTAG 800
Create a 2 X 2 Cartesian topology.
Rank 0 has the whole matrix. It wants to dissipate parts of matrix to all the processes in the 2 X 2 Cartesian topology. For now, instead
of matrix I am just dealing with integers. So for process P(i,j) in 2 X 2 Cartesian topology, (i - row , j - column), I want it to receive
(ROWTAG + i ) in one message and (COLUMNTAG + j) in another message.
My strategy to do so is:
Processes: P(0,0) , P(0,1), P(1,0), P(1,1)
P(0,0) has all the initial data.
P(0,0) sends (ROWTAG+1) (in this case 401) to P(1,0) - In essense P(1,0) is responsible for dissipating information related to row 1 for all the processes in Row 1 - I just used a blocking send
P(0,0) sends (COLUMNTAG+1) (in this case 801) to P(0,1) - In essense P(0,1) is responsible for dissipating information related to column 1 for all the processes in Column 1 - Used a blocking send
For each process, I made a row_group containing all the processes in that row and out of this created a row_comm (communicator object)
For each process, I made a col_group containing all the processes in that column and out of this created a col_comm (communicator object)
At this point, P(0,0) has given information related to row 'i' to Process P(i,0) and P(0,0) has given information related to column 'j' to
P(0,j). I call P(i,0) and P(0,j) as row_head and col_head respectively.
For Process P(i,j) , P(i,0) gives information related to row i, and P(0,j) gives information related to column j.
I used a broad cast call:
MPI_Bcast(&row_data,1,MPI_INT,row_head,row_comm)
MPI_Bcast(&col_data,1,MPI_INT,col_head,col_comm)
Please find my code here: http://pastebin.com/NpqRWaWN
Here is the error I see:
* An error occurred in MPI_Bcast
on communicator MPI COMMUNICATOR 5 CREATE FROM 3
MPI_ERR_ROOT: invalid root
* MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
Also please let me know if there is any better way to distribute the matrix data.
There are several errors in your program. First, row_Ranks is declared with one element less and when writing to it, you possibly overwrite other stack variables:
int col_Ranks[SIZE], row_Ranks[SIZE-1];
// ^^^^^^
On my test system the program just hangs because of that.
Second, you create new subcommunicators out of matrixComm but you use rank numbers from the latter to address processes in the former when performing the broadcast. That doesn't work. For example, in a 2x2 Cartesian communicator ranks range from 0 to 3. In any column- or row-wise subgroup there are only two processes with ranks 0 and 1 - there is neither rank 2 nor rank 3. If you take a look at the value of row_head across the ranks, it is 2 in two of them, hence the error.
For a much better way to distribute the data, you should refer to this extremely informative answer.
Specifically around log log counting approach.
I'll try and clarify the use of probabilistic counters although note that I'm no expert on this matter.
The aim is to count to very very large numbers using only a little space to store the counter (e.g. using a 32 bits integer).
Morris came up with the idea to maintain a "log count", so instead of counting n, the counter holds log₂(n). In other words, given a value c of the counter, the real count represented by the counter is 2ᶜ.
As logs are not generally of integer value, the problem becomes when the c counter should be incremented, as we can only do so in steps of 1.
The idea here is to use a "probabilistic counter", so for each call to a method Increment on our counter, we update the actual counter value with a probability p. This is useful as it can be shown that the expected value represented by the counter value c with probabilistic updates is in fact n. In other words, on average the value represented by our counter after n calls to Increment is in fact n (but at any one point in time our counter is probably has an error)! We are trading accuracy for the ability to count up to very large numbers with little storage space (e.g. a single register).
One scheme to achieve this, as described by Morris, is to have a counter value c represent the actual count 2ᶜ (i.e. the counter holds the log₂ of the actual count). We update this counter with probability 1/2ᶜ where c is the current value of the counter.
Note that choosing this "base" of 2 means that our actual counts are always multiples of 2 (hence the term "order of magnitude estimate"). It is also possible to choose other b > 1 (typically such that b < 2) so that the error is smaller at the cost of being able to count smaller maximum numbers.
The log log comes into play because in base-2 a number x needs log₂ bits to be represented.
There are in fact many other schemes to approximate counting, and if you are in need of such a scheme you should probably research which one makes sense for your application.
References:
See Philippe Flajolet for a proof on the average value represented by the counter, or a much simpler treatment in the solutions to a problem 5-1 in the book "Introduction to Algorithms". The paper by Morris is usually behind paywalls, I could not find a free version to post here.
its not exactly for the log counting approach but i think it can help you,
using Morris' algorithm, the counter represents an "order of magnitude estimate" of the actual count.The approximation is mathematically unbiased.
To increment the counter, a pseudo-random event is used, such that the incrementing is a probabilistic event. To save space, only the exponent is kept. For example, in base 2, the counter can estimate the count to be 1, 2, 4, 8, 16, 32, and all of the powers of two. The memory requirement is simply to hold the exponent.
As an example, to increment from 4 to 8, a pseudo-random number would be generated such that a probability of .25 generates a positive change in the counter. Otherwise, the counter remains at 4. from wiki
I just read this interesting question about a random number generator that never generates the same value three consecutive times. This clearly makes the random number generator different from a standard uniform random number generator, but I'm not sure how to quantitatively describe how this generator differs from a generator that didn't have this property.
Suppose that you handed me two random number generators, R and S, where R is a true random number generator and S is a true random number generator that has been modified to never produce the same value three consecutive times. If you didn't tell me which one was R or S, the only way I can think of to detect this would be to run the generators until one of them produced the same value three consecutive times.
My question is - is there a better algorithm for telling the two generators apart? Does the restriction of not producing the same number three times somehow affect the observable behavior of the generator in a way other than preventing three of the same value from coming up in a row?
As a consequence of Rice's Theorem, there is no way to tell which is which.
Proof: Let L be the output of the normal RNG. Let L' be L, but with all sequences of length >= 3 removed. Some TMs recognize L', but some do not. Therefore, by Rice's theorem, determining if a TM accepts L' is not decidable.
As others have noted, you may be able to make an assertion like "It has run for N steps without repeating three times", but you can never make the leap to "it will never repeat a digit three times." More appropriately, there exists at least one machine for which you can't determine whether or not it meets this criterion.
Caveat: if you had a truly random generator (e.g. nuclear decay), it is possible that Rice's theorem would not apply. My intuition is that the theorem still holds for these machines, but I've never heard it discussed.
EDIT: a secondary proof. Suppose P(X) determines with high probability whether or not X accepts L'. We can construct an (infinite number of) programs F like:
F(x): if x(F), then don't accept L'
else, accept L'
P cannot determine the behavior of F(P). Moreover, say P correctly predicts the behavior of G. We can construct:
F'(x): if x(F'), then don't accept L'
else, run G(x)
So for every good case, there must exist at least one bad case.
If S is defined by rejecting from R, then a sequence produced by S will be a subsequence of the sequence produced by R. For example, taking a simple random variable X with equal probability of being 1 or 0, you would have:
R = 0 1 1 0 0 0 1 0 1
S = 0 1 1 0 0 1 0 1
The only real way to differentiate these two is to look for streaks. If you are generating binary numbers, then streaks are incredibly common (so much so that one can almost always differentiate between a random 100 digit sequence and one that a student writes down trying to be random). If the numbers are taken from [0,1] uniformly, then streaks are far less common.
It's an easy exercise in probability to calculate the chance of three consecutive numbers being equal once you know the distribution, or even better, the expected number of numbers needed until the probability of three consecutive equal numbers is greater than p for your favourite choice of p.
Since you defined that they only differ with respect to that specific property there is no better algorithm to distinguish those two.
If you do triples of randum values of course the generator S will produce all other triples slightly more often than R in order to compensate the missing triples (X,X,X). But to get a significant result you'd need much more data than it will cost you to find any value three consecutive times the first time.
Probably use ENT ( http://fourmilab.ch/random/ )