I am having difficulty understanding a key point about how count to infinity can occur.
Let us say we have a network
A-B-C-D-E
The cost for each link is 1.
According to Tanenbaum,
when A goes down, B will update its cost towards A to infinity. But B then receives an advertisement from C which says "I can reach A with a cost of 2". Since B can reach C with a cost of 1, it updates its distance to A to 3.
In the next part I have a problem.
He says,
now C notices that both its neighbors can reach A with a cost of 3.
"So C will update distance to A as 4"
Why does this happen? C already thinks it can reach A with a cost of 2.
By the Bellman-Ford equation, this cost is less than 3 + 1 = 4. Why shouldn't it simply keep 2 as the distance rather than changing it to 4?
Because C's previous route to A was via B (with cost 2). Now that B is announcing a route with cost 3 to C, C has to update its cost to 4. This is what happens when the path from B to A has changed and now has a higher cost; since C's route goes through B, C has to use the new cost rather than keep the stale 2.
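To see the numbers climb, here is a minimal Python sketch (mine, not from the book) of the synchronous exchange model: every round, each node recomputes its distance to A as 1 plus the smallest distance its neighbours advertised in the previous round, with A's direct link to B removed.

neighbours = {'B': ['C'], 'C': ['B', 'D'], 'D': ['C', 'E'], 'E': ['D']}  # A is gone
dist = {'B': 1, 'C': 2, 'D': 3, 'E': 4}   # distances to A just before the failure

for exchange in range(1, 7):
    # each node trusts the distances advertised in the previous round
    dist = {n: 1 + min(dist[v] for v in neighbours[n]) for n in dist}
    print(exchange, dist)
# exchange 1: B=3, C=2, D=3, E=4   (B believes C's stale "2")
# exchange 2: B=3, C=4, D=3, E=4   (C sees both neighbours at 3, so it becomes 4)
# exchange 3: B=5, C=4, D=5, E=4   ... and so on towards "infinity"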
I have a dataset with different email codes, email recipients and a flag of whether they responded to the email. I calculated the past response rates for each person, for the emails preceding the current email (sum of responses / number of emails). It looks something like this:
email_code  responded  person  number_of_emails  response_rate  date
wy2         1          A       0                 0              2022/01/12
na3         1          A       1                 100            2022/01/22
li3         0          A       2                 100            2022/01/23
pa4         1          A       3                 66             2022/01/24
However, this doesn't seem right. Imagine that person A received 1 email and replied to it, so their response rate will be 100%. Person B received 10 emails and replied to 9 of them, so their response rate will be 90%. But person B is more likely to respond.
I think I need to calculate some Bayesian average, in a similar vein to this post and this website. However, these websites show how to do this for ratings, and I do not know how I can adapt the formula to my case.
Any help/suggestions would be greatly appreciated!
The post on SO perfectly describes how you can calculate the Bayesian rating, IMO.
I quote:
rating = (v / (v + m)) * R +
(m / (v + m)) * C;
The variables are:
R – The item's own rating. R is the average of the item's votes. (For example, if an item has no votes, its R is 0. If someone gives it 5 stars, R becomes 5. If someone else gives it 1 star, R becomes 3, the average of [1, 5]. And so on.)
C – The average item's rating. Find the R of every single item in the database, including the current one, and take the average of them; that is C. (Suppose there are 4 items in the database, and their ratings are [2, 3, 5, 5]. C is 3.75, the average of those numbers.)
v – The number of votes for an item. (To give another example, if 5 people have cast votes on an item, v is 5.)
m – The tuneable parameter. The amount of "smoothing" applied to the rating is based on the number of votes (v) in relation to m. Adjust m until the results satisfy you. And don't misinterpret IMDb's description of m as "minimum votes required to be listed" – this system is perfectly capable of ranking items with fewer votes than m.
So in your case:
R is the response rate, i.e. the number of replies / number of received emails. If someone hasn't received any emails, set R to 0 to avoid division by zero. If they haven't responded to any received emails, their R is of course zero.
C is the sum of the Rs of all recipients divided by the number of recipients.
v is the number of received emails. If someone received 10 emails, their v will be 10. If they haven't received any emails, their v will be zero.
m is, as described in the original post, the tuneable parameter.
Further quote from the original post which describes m very well:
All the formula does is: add m imaginary votes, each with a value of C, before calculating the average. In the beginning, when there isn't enough data (i.e. the number of votes is dramatically less than m), this causes the blanks to be filled in with average data. However, as votes accumulate, eventually the imaginary votes will be drowned out by real ones.
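To make it concrete for response rates, here is a small Python sketch of that formula; the numbers for C and m below are made up (C would really be the average raw response rate over all your recipients, and m is whatever amount of smoothing you settle on).

def bayesian_response_rate(replies, emails_received, C, m=5.0):
    """rating = v/(v+m) * R + m/(v+m) * C, with R the raw response rate."""
    v = emails_received
    R = replies / v if v > 0 else 0.0   # avoid division by zero for brand-new recipients
    return (v / (v + m)) * R + (m / (v + m)) * C

# Hypothetical values: population-wide average response rate C = 0.2, smoothing m = 5.
C, m = 0.2, 5.0
print(bayesian_response_rate(1, 1, C, m))    # person A, 1 of 1  -> about 0.33
print(bayesian_response_rate(9, 10, C, m))   # person B, 9 of 10 -> about 0.67
# B now ranks above A, matching the intuition that B is the more reliable responder.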
Reading the book by Aziz & Prakash (2021), I am a bit stuck on problem 3.7 and the associated solution, which I am trying to implement.
The problem says:
You have n users with unique hashes h1 through hn and
m servers, numbered 1 to m. User i has Bi bytes to store. You need to
find numbers K1 through Km such that all users with hashes between
Kj and Kj+1 get assigned to server j. Design an algorithm to find the
numbers K1 through Km that minimize the load on the most heavily
loaded server.
The solution says:
Let L(a,b) be the maximum load on a server when
users with hash h1 through ha are assigned to servers S1 through Sb in
an optimal way so that the max load is minimised. We observe the
following recurrence:
L(a, b) = min over x of max( L(x, b - 1), B_{x+1} + B_{x+2} + ... + B_a )
In other words, we find the right value of x such that if we pack the
first x users onto b - 1 servers and the remaining users onto the last server, the max
load on any one server is minimized.
Using this relationship, we can tabulate the values of L until we get
L(n, m). While computing L(a, b), with the values of L already tabulated
for all lower values of a and b, we need to find the right value of x to
minimize the load. As we increase x, L(x, b-1) in the above expression increases while the sum term decreases. We can therefore binary search for the x that minimises their max.
I know that we can probably use some sort of dynamic programming, but how could we implement this idea in code?
The dynamic programming algorithm follows fairly directly from that formula: a top-down DP just needs to loop over x from 1 to a and record which value minimizes the max(L(x,b-1), sum(B_i)) expression.
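For instance, a memoized Python sketch (my own naming, with B as a 0-indexed list of byte counts and prefix sums so a contiguous block's load is an O(1) lookup) could look like this; it runs in O(n^2 * m):

from functools import lru_cache

def min_max_load_dp(B, m):
    """L(a, b) = best possible max load when the first a users go on the first b servers."""
    n = len(B)
    prefix = [0] * (n + 1)
    for i, bytes_stored in enumerate(B):
        prefix[i + 1] = prefix[i] + bytes_stored

    @lru_cache(maxsize=None)
    def L(a, b):
        if b == 1:
            return prefix[a]                 # a single server takes everyone
        if a == 0:
            return 0                         # no users, no load
        best = float('inf')
        for x in range(a + 1):               # first x users on b - 1 servers, rest on server b
            best = min(best, max(L(x, b - 1), prefix[a] - prefix[x]))
        return best

    return L(n, m)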
There is, however, a simpler (and faster) greedy/binary-search algorithm for this problem that you should consider, which goes like this (a sketch follows the steps below):
Compute prefix sums for B
Find the minimum value of L such that we can partition B into m contiguous subarrays, each with sum at most L.
We know 1 <= L <= sum(B). So, perform a binary search to find L, with a helper function canSplit(v) that tests whether we can split B into such subarrays of sum <= v.
canSplit(v) works greedily: Remove as many elements from the start of B as possible so that our sum does not exceed v. Repeat this a total of m times; return True if we've used all of B.
You can use the prefix sums to run canSplit in O(m log n) time, with an additional inner binary search.
Given L, use the same strategy as the canSplit function to determine the m-1 partition points; find the m partition boundaries from there.
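Here is the promised sketch of that approach (my own code and names, untested against any judge): a binary search on the answer, with a greedy can_split helper that uses an inner bisect on the prefix sums.

import bisect

def min_max_load_greedy(B, m):
    if not B:
        return 0
    prefix = [0]
    for bytes_stored in B:
        prefix.append(prefix[-1] + bytes_stored)
    n = len(B)

    def can_split(v):
        # Greedily cut off the longest prefix whose sum stays <= v, at most m times.
        start = 0
        for _ in range(m):
            if start == n:
                return True
            # furthest end such that sum(B[start:end]) <= v, found by bisecting prefix sums
            end = bisect.bisect_right(prefix, prefix[start] + v) - 1
            if end == start:                 # a single user alone exceeds v
                return False
            start = end
        return start == n

    lo, hi = max(B), prefix[-1]              # the answer L lies in [max(B), sum(B)]
    while lo < hi:
        mid = (lo + hi) // 2
        if can_split(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

Running the same greedy pass once more with the final L then gives the m - 1 cut points between users, from which the hash thresholds K can be read off.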
I am struggling to optimise this past Amazon interview question involving a DAG.
This is what I tried (the code is long, so I would rather explain it):
Basically, since the graph is a DAG and the relation is transitive, a simple traversal for every node should be enough.
So for every node I would, by transitivity, traverse through all the possibilities to get the end vertices, and then compare these end vertices to get the most noisy person.
In my second step I have actually found one such (maybe the only) most noisy person for all the vertices of that traversal. So I memoize all of this in a mapping and mark the vertices of the traversal as visited.
So I am basically maintaining an adjacency list for the graph, a visited/non-visited mapping, and a mapping for the output (the most noisy person for every vertex).
In this way, by the time I get a query I would not have to recompute anything (in the case of duplicate queries).
The above code works, but since I cannot test it with test cases it may or may not pass the time limit. Is there a faster solution (maybe using DP)? I feel I am not exploiting the transitivity and anti-symmetry conditions enough.
Obviously I am not checking the cases where a person is less wealthy than the current person. But for instance, if I have pairs like (1,2), (1,3), (1,4), etc. and maybe (2,6), (2,7), (7,8), etc., then if I am asked to find a more wealthy person than 1, I have to traverse through every neighbor of 1 and then the neighbors of every neighbor as well, I guess. This is done only once, as I store the results.
Question Part 1
Question Part 2
Edit (added question text):
Rounaq is graduating this year. And he is going to be rich. Very rich. So rich that he has decided to have
a structured way to measure his richness. Hence he goes around town asking people about their wealth,
and notes down that information.
Rounaq notes down the pair (Xi, Yi) if person Xi has more wealth than person Yi. He also notes down
the degree of quietness, Ki, of each person. Rounaq believes that noisy persons are a nuisance. Hence, for
each of his friends Ai, he wants to determine the most noisy (least quiet) person among those who have
wealth more than Ai.
Note that "has more wealth than" is a transitive and anti-symmetric relation. Hence if a has more wealth
than b, and b has more wealth than c then a has more wealth than c. Moreover, if a has more wealth than
b, then b cannot have more wealth than a.
Your task in this problem is to help Rounaq determine the most noisy person among the people having
more wealth than each of his friends Ai, given the information Rounaq has collected from the town.
Input
First line contains T: The number of test cases
Each Test case has the following format:
N
K1 K2 K3 K4 ... Kn
M
X1 Y1
X2 Y2
. . .
. . .
XM YM
Q
A1
A2
. . .
. . .
AQ
N: The number of people in town
M: Number of pairs for which Rounaq has been able to obtain the wealth
information
Q: Number of Rounaq’s Friends
Ki: Degree of quietness of the person i
Xi, Yi: The pairs Rounaq has noted down (pairs of distinct values)
Ai: Rounaq’s ith friend
For each of Rounaq’s friends print a single integer - the degree of quietness of the most noisy person as required or -1 if there is no wealthier person for that friend.
Perform a topological sort on the pairs X, Y. Then iterate from the most wealthy down to the least wealthy, and store the most noisy person seen so far:
less wealthy -> most wealthy
<- person with lowest K so far <-
Then for each query, binary search the first person with greater wealth than the friend. The value we stored is the most noisy person with greater wealth than the friend.
UPDATE
It seems that we cannot rely on the data allowing for a complete topological sort. In this case, traverse sections of the graph that lead from known greatest to least wealth, storing for each person visited the most noisy person seen so far. The example you provided might look something like:
3 - 5
/ |
1 - 2 |
/ |
4 --
Traversals:
1 <- 3 <- 5
1 <- 2
4 <- 2
4 <- 5
(Input)
2 1
2 4
3 1
5 3
5 4
8 2 16 26 16
(Queries and solution)
3 4 3 5 5
16 2 16 -1 -1
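To make that update concrete, here is a Python sketch (my own naming, untested against the judge) that propagates the minimum quietness seen so far through the DAG in topological order, so every query is then answered in O(1):

from collections import deque

def most_noisy_wealthier(n, K, pairs, queries):
    """K[i] is the quietness of person i+1; (X, Y) in pairs means X is wealthier than Y.
    Returns, for each queried friend, the quietness of the most noisy (least quiet)
    person who is wealthier, or -1 if nobody is wealthier."""
    adj = [[] for _ in range(n + 1)]         # edges point from wealthier to less wealthy
    indeg = [0] * (n + 1)
    for x, y in pairs:
        adj[x].append(y)
        indeg[y] += 1

    INF = float('inf')
    best = [INF] * (n + 1)                   # min quietness among all wealthier people

    # Kahn's algorithm: every person is processed after all wealthier people.
    queue = deque(v for v in range(1, n + 1) if indeg[v] == 0)
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            best[y] = min(best[y], best[x], K[x - 1])
            indeg[y] -= 1
            if indeg[y] == 0:
                queue.append(y)

    return [best[a] if best[a] != INF else -1 for a in queries]

# The example above: K = 8 2 16 26 16, pairs 2-1, 2-4, 3-1, 5-3, 5-4, queries 3 4 3 5 5.
print(most_noisy_wealthier(5, [8, 2, 16, 26, 16],
                           [(2, 1), (2, 4), (3, 1), (5, 3), (5, 4)],
                           [3, 4, 3, 5, 5]))   # [16, 2, 16, -1, -1]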
I'm having problems understanding the count to infinity for RIP.
I understand how the table is initially set up using distance vectors. But when a link breaks, the costs must be recalculated and updated in the new table. I'm not sure how the (3,3) value is updated after the (4,1). Why would the cost from node 3 to node 3 be 3?
In this example my prof posted, the link between node 3 and node 4 breaks.
The table is the route table for node 4 as the destination.
(x,y) in the table means "I can get to node 4 via x in y steps."
before the break:
N1 can get to N4 via N2 in 3 steps.
N2 can get to N4 via N3 in 2 steps.
N3 can get to N4 via N4 in 1 step.
After the break, N3 no longer knows how to get to N4 directly. The problem is that N2 thinks it knows how to get to N4 in 2 steps, and communicates this to N3. Therefore N3 now thinks that it can get to N4 via N2 in 3 steps, and then the downward spiral begins.
Sorry, I disagree with the answer given in the question itself (I'm not referring to wookie919's answer).
In a real-life implementation, there would not be any count-to-infinity event for the above topology when the link between 3 and 4 goes down. Node 3 will know it and will do route poisoning, informing Node 2 that the link 3<-->4 is down (cost 16, i.e. infinity) and no longer reachable. Node 2 will wait for some time before accepting this info. There is definitely no count to infinity.
Count to infinity only happens when there is a loop in the network topology.
http://www.cs.fsu.edu/~curci/itl/labs/countinf/countinf.htm
I've been trying to trace Dijkstra's shortest path algorithm for the following undirected graph:
         (B)
        /   \
       /     \
      6       9
     /         \
    /           \
   /             \
(A)- 5 -(C)- 1 -(F)----2----(I)
   \             /
    \           /
     4         2
      \       /
       \     /
        \   /
         (D)
For clarification:
(N) will represent nodes, numbers with no formatting will represent weights.
the edge between A and C has a weight of 5,
the edge between C and F has a weight of 1.
I'll outline my process here:
Since A is my initial node, the algorithm begins here. Since D is the cheaper path, the algorithm traverses to D. A is now marked as visited, meaning we cannot traverse to it again.
At D it is easy to see that we will move to F.
F is where I start having trouble. Since the shortest path will lead me to C, I'm stuck between two visited nodes with no way to get to I. Can anyone help me?
EDIT: Sorry about the graph guys, this question was originally asked from a phone. I'll get that fixed asap.
The way you are working on it is wrong. "At D it is easy to see that we will move to F": that is not true. You first visit D, then C, not F. Take a careful look at the algorithm and what it does.
At first you visit A, so you have the following costs: 6 to B, 5 to C, 4 to D and INFINITE for the rest of the nodes.
You first go to D. You now update your cost to go from A to F (passing through D) to 6. Your next node to visit is not F, it is C, as it has the lowest cost (5) of all the unvisited nodes. The cost of going from A to F passing through C is 6, which is already the cost you have, so there is no need to update.
From there you have a tie of 6 between B and F. Let's say you first go to B; then nothing happens, since the shortest path to F is already 6, while passing through B to go to F would cost 15, which is more expensive than the cost you already have, so you don't update the cost. Then you visit F, since it has the lowest cost of all the unvisited nodes. From there you update your path to I, which won't be INFINITE anymore but 8.
As a result, your shortest path from A to I is the sequence A - D - F - I.
From C you still cannot go back to A and you cannot return to F, so this path does not work.
You need to remove C from the graph at the next iteration if it gives you a dead end, or ignore the last step and move from F to I as you would expect.
http://en.wikipedia.org/wiki/Dijkstra's_algorithm - you remember that it processes ALL of a vertex's neighbours before moving on to another vertex, right?
Before moving on to D, it processes C and B (calculates their distances). And judging from your graph, there is no route between D and F.
Dijkstra's algorithm uses a priority queue. It's not a walk on the graph, and it is possible to visit vertices in an order that does not resemble a path. For example, this tree:
A -> B -> C
 \
  > D -> E -> F
with all weights 1 is explored in order A,B,D,C,E,F. Each iteration you visit the vertex with the smallest cost and pop it: initially you pop A, and the costs of B and D are updated to 1; you visit B, and the cost of C is updated to 2; you visit D, and the cost of E is updated to 2; you visit C; you visit E, and the cost of F is updated to 3; and finally you visit F.
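A small sketch of that behaviour (my own code, using Python's heapq as the priority queue) that records the order in which vertices are popped and finalised:

import heapq

def dijkstra_order(graph, source):
    """graph: {vertex: [(neighbour, weight), ...]}. Returns (visit order, distances)."""
    dist = {source: 0}
    order, done = [], set()
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue                          # stale queue entry, already finalised
        done.add(u)
        order.append(u)
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return order, dist

# The tree from this answer, all edge weights 1:
tree = {'A': [('B', 1), ('D', 1)], 'B': [('C', 1)], 'D': [('E', 1)], 'E': [('F', 1)]}
print(dijkstra_order(tree, 'A')[0])           # ['A', 'B', 'D', 'C', 'E', 'F']

# The graph from the question (undirected, so each edge is listed both ways):
g = {'A': [('B', 6), ('C', 5), ('D', 4)], 'B': [('A', 6), ('F', 9)],
     'C': [('A', 5), ('F', 1)], 'D': [('A', 4), ('F', 2)],
     'F': [('B', 9), ('C', 1), ('D', 2), ('I', 2)], 'I': [('F', 2)]}
print(dijkstra_order(g, 'A')[1]['I'])         # 8, e.g. via A - D - F - I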