In step (b) of the picture, S5 is elected leader with term 3. Where does this 3 come from?
In the paper, when a follower's wait for a message from the leader times out, it increments its term and becomes a candidate. So I think the term should be 2, and S5 can still win the election and become leader, because nodes S3 and S4 will vote for it.
The description of the picture has a single indicator for the term: at step B the term is 3, which means that at step A the term is 2. It is as simple as that. Based on the description, it seems this picture clarifies some previous example, and in that example the term got to 2 somehow.
A few observations on the picture to avoid any confusion.
The picture itself has no indication of the current term for either step. The horizontal numbers are indexes, not terms. The picture would be better if the author used [term, value] for each index.
We know the term is 3 at step B. This means that the term became 4 when S1 was re-elected. The rest of the picture explains why d1 won't happen, and why value (2), accepted by S1 in term 4 and accepted by a majority, won't ever be lost. That is the main property of the protocol: if a value is accepted by a majority, the value won't be lost.
Quick note: a value accepted by a majority won't be lost, but the term of that value may change. In other words, every index contains a tuple [term, value]. If a specific value is accepted by a majority, then the leader and a majority of followers will have the same [term, value] for the same index. If the leader fails before issuing the commit, then a new leader will emerge and will re-propose the same value with a new term. So as soon as a value is accepted by a majority, the value is there to stay, even if the term changes.
Will the partitioned server increase term all the time?
If so, I get another confusion.
Chapter 3.6 (Safety) in the Raft paper says:
Raft determines which of two logs is more up-to-date by comparing the index and term of the last entries in the logs. If the logs have last entries with different terms, then the log with the later term is more up-to-date. If the logs end with the same term, then whichever log is longer is more up-to-date.
It got me thinking about a scenario in which a server from a partitioned network wins the election because of its huge term, and then causes inconsistency. Can that happen?
Edit part:
When a node with a larger term rejoins the cluster, that will force the current leader to step down. When a leader sends a request to a follower that has a larger term, the follower rejects the request and the leader sees the larger term in the reply; that forces the leader to step down, and a new election will happen.
Some Raft implementations have an extra step before a follower becomes a candidate: if a follower does not hear from the leader, it tries to connect to the other followers, and only if there is a quorum does the follower become a candidate.
(I read your link), and yes, an implementation may keep increasing the term, or be more practical and wait until a majority is reachable. There is no point in initiating an election if no majority is available.
I decided to comment because of this line in your question: "win the election because of the huge term". A long time ago, when I read the Raft paper (https://raft.github.io/raft.pdf), I was quite confused by the election process.
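The step-down rule described above can be sketched as follows. This is a minimal illustration, not code from any Raft implementation: the `Node` class, the `append_entries` function, and the terms 2 and 12 are all hypothetical, and the log payload is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical, stripped-down server state: no log, no timers.
    name: str
    current_term: int
    role: str = "follower"

    def observe_term(self, term: int) -> None:
        """Adopt any larger term seen in a request or reply and become a follower."""
        if term > self.current_term:
            self.current_term = term
            self.role = "follower"


def append_entries(leader: Node, follower: Node) -> bool:
    """Heartbeat without entries. Returns True if the follower accepts it."""
    if leader.current_term < follower.current_term:
        # The request is rejected; the reply carries the follower's larger
        # term, and seeing it forces the sender to step down.
        leader.observe_term(follower.current_term)
        return False
    # Request accepted: the receiver recognises the sender as leader.
    follower.observe_term(leader.current_term)
    follower.role = "follower"
    return True


leader = Node("A", current_term=2, role="leader")
rejoined = Node("C", current_term=12, role="candidate")
accepted = append_entries(leader, rejoined)
print(accepted, leader.role, leader.current_term)
```

After the rejected request, the old leader is a follower on the larger term, so the next election has to start from that term.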
Section 5.2 talks about leader election, and it has these words: "Each server will vote for at most one candidate in a given term, on a first-come-first-served basis (note: Section 5.4 adds an additional restriction on votes)."
Section 5.4: "The previous sections described how Raft elects leaders and replicates log entries. However, the mechanisms described so far are not quite sufficient to ensure that each state machine executes exactly the same commands in the same order..."
Basically if a reader reads the paper section by section, and stops to think after each of them, then the reader will be a bit confused. At least I was :)
As a general conclusion: the absolute value of the new term does not make any difference. It is only used to reject older terms. For the actual leader election (and the start of a new term), it is the state of the log that matters.
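A rough sketch of that conclusion, assuming a simplified voter that stores only the term of each log entry. The `Voter` class and its method names are made up for illustration; the comparison itself follows the "more up-to-date" rule quoted from the paper (later last term wins, then longer log).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Voter:
    current_term: int
    log_terms: List[int] = field(default_factory=list)  # term of each entry, in index order
    voted_for: Optional[str] = None

    def last_log(self) -> tuple:
        """(term of last entry, index of last entry); (0, 0) for an empty log."""
        return (self.log_terms[-1], len(self.log_terms)) if self.log_terms else (0, 0)

    def handle_request_vote(self, candidate_id: str, term: int,
                            last_log_index: int, last_log_term: int) -> bool:
        if term < self.current_term:
            return False  # stale candidate: this is all the term is used for
        if term > self.current_term:
            # A larger term is always adopted, but it does not buy a vote.
            self.current_term = term
            self.voted_for = None
        # Election restriction: candidate's log must be at least as up-to-date.
        log_ok = (last_log_term, last_log_index) >= self.last_log()
        if log_ok and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


voter = Voter(current_term=2, log_terms=[1, 2, 2])
# Huge term, but a short and old log: vote denied.
huge_term_vote = voter.handle_request_vote("C", term=50, last_log_index=1, last_log_term=1)
# Same term, log as up-to-date as the voter's: vote granted.
good_log_vote = voter.handle_request_vote("B", term=50, last_log_index=3, last_log_term=2)
print(huge_term_vote, good_log_vote, voter.current_term)
```

The voter does move to term 50 in both cases, which is why a large term can disturb the cluster without ever electing the node that carries it.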
Consider running Raft on 3 machines, A, B, and C, with A as the leader. A network partition splits C from A and B. Say the current term is 2. A and B remain on term 2, with no additional messages besides periodic heartbeats. Meanwhile, C enters the candidate state, increments its term to 3, votes for itself, times out, and repeats. After, say, 10 cycles, the network partition is resolved. Now the state is A[2], B[2], C[12]. C will reject AppendEntries RPCs from A because term 2 is less than its current term, 12. C cannot assemble a quorum, so it will continue to run the leader election protocol as a candidate, and its term will diverge further and further from the current term of A and B.
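The term arithmetic in this scenario can be sketched as below. The function name and parameters are hypothetical, and the sketch assumes, as a simplification, that every reachable peer would grant its vote; with no reachable peers that assumption never comes into play.

```python
def run_isolated_candidate(start_term: int, timeouts: int,
                           cluster_size: int, reachable_peers: int = 0):
    """Replay repeated election timeouts and return the final (role, term)."""
    quorum = cluster_size // 2 + 1
    term, role = start_term, "follower"
    for _ in range(timeouts):
        term += 1                    # each timeout starts a new election in a new term
        role = "candidate"
        votes = 1 + reachable_peers  # own vote plus (assumed) votes from reachable peers
        if votes >= quorum:
            return "leader", term
    return role, term


role, term = run_isolated_candidate(start_term=2, timeouts=10, cluster_size=3)
print(role, term)
```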
The question is then, how does Raft (or Raft-derived implementations) handle this issue? Some thoughts I had included:
Such a situation is an availability issue, rather than a safety violation. Ignore and let human operators handle by killing or resetting C
Exponential backoff to decrease the divergence of C per elections
Have C use lastApplied instead of currentTerm as the basis for rejecting or accepting the AppendEntries RPC. That is, we trust the log as the source of truth for terms, rather than the currentTerm value. This is already used to ensure that C would not win, as per the Election Restriction. However, the paper seems to indicate that this "up-to-date" property is grounds for not voting for C, but is not grounds for C to acquiesce and reset to a follower.
Note: terminology as per In Search of an Understandable Consensus Algorithm (Extended Version)
When C rejects an AppendEntries RPC from the leader A, it will return its own term, which is now greater than 2. Raft replicas always recognize greater terms, so that in turn will cause the leader to step down and start a new election. Eventually, the cluster will converge on a new term that’s > 2 and which is >= C’s term.
This is an oft-discussed (in the Raft dev community) and somewhat inconvenient scenario that can cause unnecessary churn in Raft clusters. To guard against it, the Raft dissertation, along with most real-world implementations, introduces the so-called “pre-vote protocol.” The pre-vote protocol essentially dictates that before becoming a candidate, a follower must first determine whether it can win an election by asking its peers. In the scenario you described above, C would ask for a pre-vote from A and B, and because of the network partition it would not receive any votes. So, C would never transition to the candidate role, never increment the term, and thus never present a term > 2 after the partition heals. Thus, you’ve eliminated the churn.
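A minimal sketch of the pre-vote idea, under stated assumptions: the `Server` class, `would_vote_for`, and `on_election_timeout` are invented names, the log is reduced to a `(last_log_term, last_log_index)` pair, and a peer's willingness to vote is modelled only as "the proposed term is not stale and the candidate's log is at least as up-to-date". Real implementations may add further conditions.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Server:
    name: str
    current_term: int
    last_log: Tuple[int, int]  # (last_log_term, last_log_index)
    role: str = "follower"

    def would_vote_for(self, proposed_term: int, candidate_last_log: Tuple[int, int]) -> bool:
        """Answer a pre-vote request. Deliberately changes no state on the voter."""
        return proposed_term >= self.current_term and candidate_last_log >= self.last_log


def on_election_timeout(server: Server, reachable_peers: Iterable[Server],
                        cluster_size: int) -> bool:
    """Only bump the term and become a candidate if a majority would vote for us."""
    proposed = server.current_term + 1
    pre_votes = 1 + sum(p.would_vote_for(proposed, server.last_log) for p in reachable_peers)
    if pre_votes >= cluster_size // 2 + 1:
        server.current_term = proposed
        server.role = "candidate"
        return True
    return False


c = Server("C", current_term=2, last_log=(2, 5))
# Partitioned: ten timeouts, nobody reachable, so the term never moves.
for _ in range(10):
    on_election_timeout(c, reachable_peers=[], cluster_size=3)
print(c.role, c.current_term)
```

With this check in place, C comes back from the partition still on term 2 and simply accepts A's next heartbeat.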
You can read more about the pre-vote protocol in Diego’s dissertation.
I'm attempting to develop a better intuition of information entropy. The example case I would like to look at is a symbolic graph where the leaves are more specific entity types than the root.
If each node in the graph can be recorded using the same number of bits, e.g. a 64-bit id handle, would the more specific 'Jack Pine' node be considered to be higher or lower entropy than the 'Tree' node?
Update 12 hours after post: Perhaps this concept can't be understood with entropy. Assume the symbol 'jack pine' has more information in it than the symbol 'tree', yet can be transmitted using the same number of bits. In this case the observer receiving the symbol is decompressing the information in their mind based on previous knowledge, so receiving the symbol for 'jack pine' gives them more information than receiving the symbol 'tree', because they understand a specific type of tree. Does this mean 'jack pine' is lower entropy because it has more information for a given signal and thus higher compression?
If I assume that each branch has equal probability, then going from tree to a kind of tree adds one bit of entropy. Going from a pine tree to a kind of pine tree adds another bit of entropy. Knowing that it is a Jack Pine has two bits more entropy than just knowing that it is a Tree.
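The arithmetic behind that answer can be checked with a few lines. The sketch assumes exactly two equally likely children at each level (that is what makes each step worth one bit); the function name and the intermediate 'Pine' node are illustrative only.

```python
import math


def bits_to_pick_one(equally_likely_branches: int) -> float:
    """Information gained by learning which of n equally likely branches was taken."""
    return math.log2(equally_likely_branches)


# (parent, child, number of equally likely children of the parent) -- assumed values
path = [("Tree", "Pine", 2), ("Pine", "Jack Pine", 2)]
total_bits = sum(bits_to_pick_one(n) for _, _, n in path)

# Same result from the probability of reaching the leaf: 1/2 * 1/2 = 1/4.
p_jack_pine = 0.5 * 0.5
surprisal_bits = -math.log2(p_jack_pine)

print(total_bits, surprisal_bits)
```

With a different branching factor the per-level figure changes (for example, four equally likely children give two bits per level), but the totals still add along the path.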
Suppose a cluster of 5 nodes (A, B, C, D, E). Node A is elected leader at the beginning. While leader A issues AppendEntries RPCs to the followers (B, C, D, E) to replicate a log entry (log-X), only node B receives it and returns success. At this point leader A crashes.
If node C (or D or E) wins the next leader election then everything's fine, because only node B has log-X, and that means log-X is not committed.
My question is, could node-B (which has the highest term and longest log) win the next leader election? If so, will node-B spread log-X to other nodes?
Yes, B could win the election. If it does become leader, then the first thing it does is create a log entry with the new term in its log and start replicating its log to all the followers. As B's log includes log-X, if all goes well, the log-X entry will eventually be replicated and considered committed.
If node C wins the election, then when it becomes leader, it won't have the log-X entry, and it'll end up overwriting that entry on Node B.
See section 5.4.2 of the raft paper for more details.
Also this means you can't treat failure as meaning the attempted entry definitely doesn't exist, only that the caller doesn't know the outcome. Section 8 has some suggestions for handling this.
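Both election outcomes described above can be checked with a small tally. This is a sketch under assumptions that are not in the question: A led in term 1, log-X is the only entry (index 1, term 1), C, D and E have empty logs, A stays down, and every live node is free to vote in the election being counted. The helper names are hypothetical.

```python
def last_entry(log_terms):
    """(term of last entry, index of last entry); (0, 0) for an empty log."""
    return (log_terms[-1], len(log_terms)) if log_terms else (0, 0)


def votes_for(candidate, logs):
    """Own vote plus every live peer whose log is not more up-to-date than the candidate's."""
    mine = last_entry(logs[candidate])
    return 1 + sum(1 for name, log in logs.items()
                   if name != candidate and mine >= last_entry(log))


CLUSTER_SIZE = 5
QUORUM = CLUSTER_SIZE // 2 + 1       # 3 of 5, even though A has crashed

# Logs of the live nodes, as lists of entry terms; B alone holds log-X.
logs = {"B": [1], "C": [], "D": [], "E": []}

b_votes = votes_for("B", logs)       # C, D and E can all vote for B
c_votes = votes_for("C", logs)       # D and E can vote for C; B refuses
print(b_votes >= QUORUM, c_votes >= QUORUM)
```

Either node can reach a quorum, which is why the caller cannot tell from A's failure alone whether log-X will survive.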
A DisjointSet is a kind of Object.
A DisjointSet is a part of every DisjointSet.
I'm trying to set up a DisjointSet object in Inform 7. Even though I know it could cause an infinite loop, I want to do it anyway, for the obvious reason that it is part of the algorithm.
Here is the error
You wrote 'A DisjointSet is a part of every DisjointSet' : but this
generalisation would be too dangerous, because it would lead to
infinite regress in the assembly process. Sometimes this happens if
you have set up matters with text like 'A container is in every
container.'.
I suppose it would be easier to do something like this in Inform 6, but I don't have any knowledge of it, so I am trying to avoid it. I will accept help in Inform 6 too, though.
Edit following #jeroen-mostert advice:
Maybe I'm doing it wrong, but maybe some sample code might help.
A DisjointSet is a kind of Container.
A DisjointSet always contains a DisjointSet called the Parent.
The First Decl is a DisjointSet.
The Second Decl is a DisjointSet.
The Parent of the First Decl is the Second Decl. [This line doesn't work.]
The sentence 'The Parent of the First Decl is the Second Decl'
appears to say two things are the same - I am reading 'Parent of the
First Decl' and 'Second Decl' as two different things, and therefore
it makes no sense to say that one is the other: it would be like
saying that 'St Peter is St Paul'. It would be all right if the second
thing were the name of a kind, perhaps with properties: for instance
'Pearly Gates is a lighted room' says that something called Pearly
Gates exists and that it is a 'room', which is a kind I know about,
combined with a property called 'lighted' which I also know about.
I'll give two answers. First, the question you asked:
Linked lists in Inform 7
The problem is that "contains" and "is a part of" indicate physical concepts. When you use those words, Inform thinks you're talking about matter in the fictional universe, as if you said, "every bucket contains a bucket".
Instead, define your own property that has nothing to do with Inform's physical world model:
Every DisjointSet has a DisjointSet called the Parent.
With this change, your code works.
Now on to the question you didn't ask:
Relations in groups
If what you want is a bunch of sets of objects, where the sets are all disjoint, use an equivalence relation:
Friendship relates people to each other in groups.
This defines a relation called "friendship" that divides all people (a kind which, in Inform, includes animals) into disjoint sets, such that an animal is friends with the other animals in its set, and not friends with any other animal.
Then you must teach Inform a bit of vocabulary:
The verb to be friends with means the friendship relation.
After that, the phrase "X is friends with Y" means that X and Y are in the same friendship set. You can say things like "Now the badger is friends with the giant squid" to update the sets.
See "Relations in groups" in the manual.
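For readers who want to see the underlying algorithm outside Inform, here is a minimal union-find sketch in Python using the same "each set member has a parent" idea as the Parent property above. It is an illustration only: the class and method names are made up, and merging two sets on `union` is this sketch's behaviour, which is not claimed to match how Inform updates a relation in groups.

```python
class DisjointSets:
    """Union-find over hashable items, with path compression."""

    def __init__(self):
        self.parent = {}  # item -> parent item; a root is its own parent

    def find(self, x):
        """Return the representative (root) of the set containing x."""
        self.parent.setdefault(x, x)  # unseen items start in their own set
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge the sets containing a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b

    def same_set(self, a, b):
        return self.find(a) == self.find(b)


friends = DisjointSets()
friends.union("badger", "giant squid")
friends.union("giant squid", "otter")
print(friends.same_set("badger", "otter"), friends.same_set("badger", "heron"))
```

The self-referential parent pointer is harmless here because it is plain data rather than physical containment, which is the same distinction the first answer draws for Inform.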