Should a Raft follower update its term on receiving a vote request with a higher term?

According to the Raft specification, section 5.2 (Leader election): "A server remains in follower state as long as it receives valid RPCs from a leader or candidate." It is clear that if the request is an AppendEntries RPC, the follower should extend its election timeout. But what if the follower receives a RequestVote RPC? Should it reset its timeout? And if the RequestVote RPC's term is greater than the follower's term, should the follower increase its term as well?

If a follower receives a RequestVote RPC with a term that is higher than its current term (and assuming the candidate's log is at least as up-to-date as the receiver's log), the follower will:
- grant the vote to the candidate
- update its term to the one received from the candidate
- remain in the FOLLOWER state
- kick off a new timer for AppendEntries RPCs
The follower should increase its term because, if the timer waiting for AppendEntries RPCs expires, the follower will start a new election with a term greater than the one from the last election it participated in.
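The follower-side behaviour described above can be sketched in plain Python (a language-neutral sketch, not Corda/Kotlin code; `Node` and its fields are illustrative names, not from any Raft library):

```python
# Sketch of a follower handling RequestVote, per Raft section 5.2.
# All names here are illustrative, not from any specific implementation.

FOLLOWER = "FOLLOWER"

class Node:
    def __init__(self):
        self.current_term = 0
        self.voted_for = None
        self.state = FOLLOWER

    def on_request_vote(self, req_term, candidate_id, log_ok):
        """log_ok: the candidate's log is at least as up-to-date as ours."""
        if req_term > self.current_term:
            # Adopt the higher term; this also clears any stale vote.
            self.current_term = req_term
            self.voted_for = None
            self.state = FOLLOWER
        can_vote = self.voted_for in (None, candidate_id)
        if req_term == self.current_term and can_vote and log_ok:
            self.voted_for = candidate_id
            # Granting a vote would also reset the election timer (not shown).
            return (self.current_term, True)
        return (self.current_term, False)

n = Node()
print(n.on_request_vote(5, "c1", log_ok=True))   # (5, True): term adopted, vote granted
print(n.on_request_vote(5, "c2", log_ok=True))   # (5, False): already voted in term 5
```

Note that the term is adopted even when the vote is ultimately denied; a higher term always moves the receiver forward.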

Related

Assume that there are 101 nodes in a Raft cluster. How many responses are then required for an operation to succeed?

I know that the leader's response is required, but are other nodes required to respond in order for the operation to succeed?
It depends on what you mean by "leader's response". The leader accepts a request from a client and sends a response to the client once the new log entry is committed. It is entirely possible that the request is committed but the communication fails and the client is never notified. In that case the client does not actually know the outcome, and it should act accordingly.
To answer the question in the title, the original Raft paper (https://raft.github.io/raft.pdf) is quite clear: "A log entry is committed once the leader that created the entry has replicated it on a majority of the servers" [5.3]. In other words, a majority has to accept the new entry for it to be committed, which is equivalent to "the operation succeeded".
Note that the leader's response to the client is not required for a record to be committed to the log.
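The majority arithmetic is easy to make concrete (a trivial sketch; the leader's own copy of the entry counts toward the majority):

```python
def quorum(cluster_size: int) -> int:
    """Smallest strict majority of the cluster."""
    return cluster_size // 2 + 1

size = 101
majority = quorum(size)        # 51 servers must hold the entry
follower_acks = majority - 1   # the leader's own copy counts, so 50 follower acks
print(majority, follower_acks) # 51 50
```

So for a 101-node cluster, the leader needs acknowledgements from 50 followers before the entry is committed.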

How does a Raft node know it has voted after recovering from a crash?

If a Raft node has voted for some candidate, then crashes before it can persist the vote information, will the server be able to vote again after it restarts?
The way this should work is to persist the vote first, before sending it.
In the worst case, when the candidate does not receive enough votes (because many voters crashed right after persisting, or votes were lost on the network), it simply starts the election again.
Note that the Raft paper (https://raft.github.io/raft.pdf) lists votedFor among the state that must be persisted to stable storage before responding to RPCs.
This can be confirmed with the https://raft.github.io/ visualization.
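The persist-before-reply ordering can be sketched as follows (a minimal illustration; the file format and function names are mine, not from any Raft implementation). A crash between the two steps can only lose the reply, never the vote itself, so the node can never vote twice in the same term:

```python
# Sketch: persist the vote durably, then reply. If we crash after step 1
# but before step 2, the restarted node still remembers its vote.

import json, os, tempfile

def persist_state(path, current_term, voted_for):
    # Write atomically: temp file + fsync + rename.
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d)
    with os.fdopen(fd, "w") as f:
        json.dump({"term": current_term, "voted_for": voted_for}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)

def grant_vote(path, term, candidate_id, send_reply):
    persist_state(path, term, candidate_id)   # 1. durable first
    send_reply(term, True)                    # 2. only then reply

replies = []
grant_vote("raft_state.json", 3, "c1", lambda t, granted: replies.append((t, granted)))
with open("raft_state.json") as f:
    print(json.load(f))   # {'term': 3, 'voted_for': 'c1'}
os.remove("raft_state.json")
```

On restart, the node reloads this file and refuses to grant a second vote for the same term.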

Corda: Making HTTP request within a responder flow?

Is it okay to make HTTP requests to a counter party's external service from within a responder flow?
My use case: a party invokes a "request-token" flow with an exchange node. That exchange node makes an HTTP request (in the responder flow) to move cash from that party's account to an exchange account in the external payment system. The event of the funds actually hitting the account, and hence the issuance of the tokens, would happen in another flow.
If it is not okay, what may be an alternative design to achieve the task?
It is not always a good idea to make an HTTP request that way, unless you think very carefully about what happens when the previous checkpoint is replayed. De-duplication and idempotence are key considerations, as is what happens if the target is down. Such calls may also exhaust the thread pool on which the fibers operate.
Flows run on fibers, whereas CordaServices can spawn their own threads. Threads can block on I/O; fibers can only do so for short periods, and no guarantees are made about freeing resources or ordering, except for the database. Threads can also register observables.
The real challenge is restartability, and for that you need to test your code thoroughly with random kills.
You need to be aware that steps can be replayed in the event of a crash; this is true of any server-side work-based system that restarts work.
Effectively, you should:
Step 1) Execute an on-ledger Corda transaction to move one or more assets into a locked state (analogous to an XA 'prepare').
Step 2) When that transaction is successfully notarised, execute the off-ledger transaction with an idempotent call that succeeds or fails.
Step 3) Once you know whether it succeeded or failed, execute a second Corda transaction that either reverts the status of the asset or moves it to its intended final state.
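The idempotence requirement in the middle step is the crux: a replayed checkpoint must not move the cash twice. A generic sketch, outside any Corda API (the dedupe store and function names are illustrative assumptions):

```python
# Sketch of the idempotent off-ledger call. A replayed checkpoint
# re-issues the call with the same idempotency key, so the payment
# system performs the transfer at most once.

processed = {}   # stand-in for the payment system's dedupe store

def off_ledger_transfer(idempotency_key, amount):
    """Idempotent: replaying with the same key returns the first outcome."""
    if idempotency_key in processed:
        return processed[idempotency_key]
    result = {"status": "ok", "amount": amount}   # pretend external HTTP call
    processed[idempotency_key] = result
    return result

# The flow generates one key per business request, before the first attempt,
# so a crash-and-replay of the call is harmless.
key = "payment-123"
first = off_ledger_transfer(key, 100)
again = off_ledger_transfer(key, 100)   # replay after a crash
assert first is again                   # the transfer happened exactly once
```

Real payment APIs typically support this via an idempotency-key header; the key must be persisted (e.g. in the flow checkpoint) before the first call is made.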

Handle RPC and inter Node communication timeout

I have a Corda private network with 5 nodes and 1 notary node, plus a web client with a UI and RESTful services.
There are quite a few states, and their attributes are managed by users through the UI. I need to understand how to handle timeouts and avoid duplicate updates or errors.
Scenario 1
A user is viewing a specific unconsumed current state of a feature.
The user performs an edit and updates the state.
Upon receiving the request, the RESTful component uses CordaRPCClient to start the flow, setting a timeout value of e.g. 2 seconds.
The Corda flow runs the configured rules and has to collect signatures from all the participating nodes (4). Complete processing takes more than 2 seconds (some file processing, multiple state updates, etc.). I can raise the timeout for specific use cases, but it can still be exceeded at any time, so I need to understand the recommended way of handling this.
As the time taken exceeds the timeout, CordaRPCClient throws an exception. From the RESTful service's and the user's perspective, the transaction has failed.
Behind the scenes, Corda is still processing, collecting signatures and updating nodes. From Corda's perspective everything looks fine and the changed set is committed to the ledger.
Questions:
Is there a way to know that the submitted transaction is in progress, so the RESTful service can wait?
If the user submits again, we check whether the transaction hash is the latest one associated with the unconsumed state and reject it if not (it was provided to the UI while querying).
Is there any recommended way of handling this?
Scenario 2
A user is viewing a specific unconsumed current state of a feature.
The user performs an edit and updates the state.
Upon receiving the request, the RESTful component uses CordaRPCClient to start the flow, setting a timeout value of e.g. 2 seconds.
The Corda flow runs the configured rules and has to collect signatures from all the participating nodes (4). One of the nodes is down or not reachable, so the flow hangs, waiting for the node to come back up.
The RESTful service / UI receives a timeout exception. The user refreshes the view; querying the current node returns the old data, so the user makes the change again and resubmits. At the Corda layer the transaction will again be built on the latest unconsumed state (comparing the tx hash, since the new state was never committed), so it proceeds and also hangs waiting for the node to come back. It waits a long time; I waited for a minute and it did not give up.
Now the node comes up and syncs with its peers. The notary throws an exception, as there are two pending requests trying to form the next state in the chain, and the transaction fails.
Questions:
Is there a way to know that the submitted transaction is in progress, so the RESTful service can wait?
Is there any recommended way of handling this?
Is there a way to provide timeout values for node-to-node communication?
Do I need to keep monitoring whether each node is active and tailor the user experience accordingly?
I appreciate all the help and support with the above issues. Please let me know if any additional information is needed.
Timeouts
As of Corda 3.3, there is no way to set a timeout on a Corda RPC request, a flow, or a message to another node. If another node is down when you try to contact it as part of a flow, the message will simply remain in your outbound message queue until it can be successfully delivered.
Checking flow progress
Each flow has a unique run ID. When you start a flow via RPC (e.g. using CordaRPCOps.startFlowDynamic), you get back a FlowHandle. The flow's unique run ID is then available via FlowHandle.id. Once you have this ID, you can check whether the flow is still in progress by checking whether it is still present in the list of current state machines (i.e. flows):
val flowInProgress = flowHandle.id in cordaRPCOps.stateMachinesSnapshot().map { it.id }
You can also monitor the state machine manager to wait until the flow completes, then get its result:
val flowUpdates = cordaRPCOps.stateMachinesFeed().updates
flowUpdates.subscribe {
    if (it.id == flowHandle.id && it is StateMachineUpdate.Removed) {
        val result = it.result.getOrThrow()
        // Handle result.
    }
}
Handling duplicate requests
The flow will throw an exception if you try to consume the same state twice, either when you query the vault to retrieve the state or when you try to notarise the transaction. I'd suggest letting the user start the flow again, then handling any double-spend errors appropriately and reflecting them on the front-end (e.g. via an error message and an automatic refresh).
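The retry-and-surface pattern can be sketched generically (a language-neutral sketch, not Corda code; `DoubleSpendError` stands in for the notarisation exception a real flow would raise):

```python
# Sketch: let the user resubmit, and translate a double-spend failure
# into a front-end error plus a refresh. All names are illustrative.

class DoubleSpendError(Exception):
    pass

ledger_consumed = set()   # stand-in for states already spent on the ledger

def start_flow(state_ref):
    if state_ref in ledger_consumed:
        raise DoubleSpendError(f"{state_ref} already consumed")
    ledger_consumed.add(state_ref)
    return "committed"

def handle_user_submit(state_ref):
    try:
        return {"ok": True, "result": start_flow(state_ref)}
    except DoubleSpendError as e:
        # Reflect the error on the front-end and trigger a refresh.
        return {"ok": False, "error": str(e), "refresh": True}

print(handle_user_submit("state#1"))  # first submit succeeds
print(handle_user_submit("state#1"))  # duplicate is rejected cleanly
```

The key point is that the duplicate attempt fails deterministically at the notary rather than corrupting anything, so the UI only needs to catch it and refresh.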

Do we really need a notary that validates?

At the risk of sounding naive, I ask myself "Is a validating notary necessary?", given all the issues with it - transaction and dependencies leak, exposure of state model, to name a few.
The answer I hear has something to do with a potential attack where a dishonest node tries to steal someone else's asset. For example, in a legitimate transaction, partyA sold partyB some asset S subject to a Move contract. Immediately afterwards, partyA creates a self-signed transaction that transfers S back to himself subject to a dummy contract, in a bogus flow that does not even run the ledger transaction verify(). But when he calls FinalityFlow to ask the simple notary to commit the transaction to the ledger, it will fail verifyContracts(), because S points to the Move contract, which says that the owner, partyB, must sign the bogus transaction.
So that does not convince me of the need for a validating notary.
Apparently, I must have missed something. Can someone enlighten me?
Thanks.
Sean
As you say, the advantage of the validating notary is that it prevents a malicious party from "wedging" a state - that is, creating an invalid transaction that consumes an unconsumed state that they are aware of. Although the transaction is invalid, the non-validating notary would not be aware of this, and would mark the state as spent.
You're correct that this invalid transaction would fail the FinalityFlow's call to verifyContracts(). However, we cannot force a malicious node to use FinalityFlow to send something to the notary. If the malicious node was sufficiently motivated, they could build the invalid transaction hash by hand and send that to the notary directly, for example.
However, note that in the non-validating notary case, there are still multiple layers of protection against wedging:
The malicious party has to be aware of the state(s) they want to wedge. Since information in Corda is only distributed on a need-to-know basis, the node would only be aware of a small subset of unconsumed states
If the state is wedged by accident, the node can show the invalid transaction to the notary. Upon seeing that the transaction is invalid, the notary will re-mark the state as unconsumed
As Corda is a permissioned network, the notary can keep a record of the legal identity of everyone who submits a transaction. If a malicious node is submitting invalid transactions to consume states, their identity will be known, and the dispute can be resolved outside of the platform, based on the agreements governing participation in the network
On the other hand, SGX addresses the data leak issue associated with validating notaries. If the validation of the transaction occurs within an SGX enclave, the validating notary gains no information about the contents of the transaction they are validating.
Ultimately, the choice between validating and non-validating notaries comes down to the threat model associated with a given deployment. If you are using Corda in a setting where you are sure the participants won't deliberately change their node's code or act maliciously, then a validating notary is not needed.
But if you assume somebody WILL try to cheat and would be willing to write their own code to do so, then a validating notary provides an extra layer of protection.
So Corda provides choices:
Choose to reveal more to a notary cluster if you trust the participants relatively less...
Choose to reveal less to a notary cluster if you consider the risk of revealing too much to the notaries to be the bigger problem
(and use SGX if you're paranoid about everybody!)
