Rules and predicates that help with maintaining the invariant properties of the Block Tree data structure.
The rustdoc for this module is divided into three sections:
- Invariants clarifies the notion of block tree invariants and groups them into two categories depending on whether they are “local” or “global” in nature.
- Methods lists the methods defined in this module and groups them into two categories depending on whether they deal with “whether” questions about state updates or “what” questions.
- Finally, Blockchain Consistency discusses HotStuff-rs’ overarching global invariant, “blockchain consistency”, and how the methods in this module work together to enforce it.
§Invariants
In the context of this module, invariants are logical properties that are always true about the Block Tree.
Block tree invariants can be grouped into two categories depending on their scope:
- Local Invariants: invariants that pertain to isolated parts of the block tree. An example of a local invariant is the invariant enforced by `safe_nudge` that `nudge.justify.phase` must be either `Prepare`, `Precommit`, or `Commit`.
- Global Invariants: invariants that relate different parts of the block tree. An example of a global invariant is the invariant enforced by `safe_pc` that either: (i) `pc.block` extends from `block_tree.locked_pc()?.block`, or (ii) `pc.view` is greater than `block_tree.locked_pc()?.view`.
Some simple local invariants can be enforced by the type system at compile time and therefore do not need to be checked at runtime. For example, the typedef of `Phase` automatically guarantees the invariant that `nudge.justify.phase` can only be one of five values: `Generic`, `Prepare`, `Precommit`, `Commit`, and `Decide`.
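To illustrate, here is a sketch of such a typedef (the variant names mirror the doc; the real definition lives in HotStuff-rs' types, and the helper method below is invented for illustration):

```rust
// Hypothetical sketch of the `Phase` enum described above: because the enum
// has exactly five variants, `nudge.justify.phase` can never hold any other
// value, with no runtime check needed.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Phase {
    Generic,
    Prepare,
    Precommit,
    Commit,
    Decide,
}

impl Phase {
    /// The local invariant checked at runtime by `safe_nudge`: a nudge's
    /// justify must be a `Prepare`, `Precommit`, or `Commit` PC.
    /// (Illustrative helper, not part of the real API.)
    pub fn allowed_in_nudge(self) -> bool {
        matches!(self, Phase::Prepare | Phase::Precommit | Phase::Commit)
    }
}
```

The compile-time guarantee ("one of five values") comes for free from the enum; the narrower restriction to three of those values is the kind of local invariant that still needs a runtime check.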
More complicated invariants, however (including both global invariants and more complicated local invariants, as illustrated by the `safe_nudge` and `safe_pc` examples above), can only be enforced by runtime logic. This is where the methods in this module come in.
§Methods
The methods in this module each help enforce a combination of local and global block tree invariants. Specifically, they do this by ensuring that every block tree update, i.e., every set of state mutations done by the top-level updater methods defined on the `BlockTreeSingleton` struct, is invariant-preserving. This idea can be summarized in simple formulaic terms as: a block tree that satisfies invariants + an invariant-preserving update = an updated block tree that also satisfies invariants.
Each method works to ensure that every update is invariant-preserving in one of two ways:
- By checking whether an event (like receiving a `Proposal` or collecting a `PhaseCertificate`) can trigger an invariant-preserving update, or
- By determining what invariant-preserving updates should be made in response to an event.

These two ways of ensuring invariant preservation allow us to group the methods in this module into the two categories discussed in the following subsections.
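The "whether"/"what" split can be sketched in simplified form. All types and names below are invented for illustration and are not HotStuff-rs' actual API:

```rust
// Toy block tree with a single piece of state, to illustrate the pattern.
struct BlockTree {
    highest_committed_height: u64,
}

// "Whether": a predicate that decides if an event may trigger an
// invariant-preserving update at all.
fn safe_event(tree: &BlockTree, event_height: u64) -> bool {
    event_height > tree.highest_committed_height
}

// "What": given an admissible event, compute and apply the update.
// Because the predicate is checked first, an invariant-satisfying tree
// plus this update yields an invariant-satisfying tree.
fn update(tree: &mut BlockTree, event_height: u64) -> bool {
    if !safe_event(tree, event_height) {
        return false; // rejected: the update would break the invariant
    }
    tree.highest_committed_height = event_height;
    true
}
```

The real predicates (`safe_pc`, `safe_block`, `safe_nudge`) and update-computing methods (`pc_to_lock`, `block_to_commit`) are considerably richer, but compose in this same check-then-apply shape.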
Before reading the following subsections, please first note that not every top-level updater method directly uses or is related to the methods in this module. In particular, `set_highest_tc`, `set_highest_view_entered`, and `set_highest_view_voted` have preconditions simple enough that they do not need functions of the "whether" category in this module, and perform state updates simple enough that they do not need functions of the "what" category either. The methods in this module only directly relate to the `insert` and `update` top-level state mutators.
§Category 1: “whether”
Methods in this category: `safe_pc`, `safe_block`, `safe_nudge`, and (outlier) `repropose_block`.
These methods check whether a `PhaseCertificate`, `Block`, or `Nudge` (respectively) can trigger invariant-preserving state updates. Methods in this category feature in the preconditions of `insert` and `update`.
We also include in this category the method called `repropose_block`. This method does not fit neatly into this category in terms of name or purpose, but is closely related to `safe_nudge` in that it helps proposers choose a block to propose that satisfies the "consecutive views" requirement that `safe_nudge` checks.
§Category 2: “what”
Methods in this category: `pc_to_lock`, `block_to_commit`.
These methods help determine what invariant-preserving state updates should be triggered in `update` in response to obtaining a `PhaseCertificate`, whether through receiving a `Proposal`, `Nudge`, or `NewView` message, or by collecting enough `PhaseVote`s. Methods in this category are called inside `update`.
§Blockchain Consistency
The most important global invariant that HotStuff-rs guarantees is called “Blockchain Consistency”. Blockchain consistency is the property that the block trees of all honest replicas are identical below a certain block height.
This "certain block height" is exactly the height of the Highest Committed Block. Below this height, the block tree (a directed acyclic graph of blocks) is reduced to a blockchain: a directed acyclic graph of blocks with the restriction that every block has exactly one inward edge (formed by the `justify` of its child) and one outward edge (formed by its `justify` to its parent).
The blockchain grows as more and more blocks are committed. Committing is a state update whereby a block (and transitively, all of its ancestors) become part of the permanent blockchain. Critically, committing is a one-way update: once a block becomes committed, it can never be un-committed. Because of this, it is essential that the protocol is designed such that a replica only commits when it is sure that all other honest replicas have either also committed the block, or will eventually.
This section explains how the methods in this module work together to maintain blockchain consistency. The discussion is split into two parts:
- First, Locking discusses an intermediate state that blocks enter after being inserted but before being committed, that is, being “Locked”.
- Then, Committing discusses how blocks move from being locked to being committed.
§Locking
Before a block is committed, its branch must first be locked.
Locking ensures that an honest replica only commits a block when either of the following two conditions hold:
- All other honest replicas have also committed the block, in which case the commit is trivially consistent, or
- If not all honest replicas have committed the block, then a quorum of replicas is currently locked on the block, which makes it impossible for a PC for a conflicting block to be formed.
The consequence of condition 2 is that condition 1 will eventually hold, making the block safe to commit.
Locking entails keeping track of a block tree variable called “Locked PC” and doing two things with it:
- Updating the Locked PC whenever it is appropriate, according to the logic implemented by `pc_to_lock`, and
- Checking every PC received or collected against the Locked PC. Only PCs that pass this check and therefore "satisfy the lock" are allowed to cause state updates.
Updating the Locked PC and checking against the Locked PC are discussed in turn in the following two subsections:
§Locking on a Block
Any time `update` is called, the `locked_pc` should potentially be updated. The `pc_to_lock` method in this module decides what the Locked PC should be updated to.

The precise logic used by `pc_to_lock` to decide which PC to lock on is documented in the doc for `pc_to_lock`. In short, the basic logic for choosing which PC to lock on in HotStuff-rs is the same as the basic logic in the PODC '19 HotStuff paper, that is, "lock on seeing a 2-Chain".
The basic logic is already well-documented in the PODC ’19 paper, so for brevity, we do not re-describe it here. Instead, in the rest of this subsection, we describe a small but nuanced difference in the precise logic, and then explain the rationale behind this difference:
In both HotStuff-rs and PODC '19 HotStuff, which PC `pc_to_lock` decides to lock on upon receiving `justify` depends on what `justify.phase` is:

- If `justify.phase` is `Generic`, `Prepare`, or `Precommit`, `pc_to_lock`'s decision rule is exactly the same as the decision rule used in the algorithm in the original (PODC '19) HotStuff paper that corresponds to the operating mode that the `Phase` is part of (recall that the Phased Mode corresponds to Algorithm 1, while the Pipelined Mode corresponds to Algorithm 3).
- On the other hand, if `justify.phase` is `Commit` or `Decide`, `pc_to_lock` will decide to lock on `justify` (as long as the current `locked_pc.block` is different from `justify.block`). This is different from the logic used in Algorithm 1 in the original HotStuff paper, which does not update `locked_pc` upon receiving a `Commit` PC (there is no phase called `Decide` in the original HotStuff paper).
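One plausible reading of these rules, as a simplified sketch. The types, fields, and the `parent_justify` parameter (standing in for `justify.block.justify`, which real code would look up in the block tree) are invented for illustration; `pc_to_lock`'s own doc is the authoritative statement of the logic:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Phase { Generic, Prepare, Precommit, Commit, Decide }

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Pc { block: u64, view: u64, phase: Phase }

/// Decide which PC (if any) the Locked PC should be updated to after
/// seeing `justify`. `parent_justify` stands in for `justify.block.justify`,
/// the PC one link further down the chain.
fn pc_to_lock(justify: Pc, parent_justify: Pc, locked_pc: Pc) -> Option<Pc> {
    let candidate = match justify.phase {
        // Pipelined Mode: "lock on seeing a 2-Chain", i.e., lock on the
        // justify of justify's block.
        Phase::Generic => parent_justify,
        // Phased Mode, as in Basic HotStuff: a Prepare PC does not move
        // the lock, while a Precommit PC is locked on directly.
        Phase::Prepare => return None,
        Phase::Precommit => justify,
        // HotStuff-rs' deviation: also lock on Commit and Decide PCs, so
        // a replica that syncs only a Commit/Decide PC still locks.
        Phase::Commit | Phase::Decide => justify,
    };
    if candidate.block != locked_pc.block {
        Some(candidate)
    } else {
        None // already locked on this block; nothing to update
    }
}
```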
The reason why PODC '19 HotStuff does not lock upon receiving a `Commit` or `Decide` PC while HotStuff-rs does becomes clearer when we consider that the original HotStuff makes the simplifying assumption that receiving any proposal implies that we have received every proposal in the chain that precedes it. E.g., receiving a proposal for a block at height 10 means that we (the replica) have previously received a complete set of proposals for the ancestor blocks at heights 0..9, including for every phase.
This assumption simplifies the specification of the algorithm, and is one that is made by many publications. However, it is difficult to uphold in a production setting, where messages are often dropped. HotStuff-rs' Block Sync goes some way toward making this assumption hold, but is not perfect: in particular, Sync Servers only send their single current `highest_pc` in their `SyncResponse`s, which could be a PC of any phase, from `Generic` up to `Decide`.
This means that if we use the same logic as used in Algorithm 1 to decide which PC to lock on upon receiving a Phased Mode PC, i.e., to lock only if `justify.phase == Precommit`, then we will fail to lock on `justify.block` if `justify.phase` is `Commit` or `Decide`, which can lead to safety violations because the next block may then extend a conflicting branch.
Because extending the Block Sync protocol to return multiple PCs in a `SyncResponse` could be complicated (in particular, it would probably require replicas to store extra state), we instead solve this issue by deviating from PODC '19 HotStuff slightly: we lock upon receiving a `Commit` or `Decide` PC.
§Checking against the Lock
The 3rd predicate of `safe_pc` checks whether any received or collected PC satisfies the lock and is therefore allowed to trigger state updates. This predicate is exactly the same as the corresponding predicate in the PODC '19 HotStuff paper, but is simple enough that we describe it and the rationale behind it fully in the rest of this subsection.
The 3rd predicate comprises two clauses, joined by an "or":
- Safety clause: `pc.block` extends from `locked_pc.block`, or
- Liveness clause: `pc.view` is greater than `locked_pc.view`.
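As a minimal sketch of this predicate (the ancestry check is abstracted into a boolean parameter here; real code would walk the block tree, and the types are invented for illustration):

```rust
#[derive(Clone, Copy)]
struct Pc { block: u64, view: u64 }

/// The 3rd predicate of `safe_pc`, sketched: a PC satisfies the lock if it
/// passes either the safety clause or the liveness clause.
fn satisfies_lock(pc: Pc, locked_pc: Pc, pc_extends_locked_block: bool) -> bool {
    // Safety clause: pc.block extends from locked_pc.block, or
    // Liveness clause: pc.view is greater than locked_pc.view.
    pc_extends_locked_block || pc.view > locked_pc.view
}
```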
In stable cases (i.e., cases where in every view, either 1/3rd of replicas lock on the same `locked_pc`, or none lock), the safety clause will always be satisfied. This ensures that the `pc` extends the branch headed by the locked block.
In unstable cases, however, where e.g., messages are dropped or a proposer is faulty, less than 1/3rd but more than zero replicas may lock on the same `locked_pc`. If, in this scenario, `safe_pc` comprised only the safety clause, and a `Block` or `Nudge` that conflicts with `locked_pc` were proposed in the next view, only replicas that didn't lock on `locked_pc` in the previous view would be able to accept the new `Block` or `Nudge` and make progress, while the replicas that did lock would be stuck, unable to grow their blockchain further.
This is where the liveness clause comes in. This clause enables the replicas that did lock on the now "abandoned" PC to eventually accept new `Block`s and `Nudge`s, and does so by relaxing the third predicate to allow `Block`s and `Nudge`s that build on a different branch than the current `locked_pc.block` to cause state updates, as long as the PC they contain has a higher view than `locked_pc.view`.
§Committing
As is the case with Locking, Committing in HotStuff-rs follows the same basic logic as committing in PODC ’19 HotStuff, but with a small and nuanced difference. The following two subsections discuss, in turn:
- The conditions in which a block becomes committed, one of the conditions being a “consecutive views requirement” that is more relaxed than the “same views requirement” used in Algorithm 1 of PODC ’19 HotStuff.
- How the algorithm requires that replicas re-propose existing blocks in certain conditions in order to satisfy the consecutive views requirement while still achieving Immediacy.
§Committing a Block
Any time `update` is called, blocks should potentially be committed. The `block_to_commit` method in this module decides what blocks should be committed.
Like with `pc_to_lock`, the precise logic used by `block_to_commit` is documented in the doc for `block_to_commit`. Again, the logic used for choosing which block to commit in HotStuff-rs is broadly similar to the logic used for choosing which block to commit in the PODC '19 HotStuff paper.
In particular, the logic used in HotStuff-rs’ Pipelined Mode is the same as the logic used in Algorithm 3 in PODC ’19 HotStuff; that is, a block should be committed in the Pipelined Mode when it meets two requirements:
- 3-Chain: the block must head a sequence of 3 PCs.
- Consecutive views: the 3 PCs that follow the block must have consecutively increasing views, i.e., `justify3.view == justify2.view + 1 == justify1.view + 2`, where `justify3.block.justify == justify2`, `justify2.block.justify == justify1`, and `justify1.block == block`.
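The consecutive views requirement can be sketched as a check over the views of the three PCs. This is a simplified stand-in: the real `block_to_commit` also has to walk the block tree to find the 3-Chain itself:

```rust
/// Pipelined Mode commit rule, sketched: `justify1` is the PC directly
/// justifying the candidate block, and `justify2` and `justify3` are the
/// next two PCs up the chain.
fn meets_consecutive_views(justify1_view: u64, justify2_view: u64, justify3_view: u64) -> bool {
    // justify3.view == justify2.view + 1 == justify1.view + 2
    justify2_view == justify1_view + 1 && justify3_view == justify2_view + 1
}
```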
The nuanced difference between HotStuff-rs and PODC '19 HotStuff with regards to `block_to_commit` logic has to do with the Phased Mode. Specifically, the difference is that PODC '19's Algorithm 1 requires that the `Prepare`, `Precommit`, and `Commit` PCs that follow a block have the same view number in order for this sequence of PCs to commit the block, whereas HotStuff-rs' Phased Mode requires only that these PCs have consecutive view numbers, just like Pipelined Mode and Algorithm 3.
The underlying reason why the same view requirement is used in PODC '19's Algorithm 1, while the strictly less stringent consecutive views requirement is used in Phased Mode, comes down to one specific difference between these two algorithms:
- In Algorithm 1, each view is comprised of 3 phases.
- In Phased Mode, each view is comprised of only 1 phase.
The result is that in Phased Mode, `Prepare`, `Precommit`, and `Commit` PCs can never have the same view number, so if "same view" were a requirement to commit a block in Phased Mode, no block could ever be committed.
The consecutive views requirement relaxes `block_to_commit` enough for blocks to be committed, but does not relax it so far that it would obviate the uniqueness guarantee provided by locking.
Consider what could happen if we had instead, for example, relaxed the requirement further to just "increasing views", and a replica commits a block upon receiving `Prepare`, `Precommit`, and `Commit` PCs for the block with views 4, 5, and 7. Because 5 and 7 are not consecutive, it could be the case that in view 6, a quorum of replicas locked on a conflicting block, so it would be incorrect to assume that a quorum of replicas is currently locked on the block, and it is therefore unsafe in this situation to commit the block.
§Ensuring Immediacy
Recall that Immediacy requires validator-set-updating blocks to be committed by a `Commit` PC before a direct child can be inserted. This requirement, combined with the consecutive views requirement, creates a challenge for proposers.
Normally, proposers query the `highest_pc` and broadcast a `Proposal` or `Nudge` containing it to all replicas. When views fail to make progress, however, the `current_view` of live replicas may grow to be significantly greater than `highest_pc.view`. If, in this situation, more than 1/3rd of replicas have locked on a validator-set-updating block, proposers must not propose a `Nudge` containing the highest PC, since the 4th predicate of `safe_nudge` will prevent honest replicas from voting on it, and hence prevent a quorum for the `Nudge` from being formed.
To make progress in this situation, a proposer must re-propose either the locked block or a (possibly new) sibling of the locked block. The implementation in HotStuff-rs chooses to do the former: the `repropose_block` method in this module helps determine whether a proposer should re-propose a block by considering its `current_view` and the local block tree's `highest_view.pc`, and, if it finds that a block should be re-proposed, returns the hash of that block so that the proposer can get it from the block tree.
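A sketch of this decision. The condition, types, and names below are simplified assumptions made for illustration, not the method's actual signature; the idea is that a `Nudge` can only carry a PC from the directly preceding view, so when the highest PC is a `Prepare` or `Precommit` PC from an older view, the leader re-proposes that PC's block instead of nudging:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Phase { Generic, Prepare, Precommit, Commit, Decide }

struct HighestPc { block: u64, view: u64, phase: Phase }

/// Return the hash (here, a u64 stand-in) of the block a leader of
/// `current_view` should re-propose, if any.
fn block_to_repropose(current_view: u64, highest_pc: &HighestPc) -> Option<u64> {
    match highest_pc.phase {
        // A Nudge's justify must come from the view directly preceding
        // current_view; if the highest PC is older than that, re-propose
        // the partially-committed block to restart its commit sequence.
        Phase::Prepare | Phase::Precommit if current_view != highest_pc.view + 1 => {
            Some(highest_pc.block)
        }
        _ => None,
    }
}
```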
Functions§
- Get the `Block` in `block_tree` (if any) that, along with all of its uncommitted predecessors, should be committed after the replica sees `justify`.
- Check whether `pc` belongs to the branch that extends from the `block_tree`'s `locked_pc.block`.
- Get the PC (if any) that should be set as the Locked PC after the replica sees the given `justify`.
- Get the `Block` in the `block_tree` which a leader of the `current_view` should re-propose in order to satisfy the Consecutive Views Rule and make progress in the view.
- Check whether `block` can safely cause updates to `block_tree`, given the replica's `chain_id`.
- Check whether `nudge` can safely cause updates to `block_tree`, given the replica's `current_view` and `chain_id`.
- safe_pc 🔒: Check whether `pc` can safely cause updates to `block_tree`, given the replica's `chain_id`.