hotstuff_rs/pacemaker/mod.rs

//! Subprotocol for Byzantine view synchronization and leader selection.
//!
//! The documentation for this module is divided into two sections, the first describing how the
//! Pacemaker subprotocol provides [Byzantine View Synchronization](#byzantine-view-synchronization)
//! functionality for HotStuff-rs, and the second describing how it provides
//! [Leader Selection](#leader-selection) functionality.
//!
//! # Byzantine View Synchronization
//!
//! SMR protocols like [HotStuff](super::hotstuff) can only make progress if a quorum of replicas spend
//! long enough in the same view to go through all of the protocol's voting phases. In
//! other words, state machine replication is only live once **view synchronization** is achieved.
//!
//! Achieving view synchronization is hard, and it is *especially* hard under the following four
//! assumptions, all of which we must make in order to build a robust BFT SMR solution:
//! 1. **Replicas do not have synchronized clocks**: replicas' local clocks can go out of sync, and
//!    further, replicas cannot rely on any "global" clock such as that provided by the Network Time
//!    Protocol (NTP), since these would be centralized attack targets.
//! 2. **Replicas can fail in arbitrary ways**: replicas can crash, refuse to respond to messages, or
//!    generally behave in malicious ways as if controlled by an adversary.
//! 3. **Message complexity matters**: network bandwidth is limited, so we want to limit the number (and
//!    size) of messages that have to be exchanged in order to achieve view synchronization.
//! 4. **Optimistic responsiveness is desirable**: we want view synchronization to be achieved as quickly
//!    as network delays allow it to be achieved. This eliminates protocols such as "just increment your
//!    local view number every `n` seconds" from our consideration.
//!
//! The view synchronization mechanism used in HotStuff-rs versions 0.3 and prior (and also mentioned
//! in the original PODC'19 HotStuff paper) is "exponentially increasing view timeouts". This mechanism
//! nominally succeeds in not relying on synchronized clocks, being resistant to Byzantine failures, and
//! having a low message complexity (no messages are ever sent!). However, it completely fails to ensure
//! optimistic responsiveness.
//!
//! The current version of HotStuff-rs (v0.4) achieves view synchronization under all four
//! assumptions by incorporating a brand new **Pacemaker** module (this module) that implements a
//! modified version of the Byzantine View Synchronization algorithm described in
//! [Lewis-Pye (2021)](https://arxiv.org/pdf/2201.01107.pdf).
//!
//! ## High-level idea
//!
//! The modified Lewis-Pye (2021) algorithm implemented by this module divides views into sequences
//! called "epochs". All epochs have the same length; i.e., if we denote this length as `N`,
//! then:
//! - The 0th epoch would contain views `[0, 1, ..., N-2, N-1]`.
//! - The 1st epoch would contain views `[N, N+1, ..., 2N-2, 2N-1]`.
//! - ...and so on.
//!
//! On top of the delineation of views into epochs described above, the algorithm distinguishes between
//! two kinds of views, "Epoch-Change Views", and "Normal Views":
//! - **Epoch-Change View**: the last view in every epoch (i.e., `{N-1, 2N-1, ...}`).
//! - **Normal View**: every other view.
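//!
//! For concreteness, the short sketch below (hypothetical helper functions, not part of the crate's
//! API) shows how a view number maps to an epoch and how Epoch-Change Views are identified, given
//! some epoch length `N`:
//!
//! ```
//! // Which epoch a view belongs to, given the epoch length.
//! fn epoch_of(view: u64, epoch_length: u64) -> u64 {
//!     view / epoch_length
//! }
//!
//! // Whether a view is the Epoch-Change View (i.e., the last view) of its epoch.
//! fn is_epoch_change_view(view: u64, epoch_length: u64) -> bool {
//!     view % epoch_length == epoch_length - 1
//! }
//!
//! // With a hypothetical epoch length of 50:
//! assert_eq!(epoch_of(49, 50), 0);
//! assert_eq!(epoch_of(50, 50), 1);
//! assert!(is_epoch_change_view(49, 50));
//! assert!(!is_epoch_change_view(50, 50));
//! ```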
//!
//! The Pacemaker [`implementation`] works so that views progress in a monotonically increasing fashion,
//! i.e., from a lower view to a higher view. However, the ways a replica can leave its current view
//! to enter the next view differ fundamentally depending on what kind of view the current view is.
//! After reading this documentation, you’ll understand how these different ways of leaving a view work
//! together to guarantee the safety and liveness of the Pacemaker subprotocol, as well as HotStuff-rs
//! as a whole.
//!
//! The following three subsections explain the different ways a replica can leave a view:
//! 1. [Leaving a Normal View](#leaving-a-normal-view) lists the two lightweight ways a replica can leave
//!    a normal view.
//! 2. [The need to vote for Epoch Changes](#the-need-to-vote-for-epoch-changes) explains why the
//!    lightweight mechanisms used to leave a normal view are not sufficient in the long run to maintain
//!    view synchronization even if it were initially achieved.
//! 3. [Leaving an Epoch-Change View](#leaving-an-epoch-change-view) picks up from the previous subsection
//!    and describes the heavyweight way a replica can leave an Epoch-Change view, which allows for view
//!    synchronization to be recovered in case it was lost.
//!
//! ### Leaving a Normal View
//!
//! A replica leaves a Normal View when one of the following two things happens:
//! 1. It receives a valid `AdvanceView` message containing a
//!    [`PhaseCertificate`](crate::hotstuff::types::PhaseCertificate) for the current view or higher.
//! 2. It has spent longer than a predetermined amount of time in the view (i.e., when its local timer
//!    times out).
//!
//! Both mechanisms for leaving a normal view are "lightweight" in the sense that neither requires a
//! dedicated voting round. The first mechanism reuses `PhaseCertificate`s produced by the HotStuff
//! subprotocol in order to advance to the next view as fast as consensus decisions happen (thereby
//! achieving Optimistic Responsiveness), while the second mechanism serves as a fallback in case the
//! leader of the current view fails to make progress in time (which could happen if the current leader
//! is Byzantine).
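//!
//! As a rough illustration, the sketch below (hypothetical names and types, not the crate's actual
//! API) captures these two exit conditions:
//!
//! ```
//! use std::time::{Duration, Instant};
//!
//! struct NormalViewTracker {
//!     current_view: u64,
//!     entered_at: Instant,
//!     timeout: Duration,
//! }
//!
//! impl NormalViewTracker {
//!     // Exit 1: a valid `AdvanceView` message carries a `PhaseCertificate` for the
//!     // current view or higher.
//!     fn leave_on_advance_view(&self, pc_view: u64) -> bool {
//!         pc_view >= self.current_view
//!     }
//!
//!     // Exit 2: the local timer for this view has expired.
//!     fn leave_on_timeout(&self, now: Instant) -> bool {
//!         now.duration_since(self.entered_at) >= self.timeout
//!     }
//! }
//! ```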
//!
//! ### The need to vote for Epoch Changes
//!
//! Recall our assumption that replicas do not have synchronized clocks. This means that if we do not
//! intervene, replicas' local clocks will slowly but surely go out of sync. This progressive
//! desynchronization is manageable as long as the HotStuff subprotocol continues to make new consensus
//! decisions, since in this case `AdvanceView` messages will continue to be delivered, acting as a
//! rudimentary clock synchronization mechanism.
//!
//! Inevitably, however, the HotStuff subprotocol *will* fail to make progress for an extended period:
//! network connections will go down, replicas will fail, and messages will fail to get delivered. If
//! this period of no progress continues for long enough, replicas’ clocks will eventually get so out of
//! sync that no single view will contain a quorum of validators. Beyond this point, consensus decisions
//! become impossible and the blockchain permanently stops growing.
//!
//! To prevent views from getting *too* out of sync, and to help restore view synchronization after it
//! has been lost, leaving an Epoch-Change View after it has timed out requires an explicit voting step,
//! as described in the next section.
//!
//! ### Leaving an Epoch-Change View
//!
//! Just like with Normal Views, a replica still leaves an Epoch-Change view upon receiving a valid
//! `AdvanceView` message containing a `PhaseCertificate` for the current view or higher. The difference
//! between the procedures for leaving the two different kinds of views has to do with what happens when
//! a replica times out in an Epoch-Change View.
//!
//! When a validator times out in an Epoch-Change View, it does not immediately move to the next view.
//! Instead, it broadcasts a `TimeoutVote` message to all `N` leaders of the next epoch, and then waits.
//!
//! Concurrently, when a leader of the next epoch receives a `TimeoutVote`, it buffers it temporarily.
//! Once it has managed to buffer a set of `TimeoutVote`s for the same view from a quorum of replicas in
//! an [active validator set](crate::types::signed_messages::ActiveCollectorPair#active-validator-sets),
//! it collects them in a [`TimeoutCertificate`](types::TimeoutCertificate) and broadcasts it to every
//! replica.
//!   
//! Upon receiving a valid `TimeoutCertificate` for the current view or a higher view, a replica waiting
//! at the end of an Epoch-Change View re-broadcasts the `TimeoutCertificate` to all other replicas and
//! then moves to the view that follows immediately after the view indicated in the certificate.
//!
//! The idea behind this mechanism is that if no view currently contains a quorum of validators,
//! replicas will eventually stop and "park" themselves in the same Epoch-Change View, and only get
//! going again once a quorum of validators has arrived at that view and is ready to proceed.
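//!
//! As a rough sketch of the leader-side collection step (hypothetical types, with the quorum counted
//! here by number of votes rather than by validator power, and without signature checks; the real
//! types live in [`messages`] and [`types`]):
//!
//! ```
//! use std::collections::HashMap;
//!
//! struct TimeoutVote { view: u64, voter: &'static str }
//! struct TimeoutCertificate { view: u64, voters: Vec<&'static str> }
//!
//! struct TimeoutVoteCollector {
//!     quorum: usize,                            // simplified: a vote count, not voting power
//!     buffer: HashMap<u64, Vec<&'static str>>,  // view -> voters seen so far
//! }
//!
//! impl TimeoutVoteCollector {
//!     // Buffer a vote; once a quorum of distinct voters have voted to time out the
//!     // same view, collect their votes into a `TimeoutCertificate`.
//!     fn collect(&mut self, vote: TimeoutVote) -> Option<TimeoutCertificate> {
//!         let voters = self.buffer.entry(vote.view).or_default();
//!         if !voters.contains(&vote.voter) {
//!             voters.push(vote.voter);
//!         }
//!         if voters.len() >= self.quorum {
//!             Some(TimeoutCertificate { view: vote.view, voters: voters.clone() })
//!         } else {
//!             None
//!         }
//!     }
//! }
//! ```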
//!
//! ## Modifications on Lewis-Pye (2021)
//!
//! This module does not implement the protocol specified in Lewis-Pye (2021) exactly, but instead makes
//! a few subtle modifications. These modifications are described in the subsections below alongside the
//! justification for each.
//!
//! ### Modification 1: Terminological differences
//!
//! We call the "epoch e" messages defined in Lewis-Pye (2021) `TimeoutVote`s, and its
//! "EpochCertificate" type `TimeoutCertificate`.
//!
//! This is so that the terminology used in the Pacemaker subprotocol aligns better with the terminology
//! used in the HotStuff subprotocol, in which [`PhaseVote`](crate::hotstuff::messages::PhaseVote)s are
//! aggregated into [`PhaseCertificate`](crate::hotstuff::types::PhaseCertificate)s.
//!
//! ### Modification 2: User-configurable epoch length
//!
//! A key part of Lewis-Pye (2021)'s correctness is the property that **at least 1** validator in any
//! epoch is honest (non-Byzantine).
//!
//! Lewis-Pye (2021) guarantees this under **static validator sets** by setting the length of epochs
//! (`N`) to be `f+1`, where `f` is the maximum number of validators that can be Byzantine at any
//! moment. This makes it so that even in the "worst case" where all `f` faulty validators are selected
//! to be leaders of the same epoch, said epoch will still contain `1` honest leader.
//!
//! With **dynamic validator sets**, which HotStuff-rs
//! [supports](crate::app#two-app-mutable-states-app-state-and-validator-set), `f` ceases to be a
//! constant. To guarantee the property that at least 1 validator in any epoch is honest under this
//! setting, two options come to mind:
//! 1. Make the Pacemaker somehow dynamically change epoch lengths.
//! 2. Make the epoch length user-configurable, and advise the user to set it to be a large enough value.
//!
//! HotStuff-rs' Pacemaker goes with the **second option** for completely pragmatic reasons: if we
//! had gone for the first option, we’d have to create an algorithm that reasons not only about
//! dynamically-changing validator sets, but also about dynamically-changing epoch lengths, and intuitively,
//! it seems difficult to create one that satisfies safety and liveness while still remaining simple.
//!
//! A reasonable choice for `N` could be the expected maximum number of validators throughout the
//! lifetime of the blockchain. This is large enough that `f` should remain well below `N` even if the
//! number of validators in the network exceeds expectations, and small enough that it does not affect
//! the asymptotic complexities of the algorithm.
//!
//! ### Modification 3: `AdvanceView` replaces View Certificates (VCs)
//!
//! The "optimistic" mechanism for leaving a normal view in Lewis-Pye (2021) involves a dedicated voting
//! round where replicas send a "view v" message to the leader of view v, and then wait to receive a
//! View Certificate (VC) from said leader.
//!
//! The [corresponding mechanism](#leaving-a-normal-view) in HotStuff-rs replaces VCs with a new message
//! type called `AdvanceView`. These contain existing `PhaseCertificate`s produced by the HotStuff
//! subprotocol.
//!
//! This optimization is sound because the whole purpose of a View Certificate is to indicate that a
//! quorum of validators are "done" with a specific view, and a `PhaseCertificate` serves this purpose
//! just as well, without requiring an extra voting round.
//!
//! ### Modification 4: TimeoutVote includes the `highest_tc`
//!
//! In Lewis-Pye (2021), [Epoch-Change views end](#leaving-an-epoch-change-view) with a leader of the
//! next epoch broadcasting a newly-collected `TimeoutCertificate`.
//!
//! `TimeoutCertificate`s are likewise broadcast at the end of an Epoch-Change View in HotStuff-rs's
//! Pacemaker subprotocol, but they are **additionally** sent out in another message type:
//! [`TimeoutVote`](messages::TimeoutVote)s. Specifically, `TimeoutVote` has a
//! [`highest_tc`](messages::TimeoutVote::highest_tc) field not present in Lewis-Pye (2021). This is
//! set by the sending validator to be the `TimeoutCertificate` with the highest view number it knows
//! of.
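//!
//! A simplified, hypothetical shape of this message (the real definitions live in [`messages`] and
//! [`types`]) might look like:
//!
//! ```
//! struct TimeoutCertificate { view: u64 /* plus the aggregated signatures, etc. */ }
//!
//! struct TimeoutVote {
//!     view: u64,
//!     // The highest-viewed `TimeoutCertificate` the sender knows of, if any. It gives
//!     // receivers a "second chance" to learn of a certificate they may have missed
//!     // during an earlier all-to-all broadcast.
//!     highest_tc: Option<TimeoutCertificate>,
//! }
//! ```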
//!
//! Theoretically speaking, including the `highest_tc` in `TimeoutVote`s is unnecessary: in the
//! partially synchronous network model that Lewis-Pye (2021) assumes, messages can be delayed
//! arbitrarily before Global Stabilization Time (GST) but are always eventually delivered, and
//! therefore the all-to-all broadcast of `TimeoutCertificate`s that happens at the end of every
//! Epoch-Change View should be sufficient to keep a quorum of replicas up-to-date about the current
//! epoch.
//!
//! Practically, however, network infrastructure does occasionally suffer catastrophic failures that
//! cause the vast majority of messages between replicas to be lost. `highest_tc` helps Pacemaker be
//! robust against these kinds of failures by giving replicas a "second chance" to receive a
//! `TimeoutCertificate` if they had missed it during an all-to-all broadcast.
//!
//! # Leader Selection
//!
//! The second of the two functionalities that Pacemaker provides to HotStuff-rs is called Leader
//! Selection.
//!
//! The basic requirement that a Leader Selection Mechanism for a generic SMR algorithm must meet
//! is that it must provide a mapping from *Views to Leaders*, i.e., a function with signature
//! `ViewNumber -> VerifyingKey`.
//!
//! Leader Selection for HotStuff-rs, however, comes with three additional requirements:
//! 1. **Support for Dynamic Validator Sets**: the validator set that replicates a HotStuff-rs
//!    blockchain can [change
//!    dynamically](crate::app#two-app-mutable-states-app-state-and-validator-set) according to the
//!    application's instructions. This means that the function that the Leader Selection mechanism
//!    provides must have the signature `(ViewNumber, ValidatorSet) -> VerifyingKey`.
//! 2. **Frequent rotation**: if a Byzantine replica were to become leader for an extended, continuous
//!    sequence of views, it could harm the network in a variety of ways, ranging from the immediately
//!    apparent (e.g., not proposing any blocks, thereby preventing the blockchain from growing) to
//!    the more insidious (e.g., refusing to include transactions from specific clients in the blocks
//!    it proposes, i.e., censoring them). To prevent these kinds of service disruptions, our leader
//!    selection mechanism should rotate leadership frequently among all validators, so that a
//!    Byzantine replica cannot remain leader for too long.
//! 3. **Weighted selection**: In most popular Proof of Stake (PoS) blockchain systems, a validator’s stake
//!    determines not only the weight of its votes in consensus decisions but also its likelihood of being
//!    selected as the leader for a given view. Likewise, we would like our leader selection mechanism to
//!    generate a leader sequence where each validator appears with a frequency proportional to its Power
//!    relative to the TotalPower of the validator set (i.e., if a validator makes up roughly X% of the
//!    validator set’s total power, then it should be the leader of approximately X% of views).
//!
//! To meet these requirements, Pacemaker's leader selection implementation
//! ([`select_leader`](implementation::select_leader)) uses the Interleaved Weighted Round-Robin
//! ([Interleaved WRR](https://en.wikipedia.org/wiki/Weighted_round_robin#Interleaved_WRR)) algorithm.
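//!
//! As an illustration of Interleaved WRR (not the crate's actual implementation, which works over a
//! `ValidatorSet` and returns a `VerifyingKey`), the sketch below identifies validators by `&str` and
//! their power by `u64`:
//!
//! ```
//! // In round `r` of a cycle, every validator whose power exceeds `r` takes exactly one
//! // turn, so turns are interleaved: no validator leads many consecutive views, yet each
//! // leads a share of views proportional to its power.
//! fn select_leader<'a>(view: u64, validators: &[(&'a str, u64)]) -> &'a str {
//!     let total_power: u64 = validators.iter().map(|(_, power)| power).sum();
//!     let max_power: u64 = validators.iter().map(|(_, power)| *power).max().unwrap();
//!
//!     // Position of this view within one full cycle of `total_power` slots.
//!     let mut slot = view % total_power;
//!
//!     for round in 0..max_power {
//!         for &(validator, power) in validators {
//!             if power > round {
//!                 if slot == 0 {
//!                     return validator;
//!                 }
//!                 slot -= 1;
//!             }
//!         }
//!     }
//!     unreachable!("`slot` always falls within a single cycle")
//! }
//!
//! // With powers 3, 1, and 2, one cycle of 6 views yields the interleaved sequence
//! // A, B, C, A, C, A (rather than the "blocked" sequence A, A, A, B, C, C).
//! let validators = [("A", 3), ("B", 1), ("C", 2)];
//! let leaders: Vec<_> = (0..6).map(|view| select_leader(view, &validators)).collect();
//! assert_eq!(leaders, vec!["A", "B", "C", "A", "C", "A"]);
//! ```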

pub mod messages;

pub(crate) mod implementation;

pub mod types;