Revert "HotStuff on Steroids"
asonnino authored Sep 29, 2021
1 parent 577692b commit 601ef74
Showing 41 changed files with 112 additions and 2,929 deletions.
2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
[workspace]
members = ["primary", "node", "store", "crypto", "worker", "consensus", "network", "config", "hotstuff"]
members = ["primary", "node", "store", "crypto", "worker", "consensus", "network", "config"]
55 changes: 53 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,60 @@
# Narwhal HotStuff
# Narwhal and Tusk

[![build status](https://img.shields.io/github/workflow/status/facebookresearch/narwhal/Rust/master?style=flat-square&logo=github)](https://github.com/facebookresearch/narwhal/actions)
[![rustc](https://img.shields.io/badge/rustc-1.51+-blue?style=flat-square&logo=rust)](https://www.rust-lang.org)
[![license](https://img.shields.io/badge/license-Apache-blue.svg?style=flat-square)](LICENSE)

The code in this branch is a prototype of Narwhal HotStuff (Hotstuff-over-Narwhal). It supplements the paper [Narwhal and Tusk: A DAG-based Mempool and Efficient BFT Consensus](https://arxiv.org/pdf/2105.11827.pdf) enabling reproducible results. There are no plans to maintain this branch. The [master branch](https://github.com/asonnino/narwhal) contains the most recent and polished version of this codebase.
This repo provides an implementation of [Narwhal and Tusk](https://arxiv.org/pdf/2105.11827.pdf). The codebase has been designed to be small, efficient, and easy to benchmark and modify. It has not been designed to run in production but uses real cryptography ([dalek](https://doc.dalek.rs/ed25519_dalek)), networking ([tokio](https://docs.rs/tokio)), and storage ([rocksdb](https://docs.rs/rocksdb)).

## Quick Start
The core protocols are written in Rust, but all benchmarking scripts are written in Python and run with [Fabric](http://www.fabfile.org/).
To deploy and benchmark a testbed of 4 nodes on your local machine, clone the repo and install the python dependencies:
```
$ git clone https://github.com/facebookresearch/narwhal.git
$ cd narwhal/benchmark
$ pip install -r requirements.txt
```
You also need to install Clang (required by rocksdb) and [tmux](https://linuxize.com/post/getting-started-with-tmux/#installing-tmux) (which runs all nodes and clients in the background). Finally, run a local benchmark using fabric:
```
$ fab local
```
This command may take a long time the first time you run it, since compiling Rust code in `release` mode can be slow. You can customize a number of benchmark parameters in `fabfile.py`. When the benchmark terminates, it displays a summary of the execution similar to the one below.
```
-----------------------------------------
SUMMARY:
-----------------------------------------
+ CONFIG:
Faults: 0 node(s)
Committee size: 4 node(s)
Worker(s) per node: 1 worker(s)
Collocate primary and workers: True
Input rate: 50,000 tx/s
Transaction size: 512 B
Execution time: 19 s
Header size: 1,000 B
Max header delay: 100 ms
GC depth: 50 round(s)
Sync retry delay: 10,000 ms
Sync retry nodes: 3 node(s)
batch size: 500,000 B
Max batch delay: 100 ms
+ RESULTS:
Consensus TPS: 46,478 tx/s
Consensus BPS: 23,796,531 B/s
Consensus latency: 464 ms
End-to-end TPS: 46,149 tx/s
End-to-end BPS: 23,628,541 B/s
End-to-end latency: 557 ms
-----------------------------------------
```
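The parameters behind this summary live in the `local` task of `fabfile.py` (shown in the diff further down). A minimal sketch of the two dictionaries as set in this commit, for reference when tweaking a local run:

```python
# Sketch of the benchmark parameters exposed by the `local` task in
# `fabfile.py` (values as committed here; adjust to taste).
bench_params = {
    'faults': 0,        # number of crashed nodes
    'nodes': 4,         # committee size
    'workers': 1,       # workers per primary
    'rate': 50_000,     # input rate, tx/s
    'tx_size': 512,     # transaction size, bytes
    'duration': 20,     # benchmark duration, seconds
}
node_params = {
    'header_size': 1_000,        # bytes
    'max_header_delay': 200,     # ms
    'gc_depth': 50,              # rounds
    'sync_retry_delay': 10_000,  # ms
    'sync_retry_nodes': 3,       # number of nodes
    'batch_size': 500_000,       # bytes
    'max_batch_delay': 200,      # ms
}
```

Note that after the revert there is no `timeout_delay` entry: the summary above only reports Narwhal/Tusk parameters.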

## Next Steps
The next step is to read the paper [Narwhal and Tusk: A DAG-based Mempool and Efficient BFT Consensus](https://arxiv.org/pdf/2105.11827.pdf). It is then recommended to have a look at the README files of the [worker](https://github.com/facebookresearch/narwhal/tree/master/worker) and [primary](https://github.com/facebookresearch/narwhal/tree/master/primary) crates. An additional resource to better understand the Tusk consensus protocol is the paper [All You Need is DAG](https://arxiv.org/abs/2102.08325) as it describes a similar protocol.

The README file of the [benchmark folder](https://github.com/facebookresearch/narwhal/tree/master/benchmark) explains how to benchmark the codebase and interpret the results. It also provides a step-by-step tutorial to run benchmarks on [Amazon Web Services (AWS)](https://aws.amazon.com) across multiple data centers (WAN).

## License
This software is licensed as [Apache 2.0](LICENSE).
18 changes: 3 additions & 15 deletions benchmark/benchmark/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,9 +25,6 @@ class Committee:
"authorities: {
"name": {
"stake": 1,
"consensus: {
"consensus_to_consensus": x.x.x.x:x,
},
"primary: {
"primary_to_primary": x.x.x.x:x,
"worker_to_primary": x.x.x.x:x,
Expand Down Expand Up @@ -67,11 +64,6 @@ def __init__(self, addresses, base_port):
self.json = {'authorities': OrderedDict()}
for name, hosts in addresses.items():
host = hosts.pop(0)
consensus_addr = {
'consensus_to_consensus': f'{host}:{port}',
}
port += 1

primary_addr = {
'primary_to_primary': f'{host}:{port}',
'worker_to_primary': f'{host}:{port + 1}'
Expand All @@ -89,7 +81,6 @@ def __init__(self, addresses, base_port):

self.json['authorities'][name] = {
'stake': 1,
'consensus': consensus_addr,
'primary': primary_addr,
'workers': workers_addr
}
Expand Down Expand Up @@ -124,9 +115,6 @@ def ips(self, name=None):

ips = set()
for name in names:
addresses = self.json['authorities'][name]['consensus']
ips.add(self.ip(addresses['consensus_to_consensus']))

addresses = self.json['authorities'][name]['primary']
ips.add(self.ip(addresses['primary_to_primary']))
ips.add(self.ip(addresses['worker_to_primary']))
Expand Down Expand Up @@ -169,15 +157,14 @@ def __init__(self, names, port, workers):
assert all(isinstance(x, str) for x in names)
assert isinstance(port, int)
assert isinstance(workers, int) and workers > 0
addresses = OrderedDict((x, ['127.0.0.1']*(2+workers)) for x in names)
addresses = OrderedDict((x, ['127.0.0.1']*(1+workers)) for x in names)
super().__init__(addresses, port)


class NodeParameters:
def __init__(self, json):
inputs = []
try:
inputs += [json['timeout_delay']]
inputs += [json['header_size']]
inputs += [json['max_header_delay']]
inputs += [json['gc_depth']]
Expand Down Expand Up @@ -216,6 +203,7 @@ def __init__(self, json):
raise ConfigError('Missing input rate')
self.rate = [int(x) for x in rate]


self.workers = int(json['workers'])

if 'collocate' in json:
Expand All @@ -224,7 +212,7 @@ def __init__(self, json):
self.collocate = True

self.tx_size = int(json['tx_size'])

self.duration = int(json['duration'])

self.runs = int(json['runs']) if 'runs' in json else 1
Expand Down
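The hunks above remove the dedicated consensus address from the committee file: `LocalCommittee` now allocates one IP per authority for the primary plus one per worker (`1+workers` instead of `2+workers`), and each authority entry keeps only `stake`, `primary`, and `workers`. A sketch of the resulting layout, with a hypothetical `make_authority` helper and worker address names assumed from the rest of the codebase:

```python
from collections import OrderedDict

def make_authority(host: str, base_port: int, workers: int) -> dict:
    """Hypothetical sketch of one authority entry after this change:
    a primary address pair and per-worker addresses, but no separate
    'consensus' section."""
    port = base_port
    primary_addr = {
        'primary_to_primary': f'{host}:{port}',
        'worker_to_primary': f'{host}:{port + 1}',
    }
    port += 2
    workers_addr = OrderedDict()
    for i in range(workers):
        workers_addr[i] = {
            'primary_to_worker': f'{host}:{port}',
            'transactions': f'{host}:{port + 1}',
            'worker_to_worker': f'{host}:{port + 2}',
        }
        port += 3
    return {'stake': 1, 'primary': primary_addr, 'workers': workers_addr}

authority = make_authority('127.0.0.1', 3000, workers=1)
```

The exact worker address keys are assumptions for illustration; the structural point is that the `consensus` block is gone from every authority.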
9 changes: 2 additions & 7 deletions benchmark/benchmark/logs.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ def __init__(self, clients, primaries, workers, faults=0):
self.faults = faults
if isinstance(faults, int):
self.committee_size = len(primaries) + int(faults)
self.workers = len(workers) // len(primaries)
self.workers = len(workers) // len(primaries)
else:
self.committee_size = '?'
self.workers = '?'
Expand Down Expand Up @@ -107,9 +107,6 @@ def _parse_primaries(self, log):
commits = self._merge_results([tmp])

configs = {
'timeout_delay': int(
search(r'Timeout delay .* (\d+)', log).group(1)
),
'header_size': int(
search(r'Header size .* (\d+)', log).group(1)
),
Expand All @@ -134,7 +131,7 @@ def _parse_primaries(self, log):
}

ip = search(r'booted on (\d+.\d+.\d+.\d+)', log).group(1)

return proposals, commits, configs, ip

def _parse_workers(self, log):
Expand Down Expand Up @@ -191,7 +188,6 @@ def _end_to_end_latency(self):
return mean(latency) if latency else 0

def result(self):
timeout_delay = self.configs[0]['timeout_delay']
header_size = self.configs[0]['header_size']
max_header_delay = self.configs[0]['max_header_delay']
gc_depth = self.configs[0]['gc_depth']
Expand Down Expand Up @@ -219,7 +215,6 @@ def result(self):
f' Transaction size: {self.size[0]:,} B\n'
f' Execution time: {round(duration):,} s\n'
'\n'
f' Timeout delay: {timeout_delay:,} ms\n'
f' Header size: {header_size:,} B\n'
f' Max header delay: {max_header_delay:,} ms\n'
f' GC depth: {gc_depth:,} round(s)\n'
Expand Down
6 changes: 3 additions & 3 deletions benchmark/benchmark/plot.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
from itertools import cycle

from benchmark.utils import PathMaker
from benchmark.config import PlotParameters, ConfigError
from benchmark.config import PlotParameters
from benchmark.aggregate import LogAggregator


Expand Down Expand Up @@ -162,8 +162,8 @@ def plot_tps(cls, files, scalability):
def plot(cls, params_dict):
try:
params = PlotParameters(params_dict)
except ConfigError as e:
raise PlotError(e)
except PlotError as e:
raise PlotError('Invalid nodes or bench parameters', e)

# Aggregate the logs.
LogAggregator(params.max_latency).print()
Expand Down
30 changes: 14 additions & 16 deletions benchmark/fabfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,16 +16,15 @@ def local(ctx, debug=True):
'faults': 0,
'nodes': 4,
'workers': 1,
'rate': 10_000,
'rate': 50_000,
'tx_size': 512,
'duration': 20,
}
node_params = {
'timeout_delay': 500, # ms
'header_size': 1_000, # bytes
'max_header_delay': 200, # ms
'gc_depth': 50, # rounds
'sync_retry_delay': 5_000, # ms
'sync_retry_delay': 10_000, # ms
'sync_retry_nodes': 3, # number of nodes
'batch_size': 500_000, # bytes
'max_batch_delay': 200 # ms
Expand All @@ -38,7 +37,7 @@ def local(ctx, debug=True):


@task
def create(ctx, nodes=6):
def create(ctx, nodes=2):
''' Create a testbed'''
try:
InstanceManager.make().create_instances(nodes)
Expand Down Expand Up @@ -92,24 +91,23 @@ def install(ctx):


@task
def remote(ctx, debug=True):
def remote(ctx, debug=False):
''' Run benchmarks on AWS '''
bench_params = {
'faults': 0,
'nodes': [4],
'workers': 4,
'collocate': False,
'rate': [50_000],
'faults': 3,
'nodes': [10],
'workers': 1,
'collocate': True,
'rate': [10_000, 110_000],
'tx_size': 512,
'duration': 300,
'runs': 1,
'runs': 2,
}
node_params = {
'timeout_delay': 5_000, # ms
'header_size': 1_000, # bytes
'max_header_delay': 200, # ms
'gc_depth': 50, # rounds
'sync_retry_delay': 5_000, # ms
'sync_retry_delay': 10_000, # ms
'sync_retry_nodes': 3, # number of nodes
'batch_size': 500_000, # bytes
'max_batch_delay': 200 # ms
Expand All @@ -125,11 +123,11 @@ def plot(ctx):
''' Plot performance using the logs generated by "fab remote" '''
plot_params = {
'faults': [0],
'nodes': [4],
'workers': [1, 4, 7, 10],
'nodes': [10, 20, 50],
'workers': [1],
'collocate': True,
'tx_size': 512,
'max_latency': [2_000, 2_500]
'max_latency': [3_500, 4_500]
}
try:
Ploter.plot(plot_params)
Expand Down
8 changes: 4 additions & 4 deletions benchmark/settings.json
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
{
"key": {
"name": "aws",
"path": "/Users/asonnino/.ssh/aws"
"name": "aws-fb",
"path": "/Users/asonnino/.ssh/aws-fb.pem"
},
"port": 5000,
"repo": {
"name": "narwhal",
"url": "https://github.com/asonnino/narwhal",
"branch": "narwhal-hs"
"url": "https://github.com/facebookresearch/narwhal",
"branch": "master"
},
"instances": {
"type": "m5d.8xlarge",
Expand Down
30 changes: 0 additions & 30 deletions config/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -60,8 +60,6 @@ pub type WorkerId = u32;

#[derive(Deserialize, Clone)]
pub struct Parameters {
/// The timeout delay of the consensus protocol.
pub timeout_delay: u64,
/// The preferred header size. The primary creates a new header when it has enough parents and
/// enough batches' digests to reach `header_size`. Denominated in bytes.
pub header_size: usize,
Expand All @@ -86,7 +84,6 @@ pub struct Parameters {
impl Default for Parameters {
fn default() -> Self {
Self {
timeout_delay: 5_000,
header_size: 1_000,
max_header_delay: 100,
gc_depth: 50,
Expand All @@ -102,8 +99,6 @@ impl Import for Parameters {}

impl Parameters {
pub fn log(&self) {
// NOTE: These log entries are needed to compute performance.
info!("Timeout delay set to {} ms", self.timeout_delay);
info!("Header size set to {} B", self.header_size);
info!("Max header delay set to {} ms", self.max_header_delay);
info!("Garbage collection depth set to {} rounds", self.gc_depth);
Expand All @@ -114,12 +109,6 @@ impl Parameters {
}
}

#[derive(Clone, Deserialize)]
pub struct ConsensusAddresses {
/// Address to receive messages from other consensus nodes (WAN).
pub consensus_to_consensus: SocketAddr,
}

#[derive(Clone, Deserialize)]
pub struct PrimaryAddresses {
/// Address to receive messages from other primaries (WAN).
Expand All @@ -142,8 +131,6 @@ pub struct WorkerAddresses {
pub struct Authority {
/// The voting power of this authority.
pub stake: Stake,
/// The network addresses of the consensus protocol.
pub consensus: ConsensusAddresses,
/// The network addresses of the primary.
pub primary: PrimaryAddresses,
/// Map of workers' id and their network addresses.
Expand Down Expand Up @@ -193,23 +180,6 @@ impl Committee {
(total_votes + 2) / 3
}

/// Returns the consensus addresses of the target consensus node.
pub fn consensus(&self, to: &PublicKey) -> Result<ConsensusAddresses, ConfigError> {
self.authorities
.get(to)
.map(|x| x.consensus.clone())
.ok_or_else(|| ConfigError::NotInCommittee(*to))
}

/// Returns the addresses of all consensus nodes except `myself`.
pub fn others_consensus(&self, myself: &PublicKey) -> Vec<(PublicKey, ConsensusAddresses)> {
self.authorities
.iter()
.filter(|(name, _)| name != &myself)
.map(|(name, authority)| (*name, authority.consensus.clone()))
.collect()
}

/// Returns the primary addresses of the target primary.
pub fn primary(&self, to: &PublicKey) -> Result<PrimaryAddresses, ConfigError> {
self.authorities
Expand Down
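The threshold arithmetic visible in the hunk above, `(total_votes + 2) / 3`, uses integer division so that for a committee of total stake `n = 3f + 1` it evaluates to `f + 1`: the smallest stake guaranteed to include at least one honest authority. A sketch mirroring the Rust expression (the function name here is an assumption, since the surrounding declaration is cut off in the diff):

```python
def validity_threshold(total_votes: int) -> int:
    # Mirrors the Rust integer expression (total_votes + 2) / 3.
    # For n = 3f + 1 this is (3f + 3) / 3 = f + 1.
    return (total_votes + 2) // 3
```

For example, with `total_votes = 4` (so `f = 1`) the result is 2, and with `total_votes = 10` (so `f = 3`) it is 4.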
