Skip to content

Commit

Permalink
Add (almost complete) EPaxos implementation & fixes (#24)
Browse files Browse the repository at this point in the history
* minor updates to plotting scripts

* add distributed machines helper scripts

* minor updates to plotting scripts

* make toml dependency optional for workflow

* staging progress on physical net expers

* add tidb workload profile results

* staging progress on changed bench parameters

* update gitignore

* minor updates to tla+ formatting

* minor updates to plotting scripts

* minor updates to plotting scripts

* minor updates to plotting scripts

* minor updates to plotting scripts

* add open_tcp_ports.sh script

* add auto iperf & ping script

* staging progress on physical net expers

* staging progress on physical net expers

* staging progress on physical net expers

* finished physical network exper =)

* finished physical network exper =)

* update motivation profiling plots

* add banal formal dump script

* minor updates to plotting scripts

* add WAN delays modeling calculation

* minor updates to WAN perf modeling script

* reorganize scripts into paper-specific folders

* reorganize scripts into paper-specific folders

* mv artifact memos to separate file

* adding SMR-style TLA+ spec

* adding command line TLC helper scripts

* optimizing SMR-style TLA+ spec & linearizability constraint

* completing SMR-style TLA+ spec

* completing SMR-style TLA+ spec

* complete SMR-style MultiPaxos TLA+ spec

* add README to MultiPaxos spec folder

* add SMR-style MultiPaxos spec blog

* minor updates to MultiPaxos spec README

* update SMR-style MultiPaxos spec and TLC script

* add node failure injection to MultiPaxos spec

* update Crossword TLA+ spec

* update Crossword TLA+ spec

* update Crossword TLA+ spec .gitignore

* extending SMR-style MultiPaxos spec

* extending SMR-style MultiPaxos spec

* finish extended SMR-style MultiPaxos spec

* staging progress on Bodega TLA+ spec

* staging progress on Bodega TLA+ spec

* finish Bodega TLA+ spec

* finish Bodega TLA+ spec

* finish Bodega TLA+ spec

* finish Bodega TLA+ spec

* test change

* scanning through the codebase

* scanning through the codebase

* add back committed condition check in Bodega spec

* cleaning up bench client

* add comments on port usage mess

* remove useless perf sim parameters

* minor updates to README

* add server module sync APIs

* fix wal_offset interference bug

* refactor MultiPaxos code to fix Prepare phase bug

* comment out the snapshotting test

* fixing the Prepare phase for RSPaxos

* fixing the Prepare phase for RSPaxos

* fixed the Prepare phase for RSPaxos

* fixed the Prepare phase for Crossword

* fix formatting issues

* updating CloudLab experiments support

* updating CloudLab experiments support

* updating CloudLab experiments support

* updating CloudLab experiments support

* updating CloudLab experiments support

* updating CloudLab experiments support

* updating CloudLab experiments support

* updating CloudLab experiments support

* updating CloudLab experiments support

* add remote_hosts.toml grouping

* restructured scripts utils library

* making better benchmarking scripts

* making better benchmarking scripts

* making better benchmarking scripts

* making better benchmarking scripts

* making better benchmarking scripts

* making better benchmarking scripts

* making better benchmarking scripts

* finish follower read stalenss exper

* enable keyspace partitioning run mode

* enable keyspace partitioning for ChainPaxos

* enable keyspace partitioning for ChainPaxos

* enable keyspace partitioning for ChainPaxos

* enable keyspace partitioning for ChainPaxos

* minor updates to artifact readme

* working on improved YCSB exper

* working on improved YCSB exper

* finished improved YCSB trace exper

* finished improved YCSB trace exper

* update Crossword publish crop scripts

* wrapping up the improved evaluation

* update results archiving script

* add wan quorums model plotting for Bodega

* use k reqs unit for ycsb plot

* staging progress on chain replication impl

* staging progress on chain rep impl

* finished chain replication implementation

* finished chain replication implementation

* finished chain replication implementation

* fixing CI workflow issues

* fixing CI workflow issues

* minor update to README

* finishing final plotting updates for Crossword

* finishing final plotting updates for Crossword

* minor updates to bench script comments

* use clone_from() according to clippy

* update Bodega TLA+ model

* minor updates to cargo metadata

* modifications for code quality improvement

* modifications for code quality improvement

* use static OnceLock for logging identifier

* let formatter reorder imports automatically

* minor updates to README

* minor fix for logging & script ports

* minor updates to README

* minor updates to README

* adding public repo publish script

* adding public repo publish script

* adding public repo publish script

* adding public repo publish script

* adding public repo publish script

* adding public repo publish script

* bump rust dependencies versions

* replace messagepack with bincode

* remove unnecessary bind ports requirements

* bump tokio version and minor plot changes

* update setup scripts

* update setup scripts for cockroach

* update archive_results.sh script

* working on extra expers for reviews

* working on gossiping batching exper

* finish bw utilization exper

* finish bw utilization exper

* add cockroach automation scripts

* enabling TPC-C workload bench for cockroach

* working on cockroach tpcc experiment

* minor updates to comments

* minor updates to mirror script

* working on cockroach tpcc experiment

* working on cockroach tpcc experiment

* correct cockroach tpcc payload profiling

* correct cockroach tpcc payload profiling

* add protocol env var to cockroach script

* improve plot dimensions and spacing

* minor modifications to rscoding util

* minor updates to mirror script

* add RS coding timing logging switch

* minor updates to cockroach script

* minor updates to cockroach script

* adding settings to cockroach scripts

* minor updates to cockroach script

* adding settings to cockroach scripts

* add cockroach auto benchmark script

* finalize Crossword benchmark scripts

* cleaning up helper scripts

* cleaning up helper scripts

* cleaning up helper scripts

* adding zookeeper scripts

* enable zookeeper cluster helper scripts

* remove unnecessary home dir copy in geni scripts

* remove unnecessary home dir copy in geni scripts

* minor updates to toml parsing helper

* minor updates to toml parsing helper

* finishing up cocoroachdb experiment

* adding zookeeper client support

* added zookeeper clients script

* update and finish zookeeper clients

* adding etcd cluster scripts

* adding etcd cluster scripts

* adding etcd cluster scripts

* added etcd clients script

* update bodega tla+ gitignore

* staging progress on leasing module

* staging progress on leasing module

* staging progress on leasing module

* finish leasing module implementation

* staging progress on MultiPaxos leader leases

* staging progress on MultiPaxos leader leases

* better Summerset module tasks code structure

* better heartbeats management & fix Raft metadata serde

* staging progress on MultiPaxos leader leases

* staging progress on MultiPaxos leader leases

* staging progress on MultiPaxos leader leases

* proper mutual_leases test & improve timed unit tests

* minor updates to leaseman module printing

* typo fixes across the codebase

* typo fixes across the codebase

* fix github workflow dependencies issues

* staging progress on near quorum reads

* finish implementation of near quorum reads

* minor updates to near quorum reads logging

* bench client sequential keys & preloading feature

* staging progress on loc_wr_grid bench script

* staging progress on loc_wr_grid bench script

* staging progress on microbenchmark scripts

* add QuorumLeases and Bodega protocols skeletons

* staging progress on QuorumLeases implementation

* add config changes support to client-facing external APIs

* staging progress on QuorumLeases implementation

* staging progress on QuorumLeases implementation

* staging progress on QuorumLeases implementation

* staging progress on QuorumLeases implementation

* staging progress on QuorumLeases implementation

* staging progress on QuorumLeases implementation

* staging progress on QuorumLeases implementation

* fix excessive qleasing actions bug for QuorumLeases

* fix blocking by higher_number at ensure_qlease_cleared

* fix missing NoGrants handling on leader itself

* minor fixes to typos

* add urgent commit notifications option to QuorumLeases

* clean up skeleton code for Bodega

* clean up skeleton code for EPaxos

* add peer_accept_bars guard to leader leases solutions

* minor modifications to comments & structure

* staging progress on EPaxos implementation

* staging progress on EPaxos implementation

* staging progress on EPaxos implementation

* staging progress on EPaxos implementation

* staging progress on EPaxos implementation

* staging progress on EPaxos implementation

* staging progress on EPaxos implementation

* staging progress on EPaxos implementation

* finish EPaxos implementation & various fixes

* fix heartbeater module missing cancellation bug

* fix EPaxos normal operation bugs

* fix EPaxos explicit prepare bugs

* minor updates to README

* minor updates to formatting

* minor updates to README & formatting

* trimming for public repo

---------

Co-authored-by: Guanzhou Hu <[email protected]>
  • Loading branch information
josehu07 and Guanzhou Hu authored Nov 10, 2024
1 parent eaa8738 commit 7e59a61
Show file tree
Hide file tree
Showing 102 changed files with 10,252 additions and 839 deletions.
2 changes: 2 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -37,10 +37,12 @@ log = { workspace = true }
env_logger = { workspace = true }
async-trait = "0.1"
fixedbitset = { version = "0.5", features = ["serde"] }
atomic_refcell = "0.1"
flashmap = "0.1"
futures = "0.3"
bincode = "1.3"
reed-solomon-erasure = { version = "6.0" }
petgraph = "0.6"
get-size = { version = "0.1", features = ["derive"] }
linreg = "0.2"
statistical = "1.0"
Expand Down
24 changes: 15 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
[![Proc tests status](https://github.com/josehu07/summerset/actions/workflows/tests_proc.yml/badge.svg)](https://github.com/josehu07/summerset/actions?query=josehu07%3Atests_proc)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)

Summerset is a distributed, replicated, protocol-generic key-value store supporting a wide range of state machine replication (SMR) protocols for research purposes. More protocols are actively being added.
Summerset is a distributed, replicated, protocol-generic key-value store supporting a wide range of state machine replication (SMR) protocols for research purposes.

<p align="center">
<img width="360" src="./README.png">
Expand All @@ -20,24 +20,30 @@ Summerset is a distributed, replicated, protocol-generic key-value store support
| `RepNothing` | Simplest protocol w/o any replication |
| `SimplePush` | Pushing to peers w/o consistency guarantees |
| `ChainRep` | Bare implementation of Chain Replication ([paper](https://www.cs.cornell.edu/home/rvr/papers/OSDI04.pdf)) |
| `MultiPaxos` | Classic MultiPaxos protocol ([paper](https://www.microsoft.com/en-us/research/uploads/prod/2016/12/paxos-simple-Copy.pdf)) |
| `RSPaxos` | MultiPaxos w/ Reed-Solomon erasure code sharding ([paper](https://madsys.cs.tsinghua.edu.cn/publications/HPDC2014-mu.pdf)) |
| `MultiPaxos` | Classic MultiPaxos ([paper](https://www.microsoft.com/en-us/research/uploads/prod/2016/12/paxos-simple-Copy.pdf)) w/ modern features |
| `EPaxos` | Leaderless-style Egalitarian Paxos ([paper](https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf)) |
| `Raft` | Raft with explicit log and strong leadership ([paper](https://raft.github.io/raft.pdf)) |
| `CRaft` | Raft w/ erasure code sharding and fallback support ([paper](https://www.usenix.org/system/files/fast20-wang_zizhong.pdf)) |
| `RSPaxos` | MultiPaxos w/ RS erasure code sharding ([paper](https://madsys.cs.tsinghua.edu.cn/publications/HPDC2014-mu.pdf)) |
| `CRaft` | Raft w/ erasure code sharding and fallback ([paper](https://www.usenix.org/system/files/fast20-wang_zizhong.pdf)) |
| `QuorumLeases` | Local reads at leaseholders when quiescent ([paper](https://www.cs.cmu.edu/~imoraru/papers/qrl.pdf)) |

Formal TLA+ specification of some protocols are provided in `tla+/`.

More exciting protocols are actively being added!

</details>

<details>
<summary>Why is Summerset different from other codebases...</summary>

- **Async Rust**: Summerset is written in Rust and demonstrates canonical usage of async programming structures backed by the [`tokio`](https://tokio.rs/) framework;
- **Event-based**: Summerset adopts a channel-oriented, event-based system architecture; each replication protocol is basically just a set of event handlers plus a `tokio::select!` loop;
- **Modularized**: Common components of a distributed KV store, e.g. network transport and durable logger, are cleanly separated from each other and connected through channels.
- **Protocol-generic**: With the above two points combined, Summerset is able to support a set of different replication protocols in one codebase, with common functionalities abstracted out.
- **Async Rust**: Summerset is written in Rust and demonstrates canonical usage of async programming structures backed by the [`tokio`](https://tokio.rs/) framework.
- **Channel/event-based**: Summerset adopts a channel-oriented, event-based system architecture; each replication protocol is basically just a set of event handlers plus a `tokio::select!` loop. The entire codebase contains 0 explicit usage of `Mutex`.
- **Modularized**: Common components of a distributed KV store, e.g. network transport and durable logger, as well as protocol-specific parallelism tasks, are cleanly separated from each other and connected through channels. This extends Go's philosophy of doing "synchronization by (low-cost) communication (of ownership transfers)".
- **Protocol-generic**: With the above points combined, Summerset is able (and strives) to support a set of different replication protocols in one codebase, with common functionalities abstracted out, leaving each protocol's implementation concise and to-the-point.

These design choices make protocol implementation in Summerset rather straight-forward and understandable, without making a sacrifice on performance.

These design choices make protocol implementation in Summerset straight-forward and understandable, without any sacrifice on performance. Comments / issues / PRs are always welcome!
Comments / issues / PRs are always welcome!

</details>

Expand Down
67 changes: 58 additions & 9 deletions scripts/distr_clients.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,12 +27,24 @@
"put_ratio",
"ycsb_trace",
"length_s",
"use_random_keys",
"skip_preloading",
"norm_stdev_ratio",
"unif_interval_ms",
"unif_upper_bound",
],
"tester": ["test_name", "keep_going", "logger_on"],
"mess": ["pause", "resume"],
"tester": [
"test_name",
"keep_going",
"logger_on",
],
"mess": [
"pause",
"resume",
"grantor",
"grantee",
"leader",
],
}


Expand Down Expand Up @@ -88,7 +100,9 @@ def glue_params_str(cli_args, params_list):
return "+".join(params_strs)


def compose_client_cmd(protocol, manager, config, utility, timeout_ms, params, release):
def compose_client_cmd(
protocol, manager, config, utility, timeout_ms, params, release, near_id=None
):
cmd = [f"./target/{'release' if release else 'debug'}/summerset_client"]
cmd += [
"-p",
Expand All @@ -98,7 +112,19 @@ def compose_client_cmd(protocol, manager, config, utility, timeout_ms, params, r
"--timeout-ms",
str(timeout_ms),
]

if config is not None and len(config) > 0:
# if dist_machs is set, near_id will be the node ID that's considered
# the closest to this client
# NOTE: dumb overwriting near_server_id field here for simplicity
if near_id is not None and "near_server_id" in config:
bi = config.index("near_server_id=") + 15
epi = config[bi:].find("+")
esi = config[bi:].find(" ")
ei = epi if esi == -1 else esi if epi == -1 else min(epi, esi)
ei = len(config) if ei == -1 else ei
assert config[bi:ei] == "0"
config = config[:bi] + str(near_id) + config[ei:]
cmd += ["--config", config]

cmd += ["-u", utility]
Expand All @@ -115,6 +141,7 @@ def compose_client_cmd(protocol, manager, config, utility, timeout_ms, params, r
def run_clients(
remotes,
ipaddrs,
hosts,
me,
man,
cd_dir,
Expand All @@ -137,10 +164,9 @@ def run_clients(

# if dist_machs set, put clients round-robinly across this many machines
# starting from me
hosts = list(remotes.keys())
hosts = hosts[hosts.index(me) :] + hosts[: hosts.index(me)]
td_hosts = hosts[hosts.index(me) :] + hosts[: hosts.index(me)]
if dist_machs > 0:
hosts = hosts[:dist_machs]
td_hosts = td_hosts[:dist_machs]

client_procs = []
for i in range(num_clients):
Expand All @@ -153,6 +179,7 @@ def run_clients(
timeout_ms,
params,
release,
near_id=hosts.index(me if dist_machs <= 1 else td_hosts[i % len(td_hosts)]),
)

proc = None
Expand All @@ -161,8 +188,8 @@ def run_clients(
i, cmd, capture_stdout=capture_stdout, cores_per_proc=pin_cores
)
else:
host = hosts[i % len(hosts)]
local_i = i // len(hosts)
host = td_hosts[i % len(td_hosts)]
local_i = i // len(td_hosts)
if host == me:
# run my responsible clients locally
proc = run_process_pinned(
Expand Down Expand Up @@ -263,6 +290,12 @@ def run_clients(
action="store_true",
help="if set, expect there'll be a service halt",
)
parser_bench.add_argument(
"--use_random_keys", action="store_true", help="if set, generate random keys"
)
parser_bench.add_argument(
"--skip_preloading", action="store_true", help="if set, skip preloading phase"
)
parser_bench.add_argument(
"--norm_stdev_ratio", type=float, help="normal dist stdev ratio"
)
Expand Down Expand Up @@ -303,11 +336,26 @@ def run_clients(
parser_mess.add_argument(
"--resume", type=str, help="comma-separated list of servers to resume"
)
parser_mess.add_argument(
"--grantor",
type=str,
help="comma-separated list of servers as configured grantors",
)
parser_mess.add_argument(
"--grantee",
type=str,
help="comma-separated list of servers as configured grantees",
)
parser_mess.add_argument(
"--leader",
type=str,
help="string form of configured leader ID (or empty string)",
)

args = parser.parse_args()

# parse hosts config file
base, repo, _, remotes, _, ipaddrs = utils.config.parse_toml_file(
base, repo, hosts, remotes, _, ipaddrs = utils.config.parse_toml_file(
TOML_FILENAME, args.group
)
cd_dir = f"{base}/{repo}"
Expand Down Expand Up @@ -359,6 +407,7 @@ def run_clients(
client_procs = run_clients(
remotes,
ipaddrs,
hosts,
args.me,
args.man,
cd_dir,
Expand Down
19 changes: 15 additions & 4 deletions scripts/distr_cluster.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import sys
import os
import time
import signal
import argparse

Expand Down Expand Up @@ -28,9 +29,9 @@


class ProtoFeats:
def __init__(self, may_snapshot, has_heartbeats, extra_defaults):
def __init__(self, may_snapshot, has_hb_leader, extra_defaults):
self.may_snapshot = may_snapshot
self.has_heartbeats = has_heartbeats
self.has_hb_leader = has_hb_leader
self.extra_defaults = extra_defaults


Expand All @@ -39,9 +40,11 @@ def __init__(self, may_snapshot, has_heartbeats, extra_defaults):
"SimplePush": ProtoFeats(False, False, None),
"ChainRep": ProtoFeats(False, False, None),
"MultiPaxos": ProtoFeats(True, True, None),
"Raft": ProtoFeats(True, True, None),
"EPaxos": ProtoFeats(True, False, lambda n, _: f"optimized_quorum=true"),
"RSPaxos": ProtoFeats(True, True, lambda n, _: f"fault_tolerance={(n//2)//2}"),
"Raft": ProtoFeats(True, True, None),
"CRaft": ProtoFeats(True, True, lambda n, _: f"fault_tolerance={(n//2)//2}"),
"QuorumLeases": ProtoFeats(True, True, lambda n, _: f"sim_read_lease=false"),
}


Expand Down Expand Up @@ -120,7 +123,7 @@ def config_dict_to_str(d):
if config is not None and len(config) > 0:
config_dict.update(config_str_to_dict(config))

if PROTOCOL_FEATURES[protocol].has_heartbeats and hb_timer_off:
if PROTOCOL_FEATURES[protocol].has_hb_leader and hb_timer_off:
config_dict["disable_hb_timer"] = "true"

return config_dict_to_str(config_dict)
Expand Down Expand Up @@ -208,6 +211,7 @@ def launch_servers(
file_midfix,
fresh_files,
pin_cores,
launch_wait,
):
if num_replicas != len(remotes):
raise ValueError(f"invalid num_replicas: {num_replicas}")
Expand Down Expand Up @@ -258,6 +262,9 @@ def launch_servers(
)
server_procs.append(proc)

if launch_wait:
time.sleep(1) # NOTE: helps enforce server ID assignment

return server_procs


Expand Down Expand Up @@ -311,6 +318,9 @@ def launch_servers(
parser.add_argument(
"--pin_cores", type=int, default=0, help="if > 0, set CPU cores affinity"
)
parser.add_argument(
"--launch_wait", action="store_true", help="if set, wait 1s between launches"
)
parser.add_argument(
"--skip_build", action="store_true", help="if set, skip cargo build"
)
Expand Down Expand Up @@ -414,6 +424,7 @@ def launch_servers(
file_midfix,
not args.keep_files,
args.pin_cores,
args.launch_wait,
)

# register termination signals handler
Expand Down
37 changes: 35 additions & 2 deletions scripts/local_clients.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,12 +26,24 @@
"put_ratio",
"ycsb_trace",
"length_s",
"use_random_keys",
"skip_preloading",
"norm_stdev_ratio",
"unif_interval_ms",
"unif_upper_bound",
],
"tester": ["test_name", "keep_going", "logger_on"],
"mess": ["pause", "resume"],
"tester": [
"test_name",
"keep_going",
"logger_on",
],
"mess": [
"pause",
"resume",
"grantor",
"grantee",
"leader",
],
}


Expand Down Expand Up @@ -197,6 +209,12 @@ def run_clients(
action="store_true",
help="if set, expect there'll be a service halt",
)
parser_bench.add_argument(
"--use_random_keys", action="store_true", help="if set, generate random keys"
)
parser_bench.add_argument(
"--skip_preloading", action="store_true", help="if set, skip preloading phase"
)
parser_bench.add_argument(
"--norm_stdev_ratio", type=float, help="normal dist stdev ratio"
)
Expand Down Expand Up @@ -237,6 +255,21 @@ def run_clients(
parser_mess.add_argument(
"--resume", type=str, help="comma-separated list of servers to resume"
)
parser_mess.add_argument(
"--grantor",
type=str,
help="comma-separated list of servers as configured grantors",
)
parser_mess.add_argument(
"--grantee",
type=str,
help="comma-separated list of servers as configured grantees",
)
parser_mess.add_argument(
"--leader",
type=str,
help="string form of configured leader ID (or empty string)",
)

args = parser.parse_args()

Expand Down
10 changes: 6 additions & 4 deletions scripts/local_cluster.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,9 +27,9 @@


class ProtoFeats:
def __init__(self, may_snapshot, has_heartbeats, extra_defaults):
def __init__(self, may_snapshot, has_hb_leader, extra_defaults):
self.may_snapshot = may_snapshot
self.has_heartbeats = has_heartbeats
self.has_hb_leader = has_hb_leader
self.extra_defaults = extra_defaults


Expand All @@ -38,9 +38,11 @@ def __init__(self, may_snapshot, has_heartbeats, extra_defaults):
"SimplePush": ProtoFeats(False, False, None),
"ChainRep": ProtoFeats(False, False, None),
"MultiPaxos": ProtoFeats(True, True, None),
"Raft": ProtoFeats(True, True, None),
"EPaxos": ProtoFeats(True, False, lambda n, _: f"optimized_quorum=true"),
"RSPaxos": ProtoFeats(True, True, lambda n, _: f"fault_tolerance={(n//2)//2}"),
"Raft": ProtoFeats(True, True, None),
"CRaft": ProtoFeats(True, True, lambda n, _: f"fault_tolerance={(n//2)//2}"),
"QuorumLeases": ProtoFeats(True, True, lambda n, _: f"sim_read_lease=false"),
}


Expand Down Expand Up @@ -105,7 +107,7 @@ def config_dict_to_str(d):
if config is not None and len(config) > 0:
config_dict.update(config_str_to_dict(config))

if PROTOCOL_FEATURES[protocol].has_heartbeats and hb_timer_off:
if PROTOCOL_FEATURES[protocol].has_hb_leader and hb_timer_off:
config_dict["disable_hb_timer"] = "true"

return config_dict_to_str(config_dict)
Expand Down
Loading

0 comments on commit 7e59a61

Please sign in to comment.