Replies: 1 comment
Thanks for this! I have a few questions. Per the below, we are effectively allowing nodes to be offline and inactive at unpredictable times. Assuming there is not a deep bench of nodes, wouldn't this leave the network susceptible to finality delays? What downstream effects might this cause (e.g., difficulty wrt recognizing potential node client issues vs. swaths of nodes choosing to be inactive)?
I see you have specifically addressed the above concern in the incentives section below, but I'm skeptical that it's a strong enough incentive to ensure network availability and liveness. Is a small punitive measure necessary?
The below bullets imply there are specific fees for nodes providing RPC services for client apps; is this the intent?
How does the dynamic fee mentioned below work?
I'm also curious about the live-DAG sync; how does it allow nodes to provide optimal service?
Goal: The goal is to propose a consensus protocol for the decentralized XMTP network that extends the conversation data layer through Merkle CRDT replication. The consensus protocol should ensure liveness, consistency, and overall reliability in the network by incorporating economic and non-economic incentives for the nodes to carry out their responsibilities.
Purpose: This post outlines detailed specifications of the consensus protocol for XMTP, and builds upon the Delegated Proof-of-Stake consensus mechanism idea introduced in the Network Architecture proposal.
Pre-requisite: The Consensus Protocol assumes that topic replication (message gossip) is already handled at the Data Layer.
Responsibilities of Nodes
The consensus mechanism and DPOS rewards are designed to incentivize nodes to carry out their responsibilities as stated below.
Terminology
Protocol
DAG Snapshot and Synchronization
At cut-off time, consensus nodes (CNs) prepare for consensus by creating a temporary snapshot of the Merkle DAG for each topic (aka “consensus-DAG”). The block time, a fixed interval, determines when this snapshot occurs across all nodes. The cut-off falls on every Nth block of Ethereum, or of the L2 where the state attestation of XMTP happens. Relying on an external blockchain gives all nodes a consistent clock across the XMTP network.
To facilitate a clear explanation, let's assume the block time of XMTP is every 50th Ethereum block (roughly 10 minutes) since the genesis. Consequently, nodes create a consensus-DAG copy every 50th block. This synchronization ensures consistency among nodes and serves as a checkpoint to achieve finality.
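The cut-off schedule in this example can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the interval of 50 comes from the example above, and counting from external block 0 (rather than from an XMTP genesis block) is a simplifying assumption.

```python
# Assumed values for illustration only.
CUTOFF_INTERVAL = 50         # N: one XMTP block per 50 external blocks (example above)
SECONDS_PER_ETH_BLOCK = 12   # Ethereum slot time since the Merge


def is_cutoff_block(external_block: int, interval: int = CUTOFF_INTERVAL) -> bool:
    """True if nodes should snapshot the consensus-DAG at this external block."""
    return external_block > 0 and external_block % interval == 0


def next_cutoff(external_block: int, interval: int = CUTOFF_INTERVAL) -> int:
    """First cut-off block strictly after the given external block."""
    return (external_block // interval + 1) * interval


def xmtp_block_number(external_block: int, interval: int = CUTOFF_INTERVAL) -> int:
    """XMTP block number that the most recent cut-off belongs to."""
    return external_block // interval
```

With these assumptions, 50 blocks at 12 seconds each gives the 600 seconds (10 minutes) quoted above.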
It's important to note that the DAG synchronization process varies across nodes due to factors like network topology. The synchronization allows nodes to agree on a consistent state while message sending, replication, and relaying continue unhindered—new messages are replicated for the upcoming block without interruption.
Following the consensus-DAG synchronization, nodes will also independently perform live-DAG sync. This process is separate from consensus-DAG synchronization and allows nodes to provide optimal service to clients and users. It's worth mentioning that further engineering optimizations will be explored to efficiently maintain and synchronize these two DAGs.
Block Construction & Validator Set
After the consensus-DAG synchronization, the CNs proceed to construct the block, and make it accessible to the network through an API endpoint.
For instance, the constructed block can be accessed through the URL format: "URL/2345/constructed_block", where "2345" represents the current block number, universally known across the network.
As the consensus time approaches, the CNs run a validator set selection algorithm. The consensus time is external block N+Δ, where N is the block at cut-off time and Δ is a fixed delta. The CNs query the staking smart contract for up-to-date stake details and form the validator set from the top staked nodes plus an asynchronous random selection of eligible CNs (irrespective of stake) that meet the minimum staking requirement. This requires uniform randomness across the network, which is derived from the block hash of external block N+Δ. A second round of deterministic random selection within the validator set designates one node as the block proposer (leader).
Participation in consensus is not mandatory for full nodes. Nodes that have not completed synchronization by the consensus time can choose not to participate in that particular round. To ensure that the leader is active, the validators check the selected leader's constructed-block API endpoint. If it is not active, an alternate node is chosen as the leader.
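A minimal sketch of how every node could derive the same validator set and leader from the external block hash follows. Everything here is an assumption made for illustration: the function names, the `top_k` / `random_k` / `min_stake` parameters, and the use of Python's `random.Random` as the seeded generator. A real protocol would have to pin down an exact, language-independent sampling algorithm, since library generators can differ between versions.

```python
import hashlib
import random


def select_validators(stakes, block_hash, top_k, random_k, min_stake):
    """Return (validators, leader_order), identical on every node given the same inputs.

    stakes:     dict of node id -> stake, as read from the staking contract
    block_hash: bytes of the external block hash at N + delta
    """
    eligible = {n: s for n, s in stakes.items() if s >= min_stake}
    # Rank by stake (descending); break ties by node id so the order is deterministic.
    ranked = sorted(eligible, key=lambda n: (-eligible[n], n))
    top = ranked[:top_k]
    rest = sorted(ranked[top_k:])  # remaining eligible nodes, irrespective of stake

    # Seed a PRNG from the block hash so all nodes draw the same sample.
    seed = int.from_bytes(hashlib.sha256(block_hash).digest(), "big")
    rng = random.Random(seed)
    extra = rng.sample(rest, min(random_k, len(rest)))

    validators = top + extra
    # Second deterministic draw: a shuffled order whose first entry is the leader
    # and whose later entries serve as alternates.
    leader_order = list(validators)
    rng.shuffle(leader_order)
    return validators, leader_order


def pick_leader(leader_order, is_active):
    """First node in the order whose constructed-block endpoint responds, else None."""
    return next((n for n in leader_order if is_active(n)), None)
```

Modeling the alternate leader as "next in a shuffled order" is one possible reading of the fallback rule. The proposal itself does not specify how the alternate is chosen.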
The inclusion of a small number of random nodes in each validation cycle incentivizes all full nodes to actively participate in consensus, regardless of their stake. This encourages frequent synchronization in the hope of being selected as a leader or being able to vote when selected as a validator. Additionally, better performance metrics during consensus allow nodes with lower stake to demonstrate their capabilities to delegators, attracting more delegation and increasing the chance of becoming a top-staked validator and earning higher commission from delegation. The total number of validators will be determined based on the metrics from initial implementation and will be a governance parameter for future updates.
Voting & Commit-Reveal
Validators access the constructed block endpoint of the chosen leader node. If the constructed block matches their own block, they vote "yes" and make their vote available through the vote endpoint, following a commit-reveal scheme. A commit-reveal scheme is required to prevent nodes from simply voting the same as the majority they observe from the public endpoints of other validators.
The commit-reveal scheme obfuscates voting information until it needs to be revealed on-chain. It involves hashing the data with a random salt at the commit phase and storing the hash on-chain (or at the Oracle) as an attestation of the original data. Later, the original data is disclosed on-chain along with the random salt, so anyone can check it against the commitment. Unless there is collusion, the committed information therefore remains confidential until the designated disclosure time. Validators that do not reveal are treated as having abstained and are not eligible for rewards.
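The commit and reveal steps can be illustrated as follows. This is a sketch under assumptions: SHA-256 as the hash, a 32-byte salt, and binding the commitment to a validator id and block number (so one validator cannot replay another's commitment) are choices made here, not details given in the proposal.

```python
import hashlib
import hmac
import secrets


def commit(validator_id: str, block_number: int, vote: str, salt: bytes) -> str:
    """Commit phase: hash the vote together with a secret random salt."""
    payload = f"{validator_id}|{block_number}|{vote}".encode() + salt
    return hashlib.sha256(payload).hexdigest()


def verify_reveal(commitment: str, validator_id: str, block_number: int,
                  vote: str, salt: bytes) -> bool:
    """Reveal phase: recompute the hash from the disclosed vote and salt."""
    expected = commit(validator_id, block_number, vote, salt)
    return hmac.compare_digest(commitment, expected)


# Example: a validator commits to "yes" for block 2345, then reveals later.
salt = secrets.token_bytes(32)
commitment = commit("node-1", 2345, "yes", salt)
```

The salt matters because the vote space is tiny ("yes" or "no"): without it, anyone could hash both options and read the vote straight from the commitment.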
In the event of an inconsistency, CNs identify the topic(s) with the DAG head inconsistency or determine which topics are missing or in excess. They then proceed to sync again with different nodes on those specific topic(s) to rectify the issue. If a CN discovers that its own copy of the data is at fault during synchronization, it updates its copy accordingly.
This approach allows CNs to identify misbehaving nodes. They can choose to block such nodes instantly, or when they observe repeated faulty data, ensuring consensus integrity. If, even after reasonable synchronization, a validator verifies that its own copy of the data is correct, it can infer that the leader is at fault and vote "no", accordingly.
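The inconsistency check described above amounts to a per-topic comparison of DAG heads. The sketch below assumes each topic's consensus-DAG can be summarized by a single head digest (for a DAG with several heads, this could be a hash over the sorted head hashes). The type and function names are hypothetical.

```python
from typing import NamedTuple


class TopicDiff(NamedTuple):
    mismatched: list  # topics present on both sides with different heads
    missing: list     # topics the other node has and this node lacks
    excess: list      # topics this node has and the other node lacks


def diff_topic_heads(mine: dict, theirs: dict) -> TopicDiff:
    """Compare topic -> head-digest maps and report what needs re-syncing."""
    mismatched = sorted(t for t in mine.keys() & theirs.keys() if mine[t] != theirs[t])
    missing = sorted(theirs.keys() - mine.keys())
    excess = sorted(mine.keys() - theirs.keys())
    return TopicDiff(mismatched, missing, excess)


def is_consistent(diff: TopicDiff) -> bool:
    """True when the two nodes agree on every topic."""
    return not (diff.mismatched or diff.missing or diff.excess)
```

A validator would re-sync only the topics listed in the diff with other nodes, then vote "no" if its own copy still disagrees with the leader's.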
Vote Aggregation
The staking smart contract collects aggregated votes from all validator node endpoints, either via Oracle or by nodes submitting inputs to the smart contract directly through transactions. The smart contract verifies whether a supermajority, defined as at least two-thirds (about 66.67%) "yes" votes, is reached.
If a supermajority is not achieved (an inconclusive vote), the staking smart contract penalizes the leader and initiates another round of consensus for the same block, with a different leader and a different random selection of nodes. The penalty deters faulty or malicious leaders from significantly draining network resources. This approach allows CNs to carry out honest and efficient work in block construction and synchronization, while still letting them opt out of publishing the constructed block if they encounter any issues.
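As an illustration, the threshold check can be done in integer arithmetic so that rounding never decides a vote. Two assumptions are made here that the proposal does not settle: votes are counted per validator rather than weighted by stake, and the denominator is the full validator set, so abstentions count against the threshold.

```python
def has_supermajority(yes_votes: int, total_validators: int) -> bool:
    """True if yes_votes / total_validators >= 2/3, using integers only."""
    if total_validators <= 0:
        return False
    return 3 * yes_votes >= 2 * total_validators
```

For example, 2 of 3 validators is exactly two-thirds and passes, while 66 of 100 falls just short.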
Nodes that voted differently from the supermajority or those that did not vote at all will not receive any block rewards. This incentivizes all validators to align with the consensus outcome and participate actively in the voting process.
Note: The Oracle approach as an alternative to transaction submission will be explained later in a separate post.
Block Finalization
The smart contract proceeds to publish the finalized block. This serves as the single source of truth for that particular block. Nodes that did not participate in the consensus can compare their data against the finalized block and adjust their inconsistent topic syncs accordingly.
The finalized block is typically published using Ethereum's (or an L2's) 'Transaction Input Data' if gas costs permit. Alternatively, a combination of Chainlink, IPFS/Arweave can be utilized for persistence, with the corresponding URL of the block provided in the 'Input Data' field (details are outside the scope of this proposal).
Block rewards are distributed based on the stake-weight of all agreeing supermajority nodes. The rewarded nodes then share these rewards with their respective delegators, according to the predetermined distribution mechanism. This process ensures the dissemination of the finalized block, establishes a verifiable source of truth for all nodes, and enables the fair distribution of block rewards to participating nodes and their delegators.
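A sketch of the stake-weighted split described above, under stated assumptions: rewards are integer token units, only validators whose revealed vote matches the final outcome are paid, and any rounding remainder is simply left undistributed. The function name and parameters are hypothetical, and the split between a node and its delegators is not modeled.

```python
def distribute_rewards(block_reward: int, stakes: dict, votes: dict, outcome: str) -> dict:
    """Split block_reward among agreeing validators in proportion to their stake.

    stakes:  node id -> stake
    votes:   node id -> revealed vote ("yes" / "no"); abstainers are absent
    outcome: the supermajority result for this block
    """
    winners = sorted(n for n, v in votes.items() if v == outcome and stakes.get(n, 0) > 0)
    total_stake = sum(stakes[n] for n in winners)
    if total_stake == 0:
        return {}
    # Integer division keeps payouts exact; leftover dust stays unassigned.
    return {n: block_reward * stakes[n] // total_stake for n in winners}
```

With stakes of 60, 30, and 10 where only the first two nodes voted with the supermajority, a reward of 900 splits 600 / 300 and the dissenting node receives nothing.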
Incentives for nodes to carry out their responsibilities
Active participation in replication
Reliable relaying of messages from connected clients (first node)
Validation of messages
Maintaining consistent connectivity, avoiding prolonged periods of idleness or disconnection
Prolonged lags or disconnections cause nodes to lose reputation with clients/users, who can query the finalized block and notice missing items. Such nodes are also unable to participate in consensus and lose out on block rewards. Together, these consequences encourage nodes to stay live.
Honest construction of blocks and voting (for Consensus Nodes)
A Consensus Node can be selected as a leader at any time, and its constructed block can be accessed by validators. Any node attempting to earn rewards simply by voting yes for all blocks without properly constructing blocks risks penalization when it is selected as a leader. A false vote causes the Consensus Node to lose block rewards. Since a node eventually has to sync with the finalized block to self-validate its consensus-DAG anyway, it is better off doing that work before voting, which lets it earn the reward.
Abstain from network attacks such as takeovers
Proof of Stake secures the network. An attacker could gain maximum validating power (all of the top 67% of nodes) by buying $XMTP in large quantities and then validate fake blocks. However, this comes at the expense of the network, and therefore of the attacker: any inconsistency observed by the community is expected to trigger a drop in the $XMTP price, so the attacker loses the value of their own stake.
Storing data for the designated retention period
A node can participate in consensus to earn block rewards while skipping other activities, such as relaying or storing messages for the full retention period, and thereby forgoing messaging fees. Such nodes are still rewarded for their consensus work. If storage needs to be encouraged beyond fees, we propose requiring nodes to attach randomly selected old block/topic data, similar to Arweave's Proof-of-Access.
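One way to pick the "random old block" so that every node agrees on the challenge is to derive it from the previous finalized block's hash. This is purely an illustrative assumption in the spirit of Proof-of-Access. The proposal does not specify the selection rule, and the function name is hypothetical.

```python
import hashlib


def challenge_block_index(prev_block_hash: bytes, current_height: int) -> int:
    """Deterministically pick an old block in [0, current_height) to be attached.

    A node that has discarded old data cannot produce the challenged block,
    so it cannot satisfy the storage requirement for this round.
    """
    if current_height < 1:
        raise ValueError("no earlier blocks to challenge")
    digest = hashlib.sha256(prev_block_hash).digest()
    return int.from_bytes(digest, "big") % current_height
```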
Managing network congestion and recovering from continuous consensus failures
The dynamic fee ensures that the network can handle congestion and still produce blocks during consensus failures, which can occur when sync delays caused by congestion prevent a supermajority from being reached.
Enabling verifiability for clients/users: