@@ -0,0 +1,231 @@
+---
+NEP: 509
+Title: Stateless validation Stage 0
+Authors: Robin Cheng, Anton Puhach, Alex Logunov, Yoon Hong
+Status: Draft
+DiscussionsTo: https://docs.google.com/document/d/1C-w4FNeXl8ZMd_Z_YxOf30XA1JM6eMDp5Nf3N-zzNWU/edit?usp=sharing, https://docs.google.com/document/d/1TzMENFGYjwc2g5A3Yf4zilvBwuYJufsUQJwRjXGb9Xc/edit?usp=sharing
+Type: Protocol
+Version: 1.0.0
+Created: 2023-09-19
+LastUpdated: 2023-09-19
+---
+
+## Summary
+
+This NEP proposes a solution for achieving phase 2 of sharding (where no validator needs to track all shards). It uses stateless validation instead of the traditionally proposed approach of fraud proofs and state rollbacks.
+
+The fundamental idea is that validators do not need to have state locally to validate chunks. 
+
+* Under stateless validation, the responsibility of a chunk producer extends to packaging transactions and receipts and annotating them with state witnesses. Chunk producers in this extended role are called "chunk proposers".
+* The state witness of a chunk is defined as the subset of the trie state needed to execute the chunk, together with proofs of its inclusion in the trie. A state witness allows anyone to execute the chunk without having the state of its shard locally.
+* Then, at each block height, validators will be randomly assigned to a shard, to validate the state witness for that shard. Once a validator receives both a chunk and its state witness, it verifies the state transition of the chunk, signs a chunk endorsement and sends it to the block producer. This is similar to, but separate from, block approvals and consensus.
+* The block producer waits for sufficient chunk endorsements before including a chunk into the block it produces, or omits the chunk if not enough endorsements arrive in time.
+
+## Motivation
+
+As phase 1 of sharding requires block producers to track all shards due to underlying security concerns, the team explored potential ways to achieve phase 2 of sharding, where none of the validators has to track all shards.
+
+The early design of phase 2 relied on the security assumption that a shard is secure as long as at least one honest validator or fisherman tracks it. Consequently, it relied on the protocol's ability to handle challenges (when an honest validator or fisherman detects malicious behavior and submits a proof of it), state rollbacks (when validators agree that the submitted challenge is valid), and slashing (to punish the malicious validator). While this sounds straightforward on paper, the complex interactions between these mechanisms and the rest of the protocol led to extremely complicated concrete designs, involving several specific problems we still don't know how to solve.
+
+As a result, the team sought alternative approaches and concluded that stateless validation is the most realistic and promising one. The stateless validation approach does not assume the existence of fishermen, does not rely on challenges, and never rolls back state. Instead, it relies on the assumption that a shard is secure if every single chunk in that shard is validated by a randomly sampled subset of all validators, so that only valid chunks are accepted in the first place.
+
+## Specification
+
+### Assumptions
+
+* In-memory trie is enabled - [REF](https://docs.google.com/document/d/1_X2z6CZbIsL68PiFvyrasjRdvKA_uucyIaDURziiH2U/edit?usp=sharing)
+* State sync is enabled (so that nodes can track different shards across epochs)
+* Merkle Patricia Trie continues to be the state trie implementation
+* TBD
+
+### High level requirements
+
+* No validator needs to track all shards.
+* Security of protocol must not degrade.
+  * Validator assignment for both chunk validation and block validation should not create any security vulnerabilities.
+* Block processing should not take significantly longer than it does today.
+* Any additional load on network and compute should not negatively affect existing functionalities of any node in the blockchain.
+  * The cost of additional network and compute should be acceptable.
+* Validator rewards should not be reduced.
+* Resharding should still be possible after stateless validation is in place.
+* TBD
+
+### Out of scope
+
+* Data size optimizations such as compression, for both chunk data and state witnesses, except basic optimizations that are practically necessary.
+* Separation of consensus and execution, where consensus runs independently from execution, and validators asynchronously perform state transitions after the transactions are proposed on the consensus layer, for the purpose of amortizing the computation and network transfer time.
+* More shards - this is covered in the resharding project.
+* ZK integration.
+* Underlying data structure change (e.g. verkle tree).
+* Change to validator rewards.
+* TBD
+
+## High level flow
+
+We propose a change to the following parts of the chunk and block production flow:
+
+* When a chunk producer produces a chunk, in addition to collecting transactions and receipts for the chunk, it will also produce a `ChunkStateWitness`.
+  * The `ChunkStateWitness` contains whatever data is necessary to prove that this chunk's header is indeed what should be produced:
+    * As it is today, all fields of the `ShardChunkHeaderInnerV2`, except `tx_root`, are uniquely determined by the blockchain's history based on where the chunk is located (i.e. its parent block and shard ID).
+    * The `tx_root` is based on the list of transactions proposed, which is at the discretion of the chunk producer. However, these transactions must be valid (i.e. the sender accounts have enough balance and the correct nonce, etc.).
+    * This `ChunkStateWitness` proves to anyone, including those who track only block data and no shards, that this chunk header is correct, meaning that the uniquely determined fields are exactly what should be expected, and the discretionary `tx_root` field corresponds to a valid set of transactions.
+  * The `ChunkStateWitness` is not part of the chunk itself; it is distributed separately and is considered transient data.
+* The chunk producer then distributes the `ChunkStateWitness` to a subset of *Chunk Validators* assigned for this shard. This is in addition to, and independent of, the existing chunk distribution logic (implemented by `ShardsManager`) today.
+  * Chunk Validator is a new role described in the "Validator role change" section.
+  * The subset of chunk validators assigned to a shard is determined by a random shuffle, once per block. See the "Chunk Validator Shuffling" section.
+* A chunk validator, upon receiving a `ChunkStateWitness`, validates the state witness and determines if the chunk header is indeed correctly produced. If so, it sends a `ChunkEndorsement` to the current block producer.
+  * A `ChunkEndorsement` contains the chunk hash along with a signature proving the endorsement by the chunk validator. It implicitly carries a weight equal to the amount of the chunk validator's stake that is assigned to this shard for this block. (See Chunk Validator Shuffling).
+* As today, the block producer for this block waits until either all chunks are ready or a timeout occurs, and then proposes a block containing whatever chunks are ready. The notion of readiness is now expanded to also require chunk endorsements carrying more than 2/3 of the total weight.
+  * This means that if a chunk does not receive enough chunk endorsements by the timeout, it will not be included in the block. In other words, the block only contains chunks for which there is already a consensus of validity. **This is the key reason why we will no longer need challenges**.
+  * The 2/3 fraction has the denominator being the total stake assigned to validate this shard, *not* the total stake of all validators. See Chunk Validator Shuffling.
+* When producing the block, the block producer additionally includes the chunk endorsements (more than 2/3 by weight for each chunk) in the block's body. Block validity is expanded to also require valid chunk endorsements carrying more than 2/3 of the weight for each chunk included in the block.
+  * This necessitates a new block format.
+  * If a block fails validation because of not having the required chunk endorsements, it is considered a block validation failure for the purpose of Doomslug consensus, just like any other block validation failure. In other words, nodes will not apply the block on top of their blockchain, and (block) validators will not endorse the block.
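
The readiness rule above can be sketched as follows. This is a minimal, hypothetical illustration in Rust, not the nearcore implementation. The `ChunkEndorsement` struct here keeps only a chunk hash and a validator ID, because the real layout is still TODO in this NEP. Signatures are assumed to have been verified beforehand, and `is_chunk_endorsed` and `assigned_stake` are illustrative names. The threshold is strict (more than 2/3), and its denominator is the stake assigned to the chunk's shard, not the total stake.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified endorsement. The real `ChunkEndorsement` also carries a
/// signature; here signatures are assumed to have been verified already.
#[derive(Debug, Clone)]
pub struct ChunkEndorsement {
    pub chunk_hash: String,
    pub validator: String,
}

/// Returns true if the endorsements for `chunk_hash` carry strictly more than 2/3 of
/// the stake assigned to validate the chunk's shard at this height.
///
/// `assigned_stake` maps each chunk validator assigned to the shard to its weight,
/// i.e. the stake of its mandates assigned to that shard for this block.
pub fn is_chunk_endorsed(
    chunk_hash: &str,
    endorsements: &[ChunkEndorsement],
    assigned_stake: &HashMap<String, u128>,
) -> bool {
    // Denominator: stake assigned to this shard, not the total stake of all validators.
    let total: u128 = assigned_stake.values().sum();
    if total == 0 {
        return false;
    }
    let mut counted: HashSet<&str> = HashSet::new();
    let mut endorsed: u128 = 0;
    for e in endorsements {
        // Skip endorsements of other chunks and repeated endorsements from one validator.
        if e.chunk_hash != chunk_hash || !counted.insert(e.validator.as_str()) {
            continue;
        }
        // Validators not assigned to this shard contribute no weight.
        if let Some(stake) = assigned_stake.get(&e.validator) {
            endorsed += stake;
        }
    }
    // endorsed / total > 2/3, evaluated in exact integer arithmetic.
    3 * endorsed > 2 * total
}
```

For example, if alice, bob and carol are assigned to the shard with weights 40, 30 and 30, endorsements from alice and bob (70 of 100) make the chunk ready, while endorsements from bob and carol alone (60 of 100) do not. Comparing `3 * endorsed > 2 * total` keeps the check exact, so a chunk endorsed by exactly 2/3 of the weight is still not ready. Block validators could apply the same kind of check when verifying the endorsements included in a block.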
+
+We also propose a change to the validator roles and responsibilities. This is the list of roles after the proposal, with same and new behavior clearly labelled:
+
+* Block producers:
+  * (Same as today) Produce blocks, (new) including waiting for chunk endorsements
+  * (Same as today) Maintain chunk parts (i.e. participate in data availability based on Reed-Solomon erasure encoding)
+  * (Same as today) Do not require tracking any shard
+  * (Same as today) Should have a higher barrier of entry for security reasons (e.g. to make block double signing harder)
+* Chunk producers:
+  * (Same as today) Produce chunks, (new) including producing chunk state witnesses
+  * (New) Distribute state witnesses to chunk validators
+  * (Same as today) Must track the shard it produces the chunk for
+  * (Same as today) Rotate shards across epoch boundaries, (new) but at a lower rate (e.g. 1 week)
+* Block validators:
+  * (Same as today) Validate blocks, (new) including verifying chunk endorsements
+  * (Same as today) Vote for blocks with endorsement or skip messages
+  * (New) No longer require tracking any shard
+  * (Same as today) Must collectively have a majority of all the validator stake, for security reasons.
+* (New) Chunk validators:
+  * Validate state witnesses and send chunk endorsements to block producers
+  * Do not require tracking any shard
+  * Must collectively have a majority of all the validator stake, to ensure the security of chunk validation.
+
+See the Validator Role Change section for more details.
+
+## Chunk Validator Shuffling
+
+Chunk validators will be randomly assigned to validate shards, for each block (or as we may decide later, for multiple blocks in a row, if required for performance reasons). A chunk validator may be assigned multiple shards at once, if it has sufficient stake.
+
+Each chunk validator's stake is divided into "mandates". There are full and partial mandates. The amount of stake for a full mandate is a fixed parameter determined by the stake distribution of all validators, and any remaining amount smaller than a full mandate is a partial mandate. A chunk validator therefore has zero or more full mandates plus at most one partial mandate. The list of full mandates and the list of partial mandates are then shuffled separately, and each list is partitioned equally across the shards (that is, the numbers of mandates assigned to any two shards differ by at most one). Any mandate assigned to a shard means that the chunk validator who owns the mandate is assigned to validate that shard. Because a chunk validator may have multiple mandates, it may be assigned multiple shards to validate.
+
+We have done research to show that the security of this algorithm is sufficient with a reasonable number of chunk validators and a reasonable number of shards, assuming a reasonable bound for the total stake of malicious nodes. TODO: Include or link to that research here.
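
The mandate assignment described above can be sketched as follows, in Rust and under several assumptions. `mandate_size` is passed in as a parameter (the NEP only says it is derived from the stake distribution). The per-block `seed` stands in for whatever randomness the protocol uses. SplitMix64 with a Fisher-Yates shuffle is a stand-in random shuffle, chosen only to keep the example dependency-free. All names are hypothetical. After shuffling, each list is dealt round-robin across the shards, which gives the "differ by at most one" property for full and partial mandates separately.

```rust
use std::collections::BTreeMap;

/// SplitMix64: a tiny deterministic PRNG, used only to keep this sketch free of
/// external crates. It is a stand-in for the protocol's actual source of randomness.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Fisher-Yates shuffle. The `%` reduction has a slight modulo bias, acceptable for a sketch.
fn shuffle<T>(items: &mut [T], rng: &mut SplitMix64) {
    for i in (1..items.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}

/// Splits each validator's stake into full mandates of `mandate_size` plus at most one
/// partial mandate, shuffles the two lists separately, and deals each list round-robin
/// across `num_shards`. Returns, per shard, the weight each assigned validator holds there.
pub fn assign_mandates(
    validators: &[(&str, u128)],
    mandate_size: u128,
    num_shards: usize,
    seed: u64,
) -> Vec<BTreeMap<String, u128>> {
    assert!(mandate_size > 0 && num_shards > 0);
    let mut full: Vec<(&str, u128)> = Vec::new();
    let mut partial: Vec<(&str, u128)> = Vec::new();
    for &(id, stake) in validators {
        for _ in 0..stake / mandate_size {
            full.push((id, mandate_size));
        }
        let remainder = stake % mandate_size;
        if remainder > 0 {
            partial.push((id, remainder));
        }
    }

    let mut rng = SplitMix64(seed);
    shuffle(&mut full, &mut rng);
    shuffle(&mut partial, &mut rng);

    // Round-robin dealing: within each list, per-shard mandate counts differ by at most one.
    let mut shards: Vec<BTreeMap<String, u128>> = vec![BTreeMap::new(); num_shards];
    for list in [&full, &partial] {
        for (i, &(id, weight)) in list.iter().enumerate() {
            *shards[i % num_shards].entry(id.to_string()).or_insert(0) += weight;
        }
    }
    shards
}
```

For instance, with `mandate_size = 100`, a validator with a stake of 250 gets two full mandates and one partial mandate of 50, so with `num_shards = 2` it may end up validating both shards. The per-shard weight maps returned here are the kind of input the endorsement threshold needs as its denominator.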
+
+## Reference Implementation
+
+TODO: This is essentially going to be describing the exact structure of `ChunkStateWitness`, `ChunkEndorsement`, and describing the exact algorithm to be used for the chunk validator shuffling.
+
+[This technical section is required for Protocol proposals but optional for other categories. A draft implementation should demonstrate a minimal implementation that assists in understanding or implementing this proposal. Explain the design in sufficient detail that:
+
+* Its interaction with other features is clear.
+* Where possible, include a Minimum Viable Interface subsection expressing the required behavior and types in a target programming language (i.e., traits and structs for Rust, interfaces and classes for JavaScript, function signatures and structs for C, etc.)
+* It is reasonably clear how the feature would be implemented.
+* Corner cases are dissected by example.
+* For protocol changes: A link to a draft PR on nearcore that shows how it can be integrated in the current code. It should at least solve the key technical challenges.
+
+The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.]
+
+## Validator Role Change
+
+Currently, there are two different types of validators, with the following responsibilities:
+
+|  | Top ~50% validators | Remaining validators (Chunk-only producers) |
+|-----|:-----:|:----:|
+| block production | Y | N |
+| chunk production | Y | Y |
+| block validation | Y | N |
+
+With stateless validation, this structure no longer makes sense, for several reasons:
+
+* Chunk production is the most resource-consuming activity.
+* Only chunk production needs the state in memory; the other responsibilities can be completed using state witnesses.
+* Chunk production does not have to be performed by all validators.
+
+Hence, the simplest proposal is to turn chunk-only producers into chunk-only validators, as follows:
+
+| | Top ~50% validators | Remaining validators (Chunk-only validators) |
+|-----|:-----:|:----:|
+| block production | Y | N |
+| chunk production | Y | N |
+| block validation | Y | N |
+| chunk validation | Y | Y |
+
+Block production and validation remain the responsibility of validators with more stake, to maintain the same level of security.
+
+This approach is the most straightforward, as it maintains the same grouping as we have today.
+
+A potential improvement that could lower the hardware requirements for more validators is to limit chunk production to the top N validators, who often already run powerful machines:
+
+|  | Top N validators (Chunk proposers) | Top ~50% - N validators | Remaining validators (Chunk-only validators) |
+|-----|:-----:|:----:|:----:|
+| block production | Y | Y | N |
+| chunk production | Y | N | N |
+| block validation | Y | Y | N |
+| chunk validation | Y | Y | Y |
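
The three-tier variant in the last table can be summarized as a function of a validator's stake rank. This is a hypothetical Rust sketch. `duties`, `num_chunk_proposers` (N) and `num_block_producers` (the "top ~50%" cutoff) are illustrative names, and the sketch assumes, as the role name implies, that chunk-only validators perform chunk validation.

```rust
/// Hypothetical per-validator duties under the three-tier variant.
#[derive(Debug, PartialEq, Eq)]
pub struct Duties {
    pub block_production: bool,
    pub chunk_production: bool,
    pub block_validation: bool,
    pub chunk_validation: bool,
}

/// `rank` is the validator's position when sorted by stake, descending (0 = largest).
/// `num_chunk_proposers` is N and `num_block_producers` is the "top ~50%" cutoff.
pub fn duties(rank: usize, num_chunk_proposers: usize, num_block_producers: usize) -> Duties {
    // In this variant, chunk proposers are a subset of the block producers.
    assert!(num_chunk_proposers <= num_block_producers);
    let in_top_tier = rank < num_block_producers;
    Duties {
        block_production: in_top_tier,
        chunk_production: rank < num_chunk_proposers,
        block_validation: in_top_tier,
        // Every validator, including chunk-only validators, validates chunks.
        chunk_validation: true,
    }
}
```

Setting `num_chunk_proposers` equal to `num_block_producers` recovers the two-tier proposal from the previous table.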
+
+## Security Implications
+
+[Explicitly outline any security concerns in relation to the NEP, and potential ways to resolve or mitigate them. At the very least, well-known relevant threats must be covered, e.g. person-in-the-middle, double-spend, XSS, CSRF, etc.]
+
+## Alternatives
+
+[Explain any alternative designs that were considered and the rationale for not choosing them. Why is your design superior?]
+
+## Future possibilities
+
+[Describe any natural extensions and evolutions to the NEP proposal, and how they would impact the project. Use this section as a tool to help fully consider all possible interactions with the project in your proposal. This is also a good place to "dump ideas"; if they are out of scope for the NEP but otherwise related. Note that having something written down in the future-possibilities section is not a reason to accept the current or a future NEP. Such notes should be in the section on motivation or rationale in this or subsequent NEPs. The section merely provides additional information.]
+
+## Consequences
+
+[This section describes the consequences, after applying the decision. All consequences should be summarized here, not just the "positive" ones. Record any concerns raised throughout the NEP discussion.]
+
+### Positive
+
+- p1
+
+### Neutral
+
+- n1
+
+### Negative
+
+- n1
+
+### Backwards Compatibility
+
+[All NEPs that introduce backwards incompatibilities must include a section describing these incompatibilities and their severity. The author must explain how they propose to deal with these incompatibilities. Submissions without a sufficient backwards compatibility treatise may be rejected outright.]
+
+## Unresolved Issues (Optional)
+
+[Explain any issues that warrant further discussion. Considerations
+
+- What parts of the design do you expect to resolve through the NEP process before this gets merged?
+- What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
+- What related issues do you consider out of scope for this NEP that could be addressed in the future independently of the solution that comes out of this NEP?]
+
+## Changelog
+
+[The changelog section provides historical context for how the NEP developed over time. Initial NEP submission should start with version 1.0.0, and all subsequent NEP extensions must follow [Semantic Versioning](https://semver.org/). Every version should have the benefits and concerns raised during the review. The author does not need to fill out this section for the initial draft. Instead, the assigned reviewers (Subject Matter Experts) should create the first version during the first technical review. After the final public call, the author should then finalize the last version of the decision context.]
+
+### 1.0.0 - Initial Version
+
+> Placeholder for the context about when and who approved this NEP version.
+
+#### Benefits
+
+> List of benefits filled by the Subject Matter Experts while reviewing this version:
+
+- Benefit 1
+- Benefit 2
+
+#### Concerns
+
+> Template for Subject Matter Experts review for this version:
+> Status: New | Ongoing | Resolved
+
+|   # | Concern | Resolution | Status |
+| --: | :------ | :--------- | -----: |
+|   1 |         |            |        |
+|   2 |         |            |        |
+
+## Copyright
+
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).