feat: initial structure of mdBook including existing content
Showing 18 changed files with 520 additions and 5 deletions.
@@ -9,6 +9,9 @@
# Custom pre-commit configuration
.custom-pre-commit-config.yaml

# mdBook
build/

# Misc
*.key
.env
@@ -0,0 +1,9 @@
[book]
authors = ["Mysten Labs <[email protected]>"]
language = "en"
multilingual = false
src = "docs"
title = "Walrus"

[build]
build-dir = "build"
@@ -0,0 +1,51 @@
# Walrus

Welcome to the developer documentation for Walrus, a decentralized storage and availability protocol
designed specifically for large binary files, or "blobs". Walrus focuses on providing a robust
solution for storing unstructured content on decentralized storage nodes while ensuring high
availability and reliability even in the presence of Byzantine faults.

## Features

- **Storage and retrieval**: Walrus supports storage operations to write and read blobs. It also
allows anyone to prove that a blob has been stored and is available for retrieval at a later
time.

- **Cost efficiency**: By utilizing advanced error correction coding, Walrus keeps storage costs at
approximately five times the size of the stored blobs, while encoded parts of each blob are stored
on each storage node. This is significantly more cost-effective than traditional full replication
and much more robust against failures than protocols that only store each blob on a subset of
storage nodes.

- **Integration with Sui blockchain**: Walrus leverages the [Sui](https://github.com/MystenLabs/sui)
blockchain for coordination, attesting availability, and payments. Storage space can be owned as a
resource on Sui, split, merged, and transferred. Blob storage is represented using storage objects
on Sui, and smart contracts can check whether a blob is available and for how long.

- **Flexible access**: Users can interact with Walrus through a command-line interface (CLI),
software development kits (SDKs), and web2 HTTP technologies. Walrus is designed to work well
with traditional caches and content distribution networks (CDNs), while ensuring all operations
can also be run using local tools to maximize decentralization.
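As a rough worked example of the cost comparison above, the sketch below contrasts full replication, where every storage node keeps a complete copy of a blob, with the approximately 5x overhead stated for Walrus. The `WALRUS_OVERHEAD = 5.0` constant is simply the text's approximation treated as fixed, and the blob size and node counts are illustrative assumptions; the real overhead depends on Walrus's encoding parameters.

```python
# Approximate overhead factor stated in the text; assumed constant here,
# although the real value depends on Walrus's encoding parameters.
WALRUS_OVERHEAD = 5.0


def full_replication_bytes(blob_size: int, n_nodes: int) -> int:
    """Every storage node keeps a complete copy of the blob."""
    return blob_size * n_nodes


def walrus_bytes(blob_size: int) -> float:
    """Encoded parts are spread over all nodes; the total is ~5x the blob
    size, independent of the number of nodes (per the approximation above)."""
    return blob_size * WALRUS_OVERHEAD


if __name__ == "__main__":
    blob = 1_000_000_000  # an illustrative 1 GB blob
    for n in (10, 100, 1000):
        ratio = full_replication_bytes(blob, n) / walrus_bytes(blob)
        print(f"{n:>5} nodes: full replication stores {ratio:.0f}x more data than Walrus")
```

Because the Walrus cost stays flat while full replication grows linearly with the number of nodes, the gap widens with network size: with 1,000 nodes, full replication stores 200 times as much data.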

## Architecture and operations

Walrus's architecture ensures that content remains accessible and retrievable even when many
storage nodes are unavailable or malicious. Under the hood, it uses modern error correction
techniques based on fast linear fountain codes, augmented to ensure resilience against Byzantine
faults, and a dynamically changing set of storage nodes. The core of Walrus remains simple, and
storage node management and blob certification leverage Sui smart contracts.

This documentation is split into several parts. The first part provides an overview of the
objectives, security properties, and architecture of the Walrus system. The second part contains
concrete documentation on the usage of Walrus. At the end, we provide a [glossary](./glossary.md),
which defines key terms used throughout the project.

Walrus is architected to provide a reliable and cost-effective solution for large-scale blob
storage, making it an ideal choice for applications requiring decentralized, affordable, durable,
and accessible data storage.

## Sources

This documentation is built using [mdBook](https://rust-lang.github.io/mdBook/) from source files in
[github.com/MystenLabs/walrus-docs/](https://github.com/MystenLabs/walrus-docs/). Please report or
fix any errors you find in this documentation in that GitHub project.
@@ -0,0 +1,36 @@
<!-- markdownlint-disable MD025 MD042 -->
# Summary

[Walrus](./README.md)

---

# Overview

- [Objectives and use-cases](./objectives_use_cases.md)
- [Overview](./overview.md)
- [Properties](./properties.md)
- [Architecture](./architecture.md)
- [Encoding](./encoding.md)
- [Operations](./operations.md)
- [Sui operations](./operations-sui.md)
- [Off-chain operations](./operations-off-chain.md)
- [Future discussion](./future.md)

# Usage

- [Setup]()
- [Prerequisites]()
- [Installation]()
- [Configuration]()
- [Interacting with Walrus]()
- [Using the client CLI]()
- [Using the client JSON API]()
- [Using the client daemon]()
- [Examples]()

# Walrus sites

---

[Glossary](./glossary.md)
@@ -0,0 +1,43 @@
# Basic architecture and security assumptions

The key actors in the Walrus architecture are the following:

- **Users** through **clients** want to store and read **blobs**. They are ready to pay for service
for writes and for non-best-effort reads. Users also want to prove the **availability** of a blob
to third parties without the cost of sending or receiving the full blob. Users may be malicious
in various ways: they may wish to not pay for services, prove the availability of an unavailable
blob, modify or delete blobs without authorization, try to exhaust the resources of storage nodes,
etc.
- **Storage nodes** hold one or many **shards** within a **storage epoch**. Each blob is erasure
encoded into many **slivers**, and slivers from each stored blob become part of all shards. A shard
at any storage epoch is associated with a **storage node** that actually stores all slivers of
the shard and is ready to serve them. The assignment of storage nodes to shards within
**storage epochs** is controlled by a Sui smart contract, and we assume that more than 2/3 of the
shards are managed by correct storage nodes within each storage epoch. This means that we must
tolerate Byzantine storage nodes managing fewer than 1/3 of the shards within each storage epoch
and across storage epochs.
- All clients and storage nodes operate a **blockchain** client (specifically for Sui), and mediate
payments, resources (space), mapping of shards to storage nodes, and metadata through blockchain
smart contracts. Users interact with the blockchain to get storage resources and certify stored
blobs, and storage nodes listen to the blockchain events to coordinate their operations.
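The threshold assumption above can be made concrete with a little arithmetic. This sketch assumes the usual BFT parameterization: out of `n` shards, at most `f = floor((n - 1) / 3)` are managed by Byzantine nodes (the largest `f` with `f < n/3`), and a quorum is `n - f` shards. The function names are hypothetical helpers for illustration, not part of any Walrus API.

```python
def max_byzantine_shards(n_shards: int) -> int:
    """Largest f with f < n/3, so that more than 2/3 of shards stay correct."""
    if n_shards < 1:
        raise ValueError("need at least one shard")
    return (n_shards - 1) // 3


def quorum_size(n_shards: int) -> int:
    """n - f shards: any two such quorums overlap in at least f + 1 shards,
    hence in at least one shard managed by a correct storage node."""
    return n_shards - max_byzantine_shards(n_shards)


def has_honest_majority(n_shards: int, byzantine: int) -> bool:
    """True if the assumption 'more than 2/3 of shards are correct' holds."""
    return 3 * (n_shards - byzantine) > 2 * n_shards
```

With 1,000 shards, `f = 333` and the quorum is 667 shards, which equals `2f + 1` because 1,000 has the form `3f + 1`; this matches the $2f+1$ threshold used for certificates of availability in the glossary. For shard counts of other forms, `n - f` is the safer quorum size, since `2f + 1` alone would not guarantee the `f + 1` overlap.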

Walrus supports any number of additional, optional infrastructure actors that can operate in a
permissionless way:

- **Caches** are **clients** that store one or more full blobs and make them available to users
over traditional web2 (HTTP, etc.) technologies. They are optional in that end-users may also
operate a local cache and perform Walrus reads over web2 technologies locally. However, cache
infrastructures may also act as CDNs, share the cost of blob reconstruction over many requests,
have better connectivity, etc. A client can always verify that reads from such infrastructures
are correct.
- **Publishers** are **clients** that help end-users store a blob using web2 technologies, less
bandwidth, and custom logic. In effect, they receive the blob to be published over traditional
web2 protocols (e.g., HTTP) and perform the Walrus store protocol on the end-user's behalf,
including the encoding, distribution of slivers to shards, creation of the certificate of
availability, and other on-chain actions. They are optional in that a user may directly interact
with both Sui and storage nodes to store blobs. An end user can always verify that a publisher
performed its duties correctly by attesting availability.

Caches, publishers, and end-users are not considered trusted components of the system, and they may
deviate from the protocol arbitrarily. However, some of the security properties of Walrus only hold
for honest end-users that use honest intermediaries (caches and publishers). We provide means for
end-users to audit the correct operation of both caches and publishers.
@@ -0,0 +1,37 @@
# Encoding, overheads, and verification

We summarize here the basic encoding and cryptographic techniques used in Walrus.

- **Storage nodes** hold one or many **shards** in a storage epoch, out of a larger total (say 1000),
and each shard contains one blob **sliver** for each blob past its PoA. Each shard is assigned to a
storage node in a storage epoch.
- An [erasure code](https://en.wikipedia.org/wiki/Online_codes) **encode algorithm** takes a blob
and encodes it as $K$ symbols, such that any fraction $p$ of symbols can be used to reconstruct
the blob. Each blob sliver contains a fixed number of such symbols.
- We select $p<1/3$ so that a third of the symbols, and hence of the slivers, suffices to
reconstruct the blob using the **decode algorithm**. The matrix used to produce the erasure code
is fixed by the Walrus system and is the same for all blobs, and encoders have no discretion
about it.
- Storage nodes manage one or more shards, and the corresponding slivers of each blob are
distributed to all the storage shards. As a result, the overhead of the distributed store is ~5x
that of the blob itself, no matter how many shards we have. The encoding is systematic, meaning
that some storage nodes hold part of the plain blob, allowing for fast random access reads.
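To illustrate the two properties listed above (any sufficiently large subset of symbols reconstructs the data, and a systematic encoding keeps the plain data among the symbols), here is a toy Reed-Solomon-style code over a prime field. It is deliberately not the encoding Walrus uses, which the text describes as based on fountain codes with Walrus-specific parameters. The modulus `P`, the representation of symbols as integers below `P`, and the names `encode`/`decode` are assumptions of this sketch.

```python
P = 2**31 - 1  # a Mersenne prime; symbols are integers modulo P (toy choice)


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) passing
    through `points` (a list of (x_i, y_i) pairs), via Lagrange's formula."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        # Division in the field is multiplication by the modular inverse.
        total = (total + yj * num * pow(den, P - 2, P)) % P
    return total


def encode(data, n):
    """Systematic encoding: symbol i is f(i), where f(i) = data[i] for i < k.
    Returns n >= k symbols; the first k are the original data symbols."""
    k = len(data)
    if n < k:
        raise ValueError("n must be at least k")
    source = list(enumerate(data))
    return [_interpolate(source, x) for x in range(n)]


def decode(symbols, k):
    """Recover the k data symbols from any k received {index: value} pairs."""
    if len(symbols) < k:
        raise ValueError("not enough symbols to reconstruct")
    points = list(symbols.items())[:k]
    return [_interpolate(points, x) for x in range(k)]


if __name__ == "__main__":
    data = [5, 17, 42, 99]          # k = 4 source symbols, each below P
    encoded = encode(data, 12)      # n = 12 symbols
    assert encoded[:4] == data      # systematic: plain data comes first
    received = {i: encoded[i] for i in (2, 5, 9, 11)}  # any 4 of the 12
    print(decode(received, 4))      # -> [5, 17, 42, 99]
```

Here any third of the 12 symbols reconstructs the data, mirroring the reconstruction threshold described above, at an overhead of n/k = 3. The ~5x figure quoted for Walrus comes from its own encoding design, which this toy does not reproduce.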

Each blob is also associated with some metadata, including a blob ID, to allow verification:

- A blob ID is computed as an authenticator of the set of all shard data and metadata (byte size,
encoding, blob hash). We hash a sliver representation in each of the shards and add the resulting
hashes into a Merkle tree. The root of the Merkle tree is then the blob hash used to derive the
blob ID that identifies the blob in the system.
- Each storage node may use the blob ID to check whether some shard data belongs to a blob using
the authenticated structure corresponding to the blob hash (the Merkle tree). A successful check
means that the data is indeed as intended by the writer of the blob (who, remember, may be
corrupt).
- When any party reconstructs a blob from shard data and slivers, or accepts any blob purporting
to correspond to a specific blob ID, it must check that the blob encodes to the correct blob ID.
This process involves re-encoding the blob using the erasure code and re-deriving the blob ID to
check that the blob indeed matches it. This prevents a malformed blob (i.e., an incorrectly
erasure-coded one) from ever being read with a blob ID at any correct recipient.
- A set of slivers above the reconstruction threshold belonging to a blob ID that are either
inconsistent or lead to the reconstruction of a different ID represents an incorrect encoding
(this may happen if the user that encoded the blob was malicious and encoded it incorrectly).
Storage nodes may delete slivers belonging to inconsistently encoded blobs and, upon request,
return an inconsistency proof.
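The blob ID construction and the per-sliver check described above can be sketched with a Merkle tree over sliver hashes. This is a simplified illustration, not Walrus's actual format: SHA-256, the `0x00`/`0x01` leaf and node prefixes, pairing the last node with itself on odd-sized levels, and the way `blob_id` mixes the size and a hypothetical `encoding` label into the root are all assumptions of this sketch.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(sliver: bytes) -> bytes:
    # Prefix 0x00 separates leaf hashes from inner-node hashes (assumption).
    return _sha256(b"\x00" + sliver)


def node_hash(left: bytes, right: bytes) -> bytes:
    # Prefix 0x01 marks inner nodes (assumption).
    return _sha256(b"\x01" + left + right)


def merkle_levels(slivers: list[bytes]) -> list[list[bytes]]:
    """All tree levels, leaves first and root last. On odd-sized levels the
    last node is paired with itself (an assumption of this sketch)."""
    if not slivers:
        raise ValueError("need at least one sliver")
    level = [leaf_hash(s) for s in slivers]
    levels = [level]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [node_hash(padded[i], padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf `index` up to the root; the flag is True when
    the sibling sits to the right of the running hash."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):  # odd level: the node was paired with itself
            sibling = index
        proof.append((level[sibling], index % 2 == 0))
        index //= 2
    return proof


def verify_sliver(root: bytes, sliver: bytes, proof: list[tuple[bytes, bool]]) -> bool:
    """Check that one sliver belongs to the tree with the given root."""
    h = leaf_hash(sliver)
    for sibling, sibling_on_right in proof:
        h = node_hash(h, sibling) if sibling_on_right else node_hash(sibling, h)
    return h == root


def blob_id(slivers: list[bytes], blob_size: int, encoding: str = "toy") -> bytes:
    """Hypothetical blob ID: hash of the Merkle root plus size and encoding."""
    root = merkle_levels(slivers)[-1][0]
    return _sha256(root + blob_size.to_bytes(8, "big") + encoding.encode())
```

Because every sliver hash feeds into the root, changing any sliver, the size, or the encoding label changes the resulting ID. That is the property that lets a recipient reject data that does not match the blob ID it asked for, whether the data came from storage nodes, a cache, or a publisher.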
@@ -0,0 +1,10 @@
# Future discussion

In this document, we left out details of the following features:

- Shard transfer and recovery upon storage epoch change. The encoding scheme used has been designed
to make this operation efficient: a storage node only needs to fetch data of the same order of
magnitude as the missing sliver data to reconstruct it.
- Details of light clients that can be used to sample availability. Individual clients may sample
the certified blobs from Sui metadata and sample the availability of some slivers that they
store. On-chain bounties may be used to retrieve these slivers for missing blobs.
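The light-client sampling idea mentioned above rests on simple probability. As an illustration only, since the text leaves the actual design open, suppose a fraction `u` of a blob's slivers is unavailable and a client checks `s` slivers chosen independently and uniformly at random; the chance that at least one check fails is $1 - (1 - u)^s$. The independent-sampling model and the function names below are assumptions of this sketch.

```python
import math


def detection_probability(missing_fraction: float, samples: int) -> float:
    """Probability that at least one of `samples` independent, uniformly
    random checks hits a missing sliver."""
    if not 0.0 <= missing_fraction <= 1.0:
        raise ValueError("missing_fraction must be in [0, 1]")
    return 1.0 - (1.0 - missing_fraction) ** samples


def samples_needed(missing_fraction: float, confidence: float) -> int:
    """Smallest number of samples giving at least `confidence` probability
    of detecting that `missing_fraction` of the slivers is unavailable."""
    if not 0.0 < missing_fraction <= 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("need 0 < missing_fraction <= 1 and 0 < confidence < 1")
    if missing_fraction == 1.0:
        return 1
    # Solve (1 - u)^s <= 1 - c for s; both logarithms are negative.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - missing_fraction))
```

For example, detecting with 99% probability that 10% of the slivers are missing takes 44 independent samples under this model, regardless of the blob's size.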
@@ -0,0 +1,32 @@
# Walrus Glossary

To make communication as clear and efficient as possible, we make sure to use a single term for
every Walrus entity/concept and *do not* use any synonyms. The following table lists various
concepts, their canonical names, and how they relate to or differ from other terms.

Italicized terms in the description indicate other specific Walrus terms contained in the table.

| Approved name | Description |
| --- | --- |
| storage node (SN) | entity storing data for Walrus; holds one or several *shards* |
| blob | single unstructured data object stored on Walrus |
| shard | (disjoint) subset of erasure-encoded data of all *blobs*; at every point in time, a *shard* is assigned to and stored on a single *SN* |
| sliver | erasure-encoded data of one *shard* corresponding to a single blob for one of the two encodings; this contains several erasure-encoded symbols of that blob but not the *blob metadata* |
| blob ID | cryptographic ID computed from a *blob*’s *slivers* |
| blob metadata | metadata of one *blob*; in particular, this contains a hash per *shard* to enable the authentication of *slivers* and recovery symbols |
| (end) user | any entity/person that wants to store or read *blobs* on/from Walrus; can act as a Walrus client itself or use the simple interface exposed by *publishers* and *caches* |
| publisher | service interacting with Sui and the *SNs* to store *blobs* on Walrus; offers a simple HTTP POST endpoint to *end users* |
| aggregator | service that reconstructs *blobs* by interacting with *SNs* and exposes a simple HTTP GET endpoint to *end users* |
| cache | an *aggregator* with additional caching capabilities |
| (Walrus) client | entity interacting directly with the *SNs*; this can be an *aggregator*/*cache*, a *publisher*, or an *end user* |
| (blob) reconstruction | decoding of the primary *slivers* to obtain the blob; includes re-encoding the *blob* and checking the Merkle proofs |
| (shard/sliver) recovery | process of an SN recovering a *sliver* or full *shard* by obtaining recovery symbols from other *SNs* |
| storage attestation | process where *SNs* exchange challenges and responses to demonstrate that they are storing their currently assigned *shards* |
| certificate of availability (CoA) | a *blob ID* with signatures of *SNs* holding at least $2f+1$ *shards* in a specific *epoch* |
| point of availability (PoA) | point in time when a *CoA* is submitted to Sui and the corresponding *blob* is guaranteed to be available until its expiration |
| inconsistency proof | set of several recovery symbols with their Merkle proofs such that the decoded *sliver* does not match the corresponding hash; this proves an incorrect/inconsistent encoding by the client |
| inconsistency certificate | an aggregated signature from 2/3 of *SNs* (weighted by their number of *shards*) that they have seen and stored an *inconsistency proof* for a *blob ID* |
| storage committee | the set of *SNs* for a *storage epoch*, including metadata about the *shards* they are responsible for and other metadata |
| member | an *SN* that is part of a *committee* at some *epoch* |
| storage epoch | the epoch for Walrus as distinct from the epoch for Sui |
| availability period | the period specified in *storage epochs* for which a *blob* is certified to be available on Walrus |
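The certificate thresholds in the table are weighted by *shards*, not by the number of *SNs*. A minimal sketch of the CoA check follows, assuming a hypothetical committee represented as a mapping from SN name to shard count, a total of $n = 3f + 1$ shards so that the $2f+1$ threshold applies directly, and a signer set whose signatures have already been verified (signature checking itself is out of scope here).

```python
def coa_threshold(n_shards: int) -> int:
    """2f + 1 shards, where f = (n - 1) // 3 is the tolerated number of
    shards managed by Byzantine SNs (assumes n = 3f + 1)."""
    f = (n_shards - 1) // 3
    return 2 * f + 1


def certifies(committee: dict[str, int], signers: set[str]) -> bool:
    """True if the (already verified) signers together hold at least 2f + 1
    of the committee's shards. Signers outside the committee count for nothing."""
    n_shards = sum(committee.values())
    signed = sum(committee.get(sn, 0) for sn in signers)
    return signed >= coa_threshold(n_shards)


if __name__ == "__main__":
    # Hypothetical committee holding 1000 shards in total, so 2f + 1 = 667.
    committee = {"sn-a": 400, "sn-b": 300, "sn-c": 200, "sn-d": 100}
    print(certifies(committee, {"sn-a", "sn-b"}))          # 700 shards
    print(certifies(committee, {"sn-b", "sn-c", "sn-d"}))  # 600 shards
```

Weighting by shards means that three of the four SNs can still fall short of the threshold, while two large SNs can meet it; what matters is the share of encoded data the signers are responsible for.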