feat: initial structure of mdBook including existing content
mlegner committed Jun 5, 2024
1 parent 1fc5958 commit 5fad5c1
Showing 18 changed files with 520 additions and 5 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
@@ -41,7 +41,7 @@ jobs:
name: Check spelling
steps:
- uses: actions/checkout@v4
- uses: crate-ci/typos@v1.21.0
- uses: crate-ci/typos@v1.22.0

check-all:
name: Check if all lint jobs succeeded
3 changes: 3 additions & 0 deletions .gitignore
@@ -9,6 +9,9 @@
# Custom pre-commit configuration
.custom-pre-commit-config.yaml

# mdBook
build/

# Misc
*.key
.env
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -20,7 +20,7 @@ repos:
- id: markdownlint-cli2
args: ["--fix"]
- repo: https://github.com/crate-ci/typos
rev: v1.21.0
rev: v1.22.0
hooks:
- id: typos
pass_filenames: false
20 changes: 17 additions & 3 deletions README.md
@@ -1,7 +1,21 @@
# Documentation and examples for the Walrus decentralized storage
# The Walrus decentralized blob storage system

Walrus is a decentralized blob store using [Sui](https://github.com/MystenLabs/sui) for coordination
and governance.
Welcome to the GitHub repository for Walrus, a decentralized storage and availability protocol
designed specifically for large binary files, or "blobs". Walrus focuses on providing a robust
solution for storing unstructured content on decentralized storage nodes while ensuring high
availability and reliability even in the presence of Byzantine faults.

## Documentation

Our documentation is available at **TBD**; it is generated using
[mdBook](https://rust-lang.github.io/mdBook/) from source files in the [`docs/`](./docs/) directory.

You can also build and access the documentation locally (assuming you have Rust installed):

```sh
cargo install mdbook
mdbook serve
```

## Get help and report issues

9 changes: 9 additions & 0 deletions book.toml
@@ -0,0 +1,9 @@
[book]
authors = ["Mysten Labs <[email protected]>"]
language = "en"
multilingual = false
src = "docs"
title = "Walrus"

[build]
build-dir = "build"
51 changes: 51 additions & 0 deletions docs/README.md
@@ -0,0 +1,51 @@
# Walrus

Welcome to the developer documentation for Walrus, a decentralized storage and availability protocol
designed specifically for large binary files, or "blobs". Walrus focuses on providing a robust
solution for storing unstructured content on decentralized storage nodes while ensuring high
availability and reliability even in the presence of Byzantine faults.

## Features

- **Storage and retrieval**: Walrus supports storage operations to write and read blobs. It also
allows anyone to prove that a blob has been stored and is available for retrieval at a later
time.

- **Cost efficiency**: By utilizing advanced error correction coding, Walrus maintains storage
costs at approximately five times the size of the stored blobs, and encoded parts of each blob
are stored on each storage node. This is significantly more cost-effective compared to
traditional full replication methods and much more robust against failures compared to
protocols that only store each blob on a subset of storage nodes.

- **Integration with the Sui blockchain**: Walrus leverages [Sui](https://github.com/MystenLabs/sui)
for coordination, attesting availability, and payments. Storage space can be owned as a resource on
Sui, split, merged, and transferred. Blob storage is represented using storage objects on Sui, and
smart contracts can check whether a blob is available and for how long.

- **Flexible access**: Users can interact with Walrus through a command-line interface (CLI),
software development kits (SDKs), and web2 HTTP technologies. Walrus is designed to work well
with traditional caches and content distribution networks (CDNs), while ensuring all operations
can also be run using local tools to maximize decentralization.

## Architecture and operations

Walrus's architecture ensures that content remains accessible and retrievable even when many
storage nodes are unavailable or malicious. Under the hood it uses modern error correction
techniques based on fast linear fountain codes, augmented to ensure resilience against Byzantine
faults, and a dynamically changing set of storage nodes. The core of Walrus remains simple, and
storage node management and blob certification leverage Sui smart contracts.

This documentation is split into several parts. The first part provides an overview of the
objectives, security properties, and architecture of the Walrus system. The second part contains
concrete documentation on the usage of Walrus. At the end, we provide a [glossary](./glossary.md),
which defines key terms used throughout the project.

Walrus is architected to provide a reliable and cost-effective solution for large-scale blob
storage, making it an ideal choice for applications requiring decentralized, affordable, durable,
and accessible data storage.

## Sources

This documentation is built using [mdBook](https://rust-lang.github.io/mdBook/) from source files in
[github.com/MystenLabs/walrus-docs/](https://github.com/MystenLabs/walrus-docs/). Please report or
fix any errors you find in this documentation in that GitHub project.
36 changes: 36 additions & 0 deletions docs/SUMMARY.md
@@ -0,0 +1,36 @@
<!-- markdownlint-disable MD025 MD042 -->
# Summary

[Walrus](./README.md)

---

# Overview

- [Objectives and use-cases](./objectives_use_cases.md)
- [Overview](./overview.md)
- [Properties](./properties.md)
- [Architecture](./architecture.md)
- [Encoding](./encoding.md)
- [Operations](./operations.md)
- [Sui operations](./operations-sui.md)
- [Off-chain operations](./operations-off-chain.md)
- [Future discussion](./future.md)

# Usage

- [Setup]()
- [Prerequisites]()
- [Installation]()
- [Configuration]()
- [Interacting with Walrus]()
- [Using the client CLI]()
- [Using the client JSON API]()
- [Using the client daemon]()
- [Examples]()

# Walrus sites

---

[Glossary](./glossary.md)
43 changes: 43 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,43 @@
# Basic architecture and security assumptions

The key actors in the Walrus architecture are the following:

- **Users** through **clients** want to store and read **blobs**. They are ready to pay for service
when it comes to writes, and when it comes to non-best-effort reads. Users also want to prove
the **availability** of a blob to third parties without the cost of sending or receiving the full
blob. Users may be malicious in various ways: they may wish to not pay for services, prove the
availability of an unavailable blob, modify or delete blobs without authorization, try to
exhaust resources of storage nodes, etc.
- **Storage nodes** hold one or many **shards** within a **storage epoch**. Each blob is erasure
encoded in many **slivers** and slivers from each stored blob become part of all shards. A shard
at any storage epoch is associated with a **storage node** that actually stores all slivers of
the shard, and is ready to serve them. The assignment of storage nodes to shards within
**storage epochs** is controlled by a Sui smart contract and we assume that more than 2/3 of the
shards are managed by correct storage nodes within each storage epoch. This means that we must
tolerate up to 1/3 Byzantine storage nodes within each storage epoch and across storage epochs.
- All clients and storage nodes operate a **blockchain** client (specifically on Sui), and mediate
payments, resources (space), mapping of shards to storage nodes, and metadata through blockchain
smart contracts. Users interact with the blockchain to get storage resources and certify stored
blobs, and storage nodes listen to the blockchain events to coordinate their operations.
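
As a rough illustration of the fault assumption above, the following Python sketch computes how
many shards may be held by Byzantine storage nodes and how many shards a quorum needs. It assumes
the common convention of at least $3f+1$ shards in total and the $2f+1$ quorum size that appears in
the glossary; the function names are hypothetical and not part of any Walrus API.

```python
def fault_tolerance(total_shards: int) -> int:
    """Return the largest f such that total_shards >= 3f + 1 (assumed convention)."""
    if total_shards < 1:
        raise ValueError("need at least one shard")
    return (total_shards - 1) // 3


def has_quorum(signing_shards: int, total_shards: int) -> bool:
    """Return True if the signing shards number at least 2f + 1."""
    f = fault_tolerance(total_shards)
    return signing_shards >= 2 * f + 1


if __name__ == "__main__":
    n = 1000  # illustrative shard count, not a protocol constant
    f = fault_tolerance(n)
    print(f, 2 * f + 1)  # prints "333 667"
```

Counting shards rather than storage nodes reflects that the assumption is stated per shard: a
storage node holding many shards carries proportionally more weight.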

Walrus supports any number of additional optional infrastructure actors that can operate in a
permissionless way:

- **Caches** are **clients** that store one or more full blobs and make them available to users
over traditional web2 (HTTP, etc.) technologies. They are optional in that end-users may also
operate a local cache, and perform Walrus reads over web2 technologies locally. However, cache
infrastructures may also act as CDNs, share the cost of blob reconstruction over many requests,
have better connectivity, etc. A client can always verify that reads from such infrastructures
are correct.
- **Publishers** are **clients** that help end-users store a blob using web2 technologies, and
using less bandwidth and custom logic. They in effect receive the blob to be published over
traditional web2 protocols (e.g., HTTP), and perform the Walrus store protocol on their behalf,
including the encoding, distribution of slivers to shards, creation of the certificate of
availability, and other on-chain actions. They are optional in that a user may directly interact
with both Sui
and storage nodes to store blobs directly. An end user can always verify that a publisher
performed their duties correctly by attesting availability.

Caches, publishers, and end-users are not considered trusted components of the system, and they may
deviate from the protocol arbitrarily. However, some of the security properties of Walrus only hold
for honest end-users that use honest intermediaries (caches and publishers). We provide means for
end-users to audit the correct operation of both caches and publishers.
Binary file added docs/assets/WriteFlow.png
37 changes: 37 additions & 0 deletions docs/encoding.md
@@ -0,0 +1,37 @@
# Encoding, overheads, and verification

We summarize here the basic encoding and cryptographic techniques used in Walrus.

- **Storage nodes** hold one or many **shards** in a storage epoch, out of a larger total (say 1000)
and each shard contains one blob **sliver** for each blob past PoA. Each shard is assigned to a
storage node in a storage epoch.
- An [erasure code](https://en.wikipedia.org/wiki/Online_codes) **encode algorithm** takes a blob,
and encodes it as $K$ symbols, such that any fraction $p$ of symbols can be used to reconstruct
the blob. Each blob sliver contains a fixed number of such symbols.
- We select $p<1/3$ so that a third of symbols and also slivers may be used to reconstruct the blob
by the **decode algorithm**. The matrix used to produce the erasure code is fixed and the same
for all blobs by the Walrus system, and encoders have no discretion about it.
- Storage nodes manage one or more shards, and the corresponding slivers of each blob are
distributed to all the storage shards. As a result, the overhead of the distributed store is ~5x
that of the blob itself, no matter how many shards we have. The encoding is systematic, meaning
that some storage nodes hold part of the plain blob, allowing for fast random access reads.
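
To make the "any sufficiently large subset of symbols reconstructs the blob" property concrete,
here is a minimal Python sketch of a systematic erasure code. It is not the fountain-code
construction that Walrus uses: it swaps in Reed-Solomon-style polynomial interpolation over a small
prime field because that fits in a few lines. The field size and all function names are
assumptions made for illustration.

```python
PRIME = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points` (Lagrange form, mod PRIME)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


def encode(data: bytes, n: int):
    """Systematic encoding: symbols 0..k-1 are the data bytes, k..n-1 are parity."""
    k = len(data)
    if not 1 <= k <= n <= PRIME:
        raise ValueError("need 1 <= len(data) <= n <= PRIME")
    points = list(enumerate(data))
    return [(x, data[x] if x < k else _interpolate(points, x)) for x in range(n)]


def decode(symbols, k: int) -> bytes:
    """Rebuild the k data bytes from any k distinct (index, value) symbols."""
    if len(symbols) < k:
        raise ValueError("not enough symbols to reconstruct")
    points = symbols[:k]
    return bytes(_interpolate(points, x) for x in range(k))


if __name__ == "__main__":
    blob = b"walrus"
    symbols = encode(blob, 18)  # 6 data symbols plus 12 parity symbols
    # Any 6 of the 18 symbols (one third) are enough, here the last 6 parity symbols.
    print(decode(symbols[12:], len(blob)) == blob)
```

Because the first `k` symbols are the data itself, a reader that obtains those symbols needs no
decoding at all, which is the practical benefit of a systematic encoding.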

Each blob is also associated with some metadata including a blob ID to allow verification:

- A blob ID is computed as an authenticator of the set of all shard data and metadata (byte size,
encoding, blob hash). We hash a sliver representation in each of the shards and add the resulting
hashes into a Merkle tree. Then the root of the Merkle tree is the blob hash used to derive the
blob ID that identifies the blob in the system.
- Each storage node may use the blob ID to check if some shard data belongs to a blob using the
authenticated structure corresponding to the blob hash (Merkle tree). A successful check means
that the data is indeed as intended by the writer of the blob (who, remember, may be corrupt).
- When any party reconstructs a blob ID from shard data and slivers, or accepts any blob purporting
to be a specific blob ID, it must check that it encodes to the correct blob ID. This process
involves re-coding the blob using the erasure correction code, and re-deriving the blob ID to
check the blob indeed matches it. This prevents a malformed blob (i.e., incorrectly erasure coded)
from ever being read with a blob ID at any correct recipient.
- A set of slivers above the reconstruction threshold belonging to a blob ID that are either
inconsistent or lead to the reconstruction of a different ID represent an incorrect encoding
(this may happen if the user that encoded the blob was malicious and encoded it incorrectly).
Storage nodes may delete slivers belonging to inconsistently encoded blobs, and upon request
return an inconsistency proof.
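
The blob ID derivation and sliver check described above can be sketched in Python as follows. The
hash function (SHA-256), the domain-separation prefixes, the padding rule for odd levels, and the
way size and encoding are folded into the ID are all assumptions chosen for illustration; the
source text only states that sliver hashes form a Merkle tree whose root is used to derive the
blob ID.

```python
import hashlib

LEAF, NODE, ID = b"\x00", b"\x01", b"\x02"  # assumed domain-separation prefixes
PAD = bytes(32)  # assumed filler for levels with an odd number of nodes


def _h(prefix: bytes, data: bytes) -> bytes:
    return hashlib.sha256(prefix + data).digest()


def _levels(slivers):
    """Return all tree levels, from hashed leaves up to the single root."""
    if not slivers:
        raise ValueError("need at least one sliver")
    level = [_h(LEAF, s) for s in slivers]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [PAD]
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [_h(NODE, level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(slivers) -> bytes:
    return _levels(slivers)[-1][0]


def merkle_proof(slivers, index: int):
    """Collect the sibling hash at every level below the root."""
    proof = []
    for level in _levels(slivers)[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof


def verify_sliver(sliver: bytes, index: int, proof, root: bytes) -> bool:
    """Check that `sliver` sits at position `index` under `root`."""
    node = _h(LEAF, sliver)
    for sibling in proof:
        pair = node + sibling if index % 2 == 0 else sibling + node
        node = _h(NODE, pair)
        index //= 2
    return node == root


def blob_id(slivers, size: int, encoding: str) -> str:
    """Derive an ID from the Merkle root plus metadata (assumed layout)."""
    payload = merkle_root(slivers) + size.to_bytes(8, "big") + encoding.encode()
    return _h(ID, payload).hex()
```

A reader that reconstructs a blob would re-encode it, recompute `blob_id` over the resulting
slivers, and reject the blob if the result differs from the ID it asked for.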
10 changes: 10 additions & 0 deletions docs/future.md
@@ -0,0 +1,10 @@
# Future discussion

In this document, we left out details of the following features:

- Shard transfer and recovery upon storage epoch change. The encoding scheme used has been designed
to allow this operation to be efficient. A storage node only needs to get data of the same
order of magnitude as the missing sliver data to reconstruct it.
- Details of light clients that can be used to sample availability. Individual clients may sample
the certified blobs from Sui metadata, and sample the availability of some slivers that they
store. On-chain bounties may be used to retrieve these slivers for missing blobs.
32 changes: 32 additions & 0 deletions docs/glossary.md
@@ -0,0 +1,32 @@
# Walrus Glossary

To make communication as clear and efficient as possible, we make sure to use a single term for
every Walrus entity/concept and *do not* use any synonyms. The following table lists various
concepts, their canonical name, and how they relate to / differ from other terms.

Italicized terms in the description indicate other specific Walrus terms contained in the table.

| Approved name | Description |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| storage node (SN) | entity storing data for Walrus; holds one or several *shards* |
| blob | single unstructured data object stored on Walrus |
| shard | (disjoint) subset of erasure-encoded data of all *blobs*; at every point in time, a *shard* is assigned to and stored on a single *SN* |
| sliver | erasure-encoded data of one *shard* corresponding to a single blob for one of the two encodings; this contains several erasure-encoded symbols of that blob but not the *blob metadata* |
| blob ID | cryptographic ID computed from a *blob*’s *slivers* |
| blob metadata | metadata of one *blob*; in particular, this contains a hash per *shard* to enable the authentication of *slivers* and recovery symbols |
| (end) user | any entity/person that wants to store or read *blobs* on/from Walrus; can act as a Walrus client itself or use the simple interface exposed by *publishers* and *caches* |
| publisher | service interacting with Sui and the *SNs* to store *blobs* on Walrus; offers a simple HTTP POST endpoint to *end users* |
| aggregator | service that reconstructs *blobs* by interacting with *SNs* and exposes a simple HTTP GET endpoint to *end users* |
| cache | an *aggregator* with additional caching capabilities |
| (Walrus) client | entity interacting directly with the *SNs*; this can be an *aggregator*/*cache*, a *publisher*, or an *end user* |
| (blob) reconstruction | decoding of the primary *slivers* to obtain the blob; includes re-encoding the *blob* and checking the Merkle proofs |
| (shard/sliver) recovery | process of an SN recovering a *sliver* or full *shard* by obtaining recovery symbols from other *SNs* |
| storage attestation | process where *SNs* exchange challenges and responses to demonstrate that they are storing their currently assigned *shards* |
| certificate of availability (CoA) | a *blob ID* with signatures of *SNs* holding at least $2f+1$ *shards* in a specific *epoch* |
| point of availability (PoA) | point in time when a *CoA* is submitted to Sui and the corresponding *blob* is guaranteed to be available until its expiration |
| inconsistency proof | set of several recovery symbols with their Merkle proofs such that the decoded *sliver* does not match the corresponding hash; this proves an incorrect/inconsistent encoding by the client |
| inconsistency certificate | an aggregated signature from 2/3 of *SNs* (weighted by their number of *shards*) that they have seen and stored an *inconsistency proof* for a *blob ID* |
| storage committee | the set of *SNs* for a *storage epoch*, including metadata about the *shards* they are responsible for and other metadata |
| member | an *SN* that is part of a *committee* at some *epoch* |
| storage epoch                     | the epoch for Walrus as distinct from the epoch for Sui                                                                                                                                     |
| availability period | the period specified in *storage epochs* for which a *blob* is certified to be available on Walrus |