Technical details

cubist is designed around content-addressable storage, where 256-bit hashes (generated via BLAKE3) are used pervasively to reference objects in both the object storage layer and internal data structures.

File trees

Each node of a file tree is one of:

a file that is empty or references a block tree by its hash
a symlink that references another file by its path
a directory that contains zero or more child nodes

Each node contains the following metadata:

inode
mode
group
owner
accessed
created
modified

Block trees

Each node in a block tree is one of:

a leaf block (level 0), containing data compressed using Zstandard and referenced by the hash of this data
a branch block (level N >= 1), containing hashes that reference nodes of level N-1 and referenced by the hash of all its constituent hashes

Files

Files are split up into leaf blocks in a streaming fashion using the FastCDC v2020. Due to the nature of content-defined chunking, block size is specified as a range instead of a single number, which allows the algorithm to split the file at relatively consistent points, in contrast to fixed-size chunking in which any inserted or deleted data results in a completely different stream of blocks. Specifying --target-block-size=<N> will result block sizes in the range [N / 2, N * 4].

The default block size is 1 MiB, which is selected to compress well, work well with block storage systems, and minimize the number of requests necessary to read and write large files. To keep block sizes consistent, branch blocks are limited to the this size as well, meaning that with the default size of 1 MiB, a branch block can store up to 32768 hashes of 256 bits (32 bytes) each.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TECHNICAL.md

TECHNICAL.md

Technical details

Archives

File trees

Block trees

Files

Files

TECHNICAL.md

Latest commit

History

TECHNICAL.md

File metadata and controls

Technical details

Archives

File trees

Block trees

Files