Toolchain architecture

The toolchain represents the production portion of Carbon. At a high level, the toolchain's top priorities are:

TODO: Add an expanded document that details the goals and priorities and link to it here.

The main components are:

A few common design patterns are:

Distinct steps: Each processing step produces a complete output structure, rather than using callbacks to pass data between structures.
- For example, the parser takes a Lex::TokenizedBuffer as input and produces a Parse::Tree as output.
- Performance: This should yield better locality than a callback approach.
- Understandability: Each step has a clear input and output, whereas callbacks obscure the flow of data.
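A minimal sketch of the distinct-steps pattern follows. It assumes heavily simplified, hypothetical stand-ins for Lex::TokenizedBuffer and Parse::Tree. The functions Tokenize and BuildTree, and the whitespace-only lexing, are illustrative assumptions, not Carbon's API. Each step runs to completion and hands a finished structure to the next:

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

namespace Lex {
// Hypothetical stand-in for Lex::TokenizedBuffer: only token spellings.
struct TokenizedBuffer {
  std::vector<std::string> tokens;
};

// Step 1: source text in, a finished TokenizedBuffer out.
// Splitting on whitespace is a simplifying assumption, not Carbon's lexer.
inline auto Tokenize(std::string_view source) -> TokenizedBuffer {
  TokenizedBuffer buffer;
  std::string current;
  for (char c : source) {
    if (std::isspace(static_cast<unsigned char>(c))) {
      if (!current.empty()) {
        buffer.tokens.push_back(current);
        current.clear();
      }
    } else {
      current += c;
    }
  }
  if (!current.empty()) {
    buffer.tokens.push_back(current);
  }
  return buffer;
}
}  // namespace Lex

namespace Parse {
enum class NodeKind { Identifier, Other };

// Hypothetical stand-in for Parse::Tree: one node per token.
struct Tree {
  struct NodeImpl {
    NodeKind kind;
    std::int32_t token;  // Index into the TokenizedBuffer it came from.
  };
  std::vector<NodeImpl> nodes;
};

// Step 2: a finished TokenizedBuffer in, a finished Tree out. The parser only
// reads the lexer's output; no callbacks run between the two steps.
inline auto BuildTree(const Lex::TokenizedBuffer& buffer) -> Tree {
  Tree tree;
  const auto count = static_cast<std::int32_t>(buffer.tokens.size());
  for (std::int32_t i = 0; i < count; ++i) {
    const std::string& spelling = buffer.tokens[i];
    assert(!spelling.empty());  // Tokenize never emits empty tokens.
    NodeKind kind = std::isalpha(static_cast<unsigned char>(spelling[0]))
                        ? NodeKind::Identifier
                        : NodeKind::Other;
    tree.nodes.push_back({kind, i});
  }
  return tree;
}
}  // namespace Parse
```

For example, Parse::BuildTree(Lex::Tokenize("fn Run ( ) ;")) yields five nodes, with node 0 (fn) classified as an identifier. Because the lexer has finished before parsing begins, the parser can walk the token vector sequentially.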
Vectorized storage: Data is stored in vectors, and flyweights (small index handles) are passed around, rather than the more typical approach of heap-allocating objects and linking them with pointers.
- For example, the parse tree is stored as a llvm::SmallVector<Parse::Tree::NodeImpl> indexed by Parse::Node, which wraps an int32_t.
- Performance: Storing data in vectors both minimizes memory allocation overhead and improves cache behavior, because adjacent entries are loaded into the cache together.
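The sketch below illustrates vectorized storage with a flyweight handle. It uses std::vector in place of llvm::SmallVector so it compiles without LLVM. The Node, NodeImpl, and Tree shown here are hypothetical simplifications. The postorder layout with subtree sizes is also an assumption, chosen only to show relationships expressed as indices rather than pointers:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Flyweight handle: just an int32_t index into the tree's node storage.
// A hypothetical simplification of Parse::Node; copying it is as cheap as an int.
class Node {
 public:
  explicit Node(std::int32_t index) : index_(index) {}
  auto index() const -> std::int32_t { return index_; }
  friend auto operator==(Node lhs, Node rhs) -> bool {
    return lhs.index_ == rhs.index_;
  }

 private:
  std::int32_t index_;
};

// Per-node data, stored contiguously. Structure is recorded as a count of
// nodes in each subtree (assumed postorder layout), not as child pointers.
struct NodeImpl {
  int kind;  // Hypothetical kind tag.
  std::int32_t subtree_size;
};

class Tree {
 public:
  // Appends a node. All nodes share one vector, so there is no per-node
  // heap allocation; the returned handle is the node's index.
  auto AddNode(int kind, std::int32_t subtree_size) -> Node {
    nodes_.push_back({kind, subtree_size});
    return Node(static_cast<std::int32_t>(nodes_.size()) - 1);
  }

  auto Get(Node node) const -> const NodeImpl& {
    assert(node.index() >= 0 && node.index() < size());
    return nodes_[node.index()];
  }

  auto size() const -> std::int32_t {
    return static_cast<std::int32_t>(nodes_.size());
  }

 private:
  std::vector<NodeImpl> nodes_;
};
```

For `a + b` in this assumed postorder layout, the operands are appended first and the `+` node last, with a subtree size of 3. Because a Node is only an index, storing or copying it costs no more than an int, and neighboring nodes sit next to each other in memory.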
Iterative processing: We rely on state stacks and iterative loops for parsing, rather than recursive function calls.
- For example, the parser tracks Parse::State enum values in state_stack_ and loops over them in Parse::Tree::Parse.
- Scalability: Complex code must not cause recursion issues. In Clang, we have seen stack frame recursion limits hit in unexpected ways, and non-recursive approaches largely avoid that risk.
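Below is a minimal sketch of iterative processing. It assumes a toy grammar of nested parentheses rather than Carbon's real grammar. State, ParseResult, and ParseGroups are hypothetical names. A single loop pops states from an explicit stack and pushes follow-up states, so nesting depth consumes heap memory in a vector instead of call-stack frames:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical states for a toy grammar:  Groups := ( '(' Groups ')' )*
enum class State {
  Groups,       // Parse zero or more parenthesized groups.
  GroupFinish,  // Expect the ')' that closes a group.
};

struct ParseResult {
  bool ok = false;
  std::int32_t groups = 0;     // Number of '(' ... ')' groups parsed.
  std::int32_t max_depth = 0;  // Deepest nesting seen.
};

// Parses with an explicit state stack and a loop instead of recursive calls.
inline auto ParseGroups(std::string_view input) -> ParseResult {
  ParseResult result;
  std::vector<State> state_stack = {State::Groups};
  std::size_t pos = 0;
  std::int32_t depth = 0;
  while (!state_stack.empty()) {
    State state = state_stack.back();
    state_stack.pop_back();
    switch (state) {
      case State::Groups:
        if (pos < input.size() && input[pos] == '(') {
          ++pos;
          ++depth;
          ++result.groups;
          if (depth > result.max_depth) {
            result.max_depth = depth;
          }
          // Pushed in reverse: the contents run first, then the ')', then
          // any sibling groups that follow.
          state_stack.push_back(State::Groups);
          state_stack.push_back(State::GroupFinish);
          state_stack.push_back(State::Groups);
        }
        // Otherwise this sequence of groups has ended; push nothing.
        break;
      case State::GroupFinish:
        if (pos < input.size() && input[pos] == ')') {
          ++pos;
          --depth;
        } else {
          return result;  // Missing ')': ok stays false.
        }
        break;
    }
  }
  // Leftover input, such as a stray ')', is an error.
  result.ok = (pos == input.size());
  return result;
}
```

With this approach, input nested 100,000 levels deep only grows state_stack to roughly 200,000 entries, where an equivalent recursive-descent parser would need one call frame per level.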

See also Idioms for abbreviations and more implementation techniques.