forked from carbon-language/carbon-lang
Move toolchain architecture to markdown (carbon-language#4242)
Note I'm mostly trying to capture [the docs](https://docs.google.com/document/d/1RRYMm42osyqhI2LyjrjockYCutQ5dOf8Abu50kTrkX0/edit?resourcekey=0-kHyqOESbOHmzZphUbtLrTw&tab=t.0) as they exist today, not fixing issues with the docs. I think the doc itself hasn't changed much lately (i.e., for months). Trying to organize it a little better though, particularly so that it shows up reasonably when looking in github or the website.

Co-authored-by: Geoff Romer <[email protected]>
Showing 13 changed files with 2,696 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
# Toolchain architecture | ||
|
||
<!-- | ||
Part of the Carbon Language project, under the Apache License v2.0 with LLVM | ||
Exceptions. See /LICENSE for license information. | ||
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
--> | ||
|
||
<!-- toc --> | ||
|
||
## Table of contents | ||
|
||
- [Goals](#goals) | ||
- [High-level architecture](#high-level-architecture) | ||
- [Design patterns](#design-patterns) | ||
- [Adding features](#adding-features) | ||
|
||
<!-- tocstop --> | ||
|
||
## Goals | ||
|
||
The toolchain represents the production portion of Carbon. At a high level, the | ||
toolchain's top priorities are: | ||
|
||
- Correctness. | ||
- Quality of generated code, including performance. | ||
- Compilation performance. | ||
- Quality of diagnostics for incorrect or questionable code. | ||
|
||
TODO: Add an expanded document that details the goals and priorities and link to | ||
it here. | ||
|
||
## High-level architecture | ||
|
||
The main components are: | ||
|
||
- [Driver](driver.md): Provides commands and ties together compilation flow. | ||
- [Diagnostics](diagnostics.md): Produces diagnostic output. | ||
- Compilation flow: | ||
|
||
1. Source: Load the file into a | ||
[SourceBuffer](/toolchain/source/source_buffer.h). | ||
2. [Lex](lex.md): Transform a SourceBuffer into a | ||
[Lex::TokenizedBuffer](/toolchain/lex/tokenized_buffer.h). | ||
3. [Parse](parse.md): Transform a TokenizedBuffer into a | ||
[Parse::Tree](/toolchain/parse/tree.h). | ||
4. [Check](check.md): Transform a Tree to produce | ||
[SemIR::File](/toolchain/sem_ir/file.h). | ||
5. [Lower](lower.md): Transform the SemIR to an | ||
[LLVM Module](https://llvm.org/doxygen/classllvm_1_1Module.html). | ||
6. CodeGen: Transform the LLVM Module into an object file. | ||
|
||
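The compilation flow above, where each step consumes one complete structure and produces the next, could be sketched roughly as follows. This is only an illustration: `SourceBuffer`, `TokenizedBuffer`, and `ParseTree` here are simplified, hypothetical stand-ins rather than the toolchain's real classes, and `Lex`, `Parse`, and `CompileToTree` are made-up free functions, not the actual API.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real output structures. They only exist to
// show the shape of the flow and are not the toolchain's classes.
struct SourceBuffer {
  std::string text;
};

struct TokenizedBuffer {
  std::vector<std::string> tokens;
};

struct ParseTree {
  // One node per token, just to give this step its own output structure.
  std::vector<int> node_token_indices;
};

// Lex: consumes a complete SourceBuffer and returns a complete
// TokenizedBuffer. Tokens here are simply whitespace-separated words.
auto Lex(const SourceBuffer& source) -> TokenizedBuffer {
  TokenizedBuffer result;
  std::string current;
  for (char c : source.text) {
    if (std::isspace(static_cast<unsigned char>(c))) {
      if (!current.empty()) {
        result.tokens.push_back(std::move(current));
        current.clear();
      }
    } else {
      current.push_back(c);
    }
  }
  if (!current.empty()) {
    result.tokens.push_back(std::move(current));
  }
  return result;
}

// Parse: consumes a complete TokenizedBuffer and returns a complete ParseTree.
auto Parse(const TokenizedBuffer& tokens) -> ParseTree {
  ParseTree tree;
  for (int i = 0; i < static_cast<int>(tokens.tokens.size()); ++i) {
    tree.node_token_indices.push_back(i);
  }
  return tree;
}

// Each step's output is the next step's input; no callbacks pass data
// between the steps.
auto CompileToTree(std::string text) -> ParseTree {
  SourceBuffer source{std::move(text)};
  TokenizedBuffer tokens = Lex(source);
  return Parse(tokens);
}
```

Because every step returns a finished structure, each one can be tested or dumped in isolation, which is the understandability benefit the design aims for.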
### Design patterns | ||
|
||
A few common design patterns are: | ||
|
||
- Distinct steps: Each step of processing produces an output structure, | ||
avoiding callbacks passing data between structures. | ||
|
||
- For example, the parser takes a `Lex::TokenizedBuffer` as input and | ||
produces a `Parse::Tree` as output. | ||
|
||
- Performance: Producing a complete output structure per step should yield better locality than a callback approach. | ||
|
||
- Understandability: Each step has a clear input and output, versus | ||
callbacks which obscure the flow of data. | ||
|
||
- Vectorized storage: Data is stored in vectors and flyweights are passed | ||
around, avoiding more typical heap allocation with pointers. | ||
|
||
- For example, the parse tree is stored as a | ||
`llvm::SmallVector<Parse::Tree::NodeImpl>` indexed by `Parse::Node`, | ||
which wraps an `int32_t`. | ||
|
||
- Performance: Vectorization both minimizes memory allocation overhead and | ||
enables better read caching because adjacent entries will be cached | ||
together. | ||
|
||
- Iterative processing: We rely on state stacks and iterative loops for | ||
parsing, avoiding recursive function calls. | ||
|
||
- For example, the parser has a `Parse::State` enum tracked in | ||
`state_stack_`, and loops in `Parse::Tree::Parse`. | ||
|
||
- Scalability: Complex code must not cause recursion issues. In Clang, we have | ||
seen stack frame recursion limits hit in unexpected ways, and | ||
non-recursive approaches largely avoid that risk. | ||
|
||
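As a rough illustration of the vectorized storage pattern listed above, the sketch below stores nodes contiguously and hands out a small index wrapper instead of pointers. All names (`NodeId`, `NodeImpl`, `Tree`) are hypothetical, and `std::vector` is used in place of `llvm::SmallVector` only to keep the example self-contained.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flyweight: a 32-bit index that is cheap to copy and pass by
// value, in the spirit of an ID type wrapping an int32_t.
struct NodeId {
  int32_t index;
};

// Hypothetical node payload, stored by value inside the vector.
struct NodeImpl {
  int32_t kind;
  int32_t subtree_size;
};

class Tree {
 public:
  // Appends a node and returns its flyweight ID. There is no per-node heap
  // allocation; the vector grows in amortized constant time.
  auto Add(NodeImpl node) -> NodeId {
    nodes_.push_back(node);
    return NodeId{static_cast<int32_t>(nodes_.size() - 1)};
  }

  // Looks a node up by ID. Adjacent IDs refer to adjacent memory.
  auto Get(NodeId id) const -> const NodeImpl& { return nodes_[id.index]; }

  auto size() const -> int32_t { return static_cast<int32_t>(nodes_.size()); }

 private:
  std::vector<NodeImpl> nodes_;
};
```

One design consequence worth noting: an ID stays valid when the vector reallocates, whereas a pointer or reference obtained from `Get` would not.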
See also [Idioms](idioms.md) for abbreviations and more implementation | ||
techniques. | ||
|
||
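The iterative processing pattern from the list above can be sketched with a toy grammar of nested parentheses. This is a minimal sketch, not the real parser: the `State` enum, its two values, and `ParseGroups` are invented for illustration, and only the idea of a heap-allocated state stack driven by a loop is taken from the description.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical parser states for the toy grammar:
//   Sequence := Group*
//   Group    := '(' Sequence ')'
enum class State : int8_t { Sequence, GroupFinish };

// Returns true if `text` consists only of balanced parentheses. Nesting depth
// is tracked on a heap-allocated state stack, not the call stack.
auto ParseGroups(const std::string& text) -> bool {
  std::vector<State> state_stack;
  std::size_t pos = 0;
  state_stack.push_back(State::Sequence);

  while (!state_stack.empty()) {
    State state = state_stack.back();
    state_stack.pop_back();
    switch (state) {
      case State::Sequence:
        if (pos < text.size() && text[pos] == '(') {
          ++pos;
          // Pushed in reverse order of handling: first the group's contents,
          // then its closing paren, then the rest of the enclosing sequence.
          state_stack.push_back(State::Sequence);
          state_stack.push_back(State::GroupFinish);
          state_stack.push_back(State::Sequence);
        }
        break;
      case State::GroupFinish:
        if (pos >= text.size() || text[pos] != ')') {
          return false;
        }
        ++pos;
        break;
    }
  }
  // Any leftover input (such as a stray ')') is an error.
  return pos == text.size();
}
```

Deeply nested input only grows the vector, so the nesting depth is bounded by available memory and not by the thread's stack size.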
## Adding features | ||
|
||
We have a [walkthrough for adding features](adding_features.md). |