Move toolchain architecture to markdown (carbon-language#4242)

Note I'm mostly trying to capture [the docs](https://docs.google.com/document/d/1RRYMm42osyqhI2LyjrjockYCutQ5dOf8Abu50kTrkX0/edit?resourcekey=0-kHyqOESbOHmzZphUbtLrTw&tab=t.0) as they exist today, not fixing issues with the docs. I think the doc itself hasn't changed much lately (i.e., for months). Trying to organize it a little better though, particularly so that it shows up reasonably when looking in github or the website. --------- Co-authored-by: Geoff Romer <[email protected]>
dwblaikie · Sep 9, 2024 · 91168b4 · 91168b4
1 parent ef1415a
commit 91168b4
Show file tree

Hide file tree

Showing 13 changed files with 2,696 additions and 4 deletions.
diff --git a/toolchain/README.md b/toolchain/README.md
@@ -6,6 +6,4 @@ Exceptions. See /LICENSE for license information.
 SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 -->
 
-A design is currently maintained in
-[Google Drive](https://docs.google.com/document/d/1RRYMm42osyqhI2LyjrjockYCutQ5dOf8Abu50kTrkX0/edit?resourcekey=0-kHyqOESbOHmzZphUbtLrTw).
-It'll be migrated to markdown once we are confident in its stability.
+See [docs](docs/).
diff --git a/toolchain/docs/README.md b/toolchain/docs/README.md
@@ -0,0 +1,94 @@
+# Toolchain architecture
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Goals](#goals)
+-   [High-level architecture](#high-level-architecture)
+    -   [Design patterns](#design-patterns)
+-   [Adding features](#adding-features)
+
+<!-- tocstop -->
+
+## Goals
+
+The toolchain represents the production portion of Carbon. At a high level, the
+toolchain's top priorities are:
+
+-   Correctness.
+-   Quality of generated code, including performance.
+-   Compilation performance.
+-   Quality of diagnostics for incorrect or questionable code.
+
+TODO: Add an expanded document that details the goals and priorities and link to
+it here.
+
+## High-level architecture
+
+The main components are:
+
+-   [Driver](driver.md): Provides commands and ties together compilation flow.
+-   [Diagnostics](diagnostics.md): Produces diagnostic output.
+-   Compilation flow:
+
+    1. Source: Load the file into a
+       [SourceBuffer](/toolchain/source/source_buffer.h).
+    2. [Lex](lex.md): Transform a SourceBuffer into a
+       [Lex::TokenizedBuffer](/toolchain/lex/tokenized_buffer.h).
+    3. [Parse](parse.md): Transform a TokenizedBuffer into a
+       [Parse::Tree](/toolchain/parse/tree.h).
+    4. [Check](check.md): Transform a Tree to produce
+       [SemIR::File](/toolchain/sem_ir/file.h).
+    5. [Lower](lower.md): Transform the SemIR to an
+       [LLVM Module](https://llvm.org/doxygen/classllvm_1_1Module.html).
+    6. CodeGen: Transform the LLVM Module into an Object File.
+
+### Design patterns
+
+A few common design patterns are:
+
+-   Distinct steps: Each step of processing produces an output structure,
+    avoiding callbacks passing data between structures.
+
+    -   For example, the parser takes a `Lex::TokenizedBuffer` as input and
+        produces a `Parse::Tree` as output.
+
+    -   Performance: It should yield better locality versus a callback approach.
+
+    -   Understandability: Each step has a clear input and output, versus
+        callbacks which obscure the flow of data.
+
+-   Vectorized storage: Data is stored in vectors and flyweights are passed
+    around, avoiding more typical heap allocation with pointers.
+
+    -   For example, the parse tree is stored as a
+        `llvm::SmallVector<Parse::Tree::NodeImpl>` indexed by `Parse::Node`
+        which wraps an `int32_t`.
+
+    -   Performance: Vectorization both minimizes memory allocation overhead and
+        enables better read caching because adjacent entries will be cached
+        together.
+
+-   Iterative processing: We rely on state stacks and iterative loops for
+    parsing, avoiding recursive function calls.
+
+    -   For example, the parser has a `Parse::State` enum tracked in
+        `state_stack_`, and loops in `Parse::Tree::Parse`.
+
+    -   Scalability: Complex code must not cause recursion issues. We have
+        experience in Clang seeing stack frame recursion limits being hit in
+        unexpected ways, and non-recursive approaches largely avoid that risk.
+
+See also [Idioms](idioms.md) for abbreviations and more implementation
+techniques.
+
+## Adding features
+
+We have a [walkthrough for adding features](adding_features.md).