From a7ce0fd9e3dc31425beedb10d10e8823193e6fe9 Mon Sep 17 00:00:00 2001 From: Wojciech Zmuda Date: Wed, 2 Oct 2024 16:05:54 +0200 Subject: [PATCH] docs: describe LLVM IR generation features Signed-off-by: Wojciech Zmuda --- docs/Getting LLVM IR Output.md | 1 + docs/LLVM IR Generation Features.md | 136 ++++++++++++++++++++++++++++ 2 files changed, 137 insertions(+) create mode 100644 docs/LLVM IR Generation Features.md diff --git a/docs/Getting LLVM IR Output.md b/docs/Getting LLVM IR Output.md index 793be15..6d587f9 100644 --- a/docs/Getting LLVM IR Output.md +++ b/docs/Getting LLVM IR Output.md @@ -16,6 +16,7 @@ the [`rustc` developer guide](https://rustc-dev-guide.rust-lang.org/backend/debu `cargo rustc -- --emit=llvm-ir`. - You can also set the `RUSTC_FLAGS` environment variable before invoking cargo as normal: `RUSTFLAGS='--emit=llvm-ir'`. +- You can see all possible LLVM flags with `cargo rustc -- -Cllvm-args="-help"`. There are some difficulties here with multiple-unit compilation that need to be figured out, but this is a reasonable starting point. diff --git a/docs/LLVM IR Generation Features.md b/docs/LLVM IR Generation Features.md new file mode 100644 index 0000000..074432c --- /dev/null +++ b/docs/LLVM IR Generation Features.md @@ -0,0 +1,136 @@ +# LLVM IR Generation Features + +## Introduction + +The process of Rust compilation involves several steps to transform Rust source code into an +executable binary. The `rustc` compiler first picks up the Rust code file and generates an +intermediate representation known as HIR (High-Level Intermediate Representation). It is then +converted to MIR (Mid-Level) and then, finally, to LLVM IR. HIR and MIR are out of scope of this +document. + +LLVM then performs multiple optimization and transformation passes over the IR. The final output +from LLVM's operations is a binary file containing machine code. This document focuses on generating +LLVM IR from Rust code and the various passes performed on the LLVM IR. + +## Rustc (LLVM IR generation) + +See [Getting LLVM IR Output](Getting LLVM IR Output.md). + +## LLVM IR Passes + +[LLVM passes](https://llvm.org/docs/Passes.html) come in three flavors: **analysis**, **transform**, +and **utils**. + +### Analysis Passes + +Analysis passes read the IR and output some data about the code. These passes can be helpful to +generate insights and optimize further transformations: + +- Control graph analysis +- Memory access dependencies +- Call graph printing +- Natural loops detection +- Instruction type counting + +These analytical insights can serve as valuable inputs for optimizing the compiler's performance. + +### Transform Passes + +Transform passes modify the IR in various ways, often utilizing data from the analysis passes: + +- Dead code, arguments, store, globals, loops, and tail call elimination +- Global variable optimization and value numbering +- Function inlining and loop unrolling +- Lower `invoke`s to `call`s and `SwitchInst`s to branches +- Combining redundant instructions (note: should e.g. merge two `add`s into one; can be problematic + if instruction swapping is undesired) +- Lower atomic intrinsics to non-atomic form +- Promoting memory to registers (possibly useful if this leverages Cairo's memory model with + read-only variables?) + +### Utility Passes + +Utility passes encapsulate operations that don't fit into the above categories. I didn't notice any +particularly interesting utility passes. + +### `opt` - the LLVM optimization tool + +The official LLVM documentation may not always be up-to-date. Refer to `opt -help` for the latest +source of information for LLVM passes and options. + +To get the list of all flags `opt` accepts, without arch-specific options, call this command: + +```sh +$ opt -help | grep -ivE 'aarch64|amdgpu|arm|avr|hexagon|mips|msp430|nvptx|ppc|r600|riscv|si|systemz|wasm|x86' +``` + +``` + +On my machine (LLVM version 18.1.8 aarch64-apple-darwin23.6.0) this command returns 463 different +flags. + +#### Optimization Levels + +LLVM offers several optimization levels: + +- `-O0`: No optimization +- `-O1`: Moderate optimization +- `-O2`: Default optimization +- `-O3`: High-level optimization +- `-Os`: Optimize for size +- `-Oz`: Optimize aggressively for size + +#### Architecture-Specific Passes + +There are numerous architecture-specific passes prefixed with the arch name: + +- `aarch64-`, +- `amdgpu-`, +- `arm-`, +- `avr-`, +- `hexagon-`, +- `mips-`, +- `msp430-`, +- `nvptx-`, +- `ppc-`, +- `r600-`, +- `riscv-`, +- `si-`, +- `systemz-`, +- `wasm-`, +- `x86-`. + +Since we're not generating any machine code for these architectures, these passes are irrelevant to +us. Some arch-specific flags do not have matching prefixes but flags descriptions mention these +names, so they can be filtered out from the help output. Arch, CPU and EABI selectors also seem +irrelevant. + +#### Code Organization + +LLVM provides options for code organization: + +- Emit basic blocks, functions data into separate sections, +- Configure data layout with a string value. + +#### Floating Point Optimization + +Several flags allow the selection of denormal number handling (IEEE 754, preserved sign, positive +zero, unknown) and further optimizations for floating-point calculations. They sound like they may +have impact on ALU and/or FPU. There are also multiple optimization flags for floating points, which +may be worth investigating again when we get to FP design and implementation. + +#### Memory Analysis + +`opt` offers a `--memoryssa` pass which provides +[updated memory dependency analysis](https://llvm.org/docs/Passes.html#memdep-memory-dependence-analysis), +replacing [the older pass](https://llvm.org/docs/MemorySSA.html). + +#### Polly Optimizer + +A set of `--polly-` flags is available to control the +[Polly loop optimizer](https://polly.llvm.org/docs/Architecture.html), which is extensive and may +require its own detailed research to understand fully. + +By understanding and utilizing these passes effectively, we can optimize the compilation process, +resulting in efficient and performant executable binaries. +```