- Overview
- C++ dialect
- Abbreviations used in the code (AKA Carbon abbreviation decoder ring)
- .def files
- Index types
- ValueStore
- Template metaprogramming
- Local lambdas to reduce duplicate code
- Immediately invoked function expressions (IIFE)
- Declarations in conditions
- CRTP or "Curiously recurring template pattern"
- Multiple inheritance
- Defining constants usable in constexpr contexts
The toolchain implementation uses some implementation techniques that may not be commonly found in typical C++ code.
The toolchain implementation does not use some C++ features, following Google's C++ style guide:
Note that abbreviations are typically only used in code, not comments (except when referring to an entity from the code).
- Addr: "address"
- Arg: "argument"
- Decl: "declaration"
- Expr: "expression"
- SubExpr: "subexpression"
- Float: "floating point"
- Init: "initialization"
- Inst: "instruction"
- Int: "integer"
- Loc: "location"
- Param: "parameter"
- Paren: "parenthesis"
- Ref: "reference"
- Deref: "dereference"
- Subst: "substitute"
Phrase abbreviations (abbreviations for a whole phrase, where we wouldn't abbreviate each of its words individually):
- InitRepr: "initializing representation"
- ObjectRepr: "object representation"
- SemIR: "semantics intermediate representation"
- ValueRepr: "value representation"
The Carbon toolchain uses a technique related to X-macros to generate code that operates over a collection of types, enumerators, or another similar list of names. This works as follows:
- A .def file is provided that is intended to be repeatedly included by way of #include.
- The user of the .def file defines a macro, with a name and a form specified by the .def file, for example #define CARBON_EACH_WIDGET(Name) Scope::Name,.
- An #include of the .def file expands to CARBON_EACH_WIDGET(Name1), CARBON_EACH_WIDGET(Name2), ... for each widget name, and then #undefs the CARBON_EACH_WIDGET macro.
For example:
enum Widgets {
#define CARBON_EACH_WIDGET(Name) Name,
#include "widgets.def"
};
... would expand to an enumeration definition with one enumerator per widget name.
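A single-file sketch of the same mechanism follows. A real .def file must live in its own file, so a list macro stands in for #include "widgets.def" here. HYPOTHETICAL_WIDGETS_DEF, the widget names, and WidgetName are all invented for illustration, and the #undef that a real .def file performs is written out by hand.

```cpp
#include <cassert>
#include <string_view>

// Stand-in for the contents of a hypothetical widgets.def. A real .def file
// would be pulled in with #include instead of being invoked as a macro.
#define HYPOTHETICAL_WIDGETS_DEF(X) X(Button) X(Slider) X(Label)

// First expansion: one enumerator per widget name.
enum class Widget {
#define CARBON_EACH_WIDGET(Name) Name,
  HYPOTHETICAL_WIDGETS_DEF(CARBON_EACH_WIDGET)
#undef CARBON_EACH_WIDGET
};

// Second expansion of the same list: one switch case per widget name.
constexpr auto WidgetName(Widget widget) -> std::string_view {
  switch (widget) {
#define CARBON_EACH_WIDGET(Name) \
  case Widget::Name:             \
    return #Name;
    HYPOTHETICAL_WIDGETS_DEF(CARBON_EACH_WIDGET)
#undef CARBON_EACH_WIDGET
  }
  return "<invalid>";
}
```

Adding a name to the list updates both the enumeration and the name lookup, which is the main reason to keep the list in one place.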
Most .def files will have a corresponding EnumBase child class (if widgets.def
has X-macros, widgets.h and widgets.cpp have the EnumBase child class). These
work similarly to an enum class, with the addition of a name() function and <<
stream operator support. Many also have further utility functions for
information related to the enum value.
In code, these types and values can be used directly in a switch. They will
convert to an internal actual enum class for the switch, and receive
corresponding compiler safety checks that all enum values are handled.
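The behavior described above can be approximated with a small wrapper class. This is only a sketch: WidgetKind, RawEnum, and HasRange are hypothetical names, and the real EnumBase interface is not reproduced here. The point is that an implicit conversion to the underlying enum class is enough to make the wrapper usable in a switch.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string_view>

// Hypothetical stand-in for an EnumBase child class; not the real API.
class WidgetKind {
 public:
  enum class RawEnum : std::uint8_t { Button, Slider };

  constexpr explicit WidgetKind(RawEnum value) : value_(value) {}

  // Implicit conversion lets a WidgetKind appear directly in a switch.
  constexpr operator RawEnum() const { return value_; }

  auto name() const -> std::string_view {
    switch (value_) {
      case RawEnum::Button:
        return "Button";
      case RawEnum::Slider:
        return "Slider";
    }
    return "<invalid>";
  }

  friend auto operator<<(std::ostream& out, WidgetKind kind) -> std::ostream& {
    return out << kind.name();
  }

 private:
  RawEnum value_;
};

// The switch operates on RawEnum, so the compiler can warn about missing cases.
auto HasRange(WidgetKind kind) -> bool {
  switch (kind) {
    case WidgetKind::RawEnum::Button:
      return false;
    case WidgetKind::RawEnum::Slider:
      return true;
  }
  return false;
}
```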
Carbon makes frequent use of IndexBase and IdBase. The IndexBase and IdBase
types are small wrappers around int32_t to provide a measure of type-checking
when passing around indices to vector-like storage types. The only difference
is that IndexBase supports all comparison operators, whereas IdBase only
supports equality comparison.
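As a rough illustration of the type-checking these wrappers provide, the sketch below defines two standalone wrapper types. SketchInstId and SketchTokenIndex are hypothetical names; the real types derive from IdBase and IndexBase, whose definitions are not shown here. Only some of the ordering operators are written out for the index-like type.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Id-like wrapper: supports only equality comparison.
struct SketchInstId {
  constexpr explicit SketchInstId(std::int32_t i) : index(i) {}
  friend constexpr auto operator==(SketchInstId a, SketchInstId b) -> bool {
    return a.index == b.index;
  }
  friend constexpr auto operator!=(SketchInstId a, SketchInstId b) -> bool {
    return a.index != b.index;
  }
  std::int32_t index;
};

// Index-like wrapper: also ordered (only == , < and > are shown).
struct SketchTokenIndex {
  constexpr explicit SketchTokenIndex(std::int32_t i) : index(i) {}
  friend constexpr auto operator==(SketchTokenIndex a, SketchTokenIndex b)
      -> bool {
    return a.index == b.index;
  }
  friend constexpr auto operator<(SketchTokenIndex a, SketchTokenIndex b)
      -> bool {
    return a.index < b.index;
  }
  friend constexpr auto operator>(SketchTokenIndex a, SketchTokenIndex b)
      -> bool {
    return a.index > b.index;
  }
  std::int32_t index;
};

// Raw integers and other wrapper types do not silently convert.
static_assert(!std::is_convertible_v<std::int32_t, SketchInstId>);
static_assert(!std::is_convertible_v<SketchTokenIndex, SketchInstId>);
```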
Variable naming will often have _id at the end to indicate that it corresponds
to an IdBase. This may include the full type, as in operand_inst_id being an
InstId for an operand.
A block is an array of ids. These will be indicated with either a _block suffix
or pluralization (for example, param_refs pluralizing refs).
The ref concept in a name means that there is an underlying instruction block,
but only a subset of instructions are present in the refs block. For example,
function parameters have a sequence, and also have a refs block with one entry
per parameter. The refs block allows parameters to be counted and accessed
directly, rather than through vector iteration.
Many of Carbon's data types are stored in a ValueStore or related type with
similar semantics (sem_ir has several such classes).
ValueStore links an indexing type to a value type with vector-like storage.
The indices typically use IdBase.
ValueStore APIs follow the shape of simple array access and mutation:
- Add, which takes a value and returns the index.
- Set, which takes a value and index to modify.
- Get, which takes an index and returns a reference to the value (possibly a constant reference).
- Other vector-like functionality, including size or Reserve.
ValueStores should be named after the type they contain. The index type used on
the value store should have a using ValueType... which indicates the stored
type. When taking a return of one of these functions, it's common to use auto
and rely on the name of the storage type to imply the returned type.
Some name mirroring examples are:
- ints is a ValueStore<IntId>, which has an index type of IntId and a value type of llvm::APInt.
- functions is a ValueStore<SemIR::FunctionId>, which has an index type of SemIR::FunctionId and a value type of SemIR::Function.
- strings is a ValueStore<StringId>, which has an index type of StringId, but for copy-related reasons, uses llvm::StringRef for values.
A fairly complete list of ValueStore uses should be available on checking's
Context class.
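The shape of this API can be sketched with a small vector-backed class. SketchValueStore and SketchNameId are hypothetical names, the argument order of Set is an assumption, and the real ValueStore has more functionality than is shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index type; ValueType names the type stored under this id.
struct SketchNameId {
  using ValueType = std::string;
  std::int32_t index;
};

// Minimal sketch of a store keyed by IdT, holding IdT::ValueType values.
template <typename IdT>
class SketchValueStore {
 public:
  using ValueType = typename IdT::ValueType;

  // Appends a value and returns the id that refers to it.
  auto Add(ValueType value) -> IdT {
    values_.push_back(std::move(value));
    return IdT{static_cast<std::int32_t>(values_.size() - 1)};
  }

  // Overwrites the value stored under an existing id.
  auto Set(IdT id, ValueType value) -> void {
    values_[id.index] = std::move(value);
  }

  auto Get(IdT id) -> ValueType& { return values_[id.index]; }
  auto Get(IdT id) const -> const ValueType& { return values_[id.index]; }

  auto size() const -> std::size_t { return values_.size(); }

 private:
  std::vector<ValueType> values_;
};
```

With a store named names, a call such as auto name_id = names.Add("x"); relies on the store's name to make the returned id type clear, as described above.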
FIXME: show example patterns
- TypedInstArgsInfo from toolchain/sem_ir/inst.h
- templated using
- std::declval
- decltype
- static_assert
- if constexpr
- template specialization, for example Inst::FromRaw<T>
(maybe also type traits?)
The toolchain uses a primitive form of struct reflection to operate generically
over the fields in a typed SemIR instruction. This is implemented in
common/struct_reflection.h, and the interface to the functionality is
StructReflection::AsTuple(your_struct), which converts the given struct into a
std::tuple containing the same fields in the same order.
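To give a feel for what such a conversion does, here is a sketch restricted to structs with exactly two fields, using structured bindings. This is not the implementation in common/struct_reflection.h, which is not shown in this document; AsTuple2 and SketchInst are hypothetical names, and a general version would also need to determine the number of fields.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical two-field struct standing in for a typed instruction.
struct SketchInst {
  int type_id;
  int operand_id;
};

// Copies the fields of a two-field aggregate into a tuple, in order.
// Structured bindings require the field count to be known in advance.
template <typename T>
auto AsTuple2(const T& value) {
  const auto& [first, second] = value;
  return std::make_tuple(first, second);
}
```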
The presence of specific fields in a struct with a specified type is detected using the following idiom:
template <typename T, typename = FieldType T::*>
constexpr bool HasField = false;
template <typename T>
constexpr bool HasField<T, decltype(&T::field)> = true;
This is intended to check the same property as the following concept, which we can't use because we currently need to compile in C++17 mode:
template <typename T> concept HasField = requires (T x) {
  { x.field } -> std::same_as<FieldType>;
};
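A concrete instance of the C++17 idiom, with FieldType fixed to int, might look like the following. The struct names and HasIntField are made up for the example.

```cpp
#include <cassert>

struct WithInt {
  int field;
};
struct WithDouble {
  double field;
};
struct Without {
  int other;
};

// Primary template: the defaulted second argument is `int T::*`.
template <typename T, typename = int T::*>
constexpr bool HasIntField = false;

// Chosen only when `&T::field` exists and has exactly the type `int T::*`,
// so that it matches the defaulted argument of the primary template.
template <typename T>
constexpr bool HasIntField<T, decltype(&T::field)> = true;

static_assert(HasIntField<WithInt>);
static_assert(!HasIntField<WithDouble>);  // field exists, but wrong type
static_assert(!HasIntField<Without>);     // no member named field
```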
To detect a field with a specific name with a type derived from a specified base type, use this idiom:
// HasField<T> is true if T has a `U field` field,
// where `U` extends `BaseClass`.
template <typename T, bool Enabled = true>
inline constexpr bool HasField = false;
template <typename T>
inline constexpr bool HasField<
T, bool(std::is_base_of_v<BaseClass, decltype(T::field)>)> = true;
The equivalent concept is:
template <typename T> concept HasField = requires (T x) {
  { x.field } -> std::derived_from<BaseClass>;
};
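The derived-type variant can be exercised the same way. The types below are hypothetical, and the sketch assumes a compiler that accepts a partial specialization whose non-type argument depends on T, as the idiom above requires.

```cpp
#include <cassert>
#include <type_traits>

struct BaseClass {};
struct Derived : BaseClass {};
struct Unrelated {};

struct HasDerived {
  Derived field;
};
struct HasUnrelated {
  Unrelated field;
};
struct HasNothing {};

template <typename T, bool Enabled = true>
inline constexpr bool HasField = false;

// Matches only when T::field exists and the trait evaluates to `true`,
// which is the value of the defaulted Enabled argument.
template <typename T>
inline constexpr bool HasField<
    T, bool(std::is_base_of_v<BaseClass, decltype(T::field)>)> = true;

static_assert(HasField<HasDerived>);
static_assert(!HasField<HasUnrelated>);  // field type is not derived
static_assert(!HasField<HasNothing>);    // no member named field
```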
Sometimes code that would be repeated in a function is factored into a local variable containing a lambda:
auto common_code = [&](AType param1, AnotherType param2) {
  // code that would otherwise be repeated
  ...
};
if (something) {
  common_code(...);
}
if (something_else) {
  common_code(...);
}
Compared to defining a new function, this has the advantage of being able to be declared in context and access the local variables of the enclosing function.
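A small runnable version of this pattern is sketched below; DescribeRange and its labels are invented for the example. The lambda captures the local lines vector by reference, which a separate function could only do through an extra parameter.

```cpp
#include <cassert>
#include <string>
#include <vector>

auto DescribeRange(int low, int high) -> std::vector<std::string> {
  std::vector<std::string> lines;

  // Local lambda: shares the formatting logic and appends to `lines`.
  auto add_line = [&](const std::string& label, int value) {
    lines.push_back(label + "=" + std::to_string(value));
  };

  if (low < 0) {
    add_line("negative_low", low);
  }
  if (high > 100) {
    add_line("large_high", high);
  }
  return lines;
}
```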
Instead of creating a separate function with its own name that will be called once to produce the initial value for a variable, the function can be declared inline and then immediately called.
This can be used for complex initialization, as in:
// variable declaration
static const llvm::ArrayRef<std::byte> entropy_bytes =
    // initializer starts with a lambda
    []() -> llvm::ArrayRef<std::byte> {
  static llvm::SmallVector<std::byte> bytes;
  // a bunch of code
  // return the value to initialize the variable with
  return bytes;
  // finish defining the lambda, and then immediately invoke it
}();
It can also be used inside a CARBON_DCHECK to avoid computation that is only
needed in debug builds:
CARBON_DCHECK([&] {
  // a bunch of code
  // condition that will be tested by CARBON_DCHECK
  return complicated && multiple_parts;
  // finish defining the lambda, and then immediately invoke it
}(), "Complicated things went wrong");
See a description of this technique on Wikipedia.
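A self-contained sketch of the initialization use case, using only the standard library, is shown here. FirstPrimes is a hypothetical function; the lambda runs once, the first time the static local is initialized.

```cpp
#include <cassert>
#include <vector>

auto FirstPrimes() -> const std::vector<int>& {
  // The lambda is defined and invoked in one expression; its result
  // initializes the static local exactly once.
  static const std::vector<int> primes = [] {
    std::vector<int> result;
    for (int n = 2; result.size() < 5; ++n) {
      bool is_prime = true;
      for (int p : result) {
        if (n % p == 0) {
          is_prime = false;
          break;
        }
      }
      if (is_prime) {
        result.push_back(n);
      }
    }
    return result;
  }();
  return primes;
}
```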
The condition part of an if statement may contain a declaration with an
initializer followed by a semicolon (;) and then the proper boolean condition
expression, as in:
if (auto verify = tree.Verify(); !verify.ok()) {
The condition can be replaced by a declaration entirely, as in:
if (auto equals = context.ConsumeIf(Lex::TokenKind::Equal)) {
// Equivalent to:
if (auto equals = context.ConsumeIf(Lex::TokenKind::Equal); equals) {
or
if (auto literal = bound_inst.TryAs<SemIR::IntegerLiteral>()) {
// Equivalent to:
if (auto literal = bound_inst.TryAs<SemIR::IntegerLiteral>(); literal) {
This is a common way of handling a function that returns an optional value.
See https://en.cppreference.com/w/cpp/language/if
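Both forms are sketched below with standard library types. Lookup and LookupOr are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

auto Lookup(const std::map<std::string, int>& table, const std::string& key)
    -> std::optional<int> {
  // Declaration with initializer, then a separate boolean condition.
  if (auto it = table.find(key); it != table.end()) {
    return it->second;
  }
  return std::nullopt;
}

auto LookupOr(const std::map<std::string, int>& table, const std::string& key,
              int fallback) -> int {
  // Declaration only: the optional itself is tested for having a value.
  if (auto value = Lookup(table, key)) {
    return *value;
  }
  return fallback;
}
```

In both cases the declared variable is scoped to the if statement, so it cannot be used by accident after the check.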
Curiously Recurring Template Pattern - cppreference.com
Curiously recurring template pattern - Wikipedia
Examples:
- template <typename DerivedT, ...> in enum_base.h
- template <typename DerivedT> in ostream.h
We use multiple inheritance to support uses of CRTP.
Example:
struct NameScopeId : public IndexBase, public Printable<NameScopeId> {
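The combination of CRTP and multiple inheritance can be sketched as follows. SketchPrintable, SketchIndexBase, and SketchScopeId are hypothetical simplifications and do not reproduce the real Printable or IndexBase.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>

// CRTP base: supplies operator<< in terms of DerivedT::Print.
template <typename DerivedT>
struct SketchPrintable {
  friend auto operator<<(std::ostream& out, const DerivedT& value)
      -> std::ostream& {
    value.Print(out);
    return out;
  }
};

// Plain base holding the wrapped index.
struct SketchIndexBase {
  std::int32_t index;
};

// One base provides the data; the CRTP base provides the printing support.
struct SketchScopeId : public SketchIndexBase,
                       public SketchPrintable<SketchScopeId> {
  explicit SketchScopeId(std::int32_t i) : SketchIndexBase{i} {}
  auto Print(std::ostream& out) const -> void { out << "scope" << index; }
};
```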
To declare a constant usable at compile time in constexpr contexts as a static
class member, we use this pattern:
Declaration:
class Foo {
  // ...
  static const std::array<ElementType, ElementCount> MyTable;
  static constexpr auto ComputeMyTable()
      -> std::array<ElementType, ElementCount> { ... }
};
Definition:
constexpr std::array<ElementType, ElementCount>
Foo::MyTable = Foo::ComputeMyTable();
Note that the const on the declaration does not match the constexpr on the
definition, and that the definition is outside of the class body. This allows
the initializer to depend on the definition of the class.
Further note that this only works with static members of classes, not static variables in functions.
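A complete instance of the declaration and definition pattern, assuming C++17, is sketched below. Squares and its table are made up for the example.

```cpp
#include <array>
#include <cassert>

class Squares {
 public:
  // Declared const here; the constexpr definition follows the class.
  static const std::array<int, 4> Table;

 private:
  static constexpr auto ComputeTable() -> std::array<int, 4> {
    std::array<int, 4> table = {};
    for (int i = 0; i < 4; ++i) {
      table[i] = i * i;
    }
    return table;
  }
};

// Defined outside the class body, where Squares is a complete type.
constexpr std::array<int, 4> Squares::Table = Squares::ComputeTable();

// After the definition, the member is usable in constant expressions.
static_assert(Squares::Table[3] == 9);
```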
Due to a Clang bug, this technique does not work in a class template. The following pattern can be used instead:
template <typename T>
class Foo {
  // ...
  template <typename Self = Foo>
  static constexpr auto MyValueImpl = Self();
  static constexpr const Foo& MyValue = MyValueImpl<>;
  // ...
};
The parameters of the variable template can be chosen to allow reuse of the same variable template for multiple static data members.
Examples:
- NodeStack::IdKindTable in check/node_stack.h
- BuiltinKind::ValidCount in sem_ir/builtin_inst_kind.h
A global constant may use a single definition without a separate declaration:
static constexpr std::array<bool, 256> IsIdStartByteTable = [] {
  std::array<bool, 256> table = {};
  // ...
  return table;
}();
Note this example is using an immediately invoked function expression to compute the initial value, which is common.
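A filled-in variant of this pattern, assuming C++17 so that the lambda can be evaluated at compile time, could look like this. IsAsciiDigitTable is a hypothetical table, not one from the toolchain.

```cpp
#include <array>
#include <cassert>

// Lookup table computed at compile time by an immediately invoked lambda.
static constexpr std::array<bool, 256> IsAsciiDigitTable = [] {
  std::array<bool, 256> table = {};
  for (int c = '0'; c <= '9'; ++c) {
    table[c] = true;
  }
  return table;
}();

static_assert(IsAsciiDigitTable['7']);
static_assert(!IsAsciiDigitTable['x']);
```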
Examples: