From 7b11f92cb0c46f6dca0c51ec370fbcc610ed2999 Mon Sep 17 00:00:00 2001
From: "g. nicholas d'andrea"
Date: Thu, 31 Oct 2024 00:35:16 -0400
Subject: [PATCH] [wip]

---
 packages/web/spec/program/concepts.mdx    | 84 +++++++++++++++++++++++
 packages/web/spec/program/instruction.mdx |  2 +-
 packages/web/spec/program/overview.mdx    | 73 ++++++++++++++++++++
 packages/web/spec/program/program.mdx     |  2 +-
 4 files changed, 159 insertions(+), 2 deletions(-)
 create mode 100644 packages/web/spec/program/concepts.mdx
 create mode 100644 packages/web/spec/program/overview.mdx

diff --git a/packages/web/spec/program/concepts.mdx b/packages/web/spec/program/concepts.mdx
new file mode 100644
index 00000000..928f02f5
--- /dev/null
+++ b/packages/web/spec/program/concepts.mdx
@@ -0,0 +1,84 @@
+---
+sidebar_position: 2
+---
+
+# Key concepts
+
+## Programs are associated with a contract's compiled bytecode
+
+This bytecode might either be the call bytecode, executed when a contract
+account with this bytecode receives a message on-chain, or the create bytecode,
+executed as part of deploying the contract associated with the bytecode.
+
+Reflecting this relationship, **ethdebug/format/program** records contain
+a reference to the concrete contract (i.e., not an `abstract contract` or
+`interface`), the environment in which the bytecode will be executed (call or
+create), and the compilation that yielded the contract and bytecode.
+
+## Programs contain instruction listings for debuggers to reference
+
+Programs contain a list of **ethdebug/format/program/instruction** objects,
+where each instruction corresponds to one machine instruction in the
+associated bytecode.
+
+These instructions are ordered sequentially, matching the order and
+corresponding one-to-one with the encoded binary machine instructions in
+the bytecode. Instructions specify the byte offset at which they appear in the
+bytecode; this offset is equivalent to the program counter on non-EOF EVMs.
+
+By indexing these instructions by their offset, **ethdebug/format**
+programs allow debuggers to look up high-level information at any point
+during machine execution.
+
+## Instructions describe high-level context details
+
+Each instruction object in a program contains crucial information about the
+high-level language state at that point in the bytecode execution.
+Instructions represent these details using the
+**ethdebug/format/program/context** schema, and these details may include:
+
+- Source code ranges associated with the instruction (i.e., "source mappings")
+- Variables known to be in scope following the instruction and where to
+  find those variables' values in the machine state
+- Control flow information such as an instruction being associated with the
+  process of calling from one function to another
+
+This information serves as a compile-time guarantee about the high-level
+state of the world that exists following each instruction.
+
+## Contexts inform high-level language semantics during machine tracing
+
+The context information provided for each instruction serves as a bridge
+between low-level EVM execution and high-level language constructs. Debuggers
+can use these strong compile-time guarantees to piece together a useful and
+consistent model of the high-level language code behind the running machine
+binary.
+
+By following the state of machine execution, a debugger can use context
+information to stay apprised of the changing compile-time facts over the
+course of the trace. Each successively encountered context serves as the
+source of an observed state transition in the debugger's high-level state
+model. This allows the debugger to maintain an ever-changing and coherent
+view of the high-level language runtime.
+
+In essence, the information provided by objects in this schema serves as a
+means of reducing over state transitions, yielding a dynamic and accurate
+representation of the program's high-level state. This enables debugging
+tools to:
+
+1. Map the current execution point back to the original source code
+2. Reconstruct the state of variables at any given point
+3. Provide meaningful stack traces that reference function names and source
+   locations
+4. Offer insights into control flow, such as entering or exiting functions,
+   or iterating through loops
+5. Present data structures (like arrays or mappings) in a way that reflects
+   their high-level representation, rather than their low-level storage
+
+By leveraging these contexts, debugging tools can offer a more intuitive and
+developer-friendly experience when working with EVM bytecode, effectively
+translating between the machine-level execution and the high-level code that
+developers write and understand. This continuous mapping between low-level
+execution and high-level semantics allows developers to debug their smart
+contracts more effectively, working with familiar concepts and structures
+even as they delve into the intricacies of EVM operation.
diff --git a/packages/web/spec/program/instruction.mdx b/packages/web/spec/program/instruction.mdx
index 375e11e7..ac9a6b9d 100644
--- a/packages/web/spec/program/instruction.mdx
+++ b/packages/web/spec/program/instruction.mdx
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 4
+sidebar_position: 5
 ---
 
 import SchemaViewer from "@site/src/components/SchemaViewer";
diff --git a/packages/web/spec/program/overview.mdx b/packages/web/spec/program/overview.mdx
new file mode 100644
index 00000000..8aee4cef
--- /dev/null
+++ b/packages/web/spec/program/overview.mdx
@@ -0,0 +1,73 @@
+---
+sidebar_position: 1
+---
+
+# Overview
+
+:::tip[Summary]
+
+**ethdebug/format/program** is a JSON schema for describing compile-time
+information about EVM bytecode, organized from the perspective of individual
+machine instructions.
+
+In **ethdebug/format**, a program record (or "program") represents one block of
+executable EVM machine code that a compiler generated for a specific contract.
+This could be either the contract's runtime call bytecode or the bytecode
+to create the contract.
+
+A program is structured as a sequence of instruction records ("instructions"),
+where each corresponds to a single EVM instruction in the machine code. Each
+instruction contains information about the high-level language context at that
+point in the bytecode. This allows debuggers to map low-level machine state
+back to high-level language concepts at any point during execution.
+
+Key information that programs contain for a particular instruction might
+include:
+- the source range or source ranges that are "associated" with the
+  instruction
+- the collection of known high-level variables at that point in time,
+  including their types and where to find the bytes with those variables'
+  values
+- signals to indicate that the instruction is part of some control flow
+  operation, such as calling one function from another.
+
+These program records provide debuggers with a powerful reference resource
+to be consulted while observing a running EVM. At each step of EVM machine
+execution, debuggers can find the matching **ethdebug/format** program
+instruction and use its information to maintain a coherent model of the
+high-level world, step-by-step.
+
+:::
+
+This format defines the primary **ethdebug/format/program** schema as well as
+various sub-schemas in the ethdebug/format/program/* namespace.
+
+JSON values adhering to this schema contain comprehensive information about a
+particular EVM bytecode object. This includes contract metadata (e.g., a reference to the source range where the contract is defined) and, importantly, an
+ordered list of **ethdebug/format/program/instruction** objects.
+
+Each instruction object contains essential details for translating low-level
+machine state at the time of the instruction back into high-level language
+concepts. This allows debuggers to provide a meaningful representation of
+program state at any point during execution.
+
+## Reading this schema
+
+The **ethdebug/format/program** schema is a root schema that composes other
+related schemas in the ethdebug/format/program/* namespace.
+
+These schemas (like all schemas in this format) are specified as
+[JSON Schema](https://json-schema.org), draft 2020-12.
+
+Please refer to one or more of the following resources in this section, or
+see the navigation bar for complete contents:
+
+- [Key concepts](/spec/program/concepts)
+
+- [Schema](/spec/program) (**ethdebug/format/program** schema listing)
+
+- [Instruction schema](/spec/program/instruction)
+  (**ethdebug/format/program/instruction** schema listing)
+
+- [Program contexts](/spec/program/context/overview)
+  (**ethdebug/format/program/context** schema listing)
diff --git a/packages/web/spec/program/program.mdx b/packages/web/spec/program/program.mdx
index c6d59716..9be2c82d 100644
--- a/packages/web/spec/program/program.mdx
+++ b/packages/web/spec/program/program.mdx
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 3
+sidebar_position: 4
 ---
 
 import SchemaViewer from "@site/src/components/SchemaViewer";
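
Note on the concepts page added by this patch: it describes debuggers indexing instructions by byte offset and then following a trace to keep a high-level model up to date. A minimal sketch of that idea follows, in Python for brevity. The `Instruction` class, the `variables` key inside the context, and both helper functions are hypothetical names chosen for illustration; the actual record shapes are defined by the **ethdebug/format/program** schemas and are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Iterator


# Hypothetical, simplified stand-in for an instruction record. Only the
# byte offset and an opaque context dictionary are modeled.
@dataclass
class Instruction:
    offset: int  # byte offset of the instruction in the bytecode
    context: dict[str, Any] = field(default_factory=dict)


def index_by_offset(instructions: Iterable[Instruction]) -> dict[int, Instruction]:
    """Build an offset -> instruction lookup table."""
    return {instruction.offset: instruction for instruction in instructions}


def trace_contexts(
    index: dict[int, Instruction], program_counters: Iterable[int]
) -> Iterator[dict[str, Any]]:
    """Yield the context for each program counter seen in a trace.

    Program counters with no matching instruction are skipped; a real
    debugger might instead report them as unmapped.
    """
    for pc in program_counters:
        instruction = index.get(pc)
        if instruction is not None:
            yield instruction.context


# Illustrative data only; the "variables" key is an assumption.
program = [
    Instruction(offset=0, context={"variables": []}),
    Instruction(offset=2, context={"variables": ["x"]}),
    Instruction(offset=3, context={"variables": ["x", "y"]}),
]
index = index_by_offset(program)
observed = list(trace_contexts(index, [0, 2, 99, 3]))
```

Here `observed` holds one context per matched step of the trace, in execution order, which is the sequence a debugger would reduce over to maintain its high-level state model. Treating the offset as the program counter follows the patch's statement about non-EOF EVMs.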