First draft of DebugInfomationFormat.md. #1
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,220 @@ | ||
# Debug Symbol Integration | ||
|
||
This is a design proposal for providing debug information within WebAssembly | ||
that would allow source-level debugging of code compiled into WebAssembly. | ||
|
||
## Goals and Principles | ||
|
||
The immediate goal is to implement enough debugging functionality in WebAssembly | ||
MVP to let users perform at least rudimentary debugging on their source. | ||
Another goal is not to bake in any limitations that would prevent reasonable | ||
future evolution of the standard. | ||
|
||
Specifically, we are currently not focused on identifying a language-independent | ||
subset of the debug format -- this proposal bundles language-dependent and | ||
language-independent parts together. | ||
|
||
META: We need an evolution strategy that allows new front-end/debugger pairs to | ||
use the format in the future to transfer information currently unanticipated. | ||
Examples: a) dynamic scoping in Lisp; b) full DWARF 4 equivalence. | ||
|
||
## Debug Info Goes into Extra Sections | ||
|
||
To allow easy stripping of debug info, all of it will go into a separate | ||
section. In the binary format, this will be an unknown section inserted into | ||
the wasm file. (Note that this new info doesn't make the current | ||
[name section](BinaryEncoding.md#name-section) redundant, because that section | ||
contains WebAssembly names, not source-code names.) | ||
|
||
META: How do we distinguish the debug-containing unknown section from any other | ||
unknown section? | ||
|
||
In the text format, this will be an extra child of the **module** element named | ||
**debug**. | ||
|
||
META: If there's a better name than **debug**, we can certainly use it. Also, | ||
currently undecided about allowing multiple **debug** children, like **func**. | ||
Are you proposing that there be a single section, and a single format, that will suffice for all HLL debugging use cases? And perhaps further, that we attempt to build in the flexibility to grow to encompass everything DWARF can do, and everything needed to describe LISP, and anything else that comes up, all in this one format?

This comment just says that I'm undecided whether we should allow multiple debug children under module. I'm leaning towards allowing multiple debugs as a syntactic convenience; their union will constitute the module's debug info. As for the larger question of flexibility, I'd like this design to cover all debugging use cases supported in the MVP. Beyond MVP, I envision extending the format in backward-compatible ways for each new use case. But I don't pretend to know which use cases (and in which order) that will be -- LISP and DWARF are just some examples that I threw in to illustrate the possible variety of what could be coming one day.

At an initial read, this sounds like a plan to make a format that has the eventual goal of reimplementing the functionality of DWARF. It's not explicitly spelled out here, but I assume the purpose of standardizing such a format is to allow browsers to consume it and provide their own HLL debugging experiences. Is this the vision here? If so, I'd like to have a discussion about the merits of that vision compared with the merits of another one: Another vision is that WebAssembly just provides '@' notation and byte offsets for identifying locations in WebAssembly code (as you describe), browser APIs for debugging primitives like reading out memory, setting breakpoints, and so on (in a security-appropriate manner), and browser APIs for reading the contents of unknown sections from wasm files. These make it possible for content to implement debugging functionality itself, such as in the form of a library that gets linked in, or possibly in the form of code that the user might request be loaded alongside the main content of a site.

Sorry if that's the impression, I didn't mean to take other options off the table. I'm happy to have the discussion about different visions. I'll reach out on IRC.
||
|
||
## AST Gets Unique Identifiers | ||
|
||
Since all the debug info is in a separate section, there must be a way for it to | ||
reference AST nodes. For that, we introduce a mechanism to tag any AST node | ||
with a unique identifier: just add an atom to the node's s-expression. That | ||
atom must begin with the character '@' and be unique across the entire module | ||
(for instance: `(get_local $x @abc123)` or `(param $y f64 @def456)`). | ||
|
||
This extension is not visible in the binary format. In binary, the debug | ||
section can refer to the wasm code by its byte address. | ||
|
||
## Debug Functions | ||
|
||
The **debug** child of the **module** will contain invocations of special | ||
functions defined in this section. These functions are intended to convey the | ||
debug info about the AST. | ||
|
||
The functions' syntax is given using conventions borrowed from | ||
[here](https://www.gnu.org/software/emacs/manual/html_node/elisp/A-Sample-Function-Description.html#A-Sample-Function-Description). | ||
Any _symbolID_ argument must be unique across the module. | ||
|
||
### File | ||
|
||
Function: **file** _symbolID_ _stringFilepath_ | ||
|
||
Describes a file that can be referenced by _symbolID_. The file contents can be | ||
read by accessing _stringFilepath_ in a manner described elsewhere. No two | ||
**file** invocations may have the same _stringFilepath_. | ||
|
||
META: Specify how exactly the browser may obtain the file content. | ||
|
||
### Source Location | ||
|
||
Function: **source\_location** _symbolID_ _symbolFile_ _integerLine_ _integerColumn_ | ||
|
||
Describes a source location that can be referenced by _symbolID_. _symbolFile_ | ||
refers to a file described by the **file** function. The last two arguments are | ||
line and column in that file. | ||
|
||
Function: **has\_source\_location** _symbolLocation_ &rest _@nodes_ | ||
|
||
Declares that _@nodes_ (an arbitrary number of @ references to the AST) | ||
correspond to the source location _symbolLocation_. | ||
|
||
### Lexical Context | ||
|
||
Function: **context** _symbolID_ _symbolStart_ _symbolEnd_ | ||
|
||
Describes a set of consecutive lines in a file, starting at location | ||
_symbolStart_ and ending at location _symbolEnd_. Both locations must be in the | ||
same file, and the end location must come lexically after the start. | ||
|
||
This can be used to describe lexical scopes in C-like programs, with the start | ||
location pointing to the opening brace and the end location pointing to the | ||
closing brace. | ||
|
||
### Type | ||
|
||
Function: **type** _symbolID_ _symbolLocation_ | ||
|
||
TODO: Devise a convention to represent types. OK if it's specific to C++. | ||
|
||
### Variable | ||
|
||
Function: **var** _symbolID_ _stringName_ _symbolContext_ _symbolType_ _integerMemoryAddress_ | ||
|
||
Describes a source-code variable that can be referenced by _symbolID_. The | ||
variable name (as used in the source) is captured in _stringName_, its lexical | ||
scope in _symbolContext_ (refers to a _symbolID_ of a **context** invocation), | ||
and its type in _symbolType_ (refers to a _symbolID_ of a **type** invocation). | ||
_integerMemoryAddress_ is the address in the module's memory that holds the value of | ||
this source variable. | ||
|
||
TODO: Capture the source location where the variable is declared? Isn't the | ||
context redundant then? | ||
|
||
### Function | ||
|
||
Function: **func** _symbolID_ _stringName_ _symbolType_ _symbolStart_ _symbolEnd_ | ||
|
||
Describes a source-code function or method whose mangled name is _stringName_, | ||
whose type is the **type** referenced by _symbolType_, and whose start/end | ||
locations are referenced by **source\_location** references _symbolStart_ and | ||
_symbolEnd_. | ||
|
||
## How to Perform Debugging Actions | ||
|
||
Here's how to perform some basic debugging actions based on this debug-info | ||
format: | ||
|
||
### Set a Breakpoint on a Specific Line of a Specific File | ||
|
||
1. Find the **file** invocation whose _stringFilepath_ argument matches the | ||
specified file. If none exist, abort with an error "no such file". | ||
2. Find all **source\_location** invocations whose _symbolFile_ equals the | ||
above's _symbolID_. Among them, find invocations with the minimal | ||
_integerLine_ equal to or larger than the specified line. If none exist, abort | ||
with an error "no debuggable code on that line". | ||
3. Find all the **has\_source\_location** invocations whose _symbolLocation_ is | ||
among the _symbolID_ arguments to the above invocations. If none exist, | ||
abort with an error "no debuggable code on that line". | ||
4. Find the earliest (in evaluation order) AST node in the union of the above | ||
invocations' _@nodes_. This is where the breakpoint goes. | ||
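
The four steps above can be sketched in Python. This is only an illustration under stated assumptions: the record classes, the `breakpoint_node` function, and the `eval_order` list (the module's '@' identifiers in evaluation order) are hypothetical names that do not appear in this proposal.

```python
from typing import NamedTuple, Sequence, Tuple


class File(NamedTuple):               # hypothetical record for a file invocation
    symbol_id: str
    filepath: str


class SourceLocation(NamedTuple):     # hypothetical record for source_location
    symbol_id: str
    file: str                         # symbolID of a File
    line: int
    column: int


class HasSourceLocation(NamedTuple):  # hypothetical record for has_source_location
    location: str                     # symbolID of a SourceLocation
    nodes: Tuple[str, ...]            # '@' identifiers of AST nodes


def breakpoint_node(files: Sequence[File],
                    locations: Sequence[SourceLocation],
                    has_locations: Sequence[HasSourceLocation],
                    eval_order: Sequence[str],
                    filepath: str,
                    line: int) -> str:
    """Return the '@' identifier of the node where the breakpoint goes."""
    # Step 1: find the file invocation for the requested path.
    file = next((f for f in files if f.filepath == filepath), None)
    if file is None:
        raise LookupError("no such file")

    # Step 2: locations in that file on the smallest line >= the requested one.
    candidates = [l for l in locations
                  if l.file == file.symbol_id and l.line >= line]
    if not candidates:
        raise LookupError("no debuggable code on that line")
    best_line = min(l.line for l in candidates)
    location_ids = {l.symbol_id for l in candidates if l.line == best_line}

    # Step 3: has_source_location invocations that refer to those locations.
    nodes = {n for h in has_locations if h.location in location_ids
             for n in h.nodes}
    if not nodes:
        raise LookupError("no debuggable code on that line")

    # Step 4: pick the node that comes first in evaluation order.
    position = {node: i for i, node in enumerate(eval_order)}
    return min(nodes, key=position.__getitem__)
```

The sketch assumes every node named in a **has\_source\_location** invocation also appears in `eval_order`; how a real implementation would obtain that ordering is outside this proposal.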
|
||
### Set a Breakpoint on a Specific Function | ||
|
||
1. Find the **func** invocation whose _stringName_ matches (META: describe the | ||
matching process, including mangling) the specified function name. If none | ||
exist, abort with an error "unknown function". If multiple invocations | ||
match, ask the user to choose one among them. | ||
2. Find the **source\_location** invocation whose _symbolID_ equals the | ||
_symbolStart_ of the above **func**. If none exists, abort with an error | ||
"incomplete debug info". | ||
3. Find the **file** invocation whose _symbolID_ equals the _symbolFile_ of the | ||
above. | ||
4. Proceed with setting a breakpoint on the file and line derived from the | ||
above's _stringFilepath_ and the source location's _integerLine_. The | ||
procedure is described in the previous section. | ||
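
As a rough Python sketch of steps 1 to 3, the function below resolves a function name to the file path and line on which the breakpoint would then be set. The record classes, `function_breakpoint_target`, and the `choose` callback are hypothetical. Two further assumptions: exact string comparison stands in for the name matching that is still marked META above, and a missing **file** invocation is treated as "incomplete debug info", a case the steps leave unspecified.

```python
from typing import Callable, NamedTuple, Sequence, Tuple


class File(NamedTuple):            # hypothetical record for a file invocation
    symbol_id: str
    filepath: str


class SourceLocation(NamedTuple):  # hypothetical record for source_location
    symbol_id: str
    file: str                      # symbolID of a File
    line: int
    column: int


class Func(NamedTuple):            # hypothetical record for a func invocation
    symbol_id: str
    name: str                      # mangled name
    type_id: str
    start: str                     # symbolID of a SourceLocation
    end: str                       # symbolID of a SourceLocation


def function_breakpoint_target(
        files: Sequence[File],
        locations: Sequence[SourceLocation],
        funcs: Sequence[Func],
        name: str,
        choose: Callable[[Sequence[Func]], Func] = lambda matches: matches[0],
) -> Tuple[str, int]:
    """Return (filepath, line) at which to set the breakpoint."""
    # Step 1: match by name (assumption: exact match on the mangled name).
    matches = [f for f in funcs if f.name == name]
    if not matches:
        raise LookupError("unknown function")
    # `choose` stands in for asking the user when several functions match.
    func = matches[0] if len(matches) == 1 else choose(matches)

    # Step 2: resolve the function's start location.
    loc = next((l for l in locations if l.symbol_id == func.start), None)
    if loc is None:
        raise LookupError("incomplete debug info")

    # Step 3: resolve the file that contains that location.
    file = next((f for f in files if f.symbol_id == loc.file), None)
    if file is None:
        raise LookupError("incomplete debug info")  # assumption, see lead-in

    # Step 4 would hand this pair to the file-and-line procedure.
    return file.filepath, loc.line
```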
|
||
### Set a Breakpoint Inside a C++ Template Definition | ||
|
||
1. Find all instantiations of the template in question. | ||
2. For each instantiation, repeat the above procedure for setting a breakpoint. | ||
|
||
### Show which Source Line is Being Executed | ||
|
||
1. Find the first **has\_source\_location** invocation whose _@nodes_ contains | ||
the current AST node's @ identifier. (META: should we disallow multiple | ||
locations for the same AST node?) If none exists, the result is nil. | ||
2. Find the **source\_location** invocation whose _symbolID_ equals the | ||
_symbolLocation_ of the above. If none exists, the result is nil. | ||
3. Find the **file** invocation whose _symbolID_ equals the _symbolFile_ of the | ||
above. If none exists, abort with an error "incomplete debug info". | ||
4. The result is the _integerLine_ of the **source\_location** from step 2, plus | ||
the _stringFilepath_ of the **file** from step 3. | ||
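
The lookup chain above (node, then source location, then file) might look as follows in Python. The record classes and `current_source_line` are hypothetical names used only for illustration, and Python's `None` plays the role of the nil result.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class File(NamedTuple):               # hypothetical record for a file invocation
    symbol_id: str
    filepath: str


class SourceLocation(NamedTuple):     # hypothetical record for source_location
    symbol_id: str
    file: str                         # symbolID of a File
    line: int
    column: int


class HasSourceLocation(NamedTuple):  # hypothetical record for has_source_location
    location: str                     # symbolID of a SourceLocation
    nodes: Tuple[str, ...]            # '@' identifiers of AST nodes


def current_source_line(files: Sequence[File],
                        locations: Sequence[SourceLocation],
                        has_locations: Sequence[HasSourceLocation],
                        node_id: str) -> Optional[Tuple[str, int]]:
    """Return (filepath, line) for the AST node, or None if it is unmapped."""
    # Step 1: first has_source_location invocation that mentions the node.
    has = next((h for h in has_locations if node_id in h.nodes), None)
    if has is None:
        return None

    # Step 2: the source_location that invocation refers to.
    loc = next((l for l in locations if l.symbol_id == has.location), None)
    if loc is None:
        return None

    # Step 3: the file holding that location.
    file = next((f for f in files if f.symbol_id == loc.file), None)
    if file is None:
        raise LookupError("incomplete debug info")

    # Step 4: combine the file path with the line number.
    return file.filepath, loc.line
```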
|
||
### Step Into the Next Line (Following Function Calls) | ||
|
||
1. Record the current source line as described in the previous section. | ||
2. Execute the current AST node, then consider the next in evaluation order. | ||
Repeat this step until the calculated line/file differs from the one | ||
recorded in step 1. | ||
|
||
### Step Over the Next Line (Skipping Function Calls) | ||
|
||
Same as the above section, except finish evaluating entire **invoke** nodes | ||
before recalculating the current line and file. | ||
|
||
### Show a Specified Variable's Value | ||
|
||
1. Record the currently executed source line as described in a prior section. | ||
2. Find all **var** invocations whose _stringName_ matches the (mangled) | ||
specified variable name. If none exist, abort with an error "unknown | ||
variable". | ||
3. For each above invocation, find the **context** whose _symbolID_ equals the | ||
_symbolContext_ of the **var**. Keep only those **var** invocations whose | ||
**context** envelops the current line (i.e., same file and current line is | ||
between the context's start and end). | ||
4. If multiple **var** invocations remain after the above step, ask the user to | ||
choose one among them. | ||
5. Find the **type** whose _symbolID_ equals _symbolType_ from the above. | ||
6. Inspect the memory contents at the _integerMemoryAddress_, in accordance with | ||
the variable's type. | ||
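
A minimal Python sketch of this lookup is given below. The type representation is still a TODO in this proposal, so the sketch assumes a hypothetical `type_formats` table that maps a type's _symbolID_ to a format string for Python's `struct` module. The record classes and both functions are likewise hypothetical, and the module's memory is modeled as a plain `bytearray`.

```python
import struct
from typing import Dict, List, NamedTuple, Sequence


class SourceLocation(NamedTuple):  # hypothetical record for source_location
    symbol_id: str
    file: str                      # symbolID of a file invocation
    line: int
    column: int


class Context(NamedTuple):         # hypothetical record for a context invocation
    symbol_id: str
    start: str                     # symbolID of a SourceLocation
    end: str                       # symbolID of a SourceLocation


class Var(NamedTuple):             # hypothetical record for a var invocation
    symbol_id: str
    name: str
    context: str                   # symbolID of a Context
    type_id: str                   # symbolID of a type invocation
    address: int                   # address in the module's memory


def visible_vars(variables: Sequence[Var], contexts: Sequence[Context],
                 locations: Sequence[SourceLocation],
                 name: str, cur_file: str, cur_line: int) -> List[Var]:
    """Steps 2 and 3: variables called `name` whose context holds the line."""
    locs = {l.symbol_id: l for l in locations}
    ctxs = {c.symbol_id: c for c in contexts}
    named = [v for v in variables if v.name == name]
    if not named:
        raise LookupError("unknown variable")
    result = []
    for v in named:
        ctx = ctxs[v.context]
        start, end = locs[ctx.start], locs[ctx.end]
        # A context's start and end are required to be in the same file.
        if start.file == cur_file and start.line <= cur_line <= end.line:
            result.append(v)
    return result


def read_value(memory: bytearray, var: Var, type_formats: Dict[str, str]):
    """Steps 5 and 6: decode the bytes at the variable's address."""
    fmt = type_formats[var.type_id]  # assumption: type id -> struct format
    return struct.unpack_from(fmt, memory, var.address)[0]
```

If `visible_vars` returns more than one entry, the caller would ask the user to pick one (step 4) before calling `read_value`. A format such as `"<i"` (little-endian 32-bit integer) matches the little-endian layout of WebAssembly linear memory.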
|
||
### Show a Specified Object's Field Value | ||
|
||
Like the above section, but use the type structure information to find the | ||
field's offset in memory. | ||
|
||
### Show a Variable's Type | ||
|
||
Like showing the variable's value, but show the **type** info instead. | ||
|
||
### Set a Specified Variable's Value | ||
|
||
Like showing the variable's value, but set the memory instead. | ||
|
||
### Set the Current Function's Return Value | ||
|
||
META: Describe this. | ||
|
||
### Display Call Stack | ||
|
||
META: Describe this. |
Full DWARF 4 seems pretty huge? Isn't that a Turing-complete language? :)
Would you have a specific DWARF feature to point at instead?

These examples are just to jog one's imagination for where future evolution may take us. They're not meant to suggest that we intend to go in that direction, just that we don't want to gratuitously prevent it. Another example could be "enough of DWARF to allow stock gdb to run on wasm modules".
Anyway, see the discussion below -- we are moving to a different, more flexible design that allows any debug-info format to be used. I'll mothball this PR soon and create another one with the new design description.