Specify display conventions for wasm locations #1053

Merged
merged 8 commits into from
May 22, 2017

Conversation

@dschuff dschuff commented May 3, 2017

Based on the discussion in #990

Web.md Outdated
To achive the same goal of a common representations for WebAssembly constructs, the
following conventions are adopted.

A wasm location is a reference to a particular instruction in the binary, and may be
Member

s/wasm/WebAssembly/g everywhere.

Member Author

Done.

Web.md Outdated
Where
* `${url}` is the URL associated with the module (e.g. via a response object), if any.
* `${funcIndex}` is an index the [function index space](https://github.com/WebAssembly/design/blob/master/Modules.md#function-index-space).
* `${pcOffset}` is the offset in the module binary of the first byte of the instruction, printed in hexadecimal with lower-case digits.
Member

0x prefix or not?

Member Author

Yes, but in this formulation the 0x is part of the template and not part of the substituted value. Do you think we should switch that? It would mean that this line would say something like "${pcOffset} is the offset ... printed in hexadecimal with lower-case digits and a leading 0x prefix" which seems a little more awkward, but I don't have a strong opinion on that.

Member

Dunno, all I'm saying is this isn't clear. Whichever way works for me.

Member Author

Switched to more-or-less that wording. I agree it's clearer.

Web.md Outdated
Names of functions may also be displayed if the module contains a `"name"`
section; these can be used in the same contexts as JavaScript functions.
If there are no names provided, then engines should somehow indicate this;
(it may be sufficient to simply use e.g. an empty string if the name is
Member

Function number instead of empty string?

Member Author

In a stack trace this would be kind of redundant. E.g. in SpiderMonkey, the full stack trace entry would be `${name}@${location}`, where `${location}` would include the wasm function number for wasm, and `${name}` is already empty for top-level JS code not in a function.
For V8 the format is currently `at ${name} (${location})` for wasm and when there is a JS function, and just `at ${location}` for top-level JS.
So the point is that there is already precedent for empty JS function names that browsers might want to reuse. OTOH I'm not opposed to a little redundancy in the name of making things clearer either.

Member

OK I guess this wording is unclear to me.

Member Author

Reworded to clarify.

Web.md Outdated
It has the following format:
`${url}:wasm-function[${funcIndex}]:0x${pcOffset}`
Where
* `${url}` is the URL associated with the module (e.g. via a response object), if any.
Member

What is the format if there is no URL? Is the field just empty? Is the colon still included?

Member Author

I think there should be something, rather than empty. The note below addresses that, but I agree it's not clear at this line.

Member Author

Otherwise if it's empty there would be no way to tell different modules apart if they didn't have URLs. Obviously it's still possible to have collisions if the instantiation location is used, but at least it would allow a developer to avoid them if they cared.

Member Author

Reworded to clarify.

Web.md Outdated
`${url}:wasm-function[${funcIndex}]:0x${pcOffset}`
Where
* `${url}` is the URL associated with the module (e.g. via a response object), if any.
* `${funcIndex}` is an index the [function index space](https://github.com/WebAssembly/design/blob/master/Modules.md#function-index-space).
Member

typo: index into the ...
Also, the URL can be relative (just "Modules.md#...").

Member Author

Fixed.
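
As an illustration of the location format discussed in the review threads above, here is a minimal Python sketch. The function name `format_wasm_location` and the example URL are hypothetical and not part of the PR; the sketch only assumes the template `${url}:wasm-function[${funcIndex}]:0x${pcOffset}` quoted above, with the offset printed in lower-case hexadecimal.

```python
def format_wasm_location(url: str, func_index: int, pc_offset: int) -> str:
    """Render a location using the template quoted in the review.

    url        -- URL or other module identifier (hypothetical example values)
    func_index -- index into the function index space
    pc_offset  -- byte offset of the instruction within the module binary
    """
    # ":x" prints lower-case hexadecimal digits; the "0x" prefix is written
    # explicitly so the result is the same whether the prefix is considered
    # part of the template or part of the substituted value.
    return f"{url}:wasm-function[{func_index}]:0x{pc_offset:x}"


if __name__ == "__main__":
    print(format_wasm_location("https://example.com/app.wasm", 12, 0x1A3F))
```

Whether the `0x` belongs to the template or to the substituted value does not change the rendered string; the thread only settles how the prose describes it.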

@Cellule

Cellule commented May 5, 2017

I am still a little fuzzy about the url field.
Are we talking about the URL of the script that instantiated the module, the script that compiled the module, or the bytes of the module?
Even if the bytes are located in a .wasm file somewhere, when we create the module today we don't pass any info about the source, we simply give a buffer to compile.

Edit: I just noticed the part about the Response object, which is the only api I see that can give a meaningful url

Web.md Outdated
* `${pcOffset}` is the offset in the module binary of the first byte of the instruction, printed in hexadecimal with lower-case digits.
* `${url}` is the URL associated with the module (e.g. via a response
object), or other module identifier (see notes).
* `${funcIndex}` is an index the [function index space](Modules.md#function-index-space).
Member

Typo: missing "in"

Web.md Outdated

Notes:
* The URL field may be interpreted differently depending on the context. For
example offline tools may use a file name; or when the ArrayBuffer-based
`WebAssembly.instantiate` API is used in a browser, it may display the
-location of the API call instead.
+location of the API call instead. It should not be empty however; a
Member

API calls may not have useful source locations either, e.g. when performed as part of an eval call.

Web.md Outdated

Notes:
* The URL field may be interpreted differently depending on the context. For
example offline tools may use a file name; or when the ArrayBuffer-based
`WebAssembly.instantiate` API is used in a browser, it may display the
-location of the API call instead.
+location of the API call instead. It should not be empty however; a
developer should be able to write their code such that modules from
Member

Are you suggesting that programmers should be able to rely on unambiguous location URLs? If so, I don't think that can work in general, e.g. for the aforementioned reason, but also because of URL ambiguities in general. I would rather drop that half-sentence.

@dschuff
Member Author

dschuff commented May 5, 2017

@Cellule @rossberg-chromium re:URLs

So the obvious case is for the response APIs where there's a real non-data URL, which we hope will be the most common.
For the ArrayBuffer APIs, @lukewagner suggested in #990 (comment) that we could display the location of the JS caller that called the instantiate API. Obviously this could still be ambiguous (e.g. there's just one call in the source that instantiates different modules). However if a developer wanted to, they could introduce more call locations, ensuring that the modules could be distinguished for their own codebase. So the intent of this wording was to ensure that property, without necessarily mandating that "show the location of the API call" be exactly the mechanism. Thinking more about this though, if browsers disagree on that mechanism, it's probably not very useful. So we should probably either say that

  1. All browsers should display the JS location of the API call (this is presumably easy to do, and allows developers to split out the call locations if they want to), or
  2. there's no restriction, and browsers will presumably just disagree (most likely show the API call location or nothing)

@dschuff
Member Author

dschuff commented May 5, 2017

I don't really like option 2 since it seems likely that with e.g. dynamic loading we'll have several linked modules all instantiated from the same API call (especially if we have an instantiateGroup-like API). One simple mechanism to help is to allow modules to be named (proposed in #1055). If we had a module name, we could also add it to this string (although it's maybe getting kind of long now).

@domenic
Member

domenic commented May 5, 2017

One thing to note is that currently for eval browsers have all sorts of heuristics that generate "URLs" for stack traces. (E.g. I've seen "<eval code> in https://example.com/page.html" as a "URL".) I'm not sure whether browsers would want to either reuse that logic for wasm, or if maybe the community wants to crack down and only allow interoperable well-specified URLs for this new technology. (Possibly including such generated "URLs" in the future, but only if we get them specified so they can be implemented interoperably.)

@dschuff
Member Author

dschuff commented May 10, 2017

If modules can have names inside them, it makes sense to prefer that over the location of an API call. And maybe even over the URL (although then it would be asymmetric with JS locations which do use URLs instead of other names). So I could go either way on whether a URL or module name should be preferred. But in any case, having the name available will at least ensure that a developer can always specify something of their choosing.

@dschuff
Member Author

dschuff commented May 10, 2017

(And I've not attempted to specify here what anyone should do in the presence of eval or whatever).

@lukewagner
Member

Thanks for writing this up! Sorry for taking so long to get back to it (it always takes some time to page everything in).

So the module name is a new (but good) twist to consider. I think, even if a module name is supplied, you'd still always want to include the URL (since it's not redundant info). So what if instead:

  • the id field is renamed to url and is defined to be the URL of the fetch (for the Response API) and otherwise the URL of the JS caller of compile/instantiate, in the same fashion as eval (except substituting the wasm method name, "compile" or "instantiate", for "eval"; so, e.g., in SM, `https://foo.com/foo.js line 10 > instantiate`).
  • The name is defined to be module_name.func_name (or module_name, or func_name, or empty-string, if one of those fields is absent). I'd specifically say to use the empty string if no module/func names are present rather than index to avoid repeating wasm-function[funcIndex].

What I like is that this keeps all the names from the name section to the left of the @/at.

@dschuff
Member Author

dschuff commented May 11, 2017

@lukewagner I like that idea; however:

  1. I don't want to prescribe the exact way an eval'd code location is represented, as it is an already-established difference between engines. Actually I'm not sure I really even want to say that 'it should be like the engine handles eval, but with the wasm function name' because some engines punt that entirely; e.g. JSC just says eval@[native code] so I'd like to allow them room to do something better for wasm without necessarily changing how they handle eval. (also, how would you handle an instantiate call from inside eval'd code? I guess just nest the representations?).
  2. If there are contexts where the function name is displayed but not right next to this location representation (e.g. in devtools UI?), then you probably don't want an empty string. So that could be beyond the scope of this document (e.g. if you have more expressive UI than just text). But are there other situations we are forgetting that would display a function name but not its location?

@lukewagner
Member

  1. Agreed we don't want to overspecify b/c this already varies. But could we just call it the url field and say "a URL symmetric to a JS eval's URL" and give some examples? (instantiate-from-eval would work like eval-from-eval: @blah.js line 10 > eval line 5 > eval :)
  2. Good question. In a context like devtools where one didn't have the full mod_name.func_name@url:wasm-function[i]:pcOffset quintuplet, but rather just a "name" and "url" fields, I think we'd want "name" to be mod_name.func_name if both are defined else mod_name.wasm-function[i] if func name is not defined else wasm-function[i] if no names are defined. Perhaps we can capture this contextual distinction?
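
The two naming rules proposed in this thread (the name shown next to a location in a stack frame, and the name shown on its own in a context such as devtools) can be sketched as follows. This is a non-authoritative illustration: the function names `frame_name` and `standalone_name` are hypothetical, and the behaviour when only a function name is present in the standalone case is an assumption, since the comment above does not cover it.

```python
from typing import Optional


def frame_name(module_name: Optional[str], func_name: Optional[str]) -> str:
    """Name shown next to a location: module_name.func_name, or whichever
    part is present, or the empty string if neither is."""
    parts = [part for part in (module_name, func_name) if part]
    return ".".join(parts)


def standalone_name(module_name: Optional[str],
                    func_name: Optional[str],
                    func_index: int) -> str:
    """Name shown without an adjacent location, so the function index is
    used as a fallback instead of the empty string."""
    func_part = func_name if func_name else f"wasm-function[{func_index}]"
    # Assumption: with a function name but no module name, show the
    # function name alone (this case is not spelled out in the thread).
    return f"{module_name}.{func_part}" if module_name else func_part


if __name__ == "__main__":
    print(repr(frame_name(None, None)))
    print(standalone_name("mod", None, 3))
```

The empty string in `frame_name` avoids repeating `wasm-function[i]`, which already appears in the adjacent location.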

@domenic
Member

domenic commented May 12, 2017

To be clear, my point in bringing up eval was that the from-ArrayBuffer APIs are analogous in terms of the kind of "source URLs" they might generate, in response to:

> So the obvious case is for the response APIs where there's a real non-data URL, which we hope will be the most common.
>
> For the ArrayBuffer APIs, @lukewagner suggested in #990 (comment) that we could display the location of the JS caller that called the instantiate API.

eval() of code that calls the from-ArrayBuffer APIs is yet another level of complication (similar to eval() of code that calls eval()), but I wasn't intending to discuss that.

@lukewagner
Member

lukewagner commented May 12, 2017

@domenic I think I agree with what you're saying, but I'm not sure if you're disagreeing with my more-recent comments :) To be clear, I'm suggesting that if, e.g., you're SM and already have URLs like @test.js line 2 > eval and @test.js line 2 > eval line 3 > eval for (nested) eval() then for WebAssembly.compile you'd have URLs like @test.js line 2 > WebAssembly.compile and @test.js line 2 > eval line 3 > WebAssembly.compile. And other engines would do symmetrically, basically doing s/eval/WebAssembly.compile/ (or WebAssembly.instantiate).
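
A small sketch of the substitution described above, using the SpiderMonkey-style strings from the comment as the model. The helper name `api_call_url` is hypothetical, and other engines format eval locations differently, so this shows only one possible shape and is not a specified format.

```python
def api_call_url(caller_url: str, line: int, method: str) -> str:
    """Build a synthetic URL for a module created from an ArrayBuffer,
    mirroring how a SpiderMonkey-style engine labels eval'd code.

    caller_url -- location of the calling script; it may itself be a
                  synthetic eval URL, which gives the nested form
    line       -- line of the call within that script
    method     -- "compile" or "instantiate"
    """
    if method not in ("compile", "instantiate"):
        raise ValueError(f"unexpected WebAssembly method: {method}")
    return f"{caller_url} line {line} > WebAssembly.{method}"


if __name__ == "__main__":
    print(api_call_url("test.js", 2, "compile"))
    # Calling from inside eval'd code nests the representations.
    print(api_call_url("test.js line 2 > eval", 3, "compile"))
```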

I like the idea of trying to be even more compatible, but this seems hard if there's already a diverging precedent for eval and wasm from-ArrayBuffer APIs can be called from within eval.

@domenic
Member

domenic commented May 12, 2017

No disagreement; that sounds right! And yeah, it's not clear what the right answer is here, besides just speccing something like "engines should treat these APIs like they do eval for purposes of source locations". We can then hope that in the future someone takes on the heroic task of nailing down the stack trace format, including what happens with eval, for ES, and then wasm can just copy that work.

@dschuff
Member Author

dschuff commented May 12, 2017

Good suggestions; I've tried to capture it, PTAL

@domenic
Member

domenic commented May 12, 2017

Looks great, although there might be some mismatched parens in the example :)

Web.md Outdated

Names of functions may also be displayed if the module contains a
["name" section](BinaryEncoding.md#name-section);
these can be used in the same contexts as JavaScript functions.
Member

nit: "... same contexts as JavaScript function names".

Member

@lukewagner lukewagner left a comment

Oops, I missed the reply in email; sorry for taking so long to get back and thanks for applying all the changes! lgtm with two small nits

Web.md Outdated
not specify the full format of strings such as stack frame representations;
this allows engines to continue using their existing formats for JavaScript
(which existing code may already be depending on) while still printing
WebAssembly frames in a format consistent with JavaScript.
Member

Could you also add a Note somewhere saying that these conventions do not describe the value of the `.name` property of exported WebAssembly functions, which is precisely [defined](JS.md#exported-function-exotic-objects) to be `ToString(function-index)`?

Member

Ha good point. Would we want a way to map one to the other as a standalone function?

Member

Do you mean like some new JS API for producing the module_name.func_name? That seems possible, but it also makes the names section (more) semantically visible (than before), so I guess it depends on what our use case is.

Member

Yes. My thinking is: we already let developers access the name section so they don't have to parse their own module... but then they need to parse the name section to get that information! Cut the middle-person. 😄

Member

Yeah, that makes sense if client code would otherwise be doing their own binary parsing that we've already done. With the other Module reflection methods, our motivating use case was module loaders (and specific experience with incorporating wasm into SystemJS). It'd be nice to have some specific user who is wanting to programmatically access these function names.

Anyhow, this probably belongs in a different issue.

@lukewagner
Member

I think @dschuff is out for a bit, so I took the liberty of applying my review requests to the PR. I also turned the naming paragraph into nested bullets so it was a bit easier to see the if/else structure.

@lukewagner
Member

Any last comments before merging?

@hemobo

hemobo commented May 19, 2017

This has probably been discussed elsewhere, but...
Is using the same format for runtime errors and compilation errors a strong constraint here? If it isn't, have you considered using a 'function n : instruction m (m being the m'th instruction in function n)' format instead? Individual instructions seem to be what both source maps and stack traces actually point to – what's actually behind the 'first byte of an instruction' part that's spec'ed here, just with a weird indirection through the binary encoding.

That would have the advantage of being independent of the current representation of a module (binary, text or some other data structure). As it is written, it seems one has to keep the particular byte stream around just to properly format a stack trace.

In principle an absolute binary offset can be used without having to really understand the binary encoding, but it isn't immediately clear that this is an advantage here, because tooling that has access to the byte stream and wants to do anything useful with that information (like displaying the trapping instructions) needs to be able to parse function bodies and possibly convert them into textual representation in any case.

@lukewagner
Member

The 'function n' part is already explicitly present in this PR via wasm-function[${funcIndex}], so I think the main new thing you're proposing is changing from the current bytecode index to an instruction index (indexed by number of whole instructions).

From my experience, and from the previous discussion in #990, I think the bytecode index is simpler for everyone. For the engine compiling a trapping instruction, it's quite easy to just save (compile into the fail path, save in trap metadata, etc) the "current" bytecode offset for later trap reporting; no need to save bytecode. If we had to report instruction index, we'd have to maintain an additional instruction-counter that was incremented after decoding each instruction and this could both be a source of rare corner-case bugs and a mild source of decoding slowdown. For wasm producers, tools like wabt have a dump command that naturally displays bytecode offset next to each instruction; this too would need extra work to maintain an instruction counter instead.

Also, wasm source maps are currently being proposed that map bytecode offsets to source locations, and this would provide what developers really want, which is errors in terms of source code location.
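
The bookkeeping difference described in this comment can be illustrated with a toy decoder. This is a sketch under loud assumptions: the opcode table is a hypothetical simplification in which every immediate is a single byte (real WebAssembly immediates are variable-length), and `find_trap` is an invented name, not an engine API.

```python
# Toy opcode table: opcode -> number of immediate bytes that follow it.
# Hypothetical simplification for illustration only.
IMMEDIATE_BYTES = {0x00: 0, 0x01: 0, 0x41: 1}
TRAP_OPCODE = 0x00


def find_trap(body: bytes, body_start: int):
    """Return (byte_offset, instruction_index) of the first trapping
    opcode in a function body, or None if there is none.

    body_start is the offset of the body within the module binary.
    """
    cursor = 0       # the decoder needs this anyway to read the bytes
    instr_index = 0  # extra counter needed only to report an instruction index
    while cursor < len(body):
        opcode = body[cursor]
        if opcode == TRAP_OPCODE:
            # The byte offset falls out of the cursor the decoder already has.
            return body_start + cursor, instr_index
        cursor += 1 + IMMEDIATE_BYTES[opcode]
        instr_index += 1
    return None


if __name__ == "__main__":
    body = bytes([0x01, 0x41, 0x2A, 0x01, 0x00])
    print(find_trap(body, 0x20))
```

Reporting the byte offset reuses state the decoder already maintains, while reporting an instruction index requires the additional counter.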

@dschuff
Member Author

dschuff commented May 22, 2017

@lukewagner Thanks for helping push this along! I do think that using some abstraction other than byte offset is an interesting idea worth considering; we currently have some discussion in #1064 and here, so maybe we should merge this and file a separate issue or PR for that question specifically. If we were to switch, it would just replace the ${pcOffset} field as defined here with something different; maybe just a different number.

8 participants