Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: remove excess optimization for sourcemap #5547

Closed
wants to merge 1 commit into from

Conversation

JesseCodeBones
Copy link
Contributor

Removing the excess optimization for source map generation and print.
Followup to: 5504 and 5524

@JesseCodeBones
Copy link
Contributor Author

JesseCodeBones commented May 23, 2023

Hi Maintainer,
in this PR we fixed two issues:

  1. debug location down grade
    Input wat:
(module
  (func $foo (param $x i32) (param $y i32)
    ;;@ src.cpp:1:1
    (if
      (i32.add
        (local.get $x)
        (local.get $y)
      )
      (return)
    )
  )
)

outputs:

(module
 (type $i32_i32_=>_none (func (param i32 i32)))
 (func $0 (param $0 i32) (param $1 i32)
  (if
   (i32.add
    ;;@ src.cpp:1:1  <-----------------------same location will only insert once, the first child inserted this position, if instruction will not get the location any more
    (local.get $0)
    (local.get $1)
   )
   (return)
  )
 )
)
  1. siblings instructions will dismiss debug location:
    input wat:
(module
  (func $foo (param $x i32) (param $y i32)
    (if
      ;;@ src.cpp:1:1
      (i32.add
        (local.get $x)
        (local.get $y)
      )
      ;;@ src.cpp:1:1
      (return)
    )
  )
)

outputs:

(module
 (type $i32_i32_=>_none (func (param i32 i32)))
 (func $0 (param $0 i32) (param $1 i32)
  (if
   (i32.add
    ;;@ src.cpp:1:1
    (local.get $0)
    (local.get $1)
   )
   (return) <--------------the debug location will be dismissed to print, but actually it has in memory structure
  )
 )
)

@JesseCodeBones JesseCodeBones marked this pull request as ready for review May 23, 2023 08:32
@@ -133,6 +133,7 @@
(i32.const 0)
)
)
;;@ fib.c:3:0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand why this addition is correct. So I turned the sourcemap to json, and there are three entries for line 3:

  {
    "generatedLine": 1,
    "generatedColumn": 643,
    "lastGeneratedColumn": null,
    "source": "fib.c",
    "originalLine": 3,
    "originalColumn": 0,
    "name": null
  },
  {
    "generatedLine": 1,
    "generatedColumn": 686,
    "lastGeneratedColumn": null,
    "source": "fib.c",
    "originalLine": 3,
    "originalColumn": 0,
    "name": null
  },
  {
    "generatedLine": 1,
    "generatedColumn": 693,
    "lastGeneratedColumn": null,
    "source": "fib.c",
    "originalLine": 3,
    "originalColumn": 0,
    "name": null
  },

Before this PR there are two locations where line 3 is mentioned in the .fromBinary file here. So I'd expect one more to appear, to get to 3 appearances to match the source map. But this PR adds more than that. Am I looking at this wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kripken Hi Alon,
sourceMap location is not one by one equal to the debug location of expression. it is a range concept.
to be short, if expression offset is between previous source map debug location offset and current source map debug location offset, this expression will take the previous source map debug location.
before this change, the debug location of siblings element just were not printed, but it existed in the functionRef->debugLocations, so this change just print it and it is more clear to see which instruction is under which location.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if remove the filter condition of printDebugLocation of Print.cpp.
it will print all locations to all expressions, and it is more clear to see the location information, and just for debug, print all debug location is not neccessary I suppose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the range concept idea documented somewhere? I'm not familiar with that. I had assumed source maps are just individual locations, but I could be wrong...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Alon,
I didnot find the documentation of the solution, this may come from PR:
#1017
Here is the read and write flow:

  1. wasm-opt read:
    wasm-s-parser.h function: SExpressionParser::parseDebugLocation it will update loc of SExpressionParser, so the after expression will take this position until the loc will updated with another debug location.
  2. binary write:
    WasmBinaryWriter::writeSourceMapEpilog
writeBase64VLQ(*sourceMap, int32_t(offset - lastOffset));
writeBase64VLQ(*sourceMap, int32_t(loc->fileIndex - lastLoc.fileIndex));
writeBase64VLQ(*sourceMap, int32_t(loc->lineNumber - lastLoc.lineNumber));
writeBase64VLQ(*sourceMap,
                 int32_t(loc->columnNumber - lastLoc.columnNumber));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kripken
Hi Alon,
I suppose the primary design for source map is shrink the size of debug info like DWARF, but source map is lack the control instructions, so do you think we should back to point to point solution for source map debug info?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @JesseCodeBones , sorry for the late response. I don't know that much about this and was hoping someone else would know...

I think you might be right, that there are ranges here, since we keep using the same value until it is updated. I was not aware of that before. But since that is how it is, we shouldn't change it, I guess.

To move forward here, getting back to the start of this thread, I think it would be helpful to explain the diffs in this PR. That is, I'm still not sure how to read the diff here and check that it is correct. Something like "this annotation appeared here because of this part of the sourcemap" would be good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kripken
Hi Alon,
the changes contains two parts:
part 1: during printing, if parent ancestors node already printed the debug info, skip print
part 2: during source map generation, only ancestors generate the source map debug info
I am not sure if it is clear, if you have any question, please feel free to ask me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand that, and the code sort of makes sense. But I am having specific trouble in reading the diffs in the testcase specifically. That is where I need help. I'll add a comment on a specific part of the test diff now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, this is already on a test diff 😄 Then, yeah: how can I see that this addition on line 136 is a correct one to add? I'm not sure how to read the sourcemap itself and see that it says there should be an annotation here.

@@ -2547,6 +2548,10 @@ struct PrintSExpression : public UnifiedExpressionVisitor<PrintSExpression> {
Module* currModule = nullptr;
Function* currFunction = nullptr;
Function::DebugLocation lastPrintedLocation;
uint32_t lastPrintedIndent =
Copy link
Contributor Author

@JesseCodeBones JesseCodeBones Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will cache last print indent as the depth of last printed instruction
@kripken

if (func != nullptr) {
const auto& debugLocationIterator = func->debugLocations.find(curr);
if (debugLocationIterator != func->debugLocations.cend()) {
if (lastDebugLocation != debugLocationIterator->second ||
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the function debug info emit, we add the depth logic, if the parent or siblings instruction already inserted the debuginfo, the instruction will not emit the debug info in source map generation, it will not impact the debug info reload from source map (children and siblings instructions will also get the debug info when load them from source map to memory structures), just optimize the generation of source map.
@kripken

// )
// ;; debug location
// (siblings)
if (lastPrintedLocation == location && indent > lastPrintedIndent) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kripken
Hi Alon,
the easiest way to check the debug info is remove this filter for debugging, here is my step:
1, remove this condition return
2, build binaryen
3, run cmd: ./build/bin/wasm-opt ./test/fib-dbg.wasm -ism ./test/fib-dbg.wasm.map --print
then wasm-opt will print all debug info.
so you can check if the new added annotation of debug info is reasonable or not.
if any other question, please feel free to ask, thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kripken
Hi Alon,
here is the wat file(fib-dbg.wasm.fromBinary) will full debug info printed:

(module
 (type $i32_=>_i32 (func (param i32) (result i32)))
 (type $i32_i32_=>_none (func (param i32 i32)))
 (type $none_=>_i32 (func (result i32)))
 (type $i32_=>_none (func (param i32)))
 (type $none_=>_none (func))
 (import "env" "memory" (memory $mimport$0 256 256))
 (import "env" "table" (table $timport$0 0 0 funcref))
 (import "env" "DYNAMICTOP_PTR" (global $gimport$0 i32))
 (import "env" "tempDoublePtr" (global $gimport$1 i32))
 (import "env" "ABORT" (global $gimport$2 i32))
 (import "env" "STACKTOP" (global $gimport$3 i32))
 (import "env" "STACK_MAX" (global $gimport$4 i32))
 (import "env" "gb" (global $gimport$5 i32))
 (import "env" "fb" (global $gimport$6 i32))
 (import "global" "NaN" (global $gimport$7 f64))
 (import "global" "Infinity" (global $gimport$8 f64))
 (import "env" "memoryBase" (global $gimport$9 i32))
 (import "env" "tableBase" (global $gimport$10 i32))
 (global $global$0 (mut i32) (global.get $gimport$0))
 (global $global$1 (mut i32) (global.get $gimport$1))
 (global $global$2 (mut i32) (global.get $gimport$2))
 (global $global$3 (mut i32) (global.get $gimport$3))
 (global $global$4 (mut i32) (global.get $gimport$4))
 (global $global$5 (mut i32) (global.get $gimport$5))
 (global $global$6 (mut i32) (global.get $gimport$6))
 (global $global$7 (mut i32) (i32.const 0))
 (global $global$8 (mut i32) (i32.const 0))
 (global $global$9 (mut i32) (i32.const 0))
 (global $global$10 (mut i32) (i32.const 0))
 (global $global$11 (mut f64) (global.get $gimport$7))
 (global $global$12 (mut f64) (global.get $gimport$8))
 (global $global$13 (mut i32) (i32.const 0))
 (global $global$14 (mut i32) (i32.const 0))
 (global $global$15 (mut i32) (i32.const 0))
 (global $global$16 (mut i32) (i32.const 0))
 (global $global$17 (mut f64) (f64.const 0))
 (global $global$18 (mut i32) (i32.const 0))
 (global $global$19 (mut i32) (i32.const 0))
 (global $global$20 (mut i32) (i32.const 0))
 (global $global$21 (mut f64) (f64.const 0))
 (global $global$22 (mut i32) (i32.const 0))
 (global $global$23 (mut f64) (f64.const 0))
 (export "setThrew" (func $setThrew))
 (export "runPostSets" (func $runPostSets))
 (export "establishStackSpace" (func $establishStackSpace))
 (export "stackSave" (func $stackSave))
 (export "stackRestore" (func $stackRestore))
 (export "_fib" (func $_fib))
 (export "stackAlloc" (func $stackAlloc))
 (func $stackAlloc (param $0 i32) (result i32)
  (local $1 i32)
  (block $label$1
   (local.set $1
    (global.get $global$3)
   )
   (global.set $global$3
    (i32.add
     (global.get $global$3)
     (local.get $0)
    )
   )
   (global.set $global$3
    (i32.and
     (i32.add
      (global.get $global$3)
      (i32.const 15)
     )
     (i32.const -16)
    )
   )
   (return
    (local.get $1)
   )
  )
 )
 (func $stackSave (result i32)
  (return
   (global.get $global$3)
  )
 )
 (func $stackRestore (param $0 i32)
  (global.set $global$3
   (local.get $0)
  )
 )
 (func $establishStackSpace (param $0 i32) (param $1 i32)
  (block $label$1
   (global.set $global$3
    (local.get $0)
   )
   (global.set $global$4
    (local.get $1)
   )
  )
 )
 (func $setThrew (param $0 i32) (param $1 i32)
  (if
   (i32.eq
    (global.get $global$7)
    (i32.const 0)
   )
   (block
    (global.set $global$7
     (local.get $0)
    )
    (global.set $global$8
     (local.get $1)
    )
   )
  )
 )
 (func $_fib (param $0 i32) (result i32)
  (local $1 i32)
  (local $2 i32)
  (local $3 i32)
  (local $4 i32)
  (local $5 i32)
  (local $6 i32)
  (local $7 i32)
  (local $8 i32)
  (local $9 i32)
  (local $10 i32)
  (local $11 i32)
  (block $label$1
   (local.set $11
    (global.get $global$3)
   )
   ;;@ fib.c:3:0
   (local.set $6
    ;;@ fib.c:3:0
    (i32.gt_s
     ;;@ fib.c:3:0
     (local.get $0)
     ;;@ fib.c:3:0
     (i32.const 0)
    )
   )
   ;;@ fib.c:3:0
   (if
    ;;@ fib.c:3:0
    (local.get $6)
    (block
     ;;@ fib.c:3:0
     (local.set $1
      ;;@ fib.c:3:0
      (i32.const 0)
     )
     ;;@ fib.c:3:0
     (local.set $5
      ;;@ fib.c:3:0
      (i32.const 1)
     )
     ;;@ fib.c:3:0
     (local.set $8
      ;;@ fib.c:3:0
      (i32.const 0)
     )
    )
    (block
     ;;@ fib.c:3:0
     (local.set $4
      ;;@ fib.c:3:0
      (i32.const 1)
     )
     ;;@ fib.c:8:0
     (return
      ;;@ fib.c:8:0
      (local.get $4)
     )
    )
   )
   ;;@ fib.c:8:0
   (loop $label$4
    ;;@ fib.c:8:0
    (block $label$5
     ;;@ fib.c:4:0
     (local.set $3
      ;;@ fib.c:4:0
      (i32.add
       ;;@ fib.c:4:0
       (local.get $5)
       ;;@ fib.c:4:0
       (local.get $1)
      )
     )
     ;;@ fib.c:3:0
     (local.set $9
      ;;@ fib.c:3:0
      (i32.add
       ;;@ fib.c:3:0
       (local.get $8)
       ;;@ fib.c:3:0
       (i32.const 1)
      )
     )
     ;;@ fib.c:3:0
     (local.set $7
      ;;@ fib.c:3:0
      (i32.eq
       ;;@ fib.c:3:0
       (local.get $9)
       ;;@ fib.c:3:0
       (local.get $0)
      )
     )
     ;;@ fib.c:3:0
     (if
      ;;@ fib.c:3:0
      (local.get $7)
      (block
       ;;@ fib.c:3:0
       (local.set $4
        ;;@ fib.c:3:0
        (local.get $3)
       )
       ;;@ fib.c:3:0
       (br $label$5)
      )
      (block
       ;;@ fib.c:3:0
       (local.set $2
        ;;@ fib.c:3:0
        (local.get $5)
       )
       ;;@ fib.c:3:0
       (local.set $5
        ;;@ fib.c:3:0
        (local.get $3)
       )
       ;;@ fib.c:3:0
       (local.set $8
        ;;@ fib.c:3:0
        (local.get $9)
       )
       ;;@ fib.c:3:0
       (local.set $1
        ;;@ fib.c:3:0
        (local.get $2)
       )
      )
     )
     ;;@ fib.c:3:0
     (br $label$4)
    )
   )
   (return
    ;;@ fib.c:8:0
    (local.get $4)
   )
  )
 )
 (func $runPostSets
  (local $0 i32)
  (nop)
 )
 ;; custom section "sourceMappingURL", size 35, contents: "\"http://localhost:8000/fib.wasm.map"
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks, and sorry for the delay.

I'll look at this again soon, but first, I have recently been debugging another source maps issue (a user reported the wrong annotation somewhere), and so I read the source maps spec again and then our code. I have to say I am still a little confused on it. It seems like the spec says nothing about ranges or how to infer debug locations for things without an annotation, but Binaryen does do that - it will keep using a debug location after it sees one, until the next. I am not sure if that is correct or not, and I'm not sure if that is related to this PR or not, but the binary format parts seem like they might be? I am quite worried about changing it, because of how it might affect other users...

For this PR specifically, since I would like to move forward on it: Is it possible to split the Print parts from the binary format parts? I think there is a low bar to landing a Print improvement, but a high bar for a binary format change (again, because users could be affected by the latter). So if we can split it, that would make review simpler, and also allow the Print part to land quickly I hope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kripken
yep, I will splite the PR and suppose the print PR will comming soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants