fix(trim-paths): trim `SO` and `DW_AT_comp_dir` symbols for root DI node
#118518
r? @b-naber (rustbot has picked a reviewer for you, use r? to override) |
cc @Urgau if you're interested. |
Not the right fix… |
10d6124 to a68c2fb
r? @pnkfelix as you're one member of debugging wg |
a68c2fb to 674cdaf
Find a way that can fix issue on both Linux and Darwin. |
674cdaf to 4fa22f1
@bors try |
fix(trim-paths): trim `SO` and `DW_AT_comp_dir` symbols for root DI node

This is one way to fix <rust-lang#117652>.

## The issue

When `--remap-path-scope=object` is specified, users expect that no local path is embedded in final executables. Under the `object` scope, the current implementation only remaps debug symbols if debug info is split into its own file. In other words, when `split-debuginfo=packed|unpacked` is set, rustc assumes there is no embedded path in the final executable that needs to be remapped.

However, this doesn't work.

* On Linux, the root `DW_AT_comp_dir` of a compile unit seems to go into the binary executables.
* On macOS, `SO` symbols are embedded in binary executables and libraries regardless of whether a split-debuginfo file is built. Each `SO` symbol contains a path to the root source file of a debug info compile unit.

## The approach

The path of the working directory in the root DI node seems to be embedded in executables. Hence, we trim it when the `unsplit-debuginfo` scope is present, treating it as a kind of "unsplit" debuginfo.

## Unresolved issue

* Not sure where to add more tests to consolidate it.
* Haven't investigated whether we should apply the same logic to cranelift [here](https://github.com/rust-lang/rust/blob/64d7e0d0b61c460fbc882ae37c0f236756dd9c39/compiler/rustc_codegen_cranelift/src/debuginfo/mod.rs#L68-L80).
* Not sure if there is any other consequence of doing this, but AFAIK debugging still works fairly well on macOS and Linux with `profile.dev.trim-paths="object"` (with some extra tweaks in cargo).
☀️ Try build successful - checks-actions |
This is ready for review again :) |
I'm not sure I follow your logic for the
Does this mean that you associate a filepath with When I implemented the RFC in #115214 I just made all the places that remapped paths to be conditional on the scope(s) they applied to (when emitting a diagnostic it's
The variable
Well that's currently the case, it's only ever called with the |
I think I might have formulated things in a confusing way. With "filepath" I just mean the path that is being remapped, that is, the thing that implements
I'm not sure I agree with that (although the distinction is subtle). In my mind, we use the concept of "scopes" to model where a given path will be emitted to. For example, if we are about to emit some path as part of a diagnostic message, then we use By extension, if we are storing a path to some place (like debuginfo in LLVM IR) that might then end up in multiple output artifacts and these output artifacts are in different scopes (e.g. LLVM might write it both to the object file and to the |
I see, using the same word for a (subtly) different meaning. Thanks for clearing that up.
👍 Agree.
I think we already have the logic for handling this in I also wouldn't expect the need to call |
# - Debuginfo in binary file
# - `.o` deleted
# - `.dSYM` present
# - in binary, paths from `N_SO` (source files) and `N_OSO` (object files) should be remapped
Hey, @michaelwoerister. I've moved #118518 (review) to this thread so we can follow the discussion better.
I'm not quite clear yet on the situation on macOS. Do you have a link to some documentation on the various SO symbols?
It's also unclear to me. I have no idea if there is an official doc for Mach-O debug symbols. Here is what I've found:
- Apple's “Lazy” DWARF Scheme
- Apple's Linker & Deterministic Builds
- SymbolTable.pm from old Apple open source header (search for `N_SO` and `N_OSO`)
- ArchiveWriter.cpp from LLVM
- My own experiments on it
I'll try to read up on those links by the end of the week. If we can't get to a clear conclusion, I think we should merge the PR nonetheless, for the improved behavior it provides on Linux.
OK, from reading through those links it sounds to me like:
- SO symbols we can handle fine by remapping the paths generated by debuginfo.
- OSO symbols are added by the linker. They are not needed after dsymutil has built the dSYM bundle. So it would be fine to remove them after building the dSYM bundle.
I think the best option would be to use the `-oso_prefix` linker option. My guess is that will not be a problem even on older macOS versions. But I don't know if we have a minimum supported Xcode version (in addition to a minimum supported macOS version).
Other than that, rewriting/removing the OSO symbols in a postprocessing step seems to be our only option.
@weihanglo, do you know how `-Cstrip=debuginfo` behaves on macOS? Does that remove the OSO symbols?
It runs `strip -S` on macOS and removes OSO symbols successfully, and the `.dSYM` bundle can successfully find sources for a simple `hello-world.rs` example.
Should we always pass `-Cstrip=debuginfo` to deal with OSO symbols? I am not sure…
It looks like `-Cstrip=debuginfo` is now the default unless debuginfo is requested. We could also do a similar thing for when trim-paths is enabled.
That sounds good, but only when Cargo is involved. OSO is still there if using rustc directly.
(granted, people invoking rustc directly should be able to set `-Cstrip=debuginfo`)
Or do you mean that rustc should implicitly strip debuginfo under some trim-paths circumstances?
I was more thinking about the Cargo case. For direct rustc invocations, the defaults are less important, I think, because that's more of an advanced use case. The test cases here could simulate what we expect Cargo to do.
@Urgau, if you are interested in opening a PR that asserts that only a single scope is passed to `for_scope`, and that also adds a docstring saying that only a single scope is supported, you can r? me. Since `for_scope` currently accepts a set of scopes, I think it is a bit confusing at the moment. But a bit of documentation and an assertion would be an easy fix for that. |
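The kind of assertion suggested here could look roughly like the following. This is only a minimal sketch: `ScopeSet`, its constants, and `should_remap` are hypothetical stand-ins invented for illustration, not rustc's actual scope type or the real `for_scope` signature.

```rust
/// Hypothetical stand-in for a set of remap scopes, stored as bit flags.
/// (Assumption: rustc's real type and scope names may differ.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ScopeSet(u8);

impl ScopeSet {
    const MACRO: ScopeSet = ScopeSet(1 << 0);
    const DIAGNOSTICS: ScopeSet = ScopeSet(1 << 1);
    const UNSPLIT_DEBUGINFO: ScopeSet = ScopeSet(1 << 2);

    /// Combine two sets of scopes.
    fn union(self, other: ScopeSet) -> ScopeSet {
        ScopeSet(self.0 | other.0)
    }

    /// True if every scope in `other` is also in `self`.
    fn contains(self, other: ScopeSet) -> bool {
        self.0 & other.0 == other.0
    }

    /// True if exactly one scope bit is set.
    fn is_single(self) -> bool {
        self.0.count_ones() == 1
    }
}

/// Decide whether a path emitted into `scope` should be remapped,
/// given the set of `enabled` scopes.
///
/// Only a single scope is supported; passing a set of scopes panics.
fn should_remap(enabled: ScopeSet, scope: ScopeSet) -> bool {
    assert!(
        scope.is_single(),
        "expected exactly one scope, got {:?}",
        scope
    );
    enabled.contains(scope)
}

fn main() {
    let enabled = ScopeSet::MACRO.union(ScopeSet::UNSPLIT_DEBUGINFO);
    assert!(should_remap(enabled, ScopeSet::UNSPLIT_DEBUGINFO));
    assert!(!should_remap(enabled, ScopeSet::DIAGNOSTICS));
    // `should_remap(enabled, enabled)` would panic: two scopes were passed.
}
```

Keeping the parameter type as a set while asserting at runtime matches the low-effort fix described in the comment; a stricter design would use a separate single-scope type, but that would be a larger API change.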
When `--remap-path-scope=object` is specified, users expect that no local path is embedded in final executables. Under the `object` scope, the current implementation only remaps debug symbols if debug info is split into its own file. In other words, when `split-debuginfo=packed|unpacked` is set, rustc assumes there is no embedded path in the final executable that needs to be remapped. However, this doesn't work on macOS. On macOS, `SO` symbols are embedded in binary executables and libraries regardless of whether a split-debuginfo file is built. Each `SO` symbol contains a path to the root source file of a debug info compile unit. This commit demonstrates the case, in the hope that a fix follows soon.
When `--remap-path-scope=object` is specified, users expect that no local path is embedded in final executables. Under the `object` scope, the current implementation only remaps debug symbols if debug info is split into its own file. In other words, when `split-debuginfo=packed|unpacked` is set, rustc assumes there is no embedded path in the final executable that needs to be remapped. However, this doesn't work on Linux. On Linux, the root `DW_AT_comp_dir` of a compile unit seems to go into the binary executables. This commit demonstrates the case, in the hope that a fix follows soon.
e15a22c to 930905c
The path of the working directory in the root DI node seems to be embedded in executables. Hence, we trim it when the `unsplit-debuginfo` scope is present, treating it as a kind of "unsplit" debuginfo.
930905c to c2b1547
Marking the PR as blocked until we've discussed whether the RFC can be implemented at all in its current form. See #118518 (comment) |
…chaelwoerister Assert that a single scope is passed to `for_scope` Addresses rust-lang#118518 (comment) r? `@michaelwoerister`
Rollup merge of rust-lang#120230 - Urgau:for_scope-single-scope, r=michaelwoerister Assert that a single scope is passed to `for_scope` Addresses rust-lang#118518 (comment) r? ``@michaelwoerister``
☔ The latest upstream changes (presumably #122450) made this pull request unmergeable. Please resolve the merge conflicts. |
I am a bit lost. @michaelwoerister, this is still worth pursuing, or at least I should verify the current situation again, right? |
Some of the test changes look worth landing, even if the logic change is no longer necessary. Better have more tests than not enough. |
It's definitely worth having tests for the trim-paths related functionality. Maybe it's better to create a number of smaller test cases. It might also be possible to use regular UI tests for this. I recently saw something pretty clever in another PR where a UI test looks for its own PDB file: We could do the same, i.e. just load binaries and separate debuginfo files into memory and then treat them as byte slices where we look for the text we want to find (or don't want to find). That should be a lot more readable than run-make tests. If compiletest allows it, we can also put some shared code into a common aux-crate. Regarding what to test: I suggest picking the most relevant configurations from the table in the tracking issue and then having a test for Linux, macOS, and MSVC. For each useful It might be good to split this into a number of smaller PRs, starting with a simple case just for Linux or macOS. Feel free to assign to me for review. |
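The byte-slice idea can be sketched in a few lines. This is an illustrative sketch only: `contains_bytes` is a hypothetical helper, and the in-memory `artifact` stands in for what a real test would obtain by reading the binary or debuginfo file (for example with `std::fs::read`).

```rust
/// Returns true if `needle` occurs anywhere in `haystack`.
/// (Hypothetical helper; not part of compiletest or rustc.)
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    // `windows(0)` would panic, so treat an empty needle as always found.
    if needle.is_empty() {
        return true;
    }
    haystack.windows(needle.len()).any(|window| window == needle)
}

fn main() {
    // Stand-in for the raw bytes of a compiled artifact. The paths below are
    // made up for the example.
    let artifact: &[u8] = b"\x7fELF\0\0/remapped/src/main.rs\0\0";

    // The remapped prefix should be present...
    assert!(contains_bytes(artifact, b"/remapped"));
    // ...and the local working directory should not leak into the artifact.
    assert!(!contains_bytes(artifact, b"/home/user/project"));
}
```

A naive window scan is quadratic in the worst case, which is likely acceptable for small test binaries. Note that it only finds paths stored as plain bytes, so it would miss anything compressed or otherwise encoded in the artifact.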
I currently don't have time to work on this. It seems that we need a complete rewrite because the run-make tests are in the middle of a gigantic porting effort. So close. |
Thanks, @weihanglo! This PR has triggered a lot of valuable work around clarifying the RFC. We'll get there 🙂 |
from https://github.com/llvm/llvm-project/blob/847d8457d16a7334ba39bdd35c70faa1b295304d/clang/lib/CodeGen/CGDebugInfo.cpp#L623-L631