CI: Add LTO support to clang in dist-x86_64-linux #134690

Merged: 1 commit merged on Dec 28, 2024


clubby789
Contributor

clubby789 commented Dec 23, 2024

After rust-lang/cc-rs#1279, we attempt to pass `-flto=thin` to clang. In `dist-x86_64-linux`, we don't build clang with the `LLVMgold.so` library, so this fails. This PR attempts to resolve that in two steps.
First, pass the binutils plugin include directory to Clang, [which will build the library](https://github.com/llvm/llvm-project/blob/2d6d723a85c2d007b0359c206d66cd2e5a9f00e1/llvm/docs/GoldPlugin.rst#how-to-build-it).
Second, this library depends on the *version of libstdc++ that we built* specifically. However, despite both the RPATH and `LD_LIBRARY_PATH` pointing to `/rustroot/lib`, we incorrectly resolve to the system libstdc++, which doesn't load.

```
# LD_DEBUG=libs,files
      2219:    file=libstdc++.so.6 [0];  needed by /rustroot/bin/../lib/LLVMgold.so [0]
      2219:    find library=libstdc++.so.6 [0]; searching
      2219:     search path=/rustroot/bin/../lib/../lib        (RPATH from file /rustroot/bin/../lib/LLVMgold.so)
      2219:      trying file=/rustroot/bin/../lib/../lib/libstdc++.so.6
      2219:     search path=/usr/lib64/tls:/usr/lib64        (system search path)
      2219:      trying file=/usr/lib64/tls/libstdc++.so.6
      2219:      trying file=/usr/lib64/libstdc++.so.6
```

Using `LD_PRELOAD` makes it load the correct library.

I think this is probably not the most maintainable way to do this, so I'm opening this to see whether it is desired and whether there's a better way of doing it.
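The lookup order in the `LD_DEBUG` log above can be extracted mechanically. The sketch below is a hypothetical helper (`lookup_order` is not part of any Rust tooling) that only understands the two line shapes present in this log. It normalizes paths with `os.path.normpath`, which makes it visible that the RPATH candidate `/rustroot/bin/../lib/../lib/libstdc++.so.6` is simply `/rustroot/lib/libstdc++.so.6`. The log alone does not say why ld.so moved past that first candidate.

```python
import os
import re
from typing import List, Tuple

# The two LD_DEBUG line shapes used here:
#   "search path=<dirs>   (<where the path came from>)"
#   "trying file=<candidate>"
SEARCH_RE = re.compile(r"search path=(\S+)\s+\((.+)\)")
TRYING_RE = re.compile(r"trying file=(\S+)")


def lookup_order(log: str) -> List[Tuple[str, str]]:
    """Return (origin, normalized candidate path) pairs in the order ld.so tried them."""
    origin = "unknown"
    tried = []
    for line in log.splitlines():
        search = SEARCH_RE.search(line)
        if search:
            origin = search.group(2)
            continue
        trying = TRYING_RE.search(line)
        if trying:
            # normpath is purely lexical: it collapses "dir/.." without touching the disk.
            tried.append((origin, os.path.normpath(trying.group(1))))
    return tried


LOG = """\
      2219:    file=libstdc++.so.6 [0];  needed by /rustroot/bin/../lib/LLVMgold.so [0]
      2219:    find library=libstdc++.so.6 [0]; searching
      2219:     search path=/rustroot/bin/../lib/../lib        (RPATH from file /rustroot/bin/../lib/LLVMgold.so)
      2219:      trying file=/rustroot/bin/../lib/../lib/libstdc++.so.6
      2219:     search path=/usr/lib64/tls:/usr/lib64        (system search path)
      2219:      trying file=/usr/lib64/tls/libstdc++.so.6
      2219:      trying file=/usr/lib64/libstdc++.so.6
"""

for origin, candidate in lookup_order(LOG):
    print(f"{candidate:40}  <- {origin}")
```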

@rustbot
Collaborator

rustbot commented Dec 23, 2024

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Dec 23, 2024
@clubby789
Contributor Author

(just to make sure this works on GH too)
@bors try

@bors
Contributor

bors commented Dec 23, 2024

⌛ Trying commit 7f7cd6e with merge 489bb36...

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 23, 2024
CI: Add LTO support to clang in dist-x86_64-linux

@bors
Contributor

bors commented Dec 23, 2024

☀️ Try build successful - checks-actions
Build commit: 489bb36 (489bb36d8e95310103206a99218960c6bb55bd35)

@Kobzol
Contributor

Kobzol commented Dec 23, 2024

So if I understand it correctly, the cc detection that checks whether the C/C++ compiler in use supports LTO says that it doesn't on our CI, and therefore LTO isn't passed to the C/C++ code that we compile? What does gold have to do with that? And why does it matter if we modify the compilation of the host LLVM? Don't we build our own in-tree LLVM using cmake, instead of cc? Or is this for building other C/C++ code in bootstrap?

It's interesting that the cc bump brought such perf. wins if LTO (e.g. for jemalloc) didn't work. Could it actually have been PGO that caused the wins? 🤔

@clubby789
Contributor Author

clubby789 commented Dec 23, 2024

The link in the PR explains some of this. Essentially, we use our compiled clang to build a few things, e.g. the C bindings for rustc_llvm.
As of the recent update, cc attempts to pass our LTO flags to clang, but currently the clang we build doesn't support LTO.

To enable it, we need to build `LLVMgold.so`, which is a plugin for gold. We do that by passing the binutils plugin include dir when building clang/LLVM.

I'm not sure why clang is using gold for LTO, given that I'd assume lld supports it. Perhaps some other clang configuration option is needed. But if you run `clang -flto=thin test.c` in the Docker container without this patch, it fails while searching for LLVMgold.
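The failure described above can be reproduced with a compile-and-link probe. The sketch below uses a hypothetical helper, `accepts_flag`; it is not the cc-rs implementation, only something similar in spirit. It deliberately links: as far as we understand clang's driver, `-flto=thin -c` only emits LLVM bitcode, and the `LLVMgold.so` plugin is loaded only when the final link goes through a non-lld linker such as gold.

```python
import os
import subprocess
import tempfile


def accepts_flag(compiler: str, flag: str) -> bool:
    """Return True if `compiler` can compile *and link* a trivial C program with `flag`."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        exe = os.path.join(tmp, "probe")
        with open(src, "w") as f:
            f.write("int main(void) { return 0; }\n")
        try:
            # No "-c": linking is the step that needs the LTO linker plugin.
            result = subprocess.run(
                [compiler, flag, src, "-o", exe],
                capture_output=True,
                text=True,
            )
        except FileNotFoundError:
            return False  # the compiler binary itself was not found
        if result.returncode != 0:
            print(result.stderr.strip())  # e.g. the error about the missing plugin
        return result.returncode == 0
```

Based on the report above, `accepts_flag("clang", "-flto=thin")` run inside the CI Docker image would be expected to return `False` without this patch.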

@Kobzol
Contributor

Kobzol commented Dec 24, 2024

@rust-timer build 489bb36

@rust-timer
Collaborator

Finished benchmarking commit (489bb36): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.7% | [-5.6%, -0.3%] | 277 |
| Improvements ✅ (secondary) | -0.9% | [-2.9%, -0.2%] | 238 |
| All ❌✅ (primary) | -0.7% | [-5.6%, -0.3%] | 277 |

Max RSS (memory usage)

Results (primary -2.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.5% | [-3.3%, -1.6%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.5% | [-3.3%, -1.6%] | 3 |

Cycles

Results (primary -4.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -4.1% | [-7.6%, -1.2%] | 11 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -4.1% | [-7.6%, -1.2%] | 11 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 763.12s -> 761.61s (-0.20%)
Artifact size: 330.55 MiB -> 336.31 MiB (1.74%)
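The percentages in the Bootstrap and Artifact size lines above are plain relative changes, (after - before) / before. A minimal sketch that reproduces them from the rounded figures shown; `pct_change` is just an illustrative name:

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures from the summary above: bootstrap time in seconds, artifact size in MiB.
print(f"Bootstrap:     {pct_change(763.12, 761.61):+.2f}%")  # -0.20%
print(f"Artifact size: {pct_change(330.55, 336.31):+.2f}%")  # +1.74%
```

Because the displayed inputs are rounded, recomputing from them can drift from a reported figure: klensy's rustc row further down (2.16 MiB to 5.03 MiB) gives about 132.9% here rather than the reported 133.336%, which was presumably computed from exact byte counts.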

@clubby789
Contributor Author

😮

@Kobzol
Contributor

Kobzol commented Dec 24, 2024

Well, it certainly seems like it works, and LTO is now enabled for something.

@clubby789
Contributor Author

Profiling locally, it seemed like it was mostly jemalloc.

@Kobzol
Contributor

Kobzol commented Dec 25, 2024

That seems consistent with the last perf. result, and it also makes the most sense: jemalloc is probably the most perf.-sensitive C/C++ component that we build (outside of LLVM, of course, but that shouldn't be affected by cc).

@klensy
Contributor

klensy commented Dec 25, 2024

| Artifact | Kind | Before | After | Change | Change (%) |
|---|---|---:|---:|---:|---:|
| rustc | Binary | 2.16 MiB | 5.03 MiB | 2.87 MiB | 133.336% |
| rustdoc | Binary | 15.52 MiB | 18.39 MiB | 2.87 MiB | 18.506% |

That's almost +3 MB for rustc/rustdoc, which is weird. Are they stripped? (No.)

The `.text` section is twice as big, and so are the debug sections.

Weirdly, the new rustc binary doesn't have our `_rjem_`-prefixed jemalloc symbols, but it still has the correct version string `5.3.0-1-ge13ca993e8ccb9ba9847cc330696e02839f328f7`.

Okay, but why were prefixes used before?

`features = ['unprefixed_malloc_on_supported_platforms']`
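One way to check the prefix observation above is to count jemalloc's public entry points in `nm` output: prefixed builds export names like `_rjem_mallocx`, unprefixed ones plain `mallocx`. A minimal sketch, assuming GNU `nm`'s default three-column output for defined symbols; `jemalloc_symbols` is a hypothetical helper, and it only looks at jemalloc's non-standard API so that glibc's `malloc`/`free` cannot be mistaken for jemalloc.

```python
from collections import Counter

# jemalloc's non-standard API; glibc does not define these, unlike malloc/free.
JEMALLOC_API = {
    "mallocx", "rallocx", "xallocx", "sallocx", "dallocx", "sdallocx",
    "nallocx", "mallctl", "mallctlnametomib", "mallctlbymib", "malloc_stats_print",
}


def jemalloc_symbols(nm_output: str, prefix: str = "_rjem_") -> Counter:
    """Count defined jemalloc API symbols in `nm` output, split by prefixed/unprefixed."""
    counts = Counter()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols print as "<address> <type> <name>"; undefined ones have
        # no address column, so they only have two fields and are skipped.
        if len(parts) != 3:
            continue
        name = parts[2]
        if name.startswith(prefix) and name[len(prefix):] in JEMALLOC_API:
            counts["prefixed"] += 1
        elif name in JEMALLOC_API:
            counts["unprefixed"] += 1
    return counts
```

The input would be, for example, `subprocess.run(["nm", path_to_binary], capture_output=True, text=True).stdout`, where `path_to_binary` is a placeholder for the rustc artifact being inspected.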

@Kobzol
Contributor

Kobzol commented Dec 25, 2024

| Sections | Size (before) | Size (after) | Diff (bytes) | Diff (%) |
|---|---:|---:|---:|---:|
| .debug_info | 800.97 KiB | 2.12 MiB | +1399669 | +170.7% |
| .debug_loclists | 384.40 KiB | 1.14 MiB | +799812 | +203.2% |
| .debug_line | 236.25 KiB | 559.95 KiB | +331469 | +137.0% |
| .text | 216.98 KiB | 477.87 KiB | +267157 | +120.2% |
| .debug_rnglists | 62.10 KiB | 187.06 KiB | +127953 | +201.2% |
| .debug_addr | 82.10 KiB | 174.95 KiB | +95080 | +113.1% |
| .debug_str | 85.41 KiB | 74.69 KiB | -10979 | -12.6% |
| .debug_str_offsets | 119.21 KiB | 129.04 KiB | +10060 | +8.2% |
| .debug_abbrev | 49.41 KiB | 55.84 KiB | +6587 | +13.0% |
| .strtab | 35.85 KiB | 30.68 KiB | -5300 | -14.4% |
| .symtab | 31.27 KiB | 28.05 KiB | -3288 | -10.3% |
| .eh_frame | 30.61 KiB | 27.80 KiB | -2868 | -9.2% |
| .relro_padding | 1.28 KiB | 320 B | -992 | -75.6% |
| .eh_frame_hdr | 4.98 KiB | 4.21 KiB | -792 | -15.5% |
| .debug_line_str | 2.43 KiB | 2.73 KiB | +309 | +12.4% |
| .rodata | 10.16 KiB | 9.99 KiB | -176 | -1.7% |
| .rela.dyn | 23.30 KiB | 23.16 KiB | -144 | -0.6% |
| .bss | 2.04 MiB | 2.04 MiB | -79 | -0.0% |
| .data | 568 B | 528 B | -40 | -7.0% |
| .data.rel.ro | 18.59 KiB | 18.57 KiB | -16 | -0.1% |
| .got | 104 B | 88 B | -16 | -15.4% |
| <19 unchanged rows> | 9.82 KiB | 9.82 KiB | 0 | 0.0% |
| Total | 4.19 MiB | 7.07 MiB | +3013406 | +68.5% |

There are a few weird things indeed. It really looks like the `_rjem_` prefix is gone, but we shouldn't have been using it even before. The increase in `.text` seems somewhat expected: in theory, the code can get bigger after LTO, and a bunch of stuff apparently got inlined into the jemalloc allocation functions.

It also seems like debuginfo somehow gets duplicated, but I have no idea why that is; it probably warrants further investigation.

The perf. results are nice enough that I don't think that we need to block this PR on that investigation, though.
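A section comparison like the table above can be produced from GNU binutils' `size -A` (System V format) output for the two binaries. The thread does not say which tool generated the table, so this is only a sketch with illustrative helper names (`parse_size_sysv`, `diff_sections`, `debug_bytes`). It sorts rows by absolute byte difference, which matches the ordering of the table, and `debug_bytes` gives a quick measure of how much of a binary is `.debug_*` data.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int, int, int, float]  # (section, before, after, diff, diff %)


def parse_size_sysv(text: str) -> Dict[str, int]:
    """Parse `size -A <file>` output into {section name: size in bytes}."""
    sections = {}
    for line in text.splitlines():
        parts = line.split()
        # Section rows are "<name> <size> <addr>"; the file-name line, the
        # "section size addr" header and the "Total" line are skipped.
        if len(parts) == 3 and parts[1].isdigit():
            sections[parts[0]] = int(parts[1])
    return sections


def diff_sections(before: Dict[str, int], after: Dict[str, int]) -> List[Row]:
    """Per-section size changes, largest absolute change first."""
    rows = []
    for name in sorted(set(before) | set(after)):
        b, a = before.get(name, 0), after.get(name, 0)
        if b:
            pct = (a - b) / b * 100.0
        else:
            pct = float("inf") if a else 0.0  # section is new (or empty in both)
        rows.append((name, b, a, a - b, pct))
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows


def debug_bytes(sections: Dict[str, int]) -> int:
    """Total size of DWARF debug sections (.debug_info, .debug_line, ...)."""
    return sum(size for name, size in sections.items() if name.startswith(".debug"))
```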

@Kobzol
Contributor

Kobzol commented Dec 25, 2024

It would be nice to find out where these binary size regressions come from, but I don't think that we need to hold this PR on that. So unless you want to investigate further, you can r=me.

@clubby789
Contributor Author

I did a little experimenting with dwz (I needed to roll the code back to 2021 and apply https://inbox.sourceware.org/dwz/CH0PR12MB52659E9758818EBFDFE85EA8962E9@CH0PR12MB5265.namprd12.prod.outlook.com/ to make it work with rustc).
Running dwz on master takes us from 2.2M to 2.0M, while running it on this build takes us from 5.0M to 4.3M.

@lqd
Member

lqd commented Dec 26, 2024

Weren’t we stripping debuginfo from the driver?

@Kobzol
Contributor

Kobzol commented Dec 26, 2024

The debuginfo also seems to be in rustc, and I'm not sure whether that is expected. I tried stripping debuginfo from rustc and the resulting binary was 600 KiB! Maybe we could just strip it, but we'd need to check that ICEs don't regress.

@lqd
Member

lqd commented Dec 26, 2024

Maybe we didn't strip it because there wasn't a lot of debuginfo there back then? I'm pretty sure we discussed this before, but it wasn't that big of a win; maybe these binaries have ballooned in size since then (another use case for artifact size history and graphs in rustc-perf). (Update: it seems not; they started at around 2.9 or so when we started recording artifact sizes, and have shrunk since then...)

4 MB is huge for a single function + allocator override. But I'm also not sure what you were looking at: https://perf.rust-lang.org/compare.html?tab=artifact-size shows rustc at 2.5MiB (which is still huge) and this PR's at 5MiB. Maybe rerun your binary size command on its CI artifacts.

rustc's main is where we override the allocator to jemalloc, so it's not crazy that more of jemalloc's code and data would show up in rustc's binary. The same goes for rustdoc, but there the code is weirdly inside librustc and not the binary launcher. Miri should be set up like rustc and will likely see the same size increase. I don't know about clippy.

It could be a new difference due to the cc PR (which was a 10% size increase for rustc) that this PR surfaces more, e.g. some config expectation mismatch. We should:

  • check whether jemalloc is built with or without debuginfo (implicitly or explicitly) by jemalloc-sys, and whether that matches bootstrap's config. IIRC jemalloc has some things that are enabled by default and need to be explicitly disabled (stats gathering comes to mind, which was only disabled in the last bump).
  • check whether that debuginfo is useful for whatever info we could get out of jemalloc, so that we don't regress by removing it
  • remove it via the build config for the -sys crate, or strip it after the fact during dist

@klensy
Contributor

klensy commented Dec 26, 2024

Stats in jemalloc are weird: the config disables only part of them, while other parts are still present in the binary.

@clubby789
Contributor Author

clubby789 commented Dec 26, 2024

It seems like jemalloc (the C library) is unconditionally built with `-g3` (jemalloc/jemalloc#2333). If I modify the jemalloc configuration to not pass `-g3`, the produced static library drops from ~20 MB to ~1 MB.
I guess we could either make a PR to tikv-jemalloc-sys to make this configurable, or just run `strip -d` on our final artifacts (it looks like pretty much all the debuginfo is indeed from jemalloc).
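The "just strip" option could look roughly like this during dist. This is a sketch only: `strip_debuginfo` and the artifact paths are hypothetical, and it is not necessarily how the follow-up "Strip debuginfo from rustc-main and rustdoc" PR does it. `--strip-debug` is the long form of GNU strip's `-d`; it removes debugging sections but keeps the regular symbol table.

```python
import subprocess
from typing import List, Sequence


def strip_debuginfo(paths: Sequence[str], strip: str = "strip",
                    dry_run: bool = False) -> List[List[str]]:
    """Run `strip --strip-debug` on each path and return the commands used.

    With dry_run=True the commands are only built, not executed.
    """
    commands = [[strip, "--strip-debug", path] for path in paths]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


# Hypothetical dist layout; adjust to wherever the artifacts actually live.
for cmd in strip_debuginfo(["dist/bin/rustc", "dist/bin/rustdoc"], dry_run=True):
    print(" ".join(cmd))
```

As noted earlier in the thread, ICE output should be checked after stripping to make sure it doesn't regress.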

@bors
Contributor

bors commented Dec 27, 2024

💔 Test failed - checks-actions

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Dec 27, 2024
@rust-log-analyzer

This comment has been minimized.

@Kobzol
Contributor

Kobzol commented Dec 27, 2024

Ah, the `build-gcc.sh` script is used in more jobs than the x64 dist one. I thought that we had reverted that, but this refactoring was done earlier. Could you please add the environment variable to all Dockerfiles that use the script?
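Finding every Dockerfile that pulls in the script is a quick scan. A minimal sketch: `dockerfiles_using` is hypothetical, the root directory passed to it is an assumption about the repository layout, and the environment variable itself is not named in this thread, so the sketch only lists the Dockerfiles that would need it.

```python
from pathlib import Path
from typing import List


def dockerfiles_using(root: str, script: str = "build-gcc.sh") -> List[Path]:
    """Return every file named `Dockerfile` under `root` whose text mentions `script`."""
    hits = []
    for path in sorted(Path(root).rglob("Dockerfile")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if script in text:
            hits.append(path)
    return hits
```

For example, `dockerfiles_using("src/ci/docker")` from a repository checkout, where `src/ci/docker` is the assumed location of the CI images.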

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 27, 2024
Strip debuginfo from rustc-main and rustdoc

r? `@Kobzol`
Split from rust-lang#134690
@clubby789
Contributor Author

@bors try

@bors
Contributor

bors commented Dec 27, 2024

⌛ Trying commit 9e57593 with merge 4ce5e49...

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 27, 2024
CI: Add LTO support to clang in dist-x86_64-linux

try-job: dist-i686-linux
@bors
Contributor

bors commented Dec 27, 2024

☀️ Try build successful - checks-actions
Build commit: 4ce5e49 (4ce5e497c590e4e03fea30dbd3612b609ed336a8)

@clubby789
Contributor Author

Looks like i686 is okay now.

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 27, 2024
Strip debuginfo from rustc-main and rustdoc

r? `@Kobzol`
Split from rust-lang#134690
@Kobzol
Contributor

Kobzol commented Dec 27, 2024

@bors r+

@bors
Contributor

bors commented Dec 27, 2024

📌 Commit 9e57593 has been approved by Kobzol

It is now in the queue for this repository.

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 27, 2024
@bors
Contributor

bors commented Dec 27, 2024

⌛ Testing commit 9e57593 with merge ecc1899...

@bors
Contributor

bors commented Dec 28, 2024

☀️ Test successful - checks-actions
Approved by: Kobzol
Pushing ecc1899 to master...

bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 28, 2024
bors merged commit ecc1899 into rust-lang:master Dec 28, 2024
7 checks passed
rustbot added this to the 1.85.0 milestone Dec 28, 2024
@rust-timer
Collaborator

Finished benchmarking commit (ecc1899): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.7% | [-5.5%, -0.3%] | 275 |
| Improvements ✅ (secondary) | -0.9% | [-2.6%, -0.2%] | 235 |
| All ❌✅ (primary) | -0.7% | [-5.5%, -0.3%] | 275 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (primary -2.4%, secondary -2.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.4% | [-4.1%, -0.7%] | 2 |
| Improvements ✅ (secondary) | -2.4% | [-2.4%, -2.4%] | 1 |
| All ❌✅ (primary) | -2.4% | [-4.1%, -0.7%] | 2 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 763.214s -> 761.661s (-0.20%)
Artifact size: 325.15 MiB -> 325.61 MiB (0.14%)

poliorcetics pushed a commit to poliorcetics/rust that referenced this pull request Dec 28, 2024
Strip debuginfo from rustc-main and rustdoc

r? `@Kobzol`
Split from rust-lang#134690
poliorcetics pushed a commit to poliorcetics/rust that referenced this pull request Dec 28, 2024
CI: Add LTO support to clang in dist-x86_64-linux

Mark-Simulacrum added the relnotes-perf Performance improvements that should be mentioned in the release notes. label Jan 20, 2025
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc merged-by-bors This PR was explicitly merged by bors. relnotes-perf Performance improvements that should be mentioned in the release notes. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
10 participants