Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add -flto flags in WSClean AMD #165

Closed
wants to merge 8 commits into from

Conversation

AlexKurek
Copy link
Contributor

@AlexKurek AlexKurek commented Jan 7, 2025

Description

-lfto was missing. It is here in Intel recipe.

@AlexKurek AlexKurek marked this pull request as draft January 7, 2025 11:13
@AlexKurek AlexKurek marked this pull request as ready for review January 7, 2025 11:26
@AlexKurek AlexKurek marked this pull request as draft January 7, 2025 12:17
@AlexKurek AlexKurek changed the title Tune C FLAGS in WSClean AMD Tune C flags in WSClean AMD Jan 7, 2025
@AlexKurek AlexKurek marked this pull request as ready for review January 7, 2025 16:05
@AlexKurek AlexKurek changed the title Tune C flags in WSClean AMD Add -flto flags in WSClean AMD Jan 7, 2025
@tikk3r
Copy link
Owner

tikk3r commented Jan 10, 2025

Those were not supposed to be there in the Intel recipe. I missed those when looking at the PR. Please make sure changes to compiler flags sneak in with unrelated PRs. I think link time optimisation should be relatively harmless, but this is something that I think should be kept individual track of with some benchmarks ideally.

@AlexKurek
Copy link
Contributor Author

In fact, maybe it would be even better to do this benchamrks passing the flag globally, not just to WSClean?

@AlexKurek
Copy link
Contributor Author

AlexKurek commented Feb 2, 2025

Using current master where:

HAS_CUDA=false
HAS_MKL=false
MARCH=cascadelake
MTUNE=cascadelake
NOAVX512=false
DEBUG=false
OPENBLASTARGET=SKYLAKEX

and https://github.com/tikk3r/flocs/blob/fedora-py3/benches/benchmark_image_6asec.sh as a test, but smaller size and niter, and passing LTO flags globally:

export CFLAGS="-w -march=${MARCH} -mtune=${MTUNE} -flto"
export CXXFLAGS="-w -march=${MARCH} -mtune=${MTUNE} -std=${CPPSTD} -flto"

I got:

no LTO
17m14,381s

LTO
17m7,313s

at the expanse of much longer compilation time.
I would not cross out LTO just yet:

But since it does not work for WSClean, maybe we should close it?

@tikk3r
Copy link
Owner

tikk3r commented Feb 2, 2025

Good to see that it doesn't see to have a negative impact at least. I'm not crossing LTO out, but I do want to confirm via benchmarks that it doesn't harm performance and is worth it. With a marginal 0.5% improvement while taking much longer to build. I would put that in the "not worth it" territory.

Closing this, because unfortunately it doesn't seem to have a worthwhile benefit.

@tikk3r tikk3r closed this Feb 2, 2025
@AlexKurek AlexKurek deleted the Tune-C-FLAGS-for-AMD branch February 2, 2025 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants