Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support toolchain with optimization levels included in multilib configurations #79887

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

keith-packard
Copy link
Collaborator

As a way to provide both space and speed optimized versions of every library provided by a toolchain, one option I'm exploring is adding optimization level to the existing multilib infrastructure. With this, specifying -Os during the linking phase will cause the compiler driver to adjust the link path to point at a directory containing libraries also build with -Os.

To make this work in Zephyr, a couple of changes are required:

  1. CMAKE_C_FLAGS_${CMAKE_BUILD_TYPE} must match ${OPTIMIZATION_FLAG}. That's because multi-lib semantics are different from how optimization flags are otherwise used. The multi-lib matcher only looks for the presence of the -Os flag; further -O2 or -O3 flags later in the command line have no impact. As a result, when you set CMAKE_BUILD_TYPE to MinSizeRel and CMAKE_C_FLAGS_MINSIZEREL contains '-Os', that drives the multi-lib code to select the space optimized libraries.
  2. Library paths generated by the compiler now depend on the optimization level. That affects the computation of gcc's LIBC_LIBRARY_DIR and LIBGCC_DIR variables. Neither of these is needed to locate any libraries; the compiler driver will already look for those in the correct multi-lib directory anyways, so I've removed their default computations. However, there is one case where LIBGCC_DIR is used to fine object files: crtbegin.o and crtend.o. I've created a function, compiler_file_name which uses the gcc --print-file-name option to search for those files. Because those file names are needed well after OPTIMIZATION_FLAG is set, we can add that to the gcc command line and have it perform the full multi-lib search for us.

With these changes, I've got -Os and -O2 tests for at least a few cases working using locally-built SDK toolchains that include the necessary changes.

Size:

$ west build -b qemu_cortex_m3 samples/hello_world -- -DCONFIG_SIZE_OPTIMIZATIONS=y
...
Memory region         Used Size  Region Size  %age Used
           FLASH:        8122 B       256 KB      3.10%
             RAM:          4 KB        64 KB      6.25%
        IDT_LIST:          0 GB        32 KB      0.00%

Speed:

$ west build -b qemu_cortex_m3 samples/hello_world -- -DCONFIG_SPEED_OPTIMIZATIONS=y
...
Memory region         Used Size  Region Size  %age Used
           FLASH:        9350 B       256 KB      3.57%
             RAM:          4 KB        64 KB      6.25%
        IDT_LIST:          0 GB        32 KB      0.00%

Note that this extends to all toolchain-provided libraries, including newlib and libstdc++.

In addition to warning about a mismatch between Zephyr's optimization
setting and the CMAKE_BUILD_TYPE, apply Zephyr's optimization flags to the
relevant CMAKE_C_FLAGS_ value so that when those are used, the right value
is applied.

When the warning is disabled, leave things alone.

Signed-off-by: Keith Packard <[email protected]>
Cover the other common C library

Signed-off-by: Keith Packard <[email protected]>
Instead of pre-computing the toolchain library path using options available
early in the configuration process, delay until the related file names are
actually needed, by which time the remaining compiler options have been computed
(most notably ${OPTIMIZATION_FLAG}).

Remove the computatlib of LIBGCC_DIR entirely; that will be done by the
compiler driver at link time without needed any help.

Signed-off-by: Keith Packard <[email protected]>
@keith-packard keith-packard requested review from tejlmand and nordicjm and removed request for jeremybettis, 57300 and peter-mitsis October 15, 2024 19:59
@keith-packard
Copy link
Collaborator Author

This obviously requires a new toolchain to be useful; that work is visible here: zephyrproject-rtos/sdk-ng#821

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No complaints here, though I still maintain that the sudden interest in optimized libc is really a symptom of a poorly-thought-out API in msgq that demands a general copy vs. the ThreadX equivalent which is word-chunked.

I mean, sure, better to have a fast memcpy() than not (and, OK, picolibc looked to be benching a bit behind our own built-in and newlib). But really we shouldn't have been calling libc at all in that case and eating complexity here may not be as good a tradeoff as it sounds (though this doesn't look so bad at all).

@keith-packard
Copy link
Collaborator Author

I mean, sure, better to have a fast memcpy() than not (and, OK, picolibc looked to be benching a bit behind our own built-in and newlib). But really we shouldn't have been calling libc at all in that case and eating complexity here may not be as good a tradeoff as it sounds (though this doesn't look so bad at all).

Picolibc is providing the size-optimized version by default while newlib is build with speed optimizations turned on. We could always hack up the picolibc code to always provide speed-optimized copy code; it already has the fast-strcmp option which was added to make Dhrystone results better (ick).

@andyross
Copy link
Contributor

We could always hack up the picolibc code to always provide speed-optimized copy code

That actually sounds like it might be the best default to me if the rig is there to change the optimizations on per-module levels. Really just mem{cpy,move,set}() would surely be all that Zephyr cares about (we make very sparing use of the string library).

@keith-packard
Copy link
Collaborator Author

We could always hack up the picolibc code to always provide speed-optimized copy code

That actually sounds like it might be the best default to me if the rig is there to change the optimizations on per-module levels. Really just mem{cpy,move,set}() would surely be all that Zephyr cares about (we make very sparing use of the string library).

Getting the toolchain libraries all built with -Os means we also get -Os versions of libgcc and libstdc++. The latter is probably of interest to anyone using C++ in their apps where space is an issue (hard for me to imagine, but?).

And, of course, applications using more of the C library than core Zephyr might have a deeper interest in the space/speed tradeoff here.

I just don't know; using the multilib stuff to enable this feature seems like a modest change, instead of having a wealth of per-API configuration bits? I assume we wouldn't always want speed-optimized memcpy?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants