Support toolchain with optimization levels included in multilib configurations #79887

keith-packard · 2024-10-15T19:58:05Z

As a way to provide both space and speed optimized versions of every library provided by a toolchain, one option I'm exploring is adding optimization level to the existing multilib infrastructure. With this, specifying -Os during the linking phase will cause the compiler driver to adjust the link path to point at a directory containing libraries also build with -Os.

To make this work in Zephyr, a couple of changes are required:

CMAKE_C_FLAGS_${CMAKE_BUILD_TYPE} must match ${OPTIMIZATION_FLAG}. That's because multi-lib semantics are different from how optimization flags are otherwise used. The multi-lib matcher only looks for the presence of the -Os flag; further -O2 or -O3 flags later in the command line have no impact. As a result, when you set CMAKE_BUILD_TYPE to MinSizeRel and CMAKE_C_FLAGS_MINSIZEREL contains '-Os', that drives the multi-lib code to select the space optimized libraries.
Library paths generated by the compiler now depend on the optimization level. That affects the computation of gcc's LIBC_LIBRARY_DIR and LIBGCC_DIR variables. Neither of these is needed to locate any libraries; the compiler driver will already look for those in the correct multi-lib directory anyways, so I've removed their default computations. However, there is one case where LIBGCC_DIR is used to fine object files: crtbegin.o and crtend.o. I've created a function, compiler_file_name which uses the gcc --print-file-name option to search for those files. Because those file names are needed well after OPTIMIZATION_FLAG is set, we can add that to the gcc command line and have it perform the full multi-lib search for us.

With these changes, I've got -Os and -O2 tests for at least a few cases working using locally-built SDK toolchains that include the necessary changes.

Size:

$ west build -b qemu_cortex_m3 samples/hello_world -- -DCONFIG_SIZE_OPTIMIZATIONS=y
...
Memory region         Used Size  Region Size  %age Used
           FLASH:        8122 B       256 KB      3.10%
             RAM:          4 KB        64 KB      6.25%
        IDT_LIST:          0 GB        32 KB      0.00%

Speed:

$ west build -b qemu_cortex_m3 samples/hello_world -- -DCONFIG_SPEED_OPTIMIZATIONS=y
...
Memory region         Used Size  Region Size  %age Used
           FLASH:        9350 B       256 KB      3.57%
             RAM:          4 KB        64 KB      6.25%
        IDT_LIST:          0 GB        32 KB      0.00%

Note that this extends to all toolchain-provided libraries, including newlib and libstdc++.

In addition to warning about a mismatch between Zephyr's optimization setting and the CMAKE_BUILD_TYPE, apply Zephyr's optimization flags to the relevant CMAKE_C_FLAGS_ value so that when those are used, the right value is applied. When the warning is disabled, leave things alone. Signed-off-by: Keith Packard <[email protected]>

Cover the other common C library Signed-off-by: Keith Packard <[email protected]>

Instead of pre-computing the toolchain library path using options available early in the configuration process, delay until the related file names are actually needed, by which time the remaining compiler options have been computed (most notably ${OPTIMIZATION_FLAG}). Remove the computatlib of LIBGCC_DIR entirely; that will be done by the compiler driver at link time without needed any help. Signed-off-by: Keith Packard <[email protected]>

keith-packard · 2024-10-15T20:01:35Z

This obviously requires a new toolchain to be useful; that work is visible here: zephyrproject-rtos/sdk-ng#821

andyross

No complaints here, though I still maintain that the sudden interest in optimized libc is really a symptom of a poorly-thought-out API in msgq that demands a general copy vs. the ThreadX equivalent which is word-chunked.

I mean, sure, better to have a fast memcpy() than not (and, OK, picolibc looked to be benching a bit behind our own built-in and newlib). But really we shouldn't have been calling libc at all in that case and eating complexity here may not be as good a tradeoff as it sounds (though this doesn't look so bad at all).

keith-packard · 2024-10-15T20:05:58Z

I mean, sure, better to have a fast memcpy() than not (and, OK, picolibc looked to be benching a bit behind our own built-in and newlib). But really we shouldn't have been calling libc at all in that case and eating complexity here may not be as good a tradeoff as it sounds (though this doesn't look so bad at all).

Picolibc is providing the size-optimized version by default while newlib is build with speed optimizations turned on. We could always hack up the picolibc code to always provide speed-optimized copy code; it already has the fast-strcmp option which was added to make Dhrystone results better (ick).

andyross · 2024-10-17T19:35:12Z

We could always hack up the picolibc code to always provide speed-optimized copy code

That actually sounds like it might be the best default to me if the rig is there to change the optimizations on per-module levels. Really just mem{cpy,move,set}() would surely be all that Zephyr cares about (we make very sparing use of the string library).

keith-packard · 2024-10-18T20:37:58Z

We could always hack up the picolibc code to always provide speed-optimized copy code

That actually sounds like it might be the best default to me if the rig is there to change the optimizations on per-module levels. Really just mem{cpy,move,set}() would surely be all that Zephyr cares about (we make very sparing use of the string library).

Getting the toolchain libraries all built with -Os means we also get -Os versions of libgcc and libstdc++. The latter is probably of interest to anyone using C++ in their apps where space is an issue (hard for me to imagine, but?).

And, of course, applications using more of the C library than core Zephyr might have a deeper interest in the space/speed tradeoff here.

I just don't know; using the multilib stuff to enable this feature seems like a modest change, instead of having a wealth of per-API configuration bits? I assume we wouldn't always want speed-optimized memcpy?

keith-packard added 3 commits October 15, 2024 09:56

tests/kernel: Add common test using newlib

daee96c

Cover the other common C library Signed-off-by: Keith Packard <[email protected]>

keith-packard marked this pull request as draft October 15, 2024 19:58

zephyrbot added area: Build System area: Kernel area: Toolchains Toolchains labels Oct 15, 2024

zephyrbot requested review from 57300, andyross, ceolin, cfriedt, dcpleung, jeremybettis, nashif, nordicjm, npitre, peter-mitsis, stephanosio, TaiJuWu and tejlmand October 15, 2024 19:58

zephyrbot assigned tejlmand Oct 15, 2024

keith-packard removed request for andyross, npitre, ceolin, cfriedt, tejlmand, stephanosio, TaiJuWu, dcpleung and nordicjm October 15, 2024 19:59

keith-packard requested review from tejlmand and nordicjm and removed request for jeremybettis, 57300 and peter-mitsis October 15, 2024 19:59

andyross approved these changes Oct 15, 2024

View reviewed changes

cfriedt approved these changes Oct 18, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support toolchain with optimization levels included in multilib configurations #79887

Support toolchain with optimization levels included in multilib configurations #79887

keith-packard commented Oct 15, 2024

keith-packard commented Oct 15, 2024

andyross left a comment

keith-packard commented Oct 15, 2024

andyross commented Oct 17, 2024

keith-packard commented Oct 18, 2024

Support toolchain with optimization levels included in multilib configurations #79887

Are you sure you want to change the base?

Support toolchain with optimization levels included in multilib configurations #79887

Conversation

keith-packard commented Oct 15, 2024

keith-packard commented Oct 15, 2024

andyross left a comment

Choose a reason for hiding this comment

keith-packard commented Oct 15, 2024

andyross commented Oct 17, 2024

keith-packard commented Oct 18, 2024