Profile-Guided Optimization (PGO) benchmark report #322
zamazan4ik started this conversation in Ideas
Hi!
As I have done many times before, I decided to test Profile-Guided Optimization (PGO) to see how it affects the library's performance. For reference, results for other projects are available at https://github.com/zamazan4ik/awesome-pgo . Since PGO has helped many libraries, I applied it to cosmic-text to see whether a performance win (or loss) can be achieved. Here are my benchmark results.
Test environment
cosmic-text version: main branch on commit 4fe90bb6126c22f589b46768d7754d65ae300c5e
Benchmark
For benchmark purposes, I use the benchmarks built into the project. For PGO, I use the cargo-pgo tool. The Release results were obtained with the taskset -c 0 cargo bench command. The PGO training phase is done with taskset -c 0 cargo pgo bench, and the PGO optimization phase with taskset -c 0 cargo pgo optimize bench. taskset -c 0 is used to reduce the OS scheduler's influence on the results. All measurements were done on the same machine, with the same background "noise" (as far as I can guarantee).
Results
I got the following results:
According to the results, PGO consistently improves the library's performance.
Further steps
At the very least, the library's users can find this performance report and decide to enable PGO for their applications if they care about cosmic-text performance in their workloads (maybe other Cosmic apps?). A small note somewhere in the documentation (the README file?) may be enough to raise awareness about this work. Another option is to figure out the root cause of the performance differences between the PGO and non-PGO builds of the library and tweak the library sources accordingly. However, this requires time to analyze the differences in the resulting LLVM IR/assembly.
Also, Post-Link Optimization (PLO) can be tested after PGO. It can be done by applying tools like LLVM BOLT to applications that use cosmic-text. However, it's a much less mature optimization technique compared to PGO.
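For users who want to try PGO on their own application that depends on cosmic-text, the workflow with cargo-pgo can be sketched roughly as below. This is only a sketch under assumptions: the binary name my-cosmic-app and its path are hypothetical, and the target triple in the path depends on your machine. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of a PGO workflow for an application that depends on cosmic-text.
# Assumptions (not from the report): APP_BIN is a hypothetical binary path,
# and cargo-pgo places its builds under target/<target-triple>/release/.
# By default the commands are only printed; set DRY_RUN=0 to execute them.

DRY_RUN="${DRY_RUN:-1}"
APP_BIN="${APP_BIN:-./target/x86_64-unknown-linux-gnu/release/my-cosmic-app}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

pgo_steps() {
    # One-time setup: cargo-pgo relies on llvm-profdata from llvm-tools-preview.
    run rustup component add llvm-tools-preview || return 1
    run cargo install cargo-pgo || return 1

    # 1. Build an instrumented binary.
    run cargo pgo build || return 1

    # 2. Run it on a workload that resembles real usage to collect profiles.
    run "$APP_BIN" || return 1

    # 3. Rebuild the binary using the collected profiles.
    run cargo pgo optimize || return 1
}

pgo_steps
```

The quality of the result depends on how representative the workload in step 2 is, so it is worth profiling the text-heavy paths that matter in your application. For the PLO step, cargo-pgo also has BOLT-related subcommands; check its README for the exact invocation and requirements.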
Thank you.