Performance optimization ideas #302
christianparpart started this conversation in Development
I think about it regularly, but have never written down anything.
Maybe this thread can serve as a brainstorming of everything that could possibly be improved. If we come up with something I didn't mention in this top post yet, I think it's best to add it here (with ticket numbers where they exist), so the conclusions stay easy to grasp.
Areas of user-perceived performance improvements
Maybe as a result we can derive some smaller tickets to tackle each of these separately. So let's start:
✅ [VT parser] optimize for plain-text throughput
Some people like to do cat-style throughput performance tests.
The content to optimize for usually consists of lots of printable characters with newline characters at regular intervals.
This case can be optimized by scanning the input for escape sequences and C0 characters using SIMD to greatly speed up processing. What would be the performance gain? (Answer: plain ASCII text throughput on my Ryzen 9 with 3200MHz RAM: 16 GB/s, plain ASCII with LF (linefeed) chars: 1.9 GB/s - both up from ~250 MB/s)
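A minimal sketch of what such a scan could look like, using SSE2 intrinsics (illustrative only, not the actual implementation; it only classifies bytes below 0x20 and ignores DEL and any UTF-8 handling):

```cpp
#include <emmintrin.h> // SSE2
#include <cstddef>
#include <cstdint>

// Returns the index of the first C0 control byte (< 0x20, e.g. ESC or LF),
// or `size` if the chunk is pure printable text.
inline size_t findNextControlByte(uint8_t const* data, size_t size)
{
    __m128i const limit = _mm_set1_epi8(0x1F);
    size_t i = 0;
    for (; i + 16 <= size; i += 16)
    {
        __m128i const chunk = _mm_loadu_si128(reinterpret_cast<__m128i const*>(data + i));
        // Unsigned "byte <= 0x1F" test: min(byte, 0x1F) == byte.
        __m128i const isControl = _mm_cmpeq_epi8(_mm_min_epu8(chunk, limit), chunk);
        if (int const mask = _mm_movemask_epi8(isControl); mask != 0)
            return i + static_cast<size_t>(__builtin_ctz(static_cast<unsigned>(mask))); // GCC/Clang builtin
    }
    for (; i < size; ++i) // scalar tail
        if (data[i] < 0x20)
            return i;
    return size;
}
```

Everything before the returned index can then go through a bulk plain-text path; the control byte itself falls back to the regular state machine.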
[VT parser] UTF-8 input & ranged output
Don't first convert UTF-8 to UTF-32 and then feed the parser, as only text (and C1 codes) can occupy more than one byte.
The parser should then decode UTF-8 only when it actually detects it, keeping the decoding state internal.
Input chunk lengths should therefore ideally be a multiple of 16 bytes (128 bits).
Mind that not just ground-state text is UTF-8 decoded but also the textual payload of sequences like DCS and OSC. But is that good? Sixel payloads are huge and should not be decoded for nothing.
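For illustration, a minimal sketch of an incremental UTF-8 decoder that keeps its state between calls, so the parser can consume raw bytes directly (names are made up, not Contour's API; validation of overlong forms and surrogates is omitted for brevity):

```cpp
#include <cstdint>
#include <optional>

class Utf8Decoder
{
  public:
    // Feeds one byte; returns a code point once a sequence is complete.
    std::optional<char32_t> feed(uint8_t byte)
    {
        if (pending_ == 0)
        {
            if (byte < 0x80)                              // ASCII fast path
                return char32_t(byte);
            if ((byte & 0xE0) == 0xC0) { value_ = byte & 0x1F; pending_ = 1; }
            else if ((byte & 0xF0) == 0xE0) { value_ = byte & 0x0F; pending_ = 2; }
            else if ((byte & 0xF8) == 0xF0) { value_ = byte & 0x07; pending_ = 3; }
            else                                          // invalid lead byte
                return char32_t(0xFFFD);
            return std::nullopt;
        }
        if ((byte & 0xC0) != 0x80)                        // broken continuation (byte is dropped here)
        {
            pending_ = 0;
            return char32_t(0xFFFD);
        }
        value_ = (value_ << 6) | (byte & 0x3F);
        if (--pending_ == 0)
            return char32_t(value_);
        return std::nullopt;
    }

  private:
    uint32_t value_ = 0;
    int pending_ = 0; // number of continuation bytes still expected
};
```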
[VT parser] L1/L2 cache level optimization
The VT parser currently is a table-driven state machine. It would be worth investigating how well a switch/case-based VT parser performs compared to the table/FSM-based one. A dedicated test executable would be needed to produce reliable numbers to reason about which is better.
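For illustration only, a switch/case-based dispatch for the ground state might look roughly like this (the handler names are made up; the point is that the compiler can turn the switch into a jump table or branch tree that lives in the instruction cache rather than in data tables):

```cpp
#include <cstdint>

enum class State { Ground, Escape, CsiEntry /* ... */ };

// Hypothetical per-byte dispatch for the ground state.
inline State handleGroundByte(uint8_t byte)
{
    switch (byte)
    {
        case 0x1B: return State::Escape;                  // ESC starts a sequence
        case 0x0A: /* screen.linefeed(); */ return State::Ground;
        case 0x0D: /* screen.carriageReturn(); */ return State::Ground;
        default:
            if (byte >= 0x20)
                /* screen.writeText(byte) */;
            return State::Ground;
    }
}
```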
Screen Grid's Cell
In order to maximize throughput, try to make Cell a trivial data type (see the sketch below):
- std::u32string can become a custom std::array<char32_t, N>-based string together with a size count.
- Make Cell trivially copyable.
- Optimize scrollUp(n) for the most common (full-margin) case.
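A minimal sketch of what such a trivially copyable Cell could look like (field names, the capacity N = 4, and the attribute encoding are made up for illustration):

```cpp
#include <array>
#include <cstdint>
#include <type_traits>

struct Cell
{
    std::array<char32_t, 4> codepoints {}; // fixed-capacity grapheme cluster storage
    uint8_t codepointCount = 0;            // how many entries are in use
    uint32_t foregroundColor = 0;
    uint32_t backgroundColor = 0;
    uint16_t flags = 0;                    // bold, underline, ... as bit flags
};

static_assert(std::is_trivially_copyable_v<Cell>,
              "Cell must stay trivially copyable so rows can be copied/scrolled cheaply");
```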
Screen Grid
Also, the parser should be able to detect bigger chunks of pure LF-delimited ASCII (not containing any other control codes). With such a sequence of N "pure" lines, they could be copied directly into the grid buffer, as each line's starting offset can be pre-calculated, making it a (potentially threaded) parallelized copy.
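As a rough illustration of the idea (not tied to the actual grid API), a detected pure chunk could be split into line views whose target rows are known up front:

```cpp
#include <string_view>
#include <vector>

// Splits a chunk known to contain only printable ASCII and LF into line views.
std::vector<std::string_view> splitPureLines(std::string_view chunk)
{
    std::vector<std::string_view> lines;
    while (!chunk.empty())
    {
        auto const pos = chunk.find('\n');
        lines.push_back(chunk.substr(0, pos));
        if (pos == std::string_view::npos)
            break;
        chunk.remove_prefix(pos + 1);
    }
    return lines;
}
// Each element of the result could then be copied into its target row in parallel,
// since the destination offsets are known up front.
```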
Embrace std::vector over map/unordered_map
With the one talk @whisperity linked me once, I realized that it's not really worth using map/unordered_map in most cases. Using a vector, and maybe keeping it sorted for O(log n) lookups, is usually the better choice.
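A minimal sketch of the sorted-vector idea as a hand-rolled flat map (names are illustrative; in practice an existing flat_map implementation could be used instead):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

template <typename Key, typename Value>
class FlatMap
{
  public:
    void insert(Key key, Value value)
    {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), key,
                                   [](auto const& e, Key const& k) { return e.first < k; });
        entries_.insert(it, { std::move(key), std::move(value) });
    }

    // O(log n) lookup via binary search over the contiguous, sorted storage.
    Value* find(Key const& key)
    {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), key,
                                   [](auto const& e, Key const& k) { return e.first < k; });
        return (it != entries_.end() && it->first == key) ? &it->second : nullptr;
    }

  private:
    std::vector<std::pair<Key, Value>> entries_; // kept sorted by key
};
```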
[Renderer] OpenGL rendering
While this does not have a direct impact on throughput, it may affect input latency, as the speed of rendering determines how quickly input can be reflected on screen (also: input and rendering share the same (main) thread).
Passive render buffer updates
The code paths are all already there, but disabled, because for some reason the performance wasn't as good as expected. Currently the render buffers are updated on request in the render thread, which implies that the terminal thread needs to be locked while a fresh render-buffer state is fetched. This locking could be avoided by re-enabling passive render-buffer updates: they happen on the terminal-thread side, so accessing the render buffer on the renderer thread no longer blocks the terminal thread. Why this path (when enabled) is currently not more performant should be investigated; visually it manifests as a render lag, which must be avoided.
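A minimal sketch of the underlying idea, assuming a double-buffer-style exchange object (the names and the RenderBuffer contents are made up; the actual render buffer is richer): the terminal thread publishes a freshly built buffer under a very short lock, and the render thread swaps it out without ever holding the terminal lock.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

struct RenderBuffer
{
    std::vector<uint32_t> cells; // flattened, render-ready cell data (placeholder)
};

class RenderBufferExchange
{
  public:
    // Called on the terminal thread after it finished building a fresh buffer.
    void publish(RenderBuffer&& fresh)
    {
        std::lock_guard lock(mutex_);
        front_ = std::move(fresh);
        dirty_ = true;
    }

    // Called on the render thread; returns true if a new frame is available.
    bool fetch(RenderBuffer& out)
    {
        std::lock_guard lock(mutex_);
        if (!dirty_)
            return false;
        std::swap(out, front_);
        dirty_ = false;
        return true;
    }

  private:
    std::mutex mutex_;
    RenderBuffer front_;
    bool dirty_ = false;
};
```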
Input latency and key presses
When a key is pressed, the next rendered frame should already reflect that key press (typically by displaying the pressed character). Whether that is already the case, and if not, how to achieve it, I don't know yet. (TODO) :)
Balance
While all these are nice ideas, features that make sense to the broader user base must not be neglected just to maintain a certain level of performance, nor should code readability suffer.
Software usability and maintainability are much more of a concern than climbing up the who's-the-fastest-terminal ladder.