-
-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] - Test Pioneer running on Raspberry Pi 5 #5700
base: master
Are you sure you want to change the base?
Conversation
Isn't there some reason we had 3.2 and not 3.1? Would there not be negative consequences when downgrading? With 20 FPS is this something we would like to provide to a PI? Can the window resolution be made even smaller (like when playing Doom on a 386, with stamp-sized window) |
I honestly do not recall why we required 3.2, need to check the commit logs. This PR is just to make it build and run on the RPi5 & 4 so that others, Raspberry Pi has a large developer community, can take a look if they're interested. As for the 20 fps, that appears to be caused by terrain and City rendering exclusively. Get into space and I'm getting a solid 60 fps with VSync on. I haven't tried with it off. In my opinion this means there's possibly some optimisation to be done for Terrain and City rendering. The resolution could be reduced, that 20 fps was windowed, and there's other tweaks but I think it will be interesting to do some profiling and optimisation. |
The update to OpenGL 3.1 --> 3.2 was in #4974 which is probably required for the offscreen rendertarget stuff but I'm not 100% sure. Either way I don't think we should revert to 3.1 for all platforms, but just for Raspberry Pi to allow the RPi4 & 5 to run. |
This is excellent news! ...I'll lead with the (perhaps unnecessary) negatives since I'm that kind of person. I don't think we should attempt to officially support the Pi while Pioneer is still using OpenGL. AFAIK, the Vulkan driver for the Pi's GPU hardware has marginally better performance and is significantly more feature-rich than the GL implementation shipping on the Pi. I'd rather not maintain additional GL-version backends, especially given the overall progress towards modernizing the rendering architecture and design of Pioneer. Additional hardware quirks and considerations have the potential to be very detrimental friction given the current scope of work needed to achieve compatibility with a more "sane" API like Vulkan. I'm aware I'm preaching to the choir on this one. 😉 That being said, I do want to eventually officially support the Pi or a platform like it once the rendering code in Pioneer is more sane and there's less overhead/waste contributing to the low performance. Depth buffer issues are most likely related to the GL-on-RPI implementation not supporting the pile of reverse-Z floating-point depth buffer tricks we use. IIRC, part of the move to GL 3.2 was better support for MSAA render targets including the float depth buffer. The performance of pushing 100k+ terrain vertices through a tile-based rasterizer is unsurprising to me. There may be a few options to improve that in the future (e.g. better horizon culling taking the "real feature height" of a GeoPatch into account), though I'm not fully read in on the literature to know if a Z-prepass style renderer would achieve better performance there. Off the top of my head, city rendering has a potential to be CPU-bound rather than GPU bound. There's a huge inefficiency there in preparing the instanced transform data which unfortunately can't be easily resolved at the moment without some invasive changes to the SceneGraph API. It's not a problem on modern desktop platforms, but on a platform with low memory bandwidth it will be more pronounced. The main reason for upgrading to GL 3.2 (more specifically GLSL version 1.50) was the ability to sample from an MSAA texture (e.g. the primary render target) for use in post-processing effects. This will primarily show up in the HDR->LDR tonemap+resolve step of the planned rendering pipeline, but is not currently used at this time. That being said, I'd strongly prefer not to downgrade to GL 3.1 on any platform other than in an experimental build configuration for the Pi. Eventually I'd like to fully bump the GL version we use to 3.3, to have support for separate samplers and some GLSL improvements (explicit attribute location, floatBitsToInt et. al), as that is slightly closer to the programming model of Vulkan. Overall, I'm very excited to see proof that Pioneer can run on the Pi. However, I have some strong cautions about targeting it "too early" and creating a dragging anchor which would prevent our adoption of a more performant rendering architecture. |
@Web-eWorks I totally agree, this isn't to give the Raspberry Pi a build, it's just to make it easier for that development work to begin. A Vulkan renderer is the future for all platforms, either a custom one written for Pioneer, or using BGFX/LLGL or some other wrapper/optimising library. Seeing the legacy OpenGL renderer move to 3.3, or even all the way upto 4.6 which is still almost 7 years old (July 31, 2017!!!), would completely eliminate the Raspberry Pi OpenGL support and that's a natural progression which I'd be fine with. Like I say, this just allows us to run it, try some optimisations, and figure out some issues whilst the Pi supports 3.1 and we've not updated the renderer yet that it causes a problem. Also resolving these build problems with the CMake are necessary regardless of the renderer for compiling on Pi. Having it actually run and RENDER was an actual "Holy shit!" moment when it ran 😆 |
So given everything said above:
What needs to change in this for it to become a PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what needs to change in this for it to become a PR?
Change the current #ifdefs to an opt-in model set via a CMake option()
call of some sort, and provide a CMake build preset setting the appropriate cache variables for the target hardware.
This is mostly concerned with ensuring we build "out of the box" for distros crazy enough to be shipping a desktop/laptop AArch64 build of Pioneer (where GL3.2+ is presumed to be available.)
I forgot to mention it due to lack of sleep in my previous comment, but despite all of the negatives I listed I'm definitely in favor of providing a build config for Raspberry Pi as long as it's understood by the end user to be experimental and potentially subject to breakage. 😄
CMakeLists.txt
Outdated
if(CMAKE_SYSTEM_PROCESSOR MATCHES "arm" OR CMAKE_SYSTEM_PROCESSOR MATCHES "aarch") | ||
# maybe for performance we can add options here? | ||
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mcpu=cortex-a76 -ffast-math") | ||
else() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of wrapping the USE_SSE42
etc. if-defs, prefer adding a new option(PIONEER_TARGET_RASPBERRY_PI)
declaration with the compile flags and adding a preset definition to either CMakePresets.json or a new scripts/CMakeExperimentalPresets.json which sets the appropriate options:
"cacheVariables": {
"USE_SSE42": false,
"USE_AVX2": false,
"PIONEER_TARGET_RASPBERRY_PI": true
}
If added to scripts/CMakeExperimentalPresets.json (preferred), you'll have to manually cp scripts/CMakeExperimentalPresets.json CMakeUserPresets.json
before running e.g. cmake --preset linux-arm-raspberry-pi
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only problem with that is defaulting to USE_SSE42 is only correct for x86/64 and wrong for PPC, Arm, RiscV etc so we should guard it somehow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's a way to default that option to off on non-x86 style platforms I'm all for it. Otherwise, it's assumed the options are managed via a CMake preset or explicit build flag set from a distribution build harness of some sort.
...which now that I think about it blindly omits Windows ARM64 builds, but I don't think I've seen anyone yet who's tried to build Pioneer on that platform.
src/graphics/opengl/RendererGL.cpp
Outdated
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3); | ||
#if (defined(__arm__) || defined(__aarch64__) || defined(__PPC64__) || defined(__riscv)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of testing for platform feature flags here, prefer adding an explicit PIONEER_TARGET_RASPBERRY_PI
define in buildopts.h
and testing for that define. If we're doing legacy backwards-compatibility hacks like this, it should be opt-in at build time, rather than automatic based on CPU architecture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to work this out but I detest CMake, it's the worst possible build system and I can barely hack around in it at the best of times 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably consider Autotools to be worse, but I completely understand where you're coming from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't work out the buildopts.h
route but I could do add_definitions(-DPIONEER_ON_RPI=1)
in CMakeLists.txt
to the same effect.
Currently the committed code still has the platform feature flags set but I tested it on device without them.
If someone has practical advice about a better way to do this I'd appreciate the CMake advice.
I'll change it to #if defined(PIONEER_ON_RPI)
in the next commit
CMakeLists.txt
Outdated
option(USE_ARM "Compile for ARM compatible microarchitectures" ${PIONEER_TARGET_ARM}) | ||
option(PIONEER_TARGET_RASPBERRY_PI "Compile for Raspberry Pi specifically (degrades to OpenGL 3.1)" OFF) | ||
|
||
if (USE_ARM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we now have autodetection for ARM platforms based on the CMAKE_SYSTEM_PROCESSOR
variable - compilation for ARM64 should set CMAKE_SYSTEM_PROCESSOR
to one of aarch64
or ARM64
(likely aarch64
since it's a linux target OS).
This should be done by whatever external tooling or option sets up the correct compiler toolchain for Raspberry Pi ...unless "normal" Raspberry Pi uses standard GCC builds in which case this should be set by the PIONEER_TARGET_RASPBERRY_PI
option before we include(cmake/TargetArchitecture.cmake)
. I'm used to working with the RPi Pico which has a separate GCC toolchain from the system compiler.
Ok so this is stalled because I just cannot fathom CMake. Is anyone willing or able to help with this build system which I find so awful? |
@fluffyfreak I volunteer @pcercuei (if ready and willing) |
I can help as well yeah, though my availability today is limited. Are you compiling for RPi on-device or using a cross-compiler from a Windows host? One is far simpler than the other. If compilation is on-device on the raspberry pi itself, then I can amend the CMakeLists.txt and push that to this branch for you to test with - sadly, I don't own a Raspberry Pi of my own (other than several Picos, but those don't count). |
That would be a huge help, I'm not sure where I'm supposed to define or put things, or how it all links up to produce the desired I mean i've gotten it to "work" but I'm stretching the definition of the word "work" too it's limit 🤣 Since we now have ARM detection, somewhere, it needs to do: Then I can compile the alternative code in C++ I did fix a few other build options for Raspberry Pi, and fixup/change some presets but I think otherwise I'll have to reset this to master HEAD and reimplement the CMake stuff from scratch because I'm not sure what I'm doing with it 🤷♂️ |
Oh and I'm doing it on-device, as I figure that is how most users would be doing it too |
Here's what you'd replace the existing changes to the CMake file with: # Default to enabling PIONEER_TARGET_RASPBERRY_PI if we're compiling on ARM64
option(PIONEER_TARGET_RASPBERRY_PI "blah blah" ${PIONEER_TARGET_ARM64})
if (PIONEER_TARGET_RASPBERRY_PI)
add_compile_options("-mcpu=cortex-a76" "-ffast-math")
endif(PIONEER_TARGET_RASPBERRY_PI) All That's uh... it. @pcercuei could suggest fancier ways of detecting the system is specifically a Raspberry Pi SoC, but defaulting to this mode on ARM64 is "good enough" for now. Note that this does not pass |
I know I mentioned pushing to the branch, but it turned out to be simple enough I don't think I need to - you should be able to copy-paste the above snippet and make the mentioned changes to C++ and it "should" work. |
What are you guys trying to achieve exactly? |
My suggestion is - just don't. Unless you're using a generic distribution on your Pi (e.g. regular Debian), which nobody does, the compiler will already use the correct flags for the system. And even then, adding board-specific stuff (not even arch-specific) to a cross-platform project is pretty bad design IMHO. |
@pcercuei thanks that's good advice. I'll try it without for now. This PR will get messy but then I'll sqush it flat (somehow) later on to remove merges etc |
I'm not sure why there are merges in the PR commit history to begin with, unless there's a need to resolve a conflict. Even then, I always opt for a |
Use OpenGL 3.1 on Pi Add PIONEER_TARGET_RASPBERRY_PI define in CmakeLists
a2e5201
to
eaadad3
Compare
Mostly because I rarely get to dip into Git anymore and have forgotten half of the little that I used to know 🤷♂️ Ok this builds on Raspberry Pi 5 again now. There's rendering issues, framerate with MSAA off is ~20fps at the Mars start point, but with VSync it's 60fps in space. |
I should clarify that performance is very variable and I think our terrain kills it.
|
I wonder if the RPi 5 even supports a floating point depth buffer. Everything else we do (including ARB_clip_control) seems to be working fine, or you'd see entirely inside-out models. Do you have a screenshot of an example rendering issue on the Pi? |
Hmm, there is an error output actually, right after we're setting up some OpenGL state
Which is this block that includes some depth setup stuff
|
It appears that the call to Also I locally added a check for Another |
It's not just terrain either... Look at the building models. Where the primitive order is unfavorable, they're rendering in reverse order. I suspect some poking around in RenderDoc may be needed here - at minimum I suspect reverse-Z isn't working correctly, possibly even issues with floating point precision (i.e. mediump/lowp) selected by the driver. |
Good eye 👀 even playing it I didn't spot that! |
No luck with RenderDoc so far, it can launch Pioneer but fails to capture anything. However I did remember that depth buffer float is done for v3d the RPI GPU apparently, but I can't find any support for glClipControl
Not that it matters since the driver reports it as available to Glew, and even when I force it to use |
Seamless cube map textures are a GL3.2 thing - we need to require the
This is a little weird. The driver shouldn't be reporting support for Neither gpuinfo nor mesa's feature matrix report any support for This is getting a bit complicated. Since we don't have The "fix" for this (and you can replicate the rendering issue on your PC just by commenting-out the Terrain is still very messed up once in orbit, but it's not hugely apparent... at least until you set terrain detail to Low, at which point the seams are quite visible. Jupiter's rings are still borked, since at that distance both they and Jupiter map to a single depth value. Given the Powercore VII is primarily an OpenGL ES chip meant for mobile devices, this is probably the best we're getting unless you know someone on the Mesa team who could look at implementing |
Yeah that "works" given the precision limitations. There is one other option which you may find distateful given that you removed it in the past, but I could reimplement the logz hack for these Arm / OpenGL ES boards. I did also look into the option of implementing glClipControl for Mesa V3D, after digging into the code around this commit I found that the defines seem to implement NONE, MIN_ONE_TO_ONE or ZERO_TO_ONE So implementing it might be possible 👍 though I've never submitted/committed anything to Mesa before |
Also with a working Z-depth test the performance was much improved, from ~20fps to ~30fps in the same save game file I used. That was at medium detail for planet and city as well. |
In fact in that v3dx_emit.c I think the parameter to decide would be
So it needs the glClipControl to set that, and then the check would have to change to something like:
Which would have to be better written to get it submitted but you can see the logic... Ok I'm not going to have much more time tonight to look into this but it looks like most of the elements are there to support glClipControl. |
The problem with reimplementing logz is...
You lose any kind of early-Z depth test, including the implicit one done when binning triangles in hardware. While part of the performance gain you're seeing here is likely coming from the tile-out bandwidth of writing a pixel back to main memory, doing depth-write in a fragment shader could in the worst case potentially halve (or worse) the framerate as the hardware renderer is no longer able to guarantee that fully occluded fragments (according to the vertex data) will still be occluded after the fragment shader runs. Whether the VideoCore VII has a hidden-surface-removal pipeline or not, you'll likely see a loss of performance from reimplementing log-Z. Given performance on this hardware is already tenuous, that's probably not a good idea.
If the hardware supports Vulkan / DirectX depth semantics, it should be able to implement at least the zero-to-one clip-space part of glClipControl - certainly enough for our needs. This would ideally be the way to go. |
#5970 may produce a slight FPS increase on the Raspberry Pi, as it allows a few extra milliseconds of work to be overlapped with the queued GPU commands from the previous frame. |
I got a Raspberry Pi 5 just before Xmas and whilst doing something completely unrelated to Pioneer discovered that it now supports OpenGL 3.1!
In theory this might also work on RPi 4.
Technically Pioneer requires 3.2 however a quick downgrade to 3.1 and it runs with caveats.
Creating this as a DRAFT for visibility and to jog some discussion
Commit description:
Downgrade to OpenGL 3.1 and in CMakeLists.txt check for arm CPU
test on Raspberry Pi 4
investigate graphical issues
test gameplay further (Gas Giants, saving & loading, sgm models, etc)
is requesting 3.1 safe for other platforms or should this be Pi only?