[webgl] Removal of GPGPU example #478
Comments
Fret not, @dleeftink! The feature is still there (and actually expanded in scope). Here are a few examples showing the new approach (fundamentally the same, just reading the results back from FBOs/textures is currently out of scope and must be handled separately; see below). For an example of how to read back a texture, have a look here: umbrella/examples/webgl-texture-paint/src/index.ts Lines 160 to 189 in 86e43da
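To illustrate the kind of manual readback meant here, a minimal sketch using plain WebGL2 calls (not the thi.ng/webgl API); all names are placeholders and reading RGBA/FLOAT assumes a float-renderable attachment plus the EXT_color_buffer_float extension:

```ts
// Sketch only: read back an RGBA float texture attached to a framebuffer.
// `gl`, `fbo`, `width`, `height` are assumed to exist in the calling code.
const readFBO = (
    gl: WebGL2RenderingContext,
    fbo: WebGLFramebuffer,
    width: number,
    height: number
) => {
    const result = new Float32Array(width * height * 4);
    gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
    // assumes the color attachment is float-renderable (EXT_color_buffer_float)
    gl.readPixels(0, 0, width, height, gl.RGBA, gl.FLOAT, result);
    gl.bindFramebuffer(gl.FRAMEBUFFER, null);
    return result;
};
```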
Hth! :) |
Thank you for pointing me to these, and the webgl layer in general! Regarding IO via readPixels, I wonder whether using DataView + transform feedback would prove more performant? The use case I am envisioning is parallel reduction on the GPU, and of the many WebGL packages I've tested, few (if any) provide parallel reduction out of the box. It would be cool to try to implement this using the @thi.ng/webgl package. For instance, here's an older example using regl: https://github.com/regl-project/regl/blob/gh-pages/example/reduction.js |
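As a rough illustration of the kind of parallel reduction meant here: a common GPU pattern is to combine 2x2 texel blocks per pass over successively halved render targets until a 1x1 result remains. A hedged sketch of one such max-reduction pass (GLSL embedded in TypeScript, identifiers purely illustrative, not taken from any of the libraries mentioned):

```ts
// One reduction pass: each output fragment reads a 2x2 block of the larger
// input texture and keeps the component-wise max. Repeating this over
// half-sized FBOs reduces an NxN texture to a single texel.
const REDUCE_MAX_FS = `#version 300 es
precision highp float;
uniform sampler2D u_input;
out vec4 o_color;
void main() {
    ivec2 dst = ivec2(gl_FragCoord.xy);
    ivec2 src = dst * 2;
    vec4 a = texelFetch(u_input, src, 0);
    vec4 b = texelFetch(u_input, src + ivec2(1, 0), 0);
    vec4 c = texelFetch(u_input, src + ivec2(0, 1), 0);
    vec4 d = texelFetch(u_input, src + ivec2(1, 1), 0);
    o_color = max(max(a, b), max(c, d));
}`;
```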
hi @dleeftink - this has tickled my interest and I've just uploaded a new example showing a version of this kind of reduction (using thi.ng/webgl & thi.ng/shader-ast): Demo: Source code: Readme w/ benchmark results: PS: Can you please explain your "DataView w/ transform feedback" comment/approach? Not sure how this fits into the picture here... 😉 |
Thank you for taking the time, I will have a bit of a play around to see how regl and @thi.ng/webgl compare in terms of setting up a parallel reduction pipeline. The performance profile of the example you provided seems promising enough!

Re: the DataView/transform feedback approach: see this cell on Observable, which is part of a notebook where I compare various GPGPU libraries. Beware that you may have to hit the 'profile' button after all libraries have run once to get a proper comparison (I will have to clean up the examples some more, isolate the GL contexts and add proper cooldowns and disposal between runs). In any case, the approach in the linked cell uses the forked WebGP library to write a sizeable array to an array of N by N textures, which are processed using a transform feedback mechanism. Instead of readPixels, you can construct a TypedArray directly from the result buffer, after which the max is found on the CPU and pushed to an output array. The reduction is thus applied on the CPU rather than the GPU here, but afaik the transform feedback mechanism allows you to run multiple passes on the bound buffer before passing the data back to the CPU to reduce to a final result. Both the 'subgpu' and 'supgpu' cells show that quite a fast turnaround can be achieved going CPU > GPU > CPU with this approach.

My thinking is to do as much work as possible on the GPU (e.g. map/reduce) before handing a relatively small array over to the CPU for final processing (e.g. deriving an array of centroids from a large N x N matrix). The relevant lines in the WebGP source: |
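For concreteness, the "TypedArray directly from the result buffer" step roughly corresponds to something like the following sketch, using only standard WebGL2 calls (`gl`, `tfBuffer` and the element count are assumed placeholders, not the WebGP API):

```ts
// Sketch: after a transform feedback pass has written into `tfBuffer`,
// copy the results straight into a typed array without going through
// readPixels. Assumes the pass has already ended.
const readFeedback = (
    gl: WebGL2RenderingContext,
    tfBuffer: WebGLBuffer,
    numFloats: number
) => {
    const out = new Float32Array(numFloats);
    gl.bindBuffer(gl.TRANSFORM_FEEDBACK_BUFFER, tfBuffer);
    gl.getBufferSubData(gl.TRANSFORM_FEEDBACK_BUFFER, 0, out);
    gl.bindBuffer(gl.TRANSFORM_FEEDBACK_BUFFER, null);
    return out;
};
```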
Thanks for this, I only just now realized that you were talking about the built-in WebGL2 transform feedback feature, whereas I previously assumed you meant some generic/custom mechanism 🤦 Alas, I have not yet had a need to use this feature and so also don't have any direct experience with it... So I'm still trying to wrap my head around how this would work here & I know I'll have to do more reading about this (and read some of the code you linked to)... From the little understanding I have of it so far... FWIW, for texture sizes up to 64x64 (aka up to 4096 result values), the process of binding and reading an FBO takes 0.04-0.09 ms on my M1 (avg. of 1000 iterations)... I'd say that's absolutely acceptable in relation to the main computation time... |
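(For reference, a measurement like the one quoted could be taken with a trivial harness along these lines; `readback` is a placeholder for whatever bind-FBO-and-readPixels routine is being timed, purely illustrative:)

```ts
// Average the cost of a readback routine over many iterations (in ms).
const benchReadback = (readback: () => void, iter = 1000) => {
    const t0 = performance.now();
    for (let i = 0; i < iter; i++) readback();
    return (performance.now() - t0) / iter;
};
```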
Yes, you summarised it better than I did. If major restructuring is required, then please ignore! I think the example is more than serviceable to demonstrate an important …

Re: transform feedback, the following gist provides a concise example: https://gist.github.com/CodyJasonBennett/34c36b91719171c45ec50e850dc38a34 Although I haven't been able to get instancing to work fully yet for the above gist, this would theoretically allow you to achieve even greater speed-ups, as described here:

Beyond that, I do see value in providing a DataView onto a vertex buffer, as there is less copying involved. For 4096 values the difference might be negligible compared to readPixels(), but I am investigating how to quickly process 2**24/31 array elements in a first pass, with an optional reduction in a second pass (TextEncoder() on the CPU -> byte-pair encoding on the GPU -> byte-pair frequency tables on the GPU -> sorting the frequency tables on the CPU). shader-ast seems of great help for implementing this functionality. |
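To make the transform feedback mechanics referenced above a bit more concrete, here is a heavily condensed sketch of a single capture pass using the standard WebGL2 API (not taken from the linked gist); it assumes `program` was linked after a call to gl.transformFeedbackVaryings, that its vertex shader only needs gl_VertexID, and that `outBuffer` is pre-allocated for `count` captured vertices:

```ts
// Run one transform feedback pass: rasterization is discarded, so the only
// output is the data captured into `outBuffer`.
const runFeedbackPass = (
    gl: WebGL2RenderingContext,
    program: WebGLProgram,
    outBuffer: WebGLBuffer,
    count: number
) => {
    const tf = gl.createTransformFeedback();
    gl.useProgram(program);
    gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, tf);
    gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, outBuffer);
    gl.enable(gl.RASTERIZER_DISCARD);
    gl.beginTransformFeedback(gl.POINTS);
    gl.drawArrays(gl.POINTS, 0, count);
    gl.endTransformFeedback();
    gl.disable(gl.RASTERIZER_DISCARD);
    gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);
};
```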
Again, thank you! I will try to take a look at these links over the weekend. Just some side notes here:
|
Just wondering why this code example (or the GPGPU class in general) has been removed from the current build: https://github.com/thi-ng/umbrella/blob/161b4f8afaef0df742a8e2c7776993b828662589/examples/webgl-gpgpu-basics/src/index.ts

This seems a very useful abstraction (especially the jobrunner), but I can understand if it has since been superseded by a newer version. If that is the case, could you point me to a recent example that demonstrates GPGPU capabilities?