-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hybrid between old and LUT implicit triangulation #971
base: dev
Are you sure you want to change the base?
hybrid between old and LUT implicit triangulation #971
Conversation
Hi Julien, I ran clang-format so I don't know why this is complaining. |
you may want to use a version of clang-format in line with the CI's. |
I've just pushed the diff with clang-format 16. let's see :) |
it looks like the CI is using clang-format 14.0-55. |
I've just tried with clang-format 14. |
If this runs through I will squash the commits. |
e41dd2c
to
932a24d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi jonas,
thanks a lot!
that looks awesome.
I haven't run it yet (I'll double check the perfs over the weekend).
Could you address the two tiny remarks I mentioned here?
Also, could you please port this query over to the PeriodicImplicitTriangulation as well?
(in principles, there's fewer cases).
Thanks a lot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here's another batch of small-scale remarks.
thanks!
b66b9ce
to
bd4d638
Compare
perf notes (to be formated later)ScalarFieldSmoother(ctBones) vger (laptop)ttk:dev
LUT
hal9000 (desktop)ttk:dev
LUT
|
Hi Jonas, I've been trying to benchmark a little bit the LUT implementation by simply applying the scalar field smoother (which only uses For some reason, I can't reproduce speedups on my systems (see the above quick benchmark). Could you try to run this little benchmark on your end and report your numbers?
|
Hi Julien, [HelloWorld] Computing Averages .............................. [6.587s|32T|100%] LUT [HelloWorld] Computing Averages .............................. [2.228s|32T|100%] I suspect that your compiler decided not to inline the methods and as far as I know there is no way to force this. What are your thoughts @petersteneteg? Another thought: Is it possible that your coed uses the implicit triangulation with preconditions? |
hi jonas, |
perf notes (to be formated later)HelloWorldCmd -i ctBones512.vti (512 is sufficient to trigger the full implicit triangulation) hal9000 (desktop)ttk:dev
ttk:dev with inline fully implicit getVertexNeighbors (https://github.com/julien-tierny/ttk/tree/implicitInlining)
lut
vger (latop)ttk:dev
lut
perf smootherttkScalarFieldSmootherCmd -i ctBones512.vti -I hal9000 (desktop)ttk:dev
ttk:dev with inline fully implicit getVertexNeighbors (https://github.com/julien-tierny/ttk/tree/implicitInlining)
lut
perf persistence diagramttkPersistenceDiagramCmd -i ctBones512.vti hal9000 (desktop)ttk:dev
ttk:dev with inline fully implicit getVertexNeighbors (https://github.com/julien-tierny/ttk/tree/implicitInlining)
lut
|
alright, so I believe I understand what's going on here. in short:
so then, I believe this PR should be modified. the LUT implementation should replace the full implicit one, not the preconditioned one (I have the impression you replaced both) |
so in short, I believe you should leave the old implementation (i.e. for that you can follow the example of the function we can discuss this on Friday. |
Hi everyone, Best |
hi jonas, |
I don't understand why you don't see any speedup of using the hybrid over the inlined dev version but I think we already reached the point where investing time into the hybrid defeats the purpose. For now it seems much easier to just inline all methods of the fully implicit triangulation and call it a day. Afterwards we can see if a purely LUT-based approach brings additional speedup. |
alright everyone. what bugs me is that, if we don't in-line all functions, but only the getVertexNeighbor and getVertexNeighborNumber (only these two functions, julien-tierny@53e2e58), the gain is actually better (15%). do you guys have any explanations of what is going on? thanks a lot for your feedback. |
So to a first approximation inlining more will make it faster, since it will avoid the function call overhead, and often enables a bunch of other optimization. The problems arises when a lot of code get inline, then the code size will increase since the code get basically copy pastes many times. And when the code size increases that puts more pressure on the instruction cache in the cpu, basically the processor will be starved for the lack of instruction rather than the lack of data. So ideally you need to balance the amount of inlining that you are doing, and often the compiler will have a lot of built in heuristics to get a good balance between inlining and code size. But It will generally not always find the optimal balance. One can use tools like cachegrind. to look at the amount of instruction cache misses vs data cache misses to potentially optimize the performance. |
hi @petersteneteg I think I'll go ahead and PR the commit which optimizes the performance then (with only the |
I think just inlining the neighbor methods is sufficient for now. Those are the most relevant queries anyway. Maybe in the future the LUT-based procedures are also much smaller and therefore fit into the instruction cache. We will have to see. |
Hi Julien,
this PR is the unholy union between the old and the LUT-based approach for implicit triangulations. For now this seems to result in a 4x speedup. We will work in the background on getting the 8x speedup of the pure LUT-based approach in TTK, but as you said grabbing this speedup essentially for free right now makes sense. I tested this, but since touching the triangulation is tricky work I recommend that we test this as much as possible.
Best
Jonas