# Performance regression in collapsing blocked subspace #3282

---

I think we should turn off data reordering in serial.

```python
import time

from mpi4py import MPI
from basix.ufl import element, mixed_element
from dolfinx import fem, la, log, mesh

# log.set_log_level(log.LogLevel.INFO)

N = 5
P = 4

# Create a unit cube mesh of hexahedra and a (vector, scalar) mixed space on it
start = time.perf_counter()
domain = mesh.create_unit_cube(MPI.COMM_WORLD, N, N, N, mesh.CellType.hexahedron)
Ve = element("Lagrange", domain.basix_cell(), P, shape=(domain.geometry.dim,))
Qe = element("Lagrange", domain.basix_cell(), P)
W_el = mixed_element([Ve, Qe])
W = fem.functionspace(domain, W_el)
print(f"Meshing and creating mixed space: {time.perf_counter() - start}")

# Collapse the blocked (vector) subspace
start2 = time.perf_counter()
V, WV_map = W.sub(0).collapse()
print(f"Collapsing V: {time.perf_counter() - start2}")

# Collapse the scalar subspace
start3 = time.perf_counter()
Q, WQ_map = W.sub(1).collapse()
print(f"Collapsing Q: {time.perf_counter() - start3}")
```

This returns the following in serial:

```
Meshing and creating mixed space: 0.013051556999926106
Collapsing V: 7.033872703999805
Collapsing Q: 0.0023294779998650483
```

and the following with 2 procs, i.e. a 10x speedup:

```
Meshing and creating mixed space: 0.020997950000037235
Collapsing V: 0.70820857300032
Collapsing Q: 0.003953482000270014
Meshing and creating mixed space: 0.020988049000152387
Collapsing V: 0.7082019069998751
Collapsing Q: 0.003954929999963497
```

With 3 procs:

```
Meshing and creating mixed space: 0.02864133799994306
Collapsing V: 0.34778579499970874
Collapsing Q: 0.003480805999970471
Meshing and creating mixed space: 0.027170727000338957
Collapsing V: 0.34781397200003994
Collapsing Q: 0.0034789139999702456
Meshing and creating mixed space: 0.03079914899990399
Collapsing V: 0.3478091659999336
Collapsing Q: 0.0034698189997470763
```

I still think it is remarkable that creating the collapsed space is more than 10x more expensive than creating the mixed space itself.

---

We shouldn't turn off re-ordering in serial. Can someone bisect to find out when the regression was introduced? @chrisrichardson, could it be related to the recent changes for mixed topology?
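
A hedged sketch of how that bisection could be automated: the script below (the name `bisect_check.py` and the 2 s threshold are my assumptions, not from the thread) times the collapse and exits non-zero when it is slow, so `git bisect run python3 bisect_check.py` can drive the search, with DOLFINx rebuilt at each step.

```python
# bisect_check.py: hypothetical helper for `git bisect run`.
# The 2 s threshold is an assumption; tune it to the machine at hand.
import sys
import time

from mpi4py import MPI
from basix.ufl import element, mixed_element
from dolfinx import fem, mesh

domain = mesh.create_unit_cube(MPI.COMM_WORLD, 5, 5, 5, mesh.CellType.hexahedron)
Ve = element("Lagrange", domain.basix_cell(), 4, shape=(domain.geometry.dim,))
Qe = element("Lagrange", domain.basix_cell(), 4)
W = fem.functionspace(domain, mixed_element([Ve, Qe]))

start = time.perf_counter()
W.sub(0).collapse()
elapsed = time.perf_counter() - start
print(f"Collapsing V: {elapsed}")

# A non-zero exit code marks the current commit as "bad" for git bisect.
sys.exit(1 if elapsed > 2.0 else 0)
```

Note the caveat that the Python API changed across older releases, so the script would need adjusting for commits far back in the bisect range.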

---

It is prior to the 0.7 release.

---

I don't think the mixed-topology changes would affect this. The timings point at some kind of internal issue with the reordering, which seems strange: GPS is only called twice, but builds the level structure thousands of times.
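
For readers unfamiliar with the term: a level structure is just the grouping of graph nodes by BFS distance from a root, which GPS rebuilds repeatedly while searching for pseudo-peripheral nodes. A minimal plain-Python illustration (not the DOLFINx implementation):

```python
# Minimal illustration of a BFS "level structure": nodes grouped by their
# distance from a root node. GPS builds these repeatedly, which is where
# the cost shows up if the graph is dense.
def level_structure(adjacency: dict[int, list[int]], root: int) -> list[list[int]]:
    levels = [[root]]
    seen = {root}
    while True:
        next_level = [n for v in levels[-1] for n in adjacency[v] if n not in seen]
        if not next_level:
            return levels
        seen.update(next_level)
        levels.append(sorted(set(next_level)))


# Example: the path graph 0-1-2-3 rooted at 0 gives [[0], [1], [2], [3]].
print(level_structure({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, 0))
```

On a dense graph each level is wide, so every rebuild touches many edges, which would fit the timings above.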

---

It is present in 0.6 as well:

```
Meshing and creating mixed space: 0.5712962300000015
Collapsing V: 0.7795458059999874
Collapsing Q: 0.0005138619999911498
```

With P=4:

```
Meshing and creating mixed space: 0.6977521319999767
Collapsing V: 7.012910768000012
Collapsing Q: 0.0005572910000068987
```

---

I added a line in [...], so it is quite a dense graph.

---

@jorgensd Re: the speedup in parallel: you should do a weak scaling test (e.g. double the mesh size with …).
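
A minimal sketch of such a weak-scaling run, assuming the goal is to keep the work per rank roughly constant (the cube-root scaling of the mesh resolution is my assumption):

```python
import time

from mpi4py import MPI
from basix.ufl import element, mixed_element
from dolfinx import fem, mesh

comm = MPI.COMM_WORLD
# Keep cells per rank roughly constant: total cells ~ comm.size * 5**3.
N = round(5 * comm.size ** (1.0 / 3.0))

domain = mesh.create_unit_cube(comm, N, N, N, mesh.CellType.hexahedron)
Ve = element("Lagrange", domain.basix_cell(), 4, shape=(domain.geometry.dim,))
Qe = element("Lagrange", domain.basix_cell(), 4)
W = fem.functionspace(domain, mixed_element([Ve, Qe]))

start = time.perf_counter()
W.sub(0).collapse()
if comm.rank == 0:
    print(f"n_ranks={comm.size}, N={N}, collapse V: {time.perf_counter() - start:.3f} s")
```

Roughly flat timings across rank counts would indicate the parallel numbers above reflect smaller local problems rather than a genuinely cheaper algorithm.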

---

We probably build a sparsity graph for the dofs, which isn't a good use of compute effort for high-order elements. We should probably create a graph of vertices, facets, etc., and then lay the dofs for each entity 'on top'.
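
To make that concrete, a rough sketch (hypothetical data layout, not the DOLFINx internals) of reordering a small entity graph once and then laying a fixed block of dofs on each entity:

```python
# Sketch: reorder a coarse entity graph, then place `dofs_per_entity`
# consecutive dofs "on top" of each entity. The full dof graph is never
# built, so the reordering cost is independent of the block size.
def expand_entity_order(entity_order: list[int], dofs_per_entity: int) -> list[int]:
    dof_order = []
    for entity in entity_order:
        first = entity * dofs_per_entity
        dof_order.extend(range(first, first + dofs_per_entity))
    return dof_order


# Hypothetical reordering of a 4-entity graph, with 3 dofs per entity.
print(expand_entity_order([2, 0, 3, 1], dofs_per_entity=3))
# -> [6, 7, 8, 0, 1, 2, 9, 10, 11, 3, 4, 5]
```

The point is that the graph handed to the reordering stays small and independent of the polynomial degree; only the final expansion grows with it.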

---

As in not improving further?

---

I think one of the main factors could also be the block size: as you can see, an unblocked (scalar) element is orders of magnitude faster than its vector counterpart.
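
A sketch that isolates the block-size effect, reusing only calls that already appear in the MWE: collapse the first subspace once with a scalar element and once with a vector element of the same degree.

```python
import time

from mpi4py import MPI
from basix.ufl import element, mixed_element
from dolfinx import fem, mesh

domain = mesh.create_unit_cube(MPI.COMM_WORLD, 5, 5, 5, mesh.CellType.hexahedron)
# shape=None gives a scalar element, shape=(gdim,) a blocked vector element.
for shape in [None, (domain.geometry.dim,)]:
    Ve = element("Lagrange", domain.basix_cell(), 4, shape=shape)
    Qe = element("Lagrange", domain.basix_cell(), 4)
    W = fem.functionspace(domain, mixed_element([Ve, Qe]))
    start = time.perf_counter()
    W.sub(0).collapse()
    label = "scalar" if shape is None else "blocked"
    print(f"Collapse {label} subspace: {time.perf_counter() - start:.3f} s")
```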

---

### Summarize the issue

Collapsing a blocked subspace is orders of magnitude slower than collapsing the scalar version, and the cost increases dramatically with the polynomial order.
First reported at: https://fenicsproject.discourse.group/t/collapsing-mixed-space-takes-unreasonably-long/15071
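
A small sweep to quantify that growth, again reusing only the MWE's calls (a sketch added for illustration, not part of the original report):

```python
import time

from mpi4py import MPI
from basix.ufl import element, mixed_element
from dolfinx import fem, mesh

domain = mesh.create_unit_cube(MPI.COMM_WORLD, 5, 5, 5, mesh.CellType.hexahedron)
# Time the collapse of the blocked subspace for increasing polynomial order.
for P in range(1, 5):
    Ve = element("Lagrange", domain.basix_cell(), P, shape=(domain.geometry.dim,))
    Qe = element("Lagrange", domain.basix_cell(), P)
    W = fem.functionspace(domain, mixed_element([Ve, Qe]))
    start = time.perf_counter()
    W.sub(0).collapse()
    print(f"P={P}: collapse V took {time.perf_counter() - start:.3f} s")
```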

### How to reproduce the bug

Run the MWE quoted in the first comment above.

### Minimal Example (Python)

See the code listing in the first comment above.

### Output (Python)

See the serial and parallel timings in the first comment above.

### Version

main branch

### DOLFINx git commit

9f33dab

### Installation

Docker, using ghcr.io/fenics/dolfinx/dolfinx:nightly

### Additional information

No response