Petsc labeling issue in parallel #240

Open
gthyagi opened this issue Sep 3, 2024 · 11 comments
gthyagi (Contributor) commented Sep 3, 2024

Hi @lmoresi

Here is a simple example that demonstrates the PETSc labelling issue in parallel.

import underworld3 as uw
from underworld3.systems import Stokes
import sympy

mesh = uw.meshing.StructuredQuadBox(elementRes=(2, 2), minCoords=(0., 0.),
                                    maxCoords=(1., 1.))

mesh.view()

v_soln = uw.discretisation.MeshVariable(r"u", mesh, 2, degree=2)
p_soln = uw.discretisation.MeshVariable(r"p", mesh, 1, degree=1, continuous=True)

# Create Stokes object
stokes = Stokes(mesh, velocityField=v_soln, pressureField=p_soln, solver_name="stokes")
stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
stokes.constitutive_model.Parameters.viscosity = 1.0
stokes.saddle_preconditioner = 1.0

# bc's
stokes.add_essential_bc(sympy.Matrix([1.0, 0.0]), "Top")
stokes.add_essential_bc(sympy.Matrix([sympy.oo, 0.0]), "Bottom")
stokes.add_essential_bc(sympy.Matrix([0.0,sympy.oo]), "Left")
stokes.add_essential_bc(sympy.Matrix([0.0,sympy.oo]), "Right")

stokes.bodyforce = sympy.Matrix([0, 0])

# stokes.petsc_options["pc_type"] = "lu"
stokes.tolerance = 1e-6
stokes.solve(verbose=False)

if uw.mpi.rank==0:
    print(p_soln.stats())

This works in serial but fails in parallel when run with:

mpirun -np 2 python Ex_Sbox_parallel.py

bknight1 (Member) commented Sep 3, 2024

The output in serial from mesh.view() is:

| Boundary Name            | ID    | Min Size | Max Size |
| ------------------------------------------------------ |
| Bottom                   | 11    | 3        | 3        |
| Top                      | 12    | 3        | 3        |
| Right                    | 13    | 3        | 3        |
| Left                     | 14    | 3        | 3        |
| All_Boundaries           | 1001  | 8        | 8        |
| ------------------------------------------------------ |

and from mesh.dm.view():

Labels:
  depth: 3 strata with value/size (0 (9), 1 (12), 2 (4))
  All_Boundaries: 1 strata with value/size (1001 (8))
  Bottom: 1 strata with value/size (11 (3))
  Elements: 1 strata with value/size (99999 (5))
  Left: 1 strata with value/size (14 (3))
  Right: 1 strata with value/size (13 (3))
  Top: 1 strata with value/size (12 (3))
  celltype: 3 strata with value/size (0 (9), 1 (12), 4 (4))
  UW_Boundaries: 5 strata with value/size (11 (3), 12 (3), 13 (3), 14 (3), 1001 (8))

and parallel is:


| Boundary Name            | ID    | Min Size | Max Size |
| ------------------------------------------------------ |
| Bottom                   | 11    | 2        | 2        |
| Top                      | 12    | 2        | 2        |
| Right                    | 13    | 0        | 3        |
| Left                     | 14    | 0        | 3        |
| All_Boundaries           | 1001  | 4        | 4        |
| ------------------------------------------------------ |
Labels:
  depth: 3 strata with value/size (0 (6), 1 (7), 2 (2))
  All_Boundaries: 1 strata with value/size (1001 (4))
  Bottom: 1 strata with value/size (11 (2))
  Elements: 1 strata with value/size (99999 (3))
  Left: 0 strata with value/size ()
  Right: 1 strata with value/size (13 (3))
  Top: 1 strata with value/size (12 (2))
  celltype: 3 strata with value/size (0 (6), 1 (7), 4 (2))
  UW_Boundaries: 4 strata with value/size (11 (2), 12 (2), 13 (3), 1001 (4))

It seems that mesh.dm.view() only reports output from the last processor, whereas mesh.view() shows that the Left and Right labels are each empty on one of the processors (which makes sense if the box is split down the middle).

I added in:

for boundary in mesh.boundaries:
    proc = uw.mpi.rank
    # print(f'boundary name: {boundary.name}, label: {boundary.value}, proc: {proc}')
    l = mesh.dm.getLabel(boundary.name)
    if l:
        print(f'{boundary.name} found on {proc}' )
    else:
        print(f'{boundary.name} NOT found on {proc}' )

Which reports back:

Bottom found on 0
Top found on 0
Right found on 0
Left found on 0
All_Boundaries found on 0
Bottom found on 1
Top found on 1
Right found on 1
Left found on 1
All_Boundaries found on 1

which suggests the boundary labels are available on both processors?
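
A label object can exist on a rank while holding zero points locally, which would reconcile this with the mesh.view() output. Here is a minimal sketch that checks per-rank stratum sizes instead of label presence (assuming the petsc4py DMPlex API on mesh.dm and the boundary.name / boundary.value attributes used above):

for boundary in mesh.boundaries:
    # Sketch only: a label can be defined on every rank yet be empty on some,
    # so report how many points it actually holds on this rank.
    size = mesh.dm.getStratumSize(boundary.name, boundary.value)
    print(f'rank {uw.mpi.rank}: {boundary.name} holds {size} points', flush=True)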

The solve produces the following error:

[1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[1]PETSC ERROR: General MPI error
[1]PETSC ERROR: MPI error 1 MPI_ERR_BUFFER: invalid buffer pointer

lmoresi (Member) commented Sep 3, 2024

@gthyagi / @bknight1

It fails because the p_soln.stats() call is collective, so you can't execute it only on process 0.

p_stats = p_soln.stats()

if uw.mpi.rank==0:
    print(p_stats)

... is the required syntax.

gthyagi (Contributor, Author) commented Sep 3, 2024

Thanks @lmoresi for pointing that out.

Here is the above example with a natural BC, which does not work in parallel with elementRes=(5, 5). However, elementRes=(2, 2), (3, 3), or (4, 4) works.

import underworld3 as uw
from underworld3.systems import Stokes
import sympy

mesh = uw.meshing.StructuredQuadBox(elementRes=(5, 5), minCoords=(0., 0.),
                                    maxCoords=(1., 1.))

x,y = mesh.X

mesh.view()

v_soln = uw.discretisation.MeshVariable(r"u", mesh, 2, degree=2)
p_soln = uw.discretisation.MeshVariable(r"p", mesh, 1, degree=1, continuous=True)

# Create Stokes object
stokes = Stokes(mesh, velocityField=v_soln, pressureField=p_soln, solver_name="stokes")
stokes.constitutive_model = uw.constitutive_models.ViscousFlowModel
stokes.constitutive_model.Parameters.viscosity = 1.0
stokes.saddle_preconditioner = 1.0

# +
# bc's
t_init = sympy.cos(3*x*sympy.pi) * sympy.exp(-1000.0 * ((y - 0.99) ** 2)) 

# stokes.add_essential_bc(sympy.Matrix([1.0, 0.0]), "Top")
stokes.add_natural_bc(sympy.Matrix([0.0, -t_init]), "Top")
stokes.add_essential_bc(sympy.Matrix([sympy.oo, 0.0]), "Bottom")
stokes.add_essential_bc(sympy.Matrix([0.0,sympy.oo]), "Left")
stokes.add_essential_bc(sympy.Matrix([0.0,sympy.oo]), "Right")
# -

stokes.bodyforce = sympy.Matrix([0, 0])

print(f'rank: {uw.mpi.rank}, min coord: {mesh.data[:,0].min(), mesh.data[:,1].min()}', flush=True)
print(f'rank: {uw.mpi.rank}, max coord: {mesh.data[:,0].max(), mesh.data[:,1].max()}', flush=True)

# stokes.petsc_options["pc_type"] = "lu"
stokes.tolerance = 1e-6
stokes.solve(verbose=False)

p_stats = p_soln.stats()
if uw.mpi.rank==0:
    print(p_stats)

Run in parallel with:

mpirun -np 2 python Ex_Sbox_parallel.py

lmoresi added a commit that referenced this issue Sep 3, 2024
Hackish work around for the PETSc labelling issues with surface integrals

The surface integral code falls over if a label is not present on one of the processes. This is identified in issue #240. This workaround adds element-volumes (all of them) to each boundary label. These are not used in the surface integral code in 2D / 3D so they do not change the result. @knepley - we've discussed this issue in the past ... we should figure out the actual origin of the problem.
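
A minimal sketch of the idea behind that workaround (assuming the petsc4py DMPlex API on mesh.dm and the mesh.boundaries / boundary.value attributes used earlier; this is not the exact code on the branch):

# Sketch only: add every local cell (height-0 point) to each boundary label so
# that no label is empty on any rank. Cells are not used by the 2D/3D surface
# integral code, so the integral values should be unchanged.
cStart, cEnd = mesh.dm.getHeightStratum(0)   # local cell range
for boundary in mesh.boundaries:
    for cell in range(cStart, cEnd):
        mesh.dm.setLabelValue(boundary.name, cell, boundary.value)
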
lmoresi (Member) commented Sep 3, 2024

@gthyagi - I've proposed a workaround on branch #241 and suggest that you review / try it. Also @bknight1 since JG is not available to review this week.

knepley (Collaborator) commented Sep 3, 2024

@bknight1 Yes, -dm_view only reports label output for proc 0. I did not want to make that collective, since all procs might not have all labels, but we could revisit this. There might be a clean way to do it now.

gthyagi (Contributor, Author) commented Sep 4, 2024

@knepley In -dm_view, it would be useful to print all the processor numbers that belong to each label.

bknight1 (Member) commented Sep 4, 2024

@gthyagi we could do this in UW by modifying the mesh.view() to return the number of label values on each node

bknight1 added a commit that referenced this issue Sep 4, 2024
Add a parallel_view() function to see the number of labels on each processor. In response to #240

returns something like this:

| Boundary Name            | ID    | Size     | Proc ID  |
| ------------------------------------------------------ |
| Bottom                   | 11    | 2        | 0        |
| Top                      | 12    | 2        | 0        |
| Right                    | 13    | 3        | 0        |
| Left                     | 14    | 0        | 0        |
| All_Boundaries           | 1001  | 4        | 0        |
| Bottom                   | 11    | 2        | 1        |
| Top                      | 12    | 2        | 1        |
| Right                    | 13    | 0        | 1        |
| Left                     | 14    | 3        | 1        |
| All_Boundaries           | 1001  | 4        | 1        |
| ------------------------------------------------------ |
lmoresi (Member) commented Sep 4, 2024

Make it optional because that could be a lot of information ...

bknight1 (Member) commented Sep 4, 2024

I agree. I've made it a separate function for now; we can discuss tomorrow how best to present the information.

gthyagi (Contributor, Author) commented Sep 4, 2024

@bknight1 The current print statement outputs a lot of information, and the label names are repetitive. It might be better to create a separate table with Label and List of Procs.
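
A minimal sketch of one way to build that table (assuming mpi4py is available alongside uw.mpi, and the mesh.boundaries / boundary.value attributes used earlier):

# Sketch only: one row per label, listing the ranks that hold points in it.
from mpi4py import MPI

comm = MPI.COMM_WORLD
for boundary in mesh.boundaries:
    local_size = mesh.dm.getStratumSize(boundary.name, boundary.value)
    sizes = comm.gather(local_size, root=0)
    if comm.rank == 0:
        procs = [rank for rank, size in enumerate(sizes) if size > 0]
        print(f'{boundary.name:<15} | procs: {procs}')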

bknight1 (Member) commented Sep 4, 2024

Yeah, that's probably the best approach; feel free to make those changes.

lmoresi added a commit that referenced this issue Sep 27, 2024
* Add small offset in box point capture

* Clarifications in nodal swarm

* ddt - allowing discontinuous fields for advected points if required

* Trying to fix semi-lagrange scheme

Major bug (totally my fault) in which I failed to account for particles being reordered after sending and receiving. We now store the node and process for each particle which means we can take small substeps to the sample point and then snap back directly to the launch point.

This includes JC's swarm wormhole teleportation code which is required for long trajectories.

I'm turning off all smoothing for projection operators which was introduced to counter numerical problems which turned out to be related to this bug.

Also, added the possibility to use a sample swarm based on a discontinuous underlying variable. This gives a good spread of points and means that corners / facets where particles may be lost are avoided by particle paths.

* Fix regression / testing

* Fixing one broken test

There are (value related) failures in the advection tests. Of course, advection tests are a bloody nightmare since advection schemes are problematic to validate.

Need to understand this before merging PR

* Julian's review comments

posixpath - not sure why this import is present
epsilon in boxmesh return-coords-... - no need to complicate things

* Make substepping time-criterion consistent with timestep definition

* More minor fixes for adv-diffusion

Arise from convection benchmark models.

* A couple of fixes

* Hackish work around for the PETSc labelling issues with surface integrals

The surface integral code falls over if a label is not present on one of the processes. This is identified in issue #240. This workaround adds element-volumes (all of them) to each boundary label. These are not used in the surface integral code in 2D / 3D so they do not change the result. @knepley - we've discussed this issue in the past ... we should figure out the actual origin of the problem.

* parallel_view fn for mesh labels

Add a parallel_view() function to see the number of labels on each processor. In response to #240

returns something like this:

| Boundary Name            | ID    | Size     | Proc ID  |
| ------------------------------------------------------ |
| Bottom                   | 11    | 2        | 0        |
| Top                      | 12    | 2        | 0        |
| Right                    | 13    | 3        | 0        |
| Left                     | 14    | 0        | 0        |
| All_Boundaries           | 1001  | 4        | 0        |
| Bottom                   | 11    | 2        | 1        |
| Top                      | 12    | 2        | 1        |
| Right                    | 13    | 0        | 1        |
| Left                     | 14    | 3        | 1        |
| All_Boundaries           | 1001  | 4        | 1        |
| ------------------------------------------------------ |

* Label points instead

Not a fix, just examining potential workaround choices

* Parallel Test utilises previous mesh

* try this one

* Still just an experiment

* Mostly going to give up on the labelling

Put most stuff back together. Improve the view() a little bit. Fix the problems with expressions (sympy merging expressions with same symbol - FFS!)

Something odd with dm re-distribution that should not have been happening.

Also fixing up the object id things that are not being respected by sympy.

* min viscosity - typo in expression definition

* Some changes to the documentation / tidy notebooks

Also adding rendering hints in the notebooks for quarto

* gh-pages workflow - notebooks

* Update publish.yml

* Update publication source path

* Updating /Adding examples in quick-start user guide

8 notebooks added which demonstrate various parts of the uw3 machinery plus the rendering code to put them on github pages as a quickstart guide.

I still need to update the scripts to launch on binder.

* Bug fix - discretisation

Not zapping the kdtree properly when deforming the mesh. I guess this was a problem in free surface models, so I am surprised it was not reported before - is there an open issue about particles added to a deforming mesh?

Minor detail: using invisible hspace in place of the \,\! strings. The width of the hspace is the unique ID / 100 in points. It still looks weird but the code is readable. I tried labels and other things but they break the rendering. There is probably a better way.

* updating examples

Fixes to advection and expressions to make examples and tests work.

* Further fixes

---------

Co-authored-by: Ben Knight <[email protected]>