Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected parallel results #1017

Open
Edward-Rawson opened this issue Feb 20, 2024 · 5 comments
Open

Unexpected parallel results #1017

Edward-Rawson opened this issue Feb 20, 2024 · 5 comments
Labels

Comments

@Edward-Rawson
Copy link

I originally posted about this on the mailing board but, in hindsight, here seems more appropriate.
I'm not getting the expected results when running in parallel across multiple cores on one cluster node. This problem persists when using either Trilinos with Gmsh 4.11.1 and PETSC with Gmsh 3.0.6. The script below (run with Trillinos --no-pysparse) demonstrates this:

import numpy as np
from fipy import *
import sys
from PyTrilinos import Epetra
np.set_printoptions(threshold=sys.maxsize)

mesh = GmshGrid3D(0.001, 0.001, 0.001, 10, 10, 10)
X = mesh.cellCenters[0]
T = CellVariable(mesh=mesh, value=20.)
alpha = CellVariable(mesh=mesh, value=2.88e-6)
T.setValue(100, where=(X >= 0.5*max(X)))
eq = TransientTerm() == DiffusionTerm(coeff=alpha)

sys.stdout.write(
    ("\n PID = "+str(Epetra.PyComm().MyPID())+"; "
     +str(mesh.numberOfCells)+
     " elements on processor "+str(parallelComm.procID)+
     " of "+str(parallelComm.Nproc)))
parallelComm.Barrier()

for step in range(3):
    eq.solve(T, dt=0.75)

parallelComm.Barrier()
Tg = T.globalValue
if parallelComm.procID == 0:
    Tr = np.reshape(Tg, (10, 10, 10))
    sys.stdout.write(("\n"+str(np.round(Tr[:, :, 9], 0))+"\n"))

Output from serial case:

PID = 0; 1000 elements on processor 0 of 1
[[32. 32. 32. 32. 32. 32. 32. 32. 32. 32.]
[35. 35. 35. 35. 35. 35. 35. 35. 35. 35.]
[39. 39. 39. 39. 39. 39. 39. 39. 39. 39.]
[46. 46. 46. 46. 46. 46. 46. 46. 46. 46.]
[55. 55. 55. 55. 55. 55. 55. 55. 55. 55.]
[65. 65. 65. 65. 65. 65. 65. 65. 65. 65.]
[74. 74. 74. 74. 74. 74. 74. 74. 74. 74.]
[81. 81. 81. 81. 81. 81. 81. 81. 81. 81.]
[85. 85. 85. 85. 85. 85. 85. 85. 85. 85.]
[88. 88. 88. 88. 88. 88. 88. 88. 88. 88.]]

Output from the parallel case with Trilinos --no-pysparse, Gmsh 4.11.1 (made via "conda-trilinos-lock.yml"):

Warning : Saving a partitioned mesh in a format older than 4.0 may cause information loss

PID = 1; 197 elements on processor 1 of 10
PID = 2; 241 elements on processor 2 of 10
PID = 6; 185 elements on processor 6 of 10
PID = 7; 194 elements on processor 7 of 10
PID = 3; 207 elements on processor 3 of 10
PID = 4; 194 elements on processor 4 of 10
PID = 5; 199 elements on processor 5 of 10
PID = 8; 256 elements on processor 8 of 10
PID = 9; 196 elements on processor 9 of 10
PID = 0; 223 elements on processor 0 of 10
[[43. 48. 46. 55. 52. 62. 72. 70. 79. 44.]
[42. 47. 45. 52. 62. 61. 70. 69. 77. 44.]
[42. 47. 55. 52. 64. 60. 72. 79. 79. 40.]
[37. 41. 39. 47. 58. 55. 67. 76. 81. 40.]
[39. 42. 42. 48. 48. 54. 67. 76. 83. 41.]
[44. 51. 49. 57. 69. 66. 76. 83. 83. 64.]
[60. 72. 80. 80. 87. 84. 90. 88. 91. 79.]
[85. 84. 89. 89. 92. 92. 91. 93. 93. 79.]
[86. 84. 90. 89. 92. 92. 92. 94. 93. 83.]
[88. 86. 88. 92. 90. 91. 93. 91. 92. 93.]]

With regards to that warning: Gmsh 4.11.1 was documented as Fipy-compatible (>=4.5.2) in 2021, but I tried downgrading to <4.0 due to on older discussion here. Furthermore, the FiPy documentation notes that "PyTrilinos on conda-forge presently only provides 12.10, which is limited to Python 2.x.", whilst the above environment was run with Python 3.7.12, PyTrilinos 12.18.1 and was installed via conda-forge. Although this may contribute, it wasn't relevant to the following case:

Output from the parallel case with PETSC, Gmsh 3.0.6 and without Epetra.PyComm().MyPID() or from PyTrilinos import Epetra:

196 elements on processor 5 of 10
215 elements on processor 7 of 10
199 elements on processor 8 of 10
192 elements on processor 1 of 10
239 elements on processor 3 of 10
186 elements on processor 2 of 10
192 elements on processor 4 of 10
227 elements on processor 6 of 10
210 elements on processor 9 of 10
180 elements on processor 0 of 10
[[44. 44. 44. 44. 44. 44. 44. 44. 43. 43.]
[47. 47. 47. 47. 47. 47. 47. 47. 46. 46.]
[53. 53. 53. 53. 53. 53. 53. 52. 52. 52.]
[61. 60. 61. 61. 61. 61. 60. 60. 60. 59.]
[68. 68. 68. 68. 69. 69. 68. 68. 67. 67.]
[76. 76. 76. 76. 76. 76. 76. 76. 75. 75.]
[83. 83. 83. 83. 83. 83. 83. 82. 82. 82.]
[88. 88. 88. 88. 88. 87. 87. 87. 87. 87.]
[91. 91. 91. 91. 91. 91. 91. 90. 90. 90.]
[92. 92. 92. 92. 92. 92. 92. 92. 92. 92.]]

Closer, no more warning message, but also no correct results. I suspect that this could be a separate problem. With both cases:

  • I had the same problem when substituting the "GmshGrid3D" class for a "Gmsh3D("mesh.geo_unrolled)".
  • All of the FiPy tests ran with no errors.
  • Sweeping made no difference – residuals reach around 1e-9 after one solve.

I would appreciate any suggestions – if there are questions or if more information is needed, just let me know and I'll respond as soon as possible.

Kind regards,
Ed

@Edward-Rawson
Copy link
Author

Quick update: For the same Gmsh version (3.0.6), I have identical results with the Trilinos --no-pysparse solver as with the above PETSC (3.20.4) case. Considering this, I tried to replicate this method that mentioned Gmsh not always giving the correct sized grid by replacing the second chunk of code in above example by the following:

mesh = GmshGrid3D(0.001, 0.001, 0.001, 10, 10, 10)
#X = mesh.cellCenters[0]
T = CellVariable(mesh=mesh)
T0 = np.ones((10, 10, 10)) * 20.
T0[5:, :, :] = 100
T0 = np.resize(T0.flatten(), len(T.globalValue))
T.setValue(T0.copy())
alpha = CellVariable(mesh=mesh, value=2.88e-6)
#T.setValue(100, where=(X >= 0.5*max(X)))
eq = TransientTerm() == DiffusionTerm(coeff=alpha)

This returns;
For Gmsh 3.0.6:

PID = 1; 192 elements on processor 1 of 10
PID = 2; 186 elements on processor 2 of 10
PID = 3; 239 elements on processor 3 of 10
PID = 4; 192 elements on processor 4 of 10
PID = 5; 196 elements on processor 5 of 10
PID = 6; 227 elements on processor 6 of 10
PID = 7; 215 elements on processor 7 of 10
PID = 8; 199 elements on processor 8 of 10
PID = 9; 210 elements on processor 9 of 10
PID = 0; 180 elements on processor 0 of 10
[[32. 32. 32. 32. 32. 32. 32. 32. 32. 32.]
[35. 35. 35. 35. 35. 35. 35. 35. 35. 35.]
[39. 39. 39. 39. 39. 39. 39. 39. 39. 39.]
[46. 46. 46. 46. 46. 46. 46. 46. 46. 46.]
[55. 55. 55. 55. 55. 55. 55. 55. 55. 55.]
[65. 65. 65. 65. 65. 65. 65. 65. 65. 65.]
[74. 74. 74. 74. 74. 74. 74. 74. 74. 74.]
[81. 81. 81. 81. 81. 81. 81. 81. 81. 81.]
[85. 85. 85. 85. 85. 85. 85. 85. 85. 85.]
[88. 88. 88. 88. 88. 88. 88. 88. 88. 88.]]

For Gmsh 4.11.1:

Warning : Saving a partitioned mesh in a format older than 4.0 may cause information loss

PID = 1; 197 elements on processor 1 of 10
PID = 2; 241 elements on processor 2 of 10
PID = 3; 207 elements on processor 3 of 10
PID = 4; 194 elements on processor 4 of 10
PID = 5; 199 elements on processor 5 of 10
PID = 6; 185 elements on processor 6 of 10
PID = 7; 194 elements on processor 7 of 10
PID = 8; 256 elements on processor 8 of 10
PID = 9; 196 elements on processor 9 of 10
PID = 0; 223 elements on processor 0 of 10
[[30. 32. 32. 35. 36. 41. 48. 48. 56. 31.]
[32. 32. 37. 37. 41. 44. 49. 53. 61. 33.]
[44. 37. 36. 43. 42. 52. 50. 58. 60. 33.]
[36. 36. 40. 41. 42. 47. 49. 58. 67. 38.]
[50. 40. 52. 43. 55. 50. 54. 61. 69. 57.]
[54. 55. 68. 65. 66. 71. 75. 75. 82. 55.]
[60. 65. 70. 74. 76. 81. 81. 84. 85. 57.]
[66. 65. 74. 73. 79. 79. 79. 82. 82. 58.]
[67. 69. 74. 75. 80. 80. 81. 82. 83. 71.]
[77. 78. 85. 83. 83. 88. 84. 85. 87. 90.]]

@guyer
Copy link
Member

guyer commented Feb 20, 2024

Thanks for the detailed report.

There are two issues going on here:

  • The cell ordering is not guaranteed to be the same in parallel as it is in serial, so np.reshape(Tg, (10, 10, 10))[:, :, 9] does not give the same collection of cells. You can see this by defining

    Tg = mesh.z.globalValues

    where (after disabling rounding), in serial, I get

     PID = 0; 1000 elements on processor 0 of 1
    [[0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095]
     [0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095]
     [0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095]
     [0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095]
     [0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095]
     [0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095]
     [0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095]
     [0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095]
     [0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095]
     [0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095]]
    

    but, in parallel, I get

     PID = 2; 226 elements on processor 2 of 8
     PID = 3; 232 elements on processor 3 of 8
     PID = 4; 225 elements on processor 4 of 8
     PID = 5; 229 elements on processor 5 of 8
     PID = 6; 238 elements on processor 6 of 8
     PID = 1; 239 elements on processor 1 of 8
     PID = 7; 226 elements on processor 7 of 8
     PID = 0; 239 elements on processor 0 of 8
    [[0.0045 0.0045 0.0045 0.0045 0.0045 0.0045 0.0045 0.0045 0.0045 0.0045]
     [0.0045 0.0045 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095]
     [0.0055 0.0055 0.0055 0.0085 0.0085 0.0085 0.0085 0.0075 0.0075 0.0075]
     [0.0075 0.0075 0.0075 0.0075 0.0075 0.0075 0.0075 0.0035 0.0035 0.0045]
     [0.0045 0.0045 0.0045 0.0045 0.0045 0.0025 0.0025 0.0035 0.0035 0.0035]
     [0.0065 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0095 0.0075 0.0075]
     [0.0055 0.0055 0.0045 0.0045 0.0015 0.0015 0.0015 0.0015 0.0015 0.0015]
     [0.0015 0.0015 0.0015 0.0015 0.0005 0.0045 0.0045 0.0045 0.0045 0.0045]
     [0.0045 0.0045 0.0045 0.0045 0.0005 0.0005 0.0005 0.0055 0.0095 0.0095]
     [0.0095 0.0065 0.0065 0.0065 0.0085 0.0085 0.0085 0.0095 0.0095 0.0095]]
    

    This can be remedied by letting FiPy take the slice for you:

    x, y = (np.indices((10, 10)) + 0.5) * 0.001
    z = np.empty_like(x)
    z[:] = 0.0095
    
    xyz = numerix.row_stack([x.flat, y.flat, z.flat])
    Tg = T(xyz)
    
    if parallelComm.procID == 0:
        Tr = np.reshape(Tg, (10, 10))
        sys.stdout.write(("\n"+str(np.round(Tr, 0))+"\n"))
  • Unfortunately, this still doesn't give the correct result. The reason is that the initial condition is not the same, which can be seen by disabling the solve loop. In serial, the initial temperature in that slice is

     PID = 0; 1000 elements on processor 0 of 1
    [[ 20.  20.  20.  20.  20.  20.  20.  20.  20.  20.]
     [ 20.  20.  20.  20.  20.  20.  20.  20.  20.  20.]
     [ 20.  20.  20.  20.  20.  20.  20.  20.  20.  20.]
     [ 20.  20.  20.  20.  20.  20.  20.  20.  20.  20.]
     [ 20.  20.  20.  20.  20.  20.  20.  20.  20.  20.]
     [100. 100. 100. 100. 100. 100. 100. 100. 100. 100.]
     [100. 100. 100. 100. 100. 100. 100. 100. 100. 100.]
     [100. 100. 100. 100. 100. 100. 100. 100. 100. 100.]
     [100. 100. 100. 100. 100. 100. 100. 100. 100. 100.]
     [100. 100. 100. 100. 100. 100. 100. 100. 100. 100.]]
    

    whereas, in parallel, it is

     PID = 1; 239 elements on processor 1 of 8
     PID = 4; 225 elements on processor 4 of 8
     PID = 5; 229 elements on processor 5 of 8
     PID = 6; 238 elements on processor 6 of 8
     PID = 3; 232 elements on processor 3 of 8
     PID = 2; 226 elements on processor 2 of 8
     PID = 7; 226 elements on processor 7 of 8
     PID = 0; 239 elements on processor 0 of 8
    [[ 20.  20.  20.  20.  20.  20.  20.  20.  20.  20.]
     [ 20.  20.  20.  20.  20.  20.  20.  20.  20.  20.]
     [ 20.  20.  20.  20.  20.  20.  20.  20.  20.  20.]
     [ 20. 100. 100. 100. 100. 100. 100. 100. 100. 100.]
     [ 20.  20. 100. 100. 100. 100. 100. 100. 100. 100.]
     [100. 100. 100. 100. 100. 100. 100. 100. 100. 100.]
     [100. 100. 100. 100. 100. 100. 100. 100. 100. 100.]
     [100. 100. 100. 100. 100. 100. 100. 100. 100. 100.]
     [100. 100. 100. 100. 100. 100. 100. 100. 100. 100.]
     [100. 100. 100. 100. 100. 100. 100. 100. 100. 100.]]
    

    This happens because T.setValue(100, where=(X >= 0.5*max(X))) is comparing to the maximum value of X on each processor. Use T.setValue(100, where=(X >= 0.5*X.max())) to compare against the maximum value of X anywhere in the mesh.

Using either Trilinos or PETSc, making these two changes gives

 PID = 1; 239 elements on processor 1 of 8
 PID = 2; 226 elements on processor 2 of 8
 PID = 3; 232 elements on processor 3 of 8
 PID = 4; 225 elements on processor 4 of 8
 PID = 5; 229 elements on processor 5 of 8
 PID = 7; 226 elements on processor 7 of 8
 PID = 6; 238 elements on processor 6 of 8
 PID = 0; 239 elements on processor 0 of 8
[[32. 32. 32. 32. 32. 32. 32. 32. 32. 32.]
 [35. 35. 35. 35. 35. 35. 35. 35. 35. 35.]
 [39. 39. 39. 39. 39. 39. 39. 39. 39. 39.]
 [46. 46. 46. 46. 46. 46. 46. 46. 46. 46.]
 [55. 55. 55. 55. 55. 55. 55. 55. 55. 55.]
 [65. 65. 65. 65. 65. 65. 65. 65. 65. 65.]
 [74. 74. 74. 74. 74. 74. 74. 74. 74. 74.]
 [81. 81. 81. 81. 81. 81. 81. 81. 81. 81.]
 [85. 85. 85. 85. 85. 85. 85. 85. 85. 85.]
 [88. 88. 88. 88. 88. 88. 88. 88. 88. 88.]]

@guyer
Copy link
Member

guyer commented Feb 20, 2024

Furthermore, the FiPy documentation notes that "PyTrilinos on conda-forge presently only provides 12.10, which is limited to Python 2.x.", whilst the above environment was run with Python 3.7.12, PyTrilinos 12.18.1 and was installed via conda-forge.

Thank you for pointing this out. The PyTrilinos feedstock on conda-forge still lags way behind Python versions, but you're correct that it installs in Python 3.7 (and even 3.8 on some platforms). I'll fix the documentation (#1018).

@Edward-Rawson
Copy link
Author

Edward-Rawson commented Feb 20, 2024

Well this turned out to be a much simpler problem; me! I clearly have some more learning to do. Thanks for taking the time to explain, this would've taken me a while to find 👍. Good to know about the in-built slicing also

@guyer
Copy link
Member

guyer commented Feb 21, 2024

I would classify both of these issues as subtle. I appreciate you asking about them.

@guyer guyer added the question label Jul 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants