BEAMS3D not producing correct results with PPPL GCC #109

zhucaoxiang · 2021-03-16T16:38:31Z

This is for tracking a known issue reported by @pomphrey. Using the GCC version (gcc/9.3) on PPPL, the BEAMS3D executable is not able to produce correct results.

Here is a simple example. Running the input.ORBITS case in BENCHMARKS/BEAMS3D with -N 1 -n 16. The GCC version somehow gives non-sense results. The Intel version gives close results (though not exactly the same).

@lazersos Do you have any idea what causes this issue? You can find the makefile at https://github.com/PrincetonUniversity/STELLOPT/blob/develop/SHARE/make_pppl_gcc.inc

The text was updated successfully, but these errors were encountered:

zhucaoxiang · 2021-03-16T16:45:35Z

Looks like the GCC one is not calling "Trajectory Calculation".

Here is the screen output for the GCC executable.

(base) czhu@sunfire03:BEAMS3D_TEST$ debug -N 1 -n 16 xbeams3d -vmec ORBITS -plasma
BEAMS3D Version  2.70
-----  HDF5 Parameters  -----
   HDF5_version:   1.12 release: 00
-----  MPI Parameters  -----
   MPI_version:   3.01
   Open MPI v4.0.3, package: Open MPI [email protected] Distribution, ident: 4.0.3, repo rev: v4.0.3, Mar 03, 2020
   Nproc_total:        16
   Nproc_shared:       16
----- Input Parameters -----
   FILE: input.ORBITS
   R   = [  9.00000, 11.00000];  NR:    128
   PHI = [ 0.00000, 6.28319];  NPHI:   32
   Z   = [-1.00000, 1.00000];  NZ:    128
   # of Particles to Start:       40
   MAGNETIC FIELD FROM PLASMA ONLY!
----- Profile Parameters -----
   Zeff = [  1.00000,  1.00000];  NZEFF:    6
   PLASMA_MASS =    1.00728 amu
   PLASMA_ZAVG =    1.00000 <Z>
   PLASMA_ZMEAN =    1.00000 [Z]
----- VMEC Information -----
   FILE: ORBITS
   R       = [  9.00000, 11.00000]
   Z       = [-0.95106, 0.95106]
   BETA    =   0.000;  I  =  -0.636 [MA]
   AMINOR  =   1.000 [m]
   PHIEDGE =  15.700 [Wb]
   VOLUME  = 197.392 [m^3]
 -----  Vessel Information  -----
   Wall Name :  HARMONICS
   Date      :  TODAY
   Faces     :   28800

----- Constructing Splines -----
   R   = [  9.00000, 11.00000];  NR:    128
   PHI = [ 0.00000, 6.28319];  NPHI:   32
   Z   = [-1.00000, 1.00000];  NZ:    128
   HERMITE FORM: 1
----- FOLLOWING PARTICLE TRAJECTORIES -----
      Method: LSODE
   Particles:        40
       Steps:     32511   Delta-t:  0.3076E-07
      NPOINC:      1000    dt_out:  0.1000E-05
         Tol:  0.1000E-07  Type: 10

----- BEAM DIAGNOSTICS -----
 BEAMLINE     ENERGY [keV]   CHARGE [e]   MASS [Mp]   Particles [#]   Lost [%]  Shinethrough [%]  Port [%]
    1             0             0             0              40         40.0          0.0          0.0

And from the Intel executable.

(base) czhu@sunfire03:BEAMS3D_TEST$ debug -N 1 -n 16 xbeams3d -vmec ORBITS -plasma
BEAMS3D Version  2.70
-----  HDF5 Parameters  -----
   HDF5_version:   1.10 release: 05
-----  MPI Parameters  -----
   MPI_version:   3.01
   Open MPI v4.0.3, package: Open MPI [email protected] Distribution, ident: 4.0.3, repo rev: v4.0.3, Mar 03, 2020
   Nproc_total:        16
   Nproc_shared:       16
----- Input Parameters -----
   FILE: input.ORBITS
   R   = [  9.00000, 11.00000];  NR:    128
   PHI = [ 0.00000, 6.28319];  NPHI:   32
   Z   = [-1.00000, 1.00000];  NZ:    128
   # of Particles to Start:       40
   MAGNETIC FIELD FROM PLASMA ONLY!
----- Profile Parameters -----
   Zeff = [  1.00000,  1.00000];  NZEFF:    6
   PLASMA_MASS =    1.00728 amu
   PLASMA_ZAVG =    1.00000 <Z>
   PLASMA_ZMEAN =    1.00000 [Z]
----- VMEC Information -----
   FILE: ORBITS
   R       = [  9.00000, 11.00000]
   Z       = [-0.95106, 0.95106]
   BETA    =   0.000;  I  =  -0.636 [MA]
   AMINOR  =   1.000 [m]
   PHIEDGE =  15.700 [Wb]
   VOLUME  = 197.392 [m^3]
 -----  Vessel Information  -----
   Wall Name :  HARMONICS
   Date      :  TODAY
   Faces     :   28800
     Plasma Field Calculation [100]%
----- Constructing Splines -----
   R   = [  9.00000, 11.00000];  NR:    128
   PHI = [ 0.00000, 6.28319];  NPHI:   32
   Z   = [-1.00000, 1.00000];  NZ:    128
   HERMITE FORM: 1
----- FOLLOWING PARTICLE TRAJECTORIES -----
      Method: LSODE
   Particles:        40
       Steps:     32511   Delta-t:  0.3076E-07
      NPOINC:      1000    dt_out:  0.1000E-05
         Tol:  0.1000E-07  Type: 10
     Trajectory Calculation [  0]%
----- BEAM DIAGNOSTICS -----
 BEAMLINE     ENERGY [keV]   CHARGE [e]   MASS [Mp]   Particles [#]   Lost [%]  Shinethrough [%]  Port [%]
    1             0             0           ***              40          0.0          0.0          0.0
----- BEAMS3D DONE -----

lazersos · 2021-03-17T12:57:42Z

@zhucaoxiang This is an odd behavior. For reference I usually debug on my Mac with GCC and OpenMPI. For production runs I'm usually on the MPCDF machines (or Cineca) which compile with Intel. What branch was @pomphrey working on?

zhucaoxiang · 2021-03-17T13:24:49Z

@lazersos The develop branch.

pomphrey · 2021-03-17T13:27:08Z

@lazersos :
/p/stellopt/SOURCES/stellopt_develop_gcc/bin/xbeams3d -plasma -vmec RUN12B.00515
---trying to reproduce an alpha loss calculation that Matt had previously run, using identical input and wout files

lazersos · 2021-03-17T18:24:38Z

Run the BENCHMARKS with make test_beams3d in the BENCHMARKS directory and see if they pass. If they don't then there must be an issue with the compiler options. Do you get this for non-debug GCC runs?

lazersos · 2021-03-18T07:55:57Z

I can confirm this behavior for GCC on Mac when using the debugging options in the make_pppl_gnu.inc makefile. In fact here I don't even get as far.

lazerson@Lazersons-MacBook-Air BENCHMARKS % make test_beams3d
Beginning testing of BEAMS3D
/Applications/Xcode.app/Contents/Developer/usr/bin/make test_beams3d_basic
mpiexec -np 2 /Users/lazerson/bin/xbeams3d -vmec ORBITS -plasma
BEAMS3D Version  2.95
-----  HDF5 Parameters  -----
   HDF5_version:   1.12 release: 00
-----  MPI Parameters  -----
   MPI_version:   3.01
   Open MPI v4.1.0, package: Open MPI [email protected] Distribution, ident: 4.1.0, repo rev: v4.1.0, Dec 18, 2020
   Nproc_total:         2
   Nproc_shared:        2
----- Input Parameters -----
   FILE: input.ORBITS
   R   = [  9.00000, 11.00000];  NR:    128
   PHI = [ 0.00000, 6.28319];  NPHI:   32
   Z   = [-1.00000, 1.00000];  NZ:    128
   # of Particles to Start:       40
   MAGNETIC FIELD FROM PLASMA ONLY!
----- Profile Parameters -----
   Zeff = [  1.00000,  1.00000];  NZEFF:    6
   PLASMA_MASS =    1.00728 amu
   PLASMA_ZAVG =    1.00000 <Z>
   PLASMA_ZMEAN =    1.00000 [Z]

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x107455d3e
#1  0x10745549a
#2  0x7fff7006a5fc
#3  0x1036e1feb
#4  0x10372f808
#5  0x10372fa34
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node Lazersons-MacBook-Air exited on signal 8 (Floating point exception: 8).
--------------------------------------------------------------------------
make[1]: *** [test_beams3d_basic] Error 136
make: *** [test_beams3d] Error 2

I can confirm that the issue seems to stem from the compiler option -ffpe-trap=zero. I suspect this is because there are times where division by zero can happen. When this can happen we explicitly deal with the situation after the fact. However, I need to look into this in more detail.

zhucaoxiang · 2021-03-18T08:24:30Z

@lazersos Good find! So there might be a "division by zero" issue. The GCC compiler with extra debug flags catches it while the release flags just return unphysical results without errors. For Intel, there are some hidden default flags that make it work normally.

lazersos · 2021-03-18T09:41:38Z

The issue came from a grid helper variable that was declared to be n sized but should really have only been n-1 sized. As a result the nth position was never initialized. A second helper variable was also created with the same problem, only it's value was set to the inverse of the other variable. As a result in the nth position we got a divide by Nan. Now this never cause a direct issue in the code since we never referenced the array beyond the n-1 position. However the compiler flags for debug threw an error at that step.

For most situations this never caused an issue because the array element which was Nan was never reference by the code. Except in the beams3d_init_fusion.f90 routine. This has now been addressed. The fix is in pull request #79. @zhucaoxiang could you approve the merge it's been sitting there a while.

zhucaoxiang · 2021-03-19T09:54:38Z

@lazersos The new commits fix the error, but still on the PPPL cluster, neither using GCC nor Intel will give correct numbers when running the input.ORBITS case.

lazersos · 2021-03-19T15:35:42Z

@zhucaoxiang I can't replicate the error on Macports, MPCDF, or Cincea, so I can't fix that. The Intel results you posted look pretty close to correct. The GCC results look totally wrong, but a cursory glance at the make_pppl_gcc.inc doesn't seem to indicate anything that should cause a problem with a release build.

zhucaoxiang · 2021-03-19T16:43:11Z

@lazersos This doesn't make sense. I can reproduce your results on Eddy with the Intel compiler (all True). On PPPL, the Intel version only gives about half Trues. The GCC version just returns ridiculous numbers. I will test on my laptop and also over the GitHub server.

zhucaoxiang · 2021-03-22T02:22:30Z

@lazersos @pomphrey test_beams3d_basic passed on Eddy and Stellar with the Intel compiler. On my laptop with GCC-10, it was half passed.

mpirun -np 2 /Users/caoxiang/bin/xbeams3d -vmec ORBITS -plasma
BEAMS3D Version  2.95
-----  HDF5 Parameters  -----
   HDF5_version:   1.12 release: 00
-----  MPI Parameters  -----
   MPI_version:   3.01
   Open MPI v4.0.5, package: Open MPI brew@Catalina Distribution, ident: 4.0.5, repo rev: v4.0.5, Aug 26, 2020
   Nproc_total:         2
   Nproc_shared:        2
----- Input Parameters -----
   FILE: input.ORBITS
   R   = [  9.00000, 11.00000];  NR:    128
   PHI = [ 0.00000, 6.28319];  NPHI:   32
   Z   = [-1.00000, 1.00000];  NZ:    128
   # of Particles to Start:       40
   MAGNETIC FIELD FROM PLASMA ONLY!
----- Profile Parameters -----
   Zeff = [  1.00000,  1.00000];  NZEFF:    6
   PLASMA_MASS =    1.00728 amu
   PLASMA_ZAVG =    1.00000 <Z>
   PLASMA_ZMEAN =    1.00000 [Z]
----- VMEC Information -----
   FILE: ORBITS
   R       = [  9.00000, 11.00000]
   Z       = [-0.95106, 0.95106]
   BETA    =    0.00;  I  =   -0.64 [MA]
   AMINOR  =    1.00 [m]
   PHIEDGE =   15.70 [Wb]
   VOLUME  =  197.39 [m^3]

----- Constructing Splines -----
   R   = [  9.00000, 11.00000];  NR:    128
   PHI = [ 0.00000, 6.28319];  NPHI:   32
   Z   = [-1.00000, 1.00000];  NZ:    128
   HERMITE FORM: 1
 -----  Vessel Information  -----
   Wall Name :  HARMONICS
   Date      :  TODAY
   Faces     :   43200
   R_WALL   = [  9.00000, 11.00000]
   Z_WALL   = [ -1.00000,  1.00000]
----- FOLLOWING PARTICLE TRAJECTORIES -----
      Method: LSODE
   Particles:        40
       Steps:     33000   Delta-t:   30.303E-09
      NPOINC:      1000    dt_out:    1.000E-06
         Tol:   10.000E-09  Type: 10

----- BEAM DIAGNOSTICS -----
 BEAMLINE     ENERGY [keV]   CHARGE [e]   MASS [Mp]   Particles [#]   Lost [%]  Shinethrough [%]  Port [%]
    1             0             1             1              40          5.0          0.0          0.0
----- BEAMS3D DONE -----
BEAMS3D VERSION: 2.950000047683716
==== Vectors ====
  turning: 0.12277330387554908   0.12275448   105%
  delta: 0.009462204370692617   0.00946046   1778%
  R0: 10.5   10.5   0%
  R1: 10.499992712044104   10.499993   0%
  R100: 10.499753912830686   10.499754   68%
  R500: 10.503794778665622   10.503795   55%
=================
  STATUS: FAIL!!!!!

lazersos · 2021-03-23T09:58:44Z

@zhucaoxiang OK so this looks like the issue may just be a roundoff error. The code is showing that the particle positions agree well after 1 time step but for some particles (probably deeply trapped) after 100 timesteps the position isn't correct. There are a few things which could be the issue.

Can you upload the beams3d HDF5 file this run produced so I can take a look at it? This could be related to #110.

lazersos · 2021-03-23T10:25:10Z

@zhucaoxiang I suspect the issue is the -finit-real=snan option. It seems to screw up vmec_utils.f even when I run with the -O2 flag the code runs significantly slower when calling the routines from this code. It could be the magnetic field lookup is being messed up by said flag.

zhucaoxiang · 2021-03-23T12:51:18Z

@lazersos But both on my laptop and the PPPL cluster, I was using the "release" mode. -finit-real=snan is not activated.
Attached you can find the two HDF5 files (one from my laptop with GCC-10 and the other from PPPL portal).

BTW, I am surprised that two 50-Mb-files can be compressed to 6 Mb.

beams3d.zip

lazersos · 2021-03-23T13:03:28Z

@zhucaoxiang Yeah it's what I suspected. The magnetic field grid isn't being initialized correctly:

Something odd if happening here. What I can confirm from my end is that the tests work fine for Macports with

FLAGS_R = -O2 -g -fexternal-blas -fbacktrace -fcheck=all

The code also runs fine on Draco, Cobra, Raven, and Marconi.

The file size is probably small because the grids have quite a bit of redundancy for this axisymmetric case.

zhucaoxiang · 2021-03-23T13:09:03Z

@lazersos On the PPPL cluster, I was using FLAGS_R = -O2 -fexternal-blas and on my laptop with "homebrew", I am using FLAGS_R = -O2 -fallow-argument-mismatch.

zhucaoxiang · 2021-03-25T16:33:25Z

@lazersos Source of the discrepancy on my laptop is found. I changed the compiler flag from FLAGS_R = -O2 -fallow-argument-mismatch to FLAGS_R = -O2 -fallow-argument-mismatch -g -fbacktrace -fcheck=all. Now I got all the numbers matched.

I am really surprised that -g -fbacktrace -fcheck=all makes a difference. In my understanding, they are either for debugging or for checking.

----- BEAMS3D DONE -----
BEAMS3D VERSION: 2.950000047683716
==== Vectors ====
  turning: 0.12277346481644447   0.12275448   0%
  delta: 0.009462146399886628   0.00946046   0%
  R0: 10.5   10.5   0%
  R1: 10.499992711949272   10.499993   0%
  R100: 10.499753902139718   10.499754   0%
  R500: 10.503794697358195   10.503795   0%
=================
  STATUS: PASS

zhucaoxiang · 2021-03-31T07:18:06Z

@lazersos There might be something wrong with the sources. If you add one line before line 784 in vmec_utils.f to print the value of ui, you will get NaN values when checking the BENCHMARKS/BEAMS3D/input.ORBITS (probably you have to run in debugging mode). This is probably the reason why it doesn't work. Do you have any clues?

[stellar-i01n1:2290228:0:2290228] Caught signal 8 (Floating point exception: floating-point invalid operation)
 ui =                      NaN

zhucaoxiang added the bug Something isn't working label Mar 16, 2021

zhucaoxiang assigned lazersos Mar 16, 2021

zhucaoxiang assigned jcschmitt and alexleviness Mar 16, 2021

lazersos linked a pull request Mar 18, 2021 that will close this issue

Lazerson #79

Merged

lazersos closed this as completed in #79 Apr 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BEAMS3D not producing correct results with PPPL GCC #109

BEAMS3D not producing correct results with PPPL GCC #109

zhucaoxiang commented Mar 16, 2021

zhucaoxiang commented Mar 16, 2021

lazersos commented Mar 17, 2021

zhucaoxiang commented Mar 17, 2021

pomphrey commented Mar 17, 2021

lazersos commented Mar 17, 2021

lazersos commented Mar 18, 2021

zhucaoxiang commented Mar 18, 2021

lazersos commented Mar 18, 2021

zhucaoxiang commented Mar 19, 2021

lazersos commented Mar 19, 2021

zhucaoxiang commented Mar 19, 2021

zhucaoxiang commented Mar 22, 2021

lazersos commented Mar 23, 2021

lazersos commented Mar 23, 2021 •

edited

Loading

zhucaoxiang commented Mar 23, 2021

lazersos commented Mar 23, 2021

zhucaoxiang commented Mar 23, 2021

zhucaoxiang commented Mar 25, 2021

zhucaoxiang commented Mar 31, 2021

BEAMS3D not producing correct results with PPPL GCC #109

BEAMS3D not producing correct results with PPPL GCC #109

Comments

zhucaoxiang commented Mar 16, 2021

zhucaoxiang commented Mar 16, 2021

lazersos commented Mar 17, 2021

zhucaoxiang commented Mar 17, 2021

pomphrey commented Mar 17, 2021

lazersos commented Mar 17, 2021

lazersos commented Mar 18, 2021

zhucaoxiang commented Mar 18, 2021

lazersos commented Mar 18, 2021

zhucaoxiang commented Mar 19, 2021

lazersos commented Mar 19, 2021

zhucaoxiang commented Mar 19, 2021

zhucaoxiang commented Mar 22, 2021

lazersos commented Mar 23, 2021

lazersos commented Mar 23, 2021 • edited Loading

zhucaoxiang commented Mar 23, 2021

lazersos commented Mar 23, 2021

zhucaoxiang commented Mar 23, 2021

zhucaoxiang commented Mar 25, 2021

zhucaoxiang commented Mar 31, 2021

lazersos commented Mar 23, 2021 •

edited

Loading