"Non-numeric pressure" error (pressure = -nan) when attempting NPT in LAMMPS #53

Open
samueldyoung29ctr opened this issue Aug 23, 2024 · 6 comments


samueldyoung29ctr commented Aug 23, 2024

I'm trying to do NPT calculations in LAMMPS using pair_allegro, but the Allegro model I trained is predicting -nan for the system pressure, so the NPT run fails. If I run under NVE or NVT, including the press property in the LAMMPS thermo logging, I see output like this:

   Step          Time           Temp          Press   
         0   0              450           -nan        
         1   0.001          450.54175     -nan        
         2   0.002          444.03951     -nan        
         3   0.003          440.5265      -nan        

If I attempt an NPT calculation, like this:

# Run MD in the NPT ensemble, with a Nosé-Hoover thermostat starting at 450.0 K.
units metal
fix mynose all npt & 
    temp 450.0 450.0 0.05 &
    tchain 3 &
    iso 1.0 1.0 0.3
run 1000

I immediately get this error:

Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
ERROR: Non-numeric pressure - simulation unstable (src/fix_nh.cpp:1049)

I am training on an ASE dataset with stress information stored in units of energy / length^2:

In [11]: from ase.io import read

In [12]: traj = read("./trainval.traj", index=0)

In [13]: traj.get_stress()
Out[13]:
array([ 0.00203341,  0.00251201,  0.0009121 , -0.00069527,  0.00031445,
       -0.00031556])

and I can confirm that when using nequip-evaluate for inference on this same deployed model, I do get predicted stresses in the output file.

$ nequip-evaluate --train-dir <my training dir> --test-indexes test-indices.json --model test-64bit-deployed-model.pth --output output.xyz

$ head -n 3 output.xyz
440
Lattice="16.44719886779785 0.0 0.0 0.0 16.44719886779785 0.0 0.0 0.0 16.44719886779785" Properties=species:S:1:pos:R:3:energies:R:1:forces:R:3 original_dataset_index=1 energy=-2975.787905704823 stress="0.0008169945453196768 -0.00012941664369788713 0.0009707167252817042 -0.00012941664369788713 0.0004451946509430406 -0.0010155815638519358 0.0009707167252817042 -0.0010155815638519358 -0.00377564117326529" pbc="F F F"
O       -2.82277608      -6.28169346       9.18250656      -6.48192610       0.01213487      -0.06158887      -0.59497979
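
For reference, reading the predicted stress back out of this file with ASE looks like the sketch below (depending on the ASE version, the per-frame stress ends up attached to a single-point calculator or stays in atoms.info):

from ase.io import read

# Read the first frame of the nequip-evaluate output and pull out the
# predicted stress. Depending on the ASE version, the stress is attached to a
# SinglePointCalculator (so get_stress() works) or is left in atoms.info.
frame = read("output.xyz", index=0)
try:
    stress = frame.get_stress(voigt=False)  # 3x3 stress tensor
except Exception:
    stress = frame.info.get("stress")
print(stress)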

This problem happens whether I use default_dtype: float32 in the config (with pair_style allegro3232 in the LAMMPS script) or default_dtype: float64 (with pair_style allegro). I am using NequIP 0.6.1, mir-allegro 0.2.0, and PyTorch 1.11.0 (CUDA 11.3, cuDNN 8.2.0). I compiled LAMMPS 02Aug23 using pair_allegro commit 20538c9, which is the commit that introduced stress support. Details of my LAMMPS compilation, an example of my training config (apart from the default_dtype setting), and an example NVT LAMMPS input script are here.

I force deletion of any overlapping atoms before the NPT run, and I see no indication when running under NVT or NVE that atoms are too close together, have very high forces, or are otherwise driving the simulation unstable. If I switch my LAMMPS input to pair_style lj/cut, I do see finite pressures in the thermo output.

Is there something obvious I'm missing about how to get pair_allegro to pass the stress predictions from my Allegro models to LAMMPS?

Thanks!

@Linux-cpp-lisp (Collaborator)

Hi @samueldyoung29ctr ,

Hm, odd. If you start a simulation in LAMMPS from exactly a frame that shows up in your training data, or from the frame for which nequip-evaluate reports a stress tensor, do you get the expected result at least on that first frame?

@samueldyoung29ctr (Author)

All of the starting geometries I launch from LAMMPS appear to show -nan pressure, even on the first step. I tried printing the virial tensor elements directly as they come from the torch model output, just after they are passed to LAMMPS.

Diagnostic printing of virial tensor elements

pair_allegro.cpp:

  if (vflag) {
    torch::Tensor v_tensor = output.at("virial").toTensor().cpu();
    auto v = v_tensor.accessor<outputtype, 3>();
    // Convert from 3x3 symmetric tensor format, which NequIP outputs, to the flattened form LAMMPS expects
    // First [0] index on v is batch
    virial[0] = v[0][0][0];
    virial[1] = v[0][1][1];
    virial[2] = v[0][2][2];
    virial[3] = v[0][0][1];
    virial[4] = v[0][0][2];
    virial[5] = v[0][1][2];
+   std::cout << "\tVirial Voigt vector: " << std::to_string(virial[0]) << ", " << std::to_string(virial[1]) << ", " << std::to_string(virial[2]) << ", " << std::to_string(virial[3]) << ", " << std::to_string(virial[4]) << ", " << std::to_string(virial[5]) << ".\n";
  }

pair_allegro_kokkos.cpp:

  if(vflag){
    torch::Tensor v_tensor = output.at("virial").toTensor().cpu();
    auto v = v_tensor.accessor<outputtype, 3>();
    // Convert from 3x3 symmetric tensor format, which NequIP outputs, to the flattened form LAMMPS expects
    // First [0] index on v is batch
    this->virial[0] = v[0][0][0];
    this->virial[1] = v[0][1][1];
    this->virial[2] = v[0][2][2];
    this->virial[3] = v[0][0][1];
    this->virial[4] = v[0][0][2];
    this->virial[5] = v[0][1][2];
+   std::cout << "\tVirial Voigt vector: " << std::to_string(this->virial[0]) << ", " << std::to_string(this->virial[1]) << ", " << std::to_string(this->virial[2]) << ", " << std::to_string(this->virial[3]) << ", " << std::to_string(this->virial[4]) << ", " << std::to_string(this->virial[5]) << ".\n";
  }

All virial components are -nan in the Kokkos-accelerated version, leading to the -nan pressure.

Outputs from running 10 steps of NVE

using pair_allegro:

Per MPI rank memory allocation (min/avg/max) = 4.907 | 4.907 | 4.907 Mbytes
   Step          Time           Temp          Press          PotEng         KinEng         TotEng         E_pair         E_bond       Econserve         Fmax
         0   0              450            7123.5399     -23812.81       204.6899      -23608.121     -23812.81       0             -23608.121      4.6981153
        Virial Voigt vector: 29.758970, 45.364822, 7.063734, 3.145753, -8.978357, -9.596899.
         1   0.0005         451.08768      7333.0305     -23813.316      205.18465     -23608.132     -23813.316      0             -23608.132      5.067274
        Virial Voigt vector: 39.200824, 45.856633, 13.364390, 5.394424, -21.287504, -7.662407.
         2   0.001          451.81109      7584.5196     -23813.664      205.5137      -23608.15      -23813.664      0             -23608.15       4.8237129
        Virial Voigt vector: 49.247942, 45.048843, 23.481787, 7.260359, -31.039119, -4.989960.
         3   0.0015         451.6869       7871.0145     -23813.617      205.45722     -23608.159     -23813.617      0             -23608.159      4.4277271
        Virial Voigt vector: 56.195390, 42.077853, 37.043569, 7.770809, -36.851748, -2.044238.
         4   0.002          450.46312      8115.5436     -23813.056      204.90056     -23608.155     -23813.056      0             -23608.155      4.4915488
        Virial Voigt vector: 59.136834, 37.968314, 54.470379, 6.436822, -38.168621, 0.849517.
         5   0.0025         448.37065      8329.2582     -23812.081      203.94876     -23608.133     -23812.081      0             -23608.133      4.3433504
        Virial Voigt vector: 57.057529, 36.101378, 73.077232, 3.659967, -35.406512, 3.224243.
         6   0.003          446.1616       8517.6018     -23811.049      202.94394     -23608.105     -23811.049      0             -23608.105      4.5180411
        Virial Voigt vector: 48.109272, 36.321093, 88.279174, 0.041631, -27.643993, 4.904825.
         7   0.0035         444.84745      8596.1772     -23810.426      202.34618     -23608.08      -23810.426      0             -23608.08       5.0060566
        Virial Voigt vector: 33.680779, 37.114159, 98.805940, -2.907780, -15.261765, 3.495411.
         8   0.004          445.1429       8553.8979     -23810.565      202.48057     -23608.085     -23810.565      0             -23608.085      5.2285217
        Virial Voigt vector: 19.102289, 39.672924, 101.937126, -5.476006, -0.462937, 0.214057.
         9   0.0045         446.96257      8446.2134     -23811.421      203.30828     -23608.113     -23811.421      0             -23608.113      5.5230971
        Virial Voigt vector: 9.956778, 44.490638, 97.406353, -8.867623, 14.901005, -3.847867.
        10   0.005          449.42526      8347.684      -23812.577      204.42847     -23608.148     -23812.577      0             -23608.148      5.7109114
Loop time of 4.88647 on 1 procs for 10 steps with 3520 atoms

With plain (non-Kokkos) pair_allegro the virial components are finite, and I can run NPT.
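
As a rough way to cross-check the printed virial against the thermo pressure, here is a sketch of the arithmetic (my assumptions: LAMMPS metal units, so the virial trace is in eV and the volume in Angstrom^3, with an ideal-gas kinetic term added; the box volume is not shown in the log above):

from ase.units import kB, bar

def pressure_bar(virial_trace_eV, volume_A3, natoms, temperature_K):
    # Virial contribution plus ideal-gas kinetic contribution, both in eV/A^3,
    # converted to bar via ase.units.
    p_virial = virial_trace_eV / (3.0 * volume_A3)
    p_kinetic = natoms * kB * temperature_K / volume_A3
    return (p_virial + p_kinetic) / bar

# e.g. step 0 above: trace = 29.758970 + 45.364822 + 7.063734 eV, 3520 atoms
# at 450 K; supply the actual box volume in Angstrom^3:
# pressure_bar(82.187526, volume_A3=..., natoms=3520, temperature_K=450.0)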

Output when running pair_allegro/kk:

Per MPI rank memory allocation (min/avg/max) = 3.84 | 3.84 | 3.84 Mbytes
   Step          Time           Temp          Press          PotEng         KinEng         TotEng         E_pair         E_bond       Econserve         Fmax
         0   0              450           -nan           -23812.81       204.6899      -23608.121     -23812.81       0             -23608.121      4.6981153

        Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
         1   0.0005         451.08768     -nan           -23813.316      205.18465     -23608.132     -23813.316      0             -23608.132      5.067274

        Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
         2   0.001          451.81109     -nan           -23813.664      205.5137      -23608.15      -23813.664      0             -23608.15       4.8237129

        Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
         3   0.0015         451.6869      -nan           -23813.617      205.45722     -23608.159     -23813.617      0             -23608.159      4.4277271

        Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
         4   0.002          450.46312     -nan           -23813.056      204.90056     -23608.155     -23813.056      0             -23608.155      4.4915488

        Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
         5   0.0025         448.37065     -nan           -23812.081      203.94877     -23608.133     -23812.081      0             -23608.133      4.3433504

        Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
         6   0.003          446.1616      -nan           -23811.049      202.94394     -23608.105     -23811.049      0             -23608.105      4.5180411

        Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
         7   0.0035         444.84745     -nan           -23810.426      202.34618     -23608.08      -23810.426      0             -23608.08       5.0060566

        Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
         8   0.004          445.1429      -nan           -23810.565      202.48057     -23608.085     -23810.565      0             -23608.085      5.2285217

        Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
         9   0.0045         446.96257     -nan           -23811.421      203.30828     -23608.113     -23811.421      0             -23608.113      5.5230971

        Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
        10   0.005          449.42526     -nan           -23812.577      204.42847     -23608.148     -23812.577      0             -23608.148      5.7109114
Loop time of 4.88749 on 1 procs for 10 steps with 3520 atoms

and the same error message when attempting NPT:

Setting up Verlet run ...
  Unit style    : metal
  Current step  : 10
  Time step     : 0.0005

        Virial Voigt vector: -nan, -nan, -nan, -nan, -nan, -nan.
ERROR: Non-numeric pressure - simulation unstable (src/fix_nh.cpp:1049)

Looks like the workaround is to use the non-Kokkos pair_allegro for now. I am building and linking against cxx11 libtorch 2.0.0 (CPU-only), as this is what NERSC's build uses. My build of LAMMPS is CPU-only due to resource constraints on the cluster I use. I do see in pair_allegro.cpp that there is an additional compute_custom_tensor packed into the model input that doesn't exist in the Kokkos version; perhaps this has something to do with enabling virial model outputs?

c10::Dict<std::string, torch::Tensor> input;
input.insert("pos", pos_tensor.to(device));
input.insert("edge_index", edges_tensor.to(device));
input.insert("atom_types", ij2type_tensor.to(device));
input.insert("compute_custom_output", compute_custom_tensor);
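
For completeness, the metadata baked into the deployed model can also be inspected; a rough sketch (I believe nequip exposes a loader helper for this, but treat the call and the exact metadata keys as version-dependent):

from nequip.scripts.deploy import load_deployed_model

# Load the deployed TorchScript model and print the metadata that
# nequip-deploy stored alongside it (r_max, type names, dtype settings, ...).
model, metadata = load_deployed_model("test-64bit-deployed-model.pth", device="cpu")
print(metadata)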

Happy to do some more testing if you'd like.

@Linux-cpp-lisp (Collaborator)

My build of LAMMPS is CPU-only due to resource constraints on the cluster I use.

Aha. I don't think we've ever actually tested Kokkos pair_allegro on CPU, nor am I sure we'd expect it to have any benefit over the plain OpenMP pair_allegro on CPU. @anjohan, thoughts?

Still, I guess we would have expected it to work...

compute_custom_tensor should not be relevant for virials.

@samueldyoung29ctr (Author) commented Aug 31, 2024

Also, just to clarify, we should be training Allegro models on stresses in units of energy / length^3, right? E.g., for LAMMPS metal units, we should train Allegro on stresses in units of eV/ang^3, not units of bar?
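
For reference, the convention I am assuming (ASE's documented eV/Å^3) and its conversion to bar are sketched below; please correct me if training actually expects bar:

from ase.units import bar

# ASE stresses are in eV/Angstrom^3 (energy / length^3); dividing by
# ase.units.bar converts to bar. Using the first Voigt component of the
# training data shown earlier:
stress_eV_per_A3 = 0.00203341
print("stress in bar:", stress_eV_per_A3 / bar)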

@samueldyoung29ctr (Author) commented Sep 12, 2024

Update: I got around to compiling LAMMPS with CUDA support, but I am still seeing this issue when using Kokkos on the GPUs (NVIDIA A100-SXM4-40GB, CUDA 12.4 drivers installed).

  • LAMMPS version: 02Aug23 (commit 46265e36c in official LAMMPS repo)
  • pair_allegro version: commit 20538c9 (Add stress, precision support, compute #43)
  • Build environment:
    • CUDA 11.6 (CUDA 12.4 installation is broken on my cluster)
    • GCC 10 compilers
    • AMD AOCL 4.0, providing non-Kokkos FFTW3, compiled by GCC
    • NVIDIA cuFFT (from CUDA 11.6), providing Kokkos FFTs
    • Penguin TrueHPC OpenMPI 4.1.4, compiled by GCC
    • Intel MKL 2024.1.0 (not used for FFTs, but just to provide bindings to allow compilation to succeed)
    • NVIDIA cuDNN 9.2.0.82_cuda11
    • NVIDIA libtorch 1.13.1+cu116-cxx11-abi (static, with deps)

My CMake config looks like this:

CMake config
cmake ../cmake \
    -D LAMMPS_EXCEPTIONS=ON \
    -D BUILD_SHARED_LIBS=ON \
    -D BUILD_MPI=yes \
    -D BUILD_OMP=yes \
    -C ../cmake/presets/gcc.cmake \
    -C ../cmake/presets/kokkos-cuda.cmake \
    -D PKG_KOKKOS=yes \
    -D Kokkos_ARCH_ZEN3=yes \
    -D Kokkos_ARCH_PASCAL60=no \
    -D Kokkos_ARCH_AMPERE80=yes \
    -D Kokkos_ENABLE_CUDA=yes \
    -D Kokkos_ENABLE_OPENMP=yes \
    -D CUFFT_LIBRARY=$CUDA_HOME/lib64/libcufft.so \
    -D CUDA_INCLUDE_DIRS=$CUDA_HOME/include \
    -D CUDA_CUDART_LIBRARY=$CUDA_HOME/lib64/libcudart.so \
    -D CAFFE2_USE_CUDNN=ON \
    -D BUILD_TOOLS=no \
    -D FFT=FFTW3 \
    -D FFT_KOKKOS=CUFFT \
    -D FFTW3_INCLUDE_DIR=$AOCL_ROOT/include \
    -D FFTW3_LIBRARY=$AOCL_LIB/libfftw3.so \
    -D FFTW3_OMP_LIBRARY=$AOCL_LIB/libfftw3_omp.so \
    -D CMAKE_INSTALL_PREFIX="$LAMMPS_ROOT" \
    -D PKG_MANYBODY=yes \
    -D PKG_MOLECULE=yes \
    -D PKG_KSPACE=yes \
    -D PKG_REPLICA=yes \
    -D PKG_ASPHERE=yes \
    -D PKG_RIGID=yes \
    -D PKG_MPIIO=yes \
    -D PKG_COMPRESS=yes \
    -D PKG_H5MD=no \
    -D PKG_OPENMP=yes \
    -D CMAKE_POSITION_INDEPENDENT_CODE=yes \
    -D CMAKE_EXE_FLAGS="-dynamic" \
    -D CMAKE_VERBOSE_MAKEFILE=TRUE

I am building a non-debug version of LAMMPS because the build fails when I try to enable debugging symbols, whether I use GCC or NVHPC compilers.

The Allegro model I am using was trained using NequIP 0.6.1, mir-allegro 0.2.0, and PyTorch 1.11.0 (py3.10_cuda11.3_cudnn8.2.0_0). It was also trained on a NVIDIA A100-SXM4-40GB GPU.

Allegro training config
BesselBasis_trainable: true
PolynomialCutoff_p: 48
append: true
ase_args:
  format: traj
avg_num_neighbors: auto
batch_size: 1
chemical_symbols:
- H
- O
dataset: ase
dataset_file_name: <path to train+val dataset as .traj file>
dataset_seed: 123456
default_dtype: float64
early_stopping_lower_bounds:
  LR: 1.0e-05
early_stopping_patiences:
  validation_loss: 100
early_stopping_upper_bounds:
  cumulative_wall: 604800.0
edge_eng_mlp_initialization: uniform
edge_eng_mlp_latent_dimensions:
- 32
edge_eng_mlp_nonlinearity: null
ema_decay: 0.999
ema_use_num_updates: true
embed_initial_edge: true
env_embed_mlp_initialization: uniform
env_embed_mlp_latent_dimensions: []
env_embed_mlp_nonlinearity: null
env_embed_multiplicity: 64
l_max: 2
latent_mlp_initialization: uniform
latent_mlp_latent_dimensions:
- 64
- 64
- 64
- 64
latent_mlp_nonlinearity: silu
latent_resnet: true
learning_rate: 0.001
loss_coeffs:
  forces:
  - 1
  - PerSpeciesL1Loss
  stress: 5000
  total_energy:
  - 20.0
  - PerAtomL1Loss
lr_scheduler_kwargs:
  cooldown: 0
  eps: 1.0e-08
  factor: 0.9
  min_lr: 0
  mode: min
  patience: 400
  threshold: 0.0001
  threshold_mode: rel
  verbose: false
lr_scheduler_name: ReduceLROnPlateau
max_epochs: 50000
metrics_components:
- - forces
  - mae
  - PerSpecies: true
    report_per_component: false
- - forces
  - rmse
  - PerSpecies: true
    report_per_component: false
- - forces
  - rmse
- - forces
  - mae
- - total_energy
  - mae
  - PerAtom: true
- - total_energy
  - mae
  - PerAtom: true
- - total_energy
  - rmse
  - PerAtom: true
- - stress
  - mae
- - stress
  - rmse
metrics_key: validation_loss
model_builders:
- allegro.model.Allegro
- PerSpeciesRescale
- StressForceOutput
- RescaleEnergyEtc
n_train: 2250
n_val: 250
num_layers: 2
optimizer_kwargs:
  amsgrad: false
  betas: !!python/tuple
  - 0.9
  - 0.999
  eps: 1.0e-08
  weight_decay: 0.0
optimizer_name: Adam
parity: o3_full
r_max: 4.0
root: results/wateronly
run_name: hpo-2f39852b9d648fa732723543b02d3ca4c3581ddc
seed: 123456
shuffle: true
train_val_split: random
two_body_latent_mlp_initialization: uniform
two_body_latent_mlp_latent_dimensions:
- 32
- 64
two_body_latent_mlp_nonlinearity: silu
use_ema: true
verbose: debug
wandb: true
wandb_project: <project name>

After deploying the best model to TorchScript format, I attempted to use it in a LAMMPS NPT simulation. The input geometry is a water-only system.

geom-thermalized-298.15K.data
LAMMPS data file via write_data, version 2 Aug 2023, timestep = 10000, units = metal

5184 atoms
2 atom types

0 38.7494136435 xlo xhi
0 38.7494136435 ylo yhi
0 38.7494136435 zlo zhi

Masses

1 1.008
2 15.999

Atoms # atomic

4014 1 2.016863143251884 1.935372548383521 4.290501263898246 1 0 0
3979 2 2.5980102214720313 2.127124762222952 3.5158172054292423 1 0 0
3980 1 2.426726184113488 1.330444453601437 2.933484352485481 1 0 0
4235 1 0.4427546772570891 1.8248251034116987 0.3139859162106747 1 0 1
306 2 4.479768747024378 3.0437024573024676 1.0988955165290661 0 0 1
18 1 3.7138355554631155 2.7497739123432603 0.5598095055376706 0 0 0
10 2 5.838633127133152 1.38484923725688 2.427887300431939 0 0 0
1003 1 5.019716962999001 1.8773596976102618 2.24263102951684 0 1 0
57 1 5.1020184395333015 0.2780756937364116 4.697073480526178 0 0 0
58 1 4.189152573156522 1.4148146498389342 5.047283101683429 0 0 0
11 1 6.320533570258894 2.2046199287416948 2.4275733453782147 0 0 0
...
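
A quick overlap check on this geometry can be done with something like the following sketch (note that the keyword ASE's lammps-data reader takes for the atom style differs between versions):

import numpy as np
from ase.io import read

# Read the LAMMPS data file and report the closest pair distance, using
# minimum-image distances for the periodic box. The atom-style keyword
# ("style" vs "atom_style") depends on the ASE version.
atoms = read("geom-thermalized-298.15K.data", format="lammps-data", style="atomic")
d = atoms.get_all_distances(mic=True)
np.fill_diagonal(d, np.inf)  # ignore self-distances
print("minimum pair distance (Angstrom):", d.min())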

There are no atom overlaps in this geometry, and the LAMMPS input script attempts to do NPT.

input.lammps
# LAMMPS script for our MD systems to validate Allegro potentials

# System-wide settings
units metal
dimension 3
atom_style atomic
boundary p p p

# System geometry
# initial_frame.data will be written into the working directory where this
# script is located.
read_data ./geom-thermalized-298.15K.data

# Simulation settings
mass 1 1.008
mass 2 15.999

pair_style allegro
pair_coeff * * ./hpo-2f39852b9d648fa732723543b02d3ca4c3581ddc.pth H O 


# PART B - MOLECULAR DYNAMICS
delete_atoms overlap 0.1 all all

# Logging
thermo 1
thermo_style custom step time temp press pe ke etotal epair ebond econserve fmax

# Try to rebuild neighbor lists more often
neigh_modify every 1 delay 0 check yes binsize 10.0

# Also try to specify larger cutoff for ghost atoms to avoid losing atoms.
comm_modify mode single cutoff 10.0 vel yes

# Try specifying initial velocities for all atoms
velocity all create 298.15 3127835 dist gaussian


# Run MD in the NPT ensemble, with a Nosé-Hoover thermostat starting at 298.15 K and a barostat starting at 1.01325 bar.
fix mynose all npt & 
    temp 298.15 298.15 0.011 &
    tchain 3 &
    iso 1.01325 1.01325 0.03


# Be sure to dump the MD trajectory
dump mdtraj all atom 40 mdtraj.lammpstrj
dump mdforces all custom 40 mdforces.lammpstrj id x y z vx vy vz fx fy fz 

timestep 0.0005
# Set up binary restart dumps every 1000 steps in case something goes wrong.
restart 1000 step-*.restart

# Normal run, with a single balance first
balance 1.0 shift xyz 100 1.0
run 20000

undump mdtraj
undump mdforces


# Finally, write out the final geometry of the system
write_data geom-equilibrated-1atm.data

I invoke LAMMPS with Kokkos:

srun --cpu-bind=cores --gpu-bind=none lmp -k on g 4 -sf kk -pk kokkos neigh full newton on -in input.lammps

This results in the following error:

LAMMPS stdout
Module cudnn/linux-x86_64-9.2.0.82_cuda11 is loading...
Module libtorch/1.13.1+cu116-cxx11-abi is loading...
Module lammps-tpc/2Aug23 is loading...

Loading lammps-tpc/2Aug23/gcc10-allegro-gpu
  Loading requirement: slurm scl/gcc-toolset-10 amd/aocl/gcc/4.0 cuda/cuda-11.6
    penguin/openmpi/4.1.4/gcc intel/tbb/2021.12 intel/compiler-rt/2024.1.0
    intel/mkl/2024.1 intel/oneapi-2024.1.0-mkl
    cudnn/linux-x86_64-9.2.0.82_cuda11 libtorch/1.13.1+cu116-cxx11-abi
LAMMPS (2 Aug 2023 - Update 3)
KOKKOS mode with Kokkos version 3.7.2 is enabled (src/KOKKOS/kokkos.cpp:108)
  will use up to 4 GPU(s) per node
WARNING: Turning off GPU-aware MPI since it is not detected, use '-pk kokkos gpu/aware on' to override (src/KOKKOS/kokkos.cpp:316)
  using 1 OpenMP thread(s) per MPI task
Reading data file ...
  orthogonal box = (0 0 0) to (38.749414 38.749414 38.749414)
  1 by 2 by 2 MPI processor grid
  reading atoms ...
  5184 atoms
  reading velocities ...
  5184 velocities
  read_data CPU = 0.030 seconds
Allegro is using input precision f and output precision d
Allegro: Loading model from ./hpo-2f39852b9d648fa732723543b02d3ca4c3581ddc.pth
Allegro: Freezing TorchScript model...
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | H | 1 | H
1 | O | 2 | O
ti=0 tj=0 cut=4.00
ti=0 tj=1 cut=4.00
ti=1 tj=0 cut=4.00
ti=1 tj=1 cut=4.00
System init for delete_atoms ...
Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6
  ghost atom cutoff = 6
  binsize = 6, bins = 7 7 7
  2 neighbor lists, perpetual/occasional/extra = 1 1 0
  (1) command delete_atoms, occasional, copy from (2)
      attributes: full, newton on
      pair build: copy/kk/device
      stencil: none
      bin: none
  (2) pair allegro/kk, perpetual
      attributes: full, newton on, kokkos_device
      pair build: full/bin/kk/device
      stencil: full/bin/3d
      bin: kk/device
Deleted 0 atoms, new total = 5184
Balancing ...
Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6
  ghost atom cutoff = 10
  binsize = 10, bins = 4 4 4
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair allegro/kk, perpetual
      attributes: full, newton on, kokkos_device
      pair build: full/bin/kk/device
      stencil: full/bin/3d
      bin: kk/device
 rebalancing time: 0.004 seconds
  iteration count = 26
  initial/final maximal load/proc = 1369 1329
  initial/final imbalance factor  = 1.0563272 1.025463
  x cuts: 0 1
  y cuts: 0 0.4954834 1
  z cuts: 0 0.49346924 1
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
ERROR: Non-numeric pressure - simulation unstable (src/fix_nh.cpp:1049)
Last command: run 20000

If I instead change to NVT like this:

fix mynose all nvt & 
    temp 298.15 298.15 0.011 &
    tchain 3 &
    # iso 1.01325 1.01325 0.03

and again run with Kokkos, the run starts, but the pressure is nan:

LAMMPS stdout with NVT, using Kokkos
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 3.543 | 3.546 | 3.549 Mbytes
   Step          Time           Temp          Press          PotEng  ...
         0   0              298.15         nan           -29784.777  ...
         1   0.0005         296.93805      nan           -29783.946  ...
         2   0.001          296.03497      nan           -29783.33   ...
         3   0.0015         295.55724      nan           -29783.005  ...
         4   0.002          295.38153      nan           -29782.879  ...
         5   0.0025         295.32922      nan           -29782.824  ...

If I take this same job, still using the GPU build of LAMMPS, and run it CPU-only without Kokkos:

srun --cpu-bind=cores --gpu-bind=none lmp -in input.lammps

then pressures are calculated (although they are quite high):

LAMMPS stdout with NVT, no Kokkos
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.0005
Per MPI rank memory allocation (min/avg/max) = 3.539 | 3.545 | 3.551 Mbytes
   Step          Time           Temp          Press          PotEng  ...
         0   0              298.15         4406.7395     -29784.805  ...
         1   0.0005         299.15125      4408.8153     -29785.494  ...
         2   0.001          300.23769      4425.8546     -29786.253  ...
         3   0.0015         300.77332      4476.0844     -29786.639  ...
         4   0.002          300.32951      4566.4221     -29786.356  ...
         5   0.0025         298.96391      4682.4999     -29785.437  ...
         6   0.003          297.23743      4794.6713     -29784.27   ...

Any advice on what to try? I have been using LAMMPS 02Aug23 because the folks at NERSC use that version for their LAMMPS + pair_allegro installation, but is there a different LAMMPS release you recommend?

The admins on my cluster are also going to fix CUDA 12.4, so I should be able to build against more recent CUDA and related libraries in the next few weeks.

Thanks!


Update 13 Sep 2024: The nan pressures persist even when compiling against the latest LAMMPS development branch (Git commit 2995cb7, from git clone --depth=1 https://github.com/lammps/lammps).

@samueldyoung29ctr (Author)

Update: I also tried running calculations using the LAMMPS + Kokkos + pair_allegro installation available on NERSC Perlmutter (the nersc/lammps_allegro:23.08 Shifter image, hash 3bd7ce2e78, built on 03 Oct 2024). This time, I am getting nan values for the pressures (as opposed to the -nan values on my other cluster).

NERSC told me that this image was built using the following Dockerfile:

Dockerfile for NERSC `nersc/lammps_allegro:23.08` image
FROM nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu22.04
WORKDIR /opt
ENV DEBIAN_FRONTEND noninteractive


RUN \
    apt-get update        &&   \
    apt-get install --yes      \
        build-essential        \
        autoconf               \
        cmake                  \
        flex                   \
        bison                  \
        zlib1g-dev             \
        fftw-dev               \
        fftw3                  \
        apbs                   \
        libicu-dev             \
        libbz2-dev             \
        libboost-all-dev       \
        libgmp-dev             \
        bc                     \
        libblas-dev            \
        liblapack-dev          \
        libfftw3-dev           \
        automake               \
        lsb-core               \
        libxc-dev              \
        git                    \
        unzip                  \
        clang                  \
        llvm                   \
        gcc                    \
        g++                    \
        libgsl-dev             \
        libhdf5-serial-dev     \
        cmake                  \
        intel-mkl-full         \
        vim                    \
        python3                \
        python3-pip            \
        mlocate                \
        wget              &&   \
    apt-get clean all


ARG mpich=4.1.1
ARG mpich_prefix=mpich-$mpich

RUN \
    wget https://www.mpich.org/static/downloads/$mpich/$mpich_prefix.tar.gz && \
    tar xvzf $mpich_prefix.tar.gz                                           && \
    cd $mpich_prefix                                                        && \
    ./configure FFLAGS=-fallow-argument-mismatch FCFLAGS=-fallow-argument-mismatch && \
    make -j 16                                                              && \
    make install                                                            && \
    make clean                                                              && \
    cd ..                                                                   && \
    rm -rf $mpich_prefix

RUN /sbin/ldconfig
ENV MPI_PATH=/opt/mpich/install
ENV PATH=$PATH:/opt/mpich/install/bin
ENV PATH=$PATH:/opt/mpich/install/include
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/mpich/install/lib
#RUN which mpicc
#RUN env MPICC=/opt/mpich/install/bin/mpicc python3 -m pip install mpi4py


# Install miniconda
ENV installer=Miniconda3-py39_4.12.0-Linux-x86_64.sh
RUN wget https://repo.anaconda.com/miniconda/$installer && \
    /bin/bash $installer -b -p /opt/miniconda3          && \
    rm -rf $installer
ENV PATH=/opt/miniconda3/bin:$PATH
RUN pip install numpy scipy matplotlib setuptools
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
RUN pip install mkl mkl-devel mkl-static mkl-include
RUN pip install ninja 
RUN pip install wandb


#Installing lammps
WORKDIR /opt
RUN cd /opt
RUN git clone -b stable_2Aug2023_update2 --depth 1 https://github.com/lammps/lammps.git lammps 
RUN git clone -b multicut https://github.com/mir-group/pair_allegro.git pair_allegro
RUN cd /opt/pair_allegro && \
    ./patch_lammps.sh /opt/lammps

RUN apt-get install --yes clang-format xxd
RUN wget https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.0.0%2Bcu118.zip
RUN unzip libtorch-cxx11-abi-shared-with-deps-2.0.0+cu118.zip
RUN rm -rf libtorch-cxx11-abi-shared-with-deps-2.0.0+cu118.zip
RUN mv libtorch libtorch-gpu
ENV PATH=$PATH:/opt/libtorch-gpu/bin
ENV PATH=$PATH:/opt/libtorch-gpu/include
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/libtorch-gpu/lib


ENV TORCH_CUDA_ARCH_LIST="8.0 8.6 8.9 9.0"
ENV CONDA_PREFIX="/opt/miniconda3"
ENV PATH=$PATH:/opt/lammps/build/plumed_build-prefix/bin
ENV PATH=$PATH:/opt/lammps/build/plumed_build-prefix/include
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/lammps/build/plumed_build-prefix/lib
ENV PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/lammps/build/plumed_build-prefix/lib/pkgconfig
ENV PLUMED_KERNEL=/opt/lammps/build/plumed_build-prefix/lib/libplumedKernel.so
WORKDIR /opt/lammps
RUN mkdir build
WORKDIR /opt/lammps/build
RUN cmake -DMKL_INCLUDE_DIR=$CONDA_PREFIX/include -DMKL_LIBRARY=$CONDA_PREFIX/lib -D CMAKE_BUILD_TYPE=Release \
          -D CMAKE_PREFIX_PATH=/opt/libtorch-gpu \
          -D CMAKE_INSTALL_PREFIX=/opt/lammps/install -D CMAKE_CXX_STANDARD=17 -D CMAKE_CXX_STANDARD_REQUIRED=ON \
          -D BUILD_MPI=ON -D CMAKE_CXX_COMPILER=/opt/lammps/lib/kokkos/bin/nvcc_wrapper -D BUILD_SHARED_LIBS=ON \
          -D PKG_MANYBODY=ON -D PKG_MOLECULE=ON -D PKG_KSPACE=ON -D PKG_REPLICA=ON -D PKG_REAXFF=ON -D PKG_QEQ=ON \
          -D PKG_PHONON=ON -D PKG_ELECTRODE=yes -D PKG_PLUMED=yes -D DOWNLOAD_PLUMED=yes -D PLUMED_MODE=shared \
          -D BUILD_SHARED_LIBS=ON -D PKG_KOKKOS=yes -D Kokkos_ARCH_AMPERE80=ON -D Kokkos_ENABLE_CUDA=yes \
          -D CMAKE_PREFIX_PATH=/opt/libtorch-gpu ../cmake
RUN make -j 4
RUN make install
ENV PATH=/opt/lammps/install/bin:$PATH
ENV PATH=/opt/lammps/install/lib:$PATH
ENV PATH=/opt/lammps/install/include:$PATH
ENV LD_LIBRARY_PATH=/opt/lammps/install/lib:$LD_LIBRARY_PATH

I am again finding that pressures are not correctly calculated when LAMMPS + pair_allegro is invoked with Kokkos flags.

Slurm script to use containerized LAMMPS on NERSC with Kokkos
#!/bin/bash
#SBATCH --image docker:nersc/lammps_allegro:23.08
#SBATCH --job-name=pressure-test
#SBATCH --account=mXXXX
#SBATCH --qos=debug
#SBATCH --nodes=1
#SBATCH --constraint=gpu
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=32
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=none
#SBATCH --time=5:00
#SBATCH --error=vt_lammps%j.err
#SBATCH --output=vt_lammps%j.out
#SBATCH [email protected]
#SBATCH --mail-type=ALL
#
#SBATCH --open-mode=append

# OpenMP parallelization
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PROC_BIND=spread
export OMP_PLACES=threads

# Ensure that stack size is unlimited, or you may get a segfault error when
# attempting to run a MPI job.
ulimit -s unlimited
ulimit -S unlimited
ulimit -H unlimited

# Run LAMMPS
exe="lmp"
input='-k on g 4 -sf kk -pk kokkos newton on neigh full -in input.lammps'
srun --cpu-bind=cores --gpu-bind=none --module mpich,gpu shifter $exe $input

Under NVT conditions, when running with Kokkos, the thermo output shows nan pressures like this:

   Step          Time           Temp          Press          PotEng
         0   0              298.15         nan           -29472.012
         1   0.0005         297.55398      nan           -29471.599
         2   0.001          298.33164      nan           -29472.145
         3   0.0015         300.09509      nan           -29473.349
         4   0.002          302.16364      nan           -29474.772

and trying to do NPT with Kokkos results in the same error about non-numeric pressure:

Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 6
  ghost atom cutoff = 10
  binsize = 10, bins = 4 4 4
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair allegro/kk, perpetual
      attributes: full, newton on, kokkos_device
      pair build: full/bin/kk/device
      stencil: full/bin/3d
      bin: kk/device
 rebalancing time: 0.002 seconds
  iteration count = 26
  initial/final maximal load/proc = 1369 1329
  initial/final imbalance factor  = 1.0563272 1.025463
  x cuts: 0 1
  y cuts: 0 0.4954834 1
  z cuts: 0 0.49346924 1
run 20000
ERROR: Non-numeric pressure - simulation unstable (src/fix_nh.cpp:1049)
Last command: run 20000

However, when running LAMMPS without Kokkos, system pressures are calculated and the NPT simulation completes without issue. It appears that NPT calculations using multiple GPUs are not possible on Perlmutter.

Slurm script to use containerized LAMMPS on NERSC without Kokkos
#!/bin/bash
#SBATCH --image docker:nersc/lammps_allegro:23.08
#SBATCH --job-name=pressure-test
#SBATCH --account=mXXXX
#SBATCH --qos=regular
#SBATCH --nodes=1
#SBATCH --constraint=cpu
#SBATCH --ntasks-per-node=128
#SBATCH --cpus-per-task=1
#SBATCH --time=2:00:00
#SBATCH --error=vt_lammps%j.err
#SBATCH --output=vt_lammps%j.out
#SBATCH [email protected]
#SBATCH --mail-type=ALL
#
#SBATCH --open-mode=append

# OpenMP parallelization
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PROC_BIND=spread
export OMP_PLACES=threads

# Ensure that stack size is unlimited, or you may get a segfault error when
# attempting to run a MPI job.
ulimit -s unlimited
ulimit -S unlimited
ulimit -H unlimited

# Run LAMMPS
exe="lmp"
input='-in input.lammps'
srun --cpu-bind=cores --gpu-bind=none --module mpich shifter $exe $input

It appears that you used Perlmutter GPU nodes for your scaling experiments in the SC'23 conference paper and A100 GPUs in the scaling testing of Allegro in the Nat. Comm. paper. Were the LAMMPS + pair_allegro builds for these experiments done before mir-allegro and pair_allegro got support for computing the virial tensor? Or was there a specific build configuration you used on Perlmutter to get virial tensor computation and thus support for NPT?

Thanks!
