About

The vlp4d code solves the Vlasov-Poisson equations in 4D (2D space, 2D velocity). Numerically, vlp4d uses a semi-Lagrangian scheme, and the Vlasov solver is based on a directional Strang splitting. The Poisson equation is solved with 2D Fourier transforms. For simplicity, all directions currently use periodic boundary conditions.
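
As a rough illustration of the Fourier treatment of the field equation, the sketch below solves a 1D periodic analogue with a naive discrete Fourier transform. It is not the vlp4d solver: the reduction to 1D, the normalization dE/dx = rho, the O(n^2) transform, the zeroing of the mean and Nyquist modes, and the name `solve_field_1d` are all assumptions made to keep the example short, whereas vlp4d itself uses 2D transforms.

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Solve dE/dx = rho on a periodic domain of length L (assumed normalization).
// Uses a naive O(n^2) DFT so that the example has no external dependencies.
inline std::vector<double> solve_field_1d(const std::vector<double>& rho, double L) {
  const int n = static_cast<int>(rho.size());
  const double two_pi = 2.0 * std::acos(-1.0);
  const std::complex<double> I(0.0, 1.0);

  std::vector<std::complex<double>> E_hat(n);
  for (int m = 0; m < n; ++m) {
    // Forward transform of rho for mode m.
    std::complex<double> rho_hat(0.0, 0.0);
    for (int j = 0; j < n; ++j) {
      rho_hat += rho[j] * std::exp(-I * (two_pi * m * j / n));
    }
    // Map the index to a signed mode number, then to a wavenumber.
    const int ms = (m <= n / 2) ? m : m - n;
    const double k = two_pi * ms / L;
    if (ms == 0 || 2 * m == n) {
      // Mean mode (rho assumed neutral) and Nyquist mode are dropped.
      E_hat[m] = std::complex<double>(0.0, 0.0);
    } else {
      // In Fourier space the derivative becomes a multiplication by i*k.
      E_hat[m] = rho_hat / (I * k);
    }
  }

  // Inverse transform back to real space.
  std::vector<double> E(n);
  for (int j = 0; j < n; ++j) {
    std::complex<double> sum(0.0, 0.0);
    for (int m = 0; m < n; ++m) {
      sum += E_hat[m] * std::exp(I * (two_pi * m * j / n));
    }
    E[j] = sum.real() / n;
  }
  return E;
}
```

With rho = cos(x) sampled on [0, 2*pi), this sketch returns E = sin(x) up to round-off.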

Each time step of the Vlasov solver applies the following operators in order:

  • 1D advection along x (Dt/2)
  • 1D advection along y (Dt/2)
  • Poisson solver -> compute electric fields Ex and Ey
  • 1D advection along vx (Dt)
  • 1D advection along vy (Dt)
  • 1D advection along x (Dt/2)
  • 1D advection along y (Dt/2)

The interpolation operator used in the advection is a Lagrange polynomial of order 5 or 7, selected by a compilation flag (order 5 by default).
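
To make the interpolation step concrete, here is a minimal sketch of one 1D semi-Lagrangian advection on a periodic uniform grid. It assumes that "order 5" means a degree-5 polynomial through six neighbouring grid points and that the advection velocity is constant. The stencil placement and the names `lagrange5_weights` and `advect_periodic` are illustrative choices, not taken from the vlp4d sources.

```cpp
#include <array>
#include <cmath>
#include <vector>

// Lagrange weights for the six nodes k = -2..3, evaluated at an offset
// alpha (in grid units, 0 <= alpha < 1) from node 0.
inline std::array<double, 6> lagrange5_weights(double alpha) {
  std::array<double, 6> w{};
  for (int k = -2; k <= 3; ++k) {
    double wk = 1.0;
    for (int m = -2; m <= 3; ++m) {
      if (m != k) wk *= (alpha - m) / static_cast<double>(k - m);
    }
    w[k + 2] = wk;
  }
  return w;
}

// One step of df/dt + v df/dx = 0: f_new(x_i) = f_old(x_i - v*dt),
// with the value at the foot of the characteristic interpolated.
inline std::vector<double> advect_periodic(const std::vector<double>& f,
                                           double v, double dt, double dx) {
  const int n = static_cast<int>(f.size());
  std::vector<double> out(f.size());
  for (int i = 0; i < n; ++i) {
    const double foot = i - v * dt / dx;  // foot position in grid units
    const double base = std::floor(foot);
    const std::array<double, 6> w = lagrange5_weights(foot - base);
    const int j = static_cast<int>(base);
    double sum = 0.0;
    for (int k = -2; k <= 3; ++k) {
      const int idx = ((j + k) % n + n) % n;  // periodic wrap
      sum += w[k + 2] * f[idx];
    }
    out[i] = sum;
  }
  return out;
}
```

When the displacement is a whole number of cells, the weights collapse to a single 1 and the step reduces to a cyclic shift of the array.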

Detailed descriptions of the test cases can be found in

For questions or comments, please contact us using the details in the AUTHORS file.

HPC

From the viewpoint of high performance computing (HPC), the code is parallelized with OpenMP without MPI domain decomposition. To investigate the performance portability of this kind of kinetic plasma simulation code, we implement the mini-app with mixed OpenACC/OpenMP and with Kokkos, suppressing unnecessary duplication of code lines. The detailed description and obtained performance are found in

Test environments

We have tested the code on the following environments.

  • Nvidia Tesla P100 on Tsubame3.0 (Tokyo Tech, Japan)
    Compilers (cuda/8.0.61, pgi19.1)

  • Nvidia Tesla V100 on Summit (OLCF, US)
    Compilers (cuda/10.1.168, pgi19.1)

  • Intel Skylake on JFRS-1 (IFERC-CSC, Japan)
    Compilers (intel19.0.0.117)

  • Marvell Thunder X2 on CEA Computing Complex (CEA, France)
    Compilers (armclang19.2.0)

Usage

Compile

Depending on your configuration, you may have to modify the Makefile. You may add your configuration in the same way as

ifneq (,$(findstring p100,$(DEVICES)))
CXXFLAGS=-O3 -I/apps/t3/sles12sp2/cuda/8.0.61/include -ta=nvidia:cc60 -Minfo -std=c++11 -DOWN_INDEX_SEQUENCE -DNO_ASSERT_IN_CONSTEXPR -DENABLE_OPENACC
CXX=pgc++
LDFLAGS = -Mcudalib=cufft -ta=nvidia:cc60 -acc
TARGET = vlp4d.p100_acc
endif

OpenACC version

export DEVICE=device_name # choose the device_name from "p100", "v100", "bdw", "skx", "tx2"
cd src_openacc
make

OpenMP4.5 version

export DEVICE=device_name # choose the device_name from "v100"
cd src_openmp4.5
make

Kokkos version

First, you need to install Kokkos in your environment. Instructions are available at https://github.com/kokkos/kokkos. The following example assumes that Kokkos is located at "your_kokkos_path".

export KOKKOS_PATH=your_kokkos_path # set your_kokkos_path
export DEVICE=device_name # choose the device_name from "p100", "v100", "bdw", "skx", "tx2"
export RANGE_POLICY=3D # optional: use the 3D MDRangePolicy for better performance
cd src_kokkos
make

Test

Depending on your configuration, you may have to modify the job.sh in wk and sub_*.sh in wk/batch_scripts.

cd wk
./job.sh
gnuplot -e 'plot "nrj.out" u 2 w l, "nrj_SLD10" u 2; pause -1' 

To check the results, compare the two curves: the nrj.out curve should closely follow nrj_SLD10.
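
If a numerical check is preferred over a visual one, the comparison could be sketched as below. The sketch assumes both files hold whitespace-separated numeric columns with the quantity of interest in the second column (as in the gnuplot command above). The helper names are hypothetical, and what counts as "close enough" is left to the user.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read the second whitespace-separated column of every numeric line.
// Lines that do not start with two numbers (blank, comments) are skipped.
inline std::vector<double> read_column2(std::istream& in) {
  std::vector<double> col;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    double c1 = 0.0, c2 = 0.0;
    if (fields >> c1 >> c2) col.push_back(c2);
  }
  return col;
}

// Largest absolute difference over the rows that both curves share.
inline double max_abs_diff(const std::vector<double>& a,
                           const std::vector<double>& b) {
  const std::size_t n = std::min(a.size(), b.size());
  double d = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    d = std::max(d, std::fabs(a[i] - b[i]));
  }
  return d;
}
```

In practice the two streams would be `std::ifstream` objects opened on nrj.out and nrj_SLD10, and the returned difference would be compared against a tolerance of your choosing.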
To measure performance and reproduce the results in the SC paper, change the argument in the bash script from "SLD10.dat" to "SLD10_large.dat". For example, in wk/batch_scripts/sub_p100_kokkos.sh, the last line should be changed as follows.

Original (Before change)
./vlp4d.p100_kokkos SLD10.dat
SC19 (After change)
./vlp4d.p100_kokkos SLD10_large.dat

You can also try the two-beam instability by setting the argument to "TSI20.dat".