# Tutorial on running 2x2_sim (Apr 2024)
This tutorial is intended to provide a high-level understanding of how the 2x2 simulation chain works, and how to run it interactively at NERSC. The larnd-sim detector simulation makes use of GPUs, and NERSC is, for us, an especially convenient and abundant source of GPU resources, so that's where we've been running the chain. We are happy to provide support for anyone who wants to run the chain elsewhere, but that's not in scope for this tutorial. Nor is this tutorial intended to provide details on the underlying simulation packages. Nor is it intended to describe how we've been running mini-production campaigns at NERSC.
The chain is based on a small set of separate, decoupled steps that "communicate" only via the files they produce. Each step is configured solely by environment variables, which by convention begin with `ARCUBE_`. These steps are:
- GENIE: The event generator. Responsible for taking the NuMI flux files and the geometry description and generating neutrino interactions.
- edep-sim: The Geant4 wrapper. Responsible for taking the outgoing particles from GENIE interactions and propagating them through the geometry, recording the particle trajectories and any energy deposited in active ("sensitive") detector volumes.
We normally run GENIE and edep-sim in two parallel paths, with different GENIE geometries (but the same edep-sim geometry, namely the full rock + hall + detector geometry):
- The "nu" path, where GENIE sees a geometry that contains the hall and detectors (MINERvA and the 2x2), but none of the surrounding rock.
- The "rock" path, where GENIE sees a geometry that has just the rock and an empty hall.
The purpose of this two-path approach is to keep the rock interactions in a separate sample, so that they can be reused later if necessary, conserving computational resources.
Returning to the packages that make up the chain:
- hadd: From ROOT; responsible for merging multiple edep-sim outputs into a single file. We do this in order to decouple the walltime requirements of GENIE+edep-sim from those of the later steps. Currently, we "hadd together" 10 edep-sim files at a time (separately for the "nu" and "rock" paths), as sketched just after this list. These merged files are then passed to the next step (the spill builder).
- spill-build: A simple ROOT macro that takes the "nu" and "rock" edep-sim files and, based on the POT per spill, the spill structure, and the spill separation, overlays the events into spills.
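Conceptually, the merge performed by the hadd step is just ROOT's standard `hadd` utility. A minimal sketch (the filenames here are illustrative; `run_hadd.sh` derives the real names from the `ARCUBE_*` variables):

```bash
# Illustrative only: merge 10 edep-sim outputs into one file with ROOT's
# hadd utility (-f overwrites the target file if it already exists).
hadd -f merged.EDEPSIM.root input.0000{0..9}.EDEPSIM.root
```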
At this point, the ROOT files from the spill builder are passed on to the MINERvA chain, which consists of two steps:
- edep2flat: Converts the "object-oriented" edep-sim ROOT TTree into a "flat" ntuple-like TTree suitable as input to the `minerva` step.
- minerva: Runs the Gaudi-based MINERvA detector simulation, which has been modified to read the "flat" edep-sim file format instead of using its own event generator and Geant4 wrapper. The output of this step is not used until the final CAF-making step (although the eventual goal is to feed it into a joint reconstruction with the simulated 2x2 data).
Meanwhile, for the 2x2, the steps continue:
- larnd-sim: The detector simulation for the charge (LArPix) and light readout. Written in Python, with the "heavy lifting" compiled to GPU (CUDA) binaries using Numba.
- ndlar_flow: Calibration and low-level reconstruction. Written in numpy-based Python using Peter's "h5flow" framework.
- validation: Produces matplotlib-based validation plots, as multi-page PDFs, for the various preceding steps.
- mlreco: Machine-learning reconstruction. Consists of three sub-steps: `flow2supera` (converts the output of ndlar_flow into input suitable for mlreco), `mlreco_inference` (runs inference on GPUs), and `mlreco_analysis` (runs post-processing on CPUs).
- pandora: "Traditional" reconstruction, widely used in liquid argon TPCs. Currently run outside of this chain, but integration is planned. It will not be covered further in this version of the tutorial.
- cafmaker: Produces high-level Common Analysis Files (CAFs) summarizing the output of the reconstruction and (in the case of simulation) the MC truth. CAFs are intended to be the primary inputs for most physics analyses.
Also worth mentioning: the "g4numi" package is responsible for running the (Geant4-based) NuMI beamline simulation and producing the "dk2nu" flux files that GENIE consumes. However, we don't run g4numi as an explicit step in this chain. Instead, we've been using a static set of dk2nu files copied over from a previous g4numi production run at Fermilab.
If you'd like to follow this tutorial directly, you will need a computing account at NERSC. To request an account, follow the instructions here. The project name is `dune`. Assuming you have an account, you'll want to run these steps on the Perlmutter system, which provides GPUs. To log in:
```bash
ssh saul.nersc.gov
```
If you want to run the chain elsewhere, make sure that GPUs are available. (Only NVIDIA GPUs are supported for now.) Options include the Wilson Cluster at Fermilab, the S(3)DF cluster at SLAC, and the Polaris system at Argonne. The main change you will need to make is the container setup (see the next section). NERSC has its own "Shifter" container runtime, which can import Docker containers from Docker Hub, while the Wilson Cluster and S3DF support Singularity/Apptainer. The `ARCUBE_RUNTIME` environment variable can be used to specify the container runtime. For details (which are outside the scope of this tutorial), see `2x2_sim/util/reload_in_container.inc.sh`.
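For example, on a cluster with Singularity/Apptainer, the container-related setup might look like the following (a hypothetical sketch; the `.sif` filename and paths are placeholders, and the variables are explained in the next section):

```bash
# Hypothetical non-NERSC configuration using Singularity instead of
# Shifter. The .sif file name and the paths below are placeholders.
export ARCUBE_RUNTIME=SINGULARITY
export ARCUBE_CONTAINER=sim2x2_ndlar011.sif
export ARCUBE_CONTAINER_DIR=/path/to/containers
export ARCUBE_DIR=/path/to/2x2_sim
```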
There are a number of subdirectories, whose names begin with `run-`, which contain the individual steps in the chain. In the order in which they're run:

- run-genie
- run-edep-sim
- run-hadd
- run-spill-build
- run-edep2flat
- run-minerva
- run-convert2h5
- run-larnd-sim
- run-ndlar-flow
- run-mlreco
- run-cafmaker
- run-validation
Within each of these subdirectories, there's a corresponding "run script", e.g. `run_edep_sim.sh`. These scripts should be run directly from the native Perlmutter OS, not from inside the container. The scripts themselves take care of entering the container, loading any necessary modules or Python environments, etc.
The run scripts do not take any command-line arguments. Instead, they are controlled entirely by environment variables which, by convention, begin with `ARCUBE_`. A couple of important common environment variables:
- `ARCUBE_RUNTIME`: The container runtime to use when running the 2x2 sim. Currently valid options are `SHIFTER`, `SINGULARITY`, `PODMAN-HPC`, and `NONE`; the default is `SHIFTER`.
- `ARCUBE_CONTAINER`: Name of the container to use when running the 2x2 sim. The name takes a slightly different form for Shifter (name of the container on Docker Hub) versus Singularity (name of the container's .sif file).
- `ARCUBE_CONTAINER_DIR`: Path of the directory where the Singularity container is stored. Not used for other container runtimes.
- `ARCUBE_DIR`: The top-level (or root) location of the 2x2_sim directory (e.g. `/path/to/2x2_sim`). This is needed for Singularity to properly bind the directory, ensuring that it is mounted when using networked file systems. Not used for other container runtimes.
- `ARCUBE_OUTDIR_BASE`: The base directory for output files. Recommended to be placed somewhere under `$SCRATCH` when running at NERSC.
- `ARCUBE_LOGDIR_BASE`: The base directory for log files. Also recommended to be placed somewhere under `$SCRATCH` when running at NERSC.
- `ARCUBE_OUT_NAME`: The name of the output subdirectory. Output filenames will also be prefixed with `$ARCUBE_OUT_NAME`. By convention, we set `ARCUBE_OUT_NAME` to the name of the "production", followed by a period and the abbreviated name of the step, e.g. `MiniRun3.larnd`.
- `ARCUBE_IN_NAME`: The name of the subdirectory for the input to this step (i.e. the upstream step's `ARCUBE_OUT_NAME`). For steps that consume multiple input files, we typically use a dedicated variable per input, e.g. `ARCUBE_MINERVA_NAME` (in this case for the MINERvA files that are input to the CAF maker). Somewhat confusingly, we sometimes do this even for steps that consume a single input.
- `ARCUBE_INDEX`: For a multiple-file "production", this is the ID of the file being produced; it is included in the output filename. For a typical production of 1000 CAFs, `ARCUBE_INDEX` initially runs from 0 to 9999, but we then `hadd` those files in blocks of 10, so that it runs from 0 to 999 for the post-hadd steps. For the purposes of this tutorial, we will use 0 to 9 pre-hadd and just 0 post-hadd.
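Putting these variables together, a single step is invoked along the following lines (a schematic sketch only; the step name is hypothetical, and the concrete, tested versions appear throughout the rest of this tutorial):

```bash
# Schematic only: a run script takes no arguments; all configuration is
# via ARCUBE_* variables (step-specific ones are omitted here).
export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=mjkramer/sim2x2:ndlar011
export ARCUBE_OUTDIR_BASE=$SCRATCH/2x2tut_out
export ARCUBE_LOGDIR_BASE=$SCRATCH/2x2tut_log
export ARCUBE_IN_NAME=MyProd.some_step
export ARCUBE_OUT_NAME=MyProd.next_step
export ARCUBE_INDEX=0
./run_some_step.sh   # hypothetical name; cd into the matching run-* dir first
```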
The first thing to do is to clone the `2x2_sim` repository. The examples in this tutorial have been tested with the `MiniRun4.5-tutorial` branch. After cloning, we run the `admin/install_everything.sh` script to install a handful of dependencies locally into the repository. For convenience, we also define the `TWOBYTWO_SIM` variable to point to the repository. Finally, we define and create the output and log directories.
```bash
git clone -b MiniRun4.5-tutorial https://github.com/DUNE/2x2_sim.git
export TWOBYTWO_SIM=$PWD/2x2_sim
cd $TWOBYTWO_SIM
admin/install_everything.sh

export ARCUBE_OUTDIR_BASE=$SCRATCH/2x2tut_out
export ARCUBE_LOGDIR_BASE=$SCRATCH/2x2tut_log
mkdir -p $ARCUBE_OUTDIR_BASE $ARCUBE_LOGDIR_BASE
```
At the end of this tutorial, you'll have a CAF file made from 200 beam spills. Except where otherwise noted, these commands can be run on a Perlmutter login node, since we're only using 10 CPU cores at a time. We recommend running this tutorial from a working directory under Perlmutter's `$SCRATCH`.
First we generate a set of 10 "nu" files containing fiducial interactions. This should only take a minute or so.
```bash
cd $TWOBYTWO_SIM/run-genie

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=mjkramer/sim2x2:genie_edep.3_04_00.20230912
export ARCUBE_DET_LOCATION=MiniRun5-Nu
export ARCUBE_DK2NU_DIR=/dvs_ro/cfs/cdirs/dune/users/mkramer/2x2EventGeneration/NuMI_dk2nu/newtarget-200kA_20220409
export ARCUBE_EXPOSURE=1E15
export ARCUBE_GEOM=geometry/Merged2x2MINERvA_v4/Merged2x2MINERvA_v4_noRock.gdml
export ARCUBE_TUNE=AR23_20i_00_000
export ARCUBE_RUN_OFFSET=0
export ARCUBE_XSEC_FILE=/dvs_ro/cfs/cdirs/dune/users/mkramer/2x2EventGeneration/inputs/NuMI/genie_xsec-3.04.00-noarch-AR2320i00000-k250-e1000/v3_04_00/NULL/AR2320i00000-k250-e1000/data/gxspl-NUsmall.xml
export ARCUBE_OUT_NAME=Tutorial.genie.nu

for i in $(seq 0 9); do
    ARCUBE_INDEX=$i ./run_genie.sh &
done
wait
```
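Before moving on, it's worth checking that all 10 jobs actually produced output. The directory layout below is an assumption (inferred from the `$ARCUBE_OUTDIR_BASE/run-validation` convention noted at the end of this tutorial); adjust the paths if yours differ:

```bash
# Sketch of a sanity check: count the output files and scan the logs.
# Paths assume outputs land under $ARCUBE_OUTDIR_BASE/run-genie.
ls $ARCUBE_OUTDIR_BASE/run-genie/Tutorial.genie.nu/ | wc -l
grep -ril error $ARCUBE_LOGDIR_BASE/run-genie/ | head
```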
Next we generate a set of 10 files containing rock interactions, which will take about an hour. (This brute-force treatment of rock events is very suboptimal!)
```bash
cd $TWOBYTWO_SIM/run-genie

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=mjkramer/sim2x2:genie_edep.3_04_00.20230912
export ARCUBE_DET_LOCATION=MiniRun5-Rock
export ARCUBE_DK2NU_DIR=/dvs_ro/cfs/cdirs/dune/users/mkramer/2x2EventGeneration/NuMI_dk2nu/newtarget-200kA_20220409
export ARCUBE_EXPOSURE=1E15
export ARCUBE_GEOM=geometry/Merged2x2MINERvA_v4/Merged2x2MINERvA_v4_justRock.gdml
export ARCUBE_TUNE=AR23_20i_00_000
export ARCUBE_RUN_OFFSET=1000000000
export ARCUBE_XSEC_FILE=/dvs_ro/cfs/cdirs/dune/users/mkramer/2x2EventGeneration/inputs/NuMI/genie_xsec-3.04.00-noarch-AR2320i00000-k250-e1000/v3_04_00/NULL/AR2320i00000-k250-e1000/data/gxspl-NUsmall.xml
export ARCUBE_OUT_NAME=Tutorial.genie.rock

for i in $(seq 0 9); do
    ARCUBE_INDEX=$i ./run_genie.sh &
done
wait
```
The GENIE output will be immediately passed to edep-sim, but it will also be used later as an input to the CAF maker.
Now that we have interactions, we can propagate the products using edep-sim. Again, we start with fiducial events (which will be simulated quickly):
```bash
cd $TWOBYTWO_SIM/run-edep-sim

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=mjkramer/sim2x2:ndlar011
export ARCUBE_GENIE_NAME=Tutorial.genie.nu
export ARCUBE_EDEP_MAC=macros/2x2_beam.mac
export ARCUBE_GEOM_EDEP=geometry/Merged2x2MINERvA_v4/Merged2x2MINERvA_v4_withRock.gdml
export ARCUBE_RUN_OFFSET=0
export ARCUBE_OUT_NAME=Tutorial.edep.nu

for i in $(seq 0 9); do
    ARCUBE_INDEX=$i ./run_edep_sim.sh &
done
wait
```
And then the rock events (which again will take about an hour):
```bash
cd $TWOBYTWO_SIM/run-edep-sim

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=mjkramer/sim2x2:ndlar011
export ARCUBE_GENIE_NAME=Tutorial.genie.rock
export ARCUBE_EDEP_MAC=macros/2x2_beam.mac
export ARCUBE_GEOM_EDEP=geometry/Merged2x2MINERvA_v4/Merged2x2MINERvA_v4_withRock.gdml
export ARCUBE_RUN_OFFSET=1000000000
export ARCUBE_OUT_NAME=Tutorial.edep.rock

for i in $(seq 0 9); do
    ARCUBE_INDEX=$i ./run_edep_sim.sh &
done
wait
```
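One caveat with this launch-and-`wait` pattern: a bare `wait` discards the jobs' exit codes, so a failed job can go unnoticed. A more defensive variant (a sketch, not something the tutorial's scripts do for you) collects each PID and checks it:

```bash
# Sketch: track each background job's PID and report any failures.
pids=()
for i in $(seq 0 9); do
    ARCUBE_INDEX=$i ./run_edep_sim.sh &
    pids+=($!)
done
for pid in "${pids[@]}"; do
    wait "$pid" || echo "job with PID $pid failed" >&2
done
```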
Although GENIE and edep-sim took a while for the rock interactions, all of the remaining steps will only take a few minutes (unless otherwise noted).
Now we hadd together the "nu" files:
```bash
cd $TWOBYTWO_SIM/run-hadd

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=mjkramer/sim2x2:ndlar011
export ARCUBE_IN_NAME=Tutorial.edep.nu
export ARCUBE_HADD_FACTOR=10
export ARCUBE_OUT_NAME=Tutorial.edep.nu.hadd
export ARCUBE_INDEX=0

./run_hadd.sh
```
And likewise for the "rock" files:
```bash
cd $TWOBYTWO_SIM/run-hadd

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=mjkramer/sim2x2:ndlar011
export ARCUBE_IN_NAME=Tutorial.edep.rock
export ARCUBE_HADD_FACTOR=10
export ARCUBE_OUT_NAME=Tutorial.edep.rock.hadd
export ARCUBE_INDEX=0

./run_hadd.sh
```
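If you'd like to peek inside a merged file, ROOT's `rootls` can list its contents. Since ROOT lives in the container, one way is to call it through Shifter (a sketch; the output path is an assumption about where run-hadd writes its files):

```bash
# Sketch: list the contents of a merged edep-sim file using the ROOT
# installation inside the container.
shifter --image=mjkramer/sim2x2:ndlar011 \
    rootls $ARCUBE_OUTDIR_BASE/run-hadd/Tutorial.edep.nu.hadd/*.root
```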
Now we combine the fiducial and rock events to form beam spills.
```bash
cd $TWOBYTWO_SIM/run-spill-build

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=mjkramer/sim2x2:ndlar011
export ARCUBE_NU_NAME=Tutorial.edep.nu.hadd
export ARCUBE_NU_POT=1E16
export ARCUBE_ROCK_NAME=Tutorial.edep.rock.hadd
export ARCUBE_ROCK_POT=1E16
export ARCUBE_OUT_NAME=Tutorial.spill
export ARCUBE_INDEX=0

./run_spill_build.sh
```
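A quick bit of bookkeeping: each GENIE file was generated with `ARCUBE_EXPOSURE=1E15`, so the 10 hadded files correspond to the `ARCUBE_NU_POT=1E16` (and likewise `ARCUBE_ROCK_POT=1E16`) declared above. At the nominal NuMI intensity of roughly 5E13 POT per spill, 1E16 / 5E13 = 200 spills, which is where the "200 beam spills" promised at the start of this tutorial come from. (The per-spill POT figure is our assumption here, not something this tutorial sets explicitly.)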
Before proceeding with the 2x2 detector simulation, we produce the simulated MINERvA data. This is broken into two steps. First, we "flatten" the edep-sim ROOT file:
```bash
cd $TWOBYTWO_SIM/run-edep2flat

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=fermilab/fnal-wn-sl7:latest
export ARCUBE_IN_NAME=Tutorial.spill
export ARCUBE_OUT_NAME=Tutorial.edep2flat
export ARCUBE_INDEX=0

./run_edep2flat.sh
```
Then we run the MINERvA simulation itself:
```bash
cd $TWOBYTWO_SIM/run-minerva

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=fermilab/fnal-wn-sl7:latest
export ARCUBE_IN_NAME=Tutorial.edep2flat
export ARCUBE_OUT_NAME=Tutorial.minerva
export ARCUBE_INDEX=0

./run_minerva.sh
```
The output will be used later, during CAF making.
Returning to the 2x2, we again flatten the edep-sim ROOT files, this time to the HDF5 format used as input to larnd-sim:
```bash
cd $TWOBYTWO_SIM/run-convert2h5

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=mjkramer/sim2x2:ndlar011
export ARCUBE_ACTIVE_VOLUME=volTPCActive
export ARCUBE_SPILL_NAME=Tutorial.spill
export ARCUBE_OUT_NAME=Tutorial.convert2h5
export ARCUBE_INDEX=0

./run_convert2h5.sh
```
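To inspect the resulting HDF5 file, h5py works well. This sketch assumes a Python environment with h5py available (at NERSC, `module load python` typically provides one); the output path and the `.h5` extension are also assumptions:

```bash
# Sketch: print the group/dataset hierarchy of the converted HDF5 file.
module load python
python - <<'EOF'
import glob, os, h5py
base = os.environ['ARCUBE_OUTDIR_BASE']
# The directory layout and file extension here are assumptions.
path = glob.glob(f'{base}/run-convert2h5/Tutorial.convert2h5/*.h5')[0]
with h5py.File(path) as f:
    f.visit(print)
EOF
```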
The 2x2 detector simulation requires a dedicated GPU. From a login node, we can get a shell on a compute node via:
```bash
salloc -A dune -q interactive -C gpu -t 30
```
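Once the allocation is granted, you'll be dropped into a shell on the compute node. A quick `nvidia-smi` (standard NVIDIA tooling, not part of the chain) confirms that a GPU is visible before launching the simulation:

```bash
# Verify that the compute node exposes an NVIDIA GPU; larnd-sim's
# Numba/CUDA kernels require one.
nvidia-smi
```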
On the compute node, we can then run the simulation:
```bash
cd $TWOBYTWO_SIM/run-larnd-sim

export ARCUBE_RUNTIME=NONE
export ARCUBE_CONVERT2H5_NAME=Tutorial.convert2h5
export ARCUBE_OUT_NAME=Tutorial.larnd
export ARCUBE_INDEX=0

./run_larnd_sim.sh
```
Back on the login node, we run the output of `larnd-sim` through the `ndlar_flow` calibration/processing stage:
```bash
cd $TWOBYTWO_SIM/run-ndlar-flow

export ARCUBE_RUNTIME=NONE
export ARCUBE_IN_NAME=Tutorial.larnd
export ARCUBE_OUT_NAME=Tutorial.flow
export ARCUBE_INDEX=0

./run_ndlar_flow.sh
```
The output of `ndlar_flow` needs to be converted to the "LArCV" (a.k.a. "Supera") format expected by the ML reconstruction:
```bash
cd $TWOBYTWO_SIM/run-mlreco

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=deeplearnphysics/larcv2:ub20.04-cuda11.6-pytorch1.13-larndsim
export ARCUBE_IN_NAME=Tutorial.flow
export ARCUBE_OUT_NAME=Tutorial.flow2supera
export ARCUBE_INDEX=0

./run_flow2supera.sh
```
The reconstruction itself requires a GPU, so, again:
```bash
salloc -A dune -q interactive -C gpu -t 30
```
Then:
```bash
cd $TWOBYTWO_SIM/run-mlreco

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=deeplearnphysics/larcv2:ub20.04-cuda11.6-pytorch1.13-larndsim
export ARCUBE_IN_NAME=Tutorial.flow2supera
export ARCUBE_OUT_NAME=Tutorial.mlreco_inference
export ARCUBE_INDEX=0

./run_mlreco_inference.sh
```
Returning to a login node, we apply some postprocessing to the reconstruction's output:
```bash
cd $TWOBYTWO_SIM/run-mlreco

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=deeplearnphysics/larcv2:ub20.04-cuda11.6-pytorch1.13-larndsim
export ARCUBE_IN_NAME=Tutorial.mlreco_inference
export ARCUBE_OUT_NAME=Tutorial.mlreco_analysis
export ARCUBE_INDEX=0

./run_mlreco_analysis.sh
```
The final step in the chain is to produce an analysis file!
```bash
cd $TWOBYTWO_SIM/run-cafmaker

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=fermilab/fnal-wn-sl7:latest
export ARCUBE_GHEP_NU_NAME=Tutorial.genie.nu
export ARCUBE_GHEP_ROCK_NAME=Tutorial.genie.rock
export ARCUBE_MINERVA_NAME=Tutorial.minerva
export ARCUBE_MLRECO_NAME=Tutorial.mlreco_analysis
export ARCUBE_OUT_NAME=Tutorial.caf
export ARCUBE_INDEX=0

./run_cafmaker.sh
```
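To take a first look at the CAF, you can list its trees and branches with `rootls -t` (a sketch: the output path is an assumption about where run-cafmaker writes its files):

```bash
# Sketch: inspect the trees/branches of the freshly made CAF using the
# ROOT installation inside the container.
shifter --image=mjkramer/sim2x2:ndlar011 \
    rootls -t $ARCUBE_OUTDIR_BASE/run-cafmaker/Tutorial.caf/*.root
```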
Finally, although it's not needed for the CAF, we can produce validation plots for several of the preceding steps:

```bash
cd $TWOBYTWO_SIM/run-validation

export ARCUBE_RUNTIME=SHIFTER
export ARCUBE_CONTAINER=mjkramer/sim2x2:ndlar011
export ARCUBE_EDEP_NAME=Tutorial.convert2h5
export ARCUBE_LARND_NAME=Tutorial.larnd
export ARCUBE_FLOW_NAME=Tutorial.flow
export ARCUBE_OUT_NAME=Tutorial.plots
export ARCUBE_INDEX=0

./run_validation.sh
```
This will produce PDFs of plots under `$ARCUBE_OUTDIR_BASE/run-validation`.
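For instance, to see what was produced (the `Tutorial.plots` subdirectory follows from the `ARCUBE_OUT_NAME` set above; treat the exact layout as an assumption):

```bash
# List the validation PDFs written by run_validation.sh.
ls $ARCUBE_OUTDIR_BASE/run-validation/Tutorial.plots/*.pdf
```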
Further reading:

- Container definitions: https://github.com/lbl-neutrino/2x2Containers
- https://github.com/lbl-neutrino/fireworks4dune
- For the overlaying of the MINERvA and 2x2 geometries, see https://github.com/lbl-neutrino/GeoMergeFor2x2