Thomas edited this page Nov 2, 2023 · 32 revisions

Running SBN code on Polaris with parsl

This wiki documents the steps required to run code for the Short Baseline Neutrino (SBN) program on the Polaris supercomputer at Argonne National Laboratory.


SBN-related code is built on the LArSoft framework. The libraries required to build and run LArSoft are copied from manifests available at and installed in a Scientific Linux 7 container.

Once LArSoft is installed, experiment-specific software can be loaded via ups in the same way as on Fermilab virtual machines, e.g.,

source ${LARSOFT_ROOT}/setup
setup sbndcode v09_75_03_02 -q e20:prof

Disk resources required to run the code are divided into two filesystems available on Polaris: eagle and grand. The grand filesystem contains compiled code and input files, while eagle is used for outputs and transfers.
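
As a rough illustration of this split, a job script might keep the two locations in variables. The project directories are taken from paths used elsewhere on this page; the variable names, the output_dir helper, and the production name are hypothetical.

```shell
#!/bin/bash
# Sketch only: variable names and the helper are illustrative,
# not part of any SBN tooling.
GRAND_DIR="/lus/grand/projects/neutrinoGPU"   # compiled code and input files
EAGLE_DIR="/lus/eagle/projects/neutrinoGPU"   # outputs and transfers

# Print the eagle output directory for a named production (hypothetical helper).
output_dir() {
    printf '%s/%s\n' "${EAGLE_DIR}" "$1"
}

output_dir my-production
```

Keeping both roots in one place makes it harder to write outputs to grand by accident.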

Getting Started

  1. Request a user account on Polaris with access to the neutrinoGPU project.
  2. Once logged in, create a local Conda environment and install parsl:
    module load conda/2023-10-04
    conda activate
    conda create -n sbn python=3.10
    conda activate sbn
    pip install parsl
  3. Clone the sbnd_parsl repository to your home directory. Modify the program to adjust the list of .fcl files, change submission configuration options, etc.
  4. Submit jobs by running the program, e.g. python sbnd_parsl/ -o /lus/eagle/projects/neutrinoGPU/my-production
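
Step 2 above fails if it is re-run after the environment already exists. A minimal sketch of a guard, assuming the conda command is already on the PATH (for example after module load conda/2023-10-04); ensure_env is a hypothetical helper, not part of sbnd_parsl.

```shell
#!/bin/bash
# Sketch only: ensure_env is a hypothetical helper, not part of sbnd_parsl.

# Create the named Conda environment only if it does not already exist.
ensure_env() {
    local env_name="$1"
    # `conda env list` prints one environment per line with the name in the
    # first column; lines starting with '#' are headers.
    if conda env list | awk '$1 !~ /^#/ {print $1}' | grep -qx -- "${env_name}"; then
        echo "Conda environment '${env_name}' already exists"
    else
        conda create -y -n "${env_name}" python=3.10
    fi
}

# Usage on a Polaris login node:
#   ensure_env sbn && conda activate sbn && pip install parsl
```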

Developing and Testing on Polaris

Interactive Jobs

You can test software within an interactive job. To begin an interactive job, create a script called with the following contents and run it:

# Start an interactive job


qsub -I -l select=1 -l walltime=0:45:00 -q debug \
        -A "${ALLOCATION}" -l filesystems="${FILESYSTEM}"

Once a slot on the debug queue becomes available, you will be automatically connected to a prompt within the interactive job.
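
The qsub command above relies on two shell variables, ALLOCATION and FILESYSTEM, which must be set before it runs. The sketch below shows one way to do that. Both default values are assumptions (the project named in Getting Started and a colon-separated list of the filesystems mentioned on this page), and interactive_cmd is a hypothetical helper that only prints the command so it can be checked before submission.

```shell
#!/bin/bash
# Sketch only: the default values below are assumptions; adjust them to your project.
ALLOCATION="${ALLOCATION:-neutrinoGPU}"        # assumed: project named in Getting Started
FILESYSTEM="${FILESYSTEM:-home:grand:eagle}"   # assumed: colon-separated filesystem list

# Print the qsub command instead of running it, so it can be inspected first.
interactive_cmd() {
    echo qsub -I -l select=1 -l walltime=0:45:00 -q debug \
        -A "${ALLOCATION}" -l filesystems="${FILESYSTEM}"
}

interactive_cmd
# To actually submit, run: eval "$(interactive_cmd)"
```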

Tips and Tricks

  • The program pbsq is installed at /lus/grand/projects/neutrinoGPU/software/pbsq. It produces more readable output about job status and can be invoked with pbsq -f neutrinoGPU.
  • Once your job is running, you can ssh into the worker node. Get the node name with qstat -u $(whoami) or via pbsq; it should start with "x". Once connected, you can check memory usage and other metrics with, e.g., cat /proc/meminfo.
  • You can log in to Polaris once via an ssh tunnel and reuse that connection, so that later ssh sessions do not require re-authentication. Place the following function in your local machine's .bashrc or .zshrc file:
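    connect_polaris () {
        # Look for an existing background (-fNT) ssh connection to Polaris.
        # macOS (BSD-based ps)
        # s=$(ps -Ao user,pid,%cpu,%mem,vsz,rss,tt,stat,start,time,command \
        #     | grep "$(whoami)" | sed -e 's/sshd//g' | grep ssh | grep fNT | grep polaris)
        # Linux
        s=$(ps aux | grep "$(whoami)" | sed -e 's/sshd//g' | grep ssh | grep fNT | grep polaris)
        if [ -z "$s" ]; then
            echo "Opening background connection to Polaris"
            # Assumes the ALCF login host polaris.alcf.anl.gov and the same
            # username on both machines; adjust if yours differ.
            ssh -fNTY "$@" "${USER}@polaris.alcf.anl.gov"
        fi
        # Open the interactive session.
        ssh -Y "$@" "${USER}@polaris.alcf.anl.gov"
    }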
  • If parsl exits immediately with status 0 or crashes, the cause is usually the job queue. An immediate exit with status 0 usually means parsl submitted its jobs to the queue and then exited. A crash may mean that held jobs are still outstanding; remove them manually with qdel.
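
The memory check mentioned in the tips above can be reduced to a single number. The sketch below assumes the standard Linux /proc/meminfo layout with MemTotal and MemAvailable fields; mem_used_pct is a hypothetical helper, not an installed tool.

```shell
#!/bin/bash
# Sketch only: mem_used_pct is a hypothetical helper.

# Print the percentage of memory in use, computed from MemTotal and MemAvailable.
# Reads /proc/meminfo by default; another file can be passed as the first argument.
mem_used_pct() {
    awk '/^MemTotal:/     {total = $2}
         /^MemAvailable:/ {avail = $2}
         END {
             if (total > 0) {
                 printf "%.1f\n", 100 * (total - avail) / total
             } else {
                 exit 1
             }
         }' "${1:-/proc/meminfo}"
}

# On a worker node: mem_used_pct
```

MemAvailable is used rather than MemFree because it accounts for memory the kernel can reclaim from caches.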
