Allow for model scalability beyond 1800 NTASK (bugzilla #1297) #22

DanIredell-NOAA · 2022-03-15T17:52:24Z

http://www2.spa.ncep.noaa.gov/bugzilla/show_bug.cgi?id=1297

Currently, the forecast job is limited to a maximum of 1800 cores because CICE's NTASK value is hard-coded to 1800.

This hard limit on scalability makes it hard to improve the science (decrease the time step or run it faster) and/or fully utilize resources.

For example:
Current reservation line for rtofs_global_forecast_step2:
#PBS -l place=vscatter:excl,select=15:ncpus=120:mpiprocs=120

This uses only 120 of the 128 cores available per node.
The code is not memory bound, so 120 cores (8 per node across 15 nodes) are idling in this case.

The following would be more efficient, but would require a different NTASK (15 * 128 = 1920):
#PBS -l place=vscatter:excl,select=15:ncpus=128:mpiprocs=128
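The arithmetic behind the two reservation lines can be checked with a short sketch. It assumes each chunk lands on its own 128-core node (as the issue describes); the helper names `parse_select`, `total_ranks`, and `idle_cores` are illustrative and not part of any RTOFS script.

```python
import re


def parse_select(directive: str) -> dict:
    """Pull chunk count, ncpus and mpiprocs out of a '#PBS -l ... select=' line."""
    m = re.search(r"select=(\d+):ncpus=(\d+):mpiprocs=(\d+)", directive)
    if m is None:
        raise ValueError(f"no select statement found in: {directive!r}")
    chunks, ncpus, mpiprocs = map(int, m.groups())
    return {"chunks": chunks, "ncpus": ncpus, "mpiprocs": mpiprocs}


def total_ranks(directive: str) -> int:
    """Total MPI ranks requested: chunks * mpiprocs per chunk."""
    sel = parse_select(directive)
    return sel["chunks"] * sel["mpiprocs"]


def idle_cores(directive: str, cores_per_node: int = 128) -> int:
    """Cores left unused, assuming one chunk per node (an assumption)."""
    sel = parse_select(directive)
    return sel["chunks"] * (cores_per_node - sel["ncpus"])


current = "#PBS -l place=vscatter:excl,select=15:ncpus=120:mpiprocs=120"
proposed = "#PBS -l place=vscatter:excl,select=15:ncpus=128:mpiprocs=128"

print(total_ranks(current), idle_cores(current))    # 1800 120
print(total_ranks(proposed), idle_cores(proposed))  # 1920 0
```

The second line requests 1920 ranks, which is why it cannot be used while NTASK is fixed at 1800.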

DanIredell-NOAA · 2023-11-09T20:11:27Z

First, what we use in operations now is this (we set exclhost, not excl):
#PBS -l place=vscatter:exclhost,select=15:ncpus=120:mpiprocs=120

Options for the place statement:

Modifier      Meaning
free          Place job on any vnode(s)
pack          All chunks will be taken from one host
scatter       Only one chunk is taken from any host
vscatter      Only one chunk is taken from any vnode.  Each chunk must fit on a vnode.
excl          Only this job uses the vnodes chosen
exclhost      The entire host is allocated to this job
shared        This job can share the vnodes chosen

DanIredell-NOAA · 2023-11-09T21:09:33Z

Second, we can create another tile layout for HYCOM that uses more than 1800 tasks. That would require creating another patch.input and changing about another half dozen parm files (blkdat.input, ice_in). The scripts would also need modifying so they know which set of these files to use (based on NTASK).
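One way the scripts could pick a file set based on NTASK is sketched below. The directory naming (`ntask_<N>` under a parm root) and the `parm_paths` helper are hypothetical; the issue does not say how the alternate files would be organized, and only three of the affected files are named in it.

```python
from pathlib import Path

# Files named in the issue; the full list of about half a dozen is not given.
PARM_FILES = ("patch.input", "blkdat.input", "ice_in")


def parm_paths(parm_root: str, ntask: int, supported=(1800,)) -> dict:
    """Map each parm file name to its path for the requested task count.

    Hypothetical layout: <parm_root>/ntask_<N>/<file>.
    """
    if ntask not in supported:
        raise ValueError(
            f"no parm file set prepared for NTASK={ntask}; supported: {supported}"
        )
    base = Path(parm_root) / f"ntask_{ntask}"
    return {name: base / name for name in PARM_FILES}


paths = parm_paths("parm", 1800)
print(paths["ice_in"].as_posix())  # parm/ntask_1800/ice_in
```

Failing loudly on an unsupported task count keeps a job from starting with a tile layout that does not match its reservation.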

It would also need another HYCOM executable, because the executable is compiled with NTASK set. NTASK is NPX * NPY, which in the current case is 450 * 4. See comp_ice.csh.
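Because the task count is fixed at compile time, the PBS request has to match NPX * NPY exactly. A minimal consistency check, assuming the layout values are known to the calling script (the function name `check_layout` is illustrative):

```python
def check_layout(npx: int, npy: int, chunks: int, mpiprocs: int) -> int:
    """Return NTASK if the compiled layout matches the PBS request, else raise."""
    ntask = npx * npy              # task count compiled into the executable
    requested = chunks * mpiprocs  # MPI ranks requested from PBS
    if ntask != requested:
        raise ValueError(
            f"layout {npx} x {npy} = {ntask} tasks, "
            f"but PBS requests {chunks} x {mpiprocs} = {requested} ranks"
        )
    return ntask


print(check_layout(450, 4, 15, 120))  # 1800

try:
    check_layout(450, 4, 15, 128)     # 1800 tasks vs 1920 ranks
except ValueError as err:
    print(err)
```

The second call shows why the 128-ranks-per-node reservation needs a new layout and a rebuilt executable rather than a change to the PBS line alone.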

DanIredell-NOAA · 2023-12-15T20:41:08Z

At the V2.4.0 kickoff meeting it was determined that this would be put on hold until the MOM-CICE version planned for RTOFS v3.0 in 2026.
