
[0.25deg] update PE layout #214

Closed
minghangli-uni opened this issue Aug 28, 2024 · 13 comments

@minghangli-uni (Contributor) commented Aug 28, 2024

To update the PE layout for the 0.25 deg configuration, corresponding modifications are required in nuopc.runconfig, ice_in and config.yaml.
Note: These changes will be updated when the configuration is revised.

nuopc.runconfig

PELAYOUT_attributes::
  atm_ntasks = 48
  atm_nthreads = 1
  atm_pestride = 1
  atm_rootpe = 0
  cpl_ntasks = 96
  cpl_nthreads = 1
  cpl_pestride = 1
  cpl_rootpe = 0
  esmf_logging = ESMF_LOGKIND_NONE
  esp_ntasks = 1
  esp_nthreads = 1
  esp_pestride = 1
  esp_rootpe = 0
  glc_ntasks = 1
  glc_nthreads = 1
  glc_pestride = 1
  glc_rootpe = 0
  ice_ntasks = 96
  ice_nthreads = 1
  ice_pestride = 1
  ice_rootpe = 0
  lnd_ntasks = 1
  lnd_nthreads = 1
  lnd_pestride = 1
  lnd_rootpe = 0
  ninst = 1
  ocn_ntasks = 1344
  ocn_nthreads = 1
  ocn_pestride = 1
  ocn_rootpe = 96
  pio_asyncio_ntasks = 0
  pio_asyncio_rootpe = 1
  pio_asyncio_stride = 0
  rof_ntasks = 48
  rof_nthreads = 1
  wav_ntasks = 1
  wav_nthreads = 1
  wav_pestride = 1
  wav_rootpe = 0
::

ice_in

&domain_nml
  block_size_x = 30
  block_size_y = 27
  distribution_type = "roundrobin"
  distribution_wght = "latitude"
  maskhalo_bound = .true.
  maskhalo_dyn = .true.
  maskhalo_remap = .true.
  max_blocks = 20
  ns_boundary_type = "tripole"
  nx_global = 1440
  ny_global = 1080
  processor_shape = "square-ice"
  debug_blocks = .true.
/

config.yaml

queue: normal
ncpus: 1440
jobfs: 10GB
mem: 5760GB

walltime: 24:00:00
jobname: 025deg_jra55do_ryf

model: access-om3
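
As a cross-check, ncpus must cover the highest PE any component uses. A quick sketch of the arithmetic in Python, using the PELAYOUT values above (single-task components omitted):

# (rootpe, ntasks) per component with pestride = 1, from PELAYOUT_attributes above
layout = {
    "atm": (0, 48),
    "cpl": (0, 96),
    "ice": (0, 96),
    "ocn": (96, 1344),
}

# With pestride = 1 the last PE a component uses is rootpe + ntasks - 1,
# so the run needs max(rootpe + ntasks) cores in total
ncpus = max(rootpe + ntasks for rootpe, ntasks in layout.values())
print(ncpus)  # 1440, matching ncpus in config.yaml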
minghangli-uni self-assigned this Aug 28, 2024
@dougiesquire (Collaborator)

You'll also need to update the config.yaml for the new ncpus and mem

@minghangli-uni (Author)

Right, thanks @dougiesquire

@anton-seaice (Contributor)

I can't remember where we got those block sizes from; we should get better performance if we can reduce max_blocks (say to 10?) by setting the block sizes differently.

Sorry, I was wrong last week - we did put in a patch for max_blocks ... you can remove it from the namelist. It's still good to check the logs to get it closer to 10.

The process would be: pick the number of procs, then set block_size_x & block_size_y such that the blocks are close to square and there are around 10 per PE (ideally nx_global is also divisible by block_size_x and ny_global by block_size_y).
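
A minimal sketch of that search in Python (the helper and its scoring are illustrative, not part of any existing tooling):

def pick_block_sizes(nx_global, ny_global, nprocs, target_per_pe=10, min_size=8):
    """Rank (block_size_x, block_size_y) pairs that divide the grid evenly,
    preferring near-square blocks close to the target blocks-per-PE count."""
    results = []
    for bx in range(min_size, nx_global + 1):
        if nx_global % bx:
            continue
        for by in range(min_size, ny_global + 1):
            if ny_global % by:
                continue
            per_pe = (nx_global // bx) * (ny_global // by) / nprocs
            squareness = abs(bx - by) / max(bx, by)  # 0 means square
            results.append((abs(per_pe - target_per_pe) + squareness, bx, by, per_pe))
    return sorted(results)[:5]

# 0.25deg grid with 96 ice PEs
for score, bx, by, per_pe in pick_block_sizes(1440, 1080, 96):
    print(f"block_size_x={bx:3d}  block_size_y={by:3d}  ~{per_pe:.1f} blocks/PE")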

We can also remove debug_blocks - but it provides useful information while setting the block size.

@minghangli-uni (Author)

I came across issue #156 again, where I forgot to adjust the PIO settings after changing the CICE layout.

     pio_numiotasks = 5
     pio_rearranger = 1
     pio_root = 1
     pio_stride = 48

@anton-seaice I understand the calculations, but could you please clarify why the ICE PIO settings are configured this way? Will this improve performance?

The error message isn’t very intuitive, making it difficult for users to realise that they need to modify these parameters when changing the layout.

Can we revert it to the settings used in the 1deg configuration, here https://github.com/ACCESS-NRI/access-om3-configs/blob/2bc6107ef1b195aa62485a5d87c4ba834996d8cc/nuopc.runconfig#L364-L373?

ICE_modelio::
     diro = ./log
     logfile = ice.log
     pio_async_interface = .false.
     pio_netcdf_format = nothing
     pio_numiotasks = 1
     pio_rearranger = 1
     pio_root = 0
     pio_stride = 48
     pio_typename = netcdf4p
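
For reference, the mismatch behind #156 can be checked mechanically. A minimal sketch, assuming the usual PIO convention that IO task i sits on PE pio_root + i * pio_stride within the component's PEs (the helper itself is hypothetical):

def check_pio_layout(ntasks, pio_numiotasks, pio_root, pio_stride):
    """Fail loudly if the IO tasks don't fit inside the component's PEs."""
    last_io_pe = pio_root + (pio_numiotasks - 1) * pio_stride
    if last_io_pe >= ntasks:
        raise ValueError(
            f"IO layout needs PE {last_io_pe} but the component has only "
            f"{ntasks} PEs; reduce pio_numiotasks, pio_stride or pio_root"
        )

check_pio_layout(96, 1, 0, 48)    # 1deg-style settings above: fine with 96 ice PEs
# check_pio_layout(96, 5, 1, 48)  # quoted 0.25deg settings: needs PE 193 -> error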

@minghangli-uni (Author)

I can't remember where we got those block sizes from

The block sizes were adopted from the OM2 report, which specifies a CICE5 block size of 30x27 with a square-ice processor shape and a roundrobin distribution type.

It's still good to check the logs to get it closer to 10.

I can't remember - why should the number of blocks be close to 10?

@anton-seaice (Contributor)

@anton-seaice I understand the calculations, but could you please clarify why the ICE pio settings are configured this way? Will this improve the performance?

In the old COSIMA TWG minutes from OM2 development (on the COSIMA website), the recommendation from NCI was to use one IO task per node. I think Yang 2019 on parallel I/O in MOM5 makes a similar suggestion? I guess there is a hardware benefit to one task per node. There are so many options that it's hard to know what the best combination is without a lot of work, e.g. we could also test having a dedicated IO PE, or changing the PIO rearranger.

I think one IO task per node is a good start. We could try just one IO task; it might not make much difference at this resolution.
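
Concretely, assuming 48-core nodes (Gadi normal queue), "one IO task per node" for the ice PEs works out as:

import math

cores_per_node = 48   # assumed: Gadi normal-queue nodes
ice_ntasks = 96

pio_stride = cores_per_node                          # one IO task per node
pio_numiotasks = math.ceil(ice_ntasks / pio_stride)  # 2 IO tasks for 96 ice PEs
print(pio_stride, pio_numiotasks)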

The error message isn’t very intuitive, making it difficult for users to realise that they need to modify these parameters when changing the layout.

I agree. Does it make a separate ESMF log file? I think they have names something like PETXX.ESMF...

  • It's possible there are options in the ESMF build to change how the logging is done.
  • A good thing to do would be to check in payu whether the PE layout fits within the requested compute resources.

The block sizes were adopted from OM2 report, which specifies a CICE5 block size of 30x27, with a square-ice processor shape and roundrobin distribution type.

Ok thanks!

I cant remember why having the number of blocks close to 10?

From the CICE docs:

Smaller, more numerous blocks provides an opportunity for better load balance by allocating each processor both ice-covered and ice-free blocks. But smaller, more numerous blocks becomes less efficient due to MPI communication associated with halo updates. In practice, blocks should probably not have fewer than about 8 to 10 grid cells in each direction, and more square blocks tend to optimize the volume-to-surface ratio important for communication cost. Often 3 to 8 blocks per processor provide the decomposition flexibility to create reasonable load balance configurations.

So we should actually aim for 8 or fewer blocks per processor, by the sounds of it :)

@minghangli-uni (Author) commented Aug 29, 2024

I think one IO task per node is a good start. We could try just one IO task, it might not make much difference at this resolution.

I agree for the current phase. I will do a test on the I/O tasks to verify the optimal configuration.

does it make a seperate ESMF log file ? I think they have names something like PETXX.ESMF...

This can be enabled by setting create_esmf_pet_files to .true. in drv_in, but it should be used mostly for debugging, not in production runs.
Also, would it be helpful to add a comment after the PE setup for ice_ntasks referencing this issue?
For example: ice_ntasks = 96 # NB: Parallel I/O github.com/COSIMA/access-om3/issues/214. This would inform users who hit the issue about the current setup, and we can remove the comment once the I/O is optimised.

So we should actually aim for number of blocks of 8 or less by the sounds of it :)

The updated settings result in a max_blocks of 5, within the 3-8 blocks per processor range recommended by the CICE docs.

&domain_nml
  block_size_x = 60
  block_size_y = 54
  distribution_type = "roundrobin"
  distribution_wght = "latitude"
  maskhalo_bound = .true.
  maskhalo_dyn = .true.
  maskhalo_remap = .true.
  max_blocks = -1
  ns_boundary_type = "tripole"
  nx_global = 1440
  ny_global = 1080
  processor_shape = "square-ice"
/
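
The arithmetic behind that figure, using only the namelist values above:

nx_global, ny_global = 1440, 1080
block_size_x, block_size_y = 60, 54
ice_ntasks = 96

nblocks = (nx_global // block_size_x) * (ny_global // block_size_y)  # 24 * 20 = 480
print(nblocks / ice_ntasks)  # 5.0 blocks per PE, inside the recommended 3-8 range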

@minghangli-uni (Author)

When setting max_blocks = -1 with the roundrobin distribution type, the max_blocks prescribed by CICE does not always match the actual number of ice blocks. E.g., with the above configuration, max_blocks is set to 6, but the log shows a warning:

  block_size_x,_y       =     60    54
  max_blocks            =      6
  Number of ghost cells =      1

 (ice_read_global_nc) min, max, sum =   -1.41413909065909
   1.57079632679490        154674.873407807      ulat
 (ice_read_global_nc) min, max, sum =   0.000000000000000E+000
   1.00000000000000        969809.000000000      kmt
 ice_domain work_unit, max_work_unit =        28035          10
 ice_domain nocn =            0      280343    44787740
 ice_domain work_per_block =            0          11        2204
 ice: total number of blocks is         391
  ********WARNING***********
 (init_domain_distribution)
  WARNING: ice no. blocks too large: decrease max to           5

Despite this warning, I don’t believe it will impact overall performance since MOM typically has a much higher computational load than CICE.

NB: max_blocks = -1 fails with the rake distribution type.

@anton-seaice (Contributor)

Why do you think max_blocks shouldn't be 5?

@minghangli-uni (Author)

It can be 5, but we have to set it to 5 manually.

@anton-seaice (Contributor) commented Aug 29, 2024

Oh sorry, I see now. That's due to the patch we put into access-om3 0.3.x to remove max_blocks, and the max_blocks calculation being approximate. When we update the CICE version it should go away (after CICE-Consortium/CICE#954).

It will allocate ~20% more memory than it uses, but the amount involved is small enough that there probably isn't a performance impact.
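
The log numbers above bear this out (plain arithmetic, not CICE code):

import math

total_blocks = 391        # "ice: total number of blocks is 391" in the log
ice_ntasks = 96
estimated_max_blocks = 6  # what max_blocks = -1 produced

actual_max = math.ceil(total_blocks / ice_ntasks)  # roundrobin worst case: 5
print(estimated_max_blocks / actual_max)           # 1.2 -> ~20% over-allocation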

@anton-seaice (Contributor)

I created payu-org/payu#496 to add checks for the IO layout numbers.

@anton-seaice (Contributor)

Closed through ACCESS-NRI/access-om3-configs#114
