Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support larger mesh sizes in the NetCDF-C mesh converter #595

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

xylar
Copy link
Collaborator

@xylar xylar commented Dec 18, 2024

The nCells, nEdges and nVertices dimensions now are of type size_t in the c++ code. Products of these dimensions with other dimensions (maxEdges2, two, vertexDegree, etc.) are computed as size_t variables and then used to allocate arrays to avoid int overflow.

These changes are likely not sufficient to allow meshes with more than ~2 billion edges (or vertices or cells) because index arrays are still stored as int type.

@xylar xylar added the bug label Dec 18, 2024
@xylar xylar self-assigned this Dec 18, 2024
@xylar
Copy link
Collaborator Author

xylar commented Dec 18, 2024

@changliao1025 pointed me to a mesh on Compy:

/compyfs/liao313/mpas/tmptqk3i9r4/mesh_in.nc

with nEdges = 111735419 and maxEdge2 = 22 that crashes MpasMeshConverter.x with:

terminate called after throwing an instance of ‘std::bad_array_new_length’
 what(): std::bad_array_new_length

This turns out to be because of an int overflow:


This product overflows the maximum signed int and becomes a (large) negative number.

I am still testing the changes here but the fixes to that particular line
https://github.com/xylar/MPAS-Tools/blob/ccfeab98ac4a1b1bf0b3bb7788cef4e8eaf1b231/mesh_tools/mesh_conversion_tools_netcdf_c/mpas_mesh_converter.cpp#L2828-L2831
successfully allow the converter to write out *OnEdge arrays that it was not writing before the fix.

The `nCells`, `nEdges` and `nVertices` dimensions now are of
type `size_t` in the c++ code.  Products of these dimensions with
other dimensions (`maxEdges2`, `two`, `vertexDegree`, etc.) are
computed as `size_t` variables and then used to allocate arrays
to avoid `int` overflow.

These changes are likely not sufficient to allow meshes with more
than ~2 billion edges (or vertices or cells) because index arrays
are still stored as `int` type.
@xylar xylar force-pushed the fix-mesh-converter-for-large-meshes branch from ccfeab9 to 29ddc41 Compare December 18, 2024 14:54
@changliao1025
Copy link

@xylar Thank you for identifying the bug and providing a fix for it. The large mesh I attempted to generate is an improved mesh workflow designed to consider many hydrologic features (land surface, river, lake with boundary, dam, cities, watershed boundary). Due to the land surface heterogeneity, these features require a relatively high-resolution mesh (~5km) to resolve. Therefore, the global mesh size is significantly increased.

@xylar xylar marked this pull request as ready for review December 18, 2024 17:36
@xylar
Copy link
Collaborator Author

xylar commented Dec 18, 2024

Testing

I was able to run the mesh converter with this branch (I did a local build of the conda package):

$ ncdump -h mesh_out.nc 
netcdf mesh_out {
dimensions:
	nCells = 37245075 ;
	nEdges = 111735419 ;
	nVertices = 74490146 ;
	maxEdges = 11 ;
	maxEdges2 = 22 ;
	TWO = 2 ;
	vertexDegree = 3 ;
	Time = UNLIMITED ; // (0 currently)
variables:
	double latCell(nCells) ;
		latCell:long_name = "latitudes of cell centres" ;
	double lonCell(nCells) ;
		lonCell:long_name = "longitudes of cell centres" ;
	double xCell(nCells) ;
		xCell:long_name = "x-coordinates of cell centres" ;
	double yCell(nCells) ;
		yCell:long_name = "y-coordinates of cell centres" ;
	double zCell(nCells) ;
		zCell:long_name = "z-coordinates of cell centres" ;
	int indexToCellID(nCells) ;
		indexToCellID:long_name = "index to cell ID mapping" ;
	double latEdge(nEdges) ;
		latEdge:long_name = "latitudes of edge centres" ;
	double lonEdge(nEdges) ;
		lonEdge:long_name = "longitudes of edge centres" ;
	double xEdge(nEdges) ;
		xEdge:long_name = "x-coordinates of edge centres" ;
	double yEdge(nEdges) ;
		yEdge:long_name = "y-coordinates of edge centres" ;
	double zEdge(nEdges) ;
		zEdge:long_name = "z-coordinates of edge centres" ;
	int indexToEdgeID(nEdges) ;
		indexToEdgeID:long_name = "index to edge ID mapping" ;
	double latVertex(nVertices) ;
		latVertex:long_name = "latitudes of vertices" ;
	double lonVertex(nVertices) ;
		lonVertex:long_name = "longitudes of vertices" ;
	double xVertex(nVertices) ;
		xVertex:long_name = "x-coordinates of vertices" ;
	double yVertex(nVertices) ;
		yVertex:long_name = "y-coordinates of vertices" ;
	double zVertex(nVertices) ;
		zVertex:long_name = "z-coordinates of vertices" ;
	int indexToVertexID(nVertices) ;
		indexToVertexID:long_name = "index to vertex ID mapping" ;
	int cellsOnCell(nCells, maxEdges) ;
		cellsOnCell:long_name = "cells adj. to each cell" ;
	int edgesOnCell(nCells, maxEdges) ;
		edgesOnCell:long_name = "edges on each cell" ;
	int verticesOnCell(nCells, maxEdges) ;
		verticesOnCell:long_name = "vertices on each cell" ;
	int nEdgesOnCell(nCells) ;
		nEdgesOnCell:long_name = "number of edges on each cell" ;
	int edgesOnEdge(nEdges, maxEdges2) ;
		edgesOnEdge:long_name = "edges adj. to each edge" ;
	int cellsOnEdge(nEdges, TWO) ;
		cellsOnEdge:long_name = "cells adj. to each edge" ;
	int verticesOnEdge(nEdges, TWO) ;
		verticesOnEdge:long_name = "vertices on each edge" ;
	int nEdgesOnEdge(nEdges) ;
		nEdgesOnEdge:long_name = "number of edges on each edge" ;
	int cellsOnVertex(nVertices, vertexDegree) ;
		cellsOnVertex:long_name = "vertices adj. to each vertex" ;
	int edgesOnVertex(nVertices, vertexDegree) ;
		edgesOnVertex:long_name = "edges adj. to each vertex" ;
	int boundaryVertex(nVertices) ;
		boundaryVertex:long_name = "non-zero for each vertex on mesh boundary" ;
	double areaCell(nCells) ;
		areaCell:long_name = "surface areas of cells" ;
	double angleEdge(nEdges) ;
		angleEdge:long_name = "angle to edges" ;
	double dcEdge(nEdges) ;
		dcEdge:long_name = "length of arc between centres" ;
	double dvEdge(nEdges) ;
		dvEdge:long_name = "length of arc between vertices" ;
	double weightsOnEdge(nEdges, maxEdges2) ;
		weightsOnEdge:long_name = "tangential flux reconstruction weights" ;
	double areaTriangle(nVertices) ;
		areaTriangle:long_name = "surface areas of dual cells" ;
	double kiteAreasOnVertex(nVertices, vertexDegree) ;
		kiteAreasOnVertex:long_name = "surface areas of overlap between cells and dual cells" ;
	double cellQuality(nCells) ;
		cellQuality:long_name = "quality of mesh cells" ;
	double gridSpacing(nCells) ;
		gridSpacing:long_name = "grid spacing distribution" ;
	double triangleQuality(nVertices) ;
		triangleQuality:long_name = "quality of mesh dual cells" ;
	double triangleAngleQuality(nVertices) ;
		triangleAngleQuality:long_name = "quality of mesh dual cells" ;
	int obtuseTriangle(nVertices) ;
		obtuseTriangle:long_name = "non-zero for any dual cell containing obtuse angles" ;
	double meshDensity(nCells) ;
		meshDensity:long_name = "mesh density distribution" ;

// global attributes:
		:on_a_sphere = "YES" ;
		:sphere_radius = 6371000. ;
		:is_periodic = "NO" ;
		:history = "MpasMeshConverter.x tmptqk3i9r4/mesh_in.nc mesh_out.nc\nMon Dec 16 18:49:32 2024: run_pyflowline.py" ;
		:mesh_spec = "1.0" ;
		:Conventions = "MPAS" ;
		:source = "MpasMeshConverter.x" ;
		:file_id = "kry9y9yvrj" ;
}

@xylar
Copy link
Collaborator Author

xylar commented Dec 18, 2024

@mark-petersen, @matthewhoffman, @erinethomas, I have chosen the 3 of you to review this as representatives of the MPAS components in E3SM.

I don't believe this will have any unexpected results but I have not yet done any BFB tests yet. I will do that right away.

@xylar
Copy link
Collaborator Author

xylar commented Dec 18, 2024

I ran MpasMeshConverter.x on the small mesh we use for testing (https://github.com/MPAS-Dev/MPAS-Tools/blob/master/mesh_tools/mesh_conversion_tools/test/mesh.QU.1920km.151026.nc) using both the latest release (v0.36.0) and this branch. I used the following script to verify that the results are identical:

#!/usr/bin/env python
import numpy as np
import xarray as xr

ds_old = xr.open_dataset('mesh_0.36.0.nc')
ds_new = xr.open_dataset('mesh_fix.nc')

for var_name in ds_old:
    da_old = ds_old[var_name]
    da_new = ds_new[var_name]
    max_diff = np.abs(da_old - da_new).max().values
    print(f'{var_name}: {max_diff}')

I see:

latCell: 0.0
lonCell: 0.0
xCell: 0.0
yCell: 0.0
zCell: 0.0
indexToCellID: 0
latEdge: 0.0
lonEdge: 0.0
xEdge: 0.0
yEdge: 0.0
zEdge: 0.0
indexToEdgeID: 0
latVertex: 0.0
lonVertex: 0.0
xVertex: 0.0
yVertex: 0.0
zVertex: 0.0
indexToVertexID: 0
cellsOnCell: 0
edgesOnCell: 0
verticesOnCell: 0
nEdgesOnCell: 0
edgesOnEdge: 0
cellsOnEdge: 0
verticesOnEdge: 0
nEdgesOnEdge: 0
cellsOnVertex: 0
edgesOnVertex: 0
boundaryVertex: 0
areaCell: 0.0
angleEdge: 0.0
dcEdge: 0.0
dvEdge: 0.0
weightsOnEdge: 0.0
areaTriangle: 0.0
kiteAreasOnVertex: 0.0
cellQuality: 0.0
gridSpacing: 0.0
triangleQuality: 0.0
triangleAngleQuality: 0.0
obtuseTriangle: 0
meshDensity: 0.0

@xylar
Copy link
Collaborator Author

xylar commented Dec 18, 2024

@mark-petersen, @matthewhoffman and @erinethomas, if it helps with the review, I'm happy to build you an mpas_tools conda package with this branch and put it in an environment for you on the machine of your choice. I have tested this on Compy so far but I don't recommend working there if you don't have to.

I'm also fine with a visual inspection of the code if you prefer, but I know you all are not necessarily C++ experts.

@xylar
Copy link
Collaborator Author

xylar commented Dec 18, 2024

A note on size_t for your info:
https://en.cppreference.com/w/cpp/types/size_t
The key piece is:

std::size_t can store the maximum size of a theoretically possible object of any type (including array).
So much safer than int (which is 32 bit and signed, so can't represent numbers much over 2 billion) for representing large array sizes.

Copy link
Member

@matthewhoffman matthewhoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar , I reviewed based on inspection and this makes complete sense and is the solution I would have expected to the problem. I am comfortable with the testing you've reported and do not feel the need to test myself.

@xylar
Copy link
Collaborator Author

xylar commented Jan 6, 2025

@mark-petersen and @erinethomas, a reminder that I'd appreciate your review on this or if you would suggest an alternative reviewer.

@xylar
Copy link
Collaborator Author

xylar commented Jan 7, 2025

@mark-petersen and @erinethomas, I made a test conda environment with this mpas_tools that you can try out. You can run:

source /lcrc/group/e3sm/ac.xylar/conda_test_mpas/etc/profile.d/conda.sh
conda activate mpas_dev

Copy link

@erinethomas erinethomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve this PR based on the following testing:

I loaded three different conda environments on chrysalis with unique versions of MPAS_Tools (the 'default' MPAS_tools from e3sm unified environment, 'Xylars' environment listed above, and my own local built conda environment with this branch).

I then ran MpasMeshConverter.x on one of my own wave mesh files, using each of the above conda environments.

I used the following python script (Based on Xylar's python script above) to determine the differences in the resulting mesh files

#!/usr/bin/env python
import numpy as np
import xarray as xr

ds_old = xr.open_dataset('waves_mesh_OUT_DEFAULT_TEST.nc')
ds_new = xr.open_dataset('waves_mesh_OUT_XYLAR_TEST.nc')
print('DIFFERENCES BTW XYLAR ENV and DEFAULT')
for var_name in ds_old:
    da_old = ds_old[var_name]
    da_new = ds_new[var_name]
    max_diff = np.abs(da_old - da_new).max().values
    print(f'{var_name}: {max_diff}')

print('DIFFERENCES BTW ERIN ENV and DEFAULT')
ds_new = xr.open_dataset('waves_mesh_OUT_ERIN_TEST.nc')
for var_name in ds_old:
    da_old = ds_old[var_name]
    da_new = ds_new[var_name]
    max_diff = np.abs(da_old - da_new).max().values
    print(f'{var_name}: {max_diff}')

I find bit-for-bit results in all three versions of MpasMeshConverter as seen here:

DIFFERENCES BTW XYLAR ENV and DEFAULT
latCell: 0.0
lonCell: 0.0
xCell: 0.0
yCell: 0.0
zCell: 0.0
indexToCellID: 0
latEdge: 0.0
lonEdge: 0.0
xEdge: 0.0
yEdge: 0.0
zEdge: 0.0
indexToEdgeID: 0
latVertex: 0.0
lonVertex: 0.0
xVertex: 0.0
yVertex: 0.0
zVertex: 0.0
indexToVertexID: 0
cellsOnCell: 0
edgesOnCell: 0
verticesOnCell: 0
nEdgesOnCell: 0
edgesOnEdge: 0
cellsOnEdge: 0
verticesOnEdge: 0
nEdgesOnEdge: 0
cellsOnVertex: 0
edgesOnVertex: 0
boundaryVertex: 0
areaCell: 0.0
angleEdge: 0.0
dcEdge: 0.0
dvEdge: 0.0
weightsOnEdge: 0.0
areaTriangle: 0.0
kiteAreasOnVertex: 0.0
cellQuality: 0.0
gridSpacing: 0.0
triangleQuality: 0.0
triangleAngleQuality: 0.0
obtuseTriangle: 0
meshDensity: 0.0
DIFFERENCES BTW ERIN ENV and DEFAULT
latCell: 0.0
lonCell: 0.0
xCell: 0.0
yCell: 0.0
zCell: 0.0
indexToCellID: 0
latEdge: 0.0
lonEdge: 0.0
xEdge: 0.0
yEdge: 0.0
zEdge: 0.0
indexToEdgeID: 0
latVertex: 0.0
lonVertex: 0.0
xVertex: 0.0
yVertex: 0.0
zVertex: 0.0
indexToVertexID: 0
cellsOnCell: 0
edgesOnCell: 0
verticesOnCell: 0
nEdgesOnCell: 0
edgesOnEdge: 0
cellsOnEdge: 0
verticesOnEdge: 0
nEdgesOnEdge: 0
cellsOnVertex: 0
edgesOnVertex: 0
boundaryVertex: 0
areaCell: 0.0
angleEdge: 0.0
dcEdge: 0.0
dvEdge: 0.0
weightsOnEdge: 0.0
areaTriangle: 0.0
kiteAreasOnVertex: 0.0
cellQuality: 0.0
gridSpacing: 0.0
triangleQuality: 0.0
triangleAngleQuality: 0.0
obtuseTriangle: 0
meshDensity: 0.0

@xylar
Copy link
Collaborator Author

xylar commented Jan 7, 2025

Thanks so much @erinethomas. That's a nice test to have that we're doing no harm here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants