
Severe performance degradation in 4.9.3-rc2 #3067

Open
xavierabellan opened this issue Dec 20, 2024 · 5 comments

@xavierabellan

This issue is related to Unidata/netcdf4-python#1393, but narrowed down to the netcdf-c library in its 4.9.3-rc2 version.

We have noticed a very significant degradation in read performance when accessing a variable in a loop.

As a reproducer, the International Best Track Archive for Climate Stewardship (IBTrACS) data can be used:

wget https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r01/access/netcdf/IBTrACS.ALL.v04r01.nc

With the following minimal C code:

#include <netcdf.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define FILE_NAME "IBTrACS.ALL.v04r01.nc"
#define DIM_STORM "storm"
#define VAR_LAT "lat"

int main() {
    int ncid;                 // File ID
    int storm_dimid;          // ID for the "storm" dimension
    int lat_varid;            // ID for the "lat" variable
    size_t nstorms;           // Total number of storms
    size_t start[2], count[2]; // Start and count arrays for reading lat data
    float lat_value;          // Latitude value to read
    double total_time = 0.0;  // Total time of getting lat value
    int iterations = 0;       // Number of iterations

    // Open the NetCDF file
    nc_open(FILE_NAME, NC_NOWRITE, &ncid);

    // Get the "storm" dimension ID and size
    nc_inq_dimid(ncid, DIM_STORM, &storm_dimid);
    nc_inq_dimlen(ncid, storm_dimid, &nstorms);

    // Get the "lat" variable ID
    nc_inq_varid(ncid, VAR_LAT, &lat_varid);

    // Iterate over the last 10 storms
    for (size_t ns = nstorms - 10; ns < nstorms; ns++) {
        // Set start and count for reading a single value
        start[0] = ns;  // Index of the storm
        start[1] = 0;   // First time step
        count[0] = 1;   // Read one storm
        count[1] = 1;   // Read one time step

        clock_t cstart, cend;
        cstart = clock();

        // Read the latitude value
        nc_get_vara_float(ncid, lat_varid, start, count, &lat_value);

        cend = clock();
        double time_taken = ((double)(cend - cstart)) * 1000.0 / CLOCKS_PER_SEC; // Convert to ms
        total_time += time_taken;
        iterations++;

        // Print the latitude value
        printf("Storm %zu: Latitude is %f, Time taken = %.3f ms\n", ns, lat_value, time_taken);

    }

    double average_time = total_time / iterations;
    printf("\nAverage runtime per iteration: %.3f ms\n", average_time);

    // Close the NetCDF file
    nc_close(ncid);

    return 0;
}
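
For reference, the reproducer above can be built against an installed netcdf-c roughly like this (assuming nc-config is on the PATH; the source file name is just an example):

cc -o ibtracs_repro ibtracs_repro.c $(nc-config --cflags) $(nc-config --libs)
./ibtracs_repro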

The program iterates over the last 10 storms in the input file and reads the corresponding lat value for each. With version 4.9.2 and below, only the first read takes significant time, while the rest take virtually no time:

Storm 13801: Latitude is -12.000000, Time taken = 80.000 ms
Storm 13802: Latitude is -13.400000, Time taken = 0.000 ms
Storm 13803: Latitude is -12.700001, Time taken = 0.000 ms
Storm 13804: Latitude is -13.100001, Time taken = 0.000 ms
Storm 13805: Latitude is -13.800000, Time taken = 0.000 ms
Storm 13806: Latitude is -8.600000, Time taken = 0.000 ms
Storm 13807: Latitude is -9.100000, Time taken = 0.000 ms
Storm 13808: Latitude is -2.300000, Time taken = 0.000 ms
Storm 13809: Latitude is 11.199999, Time taken = 0.000 ms
Storm 13810: Latitude is 18.800001, Time taken = 0.000 ms

Average runtime per iteration: 8.000 ms

Whereas with 4.9.3-rc2, every iteration takes a similar amount of time:

Storm 13801: Latitude is -12.000000, Time taken = 80.000 ms
Storm 13802: Latitude is -13.400000, Time taken = 90.000 ms
Storm 13803: Latitude is -12.700001, Time taken = 50.000 ms
Storm 13804: Latitude is -13.100001, Time taken = 60.000 ms
Storm 13805: Latitude is -13.800000, Time taken = 40.000 ms
Storm 13806: Latitude is -8.600000, Time taken = 50.000 ms
Storm 13807: Latitude is -9.100000, Time taken = 50.000 ms
Storm 13808: Latitude is -2.300000, Time taken = 50.000 ms
Storm 13809: Latitude is 11.199999, Time taken = 50.000 ms
Storm 13810: Latitude is 18.800001, Time taken = 50.000 ms

Average runtime per iteration: 57.000 ms

For certain programs, this may result in a very noticeable slowdown.

This has been reproduced on Linux and macOS.

@WardF
Member

WardF commented Dec 21, 2024

Thank you for reporting this, and thank you for providing the test program. I'll take a look at this and see if I can sort it out. Out of curiosity, do you know which version of libhdf5 is being linked against?

@xavierabellan
Author

The results pasted above were obtained with HDF5 1.14.5, but I could also observe the same behaviour with a 1.12.x release.
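
A quick way to confirm which library versions a binary is actually linked against is a small check along these lines (a sketch, assuming the program is compiled against both netcdf-c and HDF5 headers and libraries):

/* Print the netCDF-C and HDF5 library versions in use at runtime. */
#include <stdio.h>
#include <netcdf.h>
#include <hdf5.h>

int main(void) {
    unsigned maj, min, rel;
    printf("netCDF library: %s\n", nc_inq_libvers());
    if (H5get_libversion(&maj, &min, &rel) >= 0)
        printf("HDF5 library:   %u.%u.%u\n", maj, min, rel);
    return 0;
}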

@edwardhartnett
Contributor

OK, this is very significant and I need to take a look at this. @WardF I don't think you should release 4.9.3 until we get to the bottom of this problem.

The access pattern in use here is very common indeed, so this is not an edge case. We can't just slow down everyone's supercomputer programs for more I/O! There will be a crowd of angry scientists outside Unidata offices with pitchforks and torches...

@edwardhartnett
Contributor

Also it would be great if we could work out some basic performance testing in CI. It's a difficult problem, but surely we can try to catch such an obvious slowdown somehow...
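
A rough sketch of what such a CI check could look like follows; the fixture file name, variable name, and time budget are placeholders, not an existing test in netcdf-c:

/* Minimal performance regression check: time a batch of single-value
 * hyperslab reads and fail the job if the per-read average exceeds a
 * generous budget. Exit code 77 follows the automake "skip" convention
 * when the fixture is unavailable. */
#include <netcdf.h>
#include <stdio.h>
#include <time.h>

#define TEST_FILE "tst_perf.nc"   /* hypothetical fixture created elsewhere */
#define MAX_AVG_MS 5.0            /* arbitrary budget per single-value read */
#define NREADS 100

int main(void) {
    int ncid, varid;
    float val;
    size_t start[2] = {0, 0}, count[2] = {1, 1};

    if (nc_open(TEST_FILE, NC_NOWRITE, &ncid)) return 77;               /* skip: no fixture */
    if (nc_inq_varid(ncid, "lat", &varid)) { nc_close(ncid); return 77; }

    clock_t t0 = clock();
    for (int i = 0; i < NREADS; i++) {
        start[0] = (size_t)i;                                           /* one value per record */
        if (nc_get_vara_float(ncid, varid, start, count, &val)) { nc_close(ncid); return 1; }
    }
    double avg_ms = (double)(clock() - t0) * 1000.0 / CLOCKS_PER_SEC / NREADS;
    nc_close(ncid);

    printf("average per read: %.3f ms (budget %.1f ms)\n", avg_ms, MAX_AVG_MS);
    return avg_ms > MAX_AVG_MS ? 1 : 0;                                 /* nonzero fails CI */
}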

@DennisHeimbigner
Collaborator

My first thought was that the implementation of get_vara/get_vars had changed. So I took the get_vars code from 4.9.2 and inserted it into the main branch. It still shows the same performance problem, so with high probability it is not get_vars (or at least not only get_vars). BTW, I was using HDF5 1.12.3.
