HDF Error when reading a NetCDF file as part of tests (only NetCDF4==1.7.1, using tox on Linux) #1343

Open
matteobachetti opened this issue Jun 24, 2024 · 14 comments

Comments

@matteobachetti

matteobachetti commented Jun 24, 2024

My code saves and analyzes data in NetCDF4 format. I have no problem whatsoever with the analysis.
However, when I run unit tests with tox on Linux I get a ton of HDF and OS errors, e.g.:
https://github.com/StingraySoftware/HENDRICS/actions/runs/9580442835/job/26417155244?pr=164

I could reproduce this when running tox -e py311-test-alldeps, but only on Linux. On macOS (M1) the same tox command works with no issue, and if I run the tests with pytest in a fresh conda environment with the same software versions as the tox environment (in particular, the same netCDF4, h5py, and numpy versions), they pass on all architectures. Apparently, I can only reproduce the issue when running with tox on Linux. This makes debugging a lot more difficult.

On Stack Overflow, another user found that the error only occurred with NetCDF4 1.7.1, and indeed, pinning netcdf4 to !=1.7.1 made our tests pass as well: https://github.com/StingraySoftware/HENDRICS/actions/runs/9650470922/job/26616163437?pr=165
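
For anyone who wants to detect the affected release at runtime rather than only pinning it, here is a minimal sketch. It assumes that 1.7.1 and its post-releases (1.7.1.post1 is reported later in this thread) are the only affected versions; the helper names are made up for illustration.

```python
from importlib import metadata


def is_affected_release(version: str) -> bool:
    """Return True for 1.7.1 and its post-releases (e.g. 1.7.1.post1).

    Assumption: only the 1.7.1 series is affected, as reported in this issue.
    """
    return version.split(".")[:3] == ["1", "7", "1"]


def installed_netcdf4_version():
    """Return the installed netCDF4 version string, or None if it is not installed."""
    try:
        return metadata.version("netCDF4")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_netcdf4_version()
    if found is None:
        print("netCDF4 is not installed")
    elif is_affected_release(found):
        print(f"netCDF4 {found} is in the release series reported in this issue")
    else:
        print(f"netCDF4 {found} is not in the release series reported in this issue")
```

A check like this could sit in a test suite's conftest to skip or flag the affected tests, though pinning in the requirements remains the simpler option.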

@larsevj

larsevj commented Jun 25, 2024

We are also seeing this issue with version 1.7.1; tests are failing on Linux:
https://github.com/equinor/ert/actions/runs/9660713887/job/26646962384?pr=8189
Cloning the repo and building from source in the workflow fixes the issue, so it seems like something is off with the PyPI wheels?

@ocefpaf
Collaborator

ocefpaf commented Jun 25, 2024

Can you create a simple, small, and reproducible example of code that we can use to debug?

@matteobachetti
Author

@ocefpaf I'm trying but it's tricky. It only fails on tox for me! If I install the same version and run the tests directly with pytest, everything works

@larsevj

larsevj commented Jun 25, 2024

I am not sure if this is exactly the same issue as the tests failing above, but this code works on netCDF4<1.7.1 and fails on 1.7.1:

import h5py
import numpy as np
import netCDF4 as nc

rootgrp = nc.Dataset("test.nc", "w", format="NETCDF4")
x = rootgrp.createDimension("x", 1)
y = rootgrp.createVariable("y","f4",("x",))
y[:] = 0

It seems that importing h5py is what breaks things with 1.7.1.
Note, this example only fails if h5py is imported before netcdf4.
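
Since the import order is what matters here, a small guard can flag the failing order before netCDF4 is imported. This is only a sketch: the function names are hypothetical, and it only inspects sys.modules rather than fixing anything.

```python
import sys
import warnings


def h5py_imported_first(modules=None) -> bool:
    """Return True if h5py is already loaded while netCDF4 is not yet loaded."""
    modules = sys.modules if modules is None else modules
    return "h5py" in modules and "netCDF4" not in modules


def warn_if_risky_import_order(modules=None) -> bool:
    """Warn when the import order matches the failing case; return whether it did."""
    risky = h5py_imported_first(modules)
    if risky:
        warnings.warn(
            "h5py was imported before netCDF4; with the netCDF4 1.7.1 wheels "
            "this order was reported to trigger HDF errors",
            RuntimeWarning,
            stacklevel=2,
        )
    return risky
```

Calling warn_if_risky_import_order() just before the first `import netCDF4` in a test session would show whether some other dependency pulled in h5py earlier.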

@ocefpaf
Collaborator

ocefpaf commented Jun 25, 2024

Note, this example only fails if h5py is imported before netcdf4.

This is probably because of a mismatch between the HDF5 libraries used. Sadly, unless both h5py and netcdf4 coordinate on what to use, that is a limitation of wheels as far as I know, and you'll have to separate the workflows that import both. Sorry, but I don't have a better solution for wheels. You could try other package managers, like conda, where both h5py and netcdf4 will use the exact same HDF5 library to ensure things like this don't happen.
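
One way to separate the workflows, sketched here under the assumption that each step can run as a standalone snippet, is to give each library its own interpreter process so the two are never loaded together. The run_isolated helper and the two step strings are illustrative only.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a fresh interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Illustrative steps: each one imports only one of the two libraries
# and prints the HDF5 version that library reports.
H5PY_STEP = "import h5py; print(h5py.version.hdf5_version)"
NETCDF_STEP = "import netCDF4; print(netCDF4.__hdf5libversion__)"

if __name__ == "__main__":
    for name, step in [("h5py", H5PY_STEP), ("netCDF4", NETCDF_STEP)]:
        result = run_isolated(step)
        # Show stdout on success, or the tail of stderr if the import failed.
        detail = result.stdout.strip() or result.stderr.strip()[-200:]
        print(name, result.returncode, detail)
```

The cost is that data has to be passed between the steps through files or pipes, so this fits batch-style pipelines better than code that mixes both libraries in one function.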

@ocefpaf
Collaborator

ocefpaf commented Jun 25, 2024

Similar issues: #653, #1214, #694, #213.

@isuruf, sorry for the ping but do you believe that there is something I'm missing in the cibuildwheel configuration here that solves this? I recall that delvewheel fixed this on Windows but I thought that auditwheel was run by default on Linux and should also fix that, no?

@NikosAlexandris

Same issue here too. Using 1.7.1 gives an Errno 101 when reading a netCDF file the usual way, i.e. Dataset('somefile.nc').

@ZedThree
Contributor

ZedThree commented Jul 1, 2024

@ocefpaf Would it help to change ghcr.io/ocefpaf/manylinux2014_x86_64-netcdf to use FROM ghcr.io/h5py/manylinux2014_x86_64-hdf5 and explicitly build on top of the h5py manylinux image? I don't think that would guarantee compatibility, as that would require keeping releases in sync, but it should at least ensure there is one compatible version of h5py.

Another potential solution could be to add h5py as a build dependency, but I think this is likely much harder to make work.


I can't actually see how the built hdf5 libraries could really differ: both images start with a plain manylinux image and build hdf5 1.14.2 with the default options. Yet in the wheels, we end up with libhdf5-7b49ac63.so.310.2.0 (netCDF4) and libhdf5-7f639dcd.so.310.2.0 (h5py).
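
To see which vendored copies an environment actually contains, a rough sketch like the following lists the libhdf5 files per package. It assumes the usual auditwheel layout, where bundled libraries live in a "<package>.libs" directory next to the package in site-packages; the function name is made up.

```python
import sysconfig
from pathlib import Path


def bundled_hdf5_libraries(site_packages: Path) -> dict:
    """Map each '<package>.libs' directory name to the libhdf5 files inside it."""
    found = {}
    for libs_dir in sorted(site_packages.glob("*.libs")):
        if not libs_dir.is_dir():
            continue
        # Matches libhdf5 itself and companions such as libhdf5_hl.
        names = sorted(p.name for p in libs_dir.glob("libhdf5*"))
        if names:
            found[libs_dir.name] = names
    return found


if __name__ == "__main__":
    # platlib is where binary wheels are installed in the current environment.
    root = Path(sysconfig.get_paths()["platlib"])
    for package, names in bundled_hdf5_libraries(root).items():
        print(package, names)
```

On an environment with both wheels installed, this should print one entry per package, each with its own hashed libhdf5 file name.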

@ocefpaf
Collaborator

ocefpaf commented Jul 1, 2024

@ocefpaf Would it help to change ghcr.io/ocefpaf/manylinux2014_x86_64-netcdf to use FROM ghcr.io/h5py/manylinux2014_x86_64-hdf5 and explicitly build on top of the h5py manylinux image? I don't think that would guarantee compatibility, as that would require keeping releases in sync, but it should at least ensure there is one compatible version of h5py.

I tried that at first, along with more collaboration to get things under the same image. But I confess I don't have the energy necessary to coordinate, alone, the non-Python dependencies of wheels, especially those that consume 3-4 C libraries.

Maybe there is a wheel trick, advanced option, or something obvious that I'm missing but, unless the community works together, I don't see how this issue can be solved in the long run, just small bursts of luck for a release here and there.

@ZedThree
Contributor

ZedThree commented Jul 1, 2024

Ah, that's a shame they weren't interested. This is a really hard community problem.

I've not actually been able to find any real differences between the built .so files, and in fact copying one version over the top of the other still gives me the same errors, so I don't think it's due to incompatible binaries.
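
One thing that might still be worth checking (this is a guess, not something established in this thread) is whether both copies of libhdf5 end up mapped into the same process. On Linux that can be read from /proc/self/maps after importing both packages; the helper names below are hypothetical.

```python
from pathlib import Path


def loaded_hdf5_paths(maps_text: str) -> list:
    """Return the distinct file paths containing 'libhdf5' in a /proc/<pid>/maps dump."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (pathname is optional).
        fields = line.split(None, 5)
        if len(fields) == 6 and "libhdf5" in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


def loaded_hdf5_in_this_process() -> list:
    """Linux only: inspect the current process; returns [] where /proc is unavailable."""
    maps = Path("/proc/self/maps")
    if not maps.exists():
        return []
    return loaded_hdf5_paths(maps.read_text())
```

Calling loaded_hdf5_in_this_process() right after `import h5py` and `import netCDF4` would list every libhdf5 file mapped at that point; more than one distinct path would mean two copies are live in the same interpreter.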

@leoschwarz

leoschwarz commented Aug 7, 2024

If anyone wants a small example for this bug (edit: actually, h5py/h5py#2453 (comment) already contains a similar example):

# Dockerfile
FROM python:3.12-bookworm
RUN pip install netcdf4==1.7.1.post1 h5py==3.11.0

# provoke.py
import numpy as np
import h5py
import netCDF4 as nc

data = np.random.rand(100, 100)

with h5py.File('data.h5', 'w') as file:
    file.create_dataset('data', data=data)

with nc.Dataset('data.nc', 'w') as file:
    file.createDimension('x', 100)
    file.createDimension('y', 100)
    file.createVariable('data', 'f8', ('x', 'y'))[:] = data

I found that if I import h5py after netCDF4, I can't reproduce the error, but if it's imported before netCDF4, the error is deterministic for me. I wasn't able to reproduce it in an aarch64 container, but maybe that is due to the lack of a wheel; on amd64 (native Linux PC) I have this issue.

copybara-service bot pushed a commit to google-deepmind/torax that referenced this issue Sep 11, 2024
Unidata/netcdf4-python#1343.

Additionally because of not using 1.7.1 netcdf (which adds support for np>=2 compatibility https://docs.xarray.dev/en/stable/whats-new.html#id10) restrict xarray versions from very latest versions.

PiperOrigin-RevId: 673435320
copybara-service bot pushed a commit to google-deepmind/torax that referenced this issue Sep 11, 2024
Unidata/netcdf4-python#1343.

Additionally because of not using 1.7.1 netcdf (which adds support for np>=2 compatibility) restrict numpy to <2.

PiperOrigin-RevId: 673435320
copybara-service bot pushed a commit to google-deepmind/torax that referenced this issue Sep 11, 2024
Unidata/netcdf4-python#1343.

Additionally because of not using 1.7.1 netcdf (which adds support for np>=2 compatibility) restrict numpy to <2.

PiperOrigin-RevId: 673569486
@rosepearson

rosepearson commented Oct 9, 2024

I didn't have any problems when the upgrade to 1.7.1 first occurred; however, I have run into problems since.

Working automated testing with netcdf 1.7.1 (see conda list drop down line 194): https://github.com/rosepearson/GeoFabrics/actions/runs/10839705758/job/31267478081
Error in same automated testing with netcdf 1.7.1 (see conda list drop down line 194): https://github.com/rosepearson/GeoFabrics/actions/runs/11205845975/job/31145826055

Are there any changes to the packaging of netcdf4 1.7.1 that have occurred since its first release?

@ocefpaf
Collaborator

ocefpaf commented Oct 9, 2024

What is your error? It is hard to read the logs and a small reproducer would really help. If it is:

ERROR tests/test_add_patches_ngaruroro/test_case.py::Test::test_result_dem - rasterio.errors.RasterioIOError: 
'tests/test_add_patches_ngaruroro/data/results/../initial_dem.nc' not recognized as being in a supported file format.
It could have been recognized by driver HDF5, but plugin gdal_HDF5.so is not available in your installation.
You may install it with 'conda install -c conda-forge libgdal-hdf5'

All you need to do is to add the hdf driver as specified in the error message. That is not a problem with netcdf4 BTW.

@rosepearson

What is your error? It is hard to read the logs and a small reproducer would really help. If it is:

ERROR tests/test_add_patches_ngaruroro/test_case.py::Test::test_result_dem - rasterio.errors.RasterioIOError: 
'tests/test_add_patches_ngaruroro/data/results/../initial_dem.nc' not recognized as being in a supported file format.
It could have been recognized by driver HDF5, but plugin gdal_HDF5.so is not available in your installation.
You may install it with 'conda install -c conda-forge libgdal-hdf5'

All you need to do is to add the hdf driver as specified in the error message. That is not a problem with netcdf4 BTW.

Hello @ocefpaf, the funny thing is that the two environments are created from identical environment.yml files. In the more recent environments something seems to be going awry, leading to the above error even though libgdal-hdf5 is installed. I think you may be right, however, about this error being upstream of netcdf 1.7.1.

I will look into switching from a conda-installed environment to a pip-installed one to see if this fixes the issues.

If of interest, I checked the following libraries between the two builds and they are identical. Are there any others you would recommend checking for different versions leading to weird errors associated with saving / loading netCDF files? These errors span driver errors through to CRS attribute errors.

Library versions (All conda installed):
netcdf4 - 1.7.1 - same
libnetcdf - 4.9.2 - same
libgdal-hdf5 - 3.9.2 - same
libgdal-netcdf - 3.9.2 - same
libgdal-hdf4 - 3.9.2 - same
hdf4 - 4.2.15 - same
hdf5 - 1.14.3 - same
rioxarray - 0.17.0 - same
rasterio - 1.3.11 - same
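
Rather than checking packages one by one, the two environments could be compared in full, including build strings, since two builds can share a version number and still differ. A minimal sketch, assuming the output of `conda list --export` (lines of the form name=version=build) was saved from each CI run; the function names are made up.

```python
def parse_conda_export(text: str) -> dict:
    """Parse `conda list --export` lines (name=version=build) into {name: (version, build)}."""
    packages = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        parts = line.split("=")
        name = parts[0]
        version = parts[1] if len(parts) > 1 else ""
        build = parts[2] if len(parts) > 2 else ""
        packages[name] = (version, build)
    return packages


def diff_environments(old: dict, new: dict) -> dict:
    """Return {name: (old_entry, new_entry)} for packages that differ; None marks a missing one."""
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed
```

Feeding the working and failing exports through parse_conda_export and then diff_environments would give the full list of packages whose version or build changed between the two runs.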
