Project Status #51

Open
kmuehlbauer opened this issue Jan 24, 2022 · 10 comments

Comments

@kmuehlbauer
Collaborator

@jjhelmus

Could you please give a short status on the project?

It seems there is at least some interest in enhancing and improving pyfive.

Xref:

Ping @bmaranville @woutdenolf

It would be great if there were a way forward for pyfive.

pyfive has been on the wish list as an h5netcdf backend for quite a while, and it looks like it already has most features either merged or in pending PRs.

I would happily join a team of maintainers if one is established.

@jjhelmus
Owner

jjhelmus commented May 9, 2022

I have let this project idle for too long, sorry about that. I no longer work with HDF5 files often, so I have little need for this library; it is no longer an itch I need to scratch.

I'd be happy to add additional maintainers who would be willing to triage bugs, review and merge PRs, and prepare releases. @kmuehlbauer I've invited you as a collaborator on this repository.

@kmuehlbauer
Collaborator Author

@jjhelmus Thanks Jonathan, I'll have a look at the current issues and pull requests in the next few days.

@bmaranville @woutdenolf Are you still working with HDF5? It would be great if you could join the team on pyfive. It looks like you are much deeper into the details of the HDF5 format.

@bmaranville
Collaborator

I am indeed still working with HDF5 files regularly... I would be happy to help where I can.

@bnlawrence

So after a long silence, I will be trying to make some significant changes to (a version of) this library, and so I'd like to understand whether the maintainers of this library would be keen to work with us on those, or whether we should fork and rename. The major changes would be:

  • full support for lazy chunk reading
  • full support for the H5D.DatasetID read interface (and the use of it by default for Dataset instances).
  • a modification to the behaviour of Dataset so that it caches everything it could need. That way it can not only support lazy data access but also operate efficiently in Dask threads, where we want to avoid each thread needing to read the b-tree etc., so we cache the b-tree when we instantiate the new DatasetID class instances. In practice this means that the Dataset would no longer have a link to the parent dataobjects instance and would utilise only the DatasetID instance for data access; the DatasetID would have copied and cached everything it needs from the dataobjects instance.
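
To illustrate the caching idea in the last bullet, here is a minimal, self-contained sketch. It does not use pyfive's real classes or the HDF5 file layout: `CachedChunkReader`, `build_toy_file`, the 4-byte chunk size, and the dict-based chunk index (standing in for the b-tree) are all hypothetical, and a `ThreadPoolExecutor` stands in for Dask threads. The point is only that the reader copies the index once at construction and then reads just the chunks a request overlaps, without going back to the parent object.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # bytes per chunk in this toy layout (assumption, not an HDF5 value)


class CachedChunkReader:
    """Hypothetical stand-in for the DatasetID idea: copy the chunk index
    once at construction, then serve reads without the parent object."""

    def __init__(self, path, chunk_index):
        self._path = path
        # Copy the index so later reads never consult the parent's copy.
        self._index = dict(chunk_index)

    def read(self, start, stop):
        """Return bytes [start, stop), touching only the overlapping chunks."""
        out = bytearray()
        first, last = start // CHUNK, (stop - 1) // CHUNK
        # Each call opens its own handle, so threads share no file position.
        with open(self._path, "rb") as fh:
            for n in range(first, last + 1):
                offset, size = self._index[n]
                fh.seek(offset)
                out += fh.read(size)
        lo = start - first * CHUNK
        return bytes(out[lo:lo + (stop - start)])


def build_toy_file():
    """Write 16 data bytes as four chunks, stored in reverse order after a header."""
    data = bytes(range(16))
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    index = {}
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as fh:
        fh.write(b"HEADER!!")
        for n in reversed(range(len(chunks))):
            index[n] = (fh.tell(), CHUNK)  # chunk number -> (byte offset, size)
            fh.write(chunks[n])
    return path, index, data


path, index, data = build_toy_file()
reader = CachedChunkReader(path, index)
index.clear()  # the parent's index can go away; the reader kept its own copy

requests = [(0, 4), (3, 9), (10, 16), (5, 6)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda r: reader.read(*r), requests))
os.remove(path)
```

Nothing is read from the file until `read` is called, and each worker thread relies only on the reader's own cached index.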

There are some new "dummy classes" to support the use of pyfive in the backend of h5netcdf, and some minor bug fixes to core functionality as well.

The current status of this work is that it is in a bunch of poorly documented branches of our existing fork, and so I'm asking now, so we can work out what's the best strategy to mature this work to a release state.

If we can do that efficiently here, that would be our preferred option, in which case we have an issue of how to bring it in. Unfortunately the change to DatasetID is quite a mega change, and so I can no longer see a nice way to bring this in as a bunch of easy to consume incremental pull requests. On the other hand, I think if we clean up what we have now into one branch, the motivation and the touch points should be quite clear.

@bnlawrence

bnlawrence commented Dec 21, 2024

(I should say that, if desired, we could find someone from the NCAS CMS team to join the maintenance team here if you go with the "let's try and avoid a fork" route. You don't want me helping you maintain anything, even if I've instigated the changes: my time is not very reliable, as you can see from how long it has taken me to come back to this. If we fork it, we'd have to assign someone to maintenance anyway.)

@kmuehlbauer
Collaborator Author

@bnlawrence From my perspective as an h5netcdf maintainer, these new features look great. Also, from my experience as an xarray core dev, it would be really nice to have a complete pure-Python chain. This will definitely be attractive for users of h5netcdf and xarray, too.

@jjhelmus
Owner

jjhelmus commented Jan 2, 2025

@bmaranville I'd be happy to add you as a maintainer of the project, along with @kmuehlbauer or anyone else who is interested. Whether you keep the development here or move it to a different org is fine with me; I don't have a strong preference. I can also work to add maintainers to the PyPI project.

It has been a number of years since I've worked with HDF5 or netCDF files, so my own interest in the project is limited. I can review pull requests, but it would be good to have others who can as well.

@bmaranville
Collaborator

Sure, I'd be happy to be added as a maintainer. I have been setting up trusted publisher workflows for PyPI on other projects recently.

@kmuehlbauer
Collaborator Author

kmuehlbauer commented Jan 2, 2025

@jjhelmus I'd be happy to be part of this maintainer team. My interest would be a pure Python chain pyfive -> h5netcdf -> xarray and further down to xradar and Py-ART (😉) and wradlib.

It would be great to have others on board, too, especially with @bnlawrence's outlined plan for enhancing pyfive in the near future.

@bnlawrence

@jjhelmus You might want to consider adding @valeriupredoi to your maintainer team too. I've asked him to look at how you could extend your actions framework to include testing against files in S3 and on HTTPS file servers with range-get capability (we have that working ourselves, but for obvious reasons we can't build it into the standard test suite). (V is a core maintainer for a number of Python packages, and while he doesn't want to claim any expertise on the details of pyfive, he's happy to help maintain it from a testing, dependency and future-Python point of view.)
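
For readers unfamiliar with "range-get": it refers to the standard HTTP `Range` request header, which lets a client fetch only part of a remote file. The sketch below only builds such a request and does not send it; the helper name `range_request` and the URL are hypothetical placeholders, not part of pyfive or of any existing test setup.

```python
import urllib.request


def range_request(url, start, length):
    """Build (but do not send) a GET request for `length` bytes from `start`."""
    end = start + length - 1  # HTTP byte ranges are inclusive at both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


# Hypothetical URL, used purely for illustration.
req = range_request("https://example.org/data/sample.h5", 0, 8)
range_value = req.get_header("Range")
```

A server that supports range requests answers such a request with status 206 (Partial Content) and only the requested bytes, which is what makes reading individual chunks from remote files practical.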


4 participants