
Moving away from pickles #18

Open
NeoChaos12 opened this issue Jan 13, 2023 · 0 comments
Labels
enhancement New feature or request

@NeoChaos12
Collaborator

Pickling our performance datasets and surrogate models was a convenient choice during the development phase of the repo, but it is inherently problematic for the long-term maintenance and support of the repo. In particular, pickling makes the shared datasets and models very sensitive to the exact dependency versions and system configuration used when they were originally pickled: an object pickled under one version of a library may fail to load, or load incorrectly, under another. Case in point, #6. A future release should therefore focus on moving away from pickles. Specifically:

  • Performance Datasets: Relatively straightforward, since pandas DataFrames are quite flexible and support a variety of I/O options. Viable formats include HDF5, Feather and CSV, each with its own advantages and disadvantages that need to be weighed carefully. It may be fine to keep using pickles for the interim data (checkpoints, metrics) generated during model training and switch to a different format only for the final dataset.
  • Surrogate Models: More nuanced, since a large number of moving parts are involved. The new serialization scheme will need to be chosen based on the interoperability of scikit-learn, XGBoost and jahs_bench.surrogates.model.XGBSurrogate. Ultimately, it may be necessary to write a custom JSON encoder for XGBSurrogate that captures all relevant parameters of the trained model object and saves/loads them.
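As one illustration of the trade-offs mentioned for the performance datasets: CSV needs no extra dependency and is human-readable, but it does not record column dtypes, so, for example, categorical columns come back as plain strings. A minimal sketch, assuming a flat DataFrame with string column names and a default integer index, is to write a small JSON sidecar holding the dtypes next to the CSV. The function names and the `.dtypes.json` sidecar are hypothetical, not part of jahs_bench. Feather (`DataFrame.to_feather` / `pd.read_feather`, which need pyarrow) and HDF5 (`DataFrame.to_hdf` / `pd.read_hdf`, which need PyTables) store type information in the file itself, with some format-specific limits, at the cost of an extra dependency.

```python
import json
import os
import tempfile

import pandas as pd


def save_dataset(df: pd.DataFrame, path: str) -> None:
    """Write df to CSV plus a JSON sidecar that records each column's dtype.

    The "<path>.dtypes.json" sidecar name is a hypothetical convention.
    """
    df.to_csv(path, index=False)  # a default RangeIndex is rebuilt on load
    dtypes = {str(col): str(dtype) for col, dtype in df.dtypes.items()}
    with open(path + ".dtypes.json", "w") as f:
        json.dump(dtypes, f, indent=2)


def load_dataset(path: str) -> pd.DataFrame:
    """Read the CSV back and re-apply the recorded dtypes."""
    with open(path + ".dtypes.json") as f:
        dtypes = json.load(f)
    # Use the exact ("round_trip") float parser instead of the default one.
    df = pd.read_csv(path, float_precision="round_trip")
    return df.astype(dtypes)


if __name__ == "__main__":
    # Toy data with hypothetical column names, not the real benchmark schema.
    toy = pd.DataFrame({
        "config_id": [0, 1, 2],
        "activation": pd.Categorical(["ReLU", "Mish", "ReLU"]),
        "valid_acc": [91.25, 88.5, 90.125],
    })
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.csv")
        save_dataset(toy, path)
        restored = load_dataset(path)
    pd.testing.assert_frame_equal(toy, restored)  # dtypes survive, incl. category
```

The sidecar keeps the data file itself readable by any CSV tool, while the dtype metadata stays in a format (JSON) that does not depend on pandas or Python versions.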
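For the surrogate models, one possible shape for the custom JSON encoder is sketched below. It assumes the surrogate's state can be reduced to plain, JSON-compatible fields; `SurrogateState`, its fields, `SurrogateEncoder` and `decode_surrogate` are hypothetical names and do not reflect the actual attributes of jahs_bench.surrogates.model.XGBSurrogate. The trained booster would be kept as text in `booster_json`, on the assumption that it is produced separately with XGBoost's own JSON model format (e.g. `save_model` with a `.json` file name), so the encoder only has to wrap it together with the remaining wrapper parameters. An explicit `format_version` field lets a later release reject files it cannot read with a clear error instead of failing unpredictably.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List

# Format versions this (hypothetical) loader knows how to read.
SUPPORTED_VERSIONS = {1}


@dataclass
class SurrogateState:
    """Hypothetical stand-in for the state an XGBSurrogate would need to persist."""

    feature_names: List[str]
    label_names: List[str]
    hyperparameters: Dict[str, Any]
    # Text of the booster in XGBoost's own JSON model format (assumed to be
    # produced separately, e.g. via XGBoost's save_model with a .json file).
    booster_json: str = ""
    format_version: int = 1


class SurrogateEncoder(json.JSONEncoder):
    """Serialize SurrogateState as a tagged dict; defer everything else."""

    def default(self, o: Any) -> Any:
        if isinstance(o, SurrogateState):
            return {"__surrogate_state__": True, **asdict(o)}
        return super().default(o)  # raises TypeError for unsupported types


def decode_surrogate(obj: Dict[str, Any]) -> Any:
    """object_hook for json.loads: rebuild SurrogateState from its tagged dict."""
    if obj.pop("__surrogate_state__", False):
        state = SurrogateState(**obj)
        if state.format_version not in SUPPORTED_VERSIONS:
            raise ValueError(
                f"unsupported surrogate format version: {state.format_version}"
            )
        return state
    return obj


if __name__ == "__main__":
    # Illustrative values only; not taken from the real surrogate.
    state = SurrogateState(
        feature_names=["x1", "x2"],
        label_names=["valid_acc"],
        hyperparameters={"max_depth": 6, "n_estimators": 500},
    )
    text = json.dumps(state, cls=SurrogateEncoder, indent=2)
    assert json.loads(text, object_hook=decode_surrogate) == state
```

Tagging the dict with a marker key keeps nested plain dicts (such as `hyperparameters`) untouched by the `object_hook`, since only the tagged object is turned back into a `SurrogateState`.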
NeoChaos12 added the enhancement label on Jan 13, 2023