Pickling our performance datasets and surrogate models was a convenient choice during the development phase of the repo, but it is inherently problematic for long-term maintenance and support. In particular, pickled datasets and models are very sensitive to the exact dependency versions and system configuration in use when they were pickled; #6 is a case in point. A future release should therefore focus on moving away from pickles. Specifically:
Performance Datasets: This is relatively straightforward, since pandas DataFrames are quite flexible and support a variety of I/O options. Viable formats include HDF5, Feather, and CSV, each with its own advantages and disadvantages that need to be weighed carefully. It may be fine to keep using pickles for the interim data (checkpoints, metrics) generated during model training and to switch to a different format only for the final dataset.
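As a rough illustration of what the dataset side could look like, here is a minimal sketch that round-trips a small DataFrame through CSV, the only one of the three formats that needs no extra dependency. The column names and values are made up for the example and are not taken from the actual datasets.

```python
import io

import pandas as pd

# Hypothetical stand-in for a performance dataset; the columns are illustrative only.
df = pd.DataFrame(
    {
        "epoch": [1, 2, 3],
        "valid_acc": [0.71, 0.78, 0.82],
        "optimizer": ["SGD", "SGD", "Adam"],
    }
)


def roundtrip_csv(frame: pd.DataFrame) -> pd.DataFrame:
    """Write a frame to CSV text in memory and read it back."""
    buffer = io.StringIO()
    # index=False: the default RangeIndex carries no information worth storing.
    frame.to_csv(buffer, index=False)
    buffer.seek(0)
    # CSV stores no dtype metadata, so dtypes are re-inferred on load.
    # float_precision="round_trip" asks the parser to recover floats exactly.
    return pd.read_csv(buffer, float_precision="round_trip")


restored = roundtrip_csv(df)
pd.testing.assert_frame_equal(df, restored)

# The binary formats follow the same pattern but need optional dependencies:
#   df.to_feather(path) / pd.read_feather(path)        (requires pyarrow)
#   df.to_hdf(path, key="data") / pd.read_hdf(path)    (requires PyTables)
```

CSV is the most portable of the three but loses dtype and index metadata, so a real dataset with a MultiIndex or categorical columns would need extra care on load. That trade-off is one of the things to weigh against Feather and HDF5.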
Surrogate Models: This is more nuanced since it involves a large number of moving parts. A different serialization scheme will need to be chosen based on the interoperability of scikit-learn, XGBoost and jahs_bench.surrogates.model.XGBSurrogate. Ultimately, it may be necessary to write a custom JSON encoder for XGBSurrogate that captures all relevant parameters of the trained model object and saves/loads them.
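To make the custom-encoder idea concrete, here is a minimal sketch using only the standard library. `ToySurrogate` is a hypothetical stand-in and does not reflect the real attributes of `XGBSurrogate`; the `__type__` tag and the field names are likewise assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToySurrogate:
    """Hypothetical stand-in for a trained surrogate; NOT the real XGBSurrogate."""

    hyperparams: dict = field(default_factory=dict)
    feature_names: list = field(default_factory=list)
    label_mean: float = 0.0


class SurrogateEncoder(json.JSONEncoder):
    """Encode ToySurrogate objects as tagged, plain-JSON dictionaries."""

    def default(self, obj):
        if isinstance(obj, ToySurrogate):
            # The "__type__" tag lets the loader recognise this payload later.
            return {"__type__": "ToySurrogate", "state": asdict(obj)}
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


def surrogate_hook(payload: dict):
    """object_hook for json.loads: rebuild tagged objects, pass other dicts through."""
    if payload.get("__type__") == "ToySurrogate":
        return ToySurrogate(**payload["state"])
    return payload


model = ToySurrogate(
    hyperparams={"max_depth": 6, "learning_rate": 0.1},
    feature_names=["f0", "f1", "f2"],
    label_mean=0.83,
)

text = json.dumps(model, cls=SurrogateEncoder, sort_keys=True)
loaded = json.loads(text, object_hook=surrogate_hook)
assert loaded == model
```

A real encoder would also have to persist the fitted boosters and any preprocessing state. XGBoost can already write a booster to a JSON file through `save_model`, so the custom encoder might only need to cover the wrapper's own parameters and reference those files.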