
Support large dataset preprocessing #452

Open
wants to merge 31 commits into base: develop

Conversation

chiang-yuan (Contributor)

This PR tries to resolve OOM errors and improve performance when loading very large datasets such as MPTrj (1.58M structures) or even bigger ones. To use this file, mpi4py is needed.

An additional file, preprocessing_data_mpi.py, is added to preserve backward compatibility, and the refactoring is kept to a minimum. Ideally, however, preprocessing_data.py could be replaced with the new file, provided we account for the import dependency on mpi4py.
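For context, here is a minimal sketch of the MPI-parallel pattern this PR describes (this is not the PR's actual code; the function and file names are illustrative): each rank lazily reads only its own stride of the input file with `ase.io.iread` and writes its own HDF5 shard, so no single process ever holds the full dataset in memory.

```python
# Hedged sketch, not the PR's implementation: each MPI rank reads a
# disjoint, strided subset of the frames and writes its own HDF5 shard.
from mpi4py import MPI
import ase.io
import h5py

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

def preprocess(xyz_path: str, out_prefix: str) -> None:
    # ase.io.iread yields frames lazily; the slice string f"{rank}::{size}"
    # selects every size-th frame starting at this rank's offset, so the
    # ranks partition the dataset without coordination.
    frames = ase.io.iread(xyz_path, index=f"{rank}::{size}")
    with h5py.File(f"{out_prefix}_{rank}.h5", "w") as f:
        for i, atoms in enumerate(frames):
            g = f.create_group(f"config_{i}")
            g.create_dataset("positions", data=atoms.get_positions())
            g.create_dataset("atomic_numbers", data=atoms.get_atomic_numbers())
            g.create_dataset("cell", data=atoms.get_cell()[:])

if __name__ == "__main__":
    preprocess("mptrj.extxyz", "processed")  # hypothetical file names
```

Run under MPI, e.g. `mpirun -n 8 python preprocessing_data_mpi.py`, so each rank emits its own shard; the shards can then be consumed independently downstream.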

ilyes319 changed the base branch from main to develop on June 21, 2024, 14:12
ilyes319 (Contributor)

Hey @chiang-yuan, thank you. Is this ready to be merged?

chiang-yuan (Contributor, Author)

It still needs some refactoring. It seems that modifying only the ase read part is not enough; I will refactor all of the HDF5 file-writing code as well, but it might take some time...
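To illustrate why the write path matters too (a hedged sketch under assumed names, not the planned refactor): lazy reading alone does not help if the writer still buffers every frame before the HDF5 write; streaming each frame into the file as it is read keeps peak memory at a single frame regardless of dataset size.

```python
# Anti-pattern: frames = list(ase.io.iread(xyz_path)) materializes the whole
# dataset before writing, reintroducing the OOM that lazy reading avoided.
import ase.io
import h5py

def write_streaming(xyz_path: str, h5_path: str) -> None:
    # Streaming variant: each frame is written as soon as it is read, so
    # memory use stays constant even for MPTrj-scale inputs.
    with h5py.File(h5_path, "w") as f:
        for i, atoms in enumerate(ase.io.iread(xyz_path)):
            g = f.create_group(f"config_{i}")
            g.create_dataset("positions", data=atoms.get_positions())
            g.create_dataset("atomic_numbers", data=atoms.get_atomic_numbers())
```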
