REF: Reorganize how multi-device parallelism is implemented #322
Hello @carterbox! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-07-17 18:41:45 UTC
It is no longer safe to call get_probe(), get_scan(), and the other getter functions from within the ptycho Reconstruction() context. They are marked as not implemented until they are fixed.
Purpose
Reduce communications overhead for multi-device reconstructions and hopefully speed up large reconstructions.
Separate multi-device parallelism from reconstruction logic (ptycho solver implementation).
Approach
Instead of synchronizing every update to the probe/object across all devices, use an approach closer to distributed consensus: devices communicate only with their neighbors for object updates, and synchronization happens only once per epoch.
The effect on performance is a minor reduction in time per epoch for any number of GPUs, but an increase in the number of epochs required for convergence in multi-GPU reconstructions, because information is exchanged between GPUs less frequently.
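To illustrate the once-per-epoch synchronization described above, here is a minimal toy sketch in plain Python. It is not the code in this PR: `local_update`, `synchronize`, and `run_epoch` are hypothetical names, the "update" is a simple relaxation toward per-device toy data, plain averaging stands in for whatever consensus rule is actually used, and the neighbor-to-neighbor exchange of object updates is left out entirely.

```python
def local_update(obj, target, step=0.5):
    """Toy update: move one device's copy part of the way toward its own data."""
    return [o + step * (t - o) for o, t in zip(obj, target)]


def synchronize(copies):
    """Average the per-device copies elementwise and give every device the result."""
    n = len(copies)
    mean = [sum(values) / n for values in zip(*copies)]
    return [list(mean) for _ in copies]


def run_epoch(copies, targets, batches_per_epoch=4):
    """Run one epoch: independent local updates, then a single synchronization."""
    for _ in range(batches_per_epoch):
        # No cross-device communication happens inside this loop.
        copies = [local_update(c, t) for c, t in zip(copies, targets)]
    # The only synchronization point in the epoch (assumed here to be an average).
    return synchronize(copies)


# Two simulated devices that each see different (toy) data.
copies = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
targets = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
copies = run_epoch(copies, targets)
```

Between synchronizations the copies drift apart, which is consistent with the trade-off noted above: less communication per epoch, but each device works from staler information.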