Use dask to speed up SAM algorithm in mineral.py #168
Comments
Hi @aheermann, can you put this on the agenda for the next meeting? I am really keen to see what your plan for this is. It might also be appropriate to split this into smaller tasks, since it may end up being a pretty large undertaking.
Yep, I'll put it on the agenda. As for the undertaking, our idea was to do some preliminary investigation and trials with this module to see if it could work. Jonathan and Dennis are also investigating PyTorch for parallelizing the same code, so that we can move forward with the most appropriate module.
Excellent
Early branch available at https://github.com/capstone-coal/pycoal/tree/dask_trial
Thus far, we have been working on the SAM algorithm, trying to speed up pixel classification. We have tried several ways of splitting the pixel processing into dask delayed functions in order to parallelize it. However, on the smaller data set we are using, the task overhead has so far cancelled out any speedup. We are running on the f180201t01p00r05rdn_e_sc01_ort_img.hdr image, which takes about 3 hours 25 minutes un-parallelized on my machine, using the original master branch as a baseline.
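For reference, here is a minimal sketch of one way to split the pixel processing into dask delayed tasks. It is not the code on the dask_trial branch: `classify_block`, `classify_image`, and `rows_per_task` are hypothetical names, and the image is assumed to be an in-memory NumPy array of shape (rows, cols, bands). Each task handles a whole block of rows rather than a single pixel, which is one way to keep per-task scheduling overhead small relative to the work done.

```python
import numpy as np
import dask
from dask import delayed


def classify_block(block, library):
    """Label each pixel in `block` with the index of the closest library spectrum.

    block:   array of shape (rows, cols, bands)
    library: array of shape (n_spectra, bands)
    """
    pixels = block.reshape(-1, block.shape[-1]).astype(np.float64)
    lib = library.astype(np.float64)
    # Cosine of the angle between every pixel and every library spectrum.
    norms = np.linalg.norm(pixels, axis=1)[:, None] * np.linalg.norm(lib, axis=1)[None, :]
    cos = (pixels @ lib.T) / np.maximum(norms, 1e-12)  # guard against all-zero spectra
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    # Smallest spectral angle wins.
    return angles.argmin(axis=1).reshape(block.shape[:2])


def classify_image(image, library, rows_per_task=64):
    """Build one delayed task per block of rows, then compute them together."""
    tasks = [
        delayed(classify_block)(image[start:start + rows_per_task], library)
        for start in range(0, image.shape[0], rows_per_task)
    ]
    blocks = dask.compute(*tasks)  # tuple of 2-D label arrays, in row order
    return np.vstack(blocks)
```

Whether this beats the serial version depends on the image size, the block size, and the scheduler in use, so `rows_per_task` would need tuning on real data.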
@aheermann, can you please link to the dataset?
Do you mean the pycoal master branch? If not, then this is not much to worry about, as it is to be expected. Please provide more details. Thanks
Since the last update, dask was temporarily put on hold because our personal machines were not powerful enough to take advantage of it. Now that we have access to AWS, we will pick work on dask back up. It will be one of several options, alongside PyTorch (#172) and Joblib (#177), for users running pycoal.
@aheermann got it.
The SAM algorithm in mineral.py takes the majority of the time when classifying an image, specifically these loops.
Parallelizing this method should reduce runtimes. I think the Dask module would be a good place to start.
https://github.com/dask/dask
https://dask.org/
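To make the hot spot concrete, here is a rough sketch of the loop structure that spectral angle mapping implies: for every pixel, compute the angle to every library spectrum and keep the smallest. This is an illustration only, not the code in mineral.py; `spectral_angle` and `classify_naive` are hypothetical names, and treating an all-zero spectrum as matching nothing is a convention chosen for this sketch.

```python
import numpy as np


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    denom = np.linalg.norm(pixel) * np.linalg.norm(reference)
    if denom == 0.0:
        # Assumption for this sketch: an all-zero spectrum gets the
        # "orthogonal" angle so it never looks like a good match.
        return np.pi / 2
    cos = np.dot(pixel, reference) / denom
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def classify_naive(image, library):
    """Label every pixel with the index of its closest library spectrum.

    The cost is rows * cols * n_spectra angle computations, all done in
    Python-level loops, which is why this step dominates the runtime.
    """
    rows, cols, _ = image.shape
    labels = np.zeros((rows, cols), dtype=np.int64)
    for r in range(rows):
        for c in range(cols):
            angles = [spectral_angle(image[r, c], ref) for ref in library]
            labels[r, c] = int(np.argmin(angles))
    return labels
```

Each pixel's result is independent of every other pixel's, which is what makes the outer loops a natural candidate for parallelization.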