Use dask to speed up SAM algorithm in mineral.py #168

aheermann · 2019-09-18T20:50:33Z

The SAM algorithm in mineral.py accounts for most of the runtime when classifying an image, specifically these loops:

    # for each pixel in the image
    for x in range(M):

        for y in range(N):

            # read the pixel from the file
            pixel = data[x,y]

            # if it is not a no data pixel
            if not numpy.isclose(pixel[0], -0.005) and not pixel[0]==-50:

                # resample the pixel ignoring NaNs from target bands that don't overlap
                # TODO fix spectral library so that bands are in order
                resampled_pixel = numpy.nan_to_num(resample(pixel))

                # calculate spectral angles
                angles = spectral.spectral_angles(resampled_pixel[numpy.newaxis,
                                                                 numpy.newaxis,
                                                                 ...],
                                                  library.spectra)

                # normalize confidence values from [pi,0] to [0,1]
                for z in range(angles.shape[2]):
                    angles[0,0,z] = 1-angles[0,0,z]/math.pi

                # get index of class with largest confidence value
                index_of_max = numpy.argmax(angles)

                # get confidence value of the classified pixel
                score = angles[0,0,index_of_max]

                # classify pixel if confidence above threshold
                if score > threshold:

                    # index from one (after zero for no data)
                    classified[x,y] = index_of_max + 1

                    if scores_file_name is not None:
                        # store score value
                        scored[x,y] = score

Parallelizing this method should reduce runtimes. I think trying the Dask library would be a good starting point.
https://github.com/dask/dask
https://dask.org/
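Separately from parallelization, the body of the loops above can be written for a whole block of pixels at once. The following is a minimal NumPy sketch, not the pycoal implementation: `classify_block` is a hypothetical name, the `resample` step is left out, and it assumes `spectral.spectral_angles` computes the usual arccos of the normalized dot product.

```python
import numpy as np


def classify_block(pixels, library_spectra, threshold):
    """Classify a (rows, cols, bands) block against (classes, bands) spectra.

    Returns (classified, scored): class indices starting at one (zero means
    no data or below threshold) and the confidence of each classified pixel.
    """
    # Same no-data test as the loop above, applied to the first band.
    first_band = pixels[..., 0]
    valid = ~np.isclose(first_band, -0.005) & (first_band != -50)

    # Cosine of the angle between every pixel and every library spectrum.
    dots = np.einsum("xyb,cb->xyc", pixels, library_spectra)
    pixel_norms = np.linalg.norm(pixels, axis=2)
    library_norms = np.linalg.norm(library_spectra, axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        cosines = np.clip(dots / (pixel_norms[..., np.newaxis] * library_norms), -1.0, 1.0)

    # Map angles in [0, pi] to confidences in [1, 0], as in the loop above.
    confidence = np.nan_to_num(1.0 - np.arccos(cosines) / np.pi)

    best = np.argmax(confidence, axis=2)
    score = np.max(confidence, axis=2)
    keep = valid & (score > threshold)

    # Index from one (zero is reserved for no data).
    classified = np.where(keep, best + 1, 0)
    scored = np.where(keep, score, 0.0)
    return classified, scored
```

A function shaped like this takes a block in and returns a block out, so it could serve as the unit of work for Dask or any other parallel backend.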


lewismc · 2019-09-18T22:28:09Z

Hi @aheermann can you put this on the agenda for the next meeting? I am really keen to see what your plan for this is. Also, it might be appropriate for us to split this into smaller tasks... this may end up a pretty large undertaking.

aheermann · 2019-09-18T22:45:46Z

Yep, I'll put it on the agenda. As for the size of the undertaking, our idea was to do some preliminary investigation and trials with this module to see if it could work. We also have Jonathan and Dennis investigating PyTorch for parallelizing the same code, so that we can move forward with the most appropriate module.

lewismc · 2019-09-19T00:49:11Z

Excellent

lewismc · 2019-09-27T19:08:21Z

Early branch available at https://github.com/capstone-coal/pycoal/tree/dask_trial

aheermann · 2019-10-01T01:56:30Z

Thus far, we have been working on the SAM algorithm, trying to speed up pixel classification. We have tried several ways of splitting the pixel processing into Dask delayed functions in order to parallelize it. However, on the smaller data set we are using, the overhead has so far outweighed any speedup. We are running on the f180201t01p00r05rdn_e_sc01_ort_img.hdr image, which, using the original master branch as a baseline, takes about 3 hours 25 minutes un-parallelized on my machine.
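One common way to reduce per-task overhead with `dask.delayed` is to make each task cover a block of rows rather than a single pixel. The sketch below only illustrates that idea and is not the code on the trial branch: `score_rows`, `classify_in_blocks`, and `rows_per_task` are hypothetical names, the scoring is reduced to picking the best-matching spectrum, and whether the threaded scheduler gives any speedup depends on the machine and on how much time is spent inside NumPy.

```python
import dask
import numpy as np


def score_rows(block, library_spectra):
    """Return the index of the best-matching spectrum for each pixel in a block."""
    # (rows, cols, bands) @ (bands, classes) -> (rows, cols, classes)
    dots = block @ library_spectra.T
    norms = (np.linalg.norm(block, axis=-1, keepdims=True)
             * np.linalg.norm(library_spectra, axis=1))
    # The largest cosine corresponds to the smallest spectral angle.
    return np.argmax(dots / norms, axis=-1)


def classify_in_blocks(data, library_spectra, rows_per_task=64):
    """Build one delayed task per block of rows instead of one per pixel."""
    tasks = [
        dask.delayed(score_rows)(data[start:start + rows_per_task], library_spectra)
        for start in range(0, data.shape[0], rows_per_task)
    ]
    # Run all tasks on the threaded scheduler and stitch the blocks back together.
    blocks = dask.compute(*tasks, scheduler="threads")
    return np.concatenate(blocks, axis=0)
```

With a larger `rows_per_task` there are fewer tasks and less scheduling overhead, at the cost of coarser load balancing, so the value would need to be tuned against real images.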

lewismc · 2019-10-01T13:52:44Z

@aheermann can you please hyperlink the dataset?
A few more questions:

> which has a baseline un-parallelized runtime

Do you mean the pycoal master branch? If not, then this is not much to worry about, as it is to be expected. Please provide more details. Thanks.

aheermann · 2019-10-17T22:30:31Z

Since the last update, Dask was temporarily put on hold because our personal machines were not powerful enough to take advantage of it. Now that we have access to AWS, we will pick work on Dask back up. It will be one of several options, including PyTorch (#172) and Joblib (#177), available to users when running pycoal.

lewismc · 2019-10-17T23:09:24Z

@aheermann got it.
Thinking about the abstraction layer here is an important part of engineering a good solution. Please start thinking about that. It will require you to work with others in the group.
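As a starting point for that discussion, one possible shape for such a layer is a small registry that maps a backend name to a function that applies the per-block work. This is only a sketch using the standard library: `BACKENDS`, `register_backend`, and `run_blocks` are hypothetical names, and Dask, PyTorch, or Joblib runners would be registered the same way rather than being shown here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical registry: backend name -> function that maps `func` over `blocks`.
BACKENDS = {}


def register_backend(name):
    """Decorator that records a block-mapping strategy under a name."""
    def decorator(runner):
        BACKENDS[name] = runner
        return runner
    return decorator


@register_backend("serial")
def run_serial(func, blocks):
    # Baseline: process blocks one after another.
    return [func(block) for block in blocks]


@register_backend("threads")
def run_threads(func, blocks):
    # Standard-library thread pool; results come back in input order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(func, blocks))


def run_blocks(func, blocks, backend="serial"):
    """Apply `func` to every block using the selected backend."""
    try:
        runner = BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return runner(func, blocks)
```

The classification code would then call `run_blocks(classify, blocks, backend=...)` without knowing which library does the work, which keeps the choice of backend a user-facing option.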

aheermann added enhancement mineral labels Sep 18, 2019

aheermann self-assigned this Sep 18, 2019

lewismc added this to the 0.6 milestone Sep 18, 2019

aheermann mentioned this issue Oct 17, 2019

Parallelize SAM algorithm in mineraly.py using Joblib #177

Closed

lewismc removed this from the 0.6 milestone Nov 26, 2019

lewismc unassigned aheermann Nov 26, 2019
