
Dask 2024.8.1 and later is very slow #1267

Open
tomwhite opened this issue Oct 7, 2024 · 7 comments
Labels: performance, upstream

Comments

tomwhite commented Oct 7, 2024

This was originally reported in #1247, and a temporary pin was introduced in #1248. I've opened this issue to track the problem so we can remove the pin.

tomwhite commented Oct 7, 2024

I've opened dask/dask#11416.

tomwhite commented

Unfortunately, it looks like Dask 2024.10.0 doesn't fix this; see https://github.com/sgkit-dev/sgkit/actions/runs/11551276595, which takes 19 minutes to run rather than 6 (with Dask 2024.8.0).

tomwhite commented

On further investigation, what's happening is that locally defined functions that are passed to Dask's map_blocks and that wrap Numba functions cause the Numba functions to be recompiled every time the (genomics) method is called. For example, in pbs:

sgkit/sgkit/stats/popgen.py

Lines 598 to 600 in 9dd940e

    p = da.map_blocks(
        lambda t: _pbs_cohorts(t, ct), t, chunks=shape, new_axis=3, dtype=np.float64
    )

The lambda function calls a Numba function that is recompiled each time.

In most cases it's fairly easy to rewrite the code to avoid the use of locally defined functions. For PBS we can just do:

-    p = da.map_blocks(
-        lambda t: _pbs_cohorts(t, ct), t, chunks=shape, new_axis=3, dtype=np.float64
-    )
+    p = da.map_blocks(_pbs_cohorts, t, ct, chunks=shape, new_axis=3, dtype=np.float64)
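
To make the failure mode concrete, here is a minimal standalone sketch of the slow and fast patterns (the scale kernel is hypothetical, not sgkit code; the assumption is that the fresh lambda object created on every call is serialised by value, so Numba's dispatch cache is never reused):

import dask.array as da
import numba
import numpy as np

@numba.njit
def scale(block, factor):
    return block * factor

x = da.ones((4, 4), chunks=(2, 2))

# Slow: a new lambda object per call, closing over the compiled function,
# forces the Numba kernel to be recompiled on each invocation.
y_slow = da.map_blocks(lambda b: scale(b, 2.0), x, dtype=np.float64)

# Fast: pass the Numba function and its extra argument directly, so
# map_blocks receives a stable, importable function.
y_fast = da.map_blocks(scale, x, 2.0, dtype=np.float64)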

The distance metrics code is more dynamic though, so it's not a simple fix:

sgkit/sgkit/distance/api.py

Lines 111 to 143 in 9dd940e

    try:
        map_func_name = f"{metric}_map_{device}"
        reduce_func_name = f"{metric}_reduce_{device}"
        map_func = getattr(metrics, map_func_name)
        reduce_func = getattr(metrics, reduce_func_name)
        n_map_param = metrics.N_MAP_PARAM[metric]
    except AttributeError:
        raise NotImplementedError(
            f"Given metric: '{metric}' is not implemented for '{device}'."
        )

    x = da.asarray(x)
    if x.ndim != 2:
        raise ValueError(f"2-dimensional array expected, got '{x.ndim}'")

    # setting this variable outside of _pairwise to avoid its recreation
    # in every iteration, which eventually leads to a significant increase
    # in dask graph serialisation/deserialisation time
    metric_param = np.empty(n_map_param, dtype=x.dtype)

    def _pairwise_cpu(f: ArrayLike, g: ArrayLike) -> ArrayLike:
        result: ArrayLike = map_func(f[:, None, :], g, metric_param)
        # Adding a new axis to help combine chunks along this axis in the
        # reduction step (see the _aggregate and _combine functions below).
        return result[..., np.newaxis]

    def _pairwise_gpu(f: ArrayLike, g: ArrayLike) -> ArrayLike:  # pragma: no cover
        result = map_func(f, g)
        return result[..., np.newaxis]

    pairwise_func = _pairwise_cpu
    if device == "gpu":
        pairwise_func = _pairwise_gpu  # pragma: no cover

tomwhite commented

I've fixed the non-distance functions in this commit: e83b52c

I'm not sure what to do about the distance functions at this point.

jeromekelleher commented

There are only two possible metrics right now ('euclidean' or 'correlation'), so I vote we make the code less clever and just code in the function names directly for those two cases?
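
Something like this, perhaps (untested sketch; attribute names inferred from the f"{metric}_map_{device}" pattern above):

# Hard-coded dispatch for the two supported metrics, so no getattr is needed.
if metric == "euclidean":
    map_func = metrics.euclidean_map_cpu if device == "cpu" else metrics.euclidean_map_gpu
    reduce_func = metrics.euclidean_reduce_cpu if device == "cpu" else metrics.euclidean_reduce_gpu
elif metric == "correlation":
    map_func = metrics.correlation_map_cpu if device == "cpu" else metrics.correlation_map_gpu
    reduce_func = metrics.correlation_reduce_cpu if device == "cpu" else metrics.correlation_reduce_gpu
else:
    raise NotImplementedError(
        f"Given metric: '{metric}' is not implemented for '{device}'."
    )
n_map_param = metrics.N_MAP_PARAM[metric]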

tomwhite commented Nov 4, 2024

That's what I thought too, but there is another wrinkle. In this diff:

tomwhite@e1119ca

previously metric_param was initialized outside the function to avoid inflating Dask graph serialization/deserialization time (see the comment).

I suppose we could have a map of (shared) empty arrays keyed by dtype, but that doesn't seem very thread-safe. Or we could initialize it in the function (sketched below), and leave a comment about how this previously caused a Dask slowdown. Another option would be to remove the code!
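
For the second option, a rough sketch (hypothetical; map_func and n_map_param would be passed in as arguments rather than captured, so the function can live at module level and serialise by reference):

def _pairwise_cpu(f, g, map_func, n_map_param):
    # NOTE: metric_param used to be allocated once, outside this function,
    # because capturing it in a closure inflated Dask graph
    # serialisation/deserialisation time. Allocating a small empty array
    # per call is cheap, and with no captured state Numba's dispatch cache
    # is reused across calls.
    metric_param = np.empty(n_map_param, dtype=f.dtype)
    result = map_func(f[:, None, :], g, metric_param)
    # Add a new axis to help combine chunks in the reduction step.
    return result[..., np.newaxis]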

jeromekelleher commented

Ah, I see. I'm reluctant to remove the code as we put quite a lot of effort into it, and it's our main usage of GPUs...

Perhaps @aktech would like to comment here? Is there an easy way to avoid using lambdas?
