Error: "'ProxifyHostFile' object has no attribute 'fast'" upon JIT unspill #704

Closed
lmmx opened this issue Aug 17, 2021 · 9 comments

Comments

@lmmx

lmmx commented Aug 17, 2021

(Reposted from dask/distributed bug tracker after mistakenly submitting it there)

I'm trying to match the speed cuDF achieves on individual files when processing a list of 10 files, but cannot do so. When I tried to run them in parallel using multiprocessing, I got a CUDA OOM error and was directed to use dask-cudf, which uses 'memory spilling'; I've since read more about how that works.

Unfortunately, I get the following error:

AttributeError("'ProxifyHostFile' object has no attribute 'fast'")

and the only troubleshooting guidance I can find is to "use unproxy()", which I don't think I can do here (as the failing call is internal to the dask/distributed code).

tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f647e9fc130>>, <Task finished name='Task-387' coro=<Worker.memory_monitor() done, defined at /home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/worker.py:3043> exception=AttributeError("'ProxifyHostFile' object has no attribute 'fast'")>)

What happened:

The code succeeds with, say, 3 threads per worker (the number of threads per worker essentially sets how many CSVs are read in parallel) and is faster than with 1 thread per worker (completely serial). With a higher number, e.g. 15 threads per worker, JIT spilling is supposed to take place, but the error occurs and the spilling does not happen correctly.

What you expected to happen:

The CUDA OOM error is supposed to be avoided due to JIT spilling. When JIT spilling fails here I get the CUDA OOM error I was trying to avoid.

Minimal Complete Verifiable Example:

The source files here are the 10 TSVs comprising the WIT dataset, available here

import dask.dataframe as dd
from dask_cuda import LocalCUDACluster
import dask_cudf
from dask.distributed import Client
from pathlib import Path
import time

if __name__ == "__main__":
    cluster = LocalCUDACluster(threads_per_worker=15, jit_unspill=True) # runs on 1 local GPU
    client = Client(cluster)

    store_p = Path.home() / "dev/wikitransp/src/wikitransp/data/store/"

    #input_tsv_list = [store_p / "wit_v1.train.all-1percent_sample.tsv"]
    input_tsvs = [
        store_p / f"wit_v1.train.all-0000{i}-of-00010.tsv.gz"
        for i in range(10)
    ]
    #input_tsvs = store_p / f"wit_v1.train.all-0000*-of-00010.tsv.gz"

    print(f"Data files: {[i.name for i in input_tsvs]}")

    def read_tsv(tsv_path):
        fields = ["mime_type"]
        df = dask_cudf.read_csv(tsv_path, sep="\t", usecols=fields, blocksize=None, chunksize=None)
        return df

    t0 = time.time()
    df = read_tsv(input_tsvs)
    pngs = df[df.mime_type == "image/png"]
    print(pngs.compute(scheduler=client.get))
    print(list(pngs))
    t1 = time.time()
    print(f"dask-cudf took: {t1-t0}s")

Data files: ['wit_v1.train.all-00000-of-00010.tsv.gz', 'wit_v1.train.all-00001-of-00010.tsv.gz', 'wit_v1.train.all-00002-of-00010.tsv.gz', 'wit_v1.train.all-00003-of-00010.tsv.gz', 'wit_v1.train.all-00004-of-00010.tsv.gz', 'wit_v1.train.all-00005-of-00010.tsv.gz', 'wit_v1.train.all-00006-of-00010.tsv.gz', 'wit_v1.train.all-00007-of-00010.tsv.gz', 'wit_v1.train.all-00008-of-00010.tsv.gz', 'wit_v1.train.all-00009-of-00010.tsv.gz']
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f647e9fc130>>, <Task finished name='Task-387' coro=<Worker.memory_monitor() done, defined at /home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/worker.py:3043> exception=AttributeError("'ProxifyHostFile' object has no attribute 'fast'")>)         
Traceback (most recent call last):                                                                                                                                                                                                                                                                                                                                                                                                                        
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/tornado/ioloop.py", line 741, in _run_callback                                                                                                                                                                                                                                                                                                                                  
    ret = callback()                                                                                                                                                                                                         
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()                                                                                                                                                                                                          
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/worker.py", line 3101, in memory_monitor
    if not self.data.fast:                                                                                   
AttributeError: 'ProxifyHostFile' object has no attribute 'fast'
distributed.worker - WARNING - Compute Failed                                                               
Function:  read_csv
args:      (PosixPath('/home/louis/dev/wikitransp/src/wikitransp/data/store/wit_v1.train.all-00005-of-00010.tsv.gz'))
kwargs:    {'sep': '\t', 'usecols': ['mime_type'], 'blocksize': None}
Exception: MemoryError('std::bad_alloc: CUDA error at: /home/louis/miniconda3/envs/cudf_test/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory')

Anything else we need to know?:

Environment:

conda list | grep dask

dask                      2021.7.0           pyhd3eb1b0_0  
dask-core                 2021.7.0           pyhd3eb1b0_0  
dask-cuda                 21.08.00                 py38_0    rapidsai
dask-cudf                 21.08.02        py38_gf6d31fa95d_0    rapidsai
  • Dask version: 2021.7.0
  • Python version: 3.8.10
  • Operating System: Linux
  • Install method (conda, pip, source): conda (dask: main conda channel; dask-cuda: rapidsai channel)
    • I originally installed dask from conda-forge and then installing dask-cudf from rapidsai superseded the conda-forge dask so it then came from the main conda channel.
@madsbk
Member

madsbk commented Aug 18, 2021

Thanks for the bug report @lmmx, much appreciated.

I think this is because of two issues.

  1. JIT-unspill was missing the fast attribute, which #705 ("JIT-unspill: warn when spill to disk triggers") should fix.
  2. Your code is triggering Dask's out-of-CPU-memory handling.

JIT-unspill doesn't support spill-to-disk at the moment but it is on our to-do list: #657.
However, I am surprised that your example would run out of CPU memory. Can I get you to try #705 and report what happens?
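
To make the first point concrete, here is a minimal sketch of the failure mode. It is not the dask-cuda or distributed source: the traceback only shows that the worker's memory monitor evaluates `if not self.data.fast:`, and `HostFileWithoutFast`, `HostFileWithFast`, `memory_monitor_step` and `try_step` are hypothetical names used purely for illustration.

```python
class HostFileWithoutFast(dict):
    """Hypothetical stand-in for a worker data store that lacks a `fast` attribute."""


class HostFileWithFast(dict):
    """Hypothetical stand-in that exposes the `fast` mapping the monitor reads."""

    def __init__(self):
        super().__init__()
        self.fast = {}  # in-memory tier that a spilling monitor would inspect


def memory_monitor_step(data):
    """Mimic the check seen in the traceback: `if not self.data.fast:`."""
    if not data.fast:
        return "nothing to spill"
    return "would spill"


def try_step(data):
    """Run one monitor step and report an AttributeError instead of raising it."""
    try:
        return memory_monitor_step(data)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(try_step(HostFileWithFast()))
print(try_step(HostFileWithoutFast()))
```

Any object handed to the monitor as the worker's data store has to provide whatever attributes the monitor reads, which is why the error only surfaces once the CPU memory path is triggered.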

@lmmx
Author

lmmx commented Aug 18, 2021

When I use the warn_spill_to_disk branch the warning indicates "a memory leak or the memory may not be released to the OS" and directs me to the docs on memtrim (but the docs do mention that this may be a false alarm).
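
For reference, the memtrim section of the Dask docs describes asking glibc to hand freed memory back to the OS. A minimal sketch of that idea follows, assuming Linux with glibc; the `None` fallback for other platforms is an addition for illustration and is not part of the documented recipe.

```python
import ctypes


def trim_memory():
    """Ask glibc to release free heap memory to the OS.

    Returns malloc_trim's result (1 if memory was released, 0 otherwise),
    or None when glibc's malloc_trim is unavailable (assumed fallback).
    """
    try:
        libc = ctypes.CDLL("libc.so.6")
        return libc.malloc_trim(0)
    except (OSError, AttributeError):
        # Not glibc (e.g. macOS, Windows, musl): nothing to trim this way.
        return None


print(trim_memory())
```

The Dask documentation describes running such a function on every worker with `client.run(trim_memory)`. Whether it would change the numbers reported here is untested.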

The load is high (monitored via htop). The dataset is only 25GB in total: each gzipped TSV is 2.5GB compressed and ~6.5GB uncompressed (2.6x larger), so the entire dataset is likely ~65GB uncompressed. Since blocksize is None, the files are not partitioned when spread among the workers. The same result is seen with 10 threads per worker. It doesn't seem right to me either that more CPU memory is being used than the uncompressed dataset takes up on disk...
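
The size estimate can be written out as a quick back-of-the-envelope calculation. The per-file figures are the approximate ones quoted in this comment, and the peak value is the highest "Unmanaged memory" figure in the log in this comment; the variable names are only illustrative.

```python
n_files = 10
compressed_gb_per_file = 2.5      # approximate size of each .tsv.gz
uncompressed_gb_per_file = 6.5    # approximate size once decompressed

total_compressed_gb = n_files * compressed_gb_per_file      # 25.0
total_uncompressed_gb = n_files * uncompressed_gb_per_file  # 65.0
expansion_ratio = uncompressed_gb_per_file / compressed_gb_per_file  # about 2.6

# Highest "Unmanaged memory" value reported by the worker, in GiB.
peak_unmanaged_gib = 108.39
peak_unmanaged_gb = peak_unmanaged_gib * 2**30 / 1e9  # convert GiB to GB

print(f"compressed total:   {total_compressed_gb:.1f} GB")
print(f"uncompressed total: {total_uncompressed_gb:.1f} GB")
print(f"expansion ratio:    {expansion_ratio:.1f}x")
print(f"peak unmanaged:     {peak_unmanaged_gb:.1f} GB")
```

Even allowing for the GB versus GiB difference, the reported peak is well above the estimated uncompressed size of the whole dataset.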

cd dev
git clone -b warn_spill_to_disk git@github.com:madsbk/dask-cuda.git
pip install -e dask-cuda
cd -
python dask_test_cudf_multiple_cluster.py

distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Data files: ['wit_v1.train.all-00000-of-00010.tsv.gz', 'wit_v1.train.all-00001-of-00010.tsv.gz', 'wit_v1.train.all-00002-of-00010.tsv.gz', 'wit_v1.train.all-00003-of-00010.tsv.gz', 'wit_v1.train.all-00004-of-00010.tsv.gz', 'wit_v1.train.all-00005-of-00010.tsv.gz', 'wit_v1.train.all-00006-of-00010.tsv.gz', 'wit_v1.train.all-00007-of-00010.tsv.gz', 'wit_v1.train.all-00008-of-00010.tsv.gz', 'wit_v1.train.all-00009-of-00010.tsv.gz']
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 88.84 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 91.40 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 93.89 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 96.15 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 98.41 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - Worker is at 80% memory usage. Pausing worker.  Process memory: 100.71 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 100.71 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 102.69 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 104.76 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 106.78 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 108.39 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 106.41 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - Worker is at 78% memory usage. Resuming worker. Process memory: 98.83 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 98.83 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 91.01 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - Compute Failed
Function:  read_csv
args:      (PosixPath('/home/louis/dev/wikitransp/src/wikitransp/data/store/wit_v1.train.all-00001-of-00010.tsv.gz'))
kwargs:    {'sep': '\t', 'usecols': ['mime_type'], 'blocksize': None}
Exception: MemoryError('std::bad_alloc: CUDA error at: /home/louis/miniconda3/envs/cudf_test/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory')

distributed.worker - WARNING - Compute Failed
Function:  read_csv
args:      (PosixPath('/home/louis/dev/wikitransp/src/wikitransp/data/store/wit_v1.train.all-00005-of-00010.tsv.gz'))
kwargs:    {'sep': '\t', 'usecols': ['mime_type'], 'blocksize': None}
Exception: MemoryError('std::bad_alloc: CUDA error at: /home/louis/miniconda3/envs/cudf_test/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory')

distributed.worker - WARNING - Compute Failed
Function:  read_csv
args:      (PosixPath('/home/louis/dev/wikitransp/src/wikitransp/data/store/wit_v1.train.all-00004-of-00010.tsv.gz'))
kwargs:    {'sep': '\t', 'usecols': ['mime_type'], 'blocksize': None}
Exception: MemoryError('std::bad_alloc: CUDA error at: /home/louis/miniconda3/envs/cudf_test/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory')

distributed.worker - WARNING - Compute Failed
Function:  read_csv
args:      (PosixPath('/home/louis/dev/wikitransp/src/wikitransp/data/store/wit_v1.train.all-00003-of-00010.tsv.gz'))
kwargs:    {'sep': '\t', 'usecols': ['mime_type'], 'blocksize': None}
Exception: MemoryError('std::bad_alloc: CUDA error at: /home/louis/miniconda3/envs/cudf_test/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory')

Traceback (most recent call last):
  File "dask_test_cudf_multiple_cluster.py", line 35, in <module>
    print(pngs.compute(scheduler=client.get))
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/dask/base.py", line 286, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/dask/base.py", line 568, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/client.py", line 2704, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/client.py", line 2018, in gather
    return self.sync(
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/client.py", line 859, in sync
    return sync(
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/utils.py", line 326, in sync
    raise exc.with_traceback(tb)
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/utils.py", line 309, in f
    result[0] = yield future
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/client.py", line 1883, in _gather
    raise exception.with_traceback(traceback)
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/cudf/io/csv.py", line 70, in read_csv
    return libcudf.csv.read_csv(
  File "cudf/_lib/csv.pyx", line 393, in cudf._lib.csv.read_csv
MemoryError: std::bad_alloc: CUDA error at: /home/louis/miniconda3/envs/cudf_test/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory

The htop profile looks like this (memory is the flatter line):

 Device 0 [GeForce RTX 3090] PCIe GEN 2@16x RX: 86.91 MiB/s TX: 4.883 MiB/s
 GPU 435MHz  MEM 810MHz  TEMP  44°C FAN  39% POW  41 / 350 W
 GPU[||||||                       19%] MEM[               0.444Gi/23.696Gi]
   ┌────────────────────────────────────────────────────────────────────────────────────┐
100│                                                                               GPU 0│
   │                                              ┌─┐                                MEM│
   │                                           ┌──┼─┼──┐                                │
   │                                          ┌┼┐ │ │  │                                │
75%│                                          │││ │ │  │                                │
   │                                          │││ │ │  │                                │
   │                                          │││ │ │  │                                │
   │                                          │││ │ │  │                                │
50%│                                          ││└─┘ │  │                                │
   │                                          ││    │ ┌┼┐                               │
   │                                          ││    │ │││             ┌─┐         ┌─┐   │
   │                                          ││    └─┘└┼┐            │ │ ┌───┐   │ │   │
25%│                            ┌─────────────┘│        ││          ┌─┘ └─┘   └───┘ └─┐ │
   │                        ┌───┘              │        ││      ┌───┘                 └─│
   │                      ┌─┘                  │        ││    ┌─┘                       │
   │──────────────────────┴────────────────────┘        └┴───┬┘                         │
 0%│                                                         └──────────────────────────│
   └────────────────────────────────────────────────────────────────────────────────────┘

Edit: I've run that again after changing the console logger format (at L172) in dask-cuda/dask_cuda/proxify_host_file.py so it shows times (it doesn't all take place immediately):

Logs with timings:

        self.logger = logging.getLogger("distributed.worker")
        log_format = logging.Formatter('[%(asctime)s] [%(levelname)s] - %(message)s')
        console = logging.StreamHandler()
        console.setFormatter(log_format)
        self.logger.addHandler(console)
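
A likely reason each message appears twice in the logs that follow (once with a timestamp, once in the original format) is that the added handler emits the record and the record then also propagates to the handler that was already configured. The following self-contained sketch reproduces that behaviour with the standard library only; the logger names `demo` and `demo.worker` are placeholders, not the real distributed loggers.

```python
import io
import logging


def emit_with_extra_handler():
    """Log one warning through a pre-existing handler and an added timed handler."""
    existing_stream = io.StringIO()
    added_stream = io.StringIO()

    # Stand-in for the logging configuration that is already in place.
    parent = logging.getLogger("demo")
    parent.setLevel(logging.WARNING)
    parent.propagate = False  # keep the demo isolated from the root logger
    base_handler = logging.StreamHandler(existing_stream)
    base_handler.setFormatter(
        logging.Formatter("%(name)s - %(levelname)s - %(message)s")
    )
    parent.addHandler(base_handler)

    # Extra handler with timestamps, as in the snippet above.
    worker_logger = logging.getLogger("demo.worker")
    timed_handler = logging.StreamHandler(added_stream)
    timed_handler.setFormatter(
        logging.Formatter("[%(asctime)s] [%(levelname)s] - %(message)s")
    )
    worker_logger.addHandler(timed_handler)

    worker_logger.warning("example message")

    # Clean up so repeated calls do not stack handlers.
    parent.removeHandler(base_handler)
    worker_logger.removeHandler(timed_handler)
    return existing_stream.getvalue(), added_stream.getvalue()


plain, timed = emit_with_extra_handler()
print(plain, end="")
print(timed, end="")
```

Setting `propagate = False` on the child logger, or replacing the existing formatter instead of adding a handler, would give a single timestamped line per message.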

distributed.preloading - INFO - Import preload module: dask_cuda.initialize                                                                                                                                                                    
[2021-08-18 13:57:36,471] [INFO] -       Start worker at:      tcp://127.0.0.1:35633                                                                                                                                                                                                                                                                                                                                                                                                          
[2021-08-18 13:57:36,471] [INFO] -          Listening to:      tcp://127.0.0.1:35633                                                                                                                                                                                                                                                                                                                                                                                                          
[2021-08-18 13:57:36,471] [INFO] -          dashboard at:            127.0.0.1:41923                                                                                                                                                           
[2021-08-18 13:57:36,471] [INFO] - Waiting to connect to:      tcp://127.0.0.1:45067                                                                                                                                                           
[2021-08-18 13:57:36,471] [INFO] - -------------------------------------------------                                                                                                                                                           
[2021-08-18 13:57:36,472] [INFO] -               Threads:                         15                                                                                                                                                           
[2021-08-18 13:57:36,472] [INFO] -                Memory:                 125.78 GiB                                                                                                                                                                                                                                                                                                                                                                                                          
[2021-08-18 13:57:36,472] [INFO] -       Local Directory: /home/louis/dev/testing/wikitransp/dask-worker-space/worker-26ok1e6b                                                                                                                                                                                                                                                                                                                                                                
[2021-08-18 13:57:36,472] [INFO] - Starting Worker plugin RMMSetup-65b8d033-3f10-46df-84ac-2693e65e5999                                                                                                                                        
[2021-08-18 13:57:36,472] [INFO] - Starting Worker plugin CPUAffinity-10e3d01b-ed3e-4343-b415-905e899cd35b                                                                                                                                                                                                                                                                                                                                                                                                                                                        
[2021-08-18 13:57:36,472] [INFO] - -------------------------------------------------                                                                                                                                                                                                                                                                                                                                                                                                          
[2021-08-18 13:57:36,476] [INFO] -         Registered to:      tcp://127.0.0.1:45067                                                                                                                                                                                                                                                                                                                                                                                                          
[2021-08-18 13:57:36,476] [INFO] - -------------------------------------------------                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
Data files: ['wit_v1.train.all-00000-of-00010.tsv.gz', 'wit_v1.train.all-00001-of-00010.tsv.gz', 'wit_v1.train.all-00002-of-00010.tsv.gz', 'wit_v1.train.all-00003-of-00010.tsv.gz', 'wit_v1.train.all-00004-of-00010.tsv.gz', 'wit_v1.train.all-00005-of-00010.tsv.gz', 'wit_v1.train.all-00006-of-00010.tsv.gz', 'wit_v1.train.all-00007-of-00010.tsv.gz', 'wit_v1.train.all-00008-of-00010.tsv.gz', 'wit_v1.train.all-00009-of-00010.tsv.gz']                                                                                                                  
[2021-08-18 13:58:17,679] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                           
[2021-08-18 13:58:17,679] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 88.53 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                           
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 88.53 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                  
[2021-08-18 13:58:17,879] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                           
[2021-08-18 13:58:17,879] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 91.09 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                           
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 91.09 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                  
[2021-08-18 13:58:18,078] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                           
[2021-08-18 13:58:18,079] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 93.64 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                           
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 93.64 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                  
[2021-08-18 13:58:18,278] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                           
[2021-08-18 13:58:18,278] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 96.20 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                           
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 96.20 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                  
[2021-08-18 13:58:18,480] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                           
[2021-08-18 13:58:18,481] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 98.80 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 98.80 GiB -- Worker memory limit: 125.78 GiB
[2021-08-18 13:58:18,894] [WARNING] - Worker is at 80% memory usage. Pausing worker.  Process memory: 101.34 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                                                                                           
distributed.worker - WARNING - Worker is at 80% memory usage. Pausing worker.  Process memory: 101.34 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                                                                                                  
[2021-08-18 13:58:18,894] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
[2021-08-18 13:58:18,895] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 101.34 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 101.34 GiB -- Worker memory limit: 125.78 GiB
[2021-08-18 13:58:18,896] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>        
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>        
[2021-08-18 13:58:18,896] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 104.03 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                          
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 104.03 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                 
[2021-08-18 13:58:19,078] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                           
[2021-08-18 13:58:19,078] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 106.36 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                          
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 106.36 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                 
[2021-08-18 13:58:19,278] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                           
[2021-08-18 13:58:19,278] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 108.67 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                          
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 108.67 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                 
[2021-08-18 13:58:19,481] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                           
[2021-08-18 13:58:19,482] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 109.82 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                          
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 109.82 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                 
[2021-08-18 13:58:19,679] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                           
[2021-08-18 13:58:19,679] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 111.27 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                          
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 111.27 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                 
[2021-08-18 13:58:19,878] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                           
[2021-08-18 13:58:19,878] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 111.81 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                          
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 111.81 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                 
[2021-08-18 13:58:20,078] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                           
[2021-08-18 13:58:20,078] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 111.07 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                          
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 111.07 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                             
[2021-08-18 13:58:20,279] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                           
[2021-08-18 13:58:20,279] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 104.97 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 104.97 GiB -- Worker memory limit: 125.78 GiB
[2021-08-18 13:58:20,571] [WARNING] - Worker is at 74% memory usage. Resuming worker. Process memory: 94.17 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                                                                                            
distributed.worker - WARNING - Worker is at 74% memory usage. Resuming worker. Process memory: 94.17 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                                                                                                   
[2021-08-18 13:58:20,571] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
[2021-08-18 13:58:20,571] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 94.17 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 94.17 GiB -- Worker memory limit: 125.78 GiB
[2021-08-18 13:58:20,678] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>
[2021-08-18 13:58:20,678] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 89.88 GiB -- Worker memory limit: 125.78 GiB
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 89.88 GiB -- Worker memory limit: 125.78 GiB
[2021-08-18 13:58:25,678] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                       
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                              
[2021-08-18 13:58:25,678] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 88.10 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                           
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 88.10 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                  
[2021-08-18 13:58:25,878] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                               
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                              
[2021-08-18 13:58:25,878] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 88.36 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                           
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 88.36 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                  
[2021-08-18 13:58:26,078] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                       
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                              
[2021-08-18 13:58:26,078] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 88.63 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                           
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 88.63 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                  
[2021-08-18 13:58:26,278] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                       
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                              
[2021-08-18 13:58:26,278] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 88.91 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                           
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 88.91 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                  
[2021-08-18 13:58:26,479] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                       
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                              
[2021-08-18 13:58:26,479] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 89.17 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                           
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 89.17 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                  
[2021-08-18 13:58:26,679] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                       
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
[2021-08-18 13:58:26,679] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 89.43 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                           
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 89.43 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                  
[2021-08-18 13:58:26,878] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                       
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                              
[2021-08-18 13:58:26,878] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 89.71 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                           
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 89.71 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                  
[2021-08-18 13:58:27,077] [WARNING] - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                    
distributed.worker - WARNING - JIT-Unspill doesn't support spilling to Disk, see <https://github.com/rapidsai/dask-cuda/issues/657>                                                                                                                                                                                                                                                                                                                                                                                                                               
[2021-08-18 13:58:27,078] [WARNING] - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 89.96 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                                                                                                      
distributed.worker - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker.html#memtrim for more information. -- Unmanaged memory: 89.96 GiB -- Worker memory limit: 125.78 GiB                                                                                                                                                                                                                                                                                                                                                             
[2021-08-18 13:58:27,711] [WARNING] - Compute Failed                                                                                                                                                                                                                             
Function:  read_csv                                                                                                                                            
args:      (PosixPath('/home/louis/dev/wikitransp/src/wikitransp/data/store/wit_v1.train.all-00005-of-00010.tsv.gz'))                   
kwargs:    {'sep': '\t', 'usecols': ['mime_type'], 'blocksize': None}                                                                   
Exception: MemoryError('std::bad_alloc: CUDA error at: /home/louis/miniconda3/envs/cudf_test/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory')                                                                                                                                     
                                                                                                                                        
distributed.worker - WARNING - Compute Failed                                                                                                                                                                                                                                    
Function:  read_csv                                                                                                                                            
args:      (PosixPath('/home/louis/dev/wikitransp/src/wikitransp/data/store/wit_v1.train.all-00005-of-00010.tsv.gz'))                   
kwargs:    {'sep': '\t', 'usecols': ['mime_type'], 'blocksize': None}                                                                   
Exception: MemoryError('std::bad_alloc: CUDA error at: /home/louis/miniconda3/envs/cudf_test/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory')                                                                                                                                     
                                                                                                                                        
Traceback (most recent call last):                                                                                                      
  File "dask_test_cudf_multiple_cluster.py", line 35, in <module>                                                                       
    print(pngs.compute(scheduler=client.get))                                                                                           
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/dask/base.py", line 286, in compute                           
    (result,) = compute(self, traverse=False, **kwargs)                                                                                 
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/dask/base.py", line 568, in compute                           
    results = schedule(dsk, keys, **kwargs)                                                                                             
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/client.py", line 2704, in get                     
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)                                                             
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/client.py", line 2018, in gather                  
    return self.sync(                                                                                                                   
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/client.py", line 859, in sync                     
    return sync(                                                                                                                        
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/utils.py", line 326, in sync                      
    raise exc.with_traceback(tb)                                                                                                        
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/utils.py", line 309, in f                                                                                                                                                                  
    result[0] = yield future                                                                                                            
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/tornado/gen.py", line 762, in run                             
    value = future.result()                                                                                                             
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/distributed/client.py", line 1883, in _gather                                                                                                                                                                                                       
    raise exception.with_traceback(traceback)                                                                                           
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/contextlib.py", line 75, in inner                                           
    return func(*args, **kwds)                                                                                                                                                                                                                                                   
  File "/home/louis/miniconda3/envs/cudf_test/lib/python3.8/site-packages/cudf/io/csv.py", line 70, in read_csv                         
    return libcudf.csv.read_csv(                                                                                                                               
  File "cudf/_lib/csv.pyx", line 393, in cudf._lib.csv.read_csv                                                                                                
MemoryError: std::bad_alloc: CUDA error at: /home/louis/miniconda3/envs/cudf_test/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory                                                                                                                                                  
[2021-08-18 13:58:27,768] [INFO] - Stopping worker at tcp://127.0.0.1:35633 

@madsbk
Member

madsbk commented Aug 18, 2021

I think it is because of threads_per_worker=15; try setting that to threads_per_worker=1 (or leave out the argument completely).

The following works for me without any spilling (15GB CPU and 8GB GPU peak memory usage):

from dask.sizeof import sizeof
import dask.dataframe as dd
from distributed.client import wait
from dask_cuda import LocalCUDACluster
import dask_cudf
from dask.distributed import Client
from pathlib import Path
import time

if __name__ == "__main__":
    cluster = LocalCUDACluster(threads_per_worker=1, n_workers=1, jit_unspill=True) # runs on 1 local GPU
    client = Client(cluster)

    store_p = Path.home() / "repos" / "dask-cuda" / "csv_test"

    #input_tsv_list = [store_p / "wit_v1.train.all-1percent_sample.tsv"]
    input_tsvs = [
        store_p / f"wit_v1.train.all-0000{i}-of-00010.tsv.gz"
        for i in range(10)
    ]
    #input_tsvs = store_p / f"wit_v1.train.all-0000*-of-00010.tsv.gz"

    print(f"Data files: {[i.name for i in input_tsvs]}")

    def read_tsv(tsv_path):
        fields = ["mime_type"]
        df = dask_cudf.read_csv(tsv_path, sep="\t", usecols=fields, blocksize=None, chunksize=None)
        return df

    t0 = time.time()
    df = read_tsv(input_tsvs)
    pngs = df[df.mime_type == "image/png"]
    pngs = pngs.persist()
    wait(pngs)
    print(f"dask-cudf persist took: {time.time()-t0}s")

    pngs = pngs.compute()
    print("sizeof(pngs): ", sizeof(pngs))
    print(f"dask-cudf took: {time.time()-t0}s")

@lmmx
Author

lmmx commented Aug 18, 2021

Ah yes I was already aware of that, but my point was to parallelise the work being done here (to achieve the speedup obtained by cudf.read_csv over pandas.read_csv, roughly 24 seconds vs. 50 seconds). Using threads_per_worker=1 is completely serial, and gives roughly 10 x 24 seconds = 240 seconds run-time. With a very simple 'scheduling' of 1.5 second staggered multiprocessing calls to cudf.read_csv I was able to delay the memory spikes, but these delays added up, and I was told to look into dask-cuda instead.
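As a rough illustration of the arithmetic above, the sketch below compares a fully serial run with an idealised staggered launch. The function names are hypothetical, and the staggered figure assumes the reads overlap perfectly without slowing each other down, which is an assumption rather than something measured here; GPU memory contention is exactly what prevents this ideal in practice.

```python
def serial_runtime(n_files: int, seconds_per_file: float) -> float:
    """Total time when the files are read strictly one after another."""
    return n_files * seconds_per_file


def staggered_runtime(n_files: int, seconds_per_file: float, stagger: float) -> float:
    """Idealised total time when each read starts `stagger` seconds after the
    previous one and all reads overlap without slowing each other down
    (an assumption, not a measurement)."""
    if n_files == 0:
        return 0.0
    # The last read starts after (n_files - 1) delays and then runs to completion.
    return (n_files - 1) * stagger + seconds_per_file


if __name__ == "__main__":
    print(serial_runtime(10, 24))          # 10 files at roughly 24 s each
    print(staggered_runtime(10, 24, 1.5))  # 1.5 s delay between launches
```

The gap between the two numbers is the best case that parallel reads could recover; any real run sits somewhere between them.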

This issue originated when I was trying to use simple multiprocessing with cudf.read_csv (rapidsai/cudf#9042) and I was directed to raise the issue here instead, as (in hindsight, obviously) cuDF itself does not handle CUDA out-of-memory errors. The 'spilling' feature is designed to handle such an error, but I guess it's not ready to use yet (so there is currently no solution).

(I now notice I didn't link back to that as the source of this issue, apologies)

Thanks for the wait/persist pattern example, I knew I was missing something when trying to use that...

I guess I'll revisit to try again when JIT unspilling is finished up (is there an expected release date?). Thanks for your help -- if you want to try anything else let me know. Will leave this for you to close.

@pentschev
Member

Have you tried dask_cudf.read_csv? I think that's supposed to work similarly to what you're trying to achieve.

It's worth noting that threads_per_worker=1 is intentional in Dask-CUDA: all GPU work today runs only on the default CUDA stream, so there is usually no speedup from increasing the number of threads per worker. There's some work on exploring PTDS (per-thread default stream) in #517, as @quasiben already mentioned in another issue, but that's not complete and is blocked by cuDF refactoring. One of the challenges is also handling GPU memory consumption when there are multiple threads on the same GPU, which is what you're experiencing now. Of course you're welcome to explore, but honestly I don't think you'll be able to get much more performance with multiple threads until the PTDS work is complete.

@madsbk
Member

madsbk commented Aug 18, 2021

There is no expected release date for spilling to disk, but I don't think it would work in any case. Dask and Dask-CUDA are not able to spill data used by running tasks. Only data that is being staged on workers can be spilled.

I did a talk at the Dask Summit 2021 that explains the inner workings of spilling, which you might find interesting: https://www.youtube.com/watch?v=mHWk7y2p-NM
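To make the distinction between staged and running data concrete, here is a toy model in plain Python. It is not Dask-CUDA's implementation: the function name, the 16 GiB capacity, and the per-task sizes are all hypothetical numbers chosen for illustration. It only encodes the rule stated above, that staged data can be moved off the device while memory held by running tasks cannot.

```python
from typing import Sequence


def free_device_memory(capacity_gib: float,
                       running_gib: Sequence[float],
                       staged_gib: Sequence[float],
                       spill_staged: bool) -> float:
    """Device memory (GiB) left for new allocations in this toy model.

    Memory held by running tasks is never reclaimed; staged buffers are
    counted as free only when `spill_staged` is True.
    """
    staged_in_use = 0.0 if spill_staged else sum(staged_gib)
    return capacity_gib - sum(running_gib) - staged_in_use


if __name__ == "__main__":
    capacity = 16.0  # hypothetical GPU size

    # One running task plus staged results: spilling the staged data helps.
    print(free_device_memory(capacity, [8.0], [4.0, 4.0], spill_staged=False))
    print(free_device_memory(capacity, [8.0], [4.0, 4.0], spill_staged=True))

    # Three tasks running at once: the result is negative even with spilling,
    # i.e. the running tasks alone exceed the device.
    print(free_device_memory(capacity, [8.0, 8.0, 8.0], [4.0], spill_staged=True))
```

In this model, raising the number of concurrently running tasks grows the term that spilling cannot touch, which is consistent with the out-of-memory error appearing only at higher thread counts.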

@lmmx
Author

lmmx commented Aug 18, 2021

> Have you tried dask_cudf.read_csv? I think that's supposed to work similarly to what you're trying to achieve.

The code is using dask_cudf.read_csv! :-)

Yep I get that it's all WIP, appreciate it.

@lmmx
Author

lmmx commented Aug 18, 2021

> There is no expected release date for spilling to disk, but I don't think it would work in any case. Dask and Dask-CUDA are not able to spill data used by running tasks. Only data that is being staged on workers can be spilled.
>
> I did a talk at the Dask Summit 2021 that explains the inner workings of spilling, which you might find interesting: https://www.youtube.com/watch?v=mHWk7y2p-NM

That's fair enough. I'm going to focus on improving the ability to partition this [in dask], which may help split up the work staging as a result.

I came across it yesterday 😄 As far as I could understand at the time, it was very neat 😎 Cheers all

@madsbk
Member

madsbk commented Aug 18, 2021

Thanks @lmmx

@madsbk madsbk closed this as completed Aug 18, 2021