
Cubed example using Coiled on AWS #271

Open · rsignell opened this issue Jan 31, 2024 · 14 comments

@rsignell commented Jan 31, 2024

I tried to run a Cubed demo using Coiled on AWS following these instructions, and the environment creation went smoothly:

mamba create -n cubed-coiled-examples python=3.9 pip -y
conda activate cubed-coiled-examples
pip install 'cubed[coiled]'

but when I tried running the example:

git clone [email protected]:tomwhite/cubed.git
cd cubed/examples/coiled/aws
export AWS_PROFILE=esip-qhub
python coiled-add-asarray.py "s3://esip-qhub/rsignell/testing/cubed"

I got:

Traceback (most recent call last):
  File "/home/rsignell/repos/cubed/examples/coiled/aws/coiled-add-asarray.py", line 22, in <module>
    res = c.compute(
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/cubed/core/array.py", line 136, in compute
    result = compute(
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/cubed/core/array.py", line 225, in compute
    plan.execute(
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/cubed/core/plan.py", line 171, in execute
    executor.execute_dag(
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/cubed/runtime/executors/coiled.py", line 31, in execute_dag
    coiled_function = make_coiled_function(pipeline.function, coiled_kwargs)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/cubed/runtime/executors/coiled.py", line 13, in make_coiled_function
    return coiled.function(**coiled_kwargs)(execution_stats(func))
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/coiled/function.py", line 404, in decorator
    Function(func, cluster_kwargs, keepalive=keepalive, environ=environ, local=local, name=name)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/coiled/function.py", line 49, in __init__
    self._code = inspect.getsource(function)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/inspect.py", line 1024, in getsource
    lines, lnum = getsourcelines(object)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/inspect.py", line 1006, in getsourcelines
    lines, lnum = findsource(object)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/inspect.py", line 817, in findsource
    file = getsourcefile(object)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/inspect.py", line 697, in getsourcefile
    filename = getfile(object)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/inspect.py", line 677, in getfile
    raise TypeError('module, class, method, function, traceback, frame, or '
TypeError: module, class, method, function, traceback, frame, or code object was expected, got partial

I'm raising the issue here as I'm guessing something on the Coiled side changed?
But of course I don't really know so I'm prepared to be told to post it over at cubed... :)

@jrbourbeau (Member) commented

Thanks @rsignell. Looking at the traceback, I think this is probably on our end. I'll take a closer look.

@jrbourbeau (Member) commented

We have some code collection logic that doesn't handle functools.partial objects well. I'll push up a fix for that.

In the meantime, I'm not sure where the partial is coming from inside cubed, but unpacking the original function by getting the .func attribute on the partial-wrapped function could be a temporary workaround.
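
For anyone following along, here's a minimal sketch of the failure mode and the `.func` workaround (the `add` function here is hypothetical, not from cubed):

```python
import functools
import inspect

def add(x, y):
    return x + y

wrapped = functools.partial(add, y=1)

# inspect.getsource chokes on a partial object, producing the
# TypeError in the traceback above:
#   inspect.getsource(wrapped)  # TypeError: ... got partial

# ...but the underlying function is reachable via .func:
print(inspect.getsource(wrapped.func))  # prints the source of `add`
```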

@rsignell (Author) commented

> Unpacking the original function by getting the .func attribute on the partial-wrapped function could be a temporary workaround

Is this a workaround I could implement? I'm afraid I don't understand this! 😕

@jrbourbeau (Member) commented

Ah, sorry, I meant that if you're playing around with a development version of cubed, you could make some changes to fix things. We should have a fix out in coiled soon, so it may be faster to just wait for that.

@jrbourbeau (Member) commented

We'll likely have a new release out sometime later today or tomorrow, but if you want to try a dev coiled release with the fix, you can `pip install coiled==1.3.12.dev10`.

@rsignell (Author) commented Feb 1, 2024

Thanks @jrbourbeau!

Both tests ran successfully with `pip install coiled==1.3.12.dev12`.

Here's the results of the larger coiled-add-random.py test:

python coiled-add-random.py "s3://esip-qhub/rsignell/testing/cubed"

[Two screenshots showing the results of the run]

@rsignell (Author) commented Feb 1, 2024

@tomwhite or @TomNicholas, does the Coiled performance here (from the screenshots above) appear comparable to that of other executors? If not, I'm hoping to work with Coiled engineers to see if they have ideas about how to improve performance.

rsignell reopened this Feb 1, 2024

@tomwhite commented Feb 1, 2024

Thanks for looking at this @rsignell, and for the bugfix @jrbourbeau.

I tried the Coiled executor a while back and had a similar experience with the number of instances not scaling as much as it could have done (see cubed-dev/cubed#309). It would be great if Coiled could ramp up the number of instances faster (is it possible to tell it to do that?). Other executors like Lithops or Modal can scale up to 100s or 1000s of instances in seconds.

@TomNicholas commented Feb 1, 2024

We don't yet have good benchmarks, so it's hard to know what the total time should be (and a lot has changed since I last ran anything at scale), but here's a way to get an idea of the expected performance.

If I run this example locally[^1] (just for demonstration purposes), I see a progress bar for each step in the plan (i.e. each intermediate array that needs to be written to Zarr).

[Screenshot: progress bars for each step in the plan]

Ignoring the "create arrays" step, there are 3 steps, each of which consists of 100 embarrassingly parallel tasks (each one a chunk to be saved to an intermediate Zarr store). You can see them laid out in the plan:

[Screenshot: the plan DAG for the example]

If you don't have 100 chunks being computed in parallel on Coiled between each step, then it could go faster.[^2] If you were to run this on Lambda/GCF, that's what you would see.

Assuming your Coiled cluster is computing one chunk per worker, it looks like you're only computing 10 chunks at once, so it could be at least 10x faster (before any other graph optimizations). The bottleneck here is the dynamic scaling up of the Coiled cluster to match the very wide task graphs that Cubed is passing it.
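
For reference, a rough local sketch of the same computation (the sizes, chunking, and Spec arguments here are illustrative and may differ from the actual coiled-add-random.py script and across cubed versions):

```python
import cubed
import cubed.array_api as xp
import cubed.random

# Local scratch space and per-task memory budget (illustrative values)
spec = cubed.Spec(work_dir="/tmp/cubed-example", allowed_mem="2GB")

# Two arrays of 100 chunks each (a 10 x 10 grid of 5000 x 5000 chunks)
a = cubed.random.random((50000, 50000), chunks=(5000, 5000), spec=spec)
b = cubed.random.random((50000, 50000), chunks=(5000, 5000), spec=spec)
c = xp.add(a, b)

c.visualize()  # render the plan DAG; hovering over a node shows its task count
c.compute()    # execute the plan with the default (local) executor
```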

Footnotes

[^1]: There's probably a neater way to get to this information that I'm blanking on right now. EDIT: I could simply have hovered over the graph nodes in Jupyter to see the number of tasks for that node.

[^2]: In fact I think since the first two arrays are independent, and since https://github.com/tomwhite/cubed/issues/80 was fixed, the first part of the calculation should be able to compute 200 chunks at once.

@mrocklin (Member) commented Feb 1, 2024

Coiled uses raw VMs. This means that things are super-cheap, but also that there's a minute-long cold startup time one has to deal with. (Systems like Modal get around this by keeping VMs up all the time, but that's expensive and assumes you're in their region/account; this gets rolled into the ~10x cost difference.)

Because things have minute-long startup times, the adaptive settings are configured to scale up to something large enough so that it completes in ~5 minutes. If we're going to spend one minute spinning up a VM, we want that VM to be active for at least a couple more minutes before spinning it down.

You can tweak those configurations if you like, but you'll never get sub-minute with Coiled. There's a cost-speed tradeoff here.

@shughes-uk commented

Here are the adaptive scaling docs if you want to tweak some settings: https://docs.coiled.io/user_guide/clusters/scale.html#adaptive-scaling

I suspect Cubed can probably make better decisions about when to scale in/out though, and just achieve that through manual scale calls.
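
Something along these lines, for example (a sketch assuming the standard coiled.Cluster API; the bounds and worker counts are illustrative):

```python
import coiled

cluster = coiled.Cluster(n_workers=10)

# Adaptive: let Coiled scale the cluster between the given bounds
cluster.adapt(minimum=10, maximum=100)

# ...or skip adaptivity entirely and pre-scale to a known task-graph width
cluster.scale(100)
```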

@tomwhite commented Feb 1, 2024

> I suspect Cubed can probably make better decisions about when to scale in/out though, and just achieve that through manual scale calls.

Yes, we could pre-scale to the maximum width of the graph.
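
A sketch of what computing that width might look like, assuming the plan is available as a networkx DiGraph (cubed represents its plan that way internally; the helper name and usage are made up):

```python
import networkx as nx

def max_graph_width(dag: nx.DiGraph) -> int:
    # Layer the DAG topologically; the largest layer approximates the
    # maximum number of tasks that can run at the same time.
    return max(len(layer) for layer in nx.topological_generations(dag))

# Hypothetical usage: pre-scale the cluster before executing the plan
# cluster.scale(max_graph_width(plan.dag))
```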

@rsignell (Author) commented Feb 1, 2024

I was planning on using this for rechunking a large Zarr dataset (reading a 1 TB Zarr from S3, writing a new rechunked 1 TB Zarr to S3). For this use case I don't care about super-fast autoscaling, do I?
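
(For context, a hypothetical sketch of that rechunking workflow; the paths, chunk shape, and memory budget are all made up:)

```python
import cubed

spec = cubed.Spec(work_dir="s3://example-bucket/cubed-tmp", allowed_mem="2GB")

# Read the existing Zarr store, rechunk lazily, then write the result back out
arr = cubed.from_zarr("s3://example-bucket/input.zarr", spec=spec)
rechunked = arr.rechunk((1000, 1000, 100))
cubed.to_zarr(rechunked, "s3://example-bucket/output.zarr")
```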

@mrocklin (Member) commented Feb 1, 2024

> Yes, we could pre-scale to the maximum width of the graph

Cost-wise this might be bad, especially if the tasks run quickly. If you spend a minute spinning up VMs that then run for just a second, you're spending the vast majority of VM-rental time doing nothing other than setting up.

> I was planning on using this for rechunking a large Zarr dataset (reading a 1 TB Zarr from S3, writing a new rechunked 1 TB Zarr to S3). For this use case I don't care about super fast autoscaling, do I?

Not at all. This is a case where Coiled's approach of trading slow spin-up time for dirt cheap spot VMs is ideal. You're spending $0.02 per CPU hour vs $0.20 per CPU hour, as long as you're fine waiting a few minutes for everything to complete.
