Preliminary Vast AI support #4365
Thanks for contributing this, @kristopolous! This is really exciting. I left some comments. My main confusion: is Vast AI, like RunPod, a cloud that provides pods to users as their "VMs"? I'm asking because I see a lot of Docker-related code and just want to confirm :)
Historically, RunPod was a clone of Vast. We currently offer Docker-style containers and will be providing VMs soonish (probably before the end of the year).
e9e922a to 4c9aff9
Getting these tests to pass is blocked by https://github.com/skypilot-org/skypilot-catalog/pull/100/commits
2b3e658 to 25b99f9
Co-authored-by: Tian Xia <[email protected]>
3010706 to 7face5e
cc @Michaelvll
Co-authored-by: Tian Xia <[email protected]>
Is there an update on the failing CI tests? They still appear to be failing. Also, if passing the smoke tests is too hard, how about listing some basic test results in the PR description? Including but not limited to:
I can do this sure.
We only offer GPU instances... you are free to use the CPU if you'd like, but we're a GPU shop.
Sure, all of these are fine. I'll find tests that can do this.
Got it. Feel free to ignore this one.
So how is this "Autodowning" supposed to work? It used to work and now it puts a running machine into an "INIT" state which sounds completely wrong. It also doesn't attempt to stop or terminate any instance. You can do
and it succeeds, every time you can do
and it succeeds, every time you do
You can do autostop and it then succeeds. In December, all of these things worked. I've spent the past 5 or so hours trying to work through this code. Is there something special about the base image? Are you doing some action at a distance?
I may have finally found the issue; let me look.
e849429 to 78fdcf6
This is preliminary support for Vast. It currently works on an unreleased version of the SDK, which we will soon publish to PyPI.
The document https://docs.google.com/document/d/1oWox3qb3Kz3wXXSGg9ZJWwijoa99a3PIQUHBR8UgEGs/edit?pli=1&tab=t.0 was followed and all the testing passed.
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh
I'm pretty sure edits will be needed, and I'm fine with that. This is attempt 1. The outstanding work:
We need to