0.4 #70

Open
wants to merge 66 commits into
base: dev

JadenFiotto-Kaufman
Member

No description provided.

MichaelRipa and others added 30 commits September 26, 2024 15:11
Add live dependency updates and auto-update on container start
JadenFiotto-Kaufman and others added 19 commits December 12, 2024 00:36
…n. It updates an unused env var in the chosen application's config and then redeploys. This causes Ray to restart the Deployment.

Model deployments trigger this remote method with their application name when they hit a CUDA device-side assert error, as it's the only way to uncorrupt their CUDA runtime.
Added remote method to Controller deployment to restart an applicatio…
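
For readers following the restart commit above, here is a rough sketch of the idea in plain Python, not the code from this PR. It assumes the config is a dict shaped like a Ray Serve config (an `applications` list whose entries carry `name` and `runtime_env.env_vars`). The env var name `RESTART_COUNTER` and the `redeploy` callable are hypothetical stand-ins for whatever the Controller actually uses.

```python
import copy
from typing import Any, Callable, Dict

# Hypothetical name for the otherwise unused env var; the PR does not state it.
RESTART_ENV_VAR = "RESTART_COUNTER"


def bump_restart_marker(config: Dict[str, Any], app_name: str) -> Dict[str, Any]:
    """Return a copy of `config` with the marker env var changed for one application."""
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    for app in new_config.get("applications", []):
        if app.get("name") == app_name:
            env_vars = app.setdefault("runtime_env", {}).setdefault("env_vars", {})
            current = int(env_vars.get(RESTART_ENV_VAR, "0"))
            env_vars[RESTART_ENV_VAR] = str(current + 1)  # env var values are strings
            return new_config
    raise KeyError(f"no application named {app_name!r}")


def restart_application(
    config: Dict[str, Any],
    app_name: str,
    redeploy: Callable[[Dict[str, Any]], None],
) -> Dict[str, Any]:
    """Change the marker, then hand the new config to `redeploy` (assumed to apply it)."""
    new_config = bump_restart_marker(config, app_name)
    redeploy(new_config)
    return new_config
```

Because only the named application's config differs after the bump, the redeploy should leave the other applications alone, which is the behaviour the commit message describes.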
…CCL process and torch distributed process group from being terminated on timeout. Vastly simplifies timeouts
@MichaelRipa
Member

All of the model keys in the various service configs should be updated to the new schema (nnsight.modeling.language.LanguageModel instead of nnsight.models.LanguageModel.LanguageModel).
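
One possible way to do the rename in bulk, as a minimal sketch: it assumes each service config has already been loaded into nested dicts, lists, and strings, and that the old import path appears verbatim inside the model key strings. The helper name `migrate_model_keys` and the config layout are hypothetical. Only the two import paths come from the comment above.

```python
from typing import Any

OLD_PATH = "nnsight.models.LanguageModel.LanguageModel"
NEW_PATH = "nnsight.modeling.language.LanguageModel"


def migrate_model_keys(node: Any) -> Any:
    """Recursively replace the old import path with the new one in a loaded config."""
    if isinstance(node, str):
        return node.replace(OLD_PATH, NEW_PATH)
    if isinstance(node, dict):
        # Rewrite both keys and values, since a model key may sit in either place.
        return {migrate_model_keys(k): migrate_model_keys(v) for k, v in node.items()}
    if isinstance(node, list):
        return [migrate_model_keys(item) for item in node]
    return node  # numbers, booleans, None are left as they are
```

The function returns a new structure rather than editing in place, and running it twice gives the same result as running it once, so it should be safe to apply to configs that are already partly migrated.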

4 participants