Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop a training job #147

Merged
merged 24 commits into from
Jan 29, 2025
Merged

Stop a training job #147

merged 24 commits into from
Jan 29, 2025

Conversation

lotif
Copy link
Collaborator

@lotif lotif commented Jan 7, 2025

PR Type

Feature

Short Description

Clickup Ticket(s): Link(s) if applicable.

Adding a button on in-progress jobs to stop it. Clicking on it will call an API that will ultimately kill the clients' processes and the server processes.

In this PR I am:

  • Saving the client and server PIDs in the database so they can be retrieved later
  • Adding an API on the client that will allow to kill a client given its PID
  • Adding an API on the server to stop a job. This API will call the client APIs one by one, and then it will kill the server process. It will also change the job status to "Finished With Error" and write down an error message.
  • As a bonus, now that we have an error message field, I'm also setting it to the exception message when the /start endpoint fails
  • Adding the stop button only for jobs that are in progress

Screenshot 2025-01-07 at 14 12 33

Tests Added

Fully unit and integration tested

@lotif lotif requested review from jewelltaylor and emersodb January 7, 2025 19:17
florist/api/routes/server/job.py Dismissed Show dismissed Hide dismissed
florist/api/routes/server/job.py Dismissed Show dismissed Hide dismissed
@codecov-commenter
Copy link

codecov-commenter commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 95.85492% with 8 lines in your changes missing coverage. Please review.

Project coverage is 94.31%. Comparing base (454cf11) to head (b959537).

Files with missing lines Patch % Lines
florist/api/routes/server/job.py 91.89% 3 Missing ⚠️
florist/app/jobs/page.tsx 94.33% 3 Missing ⚠️
florist/api/routes/server/training.py 92.30% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #147      +/-   ##
==========================================
+ Coverage   89.29%   94.31%   +5.02%     
==========================================
  Files          24       24              
  Lines        2411     2570     +159     
  Branches      138      149      +11     
==========================================
+ Hits         2153     2424     +271     
+ Misses        258      146     -112     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@emersodb emersodb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of small comments on my end. I can't review the front-end stuff, but the back end components seem sound.

florist/api/client.py Outdated Show resolved Hide resolved
florist/api/routes/server/job.py Show resolved Hide resolved

# Start the server training listener and client training listeners as threads to update
# the job's metrics and status once the training is done
server_listener_thread = Thread(target=asyncio.run, args=(server_training_listener(job),))
server_listener_thread.daemon = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why do we need this to be True now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have added an explanation here, but daemon threads will not prevent the main thread to terminate if they are still running. This will solve an issue I have seen with shutting down the servers when there are listeners still running.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah...that makes sense. Thanks for the background!

florist/api/client.py Dismissed Show dismissed Hide dismissed
@lotif lotif merged commit 08c87c2 into main Jan 29, 2025
8 checks passed
@lotif lotif deleted the kill-job branch January 29, 2025 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants