Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YUNIKORN-2895] Don't add duplicated allocation to node when the allo… #976

Closed
wants to merge 1 commit into from

Conversation

zhuqi-lucas
Copy link
Contributor

@zhuqi-lucas zhuqi-lucas commented Oct 1, 2024

…cation already allocated before.

When i try to revisit the new update allocation logic, the potential duplicated allocation to node will happen if the allocation already allocated. And we try to add the allocation to the node again and don't revert it.

Note:

We have unwind logic for another side call for placeholder allocate:

_ = node.RemoveAllocation(reqFit.GetAllocationKey())

What type of PR is it?

  • - Bug Fix
  • - Improvement
  • - Feature
  • - Documentation
  • - Hot Fix
  • - Refactoring

Todos

  • - Task

What is the Jira issue?

How should this be tested?

Screenshots (if appropriate)

Questions:

  • - The licenses files need update.
  • - There is breaking changes for older versions.
  • - It needs documentation.

@zhuqi-lucas zhuqi-lucas requested review from wilfred-s, pbacsko and craigcondit and removed request for pbacsko October 1, 2024 03:43
Copy link

codecov bot commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 80.99%. Comparing base (fe067b7) to head (5d1f2ba).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
pkg/scheduler/objects/application.go 0.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #976      +/-   ##
==========================================
- Coverage   81.03%   80.99%   -0.04%     
==========================================
  Files          97       97              
  Lines       12523    12529       +6     
==========================================
  Hits        10148    10148              
- Misses       2104     2110       +6     
  Partials      271      271              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ryankert01
Copy link
Contributor

Good catch!

Copy link
Contributor

@craigcondit craigcondit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break horribly. If allocateAsk() fails, there is nothing to unwind as the allocation result won't be returned and the final allocation will never happen. Your logic will remove something that was not added in the first place.

@zhuqi-lucas
Copy link
Contributor Author

Thanks @craigcondit for review:
We have inconsistent logic for allocate failed, here is the call with unwind, so i am assuming it may wrong:

_ = node.RemoveAllocation(reqFit.GetAllocationKey())

It seems both two cases call TryNodeAllocation, so i am confused.

@craigcondit
Copy link
Contributor

If you're sure this is correct, build a set of unit tests that prove it. Demonstrate that in the failure case without this patch that the system is left in an inconsistent state. Then show that it operates correctly with the patch. There's no tests in this PR at all.

@wilfred-s
Copy link
Contributor

-1 on this approach. I think the issue is way more basic. I opened a discussion on the jira as I see multiple issues in the way now track allocations and asks.

@zhuqi-lucas
Copy link
Contributor Author

Thanks @wilfred-s @craigcondit , i will try do add unit test and check the comments, look into the root cause of those issues.

@craigcondit
Copy link
Contributor

The allocateAsk() call can only fail if the ask was previously allocated. This is checked for in all code paths that call allocateAsk() earlier, and always with the application lock held. Therefore, if we ever hit the error case here, we have a serious BUG elsewhere.

@craigcondit
Copy link
Contributor

Closing this after determining that original issue is not a problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants