Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: MetaGrpcClient deadlock when drop #16727

Merged
merged 2 commits into from
Oct 30, 2024

Conversation

drmingdrmer
Copy link
Member

@drmingdrmer drmingdrmer commented Oct 29, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

fix: MetaGrpcClient deadlock when drop

Move the reference to the dedicated runtime rt from MetaGrpcClient
to ClientHandle.

rt is a reference to the dedicated runtime for running
MetaGrpcClient.

If all ClientHandle are dropped, the runtime will be destroyed.

This rt previously is stored in MetaGrpcClient, which leads to a deadlock:

  • When all ClientHandle are dropped, the two workers worker_loop() and auto_sync_interval()
    will quit.
  • These two futures both held a reference to MetaGrpcClient.
  • The last of these(say, F) two will drop MetaGrpcClient.rt and Runtime::_dropper
    will block waiting for the runtime to shut down.
  • But F is still held, deadlock occurs.

Other changes:

  • Runtime::try_spawn and several other spawn methods now accept a name
    argument for display in async backtrace.

  • Add async-backtrace to MetaGrpcClient methods

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)

Related Issues


This change is Reviewable

@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label Oct 29, 2024
@drmingdrmer drmingdrmer added C-bug Category: something isn't working A-meta Area: databend meta serive labels Oct 29, 2024
@drmingdrmer drmingdrmer marked this pull request as ready for review October 29, 2024 15:50
@drmingdrmer drmingdrmer force-pushed the 159-fix-meta-client-dead-lock branch 2 times, most recently from 5fb20f8 to 21ffe80 Compare October 30, 2024 04:08
Move the reference to the dedicated runtime `rt` from `MetaGrpcClient`
to `ClientHandle`.

`rt` is a reference to the dedicated runtime for running
`MetaGrpcClient`.

If all ClientHandle are dropped, the runtime will be destroyed.

This `rt` previously is stored in `MetaGrpcClient`, which leads to a deadlock:
- When all `ClientHandle` are dropped, the two workers `worker_loop()` and `auto_sync_interval()`
  will quit.
- These two futures both held a reference to `MetaGrpcClient`.
- The last of these(say, `F`) two will drop `MetaGrpcClient.rt` and `Runtime::_dropper`
  will block waiting for the runtime to shut down.
- But `F` is still held, deadlock occurs.

Other changes:

- `Runtime::try_spawn` and several other spawn methods now accept a name
  argument for display in async backtrace.

- Add async-backtrace to `MetaGrpcClient` methods
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @drmingdrmer for working on this!

Copy link
Member Author

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 12 of 12 files at r1.
Reviewable status: 8 of 12 files reviewed, all discussions resolved (waiting on @everpcpc)

Copy link
Member Author

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @everpcpc and @Xuanwo)

@drmingdrmer drmingdrmer added this pull request to the merge queue Oct 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 30, 2024
@BohuTANG BohuTANG merged commit d376d8c into databendlabs:main Oct 30, 2024
73 checks passed
@drmingdrmer drmingdrmer deleted the 159-fix-meta-client-dead-lock branch October 30, 2024 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-meta Area: databend meta serive C-bug Category: something isn't working pr-bugfix this PR patches a bug in codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants