
core: fix unrecoverable freeze when rabbit shuts down #10594

Open · wants to merge 1 commit into dev

bougue-pe
Contributor

Hard fix: kill the whole process (not only the thread) and let the orchestrator restart it.

Bump amqp-client along the way, as it doesn't hurt.
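Why killing only the thread is not enough: the JVM exits only once every non-daemon thread has finished (or `exitProcess`/`System.exit` is called), so a worker whose main thread dies while other non-daemon threads are still alive can hang indefinitely. A minimal sketch of the "kill the process" approach, assuming a hypothetical `FatalConnectionLossHandler` class (not code from this PR) that is invoked from whatever callback observes the connection loss:

```kotlin
import kotlin.system.exitProcess

/**
 * Hypothetical helper (not the PR's actual code): terminates the whole JVM when the
 * broker connection is lost, so an external orchestrator can restart the worker.
 * The exit function is injectable only so the behaviour can be tested without
 * stopping the JVM; production code would keep the default.
 */
class FatalConnectionLossHandler(
    private val exit: (Int) -> Unit = { code -> exitProcess(code) }
) {
    @Volatile
    var exitRequested: Boolean = false
        private set

    /** May be called from any thread; only the first call triggers the exit. */
    @Synchronized
    fun onConnectionLost(reason: String, exitCode: Int = 1) {
        if (exitRequested) return
        exitRequested = true
        System.err.println(
            "Broker connection lost ($reason); exiting with code $exitCode so the orchestrator restarts the worker"
        )
        exit(exitCode)
    }
}
```

With amqp-client, such a handler could plausibly be called from a `ShutdownListener` registered via `Connection.addShutdownListener`, skipping the exit when `ShutdownSignalException.isInitiatedByApplication()` is true; that wiring is an assumption, not what this PR does.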

Related to #8621, but the case reproduced and fixed here produces different core logs:

[11:16:04,880] [INFO]          [WorkerCommand] consume shutdown: amq.ctag-LsXBfmFdL6n758icthlCYA, com.rabbitmq.client.ShutdownSignalException: connection error; protocol method: #method<connection.close>(reply-code=320, reply-text=CONNECTION_FORCED - broker forced connection closure with reason 'shutdown', class-id=0, method-id=0)
[11:16:04,883] [WARN]  [ForgivingExceptionHandler] An unexpected connection driver error occurred (Exception message: Connection reset by peer)
Exception in thread "main" com.rabbitmq.client.AlreadyClosedException: connection is already closed due to connection error; protocol method: #method<connection.close>(reply-code=320, reply-text=CONNECTION_FORCED - broker forced connection closure with reason 'shutdown', class-id=0, method-id=0)
        at com.rabbitmq.client.impl.AMQConnection.startShutdown(AMQConnection.java:1012)
        at com.rabbitmq.client.impl.AMQConnection.close(AMQConnection.java:1127)
        at com.rabbitmq.client.impl.AMQConnection.close(AMQConnection.java:1056)
        at com.rabbitmq.client.impl.AMQConnection.close(AMQConnection.java:1040)
        at com.rabbitmq.client.impl.AMQConnection.close(AMQConnection.java:1032)
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.close(AutorecoveringConnection.java:289)
        at kotlin.io.CloseableKt.closeFinally(Closeable.kt:56)
        at fr.sncf.osrd.cli.WorkerCommand.run(WorkerCommand.kt:319)
        at fr.sncf.osrd.App.main(App.java:44)
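The last frames show where the main thread dies: the exception is thrown by `close()`, called from Kotlin's `use {}` (`CloseableKt.closeFinally`) on a connection the broker had already closed. When a `use` block completes normally, an exception thrown by `close()` propagates to the caller. A dependency-free reproduction, assuming a hypothetical `FakeConnection` whose `close()` throws `IllegalStateException` as a stand-in for `AlreadyClosedException`:

```kotlin
import java.io.Closeable

/**
 * Hypothetical stand-in for a broker connection. Its close() throws once the
 * connection is already closed, mimicking the AlreadyClosedException in the trace.
 */
class FakeConnection : Closeable {
    var isOpen = true
        private set

    /** Simulates the broker forcing the connection closed (CONNECTION_FORCED on shutdown). */
    fun closedByBroker() {
        isOpen = false
    }

    override fun close() {
        // IllegalStateException stands in for com.rabbitmq.client.AlreadyClosedException.
        check(isOpen) { "connection is already closed" }
        isOpen = false
    }
}

/** Hypothetical worker loop: leaves the loop once the connection is gone, then `use` calls close(). */
fun runWorker(connection: FakeConnection): Int =
    connection.use { conn ->
        while (conn.isOpen) {
            conn.closedByBroker() // the broker shuts down while we are consuming
        }
        0 // never reaches the caller: close() in use's finally block throws first
    }
```

Only the thread running `use` receives this exception, which matches the `Exception in thread "main"` line; if other non-daemon threads (for instance consumer or executor threads) are still alive, the JVM keeps running, which would be consistent with the freeze this PR fixes.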

@bougue-pe bougue-pe requested review from Khoyo and ElysaSrc January 30, 2025 10:16
@bougue-pe bougue-pe requested a review from a team as a code owner January 30, 2025 10:16
@github-actions github-actions bot added the area:core Work on Core Service label Jan 30, 2025
codecov-commenter commented Jan 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.85%. Comparing base (ecb261f) to head (ee9d8aa).

Additional details and impacted files
@@           Coverage Diff           @@
##              dev   #10594   +/-   ##
=======================================
  Coverage   81.85%   81.85%           
=======================================
  Files        1075     1075           
  Lines      107172   107172           
  Branches      728      728           
=======================================
  Hits        87726    87726           
  Misses      19407    19407           
  Partials       39       39           
Flag                 Coverage Δ
editoast             74.24% <ø> (ø)
front                89.38% <ø> (ø)
gateway               2.18% <ø> (ø)
osrdyne               3.28% <ø> (ø)
railjson_generator   87.50% <ø> (ø)
tests                88.14% <ø> (ø)

Flags with carried forward coverage won't be shown.

@@ -316,7 +320,7 @@ class WorkerCommand : CliCommand {
if (!channel.isOpen()) break
}

return 0
Contributor

How does this work out with multithreading?

I don't quite understand when this line is actually reached. Is it when all threads have died, or just one?

At some point we should rewrite this class to reduce the amount of nested callbacks and functions; it's a little difficult to follow.
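One way to make the answer explicit, sketched under the assumption that worker threads can report fatal errors to a shared object: the main thread waits for the first failure rather than for all workers to die. `FirstFailureSupervisor` is hypothetical and not part of `WorkerCommand`:

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

/**
 * Hypothetical supervisor: the main thread wakes up as soon as ANY worker thread
 * reports a fatal error, instead of depending on when (or whether) every worker dies.
 */
class FirstFailureSupervisor {
    private val firstFailure = CountDownLatch(1)
    private val exitCode = AtomicInteger(0)

    /** Called by a worker thread; only the first reported (non-zero) code is kept. */
    fun reportFatal(code: Int) {
        exitCode.compareAndSet(0, code)
        firstFailure.countDown()
    }

    /** Blocks until a failure is reported or the timeout elapses; returns null on timeout. */
    fun awaitFirstFailure(timeout: Long, unit: TimeUnit): Int? =
        if (firstFailure.await(timeout, unit)) exitCode.get() else null
}
```

Once `awaitFirstFailure` returns a code, calling `exitProcess(code)` instead of `return 0` would end the process even if other non-daemon threads are still running.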

Hard fix to just kill the process (and not only the thread) and let
orchestrator restart it.

Bump amqp-client on the way as it doesn't hurt

Signed-off-by: Pierre-Etienne Bougué <[email protected]>
@bougue-pe bougue-pe force-pushed the peb/core/fix_core_freeze_on_rabbit_shutdown branch from 06726c5 to ee9d8aa Compare January 30, 2025 10:42