Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fault tolerance #39

Merged
merged 8 commits into from
Feb 22, 2024
Merged

Fault tolerance #39

merged 8 commits into from
Feb 22, 2024

Conversation

zuiderkwast
Copy link
Collaborator

@zuiderkwast zuiderkwast commented Feb 20, 2024

This change adds a supervisor for client processes, so a client process can crash without affecting the other client processes or the ered instance. (An ered client may crash e.g. due to a bug in OTP's TLS implementation.)

To keep track of missing or dead client processes, the ered process manages monitors to all ered_client processes, so it can properly return errors to commands handled by a crashed client process. ered:command/3,4 returns a value in all situations. ered:command_async/4 does not always return a value though. Specifically, if the client process crashes while handling the command, the reply callback function will never be called.

Fixes #1.

With a supervisor for the client processes, an ered instance survives
a crash of an ered_client process.

New problems arise: Commands issued while the process is restarting
hangs forever and async commands get no response.
The ered process keeps track of pending commands (but not async commands)
and with a monitor on each client process, ered can reply with an error
if the client process crashes. Then the client is restarted by the
supervisor and ered recovers.
Copy link
Collaborator

@bjosv bjosv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. No noticeable performance impact I guess?

test/ered_SUITE.erl Outdated Show resolved Hide resolved
src/ered.erl Outdated Show resolved Hide resolved
test/ered_SUITE.erl Show resolved Hide resolved
test/ered_SUITE.erl Outdated Show resolved Hide resolved
@zuiderkwast
Copy link
Collaborator Author

No noticeable performance impact I guess?

Not that I noticed. We don't have benchmarks...

test/ered_SUITE.erl Outdated Show resolved Hide resolved
src/ered.erl Outdated Show resolved Hide resolved
Copy link
Collaborator

@bjosv bjosv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

test/ered_SUITE.erl Outdated Show resolved Hide resolved
Co-authored-by: Björn Svensson <[email protected]>
@zuiderkwast zuiderkwast merged commit 4bcfab3 into Ericsson:main Feb 22, 2024
5 checks passed
@zuiderkwast zuiderkwast deleted the supervisor branch February 22, 2024 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add supervision tree
2 participants