feat: lack of recovery methods in case of message consumption failure #94
After we ran into a nonce mismatch issue and @MaxMustermann2 fixed it via the governance role (upgrading the contract to manually write the nonce to the expected value), we realized that the current way of handling messages would make recovery very difficult for us. Here are some facts:
So @MaxMustermann2 asked why we maintain the nonce inside the gateway contract (the app contract in LayerZero's context), given that the LayerZero endpoint already maintains a nonce itself. The answer is that we have to maintain the nonce and check for the expected nonce if we want ordered execution of messages (https://docs.layerzero.network/v2/developers/evm/oapp/message-design-patterns#ordered-delivery). @MaxMustermann2 then realized that if any revert happens during the execution of a message, the contract gets stuck (the channel is blocked). This is true, but there are pros and cons: pros:
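The blocking behavior described above can be sketched as follows. This is an illustrative Python simulation of the ordered-delivery pattern, not the actual Solidity contract; the names `Gateway`, `lz_receive`, and the single-channel simplification are all hypothetical. The point it shows: the app only advances its own expected nonce after a message executes successfully, so one reverting message blocks every message behind it.

```python
# Simulation of app-level ordered delivery (hypothetical names, one channel).
class Revert(Exception):
    """Stands in for a Solidity revert."""

class Gateway:
    def __init__(self):
        self.expected_nonce = 1   # next in-order nonce for the channel
        self.consumed = []        # payloads executed, in order

    def lz_receive(self, nonce, payload):
        # Ordered delivery: reject anything that is not the next nonce.
        if nonce != self.expected_nonce:
            raise Revert(f"invalid nonce: got {nonce}, expected {self.expected_nonce}")
        self._execute(payload)    # if this reverts, the nonce is NOT advanced
        self.expected_nonce += 1  # only advance after successful execution

    def _execute(self, payload):
        if payload == "bad":      # stand-in for a business-logic failure
            raise Revert("business logic failed")
        self.consumed.append(payload)

gw = Gateway()
gw.lz_receive(1, "a")
try:
    gw.lz_receive(2, "bad")   # message 2 reverts...
except Revert:
    pass
try:
    gw.lz_receive(3, "c")     # ...so message 3 is blocked too:
except Revert as e:           # the expected nonce is still 2
    print(e)
```

After this runs, `gw.consumed` is `["a"]` and `gw.expected_nonce` is still `2`: the channel stays stuck until nonce 2 either succeeds or is somehow skipped.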
cons:
So we can conclude some best practices based on the pros and cons:
Besides, not all reverts during message execution are unrecoverable. There are actually plenty of transient errors that also result in a revert: an insufficient gas limit, an insufficient balance for the LayerZero relayer, or the gateway contract being paused for some reason. These are typically not logic errors but transient conditions that can be resolved: the relayer could provide more gas, we could unpause the contract to resume receiving messages, and so on. So when we talk about reverts here, we mostly mean reverts caused by business logic that cannot be resolved easily.

If we follow these best practices, reverting only on exceptional and critical signals and never on expected, retry-able cases, we can ensure that reverts happen rarely and rely on them to surface system bugs. But even when a revert does happen owing to an exceptional bug, recovery can be quite cumbersome, especially with multi-sig governance. So for the worst case, where a revert happens and it is difficult to recover the contract from a bad state, we should have privileged functions that allow the contract governor to force-consume a message (its nonce, more specifically) to recover the contract from being stuck.
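The proposed escape hatch could look something like the following. Again this is a minimal Python sketch of the idea, not the real implementation (which would be a Solidity function guarded by the multi-sig governor); `force_consume`, `governor`, and the single-channel model are hypothetical names for illustration, not part of any LayerZero API. The governor marks the stuck nonce as consumed without executing its payload, which unblocks the channel while leaving an audit trail.

```python
# Sketch of a privileged force-consume recovery path (hypothetical names).
class Revert(Exception):
    """Stands in for a Solidity revert."""

class Gateway:
    def __init__(self, governor):
        self.governor = governor
        self.expected_nonce = 1
        self.skipped = []  # audit trail of force-consumed nonces

    def lz_receive(self, nonce, payload, fail=False):
        if nonce != self.expected_nonce:
            raise Revert("invalid nonce")
        if fail:  # stand-in for an unexpected business-logic bug
            raise Revert("unexpected business-logic error")
        self.expected_nonce += 1

    def force_consume(self, caller, nonce):
        # Privileged: only governance may skip a nonce, and only the
        # one currently blocking the channel.
        if caller != self.governor:
            raise Revert("not governor")
        if nonce != self.expected_nonce:
            raise Revert("can only skip the currently expected nonce")
        self.skipped.append(nonce)
        self.expected_nonce += 1  # consumed without executing the payload

gw = Gateway(governor="multisig")
gw.lz_receive(1, "a")
try:
    gw.lz_receive(2, "poison", fail=True)  # message 2 is permanently broken
except Revert:
    pass
gw.force_consume("multisig", 2)            # governance skips the bad message
gw.lz_receive(3, "c")                      # the channel flows again
```

Restricting the skip to the currently expected nonce keeps the operation narrow: governance can unstick the channel but cannot reorder or drop arbitrary future messages.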
Description
In the worst case (a revert owing to a completely unexpected error), a message could fail to be consumed, which would halt the protocol. We currently lack recovery functionality, such as force-consuming a message, to recover from that situation.