feat: propose DeputyGuardianModule improvements #167

Draft: wants to merge 2 commits into base: main
125 changes: 125 additions & 0 deletions protocol/deputy-guardian-module-improvements.md
@@ -0,0 +1,125 @@
# Deputy Guardian Module Improvements

## Context

The Foundation Safe is given the ability to act as the Security Council Safe for a limited number
of safety-net actions on OP Mainnet. For instance, the Foundation Safe can act as the Security
Council Safe to trigger the Superchain-wide pause function. These powers are given to the
Foundation Safe through the `DeputyGuardianModule` installed in the Security Council Safe.

## Problem Statement

The existing `DeputyGuardianModule` has a number of problems that we'd like to be able to solve in
a simple manner that makes as few contract changes as possible. We describe each of these problems
individually for clarity.

### Pre-Signed Pause Transactions

The Foundation Safe pre-signs certain transactions to reduce response time in case of an emergency.
Because pre-signed Foundation Safe transactions are dependent on the nonces of that Safe, these
pre-signed transactions become invalidated any time the Foundation Safe is required to sign
literally any other transaction. In an attempt to reduce the overhead from this state of affairs,
a *second* Foundation Safe sometimes referred to as the "Foundation Operations Safe" with the exact
same configuration as the original Foundation Safe was made to be the Deputy Guardian.

Even with this second Foundation Safe, pre-signed pauses are invalidated on a regular basis
whenever an upgrade touches the `DeputyGuardianModule`. Gripes with the pre-signed pause system
could fill a whole roll of toilet paper and are not limited to the issues noted above.
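
To make the invalidation concrete, here is a minimal Python sketch of nonce-ordered execution. `SafeModel` and `PreSignedTx` are hypothetical names used only for illustration and are not part of the Safe contracts; the model assumes only that each signed transaction commits to a single nonce and that the Safe executes nonces strictly in order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreSignedTx:
    """A transaction whose signatures commit to one specific Safe nonce."""
    description: str
    nonce: int


class SafeModel:
    """Toy model of a Safe: it only executes the tx matching its current nonce."""

    def __init__(self) -> None:
        self.nonce = 0

    def execute(self, tx: PreSignedTx) -> bool:
        if tx.nonce != self.nonce:
            return False  # signatures were produced over a different nonce
        self.nonce += 1
        return True


safe = SafeModel()
pause = PreSignedTx("pause", nonce=safe.nonce)            # pre-signed at nonce 0
routine = PreSignedTx("routine upkeep", nonce=safe.nonce)  # also signed at nonce 0

routine_ok = safe.execute(routine)  # consumes nonce 0
pause_ok = safe.execute(pause)      # stale: the Safe has moved on to nonce 1
```

In this model, any unrelated transaction that executes first leaves the pre-signed pause unusable until it is re-signed at the new nonce, which is the overhead described above.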

### Unclear Responsibilities
Contributor

I'm worried about changing what the DGM can do, but not also changing what that Guardian role is capable of doing. IMO it is much cleaner to maintain the property that "Anything the Guardian can do, the DGM can do", so I'd prefer to see us move any safety impacting actions (ie. unpausing, fallback activation) to the Upgrade Controller (2of2 Safe).

Contributor Author

This would violate the condition that L2Beat wants that the Security Council alone can unpause the system. I fully agree with you but that's the constraint I was operating under.

Contributor Author

Generally speaking, my goal with this proposal is to solve a lot of problems in the short-term and then come back to this problem in ~6 months when the landscape has changed a bit with Stage 1.4

Contributor

> This would violate the condition that L2Beat wants that the Security Council alone can unpause the system.

Actually, what I'm advocating for is to also update the auth in the system.

It should not be too much of a lift once the Isthmus contract changes are in, as they introduce a new upgrader role that is stored in the SuperchainConfig, so all you'd need to do is update the auth check in unpause() and setRespectedGameType().

Contributor Author

Yes I agree with this in principle, my goal with this proposal is to be able to ship something into prod ASAP and then work on these more in-depth changes later


The `DeputyGuardianModule` has an unclear security model and underlying purpose. The original
intent behind the `DeputyGuardianModule` was to provide a fast way to respond to potential
incidents in a manner that can impact liveness but not system safety. A lack of strict clarity
behind this design has meant that the `DeputyGuardianModule` has become a source of security
implications that are often not easy to reason about.

For instance, the `DeputyGuardianModule` currently allows the Deputy Guardian account to change the
dispute game type that is respected within the `OptimismPortal` contract from the
`FaultDisputeGame` to the `PermissionedDisputeGame`. The `PermissionedDisputeGame` has a security
model that differs from the `FaultDisputeGame` which essentially means that the Deputy Guardian is
given the ability to change the security model of the system. This is a departure from the original
design of the `DeputyGuardianModule` which should only impact liveness and should not modify the
security model of the system.

## Proposed Solution

We propose resolving the above problems by modifying the existing `DeputyGuardianModule` to be more
restrictive in the actions it can carry out while also being more permissive in who can carry out
these actions.

The `DeputyGuardianModule` would be permitted to:

1. Trigger the Superchain-wide pause.
1. Blacklist an individual dispute game.

The `DeputyGuardianModule` would **no longer** be permitted to:

1. Undo the Superchain-wide pause ("unpause").
1. Change the respected game type.

Contributor

This creates a serious gap in our ability to respond to issues with the fault dispute system. We would need to escalate any issue affecting a significant number of games to a superchain wide pause which we had previously explicitly designed to avoid.

It's important to note that changing the respected game type can only switch to a game implementation previously approved by governance and deployed by the security council. I can see the argument that it changes the security model but I think there's an equally valid view that it doesn't because the permissioned game is always a part of the system and thus part of its security model.

Contributor Author (@smartcontracts, Nov 22, 2024)

Does your opinion here change if we add the invalidateExistingGames() function to the AnchorStateRegistry and allow the deputy guardian to call that function? That would allow the deputy guardian to respond to a large number of invalid games without triggering the pause.

Contributor

Not really. I guess we could call that regularly until the security council changes the game type but it doesn't seem like a good incident response process. The case I'm thinking about is where there's a bug in the fault dispute game that someone is using to continuously cause invalid output roots to resolve as valid.

I guess given timelines we could call invalidateExistingGames() once and depend on the security council signing to switch respected game type before more games resolve as valid and we'd bundle another call to invalidateExistingGames() in that task. With a 3.5 day air gap and a 3 day SLA for security council that would work but it doesn't feel good to leave a major issue unmitigated for 3 days. I think we'd need to carefully review our incident response again and see what it looks like without the ability to fallback to permissioned games and confirm we're ok with it. I think it increases the likelihood that we would need to pause the superchain and we've generally tried to restrict that action to only the most critical of situations.

Contributor Author (@smartcontracts, Nov 22, 2024)

I think I could be convinced of allowing the deputy guardian to switch the respected game type, need to think about it a bit more. My primary concern is that if we're making the deputy guardian more permissive then giving it the ability to change the respected game type becomes a bit more dangerous. A malicious/leaked deputy key would be able to swap the respected game type back to the permissionless game which would probably force you to trigger the pause to avoid an endless back-and-forth.

That said I think the likelihood of this happening is relatively low. My short-term proposal would then be that we give the deputy guardian access to everything except for unpause.

Contributor Author

I was going to come up with a broader proposal but I think the reality is that with Stage 1.4 + interop it's possible that the landscape around security responses looks very different in 6 months, so I'd be happy with a simplified/permissive deputy guardian for now, even if it's not ideal.

Contributor Author

We could also potentially allow the deputy guardian module to switch the respected game type to permissioned but not allow it to switch back to permissionless.

Contributor

That's actually a pretty good compromise. When kona is deployed we would typically want to switch to it instead of permissioned if we can so need to think through if that should be allowed by the deputy guardian or if we're ok with requiring the security council for that. e.g. we could allow the deputy guardian to change the respected game arbitrarily unless it is already the permissioned game.

Contributor Author

Alright going to update this proposal with what we've talked about so far.

Contributor

> We could also potentially allow the deputy guardian module to switch the respected game type to permissioned

Isn't that exactly the issue that we're trying to fix? ie. allowing the guardian to change the state transition is a safety, not liveness, consideration.

Contributor Author

Although I agree with this in principle, I'm trying to find a middle ground that improves on the status quo right now without needing to make a bunch of contract changes

1. Set the anchor state in the `AnchorStateRegistry`.
smartcontracts marked this conversation as resolved.

The `DeputyGuardianModule` would be accessible to:

1. Any member of the Security Council Safe.
1. An EOA operated by the Optimism Foundation.

Can we make this scalable to N EOAs, so that ability to pause can be delegated to other security-facing orgs?


Please note that this proposal supersedes
[Design Doc 162](https://github.com/ethereum-optimism/design-docs/pull/162) that introduced the
concept of the `DeputyPauseModule`.

### Reasoning

The above changes make the role of the `DeputyGuardianModule` significantly more obvious. The
`DeputyGuardianModule` is now designed to be a fast response mechanism that can ONLY impact
liveness and cannot impact safety.

1. We remove the ability to unpause the system because this makes it possible for the
`DeputyGuardianModule` to diminish safety by unpausing the system when it was explicitly placed
into a paused state for security reasons.
1. We remove the ability to change the respected game type because this allows the
`DeputyGuardianModule` to modify the security model of the system and does not align with the
goal of limiting the module to liveness-impacting actions.
1. We remove the ability to set the anchor state within the `AnchorStateRegistry` because this can
potentially impact safety if the provided anchor state game is sufficiently old.
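
As a rough illustration of the rule set as written in this revision of the proposal, the sketch below models who may call the module and which actions remain available. All names (`Action`, `DeputyGuardianModel`, the string identifiers for accounts) are hypothetical and chosen for this example; it is not the module's actual interface, and the permitted set shown reflects only the lists in this document.

```python
from enum import Enum, auto


class Action(Enum):
    PAUSE = auto()
    BLACKLIST_GAME = auto()
    UNPAUSE = auto()
    SET_RESPECTED_GAME_TYPE = auto()
    SET_ANCHOR_STATE = auto()


# Liveness-only actions the module would keep under this proposal.
PERMITTED = frozenset({Action.PAUSE, Action.BLACKLIST_GAME})


class DeputyGuardianModel:
    """Toy access check: the caller must be a deputy AND the action must be permitted."""

    def __init__(self, council_members, foundation_eoa):
        # Any Security Council member or the Foundation EOA may act as deputy.
        self.deputies = frozenset(council_members) | {foundation_eoa}

    def can_call(self, caller: str, action: Action) -> bool:
        return caller in self.deputies and action in PERMITTED


module = DeputyGuardianModel({"council-1", "council-2"}, "foundation-eoa")
```

With this model, `module.can_call("foundation-eoa", Action.PAUSE)` holds, while the same caller is refused `Action.UNPAUSE`, and an account outside the deputy set is refused everything.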

## Alternatives Considered

### Deputy Pause Module

This proposal supersedes a previous proposal that would have installed a `DeputyPauseModule` into
the Foundation Safe. Accepting this proposal instead would give us the benefits of that earlier
proposal while also simplifying the role of the Deputy Guardian.

## Risks & Uncertainties

### Audits

Any module will need to be carefully audited. The `DeputyGuardianModule` was previously audited but
would need to be re-audited as a result of these changes.

### Leaked Deputy

Our worst-case scenario is a leaked deputy private key. Since the `DeputyGuardianModule` would now
permit any member of the Security Council Safe or a dedicated Foundation EOA to trigger
liveness-impacting actions, such a leak would likely cause a temporary liveness failure. A majority
of the Security Council Safe would need to coordinate a transaction to remove or replace the
offending account and resolve the liveness failure.

### Compromised Deputy

A motivated attacker with access to a deputy's private key could try to drain the deputy's wallet
constantly to prevent it from being used to trigger transactions. Although this could be
side-stepped with private RPC/bundling tools, it would not be considered good practice to include
live third-party infrastructure in the hot path for critical security actions. It therefore seems
prudent that the deputy be able to act via signature instead of directly via `msg.sender` check.

If we allow the deputy to act via signature then we must also make sure that the signature includes
some sort of nonce to prevent the same signature from being used more than once.

How different is this from just using 1/1 SAFEs instead of raw EOA addresses?

Contributor Author

If 1/1 safe is compromised then owner can be changed and you lose access to pause. Using raw EOA makes this impossible.
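
The nonce requirement for signature-based deputy actions can be sketched as follows. This is a toy model under loud assumptions: an HMAC stands in for the deputy's ECDSA signature purely so the example is self-contained, which means the model holds the signing key itself, whereas a real module would store only the deputy's address and recover the signer on-chain. `SignatureGatedModule`, `sign`, and the key value are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, action: str, nonce: int) -> bytes:
    """Stand-in for an ECDSA signature: a MAC over the action and the nonce."""
    message = f"{action}:{nonce}".encode()
    return hmac.new(key, message, hashlib.sha256).digest()


class SignatureGatedModule:
    """Toy module that accepts a deputy's signed action from any relayer."""

    def __init__(self, deputy_key: bytes) -> None:
        self._deputy_key = deputy_key
        self.nonce = 0
        self.paused = False

    def pause(self, signature: bytes) -> bool:
        expected = sign(self._deputy_key, "pause", self.nonce)
        if not hmac.compare_digest(signature, expected):
            return False  # wrong signer, wrong action, or stale nonce
        self.nonce += 1   # burn the nonce so this signature cannot be replayed
        self.paused = True
        return True


key = b"hypothetical-deputy-key"
module = SignatureGatedModule(key)
sig = sign(key, "pause", nonce=0)

first = module.pause(sig)   # accepted; the module's nonce advances
replay = module.pause(sig)  # rejected; the signature only covers nonce 0
```

Because anyone can submit the signed message, the deputy's own account does not need to hold funds, which addresses the wallet-draining concern; the nonce is what keeps a captured signature from being resubmitted later.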

### Process Updates

Various processes and runbooks will need to be updated to reflect this new state of affairs. We'll
be happy, but it will take some effort. All of these updates will need to be made before we
actually go live with these changes and we should run drills on testnet and propose that drills be
run on mainnet.
108 changes: 0 additions & 108 deletions protocol/deputy-pause-module.md

This file was deleted.