Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle errors as no-op for execution requests #14826

Merged
merged 4 commits into from
Jan 31, 2025
Merged

Conversation

terencechain
Copy link
Member

@terencechain terencechain commented Jan 25, 2025

In Electra, when processing execution requests, any failure—except for malformed requests (i.e., nil)—should be treated as a no-op per consensus specifications. However, there's some ambiguity regarding what constitutes a failure:

  1. The execution request is invalid.
  2. A bug in the client code processing the execution request.

While (1) has been addressed, (2) remains unclear in terms of whether it should be treated as a no-op or a failure state transition. I argue that it should be treated as a no-op because triggering a state transition failure would result in the block being marked invalid and added to the invalid block cache. This could lead to a fork from other consensus layer implementations. Such a scenario might arise if, for example, a bug exists in an Electra helper function or if a bug is introduced in future updates. Treating all failures as no-ops, except for malformed requests, provides a much safer approach.

What this PR accomplishes:

  • Errors are only triggered for nil requests.
  • For other failures, errors are logged, and processing continues. Logging errors is more appropriate in this context, as these cases don't indicate a fundamental implementation failure.
  • Added safeguards to the process option to only trigger errors for nil requests, preventing potential issues from being introduced later.

@terencechain terencechain requested a review from a team as a code owner January 25, 2025 17:19
@@ -84,14 +85,20 @@ func ProcessOperations(
}
st, err = ProcessDepositRequests(ctx, st, requests.Deposits)
if err != nil {
return nil, errors.Wrap(err, "could not process deposit requests")
if errors.Is(err, errNilExecutionRequest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no logs required here? because they are processed inside?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's logged inside ProcessDepositRequests

}
amount := wr.Amount
isFullExitRequest := amount == params.BeaconConfig().FullExitRequestAmount
// If partial withdrawal queue is full, only full exits are processed
if n, err := st.NumPendingPartialWithdrawals(); err != nil {
return nil, err
log.WithError(err).Error("Could not get number of pending partial withdrawals")
Copy link
Contributor

@james-prysm james-prysm Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess this and the logs below would potentially print many times for what isn't really affected by the loop above. wouldn't all of these continually fail? shouldn't this just print once?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably have a way to wrap and return this error so it only logs once, unless we want to print multiple times in which case we should add fields for the unique identifier of the request that caused it.

@@ -113,7 +113,8 @@ func ProcessWithdrawalRequests(ctx context.Context, st state.BeaconState, wrs []
}
validator, err := st.ValidatorAtIndexReadOnly(vIdx)
if err != nil {
return nil, err
log.WithError(err).Error("Could not get validator at index")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it exists in validator index by pubkey can it actually fail here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is needed we can provide some of the request information along with the log

@terencechain terencechain force-pushed the execution-reqs branch 2 times, most recently from 3ecd5c0 to a354632 Compare January 27, 2025 18:23
@@ -209,7 +209,7 @@ func ProcessConsolidationRequests(ctx context.Context, st state.BeaconState, req
}

if npc, err := st.NumPendingConsolidations(); err != nil {
log.WithError(err).Error("failed to fetch number of pending consolidations")
log.WithError(err).Errorf("failed to fetch number of pending consolidations at index %d", i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be better to use .WithField("index",i) instead?

Copy link
Contributor

@jtraglia jtraglia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Electra, when processing execution requests, any failure—except for malformed requests (i.e., nil)—should be treated as a no-op per consensus specifications. However, there's some ambiguity regarding what constitutes a failure.

I don't believe the specifications are ambiguous here. See the comment I left below. If there's an error which should never happen (eg uint64 overflow), the specifications will throw an exception. It will not just continue.

I argue that it should be treated as a no-op because triggering a state transition failure would result in the block being marked invalid and added to the invalid block cache. This could lead to a fork from other consensus layer implementations.

I think I agree with this, but I'm not entirely sure. I originally brought this up (in the doc I shared with the team) because I noticed inconsistent uses of return and continue in a few spots. It seemed that the dominant pattern was to return when the error should never happen. This made sense to me. But marking the block as invalid and adding it to the invalid block cache does feel dangerous. Wouldn't there be a consensus error in either situation (return vs continue)?

Comment on lines 187 to 190
withdrawableEpoch, err := exitQueueEpoch.SafeAddEpoch(params.BeaconConfig().MinValidatorWithdrawabilityDelay)
if err != nil {
return nil, errors.Wrap(err, "failed to add withdrawability delay to exit queue epoch")
log.WithError(err).Error("Could not compute withdrawable epoch")
continue
Copy link
Contributor

@jtraglia jtraglia Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, this really never should happen. The specification would throw an exception here too.

...
tests/core/pyspec/eth2spec/test/electra/block_processing/test_process_consolidation_request.py:1221: in run_consolidation_processing
    spec.process_consolidation_request(state, consolidation)
tests/core/pyspec/eth2spec/electra/minimal.py:5469: in process_consolidation_request
    if current_epoch < source_validator.activation_epoch + config.SHARD_COMMITTEE_PERIOD:
venv/lib/python3.13/site-packages/remerkleable/basic.py:88: in __add__
    return self.__class__(super().__add__(self.__class__.coerce_view(other))) 

cls = <class 'eth2spec.electra.minimal.Epoch'>, value = 18446744073709551679

    def __new__(cls, value: int):
        if value < 0:
            raise ValueError(f"unsigned type {cls} must not be negative")
        byte_len = cls.type_byte_length()
        if value.bit_length() > (byte_len << 3):
>           raise ValueError(f"value out of bounds for {cls}")
E           ValueError: value out of bounds for <class 'eth2spec.electra.minimal.Epoch'>

Edit: wrong function but it's the same idea.

@terencechain
Copy link
Member Author

I don't believe the specifications are ambiguous here

Sorry, I meant client implementation.

Wouldn't there be a consensus error in either situation (return vs continue)?

I think a client implementation bug is more likely to happen than something going out of bounds. The point is, if a client bug exists, it shouldn't be a footgun that causes the block to become invalid. An invalid block is the absolute worst-case scenario here, and we should avoid it

@jtraglia
Copy link
Contributor

An invalid block is the absolute worst-case scenario here, and we should avoid it.

This makes sense to me. With this in mind, the PR looks good to me.

Copy link
Contributor

@potuz potuz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are different kinds of errors that may happen on state transition:

  1. A context deadline (we should check this and not declare the block invalid in this case) The best option I see is to make this change in validateStateTransition directly, and instead of marking the block as invalid check first if ctx.Err() != nil
  2. An error on Prysm when processing, for example we can't get some fields or can't access some state required etc. These errors do not immediately indicate that the block is invalid, but continuing processing will alter the resulting state root and therefore declaring the block root as invalid later on. We should stop on these and not declare the block as invalid.
  3. Errors because the block actually fails to process, these should be marked as invalid.

} else if npc >= pcLimit {
return nil
}

activeBal, err := helpers.TotalActiveBalance(st)
if err != nil {
return err
log.WithError(err).Error("failed to fetch total active balance")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for these kinds of errors the request may be valid and we fail to get the active balance for whatever reason, by continuing we would have a consensus split anyway since the state root will fail to match. I think we need to return an error and not mark the block as invalid though.

@terencechain terencechain force-pushed the execution-reqs branch 2 times, most recently from b423c05 to 1e50745 Compare January 30, 2025 14:09
Copy link
Member

@prestonvanloon prestonvanloon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for this scenario

Comment on lines +84 to +88
for _, d := range requests.Deposits {
if d == nil {
return nil, errors.New("nil deposit request")
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this validation logic be in ProcessDepositRequests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this PR is to separate out the error. ProcessDepositRequests should returns error typed execReqErr which that caller can handle it appropiately

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this. ProcessOperations should not be doing any input validations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, this is planned to be reworked in a later change.

Comment on lines 472 to 474
if electra.IsExecutionRequestError(err) {
return nil, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't check this here, this leaks internal logic up the stack, instead, if you keep the error in ProcessOperations, you can check on the caller.

}
st, err = ProcessWithdrawalRequests(ctx, st, requests.Withdrawals)
if err != nil {
return nil, errors.Wrap(err, "could not process withdrawal requests")
return nil, execReqErr{errors.Wrap(err, "could not process withdrawal requests")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not simply deal with this logic within the core function. If the spec expects you to simply continue on these cases, why not ditch the request right there and not return an error?

@terencechain
Copy link
Member Author

We'll close this and we figured out a better solution. A new one will be opened soon

@terencechain terencechain reopened this Jan 31, 2025
@terencechain terencechain force-pushed the execution-reqs branch 3 times, most recently from 36838e0 to d950af3 Compare January 31, 2025 15:57
Comment on lines 95 to 97
if wr == nil {
return nil, errors.New("nil execution layer withdrawal request")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this?

Comment on lines +84 to +88
for _, d := range requests.Deposits {
if d == nil {
return nil, errors.New("nil deposit request")
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this. ProcessOperations should not be doing any input validations.

potuz
potuz previously approved these changes Jan 31, 2025
prestonvanloon
prestonvanloon previously approved these changes Jan 31, 2025
Comment on lines +84 to +88
for _, d := range requests.Deposits {
if d == nil {
return nil, errors.New("nil deposit request")
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, this is planned to be reworked in a later change.

"github.com/prysmaticlabs/prysm/v5/testing/util"
)

func TestProcessOperationsWithNilRequests(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is a good start for testing this method.

@prestonvanloon prestonvanloon added this pull request to the merge queue Jan 31, 2025
Merged via the queue into develop with commit 910609a Jan 31, 2025
17 checks passed
@prestonvanloon prestonvanloon deleted the execution-reqs branch January 31, 2025 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants