Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"exitOffer tool reclaims stuck payment" timed out waiting for tx to be included in a block #10565

Closed
Tracked by #10774
dckc opened this issue Nov 25, 2024 · 5 comments · Fixed by #10775
Closed
Tracked by #10774
Assignees
Labels
bug Something isn't working flake flakey test test

Comments

@dckc
Copy link
Member

dckc commented Nov 25, 2024

Describe the bug

I'm struggling to land #10530 (for #10391). It's in code that shouldn't be affecting acceptance testing yet.

At step:10:4370, we see:

    Error: error in json rpc client, with http response metadata: (Status: 200 OK, Protocol HTTP/1.1). RPC error -32603 - Internal error: timed out waiting for tx to be included in a block

Then at step:10:4372:

 Error {
    message: 'tryExitOffer failed to reclaim stuck payment  condition failed after 15 retries.',

And finally, at step:10:4329:

  ✘ [fail]: exitOffer tool reclaims stuck payment Rejected promise returned by test

To Reproduce

run CI on #10530 (intermittently?)

Expected behavior

no error

Platform Environment

CI:
git checkout --progress --force refs/remotes/pull/10530/merge
-> 8e51eff

Additional context

#10530 (for #10391)

@dckc dckc added bug Something isn't working flake flakey test labels Nov 25, 2024
@dckc
Copy link
Member Author

dckc commented Nov 25, 2024

It passed on the 17th, at step:10:3981 for #10483.

2024-11-17T18:31:22.034Z block-manager: block 1619 commit
Object <[Object: null prototype] {}> {
  maxRetries: 15,
  retryIntervalMs: 5000,
  reusePromise: false,
  message: 'tryExitOffer failed to reclaim stuck payment '
}
  ✔ exitOffer tool reclaims stuck payment (3.2s)

@turadg turadg added the test label Nov 25, 2024
mergify bot added a commit that referenced this issue Nov 26, 2024
refs: #10565

## Description

#10530 was failing repeatedly due to this test problem,
- #10565

I thought it was a flake so I [bypassed the check](#10530 (comment)) but it turns out to fail deterministically due to the changes in #10530. But those changes should have not affected the test, so the problem is with the test.

This marks it as failing until it can be solved.

### Security Considerations
none

### Scaling Considerations
none

### Documentation Considerations
none

### Testing Considerations
I used `failing` instead of `skip` to help confirm that it's deterministic. Also if something happens to resolve it inadvertently it'll be detected so we re-enable it.

### Upgrade Considerations
none
0xpatrickdev added a commit that referenced this issue Nov 26, 2024
- this test is flaky and will be addressed as part of #10565
@dckc
Copy link
Member Author

dckc commented Nov 26, 2024

@mfig notes ExecaError ... hardcoded 10 second durationMs: 10142.694018, which is just over a hard-coded 10 second -bblock timeout and a related in-progress PR:

@0xpatrickdev
Copy link
Member

This started to pass in CI so skipping until this ticket is prioritized:
118bc6a

@anilhelvaci
Copy link
Collaborator

anilhelvaci commented Dec 25, 2024

Investigation

I do have reason to believe the flake we're running into is related to the batched execution fee charged on gov1. I can't be 100% sure (yet) because of the following reasons;

  • In the logs belonging to when the test was failing, I cannot see IST balance of gov1 before sending the exit offer.

    t.log('istBalanceBefore', istBalanceBefore);

  • As far as I can tell slog files do not include beans charged per account so I'm not able to observe gov1's beans being charged.
    Question: Shouldn't gov1's purse be notified after execution fee is deducted from gov1's IST balance?
    The natural answer to this is "yes" however I cannot observe any messages going into vat-bank before the offer is picked up. Here's an example image to the slog replay I did:

    image

    I expect this IST deduction's notification to arrive as an inbound message to vat10-bridge. However, I reckon this is not what we are seeing because, ChargeBeans method of swingset/keeper uses bank module's SendCoinsFromAccountToModule instead of its vbank equivalent.

    if !feeCoins.IsZero() {
    err := k.bankKeeper.SendCoinsFromAccountToModule(ctx, addr, k.feeCollectorName, feeCoins)
    if err != nil {
    return err
    }
    }

    Thoughts? @michaelfig

Bug Description Breakdown

Below error message is NOT related to wallet test

    Error: error in json rpc client, with http response metadata: (Status: 200 OK, Protocol HTTP/1.1). RPC error -32603 - Internal error: timed out waiting for tx to be included in a block

The failing test in https://github.com/Agoric/agoric-sdk/actions/runs/11982434558/job/33413400559?pr=10530#step:10:4370 is named send invitation via namesByAddress and not exitOffer tool reclaims stuck payment. The failure is due to a -bblock time out which there's an on going work on it in agoric-labs/cosmos-sdk#423.

Last two pieces of logs IS related to wallet test

When looking at the condition in which retryUntilCondition operates accordingly (in the subject test), we can see that the IST balances are checked using regular > operator instead of tryIstBalances we built to avoid flakes rise from charging IST fees.

Solution

There's no practical way to reproduce my theory and verify it's indeed the case because;

  • The factors that effect accrued IST fee vary a lot. They include
    DefaultBeansPerFeeUnit = sdk.NewUint(1_000_000_000_000) // $1
    DefaultBeansPerInboundTx = DefaultBeansPerFeeUnit.Quo(sdk.NewUint(100)) // $0.01
    DefaultBeansPerMessage = DefaultBeansPerFeeUnit.Quo(sdk.NewUint(1_000)) // $0.001
    DefaultBeansPerMessageByte = DefaultBeansPerFeeUnit.Quo(sdk.NewUint(50_000)) // $0.00002
    DefaultBeansPerMinFeeDebit = DefaultBeansPerFeeUnit.Quo(sdk.NewUint(5)) // $0.2
    DefaultBeansPerStorageByte = DefaultBeansPerFeeUnit.Quo(sdk.NewUint(500)) // $0.002
    DefaultBeansPerSmartWalletProvision = DefaultBeansPerFeeUnit // $1
  • According to my theory, the flake we see above is introduced in a state where;
    • There was already an amount of IST fees accrued for gov1 that is close to 0.2 IST but not greater
    • Fees arised from tryExitOffer made the already accrued fees go beyond 0.2 IST so 0.2 IST is deducted from gov1's IST balance
    • Since the amount of IST reclaimed by tryExitOffer is 0,015 IST (15_000 uist), the below condition is never true;
      istBalance => istBalance > istBalanceBefore,

It's pretty difficult to re-create the state described above. So I suggest we get rid of all instances where IST balance is not checked using tryIstBalances. I believe this would remove the flake we see here and others that we'll possibly see as the number of tests increase. I've created #10774 to track this. And the flake in subject here is going to be included as a work item over there.

@dckc
Copy link
Member Author

dckc commented Dec 28, 2024

Shouldn't gov1's purse be notified after execution fee is deducted from gov1's IST balance?

No.

See https://docs.agoric.com/guides/getting-started/contract-rpc.html#vbank-assets-and-cosmos-bank-balances

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working flake flakey test test
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants