Change implementation of ticking in the HFC to tick-then-translate-then-tick #339

amesgen · 2023-09-06T13:39:12Z

Status quo

Currently, when ticking a ledger state across an era boundary in the HFC, we use the translate-then-tick strategy.

Current implementation details

HFC ticking is implemented here:
https://github.com/input-output-hk/ouroboros-consensus/blob/0ca9ca08f41b04619dc9ed1df692102ef9ba3c0e/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/HardFork/Combinator/Ledger.hs#L121-L154

Note that we first translate in extendToSlot if the target slot is in a new era:
https://github.com/input-output-hk/ouroboros-consensus/blob/0ca9ca08f41b04619dc9ed1df692102ef9ba3c0e/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/HardFork/Combinator/State.hs#L212
https://github.com/input-output-hk/ouroboros-consensus/blob/0ca9ca08f41b04619dc9ed1df692102ef9ba3c0e/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/HardFork/Combinator/State.hs#L226-L228

This means that when we tick a ledger state s in era FromEra (which arose by applying a block with slot x) to a slot y in the next era ToEra, we

first translate the FromEra ledger state s to a ToEra ledger state s' (s' is now a ToEra ledger state, even though its tip slot x hasn't moved, so it is still before the era boundary), and
then tick s' to slot y,

yielding the ledger state s''' we can use to validate blocks in slot y.

In the following diagram, this means that we start at s, first go right and then down.

╔══════════════╦═════════════════════════════════════╗
║ Time \ Era   ║ FromEra ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌⇢ ToEra ║
╠══════════════╬═════════════════════════════════════╣
║  Start - x   ║     s ──── translate ────────→ s'   ║
║      ┊       ║     │╲                         │    ║
║      ┊       ║  tick ╲tick                    tick ║
║   Boundary ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ ║
║      ┊       ║     │   s' ── translate ─→ s'' │    ║
║      ┊       ║     │                      ╲   │    ║
║      ┊       ║     │                   tick╲  │    ║
║      ┊       ║     │                        ╲ │    ║
║      ⇣       ║     ↓                         ↘↓    ║
║  Target - y  ║     s' ─── translate ────────→ s''' ║
╚══════════════╩═════════════════════════════════════╝

Why the status quo is problematic

The current behavior is causing IntersectMBO/cardano-ledger#3491. To see why, we need some context on how Ledger handles updates to the on-chain protocol parameters before Voltaire (ie in all Shelley-based eras before Conway):

At every epoch boundary, the protocol parameters can change if sufficiently many Genesis keys submit the same update proposal.
This logic is handled by the UPEC rule, which is executed when eg ticking.

This mechanism changed completely in Conway; in particular, there is no direct analogue to update proposals signed by Genesis keys. This means that all such proposals are discarded when translating a Babbage ledger state to a Conway ledger state.

Hence, as we currently first translate and then tick, the Conway logic responsible for ticking has no way of knowing that it should update the protocol version (or any other updateable parameter).
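To make the failure mode concrete, here is a minimal toy model of translate-then-tick. It is only a sketch: the record types, field names and slot numbers are invented for illustration and do not correspond to the real HFC or Ledger code. `OldEraState` stands in for a Babbage ledger state carrying a Genesis-key update proposal, and `NewEraState` for a Conway ledger state that has no place to store such a proposal.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class OldEraState:
    """Toy stand-in for a Babbage ledger state (hypothetical fields)."""
    tip: int
    major_version: int
    proposed_version: Optional[int]  # pending Genesis-key update proposal


@dataclass(frozen=True)
class NewEraState:
    """Toy stand-in for a Conway ledger state: no field for old-style proposals."""
    tip: int
    major_version: int


def translate(s: OldEraState) -> NewEraState:
    # Old-style proposals have no analogue in the new era, so they are dropped.
    return NewEraState(tip=s.tip, major_version=s.major_version)


def tick_new(s: NewEraState, slot: int) -> NewEraState:
    # New-era ticking cannot adopt a proposal it never saw.
    return replace(s, tip=slot)


def translate_then_tick(s: OldEraState, slot: int) -> NewEraState:
    return tick_new(translate(s), slot)


# Tip at slot 18, target slot 23 in the new era, proposal to move from 8 to 9.
s = OldEraState(tip=18, major_version=8, proposed_version=9)
result = translate_then_tick(s, 23)
print(result.major_version)  # 8: the proposed bump to 9 was lost in translation
```

In this toy the proposal is silently discarded by `translate`, so the ticked state still reports major version 8, mirroring the symptom described in IntersectMBO/cardano-ledger#3491.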

Proposed change

In this issue, we suggest changing the HFC cross-era ticking to the tick-then-translate-then-tick approach, ie first tick to the epoch boundary, then translate the ledger state, and then tick to the requested slot.

In more detail: In order to tick a ledger state s (with tip x) in era FromEra to a slot y in the next era ToEra, we

first tick the ledger state s to the first slot in ToEra (ie just across the era boundary), yielding a ledger state s' that is still a FromEra ledger state,
then translate the FromEra ledger state s' to a ToEra ledger state s'', and
finally tick the ledger state s'' to slot y

again yielding the desired ledger state s'''.

In the diagram above, we again start at s, tick by moving diagonally to s', then translate by moving horizontally to s'', and finally tick again by moving diagonally to s'''.

This way, we use the FromEra logic to tick across the era/epoch boundary, which can then execute the UPEC rule in the example above.
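The three steps can be sketched with a small toy model. Everything here is an assumption made for illustration: the types, the `BOUNDARY` constant and the simplified "adopt the proposal when crossing the boundary" rule (a stand-in for UPEC) are not taken from the actual HFC or Ledger implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional

BOUNDARY = 20  # hypothetical first slot of ToEra, which is also an epoch boundary


@dataclass(frozen=True)
class FromEraState:
    tip: int
    major_version: int
    proposed_version: Optional[int]  # pending update proposal (toy)


@dataclass(frozen=True)
class ToEraState:
    tip: int
    major_version: int


def tick_from(s: FromEraState, slot: int) -> FromEraState:
    # Toy UPEC: adopt a pending proposal when the tick crosses the boundary.
    if s.tip < BOUNDARY <= slot and s.proposed_version is not None:
        s = replace(s, major_version=s.proposed_version, proposed_version=None)
    return replace(s, tip=slot)


def translate(s: FromEraState) -> ToEraState:
    # Any remaining old-style proposal is dropped, as in the status quo.
    return ToEraState(tip=s.tip, major_version=s.major_version)


def tick_to(s: ToEraState, slot: int) -> ToEraState:
    return replace(s, tip=slot)


def tick_translate_tick(s: FromEraState, slot: int) -> ToEraState:
    s1 = tick_from(s, BOUNDARY)  # 1. tick across the boundary with FromEra logic
    s2 = translate(s1)           # 2. translate the already-ticked state
    return tick_to(s2, slot)     # 3. tick the rest of the way with ToEra logic


s = FromEraState(tip=18, major_version=8, proposed_version=9)
result = tick_translate_tick(s, 23)
print(result.major_version)  # 9: the proposal was adopted before translating
```

Because the boundary is crossed while the state is still in FromEra, the toy update rule sees the proposal and applies it, and the translation only has to carry over the already-updated version.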

Slogan: Cross-era ticking is fundamentally something that happens at the end of an era, so it should be done using the logic of that ending era.

Alternatives

We considered the following alternatives:

Use the tick-then-translate approach, ie tick directly to the target slot, and then translate.

While this would probably work fine with existing and likely also with future eras, it seems wrong to use the ticking logic of the old era to let time pass across slots that lie purely within the new era. Also, we again have (just before translating) a ledger state in the "wrong" era as with the current translate-then-tick approach.

Don't change the HFC logic at all, but rather introduce an ad-hoc field in the Conway ledger state that records the Babbage update proposals, such that they can be preserved by translating.

This seems quite ugly, ie this field would only be present in the first Voltaire era, and it would require ledger rule logic that is purely related to era transitions, which is something that the ledger rules usually do not have to handle explicitly.

Remarks

Note that the HFC mechanism that detects whether we should transition in the first place (see singleEraTransition) is not affected by this bug; we still properly transition from Babbage to Conway.


nfrisby · 2023-09-09T16:12:28Z

A hypothetical scenario that would conflict with tick-then-translate:

Suppose the first epoch (or every, but at least the first) of an era involves some big calculation that is incrementalized by the ledger rules. If we use the previous era's rules to tick across some of the first slots of that era, then the new era's incrementalized calculation wouldn't happen in those slots, unless the translation function anticipated this and therefore also invoked that aspect of the ticking logic.

A comparable criticism can be made against tick-translate-tick, but I think it requires more niche assumptions.

Suppose some new era expects that its epoch-transition logic is invoked at the very beginning of the era. If we only tick across that epoch transition using the previous era's rules, then this logic won't be invoked, unless the translation function anticipates this and explicitly duplicates/invokes that desired epoch-transition logic.
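A toy sketch of this second scenario, under purely hypothetical assumptions: the new era has an epoch-transition step (`to_era_epoch_transition`) that it expects to run at its very first epoch boundary, and only that single boundary is modelled. None of these names exist in the real code base.

```python
from dataclasses import dataclass, replace

BOUNDARY = 20  # hypothetical first slot of ToEra (the only boundary modelled)


@dataclass(frozen=True)
class FromEraState:
    tip: int


@dataclass(frozen=True)
class ToEraState:
    tip: int
    era_start_logic_ran: bool = False


def tick_from(s: FromEraState, slot: int) -> FromEraState:
    return replace(s, tip=slot)


def to_era_epoch_transition(s: ToEraState) -> ToEraState:
    # Hypothetical logic that ToEra expects to run at the start of the era.
    return replace(s, era_start_logic_ran=True)


def tick_to(s: ToEraState, slot: int) -> ToEraState:
    # ToEra only runs its epoch transition when the tick crosses the boundary.
    if s.tip < BOUNDARY <= slot:
        s = to_era_epoch_transition(s)
    return replace(s, tip=slot)


def translate(s: FromEraState, run_era_start_logic: bool = False) -> ToEraState:
    t = ToEraState(tip=s.tip)
    # A translation that "anticipates" the issue invokes the logic itself.
    return to_era_epoch_transition(t) if run_era_start_logic else t


def translate_then_tick(s: FromEraState, slot: int) -> ToEraState:
    return tick_to(translate(s), slot)


def tick_translate_tick(s: FromEraState, slot: int,
                        run_era_start_logic: bool = False) -> ToEraState:
    return tick_to(translate(tick_from(s, BOUNDARY), run_era_start_logic), slot)


s = FromEraState(tip=18)
status_quo = translate_then_tick(s, 23)
proposed = tick_translate_tick(s, 23)
patched = tick_translate_tick(s, 23, run_era_start_logic=True)
print(status_quo.era_start_logic_ran, proposed.era_start_logic_ran,
      patched.era_start_logic_ran)
```

In this toy, translate-then-tick runs the new era's logic, tick-translate-tick skips it because the boundary was already crossed under the old rules, and the variant whose translation invokes the logic explicitly restores it.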

I claimed that is more niche, since, as far as I know, all of the epoch-transition logic in the existing Cardano eras would have no effect if it were executed at the epoch-transition that demarcates the beginning of that era.

Relatedly, it's notable that the initial Byron ledger state considers the first slot to be the "latest slot" (... there is assumed to be an EBB in it :/) and the first epoch to be the "current epoch". So, at least for the Cardano eras so far, it seems all epoch-transition logic is assumed to happen only at the end of an epoch, which is compatible with the proposed tick-translate-tick semantics (and any desired exceptions to that assumption could be snuck into the translation functions). See here.

It's also notable that our intuitive rule for when the HFC actually transitions to the next era is whenever a Cardano era's rules increase the major protocol version. Those governance rules have typically been considered to happen at the end of the epoch. Though once you think carefully, it seems like that moment cannot actually be distinguished from the beginning of the next epoch, until you consider an era transition that has an entirely different governance structure (such as Byron-to-Shelley or Babbage-to-Conway)!


amesgen · 2023-10-11T16:48:31Z

This approach is (almost, one commit needs squashing) fully implemented in #340, and solves the motivating issue as described above.

However, we have since decided that we do not currently want to commit to a concrete scheme for how the HFC will handle cross-era ticking, at least until we have figured out #345 and #418.

@lehins

Reverting the hacky approach of #366.

Closes #1239 by superseding it. Addresses IntersectMBO/cardano-ledger#4635 (comment).

# Justifying backwards-compatibility

This PR touches the Cardano ledger rules, concretely the logic for translating a Babbage ledger state to a Conway ledger state. As the Conway HF already happened on mainnet, it is crucial to argue why this change retains backwards-compatibility with the historical chain.

## TL;DR

- The original reason for #366 was resolved by the refactoring in IntersectMBO/cardano-ledger#4253, making the hack here in Consensus unnecessary.
- The accidental side effects of #366 around pointer addresses were made "official" in IntersectMBO/cardano-ledger#4647.

Therefore, it is fine to revert #366 without replacement.

## Detailed overview

### Context on HFC ledger ticking

When the HFC ticks a ledger state across an era boundary from A to B, it does so via the "translate-then-tick" scheme:

1. First, the ledger state in A is translated into a ledger state in B.
2. Second, the ledger state is ticked to the target slot across the epoch/era boundary, using the logic of era B.

For Cardano, the logic for these two operations lives in Ledger, or rather, it *should* live in Ledger. However, in #366, we introduced a temporary workaround/hack by modifying the translation logic from Babbage to Conway to resolve IntersectMBO/cardano-ledger#3491. This PR reverts the hack, such that we now again directly/transparently call Ledger logic.

### Chronology of changes to Babbage→Conway ticking

1. Mainnet era transitions are triggered by on-chain updates to the major ledger protocol version. The logic for updating the ledger protocol version lives, unsurprisingly, in the Ledger, and takes place while ticking across an epoch boundary.

   For `cardano-ledger-conway < 1.14` (that's significantly before any version used in a node that was mainnet-ready for Conway), this logic was broken on the era transition from Babbage to Conway, resulting in IntersectMBO/cardano-ledger#3491, ie the protocol version was *not* updated. Briefly[^1], the reason was that the governance schemes of Babbage and Conway are completely different. This caused issues because, as mentioned above, ticking across the Babbage→Conway era/epoch boundary uses the logic of Conway, which doesn't understand Babbage governance proposals, so these were discarded during the translation step.

2. The Consensus team decided[^2] to fix this issue via #366, which updates the protocol version during the Babbage→Conway translation step in an ad-hoc fashion, by temporarily ticking the *Babbage* ledger state across the epoch/era boundary (yielding another Babbage ledger state), and then setting the `GovState` (an era-specific ledger concept deep in the ledger state, which in particular contains the current protocol parameters, and hence the protocol version) of the unticked Babbage ledger state to the one of the ticked Babbage ledger state, and then proceeding as before.

   Concretely, Babbage→Conway ticking now worked like this, starting with a Babbage ledger state `l0` and a target slot `s`.

   1. Tick `l0` just across the era/epoch boundary to get `l1` (a Babbage ledger state).
   2. Set the governance state of `l0` to the one of `l1` and get `l2` (a Babbage ledger state).
   3. Translate `l2` into a Conway ledger state `l3`.
   4. Tick `l3` to `s` to get the final result.

3. A few months later, for `cardano-ledger-conway-1.14`, @lehins changed in IntersectMBO/cardano-ledger#4253 the way protocol parameters are updated in Ledger, in a way that is nicely compatible with the "translate-then-tick" scheme, see the [ADR](https://github.com/IntersectMBO/cardano-ledger/blob/a02dc6eae44287e8a1ac67ffafb8a1ecc492128f/docs/adr/2024-04-30_008-pparams-update.md) added in that PR for details[^3].

   In particular, this would have allowed us to revert #366 immediately, but we didn't do so, probably because we saw no immediate motivation. (In retrospect, we should have done that immediately.)

4. A few months later, the Conway HF happened on mainnet. Due to investigating an unrelated serialization bug around pointer addresses (IntersectMBO/cardano-ledger#4589), I realized that not reverting #366 actually caused a slight difference in the ledger rules, namely regarding stake delegations from pointer addresses (also see IntersectMBO/cardano-ledger#4635 (comment)).

   Concretely, Ledger wants to get rid of pointer addresses as they are considered to be a misfeature and a potential liability for future projects like Leios (also see [this ADR](https://github.com/IntersectMBO/cardano-ledger/blob/master/docs/adr/2022-12-05_005-remove-ptr-addresses.md)). In Conway, stake delegations from pointer addresses are intentionally no longer considered. In particular, this happens during the SNAP rule while ticking, by invoking the [`forgoPointerAddressResolution`](https://github.com/IntersectMBO/cardano-ledger/blob/a02dc6eae44287e8a1ac67ffafb8a1ecc492128f/eras/shelley/impl/src/Cardano/Ledger/Shelley/HardForks.hs#L67) predicate on the current protocol version, branching on whether the current major protocol version is larger than `8` (the last Babbage major protocol version).

   - Using cardano-node 9.1 (i.e. the node that everyone was on to go to Conway), so with #366: When ticking the translated Conway ledger state into Conway, the current protocol version is `9` (the first Conway major protocol version), due to the earlier ad-hoc patching of the `GovState` as part of the workaround from #366. Therefore, pointer addresses are *not* resolved while updating the stake distribution.
   - If we had reverted #366 for cardano-node 9.1: Because we directly translate the Babbage ledger state to Conway without doing the `GovState` patching before, the current protocol version while ticking is `8`, so pointer addresses *are* resolved.

   Altogether, the stake distribution used for the leader schedule starting in the second Conway epoch would have differed slightly (only very little stake, exactly 100 ADA, has been delegated via pointer addresses).

   Crucially, this difference had a chance to occur only because Ledger did *not* blank e.g. the `ptrMap` field in `IncrementalStake` during the Babbage→Conway translation. (This is actually what caused the serialization bug mentioned above.)

   There would have been another, less relevant difference: Because the current protocol parameters are updated *twice* with #366 (first during the Babbage tick, and then again during the Conway tick), the *previous* protocol parameters during the first Conway epoch are incorrectly equal to the *current* protocol parameters. However, the previous protocol parameters are only used for reward calculation, and reward calculation doesn't care whether the major protocol version is `8` or `9`. So this difference doesn't matter.

5. In a recent Ledger PR IntersectMBO/cardano-ledger#4647, @lehins modified the Babbage→Conway translation logic to blank out the pointer addresses, e.g. `ptrMap` in `IncrementalStake`. This change landed in Node 10.0.

   Therefore, the difference described in 4. does not matter anymore, as there no longer are any pointer addresses to resolve in Conway when ticking (which happens *after* translating).

   Crucially, this enables us to now revert #366 without replacement, because both before and after, no pointer addresses are resolved for the stake distribution while ticking from Babbage to Conway.

### Testing

I tested this on mainnet by starting from a Babbage ledger state and evolving it via `db-analyser` to the first ledger state (slot `134092810`) in the second Conway epoch using full block validation, both with and without this PR. The resulting ledger states are identical.

In the first Conway epoch, the ledger states differ, but only trivially in the previous protocol parameters, which has no effect as explained above.

We *could* also write a component-level test for the pointer address aspect, but that does not necessarily seem worth the cost/subtlety, as this is a legacy feature already.

### Concluding thoughts

Generally, I think what we should take away from this is that we *really* need proper specification and testing of what exactly should happen at era boundaries, see #418 and IntersectMBO/cardano-ledger#4635, especially because certain esoteric parts of the ledger state (like pointer addresses) might not exist on any testnet.

[^1]: See "Why the status quo is problematic" in #339 for the details (but ignore the rest of the issue).
[^2]: After a long process that considered/prototyped various alternatives, but the details are not that relevant for this PR and the PR description is already very long.
[^3]: Briefly, the logic that updates the protocol parameters on cross-epoch ticking is no longer era-dependent; rather, it just sets the protocol parameters to "future" ones that were decided on earlier by era-specific logic. The insight is that this set of future protocol parameters can be easily/cleanly translated from Babbage to Conway, and the Conway ticking logic can apply them despite having no idea *how* Babbage decided that these should be the next protocol parameters.
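The pointer-address difference from item 4 of the chronology, and why blanking the pointers in IntersectMBO/cardano-ledger#4647 makes it disappear, can be illustrated with a toy model. This is a sketch under loud assumptions: the types, field names, the stake amount and the ordering inside `tick_conway` (toy SNAP step first, then adoption of the "future" version) are invented for illustration, and `forgo_pointer_resolution` is only a simplified stand-in for the real `forgoPointerAddressResolution` predicate.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Babbage:
    major: int             # current major protocol version
    future: Optional[int]  # version to adopt at the next epoch boundary (toy)
    ptr_stake: int         # stake delegated via pointer addresses (toy units)


@dataclass(frozen=True)
class Conway:
    major: int
    future: Optional[int]
    ptr_stake: int
    counted_ptr_stake: Optional[int] = None  # filled in by the toy SNAP step


def forgo_pointer_resolution(major: int) -> bool:
    # Stand-in for the predicate: pointers are ignored above major version 8.
    return major > 8


def tick_babbage(s: Babbage) -> Babbage:
    return replace(s, major=s.future if s.future is not None else s.major,
                   future=None)


def translate(s: Babbage, blank_pointers: bool = False) -> Conway:
    return Conway(s.major, s.future, 0 if blank_pointers else s.ptr_stake)


def tick_conway(s: Conway) -> Conway:
    # Toy SNAP: decided by the version that is current when the tick starts.
    counted = 0 if forgo_pointer_resolution(s.major) else s.ptr_stake
    major = s.future if s.future is not None else s.major
    return replace(s, major=major, future=None, counted_ptr_stake=counted)


def with_hack(l0: Babbage, blank_pointers: bool = False) -> Conway:
    l1 = tick_babbage(l0)                                # tick across the boundary
    l2 = replace(l0, major=l1.major, future=l1.future)   # copy the "gov state"
    l3 = translate(l2, blank_pointers)
    return tick_conway(l3)


def without_hack(l0: Babbage, blank_pointers: bool = False) -> Conway:
    return tick_conway(translate(l0, blank_pointers))


l0 = Babbage(major=8, future=9, ptr_stake=5)  # 5 is an arbitrary toy amount
print(with_hack(l0).counted_ptr_stake, without_hack(l0).counted_ptr_stake)
print(with_hack(l0, True) == without_hack(l0, True))
```

In this toy, both paths end at major version 9, but they disagree on whether pointer-delegated stake is counted unless the translation blanks the pointers, in which case the two results coincide.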

amesgen self-assigned this Sep 6, 2023

teodanciu mentioned this issue Sep 7, 2023

Cardano-cli reports protocol version 8 in Conway era IntersectMBO/cardano-ledger#3491

Closed

amesgen mentioned this issue Sep 7, 2023

First tick across the era boundary when translating the ledger/chain-dep state #340

Closed

2 tasks

nfrisby mentioned this issue Sep 14, 2023

[FEAT] - Generalize how the HFC handles era transitions #345

Open

amesgen added a commit that referenced this issue Sep 15, 2023

HFC ledger cross-era ticking: use tick-translate-tick strategy

de85c93

See #339 for explanation/motivation.

amesgen added a commit that referenced this issue Sep 18, 2023

HFC ledger cross-era ticking: use tick-translate-tick strategy

4f30c56

See #339 for explanation/motivation.

amesgen added a commit that referenced this issue Sep 19, 2023

HFC ledger cross-era ticking: use tick-translate-tick strategy

3fdf4a2

See #339 for explanation/motivation.

amesgen added a commit that referenced this issue Sep 20, 2023

HFC ledger cross-era ticking: use tick-translate-tick strategy

eb989c0

See #339 for explanation/motivation.

amesgen added a commit that referenced this issue Sep 21, 2023

HFC ledger cross-era ticking: use tick-translate-tick strategy

5c3a366

See #339 for explanation/motivation.

amesgen mentioned this issue Sep 22, 2023

Babbage->Conway ledger state ticking: use tick-then-translate #361

Closed

amesgen mentioned this issue Oct 11, 2023

Specify cross-era ticking/forecasting for Cardano #418

Open

amesgen closed this as completed Oct 11, 2023

amesgen mentioned this issue Oct 31, 2024

Revert hack regarding Babbage→Conway ledger state translation #1297

Merged
