Upstream v2.59.3 #166

Merged
merged 192 commits into op-erigon on Apr 29, 2024

Conversation

ImTei
Member

@ImTei ImTei commented Apr 26, 2024

No description provided.

yperbasis and others added 30 commits February 12, 2024 12:37
The txpool's defaults weren't appearing in the `ethConfig`, leading to
hive test failures and (potentially) lots of nodes facing issues
* Minified Beacon Blocks
* Added sparse branching Diffs
* Removed kv.CurrentEpochAttestations and kv.PreviousEpochAttestations
* Never misses a single block on gossip (improved network)
* Simplified blocks insertion
* Added bidirectional balance links.
* Bugs fixed and overall stability improved.
* Mainnet Archive node size: 144 GB
* States reconstruction time: 1 Day 16 Hours
Added the prebuffering technique to save incoming blocks and avoid
ForwardSync
Since the merge of erigontech#7936,
released in version `v2.49.0` and onward, Erigon has been unable to
properly download and process historical Ethereum POW blocks. This makes
a mainnet sync from block 0 with snapshotting disabled
(`--snapshots=false`) impossible.

I discovered this issue while working on the Erigon-Pulse fork for
PulseChain, but the issue remains here, and has been verified with a
vanilla version of Erigon v2.57.3 against Ethereum Mainnet.

## Detail
When doing a full sync from block zero with snapshotting disabled, the
following error will occur:

```log
[INFO] [02-08|09:06:11.859] [3/12 Senders] DONE                      in=7h55m24.504489889s
[INFO] [02-08|09:06:11.861] [4/12 Execution] Blocks execution        from=0 to=19180016
[WARN] [02-08|09:06:12.599] [4/12 Execution] Execution failed        block=46274 hash=0x8252137084baebea389ddb826c4e7a63cbef2effc0d6526a5394377e5e8dada0 err="invalid block: could not apply tx 0 from block 46274 [0x669890439f5093b7dfbca9b06de1d6da78ceacf1ec0753252b3e6cf10ff81a3e]: insufficient funds for gas * price + value: address 0xdC49b9dc1AbB93Ebb856547a1C0578D65f4A32DE have 15000000000000000000 want 15024629952223800000"
[INFO] [02-08|09:06:12.602] [4/12 Execution] Completed on            block=46273
```

This failure is deterministic and the process will revert back to the
beginning of the `[3/12 Senders]` stage. After reprocessing senders, it
will eventually fail execution on the same transaction every time.

Looking into this issue, I discovered that the balance for the sender at
the parent block `46273` was too low. The sender should in fact have
`15156250000000000000`, enough to cover the transaction costs shown in
the error message.

Digging further into the history of this account, I found that the
balance was too low because uncle rewards for an earlier mined block had
not been properly awarded. The `15 ETH` in the chain state was the
result of 3 blocks previously mined by the account, but the uncle reward
of `0.15625 ETH` had not been awarded for block `2839`.

**This led me to discover the underlying issue: uncles are being dropped
entirely in the new downloader:**
1. Uncles aren't being passed from the engineapi block downloader (nil
is passed).
2. Uncles aren't being considered when mapping from native types to the
execution engine's GRPC types.
3. Uncles aren't being considered when mapping back from the execution
engine's GRPC types to the native types (see the sketch after this list).
4. The header stage is lacking an implementation for `GetBlock()`, which
is required for uncle verification. After passing uncles through
properly, the panic() here was triggered.
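
To make the shape of the fix concrete, here is a minimal, hypothetical sketch of points 2 and 3: the conversion between the native block body and the execution engine's GRPC body has to carry the uncle headers in both directions instead of leaving them nil. The type and function names below are placeholders, not Erigon's actual API.

```go
// Hypothetical illustration only: placeholder types standing in for Erigon's
// native block body and the execution engine's GRPC representation.
package conv

type Header struct {
	ParentHash [32]byte
	Number     uint64
	// ... remaining header fields elided
}

type BlockBody struct {
	Transactions [][]byte
	Uncles       []*Header
}

type GrpcBody struct {
	Transactions [][]byte
	Uncles       []*Header // previously this field was never populated
}

// toGrpc maps a native body to the GRPC representation, including uncles.
func toGrpc(b *BlockBody) *GrpcBody {
	return &GrpcBody{
		Transactions: b.Transactions,
		Uncles:       b.Uncles, // the bug: this used to be left nil
	}
}

// fromGrpc maps back, again preserving uncles so uncle rewards get applied.
func fromGrpc(g *GrpcBody) *BlockBody {
	return &BlockBody{
		Transactions: g.Transactions,
		Uncles:       g.Uncles,
	}
}
```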

It's a little surprising this has gone overlooked for so long, but I
suspect that almost nobody is doing a full sync w/o snapshots. It is
painfully slow after all.

> *Note that because the issue originates in the downloader, once the
execution failure has been encountered, the entire db must be deleted
and a fresh sync started. Erigon cannot recover from the downloaded
blocks being inserted into the DB w/ missing uncles.*

## Fixed

This PR adds 3 commits fixing the issue. The first adds a failing test,
followed by the missing-uncles fix, which also resolves that test.
The final commit adds the missing implementations in the header stage.

This fix has been verified with a fresh sync.
* Improves Concurrency
* Simplifies Forkchoice
* Fixes ContributionAndProof
A test that generates an input file,
compresses it using both erigon and silkworm,
and checks that the results match.
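
A rough sketch of what such a cross-implementation test can look like; `compressWithErigon` and `compressWithSilkworm` are placeholders for the real compressors, not actual APIs.

```go
// Hypothetical sketch: generate a deterministic input, compress it with both
// implementations, and require byte-identical output.
package compress_test

import (
	"bytes"
	"math/rand"
	"testing"
)

// Placeholders standing in for the erigon and silkworm compressors.
func compressWithErigon(in []byte) []byte   { return in }
func compressWithSilkworm(in []byte) []byte { return in }

func TestCompressorsMatch(t *testing.T) {
	// Deterministic pseudo-random input so failures are reproducible.
	input := make([]byte, 1<<20)
	rand.New(rand.NewSource(42)).Read(input)

	a := compressWithErigon(input)
	b := compressWithSilkworm(input)
	if !bytes.Equal(a, b) {
		t.Fatalf("compressed outputs differ: erigon=%d bytes, silkworm=%d bytes", len(a), len(b))
	}
}
```
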
This change increases the memory required from 4 GB to 10 GB but makes the
Beacon API consistently performant. By default the Beacon API is not
enabled, so this memory will not be allocated on a typical run.
…ontech#9459)

Updated README.md with link to official documentation on Gitbook
erigontech#9458)

* Fixed small hiccup in attestations deltas
* Added flags to caplin to determine which endpoint to expose.
* Fused ValidatorAPI <-> APIHandler
* Added benchmarking tool for latency testing
* Fixed crash in attestations/rewards handler
* Handling possible corruption of canonical chain on interrupt
This PR adds the `BlobSidecar` and `BlobIdentifier` data structures and
their storage (a simplified sketch follows the change list).

List of changes:
- Add `BlobSidecar` and `BlobIdentifier`
- SSZ (Simple Serialize) Marshaling/Unmarshaling
- Make it compatible with cl spectests. 
- Blob Storage `Write`,  `Read`, and `retrieveBlobsAndProofs(WIP)` 
- Simple unit tests
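
For orientation, a simplified sketch of the two structures as defined by the Deneb consensus spec; Erigon's actual Go types carry SSZ plumbing and use its own primitives, so treat this purely as an illustration.

```go
// Simplified illustration of the Deneb spec containers, not Erigon's types.
package cltypes

const BytesPerBlob = 131072 // 4096 field elements * 32 bytes each

// BlobIdentifier addresses a blob by block root and index.
type BlobIdentifier struct {
	BlockRoot [32]byte
	Index     uint64
}

// BlobSidecar carries the blob plus the proofs needed to verify it against
// the signed block header.
type BlobSidecar struct {
	Index                    uint64
	Blob                     [BytesPerBlob]byte
	KzgCommitment            [48]byte
	KzgProof                 [48]byte
	SignedBlockHeader        SignedBeaconBlockHeader
	CommitmentInclusionProof [17][32]byte // Merkle proof of the commitment in the block body
}

type BeaconBlockHeader struct {
	Slot          uint64
	ProposerIndex uint64
	ParentRoot    [32]byte
	StateRoot     [32]byte
	BodyRoot      [32]byte
}

type SignedBeaconBlockHeader struct {
	Header    BeaconBlockHeader
	Signature [96]byte
}
```
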
Downloader relies on a single writer: HeadersWriter.
* Added `blob_sidecar` resp/req support
* Added `blob_sidecar` and `subnet` to gossip
The method was iterating over snapshots.segments.Min().segments, but
passing the index to view.Segments().

This might lead to a crash, because those 2 collections are different,
and the indexes might not match.

view.Segments() only contains the snapshot files that are fully
downloaded and indexed.

The code is changed to only use the view files (as it was before).
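
A hypothetical sketch of the difference; the names are placeholders rather than Erigon's actual snapshot types.

```go
// Hypothetical sketch of the fix: iterate the view's own segment slice
// instead of indexing it with positions taken from a different, larger slice.
package snaps

type Segment struct{ From, To uint64 }

// View only contains snapshot files that are fully downloaded and indexed.
type View struct{ segments []*Segment }

func (v *View) Segments() []*Segment { return v.segments }

// Before (simplified): indexes came from another collection and could exceed
// len(v.Segments()), causing an out-of-range crash:
//
//	for i := range allKnownSegments {
//	    seg := v.Segments()[i] // may panic
//	    ...
//	}

// After: only the view's own files are used.
func forEachViewSegment(v *View, fn func(*Segment) error) error {
	for _, seg := range v.Segments() {
		if err := fn(seg); err != nil {
			return err
		}
	}
	return nil
}
```
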
yperbasis and others added 27 commits March 15, 2024 14:02
This removes a never-ending loop that occurs when the downloader db is removed
and the web downloader is not present
# Caplin: Attestation Production Design

When we are required to produce an attestation data object, we first
assess:

* What epoch is it from?
* Is it from a future epoch?
* Is it from a past epoch?

This `AttestationData` object changes only across epochs: it is
extremely cheap to produce `AttestationData` for the current epoch, but
extremely expensive for past and future epochs.

### Current Epoch case

```go
return solid.NewAttestionDataFromParameters(
		slot,                                  // slot requested
		committeeIndex,                        // committee_index request
		headBlockRoot,                         // current head block root
		baseState.CurrentJustifiedCheckpoint(),// current justified checkpoint
		solid.NewCheckpointFromParameters(     // finalization vote
			targetRoot,
			targetEpoch,
		),
	), nil
```

This runs in about 90 microseconds on an M1 machine, so no problem here.

### Past Epoch case

There is no past-epoch algorithm for attestation production; instead, we
keep previously computed `AttestationData` objects in an internal cache
so that we can simply reuse them, as sketched below.
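
A minimal sketch of what such a cache can look like, assuming a key of (slot, committee index); the types and method names are illustrative, not Caplin's actual implementation.

```go
// Illustrative cache of previously produced AttestationData.
package attprod

import "sync"

type AttestationData struct {
	Slot           uint64
	CommitteeIndex uint64
	// ... beacon block root, source and target checkpoints elided
}

type attKey struct {
	slot           uint64
	committeeIndex uint64
}

type attestationCache struct {
	mu    sync.RWMutex
	items map[attKey]*AttestationData
}

func newAttestationCache() *attestationCache {
	return &attestationCache{items: map[attKey]*AttestationData{}}
}

// put stores data produced for the current epoch so that later (past-epoch)
// requests for the same slot/committee can be served without recomputation.
func (c *attestationCache) put(d *AttestationData) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[attKey{d.Slot, d.CommitteeIndex}] = d
}

// get returns the cached AttestationData for a past slot, if present.
func (c *attestationCache) get(slot, committeeIndex uint64) (*AttestationData, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	d, ok := c.items[attKey{slot, committeeIndex}]
	return d, ok
}
```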

### Future Epoch case

Unfortunately, there is no workaround here :(. We just simulate the next
epoch, which takes about 500 ms, whenever we need to build
`AttestationData` preemptively.
slot.IDHash is used as a tx key, but in the isSystemTx case its value is
not fully reset between loop iterations.

PutUint64 only sets the first 8 bytes, while the remaining 24 bytes contain
garbage: a partial hash of a previous tx.
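
A hypothetical sketch of the fix, with placeholder types: the reused 32-byte key has to be cleared before `PutUint64` writes the 8-byte system-tx id.

```go
// Hypothetical sketch, not the actual Erigon code.
package txkey

import "encoding/binary"

// A parse slot reused across loop iterations; IDHash serves as the tx key.
type parseSlot struct {
	IDHash [32]byte
}

// fillSystemTxKey shows the fix: because the slot is reused, the whole
// 32-byte IDHash must be cleared before PutUint64, which only writes bytes
// [0:8]; otherwise bytes [8:32] still hold part of the previous tx's hash.
func fillSystemTxKey(s *parseSlot, systemTxNum uint64) {
	s.IDHash = [32]byte{} // the missing reset
	binary.BigEndian.PutUint64(s.IDHash[:8], systemTxNum)
}
```
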
The earlier assumption was that `PruneLogIndex` would only have to deal with
block numbers after the last prune, so it could work with
`tx.cursor.First()`. Now, after the prune hack, `kv.TransactionLog` contains
older logs for `noPruneContracts`, so it must seek to the point of the last
prune progress.

Without this, on every head-block update the stages would run
incrementally, but `LogIndex Prune` would run from the beginning of the
`TransactionLog` table, taking 1-2 hours.
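
A rough sketch of the idea, with a stand-in cursor interface and key encoding rather than Erigon's actual `kv` API.

```go
// Hypothetical sketch: start the prune scan at the last prune progress
// instead of at the first entry of the table.
package prunesketch

import "encoding/binary"

// Cursor is a stand-in for a kv-style cursor over block-number-prefixed keys.
type Cursor interface {
	First() ([]byte, []byte, error)
	Seek(key []byte) ([]byte, []byte, error)
	Next() ([]byte, []byte, error)
}

func encodeBlockNum(n uint64) []byte {
	k := make([]byte, 8)
	binary.BigEndian.PutUint64(k, n)
	return k
}

// pruneLogIndex walks entries starting from the last prune progress rather
// than from the beginning of the table, and stops at the prune target.
func pruneLogIndex(c Cursor, pruneProgress, pruneTo uint64, prune func(k []byte) error) error {
	k, _, err := c.Seek(encodeBlockNum(pruneProgress)) // previously: c.First()
	for ; err == nil && k != nil; k, _, err = c.Next() {
		if blockNum := binary.BigEndian.Uint64(k[:8]); blockNum >= pruneTo {
			break
		}
		if err := prune(k); err != nil {
			return err
		}
	}
	return err
}
```
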
To avoid unnecessary map copies, let's use `sync.Map` to manage
subscriptions.
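
An illustrative sketch (not the actual Erigon code) of managing subscriptions with `sync.Map`, so that broadcasting does not require copying the map under a lock.

```go
// Illustrative subscription manager built on sync.Map.
package subs

import "sync"

type Subscription struct {
	ch chan []byte
}

type Broadcaster struct {
	subs sync.Map // id -> *Subscription
}

// Subscribe registers a new subscriber under id.
func (b *Broadcaster) Subscribe(id string, buf int) *Subscription {
	s := &Subscription{ch: make(chan []byte, buf)}
	b.subs.Store(id, s)
	return s
}

// Unsubscribe removes the subscriber and closes its channel.
func (b *Broadcaster) Unsubscribe(id string) {
	if v, ok := b.subs.LoadAndDelete(id); ok {
		close(v.(*Subscription).ch)
	}
}

// Broadcast delivers msg to every subscriber without taking a global lock or
// copying the subscription map.
func (b *Broadcaster) Broadcast(msg []byte) {
	b.subs.Range(func(_, v any) bool {
		select {
		case v.(*Subscription).ch <- msg:
		default: // drop for slow consumers
		}
		return true
	})
}
```
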
erigontech#9755)

…oad maybe not completed yet)

Cherry pick PRs erigontech#9752 and erigontech#9744 into the release

---------

Co-authored-by: Alex Sharov <[email protected]>
Co-authored-by: Mark Holt <[email protected]>
Cherry pick PR erigontech#9757 into the release

---------

Co-authored-by: Mark Holt <[email protected]>
Cherry pick PR erigontech#9734 into the release

Co-authored-by: Giulio rebuffo <[email protected]>
Cherry pick PR erigontech#9796 into the release

Co-authored-by: Giulio rebuffo <[email protected]>
erigontech#9823)

… too late checking for whitelist. need check before adding to lib

Cherry pick PR erigontech#9804 into the release

Co-authored-by: Alex Sharov <[email protected]>
…#9315)" (erigontech#9833)

Revert PR erigontech#9315 since it apparently causes MDBX bloat and performance
regression. The regression was introduced in
[v2.58.0](https://github.com/ledgerwatch/erigon/releases/tag/v2.58.0).
Cherry pick PR erigontech#9837

---------

Co-authored-by: Anshal Shukla <[email protected]>
@ImTei ImTei requested review from mininny and pcw109550 April 29, 2024 21:46
@ImTei ImTei merged commit 91c6c5a into op-erigon Apr 29, 2024
5 of 6 checks passed