
Upstream v2.60.0 #168

Merged
merged 266 commits into from
May 17, 2024
Conversation


@ImTei ImTei commented May 9, 2024

domiwei and others added 30 commits March 21, 2024 11:47
- Update erigon-lib/gointerface
- Utilize `OverwriteSubscriptionExpiry` to reset expiry on subscription
for SentinelServer/SentinelClient
# Downloader lock design

The snapshot lock design has been changed to be more flexible. Before, it was
an empty file whose mere presence determined whether to skip snapshot
downloads; now this file has been extended a little.

We now treat the lock file as a JSON file which has the following
format:

```json
["headers", "bodies", etc...]
```

The strings are the stringified snapshot types:

```go
type Enum int

var Enums = struct {
	Unknown,
	Headers,
	Bodies,
	Transactions,
	BorEvents,
	BorSpans,
	BeaconBlocks,
	BlobSidecars Enum
}{ /* values elided */ }
```

After each download finishes, we push the enums of the downloaded snapshots
into the list (for a normal sync that is only `Headers`, `Bodies` and
`Transactions`, plus the Bor types).

When the node starts, downloading is prohibited for all the snapshot types
already listed in the lock file and remains open for the ones not in it.

---------

Co-authored-by: alex.sharov <[email protected]>
…#9779)

This is a low-priority issue.

We need the blobs corresponding to each block, but previously we were not
checking whether we had received all of them; now we keep asking until the
request returns the desired amount.
new_heads events were not correctly emitted during block forks. This fix
ensures accurate event generation and emission in fork scenarios.
Shortens the body encode/decode RLP code by extracting repeated (or similar)
parts into their own functions, improving reusability and readability (not
strictly necessary, just avoiding code repetition).
Cherry picked initial commit against devel. Beginning of discussion is
here erigontech#9750
There are bor-mainnet and sepolia files.
Also bumps some deps like `x` and `grpc`.
The `miner.recommit` flag wasn't registered earlier for use in the CLI. This
PR adds that.

Note: This is for validator support on Polygon PoS.
This PR also does the following additional things:

* Introduces correct JSON marshalling/unmarshalling for all objects
* `BaseFeePerGas` marshalled as an integer in JSON, Caplin-wise
* Reduced dumpSlotsStates from 32 to 4 for better performance during
reorgs
* Added a full lock for `OnAttestation`

## Block Production

This section highlights how `GET eth/v3/validator/blocks/{slot}` creates
a block and then publishes it.

The validator client executes 2 steps when producing a beacon block:

1) Production step: tell the beacon client to create a block and return
it.
2) Publishing step: Sign the block with the proposer private key, and
send it back for publishing to other nodes.


### Block creation

Let's first look at how block creation happens.

Caplin needs to do 2 things to successfully create a block:

1) Ask the Execution Layer for the execution block
2) Retrieve consensus operations to include in the block
(attestations, voluntary exits, etc...)

#### Execution block

The execution block is quite simple: we ask Erigon to produce a block through
the `AssembleBlock` function available on the Erigon `Eth1` API. We treat
Erigon as a black box, so we do not need to worry too much about this.
However, we also need to handle **blob** bundles, so that later, when we
publish a block, we can publish the bundles alongside it (it is important
that peers receive both the block and the blobs, or we will fail a check).
Erigon also gives us the bundle. Right now, we store the blob bundle in an
`LRU` sized to 8 blocks' worth of blobs. **Note: we use an LRU for its
convenient eviction policy**.

#### Operations

TODO.

Operations inclusion has not been implemented yet, the execution block
is the only thing being delivered.

### Block publishing

After we produce the beacon block, we will send it back to the Validator
Client, which will sign it and re-forward it to the rest of the network.

The flow is straightforward. When we receive the block, we simply:
1) Pack the block with the blobs the Execution Layer gave Caplin during
block production.
2) Start a separate thread where we import the block, alongside the blobs,
into Caplin's database and forkchoice.
3) Publish the blobs and blocks to the P2P network.
One-line summary: we must use pointer receivers for tx methods, because value
receivers copy the struct containing the `atomic.Value` fields, which must
not be copied. The `TransactionMisc` struct is embedded in every tx type, so
we must use pointer receivers for all tx methods.

For more context, struct `TransactionMisc` is defined as below:
```go
type TransactionMisc struct {
	// caches
	hash atomic.Value //nolint:structcheck
	from atomic.Value
}
```

`TransactionMisc` is embedded in the structs `AccessListTx`, `BlobTx`,
`BlobTxWrapper`, `DynamicFeeTransaction`, `CommonTx` and `LegacyTx`.
Methods on these structs tend to use a [value receiver, not a pointer
receiver](https://go.dev/tour/methods/8).

When a value-receiver method is called, the program copies the struct value
and uses the copy. `TransactionMisc` is embedded, so its fields `hash` and
`from` are copied too.

However, these fields are of type `atomic.Value`, which is [not allowed
to be copied](https://go.dev/src/sync/atomic/value.go) after first use.
This guideline is also mentioned in [Google's Go style
guide](https://google.github.io/styleguide/go/decisions#receiver-type).

Therefore we must use pointer receivers to avoid synchronization issues.
Using pointer receivers can also be more efficient when the receiver is a
large struct.

Co-authored-by: Andrew Ashikhmin <[email protected]>
This PR contains changes related to gathering information about the
"Bodies" stage.
The change list is:
- added entities for block download, write, process and processing
- added listeners and collected info for the above
- added an API to query this data
… too late checking for whitelist. need check before adding to lib (erigontech#9804)
This PR does the following:
* Implement correct handling of beacon proof handler (without
aggregation)
* Disable beacon aggregate and sync committee contribution gossip if not
in validator mode

Check implemented:
https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md#beacon_aggregate_and_proof
SonarCloud highlights duplicated code branches as bugs
The responsibility to maintain the status data is moved from the
stageloop Hook and MultiClient to the new StatusDataProvider. It reads
the latest data from a RoDB when asked. That happens at the end of each
stage loop iteration, and sometimes when any sentry stream loop
reconnects a sentry client.

sync.Service and MultiClient require an instance of the
StatusDataProvider now. The MessageListener is updated to depend on an
external statusDataFactory.
taratorio and others added 26 commits May 2, 2024 09:36
…ch#10164)

fixes a 2nd regression introduced by -
erigontech#7593

- it generates duplicate struct types in the same package (check
screenshot below)
- also found a better way to fix the first regression with unused
imports (improvement over
erigontech#10091)

<img width="1438" alt="Screenshot 2024-04-30 at 17 30 42"
src="https://github.com/ledgerwatch/erigon/assets/94537774/154d484b-4b67-4104-8a6e-eac2423e1c0e">
Cherry pick PR erigontech#10155 into the release

Co-authored-by: Dmytro <[email protected]>
…nt/GetHeader (erigontech#9786) (erigontech#9894)

* improved logging
* check ctx in ServeHTTP: The context might be cancelled if the client's
connection was closed while waiting for ServeHTTP.
* If execution API returns ExecutionStatus_Busy, limit retry attempts to
10 seconds. This timeout must be lower than a typical client timeout (30
sec), in order to give the client feedback about the server status.
* If execution API returns ExecutionStatus_Busy, increase retry delay
from 10 ms to 100 ms to avoid stalling ourselves with multiple busy
loops. IMO this delay should be higher (e.g. 1 sec). Ideally we
shouldn't do polling at all, but doing a blocking ctx call requires
rearchitecting the ExecutionStatus_Busy logic.

see erigontech#9786
Cherry pick PR erigontech#10187 into the release

Co-authored-by: Giulio rebuffo <[email protected]>
This PR brings the changes of erigontech#10195 to the branch release/2.60 with the
necessary modifications
Running a test every day doesn't make sense on an inactive branch.
It also seems that the schedule trigger favours the main branch when the
test workflow has the same name on main and other branches.
So this PR changes the test trigger to "push events".
Cherry pick PR erigontech#10214 into the release

Co-authored-by: Alex Sharov <[email protected]>
…#10224)

This adds torrent fixes that remove bad peers due to unhandled HTTP
errors.
Fixed starting the diagnostics server when the metrics address differs from
the pprof address.

---------

Co-authored-by: taratorio <[email protected]>
Fix for:
```
[p2p] Server                             protocol=68 peers=2 trusted=0 inbound=1 LOG15_ERROR= LOG15_ERROR= LOG15_ERROR= LOG15_ERROR= LOG15_ERROR= i/o timeout=53 EOF=65 closed by remote=215 too many peers=6 ecies: invalid message=5
```
@ImTei ImTei changed the title Upstream v2.60.0-rc1 Upstream v2.60.0 May 16, 2024
@ImTei ImTei requested review from mininny and pcw109550 May 16, 2024 00:27
@ImTei ImTei merged commit 4b1a315 into op-erigon May 17, 2024
5 of 6 checks passed