As specified in the Cosmos SDK documentation:

> `BeginBlocker` and `EndBlocker` are a way for module developers to add automatic execution of logic to their module. This is a powerful tool that should be used carefully, as complex automatic functions can slow down or even halt the chain.

Also here:

> ...it is important to remember that application-specific blockchains are deterministic. Developers must be careful not to introduce non-determinism in `BeginBlocker` or `EndBlocker`, and must also be careful not to make them too computationally expensive, as gas does not constrain the cost of `BeginBlocker` and `EndBlocker` execution.
The Gravity Bridge `EndBlocker` contains a lot of computations:
```go
// EndBlocker is called at the end of every block
func EndBlocker(ctx sdk.Context, k keeper.Keeper) {
    params := k.GetParams(ctx)
    slashing(ctx, k)
    attestationTally(ctx, k)
    cleanupTimedOutBatches(ctx, k)
    cleanupTimedOutLogicCalls(ctx, k)
    createValsets(ctx, k)
    pruneValsets(ctx, k, params)
    pruneAttestations(ctx, k)
}
```
The slashing functions, in particular, all contain the same pattern of nested loops (shown here using the example of `BatchSlashing`, shortened):
Let's try to estimate the numbers. Ethereum currently handles around 30 transactions per second (TPS). Vitalik Buterin, one of the founders of Ethereum, has suggested that Ethereum 2.0 may eventually scale to as many as 100,000 TPS using sharding and other techniques. As the motivation for the Gravity Bridge is to bring transactions from the currently slow Ethereum to the much faster Cosmos, suppose the bridge becomes moderately successful and handles 1,000 TPS flowing from Cosmos to Ethereum.
We have, multiplied together:

- 10 batches per second (as 1 batch includes 100 transactions)
- 7 seconds per block
- 125 bonded validators
- 125 confirmations (each batch needs to be confirmed by each validator)
This gives us 70 * 125^2 = 1,093,750 operations per nested slashing loop; there are 3 such nested-loop constructs, which gives 3,281,250 innermost operations per block. Suppose the innermost loop terminates halfway through on average, which gives ~1,600,000 operations per block. The innermost operation is this:
Of those, `sdk.AccAddressFromBech32` is relatively simple, though it still involves non-trivial validation logic. `GetOrchestratorValidator`, on the other hand, spans ~80 lines of code, involving calls to the store and to the staking keeper, which are themselves non-trivial. Let's be highly speculative and estimate each line as costing 1 clock cycle, with 100 lines in total. This gives 160,000,000 clock cycles -- that's a lot of processing for the slashing in the `EndBlocker` alone.
This is a highly speculative estimation, but it gives the general idea: the computations in the `EndBlocker` are too heavy. What is particularly worrisome is the linear dependency on the number of transactions, multiplied by the quadratic dependency on the number of validators. This means that if the bridge becomes successful and sees a 10x influx of users, the computational load of executing Cosmos transactions will also increase 10x. The situation can deteriorate quickly, because batch creation can be triggered by anyone, even for a single transaction. Going from batches of 100 transactions to batches of 1 transaction means an instant 100x increase in computational requirements; this may happen both due to legitimate users trying to get their transactions through faster, and due to deliberate attacks on the Cosmos chain. Each of these increases is further multiplied by the quadratic validator dependency.
## Problem Scenarios
The root of the problem lies in the linear and quadratic dependency of the block's computational load on the number of transactions and validators, respectively. The following scenarios seem feasible:

1. Gravity Bridge starts working fine and attracts more users. At some point it becomes popular enough that the computational load on full nodes is no longer bearable, and the performance of the Cosmos chain deteriorates.
2. As chain performance deteriorates, desperate users trying to get their tokens through the bridge will start to request batches containing a single transaction. This suddenly increases the computational load on full nodes by a factor of 100. Alternatively, the 100x increase can be triggered maliciously by submitting a lot of 1-transaction batches. The chain halts.
3. Most probably, when the chain recovers, all validators will then be slashed by Gravity Bridge in one sweep, because they were unable to confirm transactions in time.
## Recommendation
- Reduce the cost of the computations done in the `EndBlocker` as much as possible; try to move most computations out of the `EndBlocker` altogether.
- Re-think the slashing algorithm. Eliminate the quadratic dependency of the computational cost on the number of validators; the dependency should be at most linear.
The simplest and most straightforward approach is, e.g., to store the set of awaited confirmations and to remove an element from this set as soon as the corresponding confirmation arrives. This spreads the load across the times when confirmations arrive, and makes slashing in the `EndBlocker` a straightforward linear pass over the confirmations that were expected but not received.
Other optimizations are possible and should be applied; e.g., storing the awaited confirmations in a hash set makes each lookup and removal effectively constant-time (a tree-based set would give logarithmic cost).
## Original issue
Surfaced from @informalsystems audit of Althea Gravity Bridge at commit 19a4cfe
- severity: High
- type: Implementation bug
- difficulty: Intermediate