Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace web30 and clarity with ethers-rs #297

Merged
merged 116 commits into from
Dec 1, 2021
Merged

Conversation

EricBolten
Copy link
Contributor

Replace web30 and clarity with ethers-rs

This change was motivated by issues #263 and #264, and while investigating remediations for the issues with the num256 library we decided to implement a wider-scale refactor to remove transitive dependencies entirely and depend less on custom code.

This was a very large changeset of many tightly coupled dependencies, so it was not really possible to have a super clean commit history. Since most builds would be broken along the way, I didn't try to keep the history unbroken, so there are many WIP commits along the way of partially finished work. I am opening this as a draft PR as this version compiles and passes unit testing, but it would be safer to have more testing to feel confident in merging.

What this PR does

  • Replace clarity and web30 with ethers
  • Rewrite some functionality that was either tightly coupled to the mentioned libraries, or is simply structured differently when using ethers
  • Extract some functionality living within web30 and clarity that the orchestrator relies on
  • Remove manual handling of ABIs with generated code using the ethers abigen feature (note: to update if the ABIs change, first update the JSON in gravity_abi and then cargo run abi_build)

What this PR does NOT do

  • Add additional testing beyond what was already in place
  • Substantially change the function call graph of the orchestrator -- the structure was left mostly in place, and there are likely numerous improvements to be made to reduce coupling and increase testability, but this PR was long enough as is
  • Fully remove num256 -- there are a few instances where Uint256 is still required (mainly, any time that we must deal with the Coin object from deep_space) so to fully remove num256 from the project we'd also have to refactor the Cosmos side

Open questions

The first goal of this branch was to get a version of the orchestrator having removed the aforementioned libraries that fully compiles. This has been achieved, but there are likely mistakes, additional required testing, and some open questions about how we should configure some values or handle some timeout situations. I have left TODOs around the codebase, and I will highlight a list of them here that are of particular interest.

Provider interval

// TODO(bolten): should probably set a non-default interval, but what is the appropriate
// value?

The ethers Provider, which is the struct used for dealing with requests and transport deep inside the middleware stack, sets a default polling interval of 7 seconds. We should consider whether this value is suitable, or whether different parts of the orchestrator may want to use different polling intervals.

Wallet typing

// TODO(bolten): is using LocalWallet too specific?

I left this TODO here but it could be in a lot of places. This branch uses LocalWallet in the SignerMiddleware, which presumes key material has been loaded locally from a keystore or elsewhere. Is that the expected method of using the orchestrator, or would it make more sense to switch to the more generic Wallet type to account for remote signers?

Remaining uses of Uint256

// TODO(bolten): deep_space's Coin type relies internally on clarity's Uint256,
// and it would make this code super akward to try to get around that...replacing
// that here might be part of a broader deep_space replacement

As mentioned above, Coin still needs Uint256. Here is an example of a file where this is true.

Displayable values in SigningKey and VerifyingKey

// TODO(bolten): is ethers wallet even capable of printing the public and
// private keys? for now, leaving load_clarity_key in config.rs and
// maintaining the functionality here

As far as I can tell, SigningKey and VerifyingKey don't have a way to print out their values, as required by the show command in this file. Rather than trying to write something custom to do that I left the clarity key handler in place so I could still use it for this specific command. This was also true in the register_delegate_keys package.

Delegate key signing format

let data = keccak256(buf); // TODO(bolten): the rest of the orchestrator expects a hash as a message...here too?

The Gravity contract expect signed Ethereum messages to have the message prefix, a fixed length of 32, and a signed hash, as opposed to any other message type. Wanted to verify that was true in this case as well.

Long request timeouts

// TODO(bolten): original version of this used a 120 second timeout when querying
// the eth chain, should we replicate that in eth_client?

The original code passed a web3 client with a 120 second timeout when making calls in oracle_resync.rs. The Provider and Signer middlewares don't allow you to specify a timeout, as far as I can tell, they will run forever unless you stop them. There are also options to specify how many confirmation blocks we want to wait to resolve to a transaction receipt, which defaults to 1 and that mirrors how web30 was behaving. I'd like to get clarity on this long timeout and see if we need to do something like wrapping the oracle resync calls in tokio timeouts or something like that.

net_version vs eth_chainId

// TODO(bolten): technically we want the network id from net_version, but chain id
// should be the same for our use cases...when this PR is in a released version of
// ethers we can move back to net_version:
// https://github.com/gakonst/ethers-rs/pull/595

As of this branch, ethers did not support retrieving net_version. A chain does not have to have a matching network version and chain ID, but in most cases they match, and in likely all cases that we will specifically care about. Nonetheless, after bringing it up to the devs, a PR was created and merged, though the code is not yet in a released version of ethers. We should update this when the next release happens.

Dead commands

// TODO(bolten): I guess this command is also non-functional?
// are the commands under tx dead code?

It seems like the commands under tx in gorc may be dead code. I wasn't sure of their purpose, so I didn't mess around with them, but we should delete them if they aren't meant to be used.

Unsorted validator set warning

// TODO(bolten): perhaps there is a better way to handle this than logging the event?
// what would the downstream effects of not returning a ValsetUpdatedEvent here?
if validators != check {
warn!(
"Someone submitted an unsorted validator set, this means all updates will fail until someone feeds in this unsorted value by hand {:?} instead of {:?}",
validators, check
);
}

This code suggests we could get in a bad state that can only be fixed manually, but all it does is warn! and proceed. I'm curious as to what we should do if we hit this condition, because it's probably something other than this.

ERC20 allowance verification

// TODO(bolten): verify if this check is sufficient/correct
// Check if the allowance remaining is greater than half of a U256 - it's as good
// a test as any.
Ok(allowance > (U256::MAX.div_mod(2u32.into()).0))

This is how web30 was calculating if a given address could spend an ERC20 using a contract. Not sure how this allowance check was decided on but would like to verify it's what we expect.

Finding the latest valset

// TODO(bolten): the original logic only checked one valset event, even if there may have been multiple within the
// filtered blockspace...need more clarity on how severe an error it is if one of these events is malformed, and if
// we should return early with an error or just log it the way the previous version did

This is one area where I actually rewrote the logic, as it seemed like the original event searching logic would search some blockspace, get a list of valset events, only check the first one, and in some cases proceed to search earlier blockspace while ignoring the rest of the events it had retrieved. That was more of just a note to ensure this gets careful review, but in general it's also worth considering what to do with errors like finding an event that doesn't serialize cleanly into one of our types, and whether that's just an error message or something that warrants halting the process and investigating.

Bridge halting on overflows

Ok(downcast_to_u64(valset_nonce).expect("Valset nonce overflow! Bridge Halt!"))

// TODO(bolten): we're also throwing panics if we encounter downcast errors in
// ethereum_gravity/src/utils.rs, we should consider broadly how to handle
// these sorts of error conditions
let v = input
.iter()
.map(|sig| u8::try_from(sig.v).expect("Gravity Signature v overflow! Bridge halt!"))
.collect();

There are a couple situations in the two above files where it seems like the intent is to halt the bridge if we hit a numerical overflow condition. I think it would be worth verifying that this is intended behavior, or if there is a different way to recover.

Only gravity_utils used its direct dependency on num256. This removes
the unused num256 dependencies. Explicit uses of num256 are highly tied
to clarity, so trying to replace them at the top level is not useful and
we should just proceed to replace clarity and web30..
Since types just get passed down through so many functions, there isn't
really a way to break the refactor into commits that compile cleanly.
Will keep pushing progress into a broken WIP commit until the types are
all statically satisfied.
Starting the process of directly using the ABI by removing the custom
parsing for ValsetUpdatedEvent and using functionality from ethers
abigen instead. Debating whether or not to leave it as a macro (which
makes it kind of difficult to introspect) or to check in a version of
the generated code.

Also moved the downcasting functions to the generic gravity_utils
package so other code can easily pull it in.
All event types are now using abigen versions. Additionally, it appears
that the conversion of LogicCallExecutedEvent from a log was just marked
as "unimplemented" but now has been added as well. Also rewrote the
event nonce filtering using iterators and adaptors. There's probably a
way of further using traits to reduce code duplication here but it's not
a huge issue that it's not super DRY right now.
A required bugfix for contracts to function correctly is on ethers-rs
master. Here we depend on master and lock to that commit. Also have
re-run the ABI build with the new version of ethers, as well as fixed up
some use statements and type errors introduced in the newer version.
Additionally, we had to bump versions on actix and futures in order to
maintain compatibility with the latest version of ethers. We're
explicitly using std::result::Result because the new ethers prelude
includes a published type that is also called Result.
@zmanian
Copy link
Contributor

zmanian commented Nov 29, 2021

Can this be taken out of draft mode now?

@EricBolten
Copy link
Contributor Author

Good point, yes, will mark it as ready.

@EricBolten EricBolten marked this pull request as ready for review November 29, 2021 20:38
Comment on lines 28 to 33
Ok(abi::encode(&[
Token::FixedBytes(gravity_id.as_bytes().to_vec()),
Token::FixedBytes("checkpoint".as_bytes().to_vec()),
Token::Uint(valset.nonce.into()),
Token::Array(eth_addresses),
Token::Array(powers),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a function in the generated ABI that does this? it's not fatal to call the encoder directly. More of an idle question

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe so. As far as I can tell these structures are manually created by us for use in confirmation messages and not by any types defined in the Gravity contract.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is how the contract manually composes the valset update checkpoint:

bytes32 methodName = 0x636865636b706f696e7400000000000000000000000000000000000000000000;
bytes32 checkpoint =
keccak256(abi.encode(_gravityId, methodName, _valsetNonce, _validators, _powers));
return checkpoint;

Similarly for transaction batches and for logic calls, the contract is creating a hash of a manually packed ABI structure:

keccak256(
abi.encode(
state_gravityId,
// bytes32 encoding of "transactionBatch"
0x7472616e73616374696f6e426174636800000000000000000000000000000000,
_amounts,
_destinations,
_fees,
_batchNonce,
_tokenContract,
_batchTimeout
)
),

keccak256(
abi.encode(
state_gravityId,
// bytes32 encoding of "logicCall"
0x6c6f67696343616c6c0000000000000000000000000000000000000000000000,
_args.transferAmounts,
_args.transferTokenContracts,
_args.feeAmounts,
_args.feeTokenContracts,
_args.logicContractAddress,
_args.payload,
_args.timeOut,
_args.invalidationId,
_args.invalidationNonce
)
);

Comment on lines 146 to 161
pub async fn get_call_gas_cost(eth_client: EthClient) -> Result<GasCost, GravityError> {
const GAS_LIMIT: u128 = 12450000; // the most Hardhat will allow, will work on Geth

let caller_balance = eth_client.get_balance(eth_client.address(), None).await?;
let latest_block = eth_client.get_block(BlockNumber::Latest).await?.unwrap();
let gas_price = latest_block.base_fee_per_gas.unwrap_or(1u8.into()); // "or" clause shouldn't happen unless pre-London
if gas_price == U256::zero() {
return Err(GravityError::EthereumBadDataError(
"Latest block returned base fee per gas of zero".to_string(),
));
}

let gas = min(GAS_LIMIT.into(), caller_balance.div_mod(gas_price).0);

Ok(GasCost { gas, gas_price })
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think we should just use the Etherscan API here like we do in the Cellar Rebalancer and have user provide an Etherscan key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was mirroring the web30 approach. It was also specifically for call rather than send which as far as I understand isn't meant to consume gas anyway, but we're still calculating these values. Based on your comment, it's OK to use the same gas calculation for both call and send then?

use std::convert::TryFrom;
use std::process::exit;
use std::time::Duration;
use std::{sync::Arc, time::Duration};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why the Arc dependence is needed here. It's not obviously being used anywhere.

Copy link
Contributor Author

@EricBolten EricBolten Dec 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EthClient type is an alias of Arc<SignerMiddleware<Provider<Http>, LocalWallet>>> (and we're building one here).

Comment on lines 2 to 3
//! It's a common problem to have conflicts between ipv4 and ipv6 localhost and this module is first and foremost supposed to resolve that problem
//! by trying more than one thing to handle potentially misconfigured inputs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file makes me slightly insane in how much it tries to auto recover from foot guns. We should definitely try and clean it up in a future PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

U256 appears to expect hex input by default when parsing a string. In
the few cases where we're taking in a string to convert to U256, use
from_dec_str explicitly. The particular usage in eth.rs probably didn't
work originally as described in the expect comment anyway, but it also
appears to be dead code, so just fixing it up to be more sane. If a
float was expected a bunch of different conversion would be necessary,
and it would also be losing information anyway.
@zmanian
Copy link
Contributor

zmanian commented Dec 1, 2021

I'm ready to merge this.

@zmanian zmanian merged commit 4bf240f into main Dec 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants