Make batch call size configurable #688

Closed · MartinquaXD opened this issue Oct 30, 2022 · 10 comments · Fixed by #700
Labels
oncall Issue/PR for consideration during oncall rotation

Comments

@MartinquaXD
Contributor

We introduced the option to execute eth_calls in batches to improve performance when many eth_calls have to be made at once.
But we are currently experiencing major problems with our node infrastructure, especially with how dshackle handles batch calls. One idea to mitigate those issues is to execute smaller batches. For that the batch size has to be configurable, because we currently rely on a few constants sprinkled throughout the code base.

I see two options for doing that:

  1. Every component that uses batch calls gets another constructor argument batch_size and passes it to CallBatch::execute_all(). This is simple to implement but leaks the implementation detail that we are doing batch calls into the signatures of every one of those components.
  2. Instead of passing a plain web3 instance to these components we pass an abstraction which knows how to execute multiple eth_calls. Then the original component no longer needs to be aware of whether the abstraction uses batch calls or a series of "normal" calls under the hood.

I would prefer option 2 because it has better separation of concerns.
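A rough sketch of what the option-2 abstraction could look like (the trait and method names here are made up for illustration and assume the rust-web3 `CallRequest`/`Bytes` types, not the actual services code):

```rust
use async_trait::async_trait;
use web3::types::{Bytes, CallRequest};

/// Hypothetical abstraction: components ask for a set of eth_calls to be
/// resolved and don't care whether that happens as one JSON RPC batch,
/// several smaller batches, or plain sequential calls.
#[async_trait]
pub trait CallExecutor: Send + Sync {
    async fn execute_calls(
        &self,
        calls: Vec<CallRequest>,
    ) -> Vec<Result<Bytes, web3::Error>>;
}
```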

@MartinquaXD added the oncall label Oct 30, 2022
@MartinquaXD
Contributor Author

Also, since we define Web3 as a type alias in the shared crate, we might simply create a new type which derefs to the old Web3 type and additionally has the capability to do batch calls. Then we would probably only have to update the code where we create the (then different) Web3 type for our components to use.
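A minimal sketch of that newtype idea, assuming `Web3` is the existing alias from the shared crate (the `BatchedWeb3` name and `max_batch_size` field are purely illustrative):

```rust
use std::ops::Deref;

// Stand-in for the alias defined in the shared crate.
type Web3 = web3::Web3<web3::transports::Http>;

/// Wraps the existing `Web3` and additionally carries the batching knobs, so
/// only the construction sites need to change.
pub struct BatchedWeb3 {
    inner: Web3,
    max_batch_size: usize,
}

impl Deref for BatchedWeb3 {
    type Target = Web3;

    fn deref(&self) -> &Self::Target {
        &self.inner
    }
}
```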

@nlordell
Contributor

I also prefer option 2, which would additionally allow us to dynamically choose whether or not to actually batch calls.

There is this "auto-batcher" transport that we can use everywhere.

If we go that route, we need to improve logging (since it loses its tracing_span as it runs in a separate fiber).

@fleupold
Contributor

One thing we could potentially do in that case is to implement batching using a single eth_call (which calls the multicall contract with all the subcalls). This way we would also avoid dshackle "unbatching" our requests.
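As a rough illustration of the packing (a hedged sketch using the `ethabi` crate against the Multicall contract's `aggregate((address,bytes)[])` entry point; the helper name is made up):

```rust
use ethabi::{encode, short_signature, Address, ParamType, Token};

/// Pack several (target, calldata) pairs into the calldata for Multicall's
/// `aggregate((address,bytes)[])`, so the node only sees one top-level eth_call.
fn encode_aggregate(subcalls: &[(Address, Vec<u8>)]) -> Vec<u8> {
    // 4-byte selector for `aggregate((address,bytes)[])`.
    let selector = short_signature(
        "aggregate",
        &[ParamType::Array(Box::new(ParamType::Tuple(vec![
            ParamType::Address,
            ParamType::Bytes,
        ])))],
    );
    // ABI-encode the subcalls as an array of (address, bytes) tuples.
    let calls = Token::Array(
        subcalls
            .iter()
            .map(|(target, data)| {
                Token::Tuple(vec![Token::Address(*target), Token::Bytes(data.clone())])
            })
            .collect(),
    );
    let mut calldata = selector.to_vec();
    calldata.extend(encode(&[calls]));
    calldata
}
```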

@sunce86
Contributor

sunce86 commented Oct 31, 2022

Mentioning just so it would be linked: Replace JSON RPC batch calls with Multicalls.

@nlordell
Contributor

One thing we could potentially do in that case is to implement batching using a single eth_call (which calls the multicall contract with all the subcalls).

Ideally, this "decision" is something the transport::Buffered component could make based on configuration. This would only work for batched eth_calls though (and a few others like eth_getBlockNumber(latest) which is supported by the Multicall contract).

I like the idea though!

@sunce86
Contributor

sunce86 commented Oct 31, 2022

One idea to mitigate those issues is to execute smaller batches.

Would this solve the issue or just reduce the number of batch call failures?
Asking because if, for example, we need 50 eth_calls for some functionality, splitting them into two chunks of 25 would not help if one of those chunks fails, because we usually need the full 50 eth_calls to continue.

@vkgnosis
Contributor

vkgnosis commented Nov 1, 2022

Using Multicall makes sense to me for cases where you want to enforce the same state, because BatchCall does not guarantee this at all.
BatchCall is a tool to make many requests more efficiently by reducing the number of individual connections. On the other hand, there are also ways in which it could be less efficient, like the server not being able to respond to individual requests the moment they finish. Even if that weren't the case, the difference between 1 and 10 TCP connections is probably small enough not to matter compared to everything else that happens to serve a request.
Based on this it makes sense to me to move most code to use normal eth calls which can be optionally batched in the background by something like transport::Buffered.
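Under that model, calling code stays oblivious to batching; a small sketch assuming the rust-web3 API (the balance lookups here are just placeholder requests):

```rust
use web3::types::{Address, U256};

/// Issue two plain RPC requests concurrently; a batching transport underneath
/// is free to coalesce them into a single JSON RPC batch, or not.
async fn fetch_balances<T: web3::Transport>(
    web3: &web3::Web3<T>,
    first: Address,
    second: Address,
) -> web3::Result<(U256, U256)> {
    futures::try_join!(
        web3.eth().balance(first, None),
        web3.eth().balance(second, None),
    )
}
```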

@nlordell
Contributor

nlordell commented Nov 1, 2022

which can be optionally batched in the background by something like transport::Buffered.

We can also add logic to make the transport::Buffered batch eth_calls into a Multicall.

@fleupold
Contributor

fleupold commented Nov 2, 2022

Some node providers (e.g. Ankr) charge based on the number of top-level eth_calls (not the amount of computation the call creates). Therefore using Multicall could also help us save a good amount of money on our node infra.

@nlordell
Contributor

nlordell commented Nov 2, 2022

One issue is that not all eth_calls are possible. Specifically, if you need to specify a from address, or things like gas_price, then you can't always group eth_calls together into a Multicall.
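As an illustration, such a constraint could be expressed as a simple predicate over rust-web3's `CallRequest` (the helper name is made up):

```rust
use web3::types::CallRequest;

/// Only calls without a sender or gas/value overrides can safely be folded
/// into a Multicall; everything else has to stay a plain top-level eth_call.
fn can_multicall(request: &CallRequest) -> bool {
    request.from.is_none()
        && request.gas.is_none()
        && request.gas_price.is_none()
        && request.value.is_none()
}
```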

nlordell pushed a commit that referenced this issue Nov 4, 2022
This PR is a small tangential change related to the work I started for #688, fix for 

Specifically, we discussed in the linked issue that we wanted to make use of `transport::Buffered` for having configurable batch sizes. I previously noticed that the `transport::Buffered` HTTP logs lose the `tracing::span` associated with the requests (since they are handled by a separate fiber). This will help with debuggability once 

Additionally, as discussed in Slack, this PR changes the JSON RPC request logs to be `trace` logs instead of `debug` logs. This was done for a couple of reasons:
- Logging all HTTP requests by default seemed overly verbose to me; in my mind this is more suited to `TRACE` logs. It also makes the logs easier to read when running the services locally.
- We now log all HTTP requests to Dshackle and upstreams separately, so we don't need to duplicate this data.

### Test Plan

Check that the buffered transport logs with the tracing spans:
```
...
2022-11-01T16:14:05.238Z TRACE shared::transport::buffered: queueing call id=0 method=eth_chainId
2022-11-01T16:14:05.879Z TRACE shared::transport::buffered: received response id=0 ok=true
...
2022-11-01T16:14:10.690Z TRACE auction{id=2971518 run=2}: shared::transport::buffered: queueing call id=44 method=eth_gasPrice
2022-11-01T16:14:10.807Z TRACE auction{id=2971518 run=2}: shared::transport::buffered: received response id=44 ok=true
...
```

### Release Notes

We now **no longer log HTTP JSON RPC requests** by default. If we want to enable this again, we need to add `shared::transport=trace` to the `LOG_FILTER` configuration.
nlordell pushed a commit that referenced this issue Nov 5, 2022
Partially implement #688.

This PR changes the default `web3::Transport` implementation used in the services to be the `transport::Buffered` type. This transport implementation receives JSON RPC requests over a channel and automatically batches them based on runtime configuration parameters. The motivation for using it is two-fold:
- First, it makes coding "batched" requests easier (you basically just need to `futures::join` concurrent RPC requests).
- Second, it allows runtime configuration of the maximum batch size (where previously we used a hard-coded `const` value of `100`) as well as of the number of concurrent batches (which we didn't have previously).

Additionally, I removed manual uses of `transport::Buffered` in the code, since the underlying transport is already buffered and double-buffering RPC requests doesn't really serve a purpose.

Note that we currently don't specify any configuration parameters to the buffered transport and just use the default of 100 (so it mostly works like it used to).

In follow up PRs I will:
1. Remove the manual batching code from various components (which should simplify a lot of code!)
2. Introduce CLI configuration parameters for configuring the buffered transport

### Test Plan

CI, ran the services locally without issue and saw automatic batching happening:
```
2022-11-01T17:11:21.817Z DEBUG auction{id=2973160 run=351}: shared::transport::http: [base][id:310] sending request: "[{\"jsonrpc\":\"2.0\",\"method\":\"eth_call\",\"params\":[{\"data\":\"0xdd62ed3e0000000000000000000000009008d19f58aabd9ed0d60971565aa8510560ab410000000000000000000000007a250d5630b4cf539739df2c5dacb4c659f2488d\",\"to\":\"0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2\"},\"latest\"],\"id\":307},{\"jsonrpc\":\"2.0\",\"method\":\"eth_call\",\"params\":[{\"data\":\"0xdd62ed3e0000000000000000000000009008d19f58aabd9ed0d60971565aa8510560ab410000000000000000000000007a250d5630b4cf539739df2c5dacb4c659f2488d\",\"to\":\"0xdef1ca1fb7fbcdc777520aa7f396b4e015f497ab\"},\"latest\"],\"id\":308},{\"jsonrpc\":\"2.0\",\"method\":\"eth_call\",\"params\":[{\"data\":\"0xdd62ed3e0000000000000000000000009008d19f58aabd9ed0d60971565aa8510560ab410000000000000000000000007a250d5630b4cf539739df2c5dacb4c659f2488d\",\"to\":\"0xae7ab96520de3a18e5e111b5eaab095312d7fe84\"},\"latest\"],\"id\":309}]"
```