
feat: adjust ConnMgr to 32/96 #9483

Merged · 1 commit merged into master on Dec 8, 2022

Conversation

@lidel (Member) commented Dec 8, 2022

This is a follow-up to #9467.

Rationale

#9467 (comment) from @BigLep :

I sanity-checked the numbers Low=50, High=150.

By default if a Kubo node has 8GB of total memory, a maxMemory for libp2p will be set to 2GB. That will cause System.ConnsInbound to be 128 I believe (looks like 64 per GB of memory here: https://github.com/libp2p/go-libp2p/blob/master/p2p/host/resource-manager/limit_defaults.go#L345 since Kubo is using those same libp2p defaults: https://github.com/ipfs/kubo/blob/master/core/node/libp2p/rcmgr_defaults.go#L94). As a result, in this particular setup, our high water mark may not get hit (unless there are sufficient outbound connections). I know there isn't a perfect number here and we have no stats on the systems that folks are using. I would be in favor of dropping the ConnMgr.High to 120 instead.
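
To make the arithmetic above easier to follow, here is a minimal back-of-the-envelope sketch (not the real go-libp2p scaling code linked above; the 8 GiB machine, the ~2 GiB libp2p budget, and the ~64 inbound connections per GiB all come from the comment):

```go
package main

import "fmt"

func main() {
	// Numbers from the comment above, not from the actual go-libp2p
	// limit_defaults.go scaling code.
	totalMemory := int64(8) << 30          // 8 GiB machine
	maxMemory := totalMemory / 4           // ~2 GiB budget handed to libp2p in that example
	connsInbound := 64 * (maxMemory >> 30) // ~64 inbound connections per GiB => 128

	fmt.Println("System.ConnsInbound ≈", connsInbound)
	// With ConnMgr HighWater=150, the high water mark may never be reached,
	// because the resource manager caps inbound connections at ~128 first.
}
```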

@lidel requested a review from @BigLep on December 8, 2022 15:22

@lidel (Member, Author) commented Dec 8, 2022

@BigLep thank you for looking into this.

The number of connections at any given time will usually be slightly higher than HighWater (HighWater triggers GC, but in the meantime new peers keep connecting to us because we act as a DHT server, and the next GracePeriod expiry is 1-20s away).

For example, with HighWater set to 96, the connection count typically ends up hovering between 150-170.

I've set it to 120 as you suggested, but I'd suggest going with slightly lower values (32/96) if we really want to bring the average connection count down – let me know if I should update this PR to 32/96 or keep it at 50/120.
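
For reference, a minimal sketch of how these watermarks map onto go-libp2p's connection manager API (the 20s GracePeriod is an assumption based on Kubo's default; Kubo's actual wiring lives in core/node/libp2p):

```go
package main

import (
	"time"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/p2p/net/connmgr"
)

func main() {
	// LowWater=32 / HighWater=96 as proposed in this PR. Connections newer
	// than the grace period are exempt from trimming, which is why a DHT
	// server's live connection count tends to sit above HighWater.
	cm, err := connmgr.NewConnManager(32, 96, connmgr.WithGracePeriod(20*time.Second))
	if err != nil {
		panic(err)
	}

	h, err := libp2p.New(libp2p.ConnectionManager(cm))
	if err != nil {
		panic(err)
	}
	defer h.Close()
}
```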

@Jorropo (Contributor) commented Dec 8, 2022

Looks like we should figure out scaling this based on the resource manager or system memory as well.

@BigLep (Contributor) commented Dec 8, 2022

Interesting. I don't know enough here, and I'm not sure of the full ramifications of the tradeoff. For example, we see really high success rates for finding content over Bitswap (https://www.notion.so/pl-strflt/Bitswap-measurements-draft-report-c64694c1a9ae4389a86e34e7bf2baf6e#e0b2e3258b304ae0b85bfcfba8c8db41), but that is also with a lot of connected peers.

We have a strong datapoint in what has been working for Brave for over a year now, which is what makes me comfortable dropping it from the previous really high values.

Maybe just go with 120 for now and get more ProbeLab engagement for the future?

@Jorropo (Contributor) commented Dec 8, 2022

@BigLep IMO we should not look at the Bitswap flood success rate and try to chase it.

This comes at the cost of flooding the network with thousands of requests per second, which creates lots of load. We will stop flooding the network in Q1 or Q2 of 2023 when the Move the Bytes work is released mainstream.
(To be clear, I'm not saying we will stop flooding completely; either we will find an efficient way to do it, and/or we will add metrics so we only flood / optimistically request nodes that most likely have the content (like pinning services).)

@BigLep (Contributor) commented Dec 8, 2022

@Jorropo: ack, understood. I'm not saying we should keep flooding. The area of uncertainty for me is how much unexpected benefit we are getting from flooding, and the potential impact of reducing it without having something else fill the gap, like new transports from Move the Bytes.

I guess the strongest datapoint we have right now is that Brave/Desktop have lower limits and they work. As a result, I'm comfortable reducing the numbers to 32/96 as you suggest, Lidel, and we can increase them if we got this wrong.

@lidel (Member, Author) commented Dec 8, 2022

Ack, let's try 32/96 in RC1; we can adjust in RC2 if needed.

@lidel changed the title from "fix: adjust DefaultConnMgrHighWater to 120" to "fix: adjust ConnMgr to 32/96" on Dec 8, 2022
@lidel force-pushed the feat/even-lower-connection-pool branch from 953005f to fde94af on December 8, 2022 20:36
@lidel changed the title from "fix: adjust ConnMgr to 32/96" to "feat: adjust ConnMgr to 32/96" on Dec 8, 2022
@lidel enabled auto-merge (squash) on December 8, 2022 20:38
@lidel merged commit 1f63640 into master on Dec 8, 2022
@lidel deleted the feat/even-lower-connection-pool branch on December 8, 2022 20:47
@guillaumemichel commented

I don't have much visibility into what System.ConnsInbound corresponds to. According to the Draft Bitswap Measurements report, a successful Bitswap request solicits 820 peers on average. However, I am not sure whether these connections count as inbound or outbound.

We also observed that most of the content was served by a small number of peers, and most of them are nft.storage and other large providers. Hence, these large providers may have different Connection Manager settings adapted to serve content to many peers.

Top 1 peer serves 10.78% of the successful bitswap requests
Top 3 peers serve 26.07% of the successful bitswap requests
Top 5 peers serve 38.56% of the successful bitswap requests
Top 10 peers serve 59.07% of the successful bitswap requests
Top 20 peers serve 76.64% of the successful bitswap requests
Top 50 peers serve 85.65% of the successful bitswap requests
Top 355 peers serve 98.64% of the successful bitswap requests
Top 711 peers serve 100.0% of the successful bitswap requests

@lidel (Member, Author) commented Jan 4, 2023

@guillaumemichel System.ConnsInbound is the limit set by go-libp2p's resource manager (Kubo's config/docs: Swarm.ResourceMgr).

The direction of an active connection can be inspected via ipfs swarm peers -v. My understanding is that this only tells us who initiated the bidirectional connection. An existing connection can be reused later, so it is possible that an "incoming" connection is later used for outgoing Bitswap queries.

My gut feeling is that this is unlikely unless some peering agreement was set up between the peers.
That is to say, in most cases ConnsInbound "limits other peers from using my resources for DHT/Bitswap", and not my own ability to use other peers (IIRC outgoing connections have no limit).
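
For illustration, the same direction flag that ipfs swarm peers -v reports can be read programmatically from go-libp2p connection stats (a sketch using a throwaway host; in Kubo this would be the daemon's host):

```go
package main

import (
	"fmt"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/network"
)

func main() {
	h, err := libp2p.New() // throwaway host just for the sketch
	if err != nil {
		panic(err)
	}
	defer h.Close()

	for _, c := range h.Network().Conns() {
		// Direction only records who initiated the connection; once open,
		// it can carry streams both ways (e.g. outgoing bitswap queries
		// over a connection that was "inbound" when established).
		if c.Stat().Direction == network.DirInbound {
			fmt.Println(c.RemotePeer(), "inbound (remote peer dialed us)")
		} else {
			fmt.Println(c.RemotePeer(), "outbound (we dialed)")
		}
	}
}
```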

@guillaumemichel commented

I think that reducing System.ConnsInbound is the right thing to do as far as go-bitswap is concerned. It will reduce spam 8x from nodes that have the reduced System.ConnsInbound.

This will likely result in a drop in the Bitswap discovery success rate, but it isn't a big deal, as content routers (DHT, IPNI) are fast. However, reducing Bitswap's ProviderSearchDelay from 1s to 0 seems critical, as described in #8807. The Effectiveness of the Bitswap Discovery Process report discusses suggested improvements to the Bitswap discovery process.
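
For context, a hedged sketch of the knob being discussed, assuming go-bitswap's ProviderSearchDelay option and its late-2022 import path (this is not Kubo's actual wiring, just an illustration of changing the 1s delay to 0):

```go
package main

import (
	"time"

	bitswap "github.com/ipfs/go-bitswap"
)

func main() {
	// Assumption: go-bitswap exposes ProviderSearchDelay as a constructor
	// option. Dropping it from the 1s default to 0 means content routers
	// (DHT, IPNI) are queried immediately instead of waiting for
	// already-connected peers to answer the broadcast first.
	opt := bitswap.ProviderSearchDelay(0 * time.Second)
	_ = opt // would be passed to bitswap.New(...) when constructing the exchange
}
```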
