Add peer tracking to merkle/sync#5415
Conversation
8fdc6dd to
c0cb54d
Compare
merkle/sync
c0cb54d to
f477c5e
Compare
f477c5e to
b2381be
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds peer performance tracking to database/merkle/sync by integrating p2p.PeerTracker, allowing the syncer to learn from slow/failed peers and prioritize better ones. It also updates the EVM/Firewood sync wiring to pass through the peer tracker and construct the appropriate p2p client.
Changes:
- Extend merkle sync configuration to accept an optional
*p2p.PeerTrackerand use it for peer selection and response/failure registration. - Update graft coreth/subnet-evm network facades to expose
PeerTracker()rather than bespoke response/failure registration methods. - Add a unit test verifying that
PeerTrackerdrives node selection when provided.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| graft/subnet-evm/network/network.go | Exposes the network’s PeerTracker() to consumers. |
| graft/coreth/network/network.go | Exposes the network’s PeerTracker() to consumers. |
| graft/evm/sync/engine/client.go | Passes PeerTracker into Firewood syncer config and constructs a dedicated p2p client via P2PNetwork().NewClient. |
| graft/evm/sync/client/test_network.go | Updates test network to satisfy the new PeerTracker() requirement. |
| graft/evm/sync/client/test_client.go | Updates Client interface conformance (now includes Network()), but currently returns nil. |
| graft/evm/sync/client/client.go | Switches request outcome tracking to use a held *p2p.PeerTracker instead of network callbacks; exposes Network(). |
| graft/evm/sync/client/BUILD.bazel | Adds new deps needed by updated test network code (prometheus/logging/set). |
| graft/evm/go.mod | Promotes prometheus/client_golang to a direct dependency (now imported). |
| database/merkle/sync/syncer.go | Adds PeerTracker to config and routes requests through it when provided (including bandwidth/failure recording). |
| database/merkle/sync/sync_test.go | Adds unit test ensuring PeerTracker drives peer selection. |
| database/merkle/sync/BUILD.bazel | Adds //utils/set dep for the new unit test. |
| database/merkle/firewood/syncer/syncer.go | Threads PeerTracker through to the underlying merkle sync config. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
JonathanOppenheimer
left a comment
There was a problem hiding this comment.
This is an important PR! you should also probably determine if the copilot comments are BS or not.
| } | ||
| } | ||
|
|
||
| func sendRequestWithPeerTracker(ctx context.Context, c *p2p.Client, pt *p2p.PeerTracker, requestBytes []byte, onResponse appResponseCallbackWithCheck) error { |
There was a problem hiding this comment.
also why isn't this on the syncer?
There was a problem hiding this comment.
It could have been. i separated it since it's really a helper on the p2p client, but we are puttin goff this refactor
| func sendRequestWithPeerTracker(ctx context.Context, c *p2p.Client, pt *p2p.PeerTracker, requestBytes []byte, onResponse appResponseCallbackWithCheck) error { | ||
| nodeID, ok := pt.SelectPeer() | ||
| if !ok { | ||
| return errors.New("no peers available") |
There was a problem hiding this comment.
So if a node starts state sync before any peer connects to the firewood handler, wouldn't the sync abort permanently? Shouldn't we retry in this case? or do something else?
There was a problem hiding this comment.
It looks like the EVM client will sleep and then retry:
avalanchego/graft/evm/sync/client/client.go
Lines 319 to 394 in b2381be
There was a problem hiding this comment.
You won't be able to start a state sync without connecting to a majority of stake - how are you supposed to know what to state sync to?
There was a problem hiding this comment.
ah okay that makes sense -- we could never get into a scenario like this.
There was a problem hiding this comment.
what if all the peers disconnect or is that not at all feasible
4fed96b to
1ff1784
Compare
JonathanOppenheimer
left a comment
There was a problem hiding this comment.
I think this looks pretty good!
Why this should be merged
We need some method of peer discovery for merkle sync. Many nodes will not implement the
p2p.Clientfor Firewood, or drop the request, or not store enough state, or even generate malicious proofs. In all of these cases, the syncer should learn from this mistake and prioritize other peers.How this works
coreth and subnet-evm already use a
p2p.PeerTrackerto track bandwidth, so we can use it with this state syncer as well.How this was tested
I added a new unit test to check if the peer tracker is being used, but I don't know what else can be done to ensure responses are logged correctly, since the peer tracker exports very little data.
Need to be documented in RELEASES.md?
No