feat(vms/evm/sync): add protobuf wire types and a network client#5352
feat(vms/evm/sync): add protobuf wire types and a network client#5352powerslider wants to merge 30 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a protobuf-based wire format and a dedicated p2p request client for EVM state-sync, including new handler IDs to route leaf-range, contract code, and block-batch requests over separate protocols.
Changes:
- Added
vms/evm/sync/network.Clientthat marshals/unmarshals protobuf messages and selects peers viap2p.PeerTrackerwith bandwidth scoring. - Defined new protobuf messages/enums in
proto/sync/sync.proto(and regeneratedsync.pb.go) for EVM state-sync request/response payloads. - Reserved new dedicated p2p handler IDs for EVM state-sync request types.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| vms/evm/sync/network/client.go | New protobuf-based synchronous p2p client with peer selection and bandwidth scoring. |
| vms/evm/sync/network/client_test.go | Unit tests covering round-trip behavior and key failure paths. |
| vms/evm/sync/network/BUILD.bazel | Bazel targets for the new network package and its tests. |
| proto/sync/sync.proto | Adds EVM state-sync protobuf request/response definitions and NodeType. |
| proto/pb/sync/sync.pb.go | Regenerated Go bindings for the updated sync protobuf schema. |
| network/p2p/handler.go | Reserves new handler IDs for EVM state-sync request categories. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Defines the protobuf wire format for EVM state-sync requests (leaf ranges, contract code, block batches). - Reserves dedicated p2p handler IDs. - Add a Client that dispatches requests through them with peer-tracker-backed selection and bandwidth scoring on each round trip. resolves #5351 Signed-off-by: Tsvetan Dimitrov (tsvetan.dimitrov@avalabs.org)
032b5e2 to
ce76330
Compare
- Replaces the bundled `network.Client` with a generic `Dispatcher[Req, Resp proto.Message]` and per-RPC type aliases. - External consumers can register their own RPC types by instantiating Dispatcher against a new handler ID without forking this package.
- Replaces the bundled `network.Client` with a generic `Dispatcher[Req, Resp proto.Message]` and per-RPC type aliases. - External consumers can register their own RPC types by instantiating Dispatcher against a new handler ID without forking this package.
| ErrNoPeers = errors.New("no peers available") | ||
| ErrSendRequest = errors.New("send request") | ||
| ErrHandlerFailed = errors.New("handler request failed") | ||
| ErrUnmarshalResponse = errors.New("unmarshal response") |
There was a problem hiding this comment.
You resolved the comment, but didn't address it
There was a problem hiding this comment.
It also isn't really clear to me why these are exported. Exporting what could be private internals so that we can test the package at the exported boundary to enforce that we don't test private internals seems like the worst of both worlds to me.
There was a problem hiding this comment.
I could agree and I'll unexport the sentinels, inline the helpers and switch to an internal package. However for follow-up rather than this PR: I'd prefer package network_test for this file long-term, because it exercises only the public API the way a real caller would. When I add the retrying-dispatcher follow-up and downstream syncers (leaf, code, block) will likely need the sentinels exported anyway to do errors.Is(err, ErrSendRequest) discrimination between transport-retriable and non-retriable failures. When those consumers land I'll have to re-export and move the tests back external. Out of scope here though.
- Send and SendTo now return an `*Outcome` the caller marks as Success or Failure after content validation (range proof verifies, code hashes match, blocks chain back to requested hash). - Transport-level failures (no peers, send error, ctx, unmarshal) are still auto-registered internally and return a nil Outcome. - Closes the gap where a peer returning random bytes that happen to unmarshal as a valid proto would score as a successful responder.
- Move per-RPC clients into block, code, and evmstate packages, each owning a Client alias and NewClient constructor. - Change network.NewDispatcher to take a pre-built *p2p.Client, add network.NewClient as the production helper. - Replace trackerSampler with a no-op sampler (Dispatcher always picks an explicit peer). - Inline dispatch into SendTo. - Drop nil-safety on Outcome.Success / Outcome.Failure. Keep sync.Once idempotency for the canonical defer-Failure + Success pattern. - Add vms/evm/sync/synctest with EchoHandler, NewPeerTracker, and a generic NewDispatcher. Move network tests to package network_test.
0c17b16 to
781ac59
Compare
781ac59 to
b3a26bd
Compare
| ErrNoPeers = errors.New("no peers available") | ||
| ErrSendRequest = errors.New("send request") | ||
| ErrHandlerFailed = errors.New("handler request failed") | ||
| ErrUnmarshalResponse = errors.New("unmarshal response") |
There was a problem hiding this comment.
You resolved the comment, but didn't address it
- Per-peer pinning isn't actually required for leaf-range continuations (the rootHash deterministically identifies the snapshot, so any peer with that root produces the same leaves for the same range). - There is also no other usage of that returned NodeID.
There was a problem hiding this comment.
I disagree with two design choices made:
- The defer pattern for an
Outcome. I think that always callingoutcome.Failure()is an unintuitive pattern, decreases readability, and overly complicates the struct. - Using the
_testpackage seems to be unnecessary, and requires exporting the error types. Using the normal test package would help reduce the size of the supported API for this package and violates a team norm (I think).
@powerslider seems set on these, so if the other reviewer agrees with him, I can digress on these points
JonathanOppenheimer
left a comment
There was a problem hiding this comment.
Some comments (and a lot of nits about godocs, I'm sorry lol)
| } | ||
|
|
||
| start := time.Now() | ||
| if err := d.client.AppRequest(ctx, set.Of(nodeID), requestBytes, onResponse); err != nil { |
There was a problem hiding this comment.
When
SendTois called with an already-canceled or deadline-expired context, this still issues theAppRequestbecausep2p.Client.AppRequeststrips cancellation withcontext.WithoutCancel; the subsequent select then returnsctx.Err()and registers a peer failure even though the peer did not fail. This can happen during shutdown or timed-out sync work, so checkctx.Err()beforeRegisterRequest/AppRequest.
This makes sense to me, although is it possible that SendTo could be called with an already expired context? If it can't, I would just comment the guarantee
There was a problem hiding this comment.
Yes, it can happen on orchestrator-driven shutdown or when the sync loop's context is cancelled between iterations. Applied a guard.
| nodeID, ok := d.peers.SelectPeer() | ||
| if !ok { | ||
| return nil, ErrNoPeers | ||
| } | ||
| return d.SendTo(ctx, nodeID, req, resp) |
There was a problem hiding this comment.
What happens if multiple sync workers call Send concurrently? The guards in SelectPeer don't seem sufficient because multiple workers could all see the same peer no?
There was a problem hiding this comment.
Concurrent Send calls can hit the same peer. That's fine. The p2p layer multiplexes AppRequests via request IDs, and PeerTracker scores each RegisterRequest/Outcome pair independently. The bandwidth heap naturally steers load if a peer becomes overloaded relative to others.
| EVMLeafsRequestHandlerID | ||
| EVMCodeRequestHandlerID | ||
| EVMBlockRequestHandlerID |
There was a problem hiding this comment.
I'm confused as to how we are treating these protocol IDs.
Aren't these all part of the "EVMStateSync" protocol? Specifically, when would we want to register an EVMLeafsRequestHandlerID, but not an EVMCodeRequestHandlerID, or an EVMBlockRequestHandlerID?
Is this because we are trying to write code that composes either FirewoodProofHandlerID or EVMLeafsRequestHandlerID with EVMCodeRequestHandlerID and EVMBlockRequestHandlerID?
Or perhaps because we are trying to write code that will work with non-EVM syncing?
| // EVMNodeType selects which EVM-sync trie a leaf-range request walks. | ||
| // Distinct from Firewood/merkledb sync types in this file. | ||
| // - STATE_TRIE is the EVM account and storage trie: account balances, | ||
| // nonces, storage slots, and code hashes. | ||
| // - ATOMIC_TRIE is the C-Chain atomic operations trie used for cross-chain | ||
| // transfers between the C-Chain and the X-/P-Chains via shared memory. | ||
| enum EVMNodeType { | ||
| EVM_NODE_TYPE_UNSPECIFIED = 0; | ||
| EVM_NODE_TYPE_STATE_TRIE = 1; | ||
| EVM_NODE_TYPE_ATOMIC_TRIE = 2; | ||
| } |
There was a problem hiding this comment.
When I first read this type, I thought that EVMNodeType would refer to things like ExtensionNode, LeafNode, BranchNode - not the content of the trie.
| bytes start_key = 3; | ||
| bytes end_key = 4; | ||
| uint32 key_limit = 5; | ||
| EVMNodeType node_type = 6; |
There was a problem hiding this comment.
Why did we decide to include whether the request is for the normal trie or the atomic trie in this request? I would have expected us to have a separate protocol ID for atomic trie syncing so that subnet-evm and coreth would share the normal state sync protocol (state-trie syncer + code syncer + block syncer) and then coreth would register an additional atomic-trie syncer.
There was a problem hiding this comment.
Agreed. Dropped the EVMNodeType enum and the node_type field from GetLeafRequest and added EVMAtomicLeafsRequestHandlerID as a separate handler ID. Subnet-evm registers the three EVM-state-sync IDs (leaf + code + block). Coreth registers the atomic-leaf ID in addition.
| syncpb "github.com/ava-labs/avalanchego/proto/pb/sync" | ||
| ) | ||
|
|
||
| func TestClient_Send(t *testing.T) { |
There was a problem hiding this comment.
I don't really understand what this is testing. This package added a Client type, with a NewClient function.
This test does not use NewClient or the Client type. What is this test testing?
There was a problem hiding this comment.
I deleted all client_test.go files. To give some context on why they appeared here was because when I was moving around code and deciding on doing the group by context structure - code, block, evmstate I was trying to extract some code pieces that belonged at different layers and I extracted these tests originally from the dispatcher and then changed some types, etc., so now I also realize that they are not needed anymore hence why I removed them.
| // Dispatcher is a typed synchronous client bound to one handler ID. | ||
| // Use one instance per RPC type. | ||
| type Dispatcher[Req, Resp proto.Message] struct { | ||
| client *p2p.Client | ||
| peers *p2p.PeerTracker | ||
| } |
There was a problem hiding this comment.
Why is this type generic? It doesn't actually use the generic types (aka, removing the generic types doesn't actual change any call patterns or behavior).
I had expected SendTo to return a Resp, but since it just takes on in, the generic values aren't technically needed...
They might be used later (or maybe the stronger type enforcement is nice so we don't have to worry about passing in an unexpected message type somewhere), but it felt a bit odd to me.
There was a problem hiding this comment.
Two reasons:
- The compiler catches type mismatches at the call site instead of letting them fail at
proto.Unmarshal. - The per-context packages (
block,code,evmstate) use the type params to express their typed client alias. Without generics,block.Clientandcode.Clientwould both be the same untypedDispatcher. The runtime is identical, but the API surface and ergonomics aren't.
Also I tried an interface-based approach first but ended up duplicating the same implementation across per-RPC clients. Since the only thing that varies between them is the request and response type, generics felt like the right tool.
| ErrNoPeers = errors.New("no peers available") | ||
| ErrSendRequest = errors.New("send request") | ||
| ErrHandlerFailed = errors.New("handler request failed") | ||
| ErrUnmarshalResponse = errors.New("unmarshal response") |
There was a problem hiding this comment.
It also isn't really clear to me why these are exported. Exporting what could be private internals so that we can test the package at the exported boundary to enforce that we don't test private internals seems like the worst of both worlds to me.
Why this should be merged
Check #5351.
How this works
Adds protobuf wire format for EVM state-sync requests (leaf ranges, contract code, block batches) in
proto/sync/sync.proto, with dedicated p2p handler IDs.Adds
vms/evm/sync/networkwith:Dispatcher[Req, Resp proto.Message]: a typed pluggable synchronous client over one handler ID. External consumers register their own RPC types by instantiating against a new handler ID.Outcomereturned fromSend/SendTofor caller-driven peer scoring. The application validates response content (range proof, code-hash match, block chain integrity) and callsoutcome.Success()oroutcome.Failure().Per-RPC packages each owning a typed
Clientalias andNewClient:vms/evm/sync/blockvms/evm/sync/codevms/evm/sync/evmstateFuture
handler.goandsyncer.gofiles for the server side and syncers will land alongside forming a context oriented package layout rather than a layer oriented one.How this was tested
Dispatcherusage scenarios.Need to be documented in RELEASES.md?
resolves #5351
Signed-off-by: Tsvetan Dimitrov (tsvetan.dimitrov@avalabs.org)