Skip to content

feat(vms/evm/sync): add protobuf wire types and a network client#5352

Open
powerslider wants to merge 30 commits into
masterfrom
powerslider/5351-sync-proto-messages
Open

feat(vms/evm/sync): add protobuf wire types and a network client#5352
powerslider wants to merge 30 commits into
masterfrom
powerslider/5351-sync-proto-messages

Conversation

@powerslider
Copy link
Copy Markdown
Contributor

@powerslider powerslider commented May 7, 2026

Why this should be merged

Check #5351.

How this works

  • Adds protobuf wire format for EVM state-sync requests (leaf ranges, contract code, block batches) in proto/sync/sync.proto, with dedicated p2p handler IDs.

  • Adds vms/evm/sync/network with:

    • Dispatcher[Req, Resp proto.Message]: a typed pluggable synchronous client over one handler ID. External consumers register their own RPC types by instantiating against a new handler ID.
    • Outcome returned from Send / SendTo for caller-driven peer scoring. The application validates response content (range proof, code-hash match, block chain integrity) and calls outcome.Success() or outcome.Failure().
  • Per-RPC packages each owning a typed Client alias and NewClient:

    • vms/evm/sync/block
    • vms/evm/sync/code
    • vms/evm/sync/evmstate

    Future handler.go and syncer.go files for the server side and syncers will land alongside forming a context oriented package layout rather than a layer oriented one.

How this was tested

  • UT covering Dispatcher usage scenarios.
  • Per-RPC UT in each context package exercising the alias end-to-end.

Need to be documented in RELEASES.md?

resolves #5351

Signed-off-by: Tsvetan Dimitrov (tsvetan.dimitrov@avalabs.org)

@powerslider powerslider self-assigned this May 7, 2026
Copilot AI review requested due to automatic review settings May 7, 2026 11:36
@powerslider powerslider requested a review from a team as a code owner May 7, 2026 11:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a protobuf-based wire format and a dedicated p2p request client for EVM state-sync, including new handler IDs to route leaf-range, contract code, and block-batch requests over separate protocols.

Changes:

  • Added vms/evm/sync/network.Client that marshals/unmarshals protobuf messages and selects peers via p2p.PeerTracker with bandwidth scoring.
  • Defined new protobuf messages/enums in proto/sync/sync.proto (and regenerated sync.pb.go) for EVM state-sync request/response payloads.
  • Reserved new dedicated p2p handler IDs for EVM state-sync request types.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
vms/evm/sync/network/client.go New protobuf-based synchronous p2p client with peer selection and bandwidth scoring.
vms/evm/sync/network/client_test.go Unit tests covering round-trip behavior and key failure paths.
vms/evm/sync/network/BUILD.bazel Bazel targets for the new network package and its tests.
proto/sync/sync.proto Adds EVM state-sync protobuf request/response definitions and NodeType.
proto/pb/sync/sync.pb.go Regenerated Go bindings for the updated sync protobuf schema.
network/p2p/handler.go Reserves new handler IDs for EVM state-sync request categories.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread vms/evm/sync/network/client.go Outdated
- Defines the protobuf wire format for EVM state-sync requests
  (leaf ranges, contract code, block batches).
- Reserves dedicated p2p handler IDs.
- Add a Client that dispatches requests through them with peer-tracker-backed
  selection and bandwidth scoring on each round trip.

resolves #5351

Signed-off-by: Tsvetan Dimitrov (tsvetan.dimitrov@avalabs.org)
@powerslider powerslider force-pushed the powerslider/5351-sync-proto-messages branch from 032b5e2 to ce76330 Compare May 7, 2026 11:57
Comment thread vms/evm/sync/network/client.go Outdated
Comment thread vms/evm/sync/network/client.go Outdated
Comment thread vms/evm/sync/network/client.go Outdated
Comment thread vms/evm/sync/network/client.go Outdated
Comment thread vms/evm/sync/network/client.go Outdated
- Replaces the bundled `network.Client` with a generic
  `Dispatcher[Req, Resp proto.Message]` and per-RPC type aliases.
- External consumers can register their own RPC types by instantiating
  Dispatcher against a new handler ID without forking this package.
- Replaces the bundled `network.Client` with a generic
  `Dispatcher[Req, Resp proto.Message]` and per-RPC type aliases.
- External consumers can register their own RPC types by instantiating
  Dispatcher against a new handler ID without forking this package.
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment on lines +24 to +27
ErrNoPeers = errors.New("no peers available")
ErrSendRequest = errors.New("send request")
ErrHandlerFailed = errors.New("handler request failed")
ErrUnmarshalResponse = errors.New("unmarshal response")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these exported?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You resolved the comment, but didn't address it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check here #5352 (comment)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also isn't really clear to me why these are exported. Exporting what could be private internals so that we can test the package at the exported boundary to enforce that we don't test private internals seems like the worst of both worlds to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could agree and I'll unexport the sentinels, inline the helpers and switch to an internal package. However for follow-up rather than this PR: I'd prefer package network_test for this file long-term, because it exercises only the public API the way a real caller would. When I add the retrying-dispatcher follow-up and downstream syncers (leaf, code, block) will likely need the sentinels exported anyway to do errors.Is(err, ErrSendRequest) discrimination between transport-retriable and non-retriable failures. When those consumers land I'll have to re-export and move the tests back external. Out of scope here though.

Comment thread vms/evm/sync/network/dispatcher.go Outdated
- Send and SendTo now return an `*Outcome` the caller marks as Success or
  Failure after content validation (range proof verifies, code hashes
  match, blocks chain back to requested hash).
- Transport-level failures (no peers, send error, ctx, unmarshal) are
  still auto-registered internally and return a nil Outcome.
- Closes the gap where a peer returning random bytes that happen to
  unmarshal as a valid proto would score as a successful responder.
Comment thread proto/sync/sync.proto Outdated
Comment thread vms/evm/sync/network/clients_test.go Outdated
Comment thread proto/sync/sync.proto Outdated
Comment thread vms/evm/sync/network/clients.go Outdated
Comment thread vms/evm/sync/network/clients_test.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment thread vms/evm/sync/network/dispatcher_test.go Outdated
Comment thread vms/evm/sync/network/dispatcher_test.go Outdated
Comment thread vms/evm/sync/network/dispatcher_test.go Outdated
@github-project-automation github-project-automation Bot moved this to In Progress 🏗️ in avalanchego May 14, 2026
- Move per-RPC clients into block, code, and evmstate packages, each
  owning a Client alias and NewClient constructor.
- Change network.NewDispatcher to take a pre-built *p2p.Client, add
  network.NewClient as the production helper.
- Replace trackerSampler with a no-op sampler (Dispatcher always picks
  an explicit peer).
- Inline dispatch into SendTo.
- Drop nil-safety on Outcome.Success / Outcome.Failure. Keep sync.Once
  idempotency for the canonical defer-Failure + Success pattern.
- Add vms/evm/sync/synctest with EchoHandler, NewPeerTracker, and a
  generic NewDispatcher. Move network tests to package network_test.
@powerslider powerslider force-pushed the powerslider/5351-sync-proto-messages branch from 0c17b16 to 781ac59 Compare May 18, 2026 10:51
@powerslider powerslider changed the title feat(vms/evm/sync): add protobuf wire types and network.Client feat(vms/evm/sync): add protobuf wire types and a network client May 18, 2026
@powerslider powerslider force-pushed the powerslider/5351-sync-proto-messages branch from 781ac59 to b3a26bd Compare May 18, 2026 13:13
Comment thread proto/sync/sync.proto Outdated
Comment thread proto/sync/sync.proto Outdated
Comment thread vms/evm/sync/block/client_test.go Outdated
Comment thread vms/evm/sync/block/client_test.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment on lines +24 to +27
ErrNoPeers = errors.New("no peers available")
ErrSendRequest = errors.New("send request")
ErrHandlerFailed = errors.New("handler request failed")
ErrUnmarshalResponse = errors.New("unmarshal response")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You resolved the comment, but didn't address it

Comment thread vms/evm/sync/network/clients.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment thread vms/evm/sync/network/dispatcher_test.go
- Per-peer pinning isn't actually required for leaf-range continuations
  (the rootHash deterministically identifies the snapshot, so any peer
  with that root produces the same leaves for the same range).
- There is also no other usage of that returned NodeID.
Copy link
Copy Markdown
Contributor

@alarso16 alarso16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with two design choices made:

  1. The defer pattern for an Outcome. I think that always calling outcome.Failure() is an unintuitive pattern, decreases readability, and overly complicates the struct.
  2. Using the _test package seems to be unnecessary, and requires exporting the error types. Using the normal test package would help reduce the size of the supported API for this package and violates a team norm (I think).

@powerslider seems set on these, so if the other reviewer agrees with him, I can digress on these points

Comment thread vms/evm/sync/block/client_test.go Outdated
Copy link
Copy Markdown
Contributor

@JonathanOppenheimer JonathanOppenheimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments (and a lot of nits about godocs, I'm sorry lol)

Comment thread vms/evm/sync/block/client_test.go Outdated
Comment thread vms/evm/sync/network/dispatcher_test.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment thread vms/evm/sync/network/dispatcher_test.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
}

start := time.Now()
if err := d.client.AppRequest(ctx, set.Of(nodeID), requestBytes, onResponse); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@claude

When SendTo is called with an already-canceled or deadline-expired context, this still issues the AppRequest because p2p.Client.AppRequest strips cancellation with context.WithoutCancel; the subsequent select then returns ctx.Err() and registers a peer failure even though the peer did not fail. This can happen during shutdown or timed-out sync work, so check ctx.Err() before RegisterRequest/AppRequest.

This makes sense to me, although is it possible that SendTo could be called with an already expired context? If it can't, I would just comment the guarantee

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can happen on orchestrator-driven shutdown or when the sync loop's context is cancelled between iterations. Applied a guard.

Comment on lines +55 to +59
nodeID, ok := d.peers.SelectPeer()
if !ok {
return nil, ErrNoPeers
}
return d.SendTo(ctx, nodeID, req, resp)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if multiple sync workers call Send concurrently? The guards in SelectPeer don't seem sufficient because multiple workers could all see the same peer no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurrent Send calls can hit the same peer. That's fine. The p2p layer multiplexes AppRequests via request IDs, and PeerTracker scores each RegisterRequest/Outcome pair independently. The bandwidth heap naturally steers load if a peer becomes overloaded relative to others.

Comment thread network/p2p/handler.go
Comment on lines +31 to +33
EVMLeafsRequestHandlerID
EVMCodeRequestHandlerID
EVMBlockRequestHandlerID
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused as to how we are treating these protocol IDs.

Aren't these all part of the "EVMStateSync" protocol? Specifically, when would we want to register an EVMLeafsRequestHandlerID, but not an EVMCodeRequestHandlerID, or an EVMBlockRequestHandlerID?

Is this because we are trying to write code that composes either FirewoodProofHandlerID or EVMLeafsRequestHandlerID with EVMCodeRequestHandlerID and EVMBlockRequestHandlerID?

Or perhaps because we are trying to write code that will work with non-EVM syncing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread proto/sync/sync.proto Outdated
Comment on lines +16 to +26
// EVMNodeType selects which EVM-sync trie a leaf-range request walks.
// Distinct from Firewood/merkledb sync types in this file.
// - STATE_TRIE is the EVM account and storage trie: account balances,
// nonces, storage slots, and code hashes.
// - ATOMIC_TRIE is the C-Chain atomic operations trie used for cross-chain
// transfers between the C-Chain and the X-/P-Chains via shared memory.
enum EVMNodeType {
EVM_NODE_TYPE_UNSPECIFIED = 0;
EVM_NODE_TYPE_STATE_TRIE = 1;
EVM_NODE_TYPE_ATOMIC_TRIE = 2;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first read this type, I thought that EVMNodeType would refer to things like ExtensionNode, LeafNode, BranchNode - not the content of the trie.

Comment thread proto/sync/sync.proto Outdated
bytes start_key = 3;
bytes end_key = 4;
uint32 key_limit = 5;
EVMNodeType node_type = 6;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we decide to include whether the request is for the normal trie or the atomic trie in this request? I would have expected us to have a separate protocol ID for atomic trie syncing so that subnet-evm and coreth would share the normal state sync protocol (state-trie syncer + code syncer + block syncer) and then coreth would register an additional atomic-trie syncer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Dropped the EVMNodeType enum and the node_type field from GetLeafRequest and added EVMAtomicLeafsRequestHandlerID as a separate handler ID. Subnet-evm registers the three EVM-state-sync IDs (leaf + code + block). Coreth registers the atomic-leaf ID in addition.

Comment thread vms/evm/sync/block/client_test.go Outdated
syncpb "github.com/ava-labs/avalanchego/proto/pb/sync"
)

func TestClient_Send(t *testing.T) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand what this is testing. This package added a Client type, with a NewClient function.

This test does not use NewClient or the Client type. What is this test testing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted all client_test.go files. To give some context on why they appeared here was because when I was moving around code and deciding on doing the group by context structure - code, block, evmstate I was trying to extract some code pieces that belonged at different layers and I extracted these tests originally from the dispatcher and then changed some types, etc., so now I also realize that they are not needed anymore hence why I removed them.

Comment thread vms/evm/sync/code/client_test.go Outdated
Comment on lines +28 to +33
// Dispatcher is a typed synchronous client bound to one handler ID.
// Use one instance per RPC type.
type Dispatcher[Req, Resp proto.Message] struct {
client *p2p.Client
peers *p2p.PeerTracker
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this type generic? It doesn't actually use the generic types (aka, removing the generic types doesn't actual change any call patterns or behavior).

I had expected SendTo to return a Resp, but since it just takes on in, the generic values aren't technically needed...

They might be used later (or maybe the stronger type enforcement is nice so we don't have to worry about passing in an unexpected message type somewhere), but it felt a bit odd to me.

Copy link
Copy Markdown
Contributor Author

@powerslider powerslider May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons:

  1. The compiler catches type mismatches at the call site instead of letting them fail at proto.Unmarshal.
  2. The per-context packages (block, code, evmstate) use the type params to express their typed client alias. Without generics, block.Client and code.Client would both be the same untyped Dispatcher. The runtime is identical, but the API surface and ergonomics aren't.

Also I tried an interface-based approach first but ended up duplicating the same implementation across per-RPC clients. Since the only thing that varies between them is the request and response type, generics felt like the right tool.

Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment on lines +24 to +27
ErrNoPeers = errors.New("no peers available")
ErrSendRequest = errors.New("send request")
ErrHandlerFailed = errors.New("handler request failed")
ErrUnmarshalResponse = errors.New("unmarshal response")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also isn't really clear to me why these are exported. Exporting what could be private internals so that we can test the package at the exported boundary to enforce that we don't test private internals seems like the worst of both worlds to me.

Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Comment thread vms/evm/sync/network/dispatcher.go Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: In Progress 🏗️

Development

Successfully merging this pull request may close these issues.

Add protobuf wire types and network client for EVM state sync post-SAE

5 participants