sae: Implement minimal C-Chain VM#5386
Conversation
e926c58 to
26ccbc8
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces a minimal, testable C-Chain VM implementation built on top of the SAE VM. It wires together C-Chain-specific block-building hooks, a cross-chain txpool, and an avax JSON-RPC service (plus client) used to issue/query Export/Import transactions, and adds integration/regression tests (including GetUTXOs pagination).
Changes:
- Added
vms/saevm/cchainVM wrapper that composes SAE + C-Chain hooks + cross-chain txpool +avaxRPC handler. - Implemented/verified
avaxAPI surface (IssueTx/GetAtomicTx/GetUTXOs) with pagination regression coverage. - Centralized goleak ignore options into
saetestand reused them across SAE/C-Chain tests.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| vms/saevm/saetest/goleak.go | Adds shared goleak ignore option helper for libevm goroutines. |
| vms/saevm/saetest/BUILD.bazel | Adds goleak dependency and includes new helper source. |
| vms/saevm/sae/vm_test.go | Switches TestMain goleak setup to shared saetest.GoleakOptions. |
| vms/saevm/sae/rpc.go | Extends SAE RPC chain backend with chain-head event subscription passthrough. |
| vms/saevm/sae/consensus.go | Exposes chain-head subscription and last-executed state accessor on SAE VM. |
| vms/saevm/sae/BUILD.bazel | Adds libevm dependency for newly exposed state reader return type. |
| vms/saevm/cchain/vm.go | New minimal C-Chain VM wrapper: initializes SAE, state, txpool, and mounts avax service. |
| vms/saevm/cchain/api.go | Implements avax JSON-RPC service + client (IssueTx/GetAtomicTx/GetUTXOs with pagination fix). |
| vms/saevm/cchain/hooks.go | Adds C-Chain hook implementation: block building, end-of-block ops, execution side effects. |
| vms/saevm/cchain/BUILD.bazel | Bazel targets for new cchain library and tests. |
| vms/saevm/cchain/api_test.go | Adds API-focused tests (invalid tx rejection, tx not found, UTXO pagination regression). |
| vms/saevm/cchain/hooks_test.go | Adds unit test for processing-block ancestor input traversal logic. |
| vms/saevm/cchain/vm_test.go | Adds end-to-end integration tests for Export/Import and processing-block build behavior. |
| vms/saevm/cchain/tx/codec.go | Exposes canonical marshal/parse helpers for UTXOs. |
| vms/saevm/cchain/tx/export.go | Uses shared UTXO marshal helper and exported ScaleAVAX. |
| vms/saevm/cchain/tx/import.go | Uses shared UTXO parse helper and exported ScaleAVAX. |
| vms/saevm/cchain/tx/tx.go | Exports X2C conversion constants/functions (X2CRate/ScaleAVAX). |
| vms/saevm/cchain/tx/tx_test.go | Updates tests to use new txtest UTXO marshal helper. |
| vms/saevm/cchain/tx/identifiers_test.go | Removes now-unneeded test-only exports and inline marshal helper. |
| vms/saevm/cchain/tx/txtest/wallet.go | Adds reusable test helpers (key gen, UTXO marshal/parse, transfer outputs, exported UTXOs). |
| vms/saevm/cchain/tx/txtest/cmp.go | Adds cmp option helper for order-insensitive UTXO comparisons. |
| go.mod | Promotes graft/evm dependency to direct (now imported by new code). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
alarso16
left a comment
There was a problem hiding this comment.
Honestly not a full review, my brain is getting sleepy
| return | ||
| } | ||
|
|
||
| for t := range b.potentialTxs { |
There was a problem hiding this comment.
Will you clarify why these specific checks are performed here, and why no other checks are necessary? For example, do we need to check for nonces, or that the payments line up?
There was a problem hiding this comment.
Update: add a short comment that all ethereum state will be checked in sae's block builder, and that all nonces will be verified by the mempool as well
There was a problem hiding this comment.
I added some comments. lmk what you think
| potentialTxs iter.Seq[*hookTx] | ||
| } | ||
|
|
||
| func (b *builder) BuildHeader(parent *types.Header) (*types.Header, error) { |
There was a problem hiding this comment.
I would put a comment somewhere around here with the specific fields that the hook MUST NOT touch, unless you think the burden of maintaining that list is high
There was a problem hiding this comment.
I think it would make sense to treat ParentHash and Number how we treat Root and GasUsed (aka - warn if they are set and clober them in SAE).
But I don't really feel like we need to make that change in this PR.
There was a problem hiding this comment.
I also added a comment.
| // TODO(StephenButtolph): Enforce the minimum block time here. | ||
| return customtypes.WithHeaderExtra( | ||
| &types.Header{ | ||
| ParentHash: parent.Hash(), |
There was a problem hiding this comment.
Why is this done in the by the hooks? I just mean, there's no reason that any other hook implementation would create a custom ParentHash. This is really a question of where we want complexity - if we want the hooks doing most of the work, this is fine, but I was under the impression that sae should take as much of the complexity as possible
There was a problem hiding this comment.
Yeah I think it would make sense for SAE to populate the ParentHash and the Number - like we do for other values. Although I don't think this PR is the right place to change this behavior.
There was a problem hiding this comment.
I think this is related to the answer on the thread about a comment. At the call site we've got some sense checks and warnings.
I agree that SAE should take as much of the complexity as possible, especially w.r.t. invariants, but not sure that expecting a parent hash is enough of a burden on a hook to warrant special-casing it.
There was a problem hiding this comment.
Only one of my comments is blocking, I'll approve once we figure out the test contexts. Almost of this PR is boilerplate and testing helpers that will continue to evolve, so I don't think we should block this review forever once we're reasonably confident that the hooks implementation is correct
| mempoolConfig := legacypool.DefaultConfig | ||
| // Treat all transactions equally regardless of submission source — no | ||
| // preferential admission or pricing for locally-submitted txs. | ||
| mempoolConfig.NoLocals = true |
There was a problem hiding this comment.
Not to be addressed in this PR, but how we handle all the different config options (e.g. verifying commit interval matches for mainnet) is unclear to me. I guess a level above this?
There was a problem hiding this comment.
My plan is to remove as many user configs as possible - but yes, we'll need to make sure everything aligns correctly.
| tb.Helper() | ||
|
|
||
| blk := s.buildVerifyAccept(tb) | ||
| require.NoErrorf(tb, blk.WaitUntilExecuted(tb.Context()), "%T.WaitUntilExecuted()", blk) |
There was a problem hiding this comment.
For some blocking calls where you're waiting on a context, it might be better to follow something like in sae:
func (s *SUT) context(tb testing.TB) context.Context {
return s.logger.CancelOnError(tb.Context())
}this will return a derived context that is canceled if an error is logged, so you will get a helpful error rather than a timeout
There was a problem hiding this comment.
I decided not to add this function, but I plumbed the contexts around and cancelled the root context on logging an error.
| } | ||
| } | ||
|
|
||
| if err := h.state.Apply(b.NumberU64(), txs); err != nil { |
There was a problem hiding this comment.
Could this run during an RPC transaction replay and persist the txs (incorrectly)? I think the protection in Apply is not sufficient -- it does not ensure the first writer is the actual executor. (Tracing is also not disabled by default IIRC.)
My understanding of what could happen:
- Joe Schmoe submits a transaction, and it gets included in Block
Awhich accepted but not yet executed.AcceptBlockwrites the block and tx lookup entries and then enqueues it for execution. - Joe then calls
debug_traceTransactionfor an his tx in that block. - The tracer calls
StateAtTransaction, which callssaexec.Execute. Executestill callsAfterExecutingBlock.AfterExecutingBlockcallsh.state.Apply(here!)State.Applywrites C-Chain tx indexes, etc.
To be clear this is possible in a scenario where A is accepted and visible to RPC, but the canonical executor has not run AfterExecutingBlock yet, so the current guard wouldn't fire. (I validated this can happen with a test in code)
The replay wrapper currently suppresses EndOfBlockOps, but I think it should also no-op AfterExecutingBlock.
With Claude's help, here's a test demonstrating the issue:
func TestDebugTraceReplayDoesNotApplyAtomicStateBeforeBlockExecution(t *testing.T) {
ethWallet := saetest.NewUNSAFEWallet(t, 1, types.LatestSigner(saetest.ChainConfig()))
ethSender := ethWallet.Addresses()[0]
exportKey := txtest.NewKey(t)
sut := newSUT(t, options.Func[sutConfig](func(c *sutConfig) {
c.genesis.Alloc = saetest.MaxAllocFor(
ethSender,
exportKey.EthAddress(),
)
}))
assertAtomicStateMissing := func(exportTx *tx.Tx, exportedUTXOs ...*avax.UTXO) {
t.Helper()
require.Equalf(t, uint64(0), sut.state.CurrentHeight(), "%T.CurrentHeight()", sut.state)
_, _, err := sut.state.GetTx(exportTx.ID())
require.ErrorIsf(t, err, database.ErrNotFound, "%T.GetTx(%s)", sut.state, exportTx.ID())
keys := make([][]byte, len(exportedUTXOs))
for i, utxo := range exportedUTXOs {
inputID := utxo.InputID()
keys[i] = inputID[:]
}
peerMemory := sut.memory.NewSharedMemory(sut.snowCtx.XChainID)
_, err = peerMemory.Get(snowtest.CChainID, keys)
require.ErrorIsf(t, err, database.ErrNotFound, "%T.Get()", peerMemory)
}
// Tracing needs an EVM transaction target.
recipient := common.Address{'r', 'e', 'c', 'v'}
tracedTx := ethWallet.SetNonceAndSign(t, 0, &types.LegacyTx{
To: &recipient,
Gas: ethparams.TxGas,
GasPrice: big.NewInt(1),
})
require.NoErrorf(t, sut.VM.GethRPCBackends().SendTx(t.Context(), tracedTx), "%T.SendTx(%#x)", sut.VM.GethRPCBackends(), tracedTx.Hash())
// Export gives us observable external state: the tx index and shared-memory UTXOs.
w := newWallet(exportKey, sut.snowCtx, sut.Client)
const (
txFee = 50
exportedAmount = 50
)
signedExport, export := w.newExportTx(
t,
sut.snowCtx.XChainID,
txFee,
txtest.NewTransferOutput(exportedAmount, exportKey.Address()),
)
require.NoErrorf(t, sut.IssueTx(t.Context(), signedExport), "%T.IssueTx()", sut.Client)
exportedUTXOs := txtest.ExportedUTXOs(signedExport.ID(), export)
// Build+verify only. Accepting would enqueue canonical execution.
blk := sut.buildVerify(t, sut.lastAccepted(t))
require.Lenf(t, blk.Transactions(), 1, "%T.Transactions()", blk)
require.Equalf(t, tracedTx.Hash(), blk.Transactions()[0].Hash(), "%T.Transactions()[0].Hash()", blk)
if diff := cmp.Diff([]*tx.Tx{signedExport}, blockTxs(t, blk), txtest.CmpOpt()); diff != "" {
t.Errorf("%T txs (-want +got):\n%s", blk, diff)
}
require.Falsef(t, blk.Executed(), "%T.Executed()", blk)
assertAtomicStateMissing(signedExport, exportedUTXOs...)
// debug_traceTransaction reaches this replay path after tx lookup. Replay
// must not publish C-Chain atomic state; only canonical execution should.
_, _, _, release, err := sut.VM.GethRPCBackends().StateAtTransaction(t.Context(), blk.EthBlock(), 0, 0)
require.NoErrorf(t, err, "%T.StateAtTransaction(...)", sut.VM.GethRPCBackends())
defer release()
require.Falsef(t, blk.Executed(), "%T.Executed()", blk)
assertAtomicStateMissing(signedExport, exportedUTXOs...)
// Canonical execution should publish the export.
require.NoErrorf(t, sut.AcceptBlock(t.Context(), blk), "%T.AcceptBlock()", sut.VM)
require.NoErrorf(t, blk.WaitUntilExecuted(t.Context()), "%T.WaitUntilExecuted()", blk)
sut.assertTxAccepted(t, signedExport, blk.NumberU64())
sut.assertUTXOsExist(t, sut.snowCtx.XChainID, exportedUTXOs...)
}There was a problem hiding this comment.
I also made sure this doesn't pass on the devnet-5, and I don't think this is covered by a TODO:
still get:
Error: Not equal:
expected: 0x0
actual : 0x1
``
There was a problem hiding this comment.
So, I don't think you can actually trigger this via the API. But this does feel problematic.
Specifically, the tracing logic requires that the parent block has been executed, and the APIs (afaict) do require the block to have been accepted.
Which I think precludes this issue. (Your PoC relies on executing a non-accepted block).
That being said... it feels close enough to a bug that something should probably change.
There was a problem hiding this comment.
Sorry I think we misunderstood each other. I fully agree (after trying!) that it's not possible to execute non-accepted blocks via the currently available RPCs. The only RPCs that touch execution are debug_traceCall and debug_traceTransaction, which require the block to be accepted. (As I said, I tried messing around with verified but not accepted blocks, you get a whole different class of errors -- not possible). As you said to me in the office, this does not preclude geth from deciding to add some new RPC which would allow this.
So, I don't think you can actually trigger this via the API.
If you're referring to this not being possible via the external API (rather than sut.VM.GethRPCBackends()), it is definitely possible, and I have done so.
If you check the below, the test is rewritten this to use the real JSON-RPC path through the HTTP handler. I did have to add a pause mechanism for execution to make the test deterministic. Without the pause, it's a race -- in production, the exposed debug RPC caller can of course still race the async executor. (and if there's an execution queue backlog, blocks are slow, node under high load etc. it'd be easier and easier).
Here's the branch if you wanna take a look (Claude again mostly for the test code): https://github.com/ava-labs/avalanchego/tree/JonathanOppenheimer/rpc-execute-blocks-before-canonical-execution.
There was a problem hiding this comment.
I do agree we should probably change the behavior of this. Even if it's not a bug, it does seem incredibly jank, and there's the risk of geth introducing something actually broken.
There was a problem hiding this comment.
I made the minimal diff to prevent this, but I suspect we'll want to have some other behavior in the long-term.
There was a problem hiding this comment.
I drafted a proposal of how we can separate read-only actions from durable side effects here: #5413. This is definitely outside of the scope of this PR, so just throwing it here as a possible follow up and long term solution.
There was a problem hiding this comment.
I made the minimal diff to prevent this, but I suspect we'll want to have some other behavior in the long-term.
That minimal diff being the change to rpc.noEndOfBlockOps.AfterExecutingBlock()?
| func ancestorInputIDs(h *types.Header, settled common.Hash, source saetypes.BlockSource) (set.Set[ids.ID], error) { | ||
| var s set.Set[ids.ID] | ||
| for h.ParentHash != settled { | ||
| parentNumber := h.Number.Uint64() - 1 |
There was a problem hiding this comment.
If settled is ever not in h's ancestry, the walk runs past genesis and h.Number.Uint64() - 1 wraps to MaxUint64. We still return errMissingBlock - so it's not a bug, but the message contains 18446744073709551615 (which is 2^64 - 1) instead of something a debugger can act on. A h.Number.Sign() == 0 check inside the loop makes the precondition explicit and turns the failure into a readable error.
| parentNumber := h.Number.Uint64() - 1 | |
| if h.Number.Sign() == 0 { | |
| return nil, fmt.Errorf("%w: walked past genesis without reaching settled %s", errMissingBlock, settled) | |
| } | |
| parentNumber := h.Number.Uint64() - 1 |
There was a problem hiding this comment.
I feel like this is really not protecting us from much. If we are iterating through the whole blockchain and hitting the genesis block, we have probably already completely bricked the node for hours (and probably OOMed)
If we want to worry about such a case, we could consider replacing lastSettledBlock common.Hash with lastSettledHeight uint64 so that we don't walk past whatever height the settled block is.
We would want to bail with an error or something in such a case though... Ideally with returning an error... But I suppose we could log an ERROR instead.
I actually want to see some of the testing changes I mentioned
| @@ -9,3 +12,101 @@ package_group( | |||
| name = "external_consumers", | |||
| packages = [], | |||
| ) | |||
There was a problem hiding this comment.
(out of PR scope) The package_group is unnecessary here and could be removed if the # gazelle directive is also changed to remove it. That would have a lot of flow-on effects so could be done in a different PR.
| // ScaleAVAX converts an amount denominated in nAVAX into the C-Chain's aAVAX | ||
| // denomination. | ||
| func scaleAVAX(nAVAX uint64) uint256.Int { | ||
| func ScaleAVAX(nAVAX uint64) uint256.Int { |
There was a problem hiding this comment.
I know this is pre-existing code but relying on units being captured by variable names is risky. Flagging even if we don't address it in this PR because I think it at least deserves thought and discussion.
There was a problem hiding this comment.
I have always used the types to understand the denomination::
uint64=>nAVAXuint256.Int=>aAVAX
The only reason the variable here is called nAVAX is because we are handling both types and I felt that was better than amountNAVAX and amountAAVAX haha
I could add an explicit:
type (
NAVAX = uint64
AAVAX = uint256.Int
)If you think that would improve the code
There was a problem hiding this comment.
I added it, although the number of places that we can actually use this isn't as many as I thought
| } | ||
|
|
||
| func (*hooks) CanExecuteTransaction(common.Address, *common.Address, libevm.StateReader) error { | ||
| return nil |
There was a problem hiding this comment.
Always nil because it's C-Chain, correct? I think at the very least this requires a comment to state that it MUST align with the respective libevm hook to be registered. Otherwise there's a small risk of catastrophic cargo culting.
There was a problem hiding this comment.
it MUST align with the respective libevm hook to be registered.
I don't think this is correct. In coreth it MAY align. In subnet-evm it MUST NOT align.
The libevm.StateReader provided to this function (worst-case state ~= settled state) is different than the libevm.StateReader provided to the libevm hook (current state). If subnet-evm were to provide the same function here, the chain would FATAL during execution.
| } | ||
| } | ||
|
|
||
| if err := h.state.Apply(b.NumberU64(), txs); err != nil { |
There was a problem hiding this comment.
I made the minimal diff to prevent this, but I suspect we'll want to have some other behavior in the long-term.
That minimal diff being the change to rpc.noEndOfBlockOps.AfterExecutingBlock()?
| // TODO(StephenButtolph): Enforce the minimum block time here. | ||
| return customtypes.WithHeaderExtra( | ||
| &types.Header{ | ||
| ParentHash: parent.Hash(), |
There was a problem hiding this comment.
I think this is related to the answer on the thread about a comment. At the call site we've got some sense checks and warnings.
I agree that SAE should take as much of the complexity as possible, especially w.r.t. invariants, but not sure that expecting a parent hash is enough of a burden on a hook to warrant special-casing it.
| }, | ||
| ), nil | ||
| } | ||
|
|
There was a problem hiding this comment.
(note to self) Reviewed to here.
f42a802 to
1612a0a
Compare
| }, | ||
| } | ||
|
|
||
| ctx := log.CancelOnError(tb.Context()) |
There was a problem hiding this comment.
q: Should you check if the context is canceled before you return? This means that the test failed in setup, which should probably be treated as fatal. It sounds kind of gross, so I'm not totally sure
There was a problem hiding this comment.
I don't think we gain much by checking. Technically we could assert this in a ton of different places.
If we run into a hard to debug issue where that would have helped... I think it would make sense to add it, but I don't think it makes a ton of sense to add it proactively here.
Why this should be merged
This PR factors out a minimally testable VM shell from the SAE PoC #5303. This provides a stable baseline for the remaining C-Chain implementation.
How this works
This PR introduces three major components, with a significant number of TODOs.
hooksandbuilderare added to implement thehook.Pointsinterface.serviceandClientare added to enable testing through the expected user usage points.VMis added to wire together the SAE VM, thehooks, and the API.During the implementation of the API, it was discovered that the prior API code did not correctly paginate UTXOs in
GetUTXOs. The prior implementation would return the last UTXO as the first UTXO of the next page. This meant that page sizes of1could loop forever. This edge case was addressed.In order to implement the backend interface required by the txpool, 2 new functions were exposed by SAE. These functions could technically be accessed through the
GethAPIBackends, but then we would need to provide a context (that is never used) and it just felt gross to use theAPIBackendsin non-API code.How this was tested
Exporttransactions.Importtransactions.GetUTXOpagination.Need to be documented in RELEASES.md?
I don't think so, but when we go remove coreth, we will need to update the docs site to reference a new MD file for the avax API in this folder.