Skip to content

Simplex reconfiguration framework - Part IV (MSM implementation - verification)#381

Open
yacovm wants to merge 2 commits into
mainfrom
reconfig-3c
Open

Simplex reconfiguration framework - Part IV (MSM implementation - verification)#381
yacovm wants to merge 2 commits into
mainfrom
reconfig-3c

Conversation

@yacovm
Copy link
Copy Markdown
Collaborator

@yacovm yacovm commented May 12, 2026

No description provided.

@yacovm yacovm force-pushed the reconfig-3c branch 2 times, most recently from bcb6d88 to d5dd2de Compare May 13, 2026 17:07
Comment thread msm/msm.go Outdated
Comment thread msm/msm.go Outdated
Comment thread msm/msm.go Outdated
Comment thread msm/msm.go Outdated
Comment thread msm/msm.go Outdated
Comment thread msm/verification.go Outdated
Comment on lines +136 to +138
if prev.NextPChainReferenceHeight == 0 {
return nil
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we need to do some verification, otherwise a malicious node can make us finalize a block without us ever checking NextEpochApprovals.

See this test I asked claude to write.

// TestNextEpochApprovalsVerifierHaltViaGarbageApprovals demonstrates the liveness halt
// that the missing nil-check in verifyNormal enables.
//
// Scenario:
//
//	A: Normal block, prev.NextPChainReferenceHeight == 0.
//	B: Normal block proposed by a malicious leader. B legitimately advances
//	   NextPChainReferenceHeight (validator set changed), and also injects an
//	   all-1s bitmask into NextEpochApprovals. With prev (A).NextPChainReferenceHeight
//	   == 0 the verifier early-returns and B is accepted.
//	B+1: Honest leader. prev (B).NextPChainReferenceHeight > 0 so verifyNormal runs
//	     the full path. areNextEpochApprovalsSignersSupersetOfApprovalsOfPrevBlock
//	     requires B+1 to be a superset of B's all-1s bitmask. No honest proposer
//	     can produce that, so no valid B+1 exists -> chain halts at B.
//
// This test will start failing at step (B) once verifyNormal is fixed to reject
// garbage approvals when prev.NextPChainReferenceHeight == 0, which is the desired
// behavior.
func TestNextEpochApprovalsVerifierHaltViaGarbageApprovals(t *testing.T) {
	validators := NodeBLSMappings{
		{BLSKey: []byte{1}, Weight: 1},
		{BLSKey: []byte{2}, Weight: 1},
		{BLSKey: []byte{3}, Weight: 1},
	}
	getValidators := func(uint64) (NodeBLSMappings, error) { return validators, nil }

	v := &nextEpochApprovalsVerifier{
		sigVerifier:          &testSigVerifier{},
		getValidatorSet:      getValidators,
		keyAggregator:        &testKeyAggregator{},
		sigAggregatorCreator: newSignatureAggregatorCreator(),
	}

	// Malicious B: triggers transition AND injects all-1s garbage approvals.
	garbageApprovals := &NextEpochApprovals{NodeIDs: []byte{0xFF}, Signature: []byte("garbage")}
	mdA := SimplexEpochInfo{NextPChainReferenceHeight: 0, PChainReferenceHeight: 50}
	mdB := SimplexEpochInfo{
		NextPChainReferenceHeight: 100,
		PChainReferenceHeight:     50,
		NextEpochApprovals:        garbageApprovals,
	}

	// Step 1: verify B against A. Today the verifier accepts B. After the fix this
	// should fail with errUnexpectedApprovals and the halt below becomes unreachable.
	errB := v.Verify(verificationInput{
		nextBlockType:   BlockTypeNormal,
		prevMD:          StateMachineMetadata{SimplexEpochInfo: mdA},
		proposedBlockMD: StateMachineMetadata{SimplexEpochInfo: mdB},
	})
	require.NoError(t, errB, "malicious B accepted today; once fixed, expect errUnexpectedApprovals")

	// Step 2: honest B+1 tries to continue. It supplies its own approval (bit 0),
	// which is the realistic best-case for an honest proposer that just observed
	// the transition. The superset check against B's all-1s bitmask rejects it.
	honestApprovals := &NextEpochApprovals{NodeIDs: []byte{0x01}, Signature: []byte("sig")}
	mdBPlus1 := SimplexEpochInfo{
		NextPChainReferenceHeight: 100,
		PChainReferenceHeight:     50,
		NextEpochApprovals:        honestApprovals,
	}

	errBPlus1 := v.Verify(verificationInput{
		nextBlockType:   BlockTypeNormal,
		prevMD:          StateMachineMetadata{SimplexEpochInfo: mdB},
		proposedBlockMD: StateMachineMetadata{SimplexEpochInfo: mdBPlus1},
	})
	require.ErrorIs(t, errBPlus1, errSignerSetShrunk, "chain is wedged: no honest B+1 can satisfy the superset check against B's garbage")
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so i think we need this check

	if prev.NextPChainReferenceHeight == 0 {
		if next.NextEpochApprovals != nil {
			return fmt.Errorf("%w but got %v", errUnexpectedApprovals, next.NextEpochApprovals)
		}
		return nil
	}

Comment thread msm/verification.go Outdated
prev, next := in.prevMD.SimplexEpochInfo, in.proposedBlockMD.SimplexEpochInfo

// An epoch number of 0 means this is not a Simplex block, so the next block should be the first Simplex block with epoch number 1.
if in.prevMD.SimplexEpochInfo.EpochNumber == 0 && in.proposedBlockMD.SimplexEpochInfo.EpochNumber != 1 {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can we ever reach this code path via the msm? if we are in this verifier that means the EpochNumber must be > 0. Is this just a defensive check?

Comment thread msm/verification.go Outdated
Comment on lines +497 to +500
prevBlock, _, err := v.getBlock(in.prevBlockSeq, md.Prev)
if err != nil {
return fmt.Errorf("failed retrieving block: %w", err)
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this additional lookup is just to check is prevBlock.InnerBlock != nil. We can remove this lookup if we just pass in the prevBlock we already queried in VerifyBlock

prevMD := prevBlock.Metadata

Comment thread msm/msm.go Outdated
&epochNumberVerifier{},
&validationDescriptorVerifier{
getValidatorSet: func(pChainHeight uint64) (NodeBLSMappings, error) {
return sm.Config.GetValidatorSet(pChainHeight)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to just pass in the validator set for the pChainHeight in the verification input rather than passing in the method to both validationDescriptorVerifier and nextPChainReferenceHeightVerifier? I guess it would also save a lookup too

Comment thread msm/verification.go Outdated
if next.SealingBlockSeq != 0 {
return fmt.Errorf("%w: expected 0 but got %d", errSealingBlockSeqMismatch, next.SealingBlockSeq)
}
case BlockTypeTelock:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a defensive check to make sure the telock case falls under one of these ifs?

switch {
		case prev.SealingBlockSeq > 0:
			if next.SealingBlockSeq != prev.SealingBlockSeq {
				return fmt.Errorf("%w: expected %d but got %d", errSealingBlockSeqMismatch, prev.SealingBlockSeq, next.SealingBlockSeq)
			}
		case prev.BlockValidationDescriptor != nil:
			md, err := simplex.ProtocolMetadataFromBytes(in.prevMD.SimplexProtocolMetadata)
			if err != nil {
				return fmt.Errorf("failed parsing protocol metadata: %w", err)
			}
			if next.SealingBlockSeq != md.Seq {
				return fmt.Errorf("%w: expected %d but got %d", errSealingBlockSeqMismatch, md.Seq, next.SealingBlockSeq)
			}
		default:
			return errInvalidTelockPredecessor

Comment thread msm/msm.go Outdated
Copy link
Copy Markdown
Collaborator

@samliok samliok May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it better to construct the verificationInput once before the loop?

@yacovm yacovm force-pushed the reconfig-3c branch 2 times, most recently from a492fe6 to 7d9705d Compare May 19, 2026 21:17
Comment thread msm/misc_test.go Outdated
return false
}

type testBlockStore map[uint64]StateMachineBlock
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think we should be testing anything other structs in misc.go in this file. When we bridge over these avalanchego utilites defined in misc.go this file should just be deleted.

Unfortunately I think this means moving everything starting from line 181 and onwards.

Comment thread msm/msm.go Outdated
errNilInnerBlock = errors.New("InnerBlock is nil")
errBuiltGenesisInnerBlock = errors.New("received a genesis block")
errZeroBlockParentNoInnerBlock = errors.New("failed constructing zero block: parent block has no inner block")
errNilBlock = errors.New("block is nil")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on errNilBlock and errNilInnerBlock being one error?

Comment thread msm/msm.go Outdated
errLastNonSimplexInnerBlockNil = errors.New("failed constructing zero block: last non-Simplex inner block is nil")
errInvalidProtocolMetadataSeq = errors.New("invalid ProtocolMetadata sequence number: should be > 0")
errUnknownState = errors.New("unknown state")
errNilInnerBlock = errors.New("InnerBlock is nil")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplex/msm/msm.go

Lines 247 to 250 in 7d9705d

func (sm *StateMachine) VerifyBlock(ctx context.Context, block *StateMachineBlock) error {
if block == nil {
return errNilInnerBlock
}

i think this is being used wrongly here. maybe we should just remove it and use errNilBlock instead

Comment thread msm/msm.go Outdated
errPChainReferenceHeightDecreased = errors.New("P-chain reference height is decreasing")
errValidatorSetUnchanged = errors.New("validator set unchanged; next P-chain reference height should not have advanced")
errPChainHeightNotReached = errors.New("haven't reached referenced P-chain height yet")
errUnknownBlockType = errors.New("unknown block type")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should remove this error and just use errUnknownState

Comment thread msm/msm.go Outdated
errUnknownState = errors.New("unknown state")
errNilInnerBlock = errors.New("InnerBlock is nil")
errBuiltGenesisInnerBlock = errors.New("received a genesis block")
errZeroBlockParentNoInnerBlock = errors.New("failed constructing zero block: parent block has no inner block")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this plus errParentInnerBlockHasNoInnerBlock can be merged into one

Comment thread msm/msm.go Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think this function should be part of StateMachine

Comment thread msm/msm.go
Comment thread msm/msm.go
Comment thread msm/msm.go Outdated
if err != nil {
sm.Logger.Error("Error retrieving previous sealing block", zap.Uint64("seq", simplexEpochInfo.EpochNumber), zap.Error(err))
return nil, fmt.Errorf("failed to retrieve previous sealing InnerBlock at epoch %d: %w", simplexEpochInfo.EpochNumber-1, err)
return SimplexEpochInfo{}, fmt.Errorf("failed to retrieve previous sealing InnerBlock at epoch %d: %w", simplexEpochInfo.EpochNumber-1, err)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we do EpochNumber - 1 here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, damn auto-complete

Comment thread msm/msm.go Outdated
pChainHeightBuff := make([]byte, 8)
binary.BigEndian.PutUint64(pChainHeightBuff, pChainHeight)

if err := sm.SignatureVerifier.VerifySignature(next.NextEpochApprovals.Signature, pChainHeightBuff, aggPK); err != nil {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a domain separator to this signature? similarly to what we do for quorum certificates?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I was aware of it but kept postponing it because we only introduce signing at the next PR so didn't felt like it's needed at this point, but might as well do it now

Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
Copy link
Copy Markdown
Collaborator

@samliok samliok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was one thing confusing me but I couldn't put my finger on it. I think the question that I have been stumped on is "How do we find the first ever simplex block?"

I know we have the FirstEverSimplexBlock callback defined in StateMachine but I don't get how this function would be implemented in practice. We would need to initialize our engine to find the first ever simplex block which is either going to be seq 1 or seq lastSnowmanBlock + 1. If we started from genesis this is easy, but I'm unsure of how we plan on storing lastSnowmanBlock? Iterating through all the blocks to find this seems slow.

Also if I'm a non-validator looking to replicate epoch 1, I must validate the first ever simplex block before any other blocks in that epoch. But the question again is "how can i query for the first ever simplex block"? If another epoch has been created then I can just use PrevSealingBlockHash, but if no other epoch has been created what do I do?

// PrevSealingBlockHash is the hash of the sealing block of the previous epoch.
	// It is empty for the first epoch, and the second epoch has the PrevSealingBlockHash set to be
	// the hash of the first ever block built by the StateMachine.
	PrevSealingBlockHash [32]byte `canoto:"fixed bytes,3"`

Copy link
Copy Markdown
Collaborator

@samliok samliok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another question regarding the MSM + non validators. The MSM is to "hijack" BlockBuilding and Verification. For non-validators we do not care about BlockBuilding but we probably want to use the MSM for the same verification logic. Does it make sense to re-use the MSM struct for both non-validators and validators, even though non-validators do not care about BlockBuilding

@samliok
Copy link
Copy Markdown
Collaborator

samliok commented May 22, 2026

There was one thing confusing me but I couldn't put my finger on it. I think the question that I have been stumped on is "How do we find the first ever simplex block?"

I know we have the FirstEverSimplexBlock callback defined in StateMachine but I don't get how this function would be implemented in practice. We would need to initialize our engine to find the first ever simplex block which is either going to be seq 1 or seq lastSnowmanBlock + 1. If we started from genesis this is easy, but I'm unsure of how we plan on storing lastSnowmanBlock? Iterating through all the blocks to find this seems slow.

Also if I'm a non-validator looking to replicate epoch 1, I must validate the first ever simplex block before any other blocks in that epoch. But the question again is "how can i query for the first ever simplex block"? If another epoch has been created then I can just use PrevSealingBlockHash, but if no other epoch has been created what do I do?

// PrevSealingBlockHash is the hash of the sealing block of the previous epoch.
	// It is empty for the first epoch, and the second epoch has the PrevSealingBlockHash set to be
	// the hash of the first ever block built by the StateMachine.
	PrevSealingBlockHash [32]byte `canoto:"fixed bytes,3"`

It would be nice in some regards(and bad in others) if the first ever epoch was the sequence number of the first simplex block. In the genesis case the first epoch will be epoch 1. In the case of 100 snowman blocks -> simplex, the first epoch will be epoch 101 because the sequence of the first simplex block is 101.

This would be nice because if we are in the first epoch we can now know which sequence number to fetch to get to the start of the epoch. However, because the epoch number is not 1, we don't know for sure if we are in the first epoch. This is an easy problem because we can just check if that sequence is a sealing block, or check if PrevSealingBlockHash is empty & BlockValidatorDescriptor is present. In these cases we know we have hit the first ever simplex block.

Another nice benefit of this change is the branching in createSealingBlock can go away PLUS the weird callback that I was confused about in my earlier comment. This is because the logic in the if simplexEpochInfo.EpochNumber > 1 is applicable to both scenarios

	// If this is not the first epoch, and this is the sealing block, we set the hash of the previous sealing block.
	if simplexEpochInfo.EpochNumber > 1 {
		prevSealingBlock, finalization, err := sm.GetBlock(simplexEpochInfo.EpochNumber, [32]byte{})
		if err != nil {
			sm.Logger.Error("Error retrieving previous sealing block", zap.Uint64("seq", simplexEpochInfo.EpochNumber), zap.Error(err))
			return nil, fmt.Errorf("failed to retrieve previous sealing InnerBlock at epoch %d: %w", simplexEpochInfo.EpochNumber-1, err)
		}
		if finalization == nil {
			sm.Logger.Error("Previous sealing block is not finalized", zap.Uint64("seq", simplexEpochInfo.EpochNumber))
			return nil, fmt.Errorf("previous sealing InnerBlock at epoch %d is not finalized", simplexEpochInfo.EpochNumber-1)
		}
		simplexEpochInfo.PrevSealingBlockHash = prevSealingBlock.Digest()
	} else { // Else, this is the first epoch, so we use the hash of the first ever Simplex block.
		firstSimplexBlock := sm.FirstEverSimplexBlock()
		if firstSimplexBlock == nil {
			return nil, fmt.Errorf("first ever Simplex block is not set, but attempted to create a sealing block for the first epoch")
		}
		simplexEpochInfo.PrevSealingBlockHash = firstSimplexBlock.Digest()
	}

These are just my thoughts, not a concrete suggestion but they could be worth thinking about.

@yacovm
Copy link
Copy Markdown
Collaborator Author

yacovm commented May 24, 2026

another question regarding the MSM + non validators. The MSM is to "hijack" BlockBuilding and Verification. For non-validators we do not care about BlockBuilding but we probably want to use the MSM for the same verification logic. Does it make sense to re-use the MSM struct for both non-validators and validators, even though non-validators do not care about BlockBuilding

I am not sure why a non-validator should use the verification function of the MSM. It checks the state transition is legal, but there is no need to do that because as a non-validator, you only care about finalized blocks, which anyway carry an aggregated signature.

@yacovm
Copy link
Copy Markdown
Collaborator Author

yacovm commented May 24, 2026

There was one thing confusing me but I couldn't put my finger on it. I think the question that I have been stumped on is "How do we find the first ever simplex block?"
I know we have the FirstEverSimplexBlock callback defined in StateMachine but I don't get how this function would be implemented in practice. We would need to initialize our engine to find the first ever simplex block which is either going to be seq 1 or seq lastSnowmanBlock + 1. If we started from genesis this is easy, but I'm unsure of how we plan on storing lastSnowmanBlock? Iterating through all the blocks to find this seems slow.
Also if I'm a non-validator looking to replicate epoch 1, I must validate the first ever simplex block before any other blocks in that epoch. But the question again is "how can i query for the first ever simplex block"? If another epoch has been created then I can just use PrevSealingBlockHash, but if no other epoch has been created what do I do?

// PrevSealingBlockHash is the hash of the sealing block of the previous epoch.
	// It is empty for the first epoch, and the second epoch has the PrevSealingBlockHash set to be
	// the hash of the first ever block built by the StateMachine.
	PrevSealingBlockHash [32]byte `canoto:"fixed bytes,3"`

It would be nice in some regards(and bad in others) if the first ever epoch was the sequence number of the first simplex block. In the genesis case the first epoch will be epoch 1. In the case of 100 snowman blocks -> simplex, the first epoch will be epoch 101 because the sequence of the first simplex block is 101.

This would be nice because if we are in the first epoch we can now know which sequence number to fetch to get to the start of the epoch. However, because the epoch number is not 1, we don't know for sure if we are in the first epoch. This is an easy problem because we can just check if that sequence is a sealing block, or check if PrevSealingBlockHash is empty & BlockValidatorDescriptor is present. In these cases we know we have hit the first ever simplex block.

Another nice benefit of this change is the branching in createSealingBlock can go away PLUS the weird callback that I was confused about in my earlier comment. This is because the logic in the if simplexEpochInfo.EpochNumber > 1 is applicable to both scenarios

	// If this is not the first epoch, and this is the sealing block, we set the hash of the previous sealing block.
	if simplexEpochInfo.EpochNumber > 1 {
		prevSealingBlock, finalization, err := sm.GetBlock(simplexEpochInfo.EpochNumber, [32]byte{})
		if err != nil {
			sm.Logger.Error("Error retrieving previous sealing block", zap.Uint64("seq", simplexEpochInfo.EpochNumber), zap.Error(err))
			return nil, fmt.Errorf("failed to retrieve previous sealing InnerBlock at epoch %d: %w", simplexEpochInfo.EpochNumber-1, err)
		}
		if finalization == nil {
			sm.Logger.Error("Previous sealing block is not finalized", zap.Uint64("seq", simplexEpochInfo.EpochNumber))
			return nil, fmt.Errorf("previous sealing InnerBlock at epoch %d is not finalized", simplexEpochInfo.EpochNumber-1)
		}
		simplexEpochInfo.PrevSealingBlockHash = prevSealingBlock.Digest()
	} else { // Else, this is the first epoch, so we use the hash of the first ever Simplex block.
		firstSimplexBlock := sm.FirstEverSimplexBlock()
		if firstSimplexBlock == nil {
			return nil, fmt.Errorf("first ever Simplex block is not set, but attempted to create a sealing block for the first epoch")
		}
		simplexEpochInfo.PrevSealingBlockHash = firstSimplexBlock.Digest()
	}

These are just my thoughts, not a concrete suggestion but they could be worth thinking about.

Yeah I like this idea. 👍

@yacovm yacovm force-pushed the reconfig-3c branch 6 times, most recently from e03f392 to 7e422d1 Compare May 27, 2026 19:04
Signed-off-by: Yacov Manevich <yacov.manevich@avalabs.org>
Copy link
Copy Markdown
Collaborator

@samliok samliok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mainly cosmetic things, except for the question about telocks. probably ready to be approved once resolved

Comment thread msm/msm.go
func (sm *StateMachine) verifyBlockZero(block *StateMachineBlock, prevBlock StateMachineBlock) error {
if block == nil {
return fmt.Errorf("block is nil")
return errNilBlock
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have this check already in verifyBlock

Comment thread msm/msm.go

var (
errLastNonSimplexInnerBlockNil = errors.New("failed constructing zero block: last non-Simplex inner block is nil")
errInvalidProtocolMetadataSeq = errors.New("invalid ProtocolMetadata sequence number: should be > 0")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we never check for this error. should we combine

	errInvalidProtocolMetadataSeq     = errors.New("invalid ProtocolMetadata sequence number: should be > 0")
	errInvalidSimplexEpochInfo        = errors.New("invalid SimplexEpochInfo")

into one error maybe errSeqIsGenesis?

Comment thread msm/msm.go
if finalization != nil {
isSealingBlockFinalized = true
} else {
sm.Logger.Debug("Previous sealing block not finalized yet, "+
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we combine this log with the one on line 336? The sealingBlockSeq log is just the epoch number which we log before this call.

	sm.Logger.Debug("Block building decision",
		zap.Bool("build inner block", decisionToBuildBlock.buildInnerBlock),
		zap.Bool("transition epoch", decisionToBuildBlock.transitionEpoch),
		zap.Uint64("P-chain height", decisionToBuildBlock.pChainHeight),
		zap.Bool("SealingBlockFinalized", isSealingBlockFinalized))

Comment thread msm/msm.go
func (sm *StateMachine) buildBlockOrTransitionEpoch(ctx context.Context, parentBlock StateMachineBlock, simplexMetadata, simplexBlacklist []byte, newSimplexEpochInfo SimplexEpochInfo) (*StateMachineBlock, error) {
var isSealingBlockFinalized bool
sealingBlockSeq := parentBlock.Metadata.SimplexEpochInfo.EpochNumber
_, finalization, err := sm.GetBlock(sealingBlockSeq, [32]byte{})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no action: BlockBuilding runs in its own go-routine, therefore it's possible the sealing block was not finalized when this was called but become shortly after. In this case, I think we will fail verifying the block later on which should be fine

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we fail verifying the block later on? If the sealing block wasn't finalized when this is called, then we will just not start the epoch transition. It's perfectly legal to not opt-in and wait for some other node to start it.

Comment thread msm/msm.go
zap.Uint64("P-chain height", decisionToBuildBlock.pChainHeight))

if decisionToBuildBlock.transitionEpoch {
if decisionToBuildBlock.transitionEpoch && isSealingBlockFinalized {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure i get this change though. Don't we build telocks if the sealing block is not finalized?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regardless, should we use isSealingBlockFinalizaed() instead ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure i get this change though. Don't we build telocks if the sealing block is not finalized?

This is actually to make sure that the zero block is finalized before we start transitioning to a new epoch.

But looking at this, I think we need to add a check for that in the verification code + a comment, because this is indeed unclear.

regardless, should we use isSealingBlockFinalizaed() instead ?

Not sure what you mean by (), can you elaborate?

Comment thread msm/msm.go

return block.InnerBlock.Verify(ctx)
func (sm *StateMachine) verifyEpochNumber(block *StateMachineBlock) error {
md, err := simplex.ProtocolMetadataFromBytes(block.Metadata.SimplexProtocolMetadata)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already fetch the protocol metadata from bytes on line 232. should we just pass that in here instead of re-parsing?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah but then I'll need to ripple that all over the place.

I think it's actually easier to keep it as is for now, because once we canoto-tize the protocol metadata, we won't store them as bytes anymore in the block.Metadata.SimplexProtocolMetadata but as regular fields.

Comment thread msm/msm.go

}

func computeSimplexEpochInfoForNewEpoch(newSimplexEpochInfo SimplexEpochInfo, parentBlock StateMachineBlock, sealingBlockSeq uint64, prevBlockSeq uint64) SimplexEpochInfo {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we updating newSimplexEpochInfo and also returning it?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

artifact of refactoring, will fix.

Comment thread msm/msm.go
}

isSealingBlockFinalized := finalization != nil
newSimplexEpochInfo := computeSimplexEpochInfoForTelock(parentBlock, sealingBlockSeq, prevBlockSeq)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just set this if the block is a tealock? i.e in the if statement right below?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we anyway override it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants