rfc: secret sharing by davidweisse · Pull Request #2341 · edgelesssys/contrast

davidweisse · 2026-04-22T14:03:59Z

No description provided.

burgerdev

Thanks, great proposal!

daniel-weisse · 2026-04-28T13:21:51Z

+
+### UserAPI Changes
+
+For the implementation we can use a secret sharing scheme based on OpenBao's [Shamir](https://pkg.go.dev/github.com/openbao/openbao/sdk/v2/helper/shamir) go library.


Any specific reason for using OpenBao's version over the original Hashiscorp version?

@burgerdev wdyt?

Vault was relicensed to BSL: https://github.com/hashicorp/vault/blob/main/LICENSE. The headers still say MPL sometimes, but it's maybe safer to use the fork.

It also looks like Bao is better maintained here: openbao/openbao#182

For MarbleRun, we decided to vendor shamir.go because we didn't want to add a huge depdendency like Vault or OpenBao SDK to the go.mod just for this small, stable part of code.
We decided to take it from this commit: https://github.com/hashicorp/vault/tree/8d07273d14ae7f5a48cc96f66cc86615dea83390/shamir
It comes with the license next to it. These two files may just be copied like here https://github.com/edgelesssys/marblerun/tree/master/3rdparty/hashicorp/shamir
We also looked at the changes in OpenBao to it since then and concluded that none of them are needed.

daniel-weisse · 2026-04-28T13:31:34Z

+```
+
+This threshold defaults to the number of provided seed share owner public keys, so by default every seed share owner would need to be involved in the recovery process.
+The lowest value would be 1, in which case the behavior is the same as the current implementation, allowing every seed share owner to recover the Coordinator on their own.


Do note that SSS requires at least 2 shares, meaning the old implementation has to be kept for this case

burgerdev · 2026-05-04T09:12:47Z

+The `AuthorizeByManifest` function can now additionally take the seed shares as an argument to reconstruct the seed if the threshold is met and return the received seed share otherwise:
+
+```go
+type SecretSourceAuthorizer interface {
+    AuthorizeByManifest(ctx context.Context, manifest *manifest.Manifest, seedShares [][]byte) (seedengine.SeedEngine, crypto.PublicKey, []byte, error)
+}
+```


If the seed shares are not relevant for all implementations, wouldn't it make more sense to make this internal to seedAuthorizer?

That makes sense.

burgerdev

lgtm

msanft

Generally LGTM, but I fail to see the practical motivation for this.

msanft · 2026-05-06T09:21:32Z

+### UserAPI Changes
+
+For the implementation we can use a secret sharing scheme based on OpenBao's [Shamir](https://pkg.go.dev/github.com/openbao/openbao/sdk/v2/helper/shamir) go library.
+During the initial `contrast set`, the seed is split into shares using the specified threshold and the provided seed share owner public keys, and each share is encrypted with the corresponding public key and returned to the user.


Is it a goal of Contrast to allow up-/downscaling of seed share owners on a live deployment? At least the latter wouldn't work anymore with this, right?

Scaling down the seed share owners would need to generate a new seed (otherwise, assumptions about prior manifests' security might not hold anymore). At that point, it's probably easier to just start a fresh manifest history.

thomasten · 2026-05-06T14:00:57Z

+
+For now, we can say the solution is to scale down the Coordinator to a single instance.


This is also what we do in MarbleRun btw

thomasten · 2026-05-06T14:02:06Z

+
+### UserAPI Changes
+
+For the implementation we can use a secret sharing scheme based on OpenBao's [Shamir](https://pkg.go.dev/github.com/openbao/openbao/sdk/v2/helper/shamir) go library.


For MarbleRun, we decided to vendor shamir.go because we didn't want to add a huge depdendency like Vault or OpenBao SDK to the go.mod just for this small, stable part of code.
We decided to take it from this commit: https://github.com/hashicorp/vault/tree/8d07273d14ae7f5a48cc96f66cc86615dea83390/shamir
It comes with the license next to it. These two files may just be copied like here https://github.com/edgelesssys/marblerun/tree/master/3rdparty/hashicorp/shamir
We also looked at the changes in OpenBao to it since then and concluded that none of them are needed.

thomasten · 2026-05-06T14:04:03Z

+
+If the Coordinator is in recovery mode, this list is used to track the received seed shares via the `Recover` RPC of the user API.
+Peer recovery still works as before, since there are no seed shares involved.
+If a `Recover` call from the user is made where the newly received seed share combined with the already received shares meets the threshold, the seed is reconstructed and the Coordinator can be recovered normally.


How should the Coordinator behave if the wrong seed is reconstructed because one of the inputs was wrong?

I guess this would imply a malicious seed share owner? The Coordinator has no way to identify a wrong share, so it will create a new seed engine anyway. This should however be immediately noticeable when attesting the Coordinator with contrast verify because the root CA changed.

No, the signature of the latest transition is checked during recovery, so directly after the seed engine has been created. If the seed engine is wrong, the user that called Recover last will receive an error verifying latest transition.... Since the Coordinator doesn't know which seed share is incorrect, maybe we have to transition to a new state with no seed shares and start again?

Yes, if we can't verify the signature with the combined seed we should throw all shares away and start from scratch.

sespiros · 2026-05-15T10:08:36Z

+When the `Recover` RPC is called and the state is stale, the `ResetState` method of the guard is invoked, which takes a `SecretSourceAuthorizer` as an argument.
+This interface currently only has the method `AuthorizeByManifest`, which validates the peer given the manifest from the store and returns the new seed engine and mesh CA key.
+This is used both by the user API for `contrast recover` and by the mesh API during peer recovery.
+The `AuthorizeByManifest` function now additionally returns the reconstructed seed if the threshold is met and the received seed share otherwise:


sespiros · 2026-05-18T20:19:58Z

+## Background
+
+During an initial `contrast set` the secret seed is encrypted with each seed share owner's public key and returned to the user.
+This means that each seed share owner has access to the complete seed, and thus the ability to recover the Coordinator in case of a restart.
+The original idea was to use a secret sharing scheme to split the seed into multiple shares, and require a threshold of them to recover the seed.
+This would have the advantage of not giving any single seed share owner access to the complete seed.


The background section should mention the motivation behind this change more explicitly (by incorporating Markus' comment about mutually distrusting data owners or the more theoretical benefit of multi-party recovery).

Another necessity that also emerges then is that we need to extend our threat model to include a new actor, the less-than-fully-trusted seed share owner. This needs to be described either as part of this proposal or at least be documented as follow-up TODO to our threat model (security.md and threat model docs page).

sespiros · 2026-05-18T21:07:37Z

+    seedEngine    *seedengine.SeedEngine
+    ...
+
+    seedShares [][]byte // new field to track the received seed shares


Overloading State's stale state with another type of "staleness", for when we are in the middle of recovery, is confusing. A manifestation of this "confusion" is having to change AuthorizeByManifest's contract to behave differently depending on whether we have shares in flight or not.

Reading the code there, it seems to me that Guard could be a more suitable home for the intermediate shares until a threshold is met. An example sketch of a design that seems more straight-forward to me:

type Guard struct { ... partialMu sync.Mutex // nil when no user recovery is in flight. // Replaced as a whole on each share submission.. // Never mutated in place because callers retain the pointer as a snapshot after AddSeedShare returns. partialRecovery *PartialRecovery } // PartialRecovery is the immutable accumulator of seed shares submitted // during a multi-party user recovery. It is bound to a specific recovery // target (insecure latest transition + manifest) and a single salt. type PartialRecovery struct { TransitionHash [history.HashSize]byte // hist.GetLatestInsecure() at first submission Salt []byte // captured from the first submitter Shares map[ShareID][]byte // keyed by submitter pubkey. Treated as read-only } // ShareID identifies a seed share submitter by their hex-encoded RSA // public key, matching an entry in Manifest.SeedshareOwnerPubKeys. type ShareID = manifest.HexString func (g *Guard) AddSeedShare(ctx context.Context, callerID ShareID, share, salt []byte) (*PartialRecovery, error)

Then AddSeedShare would implement the logic that checks for errors, checks the threshold etc. If the threshold is met, it would reconstruct the seed, build a seedAuthorizer, call Guard.ResetState etc. It could be made to also return the complete seed, this is just an example.

We would clear PartialRecovery in case of either success or error. This is also in line with the other dicsussion about what to do is the wrong seed is reconstructed etc.

sespiros

Thanks for this proposal. A couple of comments inline. The 2 most important ones are:

the consideration around how this should affect our threat model
the suggestion to put the shares in Guard instead of State

The rest are really comments on the implementation.

sespiros · 2026-05-18T21:12:38Z

+
+```go
+type SecretSourceAuthorizer interface {
+    AuthorizeByManifest(ctx context.Context, manifest *manifest.Manifest) (seedengine.SeedEngine, crypto.PublicKey, []byte, error)


I believe this is meant to be *seedengine.SeedEngine and *ecdsa.PrivateKey but ideally see my other comment that suggests not changing this method's contract at all.

sespiros · 2026-05-18T21:39:45Z

+If the Coordinator is in recovery mode, this list is used to track the received seed shares via the `Recover` RPC of the user API.
+Peer recovery still works as before, since there are no seed shares involved.
+If a `Recover` call from the user is made where the newly received seed share combined with the already received shares meets the threshold, the seed is reconstructed and the Coordinator can be recovered normally.
+If the list contains some number of seed shares below the threshold and peer recovery triggers, the Coordinator is recovered via peer recovery as expected.


My understanding from the proposal section was that in the first version we would need to scale down the coordinator to one instance which would be in non-recovered state right? So how is it possible for peer recovery to trigger mid-user-recovery via shares?

Is there a scenario where while we are scaling down to one instance, a non-recovered coordinator with seed shares accumulated, does peer recover from a recovered peer before the recovered one gets torn down?

Should it be specified more like:

if a recovered coordinator exists, use peer recovery

if no recovered coordinator exists, scale down to one and perform threshold user recovery

don't use --force for threshold recovery while recovered peers exist

This is more of an edge case: if one recovered Coordinator and one non-recovered Coordinator exist (e.g., 2 Coordinators and I restart one), and I call recover with --force, the non-recovered Coordinator can accumulate seed shares before peer recovery triggers. I think otherwise the behavior is the same as before, i.e., prefer peer recovery when --force is not specified. Scaling down to 1 for user recovery should then be documented in the docs.

sespiros · 2026-05-18T21:46:20Z

+
+We implement a secret sharing scheme for the seed, and each seed share owner only has an encrypted seed share.
+To recover the Coordinator, a threshold of seed shares need to be combined to recover the seed.
+This is done by each seed share owner calling `contrast recover` with their encrypted seed share, and the Coordinator combines the shares and recovers the seed if the threshold is met.


Technically seed share owners call contrast recover with a plain text seed share (but over encrypted transport/aTLS).

The seed share is encrypted on disk, so from a user perspective we call contrast recover with the encrypted seed share which is then decrypted by the CLI before sending it to the Coordinator.

sespiros · 2026-05-18T21:53:39Z

+If the Coordinator has enough seed shares to recover the seed but one of the seed shares is invalid, through, for example, a malicious seed share owner, the following signature verification of the latest transition will fail.
+In that case, we don't know which seed shares are valid or invalid, so the only option is to reset the state and start the recovery process from scratch with no seed shares.
+
+The proto messages for the `Recover` API remain basically unchanged, now including the encrypted seed share instead of the encrypted seed:


Maybe this is obvious but I think it should be explicitly mentioned here that each share owner is expected to send the same Salt as part of RecoverRequest and that this should be validated by the coordinator (that's why I included the Salt in my suggested design change).

I have also made the comment in the proposal section but, wdym by "encrypted seed share", is this encrypted with something else that I have missed?

Yes, I forgot about that. It only makes sense to also validate the salt.

The proto paragraph needs rewriting: the recover request now includes the seed share (not encrypted), and the SeedShare message type belongs to the SetManifestResponse, where the encrypted seed share is correct.

sespiros · 2026-05-18T22:02:46Z

+If the Coordinator has enough seed shares to recover the seed but one of the seed shares is invalid, through, for example, a malicious seed share owner, the following signature verification of the latest transition will fail.
+In that case, we don't know which seed shares are valid or invalid, so the only option is to reset the state and start the recovery process from scratch with no seed shares.
+
+The proto messages for the `Recover` API remain basically unchanged, now including the encrypted seed share instead of the encrypted seed:


Looking at

contrast/internal/manifest/crypto.go

Line 95 in 44a892d

cipherText, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, pubKey, seed, []byte("seedshare"))

I wonder if we should have a more specific label for these new shares (I gather this is like aad in aead), that would also bind them to properties of their intended use like "intended to be used by sss", the threshold, total share count etc.

I see this as a robustness/ux improvement that would help in potential early mismatch detection/cleaner domain separation.

I think we can do something like "seedshare-k/n", for a k out of n threshold scheme. Both the user and the Coordinator have access to the manifest, so they both know the threshold and number of seed share owners. I'm not sure how useful this is though. This would only help when a user accidentally runs contrast recover with a seed share from a threshold scheme with different parameters. Maybe even "seedshare-k-{hash(seedshare owner keys)}".

But keep in mind that the untrusted seed share owner can simply choose the label himself and forge an incorrect share with the correct label, so this would only be a UX improvement, not adding any robustness in that scenario.

davidweisse requested a review from burgerdev April 22, 2026 14:04

davidweisse added the no changelog PRs not listed in the release notes label Apr 22, 2026

burgerdev reviewed Apr 23, 2026

View reviewed changes

Comment thread rfc/013-secret-sharing.md Outdated

Comment thread rfc/013-secret-sharing.md Outdated

Comment thread rfc/013-secret-sharing.md

davidweisse force-pushed the dav/rfc-13 branch from 126e4b9 to 8f6b9f9 Compare April 27, 2026 13:30

daniel-weisse reviewed Apr 28, 2026

View reviewed changes

davidweisse force-pushed the dav/rfc-13 branch from 8f6b9f9 to 27bdc9e Compare April 30, 2026 10:51

davidweisse requested a review from burgerdev April 30, 2026 10:52

burgerdev reviewed May 4, 2026

View reviewed changes

davidweisse force-pushed the dav/rfc-13 branch from 27bdc9e to 26f5d0b Compare May 4, 2026 10:10

sespiros self-requested a review May 5, 2026 12:05

burgerdev approved these changes May 6, 2026

View reviewed changes

burgerdev requested review from msanft and thomasten May 6, 2026 08:53

msanft reviewed May 6, 2026

View reviewed changes

msanft approved these changes May 6, 2026

View reviewed changes

thomasten reviewed May 6, 2026

View reviewed changes

rfc: secret sharing

fcbabbb

davidweisse force-pushed the dav/rfc-13 branch from 26f5d0b to fcbabbb Compare May 11, 2026 11:00

sespiros reviewed May 18, 2026

View reviewed changes


		### UserAPI Changes

		For the implementation we can use a secret sharing scheme based on OpenBao's [Shamir](https://pkg.go.dev/github.com/openbao/openbao/sdk/v2/helper/shamir) go library.


		For now, we can say the solution is to scale down the Coordinator to a single instance.

Conversation

davidweisse commented Apr 22, 2026

Uh oh!

burgerdev left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

burgerdev left a comment

Choose a reason for hiding this comment

Uh oh!

msanft left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sespiros May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sespiros May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sespiros left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

sespiros May 18, 2026 •

edited

Loading

sespiros May 18, 2026 •

edited

Loading