-
Notifications
You must be signed in to change notification settings - Fork 20
rfc: secret sharing #2341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
rfc: secret sharing #2341
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
| @@ -0,0 +1,99 @@ | ||||
| # Secret Sharing | ||||
|
|
||||
| ## Background | ||||
|
|
||||
| During an initial `contrast set` the secret seed is encrypted with each seed share owner's public key and returned to the user. | ||||
| This means that each seed share owner has access to the complete seed, and thus the ability to recover the Coordinator in case of a restart. | ||||
| The original idea was to use a secret sharing scheme to split the seed into multiple shares, and require a threshold of them to recover the seed. | ||||
| This would have the advantage of not giving any single seed share owner access to the complete seed. | ||||
|
Comment on lines
+3
to
+8
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The background section should mention the motivation behind this change more explicitly (by incorporating Markus' comment about mutually distrusting data owners or the more theoretical benefit of multi-party recovery). Another necessity that also emerges then is that we need to extend our threat model to include a new actor, the less-than-fully-trusted seed share owner. This needs to be described either as part of this proposal or at least be documented as follow-up TODO to our threat model (security.md and threat model docs page). |
||||
| It also comes with the drawback that the seed share owners would need to coordinate to recover the seed, which adds complexity to the recovery process. | ||||
|
|
||||
| ## Proposal | ||||
|
burgerdev marked this conversation as resolved.
|
||||
|
|
||||
| We implement a secret sharing scheme for the seed, and each seed share owner only has an encrypted seed share. | ||||
| To recover the Coordinator, a threshold of seed shares need to be combined to recover the seed. | ||||
| This is done by each seed share owner calling `contrast recover` with their encrypted seed share, and the Coordinator combines the shares and recovers the seed if the threshold is met. | ||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Technically seed share owners call
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The seed share is encrypted on disk, so from a user perspective we call |
||||
|
|
||||
| This approach only works if there is only one Coordinator instance, since `contrast recover` only connects to one Coordinator instance. | ||||
| If there are multiple instances, the seed shares would need to be distributed to all of them. | ||||
| In the current peer recovery process, Coordinators that enter a stale state call the `Recover` RPC of the mesh API of other Coordinators to recover. | ||||
| In order to distribute the seed shares to all instances however, we need an RPC in the opposite direction, where the Coordinator in recovery can send the received seed shares to the other Coordinators. | ||||
|
|
||||
| When implementing this, we need to additionally consider concurrent recovery attempts on different Coordinator instances. | ||||
| If 2 seed shares remain to meet the threshold and 2 different seed share owners call `contrast recover` at the same time, both Coordinators have to update each other's state without running into an error. | ||||
|
|
||||
| For now, we can say the solution is to scale down the Coordinator to a single instance. | ||||
|
Comment on lines
+24
to
+25
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is also what we do in MarbleRun btw |
||||
|
|
||||
| ### Manifest Changes | ||||
|
|
||||
| In the manifest, we add an additional field `SeedShareOwnerThreshold` which specifies the number of seed shares that need to be combined to recover the seed: | ||||
|
|
||||
| ```json | ||||
| { | ||||
| ... | ||||
| "SeedShareOwnerThreshold": 3, | ||||
| "SeedShareOwnerPubKeys": [ ... ] | ||||
| } | ||||
| ``` | ||||
|
|
||||
| This threshold defaults to the number of provided seed share owner public keys, so by default every seed share owner would need to be involved in the recovery process. | ||||
| The lowest value would be 1, in which case the behavior is the same as the current implementation, allowing every seed share owner to recover the Coordinator on their own. | ||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do note that SSS requires at least 2 shares, meaning the old implementation has to be kept for this case |
||||
| Like the seed share owner public keys, this field can't be updated after the initial `contrast set`. | ||||
|
|
||||
| ### UserAPI Changes | ||||
|
|
||||
| For the implementation we can use a secret sharing scheme based on [HashiCorp's](https://github.com/hashicorp/vault/tree/8d07273d14ae7f5a48cc96f66cc86615dea83390/shamir) or [OpenBao's](https://pkg.go.dev/github.com/openbao/openbao/sdk/v2/helper/shamir) Shamir go library. | ||||
| During the initial `contrast set`, the seed is split into shares using the specified threshold and the provided seed share owner public keys, and each share is encrypted with the corresponding public key and returned to the user. | ||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it a goal of Contrast to allow up-/downscaling of seed share owners on a live deployment? At least the latter wouldn't work anymore with this, right?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Scaling down the seed share owners would need to generate a new seed (otherwise, assumptions about prior manifests' security might not hold anymore). At that point, it's probably easier to just start a fresh manifest history. |
||||
|
|
||||
| For recovery, the Coordinator needs to track the received seed shares and combine them when the threshold is met. | ||||
| For this, we add a new field to the `State` struct: | ||||
|
|
||||
| ```go | ||||
| type State struct { | ||||
| seedEngine *seedengine.SeedEngine | ||||
| ... | ||||
|
|
||||
| seedShares [][]byte // new field to track the received seed shares | ||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Overloading Reading the code there, it seems to me that type Guard struct {
...
partialMu sync.Mutex
// nil when no user recovery is in flight.
// Replaced as a whole on each share submission..
// Never mutated in place because callers retain the pointer as a snapshot after AddSeedShare returns.
partialRecovery *PartialRecovery
}
// PartialRecovery is the immutable accumulator of seed shares submitted
// during a multi-party user recovery. It is bound to a specific recovery
// target (insecure latest transition + manifest) and a single salt.
type PartialRecovery struct {
TransitionHash [history.HashSize]byte // hist.GetLatestInsecure() at first submission
Salt []byte // captured from the first submitter
Shares map[ShareID][]byte // keyed by submitter pubkey. Treated as read-only
}
// ShareID identifies a seed share submitter by their hex-encoded RSA
// public key, matching an entry in Manifest.SeedshareOwnerPubKeys.
type ShareID = manifest.HexString
func (g *Guard) AddSeedShare(ctx context.Context, callerID ShareID, share, salt []byte) (*PartialRecovery, error)Then We would clear |
||||
| ... | ||||
|
|
||||
| stale atomic.Bool | ||||
| } | ||||
| ``` | ||||
|
|
||||
| If the Coordinator is in recovery mode, this list is used to track the received seed shares via the `Recover` RPC of the user API. | ||||
| Peer recovery still works as before, since there are no seed shares involved. | ||||
| If a `Recover` call from the user is made where the newly received seed share combined with the already received shares meets the threshold, the seed is reconstructed and the Coordinator can be recovered normally. | ||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How should the Coordinator behave if the wrong seed is reconstructed because one of the inputs was wrong?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess this would imply a malicious seed share owner? The Coordinator has no way to identify a wrong share, so it will create a new seed engine anyway. This should however be immediately noticeable when attesting the Coordinator with
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, the signature of the latest transition is checked during recovery, so directly after the seed engine has been created. If the seed engine is wrong, the user that called
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, if we can't verify the signature with the combined seed we should throw all shares away and start from scratch. |
||||
| If the list contains some number of seed shares below the threshold and peer recovery triggers, the Coordinator is recovered via peer recovery as expected. | ||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My understanding from the proposal section was that in the first version we would need to scale down the coordinator to one instance which would be in non-recovered state right? So how is it possible for peer recovery to trigger mid-user-recovery via shares? Is there a scenario where while we are scaling down to one instance, a non-recovered coordinator with seed shares accumulated, does peer recover from a recovered peer before the recovered one gets torn down? Should it be specified more like:
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is more of an edge case: if one recovered Coordinator and one non-recovered Coordinator exist (e.g., 2 Coordinators and I restart one), and I call |
||||
|
|
||||
| When the `Recover` RPC is called and the state is stale, the `ResetState` method of the guard is invoked, which takes a `SecretSourceAuthorizer` as an argument. | ||||
| This interface currently only has the method `AuthorizeByManifest`, which validates the peer given the manifest from the store and returns the new seed engine and mesh CA key. | ||||
| This is used both by the user API for `contrast recover` and by the mesh API during peer recovery. | ||||
| The `AuthorizeByManifest` function now additionally returns the reconstructed seed if the threshold is met and the received seed share otherwise: | ||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. s/and/or |
||||
|
|
||||
| ```go | ||||
| type SecretSourceAuthorizer interface { | ||||
| AuthorizeByManifest(ctx context.Context, manifest *manifest.Manifest) (seedengine.SeedEngine, crypto.PublicKey, []byte, error) | ||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe this is meant to be |
||||
| } | ||||
| ``` | ||||
|
|
||||
| In case of peer recovery, the third output is always `nil` and the process stays the same as before. | ||||
| For user recovery, the `SecretSourceAuthorizer` is initialized with the state's current seed shares and, if there are enough shares, reconstructs the seed and returns the seed engine with the recovered seed as before. | ||||
| If there aren't enough shares yet, we only return the received seed share from the function without the seed engine and mesh key. | ||||
| The function should already verify that the received seed share isn't present in the list of received seed shares. | ||||
|
|
||||
| If the returned seed engine is `nil` and we get a seed share that isn't already in the list of received seed shares, we return a new state. | ||||
| This state has the `stale` boolean still set to `true` but now includes the list of received seed shares from before plus the new received seed share. | ||||
| After updating the state, we return an error indicating that the threshold isn't met yet. | ||||
| If we get a valid seed engine, we proceed as before. | ||||
|
|
||||
| If the Coordinator has enough seed shares to recover the seed but one of the seed shares is invalid, through, for example, a malicious seed share owner, the following signature verification of the latest transition will fail. | ||||
| In that case, we don't know which seed shares are valid or invalid, so the only option is to reset the state and start the recovery process from scratch with no seed shares. | ||||
|
|
||||
| The proto messages for the `Recover` API remain basically unchanged, now including the encrypted seed share instead of the encrypted seed: | ||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe this is obvious but I think it should be explicitly mentioned here that each share owner is expected to send the same I have also made the comment in the proposal section but, wdym by "encrypted seed share", is this encrypted with something else that I have missed?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I forgot about that. It only makes sense to also validate the salt. The proto paragraph needs rewriting: the recover request now includes the seed share (not encrypted), and the
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking at contrast/internal/manifest/crypto.go Line 95 in 44a892d
I wonder if we should have a more specific label for these new shares (I gather this is like aad in aead), that would also bind them to properties of their intended use like "intended to be used by sss", the threshold, total share count etc. I see this as a robustness/ux improvement that would help in potential early mismatch detection/cleaner domain separation.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we can do something like But keep in mind that the untrusted seed share owner can simply choose the label himself and forge an incorrect share with the correct label, so this would only be a UX improvement, not adding any robustness in that scenario. |
||||
|
|
||||
| ```proto | ||||
| message SeedShare { | ||||
| string PublicKey = 1; | ||||
| bytes EncryptedSeedShare = 2; | ||||
| } | ||||
| ``` | ||||
Uh oh!
There was an error while loading. Please reload this page.