rfc: secret sharing#2341
Conversation
|
|
||
| ### UserAPI Changes | ||
|
|
||
| For the implementation we can use a secret sharing scheme based on OpenBao's [Shamir](https://pkg.go.dev/github.com/openbao/openbao/sdk/v2/helper/shamir) go library. |
There was a problem hiding this comment.
Any specific reason for using OpenBao's version over the original Hashiscorp version?
There was a problem hiding this comment.
Vault was relicensed to BSL: https://github.com/hashicorp/vault/blob/main/LICENSE. The headers still say MPL sometimes, but it's maybe safer to use the fork.
There was a problem hiding this comment.
It also looks like Bao is better maintained here: openbao/openbao#182
There was a problem hiding this comment.
For MarbleRun, we decided to vendor shamir.go because we didn't want to add a huge depdendency like Vault or OpenBao SDK to the go.mod just for this small, stable part of code.
We decided to take it from this commit: https://github.com/hashicorp/vault/tree/8d07273d14ae7f5a48cc96f66cc86615dea83390/shamir
It comes with the license next to it. These two files may just be copied like here https://github.com/edgelesssys/marblerun/tree/master/3rdparty/hashicorp/shamir
We also looked at the changes in OpenBao to it since then and concluded that none of them are needed.
| ``` | ||
|
|
||
| This threshold defaults to the number of provided seed share owner public keys, so by default every seed share owner would need to be involved in the recovery process. | ||
| The lowest value would be 1, in which case the behavior is the same as the current implementation, allowing every seed share owner to recover the Coordinator on their own. |
There was a problem hiding this comment.
Do note that SSS requires at least 2 shares, meaning the old implementation has to be kept for this case
| The `AuthorizeByManifest` function can now additionally take the seed shares as an argument to reconstruct the seed if the threshold is met and return the received seed share otherwise: | ||
|
|
||
| ```go | ||
| type SecretSourceAuthorizer interface { | ||
| AuthorizeByManifest(ctx context.Context, manifest *manifest.Manifest, seedShares [][]byte) (seedengine.SeedEngine, crypto.PublicKey, []byte, error) | ||
| } | ||
| ``` |
There was a problem hiding this comment.
If the seed shares are not relevant for all implementations, wouldn't it make more sense to make this internal to seedAuthorizer?
msanft
left a comment
There was a problem hiding this comment.
Generally LGTM, but I fail to see the practical motivation for this.
| ### UserAPI Changes | ||
|
|
||
| For the implementation we can use a secret sharing scheme based on OpenBao's [Shamir](https://pkg.go.dev/github.com/openbao/openbao/sdk/v2/helper/shamir) go library. | ||
| During the initial `contrast set`, the seed is split into shares using the specified threshold and the provided seed share owner public keys, and each share is encrypted with the corresponding public key and returned to the user. |
There was a problem hiding this comment.
Is it a goal of Contrast to allow up-/downscaling of seed share owners on a live deployment? At least the latter wouldn't work anymore with this, right?
There was a problem hiding this comment.
Scaling down the seed share owners would need to generate a new seed (otherwise, assumptions about prior manifests' security might not hold anymore). At that point, it's probably easier to just start a fresh manifest history.
|
|
||
| For now, we can say the solution is to scale down the Coordinator to a single instance. |
There was a problem hiding this comment.
This is also what we do in MarbleRun btw
|
|
||
| ### UserAPI Changes | ||
|
|
||
| For the implementation we can use a secret sharing scheme based on OpenBao's [Shamir](https://pkg.go.dev/github.com/openbao/openbao/sdk/v2/helper/shamir) go library. |
There was a problem hiding this comment.
For MarbleRun, we decided to vendor shamir.go because we didn't want to add a huge depdendency like Vault or OpenBao SDK to the go.mod just for this small, stable part of code.
We decided to take it from this commit: https://github.com/hashicorp/vault/tree/8d07273d14ae7f5a48cc96f66cc86615dea83390/shamir
It comes with the license next to it. These two files may just be copied like here https://github.com/edgelesssys/marblerun/tree/master/3rdparty/hashicorp/shamir
We also looked at the changes in OpenBao to it since then and concluded that none of them are needed.
|
|
||
| If the Coordinator is in recovery mode, this list is used to track the received seed shares via the `Recover` RPC of the user API. | ||
| Peer recovery still works as before, since there are no seed shares involved. | ||
| If a `Recover` call from the user is made where the newly received seed share combined with the already received shares meets the threshold, the seed is reconstructed and the Coordinator can be recovered normally. |
There was a problem hiding this comment.
How should the Coordinator behave if the wrong seed is reconstructed because one of the inputs was wrong?
There was a problem hiding this comment.
I guess this would imply a malicious seed share owner? The Coordinator has no way to identify a wrong share, so it will create a new seed engine anyway. This should however be immediately noticeable when attesting the Coordinator with contrast verify because the root CA changed.
There was a problem hiding this comment.
No, the signature of the latest transition is checked during recovery, so directly after the seed engine has been created. If the seed engine is wrong, the user that called Recover last will receive an error verifying latest transition.... Since the Coordinator doesn't know which seed share is incorrect, maybe we have to transition to a new state with no seed shares and start again?
There was a problem hiding this comment.
Yes, if we can't verify the signature with the combined seed we should throw all shares away and start from scratch.
| When the `Recover` RPC is called and the state is stale, the `ResetState` method of the guard is invoked, which takes a `SecretSourceAuthorizer` as an argument. | ||
| This interface currently only has the method `AuthorizeByManifest`, which validates the peer given the manifest from the store and returns the new seed engine and mesh CA key. | ||
| This is used both by the user API for `contrast recover` and by the mesh API during peer recovery. | ||
| The `AuthorizeByManifest` function now additionally returns the reconstructed seed if the threshold is met and the received seed share otherwise: |
| ## Background | ||
|
|
||
| During an initial `contrast set` the secret seed is encrypted with each seed share owner's public key and returned to the user. | ||
| This means that each seed share owner has access to the complete seed, and thus the ability to recover the Coordinator in case of a restart. | ||
| The original idea was to use a secret sharing scheme to split the seed into multiple shares, and require a threshold of them to recover the seed. | ||
| This would have the advantage of not giving any single seed share owner access to the complete seed. |
There was a problem hiding this comment.
The background section should mention the motivation behind this change more explicitly (by incorporating Markus' comment about mutually distrusting data owners or the more theoretical benefit of multi-party recovery).
Another necessity that also emerges then is that we need to extend our threat model to include a new actor, the less-than-fully-trusted seed share owner. This needs to be described either as part of this proposal or at least be documented as follow-up TODO to our threat model (security.md and threat model docs page).
| seedEngine *seedengine.SeedEngine | ||
| ... | ||
|
|
||
| seedShares [][]byte // new field to track the received seed shares |
There was a problem hiding this comment.
Overloading State's stale state with another type of "staleness", for when we are in the middle of recovery, is confusing. A manifestation of this "confusion" is having to change AuthorizeByManifest's contract to behave differently depending on whether we have shares in flight or not.
Reading the code there, it seems to me that Guard could be a more suitable home for the intermediate shares until a threshold is met. An example sketch of a design that seems more straight-forward to me:
type Guard struct {
...
partialMu sync.Mutex
// nil when no user recovery is in flight.
// Replaced as a whole on each share submission..
// Never mutated in place because callers retain the pointer as a snapshot after AddSeedShare returns.
partialRecovery *PartialRecovery
}
// PartialRecovery is the immutable accumulator of seed shares submitted
// during a multi-party user recovery. It is bound to a specific recovery
// target (insecure latest transition + manifest) and a single salt.
type PartialRecovery struct {
TransitionHash [history.HashSize]byte // hist.GetLatestInsecure() at first submission
Salt []byte // captured from the first submitter
Shares map[ShareID][]byte // keyed by submitter pubkey. Treated as read-only
}
// ShareID identifies a seed share submitter by their hex-encoded RSA
// public key, matching an entry in Manifest.SeedshareOwnerPubKeys.
type ShareID = manifest.HexString
func (g *Guard) AddSeedShare(ctx context.Context, callerID ShareID, share, salt []byte) (*PartialRecovery, error)Then AddSeedShare would implement the logic that checks for errors, checks the threshold etc. If the threshold is met, it would reconstruct the seed, build a seedAuthorizer, call Guard.ResetState etc. It could be made to also return the complete seed, this is just an example.
We would clear PartialRecovery in case of either success or error. This is also in line with the other dicsussion about what to do is the wrong seed is reconstructed etc.
sespiros
left a comment
There was a problem hiding this comment.
Thanks for this proposal. A couple of comments inline. The 2 most important ones are:
- the consideration around how this should affect our threat model
- the suggestion to put the shares in
Guardinstead ofState
The rest are really comments on the implementation.
|
|
||
| ```go | ||
| type SecretSourceAuthorizer interface { | ||
| AuthorizeByManifest(ctx context.Context, manifest *manifest.Manifest) (seedengine.SeedEngine, crypto.PublicKey, []byte, error) |
There was a problem hiding this comment.
I believe this is meant to be *seedengine.SeedEngine and *ecdsa.PrivateKey but ideally see my other comment that suggests not changing this method's contract at all.
| If the Coordinator is in recovery mode, this list is used to track the received seed shares via the `Recover` RPC of the user API. | ||
| Peer recovery still works as before, since there are no seed shares involved. | ||
| If a `Recover` call from the user is made where the newly received seed share combined with the already received shares meets the threshold, the seed is reconstructed and the Coordinator can be recovered normally. | ||
| If the list contains some number of seed shares below the threshold and peer recovery triggers, the Coordinator is recovered via peer recovery as expected. |
There was a problem hiding this comment.
My understanding from the proposal section was that in the first version we would need to scale down the coordinator to one instance which would be in non-recovered state right? So how is it possible for peer recovery to trigger mid-user-recovery via shares?
Is there a scenario where while we are scaling down to one instance, a non-recovered coordinator with seed shares accumulated, does peer recover from a recovered peer before the recovered one gets torn down?
Should it be specified more like:
- if a recovered coordinator exists, use peer recovery
- if no recovered coordinator exists, scale down to one and perform threshold user recovery
- don't use
--forcefor threshold recovery while recovered peers exist
There was a problem hiding this comment.
This is more of an edge case: if one recovered Coordinator and one non-recovered Coordinator exist (e.g., 2 Coordinators and I restart one), and I call recover with --force, the non-recovered Coordinator can accumulate seed shares before peer recovery triggers. I think otherwise the behavior is the same as before, i.e., prefer peer recovery when --force is not specified. Scaling down to 1 for user recovery should then be documented in the docs.
|
|
||
| We implement a secret sharing scheme for the seed, and each seed share owner only has an encrypted seed share. | ||
| To recover the Coordinator, a threshold of seed shares need to be combined to recover the seed. | ||
| This is done by each seed share owner calling `contrast recover` with their encrypted seed share, and the Coordinator combines the shares and recovers the seed if the threshold is met. |
There was a problem hiding this comment.
Technically seed share owners call contrast recover with a plain text seed share (but over encrypted transport/aTLS).
There was a problem hiding this comment.
The seed share is encrypted on disk, so from a user perspective we call contrast recover with the encrypted seed share which is then decrypted by the CLI before sending it to the Coordinator.
| If the Coordinator has enough seed shares to recover the seed but one of the seed shares is invalid, through, for example, a malicious seed share owner, the following signature verification of the latest transition will fail. | ||
| In that case, we don't know which seed shares are valid or invalid, so the only option is to reset the state and start the recovery process from scratch with no seed shares. | ||
|
|
||
| The proto messages for the `Recover` API remain basically unchanged, now including the encrypted seed share instead of the encrypted seed: |
There was a problem hiding this comment.
Maybe this is obvious but I think it should be explicitly mentioned here that each share owner is expected to send the same Salt as part of RecoverRequest and that this should be validated by the coordinator (that's why I included the Salt in my suggested design change).
I have also made the comment in the proposal section but, wdym by "encrypted seed share", is this encrypted with something else that I have missed?
There was a problem hiding this comment.
Yes, I forgot about that. It only makes sense to also validate the salt.
The proto paragraph needs rewriting: the recover request now includes the seed share (not encrypted), and the SeedShare message type belongs to the SetManifestResponse, where the encrypted seed share is correct.
| If the Coordinator has enough seed shares to recover the seed but one of the seed shares is invalid, through, for example, a malicious seed share owner, the following signature verification of the latest transition will fail. | ||
| In that case, we don't know which seed shares are valid or invalid, so the only option is to reset the state and start the recovery process from scratch with no seed shares. | ||
|
|
||
| The proto messages for the `Recover` API remain basically unchanged, now including the encrypted seed share instead of the encrypted seed: |
There was a problem hiding this comment.
Looking at
contrast/internal/manifest/crypto.go
Line 95 in 44a892d
I wonder if we should have a more specific label for these new shares (I gather this is like aad in aead), that would also bind them to properties of their intended use like "intended to be used by sss", the threshold, total share count etc.
I see this as a robustness/ux improvement that would help in potential early mismatch detection/cleaner domain separation.
There was a problem hiding this comment.
I think we can do something like "seedshare-k/n", for a k out of n threshold scheme. Both the user and the Coordinator have access to the manifest, so they both know the threshold and number of seed share owners. I'm not sure how useful this is though. This would only help when a user accidentally runs contrast recover with a seed share from a threshold scheme with different parameters. Maybe even "seedshare-k-{hash(seedshare owner keys)}".
But keep in mind that the untrusted seed share owner can simply choose the label himself and forge an incorrect share with the correct label, so this would only be a UX improvement, not adding any robustness in that scenario.
No description provided.