Kong CLI parity via shared command logic#3047
Conversation
* fix: mount weights in cog serve like cog run does
was not mounting managed weights into the container, while
(via predict.Predictor) did. This caused models with weights
to fail during setup with errors like:
OSError: Incorrect path_or_model_id: '/src/weights/bge-m3'
The fix adds the same weight-handling pattern to cmdServe:
1. weights.CheckDrift() to validate weights.lock is in sync with cog.yaml
2. newWeightManager(src) to create a weights manager
3. wm.Prepare(ctx) to assemble hardlinks from the local store
4. Append mount specs to runOptions.Volumes as read-only binds
5. defer mounts.Release() to clean up on server exit
Integration tests:
- weights_import_serve.txtar: tests import -> serve (warm cache)
- weights_pull_serve.txtar: tests import -> pull -> serve (cold cache)
* test: propagate weight env vars to cog serve tests
* test: stop cog serve before checking weight cleanup
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
* ci: pin mise version in release workflows * ci: pin mise version in all workflows
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
This PR extracts shared command logic to achieve parity between the Cobra and Kong CLIs. I found a few parity issues where the Kong CLI surface diverges from Cobra:
- BaseImageBuildCmd missing
--no-cacheand--progressflags (highest severity). The Cobrabase-image buildcommand exposes these flags viaaddBaseImageFlags(), but the KongBaseImageBuildCmdonly embedsbaseImageVersionFlagsand omits them. SinceRunBaseImageBuildreadsopts.NoCacheandopts.ProgressOutput, the Kong path will always use the zero values. - Several commands that are hidden in Cobra are visible in Kong.
predict,train,weights, anddebugall haveHidden: truein their Cobra definitions but lackhidden:""in the Kong CLI struct tags. - Unreachable
os.Exit(1)afterparser.FatalIfErrorf(err)inmain.go. Kong'sFatalIfErrorfalready exits the process; the subsequentos.Exit(1)is dead code.
|
I found a few parity issues between the Kong and Cobra CLIs and posted them as review comments on PR #3047. Summary of issues:
The review with suggestions is here: #3047 (review) |
* feat: support JSON-native union inputs Add support for union input types such as `str | float` and `str | float | None`. Unions are restricted to JSON-native members (str, int, float, bool, dict/Any, list[T], None) so request validation happens at the HTTP edge against the OpenAPI schema. Unions involving Path, File, Secret, custom coders, and BaseModel are rejected at build/schema-generation time, and output unions remain unsupported. - pkg/schema: add a recursive InputType model and resolver, emit OpenAPI anyOf for union inputs, and keep multi-variant nullable unions required when no default is supplied - pkg/predict: parse numeric CLI values for unions that accept a number (schemaAcceptsNumber), and fall back to a string member when a numeric parse fails for number-first unions like `float | str` (schemaAcceptsString) - python/cog/_adt: add deterministic union normalisation with strict per-member compatibility (bool not int/float, scalars not dict/Any, list unions validate elements) and anyOf json_type emission - tests: Go unit tests, Python regression tests, and end-to-end txtar integration tests for HTTP, CLI, and list unions - docs: document union inputs and nullable semantics * fix: correct CLI numeric parsing for integer-only and float unions Address two related bugs in CLI `-i` parsing of union inputs: - `int | float` resolves to the integer member first, so a fractional value like `1.5` failed ParseInt and errored instead of falling back to the float member. - `str | int` resolves to the string member, then the schemaAcceptsNumber branch parsed `1.5` as a float even though the union only accepts an integer, sending an invalid float. Add a schemaAcceptsFloat helper that matches number members but not integer-only members, and gate float parsing behind it in both the integer branch (with a float fallback) and the schemaAcceptsNumber branch. Add unit tests for `int | float` and `str | int` unions. * docs: clarify optional vs multi-variant union required behaviour The previous table conflated plain single-type optionals with multi-variant nullable unions. A plain Optional[T] / T | None is never placed in required, while a multi-variant union like A | B | None stays required unless a default is supplied. Split these into separate rows and add a runtime caveat: an optional needs a Python-level default (via = Input(...) or default=None) so an omitted value resolves to None; a bare Optional[T] annotation raises TypeError when omitted.
* test: add fuzzing for input type resolution and OpenAPI generation Add fuzz coverage for the schema-gen codepaths exercised by union inputs, which previously had no fuzzing and no validity oracle: - FuzzResolveInputType: feeds arbitrary TypeAnnotation trees through ResolveInputType, then validates that any successfully resolved input type generates an OpenAPI document the build-time validator accepts. - FuzzInputTypeJSONSchema: builds arbitrary InputType trees directly (reaching shapes the resolver never produces) and validates the generated OpenAPI document. Both use an assertValidOpenAPI oracle (the same kin-openapi validator as writeAndValidateSchema), so a union shape that resolves cleanly but emits an invalid schema (e.g. an unsupported `type: null` branch) fails the fuzzer rather than surfacing as a confusing user build error. Make the test:fuzz task auto-discover every Fuzz* target via `go test -list` instead of hardcoding names, so new fuzz tests are picked up automatically. This also surfaced targets the hardcoded list missed (pkg/config's three targets and two parser helpers): 11 total vs the 4 previously run. Manage jq (used for discovery) as a mise tool and simplify the CI fuzz-go job to call the task. * docs: correct union input type support in python.md The union-inputs feature (#3048) added input union support, but the "Type limitations" section still claimed only Optional[T] was supported, contradicting the new Union section. Scope the limitation correctly: output unions remain unsupported, JSON-native input unions are supported, and input unions of Path/File/Secret/custom-coder/BaseModel members fail at build. Also add the missing Union table-of-contents entry and regenerate llms.txt. * test: use testify require in assertValidOpenAPI Replace raw t.Fatalf with require.NoError per the project testing conventions (AGENTS.md).
* origin/main: Bump version to 0.21.0-rc.3 (#3050) test: fuzz schema generation for union inputs + docs fixup (#3049) feat: support JSON-native union inputs (#3048) ci: pin mise version in release workflows (#3046) chore: regen lockfile Bump version to 0.21.0-rc.2 (#3045) fix: mount weights in cog serve like cog run does (#3044)
- Add hidden:"" tags to predict, train, weights, and debug commands to match Cobra's Hidden: true (run stays visible). - Remove unreachable os.Exit(1) after parser.FatalIfErrorf(err), which already exits the process. - Update root help test to assert hidden commands are hidden in the Kong model rather than absent as substrings.
|
LGTM |
- build: add hidden --skip-schema-validation; make --timestamp build-only (split build-only flags out of shared BuildFlags so they no longer leak onto push, matching Cobra). - push: no longer exposes --timestamp/--skip-schema-validation (parity). - serve: drop --env (Cobra serve has no --env); move --env to a dedicated EnvFlag embedded only by exec/predict/run/train. - train (kong): emit the deprecation warning Cobra prints. - base-image: add the generate-matrix subcommand; extract shared RunBaseImageGenerateMatrix runner. - exec (kong): fix missing-arg error text to match Cobra's MinimumNArgs(1). - openapi-schema: accept any string (Cobra defers file-existence to build time) instead of kong existingfile validation. - Remove dead --dockerfile flag from cobra predict/exec/train/debug (RuntimeBuildOptions never reads it); keep it on build/push where it is functional. Guard checkMutuallyExclusiveFlags against absent flags. - tests: derive command/hidden parity from the real Cobra surface (NewRootCommand/NewBaseImageRootCommand) instead of a hardcoded list; assert run/visible commands are not hidden; add value-asserting tests for build-only flags, --env placement, exec arg arity, and generate-matrix.
|
LGTM |
| "registry_default": global.DefaultReplicateRegistryHost, | ||
| }, | ||
| kong.UsageOnError(), | ||
| kong.NoDefaultHelp(), |
There was a problem hiding this comment.
--help doesn't short-circuit. With NoDefaultHelp(), help is handled by the cli.Help check after parser.Parse() (L71), but Parse() has already run Validate(), arg-count checks, enum validation and the existingfile mapper — so validation errors before help is ever shown:
cog exec --help # exit 80 (requires 1 arg)
cog push --use-cog-base-image --use-cuda-base-image=false --help # mutex error
cog build --progress bogus --help # enum error
cog build --dockerfile missing --help # stat error
Expected (Cobra): all print help and exit 0. Make --help fire before validation — e.g. use Kong's built-in help (a BeforeReset hook that runs before Validate) instead of NoDefaultHelp() + a post-parse bool.
There was a problem hiding this comment.
Fixed in 89369f1. Switched to Kong's built-in --help flag (removed NoDefaultHelp() and the post-parse cli.Help check). Help now fires in a BeforeReset hook before Validate()/arg-count/enum/file mapping, so cog exec --help, the mutex case, --progress bogus --help, and --dockerfile missing --help all print help and exit 0, matching Cobra. (--dockerfile also dropped type:"existingfile" so the file check defers to build time like Cobra.)
| Image string `arg:"" optional:"" name:"image" help:"Image to train. If omitted, builds from cog.yaml in the current directory."` | ||
| Input []string `name:"input" short:"i" help:"Inputs, in the form name=value. If value is prefixed with @, it is read from a file on disk, e.g. -i path=@image.jpg."` | ||
| Output string `name:"output" short:"o" default:"weights" help:"Output path."` | ||
| SetupTimeout uint32 `name:"setup-timeout" default:"300" help:"The timeout for a container to setup (in seconds)."` |
There was a problem hiding this comment.
--setup-timeout is registered here, but Cobra's train doesn't have it (addSetupTimeoutFlag is only wired into predict/run):
cog train --setup-timeout 5 # cobra: "unknown flag: --setup-timeout" | kong: accepted
Expected: rejected, matching Cobra. Drop this field (and the SetupTimeout: line in the RunTrain call below), or add the flag to Cobra's train if it's intentional.
There was a problem hiding this comment.
Fixed in 89369f1. Dropped the --setup-timeout field (and the SetupTimeout: line in the RunTrain call) from the Kong train command to match Cobra, which doesn't wire addSetupTimeoutFlag into train.
| @@ -25,62 +21,11 @@ func (cmd *PushCmd) Validate() error { | |||
| return cmd.ValidateMutualExclusivity() | |||
There was a problem hiding this comment.
Cobra's push has no mutual-exclusivity check (only build/exec call checkMutuallyExclusiveFlags), so this diverges:
cog push --use-cog-base-image --use-cuda-base-image=false
# cobra: accepted (proceeds to build) | kong: rejected
Expected: accepted, matching Cobra. Remove this Validate() method, or add the check to Cobra's push if the stricter behaviour is intended.
There was a problem hiding this comment.
Fixed in 89369f1. Removed the Validate() mutual-exclusivity check from PushCmd to match Cobra, where only build/exec call checkMutuallyExclusiveFlags.
- Make --help short-circuit before validation by using Kong's built-in help flag (drop NoDefaultHelp + post-parse Help check). Help now prints and exits 0 even with mutex/enum/arg/file validation errors, matching Cobra. - Drop --setup-timeout from kong train (Cobra train has no such flag). - Remove push mutual-exclusivity Validate (Cobra push has no such check). - Drop type:"existingfile" from --dockerfile so file checks defer to build time, matching Cobra and letting --help short-circuit.
|
LGTM |
anish-sahoo
left a comment
There was a problem hiding this comment.
End-to-end parity re-test after 89369f1 (kong-branch vs main, Dockerfile-generation diff + command validation + real build/predict + flag fuzz).
What this commit fixed (verified): --help now short-circuits past enum/mutex/file validation (e.g. build --progress bogus --help and build --use-cuda-base-image maybe --help both exit 0), train --setup-timeout is gone, and the push mutual-exclusivity check is removed. The fuzzer's HELP_VS_FLAGVALUE / HELP_VS_MUTEX buckets are now clean.
Still diverging from Cobra — inline comments below:
- Parse-time enum strictness on
--use-cuda-base-image/--progress(behavioral, no--help). - Usage/parse-error exit code
80vs Cobra1. cog: error:stderr prefix vs Cobra's bare message.- Remaining
--helpgap: extra positional args still exit 80 instead of showing help.
Build pipeline + predictions themselves are identical: byte-identical Dockerfiles across fixtures/examples, identical image config/OpenAPI labels, identical prediction output.
| @@ -33,42 +34,44 @@ type BuildFlags struct { | |||
| Progress string `name:"progress" default:"${progress_default}" enum:"auto,plain,tty,quiet" help:"Set type of build progress output: ${enum}."` | |||
| UseCudaBaseImage string `name:"use-cuda-base-image" default:"auto" enum:"auto,true,false" help:"Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image)."` | |||
There was a problem hiding this comment.
Parity gap (behavioral): invalid --use-cuda-base-image value. Kong enforces enum:"auto,true,false" at parse time, so an invalid value is rejected before any work:
cog build --use-cuda-base-image maybe
# main → exit 0, silently BUILDS the image (ignores "maybe")
# kong → exit 80, "must be one of auto,true,false but got maybe"
Same command, divergent outcome. Kong is arguably more correct, but this changes behavior for anyone relying on Cobra's lax handling — worth a deliberate decision + changelog note rather than a silent change.
There was a problem hiding this comment.
yeah im kind of ok with this being not compatible as we should reject invalid values so we could address this in a changelog
| @@ -33,42 +34,44 @@ type BuildFlags struct { | |||
| Progress string `name:"progress" default:"${progress_default}" enum:"auto,plain,tty,quiet" help:"Set type of build progress output: ${enum}."` | |||
There was a problem hiding this comment.
Parity gap: --progress fails fast vs fails late. Kong's enum: rejects an invalid value at parse (exit 80, no work done); Cobra accepts it and only fails after starting the Docker build:
cog build --progress bogus
# kong → exit 80 at parse
# main → exit 1, but only after "Building Docker image..." → "invalid progress mode bogus"
Different exit code and different side effects (main begins building first).
There was a problem hiding this comment.
same as above tbh
| @@ -76,6 +69,7 @@ func main() { | |||
| parser.FatalIfErrorf(err) | |||
There was a problem hiding this comment.
Two systematic divergences from Cobra flow through FatalIfErrorf here:
- Exit code: parse/usage errors exit
80(Kong'sParseError.ExitCode()); the Cobra CLI exits1. Anything asserting on exit codes (CI, scripts, integration tests) will see the difference. - stderr prefix: Kong prints
cog: error: <msg>(fromkong.Name("cog")+UsageOnError()); Cobra prints the bare message. Body + exit code otherwise match, e.g.:
kong : cog: error: cog.yaml not found in <dir> ...
main : cog.yaml not found in <dir> ...
Worth a decision on whether parity requires matching Cobra's exit 1 + bare text.
| @@ -76,6 +69,7 @@ func main() { | |||
| parser.FatalIfErrorf(err) | |||
There was a problem hiding this comment.
Remaining --help short-circuit gap after this commit. Dropping NoDefaultHelp correctly makes enum/mutex/file errors short-circuit to help, but extra positional args still don't — cog <cmd> <extra> --help errors instead of printing help:
cog predict a b --help # kong → 80 "unexpected argument b"; cobra → 0 (help)
cog login extra --help # kong → 80 "unexpected argument extra"; cobra → 0 (help)
The rescue at line 63 only matches strings.HasPrefix(parseErr.Error(), "expected"), so unexpected argument … parse errors fall through to FatalIfErrorf and exit 80. This is the only bucket still firing in the flag fuzzer (HELP_VS_ARGS, ~50 unique combos). Likely the same fix should also defer positional-arity validation past Kong's help hook.
Summary
Builds the side-by-side
cmd/cog-kongCLI to full command parity with the existing Cobra CLI (cmd/cog), while extracting parser-independent command behaviour into shared, exported functions underpkg/cli.Both CLIs now execute the same core code: command behaviour, validation, Docker/model/provider setup, output formatting, and errors live in shared
Run*functions and*Optionsstructs. Cobra owns flag registration/parsing; Kong owns struct tags + DI; neither owns behaviour.All 13 commands reach parity:
build,push,serve,exec,predict,run,train,init,login,doctor,debug,weights(import/pull/status),base-image(dockerfile/build).Approach
Each task extracts one command's behaviour into shared code, then wires both Cobra and Kong adapters into it (TDD: failing parse/behaviour test → shared runner → thin adapters):
RunBuild,BuildFlagsOptions,ResolveBuildImageNameRunPush,ResolvePushTarget,PushCommandOptionsRunServe,RunExec,RuntimeBuildOptionsRunInit,RunLogin,RunDoctor,RunDebugRunPrediction,RunTrainRunWeightsImport,RunWeightsPull,RunWeightsStatusRunBaseImageDockerfile,RunBaseImageBuildDeviations from the original plan
The plan was speculative in a few spots; the implementation matches the real code:
OCIIndex/OCIIndexEnabled()from build options.--cuda/--python/--torch), not the plan's guessed--*-version.short:"v"from Kong's--version— it diverged from Cobra (which has no-v) and collided with the weights commands'-v(verbose).PersistentPreRuninto the shared runners, so it fires once per subcommand from a single source.Verification
mise run test:go— all pass (2018 tests)mise run lint:go— 0 issuesmise run fmt:go— cleancogandcog-kongcompile; no binaries committedcmd/cog-kong/main_test.goasserts the full command surface (registration, nested commands, root globals, per-command flag parsing) without requiring DockerNotes
kong(this stacks on the existing side-by-side scaffold).cmd/cogremains the production Cobra binary, unchanged in behaviour.