Skip to content

Kong CLI parity via shared command logic#3047

Draft
markphelps wants to merge 20 commits into
kongfrom
kong-cli-parity
Draft

Kong CLI parity via shared command logic#3047
markphelps wants to merge 20 commits into
kongfrom
kong-cli-parity

Conversation

@markphelps

Copy link
Copy Markdown
Collaborator

Summary

Builds the side-by-side cmd/cog-kong CLI to full command parity with the existing Cobra CLI (cmd/cog), while extracting parser-independent command behaviour into shared, exported functions under pkg/cli.

Both CLIs now execute the same core code: command behaviour, validation, Docker/model/provider setup, output formatting, and errors live in shared Run* functions and *Options structs. Cobra owns flag registration/parsing; Kong owns struct tags + DI; neither owns behaviour.

All 13 commands reach parity: build, push, serve, exec, predict, run, train, init, login, doctor, debug, weights (import/pull/status), base-image (dockerfile/build).

Approach

Each task extracts one command's behaviour into shared code, then wires both Cobra and Kong adapters into it (TDD: failing parse/behaviour test → shared runner → thin adapters):

Task Shared runner(s) added
build RunBuild, BuildFlagsOptions, ResolveBuildImageName
push RunPush, ResolvePushTarget, PushCommandOptions
serve/exec RunServe, RunExec, RuntimeBuildOptions
init/login/doctor/debug RunInit, RunLogin, RunDoctor, RunDebug
predict/run/train RunPrediction, RunTrain
weights RunWeightsImport, RunWeightsPull, RunWeightsStatus
base-image RunBaseImageDockerfile, RunBaseImageBuild

Deviations from the original plan

The plan was speculative in a few spots; the implementation matches the real code:

  • Dropped non-existent OCIIndex / OCIIndexEnabled() from build options.
  • Used the real base-image flags (--cuda/--python/--torch), not the plan's guessed --*-version.
  • Removed short:"v" from Kong's --version — it diverged from Cobra (which has no -v) and collided with the weights commands' -v (verbose).
  • Moved the weights experimental warning from Cobra's PersistentPreRun into the shared runners, so it fires once per subcommand from a single source.

Verification

  • mise run test:go — all pass (2018 tests)
  • mise run lint:go — 0 issues
  • mise run fmt:go — clean
  • Both cog and cog-kong compile; no binaries committed
  • New cmd/cog-kong/main_test.go asserts the full command surface (registration, nested commands, root globals, per-command flag parsing) without requiring Docker

Notes

  • Base branch is kong (this stacks on the existing side-by-side scaffold).
  • cmd/cog remains the production Cobra binary, unchanged in behaviour.
  • Draft: opening for review of the shared-code seams before finalising.

markphelps and others added 13 commits June 2, 2026 13:20
* fix: mount weights in cog serve like cog run does

 was not mounting managed weights into the container, while
 (via predict.Predictor) did. This caused models with weights
to fail during setup with errors like:

    OSError: Incorrect path_or_model_id: '/src/weights/bge-m3'

The fix adds the same weight-handling pattern to cmdServe:

1. weights.CheckDrift() to validate weights.lock is in sync with cog.yaml
2. newWeightManager(src) to create a weights manager
3. wm.Prepare(ctx) to assemble hardlinks from the local store
4. Append mount specs to runOptions.Volumes as read-only binds
5. defer mounts.Release() to clean up on server exit

Integration tests:
- weights_import_serve.txtar: tests import -> serve (warm cache)
- weights_pull_serve.txtar: tests import -> pull -> serve (cold cache)

* test: propagate weight env vars to cog serve tests

* test: stop cog serve before checking weight cleanup
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
* ci: pin mise version in release workflows

* ci: pin mise version in all workflows

@ask-bonk ask-bonk Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

This PR extracts shared command logic to achieve parity between the Cobra and Kong CLIs. I found a few parity issues where the Kong CLI surface diverges from Cobra:

  1. BaseImageBuildCmd missing --no-cache and --progress flags (highest severity). The Cobra base-image build command exposes these flags via addBaseImageFlags(), but the Kong BaseImageBuildCmd only embeds baseImageVersionFlags and omits them. Since RunBaseImageBuild reads opts.NoCache and opts.ProgressOutput, the Kong path will always use the zero values.
  2. Several commands that are hidden in Cobra are visible in Kong. predict, train, weights, and debug all have Hidden: true in their Cobra definitions but lack hidden:"" in the Kong CLI struct tags.
  3. Unreachable os.Exit(1) after parser.FatalIfErrorf(err) in main.go. Kong's FatalIfErrorf already exits the process; the subsequent os.Exit(1) is dead code.

Comment thread cmd/cog-kong/baseimage.go
Comment thread cmd/cog-kong/main.go
Comment thread cmd/cog-kong/main.go Outdated
Comment thread cmd/cog-kong/main.go Outdated
Comment thread cmd/cog-kong/main.go
Comment thread cmd/cog-kong/main.go Outdated
Comment thread cmd/cog-kong/main.go Outdated
@ask-bonk

ask-bonk Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

I found a few parity issues between the Kong and Cobra CLIs and posted them as review comments on PR #3047.

Summary of issues:

  1. BaseImageBuildCmd is missing --no-cache and --progress flags that the Cobra version has
  2. predict, train, weights, and debug commands are hidden in Cobra but visible in Kong
  3. Unreachable os.Exit(1) after parser.FatalIfErrorf(err) in main.go (dead code)

The review with suggestions is here: #3047 (review)

github run

markphelps and others added 5 commits June 5, 2026 09:26
* feat: support JSON-native union inputs

Add support for union input types such as `str | float` and
`str | float | None`. Unions are restricted to JSON-native members
(str, int, float, bool, dict/Any, list[T], None) so request
validation happens at the HTTP edge against the OpenAPI schema.
Unions involving Path, File, Secret, custom coders, and BaseModel
are rejected at build/schema-generation time, and output unions
remain unsupported.

- pkg/schema: add a recursive InputType model and resolver, emit
  OpenAPI anyOf for union inputs, and keep multi-variant nullable
  unions required when no default is supplied
- pkg/predict: parse numeric CLI values for unions that accept a
  number (schemaAcceptsNumber), and fall back to a string member
  when a numeric parse fails for number-first unions like
  `float | str` (schemaAcceptsString)
- python/cog/_adt: add deterministic union normalisation with strict
  per-member compatibility (bool not int/float, scalars not dict/Any,
  list unions validate elements) and anyOf json_type emission
- tests: Go unit tests, Python regression tests, and end-to-end
  txtar integration tests for HTTP, CLI, and list unions
- docs: document union inputs and nullable semantics

* fix: correct CLI numeric parsing for integer-only and float unions

Address two related bugs in CLI `-i` parsing of union inputs:

- `int | float` resolves to the integer member first, so a fractional
  value like `1.5` failed ParseInt and errored instead of falling back
  to the float member.
- `str | int` resolves to the string member, then the schemaAcceptsNumber
  branch parsed `1.5` as a float even though the union only accepts an
  integer, sending an invalid float.

Add a schemaAcceptsFloat helper that matches number members but not
integer-only members, and gate float parsing behind it in both the
integer branch (with a float fallback) and the schemaAcceptsNumber
branch. Add unit tests for `int | float` and `str | int` unions.

* docs: clarify optional vs multi-variant union required behaviour

The previous table conflated plain single-type optionals with
multi-variant nullable unions. A plain Optional[T] / T | None is never
placed in required, while a multi-variant union like A | B | None stays
required unless a default is supplied. Split these into separate rows
and add a runtime caveat: an optional needs a Python-level default
(via = Input(...) or default=None) so an omitted value resolves to
None; a bare Optional[T] annotation raises TypeError when omitted.
* test: add fuzzing for input type resolution and OpenAPI generation

Add fuzz coverage for the schema-gen codepaths exercised by union
inputs, which previously had no fuzzing and no validity oracle:

- FuzzResolveInputType: feeds arbitrary TypeAnnotation trees through
  ResolveInputType, then validates that any successfully resolved input
  type generates an OpenAPI document the build-time validator accepts.
- FuzzInputTypeJSONSchema: builds arbitrary InputType trees directly
  (reaching shapes the resolver never produces) and validates the
  generated OpenAPI document.

Both use an assertValidOpenAPI oracle (the same kin-openapi validator
as writeAndValidateSchema), so a union shape that resolves cleanly but
emits an invalid schema (e.g. an unsupported `type: null` branch) fails
the fuzzer rather than surfacing as a confusing user build error.

Make the test:fuzz task auto-discover every Fuzz* target via
`go test -list` instead of hardcoding names, so new fuzz tests are
picked up automatically. This also surfaced targets the hardcoded list
missed (pkg/config's three targets and two parser helpers): 11 total
vs the 4 previously run. Manage jq (used for discovery) as a mise tool
and simplify the CI fuzz-go job to call the task.

* docs: correct union input type support in python.md

The union-inputs feature (#3048) added input union support, but the
"Type limitations" section still claimed only Optional[T] was supported,
contradicting the new Union section. Scope the limitation correctly:
output unions remain unsupported, JSON-native input unions are
supported, and input unions of Path/File/Secret/custom-coder/BaseModel
members fail at build. Also add the missing Union table-of-contents
entry and regenerate llms.txt.

* test: use testify require in assertValidOpenAPI

Replace raw t.Fatalf with require.NoError per the project testing
conventions (AGENTS.md).
* origin/main:
  Bump version to 0.21.0-rc.3 (#3050)
  test: fuzz schema generation for union inputs + docs fixup (#3049)
  feat: support JSON-native union inputs (#3048)
  ci: pin mise version in release workflows (#3046)
  chore: regen lockfile
  Bump version to 0.21.0-rc.2 (#3045)
  fix: mount weights in cog serve like cog run does (#3044)
- Add hidden:"" tags to predict, train, weights, and debug commands to
  match Cobra's Hidden: true (run stays visible).
- Remove unreachable os.Exit(1) after parser.FatalIfErrorf(err), which
  already exits the process.
- Update root help test to assert hidden commands are hidden in the
  Kong model rather than absent as substrings.
@ask-bonk

ask-bonk Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

LGTM

github run

- build: add hidden --skip-schema-validation; make --timestamp build-only
  (split build-only flags out of shared BuildFlags so they no longer leak
  onto push, matching Cobra).
- push: no longer exposes --timestamp/--skip-schema-validation (parity).
- serve: drop --env (Cobra serve has no --env); move --env to a dedicated
  EnvFlag embedded only by exec/predict/run/train.
- train (kong): emit the deprecation warning Cobra prints.
- base-image: add the generate-matrix subcommand; extract shared
  RunBaseImageGenerateMatrix runner.
- exec (kong): fix missing-arg error text to match Cobra's MinimumNArgs(1).
- openapi-schema: accept any string (Cobra defers file-existence to build
  time) instead of kong existingfile validation.
- Remove dead --dockerfile flag from cobra predict/exec/train/debug
  (RuntimeBuildOptions never reads it); keep it on build/push where it is
  functional. Guard checkMutuallyExclusiveFlags against absent flags.
- tests: derive command/hidden parity from the real Cobra surface
  (NewRootCommand/NewBaseImageRootCommand) instead of a hardcoded list;
  assert run/visible commands are not hidden; add value-asserting tests for
  build-only flags, --env placement, exec arg arity, and generate-matrix.
@ask-bonk

ask-bonk Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

LGTM

github run

@markphelps markphelps requested a review from anish-sahoo June 10, 2026 20:08
Comment thread cmd/cog-kong/main.go Outdated
"registry_default": global.DefaultReplicateRegistryHost,
},
kong.UsageOnError(),
kong.NoDefaultHelp(),

@anish-sahoo anish-sahoo Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--help doesn't short-circuit. With NoDefaultHelp(), help is handled by the cli.Help check after parser.Parse() (L71), but Parse() has already run Validate(), arg-count checks, enum validation and the existingfile mapper — so validation errors before help is ever shown:

cog exec --help                                                  # exit 80 (requires 1 arg)
cog push --use-cog-base-image --use-cuda-base-image=false --help # mutex error
cog build --progress bogus --help                                # enum error
cog build --dockerfile missing --help                            # stat error

Expected (Cobra): all print help and exit 0. Make --help fire before validation — e.g. use Kong's built-in help (a BeforeReset hook that runs before Validate) instead of NoDefaultHelp() + a post-parse bool.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 89369f1. Switched to Kong's built-in --help flag (removed NoDefaultHelp() and the post-parse cli.Help check). Help now fires in a BeforeReset hook before Validate()/arg-count/enum/file mapping, so cog exec --help, the mutex case, --progress bogus --help, and --dockerfile missing --help all print help and exit 0, matching Cobra. (--dockerfile also dropped type:"existingfile" so the file check defers to build time like Cobra.)

Comment thread cmd/cog-kong/train.go Outdated
Image string `arg:"" optional:"" name:"image" help:"Image to train. If omitted, builds from cog.yaml in the current directory."`
Input []string `name:"input" short:"i" help:"Inputs, in the form name=value. If value is prefixed with @, it is read from a file on disk, e.g. -i path=@image.jpg."`
Output string `name:"output" short:"o" default:"weights" help:"Output path."`
SetupTimeout uint32 `name:"setup-timeout" default:"300" help:"The timeout for a container to setup (in seconds)."`

@anish-sahoo anish-sahoo Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--setup-timeout is registered here, but Cobra's train doesn't have it (addSetupTimeoutFlag is only wired into predict/run):

cog train --setup-timeout 5   # cobra: "unknown flag: --setup-timeout" | kong: accepted

Expected: rejected, matching Cobra. Drop this field (and the SetupTimeout: line in the RunTrain call below), or add the flag to Cobra's train if it's intentional.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 89369f1. Dropped the --setup-timeout field (and the SetupTimeout: line in the RunTrain call) from the Kong train command to match Cobra, which doesn't wire addSetupTimeoutFlag into train.

Comment thread cmd/cog-kong/push.go Outdated
@@ -25,62 +21,11 @@ func (cmd *PushCmd) Validate() error {
return cmd.ValidateMutualExclusivity()

@anish-sahoo anish-sahoo Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cobra's push has no mutual-exclusivity check (only build/exec call checkMutuallyExclusiveFlags), so this diverges:

cog push --use-cog-base-image --use-cuda-base-image=false
# cobra: accepted (proceeds to build) | kong: rejected

Expected: accepted, matching Cobra. Remove this Validate() method, or add the check to Cobra's push if the stricter behaviour is intended.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 89369f1. Removed the Validate() mutual-exclusivity check from PushCmd to match Cobra, where only build/exec call checkMutuallyExclusiveFlags.

- Make --help short-circuit before validation by using Kong's built-in
  help flag (drop NoDefaultHelp + post-parse Help check). Help now prints
  and exits 0 even with mutex/enum/arg/file validation errors, matching Cobra.
- Drop --setup-timeout from kong train (Cobra train has no such flag).
- Remove push mutual-exclusivity Validate (Cobra push has no such check).
- Drop type:"existingfile" from --dockerfile so file checks defer to build
  time, matching Cobra and letting --help short-circuit.
@ask-bonk

ask-bonk Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

LGTM

github run

@anish-sahoo anish-sahoo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

End-to-end parity re-test after 89369f1 (kong-branch vs main, Dockerfile-generation diff + command validation + real build/predict + flag fuzz).

What this commit fixed (verified): --help now short-circuits past enum/mutex/file validation (e.g. build --progress bogus --help and build --use-cuda-base-image maybe --help both exit 0), train --setup-timeout is gone, and the push mutual-exclusivity check is removed. The fuzzer's HELP_VS_FLAGVALUE / HELP_VS_MUTEX buckets are now clean.

Still diverging from Cobra — inline comments below:

  1. Parse-time enum strictness on --use-cuda-base-image / --progress (behavioral, no --help).
  2. Usage/parse-error exit code 80 vs Cobra 1.
  3. cog: error: stderr prefix vs Cobra's bare message.
  4. Remaining --help gap: extra positional args still exit 80 instead of showing help.

Build pipeline + predictions themselves are identical: byte-identical Dockerfiles across fixtures/examples, identical image config/OpenAPI labels, identical prediction output.

Comment thread cmd/cog-kong/flags.go
@@ -33,42 +34,44 @@ type BuildFlags struct {
Progress string `name:"progress" default:"${progress_default}" enum:"auto,plain,tty,quiet" help:"Set type of build progress output: ${enum}."`
UseCudaBaseImage string `name:"use-cuda-base-image" default:"auto" enum:"auto,true,false" help:"Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image)."`

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parity gap (behavioral): invalid --use-cuda-base-image value. Kong enforces enum:"auto,true,false" at parse time, so an invalid value is rejected before any work:

cog build --use-cuda-base-image maybe
#   main → exit 0, silently BUILDS the image (ignores "maybe")
#   kong → exit 80, "must be one of auto,true,false but got maybe"

Same command, divergent outcome. Kong is arguably more correct, but this changes behavior for anyone relying on Cobra's lax handling — worth a deliberate decision + changelog note rather than a silent change.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah im kind of ok with this being not compatible as we should reject invalid values so we could address this in a changelog

Comment thread cmd/cog-kong/flags.go
@@ -33,42 +34,44 @@ type BuildFlags struct {
Progress string `name:"progress" default:"${progress_default}" enum:"auto,plain,tty,quiet" help:"Set type of build progress output: ${enum}."`

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parity gap: --progress fails fast vs fails late. Kong's enum: rejects an invalid value at parse (exit 80, no work done); Cobra accepts it and only fails after starting the Docker build:

cog build --progress bogus
#   kong → exit 80 at parse
#   main → exit 1, but only after "Building Docker image..." → "invalid progress mode bogus"

Different exit code and different side effects (main begins building first).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above tbh

Comment thread cmd/cog-kong/main.go
@@ -76,6 +69,7 @@ func main() {
parser.FatalIfErrorf(err)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two systematic divergences from Cobra flow through FatalIfErrorf here:

  • Exit code: parse/usage errors exit 80 (Kong's ParseError.ExitCode()); the Cobra CLI exits 1. Anything asserting on exit codes (CI, scripts, integration tests) will see the difference.
  • stderr prefix: Kong prints cog: error: <msg> (from kong.Name("cog") + UsageOnError()); Cobra prints the bare message. Body + exit code otherwise match, e.g.:
kong : cog: error: cog.yaml not found in <dir> ...
main : cog.yaml not found in <dir> ...

Worth a decision on whether parity requires matching Cobra's exit 1 + bare text.

Comment thread cmd/cog-kong/main.go
@@ -76,6 +69,7 @@ func main() {
parser.FatalIfErrorf(err)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining --help short-circuit gap after this commit. Dropping NoDefaultHelp correctly makes enum/mutex/file errors short-circuit to help, but extra positional args still don'tcog <cmd> <extra> --help errors instead of printing help:

cog predict a b --help   # kong → 80 "unexpected argument b";  cobra → 0 (help)
cog login extra --help   # kong → 80 "unexpected argument extra"; cobra → 0 (help)

The rescue at line 63 only matches strings.HasPrefix(parseErr.Error(), "expected"), so unexpected argument … parse errors fall through to FatalIfErrorf and exit 80. This is the only bucket still firing in the flag fuzzer (HELP_VS_ARGS, ~50 unique combos). Likely the same fix should also defer positional-arity validation past Kong's help hook.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants