Summary
The current model runtime centers on a single ModelRuntime abstraction that mixes provider concerns with every model capability (LLM, embeddings, rerank, speech-to-text, moderation, and TTS).
That boundary is too wide. It forces downstream implementers to provide methods for capabilities they do not use, and it makes the type surface larger than the real dependency graph.
Problem
- An LLM-only runtime currently has to satisfy unrelated embedding, rerank, moderation, speech-to-text, and TTS methods.
- Model wrappers depend on a broader runtime surface than they actually consume.
- Partial runtime integrations require stub methods or placeholder implementations.
- Evolving one model capability unnecessarily touches a shared global interface.
Proposal
Split the runtime into capability-specific abstractions and place each in its own module.
Proposed interfaces:
- ModelProviderRuntime
- LLMModelRuntime
- TextEmbeddingModelRuntime
- RerankModelRuntime
- SpeechToTextModelRuntime
- ModerationModelRuntime
- TTSModelRuntime
Design direction:
- Move provider discovery, icon lookup, credential validation, and schema lookup into ModelProviderRuntime.
- Make each model wrapper depend only on the narrow runtime abstraction it actually needs.
- Keep capability-specific behavior isolated to capability-specific runtime modules.
- Stop treating the current monolithic ModelRuntime as the required implementation target for downstream users.
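The design direction above could be sketched roughly as follows. The protocol names come from this proposal, but every method name and signature (validate_provider_credentials, get_provider_schema, invoke_llm, invoke_text_embedding) is a hypothetical placeholder, not the actual graphon API. LLMModel and EchoLLMRuntime are likewise illustrative.

```python
from __future__ import annotations

from typing import Protocol, Sequence


class ModelProviderRuntime(Protocol):
    """Provider-level concerns only; no model invocation (hypothetical methods)."""

    def validate_provider_credentials(self, provider: str, credentials: dict) -> None: ...

    def get_provider_schema(self, provider: str) -> dict: ...


class LLMModelRuntime(Protocol):
    """LLM capability only (hypothetical method)."""

    def invoke_llm(self, model: str, prompt: str) -> str: ...


class TextEmbeddingModelRuntime(Protocol):
    """Embedding capability only (hypothetical method)."""

    def invoke_text_embedding(self, model: str, texts: Sequence[str]) -> list[list[float]]: ...


class LLMModel:
    """Illustrative wrapper that depends only on the narrow LLM protocol."""

    def __init__(self, runtime: LLMModelRuntime, model: str) -> None:
        self._runtime = runtime
        self._model = model

    def invoke(self, prompt: str) -> str:
        return self._runtime.invoke_llm(self._model, prompt)


class EchoLLMRuntime:
    """Illustrative LLM-only runtime: no embedding, rerank, or TTS methods."""

    def invoke_llm(self, model: str, prompt: str) -> str:
        return f"[{model}] {prompt}"


reply = LLMModel(EchoLLMRuntime(), "demo").invoke("hi")
```

Because the protocols are structural, EchoLLMRuntime does not need to inherit from anything. It satisfies LLMModelRuntime simply by providing the one method the wrapper consumes.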
Scope
- Runtime protocol definitions under graphon.model_runtime
- Base model wrapper typing and construction
- Prepared model helpers that currently depend on the full runtime
- Public protocol exports
- Contributor-facing runtime documentation
Breaking Change
This should be treated as an intentional API break for custom runtime implementers.
Expected migration impact:
- Custom runtimes will need to update imports and implemented protocols.
- Downstream users should be able to delete unrelated stub methods after migration.
- We should prefer a clean abstraction boundary over preserving the current monolithic interface.
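As a rough picture of the migration, a downstream LLM-only runtime might change as shown below. This is only a sketch: the method names (invoke_llm, invoke_text_embedding, invoke_rerank, invoke_tts) and both runtime classes are hypothetical and do not reflect the real interface.

```python
from typing import Protocol


class LLMModelRuntime(Protocol):
    """Narrow protocol from the proposal; the method name is an assumption."""

    def invoke_llm(self, model: str, prompt: str) -> str: ...


class MyRuntimeBefore:
    """Before: targets the monolithic interface, so unrelated stubs are needed."""

    def invoke_llm(self, model: str, prompt: str) -> str:
        return prompt.upper()

    def invoke_text_embedding(self, model: str, texts: list) -> list:
        raise NotImplementedError("embeddings are not supported")

    def invoke_rerank(self, model: str, query: str, docs: list) -> list:
        raise NotImplementedError("rerank is not supported")

    def invoke_tts(self, model: str, text: str) -> bytes:
        raise NotImplementedError("TTS is not supported")


class MyRuntimeAfter:
    """After: implements only the capability it supports; the stubs are deleted."""

    def invoke_llm(self, model: str, prompt: str) -> str:
        return prompt.upper()


def run(runtime: LLMModelRuntime, prompt: str) -> str:
    # Callers are annotated against the narrow protocol, not the full runtime.
    return runtime.invoke_llm("demo", prompt)


result = run(MyRuntimeAfter(), "hello")
```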
Expected Outcome
- Downstream users can implement only the model capabilities they support.
- Type annotations reflect real dependencies instead of a catch-all runtime.
- Runtime integrations become smaller, clearer, and easier to extend.
- Adding a new capability no longer expands the required interface for every existing runtime.
Non-Goals
- Changing model entities or provider schema semantics
- Changing invocation behavior for existing capabilities
- Introducing new model capabilities as part of this refactor
Acceptance Criteria
- No model wrapper depends on unrelated runtime capabilities.
- An LLM-only runtime can type-check without embedding, rerank, moderation, speech-to-text, or TTS methods.
- A provider-only flow can type-check without model invocation methods.
- Runtime documentation explains which abstraction to implement for each integration path.
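The provider-only criterion could be exercised with a small check like the one below. It is a sketch under assumptions: get_provider_schema and invoke_llm are hypothetical method names, and StaticProviderRuntime and describe_provider are illustrative helpers. A static checker such as mypy or pyright would be the real gate. The isinstance calls only approximate it at runtime, since runtime_checkable protocols verify that the methods exist, not their signatures.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ModelProviderRuntime(Protocol):
    def get_provider_schema(self, provider: str) -> dict: ...


@runtime_checkable
class LLMModelRuntime(Protocol):
    def invoke_llm(self, model: str, prompt: str) -> str: ...


class StaticProviderRuntime:
    """Provider-only runtime: schema lookup, no model invocation methods."""

    def get_provider_schema(self, provider: str) -> dict:
        return {"provider": provider, "models": []}


def describe_provider(runtime: ModelProviderRuntime, provider: str) -> dict:
    # A provider-only flow is annotated against the provider protocol alone.
    return runtime.get_provider_schema(provider)


provider_runtime = StaticProviderRuntime()
schema = describe_provider(provider_runtime, "example")

# Presence-only structural checks: provider protocol yes, LLM protocol no.
satisfies_provider = isinstance(provider_runtime, ModelProviderRuntime)
satisfies_llm = isinstance(provider_runtime, LLMModelRuntime)
```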