
Proposal: split model runtime abstractions by capability #56

@laipz8200

Description


Summary

The current model runtime centers on a single ModelRuntime abstraction that mixes provider concerns with every model capability (LLM, embeddings, rerank, speech-to-text, moderation, and TTS).

That boundary is too wide. It forces downstream implementers to provide methods for capabilities they do not use, and it makes the type surface larger than the real dependency graph.

Problem

  • An LLM-only runtime currently has to satisfy unrelated embedding, rerank, moderation, speech-to-text, and TTS methods.
  • Model wrappers depend on a broader runtime surface than they actually consume.
  • Partial runtime integrations require stub methods or placeholder implementations.
  • Changing one model capability forces a change to a shared, global interface that every runtime must implement.
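The following simplified sketch makes the cost concrete. It assumes the monolithic `ModelRuntime` can be modeled as a `typing.Protocol` with one method per capability. The method names (`invoke_llm`, `invoke_tts`, and so on) and the `EchoLLMRuntime` class are hypothetical placeholders, not the actual graphon API.

```python
from collections.abc import Sequence
from typing import Protocol


class ModelRuntime(Protocol):
    """Simplified stand-in for the monolithic runtime; method names are hypothetical."""

    def invoke_llm(self, model: str, prompt: str) -> str: ...
    def invoke_text_embedding(self, model: str, texts: Sequence[str]) -> list[list[float]]: ...
    def invoke_rerank(self, model: str, query: str, docs: Sequence[str]) -> list[float]: ...
    def invoke_speech_to_text(self, model: str, audio: bytes) -> str: ...
    def invoke_moderation(self, model: str, text: str) -> bool: ...
    def invoke_tts(self, model: str, text: str) -> bytes: ...


class EchoLLMRuntime:
    """A hypothetical LLM-only integration that still has to carry five stubs."""

    def invoke_llm(self, model: str, prompt: str) -> str:
        return f"[{model}] {prompt}"

    # Everything below exists only so the class satisfies ModelRuntime.
    def invoke_text_embedding(self, model: str, texts: Sequence[str]) -> list[list[float]]:
        raise NotImplementedError("embeddings are not supported")

    def invoke_rerank(self, model: str, query: str, docs: Sequence[str]) -> list[float]:
        raise NotImplementedError("rerank is not supported")

    def invoke_speech_to_text(self, model: str, audio: bytes) -> str:
        raise NotImplementedError("speech-to-text is not supported")

    def invoke_moderation(self, model: str, text: str) -> bool:
        raise NotImplementedError("moderation is not supported")

    def invoke_tts(self, model: str, text: str) -> bytes:
        raise NotImplementedError("TTS is not supported")


# A static type checker accepts this only because all six methods exist.
runtime: ModelRuntime = EchoLLMRuntime()
```

Five of the six methods have no behavior. They exist only so the class type-checks against the shared interface.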

Proposal

Split the runtime into capability-specific abstractions and place each in its own module.

Proposed interfaces:

  • ModelProviderRuntime
  • LLMModelRuntime
  • TextEmbeddingModelRuntime
  • RerankModelRuntime
  • SpeechToTextModelRuntime
  • ModerationModelRuntime
  • TTSModelRuntime

Design direction:

  • Move provider discovery, icon lookup, credential validation, and schema lookup into ModelProviderRuntime.
  • Make each model wrapper depend only on the narrow runtime abstraction it actually needs.
  • Keep capability-specific behavior isolated to capability-specific runtime modules.
  • Stop treating the current monolithic ModelRuntime as the required implementation target for downstream users.
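On the wrapper side, the design might look like the following sketch. `LLMModel`, `TextEmbeddingModel`, the toy runtimes, and all method names are hypothetical and used only for illustration. The point is that each wrapper's constructor annotation names only the capability that wrapper calls.

```python
from collections.abc import Sequence
from typing import Protocol


class LLMModelRuntime(Protocol):
    def invoke_llm(self, model: str, prompt: str) -> str: ...


class TextEmbeddingModelRuntime(Protocol):
    def invoke_text_embedding(self, model: str, texts: Sequence[str]) -> list[list[float]]: ...


class LLMModel:
    """Hypothetical wrapper that depends only on the LLM capability."""

    def __init__(self, runtime: LLMModelRuntime, model: str) -> None:
        self._runtime = runtime
        self._model = model

    def invoke(self, prompt: str) -> str:
        return self._runtime.invoke_llm(self._model, prompt)


class TextEmbeddingModel:
    """Hypothetical wrapper that depends only on the embedding capability."""

    def __init__(self, runtime: TextEmbeddingModelRuntime, model: str) -> None:
        self._runtime = runtime
        self._model = model

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        return self._runtime.invoke_text_embedding(self._model, texts)


class EchoLLMRuntime:
    """LLM-only runtime: no embedding, rerank, moderation, STT, or TTS stubs."""

    def invoke_llm(self, model: str, prompt: str) -> str:
        return f"[{model}] {prompt}"


class LengthEmbeddingRuntime:
    """Toy embedding-only runtime: one dimension holding the text length."""

    def invoke_text_embedding(self, model: str, texts: Sequence[str]) -> list[list[float]]:
        return [[float(len(text))] for text in texts]


llm = LLMModel(EchoLLMRuntime(), model="echo-1")
embedder = TextEmbeddingModel(LengthEmbeddingRuntime(), model="len-1")
```

`LLMModel` is annotated with `LLMModelRuntime`, so a type checker accepts `EchoLLMRuntime` even though it has no embedding method. The same checker would reject passing `EchoLLMRuntime` to `TextEmbeddingModel`.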

Scope

  • Runtime protocol definitions under graphon.model_runtime
  • Base model wrapper typing and construction
  • Prepared model helpers that currently depend on the full runtime
  • Public protocol exports
  • Contributor-facing runtime documentation

Breaking Change

This should be treated as an intentional API break for custom runtime implementers.

Expected migration impact:

  • Custom runtimes will need to update imports and implemented protocols.
  • Downstream users should be able to delete unrelated stub methods after migration.
  • We should prefer a clean abstraction boundary over preserving the current monolithic interface.
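For a custom runtime, migration might look like the sketch below. It assumes the protocols keep one method per capability. This proposal does not specify the new module paths, so the protocols are redefined inline. `MyRuntime` and all method names are hypothetical. With structural typing, inheriting from the protocols explicitly is optional, but it documents which capabilities the runtime claims to support and lets a type checker report a missing method.

```python
from collections.abc import Sequence
from typing import Protocol, runtime_checkable

# In practice these protocols would be imported from their new modules under
# graphon.model_runtime; they are redefined here so the sketch runs on its own.


@runtime_checkable
class LLMModelRuntime(Protocol):
    def invoke_llm(self, model: str, prompt: str) -> str: ...


@runtime_checkable
class TextEmbeddingModelRuntime(Protocol):
    def invoke_text_embedding(self, model: str, texts: Sequence[str]) -> list[list[float]]: ...


@runtime_checkable
class TTSModelRuntime(Protocol):
    def invoke_tts(self, model: str, text: str) -> bytes: ...


class MyRuntime(LLMModelRuntime, TextEmbeddingModelRuntime):
    """A hypothetical custom runtime after migration.

    Before, it implemented the monolithic ModelRuntime and carried
    NotImplementedError stubs for rerank, speech-to-text, moderation, and TTS.
    Now it lists only the capabilities it supports, and the stubs are gone.
    """

    def invoke_llm(self, model: str, prompt: str) -> str:
        return prompt.upper()

    def invoke_text_embedding(self, model: str, texts: Sequence[str]) -> list[list[float]]:
        # Toy embedding: one dimension holding the text length.
        return [[float(len(text))] for text in texts]
```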

Expected Outcome

  • Downstream users can implement only the model capabilities they support.
  • Type annotations reflect real dependencies instead of a catch-all runtime.
  • Runtime integrations become smaller, clearer, and easier to extend.
  • Adding a new capability no longer expands the required interface for every existing runtime.

Non-Goals

  • Changing model entities or provider schema semantics
  • Changing invocation behavior for existing capabilities
  • Introducing new model capabilities as part of this refactor

Acceptance Criteria

  • No model wrapper depends on unrelated runtime capabilities.
  • An LLM-only runtime can type-check without embedding, rerank, moderation, speech-to-text, or TTS methods.
  • A provider-only flow can type-check without model invocation methods.
  • Runtime documentation explains which abstraction to implement for each integration path.
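A small typing test like the following sketch could cover the LLM-only and provider-only criteria. A static checker such as mypy or pyright would verify the annotated assignments, and the `isinstance` asserts add a cheap runtime check. The protocol members are trimmed and hypothetical, and both runtime classes are illustrative only.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ModelProviderRuntime(Protocol):
    def list_providers(self) -> list[str]: ...
    def validate_credentials(self, provider: str, credentials: dict[str, Any]) -> None: ...


@runtime_checkable
class LLMModelRuntime(Protocol):
    def invoke_llm(self, model: str, prompt: str) -> str: ...


class EchoLLMRuntime:
    """LLM-only: no embedding, rerank, moderation, speech-to-text, or TTS methods."""

    def invoke_llm(self, model: str, prompt: str) -> str:
        return f"[{model}] {prompt}"


class StaticProviderRuntime:
    """Provider-only: no model invocation methods."""

    def list_providers(self) -> list[str]:
        return ["example"]

    def validate_credentials(self, provider: str, credentials: dict[str, Any]) -> None:
        if "api_key" not in credentials:
            raise ValueError(f"{provider}: missing api_key")


# Static checks: mypy or pyright should accept both assignments without stubs.
llm_check: LLMModelRuntime = EchoLLMRuntime()
provider_check: ModelProviderRuntime = StaticProviderRuntime()

# Runtime smoke checks: each class matches its own protocol and not the other.
assert isinstance(llm_check, LLMModelRuntime)
assert not isinstance(llm_check, ModelProviderRuntime)
assert isinstance(provider_check, ModelProviderRuntime)
assert not isinstance(provider_check, LLMModelRuntime)
```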

Metadata


Assignees

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
