Proposal: split model runtime abstractions by capability

## Summary

The current model runtime centers on a single `ModelRuntime` abstraction that mixes provider concerns with every model capability (`LLM`, embeddings, rerank, speech-to-text, moderation, and `TTS`).

That boundary is too wide. It forces downstream implementers to provide methods for capabilities they do not use, and it makes the type surface larger than the real dependency graph.

## Problem

- An `LLM`-only runtime currently has to satisfy unrelated embedding, rerank, moderation, speech-to-text, and `TTS` methods.
- Model wrappers depend on a broader runtime surface than they actually consume.
- Partial runtime integrations require stub methods or placeholder implementations.
- Changing one model capability means editing an interface that every runtime shares, even runtimes that do not support that capability.

## Proposal

Split the runtime into capability-specific abstractions and place each in its own module.

Proposed interfaces:

- `ModelProviderRuntime`
- `LLMModelRuntime`
- `TextEmbeddingModelRuntime`
- `RerankModelRuntime`
- `SpeechToTextModelRuntime`
- `ModerationModelRuntime`
- `TTSModelRuntime`

Design direction:

- Move provider discovery, icon lookup, credential validation, and schema lookup into `ModelProviderRuntime`.
- Make each model wrapper depend only on the narrow runtime abstraction it actually needs.
- Keep capability-specific behavior isolated to capability-specific runtime modules.
- Stop treating the current monolithic `ModelRuntime` as the required implementation target for downstream users.
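
As a rough illustration of the design direction above, the split could be expressed with `typing.Protocol`. This is a minimal sketch, not the actual `graphon.model_runtime` API: the method names (`validate_provider_credentials`, `get_provider_schema`, `invoke_llm`, `invoke_text_embedding`), their signatures, and the `LargeLanguageModel` and `EchoLLMRuntime` classes are all assumptions made for the example.

```python
from typing import Protocol


class ModelProviderRuntime(Protocol):
    """Provider-level concerns only. Method names are hypothetical."""

    def validate_provider_credentials(
        self, provider: str, credentials: dict[str, str]
    ) -> None: ...

    def get_provider_schema(self, provider: str) -> dict[str, object]: ...


class LLMModelRuntime(Protocol):
    """LLM invocation only. Signature is hypothetical."""

    def invoke_llm(self, model: str, prompt: str) -> str: ...


class TextEmbeddingModelRuntime(Protocol):
    """Embedding invocation only. Signature is hypothetical."""

    def invoke_text_embedding(
        self, model: str, texts: list[str]
    ) -> list[list[float]]: ...


class LargeLanguageModel:
    """Hypothetical wrapper that depends only on the narrow LLM runtime."""

    def __init__(self, runtime: LLMModelRuntime, model: str) -> None:
        self._runtime = runtime
        self._model = model

    def invoke(self, prompt: str) -> str:
        return self._runtime.invoke_llm(self._model, prompt)


class EchoLLMRuntime:
    """LLM-only runtime: no embedding, rerank, or TTS stubs needed."""

    def invoke_llm(self, model: str, prompt: str) -> str:
        return f"[{model}] {prompt}"


if __name__ == "__main__":
    llm = LargeLanguageModel(EchoLLMRuntime(), "demo")
    print(llm.invoke("hello"))
```

Because the protocols are structural, `EchoLLMRuntime` does not need to inherit from anything; a static type checker would accept it wherever `LLMModelRuntime` is expected.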

## Scope

- Runtime protocol definitions under `graphon.model_runtime`
- Base model wrapper typing and construction
- Prepared model helpers that currently depend on the full runtime
- Public protocol exports
- Contributor-facing runtime documentation

## Breaking Change

This should be treated as an intentional API break for custom runtime implementers.

Expected migration impact:

- Custom runtimes will need to update imports and implemented protocols.
- Downstream users should be able to delete unrelated stub methods after migration.
- We should prefer a clean abstraction boundary over preserving the current monolithic interface.
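
The migration impact might look roughly like the following before-and-after. This is a sketch under assumptions: the legacy stub method names (`invoke_text_embedding`, `invoke_rerank`, `invoke_tts`), the `invoke_llm` signature, and the helper `run_prompt` are hypothetical and only illustrate which code a downstream implementer could delete.

```python
from typing import Protocol


class LLMModelRuntime(Protocol):
    """Narrow capability protocol (hypothetical signature)."""

    def invoke_llm(self, model: str, prompt: str) -> str: ...


class LegacyCustomRuntime:
    """Before: an LLM-only integration written against the monolithic runtime."""

    def invoke_llm(self, model: str, prompt: str) -> str:
        return prompt.upper()

    # Placeholder stubs that exist only to satisfy the wide interface.
    def invoke_text_embedding(self, model: str, texts: list[str]) -> list[list[float]]:
        raise NotImplementedError

    def invoke_rerank(self, model: str, query: str, docs: list[str]) -> list[int]:
        raise NotImplementedError

    def invoke_tts(self, model: str, text: str) -> bytes:
        raise NotImplementedError


class CustomLLMRuntime:
    """After: the same integration with the unrelated stubs deleted."""

    def invoke_llm(self, model: str, prompt: str) -> str:
        return prompt.upper()


def run_prompt(runtime: LLMModelRuntime, prompt: str) -> str:
    """Hypothetical caller that needs only the LLM capability."""
    return runtime.invoke_llm("demo-model", prompt)


if __name__ == "__main__":
    print(run_prompt(CustomLLMRuntime(), "hello"))
```

Both classes satisfy `LLMModelRuntime` structurally, so callers typed against the narrow protocol would not need to change while implementers remove stubs.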

## Expected Outcome

- Downstream users can implement only the model capabilities they support.
- Type annotations reflect real dependencies instead of a catch-all runtime.
- Runtime integrations become smaller, clearer, and easier to extend.
- Adding a new capability no longer expands the required interface for every existing runtime.

## Non-Goals

- Changing model entities or provider schema semantics
- Changing invocation behavior for existing capabilities
- Introducing new model capabilities as part of this refactor

## Acceptance Criteria

- No model wrapper depends on unrelated runtime capabilities.
- An `LLM`-only runtime can type-check without embedding, rerank, moderation, speech-to-text, or `TTS` methods.
- A provider-only flow can type-check without model invocation methods.
- Runtime documentation explains which abstraction to implement for each integration path.
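
The type-check criteria above are ultimately verified by a static checker such as mypy or pyright. As a complement, a test suite could approximate them at run time with `runtime_checkable` protocols, as in the sketch below. The method names and the two example runtimes are hypothetical, and `isinstance` against a protocol only checks that the methods exist, not their signatures.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ModelProviderRuntime(Protocol):
    def validate_provider_credentials(
        self, provider: str, credentials: dict[str, str]
    ) -> None: ...


@runtime_checkable
class LLMModelRuntime(Protocol):
    def invoke_llm(self, model: str, prompt: str) -> str: ...


@runtime_checkable
class TTSModelRuntime(Protocol):
    def invoke_tts(self, model: str, text: str) -> bytes: ...


class LLMOnlyRuntime:
    """Hypothetical runtime that supports LLM invocation and nothing else."""

    def invoke_llm(self, model: str, prompt: str) -> str:
        return prompt


class ProviderOnlyRuntime:
    """Hypothetical runtime for provider flows with no model invocation."""

    def validate_provider_credentials(
        self, provider: str, credentials: dict[str, str]
    ) -> None:
        if not credentials:
            raise ValueError(f"missing credentials for {provider}")


def satisfies(runtime: object, protocol: type) -> bool:
    """Return True if the runtime exposes every method the protocol names."""
    return isinstance(runtime, protocol)


if __name__ == "__main__":
    print(satisfies(LLMOnlyRuntime(), LLMModelRuntime))
    print(satisfies(LLMOnlyRuntime(), TTSModelRuntime))
    print(satisfies(ProviderOnlyRuntime(), ModelProviderRuntime))
```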

