This guide covers the full process of adding new models and new engine families to the library, including tests and examples.
src/
  lib.rs # SpeechModel trait, TranscriptionResult, ModelCapabilities
  error.rs # TranscribeError enum
  audio.rs # WAV file reading (read_wav_samples)
  features/ # Shared audio feature extraction (mel, LFR, CMVN)
  decode/ # Shared decoding (CTC, SentencePiece, vocab loading)
  onnx/ # ONNX engine family (feature: "onnx")
    PORTING.md # Detailed guide for adding ONNX models
    mod.rs # Quantization enum, registers model modules
    session.rs # Shared ONNX session utilities
    gigaam/mod.rs # Model implementation
    sense_voice/mod.rs
    parakeet/mod.rs
    moonshine/ # Multi-file model (mod.rs, model.rs, streaming.rs)
  whisper_cpp/ # whisper.cpp engine (feature: "whisper-cpp")
    mod.rs
  whisperfile.rs # Whisperfile engine (feature: "whisperfile")
  remote/ # Remote engines (feature: "openai")
    mod.rs # RemoteTranscriptionEngine trait
    openai.rs
tests/
  common/mod.rs # Shared test utilities (require_paths)
  gigaam.rs # One test file per model
  ...
examples/
  gigaam.rs # One example per model
  ...
If your model uses an existing inference runtime (e.g. ONNX), see the engine-specific porting guide:
- ONNX models: See src/onnx/PORTING.md
The short version:
- Create src/onnx/your_model/mod.rs
- Register in src/onnx/mod.rs
- Implement the SpeechModel trait
- Add test, example, and Cargo.toml entries
A new engine family is needed when you're integrating a new inference runtime (e.g. Candle, Burn, MLX, TensorRT). Each engine family gets its own feature flag and source directory.
src/your_engine/
  mod.rs # Engine-level types, re-exports model modules
  your_model/
    mod.rs # First model implementation
For single-model engines, a flat file is also acceptable:
src/your_engine.rs # Everything in one file (like whisperfile.rs)
[features]
your-engine = ["dep:your-runtime-crate"]
# Update the "all" feature
all = ["onnx", "whisper-cpp", "whisperfile", "openai", "your-engine"]
[dependencies]
your-runtime-crate = { version = "...", optional = true }
If your engine needs the shared audio processing code (mel spectrogram extraction, CTC decoding), depend on the audio-features feature:
your-engine = ["audio-features", "dep:your-runtime-crate"]
Gate the module in src/lib.rs behind the feature:
#[cfg(feature = "your-engine")]
pub mod your_engine;
Every local model must implement the SpeechModel trait:
pub trait SpeechModel {
    fn capabilities(&self) -> ModelCapabilities;
    fn transcribe(
        &mut self,
        samples: &[f32],
        language: Option<&str>,
    ) -> Result<TranscriptionResult, TranscribeError>;
    // transcribe_file has a default impl that reads the WAV then calls transcribe()
}
Required conventions:
- CAPABILITIES constant: Define a const CAPABILITIES: ModelCapabilities with all fields populated
- Constructor: Model::load(...) is a single step and returns a ready-to-use model
- Params struct: {Model}Params with #[derive(Debug, Clone, Default)]
- Two transcribe methods: transcribe_with(&mut self, samples, &Params) for engine-specific params, plus the trait transcribe() for the generic interface
- Errors: Use TranscribeError variants (ModelNotFound, Inference, Audio, Config)
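The conventions above can be sketched as one small module. This is a minimal illustration under stated assumptions, not the library's code: the trait and supporting types are simplified stand-ins defined locally so the block compiles on its own, the supports_timestamps field and the String payload of ModelNotFound are placeholders, EchoModel is a hypothetical model, and placing CAPABILITIES as an associated constant is one possible reading of the convention.

```rust
// Stand-in types so the sketch compiles on its own. In a real model these
// come from the crate; the fields here are placeholders, not the library's
// actual definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ModelCapabilities {
    pub supports_timestamps: bool, // hypothetical field
}

#[derive(Debug, Clone, PartialEq)]
pub struct TranscriptionResult {
    pub text: String,
}

#[derive(Debug)]
pub enum TranscribeError {
    ModelNotFound(String), // payload type is an assumption
    Inference(String),
}

pub trait SpeechModel {
    fn capabilities(&self) -> ModelCapabilities;
    fn transcribe(
        &mut self,
        samples: &[f32],
        language: Option<&str>,
    ) -> Result<TranscriptionResult, TranscribeError>;
}

/// Engine-specific parameters; `Default` lets callers override only what they need.
#[derive(Debug, Clone, Default)]
pub struct EchoModelParams {
    pub language: Option<String>,
}

/// Hypothetical model that only reports how many samples it received.
pub struct EchoModel {
    name: String,
}

impl EchoModel {
    pub const CAPABILITIES: ModelCapabilities = ModelCapabilities {
        supports_timestamps: false,
    };

    /// Single-step constructor: the returned model is ready to use.
    pub fn load(name: &str) -> Result<Self, TranscribeError> {
        if name.is_empty() {
            return Err(TranscribeError::ModelNotFound("empty model name".into()));
        }
        Ok(Self { name: name.to_string() })
    }

    /// Engine-specific entry point taking the params struct.
    pub fn transcribe_with(
        &mut self,
        samples: &[f32],
        params: &EchoModelParams,
    ) -> Result<TranscriptionResult, TranscribeError> {
        if samples.is_empty() {
            return Err(TranscribeError::Inference("no audio samples".into()));
        }
        let lang = params.language.as_deref().unwrap_or("auto");
        Ok(TranscriptionResult {
            text: format!("{}: {} samples, language={}", self.name, samples.len(), lang),
        })
    }
}

impl SpeechModel for EchoModel {
    fn capabilities(&self) -> ModelCapabilities {
        Self::CAPABILITIES
    }

    // The generic interface maps its arguments onto the params struct and delegates.
    fn transcribe(
        &mut self,
        samples: &[f32],
        language: Option<&str>,
    ) -> Result<TranscriptionResult, TranscribeError> {
        let params = EchoModelParams {
            language: language.map(String::from),
        };
        self.transcribe_with(samples, &params)
    }
}
```

Having transcribe() delegate to transcribe_with() keeps a single inference path, so the generic and engine-specific entry points cannot drift apart.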
If your runtime crate has its own error type, add a From impl in src/error.rs:
#[cfg(feature = "your-engine")]
impl From<your_runtime::Error> for TranscribeError {
    fn from(e: your_runtime::Error) -> Self {
        TranscribeError::Inference(e.to_string())
    }
}
Create src/your_engine/PORTING.md documenting how to add models within this engine family. See src/onnx/PORTING.md as a reference.
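To show what the From impl for runtime errors buys you, here is a self-contained sketch. RuntimeError, run_inference, and decode are hypothetical names standing in for your runtime crate and model code, and the one-variant TranscribeError is a simplified stand-in for the library's enum.

```rust
use std::fmt;

// Hypothetical runtime error, standing in for `your_runtime::Error`.
#[derive(Debug)]
pub struct RuntimeError(pub String);

impl fmt::Display for RuntimeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "runtime error: {}", self.0)
    }
}

// Simplified stand-in for the library's error enum.
#[derive(Debug, PartialEq)]
pub enum TranscribeError {
    Inference(String),
}

impl From<RuntimeError> for TranscribeError {
    fn from(e: RuntimeError) -> Self {
        TranscribeError::Inference(e.to_string())
    }
}

// Hypothetical runtime call that fails on empty input.
fn run_inference(samples: &[f32]) -> Result<usize, RuntimeError> {
    if samples.is_empty() {
        Err(RuntimeError("empty input".to_string()))
    } else {
        Ok(samples.len())
    }
}

// `?` converts RuntimeError into TranscribeError through the From impl,
// so model code needs no explicit map_err at each call site.
pub fn decode(samples: &[f32]) -> Result<usize, TranscribeError> {
    let frames = run_inference(samples)?;
    Ok(frames)
}
```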
There are two patterns depending on how expensive model loading is.
Use this for models that load quickly (most ONNX models). Each test loads its own model instance.
// tests/your_model.rs
mod common;

use std::path::PathBuf;
use transcribe_rs::your_engine::your_model::YourModel;
use transcribe_rs::SpeechModel;

#[test]
fn test_your_model_transcribe() {
    env_logger::init();

    let model_dir = PathBuf::from("models/your-model");
    let wav_path = PathBuf::from("samples/jfk.wav");

    // Skip gracefully if model files aren't present
    if !common::require_paths(&[&model_dir, &wav_path]) {
        return;
    }

    let mut model = YourModel::load(&model_dir).expect("Failed to load model");
    let result = model
        .transcribe_file(&wav_path, None)
        .expect("Failed to transcribe");

    assert!(!result.text.is_empty(), "Transcription should not be empty");
    println!("Transcription: {}", result.text);
}
Use this for models that are slow to load (whisper.cpp, whisperfile server startup). A Lazy<Mutex<Option<Engine>>> shares one instance across all tests in the file.
// tests/your_model.rs
mod common;

use once_cell::sync::Lazy;
use std::path::PathBuf;
use std::sync::Mutex;
use transcribe_rs::your_engine::YourEngine;
use transcribe_rs::SpeechModel;

fn model_path() -> PathBuf {
    PathBuf::from("models/your-model.bin")
}

// Loaded once on first access; None means the model is unavailable.
static ENGINE: Lazy<Mutex<Option<YourEngine>>> = Lazy::new(|| {
    let model = model_path();
    if !common::require_paths(&[&model]) {
        return Mutex::new(None);
    }
    match YourEngine::load(&model) {
        Ok(engine) => Mutex::new(Some(engine)),
        Err(e) => {
            eprintln!("Failed to load model: {}", e);
            Mutex::new(None)
        }
    }
});

fn get_engine() -> Option<std::sync::MutexGuard<'static, Option<YourEngine>>> {
    // Recover the guard even if an earlier test panicked while holding the lock
    let guard = ENGINE.lock().unwrap_or_else(|e| e.into_inner());
    if guard.is_none() {
        return None;
    }
    Some(guard)
}

#[test]
fn test_transcription() {
    let mut guard = match get_engine() {
        Some(g) => g,
        None => {
            eprintln!("Skipping test: engine not available");
            return;
        }
    };
    let engine = guard.as_mut().unwrap();

    let audio_path = PathBuf::from("samples/jfk.wav");
    let result = engine
        .transcribe_file(&audio_path, None)
        .expect("Failed to transcribe");

    assert!(!result.text.is_empty());
}
Every test file must be registered with its required feature:
[[test]]
name = "your_model"
required-features = ["your-engine"]
- Always use common::require_paths() to skip gracefully when model files are absent
- Never panic on missing models, because CI environments won't have them
- Test at minimum: non-empty transcription output
- If the model is deterministic, assert exact text output
- If the model supports timestamps, add a timestamp test asserting chronological order and reasonable ranges
- Use env_logger::init() at the start of tests (call it only once per process, since a second call panics; that is fine for single-test files, and the Lazy pattern handles multi-test files)
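For the timestamp guideline, a small helper along these lines can express "chronological order and reasonable ranges" as concrete checks. It is a sketch under assumptions: Segment is a local stand-in whose field names follow the segment.start, segment.end, and segment.text usage in this guide, the f32 type is a guess, and check_segments and its tolerance parameter are hypothetical.

```rust
// Stand-in segment type; the numeric type of start/end is an assumption.
#[derive(Debug, Clone)]
pub struct Segment {
    pub start: f32,
    pub end: f32,
    pub text: String,
}

/// Returns Ok(()) when segments are in chronological order and lie inside
/// the audio. `tolerance` absorbs small overshoot at the end of the file.
pub fn check_segments(
    segments: &[Segment],
    audio_secs: f32,
    tolerance: f32,
) -> Result<(), String> {
    let mut prev_start = 0.0_f32;
    for (i, seg) in segments.iter().enumerate() {
        if !seg.start.is_finite() || !seg.end.is_finite() {
            return Err(format!("segment {i} has a non-finite timestamp"));
        }
        if seg.start < 0.0 {
            return Err(format!("segment {i} starts before 0"));
        }
        if seg.end < seg.start {
            return Err(format!("segment {i} ends before it starts"));
        }
        if seg.start < prev_start {
            return Err(format!("segment {i} starts before the previous segment"));
        }
        if seg.end > audio_secs + tolerance {
            return Err(format!("segment {i} ends after the audio"));
        }
        prev_start = seg.start;
    }
    Ok(())
}
```

In a test, the result can be unwrapped with expect so the failing segment index appears in the panic message.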
Every model should have an example demonstrating load + transcribe.
// examples/your_model.rs
use std::path::{Path, PathBuf};
use std::time::Instant;
use transcribe_rs::your_engine::your_model::{YourModel, YourModelParams};
use transcribe_rs::SpeechModel;

// Duration in seconds, computed from the WAV header
fn get_audio_duration(path: &Path) -> Result<f64, Box<dyn std::error::Error>> {
    let reader = hound::WavReader::open(path)?;
    let spec = reader.spec();
    let duration = reader.duration() as f64 / spec.sample_rate as f64;
    Ok(duration)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();

    let model_path = PathBuf::from("models/your-model");
    let wav_path = PathBuf::from("samples/jfk.wav");

    let audio_duration = get_audio_duration(&wav_path)?;
    println!("Audio duration: {:.2}s", audio_duration);

    // Load
    let load_start = Instant::now();
    let mut model = YourModel::load(&model_path)?;
    println!("Model loaded in {:.2?}", load_start.elapsed());

    // Transcribe (the timing includes reading the WAV samples)
    let transcribe_start = Instant::now();
    let samples = transcribe_rs::audio::read_wav_samples(&wav_path)?;
    let result = model.transcribe_with(
        &samples,
        &YourModelParams {
            language: Some("en".to_string()),
            ..Default::default()
        },
    )?;
    let transcribe_duration = transcribe_start.elapsed();

    // Results
    println!("Transcription completed in {:.2?}", transcribe_duration);
    println!(
        "Real-time speedup: {:.2}x faster than real-time",
        audio_duration / transcribe_duration.as_secs_f64()
    );
    println!("Transcription result:");
    println!("{}", result.text);

    if let Some(segments) = result.segments {
        println!("\nSegments:");
        for segment in segments {
            println!(
                "[{:.2}s - {:.2}s]: {}",
                segment.start, segment.end, segment.text
            );
        }
    }

    Ok(())
}
[[example]]
name = "your_model"
required-features = ["your-engine"]
- Show load timing and transcription timing
- Calculate real-time speedup factor
- Print segments if the model supports timestamps
- Use transcribe_with() to demonstrate model-specific params
- Accept model/audio paths as CLI args with sensible defaults
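Two of the guidelines above, CLI arguments with defaults and the real-time speedup factor, can be sketched with the standard library alone. The helper names parse_paths and real_time_speedup are hypothetical, and the default paths are the placeholder paths used elsewhere in this guide.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Reads `[model_path] [wav_path]` from the arguments that follow the
/// program name, falling back to defaults when they are missing.
pub fn parse_paths(mut args: impl Iterator<Item = String>) -> (PathBuf, PathBuf) {
    let model = args.next().unwrap_or_else(|| "models/your-model".to_string());
    let wav = args.next().unwrap_or_else(|| "samples/jfk.wav".to_string());
    (PathBuf::from(model), PathBuf::from(wav))
}

/// Seconds of audio processed per second of wall-clock time.
/// Returns None when no measurable time elapsed, avoiding a division by zero.
pub fn real_time_speedup(audio_secs: f64, elapsed: Duration) -> Option<f64> {
    let secs = elapsed.as_secs_f64();
    if secs > 0.0 {
        Some(audio_secs / secs)
    } else {
        None
    }
}
```

Inside main, the first helper would be called as parse_paths(std::env::args().skip(1)), which drops the program name before reading the two optional paths.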
When adding a new model, make sure all of these are done:
- Model source file: src/{engine}/{model}/mod.rs
- Module registered in engine's mod.rs
- const CAPABILITIES with all fields filled in
- {Model}Params struct with #[derive(Debug, Clone, Default)]
- load() constructor
- transcribe_with() method
- impl SpeechModel with capabilities() and transcribe()
- Test file: tests/{model}.rs using common::require_paths
- Test registered in Cargo.toml with required-features
- Example file: examples/{model}.rs
- Example registered in Cargo.toml with required-features
- Model files placed in models/{model-name}/ with correct naming
When adding a new engine family, also:
- Feature flag in Cargo.toml [features]
- Feature added to the all convenience feature
- Optional dependency in [dependencies]
- #[cfg(feature = "...")] guard in lib.rs
- From impl in error.rs for runtime error type (if applicable)
- PORTING.md in the engine directory
- If the engine supports GPU, integrate with the accelerator system in src/accel.rs (add an enum, global preference, and wire it into session/model creation)
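The last item names three pieces: an enum, a global preference, and a hook in session/model creation. The sketch below shows one generic way those pieces can fit together. It does not reflect the actual contents of src/accel.rs; every name in it (YourEngineAccelerator, set_accelerator, accelerator, describe_session) is hypothetical.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical accelerator choices for a new engine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum YourEngineAccelerator {
    Cpu = 0,
    Gpu = 1,
}

// Process-wide preference, defaulting to CPU.
static PREFERRED: AtomicU8 = AtomicU8::new(YourEngineAccelerator::Cpu as u8);

/// Sets the preference used by sessions created afterwards.
pub fn set_accelerator(accel: YourEngineAccelerator) {
    PREFERRED.store(accel as u8, Ordering::Relaxed);
}

/// Reads the current preference; unknown values fall back to CPU.
pub fn accelerator() -> YourEngineAccelerator {
    match PREFERRED.load(Ordering::Relaxed) {
        1 => YourEngineAccelerator::Gpu,
        _ => YourEngineAccelerator::Cpu,
    }
}

/// Placeholder for the point where session or model creation would
/// consult the preference and configure the runtime accordingly.
pub fn describe_session() -> &'static str {
    match accelerator() {
        YourEngineAccelerator::Cpu => "cpu session",
        YourEngineAccelerator::Gpu => "gpu session",
    }
}
```

An atomic is enough here because the preference is a single small value that is read at session creation; existing sessions are unaffected by later changes.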