LoomLLM

3 lines. 10 providers. Automatic quality loop.

from ai_staff_v4 import AIStaff

staff = AIStaff.from_env()                      # Set one API key, auto-discover everything
result = staff.chat("Write a quicksort")        # Auto-classify → draft → review → refine
print(f"Score: {result.quality_score}/100")     # Built-in quality gate

What makes it different?

Every LLM framework lets you call an API. LoomLLM makes the output good.

	LangChain	CrewAI	AutoGen	LoomLLM
Lines to first result	50+	30+	40+	3
Quality gate	Build it	No	No	Auto review + rewrite
Cheap model drafts, strong reviews	Manual	No	No	Built-in
429 fallback	Build it	No	No	Auto
"Is my answer good?"	You guess	You guess	You guess	Score out of 100
Simple Q&A wastes tokens?	Yes	Yes	Yes	No — auto fast-path

The core idea

1. Classify the task (simple? code? research?)
2. Pick the cheapest model that can handle it
3. Draft → Review → Refine until quality passes
4. Auto-save the result

A simple question like "1+1=?" takes 1 API call, 12 tokens.
A code task like "Write quicksort" takes 3 calls, ~6K tokens but scores 92/100.

No wasted tokens. No manual prompt engineering. No "I hope this is good enough."

Features

🎯 Smart Routing — Don't Burn Tokens on Easy Questions

staff.chat("1+1=?")              # → direct: 1 call, 12 tokens
staff.chat("Write quicksort")    # → code: Coder + Critic loop, 92/100
staff.chat("AI trend analysis")  # → research: Multi-turn inquiry
staff.chat("React vs Vue")       # → decision: Multi-dimensional analysis

TaskClassifier scores your input against 6 task types, then routes to the cheapest pipeline that can deliver quality. Simple questions skip the review loop entirely.

🔄 V5 CollabLoop — The Quality Engine

Writer (cheap model) → Reviewer (strong model) → Score < 80? → Rewrite with feedback → Repeat

Cost-aware: Fast/cheap model drafts, strong model only reviews (~50% token savings)
Structured feedback: Reviewer returns specific issues + suggestions, not vague "make it better"
Debate protocol: Writer can push back on unfair criticism, preventing over-iteration
Auto-terminate: Stops when score ≥ threshold, max iterations reached, or no improvement

🔌 10 Providers, Zero Vendor Lock-in

Provider	Direct	Free Tier	Proxy
DeepSeek	✅	❌	❌
Zhipu GLM	✅	✅ glm-4-flash	❌
SiliconFlow	✅	✅ Qwen2.5-7B	❌
Moonshot	✅	❌	❌
Qwen	✅	✅	❌
Google AI	❌	✅ flash-lite	✅
OpenAI	❌	❌	✅
Groq	❌	✅	✅
Anthropic	❌	❌	✅
Ollama	Local	✅	❌

All providers use OpenAI-compatible format (/v1/chat/completions). No proprietary APIs, no vendor lock-in. If it speaks OpenAI, it works with LoomLLM.

🛡️ Cascade Fallback

Provider A (429) → Provider B (503) → Provider C (200 OK ✓)

When a provider fails (429 quota, 503 overload, timeout), LoomLLM automatically falls back across providers and models.

💰 Token Budget & Cost Tracking

💰 gemini-2.5-flash-lite | 1,234 tok | free | Total: 5,678 tokens (3 calls)

Per-call token counting, cost estimation, daily budget limits, session totals.

💾 Auto-Save

Every chat() call automatically saves output to a timestamped directory:

ai_staff_code_Write_quicksort_20260425_153000/
├── solution.py          # The code
├── report.md            # Quality report
└── transcript.txt       # Full execution log

🧠 Persistent Memory

SQLite-backed conversation history across sessions:

staff = AIStaff.from_env(session_id="project-x")
staff.chat("I'm building a Flask app")   # Remembered next session

Quick Start

Install

pip install httpx pyyaml

Set an API key (any one)

# Pick one — all work out of the box
export DEEPSEEK_API_KEY=your-key       # Best value, direct connect from China
export ZHIPU_API_KEY=your-key          # Free tier available (glm-4-flash)
export OPENAI_API_KEY=your-key         # Standard choice
export GEMINI_API_KEY=your-key         # Free tier available (flash-lite)

All supported keys

Key	Provider	Get Key	Free?
`DEEPSEEK_API_KEY`	DeepSeek	platform.deepseek.com	❌
`ZHIPU_API_KEY`	Zhipu GLM	open.bigmodel.cn	✅
`SILICONFLOW_API_KEY`	SiliconFlow	cloud.siliconflow.cn	✅
`MOONSHOT_API_KEY`	Moonshot	platform.moonshot.cn	❌
`QWEN_API_KEY`	Qwen/DashScope	dashscope.console.aliyun.com	✅
`GEMINI_API_KEY`	Google AI	aistudio.google.com	✅
`OPENAI_API_KEY`	OpenAI	platform.openai.com	❌
`GROQ_API_KEY`	Groq	console.groq.com	✅
`ANTHROPIC_API_KEY`	Anthropic	console.anthropic.com	❌

Run

from ai_staff_v4 import AIStaff

# Zero-config: detects keys, tests connectivity, picks best model
staff = AIStaff.from_env()

# That's it. Start chatting.
answer = staff.chat("Hello")

Or use the setup wizard

python -m ai_staff_v4 setup

Usage Patterns

Simple Q&A — auto fast-path

staff.chat("What is asyncio?")      # 1 call, no review, ~100 tokens

Code with quality gate

result = staff.chat("Write quicksort", mode="code", return_details=True)
print(result.quality_score)          # 92/100
print(result.rounds_used)            # 1 (passed on first review)

Multi-model arena

report = staff.chat("Explain quantum entanglement", mode="arena")
# 6 models compete, ranked by quality

Deep research

report = staff.chat("Compare asyncio vs threading in Python", mode="research")
# Multi-turn inquiry with follow-up questions

Custom experts

# experts/experts.yaml
- id: code_reviewer
  name: Code Reviewer
  system_prompt: "You are a senior engineer. Review for correctness and performance."
  temperature: 0.3

Architecture

User Input → chat()
  │
  ├─ TaskClassifier
  │   ├─ direct    → 1 API call (12 tokens for "1+1=?")
  │   ├─ code      → Coder + Critic loop
  │   ├─ research  → Multi-turn follow-up inquiry
  │   ├─ decision  → Multi-perspective analysis
  │   └─ creative  → Writer + Reviewer loop
  │
  └─ V5 CollabLoop (for complex tasks)
      ├─ Writer drafts (fast/cheap model)
      ├─ Reviewer scores + gives feedback (strong model)
      ├─ Score < threshold? → Writer rewrites with feedback
      └─ Auto-terminate when quality passes

ai_staff_v4/
├── core/              # Infrastructure (logging, budget, events, memory)
├── experts/           # Expert roles (YAML-configurable)
├── agents/            # AI sub-agents (collab loop, reviewer, executor)
├── backends/          # 10 LLM providers (OpenAI-compatible)
├── main_mod/          # AIStaff orchestrator
├── examples/          # Working examples
└── tests/             # Unit tests (20/20 passing)

Design Philosophy

See DESIGN.md for the full rationale. TL;DR:

Not everything needs a roundtable — Simple Q&A should be 1 call, not a 5-agent meeting
Cost-aware by default — Cheap model drafts, strong model reviews
Quality > Speed > Cost — But never waste tokens on trivial tasks
Zero config is a feature — Set one key, get 10 providers
OpenAI format only — No vendor lock-in, no proprietary APIs

Testing

# Unit tests (no API key needed)
python -m unittest ai_staff_v4.tests.test_core -v

# Quick import check
python -c "from ai_staff_v4 import AIStaff; print('OK')"

Requirements

Python 3.10+
httpx >= 0.27
pyyaml >= 6.0

Comparison

Feature	LoomLLM	LangChain	CrewAI	AutoGen
Lines to first chat	3	50+	30+	40+
Iterative quality loop	Built-in	Manual	Manual	Manual
Multi-provider	10 built-in	Manual	Manual	Manual
Zero-config startup	Yes	No	No	No
Auto-fallback on 429	Yes	No	No	No
Token cost tracking	Real-time	No	No	No
Task auto-classification	Yes	No	No	No
OpenAI-compatible only	Yes	Multi-format	Multi-format	Multi-format

License

MIT

中文文档 | 设计哲学

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
agents		agents
backends		backends
core		core
examples		examples
experts		experts
main_mod		main_mod
tests		tests
.gitignore		.gitignore
ARCH_FLOWCHART.html		ARCH_FLOWCHART.html
DESIGN.md		DESIGN.md
GUIDE.md		GUIDE.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
README_CN.md		README_CN.md
USAGE.md		USAGE.md
__init__.py		__init__.py
__main__.py		__main__.py
config_template.yaml		config_template.yaml
getting_started.py		getting_started.py
py.typed		py.typed
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LoomLLM

What makes it different?

The core idea

Features

🎯 Smart Routing — Don't Burn Tokens on Easy Questions

🔄 V5 CollabLoop — The Quality Engine

🔌 10 Providers, Zero Vendor Lock-in

🛡️ Cascade Fallback

💰 Token Budget & Cost Tracking

💾 Auto-Save

🧠 Persistent Memory

Quick Start

Install

Set an API key (any one)

Run

Or use the setup wizard

Usage Patterns

Simple Q&A — auto fast-path

Code with quality gate

Multi-model arena

Deep research

Custom experts

Architecture

Design Philosophy

Testing

Requirements

Comparison

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LoomLLM

What makes it different?

The core idea

Features

🎯 Smart Routing — Don't Burn Tokens on Easy Questions

🔄 V5 CollabLoop — The Quality Engine

🔌 10 Providers, Zero Vendor Lock-in

🛡️ Cascade Fallback

💰 Token Budget & Cost Tracking

💾 Auto-Save

🧠 Persistent Memory

Quick Start

Install

Set an API key (any one)

Run

Or use the setup wizard

Usage Patterns

Simple Q&A — auto fast-path

Code with quality gate

Multi-model arena

Deep research

Custom experts

Architecture

Design Philosophy

Testing

Requirements

Comparison

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages