🎭 PolyVox Studio

Many voices, one story.
Professional audiobook creation with AI character voices.

✨ What is PolyVox Studio?

PolyVox Studio turns books into audiobooks with distinct voices per character. It detects characters and dialogue, lets you assign voices (or clone your own), and renders polished audio via an intuitive desktop GUI.

Highlights

🤖 Character & dialogue detection (BookNLP + heuristics)
🎤 Built-in voices & voice cloning (XTTS v2 / Coqui TTS)
🗂️ Chapter handling, line-level editing & attribution fixes
⚙️ GPU acceleration with selectable CUDA 12.8 (RTX 50-series) / CUDA 12.1 / CUDA 11.8 / CPU modes
🖥️ Modern CustomTkinter UI with progress & logs

🚀 Installation

Clone the repo and run the platform installer. Each script creates a PolyVox virtual environment, installs the right PyTorch wheel (CUDA 12.8, CUDA 12.1, CUDA 11.8, or CPU), and pulls the remaining dependencies.

🆕 v2.5 install & startup improvements

Linux installer now retries with the latest compatible PyTorch wheels if pinned wheels are unavailable for your Python version.
Linux installer now falls back to CPU PyTorch automatically if GPU wheel install fails.
Linux launcher now uses the PolyVox/bin/python interpreter directly (prevents accidental use of base/conda Python).
Linux launcher now auto-checks core dependencies and attempts repair from requirements_min.txt when something is missing.
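
The retry and fallback order described above can be sketched roughly as follows. This is an illustration only, not the installer's actual code: the function name, the runtime keys, and the pinned version passed in are hypothetical, and the commands are built but not executed. The index URLs are the public PyTorch wheel indexes.

```python
from typing import List

# Public PyTorch wheel indexes; the dictionary keys are labels invented for this sketch.
INDEX_URLS = {
    "cu128": "https://download.pytorch.org/whl/cu128",
    "cu121": "https://download.pytorch.org/whl/cu121",
    "cu118": "https://download.pytorch.org/whl/cu118",
    "cpu": "https://download.pytorch.org/whl/cpu",
}


def install_attempts(runtime: str, pinned_version: str) -> List[List[str]]:
    """Return pip commands to try in order, stopping at the first that succeeds."""
    base = ["python", "-m", "pip", "install"]
    attempts = [
        # 1. Pinned wheel for the chosen runtime.
        base + [f"torch=={pinned_version}", "--index-url", INDEX_URLS[runtime]],
        # 2. Latest compatible wheel from the same index.
        base + ["torch", "--index-url", INDEX_URLS[runtime]],
    ]
    if runtime != "cpu":
        # 3. CPU fallback if no GPU wheel can be installed.
        attempts.append(base + ["torch", "--index-url", INDEX_URLS["cpu"]])
    return attempts


if __name__ == "__main__":
    # "2.5.1" is a placeholder version, not the project's real pin.
    for command in install_attempts("cu121", "2.5.1"):
        print(" ".join(command))
```

A real installer would run each command in turn and stop at the first one that exits successfully.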

🐧 Linux (and advanced macOS setups)

git clone https://github.com/dmarsh400/PolyVoxStudio.git
cd PolyVoxStudio
chmod +x install_linux.sh
./install_linux.sh

The installer will prompt for your preferred GPU runtime. See INSTALL_LINUX.md for prerequisites, optional system packages (FFmpeg/Tesseract), and troubleshooting tips.

🪟 Windows (one click)

git clone https://github.com/dmarsh400/PolyVoxStudio.git
cd PolyVoxStudio
.\install_windows.bat

You can also double-click install_windows.bat in Explorer. The batch file launches PowerShell with execution-policy bypass, then runs the full installer. Detailed notes live in INSTALL_WINDOWS.md.

Need CPU-only? Choose the CPU option when prompted. Unsure which CUDA runtime to pick? Check the GPU Support section below.

⚡ Quick Start

Launch the app

./run_gui.sh          # Linux / macOS
.\run_gui.bat         # Windows (double-click works too)

On Linux, run_gui.sh automatically uses the local PolyVox environment if present and prints which Python executable is being used.
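
For readers curious how a launcher can pin itself to the project interpreter and check dependencies, here is a minimal Python sketch. The real launcher is a shell script; the function names and the fallback to the current interpreter are assumptions made for illustration.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Iterable, List


def resolve_python(project_root: Path) -> Path:
    """Prefer PolyVox/bin/python under the project root, else the current interpreter."""
    candidate = project_root / "PolyVox" / "bin" / "python"
    return candidate if candidate.exists() else Path(sys.executable)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Using Python:", resolve_python(Path.cwd()))
    # The module list is an example; the real check reads requirements_min.txt.
    print("Missing:", missing_modules(["torch", "customtkinter"]))
```

If the second list is non-empty, a repair step could reinstall from requirements_min.txt using the resolved interpreter.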

Book Processing: Import Book → select a .txt / .pdf / .epub file → Detect Chapters → optionally select 1–3 chapters to process first for best attribution.
Characters: Click Detect Characters → review, merge, or rename characters → fix split or merged lines with the split/merge tools.
Voices: Refresh Characters → assign built-in voices, or use Clone Voice with a clean 6–20 s sample → Send to Audio Processing.
Audio Processing: Select chapters/batches → choose the output folder (defaults to /output/audio) → Export (per chapter or as a full M4B audiobook).

Tip: Start with a few chapters to dial in detection & voices, then run the whole book.
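
Before cloning a voice, it can help to confirm that the sample falls inside the 6–20 s range mentioned in the Voices step. The sketch below uses only the standard library and assumes an uncompressed WAV file; the function names are hypothetical and this check is not part of PolyVox itself.

```python
import wave

MIN_SECONDS, MAX_SECONDS = 6.0, 20.0  # range suggested in the Voices step


def sample_duration_seconds(path: str) -> float:
    """Duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_sample(path: str) -> bool:
    """True if the sample length is within the suggested cloning range."""
    return MIN_SECONDS <= sample_duration_seconds(path) <= MAX_SECONDS
```

Other formats (MP3, FLAC) would need a different reader or a conversion step first.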

🖥️ GPU Support & Decision Guide

Both installers prompt for a PyTorch runtime. Pick the option that matches your hardware:

GPU Series / Setup	Runtime choice in installer	Notes
RTX 50-series (Blackwell)	CUDA 12.8	Required for Blackwell GPUs (RTX 5090/5080/5070/5060); driver ≥ 570
RTX 40 / 30 / 20, GTX 16	CUDA 12.1	Latest features and fastest inference
GTX 10-series & older	CUDA 11.8	Best fit for earlier CUDA-capable cards
No NVIDIA GPU / Virtualized	CPU only	Works everywhere (slower)

Driver baseline (NVIDIA): Ensure your driver supports the selected CUDA version (≥ 520 for CUDA 11.8, ≥ 535 for CUDA 12.1, ≥ 570 for CUDA 12.8). If in doubt, choose CPU to finish the install, then upgrade drivers and rerun the installer later.
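
The table and driver baseline above amount to a small decision rule. The sketch below encodes it under simple assumptions: the GPU is identified by substring matching on its marketing name, and the function name and return labels are invented for this example. It is not how the installers decide, since they prompt you instead.

```python
# Minimum NVIDIA driver major version per runtime, from the driver baseline above.
MIN_DRIVER = {"CUDA 12.8": 570, "CUDA 12.1": 535, "CUDA 11.8": 520}


def pick_runtime(gpu_name: str, driver_major: int) -> str:
    """Suggest an installer runtime from a GPU name and driver major version."""
    name = gpu_name.upper()
    if "RTX 50" in name:
        wanted = "CUDA 12.8"
    elif any(tag in name for tag in ("RTX 40", "RTX 30", "RTX 20", "GTX 16")):
        wanted = "CUDA 12.1"
    elif "GTX" in name or "RTX" in name:
        wanted = "CUDA 11.8"
    else:
        return "CPU only"
    # Driver too old: finish on CPU, upgrade the driver, then rerun the installer.
    return wanted if driver_major >= MIN_DRIVER[wanted] else "CPU only"


if __name__ == "__main__":
    print(pick_runtime("NVIDIA GeForce RTX 3060", 550))
```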

🧩 Requirements

Minimum

Python 3.9+
Windows 10+/Ubuntu 20.04+/macOS 11+
8 GB RAM (16 GB recommended)
Optional NVIDIA GPU (see table above)

Nice to have

FFmpeg in PATH for audio enhancement/export niceties.

🔧 Troubleshooting (fast fixes)

No characters detected: Ensure the text uses standard " quotes; try processing 1–3 chapters.
GPU OOM or slow: Lower batch size in Settings or rerun the installer with a different runtime (CUDA 11.8 or CPU) after updating drivers.
Audio cut-offs / robotic output: Use higher-quality/longer (10–20 s) voice samples; check FFmpeg install.
PyTorch wheel version not found during install: Re-run the installer. v2.5 now retries with compatible wheel versions automatically.
Launch uses wrong Python environment: Use ./run_gui.sh from the project root. v2.5 pins startup to the project interpreter and reports the active executable.
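
For the "No characters detected" case, one option is to normalize typographic quotes in the source text before importing it. This is a standalone pre-processing sketch, not something PolyVox is documented to do; the function name and the set of characters mapped are assumptions.

```python
# Map common typographic double quotes to the ASCII double quote.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # low double quotation mark
    "\u00ab": '"',  # left guillemet
    "\u00bb": '"',  # right guillemet
})


def normalize_quotes(text: str) -> str:
    """Replace curly quotes and guillemets with straight double quotes."""
    return text.translate(QUOTE_MAP)
```

Run it over the .txt file, save the result, and import the cleaned copy.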

🛡️ Hallucination Guard (optional)

PolyVox can now self-check XTTS lines before finalizing them. The guard synthesizes a segment, transcribes it with Whisper (prefers faster-whisper), and compares the transcript with the expected text. If the similarity drops below a threshold, it can retry—falling back to the deterministic XTTS preset on later attempts.
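
The synthesize, transcribe, compare, retry loop can be sketched like this. It is a simplified illustration, not PolyVox's implementation: the synthesis and transcription steps are passed in as plain callables, difflib stands in for rapidfuzz, and two details are assumptions (every retry uses the deterministic preset, and the best-scoring attempt is returned).

```python
from difflib import SequenceMatcher
from typing import Callable, Tuple


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting does not affect the score."""
    return " ".join(text.lower().split())


def similarity(expected: str, heard: str) -> float:
    """Similarity in [0, 1] between the expected text and the transcript."""
    return SequenceMatcher(None, _normalize(expected), _normalize(heard)).ratio()


def guarded_synthesis(
    text: str,
    synthesize: Callable[[str, bool], bytes],  # (text, deterministic) -> audio
    transcribe: Callable[[bytes], str],        # audio -> transcript
    threshold: float = 0.9,
    retries: int = 2,
) -> Tuple[bytes, float]:
    """Return the best-scoring audio after at most 1 + retries attempts."""
    best_audio, best_score = b"", -1.0
    for attempt in range(1 + retries):
        # Assumption: every attempt after the first uses the deterministic preset.
        audio = synthesize(text, attempt > 0)
        score = similarity(text, transcribe(audio))
        if score > best_score:
            best_audio, best_score = audio, score
        if score >= threshold:
            break
    return best_audio, best_score
```

Passing the two steps in as callables keeps the loop testable without loading XTTS or Whisper.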

Enable it by exporting an environment variable before launching the app:

export POLYVOX_TTS_GUARD=on

Available modes:

on / true — guard every XTTS line.
auto (default) — guard narrator lines only.
narrator — identical to auto, explicit for readability.
off / false — disable the guard entirely.

Fine-tune behavior with optional overrides:

export POLYVOX_TTS_GUARD_THRESHOLD=0.9      # similarity target (0..1)
export POLYVOX_TTS_GUARD_RETRIES=2          # additional attempts if under threshold
export POLYVOX_TTS_GUARD_MODEL=base         # Whisper size: base, small, medium, etc.
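
A possible way to read these variables is sketched below. The variable names come from this README. The class, the function name, the fallback values (0.9, 2, base), and the treatment of unrecognized modes as narrator-only are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class GuardConfig:
    scope: str        # "all", "narrator", or "off"
    threshold: float  # similarity target in 0..1
    retries: int      # additional attempts if under threshold
    model: str        # Whisper model size


def load_guard_config(env: Mapping[str, str] = os.environ) -> GuardConfig:
    """Build a guard configuration from POLYVOX_TTS_GUARD* variables."""
    mode = env.get("POLYVOX_TTS_GUARD", "auto").strip().lower()
    if mode in ("on", "true"):
        scope = "all"
    elif mode in ("off", "false"):
        scope = "off"
    else:
        # "auto" and "narrator" both guard narrator lines only.
        scope = "narrator"
    return GuardConfig(
        scope=scope,
        threshold=float(env.get("POLYVOX_TTS_GUARD_THRESHOLD", "0.9")),
        retries=int(env.get("POLYVOX_TTS_GUARD_RETRIES", "2")),
        model=env.get("POLYVOX_TTS_GUARD_MODEL", "base"),
    )
```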

Voice definitions (e.g., JSON entries) can opt in/out per character using:

"hallucination_guard": true | false
"guard_threshold": 0.88
"guard_max_retries": 2
"guard_model_size": "base.en"

Dependencies: install faster-whisper and rapidfuzz for the fast path. The guard falls back gracefully if the packages are missing.
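
As an illustration of how the per-character keys listed above might override global settings, here is a small sketch. The key names are the ones shown in this section. The function name, the default values, and the "name" field in the sample entry are hypothetical.

```python
import json

# Hypothetical global defaults; the real defaults are not documented here.
DEFAULTS = {"enabled": False, "threshold": 0.9, "max_retries": 2, "model_size": "base"}


def guard_settings(voice: dict, defaults: dict = DEFAULTS) -> dict:
    """Overlay a voice entry's guard keys on top of the global defaults."""
    return {
        "enabled": voice.get("hallucination_guard", defaults["enabled"]),
        "threshold": voice.get("guard_threshold", defaults["threshold"]),
        "max_retries": voice.get("guard_max_retries", defaults["max_retries"]),
        "model_size": voice.get("guard_model_size", defaults["model_size"]),
    }


# Example voice entry; only the guard keys come from this README.
voice = json.loads(
    '{"name": "Narrator", "hallucination_guard": true, "guard_threshold": 0.88}'
)

if __name__ == "__main__":
    print(guard_settings(voice))
```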

📚 Documentation

Linux Installation Guide — prerequisites, installer walkthrough, troubleshooting
→ INSTALL_LINUX.md
Windows Installation Guide — one-click batch installer and manual PowerShell steps
→ INSTALL_WINDOWS.md
GPU Compatibility Guide — supported cards, drivers, and runtime recommendations
→ GPU_COMPATIBILITY.md
Contributing guide — dev setup, testing, PR flow
→ CONTRIBUTING.md
PDF Chapter Detection Guide — font-size analysis, formatting detection, false-positive filtering
→ PDF_CHAPTER_DETECTION.md

📖 PDF Chapter Detection (v2.0)

PolyVox now includes advanced PDF chapter detection with:

🔤 Font-size detection (configurable 1.2x, 1.3x, 1.5x sensitivity)
🔨 Bold text detection for chapter headers
🛡️ False-positive filtering (bankruptcy context, paragraph boundaries, duplicates)
⚡ Background threading prevents UI freezing on large PDFs
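
To make the font-size and bold heuristics concrete, here is a minimal sketch that works on already-extracted lines. It is not the project's detector: the Line type, the function name, the use of the median as the body size, and the word-count filter are all assumptions. Only the 1.3x ratio and the duplicate filtering come from the list above.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Line:
    text: str
    font_size: float
    bold: bool = False


def chapter_candidates(lines: List[Line], ratio: float = 1.3, max_words: int = 12) -> List[str]:
    """Return likely chapter headings: large or bold lines that are short and unique."""
    if not lines:
        return []
    body_size = median(line.font_size for line in lines)
    seen = set()
    headings = []
    for line in lines:
        title = line.text.strip()
        is_large = line.font_size >= ratio * body_size
        if not (is_large or line.bold):
            continue
        # Simple false-positive filters: skip empty, paragraph-length, or repeated lines.
        if not title or len(title.split()) > max_words or title in seen:
            continue
        seen.add(title)
        headings.append(title)
    return headings
```

Raising the ratio to 1.5 makes detection stricter, while 1.2 catches headings that are only slightly larger than body text.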

👉 Full PDF Chapter Detection guide → PDF_CHAPTER_DETECTION.md

🤝 Contributing

PRs welcome! See CONTRIBUTING.md for style, tests, and PR checklist.

📜 License

MIT — see LICENSE.
