meekstellar/dicto

dicto

Offline voice dictation for Linux. Press a global hotkey, speak, and the recognized text lands on your clipboard, ready to paste anywhere with Ctrl+V.

Features

  • Offline. Uses whisper.cpp locally — no audio leaves the machine.
  • GPU-accelerated. Vulkan backend on any modern GPU (NVIDIA / AMD / Intel). Inference for short phrases finishes in 1–3 seconds.
  • Global hotkey. Registered via xdg-desktop-portal GlobalShortcuts (default: Ctrl+Alt+Space, rebindable in system settings).
  • Tray icon. Shows the current state: idle, recording, or processing.
  • Sound cues on each transition (opt-out via --no-sounds).
  • Automatic language detection. Covers English, Russian, and the 97 other languages whisper supports.
  • Clipboard-only output — no keystroke synthesis, no ydotool / uinput setup. You paste with Ctrl+V.

Platforms

  • Linux on Wayland: GNOME 45+ or KDE Plasma.
  • Tested on Ubuntu 25.10.
  • Windows port planned.

Install (Debian/Ubuntu)

Download the latest .deb from Releases and install it:

sudo apt install ./dicto_0.1.0-1_amd64.deb

The whisper large-v3-turbo model (~1.6 GB) is bundled inside the package — no extra download needed. System dependencies (libasound2, libvulkan1, systemd, xdg-desktop-portal, …) are pulled in by apt automatically.

Usage

Launch dicto from your application menu, or run:

dicto-daemon

Press Ctrl+Alt+Space to start recording and press it again to stop. The transcript is copied to your clipboard; paste it anywhere with Ctrl+V.

The shortcut can be rebound in Settings → Keyboard → View and Customize Shortcuts (look for the dicto entry).
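The press-to-start, press-to-stop flow and the three tray states can be pictured as a small state machine. The sketch below is illustrative only and is not taken from the dicto source: the `State`, `Event`, and `next_state` names are hypothetical, and ignoring a hotkey press while processing is an assumption.

```rust
/// The three states the tray icon reports.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Recording,
    Processing,
}

/// Events that move the daemon between states (hypothetical names).
#[derive(Debug, Clone, Copy)]
enum Event {
    HotkeyPressed,
    TranscriptReady,
}

/// Pure transition function: returns the next state for a given event.
fn next_state(state: State, event: Event) -> State {
    match (state, event) {
        // First press starts recording.
        (State::Idle, Event::HotkeyPressed) => State::Recording,
        // Second press stops recording and hands the audio to whisper.
        (State::Recording, Event::HotkeyPressed) => State::Processing,
        // Once the text is on the clipboard, go back to idle.
        (State::Processing, Event::TranscriptReady) => State::Idle,
        // ASSUMPTION: any other combination is ignored.
        (s, _) => s,
    }
}

fn main() {
    let mut state = State::Idle;
    let events = [
        Event::HotkeyPressed,
        Event::HotkeyPressed,
        Event::TranscriptReady,
    ];
    for event in events {
        state = next_state(state, event);
        println!("{:?} -> {:?}", event, state);
    }
}
```

Keeping the transition function pure makes it easy to attach side effects (tray icon update, sound cue) at a single place where the state changes.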

Switching models

sudo dicto-fetch-model small         # smaller, faster, lower quality (~466 MB)
sudo dicto-fetch-model medium        # balanced (~1.5 GB)
sudo dicto-fetch-model large-v3      # highest quality, slower (~2.9 GB)
sudo dicto-fetch-model large-v3-turbo  # default — quality close to large-v3, much faster

Or install a model per-user (no sudo; downloads to ~/.local/share/dicto/models/):

dicto-fetch-model medium

The daemon picks the per-user model first, falling back to the system-wide one.
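A lookup order like this can be sketched as a search over candidate directories. This is a minimal illustration, not dicto's actual code: the function names are hypothetical, the system-wide directory `/usr/share/dicto/models` and the `ggml-<name>.bin` file name are assumptions, and only the per-user path comes from the text above.

```rust
use std::path::{Path, PathBuf};

/// Return the first candidate directory that contains the model file.
fn resolve_model(file_name: &str, search_dirs: &[PathBuf]) -> Option<PathBuf> {
    search_dirs
        .iter()
        .map(|dir| dir.join(file_name))
        .find(|candidate| candidate.is_file())
}

/// Build the search order: per-user directory first, then system-wide.
fn default_search_dirs(home: &Path) -> Vec<PathBuf> {
    vec![
        // Per-user location mentioned in this README.
        home.join(".local/share/dicto/models"),
        // ASSUMPTION: hypothetical system-wide location.
        PathBuf::from("/usr/share/dicto/models"),
    ]
}

fn main() {
    let home = std::env::var_os("HOME")
        .map(PathBuf::from)
        .unwrap_or_default();
    let dirs = default_search_dirs(&home);
    // ASSUMPTION: ggml-style file name for the default model.
    match resolve_model("ggml-large-v3-turbo.bin", &dirs) {
        Some(path) => println!("using model at {}", path.display()),
        None => println!("no model found in {:?}", dirs),
    }
}
```

Because the per-user directory is checked first, a model fetched without sudo shadows the bundled one without touching system files.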

Build from source

System packages required:

sudo apt install build-essential clang cmake libasound2-dev libvulkan-dev glslc

Then build with Cargo (requires a Rust toolchain):

cargo build --release

Build a .deb:

cargo install cargo-deb
cargo deb --no-build   # uses target/release/dicto built above

How it works

Concern | Solution
Audio capture | cpal on a dedicated OS thread
Speech recognition | whisper-rs (whisper.cpp) with use_gpu = true via Vulkan
Global hotkey | ashpd + xdg-desktop-portal GlobalShortcuts
Tray icon | ksni (StatusNotifierItem)
Sound cues | rodio generating short sine-wave tones
Clipboard | arboard
GNOME app_id resolution | Launcher wraps the binary in systemd-run --scope with a name the portal can parse from /proc/PID/cgroup
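To illustrate the last row, here is a rough sketch of how an application ID could be recovered from a cgroup path. It assumes the scope is named after the systemd desktop convention `app[-<launcher>]-<app_id>-<random>.scope`; the exact unit name dicto's launcher uses is not stated here, so the `dicto` ID in the example is hypothetical, and this is not the portal's actual parser.

```rust
/// Extract an application ID from the contents of /proc/PID/cgroup.
///
/// ASSUMPTION: the scope follows the systemd desktop naming convention
/// `app[-<launcher>]-<app_id>-<random>.scope`. Escaped characters in the
/// ID (for example `\x2d` for a dash) are not unescaped in this sketch.
fn app_id_from_cgroup(cgroup: &str) -> Option<String> {
    cgroup.lines().find_map(|line| {
        // Each line looks like "0::/user.slice/.../app-dicto-1234.scope";
        // the unit name is the last path segment.
        let unit = line.rsplit('/').next()?;
        let body = unit.strip_prefix("app-")?.strip_suffix(".scope")?;
        // Drop the trailing "-<random>" part.
        let (rest, _random) = body.rsplit_once('-')?;
        // What follows the last remaining dash is the ID; anything
        // before it is the optional launcher prefix.
        let id = rest.rsplit('-').next()?;
        if id.is_empty() {
            None
        } else {
            Some(id.to_string())
        }
    })
}

fn main() {
    // Reads this process's own cgroup; empty on systems without /proc.
    let cgroup = std::fs::read_to_string("/proc/self/cgroup").unwrap_or_default();
    match app_id_from_cgroup(&cgroup) {
        Some(id) => println!("app id: {id}"),
        None => println!("no app-*.scope unit found"),
    }
}
```

A process started directly from a terminal usually sits in a session or terminal scope instead, which is why a launcher that creates a dedicated scope helps the portal attribute the shortcut to the right application.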

License

MIT
