An intelligent voice agent that navigates Interactive Voice Response (IVR) systems using LiveKit's voice agents, allowing users to accomplish custom tasks through external phone menu systems.
NavigatorAgent - A voice-enabled agent that listens to IVR menu options, interprets the choices, and automatically sends DTMF (dual-tone multi-frequency) codes to navigate through phone systems based on a user-defined task.
- Task-Based Navigation: Define your goal, and the agent intelligently navigates IVR menus to accomplish it
- DTMF Code Automation: Automatically sends touch-tone codes to interact with phone systems
- Real-time Visual Feedback: See DTMF codes being pressed in the web interface
- Voice-Enabled: Built using LiveKit's voice capabilities with support for:
  - Speech-to-Text (STT) using Deepgram
  - Large Language Model (LLM) using OpenAI
  - Text-to-Speech (TTS) using Cartesia
  - Voice Activity Detection (VAD) using Silero
- SIP Integration: Connects to phone systems via SIP trunks
- Web Frontend: React-based interface for initiating calls and monitoring progress
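"Dual-tone" means each keypad key is encoded as two simultaneous sine waves: one low (row) frequency and one high (column) frequency. The sketch below only illustrates what a DTMF code is; the agent itself sends codes through its send_dtmf_code tool rather than synthesizing audio. The helper names dtmf_frequencies and dtmf_tone are hypothetical.

```python
from __future__ import annotations

import math

# Standard DTMF keypad: a key's tone is the sum of its row and column frequencies (Hz).
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple[int, int]:
    """Return the (low, high) frequency pair for a single keypad key."""
    if len(key) == 1:
        for row, keys in enumerate(KEYPAD):
            col = keys.find(key.upper())
            if col != -1:
                return ROW_FREQS[row], COL_FREQS[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_tone(key: str, duration_s: float = 0.1, sample_rate: int = 8000) -> list[float]:
    """Synthesize a DTMF tone as float samples in [-1.0, 1.0]."""
    low, high = dtmf_frequencies(key)
    n = round(duration_s * sample_rate)
    # Each component gets half the amplitude so the sum never exceeds 1.0.
    return [
        0.5 * math.sin(2 * math.pi * low * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * t / sample_rate)
        for t in range(n)
    ]
```

For example, dtmf_frequencies("5") gives (770, 1336), and a 100 ms tone at 8 kHz is 800 samples.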
- User defines a task in the web interface (e.g., "Check my account balance")
- User enters the phone number they want to call
- The agent connects to the phone system via SIP
- As the IVR presents menu options, the agent listens and interprets them
- Based on the task, the agent automatically presses the appropriate DTMF codes
- Visual feedback shows which codes are being pressed in real-time
- The agent continues navigating until the task is completed or requires human intervention
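In the real agent, the LLM decides which key to press. To make the listen, decide, press loop concrete, here is a toy stand-in that replaces the LLM with simple keyword overlap between the task and each spoken option. The menu phrasing pattern and the functions parse_menu and choose_key are hypothetical and far less robust than an LLM.

```python
from __future__ import annotations

import re

# Matches phrases such as "For billing, press 1" (assumed IVR phrasing).
OPTION_PATTERN = re.compile(r"for ([^,.]+?),?\s+press\s+([0-9*#])", re.IGNORECASE)


def parse_menu(prompt: str) -> dict[str, str]:
    """Map each spoken option description to the key that selects it."""
    return {desc.strip().lower(): key for desc, key in OPTION_PATTERN.findall(prompt)}


def choose_key(task: str, prompt: str) -> str | None:
    """Pick the option whose description shares the most words with the task."""
    task_words = set(re.findall(r"[a-z]+", task.lower()))
    best_key, best_score = None, 0
    for desc, key in parse_menu(prompt).items():
        score = len(task_words & set(desc.split()))
        if score > best_score:
            best_key, best_score = key, score
    # None means no option matched, a cue that human intervention may be needed.
    return best_key
```

With the menu "For billing, press 1. For account balance, press 2." and the task "Check my account balance", this picks "2".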
- Python 3.10+
- livekit-agents>=1.0
- LiveKit account and credentials
- API keys for:
  - OpenAI (for LLM capabilities)
  - Deepgram (for speech-to-text)
  - Cartesia (for text-to-speech)
- SIP trunk configured in LiveKit for phone connectivity
- Node.js and pnpm (for the frontend)
- Clone the repository.
- Install Python dependencies:

      pip install -r requirements.txt

- Create a .env file in the parent directory with your API credentials:

      LIVEKIT_URL=your_livekit_url
      LIVEKIT_API_KEY=your_api_key
      LIVEKIT_API_SECRET=your_api_secret
      LIVEKIT_HOST=your_livekit_host
      OPENAI_API_KEY=your_openai_key
      DEEPGRAM_API_KEY=your_deepgram_key
      CARTESIA_API_KEY=your_cartesia_key
      SIP_TRUNK_ID=your_sip_trunk_id

- Start the agent:

      python agent.py dev

- In a separate terminal, start the Flask backend:

      python app.py

- In another terminal, navigate to the frontend directory and start the React app:

      cd ivr-agent-frontend
      pnpm install
      pnpm dev
Once the agent, the Flask backend, and the React app are all running, open the web interface at http://localhost:5173 to start a call.
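A missing or empty credential is an easy way for one of the three processes to fail at startup. The following is a minimal stdlib sketch of a pre-flight check for the variables listed in the .env example; the helpers parse_env, missing_vars, and check_env_file are hypothetical, and the parser only handles simple KEY=value lines (it is not a full dotenv implementation).

```python
from __future__ import annotations

from pathlib import Path

# Variable names from the .env example in the setup steps.
REQUIRED_VARS = (
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_HOST",
    "OPENAI_API_KEY", "DEEPGRAM_API_KEY", "CARTESIA_API_KEY", "SIP_TRUNK_ID",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


def check_env_file(path: Path) -> list[str]:
    """Report missing variables; a missing file means every variable is missing."""
    if not path.exists():
        return list(REQUIRED_VARS)
    return missing_vars(parse_env(path.read_text()))
```

For example, check_env_file(Path("..") / ".env") run from the project directory returns an empty list when every credential is filled in.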
- NavigatorAgent: The core agent class that handles IVR navigation
- UserData: Stores session-specific data including the task and DTMF cooldown timing
- send_dtmf_code: Function tool that sends DTMF codes with a 3-second cooldown to prevent rapid firing
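The cooldown logic can be sketched independently of LiveKit. This is a minimal sketch, assuming UserData holds the task plus the time of the last press; the field name last_dtmf_press, the helper try_send_dtmf, and the injected send and now callables are hypothetical, and in the real tool send would be the call that actually publishes the DTMF code.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Optional

DTMF_COOLDOWN_S = 3.0  # the 3-second cooldown described for send_dtmf_code


@dataclass
class UserData:
    """Per-session state: the user's task and when a code was last sent."""
    task: str
    last_dtmf_press: Optional[float] = None


def try_send_dtmf(
    userdata: UserData,
    code: str,
    send: Callable[[str], None],
    now: Callable[[], float] = time.monotonic,
) -> bool:
    """Send `code` unless the previous press was under DTMF_COOLDOWN_S ago."""
    t = now()
    if userdata.last_dtmf_press is not None and t - userdata.last_dtmf_press < DTMF_COOLDOWN_S:
        return False  # too soon: drop this press instead of firing rapidly
    send(code)
    userdata.last_dtmf_press = t
    return True
```

Rejecting rather than sleeping keeps the tool call fast; a real tool could return a message telling the LLM to wait before retrying. Injecting now also makes the timing testable with a fake clock.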
- SIP participant connects to the LiveKit room
- Task is extracted from participant attributes
- Agent session is initialized with voice providers
- Agent receives task-specific instructions
- Agent listens to IVR and uses the LLM to decide which DTMF codes to send
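Participant attributes are a string-to-string mapping, so steps two and four can be sketched as plain functions over a dict. The attribute key "task", the function names, and the prompt wording below are all hypothetical; the actual template lives in the agent's on_enter method.

```python
from __future__ import annotations

# Hypothetical attribute key; use whichever key your backend sets on the SIP participant.
TASK_ATTRIBUTE = "task"


def extract_task(attributes: dict[str, str], default: str = "") -> str:
    """Read the user's task from the participant's attributes."""
    return attributes.get(TASK_ATTRIBUTE, default).strip()


def build_instructions(task: str) -> str:
    """Hypothetical task-specific prompt for the navigating agent."""
    return (
        "You are navigating a phone IVR system on behalf of a user.\n"
        f"The user's task is: {task}\n"
        "Listen to each menu. When an option moves you toward the task, "
        "call send_dtmf_code with the matching key. "
        "If the task is complete or a human is needed, say so instead of pressing keys."
    )
```

Keeping prompt construction in one small function makes it easy to adjust how the agent interprets tasks without touching session setup.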
- React app provides task input and phone number entry
- Real-time visualization of agent state and DTMF codes being pressed
- WebSocket connection to the LiveKit room for live updates
While not strictly necessary (a Next.js app could handle both frontend and API calls), this example includes a Flask backend specifically to demonstrate how to use LiveKit's Python API library for making API calls. The Flask server handles:
- Generating LiveKit tokens for room access
- Making API calls to initiate SIP calls
- Serving as an example of Python-based LiveKit integration
In production, you could consolidate these functions into your frontend framework of choice.
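The Flask backend would normally build tokens with the AccessToken helper in LiveKit's Python API library. To show what such a token carries, here is a stdlib-only sketch of an HS256-signed JWT with a room-join grant. The claim layout (iss as the API key, sub as the identity, a video grant with roomJoin and room) reflects our understanding of LiveKit's token format; treat it as an illustration, not a replacement for the library. make_room_token is a hypothetical name.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_room_token(api_key: str, api_secret: str, identity: str, room: str,
                    ttl_s: int = 3600) -> str:
    """Build an HS256 JWT granting `identity` permission to join `room`."""
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,    # issuer: the API key
        "sub": identity,   # participant identity
        "nbf": now,        # not valid before now
        "exp": now + ttl_s,
        "video": {"roomJoin": True, "room": room},  # assumed grant layout
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    signature = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

Because the signature uses the API secret, token generation must stay on a server such as the Flask backend; the secret should never reach the React app.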
- Modify Agent Instructions: Update the prompt template in the on_enter method to change how the agent interprets tasks
- Change Voice Providers: Replace Deepgram, OpenAI, or Cartesia with other supported providers in the entrypoint function
- Adjust DTMF Timing: Modify the cooldown period in the send_dtmf_code function (currently 3 seconds)
- Extend Task Capabilities: Add more sophisticated task parsing or multi-step navigation logic
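One direction for multi-step navigation is to remember the path taken and notice when the agent keeps hearing the same menu, which could be the point to stop and ask for human intervention. This is a minimal sketch; NavigationTracker, its methods, and the repeat threshold are hypothetical and not part of the project.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class NavigationTracker:
    """Records (menu prompt, key) pairs and flags likely menu loops."""
    max_repeats: int = 2
    path: list[tuple[str, str]] = field(default_factory=list)
    seen: Counter = field(default_factory=Counter)

    def record(self, prompt: str, key: str) -> None:
        # Normalize case and whitespace so small transcription differences still match.
        normalized = " ".join(prompt.lower().split())
        self.path.append((normalized, key))
        self.seen[normalized] += 1

    def is_looping(self) -> bool:
        """True once any menu has been heard more than max_repeats times."""
        return any(count > self.max_repeats for count in self.seen.values())

    def keys_pressed(self) -> str:
        return "".join(key for _, key in self.path)
```

Exact-match normalization is deliberately simple; transcripts of the same menu can differ slightly, so a fuzzier comparison may be needed in practice.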