py-xiaozhi Guide: Python Voice Client for Xiaozhi AI - Smart Home & IoT Integration

py-xiaozhi is an open-source, Python-based voice client that lets users try Xiaozhi AI functionality without dedicated hardware. It supports voice interaction, image recognition, IoT device integration, and online music playback, and runs on Windows, macOS, and Linux. This guide covers installation, configuration, operation, and feature usage, and is suitable for both beginners and advanced users.
1. Project Overview
py-xiaozhi is an open-source project ported from xiaozhi-esp32, offering the following core features:
- AI Voice Interaction: Natural language dialogue through microphone
- Visual Multimodal: Support for image recognition and processing
- IoT Device Integration: Smart home device control and ecosystem building
- Online Music Playback: High-performance music player based on Pygame, supporting playback control, lyrics display, and local caching
- Graphical Interface: Intuitive GUI with Xiaozhi expressions and text display
- Voice Wake-up: Support for wake word activation (disabled by default)
- Auto-dialogue Mode: Continuous conversation for smoother interaction
- Cross-platform Volume Control: Adaptable to different operating systems
The project is hosted on GitHub, with documentation and tutorials available at the project website.
2. System Requirements
Before using py-xiaozhi, ensure your device meets the following requirements:
Python Version
- Python version: 3.9 to 3.12 (the latest stable release within this range is recommended)
Operating System
- Windows 10 or higher
- macOS 10.15 (Catalina) or higher
- Linux (common distributions like Ubuntu, Debian, etc.)
Hardware Requirements
- Microphone: For voice input
- Speakers: For voice output
- Stable network connection: For online features like music playback and AI interaction
- Disk space: At least 500MB available space (for project files and dependencies)
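The Python version and disk space requirements above can be checked from a short script before installing anything. The following is a minimal sketch using only the standard library; the function names are illustrative and are not part of py-xiaozhi. Microphone and speaker availability is not covered, since checking audio devices needs a third-party audio library.

```python
import platform
import shutil
import sys

# Values taken from the requirements listed above.
MIN_PYTHON = (3, 9)
MAX_PYTHON = (3, 12)
MIN_FREE_BYTES = 500 * 1024 * 1024  # 500 MB


def python_version_ok(version=None):
    """Return True if the (major, minor) version is within 3.9 to 3.12."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def disk_space_ok(path=".", required=MIN_FREE_BYTES):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required


def preflight_report(path="."):
    """Collect the checks into a dict so they can be printed or logged."""
    return {
        "os": platform.system(),  # e.g. "Windows", "Darwin", "Linux"
        "python": platform.python_version(),
        "python_ok": python_version_ok(),
        "disk_ok": disk_space_ok(path),
    }


# Print one line per check for the current machine.
for name, value in preflight_report().items():
    print(f"{name}: {value}")
```

If python_ok or disk_ok prints False, resolve that before continuing with the installation steps.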
3. Download and Installation
3.1 Download the Project
- Visit the GitHub repository
- Choose one of the following methods to get the project:
  - Clone the repository (recommended for easy updates):
    git clone https://github.com/huangjunsen0406/py-xiaozhi.git
  - Download ZIP file: Click the “Code” button on the GitHub page, select “Download ZIP”, then extract locally
- Ensure Git (for cloning and updates) or extraction tools are installed
3.2 Install Python
- Verify Python version 3.9-3.12:
  - Run python --version or python3 --version to check the version
  - If Python is not installed or the version does not match, download and install it from the Python website
- Ensure pip (Python package manager) is available:
  - Run pip --version or pip3 --version to check
  - If unavailable, refer to the official Python documentation for installation
3.3 Install Dependencies
- Enter project directory:
  cd py-xiaozhi
- Install dependencies:
  - If project provides requirements.txt, run:
    pip install -r requirements.txt
  - If no explicit list, refer to project documentation or GitHub README for dependency list
Note: New dependencies may be introduced after main branch updates. Re-run pip install -r requirements.txt after updating to keep the environment consistent.
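To see whether an update introduced packages that are not yet installed, the names in a requirements file can be compared against the current environment. The sketch below is a simplified illustration: the parser handles only plain "name" and "name==version" style lines, the helper names are hypothetical, and the sample package names are placeholders rather than py-xiaozhi's actual dependencies.

```python
import re
from importlib import metadata

# Matches the distribution name at the start of a requirement line.
_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(text):
    """Extract package names from requirements.txt-style text.

    Simplified: skips blank lines, comments, and option lines such as
    "-r other.txt"; it does not handle every form pip accepts.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = _NAME_RE.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Placeholder requirement lines, used only to demonstrate the parser.
sample = "example-pkg>=1.0\n# audio libraries\nanother_example==2.3\n"
print(requirement_names(sample))
```

For a real check, read the text from requirements.txt in the project directory and pass the resulting names to missing_packages; an empty list means every listed package is present.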
4. Configuration
py-xiaozhi configuration files are located in the config/ directory, including:
- config.json: Stores general settings like wake words, communication protocols, API keys
- efuse.json: Hardware or specific feature configurations (as needed)
4.1 Configuration Steps
- Open the config/ directory, locate config.json
- Modify key fields as needed (refer to configuration documentation for specific fields):
- Wake word: Set trigger keyword for voice interaction (disabled by default)
- Protocol: Choose between mqtt or websocket for communication
- API keys: Enter relevant keys for online AI services
- Device settings: Configure microphone and speaker device IDs (detectable via script)
- Save file, ensuring valid JSON format
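A quick way to confirm that the saved file is still valid JSON is to load it with Python's json module. This is a minimal sketch: load_config is a hypothetical helper, not a py-xiaozhi function, and the check that the top level is a JSON object is an assumption about the file's layout. It does not validate individual fields, since those are defined by the project's configuration documentation.

```python
import json
from pathlib import Path


def load_config(path="config/config.json"):
    """Load a JSON config file, raising ValueError with the error location."""
    config_path = Path(path)
    text = config_path.read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # Report where parsing failed so the typo is easy to find.
        raise ValueError(
            f"{config_path}: invalid JSON at line {err.lineno}, "
            f"column {err.colno}: {err.msg}"
        ) from err
    # Assumption: the config is a JSON object (key/value pairs) at the top level.
    if not isinstance(data, dict):
        raise ValueError(f"{config_path}: expected a JSON object at the top level")
    return data
```

Calling load_config() from the project root after editing will either return the parsed settings or point at the line and column of the first syntax error, such as a trailing comma or a missing quote.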
4.2 IoT Feature Configuration
For IoT functionality (e.g., smart home device control), refer to IoT documentation:
- Configure IoT device connection parameters (IP address, port)
- Ensure devices are on the same network as py-xiaozhi
- Test connection
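One simple form of connection test is to check whether the device's IP address and port accept a TCP connection at all. The sketch below uses only the standard socket module; can_connect is an illustrative helper, and the address in the commented example is a placeholder. A successful TCP connection only shows that the port is reachable, not that the device speaks the expected protocol.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


# Example (placeholder address; 1883 is the conventional MQTT port):
# print(can_connect("192.168.1.50", 1883))
```

If this returns False for a device that is powered on, check that it is on the same network as the machine running py-xiaozhi and that no firewall blocks the port.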
5. Running py-xiaozhi
5.1 Launch Commands
py-xiaozhi launches via main.py, supporting the following parameters:
Parameter | Description | Options |
---|---|---|
--mode | Operation mode | gui (graphical interface) or cli (command line) |
--protocol | Communication protocol | mqtt or websocket |
Example command:
python main.py --mode gui --protocol websocket
- GUI mode: Provides graphical interface with Xiaozhi expressions and interaction text, suitable for beginners
- CLI mode: Suitable for embedded devices or environments without GUI, ideal for advanced users
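For readers curious how flags like these are typically handled, the sketch below mirrors the two parameters with Python's argparse module. It is not the project's actual main.py; the default values chosen here (gui and websocket) are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser that accepts the two launch parameters from the table."""
    parser = argparse.ArgumentParser(description="Illustrative launcher options")
    parser.add_argument(
        "--mode",
        choices=["gui", "cli"],
        default="gui",  # assumed default, not confirmed by the project
        help="gui (graphical interface) or cli (command line)",
    )
    parser.add_argument(
        "--protocol",
        choices=["mqtt", "websocket"],
        default="websocket",  # assumed default, not confirmed by the project
        help="communication protocol: mqtt or websocket",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--mode", "cli", "--protocol", "mqtt"])
print(args.mode, args.protocol)
```

Restricting each flag with choices means a typo such as --mode web is rejected immediately with a usage message rather than failing later at runtime.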
5.2 First Run
- Ensure microphone and speakers are working properly
- Run the above command; the program will initialize and wait for voice input
- If wake word is enabled, speak the wake word (e.g., “Xiaozhi”) to activate interaction
- Test voice interaction, e.g., “What’s the weather today?” or “Play a song”
6. Feature Usage
6.1 AI Voice Interaction
Feature: Natural dialogue with Xiaozhi AI through microphone input.
Usage:
- After launching, wait for prompt tone or interface indication
- Speak commands, e.g., “Tell a joke” or “Check news”
- Program responds through speakers or interface
Tip: Enable continuous dialogue mode for smooth interaction without repeated wake-up.
6.2 Visual Multimodal
Feature: Support for image upload and recognition, combined with voice for multimodal interaction.
Usage:
- In GUI mode, upload images (supports JPG, PNG, etc.)
- Speak related commands, e.g., “Describe this image”
- Xiaozhi analyzes image and returns description
Note: This feature requires a stable network connection and additional API configuration (Zhipu multimodal model).
6.3 IoT Device Integration
Feature: Control smart home devices like lights, air conditioners, etc.
Usage:
- Refer to IoT documentation for device configuration
- Use voice commands, e.g., “Turn on living room lights”
- Verify device state changes
Tip: Ensure devices support MQTT or WebSocket protocol.
6.4 Online Music Playback
Feature: Play online music through Pygame, supporting playback control and lyrics display.
Usage:
- Speak commands, e.g., “Play Jay Chou’s songs”
- Program fetches and plays music from online sources
- Support control commands like “pause” or “next”
Note: Local caching feature requires sufficient disk space.
6.5 Voice Wake-up and Continuous Dialogue
- Voice wake-up: Speak configured wake word to activate interaction (disabled by default)
- Continuous dialogue: No need for repeated wake-up after enabling, suitable for extended interaction
- Setup: Enable related options in config.json
6.6 Volume Control
Feature: Cross-platform volume adjustment for different scenarios.
Usage: Adjust through voice commands (e.g., “Increase volume”) or GUI interface.
7. Learning and Support Resources
7.1 Official Documentation
- Project website: huangjunsen0406.github.io/py-xiaozhi
- Includes quick start, configuration guide, and feature documentation
- GitHub documentation:
7.2 Video Tutorials
- Bilibili Video
- Provides visual installation and usage guidance, suitable for beginners
7.3 Community Support
- GitHub Issues: Visit Issues page to submit problems or view FAQs
- Community Discussion: Participate in GitHub Discussions or related forums for user experience sharing
8. Important Notes
- Regular Updates: py-xiaozhi is an open-source project with frequent main branch updates. Run git pull for latest code and reinstall dependencies
- Network Dependency: Online features (music playback, AI interaction) require stable network
- Debugging:
- If voice recognition fails, check microphone settings and network connection
- If GUI doesn’t display, ensure necessary graphics libraries (PyQt or Tkinter) are installed
- Compatibility: Some features (e.g., IoT) may require specific hardware or protocol support; confirm device compatibility in advance
- Current Time: This guide is based on information from May 20, 2025; check the latest documentation for updates
9. Troubleshooting
Issue | Possible Cause | Solution |
---|---|---|
Program won’t start | Missing dependencies or version incompatibility | Run pip install -r requirements.txt, check Python version |
Voice recognition fails | Microphone not properly configured | Check system audio settings, verify microphone availability |
IoT device connection fails | Network or protocol configuration error | Refer to IoT documentation, check device IP and protocol |
GUI doesn’t display | Missing graphics libraries or environment issues | Install PyQt/Tkinter, check display settings |
Music playback fails | Unstable network or API configuration error | Check network, verify API key validity |
For other issues, check GitHub Issues or submit new issues.
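For the "GUI doesn’t display" row, it can help to confirm which graphics libraries Python can actually find before reinstalling anything. The sketch below checks a few candidate module names with importlib; which toolkit py-xiaozhi actually requires is not stated here, so the list is an assumption and may need adjusting.

```python
import importlib.util

# Candidate module names; the toolkit py-xiaozhi needs is an assumption here.
GUI_MODULES = ("PyQt5", "PyQt6", "tkinter")


def available_modules(names=GUI_MODULES):
    """Return the module names that can be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is not None]


found = available_modules()
print("GUI libraries found:", ", ".join(found) if found else "none")
```

If the output is "none", install the toolkit named in the project documentation, then run the check again in the same Python environment used to launch main.py.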







