py-xiaozhi Guide: Python Voice Client for Xiaozhi AI - Smart Home & IoT Integration


May 15, 2025 5 min read


py-xiaozhi is an open-source Python-based voice client designed to help users experience Xiaozhi AI functionality without dedicated hardware. It supports voice interaction, image recognition, IoT device integration, and online music playback, compatible with Windows, macOS, and Linux systems. This guide provides detailed instructions on installation, configuration, operation, and feature usage, suitable for both beginners and advanced users.

1. Project Overview

py-xiaozhi is an open-source project ported from xiaozhi-esp32, offering the following core features:

AI Voice Interaction: Natural language dialogue through microphone
Visual Multimodal: Support for image recognition and processing
IoT Device Integration: Smart home device control and ecosystem building
Online Music Playback: High-performance music player based on Pygame, supporting playback control, lyrics display, and local caching
Graphical Interface: Intuitive GUI with Xiaozhi expressions and text display
Voice Wake-up: Support for wake word activation (disabled by default)
Auto-dialogue Mode: Continuous conversation for smoother interaction
Cross-platform Volume Control: Adaptable to different operating systems

The project is hosted on GitHub, with documentation and tutorials available at the project website.

2. System Requirements

Before using py-xiaozhi, ensure your device meets the following requirements:

Python Version

Python version: 3.9 to 3.12 (latest stable version recommended)

Operating System

Windows 10 or higher
macOS 10.15 (Catalina) or higher
Linux (common distributions like Ubuntu, Debian, etc.)

Hardware Requirements

Microphone: For voice input
Speakers: For voice output
Stable network connection: For online features like music playback and AI interaction
Disk space: At least 500MB available space (for project files and dependencies)
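The disk space requirement can be checked before downloading anything. The following is a minimal sketch using only the standard library; the helper name `has_enough_space` is hypothetical and not part of py-xiaozhi.

```python
import shutil

# The 500 MB figure comes from the requirement listed above.
REQUIRED_BYTES = 500 * 1024 * 1024


def has_enough_space(path=".", required=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has at least `required` bytes free."""
    free = shutil.disk_usage(path).free
    return free >= required


if __name__ == "__main__":
    print("Enough free disk space:", has_enough_space())
```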

3. Download and Installation

3.1 Download the Project

Visit the GitHub repository
Choose one of the following methods to get the project:
- Clone the repository (recommended for easy updates):
```
git clone https://github.com/huangjunsen0406/py-xiaozhi.git
```
- Download ZIP file: Click the “Code” button on the GitHub page, select “Download ZIP”, then extract locally
Ensure Git (for cloning and updates) or extraction tools are installed

3.2 Install Python

Verify that Python 3.9 to 3.12 is installed:
- Run `python --version` or `python3 --version` to check the version
- If Python is not installed or the version is outside this range, download and install a supported release from the Python website
Ensure pip (Python package manager) is available:
- Run `pip --version` or `pip3 --version` to check
- If it is unavailable, refer to the official Python documentation for installation
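The same version check can be done from inside Python, which is useful when several interpreters are installed. This is a small sketch; the function name `is_supported` is an illustrative choice, and the bounds simply mirror the 3.9 to 3.12 range stated above.

```python
import sys

# Supported range as stated in this guide.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 12)


def is_supported(version=None):
    """Return True if the (major, minor) version lies within the supported range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version)[:2]
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {current} supported: {is_supported()}")
```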

3.3 Install Dependencies

Enter project directory:
```
cd py-xiaozhi
```
Install dependencies:
- If project provides requirements.txt, run:
```
pip install -r requirements.txt
```
- If no explicit list, refer to project documentation or GitHub README for dependency list

Note: Updates to the main branch may introduce new dependencies, so re-run `pip install -r requirements.txt` after pulling new code to keep the environment consistent.
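When several Python installations coexist, a bare `pip` may belong to a different interpreter than the one that will run py-xiaozhi. One common way around this is to invoke pip through the current interpreter. The sketch below assumes the project ships a `requirements.txt` in its root; both function names are hypothetical helpers, not part of the project.

```python
import subprocess
import sys
from pathlib import Path


def pip_install_command(requirements="requirements.txt"):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def install_requirements(project_dir="."):
    """Install dependencies from <project_dir>/requirements.txt, if that file exists."""
    requirements = Path(project_dir) / "requirements.txt"
    if not requirements.is_file():
        raise FileNotFoundError(f"No requirements.txt found in {project_dir}")
    # check=True raises CalledProcessError if pip exits with an error.
    subprocess.run(pip_install_command(requirements), check=True)
```

Calling `install_requirements("py-xiaozhi")` from the parent directory would then install into the same environment that runs the script.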

4. Configuration

py-xiaozhi configuration files are located in the config/ directory, including:

config.json: Stores general settings like wake words, communication protocols, API keys
efuse.json: Hardware or specific feature configurations (as needed)

4.1 Configuration Steps

Open the config/ directory, locate config.json
Modify key fields as needed (refer to configuration documentation for specific fields):
- Wake word: Set trigger keyword for voice interaction (disabled by default)
- Protocol: Choose between mqtt or websocket for communication
- API keys: Enter relevant keys for online AI services
- Device settings: Configure microphone and speaker device IDs (detectable via script)
Save the file and make sure it is still valid JSON
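A stray comma or missing quote in config.json is easy to introduce when editing by hand. The sketch below checks that the file still parses and reports where it breaks. It makes no assumption about the actual field names in config.json; `parse_config` and `load_config` are hypothetical helper names.

```python
import json
from pathlib import Path


def parse_config(text, source="config.json"):
    """Parse JSON text and raise a readable error pointing at the broken position."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"{source} is not valid JSON "
            f"(line {err.lineno}, column {err.colno}): {err.msg}"
        ) from err
    if not isinstance(data, dict):
        raise ValueError(f"{source} should contain a JSON object at the top level")
    return data


def load_config(path="config/config.json"):
    """Read and validate the configuration file at `path`."""
    return parse_config(Path(path).read_text(encoding="utf-8"), source=str(path))
```

Running `load_config()` from the project root after each edit gives a quick confirmation that the file is still loadable.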

4.2 IoT Feature Configuration

For IoT functionality (e.g., smart home device control), refer to IoT documentation:

Configure IoT device connection parameters (IP address, port)
Ensure devices are on the same network as py-xiaozhi
Test connection
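For the connection test, a plain TCP check is often enough to tell whether a device or broker is reachable at the configured IP address and port, since MQTT and WebSocket both run over TCP. This is a minimal sketch and does not use py-xiaozhi's own IoT code; `can_connect` is a hypothetical helper, and the address in the usage note is a placeholder.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `can_connect("192.168.1.50", 1883)` would probe a placeholder device address on 1883, the standard unencrypted MQTT port. A successful TCP connection only shows that the port is open; it does not verify the protocol or credentials.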

5. Running py-xiaozhi

5.1 Launch Commands

py-xiaozhi launches via main.py, supporting the following parameters:

| Parameter | Description | Options |
| --- | --- | --- |
| `--mode` | Operation mode | `gui` (graphical interface) or `cli` (command line) |
| `--protocol` | Communication protocol | `mqtt` or `websocket` |

Example command:

python main.py --mode gui --protocol websocket

GUI mode: Provides graphical interface with Xiaozhi expressions and interaction text, suitable for beginners
CLI mode: Suitable for embedded devices or environments without GUI, ideal for advanced users
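To illustrate how the two parameters constrain input, here is a sketch of an equivalent command-line parser built with `argparse`. It is not the project's actual main.py, and the default values shown are assumptions; only the option names and allowed values come from the table above.

```python
import argparse


def build_parser():
    """Build a parser accepting the --mode and --protocol options described above."""
    parser = argparse.ArgumentParser(description="Illustrative launcher options")
    parser.add_argument(
        "--mode",
        choices=["gui", "cli"],
        default="gui",  # assumed default, not confirmed by the project
        help="gui (graphical interface) or cli (command line)",
    )
    parser.add_argument(
        "--protocol",
        choices=["mqtt", "websocket"],
        default="websocket",  # assumed default, not confirmed by the project
        help="communication protocol",
    )
    return parser
```

With `choices` set, a value such as `--mode web` is rejected with a usage message before any other code runs, while `build_parser().parse_args(["--mode", "cli", "--protocol", "mqtt"])` yields `mode == "cli"` and `protocol == "mqtt"`.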

5.2 First Run

Ensure microphone and speakers are working properly
Run the command above; the program initializes and waits for voice input
If wake word is enabled, speak the wake word (e.g., “Xiaozhi”) to activate interaction
Test voice interaction, e.g., “What’s the weather today?” or “Play a song”

6. Feature Usage

6.1 AI Voice Interaction

Feature: Natural dialogue with Xiaozhi AI through microphone input.

Usage:

After launching, wait for prompt tone or interface indication
Speak commands, e.g., “Tell a joke” or “Check news”
Program responds through speakers or interface

Tip: Enable continuous dialogue mode for smooth interaction without repeated wake-up.

6.2 Visual Multimodal

Feature: Support for image upload and recognition, combined with voice for multimodal interaction.

Usage:

In GUI mode, upload images (supports JPG, PNG, etc.)
Speak related commands, e.g., “Describe this image”
Xiaozhi analyzes image and returns description

Note: This feature needs a stable network connection and additional API configuration for the Zhipu multimodal model.

6.3 IoT Device Integration

Feature: Control smart home devices like lights, air conditioners, etc.

Usage:

Refer to IoT documentation for device configuration
Use voice commands, e.g., “Turn on living room lights”
Verify device state changes

Tip: Ensure devices support MQTT or WebSocket protocol.

6.4 Online Music Playback

Feature: Play online music through Pygame, with playback control and lyrics display.

Usage:

Speak commands, e.g., “Play Jay Chou’s songs”
Program fetches and plays music from online sources
Support control commands like “pause” or “next”

Note: Local caching feature requires sufficient disk space.

6.5 Voice Wake-up and Continuous Dialogue

Voice wake-up: Speak configured wake word to activate interaction (disabled by default)
Continuous dialogue: No need for repeated wake-up after enabling, suitable for extended interaction
Setup: Enable related options in config.json

6.6 Volume Control

Feature: Cross-platform volume adjustment for different scenarios.

Usage: Adjust through voice commands (e.g., “Increase volume”) or the GUI.

7. Learning and Support Resources

7.1 Official Documentation

Project website: huangjunsen0406.github.io/py-xiaozhi
Includes quick start, configuration guide, and feature documentation
GitHub documentation:
- Configuration Guide
- IoT Feature Guide

7.2 Video Tutorials

Bilibili Video
Provides visual installation and usage guidance, suitable for beginners

7.3 Community Support

GitHub Issues: Visit Issues page to submit problems or view FAQs
Community Discussion: Participate in GitHub Discussions or related forums for user experience sharing

8. Important Notes

Regular Updates: py-xiaozhi is an open-source project whose main branch is updated frequently. Run `git pull` to get the latest code, then reinstall dependencies
Network Dependency: Online features (music playback, AI interaction) require stable network
Debugging:
- If voice recognition fails, check microphone settings and network connection
- If GUI doesn’t display, ensure necessary graphics libraries (PyQt or Tkinter) are installed
Compatibility: Some features (e.g., IoT) may require specific hardware or protocol support; confirm device compatibility in advance
Current Time: This guide is based on information as of May 20, 2025; check the latest documentation for updates

9. Troubleshooting

| Issue | Possible Cause | Solution |
| --- | --- | --- |
| Program won’t start | Missing dependencies or version incompatibility | Run `pip install -r requirements.txt`, check Python version |
| Voice recognition fails | Microphone not properly configured | Check system audio settings, verify microphone availability |
| IoT device connection fails | Network or protocol configuration error | Refer to IoT documentation, check device IP and protocol |
| GUI doesn’t display | Missing graphics libraries or environment issues | Install PyQt/Tkinter, check display settings |
| Music playback fails | Unstable network or API configuration error | Check network, verify API key validity |
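Several of the issues above come down to a missing Python package. The sketch below reports which modules cannot be found in the current environment, without importing them. The module names in the usage note are assumptions (the guide mentions Pygame, PyQt, and Tkinter but not exact package versions); check the project's requirements.txt for the real list. `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `missing_modules(["pygame", "tkinter", "PyQt5"])` returns the subset that still needs installing; an empty list means all three were found.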