py-xiaozhi Guide: Python Voice Client for Xiaozhi AI - Smart Home & IoT Integration

py-xiaozhi is an open-source, Python-based voice client that lets users try Xiaozhi AI functionality without dedicated hardware. It supports voice interaction, image recognition, IoT device integration, and online music playback, and runs on Windows, macOS, and Linux. This guide covers installation, configuration, operation, and feature usage for both beginners and advanced users.

1. Project Overview

py-xiaozhi is an open-source project ported from xiaozhi-esp32, offering the following core features:

  • AI Voice Interaction: Natural language dialogue through microphone
  • Visual Multimodal: Support for image recognition and processing
  • IoT Device Integration: Smart home device control and ecosystem building
  • Online Music Playback: High-performance music player based on Pygame, supporting playback control, lyrics display, and local caching
  • Graphical Interface: Intuitive GUI with Xiaozhi expressions and text display
  • Voice Wake-up: Support for wake word activation (disabled by default)
  • Auto-dialogue Mode: Continuous conversation for smoother interaction
  • Cross-platform Volume Control: Adaptable to different operating systems

The project is hosted on GitHub, with documentation and tutorials available at the project website.

2. System Requirements

Before using py-xiaozhi, ensure your device meets the following requirements:

Python Version

  • Python version: 3.9 to 3.12 (the latest stable release within this range is recommended)

Operating System

  • Windows 10 or higher
  • macOS 10.15 (Catalina) or higher
  • Linux (common distributions like Ubuntu, Debian, etc.)

Hardware Requirements

  • Microphone: For voice input
  • Speakers: For voice output
  • Stable network connection: For online features like music playback and AI interaction
  • Disk space: At least 500MB available space (for project files and dependencies)
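
The disk space requirement can be checked before installing. The snippet below is a minimal sketch using only the Python standard library; the function name has_enough_space is illustrative and not part of py-xiaozhi, and the 500 MB threshold is simply the figure stated above.

```python
import shutil

# 500 MB, the minimum free space stated in the requirements above
REQUIRED_BYTES = 500 * 1024 * 1024


def has_enough_space(path=".", required=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has at least `required` free bytes."""
    return shutil.disk_usage(path).free >= required


print("Enough disk space:", has_enough_space())
```

Run it from the directory where you plan to place the project, since free space is reported per filesystem.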

3. Download and Installation

3.1 Download the Project

  1. Visit the GitHub repository
  2. Choose one of the following methods to get the project:
    • Clone the repository (recommended for easy updates):
         git clone https://github.com/huangjunsen0406/py-xiaozhi.git
    • Download ZIP file: Click the “Code” button on the GitHub page, select “Download ZIP”, then extract locally
  3. Make sure Git (for cloning and updating) or an archive extraction tool (for the ZIP file) is installed

3.2 Install Python

  1. Verify Python version 3.9-3.12:
    • Run python --version or python3 --version to check version
    • If not installed or version mismatch, download and install from Python website
  2. Ensure pip (Python package manager) is available:
    • Run pip --version or pip3 --version to check
    • If unavailable, refer to Python official documentation for installation
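
The version check can also be done from inside Python. This is a small sketch, assuming the supported range is exactly 3.9 through 3.12 as stated above; is_supported is a helper name chosen for illustration.

```python
import sys

MIN_VERSION = (3, 9)   # lowest supported version per this guide
MAX_VERSION = (3, 12)  # highest supported version per this guide


def is_supported(version=None):
    """Return True if the (major, minor) version falls within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


current = ".".join(str(part) for part in sys.version_info[:3])
print(f"Python {current} supported:", is_supported())
```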

3.3 Install Dependencies

  1. Enter project directory:
       cd py-xiaozhi
  2. Install dependencies:
    • If project provides requirements.txt, run:
         pip install -r requirements.txt
    • If no explicit list, refer to project documentation or GitHub README for dependency list

Note: Updates to the main branch may introduce new dependencies. After pulling new code, re-run pip install to keep the environment consistent.
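
To see which listed packages are missing after an update, something like the following sketch may help. It assumes the project ships a requirements.txt with one requirement per line; the parsing is deliberately simple (it only extracts the package name and ignores version constraints), and both function names are illustrative.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(lines):
    """Extract bare package names from requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


req_file = Path("requirements.txt")
if req_file.exists():
    names = requirement_names(req_file.read_text(encoding="utf-8").splitlines())
    print("Missing packages:", missing_packages(names) or "none")
```

This only reports whether a package is installed at all, not whether the installed version satisfies the constraint; pip install -r requirements.txt remains the reliable fix.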

4. Configuration

py-xiaozhi configuration files are located in the config/ directory, including:

  • config.json: Stores general settings like wake words, communication protocols, API keys
  • efuse.json: Hardware or specific feature configurations (as needed)

4.1 Configuration Steps

  1. Open the config/ directory, locate config.json
  2. Modify key fields as needed (refer to configuration documentation for specific fields):
    • Wake word: Set trigger keyword for voice interaction (disabled by default)
    • Protocol: Choose between mqtt or websocket for communication
    • API keys: Enter relevant keys for online AI services
    • Device settings: Configure microphone and speaker device IDs (detectable via script)
  3. Save file, ensuring valid JSON format
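
A quick way to confirm the file is still valid JSON after editing is to parse it with Python's json module. The sketch below assumes the top-level value in config.json is a JSON object; it does not check any specific field names, since those are defined by the project's configuration documentation. parse_config is an illustrative helper, not a py-xiaozhi function.

```python
import json
from pathlib import Path


def parse_config(text):
    """Parse config text and return a dict, or raise ValueError with the error location."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc
    if not isinstance(data, dict):  # assumption: settings live in a top-level object
        raise ValueError("top-level JSON value must be an object")
    return data


config_path = Path("config/config.json")
if config_path.exists():
    config = parse_config(config_path.read_text(encoding="utf-8"))
    print("config.json is valid; top-level keys:", sorted(config))
```

Common mistakes this catches include trailing commas and unquoted keys, both of which are invalid in JSON.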

4.2 IoT Feature Configuration

For IoT functionality (e.g., smart home device control), refer to IoT documentation:

  1. Configure IoT device connection parameters (IP address, port)
  2. Ensure devices are on the same network as py-xiaozhi
  3. Test connection
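
For the connection test, a basic TCP reachability check is often enough to rule out network problems. The following is a sketch using the standard socket module; it only confirms that the port accepts connections and does not speak MQTT or WebSocket. The address in the comment is a made-up example.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (hypothetical device address; 1883 is the standard MQTT port):
# print(can_connect("192.168.1.50", 1883))
```

If this returns False, check that the device is powered on, on the same network, and not blocked by a firewall before looking at protocol settings.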

5. Running py-xiaozhi

5.1 Launch Commands

py-xiaozhi launches via main.py, supporting the following parameters:

Parameter    Description              Options
--mode       Operation mode           gui (graphical interface) or cli (command line)
--protocol   Communication protocol   mqtt or websocket

Example command:

   python main.py --mode gui --protocol websocket

  • GUI mode: Provides graphical interface with Xiaozhi expressions and interaction text, suitable for beginners
  • CLI mode: Suitable for embedded devices or environments without GUI, ideal for advanced users
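
For readers curious how such flags are typically handled, the sketch below shows one way to define them with argparse. It is not taken from the project's main.py; the default values in particular are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the two launch parameters described above."""
    parser = argparse.ArgumentParser(description="Illustrative launcher options")
    parser.add_argument(
        "--mode",
        choices=["gui", "cli"],
        default="gui",  # assumed default, not confirmed by the project
        help="gui (graphical interface) or cli (command line)",
    )
    parser.add_argument(
        "--protocol",
        choices=["mqtt", "websocket"],
        default="websocket",  # assumed default, not confirmed by the project
        help="communication protocol",
    )
    return parser


args = build_parser().parse_args(["--mode", "cli", "--protocol", "mqtt"])
print(args.mode, args.protocol)
```

With choices set, argparse rejects any other value and prints a usage message, which is why a typo such as --mode guii fails immediately rather than at runtime.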

5.2 First Run

  1. Ensure microphone and speakers are working properly
  2. Run the command above; the program initializes and waits for voice input
  3. If wake word is enabled, speak the wake word (e.g., “Xiaozhi”) to activate interaction
  4. Test voice interaction, e.g., “What’s the weather today?” or “Play a song”

6. Feature Usage

6.1 AI Voice Interaction

Feature: Natural dialogue with Xiaozhi AI through microphone input.

Usage:

  1. After launching, wait for prompt tone or interface indication
  2. Speak commands, e.g., “Tell a joke” or “Check news”
  3. Program responds through speakers or interface

Tip: Enable continuous dialogue mode for smooth interaction without repeated wake-up.

6.2 Visual Multimodal

Feature: Support for image upload and recognition, combined with voice for multimodal interaction.

Usage:

  1. In GUI mode, upload images (supports JPG, PNG, etc.)
  2. Speak related commands, e.g., “Describe this image”
  3. Xiaozhi analyzes image and returns description

Note: This feature requires a stable network connection and additional API configuration (Zhipu multimodal model).

6.3 IoT Device Integration

Feature: Control smart home devices like lights, air conditioners, etc.

Usage:

  1. Refer to IoT documentation for device configuration
  2. Use voice commands, e.g., “Turn on living room lights”
  3. Verify device state changes

Tip: Ensure devices support MQTT or WebSocket protocol.

6.4 Online Music Playback

Feature: Play online music through Pygame, supporting playback control, lyrics display.

Usage:

  1. Speak commands, e.g., “Play Jay Chou’s songs”
  2. Program fetches and plays music from online sources
  3. Support control commands like “pause” or “next”

Note: Local caching feature requires sufficient disk space.

6.5 Voice Wake-up and Continuous Dialogue

  • Voice wake-up: Speak configured wake word to activate interaction (disabled by default)
  • Continuous dialogue: No need for repeated wake-up after enabling, suitable for extended interaction
  • Setup: Enable related options in config.json

6.6 Volume Control

Feature: Cross-platform volume adjustment for different scenarios.

Usage: Adjust through voice commands (e.g., “Increase volume”) or GUI interface.

7. Learning and Support Resources

7.1 Official Documentation

7.2 Video Tutorials

  • Bilibili Video
  • Provides visual installation and usage guidance, suitable for beginners

7.3 Community Support

  • GitHub Issues: Visit Issues page to submit problems or view FAQs
  • Community Discussion: Participate in GitHub Discussions or related forums for user experience sharing

8. Important Notes

  • Regular Updates: py-xiaozhi is an open-source project with frequent main branch updates. Run git pull to get the latest code, then reinstall dependencies
  • Network Dependency: Online features (music playback, AI interaction) require stable network
  • Debugging:
    • If voice recognition fails, check microphone settings and network connection
    • If GUI doesn’t display, ensure necessary graphics libraries (PyQt or Tkinter) are installed
  • Compatibility: Some features (e.g., IoT) may require specific hardware or protocol support; confirm device compatibility in advance
  • Current Time: This guide is based on information from May 20, 2025. Check the latest documentation for updates

9. Troubleshooting

Issue                         Possible Cause                                     Solution
Program won’t start           Missing dependencies or version incompatibility    Run pip install -r requirements.txt, check Python version
Voice recognition fails       Microphone not properly configured                 Check system audio settings, verify microphone availability
IoT device connection fails   Network or protocol configuration error            Refer to IoT documentation, check device IP and protocol
GUI doesn’t display           Missing graphics libraries or environment issues   Install PyQt/Tkinter, check display settings
Music playback fails          Unstable network or API configuration error        Check network, verify API key validity
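
When the GUI does not display, it can help to check which graphics libraries Python can locate. This sketch uses importlib.util.find_spec; the candidate module names are assumptions, since this guide does not state which PyQt version the project uses. A True result only means the module can be found, not that it imports cleanly.

```python
import importlib.util


def available_modules(candidates=("tkinter", "PyQt5", "PyQt6")):
    """Map each candidate module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in candidates}


for name, found in available_modules().items():
    print(f"{name}: {'found' if found else 'not found'}")
```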

For other issues, check GitHub Issues or submit new issues.
