
ClearerVoice-Studio: A One-Stop Solution for Speech Enhancement, Speech Denoising, Speech Separation and Speaker Extraction


Introduction

ClearerVoice-Studio is a unified inference platform focusing on Speech Enhancement, Speech Separation, and Audio-Visual Target Speaker Extraction. This tutorial walks through installation, the Python API for each task, batch processing, and the published benchmark results for the enhancement models.

Supported Pre-trained Models

The platform currently offers the following pre-trained models:

Speech Enhancement (16kHz & 48kHz)

  • MossFormer2_SE_48K
  • FRCRN_SE_16K
  • MossFormerGAN_SE_16K

Speech Separation (16kHz)

  • MossFormer2_SS_16K

Audio-Visual Target Speaker Extraction (16kHz)

  • AV_MossFormer2_TSE_16K

All models are hosted on HuggingFace and will be automatically downloaded when needed.

Environment Setup

1. Clone the Repository

```bash
git clone https://github.com/modelscope/ClearerVoice-Studio.git
```

2. Create Conda Environment

```bash
cd ClearerVoice-Studio
conda create -n ClearerVoice-Studio python=3.8
conda activate ClearerVoice-Studio
pip install -r requirements.txt
```

Usage Tutorial

1. Speech Enhancement Example

```python
import os

from clearvoice import ClearVoice

# Initialize the speech enhancement model
cv_se = ClearVoice(
    task='speech_enhancement',
    model_names=['MossFormer2_SE_48K']
)

# Process a single audio file; with online_write=False the result is returned
input_path = 'samples/noisy.wav'
output_wav = cv_se(
    input_path=input_path,
    online_write=False
)

# Save the enhanced audio
output_dir = 'samples/enhanced'
os.makedirs(output_dir, exist_ok=True)
output_path = os.path.join(output_dir, 'enhanced.wav')
cv_se.write(output_wav, output_path=output_path)
```

2. Speech Separation Example

```python
import os

from clearvoice import ClearVoice

# Initialize the speech separation model
cv_ss = ClearVoice(
    task='speech_separation',
    model_names=['MossFormer2_SS_16K']
)

# Process a mixed-speech file
input_path = 'samples/mixed.wav'
output_dir = 'samples/separated'
os.makedirs(output_dir, exist_ok=True)

# Separate the speakers; online_write=True saves the results under output_dir
cv_ss(
    input_path=input_path,
    online_write=True,
    output_path=output_dir
)

# Expect one output file per separated speaker, for example
# (exact file names and subfolders may differ between versions):
# - output_MossFormer2_SS_16K_spk1.wav
# - output_MossFormer2_SS_16K_spk2.wav
```

3. Target Speaker Extraction Example

```python
import os

from clearvoice import ClearVoice

# Initialize the audio-visual target speaker extraction model
cv_tse = ClearVoice(
    task='target_speaker_extraction',
    model_names=['AV_MossFormer2_TSE_16K']
)

# Process a video file (the model uses both the audio and the visual stream)
input_path = 'samples/video.mp4'
output_dir = 'samples/extracted'
os.makedirs(output_dir, exist_ok=True)

# Extract the target speaker's voice; results are saved under output_dir
cv_tse(
    input_path=input_path,
    online_write=True,
    output_path=output_dir
)

# Expected outputs (exact names and layout may differ between versions):
# - extracted_speech.wav (target speaker's voice)
# - background.wav (background audio)
```

Batch Processing Example

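```python
import os

from clearvoice import ClearVoice

# Model used for each task (names from the supported-models list)
TASK_MODELS = {
    'speech_enhancement': 'MossFormer2_SE_48K',
    'speech_separation': 'MossFormer2_SS_16K',
    'target_speaker_extraction': 'AV_MossFormer2_TSE_16K',
}

# Assumption: audio tasks read .wav files and target speaker extraction
# reads video; extend these tuples for other formats you need
TASK_EXTENSIONS = {
    'speech_enhancement': ('.wav',),
    'speech_separation': ('.wav',),
    'target_speaker_extraction': ('.mp4', '.avi'),
}


def process_directory(input_dir, output_dir, task='speech_enhancement'):
    # Initialize the model for the requested task
    cv = ClearVoice(task=task, model_names=[TASK_MODELS[task]])

    # Ensure the output directory exists
    os.makedirs(output_dir, exist_ok=True)

    # Collect input files whose extension suits this task
    input_files = sorted(
        f for f in os.listdir(input_dir)
        if f.lower().endswith(TASK_EXTENSIONS[task])
    )

    # Process the files one at a time, writing each result to output_dir
    for input_file in input_files:
        cv(
            input_path=os.path.join(input_dir, input_file),
            online_write=True,
            output_path=output_dir
        )
        print(f"Processed: {input_file}")


# Usage example
process_directory(
    input_dir='samples/input',
    output_dir='samples/output',
    task='speech_enhancement'
)
```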

Advanced Usage: Progress Monitoring

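```python
import os

import tqdm  # pip install tqdm if it is not already installed
from clearvoice import ClearVoice


def process_with_progress(input_files, task, model_names, output_dir='samples/output'):
    # Pass model_names explicitly, as in the other examples
    cv = ClearVoice(task=task, model_names=model_names)
    os.makedirs(output_dir, exist_ok=True)

    # tqdm wraps the iterable and draws a progress bar
    for file in tqdm.tqdm(input_files, desc=f"Processing {task}"):
        try:
            cv(
                input_path=file,
                online_write=True,
                output_path=output_dir
            )
        except Exception as e:
            # Report the failure and continue with the next file
            print(f"Error processing {file}: {e}")
```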
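The progress loop above prints each error and moves on. To retry the failures afterwards, you can collect them instead of only printing them. Below is a minimal sketch of that bookkeeping. `run_and_collect_failures` is a hypothetical helper, not part of ClearerVoice-Studio. The ClearVoice call is passed in as a plain callable, so you can check the logic without loading a model.

```python
from typing import Callable, Iterable, List, Tuple


def run_and_collect_failures(
    files: Iterable[str],
    process_fn: Callable[[str], object],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Call process_fn on each file and return (succeeded, failed).

    Hypothetical helper: process_fn stands in for a ClearVoice call, e.g.
    lambda f: cv(input_path=f, online_write=True, output_path=out_dir).
    """
    succeeded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for path in files:
        try:
            process_fn(path)
        except Exception as exc:
            # Record the error message and keep going
            failed.append((path, str(exc)))
        else:
            succeeded.append(path)
    return succeeded, failed


if __name__ == "__main__":
    # Fake processor that fails on one file, to show the bookkeeping only
    def fake_process(path: str) -> None:
        if path == "broken.wav":
            raise RuntimeError("could not decode audio")

    ok, failed = run_and_collect_failures(["a.wav", "broken.wav", "b.wav"], fake_process)
    print(ok)      # ['a.wav', 'b.wav']
    print(failed)  # [('broken.wav', 'could not decode audio')]
```

With a real model, pass something like `lambda f: cv(input_path=f, online_write=True, output_path=out_dir)` as `process_fn`. Then feed the file names from `failed` back in for a second attempt.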

Parameter Description

  • task: the processing task to run
    • speech_enhancement: Speech Enhancement
    • speech_separation: Speech Separation
    • target_speaker_extraction: Target Speaker Extraction
  • model_names: list of model names; one or more models can be selected
  • input_path: input path; accepts a single file, a directory, or a list file (.scp)
  • online_write: if True, results are written to output_path during processing; if False, the output is returned and can be saved with write()
  • output_path: output path, either a file or a directory
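`input_path` also accepts `.scp` list files. Here is a minimal sketch that generates one from a directory. It assumes the list format is one audio path per line. That layout, the helper name `write_scp`, and the `.wav`-only default are all assumptions rather than documented behaviour, so compare the result against the project's own examples before relying on it.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_scp(input_dir: str, scp_path: str, exts: Iterable[str] = (".wav",)) -> List[str]:
    """Write matching files in input_dir to scp_path, one absolute path per line.

    Hypothetical helper; the one-path-per-line layout is an assumption.
    Returns the paths written, sorted for a stable processing order.
    """
    wanted = tuple(e.lower() for e in exts)
    paths = sorted(
        str(p.resolve())
        for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )
    Path(scp_path).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return paths


if __name__ == "__main__":
    # Build a throwaway directory with two WAV names and one unrelated file
    demo_dir = tempfile.mkdtemp()
    for name in ("b.wav", "a.wav", "notes.txt"):
        Path(demo_dir, name).touch()
    scp_file = os.path.join(demo_dir, "inputs.scp")
    for line in write_scp(demo_dir, scp_file):
        print(os.path.basename(line))  # a.wav, then b.wav
```

You can then pass the resulting file as `input_path`, following the call pattern used in the examples above, e.g. `cv(input_path='samples/inputs.scp', online_write=True, output_path='samples/output')`.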

Performance Evaluation

Higher is better for every metric below (PESQ, STOI, SSNR, P808_MOS).

VoiceBank+DEMAND Test Set (16kHz) Performance Comparison

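| Model | PESQ | STOI | SSNR (dB) | P808_MOS |
|---|---|---|---|---|
| Noisy Audio | 1.97 | 0.92 | 6.13 | 3.05 |
| FRCRN_SE_16K | 3.23 | 0.95 | 7.60 | 3.59 |
| MossFormerGAN_SE_16K | 3.47 | 0.96 | 9.09 | 3.57 |
| MossFormer2_SE_48K | 3.16 | 0.95 | 6.86 | 3.53 |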

DNS-Challenge-2020 Test Set Performance Comparison

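| Model | PESQ | STOI | SSNR (dB) | P808_MOS |
|---|---|---|---|---|
| Noisy Audio | 1.58 | 0.91 | 9.35 | 3.15 |
| FRCRN_SE_16K | 3.24 | 0.98 | 7.60 | 4.03 |
| MossFormerGAN_SE_16K | 3.57 | 0.98 | 14.03 | 4.05 |
| MossFormer2_SE_48K | 2.94 | 0.97 | 11.86 | 3.92 |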
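The two tables are easier to compare as improvements over the unprocessed input. The sketch below copies the reported scores into Python dictionaries, computes each model's gain over Noisy Audio, and picks the top model per metric. The dictionary and function names are illustrative only and are not part of ClearerVoice-Studio.

```python
from typing import Dict, Tuple

Scores = Tuple[float, float, float, float]

# Scores copied from the two tables above: (PESQ, STOI, SSNR in dB, P808_MOS)
METRICS = ("PESQ", "STOI", "SSNR", "P808_MOS")
VOICEBANK_DEMAND: Dict[str, Scores] = {
    "Noisy Audio": (1.97, 0.92, 6.13, 3.05),
    "FRCRN_SE_16K": (3.23, 0.95, 7.60, 3.59),
    "MossFormerGAN_SE_16K": (3.47, 0.96, 9.09, 3.57),
    "MossFormer2_SE_48K": (3.16, 0.95, 6.86, 3.53),
}
DNS_2020: Dict[str, Scores] = {
    "Noisy Audio": (1.58, 0.91, 9.35, 3.15),
    "FRCRN_SE_16K": (3.24, 0.98, 7.60, 4.03),
    "MossFormerGAN_SE_16K": (3.57, 0.98, 14.03, 4.05),
    "MossFormer2_SE_48K": (2.94, 0.97, 11.86, 3.92),
}


def gains_over_noisy(table: Dict[str, Scores]) -> Dict[str, Tuple[float, ...]]:
    """Per-model difference from the Noisy Audio row, rounded to 2 decimals."""
    base = table["Noisy Audio"]
    return {
        model: tuple(round(s - b, 2) for s, b in zip(scores, base))
        for model, scores in table.items()
        if model != "Noisy Audio"
    }


def best_per_metric(table: Dict[str, Scores]) -> Dict[str, str]:
    """Highest-scoring model per metric; ties go to the first model listed."""
    models = [m for m in table if m != "Noisy Audio"]
    return {
        metric: max(models, key=lambda m, i=i: table[m][i])
        for i, metric in enumerate(METRICS)
    }


if __name__ == "__main__":
    for name, table in (("VoiceBank+DEMAND", VOICEBANK_DEMAND), ("DNS-2020", DNS_2020)):
        print(name, best_per_metric(table))
        print(name, gains_over_noisy(table))
```

Two details stand out once the numbers are laid out this way. DNS-2020 STOI is a tie at 0.98 between FRCRN_SE_16K and MossFormerGAN_SE_16K, so `max` simply reports the first one listed. FRCRN_SE_16K's SSNR on DNS-2020 (7.60 dB) is below the noisy input's 9.35 dB, which appears as a negative gain.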

Best Practices

  1. Model Selection:

    • For 48kHz high-fidelity audio, prefer MossFormer2_SE_48K
    • For 16kHz audio, prefer MossFormerGAN_SE_16K, which has the highest PESQ and SSNR in both tables above; FRCRN_SE_16K is the alternative
  2. Batch Processing Optimization:

    • Use online_write=True when processing many audio files, so each result is written to output_path instead of being returned
    • Use .scp list files to manage batch processing tasks
  3. Performance Considerations:

    • Balance audio quality and processing speed based on actual needs
    • Size each run (number and length of input files) to the memory available on your hardware
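Choosing a model by sample rate requires knowing the input's rate. Below is a minimal sketch that uses Python's standard `wave` module to read the rate from a PCM WAV header and map it to one of the enhancement models listed above. The helper names are hypothetical, and the rate-to-model mapping is this sketch's own rule rather than library behaviour. Returning `None` for other rates (for example 44.1kHz) is a deliberately conservative choice: resample first, or check how your version of ClearerVoice-Studio handles such input.

```python
import os
import tempfile
import wave
from typing import Optional

# Model names come from the supported-models list; mapping a sample rate
# to a model is this sketch's own rule, not library behaviour.
SE_MODEL_BY_RATE = {
    48000: "MossFormer2_SE_48K",
    16000: "MossFormerGAN_SE_16K",
}


def wav_sample_rate(path: str) -> int:
    """Read the sample rate from a PCM WAV file header (standard library only)."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate()


def suggest_se_model(path: str) -> Optional[str]:
    """Return an enhancement model matching the file's rate, or None if none matches."""
    return SE_MODEL_BY_RATE.get(wav_sample_rate(path))


def write_silent_wav(path: str, rate: int, seconds: float = 0.1) -> None:
    """Write a short mono 16-bit silent WAV, used here only for the demo."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(rate * seconds))


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    for rate in (16000, 44100, 48000):
        path = os.path.join(demo_dir, f"silence_{rate}.wav")
        write_silent_wav(path, rate)
        print(rate, suggest_se_model(path))
```

The `wave` module only parses WAV containers, so compressed formats would need a different reader.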

Conclusion

ClearerVoice-Studio provides a single Python interface for speech enhancement, speech separation, and audio-visual target speaker extraction. With this tutorial you should be able to run each task, process files in batches, and choose a model based on your audio's sample rate and the benchmark results. As the project continues to evolve, we look forward to more pre-trained models and features being added.
