Microsoft TRELLIS Tutorial: A Guide to 3D Generation from Images and Text


Jan 11, 2025


Introduction to TRELLIS

TRELLIS is a large-scale 3D asset generation model open-sourced by Microsoft that supports high-quality 3D content generation from text or images. It employs a structured 3D latent space approach to achieve scalable and versatile 3D generation.

Key Features:

Supports both image-to-3D and text-to-3D generation modes
Uses a structured 3D latent space for higher generation quality
Provides multiple 3D output representations (3D Gaussians, radiance fields, meshes)
Open source and easy to deploy
Supports export to standard 3D file formats like GLB/PLY

Online Demo

A live demo is hosted on Hugging Face Spaces.

Quick Start

System Requirements

CUDA-compatible NVIDIA GPU with at least 16 GB of VRAM (RTX 30/40 series recommended)
CUDA Toolkit 11.8 or 12.2
Python 3.8+
conda package manager
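
Before installing, it can help to confirm that the machine meets the requirements above. The following is a minimal sketch using only the Python standard library; the helper names python_ok and find_tools are illustrative and not part of TRELLIS, and finding a tool on PATH does not prove that its version matches.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version from the requirements list


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= tuple(minimum)


def find_tools(names=("nvidia-smi", "nvcc", "conda")):
    """Map each tool name to its location on PATH, or None when missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print("Python OK:", python_ok())
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found'}")
```

A missing nvcc usually means the CUDA Toolkit is not installed or not on PATH, which is worth resolving before running setup.sh.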

Installation Steps

Clone the repository:

git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git
cd TRELLIS

Create and activate a conda environment:

# Option 1 (CUDA 11.8): let setup.sh create the environment and install all dependencies
. ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast

# Option 2 (CUDA 12.2): create the environment manually, then install dependencies
conda create -n trellis python=3.10
conda activate trellis
pip install -r requirements.txt

Download pre-trained models:

Currently available pre-trained models include:

TRELLIS-image-large: Large image-to-3D model (1.2B parameters)
TRELLIS-text-base: Base text-to-3D model (342M parameters)
TRELLIS-text-large: Large text-to-3D model (1.1B parameters)
TRELLIS-text-xlarge: Extra-large text-to-3D model (2.0B parameters)

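The models are hosted on Hugging Face. Passing a repository name such as JeffreyXiang/TRELLIS-image-large to from_pretrained downloads the weights on first use.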

Usage Tutorial

Image-to-3D Example

import os
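# Choose the spconv algorithm before importing trellis.
# 'native' starts quickly; 'auto' benchmarks at startup and is faster afterwards.
os.environ['SPCONV_ALGO'] = 'native'  # Options: 'native' or 'auto'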

import imageio
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import render_utils, postprocessing_utils

# Load model
pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()

# Load input image
image = Image.open("input.png")

# Run generation
outputs = pipeline.run(
    image,
    seed=1,
    # Optional parameters
    # sparse_structure_sampler_params={
    #     "steps": 12,
    #     "cfg_strength": 7.5,
    # },
    # slat_sampler_params={
    #     "steps": 12,
    #     "cfg_strength": 3,
    # },
)

# Render preview video
video = render_utils.render_video(outputs['gaussian'][0])['color']
imageio.mimsave("preview_gs.mp4", video, fps=30)

# Export 3D file
glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0],
    outputs['mesh'][0],
    simplify=0.95,          # Fraction of triangles removed during simplification
    texture_size=1024,      # Texture resolution in pixels
)
glb.export("output.glb")

# Save the 3D Gaussians as a PLY file
outputs['gaussian'][0].save_ply("output.ply")
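
The text-to-3D mode listed under Key Features follows the same pattern. The sketch below assumes the TrellisTextTo3DPipeline class from trellis.pipelines and the repository name microsoft/TRELLIS-text-xlarge; neither appears in the example above, so check both against the version of the repository you installed. The wrapper function generate_from_text and the output file name are illustrative.

```python
def generate_from_text(prompt, seed=1,
                       model_id="microsoft/TRELLIS-text-xlarge",  # assumed repository name
                       glb_path="output_text.glb"):
    """Generate a 3D asset from a text prompt and export it as a GLB file."""
    import os
    # Same backend setting as the image example; set before importing trellis.
    os.environ.setdefault("SPCONV_ALGO", "native")

    # Imported lazily so that defining this function does not need a GPU setup.
    from trellis.pipelines import TrellisTextTo3DPipeline  # assumed class name
    from trellis.utils import postprocessing_utils

    pipeline = TrellisTextTo3DPipeline.from_pretrained(model_id)
    pipeline.cuda()

    # The prompt takes the place of the input image.
    outputs = pipeline.run(prompt, seed=seed)

    glb = postprocessing_utils.to_glb(
        outputs["gaussian"][0],
        outputs["mesh"][0],
        simplify=0.95,
        texture_size=1024,
    )
    glb.export(glb_path)
    return outputs


# Usage (requires the TRELLIS environment and a CUDA GPU):
# generate_from_text("A wooden chair with a curved back")
```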

Web Demo Interface

TRELLIS provides a Gradio-based web demo interface. Run the following commands to start:

# Install additional dependencies
. ./setup.sh --demo

# Start service
python app.py

After startup, Gradio prints a local URL in the terminal. Open it in your browser to use the web interface.

Best Practices

Input Image Recommendations:

Use clear images with moderate contrast
Ensure object contours are clearly visible
Avoid complex backgrounds and occlusions

Generation Parameter Tuning:

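Increase the sampling steps (the steps value in sparse_structure_sampler_params and slat_sampler_params) for better quality at the cost of longer generation time
Adjust cfg_strength (classifier-free guidance strength) to control how closely the result follows the input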
Try different random seeds
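
One way to apply these tuning tips systematically is a small sweep over seeds and step counts. The sketch below only builds keyword arguments for pipeline.run, using the parameter names from the image-to-3D example; the helper build_run_configs and the specific values swept are illustrative assumptions, not recommended settings.

```python
from itertools import product


def build_run_configs(seeds, steps_options,
                      structure_cfg=7.5, slat_cfg=3.0):
    """Return one pipeline.run keyword dict per (seed, steps) combination."""
    configs = []
    for seed, steps in product(seeds, steps_options):
        configs.append({
            "seed": seed,
            "sparse_structure_sampler_params": {
                "steps": steps,
                "cfg_strength": structure_cfg,
            },
            "slat_sampler_params": {
                "steps": steps,
                "cfg_strength": slat_cfg,
            },
        })
    return configs


configs = build_run_configs(seeds=[1, 2, 3], steps_options=[12, 25])
print(len(configs))  # 3 seeds x 2 step counts = 6 configurations

# With a loaded pipeline and image, each configuration would be used as:
# outputs = pipeline.run(image, **config)
```

Rendering a preview video for each configuration makes it easier to compare results side by side before committing to a full export.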

Performance Optimization:

Set SPCONV_ALGO to 'native' for one-off runs; 'auto' benchmarks algorithms at startup, which slows the first run but can speed up later ones
Reduce texture resolution to decrease VRAM usage
Adjust mesh simplification ratio as needed during export

Common Issues

Insufficient VRAM:

Reduce batch size
Use smaller model versions
Decrease sampling steps
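
Choosing a smaller model can be expressed as a simple lookup over the parameter counts listed earlier. This is a rough sketch: parameter count is only a proxy for VRAM use, the budget value is something you would pick for your own hardware, and the helper largest_model_within is illustrative.

```python
# Parameter counts of the text-to-3D models, as listed in this guide.
TEXT_MODELS = {
    "TRELLIS-text-base": 342_000_000,
    "TRELLIS-text-large": 1_100_000_000,
    "TRELLIS-text-xlarge": 2_000_000_000,
}


def largest_model_within(budget_params, models=TEXT_MODELS):
    """Return the largest model whose parameter count fits the budget, or None."""
    fitting = {name: size for name, size in models.items() if size <= budget_params}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


print(largest_model_within(1_500_000_000))  # TRELLIS-text-large
```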

Generation Quality Issues:

Check input image quality
Increase sampling steps
Adjust the cfg_strength parameter

Large Export Files:

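Increase the mesh simplification ratio (the simplify argument of to_glb)
Reduce the texture resolution (the texture_size argument of to_glb)
Choose an appropriate file format: GLB for a textured mesh, PLY for the raw 3D Gaussians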
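
The export trade-offs above can be captured as named presets for the simplify and texture_size arguments of to_glb. The preset names and the values other than the 0.95 / 1024 pair from the export example are hypothetical starting points, not tested recommendations.

```python
# Hypothetical presets; "balanced" uses the values from the export example.
EXPORT_PRESETS = {
    "quality": {"simplify": 0.90, "texture_size": 2048},
    "balanced": {"simplify": 0.95, "texture_size": 1024},
    "compact": {"simplify": 0.98, "texture_size": 512},
}


def export_kwargs(preset="balanced", **overrides):
    """Return to_glb keyword arguments for a preset, with optional overrides."""
    if preset not in EXPORT_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    kwargs = dict(EXPORT_PRESETS[preset])
    kwargs.update(overrides)
    # simplify is the fraction of triangles removed, so it must stay below 1.
    if not 0.0 < kwargs["simplify"] < 1.0:
        raise ValueError("simplify must be between 0 and 1 (exclusive)")
    return kwargs


print(export_kwargs("compact"))

# With generation outputs available, the preset would be applied as:
# glb = postprocessing_utils.to_glb(outputs['gaussian'][0], outputs['mesh'][0],
#                                   **export_kwargs("compact"))
```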
