MarkItDown: Microsoft AI-Powered Document Conversion Tool for PDF, Office Files and More

MarkItDown Tutorial Smart Document Conversion OCR Recognition AI Document Processing Multi-format Support Document Intelligence Knowledge Base Tools

Dec 15, 2024 2 min read

Cover image for MarkItDown: Microsoft AI-Powered Document Conversion Tool for PDF, Office Files and More

MarkItDown is a powerful document conversion tool open-sourced by Microsoft that can convert various file formats including PDF, Office documents, and images to Markdown format. It also supports integration with AI models for intelligent image processing. This article will detail how to install and use this tool.

Key Features

Support for multiple file format conversions:
- PDF files (.pdf)
- PowerPoint presentations (.pptx)
- Word documents (.docx)
- Excel spreadsheets (.xlsx)
- Images (with EXIF metadata and OCR support)
- Audio (with EXIF metadata and transcription)
- HTML (special handling for Wikipedia and more)
- Other text formats (csv, json, xml, etc.)
Integration with OpenAI and other AI models for intelligent descriptions
Simple and easy-to-use API
Batch file processing support

Quick Start

1. Installation

Install using pip:

pip install markitdown

Or install from source:

pip install -e .

2. Dependency Configuration

Before using image processing features, you need to install and configure the following dependencies:

ExifTool Setup:
- Download ExifTool from ExifTool website
- Add ExifTool to system environment variables
- ExifTool is used for extracting image metadata
EasyOCR Installation:
- Install using pip: pip install -U easyocr
- EasyOCR is used for text recognition in images
Multimodal LLM Configuration:
- Proper mlm_client configuration is required for AI image description
- Supports OpenAI and other multimodal models

Note: Image conversion requires all three components working together:

ExifTool for metadata extraction
EasyOCR for OCR recognition
Multimodal LLM for intelligent descriptions

3. Basic Usage

Simplest way to use:

from markitdown import MarkItDown

# Create MarkItDown instance
markitdown = MarkItDown()

# Convert file
result = markitdown.convert("test.xlsx")
print(result.text_content)

3. Using AI Models for Image Processing

Integrate OpenAI for image descriptions:

from markitdown import MarkItDown
from openai import OpenAI

# Configure OpenAI client
client = OpenAI()

# Create AI-enabled MarkItDown instance
md = MarkItDown(mlm_client=client, mlm_model="gpt-4")

# Convert image file
result = md.convert("example.jpg")
print(result.text_content)

Environment Variables

If you’re using OpenAI functionality, set the API key:

export OPENAI_API_KEY=your_key

Developer Guide

1. Running Tests

Run tests using:

hatch shell
hatch test

2. Running Code Checks

pre-commit run --all-files

Use Cases

Document Indexing and Retrieval
- Convert various document formats to Markdown for indexing
- Support full-text search
Content Analysis
- Extract document structure and content
- Perform text analysis and processing
AI-Enhanced Processing
- Generate image descriptions using AI models
- Intelligent document content recognition
Batch Document Processing
- Handle large-scale document conversion tasks
- Maintain format consistency