MarkItDown: Microsoft AI-Powered Document Conversion Tool for PDF, Office Files and More
MarkItDown is a powerful document conversion tool open-sourced by Microsoft that can convert various file formats including PDF, Office documents, and images to Markdown format. It also supports integration with AI models for intelligent image processing. This article will detail how to install and use this tool.
Key Features
- Support for multiple file format conversions:
- PDF files (.pdf)
- PowerPoint presentations (.pptx)
- Word documents (.docx)
- Excel spreadsheets (.xlsx)
- Images (with EXIF metadata and OCR support)
- Audio (with EXIF metadata and transcription)
- HTML (special handling for Wikipedia and more)
- Other text formats (csv, json, xml, etc.)
- Integration with OpenAI and other AI models for intelligent descriptions
- Simple and easy-to-use API
- Batch file processing support
Quick Start
1. Installation
Install using pip:
pip install markitdown
Or install from source:
pip install -e .
2. Dependency Configuration
Before using image processing features, you need to install and configure the following dependencies:
-
ExifTool Setup:
- Download ExifTool from ExifTool website
- Add ExifTool to system environment variables
- ExifTool is used for extracting image metadata
-
EasyOCR Installation:
- Install using pip:
pip install -U easyocr
- EasyOCR is used for text recognition in images
- Install using pip:
-
Multimodal LLM Configuration:
- Proper mlm_client configuration is required for AI image description
- Supports OpenAI and other multimodal models
Note: Image conversion requires all three components working together:
- ExifTool for metadata extraction
- EasyOCR for OCR recognition
- Multimodal LLM for intelligent descriptions
3. Basic Usage
Simplest way to use:
from markitdown import MarkItDown
# Create MarkItDown instance
markitdown = MarkItDown()
# Convert file
result = markitdown.convert("test.xlsx")
print(result.text_content)
3. Using AI Models for Image Processing
Integrate OpenAI for image descriptions:
from markitdown import MarkItDown
from openai import OpenAI
# Configure OpenAI client
client = OpenAI()
# Create AI-enabled MarkItDown instance
md = MarkItDown(mlm_client=client, mlm_model="gpt-4")
# Convert image file
result = md.convert("example.jpg")
print(result.text_content)
Environment Variables
If you’re using OpenAI functionality, set the API key:
export OPENAI_API_KEY=your_key
Developer Guide
1. Running Tests
Run tests using:
hatch shell
hatch test
2. Running Code Checks
pre-commit run --all-files
Use Cases
-
Document Indexing and Retrieval
- Convert various document formats to Markdown for indexing
- Support full-text search
-
Content Analysis
- Extract document structure and content
- Perform text analysis and processing
-
AI-Enhanced Processing
- Generate image descriptions using AI models
- Intelligent document content recognition
-
Batch Document Processing
- Handle large-scale document conversion tasks
- Maintain format consistency
Resources
Related Posts
No related posts yet