MarkItDown: Microsoft AI-Powered Document Conversion Tool for PDF, Office Files and More

MarkItDown is an open-source document conversion tool from Microsoft that converts PDF files, Office documents, images, and other formats to Markdown. It can also call AI models to generate descriptions of images. This article explains how to install and use it.

Key Features

  • Support for multiple file format conversions:
    • PDF files (.pdf)
    • PowerPoint presentations (.pptx)
    • Word documents (.docx)
    • Excel spreadsheets (.xlsx)
    • Images (with EXIF metadata and OCR support)
    • Audio (with EXIF metadata and transcription)
    • HTML (special handling for Wikipedia and more)
    • Other text formats (csv, json, xml, etc.)
  • Integration with OpenAI and other AI models for intelligent descriptions
  • Simple and easy-to-use API
  • Batch file processing support

Quick Start

1. Installation

Install using pip:

   pip install markitdown

Or install from source (run this from the root of a cloned copy of the repository):

   pip install -e .

2. Dependency Configuration

Before using image processing features, you need to install and configure the following dependencies:

  1. ExifTool Setup:

    • Download ExifTool from the ExifTool website
    • Add the ExifTool directory to your system PATH so the exiftool command can be found
    • ExifTool is used for extracting image metadata
  2. EasyOCR Installation:

    • Install using pip: pip install -U easyocr
    • EasyOCR is used for text recognition in images
  3. Multimodal LLM Configuration:

    • Proper mlm_client configuration is required for AI image description
    • Supports OpenAI and other multimodal models

Note: Image conversion requires all three components working together:

  • ExifTool for metadata extraction
  • EasyOCR for OCR recognition
  • Multimodal LLM for intelligent descriptions
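Before converting images, it can help to confirm that the first two dependencies are actually visible to Python. The following is a minimal sketch, not part of MarkItDown itself: the function name check_image_dependencies is hypothetical, and it assumes ExifTool is installed under the command name exiftool.

```python
import importlib.util
import shutil


def check_image_dependencies():
    """Report which optional image-processing dependencies are available."""
    return {
        # ExifTool is a command-line program, so look for it on PATH.
        # Assumption: the executable is named "exiftool".
        "exiftool": shutil.which("exiftool") is not None,
        # EasyOCR is a Python package; find_spec checks without importing it.
        "easyocr": importlib.util.find_spec("easyocr") is not None,
    }


if __name__ == "__main__":
    for name, available in check_image_dependencies().items():
        print(f"{name}: {'found' if available else 'missing'}")
```

If either entry is reported as missing, revisit the matching setup step above before running an image conversion.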

3. Basic Usage

The simplest usage:

   from markitdown import MarkItDown

   # Create MarkItDown instance
   markitdown = MarkItDown()

   # Convert file
   result = markitdown.convert("test.xlsx")
   print(result.text_content)
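The same call can be applied to a whole folder for batch processing. The sketch below is one possible approach rather than a built-in MarkItDown feature: convert_folder, SUPPORTED, and the folder names are hypothetical, and the converter is passed in as a plain callable so the loop does not depend on any particular MarkItDown version.

```python
from pathlib import Path

# Hypothetical filter: a subset of the extensions listed under Key Features.
SUPPORTED = {".pdf", ".pptx", ".docx", ".xlsx"}


def convert_folder(src_dir, out_dir, convert):
    """Convert every supported file in src_dir and write Markdown to out_dir.

    `convert` is any callable that takes a path string and returns Markdown text.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED:
            continue
        # Keep the original extension in the name so report.pdf and
        # report.docx do not overwrite each other.
        target = out / (path.name + ".md")
        target.write_text(convert(str(path)), encoding="utf-8")
        written.append(target)
    return written


# Possible use with MarkItDown (assumes the package is installed):
#   md = MarkItDown()
#   convert_folder("docs", "markdown_out", lambda p: md.convert(p).text_content)
```

Passing the converter as an argument also makes it easy to add error handling or logging around each file without changing the loop.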

4. Using AI Models for Image Processing

Integrate OpenAI for image descriptions:

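   from markitdown import MarkItDown
   from openai import OpenAI

   # Configure OpenAI client (reads OPENAI_API_KEY from the environment)
   client = OpenAI()

   # Create AI-enabled MarkItDown instance.
   # The model must accept image input, e.g. gpt-4o.
   # Note: newer MarkItDown releases name these parameters llm_client and llm_model.
   md = MarkItDown(mlm_client=client, mlm_model="gpt-4o")

   # Convert image file
   result = md.convert("example.jpg")
   print(result.text_content)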

Environment Variables

If you’re using OpenAI functionality, set the API key:

   export OPENAI_API_KEY=your_key
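A script can check for the key up front and fail with a clear message instead of an authentication error partway through a conversion. This is a small sketch; the helper name require_api_key is hypothetical and not part of MarkItDown or the OpenAI client.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key, or raise a clear error if it is unset or empty."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before using AI features")
    return key
```

Calling require_api_key() once at startup, before creating the OpenAI client, is usually enough.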

Developer Guide

1. Running Tests

Run tests using:

   hatch shell
   hatch test

2. Running Code Checks

   pre-commit run --all-files

Use Cases

  1. Document Indexing and Retrieval

    • Convert various document formats to Markdown for indexing
    • Support full-text search
  2. Content Analysis

    • Extract document structure and content
    • Perform text analysis and processing
  3. AI-Enhanced Processing

    • Generate image descriptions using AI models
    • Intelligent document content recognition
  4. Batch Document Processing

    • Handle large-scale document conversion tasks
    • Maintain format consistency
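To make the indexing and retrieval use case concrete, here is a toy full-text index over already converted Markdown. It is an illustrative sketch only: build_index and search are hypothetical helpers, the tokenizer handles only ASCII letters and digits, and a real deployment would more likely hand the Markdown to a dedicated search engine.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, markdown in docs.items():
        for word in re.findall(r"[a-z0-9]+", markdown.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))
```

Here docs is a dictionary from a document name to its Markdown text, for example the text_content returned by each conversion.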
