IBM Research's SmolDocling, a 256M-parameter vision-language model, performs document OCR and multimodal processing at 0.35 s per page on consumer GPUs, handling text, formulas, code, and charts.
A detailed tutorial on using MinerU, covering both the online demo and local deployment. MinerU extracts text, images, tables, and mathematical formulas from PDF documents, making it suitable for academic research, data analysis, and similar work.