
Mistral Launches OCR 4: Self-Hosted Document AI in 170 Languages

Summary: Mistral AI released OCR 4 on June 23, a document intelligence model covering 170 languages that extracts paragraph-level structured text — bounding boxes included — and ships as a single self-hosted container, keeping sensitive documents entirely on-premises.

Key Facts

  • Supports 170 languages with paragraph-level bounding boxes alongside extracted text — useful for multi-column PDFs, forms, and scanned documents
  • Deploys as a single container on private infrastructure; no data leaves the enterprise environment
  • Outputs are citation-ready structured JSON, designed to plug directly into RAG pipelines, agentic workflows, and enterprise search systems
  • Targets regulated sectors — finance, healthcare, legal — where sending documents to third-party cloud APIs is restricted or prohibited
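To make the "citation-ready structured JSON" point concrete, here is a minimal sketch of how paragraph-level output with bounding boxes might be turned into chunks for a RAG index. The article does not show OCR 4's actual schema, so every field name (`pages`, `index`, `paragraphs`, `text`, `bbox`), the sample payload, and the `Chunk` and `to_chunks` helpers are assumptions made for illustration only.

```python
import json
from dataclasses import dataclass

# Hypothetical payload: these field names are illustrative assumptions,
# not the documented output schema of Mistral OCR 4.
SAMPLE_OUTPUT = """
{
  "pages": [
    {
      "index": 0,
      "paragraphs": [
        {"text": "Invoice total: 1,200 EUR", "bbox": [72, 90, 310, 112]},
        {"text": "Payment due within 30 days.", "bbox": [72, 120, 340, 142]}
      ]
    }
  ]
}
"""


@dataclass
class Chunk:
    """One paragraph plus the location needed to cite it."""
    text: str
    page: int    # zero-based page index (assumed)
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates (assumed)

    def citation(self) -> str:
        # Human-readable pointer back to the source region.
        return f"p.{self.page + 1} @ {self.bbox}"


def to_chunks(raw: str) -> list[Chunk]:
    """Flatten the assumed page/paragraph structure into retrievable chunks."""
    doc = json.loads(raw)
    chunks = []
    for page in doc["pages"]:
        for para in page["paragraphs"]:
            chunks.append(Chunk(para["text"], page["index"], tuple(para["bbox"])))
    return chunks


chunks = to_chunks(SAMPLE_OUTPUT)
for chunk in chunks:
    print(chunk.citation(), "->", chunk.text)
```

Keeping the page index and bounding box on each chunk is what lets a downstream answer point back to the exact region of the source document, rather than only to the file.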

Why It Matters

Cloud OCR APIs from Google, Microsoft, and AWS require documents to leave the enterprise network, which is a non-starter for many regulated industries. Mistral OCR 4 offers comparable extraction quality in a fully air-gapped deployment. It is Mistral's clearest play yet for the enterprise segment that benefits most from open, self-hostable models, and it signals that the company is building a full document-intelligence stack, not just language models.
