Mistral AI has introduced OCR 4, a new model that extracts text from documents such as PDFs, Word documents, and PowerPoint presentations.
Unlike previous versions, OCR 4 not only provides the plain text but also indicates where each element is located on the page and what role it plays—for example, whether it is a title, a table, an equation, or a signature. This so-called block classification helps automatically structure documents in a meaningful way so they can, for example, be fed into search systems or processed by AI agents. Additionally, the model outputs confidence scores—an estimate of how certain it is about the recognition of individual words or pages.
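To make the idea concrete, here is a minimal sketch of how a downstream application might use block types and confidence scores. The `Block` class and its field names (`kind`, `text`, `bbox`, `confidence`) are hypothetical and chosen for illustration only; they are not Mistral's actual response schema, and the 0.9 threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    # Hypothetical structure: field names are illustrative, not Mistral's schema.
    kind: str                                # e.g. "title", "table", "equation", "signature"
    text: str                                # recognized text of the block
    bbox: Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1) position on the page
    confidence: float                        # assumed range: 0.0 (unsure) to 1.0 (certain)


def split_by_confidence(blocks: List[Block], threshold: float = 0.9):
    """Separate blocks that can be used directly from those needing human review."""
    trusted = [b for b in blocks if b.confidence >= threshold]
    needs_review = [b for b in blocks if b.confidence < threshold]
    return trusted, needs_review


def texts_of_kind(blocks: List[Block], kind: str) -> List[str]:
    """Collect the text of all blocks with a given role, e.g. every title."""
    return [b.text for b in blocks if b.kind == kind]


# Made-up sample data standing in for one page of OCR output.
sample_page = [
    Block("title", "Annual Report", (50, 40, 550, 80), 0.99),
    Block("table", "Revenue | 2023 | 2024", (50, 120, 550, 300), 0.95),
    Block("signature", "J. Doe", (400, 700, 550, 740), 0.62),
]

trusted, needs_review = split_by_confidence(sample_page)
page_titles = texts_of_kind(sample_page, "title")
```

In this sketch, the low-confidence signature block would be routed to manual review, while the title and table could go straight into a search index.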
OCR 4 supports 170 languages and, according to Mistral, performs well even with less commonly spoken languages. In a blind test involving over 600 documents, independent evaluators preferred its results over those of the competition in 72 percent of cases, Mistral reports. The model is available via the API, Mistral Studio, and Microsoft Foundry and costs $4 per 1,000 pages, or $2 in batch mode.
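As a rough illustration of the pricing, the following sketch estimates the cost of a job from the two rates quoted above. It assumes that billing is strictly proportional to the page count, with no minimum charge or rounding; the article does not confirm this, and the function name is made up for the example.

```python
# Rates as reported in the article, in US dollars per 1,000 pages.
STANDARD_RATE_PER_1000 = 4.0
BATCH_RATE_PER_1000 = 2.0


def estimate_cost(pages: int, batch: bool = False) -> float:
    """Estimate the cost in dollars, assuming strictly pro-rata billing per page."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = BATCH_RATE_PER_1000 if batch else STANDARD_RATE_PER_1000
    return pages * rate / 1000


standard_cost = estimate_cost(25_000)            # 25,000 pages at $4 per 1,000
batch_cost = estimate_cost(25_000, batch=True)   # same job at $2 per 1,000
```

Under these assumptions, a 25,000-page archive would cost $100 in standard mode and $50 in batch mode.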
Mistral AI is an independent French startup and is not owned by any single large corporation. Control and ownership are shared among the founders, a French consortium, and international investors:
Founders: Arthur Mensch (CEO), Guillaume Lample, and Timothée Lacroix continue to hold significant stakes in the company they founded in 2023.
Financial investors: Key investors across several funding rounds include Andreessen Horowitz, Lightspeed Venture Partners, General Catalyst, Bpifrance (the French development bank), and the MGX fund from Abu Dhabi.
Strategic partners: The Dutch chip equipment manufacturer ASML and the U.S. hardware manufacturer Nvidia also hold stakes in Mistral.
Note: Microsoft also holds a small stake in Mistral, but according to official figures it amounts to less than 1% of the company's shares.

