Microsoft updates free OCR library, adds four more languages

In February Microsoft announced that the Windows 10 November update enables OCR support for four new languages, bringing the total number of supported languages to 25.

While they did not list which languages were added, they mentioned that this is the same OCR technology that is “used in major products like Word, OneNote, OneDrive, Bing, Office Lens, and Translator for various scenarios, including image indexing, document reconstruction, and augmented reality.”

They also confirmed that “the same technology is released as part of Project Oxford … compared to the Windows.Media.Ocr namespace, the service has additional features such as language detection and text orientation detection.” Put positively, the Microsoft OCR DLL offers the same “Project Oxford”-level recognition quality for free.

So what **really** changed with the Windows 10 November update?

  • Four new languages added: Romanian, Serbian Cyrillic, Serbian Latin, Slovak
  • The recognition quality did not change. The OCR benchmark suite for English gives exactly the same result, down to the very same errors.

I am not a language expert by any means, but it seems all four newly added languages are rather similar. I hope they work on adding other major languages. If the feedback we get from our free OCR API is any indication, Arabic is especially missed.

So while the November update is not too impressive, the OCR library itself continues to be the best free OCR engine on the market.