Azure VM Performance for OCR

In Fall 2015, Microsoft Azure released its Dv2-series Infrastructure as a Service (IaaS) VMs “with compute processors that are approximately 35 percent faster than the D-Series VM sizes” (Microsoft). The Dv2-series is based on the latest-generation 2.4 GHz Intel Xeon E5-2673 v3 (Haswell) processor. But what is the best option for OCR?

Azure and Amazon AWS are both great. But if you need Windows 2016 servers right now, Azure is your only option short of unsupported custom OS configurations at other providers. For example, there is no AWS EC2 Windows 2016 Server instance yet.

So what is the best Azure machine type for document processing with the Microsoft OCR library?

To find out, I created a test suite of 40 images of different sizes and difficulty levels, similar to the OCR benchmark test images. Here are the test results:

Azure VM Tier    Cores    Processing time (min:sec)
A0               1        3:56
A1               1        1:03
A2               2        1:02
A8               8        0:48
D1_v2            1        0:18
D2_v2            2        0:18

With this data, the findings are pretty obvious:

  • The free Microsoft OCR DLL uses only one CPU core: a second core barely changes the time (A1 vs. A2, D1_v2 vs. D2_v2).
  • The faster the CPU, the better.
  • Memory does not matter.
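
The single-core finding can be checked directly against the table. The snippet below is only a small sketch: it assumes the times are minutes:seconds, and the names `RESULTS`, `to_seconds` and `speedup` are made up for this illustration.

```python
# Measured times from the results table, assumed to be minutes:seconds.
# Each entry maps a VM tier to (cores, processing time).
RESULTS = {
    "A0": (1, "3:56"),
    "A1": (1, "1:03"),
    "A2": (2, "1:02"),
    "A8": (8, "0:48"),
    "D1_v2": (1, "0:18"),
    "D2_v2": (2, "0:18"),
}


def to_seconds(mm_ss: str) -> int:
    """Convert a 'm:ss' string to a number of seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(slower: str, faster: str) -> float:
    """How many times faster the second tier ran than the first."""
    return to_seconds(RESULTS[slower][1]) / to_seconds(RESULTS[faster][1])


if __name__ == "__main__":
    # Compare tiers that differ mainly in core count.
    for small, large in [("A1", "A2"), ("D1_v2", "D2_v2"), ("A1", "A8")]:
        cores_small = RESULTS[small][0]
        cores_large = RESULTS[large][0]
        print(
            f"{small} ({cores_small} core) -> {large} ({cores_large} cores): "
            f"{speedup(small, large):.2f}x"
        )
```

If the OCR library used all cores, the A1 to A2 and D1_v2 to D2_v2 steps would come out near 2x. With the numbers above they stay close to 1x, and even the 8-core A8 remains well under 2x.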

The best deal is D1_v2 ($0.14/hr, ~$104/mo). It costs about twice as much as A1 (currently $0.077/hr, ~$57/mo) but is about 3.5 times as fast (0:18 versus 1:03), so it is currently the best option.
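
To make the price/performance comparison concrete, here is a rough calculation of what one run of the 40-image test suite costs on each tier. It is a sketch under two assumptions: the times are minutes:seconds, and cost scales linearly with the time the VM is busy (actual billing granularity may differ). The names `HOURLY_PRICE_USD`, `RUN_SECONDS` and `cost_per_run` are hypothetical.

```python
# Hourly prices and measured run times quoted in this post.
HOURLY_PRICE_USD = {"A1": 0.077, "D1_v2": 0.14}
RUN_SECONDS = {"A1": 63, "D1_v2": 18}  # 1:03 and 0:18, assumed min:sec


def cost_per_run(tier: str) -> float:
    """Cost in USD of one test-suite run, assuming cost is linear in time."""
    return HOURLY_PRICE_USD[tier] * RUN_SECONDS[tier] / 3600


if __name__ == "__main__":
    price_ratio = HOURLY_PRICE_USD["D1_v2"] / HOURLY_PRICE_USD["A1"]
    speed_ratio = RUN_SECONDS["A1"] / RUN_SECONDS["D1_v2"]
    print(f"D1_v2 price vs. A1: {price_ratio:.2f}x")
    print(f"D1_v2 speed vs. A1: {speed_ratio:.2f}x")
    for tier in HOURLY_PRICE_USD:
        print(f"{tier}: ${cost_per_run(tier):.5f} per 40-image run")
```

Under these assumptions, D1_v2 costs roughly half as much per processed batch as A1, even though its hourly rate is higher.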

All tests were performed on Windows Server 2016 (Technical Preview) with our open-source [Free OCR Software](http://blog.a9t9.com/p/free-ocr-software.html), which uses the Microsoft OCR library internally.