Azure VM Performance for OCR
In fall 2015, Microsoft Azure released its Dv2-series Infrastructure as a Service (IaaS) virtual machines, “with compute processors that are approximately 35 percent faster than the D-Series VM sizes” (Microsoft). The Dv2-series is based on the latest-generation 2.4 GHz Intel Xeon E5-2673 v3 (Haswell) processor. But what is the best option for OCR?
Azure is great, and Amazon AWS is also great. But if you need access to Windows Server 2016 at this point in time, the Azure cloud is your only option, short of unsupported custom OS configurations at other providers. For example, there is no AWS EC2 Windows Server 2016 instance yet.
So what is the best Azure machine type for document processing with the Microsoft OCR library?
To find out, I created a test suite of 40 images of different size and difficulty, similar to the OCR benchmark test images. Here are the test results:
Azure VM Tier | Cores | Processing time (m:ss) |
---|---|---|
A0 | 1 | 3:56 |
A1 | 1 | 1:03 |
A2 | 2 | 1:02 |
A8 | 8 | 0:48 |
D1_v2 | 1 | 0:18 |
D2_v2 | 2 | 0:18 |
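
To put the table in perspective, here is a minimal Python sketch that converts each result to seconds, an average per image, and a speedup over the slowest tier. It assumes the times are minutes:seconds and that each run processed all 40 images; the names `TIMINGS`, `to_seconds`, and `summarize` are illustrative and not part of the original test suite.

```python
# Timings copied from the table; reading them as m:ss is an assumption.
TIMINGS = {
    "A0": "3:56",
    "A1": "1:03",
    "A2": "1:02",
    "A8": "0:48",
    "D1_v2": "0:18",
    "D2_v2": "0:18",
}
IMAGE_COUNT = 40  # size of the test suite described above


def to_seconds(mmss):
    """Convert an 'm:ss' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(timings, image_count):
    """Return {tier: (total_seconds, seconds_per_image, speedup_vs_slowest)}."""
    totals = {tier: to_seconds(value) for tier, value in timings.items()}
    slowest = max(totals.values())
    return {
        tier: (total, total / image_count, slowest / total)
        for tier, total in totals.items()
    }


if __name__ == "__main__":
    for tier, (total, per_image, speedup) in summarize(TIMINGS, IMAGE_COUNT).items():
        print(f"{tier:6s} {total:4d} s  {per_image:5.2f} s/image  {speedup:5.1f}x")
```

Under that reading, D1_v2 averages about 0.45 seconds per image, roughly 13 times faster than A0.
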
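With this data, the findings are pretty clear:
- The free Microsoft OCR DLL uses only one CPU core: doubling the core count (A1 to A2, D1_v2 to D2_v2) leaves the processing time essentially unchanged.
- The faster the CPU, the better: the Dv2 sizes finish in 0:18, well ahead of every A-series size.
- Memory does not matter: the larger size in each pair also comes with more memory, yet it is no faster.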
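
The single-core finding can be checked with a little arithmetic on the same-family pairs. The sketch below is only an illustration: it assumes the table's times are minutes:seconds (so 1:03 is 63 seconds), and `RESULTS` and `core_scaling` are hypothetical names. A workload that used every core would show a speedup near 2 and an efficiency near 1 when the core count doubles.

```python
# (tier, cores, total seconds); the seconds assume the table's times are m:ss.
RESULTS = [
    ("A1", 1, 63),
    ("A2", 2, 62),
    ("D1_v2", 1, 18),
    ("D2_v2", 2, 18),
]


def core_scaling(small, large):
    """Return (speedup, efficiency) when moving from `small` to `large`.

    speedup    = time on the smaller size / time on the larger size
    efficiency = speedup / factor by which the core count grew
    """
    _, small_cores, small_secs = small
    _, large_cores, large_secs = large
    speedup = small_secs / large_secs
    efficiency = speedup / (large_cores / small_cores)
    return speedup, efficiency


if __name__ == "__main__":
    by_tier = {row[0]: row for row in RESULTS}
    for small, large in [("A1", "A2"), ("D1_v2", "D2_v2")]:
        speedup, efficiency = core_scaling(by_tier[small], by_tier[large])
        print(f"{small} -> {large}: speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```

Both pairs come out at a speedup of about 1 and an efficiency of about 0.5, which is what you would expect if the second core sits idle.
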