The Best Online OCR Software for Converting Images to Text

Reading robots... What is the best free online OCR tool?

Update May 1, 2015: (a9t9) launched its very own free and open-source Online OCR service - try it out and let us know how it compares.

What is the best free optical character recognition (OCR) service for converting text in images to plain, editable text? This review compares the recognition accuracy of free and commercial cloud OCR offerings. "No-name OCR beats Google Docs OCR" is just one of the surprising test results.

Usability, speed, formatting, and non-English language support are not rated. This market overview is all about finding the best online service to convert images to plain text.

Six documents of gradually increasing difficulty serve as the OCR benchmark: a screenshot, two scans, a mobile phone camera picture and, as highlights, the text of an XKCD comic and a reading from the image of a gas meter. The test results were not what you would expect. Read more about the five OCR surprises.

OCR Test 1: Recognize text in screenshot

A New York Times article served as the benchmark for OCR recognition quality in tests 1-4.
You can find the original input images by clicking on the preview image in the table headers, or see the reference section below.

The first four tests use the New York Times article as input. In the first round, the OCR input is a screenshot of the article, so the image quality cannot get any better. And as the newspaper's name suggests, the font is straightforward: Times New Roman.

Test 1: Screenshot OCR

OCR Service	Result	Output (Excerpt)
Abbyy Cloud SDK	100%	Internet start-up here
Google Docs OCR	100%	Internet start-up here
OnlineOCR	Good	Internet start-up here
i2 OCR	Good	Internet start-up here
FreeOnlineOCR	Good	Internet start-up here
Tesseract Online	Fail	Intemel SK-3!’l—.\p hereiii
Ocrad.js	Fail	IIllelnel _hl _p he_P

The purpose of this first easy task was to weed out any online OCR services that essentially do not work. Two services flunked this test: Ocrad.js and Tesseract.

What? Tesseract???

Surprise #1: Yes, Tesseract flunked this test. That is a huge surprise. According to Google, “Tesseract is probably the most accurate open source OCR engine available.” This makes me feel a bit like an elementary school teacher who has to give bad grades to the son of the mayor, so I first blamed the online version of Tesseract (at CustomOCR) that I am using. I double-checked the result with PDF OCR X, a Windows/Mac tool that wraps the Tesseract-OCR engine. The result is not as bad as in the Tesseract online demo, but still poor. If someone can explain the bad result, I would be very interested to hear!

Disappointing results with Tesseract all over. This image shows the verification of the OCR result
with PDF OCR X, a desktop OCR program that uses the Tesseract engine.

OCR Test 2: High-Quality Scan

Test setup: A printout of the NY Times article was scanned at a resolution of 100 dpi. As some services do not accept PDF as input, the JPEG format (.jpg extension) is used as the lowest common denominator in all tests.

Test 2: High-quality Scan OCR

OCR Service	Result	Output (Excerpt)
Abbyy Cloud SDK	100%	BEIJING — Jing Yuechen,
Google Docs OCR	Good	BEIJINGJing Yuechen,
OnlineOCR	100%	BEIJING — Jing Yuechen, the
i2 OCR	Good	BEIJING — J ing Yuechen,
FreeOnlineOCR	Good	BEIJING - Jing Yuechen~
Tesseract Online	Good	BEIJING — Jing Yuechen,

No big surprises here: OCR on high-quality scans worked OK with all services.
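
The article does not say how the ratings (100%, Good, Poor, Fail) were derived. One common way to put a number on differences like those in the excerpts above is the character error rate: the edit distance between the OCR output and the expected text, divided by the length of the expected text. The following is a minimal sketch, assuming the excerpt rated 100% is the correct reference text; the function names are made up for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance relative to the reference length (0.0 = perfect)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


# Assumption: the excerpt rated 100% in Test 2 is the correct text.
# "\u2014" is the long dash that appears in the article's excerpts.
reference = "BEIJING \u2014 Jing Yuechen,"

# Excerpts copied from the Test 2 table.
excerpts = {
    "Google Docs OCR": "BEIJINGJing Yuechen,",
    "i2 OCR": "BEIJING \u2014 J ing Yuechen,",
    "FreeOnlineOCR": "BEIJING - Jing Yuechen~",
}

for service, text in excerpts.items():
    print(f"{service}: {character_error_rate(reference, text):.2f}")
```

On these short excerpts the edit distances are 3, 1 and 2 characters respectively. A real comparison would run over the full article text, not just the first few words.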

OCR Test 3: Low-Quality Scan

Test setup: The printout was scanned at a resolution of 75 dpi, printed out again and scanned a second time at 75 dpi. The result is a low-quality scan. Nevertheless, the image is still easily readable for humans.

Test 3: Low-quality Scan OCR

OCR Service	Result	Output (Excerpt)
Abbyy Cloud SDK	Good	BEIJING — Jing Yucchen,
Google Docs OCR	Poor	EEN; – ing Yucchen, ,
OnlineOCR	Good	BEIJING - ding Yuechen,
i2 OCR	Poor	BI-‘.IJtNG — Jing Yucchcn,
FreeOnlineOCR	Fail	Bta t0 mt 21 the e
Tesseract Online	Fail	BEIJING — -1 ing Yuodzcn,

Surprise #2: Google OCR did worse than the “no-name” OnlineOCR service (and Abbyy)! My clear expectation at the start of this market research was that Google’s OCR service would set the gold standard. After all, Google is the company that builds self-driving cars and won the ImageNet Visual Recognition Challenge last year.

OCR Test 4: Smartphone Camera

Your camera as a document scanner? That is possible in 2015. For this test the by now familiar NY Times article was printed out and then folded and unfolded. The result is a creased and slightly wrinkled printout. This makes the task more realistic in a reproducible way. Usually you are not in a photo studio when archiving your creased receipts or coffee-stained invoices.

Test 4: Mobile Phone Image OCR

OCR Service	Result	Output (Excerpt)
Abbyy Cloud SDK	Poor	rely heavily on leu-fetierea
Google Docs OCR	Poor	rely muneinterne - anorameror
OnlineOCR	Poor	rely heavily on lass-fettereir
i2 OCR	Poor	rely hcavilyon less-fctt
FreeOnlineOCR	Fail	(no output)
Tesseract Online	Fail	7- \5g@	ff‘<. - nu-lain

None of the OCR engines did well in this benchmark; there is plenty of room for improvement. No longer a surprise: Google OCR again did worse than the free OnlineOCR service and Abbyy’s commercial OCR solution.

OCR Test 5: Recognize an XKCD comic (tricky font)

XKCD comics are fun – and not difficult to read for humans (just sometimes difficult to understand?). Can robots read them better?

Test 5: Read an XKCD comic

OCR Service	Result	Output (Excerpt)
Abbyy Cloud SDK	Fail	nrs udr d hcu 111 oaismiY
Google Docs OCR	Fail	ITS UERD HOUTM (DNSIANTY
OnlineOCR	Poor	115 WEIRD HOW rM CONSTANTLY
i2 OCR	Fail	fT5UElRD HOlJD’1CON$I’PNlLY
FreeOnlineOCR	Fail	(no output)
Tesseract Online	Fail	TFSUEIKP HOUI‘r’l0)NSWlTLY

Surprise #4: While OnlineOCR received a passing grade with its “poor” result, all other services failed completely. Amazing! In other words: XKCD speech bubbles are so difficult to decode, they could be used as CAPTCHAs :)

OCR Test 6: Recognize a number in an image

This is not your typical OCR task. Number recognition inside an image is not easy. After all, Google uses house numbers in its reCAPTCHA service.

Google's reCAPTCHA service with house numbers.
OCR Test 6 was easier: The image contained the number in horizontal orientation and with no disturbing background.

OCR number recognition has many applications: There are billions of analog and digital gauges in houses and industrial settings that have no API or user interface and are not easily replaced. Reading them by taking an image and running OCR on it is often an economical solution. The input image used in this test is a high-quality mobile phone image of a gas meter. The image was cropped down so that it contains just the reading.

Test 6: Number OCR

OCR Service	Result	Output (Excerpt)
Abbyy Cloud SDK	Good	0491024,3
Google Docs OCR	Fail	(no output)
OnlineOCR	Fail	0 6 4 0
i2 OCR	Fail	01} § 7 6,2 F.
FreeOnlineOCR	Fail	(no output)
Tesseract Online	Fail	f11l§7ﬁ,2Z?3§-

Surprise #5: Every service completely flunked this test except Abbyy’s FineReader engine, which delivered a good result.
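
For gauge reading in practice, the raw OCR string usually needs a sanity check before it is stored as a number. Here is a minimal sketch of such a check, assuming the reading consists of digits with at most one decimal comma or point (as in the Abbyy output above). The pattern and the function name are illustrative assumptions; real meters may use other formats.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed format: digits, optionally followed by one decimal comma or
# point and more digits. Anything else is treated as a failed reading.
READING_PATTERN = re.compile(r"[0-9]+(?:[.,][0-9]+)?")


def parse_meter_reading(ocr_output: str) -> Optional[Decimal]:
    """Return the reading as a Decimal, or None if it is not a clean number."""
    text = ocr_output.strip()
    if not READING_PATTERN.fullmatch(text):
        return None
    # Normalize the decimal comma so Decimal can parse the value.
    return Decimal(text.replace(",", "."))


# Outputs copied from the Test 6 table.
for raw in ["0491024,3", "0 6 4 0", "01} § 7 6,2 F."]:
    print(repr(raw), "->", parse_meter_reading(raw))
```

The check is deliberately strict: it does not strip inner spaces or stray symbols, because guessing at a damaged meter reading is usually worse than rejecting it and taking a new picture.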

Summary & Online OCR Converter Ranking:

The following table summarizes the results and ranks the services.

Ranking	Score	Screen	Scan1	Scan2	Mobile	XKCD	Number
Abbyy Cloud SDK	10	++	++	+	0	-	+
OnlineOCR	9	+	++	+	0	0	-
Google Docs	7	++	+	0	0	-	-
i2OCR	6	+	+	0	0	-	-
FreeOnlineOCR	4	+	+	-	-	-	-
Tesseract	2	-	+	-	-	-	-
Ocrad.JS	0	-	-	-	-	-	-

The clear winner is Abbyy’s commercial service. The “no-name” OnlineOCR website ranks #2, ahead of Google OCR, which comes in only at third place. i2OCR also works fine (albeit slowly). None of the other services can be recommended.
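
The article does not state how the score column is calculated. The sketch below recomputes a total from the per-test ratings in the six result tables under an assumed point mapping (100% = 3, Good = 2, Poor = 1, Fail = 0). The mapping is a guess, not the author's documented method.

```python
# Assumed point values; the article does not document its weighting.
POINTS = {"100%": 3, "Good": 2, "Poor": 1, "Fail": 0}

# Ratings copied from the tables for tests 1 to 6, in that order.
# Ocrad.js is left out because it only appears in the Test 1 table.
RESULTS = {
    "Abbyy Cloud SDK": ["100%", "100%", "Good", "Poor", "Fail", "Good"],
    "OnlineOCR": ["Good", "100%", "Good", "Poor", "Poor", "Fail"],
    "Google Docs OCR": ["100%", "Good", "Poor", "Poor", "Fail", "Fail"],
    "i2 OCR": ["Good", "Good", "Poor", "Poor", "Fail", "Fail"],
    "FreeOnlineOCR": ["Good", "Good", "Fail", "Fail", "Fail", "Fail"],
    "Tesseract Online": ["Fail", "Good", "Fail", "Fail", "Fail", "Fail"],
}


def total_score(ratings):
    """Sum the assumed point values for one service's ratings."""
    return sum(POINTS[rating] for rating in ratings)


def rank(results):
    """Return (service, score) pairs sorted from best to worst."""
    scored = [(name, total_score(ratings)) for name, ratings in results.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, score in rank(RESULTS):
    print(f"{score:2d}  {name}")
```

Under this assumed mapping the recomputed totals equal the published scores for five of the six services. Abbyy comes out at 11 rather than the published 10, so the author's actual weighting may differ slightly. The ranking order is the same either way.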

References:

Reviewed online OCR services:

OCR Test images:

Screenshot
High-quality scan
Low-quality scan
Smartphone image
XKCD speech bubble
Gas-meter reading
All test images are available on GitHub: OCR Benchmark