How to detect the checkbox status from a scanned PDF or image


Do you need to parse image or PDF files and get checkbox values in addition to text? Then OCR.space can help.

Checkbox or radio button values are read by specialized OMR software, not by OCR. OCR stands for Optical Character Recognition, and a checkmark is not a character. It is a compound object: a base checkbox plus some kind of mark on top of it. OCR alone cannot provide a single ASCII value to represent such a combination. Instead, specialized OMR computer vision algorithms compare the empty and marked states of each box and return a 1 or 0 (true or false) Boolean value. OMR stands for Optical Mark Recognition.
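This page does not describe how OCR.space compares empty and marked boxes. As an illustration only, here is a minimal Python sketch of one common approach, pixel-density comparison: count the dark pixels inside a checkbox (ignoring its printed outline) and report it as checked if it is clearly darker than an empty reference box. It assumes the box position is already known and the image is already deskewed and stored as 0 to 255 grayscale. The function names, the 128 darkness cutoff and the 0.15 threshold are hypothetical choices, not OCR.space parameters.

```python
from typing import List, Tuple

# Grayscale image as rows of pixel intensities: 0 = black, 255 = white.
Image = List[List[int]]
# Checkbox region as (top, left, height, width) in pixels (assumed already located).
Box = Tuple[int, int, int, int]


def ink_ratio(image: Image, box: Box, margin: int = 1, dark_cutoff: int = 128) -> float:
    """Fraction of dark pixels inside the box, skipping a `margin`-pixel border
    so the printed outline of the checkbox itself is not counted as a mark."""
    top, left, height, width = box
    dark = total = 0
    for y in range(top + margin, top + height - margin):
        for x in range(left + margin, left + width - margin):
            total += 1
            if image[y][x] < dark_cutoff:
                dark += 1
    return dark / total if total else 0.0


def is_checked(image: Image, box: Box, empty_ratio: float, min_increase: float = 0.15) -> bool:
    """Compare against the ink ratio of an empty reference box and return a Boolean."""
    return ink_ratio(image, box) - empty_ratio >= min_increase


def make_box(size: int, marked: bool) -> Image:
    """Build a synthetic size x size checkbox: black outline, optional X mark."""
    img = [[255] * size for _ in range(size)]
    for i in range(size):
        img[0][i] = img[size - 1][i] = img[i][0] = img[i][size - 1] = 0
    if marked:
        for i in range(1, size - 1):
            img[i][i] = img[i][size - 1 - i] = 0  # the two strokes of the X
    return img


empty = make_box(10, marked=False)
marked = make_box(10, marked=True)
box = (0, 0, 10, 10)
reference = ink_ratio(empty, box)
print(is_checked(empty, box, reference))   # False
print(is_checked(marked, box, reference))  # True
```

A production OMR engine also has to locate the boxes, correct rotation and cope with scanner or fax noise, so thresholds like these are only starting points to tune on real samples.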

Checkmark Recognition (OMR)

A checkmark field is an element on a machine-readable form (usually rectangular in shape and often called a “check box”) in which a mark should be made (a check/tick, an X, a large dot, inking over, etc.).

Example forms

The first image shows the detected checkbox states on a four-checkbox form. Note that the form is rotated, yet the checkboxes are still detected correctly.

Checkbox detection result for a low-quality, rotated faxed or scanned receipt.

The next image shows the detected checkbox states on a 10-checkbox form:

Another detection result - this time with ten checkboxes.

If the checkbox detection feature is enabled in your account, then the checkbox states are returned along with the regular OCR API response:

OCR and OMR API JSON response for the 10-checkbox template above.

OCR.space includes customizable checkbox detection. It can detect machine-printed and handwritten marks in checkboxes, e.g. on questionnaires. It is available as an online OCR API and as an offline OCR solution that requires no Internet connection. OCR.space can be customized to read and process checkmarks; the technical term for this is "Optical Mark Recognition" (OMR). The OCR.space OMR technology can recognize simple checkmarks, square checkboxes, round checkboxes and other types of checkboxes. The state of a checkmark can be "Selected" or "Not selected".
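Because an OMR result is ultimately a Boolean, client code typically converts the two state strings named above into true/false values. The sketch below assumes, hypothetically, that you have already extracted (label, state) pairs from the API's JSON response; the actual field names depend on your account's checkbox configuration and are not documented on this page. The function and constant names are illustrative only.

```python
from typing import Dict, Iterable, Tuple

# The two checkmark states named on this page, mapped to Boolean values.
STATE_TO_BOOL: Dict[str, bool] = {"Selected": True, "Not selected": False}


def states_to_booleans(pairs: Iterable[Tuple[str, str]]) -> Dict[str, bool]:
    """Convert hypothetical (checkbox label, state string) pairs to label -> bool.

    Unknown state strings raise ValueError instead of silently becoming False.
    """
    result: Dict[str, bool] = {}
    for label, state in pairs:
        key = state.strip()
        if key not in STATE_TO_BOOL:
            raise ValueError(f"unexpected checkbox state {state!r} for {label!r}")
        result[label] = STATE_TO_BOOL[key]
    return result


# Hypothetical example input; real labels come from your own form template.
print(states_to_booleans([("Box 1", "Selected"), ("Box 2", "Not selected")]))
# {'Box 1': True, 'Box 2': False}
```

Raising on unexpected values is a deliberate choice: for forms, silently treating an unrecognized state as "unchecked" could hide a configuration problem.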

To get started using OCR.space for checkbox detection, please contact us at team AT ocr.space and send us a few samples of the type of checkboxes that you need to recognize. We can then tell you how easy or difficult your checkbox detection project is, at no cost to you.

Try UI.Vision RPA, our OCR-powered Robotic Process Automation (RPA) software. It is available as the free RPA Chrome and RPA Firefox browser extensions (OSI-certified open source), plus computer-vision extension modules. UI.Vision RPA is fun to use, and its OCR screen-scraping features are powered by the OCR.space OCR API.

Do you have an OCR API question? Please email us or visit the OCR API Forum - we love to answer OCR questions.