How to detect the checkbox status from a scanned PDF or image


Do you need to parse image or PDF files and get checkbox values in addition to text? Then OCR.space can help.

Checkbox and radio button values are read by specialized OMR software, not by OCR. OMR stands for Optical Mark Recognition. OCR stands for Optical Character Recognition, and a checkmark is not a character but a compound object: a base checkbox with some kind of mark on top of it. OCR alone cannot provide a single ASCII value to represent that combination. Instead, a dedicated OMR computer vision algorithm compares the empty and the marked state and generates a Boolean value: 1 or 0, true or false.
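To make the idea of comparing an empty and a marked box concrete, here is a minimal sketch in Python. It is not the OCR.space algorithm; it only illustrates one simple approach, assuming the box position is already known and the image is a grayscale pixel grid. The function names, the synthetic test image and the thresholds (`dark_below`, `min_increase`) are all assumptions made for this example.

```python
from typing import List

Image = List[List[int]]  # grayscale rows: 0 = black, 255 = white


def dark_ratio(image: Image, left: int, top: int, width: int, height: int,
               dark_below: int = 128) -> float:
    """Fraction of pixels in the region that are darker than `dark_below`."""
    dark = sum(
        1
        for row in image[top:top + height]
        for pixel in row[left:left + width]
        if pixel < dark_below
    )
    return dark / (width * height)


def is_checked(image: Image, left: int, top: int, width: int, height: int,
               empty_ratio: float, min_increase: float = 0.10) -> bool:
    """True if the region is noticeably darker than the empty template box.

    `empty_ratio` is the dark-pixel ratio of the same box when unmarked;
    `min_increase` is an assumed threshold, not a value from OCR.space.
    """
    return dark_ratio(image, left, top, width, height) - empty_ratio >= min_increase


def draw_box(size: int, marked: bool) -> Image:
    """Synthetic test image: a square outline, optionally with an X inside."""
    image = [[255] * size for _ in range(size)]
    for i in range(size):
        image[0][i] = image[size - 1][i] = 0   # top and bottom edges
        image[i][0] = image[i][size - 1] = 0   # left and right edges
        if marked:
            image[i][i] = image[i][size - 1 - i] = 0  # the two diagonals of the X
    return image


empty = draw_box(10, marked=False)
marked = draw_box(10, marked=True)
baseline = dark_ratio(empty, 0, 0, 10, 10)  # ratio of the empty template box

print(is_checked(empty, 0, 0, 10, 10, baseline))
print(is_checked(marked, 0, 0, 10, 10, baseline))
```

A real OMR engine also has to locate the boxes, correct rotation and cope with scan noise, which this sketch leaves out entirely.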

Checkmark Recognition (OMR)

A checkmark field is an element on a machine-readable form (usually rectangular in shape and often called a “check box”) in which a mark should be made (a check/tick, an X, a large dot, inking over, etc.).

Example forms

The image shows detected checkbox states on a 4-checkbox form. Note that the form is rotated, and the checkboxes are still detected correctly.

OMR Checkbox detection result
Checkbox detection result for a low-quality, rotated faxed or scanned receipt.

The next image shows detected checkbox states on a 10-checkbox form:

Another detection result - this time with ten checkboxes.

If the checkbox detection feature is enabled in your account, then the checkbox states are returned along with the regular OCR API response:

OCR and OMR API JSON response for the 10-checkbox template above.

OCR.space includes customizable checkbox detection. It can detect machine-printed and handwritten marks in checkboxes, for example on questionnaires. It is available as an online OCR API and as offline OCR, with no Internet connection required. The technical term for reading and processing checkmarks is “Optical Mark Recognition” (OMR). The OCR.space OMR technology can recognize simple checkmarks, square checkboxes, round checkboxes and other types of checkboxes. The state of a checkmark can be "Selected" or "Not selected".
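As a rough illustration of how a client might consume such a response, the sketch below parses a JSON string and turns the "Selected" / "Not selected" states into Booleans. The `ParsedResults`, `ParsedText`, `IsErroredOnProcessing` and `ErrorMessage` fields follow the regular OCR API response. The `Checkboxes`, `Name` and `State` field names are hypothetical, since this page does not spell out the checkbox part of the response; check the actual response for your account before relying on them.

```python
import json
from typing import Dict, Tuple


def parse_response(raw: str) -> Tuple[str, Dict[str, bool]]:
    """Return the recognized text and a name -> checked mapping."""
    data = json.loads(raw)
    if data.get("IsErroredOnProcessing"):
        raise ValueError(f"OCR failed: {data.get('ErrorMessage')}")

    text_parts = []
    states: Dict[str, bool] = {}
    for page in data.get("ParsedResults", []):
        text_parts.append(page.get("ParsedText", ""))
        # HYPOTHETICAL field names: "Checkboxes", "Name", "State".
        for box in page.get("Checkboxes", []):
            states[box["Name"]] = box["State"] == "Selected"
    return "\n".join(text_parts), states


# Made-up sample response, used only to exercise the parser.
sample = json.dumps({
    "IsErroredOnProcessing": False,
    "ParsedResults": [{
        "ParsedText": "Name: Jane Doe",
        "Checkboxes": [
            {"Name": "newsletter", "State": "Selected"},
            {"Name": "terms", "State": "Not selected"},
        ],
    }],
})

text, states = parse_response(sample)
print(text)
print(states)
```

Mapping the two state strings to `True` and `False` early keeps the rest of the application independent of the exact wording the API uses.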

To get started using OCR.space for checkbox detection, please contact us at team AT ocr.space and send us a few samples of the type of checkboxes that you need to recognize. We can then tell you how easy or difficult your checkbox status detection project is, at no cost to you.

Do you have an OCR API question? Please email us or visit the OCR API Forum - we love to answer OCR questions.