Zonal Optical Character Recognition (OCR), also sometimes referred to as Template OCR, is a technology used to extract text from a specific location inside a scanned document. In this article, we’ll explain how Zonal OCR works and how it can be used to automate data-entry workflows.
Most of today’s document and PDF scanning tools offer out-of-the-box Optical Character Recognition (OCR) capabilities, which convert your scanned images (JPG, PNG, or TIFF files) into searchable and editable PDF documents. In some cases, however, a simple OCR system is not enough. For example, you may not be interested in the whole text of a document, but only want to pull certain text elements that are located at specific positions.
This is where a technology called “Zonal OCR” (also referred to as Template OCR) comes into play. Zonal OCR allows us to extract only the important data fields from a scanned document and store the extracted values in a structured database. Popular use cases for Zonal OCR include converting PDF to Excel and automated invoice processing.
How does Zonal OCR software work?
First, let’s talk a bit about what the term actually means. You have probably already read about OCR and how it is used to convert scanned documents into searchable and editable documents. But having the whole text of the document accessible is only the first step.
Zonal or Template OCR goes one step further. Instead of only converting your scanned images into text, a software system can be trained to understand the structure and hierarchy of your document. By defining “zones”, it is possible to teach a zone-based OCR system to distinguish certain data fields from each other.
Let’s imagine your business receives hundreds of purchase orders or sales orders every week. Thanks to a consistent layout, it’s easy to teach a Zonal OCR system where certain data fields can be found. More advanced systems like Docparser can apply PDF data extractions for various layouts, for example in the case of invoice OCR processing.
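To make the idea of zones more concrete, here is a minimal Python sketch. It assumes that a regular OCR pass has already produced a list of words with pixel bounding boxes; the `Word` and `Zone` classes, the field names, and the coordinates are all hypothetical and only serve as an illustration, not as a description of how any particular product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One word from a full-page OCR pass, with its bounding box in pixels."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


@dataclass(frozen=True)
class Zone:
    """A named rectangle on the page that is expected to hold one data field."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, word: Word) -> bool:
        # A word counts as inside the zone if its center point falls inside it.
        cx = (word.left + word.right) / 2
        cy = (word.top + word.bottom) / 2
        return self.left <= cx <= self.right and self.top <= cy <= self.bottom


def extract_fields(words: list[Word], zones: list[Zone]) -> dict[str, str]:
    """Return one text value per zone (words ordered by top edge, then left edge)."""
    fields = {}
    for zone in zones:
        inside = [w for w in words if zone.contains(w)]
        inside.sort(key=lambda w: (w.top, w.left))
        fields[zone.name] = " ".join(w.text for w in inside)
    return fields


# Hypothetical OCR output for a purchase order with a fixed layout.
words = [
    Word("Purchase", 50, 40, 150, 60),
    Word("Order", 160, 40, 220, 60),
    Word("PO-1042", 600, 40, 700, 60),
    Word("Acme", 50, 120, 110, 140),
    Word("Corp", 120, 120, 170, 140),
]

# Hypothetical zones: one rectangle per data field we care about.
zones = [
    Zone("po_number", 580, 30, 720, 70),
    Zone("client_name", 40, 110, 300, 150),
]

print(extract_fields(words, zones))
# {'po_number': 'PO-1042', 'client_name': 'Acme Corp'}
```

The title words “Purchase Order” are ignored because they fall outside both zones, which is exactly the difference between zonal extraction and plain full-text OCR.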
What is Zonal OCR?
To sum it up: Zonal OCR is a special type of Optical Character Recognition which extracts only certain text data fields from a document. The extraction is based on “zones” which are defined by the user prior to scanning.
Training your software
Training a Zonal OCR system means defining where each data field can be found inside a document. This needs to be done only once; the locations (zones) of the data fields are then saved in a template.
Once you have trained your system properly, the zone templates can be used for scanning further documents. And this is where Zonal OCR really shines.
Big batches of documents with the same layout can be processed in a snap once the system has been trained. Need to extract client names and reference numbers from hundreds of quotes, purchase orders or sales orders? No problem at all. Once you have set up your master template, all you need to do is feed more documents to the system.
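The “define once, reuse for every document” idea can be sketched in a few lines of Python. The JSON template format, the file names, and the OCR word lists below are assumptions made up for this example; a real system would store its templates in its own format and get the words from an actual OCR engine.

```python
import json

# A hypothetical template: zone name -> [left, top, right, bottom] in pixels.
template = {
    "client_name": [40, 110, 300, 150],
    "reference": [580, 30, 720, 70],
}

# The zones are defined once and stored (here as a JSON string) for reuse.
template_json = json.dumps(template)


def apply_template(template_json: str, words: list[dict]) -> dict[str, str]:
    """Extract one value per zone from a single document's OCR words."""
    zones = json.loads(template_json)
    result = {}
    for name, (left, top, right, bottom) in zones.items():
        hits = [
            w for w in words
            if left <= w["x"] <= right and top <= w["y"] <= bottom
        ]
        hits.sort(key=lambda w: (w["y"], w["x"]))
        result[name] = " ".join(w["text"] for w in hits)
    return result


# Hypothetical OCR output for two quotes that share the same layout.
# Each word carries the center point (x, y) of its bounding box.
batch = {
    "quote_001.pdf": [
        {"text": "Acme", "x": 80, "y": 130},
        {"text": "Corp", "x": 145, "y": 130},
        {"text": "Q-2001", "x": 650, "y": 50},
    ],
    "quote_002.pdf": [
        {"text": "Globex", "x": 90, "y": 130},
        {"text": "Q-2002", "x": 650, "y": 50},
    ],
}

# The same stored template is applied to every document in the batch.
rows = [
    {"file": name, **apply_template(template_json, words)}
    for name, words in batch.items()
]
for row in rows:
    print(row)
```

Each row is a flat record that could be written to a spreadsheet or database, which is why the approach scales well as long as the layout really stays the same.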
How Docparser can help
Setting up a zonal or template OCR system is straightforward in most cases. As the data extraction is based on the location inside the document, most solutions offer a visual “zone definition” process. The screenshot below shows the setup process of Docparser. All the user has to do is draw a rectangle around the area where the data field is located.
This process is then repeated for each data field that the user wants to extract. In a typical scenario, the user defines a handful of zones, which then results in the same number of extracted data fields.
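One practical detail when storing a drawn rectangle: the preview in which the user draws and the scans that arrive later may have different pixel dimensions. A common way to deal with this, shown here purely as an assumption and not as something any specific tool is known to do, is to store zones as fractions of the page size. The function names and page sizes are hypothetical.

```python
def normalize_zone(rect, page_width, page_height):
    """Convert a pixel rectangle (left, top, right, bottom) to page fractions."""
    left, top, right, bottom = rect
    return (
        left / page_width,
        top / page_height,
        right / page_width,
        bottom / page_height,
    )


def to_pixels(zone, page_width, page_height):
    """Map a normalized zone back to pixel coordinates for a given page size."""
    left, top, right, bottom = zone
    return (
        round(left * page_width),
        round(top * page_height),
        round(right * page_width),
        round(bottom * page_height),
    )


# Hypothetical: the user draws a rectangle on a 1000 x 1400 pixel preview.
drawn = (600, 140, 900, 210)
zone = normalize_zone(drawn, 1000, 1400)

# Later scans arrive at 2480 x 3508 pixels (roughly A4 at 300 dpi).
print(to_pixels(zone, 2480, 3508))
# (1488, 351, 2232, 526)
```

Storing fractions keeps one template usable across scan resolutions, as long as the page proportions and layout are the same.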
Where Zonal OCR tends to fail
Most Zonal OCR systems are purely location-based. The advantage of such systems is that the setup is very easy. As mentioned above, the user only needs to draw a rectangle (zone) around a specific area and the setup is done.
However, this covers only a subset of cases. In reality, extracting data from semi-structured documents is a bit more complex. To give you a better picture, let’s look at some examples. The following cases cannot be handled by a simple Zonal OCR system:
- Extracting compound data fields (e.g. First + Last Name, Postal Address, …)
- Repeating data fields (e.g. Multiple product numbers, …)
- Table data
- Data fields with variable positions (e.g. Invoice totals, …)
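The last item in the list is easy to demonstrate. The Python sketch below uses two made-up invoices: on the longer one, the total moves further down the page, so a fixed zone picks up the wrong value. As one possible alternative (an assumption for illustration, not a description of how Docparser or any other product solves it), the second function looks for the label “Total” and reads the word to its right. All names, coordinates, and amounts are hypothetical.

```python
def text_in_zone(words, zone):
    """Classic zonal extraction: join all words whose center lies in the zone."""
    left, top, right, bottom = zone
    hits = [w for w in words if left <= w["x"] <= right and top <= w["y"] <= bottom]
    hits.sort(key=lambda w: (w["y"], w["x"]))
    return " ".join(w["text"] for w in hits)


def value_right_of(words, label, line_tolerance=10):
    """Anchor-based lookup: nearest word to the right of `label` on the same line."""
    anchors = [w for w in words if w["text"].rstrip(":").lower() == label.lower()]
    if not anchors:
        return None
    anchor = anchors[0]
    candidates = [
        w for w in words
        if w["x"] > anchor["x"] and abs(w["y"] - anchor["y"]) <= line_tolerance
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w["x"])["text"]


# Hypothetical OCR words (center points) for a one-item invoice...
short_invoice = [
    {"text": "Widget", "x": 80, "y": 300},
    {"text": "40.00", "x": 650, "y": 300},
    {"text": "Total:", "x": 560, "y": 400},
    {"text": "40.00", "x": 650, "y": 400},
]
# ...and a three-item invoice, where the total sits lower on the page.
long_invoice = [
    {"text": "Widget", "x": 80, "y": 300},
    {"text": "40.00", "x": 650, "y": 300},
    {"text": "Gadget", "x": 80, "y": 350},
    {"text": "25.00", "x": 650, "y": 350},
    {"text": "Cable", "x": 80, "y": 400},
    {"text": "10.00", "x": 650, "y": 400},
    {"text": "Total:", "x": 560, "y": 500},
    {"text": "75.00", "x": 650, "y": 500},
]

# A zone drawn around the total on the short invoice.
total_zone = (620, 390, 700, 410)

print(text_in_zone(short_invoice, total_zone))  # 40.00 (correct)
print(text_in_zone(long_invoice, total_zone))   # 10.00 (a line item, not the total)
print(value_right_of(long_invoice, "Total"))    # 75.00
```

The fixed zone fails silently on the longer invoice: it still returns a plausible-looking amount, just not the right one.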
This is why Docparser offers a powerful set of features that goes beyond the capabilities of a classic Zonal OCR system. By offering more sophisticated tools, Docparser compensates for the shortcomings of classic systems. Create a free account today and give it a try!