In this article, we discuss when invoice capture software is a viable way to eliminate manual data entry. We also explain how invoice scanning software works in general and which methods lead to accurate data.
Why you want to invest in automated invoice scanning software
The digitalization of the workplace continues to advance. Gone are the days of paper filing cabinets that took up precious office space the size of a small warehouse. Today’s business processes are mostly digital, and printing business documents has become the exception rather than the default.
Most incoming invoices are nowadays received in digital formats (PDF, scanned image, photo, email, …). While this already streamlines the business process and makes physical storage capacity irrelevant, one major problem persists: How to leverage the data hidden in those digitized invoices? And how to easily access key data in invoices and move the data to where it belongs?
This is where automated invoice capture software comes into play. Each invoice holds key data that is crucial for accounting, resource planning, and business intelligence applications. The data trapped in invoices needs to be transferred to ERP, accounting, or data analytics systems.
Invoice capture software (also called invoice scanning software or invoice recognition software) is an automated data entry solution tailored to invoices. It tries to recognize all key data fields in your invoices and returns easy-to-handle structured data. Once your PDF invoices are converted into something like an Excel spreadsheet, you can easily reuse the data in other applications. Gone are the times of manually re-keying invoice data from a PDF into your ERP system.
How accurate is an invoice scanner?
Converting PDF invoices to structured data formats (e.g. Excel) is still a challenging task for modern computer systems. Invoices come in various formats and, even though they follow a certain hierarchical logic, computer systems struggle to accurately extract fine-grained data points. Even though machine learning and artificial intelligence have made huge progress in recent years, identifying complex patterns such as invoice line items remains a problem that is not entirely solved.
It’s important to get a realistic idea of when invoice recognition software can help you. Important questions to ask are for example:
- Are the invoices ‘real’ PDF documents or actually scanned images?
- Are the scanned images all perfectly aligned and have an overall good quality?
- Do you also need to process photos of receipts?
- Is extracting the key data (totals, date, vendor, …) enough or do you need line-item granularity?
The questions above all lead to one big question: How accurate and reliable are invoice scanning solutions?
The short answer: It really depends … For the long answer, keep on reading.
When automated invoice processing software is a viable solution
Automated invoice processing is not a ‘solved problem’. Even though there are still technical limitations, today’s invoice scanning and processing solutions offer great results when your use case falls into one of the two situations described below.
1) Scanning recurring invoices from a limited number of vendors and suppliers
For a lot of businesses, the majority of invoices are issued by a limited number of suppliers. In some cases, businesses receive hundreds of invoices each month from just a handful of suppliers. This is especially true for brick & mortar businesses, eCommerce shops, as well as the food, shipping, and wholesale industries.
If you run a business with hundreds of recurring invoices, automated invoice processing is a great solution to streamline your workflow. It is quite easy to train invoice processing software (e.g. Docparser) to reliably recognize and extract data fields from a known document format.
When the format of an invoice is known, techniques like Optical Character Recognition (more precisely, Zonal OCR) and keyword-based pattern matching can be applied, which leads to accurate and reliable parsing results. This method relies heavily on the location of data points inside the documents.
Training invoice OCR software means defining where on the page each key data field is expected. Once trained, all future documents with the same layout will be recognized, and the invoice processing software will automatically extract the data in a fine-grained structured format for further use. Docparser offers an easy-to-use OCR invoice processing solution, and a fully functional invoice parser can be set up in a couple of minutes.
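To make the idea of location-based training more concrete, here is a minimal sketch of zonal extraction in Python. It assumes an OCR engine has already returned each word together with its page position. The `Word` and `Zone` classes, the `extract_fields` function, and all coordinates are hypothetical and only illustrate the principle; this is not how Docparser is implemented internally.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word returned by an OCR engine, with its position on the page."""
    text: str
    x: float
    y: float


@dataclass
class Zone:
    """A rectangular page region where a field is expected (hypothetical template format)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


def extract_fields(words, template):
    """Return the text found inside each named zone, in reading order."""
    result = {}
    for field, zone in template.items():
        inside = sorted((w for w in words if zone.contains(w)), key=lambda w: (w.y, w.x))
        result[field] = " ".join(w.text for w in inside)
    return result


# Template defined once for a known supplier layout (coordinates are made up).
TEMPLATE = {
    "invoice_number": Zone(400, 40, 560, 60),
    "invoice_date": Zone(400, 65, 560, 85),
    "total": Zone(400, 700, 560, 720),
}

# Sample OCR output for one page (made-up data).
WORDS = [
    Word("ACME", 40, 45), Word("Corp", 90, 45),
    Word("INV-1001", 420, 50),
    Word("2024-03-15", 420, 75),
    Word("$1,250.00", 470, 710),
]

fields = extract_fields(WORDS, TEMPLATE)
print(fields)
```

Because the template only stores zones, every future invoice with the same layout can be parsed with the same template, which is why this approach suits recurring invoices from a small set of suppliers.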
Furthermore, this method even makes it possible to extract line items from invoices. This means that you can extract not only metadata such as the invoice date, invoice number, and totals, but also detailed data about the merchandise included in an invoice. This is especially interesting when you want to feed fine-grained data into an ERP system or do some advanced number crunching.
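For a known layout, line items could be recovered along the same lines: words inside the table area are grouped into rows by their vertical position and into columns by their horizontal position. The sketch below assumes OCR words arrive as `(text, x, y)` tuples and that rows have a fixed height; the column boundaries, table extent, and function name are made up for illustration.

```python
from collections import defaultdict

# Column boundaries for a known layout: name -> (left, right). Made-up values.
COLUMNS = {
    "description": (40, 300),
    "quantity": (300, 380),
    "unit_price": (380, 470),
    "amount": (470, 560),
}
TABLE_TOP, TABLE_BOTTOM = 200, 600  # vertical extent of the line-item table


def extract_line_items(words, row_height=12):
    """Group (text, x, y) words into rows by y position, then into columns by x position."""
    rows = defaultdict(lambda: defaultdict(list))
    for text, x, y in words:
        if not TABLE_TOP <= y < TABLE_BOTTOM:
            continue  # ignore everything outside the table area
        row_key = int((y - TABLE_TOP) // row_height)
        for name, (left, right) in COLUMNS.items():
            if left <= x < right:
                rows[row_key][name].append((x, text))
                break
    items = []
    for row_key in sorted(rows):
        cells = rows[row_key]
        # Join the words of each cell from left to right; empty cells become "".
        items.append({name: " ".join(t for _, t in sorted(cells[name])) for name in COLUMNS})
    return items


# Sample OCR output (made-up data); the last two words lie below the table.
TABLE_WORDS = [
    ("Blue", 40, 204), ("Widget", 75, 204), ("2", 310, 204), ("10.00", 390, 204), ("20.00", 480, 204),
    ("Shipping", 40, 216), ("1", 310, 216), ("5.50", 390, 216), ("5.50", 480, 216),
    ("Total", 390, 650), ("25.50", 480, 650),
]

line_items = extract_line_items(TABLE_WORDS)
for item in line_items:
    print(item)
```

A fixed row height is a simplification. Real tables with wrapped descriptions would need a more forgiving row-grouping rule.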
The accuracy of data extraction with this method is close to perfect, and in most cases there is no need for manual data validation (see below). Check out our screencast below to get an idea of how to create an invoice parser with Docparser.
2) Extract metadata such as invoice date, invoice number, total, tax, … from a variety of unknown layouts
When you have hundreds or thousands of different invoice formats, training a computer system for each layout is not practical and another approach needs to be chosen.
Instead of training invoice OCR scanning software based on the position of the data points, intelligent filters can be used that find specific data fields in variable locations. These filters work by identifying entities such as numbers and then searching for typical keywords nearby. For example, the keyword ‘Total Due’ followed by a dollar amount would be treated as the invoice total.
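The filter idea described above can be sketched in a few lines of Python. This example assumes the invoice text has already been extracted; the keyword lists, the amount pattern, and the 30-character search window are illustrative assumptions, and a production filter would need many more keywords, currencies, and number formats.

```python
import re

# Keywords that typically label each field (an illustrative, incomplete list).
KEYWORDS = {
    "total": ["total due", "amount due", "grand total"],
    "tax": ["sales tax", "vat", "tax"],
    "net": ["subtotal", "net amount"],
}

# Money amounts such as $80.00 or 1,080.00 (assumes two decimal places).
AMOUNT = re.compile(r"[$€£]?\s?(\d{1,3}(?:,\d{3})*\.\d{2})")


def find_amounts(text, window=30):
    """Locate money amounts, then label each by the closest keyword preceding it."""
    found = {}
    lowered = text.lower()
    for match in AMOUNT.finditer(text):
        context = lowered[max(0, match.start() - window):match.start()]
        best = None  # (position in context, field name)
        for field, words in KEYWORDS.items():
            for word in words:
                pos = context.rfind(word)
                if pos != -1 and (best is None or pos > best[0]):
                    best = (pos, field)
        # Keep only the first amount found for each field.
        if best and best[1] not in found:
            found[best[1]] = float(match.group(1).replace(",", ""))
    return found


SAMPLE = """Invoice No: 2024-117
Subtotal: $1,000.00
Sales Tax (8%): $80.00
Total Due: $1,080.00
"""

amounts = find_amounts(SAMPLE)
print(amounts)
```

Picking the keyword closest to the amount matters: the text just before the total still contains the word "Tax" from the previous line, and only the nearer "Total Due" label gives the right field.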
Keyword-based extraction works really well for most metadata fields, such as the totals (net, tax, total), the invoice date, and the invoice number. However, extracting line items presented in a table works less reliably. This is because line-item tables come in different formats and contain different types of data.
If you want to process invoices from hundreds of different suppliers and you are OK with manually validating the extracted data, Docparser is the right tool for you.
When do invoice scanning solutions tend to fail?
As described above, invoice scanning solutions tend to fail when fine-grained table data (line items) is needed and the layout of the invoice is unknown. Many researchers are approaching this problem with artificial intelligence, but data accuracy is still sub-optimal, to say the least.
That said, today’s solutions work best when either the invoice format is known or only metadata needs to be extracted. This limitation can, however, be worked around by adding a layer of human data validation to the process.
Overcoming limitations of invoice data extraction with a hybrid model of automation and human validation
A common approach to overcoming the limitations of automated invoice OCR systems is to choose a hybrid model. A computer system does the heavy lifting, and a human then validates the extracted data.
Some invoice processing software solutions have a built-in ‘data validation’ interface which allows a human operator to quickly flip through all processed invoices and either validate or correct the parsed data.
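One way to support the human operator in such a hybrid setup is to run simple plausibility checks on the parsed data and list the reasons why an invoice deserves a closer look. The sketch below is an assumption about how this could be done, not a description of any particular product: the field names, the `review_reasons` function, and the two checks (required fields present, net plus tax equals total) are hypothetical.

```python
from decimal import Decimal

REQUIRED = ("invoice_number", "invoice_date", "net", "tax", "total")


def review_reasons(invoice):
    """Return a list of reasons why a parsed invoice should be checked by a person."""
    reasons = [f"missing {name}" for name in REQUIRED if not invoice.get(name)]
    if not reasons:
        # Decimal avoids floating-point rounding issues when comparing money amounts.
        net, tax, total = (Decimal(invoice[k]) for k in ("net", "tax", "total"))
        if net + tax != total:
            reasons.append("net + tax does not equal total")
    return reasons


# Parsed invoices as they might come out of the extraction step (made-up data).
PARSED = [
    {"invoice_number": "INV-1", "invoice_date": "2024-03-15", "net": "100.00", "tax": "8.00", "total": "108.00"},
    {"invoice_number": "INV-2", "invoice_date": "2024-03-16", "net": "50.00", "tax": "4.00", "total": "45.00"},
    {"invoice_number": "", "invoice_date": "2024-03-17", "net": "10.00", "tax": "0.80", "total": "10.80"},
]

for invoice in PARSED:
    reasons = review_reasons(invoice)
    status = "needs review: " + ", ".join(reasons) if reasons else "looks consistent"
    print(invoice["invoice_number"] or "(no number)", "->", status)
```

Checks like these do not replace human validation. They only help the operator see which invoices most likely contain extraction errors.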
Are there alternatives to automated invoice scanning?
To circumvent the issues connected with invoice scanning and accounts payable automation, a standard called Electronic Data Interchange (EDI) was introduced over 30 years ago. The idea was simple: instead of exchanging invoices in a human-readable format, transactional data gets automatically transferred from company A to company B in a machine-readable format. By letting machines talk to each other directly, the necessity of manual validation or hand typing is eliminated. While EDI certainly had its fair share of success in larger organizations, the reality is that most SMEs are still receiving invoices in paper or PDF format and are desperately looking for alternatives to EDI.
Another alternative is manually re-keying invoice data.
Manual data entry can be done in-house if you only need to process a couple of invoices per month. If you need to process hundreds of invoices and don’t want to use invoice capture software, you should think about outsourcing the task. Outsourcing, however, comes with its own issues, such as data security, processing time, and the overhead of finding a service provider.
To sum it up …
Whether your invoice automation project becomes a success or a source of frustration depends heavily on your use case and the solution you choose.
We hope this article gave you a good overview of the invoice capture software category and helped you in deciding which route to take for your business.
Please let us know your thoughts in the comments or reach out by email if you want to discuss your invoice automation needs.