In this article, we discuss when invoice capture software is a viable way to eliminate manual data entry. We look at how invoice scanning software works and which methods lead to accurate data.
Why you want to invest in Automated Invoice Scanning Software
The digitalization of the workplace continues to advance. Gone are the days of paper filing cabinets that occupied precious office space the size of a small warehouse. Today’s business processes are mostly digital, and printing business documents has become the exception rather than the default.
Most incoming invoices now arrive in a digital format (PDF, scanned image, photo, email, …). While this already streamlines the business process and makes physical storage capacity irrelevant, one major problem persists: how do you leverage the data hidden in those digitized invoices? And how do you easily access key data in invoices and move it to where it belongs?
This is where automated Invoice Capture Software comes into play. Each invoice holds key data which is crucial for accounting, resource planning, and business intelligence applications. The data trapped in invoices needs to be transferred to ERP, accounting, or data analytics systems.
Invoice Capture Software (also called Invoice Scanning Software or Invoice Recognition Software) is an automated data entry solution tailored to the use case of invoices. It tries to recognize all key data fields in your invoices and returns easy-to-handle structured data. Once your PDF invoices are converted into something like an Excel spreadsheet, you can easily reuse the data in other applications. Gone are the days of manually re-keying invoice data from PDF into your ERP system.
How accurate is an Invoice Scanner?
Converting PDF invoices to structured data formats (e.g. Excel) is still a challenging task for modern computer systems. Invoices come in various formats and, even though they follow a certain hierarchical logic, computer systems struggle to accurately extract fine-grained data points. Even though machine learning and artificial intelligence have made huge progress in recent years, identifying complex patterns such as invoice line items remains a problem that is not entirely solved.
It’s important to get a realistic idea of when invoice recognition software can help you. Important questions to ask include:
- Are the invoices ‘real’ PDF documents or actually scanned images?
- Are the scanned images all well aligned and of good overall quality?
- Do you also need to process photos of receipts?
- Is extracting the key data (totals, date, vendor, …) enough or do you need line-item granularity?
Looking at the questions above, the big question that comes to mind is: How accurate and reliable are invoice scanning solutions?
The short answer: It really depends … For the long answer, keep on reading.
When automated invoice processing software is a viable solution
Automated invoice processing is not a ‘solved problem’. Even though there are still technical limitations, today’s invoice scanning and processing solutions offer great results when your use case falls into one of the two situations described below.
1) Scanning recurring invoices from a limited number of vendors and suppliers
For a lot of businesses, the majority of invoices are issued by a limited number of suppliers. In some cases, businesses receive hundreds of invoices each month from just a handful of suppliers. This is especially true for brick & mortar businesses, eCommerce shops, and the food, shipping, and wholesale industries.
If you run a business with hundreds of recurring invoices, automated invoice processing is a great solution to streamline your workflow. It is quite easy to train invoice processing software (e.g. Docparser) to reliably recognize and extract data fields from a known document format.
When the format of an invoice is known, techniques like Optical Character Recognition (more precisely Zonal OCR) and keyword-based pattern matching can be applied, which leads to accurate and reliable parsing results. This method makes heavy use of the location of data points inside the documents.
Training invoice OCR software means that you define where on the page each key data field is expected. Once trained, the software recognizes all future documents with the same layout and automatically extracts their data in a fine-grained structured format for further use. Docparser offers an easy-to-use OCR invoice processing solution, and a fully functional invoice parser can be set up in a couple of minutes.
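To make the idea of location-based training more concrete, here is a minimal sketch of zonal extraction in Python. It is not how Docparser is implemented. It assumes an OCR engine has already returned each word together with the position of its box on the page, and the names `Word`, `Zone`, `extract_fields`, and all coordinates are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR-recognized word and the top-left corner of its box (in points)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Zone:
    """A rectangular page region in which one data field is expected."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


def extract_fields(words: list[Word], template: dict[str, Zone]) -> dict[str, str]:
    """Collect the words that fall inside each zone, in reading order."""
    result = {}
    for field, zone in template.items():
        hits = sorted((w for w in words if zone.contains(w)), key=lambda w: (w.y, w.x))
        result[field] = " ".join(w.text for w in hits)
    return result


# Hypothetical template ("training" result) for one known supplier layout.
TEMPLATE = {
    "invoice_number": Zone(400, 40, 560, 60),
    "invoice_date": Zone(400, 61, 560, 80),
    "total": Zone(400, 700, 560, 720),
}

# Hypothetical OCR output for a single page.
WORDS = [
    Word("Invoice", 40, 45), Word("INV-1042", 420, 45),
    Word("2024-03-01", 420, 65),
    Word("Total", 300, 705), Word("$1,250.00", 430, 705),
]

fields = extract_fields(WORDS, TEMPLATE)
print(fields)
```

Because the template only stores positions, it works for every invoice that shares the layout and fails as soon as a supplier moves a field, which is why this approach suits a limited number of recurring vendors.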
Furthermore, this method even makes it possible to extract line items from invoices. This means that you can not only extract metadata such as the invoice date, invoice number, and totals but also get detailed data about the merchandise included in an invoice. This is especially interesting when you want to feed fine-grained data into an ERP system or do some advanced number crunching.
The accuracy of data extraction for this method is close to perfect and there is no need for manual data validation (see below) in most cases. Check out our screencast below to get an idea of how to create an invoice parser with Docparser.
2) Extracting metadata such as invoice date, invoice number, total, tax, … from a variety of unknown layouts
When you have hundreds or thousands of different invoice formats, training a computer system for each layout is not practical and another approach needs to be chosen.
Instead of training invoice OCR software on the position of the data points, intelligent filters can be used which find specific data fields in variable locations. These filters work by identifying entities such as numbers and then searching for typical keywords nearby. For example, the keyword ‘Total Due’ followed by a dollar amount would be treated as the invoice total.
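As a rough illustration of such a filter, the following Python sketch looks for a total keyword and takes the first amount that follows it on the same line. It is a simplified stand-in and not Docparser's actual filter logic. The keyword list, the amount pattern, and the function name `find_invoice_total` are assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical keyword list; real invoices use many more variants.
TOTAL_KEYWORDS = ("total due", "amount due", "balance due", "invoice total")

# An amount such as $1,250.00 or 89.5 (optional dollar sign, optional decimals).
AMOUNT = re.compile(r"\$?\s*(\d[\d,]*(?:\.\d{1,2})?)")


def find_invoice_total(text: str) -> Optional[float]:
    """Return the first amount that follows a total keyword on the same line."""
    for line in text.splitlines():
        lowered = line.lower()
        for keyword in TOTAL_KEYWORDS:
            pos = lowered.find(keyword)
            if pos == -1:
                continue
            # Only look at the part of the line after the keyword.
            match = AMOUNT.search(line, pos + len(keyword))
            if match:
                return float(match.group(1).replace(",", ""))
    return None


# Hypothetical text as it might come out of an OCR step.
SAMPLE = """Invoice No: 1042
Subtotal: $1,100.00
Tax: $150.00
Total Due: $1,250.00"""

print(find_invoice_total(SAMPLE))  # 1250.0
```

Note that nothing in this sketch depends on where the total sits on the page, only on the wording next to it. That is what lets the same filter handle layouts it has never seen.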
Keyword-based extraction works really well for most metadata fields such as the totals (net, tax, total), the invoice date, and the invoice number. However, extracting line items presented in a table is less reliable. This is because line item tables come in different formats and contain different types of data.
If you want to process invoices from hundreds of different suppliers and you are OK with manually validating the extracted data, Docparser is the right tool for you.
When do invoice scanning solutions tend to fail?
As described above, invoice scanning solutions tend to fail when fine-grained table data (invoice items) is needed and the layout of the invoice is unknown at the same time. While many researchers are trying to approach this problem with artificial intelligence, the data accuracy is still sub-optimal, to say the least.
That being said, today’s solutions work best when either the invoice format is known or only metadata needs to be extracted. This limitation can, however, be bypassed by adding a layer of human data validation to the process.
Overcoming limitations of invoice data extraction with a hybrid model of automation and human validation
A common approach to overcoming the limitations of automated invoice OCR systems is to choose a hybrid model. This method lets a computer system do the heavy lifting and then has a human validate the extracted data.
Some invoice processing software solutions have a built-in ‘data validation’ interface which allows a human operator to quickly flip through all processed invoices and either validate or correct the parsed data.
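One way to keep the manual part of a hybrid model small is to send only uncertain results to the operator. The Python sketch below illustrates this under the assumption that the extraction step reports a confidence score between 0 and 1 for each field. The `ExtractedField` type, the `route` function, and the 0.9 threshold are hypothetical and would need tuning for real data.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical extractor


def needs_review(fields: list[ExtractedField], threshold: float = 0.9) -> list[str]:
    """Return the names of fields a human operator should check."""
    return [f.name for f in fields if not f.value or f.confidence < threshold]


def route(invoices: dict[str, list[ExtractedField]], threshold: float = 0.9):
    """Split invoices into an auto-approved list and a manual review queue."""
    approved, review_queue = [], {}
    for invoice_id, fields in invoices.items():
        flagged = needs_review(fields, threshold)
        if flagged:
            review_queue[invoice_id] = flagged
        else:
            approved.append(invoice_id)
    return approved, review_queue


# Hypothetical extraction results for two invoices.
BATCH = {
    "inv-1": [ExtractedField("total", "1250.00", 0.98),
              ExtractedField("date", "2024-03-01", 0.95)],
    "inv-2": [ExtractedField("total", "18.00", 0.62),
              ExtractedField("date", "2024-03-02", 0.97)],
}

approved, review_queue = route(BATCH)
print(approved)      # ['inv-1']
print(review_queue)  # {'inv-2': ['total']}
```

With this split, the operator only opens invoices that have an empty or low-confidence field and sees which fields to check, instead of flipping through every document.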
Are there alternatives to automated invoice scanning?
To circumvent the issues connected with invoice scanning and accounts payable automation, a standard called Electronic Data Interchange (EDI) was introduced over 30 years ago. The idea was simple: instead of exchanging invoices in a human-readable format, transactional data gets automatically transferred from company A to company B in a machine-readable format. By letting machines talk to each other directly, the necessity of manual validation or hand typing is eliminated. While EDI certainly had its fair share of success in larger organizations, the reality is that most SMEs are still receiving invoices in paper or PDF format and are desperately looking for alternatives to EDI.
Another alternative is manually re-keying invoice data.
Manual data entry can be done in-house if you only need to process a couple of invoices per month. If you need to process hundreds of invoices, though, and don’t want to use invoice capture software, you should think about outsourcing the task. Outsourcing does come with its own issues, such as data security, processing time, and the overhead of finding a service provider.
To sum it up …
Whether your invoice automation project becomes a success or a source of frustration depends heavily on your use case and the solution you choose.
We hope this article gave you a good overview of the invoice capture software category and helped you in deciding which route to take for your business.
Please let us know your thoughts in the comments or reach out by email if you want to discuss your invoice automation needs.
Currently we have a requirement where we receive PDF invoices as soft copies from vendors and need to parse/read them to identify the key fields which are required to create an invoice in the SAP system.
We need automated software which can perform the above parsing without manual intervention and export the required fields to an Excel/text file.
Is it possible to achieve the above using Docparser?
Hi Sravanthi! Thanks for reaching out and the great question. What you describe is exactly what Docparser does! If you like, please create a free trial and click through the setup guide. And if you have any questions, just reach out to our support team.
Hi…will this work effectively on a scanned document? Also, are there options available to check on the “correctness” of the data extracted by matching it to a given sample?
Hi JJ! Thanks for the great question! Yes, Docparser also works with scanned documents if they are scanned at a reasonable quality. We offer built-in OCR and have advanced filters in place to improve OCR accuracy. We do offer a “confidence” level for things like the invoice totals. But there is no data validation interface yet which you could use to validate the extracted data of each invoice.
I would like to receive the vendor invoices at a dedicated email ID and save the invoices automatically in the repository, organized by plant location and by vendor. The vendor should be identified by the name of the invoice and the plant location should be identified by the billing address present in the invoice. Based on these two factors, plant location first and vendor name second, the invoice should be saved automatically to a repository.
Could you tell me how to achieve this?
Hi Rajamahender, thanks a lot for reaching out and your interest in Docparser! What you describe definitely sounds like something we can help you with. Docparser gives you a dedicated email ID to which you can send your invoices. You can then build parsing rules to get all your data. As a final step, you can use one of our integration partners to store your file in your cloud storage under a specific name. I would suggest creating a free trial account and contacting our support team if you have any trouble setting up your account.
Hi, we are an accounting firm interested in these kinds of solutions. Is it possible to configure Docparser for reading documents in Swedish?
Hi Dennis, thanks a lot for reaching out and your interest in Docparser! Absolutely, Docparser supports a variety of languages, including Swedish.
We are thinking of starting to scan different documents such as Sales Orders, Work Orders, Invoices and Purchase Orders. What we have now will store a PDF file on our server with a randomly generated name (i.e. BRN3C2AF441C4FC_000002.pdf) from the scanner. Can this software read the PDF and then rename it “Invoice XXXXXX.pdf”, “Sales Order XXXXXX.pdf”, “Work Order XXXXX.pdf”, with the XXXXX being the invoice, sales or work order number the software recognizes? At the moment we don’t need it to read the entirety of the document, just recognize what it is and rename it accordingly as a PDF. Thanks
Hi Kim, thanks for the great question! At this point, Docparser does not support renaming of files unfortunately. You can however achieve this functionality by connecting Docparser to one of our integration partners (Zapier, MS Flow, Workato, …) which will then let you rename your files based on the data extracted by Docparser.
a. I would like to have the output of an invoice as a JSON doc. Is this something that Docparser would do?
b. Would Docparser do validations like QTY=2 and PRICE=$5.00, so the total is $10.00 and not $18.00, when the confidence level of the 10,00 is not so high and it could maybe be an 8?
Hi Yontal, thanks for the great questions! Yes, Docparser can convert an invoice document (PDF or scanned invoice) into JSON data. We also provide basic validation logic for invoices, but we are not going as far as summing up all line-items to validate the total value. Docparser is more like a “data extraction API” and further business logic would need to be implemented by you if needed.
a. How would Docparser act in a case where it is not sure about the accuracy of the OCR results? Could the API return the confidence level of a value? I know the Invoice Total preset has a “confidence” level. Can I do the same with my own preset?
b. Why not make the “Invoice Total” and “Invoice Number” presets etc. public so that we can modify them?
I would appreciate it if I could get an email when there is a response.
It took 5 days the last time I asked, and I have no way to know.
As a hotel, we generate thousands of restaurant check invoices each day. We are looking for a system whereby the signed receipts can be scanned and then filed for easy access by a third party. Filing would need to be done by date and time.
Hi there, thanks for reaching out! Our app was built to extract data from documents you send us and then make that data available elsewhere. We have many customers parsing receipts and sending them to a cloud storage endpoint like Google Drive.
I would recommend creating a free account (no credit card required) and letting us know if you have any questions or run into trouble getting set up at [email protected]!
I have a question. In India, the rules of the GST and Income Tax departments still say that the original tax invoice is required to claim expenses and GST input credit. A soft copy of the invoice can be considered for accounting, but how do we manage the compliance requirement for a hard copy of the invoice?
Hi Sanoop, thanks for reaching out! I’m having a bit of trouble understanding your question. Our app was built to extract data locked inside of invoices and send that data elsewhere. We do have some capabilities for forwarding the original invoice to wherever you need it.
If you have any other questions please let us know at [email protected]!