There are several players in the data extraction market, many of whom not only claim to do a great job but also deliver great results. SimpleIndex is one of them. Docparser and SimpleIndex do a relatively similar job, and in this regard we can say that Docparser is an alternative to SimpleIndex. How? Let’s find out. Continue reading “SimpleIndex vs Docparser: Is Docparser A SimpleIndex Alternative?”
A major problem many businesses face today is the inability to leverage data trapped inside scanned documents and images. Whenever a business relies on data locked in paper documents, manually re-keying it can quickly become a bottleneck and harm the business. Continue reading “Extract Data From Scanned Documents And Images”
Optical Character Recognition (OCR) technology has improved steadily over the past decades thanks to more elaborate algorithms, more CPU power, and advanced machine learning methods. Reaching OCR accuracy levels of 99% or higher, however, is still the exception and far from trivial to achieve.
At Docparser we learned how to improve OCR accuracy the hard way and spent weeks fine-tuning our OCR engine. If you are in the midst of setting up an OCR solution and want to know how to increase the accuracy of your OCR engine, keep reading … In this article, we cover different techniques to improve OCR accuracy and share our takeaways from building a world-class OCR system for Docparser. Continue reading “Improve OCR Accuracy With Advanced Image Preprocessing”
If you have ever worked in an office equipped with a document scanner, you have probably come across the expression Optical Character Recognition (OCR) more than once. But what is OCR and what is it used for? This article explains what OCR means and covers the most popular use cases. Continue reading “What Is OCR And What Is It Used For?”
PDF is here to stay. In today’s work environment, PDF has become ubiquitous as a digital replacement for paper and holds all kinds of important business data. But what are the options if you want to extract data from PDF documents? Manually rekeying PDF data is often the first reflex, but it fails most of the time for a variety of reasons. In this article we talk about PDF data extraction solutions (PDF parsers) and how to eliminate manual data entry from your workflow. Continue reading “Extract Data From PDF: How to Convert PDF Files Into Structured Data”
Zonal Optical Character Recognition (OCR), also sometimes referred to as Template OCR, is a technology used to extract text located at a specific location inside a scanned document. In this article we’ll explain how Zonal OCR works and how it can be used to automate data-entry workflows. Continue reading “Using Zonal OCR to Extract Data Fields From Scanned Documents”
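To make the idea of reading text "at a specific location" concrete, here is a minimal Python sketch. It assumes an OCR engine has already returned each recognized word with a bounding box in page coordinates; the `Box` and `Word` types, the `extract_zones` helper, the sample words, and the template coordinates are all hypothetical and are not taken from Docparser or any particular OCR library. Many real zonal OCR setups instead crop the zone out of the image first and run recognition on the crop only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (origin at top-left)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One recognized word plus its position (hypothetical OCR output)."""
    text: str
    box: Box


def extract_zones(words: list[Word], zones: dict[str, Box]) -> dict[str, str]:
    """Return {field name: text} using words whose centre lies inside each zone."""
    fields = {}
    for name, zone in zones.items():
        hits = [
            w for w in words
            if zone.contains((w.box.left + w.box.right) / 2,
                             (w.box.top + w.box.bottom) / 2)
        ]
        # Simplistic reading order: top to bottom, then left to right.
        hits.sort(key=lambda w: (w.box.top, w.box.left))
        fields[name] = " ".join(w.text for w in hits)
    return fields


# Hypothetical OCR output for one scanned invoice page.
SAMPLE_WORDS = [
    Word("Invoice", Box(50, 40, 120, 55)),
    Word("INV-1042", Box(400, 40, 480, 55)),
    Word("Total:", Box(350, 700, 400, 715)),
    Word("129.50", Box(410, 700, 470, 715)),
    Word("EUR", Box(475, 700, 505, 715)),
]

# Hypothetical template: one named zone per data field.
TEMPLATE = {
    "invoice_number": Box(380, 30, 520, 65),
    "total": Box(405, 690, 520, 725),
}

if __name__ == "__main__":
    print(extract_zones(SAMPLE_WORDS, TEMPLATE))
```

The template is the part that makes this "zonal": the same set of zones is reused for every document that shares the layout, which is why the approach suits fixed forms better than documents whose layout changes from page to page.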
In this article we discuss how and when invoice capture software is a viable solution for eliminating manual data entry. We look in detail at how invoice scanning software works in general and which methods lead to accurate data. Continue reading “Invoice Capture Software – Is Automated Invoice Scanning a Viable Solution?”
Optical Character Recognition (OCR) is a technology that allows you to extract data from scanned documents as text, which you can then edit, update, or aggregate with other tools for data analysis and a range of other uses.
Optical Character Recognition (OCR) is essentially the conversion of scanned images containing text, be it typed, printed, or written by hand, into … well … text. Typically you see OCR used to extract text information from photos, passports, and scanned documents. OCR is often used to “digitize” the recognized text so that it can later be edited, searched, aggregated for analysis, and so on. Continue reading “OCR PDF Scanner”