Improve OCR Accuracy With Advanced Image Preprocessing

Optical Character Recognition (OCR) technology has improved steadily over the past decades thanks to more elaborate algorithms, more CPU power, and advanced machine learning methods. Reaching OCR accuracy levels of 99% or higher, however, is still the exception and far from trivial to achieve.

At Docparser, we learned how to improve OCR accuracy the hard way and spent weeks fine-tuning our OCR engine. If you are in the midst of setting up an OCR solution and want to know how to increase the accuracy of your OCR engine, keep reading. In this article, we cover different techniques to improve OCR accuracy and share our takeaways from building a world-class OCR system for Docparser.

Extract Data From PDF: How to Convert PDF Files Into Structured Data

PDF is here to stay. In today’s work environment, PDF has become ubiquitous as a digital replacement for paper and holds all kinds of important business data. But what are the options if you want to extract data from PDF documents? Manually rekeying PDF data is often the first reflex but fails most of the time for a variety of reasons. In this article, we talk about PDF data extraction solutions (PDF parsers) and how to eliminate manual data entry from your workflow.

OCR PDF Scanner

Optical Character Recognition (OCR) is a technology that allows you to extract data from scanned documents as text, which you can then edit, update, or aggregate with other tools for data analysis and a range of other uses.

Optical Character Recognition (OCR) is essentially the conversion of scanned images containing text, be it typed, printed, or handwritten, into … well … text. Typically, OCR is used to extract text from photos, passports, and scanned documents. It is often used to “digitize” the recognized text so that it can later be edited, searched, or aggregated for analysis.