OCR PDF Scanner

Optical Character Recognition (OCR) is a technology that lets you extract data from scanned documents as text, which you can then edit, update, or aggregate with other tools for data analysis and a range of other uses.

In essence, OCR converts scanned images containing text, whether typed, printed, or handwritten, into … well … text. It is typically used to extract text from photos, passports, and scanned documents. OCR is often used to “digitize” the recognized text so that it can later be edited, searched, aggregated for analysis, and so on.

Docparser uses OCR to extract data from PDF documents. It allows you to convert PDF to Excel files, convert PDF to JSON and even update cloud platforms through integrations.

There are often many steps to OCR

Pre-processing improves the chance that the text is recognized correctly. De-skewing (straightening a page that was scanned at an angle) is one of the most widely used techniques, and layout analysis, which targets specific zones of the PDF, also matters when extracting text with a high degree of accuracy. Additionally, converting grayscale and color images to black and white (binarization) lets the process work with just two values and increases the chance of successfully extracting the text from the source.
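To make the binarization step concrete, here is a minimal sketch in plain Python. It assumes the page is already available as rows of grayscale values from 0 to 255, and it picks the cut-off with Otsu's method, a common choice for this step. This is an illustration only, not a description of how Docparser's OCR engine works, and the names `otsu_threshold` and `binarize` are made up for this example.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale values in 0..255


def otsu_threshold(image: Image) -> int:
    """Return the threshold t that best separates pixels <= t from pixels > t."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is at or below t
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (up to a constant factor).
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: Image, threshold: int) -> Image:
    """Map every pixel to 0 (dark, likely ink) or 255 (light, likely paper)."""
    return [[0 if value <= threshold else 255 for value in row] for row in image]


if __name__ == "__main__":
    page = [[10, 12, 200], [11, 210, 205]]  # tiny made-up "scan"
    t = otsu_threshold(page)
    print(binarize(page, t))
```

A fixed threshold such as 128 would also work on clean scans; computing it from the histogram tends to cope better with pages that are uniformly too dark or too light.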

A Complete Cloud-Based OCR PDF Scanning Solution

If you have PDFs containing text and need OCR data extraction from them, our free trial of Docparser.com puts you in the driver’s seat.

Whether you are working to extract information from scanned PDF invoices or purchase orders, or looking to automate the receipt of payroll PDFs for your bookkeeper, we’ve got you covered. We use the best OCR software available, which currently supports 46 languages. An example of a scanned PDF in Japanese and English, before and after parsing, is shown below:

PDF OCR Multiple Languages

PDF OCR parser

Current languages supported with our PDF OCR:

Languages Supported       Languages Supported
English                   Indonesian
Afrikaans                 Italian
Albanian                  Japanese
Basque                    Korean
Brazilian (Portuguese)    Latin
Bulgarian                 Latvian
Byelorussian              Lithuanian
Catalan                   Macedonian
Chinese Simplified        Malay
Chinese Traditional       Moldavian
Croatian                  Norwegian
Czech                     Polish
Danish                    Portuguese
Dutch                     Romanian
Esperanto                 Russian
Estonian                  Serbian
Finnish                   Slovak
French                    Slovenian
Galician                  Spanish
German                    Swedish
Greek                     Tagalog
Hungarian                 Turkish
Icelandic                 Ukrainian

 

14 thoughts on “OCR PDF Scanner”

  1. Hello!
    I have ~36000 scanned PDFs that I want to parse. I just need to check whether a short sentence exists in each of them. As a result, I want to see true/false.
    Do you have a solution for me?

    1. Hi Andrey, this definitely sounds like something we can do. You can use our “Tag Document” parsing rule to search for a specific phrase and then output a custom value (e.g. true) when the phrase is present. I would suggest creating a free trial account and contacting our support staff once you have uploaded a couple of sample documents.
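The idea behind a phrase check like this can be sketched in a few lines of Python. This is not Docparser's “Tag Document” rule, only an illustration of the concept; it assumes the text has already been extracted from each scan by OCR, and the function names are invented for this example. Whitespace is normalized first because OCR output often breaks a sentence across lines.

```python
import re
from typing import Dict


def normalize(text: str) -> str:
    """Lowercase the text and collapse every run of whitespace to one space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def contains_phrase(ocr_text: str, phrase: str) -> bool:
    """Return True if the phrase occurs in the OCR text, ignoring case and line breaks."""
    return normalize(phrase) in normalize(ocr_text)


def tag_documents(texts_by_name: Dict[str, str], phrase: str) -> Dict[str, bool]:
    """Produce a true/false flag per document name."""
    return {name: contains_phrase(text, phrase) for name, text in texts_by_name.items()}


if __name__ == "__main__":
    # Made-up sample texts standing in for OCR output.
    samples = {
        "a.pdf": "Payment is due\nwithin 30   days.",
        "b.pdf": "Paid in full.",
    }
    print(tag_documents(samples, "due within 30 days"))
```

An exact substring match will miss a phrase if the OCR misreads even one character, so a real workflow may need a more tolerant comparison.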

  2. ABBYY Fine Reader has functionality to automatically process new image files placed in a folder called a ‘hot folder’. Do you have similar functionality? I would be receiving multiple image files during the day and would like to have them converted automatically upon receipt.
    Thanks

    1. Hi Benjamin! Thanks for the question! Docparser is all about automating workflows, and there are several ways to import your documents. As Docparser is a cloud solution, you need to make sure that the documents get uploaded to us, though. For example, you can use our integration partners to import documents from your cloud storage provider (Dropbox, Google Drive, Box, …) or automatically forward incoming emails to your parser.
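For readers who want hot-folder behavior on their own machine before the documents reach a cloud service, the pattern is roughly the following. This is a hedged sketch, not a Docparser feature or API: `scan_hot_folder` and the `handle` callback are hypothetical names, and `handle` is where you would plug in whatever upload or forwarding step you actually use.

```python
from pathlib import Path
from typing import Callable, List, Set, Tuple


def scan_hot_folder(
    folder: Path,
    seen: Set[str],
    handle: Callable[[Path], None],
    suffixes: Tuple[str, ...] = (".pdf", ".png", ".jpg", ".tif"),
) -> List[Path]:
    """Call handle() once for each new document in the folder and return those paths."""
    new_files = []
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue  # skip sub-folders and unrelated file types
        if path.name in seen:
            continue  # already processed on an earlier pass
        handle(path)  # hypothetical hook, e.g. upload or forward the file
        seen.add(path.name)
        new_files.append(path)
    return new_files


if __name__ == "__main__":
    processed: Set[str] = set()
    # A single pass over the current directory; a scheduler or a loop with
    # time.sleep() would repeat this call at whatever interval suits you.
    scan_hot_folder(Path("."), processed, lambda p: print("would upload", p.name))
```

Tracking names in a set keeps the sketch short; a longer-running setup would more likely move processed files to another folder so the state survives a restart.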

  3. I am not clear whether Docparser is able to read handwritten text (say, within a pre-printed form).

    How do you handle error reporting, for example when the OCR is unsure of a letter or digit in certain words and needs a human to review it and ‘teach’ the system the correct character?

    Thanks

    1. Hi James! Thanks for the great question. Docparser does not recognize handwritten text at this point in time. As you already pointed out, OCR for handwritten text comes with a high error rate, and human validation is mandatory. That is not something we have built into Docparser yet, but adding a validation interface is definitely something we would like to do in the future. If you sign up for a free trial, you’ll be informed about product updates.

  4. Hi,
    Just wondering if Docparser can parse multiple scanned receipts and organise them in an Excel worksheet as fields with their corresponding data. The receipts are from ATMs, issued when withdrawing money. I need to keep track of when and how much was withdrawn from the account.

    Thank you.
    Al.

    1. Hi Ali, thanks a lot for reaching out and for your interest in Docparser! If your receipts are scanned properly (well aligned, with an office scanner), Docparser should be able to get the data you need. However, Docparser does not do a great job with documents that were “scanned” with a photo camera. I would suggest you create a free trial account and give it a try. If you experience any issues during the setup, please don’t hesitate to contact our support staff.

  5. We have 5000 analyst reports in PDF and want to extract all the content (text, tables, and images) into JSON format. Is this possible with Docparser?

    1. Hi Audrey! Thanks a lot for reaching out and for your interest in Docparser! Whether or not Docparser is a good fit depends on how your documents are structured and what data you want to extract. I would suggest creating a free trial account and uploading a couple of sample files. You can then reach out to our customer support and they’ll be happy to check whether Docparser is a good fit.

  6. Do you support converting a PDF without a text layer, via OCR, to a PDF with a text layer (e.g., PDF/A)?

    1. Hi Ryan, yes, Docparser produces “Sandwich PDFs” as a by-product. However, Docparser was primarily designed to pull specific data points from your documents. If you are only looking for a PDF/A generation tool, you will probably be better off with OCRmyPDF, as it’s free and designed for this purpose.
