Automate Data Extraction from Any Document Type.
Built for Productive Teams.

PDF, Word, CSV, XLS, TXT, XML, and image files all reach your team from different sources. Docparser reads them all, extracts the fields you define, and delivers structured data to your spreadsheet or system automatically. One rules engine. Any document type.

14-day free trial · No credit card required · Set up in under 5 minutes with a pre-built template

Works with your existing tools

Google Sheets · Airtable · HubSpot · Zapier · Power Automate · REST API

How It Works

Document Extraction, Whatever File Type It Arrives In.

Every team processes a mix of document types. Some have pre-built templates. Others need a custom rule. Docparser handles both — and routes all output to the same destination regardless of where it came from.
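Here is what "one destination" looks like in practice: rows from three different parsers land in a single CSV, with a `source` column recording which parser produced each row. This is a minimal Python sketch of the end result only. The sample rows, field names, and the `source` column are assumptions for illustration. Docparser does the routing itself, so this is not how it implements it.

```python
import csv
import io

# Made-up rows as three different parsers might emit them (hypothetical fields).
credit_rows = [{"applicant": "A. Patel", "fico_score": "712"}]
work_order_rows = [{"order_number": "WO-48213", "status": "Closed"}]
invoice_rows = [{"invoice_number": "INV-1001", "total": "1250.00"}]


def consolidate(sources: dict[str, list[dict[str, str]]]) -> str:
    """Write rows from every source into one CSV, tagging each row's origin."""
    # Build the column list from the union of all field names, in first-seen order.
    columns = ["source"]
    for rows in sources.values():
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)
    buffer = io.StringIO()
    # restval="" leaves a blank cell wherever a source lacks a column.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    for source, rows in sources.items():
        for row in rows:
            writer.writerow({"source": source, **row})
    return buffer.getvalue()


if __name__ == "__main__":
    print(consolidate({
        "credit_report": credit_rows,
        "work_order": work_order_rows,
        "ubl_invoice": invoice_rows,
    }))
```

Each source keeps its own fields. Columns a source does not produce stay empty, so no manual merge step is needed downstream.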

Import

Any File Format, Any Source

PDF, Word, CSV, XLS, TXT, XML, and image files all reach Docparser the same way. Any format, any source, any volume.

PDF files and Word documents from email or document portals
CSV, XLS, and spreadsheet exports from external systems
TXT and XML structured data files
Image files (JPG, PNG, TIFF) with OCR processing
Cloud folder sync: Google Drive, Dropbox, OneDrive, Box

Parse

Configure the Rules Once. Every Document Runs the Same Way.

Point at the data on a sample document. Docparser builds the extraction rule. Use a pre-built template for known document formats or start from a blank template for any custom layout.

Extraction rules you configure:

Text Fixed Position · Text Variable Position · Smart Tables · Table Data · Date · Person Name · Postal Address · Email Address · Phone Number · Regular Expression · Tag Document · and more
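Of the rule types listed above, the Regular Expression rule is the easiest to picture in code. The Python sketch below assumes OCR has already turned a scanned work order into plain text. The sample text, the field names, and the one-pattern-per-field layout are all hypothetical and do not reflect Docparser's own rule format.

```python
import re
from typing import Optional

# Text as it might look after OCR of a scanned work order (made-up sample).
OCR_TEXT = """WORK ORDER
Order No: WO-48213
Assigned to: J. Alvarez
Completed: 2024-05-17
"""

# Hypothetical rule set: each output field maps to one pattern with a
# single capture group. These names are illustrative, not Docparser's.
RULES = {
    "order_number": r"Order No:\s*(WO-\d+)",
    "technician": r"Assigned to:\s*(.+)",
    "completed_on": r"Completed:\s*(\d{4}-\d{2}-\d{2})",
}


def apply_rules(text: str, rules: dict[str, str]) -> dict[str, Optional[str]]:
    """Run every pattern against the text; fields with no match come back as None."""
    result: dict[str, Optional[str]] = {}
    for field, pattern in rules.items():
        match = re.search(pattern, text)
        result[field] = match.group(1).strip() if match else None
    return result


if __name__ == "__main__":
    print(apply_rules(OCR_TEXT, RULES))
```

Every document of the same layout runs through the same patterns. A field that is missing from a document comes back empty instead of stopping the run.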

Export

Straight Into Your System, Whatever Document It Came From

Extracted data lands in your spreadsheet, CRM, or database automatically. Documents from different sources and formats all feed the same downstream destination.

Download: CSV, Excel, JSON, XML
Direct: Google Sheets, Airtable, HubSpot
Custom: Webhook, FTP, REST API
Platforms: Zapier, Make, Power Automate, Workato

See all integrations →
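The receiving side of the Custom: Webhook route is a small HTTP endpoint. Below is a minimal sketch that uses only Python's standard library. It assumes the webhook posts each parsed document as a JSON object of field names and values. That payload shape is an assumption, so check it against a real delivery before you rely on it. The handler and function names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(body: bytes) -> dict:
    """Decode a JSON webhook body into a dict of field name -> value (assumed shape)."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class ParsedDocumentHandler(BaseHTTPRequestHandler):
    """Hypothetical endpoint that accepts one parsed document per POST."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            fields = parse_webhook_body(self.rfile.read(length))
        except ValueError:  # covers bad JSON and bad UTF-8
            self.send_response(400)
            self.end_headers()
            return
        # Replace this print with a write to your CRM, ERP, or database.
        print("received fields:", fields)
        self.send_response(200)
        self.end_headers()


def run(port: int = 8000) -> None:
    """Serve until interrupted."""
    HTTPServer(("", port), ParsedDocumentHandler).serve_forever()
```

Calling `run()` listens on port 8000, and the webhook URL you register would point at that host and port. A production endpoint would also need HTTPS and a check on where requests come from, which this sketch leaves out.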

Pre-Built Templates

Some Formats Already Mapped. Others Take Minutes to Configure.

Docparser has pre-built templates for dozens of document types. Pick one that matches your format, upload a sample, and your first clean export is ready in minutes. For anything else, build a custom template from a blank one, or let the SmartAI Parser extract the fields without pre-mapping.

Browse All Templates →

Generated Reports

Structured report documents with consistent output formats. Configure rules once to pull summary data, totals, or key metrics from any regularly generated report.

Standardized Contracts

Party Names · Effective Date · Expiration Date · Contract Value · Governing Terms — extracted using variable-position rules that find each field wherever it appears.

MyScoreIQ Credit Report

FICO Score · Personal Information · Account History (TransUnion, Experian, Equifax) · Account Status · Balance · Credit Limit · Payment Status · Inquiries

Work Orders

Structured work order documents with job details, assignments, and status fields. Configure rules to extract the data your operations team needs to log or route downstream.

UBL Invoice

Invoice Number · Issue Date · Due Date · Currency · Supplier Info · Customer Info · Invoice Lines · Tax Subtotal · Totals · Payment Info

Your Format Not Listed?

The SmartAI Parser reads any document — PDF, Word, image, XML — without pre-mapping. Upload a sample and the AI extracts the fields automatically.

Use SmartAI Parser →

Use Cases

Where Multi-Format Document Processing Actually Slows Down

The teams that need this page most are the ones processing more than one document type. Pick the scenario that matches yours.

Credit Reports and Financial Documents Processed Without Manual Data Entry.

Finance and lending teams receive credit reports, generated financial summaries, and assessment documents from external providers. Each one carries structured data — FICO scores, account histories, payment statuses — that needs capturing before a decision can be made. Docparser extracts those fields automatically from each document and routes the data to your system. The report arrives. The data is already where your team needs it.

Replacing manual data entry with automated document processing →

Work Orders and Operational Documents Captured as They Close.

Operations teams process work orders, service reports, and field documentation across multiple formats — some printed, some scanned, some generated from field apps. Each one contains job details, completion data, and assignment information that needs logging. Docparser extracts those fields from each document regardless of format and routes structured data to your operations platform or spreadsheet. The paperwork closes. The records update.

Advanced Docparser features for complex document workflows →

Paper and Handwritten Documents Converted to Structured Data Automatically.

Teams receiving paper forms, handwritten notes, or scanned image files need a way to extract structured data without transcribing each one by hand. Docparser's OCR engine reads scanned documents and image files (JPG, PNG, TIFF), and DocparserAI handles handwriting recognition before extraction rules run. Printed, typed, and handwritten documents all produce the same structured output and route to the same destination.

Converting handwriting to structured text with DocparserAI →

Documents That Contain Lead Data Should Feed Your CRM Automatically.

Sales and marketing teams receive documents — enquiry forms, proposal responses, contact submissions — across PDF, Word, and XML formats. Each one carries lead data that should live in the CRM, not in a folder. Docparser extracts names, contact details, and qualification data from each document and routes it to HubSpot, Salesforce, or any CRM via webhook or Zapier. The document arrives. The lead is logged.

Automating CRM lead management with document data extraction →

All Solutions

You've Reached the End of the List. But Not the End of What Docparser Handles.

If your document type brought you here, you're in the right place. But Docparser also has dedicated pages for invoices, bank statements, contracts, utility statements, and more — each with pre-built templates and verified field extraction.

Other Documents · Invoices & Accounts Payable · Bank & Credit Card Statements · Purchase & Sales Orders · Shipping & Delivery Notes · Contracts & NDAs · Resumes & Applications · PDF Forms

Word Documents · Other Documents · Accounting & Bookkeeping · Logistics & Warehousing · PDF to Excel · OCR & Document Intelligence · AI-Powered Processing · Browse All Templates

FAQ

Questions Teams With Unusual Document Types Ask First

Not covered here? The support centre has step-by-step walkthroughs for every scenario.

Which file formats does Docparser support?

Docparser reads PDF, Word (.docx and .doc), CSV, XLS, TXT, XML, and image files including JPG, PNG, and TIFF. For image files and scanned documents, the OCR engine converts visual content to text before extraction rules run. Any document your team receives in one of these formats can therefore go through the same parser workflow.

Can Docparser process XML and other structured data files?

Yes. For XML files such as UBL invoices, Docparser parses the structured data directly and extracts nodes, attributes, and nested values as separate output fields. The UBL Invoice template has pre-mapped rules for the standard UBL schema. For custom XML formats, you configure extraction rules against your specific schema. CSV and TXT files are processed the same way, with structured data extracted field by field into clean output.

Can it handle multiple document formats and types in the same workspace?

Yes. You set up a separate parser for each document type and format. A credit report parser, a work order parser, and a UBL invoice parser can all run in the same workspace and route to the same destination. Different formats and document types end up in one output destination, with no manual consolidation step between them.

What is the SmartAI Parser and when should I use it?

The SmartAI Parser uses DocparserAI's OCR engine to process any document without pre-mapped rules. It identifies and extracts fields automatically, even from formats it has not encountered before. That makes it useful for new document types, one-off formats, or any document where building a custom template first is not practical. For consistent, repeatable output from the same format every month, a configured template produces more reliable results.

Can Docparser extract data from credit reports across multiple bureaus?

Yes. The MyScoreIQ Credit Report template extracts FICO scores, personal information, and account histories across TransUnion, Experian, and Equifax, including balances, credit limits, payment statuses, and inquiry records. For other credit report formats, a custom template maps the same fields from your provider's layout.

Which systems does extracted document data route to?

Docparser exports data as CSV, Excel, JSON, or XML downloads, via webhook, FTP, or REST API, and through platforms such as Zapier, Make, Power Automate, and Workato. For Google Sheets, Airtable, and HubSpot, extracted fields map directly through the built-in integrations, Zapier, or a webhook. For CRM systems and ERPs, a webhook or the API delivers structured data in the format your system expects. See the full list at docparser.com/integrations.

How long does it take to set up a parser for a document type not in the template library?

Start with a blank template, upload a sample document, and use the visual rule builder to configure extraction rules. Most custom document types take 15 to 30 minutes to configure, with no coding required. For document types with a pre-built template, setup takes under five minutes.

How is document data handled securely?

Docparser runs on AWS across multiple availability zones. All data is encrypted in transit and at rest. Your documents belong to your organisation, and Docparser does not resell or reuse them. You set the retention period anywhere from 0 to 180 days. Docparser is GDPR compliant and offers Standard Contractual Clauses for EU customers. Full details at docparser.com/security.
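The XML answer above says nodes, attributes, and nested values become separate output fields. The minimal Python sketch below shows what that means, using only the standard library. It is an illustration under our own assumptions, not Docparser's implementation. The trimmed invoice is a made-up sample that uses the UBL 2 namespaces, and the dotted `Invoice.Path.Field` naming and the `@attribute` suffix were invented for this example.

```python
import xml.etree.ElementTree as ET

# A trimmed, made-up UBL-style invoice. Real UBL documents carry far more
# elements; only a few are kept here to show the idea.
SAMPLE_UBL = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
         xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
  <cbc:ID>INV-1001</cbc:ID>
  <cbc:IssueDate>2024-03-01</cbc:IssueDate>
  <cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>
  <cac:LegalMonetaryTotal>
    <cbc:PayableAmount currencyID="EUR">1250.00</cbc:PayableAmount>
  </cac:LegalMonetaryTotal>
</Invoice>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree adds to qualified names."""
    return tag.rsplit("}", 1)[-1]


def flatten(elem: ET.Element, prefix: str = "") -> dict[str, str]:
    """Turn nested elements and attributes into flat 'Path.To.Field' keys.

    Note: repeated siblings (e.g. several InvoiceLine elements) would
    overwrite each other under this naming scheme.
    """
    path = f"{prefix}.{local_name(elem.tag)}" if prefix else local_name(elem.tag)
    fields: dict[str, str] = {}
    # Attributes become their own fields, e.g. '...PayableAmount@currencyID'.
    for attr, value in elem.attrib.items():
        fields[f"{path}@{local_name(attr)}"] = value
    children = list(elem)
    if children:
        for child in children:
            fields.update(flatten(child, path))
    elif elem.text and elem.text.strip():
        fields[path] = elem.text.strip()
    return fields


if __name__ == "__main__":
    for key, value in flatten(ET.fromstring(SAMPLE_UBL)).items():
        print(f"{key} = {value}")
```

Running it prints five fields, including `Invoice.LegalMonetaryTotal.PayableAmount@currencyID = EUR`. A fuller version would index repeated elements, such as multiple invoice lines, so that each one keeps its own fields.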

Get Started

Whatever Your Team Processes.
Docparser Handles It.

Start your 14-day free trial. Upload any document type, pick a template or configure your own rules, and see the extracted data before your team processes another one by hand. No credit card required to start.
