Getting Started

Introduction

Welcome to the API of Docparser! You can use this API to:

list Document Parsers created with Docparser
load documents to a Document Parser
obtain your parsed data

The Docparser API is organized around REST principles. Our API has predictable, resource-oriented URLs, and uses clear response messages to indicate API errors.

The code examples in the right sidebar are designed to show you how to call our API. All you need to do is to replace the secret_api_key in the sample with your private API token.

This documentation was last updated 2018-02-13.

Client Libraries (SDKs)

Docparser comes with two official client libraries to make it easier for you to build an integration with Docparser.

Both client libraries are open source and published under the MIT license, which means that you can use them in your projects without any restrictions. If you want to contribute to the development of our libraries, please don't hesitate to create a pull request in our GitHub repositories.

Official Libraries

Third Party Libraries And Code Snippets

Authentication

Every request to the Docparser API needs to be authenticated with a secret API key linked to your account. You can obtain and reset your secret API key in the API Settings of your Docparser Account. Your API key carries many privileges, so be sure to keep it secret!

All API requests must be made over HTTPS. Calls made over plain HTTP will fail. API requests without authentication will also fail.

Authentication to the API can be performed in two ways:

via HTTP Basic Auth (recommended)
by directly providing your API key in your request

You can test if the authentication works by pinging the following URL. Please make sure to include the correct authentication headers / parameters as described below.

GET https://api.docparser.com/v1/ping

HTTP Basic Auth

curl https://api.docparser.com/v1/ping \
   -u <secret_api_key>:

{"msg": "pong"}

This authentication method is the preferred way of authenticating your requests to Docparser. When using HTTP Basic Auth, use your secret API key as the "username" and leave the "password" blank.
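
If you are not using one of the client libraries, the header can also be built by hand. The following is a minimal Python sketch, assuming only the standard library; the helper names `basic_auth_header` and `build_ping_request` are made up for this illustration and are not part of any Docparser SDK. It builds the request without sending it.

```python
import base64
import urllib.request

PING_URL = "https://api.docparser.com/v1/ping"


def basic_auth_header(secret_api_key: str) -> str:
    """Return the Authorization header value for HTTP Basic Auth.

    The API key is used as the username and the password is left blank,
    so the credentials string is "<key>:" before base64 encoding.
    """
    credentials = f"{secret_api_key}:".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_ping_request(secret_api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request to the ping route."""
    request = urllib.request.Request(PING_URL, method="GET")
    request.add_header("Authorization", basic_auth_header(secret_api_key))
    return request
```

Passing the result of `build_ping_request("secret_api_key")` to `urllib.request.urlopen` would then perform the actual call.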

With API Key

curl https://api.docparser.com/v1/ping -H 'api_key: <secret_api_key>'

curl https://api.docparser.com/v1/ping?api_key=<secret_api_key>

require('./vendor/autoload.php');

use Docparser\Docparser;

$docparser = new Docparser("secret_api_key");

echo $docparser->ping();

import pydocparser

parser = pydocparser.Parser()

parser.login("secret_api_key")

print(parser.ping())

var docparser = require('docparser-node');

var client = new docparser.Client("secret_api_key");

client.ping()
  .then(function() {
    console.log('authentication succeeded!')
  })
  .catch(function(err) {
    console.log('authentication failed!')
  });

{"msg": "pong"}

If Basic Auth is not an option for you, it is also possible to include your secret API key directly in your request. You can provide your API key as a header (api_key: ABC123), a POST field (api_key=ABC123) or a URL query parameter (&api_key=ABC123).

Please note that including your API key as a URL query parameter is the least secure method and we don't recommend doing this. Including API keys in URLs comes with a high risk of accidentally exposing them to others.

Parsers

List Document Parsers

curl https://api.docparser.com/v1/parsers \
   -u <secret_api_key>:

    $docparser->getParsers();

parsers = parser.list_parsers()

    client.getParsers()
    .then(function (parsers) {
        console.log(parsers)
    })
    .catch(function (err) {
        console.log(err)
    })

[{
  "id":"mwekrupomwekrupo",
  "label":"Acme Inc. Invoice Parser"
},{
  "id":"cadqtvgjcadqtvgj",
  "label":"Acme Corp. Invoice Parser"
}]

This endpoint returns a list of all Document Parsers linked to your account. Each entry contains an id and a label. The id value can be used in other API routes, e.g. for importing documents to a specific Document Parser or obtaining parsing results.

GET https://api.docparser.com/v1/parsers
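
Since other routes expect the id rather than the label, it can be handy to turn the response into a lookup table. A small Python sketch of that idea follows; the function name `parser_ids_by_label` is hypothetical, and the sample body is the example response shown above.

```python
import json
from typing import Dict

# Sample response body, copied from the example response of this route.
SAMPLE_RESPONSE = """
[{
  "id":"mwekrupomwekrupo",
  "label":"Acme Inc. Invoice Parser"
},{
  "id":"cadqtvgjcadqtvgj",
  "label":"Acme Corp. Invoice Parser"
}]
"""


def parser_ids_by_label(response_body: str) -> Dict[str, str]:
    """Map each Document Parser label to its id."""
    return {entry["label"]: entry["id"] for entry in json.loads(response_body)}


if __name__ == "__main__":
    lookup = parser_ids_by_label(SAMPLE_RESPONSE)
    print(lookup["Acme Inc. Invoice Parser"])
```

Note that this sketch assumes labels are unique within your account; if two parsers share a label, the later entry wins.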

List Parser Model Layouts

curl https://api.docparser.com/v1/parser/models/<PARSER_ID> \
   -u <secret_api_key>:

[{
  "id":"1",
  "label":"Acme Inc. Invoice Parser Layout #1"
},{
  "id":"2",
  "label":"Acme Corp. Invoice Parser Layout #2"
}]

This endpoint returns a list of all the Model Layouts for a specific parser linked to your account.

GET https://api.docparser.com/v1/parser/models/<PARSER_ID>

Documents

Import Documents

We offer several options to import a document to Docparser to make it as easy as possible for you to integrate Docparser in your existing workflow.

In addition to manually uploading your documents with our app, Docparser also allows you to import documents using this API. You can upload your document with an HTTP POST request, or by providing a publicly accessible URL which can be used to fetch the document.

Hint: Another easy way of importing your documents is to forward them by email to a private email inbox linked to your account. You can learn more about this method in the settings of your Document Parser.

Upload Document From Local Path

curl \
  -u <secret_api_key>: \
  -F "file=@/home/your/local/file.pdf" \
  https://api.docparser.com/v1/document/upload/<PARSER_ID>

$docparser->uploadDocumentByPath($parserId, $filePath, $remoteId = null);

document_id = parser.upload_file_by_path("path to document.pdf", "parser name")

client.uploadFileByPath('PARSER_ID', './test.pdf', {remote_id: 'test'})
  .then(function (result) {
    console.log(result)
  })
  .catch(function (err) {
    console.log(err)
  })

{
    "id" : "abc123efg456",
    "quota_used" : 642,
    "quota_left" : 258,
    "quota_refill" : "2017-05-02T02:43:48+00:00"
}

Docparser allows you to upload documents from your local hard drive with a multipart/form-data request. This is the same type of request an HTML form with a file upload field would send. The field name used for the document upload needs to be file.

The return value of a successful upload is the ID of the newly created document, as well as account usage data.

Each of your Document Parsers has a unique API route to which you need to send your request. The <PARSER_ID> shown in the URL below can be obtained by calling the List Parsers API route. You can also easily obtain the <PARSER_ID> inside the Docparser app in the settings of your Document Parser under Settings > API.

In addition, you can submit an arbitrary string to Docparser which will be stored together with the uploaded document. The submitted value (remote_id) will be kept throughout the processing and will be available later once you obtain the parsed data with our API, as CSV/XLS/XML file or through webhooks. This optional parameter makes it easy to relate the parsed data returned by Docparser with document records in your own system.

POST https://api.docparser.com/v1/document/upload/<PARSER_ID>

Parameter	Description
file	The file object to upload
remote_id	Optional parameter to pass through your own document ID
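
To make the shape of such a multipart/form-data request more concrete, here is a rough Python sketch that encodes the body by hand using only the standard library. The function name `encode_upload` and the `application/octet-stream` part type are choices made for this illustration, not something prescribed by the API; in practice an HTTP client library will usually do this encoding for you.

```python
import uuid
from typing import Optional, Tuple


def encode_upload(file_name: str, file_bytes: bytes,
                  remote_id: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode a multipart/form-data body with a `file` field.

    Returns the request body and the matching Content-Type header value.
    The optional `remote_id` is sent as a second, plain form field.
    """
    boundary = uuid.uuid4().hex
    parts = []
    if remote_id is not None:
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="remote_id"\r\n\r\n'
                f"{remote_id}\r\n"
            ).encode("utf-8")
        )
    # The document itself must be sent in the field named "file".
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    # The closing delimiter ends the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body would be sent with POST to the upload route of your Document Parser, with the second return value as the Content-Type header and your usual authentication.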

Upload Document By Content

curl \
  -u <secret_api_key>: \
  -F "file_content=...." \
  -F "file_name=...." \
  https://api.docparser.com/v1/document/upload/<PARSER_ID>

$docparser->uploadDocumentByContents($parserId, $file, $remoteId = null, $filename = null);

document_id = parser.upload_file_by_base64(base64_content, "file name.pdf", "parser name")

client.uploadFileByStream('someparserid', fs.createReadStream('filepath'), options)
  .then(function (result) {
    console.log(result)
  })
  .catch(function (err) {
    console.log(err)
  })

{
    "id" : "abc123efg456",
    "file_size" : 119540,
    "quota_used" : 642,
    "quota_left" : 258,
    "quota_refill" : "2017-05-02T02:43:48+00:00"
}

As an alternative to uploading a document from your hard drive, you can also send the file content in a simple form-data HTTP POST request. To make this work, name your form field file_content and use base64 encoding for safe delivery of the data. The document name can be transferred in a second form field called file_name.

The return value of a successful upload is the ID of the newly created document, the file size of the imported document, and account usage data.

POST https://api.docparser.com/v1/document/upload/<PARSER_ID>

Parameter	Description
file_content	The file content encoded with base64.
file_name	The file name for this document. This parameter is optional; if it is empty, we will assign a file name based on the time of uploading.
remote_id	Optional parameter to pass through your own document ID
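
The base64 step can be sketched in a few lines of Python. The helper `build_content_fields` below is a hypothetical name used for illustration; it only prepares the form fields listed in the table above and leaves sending them to the HTTP client of your choice.

```python
import base64
from typing import Dict, Optional


def build_content_fields(file_bytes: bytes,
                         file_name: Optional[str] = None,
                         remote_id: Optional[str] = None) -> Dict[str, str]:
    """Prepare the form fields for an upload by content.

    The raw bytes are base64 encoded into `file_content`; the optional
    fields are only included when a value was given.
    """
    fields = {"file_content": base64.b64encode(file_bytes).decode("ascii")}
    if file_name:
        fields["file_name"] = file_name
    if remote_id:
        fields["remote_id"] = remote_id
    return fields


if __name__ == "__main__":
    fields = build_content_fields(b"%PDF-1.4 example", file_name="invoice.pdf")
    print(sorted(fields))
```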

Fetch Document From URL

curl \
  -u <secret_api_key>: \
  -F "url=http://www.pdf995.com/samples/pdf.pdf" \
  https://api.docparser.com/v2/document/fetch/<PARSER_ID>

$docparser->fetchDocumentFromURL($parserId, $url, $remoteId = null);

document_id = parser.upload_file_by_url(url_of_file, "parser name")

client.fetchDocumentFromURL('PARSER_ID', 'http://example.com/test.pdf', {remote_id: 'test'})
  .then(function (result) {
    console.log(result)
  })
  .catch(function (err) {
    console.log(err)
  })

{
    "document_id": "b07b080b357334510e10f5b41567000c",
    "parser_id": "lgaxqwtznuoa",
    "remote_id": "",
    "message": "The document is scheduled to be fetched from the URL you provided and will be processed in a few minutes. You can check the status of the document at https://api.docparser.com/v2/document/status/lgaxqwtznuoa/b07b080b357334510e10f5b41567000c"
}

If your files are stored under a publicly accessible URL, you can also import a document by providing the URL to our API. This method is straightforward: you just need to perform a simple POST or GET request with url as the parameter.

In addition, you can submit an arbitrary string to Docparser which will be stored together with the fetched document. The submitted value (remote_id) will be kept throughout the processing and will be available later once you obtain the parsed data with our API, as CSV/XLS/XML file or through webhooks. This optional parameter makes it easy to relate the parsed data returned by Docparser with document records in your own system.

POST https://api.docparser.com/v2/document/fetch/<PARSER_ID>

Parameter	Description
url	The location of a publicly accessible document
remote_id	Optional parameter to pass through your own document ID

Response

Field	Description
document_id	The unique ID of the fetched document.
parser_id	The ID of the Document Parser the document was imported to.
remote_id	The remote ID that was passed in the request parameters.
message	This message contains the status URL of the document. You can check whether the document has been imported by visiting this URL.

Document Status

curl \
  -u <secret_api_key>: \
  https://api.docparser.com/v2/document/status/<PARSER_ID>/<DOCUMENT_ID>

{
    "token": "fa36ba4b7ac507fe76f9388a54c18114",
    "remote_id": "",
    "file_source": "api",
    "filename": "example.name",
    "mime_type": "",
    "pages": 0,
    "supported": true,
    "importing_in_progress": false,
    "processing_in_progress": false,
    "webhook_dispatching_in_progress": false,
    "uploaded_at": 1724028973,
    "imported_at": 0,
    "ocr_started_at": 0,
    "preprocessed_at": 0,
    "preprocessing_in_progress_at": 0,
    "processed_at": 0,
    "first_processed_at": 0,
    "dispatched_webhook": false,
    "dispatched_webhook_at": 0,
    "dispatched_webhook_problem": false,
    "webhooks_created": 0,
    "webhooks_sent": 0,
    "failed_jobs": [
        "file_fetch_api"
    ]
}

Use this endpoint to check the status of a document. It returns all the information about the document's state, including timestamps and flags. If any job associated with the document fails, it will be listed under the failed_jobs field.

GET https://api.docparser.com/v2/document/status/<PARSER_ID>/<DOCUMENT_ID>
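
One possible way to reduce such a status response to a single word is sketched below in Python. The mapping from fields to the states "failed", "in progress", "processed" and "pending" is an assumption made for this example, not an official definition, and `summarize_status` is a made-up helper name.

```python
from typing import Any, Dict


def summarize_status(status: Dict[str, Any]) -> str:
    """Collapse a document status response into one short description.

    Assumed rules (illustrative only):
    - any entry in `failed_jobs` means the document failed,
    - any `*_in_progress` flag set to true means work is ongoing,
    - a non-zero `processed_at` means processing has finished.
    """
    failed_jobs = status.get("failed_jobs") or []
    if failed_jobs:
        return "failed: " + ", ".join(failed_jobs)
    if any(value is True for key, value in status.items()
           if key.endswith("_in_progress")):
        return "in progress"
    if status.get("processed_at", 0) > 0:
        return "processed"
    return "pending"
```

Applied to the example response above, this sketch reports the failed `file_fetch_api` job rather than a pending state.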

Parsed Data

Docparser provides several ways to obtain the data parsed from your documents. You have the following three options:

Create permanent download links
Send parsed data to your API with webhooks
Fetch parsed data with this API

Get Data Of One Document

curl \
  -u <secret_api_key>: \
  https://api.docparser.com/v1/results/<PARSER_ID>/<DOCUMENT_ID>

$docparser->getResultsByDocument($parserId, $documentId, $format = 'object');

data = parser.get_one_result("parser name", document_id)

client.getResultsByDocument(parserId, documentId, {format: 'object'})
  .then(function (result) {
    console.log(result)
  })
  .catch(function (err) {
    console.log(err)
  })

[
    {
        "id": "967bcf5658d73c80563072373d5002e3",
        "document_id": "1d35639d4b53b59e77f737c93cd1d3d7",
        "remote_id": "your_optional_id",
        "file_name": "pdf.pdf",
        "media_link": "https://api.docparser.com/v1/document/media/...",
        "media_link_original": "https://api.docparser.com/v1/document/media/.../original",
        "media_link_data": "https://api.docparser.com/v1/document/media/.../data",
        "page_count": 4,
        "uploaded_at": "2016-07-27T14:57:05+00:00",
        "processed_at": "2016-07-27T14:57:10+00:00",
        "purchase_number": "ABC123",
        "customer": {
            "last_name" : "Doe",
            "first_name" : "John"
        },
        "table_data": [{
            "key" : "value"
        }, {
            "key" : "value"
        },
        ...
        ],
        "....": "...."
    }
]

This API route returns the parsed data of one document. The response structure is identical to that of the Get Data Of Multiple Documents route, except that the returned list contains a single object representing the data of the requested document.

The <PARSER_ID> shown in the URL below can be obtained by calling the List Parsers API route. You can also easily obtain the <PARSER_ID> inside the Docparser app in the settings of your Document Parser under Settings > API.

The <DOCUMENT_ID> is returned when uploading/importing a new document.

GET https://api.docparser.com/v1/results/<PARSER_ID>/<DOCUMENT_ID>

Parameter	Default	Description
format	object	Valid values are `object` or `flat`. By default, parsed document data is returned as nested JSON objects. Setting this parameter to `flat` will return a simplified version of the parsed data which contains flat key/value pairs instead of nested objects.
include_children		If child documents were created during preprocessing (e.g. when splitting documents), setting this parameter to `true` ensures that the parsed data of all child documents is returned.

Get Data Of Multiple Documents

curl \
  -u <secret_api_key>: \
  https://api.docparser.com/v1/results/<PARSER_ID>



curl -G \
  -u <secret_api_key>: \
  https://api.docparser.com/v1/results/<PARSER_ID> \
  -d "sort_by=<SORT_BY>" \
  -d "sort_order=<DESC | ASC>"

$docparser->getResultsByParser($parserId, $options = []);

data = parser.get_multiple_results("parser name")

client.getResultsByParser(parserId, {format: 'object'})
  .then(function (result) {
    console.log(result)
  })
  .catch(function (err) {
    console.log(err)
  })

[
    {
        "id": "967bcf5658d73c80563072373d5002e3",
        "document_id": "1d35639d4b53b59e77f737c93cd1d3d7",
        "remote_id": "your_optional_id",
        "file_name": "pdf.pdf",
        "media_link": "https://api.docparser.com/v1/document/media/...",
        "media_link_original": "https://api.docparser.com/v1/document/media/.../original",
        "media_link_data": "https://api.docparser.com/v1/document/media/.../data",
        "page_count": 1,
        "uploaded_at": "2016-07-27T14:57:05+00:00",
        "processed_at": "2016-07-27T14:57:10+00:00",
        "purchase_number": "ABC123",
        "customer": {
            "last_name" : "Doe",
            "first_name" : "John"
        },
        "table_data": [{
            "key" : "value"
        }, {
            "key" : "value"
        },
        ...
        ],
        "....": "...."
    },
    {
       ....
    }
]

This API route returns a list of JSON objects representing the parsed data. By default, the data of the last 100 documents is returned in reverse chronological order. Additional parameters can be used to change this default behaviour.

GET https://api.docparser.com/v1/results/<PARSER_ID>

Parameter	Default	Description
format	object	Valid values are `object` or `flat`. By default, parsed document data is returned as nested JSON objects. Setting this parameter to `flat` will return a simplified version of the parsed data which contains flat key/value pairs instead of nested objects.
list	last_uploaded	Valid values are `last_uploaded`, `uploaded_after` and `processed_after`. By default, the data of the last uploaded documents in reverse chronological order is returned. If set to `uploaded_after`, documents imported after a certain date are returned (see below). If set to `processed_after`, documents parsed after a certain date are returned (see below).
limit	100	This parameter indicates how many documents to include when the parameter `list` is set to `last_uploaded`. The maximum number of documents which can be returned is limited to 10,000.
date		This parameter is mandatory if the parameter `list` is set to `uploaded_after` or `processed_after`. The parameter needs to be a valid ISO 8601 date string (e.g. 2017-02-12T15:19:21+00:00) or a Unix timestamp and determines which documents are included in the return. Please note that the maximum number of documents which can be returned is limited to 10,000.
remote_id		When this parameter is set, only documents having the provided `remote_id` will be returned. The `remote_id` of a document can be set when importing the file via the API (see above).
include_processing_queue		By default, only documents which are fully processed (imported, preprocessed, parsed) are included in the results. By setting `include_processing_queue` to `true`, files which are not yet entirely processed are included in the results.
sort_by		By default, results are sorted in the order in which the files were uploaded into the system. Valid values are `parsed_at`, `processed_at`, `uploaded_at`, `first_processed_at`, `imported_at`, `integrated_at`, `dispatched_webhook_at`, and `preprocessed_at`. Results will be sorted according to the given value.
sort_order	DESC	Valid values are `ASC` and `DESC`. The results will be sorted in ascending or descending order accordingly.
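
The interplay between `list`, `date` and `limit` can be sketched as a small URL builder in Python. This is an illustration only: `build_results_url` is a hypothetical helper, and the client-side checks simply mirror the rules from the table above.

```python
from typing import Optional, Union
from urllib.parse import urlencode

BASE_URL = "https://api.docparser.com/v1/results/"
LIST_MODES = ("last_uploaded", "uploaded_after", "processed_after")


def build_results_url(parser_id: str,
                      list_mode: str = "last_uploaded",
                      date: Optional[Union[str, int]] = None,
                      limit: Optional[int] = None,
                      sort_by: Optional[str] = None,
                      sort_order: Optional[str] = None) -> str:
    """Build the GET URL for fetching the results of multiple documents."""
    if list_mode not in LIST_MODES:
        raise ValueError(f"list must be one of {LIST_MODES}")
    # `date` is mandatory for the two "after" modes.
    if list_mode != "last_uploaded" and date is None:
        raise ValueError("date is mandatory for uploaded_after/processed_after")
    if limit is not None and limit > 10000:
        raise ValueError("limit cannot exceed 10,000")
    if sort_order is not None and sort_order not in ("ASC", "DESC"):
        raise ValueError("sort_order must be ASC or DESC")

    params = {"list": list_mode}
    optional = {"date": date, "limit": limit,
                "sort_by": sort_by, "sort_order": sort_order}
    # Only include the parameters that were actually set.
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{BASE_URL}{parser_id}?{urlencode(params)}"
```

Because `urlencode` percent-encodes the value, the `+` in an ISO 8601 offset such as `+00:00` is sent as `%2B` instead of being read as a space.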

Re-Parse Data


curl -X POST \
  -u <secret_api_key>: \
  https://api.docparser.com/v1/document/reparse/<PARSER_ID> \
  -d "document_ids[]=<DOCUMENT_ID_1>" \
  -d "document_ids[]=<DOCUMENT_ID_2>" \
  -d "document_ids[]=<DOCUMENT_ID_3>"

{
    "total_reparsed": 3,
    "msg": ""
}

This API route will schedule documents for re-parsing.

POST https://api.docparser.com/v1/document/reparse/<PARSER_ID>

Parameter	Default	Description
document_ids		Valid value is a non-empty array of document ids. These document IDs can be obtained from Parsed Data API results. The field in the result is `document_id`.
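
The array is passed by repeating the `document_ids[]` field once per document, as in the curl example above. A short Python sketch of that encoding follows, assuming a form-urlencoded body like the one curl sends with `-d`; `encode_document_ids` is a hypothetical helper name.

```python
from typing import Sequence
from urllib.parse import urlencode


def encode_document_ids(document_ids: Sequence[str]) -> bytes:
    """Encode document IDs as repeated `document_ids[]` form fields."""
    if not document_ids:
        raise ValueError("document_ids must be a non-empty list")
    # A list of pairs keeps the repeated key; a dict would collapse it.
    pairs = [("document_ids[]", document_id) for document_id in document_ids]
    return urlencode(pairs).encode("ascii")


if __name__ == "__main__":
    print(encode_document_ids(["doc1", "doc2"]).decode("ascii"))
```

The square brackets are percent-encoded as `%5B%5D` in the body, which decodes back to `document_ids[]` on the receiving side.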

Re-Integrate Data


curl -X POST \
  -u <secret_api_key>: \
  https://api.docparser.com/v1/document/reintegrate/<PARSER_ID> \
  -d "document_ids[]=<DOCUMENT_ID_1>" \
  -d "document_ids[]=<DOCUMENT_ID_2>" \
  -d "document_ids[]=<DOCUMENT_ID_3>"

{
    "total_reintegrate": 3,
    "msg": ""
}

This API route will schedule documents for re-integration by adding them to the integration queue.

POST https://api.docparser.com/v1/document/reintegrate/<PARSER_ID>

Parameter	Default	Description
document_ids		Valid value is a non-empty array of document ids. These document IDs can be obtained from Parsed Data API results. The field in the result is `document_id`.