documents-ai

By dbirulia

Real-time OCR and data extraction API by Veryfi.

---
name: veryfi-documents-ai
description: Real-time OCR and data extraction API by Veryfi (https://veryfi.com). Extract structured data from receipts, invoices, bank statements, W-9s, purchase orders, bills of lading, and any other document. Use when you need to OCR documents, extract fields, parse receipts/invoices, bank statements, classify documents, detect fraud, or get raw OCR text from any document.
metadata:
  openclaw:
    requires:
      env:
        - VERYFI_CLIENT_ID
        - VERYFI_USERNAME
        - VERYFI_API_KEY
    primaryEnv: VERYFI_CLIENT_ID
---

# Documents AI by Veryfi

Real-time OCR and data extraction API — extract structured data from receipts, invoices, bank statements, W-9s, purchase orders, and more, with document classification, fraud detection, and raw OCR text output.

> **Get your API key:** https://app.veryfi.com/api/settings/keys/
> **Learn more:** https://veryfi.com

## Quick Start

For Receipts and Invoices:
```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/documents/" \
  -H "Content-Type: multipart/form-data" \
  -H "Client-Id: $VERYFI_CLIENT_ID" \
  -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
  -F "file=@receipt.jpg"
```

Response:
```json
{
  "id": 62047612,
  "created_date": "2026-02-19",
  "currency_code": "USD",
  "date": "2026-02-18 14:22:00",
  "document_type": "receipt",
  "category": "Meals & Entertainment",
  "is_duplicate": false,
  "vendor": {
    "name": "Starbucks",
    "address": "123 Main St, San Francisco, CA 94105"
  },
  "line_items": [
    {
      "id": 1,
      "order": 0,
      "description": "Caffe Latte Grande",
      "quantity": 1,
      "price": 5.95,
      "total": 5.95,
      "type": "food"
    }
  ],
  "subtotal": 5.95,
  "tax": 0.52,
  "total": 6.47,
  "payment": {
    "type": "visa",
    "card_number": "1234"
  },
  "ocr_text": "STARBUCKS\n123 Main St...",
  "img_url": "https://scdn.veryfi.com/documents/...",
  "pdf_url": "https://scdn.veryfi.com/documents/..."
}
```

For Bank Statements:
```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/bank-statements/" \
  -H "Content-Type: multipart/form-data" \
  -H "Client-Id: $VERYFI_CLIENT_ID" \
  -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
  -F "file=@bank-statement.pdf"
```

Response:
```json
{
  "id": 4820193,
  "created_date": "2026-02-19T12:45:00.000000Z",
  "bank_name": "Chase",
  "bank_address": "270 Park Avenue, New York, NY 10017",
  "account_holder_name": "Jane Doe",
  "account_holder_address": "456 Oak Ave, San Francisco, CA 94110",
  "account_number": "****7890",
  "account_type": "Checking",
  "routing_number": "021000021",
  "currency_code": "USD",
  "statement_date": "2026-01-31",
  "period_start_date": "2026-01-01",
  "period_end_date": "2026-01-31",
  "beginning_balance": 12500.00,
  "ending_balance": 11835.47,
  "accounts": [
    {
      "number": "****7890",
      "beginning_balance": 12500.00,
      "ending_balance": 11835.47,
      "summaries": [
        { "name": "Total Deposits", "total": 3200.00 },
        { "name": "Total Withdrawals", "total": 3864.53 }
      ],
      "transactions": [
        {
          "order": 0,
          "date": "2026-01-05",
          "description": "Direct Deposit - ACME Corp Payroll",
          "credit_amount": 3200.00,
          "debit_amount": null,
          "balance": 15700.00,
          "category": "Income"
        },
        {
          "order": 1,
          "date": "2026-01-12",
          "description": "Rent Payment - 456 Oak Ave",
          "credit_amount": null,
          "debit_amount": 2800.00,
          "balance": 12900.00,
          "category": "Housing"
        },
        {
          "order": 2,
          "date": "2026-01-20",
          "description": "PG&E Utility Bill",
          "credit_amount": null,
          "debit_amount": 1064.53,
          "balance": 11835.47,
          "category": "Utilities"
        }
      ]
    }
  ],
  "pdf_url": "https://scdn.veryfi.com/bank-statements/...",
  "img_url": "https://scdn.veryfi.com/bank-statements/..."
}
```

## Setup

### 1. Get Your API Key

Visit the API Auth Credentials page: https://app.veryfi.com/api/settings/keys/

Then save your credentials as environment variables:
```bash
export VERYFI_CLIENT_ID="your_client_id_here"
export VERYFI_USERNAME="your_username_here"
export VERYFI_API_KEY="your_api_key_here"
```

### 2. OpenClaw Configuration (Optional)

**Recommended: Use environment variables** (most secure):
```json5
{
  skills: {
    entries: {
      "veryfi-documents-ai": {
        enabled: true,
        // Keys loaded from environment variables:
        // VERYFI_CLIENT_ID, VERYFI_USERNAME, VERYFI_API_KEY
      },
    },
  },
}
```

**Alternative: Store in config file** (use with caution):
```json5
{
  skills: {
    entries: {
      "veryfi-documents-ai": {
        enabled: true,
        env: {
          VERYFI_CLIENT_ID: "your_client_id_here",
          VERYFI_USERNAME: "your_username_here",
          VERYFI_API_KEY: "your_api_key_here",
        },
      },
    },
  },
}
```

**Security Note:** If storing API keys in `~/.openclaw/openclaw.json`:
- Set file permissions: `chmod 600 ~/.openclaw/openclaw.json`
- Never commit this file to version control
- Prefer environment variables or your agent's secret store when possible
- Rotate keys regularly and limit API key permissions if supported
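
The permission advice above can be verified from a script. This is a rough sketch with a hypothetical function name; it inspects the permission string printed by `ls -l` rather than relying on `stat`, whose flags differ between Linux and macOS.

```bash
# Hypothetical check: succeed only if the path is a regular file that is
# readable and writable by its owner and by nobody else (mode 600).
config_is_private() {
  # The first 10 characters of `ls -l` output are the file type and permission bits.
  _mode=$(ls -ld "$1" 2>/dev/null | cut -c1-10)
  [ "$_mode" = "-rw-------" ]
}
```

For example, `config_is_private ~/.openclaw/openclaw.json || echo "tighten permissions" >&2` would warn when the file is missing or more widely readable.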

## Common Tasks

### Extract data from a Receipt or Invoice (file upload)

```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/documents/" \
  -H "Content-Type: multipart/form-data" \
  -H "Client-Id: $VERYFI_CLIENT_ID" \
  -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
  -F "file=@invoice.pdf"
```
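
The upload commands in this document differ only in the endpoint, the file, and sometimes a `blueprint_name`, so a small wrapper can cut the repetition. The sketch below is an illustration under assumptions: `veryfi_url`, `veryfi_upload`, `VERYFI_BASE_URL`, and `VERYFI_CURL` are hypothetical names, and the multipart field name `file` should be checked against Veryfi's API reference. The explicit `Content-Type` header is omitted because curl sets the multipart header itself when `-F` is used.

```bash
# Hypothetical helpers, not part of any Veryfi SDK or CLI.
VERYFI_BASE_URL="https://api.veryfi.com/api/v8/partner"

# Build the full URL for an endpoint name such as "documents" or "/w9s/".
veryfi_url() {
  _path=${1#/}
  _path=${_path%/}
  printf '%s/%s/\n' "$VERYFI_BASE_URL" "$_path"
}

# Usage: veryfi_upload <endpoint> <file> [blueprint_name]
# VERYFI_CURL lets a caller substitute another command for curl (e.g. in tests).
veryfi_upload() {
  _url=$(veryfi_url "$1")
  _file=$2
  _blueprint=${3:-}
  # Replace the positional parameters with the arguments every request shares.
  set -- -sS -X POST "$_url" \
    -H "Client-Id: $VERYFI_CLIENT_ID" \
    -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
    -F "file=@$_file"
  # Assumed field name "file"; blueprint_name is only sent when given.
  if [ -n "$_blueprint" ]; then
    set -- "$@" -F "blueprint_name=$_blueprint"
  fi
  "${VERYFI_CURL:-curl}" "$@"
}
```

With this in place, `veryfi_upload documents invoice.pdf` and `veryfi_upload any-documents passport.jpg passport` would stand in for the longer commands.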

### Extract data from a Receipt or Invoice (base64)

When your agent already has the document as base64-encoded content (e.g., received via API, email attachment, or tool output), use `file_data` instead of uploading a file:

```bash
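# Encode the file first; strip line breaks, which GNU base64 adds by default
# and which would make the JSON string below invalid
BASE64_DATA=$(base64 < invoice.pdf | tr -d '\n')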

curl -X POST "https://api.veryfi.com/api/v8/partner/documents/" \
  -H "Content-Type: application/json" \
  -H "Client-Id: $VERYFI_CLIENT_ID" \
  -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
  -d "{
    \"file_name\": \"invoice.pdf\",
    \"file_data\": \"$BASE64_DATA\"
  }"
```
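
For larger documents the inline `-d` form can become unwieldy, because the whole base64 string travels as a single command-line argument and may run into the operating system's argument-length limit. One possible workaround, sketched here with hypothetical helper names, is to write the JSON body to a file and let curl read it with `--data-binary @file`.

```bash
# Hypothetical helpers, not part of any Veryfi SDK.

# Write {"file_name": ..., "file_data": ...} for <src> into <out>.
# No JSON escaping is done, so this assumes the file name contains
# no double quotes or backslashes.
veryfi_write_payload() {
  _src=$1
  _out=$2
  {
    printf '{"file_name": "%s", "file_data": "' "$(basename "$_src")"
    # Remove the line breaks some base64 implementations insert.
    base64 < "$_src" | tr -d '\n'
    printf '"}\n'
  } > "$_out"
}

# POST a payload file; --data-binary sends the file contents unmodified.
veryfi_post_payload() {
  curl -sS -X POST "https://api.veryfi.com/api/v8/partner/documents/" \
    -H "Content-Type: application/json" \
    -H "Client-Id: $VERYFI_CLIENT_ID" \
    -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
    --data-binary "@$1"
}
```

Calling `veryfi_write_payload invoice.pdf payload.json` and then `veryfi_post_payload payload.json` sends the same two fields as the inline example.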

### Extract data from a URL

```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/documents/" \
  -H "Content-Type: application/json" \
  -H "Client-Id: $VERYFI_CLIENT_ID" \
  -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
  -d '{
    "file_url": "https://example.com/invoice.pdf"
  }'
```

### Extract data from a Passport

```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/any-documents/" \
  -H "Content-Type: multipart/form-data" \
  -H "Client-Id: $VERYFI_CLIENT_ID" \
  -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
  -F "file=@passport.jpg" \
  -F "blueprint_name=passport"
```

### Extract data from Checks

```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/checks/" \
  -H "Content-Type: multipart/form-data" \
  -H "Client-Id: $VERYFI_CLIENT_ID" \
  -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
  -F "file=@check.jpg"
```

### Extract data from W-9s

```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/w9s/" \
  -H "Content-Type: multipart/form-data" \
  -H "Client-Id: $VERYFI_CLIENT_ID" \
  -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
  -F "file=@w9.pdf"
```

### Extract data from W-2s and W-8s

W-2 and W-8 forms do not have dedicated endpoints. Use the `any-documents` endpoint with the appropriate blueprint:

```bash
# W-2
curl -X POST "https://api.veryfi.com/api/v8/partner/any-documents/" \
  -H "Content-Type: multipart/form-data" \
  -H "Client-Id: $VERYFI_CLIENT_ID" \
  -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
  -F "file=@w2.pdf" \
  -F "blueprint_name=w2"

# W-8
curl -X POST "https://api.veryfi.com/api/v8/partner/any-documents/" \
  -H "Content-Type: multipart/form-data" \
  -H "Client-Id: $VERYFI_CLIENT_ID" \
  -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
  -F "file=@w8.pdf" \
  -F "blueprint_name=w8"
```

> **Note:** W-2 and W-8 appear as classification types (via `/classify/`) but their extraction is handled through the Any Document endpoint. Do **not** POST to `/api/v8/partner/w2s/` or `/api/v8/partner/w8s/` — those endpoints do not exist.

### Get Raw OCR Text from a Document

All extraction endpoints return an `ocr_text` field in the response containing the raw text content of the document as a plain string. This is useful when you want to process the text yourself or pass it to an LLM.

```bash
# Extract and pull ocr_text with jq
curl -X POST "https://api.veryfi.com/api/v8/partner/documents/" \
  -H "Content-Type: multipart/form-data" \
  -H "Client-Id: $VERYFI_CLIENT_ID" \
  -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
  -F "file=@invoice.pdf" \
  | jq -r '.ocr_text'
```

> **Note:** `ocr_text` is plain text, not markdown. If you need markdown-formatted output, pass `ocr_text` to an LLM for reformatting after extraction.

### Classify a Document

Identify the document type without full data extraction. Useful for routing documents to the correct processing endpoint, pre-filtering uploads, or bulk sorting.

```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/classify/" \
  -H "Content-Type: multipart/form-data" \
  -H "Client-Id: $VERYFI_CLIENT_ID" \
  -H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
  -F "file=@document.pdf"
```

> **Note:** By default, the API classifies against 15 built-in types. You can also pass a `document_types` array with custom classes (see example below).

Response:
```json
{
  "id": 81023456,
  "document_type": {
    "score": 0.97,
    "value": "invoice"
  }
}
```
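
To use the classification result for routing, the returned `document_type.value` can be mapped to an extraction endpoint. The sketch below covers only the mappings this document shows; `veryfi_route` is a hypothetical name, and any other type is reported as unknown rather than guessed.

```bash
# Hypothetical router: prints "<endpoint>" or "<endpoint> <blueprint_name>"
# for a classified document type, using only the endpoints shown above.
veryfi_route() {
  case "$1" in
    receipt|invoice) echo "documents" ;;
    bank_statement)  echo "bank-statements" ;;
    check)           echo "checks" ;;
    w9)              echo "w9s" ;;
    # W-2 and W-8 have no dedicated endpoint; they go through any-documents.
    w2|w8)           echo "any-documents $1" ;;
    *)
      echo "No known endpoint for type: $1" >&2
      return 1
      ;;
  esac
}
```

The type itself can be pulled from a saved response with `jq -r '.document_type.value'`, and a low `score` may be worth checking before the result is trusted.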

Default document types: `receipt`, `invoice`, `purchase_order`, `bank_statement`, `check`, `w2`, `w8`, `w9`, `statement`, `contract`, `credit_note`, `remittance_advice`, `business_card`, `packin

... (truncated)