Media
documents-ai
Real-time OCR and data extraction API by Veryfi.
---
name: veryfi-documents-ai
description: Real-time OCR and data extraction API by Veryfi (https://veryfi.com). Extract structured data from receipts, invoices, bank statements, W-9s, purchase orders, bills of lading, and any other document. Use when you need to OCR documents, extract fields, parse receipts/invoices, bank statements, classify documents, detect fraud, or get raw OCR text from any document.
metadata:
openclaw:
requires:
env:
- VERYFI_CLIENT_ID
- VERYFI_USERNAME
- VERYFI_API_KEY
primaryEnv: VERYFI_CLIENT_ID
---
# Documents AI by Veryfi
Real-time OCR and data extraction API — extract structured data from receipts, invoices, bank statements, W-9s, purchase orders, and more, with document classification, fraud detection, and raw OCR text output.
> **Get your API key:** https://app.veryfi.com/api/settings/keys/
> **Learn more:** https://veryfi.com
## Quick Start
For Receipts and Invoices:
```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/documents/" \
-H "Content-Type: multipart/form-data" \
-H "Client-Id: $VERYFI_CLIENT_ID" \
-H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
-F "[email protected]"
```
Response:
```json
{
"id": 62047612,
"created_date": "2026-02-19",
"currency_code": "USD",
"date": "2026-02-18 14:22:00",
"document_type": "receipt",
"category": "Meals & Entertainment",
"is_duplicate": false,
"vendor": {
"name": "Starbucks",
"address": "123 Main St, San Francisco, CA 94105"
},
"line_items": [
{
"id": 1,
"order": 0,
"description": "Caffe Latte Grande",
"quantity": 1,
"price": 5.95,
"total": 5.95,
"type": "food"
}
],
"subtotal": 5.95,
"tax": 0.52,
"total": 6.47,
"payment": {
"type": "visa",
"card_number": "1234"
},
"ocr_text": "STARBUCKS\n123 Main St...",
"img_url": "https://scdn.veryfi.com/documents/...",
"pdf_url": "https://scdn.veryfi.com/documents/..."
}
```
For Bank Statements:
```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/bank-statements/" \
-H "Content-Type: multipart/form-data" \
-H "Client-Id: $VERYFI_CLIENT_ID" \
-H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
-F "[email protected]"
```
Response:
```json
{
"id": 4820193,
"created_date": "2026-02-19T12:45:00.000000Z",
"bank_name": "Chase",
"bank_address": "270 Park Avenue, New York, NY 10017",
"account_holder_name": "Jane Doe",
"account_holder_address": "456 Oak Ave, San Francisco, CA 94110",
"account_number": "****7890",
"account_type": "Checking",
"routing_number": "021000021",
"currency_code": "USD",
"statement_date": "2026-01-31",
"period_start_date": "2026-01-01",
"period_end_date": "2026-01-31",
"beginning_balance": 12500.00,
"ending_balance": 11835.47,
"accounts": [
{
"number": "****7890",
"beginning_balance": 12500.00,
"ending_balance": 11835.47,
"summaries": [
{ "name": "Total Deposits", "total": 3200.00 },
{ "name": "Total Withdrawals", "total": 3864.53 }
],
"transactions": [
{
"order": 0,
"date": "2026-01-05",
"description": "Direct Deposit - ACME Corp Payroll",
"credit_amount": 3200.00,
"debit_amount": null,
"balance": 15700.00,
"category": "Income"
},
{
"order": 1,
"date": "2026-01-12",
"description": "Rent Payment - 456 Oak Ave",
"credit_amount": null,
"debit_amount": 2800.00,
"balance": 12900.00,
"category": "Housing"
},
{
"order": 2,
"date": "2026-01-20",
"description": "PG&E Utility Bill",
"credit_amount": null,
"debit_amount": 1064.53,
"balance": 11835.47,
"category": "Utilities"
}
]
}
],
"pdf_url": "https://scdn.veryfi.com/bank-statements/...",
"img_url": "https://scdn.veryfi.com/bank-statements/..."
}
```
## Setup
### 1. Get Your API Key
```bash
# Visit API Auth Credentials page
https://app.veryfi.com/api/settings/keys/
```
Save your API keys:
```bash
export VERYFI_CLIENT_ID="your_client_id_here"
export VERYFI_USERNAME="your_username_here"
export VERYFI_API_KEY="your_api_key_here"
```
### 2. OpenClaw Configuration (Optional)
**Recommended: Use environment variables** (most secure):
```json5
{
skills: {
entries: {
"veryfi-documents-ai": {
enabled: true,
// Keys loaded from environment variables:
// VERYFI_CLIENT_ID, VERYFI_USERNAME, VERYFI_API_KEY
},
},
},
}
```
**Alternative: Store in config file** (use with caution):
```json5
{
skills: {
entries: {
"veryfi-documents-ai": {
enabled: true,
env: {
VERYFI_CLIENT_ID: "your_client_id_here",
VERYFI_USERNAME: "your_username_here",
VERYFI_API_KEY: "your_api_key_here",
},
},
},
},
}
```
**Security Note:** If storing API keys in `~/.openclaw/openclaw.json`:
- Set file permissions: `chmod 600 ~/.openclaw/openclaw.json`
- Never commit this file to version control
- Prefer environment variables or your agent's secret store when possible
- Rotate keys regularly and limit API key permissions if supported
## Common Tasks
### Extract data from a Receipt or Invoice (file upload)
```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/documents/" \
-H "Content-Type: multipart/form-data" \
-H "Client-Id: $VERYFI_CLIENT_ID" \
-H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
-F "[email protected]"
```
### Extract data from a Receipt or Invoice (base64)
When your agent already has the document as base64-encoded content (e.g., received via API, email attachment, or tool output), use `file_data` instead of uploading a file:
```bash
# Encode the file first
BASE64_DATA=$(base64 -i invoice.pdf)
curl -X POST "https://api.veryfi.com/api/v8/partner/documents/" \
-H "Content-Type: application/json" \
-H "Client-Id: $VERYFI_CLIENT_ID" \
-H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
-d "{
\"file_name\": \"invoice.pdf\",
\"file_data\": \"$BASE64_DATA\"
}"
```
### Extract data from a URL
```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/documents/" \
-H "Content-Type: application/json" \
-H "Client-Id: $VERYFI_CLIENT_ID" \
-H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
-d '{
"file_url": "https://example.com/invoice.pdf"
}'
```
### Extract data from a Passport
```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/any-documents/" \
-H "Content-Type: multipart/form-data" \
-H "Client-Id: $VERYFI_CLIENT_ID" \
-H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
-F "[email protected]" \
-F "blueprint_name=passport"
```
### Extract data from Checks
```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/checks/" \
-H "Content-Type: multipart/form-data" \
-H "Client-Id: $VERYFI_CLIENT_ID" \
-H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
-F "[email protected]"
```
### Extract data from W-9s
```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/w9s/" \
-H "Content-Type: multipart/form-data" \
-H "Client-Id: $VERYFI_CLIENT_ID" \
-H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
-F "[email protected]"
```
### Extract data from W-2s and W-8s
W-2 and W-8 forms do not have dedicated endpoints. Use the `any-documents` endpoint with the appropriate blueprint:
```bash
# W-2
curl -X POST "https://api.veryfi.com/api/v8/partner/any-documents/" \
-H "Content-Type: multipart/form-data" \
-H "Client-Id: $VERYFI_CLIENT_ID" \
-H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
-F "[email protected]" \
-F "blueprint_name=w2"
# W-8
curl -X POST "https://api.veryfi.com/api/v8/partner/any-documents/" \
-H "Content-Type: multipart/form-data" \
-H "Client-Id: $VERYFI_CLIENT_ID" \
-H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
-F "[email protected]" \
-F "blueprint_name=w8"
```
> **Note:** W-2 and W-8 appear as classification types (via `/classify/`) but their extraction is handled through the Any Document endpoint. Do **not** POST to `/api/v8/partner/w2s/` or `/api/v8/partner/w8s/` — those endpoints do not exist.
### Get Raw OCR Text from a Document
All extraction endpoints return an `ocr_text` field in the response containing the raw text content of the document as a plain string. This is useful when you want to process the text yourself or pass it to an LLM.
```bash
# Extract and pull ocr_text with jq
curl -X POST "https://api.veryfi.com/api/v8/partner/documents/" \
-H "Content-Type: multipart/form-data" \
-H "Client-Id: $VERYFI_CLIENT_ID" \
-H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
-F "[email protected]" \
| jq '.ocr_text'
```
> **Note:** `ocr_text` is plain text, not markdown. If you need markdown-formatted output, pass `ocr_text` to an LLM for reformatting after extraction.
### Classify a Document
Identify the document type without full data extraction. Useful for routing documents to the correct processing endpoint, pre-filtering uploads, or bulk sorting.
```bash
curl -X POST "https://api.veryfi.com/api/v8/partner/classify/" \
-H "Content-Type: multipart/form-data" \
-H "Client-Id: $VERYFI_CLIENT_ID" \
-H "Authorization: apikey $VERYFI_USERNAME:$VERYFI_API_KEY" \
-F "[email protected]"
```
> **Note:** By default, the API classifies against 15 built-in types. You can also pass a `document_types` array with custom classes (see example below).
Response:
```json
{
"id": 81023456,
"document_type": {
"score": 0.97,
"value": "invoice"
}
}
```
Default document types: `receipt`, `invoice`, `purchase_order`, `bank_statement`, `check`, `w2`, `w8`, `w9`, `statement`, `contract`, `credit_note`, `remittance_advice`, `business_card`, `packin
... (truncated)
media
By
Comments
Sign in to leave a comment