classifyDocumentsApiV1ClassifierClassifyPost

classifyDocumentsApiV1ClassifierClassifyPost<ThrowOnError>(options): RequestResult<ClassifyResponse, HttpValidationError, ThrowOnError>

Defined in: packages/cloud/src/client/sdk.gen.ts:5279

Classify Documents [BETA]: Classify documents against caller-provided rules using a simplified classification system.

This is a Beta feature - API may change based on user feedback.

This endpoint supports:

Classifying new uploaded files
Classifying existing files by ID
Both new files and existing file IDs in one request

v0 Features:

Simplified Rules: Only type and description fields needed
Matching Threshold: Confidence-based classification with configurable threshold
Smart Classification: Filename heuristics + LLM content analysis
Document Type Filtering: Automatically filters out non-document file types
Fast Processing: Uses LlamaParse fast mode + GPT-4.1-nano
Optimized Performance: Parses each file only once for all rules

Simplified Scoring Logic:

Evaluate All Rules: Compare document against all classification rules
Best Match Selection: Return the highest scoring rule above matching_threshold
Unknown Classification: Return as "unknown" if no rules score above threshold
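
The scoring logic above can be sketched as a small selection function. This is only an illustration of the described behavior, not the service's implementation: the `RuleScore` shape and the `selectBestMatch` name are hypothetical, and treating "above" as a strict comparison is an assumption.

```typescript
// Hypothetical shape: one confidence score per rule, produced by some upstream scorer.
interface RuleScore {
  type: string;       // rule type, e.g. "invoice"
  confidence: number; // score in [0, 1]
}

// Returns the type of the highest-scoring rule above the threshold,
// or "unknown" when no rule clears it.
// Assumption: "above" means strictly greater than matching_threshold.
function selectBestMatch(scores: RuleScore[], matchingThreshold: number = 0.6): string {
  let best: RuleScore | undefined;
  for (const score of scores) {
    const clearsThreshold = score.confidence > matchingThreshold;
    if (clearsThreshold && (best === undefined || score.confidence > best.confidence)) {
      best = score;
    }
  }
  return best === undefined ? "unknown" : best.type;
}
```

Because every rule is evaluated before a winner is chosen, the order of the rules does not affect the result unless two rules tie exactly, in which case this sketch keeps the first one.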

This ensures optimal classification by:

Finding the best possible match among all rules
Avoiding false positives with confidence thresholds
Maximizing performance with single-pass file parsing

Rule Format:

[
  {
    "type": "invoice",
    "description": "contains invoice number, line items, and total amount"
  },
  {
    "type": "receipt",
    "description": "purchase receipt with transaction details and payment info"
  }
]

Classification Process:

Metadata Heuristics (configurable via API):

Document Type Filter: Only process document file types (PDF, DOC, DOCX, RTF, TXT, ODT, Pages, HTML, XML, Markdown)
Filename Heuristics: Check if rule type appears in filename
Content Analysis: Parse document content once and use LLM for semantic matching against all rules

Result: Returns type, confidence score, and matched rule information
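
To make the two metadata heuristics concrete, here is a minimal client-side sketch of a document type filter and a filename check. The helper names are hypothetical, and the case-insensitive substring match is an assumption; the page does not specify how the service compares rule types with filenames.

```typescript
// Extensions listed under "Supported Document Types" on this page.
const SUPPORTED_EXTENSIONS = new Set<string>([
  "pdf", "doc", "docx", "rtf", "txt", "odt", "pages",
  "html", "htm", "xml", "md", "markdown",
]);

// Lower-cased extension without the dot, or "" if the name has none.
function fileExtension(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot <= 0 ? "" : filename.slice(dot + 1).toLowerCase();
}

// Document type filter: keep only supported document extensions.
function isSupportedDocument(filename: string): boolean {
  return SUPPORTED_EXTENSIONS.has(fileExtension(filename));
}

// Filename heuristic: rule types that appear in the filename.
// Assumption: a case-insensitive substring match.
function ruleTypesInFilename(filename: string, ruleTypes: string[]): string[] {
  const lowered = filename.toLowerCase();
  return ruleTypes.filter((ruleType) => lowered.includes(ruleType.toLowerCase()));
}
```

A filename hit is only a hint in this sketch; the content analysis step described above still decides the final confidence.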

API Parameters:

matching_threshold (0.1-0.99, default: 0.6): Minimum confidence threshold for acceptable matches
enable_metadata_heuristic (boolean, default: true): Enable metadata-based features
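
A caller may want to apply the documented defaults and range check before sending a request. The sketch below assumes a hypothetical `ClassifyParams` shape and helper name; how these parameters are actually passed through `options` is not stated on this page.

```typescript
// Hypothetical container for the two parameters documented above.
interface ClassifyParams {
  matching_threshold: number;
  enable_metadata_heuristic: boolean;
}

// Fills in the documented defaults and rejects thresholds outside 0.1-0.99.
function resolveClassifyParams(overrides: Partial<ClassifyParams> = {}): ClassifyParams {
  const params: ClassifyParams = {
    matching_threshold: 0.6,          // documented default
    enable_metadata_heuristic: true,  // documented default
    ...overrides,
  };
  const t = params.matching_threshold;
  // The negated form also rejects NaN.
  if (!(t >= 0.1 && t <= 0.99)) {
    throw new RangeError(`matching_threshold must be within 0.1-0.99, got ${t}`);
  }
  return params;
}
```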

Supported Document Types:

Text Documents: pdf, doc, docx, rtf, txt, odt, pages
Web Documents: html, htm, xml
Markup: md, markdown

Limits (Beta):

Maximum 100 files per request
Maximum 10 rules per request
Rule descriptions: 10-500 characters
Document types: 1-50 characters (alphanumeric, hyphens, underscores)
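
These limits can be checked on the client before a request is sent. The following is a minimal sketch, assuming a hypothetical `ClassifierRule` shape taken from the Rule Format example and a hypothetical `checkClassifyLimits` helper; the service remains the authority on what it accepts, and the limits may change during the Beta.

```typescript
// Hypothetical rule shape, mirroring the "Rule Format" example on this page.
interface ClassifierRule {
  type: string;
  description: string;
}

// Limits as listed in this section.
const MAX_FILES = 100;
const MAX_RULES = 10;
// 1-50 characters: alphanumeric, hyphens, underscores.
const TYPE_PATTERN = /^[A-Za-z0-9_-]{1,50}$/;

// Returns a list of problems; an empty list means the request fits the limits.
// Assumption: description length is counted in UTF-16 code units.
function checkClassifyLimits(rules: ClassifierRule[], fileCount: number): string[] {
  const problems: string[] = [];
  if (fileCount > MAX_FILES) {
    problems.push(`too many files: ${fileCount} > ${MAX_FILES}`);
  }
  if (rules.length > MAX_RULES) {
    problems.push(`too many rules: ${rules.length} > ${MAX_RULES}`);
  }
  rules.forEach((rule, index) => {
    if (!TYPE_PATTERN.test(rule.type)) {
      problems.push(`rule ${index}: invalid type "${rule.type}"`);
    }
    const length = rule.description.length;
    if (length < 10 || length > 500) {
      problems.push(`rule ${index}: description length ${length} is outside 10-500`);
    }
  });
  return problems;
}
```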

Beta Notice: This API is subject to change. Please provide feedback!

Type Parameters

ThrowOnError

ThrowOnError extends boolean = false

Parameters

options

Options<ClassifyDocumentsApiV1ClassifierClassifyPostData, ThrowOnError>

Returns

RequestResult<ClassifyResponse, HttpValidationError, ThrowOnError>
