classifyDocumentsApiV1ClassifierClassifyPost

classifyDocumentsApiV1ClassifierClassifyPost<ThrowOnError>(options): RequestResult<ClassifyResponse, HttpValidationError, ThrowOnError>

Defined in: packages/cloud/src/client/sdk.gen.ts:5279

Classify Documents

[BETA] Classify documents based on provided rules - simplified classification system.

This is a Beta feature - API may change based on user feedback.

This endpoint supports:

  • Classifying new uploaded files
  • Classifying existing files by ID
  • Both new files and existing file IDs in one request

v0 Features:

  • Simplified Rules: Only type and description fields needed
  • Matching Threshold: Confidence-based classification with configurable threshold
  • Smart Classification: Filename heuristics + LLM content analysis
  • Document Type Filtering: Automatically filters out non-document file types
  • Fast Processing: Uses LlamaParse fast mode + GPT-4.1-nano
  • Optimized Performance: Parses each file only once for all rules

Simplified Scoring Logic:

  1. Evaluate All Rules: Compare the document against all classification rules
  2. Best Match Selection: Return the highest-scoring rule above matching_threshold
  3. Unknown Classification: Return "unknown" if no rule scores above the threshold

This ensures optimal classification by:

  • Finding the best possible match among all rules
  • Avoiding false positives with confidence thresholds
  • Maximizing performance with single-pass file parsing
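The scoring logic above can be sketched as follows. This is a minimal illustration, not the service's implementation: the `RuleScore` shape and the `selectBestMatch` helper are hypothetical, and the sketch assumes that a score equal to the threshold counts as a match.

```typescript
// Hypothetical shape: one confidence score per rule, produced elsewhere
// (for example by the LLM content analysis step).
interface RuleScore {
  type: string;
  confidence: number; // assumed to lie in [0, 1]
}

// Returns the highest-scoring rule, or an "unknown" result when no rule
// reaches the threshold. Default of 0.6 mirrors the documented default.
function selectBestMatch(scores: RuleScore[], matchingThreshold = 0.6): RuleScore {
  let best: RuleScore | undefined;
  for (const score of scores) {
    if (best === undefined || score.confidence > best.confidence) {
      best = score;
    }
  }
  // Assumption: a score exactly at the threshold is accepted.
  if (best === undefined || best.confidence < matchingThreshold) {
    return { type: "unknown", confidence: best?.confidence ?? 0 };
  }
  return best;
}
```

Evaluating every rule before choosing means a later rule with a higher score is never shadowed by an earlier one that merely clears the threshold.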

Rule Format:

[
  {
    "type": "invoice",
    "description": "contains invoice number, line items, and total amount"
  },
  {
    "type": "receipt",
    "description": "purchase receipt with transaction details and payment info"
  }
]

Classification Process:

  1. Metadata Heuristics (configurable via API):
     • Document Type Filter: Only process document file types (PDF, DOC, DOCX, RTF, TXT, ODT, Pages, HTML, XML, Markdown)
     • Filename Heuristics: Check if rule type appears in filename
  2. Content Analysis: Parse document content once and use LLM for semantic matching against all rules
  3. Result: Returns type, confidence score, and matched rule information
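To make the filename heuristic concrete, here is a small sketch of checking whether a rule type appears in a filename. The helper name `matchRuleByFilename` is hypothetical, and case-insensitive substring matching is an assumption; the source does not specify how the service compares the two.

```typescript
// Hypothetical helper: returns the first rule type that occurs in the
// filename, or undefined when none does. Matching is case-insensitive
// here, which is an assumption rather than documented behavior.
function matchRuleByFilename(filename: string, ruleTypes: string[]): string | undefined {
  const normalized = filename.toLowerCase();
  return ruleTypes.find((ruleType) => normalized.includes(ruleType.toLowerCase()));
}

// When no rule type appears in the filename, a caller would fall through
// to content analysis.
const hint = matchRuleByFilename("Invoice_2024-03.pdf", ["receipt", "invoice"]);
```

A filename hit is only a cheap signal; in the process described above, content analysis still matches the parsed document against all rules.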

API Parameters:

  • matching_threshold (0.1-0.99, default: 0.6): Minimum confidence threshold for acceptable matches
  • enable_metadata_heuristic (boolean, default: true): Enable metadata-based features
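A client may want to apply the documented defaults and range check before sending a request. The sketch below assumes a hypothetical `ClassifyParams` shape and `resolveParams` helper; only the parameter names, defaults, and the 0.1 to 0.99 range come from the list above.

```typescript
// Hypothetical local type mirroring the two documented parameters.
interface ClassifyParams {
  matching_threshold: number;
  enable_metadata_heuristic: boolean;
}

// Fills in the documented defaults and rejects thresholds outside 0.1-0.99.
function resolveParams(input: Partial<ClassifyParams> = {}): ClassifyParams {
  const threshold = input.matching_threshold ?? 0.6;
  if (!Number.isFinite(threshold) || threshold < 0.1 || threshold > 0.99) {
    throw new RangeError(
      `matching_threshold must be between 0.1 and 0.99, got ${threshold}`,
    );
  }
  return {
    matching_threshold: threshold,
    enable_metadata_heuristic: input.enable_metadata_heuristic ?? true,
  };
}
```

Validating locally surfaces an out-of-range threshold before any files are uploaded.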

Supported Document Types:

  • Text Documents: pdf, doc, docx, rtf, txt, odt, pages
  • Web Documents: html, htm, xml
  • Markup: md, markdown
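Since unsupported file types are filtered out, a client can pre-filter uploads using the extensions listed above. This is a sketch under the assumption that the filter works on the file extension alone; `isSupportedDocument` is a hypothetical helper, not part of the SDK.

```typescript
// Extensions taken from the supported document types listed above.
const SUPPORTED_EXTENSIONS = new Set([
  "pdf", "doc", "docx", "rtf", "txt", "odt", "pages", // text documents
  "html", "htm", "xml",                               // web documents
  "md", "markdown",                                   // markup
]);

// Returns true when the filename ends in one of the supported extensions.
// The comparison is case-insensitive, which is an assumption.
function isSupportedDocument(filename: string): boolean {
  const dot = filename.lastIndexOf(".");
  if (dot < 0 || dot === filename.length - 1) {
    return false; // no extension, or a trailing dot
  }
  return SUPPORTED_EXTENSIONS.has(filename.slice(dot + 1).toLowerCase());
}
```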

Limits (Beta):

  • Maximum 100 files per request
  • Maximum 10 rules per request
  • Rule descriptions: 10-500 characters
  • Document types: 1-50 characters (alphanumeric, hyphens, underscores)
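The limits above lend themselves to a client-side check before a request is sent. The following is a minimal sketch: `ClassifierRule` and `validateRequest` are hypothetical names, and the regular expression encodes the stated character set for document types (alphanumeric, hyphens, underscores).

```typescript
// Hypothetical local type matching the documented rule format.
interface ClassifierRule {
  type: string;
  description: string;
}

// 1-50 characters: letters, digits, hyphens, underscores.
const TYPE_PATTERN = /^[A-Za-z0-9_-]{1,50}$/;

// Returns a list of human-readable problems; an empty list means the
// request is within the documented Beta limits.
function validateRequest(rules: ClassifierRule[], fileCount: number): string[] {
  const errors: string[] = [];
  if (fileCount > 100) {
    errors.push(`too many files: ${fileCount} (maximum 100)`);
  }
  if (rules.length > 10) {
    errors.push(`too many rules: ${rules.length} (maximum 10)`);
  }
  rules.forEach((rule, index) => {
    if (!TYPE_PATTERN.test(rule.type)) {
      errors.push(`rule ${index}: invalid type "${rule.type}"`);
    }
    const length = rule.description.length;
    if (length < 10 || length > 500) {
      errors.push(`rule ${index}: description length ${length} is outside 10-500`);
    }
  });
  return errors;
}
```

Collecting all problems instead of throwing on the first one lets a caller report every violation in a single pass.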

Beta Notice: This API is subject to change. Please provide feedback!

Type Parameters

ThrowOnError

ThrowOnError extends boolean = false

Parameters

options

Options<ClassifyDocumentsApiV1ClassifierClassifyPostData, ThrowOnError>

Returns

RequestResult<ClassifyResponse, HttpValidationError, ThrowOnError>
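The sketch below illustrates one way a caller might handle the result when ThrowOnError is left at its default of false. It assumes, without confirmation from this page, that the resolved result carries either `data` or `error` but not both; the local `Result` type and the `unwrap` helper are hypothetical stand-ins and are not exported by the SDK.

```typescript
// Assumed result shape when ThrowOnError is false: exactly one of
// data or error is set. This mirrors a common generated-client pattern
// and is not taken from this page.
type Result<TData, TError> =
  | { data: TData; error: undefined }
  | { data: undefined; error: TError };

// Hypothetical helper: returns the data or throws with the error payload.
function unwrap<TData, TError>(result: Result<TData, TError>): TData {
  if (result.error !== undefined) {
    throw new Error(`Classification request failed: ${JSON.stringify(result.error)}`);
  }
  return result.data as TData;
}
```

A caller would pass the awaited return value of the function to such a helper, or branch on `error` directly when a failure should not abort the surrounding flow.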