classifyDocumentsApiV1ClassifierClassifyPost
classifyDocumentsApiV1ClassifierClassifyPost<ThrowOnError>(options): RequestResult<ClassifyResponse, HttpValidationError, ThrowOnError>
Defined in: packages/cloud/src/client/sdk.gen.ts:5279
Classify Documents [BETA]
Classify documents based on provided rules using a simplified classification system.
This is a Beta feature - API may change based on user feedback.
This endpoint supports:
- Classifying new uploaded files
- Classifying existing files by ID
- Both new files and existing file IDs in one request
v0 Features:
- Simplified Rules: Only type and description fields needed
- Matching Threshold: Confidence-based classification with configurable threshold
- Smart Classification: Filename heuristics + LLM content analysis
- Document Type Filtering: Automatically filters out non-document file types
- Fast Processing: Uses LlamaParse fast mode + GPT-4.1-nano
- Optimized Performance: Parses each file only once for all rules
Simplified Scoring Logic:
- Evaluate All Rules: Compare document against all classification rules
- Best Match Selection: Return the highest scoring rule above matching_threshold
- Unknown Classification: Return as "unknown" if no rules score above threshold
This ensures optimal classification by:
- Finding the best possible match among all rules
- Avoiding false positives with confidence thresholds
- Maximizing performance with single-pass file parsing
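The scoring logic above can be sketched as follows. This is a minimal illustration, not the service's implementation: the `RuleScore` shape and the `selectBestMatch` helper are hypothetical, and treating a score exactly equal to the threshold as a match is an assumption.

```typescript
// Hypothetical shape: one confidence score per rule for a single document.
interface RuleScore {
  type: string;       // the rule's "type" label, e.g. "invoice"
  confidence: number; // score in [0, 1] produced by some scorer
}

// Return the type of the highest-scoring rule that meets the threshold,
// or "unknown" when no rule qualifies.
// Assumption: a score equal to the threshold counts as a match.
function selectBestMatch(scores: RuleScore[], matchingThreshold = 0.6): string {
  let best: RuleScore | undefined;
  for (const score of scores) {
    // Evaluate all rules and remember the strongest candidate.
    if (best === undefined || score.confidence > best.confidence) {
      best = score;
    }
  }
  if (best === undefined || best.confidence < matchingThreshold) {
    return "unknown";
  }
  return best.type;
}
```

Raising `matchingThreshold` trades fewer false positives for more "unknown" results.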
Rule Format:
[
  {
    "type": "invoice",
    "description": "contains invoice number, line items, and total amount"
  },
  {
    "type": "receipt",
    "description": "purchase receipt with transaction details and payment info"
  }
]
Classification Process:
- Metadata Heuristics (configurable via API):
  - Document Type Filter: Only process document file types (PDF, DOC, DOCX, RTF, TXT, ODT, Pages, HTML, XML, Markdown)
  - Filename Heuristics: Check if rule type appears in filename
- Content Analysis: Parse document content once and use LLM for semantic matching against all rules
- Result: Returns type, confidence score, and matched rule information
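The filename heuristic in the process above could look roughly like this. The `Rule` interface mirrors the rule format, while `filenameMatches` and its case-insensitive substring comparison are assumptions made for illustration; the source does not say how the service weighs a filename hit against content analysis.

```typescript
// Mirrors the documented rule format.
interface Rule {
  type: string;
  description: string;
}

// Return the types of all rules whose type string occurs in the filename.
// Assumption: the comparison is a case-insensitive substring match.
function filenameMatches(filename: string, rules: Rule[]): string[] {
  const name = filename.toLowerCase();
  return rules
    .filter((rule) => name.indexOf(rule.type.toLowerCase()) !== -1)
    .map((rule) => rule.type);
}
```

For example, with the two rules from the rule format, a file named `ACME_Invoice_2024.pdf` would match only the `invoice` rule.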
API Parameters:
- matching_threshold (0.1-0.99, default: 0.6): Minimum confidence threshold for acceptable matches
- enable_metadata_heuristic (boolean, default: true): Enable metadata-based features
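A caller may want to apply the documented defaults and range check before sending a request. The sketch below assumes a hypothetical `resolveParams` helper and a `ClassifyParams` interface; neither is part of the SDK, and where these values go in the request options is not shown here.

```typescript
// Hypothetical container for the two documented parameters.
interface ClassifyParams {
  matching_threshold: number;
  enable_metadata_heuristic: boolean;
}

// Fill in documented defaults and reject thresholds outside 0.1-0.99.
function resolveParams(input: Partial<ClassifyParams> = {}): ClassifyParams {
  const threshold =
    input.matching_threshold === undefined ? 0.6 : input.matching_threshold;
  // Written as a negated range check so that NaN is rejected as well.
  if (!(threshold >= 0.1 && threshold <= 0.99)) {
    throw new RangeError(
      `matching_threshold must be between 0.1 and 0.99, got ${threshold}`,
    );
  }
  const heuristic =
    input.enable_metadata_heuristic === undefined
      ? true
      : input.enable_metadata_heuristic;
  return { matching_threshold: threshold, enable_metadata_heuristic: heuristic };
}
```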
Supported Document Types:
- Text Documents: pdf, doc, docx, rtf, txt, odt, pages
- Web Documents: html, htm, xml
- Markup: md, markdown
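To avoid uploading files the endpoint would filter out anyway, a client could pre-check extensions against the list above. This is a sketch under assumptions: `isSupportedDocument` is a hypothetical helper, and the service may identify document types by other means than the filename extension.

```typescript
// Extensions taken from the supported document types listed above.
const SUPPORTED_EXTENSIONS = [
  "pdf", "doc", "docx", "rtf", "txt", "odt", "pages", // text documents
  "html", "htm", "xml",                               // web documents
  "md", "markdown",                                   // markup
];

// True when the filename ends in a supported extension (case-insensitive).
// Names without an extension, or consisting only of an extension such as
// ".md", are treated as unsupported in this sketch.
function isSupportedDocument(filename: string): boolean {
  const dot = filename.lastIndexOf(".");
  if (dot <= 0 || dot === filename.length - 1) {
    return false;
  }
  const extension = filename.slice(dot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.indexOf(extension) !== -1;
}
```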
Limits (Beta):
- Maximum 100 files per request
- Maximum 10 rules per request
- Rule descriptions: 10-500 characters
- Document types: 1-50 characters (alphanumeric, hyphens, underscores)
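The beta limits above lend themselves to a client-side check before a request is sent. The following is a minimal sketch: `ClassifierRule` mirrors the rule format, while `validateRequest` is a hypothetical helper, and counting description length in UTF-16 code units via `string.length` is an assumption about how the service counts characters.

```typescript
// Mirrors the documented rule format.
interface ClassifierRule {
  type: string;
  description: string;
}

// 1-50 characters: alphanumeric, hyphens, underscores.
const TYPE_PATTERN = /^[A-Za-z0-9_-]{1,50}$/;

// Return a list of human-readable problems; an empty list means the
// request is within the documented beta limits.
function validateRequest(rules: ClassifierRule[], fileCount: number): string[] {
  const problems: string[] = [];
  if (fileCount > 100) {
    problems.push(`too many files: ${fileCount} (maximum 100)`);
  }
  if (rules.length > 10) {
    problems.push(`too many rules: ${rules.length} (maximum 10)`);
  }
  rules.forEach((rule, index) => {
    if (!TYPE_PATTERN.test(rule.type)) {
      problems.push(`rule ${index}: type must be 1-50 alphanumeric, hyphen, or underscore characters`);
    }
    const length = rule.description.length;
    if (length < 10 || length > 500) {
      problems.push(`rule ${index}: description must be 10-500 characters, got ${length}`);
    }
  });
  return problems;
}
```

Collecting all problems rather than throwing on the first one lets a caller report every violation in a single pass.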
Beta Notice: This API is subject to change. Please provide feedback!
Type Parameters
ThrowOnError
ThrowOnError extends boolean = false
Parameters
options
Options<ClassifyDocumentsApiV1ClassifierClassifyPostData, ThrowOnError>
Returns
RequestResult<ClassifyResponse, HttpValidationError, ThrowOnError>