Metadata Extraction Usage Pattern

You can use LLMs to automate metadata extraction with our Metadata Extractor modules.

Our metadata extractor modules include the following "feature extractors":

  • SummaryExtractor - automatically extracts a summary over a set of Nodes
  • QuestionsAnsweredExtractor - extracts a set of questions that each Node can answer
  • TitleExtractor - extracts a title from the context of each Node, grouped by document, and combines them
  • KeywordExtractor - extracts keywords over the context of each Node

You can then chain the metadata extractors in an IngestionPipeline to extract metadata from a set of documents:

import { Document, IngestionPipeline, TitleExtractor, QuestionsAnsweredExtractor } from "llamaindex";
import { OpenAI } from "@llamaindex/openai";
 
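async function main() {
  // Build a pipeline whose transformations are the metadata extractors.
  const pipeline = new IngestionPipeline({
    transformations: [
      new TitleExtractor(),
      new QuestionsAnsweredExtractor({
        questions: 5, // number of questions to generate per node
      }),
    ],
  });

  // Run the extractors over the documents; the returned nodes carry the metadata.
  const nodes = await pipeline.run({
    documents: [
      new Document({ text: "I am 10 years old. John is 20 years old." }),
    ],
  });

  // Print the extracted metadata for each node.
  for (const node of nodes) {
    console.log(node.metadata);
  }
}

// Report failures (for example, a failed LLM call) instead of leaving the rejection unhandled.
main()
  .then(() => console.log("done"))
  .catch(console.error);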
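To make the chaining idea in the example above more concrete, here is a minimal, library-free sketch of what running several extractors in sequence amounts to. Nothing in it comes from llamaindex: `SketchNode`, `SketchExtractor`, `runSketchPipeline`, and the two stand-in extractors are hypothetical names invented for illustration, the `title` and `keywords` keys are not the keys the real extractors write, and the stand-ins use simple string rules instead of an LLM. The sketch is also synchronous for brevity, whereas the real extractors are asynchronous.

```typescript
// Library-free sketch: none of these types or functions come from llamaindex.
type Metadata = Record<string, unknown>;

interface SketchNode {
  text: string;
  metadata: Metadata;
}

// Hypothetical extractor shape: given all nodes, return one metadata object per node.
type SketchExtractor = (nodes: SketchNode[]) => Metadata[];

// Stand-in for an LLM-backed title extractor: uses the first sentence as the title.
const firstSentenceTitle: SketchExtractor = (nodes) =>
  nodes.map((node) => ({ title: (node.text.split(".")[0] ?? "").trim() }));

// Stand-in for a keyword extractor: collects distinct capitalised words.
const capitalisedKeywords: SketchExtractor = (nodes) =>
  nodes.map((node) => ({
    keywords: (node.text.match(/\b[A-Z][a-z]+\b/g) ?? []).filter(
      (word, index, all) => all.indexOf(word) === index,
    ),
  }));

// Run the extractors in order, merging each result into the node's metadata.
function runSketchPipeline(
  nodes: SketchNode[],
  extractors: SketchExtractor[],
): SketchNode[] {
  return extractors.reduce((current, extract) => {
    const extracted = extract(current);
    return current.map((node, i) => ({
      ...node,
      metadata: { ...node.metadata, ...extracted[i] },
    }));
  }, nodes);
}

const sketchNodes = runSketchPipeline(
  [{ text: "I am 10 years old. John is 20 years old.", metadata: {} }],
  [firstSentenceTitle, capitalisedKeywords],
);

for (const node of sketchNodes) {
  console.log(node.metadata);
}
```

Each extractor only contributes its own keys, so adding or removing an extractor from the list changes which metadata fields end up on the nodes without affecting the others.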
