Modules/Data
Ingestion Pipeline
An IngestionPipeline
uses a concept of Transformations
that are applied to input data.
These Transformations
are applied to your input data, and the resulting nodes are either returned or inserted into a vector database (if given).
Installation
npm i llamaindex @llamaindex/openai @llamaindex/qdrant
pnpm add llamaindex @llamaindex/openai @llamaindex/qdrant
yarn add llamaindex @llamaindex/openai @llamaindex/qdrant
bun add llamaindex @llamaindex/openai @llamaindex/qdrant
Usage Pattern
The simplest usage is to instantiate an IngestionPipeline like so:
import fs from "node:fs/promises";
import { OpenAI, OpenAIEmbedding } from "@llamaindex/openai";
import {
Document,
IngestionPipeline,
MetadataMode,
TitleExtractor,
SentenceSplitter,
} from "llamaindex";
async function main() {
// Load essay from abramov.txt in Node
const path = "node_modules/llamaindex/examples/abramov.txt";
const essay = await fs.readFile(path, "utf-8");
// Create Document object with essay
const document = new Document({ text: essay, id_: path });
const pipeline = new IngestionPipeline({
transformations: [
new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 20 }),
new TitleExtractor(),
new OpenAIEmbedding(),
],
});
// run the pipeline
const nodes = await pipeline.run({ documents: [document] });
// print out the result of the pipeline run
for (const node of nodes) {
console.log(node.getContent(MetadataMode.NONE));
}
}
main().catch(console.error);
Connecting to Vector Databases
When running an ingestion pipeline, you can also chose to automatically insert the resulting nodes into a remote vector store.
Then, you can construct an index from that vector store later on.
import fs from "node:fs/promises";
import { OpenAIEmbedding } from "@llamaindex/openai";
import { QdrantVectorStore } from "@llamaindex/qdrant";
import {
Document,
IngestionPipeline,
MetadataMode,
TitleExtractor,
SentenceSplitter,
VectorStoreIndex,
} from "llamaindex";
async function main() {
// Load essay from abramov.txt in Node
const path = "node_modules/llamaindex/examples/abramov.txt";
const essay = await fs.readFile(path, "utf-8");
const vectorStore = new QdrantVectorStore({
host: "http://localhost:6333",
});
// Create Document object with essay
const document = new Document({ text: essay, id_: path });
const pipeline = new IngestionPipeline({
transformations: [
new SentenceSplitter({ chunkSize: 1024, chunkOverlap: 20 }),
new TitleExtractor(),
new OpenAIEmbedding(),
],
vectorStore,
});
// run the pipeline
const nodes = await pipeline.run({ documents: [document] });
// create an index
const index = VectorStoreIndex.fromVectorStore(vectorStore);
}
main().catch(console.error);
API Reference
Edit on GitHub
Last updated on