Loader

Before you can start indexing your documents, you need to load them into memory.

SimpleDirectoryReader

LlamaIndex.TS supports easy loading of files from folders using the SimpleDirectoryReader class.

It is a simple reader that reads all files from a directory and its subdirectories.

import { SimpleDirectoryReader } from "llamaindex/readers/SimpleDirectoryReader";
// or
// import { SimpleDirectoryReader } from 'llamaindex'

const reader = new SimpleDirectoryReader();
const documents = await reader.loadData("../data");

documents.forEach((doc) => {
  console.log(`document (${doc.id_}):`, doc.getText());
});
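
To make "all files from a directory and its subdirectories" concrete, the following is a minimal sketch of a recursive directory walk using Node's built-in fs API. The helper name listFilesRecursive is hypothetical and this is not the library's implementation; it only illustrates the kind of traversal described above.

```typescript
import { readdir } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical helper, not part of LlamaIndex.TS: collect the path of every
// regular file under `dir`, descending into subdirectories.
async function listFilesRecursive(dir: string): Promise<string[]> {
  const entries = await readdir(dir, { withFileTypes: true });
  const files: string[] = [];
  for (const entry of entries) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      // Recurse into the subdirectory and keep its files.
      files.push(...(await listFilesRecursive(fullPath)));
    } else if (entry.isFile()) {
      files.push(fullPath);
    }
  }
  return files;
}
```

SimpleDirectoryReader performs the traversal for you, so the sketch is only meant to show which files end up being considered.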

Currently, it supports reading .csv, .docx, .html, .md and .pdf files, but support for other file types is planned.

You can also provide a defaultReader as a fallback for files with unsupported extensions, or pass additional readers in fileExtToReader to support more file types.

import type { BaseReader, Document, Metadata } from "llamaindex";
import {
  FILE_EXT_TO_READER,
  SimpleDirectoryReader,
} from "llamaindex/readers/SimpleDirectoryReader";
import { TextFileReader } from "llamaindex/readers/TextFileReader";

class ZipReader implements BaseReader {
  loadData(...args: any[]): Promise<Document<Metadata>[]> {
    throw new Error("Implement me");
  }
}

const reader = new SimpleDirectoryReader();
const documents = await reader.loadData({
  directoryPath: "../data",
  defaultReader: new TextFileReader(),
  fileExtToReader: {
    ...FILE_EXT_TO_READER,
    zip: new ZipReader(),
  },
});

documents.forEach((doc) => {
  console.log(`document (${doc.id_}):`, doc.getText());
});
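
The interplay between fileExtToReader and defaultReader can be pictured as a lookup by file extension with a fallback. The sketch below uses stand-in types so it runs without the library installed; the pickReader helper, the Reader interface, the use of a Map, and the lowercase matching of extensions are all assumptions made for illustration, not the library's actual code.

```typescript
// Stand-in type for illustration only; the real readers implement BaseReader.
interface Reader {
  name: string;
}

// Hypothetical helper: choose a reader by file extension and fall back to
// `defaultReader` when the extension is not registered.
function pickReader(
  filePath: string,
  fileExtToReader: Map<string, Reader>,
  defaultReader?: Reader,
): Reader | undefined {
  // Look only at the file name so dots in directory names are ignored.
  const base = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  // No dot, or a leading dot (e.g. ".gitignore"), means no extension.
  const ext = dot <= 0 ? "" : base.slice(dot + 1).toLowerCase();
  return fileExtToReader.get(ext) ?? defaultReader;
}

// Usage: a registered extension wins, anything else goes to the fallback.
const pdfReader: Reader = { name: "pdf" };
const textReader: Reader = { name: "text" };
const readers = new Map<string, Reader>([["pdf", pdfReader]]);
console.log(pickReader("../data/report.pdf", readers, textReader)?.name);
console.log(pickReader("../data/notes.xyz", readers, textReader)?.name);
```

Without a fallback, a file with an unregistered extension has no reader in this sketch, which is the gap defaultReader is meant to fill.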

LlamaParse

LlamaParse is an API created by LlamaIndex to parse files efficiently. For example, it is good at converting PDF tables into markdown.

To use it, first log in and get an API key from https://cloud.llamaindex.ai. Store the key in the environment variable LLAMA_CLOUD_API_KEY.

Then, you can use the LlamaParseReader class to read a local PDF file and convert it into a markdown document that can be used by LlamaIndex:

import { LlamaParseReader, VectorStoreIndex } from "llamaindex";

async function main() {
  // Load PDF using LlamaParse
  const reader = new LlamaParseReader({ resultType: "markdown" });
  const documents = await reader.loadData("../data/TOS.pdf");

  // Split text and create embeddings. Store them in a VectorStoreIndex
  const index = await VectorStoreIndex.fromDocuments(documents);

  // Query the index
  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query({
    query: "What is the license grant in the TOS?",
  });

  // Output response
  console.log(response.toString());
}

main().catch(console.error);

Alternatively, you can set the resultType option to "text" to get the parsed document as plain text.
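
Because the API key is expected in LLAMA_CLOUD_API_KEY, it can help to check for it before creating the reader so that a missing key fails early with a clear message. The requireEnv helper below is hypothetical and not part of LlamaIndex.TS; it is a small sketch that takes the environment as a parameter so it can be tried without touching the real process environment.

```typescript
// Hypothetical helper: return the value of a required variable from an
// environment map, or throw if it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing environment variable ${name}`);
  }
  return value;
}

// Usage in Node.js (assumption: run before constructing LlamaParseReader):
// const apiKey = requireEnv("LLAMA_CLOUD_API_KEY", process.env);
```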
