Loader

Before you can start indexing your documents, you need to load them into memory.

The basic data loaders are listed below, mapped to their respective file types in SimpleDirectoryReader. More loaders are shown in the sidebar on the left. Additionally, the following loaders exist without separate documentation:

- AssemblyAIReader transcribes audio using AssemblyAI.
  - AudioTranscriptReader: loads the entire transcript as a single document.
  - AudioTranscriptParagraphsReader: creates a document per paragraph.
  - AudioTranscriptSentencesReader: creates a document per sentence.
  - AudioSubtitlesReader: creates a document containing the subtitles of a transcript.
- NotionReader loads Notion pages.
- SimpleMongoReader loads data from MongoDB.

Check the LlamaIndexTS GitHub repository for the most up-to-date overview of integrations.

SimpleDirectoryReader

LlamaIndex.TS supports easy loading of files from folders using the SimpleDirectoryReader class.

It is a simple reader that reads all files from a directory and its subdirectories.

import { SimpleDirectoryReader } from "@llamaindex/readers/directory";
 
// Load every file under ../data, including files in subdirectories.
const reader = new SimpleDirectoryReader();
const documents = await reader.loadData("../data");
 
// Print each document's ID and text.
documents.forEach((doc) => {
  console.log(`document (${doc.id_}):`, doc.getText());
});
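
To make "a directory and its subdirectories" concrete, here is a minimal sketch of a recursive directory walk using Node's built-in fs module. The helper name listFilesRecursively is hypothetical and is not part of LlamaIndex.TS; it only illustrates the kind of traversal described above, not how SimpleDirectoryReader is actually implemented.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper (not a LlamaIndex.TS API): returns the path of every
// regular file under `dir`, descending into subdirectories depth-first.
function listFilesRecursively(dir: string): string[] {
  const files: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Recurse so that nested folders are included as well.
      files.push(...listFilesRecursively(fullPath));
    } else if (entry.isFile()) {
      files.push(fullPath);
    }
  }
  return files;
}
```

Calling listFilesRecursively("../data") would return one path per file, which a reader could then hand to the loader that matches each file's extension.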

Currently, the following readers are mapped to specific file types:

TextFileReader: .txt
PDFReader: .pdf
PapaCSVReader: .csv
MarkdownReader: .md
DocxReader: .docx
HTMLReader: .htm, .html
ImageReader: .jpg, .jpeg, .png, .gif

You can modify the reader in three different ways:

overrideReader overrides the reader for all file types, including unsupported ones.
fileExtToReader maps a reader to a specific file type. It can override the reader for an existing file type or add support for a new one.
defaultReader sets a fallback reader for files with unsupported extensions. By default, it is TextFileReader.
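
The three options suggest an order of precedence: overrideReader first, then the fileExtToReader mapping, then defaultReader. The sketch below models that order with stand-in types. ReaderLike, ReaderOptions, and pickReader are hypothetical names introduced for illustration, and the precedence is an assumption drawn from the descriptions above rather than a copy of the library's code.

```typescript
// Hypothetical stand-in for a file reader; real readers return documents.
interface ReaderLike {
  name: string;
}

// Mirrors the three options described above (types simplified).
interface ReaderOptions {
  overrideReader?: ReaderLike;
  fileExtToReader?: Record<string, ReaderLike>;
  defaultReader?: ReaderLike | null;
}

// Assumed precedence: overrideReader, then the extension map, then defaultReader.
function pickReader(fileName: string, options: ReaderOptions): ReaderLike | null {
  if (options.overrideReader) return options.overrideReader;

  // Take the text after the last dot as the extension ("" if there is none).
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1) : "";

  const map = options.fileExtToReader ?? {};
  if (Object.prototype.hasOwnProperty.call(map, ext)) {
    return map[ext] ?? null;
  }
  // No mapping for this extension: fall back, or report "no reader".
  return options.defaultReader ?? null;
}

const text: ReaderLike = { name: "TextFileReader" };
const pdf: ReaderLike = { name: "PDFReader" };
const zip: ReaderLike = { name: "ZipReader" };
const options: ReaderOptions = { fileExtToReader: { pdf, zip }, defaultReader: text };

console.log(pickReader("report.pdf", options)?.name); // PDFReader
console.log(pickReader("notes.xyz", options)?.name); // TextFileReader
console.log(pickReader("report.pdf", { ...options, overrideReader: zip })?.name); // ZipReader
```

In this model, an unmapped extension falls through to defaultReader, while overrideReader short-circuits every other choice.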

SimpleDirectoryReader supports up to 9 concurrent requests. Use the numWorkers option to set how many requests run at once. By default, numWorkers is 1, so files are loaded sequentially.
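
As an illustration of what a worker limit means, the following sketch runs an async task over a list with at most numWorkers tasks in flight. The helper mapWithWorkers is hypothetical and is not the library's implementation. The range check of 1 to 9 mirrors the limit stated above, but throwing a RangeError for values outside it is an assumption of this sketch.

```typescript
// Hypothetical helper (not a LlamaIndex.TS API): applies `task` to every item
// with at most `numWorkers` tasks running at once, preserving input order.
async function mapWithWorkers<T, R>(
  items: T[],
  numWorkers: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(numWorkers) || numWorkers < 1 || numWorkers > 9) {
    throw new RangeError("numWorkers must be an integer between 1 and 9");
  }
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++; // claim an item before awaiting
      results[index] = await task(items[index] as T);
    }
  };

  // Start no more workers than there are items.
  const workerCount = Math.min(numWorkers, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: process three file names with two workers.
mapWithWorkers(["a.txt", "b.md", "c.pdf"], 2, async (file) => file.toUpperCase())
  .then((names) => console.log(names));
```

With numWorkers set to 1, the single worker processes items one after another, which corresponds to the sequential default.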

Example

import {
  FILE_EXT_TO_READER,
  SimpleDirectoryReader,
} from "@llamaindex/readers/directory";
import { TextFileReader } from "@llamaindex/readers/text";
import type { Document, Metadata } from "llamaindex";
import { FileReader } from "llamaindex";
 
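// Custom reader for .zip files. The body is a stub to be filled in.
class ZipReader extends FileReader {
  loadDataAsContent(fileContent: Uint8Array): Promise<Document<Metadata>[]> {
    throw new Error("Implement me");
  }
}
 
const reader = new SimpleDirectoryReader();
const documents = await reader.loadData({
  directoryPath: "../data",
  // Fallback for files whose extension has no mapped reader.
  defaultReader: new TextFileReader(),
  // Keep the built-in mappings and add a reader for .zip files.
  fileExtToReader: {
    ...FILE_EXT_TO_READER,
    zip: new ZipReader(),
  },
});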
 
documents.forEach((doc) => {
  console.log(`document (${doc.id_}):`, doc.getText());
});

API Reference

SimpleDirectoryReader
