Loader
Before you can start indexing your documents, you need to load them into memory.
All "basic" data loaders can be seen below, mapped to their respective filetypes in SimpleDirectoryReader. More loaders are shown in the sidebar on the left.
Additionally, the following loaders exist without separate documentation:
- AssemblyAIReader transcribes audio using AssemblyAI.
  - AudioTranscriptReader: loads the entire transcript as a single document.
  - AudioTranscriptParagraphsReader: creates a document per paragraph.
  - AudioTranscriptSentencesReader: creates a document per sentence.
  - AudioSubtitlesReader: creates a document containing the subtitles of a transcript.
- NotionReader loads Notion pages.
- SimpleMongoReader loads data from MongoDB.
Check the LlamaIndexTS GitHub for the most up-to-date overview of integrations.
SimpleDirectoryReader
LlamaIndex.TS supports easy loading of files from folders using the SimpleDirectoryReader class.
It is a simple reader that reads all files from a directory and its subdirectories.
Currently, the following readers are mapped to specific file types:
- TextFileReader: .txt
- PDFReader: .pdf
- PapaCSVReader: .csv
- MarkdownReader: .md
- DocxReader: .docx
- HTMLReader: .htm, .html
- ImageReader: .jpg, .jpeg, .png, .gif
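The mapping above can be pictured as a plain lookup table keyed by file extension. The following is a minimal sketch for illustration only, not the library's own code: `EXT_TO_READER_NAME` and `readerNameFor` are hypothetical names, the readers are represented by their names rather than real instances, and lowercasing the extension before the lookup is an assumption.

```typescript
// Illustrative table built from the list above (reader names only, no real readers).
const EXT_TO_READER_NAME = new Map<string, string>([
  ["txt", "TextFileReader"],
  ["pdf", "PDFReader"],
  ["csv", "PapaCSVReader"],
  ["md", "MarkdownReader"],
  ["docx", "DocxReader"],
  ["htm", "HTMLReader"],
  ["html", "HTMLReader"],
  ["jpg", "ImageReader"],
  ["jpeg", "ImageReader"],
  ["png", "ImageReader"],
  ["gif", "ImageReader"],
]);

// Hypothetical helper: returns the mapped reader name for a path,
// or undefined when the file has no extension or the extension is unmapped.
function readerNameFor(filePath: string): string | undefined {
  // Keep only the last path segment so dots in folder names are ignored.
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile such as ".env"
  // Assumption: extensions are matched case-insensitively.
  return EXT_TO_READER_NAME.get(fileName.slice(dot + 1).toLowerCase());
}
```

In this sketch, a file whose extension is not in the table yields `undefined`, which is the case the fallback reader is meant to cover.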
You can modify the reader in three different ways:
- overrideReader overrides the reader for all file types, including unsupported ones.
- fileExtToReader maps a reader to a specific file type. It can override the reader for existing file types or add support for new file types.
- defaultReader sets a fallback reader for files with unsupported extensions. By default it is TextFileReader.
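One way to read the three options is as a resolution order: an override wins for every file, otherwise the extension map is consulted, and the fallback is used last. The sketch below illustrates that reading under stated assumptions. `Reader`, `ReaderOptions`, `pickReader`, and `TEXT_FILE_READER` are simplified, hypothetical stand-ins rather than the library's types, and the exact order inside SimpleDirectoryReader should be confirmed against its source.

```typescript
// Simplified stand-in for a file reader; real readers load file contents.
type Reader = { name: string };

// Option names mirror the ones described above; the types are assumptions.
interface ReaderOptions {
  overrideReader?: Reader;
  fileExtToReader?: Record<string, Reader>;
  defaultReader?: Reader;
}

// Stand-in for the documented default fallback, TextFileReader.
const TEXT_FILE_READER: Reader = { name: "TextFileReader" };

// Hypothetical helper: picks the reader for one file extension.
function pickReader(fileExt: string, options: ReaderOptions = {}): Reader {
  const {
    overrideReader,
    fileExtToReader = {},
    defaultReader = TEXT_FILE_READER,
  } = options;

  // 1. An override applies to every file type, supported or not.
  if (overrideReader) return overrideReader;

  // 2. Otherwise use the reader mapped to this extension, if there is one.
  const mapped = Object.prototype.hasOwnProperty.call(fileExtToReader, fileExt)
    ? fileExtToReader[fileExt]
    : undefined;

  // 3. Fall back to the default reader for unmapped extensions.
  return mapped ?? defaultReader;
}
```

For example, `pickReader("log", { fileExtToReader: { log: { name: "MyLogReader" } } })` would select the hypothetical `MyLogReader`, while `pickReader("log")` would fall back to the text reader stand-in.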
SimpleDirectoryReader supports up to 9 concurrent requests. Use the numWorkers option to set the number of concurrent requests. By default it runs in sequential mode, i.e. numWorkers is set to 1.
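To make the effect of numWorkers concrete, here is a minimal sketch of a bounded worker pool. It is not the library's implementation: `mapWithWorkers` is a hypothetical helper, and rejecting values outside 1 to 9 is an assumption that models the documented limit.

```typescript
// Hypothetical helper: applies `task` to every item with at most
// `numWorkers` tasks in flight at once. Results keep the input order.
async function mapWithWorkers<T, R>(
  items: T[],
  task: (item: T) => Promise<R>,
  numWorkers = 1, // sequential by default, matching the documented default
): Promise<R[]> {
  // Assumption: values outside the documented 1..9 range are rejected.
  if (!Number.isInteger(numWorkers) || numWorkers < 1 || numWorkers > 9) {
    throw new RangeError("numWorkers must be an integer between 1 and 9");
  }

  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  // Each worker repeatedly claims the next item until none are left.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index] as T);
    }
  };

  // Start no more workers than there are items.
  const workerCount = Math.min(numWorkers, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `numWorkers` left at 1 the items are processed strictly one after another. Higher values trade that ordering of work for throughput, while the returned array still follows the input order.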
Example