Skip to main content

Local LLMs

LlamaIndex.TS supports OpenAI and other remote LLM APIs. You can also run a local LLM on your machine!

Using a local model via Ollama

The easiest way to run a local LLM is via the great work of our friends at Ollama, who provide a simple to use client that will download, install and run a growing range of models for you.

Install Ollama

They provide a one-click installer for Mac, Linux and Windows on their home page.

Pick and run a model

Since we're going to be doing agentic work, we'll need a very capable model, but the largest models are hard to run on a laptop. We think mixtral 8x7b is a good balance between power and resources, but llama3 is another great option. You can run Mixtral by running

ollama run mixtral:8x7b

The first time you run it will also automatically download and install the model for you.

Switch the LLM in your code

To tell LlamaIndex to use a local LLM, use the Settings object:

Settings.llm = new Ollama({
model: "mixtral:8x7b",

Use local embeddings

If you're doing retrieval-augmented generation, LlamaIndex.TS will also call out to OpenAI to index and embed your data. To be entirely local, you can use a local embedding model like this:

Settings.embedModel = new HuggingFaceEmbedding({
modelType: "BAAI/bge-small-en-v1.5",
quantized: false,

The first time this runs it will download the embedding model to run it.

Try it out

With a local LLM and local embeddings in place, you can perform RAG as usual and everything will happen on your machine without calling an API:

async function main() {
// Load essay from abramov.txt in Node
const path = "node_modules/llamaindex/examples/abramov.txt";

const essay = await fs.readFile(path, "utf-8");

// Create Document object with essay
const document = new Document({ text: essay, id_: path });

// Split text and create embeddings. Store them in a VectorStoreIndex
const index = await VectorStoreIndex.fromDocuments([document]);

// Query the index
const queryEngine = index.asQueryEngine();

const response = await queryEngine.query({
query: "What did the author do in college?",

// Output response


You can see the full example file.