Local LLMs

LlamaIndex.TS supports OpenAI and other remote LLM APIs. You can also run a local LLM on your machine!

Using a local model via Ollama

The easiest way to run a local LLM is via the great work of our friends at Ollama, who provide a simple to use client that will download, install and run a growing range of models for you.

Install Ollama

They provide a one-click installer for Mac, Linux and Windows on their home page.

Pick and run a model

Since we're going to be doing agentic work, we'll need a very capable model, but the largest models are hard to run on a laptop. We think mixtral 8x7b is a good balance between power and resources, but llama3 is another great option. You can run Mixtral by running

ollama run mixtral:8x7b

The first time you run it will also automatically download and install the model for you.

Switch the LLM in your code

To tell LlamaIndex to use a local LLM, use the Settings object:

Settings.llm = new Ollama({
  model: "mixtral:8x7b",
});

Use local embeddings

If you're doing retrieval-augmented generation, LlamaIndex.TS will also call out to OpenAI to index and embed your data. To be entirely local, you can use a local embedding model like this:

Settings.embedModel = new HuggingFaceEmbedding({
  modelType: "BAAI/bge-small-en-v1.5",
  quantized: false,
});

The first time this runs it will download the embedding model to run it.

Try it out

With a local LLM and local embeddings in place, you can perform RAG as usual and everything will happen on your machine without calling an API:

async function main() {
  // Load essay from abramov.txt in Node
  const path = "node_modules/llamaindex/examples/abramov.txt";

  const essay = await fs.readFile(path, "utf-8");

  // Create Document object with essay
  const document = new Document({ text: essay, id_: path });

  // Split text and create embeddings. Store them in a VectorStoreIndex
  const index = await VectorStoreIndex.fromDocuments([document]);

  // Query the index
  const queryEngine = index.asQueryEngine();

  const response = await queryEngine.query({
    query: "What did the author do in college?",
  });

  // Output response
  console.log(response.toString());
}

main().catch(console.error);

You can see the full example file.

Using a local model via Ollama​

Install Ollama​

Pick and run a model​

Switch the LLM in your code​

Use local embeddings​

Try it out​