Using other LLM APIs

By default, LlamaIndex.TS uses OpenAI's LLMs and embedding models, but we support many other LLMs, including models from Mistral (Mistral, Mixtral), Anthropic (Claude), and Google (Gemini).

If you don't want to use an API at all, you can run a local model.

Using another LLM

You can specify which LLM LlamaIndex.TS uses by setting it on the Settings object, like this:

import { MistralAI, Settings } from "llamaindex";

Settings.llm = new MistralAI({
  model: "mistral-tiny",
  apiKey: "<YOUR_API_KEY>",
});

You can see examples of other APIs we support by checking out "Available LLMs" in the sidebar of our LLMs section.
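Rather than pasting a key into source code, you may prefer to read it from the environment. The sketch below is one way to do that, not part of LlamaIndex.TS: `requireEnv` is a hypothetical helper, and `MISTRAL_API_KEY` is an assumed variable name chosen for illustration.

```typescript
// Hypothetical helper (not a LlamaIndex.TS API): look up a required value
// in an environment-like object and fail loudly if it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Possible usage in Node (the variable name is an assumption):
//
//   Settings.llm = new MistralAI({
//     model: "mistral-tiny",
//     apiKey: requireEnv("MISTRAL_API_KEY", process.env),
//   });
```

Failing at startup with a clear message tends to be easier to debug than an authentication error on the first request.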

Using another embedding model

A frequent gotcha when switching to a different API for your LLM is that LlamaIndex will still index and embed your data with OpenAI's embeddings by default. To switch away from OpenAI completely, you need to set your embedding model as well, for example:

import { MistralAIEmbedding, Settings } from "llamaindex";

Settings.embedModel = new MistralAIEmbedding();

We support many different embeddings.
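One practical reason to treat the embedding model as a deliberate choice: a query vector is only comparable with stored vectors that came from the same model. Different models generally produce vectors of different lengths, and even equal-length vectors from different models are not meaningfully comparable. The sketch below illustrates the length check with made-up toy vectors; `cosineSimilarity` is a hypothetical helper written for this page, not a LlamaIndex.TS function.

```typescript
// Hypothetical helper: cosine similarity between two embedding vectors.
// Throws when the vectors have different lengths, which is what you would
// typically see if the index and the query used different embedding models.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(
      `Embedding dimension mismatch: ${a.length} vs ${b.length}`,
    );
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; report zero similarity instead of NaN.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for real embeddings (values are invented).
const storedVector = [0.1, 0.3, 0.5];
const queryVectorSameModel = [0.2, 0.1, 0.4];
const queryVectorOtherModel = [0.2, 0.1, 0.4, 0.9];

console.log(cosineSimilarity(storedVector, queryVectorSameModel));

try {
  cosineSimilarity(storedVector, queryVectorOtherModel);
} catch (err) {
  console.log((err as Error).message);
}
```

If you change the embedding model after building an index, re-embedding the data with the new model avoids this kind of mismatch.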

Full example

This example uses Mistral's mistral-tiny model as the LLM and Mistral for embeddings as well.

import * as fs from "fs/promises";
import {
  Document,
  MistralAI,
  MistralAIEmbedding,
  Settings,
  VectorStoreIndex,
} from "llamaindex";

// Update embed model
Settings.embedModel = new MistralAIEmbedding();
// Update llm to use MistralAI
Settings.llm = new MistralAI({ model: "mistral-tiny" });

async function rag(query: string) {
  // Load essay from abramov.txt in Node
  const path = "node_modules/llamaindex/examples/abramov.txt";

  const essay = await fs.readFile(path, "utf-8");

  // Create Document object with essay
  const document = new Document({ text: essay, id_: path });

  // Split the text, embed it with Settings.embedModel, and build the index
  const index = await VectorStoreIndex.fromDocuments([document]);

  // Query the index
  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query({ query });
  return response.response;
}

(async () => {
  // embeddings
  const embedding = new MistralAIEmbedding();
  const embeddingsResponse = await embedding.getTextEmbedding(
    "What is the best French cheese?",
  );
  console.log(
    `MistralAI embeddings are ${embeddingsResponse.length} numbers long\n`,
  );

  // chat api (non-streaming)
  const llm = new MistralAI({ model: "mistral-tiny" });
  const response = await llm.chat({
    messages: [{ content: "What is the best French cheese?", role: "user" }],
  });
  console.log(response.message.content);

  // chat api (streaming)
  const stream = await llm.chat({
    messages: [
      { content: "Who is the most renowned French painter?", role: "user" },
    ],
    stream: true,
  });
  for await (const chunk of stream) {
    // Print each piece of the reply as it arrives
    process.stdout.write(chunk.delta);
  }

  // rag
  const ragResponse = await rag("What did the author do in college?");
  console.log(ragResponse);
})();