Local LLMs
LlamaIndex.TS supports OpenAI and other remote LLM APIs. You can also run a local LLM on your machine!
Using a local model via Ollama
The easiest way to run a local LLM is via the great work of our friends at Ollama, who provide a simple-to-use client that will download, install, and run a growing range of models for you.
Install Ollama
They provide a one-click installer for macOS, Linux, and Windows on their home page.
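If you prefer the terminal on Linux, Ollama also publishes an install script; the sketch below assumes it is still hosted at ollama.com:

```bash
# Download and run Ollama's Linux install script
curl -fsSL https://ollama.com/install.sh | sh
```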
Pick and run a model
Since we're going to be doing agentic work, we'll need a very capable model, but the largest models are hard to run on a laptop. We think Mixtral 8x7B is a good balance between power and resource requirements, but Llama 3 is another great option. You can run Mixtral by running:
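Something like the following, assuming `mixtral:8x7b` is the tag Ollama uses for this model:

```bash
# Pull the model (on first run) and start an interactive session with it
ollama run mixtral:8x7b
```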
The first time you run this command, it will also automatically download and install the model for you.
Switch the LLM in your code
To switch the LLM in your code, you first need to install the package for the Ollama model provider:
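Assuming the provider is published as `@llamaindex/ollama`, that looks like:

```bash
npm install @llamaindex/ollama
```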
Then, to tell LlamaIndex to use a local LLM, use the Settings object:
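A minimal sketch, assuming the `Ollama` class exported by `@llamaindex/ollama` and the `mixtral:8x7b` tag used above:

```ts
import { Settings } from "llamaindex";
import { Ollama } from "@llamaindex/ollama";

// Route all LLM calls through the locally running Ollama server
// instead of a remote API.
Settings.llm = new Ollama({ model: "mixtral:8x7b" });
```

This assumes the Ollama server is already running locally (running `ollama run` or the desktop app starts it for you).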
Use local embeddings
If you're doing retrieval-augmented generation, LlamaIndex.TS will also call out to OpenAI to index and embed your data. To be entirely local, you can use a local embedding model from Hugging Face instead.
First install the Hugging Face model provider package:
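Assuming it is published as `@llamaindex/huggingface`:

```bash
npm install @llamaindex/huggingface
```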
And then set the embedding model in your code:
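A minimal sketch, assuming the `HuggingFaceEmbedding` class from that package and using `BAAI/bge-small-en-v1.5` as an example embedding model:

```ts
import { Settings } from "llamaindex";
import { HuggingFaceEmbedding } from "@llamaindex/huggingface";

// Compute embeddings locally instead of calling the OpenAI embeddings API.
Settings.embedModel = new HuggingFaceEmbedding({
  modelType: "BAAI/bge-small-en-v1.5",
});
```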
The first time this runs, it will download the embedding model before using it.
Try it out
With a local LLM and local embeddings in place, you can perform RAG as usual and everything will happen on your machine without calling an API:
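Putting it together, here is a minimal end-to-end sketch under the same assumptions as above (package names, model tags, and the example document text are illustrative):

```ts
import { Document, Settings, VectorStoreIndex } from "llamaindex";
import { Ollama } from "@llamaindex/ollama";
import { HuggingFaceEmbedding } from "@llamaindex/huggingface";

// Local LLM for generation, local Hugging Face model for embeddings.
Settings.llm = new Ollama({ model: "mixtral:8x7b" });
Settings.embedModel = new HuggingFaceEmbedding({
  modelType: "BAAI/bge-small-en-v1.5",
});

async function main() {
  // Index a small in-memory document; in a real app you would load your own data.
  const document = new Document({
    text: "The quick brown fox jumped over the lazy dog.",
  });
  const index = await VectorStoreIndex.fromDocuments([document]);

  // Retrieval and generation both happen on your machine.
  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query({
    query: "What did the fox do?",
  });
  console.log(response.toString());
}

main().catch(console.error);
```

Run it with your usual TypeScript tooling (for example `npx tsx`) while Ollama is running in the background.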
You can see the full example file.