OpenAI
Installation
npm i llamaindex @llamaindex/openai
pnpm add llamaindex @llamaindex/openai
yarn add llamaindex @llamaindex/openai
bun add llamaindex @llamaindex/openai
import { OpenAI } from "@llamaindex/openai";
import { Settings } from "llamaindex";
Settings.llm = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0, apiKey: <YOUR_API_KEY> });
You can also set the API key via an environment variable:
export OPENAI_API_KEY="<YOUR_API_KEY>"
You can optionally set a custom base URL, like:
export OPENAI_BASE_URL="https://api.scaleway.ai/v1"
or
Settings.llm = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0, apiKey: <YOUR_API_KEY>, baseURL: "https://api.scaleway.ai/v1" });
Using OpenAI Responses API
The OpenAI Responses API provides enhanced functionality for handling complex interactions, including built-in tools, annotations, and streaming responses. Here's how to use it:
Basic Setup
import { openaiResponses } from "@llamaindex/openai";
const llm = openaiResponses({
model: "gpt-4o",
temperature: 0.1,
maxOutputTokens: 1000
});
Message Content Types
The API supports different types of message content, including text and images:
const response = await llm.chat({
messages: [
{
role: "user",
content: [
{
type: "input_text",
text: "What's in this image?"
},
{
type: "input_image",
image_url: "https://example.com/image.jpg",
detail: "auto" // Optional: can be "auto", "low", or "high"
}
]
}
]
});
Advanced Features
Built-in Tools
const llm = openaiResponses({
model: "gpt-4o",
builtInTools: [
{
type: "function",
name: "search_files",
description: "Search through available files"
}
],
strict: true // Enable strict mode for tool calls
});
Response Tracking and Storage
const llm = openaiResponses({
trackPreviousResponses: true, // Enable response tracking
store: true, // Store responses for future reference
user: "user-123", // Associate responses with a user
callMetadata: { // Add custom metadata
sessionId: "session-123",
context: "customer-support"
}
});
Streaming Responses
const response = await llm.chat({
messages: [
{
role: "user",
content: "Generate a long response"
}
],
stream: true // Enable streaming
});
for await (const chunk of response) {
console.log(chunk.delta); // Process each chunk of the response
}
Configuration Options
The OpenAI Responses API supports various configuration options:
const llm = openaiResponses({
// Model and basic settings
model: "gpt-4o",
temperature: 0.1,
topP: 1,
maxOutputTokens: 1000,
// API configuration
apiKey: "your-api-key",
baseURL: "custom-endpoint",
maxRetries: 10,
timeout: 60000,
// Response handling
trackPreviousResponses: false,
store: false,
strict: false,
// Additional options
instructions: "Custom instructions for the model",
truncation: "auto", // Can be "auto", "disabled", or null
include: ["citations", "reasoning"] // Specify what to include in responses
});
Response Structure
The API returns responses with rich metadata and optional annotations:
interface ResponseStructure {
message: {
content: string;
role: "assistant";
options: {
built_in_tool_calls: Array<ToolCall>;
annotations?: Array<Citation | URLCitation | FilePath>;
refusal?: string;
reasoning?: ReasoningItem;
usage?: ResponseUsage;
toolCall?: Array<PartialToolCall>;
}
}
}
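As a sketch of consuming this structure (the field names follow the interface above and may differ slightly between versions):

const response = await llm.chat({
  messages: [{ role: "user", content: "Summarize the quarterly report" }],
});

// Read the metadata returned alongside the message
const { annotations, usage } = response.message.options ?? {};
if (annotations?.length) {
  console.log("Citations:", annotations);
}
if (usage) {
  console.log("Token usage:", usage);
}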
Best Practices
- Use trackPreviousResponses when you need conversation continuity
- Enable strict mode when using tools to ensure accurate function calls
- Set appropriate maxOutputTokens to control response length
- Use annotations to track citations and references in responses
- Implement error handling for potential API failures and retries (see the sketch below)
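For the last point, a minimal sketch that wraps a call in a try/catch while leaving transient failures to the client's maxRetries setting:

const llm = openaiResponses({ model: "gpt-4o", maxRetries: 3 });

try {
  const response = await llm.chat({
    messages: [{ role: "user", content: "Summarize the attached report" }],
  });
  console.log(response.message.content);
} catch (error) {
  // Reached only after the client has exhausted its retries
  console.error("Responses API call failed:", error);
}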
Using JSON Response Format
You can configure OpenAI to return responses in JSON format:
Settings.llm = new OpenAI({
model: "gpt-4o",
temperature: 0,
responseFormat: { type: "json_object" }
});
// You can also use a Zod schema to validate the response structure
import { z } from "zod";
const responseSchema = z.object({
summary: z.string(),
topics: z.array(z.string()),
sentiment: z.enum(["positive", "negative", "neutral"])
});
Settings.llm = new OpenAI({
model: "gpt-4o",
temperature: 0,
responseFormat: responseSchema
});
Response Formats
The OpenAI LLM supports different response formats to structure the output in specific ways. There are two main approaches to formatting responses:
1. JSON Object Format
The simplest way to get structured JSON responses is using the json_object
response format:
Settings.llm = new OpenAI({
model: "gpt-4o",
temperature: 0,
responseFormat: { type: "json_object" }
});
const response = await llm.chat({
messages: [
{
role: "system",
content: "You are a helpful assistant that outputs JSON."
},
{
role: "user",
content: "Summarize this meeting transcript"
}
]
});
// Response will be valid JSON
console.log(response.message.content);
2. Schema Validation with Zod
For more robust type safety and validation, you can use Zod schemas to define the expected response structure:
import { z } from "zod";
// Define the response schema
const meetingSchema = z.object({
summary: z.string(),
participants: z.array(z.string()),
actionItems: z.array(z.string()),
nextSteps: z.string()
});
// Configure the LLM with the schema
Settings.llm = new OpenAI({
model: "gpt-4o",
temperature: 0,
responseFormat: meetingSchema
});
const response = await llm.chat({
messages: [
{
role: "user",
content: "Summarize this meeting transcript"
}
]
});
// Response will be typed and validated according to the schema
const result = response.message.content;
console.log(result.summary);
console.log(result.actionItems);
Response Format Options
The response format can be configured in two ways:
- At LLM initialization:
const llm = new OpenAI({
model: "gpt-4o",
responseFormat: { type: "json_object" } // or a Zod schema
});
- Per request:
const response = await llm.chat({
messages: [...],
responseFormat: { type: "json_object" } // or a Zod schema
});
The response format options are:
- { type: "json_object" } - Returns responses as JSON objects
- zodSchema - A Zod schema that defines and validates the response structure
Best Practices
- Use JSON object format for simple structured responses
- Use Zod schemas when you need:
  - Type safety
  - Response validation
  - Complex nested structures
  - Specific field constraints
- Set a low temperature (e.g. 0) when using structured outputs for more reliable formatting
- Include clear instructions in system or user messages about the expected response format
- Handle potential parsing errors when working with JSON responses (see the sketch below)
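For the last point, a minimal sketch that assumes the raw content comes back as a JSON string and validates it against the meetingSchema defined above:

const raw = response.message.content;

try {
  const parsed = meetingSchema.safeParse(JSON.parse(raw as string));
  if (parsed.success) {
    console.log(parsed.data.summary);
  } else {
    console.error("Response did not match the expected schema:", parsed.error);
  }
} catch (err) {
  console.error("Response was not valid JSON:", err);
}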
Load and index documents
For this example, we will use a single document. In a real-world scenario, you would have multiple documents to index.
import { Document, VectorStoreIndex } from "llamaindex";
// `essay` is assumed to hold the document text (e.g. loaded from a file)
const document = new Document({ text: essay, id_: "essay" });
const index = await VectorStoreIndex.fromDocuments([document]);
Query
const queryEngine = index.asQueryEngine();
const query = "What is the meaning of life?";
const results = await queryEngine.query({
query,
});
Full Example
import { OpenAI } from "@llamaindex/openai";
import { Document, Settings, VectorStoreIndex } from "llamaindex";
// Use the OpenAI LLM
Settings.llm = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0 });
async function main() {
const document = new Document({ text: essay, id_: "essay" });
// Load and index documents
const index = await VectorStoreIndex.fromDocuments([document]);
// get retriever
const retriever = index.asRetriever();
// Create a query engine
const queryEngine = index.asQueryEngine({
retriever,
});
const query = "What is the meaning of life?";
// Query
const response = await queryEngine.query({
query,
});
// Log the response
console.log(response.response);
}

main().catch(console.error);
API Reference
OpenAI Live LLM
The OpenAI Live LLM integration in LlamaIndex provides real-time chat capabilities with support for audio streaming and tool calling.
Basic Usage
import { openai } from "@llamaindex/openai";
import { tool, ModalityType } from "llamaindex";
// Get the ephemeral key on the server
const serverllm = openai({
apiKey: "your-api-key",
model: "gpt-4o-realtime-preview-2025-06-03",
});
// Get an ephemeral key
// Usually this code is run on the server and the ephemeral key is passed to the
// client - the ephemeral key can be securely used on the client side
const ephemeralKey = await serverllm.live.getEphemeralKey();
// Create a client-side LLM instance with the ephemeral key
const llm = openai({
apiKey: ephemeralKey,
model: "gpt-4o-realtime-preview-2025-06-03"
});
// Create a live session
const session = await llm.live.connect({
systemInstruction: "You are a helpful assistant.",
});
// Send a message
session.sendMessage({
content: "Hello!",
role: "user",
});
Tool Integration
Tools are handled server-side, making it simple to pass them to the live session:
// Define your tools (zod provides the parameter schema)
import { z } from "zod";

const weatherTool = tool({
name: "weather",
description: "Get the weather for a location",
parameters: z.object({
location: z.string().describe("The location to get weather for"),
}),
execute: async ({ location }) => {
return `The weather in ${location} is sunny`;
},
});
// Create session with tools
const session = await llm.live.connect({
systemInstruction: "You are a helpful assistant.",
tools: [weatherTool],
});
Audio Support
For audio capabilities:
// Get microphone access
const userStream = await navigator.mediaDevices.getUserMedia({
audio: true,
});
// Create session with audio
const session = await llm.live.connect({
audioConfig: {
stream: userStream,
onTrack: (remoteStream) => {
// Handle incoming audio
audioElement.srcObject = remoteStream;
},
},
});
Event Handling
Listen to events from the session:
for await (const event of session.streamEvents()) {
if (liveEvents.open.include(event)) {
// Connection established
console.log("Connected!");
} else if (liveEvents.text.include(event)) {
// Received text response
console.log("Assistant:", event.text);
}
}
Capabilities
The OpenAI Live LLM supports:
- Real-time text chat
- Audio streaming (if configured)
- Tool calling (server-side execution)
- Ephemeral key generation for secure sessions
API Reference
LiveLLM Methods
connect(config?: LiveConnectConfig)
Creates a new live session.
interface LiveConnectConfig {
systemInstruction?: string;
tools?: BaseTool[];
audioConfig?: AudioConfig;
responseModality?: ModalityType[];
}
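For example, a text-only session could be requested like this (a sketch; it assumes ModalityType.TEXT is a valid member of the ModalityType enum imported earlier):

const session = await llm.live.connect({
  systemInstruction: "You are a helpful assistant.",
  responseModality: [ModalityType.TEXT],
});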
getEphemeralKey()
Gets a temporary key for the session.
LiveLLMSession Methods
sendMessage(message: ChatMessage)
Sends a message to the assistant.
interface ChatMessage {
content: string | MessageContentDetail[];
role: "user" | "assistant";
}
disconnect()
Closes the session and cleans up resources.
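For example, when a conversation ends (a sketch; it also stops the microphone tracks captured in the audio example above):

// Close the live session and release the microphone
await session.disconnect();
userStream.getTracks().forEach((track) => track.stop());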
Error Handling
try {
const session = await llm.live.connect();
} catch (error) {
if (error instanceof Error) {
console.error("Connection failed:", error.message);
}
}
Best Practices
-
Tool Definition
- Keep tool implementations server-side
- Use clear descriptions for tools
- Handle tool errors gracefully
-
Session Management
- Always disconnect sessions when done
- Clean up audio resources
- Handle reconnection scenarios
-
Security
- Use ephemeral keys for sessions
- Validate tool inputs
- Secure API key handling
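A rough sketch of the reconnection point; this is plain application-level retry logic around connect(), not part of the library API:

async function connectWithRetry(retries = 3) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await llm.live.connect({
        systemInstruction: "You are a helpful assistant.",
      });
    } catch (error) {
      console.warn(`Connection attempt ${attempt} failed`, error);
    }
  }
  throw new Error("Unable to establish a live session");
}

const session = await connectWithRetry();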