OpenAI

Installation

npm i llamaindex @llamaindex/openai
pnpm add llamaindex @llamaindex/openai
yarn add llamaindex @llamaindex/openai
bun add llamaindex @llamaindex/openai
import { OpenAI } from "@llamaindex/openai";
import { Settings } from "llamaindex";

Settings.llm = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0, apiKey: "<YOUR_API_KEY>" });

Alternatively, you can set the API key via an environment variable:

export OPENAI_API_KEY="<YOUR_API_KEY>"

You can optionally set a custom base URL:

export OPENAI_BASE_URL="https://api.scaleway.ai/v1"

or pass it in the constructor:

Settings.llm = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0, apiKey: "<YOUR_API_KEY>", baseURL: "https://api.scaleway.ai/v1" });

Using OpenAI Responses API

The OpenAI Responses API provides enhanced functionality for handling complex interactions, including built-in tools, annotations, and streaming responses. Here's how to use it:

Basic Setup

import { openaiResponses } from "@llamaindex/openai";

const llm = openaiResponses({
  model: "gpt-4o",
  temperature: 0.1,
  maxOutputTokens: 1000
});

Message Content Types

The API supports different types of message content, including text and images:

const response = await llm.chat({
  messages: [
    {
      role: "user",
      content: [
        {
          type: "input_text",
          text: "What's in this image?"
        },
        {
          type: "input_image",
          image_url: "https://example.com/image.jpg",
          detail: "auto" // Optional: can be "auto", "low", or "high"
        }
      ]
    }
  ]
});

Advanced Features

Built-in Tools

const llm = openaiResponses({
  model: "gpt-4o",
  builtInTools: [
    {
      type: "function",
      name: "search_files",
      description: "Search through available files"
    }
  ],
  strict: true // Enable strict mode for tool calls
});

Response Tracking and Storage

const llm = openaiResponses({
  trackPreviousResponses: true, // Enable response tracking
  store: true, // Store responses for future reference
  user: "user-123", // Associate responses with a user
  callMetadata: { // Add custom metadata
    sessionId: "session-123",
    context: "customer-support"
  }
});

Streaming Responses

const response = await llm.chat({
  messages: [
    {
      role: "user",
      content: "Generate a long response"
    }
  ],
  stream: true // Enable streaming
});

for await (const chunk of response) {
  console.log(chunk.delta); // Process each chunk of the response
}

Configuration Options

The OpenAI Responses API supports various configuration options:

const llm = openaiResponses({
  // Model and basic settings
  model: "gpt-4o",
  temperature: 0.1,
  topP: 1,
  maxOutputTokens: 1000,
  
  // API configuration
  apiKey: "your-api-key",
  baseURL: "custom-endpoint",
  maxRetries: 10,
  timeout: 60000,
  
  // Response handling
  trackPreviousResponses: false,
  store: false,
  strict: false,
  
  // Additional options
  instructions: "Custom instructions for the model",
  truncation: "auto", // Can be "auto", "disabled", or null
  include: ["citations", "reasoning"] // Specify what to include in responses
});

Response Structure

The API returns responses with rich metadata and optional annotations:

interface ResponseStructure {
  message: {
    content: string;
    role: "assistant";
    options: {
      built_in_tool_calls: Array<ToolCall>;
      annotations?: Array<Citation | URLCitation | FilePath>;
      refusal?: string;
      reasoning?: ReasoningItem;
      usage?: ResponseUsage;
      toolCall?: Array<PartialToolCall>;
    }
  }
}
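
For example, here's a minimal sketch of reading that metadata off a chat response (the option names follow the interface above; the log labels are illustrative):

import { openaiResponses } from "@llamaindex/openai";

const llm = openaiResponses({ model: "gpt-4o" });

const response = await llm.chat({
  messages: [{ role: "user", content: "Summarize the attached report." }],
});

// The plain text answer
console.log(response.message.content);

// Optional metadata, per the ResponseStructure interface above
const options = response.message.options ?? {};
if ("annotations" in options && options.annotations) {
  console.log("Citations:", options.annotations);
}
if ("usage" in options && options.usage) {
  console.log("Token usage:", options.usage);
}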

Best Practices

  1. Use trackPreviousResponses when you need conversation continuity
  2. Enable strict mode when using tools to ensure accurate function calls
  3. Set appropriate maxOutputTokens to control response length
  4. Use annotations to track citations and references in responses
  5. Implement error handling for potential API failures and retries (see the sketch below)
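
For point 5, here's a minimal sketch of application-level retry with backoff around a chat call (the attempt count and delays are arbitrary choices, not library defaults; the client also exposes maxRetries and timeout, shown under Configuration Options below):

import { openaiResponses } from "@llamaindex/openai";
import type { ChatMessage } from "llamaindex";

const llm = openaiResponses({ model: "gpt-4o", maxOutputTokens: 1000 });

async function chatWithRetry(messages: ChatMessage[], maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await llm.chat({ messages });
    } catch (error) {
      if (attempt === maxAttempts) throw error;
      // Simple exponential backoff: 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
    }
  }
}

const response = await chatWithRetry([{ role: "user", content: "Hello!" }]);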

Response Formats

The OpenAI LLM supports different response formats to structure the output in specific ways. There are two main approaches to formatting responses:

1. JSON Object Format

The simplest way to get structured JSON responses is using the json_object response format:

const llm = new OpenAI({
  model: "gpt-4o",
  temperature: 0,
  responseFormat: { type: "json_object" },
});

const response = await llm.chat({
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant that outputs JSON."
    },
    {
      role: "user", 
      content: "Summarize this meeting transcript"
    }
  ]
});

// Response will be valid JSON
console.log(response.message.content);

2. Schema Validation with Zod

For more robust type safety and validation, you can use Zod schemas to define the expected response structure:

import { z } from "zod";

// Define the response schema
const meetingSchema = z.object({
  summary: z.string(),
  participants: z.array(z.string()),
  actionItems: z.array(z.string()),
  nextSteps: z.string()
});

// Configure the LLM with the schema
const llm = new OpenAI({
  model: "gpt-4o",
  temperature: 0,
  responseFormat: meetingSchema,
});

const response = await llm.chat({
  messages: [
    {
      role: "user",
      content: "Summarize this meeting transcript" 
    }
  ]
});

// Response will be typed and validated according to the schema
const result = response.message.content;
console.log(result.summary);
console.log(result.actionItems);

Response Format Options

The response format can be configured in two ways:

  1. At LLM initialization:
const llm = new OpenAI({
  model: "gpt-4o",
  responseFormat: { type: "json_object" } // or a Zod schema
});
  2. Per request:
const response = await llm.chat({
  messages: [...],
  responseFormat: { type: "json_object" } // or a Zod schema
});

The response format options are:

  • { type: "json_object" } - Returns responses as JSON objects
  • zodSchema - A Zod schema that defines and validates the response structure

Best Practices

  1. Use JSON object format for simple structured responses
  2. Use Zod schemas when you need:
    • Type safety
    • Response validation
    • Complex nested structures
    • Specific field constraints
  3. Set a low temperature (e.g. 0) when using structured outputs for more reliable formatting
  4. Include clear instructions in system or user messages about the expected response format
  5. Handle potential parsing errors when working with JSON responses (see the sketch below)
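
For point 5, a minimal sketch of defensive parsing, assuming a response from one of the chat calls above (safeParse comes from Zod; the fallback behavior is an illustrative choice):

import { z } from "zod";

const meetingSchema = z.object({
  summary: z.string(),
  actionItems: z.array(z.string()),
});

const raw = response.message.content;

try {
  // The content may arrive as a JSON string; parse it before validating
  const parsed = typeof raw === "string" ? JSON.parse(raw) : raw;
  const result = meetingSchema.safeParse(parsed);
  if (result.success) {
    console.log(result.data.summary);
  } else {
    console.error("Response did not match schema:", result.error.issues);
  }
} catch (err) {
  console.error("Response was not valid JSON:", err);
}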

Load and index documents

For this example, we will use a single document. In a real-world scenario, you would have multiple documents to index.

import fs from "node:fs/promises";
import { Document, VectorStoreIndex } from "llamaindex";

// Load the document text (here from a local file)
const essay = await fs.readFile("essay.txt", "utf-8");

const document = new Document({ text: essay, id_: "essay" });

const index = await VectorStoreIndex.fromDocuments([document]);

Query

const queryEngine = index.asQueryEngine();

const query = "What is the meaning of life?";

const results = await queryEngine.query({
  query,
});

Full Example

import fs from "node:fs/promises";
import { OpenAI } from "@llamaindex/openai";
import { Document, Settings, VectorStoreIndex } from "llamaindex";

// Use the OpenAI LLM
Settings.llm = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0 });

async function main() {
  // Load the document text
  const essay = await fs.readFile("essay.txt", "utf-8");
  const document = new Document({ text: essay, id_: "essay" });

  // Load and index documents
  const index = await VectorStoreIndex.fromDocuments([document]);

  // Get a retriever
  const retriever = index.asRetriever();

  // Create a query engine
  const queryEngine = index.asQueryEngine({
    retriever,
  });

  const query = "What is the meaning of life?";

  // Query
  const response = await queryEngine.query({
    query,
  });

  // Log the response
  console.log(response.response);
}

main().catch(console.error);

OpenAI Live LLM

The OpenAI Live LLM integration in LlamaIndex provides real-time chat capabilities with support for audio streaming and tool calling.

Basic Usage

import { openai } from "@llamaindex/openai";
import { tool, ModalityType } from "llamaindex";

// Create a server-side LLM instance used to mint the ephemeral key
const serverLlm = openai({
  apiKey: "your-api-key",
  model: "gpt-4o-realtime-preview-2025-06-03",
});

// Get an ephemeral key
// Usually this code runs on the server and the ephemeral key is passed to the
// client - the ephemeral key can be safely used on the client side
const ephemeralKey = await serverLlm.live.getEphemeralKey();

// Create a client-side LLM instance with the ephemeral key
const llm = openai({
  apiKey: ephemeralKey,
  model: "gpt-4o-realtime-preview-2025-06-03"
});

// Create a live session
const session = await llm.live.connect({
  systemInstruction: "You are a helpful assistant.",
});

// Send a message
session.sendMessage({
  content: "Hello!",
  role: "user",
});

Tool Integration

Tools are handled server-side, making it simple to pass them to the live session:

import { z } from "zod";

// Define your tools
const weatherTool = tool({
  name: "weather",
  description: "Get the weather for a location",
  parameters: z.object({
    location: z.string().describe("The location to get weather for"),
  }),
  execute: async ({ location }) => {
    return `The weather in ${location} is sunny`;
  },
});

// Create session with tools
const session = await llm.live.connect({
  systemInstruction: "You are a helpful assistant.",
  tools: [weatherTool],
});

Audio Support

For audio capabilities:

// Get microphone access
const userStream = await navigator.mediaDevices.getUserMedia({
  audio: true,
});

// Create session with audio
const session = await llm.live.connect({
  audioConfig: {
    stream: userStream,
    onTrack: (remoteStream) => {
      // Play incoming audio, e.g. through an <audio> element on the page
      audioElement.srcObject = remoteStream;
    },
  },
});

Event Handling

Listen to events from the session:

import { liveEvents } from "llamaindex";

for await (const event of session.streamEvents()) {
  if (liveEvents.open.include(event)) {
    // Connection established
    console.log("Connected!");
  } else if (liveEvents.text.include(event)) {
    // Received text response
    console.log("Assistant:", event.text);
  }
}

Capabilities

The OpenAI Live LLM supports:

  • Real-time text chat
  • Audio streaming (if configured)
  • Tool calling (server-side execution)
  • Ephemeral key generation for secure sessions

API Reference

LiveLLM Methods

connect(config?: LiveConnectConfig)

Creates a new live session.

interface LiveConnectConfig {
  systemInstruction?: string;
  tools?: BaseTool[];
  audioConfig?: AudioConfig;
  responseModality?: ModalityType[];
}

getEphemeralKey()

Gets a temporary key for the session.

LiveLLMSession Methods

sendMessage(message: ChatMessage)

Sends a message to the assistant.

interface ChatMessage {
  content: string | MessageContentDetail[];
  role: "user" | "assistant";
}
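
For example, the content array can carry typed parts instead of a plain string (a minimal sketch; the text part shape follows MessageContentDetail in llamaindex):

session.sendMessage({
  role: "user",
  content: [{ type: "text", text: "What's the weather like today?" }],
});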

disconnect()

Closes the session and cleans up resources.

Error Handling

try {
  const session = await llm.live.connect();
} catch (error) {
  if (error instanceof Error) {
    console.error("Connection failed:", error.message);
  }
}

Best Practices

  1. Tool Definition

    • Keep tool implementations server-side
    • Use clear descriptions for tools
    • Handle tool errors gracefully
  2. Session Management (see the sketch after this list)

    • Always disconnect sessions when done
    • Clean up audio resources
    • Handle reconnection scenarios
  3. Security

    • Use ephemeral keys for sessions
    • Validate tool inputs
    • Secure API key handling
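
A minimal sketch of session cleanup along these lines (userStream is the microphone stream from the audio example above; the try/finally structure is an illustrative choice):

const session = await llm.live.connect({
  systemInstruction: "You are a helpful assistant.",
});

try {
  session.sendMessage({ content: "Hello!", role: "user" });
  // ... handle events ...
} finally {
  // Stop microphone tracks so the browser releases the device
  userStream.getTracks().forEach((track) => track.stop());
  // Always close the session when done
  session.disconnect();
}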